CGroup Practical Experiments
A cgroup (control group) is a Linux kernel feature that organises processes into hierarchical groups, enabling the management (limiting, accounting and isolating) of system resources such as CPU, memory, disk I/O, and network bandwidth.
Each cgroup is associated with a set of limits or parameters, which are enforced by the kernel through subsystems (sometimes also known as resource controllers, or simply controllers).
All cgroup functionalities are accessed through the cgroup filesystem
(/sys/fs/cgroup). This is a virtual filesystem with special files that
act as the interface for creating, removing, or altering cgroups.
We have all heard this cgroup theory many times, but how does it actually work? It is better to learn by getting your hands dirty than by reading theory alone. This blog will guide you through some practical experiments!
Experiments
Prerequisites
Before starting the experiments, install the following tools:
apt install -y cgroup-tools iperf3
By the way, to see the available cgroup subsystems on your Linux system, run:
❯ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 0 50 1
cpu 0 50 1
cpuacct 0 50 1
blkio 0 50 1
memory 0 50 1
devices 0 50 1
freezer 0 50 1
net_cls 0 50 1
perf_event 0 50 1
net_prio 0 50 1
hugetlb 0 50 1
pids 0 50 1
rdma 0 50 1
misc 0 50 1
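Note that the hierarchy column is 0 for every controller above, which indicates they are all attached to the cgroup v2 unified hierarchy. To check which cgroup version is mounted at the standard mount point:

```shell
# Prints "cgroup2fs" on a cgroup v2 (unified) system;
# on a v1 setup the mount point is typically a tmpfs with
# per-controller cgroup mounts underneath.
stat -fc %T /sys/fs/cgroup
```

The experiments below use the v2 interface for memory, and mount the v1 net_cls controller separately.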
Experiment 1: Memory Limitation
Let's create a simple program that continuously allocates memory:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    // Number of 1 MiB blocks allocated so far
    size_t times = 0;
    // 1 MiB
    size_t blockSize = 1024 * 1024;
    while (1) {
        int *ptr = (int *) malloc(blockSize);
        // Touch every element, or the memory won't actually be committed.
        for (size_t j = 0; j < blockSize / sizeof(int); ++j) {
            ptr[j] = (int) j;
        }
        ++times;
        printf("\rMemory allocated: %zuMiB", times);
        fflush(stdout);
        usleep(2000);
    }
    return 0;
}
Compile this program and run it without cgroup first to see how much memory it can consume:
Memory allocated: 1761MiBzsh: killed ./a.out
We can see it is killed by the OS (the kernel's OOM killer) after allocating around 1.7 GiB of memory.
Now run the executable under a memory-limited cgroup:
# Create the memory cgroup
cgcreate -g memory:/my_mem
# Set the limit; the M suffix means MiB, so this is 500 MiB
echo 500M > /sys/fs/cgroup/my_mem/memory.max
# Run the program within the cgroup
cgexec -g memory:/my_mem ./a.out
The output:
Memory allocated: 496 MiBzsh: killed cgexec -g memory:/my_mem ./a.out
It is killed after allocating approximately 500 MiB of memory, limited by the cgroup as expected.
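You can also confirm that the kill came from the cgroup limit rather than a system-wide OOM: the v2 memory controller keeps a per-cgroup counter of OOM kills (the path matches the cgroup created above; reading it requires the cgroup to still exist, and cleanup requires root):

```shell
# The per-cgroup OOM-kill counter should now read "oom_kill 1":
grep oom_kill /sys/fs/cgroup/my_mem/memory.events 2>/dev/null || true
# Remove the cgroup when you are done:
cgdelete -g memory:/my_mem 2>/dev/null || true
```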
Experiment 2: Network Throttling with net_cls
The net_cls (Network Classifier) cgroup v1 controller may not be mounted by default. To mount it:
mkdir /sys/fs/cgroup/net_cls
mount -t cgroup -onet_cls net_cls /sys/fs/cgroup/net_cls
Egress Limiting
Steps:
- Use net_cls to tag network packets
- Use tc to control the traffic

Limitations of this method:
- It does not work on WiFi interfaces (the reason has not been investigated)
- If you want to control both ingress and egress, consider routing traffic through virtual interfaces
- Only one root qdisc can be attached to a network interface
First create a new net_cls cgroup:
cgcreate -g net_cls:limited_bw
Assign a class ID to the cgroup (class ID format is 0xAAAABBBB,
where AAAA is the major number and BBBB is the minor number):
# Set a 10:1 handle
cgset -r net_cls.classid=0x00100001 limited_bw
# Alternatively
echo 0x00100001 > /sys/fs/cgroup/net_cls/limited_bw/net_cls.classid
# Check the value, 0x00100001 equals 1048577
cat /sys/fs/cgroup/net_cls/limited_bw/net_cls.classid
1048577
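Note that tc handles are written in hexadecimal, so the major number 0x0010 in the classid corresponds to the 10: handle used below. You can split a classid into its major and minor parts with plain shell arithmetic:

```shell
classid=0x00100001
# Decimal value, as shown by the cat above
printf 'decimal: %d\n' $((classid))
# Major is the upper 16 bits, minor the lower 16 bits (printed in hex)
printf 'major: %x, minor: %x\n' $((classid >> 16)) $((classid & 0xffff))
```

This prints `decimal: 1048577` and `major: 10, minor: 1`, i.e. tc class 10:1.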
Then control traffic using tc:

- Add an HTB (Hierarchy Token Bucket) queueing discipline to shape outbound traffic:

  tc qdisc add dev enp1s0 root handle 10: htb

  - qdisc: queueing discipline
  - 10: is the qdisc-id (also called the handle) in the format <major>:; it should be consistent with the major number of the cgroup class ID
- Create a traffic class with a rate limit:

  tc class add dev enp1s0 parent 10: classid 10:1 htb rate 1mbps

  - 10:1 is the class-id in the format <major>:<minor>; the major number must match the parent handle, and the minor number must match the minor number of the cgroup class ID
  - 1mbps means 1 megabyte per second
- Add a filter to associate traffic from the cgroup with this class:

  tc filter add dev enp1s0 parent 10: handle 1: cgroup

  - 1: is the filter-id; it does not need to be consistent with the minor number of the cgroup class ID (tested)
Start a new process to test:
cgexec -g net_cls:limited_bw iperf3 -s
# Run the client from another machine
iperf3 -c <server_ip> -R
For example, the result in my LAN environment:
❯ iperf3 -c 10.10.0.141 -R
Connecting to host 10.10.0.141, port 5201
Reverse mode, remote host 10.10.0.141 is sending
[ 5] local 10.10.0.45 port 57032 connected to 10.10.0.141 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.01 sec 1.12 MBytes 9.39 Mbits/sec
[ 5] 1.01-2.00 sec 1.12 MBytes 9.46 Mbits/sec
[ 5] 2.00-3.01 sec 1.12 MBytes 9.42 Mbits/sec
[ 5] 3.01-4.00 sec 1.12 MBytes 9.46 Mbits/sec
[ 5] 4.00-5.01 sec 1.12 MBytes 9.41 Mbits/sec
[ 5] 5.01-6.00 sec 1.12 MBytes 9.48 Mbits/sec
[ 5] 6.00-7.00 sec 1.12 MBytes 9.41 Mbits/sec
[ 5] 7.00-8.00 sec 1.12 MBytes 9.47 Mbits/sec
[ 5] 8.00-9.00 sec 1.25 MBytes 10.5 Mbits/sec
[ 5] 9.00-10.00 sec 1.12 MBytes 9.44 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 12.1 MBytes 10.2 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 11.4 MBytes 9.54 Mbits/sec receiver
iperf Done.
As a comparison, here is the result without the cgroup limit:
❯ iperf3 -c 10.10.0.141 -R
Connecting to host 10.10.0.141, port 5201
Reverse mode, remote host 10.10.0.141 is sending
[ 5] local 10.10.0.45 port 56763 connected to 10.10.0.141 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.01 sec 69.5 MBytes 580 Mbits/sec
[ 5] 1.01-2.00 sec 75.8 MBytes 638 Mbits/sec
[ 5] 2.00-3.00 sec 71.9 MBytes 601 Mbits/sec
[ 5] 3.00-4.00 sec 76.5 MBytes 642 Mbits/sec
[ 5] 4.00-5.00 sec 76.0 MBytes 637 Mbits/sec
[ 5] 5.00-6.01 sec 76.9 MBytes 644 Mbits/sec
[ 5] 6.01-7.00 sec 76.2 MBytes 643 Mbits/sec
[ 5] 7.00-8.00 sec 76.0 MBytes 635 Mbits/sec
[ 5] 8.00-9.00 sec 76.1 MBytes 641 Mbits/sec
[ 5] 9.00-10.00 sec 72.8 MBytes 609 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 751 MBytes 629 Mbits/sec 667 sender
[ 5] 0.00-10.00 sec 748 MBytes 627 Mbits/sec receiver
iperf Done.
Alternatively, apply to an existing process:
cgclassify -g net_cls:limited_bw ${pid}
For convenience, here is a helper script.
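A minimal sketch of such a script, wrapping the commands above in a function (the function name and argument order are my own; it requires root, cgroup-tools, and the net_cls mount shown earlier):

```shell
#!/bin/sh
# Sketch: create the net_cls cgroup, shape egress on an interface,
# and run a command inside the cgroup.
limit_egress() {
    IFACE=$1; RATE=$2; shift 2
    cgcreate -g net_cls:limited_bw
    cgset -r net_cls.classid=0x00100001 limited_bw
    tc qdisc add dev "$IFACE" root handle 10: htb
    tc class add dev "$IFACE" parent 10: classid 10:1 htb rate "$RATE"
    tc filter add dev "$IFACE" parent 10: handle 1: cgroup
    cgexec -g net_cls:limited_bw "$@"
}
# Example (as root): limit_egress enp1s0 1mbps iperf3 -s
```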