CGroup Practical Experiments

A cgroup (control group) is a Linux kernel feature that organises processes into hierarchical groups, enabling the management (limiting, accounting and isolating) of system resources such as CPU, memory, disk I/O, and network bandwidth.

Each cgroup is associated with a set of limits or parameters, which are enforced by the kernel through subsystems (sometimes also known as resource controllers, or simply controllers).

All cgroup functionalities are accessed through the cgroup filesystem (/sys/fs/cgroup). This is a virtual filesystem with special files that act as the interface for creating, removing, or altering cgroups.
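This file-based interface can be illustrated with a quick sketch. The snippet below uses a scratch directory as a stand-in for /sys/fs/cgroup (writing under the real tree requires root, and there the kernel creates the control files automatically); the file names follow cgroup v2 conventions:

```shell
# Scratch directory standing in for /sys/fs/cgroup (the real one needs root).
CGROOT=$(mktemp -d)

# Creating a cgroup is just creating a directory...
mkdir "$CGROOT/my_group"

# ...and setting a limit is just writing to a special file.
echo 500M > "$CGROOT/my_group/memory.max"
cat "$CGROOT/my_group/memory.max"

rm -rf "$CGROOT"
```

In the real cgroup filesystem the same mkdir/echo pattern is all there is to it; the tools used later (cgcreate, cgset) are thin wrappers over these file operations.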

We have all heard this cgroup theory many times, but how does it work in practice? Better to learn by getting your hands dirty than by just reading theory. This blog will guide you through some practical experiments!

Experiments

Prerequisites

Before starting the experiments, install the following tools:

apt install -y cgroup-tools iperf3

BTW, to see the cgroup subsystems available on your Linux OS, run:

cat /proc/cgroups
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	0	50	1
cpu	0	50	1
cpuacct	0	50	1
blkio	0	50	1
memory	0	50	1
devices	0	50	1
freezer	0	50	1
net_cls	0	50	1
perf_event	0	50	1
net_prio	0	50	1
hugetlb	0	50	1
pids	0	50	1
rdma	0	50	1
misc	0	50	1
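The columns that usually matter are the first (controller name) and the last (enabled flag). A small awk one-liner lists just the enabled controllers; here it is fed two sample lines from the output above so the snippet runs anywhere, but in real use you would point it at /proc/cgroups directly:

```shell
# Print column 1 (subsys_name) for rows where column 4 (enabled) is 1,
# skipping the header row. Sample input stands in for /proc/cgroups.
printf '#subsys_name\thierarchy\tnum_cgroups\tenabled\ncpu\t0\t50\t1\nmemory\t0\t50\t1\n' |
  awk 'NR > 1 && $4 == 1 { print $1 }'
```

Against the real file: awk 'NR > 1 && $4 == 1 { print $1 }' /proc/cgroups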

Experiment 1: Memory Limitation

Let's create a simple program that continuously allocates memory:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main() {
  // Number of 1 MiB blocks allocated so far
  size_t times = 0;
  // 1 MiB
  size_t blockSize = 1024 * 1024;

  while (1) {
    int *ptr = (int *) malloc(blockSize);
    if (ptr == NULL) {
      perror("malloc");
      break;
    }

    // Touch the memory, or physical pages are never actually committed.
    for (size_t j = 0; j < blockSize / sizeof(int); ++j) {
      ptr[j] = (int) j;
    }

    ++times;
    printf("\rMemory allocated: %zuMiB", times);
    fflush(stdout);
    usleep(2000);
  }
  return 0;
}

Compile this program and run it without a cgroup first to see how much memory it can consume:

Memory allocated: 1761MiBzsh: killed     ./a.out

We can see it is killed by the OS after allocating around 1.7 GiB of memory.

Now run the executable under a memory-limited cgroup:

# Create the memory cgroup
cgcreate -g memory:/my_mem

# Set the memory limit to 500 MiB
echo 500M > /sys/fs/cgroup/my_mem/memory.max

# Run the program within the cgroup
cgexec -g memory:/my_mem ./a.out

The output:

Memory allocated: 496 MiBzsh: killed     cgexec -g memory:/my_mem ./a.out

It is killed after allocating approximately 500 MiB of memory, limited by the cgroup as expected.

Experiment 2: Network Throttling with net_cls

The net_cls controller (Network Classifier cgroup) may not be mounted by default. To mount it:

mkdir /sys/fs/cgroup/net_cls
mount -t cgroup -onet_cls net_cls /sys/fs/cgroup/net_cls

Egress Limiting

Steps:

  1. Use net_cls to tag network packets
  2. Use tc to control the traffic

One limitation of the following method: it only shapes egress (outbound) traffic; limiting ingress requires a different mechanism (e.g. policing, or redirecting through an ifb device).

First create a new net_cls cgroup:

cgcreate -g net_cls:limited_bw

Assign a class ID to the cgroup (class ID format is 0xAAAABBBB, where AAAA is the major number and BBBB is the minor number):

# Set a 10:1 class ID
cgset -r net_cls.classid=0x00100001 limited_bw
# Alternatively
echo 0x00100001 > /sys/fs/cgroup/net_cls/limited_bw/net_cls.classid

# Check the value, 0x00100001 equals 1048577
cat /sys/fs/cgroup/net_cls/limited_bw/net_cls.classid
1048577
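The encoding is easy to verify with shell arithmetic: the upper 16 bits of the classid are the tc major number, and the lower 16 bits are the minor:

```shell
classid=0x00100001

# Decimal value, as shown by `cat net_cls.classid`
echo $(( classid ))

# Split into the tc <major>:<minor> pair, printed in hex
printf '%x:%x\n' $(( classid >> 16 )) $(( classid & 0xffff ))
```

This prints 1048577 and 10:1, matching the handle used in the tc commands below.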

Then control traffic using tc:

  1. Add a queueing discipline HTB (Hierarchy Token Bucket) to shape outbound traffic:

    tc qdisc add dev enp1s0 root handle 10: htb
    
    • qdisc: queueing discipline
    • 10: is the qdisc-id in the format <major>: (also called the handle); it must match the major number of the cgroup class ID
  2. Create a traffic class with limit:

    tc class add dev enp1s0 parent 10: classid 10:1 htb rate 1mbps
    
    • 10:1 is the class-id in the format <major>:<minor>; the major number must match the parent handle, and the minor number must match the cgroup class ID's minor number
    • 1mbps means 1 megabyte per second (in tc, mbps denotes bytes per second; use mbit for bits)
  3. Add a filter to associate traffic from the cgroup with this class:

    tc filter add dev enp1s0 parent 10: handle 1: cgroup
    
    • 1: is the filter-id; it does not need to match the cgroup class ID's minor number (tested)

Start a new process to test:

cgexec -g net_cls:limited_bw iperf3 -s

# Run this on another machine
iperf3 -c <server_ip> -R

For example, the result in my LAN environment:

❯ iperf3 -c 10.10.0.141 -R
Connecting to host 10.10.0.141, port 5201
Reverse mode, remote host 10.10.0.141 is sending
[  5] local 10.10.0.45 port 57032 connected to 10.10.0.141 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  1.12 MBytes  9.39 Mbits/sec
[  5]   1.01-2.00   sec  1.12 MBytes  9.46 Mbits/sec
[  5]   2.00-3.01   sec  1.12 MBytes  9.42 Mbits/sec
[  5]   3.01-4.00   sec  1.12 MBytes  9.46 Mbits/sec
[  5]   4.00-5.01   sec  1.12 MBytes  9.41 Mbits/sec
[  5]   5.01-6.00   sec  1.12 MBytes  9.48 Mbits/sec
[  5]   6.00-7.00   sec  1.12 MBytes  9.41 Mbits/sec
[  5]   7.00-8.00   sec  1.12 MBytes  9.47 Mbits/sec
[  5]   8.00-9.00   sec  1.25 MBytes  10.5 Mbits/sec
[  5]   9.00-10.00  sec  1.12 MBytes  9.44 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  12.1 MBytes  10.2 Mbits/sec    0            sender
[  5]   0.00-10.00  sec  11.4 MBytes  9.54 Mbits/sec                  receiver

iperf Done.

As a comparison, the speed result without cgroup limit:

❯ iperf3 -c 10.10.0.141 -R
Connecting to host 10.10.0.141, port 5201
Reverse mode, remote host 10.10.0.141 is sending
[  5] local 10.10.0.45 port 56763 connected to 10.10.0.141 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  69.5 MBytes   580 Mbits/sec
[  5]   1.01-2.00   sec  75.8 MBytes   638 Mbits/sec
[  5]   2.00-3.00   sec  71.9 MBytes   601 Mbits/sec
[  5]   3.00-4.00   sec  76.5 MBytes   642 Mbits/sec
[  5]   4.00-5.00   sec  76.0 MBytes   637 Mbits/sec
[  5]   5.00-6.01   sec  76.9 MBytes   644 Mbits/sec
[  5]   6.01-7.00   sec  76.2 MBytes   643 Mbits/sec
[  5]   7.00-8.00   sec  76.0 MBytes   635 Mbits/sec
[  5]   8.00-9.00   sec  76.1 MBytes   641 Mbits/sec
[  5]   9.00-10.00  sec  72.8 MBytes   609 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   751 MBytes   629 Mbits/sec  667            sender
[  5]   0.00-10.00  sec   748 MBytes   627 Mbits/sec                  receiver

iperf Done.

Alternatively, apply to an existing process:

cgclassify -g net_cls:limited_bw ${pid}

For convenience, here is a helper script.
