High-Performance Metric Collection Strategies and Collecting Metrics on Container Exit

the cgroup of an in-container process whose network usage you want to measure. From there, you can examine the pseudo-file named `tasks`, which contains all the PIDs in the cgroup (and thus, in the container). Pick any one of the PIDs.

Putting everything together, if the "short ID" of a container is held in the environment variable `$CID`, then you can do this:

```console
$ TASKS=/sys/fs/cgroup/devices/docker/$CID*/tasks
$ PID=$(head -n 1 $TASKS)
$ mkdir -p /var/run/netns
$ ln -sf /proc/$PID/ns/net /var/run/netns/$CID
$ ip netns exec $CID netstat -i
```

## Tips for high-performance metric collection

Running a new process each time you want to update metrics is (relatively) expensive. If you want to collect metrics at high resolution, and/or over a large number of containers (think 1000 containers on a single host), you don't want to fork a new process each time.

Here is how to collect metrics from a single process. You need to write your metric collector in C (or any language that lets you make low-level system calls). You need to use a special system call, `setns()`, which lets the current process enter any arbitrary namespace. It requires an open file descriptor to the namespace pseudo-file (remember: that's the pseudo-file in `/proc/<pid>/ns/net`).

There is a catch, however: you must not keep this file descriptor open. If you do, when the last process of the control group exits, the namespace isn't destroyed, and its network resources (like the virtual interface of the container) stay around forever (or until you close that file descriptor).

The right approach is to keep track of the first PID of each container, and re-open the namespace pseudo-file each time.

## Collect metrics when a container exits

Sometimes, you don't care about real-time metric collection, but when a container exits, you want to know how much CPU, memory, etc. it has used. Docker makes this difficult because it relies on `lxc-start`, which carefully cleans up after itself.
It is usually easier to collect metrics at regular intervals, and this is the way the `collectd` LXC plugin works.

But, if you'd still like to gather the stats when a container stops, here is how:

1. For each container, start a collection process, and move it to the control groups that you want to monitor by writing its PID to the `tasks` file of the cgroup.
2. The collection process should periodically re-read the `tasks` file to check if it's the last process of the control group. (If you also want to collect network statistics as explained in the previous section, you should also move the process to the appropriate network namespace.)
3. When the container exits, `lxc-start` attempts to delete the control groups. It fails, since the control group is still in use; but that's fine. Your process should now detect that it is the only one remaining in the group. Now is the right time to collect all the metrics you need!
4. Finally, your process should move itself back to the root control group, and remove the container control group. To remove a control group, just `rmdir` its directory. It's counter-intuitive to `rmdir` a directory while it still contains files; but remember that this is a pseudo-filesystem, so the usual rules don't apply. After the cleanup is done, the collection process can exit safely.
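
The procedure above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the way Docker or `collectd` implements it: the names `count_tasks`, `join_cgroup`, `watch_container`, and the `collect` callback are hypothetical, the collector is assumed to be single-threaded, the container is assumed to be running already, and only one control group hierarchy is handled (repeat the calls for each hierarchy you monitor).

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the IDs listed in a cgroup `tasks` file (one per line).
 * Returns the count, or -1 if the file cannot be opened. */
int count_tasks(const char *tasks_path)
{
    FILE *f = fopen(tasks_path, "r");
    if (f == NULL)
        return -1;

    int count = 0;
    long id;
    while (fscanf(f, "%ld", &id) == 1)
        count++;

    fclose(f);
    return count;
}

/* Move `pid` into a control group by writing it to that group's `tasks` file.
 * Returns 0 on success, -1 on failure. */
int join_cgroup(const char *tasks_path, pid_t pid)
{
    FILE *f = fopen(tasks_path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%ld\n", (long)pid) > 0;
    if (fclose(f) != 0)   /* a rejected write only shows up when stdio flushes */
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical driver: join the container's group, wait until this process is
 * the last entry, collect, move back to the root group, remove the group. */
int watch_container(const char *cgroup_dir, const char *root_tasks,
                    unsigned poll_seconds, void (*collect)(const char *))
{
    char tasks[4096];
    int n = snprintf(tasks, sizeof tasks, "%s/tasks", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof tasks)
        return -1;

    if (join_cgroup(tasks, getpid()) != 0)
        return -1;

    /* Re-read the tasks file until only this process remains in the group. */
    for (;;) {
        int remaining = count_tasks(tasks);
        if (remaining < 0)
            return -1;
        if (remaining <= 1)
            break;
        sleep(poll_seconds);
    }

    collect(cgroup_dir);   /* the container has exited; read its final counters */

    if (join_cgroup(root_tasks, getpid()) != 0)
        return -1;
    return rmdir(cgroup_dir);   /* allowed even though pseudo-files remain */
}
```

A caller might invoke it as `watch_container(dir, root, 1, my_collect)`, where `dir` is the container's cgroup directory, `root` is the `tasks` file of the root group in the same hierarchy, and `my_collect` is whatever function reads the counters you care about; all three are placeholders to adapt to your own layout.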

This section details advanced techniques for metric collection. It describes how to use the `setns()` system call in C (or other languages with low-level system call access) to enter a container's namespace from a single process for efficient metric gathering, emphasizing the importance of not keeping the namespace file descriptor open to avoid resource leaks. Additionally, it outlines a method for collecting metrics when a container exits: start a collection process, move it to the container's control groups, and detect when it is the last process in the group so it can collect final statistics before cleaning up the control group.
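
To illustrate the single-process `setns()` approach, here is a minimal C sketch. The helper names `netns_path` and `enter_netns` are hypothetical, and the sketch assumes Linux with glibc and a process privileged enough to call `setns()` (typically `CAP_SYS_ADMIN`). The key point it shows is that the namespace file descriptor is re-opened from the tracked PID on every call and closed immediately afterwards, so the collector never pins a container's namespace.

```c
#define _GNU_SOURCE   /* exposes setns() and CLONE_NEWNET in <sched.h> */
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "/proc/<pid>/ns/net" into buf.
 * Returns the length of the path, or -1 if it does not fit. */
int netns_path(char *buf, size_t len, pid_t pid)
{
    int n = snprintf(buf, len, "/proc/%ld/ns/net", (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Enter the network namespace of `pid` (the PID tracked for a container).
 * The descriptor is opened fresh on each call and closed right away, so the
 * namespace can still be destroyed once the container's last process exits.
 * Returns 0 on success, -1 on failure with errno set by open() or setns(). */
int enter_netns(pid_t pid)
{
    char path[64];
    if (netns_path(path, sizeof path, pid) < 0)
        return -1;

    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    int rc = setns(fd, CLONE_NEWNET);
    int saved_errno = errno;   /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return rc;
}
```

After a successful `enter_netns(pid)`, a single-threaded collector should be able to read that container's interface counters (for example from `/proc/net/dev`) and then call `enter_netns()` again with the next container's PID, without forking a process per sample.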