Enhanced Statistics with BPF, Application-Applied LSM, and Landlock

Generally, such statistics are collected by periodically checking the relevant file in Linux's `/sys` directory or using Netlink. By using BPF programs attached to cgroups for this, we can get much more detailed statistics: for example, how many packets/bytes on tcp port 443, or how many packets/bytes from IP 10.2.3.4. In general, because BPF programs have a kernel context, they can safely and efficiently deliver more detailed information to user space. To explore the idea, the Kinvolk team implemented a proof-of-concept: [https://github.com/kinvolk/cgnet](https://github.com/kinvolk/cgnet). This project attaches a BPF program to each cgroup and exports the information to [Prometheus](https://prometheus.io/). There are of course other interesting possibilities, like doing actual packet filtering. But the obstacle currently standing in the way of this is having cgroup v2 support—required by cgroup-bpf—in [Docker](https://github.com/opencontainers/runc/issues/654) and Kubernetes. ## Application-applied LSM [Linux Security Modules](https://en.wikipedia.org/wiki/Linux_Security_Modules) (LSM) implements a generic framework for security policies in the Linux kernel. [SELinux](https://wiki.centos.org/HowTos/SELinux) and [AppArmor](https://wiki.ubuntu.com/AppArmor) are examples of these. Both of these implement rules at a system-global scope, placing the onus on the administrator to configure the security policies. [Landlock](https://landlock.io/) is another LSM under development that would co-exist with SELinux and AppArmor. An initial patchset has been submitted to the Linux kernel and is in an early stage of development. The main difference with other LSMs is that Landlock is designed to allow unprivileged applications to build their own sandbox, effectively restricting themselves instead of using a global configuration. With Landlock, an application can load a BPF program and have it executed when the process performs a specific action. For example, when the application opens a file with the open() system call, the kernel will execute the BPF program, and, depending on what the BPF program returns, the action will be accepted or denied. In some ways, it is similar to seccomp-bpf: using a BPF program, seccomp-bpf allows unprivileged processes to restrict what system calls they can perform. Landlock will be more powerful and provide more flexibility. Consider the following system call: ``` C fd = open(“myfile.txt”, O\_RDWR); ``` The first argument is a “char \*”, a pointer to a memory address, such as `0xab004718`. With seccomp, a BPF program only has access to the parameters of the syscall but cannot dereference the pointers, making it impossible to make security decisions based on a file. seccomp also uses classic BPF, meaning it cannot make use of eBPF maps, the mechanism for interfacing with user space. This restriction means security policies cannot be changed in seccomp-bpf based on a configuration in an eBPF map. BPF programs with Landlock don’t receive the arguments of the syscalls but a reference to a kernel object. In the example above, this means it will have a reference to the file, so it does not need to dereference a pointer, consider relative paths, or perform chroots.

The text discusses how BPF programs attached to cgroups can provide detailed network statistics in Kubernetes, highlighting the Kinvolk cgnet project. It then introduces Linux Security Modules (LSM) and Landlock, an LSM under development that allows unprivileged applications to create their own sandboxes. Landlock is compared to seccomp-bpf, noting Landlock's ability to reference kernel objects and make more informed security decisions based on file access, overcoming seccomp-bpf's limitations in dereferencing pointers and using eBPF maps.