Kube Reserved, System Reserved, and Explicitly Reserved CPU List

`kubeReserved` is meant to capture resource reservation for Kubernetes system daemons like the `kubelet`, `container runtime`, etc. It is not meant to reserve resources for system daemons that are run as pods. `kubeReserved` is typically a function of `pod density` on the nodes.

In addition to `cpu`, `memory`, and `ephemeral-storage`, `pid` may be specified to reserve the specified number of process IDs for Kubernetes system daemons.

To optionally enforce `kubeReserved` on Kubernetes system daemons, specify the parent control group for kube daemons as the value for the `kubeReservedCgroup` setting, and [add `kube-reserved` to `enforceNodeAllocatable`](#enforcing-node-allocatable).

It is recommended that the Kubernetes system daemons are placed under a top level control group (`runtime.slice` on systemd machines, for example). Each system daemon should ideally run within its own child control group. Refer to [the design proposal](https://git.k8s.io/design-proposals-archive/node/node-allocatable.md#recommended-cgroups-setup) for more details on the recommended control group hierarchy.

Note that the kubelet **does not** create `kubeReservedCgroup` if it doesn't exist. The kubelet will fail to start if an invalid cgroup is specified. With the `systemd` cgroup driver, you should follow a specific pattern for the name of the cgroup you define: the name should be the value you set for `kubeReservedCgroup`, with `.slice` appended.

### System Reserved

- **KubeletConfiguration Setting**: `systemReserved: {}`. Example value `{cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: "1000"}`
- **KubeletConfiguration Setting**: `systemReservedCgroup: ""`

`systemReserved` is meant to capture resource reservation for OS system daemons like `sshd`, `udev`, etc. `systemReserved` should reserve `memory` for the `kernel` too, since `kernel` memory is not accounted to pods in Kubernetes at this time. Reserving resources for user login sessions is also recommended (`user.slice` in the systemd world).
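The example `systemReserved` value above mixes several Kubernetes quantity suffixes: `m` (milli) for CPU, and the binary suffixes `Mi` and `Gi` for bytes. The sketch below, assuming only a simplified subset of the quantity grammar (no exponent notation, only a handful of suffixes), converts those strings into plain numbers so the size of each reservation is easy to see. `parse_quantity` is a hypothetical helper for illustration, not part of any Kubernetes client library.

```python
from decimal import Decimal

# Simplified subset of Kubernetes quantity suffixes (assumption: no exponents).
# Binary suffixes are powers of two; decimal suffixes are powers of ten.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(text: str) -> Decimal:
    """Convert a simplified quantity string to a number (cores for CPU, bytes for sizes)."""
    text = text.strip()
    # Try the two-letter binary suffixes first so "100Mi" is never read as "100M".
    for suffix, factor in BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    # No suffix: a plain count, such as a number of process IDs.
    return Decimal(text)


# The example systemReserved value, written as a Python dict of strings.
system_reserved = {"cpu": "100m", "memory": "100Mi", "ephemeral-storage": "1Gi", "pid": "1000"}
parsed = {name: parse_quantity(value) for name, value in system_reserved.items()}

for name, value in parsed.items():
    print(f"{name}: {value}")
```

Using `Decimal` rather than `float` keeps milli-CPU values such as `100m` exact.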
In addition to `cpu`, `memory`, and `ephemeral-storage`, `pid` may be specified to reserve the specified number of process IDs for OS system daemons.

To optionally enforce `systemReserved` on system daemons, specify the parent control group for OS system daemons as the value for the `systemReservedCgroup` setting, and [add `system-reserved` to `enforceNodeAllocatable`](#enforcing-node-allocatable).

It is recommended that the OS system daemons are placed under a top level control group (`system.slice` on systemd machines, for example).

Note that the kubelet **does not** create `systemReservedCgroup` if it doesn't exist. The kubelet will fail to start if an invalid cgroup is specified. With the `systemd` cgroup driver, you should follow a specific pattern for the name of the cgroup you define: the name should be the value you set for `systemReservedCgroup`, with `.slice` appended.

### Explicitly Reserved CPU List

{{< feature-state for_k8s_version="v1.17" state="stable" >}}

**KubeletConfiguration Setting**: `reservedSystemCPUs:`. Example value `0-3`

`reservedSystemCPUs` is meant to define an explicit CPU set for OS system daemons and Kubernetes system daemons. `reservedSystemCPUs` is for systems that do not intend to define separate top level cgroups for OS system daemons and Kubernetes system daemons with regard to the cpuset resource. If the kubelet **does not** have `kubeReservedCgroup` and `systemReservedCgroup` set, the explicit cpuset provided by `reservedSystemCPUs` takes precedence over the CPUs defined by the `kubeReserved` and `systemReserved` options.

This option is specifically designed for Telco/NFV use cases where uncontrolled interrupts/timers may impact workload performance. You can use this option to define the explicit cpuset for the system/Kubernetes daemons as well as the interrupts/timers, so that the remaining CPUs on the system can be used exclusively for workloads, with less impact from uncontrolled interrupts/timers.
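`reservedSystemCPUs` takes a CPU list in the Linux cpuset list notation, such as the example value `0-3` or a mix like `0,2,4-5`. The sketch below parses such a list and splits a node's CPUs into the reserved set and the remaining set that, per this section, is left for workloads. It assumes a node whose online CPUs are numbered contiguously from `0` (real hardware may have offline or non-contiguous IDs), and `parse_cpu_list` and `split_cpus` are hypothetical helpers; the kubelet performs its own parsing in Go.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a CPU list such as "0-3" or "0,2,4-5" into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. "4-5" means CPUs 4 and 5.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid CPU range: {part!r}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def split_cpus(reserved_spec: str, online_cpus: int) -> tuple[set[int], set[int]]:
    """Return (reserved, remaining) CPU IDs, assuming CPUs 0..online_cpus-1 exist."""
    all_cpus = set(range(online_cpus))
    reserved = parse_cpu_list(reserved_spec)
    missing = reserved - all_cpus
    if missing:
        raise ValueError(f"reserved CPUs not present on node: {sorted(missing)}")
    return reserved, all_cpus - reserved


# Hypothetical 8-CPU node using the example value from this section.
reserved, remaining = split_cpus("0-3", 8)
print("reserved for daemons and interrupts:", sorted(reserved))
print("remaining for workloads:", sorted(remaining))
```

Rejecting IDs that are not on the node catches typos early; the kubelet is described above as refusing to start on invalid settings, so a check like this is best run before a restart.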
To move the system daemon, kubernetes daemons and interrupts/timers to the explicit cpuset
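The cgroup rules in this section (enforcing `kube-reserved` or `system-reserved` needs the matching parent cgroup, the kubelet does not create those cgroups, `systemd` names end in `.slice`, and `reservedSystemCPUs` targets nodes without separate top level cgroups) can be collected into a pre-flight check. The following is a minimal sketch, not the kubelet's own validation: `ReservationConfig`, `check_config`, and `systemd_slice_name` are hypothetical names, the snake_case fields stand in for the `kubeReservedCgroup`, `systemReservedCgroup`, `reservedSystemCPUs`, and `enforceNodeAllocatable` settings, and it assumes the `systemd` cgroup driver plus a set of existing slice names gathered separately (for example with `systemctl list-units --type=slice`).

```python
from dataclasses import dataclass, field


@dataclass
class ReservationConfig:
    # Hypothetical stand-ins for the KubeletConfiguration settings in this section.
    kube_reserved_cgroup: str = ""
    system_reserved_cgroup: str = ""
    reserved_system_cpus: str = ""
    enforce_node_allocatable: list[str] = field(default_factory=list)


def systemd_slice_name(cgroup_setting: str) -> str:
    """With the systemd driver, the cgroup you create is the setting value plus ".slice"."""
    return f"{cgroup_setting}.slice"


def check_config(cfg: ReservationConfig, existing_slices: set[str]) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: list[str] = []
    for setting, cgroup, enforce_key in (
        ("kubeReservedCgroup", cfg.kube_reserved_cgroup, "kube-reserved"),
        ("systemReservedCgroup", cfg.system_reserved_cgroup, "system-reserved"),
    ):
        # Enforcement needs a parent cgroup to apply the reservation to.
        if enforce_key in cfg.enforce_node_allocatable and not cgroup:
            problems.append(f"{enforce_key} is enforced but {setting} is empty")
        # The kubelet does not create the cgroup, so it must already exist.
        if cgroup and systemd_slice_name(cgroup) not in existing_slices:
            problems.append(f"{setting}: {systemd_slice_name(cgroup)} does not exist")
    # Assumption: treat mixing reservedSystemCPUs with these cgroups as a problem.
    if cfg.reserved_system_cpus and (cfg.kube_reserved_cgroup or cfg.system_reserved_cgroup):
        problems.append(
            "reservedSystemCPUs is intended for nodes without "
            "kubeReservedCgroup/systemReservedCgroup"
        )
    return problems


cfg = ReservationConfig(
    kube_reserved_cgroup="kube-reserved",
    enforce_node_allocatable=["pods", "kube-reserved", "system-reserved"],
)
for problem in check_config(cfg, existing_slices={"kube-reserved.slice"}):
    print(problem)
```

Returning a list instead of raising on the first error lets an operator see every misconfiguration in one pass; confirm the exact kubelet behaviour for each rule against the kubelet configuration reference.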

This section details how to reserve resources for Kubernetes system daemons (using `kubeReserved`), OS system daemons (using `systemReserved`), and how to define an explicit CPU set for these daemons using `reservedSystemCPUs`. It explains the configuration settings for each, including specifying control groups (`kubeReservedCgroup`, `systemReservedCgroup`) and the importance of following specific naming patterns with the `systemd` cgroup driver. The `reservedSystemCPUs` option is highlighted for Telco/NFV use cases to minimize interrupt impact on workloads.