availability on the node drops below the reserved value. Hypothetically, if
system daemons did not exist on a node, pods could not use more than `capacity -
eviction-hard`. For this reason, resources reserved for evictions are not
available for pods.
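To make this concrete, here is a minimal sketch of a hard eviction threshold in
a KubeletConfiguration; the `500Mi` value is only an illustration, not a
recommendation:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# With this threshold, 500Mi of memory is withheld from 'Allocatable':
# pods can never be scheduled into it, and the kubelet begins evicting
# pods once available memory on the node drops below 500Mi.
evictionHard:
  memory.available: "500Mi"
```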
### Enforcing Node Allocatable
**KubeletConfiguration setting**: `enforceNodeAllocatable: [pods]`. Example value: `[pods,system-reserved,kube-reserved]`
The scheduler treats 'Allocatable' as the available `capacity` for pods.
The `kubelet` enforces 'Allocatable' across pods by default. Enforcement is performed
by evicting pods whenever the overall usage across all pods exceeds
'Allocatable'. More details on the eviction policy can be found
on the [node pressure eviction](/docs/concepts/scheduling-eviction/node-pressure-eviction/)
page. This enforcement is controlled by including the
`pods` value in the KubeletConfiguration setting `enforceNodeAllocatable`.
Optionally, `kubelet` can be made to enforce `kubeReserved` and
`systemReserved` by adding the `kube-reserved` and `system-reserved` values to
the same setting. Note that to enforce `kubeReserved` or `systemReserved`,
`kubeReservedCgroup` or `systemReservedCgroup` must be specified,
respectively.
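Putting these settings together, a KubeletConfiguration that enforces
'Allocatable' on pods as well as on the reserved cgroups might look like the
sketch below. The reservation values and cgroup paths are illustrative and
depend on your nodes; note that the kubelet does not create these cgroups,
so they must exist before it starts.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enforceNodeAllocatable:
  - pods
  - kube-reserved
  - system-reserved
kubeReserved:
  cpu: 1000m
  memory: 2Gi
# Required when enforcing kube-reserved; illustrative path.
kubeReservedCgroup: /kube.slice
systemReserved:
  cpu: 500m
  memory: 1Gi
# Required when enforcing system-reserved; illustrative path.
systemReservedCgroup: /system.slice
```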
## General Guidelines
System daemons are expected to be treated similarly to
[Guaranteed pods](/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed).
System daemons can burst within their bounding control groups and this behavior needs
to be managed as part of Kubernetes deployments. For example, `kubelet` should
have its own control group and share `kubeReserved` resources with the
container runtime. However, `kubelet` cannot burst and use up all available node
resources if `kubeReserved` is enforced.
Be extra careful when enforcing the `systemReserved` reservation, since it can lead
to critical system services being CPU starved, OOM killed, or unable
to fork on the node. The
recommendation is to enforce `systemReserved` only if you have profiled your
nodes exhaustively to come up with precise estimates and are confident in your
ability to recover if any process in that group is OOM killed.
* To begin with, enforce 'Allocatable' on `pods` (a minimal configuration is
  sketched after this list).
* Once adequate monitoring and alerting are in place to track kube system
  daemons, attempt to enforce `kubeReserved` based on usage heuristics.
* If absolutely necessary, enforce `systemReserved` over time.
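For the first step, a minimal KubeletConfiguration that carves out reservations
but enforces 'Allocatable' only across pods could look like this sketch; the
reservation values are placeholders, not recommendations:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Pod-level enforcement only: kubeReserved and systemReserved still
# reduce 'Allocatable', but are not enforced on their own cgroups.
enforceNodeAllocatable:
  - pods
kubeReserved:
  cpu: 500m
  memory: 1Gi
systemReserved:
  cpu: 500m
  memory: 512Mi
```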
The resource requirements of kube system daemons may grow over time as more and
more features are added. Over time, the Kubernetes project will attempt to bring
down utilization of node system daemons, but that is not a priority for now.
So expect a drop in `Allocatable` capacity in future releases.
<!-- discussion -->
## Example Scenario
Here is an example to illustrate Node Allocatable computation:
* Node has `32Gi` of `memory`, `16 CPUs` and `100Gi` of `Storage`
* `kubeReserved` is set to `{cpu: 1000m, memory: 2Gi, ephemeral-storage: 1Gi}`
* `systemReserved` is set to `{cpu: 500m, memory: 1Gi, ephemeral-storage: 1Gi}`
* `evictionHard` is set to `{memory.available: "<500Mi", nodefs.available: "<10%"}`
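Expressed as a KubeletConfiguration fragment, the scenario above corresponds to:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: 1000m
  memory: 2Gi
  ephemeral-storage: 1Gi
systemReserved:
  cpu: 500m
  memory: 1Gi
  ephemeral-storage: 1Gi
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
```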
Under this scenario, 'Allocatable' will be `14.5` CPUs (16 - 1 - 0.5), `28.5Gi` of
memory (32Gi - 2Gi - 1Gi - 500Mi) and `88Gi` of local storage
(100Gi - 1Gi - 1Gi - 10% of 100Gi).
The scheduler ensures that the total memory `requests` across all pods on this node does
not exceed `28.5Gi` and storage doesn't exceed `88Gi`.
The kubelet evicts pods whenever the overall memory usage across pods exceeds `28.5Gi`,
or if overall disk usage exceeds `88Gi`. If all processes on the node consume as
much CPU as they can, pods together cannot consume more than `14.5` CPUs.
If `kubeReserved` and/or `systemReserved` are not enforced and system daemons
exceed their reservation, `kubelet` evicts pods whenever the overall node memory
usage is higher than `31.5Gi` (32Gi - 500Mi) or `storage` use is greater than `90Gi`
(100Gi - 10% of 100Gi).