For pods in a QoS class other than `Guaranteed`, the Memory Manager provides default topology hints
to the Topology Manager.

The following excerpts from pod manifests assign a pod to the `Guaranteed` QoS class.

A pod with integer CPU(s) runs in the `Guaranteed` QoS class when `requests` are equal to `limits`:

```yaml
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
      requests:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
```

Also, a pod sharing CPU(s) runs in the `Guaranteed` QoS class, when `requests` are equal to `limits`.

```yaml
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "300m"
        example.com/device: "1"
      requests:
        memory: "200Mi"
        cpu: "300m"
        example.com/device: "1"
```

Notice that both CPU and memory requests must be specified, and must equal their limits, for a pod to be placed in the `Guaranteed` QoS class.
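The assigned QoS class can be verified after the pod is created. The `kubectl` one-liner below uses standard JSONPath output; the `jq` variant is a self-contained sketch that reads the same field from a saved status excerpt (the pod name, file path, and sample JSON are illustrative, and `jq` is assumed to be installed):

```shell
# On a live cluster (pod name is illustrative):
#   kubectl get pod nginx -o jsonpath='{.status.qosClass}'
# Self-contained variant: read the same field from a saved status excerpt.
cat > /tmp/pod-status.sample.json <<'EOF'
{"status":{"qosClass":"Guaranteed"}}
EOF
jq -r '.status.qosClass' /tmp/pod-status.sample.json
```

If the value printed is anything other than `Guaranteed`, re-check that every container specifies both CPU and memory, with `requests` equal to `limits`.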

## Troubleshooting

The following means can be used to troubleshoot why a pod could not be deployed or
was rejected on a node:

- pod status - indicates topology affinity errors
- system logs - include valuable information for debugging, e.g., about generated hints
- state file - the dump of internal state of the Memory Manager
  (includes [Node Map and Memory Maps][2])
- starting from v1.22, the [device plugin resource API](#device-plugin-resource-api) can be used
  to retrieve information about the memory reserved for containers

### Pod status (TopologyAffinityError) {#TopologyAffinityError}

This error typically occurs in the following situations:

* a node does not have enough resources available to satisfy the pod's request
* the pod's request is rejected due to particular Topology Manager policy constraints

The error appears in the status of a pod:

```shell
kubectl get pods
```

```none
NAME         READY   STATUS                  RESTARTS   AGE
guaranteed   0/1     TopologyAffinityError   0          113s
```

Use `kubectl describe pod <id>` or `kubectl get events` to obtain a detailed error message:

```none
Warning  TopologyAffinityError  10m   kubelet, dell8  Resources cannot be allocated with Topology locality
```

### System logs

Search the system logs for entries related to the particular pod.

The set of hints that the Memory Manager generated for the pod can be found in the logs,
along with the set of hints generated by the CPU Manager.

Topology Manager merges these hints to calculate a single best hint.
The best hint should be also present in the logs.

The best hint indicates where to allocate all the resources.
Topology Manager tests this hint against its current policy, and based on the verdict,
it either admits the pod to the node or rejects it.

Also, search the logs for occurrences associated with the Memory Manager,
e.g. to find out information about `cgroups` and `cpuset.mems` updates.
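As a sketch, assuming the kubelet runs under systemd, its logs can be dumped with `journalctl` and then filtered for hint-related entries. The sample log lines below are illustrative stand-ins, not verbatim kubelet output:

```shell
# On a node with systemd (assumption):
#   journalctl -u kubelet -o cat > /tmp/kubelet.log
# Illustrative stand-in for a kubelet log excerpt:
cat > /tmp/kubelet.log.sample <<'EOF'
memorymanager.go: TopologyHints for pod "guaranteed", container "guaranteed"
cpumanager.go: TopologyHints generated
topologymanager.go: best TopologyHint selected
EOF
# Filter for entries about generated and merged hints:
grep -i 'topologyhint' /tmp/kubelet.log.sample
```

The same filtering approach works for `cpuset.mems` and `cgroups` updates; adjust the `grep` pattern accordingly.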

### Examine the memory manager state on a node

Let us first deploy a sample `Guaranteed` pod whose specification is as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed
spec:
  containers:
  - name: guaranteed
    image: consumer
    imagePullPolicy: Never
    resources:
      limits:
        cpu: "2"
        memory: 150Gi
      requests:
        cpu: "2"
        memory: 150Gi
    command: ["sleep","infinity"]
```

Next, let us log into the node where it was deployed and examine the state file in
`/var/lib/kubelet/memory_manager_state`:

```json
{
   "policyName":"Static",
   "machineState":{
      "0":{
         "numberOfAssignments":1,
