5th chunk of `content/en/blog/_posts/2016-11-00-Visualize-Kubelet-Performance-With-Node-Dashboard.md`
 |
| Figure 4. Pod startup latency when creating 105 pods. |
Looking specifically at build #162, we can see the tracing data plotted in the pod creation latency chart (Figure 5). Each curve is an accumulated histogram of the number of pod operations that have already arrived at a certain tracing probe. The timestamp of each tracing probe is collected either from the performance tests or by parsing the Kubelet log. Currently we collect the following tracing data (a sketch of how such log-derived timestamps can be aggregated follows the list):
- "create" (in test): the test creates pods through API client;
- "running" (in test): the test watches that pods are running from API server;
- "pod\_config\_change": pod config change detected by Kubelet SyncLoop;
- "runtime\_manager": runtime manager starts to create containers;
- "infra\_container\_start": the infra container of a pod starts;
- "container\_start': the container of a pod starts;
- "pod\_running": a pod is running;
- "pod\_status\_running": status manager updates status for a running pod;
The time series chart illustrates that it takes a long time for the status manager to update pod status (the data for "running" is not shown since it overlaps with "pod\_status\_running"). We found that this latency is introduced by the queries-per-second (QPS) limit the Kubelet applies to its requests to the API server (the default is 5). Knowing this, we ran additional tests with a higher QPS limit and observed that the "running" curve gradually converges with "pod\_running", resulting in much lower latency. The previous e2e pod startup results therefore reflect the combined latency of the Kubelet and of uploading status, so the performance of the Kubelet itself is underestimated.
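
To get a rough sense of why a QPS limit of 5 matters at this scale, the back-of-envelope Go sketch below (our own illustration, not part of the dashboard) models status uploads for 105 pods as a rate-limited queue: at 5 requests per second the last status update lags by roughly 20 seconds, all of which shows up as extra "startup" latency when observed from the API server.

```go
// qps_backlog.go: back-of-envelope model of how the Kubelet's API QPS limit
// delays status updates for a batch of pods (illustration only).
package main

import "fmt"

func main() {
	const (
		pods = 105 // pods created in the benchmark
		qps  = 5.0 // default Kubelet-to-API-server QPS limit
	)
	// If all pods become running at roughly the same time, the i-th status
	// update cannot be sent before i/qps seconds have passed.
	lastUpdateDelay := float64(pods-1) / qps
	fmt.Printf("with QPS=%.0f, the last of %d status updates lags by ~%.0f s\n",
		qps, pods, lastUpdateDelay)

	// Raising the QPS limit shrinks the reporting backlog proportionally.
	for _, q := range []float64{5, 10, 20, 50} {
		fmt.Printf("QPS=%2.0f -> backlog ~%5.1f s\n", q, float64(pods-1)/q)
	}
}
```

To our knowledge, the relevant knob is the Kubelet's `--kube-api-qps` flag (with `--kube-api-burst` controlling the burst size); check the flags and defaults for your Kubernetes version before tuning.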
| 