Kubernetes 1.6 Scalability Updates: 5,000 Nodes and 150,000 Pods

---
title: " Scalability updates in Kubernetes 1.6: 5,000 node and 150,000 pod clusters "
date: 2017-03-30
slug: scalability-updates-in-kubernetes-1.6
url: /blog/2017/03/Scalability-Updates-In-Kubernetes-1-6
author: >
  Wojciech Tyczynski (Google)
---

_Editor’s note: this post is part of a [series of in-depth articles](https://kubernetes.io/blog/2017/03/five-days-of-kubernetes-1-6) on what's new in Kubernetes 1.6_

Last summer we [shared](https://kubernetes.io/blog/2016/07/update-on-kubernetes-for-windows-server-containers/) updates on Kubernetes scalability. Since then we’ve been working hard, and we are proud to announce that [Kubernetes 1.6](https://kubernetes.io/blog/2017/03/kubernetes-1-6-multi-user-multi-workloads-at-scale) can handle 5,000-node clusters with up to 150,000 pods. Moreover, those clusters have even better end-to-end pod startup time than the previous 2,000-node clusters in the 1.3 release, and the latency of API calls is within the one-second SLO.

In this blog post we review the metrics we monitor in our tests and describe our performance results from Kubernetes 1.6. We also discuss the changes we made to achieve the improvements, and our plans for upcoming releases in the area of system scalability.

**X-node clusters - what does it mean?**

Now that Kubernetes 1.6 is released, it is a good time to review what it means when we say we “support” X-node clusters. As described in detail in a [previous blog post](https://kubernetes.io/blog/2016/03/1000-nodes-and-beyond-updates-to-Kubernetes-performance-and-scalability-in-12), we currently have two performance-related [Service Level Objectives (SLOs)](https://en.wikipedia.org/wiki/Service_level_objective):

- **API-responsiveness** : 99% of all API calls return in less than 1s
- **Pod startup time** : 99% of pods and their containers (with pre-pulled images) start within 5s.
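
The two SLOs above amount to a 99th-percentile check against a latency threshold. The following is a minimal sketch of such a check, not the test harness used by the Kubernetes project: the function names, the nearest-rank percentile method, and the sample latencies are all assumptions made for illustration.

```python
# SLO thresholds quoted in the post, in seconds.
API_CALL_SLO_SECONDS = 1.0
POD_STARTUP_SLO_SECONDS = 5.0


def percentile(samples, pct):
    """Return the pct-th percentile (integer pct, 1..100) by nearest rank."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be an integer between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 gives the 1-based nearest rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_slo(samples, threshold_seconds, pct=99, inclusive=False):
    """Check whether the pct-th percentile latency satisfies the threshold.

    inclusive=False models "less than" (API calls);
    inclusive=True models "within" (pod startup).
    """
    observed = percentile(samples, pct)
    if inclusive:
        return observed <= threshold_seconds
    return observed < threshold_seconds


# Hypothetical latencies in seconds, not measurements from the post.
api_latencies = [0.2] * 99 + [3.0]
pod_startup_latencies = [2.5] * 98 + [6.0] * 2

api_ok = meets_slo(api_latencies, API_CALL_SLO_SECONDS)
pods_ok = meets_slo(pod_startup_latencies, POD_STARTUP_SLO_SECONDS, inclusive=True)
```

With these made-up samples, a single slow API call out of 100 still leaves the 99th percentile under one second, while two slow pod startups out of 100 push the 99th percentile past five seconds. A real measurement would also need to define the sampling window and which calls count, which this sketch leaves out.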

As before, it is possible to run larger deployments than the stated supported 5,000-node cluster (and users have), but performance may be degraded and may not meet the strict SLOs defined above.

We are aware of the limited scope of these SLOs. There are many aspects of the system that they do not exercise. For example, we do not measure how soon a new pod that is part of a service becomes reachable through the service IP address after the pod is started. If you are considering using large Kubernetes clusters and have performance requirements not covered by our SLOs, please contact the Kubernetes [Scalability SIG](https://github.com/kubernetes/community/blob/master/sig-scalability/README.md) so we can help you understand whether Kubernetes is ready to handle your workload now.

The top scalability-related priority for upcoming Kubernetes releases is to enhance our definition of what it means to support X-node clusters by:

- refining the existing SLOs
- adding more SLOs (that will cover various areas of Kubernetes, including networking)

**Kubernetes 1.6 performance metrics at scale**

So how does performance in large clusters look in Kubernetes 1.6? The following graph shows the end-to-end pod startup latency with 2,000- and 5,000-node clusters. For comparison, we also show the same metric from Kubernetes 1.3, which we published in our previous scalability blog post that described support for 2,000-node clusters. As you can see, Kubernetes 1.6 has better pod startup latency with both 2,000 and 5,000 nodes than Kubernetes 1.3 had with 2,000 nodes [1].

![](https://lh6.googleusercontent.com/LdjAOmsLGdxLNTo222uif1V0Eupoyaq6dY-leg1FBGkyQxUNt5ROjrFh_XzW27P7nP865FYUVwTOaUpDEnirdHSBKvh9xl8PsBNEFlVWpJUbnj0FEdLX4MywqbjwK9oc8avLRNAX "Chart")

Kubernetes 1.6 introduces enhanced scalability, supporting 5,000-node clusters with up to 150,000 pods. The update also improves pod startup time and API call latency compared to previous releases. The post defines the Service Level Objectives (SLOs) for performance and discusses ongoing efforts to refine and expand these SLOs in future releases. It also presents pod startup latency for 2,000- and 5,000-node clusters, showing improvements over Kubernetes 1.3.