![](https://lh6.googleusercontent.com/RFGwgw9hvRshHH11vrUxGwl-X8vXdCvyd8ETdWS9Ud5_OFpG4WctzZbCy2ad4Ao_neYaMMDz46Z2JCQUzRI1jdk6OABTFIOyvZysZpDCAfr7Ztj-EM7v25sfHxf6dOe59fncDnra "Chart")

**How did we get here?**  

Over the past nine months (since the last scalability blog post), there have been a huge number of performance and scalability related changes in Kubernetes. In this post we will focus on the two biggest ones and will briefly enumerate a few others.  

**etcd v3**  
In Kubernetes 1.6 we switched the default storage backend (the key-value store where the whole cluster state is stored) from etcd v2 to [etcd v3](https://coreos.com/etcd/docs/3.0.17/index.html). The initial work toward this transition started during the 1.3 release cycle. You might wonder why it took us so long, given that:  

- the first stable version of etcd supporting the v3 API [was announced](https://coreos.com/blog/etcd3-a-new-etcd.html) on June 30, 2016
- the new API was designed together with the Kubernetes team to support our needs (from both a feature and scalability perspective)
- the integration of etcd v3 with Kubernetes had already mostly been finished when etcd v3 was announced (indeed CoreOS used Kubernetes as a proof-of-concept for the new etcd v3 API)

As it turns out, there were a lot of reasons. We will describe the most important ones below.  

- Changing storage in a backward incompatible way, as is the case for the etcd v2 to v3 migration, is a big change, and thus one for which we needed a strong justification. We found this justification in September when we determined that we would not be able to scale to 5000-node clusters if we continued to use etcd v2 ([kubernetes/32361](https://github.com/kubernetes/kubernetes/issues/32361) contains some discussion about it). In particular, what didn’t scale was the watch implementation in etcd v2. In a 5000-node cluster, we need to be able to send at least 500 watch events per second to a single watcher, which wasn’t possible in etcd v2.  
- Once we had the strong incentive to actually update to etcd v3, we started thoroughly testing it. As you might expect, we found some issues. There were some minor bugs in Kubernetes, and in addition we requested a performance improvement in etcd v3’s watch implementation (watch was the main bottleneck in etcd v2 for us). This led to the 3.0.10 etcd patch release.  
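The 500-events-per-second figure above can be sanity-checked with simple arithmetic. A minimal sketch, assuming each node reports its status every 10 seconds (the kubelet's default node-status update period at the time; the post itself only states the resulting rate):

```python
# Back-of-the-envelope estimate of the watch event rate a single
# watcher must sustain in a large cluster, driven by node status
# updates alone (other object churn would only add to this).
nodes = 5000
status_update_period_s = 10  # assumed kubelet node-status update period

events_per_second = nodes / status_update_period_s
print(events_per_second)  # 500.0
```

Node status updates are only one source of watch traffic, so 500 events per second is a lower bound on what the storage backend's watch implementation has to deliver.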
