API Latency Analysis: LIST Operation Performance and Improvements

6th chunk of `content/en/blog/_posts/2015-09-00-Kubernetes-Performance-Measurements-And.md`

![list.png](https://lh6.googleusercontent.com/6Gy-UKBZUoEwJ9iFytq-k_wrdvh6FsTJexSpn6nNnBwOvxv-Sp6PV7vmArCL22MUkz0tWH7MxhaIc-JE8YpEc0X4nDUMn-cKWF3ANHtgd2aJ5t3osoaezDe_xqjpi748Cbw=s1600)


Some resources appear only on certain graphs, depending on what was running during that operation (e.g. no PUT requests were made on namespaces at that time).


As the results show, we are ahead of target for our 100-node cluster: even in a fully packed cluster, 99th-percentile pod startup time is 14% faster than the 5-second goal. It’s also worth noting that LISTing pods is significantly slower than any other operation. This makes sense: a full cluster holds 3000 pods, and each pod is roughly a few kilobytes of data, so every LIST has to process megabytes of data.
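The arithmetic behind both claims can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the 3 KB per-pod size is an assumed stand-in for "a few kilobytes", the function names are hypothetical, and reading "14% faster than 5 seconds" as 5 s × (1 − 0.14) is one plausible interpretation rather than a figure taken from the measurements.

```python
# Back-of-the-envelope numbers for the paragraph above.
PODS_IN_FULL_CLUSTER = 3000   # from the text: a fully packed 100-node cluster
ASSUMED_POD_SIZE_KB = 3.0     # assumption: "a few kilobytes" taken as 3 KB
STARTUP_TARGET_S = 5.0        # from the text: 99th-percentile startup goal
FASTER_BY = 0.14              # from the text: 14% faster than the goal


def list_payload_mb(pod_count: int, pod_size_kb: float) -> float:
    """Approximate size, in MB, of the data behind one LIST pods call."""
    return pod_count * pod_size_kb / 1024


def implied_p99_seconds(target_s: float, faster_by: float) -> float:
    """Startup time implied by being `faster_by` (a fraction) under target."""
    return target_s * (1 - faster_by)


if __name__ == "__main__":
    payload = list_payload_mb(PODS_IN_FULL_CLUSTER, ASSUMED_POD_SIZE_KB)
    p99 = implied_p99_seconds(STARTUP_TARGET_S, FASTER_BY)
    print(f"LIST pods payload: about {payload:.1f} MB")
    print(f"Implied p99 pod startup: about {p99:.1f} s")
```

With any per-pod size in the low kilobytes, the payload lands in the megabyte range, which is consistent with LIST being the slowest operation on the graph.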


##### Work done and some future plans

The initial performance work to make 100-node clusters stable enough to run any tests on them involved many small fixes and tuning, including raising the file descriptor limit in the apiserver and reusing TCP connections across requests to etcd.
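To illustrate why connection reuse matters, here is a minimal Python sketch that counts how many TCP connections a server sees for the same number of requests, with and without reuse. It is not apiserver or etcd code: the local HTTP server, `CountingHandler`, and `count_connections` are hypothetical stand-ins built from the standard library. Each accepted connection also costs the server one file descriptor, which is how the two fixes mentioned above relate.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Stand-in server handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default
    connections_opened = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        CountingHandler.connections_opened += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def count_connections(requests: int, reuse: bool) -> int:
    """Send `requests` GETs and return how many connections the server saw."""
    CountingHandler.connections_opened = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port)
        for _ in range(requests):
            if not reuse:
                # Drop the old socket so every request opens a fresh one.
                conn.close()
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket is reusable
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections_opened


if __name__ == "__main__":
    print("connections with reuse:   ", count_connections(5, reuse=True))
    print("connections without reuse:", count_connections(5, reuse=False))
```

In this sketch the reused client needs a single connection for all requests, while the non-reusing client opens one per request, each paying a TCP handshake and holding a server-side file descriptor until it closes.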

Summary

The analysis of API latency on a 100-node cluster shows that pod startup time is 14% faster than the target of 5 seconds at the 99th percentile. However, listing pods is significantly slower due to the large amount of data processed. Initial performance improvements involved small fixes and tuning, such as increasing the file descriptor limit and reusing TCP connections.