---
title: " Simple leader election with Kubernetes and Docker "
date: 2016-01-11
slug: simple-leader-election-with-kubernetes
url: /blog/2016/01/Simple-Leader-Election-With-Kubernetes
---
#### Overview
Kubernetes simplifies the deployment and operational management of services running on clusters, but it also simplifies the development of those services. In this post we'll see how you can use Kubernetes to easily perform leader election in your distributed application. Distributed applications usually replicate the tasks of a service for reliability and scalability, but it is often necessary to designate one of the replicas as the leader, responsible for coordination among all of the replicas.
Typically in leader election, a set of candidates for becoming leader is identified. These candidates all race to declare themselves the leader, and one of them wins. Once the election is won, the leader continually "heartbeats" to renew its position as the leader, and the other candidates periodically make new attempts to become the leader. This ensures that a new leader is identified quickly if the current leader fails for some reason.
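To make that loop concrete, here is a minimal, runnable sketch of the acquire-and-renew cycle in Go. The in-memory `lease` type is a stand-in for a shared, strongly consistent store, and all of the names here are illustrative rather than a real API:
```
package main

import (
	"log"
	"sync"
	"time"
)

const (
	leaseTTL    = 2 * time.Second        // how long a claim stays valid
	retryPeriod = 500 * time.Millisecond // renew/retry interval, well under the TTL
)

// lease is a toy compare-and-swap lock held in memory; a real election
// would back this with a strongly consistent shared store.
type lease struct {
	mu      sync.Mutex
	holder  string
	expires time.Time
}

// tryAcquire claims or renews the lease for id. It returns true if id
// is now the holder; an unexpired lease held by someone else wins.
func (l *lease) tryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder != "" && l.holder != id && time.Now().Before(l.expires) {
		return false
	}
	l.holder = id
	l.expires = time.Now().Add(leaseTTL)
	return true
}

// runCandidate is the loop described above: race to acquire, heartbeat
// while leading, and retry periodically while following.
func runCandidate(id string, l *lease) {
	for {
		if l.tryAcquire(id) {
			log.Printf("%s became the leader", id)
			for l.tryAcquire(id) { // heartbeat: renew before expiry
				time.Sleep(retryPeriod)
			}
			log.Printf("%s lost the lease", id)
		}
		time.Sleep(retryPeriod)
	}
}

func main() {
	l := &lease{}
	for _, id := range []string{"candidate-1", "candidate-2", "candidate-3"} {
		go runCandidate(id, l)
	}
	select {} // run forever; stop with Ctrl-C
}
```
Run it and exactly one candidate reports leadership; the others keep retrying on a short timer, which is what lets a replacement take over quickly when the leader dies.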
Implementing leader election usually requires either deploying software such as ZooKeeper, etcd, or Consul and using it for consensus, or, alternatively, implementing a consensus algorithm on your own. We will see below that Kubernetes makes the process of using leader election in your application significantly easier.
#### Implementing leader election in Kubernetes
The first requirement in leader election is to specify the set of candidates for becoming the leader. Kubernetes already uses _Endpoints_ to represent a replicated set of pods that comprise a service, so we will re-use this same object. (Aside: you might have thought that we would use _ReplicationControllers_, but they are tied to a specific binary, and generally you want to have a single leader even if you are in the process of performing a rolling update.)
To perform leader election, we use two properties of all Kubernetes API objects:
* ResourceVersions - Every API object has a ResourceVersion that changes whenever the object is modified, and you can use these versions to perform compare-and-swap on Kubernetes objects
* Annotations - Every API object can be annotated with arbitrary key/value pairs to be used by clients.
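Taken together, these two properties give us a compare-and-swap: read the object, check and modify its leader annotation, and write it back, letting the API server reject the write if anyone else got there first. The sketch below shows one such acquire attempt using client-go. It is a simplification, not the actual code linked below: the annotation key is made up, and a real client would also record and check a lease expiry alongside the holder.
```
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// leaderKey is a hypothetical annotation key used for illustration.
const leaderKey = "example.io/leader"

// tryAcquire attempts to record id as the leader on the named
// Endpoints object, returning true only if the write succeeds.
func tryAcquire(ctx context.Context, c kubernetes.Interface, ns, name, id string) (bool, error) {
	ep, err := c.CoreV1().Endpoints(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	if ep.Annotations == nil {
		ep.Annotations = map[string]string{}
	}
	// A real client would treat a stale record as expired; here we
	// simply defer to any recorded holder other than ourselves.
	if holder := ep.Annotations[leaderKey]; holder != "" && holder != id {
		return false, nil
	}
	ep.Annotations[leaderKey] = id
	// ep still carries the ResourceVersion from the Get above, so the
	// API server rejects this update with a conflict if any other
	// candidate modified the object in between: compare-and-swap.
	if _, err := c.CoreV1().Endpoints(ns).Update(ctx, ep, metav1.UpdateOptions{}); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	cfg, err := rest.InClusterConfig() // assumes we run inside the cluster
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	won, err := tryAcquire(context.Background(), client, "default", "example", "my-pod")
	fmt.Println("leader:", won, "err:", err)
}
```
Current releases of client-go ship a production-grade version of this pattern in the `k8s.io/client-go/tools/leaderelection` package, which manages lease durations, renew deadlines, and retry periods for you.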
Given these primitives, the full code for leader election is relatively straightforward, and you can find it [here][1]. Let's run it ourselves.
```
$ kubectl run leader-elector --image=gcr.io/google_containers/leader-elector:0.4 --replicas=3 -- --election=example
```
This creates a leader election set with 3 replicas:
```
$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
leader-elector-inmr1   1/1       Running   0          13s
leader-elector-qkq00   1/1       Running   0          13s
leader-elector-sgwcq   1/1       Running   0          13s
```
To see which pod was chosen as the leader, you can access the logs of one of the pods, substituting one of your own pod's names in place of `${pod_name}` (e.g. `leader-elector-inmr1` from the above):
```
$ kubectl logs -f ${pod_name}
leader is (leader-pod-name)
```
Alternatively, you can inspect the endpoints object directly (_'example' is the name of the candidate set from the `kubectl run` command above_):
```
$ kubectl get endpoints example -o yaml
```
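The election client records the current leader in a leader annotation on this object, as a small JSON blob whose `holderIdentity` field names the leader pod. Trimmed to the relevant fields, and with timestamps elided, the output should look roughly like this (your pod name will differ):
```
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"leader-elector-inmr1","leaseDurationSeconds":10,"acquireTime":"…","renewTime":"…","leaderTransitions":0}'
  name: example
  namespace: default
```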
Now to validate that leader election actually works, in a different terminal, run:
```
$ kubectl delete pods (leader-pod-name)
```
This deletes the existing leader. Because the set of pods is being managed by a replication controller, a new pod replaces the one that was deleted, ensuring that the size of the replicated set is still three. Via leader election, one of these three pods is selected as the new leader, and you should see leadership fail over to a different pod. Because pods in Kubernetes have a _grace period_ before termination, this may take 30-40 seconds.
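Once the replacement pod is running, you can confirm the handoff by re-running the logs command from above against one of the surviving pods, or by inspecting the endpoints object again; the reported leader should now be a different pod:
```
$ kubectl logs -f ${pod_name}
leader is (new-leader-pod-name)
```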