Setting Up a Highly Available Kubernetes Cluster

---
title: "Highly Available Kubernetes Clusters"
date: 2017-02-02
slug: highly-available-kubernetes-clusters
url: /blog/2017/02/Highly-Available-Kubernetes-Clusters
author: >
  Jerzy Szczepkowski (Google)
---

Today’s post shows how to set up a reliable, highly available distributed Kubernetes cluster. Support for running such clusters on Google Compute Engine (GCE) was added as an alpha feature in the [Kubernetes 1.5 release](https://kubernetes.io/blog/2016/12/kubernetes-1-5-supporting-production-workloads/).

**Motivation**

We will create a highly available (HA) Kubernetes cluster, with master replicas and worker nodes distributed among three zones of a region. Such a setup ensures that the cluster continues operating during a zone failure.

**Setting Up HA cluster**

The following instructions apply to GCE. First, we will set up a cluster that spans one zone (europe-west1-b), contains one master and three worker nodes, and is HA-compatible (that is, it allows adding more master replicas and more worker nodes in multiple zones in the future).
To implement this, we’ll export the following environment variables:

```
$ export KUBERNETES_PROVIDER=gce
$ export NUM_NODES=3
$ export MULTIZONE=true
$ export ENABLE_ETCD_QUORUM_READ=true
```

and run the kube-up script (note that the entire cluster will initially be placed in zone europe-west1-b):

```
$ KUBE_GCE_ZONE=europe-west1-b ./cluster/kube-up.sh
```

Now, we will add two additional pools of worker nodes, each of three nodes, in zones europe-west1-c and europe-west1-d (more details on adding pools of worker nodes can be found [here](/docs/setup/multiple-zones/)):

```
$ KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE=europe-west1-c ./cluster/kube-up.sh
$ KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE=europe-west1-d ./cluster/kube-up.sh
```

To complete the setup of the HA cluster, we will add two master replicas, one in zone europe-west1-c and the other in europe-west1-d:

```
$ KUBE_GCE_ZONE=europe-west1-c KUBE_REPLICATE_EXISTING_MASTER=true ./cluster/kube-up.sh
$ KUBE_GCE_ZONE=europe-west1-d KUBE_REPLICATE_EXISTING_MASTER=true ./cluster/kube-up.sh
```

Note that adding the first replica takes longer (~15 minutes), as we need to reassign the IP of the master to the load balancer in front of the replicas and wait for the change to propagate (see the [design doc](https://github.com/kubernetes/kubernetes/blob/master/docs/design/ha_master.md) for more details).

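
The five `kube-up.sh` invocations above could be chained in a small wrapper script. The following is only a sketch under stated assumptions: the function names `ha_cluster_up` and `run_kube_up` and the `DRY_RUN` and `KUBE_UP` variables are hypothetical names introduced for this example, while the environment variables, zones, and script path are the ones used in this post. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: chains the kube-up.sh invocations from this post.
# ha_cluster_up, run_kube_up, DRY_RUN and KUBE_UP are hypothetical names
# introduced for this example; they are not part of Kubernetes.

KUBE_UP="${KUBE_UP:-./cluster/kube-up.sh}"

# In dry-run mode (the default) print the command; otherwise run kube-up.sh
# with the given VAR=value pairs added to its environment.
run_kube_up() {
  if [ "${DRY_RUN:-true}" = "true" ]; then
    echo "$* ${KUBE_UP}"
  else
    env "$@" "${KUBE_UP}"
  fi
}

ha_cluster_up() {
  # Settings shared by every invocation (same values as in the post).
  export KUBERNETES_PROVIDER=gce
  export NUM_NODES=3
  export MULTIZONE=true
  export ENABLE_ETCD_QUORUM_READ=true

  # 1. Initial cluster: one master and three workers in europe-west1-b.
  run_kube_up KUBE_GCE_ZONE=europe-west1-b || return 1

  # 2. One extra pool of worker nodes per additional zone.
  for zone in europe-west1-c europe-west1-d; do
    run_kube_up KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE="${zone}" || return 1
  done

  # 3. One extra master replica per additional zone.
  for zone in europe-west1-c europe-west1-d; do
    run_kube_up KUBE_GCE_ZONE="${zone}" KUBE_REPLICATE_EXISTING_MASTER=true || return 1
  done
}

ha_cluster_up
```

Running the script as is prints the plan, one command per line. Setting `DRY_RUN=false` would execute the steps in order and stop at the first failing one, which matters here because the worker pools and master replicas depend on the initial cluster already existing.
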
**Verifying that the HA cluster works as intended**

We can now list all nodes present in the cluster:

```
$ kubectl get nodes
NAME                           STATUS                     AGE
kubernetes-master              Ready,SchedulingDisabled   48m
kubernetes-master-2d4          Ready,SchedulingDisabled   5m
kubernetes-master-85f          Ready,SchedulingDisabled   32s
kubernetes-minion-group-6s52   Ready                      39m
kubernetes-minion-group-cw8e   Ready                      48m
kubernetes-minion-group-fw91   Ready                      48m
kubernetes-minion-group-h2kn   Ready                      31m
kubernetes-minion-group-ietm   Ready                      39m
kubernetes-minion-group-j6lf   Ready                      31m
kubernetes-minion-group-soj7   Ready                      31m
kubernetes-minion-group-tj82   Ready                      39m
kubernetes-minion-group-vd96   Ready                      48m
```

As we can see, we have 3 master replicas (with scheduling disabled) and 9 worker nodes. We will deploy a sample application (an nginx server) to verify that our cluster is working correctly:

```
$ kubectl run nginx --image=nginx --expose --port=80
```

After waiting for a while, we can verify that both the deployment and the service were correctly created and are running:

```
$ kubectl get pods
NAME                     READY     STATUS    RESTARTS   AGE
...
nginx-3449338310-m7fjm   1/1       Running   0          4s
```

This blog post explains how to set up a highly available (HA) Kubernetes cluster on Google Compute Engine (GCE). It details the steps to create a cluster that spans multiple zones with master replicas and worker nodes, ensuring continued operation even during a zone failure. The guide covers setting up the initial cluster, adding worker node pools in different zones, and replicating master instances for high availability. It also includes instructions on verifying the HA cluster's functionality by deploying a sample application.