After a while, the master replica will be marked as NotReady:


```
$ kubectl get nodes
NAME                      STATUS                        AGE
kubernetes-master         Ready,SchedulingDisabled      51m
kubernetes-master-2d4     NotReady,SchedulingDisabled   8m
kubernetes-master-85f     Ready,SchedulingDisabled      4m
...
```



However, the cluster is still operational. We may verify this by checking whether our nginx server still works correctly:




```
$ kubectl run -i --tty test-b --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
# wget -q -O- http://nginx.default.svc.cluster.local
...
<title>Welcome to nginx!</title>
...
```



We may also run another nginx server:




```
$ kubectl run nginx-next --image=nginx --expose --port=80
```



The new server should also be working correctly:




```
$ kubectl run -i --tty test-c --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
# wget -q -O- http://nginx-next.default.svc.cluster.local
...
<title>Welcome to nginx!</title>
...
```



Let’s now reset the broken replica:




```
$ gcloud compute instances start kubernetes-master-2d4 --zone=europe-west1-c
```



After a while, the replica should be re-attached to the cluster:




```
$ kubectl get nodes
NAME                      STATUS                     AGE
kubernetes-master         Ready,SchedulingDisabled   57m
kubernetes-master-2d4     Ready,SchedulingDisabled   13m
kubernetes-master-85f     Ready,SchedulingDisabled   9m
...
```
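Rather than re-running `kubectl get nodes` by hand, the re-attachment can be polled for with a small shell helper; a sketch, assuming `kubectl` is configured for this cluster (the helper name is ours, not part of the kube-up scripts):

```shell
# Hypothetical helper: block until the given node's Ready condition becomes "True".
wait_for_ready() {
  local node="$1"
  until kubectl get node "$node" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null \
      | grep -q True; do
    echo "waiting for $node..."
    sleep 10
  done
}

# Usage: wait_for_ready kubernetes-master-2d4
```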



**Shutting down the HA cluster**



To shut down the cluster, we will first shut down the master replicas in zones europe-west1-c and europe-west1-d:




```
$ KUBE_DELETE_NODES=false KUBE_GCE_ZONE=europe-west1-c ./cluster/kube-down.sh
$ KUBE_DELETE_NODES=false KUBE_GCE_ZONE=europe-west1-d ./cluster/kube-down.sh
```



Note that the second replica removal will take longer (~15 minutes), as the IP of the load balancer in front of the replicas needs to be reassigned to the remaining master, and we must wait for that change to propagate (see the [design doc](https://github.com/kubernetes/kubernetes/blob/master/docs/design/ha_master.md) for more details).
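The IP reassignment can be observed from gcloud while the second `kube-down.sh` runs; a sketch using standard gcloud listing commands (the exact resource names kube-up creates may vary by project, so we only list resources rather than name them):

```shell
# Hypothetical helper: show the forwarding rules and reserved addresses,
# so the master load-balancer IP can be seen moving to the remaining replica.
show_master_lb() {
  gcloud compute forwarding-rules list
  gcloud compute addresses list
}

# Usage: show_master_lb
```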



Then, we will remove the additional worker nodes from zones europe-west1-c and europe-west1-d:  


```
$ KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE=europe-west1-c ./cluster/kube-down.sh
$ KUBE_USE_EXISTING_MASTER=true KUBE_GCE_ZONE=europe-west1-d ./cluster/kube-down.sh
```



And finally, we will shut down the remaining master with the last group of nodes (zone europe-west1-b):




```
$ KUBE_GCE_ZONE=europe-west1-b ./cluster/kube-down.sh
```



**Conclusions**



We have shown how a Highly Available Kubernetes cluster can be created by adding worker node pools and master replicas. As of Kubernetes version 1.5.2, this is supported in the kube-up/kube-down scripts for GCE (as alpha). Additionally, HA clusters on AWS are supported via the kops scripts (see [this article](http://kubecloud.io/setup-ha-k8s-kops/) for more details).


- [Download](http://get.k8s.io/) Kubernetes
- Get involved with the Kubernetes project on [GitHub](https://github.com/kubernetes/kubernetes)
- Post questions (or answer questions) on [Stack Overflow](http://stackoverflow.com/questions/tagged/kubernetes)
- Connect with the community on [Slack](http://slack.k8s.io/)
- Follow us on Twitter [@Kubernetesio](https://twitter.com/kubernetesio) for latest updates

