Securing etcd Clusters and Replacing Failed Members

clients. See the [example scripts](https://github.com/coreos/etcd/tree/master/hack/tls-setup)
provided by the etcd project to generate key pairs and CA files for client
authentication.

### Securing communication

To configure etcd with secure peer communication, specify flags
`--peer-key-file=peer.key` and `--peer-cert-file=peer.cert`, and use HTTPS as
the URL scheme.

Similarly, to configure etcd with secure client communication, specify flags
`--key=k8sclient.key` and `--cert=k8sclient.cert`, and use HTTPS as
the URL scheme. Here is an example of a client command that uses secure
communication:

```
ETCDCTL_API=3 etcdctl --endpoints 10.2.0.9:2379 \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  member list
```

### Limiting access of etcd clusters

After configuring secure communication, restrict access to the etcd cluster to
only the Kubernetes API servers by using TLS authentication.

For example, consider key pairs `k8sclient.key` and `k8sclient.cert` that are
trusted by the CA `etcd.ca`. When etcd is configured with `--client-cert-auth`
along with TLS, it verifies client certificates by using the system CAs or
the CA passed in by the `--trusted-ca-file` flag. Specifying flags
`--client-cert-auth=true` and `--trusted-ca-file=etcd.ca` restricts access
to clients with the certificate `k8sclient.cert`.

Once etcd is configured correctly, only clients with valid certificates can
access it. To give the Kubernetes API servers access, configure them with the
flags `--etcd-certfile=k8sclient.cert`, `--etcd-keyfile=k8sclient.key`, and
`--etcd-cafile=ca.cert`.

{{< note >}}
etcd authentication is not planned for Kubernetes.
{{< /note >}}

## Replacing a failed etcd member

An etcd cluster achieves high availability by tolerating the failure of a
minority of its members. However, to improve the overall health of the cluster,
replace failed members immediately.
When multiple members fail, replace them one by one. Replacing a failed
member involves two steps: removing the failed member and adding a new member.
Though etcd keeps unique member IDs internally, it is recommended to use a
unique name for each member to avoid human errors. For example, consider a
three-member etcd cluster. Let the URLs be `member1=http://10.0.0.1`,
`member2=http://10.0.0.2`, and `member3=http://10.0.0.3`. When `member1` fails,
replace it with `member4=http://10.0.0.4`.

1. Get the member ID of the failed `member1`:

   ```shell
   etcdctl --endpoints=http://10.0.0.2,http://10.0.0.3 member list
   ```

   The following message is displayed:

   ```console
   8211f1d0f64f3269, started, member1, http://10.0.0.1:2380, http://10.0.0.1:2379
   91bc3c398fb3c146, started, member2, http://10.0.0.2:2380, http://10.0.0.2:2379
   fd422379fda50e48, started, member3, http://10.0.0.3:2380, http://10.0.0.3:2379
   ```

1. Do either of the following:

   1. If each Kubernetes API server is configured to communicate with all etcd
      members, remove the failed member from the `--etcd-servers` flag, then
      restart each Kubernetes API server.
   1. If each Kubernetes API server communicates with a single etcd member,
      then stop the Kubernetes API server that communicates with the failed
      etcd member.

1. Stop the etcd server on the broken node. Clients other than the Kubernetes
   API server may also be sending traffic to etcd, so stop all traffic to
   prevent writes to the data directory.

1. Remove the failed member:

   ```shell
   etcdctl member remove 8211f1d0f64f3269
   ```

   The following message is displayed:

   ```console
   Removed member 8211f1d0f64f3269 from cluster
   ```

1. Add the new member:

   ```shell
   etcdctl member add member4 --peer-urls=http://10.0.0.4:2380
   ```

   The following message is displayed:

   ```console
   Member 2be1eb8f84b7f63e added to cluster ef37ad9dc622a7c4
   ```

1. Start the newly added member on a machine with the IP `10.0.0.4`:

This section discusses securing etcd clusters by restricting access to only the Kubernetes API servers, using TLS authentication and X.509 key pairs. It details how to configure etcd to verify client certificates and how to configure the Kubernetes API servers with the necessary certificates. It also explains how to replace a failed etcd member: identifying the failed member, removing it, and adding a new member with a unique name and IP address.
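The replacement procedure above starts by reading the failed member's ID from the `member list` output and then passing it to `etcdctl member remove`. Copying a 16-character ID by hand is easy to get wrong, so the following is a minimal sketch of looking it up by member name instead. It assumes the comma-separated output format shown in this section; `member_id_by_name` is a hypothetical helper written for this illustration, not part of etcdctl, and the sample text is the output copied from the first step.

```shell
#!/bin/sh
# Hypothetical helper (not part of etcdctl): read `etcdctl member list`
# output in the comma-separated format shown above from stdin and print
# the ID (field 1) of the member whose name (field 3) matches "$1".
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Sample output copied from the member list step in this section.
sample_output='8211f1d0f64f3269, started, member1, http://10.0.0.1:2380, http://10.0.0.1:2379
91bc3c398fb3c146, started, member2, http://10.0.0.2:2380, http://10.0.0.2:2379
fd422379fda50e48, started, member3, http://10.0.0.3:2380, http://10.0.0.3:2379'

# Look up the ID of the failed member by its name.
failed_id=$(printf '%s\n' "$sample_output" | member_id_by_name member1)
echo "$failed_id"
```

Against a live cluster, the same helper could take its input from `etcdctl --endpoints=http://10.0.0.2,http://10.0.0.3 member list` instead of the sample text. Check the printed ID against the full listing before removing the member.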