1. Do either of the following:

   1. If each Kubernetes API server is configured to communicate with all etcd
      members, remove the failed member from the `--etcd-servers` flag, then
      restart each Kubernetes API server.
   1. If each Kubernetes API server communicates with a single etcd member,
      then stop the Kubernetes API server that communicates with the failed
      etcd.

1. Stop the etcd server on the broken node. Clients other than the
   Kubernetes API server may also be sending traffic to etcd, and it is
   desirable to stop all traffic to prevent writes to the data directory.
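
   For example, if etcd runs as a systemd service on that node (an
   assumption; adjust for how etcd is managed in your environment):

   ```shell
   sudo systemctl stop etcd
   ```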

1. Remove the failed member:

   ```shell
   etcdctl member remove 8211f1d0f64f3269
   ```

   The following message is displayed:

   ```console
   Removed member 8211f1d0f64f3269 from cluster
   ```

1. Add the new member:

   ```shell
   etcdctl member add member4 --peer-urls=http://10.0.0.4:2380
   ```

   The following message is displayed:

   ```console
   Member 2be1eb8f84b7f63e added to cluster ef37ad9dc622a7c4
   ```

1. Start the newly added member on a machine with the IP `10.0.0.4`:

   ```shell
   export ETCD_NAME="member4"
   export ETCD_INITIAL_CLUSTER="member2=http://10.0.0.2:2380,member3=http://10.0.0.3:2380,member4=http://10.0.0.4:2380"
   export ETCD_INITIAL_CLUSTER_STATE=existing
   etcd [flags]
   ```
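
   The `[flags]` placeholder above stands for the rest of your etcd
   configuration. As a minimal sketch (assuming plain HTTP without TLS,
   which is not recommended for production), the flags might look like:

   ```shell
   # plain-HTTP sketch; production clusters should configure TLS
   etcd --listen-peer-urls=http://10.0.0.4:2380 \
     --initial-advertise-peer-urls=http://10.0.0.4:2380 \
     --listen-client-urls=http://10.0.0.4:2379 \
     --advertise-client-urls=http://10.0.0.4:2379
   ```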

1. Do either of the following:

   1. If each Kubernetes API server is configured to communicate with all etcd
      members, add the newly added member to the `--etcd-servers` flag (as
      shown in the sketch after this list), then restart each Kubernetes API
      server.
   1. If each Kubernetes API server communicates with a single etcd member,
      start the Kubernetes API server that was stopped in step 2. Then
      configure Kubernetes API server clients to again route requests to the
      Kubernetes API server that was stopped. This can often be done by
      configuring a load balancer.
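
   For the first option, the updated flag might look like the following
   sketch (using the example IPs from this page; `[other flags]` stands for
   the rest of your API server configuration):

   ```shell
   kube-apiserver --etcd-servers=http://10.0.0.2:2379,http://10.0.0.3:2379,http://10.0.0.4:2379 [other flags]
   ```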

For more information on cluster reconfiguration, see
[etcd reconfiguration documentation](https://etcd.io/docs/current/op-guide/runtime-configuration/#remove-a-member).

## Backing up an etcd cluster

All Kubernetes objects are stored in etcd. Periodically backing up the etcd
cluster data is important for recovering Kubernetes clusters under disaster
scenarios, such as losing all control plane nodes. The snapshot file contains
all the Kubernetes state and critical information. To keep the sensitive
Kubernetes data safe, encrypt the snapshot files.
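
For example, a snapshot file could be encrypted with a symmetric passphrase
using `gpg` (one possible approach; use whatever encryption tooling your
environment standardizes on):

```shell
# prompts for a passphrase and writes snapshot.db.gpg
gpg --symmetric --cipher-algo AES256 snapshot.db
```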

Backing up an etcd cluster can be accomplished in two ways: etcd built-in
snapshot and volume snapshot.

### Built-in snapshot

etcd supports built-in snapshots. A snapshot may either be created from a live
member with the `etcdctl snapshot save` command or by copying the
`member/snap/db` file from an etcd
[data directory](https://etcd.io/docs/current/op-guide/configuration/#--data-dir)
that is not currently used by an etcd process. Creating the snapshot will
not affect the performance of the member.

Below is an example for creating a snapshot of the keyspace served by
`$ENDPOINT` to the file `snapshot.db`:

```shell
ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db
```
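
Alternatively, to take the copy-based snapshot described above, copy the
database file from a data directory that no etcd process is using (assuming
the common data directory `/var/lib/etcd`; yours may differ):

```shell
cp /var/lib/etcd/member/snap/db snapshot.db
```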

Verify the snapshot:

{{< tabs name="etcd_verify_snapshot" >}}
{{% tab name="Use etcdutl" %}}
   The following example shows how to use the `etcdutl` tool to verify a snapshot:

   ```shell
   etcdutl --write-out=table snapshot status snapshot.db
   ```

   This should produce output similar to the following:

   ```console
   +----------+----------+------------+------------+
   |   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
   +----------+----------+------------+------------+
   | fe01cf57 |       10 |          7 | 2.1 MB     |
   +----------+----------+------------+------------+
   ```

{{% /tab %}}
{{% tab name="Use etcdctl (Deprecated)" %}}

   {{< note >}}
   `etcdctl snapshot status` has been **deprecated** since etcd v3.5.x and is slated for removal in etcd v3.6.
   Use [`etcdutl`](https://github.com/etcd-io/etcd/blob/main/etcdutl/README.md) instead.
