transient errors. However, a swarm cannot automatically recover if it loses a
quorum. Tasks on existing worker nodes continue to run, but administrative
tasks are not possible, including scaling or updating services and joining or
removing nodes from the swarm. The best way to recover is to bring the missing
manager nodes back online. If that is not possible, continue reading for some
options for recovering your swarm.

In a swarm of `N` managers, a quorum (a majority) of manager nodes must always
be available. For example, in a swarm with five managers, a minimum of three must be
operational and in communication with each other. In other words, the swarm can
tolerate up to `(N-1)/2` permanent failures beyond which requests involving
swarm management cannot be processed. These types of failures include data
corruption or hardware failures.
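
To see how many managers are currently reachable, you can list the manager nodes from any functioning manager. The following is an illustrative sketch, assuming hypothetical hostnames `node01` through `node03`; the `MANAGER STATUS` column shows whether each manager is the `Leader`, `Reachable`, or `Unreachable`:

```console
$ docker node ls --filter "role=manager"
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS
1bcef6utixb0l0ca7gxuivsj0 *   node01     Ready     Active         Leader
38ciaotwjuritcdtn9npbnkuz     node02     Ready     Active         Reachable
e216jshn25ckzbvmwlnh5jr3g     node03     Down      Active         Unreachable
```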

If you lose the quorum of managers, you cannot administer the swarm. If you have
lost the quorum and you attempt to perform any management operation on the swarm,
an error occurs:

```none
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
```
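
Any command that needs to read or change swarm state can fail this way. As an illustrative example (the specific command does not matter; here `docker node ls` is run against a manager that has lost quorum):

```console
$ docker node ls
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
```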

The best way to recover from losing the quorum is to bring the failed nodes back
online. If you can't do that, the only way to recover from this state is to use
the `--force-new-cluster` option from a manager node. This removes all managers
except the manager the command was run from. Quorum is achieved because
there is now only one manager. Promote nodes to be managers until you have the
desired number of managers.

From the node to recover, run:

```console
$ docker swarm init --force-new-cluster --advertise-addr node01:2377
```

When you run the `docker swarm init` command with the `--force-new-cluster`
flag, the Docker Engine where you run the command becomes the manager node of a
single-node swarm that is capable of managing and running services. The manager
has all the previous information about services and tasks, worker nodes are
still part of the swarm, and services are still running. You need to add or
re-add manager nodes to achieve your previous task distribution and ensure that
you have enough managers to maintain high availability and avoid losing the
quorum again.
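
For example, you can promote existing worker nodes back to managers, or print the manager join token for new nodes to use. A minimal sketch, assuming hypothetical hostnames `node02` and `node03`:

```console
$ docker node promote node02 node03

$ docker swarm join-token manager
```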

Title: Recovering a Swarm After Losing Quorum: Using --force-new-cluster
Summary
This section explains how to recover a Docker Swarm when the manager quorum is lost, making administrative tasks impossible. The recommended approach is to bring the failed manager nodes back online. If that is not feasible, the only recourse is to use the `--force-new-cluster` option with `docker swarm init` on a manager node. This action removes all other managers, establishing a new single-node swarm and re-establishing quorum. Worker nodes remain part of the swarm, and services continue to run. The procedure requires adding or re-adding manager nodes to achieve the desired task distribution and fault tolerance.