    > losing the quorum if further nodes are lost. The number of managers you
    > run is a trade-off. If you regularly take down managers to do backups,
    > consider running a five manager swarm, so that you can lose an additional
    > manager while the backup is running, without disrupting your services.

3.  Back up the entire `/var/lib/docker/swarm` directory.

4.  Restart the manager.
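
As a sketch, on a host where Docker runs under systemd, the whole backup sequence might look like the following (the archive path `/tmp/swarm-backup.tar.gz` is illustrative, not part of the official procedure):

```console
$ sudo systemctl stop docker
$ sudo tar -czf /tmp/swarm-backup.tar.gz -C /var/lib/docker swarm
$ sudo systemctl start docker
```

Store the resulting archive somewhere off the manager host, since it contains the full Raft state of the swarm.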

To restore, see [Restore from a backup](#restore-from-a-backup). 

## Recover from disaster

### Restore from a backup

After backing up the swarm as described in
[Back up the swarm](#back-up-the-swarm), use the following procedure to
restore the data to a new swarm.

1.  Shut down Docker on the target host machine for the restored swarm.

2.  Remove the contents of the `/var/lib/docker/swarm` directory on the new
    swarm.

3.  Restore the `/var/lib/docker/swarm` directory with the contents of the
    backup.

    > [!NOTE]
    > 
    > The new node uses the same encryption key for on-disk
    > storage as the old one. It is not possible to change the on-disk storage
    > encryption keys at this time.
    >
    > In the case of a swarm with auto-lock enabled, the unlock key is also the
    > same as on the old swarm, and the unlock key is needed to restore the
    > swarm.

4.  Start Docker on the new node. Unlock the swarm if necessary. Re-initialize
    the swarm using the following command, so that this node does not attempt
    to connect to nodes that were part of the old swarm, which presumably no
    longer exist.

    ```console
    $ docker swarm init --force-new-cluster
    ```

5.  Verify that the state of the swarm is as expected. This may include
    application-specific tests or simply checking the output of
    `docker service ls` to be sure that all expected services are present.

6.  If you use auto-lock,
    [rotate the unlock key](swarm_manager_locking.md#rotate-the-unlock-key).

7.  Add manager and worker nodes to bring your new swarm up to operating
    capacity.

8.  Reinstate your previous backup regimen on the new swarm.
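
Assuming the backup was taken as a tar archive of `/var/lib/docker/swarm` on a systemd-based host (the archive path below is illustrative), steps 1 through 4 might look like this:

```console
$ sudo systemctl stop docker
$ sudo rm -rf /var/lib/docker/swarm
$ sudo tar -xzf /tmp/swarm-backup.tar.gz -C /var/lib/docker
$ sudo systemctl start docker
$ docker swarm init --force-new-cluster
```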

### Recover from losing the quorum

Swarm is resilient to failures and can recover from any number
of temporary node failures (such as machine reboots, or crashes followed by a
restart) or other transient errors. However, a swarm cannot automatically
recover if it loses a
quorum. Tasks on existing worker nodes continue to run, but administrative
tasks are not possible, including scaling or updating services and joining or
removing nodes from the swarm. The best way to recover is to bring the missing
manager nodes back online. If that is not possible, continue reading for some
options for recovering your swarm.

In a swarm of `N` managers, a quorum (a majority) of manager nodes must always
be available. For example, in a swarm with five managers, a minimum of three must be
operational and in communication with each other. In other words, the swarm can
tolerate up to `(N-1)/2` permanent failures; beyond that, requests involving
swarm management cannot be processed. These types of failures include data
corruption and hardware failures.
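
The arithmetic for a five-manager swarm can be checked directly with shell arithmetic (this is just the majority formula from above, not a Docker command):

```console
$ N=5
$ echo "quorum: $(( N/2 + 1 )), tolerated failures: $(( (N-1)/2 ))"
quorum: 3, tolerated failures: 2
```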

If you lose the quorum of managers, you cannot administer the swarm. If you have
lost the quorum and you attempt to perform any management operation on the swarm,
an error occurs.
