By default manager nodes also act as a worker nodes. This means the scheduler
can assign tasks to a manager node. For small and non-critical swarms
assigning tasks to managers is relatively low-risk as long as you schedule
services using resource constraints for cpu and memory.
However, because manager nodes use the Raft consensus algorithm to replicate data
in a consistent way, they are sensitive to resource starvation. You should
isolate managers in your swarm from processes that might block swarm
operations like swarm heartbeat or leader elections.
To avoid interference with manager node operation, you can drain manager nodes
to make them unavailable as worker nodes:
```console
$ docker node update --availability drain <NODE>
```
When you drain a node, the scheduler reassigns any tasks running on the node to
other available worker nodes in the swarm. It also prevents the scheduler from
assigning tasks to the node.
## Add worker nodes for load balancing
[Add nodes to the swarm](join-nodes.md) to balance your swarm's
load. Replicated service tasks are distributed across the swarm as evenly as
possible over time, as long as the worker nodes are matched to the requirements
of the services. When limiting a service to run on only specific types of nodes,
such as nodes with a specific number of CPUs or amount of memory, remember that
worker nodes that do not meet these requirements cannot run these tasks.
## Monitor swarm health
You can monitor the health of manager nodes by querying the docker `nodes` API
in JSON format through the `/nodes` HTTP endpoint. Refer to the
[nodes API documentation](/reference/api/engine/v1.25/#tag/Node)
for more information.
From the command line, run `docker node inspect <id-node>` to query the nodes.
For instance, to query the reachability of the node as a manager:
```console
$ docker node inspect manager1 --format "{{ .ManagerStatus.Reachability }}"
reachable
```
To query the status of the node as a worker that accept tasks:
```console
$ docker node inspect manager1 --format "{{ .Status.State }}"
ready
```
From those commands, we can see that `manager1` is both at the status
`reachable` as a manager and `ready` as a worker.
An `unreachable` health status means that this particular manager node is unreachable
from other manager nodes. In this case you need to take action to restore the unreachable
manager:
- Restart the daemon and see if the manager comes back as reachable.
- Reboot the machine.
- If neither restarting nor rebooting works, you should add another manager node or promote a worker to be a manager node. You also need to cleanly remove the failed node entry from the manager set with `docker node demote <NODE>` and `docker node rm <id-node>`.
Alternatively you can also get an overview of the swarm health from a manager
node with `docker node ls`:
```console
$ docker node ls
ID HOSTNAME MEMBERSHIP STATUS AVAILABILITY MANAGER STATUS
1mhtdwhvsgr3c26xxbnzdc3yp node05 Accepted Ready Active
516pacagkqp2xc3fk9t1dhjor node02 Accepted Ready Active Reachable
9ifojw8of78kkusuc4a6c23fx * node01 Accepted Ready Active Leader
ax11wdpwrrb6db3mfjydscgk7 node04 Accepted Ready Active
bb1nrq2cswhtbg4mrsqnlx1ck node03 Accepted Ready Active Reachable
di9wxgz8dtuh9d2hn089ecqkf node06 Accepted Ready Active
```
## Troubleshoot a manager node
You should never restart a manager node by copying the `raft` directory from another node. The data directory is unique to a node ID. A node can only use a node ID once to join the swarm. The node ID space should be globally unique.
To cleanly re-join a manager node to a cluster:
1. Demote the node to a worker using `docker node demote <NODE>`.
2. Remove the node from the swarm using `docker node rm <NODE>`.
3. Re-join the node to the swarm with a fresh state using `docker swarm join`.
For more information on joining a manager node to a swarm, refer to