The sample application supports both [Ollama](https://ollama.ai/) and [OpenAI](https://openai.com/). This guide provides instructions for the following scenarios:

- Run Ollama in a container
- Run Ollama outside of a container
- Use OpenAI

While all platforms can use any of the previous scenarios, the performance and
GPU support may vary. You can use the following guidelines to help you choose the appropriate option:

- Run Ollama in a container if you're on Linux with a native installation of
  Docker Engine, or on Windows 10/11 with Docker Desktop, you have a
  CUDA-supported GPU, and your system has at least 8 GB of RAM.
- Run Ollama outside of a container if you're on an Apple silicon Mac.
- Use OpenAI if the previous two scenarios don't apply to you.

Choose one of the following options for your LLM service.

{{< tabs >}}
{{< tab name="Run Ollama in a container" >}}

When running Ollama in a container, you should have a CUDA-supported GPU. While you can run Ollama in a container without a supported GPU, the performance may not be acceptable. Only Linux and Windows 11 support GPU access to containers.

To run Ollama in a container and provide GPU access:

1. Install the prerequisites.
   - For Docker Engine on Linux, install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit).
   - For Docker Desktop on Windows 10/11, install the latest [NVIDIA driver](https://www.nvidia.com/Download/index.aspx) and make sure you are using the [WSL2 backend](/manuals/desktop/features/wsl/_index.md#turn-on-docker-desktop-wsl-2).
2. Add the Ollama service and a volume in your `compose.yaml`. The following is
   the updated `compose.yaml`:

   ```yaml {hl_lines=["28-42"]}
   services:
     server:
       build:
         context: .
       ports:
         - 8000:8000
       env_file:
         - .env
       depends_on:
         database:
           condition: service_healthy
     database:
       image: neo4j:5.11
       ports:
         - "7474:7474"
         - "7687:7687"
       environment:
         - NEO4J_AUTH=${NEO4J_USERNAME}/${NEO4J_PASSWORD}
       healthcheck:
         test:
           [
             "CMD-SHELL",
             "wget --no-verbose --tries=1 --spider localhost:7474 || exit 1",
           ]
         interval: 5s
         timeout: 3s
         retries: 5
     ollama:
       image: ollama/ollama:latest
       ports:
         - "11434:11434"
       volumes:
         - ollama_volume:/root/.ollama
       deploy:
         resources:
           reservations:
             devices:
               - driver: nvidia
                 count: all
                 capabilities: [gpu]
   volumes:
     ollama_volume:
   ```

   > [!NOTE]
   >
   > For more details about the Compose instructions, see [Turn on GPU access with Docker Compose](/manuals/compose/how-tos/gpu-support.md).

3. Add the `ollama-pull` service to your `compose.yaml` file. This service uses
   the `docker/genai:ollama-pull` image, based on the GenAI Stack's
   [pull_model.Dockerfile](https://github.com/docker/genai-stack/blob/main/pull_model.Dockerfile),
   and automatically pulls the model for your Ollama container. The following is
   the updated section of the `compose.yaml` file:
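
   This is a minimal sketch of what that section might look like, assuming the
   `ollama-pull` service only needs the shared `.env` file to know which model
   to pull and where Ollama is listening; the exact service definition in your
   project may differ:

   ```yaml
   services:
     # The existing server, database, and ollama services stay as defined above.
     ollama-pull:
       image: docker/genai:ollama-pull
       env_file:
         - .env
   ```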

This section details how to run Ollama within a Docker container, specifically focusing on enabling GPU access for improved performance. It highlights the importance of having a CUDA-supported GPU and the limitations of GPU access to containers (Linux and Windows 11 only). The instructions cover installing prerequisites like the NVIDIA Container Toolkit (for Linux) or the latest NVIDIA driver (for Windows), updating the `compose.yaml` file to include the Ollama service with GPU device requests and a volume for persistent data, and adding a service to pull the model for your Ollama container.