> In this case, add an `extra_hosts` directive to your Compose service YAML:
> 
> ```yaml
> extra_hosts:
>   - "model-runner.docker.internal:host-gateway"
> ```
> Then you can access the Docker Model Runner APIs at http://model-runner.docker.internal:12434/
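>
> For example, a minimal Compose service sketch using this mapping (the service name, image, and environment variable here are hypothetical placeholders):
>
> ```yaml
> services:
>   my-app:
>     image: my-app:latest    # hypothetical application image
>     extra_hosts:
>       - "model-runner.docker.internal:host-gateway"
>     environment:
>       # Hypothetical variable your app could read as its OpenAI-compatible base URL
>       - MODEL_RUNNER_BASE_URL=http://model-runner.docker.internal:12434/engines/llama.cpp/v1
> ```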

{{< /tab >}}
{{< /tabs >}}

Docker Model management endpoints:

```text
POST /models/create
GET /models
GET /models/{namespace}/{name}
DELETE /models/{namespace}/{name}
```
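For example, a minimal sketch that pulls a model and then lists the local models, run from within a container (this assumes the `create` endpoint accepts a `from` field naming the model to pull, mirroring `docker model pull`):

```bash
# Pull a model; the "from" field is an assumption about the create payload
curl http://model-runner.docker.internal/models/create \
    -H "Content-Type: application/json" \
    -d '{"from": "ai/smollm2"}'

# List the models now available locally
curl http://model-runner.docker.internal/models
```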

OpenAI endpoints:

```text
GET /engines/llama.cpp/v1/models
GET /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings
```
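For instance, a sketch of an embeddings request using the standard OpenAI-style payload (`model` and `input`); the model name is a placeholder, and the model must already be pulled:

```bash
# Request embeddings for a single input string
curl http://model-runner.docker.internal/engines/llama.cpp/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "input": "The quick brown fox jumps over the lazy dog"
    }'
```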

To call these endpoints via a Unix socket (`/var/run/docker.sock`), prefix their path
with `/exp/vDD4.40`.

> [!NOTE]
> You can omit `llama.cpp` from the path. For example: `POST /engines/v1/chat/completions`.
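For example, a sketch that lists models over the Docker socket (the `localhost` host in the URL is a placeholder; with `--unix-socket`, the socket determines where the request actually goes):

```bash
# List models through the Docker socket; note the /exp/vDD4.40 prefix
curl --unix-socket /var/run/docker.sock \
    http://localhost/exp/vDD4.40/models
```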

### How do I interact through the OpenAI API?

#### From within a container

To call the `chat/completions` OpenAI endpoint from within another container using `curl`:

```bash
#!/bin/sh

curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```
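To work with just the assistant's reply rather than the full JSON response, you can pipe the output through `jq` (a sketch, assuming `jq` is available in the container; the response follows the standard OpenAI `choices` schema):

```bash
# Print only the generated message content
curl -s http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "ai/smollm2", "messages": [{"role": "user", "content": "Say hello."}]}' \
    | jq -r '.choices[0].message.content'
```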

#### From the host using TCP

To call the `chat/completions` OpenAI endpoint from the host via TCP:

1. Enable the host-side TCP support from the Docker Desktop GUI, or via the [Docker Desktop CLI](/manuals/desktop/features/desktop-cli.md).
   For example: `docker desktop enable model-runner --tcp <port>`.

   If you are running on Windows, also enable GPU-backed inference.
   See [Enable Docker Model Runner](#enable-dmr-in-docker-desktop).

2. Interact with it as documented in the previous section using `localhost` and the correct port.

```bash
#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```