"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Please write 500 words about the fall of Rome."
}
]
}'
```
#### From the host using TCP
To call the `chat/completions` OpenAI endpoint from the host via TCP:
1. Enable host-side TCP support from the Docker Desktop GUI, or via the [Docker Desktop CLI](/manuals/desktop/features/desktop-cli.md).
For example: `docker desktop enable model-runner --tcp <port>`.
If you are running on Windows, also enable GPU-backed inference.
See [Enable Docker Model Runner](#enable-dmr-in-docker-desktop).
2. Interact with it as documented in the previous section, using `localhost` and the port you chose (`12434` by default). For example:
```bash
#!/bin/sh
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/smollm2",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Please write 500 words about the fall of Rome."
}
]
}'
```
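Before sending a full completion request, you can check that TCP access is working by listing the models the engine currently serves. A minimal sketch, assuming the same port as above:
```bash
#!/bin/sh
# Lists the models available through the llama.cpp engine.
# A JSON response confirms host-side TCP access is enabled on port 12434.
curl http://localhost:12434/engines/llama.cpp/v1/models
```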
#### From the host using a Unix socket
To call the `chat/completions` OpenAI endpoint through the Docker socket from the host using `curl`:
```bash
#!/bin/sh
curl --unix-socket $HOME/.docker/run/docker.sock \
localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/smollm2",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Please write 500 words about the fall of Rome."
}
]
}'
```
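The endpoint returns a standard OpenAI-style chat completion object, with the generated text at `choices[0].message.content`. A sketch that prints only that text, assuming `jq` is installed:
```bash
#!/bin/sh
# Same request as above; jq extracts just the assistant's reply
# from the OpenAI-style response body.
curl -s --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {"role": "user", "content": "Please write 500 words about the fall of Rome."}
        ]
    }' | jq -r '.choices[0].message.content'
```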
## Known issues
### `docker model` is not recognized
If you run a Docker Model Runner command and see:
```text
docker: 'model' is not a docker command
```
It means Docker can't find the plugin because it's not in the expected CLI plugins directory.
To fix this, create a symlink so Docker can detect it (the path below assumes Docker Desktop on macOS):
```console
$ ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model
```
Once linked, rerun the command.
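To confirm the plugin is now detected, run a simple subcommand such as `docker model status`, which reports whether Docker Model Runner is running:
```console
$ docker model status
```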
### No safeguard for running oversized models
Currently, Docker Model Runner doesn't include safeguards to prevent you from
launching models that exceed your system's available resources. Attempting to
run a model that is too large for the host machine may result in severe
slowdowns or may render the system temporarily unusable. This issue is
particularly common when running LLMs without sufficient GPU memory or system
RAM.
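As a rough guard, compare a model's size with your free memory before running it. `docker model list` shows each pulled model's on-disk size, and loading a model typically requires at least that much free RAM or VRAM:
```console
$ docker model list
```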
### No consistent digest support in Model CLI
The Docker Model CLI currently lacks consistent support for specifying models by image digest. As a temporary workaround, you should refer to models by name instead of digest.
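For example, pull and run a model by name rather than pinning it to a digest:
```console
$ docker model pull ai/smollm2
$ docker model run ai/smollm2 "Give me a fact about whales."
```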
## Share feedback
Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting.