Download and install Ollama from [ollama.com/download](https://ollama.com/download) (Linux or macOS) and ensure it's running with `ollama --version`.
1. Download one of the [available models](https://ollama.com/models), for example, `mistral`:
```sh
ollama pull mistral
```
2. Make sure that the Ollama server is running. You can start it either by running Ollama.app (macOS) or by launching:
```sh
ollama serve
```
3. In the Agent Panel, select one of the Ollama models using the model dropdown.
#### Ollama Context Length {#ollama-context}
Zed has pre-configured maximum context lengths (`max_tokens`) to match the capabilities of common models.
Zed API requests to Ollama include this as the `num_ctx` parameter, but the default values do not exceed `16384`, so users with ~16GB of RAM are able to use most models out of the box.
See [get_max_tokens in ollama.rs](https://github.com/zed-industries/zed/blob/main/crates/ollama/src/ollama.rs) for a complete set of defaults.
> **Note**: Token counts displayed in the Agent Panel are only estimates and will differ from the model's native tokenizer.
Depending on your hardware or use case, you may wish to limit or increase the context length for a specific model via `settings.json`:
```json
{
  "language_models": {
    "ollama": {
      "api_url": "http://localhost:11434",
      "available_models": [
        {
          "name": "qwen2.5-coder",
          "display_name": "qwen 2.5 coder 32K",
          "max_tokens": 32768,
          "supports_tools": true,
          "supports_thinking": true,
          "supports_images": true
        }
      ]
    }
  }
}
```
If you specify a context length that is too large for your hardware, Ollama will log an error.
You can watch these logs by running: `tail -f ~/.ollama/logs/ollama.log` (macOS) or `journalctl -u ollama -f` (Linux).
Depending on the memory available on your machine, you may need to adjust the context length to a smaller value.
You may also optionally specify a value for `keep_alive` for each available model.
This can be an integer (seconds) or a string duration like "5m", "10m", "1h", "1d", etc.
For example, `"keep_alive": "120s"` will allow the remote server to unload the model (freeing up GPU VRAM) after 120 seconds.
The `supports_tools` option controls whether the model can use additional tools.
If the model is tagged with `tools` in the Ollama catalog, this option should be supplied, and the built-in profiles `Ask` and `Write` can be used.
If the model is not tagged with `tools` in the Ollama catalog, this option can still be supplied with the value `true`; however, be aware that only the `Minimal` built-in profile will work.
The `supports_thinking` option controls whether the model performs an explicit "thinking" (reasoning) pass before producing its final answer.
If the model is tagged with `thinking` in the Ollama catalog, set this option to use it in Zed.
The `supports_images` option enables the model's vision capabilities, allowing it to process images included in the conversation context.
If the model is tagged with `vision` in the Ollama catalog, set this option to use it in Zed.
### OpenAI {#openai}
> ✅ Supports tool use
1. Visit the OpenAI platform and [create an API key](https://platform.openai.com/account/api-keys)
2. Make sure that your OpenAI account has credits
3. Open the settings view (`agent: open configuration`) and go to the OpenAI section
4. Enter your OpenAI API key
The OpenAI API key will be saved in your keychain.
Zed will also use the `OPENAI_API_KEY` environment variable if it's defined.
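For example, you can export the variable in the shell you use to launch Zed (the key value is a placeholder, and this assumes you start Zed from that shell, e.g. via the `zed` CLI):

```sh
export OPENAI_API_KEY="your-api-key"
zed
```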
#### Custom Models {#openai-custom-models}
The Zed Assistant comes pre-configured to use the latest versions of common models (GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o mini).
To use alternate models, perhaps a preview release or a dated model release, or if you wish to control the request parameters, you can do so by adding them to your Zed `settings.json`.
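A minimal sketch, following the same `language_models` structure as the Ollama example above (the model name, display name, and token limit here are illustrative placeholders):

```json
{
  "language_models": {
    "openai": {
      "available_models": [
        {
          "name": "gpt-4o-2024-08-06",
          "display_name": "GPT 4o Summer 2024",
          "max_tokens": 128000
        }
      ]
    }
  }
}
```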