Model Serving¶
Two strategies for running AI models. Choose when you create the project.
Centralized (recommended)¶
One shared Ollama instance serves all agents. Less RAM, easier to manage.
abi-core create project my-app --model-serving centralized
┌─────────────┐
│ Ollama │ ← All agents connect here
└─────────────┘
↑
Agent 1, Agent 2, Agent 3
Agents point to the same OLLAMA_HOST:
environment:
- OLLAMA_HOST=http://my-app-ollama:11434
- START_OLLAMA=false
Distributed¶
Each agent runs its own Ollama. Full isolation, independent model versions.
abi-core create project my-app --model-serving distributed
Agent 1 ← Ollama 1 (qwen2.5:3b)
Agent 2 ← Ollama 2 (llama3:8b)
Agent 3 ← Ollama 3 (mistral:7b)
Agents manage their own Ollama:
environment:
- OLLAMA_HOST=http://localhost:11434
- START_OLLAMA=true
- LOAD_MODELS=true
Cloud providers (no Ollama needed)¶
If your agent uses OpenAI, Gemini, or another cloud provider, it doesn’t need Ollama at all:
# config.py
LLM_CONFIG = {
"provider": "openai",
"model": "gpt-4o",
"api_key": os.getenv("OPENAI_API_KEY"),
}
You can mix: some agents use local Ollama, others use cloud APIs. Each agent has its own LLM_CONFIG.
Switch strategy¶
Edit .abi/runtime.yaml:
project:
model_serving: centralized # or distributed
Then rebuild: docker compose up --build -d