Model Serving

There are two strategies for serving local AI models with Ollama. You choose one when you create the project.

Centralized (recommended)

A single shared Ollama instance serves all agents. This uses less RAM, because each model is loaded once rather than once per agent, and leaves only one service to manage.

abi-core create project my-app --model-serving centralized

┌─────────────┐
│   Ollama    │ ← All agents connect here
└─────────────┘
      ↑
  Agent 1, Agent 2, Agent 3

All agents point to the same OLLAMA_HOST and do not start Ollama themselves:

environment:
  - OLLAMA_HOST=http://my-app-ollama:11434
  - START_OLLAMA=false
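To make the centralized setup concrete, here is a minimal Python sketch of how an agent might turn OLLAMA_HOST into a request against the shared instance. It uses Ollama's documented /api/generate endpoint and only the standard library; the helper names ollama_base_url and build_generate_request are hypothetical and are not part of abi-core. The request is built but not sent, so the sketch runs without a live Ollama.

```python
import json
import os
import urllib.request

# Fallback used only if OLLAMA_HOST is unset; "my-app-ollama" is the example
# service name from the configuration above (an assumption for other projects).
DEFAULT_HOST = "http://my-app-ollama:11434"


def ollama_base_url(env=None):
    """Return the Ollama base URL this agent should use, without a trailing slash."""
    env = os.environ if env is None else env
    return env.get("OLLAMA_HOST", DEFAULT_HOST).rstrip("/")


def build_generate_request(prompt, model, env=None):
    """Build (but do not send) a POST to Ollama's /api/generate endpoint."""
    url = f"{ollama_base_url(env)}/api/generate"
    # stream=False asks Ollama for a single JSON response instead of a stream.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request(
        "Hello", "qwen2.5:3b", env={"OLLAMA_HOST": "http://my-app-ollama:11434"}
    )
    print(req.full_url)
    # To actually send it against a running Ollama: urllib.request.urlopen(req)
```

Because every agent reads the same OLLAMA_HOST value, they all end up talking to the one shared server.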

Distributed

Each agent runs its own Ollama instance. This gives full isolation and lets each agent use independent models and model versions, at the cost of more RAM.

abi-core create project my-app --model-serving distributed

Agent 1 ← Ollama 1 (qwen2.5:3b)
Agent 2 ← Ollama 2 (llama3:8b)
Agent 3 ← Ollama 3 (mistral:7b)

Each agent starts and manages its own Ollama on localhost:

environment:
  - OLLAMA_HOST=http://localhost:11434
  - START_OLLAMA=true
  - LOAD_MODELS=true
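One way to read the START_OLLAMA and LOAD_MODELS flags is as a startup plan. The Python sketch below is hypothetical and is not abi-core's actual entrypoint: it returns the commands a distributed agent might run, using the real Ollama CLI commands ollama serve and ollama pull. Treating LOAD_MODELS as meaningful only when START_OLLAMA is true is an assumption, as is the set of strings accepted as "true".

```python
import os


def _flag(env, name):
    """Interpret an environment variable such as START_OLLAMA=true as a boolean (assumed spellings)."""
    return env.get(name, "false").strip().lower() in {"1", "true", "yes"}


def startup_plan(models, env=None):
    """Return the commands a distributed agent would run before serving requests.

    Hypothetical sketch of the START_OLLAMA / LOAD_MODELS semantics.
    """
    env = os.environ if env is None else env
    plan = []
    if _flag(env, "START_OLLAMA"):
        # Start this agent's private Ollama server (localhost:11434 by default).
        # In practice it must run in the background (e.g. subprocess.Popen) and
        # be accepting connections before any pull is attempted.
        plan.append(["ollama", "serve"])
        if _flag(env, "LOAD_MODELS"):
            # Assumption: models are pulled only into the agent's own instance.
            plan.extend(["ollama", "pull", model] for model in models)
    return plan


if __name__ == "__main__":
    env = {"START_OLLAMA": "true", "LOAD_MODELS": "true"}
    for cmd in startup_plan(["qwen2.5:3b"], env):
        print(" ".join(cmd))
```

With the centralized settings (START_OLLAMA=false) the plan is empty, which matches agents that only connect to the shared instance.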

Cloud providers

If an agent uses OpenAI, Gemini, or another cloud provider, it doesn’t need Ollama at all. Set the provider in the agent's config.py:

# config.py
import os

LLM_CONFIG = {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": os.getenv("OPENAI_API_KEY"),  # read from the environment, never hard-coded
}

You can mix both approaches in one project: some agents use local Ollama while others call cloud APIs, because each agent has its own LLM_CONFIG.
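As an illustration of a mixed project, the Python sketch below puts two agents' configurations side by side and adds a small check for which ones need Ollama. Only the "openai" config shape comes from the example above; the "ollama" provider name, its "host" key, and the helpers needs_ollama and missing_api_key are hypothetical. In a real project each LLM_CONFIG would live in its own agent's config.py; they share one file here only for brevity.

```python
import os

# Hypothetical local agent: the "ollama" provider name and "host" key are assumptions.
RESEARCH_AGENT_CONFIG = {
    "provider": "ollama",
    "model": "qwen2.5:3b",
    "host": os.getenv("OLLAMA_HOST", "http://localhost:11434"),
}

# Cloud agent, same shape as the config.py example above.
WRITER_AGENT_CONFIG = {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": os.getenv("OPENAI_API_KEY"),
}

# Providers that are served by a local Ollama (assumption).
LOCAL_PROVIDERS = {"ollama"}


def needs_ollama(llm_config):
    """True if this agent's LLM_CONFIG points at a locally served Ollama model."""
    return llm_config["provider"] in LOCAL_PROVIDERS


def missing_api_key(llm_config):
    """True for a cloud-provider config whose API key is not set."""
    return not needs_ollama(llm_config) and not llm_config.get("api_key")


if __name__ == "__main__":
    for name, cfg in [("research", RESEARCH_AGENT_CONFIG), ("writer", WRITER_AGENT_CONFIG)]:
        print(name, "needs Ollama:", needs_ollama(cfg), "missing key:", missing_api_key(cfg))
```

A check like this can run at startup so a cloud-backed agent fails fast when its key is missing, rather than on its first request.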

Switching strategies

To change the strategy after the project has been created, edit .abi/runtime.yaml:

project:
  model_serving: centralized  # or distributed

Then rebuild the images and restart the containers:

docker compose up --build -d
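To see what the model_serving setting changes for each agent, here is a minimal Python sketch that maps the two values to the agent environment variables shown in the centralized and distributed examples. The function agent_environment is hypothetical and does not reflect how abi-core actually generates its compose files. The "<project>-ollama" service name is an assumption inferred from the my-app example.

```python
VALID_STRATEGIES = ("centralized", "distributed")


def agent_environment(project, model_serving):
    """Return the Ollama-related environment for one agent under a strategy.

    Restates the values from the examples above; the "<project>-ollama"
    hostname is an assumption based on the my-app example.
    """
    if model_serving == "centralized":
        # Every agent connects to the shared Ollama service and never starts its own.
        return {
            "OLLAMA_HOST": f"http://{project}-ollama:11434",
            "START_OLLAMA": "false",
        }
    if model_serving == "distributed":
        # Each agent runs Ollama locally and loads its own models.
        return {
            "OLLAMA_HOST": "http://localhost:11434",
            "START_OLLAMA": "true",
            "LOAD_MODELS": "true",
        }
    raise ValueError(
        f"model_serving must be one of {VALID_STRATEGIES}, got {model_serving!r}"
    )


if __name__ == "__main__":
    for strategy in VALID_STRATEGIES:
        print(strategy, agent_environment("my-app", strategy))
```

Raising on unknown values mirrors the fact that runtime.yaml accepts only the two documented strategies, so a typo surfaces immediately instead of producing a half-configured agent.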