Model Serving

There are two strategies for running AI models: centralized and distributed. Choose one when you create the project.

Distributed

Each agent runs its own Ollama instance, which gives full isolation and independent model versions.

abi-core create project my-app --model-serving distributed

Agent 1 ← Ollama 1 (qwen2.5:3b)
Agent 2 ← Ollama 2 (llama3:8b)
Agent 3 ← Ollama 3 (mistral:7b)

Each agent manages its own Ollama instance through these environment settings:

environment:
  - OLLAMA_HOST=http://localhost:11434
  - START_OLLAMA=true
  - LOAD_MODELS=true
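
To confirm that an agent's Ollama instance is up and has its models loaded, you can query the address in OLLAMA_HOST. The following is a minimal sketch, not part of abi-core: the function name list_ollama_models and the injectable opener parameter are hypothetical, and it assumes the standard Ollama REST endpoint GET /api/tags, which lists locally available models.

```python
import json
import os
import urllib.request


def list_ollama_models(host=None, opener=urllib.request.urlopen):
    """Return the model names reported by the Ollama server at `host`.

    `opener` is a hypothetical injection point so the function can be
    exercised without a running server.
    """
    # Fall back to OLLAMA_HOST, then to the default address used above.
    host = host or os.getenv("OLLAMA_HOST", "http://localhost:11434")
    # GET /api/tags returns JSON shaped like {"models": [{"name": ...}, ...]}.
    url = f"{host.rstrip('/')}/api/tags"
    with opener(url, timeout=5) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]
```

Calling list_ollama_models() inside an agent container should return the names of the models that agent has pulled. A connection error suggests that Ollama has not started yet.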

Cloud providers (no Ollama needed)

If your agent uses OpenAI, Gemini, or another cloud provider, it doesn’t need Ollama at all:

# config.py
import os

LLM_CONFIG = {
    "provider": "openai",
    "model": "gpt-4o",
    # Read the key from the environment; never hard-code it.
    "api_key": os.getenv("OPENAI_API_KEY"),
}

You can mix: some agents use local Ollama, others use cloud APIs. Each agent has its own LLM_CONFIG.
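
As an illustration of a mixed setup, the sketch below checks whether a given LLM_CONFIG depends on a local Ollama instance or on a cloud API key. It is an assumption-laden example rather than abi-core's own logic: the "ollama" provider value, the "base_url" key, and the helpers needs_ollama and validate are all hypothetical.

```python
def needs_ollama(llm_config):
    """True when the agent relies on a local Ollama instance.

    Assumes local agents set provider to "ollama" (hypothetical value).
    """
    return llm_config.get("provider") == "ollama"


def validate(llm_config):
    """Return a list of human-readable problems found in the config."""
    problems = []
    if not llm_config.get("model"):
        problems.append("model is missing")
    if needs_ollama(llm_config):
        # "base_url" is a hypothetical key holding the OLLAMA_HOST address.
        if not llm_config.get("base_url"):
            problems.append("base_url is missing for local Ollama")
    elif not llm_config.get("api_key"):
        provider = llm_config.get("provider")
        problems.append(f"api_key is missing for cloud provider {provider!r}")
    return problems


# One local agent and one cloud agent in the same project.
local_agent = {
    "provider": "ollama",
    "model": "qwen2.5:3b",
    "base_url": "http://localhost:11434",
}
cloud_agent = {"provider": "openai", "model": "gpt-4o", "api_key": None}

for name, config in [("local", local_agent), ("cloud", cloud_agent)]:
    print(name, needs_ollama(config), validate(config))
```

A check like this at agent startup makes a missing API key fail early instead of surfacing on the first model call.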

Switch strategy

Edit .abi/runtime.yaml:

project:
  model_serving: centralized  # or distributed

Then rebuild: docker compose up --build -d
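
If you script the switch, a small helper can make the edit and reject unknown values before you rebuild. This is a minimal sketch that assumes runtime.yaml contains a single model_serving: key laid out as shown above. The helper names set_model_serving and switch_strategy are hypothetical, and the edit is done with a regular expression rather than a YAML parser, so the trailing comment is preserved.

```python
import re
from pathlib import Path

VALID_STRATEGIES = ("centralized", "distributed")

# Matches "model_serving: <value>" on one line, keeping indentation and
# anything after the value (for example a trailing comment).
_PATTERN = re.compile(r"^([ \t]*model_serving:[ \t]*)\S+(.*)$", re.MULTILINE)


def set_model_serving(yaml_text, strategy):
    """Return `yaml_text` with the model_serving value replaced."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    new_text, count = _PATTERN.subn(rf"\g<1>{strategy}\g<2>", yaml_text, count=1)
    if count == 0:
        raise ValueError("no model_serving key found")
    return new_text


def switch_strategy(strategy, path=".abi/runtime.yaml"):
    """Rewrite the runtime file in place with the new strategy."""
    runtime_file = Path(path)
    runtime_file.write_text(set_model_serving(runtime_file.read_text(), strategy))
```

After calling switch_strategy("distributed") from the project root, rebuild with docker compose up --build -d as described above.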

Next step

👉 Monitoring & Logs