# Model Serving Two strategies for running AI models. Choose when you create the project. ## Centralized (recommended) One shared Ollama instance serves all agents. Less RAM, easier to manage. ```bash abi-core create project my-app --model-serving centralized ``` ``` ┌─────────────┐ │ Ollama │ ← All agents connect here └─────────────┘ ↑ Agent 1, Agent 2, Agent 3 ``` Agents point to the same `OLLAMA_HOST`: ```yaml environment: - OLLAMA_HOST=http://my-app-ollama:11434 - START_OLLAMA=false ``` ## Distributed Each agent runs its own Ollama. Full isolation, independent model versions. ```bash abi-core create project my-app --model-serving distributed ``` ``` Agent 1 ← Ollama 1 (qwen2.5:3b) Agent 2 ← Ollama 2 (llama3:8b) Agent 3 ← Ollama 3 (mistral:7b) ``` Agents manage their own Ollama: ```yaml environment: - OLLAMA_HOST=http://localhost:11434 - START_OLLAMA=true - LOAD_MODELS=true ``` ## Cloud providers (no Ollama needed) If your agent uses OpenAI, Gemini, or another cloud provider, it doesn't need Ollama at all: ```python # config.py LLM_CONFIG = { "provider": "openai", "model": "gpt-4o", "api_key": os.getenv("OPENAI_API_KEY"), } ``` You can mix: some agents use local Ollama, others use cloud APIs. Each agent has its own `LLM_CONFIG`. ## Switch strategy Edit `.abi/runtime.yaml`: ```yaml project: model_serving: centralized # or distributed ``` Then rebuild: `docker compose up --build -d` ## Next step 👉 [Monitoring & Logs](02-monitoring-logs.md)