Ollama Adapter
Overview
Mechanism: HTTP POST to the local Ollama server (`/api/chat`) with `stream: true`.
Why Ollama for cheap/local tasks:
- Zero cost per call — ideal for classification, routing, extraction
- Sensitive client data never leaves the machine — data sovereignty
- Works offline / air-gapped — required for some enterprise on-prem tenants
- No rate limits, no API quotas
When to use:
- Topic Researcher, Research Note Writer
- Content Researcher (scraping + extraction)
- Task classification and routing
- Session summarisation (reduce cloud spend)
- Any task involving sensitive client data where the tenant has set `dataPrivacyLevel: 'local_only'` (see the routing sketch after this list)
- Enterprise on-prem tenants that have disabled cloud providers
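As a rough illustration of that routing decision, a worker might select the adapter as sketched below. The tenant shape, the `OllamaAdapter` constructor arguments, and the `pickCloudAdapter()` fallback are all assumptions for the sketch, not the actual implementation:

```typescript
// Hypothetical adapter selection based on tenant privacy settings.
function pickAdapter(tenant: { dataPrivacyLevel?: string; cloudProvidersDisabled?: boolean }) {
  if (tenant.dataPrivacyLevel === 'local_only' || tenant.cloudProvidersDisabled) {
    // Sensitive or air-gapped tenants stay on the local Ollama server.
    return new OllamaAdapter({
      model: 'gemma3:4b',
      baseUrl: 'http://ollama:11434',
      timeoutMs: 120_000,
    });
  }
  return pickCloudAdapter(tenant); // e.g. the OpenAI or Claude adapter (not shown)
}
```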
Configuration
```typescript
interface OllamaAdapterConfig {
  model: string;     // e.g. 'gemma3:4b', 'llama3.2', 'mistral'
  baseUrl: string;   // default: http://ollama:11434
  timeoutMs: number; // request timeout, enforced via AbortController (see Timeout below)
}
```

How Data Flows IN
```
POST http://ollama:11434/api/chat
Content-Type: application/json

{
  "model": "gemma3:4b",
  "stream": true,
  "messages": [
    { "role": "system", "content": "<skills content prepended + system prompt>" },
    { "role": "user", "content": "<task prompt>" }
  ]
}
```

The system content is the skills Markdown concatenated with the system prompt. Ollama’s message format mirrors OpenAI’s chat format. Conversation history is threaded as prior user/assistant turns appended to the messages array.
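A minimal sketch of how that messages array might be assembled; the `ChatMessage` shape and function name are illustrative, not the adapter's actual API:

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Assemble the request payload: skills + system prompt up front,
// then prior turns from the session store, then the new task prompt.
function buildMessages(
  systemContent: string,   // skills Markdown + system prompt, pre-concatenated
  history: ChatMessage[],  // prior user/assistant turns
  taskPrompt: string,
): ChatMessage[] {
  return [
    { role: 'system', content: systemContent },
    ...history,
    { role: 'user', content: taskPrompt },
  ];
}
```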
How Data Flows OUT — NDJSON Streaming
{ "model": "gemma3:4b", "message": { "role": "assistant", "content": "Why " }, "done": false }
{ "model": "gemma3:4b", "message": { "role": "assistant", "content": "Your " }, "done": false }
{ "model": "gemma3:4b", "message": { "role": "assistant", "content": "" }, "done": true, "eval_count": 89, "prompt_eval_count": 201 }The adapter reads the response body line by line (NDJSON). When done: false, it accumulates message.content. When done: true, it captures eval_count (output tokens) and prompt_eval_count (input tokens) for cost logging. Cost is always $0.00 for local models.
Session Handling
Ollama has no built-in session management. History is maintained manually:
- Message history is stored in the MongoDB `sessions` collection (same pattern as OpenAI)
- Context limit is model-dependent (Gemma 3 4B = 8k tokens); older turns are summarised when approaching the limit (see the sketch after this list)
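One possible shape for that limit check; the 4-characters-per-token heuristic, the 80% headroom, and all names are assumptions for the sketch:

```typescript
const CONTEXT_LIMIT_TOKENS = 8_000; // Gemma 3 4B, per the note above
const HEADROOM = 0.8;               // start summarising at 80% of the limit (assumption)

// Very rough token estimate: ~4 characters per token for English text.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function needsSummarisation(history: { content: string }[]): boolean {
  const total = history.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return total > CONTEXT_LIMIT_TOKENS * HEADROOM;
}
```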
Skills Injection
There is no `--add-dir` equivalent. Skills content is injected directly into the system message:
- Skills content is retrieved from MongoDB as Markdown strings
- Concatenated and prepended to the system message:
  `[skill1 content]\n\n---\n\n[skill2 content]\n\n---\n\n[systemPrompt]`
- All content is sent upfront with every request — this consumes tokens on every call, unlike Claude’s `--add-dir`, which is lazy (Claude reads files only when it needs them)
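A minimal sketch of that concatenation, assuming the skills arrive as an array of Markdown strings (`skillDocs` and the function name are illustrative):

```typescript
// Join skill docs and the system prompt with '---' separators,
// matching the format shown above.
function buildSystemContent(skillDocs: string[], systemPrompt: string): string {
  return [...skillDocs, systemPrompt].join('\n\n---\n\n');
}
```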
Health Checks
Before dispatching a task, the worker optionally runs `testAdapter()` to verify the adapter is operational.
OllamaAdapter checks:
| Check | What it verifies |
|---|---|
| Server reachable | GET http://ollama:11434/api/tags returns 200 |
| Model available | Configured model name appears in the tags list |
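A sketch of what `testAdapter()` might do for this adapter. The `/api/tags` endpoint and its `{ models: [{ name }] }` response shape are Ollama's real API; the function signature and return shape are assumptions:

```typescript
async function testAdapter(config: OllamaAdapterConfig): Promise<{ ok: boolean; reason?: string }> {
  // Check 1: server reachable.
  const res = await fetch(`${config.baseUrl}/api/tags`);
  if (!res.ok) return { ok: false, reason: `Ollama server returned ${res.status}` };

  // Check 2: configured model appears in the tags list.
  const { models } = (await res.json()) as { models: { name: string }[] };
  const available = models.some(
    (m) => m.name === config.model || m.name.startsWith(`${config.model}:`),
  );
  return available
    ? { ok: true }
    : { ok: false, reason: `model '${config.model}' not found in /api/tags` };
}
```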
When health checks run:
- On agent creation — validate before the agent is set to `active`
- On manual “Test Connection” from the Manage App agent config screen
- Optionally on worker startup — configurable per deployment; skipped if `ADAPTER_HEALTH_CHECK_ON_START=false` (see the sketch after this list)
- Health check failures do not block existing queued work — they surface as a warning on the agent config record
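A sketch of the startup gate, reusing the `testAdapter()` sketch above; how the warning reaches the agent config record is not shown, so a plain log stands in:

```typescript
// Run the optional startup health check unless explicitly disabled.
if (process.env.ADAPTER_HEALTH_CHECK_ON_START !== 'false') {
  const result = await testAdapter(config);
  if (!result.ok) {
    // Warn rather than block: existing queued work keeps running.
    console.warn(`Ollama adapter health check failed: ${result.reason}`);
  }
}
```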
Cost Source
Always $0.00 — local inference, no API billing. Token counts (`eval_count` + `prompt_eval_count`) are still recorded in `llm_calls` for usage visibility.
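For illustration, a recorded call could look like the following. Only the token fields and the zero cost come from the text above; every other field name is hypothetical:

```typescript
// db: a connected MongoDB `Db` instance (e.g. from the mongodb driver).
await db.collection('llm_calls').insertOne({
  provider: 'ollama',
  model: 'gemma3:4b',
  inputTokens: 201, // prompt_eval_count from the final NDJSON chunk
  outputTokens: 89, // eval_count from the final NDJSON chunk
  costUsd: 0,       // local inference: always $0.00
  createdAt: new Date(),
});
```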
Timeout
HTTP fetch with an AbortController that fires at `timeoutMs`. The activity run is marked failed with `error: 'timeout'`, and the BullMQ retry policy picks it up.
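A sketch of the timeout wiring; `payload` stands in for the request body, and the error mapping details are assumptions:

```typescript
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), config.timeoutMs);
try {
  const res = await fetch(`${config.baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    signal: controller.signal, // abort cancels the in-flight request
  });
  // ... NDJSON stream handling as sketched earlier ...
} catch (err) {
  if ((err as Error).name === 'AbortError') {
    // Mark the activity run failed; BullMQ's retry policy re-enqueues it.
    throw new Error('timeout');
  }
  throw err;
} finally {
  clearTimeout(timer);
}
```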