
Ollama Adapter

Overview

Mechanism: HTTP POST to the local Ollama server (/api/chat) with stream: true.

Why Ollama for cheap/local tasks:

  • Zero cost per call — ideal for classification, routing, extraction
  • Sensitive client data never leaves the machine — data sovereignty
  • Works offline / air-gapped — required for some enterprise on-prem tenants
  • No rate limits, no API quotas

When to use:

  • Topic Researcher, Research Note Writer
  • Content Researcher (scraping + extraction)
  • Task classification and routing
  • Session summarisation (reduce cloud spend)
  • Any task involving sensitive client data where the tenant has set dataPrivacyLevel: 'local_only'
  • Enterprise on-prem tenants that have disabled cloud providers

Configuration

interface OllamaAdapterConfig {
  model: string;     // e.g. 'gemma3:4b', 'llama3.2', 'mistral'
  baseUrl: string;   // default: http://ollama:11434
  timeoutMs: number;
}
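
A typical instance might look like the following (the timeoutMs value is illustrative, not a mandated default):

const config: OllamaAdapterConfig = {
  model: 'gemma3:4b',
  baseUrl: 'http://ollama:11434',
  timeoutMs: 120_000, // 2 minutes; tune per deployment
};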

How Data Flows IN

POST http://ollama:11434/api/chat
Content-Type: application/json

{
  "model": "gemma3:4b",
  "stream": true,
  "messages": [
    { "role": "system", "content": "<skills content prepended + system prompt>" },
    { "role": "user", "content": "<task prompt>" }
  ]
}

The system content is the skills Markdown concatenated with the system prompt. Ollama’s message format mirrors OpenAI’s chat format. Conversation history is threaded as prior user/assistant turns appended to the messages array.
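
A minimal sketch of how this body could be assembled. The names here (buildRequestBody, ChatMessage, the history parameter) are illustrative, not the adapter's actual API:

// Sketch only: assemble the /api/chat request body.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function buildRequestBody(
  model: string,
  systemContent: string,   // skills Markdown + system prompt
  history: ChatMessage[],  // prior user/assistant turns from the session
  taskPrompt: string
) {
  return {
    model,
    stream: true,
    messages: [
      { role: 'system', content: systemContent },
      ...history,
      { role: 'user', content: taskPrompt },
    ],
  };
}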


How Data Flows OUT — NDJSON Streaming

{ "model": "gemma3:4b", "message": { "role": "assistant", "content": "Why " }, "done": false } { "model": "gemma3:4b", "message": { "role": "assistant", "content": "Your " }, "done": false } { "model": "gemma3:4b", "message": { "role": "assistant", "content": "" }, "done": true, "eval_count": 89, "prompt_eval_count": 201 }

The adapter reads the response body line by line (NDJSON). When done: false, it accumulates message.content. When done: true, it captures eval_count (output tokens) and prompt_eval_count (input tokens) for cost logging. Cost is always $0.00 for local models.
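
A minimal sketch of that read loop, assuming a WHATWG fetch Response and a hypothetical recordUsage() hook for the cost log:

// Sketch only: consume Ollama's NDJSON stream.
async function readOllamaStream(
  res: Response,
  recordUsage: (inputTokens: number, outputTokens: number) => void
): Promise<string> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let content = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let nl: number;
    while ((nl = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (!line) continue;
      const chunk = JSON.parse(line);
      if (chunk.done) {
        // The final chunk carries token counts; local cost is always $0.00.
        recordUsage(chunk.prompt_eval_count ?? 0, chunk.eval_count ?? 0);
      } else {
        content += chunk.message.content;
      }
    }
  }
  return content;
}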


Session Handling

Ollama has no built-in session management. History is maintained manually:

  • Message history stored in MongoDB sessions collection (same pattern as OpenAI)
  • Context limit is model-dependent (Gemma 3 4B = 8k tokens); older turns are summarised when the history approaches the limit (see the sketch below)
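
A rough sketch of that trimming step, assuming a crude chars/4 token estimate and a hypothetical summarise() helper; neither is the adapter's actual implementation:

// Sketch only: compact older turns when nearing the context limit.
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

const CONTEXT_LIMIT = 8_000; // Gemma 3 4B, per this deployment
const HEADROOM = 1_500;      // reserve room for the model's reply

function estimateTokens(messages: Msg[]): number {
  // Crude heuristic: roughly 4 characters per token.
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

async function compactHistory(
  history: Msg[],
  summarise: (turns: Msg[]) => Promise<string>
): Promise<Msg[]> {
  if (estimateTokens(history) < CONTEXT_LIMIT - HEADROOM) return history;
  const keep = history.slice(-4); // keep the newest turns verbatim
  const summary = await summarise(history.slice(0, -4));
  return [{ role: 'assistant', content: `Summary of earlier turns: ${summary}` }, ...keep];
}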

Skills Injection

There is no --add-dir equivalent. Skills content is injected directly into the system message:

  • Skills content is retrieved from MongoDB as Markdown strings
  • Concatenated and prepended to the system message (sketched below): [skill1content]\n\n---\n\n[skill2content]\n\n---\n\n[systemPrompt]
  • All content is sent upfront with every request — this consumes tokens on every call, unlike Claude’s --add-dir which is lazy (Claude reads files only when it needs them)
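
A one-line sketch of that concatenation, assuming skills arrive as an array of Markdown strings:

// Sketch only: prepend skills Markdown to the system prompt.
function buildSystemContent(skills: string[], systemPrompt: string): string {
  return [...skills, systemPrompt].join('\n\n---\n\n');
}

For two skills this yields exactly [skill1content]\n\n---\n\n[skill2content]\n\n---\n\n[systemPrompt].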

Health Checks

Before dispatching a task, the worker optionally runs testAdapter() to verify the adapter is operational.

OllamaAdapter checks:

Check             What it verifies
Server reachable  GET http://ollama:11434/api/tags returns 200
Model available   Configured model name appears in the tags list
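
A minimal sketch of testAdapter(), assuming the standard /api/tags response shape ({ "models": [{ "name": "..." }] }):

// Sketch only: verify the server is reachable and the model is pulled.
async function testAdapter(
  baseUrl: string,
  model: string
): Promise<{ ok: boolean; error?: string }> {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) return { ok: false, error: `server returned ${res.status}` };
    const { models } = (await res.json()) as { models: { name: string }[] };
    // Ollama tags models like 'gemma3:4b'; accept an exact or tagless match.
    const found = models.some(
      (m) => m.name === model || m.name.split(':')[0] === model
    );
    return found ? { ok: true } : { ok: false, error: `model '${model}' not in tags list` };
  } catch (err) {
    return { ok: false, error: `server unreachable: ${String(err)}` };
  }
}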

When health checks run:

  • On agent creation — validate before the agent is set to active
  • On manual “Test Connection” from the Manage App agent config screen
  • Optionally on worker startup — configurable per deployment; skipped if ADAPTER_HEALTH_CHECK_ON_START=false
  • Health check failures do not block existing queued work — they surface as a warning on the agent config record

Cost Source

Always $0.00 — local inference, no API billing. Token counts (eval_count + prompt_eval_count) are still recorded in llm_calls for usage visibility.
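
A hypothetical llm_calls record for a local call; the field names are illustrative, not the actual schema (inputTokens mirrors prompt_eval_count, outputTokens mirrors eval_count):

{
  "adapter": "ollama",
  "model": "gemma3:4b",
  "inputTokens": 201,
  "outputTokens": 89,
  "costUsd": 0
}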


Timeout

The HTTP fetch is wrapped with an AbortController that aborts at timeoutMs (see the sketch below). On timeout, the activity run is marked failed with error: 'timeout' and the BullMQ retry policy picks it up.
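
A minimal sketch of that wiring, assuming a WHATWG fetch (Node 18+); postWithTimeout is an illustrative name:

// Sketch only: abort the Ollama request after timeoutMs.
async function postWithTimeout(
  url: string,
  body: unknown,
  timeoutMs: number
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timer);
  }
}

When the controller aborts, fetch rejects with an AbortError; the caller maps that rejection to error: 'timeout' and leaves the retry to BullMQ.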

© 2026 Leadmetrics — Internal use only