
Ollama Adapter

Overview

Mechanism: HTTP POST to the local Ollama server (/api/chat) with stream: true.

Why Ollama for cheap/local tasks:

  • Zero cost per call — ideal for classification, routing, extraction
  • Sensitive client data never leaves the machine — data sovereignty
  • Works offline / air-gapped — required for some enterprise on-prem tenants
  • No rate limits, no API quotas

When to use:

  • Topic Researcher, Research Note Writer
  • Content Researcher (scraping + extraction)
  • Task classification and routing
  • Session summarisation (reduce cloud spend)
  • Any task involving sensitive client data where the tenant has set dataPrivacyLevel: 'local_only'
  • Enterprise on-prem tenants that have disabled cloud providers

Configuration

interface OllamaAdapterConfig {
  model: string;     // e.g. 'gemma3:4b', 'llama3.2', 'mistral'
  baseUrl: string;   // default: http://ollama:11434
  timeoutMs: number;
}
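
A typical instance might look like the following (the timeoutMs value is illustrative, not a mandated default):

const config: OllamaAdapterConfig = {
  model: 'gemma3:4b',
  baseUrl: 'http://ollama:11434',
  timeoutMs: 120_000, // 2 minutes; tune per deployment
};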

How Data Flows IN

POST http://ollama:11434/api/chat
Content-Type: application/json

{
  "model": "gemma3:4b",
  "stream": true,
  "messages": [
    { "role": "system", "content": "<skills content prepended + system prompt>" },
    { "role": "user", "content": "<task prompt>" }
  ]
}

The system content is the skills Markdown concatenated with the system prompt. Ollama’s message format mirrors OpenAI’s chat format. Conversation history is threaded as prior user/assistant turns appended to the messages array.
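
A minimal sketch of how this body could be assembled. The names here (buildRequestBody, ChatMessage, the history parameter) are illustrative, not the adapter's actual API:

// Sketch only: assemble the /api/chat request body.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function buildRequestBody(
  model: string,
  systemContent: string,   // skills Markdown + system prompt
  history: ChatMessage[],  // prior user/assistant turns from the session
  taskPrompt: string
) {
  return {
    model,
    stream: true,
    messages: [
      { role: 'system', content: systemContent },
      ...history,
      { role: 'user', content: taskPrompt },
    ],
  };
}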


How Data Flows OUT — NDJSON Streaming

{ "model": "gemma3:4b", "message": { "role": "assistant", "content": "Why " }, "done": false } { "model": "gemma3:4b", "message": { "role": "assistant", "content": "Your " }, "done": false } { "model": "gemma3:4b", "message": { "role": "assistant", "content": "" }, "done": true, "eval_count": 89, "prompt_eval_count": 201 }

The adapter reads the response body line by line (NDJSON). When done: false, it accumulates message.content. When done: true, it captures eval_count (output tokens) and prompt_eval_count (input tokens) for cost logging. Cost is always $0.00 for local models.
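
A minimal sketch of that read loop, assuming a WHATWG fetch Response and a hypothetical recordUsage() hook for the cost log:

// Sketch only: consume Ollama's NDJSON stream.
async function readOllamaStream(
  res: Response,
  recordUsage: (inputTokens: number, outputTokens: number) => void
): Promise<string> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let content = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let nl: number;
    while ((nl = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (!line) continue;
      const chunk = JSON.parse(line);
      if (chunk.done) {
        // The final chunk carries token counts; local cost is always $0.00.
        recordUsage(chunk.prompt_eval_count ?? 0, chunk.eval_count ?? 0);
      } else {
        content += chunk.message.content;
      }
    }
  }
  return content;
}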


Session Handling

Ollama has no built-in session management. History is maintained manually:

  • Message history stored in MongoDB sessions collection (same pattern as OpenAI)
  • Context limit is model-dependent (Gemma 3 4B = 8k tokens); older turns are summarised when the history approaches the limit (see the sketch below)
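
A rough sketch of that trimming step, assuming a crude chars/4 token estimate and a hypothetical summarise() helper; neither is the adapter's actual implementation:

// Sketch only: compact older turns when nearing the context limit.
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

const CONTEXT_LIMIT = 8_000; // Gemma 3 4B, per this deployment
const HEADROOM = 1_500;      // reserve room for the model's reply

function estimateTokens(messages: Msg[]): number {
  // Crude heuristic: roughly 4 characters per token.
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

async function compactHistory(
  history: Msg[],
  summarise: (turns: Msg[]) => Promise<string>
): Promise<Msg[]> {
  if (estimateTokens(history) < CONTEXT_LIMIT - HEADROOM) return history;
  const keep = history.slice(-4); // keep the newest turns verbatim
  const summary = await summarise(history.slice(0, -4));
  return [{ role: 'assistant', content: `Summary of earlier turns: ${summary}` }, ...keep];
}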

Skills Injection

There is no --add-dir equivalent. Skills content is injected directly into the system message:

  • Skills content is retrieved from MongoDB as Markdown strings
  • Concatenated and prepended to the system message (sketched below): [skill1content]\n\n---\n\n[skill2content]\n\n---\n\n[systemPrompt]
  • All content is sent upfront with every request — this consumes tokens on every call, unlike Claude’s --add-dir which is lazy (Claude reads files only when it needs them)
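
A one-line sketch of that concatenation, assuming skills arrive as an array of Markdown strings:

// Sketch only: prepend skills Markdown to the system prompt.
function buildSystemContent(skills: string[], systemPrompt: string): string {
  return [...skills, systemPrompt].join('\n\n---\n\n');
}

For two skills this yields exactly [skill1content]\n\n---\n\n[skill2content]\n\n---\n\n[systemPrompt].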

Health Checks

Before dispatching a task, the worker optionally runs testAdapter() to verify the adapter is operational.

OllamaAdapter checks:

Check             What it verifies
Server reachable  GET http://ollama:11434/api/tags returns 200
Model available   Configured model name appears in the tags list
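
A minimal sketch of testAdapter(), assuming the standard /api/tags response shape ({ "models": [{ "name": "..." }] }):

// Sketch only: verify the server is reachable and the model is pulled.
async function testAdapter(
  baseUrl: string,
  model: string
): Promise<{ ok: boolean; error?: string }> {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) return { ok: false, error: `server returned ${res.status}` };
    const { models } = (await res.json()) as { models: { name: string }[] };
    // Ollama tags models like 'gemma3:4b'; accept an exact or tagless match.
    const found = models.some(
      (m) => m.name === model || m.name.split(':')[0] === model
    );
    return found ? { ok: true } : { ok: false, error: `model '${model}' not in tags list` };
  } catch (err) {
    return { ok: false, error: `server unreachable: ${String(err)}` };
  }
}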

When health checks run:

  • On agent creation — validate before the agent is set to active
  • On manual “Test Connection” from the Manage App agent config screen
  • Optionally on worker startup — configurable per deployment; skipped if ADAPTER_HEALTH_CHECK_ON_START=false
  • Health check failures do not block existing queued work — they surface as a warning on the agent config record

Cost Source

Always $0.00 — local inference, no API billing. Token counts (eval_count + prompt_eval_count) are still recorded in llm_calls for usage visibility.
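
A hypothetical llm_calls record for a local call; the field names are illustrative, not the actual schema (inputTokens mirrors prompt_eval_count, outputTokens mirrors eval_count):

{
  "adapter": "ollama",
  "model": "gemma3:4b",
  "inputTokens": 201,
  "outputTokens": 89,
  "costUsd": 0
}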


Timeout

The HTTP fetch is wrapped with an AbortController that aborts at timeoutMs (see the sketch below). On timeout, the activity run is marked failed with error: 'timeout' and the BullMQ retry policy picks it up.
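
A minimal sketch of that wiring, assuming a WHATWG fetch (Node 18+); postWithTimeout is an illustrative name:

// Sketch only: abort the Ollama request after timeoutMs.
async function postWithTimeout(
  url: string,
  body: unknown,
  timeoutMs: number
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timer);
  }
}

When the controller aborts, fetch rejects with an AbortError; the caller maps that rejection to error: 'timeout' and leaves the retry to BullMQ.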

© 2026 Leadmetrics — Internal use only