Performance
Performance practices, patterns, and targets across the Leadmetrics platform. Covers database, API, queue, frontend, mobile, LLM cost optimisation, and observability.
Related: Tech Stack — Backend | Tech Stack — Web | Tech Stack — Mobile | Infrastructure | Observability
Performance Targets
| Metric | Target | Measured by |
|---|---|---|
| API p50 response time | < 80 ms | OpenTelemetry spans → Grafana |
| API p99 response time | < 500 ms | OpenTelemetry spans → Grafana |
| Dashboard initial page load (LCP) | < 2.5 s | Vercel Speed Insights / Lighthouse |
| Dashboard Time to Interactive | < 3.5 s | Lighthouse CI in GitHub Actions |
| Mobile app cold start | < 2 s | React Native Performance Monitor |
| Mobile app frame rate | 60 fps (no jank) | Flipper / React Native Perf |
| BullMQ job pick-up latency | < 500 ms from enqueue | BullMQ metrics → Grafana |
| Agent dispatch-to-first-token | < 3 s | Activity run dispatchedAt → firstTokenAt |
| SSE first-event latency | < 200 ms | OpenTelemetry span from enqueue to SSE write |
| PostgreSQL query p99 | < 20 ms | pg_stat_statements → Grafana |
Database Performance
PostgreSQL
Indexing strategy:
- Every `tenant_id` column is indexed — all queries are tenant-scoped; this is the most-used filter
- Composite indexes on the most common query patterns:

```sql
-- Activities: tenant + status (approval queue, activity list)
CREATE INDEX idx_activities_tenant_status ON activities(tenant_id, status);

-- Activities: tenant + deliverable (deliverable detail view)
CREATE INDEX idx_activities_tenant_deliverable ON activities(tenant_id, deliverable_id);

-- LLM calls: tenant + created_on (cost dashboard aggregation)
CREATE INDEX idx_llm_calls_tenant_created ON llm_calls(tenant_id, created_on DESC);

-- Approvals: tenant + status + expires_at (approval queue + expiry cron)
CREATE INDEX idx_approvals_tenant_status_expiry ON approvals(tenant_id, status, expires_at);
```

- Partial indexes for hot paths:

```sql
-- Only index pending approvals (completed approvals are cold)
CREATE INDEX idx_approvals_pending ON approvals(tenant_id, expires_at) WHERE status = 'pending';

-- Only index non-terminated agents (terminated agents are never queried in workers)
CREATE INDEX idx_agent_configs_active ON agent_configs(tenant_id, role) WHERE status != 'terminated';
```
Connection pooling:
- PgBouncer in transaction mode with pool size 25 per app instance
- Connection string points to PgBouncer, not PostgreSQL directly
- Each Fastify worker uses a pool of max 10 connections to PgBouncer
Query patterns:
- No N+1 queries — Prisma `include` relations are used for joins; never fetch a list then loop to fetch related records
- `SELECT *` is banned in application code (ESLint rule) — only required columns are fetched
- Aggregation queries (cost dashboards, goal progress) use PostgreSQL `GROUP BY` with index-covered filters; never computed in JavaScript
- Read-heavy list endpoints use cursor pagination instead of `LIMIT`/`OFFSET` — no full table scans to compute offsets
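A minimal sketch of the cursor idea, using an in-memory array for illustration (the real queries go through Prisma; the `Row` shape and `listActivities` helper here are hypothetical):

```typescript
type Row = { id: number; createdOn: string };

// Keyset pagination: filter past the cursor, then take `limit` rows.
// Unlike OFFSET, the cost does not grow with how deep the client has paged.
function listActivities(rows: Row[], limit: number, cursor?: number) {
  const sorted = [...rows].sort((a, b) => b.id - a.id); // newest first
  const after = cursor === undefined ? sorted : sorted.filter((r) => r.id < cursor);
  const page = after.slice(0, limit);
  return { page, nextCursor: page.length === limit ? page[page.length - 1].id : null };
}

const rows = [1, 2, 3, 4, 5].map((id) => ({ id, createdOn: `2024-01-0${id}` }));
const first = listActivities(rows, 2);                     // ids [5, 4]
const second = listActivities(rows, 2, first.nextCursor!); // ids [3, 2]
```

The returned `nextCursor` is the last row's `id`, which the client echoes back on the next request.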
Slow query monitoring:
- `pg_stat_statements` enabled with `track = all`
- Queries exceeding 100 ms logged to Grafana Loki with full query text and `EXPLAIN` plan
- Alert fires when p99 query time exceeds 50 ms for more than 2 minutes
MongoDB
Index strategy:
- `tenantId` compound index on every collection
- `activity_streams`: TTL index on `createdAt` (24 h expiry) — streaming buffers auto-cleared
- `audit_logs`: compound index on `(tenantId, createdOn DESC)` for paginated log views
- `skills`: text index on `name` + `content` for skill search
Lean queries:
- `.lean()` used on all read-only Mongoose queries — skips hydrating full Mongoose Document objects
- Projection specified on all queries — `{ content: 1 }`, not `{}` (fetch all fields)
Write patterns:
- `activity_streams` (SSE buffer): uses `insertOne`, not `insertMany`, for real-time low-latency writes
- `audit_logs`: fire-and-forget async writes — audit logging never delays the API response
Qdrant (Vector Store)
- Collections are created with HNSW index (default) — approximate nearest-neighbour for sub-10ms similarity search at scale
- Embedding batch size: 32 documents per batch on upload — balances throughput vs memory
- Payload filters always applied before vector search to reduce the candidate set (filter on `tenantId` first)
API Performance
Fastify Configuration
- `logger: false` in production — structured logging via Pino directly; Fastify's built-in logger adds overhead
- Schema-based serialisation: every response type has a JSON Schema defined on the route; Fastify uses `fast-json-stringify` to serialise responses ~2× faster than `JSON.stringify`
- `keepAliveTimeout: 5000` — keep HTTP connections alive to avoid TCP handshake overhead for the same client
Response Caching
| Endpoint | Cache | TTL | Invalidation |
|---|---|---|---|
| `GET /mobile/v1/home` | Redis (per tenant) | 60 s | On approval created/resolved, activity status change |
| `GET /dashboard/v1/analytics/spend` | Redis (per tenant + period) | 5 min | On new `llm_calls` records |
| `GET /admin/v1/system/health` | Redis (global) | 10 s | None (polling-safe) |
| `GET /agent/v1/skills/:role` manifest | Redis (per tenant + role) | 5 min | On skill assignment change |
| `GET /.well-known/jwks.json` | HTTP `Cache-Control: max-age=3600` | 1 h | On key rotation |
Cache keys include the tenant ID to prevent cross-tenant cache poisoning.
Rate Limiting
Rate limits are enforced in Redis using a sliding window — this protects downstream services from being overloaded by a single client and creates natural backpressure. See API Overview for per-surface limits.
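An in-memory sketch of the sliding-window logic (the real limiter lives in Redis, e.g. one sorted set of timestamps per client, so the window is shared across app instances; the class here is illustrative):

```typescript
// Sliding-window limiter: a request is allowed only if fewer than
// `limit` requests landed in the trailing `windowMs` milliseconds.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  constructor(private limit: number, private windowMs: number) {}

  allow(clientId: string, now: number): boolean {
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(clientId) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(clientId, recent);
      return false; // over the limit — reject without recording a hit
    }
    recent.push(now);
    this.hits.set(clientId, recent);
    return true;
  }
}

const limiter = new SlidingWindowLimiter(2, 1000); // 2 requests per second
const r1 = limiter.allow('c1', 0);    // allowed
const r2 = limiter.allow('c1', 100);  // allowed
const r3 = limiter.allow('c1', 200);  // rejected — window full
const r4 = limiter.allow('c1', 1150); // allowed — earlier hits expired
```

Unlike a fixed-window counter, the trailing window cannot be gamed by bursting at a window boundary.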
Compression
- Fastify `@fastify/compress` enabled for responses > 1 KB
- Brotli for modern browsers; gzip fallback
- SSE streams are not compressed (streaming incompatibility)
Queue Performance (BullMQ)
Concurrency tuning
Each agent role has a configured concurrency on agent_configs:
| Agent | Default concurrency | Notes |
|---|---|---|
| Activity Planner | 2 | Orchestration — low concurrency, high reasoning |
| Copywriter | 4 | Common deliverable — benefits from parallelism |
| SEO Specialist | 3 | Tool-heavy — rate-limited by external APIs |
| Social Media Manager | 4 | Similar to Copywriter |
| Paid Ads Manager | 2 | API rate limits on Google/Meta constrain parallelism |
| Data Analyst | 2 | GA4 quota limits |
| Content Researcher | 8 | Ollama (local, free, fast) — high concurrency fine |
Priority queuing
Job priority is set at enqueue time. High-priority jobs (triggered by human approval of time-sensitive deliverables) jump ahead of normal background tasks. Priority levels: 1 (high) → 10 (low, background).
Job deduplication
Recurring jobs use a deduplication key {tenantId}:{templateId}:{periodKey} — if a cron job fires and an identical job is already in the queue, the duplicate is dropped silently.
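A sketch of the deduplication guard (in-memory here for illustration; BullMQ itself can enforce this by passing the key as the job ID, making a duplicate add a no-op):

```typescript
// Builds the deduplication key described above.
function dedupKey(tenantId: string, templateId: string, periodKey: string): string {
  return `${tenantId}:${templateId}:${periodKey}`;
}

// In-memory stand-in for the queue's dedup check.
const queued = new Set<string>();
function enqueueOnce(key: string): boolean {
  if (queued.has(key)) return false; // duplicate dropped silently
  queued.add(key);
  return true;
}

const k = dedupKey('t_123', 'weekly-report', '2024-W07');
const firstAdd = enqueueOnce(k);  // job queued
const secondAdd = enqueueOnce(k); // duplicate dropped
```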
Dead letter handling
Failed jobs beyond their retry limit move to the dead letter queue. The dead letter handler:
- Sets the agent status to `error`
- Creates a human escalation activity
- Emits an SSE event to the DM portal
- Fires an alert to Grafana
The dead letter queue is monitored; sustained dead letter growth triggers an on-call alert.
Frontend Performance (Web)
Server Components first
- Campaign lists, deliverable content, reports, approval detail — rendered on the server (no client-side data fetch, no hydration overhead)
- Only interactive components (forms, live feeds, charts) are Client Components
- Large markdown content (blog deliverables) rendered server-side to HTML with `remark` — no `react-markdown` shipped to the client
Code splitting
- Next.js automatic per-route code splitting
- Heavy components (TipTap editor, Recharts) lazy-loaded with `next/dynamic`:

```tsx
const RichTextEditor = dynamic(() => import('./rich-text-editor'), {
  loading: () => <Skeleton />,
  ssr: false,
});
```

- The approval detail screen (D5/D6) loads TipTap only when the user opens an approval — not on initial page load
Image optimisation
- All tenant-uploaded images served via the Next.js `<Image>` component — automatic WebP conversion, responsive sizes, lazy loading
- Avatar images: 64×64 px max, served from CDN with a long `Cache-Control`
Bundle size
- `pnpm build` output analysed in CI with `@next/bundle-analyzer`
- Bundle size budget: Dashboard main bundle < 200 KB gzipped
- Alert when a PR increases the main bundle by > 10 KB
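The budget rules above can be sketched as a simple CI gate (the `checkBundle` function and the warn/fail split are illustrative; the real check runs via `@next/bundle-analyzer` output):

```typescript
const BUDGET_KB = 200;   // hard budget for the gzipped main bundle
const PR_DELTA_KB = 10;  // alert threshold for growth in a single PR

// Compares the PR's bundle size against the base branch.
function checkBundle(baseKb: number, prKb: number): 'ok' | 'warn' | 'fail' {
  if (prKb > BUDGET_KB) return 'fail';          // over the absolute budget
  if (prKb - baseKb > PR_DELTA_KB) return 'warn'; // grew too much in one PR
  return 'ok';
}

const ok = checkBundle(150, 155);   // +5 KB, under budget
const warn = checkBundle(150, 165); // +15 KB in one PR
const fail = checkBundle(150, 210); // over the 200 KB budget
```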
TanStack Query caching
- All API responses cached in TanStack Query with an appropriate `staleTime`:

```ts
// Approval list: stale after 30 s (approvals change frequently)
useQuery({ queryKey: ['approvals'], staleTime: 30_000 });

// Campaign list: stale after 2 min (campaigns change infrequently)
useQuery({ queryKey: ['campaigns'], staleTime: 120_000 });

// Analytics/spend: stale after 5 min
useQuery({ queryKey: ['spend'], staleTime: 300_000 });
```

- Background refetch on window focus for time-sensitive data (approvals, activity status)
- SSE events invalidate the query cache selectively: `approval_created` invalidates `['approvals']`, not the full cache
Virtualisation
- Long lists (activity feeds, lead lists, LLM call logs) use TanStack Virtual for windowed rendering — only visible rows are in the DOM
Mobile Performance
FlatList over ScrollView
All list screens use `FlatList` with `windowSize={5}` and `maxToRenderPerBatch={10}`. `ScrollView` is only used for non-list content (campaign detail summary section).
Image caching
`react-native-fast-image` for avatar and channel logo images — memory + disk cache; no re-download on revisit
Avoiding re-renders
- `useCallback` and `useMemo` for functions and derived values passed to `FlatList` `renderItem`
- `React.memo` on list item components to prevent re-renders when the parent re-renders
- TanStack Query's `select` option used to derive minimal data:

```ts
useQuery({
  queryKey: ['approvals'],
  select: (data) => data.filter((a) => a.riskLevel === 'high'),
});
```
Hermes JS engine
Hermes is enabled (default since React Native 0.70) — AOT bytecode compilation reduces JS startup time by ~30% vs JSCore.
Offline-first reads
TanStack Query + MMKV persistence means all previously loaded data is available instantly on app reopen — no loading spinner for returning users. Stale data shows immediately; fresh data loads in background.
Push over polling
The mobile app never polls for updates. All real-time data arrives via push notifications (for locked-screen events) or SSE (for in-app live feeds). This eliminates background polling battery drain.
LLM Cost & Performance Optimisation
Model selection by task
Expensive cloud models are used only where quality is critical:
| Task quality requirement | Model | Cost |
|---|---|---|
| Brand-sensitive copy, strategy | Claude Sonnet 4.6 | $$$ |
| Classification, routing, extraction | Ollama gemma3:4b | $0 |
| Session summarisation | Ollama gemma3:4b | $0 |
| Competitor research scraping | Ollama gemma3:4b | $0 |
Using Ollama for classification and research tasks saves ~30–40% of total LLM spend.
Token budgeting
- Two-stage skill loading (manifest → on-demand load) prevents injecting full skill content for irrelevant skills — see Skills System
- `buildActivityPrompt()` injects only the context relevant to the `wakeReason` — revision activities don't re-inject the full history, only the delta
- `maxTurnsPerRun` limits agent turns per heartbeat; prevents runaway multi-turn loops
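The turn cap can be sketched as follows (the `runHeartbeat` loop and `step` callback are stand-ins for the real heartbeat code, which is not shown here):

```typescript
// The heartbeat loop stops after maxTurnsPerRun turns even if the
// agent has not signalled completion — this is what bounds runaway
// multi-turn loops. `step` runs one agent turn and returns true when
// the agent considers itself done.
function runHeartbeat(maxTurnsPerRun: number, step: (turn: number) => boolean): number {
  let turns = 0;
  for (let turn = 0; turn < maxTurnsPerRun; turn++) {
    turns++;
    if (step(turn)) break; // agent finished early
  }
  return turns;
}

// An agent that never signals completion is still cut off at the cap.
const capped = runHeartbeat(5, () => false);
// An agent that finishes on its second turn stops there.
const early = runHeartbeat(5, (t) => t === 1);
```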
Per-activity cost caps
`agent_configs.max_cost_usd_per_activity` halts dispatch if the current run would exceed the cap. Combined with `campaigns.budget_cap_usd` and `tenants.monthly_spend_cap_usd`, spend is bounded at three levels.
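The three-level check can be sketched as a pre-dispatch guard (the `canDispatch` function is illustrative; the field names mirror the columns above, and the spend aggregates are assumed to be computed elsewhere):

```typescript
type SpendCaps = {
  maxCostUsdPerActivity: number; // agent_configs.max_cost_usd_per_activity
  campaignBudgetCapUsd: number;  // campaigns.budget_cap_usd
  tenantMonthlyCapUsd: number;   // tenants.monthly_spend_cap_usd
};

// Dispatch only proceeds if the estimated run cost fits under all
// three caps: per-activity, per-campaign, and per-tenant-per-month.
function canDispatch(
  caps: SpendCaps,
  estimatedRunCostUsd: number,
  activitySpendUsd: number,
  campaignSpendUsd: number,
  tenantMonthSpendUsd: number,
): boolean {
  return (
    activitySpendUsd + estimatedRunCostUsd <= caps.maxCostUsdPerActivity &&
    campaignSpendUsd + estimatedRunCostUsd <= caps.campaignBudgetCapUsd &&
    tenantMonthSpendUsd + estimatedRunCostUsd <= caps.tenantMonthlyCapUsd
  );
}

const caps = { maxCostUsdPerActivity: 2, campaignBudgetCapUsd: 100, tenantMonthlyCapUsd: 500 };
const allowed = canDispatch(caps, 0.5, 1.0, 40, 200); // fits under all three caps
const blocked = canDispatch(caps, 0.5, 1.8, 40, 200); // activity cap would be exceeded
```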
Session resumption
Claude Code CLI session resumption (`--resume <sessionId>`) carries forward context from prior runs without re-sending history. This avoids re-tokenising prior output on retry or continuation activities — potentially saving hundreds of input tokens per run.
Observability for Performance
All performance work is grounded in measurement. We don’t optimise without data.
Tracing
Every API request, database query, BullMQ job, and LLM call is traced with OpenTelemetry spans. Distributed traces are visualised in Grafana Tempo. Slow traces surface automatically via anomaly detection.
Key Grafana dashboards
| Dashboard | Key signals |
|---|---|
| API latency | p50/p90/p99 per route, error rates |
| Queue health | Job waiting time, active workers, dead letter count |
| LLM costs | Spend by tenant/agent/model, token usage per run, cost per deliverable |
| Database | PostgreSQL query time, connection pool saturation, slow queries |
| SSE connections | Active connections, event throughput, reconnect rate |
Alerting thresholds
| Signal | Warning | Critical |
|---|---|---|
| API p99 > threshold | > 300 ms | > 1 000 ms |
| Queue wait time | > 30 s | > 5 min |
| Dead letter jobs | > 5 in 10 min | > 20 in 10 min |
| PostgreSQL p99 query | > 20 ms | > 100 ms |
| LLM cost spike | > 2× daily average | > 5× daily average |
| Error rate | > 1% | > 5% |
Lighthouse CI
Every PR that touches a Next.js app runs Lighthouse CI against a preview deployment:
- Performance score must be ≥ 85
- LCP must be < 2.5 s
- CLS must be < 0.1
- Failures block merge
Performance Testing
Load testing
Tool: k6
Load tests run against the staging environment before each major release:
| Scenario | Virtual users | Duration | Acceptance criteria |
|---|---|---|---|
| Approval queue load | 50 VU | 5 min | p99 < 200 ms, 0 errors |
| Campaign submit burst | 20 VU | 2 min | p99 < 500 ms, 0 5xx |
| SSE concurrent connections | 200 VU | 10 min | All connections maintained, < 1% drop |
| Agent callback burst | 100 VU | 2 min | p99 < 100 ms, 0 errors |
Database query benchmarks
Critical queries have benchmark tests in `tests/benchmarks/` using Vitest with real PostgreSQL:
- Approval queue fetch for a tenant with 1 000 pending approvals — target < 5 ms
- Activity list with 10 000 activities — target < 10 ms
- LLM cost aggregation for 1M rows — target < 100 ms
These run in CI nightly (not on every PR — too slow).