Performance

Performance practices, patterns, and targets across the Leadmetrics platform. Covers database, API, queue, frontend, mobile, LLM cost optimisation, and observability.

Related: Tech Stack — Backend | Tech Stack — Web | Tech Stack — Mobile | Infrastructure | Observability


Performance Targets

| Metric | Target | Measured by |
| --- | --- | --- |
| API p50 response time | < 80 ms | OpenTelemetry spans → Grafana |
| API p99 response time | < 500 ms | OpenTelemetry spans → Grafana |
| Dashboard initial page load (LCP) | < 2.5 s | Vercel Speed Insights / Lighthouse |
| Dashboard Time to Interactive | < 3.5 s | Lighthouse CI in GitHub Actions |
| Mobile app cold start | < 2 s | React Native Performance Monitor |
| Mobile app frame rate | 60 fps (no jank) | Flipper / React Native Perf |
| BullMQ job pick-up latency | < 500 ms from enqueue | BullMQ metrics → Grafana |
| Agent dispatch-to-first-token | < 3 s | Activity run dispatchedAt → firstTokenAt |
| SSE first-event latency | < 200 ms | OpenTelemetry span from enqueue to SSE write |
| PostgreSQL query p99 | < 20 ms | pg_stat_statements → Grafana |

Database Performance

PostgreSQL

Indexing strategy:

  • Every tenant_id column is indexed — all queries are tenant-scoped; this is the most-used filter
  • Composite indexes on the most common query patterns:
```sql
-- Activities: tenant + status (approval queue, activity list)
CREATE INDEX idx_activities_tenant_status ON activities(tenant_id, status);

-- Activities: tenant + deliverable (deliverable detail view)
CREATE INDEX idx_activities_tenant_deliverable ON activities(tenant_id, deliverable_id);

-- LLM calls: tenant + created_on (cost dashboard aggregation)
CREATE INDEX idx_llm_calls_tenant_created ON llm_calls(tenant_id, created_on DESC);

-- Approvals: tenant + status + expires_at (approval queue + expiry cron)
CREATE INDEX idx_approvals_tenant_status_expiry ON approvals(tenant_id, status, expires_at);
```
  • Partial indexes for hot paths:
```sql
-- Only index pending approvals (completed approvals are cold)
CREATE INDEX idx_approvals_pending ON approvals(tenant_id, expires_at)
  WHERE status = 'pending';

-- Only index non-terminated agents (terminated agents never queried in workers)
CREATE INDEX idx_agent_configs_active ON agent_configs(tenant_id, role)
  WHERE status != 'terminated';
```

Connection pooling:

  • PgBouncer in transaction mode with pool size 25 per app instance
  • Connection string points to PgBouncer, not PostgreSQL directly
  • Each Fastify worker uses a pool of max 10 connections to PgBouncer
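
A minimal sketch of the per-worker pool, assuming node-postgres (`pg`); the env var name and idle timeout are illustrative:

```ts
import { Pool } from 'pg';

// The connection string targets PgBouncer (transaction mode), not PostgreSQL.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // e.g. postgres://app@pgbouncer:6432/leadmetrics
  max: 10,                   // per-worker cap, per the bullets above
  idleTimeoutMillis: 30_000, // illustrative: release idle connections promptly
});
```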

Query patterns:

  • No N+1 queries — relations are loaded via Prisma include; never fetch a list and then loop to fetch related records
  • SELECT * is banned in application code (ESLint rule) — only required columns are fetched
  • Aggregation queries (cost dashboards, goal progress) use PostgreSQL GROUP BY with index-covered filters; never computed in JavaScript
  • Read-heavy list endpoints use cursor pagination rather than LIMIT/OFFSET — the database never scans and discards skipped rows to reach a deep page (sketch below)
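
A sketch of the cursor pattern, assuming a Prisma activity model with an id cursor:

```ts
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Cursor pagination: cost is constant regardless of page depth, because the
// index seeks straight to the cursor row instead of counting past OFFSET rows.
async function listActivities(tenantId: string, cursorId?: string) {
  return prisma.activity.findMany({
    where: { tenantId },
    orderBy: { id: 'desc' },
    take: 50,
    ...(cursorId ? { cursor: { id: cursorId }, skip: 1 } : {}), // skip the cursor row itself
  });
}
```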

Slow query monitoring:

  • pg_stat_statements enabled with track = all
  • Queries exceeding 100 ms logged to Grafana Loki with full query text and EXPLAIN plan
  • Alert fires when p99 query time exceeds 50 ms for more than 2 minutes

MongoDB

Index strategy:

  • tenantId compound index on every collection
  • activity_streams: TTL index on createdAt (24h expiry) — streaming buffers auto-cleared
  • audit_logs: compound index on (tenantId, createdOn DESC) for paginated log views
  • skills: text index on name + content for skill search

Lean queries:

  • .lean() used on all read-only Mongoose queries — skips hydrating full Mongoose Document objects
  • Projection specified on all queries — e.g. { content: 1 }, never an empty projection {} (which fetches every field)
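
A sketch combining both rules, with a hypothetical Skill model:

```ts
import { Skill } from './models/skill'; // hypothetical Mongoose model

// Explicit projection + .lean(): fetch only the needed fields and return
// plain objects instead of hydrated Mongoose Documents.
async function listSkills(tenantId: string) {
  return Skill.find({ tenantId }, { name: 1, content: 1 }).lean();
}
```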

Write patterns:

  • activity_streams (SSE buffer): uses insertOne not insertMany for real-time low-latency writes
  • audit_logs: fire-and-forget async writes — audit logging never delays the API response

Qdrant (Vector Store)

  • Collections are created with HNSW index (default) — approximate nearest-neighbour for sub-10ms similarity search at scale
  • Embedding batch size: 32 documents per batch on upload — balances throughput vs memory
  • Payload filters always applied before vector search to reduce candidate set (filter on tenantId first)
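
A sketch of a filtered search using `@qdrant/js-client-rest`; collection and payload key names are illustrative:

```ts
import { QdrantClient } from '@qdrant/js-client-rest';

const qdrant = new QdrantClient({ url: process.env.QDRANT_URL });

// Filter on tenantId first so HNSW only scores vectors in this tenant's subset.
async function searchSkills(tenantId: string, queryEmbedding: number[]) {
  return qdrant.search('skills', {
    vector: queryEmbedding,
    filter: { must: [{ key: 'tenantId', match: { value: tenantId } }] },
    limit: 10,
  });
}
```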

API Performance

Fastify Configuration

  • logger: false in production — structured logging via Pino directly; Fastify’s built-in logger adds overhead
  • Schema-based serialisation: every response type has a JSON Schema defined in the route; Fastify uses fast-json-stringify to serialise responses ~2× faster than JSON.stringify
  • keepAliveTimeout: 5000 — keep HTTP connections alive to avoid TCP handshake overhead for the same client
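
A minimal sketch of schema-based serialisation on an illustrative route:

```ts
import Fastify from 'fastify';

const app = Fastify({ logger: false }); // Pino is wired up separately, per the note above

// Declaring a response schema lets Fastify compile a fast-json-stringify
// serialiser for this route instead of falling back to JSON.stringify.
app.get('/health', {
  schema: {
    response: {
      200: {
        type: 'object',
        properties: {
          status: { type: 'string' },
          uptimeSec: { type: 'number' },
        },
      },
    },
  },
}, async () => ({ status: 'ok', uptimeSec: process.uptime() }));
```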

Response Caching

| Endpoint | Cache | TTL | Invalidation |
| --- | --- | --- | --- |
| GET /mobile/v1/home | Redis (per tenant) | 60 s | On approval created/resolved, activity status change |
| GET /dashboard/v1/analytics/spend | Redis (per tenant + period) | 5 min | On new llm_calls records |
| GET /admin/v1/system/health | Redis (global) | 10 s | None (polling-safe) |
| GET /agent/v1/skills/:role (manifest) | Redis (per tenant + role) | 5 min | On skill assignment change |
| GET /.well-known/jwks.json | HTTP Cache-Control: max-age=3600 | 1 h | On key rotation |

Cache keys include the tenant ID to prevent cross-tenant cache poisoning.
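
A sketch of the tenant-scoped read-through pattern with ioredis; the key layout is illustrative and the 60 s TTL matches the /mobile/v1/home row above:

```ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

// Tenant ID is part of the key, so one tenant can never be served another's cache.
async function cachedHome(tenantId: string, load: () => Promise<unknown>) {
  const key = `cache:home:${tenantId}`;
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);
  const fresh = await load();
  await redis.set(key, JSON.stringify(fresh), 'EX', 60); // 60 s TTL
  return fresh;
}
```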

Rate Limiting

Rate limits are enforced in Redis using a sliding window — this protects downstream services from being overloaded by a single client and creates natural backpressure. See API Overview for per-surface limits.
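
For orientation, a sketch of one common sliding-window shape over a Redis sorted set (a production limiter would wrap these commands in a Lua script for atomicity):

```ts
import Redis from 'ioredis';

// Sliding window: each request is a ZSET member scored by its timestamp.
// Trim members older than the window, then compare the count to the limit.
async function allowRequest(redis: Redis, clientId: string, limit: number, windowMs: number) {
  const key = `ratelimit:${clientId}`;
  const now = Date.now();
  await redis.zremrangebyscore(key, 0, now - windowMs); // drop entries outside the window
  const inWindow = await redis.zcard(key);
  if (inWindow >= limit) return false;                  // backpressure: reject this request
  await redis.zadd(key, now, `${now}:${Math.random()}`); // unique member per request
  await redis.pexpire(key, windowMs);                    // let idle keys expire
  return true;
}
```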

Compression

  • Fastify @fastify/compress enabled for responses > 1 KB
  • Brotli for modern browsers; gzip fallback
  • SSE streams are not compressed — compression buffering would hold back events and break incremental delivery

Queue Performance (BullMQ)

Concurrency tuning

Each agent role has a configured concurrency on agent_configs:

| Agent | Default concurrency | Notes |
| --- | --- | --- |
| Activity Planner | 2 | Orchestration — low concurrency, high reasoning |
| Copywriter | 4 | Common deliverable — benefits from parallelism |
| SEO Specialist | 3 | Tool-heavy — rate-limited by external APIs |
| Social Media Manager | 4 | Similar to Copywriter |
| Paid Ads Manager | 2 | API rate limits on Google/Meta constrain parallelism |
| Data Analyst | 2 | GA4 quota limits |
| Content Researcher | 8 | Ollama (local, free, fast) — high concurrency fine |

Priority queuing

Job priority is set at enqueue time. High-priority jobs (triggered by human approval of time-sensitive deliverables) jump ahead of normal background tasks. Priority levels: 1 (high) → 10 (low, background).
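
A sketch at the enqueue site, with illustrative queue and job names:

```ts
import { Queue } from 'bullmq';

const agentQueue = new Queue('agent-dispatch'); // connection config omitted

// Lower number = higher priority in BullMQ: approval-triggered work jumps
// ahead of background tasks.
async function enqueueRun(payload: object, isTimeSensitive: boolean) {
  await agentQueue.add('run-activity', payload, {
    priority: isTimeSensitive ? 1 : 10,
  });
}
```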

Job deduplication

Recurring jobs use a deduplication key {tenantId}:{templateId}:{periodKey} — if a cron job fires and an identical job is already in the queue, the duplicate is dropped silently.
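
One way to implement this is to use the dedup key as the BullMQ jobId — an add whose jobId already exists in the queue is ignored. A sketch with illustrative names:

```ts
import { Queue } from 'bullmq';

const recurringQueue = new Queue('recurring-deliverables'); // connection config omitted

// If a job with this jobId is still in the queue, BullMQ drops the duplicate add.
async function enqueueRecurring(
  tenantId: string, templateId: string, periodKey: string, payload: object,
) {
  await recurringQueue.add('recurring-deliverable', payload, {
    jobId: `${tenantId}:${templateId}:${periodKey}`,
  });
}
```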

Dead letter handling

Failed jobs beyond their retry limit move to the dead letter queue. The dead letter handler:

  1. Sets agent status to error
  2. Creates a human escalation activity
  3. Emits an SSE event to the DM portal
  4. Fires an alert to Grafana

The dead letter queue is monitored; sustained dead letter growth triggers an on-call alert.


Frontend Performance (Web)

Server Components first

  • Campaign lists, deliverable content, reports, approval detail — rendered on the server (no client-side data fetch, no hydration overhead)
  • Only interactive components (forms, live feeds, charts) are Client Components
  • Large markdown content (blog deliverables) rendered server-side to HTML with remark — no react-markdown shipped to the client

Code splitting

  • Next.js automatic per-route code splitting
  • Heavy components (TipTap editor, Recharts) lazy-loaded with next/dynamic:
```tsx
const RichTextEditor = dynamic(() => import('./rich-text-editor'), {
  loading: () => <Skeleton />,
  ssr: false,
});
```
  • The approval detail screen (D5/D6) loads TipTap only when the user opens an approval — not on initial page load

Image optimisation

  • All tenant-uploaded images served via Next.js <Image> component — automatic WebP conversion, responsive sizes, lazy loading
  • Avatar images: 64×64 px max, served from CDN with long Cache-Control

Bundle size

  • pnpm build output analysed in CI with @next/bundle-analyzer
  • Bundle size budget: Dashboard main bundle < 200 KB gzipped
  • Alert when a PR increases the main bundle by > 10 KB

TanStack Query caching

  • All API responses cached in TanStack Query with appropriate staleTime:
```ts
// Approval list: stale after 30 s (approvals change frequently)
useQuery({ queryKey: ['approvals'], staleTime: 30_000 });

// Campaign list: stale after 2 min (campaigns change infrequently)
useQuery({ queryKey: ['campaigns'], staleTime: 120_000 });

// Analytics/spend: stale after 5 min
useQuery({ queryKey: ['spend'], staleTime: 300_000 });
```
  • Background refetch on window focus for time-sensitive data (approvals, activity status)
  • SSE events invalidate query cache selectively: approval_created invalidates ['approvals'], not the full cache
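
A sketch of the selective invalidation, with an illustrative EventSource URL:

```ts
import { QueryClient } from '@tanstack/react-query';

// The SSE handler invalidates only the query keys an event actually affects.
function wireSseInvalidation(queryClient: QueryClient) {
  const events = new EventSource('/dashboard/v1/events'); // illustrative URL
  events.addEventListener('approval_created', () => {
    queryClient.invalidateQueries({ queryKey: ['approvals'] }); // not the whole cache
  });
}
```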

Virtualisation

  • Long lists (activity feeds, lead lists, LLM call logs) use TanStack Virtual for windowed rendering — only visible rows are in the DOM

Mobile Performance

FlatList over ScrollView

All list screens use FlatList with windowSize={5} and maxToRenderPerBatch={10}. ScrollView is only used for non-list content (campaign detail summary section).
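
A sketch of the list setup; the row component and item type are illustrative:

```tsx
import React from 'react';
import { FlatList } from 'react-native';
import { ApprovalRow } from './approval-row'; // hypothetical memoised row component

type Approval = { id: string; title: string };

export function ApprovalList({ approvals }: { approvals: Approval[] }) {
  return (
    <FlatList
      data={approvals}
      keyExtractor={(item) => item.id}
      renderItem={({ item }) => <ApprovalRow approval={item} />}
      windowSize={5}           // keep ~5 viewports of items mounted
      maxToRenderPerBatch={10} // cap items rendered per batch
    />
  );
}
```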

Image caching

  • react-native-fast-image for avatar and channel logo images — memory + disk cache; no re-download on revisit

Avoiding re-renders

  • useCallback and useMemo for functions and derived values passed to FlatList renderItem
  • React.memo on list item components to prevent re-renders when parent re-renders
  • TanStack Query’s select option used to derive minimal data:
```ts
useQuery({
  queryKey: ['approvals'],
  select: (data) => data.filter((a) => a.riskLevel === 'high'),
});
```

Hermes JS engine

Hermes is enabled (default since React Native 0.70) — AOT bytecode compilation reduces JS startup time by ~30% vs JSCore.

Offline-first reads

TanStack Query + MMKV persistence means all previously loaded data is available instantly on app reopen — no loading spinner for returning users. Stale data shows immediately; fresh data loads in background.
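
A sketch of the persistence wiring, using the documented MMKV and TanStack Query persister APIs; variable names are illustrative:

```ts
import { MMKV } from 'react-native-mmkv';
import { QueryClient } from '@tanstack/react-query';
import { createSyncStoragePersister } from '@tanstack/query-sync-storage-persister';
import { persistQueryClient } from '@tanstack/react-query-persist-client';

const storage = new MMKV();
const queryClient = new QueryClient();

// Adapt MMKV's string API to the persister's getItem/setItem/removeItem contract.
const persister = createSyncStoragePersister({
  storage: {
    getItem: (key: string) => storage.getString(key) ?? null,
    setItem: (key: string, value: string) => storage.set(key, value),
    removeItem: (key: string) => storage.delete(key),
  },
});

persistQueryClient({ queryClient, persister }); // cache restored on app reopen
```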

Push over polling

The mobile app never polls for updates. All real-time data arrives via push notifications (for locked-screen events) or SSE (for in-app live feeds). This eliminates background polling battery drain.


LLM Cost & Performance Optimisation

Model selection by task

Expensive cloud models are used only where quality is critical:

| Task (quality requirement) | Model | Cost |
| --- | --- | --- |
| Brand-sensitive copy, strategy | Claude Sonnet 4.6 | $$$ |
| Classification, routing, extraction | Ollama gemma3:4b | $0 |
| Session summarisation | Ollama gemma3:4b | $0 |
| Competitor research scraping | Ollama gemma3:4b | $0 |

Using Ollama for classification and research tasks saves ~30–40% of total LLM spend.

Token budgeting

  • Two-stage skill loading (manifest → on-demand load) prevents injecting full skill content for irrelevant skills — see Skills System
  • buildActivityPrompt() injects only the context relevant to the wakeReason — revision activities don’t re-inject the full history, only the delta
  • maxTurnsPerRun limits agent turns per heartbeat; prevents runaway multi-turn loops

Per-activity cost caps

agent_configs.max_cost_usd_per_activity halts dispatch if the current run would exceed the cap. Combined with campaigns.budget_cap_usd and tenants.monthly_spend_cap_usd, spend is bounded at three levels.
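
A sketch of the three-level check as a pure function — the commented column names come from the paragraph above; the type and function are hypothetical:

```ts
type SpendContext = {
  maxCostUsdPerActivity: number; // agent_configs.max_cost_usd_per_activity
  campaignSpentUsd: number;
  campaignBudgetCapUsd: number;  // campaigns.budget_cap_usd
  tenantMonthSpendUsd: number;
  tenantMonthlyCapUsd: number;   // tenants.monthly_spend_cap_usd
};

// Dispatch proceeds only if the estimated run cost fits under all three caps.
function canDispatch(ctx: SpendContext, estimatedCostUsd: number): boolean {
  return (
    estimatedCostUsd <= ctx.maxCostUsdPerActivity &&
    ctx.campaignSpentUsd + estimatedCostUsd <= ctx.campaignBudgetCapUsd &&
    ctx.tenantMonthSpendUsd + estimatedCostUsd <= ctx.tenantMonthlyCapUsd
  );
}
```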

Session resumption

Claude Code CLI session resumption (--resume <sessionId>) carries forward context from prior runs without re-sending history. This avoids re-tokenising prior output on retry or continuation activities — potentially saving hundreds of input tokens per run.
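
A sketch of the resume path at the dispatch layer; `--resume` is the flag named above, while the binary name and prompt flag are assumptions:

```ts
import { spawn } from 'node:child_process';

// Resume the prior CLI session instead of replaying history in the prompt.
// Assumes the CLI binary is `claude` and `-p` passes the prompt non-interactively.
function dispatchResumedRun(sessionId: string, prompt: string) {
  return spawn('claude', ['--resume', sessionId, '-p', prompt], {
    stdio: ['ignore', 'pipe', 'pipe'],
  });
}
```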


Observability for Performance

All performance work is grounded in measurement. We don’t optimise without data.

Tracing

Every API request, database query, BullMQ job, and LLM call is traced with OpenTelemetry spans. Distributed traces are visualised in Grafana Tempo. Slow traces surface automatically via anomaly detection.

Key Grafana dashboards

| Dashboard | Key signals |
| --- | --- |
| API latency | p50/p90/p99 per route, error rates |
| Queue health | Job waiting time, active workers, dead letter count |
| LLM costs | Spend by tenant/agent/model, token usage per run, cost per deliverable |
| Database | PostgreSQL query time, connection pool saturation, slow queries |
| SSE connections | Active connections, event throughput, reconnect rate |

Alerting thresholds

| Signal | Warning | Critical |
| --- | --- | --- |
| API p99 latency | > 300 ms | > 1 000 ms |
| Queue wait time | > 30 s | > 5 min |
| Dead letter jobs | > 5 in 10 min | > 20 in 10 min |
| PostgreSQL p99 query time | > 20 ms | > 100 ms |
| LLM cost spike | > 2× daily average | > 5× daily average |
| Error rate | > 1% | > 5% |

Lighthouse CI

Every PR that touches a Next.js app runs Lighthouse CI against a preview deployment:

  • Performance score must be ≥ 85
  • LCP must be < 2.5 s
  • CLS must be < 0.1
  • Failures block merge

Performance Testing

Load testing

Tool: k6

Load tests run against the staging environment before each major release:

| Scenario | Virtual users | Duration | Acceptance criteria |
| --- | --- | --- | --- |
| Approval queue load | 50 VU | 5 min | p99 < 200 ms, 0 errors |
| Campaign submit burst | 20 VU | 2 min | p99 < 500 ms, 0 5xx |
| SSE concurrent connections | 200 VU | 10 min | All connections maintained, < 1% drop |
| Agent callback burst | 100 VU | 2 min | p99 < 100 ms, 0 errors |

Database query benchmarks

Critical queries have benchmark tests in tests/benchmarks/ using Vitest with real PostgreSQL:

  • Approval queue fetch for a tenant with 1 000 pending approvals — target < 5 ms
  • Activity list with 10 000 activities — target < 10 ms
  • LLM cost aggregation for 1M rows — target < 100 ms

These run in CI nightly (not on every PR — too slow).
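
A sketch of one such benchmark, assuming a Prisma approval model and a seeded bench tenant; the timing approach is illustrative:

```ts
import { describe, expect, it } from 'vitest';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient(); // points at the real PostgreSQL used in nightly CI

describe('approval queue fetch', () => {
  it('stays under the 5 ms target with 1 000 pending approvals seeded', async () => {
    const start = performance.now();
    await prisma.approval.findMany({
      where: { tenantId: 'bench-tenant', status: 'pending' },
      orderBy: { expiresAt: 'asc' },
      take: 50,
    });
    expect(performance.now() - start).toBeLessThan(5);
  });
});
```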

© 2026 Leadmetrics — Internal use only