Performance
Performance practices, patterns, and targets across the Leadmetrics platform. Covers database, API, queue, frontend, mobile, LLM cost optimisation, and observability.
Related: Tech Stack — Backend | Tech Stack — Web | Tech Stack — Mobile | Infrastructure | Observability
Performance Targets
| Metric | Target | Measured by |
|---|---|---|
| API p50 response time | < 80 ms | OpenTelemetry spans → Grafana |
| API p99 response time | < 500 ms | OpenTelemetry spans → Grafana |
| Dashboard initial page load (LCP) | < 2.5 s | Vercel Speed Insights / Lighthouse |
| Dashboard Time to Interactive | < 3.5 s | Lighthouse CI in GitHub Actions |
| Mobile app cold start | < 2 s | React Native Performance Monitor |
| Mobile app frame rate | 60 fps (no jank) | Flipper / React Native Perf |
| BullMQ job pick-up latency | < 500 ms from enqueue | BullMQ metrics → Grafana |
| Agent dispatch-to-first-token | < 3 s | Activity run dispatchedAt → firstTokenAt |
| SSE first-event latency | < 200 ms | OpenTelemetry span from enqueue to SSE write |
| PostgreSQL query p99 | < 20 ms | pg_stat_statements → Grafana |
Database Performance
PostgreSQL
Indexing strategy:
- Every `tenant_id` column is indexed — all queries are tenant-scoped; this is the most-used filter
- Composite indexes on the most common query patterns:

```sql
-- Activities: tenant + status (approval queue, activity list)
CREATE INDEX idx_activities_tenant_status ON activities(tenant_id, status);

-- Activities: tenant + deliverable (deliverable detail view)
CREATE INDEX idx_activities_tenant_deliverable ON activities(tenant_id, deliverable_id);

-- LLM calls: tenant + created_on (cost dashboard aggregation)
CREATE INDEX idx_llm_calls_tenant_created ON llm_calls(tenant_id, created_on DESC);

-- Approvals: tenant + status + expires_at (approval queue + expiry cron)
CREATE INDEX idx_approvals_tenant_status_expiry ON approvals(tenant_id, status, expires_at);
```

- Partial indexes for hot paths:

```sql
-- Only index pending approvals (completed approvals are cold)
CREATE INDEX idx_approvals_pending ON approvals(tenant_id, expires_at) WHERE status = 'pending';

-- Only index non-terminated agents (terminated agents are never queried in workers)
CREATE INDEX idx_agent_configs_active ON agent_configs(tenant_id, role) WHERE status != 'terminated';
```
Connection pooling:
- PgBouncer in transaction mode with pool size 25 per app instance
- Connection string points to PgBouncer, not PostgreSQL directly
- Each Fastify worker uses a pool of max 10 connections to PgBouncer
Query patterns:
- No N+1 queries — Prisma `include` relations are used for joins; never fetch a list then loop to fetch related records
- `SELECT *` is banned in application code (ESLint rule) — only required columns are fetched
- Aggregation queries (cost dashboards, goal progress) use PostgreSQL `GROUP BY` with index-covered filters; never computed in JavaScript
- Read-heavy list endpoints use cursor pagination instead of `LIMIT`/`OFFSET` — no full table scans to compute offsets
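A minimal sketch of the cursor idea, using an in-memory array for illustration (the real queries go through Prisma; the `Row` shape and `listActivities` helper here are hypothetical):

```typescript
type Row = { id: number; createdOn: string };

// Keyset pagination: filter past the cursor, then take `limit` rows.
// Unlike OFFSET, the cost does not grow with how deep the client has paged.
function listActivities(rows: Row[], limit: number, cursor?: number) {
  const sorted = [...rows].sort((a, b) => b.id - a.id); // newest first
  const after = cursor === undefined ? sorted : sorted.filter((r) => r.id < cursor);
  const page = after.slice(0, limit);
  return { page, nextCursor: page.length === limit ? page[page.length - 1].id : null };
}

const rows = [1, 2, 3, 4, 5].map((id) => ({ id, createdOn: `2024-01-0${id}` }));
const first = listActivities(rows, 2);                     // ids [5, 4]
const second = listActivities(rows, 2, first.nextCursor!); // ids [3, 2]
```

The returned `nextCursor` is the last row's `id`, which the client echoes back on the next request.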
Slow query monitoring:
- `pg_stat_statements` enabled with `track = all`
- Queries exceeding 100 ms logged to Grafana Loki with full query text and `EXPLAIN` plan
- Alert fires when p99 query time exceeds 50 ms for more than 2 minutes
MongoDB
Index strategy:
- `tenantId` compound index on every collection
- `activity_streams`: TTL index on `createdAt` (24 h expiry) — streaming buffers auto-cleared
- `audit_logs`: compound index on `(tenantId, createdOn DESC)` for paginated log views
- `skills`: text index on `name` + `content` for skill search
Lean queries:
- `.lean()` used on all read-only Mongoose queries — skips hydrating full Mongoose Document objects
- Projection specified on all queries — `{ content: 1 }`, not `{}` (fetch all fields)
Write patterns:
- `activity_streams` (SSE buffer): uses `insertOne`, not `insertMany`, for real-time low-latency writes
- `audit_logs`: fire-and-forget async writes — audit logging never delays the API response
Qdrant (Vector Store)
- Collections are created with HNSW index (default) — approximate nearest-neighbour for sub-10ms similarity search at scale
- Embedding batch size: 32 documents per batch on upload — balances throughput vs memory
- Payload filters always applied before vector search to reduce the candidate set (filter on `tenantId` first)
API Performance
Fastify Configuration
- `logger: false` in production — structured logging via Pino directly; Fastify's built-in logger adds overhead
- Schema-based serialisation: every response type has a JSON Schema defined on the route; Fastify uses `fast-json-stringify` to serialise responses ~2× faster than `JSON.stringify`
- `keepAliveTimeout: 5000` — keep HTTP connections alive to avoid TCP handshake overhead for the same client
Response Caching
| Endpoint | Cache | TTL | Invalidation |
|---|---|---|---|
| `GET /mobile/v1/home` | Redis (per tenant) | 60 s | On approval created/resolved, activity status change |
| `GET /dashboard/v1/analytics/spend` | Redis (per tenant + period) | 5 min | On new `llm_calls` records |
| `GET /admin/v1/system/health` | Redis (global) | 10 s | None (polling-safe) |
| `GET /agent/v1/skills/:role` manifest | Redis (per tenant + role) | 5 min | On skill assignment change |
| `GET /.well-known/jwks.json` | HTTP `Cache-Control: max-age=3600` | 1 h | On key rotation |
Cache keys include the tenant ID to prevent cross-tenant cache poisoning.
Rate Limiting
Rate limits are enforced in Redis using a sliding window — this protects downstream services from being overloaded by a single client and creates natural backpressure. See API Overview for per-surface limits.
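An in-memory sketch of the sliding-window logic (the real limiter lives in Redis, e.g. one sorted set of timestamps per client, so the window is shared across app instances; the class here is illustrative):

```typescript
// Sliding-window limiter: a request is allowed only if fewer than
// `limit` requests landed in the trailing `windowMs` milliseconds.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  constructor(private limit: number, private windowMs: number) {}

  allow(clientId: string, now: number): boolean {
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(clientId) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(clientId, recent);
      return false; // over the limit — reject without recording a hit
    }
    recent.push(now);
    this.hits.set(clientId, recent);
    return true;
  }
}

const limiter = new SlidingWindowLimiter(2, 1000); // 2 requests per second
const r1 = limiter.allow('c1', 0);    // allowed
const r2 = limiter.allow('c1', 100);  // allowed
const r3 = limiter.allow('c1', 200);  // rejected — window full
const r4 = limiter.allow('c1', 1150); // allowed — earlier hits expired
```

Unlike a fixed-window counter, the trailing window cannot be gamed by bursting at a window boundary.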
Compression
- Fastify `@fastify/compress` enabled for responses > 1 KB
- Brotli for modern browsers; gzip fallback
- SSE streams are not compressed (streaming incompatibility)
Queue Performance (BullMQ)
Concurrency tuning
Each agent role has a configured concurrency on agent_configs:
| Agent | Default concurrency | Notes |
|---|---|---|
| Activity Planner | 2 | Orchestration — low concurrency, high reasoning |
| Copywriter | 4 | Common deliverable — benefits from parallelism |
| SEO Specialist | 3 | Tool-heavy — rate-limited by external APIs |
| Social Media Manager | 4 | Similar to Copywriter |
| Paid Ads Manager | 2 | API rate limits on Google/Meta constrain parallelism |
| Data Analyst | 2 | GA4 quota limits |
| Content Researcher | 8 | Ollama (local, free, fast) — high concurrency fine |
Priority queuing
Job priority is set at enqueue time. High-priority jobs (triggered by human approval of time-sensitive deliverables) jump ahead of normal background tasks. Priority levels: 1 (high) → 10 (low, background).
Job deduplication
Recurring jobs use a deduplication key {tenantId}:{templateId}:{periodKey} — if a cron job fires and an identical job is already in the queue, the duplicate is dropped silently.
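A sketch of the deduplication guard (in-memory here for illustration; BullMQ itself can enforce this by passing the key as the job ID, making a duplicate add a no-op):

```typescript
// Builds the deduplication key described above.
function dedupKey(tenantId: string, templateId: string, periodKey: string): string {
  return `${tenantId}:${templateId}:${periodKey}`;
}

// In-memory stand-in for the queue's dedup check.
const queued = new Set<string>();
function enqueueOnce(key: string): boolean {
  if (queued.has(key)) return false; // duplicate dropped silently
  queued.add(key);
  return true;
}

const k = dedupKey('t_123', 'weekly-report', '2024-W07');
const firstAdd = enqueueOnce(k);  // job queued
const secondAdd = enqueueOnce(k); // duplicate dropped
```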
Dead letter handling
Failed jobs beyond their retry limit move to the dead letter queue. The dead letter handler:
- Sets the agent status to `error`
- Creates a human escalation activity
- Emits an SSE event to the DM portal
- Fires an alert to Grafana
The dead letter queue is monitored; sustained dead letter growth triggers an on-call alert.
Frontend Performance (Web)
Server Components first
- Campaign lists, deliverable content, reports, approval detail — rendered on the server (no client-side data fetch, no hydration overhead)
- Only interactive components (forms, live feeds, charts) are Client Components
- Large markdown content (blog deliverables) rendered server-side to HTML with `remark` — no `react-markdown` shipped to the client
Code splitting
- Next.js automatic per-route code splitting
- Heavy components (TipTap editor, Recharts) lazy-loaded with `next/dynamic`:

```tsx
const RichTextEditor = dynamic(() => import('./rich-text-editor'), {
  loading: () => <Skeleton />,
  ssr: false,
});
```

- The approval detail screen (D5/D6) loads TipTap only when the user opens an approval — not on initial page load
Image optimisation
- All tenant-uploaded images served via the Next.js `<Image>` component — automatic WebP conversion, responsive sizes, lazy loading
- Avatar images: 64×64 px max, served from CDN with a long `Cache-Control`
Bundle size
- `pnpm build` output analysed in CI with `@next/bundle-analyzer`
- Bundle size budget: Dashboard main bundle < 200 KB gzipped
- Alert when a PR increases the main bundle by > 10 KB
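The budget rules above can be sketched as a simple CI gate (the `checkBundle` function and the warn/fail split are illustrative; the real check runs via `@next/bundle-analyzer` output):

```typescript
const BUDGET_KB = 200;   // hard budget for the gzipped main bundle
const PR_DELTA_KB = 10;  // alert threshold for growth in a single PR

// Compares the PR's bundle size against the base branch.
function checkBundle(baseKb: number, prKb: number): 'ok' | 'warn' | 'fail' {
  if (prKb > BUDGET_KB) return 'fail';          // over the absolute budget
  if (prKb - baseKb > PR_DELTA_KB) return 'warn'; // grew too much in one PR
  return 'ok';
}

const ok = checkBundle(150, 155);   // +5 KB, under budget
const warn = checkBundle(150, 165); // +15 KB in one PR
const fail = checkBundle(150, 210); // over the 200 KB budget
```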
TanStack Query caching
- All API responses cached in TanStack Query with an appropriate `staleTime`:

```ts
// Approval list: stale after 30 s (approvals change frequently)
useQuery({ queryKey: ['approvals'], staleTime: 30_000 });

// Campaign list: stale after 2 min (campaigns change infrequently)
useQuery({ queryKey: ['campaigns'], staleTime: 120_000 });

// Analytics/spend: stale after 5 min
useQuery({ queryKey: ['spend'], staleTime: 300_000 });
```

- Background refetch on window focus for time-sensitive data (approvals, activity status)
- SSE events invalidate the query cache selectively: `approval_created` invalidates `['approvals']`, not the full cache
Virtualisation
- Long lists (activity feeds, lead lists, LLM call logs) use TanStack Virtual for windowed rendering — only visible rows are in the DOM
Mobile Performance
FlatList over ScrollView
All list screens use `FlatList` with `windowSize={5}` and `maxToRenderPerBatch={10}`. `ScrollView` is only used for non-list content (campaign detail summary section).
Image caching
`react-native-fast-image` for avatar and channel logo images — memory + disk cache; no re-download on revisit
Avoiding re-renders
- `useCallback` and `useMemo` for functions and derived values passed to `FlatList` `renderItem`
- `React.memo` on list item components to prevent re-renders when the parent re-renders
- TanStack Query's `select` option used to derive minimal data:

```ts
useQuery({
  queryKey: ['approvals'],
  select: (data) => data.filter((a) => a.riskLevel === 'high'),
});
```
Hermes JS engine
Hermes is enabled (default since React Native 0.70) — AOT bytecode compilation reduces JS startup time by ~30% vs JSCore.
Offline-first reads
TanStack Query + MMKV persistence means all previously loaded data is available instantly on app reopen — no loading spinner for returning users. Stale data shows immediately; fresh data loads in background.
Push over polling
The mobile app never polls for updates. All real-time data arrives via push notifications (for locked-screen events) or SSE (for in-app live feeds). This eliminates background polling battery drain.
LLM Cost & Performance Optimisation
Model selection by task
Expensive cloud models are used only where quality is critical:
| Task quality requirement | Model | Cost |
|---|---|---|
| Brand-sensitive copy, strategy | Claude Sonnet 4.6 | $$$ |
| Classification, routing, extraction | Ollama gemma3:4b | $0 |
| Session summarisation | Ollama gemma3:4b | $0 |
| Competitor research scraping | Ollama gemma3:4b | $0 |
Using Ollama for classification and research tasks saves ~30–40% of total LLM spend.
Token budgeting
- Two-stage skill loading (manifest → on-demand load) prevents injecting full skill content for irrelevant skills — see Skills System
- `buildActivityPrompt()` injects only the context relevant to the `wakeReason` — revision activities don't re-inject the full history, only the delta
- `maxTurnsPerRun` limits agent turns per heartbeat; prevents runaway multi-turn loops
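The turn cap can be sketched as follows (the `runHeartbeat` loop and `step` callback are stand-ins for the real heartbeat code, which is not shown here):

```typescript
// The heartbeat loop stops after maxTurnsPerRun turns even if the
// agent has not signalled completion — this is what bounds runaway
// multi-turn loops. `step` runs one agent turn and returns true when
// the agent considers itself done.
function runHeartbeat(maxTurnsPerRun: number, step: (turn: number) => boolean): number {
  let turns = 0;
  for (let turn = 0; turn < maxTurnsPerRun; turn++) {
    turns++;
    if (step(turn)) break; // agent finished early
  }
  return turns;
}

// An agent that never signals completion is still cut off at the cap.
const capped = runHeartbeat(5, () => false);
// An agent that finishes on its second turn stops there.
const early = runHeartbeat(5, (t) => t === 1);
```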
Per-activity cost caps
`agent_configs.max_cost_usd_per_activity` halts dispatch if the current run would exceed the cap. Combined with `campaigns.budget_cap_usd` and `tenants.monthly_spend_cap_usd`, spend is bounded at three levels.
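The three-level check can be sketched as a pre-dispatch guard (the `canDispatch` function is illustrative; the field names mirror the columns above, and the spend aggregates are assumed to be computed elsewhere):

```typescript
type SpendCaps = {
  maxCostUsdPerActivity: number; // agent_configs.max_cost_usd_per_activity
  campaignBudgetCapUsd: number;  // campaigns.budget_cap_usd
  tenantMonthlyCapUsd: number;   // tenants.monthly_spend_cap_usd
};

// Dispatch only proceeds if the estimated run cost fits under all
// three caps: per-activity, per-campaign, and per-tenant-per-month.
function canDispatch(
  caps: SpendCaps,
  estimatedRunCostUsd: number,
  activitySpendUsd: number,
  campaignSpendUsd: number,
  tenantMonthSpendUsd: number,
): boolean {
  return (
    activitySpendUsd + estimatedRunCostUsd <= caps.maxCostUsdPerActivity &&
    campaignSpendUsd + estimatedRunCostUsd <= caps.campaignBudgetCapUsd &&
    tenantMonthSpendUsd + estimatedRunCostUsd <= caps.tenantMonthlyCapUsd
  );
}

const caps = { maxCostUsdPerActivity: 2, campaignBudgetCapUsd: 100, tenantMonthlyCapUsd: 500 };
const allowed = canDispatch(caps, 0.5, 1.0, 40, 200); // fits under all three caps
const blocked = canDispatch(caps, 0.5, 1.8, 40, 200); // activity cap would be exceeded
```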
Session resumption
Claude Code CLI session resumption (`--resume <sessionId>`) carries forward context from prior runs without re-sending history. This avoids re-tokenising prior output on retry or continuation activities — potentially saving hundreds of input tokens per run.
Observability for Performance
All performance work is grounded in measurement. We don’t optimise without data.
Tracing
Every API request, database query, BullMQ job, and LLM call is traced with OpenTelemetry spans. Distributed traces are visualised in Grafana Tempo. Slow traces surface automatically via anomaly detection.
Key Grafana dashboards
| Dashboard | Key signals |
|---|---|
| API latency | p50/p90/p99 per route, error rates |
| Queue health | Job waiting time, active workers, dead letter count |
| LLM costs | Spend by tenant/agent/model, token usage per run, cost per deliverable |
| Database | PostgreSQL query time, connection pool saturation, slow queries |
| SSE connections | Active connections, event throughput, reconnect rate |
Alerting thresholds
| Signal | Warning | Critical |
|---|---|---|
| API p99 > threshold | > 300 ms | > 1 000 ms |
| Queue wait time | > 30 s | > 5 min |
| Dead letter jobs | > 5 in 10 min | > 20 in 10 min |
| PostgreSQL p99 query | > 20 ms | > 100 ms |
| LLM cost spike | > 2× daily average | > 5× daily average |
| Error rate | > 1% | > 5% |
Lighthouse CI
Every PR that touches a Next.js app runs Lighthouse CI against a preview deployment:
- Performance score must be ≥ 85
- LCP must be < 2.5 s
- CLS must be < 0.1
- Failures block merge
Performance Testing
Load testing
Tool: k6
Load tests run against the staging environment before each major release:
| Scenario | Virtual users | Duration | Acceptance criteria |
|---|---|---|---|
| Approval queue load | 50 VU | 5 min | p99 < 200 ms, 0 errors |
| Campaign submit burst | 20 VU | 2 min | p99 < 500 ms, 0 5xx |
| SSE concurrent connections | 200 VU | 10 min | All connections maintained, < 1% drop |
| Agent callback burst | 100 VU | 2 min | p99 < 100 ms, 0 errors |
Database query benchmarks
Critical queries have benchmark tests in `tests/benchmarks/` using Vitest with real PostgreSQL:
- Approval queue fetch for a tenant with 1 000 pending approvals — target < 5 ms
- Activity list with 10 000 activities — target < 10 ms
- LLM cost aggregation for 1M rows — target < 100 ms
These run in CI nightly (not on every PR — too slow).