
API Gateway

Status: [To Build] · Pattern: Gateway layer within Fastify (Option A)

All requests to the Fastify API pass through a gateway layer before reaching any surface router. The gateway owns every cross-cutting concern — authentication, tenant resolution, rate limiting, throttling, request logging, and audit — so that none of those concerns leak into the individual surface routers.


Why a Gateway Layer

Previously, auth, rate limiting, and logging were scattered across a shared common/ module imported by each router. That arrangement had three problems:

  • Cross-cutting logic duplicated or inconsistently applied across surfaces
  • Adding a new concern (e.g. audit logging) required touching every router
  • No single place to enforce policy changes globally

The gateway layer is a Fastify plugin registered before all routers. Every request — regardless of surface — passes through the same ordered chain of hooks. Routers only see a fully-authenticated, rate-checked, logged request.


Request Lifecycle

Every request flows through this chain in order:

Inbound request (from Traefik)
                    │
┌─────────────────────────────────────────────────────────┐
│  GATEWAY LAYER                                          │
│                                                         │
│  1. Request ID         assign unique trace ID           │
│  2. IP Extraction      resolve real IP behind Traefik   │
│  3. Authentication     validate JWT or API key          │
│  4. Tenant Resolution  resolve + attach tenantId        │
│  5. Rate Limiting      sliding window check (Redis)     │
│  6. Throttling         per-surface concurrency cap      │
│  7. Request Log        structured log entry (pre)       │
│  8. Audit Pre-hook     snapshot state (write ops only)  │
│                                                         │
└───────────────────┬─────────────────────────────────────┘
                    │ forward
┌─────────────────────────────────────────────────────────┐
│  SURFACE ROUTER                                         │
│  /dashboard/v1  /dm/v1  /admin/v1                       │
│  /mobile/v1  /cli/v1  /auth/v1  /agent/v1               │
└───────────────────┬─────────────────────────────────────┘
                    │ response
┌─────────────────────────────────────────────────────────┐
│  GATEWAY LAYER (post)                                   │
│                                                         │
│  9. Response Log       status, duration, size           │
│ 10. Audit Post-hook    write audit record (write ops)   │
│                                                         │
└───────────────────┬─────────────────────────────────────┘
                    │
Response returned to client
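The ordering and short-circuit behaviour of the chain can be sketched without Fastify. This is a minimal, synchronous illustration only: `GatewayContext`, `GatewayStep`, `runGateway`, and `exampleSteps` are hypothetical names, and the real gateway would register asynchronous Fastify hooks instead.

```typescript
// Hypothetical types for illustration; not part of the documented gateway API.
interface GatewayContext {
  hasCredential: boolean;
  trace: string[];   // names of the steps that ran, in order
  status?: number;   // set only when a step rejects the request
}

interface GatewayStep {
  name: string;
  // Returning a status code rejects the request; returning undefined continues.
  run: (ctx: GatewayContext) => number | undefined;
}

function runGateway(steps: GatewayStep[], ctx: GatewayContext): GatewayContext {
  for (const step of steps) {
    ctx.trace.push(step.name);
    const rejection = step.run(ctx);
    if (rejection !== undefined) {
      // Short-circuit: later steps and the surface router never run.
      ctx.status = rejection;
      return ctx;
    }
  }
  return ctx;
}

// A three-step chain in which authentication rejects requests without a credential.
const exampleSteps: GatewayStep[] = [
  { name: 'requestId', run: () => undefined },
  { name: 'auth', run: (ctx) => (ctx.hasCredential ? undefined : 401) },
  { name: 'rateLimit', run: () => undefined },
];
```

A request rejected at step 3 never reaches rate limiting, so it does not consume the caller's quota. The request ID is assigned first, so even a rejected request can be traced.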

Gateway Responsibilities

1. Request ID

Every request is assigned a ULID requestId before any other processing. It is:

  • Attached to the Fastify request object (req.requestId)
  • Included in every log line for this request
  • Returned in the response header: X-Request-Id: <ulid>
  • Used to correlate logs across services (passed to BullMQ jobs, agent callbacks, MongoDB writes)
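
    // gateway/request-id.ts
    import { ulid } from 'ulid';

    // Runs first so that every later hook and log line carries the request ID.
    fastify.addHook('onRequest', async (req) => {
      req.requestId = ulid();
      req.log = req.log.child({ requestId: req.requestId });
    });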

2. IP Extraction

The API sits behind Traefik. The real client IP is in the X-Forwarded-For header. The gateway normalises this to req.clientIp (used by rate limiting and audit logging).

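    // Take the first (left-most) entry of X-Forwarded-For as the client IP.
    // This is only safe if Traefik overwrites any client-supplied X-Forwarded-For value.
    const xff = req.headers['x-forwarded-for'];
    const forwarded = Array.isArray(xff) ? xff[0] : xff;
    req.clientIp = forwarded?.split(',')[0]?.trim() || req.ip;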

3. Authentication

Two credential types are accepted:

| Credential | Format | Used by |
| --- | --- | --- |
| JWT (access token) | Authorization: Bearer <jwt> | Dashboard, DM Portal, Manage, Mobile |
| API key | Authorization: ApiKey <key> | CLI, agent callbacks, external integrations |
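As an illustration of how the two Authorization formats above might be told apart before validation, here is a minimal sketch. `ParsedCredential` and `parseAuthorization` are hypothetical names, and matching the scheme case-insensitively is an assumption based on general HTTP convention rather than anything this spec states.

```typescript
// Hypothetical helper: classifies the Authorization header, does not validate the token.
type ParsedCredential =
  | { kind: 'jwt'; token: string }
  | { kind: 'api_key'; token: string };

function parseAuthorization(header: string | undefined): ParsedCredential | null {
  if (!header) return null;
  // Accept exactly "<scheme> <token>" where scheme is Bearer or ApiKey.
  const match = /^(Bearer|ApiKey)\s+(\S+)$/i.exec(header.trim());
  if (!match) return null;
  const [, scheme, token] = match;
  if (!scheme || !token) return null;
  return scheme.toLowerCase() === 'bearer'
    ? { kind: 'jwt', token }
    : { kind: 'api_key', token };
}
```

A `null` result would correspond to the "Missing Authorization header" case (401); anything else is handed to the JWT or API key validation flow.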

JWT validation flow:

  1. Decode header: reject if malformed
  2. Verify signature against JWT_SECRET
  3. Check exp: reject if expired (return 401 with WWW-Authenticate: Bearer error="invalid_token", the error code RFC 6750 defines for expired tokens; the response body carries TOKEN_EXPIRED)
  4. Attach decoded payload to req.principal

API key validation flow:

  1. Look up key hash in PostgreSQL api_keys table
  2. Check is_active and expires_at
  3. Load associated user + role + tenant scopes
  4. Attach to req.principal in the same shape as JWT principal

Agent callbacks use a short-lived task-scoped JWT (issued per run, 30-min TTL, signed with JWT_SECRET). The gateway validates these the same way as user JWTs — the sub is the runId, not a userId.

Unauthenticated paths (bypass auth hook):

  • POST /auth/v1/login
  • POST /auth/v1/refresh
  • GET /health

    interface GatewayPrincipal {
      id: string;            // user ref_id or runId (agent)
      type: 'human' | 'agent' | 'api_key';
      role: 'admin' | 'member' | 'reviewer' | 'super_admin' | 'agent';
      tenantId?: string;     // absent for super_admin and cross-tenant reviewers
      appAccess: string[];   // which surfaces this principal can reach
      keyId?: string;        // api_keys.ref_id (api_key type only)
    }

4. Tenant Resolution

After auth, the gateway resolves and validates the tenant context:

| Surface | Source of tenantId |
| --- | --- |
| /dashboard/v1 | JWT tenantId field |
| /mobile/v1 | JWT tenantId field |
| /dm/v1 | Optional ?tenantId= query param; validated against reviewer’s assigned tenants |
| /admin/v1 | Optional path param :tenantId; no restriction (super_admin only) |
| /cli/v1 | Optional ?tenantId= or X-Tenant-Id header; validated against principal’s access |
| /agent/v1 | From run record in PostgreSQL (looked up by runId) |

The resolved tenant record is attached to req.tenant. All downstream route handlers use req.tenant.id — they never look it up themselves.
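The per-surface rules in the table can be sketched as a pure function. All names here (`TenantInputs`, `resolveTenantId`, and the field names) are hypothetical. The 401 returned when a required tenant is missing is an assumption; the source only specifies 401 for a tenant not found in the DB and 403 for an unassigned reviewer.

```typescript
type Surface = 'dashboard' | 'mobile' | 'dm' | 'admin' | 'cli' | 'agent';

// Hypothetical inputs, gathered by the caller from the JWT, the request, and the run record.
interface TenantInputs {
  jwtTenantId?: string;        // tenantId claim from the JWT
  requestedTenantId?: string;  // ?tenantId=, :tenantId, or X-Tenant-Id, depending on surface
  allowedTenantIds?: string[]; // tenants the principal may access (dm, cli)
  runTenantId?: string;        // tenant of the run record (agent)
}

type TenantResult =
  | { ok: true; tenantId: string | undefined }
  | { ok: false; status: 401 | 403 };

function resolveTenantId(surface: Surface, input: TenantInputs): TenantResult {
  switch (surface) {
    case 'dashboard':
    case 'mobile':
      // Tenant always comes from the JWT on these surfaces.
      return input.jwtTenantId
        ? { ok: true, tenantId: input.jwtTenantId }
        : { ok: false, status: 401 }; // assumption: a missing claim is treated as 401
    case 'dm':
    case 'cli': {
      // Tenant is optional, but a requested tenant must be one the principal can access.
      const requested = input.requestedTenantId;
      if (requested === undefined) return { ok: true, tenantId: undefined };
      const allowed = input.allowedTenantIds ?? [];
      return allowed.includes(requested)
        ? { ok: true, tenantId: requested }
        : { ok: false, status: 403 };
    }
    case 'admin':
      // super_admin only: any requested tenant is accepted.
      return { ok: true, tenantId: input.requestedTenantId };
    case 'agent':
      return input.runTenantId
        ? { ok: true, tenantId: input.runTenantId }
        : { ok: false, status: 401 }; // assumption: an unknown run is treated as 401
  }
}
```

The sketch stops at the tenant ID. Loading the tenant record and attaching it to req.tenant would follow as a separate lookup.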


5. Rate Limiting

Sliding-window rate limits are enforced per (tenantId, userId) key in Redis; agent requests are keyed by runId instead. Limits are configured per surface:

| Surface | Requests / minute | Burst allowance |
| --- | --- | --- |
| /dashboard/v1 | 300 | +60 (20% burst) |
| /dm/v1 | 600 | +120 |
| /admin/v1 | 120 | +24 |
| /mobile/v1 | 200 | +40 |
| /cli/v1 | 1,200 | +240 (scripts may send bursts) |
| /agent/v1 | 60 per runId | |

On every request, the gateway:

  1. Increments the Redis counter for rate:{surface}:{tenantId}:{userId}
  2. Sets TTL of 60s on first write
  3. If count > limit: return 429 with Retry-After and X-RateLimit-* headers
  4. Otherwise: attach remaining count to response headers

    X-RateLimit-Limit: 300
    X-RateLimit-Remaining: 247
    X-RateLimit-Reset: 1704067261   # Unix timestamp when window resets
    Retry-After: 14                 # seconds (only on 429)
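The four steps above can be sketched with an in-memory stand-in for Redis. `CounterStore`, `InMemoryCounterStore`, and `checkRateLimit` are hypothetical names; a real implementation would use Redis INCR and EXPIRE. Note that a counter with a 60-second TTL set on first write behaves as a fixed window. A true sliding window would need something like per-request timestamps, which this sketch does not attempt. The burst allowance is also left out, because the source does not say how it is applied.

```typescript
// Stand-in for Redis INCR plus EXPIRE-on-first-write (assumption: times are epoch seconds).
interface CounterStore {
  incr(key: string, ttlSeconds: number, nowSeconds: number): { count: number; resetAt: number };
}

class InMemoryCounterStore implements CounterStore {
  private entries = new Map<string, { count: number; resetAt: number }>();

  incr(key: string, ttlSeconds: number, nowSeconds: number): { count: number; resetAt: number } {
    const current = this.entries.get(key);
    if (!current || current.resetAt <= nowSeconds) {
      // First write in this window: start at 1 and set the expiry.
      const fresh = { count: 1, resetAt: nowSeconds + ttlSeconds };
      this.entries.set(key, fresh);
      return { ...fresh };
    }
    current.count += 1;
    return { ...current };
  }
}

interface RateDecision {
  allowed: boolean;
  headers: Record<string, string>;
}

function checkRateLimit(
  store: CounterStore,
  surface: string,
  tenantId: string,
  userId: string,
  limit: number,
  nowSeconds: number,
): RateDecision {
  const key = `rate:${surface}:${tenantId}:${userId}`;
  const { count, resetAt } = store.incr(key, 60, nowSeconds);
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, limit - count)),
    'X-RateLimit-Reset': String(resetAt),
  };
  if (count > limit) {
    // Only rejected responses carry Retry-After.
    headers['Retry-After'] = String(Math.max(1, resetAt - nowSeconds));
    return { allowed: false, headers };
  }
  return { allowed: true, headers };
}
```

With a limit of 2 and a window that opened at t=1000, a third request at t=1046 is rejected with Retry-After: 14, and a request at t=1060 starts a fresh window.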

6. Throttling

Rate limiting counts total requests. Throttling caps concurrent requests per user on each surface, so that a single client cannot saturate the API with slow, long-running requests (e.g. long-lived SSE connections, file uploads).

| Surface | Max concurrent per user |
| --- | --- |
| /dashboard/v1 | 10 |
| /dm/v1 | 20 |
| /admin/v1 | 5 |
| /mobile/v1 | 8 |
| /cli/v1 | 15 |
| SSE connections | 5 per user (shared across surfaces) |

Throttling is implemented with a Redis counter that is incremented on request start and decremented on response end (including on error or client disconnect).
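The increment and decrement pairing is easy to get wrong when a request can end in more than one way. The sketch below uses an in-memory map in place of Redis, and `ConcurrencyLimiter` is a hypothetical name. The idempotent release function is a design suggestion, not something the source prescribes.

```typescript
class ConcurrencyLimiter {
  private inFlight = new Map<string, number>();
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  // Returns a release function when a slot is granted, or null when the cap is reached.
  acquire(key: string): (() => void) | null {
    const current = this.inFlight.get(key) ?? 0;
    if (current >= this.max) return null;
    this.inFlight.set(key, current + 1);

    let released = false;
    return () => {
      // Idempotent: safe to call from both the response path and the disconnect path.
      if (released) return;
      released = true;
      const next = (this.inFlight.get(key) ?? 1) - 1;
      if (next <= 0) this.inFlight.delete(key);
      else this.inFlight.set(key, next);
    };
  }

  active(key: string): number {
    return this.inFlight.get(key) ?? 0;
  }
}
```

A `null` from `acquire` corresponds to the 429 THROTTLED response. Calling the release function twice decrements the counter only once, which prevents the count from drifting when an error and a disconnect both fire for the same request.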


7. Request Logging

Every request is logged as a structured JSON entry before the route handler runs:


    {
      "level": "info",
      "time": "2026-04-04T09:00:00.000Z",
      "requestId": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
      "method": "POST",
      "path": "/dm/v1/approvals/01ARZ.../resolve",
      "surface": "dm",
      "userId": "01BRZ...",
      "tenantId": "01CRZ...",
      "clientIp": "203.0.113.5",
      "userAgent": "Leadmetrics-CLI/1.0.0"
    }

And after the handler completes:


    {
      "level": "info",
      "requestId": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
      "status": 200,
      "durationMs": 42,
      "responseBytes": 318
    }

Logs are written via pino (already in the stack) and shipped to Grafana Loki.


8 & 10. Audit Logging

The gateway writes an audit record for every state-changing operation (POST, PUT, PATCH, DELETE). Read-only GETs are not audited (they are covered by request logs).

Pre-hook (step 8): Before the handler runs, for update/delete operations, the gateway fetches the current state of the resource and attaches it to req.auditBefore.

Post-hook (step 10): After the handler returns successfully, the gateway writes an audit_logs record to PostgreSQL:


    interface AuditLog {
      id: string;                   // ULID
      requestId: string;            // from step 1
      tenantId: string | null;
      actorId: string;              // user ref_id or 'agent:{runId}'
      actorType: 'human' | 'agent' | 'api_key';
      impersonating: string | null; // set if super_admin is impersonating
      surface: string;              // 'dashboard' | 'dm' | 'admin' | 'mobile' | 'cli' | 'agent'
      method: string;               // HTTP method
      path: string;                 // full request path
      action: string;               // semantic label, e.g. 'approval.resolve'
      resourceType: string;         // e.g. 'approval', 'activity', 'tenant'
      resourceId: string;           // ULID of the affected resource
      before: JsonValue | null;     // state before change (nullable)
      after: JsonValue | null;      // state after change (nullable)
      status: number;               // HTTP response status
      durationMs: number;
      createdAt: Date;
    }

The action label is set by the route handler via req.setAuditAction('approval.resolve'). If not set, the gateway derives it from the method + path (e.g. POST /dm/v1/approvals/:id/resolve → dm.approvals.resolve).
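One possible derivation rule that reproduces the example above is sketched here. The source gives only that single example, so everything beyond it is an assumption: version and parameter segments are dropped, a static segment that follows a parameter is treated as the action, and otherwise a verb derived from the HTTP method is appended. `deriveAuditAction` and `METHOD_VERBS` are hypothetical names.

```typescript
// Assumed mapping from HTTP method to a fallback verb.
const METHOD_VERBS: Partial<Record<string, string>> = {
  POST: 'create',
  PUT: 'update',
  PATCH: 'update',
  DELETE: 'delete',
};

function deriveAuditAction(method: string, routePattern: string): string {
  const segments = routePattern.split('/').filter((s) => s.length > 0);
  const isVersion = (s: string): boolean => /^v\d+$/.test(s);
  const isParam = (s: string): boolean => s.startsWith(':');

  // Keep only the named segments: surface, resource, and any trailing action.
  const names = segments.filter((s) => !isVersion(s) && !isParam(s));

  const last = segments[segments.length - 1];
  const prev = segments[segments.length - 2];
  const endsWithAction =
    last !== undefined && prev !== undefined && !isParam(last) && isParam(prev);
  if (endsWithAction) return names.join('.');

  // No explicit action segment: fall back to a verb derived from the method.
  const verb = METHOD_VERBS[method.toUpperCase()] ?? method.toLowerCase();
  return [...names, verb].join('.');
}
```

Under these assumptions, `POST /dm/v1/approvals/:id/resolve` yields `dm.approvals.resolve` as in the example, while `DELETE /dm/v1/approvals/:id` would yield `dm.approvals.delete`.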

Impersonation flagging: When a super_admin is impersonating a tenant, impersonating is set to the target tenantId. This makes every action traceable even across impersonation sessions.


Package Structure


    apps/api/src/
    ├── gateway/
    │   ├── index.ts        # Fastify plugin: registers hooks in order
    │   ├── request-id.ts   # Step 1: assign ULID request ID
    │   ├── ip.ts           # Step 2: resolve real client IP
    │   ├── auth.ts         # Step 3: JWT + API key validation
    │   ├── tenant.ts       # Step 4: tenant resolution + attachment
    │   ├── rate-limit.ts   # Step 5: sliding window Redis rate limiter
    │   ├── throttle.ts     # Step 6: concurrent request cap
    │   ├── logger.ts       # Step 7 + 9: request + response logs
    │   └── audit.ts        # Step 8 + 10: audit pre/post hooks
    ├── routers/
    │   ├── auth/           # /auth/v1
    │   ├── dashboard/      # /dashboard/v1
    │   ├── dm/             # /dm/v1
    │   ├── admin/          # /admin/v1
    │   ├── mobile/         # /mobile/v1
    │   ├── cli/            # /cli/v1
    │   └── agent/          # /agent/v1
    ├── common/             # Shared non-gateway utilities
    │   ├── pagination.ts
    │   ├── error.ts
    │   └── sse.ts
    └── index.ts            # App bootstrap: register gateway plugin, then routers

Registration order in index.ts:


    // 1. Register gateway (must be first, before any routers).
    //    Fastify encapsulates hooks per plugin, so gatewayPlugin must be exported
    //    through fastify-plugin for its hooks to apply to the routers below.
    await fastify.register(gatewayPlugin);

    // 2. Register surface routers (gateway hooks already in place)
    await fastify.register(authRouter, { prefix: '/auth/v1' });
    await fastify.register(dashboardRouter, { prefix: '/dashboard/v1' });
    await fastify.register(dmRouter, { prefix: '/dm/v1' });
    await fastify.register(adminRouter, { prefix: '/admin/v1' });
    await fastify.register(mobileRouter, { prefix: '/mobile/v1' });
    await fastify.register(cliRouter, { prefix: '/cli/v1' });
    await fastify.register(agentRouter, { prefix: '/agent/v1' });

Surface Access Matrix

The gateway enforces this access matrix before forwarding to any router:

| Surface prefix | Allowed roles | Allowed credential types |
| --- | --- | --- |
| /auth/v1 | n/a (public login/refresh endpoints) | None required |
| /dashboard/v1 | admin, member | JWT |
| /dm/v1 | reviewer, super_admin | JWT |
| /admin/v1 | super_admin | JWT |
| /mobile/v1 | admin, member | JWT |
| /cli/v1 | reviewer, super_admin | API key, JWT |
| /agent/v1 | agent | Task-scoped JWT |

Any mismatch returns 403 Forbidden before the router is reached.
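The matrix lends itself to a data-driven check. In this sketch `ACCESS_MATRIX`, `isSurfaceAllowed`, and the `CredentialType` labels are hypothetical names. Denying unknown prefixes by default is an assumption, and /auth/v1 is treated as public for every path, which is a simplification of the unauthenticated path list.

```typescript
type Role = 'admin' | 'member' | 'reviewer' | 'super_admin' | 'agent';
type CredentialType = 'jwt' | 'api_key' | 'task_jwt';

interface SurfaceRule {
  roles: Role[];
  credentials: CredentialType[];
}

// Mirrors the access matrix; /auth/v1 is public and therefore not listed.
const ACCESS_MATRIX: Partial<Record<string, SurfaceRule>> = {
  '/dashboard/v1': { roles: ['admin', 'member'], credentials: ['jwt'] },
  '/dm/v1': { roles: ['reviewer', 'super_admin'], credentials: ['jwt'] },
  '/admin/v1': { roles: ['super_admin'], credentials: ['jwt'] },
  '/mobile/v1': { roles: ['admin', 'member'], credentials: ['jwt'] },
  '/cli/v1': { roles: ['reviewer', 'super_admin'], credentials: ['api_key', 'jwt'] },
  '/agent/v1': { roles: ['agent'], credentials: ['task_jwt'] },
};

function isSurfaceAllowed(prefix: string, role: Role, credential: CredentialType): boolean {
  if (prefix === '/auth/v1') return true; // public login/refresh endpoints
  const rule = ACCESS_MATRIX[prefix];
  if (!rule) return false; // assumption: unknown surfaces are denied by default
  return rule.roles.includes(role) && rule.credentials.includes(credential);
}
```

A `false` result corresponds to the 403 Forbidden response. Keeping the rules in one table means a policy change is a single edit.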


Error Responses from the Gateway

Gateway errors use the same ApiError envelope as surface routers:

| Step | Error condition | Status | Code |
| --- | --- | --- | --- |
| Auth | Missing Authorization header | 401 | UNAUTHORIZED |
| Auth | JWT signature invalid | 401 | INVALID_TOKEN |
| Auth | JWT expired | 401 | TOKEN_EXPIRED |
| Auth | API key not found / inactive | 401 | INVALID_API_KEY |
| Tenant | tenantId in JWT not found in DB | 401 | TENANT_NOT_FOUND |
| Tenant | Reviewer not assigned to requested tenant | 403 | FORBIDDEN |
| Surface access | Role not allowed on this surface | 403 | FORBIDDEN |
| Rate limit | Request count exceeded | 429 | RATE_LIMITED |
| Throttle | Concurrent request cap exceeded | 429 | THROTTLED |
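One way to keep codes and statuses from drifting apart is to define the table once and derive the status from the code. This is a sketch only: `GATEWAY_ERROR_STATUS` and `GatewayError` are hypothetical names, and the shape of the ApiError envelope itself is not shown because the source does not define it here.

```typescript
// Code-to-status mapping taken from the table above.
const GATEWAY_ERROR_STATUS = {
  UNAUTHORIZED: 401,
  INVALID_TOKEN: 401,
  TOKEN_EXPIRED: 401,
  INVALID_API_KEY: 401,
  TENANT_NOT_FOUND: 401,
  FORBIDDEN: 403,
  RATE_LIMITED: 429,
  THROTTLED: 429,
} as const;

type GatewayErrorCode = keyof typeof GATEWAY_ERROR_STATUS;

class GatewayError extends Error {
  readonly code: GatewayErrorCode;
  readonly status: number;

  constructor(code: GatewayErrorCode, message?: string) {
    super(message ?? code);
    this.name = 'GatewayError';
    this.code = code;
    // The status is looked up, never passed in, so it always matches the table.
    this.status = GATEWAY_ERROR_STATUS[code];
  }
}
```

A hook would throw `new GatewayError('THROTTLED')`, and a single error handler could translate it into the shared envelope.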

Future Upgrade — Option B: Separate Gateway Service

Future story — not in scope for the current build.

As tenant volume and traffic grow, the gateway layer can be extracted into a standalone gateway service that proxies to separately-deployed surface services. The upgrade path is:

  1. Extract gateway plugin into a standalone Fastify app (apps/gateway)
  2. Split surface routers into separate deployable services (apps/api-dashboard, apps/api-dm, apps/api-admin, etc.) — each on its own port
  3. Gateway proxies using @fastify/http-proxy — routes by URL prefix to the correct downstream service after running all gateway hooks
  4. Each service removes its own auth/rate-limit middleware (gateway now owns this exclusively)
  5. Coolify deploys gateway + each service as separate containers with a private Docker network between them — only the gateway is exposed to Traefik

Benefits of Option B:

  • Individual surfaces can scale independently (e.g. api-dm gets more replicas during peak review hours)
  • Surface services can be deployed separately without taking down the whole API
  • Gateway becomes a true choke point — circuit breaking, retries, and observability all in one place

Pre-conditions before migration:

  • Each surface must be independently testable
  • API contracts between gateway and services must be stable
  • Per-service latency monitoring must be in place to identify where scaling is actually needed

© 2026 Leadmetrics — Internal use only