
Governance & Guardrails

Purpose

Ensure agents operate within defined boundaries — critical for a client-facing agency. Agents must never publish content, send emails, or make ad changes without explicit human approval. Brand voice must be consistent. Character limits must be enforced. Runaway agents must be halted automatically.

Related: Workflow Model — HITL approval activities | Task Queue — activity state machine | Agent Hierarchy — which agents have write-action tools


Guardrail Layers

┌──────────────────────────────────────────────────────────┐
│ Layer 1: Permission Whitelist (agent config)             │
│ Which tools can this agent even call?                    │
└──────────────────────────────┬───────────────────────────┘
                               │ passes
┌──────────────────────────────▼───────────────────────────┐
│ Layer 2: Output Validators (post-generation)             │
│ Character limits, brand voice, banned words              │
└──────────────────────────────┬───────────────────────────┘
                               │ passes
┌──────────────────────────────▼───────────────────────────┐
│ Layer 3: Approvals Queue (human-in-the-loop)             │
│ Human signs off before ANY client-facing action          │
└──────────────────────────────┬───────────────────────────┘
                               │ approved
┌──────────────────────────────▼───────────────────────────┐
│ Layer 4: Rate Limiting & Budget Halt                     │
│ Prevent runaway loops and cost overruns                  │
└──────────────────────────────────────────────────────────┘

Layer 1 — Permission Whitelist

Each agent has a toolNames[] array in its config. The tool dispatcher checks this before executing any tool call.

async function dispatchToolCall(
  toolName: string,
  method: string,
  input: unknown,
  agentConfig: AgentConfig,
  taskRun: TaskRun,
): Promise<unknown> {
  // 1. Permission check
  if (!agentConfig.toolNames.includes(toolName)) {
    throw new PermissionDeniedError(
      `Agent "${agentConfig.role}" does not have permission to use tool "${toolName}"`
    );
  }

  // 2. Write-action approval check
  if (isWriteAction(toolName, method)) {
    await assertApprovalExists(taskRun.taskId, toolName, method);
  }

  // 3. Execute
  return integrations[toolName][method](input);
}

Defined write actions (require approval):

  • google_ads.createAd, google_ads.pauseAd, google_ads.updateBid
  • meta_ads.createAd, meta_ads.pauseAd
  • wordpress.createPost (with status: 'publish')
  • mailchimp.scheduleCampaign
  • klaviyo.createCampaign
  • slack.postMessage (to client channels only)
  • google_docs.shareDocument (to external emails)
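One way the dispatcher's isWriteAction check could be backed is a flat set of tool.method keys. This is a minimal sketch, not the actual implementation: the conditional qualifiers in the list above (publish-only status, client channels, external emails) would additionally require inspecting the call input, which this lookup alone cannot do.

```typescript
// Sketch (assumed structure): write actions registered as "tool.method" keys.
// Qualified entries (e.g. slack.postMessage to client channels only) still
// pass through here and are narrowed later by input inspection.
const WRITE_ACTIONS = new Set([
  'google_ads.createAd', 'google_ads.pauseAd', 'google_ads.updateBid',
  'meta_ads.createAd', 'meta_ads.pauseAd',
  'wordpress.createPost',
  'mailchimp.scheduleCampaign',
  'klaviyo.createCampaign',
  'slack.postMessage',
  'google_docs.shareDocument',
]);

function isWriteAction(toolName: string, method: string): boolean {
  return WRITE_ACTIONS.has(`${toolName}.${method}`);
}
```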

Layer 2 — Output Validators

Validators run after the agent produces a deliverable, before it is written to the deliverables table. A hard-fail validator blocks the deliverable from being saved and sends the task back for re-run. A warn validator flags the issue in approvals.validation_results for human review.

Character Limit Validator

interface CharacterLimitRule {
  field: string;
  platform: string;
  maxChars: number;
  severity: 'hard_fail' | 'warn';
}

const PLATFORM_LIMITS: CharacterLimitRule[] = [
  { field: 'headline',      platform: 'google_ads', maxChars: 30,   severity: 'hard_fail' },
  { field: 'description',   platform: 'google_ads', maxChars: 90,   severity: 'hard_fail' },
  { field: 'primary_text',  platform: 'meta_ads',   maxChars: 125,  severity: 'hard_fail' },
  { field: 'headline',      platform: 'meta_ads',   maxChars: 40,   severity: 'hard_fail' },
  { field: 'subject_line',  platform: 'email',      maxChars: 60,   severity: 'warn' },
  { field: 'preview_text',  platform: 'email',      maxChars: 100,  severity: 'warn' },
  { field: 'tweet',         platform: 'x_twitter',  maxChars: 280,  severity: 'hard_fail' },
  { field: 'linkedin_post', platform: 'linkedin',   maxChars: 3000, severity: 'warn' },
];

function validateCharacterLimits(content: DeliverableContent): ValidationResult[] {
  // Parse structured content, check each field against limits
  // Return array of { field, platform, actual, limit, severity }
}

If any hard_fail violation is found, the task is marked failed and BullMQ re-runs it with the violation details appended to the task prompt so the agent corrects itself.
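The appended violation details might look like the following. This is a hypothetical helper (the function name and `Violation` shape are assumptions based on the `ValidationResult` comment above), not the production prompt format:

```typescript
// Hypothetical: build the retry prompt by appending hard_fail violations
// so the re-run agent can self-correct.
interface Violation {
  field: string;
  platform: string;
  actual: number;
  limit: number;
}

function buildRetryPrompt(originalPrompt: string, violations: Violation[]): string {
  const lines = violations.map(
    v => `- ${v.platform}.${v.field}: ${v.actual} chars (limit ${v.limit}). Shorten it.`
  );
  return `${originalPrompt}\n\nPREVIOUS ATTEMPT FAILED VALIDATION:\n${lines.join('\n')}`;
}
```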

Brand Voice Validator

A lightweight LLM call (using a cheap Ollama model to keep costs low) that checks generated content against the client’s brand-voice-guide.md:

async function validateBrandVoice(
  content: string,
  brandVoiceSkill: string,
  clientId: string
): Promise<BrandVoiceScore> {
  const prompt = `
You are a brand voice reviewer. Given the brand voice guide and the content below,
score the content from 0-100 for brand alignment and list any violations.

BRAND VOICE GUIDE:
${brandVoiceSkill}

CONTENT TO REVIEW:
${content}

Respond with JSON: { "score": number, "violations": string[], "recommendations": string[] }
`;
  const result = await ollamaExecute('gemma3:4b', prompt);
  return JSON.parse(result.text);
}

Score below 60 = warn (surfaced in approvals UI). Below 40 = re-run with brand voice feedback.
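The threshold routing above can be sketched as a small pure function (the function and outcome names are assumptions; the cut-offs are the ones stated):

```typescript
// Route a brand voice score: >= 60 passes silently, 40-59 is surfaced as a
// warning in the approvals UI, < 40 triggers a re-run with feedback.
type BrandVoiceOutcome = 'pass' | 'warn' | 'rerun';

function routeBrandVoiceScore(score: number): BrandVoiceOutcome {
  if (score < 40) return 'rerun';
  if (score < 60) return 'warn';
  return 'pass';
}
```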

Banned Words Filter

const GLOBAL_BANNED_WORDS = ['guaranteed', 'best in class', 'world-class', /* ... */];

async function validateBannedWords(content: string, clientId: string): Promise<string[]> {
  const clientBannedWords = await getClientBannedWords(clientId); // from client settings
  const allBanned = [...GLOBAL_BANNED_WORDS, ...clientBannedWords];
  return allBanned.filter(word =>
    content.toLowerCase().includes(word.toLowerCase())
  );
}

Any match = hard fail. The matched words and their positions are returned to the agent in the retry prompt.
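The filter above returns only the matched words; recovering the positions mentioned here would take a second pass. A hypothetical sketch (function name assumed) of that position scan:

```typescript
// Find every occurrence of each banned word, case-insensitively, so the
// retry prompt can point the agent at exact character offsets.
function findBannedWordPositions(
  content: string,
  bannedWords: string[],
): { word: string; index: number }[] {
  const haystack = content.toLowerCase();
  const hits: { word: string; index: number }[] = [];
  for (const word of bannedWords) {
    const needle = word.toLowerCase();
    let from = 0;
    let idx: number;
    while ((idx = haystack.indexOf(needle, from)) !== -1) {
      hits.push({ word, index: idx });
      from = idx + needle.length;
    }
  }
  return hits;
}
```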

Required Disclaimer Injection

For regulated industries (finance, health, legal), required disclaimers are automatically appended:

async function injectDisclaimers(
  content: string,
  clientId: string,
  deliverableType: string
): Promise<string> {
  const disclaimers = await getRequiredDisclaimers(clientId, deliverableType);
  if (!disclaimers.length) return content;
  return content + '\n\n---\n\n' + disclaimers.join('\n\n');
}

Layer 3 — Human-in-the-Loop (HITL) Approvals

Non-negotiable gate

No agent can trigger a write action on an external system without a corresponding approvals record with status = 'approved'. This is enforced at the tool dispatcher level (Layer 1) — the UI cannot bypass it.
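A minimal sketch of the check behind assertApprovalExists, under stated assumptions: the real function queries the approvals table, and the approvals schema below does not carry tool/method columns, so here they are modeled as fields on a simplified record purely for clarity.

```typescript
// Hypothetical, simplified approval record; the production check would query
// the approvals table and match the action via description or a payload column.
interface ApprovalRecord {
  type: string;
  status: 'pending' | 'approved' | 'rejected' | 'expired';
  toolName?: string;
  method?: string;
}

class ApprovalRequiredError extends Error {}

function assertChannelActionApproved(
  approvalsForTask: ApprovalRecord[],
  toolName: string,
  method: string,
): void {
  const ok = approvalsForTask.some(
    a =>
      a.type === 'channel_action' &&
      a.status === 'approved' &&
      a.toolName === toolName &&
      a.method === method,
  );
  if (!ok) {
    throw new ApprovalRequiredError(
      `No approved channel_action approval for ${toolName}.${method}`,
    );
  }
}
```

Because the gate lives in the dispatcher rather than the UI layer, even a compromised or buggy frontend cannot skip it.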


Approval Types

| Type | Created by | Who decides | What happens on resolution |
|------|------------|-------------|----------------------------|
| content_review | System (auto, when agent deliverable completes) | DM reviewer | Approved → publish gate unlocked; Rejected → revision activity |
| content_direction | Agent (create_approval tool) | DM reviewer / tenant admin | All linked writing activities re-enqueued with direction decision |
| brand_direction | Agent (create_approval tool) | DM reviewer / tenant admin | Linked agents resume with approved brand guidance |
| strategy_change | Agent (create_approval tool) | DM reviewer + tenant admin | Strategy updated; linked campaign activities re-planned |
| budget_authorization | Agent (create_approval tool) | Tenant admin only | Budget allocated; blocked activities resume |
| channel_action | System (write-tool gate) | DM reviewer | Approved → external API call executes; Rejected → action cancelled |

Default risk level by type:

| Type | Default risk | Expiry window |
|------|--------------|---------------|
| content_review | Derived from deliverable (see table below) | 72h low / 48h medium / 24h high |
| content_direction | medium | 48h |
| brand_direction | medium | 48h |
| strategy_change | high | 24h |
| budget_authorization | high | 24h |
| channel_action | high | 24h |

Risk level by deliverable type (content_review only):

| Deliverable | Risk | Reason |
|-------------|------|--------|
| Blog post draft | low | No external action |
| Social post | medium | Direct audience reach |
| Ad copy | medium | Spend-impacting |
| Email campaign | high | Irreversible send |
| Live ad create/update | high | Immediate spend impact |
| Live social post | high | Public, hard to retract |
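The two lookups used when system-created approvals are built (deriveRiskLevel, expiryHoursForRisk) could be sketched as plain maps. The hour values and risk tiers come from the tables above; the deliverable-type keys are assumptions, since the canonical enum is not shown here.

```typescript
type RiskLevel = 'low' | 'medium' | 'high';

// Keys are assumed names for the deliverable types listed in the table above.
const DELIVERABLE_RISK: Record<string, RiskLevel> = {
  blog_post_draft: 'low',
  social_post: 'medium',
  ad_copy: 'medium',
  email_campaign: 'high',
  live_ad_change: 'high',
  live_social_post: 'high',
};

function deriveRiskLevel(deliverableType: string): RiskLevel {
  // Unknown types default to high: safer to over-review than under-review.
  return DELIVERABLE_RISK[deliverableType] ?? 'high';
}

function expiryHoursForRisk(risk: RiskLevel): number {
  return { low: 72, medium: 48, high: 24 }[risk];
}
```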

Database Schema

CREATE TABLE approvals (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  type VARCHAR(50) NOT NULL,                      -- ApprovalType enum
  title TEXT NOT NULL,                            -- short label in DM Portal inbox
  description TEXT,                               -- context for the reviewer
  status VARCHAR(20) NOT NULL DEFAULT 'pending',  -- pending | approved | rejected | expired
  risk_level VARCHAR(10) NOT NULL,                -- low | medium | high
  created_by_type VARCHAR(10) NOT NULL,           -- 'system' | 'agent'
  created_by_agent_role VARCHAR(100),             -- set when created_by_type = 'agent'
  reviewed_by_user_id UUID REFERENCES users(id),
  reviewer_notes TEXT,
  validation_results JSONB,                       -- output validator results (content_review only)
  options TEXT[],                                 -- reviewer choice options (if applicable)
  expires_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  resolved_at TIMESTAMPTZ
);

-- Links one approval to one or more activities that are blocked on it
CREATE TABLE approval_linked_activities (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  approval_id UUID NOT NULL REFERENCES approvals(id) ON DELETE CASCADE,
  activity_id UUID NOT NULL REFERENCES activities(id),
  linked_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE (approval_id, activity_id)
);

CREATE INDEX ON approvals(tenant_id, status);
CREATE INDEX ON approvals(expires_at) WHERE status = 'pending';
CREATE INDEX ON approval_linked_activities(approval_id);
CREATE INDEX ON approval_linked_activities(activity_id);

How Approvals are Created

1. System-created (automatic) — content_review and channel_action

Triggered automatically when an agent deliverable completes or when a write-action tool call is intercepted:

// After agent deliverable saved (content_review)
const riskLevel = deriveRiskLevel(deliverable.type);
const [approval] = await db.insert(approvals).values({
  tenantId: tenant.id,
  type: 'content_review',
  title: `Review: ${deliverable.name}`,
  description: `Agent-produced deliverable ready for review.`,
  status: 'pending',
  riskLevel,
  createdByType: 'system',
  validationResults: validatorOutput,
  expiresAt: addHours(new Date(), expiryHoursForRisk(riskLevel)),
}).returning();

// Link to the producing activity
await db.insert(approvalLinkedActivities).values({
  approvalId: approval.id,
  activityId: activity.id,
});

2. Agent-created — via create_approval tool

Any agent can create a first-class approval for multi-activity decisions. See Tool / Integration Layer for the full tool spec.

// Tool handler for create_approval
async function handleCreateApproval(
  input: CreateApprovalInput,
  context: ToolCallContext,
): Promise<CreateApprovalResult> {
  const riskLevel = input.riskLevel ?? defaultRiskForType(input.type);
  const [approval] = await db.insert(approvals).values({
    tenantId: context.tenantId,
    type: input.type,
    title: input.title,
    description: input.description,
    status: 'pending',
    riskLevel,
    createdByType: 'agent',
    createdByAgentRole: context.agentRole,
    options: input.options ?? null,
    expiresAt: addHours(new Date(), input.expiresInHours ?? expiryHoursForRisk(riskLevel)),
  }).returning();

  // Block the calling activity if requested (default: true)
  const activityIdsToBlock = [...(input.linkedActivityIds ?? [])];
  if (input.blockCurrentActivity !== false) {
    activityIdsToBlock.push(context.activityId);
  }

  // Link and block all target activities
  for (const activityId of activityIdsToBlock) {
    await db.insert(approvalLinkedActivities).values({ approvalId: approval.id, activityId });
    await db.update(activities)
      .set({ status: 'awaiting_approval', approvalId: approval.id })
      .where(eq(activities.id, activityId));
  }

  return {
    status: 'approval_created',
    approvalId: approval.id,
    message: `Approval created. ${activityIdsToBlock.length} activities suspended until resolved.`,
  };
}

Approval Resolution Flow

When a human reviewer approves or rejects in the DM Portal:

async function onApprovalResolved(
  approvalId: string,
  tenantId: string,
  approved: boolean,
  reviewerNotes: string,
  userId: string,
  chosenOption?: string, // if the approval had options[], which one the reviewer selected
): Promise<void> {
  // 1. Update the approval record
  await db.update(approvals)
    .set({
      status: approved ? 'approved' : 'rejected',
      reviewedByUserId: userId,
      reviewerNotes,
      resolvedAt: new Date(),
    })
    .where(eq(approvals.id, approvalId));

  // 2. Fetch all linked activities
  const links = await db.query.approvalLinkedActivities.findMany({
    where: eq(approvalLinkedActivities.approvalId, approvalId),
  });

  // 3. Re-enqueue each linked activity with the resolution context
  for (const link of links) {
    const activity = await getActivity(link.activityId, tenantId);

    await db.update(activities)
      .set({ status: 'created', approvalId: null })
      .where(eq(activities.id, link.activityId));

    await enqueueActivity(activity, tenantId, activity.assigneeAgentRole, {
      wakeReason: approved ? 'review_approved' : 'review_feedback',
      reviewApproved: approved,
      reviewerFeedback: reviewerNotes +
        (chosenOption ? `\n\nSelected option: ${chosenOption}` : ''),
    });
  }
}

All blocked activities resume in a single resolution event. For a content_direction approval that was blocking 10 blog post writing activities, all 10 are re-enqueued simultaneously when the reviewer approves.


Approval Outcomes

| Reviewer action | Effect on linked activities | Effect on approval |
|-----------------|-----------------------------|--------------------|
| Approve | All re-enqueued with wakeReason: 'review_approved' + reviewer notes | status → approved |
| Approve with option | Re-enqueued with wakeReason: 'review_approved' + chosen option injected into prompt | status → approved |
| Reject / Send feedback | All re-enqueued with wakeReason: 'review_feedback' + reviewer notes | status → rejected |
| Edit and approve (content_review only) | Edited content saved as new output; activity marked done | status → approved |
| Approve subset (content_review multi-link) | Approved items advance; rejected items get revision activities | Mixed per-activity |

Approval Expiry & Urgency Escalation

A BullMQ cron job runs hourly and checks pending approvals against their expires_at:

// Runs every hour
async function checkApprovalExpiry(): Promise<void> {
  const now = new Date();

  // 1. Warning: 24h before expiry — send in-app notification to reviewers
  const approachingExpiry = await db.query.approvals.findMany({
    where: and(
      eq(approvals.status, 'pending'),
      lte(approvals.expiresAt, addHours(now, 24)),
      gt(approvals.expiresAt, now),
    ),
  });
  for (const approval of approachingExpiry) {
    await notifyReviewers(approval, 'expiry_warning');
  }

  // 2. Expired: mark as expired, block all linked activities
  const expired = await db.query.approvals.findMany({
    where: and(
      eq(approvals.status, 'pending'),
      lte(approvals.expiresAt, now),
    ),
  });
  for (const approval of expired) {
    await db.update(approvals)
      .set({ status: 'expired', resolvedAt: now })
      .where(eq(approvals.id, approval.id));

    const links = await db.query.approvalLinkedActivities.findMany({
      where: eq(approvalLinkedActivities.approvalId, approval.id),
    });

    // Escalate: create a high-priority human task in DM Portal
    await createEscalationActivity({
      tenantId: approval.tenantId,
      name: `ESCALATED: Approval expired — ${approval.title}`,
      description: `Approval "${approval.title}" expired without a decision. ` +
        `${links.length} activities are blocked. Resolve immediately.`,
      priority: 'urgent',
    });
  }
}

Escalation does not auto-proceed. Expired approvals block linked activities indefinitely until a human resolves the escalation. Auto-proceeding on expiry is too risky for client-facing content — an unreviewed social post publishing or an ad going live would be a worse outcome than a delayed deliverable.

Urgency flag: Approvals created within 6 hours of a campaign go-live date (derived from deliverable_periods.endDate) are automatically flagged riskLevel: 'high' and get a 6h expiry window regardless of type.
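The urgency rule could be applied at approval-creation time with a small helper. A sketch under stated assumptions: the function name, parameter shape, and the choice to anchor the 6h expiry at creation time are all assumptions, only the 6-hour window and the forced 'high' risk come from the rule above.

```typescript
const URGENCY_WINDOW_MS = 6 * 60 * 60 * 1000; // 6 hours

type RiskLevel = 'low' | 'medium' | 'high';

// If the approval is created within 6h of campaign go-live, force risk to
// 'high' and tighten the expiry to 6h; otherwise keep the defaults.
function applyUrgencyFlag(
  createdAt: Date,
  goLiveAt: Date,
  riskLevel: RiskLevel,
  expiresAt: Date,
): { riskLevel: RiskLevel; expiresAt: Date } {
  const msToGoLive = goLiveAt.getTime() - createdAt.getTime();
  if (msToGoLive >= 0 && msToGoLive <= URGENCY_WINDOW_MS) {
    return {
      riskLevel: 'high',
      expiresAt: new Date(createdAt.getTime() + URGENCY_WINDOW_MS),
    };
  }
  return { riskLevel, expiresAt };
}
```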


Layer 4 — Rate Limiting & Automatic Halt

Per-agent rate limiting

Each BullMQ worker has a configurable rateLimiter that caps how many jobs per minute an agent can run, preventing loops:

new Worker(queue, processor, {
  connection: redis,
  concurrency: 3,
  limiter: {
    max: 10,         // max 10 jobs
    duration: 60000, // per 60 seconds
  },
});

Runaway detection

A watchdog runs every 5 minutes and looks for anomalies:

async function runWatchdog(): Promise<void> {
  // Agent running more than maxTimeoutMs
  const stuckRuns = await db.query.taskRuns.findMany({
    where: and(
      eq(taskRuns.status, 'running'),
      lt(taskRuns.startedAt, new Date(Date.now() - MAX_TASK_DURATION_MS)),
    ),
  });
  for (const run of stuckRuns) {
    await killTaskRun(run);
    await createEscalation(run.taskId, 'watchdog_timeout');
  }

  // Campaign cost exceeded cap
  const overBudget = await db.query.campaigns.findMany({
    where: and(
      isNotNull(campaigns.budgetCapUsd),
      gt(campaigns.totalCostUsd, campaigns.budgetCapUsd),
    ),
  });
  for (const campaign of overBudget) {
    await pauseCampaign(campaign.id);
    await drainCampaignJobs(campaign.id);
    await notifyBudgetBreached(campaign);
  }
}

Automatic halt conditions

| Condition | Action |
|-----------|--------|
| Task cost cap exceeded | Abort call, fail task, escalate |
| Campaign budget cap exceeded | Pause campaign, drain queue, Slack alert |
| Agent LLM error rate > 50% in 10 min | Pause that agent’s queue, Slack alert |
| Task running > configured timeoutMs | Kill process, fail task, retry |
| Tool call returns rate-limit error | Back off, retry after reset, pause if persistent |
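The error-rate condition in the table reduces to a one-line predicate. A sketch only (names assumed; the sliding-window counters themselves would live elsewhere, e.g. in Redis):

```typescript
// True when more than half of an agent's LLM calls in the current 10-minute
// window have errored; the caller then pauses that agent's queue.
function shouldPauseAgentQueue(errorsInWindow: number, callsInWindow: number): boolean {
  if (callsInWindow === 0) return false; // no traffic, nothing to judge
  return errorsInWindow / callsInWindow > 0.5;
}
```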

Audit Trail

All agent actions are attributable. The tool_calls table records:

  • Which agent (task_run → agent_config)
  • For which client (task_run → task → campaign → client)
  • What action (tool_name, method)
  • What parameters (stored in input JSONB)
  • What the outcome was (output, status, error)
  • When (created_at)

This supports compliance reporting: “show me every action taken on behalf of Client X in the last 90 days.”


Package Location

apps/api/src/
├── middleware/              # Rate limiting middleware
└── routes/
    └── approvals.ts         # HITL approval endpoints (create, resolve, query)

packages/agent-engine/src/
├── validators/              # Character limits, brand voice, banned words, disclaimers
├── budget.ts                # Budget halt logic
└── watchdog.ts              # Runaway detection

packages/queue/src/
├── workers.ts               # BullMQ rate limiter config
└── approval.ts              # enqueueApprovalLinkedActivities() — fan-out on resolve

packages/integrations/src/
└── control-plane/
    └── create-approval.ts   # create_approval tool handler

Database tables: approvals, approval_linked_activities — see schema above.


Audit Trail Security

Immutability

The MongoDB audit_logs collection is append-only. The application’s MongoDB user has insert + find permissions only on audit_logs; update and delete are revoked at the database role level. This means no application code path — even one with a bug or under active attack — can tamper with past audit records. See MongoDB Security for the role definition.

PII Redaction

Before any before/after diff is written to audit_logs, fields annotated as PII in the schema (email, phone, firstName, lastName, password) are replaced with '[REDACTED]'. This prevents audit logs from becoming a PII data store, while still capturing what changed.
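A minimal sketch of that redaction pass (function name and recursion strategy are assumptions; the field list is the one stated above):

```typescript
// Fields annotated as PII in the schema, replaced before the diff is persisted.
const PII_FIELDS = new Set(['email', 'phone', 'firstName', 'lastName', 'password']);

// Recursively replace PII values anywhere in the diff payload with '[REDACTED]',
// preserving the structure so reviewers can still see *which* fields changed.
function redactPII(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactPII);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
      out[key] = PII_FIELDS.has(key) ? '[REDACTED]' : redactPII(v);
    }
    return out;
  }
  return value;
}
```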

IP addresses are stored at the top-level audit record (for legal acceptance events) but excluded from diff payloads.

Retention

Audit logs are retained for a minimum of 2 years. There is no TTL index on audit_logs. After 2 years, records may be archived to cold storage (S3 Glacier or equivalent) but are never deleted. Archival is an admin-only operation and itself generates an audit entry.


Budget Halt — Real-Time Cost Tracking

Costs are tracked incrementally during agent execution, not just after completion. This prevents a single long-running agent from blowing through the budget before the halt can fire.

// Called by the LLM adapter after every streaming chunk that includes usage metadata
async function trackAndCheckBudget(
  tenantId: string,
  campaignId: string,
  activityRunId: string,
  newCostUsd: number,
): Promise<'continue' | 'halt'> {
  // Atomic increment + check in a single DB transaction
  const [updatedRun] = await db
    .update(activityRuns)
    .set({ costUsd: sql`cost_usd + ${newCostUsd}` })
    .where(eq(activityRuns.id, activityRunId))
    .returning({ costUsd: activityRuns.costUsd });

  // Check all three caps in order (cheapest check first)
  const agentConfig = await getCachedAgentConfig(tenantId);
  if (agentConfig.maxCostUsdPerActivity && updatedRun.costUsd > agentConfig.maxCostUsdPerActivity) {
    return 'halt';
  }

  const campaign = await getCachedCampaign(campaignId);
  if (campaign.budgetCapUsd) {
    const campaignSpend = await getCampaignTotalSpend(campaignId); // Redis-cached, 5s TTL
    if (campaignSpend > campaign.budgetCapUsd) return 'halt';
  }

  const tenant = await getCachedTenant(tenantId);
  if (tenant.monthlySpendCapUsd) {
    const tenantSpend = await getTenantMonthlySpend(tenantId); // Redis-cached, 5s TTL
    if (tenantSpend > tenant.monthlySpendCapUsd) return 'halt';
  }

  return 'continue';
}

On 'halt': the adapter kills the LLM process (SIGTERM), the run is marked failed with error: 'budget_exceeded', and no retry is scheduled. The campaign’s status is set to paused. A Slack alert fires to the ops channel and a notification is sent to the tenant admin.

Cached cap lookups: Agent config and campaign budget caps are cached in Redis (5-second TTL). This means a budget overrun may add at most one 5-second window of extra spend — an acceptable trade-off vs. a DB query on every token.
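The TTL semantics behind those cached lookups can be illustrated with an in-process sketch (the production cache is Redis; this helper, its name, and the injected clock are purely illustrative, the clock existing only to make expiry testable):

```typescript
// Minimal TTL cache: entries become invisible once their deadline passes.
// A stale window of at most `ttlMs` (5s in production) bounds the extra spend.
function makeTtlCache<T>(ttlMs: number, now: () => number = Date.now) {
  const store = new Map<string, { value: T; expiresAt: number }>();
  return {
    get(key: string): T | undefined {
      const hit = store.get(key);
      if (!hit || hit.expiresAt <= now()) return undefined; // expired or missing
      return hit.value;
    },
    set(key: string, value: T): void {
      store.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}
```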

© 2026 Leadmetrics — Internal use only