Governance & Guardrails
Purpose
Ensure agents operate within defined boundaries — critical for a client-facing agency. Agents must never publish content, send emails, or make ad changes without explicit human approval. Brand voice must be consistent. Character limits must be enforced. Runaway agents must be halted automatically.
Related: Workflow Model — HITL approval activities | Task Queue — activity state machine | Agent Hierarchy — which agents have write-action tools
Guardrail Layers
┌──────────────────────────────────────────────────────────┐
│ Layer 1: Permission Whitelist (agent config) │
│ Which tools can this agent even call? │
└──────────────────────────────┬───────────────────────────┘
│ passes
┌──────────────────────────────▼───────────────────────────┐
│ Layer 2: Output Validators (post-generation) │
│ Character limits, brand voice, banned words │
└──────────────────────────────┬───────────────────────────┘
│ passes
┌──────────────────────────────▼───────────────────────────┐
│ Layer 3: Approvals Queue (human-in-the-loop) │
│ Human signs off before ANY client-facing action │
└──────────────────────────────┬───────────────────────────┘
│ approved
┌──────────────────────────────▼───────────────────────────┐
│ Layer 4: Rate Limiting & Budget Halt │
│ Prevent runaway loops and cost overruns │
└──────────────────────────────────────────────────────────┘

Layer 1 — Permission Whitelist
Each agent has a toolNames[] array in its config. The tool dispatcher checks this before executing any tool call.
async function dispatchToolCall(
toolName: string,
method: string,
input: unknown,
agentConfig: AgentConfig,
taskRun: TaskRun,
): Promise<unknown> {
// 1. Permission check
if (!agentConfig.toolNames.includes(toolName)) {
throw new PermissionDeniedError(
`Agent "${agentConfig.role}" does not have permission to use tool "${toolName}"`
);
}
// 2. Write-action approval check
if (isWriteAction(toolName, method)) {
await assertApprovalExists(taskRun.taskId, toolName, method);
}
// 3. Execute
return integrations[toolName][method](input);
}

Defined write actions (require approval):
- google_ads.createAd, google_ads.pauseAd, google_ads.updateBid
- meta_ads.createAd, meta_ads.pauseAd
- wordpress.createPost (with status: 'publish')
- mailchimp.scheduleCampaign
- klaviyo.createCampaign
- slack.postMessage (to client channels only)
- google_docs.shareDocument (to external emails)
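The dispatcher's `isWriteAction` check could be backed by a lookup built from this list. The sketch below is an assumption about that helper's shape, not the actual implementation: the conditional cases (WordPress publish status, Slack client channels, external doc shares) need the call input, and the target-dependent ones are treated as writes for safety here.

```typescript
// Sketch of the isWriteAction helper referenced by the dispatcher (assumed shape).
const WRITE_ACTIONS = new Set([
  'google_ads.createAd', 'google_ads.pauseAd', 'google_ads.updateBid',
  'meta_ads.createAd', 'meta_ads.pauseAd',
  'mailchimp.scheduleCampaign', 'klaviyo.createCampaign',
]);

function isWriteAction(
  toolName: string,
  method: string,
  input?: { status?: string },
): boolean {
  const key = `${toolName}.${method}`;
  if (WRITE_ACTIONS.has(key)) return true;
  // Conditional: creating a WordPress post is only a write when it publishes
  if (key === 'wordpress.createPost') return input?.status === 'publish';
  // Target-dependent cases (client channel / external email): require approval conservatively
  if (key === 'slack.postMessage' || key === 'google_docs.shareDocument') return true;
  return false;
}
```

Treating the target-dependent cases as writes errs on the side of an extra approval rather than an unreviewed external action.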
Layer 2 — Output Validators
Validators run after the agent produces a deliverable, before it is written to the deliverables table. A hard-fail validator blocks the deliverable from being saved and sends the task back for re-run. A warn validator flags the issue in approvals.validation_results for human review.
Character Limit Validator
interface CharacterLimitRule {
field: string;
platform: string;
maxChars: number;
severity: 'hard_fail' | 'warn';
}
const PLATFORM_LIMITS: CharacterLimitRule[] = [
{ field: 'headline', platform: 'google_ads', maxChars: 30, severity: 'hard_fail' },
{ field: 'description', platform: 'google_ads', maxChars: 90, severity: 'hard_fail' },
  { field: 'primary_text', platform: 'meta_ads', maxChars: 125, severity: 'hard_fail' },
  { field: 'headline', platform: 'meta_ads', maxChars: 40, severity: 'hard_fail' },
  { field: 'subject_line', platform: 'email', maxChars: 60, severity: 'warn' },
  { field: 'preview_text', platform: 'email', maxChars: 100, severity: 'warn' },
  { field: 'tweet', platform: 'x_twitter', maxChars: 280, severity: 'hard_fail' },
  { field: 'linkedin_post', platform: 'linkedin', maxChars: 3000, severity: 'warn' },
];
function validateCharacterLimits(content: DeliverableContent): ValidationResult[] {
// Parse structured content, check each field against limits
// Return array of { field, platform, actual, limit, severity }
}

If any hard_fail violation is found, the task is marked failed and BullMQ re-runs it with the violation details appended to the task prompt so the agent corrects itself.
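The re-run prompt could carry the violations like this. The `buildRetryPrompt` name and the exact wording are illustrative assumptions; only the "append violation details to the task prompt" behaviour comes from the text above.

```typescript
// Illustrative sketch: fold hard_fail violations back into the retry prompt.
interface Violation {
  field: string;
  platform: string;
  actual: number;
  limit: number;
}

function buildRetryPrompt(originalPrompt: string, violations: Violation[]): string {
  const lines = violations.map(
    (v) => `- ${v.platform}.${v.field}: ${v.actual} chars (limit ${v.limit})`,
  );
  return (
    originalPrompt +
    '\n\nThe previous attempt failed character-limit validation. ' +
    'Rewrite the flagged fields so they fit:\n' +
    lines.join('\n')
  );
}
```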
Brand Voice Validator
A lightweight LLM call (using a small local Ollama model to keep cost and latency low) that checks generated content against the client’s brand-voice-guide.md:
async function validateBrandVoice(
content: string,
brandVoiceSkill: string,
clientId: string
): Promise<BrandVoiceScore> {
const prompt = `
You are a brand voice reviewer. Given the brand voice guide and the content below,
score the content from 0-100 for brand alignment and list any violations.
BRAND VOICE GUIDE:
${brandVoiceSkill}
CONTENT TO REVIEW:
${content}
Respond with JSON: { "score": number, "violations": string[], "recommendations": string[] }
`;
const result = await ollamaExecute('gemma3:4b', prompt);
return JSON.parse(result.text);
}

Score below 60 = warn (surfaced in the approvals UI). Below 40 = re-run with brand voice feedback.
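The two thresholds above amount to a small score-to-action mapping; a minimal sketch (the `brandVoiceAction` name is an assumption):

```typescript
// Map a brand-voice score to the actions described above.
type BrandVoiceAction = 'pass' | 'warn' | 'rerun';

function brandVoiceAction(score: number): BrandVoiceAction {
  if (score < 40) return 'rerun'; // re-run with brand voice feedback
  if (score < 60) return 'warn';  // surfaced in the approvals UI
  return 'pass';
}
```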
Banned Words Filter
const GLOBAL_BANNED_WORDS = ['guaranteed', 'best in class', 'world-class', /* ... */];
async function validateBannedWords(content: string, clientId: string): Promise<string[]> {
const clientBannedWords = await getClientBannedWords(clientId); // from client settings
const allBanned = [...GLOBAL_BANNED_WORDS, ...clientBannedWords];
return allBanned.filter(word =>
content.toLowerCase().includes(word.toLowerCase())
);
}

Any match = hard fail. The matched words are returned to the agent in the retry prompt.
Required Disclaimer Injection
For regulated industries (finance, health, legal), required disclaimers are automatically appended:
async function injectDisclaimers(
content: string,
clientId: string,
deliverableType: string
): Promise<string> {
const disclaimers = await getRequiredDisclaimers(clientId, deliverableType);
if (!disclaimers.length) return content;
return content + '\n\n---\n\n' + disclaimers.join('\n\n');
}

Layer 3 — Human-in-the-Loop (HITL) Approvals
Non-negotiable gate
No agent can trigger a write action on an external system without a corresponding approvals record with status = 'approved'. This is enforced at the tool dispatcher level (Layer 1) — the UI cannot bypass it.
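The dispatcher calls `assertApprovalExists` before any write action (Layer 1 code above) but that gate is never shown. A minimal sketch, with the DB lookup injected as a parameter so the enforcement logic stands alone; the record shape and error name are assumptions:

```typescript
// Sketch of the approval gate enforced at the tool dispatcher level.
interface ApprovalRecord {
  status: 'pending' | 'approved' | 'rejected' | 'expired';
}

class ApprovalRequiredError extends Error {}

async function assertApprovalExists(
  taskId: string,
  toolName: string,
  method: string,
  // Placeholder for the real query against the approvals table
  findApproval: (taskId: string, action: string) => Promise<ApprovalRecord | null>,
): Promise<void> {
  const approval = await findApproval(taskId, `${toolName}.${method}`);
  // Only an explicit 'approved' status unlocks the action; pending,
  // rejected, expired, or missing records all block it.
  if (!approval || approval.status !== 'approved') {
    throw new ApprovalRequiredError(
      `Write action ${toolName}.${method} requires an approved approvals record for task ${taskId}`,
    );
  }
}
```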
Approval Types
| Type | Created by | Who decides | What happens on resolution |
|---|---|---|---|
| content_review | System (auto, when agent deliverable completes) | DM reviewer | Approved → publish gate unlocked; Rejected → revision activity |
| content_direction | Agent (create_approval tool) | DM reviewer / tenant admin | All linked writing activities re-enqueued with direction decision |
| brand_direction | Agent (create_approval tool) | DM reviewer / tenant admin | Linked agents resume with approved brand guidance |
| strategy_change | Agent (create_approval tool) | DM reviewer + tenant admin | Strategy updated; linked campaign activities re-planned |
| budget_authorization | Agent (create_approval tool) | Tenant admin only | Budget allocated; blocked activities resume |
| channel_action | System (write-tool gate) | DM reviewer | Approved → external API call executes; Rejected → action cancelled |
Default risk level by type:
| Type | Default risk | Expiry window |
|---|---|---|
| content_review | Derived from deliverable (see table below) | 72h low / 48h medium / 24h high |
| content_direction | medium | 48h |
| brand_direction | medium | 48h |
| strategy_change | high | 24h |
| budget_authorization | high | 24h |
| channel_action | high | 24h |
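The approval-creation code below calls `defaultRiskForType` and `expiryHoursForRisk`; a sketch of those helpers derived from the tables above. The fallback for unknown types is an assumption (fail safe to the strictest gate), and content_review is excluded because its risk comes from the deliverable type:

```typescript
// Risk defaults and expiry windows, transcribed from the tables above.
type RiskLevel = 'low' | 'medium' | 'high';

function defaultRiskForType(type: string): RiskLevel {
  switch (type) {
    case 'content_direction':
    case 'brand_direction':
      return 'medium';
    case 'strategy_change':
    case 'budget_authorization':
    case 'channel_action':
      return 'high';
    default:
      return 'high'; // assumption: unknown types get the strictest default
  }
}

function expiryHoursForRisk(risk: RiskLevel): number {
  return { low: 72, medium: 48, high: 24 }[risk];
}
```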
Risk level by deliverable type (content_review only):
| Deliverable | Risk | Reason |
|---|---|---|
| Blog post draft | low | No external action |
| Social post | medium | Direct audience reach |
| Ad copy | medium | Spend-impacting |
| Email campaign | high | Irreversible send |
| Live ad create/update | high | Immediate spend impact |
| Live social post | high | Public, hard to retract |
Database Schema
CREATE TABLE approvals (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
type VARCHAR(50) NOT NULL, -- ApprovalType enum
title TEXT NOT NULL, -- short label in DM Portal inbox
description TEXT, -- context for the reviewer
status VARCHAR(20) NOT NULL DEFAULT 'pending',
-- pending | approved | rejected | expired
risk_level VARCHAR(10) NOT NULL, -- low | medium | high
created_by_type VARCHAR(10) NOT NULL, -- 'system' | 'agent'
created_by_agent_role VARCHAR(100), -- set when created_by_type = 'agent'
reviewed_by_user_id UUID REFERENCES users(id),
reviewer_notes TEXT,
validation_results JSONB, -- output validator results (content_review only)
options TEXT[], -- reviewer choice options (if applicable)
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
resolved_at TIMESTAMPTZ
);
-- Links one approval to one or more activities that are blocked on it
CREATE TABLE approval_linked_activities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
approval_id UUID NOT NULL REFERENCES approvals(id) ON DELETE CASCADE,
activity_id UUID NOT NULL REFERENCES activities(id),
linked_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE (approval_id, activity_id)
);
CREATE INDEX ON approvals(tenant_id, status);
CREATE INDEX ON approvals(expires_at) WHERE status = 'pending';
CREATE INDEX ON approval_linked_activities(approval_id);
CREATE INDEX ON approval_linked_activities(activity_id);

How Approvals are Created
1. System-created (automatic) — content_review and channel_action
Triggered automatically when an agent deliverable completes or when a write-action tool call is intercepted:
// After agent deliverable saved (content_review)
const riskLevel = deriveRiskLevel(deliverable.type);
const approval = await db.insert(approvals).values({
  tenantId: tenant.id,
  type: 'content_review',
  title: `Review: ${deliverable.name}`,
  description: `Agent-produced deliverable ready for review.`,
  status: 'pending',
  riskLevel,
  createdByType: 'system',
  validationResults: validatorOutput,
  expiresAt: addHours(new Date(), expiryHoursForRisk(riskLevel)),
}).returning();
// Link to the producing activity
await db.insert(approvalLinkedActivities).values({
approvalId: approval.id,
activityId: activity.id,
});

2. Agent-created — via create_approval tool
Any agent can create a first-class approval for multi-activity decisions. See Tool / Integration Layer for the full tool spec.
// Tool handler for create_approval
async function handleCreateApproval(
input: CreateApprovalInput,
context: ToolCallContext,
): Promise<CreateApprovalResult> {
  const riskLevel = input.riskLevel ?? defaultRiskForType(input.type);
  const approval = await db.insert(approvals).values({
    tenantId: context.tenantId,
    type: input.type,
    title: input.title,
    description: input.description,
    status: 'pending',
    riskLevel,
    createdByType: 'agent',
    createdByAgentRole: context.agentRole,
    options: input.options ?? null,
    expiresAt: addHours(new Date(), input.expiresInHours ?? expiryHoursForRisk(riskLevel)),
  }).returning();
// Block the calling activity if requested (default: true)
const activityIdsToBlock = [...(input.linkedActivityIds ?? [])];
if (input.blockCurrentActivity !== false) {
activityIdsToBlock.push(context.activityId);
}
// Link and block all target activities
for (const activityId of activityIdsToBlock) {
await db.insert(approvalLinkedActivities).values({ approvalId: approval.id, activityId });
await db.update(activities)
.set({ status: 'awaiting_approval', approvalId: approval.id })
.where(eq(activities.id, activityId));
}
return {
status: 'approval_created',
approvalId: approval.id,
message: `Approval created. ${activityIdsToBlock.length} activities suspended until resolved.`,
};
}

Approval Resolution Flow
When a human reviewer approves or rejects in the DM Portal:
async function onApprovalResolved(
approvalId: string,
tenantId: string,
approved: boolean,
reviewerNotes: string,
userId: string,
chosenOption?: string, // if the approval had options[], which one the reviewer selected
): Promise<void> {
// 1. Update the approval record
await db.update(approvals)
.set({
status: approved ? 'approved' : 'rejected',
reviewedByUserId: userId,
reviewerNotes,
resolvedAt: new Date(),
})
.where(eq(approvals.id, approvalId));
// 2. Fetch all linked activities
const links = await db.query.approvalLinkedActivities.findMany({
where: eq(approvalLinkedActivities.approvalId, approvalId),
});
// 3. Re-enqueue each linked activity with the resolution context
for (const link of links) {
const activity = await getActivity(link.activityId, tenantId);
await db.update(activities)
.set({ status: 'created', approvalId: null })
.where(eq(activities.id, link.activityId));
await enqueueActivity(activity, tenantId, activity.assigneeAgentRole, {
wakeReason: approved ? 'review_approved' : 'review_feedback',
reviewApproved: approved,
reviewerFeedback: reviewerNotes
+ (chosenOption ? `\n\nSelected option: ${chosenOption}` : ''),
});
}
}

All blocked activities resume in a single resolution event. For a content_direction approval that was blocking 10 blog post writing activities, all 10 are re-enqueued simultaneously when the reviewer approves.
Approval Outcomes
| Reviewer action | Effect on linked activities | Effect on approval |
|---|---|---|
| Approve | All re-enqueued with wakeReason: 'review_approved' + reviewer notes | status → approved |
| Approve with option | Re-enqueued with wakeReason: 'review_approved' + chosen option injected into prompt | status → approved |
| Reject / Send feedback | All re-enqueued with wakeReason: 'review_feedback' + reviewer notes | status → rejected |
| Edit and approve (content_review only) | Edited content saved as new output; activity marked done | status → approved |
| Approve subset (content_review multi-link) | Approved items advance; rejected items get revision activities | Mixed per-activity |
Approval Expiry & Urgency Escalation
A BullMQ cron job runs hourly and checks pending approvals against their expires_at:
// Runs every hour
async function checkApprovalExpiry(): Promise<void> {
const now = new Date();
// 1. Warning: 24h before expiry — send in-app notification to reviewers
const approachingExpiry = await db.query.approvals.findMany({
where: and(
eq(approvals.status, 'pending'),
lte(approvals.expiresAt, addHours(now, 24)),
gt(approvals.expiresAt, now),
),
});
for (const approval of approachingExpiry) {
await notifyReviewers(approval, 'expiry_warning');
}
// 2. Expired: mark as expired, block all linked activities
const expired = await db.query.approvals.findMany({
where: and(
eq(approvals.status, 'pending'),
lte(approvals.expiresAt, now),
),
});
for (const approval of expired) {
await db.update(approvals)
.set({ status: 'expired', resolvedAt: now })
.where(eq(approvals.id, approval.id));
    // Escalate: create a high-priority human task in DM Portal.
    // Count the activities blocked by this approval for the message.
    const linked = await db.query.approvalLinkedActivities.findMany({
      where: eq(approvalLinkedActivities.approvalId, approval.id),
    });
    await createEscalationActivity({
      tenantId: approval.tenantId,
      name: `ESCALATED: Approval expired — ${approval.title}`,
      description: `Approval "${approval.title}" expired without a decision. `
        + `${linked.length} activities are blocked. Resolve immediately.`,
      priority: 'urgent',
    });
}
}

Escalation does not auto-proceed. Expired approvals block linked activities indefinitely until a human resolves the escalation. Auto-proceeding on expiry is too risky for client-facing content — an unreviewed social post publishing or an ad going live would be a worse outcome than a delayed deliverable.
Urgency flag: Approvals created within 6 hours of a campaign go-live date (derived from deliverable_periods.endDate) are automatically flagged riskLevel: 'high' and get a 6h expiry window regardless of type.
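The urgency override can be sketched as a pure function applied at approval-creation time. The function name and shape are assumptions; the 6-hour window and forced high risk come from the rule above:

```typescript
// Sketch of the urgency override: approvals created within 6h of go-live
// are forced to high risk with a 6h expiry, regardless of type.
type Risk = 'low' | 'medium' | 'high';

function applyUrgencyFlag(
  riskLevel: Risk,
  expiresInHours: number,
  goLiveDate: Date | null, // from deliverable_periods.endDate
  now: Date = new Date(),
): { riskLevel: Risk; expiresInHours: number } {
  const SIX_HOURS_MS = 6 * 60 * 60 * 1000;
  if (goLiveDate && goLiveDate.getTime() - now.getTime() <= SIX_HOURS_MS) {
    return { riskLevel: 'high', expiresInHours: 6 };
  }
  return { riskLevel, expiresInHours };
}
```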
Layer 4 — Rate Limiting & Automatic Halt
Per-agent rate limiting
Each BullMQ worker has a configurable rateLimiter that caps how many jobs per minute an agent can run, preventing loops:
new Worker(queue, processor, {
connection: redis,
concurrency: 3,
limiter: {
max: 10, // max 10 jobs
duration: 60000, // per 60 seconds
}
});

Runaway detection
A watchdog runs every 5 minutes and looks for anomalies:
async function runWatchdog(): Promise<void> {
// Agent running more than maxTimeoutMs
const stuckRuns = await db.query.taskRuns.findMany({
where: and(
eq(taskRuns.status, 'running'),
lt(taskRuns.startedAt, new Date(Date.now() - MAX_TASK_DURATION_MS))
)
});
for (const run of stuckRuns) {
await killTaskRun(run);
await createEscalation(run.taskId, 'watchdog_timeout');
}
// Campaign cost exceeded cap
const overBudget = await db.query.campaigns.findMany({
where: and(
isNotNull(campaigns.budgetCapUsd),
gt(campaigns.totalCostUsd, campaigns.budgetCapUsd)
)
});
for (const campaign of overBudget) {
await pauseCampaign(campaign.id);
await drainCampaignJobs(campaign.id);
await notifyBudgetBreached(campaign);
}
}

Automatic halt conditions
| Condition | Action |
|---|---|
| Task cost cap exceeded | Abort call, fail task, escalate |
| Campaign budget cap exceeded | Pause campaign, drain queue, Slack alert |
| Agent LLM error rate > 50% in 10 min | Pause that agent’s queue, Slack alert |
| Task running > configured timeoutMs | Kill process, fail task, retry |
| Tool call returns rate-limit error | Back off, retry after reset, pause if persistent |
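The "LLM error rate > 50% in 10 min" condition can be sketched as a windowed check. This is an assumed shape: the watchdog presumably tracks outcomes in Redis, but a plain array keeps the sketch self-contained:

```typescript
// Sketch of the error-rate halt condition over a sliding window.
interface CallOutcome {
  at: number;  // epoch ms of the LLM call
  ok: boolean; // false on LLM error
}

function errorRateExceeded(
  outcomes: CallOutcome[],
  now: number,
  windowMs = 10 * 60 * 1000, // 10 minutes
  threshold = 0.5,           // 50%
): boolean {
  const recent = outcomes.filter((o) => now - o.at <= windowMs);
  if (recent.length === 0) return false;
  const errors = recent.filter((o) => !o.ok).length;
  return errors / recent.length > threshold;
}
```

A minimum sample size (e.g. ignore windows with fewer than 5 calls) would likely be added in practice so a single early failure does not pause a queue.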
Audit Trail
All agent actions are attributable. The tool_calls table records:
- Which agent (task_run → agent_config)
- For which client (task_run → task → campaign → client)
- What action (tool_name, method)
- What parameters (stored in input JSONB)
- What the outcome was (output, status, error)
- When (created_at)
This supports compliance reporting: “show me every action taken on behalf of Client X in the last 90 days.”
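That report could be a single query over the attribution chain. The join columns below are assumptions inferred from the chain described above (the tool_calls/task_runs/tasks/campaigns schemas are not shown in this document):

```sql
-- Hypothetical compliance query: every action for Client X in the last 90 days.
-- Column names follow the task_run → task → campaign → client chain and are assumed.
SELECT tc.created_at, tc.tool_name, tc.method, tc.status, tr.agent_config_id
FROM tool_calls tc
JOIN task_runs tr ON tr.id = tc.task_run_id
JOIN tasks t      ON t.id  = tr.task_id
JOIN campaigns c  ON c.id  = t.campaign_id
WHERE c.client_id = :client_id
  AND tc.created_at >= NOW() - INTERVAL '90 days'
ORDER BY tc.created_at DESC;
```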
Package Location
apps/api/src/
├── middleware/ # Rate limiting middleware
└── routes/
└── approvals.ts # HITL approval endpoints (create, resolve, query)
packages/agent-engine/src/
├── validators/ # Character limits, brand voice, banned words, disclaimers
├── budget.ts # Budget halt logic
└── watchdog.ts # Runaway detection
packages/queue/src/
├── workers.ts # BullMQ rate limiter config
└── approval.ts # enqueueApprovalLinkedActivities() — fan-out on resolve
packages/integrations/src/
└── control-plane/
└── create-approval.ts # create_approval tool handler

Database tables: approvals, approval_linked_activities — see schema above.
Audit Trail Security
Immutability
The MongoDB audit_logs collection is append-only. The application’s MongoDB user has insert + find only on audit_logs — update and delete are revoked at the database role level. This means no application code path — even one with a bug or under active attack — can tamper with past audit records. See MongoDB Security for the role definition.
PII Redaction
Before any before/after diff is written to audit_logs, fields annotated as PII in the schema (email, phone, firstName, lastName, password) are replaced with '[REDACTED]'. This prevents audit logs from becoming a PII data store, while still capturing what changed.
IP addresses are stored at the top-level audit record (for legal acceptance events) but excluded from diff payloads.
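A minimal sketch of the diff redaction step, assuming a flat diff object; the real implementation presumably reads the PII annotations from the schema rather than a hard-coded list, and would recurse into nested diffs:

```typescript
// Replace PII-annotated fields in a diff with '[REDACTED]' before it is
// written to audit_logs. Shallow sketch; field list mirrors the annotations
// named above.
const PII_FIELDS = new Set(['email', 'phone', 'firstName', 'lastName', 'password']);

function redactDiff(diff: Record<string, unknown>): Record<string, unknown> {
  const redacted: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(diff)) {
    redacted[key] = PII_FIELDS.has(key) ? '[REDACTED]' : value;
  }
  return redacted;
}
```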
Retention
Audit logs are retained for a minimum of 2 years. There is no TTL index on audit_logs. After 2 years, records may be archived to cold storage (S3 Glacier or equivalent) but are never deleted. Archival is an admin-only operation and itself generates an audit entry.
Budget Halt — Real-Time Cost Tracking
Costs are tracked incrementally during agent execution, not just after completion. This prevents a single long-running agent from blowing through the budget before the halt can fire.
// Called by the LLM adapter after every streaming chunk that includes usage metadata
async function trackAndCheckBudget(
tenantId: string,
campaignId: string,
activityRunId: string,
newCostUsd: number,
): Promise<'continue' | 'halt'> {
// Atomic increment + check in a single DB transaction
const [updatedRun] = await db
.update(activityRuns)
.set({ costUsd: sql`cost_usd + ${newCostUsd}` })
.where(eq(activityRuns.id, activityRunId))
.returning({ costUsd: activityRuns.costUsd });
// Check all three caps in order (cheapest check first)
const agentConfig = await getCachedAgentConfig(tenantId);
if (agentConfig.maxCostUsdPerActivity && updatedRun.costUsd > agentConfig.maxCostUsdPerActivity) {
return 'halt';
}
const campaign = await getCachedCampaign(campaignId);
if (campaign.budgetCapUsd) {
const campaignSpend = await getCampaignTotalSpend(campaignId); // Redis-cached, 5s TTL
if (campaignSpend > campaign.budgetCapUsd) return 'halt';
}
const tenant = await getCachedTenant(tenantId);
if (tenant.monthlySpendCapUsd) {
const tenantSpend = await getTenantMonthlySpend(tenantId); // Redis-cached, 5s TTL
if (tenantSpend > tenant.monthlySpendCapUsd) return 'halt';
}
return 'continue';
}

On 'halt': the adapter kills the LLM process (SIGTERM), the run is marked failed with error: 'budget_exceeded', and no retry is scheduled. The campaign’s status is set to paused. A Slack alert fires to the ops channel and a notification is sent to the tenant admin.
Cached cap lookups: Agent config and campaign budget caps are cached in Redis (5-second TTL). This means a budget overrun may add at most one 5-second window of extra spend — an acceptable trade-off vs. a DB query on every token.