Governance & Guardrails
Purpose
Ensure agents operate within defined boundaries — critical for a client-facing agency. Agents must never publish content, send emails, or make ad changes without explicit human approval. Brand voice must be consistent. Character limits must be enforced. Runaway agents must be halted automatically.
Related: Workflow Model — HITL approval activities | Task Queue — activity state machine | Agent Hierarchy — which agents have write-action tools
Guardrail Layers
┌──────────────────────────────────────────────────────────┐
│ Layer 1: Permission Whitelist (agent config) │
│ Which tools can this agent even call? │
└──────────────────────────────┬───────────────────────────┘
│ passes
┌──────────────────────────────▼───────────────────────────┐
│ Layer 2: Output Validators (post-generation) │
│ Character limits, brand voice, banned words │
└──────────────────────────────┬───────────────────────────┘
│ passes
┌──────────────────────────────▼───────────────────────────┐
│ Layer 3: Approvals Queue (human-in-the-loop) │
│ Human signs off before ANY client-facing action │
└──────────────────────────────┬───────────────────────────┘
│ approved
┌──────────────────────────────▼───────────────────────────┐
│ Layer 4: Rate Limiting & Budget Halt │
│ Prevent runaway loops and cost overruns │
└──────────────────────────────────────────────────────────┘

Layer 1 — Permission Whitelist
Each agent has a toolNames[] array in its config. The tool dispatcher checks this before executing any tool call.
async function dispatchToolCall(
toolName: string,
method: string,
input: unknown,
agentConfig: AgentConfig,
taskRun: TaskRun,
): Promise<unknown> {
// 1. Permission check
if (!agentConfig.toolNames.includes(toolName)) {
throw new PermissionDeniedError(
`Agent "${agentConfig.role}" does not have permission to use tool "${toolName}"`
);
}
// 2. Write-action approval check
if (isWriteAction(toolName, method)) {
await assertApprovalExists(taskRun.taskId, toolName, method);
}
// 3. Execute
return integrations[toolName][method](input);
}

Defined write actions (require approval):
- google_ads.createAd, google_ads.pauseAd, google_ads.updateBid
- meta_ads.createAd, meta_ads.pauseAd
- wordpress.createPost (with status: 'publish')
- mailchimp.scheduleCampaign
- klaviyo.createCampaign
- slack.postMessage (to client channels only)
- google_docs.shareDocument (to external emails)
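The dispatcher's `isWriteAction` check could be backed by a lookup built from this list. The sketch below is an assumption about that helper's shape, not the actual implementation: the conditional cases (WordPress publish status, Slack client channels, external doc shares) need the call input, and the target-dependent ones are treated as writes for safety here.

```typescript
// Sketch of the isWriteAction helper referenced by the dispatcher (assumed shape).
const WRITE_ACTIONS = new Set([
  'google_ads.createAd', 'google_ads.pauseAd', 'google_ads.updateBid',
  'meta_ads.createAd', 'meta_ads.pauseAd',
  'mailchimp.scheduleCampaign', 'klaviyo.createCampaign',
]);

function isWriteAction(
  toolName: string,
  method: string,
  input?: { status?: string },
): boolean {
  const key = `${toolName}.${method}`;
  if (WRITE_ACTIONS.has(key)) return true;
  // Conditional: creating a WordPress post is only a write when it publishes
  if (key === 'wordpress.createPost') return input?.status === 'publish';
  // Target-dependent cases (client channel / external email): require approval conservatively
  if (key === 'slack.postMessage' || key === 'google_docs.shareDocument') return true;
  return false;
}
```

Treating the target-dependent cases as writes errs on the side of an extra approval rather than an unreviewed external action.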
Layer 2 — Output Validators
Validators run after the agent produces a deliverable, before it is written to the deliverables table. A hard-fail validator blocks the deliverable from being saved and sends the task back for re-run. A warn validator flags the issue in approvals.validation_results for human review.
Character Limit Validator
interface CharacterLimitRule {
field: string;
platform: string;
maxChars: number;
severity: 'hard_fail' | 'warn';
}
const PLATFORM_LIMITS: CharacterLimitRule[] = [
{ field: 'headline', platform: 'google_ads', maxChars: 30, severity: 'hard_fail' },
{ field: 'description', platform: 'google_ads', maxChars: 90, severity: 'hard_fail' },
  { field: 'primary_text', platform: 'meta_ads', maxChars: 125, severity: 'hard_fail' },
  { field: 'headline', platform: 'meta_ads', maxChars: 40, severity: 'hard_fail' },
  { field: 'subject_line', platform: 'email', maxChars: 60, severity: 'warn' },
  { field: 'preview_text', platform: 'email', maxChars: 100, severity: 'warn' },
  { field: 'tweet', platform: 'x_twitter', maxChars: 280, severity: 'hard_fail' },
  { field: 'linkedin_post', platform: 'linkedin', maxChars: 3000, severity: 'warn' },
];
function validateCharacterLimits(content: DeliverableContent): ValidationResult[] {
// Parse structured content, check each field against limits
// Return array of { field, platform, actual, limit, severity }
}

If any hard_fail violation is found, the task is marked failed and BullMQ re-runs it with the violation details appended to the task prompt so the agent corrects itself.
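The re-run prompt could carry the violations like this. The `buildRetryPrompt` name and the exact wording are illustrative assumptions; only the "append violation details to the task prompt" behaviour comes from the text above.

```typescript
// Illustrative sketch: fold hard_fail violations back into the retry prompt.
interface Violation {
  field: string;
  platform: string;
  actual: number;
  limit: number;
}

function buildRetryPrompt(originalPrompt: string, violations: Violation[]): string {
  const lines = violations.map(
    (v) => `- ${v.platform}.${v.field}: ${v.actual} chars (limit ${v.limit})`,
  );
  return (
    originalPrompt +
    '\n\nThe previous attempt failed character-limit validation. ' +
    'Rewrite the flagged fields so they fit:\n' +
    lines.join('\n')
  );
}
```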
Brand Voice Validator
A lightweight LLM call (using a small local Ollama model to keep cost and latency low) that checks generated content against the client’s brand-voice-guide.md:
async function validateBrandVoice(
content: string,
brandVoiceSkill: string,
clientId: string
): Promise<BrandVoiceScore> {
const prompt = `
You are a brand voice reviewer. Given the brand voice guide and the content below,
score the content from 0-100 for brand alignment and list any violations.
BRAND VOICE GUIDE:
${brandVoiceSkill}
CONTENT TO REVIEW:
${content}
Respond with JSON: { "score": number, "violations": string[], "recommendations": string[] }
`;
const result = await ollamaExecute('gemma3:4b', prompt);
return JSON.parse(result.text);
}

Score below 60 = warn (surfaced in the approvals UI). Below 40 = re-run with brand voice feedback.
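The two thresholds above amount to a small score-to-action mapping; a minimal sketch (the `brandVoiceAction` name is an assumption):

```typescript
// Map a brand-voice score to the actions described above.
type BrandVoiceAction = 'pass' | 'warn' | 'rerun';

function brandVoiceAction(score: number): BrandVoiceAction {
  if (score < 40) return 'rerun'; // re-run with brand voice feedback
  if (score < 60) return 'warn';  // surfaced in the approvals UI
  return 'pass';
}
```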
Banned Words Filter
const GLOBAL_BANNED_WORDS = ['guaranteed', 'best in class', 'world-class', /* ... */];
async function validateBannedWords(content: string, clientId: string): Promise<string[]> {
const clientBannedWords = await getClientBannedWords(clientId); // from client settings
const allBanned = [...GLOBAL_BANNED_WORDS, ...clientBannedWords];
return allBanned.filter(word =>
content.toLowerCase().includes(word.toLowerCase())
);
}

Any match = hard fail. The matched words are returned to the agent in the retry prompt.
Required Disclaimer Injection
For regulated industries (finance, health, legal), required disclaimers are automatically appended:
async function injectDisclaimers(
content: string,
clientId: string,
deliverableType: string
): Promise<string> {
const disclaimers = await getRequiredDisclaimers(clientId, deliverableType);
if (!disclaimers.length) return content;
return content + '\n\n---\n\n' + disclaimers.join('\n\n');
}

Layer 3 — Human-in-the-Loop (HITL) Approvals
Non-negotiable gate
No agent can trigger a write action on an external system without a corresponding approvals record with status = 'approved'. This is enforced at the tool dispatcher level (Layer 1) — the UI cannot bypass it.
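The dispatcher calls `assertApprovalExists` before any write action (Layer 1 code above) but that gate is never shown. A minimal sketch, with the DB lookup injected as a parameter so the enforcement logic stands alone; the record shape and error name are assumptions:

```typescript
// Sketch of the approval gate enforced at the tool dispatcher level.
interface ApprovalRecord {
  status: 'pending' | 'approved' | 'rejected' | 'expired';
}

class ApprovalRequiredError extends Error {}

async function assertApprovalExists(
  taskId: string,
  toolName: string,
  method: string,
  // Placeholder for the real query against the approvals table
  findApproval: (taskId: string, action: string) => Promise<ApprovalRecord | null>,
): Promise<void> {
  const approval = await findApproval(taskId, `${toolName}.${method}`);
  // Only an explicit 'approved' status unlocks the action; pending,
  // rejected, expired, or missing records all block it.
  if (!approval || approval.status !== 'approved') {
    throw new ApprovalRequiredError(
      `Write action ${toolName}.${method} requires an approved approvals record for task ${taskId}`,
    );
  }
}
```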
Approval Types
| Type | Created by | Who decides | What happens on resolution |
|---|---|---|---|
| content_review | System (auto, when agent deliverable completes) | DM reviewer | Approved → publish gate unlocked; Rejected → revision activity |
| content_direction | Agent (create_approval tool) | DM reviewer / tenant admin | All linked writing activities re-enqueued with direction decision |
| brand_direction | Agent (create_approval tool) | DM reviewer / tenant admin | Linked agents resume with approved brand guidance |
| strategy_change | Agent (create_approval tool) | DM reviewer + tenant admin | Strategy updated; linked campaign activities re-planned |
| budget_authorization | Agent (create_approval tool) | Tenant admin only | Budget allocated; blocked activities resume |
| channel_action | System (write-tool gate) | DM reviewer | Approved → external API call executes; Rejected → action cancelled |
Default risk level by type:
| Type | Default risk | Expiry window |
|---|---|---|
| content_review | Derived from deliverable (see table below) | 72h low / 48h medium / 24h high |
| content_direction | medium | 48h |
| brand_direction | medium | 48h |
| strategy_change | high | 24h |
| budget_authorization | high | 24h |
| channel_action | high | 24h |
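The approval-creation code below calls `defaultRiskForType` and `expiryHoursForRisk`; a sketch of those helpers derived from the tables above. The fallback for unknown types is an assumption (fail safe to the strictest gate), and content_review is excluded because its risk comes from the deliverable type:

```typescript
// Risk defaults and expiry windows, transcribed from the tables above.
type RiskLevel = 'low' | 'medium' | 'high';

function defaultRiskForType(type: string): RiskLevel {
  switch (type) {
    case 'content_direction':
    case 'brand_direction':
      return 'medium';
    case 'strategy_change':
    case 'budget_authorization':
    case 'channel_action':
      return 'high';
    default:
      return 'high'; // assumption: unknown types get the strictest default
  }
}

function expiryHoursForRisk(risk: RiskLevel): number {
  return { low: 72, medium: 48, high: 24 }[risk];
}
```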
Risk level by deliverable type (content_review only):
| Deliverable | Risk | Reason |
|---|---|---|
| Blog post draft | low | No external action |
| Social post | medium | Direct audience reach |
| Ad copy | medium | Spend-impacting |
| Email campaign | high | Irreversible send |
| Live ad create/update | high | Immediate spend impact |
| Live social post | high | Public, hard to retract |
Database Schema
CREATE TABLE approvals (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
type VARCHAR(50) NOT NULL, -- ApprovalType enum
title TEXT NOT NULL, -- short label in DM Portal inbox
description TEXT, -- context for the reviewer
status VARCHAR(20) NOT NULL DEFAULT 'pending',
-- pending | approved | rejected | expired
risk_level VARCHAR(10) NOT NULL, -- low | medium | high
created_by_type VARCHAR(10) NOT NULL, -- 'system' | 'agent'
created_by_agent_role VARCHAR(100), -- set when created_by_type = 'agent'
reviewed_by_user_id UUID REFERENCES users(id),
reviewer_notes TEXT,
validation_results JSONB, -- output validator results (content_review only)
options TEXT[], -- reviewer choice options (if applicable)
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
resolved_at TIMESTAMPTZ
);
-- Links one approval to one or more activities that are blocked on it
CREATE TABLE approval_linked_activities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
approval_id UUID NOT NULL REFERENCES approvals(id) ON DELETE CASCADE,
activity_id UUID NOT NULL REFERENCES activities(id),
linked_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE (approval_id, activity_id)
);
CREATE INDEX ON approvals(tenant_id, status);
CREATE INDEX ON approvals(expires_at) WHERE status = 'pending';
CREATE INDEX ON approval_linked_activities(approval_id);
CREATE INDEX ON approval_linked_activities(activity_id);

How Approvals are Created
1. System-created (automatic) — content_review and channel_action
Triggered automatically when an agent deliverable completes or when a write-action tool call is intercepted:
// After agent deliverable saved (content_review)
const riskLevel = deriveRiskLevel(deliverable.type);
const approval = await db.insert(approvals).values({
  tenantId: tenant.id,
  type: 'content_review',
  title: `Review: ${deliverable.name}`,
  description: `Agent-produced deliverable ready for review.`,
  status: 'pending',
  riskLevel,
  createdByType: 'system',
  validationResults: validatorOutput,
  expiresAt: addHours(new Date(), expiryHoursForRisk(riskLevel)),
}).returning();
// Link to the producing activity
await db.insert(approvalLinkedActivities).values({
approvalId: approval.id,
activityId: activity.id,
});

2. Agent-created — via create_approval tool
Any agent can create a first-class approval for multi-activity decisions. See Tool / Integration Layer for the full tool spec.
// Tool handler for create_approval
async function handleCreateApproval(
input: CreateApprovalInput,
context: ToolCallContext,
): Promise<CreateApprovalResult> {
  const riskLevel = input.riskLevel ?? defaultRiskForType(input.type);
  const approval = await db.insert(approvals).values({
    tenantId: context.tenantId,
    type: input.type,
    title: input.title,
    description: input.description,
    status: 'pending',
    riskLevel,
    createdByType: 'agent',
    createdByAgentRole: context.agentRole,
    options: input.options ?? null,
    expiresAt: addHours(new Date(), input.expiresInHours ?? expiryHoursForRisk(riskLevel)),
  }).returning();
// Block the calling activity if requested (default: true)
const activityIdsToBlock = [...(input.linkedActivityIds ?? [])];
if (input.blockCurrentActivity !== false) {
activityIdsToBlock.push(context.activityId);
}
// Link and block all target activities
for (const activityId of activityIdsToBlock) {
await db.insert(approvalLinkedActivities).values({ approvalId: approval.id, activityId });
await db.update(activities)
.set({ status: 'awaiting_approval', approvalId: approval.id })
.where(eq(activities.id, activityId));
}
return {
status: 'approval_created',
approvalId: approval.id,
message: `Approval created. ${activityIdsToBlock.length} activities suspended until resolved.`,
};
}

Approval Resolution Flow
When a human reviewer approves or rejects in the DM Portal:
async function onApprovalResolved(
approvalId: string,
tenantId: string,
approved: boolean,
reviewerNotes: string,
userId: string,
chosenOption?: string, // if the approval had options[], which one the reviewer selected
): Promise<void> {
// 1. Update the approval record
await db.update(approvals)
.set({
status: approved ? 'approved' : 'rejected',
reviewedByUserId: userId,
reviewerNotes,
resolvedAt: new Date(),
})
.where(eq(approvals.id, approvalId));
// 2. Fetch all linked activities
const links = await db.query.approvalLinkedActivities.findMany({
where: eq(approvalLinkedActivities.approvalId, approvalId),
});
// 3. Re-enqueue each linked activity with the resolution context
for (const link of links) {
const activity = await getActivity(link.activityId, tenantId);
await db.update(activities)
.set({ status: 'created', approvalId: null })
.where(eq(activities.id, link.activityId));
await enqueueActivity(activity, tenantId, activity.assigneeAgentRole, {
wakeReason: approved ? 'review_approved' : 'review_feedback',
reviewApproved: approved,
reviewerFeedback: reviewerNotes
+ (chosenOption ? `\n\nSelected option: ${chosenOption}` : ''),
});
}
}

All blocked activities resume in a single resolution event. For a content_direction approval that was blocking 10 blog post writing activities, all 10 are re-enqueued simultaneously when the reviewer approves.
Approval Outcomes
| Reviewer action | Effect on linked activities | Effect on approval |
|---|---|---|
| Approve | All re-enqueued with wakeReason: 'review_approved' + reviewer notes | status → approved |
| Approve with option | Re-enqueued with wakeReason: 'review_approved' + chosen option injected into prompt | status → approved |
| Reject / Send feedback | All re-enqueued with wakeReason: 'review_feedback' + reviewer notes | status → rejected |
| Edit and approve (content_review only) | Edited content saved as new output; activity marked done | status → approved |
| Approve subset (content_review multi-link) | Approved items advance; rejected items get revision activities | Mixed per-activity |
Approval Expiry & Urgency Escalation
A BullMQ cron job runs hourly and checks pending approvals against their expires_at:
// Runs every hour
async function checkApprovalExpiry(): Promise<void> {
const now = new Date();
// 1. Warning: 24h before expiry — send in-app notification to reviewers
const approachingExpiry = await db.query.approvals.findMany({
where: and(
eq(approvals.status, 'pending'),
lte(approvals.expiresAt, addHours(now, 24)),
gt(approvals.expiresAt, now),
),
});
for (const approval of approachingExpiry) {
await notifyReviewers(approval, 'expiry_warning');
}
// 2. Expired: mark as expired, block all linked activities
const expired = await db.query.approvals.findMany({
where: and(
eq(approvals.status, 'pending'),
lte(approvals.expiresAt, now),
),
});
for (const approval of expired) {
await db.update(approvals)
.set({ status: 'expired', resolvedAt: now })
.where(eq(approvals.id, approval.id));
    // Escalate: create a high-priority human task in DM Portal.
    // Count the activities blocked by this approval for the message.
    const linked = await db.query.approvalLinkedActivities.findMany({
      where: eq(approvalLinkedActivities.approvalId, approval.id),
    });
    await createEscalationActivity({
      tenantId: approval.tenantId,
      name: `ESCALATED: Approval expired — ${approval.title}`,
      description: `Approval "${approval.title}" expired without a decision. `
        + `${linked.length} activities are blocked. Resolve immediately.`,
      priority: 'urgent',
    });
}
}

Escalation does not auto-proceed. Expired approvals block linked activities indefinitely until a human resolves the escalation. Auto-proceeding on expiry is too risky for client-facing content — an unreviewed social post publishing or an ad going live would be a worse outcome than a delayed deliverable.
Urgency flag: Approvals created within 6 hours of a campaign go-live date (derived from deliverable_periods.endDate) are automatically flagged riskLevel: 'high' and get a 6h expiry window regardless of type.
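The urgency override can be sketched as a pure function applied at approval-creation time. The function name and shape are assumptions; the 6-hour window and forced high risk come from the rule above:

```typescript
// Sketch of the urgency override: approvals created within 6h of go-live
// are forced to high risk with a 6h expiry, regardless of type.
type Risk = 'low' | 'medium' | 'high';

function applyUrgencyFlag(
  riskLevel: Risk,
  expiresInHours: number,
  goLiveDate: Date | null, // from deliverable_periods.endDate
  now: Date = new Date(),
): { riskLevel: Risk; expiresInHours: number } {
  const SIX_HOURS_MS = 6 * 60 * 60 * 1000;
  if (goLiveDate && goLiveDate.getTime() - now.getTime() <= SIX_HOURS_MS) {
    return { riskLevel: 'high', expiresInHours: 6 };
  }
  return { riskLevel, expiresInHours };
}
```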
Layer 4 — Rate Limiting & Automatic Halt
Per-agent rate limiting
Each BullMQ worker has a configurable rateLimiter that caps how many jobs per minute an agent can run, preventing loops:
new Worker(queue, processor, {
connection: redis,
concurrency: 3,
limiter: {
max: 10, // max 10 jobs
duration: 60000, // per 60 seconds
}
});

Runaway detection
A watchdog runs every 5 minutes and looks for anomalies:
async function runWatchdog(): Promise<void> {
// Agent running more than maxTimeoutMs
const stuckRuns = await db.query.taskRuns.findMany({
where: and(
eq(taskRuns.status, 'running'),
lt(taskRuns.startedAt, new Date(Date.now() - MAX_TASK_DURATION_MS))
)
});
for (const run of stuckRuns) {
await killTaskRun(run);
await createEscalation(run.taskId, 'watchdog_timeout');
}
// Campaign cost exceeded cap
const overBudget = await db.query.campaigns.findMany({
where: and(
isNotNull(campaigns.budgetCapUsd),
gt(campaigns.totalCostUsd, campaigns.budgetCapUsd)
)
});
for (const campaign of overBudget) {
await pauseCampaign(campaign.id);
await drainCampaignJobs(campaign.id);
await notifyBudgetBreached(campaign);
}
}

Automatic halt conditions
| Condition | Action |
|---|---|
| Task cost cap exceeded | Abort call, fail task, escalate |
| Campaign budget cap exceeded | Pause campaign, drain queue, Slack alert |
| Agent LLM error rate > 50% in 10 min | Pause that agent’s queue, Slack alert |
| Task running > configured timeoutMs | Kill process, fail task, retry |
| Tool call returns rate-limit error | Back off, retry after reset, pause if persistent |
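The "LLM error rate > 50% in 10 min" condition can be sketched as a windowed check. This is an assumed shape: the watchdog presumably tracks outcomes in Redis, but a plain array keeps the sketch self-contained:

```typescript
// Sketch of the error-rate halt condition over a sliding window.
interface CallOutcome {
  at: number;  // epoch ms of the LLM call
  ok: boolean; // false on LLM error
}

function errorRateExceeded(
  outcomes: CallOutcome[],
  now: number,
  windowMs = 10 * 60 * 1000, // 10 minutes
  threshold = 0.5,           // 50%
): boolean {
  const recent = outcomes.filter((o) => now - o.at <= windowMs);
  if (recent.length === 0) return false;
  const errors = recent.filter((o) => !o.ok).length;
  return errors / recent.length > threshold;
}
```

A minimum sample size (e.g. ignore windows with fewer than 5 calls) would likely be added in practice so a single early failure does not pause a queue.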
Audit Trail
All agent actions are attributable. The tool_calls table records:
- Which agent (task_run → agent_config)
- For which client (task_run → task → campaign → client)
- What action (tool_name, method)
- What parameters (stored in input JSONB)
- What the outcome was (output, status, error)
- When (created_at)
This supports compliance reporting: “show me every action taken on behalf of Client X in the last 90 days.”
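That report could be a single query over the attribution chain. The join columns below are assumptions inferred from the chain described above (the tool_calls/task_runs/tasks/campaigns schemas are not shown in this document):

```sql
-- Hypothetical compliance query: every action for Client X in the last 90 days.
-- Column names follow the task_run → task → campaign → client chain and are assumed.
SELECT tc.created_at, tc.tool_name, tc.method, tc.status, tr.agent_config_id
FROM tool_calls tc
JOIN task_runs tr ON tr.id = tc.task_run_id
JOIN tasks t      ON t.id  = tr.task_id
JOIN campaigns c  ON c.id  = t.campaign_id
WHERE c.client_id = :client_id
  AND tc.created_at >= NOW() - INTERVAL '90 days'
ORDER BY tc.created_at DESC;
```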
Package Location
apps/api/src/
├── middleware/ # Rate limiting middleware
└── routes/
└── approvals.ts # HITL approval endpoints (create, resolve, query)
packages/agent-engine/src/
├── validators/ # Character limits, brand voice, banned words, disclaimers
├── budget.ts # Budget halt logic
└── watchdog.ts # Runaway detection
packages/queue/src/
├── workers.ts # BullMQ rate limiter config
└── approval.ts # enqueueApprovalLinkedActivities() — fan-out on resolve
packages/integrations/src/
└── control-plane/
└── create-approval.ts # create_approval tool handler

Database tables: approvals, approval_linked_activities — see schema above.
Audit Trail Security
Immutability
The MongoDB audit_logs collection is append-only. The application’s MongoDB user has insert + find only on audit_logs — update and delete are revoked at the database role level. This means no application code path — even one with a bug or under active attack — can tamper with past audit records. See MongoDB Security for the role definition.
PII Redaction
Before any before/after diff is written to audit_logs, fields annotated as PII in the schema (email, phone, firstName, lastName, password) are replaced with '[REDACTED]'. This prevents audit logs from becoming a PII data store, while still capturing what changed.
IP addresses are stored at the top-level audit record (for legal acceptance events) but excluded from diff payloads.
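A minimal sketch of the diff redaction step, assuming a flat diff object; the real implementation presumably reads the PII annotations from the schema rather than a hard-coded list, and would recurse into nested diffs:

```typescript
// Replace PII-annotated fields in a diff with '[REDACTED]' before it is
// written to audit_logs. Shallow sketch; field list mirrors the annotations
// named above.
const PII_FIELDS = new Set(['email', 'phone', 'firstName', 'lastName', 'password']);

function redactDiff(diff: Record<string, unknown>): Record<string, unknown> {
  const redacted: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(diff)) {
    redacted[key] = PII_FIELDS.has(key) ? '[REDACTED]' : value;
  }
  return redacted;
}
```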
Retention
Audit logs are retained for a minimum of 2 years. There is no TTL index on audit_logs. After 2 years, records may be archived to cold storage (S3 Glacier or equivalent) but are never deleted. Archival is an admin-only operation and itself generates an audit entry.
Budget Halt — Real-Time Cost Tracking
Costs are tracked incrementally during agent execution, not just after completion. This prevents a single long-running agent from blowing through the budget before the halt can fire.
// Called by the LLM adapter after every streaming chunk that includes usage metadata
async function trackAndCheckBudget(
tenantId: string,
campaignId: string,
activityRunId: string,
newCostUsd: number,
): Promise<'continue' | 'halt'> {
// Atomic increment + check in a single DB transaction
const [updatedRun] = await db
.update(activityRuns)
.set({ costUsd: sql`cost_usd + ${newCostUsd}` })
.where(eq(activityRuns.id, activityRunId))
.returning({ costUsd: activityRuns.costUsd });
// Check all three caps in order (cheapest check first)
const agentConfig = await getCachedAgentConfig(tenantId);
if (agentConfig.maxCostUsdPerActivity && updatedRun.costUsd > agentConfig.maxCostUsdPerActivity) {
return 'halt';
}
const campaign = await getCachedCampaign(campaignId);
if (campaign.budgetCapUsd) {
const campaignSpend = await getCampaignTotalSpend(campaignId); // Redis-cached, 5s TTL
if (campaignSpend > campaign.budgetCapUsd) return 'halt';
}
const tenant = await getCachedTenant(tenantId);
if (tenant.monthlySpendCapUsd) {
const tenantSpend = await getTenantMonthlySpend(tenantId); // Redis-cached, 5s TTL
if (tenantSpend > tenant.monthlySpendCapUsd) return 'halt';
}
return 'continue';
}

On 'halt': the adapter kills the LLM process (SIGTERM), the run is marked failed with error: 'budget_exceeded', and no retry is scheduled. The campaign’s status is set to paused. A Slack alert fires to the ops channel and a notification is sent to the tenant admin.
Cached cap lookups: Agent config and campaign budget caps are cached in Redis (5-second TTL). This means a budget overrun may add at most one 5-second window of extra spend — an acceptable trade-off vs. a DB query on every token.