
Playwright (Browser Automation)

Category: Browser Automation / Scraping
Integration type: Internal tool — no external credentials required
Runtime: Chromium (bundled via playwright-core; or system Chromium on Ollama/local agents)
Package: packages/tools/src/browser.ts


Purpose

The Playwright tool provides headless browser capabilities to agents that need to interact with live websites — without API access. It is an internal tool used by specific research and audit agents; tenants do not configure or connect it.

Use cases:

| Agent | Playwright use |
| --- | --- |
| Competitor Researcher | Scrape competitor home pages, service pages, and blog listings |
| Site Auditor | Audit the client’s own site: page titles, meta descriptions, heading structure, broken links |
| Keyword Researcher | Check live SERP results for target keywords (Google, Bing) |
| Backlink Researcher | Fetch referring-domain pages to verify anchor text and link placement |
| Content Repurposer | Extract full article text from URLs before repurposing |
| Landing Page Writer | Capture screenshots of competitor landing pages for reference |

Playwright is never used to take automated actions (form submissions, login, purchases). It is read-only: page loads, DOM parsing, and screenshots only.


Config Structure

No tenant-level configuration. The tool is configured at the platform level via environment variables:

```typescript
interface BrowserToolConfig {
  headless: boolean;      // Always true in production; false for local debugging
  timeout: number;        // Page load timeout in ms (default: 30_000)
  proxyUrl?: string;      // Optional HTTP proxy for residential IP rotation
  userAgent?: string;     // Override default Chromium user agent
  maxConcurrent: number;  // Max simultaneous browser contexts (default: 3)
}
```

Environment variables:

```
BROWSER_HEADLESS=true
BROWSER_TIMEOUT_MS=30000
BROWSER_PROXY_URL=          # optional
BROWSER_MAX_CONCURRENT=3
```
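As a rough illustration, these variables could be parsed into a `BrowserToolConfig` with a small loader; the helper name `loadBrowserConfig` is illustrative, not part of the source.

```typescript
// Illustrative sketch: parse the BROWSER_* environment variables into a
// BrowserToolConfig, falling back to the documented defaults.
interface BrowserToolConfig {
  headless: boolean;
  timeout: number;
  proxyUrl?: string;
  userAgent?: string;
  maxConcurrent: number;
}

function loadBrowserConfig(env: Record<string, string | undefined>): BrowserToolConfig {
  return {
    headless: env.BROWSER_HEADLESS !== 'false',   // always true in production
    timeout: parseInt(env.BROWSER_TIMEOUT_MS ?? '30000', 10),
    proxyUrl: env.BROWSER_PROXY_URL || undefined, // empty string means "unset"
    userAgent: env.BROWSER_USER_AGENT,
    maxConcurrent: parseInt(env.BROWSER_MAX_CONCURRENT ?? '3', 10),
  };
}
```

In practice this would be called once per worker process as `loadBrowserConfig(process.env)`.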

Integration Pattern

Tool layer (packages/tools/src/browser.ts)

```typescript
import { chromium, Browser, BrowserContext } from 'playwright-core';

class BrowserTool {
  private browser: Browser | null = null;

  private async getBrowser(): Promise<Browser> {
    if (!this.browser) {
      this.browser = await chromium.launch({
        headless: true,
        args: [
          '--no-sandbox',
          '--disable-setuid-sandbox',
          '--disable-blink-features=AutomationControlled',
        ],
      });
    }
    return this.browser;
  }

  private async newContext(): Promise<BrowserContext> {
    const browser = await this.getBrowser();
    return browser.newContext({
      userAgent:
        process.env.BROWSER_USER_AGENT ??
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
          '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
      viewport: { width: 1280, height: 900 },
      ...(process.env.BROWSER_PROXY_URL && {
        proxy: { server: process.env.BROWSER_PROXY_URL },
      }),
    });
  }

  /**
   * Scrape the text content and metadata of a URL.
   * Used by Competitor Researcher, Content Repurposer.
   */
  async scrapeUrl(url: string): Promise<{
    title: string;
    description: string;
    h1: string[];
    h2: string[];
    bodyText: string;   // Visible text, whitespace-normalised
    links: string[];    // All absolute href values on the page
    statusCode: number;
  }> {
    const context = await this.newContext();
    const page = await context.newPage();
    try {
      const response = await page.goto(url, {
        waitUntil: 'domcontentloaded',
        timeout: parseInt(process.env.BROWSER_TIMEOUT_MS ?? '30000'),
      });

      const [title, description, h1, h2, bodyText, links] = await Promise.all([
        page.title(),
        page
          .$eval('meta[name="description"]', (el) => el.getAttribute('content') ?? '')
          .catch(() => ''),
        page.$$eval('h1', (els) => els.map((el) => el.textContent?.trim() ?? '')),
        page.$$eval('h2', (els) => els.map((el) => el.textContent?.trim() ?? '')),
        page.$eval('body', (el) => el.innerText.replace(/\s+/g, ' ').trim()),
        page.$$eval('a[href]', (els) =>
          els
            .map((el) => (el as HTMLAnchorElement).href)
            .filter((h) => h.startsWith('http')),
        ),
      ]);

      return {
        title,
        description,
        h1,
        h2,
        bodyText: bodyText.slice(0, 50_000), // Cap at 50k chars to avoid token blowout
        links: [...new Set(links)].slice(0, 200),
        statusCode: response?.status() ?? 200,
      };
    } finally {
      await context.close();
    }
  }

  /**
   * Take a full-page screenshot of a URL.
   * Used by Landing Page Writer, Site Auditor.
   */
  async screenshot(options: {
    url: string;
    fullPage: boolean; // false = viewport only
  }): Promise<{ buffer: Buffer; width: number; height: number }> {
    const context = await this.newContext();
    const page = await context.newPage();
    try {
      await page.goto(options.url, {
        waitUntil: 'networkidle',
        timeout: parseInt(process.env.BROWSER_TIMEOUT_MS ?? '30000'),
      });
      const buffer = await page.screenshot({
        fullPage: options.fullPage,
        type: 'png',
      });
      const viewport = page.viewportSize();
      return {
        buffer,
        width: viewport?.width ?? 1280,
        height: viewport?.height ?? 900,
      };
    } finally {
      await context.close();
    }
  }

  /**
   * Perform a Google SERP check for a keyword and return the top results.
   * Used by Keyword Researcher for live rank verification.
   */
  async checkSerp(options: {
    keyword: string;
    engine: 'google' | 'bing';
    countryCode: string; // e.g. "us", "in", "ae"
    limit: number;       // Max results to return (e.g. 10)
  }): Promise<{
    position: number;
    url: string;
    title: string;
    snippet: string;
  }[]> {
    const context = await this.newContext();
    const page = await context.newPage();
    const searchUrl =
      options.engine === 'google'
        ? `https://www.google.com/search?q=${encodeURIComponent(options.keyword)}&gl=${options.countryCode}&hl=en&num=${options.limit}`
        : `https://www.bing.com/search?q=${encodeURIComponent(options.keyword)}&cc=${options.countryCode}&count=${options.limit}`;
    try {
      await page.goto(searchUrl, { waitUntil: 'domcontentloaded', timeout: 30_000 });
      if (options.engine === 'google') {
        return await page.$$eval(
          '#search .g',
          (els, lim) =>
            els.slice(0, lim).map((el, i) => ({
              position: i + 1,
              url: (el.querySelector('a') as HTMLAnchorElement)?.href ?? '',
              title: el.querySelector('h3')?.textContent?.trim() ?? '',
              snippet: el.querySelector('.VwiC3b')?.textContent?.trim() ?? '',
            })),
          options.limit,
        );
      } else {
        return await page.$$eval(
          '#b_results .b_algo',
          (els, lim) =>
            els.slice(0, lim).map((el, i) => ({
              position: i + 1,
              url: (el.querySelector('a') as HTMLAnchorElement)?.href ?? '',
              title: el.querySelector('h2')?.textContent?.trim() ?? '',
              snippet: el.querySelector('.b_caption p')?.textContent?.trim() ?? '',
            })),
          options.limit,
        );
      }
    } finally {
      await context.close();
    }
  }

  /**
   * Extract all internal and external links from a page.
   * Used by Site Auditor for crawl and broken-link detection.
   */
  async extractLinks(url: string): Promise<{
    internal: string[];
    external: string[];
  }> {
    const result = await this.scrapeUrl(url);
    const origin = new URL(url).origin;
    const internal = result.links.filter((l) => l.startsWith(origin));
    const external = result.links.filter((l) => !l.startsWith(origin));
    return { internal, external };
  }

  async close(): Promise<void> {
    if (this.browser) {
      await this.browser.close();
      this.browser = null;
    }
  }
}
```

Agent Workflow

Competitor Researcher

```
Competitor Researcher agent receives list of competitor URLs
        ▼  (for each URL, via tool call)
BrowserTool.scrapeUrl(competitorUrl)
  ├── Extracts: title, meta description, H1/H2s, body text, links
  └── Returns structured page data to agent
Agent analyses copy, positioning, and content structure
Agent writes findings to research note (MongoDB)
```

Site Auditor

```
Site Auditor agent receives client site URL
  ├── BrowserTool.scrapeUrl(url) — for each page in crawl
  │     └── Checks: title, meta description, H1 count, missing meta
  ├── BrowserTool.extractLinks(url) — detects broken internal links
  └── BrowserTool.screenshot({ url, fullPage: false }) — captures viewport
        └── Screenshot stored to S3, linked in audit report
```
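The broken-link step relies on partitioning hrefs by the page's origin, as `extractLinks()` does. That comparison can be sketched as a pure helper (the function name `partitionLinks` is illustrative):

```typescript
// Sketch of the origin-based partition used for broken-link detection:
// links on the page's own origin are internal, everything else is external.
function partitionLinks(
  pageUrl: string,
  links: string[],
): { internal: string[]; external: string[] } {
  const origin = new URL(pageUrl).origin;
  const internal = links.filter((l) => l.startsWith(origin));
  const external = links.filter((l) => !l.startsWith(origin));
  return { internal, external };
}
```

Note that an origin prefix check treats subdomains (e.g. `blog.example.com`) as external, which matters for how a crawl scope is defined.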

Keyword SERP check

```
Keyword Researcher agent has keyword list from SEMrush/DataForSEO
        ▼  (spot-check top keywords)
BrowserTool.checkSerp({ keyword, engine: 'google', countryCode, limit: 10 })
  ├── Returns live SERP positions for top results
  └── Agent notes which competitors rank for target keywords
```

Browser Resource Management

Browsers are expensive to spawn. The tool singleton is:

  • Created once per BullMQ worker process (not per task)
  • Shared across concurrent tasks up to BROWSER_MAX_CONCURRENT contexts
  • Closed gracefully on worker shutdown via process.on('SIGTERM')

Task-level isolation is achieved via BrowserContext (separate cookies, cache, network state per task), not separate Browser instances.
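The limiter itself is not shown in the source; one plausible shape, assuming a promise-based semaphore gating context creation up to `BROWSER_MAX_CONCURRENT`, is:

```typescript
// Illustrative promise-based semaphore: at most `max` callers hold a slot
// at once; later callers queue until a slot is released.
class Semaphore {
  private queue: Array<() => void> = [];
  private active = 0;
  constructor(private max: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.max) {
      this.active++;
      return;
    }
    // Wait; release() hands the slot over directly, so no increment here.
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }

  release(): void {
    const next = this.queue.shift();
    if (next) {
      next();        // pass the slot to a waiter; active count unchanged
    } else {
      this.active--;
    }
  }
}

// Each context-scoped task is wrapped so no more than
// BROWSER_MAX_CONCURRENT run simultaneously.
async function withSlot<T>(sem: Semaphore, task: () => Promise<T>): Promise<T> {
  await sem.acquire();
  try {
    return await task();
  } finally {
    sem.release();
  }
}
```

Handing the slot directly to the next waiter in `release()` avoids a window where a newly arriving caller could jump the queue and briefly exceed the cap.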


SERP Scraping Considerations

Direct Google scraping may be blocked by CAPTCHAs at high volume. Escalation path:

| Volume | Approach |
| --- | --- |
| Low (< 50 checks/day) | Direct Chromium scraping |
| Medium (50–500/day) | Residential proxy rotation via BROWSER_PROXY_URL |
| High (> 500/day) | Replace with DataForSEO SERP API (zero scraping) |

The checkSerp() method is considered a fallback. The DataForSEO provider is the preferred source for keyword ranking data at scale.
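The escalation policy above can be encoded as a small routing helper; the thresholds mirror the table, while the function and route names are illustrative rather than taken from the codebase:

```typescript
// Sketch of the volume-based escalation policy. 'dataforseo' stands in for
// the platform's DataForSEO SERP provider; names are illustrative.
type SerpRoute = 'direct' | 'proxy' | 'dataforseo';

function chooseSerpRoute(checksPerDay: number, proxyConfigured: boolean): SerpRoute {
  if (checksPerDay > 500) return 'dataforseo'; // high volume: no scraping at all
  if (checksPerDay >= 50) {
    // medium volume: residential proxy rotation, if BROWSER_PROXY_URL is set
    return proxyConfigured ? 'proxy' : 'dataforseo';
  }
  return 'direct';                             // low volume: plain Chromium scraping
}
```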


Security Constraints

  • Read-only: No page.click(), page.fill(), or form submissions in any agent tool call
  • Domain allowlist: scrapeUrl() rejects URLs matching internal platform domains (e.g. app.leadmetrics.io)
  • Content size cap: bodyText is capped at 50,000 characters to prevent memory exhaustion and token blowout in LLM context
  • No credential injection: Browser contexts never receive cookies or auth tokens
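A minimal sketch of the internal-domain rejection check follows; the blocked-host list and helper name are illustrative, with `app.leadmetrics.io` taken from the example in the constraint above:

```typescript
// Sketch of the guard applied before scrapeUrl() navigates: reject URLs
// that target internal platform domains. The list is illustrative.
const BLOCKED_HOSTS = ['app.leadmetrics.io'];

function assertScrapeAllowed(url: string): void {
  const host = new URL(url).hostname.toLowerCase();
  const blocked = BLOCKED_HOSTS.some(
    (b) => host === b || host.endsWith('.' + b),
  );
  if (blocked) {
    throw new Error(`scrapeUrl rejected internal platform domain: ${host}`);
  }
}
```

Matching on the parsed hostname (rather than a substring of the raw URL) prevents trivial bypasses like `https://evil.com/?x=app.leadmetrics.io`.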

Test Cases

Unit tests (packages/tools/src/browser.test.ts)

| Test | Approach |
| --- | --- |
| scrapeUrl() returns normalised body text capped at 50k chars | Mock page eval; assert bodyText.length <= 50000 |
| scrapeUrl() deduplicates links | Mock eval returning duplicate hrefs; assert Set dedup applied |
| screenshot() uses fullPage flag correctly | Mock page.screenshot; assert fullPage forwarded |
| checkSerp() uses correct Google URL with country code | Mock page.goto; assert URL contains &gl=${countryCode} |
| checkSerp() switches selector set for Bing | Assert Bing selectors used when engine: 'bing' |
| extractLinks() separates internal from external by origin | Mock scrapeUrl returning mixed URLs; assert partition |
| close() clears browser reference | Assert browser set to null after close |
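The capping and dedup behaviour that the first two tests exercise can be isolated into pure helpers, which makes the assertions trivial; the helper names here are illustrative:

```typescript
// Sketch: the link post-processing scrapeUrl() applies — dedupe via Set,
// then cap at 200 entries — pulled out as a pure function for unit testing.
function normaliseLinks(links: string[], cap = 200): string[] {
  return [...new Set(links)].slice(0, cap);
}

// Likewise the whitespace normalisation and 50k-character body-text cap.
function capBodyText(text: string, cap = 50_000): string {
  return text.replace(/\s+/g, ' ').trim().slice(0, cap);
}
```

Extracting these from the page-eval callbacks lets the unit suite cover them without launching a browser.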

Integration tests

Integration tests spin up a real Chromium instance against a local http-server fixture serving static HTML. They are tagged @slow and excluded from the default CI run.
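The static fixture side of that setup might look like the sketch below: a throwaway local HTTP server serving one HTML page, against which `scrapeUrl()` would then be pointed. The helper name and shape are illustrative, not taken from the test suite.

```typescript
// Sketch of a local fixture for the @slow integration tests: a throwaway
// HTTP server on a random loopback port serving one static HTML page.
import { createServer, Server } from 'node:http';

function serveFixture(html: string): Promise<{ server: Server; url: string }> {
  return new Promise((resolve) => {
    const server = createServer((_req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end(html);
    });
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address() as { port: number };
      resolve({ server, url: `http://127.0.0.1:${port}/` });
    });
  });
}
```

The test would launch Chromium, call `scrapeUrl(url)` against the returned address, assert on the extracted title/headings, then close both the browser and the server.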


© 2026 Leadmetrics — Internal use only