Browser Automation

Navigate across multiple live webpages, take a screenshot at each stop, and download every PNG — all from a single prompt.

What you'll build

A script that launches a headless browser inside a cloud sandbox, visits Hacker News, clicks into the top story's comments, then visits a GitHub repo — screenshotting each page along the way.

The Script

Create a file called browser.ts:

```typescript
import { createClient } from "swarmlord";
import { writeFileSync, mkdirSync } from "fs";

// Create the client and open a session with the "build" agent
const client = createClient({ apiKey: process.env.SWARMLORD_API_KEY! });
const session = await client.agent("build").createSession();

// Send the prompt and stream the agent's text output to the terminal
await session.send(
    `Browse a few pages and take screenshots along the way:
1. Go to news.ycombinator.com and screenshot the front page → /workspace/1-hn.png
2. Click into the top story and screenshot the comments page → /workspace/2-comments.png
3. Go to https://github.com/pingdotgg/t3code and screenshot the repo page → /workspace/3-github.png
Save each screenshot before navigating to the next page.`,
    { onText: delta => process.stdout.write(delta) }
);

// Download each screenshot from the sandbox into ./output
mkdirSync("output", { recursive: true });
const files = ["1-hn.png", "2-comments.png", "3-github.png"];
for (const file of files) {
    try {
        const buf = await session.getFileBuffer(`/workspace/${file}`);
        writeFileSync(`output/${file}`, new Uint8Array(buf));
        console.log(`\nDownloaded output/${file}`);
    } catch (err) {
        // Report the actual failure instead of assuming the file is missing
        const reason = err instanceof Error ? err.message : String(err);
        console.log(`\nSkipped ${file}: ${reason}`);
    }
}

// End the session and release its sandbox
await session.end();
```
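
The script names each screenshot twice: once inside the prompt and once in the download list, so a rename in one place can silently break the other. A minimal sketch of one way to keep them in sync is to derive both from a single list. The `Stop` interface and `buildPrompt` function are hypothetical names introduced for illustration, not part of the swarmlord SDK.

```typescript
// Hypothetical shape for one stop on the tour; not an SDK type.
interface Stop {
    instruction: string; // what the agent should do, in plain English
    file: string;        // screenshot file name inside /workspace
}

const stops: Stop[] = [
    { instruction: "Go to news.ycombinator.com and screenshot the front page", file: "1-hn.png" },
    { instruction: "Click into the top story and screenshot the comments page", file: "2-comments.png" },
    { instruction: "Go to https://github.com/pingdotgg/t3code and screenshot the repo page", file: "3-github.png" },
];

// Builds the numbered prompt text from the list of stops.
function buildPrompt(stops: Stop[]): string {
    const steps = stops.map((s, i) => `${i + 1}. ${s.instruction} → /workspace/${s.file}`);
    return [
        "Browse a few pages and take screenshots along the way:",
        ...steps,
        "Save each screenshot before navigating to the next page.",
    ].join("\n");
}

// The download loop can reuse the same file names.
const files = stops.map(s => s.file);
```

With this in place, `buildPrompt(stops)` would be passed to `session.send` and `files` would feed the download loop, so adding a fourth page means adding one entry to `stops`.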

Run It

```bash
export SWARMLORD_API_KEY="your-key-here"
bun browser.ts
```
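
The script reads the key with `process.env.SWARMLORD_API_KEY!`. The `!` only silences the type checker, so an unset variable still reaches `createClient` as `undefined`. How the SDK reacts to that is not described here, so a small guard that fails early with a clear message may be useful. `requireEnv` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper: look up a required variable and fail with a clear message.
// `env` is passed in explicitly so the function is easy to test;
// in browser.ts you would pass process.env.
function requireEnv(
    name: string,
    env: Record<string, string | undefined>
): string {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
        throw new Error(`Missing environment variable ${name}; export it before running.`);
    }
    return value;
}

// Possible usage in browser.ts, replacing the `!` non-null assertion:
// const client = createClient({ apiKey: requireEnv("SWARMLORD_API_KEY", process.env) });
```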

The agent streams its work in real time: it launches a headless Chromium browser, navigates to each page, captures screenshots, and saves them to the sandbox. When it finishes, the script downloads all three PNGs into your local output/ folder.
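
Nothing in the script confirms that the downloaded bytes are actually images. As a light sanity check, you could test each buffer for the fixed 8-byte signature that begins every PNG file before writing it to disk. This is a sketch only; `isPng` is a hypothetical helper and not part of the SDK.

```typescript
// The fixed 8-byte signature that begins every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Returns true when `bytes` starts with the PNG signature.
function isPng(bytes: Uint8Array): boolean {
    if (bytes.length < PNG_SIGNATURE.length) return false;
    return PNG_SIGNATURE.every((b, i) => bytes[i] === b);
}

// Possible usage inside the download loop of browser.ts:
// const bytes = new Uint8Array(buf);
// if (!isPng(bytes)) console.warn(`${file} does not look like a PNG`);
```

A signature match does not prove the whole file is intact, but it catches cases such as an error page or text file saved under a .png name.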

Output

These are the actual screenshots produced by the script above:

1. Hacker News front page

Screenshot of Hacker News front page

2. Top story comments

Screenshot of Hacker News comments page

3. GitHub repository

Screenshot of t3code GitHub repo

How It Works

| Step | What happens |
| --- | --- |
| `createClient` | Authenticates with the swarmlord API |
| `agent("build").createSession()` | Spins up a session with a Linux sandbox and all tools enabled |
| `session.send(prompt)` | The agent invokes the browser tool (a headless Chromium instance powered by Cloudflare Browser Rendering) three times, navigating, clicking, and screenshotting at each stop |
| `getFileBuffer` (loop) | Downloads each binary PNG from the sandbox to your machine |
| `session.end()` | Cleans up the session and its sandbox |
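
Because `session.end()` is what cleans up the session and its sandbox, it is worth making sure it runs even when an earlier step fails. In browser.ts, an exception thrown by `session.send` would skip the final `session.end()` call. A minimal sketch of a guard using `try`/`finally` follows. `withCleanup` and the `Endable` interface are hypothetical names, not part of the SDK; the only SDK behavior assumed is the async `end()` method shown above.

```typescript
// Hypothetical helper, not part of the swarmlord SDK.
// Anything with an async end() method qualifies, including the session above.
interface Endable {
    end(): Promise<void>;
}

// Runs `work` against the session and always calls end(), even if `work` throws.
async function withCleanup<S extends Endable, T>(
    session: S,
    work: (session: S) => Promise<T>
): Promise<T> {
    try {
        return await work(session);
    } finally {
        await session.end();
    }
}

// Possible usage in browser.ts:
// await withCleanup(session, async s => {
//     await s.send(prompt, { onText: delta => process.stdout.write(delta) });
//     // ...download loop...
// });
```

The error from `work` still propagates to the caller, so failures remain visible after cleanup has run.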

Browser tool capabilities

The browser tool supports more than screenshots. It can click, type, scroll, wait for selectors, and scrape rendered text from JavaScript-heavy pages. See the Tools reference for the full API.

SDK released under the MIT License.