Claude Prompt for Computer Use & Browser Agents
End-to-end Computer Use agent that can scrape product listings on Amazon autonomously. Screenshot loop, action grounding, safety gates, and recovery from unexpected UI states.
You are building a Computer Use agent (screen-seeing + mouse/keyboard controlling) that can scrape product listings on Amazon. Model: o3-mini with computer-use tools. Runtime: Python 3.11 + uv.
Computer Use is more capable than pure browser automation but also more dangerous: the agent sees the whole screen and can click anywhere. Design accordingly.
## Part 1 — Scope and constraints
- **What exactly is "done"?** Define success criteria for scraping product listings on Amazon as a machine-checkable predicate (URL pattern, DOM state, downloaded file existence, DB row).
- **What's out-of-scope?** The agent must not touch: other apps, other browser tabs, system settings, files outside working dir.
- **What's the budget?** Max screenshots, max seconds, max cost.
- **Who runs it?** Dedicated VM / ephemeral container / user's machine? Each has different safety requirements.
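A minimal sketch of a machine-checkable "done" predicate, combining a URL pattern with output-file existence. The URL substring, file path, and row threshold are illustrative assumptions, not part of the original spec:

```python
from pathlib import Path

def success_predicate(current_url: str, output_path: str, min_rows: int = 1) -> bool:
    """Done when we are on a search-results URL and the scraped
    listings CSV exists with at least min_rows data rows.
    (URL pattern, path, and threshold are assumptions.)"""
    out = Path(output_path)
    if "amazon.com/s?" not in current_url:
        return False
    if not out.exists():
        return False
    # Count data rows, skipping the CSV header line.
    rows = out.read_text().strip().splitlines()
    return len(rows) - 1 >= min_rows
```

The point is that the loop driver can call this after every step; no human judgment is needed to decide success.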
## Part 2 — Environment
- Host: dedicated Docker container or cloud VM (not the user's personal desktop)
- Display: Xvfb virtual display at fixed resolution (1280x800 recommended — balances detail vs. token cost)
- Browser: Chrome/Firefox launched in a pristine profile every run
- Network: egress allowlist restricted to the domains needed to scrape product listings on Amazon
- Filesystem: scratch dir; no access to secrets outside env
Write the Dockerfile + launch script.
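A sketch of the container setup, assuming a Debian-based image and a uv-managed `pyproject.toml`; package names and pins are assumptions and should be verified against the chosen base image:

```dockerfile
# Sketch only: package names and the CMD chain are assumptions.
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        xvfb chromium \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . .
RUN pip install uv && uv sync
# Fixed-resolution virtual display, then a pristine-profile browser.
ENV DISPLAY=:99
CMD Xvfb :99 -screen 0 1280x800x24 & \
    chromium --user-data-dir=/tmp/profile --no-first-run & \
    uv run agent run --goal="scrape product listings on Amazon" --max-actions=50
```

The throwaway `--user-data-dir` gives the "pristine profile every run" property; the egress allowlist would be enforced outside the container (e.g. Docker network policy), not in this file.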
## Part 3 — The perception-action loop
Pseudocode the loop:
1. Take screenshot
2. Send to o3-mini with tools: screenshot, click, type, scroll, key, wait
3. Model returns next action(s)
4. Execute action with guardrails (below)
5. Loop until success predicate OR budget exhausted OR safety trip
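The five steps above can be sketched as a driver function. `model`, `executor`, and `success` are hypothetical callables standing in for the real client, action runner, and Part 1 predicate:

```python
import time

def run_loop(model, executor, success, max_actions=50, budget_s=300.0):
    """Perception-action loop sketch. model(screenshot) -> list of actions,
    executor.run(action) -> outcome string, success() -> bool.
    All three interfaces are assumptions for illustration."""
    start = time.monotonic()
    for _step in range(max_actions):
        if time.monotonic() - start > budget_s:
            return "budget_exhausted"
        shot = executor.screenshot()          # 1. perceive
        actions = model(shot)                 # 2-3. plan next action(s)
        for action in actions:
            outcome = executor.run(action)    # 4. act (guardrails inside)
            if outcome == "safety_trip":
                return "safety_trip"
        if success():                         # 5. check the done predicate
            return "success"
    return "max_actions_exhausted"
```

Every exit path returns a distinct reason string, which feeds directly into the failure categories measured in the eval.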
## Part 4 — Action guardrails
Each action passes through middleware:
- **Bounds check**: click coordinates inside the screen? Click inside the intended app?
- **Rate limit**: no more than N actions/sec (humans don't machine-gun clicks)
- **Destructive detection**: does the click land on "Delete", "Pay", "Submit $", a confirm dialog? → require explicit high-confidence OR human confirmation
- **Off-task detection**: did we navigate somewhere unrelated to scraping product listings on Amazon?
- **Loop detection**: same screenshot → same action → same result three times in a row → break out
Write the middleware.
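A sketch of that middleware, covering bounds, rate limiting, destructive-word detection, and loop detection. The screen size, rate threshold, keyword list, and the `target_text` field on actions are all assumptions:

```python
import time

DESTRUCTIVE_WORDS = ("delete", "pay", "submit", "confirm purchase")

class ActionGuard:
    """Guardrail middleware sketch. check() returns 'allow', 'confirm'
    (needs human sign-off), or 'block'; record() flags repeated
    screenshot/action pairs so the driver can break out of loops."""
    def __init__(self, width=1280, height=800, max_rate=3.0):
        self.width, self.height = width, height
        self.min_interval = 1.0 / max_rate
        self.last_ts = 0.0
        self.history = []                    # (screenshot_hash, action) pairs

    def check(self, action, now=None):
        now = time.monotonic() if now is None else now
        if action["type"] == "click":
            x, y = action["x"], action["y"]
            if not (0 <= x < self.width and 0 <= y < self.height):
                return "block"               # bounds check
            label = action.get("target_text", "").lower()
            if any(w in label for w in DESTRUCTIVE_WORDS):
                return "confirm"             # destructive detection
        if now - self.last_ts < self.min_interval:
            return "block"                   # rate limit
        self.last_ts = now
        return "allow"

    def record(self, screenshot_hash, action):
        """True when the same screenshot + action repeats 3x in a row."""
        self.history.append((screenshot_hash, repr(action)))
        tail = self.history[-3:]
        return len(tail) == 3 and len(set(tail)) == 1
```

Off-task detection is deliberately absent here; it needs the model's own judgment (or a URL allowlist) rather than a coordinate check.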
## Part 5 — Handling unexpected UI
Real-world sites will throw curveballs. Design responses to:
- Cookie consent banners (auto-dismiss with "Reject all" if possible, else dismiss manually via the banner's own controls)
- Login walls (if credentials not provided → stop and ask; never guess)
- CAPTCHA (stop and hand off — never try to bypass)
- 2FA (stop and hand off)
- Rate limit pages ("too many requests") — back off, don't retry
- A/B'd UI variants (the screenshot the agent expected isn't what it sees) — re-plan, don't force
- Modal dialogs (handle or dismiss explicitly)
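These curveballs reduce to a small policy table. The state names assume a hypothetical screen-state classifier upstream; anything unrecognized falls back to re-planning rather than forcing an action:

```python
# Classified screen state -> (recovery action, rationale). All names are
# assumptions; the classifier producing these labels is not shown here.
RECOVERY = {
    "cookie_banner":  ("dismiss", "Click 'Reject all' if visible"),
    "login_wall":     ("handoff", "Stop and ask the user; never guess credentials"),
    "captcha":        ("handoff", "Stop and hand off; never attempt a bypass"),
    "two_factor":     ("handoff", "Stop and hand off to the user"),
    "rate_limited":   ("backoff", "Back off with increasing delay; don't retry"),
    "unknown_layout": ("replan",  "Take a fresh screenshot and re-plan"),
    "modal_dialog":   ("dismiss", "Handle or close the dialog explicitly"),
}

def recover(screen_state: str):
    """Map a classified screen state to its recovery policy, defaulting
    to re-planning for anything unrecognized."""
    return RECOVERY.get(screen_state, ("replan", "Unrecognized state; re-plan"))
```

Keeping the policy declarative makes the safety-relevant cases (CAPTCHA, 2FA) easy to audit in one place.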
## Part 6 — Safety gates
Before any action in these categories, require explicit confirmation:
- Payment / checkout completion
- Account creation
- Sending messages to humans (email, DM)
- Deleting data
- Changing account settings / passwords
The confirmation channel is either the calling user (Slack DM, UI prompt) or a supervisor agent with a stricter rubric.
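A minimal gate sketch: the gated categories mirror the list above, and `ask_human` is a hypothetical callable (a Slack DM prompt, a UI dialog, or a supervisor agent) returning True/False:

```python
GATED_CATEGORIES = {"payment", "account_creation", "send_message",
                    "delete_data", "change_settings"}

def gate(action_category: str, ask_human) -> bool:
    """Return True if the action may proceed. Non-gated categories pass
    without confirmation; gated ones require an explicit yes from the
    confirmation channel. `ask_human` is an assumed interface."""
    if action_category not in GATED_CATEGORIES:
        return True
    return bool(ask_human(f"Agent wants to perform: {action_category}. Allow?"))
```

Failing closed is the important property: a gated action with no answer (or a falsy one) never proceeds.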
## Part 7 — Memory within a task
Track:
- Initial goal (scrape product listings on Amazon) and sub-goals
- Actions taken (for retry avoidance)
- Facts discovered from the screen (e.g. "confirmation #A123")
- Stuck-counter: if no progress in N screenshots, escalate
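The four items above fit in a small state object; the stuck-limit of 5 screenshots is an assumed default:

```python
from dataclasses import dataclass, field

@dataclass
class TaskMemory:
    """In-task memory sketch: goal, sub-goals, action log, screen facts,
    and a stuck counter that triggers escalation."""
    goal: str
    subgoals: list = field(default_factory=list)
    actions: list = field(default_factory=list)      # for retry avoidance
    facts: dict = field(default_factory=dict)        # e.g. {"confirmation": "A123"}
    stuck_count: int = 0
    stuck_limit: int = 5                             # assumption: escalate after 5

    def note_screenshot(self, progressed: bool) -> bool:
        """Update the stuck counter; return True when escalation is due."""
        self.stuck_count = 0 if progressed else self.stuck_count + 1
        return self.stuck_count >= self.stuck_limit
```

Any progress resets the counter, so only a genuinely stalled run escalates.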
## Part 8 — Recording + observability
- Save every screenshot with the action taken (for debugging + audit)
- Structured trace: timestamp, screenshot hash, action, outcome
- Redact PII from stored artifacts
- Cost meter (screenshots are expensive — each is ~1200+ tokens)
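One JSONL line per step covers the trace requirements. The email-only regex redaction is a deliberately crude sketch; real PII scrubbing needs a proper pass over both text and screenshots:

```python
import hashlib
import json
import re
import time

def trace_record(screenshot_bytes: bytes, action: dict, outcome: str) -> str:
    """Build one structured trace line: timestamp, screenshot hash,
    (redacted) action, outcome. Redaction here handles emails only,
    as a placeholder for a fuller PII pass."""
    def redact(text: str) -> str:
        return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<redacted-email>", text)
    record = {
        "ts": time.time(),
        "screenshot_sha256": hashlib.sha256(screenshot_bytes).hexdigest(),
        "action": {k: redact(v) if isinstance(v, str) else v
                   for k, v in action.items()},
        "outcome": outcome,
    }
    return json.dumps(record)
```

Hashing the screenshot keeps the trace line small and doubles as the input to the loop detector, while the raw image is saved separately for audit.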
## Part 9 — Eval
Run the scrape-product-listings-on-Amazon task 30 times, each on a fresh environment. Measure:
- Success rate
- Avg actions to complete
- Avg cost
- Failure categories (login, CAPTCHA, off-task, loop, UI-changed)
Ship criteria: ≥80% success OR reliable human-handoff on failure.
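Aggregating those four metrics plus the ship criterion might look like this; the per-run dict shape is an assumption:

```python
from collections import Counter

def summarize(runs: list) -> dict:
    """Aggregate eval runs. Each run is assumed to look like
    {"success": bool, "actions": int, "cost": float, "failure": str|None}."""
    n = len(runs)
    successes = [r for r in runs if r["success"]]
    return {
        "success_rate": len(successes) / n,
        # Average actions over successful runs only; failed runs just
        # burned their budget and would skew the number.
        "avg_actions": sum(r["actions"] for r in successes) / max(len(successes), 1),
        "avg_cost": sum(r["cost"] for r in runs) / n,
        "failures": Counter(r["failure"] for r in runs if not r["success"]),
        "ship": len(successes) / n >= 0.8,
    }
```

The `failures` counter maps directly onto the failure categories above, so a regression in any one category is visible at a glance.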
## Part 10 — Implementation
Write the full agent:
- Loop driver
- Safety middleware
- o3-mini client with computer-use tools wired
- Success predicate checker
- Run logger
- CLI entry: `agent run --goal="scrape product listings on Amazon" --max-actions=50`
Produce real, runnable code, not pseudocode.