AI Prompt for Computer Use & Browser Agents
A browser automation agent using Playwright + GPT-4o that pulls analytics from a GA4 dashboard, including login, session reuse, and dynamic DOM selectors.
Build a browser automation agent using Playwright + GPT-4o that can pull analytics from a GA4 dashboard. DOM-based (not vision-based) — faster, cheaper, and more deterministic where possible. Runtime: TypeScript + Node 20.
## Part 1 — When DOM beats vision (and vice versa)
Compared to Computer Use / vision agents, Playwright is:
- Faster (no screenshot round trips)
- Cheaper (no image tokens)
- More deterministic (selectors don't drift mid-action)
But weaker when:
- The UI is canvas/image-based
- Selectors are deeply obfuscated or change every page load
- Anti-bot detection is aggressive
For pulling analytics from a GA4 dashboard, justify DOM as the right tool. If it's marginal, architect a hybrid fallback to vision.
## Part 2 — Setup
- Playwright with Chromium, headed mode for dev, headless for production
- Per-run browser context (fresh), or persistent context if we need session reuse
- User agent, viewport, locale — make it look like a normal user
- Request/response interception for debugging
Dockerfile + launch script.
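A minimal sketch of the context-configuration side of this setup. `AgentEnv` and `buildContextOptions` are assumed names, not an existing API; the returned object is shaped like Playwright's `BrowserContextOptions` so it could be passed to `browser.newContext(...)`:

```typescript
// Assumed shape for run configuration (not part of any existing codebase).
interface AgentEnv {
  headed: boolean;           // headed for dev; production launches headless
  storageStatePath?: string; // set when we want session reuse
}

function buildContextOptions(env: AgentEnv) {
  return {
    // Look like a normal user: realistic viewport, user agent, and locale.
    viewport: { width: 1280, height: 800 },
    userAgent:
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 " +
      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    locale: "en-US",
    timezoneId: "America/New_York",
    // Persistent session: load a saved storageState when one is configured;
    // otherwise the context starts fresh.
    ...(env.storageStatePath ? { storageState: env.storageStatePath } : {}),
  };
}
```

The `headed` flag would feed the launch call itself (e.g. `chromium.launch({ headless: !env.headed })`), keeping launch and context concerns separate.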
## Part 3 — Login + session
Pulling analytics from a GA4 dashboard requires login. Design:
- **Credentials source**: env vars, secret manager, 1Password CLI — never hardcoded
- **Login flow**: scripted (login form) vs. imported cookies vs. OAuth redirect
- **Session persistence**: save `storageState` after login, reuse for subsequent runs
- **Session expiry**: detect logged-out state, trigger re-login automatically
- **2FA**: pause for human OTP input, resume after
Write the login module.
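The decision logic inside that module might look like the sketch below. All names (`SessionAction`, `SessionState`, `nextSessionAction`) are assumptions; the real module would wrap Playwright's `storageState` save/load around these decisions:

```typescript
type SessionAction =
  | { kind: "reuse"; storageStatePath: string } // load saved cookies, skip login
  | { kind: "login" }                           // run the scripted login flow
  | { kind: "await-otp" };                      // pause for human 2FA input

interface SessionState {
  storageStateExists: boolean; // a saved storageState file is on disk
  sessionValid: boolean;       // probe page did NOT show a logged-out state
  otpRequired: boolean;        // login flow hit a 2FA challenge
}

function nextSessionAction(s: SessionState, path = "state.json"): SessionAction {
  if (s.otpRequired) return { kind: "await-otp" };
  // Reuse only when the saved session still passes the logged-in probe;
  // an expired session falls through to a fresh scripted login.
  if (s.storageStateExists && s.sessionValid) {
    return { kind: "reuse", storageStatePath: path };
  }
  return { kind: "login" };
}
```

After a successful login, persisting with `context.storageState({ path })` is what makes the `reuse` branch possible on the next run.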
## Part 4 — Selector strategy
Hardcoded selectors rot. Instead:
1. Prefer role + name (`getByRole('button', { name: 'Submit' })`) — resilient to CSS changes
2. Prefer `data-testid` where available
3. Fall back to LLM-driven selector: send the page's a11y tree to GPT-4o and ask for the selector for "the Submit button in the checkout form"
4. Cache successful selectors per site + action, invalidate on failure
Write the resilient-selector module.
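A sketch of the caching layer from step 4 (class and method names are assumptions). A failed selector is invalidated so the next lookup falls through to the role/testid/LLM chain again:

```typescript
// Caches the selector that last worked for a given site + action.
class SelectorCache {
  private cache = new Map<string, string>();

  private key(site: string, action: string) {
    return `${site}::${action}`;
  }

  get(site: string, action: string): string | undefined {
    return this.cache.get(this.key(site, action));
  }

  // Record a selector that just succeeded.
  remember(site: string, action: string, selector: string) {
    this.cache.set(this.key(site, action), selector);
  }

  // Invalidate on failure so stale selectors don't keep getting retried.
  invalidate(site: string, action: string) {
    this.cache.delete(this.key(site, action));
  }
}
```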
## Part 5 — The agent loop
1. Navigate / observe current page (a11y tree + URL)
2. Ask GPT-4o: "given goal=pull analytics from GA4 dashboard and current state, what's the next action?"
3. Structured action output: { type: "click"|"fill"|"goto"|"wait"|"done", selector?, value? }
4. Execute with Playwright
5. Verify expected post-condition
6. Loop
Write the code.
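The structured-action schema from step 3 could be validated like this before execution (a hand-rolled sketch; real code might use a schema library such as zod). `parseAction` rejects malformed model output instead of executing it blindly:

```typescript
type Action =
  | { type: "click"; selector: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "goto"; value: string }
  | { type: "wait"; selector: string }
  | { type: "done" };

function parseAction(raw: string): Action {
  const a = JSON.parse(raw);
  switch (a.type) {
    case "click":
    case "wait":
      if (typeof a.selector !== "string") throw new Error(`${a.type} needs a selector`);
      return { type: a.type, selector: a.selector };
    case "fill":
      if (typeof a.selector !== "string" || typeof a.value !== "string")
        throw new Error("fill needs selector and value");
      return { type: "fill", selector: a.selector, value: a.value };
    case "goto":
      if (typeof a.value !== "string") throw new Error("goto needs a URL value");
      return { type: "goto", value: a.value };
    case "done":
      return { type: "done" };
    default:
      // Unknown action types stop the loop rather than guessing.
      throw new Error(`unknown action type: ${a.type}`);
  }
}
```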
## Part 6 — Wait strategies
Never `page.waitForTimeout(n)` except as last resort. Instead:
- `waitForURL` after navigation
- `waitForSelector` with state (visible/hidden/attached)
- `waitForResponse` for API calls the click triggers
- `waitForLoadState('networkidle')` sparingly (slow)
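The `waitForResponse` case has a subtle ordering requirement worth showing: start waiting *before* the click, or the response can land before the listener is attached. `PageLike` below is a minimal stand-in for Playwright's `Page` so the sketch stays self-contained:

```typescript
interface ResponseLike {
  status(): number;
}

interface PageLike {
  waitForResponse(match: (url: string) => boolean): Promise<ResponseLike>;
  click(selector: string): Promise<void>;
}

async function clickAndAwaitApi(
  page: PageLike,
  selector: string,
  urlFragment: string,
): Promise<number> {
  // Attach the response listener first, THEN click, then await the response.
  const responsePromise = page.waitForResponse((url) => url.includes(urlFragment));
  await page.click(selector);
  const response = await responsePromise;
  return response.status();
}
```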
## Part 7 — Error recovery
- Selector not found → re-query a11y tree, ask model for updated selector
- Unexpected navigation → verify we're where we expected; if not, re-plan
- Element blocked by overlay → dismiss common overlays, retry
- Rate limit → exponential backoff
- Downtime page → wait + retry, cap attempts
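The rate-limit and downtime cases above can share one backoff policy. A sketch, kept pure so it unit-tests easily (the caller sleeps for the returned milliseconds; the jitter source is injectable for tests):

```typescript
function backoffMs(
  attempt: number,                    // 0-based retry attempt
  baseMs = 500,
  capMs = 30_000,
  jitter: () => number = Math.random, // returns a value in [0, 1)
): number | null {
  const maxAttempts = 5;
  if (attempt >= maxAttempts) return null; // cap attempts: give up
  // Exponential growth, clamped at the cap.
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly in [0, exp) to avoid synchronized retries.
  return Math.floor(jitter() * exp);
}
```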
## Part 8 — Anti-detection
Be a good citizen:
- Respect robots.txt where applicable
- Realistic action timing (don't click 10 things in 100ms)
- Don't run N parallel agents on one account
- Honor rate limit headers
We do NOT help with bypassing CAPTCHAs, click fraud, or scraping in violation of ToS. Bake those refusals into the system prompt.
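"Honor rate limit headers" has one concrete wrinkle: `Retry-After` can carry either a delta in seconds or an HTTP-date. A sketch handling both forms (`retryAfterMs` is an assumed name; `now` is injectable for testing):

```typescript
function retryAfterMs(header: string, now: Date = new Date()): number | null {
  // Form 1: delta-seconds, e.g. "120".
  const seconds = Number(header);
  if (header.trim() !== "" && Number.isFinite(seconds)) {
    return Math.max(0, seconds * 1000);
  }
  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const date = Date.parse(header);
  if (!Number.isNaN(date)) return Math.max(0, date - now.getTime());
  return null; // unparseable: caller falls back to its own backoff
}
```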
## Part 9 — Data extraction
For the extraction steps of the GA4 analytics pull:
- Prefer structured extraction from the a11y tree or DOM (not OCR of screenshots)
- Validate with a schema
- Dedupe across pagination
- Normalize before storing
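The dedupe and normalize steps might look like this (all names are assumptions, and `MetricRow` is an illustrative shape, not GA4's schema). Rows from successive pages can overlap when data shifts under the pagination, so deduping keys on a stable field:

```typescript
interface MetricRow {
  page: string;     // e.g. a landing-page path
  sessions: number;
}

function normalizeRow(raw: { page: string; sessions: string | number }): MetricRow {
  return {
    page: raw.page.trim().toLowerCase(),
    // DOM-extracted numbers often arrive as display strings like "1,234".
    sessions:
      typeof raw.sessions === "number"
        ? raw.sessions
        : Number(raw.sessions.replace(/,/g, "")),
  };
}

function dedupeRows(pages: MetricRow[][]): MetricRow[] {
  const seen = new Map<string, MetricRow>();
  for (const rows of pages) {
    for (const row of rows) {
      // Keep the first occurrence of each key across all pages.
      if (!seen.has(row.page)) seen.set(row.page, row);
    }
  }
  return [...seen.values()];
}
```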
## Part 10 — Implementation
Write:
- `src/agent.ts`: the loop driver
- `src/selectors.ts`: resilient selector resolver
- `src/session.ts`: login + storageState handling
- `src/actions.ts`: typed action executor
- `src/extract.ts`: structured extraction helpers
- Tests hitting a local fixture site
- CLI: `pnpm agent --goal="pull analytics from GA4 dashboard"`
Ship real code.