Claude Prompt for Agent Architectures (ReAct, Plan-Execute, Multi-agent)
End-to-end agent-debate system implemented in Inngest agent-kit for competitive analysis. Includes graph/state design, tool wiring, loop termination, observability via LangSmith, and evals.
You are a senior agent engineer. Design and ship a production-grade agent-debate system using Inngest agent-kit for the task: **competitive analysis**.
**Model:** Claude Opus 4
**Runtime:** TypeScript + Bun
**Observability:** LangSmith
**Primary eval metric:** LLM-as-judge score
## Part 1 — Why agent debate for this task
Briefly justify agent debate over two alternative patterns for competitive analysis. Be honest about where agent debate breaks down for this task and how you'll mitigate those failure modes.
## Part 2 — State / graph design
Draw (in text or Mermaid) the agent topology:
- Nodes: each agent, each tool, each human-in-the-loop gate
- Edges: the allowed transitions
- State object: exact shape of the state that flows through the graph
- Termination conditions: when does the loop stop? (success, max steps, budget, user cancel)
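As a shape reference for the state object and termination conditions above, here is a minimal sketch. All field and function names are illustrative, not Inngest agent-kit APIs; your answer should adapt them to the framework's idioms.

```typescript
// Illustrative state that flows through the debate graph (not a framework type).
interface DebateState {
  topic: string;                                   // the competitive-analysis question
  transcript: { agent: string; content: string }[]; // debate turns so far
  step: number;                                    // total steps taken
  costUsd: number;                                 // accumulated token spend
  startedAt: number;                               // epoch ms, for the wall-clock budget
  verdict?: string;                                // set by the judge agent on success
  cancelled?: boolean;                             // set by a user-cancel signal
}

type StopReason = "success" | "max_steps" | "budget" | "timeout" | "cancelled" | null;

// Checked after every node transition; a non-null result ends the loop.
function shouldTerminate(
  s: DebateState,
  limits = { maxSteps: 30, maxCostUsd: 5, maxMs: 10 * 60_000 },
  now = Date.now(),
): StopReason {
  if (s.verdict) return "success";
  if (s.cancelled) return "cancelled";
  if (s.step >= limits.maxSteps) return "max_steps";
  if (s.costUsd >= limits.maxCostUsd) return "budget";
  if (now - s.startedAt >= limits.maxMs) return "timeout";
  return null;
}
```

The specific limit values are placeholders; the point is that every termination condition in the list maps to a field the graph actually carries.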
If LangGraph: define the `StateGraph`, the `TypedDict` state, each node function, conditional edges, and the compiled graph.
If CrewAI: define agents (role/goal/backstory), tasks (description/expected_output), and the crew process (sequential/hierarchical).
If AutoGen: define the agents, the group chat manager, and the speaker selection policy.
If OpenAI Agents SDK or Claude Agent SDK: the agent definitions, handoffs, and tool registrations.
If Mastra / Vercel AI SDK / Pydantic AI / Smolagents: framework-idiomatic setup.
Write the code. Real code, not a sketch.
## Part 3 — Tool surface
For competitive analysis, list the 5–10 tools the agent needs. For each:
- Name and one-line purpose
- Inputs (typed)
- Output shape
- Whether it's read-only or side-effectful (and if side-effectful, is there a confirmation gate?)
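One way to make the read-only vs. side-effectful distinction checkable in code, sketched with a hypothetical tool (this is a plain-TypeScript shape, not a real Inngest agent-kit registration):

```typescript
// Shape every tool entry must satisfy; names here are hypothetical examples.
interface ToolSpec<In, Out> {
  name: string;
  purpose: string;
  sideEffectful: boolean;
  requiresConfirmation: boolean; // only meaningful when sideEffectful is true
  run: (input: In) => Out;
}

// Read-only example: fetch a competitor's public pricing page (stubbed here).
const fetchPricingPage: ToolSpec<{ competitor: string }, { markdown: string }> = {
  name: "fetch_pricing_page",
  purpose: "Retrieve a competitor's pricing page as markdown",
  sideEffectful: false,
  requiresConfirmation: false,
  run: ({ competitor }) => ({ markdown: `# ${competitor} pricing (stub)` }),
};

// Central gate: side-effectful tools cannot run without explicit confirmation.
function invokeTool<In, Out>(tool: ToolSpec<In, Out>, input: In, confirmed = false): Out {
  if (tool.sideEffectful && tool.requiresConfirmation && !confirmed) {
    throw new Error(`tool ${tool.name} requires human confirmation`);
  }
  return tool.run(input);
}
```

Routing every call through one `invokeTool`-style gate means the confirmation policy lives in one place instead of in each tool.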
## Part 4 — Prompts
Write the system prompt for each agent in the topology. Prompts must include:
- Role + objective
- Tools available and when to use each
- Termination condition ("stop when you can answer X, DO NOT keep exploring")
- Output format the caller expects
For agent debate specifically, include the pattern-specific instructions, e.g. "Rebut the opposing agent's strongest claim point by point before adding new arguments; concede when the evidence is against you." (Compare the equivalents for other patterns — ReAct: "Think step-by-step. Output Thought → Action → Observation blocks." Reflexion: "After each attempt, write a Reflection paragraph analyzing what went wrong.")
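A small helper that assembles a system prompt covering the four required sections might look like this (pure string templating; the field names are illustrative assumptions, not a framework API):

```typescript
interface AgentPromptSpec {
  role: string;
  objective: string;
  tools: { name: string; whenToUse: string }[];
  terminationCondition: string;
  outputFormat: string;
  patternInstructions?: string; // e.g. the debate-specific rebuttal rules
}

// Joins the required sections so no agent prompt silently omits one.
function buildSystemPrompt(spec: AgentPromptSpec): string {
  const toolLines = spec.tools
    .map((t) => `- ${t.name}: ${t.whenToUse}`)
    .join("\n");
  return [
    `You are ${spec.role}. Objective: ${spec.objective}`,
    `Tools available:\n${toolLines}`,
    `Termination: ${spec.terminationCondition}`,
    `Output format: ${spec.outputFormat}`,
    spec.patternInstructions ?? "",
  ]
    .filter(Boolean)
    .join("\n\n");
}
```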
## Part 5 — Loop control
- Max total steps across the whole run
- Max cost (tokens × price)
- Max wall clock
- How each limit is enforced (check in the loop vs. middleware)
- What the agent returns when a limit is hit (partial result + reason)
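The limits above can be enforced middleware-style: every step runs through one wrapper, so no individual agent re-implements budget checks, and a limit hit always yields a partial result plus a reason. A minimal sketch (all names and limit values are illustrative):

```typescript
interface RunResult {
  status: "complete" | "limit_hit";
  reason?: "max_steps" | "max_cost" | "max_wall_clock";
  output: string; // final answer, or best partial answer so far
  stepsUsed: number;
  costUsd: number;
}

// Wraps the agent loop: `step` performs one iteration and reports its cost
// and its best partial answer; the wrapper enforces all three budgets.
function runWithBudget(
  step: (i: number) => { done: boolean; partial: string; costUsd: number },
  limits = { maxSteps: 25, maxCostUsd: 3, maxMs: 5 * 60_000 },
): RunResult {
  const start = Date.now();
  let cost = 0;
  let partial = "";
  for (let i = 0; i < limits.maxSteps; i++) {
    const r = step(i);
    cost += r.costUsd;
    partial = r.partial;
    if (r.done)
      return { status: "complete", output: partial, stepsUsed: i + 1, costUsd: cost };
    if (cost >= limits.maxCostUsd)
      return { status: "limit_hit", reason: "max_cost", output: partial, stepsUsed: i + 1, costUsd: cost };
    if (Date.now() - start >= limits.maxMs)
      return { status: "limit_hit", reason: "max_wall_clock", output: partial, stepsUsed: i + 1, costUsd: cost };
  }
  return { status: "limit_hit", reason: "max_steps", output: partial, stepsUsed: limits.maxSteps, costUsd: cost };
}
```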
## Part 6 — Observability with LangSmith
- What to log per step (inputs, outputs, tool calls, tokens, latency)
- Trace ID propagation across agent handoffs
- Dashboards: success rate, avg steps, cost per run, error categories
- Alerting thresholds
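On trace ID propagation: the LangSmith SDK manages run trees itself, but the invariant to preserve across handoffs can be shown with a plain sketch (field names here are illustrative, not the LangSmith schema): the trace ID stays constant for the whole run, each step gets a fresh span, and each handoff points its parent at the previous span.

```typescript
import { randomUUID } from "node:crypto";

// Minimal per-step record; the tracer reconstructs the run as one tree
// from traceId + parentSpanId links.
interface StepLog {
  traceId: string;        // identical for every step in one run
  spanId: string;         // unique per step
  parentSpanId?: string;  // links handoffs into a single trace tree
  agent: string;
  toolCalls: string[];
  tokens: number;
  latencyMs: number;
}

function rootStep(agent: string): StepLog {
  return { traceId: randomUUID(), spanId: randomUUID(), agent, toolCalls: [], tokens: 0, latencyMs: 0 };
}

// Handoff keeps the traceId, opens a new span, and records the parent link.
function handoff(from: StepLog, toAgent: string): StepLog {
  return {
    traceId: from.traceId,
    spanId: randomUUID(),
    parentSpanId: from.spanId,
    agent: toAgent,
    toolCalls: [],
    tokens: 0,
    latencyMs: 0,
  };
}
```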
## Part 7 — Evals
Create an eval harness:
- 20 held-out competitive analysis inputs covering easy / medium / hard / adversarial
- Grading: mix of programmatic checks and LLM-as-judge (show the judge rubric)
- Report format: per-case pass/fail + aggregate on LLM-as-judge score
- CI integration: fail the build if LLM-as-judge score drops >5% vs. last green
## Part 8 — Failure modes and guardrails
List the 5 most likely ways this agent will go wrong in production, and the guard for each. Examples to consider: infinite loops, tool abuse, PII leakage, cost blowup, confidently wrong answers, stuck on a dead tool.
## Part 9 — Deployment
- Where it runs
- How a task is triggered (cron, webhook, user-invoked, slash command)
- How results are delivered (Slack, email, PR, DB write)
Ship real code. A reviewer should be able to clone, configure env vars, and run the agent end-to-end.