AI Prompt for Agent Architectures (ReAct, Plan-Execute, Multi-agent)
An end-to-end agent-debate system implemented with LlamaIndex agents for email triage and drafting. Includes graph/state design, tool wiring, loop termination, observability via Langfuse, and evals.
You are a senior agent engineer. Design and ship a production agent-debate system using LlamaIndex agents for the task: **email triage and drafting**.
**Model:** o1
**Runtime:** TypeScript + Bun
**Observability:** Langfuse
**Primary eval metric:** tool call precision
## Part 1 — Why agent debate for this task
Briefly justify agent debate over two alternative architectures for email triage and drafting. Be honest about where agent debate breaks down for this task and how you'll mitigate it.
## Part 2 — State / graph design
Draw (in text or Mermaid) the agent topology:
- Nodes: each agent, each tool, each human-in-the-loop gate
- Edges: the allowed transitions
- State object: exact shape of the state that flows through the graph
- Termination conditions: when does the loop stop? (success, max steps, budget, user cancel)
If LangGraph: define the `StateGraph`, the `TypedDict` state, each node function, conditional edges, and the compiled graph.
If CrewAI: define agents (role/goal/backstory), tasks (description/expected_output), and the crew process (sequential/hierarchical).
If AutoGen: define the agents, the group chat manager, and the speaker selection policy.
If OpenAI Agents SDK or Claude Agent SDK: the agent definitions, handoffs, and tool registrations.
If Mastra / Vercel AI SDK / Pydantic AI / Smolagents / LlamaIndex agents: framework-idiomatic setup.
Write the code. Real code, not a sketch.
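For reference, a minimal sketch of the state shape and termination check the answer should flesh out. Names like `DebateState` and `shouldStop` are illustrative, not LlamaIndex APIs; the real implementation should use the framework's own workflow/state primitives.

```typescript
// Illustrative only: a framework-agnostic state shape for the debate loop.
interface EmailInput {
  id: string;
  from: string;
  subject: string;
  body: string;
}

interface DebateTurn {
  agent: "triager" | "drafter" | "critic";
  content: string;
  toolCalls: { name: string; args: unknown }[];
}

interface DebateState {
  email: EmailInput;
  turns: DebateTurn[];
  draft: string | null;
  critiqueScore: number | null; // critic's 0-1 approval of the current draft
  steps: number;
  costUsd: number;
  startedAt: number; // epoch ms
  status: "running" | "success" | "max_steps" | "budget" | "cancelled";
}

// Termination: stop on critic approval, or when any budget is exhausted.
function shouldStop(
  s: DebateState,
  limits = { maxSteps: 12, maxCostUsd: 0.5, maxMs: 120_000 },
): Exclude<DebateState["status"], "running"> | null {
  if (s.critiqueScore !== null && s.critiqueScore >= 0.9 && s.draft) return "success";
  if (s.steps >= limits.maxSteps) return "max_steps";
  if (s.costUsd >= limits.maxCostUsd) return "budget";
  if (Date.now() - s.startedAt >= limits.maxMs) return "budget";
  return null;
}
```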
## Part 3 — Tool surface
For email triage and drafting, list the 5–10 tools the agent needs. For each:
- Name and one-line purpose
- Inputs (typed)
- Output shape
- Whether it's read-only or side-effectful (and if side-effectful, is there a confirmation gate?)
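One way to make the read-only/side-effectful split explicit in TypeScript (a sketch; `ToolSpec` and `send_draft` are illustrative, not SDK types):

```typescript
// Illustrative tool descriptor: types the read-only vs. side-effectful split
// and forces a confirmation gate on anything that mutates the outside world.
type ToolEffect =
  | { kind: "read_only" }
  | { kind: "side_effect"; confirmationGate: "human" | "policy" };

interface ToolSpec<In, Out> {
  name: string;
  purpose: string;
  effect: ToolEffect;
  run: (input: In) => Promise<Out>;
}

// Hypothetical example: sending a drafted reply must pass a human gate.
const sendDraft: ToolSpec<{ emailId: string; body: string }, { messageId: string }> = {
  name: "send_draft",
  purpose: "Send an approved reply to the original thread",
  effect: { kind: "side_effect", confirmationGate: "human" },
  run: async ({ emailId, body }) => {
    // ...call the mail provider with `body` here...
    return { messageId: `sent-${emailId}` };
  },
};
```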
## Part 4 — Prompts
Write the system prompt for each agent in the topology. Prompts must include:
- Role + objective
- Tools available and when to use each
- Termination condition ("stop when you can answer X, DO NOT keep exploring")
- Output format the caller expects
For agent debate specifically, include the pattern-specific instructions (e.g. for ReAct: "Think step-by-step. Output Thought → Action → Observation blocks." For Reflexion: "After each attempt, write a Reflection paragraph analyzing what went wrong.").
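For example, a skeleton critic-agent prompt for the debate topology (wording is illustrative and should be tuned):

```typescript
// Illustrative critic prompt for the debate loop; adjust per Part 4.
const CRITIC_SYSTEM_PROMPT = `
You are the Critic in a two-agent debate over an email reply draft.
Objective: find concrete defects (wrong recipient, missed question, tone,
factual claims with no source in the thread) and score the draft 0-1.

Tools: read_thread (fetch the full email thread), lookup_contact (verify
names/roles). Use read_thread before your first critique; use lookup_contact
only when a name or role is disputed.

Termination: stop when you can either approve the draft (score >= 0.9) or
list at least one specific, actionable defect. DO NOT keep exploring.

Output format (JSON): { "score": number, "defects": string[], "approve": boolean }
`.trim();
```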
## Part 5 — Loop control
- Max total steps across the whole run
- Max cost (tokens × price)
- Max wall clock
- How each limit is enforced (check in the loop vs. middleware)
- What the agent returns when a limit is hit (partial result + reason)
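A minimal sketch of enforcing all three limits with an in-loop check (limit values are placeholders):

```typescript
// Illustrative budget enforcement checked at the top of every iteration.
interface Budget { maxSteps: number; maxCostUsd: number; maxWallMs: number }

interface LimitHit { partial: unknown; reason: string }

async function runWithBudget(
  step: (n: number) => Promise<{ done: boolean; result: unknown; costUsd: number }>,
  budget: Budget,
): Promise<unknown | LimitHit> {
  const startedAt = Date.now();
  let spentUsd = 0;
  let last: unknown = null;
  for (let n = 0; n < budget.maxSteps; n++) {
    if (spentUsd >= budget.maxCostUsd)
      return { partial: last, reason: `cost limit $${budget.maxCostUsd} reached` };
    if (Date.now() - startedAt >= budget.maxWallMs)
      return { partial: last, reason: "wall-clock limit reached" };
    const { done, result, costUsd } = await step(n);
    spentUsd += costUsd;
    last = result;
    if (done) return result; // success: the agent finished within budget
  }
  return { partial: last, reason: `max steps ${budget.maxSteps} reached` };
}
```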
## Part 6 — Observability with Langfuse
- What to log per step (inputs, outputs, tool calls, tokens, latency)
- Trace ID propagation across agent handoffs
- Dashboards: success rate, avg steps, cost per run, error categories
- Alerting thresholds
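A sketch of per-step span logging with the classic `langfuse` JS SDK (API shape assumed from the v2-style client; verify against the installed version):

```typescript
import { Langfuse } from "langfuse"; // classic v2-style SDK; reads LANGFUSE_* env vars

const langfuse = new Langfuse();

// One trace per run; pass `trace` (or its id) through every agent handoff so
// all steps land under the same trace.
export function startRunTrace(emailId: string) {
  return langfuse.trace({ name: "email-triage-debate", metadata: { emailId } });
}

export async function loggedStep<T>(
  trace: ReturnType<typeof startRunTrace>,
  agent: string,
  input: unknown,
  fn: () => Promise<T>,
): Promise<T> {
  const span = trace.span({ name: `${agent}-step`, input });
  const t0 = Date.now();
  try {
    const output = await fn();
    span.end({ output, metadata: { latencyMs: Date.now() - t0 } });
    return output;
  } catch (err) {
    span.end({ level: "ERROR", statusMessage: String(err) });
    throw err;
  }
}

// Call `await langfuse.shutdownAsync()` before the Bun process exits to flush.
```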
## Part 7 — Evals
Create an eval harness:
- 20 held-out email triage and drafting inputs covering easy / medium / hard / adversarial
- Grading: mix of programmatic checks and LLM-as-judge (show the judge rubric)
- Report format: per-case pass/fail + aggregate on tool call precision
- CI integration: fail the build if tool call precision drops >5% vs. last green
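A sketch of the tool-call-precision metric and the CI gate; `expectedTools` is a simplified stand-in for per-case expectations, and `lastGreen` for wherever the baseline is stored:

```typescript
// Tool call precision: of all tool calls the agent made, how many were
// correct (right tool for the case) per the case's expectations?
interface ToolCall { name: string; args: Record<string, unknown> }
interface EvalCase { id: string; expectedTools: string[]; actual: ToolCall[] }

function toolCallPrecision(cases: EvalCase[]): number {
  let correct = 0, total = 0;
  for (const c of cases) {
    for (const call of c.actual) {
      total++;
      if (c.expectedTools.includes(call.name)) correct++;
    }
  }
  return total === 0 ? 1 : correct / total;
}

// CI gate: fail the build on a >5% drop vs. the last green baseline.
function ciGate(current: number, lastGreen: number): void {
  if (current < lastGreen - 0.05) {
    console.error(`tool call precision ${current.toFixed(3)} regressed >5% vs ${lastGreen.toFixed(3)}`);
    process.exit(1);
  }
}
```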
## Part 8 — Failure modes and guardrails
List the 5 most likely ways this agent will go wrong in production, and the guard for each. Examples to consider: infinite loops, tool abuse, PII leakage, cost blowup, confidently wrong answers, stuck on a dead tool.
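As one concrete guard, a sketch of a dead-tool circuit breaker (threshold illustrative):

```typescript
// Illustrative circuit breaker: after N consecutive failures, a tool is
// removed from the agent's tool list for the rest of the run instead of
// letting the loop retry it forever.
class ToolBreaker {
  private failures = new Map<string, number>();
  constructor(private readonly maxConsecutive = 3) {}

  recordResult(tool: string, ok: boolean): void {
    this.failures.set(tool, ok ? 0 : (this.failures.get(tool) ?? 0) + 1);
  }

  isOpen(tool: string): boolean {
    return (this.failures.get(tool) ?? 0) >= this.maxConsecutive;
  }

  availableTools(all: string[]): string[] {
    return all.filter((t) => !this.isOpen(t));
  }
}
```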
## Part 9 — Deployment
- Where it runs
- How a task is triggered (cron, webhook, user-invoked, slash command)
- How results are delivered (Slack, email, PR, DB write)
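For the webhook-triggered option, a minimal Bun entrypoint sketch; `runAgent` is a hypothetical function wiring Parts 2–5 together:

```typescript
// Minimal webhook trigger using Bun's built-in HTTP server.
// POST /triage with an email payload kicks off one agent run.
declare function runAgent(email: unknown): Promise<unknown>; // hypothetical

Bun.serve({
  port: Number(process.env.PORT ?? 3000),
  async fetch(req) {
    const url = new URL(req.url);
    if (req.method === "POST" && url.pathname === "/triage") {
      const email = await req.json();
      const result = await runAgent(email);
      return Response.json(result);
    }
    return new Response("not found", { status: 404 });
  },
});
```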
Ship real code. A reviewer should be able to clone, configure env vars, and run the agent end-to-end.