ChatGPT Prompt for Agent Architectures (ReAct, Plan-Execute, Multi-agent)
Refactor an existing single-loop tool-calling agent for marketplace moderation into a CodeAct (code as action) architecture using Haystack agents. Focus: what to split, what to keep, what to evaluate.
You are reviewing a working but brittle marketplace moderation agent. It's a single-loop tool-calling agent (one LLM in a while-loop with N tools) and it's hitting a ceiling. Your job: refactor it to CodeAct (code as action) using Haystack agents without regressing what works.

**Model:** Claude Opus 4
**Runtime:** Python 3.11 + uv

## Part 1 — Honest baseline

Before touching the code, run the existing agent on 50 held-out marketplace moderation inputs and record:

- Success rate
- Avg steps to completion
- Cost per run
- The 10 most common failure modes (categorized)

You will grade the refactor against this baseline. **If the refactor is worse on cost or latency without clearly better success, you roll back.**

## Part 2 — Decide what to split

Not every problem needs multi-agent. For marketplace moderation, decide:

- Is the failure mode "tool confusion" (too many tools → router)?
- Is it "shallow reasoning" (→ Plan-and-Execute or Tree-of-Thoughts)?
- Is it "confidently wrong" (→ Reflexion / critic)?
- Is it "context bloat" (→ worker agents with scoped context)?

Match the pathology to CodeAct (code as action). If CodeAct doesn't match the actual pathology, refuse the refactor and propose the architecture that does.

## Part 3 — Migration plan

Write a phased plan:

1. **Phase 0:** freeze the old agent, lock baseline metrics
2. **Phase 1:** extract shared state + tool layer (no behavior change)
3. **Phase 2:** introduce the new CodeAct scaffolding alongside the old, behind a feature flag
4. **Phase 3:** dual-run on a shadow traffic slice, compare
5. **Phase 4:** promote if it wins on success AND is no worse on cost/latency
6. **Phase 5:** delete the old path

## Part 4 — Refactor in Haystack agents

Write the new code.
Show the diff-style structure:

- What moved from a single system prompt into specialized agent prompts
- How the old tool list was partitioned by role
- How state replaces previous ad-hoc scratchpads
- Where the new control edges are in the graph

## Part 5 — Compatibility surface

The agent is called from somewhere. Preserve its input/output contract so callers don't break. Document:

- Input schema (unchanged)
- Output schema (unchanged)
- New trace structure (may change — callers that parse the logs care)

## Part 6 — New failure modes

CodeAct (code as action) introduces new failure modes the single-loop agent didn't have (e.g. agents getting stuck in a loop handing off to each other). List them and add guards.

## Part 7 — Eval

Re-run the 50 held-out inputs. Compare head-to-head:

- Success rate
- Cost
- Latency p50/p95
- New qualitative failure modes

Decision rule: ship only if success improves by ≥10% OR cost drops by ≥20% with equal success.

## Part 8 — Rollback

- How to flip the flag back
- Data migration concerns (trace format, state serialization)
- Communication plan

Produce the new agent code, the migration plan, the eval report template, and the rollback runbook.
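One concrete guard for the Part 6 handoff-loop failure mode could be a small budget tracker attached to the supervisor. This is a minimal stdlib sketch, not a Haystack API; the `HandoffGuard` name and the budget numbers are illustrative assumptions:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class HandoffGuard:
    """Aborts a multi-agent run when agents bounce work between each other.

    Two illustrative budgets: a hard cap on total handoffs per run, and a cap
    on how often the same (sender -> receiver) edge repeats — a repeated edge
    is the signature of two agents stuck handing off to each other.
    """
    max_handoffs: int = 12
    max_repeats_per_edge: int = 3
    _edges: Counter = field(default_factory=Counter)
    _total: int = 0

    def record(self, sender: str, receiver: str) -> None:
        self._total += 1
        self._edges[(sender, receiver)] += 1
        if self._total > self.max_handoffs:
            raise RuntimeError(f"handoff budget exhausted ({self._total} handoffs)")
        if self._edges[(sender, receiver)] > self.max_repeats_per_edge:
            raise RuntimeError(f"handoff loop detected: {sender} -> {receiver}")

guard = HandoffGuard()
guard.record("planner", "executor")   # within budget
guard.record("executor", "planner")   # within budget
```

The supervisor would call `guard.record(...)` on every handoff and translate the `RuntimeError` into a terminal "escalate to human" state rather than crashing the run.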
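The Part 7 decision rule is mechanical enough to encode directly in the eval harness so promotion isn't a judgment call. A sketch, assuming metrics are plain dicts; note the "≥10%" is read here as 10 percentage points of success, which is an interpretation you should confirm:

```python
def should_ship(baseline: dict, candidate: dict) -> bool:
    """Part 7 decision rule: ship only if success improves by >= 10 points,
    OR cost drops by >= 20% with equal-or-better success.

    Expected shape (illustrative): {"success_rate": 0.72, "cost_per_run": 0.031}
    """
    success_gain = candidate["success_rate"] - baseline["success_rate"]
    cost_drop = (baseline["cost_per_run"] - candidate["cost_per_run"]) / baseline["cost_per_run"]
    if success_gain >= 0.10:          # clear success win
        return True
    if cost_drop >= 0.20 and success_gain >= 0:  # cheaper at equal success
        return True
    return False
```

Wiring this into Phase 4 keeps the promote/rollback decision reproducible from the dual-run numbers alone.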
More prompts for Agent Architectures (ReAct, Plan-Execute, Multi-agent).
End-to-end CodeAct (code as action) agent implemented in Vercel AI SDK for SEO keyword research. Includes graph/state design, tool wiring, loop termination, observability via Braintrust, and evals.
Multi-agent loop-until-done with critic system in AutoGen tackling SQL report writing in an e-commerce context. Roles, handoffs, shared state, and supervisor logic.
Multi-agent loop-until-done with critic system in Inngest agent-kit tackling onboarding coordinator in a HR context. Roles, handoffs, shared state, and supervisor logic.
Agent loop that critiques and revises its own output for customer support triage. Full trace capture via LangSmith, retry budget, and ship criteria.
Agent loop that critiques and revises its own output for incident postmortem drafting. Full trace capture via OpenTelemetry + Honeycomb, retry budget, and ship criteria.
Agent loop that critiques and revises its own output for content calendar planning. Full trace capture via Weights & Biases Weave, retry budget, and ship criteria.