# AI Prompt for Agent Architectures (ReAct, Plan-Execute, Multi-agent)
Agent loop that critiques and revises its own output for recruiting resume screening. Full trace capture via Arize Phoenix, a retry budget, and ship criteria.
Build a self-correcting agent loop for recruiting resume screening using the Plan-and-Execute architecture. Every attempt is traced to Arize Phoenix, a critic scores it, and the loop retries until a ship threshold is met or the budget is exhausted.
**Model:** GPT-4o (actor), GPT-4o (critic — can be smaller)
**Runtime:** TypeScript + Node 20
**Framework:** OpenAI Agents SDK
## Part 1 — Architecture
Describe the loop:
1. **Actor** produces an attempt at the recruiting resume screen
2. **Critic** scores it against a rubric and produces feedback
3. **Reviser** (usually the actor, re-prompted) applies feedback
4. Loop until score ≥ threshold OR retries ≥ max
Draw it as an OpenAI Agents SDK graph.
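The loop itself can be sketched framework-agnostically before it is wired into the SDK graph (that wiring belongs in Part 9). A minimal sketch; all names, the threshold, and the retry count are illustrative:

```typescript
type Attempt = { output: string; score: number; blockers: string[] };

interface LoopDeps {
  actor: (input: string, feedback?: Attempt) => Promise<string>;
  critic: (output: string) => Promise<{ score: number; blockers: string[] }>;
}

async function runLoop(
  input: string,
  deps: LoopDeps,
  threshold = 4.0,
  maxRetries = 3,
): Promise<Attempt> {
  let best: Attempt | null = null;
  let prev: Attempt | undefined;
  for (let i = 0; i <= maxRetries; i++) {
    const output = await deps.actor(input, prev);          // cold or revision prompt
    const { score, blockers } = await deps.critic(output); // rubric-scored
    const attempt = { output, score, blockers };
    if (!best || attempt.score > best.score) best = attempt; // regression guard
    if (attempt.score >= threshold) return attempt;          // ship
    prev = attempt;
  }
  return best!; // budget exhausted: surface best attempt (see Part 7)
}
```

Keeping `best` separate from `prev` means an exhausted budget never ships a revision that scored worse than an earlier attempt.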
## Part 2 — Critic rubric
Write an explicit rubric for the recruiting resume screen task. The rubric must:
- Score on 3–5 dimensions, each 0–5
- Have concrete anchors ("5 = cites 3+ primary sources; 0 = no sources")
- Produce a structured JSON score, not free text
- Include a `blockers` field listing must-fix issues vs. nice-to-haves
Write the full critic prompt.
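The structured score is worth validating before it drives the loop. A possible shape, assuming the critic returns per-dimension 0–5 scores plus a `blockers` field; the dimension names below are placeholders for whatever your rubric defines:

```typescript
interface CriticScore {
  dimensions: Record<string, number>; // each 0–5 per the rubric anchors
  blockers: string[];                 // must-fix issues
  niceToHaves: string[];              // non-blocking suggestions
  overall: number;                    // here: mean of dimension scores
}

// Parses and range-checks the critic's JSON; throws on out-of-range
// scores so a malformed critique never silently ships an attempt.
function parseCriticScore(raw: string): CriticScore {
  const obj = JSON.parse(raw) as Omit<CriticScore, "overall">;
  const values = Object.values(obj.dimensions);
  for (const v of values) {
    if (typeof v !== "number" || v < 0 || v > 5) {
      throw new Error(`dimension score out of range: ${v}`);
    }
  }
  const overall = values.reduce((a, b) => a + b, 0) / values.length;
  return { ...obj, overall };
}
```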
## Part 3 — Actor prompts
Two variants:
- **Initial attempt prompt** (cold)
- **Revision prompt** (takes previous attempt + critic feedback, produces revised attempt)
The revision prompt must force the actor to address every blocker explicitly. Include a checklist it fills in.
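One way to guarantee blocker coverage is to generate the checklist mechanically from the critic's `blockers` field instead of trusting the model to enumerate them. A sketch; the prompt wording is illustrative and should be tuned:

```typescript
// Builds the revision prompt from the previous attempt and the critic's
// blockers. The pre-filled checklist forces the actor to state how each
// blocker was addressed.
function buildRevisionPrompt(previousAttempt: string, blockers: string[]): string {
  const checklist = blockers
    .map((b, i) => `- [ ] Blocker ${i + 1}: ${b} (how addressed:)`)
    .join("\n");
  return [
    "Revise the attempt below. You MUST resolve every blocker.",
    "Fill in the checklist, stating how each blocker was addressed.",
    "",
    "## Previous attempt",
    previousAttempt,
    "",
    "## Blocker checklist",
    checklist,
  ].join("\n");
}
```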
## Part 4 — Termination
- Score threshold to ship (justify the number based on eval data)
- Max retries (3? 5?) — cost vs. quality tradeoff
- Regression guard: if revision N scores lower than revision N-1, roll back to N-1
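The three rules above reduce to a pure decision function over the score history, which keeps the termination logic testable independently of any model calls. A sketch with illustrative names:

```typescript
type Verdict = "ship" | "retry" | "rollback-and-retry" | "escalate";

// scores[i] is the critic's overall score for attempt i (oldest first).
// threshold and maxRetries are the knobs this section asks you to justify.
function decide(scores: number[], threshold: number, maxRetries: number): Verdict {
  const latest = scores[scores.length - 1];
  if (latest >= threshold) return "ship";
  if (scores.length > maxRetries) return "escalate"; // budget spent: Part 7 handoff
  const previous = scores[scores.length - 2];
  if (previous !== undefined && latest < previous) return "rollback-and-retry";
  return "retry";
}
```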
## Part 5 — Arize Phoenix instrumentation
Each loop iteration emits a span with:
- Iteration number
- Actor output (full)
- Critic scores (per dimension)
- Critic feedback (full)
- Tokens + cost
- Wall time
Parent trace for the whole task with aggregate score trajectory. Dashboards: score-over-iteration curve, avg iterations to ship, % of tasks that hit max-retries without shipping.
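Phoenix ingests OpenTelemetry traces, so each iteration's span can carry the fields above as a flat attribute map. A sketch of what one iteration emits; the key names are illustrative, not an official semantic convention, and structured values are JSON-encoded because span attributes must be primitives:

```typescript
// One iteration's span attributes (flat, OTel-friendly).
function iterationSpanAttributes(args: {
  iteration: number;
  actorOutput: string;
  criticScores: Record<string, number>;
  criticFeedback: string;
  totalTokens: number;
  costUsd: number;
  wallTimeMs: number;
}): Record<string, string | number> {
  return {
    "loop.iteration": args.iteration,
    "actor.output": args.actorOutput,
    "critic.scores": JSON.stringify(args.criticScores), // per-dimension
    "critic.feedback": args.criticFeedback,
    "llm.total_tokens": args.totalTokens,
    "llm.cost_usd": args.costUsd,
    "loop.wall_time_ms": args.wallTimeMs,
  };
}
```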
## Part 6 — Failure modes
- **Sycophantic critic:** critic always scores high. Mitigate with adversarial examples, stricter rubric, critic ablation tests.
- **Critic-actor collusion:** when both are the same model, they develop shared blind spots. Mitigate by using a different critic model, or a human-authored rubric-enforced critic.
- **Oscillation:** revisions go back and forth between two states. Detect via output similarity across N iterations; break by forcing a different approach.
- **Cost blowup:** each retry costs a full task run. Log cumulative cost and alert once it exceeds 3× the single-shot baseline.
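The oscillation check can be a cheap lexical test before reaching for anything heavier. A sketch using Jaccard similarity over word sets; the 0.9 threshold is a starting guess, not a tuned value:

```typescript
// Word-set Jaccard similarity: 1.0 for identical texts, 0.0 for disjoint.
function jaccard(a: string, b: string): number {
  const sa = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const sb = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const inter = Array.from(sa).filter((w) => sb.has(w)).length;
  const union = new Set(Array.from(sa).concat(Array.from(sb))).size;
  return union === 0 ? 1 : inter / union;
}

// The loop is cycling if the newest revision is near-identical to any
// attempt before its immediate predecessor (the predecessor is expected
// to be similar, since the revision was built from it).
function isOscillating(history: string[], latest: string, threshold = 0.9): boolean {
  return history.slice(0, -1).some((prev) => jaccard(prev, latest) >= threshold);
}
```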
## Part 7 — Human-in-the-loop escape hatch
If the loop can't ship after max retries, hand off to a human with:
- The final attempt
- The critic's blockers
- The iteration history (so the human sees what was tried)
Define the exact handoff payload.
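One possible shape for that payload; all field names are illustrative, and attempts are ordered oldest-first so the reviewer can follow the score trajectory:

```typescript
interface HandoffPayload {
  taskId: string;
  finalAttempt: string;
  blockers: string[];         // the critic's remaining must-fixes
  iterations: {
    attempt: string;
    scores: Record<string, number>;
    feedback: string;
  }[];
  cumulativeCostUsd: number;
  traceUrl: string;           // deep link into the Phoenix trace
}
```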
## Part 8 — Evaluation
Compare the self-correcting loop to a single-shot baseline on 30 held-out recruiting resume screen inputs:
- Quality delta (does correction actually help?)
- Cost multiplier (is the quality worth it?)
- Time delta
Only ship the loop if the quality gain justifies the cost multiplier.
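The comparison reduces to two ratios over the held-out set. The decision rule below (relative quality gain must exceed relative cost growth) is one possible policy, not the only defensible one; tune it to your own economics:

```typescript
interface RunStats { meanScore: number; meanCostUsd: number }

function compareToBaseline(loop: RunStats, baseline: RunStats) {
  const qualityGain = loop.meanScore / baseline.meanScore - 1;    // e.g. +0.60
  const costMultiplier = loop.meanCostUsd / baseline.meanCostUsd; // e.g. 1.5x
  return {
    qualityGain,
    costMultiplier,
    worthShipping: qualityGain > costMultiplier - 1,
  };
}
```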
## Part 9 — Implementation
Write the code in OpenAI Agents SDK:
- Graph definition
- Actor + critic + reviser nodes
- State type
- Arize Phoenix hooks
- Entry point
Ship real, runnable code with the prompts inlined and the rubric committed to the repo.
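As a starting point for the state type, one minimal shape the actor, critic, and reviser nodes could share; field names are a sketch to adapt to the SDK's own state conventions:

```typescript
interface LoopState {
  taskInput: string;          // the resume + job description to screen
  attempts: {
    output: string;
    scores: Record<string, number>;
    feedback: string;
  }[];
  bestIndex: number;          // index of the highest-scoring attempt so far
  cumulativeCostUsd: number;  // for the cost-blowup alarm in Part 6
  status: "running" | "shipped" | "escalated";
}
```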