ChatGPT Prompt for Memory & Tool Use
Scalable tool-selection pattern for agents that outgrow a 10-tool context window. Strategy: tool allowlist by user role. Framework: Mastra. Covers retrieval, routing, and eval.
More prompts for Memory & Tool Use.
Entity-centric memory model for a legal agent performing customer support triage. Tracks people, orgs, docs, and relationships with episodic memory patterns over Milvus.
Implement a knowledge-graph memory system for an AutoGen agent handling code PR review. Vector store: Supabase pgvector. Covers write, retrieve, prune, and eval.
Implement a knowledge-graph memory system for a LangGraph agent handling meeting note extraction. Vector store: Chroma. Covers write, retrieve, prune, and eval.
Entity-centric memory model for a sales ops agent performing onboarding coordination. Tracks people, orgs, docs, and relationships with knowledge-graph memory patterns over pgvector.
Managed context window for long-running agents doing contract redlining. Covers rolling summarization, reference-and-expand, budget allocation, and eval of context loss.
Managed context window for long-running agents doing investor update drafting. Covers rolling summarization, reference-and-expand, budget allocation, and eval of context loss.
You are scaling an agent from 8 tools to 120+ tools. At that scale, dumping all tool definitions into the system prompt breaks down: context bloat, model confusion, and a climbing tool-picking error rate. Build a tool-selection layer for Mastra based on a tool allowlist by user role.

**Model:** GPT-4o
**Runtime:** TypeScript + Bun

## Part 1 — The scaling problem

Quantify the baseline:

- Avg tool-definition tokens in the prompt today
- Tool-call accuracy at 8 tools vs. projected at 120
- Where picking errors come from (similar names, overlapping descriptions, tool sprawl)

## Part 2 — Strategy: tool allowlist by user role

Describe how a tool allowlist by user role works in concrete terms for this agent:

- What the pre-filter / router sees
- How it returns a candidate tool subset
- What the main agent then sees
- Fallback behavior when the pre-filter misses

## Part 3 — Tool catalog

Design a tool registry:

- Per-tool metadata: name, short description, long description, tags, cost tier, destructive flag, example queries
- Per-tool embedding (for embedding-retrieval strategies)
- Index structure (inverted index for regex/tag, vector index for embedding)

Write the data model and populate it with 5 realistic example tools for reference.

## Part 4 — Implementation

For the tool allowlist by user role:

- If **LLM-driven routing**: router prompt, structured output of tool IDs, confidence scores
- If **embedding-based**: embed the query, take top-K from the tool vector index, threshold filtering
- If **regex / keyword**: rule file, ordering semantics, ambiguity resolution
- If **allowlist by role**: RBAC model, role→tool map
- If **cost-aware routing**: price tiers, cheap-first policy, escalation
- If **human-in-the-loop**: confirmation UI spec for destructive ops
- If **cache-hit fallback**: cache key design, staleness rules
- If **MCP capability discovery**: dynamic `tools/list` per session

Write the full code in TypeScript + Bun + Mastra.

## Part 5 — Output summarization

Tools often return verbose JSON.
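For instance, a template summarizer can collapse a bulky payload into a one-line view while keeping a pointer back to the full result. This is a minimal sketch: the `summarizeTickets` helper and the ticket shape are invented for illustration, not part of Mastra.

```typescript
interface TicketSummary {
  full_result_id: string; // pointer the agent can use to fetch the raw payload
  summary: string;        // compact, model-friendly view
}

// Template-based summarizer: keep counts and a couple of titles, drop the bulk.
function summarizeTickets(
  resultId: string,
  raw: { tickets: { id: string; title: string; status: string }[] }
): TicketSummary {
  const open = raw.tickets.filter((t) => t.status === "open");
  const top = open.slice(0, 2).map((t) => `${t.id} "${t.title}"`).join(", ");
  return {
    full_result_id: resultId,
    summary: `${raw.tickets.length} tickets, ${open.length} open. Top open: ${top}`,
  };
}
```

The raw JSON is stashed under `full_result_id` (in a cache or blob store) so the agent pays the token cost only when it actually drills in.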
Large results break context. Add:

- A per-tool result schema
- A summarizer (LLM or template) that converts a raw result into a model-friendly summary
- A `full_result_id` pointer so the agent can fetch the full payload if it actually needs it

Design the pattern and show a before/after for 2 of your example tools.

## Part 6 — Metrics

Track per turn:

- Candidate set size (how aggressive is the filter?)
- Picked-tool-in-candidate-set rate (does the filter include the right one?)
- Picked-tool-is-correct rate (does the model pick well from the candidates?)
- Latency overhead of the selection step

A/B gate: only ship the tool allowlist by user role if tool-call accuracy improves AND total latency doesn't regress more than 15%.

## Part 7 — Failure modes

- Router returns empty → fall back to the full list
- Router returns the wrong subset → escalation path (retry with a larger K, or the full list)
- Embedding drift when tool descriptions change → re-embed on deploy
- Cache staleness for the cache-hit fallback

## Part 8 — Evaluation harness

Build 50 (query → correct tool) pairs covering common, edge, and adversarial cases. Measure:

- Top-1 accuracy
- Top-5 recall
- Latency p50/p95

Gate in CI.

## Part 9 — Rollout

Shadow mode first: run the tool allowlist by user role in parallel with the old full-list path, log divergences, and only promote once divergences are acceptable.

Produce: the tool registry schema, the selection layer code, the summarizer, and the eval harness.
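The role-based selection layer the prompt asks for can be sketched as follows. This is a framework-agnostic sketch, not Mastra's API; the role names, tool IDs, and `selectTools` helper are invented for illustration.

```typescript
// RBAC model: each role maps to the tool IDs it may see.
type Role = "support" | "analyst" | "admin";

interface ToolDef {
  id: string;
  description: string;
  destructive?: boolean; // destructive ops should also require confirmation (Part 4)
}

const roleToolMap: Record<Role, string[]> = {
  support: ["search_tickets", "reply_ticket"],
  analyst: ["search_tickets", "run_report"],
  admin: ["search_tickets", "reply_ticket", "run_report", "delete_ticket"],
};

// Pre-filter: the main agent is only ever shown the caller's allowed subset.
function selectTools(role: Role, registry: ToolDef[]): ToolDef[] {
  const allowed = new Set(roleToolMap[role] ?? []);
  const subset = registry.filter((t) => allowed.has(t.id));
  // Fallback when the allowlist matches nothing: expose only non-destructive tools.
  return subset.length > 0 ? subset : registry.filter((t) => !t.destructive);
}
```

A real registry entry would carry the Part 3 metadata (tags, cost tier, example queries, embeddings); only `id`, `description`, and the destructive flag are shown here to keep the filter logic visible.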