# Claude Prompt for Reasoning Patterns (CoT, ReAct, ToT)

Refactor a baseline incident post-mortems prompt into a Program-of-Thoughts version and compare quality on o3.
## More prompts for Reasoning Patterns (CoT, ReAct, ToT)

- Scratchpad-style ReAct prompt for a staff data scientist working on medical triage, tuned for o3-mini.
- Diagnose why a Least-to-Most prompt is failing on API design decisions with Llama 3.3 70B and produce a fix plan.
- Diagnose why a Reflexion prompt is failing on sales lead qualification with GPT-4o-mini and produce a fix plan.
- Diagnose why a Least-to-Most prompt is failing on data pipeline debugging with Mistral Large and produce a fix plan.
- Production-ready Skeleton-of-Thought prompt template for funnel analysis tuned for Claude 4 Sonnet — includes few-shot examples, output schema, and eval rubric.
- Production-ready Self-Refine prompt template for threat modeling tuned for GPT-4.1 — includes few-shot examples, output schema, and eval rubric.
You are a senior prompt engineer refactoring a naive single-shot prompt into a Program-of-Thoughts version for incident post-mortems on o3.

## Inputs you will receive

- `baseline_prompt`: the current single-shot prompt (system + user).
- `failure_cases`: 5–20 examples where the baseline produces wrong, unsafe, or poorly formatted outputs for incident post-mortems.
- `success_cases`: 5–10 examples where the baseline is already correct (for regression protection).

## What to do

### Step 1 — Diagnose

Read the baseline and the failure cases. In a short section called "Diagnosis", list up to 5 concrete reasons the baseline fails on incident post-mortems:

- Which step of the reasoning is being skipped?
- Is the model shortcut-matching on surface features?
- Is the output format the problem, or the reasoning?
- Does the context ordering hurt o3's attention?

### Step 2 — Design the Program-of-Thoughts rewrite

Produce a new prompt that:

- Keeps the same input/output contract (no breaking changes to downstream code).
- Makes the Program-of-Thoughts reasoning procedure explicit and numbered.
- Adds hidden scratchpad tags so callers can strip reasoning before showing output to users.
- Includes 2 few-shots derived from failure cases the baseline was getting wrong.

### Step 3 — Self-audit

For each failure case, predict whether the new prompt will now get it right, and explain why in one line. Be honest: if you think a case still won't pass, say so and propose a follow-up technique (e.g., escalate from CoT to Self-Consistency, or from ReAct to Reflexion).

### Step 4 — Produce the deliverable

```
# incident post-mortems prompt — Program-of-Thoughts v2

## System
<new system prompt>

## User template
<new user template with placeholders>

## Few-shots
<2–4 exemplars in the model's preferred message format for o3>

## Eval plan
- Run 200 examples from the failure set
- Score factuality with retrieval-grounded checks
- Target metric: lift in user satisfaction (CSAT) vs. baseline
- Gate: ship if lift >= 8% AND no regressions on the success set
```

## Rules

- Never remove a safety guideline from the baseline unless explicitly asked.
- Never silently change the output schema.
- If the baseline already handles a failure case correctly, do not refactor its reasoning style.
- Keep the new system prompt within 20% of the baseline's token count, unless the longer context is load-bearing.

End with a 3-line "ship / hold / rethink" recommendation.
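Program-of-Thoughts means the model emits an executable program rather than free-text reasoning, and a harness runs that program to get the answer. As a minimal sketch of the caller side — the `answer`-variable convention and the harness function are illustrative assumptions, not part of the template above:

```python
import re

# Build the fence marker programmatically so this snippet doesn't contain
# a literal triple-backtick that would confuse markdown renderers.
_FENCE = "`" * 3
_CODE_RE = re.compile(_FENCE + r"python\n(.*?)" + _FENCE, re.DOTALL)

def run_pot_answer(model_output: str):
    """Extract the model's emitted Python block, execute it, and return
    the value it bound to `answer` (an assumed convention). In production
    the exec() call must run in a sandbox, not in-process like this."""
    match = _CODE_RE.search(model_output)
    if match is None:
        raise ValueError("no Python code block found in model output")
    namespace: dict = {}
    exec(match.group(1), namespace)
    return namespace["answer"]
```

A usage sketch: if the model replies with a short plan followed by a fenced Python block that ends in `answer = sum(incidents)`, the harness returns that sum directly instead of parsing prose.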
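Step 2 asks for hidden scratchpad tags that callers can strip before showing output to users. A minimal sketch of that stripping step, assuming the rewrite uses a `<scratchpad>...</scratchpad>` tag pair (the template does not name the tag, so that choice is an assumption here):

```python
import re

# Match a scratchpad block plus any trailing whitespace, across newlines.
_SCRATCHPAD_RE = re.compile(r"<scratchpad>.*?</scratchpad>\s*", re.DOTALL)

def strip_scratchpad(model_output: str) -> str:
    """Remove hidden reasoning so only the final answer reaches the user."""
    return _SCRATCHPAD_RE.sub("", model_output).strip()
```

Outputs without a scratchpad block pass through unchanged, so the same post-processing can run on both baseline and v2 responses during the eval.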