# ChatGPT Prompt for Structured Output & Tool Schemas
Robust retry/repair loop that recovers from extra commentary before/after JSON in LLM output without looping or masking bugs.
You are building the recovery layer for a production extraction system. The primary extraction sometimes fails with extra commentary before/after JSON. Design a retry/repair loop that recovers gracefully without hiding real defects.
## Problem
Even with constrained decoding + Pydantic/Zod validation, extra commentary before/after the JSON still shows up in roughly 2% of production traffic. At current volume (500k req/day) that's ~10,000 incidents a day — each one is either a downstream error or a silent bad record.
## Design Principles
1. **Make failures loud, not quiet.** Every repair attempt is logged with before/after.
2. **Cap repair attempts at 2.** If we can't fix it in 2, fail closed and flag for human review.
3. **Repair must not mask real model defects.** If a given prompt fails consistently, we need an alert, not infinite retries.
4. **Temperature: 0 on both original and repair.** Non-determinism here adds noise.
5. **Repair prompt should be SHORTER than original, not longer.** Focused on fixing the specific error.
## Layered Recovery
### Layer 1: Cheap Post-Processing (no LLM call)
Before calling the LLM again, try deterministic fixes:
- Strip markdown code fences (```` ```json ... ``` ````) → parse inner
- Trim text before/after outermost braces via balanced-bracket matcher
- Normalize smart quotes to ASCII
- Remove trailing commas in JSON (common LLM artifact)
- Unescape double-escaped JSON strings (nested JSON-in-string problem)
- Convert single quotes to double quotes (risky — only when whole doc uses single)
Example fix for extra commentary before/after JSON: [write the deterministic fix code for the specific failure mode].
If Layer 1 yields valid output → return it with `repaired=true, repair_layer=1` in metadata.
### Layer 2: LLM Repair Prompt
If Layer 1 fails, call the LLM with a focused repair prompt:
````
Your previous response failed validation with this error:
{validation_error}

Your previous response was:
```
{bad_response}
```

The expected schema is:
```
{schema}
```

Fix ONLY the validation errors. Preserve all valid fields unchanged.
Output the corrected JSON object only. No explanation.
````
Keep it terse. Every extra instruction risks the model rewriting valid content.
Call with:
- temperature=0
- max_tokens: same budget as the original call
- Same model (do NOT switch to a bigger model for repair — hides the underlying problem)
If Layer 2 yields valid output → return with `repair_layer=2`.
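A minimal wrapper for the repair call might look like this; `llm_call` is a hypothetical client function standing in for whatever SDK you actually use, injected so the repair logic stays testable:

```python
REPAIR_TEMPLATE = """Your previous response failed validation with this error:
{validation_error}

Your previous response was:
{bad_response}

The expected schema is:
{schema}

Fix ONLY the validation errors. Preserve all valid fields unchanged.
Output the corrected JSON object only. No explanation."""


def layer2_repair(bad_response: str, validation_error: str, schema: str,
                  llm_call, max_tokens: int) -> str:
    """One focused repair call. llm_call(prompt, temperature, max_tokens)
    is a stand-in for your real client wrapper."""
    prompt = REPAIR_TEMPLATE.format(
        validation_error=validation_error,
        bad_response=bad_response,
        schema=schema,
    )
    # Same model, temperature=0, same token budget as the original call.
    return llm_call(prompt, temperature=0, max_tokens=max_tokens)
```

Because the prompt is a plain template, the error-class-specific variants from the classification section below are just alternative `validation_error` strings.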
### Layer 3: Partial Fallback
If Layer 2 fails, do NOT retry again. Instead:
1. Extract any fields that DID parse from the malformed output (best-effort partial parse)
2. For each required field missing, set to `null` and add to `missing_fields`
3. Set `extraction_status="partial"` in metadata
4. Enqueue the record for human review
5. Emit a high-severity log event to trigger alerting
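A sketch of the partial fallback, assuming a flat record; the regex salvage and the field names in the test are illustrative only, and a real version would reuse whatever Layer 1 already recovered:

```python
import json
import re
from dataclasses import dataclass


@dataclass
class PartialResult:
    record: dict
    missing_fields: list
    extraction_status: str = "partial"


def layer3_fallback(malformed: str, required_fields: list[str]) -> PartialResult:
    """Best-effort partial parse; never raises."""
    parsed: dict = {}
    try:
        parsed = json.loads(malformed)
    except json.JSONDecodeError:
        # Crude salvage: pull top-level "key": "value" string pairs.
        for m in re.finditer(r'"(\w+)"\s*:\s*"([^"]*)"', malformed):
            parsed[m.group(1)] = m.group(2)
    record = {f: parsed.get(f) for f in required_fields}  # missing -> None
    missing = [f for f in required_fields if parsed.get(f) is None]
    return PartialResult(record=record, missing_fields=missing)
```

The caller is responsible for steps 4–5 (human-review queue, high-severity log event); keeping them out of the fallback keeps it pure and unit-testable.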
## Prompts & Schemas
Produce:
### Original extraction prompt
Schema-aware, includes full schema + 2 few-shot examples.
### Layer 2 repair prompt
Minimal, focused, as above.
### Schema
Show the schema being validated against (JSON Schema or the Pydantic model). For this task, use the purchase orders schema.
## Error Classification
Classify the validation error before choosing a repair strategy:
- `parse_error` → invalid JSON → Layer 1 likely helps (fences, quotes)
- `missing_required_field` → Layer 2 repair usually recovers (LLM re-extracts)
- `type_mismatch` → Layer 2 usually recovers (e.g., "5" → 5)
- `enum_violation` → Layer 2 with explicit valid values list in the error
- `cross_field_invariant` → Layer 2 with invariant described
- `hallucinated_field` (additionalProperties violation) → Layer 2, telling model which field is disallowed
For each class, describe the layer 2 error-message template that teaches the model what to fix.
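A classifier along these lines can key off pydantic-v2-style error dicts (the `type`/`loc`/`msg` shape returned by `ValidationError.errors()`); the exact `type` strings vary by validator and version, so treat this mapping as an assumption to verify against your stack:

```python
def classify_error(parse_ok: bool, validation_errors: list[dict]) -> str:
    """Map a failure to one of the repair-strategy classes above.
    Error dicts mimic pydantic v2's ValidationError.errors() output."""
    if not parse_ok:
        return "parse_error"
    for err in validation_errors:
        t = err.get("type", "")
        if t == "missing":
            return "missing_required_field"
        if t.startswith(("int_", "float_", "string_", "bool_")):
            return "type_mismatch"  # e.g. int_parsing: "5" -> 5
        if t == "enum":
            return "enum_violation"
        if t == "extra_forbidden":
            return "hallucinated_field"
    # Anything else (e.g. model_validator failures) is treated as a
    # cross-field invariant violation.
    return "cross_field_invariant"
```

The first matching error wins, which biases repair toward the cheapest class; a refinement is to classify all errors and pick the most severe.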
## Monitoring
Emit traces to Humanloop:
- span: `extract` with attributes: `primary_success, repair_layer, error_class, total_latency_ms, retry_count`
- Alert: `repair_rate > 5%` for 15 min → page on-call
- Alert: `partial_rate > 1%` → slack team
- Dashboard: repair reasons broken down by doc_type, model, schema_version
Run weekly analysis: top-10 prompts that triggered repair — these are usually either a schema ambiguity or a class of inputs the prompt doesn't handle. Either fix the prompt/schema or add them to the fine-tune dataset.
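The weekly analysis can start as a small script over structured logs (one JSON event per line); the `prompt_id` and `repair_layer` field names are assumptions about your log schema:

```python
import json
from collections import Counter


def top_repair_prompts(log_lines: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Surface the prompts that most often needed repair this week."""
    counts: Counter = Counter()
    for line in log_lines:
        event = json.loads(line)
        if event.get("repair_layer"):  # 0/None means the primary call succeeded
            counts[event["prompt_id"]] += 1
    return counts.most_common(n)
```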
## Tests
Test the repair loop with synthetic malformed inputs covering every error class:
```python
def test_repair_strips_code_fences():
    bad = '```json\n{"a": 1}\n```'
    result = extract_with_repair(bad_input=bad)
    assert result.record.a == 1
    assert result.metadata.repair_layer == 1


def test_repair_recovers_missing_field():
    # LLM omits "date"; repair prompt adds it
    ...


def test_no_infinite_retry():
    # Force persistent failure, ensure max 2 attempts
    ...
```
## Red Flags (Don't Build These)
- `max_retries=5` — if the model can't get it in 2, it won't in 5
- `temperature=0.7` on repair — "let's hope we roll a good sample" is not engineering
- Silent catch-and-continue — you'll have a fleet of bad records you never notice
- Repair prompt that IS the original prompt — doesn't teach the model anything new
## Deliverables
1. `repair.py` / `.ts` with Layer 1, 2, 3 implementations
2. Error classifier with unit tests per class
3. Repair prompt templates (one per error class)
4. Humanloop dashboard spec
5. Alert rules
6. Weekly analysis script to surface recurring failure prompts
Throughout:
- Use precise technical terminology appropriate for the audience
- Include code examples, configurations, or specifications where relevant
- Document assumptions, prerequisites, and dependencies
- Provide error handling and edge case considerations