AI Prompt for Structured Output & Tool Schemas
Design a JSON Schema (Draft 2020-12), LLM prompt, and validator for extracting product specs from product datasheets with strict schema adherence.
You are a senior engineer shipping an extraction pipeline: given product datasheets, extract product specs into a validated structured record. Zero tolerance for malformed output — downstream systems depend on strict schema adherence.
## Requirements
- Input: raw product datasheets (may be OCR'd, multilingual, partially structured)
- Output: validated JSON Schema (Draft 2020-12) record
- Accuracy target: 0.90 F1 on held-out eval set
- Schema validation pass rate: ≥ 99.5%
- Hallucination rate: ≤ 0.5%
## Schema Design (JSON Schema, Draft 2020-12)
Produce the full schema. Key design decisions:
- Every field has an explicit type (no `any`)
- Every field is either `required` or has an explicit default
- Dates use ISO 8601 strings, validated
- Currency: separate `amount: number` + `currency: ISO4217` — NEVER mix symbol into the number
- Enums for controlled vocabularies (don't leave as free-form string)
- `confidence: number` per extracted field (0-1)
- `source_span: { start: int, end: int, quote: string }` for each field — the exact text chunk it was extracted from
- `not_found` marker for missing-but-expected fields (differentiates from null)
Example skeleton for product spec extraction:
```jsonc
{
  "fields": [
    // 8-15 fields specific to product specs
  ],
  "line_items": [
    // if applicable — array of sub-records
  ],
  "metadata": {
    "confidence_overall": 0.0,
    "extraction_model": "string",
    "extracted_at": "ISO8601"
  }
}
```
Provide the full schema with 10-20 realistic datasheet fields, each with a type, description, required/optional status, and validation constraints.
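As an illustration of the per-field wrapper these decisions imply (a sketch only; the field name `supply_voltage_v` and its bounds are stand-ins, not part of the required schema):
```json
{
  "$defs": {
    "source_span": {
      "type": "object",
      "required": ["start", "end", "quote"],
      "properties": {
        "start": { "type": "integer", "minimum": 0 },
        "end": { "type": "integer", "minimum": 0 },
        "quote": { "type": "string" }
      },
      "additionalProperties": false
    }
  },
  "properties": {
    "supply_voltage_v": {
      "type": "object",
      "required": ["value", "confidence", "source_span"],
      "properties": {
        "value": {
          "oneOf": [
            { "type": "number", "minimum": 0 },
            { "const": "not_found" }
          ]
        },
        "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
        "source_span": { "$ref": "#/$defs/source_span" }
      },
      "additionalProperties": false
    }
  }
}
```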
## Extraction Prompt
```
You are an expert data extraction system for product datasheets.
Extract the product specs from the DOCUMENT below.
Your output MUST be a single valid JSON object matching this schema:
{schema_json}
RULES:
1. Extract ONLY facts present in the document. NEVER invent.
2. For each field, include a "source_span" with the exact text you based the value on.
3. If a field is absent, return null (or "not_found" per schema).
4. For numeric fields, extract the number WITHOUT units or symbols.
5. Dates: convert to ISO 8601 (YYYY-MM-DD). Ambiguous formats → log in "warnings".
6. For enum fields, use the exact literal values — do not paraphrase.
7. Do not output anything outside the JSON. No explanation, no markdown fences.
DOCUMENT:
{document_text}
JSON:
```
### Constrained Decoding Options
If using a model that supports constrained decoding:
- **OpenAI Structured Outputs / response_format:** use with strict=true
- **Anthropic tool use:** define a single `submit_extraction` tool with the schema; model MUST call it
- **Open models via Outlines / guidance / XGrammar:** compile a grammar from the schema
- **vLLM guided_json:** pass the JSON schema at request time
Even with constrained decoding, keep the Pydantic/Zod validator as a second line of defense — grammars can have bugs and schemas evolve.
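A minimal sketch of the first option, assuming the OpenAI Python SDK; the model name and the variables `extraction_prompt`, `document_text`, and `schema` are placeholders for the artifacts defined above:
```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # any model with structured-output support
    temperature=0,
    messages=[
        {"role": "system", "content": extraction_prompt},  # the prompt above, schema inlined
        {"role": "user", "content": document_text},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "extraction", "strict": True, "schema": schema},
    },
)
raw = resp.choices[0].message.content  # guaranteed to parse as JSON; still run the validator
```
Note that strict mode constrains the schema itself: every object needs `additionalProperties: false` and every property listed in `required`, which is one more reason to keep the independent validator.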
## Validator (Pydantic)
Produce the runtime validator. For example, with Pydantic:
```python
from pydantic import BaseModel, Field, field_validator
from datetime import date

class StructuredData(BaseModel):
    # fields with full type annotations and Field(..., description=...)
    # custom validators for per-field and cross-field invariants (see the note below)
    ...

class ExtractionResult(BaseModel):
    record: StructuredData
    confidence: float = Field(ge=0.0, le=1.0)
    warnings: list[str] = Field(default_factory=list)
```
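One subtlety: in Pydantic v2 a `@field_validator` sees one field at a time, so cross-field invariants belong in a `@model_validator(mode="after")`. A sketch with hypothetical date fields, used only to illustrate the pattern:
```python
from datetime import date
from pydantic import BaseModel, model_validator

class Availability(BaseModel):
    # hypothetical fields illustrating a cross-field invariant
    announced_on: date | None = None
    shipping_from: date | None = None

    @model_validator(mode="after")
    def shipping_not_before_announcement(self) -> "Availability":
        if self.announced_on and self.shipping_from and self.shipping_from < self.announced_on:
            raise ValueError("shipping_from precedes announced_on")
        return self
```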
## Retry / Repair Loop
LLMs occasionally produce invalid JSON despite all precautions. Implement the following (a code sketch follows the list):
1. **Attempt 1:** full prompt with schema, temperature 0
2. **On ValidationError:** build a repair prompt:
```
Your previous response failed validation:
{validation_error}
Here was your response:
{bad_response}
Fix ONLY the validation errors. Return the full corrected JSON. Do not change valid fields.
```
3. **Attempt 2:** with repair prompt, temperature 0
4. **On second ValidationError:** fall back to partial extraction — keep valid fields, mark invalid ones as `extraction_error`, flag the record for human review
5. Cap retries at 2. Never loop.
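A minimal sketch of the loop, assuming `call_model` wraps your LLM client and `EXTRACTION_PROMPT`, `REPAIR_PROMPT`, and `SCHEMA_JSON` are the templates defined above (all four names are placeholders):
```python
from pydantic import ValidationError

MAX_ATTEMPTS = 2  # attempt 1 + one repair attempt, then give up

def extract_with_repair(document_text: str) -> ExtractionResult | None:
    prompt = EXTRACTION_PROMPT.format(schema_json=SCHEMA_JSON, document_text=document_text)
    for _ in range(MAX_ATTEMPTS):
        raw = call_model(prompt, temperature=0)
        try:
            # model_validate_json raises ValidationError for bad JSON and schema violations alike
            return ExtractionResult.model_validate_json(raw)
        except ValidationError as err:
            # second pass: ask the model to fix only the reported errors
            prompt = REPAIR_PROMPT.format(validation_error=str(err), bad_response=raw)
    return None  # caller falls back to partial extraction + human review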
## Failure Modes & Mitigations
For extracting product specs from datasheets (helper sketches follow the list):
- **Currency/unit symbols mixed with numbers** → normalize before parsing: strip symbols and units in post-processing; the validator then rejects anything non-numeric
- **Partial JSON (truncation)** → detect unbalanced braces, raise the max output tokens, and retry through the repair loop
- **Hallucinated field values** → require a `source_span` per field and verify the `quote` appears verbatim in the document; null the field when it does not
- **Markdown code fences wrapping JSON** → strip ```json and ``` before parsing
- **Extra commentary** → regex-extract first balanced JSON object
- **Partial extraction** → accept if required fields present; flag if not
- **Hallucinated enum value** → fuzzy match to nearest valid; reject if distance too large
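The fence-stripping and balanced-object mitigations fit in a few lines of plain Python. A sketch; note that a regex alone cannot match nested braces, so the extractor walks the string:
```python
import re

def strip_fences(raw: str) -> str:
    """Drop a wrapping ```json ... ``` fence if the model added one."""
    raw = raw.strip()
    raw = re.sub(r"^```[\w-]*\s*\n", "", raw)
    raw = re.sub(r"\n```\s*$", "", raw)
    return raw

def first_balanced_object(raw: str) -> str | None:
    """Return the first balanced {...} span, ignoring braces inside strings."""
    depth, start, in_string, escaped = 0, None, False, False
    for i, ch in enumerate(raw):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return raw[start : i + 1]
    return None
```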
## Evaluation
Golden set of 2,500 datasheet examples with human-labeled product specs. Metrics (a scoring sketch follows the list):
- **Per-field accuracy** (exact match for enums/dates; tolerance-bounded for numbers)
- **Field recall** (did we find the field when it existed?)
- **Field precision** (when we extracted, was it right?)
- **Schema validation rate** (% passing Pydantic)
- **Hallucination rate** (fields extracted that weren't in source)
- **End-to-end record accuracy** (all required fields correct)
Error analysis: bucket failures by field name to find systematic issues.
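A sketch of the per-record bookkeeping behind these metrics, assuming gold and predicted records are flattened to `{field: value}` dicts that use the `"not_found"` sentinel:
```python
from collections import Counter

def score_record(gold: dict, pred: dict) -> Counter:
    """Counts for one record; sum the Counters over the golden set, then derive rates."""
    c = Counter()
    for field, gold_value in gold.items():
        pred_value = pred.get(field, "not_found")
        if gold_value == "not_found":
            # field absent from the source: any extraction is a hallucination
            c["hallucinated"] += pred_value != "not_found"
        else:
            c["expected"] += 1
            c["found"] += pred_value != "not_found"    # recall denominator is "expected"
            c["correct"] += pred_value == gold_value   # exact match; add numeric tolerance as needed
    return c

# recall = correct / expected; precision = correct / found
```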
## Observability
Log every extraction to Braintrust (the event shape is sketched after the list):
- Input doc id, length, language
- Model, prompt version, schema version
- Validation pass/fail, which fields failed
- Retry count
- Confidence score distribution
- User-flagged errors (feed back into eval set)
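The exact Braintrust SDK calls depend on your project setup, so this sketch only pins down the event shape; the field names are suggestions, not a fixed API:
```python
from dataclasses import dataclass, field

@dataclass
class ExtractionLogEvent:
    doc_id: str
    doc_length: int
    language: str
    model: str
    prompt_version: str
    schema_version: str
    validation_passed: bool
    failed_fields: list[str] = field(default_factory=list)
    retry_count: int = 0
    confidence_overall: float | None = None
    user_flagged: bool = False
```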
## Deliverables
1. Schema file (JSON Schema, Draft 2020-12) with comprehensive field descriptions
2. Prompt template with version tag
3. Validator with unit tests covering happy path + each failure mode
4. Retry/repair loop implementation
5. Golden set + eval script + CI gate
6. Dashboard in Braintrust
Present as numbered steps. Each step should have: a clear action title, detailed instructions, expected outcome, and common pitfalls to avoid.