ChatGPT Prompt for RAG Pipelines
Production RAG recipe: token-based sliding window chunking, arctic-embed-l embeddings, pgvector storage, Cohere Rerank 3.5 reranking. Includes retrieval evals.
You are a senior AI engineer designing a production RAG pipeline. Produce a complete, implementation-ready spec that a mid-level engineer can ship in one sprint.
## System Context
- **Document corpus:** earnings call transcripts (~100k documents)
- **Query volume:** 2 queries/second peak, 1M daily
- **Latency SLO:** end-to-end p95 under 1.2s
- **Language coverage:** 40+ languages
- **Freshness requirement:** minutes
## Pipeline Architecture
### 1. Ingestion & Parsing
- Parser choice for earnings call transcripts (Unstructured, LlamaParse, pdfplumber, custom): justify tradeoff
- How to preserve tables, code blocks, and hierarchical structure
- OCR fallback strategy for scanned pages (Tesseract vs GPT-4o vision vs Textract)
- Metadata to extract per document: source_url, created_at, author, section_path, doc_type, access_level
- Deduplication: content hash + near-duplicate detection via MinHash/SimHash
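The dedup step above can be sketched as exact matching on a normalized content hash plus MinHash signatures for near-duplicates. This is a minimal stdlib sketch (word-level shingles, 64 salted hashes); a production pipeline would more likely use a tuned library such as datasketch:

```python
import hashlib

def content_hash(text: str) -> str:
    """Exact-duplicate key: SHA-256 of whitespace/case-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def minhash_signature(text: str, num_hashes: int = 64, shingle_size: int = 5) -> list:
    """Near-duplicate sketch: min of salted 64-bit hashes over word shingles."""
    words = text.lower().split()
    shingles = {
        " ".join(words[i:i + shingle_size])
        for i in range(max(1, len(words) - shingle_size + 1))
    }
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big"
            )
            for s in shingles
        ))
    return signature

def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Flag pairs above an estimated Jaccard of roughly 0.8–0.9 for near-duplicate review; the exact threshold should be tuned on the corpus.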
### 2. Chunking: token-based sliding window
- Target chunk size: 256 tokens; overlap: 0 tokens (note: a sliding window typically keeps 10–20% overlap to avoid losing boundary context, so justify or tune this value)
- Boundary rules (where you're allowed and NOT allowed to split)
- How to handle chunks that are too small (merge) and too large (recursive split)
- Special handling for tables, code blocks, lists (do NOT split these)
- Contextual header: prepend document title + section breadcrumb to each chunk
- Optional upgrade path: if you switch the chunking strategy to contextual retrieval, include the Claude Haiku prompt that generates the 50–100 token contextual prefix per chunk
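The windowing and runt-merge rules above can be sketched as follows. For brevity this operates on a pre-tokenized list of strings; a real implementation would tokenize with the embedding model's own tokenizer, and the `min_chunk` merge threshold is an assumed parameter:

```python
def sliding_window_chunks(tokens, chunk_size=256, overlap=0, min_chunk=32):
    """Token-based sliding window: each window starts chunk_size - overlap
    tokens after the previous one; a too-small trailing chunk is merged back."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]
    if len(chunks) > 1 and len(chunks[-1]) < min_chunk:
        runt = chunks.pop()
        # Merge the runt tail into its predecessor (with overlap > 0 this can
        # duplicate a few tokens, which is usually acceptable for retrieval).
        chunks[-1] = chunks[-1] + runt
    return chunks

def with_contextual_header(chunk_tokens, title, breadcrumb):
    """Prepend document title + section breadcrumb, as the spec requires."""
    return f"{title} > {breadcrumb}\n\n" + " ".join(chunk_tokens)
```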
### 3. Embedding: arctic-embed-l
- Batch size, rate-limit handling, retry policy (exponential backoff + jitter)
- Cost estimate at corpus size (model price × tokens)
- Embedding dimensionality and storage implications
- When to re-embed (model upgrade, chunking change) and migration path
- Normalization: L2-normalize if using dot product; skip if cosine
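The retry policy above (exponential backoff with full jitter) can be sketched generically; the injectable `sleep` parameter is an assumption added here to keep the function testable without real waits:

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                 retriable=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Run `call` with exponential backoff plus full jitter on retriable errors.

    Delay before attempt n is uniform in [0, min(max_delay, base_delay * 2**n)],
    which spreads retries out and avoids thundering-herd spikes on the API.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the error to the caller
            sleep(random.uniform(0, min(max_delay, base_delay * (2 ** attempt))))
```

Wrap each embedding batch call in `with_retries`, and treat HTTP 429/5xx from the provider as retriable alongside the network errors shown here.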
### 4. Storage: pgvector
- Index configuration (HNSW M, efConstruction, efSearch OR IVF nlist, nprobe)
- Payload schema (which metadata is filterable, which is projected)
- Sharding / namespace strategy for multi-tenancy
- Backup and point-in-time recovery
- Estimated storage cost per 1M chunks
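A starting-point schema and index for the storage items above might look like the following. The 1024-dimension value matches arctic-embed-l's output, `vector_ip_ops` matches the dot-product retrieval chosen below, and the HNSW parameters (`m = 16`, `ef_construction = 200`) are illustrative defaults to be tuned against recall tests, not recommendations:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Chunks table: embedding dimension matches arctic-embed-l (1024).
CREATE TABLE chunks (
    chunk_id     BIGSERIAL PRIMARY KEY,
    tenant_id    TEXT NOT NULL,
    doc_id       TEXT NOT NULL,
    access_level TEXT NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL,
    section_path TEXT,
    content      TEXT NOT NULL,
    embedding    vector(1024) NOT NULL
);

-- HNSW index for inner-product (dot product) search.
CREATE INDEX chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_ip_ops)
    WITH (m = 16, ef_construction = 200);

-- B-tree index for the filterable metadata used at query time.
CREATE INDEX chunks_filter_idx ON chunks (tenant_id, access_level, created_at);

-- Per session/query: search breadth should be >= retrieval top-k.
-- SET hnsw.ef_search = 100;
```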
### 5. Retrieval: dense (dot product)
- Top-k at retrieval: 100
- Metadata filters (tenant_id, access_level, date range)
- If hybrid (BM25 + dense) retrieval is enabled: sparse/dense weights and Reciprocal Rank Fusion with k = 60
- Query preprocessing: lowercase, stopword handling, acronym expansion
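Reciprocal Rank Fusion, as referenced above, combines ranked lists from the sparse and dense retrievers. A minimal sketch over lists of chunk IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked chunk-ID lists (e.g. [bm25_ids, dense_ids]);
    k = 60 is the conventional damping constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

RRF is attractive here because it needs no score normalization across the BM25 and dense scales; only ranks matter.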
### 6. Reranking: Cohere Rerank 3.5
- Rerank top-100 down to top-5
- Latency budget for rerank step
- When to skip rerank (very high-confidence top result, latency pressure)
- Fallback if reranker API fails
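The skip and fallback rules above can be wired into one wrapper. The candidate shape (`(chunk_id, dense_score)` pairs) and the 0.95 skip threshold are assumptions for illustration; the real threshold should come from calibration on the golden set:

```python
def rerank_with_fallback(query, candidates, rerank_fn, top_n=5, skip_threshold=0.95):
    """Rerank candidates, skipping or falling back to dense order when needed.

    candidates: (chunk_id, dense_score) pairs sorted by dense_score descending.
    rerank_fn:  callable(query, ids) -> reordered ids; may raise on API failure.
    """
    ids = [chunk_id for chunk_id, _ in candidates]
    if candidates and candidates[0][1] >= skip_threshold:
        return ids[:top_n]  # very high-confidence top hit: save the rerank call
    try:
        return rerank_fn(query, ids)[:top_n]
    except Exception:
        return ids[:top_n]  # reranker down: degrade to dense retrieval order
```

In production the `except` should also emit a metric/log so silent reranker outages are visible.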
### 7. Answer Synthesis
- Model: Claude Sonnet 4.5 or GPT-4.1 (justify)
- System prompt template with citation requirements
- Output format: answer + citations array [{chunk_id, quote, doc_url}]
- Refusal rules: when retrieval returns low-relevance chunks, refuse rather than hallucinate
## Prompt Template
```
You are a helpful assistant answering questions about earnings call transcripts.
RULES:
1. Use ONLY the context below. If the answer is not present, say "I don't have enough information."
2. Cite every factual claim with [doc_id] markers.
3. Quote short spans verbatim for critical facts.
4. If context conflicts, note the conflict.
CONTEXT:
{retrieved_chunks_with_ids}
QUESTION: {user_question}
ANSWER:
```
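Filling the template above is mechanical; the chunk dict shape (`chunk_id`, `text` keys) is an assumption to be aligned with the payload schema:

```python
PROMPT_TEMPLATE = """You are a helpful assistant answering questions about earnings call transcripts.
RULES:
1. Use ONLY the context below. If the answer is not present, say "I don't have enough information."
2. Cite every factual claim with [doc_id] markers.
3. Quote short spans verbatim for critical facts.
4. If context conflicts, note the conflict.
CONTEXT:
{retrieved_chunks_with_ids}
QUESTION: {user_question}
ANSWER:"""

def build_prompt(chunks, question):
    """Render retrieved chunks as [chunk_id]-tagged blocks and fill the template."""
    context = "\n\n".join(f"[{c['chunk_id']}] {c['text']}" for c in chunks)
    return PROMPT_TEMPLATE.format(
        retrieved_chunks_with_ids=context,
        user_question=question,
    )
```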
## Evaluation Plan
Build a golden set of 500 question+expected-sources pairs from real user queries. Score:
- **Retrieval hit@k** (k = 100, 5, 1): did a correct chunk appear in the top k?
- **Context precision** (Ragas): are retrieved chunks relevant?
- **Context recall**: did we retrieve everything needed?
- **Faithfulness**: does the answer stay grounded in context?
- **Answer relevance**: does the answer address the question?
- **Citation accuracy**: do cited chunks actually support the claim?
Target thresholds: hit@5 ≥ 0.90, faithfulness ≥ 0.95, citation accuracy ≥ 0.98.
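The hit@k metric in the list above is simple enough to state exactly; Ragas supplies the LLM-judged metrics, but hit@k can be computed directly against the golden set:

```python
def hit_at_k(retrieved_ids, relevant_ids, k):
    """1.0 if any gold chunk appears in the top k retrieved chunks, else 0.0."""
    return 1.0 if any(cid in relevant_ids for cid in retrieved_ids[:k]) else 0.0

def mean_hit_at_k(examples, k):
    """examples: (retrieved_ids, relevant_id_set) pairs from the golden set."""
    return sum(hit_at_k(retrieved, relevant, k) for retrieved, relevant in examples) / len(examples)
```

Wire `mean_hit_at_k` into the CI gate so a regression below threshold fails the build.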
## Edge Cases to Handle
- Very long documents (> 1M tokens): hierarchical summarization or map-reduce
- Tables with numeric data: extract to markdown tables, preserve headers
- Code chunks: chunk by function/class, never mid-function
- Non-English docs: language detection, multilingual embedding, answer in query language
- PII redaction before storage (emails, SSNs, phone numbers)
- Access-controlled docs: enforce ACL filters at query time, NEVER post-filter after retrieval
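The PII-redaction item above can be sketched with regex substitution. These patterns are illustrative only and will miss real-world variants; a production system should use a vetted PII-detection library or service rather than hand-rolled regexes:

```python
import re

# Illustrative patterns only -- not exhaustive for production PII redaction.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def redact_pii(text):
    """Replace each PII match with a typed placeholder before storage."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Run redaction during ingestion, before chunks are embedded or stored, so PII never reaches the vector index.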
## Cost & Latency Estimate
| Component | Latency (p95) | Cost per query |
|-----------|---------------|----------------|
| Query embed | — | — |
| Vector search | — | — |
| Rerank | — | — |
| LLM synthesis | — | — |
| **Total** | — | — |
Fill these in with specific numbers for arctic-embed-l + pgvector + Cohere Rerank 3.5 at 2 QPS.
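A sanity-check of the latency column can be done as simple arithmetic against the 1.2 s SLO. Every number below is a hypothetical placeholder, not a measurement; replace each with figures profiled on your own stack:

```python
# Hypothetical p95 latency budget (milliseconds) -- measure, don't trust these.
budget_ms = {
    "query_embed": 40,     # arctic-embed-l forward pass for the query
    "vector_search": 60,   # pgvector HNSW at ef_search = 100
    "rerank": 250,         # Cohere Rerank 3.5 round trip over 100 candidates
    "llm_synthesis": 800,  # answer generation, first token to last
}

total_ms = sum(budget_ms.values())
assert total_ms <= 1200, f"over the 1.2 s p95 SLO: {total_ms} ms"
```

Keeping the synthesis slice dominant (and streaming it to the user) is what usually makes the 1.2 s p95 achievable at 2 QPS.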
## Deliverables
1. Code scaffold (Python or TypeScript): ingest.py, chunk.py, embed.py, retrieve.py, synthesize.py, evaluate.py
2. Chunking parameter table (chunk_size, overlap, separators) tuned for earnings call transcripts
3. Prompt template (above) with variables called out
4. Eval harness wired to the golden set with CI gate on the three target thresholds
5. Runbook: what to do when retrieval quality drops, what to do when latency spikes
6. Cost model spreadsheet with ingestion cost, per-query cost, storage cost
## Output Requirements
- Use precise technical terminology appropriate for the audience
- Include code examples, configurations, or specifications where relevant
- Document assumptions, prerequisites, and dependencies
- Provide error handling and edge case considerations
Present your output in a clear, organized structure with headers (##), subheaders (###), and bullet points. Use bold for key terms.

Replace the template variables with your own context before running the prompt: `{retrieved_chunks_with_ids}` is the formatted retrieved chunks with their IDs, and `{user_question}` is the end user's question.