Claude Prompt for RAG Pipelines
Implement RAG-Fusion to improve retrieval recall for API reference docs using jina-embeddings-v3 + multi-vector (per chunk).
You are a retrieval quality specialist. The user's raw query often fails to retrieve the right chunks because it uses different vocabulary, is under-specified, or is multi-hop. Implement RAG-Fusion as a pre-retrieval step for a RAG system over API reference docs.
## Problem Statement
Our baseline retriever is multi-vector (per chunk) with jina-embeddings-v3 over Chroma. On the golden eval set of 1000 queries, current hit@10 is 0.62. The dominant failure modes are:
- Vocabulary mismatch (user says "get charged" but docs say "billed")
- Multi-hop questions (user asks about X which requires facts A and B)
- Under-specification (user pronouns or ellipsis: "why does it do that?")
- Acronym ambiguity
- Questions phrased negatively ("what are the exceptions to...")
## Transformation: RAG-Fusion
Explain the technique, why it helps, and its failure modes.
### Prompt Template
Produce the exact LLM prompt to implement RAG-Fusion. The prompt must:
- Specify which model runs the transformation (the primary model or a smaller, cheaper model for cost) and why
- Take raw_query + optional conversation_history as input
- Output a strictly-formatted result (JSON with transformed_queries: string[])
- Include 3 few-shot examples drawn from the API reference docs domain
- Handle the case where no transformation is needed (pass through)
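One possible skeleton for that prompt, shown as a Python string constant (the few-shot examples and the exact schema are deliberately elided; `{raw_query}` and `{conversation_history}` are placeholder slots, not final wording):

```python
# Hypothetical skeleton only -- the full prompt must add the 3 few-shot examples
# and the exact JSON output schema described in this section.
TRANSFORM_PROMPT = """\
You rewrite search queries for a retrieval system over API reference docs.
Given the raw query (and optional conversation history), output 3-5 reformulations
that repair vocabulary mismatch, under-specification, multi-hop phrasing, acronym
ambiguity, or negative phrasing. If the query needs no transformation, return it
unchanged as the only entry. Respond with JSON matching the output schema below.

<3 few-shot examples from the API reference docs domain go here>

Raw query: {raw_query}
Conversation history: {conversation_history}
"""
```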
### JSON Output Schema
```json
{
  "strategy_used": "hyde | multi-query | step-back | decompose",
  "transformed_queries": ["...", "..."],
  "filter_hints": { "date_after": "...", "doc_type": "..." },
  "confidence": 0.0
}
```
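A minimal Pydantic model mirroring this schema, assuming Pydantic v2 (field names match the JSON above; the model name `TransformResult` matches the code scaffold later in this playbook):

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field

class FilterHints(BaseModel):
    date_after: Optional[str] = None
    doc_type: Optional[str] = None

class TransformResult(BaseModel):
    strategy_used: Literal["hyde", "multi-query", "step-back", "decompose"]
    transformed_queries: list[str] = Field(min_length=1)
    filter_hints: Optional[FilterHints] = None
    confidence: float = Field(ge=0.0, le=1.0)
```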
## Fusion Strategy
After running retrieval with each transformed query, you have multiple candidate sets. Combine them via:
- **Reciprocal Rank Fusion** with k=60 (default, robust)
- OR **weighted score fusion** if you have calibrated scores
- Deduplicate by chunk_id; keep highest-rank occurrence
- Truncate to final top-15
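A minimal sketch of Reciprocal Rank Fusion with the parameters above, assuming each candidate set is an ordered list of chunk IDs (the function name `rrf_fuse` is illustrative). Note that standard RRF sums contributions across lists, which also handles deduplication:

```python
from collections import defaultdict

def rrf_fuse(result_sets: list[list[str]], k: int = 60, top_n: int = 15) -> list[str]:
    """Score each chunk_id by sum of 1/(k + rank) over all candidate sets."""
    scores: dict[str, float] = defaultdict(float)
    for chunk_ids in result_sets:
        for rank, chunk_id in enumerate(chunk_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Higher fused score first; duplicates collapse into a single entry.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]
```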
## Evaluation
On the golden set, measure:
- hit@1, hit@5, hit@10 lift vs baseline
- Latency overhead (extra LLM call + parallel retrievals)
- Cost overhead per query
- Failure-mode coverage: how many of the 5 listed failures did we fix?
Run pairwise: for each query, A = baseline retrieval, B = with RAG-Fusion. Use Ragas faithfulness judge to score which result set is more relevant. Target: B wins ≥ 65% of the time AND baseline never wins more than 15% (no regressions).
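For the hit@k lift measurement, a small offline-comparison sketch (it assumes each golden-set item exposes `text` and `relevant_ids`, and that `retrieve_baseline` / `retrieve_fused` are the two pipelines; all names are placeholders):

```python
def hit_at_k(retrieved: list[str], relevant: set[str], k: int) -> bool:
    return any(cid in relevant for cid in retrieved[:k])

def compare(golden_set, retrieve_baseline, retrieve_fused, ks=(1, 5, 10)) -> dict:
    lifts = {}
    for k in ks:
        base = sum(hit_at_k(retrieve_baseline(q.text), q.relevant_ids, k) for q in golden_set)
        fused = sum(hit_at_k(retrieve_fused(q.text), q.relevant_ids, k) for q in golden_set)
        lifts[f"hit@{k}_lift"] = (fused - base) / len(golden_set)
    return lifts
```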
## Caching
RAG-Fusion adds an LLM call. Cache aggressively:
- Cache key: normalized raw query (lowercase, trim, sort tokens)
- TTL: 24h for stable corpora, 1h for changing corpora
- Store in Redis with memory limit + LRU eviction
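One way to implement the cache key and TTL, assuming redis-py and a Redis instance configured with `maxmemory` plus `allkeys-lru` eviction (key prefix and TTL values are illustrative):

```python
import hashlib
import json
import redis

r = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 24 * 3600  # 24h for stable corpora; drop to 3600 for changing corpora

def cache_key(raw_query: str) -> str:
    normalized = " ".join(sorted(raw_query.lower().strip().split()))
    return "qt:" + hashlib.sha256(normalized.encode()).hexdigest()

def get_cached(raw_query: str) -> dict | None:
    hit = r.get(cache_key(raw_query))
    return json.loads(hit) if hit else None

def set_cached(raw_query: str, result: dict) -> None:
    r.setex(cache_key(raw_query), TTL_SECONDS, json.dumps(result))
```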
## Fallback & Safety
- LLM call times out > 800ms → fall back to raw query
- LLM returns malformed JSON → retry once with repair prompt, then fall back
- Transformed query count > 6 → cap at 6 (latency/cost)
- Log every transformation with trace_id for debugging
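A sketch of the timeout/repair/fallback flow, assuming an async `call_llm` client, Pydantic v2, and the `TransformResult` model from the schema section (the 800 ms timeout, single repair retry, and 6-query cap follow the rules above):

```python
import asyncio
import logging

log = logging.getLogger(__name__)
MAX_QUERIES = 6
TIMEOUT_S = 0.8

async def safe_transform(raw_query: str, call_llm, trace_id: str) -> TransformResult:
    fallback = TransformResult(strategy_used="multi-query",
                               transformed_queries=[raw_query], confidence=0.0)
    try:
        text = await asyncio.wait_for(call_llm(raw_query), timeout=TIMEOUT_S)
    except asyncio.TimeoutError:
        log.warning("query_transform timeout", extra={"trace_id": trace_id})
        return fallback
    try:
        result = TransformResult.model_validate_json(text)
    except Exception:
        try:  # one retry with a repair prompt, then give up
            repaired = await asyncio.wait_for(
                call_llm(f"Return ONLY valid JSON for: {raw_query}"), timeout=TIMEOUT_S)
            result = TransformResult.model_validate_json(repaired)
        except Exception:
            log.warning("query_transform malformed JSON", extra={"trace_id": trace_id})
            return fallback
    result.transformed_queries = result.transformed_queries[:MAX_QUERIES]
    log.info("query_transform ok", extra={"trace_id": trace_id,
                                          "strategy": result.strategy_used})
    return result
```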
## Code Scaffold
Produce `query_transform.py` (or .ts) with:
1. Function signature: `async def transform_query(raw: str, history: list[str] | None) -> TransformResult`
2. Input validation
3. LLM call with timeout + retry
4. JSON parsing with Pydantic validation
5. Metrics emission (latency, cache hit/miss, fallback rate)
6. Unit tests covering: cache hit, cache miss, malformed JSON, timeout, empty history
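A pytest sketch for two of the listed cases (malformed JSON and timeout), assuming pytest-asyncio is installed and the `safe_transform` helper sketched under Fallback & Safety:

```python
import asyncio
import pytest
# from query_transform import safe_transform  # as sketched above

@pytest.mark.asyncio
async def test_malformed_json_falls_back_after_one_retry():
    calls = []
    async def bad_llm(prompt: str) -> str:
        calls.append(prompt)
        return "not json"
    result = await safe_transform("how am I billed?", bad_llm, trace_id="t-1")
    assert result.transformed_queries == ["how am I billed?"]
    assert len(calls) == 2  # original call + one repair attempt

@pytest.mark.asyncio
async def test_timeout_falls_back_to_raw_query():
    async def slow_llm(prompt: str) -> str:
        await asyncio.sleep(5)
        return "{}"
    result = await safe_transform("why does it do that?", slow_llm, trace_id="t-2")
    assert result.transformed_queries == ["why does it do that?"]
```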
## Monitoring
Emit traces to Galileo:
- span: query_transform.call
- attributes: strategy_used, num_transformed, cache_hit, latency_ms, model
- linked to parent retrieval span
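If your Galileo setup ingests OpenTelemetry traces, the span emission might look like the following (span and attribute names follow the list above; exporter wiring is assumed to be configured elsewhere):

```python
from opentelemetry import trace

tracer = trace.get_tracer("rag.query_transform")

def record_transform_span(result, cache_hit: bool, latency_ms: float, model: str) -> None:
    # Started inside the retrieval span, this becomes a child of that parent span.
    with tracer.start_as_current_span("query_transform.call") as span:
        span.set_attribute("strategy_used", result.strategy_used)
        span.set_attribute("num_transformed", len(result.transformed_queries))
        span.set_attribute("cache_hit", cache_hit)
        span.set_attribute("latency_ms", latency_ms)
        span.set_attribute("model", model)
```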
## Rollout Plan
1. Ship behind feature flag at 1% traffic
2. Compare online metrics (click-through on top result, user satisfaction thumbs)
3. Ramp to 10%, 50%, 100% over 2 weeks if metrics hold
4. Kill switch: env var QUERY_TRANSFORM_ENABLED=false
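A minimal sketch of the kill switch plus deterministic percentage rollout (the env var name comes from the plan above; hashing a stable user or session ID keeps the same users in the treatment group as you ramp):

```python
import hashlib
import os

def query_transform_enabled(user_id: str, rollout_pct: int = 1) -> bool:
    if os.getenv("QUERY_TRANSFORM_ENABLED", "true").lower() == "false":
        return False  # kill switch
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct
```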
Structure as a playbook with: Overview, Prerequisites, Step-by-step Plays, Metrics to Track, and Troubleshooting Guide.

Replace the bracketed placeholders with your own context before running the prompt:
- `["...", "..."]` — replace with your own example values.
- `[str]` — replace with your own string values.