Category Not Found

1252 prompts

Build BERTScore Eval Harness for log anomaly detection on Claude 3.5 Sonnet

Design an eval harness for log anomaly detection using BERTScore that tracks refusal rate across prompt versions on Claude 3.5 Sonnet.

Build promptfoo assertions Eval Harness for log anomaly detection on Claude 4.5 Sonnet

Design an eval harness for log anomaly detection using promptfoo assertions that tracks refusal rate across prompt versions on Claude 4.5 Sonnet.

Build human pairwise comparison Eval Harness for log anomaly detection on Claude Haiku 4

Design an eval harness for log anomaly detection using human pairwise comparison that tracks tool-call precision across prompt versions on Claude Haiku 4.

Build factuality with retrieval Eval Harness for log anomaly detection on Gemini 2.0 Flash

Design an eval harness for log anomaly detection using factuality with retrieval that tracks tool-call precision across prompt versions on Gemini 2.0 Flash.

Build embedding distance Eval Harness for log anomaly detection on DeepSeek-R1

Design an eval harness for log anomaly detection using embedding distance that tracks format-compliance rate across prompt versions on DeepSeek-R1.

Build rubric scoring Eval Harness for log anomaly detection on Llama 3.1 405B

Design an eval harness for log anomaly detection using rubric scoring that tracks format-compliance rate across prompt versions on Llama 3.1 405B.

Build LLM-as-judge Eval Harness for log anomaly detection on Qwen 2.5 72B

Design an eval harness for log anomaly detection using LLM-as-judge that tracks hallucination rate across prompt versions on Qwen 2.5 72B.

Build tool-call accuracy Eval Harness for log anomaly detection on o1-mini

Design an eval harness for log anomaly detection using tool-call accuracy that tracks hallucination rate across prompt versions on o1-mini.

Build G-Eval Eval Harness for log anomaly detection on o3-mini

Design an eval harness for log anomaly detection using G-Eval that tracks hallucination rate across prompt versions on o3-mini.

Build exact match Eval Harness for log anomaly detection on Command R+

Design an eval harness for log anomaly detection using exact match that tracks user satisfaction (CSAT) across prompt versions on Command R+.

Build JSON schema validation Eval Harness for log anomaly detection on GPT-4.1

Design an eval harness for log anomaly detection using JSON schema validation that tracks user satisfaction (CSAT) across prompt versions on GPT-4.1.

Build TruLens feedback functions Eval Harness for incident post-mortems on Claude 3.5 Sonnet

Design an eval harness for incident post-mortems using TruLens feedback functions that tracks inter-judge agreement across prompt versions on Claude 3.5 Sonnet.
