Diagnose why a Least-to-Most prompt is failing on A/B test interpretation with o1 and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on resume screening with DeepSeek-V3 and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on math word problems with Claude 3.5 Sonnet and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on SQL query writing with o3 and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on contract review with DeepSeek-R1 and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on research synthesis with Claude 3.7 Sonnet and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on legal brief summarization with o3-mini and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on API design decisions with Llama 3.3 70B and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on log anomaly detection with Claude Sonnet 4.5 and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on technical spec writing with Grok 3 and produce a fix plan.
Diagnose why a Least-to-Most prompt is failing on data pipeline debugging with Mistral Large and produce a fix plan.