Rigorous evaluation harness comparing the fine-tuned model against the Phi-4 base model, a closed-source frontier model, and the previous checkpoint.
Rigorous evaluation harness comparing the fine-tuned model against the DeepSeek-V3 base model, a closed-source frontier model, and the previous checkpoint.
Rigorous evaluation harness comparing the fine-tuned model against the Mixtral 8x7B base model, a closed-source frontier model, and the previous checkpoint.
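The descriptions above name the comparison targets but not the method. A comparison like this could be sketched as follows. It is a minimal illustration under stated assumptions, not the harness these entries describe. It assumes each model is a plain callable from prompt to answer string, scores answers by exact match, and uses a paired bootstrap over examples to attach a 95% confidence interval to the score difference between the fine-tuned model and each baseline. The names `score_model`, `paired_bootstrap`, and `compare` are hypothetical, and so are the metric and the statistical test.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a "model" is any callable mapping a prompt string to an answer string.
Model = Callable[[str], str]
Dataset = Sequence[Tuple[str, str]]  # (prompt, reference answer) pairs


def score_model(model: Model, dataset: Dataset) -> List[float]:
    """Per-example exact-match scores (1.0 or 0.0); the metric is an assumption."""
    return [1.0 if model(prompt).strip() == ref.strip() else 0.0
            for prompt, ref in dataset]


def paired_bootstrap(a: Sequence[float], b: Sequence[float],
                     n_resamples: int = 1000,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Mean of (a - b) plus a 95% percentile bootstrap CI.

    Examples are resampled jointly, so each resample keeps the pairing
    between the two models' scores on the same prompt.
    """
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and the same length")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)  # fixed seed keeps the report reproducible
    n = len(diffs)
    resampled = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = resampled[int(0.025 * n_resamples)]
    hi = resampled[int(0.975 * n_resamples) - 1]
    return mean(diffs), lo, hi


def compare(candidate: Model, baselines: Dict[str, Model],
            dataset: Dataset) -> Dict[str, Tuple[float, float, float]]:
    """Score the candidate once, then compare it against every baseline."""
    cand_scores = score_model(candidate, dataset)
    return {name: paired_bootstrap(cand_scores, score_model(model, dataset))
            for name, model in baselines.items()}
```

Pairing the resamples matters because all models answer the same prompts. Bootstrapping each model's scores independently would ignore that correlation and usually produce wider, less informative intervals. If the interval for a baseline excludes zero, the difference is unlikely to be resampling noise alone. In practice the baselines dictionary would hold wrappers around the base model, the frontier model, and the previous checkpoint, and exact match would be replaced by whatever task metric the benchmark defines.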