Full fine-tuning recipe: DPO on Mistral Small 3 via LLaMA-Factory, targeting single A100 80GB, with data mix and eval plan.
Full fine-tuning recipe: DPO on Qwen 2.5 7B via Axolotl, targeting single H100 80GB, with data mix and eval plan.
Full fine-tuning recipe: DPO on Qwen 2.5 32B via FSDP, targeting single H100 80GB, with data mix and eval plan.
Full fine-tuning recipe: DPO on Gemma 2 9B via LLaMA-Factory, targeting 2x A100 80GB, with data mix and eval plan.
Full fine-tuning recipe: DPO on Mistral Small 3 via OpenRLHF, targeting single RTX 4090 (24GB), with data mix and eval plan.
Full fine-tuning recipe: DPO on Llama 3.3 70B via Axolotl, targeting AWS g5.12xlarge, with data mix and eval plan.
Full fine-tuning recipe: DPO on Llama 3.3 70B via DeepSpeed, targeting 2x A100 80GB, with data mix and eval plan.
Full fine-tuning recipe: DPO on Mixtral 8x7B via LitGPT, targeting single RTX 3090 (24GB), with data mix and eval plan.
Full fine-tuning recipe: DPO on Llama 3.3 70B via Axolotl, targeting single RTX 3090 (24GB), with data mix and eval plan.
Full fine-tuning recipe: DPO on Mixtral 8x7B via Hugging Face TRL, targeting Lambda Labs 8xH100, with data mix and eval plan.
Full fine-tuning recipe: DPO on Mixtral 8x7B via LitGPT, targeting 8x H100, with data mix and eval plan.
Full fine-tuning recipe: DPO on Phi-3.5-mini via Megatron-LM, targeting 2x RTX 4090, with data mix and eval plan.
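Every recipe above optimizes the same per-pair DPO objective, whatever the framework: the loss pushes the policy's log-probability margin for the chosen response over the rejected one, relative to a frozen reference model. A minimal pure-Python sketch of that loss (the function name, `beta` default, and example log-probs are illustrative, not taken from TRL, Axolotl, or any other framework listed above):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    where each argument is a summed sequence log-probability."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) computed stably as log(1 + exp(-x))
    return math.log1p(math.exp(-margin))

# Policy favors the chosen response more than the reference does,
# so the margin is positive and the loss drops below log(2) ~ 0.693.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
print(round(loss, 4))  # 0.5981
```

In the real trainers, the log-probabilities come from per-token logits summed over the response tokens, and the loss is averaged over a batch; `beta` is the main knob each recipe's hyperparameter section would sweep.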
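Since each recipe names a specific GPU budget, a rough memory bound is worth having on hand when sanity-checking a full fine-tune target. The multipliers below (bf16 weights and gradients, fp32 Adam moments) are a common rule of thumb, not a measurement; the estimate ignores activations, the frozen DPO reference model, and framework overhead, so real usage is strictly higher. The function name is illustrative:

```python
def full_ft_memory_gb(n_params_billions: float,
                      bytes_per_weight: int = 2,  # bf16 weights
                      bytes_per_grad: int = 2,    # bf16 gradients
                      bytes_per_optim: int = 8    # fp32 Adam m and v
                      ) -> float:
    """Rough lower bound (in GB, 1e9 bytes) on total GPU memory for a
    full fine-tune with Adam. Billions of params x bytes per param
    gives GB directly. Sharding (FSDP/ZeRO) spreads this total across
    devices; it does not shrink it."""
    return n_params_billions * (bytes_per_weight + bytes_per_grad + bytes_per_optim)

print(full_ft_memory_gb(7))   # 84.0  -> a 7B full fine-tune wants ~84+ GB total
print(full_ft_memory_gb(70))  # 840.0 -> a 70B full fine-tune wants ~840+ GB total
```

By this estimate, each recipe's hardware section should state how the total is covered: device count times per-device memory, plus any CPU/NVMe offload the chosen framework provides.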