Tag: dpo | Gopi Krishna Tummala

Advanced MLOps & Production

45 MIN READ

Post-Training Playbook: SFT, LoRA, DPO, and GRPO from First Principles

Pre-training gives a model knowledge; post-training gives it behavior. A practitioner's breakdown of SFT, LoRA/QLoRA, DPO, and GRPO, with the memory math, concrete configs, and interview reasoning that separates candidates who've done this from candidates who've only read about it.

January 15, 2026