Post-Training Playbook: SFT, LoRA, DPO, and GRPO from First Principles
Pre-training gives a model knowledge; post-training gives it behavior. A practitioner's breakdown of SFT, LoRA/QLoRA, DPO, and GRPO — with the memory math, concrete configs, and interview reasoning that separates candidates who've done this from candidates who've read about it.