When AI Sees and Speaks — The Rise of Vision-Language Models
A high-level view of how modern vision-language models connect pixels and prose, from CLIP and BLIP to Flamingo, MiniGPT-4, Kosmos, and Gemini.
The System Design Round
Production systems for generative AI: scaling, optimizing, and serving large language models.
8 articles covering GenAI systems
A high-level view of how modern vision-language models connect pixels and prose, from CLIP and BLIP to Flamingo, MiniGPT-4, Kosmos, and Gemini.
A clear introduction to diffusion and guided diffusion — how a simple physical process became a foundation for modern generative AI, from Stable Diffusion to robotics and protein design.
The evolution of image diffusion models from U-Net architectures to Diffusion Transformers (DiT). Covers latent diffusion, the DiT revolution, and the complete image generation pipeline.
Deep dive into state-of-the-art video generation models: Sora, Veo 3, and Open-Sora. Plus motion modeling techniques using optical flow, geometry, and diffusion fields.
How video diffusion models are built through pre-training and aligned through post-training. Covers the billion-frame training problem, DPO, RLHF, and the complete training pipeline.
How to accelerate diffusion sampling and control output quality. Covers DDIM, DPM-Solver, Classifier-Free Guidance (CFG), negative prompting, and inference optimization techniques.
Why video is harder than images, the DiT revolution for video, and how diffusion models learn temporal consistency. Covers V-DiT, AsymmDiT, and the mathematical foundations of video generation.
A deep dive into physics-aware video diffusion models: how researchers inject physical constraints into generative models, the three leading technical approaches, and their practical impact on robotics and scientific simulation.