Gopi Krishna Tummala

Tag: computer-vision

All the articles with the tag "computer-vision".

Intermediate GenAI Systems
22 MIN READ

When AI Sees and Speaks — The Rise of Vision-Language Models

A high-level view of how modern vision-language models connect pixels and prose, from CLIP and BLIP to Flamingo, MiniGPT-4, Kosmos, and Gemini.

November 9, 2025
Advanced GenAI Systems
45 MIN READ

Diffusion — From Molecules to Machines

A deep dive into the physics and probability of diffusion models. Learn how reversing a stochastic process became the foundation for modern generative AI, from Stable Diffusion to robotics and protein design.

January 15, 2025
Advanced GenAI Systems
40 MIN READ

Image Diffusion Models: From U-Net to DiT

The evolution of image diffusion architectures. Learn how we moved from convolutional U-Nets to scalable Diffusion Transformers (DiT), and why treating images like language changed everything.

January 25, 2025
Advanced GenAI Systems
45 MIN READ

The Frontier: Sora, Veo, and the Future of Video

Exploring the state of the art in video generation. Learn how Sora and Veo use spatiotemporal transformers to simulate the physical world, and why perfect motion fidelity remains a challenge.

January 25, 2025
Advanced GenAI Systems
40 MIN READ

Physics-Aware Video Diffusion: From Pixels to Laws

How to move from visual imitation to law-governed motion. A deep dive into injecting PDEs into neural networks, implicit physics extraction, and LLM-guided physical reasoning.

January 20, 2025
Advanced GenAI Systems
45 MIN READ

The Training Lifecycle: From Noise to Nuance

How to train a world-class diffusion model. Covers the complete lifecycle: from large-scale pre-training on noisy web data to specialized post-training, alignment, and aesthetic fine-tuning.

January 25, 2025