When AI Sees and Speaks — The Rise of Vision-Language Models
A high level view on how modern vision-language models connect pixels and prose, from CLIP and BLIP to Flamingo, MiniGPT-4, Kosmos, and Gemini.
All the articles with the tag "computer-vision".
A high level view on how modern vision-language models connect pixels and prose, from CLIP and BLIP to Flamingo, MiniGPT-4, Kosmos, and Gemini.
A clear introduction to diffusion and guided diffusion — how a simple physical process became a foundation for modern generative AI, from Stable Diffusion to robotics and protein design.
An intuitive introduction to Variational Autoencoders — how compressing data into probabilistic codes enables machines to generate realistic images, sounds, and structures.
How we used deep learning to automatically calibrate traffic cameras by observing vehicle motion—work that won Best Paper Award at ACM BuildSys 2017.
My research journey from wireless communication foundations to solving the camera calibration bottleneck that enables autonomous vehicle vision.