Intermediate GenAI Systems
22 MIN READ
When AI Sees and Speaks — The Rise of Vision-Language Models
A high-level view of how modern vision-language models connect pixels and prose, from CLIP and BLIP to Flamingo, MiniGPT-4, Kosmos, and Gemini.