Gopi Krishna Tummala

Tag: inference

All the articles with the tag "inference".

Advanced GenAI Systems
40 MIN READ

Sampling & Guidance: The Dialects of Noise

How to accelerate diffusion sampling and steer creativity. Learn the mechanics of DDIM, DPM-Solver, Classifier-Free Guidance (CFG), and the math of negative prompting.

January 25, 2025

Advanced MLOps & Production
40 MIN READ

Life of a Tensor: A Deep Dive into Production Inference

A deep dive into production inference optimization that traces the path of a request through LLM and diffusion model serving systems and explains the bottlenecks from gateway to GPU kernel execution.

January 28, 2025

Advanced MLOps & Production
35 MIN READ

vLLM and the Trilogy of Modern LLM Scaling

How PagedAttention, Continuous Batching, Speculative Decoding, and Quantization unlock lightning-fast, reliable large language model serving.

November 10, 2025