Gopi Krishna Tummala

Tag: inference

All the articles with the tag "inference".

  • Advanced GenAI Systems
    40 MIN READ

    Sampling & Guidance: The Dialects of Noise

    How to accelerate diffusion sampling and steer creativity. Learn the mechanics of DDIM, DPM-Solver, Classifier-Free Guidance (CFG), and the math of negative prompting.

  • Advanced MLOps & Production
    40 MIN READ

    Life of a Tensor: A Deep Dive into Production Inference

    A deep dive into production inference optimization that traces the path of a request through LLM and diffusion model serving systems and examines the bottlenecks from gateway to GPU kernel execution.

  • Advanced MLOps & Production
    35 MIN READ

    vLLM and the Trilogy of Modern LLM Scaling

    How PagedAttention, Continuous Batching, Speculative Decoding, and Quantization unlock lightning-fast, reliable large language model serving.