Gopi Krishna Tummala


MLOps & Production

The Infrastructure Round

Scaling, serving, and optimizing AI systems. Custom kernels, inference engines, and production infrastructure.

Suggested Learning Path

1. Start with **Datasets and Dataloaders** for efficient data pipelines
2. Move to **Training Frameworks** for distributed training and resilience
3. Explore **vLLM** for serving infrastructure (PagedAttention, Continuous Batching)
4. Finish with **Custom Kernels** for GPU optimization (FlashAttention)

All Posts in This Track

10 articles covering MLOps & Production

Intermediate MLOps & Production
30 MIN READ

ML Pipeline Orchestration: Temporal, Airflow, Kubeflow, Ray — Which Layer Does What

A precise mental model for ML pipeline orchestration—mapping durable backend workflows (Temporal), data schedulers (Airflow, Prefect, Dagster), ML-native pipeline frameworks (Kubeflow, Metaflow, ZenML), and distributed compute engines (Ray). Built for engineers who need to answer 'design an ML pipeline' in interviews. Includes 2025-2026 updates: Airflow 3, KFP v2, Ray 2.x, MLflow 3.

April 2, 2026
Advanced MLOps & Production
45 MIN READ

Training Frameworks: ZeRO, FSDP, and the Memory Math That Gets You Hired

A practitioner's guide to distributed training frameworks — the memory formulas, parallelism strategies, and failure-mode reasoning that ML infra interviews actually test. Covers DDP, FSDP, DeepSpeed ZeRO, 3D parallelism, and fault tolerance.

February 1, 2025
Advanced MLOps & Production
40 MIN READ

Datasets & Dataloaders: The Art of Never Starving Your GPU

GPU utilization is a lagging indicator; the real battle is in the data pipeline. A practitioner's deep dive into PyTorch DataLoader internals, zero-copy data pumps, WebDataset streaming, and the questions these topics raise in ML system design interviews.

January 25, 2025
Advanced MLOps & Production
45 MIN READ

Post-Training Playbook: SFT, LoRA, DPO, and GRPO from First Principles

Pre-training gives a model knowledge; post-training gives it behavior. A practitioner's breakdown of SFT, LoRA/QLoRA, DPO, and GRPO — with the memory math, concrete configs, and interview reasoning that separates candidates who've done this from candidates who've read about it.

January 15, 2026
Advanced MLOps & Production
40 MIN READ

Beyond Inference: Agentic MLOps & The Model Context Protocol (MCP)

From stateless inference to tool-augmented AI agents. Learn how the Model Context Protocol (MCP), secure sandboxes, and holistic versioning enable the next generation of AI systems.

December 18, 2025
Advanced MLOps & Production
40 MIN READ

The Custom Kernel Craze — Handcrafting GPU Performance

Why modern AI teams are handcrafting GPU kernels—from FlashAttention to Triton code—and how silicon-level tuning is the new frontier of MLOps.

November 11, 2025
Advanced MLOps & Production
25 MIN READ

The Infrastructure-First MLOps Roadmap: From Data DNA to Agentic AI

Standard MLOps advice tells you to learn Git and Docker. But for the next generation of AI Engineers, that's just the baseline. This roadmap focuses on the Infrastructure Round—deep-diving into how data is structured for speed, how it's fed into models, how those models scale across clusters, and how we squeeze every drop of performance out of the silicon.

December 18, 2025
Advanced MLOps & Production
40 MIN READ

Life of a Tensor: A Deep Dive into Production Inference

A comprehensive deep dive into production inference optimization, tracing the path of a request through LLM and diffusion model serving systems to expose the bottlenecks from gateway to GPU kernel execution.

January 28, 2025
Advanced MLOps & Production
45 MIN READ

The DNA of Data: Parquet, Arrow, and the Quest for Analytic Speed

The unsung hero of modern data processing is how we structure data itself. Learn how Apache Parquet and Apache Arrow solve the fundamental trade-off between storage efficiency and compute speed.

December 3, 2025
Advanced MLOps & Production
35 MIN READ

vLLM and the Trilogy of Modern LLM Scaling

How PagedAttention, Continuous Batching, Speculative Decoding, and Quantization unlock lightning-fast, reliable large language model serving.

November 10, 2025