Life of a Tensor: A Deep Dive into Production Inference
Advanced MLOps & Production 25 min
A deep dive into production inference optimization that traces the path of a request through LLM and diffusion model serving systems and examines the bottlenecks at each stage, from the gateway to GPU kernel execution.