Gopi Krishna Tummala

ML Cheatsheets

Visual mindmaps for machine learning concepts - Complete ML course coverage

Intro to ML

Motivation, applications, types of learning, history, and ML pipeline

🔹 Intro to ML
Motivation & Applications
Why ML?
Rule-based systems fail on complex data
Data is abundant & cheap
Human expertise is scarce/expensive
Killer Apps
Computer Vision
NLP / LLMs
Recommenders
Healthcare, Finance, Robotics
Types of Learning
Supervised
Labeled data
Regression (continuous)
Classification (discrete)
Unsupervised
No labels → discover patterns
Reinforcement
Agent + environment + rewards
Semi-/Self-supervised
Leverage unlabeled data heavily
History & Milestones
1950s–60s: Perceptron, early neural nets
1980s: Backpropagation
1990s: SVMs, Boosting
2010s: Deep Learning revolution (AlexNet → Transformers)
2020s: Foundation models, multimodal, agents
ML Pipeline (End-to-End)
Problem → Data → Features → Model → Eval → Deploy → Monitor
Bias–Variance & Generalization
Bias: Underfitting (high training error)
Variance: Overfitting (low training error, high test error)
Decomposition: Error = Bias² + Variance + Noise
Goal: Minimize test error (generalization)
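A minimal simulation sketch of the bias-variance decomposition above, assuming synthetic sin(x) data and polynomial regression with numpy/scikit-learn; the degrees, noise level, and sample sizes are illustrative:

```python
# Minimal sketch: estimating bias^2 and variance empirically on synthetic data.
# The "true" function f(x) = sin(x), noise level, and polynomial degrees are illustrative.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
f = np.sin
x_test = np.linspace(0, 2 * np.pi, 50)
noise_sigma = 0.3

def experiment(degree, n_trials=200, n_train=30):
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        x = rng.uniform(0, 2 * np.pi, n_train)
        y = f(x) + rng.normal(0, noise_sigma, n_train)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x[:, None], y)
        preds[t] = model.predict(x_test[:, None])
    bias_sq = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)   # (avg prediction - truth)^2
    variance = preds.var(axis=0).mean()                        # spread across training sets
    return bias_sq, variance

for degree in (1, 4, 15):
    b2, var = experiment(degree)
    # Expected test MSE ≈ bias^2 + variance + noise_sigma^2
    print(f"degree={degree:2d}  bias^2={b2:.3f}  variance={var:.3f}")
```

Low-degree fits show high bias and low variance; high-degree fits show the reverse, matching the underfitting/overfitting picture above.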

Data & Evaluation

Data preprocessing, train/val/test splits, cross-validation, and evaluation metrics

🔹 Data & Evaluation
Data Preprocessing
Cleaning
Missing values (impute / drop)
Outliers
Scaling
Min-Max, Standard, Robust, Quantile
Encoding
One-hot, Label, Target, Embeddings
Train/Val/Test Split
Simple 70/15/15 or 80/10/10
Stratified (for imbalance)
Time-series / Group splits
Cross-Validation
k-Fold, Stratified k-Fold
Leave-One-Out, Repeated CV
Nested CV (hyperparam tuning)
Evaluation Metrics
Classification
Accuracy, Precision, Recall, F1
ROC-AUC, PR-AUC
Confusion matrix, Calibration
Regression
MSE, RMSE, MAE, R², Adjusted R²
Ranking / Retrieval
NDCG, MAP, MRR
Imbalanced / Real-world
Fβ, Cohen's Kappa, Matthews Corr.
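A minimal scikit-learn sketch wiring these pieces together: scaling, a stratified train/test split, stratified k-fold cross-validation, and a few classification metrics. The breast-cancer dataset and logistic-regression model are illustrative choices:

```python
# Minimal sketch: preprocessing + stratified split + k-fold CV + metrics with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)      # stratified 80/20 split

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="roc_auc")
print("5-fold ROC-AUC:", scores.mean().round(3), "+/-", scores.std().round(3))

pipe.fit(X_train, y_train)
proba = pipe.predict_proba(X_test)[:, 1]
print(classification_report(y_test, pipe.predict(X_test)))  # precision / recall / F1
print("test ROC-AUC:", round(roc_auc_score(y_test, proba), 3))
```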

Classical Supervised Learning

Linear models, regularization, decision trees, and k-NN

🔹 Classical Supervised Learning
Linear Models
Linear Regression
Closed form: ŵ = (XᵀX)⁻¹Xᵀy
Assumptions (linearity, homoscedasticity, independence)
Logistic Regression
Sigmoid + Cross-entropy
Multiclass: Softmax
Regularization
L2 (Ridge) → shrinks coefficients
L1 (Lasso) → sparsity / feature selection
Elastic Net (L1 + L2)
Decision Trees
Splitting: Gini / Entropy / MSE
Pruning: Cost-complexity, Reduced-error
Pros: Interpretable, non-linear
Cons: Unstable, greedy
k-Nearest Neighbors
Distance metrics: Euclidean, Manhattan, Cosine
Curse of dimensionality
Weighted k-NN, Approximate NN (FAISS, HNSW)
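A quick numpy sketch of the closed-form solutions above on a synthetic design matrix; ridge adds λI before solving, which is what shrinks the coefficients (data and λ are illustrative):

```python
# Minimal sketch: ordinary least squares and ridge regression via the normal equations.
# In practice prefer np.linalg.lstsq or scikit-learn for numerical stability.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.0, 0.7, 0.0])
y = X @ true_w + rng.normal(0, 0.1, n)

# OLS:   w = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: w = (X^T X + lam * I)^{-1} X^T y   (shrinks coefficients toward 0)
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("OLS  :", w_ols.round(2))
print("Ridge:", w_ridge.round(2))
```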

Stats & Learning Theory

Generalization bounds, VC dimension, bias-variance, and PAC learning

🔹 Stats & Learning Theory
Generalization Bounds
Hoeffding / Chernoff
Uniform convergence
VC Dimension
Shattering
Linear classifiers: VC = d+1
Sample complexity ≈ O(VC / ε²) (agnostic case)
Bias-Variance Decomposition
E[(y − ŷ)²] = Bias² + Var + σ²
PAC Learning
Probably Approximately Correct
Agnostic PAC, Realizable case
Other Key Ideas
No Free Lunch Theorem
Occam's Razor
Double Descent (modern view)
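A small simulation to make the Hoeffding bound concrete, assuming Bernoulli samples and illustrative values of n and ε; the empirical deviation frequency should stay below the 2·exp(−2nε²) bound:

```python
# Minimal sketch: checking the two-sided Hoeffding bound
#   P(|mean - p| > eps) <= 2 * exp(-2 * n * eps^2)
# on Bernoulli(p) samples. p, n, eps, and the trial count are illustrative.
import numpy as np

rng = np.random.default_rng(0)
p, n, eps, trials = 0.3, 100, 0.1, 100_000

samples = rng.binomial(1, p, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - p)
empirical = (deviations > eps).mean()
bound = 2 * np.exp(-2 * n * eps**2)

print(f"empirical P(|mean - p| > {eps}) = {empirical:.4f}")
print(f"Hoeffding bound                 = {bound:.4f}")   # bound >= empirical frequency
```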

Advanced Classical Models

Support Vector Machines and Bayesian methods

🔹 Advanced Classical Models
Support Vector Machines
Max-margin hyperplane
Soft-margin (slack + C)
Kernel Trick
RBF: exp(−γ‖x−x′‖²)
Polynomial, Sigmoid
Bayesian Methods
Bayes Rule
P(θ|D) ∝ P(D|θ)P(θ)
Naive Bayes
Gaussian, Multinomial, Bernoulli
Bayesian Networks
DAG + CPDs
Exact inference (variable elimination)
Approximate (MCMC, variational)
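A minimal sketch contrasting a soft-margin RBF-kernel SVM with Gaussian Naive Bayes, using scikit-learn on an illustrative toy dataset:

```python
# Minimal sketch: RBF-kernel SVM vs Gaussian Naive Bayes on a non-linear toy problem.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft-margin SVM: C penalizes slack, gamma sets the RBF width exp(-gamma * ||x - x'||^2)
svm = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X_tr, y_tr)
nb = GaussianNB().fit(X_tr, y_tr)       # Naive Bayes: features independent given the class

print("SVM (RBF) test accuracy :", round(svm.score(X_te, y_te), 3))
print("GaussianNB test accuracy:", round(nb.score(X_te, y_te), 3))
```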

Ensemble Methods

Bagging, Random Forests, Boosting, and modern implementations

🔹 Ensemble Methods
Bagging
Bootstrap + aggregate
Reduces variance
Random Forests
Bagging + random feature subsets
OOB error, feature importance
Boosting
AdaBoost (exponential loss, weights)
Gradient Boosting (fit residuals)
Modern Boosting (Industry Standard)
XGBoost (regularized, approx splits, DART)
LightGBM (histogram, leaf-wise, GOSS)
CatBoost (ordered boosting, native categoricals)
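A minimal scikit-learn sketch contrasting bagging (random forest, with its OOB estimate) and boosting (gradient boosting); the dataset and hyperparameters are illustrative, not tuned:

```python
# Minimal sketch: bagging (random forest) vs boosting (gradient boosting) with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, random_state=0).fit(X_tr, y_tr)

print("Random forest OOB score   :", round(rf.oob_score_, 3))   # out-of-bag estimate
print("Random forest test acc    :", round(rf.score(X_te, y_te), 3))
print("Gradient boosting test acc:", round(gb.score(X_te, y_te), 3))
```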

Tree-Based Machine Learning

A comprehensive overview of decision trees, ensemble methods, and boosting algorithms

Tree-Based Machine Learning
Decision Trees
Structure
Root Node
Internal Nodes
Leaf Nodes
Depth / Height
Types
Classification Tree
Regression Tree
Splitting Criteria
Classification
Gini Impurity
Entropy
Information Gain
Regression
MSE
MAE
Variance Reduction
Stopping Criteria
Max Depth
Min Samples Split
Min Samples Leaf
Pure Node
Pruning
Pre-pruning
Post-pruning
Issues
Overfitting
High Variance
Sensitive to Noise
Bias-Variance Tradeoff
Deep Tree → Low Bias, High Variance
Shallow Tree → High Bias, Low Variance
Ensembles Reduce Variance
Ensemble Methods
Bagging
Bootstrap Sampling
Parallel Training
Majority Vote / Averaging
Reduces Variance
Random Forest
Bagging + Feature Randomness
Random Feature Subset per Split
OOB Error
Feature Importance
Extra Trees
Random Thresholds
More Randomness
Lower Variance
Boosting
Core Idea
Sequential Learning
Focus on Errors
Weak Learners
AdaBoost
Reweight Samples
Weighted Voting
Gradient Boosting
Fit Residuals
Gradient Descent in Function Space
Learning Rate
Additive Model
Regularization
Learning Rate
Number of Trees
Max Depth
Subsampling
XGBoost
Regularized Objective
Tree Pruning
Second-order Gradients
Missing Value Handling
LightGBM
Leaf-wise Growth
Histogram Splitting
GOSS Sampling
CatBoost
Native Categorical Handling
Ordered Boosting
Target Leakage Reduction
Interpretability
Feature Importance
Impurity-based
Permutation Importance
SHAP Values
Partial Dependence Plots
Decision Path Visualization
Practical Considerations
No Feature Scaling Needed
Handles Mixed Data Types
Strong for Tabular Data
Poor Extrapolation
Memory Heavy for Large Forests
Complexity
Single Tree Training ≈ O(d · n log n)
Boosting: Sequential → Slower to Train
Random Forest: Trees Train in Parallel
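A from-scratch sketch of the core boosting loop for squared loss: each shallow tree fits the current residuals and predictions accumulate with a learning rate (the data, tree depth, and learning rate are illustrative):

```python
# Minimal sketch of gradient boosting for squared loss: each tree fits the residuals,
# and predictions are accumulated with a learning rate.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 400)

learning_rate, n_rounds = 0.1, 100
pred = np.full_like(y, y.mean())        # initial constant model
trees = []

for _ in range(n_rounds):
    residuals = y - pred                # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    pred += learning_rate * tree.predict(X)
    trees.append(tree)

print("training MSE:", round(np.mean((y - pred) ** 2), 4))
```

This is the additive-model idea in miniature; libraries like XGBoost and LightGBM add second-order gradients, regularization, and much faster split finding on top of it.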

Optimization for ML

Gradient descent variants, advanced optimizers, learning rate strategies

🔹 Optimization for ML
Gradient Descent Variants
Batch GD
SGD (noisy but fast)
Mini-batch (sweet spot)
Advanced Optimizers
Momentum
RMSProp / AdaGrad
Adam (β1=0.9, β2=0.999)
AdamW; newer: Lion, Sophia
Learning Rate Strategies
Step decay, Exponential, Cosine
Warmup + decay (common in transformers)
One-cycle policy
Convex vs Non-Convex
Convex → global optimum
Non-convex → local minima, saddles, plateaus
Second-Order Methods
Newton, Quasi-Newton (BFGS, L-BFGS)
Limited by scale
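A toy numpy sketch of three update rules on an anisotropic quadratic, assuming illustrative step sizes and β values; it mirrors the standard momentum and Adam formulas but is not meant as a production optimizer:

```python
# Minimal sketch: vanilla GD, momentum, and Adam update rules on a toy quadratic loss.
import numpy as np

def grad(w):                      # gradient of an anisotropic quadratic f(w) = 0.5 * w^T diag(10,1) w
    return np.array([10.0, 1.0]) * w

w_gd = w_mom = w_adam = np.array([1.0, 1.0])
v = np.zeros(2)                   # momentum buffer
m, s = np.zeros(2), np.zeros(2)   # Adam first / second moment estimates
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8

for t in range(1, 101):
    w_gd = w_gd - lr * grad(w_gd)                             # plain gradient descent

    v = beta1 * v + grad(w_mom)                               # momentum (heavy ball)
    w_mom = w_mom - lr * v

    g = grad(w_adam)                                          # Adam
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g**2
    m_hat, s_hat = m / (1 - beta1**t), s / (1 - beta2**t)     # bias correction
    w_adam = w_adam - lr * m_hat / (np.sqrt(s_hat) + eps)

print("GD      :", w_gd.round(4))
print("Momentum:", w_mom.round(4))
print("Adam    :", w_adam.round(4))
```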

Modern Deep Learning

Neural networks, activations, training, architectures, and representation learning

🔹 Modern Deep Learning
Neural Networks Basics
Perceptron → MLP
Universal approximation
Activation Functions
ReLU family (ReLU, Leaky, GELU, Swish)
Avoid vanishing gradients
Training
Backpropagation + Chain rule
Initialization (He, Xavier)
Batch Norm / Layer Norm / Group Norm
Architectures
CNNs (ResNet, EfficientNet, ConvNeXt)
RNNs → LSTMs/GRUs
Transformers (Self-attention, Multi-head, Positional encoding)
Representation Learning
Embeddings (Word2Vec → BERT → modern LLMs)
Contrastive learning (SimCLR, CLIP)
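A minimal numpy sketch of single-head scaled dot-product self-attention, the core operation inside transformers; the shapes and random projection matrices are illustrative:

```python
# Minimal sketch of scaled dot-product self-attention (single head) in numpy.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 6, 16, 8

X = rng.normal(size=(seq_len, d_model))          # token embeddings (plus positional encodings in practice)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) similarity scores
attn = softmax(scores, axis=-1)                  # each row sums to 1
out = attn @ V                                   # weighted mix of value vectors

print("attention weights shape:", attn.shape, "output shape:", out.shape)
```

Multi-head attention runs several such projections in parallel and concatenates the outputs.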

Unsupervised Learning

Clustering, dimensionality reduction, and generative models

🔹 Unsupervised Learning
Clustering
k-Means (Lloyd's, elbow, silhouette)
Hierarchical (agglomerative + dendrogram)
DBSCAN (density-based)
GMM (soft clustering)
Dimensionality Reduction
PCA (linear, variance-max)
t-SNE (perplexity, KL divergence)
UMAP (faster, better topology preservation)
Generative Models
Autoencoders (undercomplete, denoising, VAE)
GANs (minimax game; modern variants like StyleGAN)
Diffusion models (iterative denoising)
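A short scikit-learn sketch chaining PCA (2-D projection) with k-means and a silhouette check; the iris dataset and k=3 are illustrative choices:

```python
# Minimal sketch: PCA for a 2-D projection, then k-means clustering with a silhouette score.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)
print("silhouette score:", round(silhouette_score(X2, km.labels_), 3))
```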

Probabilistic & Graphical Models

Mixture models, EM algorithm, Markov models, and Bayesian networks

🔹 Probabilistic & Graphical Models
Mixture Models & EM
Gaussian Mixture Models
EM Algorithm (E-step: responsibilities, M-step: MLE)
Markov Models
HMMs (Forward-Backward, Viterbi)
Markov Random Fields
Bayesian Networks
Structure learning
Inference (exact vs approximate)
Modern Connections
Probabilistic programming (Pyro, NumPyro)
Diffusion models as hierarchical latent-variable models
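A minimal EM sketch for a 1-D, two-component Gaussian mixture: the E-step computes responsibilities, the M-step re-estimates weights, means, and variances by weighted MLE (the data and initial guesses are illustrative):

```python
# Minimal sketch of EM for a 1-D, two-component Gaussian mixture.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])

pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibilities r[i, k] = P(component k | x_i)
    dens = np.stack([norm.pdf(x, mu[k], sigma[k]) for k in range(2)], axis=1)
    r = pi * dens
    r /= r.sum(axis=1, keepdims=True)

    # M-step: weighted maximum-likelihood updates
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)

print("weights:", pi.round(2), "means:", mu.round(2), "stds:", sigma.round(2))
```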

Modern Topics / Extensions

Self-supervised learning, meta-learning, federated learning, RL, and continual learning

🔹 Modern Topics / Extensions
Self-Supervised Learning
Contrastive (SimCLR, MoCo)
Masked modeling (BERT, MAE)
BYOL, SimSiam, DINO
Meta-Learning
Few-shot: MAML, Reptile, ProtoNets
Optimization-based vs metric-based
Federated Learning
FedAvg, FedProx
Privacy (differential privacy, secure aggregation)
Reinforcement Learning
MDPs, Q-Learning, Policy Gradients
Modern: PPO, SAC, Dreamer, AlphaZero-style self-play
Continual / Lifelong Learning
Catastrophic forgetting
Replay buffers, EWC, GEM
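A tiny tabular Q-learning sketch on a hand-rolled 5-state chain MDP (the environment, ε, α, and γ values are illustrative), showing the update Q(s,a) ← Q(s,a) + α[r + γ·max Q(s′,·) − Q(s,a)]:

```python
# Minimal sketch: tabular Q-learning on a 5-state chain MDP (reward only at the right end).
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

for episode in range(500):
    s = 0
    for _ in range(50):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())   # epsilon-greedy
        s_next, r, done = step(s, a)
        # Q-learning target: r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if done:
            break

print("greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```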

Interpretability & Fairness

Interpretability methods, SHAP, fairness definitions, and responsible AI

🔹 Interpretability & Fairness
Interpretability Toolbox
Intrinsic: Decision trees, linear models
Post-hoc: Feature importance, Partial Dependence Plots
Model-Agnostic Methods
LIME (local surrogate)
SHAP (Shapley values, KernelSHAP, TreeSHAP)
Fairness
Definitions
Demographic Parity
Equalized Odds
Equal Opportunity
Mitigation
Pre-processing, In-processing, Post-processing
Responsible AI
Bias detection, Adversarial debiasing, Explainable AI regulations
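A minimal sketch computing two of the fairness gaps above, demographic parity and equal opportunity, from synthetic predictions and a synthetic sensitive attribute (all values are illustrative):

```python
# Minimal sketch: demographic parity and equal-opportunity (TPR) gaps from predictions.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                    # sensitive attribute A (0 / 1)
y_true = rng.binomial(1, 0.4 + 0.1 * group)      # base rates differ slightly by group
y_pred = rng.binomial(1, 0.3 + 0.2 * group)      # a deliberately biased classifier's outputs

def positive_rate(mask):
    return y_pred[mask].mean()

# Demographic parity: compare P(y_hat = 1 | A = 0) with P(y_hat = 1 | A = 1)
dp_gap = abs(positive_rate(group == 0) - positive_rate(group == 1))

# Equal opportunity: compare true-positive rates P(y_hat = 1 | y = 1, A = a)
tpr0 = positive_rate((group == 0) & (y_true == 1))
tpr1 = positive_rate((group == 1) & (y_true == 1))

print(f"demographic parity gap      : {dp_gap:.3f}")
print(f"equal opportunity (TPR) gap : {abs(tpr0 - tpr1):.3f}")
```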

Scaling & Production ML

Large-scale training, hyperparameter tuning, MLOps, and AutoML

🔹 Scaling & Production ML
Large-Scale Training
Data parallelism, Model parallelism, Pipeline
ZeRO, FSDP, DeepSpeed, Megatron
Hyperparameter Tuning
Grid / Random search
Bayesian optimization (Optuna, Hyperopt)
Neural Architecture Search (e.g., DARTS)
MLOps / Production
Experiment tracking (MLflow, Weights & Biases)
Model serving (TorchServe, TF Serving, vLLM)
Monitoring (data drift, concept drift, performance)
Feature stores (Feast, Tecton)
AutoML
Full pipelines: Auto-sklearn, H2O, Google AutoML
Modern: LLM-powered (e.g., AutoGPT-style agents)
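A minimal random-search tuning sketch with scikit-learn's RandomizedSearchCV; the model and search space are illustrative, and Bayesian optimizers such as Optuna follow the same objective-driven loop:

```python
# Minimal sketch: random-search hyperparameter tuning with scikit-learn.
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_dist = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(3, 15),
    "max_features": uniform(0.2, 0.8),      # fraction of features considered per split
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=20, cv=3, scoring="roc_auc", random_state=0, n_jobs=-1)
search.fit(X, y)

print("best CV ROC-AUC:", round(search.best_score_, 3))
print("best params    :", search.best_params_)
```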

Project & Research Skills

Problem formulation, experiment design, model selection, and research mindset

🔹 Project & Research Skills
Problem Formulation
Define task, success metric, baseline
Literature review (arXiv, PapersWithCode)
Experiment Design
Ablation studies
Statistical significance (t-tests, bootstrap)
Reproducibility (seeds, Docker, Hydra)
Model Selection & Deployment
Tradeoffs: accuracy vs latency vs cost
A/B testing, Canary releases
Research Mindset
Reproducibility crisis awareness
Ethics & societal impact
Open-source contribution
Writing papers, blogging, presenting
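A small sketch of a paired bootstrap for statistical significance: a 95% confidence interval on the accuracy difference between two models evaluated on the same test set (all predictions here are synthetic and illustrative):

```python
# Minimal sketch: paired bootstrap CI for the accuracy difference between two models.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y_true = rng.integers(0, 2, n)
pred_a = np.where(rng.random(n) < 0.85, y_true, 1 - y_true)   # model A, roughly 85% accurate
pred_b = np.where(rng.random(n) < 0.82, y_true, 1 - y_true)   # model B, roughly 82% accurate

diffs = []
for _ in range(5000):
    idx = rng.integers(0, n, n)                                # resample test examples with replacement
    diffs.append((pred_a[idx] == y_true[idx]).mean() - (pred_b[idx] == y_true[idx]).mean())

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"accuracy(A) - accuracy(B): 95% CI = [{lo:.3f}, {hi:.3f}]")   # interval excluding 0 suggests a real difference
```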