By Gopi Krishna Tummala
The Story: The Most Under-Appreciated Part
The Analogy: If you don't know where your eyes are relative to your feet, you trip. If you don't know where your cameras are relative to your LiDAR, your perception fails.
Calibration is the most under-appreciated part of the autonomous stack. It's invisible when it works, catastrophic when it fails. A 1-degree error in camera-LiDAR calibration causes roughly a 17 cm error at 10 m distance. At 30 mph, that's the difference between hitting a pedestrian and missing them.
The "Oh S**t" Scenario: The Misaligned Sensor
The Failure Mode: Your vehicle has been driving for 6 months. A camera-LiDAR calibration drifts by 0.5 degrees (thermal expansion, vibration, or a minor impact). You don't notice it; the error is small.
Then you encounter a scenario: a pedestrian is 20 m ahead. Your camera sees them. Your LiDAR sees them. But because of the calibration error, when you project the LiDAR points onto the camera image, they are offset by about 17 cm.
Your fusion algorithm thinks: "The camera sees a person here, but the LiDAR sees something 17 cm away. These don't match. Must be a false positive."
Result: The pedestrian is ignored. Near-miss collision.
Why This Happens:
- Calibration drift: Sensors move relative to each other over time
- No detection: Small errors are hard to detect without explicit monitoring
- Cascading failure: Small calibration errors cause large perception errors
The Solution: Online calibration, which continuously monitors and corrects calibration errors in real time.
Intrinsics: Lens Distortion
Intrinsics describe the internal properties of a camera: how it maps 3D rays to 2D pixels.
The Pinhole Model (Ideal)
The Math:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} $$

Where $K$ is the intrinsic matrix:

$$ K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} $$

Parameters:
- $f_x, f_y$ = focal lengths (in pixels)
- $(c_x, c_y)$ = principal point (image center, in pixels)
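To make the pinhole mapping concrete, here is a minimal numpy sketch; the focal lengths and principal point below are made-up illustrative values, not a real camera's calibration:

```python
import numpy as np

# Illustrative intrinsics (not from a real camera)
K = np.array([[1000.0,    0.0, 640.0],   # f_x, 0,   c_x
              [   0.0, 1000.0, 360.0],   # 0,   f_y, c_y
              [   0.0,    0.0,   1.0]])

def project(K, P):
    """Project a 3D camera-frame point P = (X, Y, Z) to pixel coordinates."""
    uvw = K @ P                 # homogeneous image coordinates
    return uvw[:2] / uvw[2]     # divide by depth Z

# A point 2 m ahead, 0.5 m to the right, 0.1 m up (camera frame: Z forward)
p = project(K, np.array([0.5, -0.1, 2.0]))  # array([890., 310.])
```

Note the divide-by-depth step: this is why a small angular error grows into a larger pixel offset for nearby objects.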
Lens Distortion (Real-World)
The Problem: Real lenses have distortion: straight lines in the world appear curved in the image.
Types of Distortion:
- Radial Distortion: Caused by lens shape (barrel or pincushion)
- Tangential Distortion: Caused by lens misalignment
The Math:

Radial Distortion:

$$ x_{dist} = x \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \quad y_{dist} = y \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) $$

Where:
- $r^2 = x^2 + y^2$ (distance from the image center, in normalized coordinates)
- $k_1, k_2, k_3$ = radial distortion coefficients

Tangential Distortion:

$$ x_{dist} = x + 2 p_1 x y + p_2 (r^2 + 2x^2), \quad y_{dist} = y + p_1 (r^2 + 2y^2) + 2 p_2 x y $$

Where $p_1, p_2$ = tangential distortion coefficients
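The combined radial-plus-tangential model (often called Brown-Conrady) is a few lines of code; the coefficient values in the sketch below are made up purely for illustration:

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Apply radial + tangential distortion to normalized image coords (x, y)."""
    r2 = x * x + y * y                          # squared distance from center
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# Illustrative coefficients: mild barrel distortion pulls points inward
x_d, y_d = distort(0.3, 0.2, k1=-0.1, k2=0.01, k3=0.0, p1=0.001, p2=0.001)
```

Undistortion (going the other way) has no closed form and is typically solved iteratively, which is one reason calibration toolkits precompute undistortion maps.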
The Calibration Problem: Estimate $K$ and the distortion coefficients from images of a known pattern (e.g., a checkerboard).
Extrinsics: Rigid Body Transforms
Extrinsics describe the position and orientation of one sensor relative to another (or relative to the vehicle frame).
The Transform
The Math:

$$ p_{target} = R \, p_{source} + t $$

Where:
- $p_{source}$ = point in the source frame
- $p_{target}$ = point in the target frame
- $R$ = rotation matrix (3×3)
- $t$ = translation vector (3×1)

Example: Transform a LiDAR point to the camera frame:

$$ p_{cam} = R_{cam \leftarrow lidar} \, p_{lidar} + t_{cam \leftarrow lidar} $$
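A minimal numpy sketch of this transform; the rotation (90 degrees about X) and lever arm below are illustrative, not a real calibration:

```python
import numpy as np

# Illustrative extrinsics: rotate 90 degrees about the X axis, plus a small
# mounting offset between the sensors (meters).
R = np.array([[1.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
t = np.array([0.1, -0.2, 0.05])

p_lidar = np.array([5.0, 1.0, 0.5])   # point measured in the LiDAR frame
p_cam = R @ p_lidar + t               # same point in the camera frame
```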
Why This Matters
The Fusion Problem: To fuse camera and LiDAR data, you need to know:
- Where is the LiDAR relative to the camera? (extrinsics)
- How does the camera project 3D to 2D? (intrinsics)
The Error Propagation:

If the calibration is off by a rotation angle $\theta$ (in radians) and a translation distance $d$, then at range $r$:
- Angular error: lateral offset $\approx r\theta$ (for small $\theta$)
- Translation error: constant offset $d$, independent of range

Example: At $r = 20$ m, if $\theta = 1° \approx 0.0175$ rad:

$$ e \approx 20 \,\text{m} \times 0.0175 \approx 0.35 \,\text{m} $$

That's roughly the width of a person. A calibration error can cause you to miss a pedestrian.
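The small-angle arithmetic is worth making explicit in code, since it is the sanity check you run on every calibration budget:

```python
import math

def lateral_error(range_m, angle_deg):
    """Approximate lateral offset e = r * theta caused by a rotation error."""
    return range_m * math.radians(angle_deg)

e = lateral_error(20.0, 1.0)   # 1-degree error at 20 m -> ~0.35 m
```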
Homogeneous Coordinates
The Problem: Rotation and translation are separate operations. This makes composition of transforms awkward.
The Solution: Homogeneous coordinates represent rotation and translation as a single matrix operation.
The Math
Homogeneous Representation:

$$ \begin{bmatrix} p_{target} \\ 1 \end{bmatrix} = T \begin{bmatrix} p_{source} \\ 1 \end{bmatrix} $$

Where the 4×4 transformation matrix $T$ is:

$$ T = \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix} $$

Composition of Transforms:

If you have two transforms $T_{A \to B}$ and $T_{B \to C}$:

$$ T_{A \to C} = T_{B \to C} \, T_{A \to B} $$

The Intuition: Transforming from frame A → B → C is the same as transforming A → C directly.

Example: Transform from LiDAR → Vehicle → Camera:

$$ T_{cam \leftarrow lidar} = T_{cam \leftarrow veh} \, T_{veh \leftarrow lidar} $$
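The chained composition above can be sketched with 4×4 matrices; the two transforms below use identity rotations and made-up offsets, purely for illustration:

```python
import numpy as np

def make_T(R, t):
    """Pack rotation R (3x3) and translation t (3,) into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative mounts (identity rotations, made-up offsets in meters)
T_veh_lidar = make_T(np.eye(3), np.array([1.5, 0.0, 2.0]))    # LiDAR -> vehicle
T_cam_veh = make_T(np.eye(3), np.array([-0.2, 0.1, -1.4]))    # vehicle -> camera

# One matrix now takes LiDAR points straight to the camera frame
T_cam_lidar = T_cam_veh @ T_veh_lidar

p_lidar_h = np.array([10.0, 0.0, 0.0, 1.0])   # homogeneous LiDAR point
p_cam_h = T_cam_lidar @ p_lidar_h
```

The payoff is that an arbitrarily long chain of frames collapses into a single matrix multiply per point.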
SE(3): Lie Groups and Lie Algebras
SE(3) is the Special Euclidean Group: the set of all rigid body transforms (rotations + translations).
Why SE(3) Matters
The Problem: How do you optimize over rotations? Rotation matrices have constraints:
- $R^\top R = I$ (orthonormal)
- $\det(R) = +1$ (no reflection)
These constraints make optimization difficult.
The Solution: Lie Groups and Lie Algebras
Lie Algebra: $\mathfrak{se}(3)$

The Math:

A transform $T \in SE(3)$ can be represented by its Lie algebra element $\xi \in \mathbb{R}^6$:

$$ \xi = \begin{bmatrix} \rho \\ \phi \end{bmatrix} $$

Where:
- $\rho \in \mathbb{R}^3$ = translation component
- $\phi \in \mathbb{R}^3$ = rotation component (axis-angle representation)

The Exponential Map:

$$ T = \exp(\xi^\wedge) $$

Where $\wedge$ is the "hat" operator that converts $\xi$ to a 4×4 matrix.

The Logarithm Map:

$$ \xi = \log(T)^\vee $$
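The rotation part of the exponential map is Rodrigues' formula for $\mathfrak{so}(3)$, sketched below; the full $\mathfrak{se}(3)$ exponential additionally couples translation through the left Jacobian, which is omitted here for brevity:

```python
import numpy as np

def hat(v):
    """Skew-symmetric 'hat' matrix of a 3-vector."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so3(phi):
    """Rodrigues' formula: axis-angle vector phi -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)          # near zero rotation, exp is identity
    A = hat(phi / theta)          # unit-axis skew matrix
    return np.eye(3) + np.sin(theta) * A + (1 - np.cos(theta)) * (A @ A)

R = exp_so3(np.array([0.0, 0.0, np.pi / 2]))   # 90 degrees about Z
```

No matter what 3-vector you feed in, the output is a valid rotation matrix, which is exactly why optimizers prefer this parameterization.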
Why This Helps
Optimization: Instead of optimizing over constrained rotation matrices, you optimize over unconstrained Lie algebra elements $\xi \in \mathbb{R}^6$.
The Calibration Problem:
Minimize the reprojection error:

$$ \xi^* = \arg\min_{\xi} \sum_i \left\| \pi\big(T(\xi)\, P_i\big) - p_i \right\|^2 $$

Where:
- $P_i$ = 3D point (from LiDAR)
- $p_i$ = 2D observation (from camera)
- $\pi$ = projection function
- $T(\xi)$ = transform parameterized by the Lie algebra
Gradient-based optimization (e.g., Levenberg-Marquardt) can now optimize over $\xi$ directly.
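To see this objective in action without a full Levenberg-Marquardt implementation, the sketch below minimizes reprojection error over a single yaw parameter by brute-force search on synthetic data; the points, observations, and search grid are all made up for illustration:

```python
import numpy as np

def rot_z(yaw):
    """Rotation about Z: the yaw slice of the Lie-algebra rotation."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def pi_proj(P):
    """Ideal pinhole projection with f = 1 and principal point at the origin."""
    return P[:2] / P[2]

# Synthetic data: observations generated with a hidden true yaw of 0.1 rad
P_world = np.array([[1.0, 0.5, 4.0], [-0.5, 0.2, 3.0], [0.3, -0.4, 5.0]])
true_yaw = 0.1
obs = np.array([pi_proj(rot_z(true_yaw) @ P) for P in P_world])

def cost(yaw):
    """Sum of squared reprojection errors for a candidate yaw."""
    return sum(float(np.sum((pi_proj(rot_z(yaw) @ P) - o) ** 2))
               for P, o in zip(P_world, obs))

yaws = np.linspace(-0.5, 0.5, 1001)
best_yaw = yaws[int(np.argmin([cost(y) for y in yaws]))]   # recovers ~0.1
```

A real calibrator replaces the grid search with gradient steps over all six components of $\xi$, but the objective being minimized is the same.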
Calibration Methods: Online vs. Offline
Offline Calibration
The Setup: Use a known calibration target (checkerboard, AprilTag, etc.) in a controlled environment.
The Process:
- Collect data: Capture images/point clouds of the calibration target from multiple viewpoints
- Extract features: Detect corners/markers in images, extract points in LiDAR
- Optimize: Minimize reprojection error to estimate calibration parameters
The Math:

$$ \min_{K,\, d,\, T} \sum_i \left\| \pi(K, d, T P_i) - p_i \right\|^2 $$

Where:
- $P_i$ = 3D point on the calibration target
- $p_i$ = observed 2D point in the image
- $\pi$ = projection function (with distortion coefficients $d$)
Pros:
- Accurate (millimeter-level precision)
- Can calibrate all parameters at once
- Well-understood, standard approach
Cons:
- Requires controlled environment
- Time-consuming (30-60 minutes)
- Doesn't handle calibration drift
Online Calibration
The Setup: Continuously estimate calibration from natural scene observations (no calibration target needed).
The Process:
- Detect correspondences: Find matching features between camera and LiDAR
- Estimate transform: Use RANSAC or optimization to find best transform
- Monitor drift: Track calibration over time, detect when it drifts
The Math:

Correspondence-based:

Given correspondences $\{(P_i, p_i)\}$:

$$ T^* = \arg\min_T \sum_i \left\| \pi(T P_i) - p_i \right\|^2 $$

RANSAC: Robustly handle outliers (false correspondences).
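Here is a minimal RANSAC sketch for the 3D-3D version of this problem (e.g., LiDAR points matched to camera-derived 3D points), using the Kabsch algorithm on minimal three-point samples; the iteration count, threshold, and synthetic data are illustrative:

```python
import numpy as np

def kabsch(A, B):
    """Best-fit R, t with B ~ R @ A + t, for N x 3 point sets A, B."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # guard against reflections
    return R, cb - R @ ca

def ransac_rigid(A, B, iters=200, thresh=0.05, seed=0):
    """Robustly estimate a rigid transform from correspondences with outliers."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(A), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(A), size=3, replace=False)   # minimal sample
        R, t = kabsch(A[idx], B[idx])
        mask = np.linalg.norm(A @ R.T + t - B, axis=1) < thresh
        if mask.sum() > best_mask.sum():
            best_mask = mask
    R, t = kabsch(A[best_mask], B[best_mask])   # refit on all inliers
    return R, t, int(best_mask.sum())

# Synthetic usage: 10 correspondences, one corrupted into a gross outlier
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
B = A @ R_true.T + t_true
B[0] += np.array([5.0, 0.0, 0.0])              # bad correspondence
R_est, t_est, n_inliers = ransac_rigid(A, B)   # recovers R_true, t_true
```

The final refit on the inlier set is the step that restores accuracy after the minimal samples have done the outlier rejection.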
Pros:
- No calibration target needed
- Handles calibration drift
- Can run continuously
Cons:
- Less accurate than offline (centimeter-level)
- Requires good correspondences
- Computationally expensive
Production Practice: A hybrid approach is commonly used:
- Offline calibration at factory (high accuracy baseline)
- Online calibration in the field (monitor and correct drift)
Time Synchronization: PTP and Timestamps
The Problem: Different sensors capture data at different times. If you fuse camera and LiDAR data, but the camera image is from time $t$ and the LiDAR scan is from time $t + 50$ ms, you're fusing data from different moments.
The Real-World Twist: At 30 mph, you travel 2.2 feet in 50 ms. If you fuse a camera image with a LiDAR scan that's 50 ms later, objects will be misaligned by up to 2.2 feet.
PTP (Precision Time Protocol)
The Setup: All sensors synchronize to a master clock using PTP (IEEE 1588).
The Math:

Clock Synchronization: PTP exchanges four timestamps ($t_1$: Sync sent by the master, $t_2$: Sync received by the slave, $t_3$: Delay_Req sent by the slave, $t_4$: Delay_Req received by the master):

$$ \text{offset} = \frac{(t_2 - t_1) - (t_4 - t_3)}{2}, \qquad \text{delay} = \frac{(t_2 - t_1) + (t_4 - t_3)}{2} $$

Where:
- offset = clock offset of the slave relative to the master (measured via PTP)
- delay = one-way network delay (measured via PTP)

PTP achieves microsecond-level synchronization, which is good enough for sensor fusion.
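The offset/delay computation from the IEEE 1588 handshake is two lines; the timestamps in the sketch below are made-up values where the slave clock runs 1 ms ahead and the true one-way delay is 0.1 ms:

```python
def ptp_offset_delay(t1, t2, t3, t4):
    """IEEE 1588 offset/delay from the four handshake timestamps (seconds).

    t1: master sends Sync        t2: slave receives it
    t3: slave sends Delay_Req    t4: master receives it
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2.0    # one-way network delay
    return offset, delay

# Illustrative values: slave 1 ms ahead, 0.1 ms one-way delay
offset, delay = ptp_offset_delay(0.0, 0.0011, 0.0020, 0.0011)
```

The derivation assumes the network delay is symmetric in both directions; asymmetric paths bias the offset estimate, which is why timing-critical setups keep sensor networks short and switched with PTP-aware hardware.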
Timestamping
The Process:
- Hardware timestamping: Each sensor timestamps data at capture time (not processing time)
- PTP synchronization: All timestamps are in the same time reference
- Temporal alignment: When fusing, align data by timestamp (not by arrival time)
The Challenge: Different sensors have different latencies:
- Camera: 10-20ms (readout + processing)
- LiDAR: 50-100ms (scan time)
- Radar: 5-10ms (processing)
The Solution: Predict forward or interpolate backward to align timestamps.
Example: If the camera image is at time $t$ and the LiDAR scan is at $t - 50$ ms:
- Option 1: Predict the LiDAR points forward to time $t$ (using a motion model)
- Option 2: Interpolate the camera image backward to time $t - 50$ ms (not possible, so use the closest frame)
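Option 1 can be sketched with a constant-velocity motion model; for a static scene, points appear to move backward as the ego vehicle moves forward. The ego velocity and time gap below are illustrative:

```python
import numpy as np

def predict_points(points, ego_velocity, dt):
    """Shift static-scene points by -v * dt to account for ego motion over dt."""
    return points - ego_velocity * dt

points_t0 = np.array([[10.0, 0.0, 0.0],    # meters, vehicle frame (X forward)
                      [20.0, 1.0, 0.0]])
v_ego = np.array([13.4, 0.0, 0.0])          # ~30 mph forward
aligned = predict_points(points_t0, v_ego, dt=0.05)   # bridge the 50 ms gap
```

Moving objects need their own velocity estimates on top of this; ego compensation alone only fixes the static background.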
The Intuition: Laser Pointer on a Unicycle
The Analogy: You're holding a laser pointer while riding a unicycle. The laser pointer jitters. Is the jitter because:
- Your hand is shaking? (sensor noise)
- The unicycle is wobbling? (vehicle motion)
- Both? (combined effect)
The Calibration Problem: Similarly, if a LiDAR point appears to move, is it because:
- The LiDAR measurement is noisy? (sensor noise)
- The vehicle is moving? (ego motion)
- The calibration is wrong? (extrinsic error)
- All of the above? (combined effect)
The Solution: Motion compensation: account for vehicle motion, then analyze the residual error to detect calibration drift.
The Math:

Motion Compensation:

$$ p_{corrected} = T_{ego}(t_1 \to t_2)\; p_{raw} $$

Where $T_{ego}(t_1 \to t_2)$ accounts for vehicle motion between timesteps $t_1$ and $t_2$.

Calibration Monitoring:

If the calibration is correct, then after motion compensation, corresponding points from camera and LiDAR should align. If they don't, the calibration has drifted.
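A toy drift monitor, assuming we already have matched camera features and motion-compensated, projected LiDAR points; the pixel values and the 2-pixel threshold are illustrative numbers to tune per camera:

```python
import numpy as np

def drift_score(projected_lidar_px, camera_feature_px):
    """Mean pixel residual between projected LiDAR points and matched features."""
    residuals = np.linalg.norm(projected_lidar_px - camera_feature_px, axis=1)
    return float(residuals.mean())

# Illustrative matches: a consistent 3-pixel shift hints at extrinsic drift
proj = np.array([[100.0, 50.0], [200.0, 80.0], [310.0, 120.0]])
feat = np.array([[103.0, 50.0], [203.0, 80.0], [313.0, 120.0]])

score = drift_score(proj, feat)
drifted = score > 2.0   # pixels; threshold tuned per camera in practice
```

The telltale signature of drift is a *systematic* residual (all points shifted the same way), as opposed to the zero-mean scatter produced by sensor noise.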
Summary: The Bedrock of Perception
Calibration is the foundation of sensor fusion:
- Intrinsics: Know how each sensor maps the world to measurements
- Extrinsics: Know where sensors are relative to each other
- Time sync: Know when each sensor captured its data
- Monitoring: Continuously verify calibration hasnβt drifted
The Path Forward:
With calibrated sensors, we can now:
- Fuse sensor data (Module 6)
- Localize the vehicle (Module 4)
- Detect objects (Module 5)
- Track them over time (Module 6)
Graduate Assignment: Camera-LiDAR Calibration
Task: Implement offline calibration between a camera and LiDAR.
Setup:
- Calibration target: Checkerboard (known 3D positions)
- Data: Camera images + LiDAR point clouds of the checkerboard
Deliverables:
- Extract checkerboard corners from camera images
- Extract checkerboard points from LiDAR scans
- Implement optimization to estimate (extrinsics) and (intrinsics)
- Visualize: Project LiDAR points onto camera image β do they align?
Extension: Implement online calibration using natural scene correspondences.
Further Reading
- Module 2: Eyes and Ears (Sensors)
- Module 6: Merging Senses (Sensor Fusion & Tracking)
- AutoCalib Research: Automatic Camera Calibration
This is Module 3 of "The Ghost in the Machine" series. Module 4 will explore localization: knowing where you are in the world with centimeter-level accuracy.