
Module 2: Turning a Photo into a Blueprint: The Perspective Problem

By Gopi Krishna Tummala


Satellite Photogrammetry Course

The Geometry Challenge: The World is a Funhouse Mirror

A photograph is not an accurate map. It’s a perspective projection: a flat piece of art that tricks your brain into seeing depth. Tall buildings look like they’re leaning, and distant mountains appear smaller than they really are. This happens because the camera sees the world from a single point in space.

For a satellite image to become a measurable blueprint, we need to mathematically reverse this distortion. We must answer a critical question: if this pixel $(x, y)$ is on the sensor, where exactly is the point $(X, Y, Z)$ on the ground that created it?

To do this, we need three key pieces of information, all known with extreme precision:

  1. The Satellite’s Location $(X_C, Y_C, Z_C)$: Where the camera was in space.
  2. The Satellite’s Angle $(m_{ij})$: How the camera was tilted (roll, pitch, yaw).
  3. The Camera’s Insides $(f, x_0, y_0)$: Its focal length and sensor center.

💡 The Math Hook: The Collinearity Equations

The reversal process is governed by the Collinearity Equations. Don’t be scared by the subscripts and matrix notation; the concept is elegant. They simply state the line-of-sight rule:

The center of the camera lens, the pixel on the sensor, and the corresponding point on the ground MUST all fall on one straight line (they are collinear).

We use these equations to link the 2D image coordinates $(x, y)$ to the 3D ground coordinates $(X, Y, Z)$:

$$x - x_0 = -f \, \frac{m_{11}(X - X_C) + m_{12}(Y - Y_C) + m_{13}(Z - Z_C)}{m_{31}(X - X_C) + m_{32}(Y - Y_C) + m_{33}(Z - Z_C)}$$

$$y - y_0 = -f \, \frac{m_{21}(X - X_C) + m_{22}(Y - Y_C) + m_{23}(Z - Z_C)}{m_{31}(X - X_C) + m_{32}(Y - Y_C) + m_{33}(Z - Z_C)}$$

This pair of formulas is the mathematical heart of photogrammetry. They are based on the pinhole camera model, using trigonometry and vector math to relate the image point to the ground point. Note that a single image only pins down a ray: one pixel $(x, y)$ corresponds to an entire line of possible ground points, so recovering $(X, Y, Z)$ also needs either a known terrain height or a second image (covered in the next module).

Where:

  • $(x, y)$: Image coordinates
  • $(x_0, y_0)$: Principal point (image center)
  • $f$: Focal length
  • $(X_C, Y_C, Z_C)$: Camera position in 3D space
  • $m_{ij}$: Rotation matrix elements (orientation angles)
  • $(X, Y, Z)$: Ground coordinates
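
To make the symbols concrete, here is a minimal Python sketch (NumPy only) of the forward direction: given the camera position, an attitude, the focal length, and a ground point, it computes where that point lands on the image. Treat it as an illustration, not a production sensor model; the roll-pitch-yaw rotation order used here is just one common convention, and every numeric value in the toy example is made up.

```python
import numpy as np

def rotation_matrix(roll, pitch, yaw):
    """Build the 3x3 rotation matrix (the m_ij above) from roll, pitch, yaw in radians.

    The order R = Rz(yaw) @ Ry(pitch) @ Rx(roll) is one common convention;
    a real sensor model states its own rotation order explicitly.
    """
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def ground_to_image(ground, camera, m, f, x0=0.0, y0=0.0):
    """Collinearity equations: project a ground point (X, Y, Z) to image coordinates (x, y)."""
    d = m @ (np.asarray(ground, float) - np.asarray(camera, float))  # rotated line-of-sight vector
    x = x0 - f * d[0] / d[2]
    y = y0 - f * d[1] / d[2]
    return x, y

# Toy example: camera 500 km straight above the origin, looking down (all values made up).
m = rotation_matrix(roll=0.0, pitch=0.0, yaw=0.0)
print(ground_to_image(ground=(100.0, 200.0, 0.0),
                      camera=(0.0, 0.0, 500_000.0),
                      m=m, f=0.6))  # focal length in meters
```

Running the equations in this direction is the easy part; the rest of the module is about the inverse problem of going from pixels back to the ground.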

🗺️ Key Concepts: The Map’s Foundation

The Earth isn’t a Perfect Sphere

To map the Earth, we can’t just use a simple sphere.

The Geoid:

  • The irregular, “potato-shaped” surface defined by the Earth’s gravity field
  • Coincides with mean sea level (and its imagined extension under the continents)
  • The natural reference for heights “above sea level”

The WGS84 Ellipsoid (Datum):

  • This is a smooth, mathematical approximation of that shape (an oblate spheroid)
  • Used as the standard reference surface for GPS and satellite mapping
  • We map the world onto this ideal shape first (see the sketch after this list)
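
To show what “a smooth, mathematical approximation” means in practice, the sketch below takes the two published WGS84 constants (semi-major axis and flattening) and applies the standard textbook formula to turn latitude, longitude, and ellipsoidal height into Earth-centered Cartesian (X, Y, Z) coordinates, the kind of 3D ground coordinates the collinearity equations work with. It is a plain illustration; mapping libraries do this for you.

```python
import math

# Published WGS84 ellipsoid parameters
A  = 6378137.0              # semi-major axis, meters
F  = 1 / 298.257223563      # flattening
E2 = F * (2 - F)            # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Convert geodetic (latitude, longitude, ellipsoidal height) to ECEF (X, Y, Z) in meters."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)   # prime vertical radius of curvature
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - E2) + h) * math.sin(lat)
    return x, y, z

print(geodetic_to_ecef(37.7749, -122.4194, 10.0))   # a point about 10 m above the ellipsoid
```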

Geographic vs. Projected Coordinates

Before we measure anything, we must choose our ruler:

Geographic Coordinates (Latitude/Longitude):

  • These are angles (degrees) on the curved surface of the Earth
  • Great for global location but bad for measuring distance
  • A degree of longitude near the equator is much longer than one near the pole
  • Units: Degrees, minutes, seconds (DMS) or decimal degrees

Projected Coordinates (UTM, State Plane):

  • This flattens the curved surface onto a simple X-Y grid (meters)
  • What you use for maps and engineering
  • Shapes and angles are preserved locally (the projection is conformal); distances and areas carry only small, well-characterized distortion within a zone
  • UTM: Universal Transverse Mercator, divides Earth into 60 zones (a quick conversion example follows this list)
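
To feel the difference between the two rulers, the short sketch below converts a latitude/longitude pair into UTM easting/northing in meters. It assumes the pyproj library is installed (any GIS toolkit offers an equivalent transform); EPSG:4326 is WGS84 geographic and EPSG:32610 is UTM zone 10N.

```python
from pyproj import Transformer

# WGS84 geographic (degrees) -> UTM zone 10N (meters)
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32610", always_xy=True)

lon, lat = -122.4194, 37.7749                 # decimal degrees (x = lon, y = lat)
easting, northing = to_utm.transform(lon, lat)

print(f"Easting:  {easting:.1f} m")           # distance east within the zone (roughly 551,700 m)
print(f"Northing: {northing:.1f} m")          # distance north of the equator (roughly 4,182,700 m)
```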

Why Satellites are Tricky: The Push-Broom Scanner

Traditional cameras take one whole picture (a frame) in a single instant. Most modern high-resolution satellites use a push-broom scanner. Think of it as a camera building the image one thin line of pixels at a time as the satellite flies along its path.

The Difference:

  • Every single line in a push-broom image has its own, slightly different perspective center (camera location) and orientation because the satellite is constantly moving and subtly shifting.

The Fix:

  • This requires a highly complex, time-dependent Rigorous Sensor Model (RSM) to track the precise geometry of every single line (a simplified per-line sketch follows the component list below).

Sensor Model Components:

  1. Orbital parameters: Position and velocity vectors
  2. Attitude angles: Roll, pitch, yaw (how the sensor is oriented)
  3. Focal length and sensor geometry: Internal camera parameters
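
The following sketch illustrates the “time-dependent” part in the simplest possible way: the orbit and attitude are sampled at a few epochs (as they would be in the image metadata), and the exterior orientation for any scan line is interpolated at that line’s acquisition time. All numbers and array names here are hypothetical, and real rigorous models use precise timing metadata and higher-order interpolation rather than the linear interpolation shown.

```python
import numpy as np

# Hypothetical metadata: position and attitude sampled at a few epochs (seconds from scene start).
sample_times = np.array([0.0, 1.0, 2.0, 3.0])
positions = np.array([[7.000e6,    0.0, 0.0],       # ECEF position samples, meters
                      [7.000e6,  7.5e3, 0.0],
                      [7.000e6, 1.50e4, 0.0],
                      [7.000e6, 2.25e4, 0.0]])
attitudes = np.radians([[0.00, 0.10, 0.0],          # roll, pitch, yaw samples (degrees -> radians)
                        [0.01, 0.10, 0.0],
                        [0.02, 0.11, 0.0],
                        [0.03, 0.11, 0.0]])

first_line_time, line_interval = 0.0, 1.0e-3        # per-line timing, seconds

def exterior_orientation(line):
    """Interpolate the camera position and attitude for one scan line."""
    t = first_line_time + line * line_interval
    pos = np.array([np.interp(t, sample_times, positions[:, k]) for k in range(3)])
    att = np.array([np.interp(t, sample_times, attitudes[:, k]) for k in range(3)])
    return pos, att   # these feed the collinearity equations for that one line

pos, att = exterior_orientation(line=1500)
print(pos, np.degrees(att))
```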

Rational Polynomial Coefficients (RPC):

  • Alternative to rigorous models
  • Polynomial approximation of the sensor geometry
  • Faster computation, widely used in commercial software (a minimal evaluation sketch follows this list)
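
As a structural sketch of the RPC idea (not the full specification), the function below evaluates one image coordinate as a ratio of polynomials in normalized ground coordinates. Real RPC files carry four 20-term cubic polynomials (numerator and denominator for both line and sample) plus normalization offsets and scales; here the polynomials are deliberately truncated to first-order terms just to show the shape of the computation, and all coefficient values are illustrative.

```python
def rpc_sample(lat, lon, h, num_coef, den_coef, offsets, scales):
    """Evaluate a (truncated) RPC: image sample = ratio of polynomials in normalized coordinates."""
    # Normalize ground coordinates to roughly [-1, 1] using the offsets/scales from the RPC metadata.
    P = (lat - offsets["lat"]) / scales["lat"]
    L = (lon - offsets["lon"]) / scales["lon"]
    H = (h   - offsets["h"])   / scales["h"]

    def poly(c):
        # Truncated to the constant and first-order terms; real RPCs continue up to cubic terms.
        return c[0] + c[1] * L + c[2] * P + c[3] * H

    # Ratio of polynomials, then de-normalize back to pixel units.
    return (poly(num_coef) / poly(den_coef)) * scales["samp"] + offsets["samp"]

# Illustrative (made-up) coefficients: the sample coordinate depends mostly on longitude here.
coeffs = dict(num_coef=[0.0, 1.0, 0.05, 0.01], den_coef=[1.0, 0.0, 0.0, 0.0],
              offsets={"lat": 37.0, "lon": -122.0, "h": 0.0, "samp": 5000.0},
              scales={"lat": 0.5, "lon": 0.5, "h": 500.0, "samp": 5000.0})
print(rpc_sample(37.2, -121.8, 120.0, **coeffs))
```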

Ground Control Points (GCPs)

Even with all the math, we still need reality checks. Ground Control Points (GCPs) are landmarks on the ground (like a road intersection or a corner of a roof) whose $(X, Y, Z)$ coordinates are known with survey-grade accuracy.

We use GCPs to:

  1. Refine the Model: We check whether the Collinearity Equations correctly predict where each GCP should fall on the image. If there’s an error, we adjust the camera’s orbital and attitude parameters until the predicted and measured positions agree as closely as possible.

  2. Guarantee Accuracy: They validate that the final map is accurate within a specified error margin (e.g., 1 meter).

GCP Requirements:

  • Clearly visible in the image (road intersections, building corners)
  • Accurately surveyed coordinates (GPS, survey-grade)
  • Distributed across the image (not clustered)
  • Typically need 4-6 GCPs minimum for basic georeferencing

The Process:

  1. Identify GCPs in the image
  2. Measure their image coordinates
  3. Use known ground coordinates to solve for (refine) the camera parameters (see the least-squares sketch below)
  4. Apply corrections to all other points
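
A minimal sketch of step 3, assuming NumPy and SciPy are available: predict each GCP’s image position with the collinearity model, then let a least-squares solver nudge the camera position until the predictions match the measurements. The GCP values, the “reported” camera position, and the nadir-looking attitude are all made up for illustration; a real adjustment also refines attitude and other parameters and weights the observations.

```python
import numpy as np
from scipy.optimize import least_squares

def ground_to_image(ground, camera, m, f):
    """Collinearity projection (same form as the sketch in the equations section)."""
    d = m @ (np.asarray(ground, float) - np.asarray(camera, float))
    return np.array([-f * d[0] / d[2], -f * d[1] / d[2]])

# Hypothetical GCPs: surveyed ground coordinates (X, Y, Z) in meters.
gcp_ground = np.array([[ 100.0,  200.0,  5.0],
                       [-300.0,  450.0, 12.0],
                       [ 700.0, -150.0,  8.0],
                       [-500.0, -600.0,  3.0]])

f = 0.6            # focal length, meters
m = np.eye(3)      # attitude taken as reported (nadir-looking here, for simplicity)

# Stand-in for step 2: "measured" image coordinates, generated here from a true camera position.
true_position = np.array([40.0, -25.0, 500_000.0])
gcp_image = np.array([ground_to_image(g, true_position, m, f) for g in gcp_ground])

def residuals(camera_position):
    """Step 3: reprojection residuals (predicted minus measured) for all GCPs."""
    pred = np.array([ground_to_image(g, camera_position, m, f) for g in gcp_ground])
    return (pred - gcp_image).ravel()

# Start from the satellite's reported (slightly wrong) position and refine it.
reported_position = np.array([0.0, 0.0, 500_200.0])
solution = least_squares(residuals, reported_position)
print("Refined camera position:", solution.x)   # moves back towards true_position
```

Step 4 then simply re-runs the refined sensor model over the whole image, not just the GCPs.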

Understanding the geometry is crucial for accurate mapping. In the next module, we’ll use this geometry to extract 3D height information using stereo pairs.