Visual Navigation for Autonomous Vehicles · MIT 16.485 · Lecture 18–19

Optimization on Manifolds

Your SLAM system estimates camera poses at every keyframe. Each pose is a rotation plus a translation — it lives on the curved space SE(3), not in flat R⁶. Gauss-Newton from L9 works beautifully for flat variables. But if you naively add a correction matrix to a rotation, the result is no longer a valid rotation. This lesson shows why, and then builds the machinery to do Gauss-Newton correctly: tangent spaces, retractions, the ⊞/⊟ operators, and the on-manifold GN update. MIT 16.485 by Luca Carlone, Lectures 18–19.

Prerequisites: VNAV L2 (Lie groups, exp/log maps, hat map) · VNAV L9 (Gauss-Newton & Levenberg-Marquardt). No new math beyond those two.

Chapters

Live Canvases

Hand

Derived Numbers

Chapter 0: The Problem — Your State Isn't a Vector

You've just built a SLAM system. At every keyframe you maintain a camera pose: a 3×3 rotation matrix R and a translation vector t. The Gauss-Newton algorithm from L9 gives you a correction vector δ at each iteration. In the flat, vector world of L9 the update was trivially: x ← x + δ. So you try: R ← R + ΔR.

It breaks immediately. The corrected matrix R + ΔR is no longer a valid rotation. Valid rotations must satisfy R^TR = I and det(R) = +1. Adding an arbitrary matrix to R violates both. You've left the manifold — you're now in a place where no physical rotation lives.

This is not a numerical precision issue. It's a geometric one. Rotations don't form a vector space — you can't add two rotations and get a rotation. They form a Lie group: a curved surface where the flat-vector arithmetic of L9 simply doesn't apply.

The core challenge: Gauss-Newton assumes you can add a step δ to your current estimate and get a new valid estimate. For rotations this is false — the addition R + δ doesn't stay on SO(3). We need a principled replacement for the "+" operation.

What goes wrong — by the numbers

Take the identity rotation I₃ and add a tiny perturbation matrix. Let's check orthogonality and determinant before and after:

python
import numpy as np

# Current estimate: identity rotation (perfect rotation)
R = np.eye(3)
print("R^T R before:", np.round(R.T @ R, 4))   # [[1,0,0],[0,1,0],[0,0,1]] ✓
print("det(R) before:", np.round(np.linalg.det(R), 4))   # 1.0 ✓

# Naive "Gauss-Newton" correction: add a step matrix
delta_R = np.array([[0,   -0.1,  0.05],
                    [0.1,  0,   -0.03],
                    [-0.05, 0.03, 0  ]])

R_broken = R + delta_R
print("R^T R after naive add:", np.round(R_broken.T @ R_broken, 4))
# NOT the identity — orthogonality broken!
print("det(R) after naive add:", np.round(np.linalg.det(R_broken), 4))
# NOT 1.0 — determinant broken!

# Output (approx):
# R^T R after naive add:
# [[1.011  0.003 -0.005]
#  [0.003  1.010  0.008]
#  [-0.005  0.008  1.003]]   <-- NOT identity
# det(R) after naive add: 1.0149   <-- NOT 1.0

The fix? Use the exponential map from L2: instead of adding a matrix to R, we compose R with the matrix exponential of a small skew-symmetric matrix. This keeps us on the manifold. The rest of this lesson is the story of why this works and how to build a complete optimizer around it.

R + ΔR breaks orthogonality — drag the step size

The teal circle is SO(2) — valid 2D rotations (unit vectors on the unit circle). Orange dot = current estimate R. Red dot = R + δ (naive add). Green dot = R · exp(δ) (retraction, always on circle). Watch the red dot fly off the circle as the step grows.

Step size δ 0.3

You have a rotation matrix R ∈ SO(3). You run one step of vector-space Gauss-Newton and get a correction matrix ΔR. You set R ← R + ΔR. What is guaranteed to be true about R + ΔR?

It satisfies (R+ΔR)^T(R+ΔR) = I as long as ΔR is small It satisfies det(R+ΔR) = 1 because the determinant changes continuously Both orthogonality and unit determinant are preserved for any tiny ΔR Neither orthogonality nor unit determinant is guaranteed — R+ΔR is generically not in SO(3)

Chapter 1: Manifolds & Why SO(3) is Curved

A manifold is a space that looks locally flat — like a small patch of a curved surface — even though globally it curves back on itself. The surface of the Earth is a manifold: stand anywhere and the ground looks flat (a 2D plane), but zoom out and you see it's a sphere. SO(3) is the manifold of all valid 3D rotation matrices.

SO(3) has dimension 3: it takes exactly 3 real numbers to parameterize a rotation (e.g., roll, pitch, yaw). But a rotation matrix has 9 entries. The extra 6 are consumed by constraints: 3 orthogonality constraints (each column unit-length) + 3 orthogonality constraints (columns pairwise orthogonal). So SO(3) is a 3-dimensional curved surface embedded in a 9-dimensional space of matrices.

Similarly SE(3) — rigid-body poses (rotation + translation) — is a 6-dimensional manifold. It takes 6 numbers to describe a pose (3 for rotation, 3 for translation), but a homogeneous matrix has 12 free entries subject to 6 constraints.

Key insight — dimension vs. ambient dimension: SO(3) has dimension 3 (you need 3 numbers to describe a rotation), but any particular rotation matrix has 9 entries. The gap (9 - 3 = 6) is exactly the number of constraints. This is why "adding" a 9-entry matrix to a rotation is almost certainly going to break those 6 constraints.

The curvature is the problem

In flat space (Rⁿ), the straight line between two points is still in the space. On a manifold, the straight line connecting two rotations by matrix interpolation (1-t)R₁ + t·R₂ leaves SO(3) for any t ≠ 0, 1 — it dips below the manifold surface. The "straight" path between two rotations must curve to stay on SO(3).

Euler angles (roll, pitch, yaw) are a parameterization of SO(3) — a coordinate chart that maps 3 real numbers to a rotation. They look like a vector space locally. But they suffer from gimbal lock: at pitch = ±90°, roll and yaw become indistinguishable. Two different combinations of (roll, yaw) give the same rotation — the parameterization has a singularity. This is not a property of SO(3) itself; it's a property of Euler angles as coordinates.

Misconception: gimbal lock is not a fundamental limitation of 3D rotations — it's an artifact of the Euler-angle parameterization. SO(3) itself is perfectly smooth and non-degenerate everywhere. Use local coordinates (the tangent space, below) and you never see singularities.

Space	Dimension	Constraints on matrix	Parameterizations
Rⁿ (flat)	n	None	Cartesian coords (global, no singularities)
SO(2)	1	R^TR = I, det=1	Angle θ (global, periodic)
SO(3)	3	R^TR = I, det=1	Euler angles (gimbal lock!), quaternions (double-cover), axis-angle (local, no singularities)
SE(3)	6	Same for R block + t free	Twist coordinates (local, no singularities)

SO(3) has dimension 3. A rotation matrix has 9 entries. How many independent constraints must a 3×3 matrix satisfy to be a valid rotation in SO(3)?

3 constraints (one per rotation angle) 4 constraints (orthogonality + determinant) 6 constraints (9 entries minus 3 degrees of freedom) 9 constraints (every entry is fixed once the 3 angles are given)

Chapter 2: The Tangent Space — Local Flat Land

Even though SO(3) is curved, at any point R on the manifold, we can define a tangent space T_RSO(3): the flat plane that just touches the manifold at R, like a tangent plane to a sphere. The tangent space is a genuine vector space — you can add and scale vectors in it freely.

The tangent space at the identity I₃ is called the Lie algebra so(3). From L2, you know it consists of all 3×3 skew-symmetric matrices. Every element can be written as ω̂ (the hat map): given a 3-vector ω = [ω₁, ω₂, ω₃]^T, the hat map gives:

ω̂ = [ω]_× =
  [[ 0, −ω₃, ω₂],
   [ ω₃, 0, −ω₁],
   [−ω₂, ω₁, 0]]

The tangent space at an arbitrary R is obtained by "translating" the Lie algebra to R. A tangent vector at R looks like R · ω̂ (right translation) or ω̂ · R (left translation). The 3-vector ω is the local coordinate — it lives in R³, which is flat.

The power of the tangent space: At any point R on SO(3), the tangent space is just R³. This means we can do all of L9's Gauss-Newton machinery — Jacobians, linear systems, least squares — in the tangent space, where everything is flat. We just need a way to bring the answer back onto the manifold afterwards.

Tangent space at an arbitrary point

The tangent space at the identity (the Lie algebra) is the most convenient place to work because the algebra of skew-symmetric matrices is well understood. But for optimization we often need the tangent space at the current estimate R̄. The tangent space T_R̄SO(3) is a 3D linear space; any vector in it can be written as δ ∈ R³ (encoding the three "infinitesimal rotation axes").

Geometrically: stand on the manifold at R̄. The tangent space is the flat 3D plane that touches the sphere of SO(3) at that point. A small vector δ in this plane represents an infinitesimal rotation. To actually move along the manifold, we need to "curve" the straight tangent-space step back onto SO(3) — that's what the exponential map does.

Tangent space at a point on SO(2) — the unit circle

SO(2) is the unit circle (1D manifold). The tangent space at any point is a line tangent to the circle. Drag the slider to move the base point. The orange tangent line is the "flat land" where optimization steps live. A step along the tangent line (red arrow) leaves the circle; projecting it back (the exp map) lands on the green dot.

Base angle θ 0.8

Tangent step 0.5

The tangent space T_RSO(3) at a rotation R is a vector space of dimension ___.

9 (same as the ambient space of 3×3 matrices) 3 (same as the dimension of SO(3) itself) 6 (the number of constraints) 1 (there's only one rotation axis)

Chapter 3: Retraction — The Bridge Back to the Manifold

You've computed a step δ ∈ R³ in the tangent space at your current estimate R̄. Great — it lives in flat land. But now you need to actually update R̄. You cannot use R̄ + δ̂ (adding the skew matrix breaks SO(3)). Instead you use a retraction: a map that takes a tangent vector and returns a new point on the manifold.

For Lie groups, the natural retraction is the exponential map from L2. Given the current rotation R̄ and a small perturbation δ ∈ R³, the retracted point is:

R_new = R̄ · Exp(δ) = R̄ · e^δ̂

where δ̂ is the hat map of δ (a 3×3 skew-symmetric matrix) and e^δ̂ is the matrix exponential. From L2's Rodrigues formula, if θ = ‖δ‖ and k̂ = δ/θ:

Exp(δ) = I + sin(θ) k̂ + (1 − cos(θ)) k̂²

For small δ, Exp(δ) ≈ I + δ̂, so this is indeed a "tiny rotation" in the direction δ. Crucially, the result is always a valid rotation: the matrix exponential of a skew-symmetric matrix is always orthogonal with determinant +1.

Why R · Exp(δ) stays in SO(3): Let A = e^δ̂. Since δ̂^T = −δ̂, we have A^T = e^−δ̂ = A⁻¹. So A^TA = I. Also det(A) = e^tr(δ̂) = e⁰ = 1 (since skew matrices have zero trace). Thus A ∈ SO(3), and R · A ∈ SO(3) by group closure.

Worked numbers: staying on the manifold

python
from scipy.linalg import expm
import numpy as np

def hat(omega):
    """3-vector -> 3x3 skew-symmetric hat map"""
    w1, w2, w3 = omega
    return np.array([[ 0,  -w3,  w2],
                     [ w3,   0, -w1],
                     [-w2,  w1,   0]])

# Current estimate: 30-degree rotation around z-axis
theta = np.radians(30)
R_cur = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta),  np.cos(theta), 0],
                  [0,             0,            1]])

# Perturbation: delta = [0.1, 0.05, -0.03] rad in tangent space
delta = np.array([0.1, 0.05, -0.03])

# Method A: naive addition (WRONG)
R_naive = R_cur + hat(delta)
print("Naive: det =", np.linalg.det(R_naive))           # ~1.0002 — broken
print("Naive: ||R^T R - I||_F =", np.linalg.norm(R_naive.T @ R_naive - np.eye(3)))  # ~0.012

# Method B: retraction via exp map (CORRECT)
R_new = R_cur @ expm(hat(delta))
print("Retraction: det =", np.linalg.det(R_new))          # 1.0000000000
print("Retraction: ||R^T R - I||_F =", np.linalg.norm(R_new.T @ R_new - np.eye(3)))  # ~1e-15

Misconception: The tangent space is only a LOCAL flat approximation. Take too large a δ and the first-order linearization becomes inaccurate — the retracted point is still on SO(3), but it may not be where the true cost function minimum is. This is exactly why we still need damping (Levenberg-Marquardt from L9) even in the on-manifold setting.

R · Exp(δ) is always in SO(3) regardless of how large δ is. Why don't we just take one huge step then?

Because Exp(δ) is not defined for large δ Because the matrix exponential is numerically unstable for large inputs Because the step δ was computed by linearizing the cost function in the tangent space — the linearization is only accurate locally, so a large step overshoots the true minimum Because SO(3) has a maximum step size built into its geometry

Chapter 4: ⊞ and ⊝ — Manifold Addition and Subtraction

The retraction gives us "manifold addition": given a point R on the manifold and a tangent vector δ, we get a new point on the manifold. This operation is called boxplus (⊞):

R ⊞ δ := R · Exp(δ) ∈ SO(3)

where δ ∈ R³ is the tangent vector and Exp is the rotation exponential map. This generalizes ordinary vector addition x + δ to the manifold setting. All the properties we'd want hold: R ⊞ 0 = R (zero step does nothing), and for small δ it approximates moving along the manifold in the direction δ.

The inverse operation — "manifold subtraction" — is boxminus (⊝):

R₂ ⊝ R₁ := Log(R₁^T · R₂) ∈ R³

This returns the tangent-space vector δ such that R₁ ⊞ δ = R₂. In other words, it gives the "how-far-apart" between two rotations expressed as a 3-vector in the tangent space. The Log map (inverse of Exp) from L2 makes this concrete: Log returns the axis-angle vector.

Why ⊞/⊝ matter: These two operators are all you need to generalize any vector-space algorithm to manifolds. Wherever you see x + δ, write x ⊞ δ. Wherever you see x₂ − x₁, write x₂ ⊝ x₁. The on-manifold Gauss-Newton is literally this substitution applied to the L9 algorithm.

For SE(3): the pose boxplus

The same applies to poses T = (R, t) ∈ SE(3). A perturbation ξ ∈ R⁶ = [ρ; φ] (3 translational + 3 rotational) is applied via the SE(3) exponential map:

T ⊞ ξ := T · Exp_SE(3)(ξ)

And the difference of two poses is the 6-vector in the tangent space:

T₂ ⊝ T₁ := Log_SE(3)(T₁⁻¹ · T₂) ∈ R⁶

This is exactly what your SLAM back-end uses to compute the "relative pose error" between consecutive pose estimates — a 6-vector in the Lie algebra, not a matrix.

python
from scipy.linalg import expm, logm
import numpy as np

def hat(w):  # 3-vector → 3×3 skew
    return np.array([[0,-w[2],w[1]],[w[2],0,-w[0]],[-w[1],w[0],0]])

def vee(W):  # 3×3 skew → 3-vector
    return np.array([W[2,1], W[0,2], W[1,0]])

def boxplus(R, delta):
    """R ⊞ delta  (right perturbation convention)"""
    return R @ expm(hat(delta))

def boxminus(R2, R1):
    """R2 ⊟ R1  → tangent vector delta s.t. R1 ⊞ delta = R2"""
    return vee(logm(R1.T @ R2))

# Example: two rotations 10 degrees apart around z-axis
R1 = expm(hat(np.array([0, 0, np.radians(30)])))
R2 = expm(hat(np.array([0, 0, np.radians(40)])))

delta = boxminus(R2, R1)
print("R2 ⊟ R1 =", delta)   # ≈ [0, 0, 0.1745] (10 degrees = 0.1745 rad around z)

R2_reconstructed = boxplus(R1, delta)
print("||R2 - (R1 ⊞ (R2 ⊟ R1))||_F =", np.linalg.norm(R2 - R2_reconstructed))  # ~1e-15

⊞ and ⊝ on SO(2) — two points on a circle

Two rotations (orange = R₁, blue = R₂) on the unit circle. The ⊝ (boxminus) gives the signed angle δ between them in the tangent space. The ⊞ (boxplus) then adds δ back to R₁, reconstructing R₂ (shown as green check). Drag the angles to explore.

R₁ angle 0.5

R₂ angle 1.8

R₂ ⊝ R₁ gives a vector δ ∈ R³. What does it mean geometrically?

The matrix difference R₂ − R₁ The rotation that takes R₁ to R₂, expressed as a full rotation matrix The axis-angle tangent vector (in R³) such that applying it to R₁ via ⊞ yields R₂ The Euclidean distance between the two rotation matrices

Chapter 5: On-Manifold Gauss-Newton

You have a cost function over rotations: min_{R ∈ SO(3)} ∑_i ‖r_i(R)‖². How do you run Gauss-Newton? The L9 GN iteration for flat variables was: solve (J^TJ)δ = −J^Tr, then update x ← x + δ. The on-manifold version changes two things:

Pull back to tangent space

Reparameterize the cost as f(R̄ ⊞ δ) for a small δ ∈ R³. Now δ is a flat vector — normal L9 GN applies.

↓

Compute Jacobian w.r.t. δ

J_i = ∂r_i/∂δ |_δ=0. This is a 1×3 (or k×3) matrix. Evaluate at δ=0 (current estimate).

↓

Solve linear system in flat space

(J^TJ)δ* = −J^Tr. Exact same normal equations as L9. No manifold awareness needed here.

↓

Retract back onto manifold

R̄ ← R̄ ⊞ δ* = R̄ · Exp(δ*). The step δ* is small, so linearization was accurate. Now repeat.

↻ until ‖δ*‖ < ε

That's the entire algorithm. Steps 2 and 3 are identical to L9 — all the cleverness is in step 1 (the parameterization) and step 4 (the retraction). The Jacobian in step 2 is taken with respect to the perturbation δ, not with respect to the 9 entries of R. This gives a 3×3 (or k×3) Jacobian, not a 9×9 one.

Misconception — minimal coordinates: The Jacobian has columns for δ₁, δ₂, δ₃ (3 columns for SO(3), 6 for SE(3)) — not for the 9 entries of a rotation matrix. This is the "minimal local coordinates" approach. It avoids the redundancy of the 9-entry parameterization and dodges all singularities.

The Jacobian with respect to the perturbation

Consider a residual r(R) = Log(R^T R̃) for a rotation measurement R̃. The residual is already in the tangent space (a 3-vector). To compute ∂r/∂δ|_δ=0, expand r(R̄ ⊞ δ) = Log((R̄ ⊞ δ)^T R̃) = Log(Exp(δ)^T R̄^T R̃). For small δ this equals Log(R̄^TR̃) − J_r⁻¹ δ + O(δ²), where J_r is the right Jacobian of SO(3). At first order, the Jacobian is −J_r⁻¹ evaluated at the current residual.

In practice for small angles (which is typical near convergence), J_r ≈ I₃, so the on-manifold Jacobian simplifies to the identity scaling. The normal equations then have a particularly clean form that gives direct, meaningful updates.

python
from scipy.linalg import expm, logm
import numpy as np

def hat(w): return np.array([[0,-w[2],w[1]],[w[2],0,-w[0]],[-w[1],w[0],0]])
def vee(W): return np.array([W[2,1],W[0,2],W[1,0]])
def Log_SO3(R): return vee(logm(R))   # matrix log → 3-vector

# On-manifold Gauss-Newton for rotation averaging
# Given N noisy rotation measurements R_meas[i], find R* that minimizes
# sum_i ||Log(R^T R_meas[i])||^2
def on_manifold_GN(R_meas, max_iter=20, tol=1e-8):
    R = np.eye(3)   # initial guess: identity
    for it in range(max_iter):
        # Build J^T J and J^T r (residuals in tangent space)
        JTJ = np.zeros((3,3))
        JTr = np.zeros(3)
        cost = 0
        for Rm in R_meas:
            r = Log_SO3(R.T @ Rm)      # residual in tangent space (3-vec)
            J = -np.eye(3)             # Jacobian ≈ -I for small angles
            JTJ += J.T @ J
            JTr += J.T @ r
            cost += r @ r
        # Solve normal equations (flat 3x3 system!)
        delta = np.linalg.solve(JTJ, -JTr)
        # Retract back onto manifold
        R = R @ expm(hat(delta))
        if np.linalg.norm(delta) < tol: break
    return R, cost

# Test: 3 measurements near a 45-degree rotation around z
R_true = expm(hat(np.array([0, 0, np.radians(45)])))
np.random.seed(42)
R_meas = [R_true @ expm(hat(np.random.randn(3) * 0.05)) for _ in range(5)]
R_est, cost = on_manifold_GN(R_meas)
print("Error (deg):", np.degrees(np.linalg.norm(Log_SO3(R_true.T @ R_est))))  # <1 deg

In the on-manifold GN step, you solve a linear system for δ ∈ R³. The result δ* is a 3-vector. What do you do next?

Add δ* directly to R as a 3-vector (needs reshaping) Compute R ← R · Exp(δ*) using the matrix exponential of hat(δ*) Reshape δ* into a 3×3 matrix and add it to R Project R + hat(δ*) onto SO(3) using SVD

Chapter 6: Left vs. Right Perturbation

When we write R ⊞ δ = R · Exp(δ), we are using a right perturbation: the correction is applied on the right side of R. Alternatively, we could use a left perturbation: Exp(δ) · R. These give different algorithms, with different Jacobians. Both are valid; they just work in different tangent spaces.

Right perturbation: δ lives in the body frame of the current rotation. A step δ = [0, 0, θ]^T means "rotate around the body's z-axis." Left perturbation: δ lives in the world frame. The same vector means "rotate around the world's z-axis."

In VNAV/SLAM, right perturbation is the dominant convention (it's also what GTSAM and Ceres use internally). The residuals and Jacobians are expressed in the body frame, which aligns naturally with how IMU measurements are expressed.

Which convention to use? It doesn't matter which you choose, as long as you're consistent. The Jacobians will look different but the optimizer will converge to the same answer. The right-perturbation convention is more common in robotics because sensor measurements (IMU, wheel odometry) are naturally expressed in the body frame.

How the Jacobian changes

For a residual r(R) = Log(R^T M) where M is a fixed measurement rotation:

Right perturbation: R ← R · Exp(δ). The Jacobian of r with respect to δ at δ=0 is −J_r⁻¹(Log(R^TM)) where J_r is the right Jacobian of SO(3).

Left perturbation: R ← Exp(δ) · R. The Jacobian of r with respect to δ at δ=0 is −R^T · J_r⁻¹(Log(R^TM)).

Near convergence, when the residuals are small, J_r ≈ I and both Jacobians simplify to −I or −R^T respectively. In code, you'll often see the identity approximation used for efficiency.

Singularity avoidance: Both left and right local perturbations avoid the gimbal-lock singularity of Euler angles. The tangent space at any point R is always a smooth, non-degenerate R³ — there's no configuration where two axes align and a degree of freedom disappears. This is the key practical reason to use Lie-group optimization instead of Euler angles.

You use right perturbation: R ← R · Exp(δ). A step δ = [0, 0, 0.1]^T means rotation by 0.1 rad around which axis?

The world-frame z-axis (absolute orientation) The body-frame z-axis (the z-axis as seen from R's perspective) Whichever z-axis — left and right perturbations give the same answer Neither — right perturbation applies to translations, not rotations

Chapter 7: Rotation Averaging — A Worked Example

Rotation averaging is the canonical manifold optimization problem: given N noisy rotation measurements R̃_i, find the rotation R* that minimizes the total geodesic distance squared:

R* = argmin_{R ∈ SO(3)} ∑_i ‖Log(R^T R̃_i)‖²

This is the Riemannian Fréchet mean — the rotation closest to all measurements in the geodesic (curved-surface) distance sense. It's what your SLAM back-end computes every time it fuses multiple relative-rotation measurements for a single node.

One full iteration by hand

Let's work through a tiny example: two measurements in SO(2) (2D rotations). Let R̃₁ = 20° and R̃₂ = 40°. Initial guess: R̄ = 0°.

Step 1 — Compute residuals in tangent space. The residual for measurement i is r_i = Log(R̄^T R̃_i). In SO(2), this is just the signed angle difference:

r₁ = 20° − 0° = +20° = +0.349 rad

r₂ = 40° − 0° = +40° = +0.698 rad

Step 2 — Jacobian. In SO(2) the residual r_i(θ) = R̃_i − (θ + δ). So ∂r_i/∂δ = −1. Jacobian J = −1 for each measurement.

Step 3 — Normal equations. With J = [−1; −1] (2 measurements, 1 unknown δ):

J^TJ = [−1, −1][−1; −1] = 2

J^Tr = [−1, −1][0.349; 0.698] = −1.047

δ* = −(J^TJ)⁻¹ J^Tr = −(1/2)(−1.047) = +0.524 rad = +30°

Step 4 — Retract. R̄ ← R̄ ⊞ δ* = 0° + 30° = 30°. Verify: 30° is exactly the average of 20° and 40°. Converged in one step (because SO(2) is one-dimensional and the residuals are linear in the angle).

Sanity check: The Fréchet mean of {20°, 40°} on SO(2) is 30° — exactly the arithmetic mean, because we're in 1D and the angles are close enough that geodesics and straight lines coincide. In SO(3), the geodesic mean will generally differ from the naive componentwise average of Euler angles (which isn't even well-defined at singularities).

Convergence vs. step size

What if the initial guess is far away? Try R̄ = 180°, measurements at 20° and 40°. The residuals are now 160° and 140°. The step δ* = 150°. We jump from 180° to 30° in one step — correct! But only because SO(2) is 1D and the cost is convex. In SO(3) with many measurements, large initial errors require multiple iterations and possibly LM damping to avoid overshooting.

On-manifold GN: rotation averaging on SO(2)

Blue dots = measured rotations on the unit circle. Orange dot = current estimate. Press "Step GN" to run one on-manifold Gauss-Newton iteration. "Reset" starts from the red initial guess. Watch the estimate converge to the geodesic mean (teal star).

Noise σ 0.2

In the rotation averaging example, the on-manifold GN residual is r_i = Log(R̄^T R̃_i). What would happen if you instead used the naive residual r_i = R̃_i − R̄ (matrix subtraction)?

It would work identically because the log map is approximately linear near the identity It would converge more slowly but to the correct answer It would work but only for small rotation differences The residual R̃_i − R̄ is a 3×3 matrix (not a tangent vector), so the normal equations J^TJδ = −J^Tr wouldn't even make dimensional sense for a 3-DOF update

Chapter 8: Showcase — Full On-Manifold GN Optimizer

This showcase brings all the pieces together: a full on-manifold Gauss-Newton optimizer on SO(2), the manifold-break demonstration, gimbal-lock contrast, and convergence diagnostics — all live and interactive.

On-manifold vs. off-manifold: visual comparison with convergence diagnostics

Left panel: the unit circle manifold. Orange dot = current on-manifold estimate (always on circle). Red diamond = flat Gauss-Newton estimate (drifts off circle). Blue dots = noisy measurements. Right panel: cost vs. iteration for both methods. Sliders control noise and number of measurements. Watch the red diamond drift further off as you increase iterations.

Num. meas. 6

Noise σ 0.25

Chapter 9: Connections & Cheat Sheet

Every piece of the on-manifold puzzle is now assembled. Here's the complete picture in one table, plus connections to the rest of the VNAV pipeline.

Concept	Flat (vector) world	Manifold (SO(3)/SE(3)) world
State space	Rⁿ — a vector	Lie group — a curved surface
Local representation	The vector itself	Tangent space T_RSO(3) ≅ R³
Addition	x + δ	x ⊞ δ = x · Exp(δ)
Subtraction	x₂ − x₁	x₂ ⊝ x₁ = Log(x₁^T x₂)
Residual	z̃ − h(x)	z̃ ⊝ h(x) = Log(h(x)^T z̃)
GN step	Solve (J^TJ)δ = −J^Tr, then x ← x + δ	Same solve, then x ← x ⊞ δ
Jacobian size	m × n	m × 3 (SO(3)) or m × 6 (SE(3))
Covariance	(J^TΩJ)⁻¹ in Rⁿ	(J^TΩJ)⁻¹ in T_RSO(3) — tangent space covariance
Singularities	None (flat)	None — local coords avoid gimbal lock

The full algorithm

Initialize R̄ (e.g., identity or RANSAC output)

↓

Compute residuals: r_i = Log(R̄^T R̃_i) ∈ R³

↓

Compute Jacobians: J_i = ∂r_i/∂δ |_δ=0 ∈ R^3×3

↓

Build and solve: (J^TΩJ)δ* = −J^TΩr

↓

Retract: R̄ ← R̄ · Exp(δ*)

↻ until ‖δ*‖ < ε

Connections to the rest of VNAV

VNAV L2 (Lie Groups) built the exp/log maps and hat/vee operators that this lesson relies on. The Rodrigues formula, the BCH approximation, and the retraction are all from L2.

VNAV L9 (NLLS) is the engine this lesson wraps. All the Gauss-Newton and Levenberg-Marquardt machinery transfers directly — we only changed "+" to "⊞" and added a retraction step.

VNAV L11 (VIO) uses on-manifold optimization continuously: the IMU pre-integration residuals live in SE(3), and the optimizer maintains a pose estimate as an SE(3) element updated via ⊞ at every step.

VNAV L13 (Factor Graphs / SLAM) is the full application: every pose variable in a factor graph is an SE(3) element, every edge is a relative-pose measurement with a Gaussian noise model in the tangent space, and the back-end solver is exactly the on-manifold GN/LM developed here — but run over thousands of poses simultaneously using sparse linear algebra.

The conceptual leap in one sentence: Gauss-Newton optimizes in the tangent space (flat) and retracts back onto the manifold (curved) — keeping all the efficiency of flat optimization while respecting the geometry of rotations and poses.

"The key insight is that the constraint manifold is locally a linear space — the tangent space — and we can do all our optimization work there, then project back."
— Luca Carlone, MIT 16.485 Lecture 18

A SLAM back-end maintains 500 camera poses, each in SE(3). When it runs one GN iteration, the linear system it solves has unknowns in which space?

The 16-entry homogeneous matrix space (16 × 500 = 8000 unknowns) SE(3) directly — the solver works on the manifold The tangent space of SE(3) — 6 real unknowns per pose, 3000 unknowns total, in flat R³⁰⁰⁰ The Euler-angle space — 3 angles per pose plus translation