Your SLAM system estimates camera poses at every keyframe. Each pose is a rotation plus a translation — it lives on the curved space SE(3), not in flat R6. Gauss-Newton from L9 works beautifully for flat variables. But if you naively add a correction matrix to a rotation, the result is no longer a valid rotation. This lesson shows why, and then builds the machinery to do Gauss-Newton correctly: tangent spaces, retractions, the ⊞/⊟ operators, and the on-manifold GN update. MIT 16.485 by Luca Carlone, Lectures 18–19.
You've just built a SLAM system. At every keyframe you maintain a camera pose: a 3×3 rotation matrix R and a translation vector t. The Gauss-Newton algorithm from L9 gives you a correction vector δ at each iteration. In the flat, vector world of L9 the update was trivially: x ← x + δ. So you try: R ← R + ΔR.
It breaks immediately. The corrected matrix R + ΔR is no longer a valid rotation. Valid rotations must satisfy RTR = I and det(R) = +1. Adding an arbitrary matrix to R violates both. You've left the manifold — you're now in a place where no physical rotation lives.
This is not a numerical precision issue. It's a geometric one. Rotations don't form a vector space — you can't add two rotations and get a rotation. They form a Lie group: a curved surface where the flat-vector arithmetic of L9 simply doesn't apply.
Take the identity rotation I3 and add a tiny perturbation matrix. Let's check orthogonality and determinant before and after:
python import numpy as np # Current estimate: identity rotation (perfect rotation) R = np.eye(3) print("R^T R before:", np.round(R.T @ R, 4)) # [[1,0,0],[0,1,0],[0,0,1]] ✓ print("det(R) before:", np.round(np.linalg.det(R), 4)) # 1.0 ✓ # Naive "Gauss-Newton" correction: add a step matrix delta_R = np.array([[0, -0.1, 0.05], [0.1, 0, -0.03], [-0.05, 0.03, 0 ]]) R_broken = R + delta_R print("R^T R after naive add:", np.round(R_broken.T @ R_broken, 4)) # NOT the identity — orthogonality broken! print("det(R) after naive add:", np.round(np.linalg.det(R_broken), 4)) # NOT 1.0 — determinant broken! # Output (approx): # R^T R after naive add: # [[1.011 0.003 -0.005] # [0.003 1.010 0.008] # [-0.005 0.008 1.003]] <-- NOT identity # det(R) after naive add: 1.0149 <-- NOT 1.0
The fix? Use the exponential map from L2: instead of adding a matrix to R, we compose R with the matrix exponential of a small skew-symmetric matrix. This keeps us on the manifold. The rest of this lesson is the story of why this works and how to build a complete optimizer around it.
The teal circle is SO(2) — valid 2D rotations (unit vectors on the unit circle). Orange dot = current estimate R. Red dot = R + δ (naive add). Green dot = R · exp(δ) (retraction, always on circle). Watch the red dot fly off the circle as the step grows.
A manifold is a space that looks locally flat — like a small patch of a curved surface — even though globally it curves back on itself. The surface of the Earth is a manifold: stand anywhere and the ground looks flat (a 2D plane), but zoom out and you see it's a sphere. SO(3) is the manifold of all valid 3D rotation matrices.
SO(3) has dimension 3: it takes exactly 3 real numbers to parameterize a rotation (e.g., roll, pitch, yaw). But a rotation matrix has 9 entries. The extra 6 are consumed by constraints: 3 orthogonality constraints (each column unit-length) + 3 orthogonality constraints (columns pairwise orthogonal). So SO(3) is a 3-dimensional curved surface embedded in a 9-dimensional space of matrices.
Similarly SE(3) — rigid-body poses (rotation + translation) — is a 6-dimensional manifold. It takes 6 numbers to describe a pose (3 for rotation, 3 for translation), but a homogeneous matrix has 12 free entries subject to 6 constraints.
In flat space (Rn), the straight line between two points is still in the space. On a manifold, the straight line connecting two rotations by matrix interpolation (1-t)R1 + t·R2 leaves SO(3) for any t ≠ 0, 1 — it dips below the manifold surface. The "straight" path between two rotations must curve to stay on SO(3).
Euler angles (roll, pitch, yaw) are a parameterization of SO(3) — a coordinate chart that maps 3 real numbers to a rotation. They look like a vector space locally. But they suffer from gimbal lock: at pitch = ±90°, roll and yaw become indistinguishable. Two different combinations of (roll, yaw) give the same rotation — the parameterization has a singularity. This is not a property of SO(3) itself; it's a property of Euler angles as coordinates.
| Space | Dimension | Constraints on matrix | Parameterizations |
|---|---|---|---|
| Rn (flat) | n | None | Cartesian coords (global, no singularities) |
| SO(2) | 1 | RTR = I, det=1 | Angle θ (global, periodic) |
| SO(3) | 3 | RTR = I, det=1 | Euler angles (gimbal lock!), quaternions (double-cover), axis-angle (local, no singularities) |
| SE(3) | 6 | Same for R block + t free | Twist coordinates (local, no singularities) |
Even though SO(3) is curved, at any point R on the manifold, we can define a tangent space TRSO(3): the flat plane that just touches the manifold at R, like a tangent plane to a sphere. The tangent space is a genuine vector space — you can add and scale vectors in it freely.
The tangent space at the identity I3 is called the Lie algebra so(3). From L2, you know it consists of all 3×3 skew-symmetric matrices. Every element can be written as ω̂ (the hat map): given a 3-vector ω = [ω1, ω2, ω3]T, the hat map gives:
The tangent space at an arbitrary R is obtained by "translating" the Lie algebra to R. A tangent vector at R looks like R · ω̂ (right translation) or ω̂ · R (left translation). The 3-vector ω is the local coordinate — it lives in R3, which is flat.
The tangent space at the identity (the Lie algebra) is the most convenient place to work because the algebra of skew-symmetric matrices is well understood. But for optimization we often need the tangent space at the current estimate R̄. The tangent space TR̄SO(3) is a 3D linear space; any vector in it can be written as δ ∈ R3 (encoding the three "infinitesimal rotation axes").
Geometrically: stand on the manifold at R̄. The tangent space is the flat 3D plane that touches the sphere of SO(3) at that point. A small vector δ in this plane represents an infinitesimal rotation. To actually move along the manifold, we need to "curve" the straight tangent-space step back onto SO(3) — that's what the exponential map does.
SO(2) is the unit circle (1D manifold). The tangent space at any point is a line tangent to the circle. Drag the slider to move the base point. The orange tangent line is the "flat land" where optimization steps live. A step along the tangent line (red arrow) leaves the circle; projecting it back (the exp map) lands on the green dot.
You've computed a step δ ∈ R3 in the tangent space at your current estimate R̄. Great — it lives in flat land. But now you need to actually update R̄. You cannot use R̄ + δ̂ (adding the skew matrix breaks SO(3)). Instead you use a retraction: a map that takes a tangent vector and returns a new point on the manifold.
For Lie groups, the natural retraction is the exponential map from L2. Given the current rotation R̄ and a small perturbation δ ∈ R3, the retracted point is:
where δ̂ is the hat map of δ (a 3×3 skew-symmetric matrix) and eδ̂ is the matrix exponential. From L2's Rodrigues formula, if θ = ‖δ‖ and k̂ = δ/θ:
For small δ, Exp(δ) ≈ I + δ̂, so this is indeed a "tiny rotation" in the direction δ. Crucially, the result is always a valid rotation: the matrix exponential of a skew-symmetric matrix is always orthogonal with determinant +1.
python from scipy.linalg import expm import numpy as np def hat(omega): """3-vector -> 3x3 skew-symmetric hat map""" w1, w2, w3 = omega return np.array([[ 0, -w3, w2], [ w3, 0, -w1], [-w2, w1, 0]]) # Current estimate: 30-degree rotation around z-axis theta = np.radians(30) R_cur = np.array([[np.cos(theta), -np.sin(theta), 0], [np.sin(theta), np.cos(theta), 0], [0, 0, 1]]) # Perturbation: delta = [0.1, 0.05, -0.03] rad in tangent space delta = np.array([0.1, 0.05, -0.03]) # Method A: naive addition (WRONG) R_naive = R_cur + hat(delta) print("Naive: det =", np.linalg.det(R_naive)) # ~1.0002 — broken print("Naive: ||R^T R - I||_F =", np.linalg.norm(R_naive.T @ R_naive - np.eye(3))) # ~0.012 # Method B: retraction via exp map (CORRECT) R_new = R_cur @ expm(hat(delta)) print("Retraction: det =", np.linalg.det(R_new)) # 1.0000000000 print("Retraction: ||R^T R - I||_F =", np.linalg.norm(R_new.T @ R_new - np.eye(3))) # ~1e-15
The retraction gives us "manifold addition": given a point R on the manifold and a tangent vector δ, we get a new point on the manifold. This operation is called boxplus (⊞):
where δ ∈ R3 is the tangent vector and Exp is the rotation exponential map. This generalizes ordinary vector addition x + δ to the manifold setting. All the properties we'd want hold: R ⊞ 0 = R (zero step does nothing), and for small δ it approximates moving along the manifold in the direction δ.
The inverse operation — "manifold subtraction" — is boxminus (⊝):
This returns the tangent-space vector δ such that R1 ⊞ δ = R2. In other words, it gives the "how-far-apart" between two rotations expressed as a 3-vector in the tangent space. The Log map (inverse of Exp) from L2 makes this concrete: Log returns the axis-angle vector.
The same applies to poses T = (R, t) ∈ SE(3). A perturbation ξ ∈ R6 = [ρ; φ] (3 translational + 3 rotational) is applied via the SE(3) exponential map:
And the difference of two poses is the 6-vector in the tangent space:
This is exactly what your SLAM back-end uses to compute the "relative pose error" between consecutive pose estimates — a 6-vector in the Lie algebra, not a matrix.
python from scipy.linalg import expm, logm import numpy as np def hat(w): # 3-vector → 3×3 skew return np.array([[0,-w[2],w[1]],[w[2],0,-w[0]],[-w[1],w[0],0]]) def vee(W): # 3×3 skew → 3-vector return np.array([W[2,1], W[0,2], W[1,0]]) def boxplus(R, delta): """R ⊞ delta (right perturbation convention)""" return R @ expm(hat(delta)) def boxminus(R2, R1): """R2 ⊟ R1 → tangent vector delta s.t. R1 ⊞ delta = R2""" return vee(logm(R1.T @ R2)) # Example: two rotations 10 degrees apart around z-axis R1 = expm(hat(np.array([0, 0, np.radians(30)]))) R2 = expm(hat(np.array([0, 0, np.radians(40)]))) delta = boxminus(R2, R1) print("R2 ⊟ R1 =", delta) # ≈ [0, 0, 0.1745] (10 degrees = 0.1745 rad around z) R2_reconstructed = boxplus(R1, delta) print("||R2 - (R1 ⊞ (R2 ⊟ R1))||_F =", np.linalg.norm(R2 - R2_reconstructed)) # ~1e-15
Two rotations (orange = R1, blue = R2) on the unit circle. The ⊝ (boxminus) gives the signed angle δ between them in the tangent space. The ⊞ (boxplus) then adds δ back to R1, reconstructing R2 (shown as green check). Drag the angles to explore.
You have a cost function over rotations: minR ∈ SO(3) ∑i ‖ri(R)‖2. How do you run Gauss-Newton? The L9 GN iteration for flat variables was: solve (JTJ)δ = −JTr, then update x ← x + δ. The on-manifold version changes two things:
That's the entire algorithm. Steps 2 and 3 are identical to L9 — all the cleverness is in step 1 (the parameterization) and step 4 (the retraction). The Jacobian in step 2 is taken with respect to the perturbation δ, not with respect to the 9 entries of R. This gives a 3×3 (or k×3) Jacobian, not a 9×9 one.
Consider a residual r(R) = Log(RT R̃) for a rotation measurement R̃. The residual is already in the tangent space (a 3-vector). To compute ∂r/∂δ|δ=0, expand r(R̄ ⊞ δ) = Log((R̄ ⊞ δ)T R̃) = Log(Exp(δ)T R̄T R̃). For small δ this equals Log(R̄TR̃) − Jr−1 δ + O(δ2), where Jr is the right Jacobian of SO(3). At first order, the Jacobian is −Jr−1 evaluated at the current residual.
In practice for small angles (which is typical near convergence), Jr ≈ I3, so the on-manifold Jacobian simplifies to the identity scaling. The normal equations then have a particularly clean form that gives direct, meaningful updates.
python from scipy.linalg import expm, logm import numpy as np def hat(w): return np.array([[0,-w[2],w[1]],[w[2],0,-w[0]],[-w[1],w[0],0]]) def vee(W): return np.array([W[2,1],W[0,2],W[1,0]]) def Log_SO3(R): return vee(logm(R)) # matrix log → 3-vector # On-manifold Gauss-Newton for rotation averaging # Given N noisy rotation measurements R_meas[i], find R* that minimizes # sum_i ||Log(R^T R_meas[i])||^2 def on_manifold_GN(R_meas, max_iter=20, tol=1e-8): R = np.eye(3) # initial guess: identity for it in range(max_iter): # Build J^T J and J^T r (residuals in tangent space) JTJ = np.zeros((3,3)) JTr = np.zeros(3) cost = 0 for Rm in R_meas: r = Log_SO3(R.T @ Rm) # residual in tangent space (3-vec) J = -np.eye(3) # Jacobian ≈ -I for small angles JTJ += J.T @ J JTr += J.T @ r cost += r @ r # Solve normal equations (flat 3x3 system!) delta = np.linalg.solve(JTJ, -JTr) # Retract back onto manifold R = R @ expm(hat(delta)) if np.linalg.norm(delta) < tol: break return R, cost # Test: 3 measurements near a 45-degree rotation around z R_true = expm(hat(np.array([0, 0, np.radians(45)]))) np.random.seed(42) R_meas = [R_true @ expm(hat(np.random.randn(3) * 0.05)) for _ in range(5)] R_est, cost = on_manifold_GN(R_meas) print("Error (deg):", np.degrees(np.linalg.norm(Log_SO3(R_true.T @ R_est)))) # <1 deg
When we write R ⊞ δ = R · Exp(δ), we are using a right perturbation: the correction is applied on the right side of R. Alternatively, we could use a left perturbation: Exp(δ) · R. These give different algorithms, with different Jacobians. Both are valid; they just work in different tangent spaces.
Right perturbation: δ lives in the body frame of the current rotation. A step δ = [0, 0, θ]T means "rotate around the body's z-axis." Left perturbation: δ lives in the world frame. The same vector means "rotate around the world's z-axis."
In VNAV/SLAM, right perturbation is the dominant convention (it's also what GTSAM and Ceres use internally). The residuals and Jacobians are expressed in the body frame, which aligns naturally with how IMU measurements are expressed.
For a residual r(R) = Log(RT M) where M is a fixed measurement rotation:
Right perturbation: R ← R · Exp(δ). The Jacobian of r with respect to δ at δ=0 is −Jr−1(Log(RTM)) where Jr is the right Jacobian of SO(3).
Left perturbation: R ← Exp(δ) · R. The Jacobian of r with respect to δ at δ=0 is −RT · Jr−1(Log(RTM)).
Near convergence, when the residuals are small, Jr ≈ I and both Jacobians simplify to −I or −RT respectively. In code, you'll often see the identity approximation used for efficiency.
Rotation averaging is the canonical manifold optimization problem: given N noisy rotation measurements R̃i, find the rotation R* that minimizes the total geodesic distance squared:
This is the Riemannian Fréchet mean — the rotation closest to all measurements in the geodesic (curved-surface) distance sense. It's what your SLAM back-end computes every time it fuses multiple relative-rotation measurements for a single node.
Let's work through a tiny example: two measurements in SO(2) (2D rotations). Let R̃1 = 20° and R̃2 = 40°. Initial guess: R̄ = 0°.
Step 1 — Compute residuals in tangent space. The residual for measurement i is ri = Log(R̄T R̃i). In SO(2), this is just the signed angle difference:
Step 2 — Jacobian. In SO(2) the residual ri(θ) = R̃i − (θ + δ). So ∂ri/∂δ = −1. Jacobian J = −1 for each measurement.
Step 3 — Normal equations. With J = [−1; −1] (2 measurements, 1 unknown δ):
Step 4 — Retract. R̄ ← R̄ ⊞ δ* = 0° + 30° = 30°. Verify: 30° is exactly the average of 20° and 40°. Converged in one step (because SO(2) is one-dimensional and the residuals are linear in the angle).
What if the initial guess is far away? Try R̄ = 180°, measurements at 20° and 40°. The residuals are now 160° and 140°. The step δ* = 150°. We jump from 180° to 30° in one step — correct! But only because SO(2) is 1D and the cost is convex. In SO(3) with many measurements, large initial errors require multiple iterations and possibly LM damping to avoid overshooting.
Blue dots = measured rotations on the unit circle. Orange dot = current estimate. Press "Step GN" to run one on-manifold Gauss-Newton iteration. "Reset" starts from the red initial guess. Watch the estimate converge to the geodesic mean (teal star).
This showcase brings all the pieces together: a full on-manifold Gauss-Newton optimizer on SO(2), the manifold-break demonstration, gimbal-lock contrast, and convergence diagnostics — all live and interactive.
Left panel: the unit circle manifold. Orange dot = current on-manifold estimate (always on circle). Red diamond = flat Gauss-Newton estimate (drifts off circle). Blue dots = noisy measurements. Right panel: cost vs. iteration for both methods. Sliders control noise and number of measurements. Watch the red diamond drift further off as you increase iterations.
Every piece of the on-manifold puzzle is now assembled. Here's the complete picture in one table, plus connections to the rest of the VNAV pipeline.
| Concept | Flat (vector) world | Manifold (SO(3)/SE(3)) world |
|---|---|---|
| State space | Rn — a vector | Lie group — a curved surface |
| Local representation | The vector itself | Tangent space TRSO(3) ≅ R3 |
| Addition | x + δ | x ⊞ δ = x · Exp(δ) |
| Subtraction | x2 − x1 | x2 ⊝ x1 = Log(x1T x2) |
| Residual | z̃ − h(x) | z̃ ⊝ h(x) = Log(h(x)T z̃) |
| GN step | Solve (JTJ)δ = −JTr, then x ← x + δ | Same solve, then x ← x ⊞ δ |
| Jacobian size | m × n | m × 3 (SO(3)) or m × 6 (SE(3)) |
| Covariance | (JTΩJ)−1 in Rn | (JTΩJ)−1 in TRSO(3) — tangent space covariance |
| Singularities | None (flat) | None — local coords avoid gimbal lock |
VNAV L2 (Lie Groups) built the exp/log maps and hat/vee operators that this lesson relies on. The Rodrigues formula, the BCH approximation, and the retraction are all from L2.
VNAV L9 (NLLS) is the engine this lesson wraps. All the Gauss-Newton and Levenberg-Marquardt machinery transfers directly — we only changed "+" to "⊞" and added a retraction step.
VNAV L11 (VIO) uses on-manifold optimization continuously: the IMU pre-integration residuals live in SE(3), and the optimizer maintains a pose estimate as an SE(3) element updated via ⊞ at every step.
VNAV L13 (Factor Graphs / SLAM) is the full application: every pose variable in a factor graph is an SE(3) element, every edge is a relative-pose measurement with a Gaussian noise model in the tangent space, and the back-end solver is exactly the on-manifold GN/LM developed here — but run over thousands of poses simultaneously using sparse linear algebra.
"The key insight is that the constraint manifold is locally a linear space — the tangent space — and we can do all our optimization work there, then project back."
— Luca Carlone, MIT 16.485 Lecture 18