Visual Navigation for Autonomous Vehicles · MIT 16.485 · Lectures 4–5

Lie Groups: SO(3), SE(3) & the Exponential Map

You want to optimize a drone's orientation — nudge it slightly toward a target. In ordinary calculus you just add a small vector. But rotation matrices live on a curved manifold: add two of them and you leave SO(3) entirely, violating orthogonality. Lie theory gives the fix: encode the update as a tangent vector, map it onto the manifold via the matrix exponential, and compose. The result is the retraction R ← R·exp(δ̂) — the foundation of every modern state estimator and graph optimizer. This lesson derives it all from the power series, with worked numbers at every step. MIT 16.485 by Luca Carlone.

Prerequisites: VNAV L1 (rotation matrices, SO(3), SE(3) basics). Linear algebra (matrix multiply, eigenvalues). Basic calculus.
10
Chapters
5
Live Canvases
Derived
From Power Series

Chapter 0: The Optimization Problem

Imagine you are running a drone localization system. Every few milliseconds, new IMU data arrives and you need to update your estimate of the drone's orientation — a rotation matrix R ∈ SO(3). The natural thing to do in optimization is take a gradient step: compute a small correction δ and set R ← R + δ. This works for scalars, for position vectors, for anything that lives in a flat Euclidean space.

It does not work for rotation matrices. Here is the concrete disaster. Suppose R is the identity I₃ (facing forward) and you add a matrix ε that represents a small rotation. The sum I₃ + ε is not a rotation matrix. Its columns are no longer unit length, no longer orthogonal. RTR ≠ I. det ≠ 1. Your "rotation" is now a shear or a scaling — physically meaningless. Every subsequent operation that assumed R ∈ SO(3) gives garbage.

The core problem: SO(3) is not a vector space. You cannot add two rotation matrices and get a rotation matrix. You cannot average them. You cannot take a gradient step. The set of valid rotations is a curved surface — a manifold — embedded in ℝ3×3, and straight lines in ℝ3×3 leave that surface immediately.

The solution is Lie theory. Every rotation matrix R has a tangent space — a flat 3D vector space that "touches" the manifold at R. You express your update δ as a vector in the tangent space (specifically, at the identity). The exponential map exp curves that flat update back onto the manifold, producing a valid rotation. The update rule becomes:

R ← R · exp(δ̂)

where δ ∈ ℝ³ is the tangent update, δ̂ is its 3×3 skew-symmetric matrix form, and exp(δ̂) is the matrix exponential. This retraction is the foundational operation in SLAM, visual odometry, and any optimization over rotations. This lesson derives every piece from scratch.

Why three representations? A preview

By the end of this lesson you will have seen three things that are all "the same rotation" expressed differently:

ObjectSpaceFormUse
R ∈ SO(3)Lie group (manifold)3×3 matrixApply rotation to vectors, compose rotations
φ̂ ∈ so(3)Lie algebra (tangent space at I)3×3 skew matrixRepresent updates, compute Jacobians
φ ∈ ℝ³Euclidean vector space3-vector (axis×angle)Store compactly, do linear operations on updates

The hat map converts φ ↔ φ̂ (trivially). The exp/log maps convert φ̂ ↔ R (non-trivially — this is where Rodrigues lives). Understanding all three objects and both maps is the core of Lie theory for robotics.

Manifold vs. flat space — why adding rotations breaks them

Left: the red circle represents SO(2) — unit-det 2×2 rotations (angle θ). The blue line is the tangent space at the identity. Adding δ along the tangent (orange dot) lifts OFF the circle. Mapping via exp (teal dot) stays ON it. Drag δ to see the gap grow.

Tangent update δ 0.80
You have a valid rotation matrix R ∈ SO(3) and a small correction matrix ε. Why is R + ε NOT a valid rotation matrix in general?

Chapter 1: Groups & Lie Groups

Before we can talk about the exponential map, we need to understand what kind of mathematical object SO(3) is. The answer: it is a Lie group. A Lie group is both a group (in the algebraic sense) and a smooth manifold (in the geometric sense). Both parts matter.

Group axioms

A group (G, ⊗) is a set G with a binary operation ⊗ satisfying four axioms:

AxiomStatementFor SO(3)
ClosureA ⊗ B ∈ G for all A,B ∈ GR₁R₂ is a rotation if R₁,R₂ are
Associativity(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C)Matrix multiplication is associative
Identity∃ I: A ⊗ I = I ⊗ A = AI₃ — the identity matrix
Inverse∃ A⁻¹: A ⊗ A⁻¹ = IR⁻¹ = RT

SO(3) satisfies all four with matrix multiplication as ⊗. Note what is absent: commutativity. R₁R₂ ≠ R₂R₁ in general — SO(3) is a non-abelian group. Rotations in 3D do not commute, a fact that will return to haunt us in Chapter 7.

Proof that SO(3) is non-abelian — with numbers

Let R1 = Rz(90°) and R2 = Rx(90°). Compute both orders:

Rz(90°) = [[0,−1,0],[1,0,0],[0,0,1]]
Rx(90°) = [[1,0,0],[0,0,−1],[0,1,0]]

RzRx:
row0: [0·1+(−1)·0+0·0, 0·0+(−1)·0+0·1, 0·0+(−1)·(−1)+0·0] = [0, 0, 1]
row1: [1·1+0·0+0·0, 1·0+0·0+0·1, 1·0+0·(−1)+0·0] = [1, 0, 0]
row2: [0·1+0·0+1·0, 0·0+0·0+1·1, 0·0+0·(−1)+1·0] = [0, 1, 0]
⇒ RzRx = [[0,0,1],[1,0,0],[0,1,0]]

RxRz:
row0 = [0,−1,0] (same as Rz row0 since Rx row0=[1,0,0])
row1 = [0·0+(−1)·1+0·0, 0·(−1)+(−1)·0+0·0, ...] ⇒ [−1, 0, 0] (row1 = Rz row2 negated...)
⇒ RxRz = [[0,−1,0],[0,0,−1],[1,0,0]]

RzRx ≠ RxRz — the "final orientation" depends on which rotation came first. This is not a numerical artifact. It reflects the fundamental topology of SO(3).

SE(3) is also a group under matrix multiplication. Its identity element is the 4×4 identity. Its inverse is [[RT, −RTt],[0, 1]], as derived in L1.

Manifold structure

A smooth manifold is a space that looks locally flat — every neighborhood of every point is homeomorphic to ℝn for some n. SO(3) is a 3-dimensional manifold embedded in ℝ9 (the space of 3×3 matrices): locally, near any rotation, the set of valid rotations looks like a 3D flat space. SE(3) is a 6-dimensional manifold (3 for rotation, 3 for translation).

A Lie group is a smooth manifold that is also a group, with smooth group operations. SO(3), SE(3), O(d), and the general linear group GL(d,ℝ) are all matrix Lie groups. The key benefit: they admit a tangent space at every point, giving us the machinery to do calculus — and optimization — on them.

Key insight: a Lie group is a curved space with algebraic structure. The "curved" part (manifold) means you cannot add elements like vectors. The "algebraic" part (group) means you can compose elements, invert them, and navigate relative to the identity. The exponential map bridges both worlds: it converts flat tangent-space vectors into curved group elements.
Which of these is NOT a property of the group SO(3) under matrix multiplication?

Chapter 2: The Lie Algebra so(3)

Every matrix Lie group G has an associated Lie algebra g — the tangent space at the identity element. Concretely: imagine a smooth curve through the identity in SO(3). The derivative of that curve at the identity is a matrix in the tangent space. The Lie algebra is the set of all such derivatives.

For SO(3), what matrices can appear as derivatives of rotation-matrix curves at the identity? A curve R(t) with R(0) = I and R(t) ∈ SO(3) for all t satisfies R(t)TR(t) = I. Differentiating at t=0:

Ọ(0)T · I + I · Ọ(0) = 0 ⇒ Ọ(0)T = −Ọ(0)

So the tangent vectors at the identity must be skew-symmetric matrices — matrices satisfying AT = −A. The Lie algebra of SO(3) is:

so(3) = { A ∈ ℝ3×3 : AT = −A }

A 3×3 skew-symmetric matrix has exactly 3 free parameters (the upper triangle determines the rest via the antisymmetry). So so(3) is a 3-dimensional vector space — matching the 3 degrees of freedom of rotation in 3D.

The general form

Any element of so(3) can be written:

Φ =
  [  0, −φ3,  φ2]
  [φ3,   0, −φ1]
  [−φ2, φ1,   0 ]

where φ = [φ₁, φ₂, φ₃]T ∈ ℝ³. The 3-vector φ uniquely specifies the skew matrix and vice versa. This bijection between ℝ³ and so(3) is formalized as the hat map (Chapter 3).

Similarly, the Lie algebra of SE(3) is se(3) — 4×4 matrices of the form [Φ, ρ; 0, 0] where Φ ∈ so(3) and ρ ∈ ℝ³. The 6-vector ξ = [φ, ρ]T is called a twist.

Worked dimension count

SO(3) is a 3×3 matrix group but has only 3 degrees of freedom. Where do the other 6 entries go? The constraint RTR = I imposes 6 independent equations (the 9 entries of I, but the matrix is symmetric so only 6 are independent). So: 9 free parameters minus 6 constraints = 3 DOF. ✓

SE(3): a 4×4 matrix with the bottom row fixed to [0,0,0,1] leaves 12 free entries. The R block imposes 6 constraints. The t block is free (3 entries). Net: 3 + 3 = 6 DOF. ✓

The Lie algebra dimension equals the manifold dimension — by construction. This is why so(3) is 3-dimensional (exactly 3 free parameters in a skew-symmetric matrix) and se(3) is 6-dimensional.

Misconception: the Lie algebra is not the set of "small" rotations. so(3) is an entirely different mathematical space from SO(3). Elements of so(3) are skew-symmetric matrices, not rotation matrices. The exponential map is what converts between them. An element of so(3) is a tangent vector — a direction and speed — not a rotation.
Why must tangent vectors to SO(3) at the identity be skew-symmetric matrices?

Chapter 3: Hat Map, Vee Map & Angular Velocity

The bijection between ℝ³ and so(3) is so useful it gets two dedicated operators. The hat map (·)^ turns a 3-vector into its corresponding skew-symmetric matrix. The vee map (·)∨ does the reverse.

φ̂ = (φ) =
  [  0, −φ3,  φ2]
  [φ3,   0, −φ1]
  [−φ2, φ1,   0 ]

And the inverse: given the skew matrix on the right, (·)∨ extracts the vector [φ₁, φ₂, φ₃]T.

Worked numbers — hat and vee

Let φ = [1, 2, 3]T. Then:

φ̂ = [1,2,3]̂ =
  [ 0, −3,  2]
  [ 3,  0, −1]
  [−2,  1,  0]

Verify: transpose gives [[0,−3,2],[3,0,−1],[−2,1,0]]T = [[0,3,−2],[−3,0,1],[2,−1,0]] = −φ̂. ✓ Skew-symmetric.

Vee: given [[0,−3,2],[3,0,−1],[−2,1,0]], extract upper triangle entries: (1,3) = 2 → φ₂, (1,2) = −3 → φ₃, (2,3) = −1 → φ₁. So (·)∨ = [1, 2, 3]T. ✓

Angular velocity interpretation

The hat map is not just notation. If a rigid body has angular velocity ω = [ω₁, ω₂, ω₃]T (radians/second), then the time derivative of its rotation matrix is:

Ṙ = ω̂ · R    (body angular velocity in world frame)

This is the kinematic equation of rotation. The skew matrix ω̂ acts as the "infinitesimal generator" of rotation — it tells you which direction in ℝ3×3 the rotation is moving. The hat map converts the physical quantity (angular velocity vector) into the algebraic object (element of so(3)) that drives the kinematics.

The hat map is also the cross-product matrix: φ̂ · v = φ × v for any v ∈ ℝ³. This is why ω̂R is the right formula: ωω̂R corresponds to rotating each column of R by the angular velocity — exactly what you want kinematically.

Key algebraic properties of the hat map

These identities come up constantly in derivations and implementations:

IdentityIn words
φ̂ v = φ × vSkew matrix acts as cross product
φ̂ = −φ̂TSkew-symmetric by definition
tr(φ̂) = 0Trace of any skew matrix is zero
û̂2 = uuT − ISquare of unit skew (used in Rodrigues proof)
û̂3 = −û̂The cubing identity (makes series collapse)
(aφ + bψ)̂ = aφ̂ + bψ̂Hat map is linear
[φ̂, ψ̂] = (φ × ψ)̂Lie bracket = cross product hat

The last row says the Lie bracket of so(3) corresponds to the cross product of ℝ³ — this is why so(3) and (ℝ³, ×) are isomorphic as Lie algebras. It also gives a concrete formula for the BCH commutator in so(3): [φ̂, ψ̂] = (φ × ψ)̂.

python
import numpy as np

# Verify key hat-map identities with concrete numbers
phi = np.array([1, 2, 3])
psi = np.array([4, -1, 2])
v   = np.array([1, 0, 0])

# Identity 1: φ̂v = φ×v
print(hat(phi) @ v)            # skew-matrix-times-v
print(np.cross(phi, v))         # cross product
# Both: [ 0.  3. -2.] ✓

# Identity 4: û̂² = uuᵀ − I
u = phi / np.linalg.norm(phi)
u_hat = hat(u)
print(np.round(u_hat @ u_hat, 4))       # û̂²
print(np.round(np.outer(u,u) - np.eye(3), 4))  # uuᵀ−I
# Identical ✓

# Identity 7: [φ̂,ψ̂] = (φ×ψ)̂
lie_bracket = hat(phi) @ hat(psi) - hat(psi) @ hat(phi)
cross_hat   = hat(np.cross(phi, psi))
print(np.allclose(lie_bracket, cross_hat))  # True ✓
Hat map visualizer — vector ↔ skew matrix

Drag the sliders to set φ = [φ₁, φ₂, φ₃]. The hat matrix is shown live, with color-coded entries. Orange = upper triangle (positive), teal = lower (negative, by antisymmetry). The vee map extracts φ back. Also shown: φ̂·v for a fixed test vector v = [1,0,0]T, verifying it equals φ × v.

φ11.0
φ22.0
φ33.0
Given φ = [0, 0, 1]T, what is φ̂ (the hat matrix)?

Chapter 4: Rodrigues' Formula — Derived from the Power Series

The matrix exponential of a square matrix A is defined by the power series:

exp(A) = I + A + A2/2! + A3/3! + A4/4! + …

For a general matrix, this series does not simplify. But for skew-symmetric matrices in so(3), it collapses to a beautiful closed form — Rodrigues' formula. The key is a remarkable property of the hat map.

Step 1: the cubing identity

Let φ ∈ ℝ³ with ‖φ‖ = θ (the rotation angle). Define the unit axis û = φ/θ. Then φ̂ = θû̂. Compute û̂³:

û̂2 = uuT − I    (outer product minus identity)
û̂3 = û̂2 · û̂ = (uuT − I)û̂ = uuTû̂ − û̂

Now û̂u = û × u = 0 (cross product of a vector with itself is zero), so uuTû̂ = u(ûTû̂) = 0. Therefore:

û̂3 = −û̂

This is the critical identity. It means the powers of û̂ cycle: û̂, û̂², −û̂, −û̂², +û̂, ... Substituting φ̂ = θû̂:

φ̂3 = θ3û̂3 = −θ3û̂ = −θ2φ̂
φ̂4 = −θ2φ̂2,   φ̂5 = θ4φ̂,   ...

Step 2: split the series

Expand exp(φ̂) and separate odd and even powers:

exp(φ̂) = I + φ̂ + φ̂2/2! + φ̂3/3! + φ̂4/4! + φ̂5/5! + …

= I + (θ − θ3/3! + θ5/5! − …)û̂
     + (θ2/2! − θ4/4! + θ6/6! − …)û̂2

The first series in parentheses is exactly sin(θ). The second is exactly 1 − cos(θ). Therefore:

exp(φ̂) = I + sin(θ)û̂ + (1 − cos(θ))û̂2

Substituting back û = φ/θ and û̂ = φ̂/θ:

R = exp(φ̂) = I + sin(θ)θ φ̂ + (1−cos(θ))θ2 φ̂2

This is Rodrigues' formula — the closed-form matrix exponential for so(3) elements. The exponential map exp: so(3) → SO(3) is exactly Rodrigues' formula. What Lecture 1 presented as an axiom, we have now derived from the power series.

Worked numbers — 90° about z

Let φ = [0, 0, π/2]T. Then θ = π/2, û = [0,0,1]T.

û̂ = [[0,−1,0],[1,0,0],[0,0,0]]

û̂2 = û̂·û̂ =
[[0,−1,0],[1,0,0],[0,0,0]] · [[0,−1,0],[1,0,0],[0,0,0]]
= [[−1,0,0],[0,−1,0],[0,0,0]]

sin(π/2) = 1,   1−cos(π/2) = 1

R = I + 1·û̂ + 1·û̂2
= [[1,0,0],[0,1,0],[0,0,1]]
+ [[0,−1,0],[1,0,0],[0,0,0]]
+ [[−1,0,0],[0,−1,0],[0,0,0]]
= [[0,−1,0],[1,0,0],[0,0,1]]

Compare with L1's Rz(90°) = [[0,−1,0],[1,0,0],[0,0,1]]. ✓ Exact match. The exponential map and the geometric rotation formula are the same thing.

The exp map is NOT an analogy — it IS the matrix exponential. Rodrigues' formula is what you get when you expand exp as a power series and use the cubing identity to collapse it. The formula did not come from geometric intuition; the geometry is a consequence of the algebra.
Rodrigues builder — axis, angle, series terms

Set the rotation axis (fixed to z for clarity) and angle θ. Watch the three terms I (grey), sin(θ)·û̂ (orange), (1−cosθ)·û̂² (teal) assemble into R. A square in body frame is rotated. The numeric result updates live.

Angle θ90°
python
import numpy as np

def hat(phi):
    """3-vector → 3×3 skew-symmetric matrix."""
    p1, p2, p3 = phi
    return np.array([[   0, -p3,  p2],
                     [ p3,   0, -p1],
                     [-p2,  p1,   0]])

def vee(Phi):
    """3×3 skew-symmetric matrix → 3-vector."""
    return np.array([Phi[2,1], Phi[0,2], Phi[1,0]])

def rodrigues(phi):
    """Rodrigues' formula: exp map so(3) → SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3) + hat(phi)  # first-order approx
    u_hat = hat(phi / theta)
    return np.eye(3) + np.sin(theta)*u_hat + (1-np.cos(theta))*u_hat@u_hat

# Test: 90° about z-axis
phi = np.array([0, 0, np.pi/2])
R = rodrigues(phi)
print(np.round(R, 4))
# [[ 0. -1.  0.]
#  [ 1.  0.  0.]
#  [ 0.  0.  1.]]

# Verify it's a rotation: R^T R = I, det = 1
print(np.allclose(R.T @ R, np.eye(3)))  # True
print(np.round(np.linalg.det(R), 4))    # 1.0
In the Rodrigues derivation, the series collapse relies on û̂³ = −û̂. What property of û produces this identity?

Chapter 5: The Logarithm Map

The exponential map goes from the Lie algebra to the Lie group: exp: so(3) → SO(3). The inverse is the logarithm map log: SO(3) → so(3). Given a rotation matrix R, the log recovers the tangent vector φ = θû that, when exponentiated, produces R.

Extracting the rotation angle

From Rodrigues' formula R = I + sin(θ)û̂ + (1−cos(θ))û̂², take the trace:

tr(R) = tr(I) + sin(θ)·tr(û̂) + (1−cos(θ))·tr(û̂2)
= 3 + 0 + (1−cos(θ))·(−2)
= 1 + 2cos(θ)

⇒   cos(θ) = tr(R) − 12   ⇒   θ = arccos(tr(R) − 12)

Where we used tr(û̂) = 0 (diagonal is all zeros) and tr(û̂²) = tr(uuT − I) = û·û − 3 = 1 − 3 = −2.

Extracting the rotation axis

From Rodrigues: R − RT = 2sin(θ)û̂. So:

log(R) = φ̂ = θ2sin(θ) (R − RT)    (when θ ≠ 0)

The vee of this gives φ = θû. Special cases: if R = I then θ = 0 and log(I) = 0. Near θ = π the formula is numerically unstable and a different branch formula is needed (beyond this lesson).

Worked numbers

Let R = Rz(90°) = [[0,−1,0],[1,0,0],[0,0,1]]. Recover φ:

tr(R) = 0 + 0 + 1 = 1  ⇒  θ = arccos((1−1)/2) = arccos(0) = π/2

R − RT = [[0,−1,0],[1,0,0],[0,0,1]] − [[0,1,0],[−1,0,0],[0,0,1]]
= [[0,−2,0],[2,0,0],[0,0,0]]

φ̂ = (π/2)2sin(π/2) · [[0,−2,0],[2,0,0],[0,0,0]]
= (π/2)2 · [[0,−2,0],[2,0,0],[0,0,0]]
= [[0,−π/2,0],[π/2,0,0],[0,0,0]]

φ = vee(φ̂) = [0, 0, π/2]T

We recover exactly the input φ = [0,0,π/2]T. ✓ Round-trip complete.

The θ = π singularity

The formula log(R) = θ/(2sin(θ)) · (R − RT) breaks when θ = π because sin(π) = 0. At θ = π, we have R − RT = 0 (verify: sin(π) = 0, so 2sin(θ)û̂ = 0). But R ≠ I! The issue is that at θ = π, the rotation axis û is not recoverable from R − RT alone.

Instead, at θ = π: R + RT = 2(cos(π)I + (1−cos(π))uuT) = 2(−I + 2uuT). So R + I = 2uuT (the columns of R + I give 2u), and the axis can be recovered from any nonzero column of R + I. In practice, choose the column with the largest norm for numerical stability.

Additionally, at θ = π there is a 2-fold ambiguity: the rotations exp(πû̂) = exp(−πû̂) = exp(π(−û)̂) are the same (π rotation about û = −π rotation about −û). This is an inherent topological property of SO(3) — its fundamental group π₁(SO(3)) = ℤ₂, which is why quaternions (the double cover) are often preferred near θ = π.

Practical rule: use the log formula freely for |θ| < π − ε. Near θ = ±π (within ~1°), switch to the robust formula using the columns of R + I. Most SLAM problems never reach θ = π (rotations of exactly 180° are rare in smooth trajectories), but you must handle it for robustness.
exp/log round-trip — tangent vector → R → back

Set axis (z fixed) and angle θ. Top row: exp(φ̂) → R (orange). Bottom: log(R) → φ̂ → φ (teal). Both paths shown. Slider past ±π shows the wrap-around ambiguity (±θ give the same R).

Angle θ90°
python
import numpy as np

def log_so3(R):
    """Log map SO(3) → so(3). Returns skew matrix φ̂."""
    cos_theta = (np.trace(R) - 1) / 2
    cos_theta = np.clip(cos_theta, -1, 1)  # numerical safety
    theta = np.arccos(cos_theta)
    if theta < 1e-10:
        return np.zeros((3,3))
    return theta / (2 * np.sin(theta)) * (R - R.T)

# Round-trip test
phi_in = np.array([0, 0, np.pi/2])
R = rodrigues(phi_in)
Phi_hat = log_so3(R)
phi_out = vee(Phi_hat)
print(np.round(phi_out, 4))  # [0. 0. 1.5708] = [0, 0, π/2] ✓

# Note: past θ=π, log returns the "short" path (|θ| ≤ π)
phi_big = np.array([0, 0, 5])  # > π
R2 = rodrigues(phi_big)
phi_recovered = vee(log_so3(R2))
print(np.round(phi_recovered, 4))  # wrapped: ≠ [0,0,5]
For R = Rz(180°) = [[−1,0,0],[0,−1,0],[0,0,1]], what is the rotation angle θ recovered by the log map?

Chapter 6: SE(3): Twists & the exp Map

Rigid-body poses T ∈ SE(3) form a Lie group too. The Lie algebra se(3) consists of 4×4 matrices of the form:

ξ̂ = [φ̂, ρ; 0T, 0] ∈ se(3)

where φ̂ ∈ so(3) is the rotational part and ρ ∈ ℝ³ is the translational part. The 6-vector ξ = [φ; ρ]T ∈ ℝ⁶ is called a twist — it encodes both angular velocity (φ) and linear velocity (ρ) simultaneously.

The SE(3) exponential map

Applying the matrix exponential to ξ̂ ∈ se(3) gives:

T = exp(ξ̂) = [exp(φ̂), Jl(φ)ρ; 0T, 1]

The rotation block is exactly Rodrigues applied to φ. The translation block is Jl(φ)ρ, not just ρ — the rotation "bends" the translation. Jl(φ) is the left Jacobian (also called the left-invariant Jacobian) of SO(3):

Jl(φ) = I + (1−cosθ)θ2 φ̂ + (θ−sinθ)θ3 φ̂2

When θ → 0, Jl(φ) → I, and the SE(3) exp map reduces to T = [I, ρ; 0, 1] — a pure translation. This makes sense: a small twist with no rotation just translates.

When φ ≠ 0, Jl couples the rotation and translation. Physically: if you rotate while translating, the effective displacement in world coordinates depends on the rotation. The Jacobian accounts for this coupling.

Worked numbers: SE(3) exp with rotation + translation

Let ξ = [φ; ρ] = [[0,0,π/2]; [1,0,0]]T — a 90° z-rotation combined with a 1 m x-translation in the body twist. Compute the left Jacobian:

φ = [0,0,π/2], θ = π/2

Jl(φ) = I + (1−cos(π/2))(π/2)2 φ̂ + (π/2−sin(π/2))(π/2)3 φ̂2
= I + 1(π/2)2 φ̂ + (π/2−1)(π/2)3 φ̂2

φ̂ (for [0,0,π/2]) = π/2 · [[0,−1,0],[1,0,0],[0,0,0]]

Jlρ ≠ ρ = [1,0,0]T

The translation in the resulting T is Jl[1,0,0]T ≠ [1,0,0]T — the rotation bends the translation. Physically: if you drive 1 m forward while rotating 90°, the effective endpoint is a curved arc, not a straight line. The Jacobian computes the chord of that arc exactly.

Do NOT confuse ρ with the translation t in T = [R,t;0,1]. The twist's ρ is the translational velocity in the Lie algebra. The actual translation t in the group element is Jl(φ)ρ — they differ whenever there is rotation. Setting t = ρ is a common error in implementing SE(3) optimization.
python
import numpy as np
from scipy.linalg import expm  # reference: matrix exponential

def hat_se3(xi):
    """6-vector [phi; rho] → 4×4 se(3) matrix."""
    phi, rho = xi[:3], xi[3:]
    Xi = np.zeros((4,4))
    Xi[:3,:3] = hat(phi)
    Xi[:3,3]  = rho
    return Xi

def left_jacobian(phi):
    """Left Jacobian J_l(phi) for SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3)
    ph = hat(phi)
    return (np.eye(3) + (1-np.cos(theta))/theta**2 * ph
            + (theta-np.sin(theta))/theta**3 * ph@ph)

def exp_se3(xi):
    """SE(3) exp map: 6-vector → 4×4 pose matrix."""
    phi, rho = xi[:3], xi[3:]
    R = rodrigues(phi)
    t = left_jacobian(phi) @ rho
    T = np.eye(4)
    T[:3,:3] = R
    T[:3,3]  = t
    return T

# Test: pure z-rotation + translation
xi = np.array([0, 0, np.pi/2, 1, 0, 0])  # 90° z, 1m x-translation
T = exp_se3(xi)
print(np.round(T, 4))
# R block = Rz(90°); t block = J_l([0,0,π/2])·[1,0,0] ≠ [1,0,0]
print(np.allclose(expm(hat_se3(xi)), T))  # True — matches scipy reference
For a pure translation (φ = 0, ρ = [1,0,0]), what does the SE(3) exp map produce?

Chapter 7: BCH Formula & Adjoint (Brief)

Two more structural facts about Lie groups that come up constantly in robotics: the Baker-Campbell-Hausdorff formula and the Adjoint.

Baker-Campbell-Hausdorff (BCH)

In ordinary calculus, eaeb = ea+b. For matrix exponentials of non-commuting matrices A, B:

exp(A)exp(B) = exp(A + B + 12[A,B] + 112[A,[A,B]] − 112[B,[A,B]] + …)

where [A,B] = AB − BA is the Lie bracket. For so(3), the Lie bracket is [φ̂₁, φ̂₂] = φ̂₁φ̂₂ − φ̂₂φ̂₁, which corresponds to the cross product: [φ₁]^[φ₂]^ = (φ₁ × φ₂)^.

The practical consequence: composing two rotations exp(A)exp(B) is not exp(A+B). The error depends on the commutator [A,B]. When A and B are small (near the identity), the BCH correction is second-order in the magnitudes, so for small perturbations it is often ignored. This is the justification for the first-order approximation used in iterative solvers.

BCH tells you how much rotation order matters. If you rotate 5° about x then 5° about y, you get a slightly different result than rotating 5° about y then 5° about x. The discrepancy is proportional to [φ₁ × φ₂] — the first BCH correction. For 5°, it is tiny. For 90°, it is significant.

Adjoint

The Adjoint of a group element g ∈ G is a linear map Adg: g → g that transforms Lie algebra elements between frames. For SO(3):

AdR(φ̂) = R φ̂ RT   ⇔   AdR(φ) = Rφ

This says: if you have a tangent vector φ expressed in the body frame, multiplying by R converts it to the world frame. For SE(3), the 6×6 Adjoint matrix is:

AdT = [[R, t̂R], [0, R]]

The Adjoint is used extensively in kinematics (converting velocities between frames) and in the derivation of Jacobians for pose estimation. You will see it again in the optimization lectures.

Why BCH matters for SLAM

In a factor-graph optimizer (like g2o or GTSAM), you often need to combine two pose updates: a "motion model" update and a "measurement" update. Each is expressed as an exp of a tangent vector. Naively concatenating: T_new = T · exp(δ₁) · exp(δ₂). If you want to express this as a single exp: T · exp(BCH(δ₁, δ₂)), you need BCH. In practice, optimizers sidestep this by iterating with small steps where BCH ≈ addition. But knowing BCH exists — and when it matters — prevents subtle bugs in large-angle scenarios.

Numerical verification of BCH

Let A = [0,0,0.5]̂ (0.5 rad about z) and B = [0,0.4,0]̂ (0.4 rad about y). Then:

exp(A)exp(B) ≠ exp(A+B)

exp(A+B): rotate by combined axis/angle [0, 0.4, 0.5]
exp(A)exp(B): first rotate 0.5 rad about z, then 0.4 rad about y

The difference (first BCH correction) = ½[A,B]
= ½(AB − BA) ≠ 0    (since rotations don't commute)

For these angles (~28° each), the BCH correction has Frobenius norm ≈ 0.1. For 5° each, it would be ≈ 0.004 — small but nonzero. At 1°, it is ~10⁻⁴. This is why linearized solvers work: they iterate with steps of a few degrees at most, making BCH negligible.

python
import numpy as np
from scipy.linalg import expm

# BCH verification: exp(A)exp(B) ≠ exp(A+B) for non-small angles
A_vec = np.array([0, 0, 0.5])  # 0.5 rad about z
B_vec = np.array([0, 0.4, 0])  # 0.4 rad about y
A = hat(A_vec); B = hat(B_vec)

R_AB  = expm(A) @ expm(B)      # exp(A)·exp(B)
R_sum = expm(A + B)             # exp(A+B) — first-order approx

diff = np.linalg.norm(R_AB - R_sum, 'fro')
print(f"‖exp(A)exp(B) − exp(A+B)‖_F = {diff:.4f}")  # ≈ 0.033

# BCH first-order correction: ½[A,B] = ½(AB−BA)
commutator = 0.5 * (A @ B - B @ A)
R_bch1 = expm(A + B + commutator)
diff2 = np.linalg.norm(R_AB - R_bch1, 'fro')
print(f"‖exp(A)exp(B) − exp(A+B+½[A,B])‖_F = {diff2:.4f}")  # ≈ 0.001 — much smaller
The BCH formula says exp(A)exp(B) = exp(A + B + ½[A,B] + …). When is the first-order approximation exp(A)exp(B) ≈ exp(A+B) most accurate?

Chapter 8: Showcase: Retraction Optimizer

This is the payoff. We simulate a simple rotation optimization: given a "target" rotation Rgoal, we want to find it starting from Rinit = I. A gradient descent step in SO(3) uses the retraction:

R ← R · exp(-α δ̂)

where δ = log(RT Rgoal)∨ is the tangent error (how far R is from Rgoal, expressed in the tangent space at R) and α is the step size. Compare this with a naive additive step R ← R − α·δ̃ that ignores the manifold — the naive step immediately leaves SO(3).

The canvas shows both paths. The circular arc (left) represents SO(2) rotations (the 2D analog of SO(3)) as angles on the unit circle. The retraction step (teal) stays on the circle. The additive step (red) drifts off it.

The full algorithm spelled out

Here is the retraction-based rotation optimizer in pseudocode, annotated with every Lie-theoretic step:

Initialize
R ← I (or any known rotation). Set target R_goal.
Compute error in the Lie algebra
δ̂ = log(RT R_goal) — the relative rotation expressed as a tangent vector at R. This is the "gradient direction" on the manifold.
Retraction step
R ← R · exp(α δ̂) — compose current rotation with exp of scaled tangent. exp(α δ̂) is a rotation by ‖δ‖·α radians about axis δ/‖δ‖. Result is in SO(3).
↻ repeat until ‖δ‖ < tol

The convergence rate is linear for constant α (gradient descent). For Gauss-Newton (used in SLAM), α is replaced by the Levenberg-Marquardt damping — much faster. But the retraction structure is identical. The exp map replaces the "add" operation at every single step.

Why does exp(α δ̂) keep us on SO(3)? Because exp maps so(3) → SO(3) by the Rodrigues formula, and SO(3) is closed under composition: R · exp(α δ̂) ∈ SO(3) whenever R ∈ SO(3) and exp(α δ̂) ∈ SO(3).

Retraction vs. additive step — manifold optimizer

Set the Goal angle (orange star) and step size α. Press Step repeatedly. Teal dot = retraction R←R·exp(δ̂), stays on circle. Red dot = naive additive step, flies off. Reset resets both to identity (angle 0). Right panel shows determinant of each: retraction stays 1.0, additive drifts.

Goal angle120°
Step size α0.30
python
import numpy as np

def retract_step(R, R_goal, alpha):
    """One retraction step toward R_goal. Stays in SO(3)."""
    delta_hat = log_so3(R.T @ R_goal)     # tangent error at R
    delta = vee(delta_hat)                 # error as 3-vector
    R_new = R @ rodrigues(alpha * delta)   # retraction: R ← R·exp(α·δ̂)
    return R_new

def additive_step(R, R_goal, alpha):
    """Naive additive step — BREAKS SO(3) structure."""
    delta = R_goal - R
    R_new = R + alpha * delta             # NOT a rotation matrix!
    return R_new

# Simulate: optimize from identity toward Rz(120°)
R = np.eye(3)
R_goal = rodrigues(np.array([0, 0, 2*np.pi/3]))

for i in range(10):
    R = retract_step(R, R_goal, alpha=0.3)
    det = np.linalg.det(R)
    err = np.linalg.norm(vee(log_so3(R.T @ R_goal)))
    print(f"Step {i+1}: det={det:.6f}, error={err:.4f} rad")
# det stays ≈1.0 throughout; error decreases to 0

Chapter 9: Connections & Cheat Sheet

The complete Lie theory cheat sheet

ConceptSO(3)SE(3)
Group elementsR ∈ ℝ3×3, RTR=I, det=1T = [R,t;0,1] ∈ ℝ4×4
Lie algebraso(3): skew-symmetric 3×3se(3): [φ̂,ρ;0,0] 4×4
Dimension3 (= #DOF of rotation)6 (= #DOF of pose)
Hat mapφ ∈ ℝ³ → φ̂ ∈ so(3)ξ ∈ ℝ⁶ → ξ̂ ∈ se(3)
Exp mapexp(φ̂) = Rodriguesexp(ξ̂) = [Rodrigues(φ), Jlρ; 0, 1]
Log maplog(R) → φ̂ via θ = arccos((tr−1)/2)Closed form in notes
RetractionR ← R·exp(δ̂)T ← T·exp(δ̂)
AdjointAdR(φ) = Rφ6×6 matrix [[R, t̂R],[0,R]]

The "rotations don't add" demo — verified

We claimed R₁ + R₂ is not a rotation. Here are the numbers. Let R₁ = Rz(45°), R₂ = Rz(45°). Then:

R1 + R2 = 2·Rz(45°)   (scale by 2)
det(R1 + R2) = det(2R) = 23·det(R) = 8 ≠ 1
(R1+R2)T(R1+R2) = 4I ≠ I

Now try R₁ · R₂ = Rz(45°)·Rz(45°) = Rz(90°). Orthonormal ✓, det=1 ✓. Composition via multiplication preserves SO(3); addition does not.

"Rotations don't add" — R₁ + R₂ leaves SO(3)

Orange = R₁·R₂ (composition, valid rotation). Red = R₁+R₂ (additive, invalid). Drag angle to see both applied to the unit square. The columns of R₁+R₂ are no longer unit-length — det and ‖col‖ shown live.

Angle θ (both R₁=R₂=θ/2)90°

Distances on SO(3): three flavors

The log map also gives us a principled notion of "how far apart" two rotations are — something impossible to define without Lie theory.

DistanceFormulaProperties
Angular (geodesic)distθ(RA,RB) = ‖log(RATRB)∨‖True geodesic on SO(3). Bi-invariant. Range [0,π].
Chordaldistc(RA,RB) = ‖RA − RBFEuclidean distance in ℝ9. Easier to compute. Not geodesic.
Quaterniondistq(qA,qB) = min(‖qA−qB‖, ‖qA+qB‖)Handles double-cover (q=−q for same rotation). Pseudo-metric.

The angular distance is the "right" geodesic distance — the length of the shortest path along the SO(3) manifold. The chordal distance is the length of the chord cutting through the ambient ℝ9 space. They relate by: distc = 2√2·|sin(θ/2)|, where θ = distθ. For small angles (θ ≪ 1): distc ≈ √2 · distθ. They agree to first order, but diverge for large rotations.

In SLAM, the angular distance appears in error terms for rotation optimization (because it is geodesic and intrinsic to SO(3)). The chordal distance appears in initialization and closed-form solvers (because it is algebraically simpler to minimize). Knowing which one your optimizer uses matters: minimizing chordal distance does NOT minimize angular distance except at the optimum.

python
import numpy as np

def dist_angular(Ra, Rb):
    """Geodesic (angular) distance between two rotations."""
    err = Ra.T @ Rb         # relative rotation
    phi = vee(log_so3(err)) # tangent error
    return np.linalg.norm(phi)

def dist_chordal(Ra, Rb):
    """Chordal distance (Frobenius norm of difference)."""
    return np.linalg.norm(Ra - Rb, 'fro')

Ra = rodrigues(np.array([0, 0, np.pi/4]))  # Rz(45°)
Rb = rodrigues(np.array([0, 0, np.pi/2]))  # Rz(90°)
print(f"Angular dist: {dist_angular(Ra,Rb):.4f} rad ({dist_angular(Ra,Rb)*180/np.pi:.1f}°)")
# 0.7854 rad (45.0°)
print(f"Chordal dist: {dist_chordal(Ra,Rb):.4f}")
# ≈ 0.7654 (≈ √2·sin(22.5°)·2 — slightly different)

What's next

This lesson
Lie groups, exp/log maps, retraction — the mathematical machinery
VNAV L9–L10 (Optimization on Manifold)
Use retraction R←R·exp(δ̂) inside SLAM/VIO cost functions. Gauss-Newton and LM on SO(3)/SE(3).
VNAV L11+ (Factor Graphs, SLAM)
Lie theory is the hidden backbone of every pose-graph optimizer (g2o, GTSAM, Ceres).

Related lessons

Putting it all together: a full retraction step with numbers

Let's trace a complete optimization step. Current R = I (identity). Goal: R_goal = Rz(60°).

Step 1 — compute error in tangent space:
Δ = RT R_goal = I · Rz(60°) = Rz(60°)
δ̂ = log(Rz(60°))
tr(Rz(60°)) = 2cos(60°)+1 = 2·0.5+1 = 2, so θ = arccos((2−1)/2) = arccos(0.5) = 60°
R−RT = [[0,−1,0],[1,0,0],[0,0,0]] · √3
δ̂ = (60°/π) / (2·sin(60°)) · (R−RT) = (π/3)/(2·√3/2) · [...]
⇒ δ = [0, 0, 60°]T (axis-angle, as expected)

Step 2 — retraction step (α = 0.5):
R_new = I · exp(0.5 · [0,0,60°]̂) = exp([0,0,30°]̂) = Rz(30°)

Step 3 — recompute error:
Δ = Rz(30°)T · Rz(60°) = Rz(−30°) · Rz(60°) = Rz(30°)
θ = 30° — halved (geometric convergence with α = 0.5) ✓

Notice: at every step, R is a valid rotation matrix (determinant = 1, columns orthonormal). The retraction guarantees this automatically. The naive additive step R ← R + 0.5·δ̂ would give R + a small skew matrix — not in SO(3), but close enough to corrupt subsequent computations.

This geometric convergence (error halves each step for α = 0.5) is exactly what you see in the canvas above. Run 3-4 steps to verify.

The Feynman test for this lesson: Can you, without notes, derive Rodrigues' formula from the power series? The path: (1) write exp(φ̂) as a series, (2) use û̂³=−û̂ to show odd powers are proportional to û̂ and even to û̂², (3) recognize the sin and (1−cos) Taylor series, (4) assemble. If you can do that, you have internalized Lie theory for robotics.
A drone's current orientation is R ∈ SO(3). You want to apply a small rotation update δ ∈ ℝ³. Which update rule keeps R in SO(3)?