Lie Groups: SO(3), SE(3) & the Exponential Map

Chapter 0: The Optimization Problem

Imagine you are running a drone localization system. Every few milliseconds, new IMU data arrives and you need to update your estimate of the drone's orientation — a rotation matrix R ∈ SO(3). The natural thing to do in optimization is take a gradient step: compute a small correction δ and set R ← R + δ. This works for scalars, for position vectors, for anything that lives in a flat Euclidean space.

It does not work for rotation matrices. Here is the concrete disaster. Suppose R is the identity I₃ (facing forward) and you add a matrix ε that represents a small rotation. The sum I₃ + ε is not a rotation matrix. Its columns are no longer unit length, no longer orthogonal. R^TR ≠ I. det ≠ 1. Your "rotation" is now a shear or a scaling — physically meaningless. Every subsequent operation that assumed R ∈ SO(3) gives garbage.

The core problem: SO(3) is not a vector space. You cannot add two rotation matrices and get a rotation matrix. You cannot average them. You cannot take a gradient step. The set of valid rotations is a curved surface — a manifold — embedded in ℝ^3×3, and straight lines in ℝ^3×3 leave that surface immediately.

The solution is Lie theory. Every rotation matrix R has a tangent space — a flat 3D vector space that "touches" the manifold at R. You express your update δ as a vector in the tangent space (specifically, at the identity). The exponential map exp curves that flat update back onto the manifold, producing a valid rotation. The update rule becomes:

R ← R · exp(δ̂)

where δ ∈ ℝ³ is the tangent update, δ̂ is its 3×3 skew-symmetric matrix form, and exp(δ̂) is the matrix exponential. This retraction is the foundational operation in SLAM, visual odometry, and any optimization over rotations. This lesson derives every piece from scratch.

Why three representations? A preview

By the end of this lesson you will have seen three things that are all "the same rotation" expressed differently:

Object	Space	Form	Use
R ∈ SO(3)	Lie group (manifold)	3×3 matrix	Apply rotation to vectors, compose rotations
φ̂ ∈ so(3)	Lie algebra (tangent space at I)	3×3 skew matrix	Represent updates, compute Jacobians
φ ∈ ℝ³	Euclidean vector space	3-vector (axis×angle)	Store compactly, do linear operations on updates

The hat map converts φ ↔ φ̂ (trivially). The exp/log maps convert φ̂ ↔ R (non-trivially — this is where Rodrigues lives). Understanding all three objects and both maps is the core of Lie theory for robotics.

Manifold vs. flat space — why adding rotations breaks them

Left: the red circle represents SO(2) — unit-det 2×2 rotations (angle θ). The blue line is the tangent space at the identity. Adding δ along the tangent (orange dot) lifts OFF the circle. Mapping via exp (teal dot) stays ON it. Drag δ to see the gap grow.

Tangent update δ 0.80

You have a valid rotation matrix R ∈ SO(3) and a small correction matrix ε. Why is R + ε NOT a valid rotation matrix in general?

Because addition is not defined for matrices — only multiplication is Because ε might have negative entries which rotation matrices cannot have Because the columns of R + ε are no longer orthonormal — the constraint R^TR = I is violated, so the result is not in SO(3) Because you need to normalize first — divide by the Frobenius norm of R + ε

Chapter 1: Groups & Lie Groups

Before we can talk about the exponential map, we need to understand what kind of mathematical object SO(3) is. The answer: it is a Lie group. A Lie group is both a group (in the algebraic sense) and a smooth manifold (in the geometric sense). Both parts matter.

Group axioms

A group (G, ⊗) is a set G with a binary operation ⊗ satisfying four axioms:

Axiom	Statement	For SO(3)
Closure	A ⊗ B ∈ G for all A,B ∈ G	R₁R₂ is a rotation if R₁,R₂ are
Associativity	(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C)	Matrix multiplication is associative
Identity	∃ I: A ⊗ I = I ⊗ A = A	I₃ — the identity matrix
Inverse	∃ A⁻¹: A ⊗ A⁻¹ = I	R⁻¹ = R^T

SO(3) satisfies all four with matrix multiplication as ⊗. Note what is absent: commutativity. R₁R₂ ≠ R₂R₁ in general — SO(3) is a non-abelian group. Rotations in 3D do not commute, a fact that will return to haunt us in Chapter 7.

Proof that SO(3) is non-abelian — with numbers

Let R₁ = Rz(90°) and R₂ = Rx(90°). Compute both orders:

R_z(90°) = [[0,−1,0],[1,0,0],[0,0,1]]
R_x(90°) = [[1,0,0],[0,0,−1],[0,1,0]]

R_zR_x:
row0: [0·1+(−1)·0+0·0, 0·0+(−1)·0+0·1, 0·0+(−1)·(−1)+0·0] = [0, 0, 1]
row1: [1·1+0·0+0·0, 1·0+0·0+0·1, 1·0+0·(−1)+0·0] = [1, 0, 0]
row2: [0·1+0·0+1·0, 0·0+0·0+1·1, 0·0+0·(−1)+1·0] = [0, 1, 0]
⇒ R_zR_x = [[0,0,1],[1,0,0],[0,1,0]]

R_xR_z:
row0 = [0,−1,0] (same as R_z row0 since R_x row0=[1,0,0])
row1 = [0·0+(−1)·1+0·0, 0·(−1)+(−1)·0+0·0, ...] ⇒ [−1, 0, 0] (row1 = R_z row2 negated...)
⇒ R_xR_z = [[0,−1,0],[0,0,−1],[1,0,0]]

R_zR_x ≠ R_xR_z — the "final orientation" depends on which rotation came first. This is not a numerical artifact. It reflects the fundamental topology of SO(3).

SE(3) is also a group under matrix multiplication. Its identity element is the 4×4 identity. Its inverse is [[R^T, −R^Tt],[0, 1]], as derived in L1.

Manifold structure

A smooth manifold is a space that looks locally flat — every neighborhood of every point is homeomorphic to ℝⁿ for some n. SO(3) is a 3-dimensional manifold embedded in ℝ⁹ (the space of 3×3 matrices): locally, near any rotation, the set of valid rotations looks like a 3D flat space. SE(3) is a 6-dimensional manifold (3 for rotation, 3 for translation).

A Lie group is a smooth manifold that is also a group, with smooth group operations. SO(3), SE(3), O(d), and the general linear group GL(d,ℝ) are all matrix Lie groups. The key benefit: they admit a tangent space at every point, giving us the machinery to do calculus — and optimization — on them.

Key insight: a Lie group is a curved space with algebraic structure. The "curved" part (manifold) means you cannot add elements like vectors. The "algebraic" part (group) means you can compose elements, invert them, and navigate relative to the identity. The exponential map bridges both worlds: it converts flat tangent-space vectors into curved group elements.

Which of these is NOT a property of the group SO(3) under matrix multiplication?

Closure: R₁R₂ ∈ SO(3) whenever R₁, R₂ ∈ SO(3) Existence of inverse: R⁻¹ = R^T ∈ SO(3) Identity element: I₃ ∈ SO(3) Commutativity: R₁R₂ = R₂R₁ for all R₁, R₂ ∈ SO(3)

Chapter 2: The Lie Algebra so(3)

Every matrix Lie group G has an associated Lie algebra g — the tangent space at the identity element. Concretely: imagine a smooth curve through the identity in SO(3). The derivative of that curve at the identity is a matrix in the tangent space. The Lie algebra is the set of all such derivatives.

For SO(3), what matrices can appear as derivatives of rotation-matrix curves at the identity? A curve R(t) with R(0) = I and R(t) ∈ SO(3) for all t satisfies R(t)^TR(t) = I. Differentiating at t=0:

Ọ(0)^T · I + I · Ọ(0) = 0 ⇒ Ọ(0)^T = −Ọ(0)

So the tangent vectors at the identity must be skew-symmetric matrices — matrices satisfying A^T = −A. The Lie algebra of SO(3) is:

so(3) = { A ∈ ℝ^3×3 : A^T = −A }

A 3×3 skew-symmetric matrix has exactly 3 free parameters (the upper triangle determines the rest via the antisymmetry). So so(3) is a 3-dimensional vector space — matching the 3 degrees of freedom of rotation in 3D.

The general form

Any element of so(3) can be written:

Φ =
  [  0, −φ₃, φ₂]
  [φ₃,   0, −φ₁]
  [−φ₂, φ₁,   0 ]

where φ = [φ₁, φ₂, φ₃]^T ∈ ℝ³. The 3-vector φ uniquely specifies the skew matrix and vice versa. This bijection between ℝ³ and so(3) is formalized as the hat map (Chapter 3).

Similarly, the Lie algebra of SE(3) is se(3) — 4×4 matrices of the form [Φ, ρ; 0, 0] where Φ ∈ so(3) and ρ ∈ ℝ³. The 6-vector ξ = [φ, ρ]^T is called a twist.

Worked dimension count

SO(3) is a 3×3 matrix group but has only 3 degrees of freedom. Where do the other 6 entries go? The constraint R^TR = I imposes 6 independent equations (the 9 entries of I, but the matrix is symmetric so only 6 are independent). So: 9 free parameters minus 6 constraints = 3 DOF. ✓

SE(3): a 4×4 matrix with the bottom row fixed to [0,0,0,1] leaves 12 free entries. The R block imposes 6 constraints. The t block is free (3 entries). Net: 3 + 3 = 6 DOF. ✓

The Lie algebra dimension equals the manifold dimension — by construction. This is why so(3) is 3-dimensional (exactly 3 free parameters in a skew-symmetric matrix) and se(3) is 6-dimensional.

Misconception: the Lie algebra is not the set of "small" rotations. so(3) is an entirely different mathematical space from SO(3). Elements of so(3) are skew-symmetric matrices, not rotation matrices. The exponential map is what converts between them. An element of so(3) is a tangent vector — a direction and speed — not a rotation.

Why must tangent vectors to SO(3) at the identity be skew-symmetric matrices?

Because rotation matrices have positive entries and their derivatives must balance positive and negative Because skew-symmetric matrices have determinant zero, and the derivative of det(R) = 1 is zero Differentiating R(t)^TR(t) = I at t=0 gives Ṙ(0)^T + Ṙ(0) = 0, which is exactly the definition of skew-symmetry Skew-symmetric matrices are the closest to zero in Frobenius norm, minimizing the update step

Chapter 3: Hat Map, Vee Map & Angular Velocity

The bijection between ℝ³ and so(3) is so useful it gets two dedicated operators. The hat map (·)^ turns a 3-vector into its corresponding skew-symmetric matrix. The vee map (·)∨ does the reverse.

φ̂ = (φ)^⋀ =
  [  0, −φ₃, φ₂]
  [φ₃,   0, −φ₁]
  [−φ₂, φ₁,   0 ]

And the inverse: given the skew matrix on the right, (·)∨ extracts the vector [φ₁, φ₂, φ₃]^T.

Worked numbers — hat and vee

Let φ = [1, 2, 3]^T. Then:

φ̂ = [1,2,3]̂ =
  [ 0, −3, 2]
  [ 3, 0, −1]
  [−2, 1, 0]

Verify: transpose gives [[0,−3,2],[3,0,−1],[−2,1,0]]^T = [[0,3,−2],[−3,0,1],[2,−1,0]] = −φ̂. ✓ Skew-symmetric.

Vee: given [[0,−3,2],[3,0,−1],[−2,1,0]], extract upper triangle entries: (1,3) = 2 → φ₂, (1,2) = −3 → φ₃, (2,3) = −1 → φ₁. So (·)∨ = [1, 2, 3]^T. ✓

Angular velocity interpretation

The hat map is not just notation. If a rigid body has angular velocity ω = [ω₁, ω₂, ω₃]^T (radians/second), then the time derivative of its rotation matrix is:

Ṙ = ω̂ · R (body angular velocity in world frame)

This is the kinematic equation of rotation. The skew matrix ω̂ acts as the "infinitesimal generator" of rotation — it tells you which direction in ℝ^3×3 the rotation is moving. The hat map converts the physical quantity (angular velocity vector) into the algebraic object (element of so(3)) that drives the kinematics.

The hat map is also the cross-product matrix: φ̂ · v = φ × v for any v ∈ ℝ³. This is why ω̂R is the right formula: ωω̂R corresponds to rotating each column of R by the angular velocity — exactly what you want kinematically.

Key algebraic properties of the hat map

These identities come up constantly in derivations and implementations:

Identity	In words
φ̂ v = φ × v	Skew matrix acts as cross product
φ̂ = −φ̂^T	Skew-symmetric by definition
tr(φ̂) = 0	Trace of any skew matrix is zero
û̂² = uu^T − I	Square of unit skew (used in Rodrigues proof)
û̂³ = −û̂	The cubing identity (makes series collapse)
(aφ + bψ)̂ = aφ̂ + bψ̂	Hat map is linear
[φ̂, ψ̂] = (φ × ψ)̂	Lie bracket = cross product hat

The last row says the Lie bracket of so(3) corresponds to the cross product of ℝ³ — this is why so(3) and (ℝ³, ×) are isomorphic as Lie algebras. It also gives a concrete formula for the BCH commutator in so(3): [φ̂, ψ̂] = (φ × ψ)̂.

python
import numpy as np

# Verify key hat-map identities with concrete numbers
phi = np.array([1, 2, 3])
psi = np.array([4, -1, 2])
v   = np.array([1, 0, 0])

# Identity 1: φ̂v = φ×v
print(hat(phi) @ v)            # skew-matrix-times-v
print(np.cross(phi, v))         # cross product
# Both: [ 0.  3. -2.] ✓

# Identity 4: û̂² = uuᵀ − I
u = phi / np.linalg.norm(phi)
u_hat = hat(u)
print(np.round(u_hat @ u_hat, 4))       # û̂²
print(np.round(np.outer(u,u) - np.eye(3), 4))  # uuᵀ−I
# Identical ✓

# Identity 7: [φ̂,ψ̂] = (φ×ψ)̂
lie_bracket = hat(phi) @ hat(psi) - hat(psi) @ hat(phi)
cross_hat   = hat(np.cross(phi, psi))
print(np.allclose(lie_bracket, cross_hat))  # True ✓

Hat map visualizer — vector ↔ skew matrix

Drag the sliders to set φ = [φ₁, φ₂, φ₃]. The hat matrix is shown live, with color-coded entries. Orange = upper triangle (positive), teal = lower (negative, by antisymmetry). The vee map extracts φ back. Also shown: φ̂·v for a fixed test vector v = [1,0,0]^T, verifying it equals φ × v.

φ₁1.0

φ₂2.0

φ₃3.0

Given φ = [0, 0, 1]^T, what is φ̂ (the hat matrix)?

[[1,0,0],[0,1,0],[0,0,1]] — the identity [[0,−1,0],[1,0,0],[0,0,0]] — skew with φ₃=1 in position (2,1) [[0,0,1],[0,0,0],[−1,0,0]] — wrong sign pattern [[0,1,0],[−1,0,0],[0,0,0]] — upper triangle has +1

Chapter 4: Rodrigues' Formula — Derived from the Power Series

The matrix exponential of a square matrix A is defined by the power series:

exp(A) = I + A + A²/2! + A³/3! + A⁴/4! + …

For a general matrix, this series does not simplify. But for skew-symmetric matrices in so(3), it collapses to a beautiful closed form — Rodrigues' formula. The key is a remarkable property of the hat map.

Step 1: the cubing identity

Let φ ∈ ℝ³ with ‖φ‖ = θ (the rotation angle). Define the unit axis û = φ/θ. Then φ̂ = θû̂. Compute û̂³:

û̂² = uu^T − I (outer product minus identity)
û̂³ = û̂² · û̂ = (uu^T − I)û̂ = uu^Tû̂ − û̂

Now û̂u = û × u = 0 (cross product of a vector with itself is zero), so uu^Tû̂ = u(û^Tû̂) = 0. Therefore:

û̂³ = −û̂

This is the critical identity. It means the powers of û̂ cycle: û̂, û̂², −û̂, −û̂², +û̂, ... Substituting φ̂ = θû̂:

φ̂³ = θ³û̂³ = −θ³û̂ = −θ²φ̂
φ̂⁴ = −θ²φ̂², φ̂⁵ = θ⁴φ̂, ...

Step 2: split the series

Expand exp(φ̂) and separate odd and even powers:

exp(φ̂) = I + φ̂ + φ̂²/2! + φ̂³/3! + φ̂⁴/4! + φ̂⁵/5! + …

= I + (θ − θ³/3! + θ⁵/5! − …)û̂
+ (θ²/2! − θ⁴/4! + θ⁶/6! − …)û̂²

The first series in parentheses is exactly sin(θ). The second is exactly 1 − cos(θ). Therefore:

exp(φ̂) = I + sin(θ)û̂ + (1 − cos(θ))û̂²

Substituting back û = φ/θ and û̂ = φ̂/θ:

R = exp(φ̂) = I + ^sin(θ)⁄_θ φ̂ + ^{(1−cos(θ))}⁄_θ² φ̂²

This is Rodrigues' formula — the closed-form matrix exponential for so(3) elements. The exponential map exp: so(3) → SO(3) is exactly Rodrigues' formula. What Lecture 1 presented as an axiom, we have now derived from the power series.

Worked numbers — 90° about z

Let φ = [0, 0, π/2]^T. Then θ = π/2, û = [0,0,1]^T.

û̂ = [[0,−1,0],[1,0,0],[0,0,0]]

û̂² = û̂·û̂ =
[[0,−1,0],[1,0,0],[0,0,0]] · [[0,−1,0],[1,0,0],[0,0,0]]
= [[−1,0,0],[0,−1,0],[0,0,0]]

sin(π/2) = 1, 1−cos(π/2) = 1

R = I + 1·û̂ + 1·û̂²
= [[1,0,0],[0,1,0],[0,0,1]]
+ [[0,−1,0],[1,0,0],[0,0,0]]
+ [[−1,0,0],[0,−1,0],[0,0,0]]
= [[0,−1,0],[1,0,0],[0,0,1]]

Compare with L1's Rz(90°) = [[0,−1,0],[1,0,0],[0,0,1]]. ✓ Exact match. The exponential map and the geometric rotation formula are the same thing.

The exp map is NOT an analogy — it IS the matrix exponential. Rodrigues' formula is what you get when you expand exp as a power series and use the cubing identity to collapse it. The formula did not come from geometric intuition; the geometry is a consequence of the algebra.

Rodrigues builder — axis, angle, series terms

Set the rotation axis (fixed to z for clarity) and angle θ. Watch the three terms I (grey), sin(θ)·û̂ (orange), (1−cosθ)·û̂² (teal) assemble into R. A square in body frame is rotated. The numeric result updates live.

Angle θ90°

python
import numpy as np

def hat(phi):
    """3-vector → 3×3 skew-symmetric matrix."""
    p1, p2, p3 = phi
    return np.array([[   0, -p3,  p2],
                     [ p3,   0, -p1],
                     [-p2,  p1,   0]])

def vee(Phi):
    """3×3 skew-symmetric matrix → 3-vector."""
    return np.array([Phi[2,1], Phi[0,2], Phi[1,0]])

def rodrigues(phi):
    """Rodrigues' formula: exp map so(3) → SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3) + hat(phi)  # first-order approx
    u_hat = hat(phi / theta)
    return np.eye(3) + np.sin(theta)*u_hat + (1-np.cos(theta))*u_hat@u_hat

# Test: 90° about z-axis
phi = np.array([0, 0, np.pi/2])
R = rodrigues(phi)
print(np.round(R, 4))
# [[ 0. -1.  0.]
#  [ 1.  0.  0.]
#  [ 0.  0.  1.]]

# Verify it's a rotation: R^T R = I, det = 1
print(np.allclose(R.T @ R, np.eye(3)))  # True
print(np.round(np.linalg.det(R), 4))    # 1.0

In the Rodrigues derivation, the series collapse relies on û̂³ = −û̂. What property of û produces this identity?

û is a unit vector, so û·û = 1, and cross products of a vector with itself are zero û̂ is orthogonal, so û̂^Tû̂ = I, giving û̂³ = û̂·I = û̂ û̂u = û × u = 0 (cross product with itself), so uu^Tû̂ = 0, and û̂² = uu^T − I gives û̂³ = (uu^T−I)û̂ = −û̂ Skew-symmetric matrices always have û̂³ = −û̂ regardless of the vector

Chapter 5: The Logarithm Map

The exponential map goes from the Lie algebra to the Lie group: exp: so(3) → SO(3). The inverse is the logarithm map log: SO(3) → so(3). Given a rotation matrix R, the log recovers the tangent vector φ = θû that, when exponentiated, produces R.

Extracting the rotation angle

From Rodrigues' formula R = I + sin(θ)û̂ + (1−cos(θ))û̂², take the trace:

tr(R) = tr(I) + sin(θ)·tr(û̂) + (1−cos(θ))·tr(û̂²)
= 3 + 0 + (1−cos(θ))·(−2)
= 1 + 2cos(θ)

⇒ cos(θ) = ^{tr(R) − 1}⁄₂ ⇒ θ = arccos(^{tr(R) − 1}⁄₂)

Where we used tr(û̂) = 0 (diagonal is all zeros) and tr(û̂²) = tr(uu^T − I) = û·û − 3 = 1 − 3 = −2.

Extracting the rotation axis

From Rodrigues: R − R^T = 2sin(θ)û̂. So:

log(R) = φ̂ = ^θ⁄_2sin(θ) (R − R^T) (when θ ≠ 0)

The vee of this gives φ = θû. Special cases: if R = I then θ = 0 and log(I) = 0. Near θ = π the formula is numerically unstable and a different branch formula is needed (beyond this lesson).

Worked numbers

Let R = Rz(90°) = [[0,−1,0],[1,0,0],[0,0,1]]. Recover φ:

tr(R) = 0 + 0 + 1 = 1 ⇒ θ = arccos((1−1)/2) = arccos(0) = π/2

R − R^T = [[0,−1,0],[1,0,0],[0,0,1]] − [[0,1,0],[−1,0,0],[0,0,1]]
= [[0,−2,0],[2,0,0],[0,0,0]]

φ̂ = ^(π/2)⁄_2sin(π/2) · [[0,−2,0],[2,0,0],[0,0,0]]
= ^(π/2)⁄₂ · [[0,−2,0],[2,0,0],[0,0,0]]
= [[0,−π/2,0],[π/2,0,0],[0,0,0]]

φ = vee(φ̂) = [0, 0, π/2]^T

We recover exactly the input φ = [0,0,π/2]^T. ✓ Round-trip complete.

The θ = π singularity

The formula log(R) = θ/(2sin(θ)) · (R − R^T) breaks when θ = π because sin(π) = 0. At θ = π, we have R − R^T = 0 (verify: sin(π) = 0, so 2sin(θ)û̂ = 0). But R ≠ I! The issue is that at θ = π, the rotation axis û is not recoverable from R − R^T alone.

Instead, at θ = π: R + R^T = 2(cos(π)I + (1−cos(π))uu^T) = 2(−I + 2uu^T). So R + I = 2uu^T (the columns of R + I give 2u), and the axis can be recovered from any nonzero column of R + I. In practice, choose the column with the largest norm for numerical stability.

Additionally, at θ = π there is a 2-fold ambiguity: the rotations exp(πû̂) = exp(−πû̂) = exp(π(−û)̂) are the same (π rotation about û = −π rotation about −û). This is an inherent topological property of SO(3) — its fundamental group π₁(SO(3)) = ℤ₂, which is why quaternions (the double cover) are often preferred near θ = π.

Practical rule: use the log formula freely for |θ| < π − ε. Near θ = ±π (within ~1°), switch to the robust formula using the columns of R + I. Most SLAM problems never reach θ = π (rotations of exactly 180° are rare in smooth trajectories), but you must handle it for robustness.

exp/log round-trip — tangent vector → R → back

Set axis (z fixed) and angle θ. Top row: exp(φ̂) → R (orange). Bottom: log(R) → φ̂ → φ (teal). Both paths shown. Slider past ±π shows the wrap-around ambiguity (±θ give the same R).

Angle θ90°

python
import numpy as np

def log_so3(R):
    """Log map SO(3) → so(3). Returns skew matrix φ̂."""
    cos_theta = (np.trace(R) - 1) / 2
    cos_theta = np.clip(cos_theta, -1, 1)  # numerical safety
    theta = np.arccos(cos_theta)
    if theta < 1e-10:
        return np.zeros((3,3))
    return theta / (2 * np.sin(theta)) * (R - R.T)

# Round-trip test
phi_in = np.array([0, 0, np.pi/2])
R = rodrigues(phi_in)
Phi_hat = log_so3(R)
phi_out = vee(Phi_hat)
print(np.round(phi_out, 4))  # [0. 0. 1.5708] = [0, 0, π/2] ✓

# Note: past θ=π, log returns the "short" path (|θ| ≤ π)
phi_big = np.array([0, 0, 5])  # > π
R2 = rodrigues(phi_big)
phi_recovered = vee(log_so3(R2))
print(np.round(phi_recovered, 4))  # wrapped: ≠ [0,0,5]

For R = Rz(180°) = [[−1,0,0],[0,−1,0],[0,0,1]], what is the rotation angle θ recovered by the log map?

0° — the trace is −1 so arccos gives a negative angle, which maps to 0° 90° — half of 180° 180° (π radians) — tr(R) = −1, so cos(θ) = (−1−1)/2 = −1, giving θ = π 360° — a full rotation

Chapter 6: SE(3): Twists & the exp Map

Rigid-body poses T ∈ SE(3) form a Lie group too. The Lie algebra se(3) consists of 4×4 matrices of the form:

ξ̂ = [φ̂, ρ; 0^T, 0] ∈ se(3)

where φ̂ ∈ so(3) is the rotational part and ρ ∈ ℝ³ is the translational part. The 6-vector ξ = [φ; ρ]^T ∈ ℝ⁶ is called a twist — it encodes both angular velocity (φ) and linear velocity (ρ) simultaneously.

The SE(3) exponential map

Applying the matrix exponential to ξ̂ ∈ se(3) gives:

T = exp(ξ̂) = [exp(φ̂), J_l(φ)ρ; 0^T, 1]

The rotation block is exactly Rodrigues applied to φ. The translation block is J_l(φ)ρ, not just ρ — the rotation "bends" the translation. J_l(φ) is the left Jacobian (also called the left-invariant Jacobian) of SO(3):

J_l(φ) = I + ^(1−cosθ)⁄_θ² φ̂ + ^(θ−sinθ)⁄_θ³ φ̂²

When θ → 0, J_l(φ) → I, and the SE(3) exp map reduces to T = [I, ρ; 0, 1] — a pure translation. This makes sense: a small twist with no rotation just translates.

When φ ≠ 0, J_l couples the rotation and translation. Physically: if you rotate while translating, the effective displacement in world coordinates depends on the rotation. The Jacobian accounts for this coupling.

Worked numbers: SE(3) exp with rotation + translation

Let ξ = [φ; ρ] = [[0,0,π/2]; [1,0,0]]^T — a 90° z-rotation combined with a 1 m x-translation in the body twist. Compute the left Jacobian:

φ = [0,0,π/2], θ = π/2

J_l(φ) = I + ^{(1−cos(π/2))}⁄_(π/2)² φ̂ + ^{(π/2−sin(π/2))}⁄_(π/2)³ φ̂²
= I + ¹⁄_(π/2)² φ̂ + ^(π/2−1)⁄_(π/2)³ φ̂²

φ̂ (for [0,0,π/2]) = π/2 · [[0,−1,0],[1,0,0],[0,0,0]]

J_lρ ≠ ρ = [1,0,0]^T

The translation in the resulting T is J_l[1,0,0]^T ≠ [1,0,0]^T — the rotation bends the translation. Physically: if you drive 1 m forward while rotating 90°, the effective endpoint is a curved arc, not a straight line. The Jacobian computes the chord of that arc exactly.

Do NOT confuse ρ with the translation t in T = [R,t;0,1]. The twist's ρ is the translational velocity in the Lie algebra. The actual translation t in the group element is J_l(φ)ρ — they differ whenever there is rotation. Setting t = ρ is a common error in implementing SE(3) optimization.

python
import numpy as np
from scipy.linalg import expm  # reference: matrix exponential

def hat_se3(xi):
    """6-vector [phi; rho] → 4×4 se(3) matrix."""
    phi, rho = xi[:3], xi[3:]
    Xi = np.zeros((4,4))
    Xi[:3,:3] = hat(phi)
    Xi[:3,3]  = rho
    return Xi

def left_jacobian(phi):
    """Left Jacobian J_l(phi) for SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3)
    ph = hat(phi)
    return (np.eye(3) + (1-np.cos(theta))/theta**2 * ph
            + (theta-np.sin(theta))/theta**3 * ph@ph)

def exp_se3(xi):
    """SE(3) exp map: 6-vector → 4×4 pose matrix."""
    phi, rho = xi[:3], xi[3:]
    R = rodrigues(phi)
    t = left_jacobian(phi) @ rho
    T = np.eye(4)
    T[:3,:3] = R
    T[:3,3]  = t
    return T

# Test: pure z-rotation + translation
xi = np.array([0, 0, np.pi/2, 1, 0, 0])  # 90° z, 1m x-translation
T = exp_se3(xi)
print(np.round(T, 4))
# R block = Rz(90°); t block = J_l([0,0,π/2])·[1,0,0] ≠ [1,0,0]
print(np.allclose(expm(hat_se3(xi)), T))  # True — matches scipy reference

For a pure translation (φ = 0, ρ = [1,0,0]), what does the SE(3) exp map produce?

T = [Rz(1rad), [1,0,0]; 0, 1] — ρ's x-component causes a z-axis rotation proportional to its magnitude T = [I, [1,0,0]; 0, 1] — J_l(0)=I, so the translation t = J_l(φ)ρ = ρ = [1,0,0]; a pure translation T = [I, [0,0,0]; 0, 1] — the translational part is always zero when there is no rotation T = [I, J_l(ρ)ρ; 0, 1] where J_l is evaluated at ρ, not φ

Chapter 7: BCH Formula & Adjoint (Brief)

Two more structural facts about Lie groups that come up constantly in robotics: the Baker-Campbell-Hausdorff formula and the Adjoint.

Baker-Campbell-Hausdorff (BCH)

In ordinary calculus, e^ae^b = e^a+b. For matrix exponentials of non-commuting matrices A, B:

exp(A)exp(B) = exp(A + B + ¹⁄₂[A,B] + ¹⁄₁₂[A,[A,B]] − ¹⁄₁₂[B,[A,B]] + …)

where [A,B] = AB − BA is the Lie bracket. For so(3), the Lie bracket is [φ̂₁, φ̂₂] = φ̂₁φ̂₂ − φ̂₂φ̂₁, which corresponds to the cross product: [φ₁]^[φ₂]^ = (φ₁ × φ₂)^.

The practical consequence: composing two rotations exp(A)exp(B) is not exp(A+B). The error depends on the commutator [A,B]. When A and B are small (near the identity), the BCH correction is second-order in the magnitudes, so for small perturbations it is often ignored. This is the justification for the first-order approximation used in iterative solvers.

BCH tells you how much rotation order matters. If you rotate 5° about x then 5° about y, you get a slightly different result than rotating 5° about y then 5° about x. The discrepancy is proportional to [φ₁ × φ₂] — the first BCH correction. For 5°, it is tiny. For 90°, it is significant.

Adjoint

The Adjoint of a group element g ∈ G is a linear map Ad_g: g → g that transforms Lie algebra elements between frames. For SO(3):

Ad_R(φ̂) = R φ̂ R^T ⇔ Ad_R(φ) = Rφ

This says: if you have a tangent vector φ expressed in the body frame, multiplying by R converts it to the world frame. For SE(3), the 6×6 Adjoint matrix is:

Ad_T = [[R, t̂R], [0, R]]

The Adjoint is used extensively in kinematics (converting velocities between frames) and in the derivation of Jacobians for pose estimation. You will see it again in the optimization lectures.

Why BCH matters for SLAM

In a factor-graph optimizer (like g2o or GTSAM), you often need to combine two pose updates: a "motion model" update and a "measurement" update. Each is expressed as an exp of a tangent vector. Naively concatenating: T_new = T · exp(δ₁) · exp(δ₂). If you want to express this as a single exp: T · exp(BCH(δ₁, δ₂)), you need BCH. In practice, optimizers sidestep this by iterating with small steps where BCH ≈ addition. But knowing BCH exists — and when it matters — prevents subtle bugs in large-angle scenarios.

Numerical verification of BCH

Let A = [0,0,0.5]̂ (0.5 rad about z) and B = [0,0.4,0]̂ (0.4 rad about y). Then:

exp(A)exp(B) ≠ exp(A+B)

exp(A+B): rotate by combined axis/angle [0, 0.4, 0.5]
exp(A)exp(B): first rotate 0.5 rad about z, then 0.4 rad about y

The difference (first BCH correction) = ½[A,B]
= ½(AB − BA) ≠ 0 (since rotations don't commute)

For these angles (~28° each), the BCH correction has Frobenius norm ≈ 0.1. For 5° each, it would be ≈ 0.004 — small but nonzero. At 1°, it is ~10⁻⁴. This is why linearized solvers work: they iterate with steps of a few degrees at most, making BCH negligible.

python
import numpy as np
from scipy.linalg import expm

# BCH verification: exp(A)exp(B) ≠ exp(A+B) for non-small angles
A_vec = np.array([0, 0, 0.5])  # 0.5 rad about z
B_vec = np.array([0, 0.4, 0])  # 0.4 rad about y
A = hat(A_vec); B = hat(B_vec)

R_AB  = expm(A) @ expm(B)      # exp(A)·exp(B)
R_sum = expm(A + B)             # exp(A+B) — first-order approx

diff = np.linalg.norm(R_AB - R_sum, 'fro')
print(f"‖exp(A)exp(B) − exp(A+B)‖_F = {diff:.4f}")  # ≈ 0.033

# BCH first-order correction: ½[A,B] = ½(AB−BA)
commutator = 0.5 * (A @ B - B @ A)
R_bch1 = expm(A + B + commutator)
diff2 = np.linalg.norm(R_AB - R_bch1, 'fro')
print(f"‖exp(A)exp(B) − exp(A+B+½[A,B])‖_F = {diff2:.4f}")  # ≈ 0.001 — much smaller

The BCH formula says exp(A)exp(B) = exp(A + B + ½[A,B] + …). When is the first-order approximation exp(A)exp(B) ≈ exp(A+B) most accurate?

When A and B are both rotation matrices (group elements, not algebra elements) When A and B commute: [A,B] = AB − BA = 0 When A and B are small (near zero) — the commutator [A,B] is then second-order in their magnitudes When both A and B are diagonal matrices

Chapter 8: Showcase: Retraction Optimizer

This is the payoff. We simulate a simple rotation optimization: given a "target" rotation R_goal, we want to find it starting from R_init = I. A gradient descent step in SO(3) uses the retraction:

R ← R · exp(-α δ̂)

where δ = log(R^T R_goal)∨ is the tangent error (how far R is from R_goal, expressed in the tangent space at R) and α is the step size. Compare this with a naive additive step R ← R − α·δ̃ that ignores the manifold — the naive step immediately leaves SO(3).

The canvas shows both paths. The circular arc (left) represents SO(2) rotations (the 2D analog of SO(3)) as angles on the unit circle. The retraction step (teal) stays on the circle. The additive step (red) drifts off it.

The full algorithm spelled out

Here is the retraction-based rotation optimizer in pseudocode, annotated with every Lie-theoretic step:

Initialize

R ← I (or any known rotation). Set target R_goal.

↓

Compute error in the Lie algebra

δ̂ = log(R^T R_goal) — the relative rotation expressed as a tangent vector at R. This is the "gradient direction" on the manifold.

↓

Retraction step

R ← R · exp(α δ̂) — compose current rotation with exp of scaled tangent. exp(α δ̂) is a rotation by ‖δ‖·α radians about axis δ/‖δ‖. Result is in SO(3).

↻ repeat until ‖δ‖ < tol

The convergence rate is linear for constant α (gradient descent). For Gauss-Newton (used in SLAM), α is replaced by the Levenberg-Marquardt damping — much faster. But the retraction structure is identical. The exp map replaces the "add" operation at every single step.

Why does exp(α δ̂) keep us on SO(3)? Because exp maps so(3) → SO(3) by the Rodrigues formula, and SO(3) is closed under composition: R · exp(α δ̂) ∈ SO(3) whenever R ∈ SO(3) and exp(α δ̂) ∈ SO(3).

Retraction vs. additive step — manifold optimizer

Set the Goal angle (orange star) and step size α. Press Step repeatedly. Teal dot = retraction R←R·exp(δ̂), stays on circle. Red dot = naive additive step, flies off. Reset resets both to identity (angle 0). Right panel shows determinant of each: retraction stays 1.0, additive drifts.

Goal angle120°

Step size α0.30

python
import numpy as np

def retract_step(R, R_goal, alpha):
    """One retraction step toward R_goal. Stays in SO(3)."""
    delta_hat = log_so3(R.T @ R_goal)     # tangent error at R
    delta = vee(delta_hat)                 # error as 3-vector
    R_new = R @ rodrigues(alpha * delta)   # retraction: R ← R·exp(α·δ̂)
    return R_new

def additive_step(R, R_goal, alpha):
    """Naive additive step — BREAKS SO(3) structure."""
    delta = R_goal - R
    R_new = R + alpha * delta             # NOT a rotation matrix!
    return R_new

# Simulate: optimize from identity toward Rz(120°)
R = np.eye(3)
R_goal = rodrigues(np.array([0, 0, 2*np.pi/3]))

for i in range(10):
    R = retract_step(R, R_goal, alpha=0.3)
    det = np.linalg.det(R)
    err = np.linalg.norm(vee(log_so3(R.T @ R_goal)))
    print(f"Step {i+1}: det={det:.6f}, error={err:.4f} rad")
# det stays ≈1.0 throughout; error decreases to 0

Chapter 9: Connections & Cheat Sheet

The complete Lie theory cheat sheet

Concept	SO(3)	SE(3)
Group elements	R ∈ ℝ^3×3, R^TR=I, det=1	T = [R,t;0,1] ∈ ℝ^4×4
Lie algebra	so(3): skew-symmetric 3×3	se(3): [φ̂,ρ;0,0] 4×4
Dimension	3 (= #DOF of rotation)	6 (= #DOF of pose)
Hat map	φ ∈ ℝ³ → φ̂ ∈ so(3)	ξ ∈ ℝ⁶ → ξ̂ ∈ se(3)
Exp map	exp(φ̂) = Rodrigues	exp(ξ̂) = [Rodrigues(φ), J_lρ; 0, 1]
Log map	log(R) → φ̂ via θ = arccos((tr−1)/2)	Closed form in notes
Retraction	R ← R·exp(δ̂)	T ← T·exp(δ̂)
Adjoint	Ad_R(φ) = Rφ	6×6 matrix [[R, t̂R],[0,R]]

The "rotations don't add" demo — verified

We claimed R₁ + R₂ is not a rotation. Here are the numbers. Let R₁ = Rz(45°), R₂ = Rz(45°). Then:

R₁ + R₂ = 2·R_z(45°) (scale by 2)
det(R₁ + R₂) = det(2R) = 2³·det(R) = 8 ≠ 1
(R₁+R₂)^T(R₁+R₂) = 4I ≠ I

Now try R₁ · R₂ = Rz(45°)·Rz(45°) = Rz(90°). Orthonormal ✓, det=1 ✓. Composition via multiplication preserves SO(3); addition does not.

"Rotations don't add" — R₁ + R₂ leaves SO(3)

Orange = R₁·R₂ (composition, valid rotation). Red = R₁+R₂ (additive, invalid). Drag angle to see both applied to the unit square. The columns of R₁+R₂ are no longer unit-length — det and ‖col‖ shown live.

Angle θ (both R₁=R₂=θ/2)90°

Distances on SO(3): three flavors

The log map also gives us a principled notion of "how far apart" two rotations are — something impossible to define without Lie theory.

Distance	Formula	Properties
Angular (geodesic)	dist_θ(R_A,R_B) = ‖log(R_A^TR_B)∨‖	True geodesic on SO(3). Bi-invariant. Range [0,π].
Chordal	dist_c(R_A,R_B) = ‖R_A − R_B‖_F	Euclidean distance in ℝ⁹. Easier to compute. Not geodesic.
Quaternion	dist_q(q_A,q_B) = min(‖q_A−q_B‖, ‖q_A+q_B‖)	Handles double-cover (q=−q for same rotation). Pseudo-metric.

The angular distance is the "right" geodesic distance — the length of the shortest path along the SO(3) manifold. The chordal distance is the length of the chord cutting through the ambient ℝ⁹ space. They relate by: dist_c = 2√2·|sin(θ/2)|, where θ = dist_θ. For small angles (θ ≪ 1): dist_c ≈ √2 · dist_θ. They agree to first order, but diverge for large rotations.

In SLAM, the angular distance appears in error terms for rotation optimization (because it is geodesic and intrinsic to SO(3)). The chordal distance appears in initialization and closed-form solvers (because it is algebraically simpler to minimize). Knowing which one your optimizer uses matters: minimizing chordal distance does NOT minimize angular distance except at the optimum.

python
import numpy as np

def dist_angular(Ra, Rb):
    """Geodesic (angular) distance between two rotations."""
    err = Ra.T @ Rb         # relative rotation
    phi = vee(log_so3(err)) # tangent error
    return np.linalg.norm(phi)

def dist_chordal(Ra, Rb):
    """Chordal distance (Frobenius norm of difference)."""
    return np.linalg.norm(Ra - Rb, 'fro')

Ra = rodrigues(np.array([0, 0, np.pi/4]))  # Rz(45°)
Rb = rodrigues(np.array([0, 0, np.pi/2]))  # Rz(90°)
print(f"Angular dist: {dist_angular(Ra,Rb):.4f} rad ({dist_angular(Ra,Rb)*180/np.pi:.1f}°)")
# 0.7854 rad (45.0°)
print(f"Chordal dist: {dist_chordal(Ra,Rb):.4f}")
# ≈ 0.7654 (≈ √2·sin(22.5°)·2 — slightly different)

What's next

This lesson

Lie groups, exp/log maps, retraction — the mathematical machinery

↓

VNAV L9–L10 (Optimization on Manifold)

Use retraction R←R·exp(δ̂) inside SLAM/VIO cost functions. Gauss-Newton and LM on SO(3)/SE(3).

↓

VNAV L11+ (Factor Graphs, SLAM)

Lie theory is the hidden backbone of every pose-graph optimizer (g2o, GTSAM, Ceres).

Related lessons

VNAV L1 — 3D Geometry: Rotations, Rigid Transforms & Frames (prerequisite)
Kalman Filter — uses SE(3) retraction in the update step for VIO
Extended Kalman Filter (EKF) — linearizes about a manifold point using the Jacobian
Classical SLAM — pose-graph optimization directly uses the retraction

Putting it all together: a full retraction step with numbers

Let's trace a complete optimization step. Current R = I (identity). Goal: R_goal = Rz(60°).

Step 1 — compute error in tangent space:
Δ = R^T R_goal = I · R_z(60°) = R_z(60°)
δ̂ = log(R_z(60°))
tr(R_z(60°)) = 2cos(60°)+1 = 2·0.5+1 = 2, so θ = arccos((2−1)/2) = arccos(0.5) = 60°
R−R^T = [[0,−1,0],[1,0,0],[0,0,0]] · √3
δ̂ = (60°/π) / (2·sin(60°)) · (R−R^T) = (π/3)/(2·√3/2) · [...]
⇒ δ = [0, 0, 60°]^T (axis-angle, as expected)

Step 2 — retraction step (α = 0.5):
R_new = I · exp(0.5 · [0,0,60°]̂) = exp([0,0,30°]̂) = R_z(30°)

Step 3 — recompute error:
Δ = R_z(30°)^T · R_z(60°) = R_z(−30°) · R_z(60°) = R_z(30°)
θ = 30° — halved (geometric convergence with α = 0.5) ✓

Notice: at every step, R is a valid rotation matrix (determinant = 1, columns orthonormal). The retraction guarantees this automatically. The naive additive step R ← R + 0.5·δ̂ would give R + a small skew matrix — not in SO(3), but close enough to corrupt subsequent computations.

This geometric convergence (error halves each step for α = 0.5) is exactly what you see in the canvas above. Run 3-4 steps to verify.

The Feynman test for this lesson: Can you, without notes, derive Rodrigues' formula from the power series? The path: (1) write exp(φ̂) as a series, (2) use û̂³=−û̂ to show odd powers are proportional to û̂ and even to û̂², (3) recognize the sin and (1−cos) Taylor series, (4) assemble. If you can do that, you have internalized Lie theory for robotics.

A drone's current orientation is R ∈ SO(3). You want to apply a small rotation update δ ∈ ℝ³. Which update rule keeps R in SO(3)?

R ← R + δ̂ — additive update with the skew matrix R ← R + δ — additive update with the raw vector R ← R · exp(δ̂) — retraction: compose with the exponential of the tangent update R ← normalize(R + δ̂) — add then re-orthogonalize with SVD