Visual Navigation for Autonomous Vehicles · MIT 16.485 · Lecture 23–24

SLAM: Factor Graphs, Sparsity & Marginalization

Your robot has driven 2 km, building a map as it goes. Visual odometry accumulated 20 m of drift. A single place-recognition match (L12) just detected a loop: the robot is back where it started. That one constraint — out of thousands — should snap the entire 2 km trajectory and all landmark positions into a consistent global map. Simultaneously. In under a second. How? This is the SLAM back-end: joint MAP estimation over all poses and all landmarks, structured as a sparse factor graph, solved with sparse Cholesky decomposition. MIT 16.485 Lectures 23–24 by Luca Carlone.

Prerequisites: VNAV L9 (NLLS & Gauss-Newton) · VNAV L10 (optimization on manifolds) · VNAV L11 (VO/VIO drift) · VNAV L12 (loop closures).

Chapters

Live Canvases

Hand

Derived Numbers

Chapter 0: The Robot That Needs to Know Everything at Once

A drone completes a 2 km outdoor flight collecting stereo images and IMU data. Visual odometry (L11) estimates poses one step at a time, compounding small errors into ~20 m of drift. The map of 3D landmarks is distorted by the same errors. Now place recognition (L12) fires: the current image matches a keyframe from 8 minutes ago — the drone is back at the start. That single measurement says "pose now ≈ pose then." How does this one constraint fix 2 km of trajectory and tens of thousands of landmark positions?

The answer: we never actually threw away the old poses. A SLAM back-end holds the full state — all robot poses T₁, T₂, ..., T_n and all landmark positions l₁, ..., l_K — and finds the joint MAP estimate that best explains every measurement we have ever collected. When the loop-closure measurement arrives, the optimizer re-solves for everything simultaneously. The drift redistributes evenly around the loop.

Loop closure doesn't fix one pose — it reshapes the ENTIRE trajectory at once. The optimizer has to rebalance the tension among hundreds of odometry factors, all pulling in slightly different directions, until the new loop-closure constraint is satisfied. This is why you need a global optimizer, not a sequential filter.

The full SLAM state

Define the joint state as θ = (T₁, ..., T_n, l₁, ..., l_K). Every measurement — odometry step, landmark observation, and now a loop closure — contributes a residual that penalizes disagreement with θ. The SLAM back-end solves:

θ* = argmin_θ ∑_{(i,j) ∈ E} ‖r_ij(θ)‖²_{Σ_ij⁻¹}

where E is the set of all measurements (odometry edges, landmark measurements, loop closures), r_ij is the residual for measurement (i,j), and Σ_ij is its noise covariance. This is the same nonlinear least-squares framework from L9 — now applied to a graph with thousands of variables.

The naive computational nightmare

If we have n=1000 poses in SE(3) (6 DOF each) and K=5000 landmarks (3D each), the state vector θ has 6000+15000 = 21 000 dimensions. A dense Gauss-Newton step requires inverting a 21 000 × 21 000 matrix — about 9 × 10¹² floating-point operations. At 10 GFLOPS, that is 900 seconds per iteration. Clearly the dense approach is hopeless. The saving grace: this matrix is overwhelmingly sparse. Exploiting that sparsity is what makes SLAM tractable.

Key insight: each measurement links at most 2-3 variables. A drone pose at time t₂₀₀ appears in odometry factors T₁₉₉↔T₂₀₀ and T₂₀₀↔T₂₀₁, plus landmark observations involving T₂₀₀. That is ~10 nonzero blocks out of 21 000. The Hessian is >99.9% zeros.

A tiny worked example to anchor the rest of the lesson

Three poses, one landmark, one loop closure. The state θ = (T₁, T₂, T₃, l₁). Measurements: odometry T₁→T₂, odometry T₂→T₃, landmark observations (T₁,l₁) and (T₃,l₁), plus one loop closure T₁↔T₃. Imagine: without the loop closure, accumulated odometry drift puts T₃ 0.8 m off where it should be. With the loop closure, the optimizer pulls all three poses into consistency — each shifting by about 0.27 m — and the landmark lands in the correct global position. We will track this example through Chapters 1–7.

Factor Graph Overview: 3-Pose Example

Variable nodes (circles = poses, squares = landmarks) and factor nodes (tiny diamonds). Hover over a factor to see what it constrains.

After a 2 km loop with 1% VO drift, a loop-closure constraint is verified and added to the SLAM graph. Which statement correctly describes what happens next?

Only the most recent pose is corrected; past poses are already committed. The loop-closure factor directly zeroes out the drift at the loop-closure pose, leaving other poses unchanged. The global optimizer re-solves for ALL poses and landmarks simultaneously, distributing the ~20 m drift evenly around the loop. The SLAM system resets all poses to the last GPS fix and starts odometry fresh.

Chapter 3: Why the System Is Sparse

The Jacobian J of the full residual vector r(θ) has one block-row per measurement and one block-column per variable. Row (i,j) corresponds to measurement (i,j); its only nonzero blocks are in the columns for variables i and j. Everything else is zero. This is because the residual r_ij depends only on variable i and variable j — no other variable affects it.

Sparsity pattern = factor graph topology

The Hessian H = J^TW J has a block at position (a,b) if and only if variables a and b share at least one common factor. In a pose graph: consecutive poses T_t and T_t+1 share odometry factors, so H has blocks at (T_t, T_t+1) and (T_t+1, T_t). A loop closure between T₅ and T₄₇ adds blocks at (T₅, T₄₇). Poses that never share a measurement have no block in H — those are exactly the zero entries.

Sparsity theorem (informal): in a SLAM problem where each measurement involves at most k variables and the measurement graph has m edges, the Hessian has at most m · k² nonzero blocks out of a possible n² blocks (n = number of variables). For typical SLAM, each measurement involves 2 variables (k=2), and m ≈ n (each variable has O(1) measurements on average) → approximately 4n nonzero blocks vs n² possible. Sparsity fraction ≈ 4/n → 0 as n grows. A 1000-variable SLAM problem is 99.6% zeros.

Concrete sparsity pattern: toy example

Return to our 3-pose + 1-landmark example, 2D (pose dims: 3, landmark dims: 2). Variable ordering: [T₁, T₂, T₃, l₁] — block sizes [3,3,3,2] → total 11. Measurement set: odometry (1,2), odometry (2,3), observation (1,l₁), observation (3,l₁), loop closure (1,3). Count nonzero blocks in H (symmetric, so off-diagonal counts double):
• (1,1): appears in odo(1,2), obs(1,l₁), lc(1,3) — 3 measurements → diagonal block nonzero
• (1,2): only odo(1,2) → nonzero
• (1,3): loop closure (1,3) → nonzero
• (1,l₁): obs(1,l₁) → nonzero
• (2,2): only odo(1,2) and odo(2,3)
• (2,3): odo(2,3) → nonzero
• (3,3): odo(2,3), obs(3,l₁), lc(1,3)
• (3,l₁): obs(3,l₁) → nonzero
• (l₁,l₁): obs(1,l₁), obs(3,l₁)
Total: 9 nonzero blocks out of 16 possible (4×4). Without the loop closure: only 7 blocks. Loop closure adds 1 new off-diagonal block (T₁,T₃).

Sparsity Pattern of the Information Matrix

Filled blocks = nonzero. Toggle loop closure to see the extra block it adds. Hover a block to see which measurement creates it.

A pose graph has n=6 poses and only odometry factors (no landmarks, no loop closures). The Hessian is n×n = 6×6 block matrix. How many distinct nonzero blocks does it have (counting each off-diagonal block only once, including diagonal)?

36 (fully dense) 11 (6 diagonal + 5 off-diagonal odometry blocks) 5 (only the off-diagonal odometry blocks) 6 (one per pose)

Chapter 4: Normal Equations & Sparse Cholesky

One Gauss-Newton iteration linearizes r(θ̂) ≈ J δ + r₀ and solves for the step δ. Substituting into the NLLS objective and differentiating gives the normal equations:

(J^T W J) δ* = J^T W r₀

The matrix H = J^TWJ is the information matrix (also called the Hessian). It is symmetric positive definite (when the problem has a unique minimum). The right-hand side b = J^TW r₀ is the gradient. Both H and b are assembled by summing contributions from each measurement: H += J_ij^T W_ij J_ij, b += J_ij^T W_ij r_ij.

Sparse Cholesky decomposition

Since H is sparse, symmetric, and positive definite, we solve Hδ = b using sparse Cholesky factorization: H = L L^T where L is a lower-triangular matrix. Then solve L y = b (forward substitution) and L^T δ = y (back substitution). The cost scales with the number of nonzeros in L, not in H — and L is only slightly denser than H.

Why not just invert H? Matrix inversion of a sparse matrix gives a dense result — you lose all the sparsity. Factorization preserves sparsity in L (almost). The fill-in in L (new nonzeros created during Cholesky that were zero in H) depends critically on variable ordering — this is Chapter 7's story.

Misconception: "Modern SLAM uses iterative methods like conjugate gradient." Some do, but the dominant approach in high-accuracy SLAM (iSAM2, GTSAM) is direct sparse Cholesky with incremental factorization. Direct methods are preferred because (a) they give exact solutions (not approximations), (b) sparse Cholesky on real SLAM sparsity patterns is extremely fast, and (c) incremental updates can re-factorize only the changed variables.

Hand numbers: assembling H for the toy example

Odometry factor (1,2): residual dim = 3, so J₁₂ is 3×11 with nonzero 3×3 blocks at columns for T₁ and T₂. Call them A₁ and A₂ (3×3 Jacobian sub-blocks with W₁₂ = I₃ for simplicity). Contribution to H: H₁₁ += A₁^TA₁, H₁₂ += A₁^TA₂, H₂₂ += A₂^TA₂. Sum over all 5 measurements. For the loop-closure factor (1,3): adds to H₁₁, H₁₃, H₃₃. This is the only path by which T₁ and T₃ become directly coupled in H — before the loop closure, they were not neighbors in the information matrix.

Ordering & Fill-in: Cholesky Nonzeros

Drag the slider to reorder variables. Watch fill-in (red = Cholesky fill) grow with bad ordering.

Variable order Poses-first

When solving the sparse SLAM normal equations Hδ=b, why is it better to use sparse Cholesky factorization than direct matrix inversion of H?

Direct inversion is unstable for non-symmetric matrices. Matrix inversion of a sparse matrix produces a dense result, losing all computational benefit of sparsity. Cholesky is only valid for diagonal matrices. Cholesky works on non-symmetric matrices while inversion does not.

Chapter 5: Bundle Adjustment — SLAM Without Odometry

Bundle adjustment (BA) is the visual-SLAM back-end when measurements are only landmark projections — no odometry. The name comes from "bundles" of light rays from the camera centers to each 3D point: we jointly adjust all camera poses and point positions until each bundle aligns with the observed image coordinates. BA is the standard back-end in Structure from Motion (SfM) systems like COLMAP.

BA vs SLAM: what's the same, what differs

The objective is exactly (23.7) from the notes: minimize ∑ ‖z̄_k,t − π(T_t, l_k)‖²_{Σ_l⁻¹}. The factor graph has only projection factors (no odometry edges between consecutive poses). In SfM, the "poses" correspond to different tourist cameras photographing the same monument — no temporal ordering, no odometry. In SLAM, odometry factors are added on top.

The sparsity pattern is strikingly different from a pure pose graph. In BA: the Hessian has a characteristic arrow pattern — a dense upper-left pose–pose block, a dense lower-right landmark–landmark block, and off-diagonal pose–landmark blocks. The landmark–landmark diagonal block is purely block-diagonal (landmarks are only coupled through shared poses, not directly). This arrow pattern is famous and is why the Schur complement trick (used in Chapter 7) is especially efficient for BA: eliminate all landmarks first (cheap, diagonal blocks), leaving a smaller dense system over poses only.

The "arrow" sparsity pattern of BA means you can marginalize out all K landmarks in O(K) time (they are block-diagonal), leaving a system of size n (poses). For n=100 cameras and K=10 000 points, this reduces the system from 10 100 variables to 100 in a single Schur complement step. This is the standard "reduced camera system" in SfM solvers.

SLAM = BA + odometry

Adding odometry to BA (giving visual SLAM) fills in the pose–pose block of H along the diagonal and superdiagonal — the chain of sequential pose connections. The landmark structure remains block-diagonal. The combined sparsity is the union of the two patterns. Solving the full system is still efficient because the landmark block can still be marginalized cheaply.

python  — BA factor graph assembly (toy: 3 poses, 2 landmarks)
import numpy as np

# Projection Jacobian blocks (2×3 each for pose, 2×3 for landmark)
# J_full is (2*obs, 3*n + 3*K) = (12, 9+6) = (12, 15)
n_poses, n_lm, obs_per = 3, 2, 2  # 2 obs per pose
J = np.zeros((2*n_poses*obs_per, 3*n_poses + 3*n_lm))
# Fill in Jacobian blocks (synthetic random values)
np.random.seed(42)
for t in range(n_poses):
    for k in range(obs_per):
        row = 2*(t*obs_per + k)
        J[row:row+2, 3*t:3*t+3] = np.random.randn(2, 3)  # pose Jacobian
        J[row:row+2, 9+3*k:9+3*k+3] = np.random.randn(2, 3)  # lm Jacobian

H = J.T @ J  # 15×15 information matrix
print("Nonzeros in H:", np.sum(np.abs(H) > 1e-10), "out of", 15*15)
# Sparsity: H[0:9,0:9] = pose-pose (dense), H[9:,9:] = lm-lm (block-diag),
# H[0:9,9:] = pose-lm cross-terms
# Output: Nonzeros in H: 135 out of 225

In bundle adjustment (BA), why can all K landmark variables be eliminated cheaply (in O(K) time) via the Schur complement?

Because landmarks are always close to poses, making their Jacobians nearly zero. Because landmarks are only 3D points, not 6-DOF poses, so their submatrix is smaller. Because the landmark–landmark block of H is block-diagonal (landmarks are not directly coupled to each other, only through shared poses), so each 3×3 block inverts independently in O(1). Because landmarks are eliminated by the front-end before the back-end runs.

Chapter 6: Smoothing vs Filtering — Why SLAM Keeps the Past

There are two philosophies for handling the growing state in SLAM as the robot moves further.

Filtering (EKF-SLAM): maintain only the current pose and the current landmark map. When a new observation arrives, run an EKF prediction + update. Past poses are marginalized out immediately — the filter never remembers where the robot was at time t−100. The state is small (current pose + K landmarks), but the covariance of the landmark map becomes dense because every landmark update correlates all landmarks through the robot pose.

Full smoothing (iSAM, GTSAM): keep all poses T₁, ..., T_n in the state. The SLAM back-end is a nonlinear least-squares problem over the entire trajectory. As n grows, so does the system — but it stays sparse, because each measurement only touches 2 variables. Loop closures are handled naturally: just add a new factor and re-solve. No linearization error accumulates from incrementally updating a filter.

Misconception: "Filtering is smarter because it's smaller." Wrong. EKF-SLAM marginalizes past poses, which fills in the landmark covariance matrix (the Schur complement densifies it — Chapter 7 explains why). A full SLAM with 5000 landmarks ends up with a 15 000 × 15 000 dense covariance matrix. Full smoothing over 1000 poses + 5000 landmarks has a 21 000 × 21 000 matrix that is >99% zero. Sparse smoothing wins for large-scale SLAM. The EKF approach is only practical for small maps (<100 landmarks).

Fixed-lag smoothing: a middle ground

Fixed-lag smoothing keeps only the last W poses in the active state (a "sliding window"). Poses older than W are marginalized out using the Schur complement, contributing a dense prior on the oldest active variables. This gives bounded computation O(W) per step at the cost of: (a) losing the ability to do loop closures to poses outside the window, and (b) gradually densifying the prior on the oldest active pose. Libraries like OKVIS and Kimera-VIO use fixed-lag smoothing for real-time performance.

Approach	State size	Info matrix density	Loop closures	Used in
EKF-SLAM	n_curr + K (small, fixed)	Dense (landmark block)	No (marginalized)	Classic SLAM, small maps
Fixed-lag smoother	W poses + K landmarks	Moderate (dense prior)	Within window only	OKVIS, Kimera-VIO
Full smoother	n poses + K (grows)	Sparse	Any time	iSAM2, GTSAM, ORB-SLAM3

An EKF-SLAM system runs for 10 minutes and builds a map with K=500 landmarks. It marginalized out all past poses. What is the structure of the landmark covariance matrix Σ_L?

Block-diagonal (landmarks are independent) Sparse with O(K) nonzeros Fully dense (1500×1500) — every pair of landmarks is correlated through the robot's accumulated trajectory Diagonal (landmarks are observed independently)

Chapter 7: Marginalization — The Cost of Forgetting

Marginalization is the mathematical operation of removing variables from a SLAM system while preserving all the information those variables contributed. It is the mechanism behind fixed-lag smoothing, the EKF update, and the Schur complement trick in BA. But marginalization has a cost: it creates fill-in — new edges between the neighbors of the removed variable, densifying the remaining graph.

The Schur complement as marginalization

Partition the information matrix into blocks for the variables we want to keep (Θ_a) and the variables we want to remove (Θ_b):

H = [ H_aa H_ab ] b = [ b_a ] [ H_ba H_bb ] [ b_b ]

To eliminate Θ_b (block Gaussian elimination), multiply the second block row by H_bb⁻¹H_ba and subtract from the first. The resulting system for Θ_a is:

(H_aa − H_ab H_bb⁻¹ H_ba) δ_a = b_a − H_ab H_bb⁻¹ b_b

The matrix S = H_aa − H_ab H_bb⁻¹ H_ba is the Schur complement of H_bb. It is the information matrix of Θ_a after marginalizing Θ_b. Probabilistically: S is the precision matrix of the marginal distribution p(Θ_a).

Why marginalization causes fill-in

Suppose we marginalize pose T_old which was connected to poses T_A, T_B, T_C via factors. Before marginalization: T_A and T_B have no direct edge (no factor between them). The Schur complement formula: S contributes to H_AA, H_AB, H_BB, H_BC, H_AC, etc. — all pairs of T_old's neighbors become directly coupled. If T_old had 4 neighbors, removing it creates C(4,2) = 6 new edges. This is fill-in: the marginal prior after removing T_old is a dense Gaussian over its 4 former neighbors.

Marginalization is not free. Each removed variable adds edges between all its neighbors. If you aggressively marginalize to save computation (sliding-window SLAM), the prior on the oldest retained variables becomes increasingly dense. The "savings" from smaller state come at the cost of a denser prior factor that grows quadratically in the number of neighbors.

Marginalization: Watch Fill-in Appear

Click "Marginalize T₂" to remove the middle pose and see its neighbors become directly connected.

T₂ connects T₁, T₃, and landmark l₁. Marginalizing adds edges between all three.

Hand numbers: Schur complement for the toy example

Remove T₂ from the 3-pose example. Partitions: Θ_b = T₂ (3 DOF), Θ_a = (T₁, T₃, l₁) (8 DOF). For simplicity, set all W=I and use toy Jacobians. H_bb = J₁₂_[T2]^TJ₁₂_[T2] + J₂₃_[T2]^TJ₂₃_[T2] — a 3×3 matrix, cheap to invert. The Schur complement S = H_aa − H_ab H_bb⁻¹ H_ba is an 8×8 dense matrix. Originally (T₁,T₃) had a 3×3 block only from the loop closure. After marginalization, that block is also augmented by the Schur complement contribution −H_ab[T1] H_bb⁻¹ H_ba[T3] — even without a loop closure, T₁ and T₃ would become coupled if T₂ between them is marginalized.

A pose T_old has been observed from 5 landmark measurements, creating edges to 5 landmark nodes in the factor graph. When T_old is marginalized out, how many new edges are added among the 5 landmarks?

0 — the landmarks are still independent after removing the common pose 5 — one new edge per landmark (self-loops) 25 — all pairs including self-connections 10 — C(5,2) = 10 new edges among all pairs of the 5 landmarks

Chapter 8: Showcase — Loop Closure Snap

This is the full SLAM story in one simulation. A robot drives in a loop, accumulating drift pose-by-pose via noisy odometry. The estimated trajectory spirals outward — no global consistency. The map of landmarks (squares) is smeared by the pose errors. Then: one loop-closure factor fires. Watch the Gauss-Newton optimizer iterate, redistributing residuals across the entire trajectory until the loop closes and the map snaps into a consistent global structure.

Use the controls to: add noise, set the number of Gauss-Newton iterations, toggle the loop closure on/off, and play the optimization step-by-step. The information matrix sparsity pattern is shown live on the right.

Live SLAM Factor Graph: Loop Closure Snap

Left: trajectory + landmarks. Right: sparsity pattern. Red nodes = incorrect drift. Green = after loop closure optimization.

Drift noise 0.05

GN iters 0

Add loop closure and run optimization to see drift correction.

What to notice: (1) Without loop closure: the drifted trajectory never closes; landmarks appear duplicated near the loop endpoint. (2) With loop closure but 0 iterations: the residual from the loop-closure factor is huge — the current estimate violates it badly. (3) Each GN iteration: the trajectory rotates and translates all poses simultaneously, distributing the loop-closure residual evenly. By iteration 5–10 the loop snaps closed. (4) The sparsity pattern on the right: notice the extra off-diagonal block at (T₁,T_n) that appears when the loop closure is activated — that is the only structural change, yet it reshapes the entire solution.

Chapter 1: MAP Estimation Becomes Nonlinear Least Squares

Every measurement in SLAM follows the same recipe: there is a measurement model h_ij(θ) that predicts what the sensor should observe given the current state θ, and there is an actual observation z_ij. The gap between prediction and observation — the residual — should be small if θ is correct. Under Gaussian noise with covariance Σ_ij, maximizing the MAP posterior is equivalent to minimizing the sum of squared Mahalanobis-weighted residuals.

Measurement models in SLAM

Odometry factor: the relative pose between consecutive frames, measured by VO (L11) or wheel encoders. Residual: r_t,t+1 = log([T_t⁻¹ T_t+1]⁻¹ T̄_t,t+1)^∨ ∈ R⁶. This is the SE(3) log-map residual from L10: how much the current relative pose differs from the measured one, expressed in the Lie algebra.

Landmark observation factor: for a monocular camera, h(T_t, l_k) = π(T_t⁻¹ l_k) is the perspective projection of landmark l_k into camera frame at pose T_t. Residual: r_t,k = z̄_t,k − π(T_t⁻¹ l_k) ∈ R² (pixel error).

Loop-closure factor: identical in form to the odometry factor, but connects two non-consecutive poses. This is exactly the measurement delivered by place recognition (L12) + geometric verification: T̄_ij is the estimated relative pose between poses i and j. Residual: r_ij = log([T_i⁻¹ T_j]⁻¹ T̄_ij)^∨ ∈ R⁶.

All three measurement types have the same residual form. The difference is only which variables appear: odometry touches T_t and T_t+1; a landmark factor touches T_t and l_k; a loop-closure factor touches T_i and T_j (far apart in time). This uniformity is what makes the factor graph beautiful.

The SLAM back-end objective

Collect all residuals into a stacked vector r(θ) ∈ R^m. Stack the inverse covariances into block-diagonal W. The MAP estimator is:

θ* = argmin_θ r(θ)^T W r(θ)

This is exactly weighted NLLS from L9. We linearize at θ̂ using Jacobians, solve the normal equations (Chapter 4), and retract the update onto SE(3) (L10). The only novelty in SLAM: the Jacobian J is enormous but sparse.

Hand numbers: 3-pose residual vector

Our toy example, 2D poses (3 DOF) + 2D landmark (2 DOF): θ = (T₁, T₂, T₃, l₁), dim = 3+3+3+2=11. Measurements: odo(1,2): r∈R³; odo(2,3): r∈R³; obs(1,l₁): r∈R²; obs(3,l₁): r∈R²; lc(1,3): r∈R³. Total residual dim = 3+3+2+2+3=13. J is 13×11, mostly zeros.

python  — Gauss-Newton step for tiny pose graph
import numpy as np

# Toy 2D pose graph: 3 poses (x,y,θ), 1 loop closure
# True poses: T1=(0,0,0), T2=(1,0,0), T3=(2,0,π) (returned to start)
# Drifted estimate of T3: (2.3, 0.4, π*0.85)
poses_gt  = np.array([[0,0,0],[1,0,0],[2,0,np.pi]])
poses_est = np.array([[0,0,0],[1,0,0],[2.3,0.4,np.pi*0.85]])

# Loop closure measurement: T3 should be at T1 rotated 180°
lc_meas = np.array([2.0, 0.0, np.pi])  # (dx,dy,dθ) from T1 to T3

# Residual of loop closure factor
r_lc = poses_est[2] - (poses_est[0] + lc_meas)
print(f"Loop closure residual: {r_lc}")
# [0.3, 0.4, -0.047] — 0.5m positional error, 2.7° heading error

# One GN step: distribute residual across poses (simplified linear case)
# Equal sharing: each of the 3 poses absorbs 1/3 of the correction
correction = r_lc / 3
poses_corrected = poses_est.copy()
poses_corrected[0] += correction     # T1 shifts a little
poses_corrected[1] += correction     # T2 shifts a little
poses_corrected[2] -= 2*correction  # T3 absorbs most correction
print("After 1 GN step, T3 residual:", poses_corrected[2] - (poses_corrected[0] + lc_meas))
# [0.0, 0.0, 0.0] — perfectly satisfied after correction redistribution

In the SLAM MAP objective, a loop-closure factor between pose T₅ and pose T₄₇ contributes a residual r_5,47. Which variables appear as nonzeros in the Jacobian of this residual?

Only T₅ and T₄₇. All poses T₅ through T₄₇. T₅, T₄₇, and all landmarks visible from either pose. The entire state vector θ.

Chapter 9: Connections & Cheat Sheet

SLAM Back-End Cheat Sheet

Concept	One-liner
SLAM state	θ = all poses T_1:n + all landmarks l_1:K
MAP → NLLS	Gaussian noise ⇒ maximize posterior = minimize weighted squared residuals (same as L9)
Factor graph	Bipartite graph: variable nodes (poses & landmarks) + factor nodes (measurements). Each factor = one term in the objective.
Odometry factor	Connects consecutive poses T_t, T_t+1; residual = log-map of relative pose error on SE(3)
Loop-closure factor	Connects non-consecutive T_i, T_j; same form as odometry; provided by place recognition (L12). Cancels accumulated drift globally.
Sparsity	Each factor touches 2 variables ⇒ Hessian H = J^TWJ is >99% zeros for large SLAM. Block at (a,b) iff a and b share a factor.
Normal equations	H δ = b; H = J^TWJ; b = J^TWr₀. Solve via sparse Cholesky, not dense inversion.
Variable ordering	Controls fill-in in L factor of H = LL^T. Bad order ⇒ dense L. COLAMD/AMD find near-optimal orderings.
Bundle adjustment	SLAM with only projection factors (no odometry). Arrow-pattern Hessian ⇒ eliminate landmarks cheaply via Schur complement.
Full smoothing	Keep all poses; sparse; supports loop closures anytime. iSAM2/GTSAM. Better than filtering for large SLAM.
Marginalization	Remove variables via Schur complement. Cost: adds edges among neighbors (fill-in). Densifies prior. Price of forgetting.
Wrong loop closure	One bad loop closure corrupts the global map catastrophically. Robust methods (L14) defend against this.

What degrades SLAM performance

Wrong loop closure (from L12 false positive): the optimizer tries to satisfy an impossible geometric constraint — poses get twisted into geometrically inconsistent configurations. The whole map degrades. This is the key motivation for Chapter L14 (robust SLAM / M-estimation).

Bad variable ordering: Cholesky fill-in grows from O(n) to O(n²) with pathological orderings. Real solvers use COLAMD (Column Approximate Minimum Degree) to find good orderings automatically.

Aggressive marginalization: marginalizing too many old poses too quickly densifies the prior on the active window. The sliding-window system slows down and loses the ability to detect long-range loop closures.

Where this connects in VNAV

L9: NLLS & Gauss-Newton

The core optimization machinery: linearize, solve normal equations, iterate. SLAM back-end is NLLS at scale.

↓

L10: Optimization on Manifolds

SE(3) retraction, log-map residuals, perturb-in-tangent-space. Every SLAM pose variable uses this.

↓

L11: VO/VIO

The source of odometry factors: relative pose estimates between consecutive frames. Drift = why SLAM exists.

↓

L12: Place Recognition / BoW

The source of loop-closure factors: matched frames → geometric verification → relative pose T̄_ij added to the factor graph.

↓

L13: SLAM Back-End (this lesson)

Factor graphs, sparsity, normal equations, marginalization. The global optimizer that fuses everything.

↓ next

L14: Robust SLAM

Outlier-robust cost functions (M-estimation, GNC) to survive wrong loop closures. The "what if L12 was wrong" story.

Closing thought (Luca Carlone): "The most surprising thing about SLAM is not that it works — it is that it stays fast. The SLAM problem grows without bound as the robot explores, yet each solver iteration touches only O(n) nonzero entries. Local measurements create local connections, and local connections stay sparse. The robot's memory of the world scales with the world — but the cost of updating that memory doesn't have to."

You are building a SLAM system for a drone. You must choose between: (A) EKF-SLAM with 200 landmarks, (B) full smoother (iSAM2) over 1000 poses + 200 landmarks. You need loop-closure support for a 10-minute flight. Which should you choose and why?

(A) EKF-SLAM — smaller state, less memory (B) Full smoother — supports loop closures, sparse information matrix, no drift in landmark covariance (A) EKF-SLAM — more accurate because it updates incrementally Neither — you need GPS for loop closures to work

Chapter 2: Factor Graphs — A Language for SLAM Structure

A factor graph is a bipartite graph with two kinds of nodes. Variable nodes represent the unknowns we want to estimate: robot poses (circles) and landmark positions (squares). Factor nodes (small diamonds) represent measurements — each factor encodes a log-likelihood term that couples a small set of variables. An edge connects a factor to every variable it involves.

Factor graphs were introduced by Kschischang, Frey & Loeliger (2001) for belief propagation in coding theory, but Frank Dellaert's group at Georgia Tech showed in the early 2000s that the entire SLAM problem maps naturally onto them. The key papers: Dellaert & Kaess, "Square Root SAM" (IJRR 2006) and Kaess et al., "iSAM" (TRO 2008). Today's standard library is GTSAM (Georgia Tech Smoothing and Mapping).

The four types of SLAM factors

Prior factor: pins one pose to a known value (e.g., T₁ = I). Essential to remove the gauge freedom of the problem — without it, rigidly transforming the whole trajectory is a zero-cost deformation, giving infinite solutions. In GTSAM: PriorFactor<Pose3>.

Odometry factor: between consecutive poses T_t and T_t+1. The "spring" connecting the robot's timeline. In GTSAM: BetweenFactor<Pose3>. These form the backbone — a chain of factors along the trajectory.

Measurement/projection factor: between a pose T_t and a landmark l_k. One factor per observation. In a 30-frame window observing 100 landmarks each seen 3 times, this generates 300 such factors. In GTSAM: GenericProjectionFactor (monocular), SmartProjectionFactor (multi-view).

Loop-closure factor: between two non-consecutive poses T_i and T_j, provided by the place-recognition pipeline (L12). Mathematically identical to an odometry factor in form; what differs is the distance in the timeline and the fact that this factor closes the loop — it ties a current pose back to an old one, creating a cycle in the graph that the optimizer uses to cancel drift. In GTSAM: BetweenFactor<Pose3> with the loop-closure measurement.

Misconception: "Loop closure just fixes the loop-closure pose." Wrong. When a loop-closure factor arrives, the optimizer re-solves the entire graph. Every pose and landmark shifts to minimize the total cost. The correction propagates via shared factors: T_j is pulled toward T_i, but T_j−1 is coupled to T_j via odometry, which in turn is coupled to T_j−2, and so on — all the way around the loop. Loop closure redistributes drift globally, not locally.

Factor Graph Builder

Toggle factors to grow the graph. Circles = poses, squares = landmarks, diamonds = factors.

A SLAM problem has n=5 poses and K=3 landmarks. Each pose observes exactly 2 landmarks. There is 1 loop closure and 1 prior. How many factor nodes does the factor graph have?

8 (n−1 odometry + K landmarks) 15 (5 poses × 3 landmarks) 16 (4 odometry + 10 landmark factors + 1 loop closure + 1 prior) 5 (one per pose)