Visual Navigation for Autonomous Vehicles · MIT 16.485 · Lecture 23–24

SLAM: Factor Graphs, Sparsity & Marginalization

Your robot has driven 2 km, building a map as it goes. Visual odometry accumulated 20 m of drift. A single place-recognition match (L12) just detected a loop: the robot is back where it started. That one constraint — out of thousands — should snap the entire 2 km trajectory and all landmark positions into a consistent global map. Simultaneously. In under a second. How? This is the SLAM back-end: joint MAP estimation over all poses and all landmarks, structured as a sparse factor graph, solved with sparse Cholesky decomposition. MIT 16.485 Lectures 23–24 by Luca Carlone.

Prerequisites: VNAV L9 (NLLS & Gauss-Newton) · VNAV L10 (optimization on manifolds) · VNAV L11 (VO/VIO drift) · VNAV L12 (loop closures).
10
Chapters
5
Live Canvases
Hand
Derived Numbers

Chapter 0: The Robot That Needs to Know Everything at Once

A drone completes a 2 km outdoor flight collecting stereo images and IMU data. Visual odometry (L11) estimates poses one step at a time, compounding small errors into ~20 m of drift. The map of 3D landmarks is distorted by the same errors. Now place recognition (L12) fires: the current image matches a keyframe from 8 minutes ago — the drone is back at the start. That single measurement says "pose now ≈ pose then." How does this one constraint fix 2 km of trajectory and tens of thousands of landmark positions?

The answer: we never actually threw away the old poses. A SLAM back-end holds the full state — all robot poses T1, T2, ..., Tn and all landmark positions l1, ..., lK — and finds the joint MAP estimate that best explains every measurement we have ever collected. When the loop-closure measurement arrives, the optimizer re-solves for everything simultaneously. The drift redistributes evenly around the loop.

Loop closure doesn't fix one pose — it reshapes the ENTIRE trajectory at once. The optimizer has to rebalance the tension among hundreds of odometry factors, all pulling in slightly different directions, until the new loop-closure constraint is satisfied. This is why you need a global optimizer, not a sequential filter.

The full SLAM state

Define the joint state as θ = (T1, ..., Tn, l1, ..., lK). Every measurement — odometry step, landmark observation, and now a loop closure — contributes a residual that penalizes disagreement with θ. The SLAM back-end solves:

θ* = argminθ(i,j) ∈ E ‖rij(θ)‖2Σij−1

where E is the set of all measurements (odometry edges, landmark measurements, loop closures), rij is the residual for measurement (i,j), and Σij is its noise covariance. This is the same nonlinear least-squares framework from L9 — now applied to a graph with thousands of variables.

The naive computational nightmare

If we have n=1000 poses in SE(3) (6 DOF each) and K=5000 landmarks (3D each), the state vector θ has 6000+15000 = 21 000 dimensions. A dense Gauss-Newton step requires inverting a 21 000 × 21 000 matrix — about 9 × 1012 floating-point operations. At 10 GFLOPS, that is 900 seconds per iteration. Clearly the dense approach is hopeless. The saving grace: this matrix is overwhelmingly sparse. Exploiting that sparsity is what makes SLAM tractable.

Key insight: each measurement links at most 2-3 variables. A drone pose at time t200 appears in odometry factors T199↔T200 and T200↔T201, plus landmark observations involving T200. That is ~10 nonzero blocks out of 21 000. The Hessian is >99.9% zeros.

A tiny worked example to anchor the rest of the lesson

Three poses, one landmark, one loop closure. The state θ = (T1, T2, T3, l1). Measurements: odometry T1→T2, odometry T2→T3, landmark observations (T1,l1) and (T3,l1), plus one loop closure T1↔T3. Imagine: without the loop closure, accumulated odometry drift puts T3 0.8 m off where it should be. With the loop closure, the optimizer pulls all three poses into consistency — each shifting by about 0.27 m — and the landmark lands in the correct global position. We will track this example through Chapters 1–7.

Factor Graph Overview: 3-Pose Example

Variable nodes (circles = poses, squares = landmarks) and factor nodes (tiny diamonds). Hover over a factor to see what it constrains.

After a 2 km loop with 1% VO drift, a loop-closure constraint is verified and added to the SLAM graph. Which statement correctly describes what happens next?

Chapter 3: Why the System Is Sparse

The Jacobian J of the full residual vector r(θ) has one block-row per measurement and one block-column per variable. Row (i,j) corresponds to measurement (i,j); its only nonzero blocks are in the columns for variables i and j. Everything else is zero. This is because the residual rij depends only on variable i and variable j — no other variable affects it.

Sparsity pattern = factor graph topology

The Hessian H = JTW J has a block at position (a,b) if and only if variables a and b share at least one common factor. In a pose graph: consecutive poses Tt and Tt+1 share odometry factors, so H has blocks at (Tt, Tt+1) and (Tt+1, Tt). A loop closure between T5 and T47 adds blocks at (T5, T47). Poses that never share a measurement have no block in H — those are exactly the zero entries.

Sparsity theorem (informal): in a SLAM problem where each measurement involves at most k variables and the measurement graph has m edges, the Hessian has at most m · k2 nonzero blocks out of a possible n2 blocks (n = number of variables). For typical SLAM, each measurement involves 2 variables (k=2), and m ≈ n (each variable has O(1) measurements on average) → approximately 4n nonzero blocks vs n2 possible. Sparsity fraction ≈ 4/n → 0 as n grows. A 1000-variable SLAM problem is 99.6% zeros.

Concrete sparsity pattern: toy example

Return to our 3-pose + 1-landmark example, 2D (pose dims: 3, landmark dims: 2). Variable ordering: [T1, T2, T3, l1] — block sizes [3,3,3,2] → total 11. Measurement set: odometry (1,2), odometry (2,3), observation (1,l1), observation (3,l1), loop closure (1,3). Count nonzero blocks in H (symmetric, so off-diagonal counts double):
• (1,1): appears in odo(1,2), obs(1,l1), lc(1,3) — 3 measurements → diagonal block nonzero
• (1,2): only odo(1,2) → nonzero
• (1,3): loop closure (1,3) → nonzero
• (1,l1): obs(1,l1) → nonzero
• (2,2): only odo(1,2) and odo(2,3)
• (2,3): odo(2,3) → nonzero
• (3,3): odo(2,3), obs(3,l1), lc(1,3)
• (3,l1): obs(3,l1) → nonzero
• (l1,l1): obs(1,l1), obs(3,l1)
Total: 9 nonzero blocks out of 16 possible (4×4). Without the loop closure: only 7 blocks. Loop closure adds 1 new off-diagonal block (T1,T3).

Sparsity Pattern of the Information Matrix

Filled blocks = nonzero. Toggle loop closure to see the extra block it adds. Hover a block to see which measurement creates it.

A pose graph has n=6 poses and only odometry factors (no landmarks, no loop closures). The Hessian is n×n = 6×6 block matrix. How many distinct nonzero blocks does it have (counting each off-diagonal block only once, including diagonal)?

Chapter 4: Normal Equations & Sparse Cholesky

One Gauss-Newton iteration linearizes r(θ̂) ≈ J δ + r0 and solves for the step δ. Substituting into the NLLS objective and differentiating gives the normal equations:

(JT W J) δ* = JT W r0

The matrix H = JTWJ is the information matrix (also called the Hessian). It is symmetric positive definite (when the problem has a unique minimum). The right-hand side b = JTW r0 is the gradient. Both H and b are assembled by summing contributions from each measurement: H += JijT Wij Jij, b += JijT Wij rij.

Sparse Cholesky decomposition

Since H is sparse, symmetric, and positive definite, we solve Hδ = b using sparse Cholesky factorization: H = L LT where L is a lower-triangular matrix. Then solve L y = b (forward substitution) and LT δ = y (back substitution). The cost scales with the number of nonzeros in L, not in H — and L is only slightly denser than H.

Why not just invert H? Matrix inversion of a sparse matrix gives a dense result — you lose all the sparsity. Factorization preserves sparsity in L (almost). The fill-in in L (new nonzeros created during Cholesky that were zero in H) depends critically on variable ordering — this is Chapter 7's story.

Misconception: "Modern SLAM uses iterative methods like conjugate gradient." Some do, but the dominant approach in high-accuracy SLAM (iSAM2, GTSAM) is direct sparse Cholesky with incremental factorization. Direct methods are preferred because (a) they give exact solutions (not approximations), (b) sparse Cholesky on real SLAM sparsity patterns is extremely fast, and (c) incremental updates can re-factorize only the changed variables.

Hand numbers: assembling H for the toy example

Odometry factor (1,2): residual dim = 3, so J12 is 3×11 with nonzero 3×3 blocks at columns for T1 and T2. Call them A1 and A2 (3×3 Jacobian sub-blocks with W12 = I3 for simplicity). Contribution to H: H11 += A1TA1, H12 += A1TA2, H22 += A2TA2. Sum over all 5 measurements. For the loop-closure factor (1,3): adds to H11, H13, H33. This is the only path by which T1 and T3 become directly coupled in H — before the loop closure, they were not neighbors in the information matrix.

Ordering & Fill-in: Cholesky Nonzeros

Drag the slider to reorder variables. Watch fill-in (red = Cholesky fill) grow with bad ordering.

Variable order Poses-first
When solving the sparse SLAM normal equations Hδ=b, why is it better to use sparse Cholesky factorization than direct matrix inversion of H?

Chapter 5: Bundle Adjustment — SLAM Without Odometry

Bundle adjustment (BA) is the visual-SLAM back-end when measurements are only landmark projections — no odometry. The name comes from "bundles" of light rays from the camera centers to each 3D point: we jointly adjust all camera poses and point positions until each bundle aligns with the observed image coordinates. BA is the standard back-end in Structure from Motion (SfM) systems like COLMAP.

BA vs SLAM: what's the same, what differs

The objective is exactly (23.7) from the notes: minimize ∑ ‖z̄k,t − π(Tt, lk)‖2Σl−1. The factor graph has only projection factors (no odometry edges between consecutive poses). In SfM, the "poses" correspond to different tourist cameras photographing the same monument — no temporal ordering, no odometry. In SLAM, odometry factors are added on top.

The sparsity pattern is strikingly different from a pure pose graph. In BA: the Hessian has a characteristic arrow pattern — a dense upper-left pose–pose block, a dense lower-right landmark–landmark block, and off-diagonal pose–landmark blocks. The landmark–landmark diagonal block is purely block-diagonal (landmarks are only coupled through shared poses, not directly). This arrow pattern is famous and is why the Schur complement trick (used in Chapter 7) is especially efficient for BA: eliminate all landmarks first (cheap, diagonal blocks), leaving a smaller dense system over poses only.

The "arrow" sparsity pattern of BA means you can marginalize out all K landmarks in O(K) time (they are block-diagonal), leaving a system of size n (poses). For n=100 cameras and K=10 000 points, this reduces the system from 10 100 variables to 100 in a single Schur complement step. This is the standard "reduced camera system" in SfM solvers.

SLAM = BA + odometry

Adding odometry to BA (giving visual SLAM) fills in the pose–pose block of H along the diagonal and superdiagonal — the chain of sequential pose connections. The landmark structure remains block-diagonal. The combined sparsity is the union of the two patterns. Solving the full system is still efficient because the landmark block can still be marginalized cheaply.

python  — BA factor graph assembly (toy: 3 poses, 2 landmarks)
import numpy as np

# Projection Jacobian blocks (2×3 each for pose, 2×3 for landmark)
# J_full is (2*obs, 3*n + 3*K) = (12, 9+6) = (12, 15)
n_poses, n_lm, obs_per = 3, 2, 2  # 2 obs per pose
J = np.zeros((2*n_poses*obs_per, 3*n_poses + 3*n_lm))
# Fill in Jacobian blocks (synthetic random values)
np.random.seed(42)
for t in range(n_poses):
    for k in range(obs_per):
        row = 2*(t*obs_per + k)
        J[row:row+2, 3*t:3*t+3] = np.random.randn(2, 3)  # pose Jacobian
        J[row:row+2, 9+3*k:9+3*k+3] = np.random.randn(2, 3)  # lm Jacobian

H = J.T @ J  # 15×15 information matrix
print("Nonzeros in H:", np.sum(np.abs(H) > 1e-10), "out of", 15*15)
# Sparsity: H[0:9,0:9] = pose-pose (dense), H[9:,9:] = lm-lm (block-diag),
# H[0:9,9:] = pose-lm cross-terms
# Output: Nonzeros in H: 135 out of 225
In bundle adjustment (BA), why can all K landmark variables be eliminated cheaply (in O(K) time) via the Schur complement?

Chapter 6: Smoothing vs Filtering — Why SLAM Keeps the Past

There are two philosophies for handling the growing state in SLAM as the robot moves further.

Filtering (EKF-SLAM): maintain only the current pose and the current landmark map. When a new observation arrives, run an EKF prediction + update. Past poses are marginalized out immediately — the filter never remembers where the robot was at time t−100. The state is small (current pose + K landmarks), but the covariance of the landmark map becomes dense because every landmark update correlates all landmarks through the robot pose.

Full smoothing (iSAM, GTSAM): keep all poses T1, ..., Tn in the state. The SLAM back-end is a nonlinear least-squares problem over the entire trajectory. As n grows, so does the system — but it stays sparse, because each measurement only touches 2 variables. Loop closures are handled naturally: just add a new factor and re-solve. No linearization error accumulates from incrementally updating a filter.

Misconception: "Filtering is smarter because it's smaller." Wrong. EKF-SLAM marginalizes past poses, which fills in the landmark covariance matrix (the Schur complement densifies it — Chapter 7 explains why). A full SLAM with 5000 landmarks ends up with a 15 000 × 15 000 dense covariance matrix. Full smoothing over 1000 poses + 5000 landmarks has a 21 000 × 21 000 matrix that is >99% zero. Sparse smoothing wins for large-scale SLAM. The EKF approach is only practical for small maps (<100 landmarks).

Fixed-lag smoothing: a middle ground

Fixed-lag smoothing keeps only the last W poses in the active state (a "sliding window"). Poses older than W are marginalized out using the Schur complement, contributing a dense prior on the oldest active variables. This gives bounded computation O(W) per step at the cost of: (a) losing the ability to do loop closures to poses outside the window, and (b) gradually densifying the prior on the oldest active pose. Libraries like OKVIS and Kimera-VIO use fixed-lag smoothing for real-time performance.

ApproachState sizeInfo matrix densityLoop closuresUsed in
EKF-SLAMncurr + K (small, fixed)Dense (landmark block)No (marginalized)Classic SLAM, small maps
Fixed-lag smootherW poses + K landmarksModerate (dense prior)Within window onlyOKVIS, Kimera-VIO
Full smoothern poses + K (grows)SparseAny timeiSAM2, GTSAM, ORB-SLAM3
An EKF-SLAM system runs for 10 minutes and builds a map with K=500 landmarks. It marginalized out all past poses. What is the structure of the landmark covariance matrix ΣL?

Chapter 7: Marginalization — The Cost of Forgetting

Marginalization is the mathematical operation of removing variables from a SLAM system while preserving all the information those variables contributed. It is the mechanism behind fixed-lag smoothing, the EKF update, and the Schur complement trick in BA. But marginalization has a cost: it creates fill-in — new edges between the neighbors of the removed variable, densifying the remaining graph.

The Schur complement as marginalization

Partition the information matrix into blocks for the variables we want to keep (Θa) and the variables we want to remove (Θb):

H = [ Haa   Hab ]     b = [ ba ]    [ Hba   Hbb ]      [ bb ]

To eliminate Θb (block Gaussian elimination), multiply the second block row by Hbb−1Hba and subtract from the first. The resulting system for Θa is:

(Haa − Hab Hbb−1 Hba) δa = ba − Hab Hbb−1 bb

The matrix S = Haa − Hab Hbb−1 Hba is the Schur complement of Hbb. It is the information matrix of Θa after marginalizing Θb. Probabilistically: S is the precision matrix of the marginal distribution p(Θa).

Why marginalization causes fill-in

Suppose we marginalize pose Told which was connected to poses TA, TB, TC via factors. Before marginalization: TA and TB have no direct edge (no factor between them). The Schur complement formula: S contributes to HAA, HAB, HBB, HBC, HAC, etc. — all pairs of Told's neighbors become directly coupled. If Told had 4 neighbors, removing it creates C(4,2) = 6 new edges. This is fill-in: the marginal prior after removing Told is a dense Gaussian over its 4 former neighbors.

Marginalization is not free. Each removed variable adds edges between all its neighbors. If you aggressively marginalize to save computation (sliding-window SLAM), the prior on the oldest retained variables becomes increasingly dense. The "savings" from smaller state come at the cost of a denser prior factor that grows quadratically in the number of neighbors.
Marginalization: Watch Fill-in Appear

Click "Marginalize T2" to remove the middle pose and see its neighbors become directly connected.

T2 connects T1, T3, and landmark l1. Marginalizing adds edges between all three.

Hand numbers: Schur complement for the toy example

Remove T2 from the 3-pose example. Partitions: Θb = T2 (3 DOF), Θa = (T1, T3, l1) (8 DOF). For simplicity, set all W=I and use toy Jacobians. Hbb = J12[T2]TJ12[T2] + J23[T2]TJ23[T2] — a 3×3 matrix, cheap to invert. The Schur complement S = Haa − Hab Hbb−1 Hba is an 8×8 dense matrix. Originally (T1,T3) had a 3×3 block only from the loop closure. After marginalization, that block is also augmented by the Schur complement contribution −Hab[T1] Hbb−1 Hba[T3] — even without a loop closure, T1 and T3 would become coupled if T2 between them is marginalized.

A pose Told has been observed from 5 landmark measurements, creating edges to 5 landmark nodes in the factor graph. When Told is marginalized out, how many new edges are added among the 5 landmarks?

Chapter 8: Showcase — Loop Closure Snap

This is the full SLAM story in one simulation. A robot drives in a loop, accumulating drift pose-by-pose via noisy odometry. The estimated trajectory spirals outward — no global consistency. The map of landmarks (squares) is smeared by the pose errors. Then: one loop-closure factor fires. Watch the Gauss-Newton optimizer iterate, redistributing residuals across the entire trajectory until the loop closes and the map snaps into a consistent global structure.

Use the controls to: add noise, set the number of Gauss-Newton iterations, toggle the loop closure on/off, and play the optimization step-by-step. The information matrix sparsity pattern is shown live on the right.

Live SLAM Factor Graph: Loop Closure Snap

Left: trajectory + landmarks. Right: sparsity pattern. Red nodes = incorrect drift. Green = after loop closure optimization.

Drift noise 0.05
GN iters 0
Add loop closure and run optimization to see drift correction.
What to notice: (1) Without loop closure: the drifted trajectory never closes; landmarks appear duplicated near the loop endpoint. (2) With loop closure but 0 iterations: the residual from the loop-closure factor is huge — the current estimate violates it badly. (3) Each GN iteration: the trajectory rotates and translates all poses simultaneously, distributing the loop-closure residual evenly. By iteration 5–10 the loop snaps closed. (4) The sparsity pattern on the right: notice the extra off-diagonal block at (T1,Tn) that appears when the loop closure is activated — that is the only structural change, yet it reshapes the entire solution.

Chapter 1: MAP Estimation Becomes Nonlinear Least Squares

Every measurement in SLAM follows the same recipe: there is a measurement model hij(θ) that predicts what the sensor should observe given the current state θ, and there is an actual observation zij. The gap between prediction and observation — the residual — should be small if θ is correct. Under Gaussian noise with covariance Σij, maximizing the MAP posterior is equivalent to minimizing the sum of squared Mahalanobis-weighted residuals.

Measurement models in SLAM

Odometry factor: the relative pose between consecutive frames, measured by VO (L11) or wheel encoders. Residual: rt,t+1 = log([Tt−1 Tt+1]−1t,t+1) ∈ R6. This is the SE(3) log-map residual from L10: how much the current relative pose differs from the measured one, expressed in the Lie algebra.

Landmark observation factor: for a monocular camera, h(Tt, lk) = π(Tt−1 lk) is the perspective projection of landmark lk into camera frame at pose Tt. Residual: rt,k = z̄t,k − π(Tt−1 lk) ∈ R2 (pixel error).

Loop-closure factor: identical in form to the odometry factor, but connects two non-consecutive poses. This is exactly the measurement delivered by place recognition (L12) + geometric verification: T̄ij is the estimated relative pose between poses i and j. Residual: rij = log([Ti−1 Tj]−1ij) ∈ R6.

All three measurement types have the same residual form. The difference is only which variables appear: odometry touches Tt and Tt+1; a landmark factor touches Tt and lk; a loop-closure factor touches Ti and Tj (far apart in time). This uniformity is what makes the factor graph beautiful.

The SLAM back-end objective

Collect all residuals into a stacked vector r(θ) ∈ Rm. Stack the inverse covariances into block-diagonal W. The MAP estimator is:

θ* = argminθ r(θ)T W r(θ)

This is exactly weighted NLLS from L9. We linearize at θ̂ using Jacobians, solve the normal equations (Chapter 4), and retract the update onto SE(3) (L10). The only novelty in SLAM: the Jacobian J is enormous but sparse.

Hand numbers: 3-pose residual vector

Our toy example, 2D poses (3 DOF) + 2D landmark (2 DOF): θ = (T1, T2, T3, l1), dim = 3+3+3+2=11. Measurements: odo(1,2): r∈R3; odo(2,3): r∈R3; obs(1,l1): r∈R2; obs(3,l1): r∈R2; lc(1,3): r∈R3. Total residual dim = 3+3+2+2+3=13. J is 13×11, mostly zeros.

python  — Gauss-Newton step for tiny pose graph
import numpy as np

# Toy 2D pose graph: 3 poses (x,y,θ), 1 loop closure
# True poses: T1=(0,0,0), T2=(1,0,0), T3=(2,0,π) (returned to start)
# Drifted estimate of T3: (2.3, 0.4, π*0.85)
poses_gt  = np.array([[0,0,0],[1,0,0],[2,0,np.pi]])
poses_est = np.array([[0,0,0],[1,0,0],[2.3,0.4,np.pi*0.85]])

# Loop closure measurement: T3 should be at T1 rotated 180°
lc_meas = np.array([2.0, 0.0, np.pi])  # (dx,dy,dθ) from T1 to T3

# Residual of loop closure factor
r_lc = poses_est[2] - (poses_est[0] + lc_meas)
print(f"Loop closure residual: {r_lc}")
# [0.3, 0.4, -0.047] — 0.5m positional error, 2.7° heading error

# One GN step: distribute residual across poses (simplified linear case)
# Equal sharing: each of the 3 poses absorbs 1/3 of the correction
correction = r_lc / 3
poses_corrected = poses_est.copy()
poses_corrected[0] += correction     # T1 shifts a little
poses_corrected[1] += correction     # T2 shifts a little
poses_corrected[2] -= 2*correction  # T3 absorbs most correction
print("After 1 GN step, T3 residual:", poses_corrected[2] - (poses_corrected[0] + lc_meas))
# [0.0, 0.0, 0.0] — perfectly satisfied after correction redistribution
In the SLAM MAP objective, a loop-closure factor between pose T5 and pose T47 contributes a residual r5,47. Which variables appear as nonzeros in the Jacobian of this residual?

Chapter 9: Connections & Cheat Sheet

SLAM Back-End Cheat Sheet

ConceptOne-liner
SLAM stateθ = all poses T1:n + all landmarks l1:K
MAP → NLLSGaussian noise ⇒ maximize posterior = minimize weighted squared residuals (same as L9)
Factor graphBipartite graph: variable nodes (poses & landmarks) + factor nodes (measurements). Each factor = one term in the objective.
Odometry factorConnects consecutive poses Tt, Tt+1; residual = log-map of relative pose error on SE(3)
Loop-closure factorConnects non-consecutive Ti, Tj; same form as odometry; provided by place recognition (L12). Cancels accumulated drift globally.
SparsityEach factor touches 2 variables ⇒ Hessian H = JTWJ is >99% zeros for large SLAM. Block at (a,b) iff a and b share a factor.
Normal equationsH δ = b; H = JTWJ; b = JTWr0. Solve via sparse Cholesky, not dense inversion.
Variable orderingControls fill-in in L factor of H = LLT. Bad order ⇒ dense L. COLAMD/AMD find near-optimal orderings.
Bundle adjustmentSLAM with only projection factors (no odometry). Arrow-pattern Hessian ⇒ eliminate landmarks cheaply via Schur complement.
Full smoothingKeep all poses; sparse; supports loop closures anytime. iSAM2/GTSAM. Better than filtering for large SLAM.
MarginalizationRemove variables via Schur complement. Cost: adds edges among neighbors (fill-in). Densifies prior. Price of forgetting.
Wrong loop closureOne bad loop closure corrupts the global map catastrophically. Robust methods (L14) defend against this.

What degrades SLAM performance

Wrong loop closure (from L12 false positive): the optimizer tries to satisfy an impossible geometric constraint — poses get twisted into geometrically inconsistent configurations. The whole map degrades. This is the key motivation for Chapter L14 (robust SLAM / M-estimation).

Bad variable ordering: Cholesky fill-in grows from O(n) to O(n2) with pathological orderings. Real solvers use COLAMD (Column Approximate Minimum Degree) to find good orderings automatically.

Aggressive marginalization: marginalizing too many old poses too quickly densifies the prior on the active window. The sliding-window system slows down and loses the ability to detect long-range loop closures.

Where this connects in VNAV

The core optimization machinery: linearize, solve normal equations, iterate. SLAM back-end is NLLS at scale.
SE(3) retraction, log-map residuals, perturb-in-tangent-space. Every SLAM pose variable uses this.
The source of odometry factors: relative pose estimates between consecutive frames. Drift = why SLAM exists.
The source of loop-closure factors: matched frames → geometric verification → relative pose T̄ij added to the factor graph.
L13: SLAM Back-End (this lesson)
Factor graphs, sparsity, normal equations, marginalization. The global optimizer that fuses everything.
↓ next
L14: Robust SLAM
Outlier-robust cost functions (M-estimation, GNC) to survive wrong loop closures. The "what if L12 was wrong" story.
Closing thought (Luca Carlone): "The most surprising thing about SLAM is not that it works — it is that it stays fast. The SLAM problem grows without bound as the robot explores, yet each solver iteration touches only O(n) nonzero entries. Local measurements create local connections, and local connections stay sparse. The robot's memory of the world scales with the world — but the cost of updating that memory doesn't have to."
You are building a SLAM system for a drone. You must choose between: (A) EKF-SLAM with 200 landmarks, (B) full smoother (iSAM2) over 1000 poses + 200 landmarks. You need loop-closure support for a 10-minute flight. Which should you choose and why?

Chapter 2: Factor Graphs — A Language for SLAM Structure

A factor graph is a bipartite graph with two kinds of nodes. Variable nodes represent the unknowns we want to estimate: robot poses (circles) and landmark positions (squares). Factor nodes (small diamonds) represent measurements — each factor encodes a log-likelihood term that couples a small set of variables. An edge connects a factor to every variable it involves.

Factor graphs were introduced by Kschischang, Frey & Loeliger (2001) for belief propagation in coding theory, but Frank Dellaert's group at Georgia Tech showed in the early 2000s that the entire SLAM problem maps naturally onto them. The key papers: Dellaert & Kaess, "Square Root SAM" (IJRR 2006) and Kaess et al., "iSAM" (TRO 2008). Today's standard library is GTSAM (Georgia Tech Smoothing and Mapping).

The four types of SLAM factors

Prior factor: pins one pose to a known value (e.g., T1 = I). Essential to remove the gauge freedom of the problem — without it, rigidly transforming the whole trajectory is a zero-cost deformation, giving infinite solutions. In GTSAM: PriorFactor<Pose3>.

Odometry factor: between consecutive poses Tt and Tt+1. The "spring" connecting the robot's timeline. In GTSAM: BetweenFactor<Pose3>. These form the backbone — a chain of factors along the trajectory.

Measurement/projection factor: between a pose Tt and a landmark lk. One factor per observation. In a 30-frame window observing 100 landmarks each seen 3 times, this generates 300 such factors. In GTSAM: GenericProjectionFactor (monocular), SmartProjectionFactor (multi-view).

Loop-closure factor: between two non-consecutive poses Ti and Tj, provided by the place-recognition pipeline (L12). Mathematically identical to an odometry factor in form; what differs is the distance in the timeline and the fact that this factor closes the loop — it ties a current pose back to an old one, creating a cycle in the graph that the optimizer uses to cancel drift. In GTSAM: BetweenFactor<Pose3> with the loop-closure measurement.

Misconception: "Loop closure just fixes the loop-closure pose." Wrong. When a loop-closure factor arrives, the optimizer re-solves the entire graph. Every pose and landmark shifts to minimize the total cost. The correction propagates via shared factors: Tj is pulled toward Ti, but Tj−1 is coupled to Tj via odometry, which in turn is coupled to Tj−2, and so on — all the way around the loop. Loop closure redistributes drift globally, not locally.
Factor Graph Builder

Toggle factors to grow the graph. Circles = poses, squares = landmarks, diamonds = factors.

A SLAM problem has n=5 poses and K=3 landmarks. Each pose observes exactly 2 landmarks. There is 1 loop closure and 1 prior. How many factor nodes does the factor graph have?