Robust SLAM & Frontiers — MIT 16.485 VNAV

Chapter 0: The One Lie That Folds Your Map

The SLAM back-end from L13 is beautiful. It encodes every measurement as a factor, exploits the resulting sparsity, and solves in milliseconds via sparse Cholesky. The loop closure from L12 snaps the 2 km trajectory into a globally consistent map. This is the happy path.

Now imagine a single bad thing: L12's bag-of-words retrieval finds a visually similar but geometrically wrong match — a false positive. It says "pose T_k is back at pose T₁," but it isn't. A loop-closure factor with a wrong relative-pose measurement enters the factor graph.

What happens? The optimizer minimizes the sum of squared residuals. The true inlier measurements generate small residuals. The single wrong loop closure generates a huge residual — and because the cost is squared, that one outlier contributes a cost that swamps all the true measurements combined. The optimizer stretches the entire trajectory to satisfy the lie. Your perfectly calibrated map folds into something unusable.

The outlier problem at the back-end is fundamentally different from RANSAC (L8). RANSAC operates in the front-end on a single two-view model at a time — it finds one clean set of inliers for one pose estimate. Robust SLAM must reject outliers inside the joint global optimization over thousands of variables and all measurements simultaneously. RANSAC cannot help you here.

A concrete catastrophe — worked numbers

Three poses in 1D. True positions: x₁=0, x₂=1, x₃=2. Odometry measurements: z₁₂=1.0, z₂₃=1.0 (perfect). One inlier loop closure says x₃−x₁≈2.0 (correct). One outlier loop closure says x₃−x₁≈0.0 (the lie: poses are at the same spot).

Standard least-squares cost with σ=0.1 for odometry, σ=0.05 for loop closures:

f(θ) = ((x₂−x₁−1)²/0.01) + ((x₃−x₂−1)²/0.01) + ((x₃−x₁−2)²/0.0025) + ((x₃−x₁−0)²/0.0025)

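The last term has information weight 400 (=1/0.0025). At the true solution (x₁=0, x₃=2) that term alone contributes 400×4=1600 cost units. Anchor x₁=0 (otherwise the problem is only defined up to a translation); the optimizer then pulls x₃ down toward x₁ to shrink that term. Because the inlier and outlier loop closures carry equal weight, together they act like a single constraint x₃−x₁≈1 with weight 800, and the inlier odometry (weight 100 each) loses the tug-of-war. Solving the normal equations gives x₂=9/17≈0.53 and x₃=18/17≈1.06: the trajectory has compressed to about half its true extent. The outlier wins.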
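To check the arithmetic, here is a minimal NumPy sketch that solves the three-pose example as a weighted linear least-squares problem. It assumes x₁ is anchored at 0 (needed to remove the translation gauge freedom) and whitens each row by its σ, so `np.linalg.lstsq` minimizes exactly the cost f(θ) above.

```python
import numpy as np

# Unknowns are [x2, x3]; x1 is anchored at 0 to remove the gauge (translation) freedom.
# Each factor: (coefficients on [x2, x3], measurement z, standard deviation sigma).
rows = [
    ([1.0, 0.0], 1.0, 0.1),    # odometry:      x2 - x1 = 1
    ([-1.0, 1.0], 1.0, 0.1),   # odometry:      x3 - x2 = 1
    ([0.0, 1.0], 2.0, 0.05),   # inlier loop:   x3 - x1 = 2
    ([0.0, 1.0], 0.0, 0.05),   # outlier loop:  x3 - x1 = 0
]

def solve(factors):
    """Weighted LS: divide each row by its sigma, then solve the ordinary LS problem."""
    sig = np.array([f[2] for f in factors])
    A = np.array([f[0] for f in factors]) / sig[:, None]
    b = np.array([f[1] for f in factors]) / sig
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

x_clean = solve(rows[:3])   # without the outlier loop closure
x_bad = solve(rows)         # with it
print("without outlier:", x_clean)
print("with outlier:   ", x_bad)
```

Without the outlier row the solution is exactly x₂=1, x₃=2; with it, x₃ drops to 18/17 ≈ 1.06.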

Figure: One Wrong Loop Closure Folds the Map. A five-pose 2D trajectory optimized by least squares, with and without an outlier loop closure.

Why does a single outlier loop closure corrupt the ENTIRE map in standard least-squares SLAM, rather than affecting only the two poses it directly links?

(a) Because loop closures are given higher weight than odometry factors by default.
(b) The outlier only affects the two poses it links; SLAM is otherwise robust to bad measurements.
(c) The squared cost makes the outlier's contribution grow quadratically with its error, overwhelming all inlier costs and forcing ALL variables to move to satisfy the wrong constraint.
(d) Because the Hessian matrix is dense, any one factor influences every variable equally.

Chapter 1: M-Estimators: Replace the Square

The squared cost is pathological for outliers because its derivative, the "influence" a measurement exerts on the solution, grows without bound as the residual grows. A measurement 10σ away pulls 10 times harder than one 1σ away (and contributes 100 times the cost). The fix: replace the squared cost with a function whose growth slows down for large residuals. These are called M-estimators (maximum-likelihood-type estimators), introduced by Huber in 1964.

The general robust SLAM objective replaces each squared term with a robust cost ρ:

θ* = argmin_θ ∑_(i,j) ρ(‖r_ij(θ)‖_{Σ_ij⁻¹})

where ‖r‖_Ω = (r^TΩr)^1⁄2 is the Mahalanobis norm. The scalar u = ‖r‖_Σ⁻¹ is the whitened residual. We choose ρ(u) to grow slowly for large u.

The influence function: how hard can a measurement pull?

The influence function ψ(u) = ρ′(u) tells you how much a measurement with whitened residual u can perturb the solution. For squared cost, ρ(u)=u²⁄2, so ψ(u)=u — unbounded. For a well-designed robust cost, ψ(u) is bounded: no single measurement can pull the solution by more than a fixed amount, no matter how large its residual. This is the key property that rejects outliers.

Bounded influence = outlier rejection. If the maximum influence any measurement can exert is capped at some constant, then a single outlier with a huge residual cannot dominate the solution. The inlier consensus — many measurements each pulling with moderate force — wins. This is the fundamental principle behind all robust estimators.

Weight function

The weight function w(u) = ψ(u)⁄u = ρ′(u)⁄u is the effective weight that multiplies the squared residual in an equivalent weighted least-squares problem. For squared cost, w(u)=1 (all measurements weighted equally). For robust costs, w(u) decreases as u grows, so high-residual measurements get down-weighted. This connects M-estimators to the IRLS algorithm in Chapter 3.
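A small sketch of the influence ψ(u) and weight w(u)=ψ(u)/u for three of the costs, using the closed forms given in Chapter 2. It only evaluates formulas; the function names are illustrative, not from any library.

```python
import numpy as np

def psi_sq(u):
    return u                                   # squared cost: unbounded influence

def psi_huber(u, k=1.0):
    return np.clip(u, -k, k)                   # Huber: influence capped at k

def psi_cauchy(u, c=1.0):
    return u / (1.0 + (u / c) ** 2)            # Cauchy: influence rises, then decays

for name, psi in [("squared", psi_sq), ("huber", psi_huber), ("cauchy", psi_cauchy)]:
    for u in [1.0, 10.0, 100.0]:
        print(f"{name:8s} u={u:6.1f}  psi={psi(u):9.4f}  w=psi/u={psi(u) / u:.6f}")

# The Cauchy influence peaks at u = c with value c/2
grid = np.linspace(0.0, 50.0, 50001)
print("max Cauchy influence:", psi_cauchy(grid).max())
```

The squared cost's influence at u=100 is 100, Huber's stays at k=1, and Cauchy's has already fallen to about 0.01.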

What property of a robust cost function ρ(u) guarantees that an outlier measurement with residual u=100 cannot dominate the SLAM solution over 1000 inliers each with residual u≈1?

(a) ρ(u) must grow faster than u² for large u to penalize outliers more heavily.
(b) ρ(u) must be strictly convex so the optimizer converges.
(c) The influence function ψ(u)=ρ′(u) must be bounded, capping how much any single measurement can pull the solution regardless of residual size.
(d) The weight function w(u) must equal zero for all u>1 so outliers are exactly zeroed out.

Chapter 2: The Four Cost Functions

Four robust costs appear throughout SLAM literature. Each represents a different tradeoff between how aggressively it down-weights outliers and how convex (and thus solver-friendly) it remains.

Squared (standard LS) — baseline

ρ(u) = u²⁄2,   ψ(u) = u,   w(u) = 1

No down-weighting. Unbounded influence. Optimal when ALL measurements are inliers from a Gaussian distribution.

Huber loss — piecewise quadratic/linear

ρ_H(u) = u²⁄2 if |u|≤k, k(|u|−k⁄2) if |u|>k

Below threshold k: squared (full weight). Above k: linear growth, so the influence is capped at |ψ(u)|=k. The weight function w(u)=k/|u| decays as 1/u for large residuals. Convex everywhere, so the optimizer is well-behaved; but the influence only saturates and never decays, so every gross outlier still pulls with the full force k.

Cauchy (Lorentzian) loss — truly bounded influence

ρ_C(u) = (c²⁄2) ln(1 + u²⁄c²),   ψ(u) = u ⁄ (1 + u²⁄c²),   w(u) = 1 ⁄ (1 + u²⁄c²)

Influence is bounded (it peaks at ψ(c)=c⁄2) and actually decreases for larger residuals. A measurement at u=100c has near-zero weight (w≈10⁻⁴). Non-convex for |u|>c, introducing the local-minima problem (Chapter 4).

Truncated Least Squares (TLS) — hard rejection

ρ_TLS(u) = u² if u≤c, c² if u>c

Cost is literally constant beyond threshold c: w(u)=0 for |u|>c. The measurement is completely ignored — hard zeroing of the outlier. Highly non-convex. Most aggressive rejection, most severe local-minima problem. But when GNC (Chapter 5) solves the non-convexity, TLS gives the cleanest separation of inliers from outliers.

Hand-computed comparison at u=1 and u=10 (k=c=1)

Cost	ρ(u=1)	ρ(u=10)	Ratio ρ(10)/ρ(1)	w(u=10)
Squared	0.5	50	100×	1.0
Huber (k=1)	0.5	9.5	19×	0.10
Cauchy (c=1)	0.347	2.31	6.7×	0.0099
TLS (c=1)	1.0	1.0 (capped)	1×	0

The squared cost at u=10 is 100 times worse than at u=1; TLS doesn't even notice. This is the quantitative version of "outliers dominate".
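The comparison table can be recomputed in a few lines. This sketch uses the cost definitions from this chapter (TLS is u² capped at c², with no ½ factor, exactly as defined above) and, as an assumption, treats the TLS weight as 1 inside the threshold.

```python
import numpy as np

def rho_sq(u): return 0.5 * u**2
def rho_huber(u, k=1.0): return 0.5 * u**2 if abs(u) <= k else k * (abs(u) - k / 2)
def rho_cauchy(u, c=1.0): return 0.5 * c**2 * np.log(1 + (u / c) ** 2)
def rho_tls(u, c=1.0): return min(u**2, c**2)

def w_sq(u): return 1.0
def w_huber(u, k=1.0): return 1.0 if abs(u) <= k else k / abs(u)
def w_cauchy(u, c=1.0): return 1.0 / (1 + (u / c) ** 2)
def w_tls(u, c=1.0): return 1.0 if abs(u) <= c else 0.0

table = {}
for name, rho, w in [("Squared", rho_sq, w_sq), ("Huber", rho_huber, w_huber),
                     ("Cauchy", rho_cauchy, w_cauchy), ("TLS", rho_tls, w_tls)]:
    r1, r10 = rho(1.0), rho(10.0)
    table[name] = (r1, r10, r10 / r1, w(10.0))
    print(f"{name:8s} rho(1)={r1:6.3f} rho(10)={r10:7.2f} ratio={r10 / r1:6.1f}x w(10)={w(10.0):.4f}")
```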

Figure: Robust Cost Shapes & Influence Functions. Left: cost ρ(u). Right: influence ψ(u)=ρ′(u), for a given kernel parameter; the threshold separates inliers from outliers.

At whitened residual u=5 with kernel parameter c=1, rank these costs from LARGEST to SMALLEST: Squared, Huber, Cauchy, TLS.

(a) Squared > Huber > Cauchy > TLS
(b) TLS > Cauchy > Huber > Squared
(c) Squared > Cauchy > TLS > Huber
(d) Huber > Squared > Cauchy > TLS

Chapter 3: IRLS: Turning Robust Costs into Weighted LS

Minimizing a robust cost function ∑ρ(r_i) looks like a completely different beast from the NLLS machinery of L9. But there is an elegant bridge: Iteratively Reweighted Least Squares (IRLS).

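The key observation: the M-estimator gradient condition ∂f⁄∂θ = 0 can be rewritten as a weighted least-squares normal equation. By the chain rule, ∂ρ(u_i)⁄∂θ = ρ′(u_i)·∂u_i⁄∂θ = w_i·u_i·∂u_i⁄∂θ = ½w_i·∂(u_i²)⁄∂θ with w_i = ρ′(u_i)⁄u_i, so with the weights held fixed, the stationarity condition of ∑ρ(u_i) is exactly that of ½∑w_iu_i². This gives us an iterative algorithm: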

Initialize

Start from some θ₀ (e.g., odometry chain without loop closures)

↓

Compute residuals

r_i = r_i(θ_t) for all measurements i

↓

Compute weights

w_i ← ρ′(‖r_i‖)⁄‖r_i‖ using current residuals

↓

Solve weighted LS

θ_{t+1} = argmin_θ ∑ w_i ‖r_i(θ)‖² (sparse Cholesky from L13!)

↻ repeat until convergence

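The inner WLS step uses exactly the sparse Cholesky machinery from L13; the only change is that each factor's information matrix Ω_i is scaled by its scalar weight w_i, giving H = ∑ w_i J_i^TΩ_iJ_i. The sparsity pattern, and therefore the symbolic factorization, does not change. Each outer iteration recomputes the weights from the current residuals, gradually down-weighting high-residual (outlier) measurements.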
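Here is a minimal sketch of that loop on a hypothetical 1D pose chain (five poses, x₀ anchored at 0, four odometry factors, one correct loop closure and one false one), assuming σ=0.1 for every factor and a Cauchy kernel with c=1. It forms the weighted normal equations H = AᵀWA explicitly with dense NumPy; a real back-end would use a sparse Cholesky factorization, but the reweighting logic is the same.

```python
import numpy as np

def cauchy_weight(u, c=1.0):
    return 1.0 / (1.0 + (u / c) ** 2)

# Hypothetical 1D pose chain. Factor (i, j, z, sigma) means x_j - x_i ≈ z.
# Pose 0 is anchored at 0, so the unknowns are x1..x4.
factors = [
    (0, 1, 1.0, 0.1), (1, 2, 1.0, 0.1), (2, 3, 1.0, 0.1), (3, 4, 1.0, 0.1),  # odometry
    (0, 4, 4.0, 0.1),   # correct loop closure
    (1, 4, 0.0, 0.1),   # false loop closure: claims x4 is back at x1
]
n_unknowns = 4

A = np.zeros((len(factors), n_unknowns))
b = np.zeros(len(factors))
for k, (i, j, z, sigma) in enumerate(factors):
    if i > 0:
        A[k, i - 1] -= 1.0 / sigma    # whitened Jacobian entry for x_i
    if j > 0:
        A[k, j - 1] += 1.0 / sigma    # whitened Jacobian entry for x_j
    b[k] = z / sigma

def solve_weighted(w):
    """Solve the weighted normal equations (A^T W A) x = A^T W b."""
    H = A.T @ (w[:, None] * A)        # same sparsity pattern for any weights
    g = A.T @ (w * b)
    return np.linalg.solve(H, g)

x_ls = solve_weighted(np.ones(len(factors)))   # plain least squares

x = np.array([1.0, 2.0, 3.0, 4.0])             # odometry-chain initialization
for it in range(10):
    u = A @ x - b                               # whitened residuals at the current estimate
    w = cauchy_weight(u)                        # IRLS weights
    x = solve_weighted(w)                       # weighted LS step

print("plain LS:", np.round(x_ls, 3))
print("IRLS    :", np.round(x, 3), "weight of false loop:", w[-1])
```

Plain least squares spreads the error over the whole chain (x₁ ≈ 1.82 instead of 1), while IRLS started from the odometry chain ends within about 0.002 of [1, 2, 3, 4], with the false loop closure's weight near 10⁻³.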

Worked 1D IRLS example

Five measurements of a single scalar x. True value x=0. Four inliers: z_1..4 = {0.1, -0.2, 0.15, -0.1} with σ=0.3. One outlier: z₅=5.0 with the same σ=0.3 (whitened residual |z₅-x|/0.3≈16 — enormous).

Iteration 0 (plain LS start): Weights all w_i=1. Solution x=(0.1-0.2+0.15-0.1+5.0)/5=0.99. Clearly pulled by the outlier.

Iteration 1 (Cauchy, c=1): Compute residuals at x=0.99: |r_1..4| ≈ 0.84..1.19 (whitened: about 2.8 to 4.0), r₅=4.01 (whitened: about 13.4). Weights w_i=1/(1+u_i²/c²): w_1..4 ≈ 0.06..0.11 (sum ≈ 0.345), w₅=1/(1+179)≈0.0056. New WLS: x = (∑ w_iz_i)⁄(∑ w_i). Sum of w_iz_i for inliers: 0.0102+(−0.0120)+0.0170+(−0.0070)≈0.0082. For the outlier: 0.0056×5=0.028. Denominator: 0.345+0.0056≈0.351. x≈(0.0082+0.028)⁄0.351≈0.10. Much better: the outlier's weight is more than 10× smaller than each inlier's.

After 4 iterations (x ≈ 0.10, 0.031, 0.010, 0.003), x is within 0.01 of the truth, and it settles near 0.00. The outlier's weight levels off around 0.004, roughly 200× smaller than a typical inlier weight (≈0.7 to 0.9 near convergence). The inlier consensus wins.
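The worked example can be reproduced with a short NumPy loop. This sketch assumes exactly the setup described here: σ=0.3, Cauchy kernel c=1, weights computed on whitened residuals, and iteration 0 taken as the plain least-squares mean.

```python
import numpy as np

z = np.array([0.1, -0.2, 0.15, -0.1, 5.0])   # last entry is the outlier
sigma, c = 0.3, 1.0

x = z.mean()            # iteration 0: plain least squares (all weights 1), x = 0.99
history = [x]
for it in range(10):
    u = (z - x) / sigma                  # whitened residuals at the current estimate
    w = 1.0 / (1.0 + (u / c) ** 2)       # Cauchy IRLS weights
    x = np.sum(w * z) / np.sum(w)        # weighted least-squares estimate
    history.append(x)
    print(f"iter {it + 1}: x = {x:.4f}, w_outlier = {w[-1]:.4f}")
```

The printed estimates run roughly 0.10, 0.031, 0.010, 0.003, and so on, while the outlier's weight levels off around 0.004.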

RANSAC (L8) cleans the front-end one model at a time; IRLS rejects outliers inside the joint global optimization. IRLS never explicitly labels any measurement as an outlier — it just gives it a tiny weight. The information from inliers dominates automatically. This is the key distinction from the front-end.

Figure: IRLS Reweighting Steps on a 1D Fit. Over successive IRLS iterations the outlier's weight shrinks and the estimated mean converges to the inlier consensus.

In the worked example, at x=0.99 the outlier z₅=5.0 has whitened residual u≈13.4 and Cauchy (c=1) weight w=1/(1+179)≈0.006, while the four inliers have w≈0.06 to 0.11 each. How will the next IRLS estimate compare to x=0.99?

(a) It will jump back toward z₅=5.0, because each measurement gets one vote as the solution converges.
(b) It will move toward the inlier consensus near 0, since the four inliers together have total weight ≈0.35 versus the outlier's ≈0.006.
(c) It stays exactly the same, because IRLS has already converged.
(d) It diverges, because Cauchy is non-convex for large residuals.

Chapter 4: The Non-Convexity Trap

There is a catch. Robust cost functions like Cauchy and TLS are non-convex: the cost landscape has multiple local minima. Standard gradient-based optimizers (Gauss-Newton, IRLS) converge to whichever local minimum they start near. This is a problem because:

If initialized near a local minimum dominated by the outlier, the optimizer stays there.
The "correct" minimum (where inliers agree) may be far in configuration space from the outlier-dominated start.
Unlike the convex squared-cost case, there is no guarantee that gradient descent finds the global optimum.

Ironically, a robust cost can sometimes give a worse solution than plain least-squares if the initial estimate is poor. IRLS is not magic — it's a local optimizer.

Robust costs are non-convex — naively optimizing one can land in a worse local minimum than plain least-squares. If your initialization (typically: the VO chain without loop closures) happens to be near the outlier's local minimum, IRLS will happily converge there and hand you a corrupted map. This is why gradient methods alone are insufficient for robust SLAM. You need either a good initialization OR an algorithm that escapes local minima.

The 1D intuition: two bowls

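Consider a 1D SLAM problem with one strong inlier constraint pulling x toward 0 and one strong outlier constraint pulling x toward 10. With squared cost, the two parabolas add up to a single parabola: one bowl, whose minimum is the weighted average. With TLS, each term is a narrow quadratic dip (half-width c) on a flat plateau at c², so for c<5 the total cost has two separate dips, at x=0 and x=10, joined by a flat region. If you start near x=10, you are in the outlier's dip and IRLS converges to x=10; if you start on the plateau, the gradient is zero and the optimizer does not move at all. The inlier dip at x=0 exists but you never reach it.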
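A minimal sketch of the two-dip picture: IRLS with hard TLS weights (1 inside the threshold, 0 outside) on one inlier at 0 and one outlier at 10, assuming c=3. The result depends entirely on where you start.

```python
import numpy as np

def irls_tls(z, x0, c, iters=50):
    """IRLS with TLS weights; returns the point where the iteration stops."""
    x = x0
    for _ in range(iters):
        w = (np.abs(z - x) <= c).astype(float)   # 1 inside a dip, 0 on the plateau
        if w.sum() == 0.0:
            return x            # every term is on its plateau: zero gradient, no move
        x_new = float(np.sum(w * z) / np.sum(w))
        if x_new == x:
            return x            # fixed point reached
        x = x_new
    return x

z = np.array([0.0, 10.0])       # inlier at 0, outlier at 10
for x0 in [1.0, 5.0, 8.0]:
    print(f"start {x0:4.1f} -> stops at {irls_tls(z, x0, c=3.0)}")
print("squared-cost minimum (average):", z.mean())
```

Starting at 1 ends at the inlier (0), starting at 8 ends at the outlier (10), and starting at 5 never moves because both terms sit on their plateaus.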

Figure: Non-Convexity and Local Minima in Robust Costs. Total cost landscape (inlier at 0 plus outlier at 10) under different cost functions, showing where gradient descent converges from a given initialization x₀.

A SLAM problem has 20 inlier loop closures centered at x=0 and 1 outlier loop closure at x=10. With TLS cost (threshold c=3) and initialization at x=8, where does gradient descent converge?

(a) To x=0, because 20 inliers always beat 1 outlier regardless of initialization.
(b) To x≈10, because starting near the outlier's local minimum means the optimizer stays there; the 20 inliers are in a different basin of attraction.
(c) To the average x=(0×20+10×1)/21≈0.48.
(d) TLS always finds the global optimum regardless of initialization because it exactly zeros outliers.

Chapter 5: Graduated Non-Convexity (GNC)

Graduated Non-Convexity (GNC), introduced by Blake and Zisserman (1987) and adapted to robust spatial perception by Carlone's group (Yang et al. 2020), is a widely used answer to the local-minima problem. The core idea: instead of optimizing the non-convex robust cost directly, start with a convex surrogate and gradually "turn up" the non-convexity, annealing toward the true cost. Each step of this schedule is a small perturbation of a problem you just solved, so the solution tends to track the global optimum (a heuristic, not a guarantee).

The GNC schedule

Define a family of costs ρ_μ(u) parameterized by a control parameter μ such that:

At μ = μ₀: ρ_μ₀ is convex (e.g., the squared cost). Easy to optimize globally.
At μ = μ_∞: ρ_{μ_∞} is the target non-convex cost (e.g., TLS).
The family smoothly interpolates between them.

Algorithm: solve at μ₀, warm-start at μ₁>μ₀ using the previous solution, repeat until μ=μ_∞.

GNC-TLS: the Carlone construction

For truncated least squares with threshold c², define the GNC surrogate (Yang et al. 2020, Carlone group):

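ρ_μ(u) = u² if u² ≤ (μ⁄(μ+1))c²;   2c|u|√(μ(μ+1)) − μ(c² + u²) if (μ⁄(μ+1))c² < u² < ((μ+1)⁄μ)c²;   c² if u² ≥ ((μ+1)⁄μ)c²,   with μ ∈ (0, ∞)

The three pieces join continuously. As μ→∞ both thresholds collapse onto c², so ρ_μ becomes TLS. As μ→0 the quadratic and flat pieces shrink away and the middle piece approaches a scaled |u|, which is convex. The matching IRLS weight w = ρ_μ′(u)⁄(2u) (the 2 appears because this cost uses u², not u²⁄2) is 1 below the lower threshold, 0 above the upper one, and c√(μ(μ+1))⁄|u| − μ in between. Schedule: start from a small μ₀ and multiply μ by 1.4 each step (μ ← 1.4μ). With ~20 steps, we go from nearly convex to effectively TLS.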
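A sketch, assuming the piecewise GNC-TLS surrogate of Yang et al. 2020 as we understand it: ρ_μ(u) = u² for u² ≤ μc²/(μ+1), c² for u² ≥ (μ+1)c²/μ, and 2c|u|√(μ(μ+1)) − μ(c²+u²) in between. It measures how far ρ_μ is from TLS on a grid for several μ and checks that the pieces join continuously.

```python
import numpy as np

def rho_gnc_tls(u, c=1.0, mu=1.0):
    """GNC-TLS surrogate: quadratic, then a transition piece, then the TLS plateau c^2."""
    u = np.abs(np.asarray(u, dtype=float))
    lower = mu / (mu + 1) * c**2          # end of the quadratic piece (in u^2)
    upper = (mu + 1) / mu * c**2          # start of the flat piece (in u^2)
    middle = 2 * c * u * np.sqrt(mu * (mu + 1)) - mu * (c**2 + u**2)
    return np.where(u**2 <= lower, u**2, np.where(u**2 >= upper, c**2, middle))

def rho_tls(u, c=1.0):
    return np.minimum(np.asarray(u, dtype=float) ** 2, c**2)

u = np.linspace(0.0, 3.0, 3001)
gaps = {}
for mu in [0.01, 0.1, 1.0, 10.0, 1000.0]:
    gaps[mu] = float(np.max(np.abs(rho_gnc_tls(u, mu=mu) - rho_tls(u))))
    print(f"mu = {mu:7.2f}: max |rho_mu - rho_TLS| on [0, 3] = {gaps[mu]:.4f}")

# The pieces join continuously at both thresholds (checked here for mu = 1)
mu = 1.0
for t in [np.sqrt(mu / (mu + 1)), np.sqrt((mu + 1) / mu)]:
    left, right = rho_gnc_tls(t - 1e-9, mu=mu), rho_gnc_tls(t + 1e-9, mu=mu)
    print(f"at |u| = {t:.4f}: {float(left):.6f} vs {float(right):.6f}")
```

Small μ gives a surrogate far from TLS (and close to convex); by μ=1000 the gap is below 0.001.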

Why GNC rejects outliers without a good initial guess

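At μ=μ₀ the surrogate is (nearly) convex, so there is effectively a single basin. Its minimum is a compromise that, when inliers form the majority, already sits much closer to the inlier consensus than to any individual outlier. As μ increases, the outlier basin gradually forms, but the solution is already in the inlier basin and tracks it through each small step. The outlier basin does not capture the solution as long as each step's perturbation is small relative to the basin width. This is a heuristic argument, not a proof: GNC can still fail, for example when outliers are numerous and mutually consistent.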

GNC gives outlier-robust SLAM without an initial guess of the poses: its first step solves the non-robust problem with all weights equal to 1, and Yang et al. pair GNC with global ("non-minimal") solvers for these weighted subproblems. This is the key advance over IRLS alone: you do not need VO-initialized poses. A UAV dropped into an unknown environment can run GNC and get a correct map, even with 50%+ outlier loop closures, as demonstrated in Carlone's experiments.

Worked μ schedule example

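Three inliers at z=0 and one outlier at z=10 (all σ=1), TLS threshold c=1, using the GNC-TLS weights of Yang et al. 2020. (With a single inlier and a single outlier of equal σ, the problem is symmetric and no method can tell which is which.) Step 0 is plain least squares with all weights 1, giving x=2.5. A common choice initializes μ from the largest squared residual, r²_max=56.25, as μ₀ = c²⁄(2r²_max − c²) = 1⁄111.5 ≈ 0.009; μ is then multiplied by 1.4 each step. At each step the weights are computed at the current estimate, then the weighted least-squares estimate is recomputed.

Step	μ	w_inlier (each)	w_outlier	Estimate x*
0 (LS)	n/a	1	1	2.50
1	0.0090	0.029	0.0037	0.41
2	0.0126	0.26	0	0.00
3	0.0176	1.0	0	0.00
μ→∞ (TLS)	∞	1.0	0	0.00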

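Already at step 1 the outlier, which has the largest residual, gets the smallest weight (about 8× smaller than each inlier's), so the weighted solution jumps into the inlier basin; at step 2 its residual exceeds the upper threshold and it is rejected outright. No initial guess was needed beyond plain least squares.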
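A sketch of a GNC-TLS schedule on three inliers at 0 and one outlier at 10 (σ=1, c=1). It assumes the GNC-TLS weight update from Yang et al. 2020 (w=1 below μc²/(μ+1), w=0 above (μ+1)c²/μ, and w=c√(μ(μ+1))/|r| − μ in between) and the initialization μ₀ = c²/(2r²_max − c²). Each step updates the weights at the current estimate, re-solves the weighted least-squares problem (here just a weighted mean), then multiplies μ by 1.4.

```python
import numpy as np

def gnc_tls_weights(r, c, mu):
    """GNC-TLS weights for residuals r at control parameter mu."""
    r2 = np.asarray(r, dtype=float) ** 2
    lower = mu / (mu + 1) * c**2
    upper = (mu + 1) / mu * c**2
    w = np.empty_like(r2)
    for i, e in enumerate(r2):
        if e <= lower:
            w[i] = 1.0                                            # treated as inlier
        elif e >= upper:
            w[i] = 0.0                                            # rejected
        else:
            w[i] = c * np.sqrt(mu * (mu + 1)) / np.sqrt(e) - mu   # transition region
    return w

z = np.array([0.0, 0.0, 0.0, 10.0])   # three inliers at 0, one outlier at 10 (sigma = 1)
c, factor = 1.0, 1.4

x = z.mean()                          # step 0: plain least squares, all weights 1
r2_max = np.max((z - x) ** 2)
mu = c**2 / (2 * r2_max - c**2)       # assumed initialization (needs 2 * r2_max > c**2)
rows = [(0, None, 1.0, 1.0, x)]
for step in range(1, 16):
    w = gnc_tls_weights(z - x, c, mu)       # weight update at the current estimate
    x = np.sum(w * z) / np.sum(w)           # variable update: weighted least squares
    rows.append((step, mu, w[0], w[-1], x))
    mu *= factor                            # anneal toward TLS

for step, m, w_in, w_out, xs in rows[:5]:
    m_str = "n/a" if m is None else f"{m:.4f}"
    print(f"step {step}: mu={m_str:>7s} w_in={w_in:.3f} w_out={w_out:.4f} x={xs:.3f}")
```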

Figure: GNC: Annealing from Convex to Non-Convex. The cost landscape at increasing μ: the outlier basin (right) forms while the solution tracks the inlier minimum (left).

Why does GNC succeed at finding the inlier-consensus solution even when started from a random initial guess near the outlier, while naive IRLS with TLS cost fails?

(a) GNC runs many random restarts and picks the best solution, so it eventually finds the inlier basin by chance.
(b) GNC uses a better line search than IRLS, allowing it to jump out of local minima.
(c) GNC's initial convex surrogate has only one basin (the inlier consensus), and the solution tracks this basin through each small annealing step before the outlier basin forms.
(d) GNC explicitly labels outliers before optimizing, avoiding the need to handle them inside the cost function.

Chapter 6: Switchable Constraints & Consistency Checks

GNC is not the only approach. Two complementary families attack the same problem from different angles.

Switchable Constraints (SC)

Introduced by Sünderhauf & Protzel (2012). Each loop-closure factor gets a continuous "switch variable" s_i ∈ [0,1] that interpolates between "fully active" (s_i=1) and "fully off" (s_i=0). The augmented objective:

f(θ, s) = ∑_odo ‖r_ij‖² + ∑_lc s_k²‖r_k‖² + ∑_lc λ(1−s_k)²

The last term is a regularizer penalizing s_k away from 1 (switching off has a cost). The optimizer simultaneously solves for all poses AND all switch variables. A genuine inlier loop closure is easy to satisfy — its switch variable stays near 1. An outlier generates a large residual at s_k=1 — the optimizer prefers to reduce s_k toward 0, paying only the small regularizer cost. Outliers are "switched off" automatically.

The augmented state is larger (one extra scalar per loop closure), but the problem structure is still sparse. Standard sparse solvers handle it. The switch variables provide explicit "which loop closures did we trust?" output.
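A minimal sketch of switchable constraints on a hypothetical scalar problem: three odometry-like measurements of x, one good loop closure and one bad one, with λ=1 assumed. Instead of a joint solve, it minimizes the augmented objective above by alternating two exact steps (block coordinate descent, our choice for brevity): with s fixed, x is a weighted mean with weights s_k²; with x fixed, setting the derivative of s_k²e_k + λ(1−s_k)² to zero gives s_k = λ/(λ+e_k), where e_k=‖r_k‖².

```python
import numpy as np

z_odo = np.array([0.0, 0.1, -0.1])   # hypothetical odometry-like measurements of x
z_lc = np.array([0.05, 6.0])         # loop closures: the second one is an outlier
lam = 1.0                             # switch prior weight (assumed)

s = np.ones(len(z_lc))               # start with every loop closure switched on
for it in range(20):
    # x-step: minimize sum (x - z_odo)^2 + sum s^2 (x - z_lc)^2 with s fixed
    x = (z_odo.sum() + np.sum(s**2 * z_lc)) / (len(z_odo) + np.sum(s**2))
    # s-step: closed-form minimizer of s^2 e + lam (1 - s)^2 for each loop closure
    e = (x - z_lc) ** 2
    s = lam / (lam + e)

print(f"x = {x:.4f}, switches = {np.round(s, 3)}")
```

Because s_k = λ/(λ+e_k), the effective weight s_k² = (λ/(λ+e_k))² falls off smoothly with the residual, so switchable constraints behave much like a robust kernel.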

Max-Mixtures

Olson & Agarwal (2012) model each loop-closure measurement as a mixture of two Gaussians: one tight (inlier model) and one very broad (outlier model). The cost becomes −log max(π_in·N(r;0,Σ_in), π_out·N(r;0,Σ_out)). The max approximation keeps the problem sparse; effectively the inlier Gaussian dominates for small residuals, the outlier Gaussian for large ones.
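The max-mixture cost is easy to evaluate for a scalar residual, because −log max(a, b) = min(−log a, −log b). The component parameters in this sketch (σ_in=0.1, σ_out=10, π_in=0.9, π_out=0.1) are illustrative assumptions, not values from Olson & Agarwal.

```python
import numpy as np

def neg_log_weighted_gaussian(r, sigma, pi):
    """-log(pi * N(r; 0, sigma^2)) for a scalar residual r."""
    return -np.log(pi) + 0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * (r / sigma) ** 2

def max_mixture_cost(r, sigma_in=0.1, sigma_out=10.0, pi_in=0.9, pi_out=0.1):
    """Return the max-mixture cost and whether the inlier component was selected."""
    c_in = neg_log_weighted_gaussian(r, sigma_in, pi_in)
    c_out = neg_log_weighted_gaussian(r, sigma_out, pi_out)
    return min(c_in, c_out), bool(c_in <= c_out)

for r in [0.05, 0.3, 0.5, 5.0]:
    cost, is_inlier = max_mixture_cost(r)
    print(f"r = {r:4.2f}: cost = {cost:8.3f}, component = {'inlier' if is_inlier else 'outlier'}")
```

Once the residual passes the crossover (about 0.37 with these parameters), the broad component takes over and the cost grows only as r²/(2σ_out²), so the gradient an outlier contributes becomes tiny.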

Pairwise Consistency Maximization (PCM)

A pre-optimization front-end filter developed by Mangelson et al. (2018). Before adding any loop closures to the optimizer, PCM builds a "consistency graph": nodes are candidate loop closures, edges connect pairs that are geometrically consistent with each other (their combined relative-pose forms a plausible loop). Outlier loop closures tend to be inconsistent with the true inliers. PCM finds the maximum clique in the consistency graph — the largest set of mutually consistent loop closures — and only passes those to the back-end optimizer.
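A toy version of PCM in 1D, under simplifying assumptions: each loop closure (i, j, z) claims x_j − x_i ≈ z, odometry gives positions o_i, and two loop closures are called consistent when the cycle they form with the odometry closes to within a fixed threshold γ. In 1D that cycle error reduces to |e_a − e_b| with e_k = z_k − (o_j − o_i). Real PCM uses a Mahalanobis-distance test with covariances and a dedicated maximum-clique solver; the brute-force search here only works for a handful of candidates.

```python
import itertools
import numpy as np

odom = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])      # hypothetical odometry positions o_i
loops = [(0, 4, 4.02), (1, 5, 3.97), (0, 5, 5.01),   # (i, j, z): claims x_j - x_i ≈ z
         (2, 5, 0.5)]                                 # the last one is a false positive
gamma = 0.1                                           # assumed consistency threshold

def loop_error(loop):
    i, j, z = loop
    return z - (odom[j] - odom[i])     # disagreement between the loop and odometry

n = len(loops)
consistent = np.zeros((n, n), dtype=bool)
for a, b in itertools.combinations(range(n), 2):
    # 1D cycle error of: loop a, odometry, loop b reversed, odometry back
    ok = abs(loop_error(loops[a]) - loop_error(loops[b])) <= gamma
    consistent[a, b] = consistent[b, a] = ok

def max_clique(n, consistent):
    """Brute force: largest subset in which every pair is consistent."""
    for size in range(n, 0, -1):
        for subset in itertools.combinations(range(n), size):
            if all(consistent[a, b] for a, b in itertools.combinations(subset, 2)):
                return subset
    return ()

best = max_clique(n, consistent)
print("accepted loop closures:", best)
```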

PCM removes outliers BEFORE the optimizer sees them, while GNC removes them INSIDE the optimizer. They are complementary: PCM reduces the problem size and provides a clean initialization; GNC handles any remaining false positives that slip through geometric consistency. Modern systems (Kimera, DOOR-SLAM) combine both.

A switchable-constraints SLAM system processes a loop closure with a large residual (the loop is almost certainly wrong). After optimization, what will the switch variable s_k for that loop closure be, and why?

(a) s_k≈1, because the optimizer always tries to use every available measurement.
(b) s_k=0 exactly, because the cost function hard-zeroes measurements above a threshold.
(c) s_k≈0, because the large residual makes s_k²‖r_k‖² very costly at s_k=1, so the optimizer reduces s_k and pays only the small regularizer λ(1−s_k)² instead.
(d) The switch variable cannot help here because it shares the same local-minimum issue as all M-estimators.

Chapter 7: Certifiable & Global Optimality

GNC gives empirically excellent results, but how do you know the solution you got is actually the global optimum? Can you prove it? This is the question of certifiable perception — one of Carlone's central research contributions.

The SE-Sync idea

Rosen et al. (2019) showed that pose graph optimization (SLAM with only relative-pose measurements, no landmarks) admits a semidefinite programming (SDP) relaxation. SDPs are convex, so they can be solved globally. The catch: the original problem has non-convex constraints (rotation matrices must lie in SO(3)). SE-Sync relaxes these constraints to get an SDP, solves it globally, then checks whether the relaxation is tight (i.e., whether the SDP solution is also feasible for the original problem).

Certificate of global optimality

The SDP also provides a certificate: a proof that no better solution exists. If the relaxation is tight, the SDP's optimal solution Z* has rank d (d=3 for 3D rotations, d=2 in 2D), so it factors into a valid set of rotations, and the dual certificate matrix S (built from the data matrix and the Lagrange multipliers at the estimate) is positive semidefinite. Checking that the smallest eigenvalue of S is nonnegative is therefore a proof of global optimality. When rank(Z*) > d, the relaxation is not tight and you know you need to look further.
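SE-Sync's verification step is hard to show in a few lines, but the same idea fits in a toy problem: SO(2) rotation synchronization, with each rotation stored as a unit complex number z_i and noiseless all-pairs measurements W_ij = z_i·conj(z_j). For a candidate z, the multipliers λ_i = Re(conj(z_i)(Wz)_i) define S = diag(λ) − W; if S is positive semidefinite, then no feasible point of the SDP relaxation can exceed Σλ_i = zᴴWz, so z is certified optimal. This is an illustrative analog, not SE-Sync's actual data matrix or code.

```python
import numpy as np

def certificate_min_eig(W, z):
    """Smallest eigenvalue of S = diag(lambda) - W for a unit-modulus candidate z.
    If it is >= 0 (up to round-off), z is certified globally optimal."""
    lam = np.real(np.conj(z) * (W @ z))   # multipliers from the stationarity condition W z = Lambda z
    S = np.diag(lam) - W                  # Hermitian certificate matrix
    return np.linalg.eigvalsh(S).min()

rng = np.random.default_rng(0)
n = 6
z_true = np.exp(1j * rng.uniform(0.0, 2 * np.pi, n))   # ground-truth rotations as unit complex numbers
W = np.outer(z_true, np.conj(z_true))                    # noiseless measurements W_ij = z_i conj(z_j)
np.fill_diagonal(W, 0.0)                                 # no self-measurements

z_bad = z_true.copy()
z_bad[2] *= -1.0                                         # rotate one node by 180 degrees

print("true rotations:   min eig =", certificate_min_eig(W, z_true))
print("one node flipped: min eig =", certificate_min_eig(W, z_bad))
```

For the true rotations the smallest eigenvalue is zero up to round-off (certified); flipping one node makes it clearly negative, so that estimate cannot be certified.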

Practical result from SE-Sync experiments: on most real-world SLAM datasets (KITTI, EuRoC, MIT Stata, TUM), the relaxation IS tight, meaning the iterative solution (GN/iSAM2) had already found the global optimum, and SE-Sync confirms it. The certificate fails (the SDP solution has rank greater than the rotation dimension d) mainly when there are many outlier loop closures or very high noise, exactly when you need to know you might be in a local minimum.

Practical use in a SLAM pipeline

SE-Sync is not meant to replace iSAM2 for real-time operation — one SDP solve is slow (seconds to minutes for large graphs). Instead it is used as a verification step: run iSAM2 in real-time, periodically run SE-Sync offline to certify that the accumulated map is globally optimal. If certification fails, trigger a re-optimization with GNC. Think of it as a "sanity check" layer.

SE-Sync solves a semidefinite relaxation of a 3D pose graph. After the SDP solver returns, the solution Z* has rank 5 rather than d=3. What does this mean for the SLAM result?

(a) Rank 5 means the SDP found five candidate solutions; the best one is the global optimum.
(b) Any rank of at least 3 is expected for a 3D problem (SO(3) has 3 rotational DOF), so this certifies optimality.
(c) The relaxation is not tight: the SDP solution does not correspond to a valid set of rotations, so we cannot certify global optimality; the rounded estimate may be worse than the true global optimum.
(d) Rank 5 means the problem is infeasible and the SLAM map is corrupt.

Chapter 8: Showcase — Robust SLAM Live

A 2D SLAM loop with adjustable outlier injection and method comparison. Adjust the number of outlier loop closures, pick a solver, and see the resulting map. Which constraints get rejected? How bad does plain LS get before robust methods kick in?

Figure: Robust SLAM Showcase: Map Quality vs. Outlier Rate. Left: estimated trajectory (colored by solver). Right: which loop closures were trusted (teal) vs. rejected (red), for a given outlier fraction and solver.

Chapter 9: Frontiers & VNAV Series Cheat Sheet

Where the field is going

Dense & Neural Mapping. Classic SLAM stores landmarks as points. NeRF (Neural Radiance Fields) and 3D Gaussian Splatting store maps as neural volumetric representations — differentiable, dense, photorealistic. iNeRF-SLAM, NeRF-SLAM, Gaussian-SLAM: optimize poses and neural map jointly. Challenge: slow training, not yet real-time without GPU.

Learned Front-Ends. Replace hand-crafted SIFT (L6) with SuperPoint (self-supervised), replace BoW (L12) with NetVLAD or DINOv2 features. These generalize across illumination/weather changes that kill traditional descriptors.

Semantic & Object SLAM. The map isn't just geometry — it knows "this is a chair, this is a door." Object-level SLAM: poses are tracked relative to recognized objects, giving long-term re-localization even as furniture moves. Challenges: open-vocabulary object detection, dynamic objects.

Multi-Robot & Lifelong SLAM. A fleet of robots must merge maps, handle diverged estimates, and remain consistent across months of operation. Distributed SLAM: each robot solves locally, consensus algorithms (ADMM, distributed GNC) merge globally without a central server. Lifelong SLAM: the map must update as the world changes, not just accumulate.

Other Sensors. LiDAR: dense depth at every scan, no illumination issues — LOAM, LIO-SAM extend the factor-graph backend to 3D point-cloud registration factors. Event cameras: microsecond-latency pixel events instead of frames, ideal for fast motion and HDR — still an open research problem for SLAM.

Open Challenges. Robustness at high outlier rates (>70%), dynamic environments (moving people), scalability to city-scale maps, tight real-time integration of dense neural representations with optimization-based back-ends.

VNAV Complete Pipeline Cheat Sheet

Lesson	Topic	Key formula / concept
L1	3D Geometry	SO(3), SE(3), T=exp(ξ̂)
L2	Lie Groups	Exp/Log, BCH, retraction ⊕
L3	Control	PID, LQR (Riccati 0=A^TP+PA−PBR⁻¹B^TP+Q), geometric attitude
L4	Traj. Opt.	Min-snap QP, differential flatness
L5	Image Formation	p = K[R|t]P, pinhole model
L6	Features	Harris corner, SIFT, KLT optical flow
L7	Two-View Geom.	E=[t]_×R, essential matrix, 8-point, triangulation
L8	RANSAC	N=log(1−p)/log(1−w^s), front-end outlier rejection
L9	NLLS	Gauss-Newton δ = −(J^TΩJ)⁻¹J^TΩr, LM
L10	Manifold Opt.	Retract: T ← T·exp(δ), on-manifold GN
L11	VO/VIO	Preintegration ΔR_ij, Δv_ij, Δp_ij; tight VIO fusion
L12	Place Recog.	BoW+TF-IDF, vocab tree, RANSAC verify, false-positive risk
L13	SLAM Back-End	Factor graph, H=J^TΩJ (sparse), Cholesky, iSAM2, Schur complement
L14	Robust SLAM	M-estimators, IRLS, GNC, switchable constraints, SE-Sync

The full SLAM pipeline in one picture

Sensors

Camera (L5), IMU (L11), LiDAR — raw pixels, acceleration, point clouds

↓

Front-End

Feature detect & track (L6) → two-view geometry (L7) → RANSAC clean (L8) → relative poses

↓

VIO Odometry

Preintegrate IMU (L11) + visual landmarks → dead-reckoning trajectory with bounded drift

↓

Place Recognition

BoW retrieval (L12) + geometric verify → loop-closure candidates (with possible false positives!)

↓

PCM Filter

Pairwise consistency → remove most outlier loop closures before they enter the graph (L14 Ch6)

↓

Robust Back-End

Factor graph (L13) + GNC/IRLS (L14) → globally consistent map despite remaining outliers

↓

Certify

SE-Sync (L14 Ch7) → certificate of global optimality or trigger re-optimization

Related Gleams across the site

The SLAM / robust estimation thread connects to other series. From the Estimation section: Bayes Filter (the filtering vs smoothing distinction), Extended Kalman Filter (the original sequential SLAM approach), Modern SLAM (systems-level: ORB-SLAM3, VINS-Mono). From AI Architectures: NeRF & 3D Gaussian Splatting (the dense neural mapping frontier). From Decision/Control: Model-Based RL (the robot needs this map for planning).

The Feynman closure: "What I cannot create, I do not understand." You have now built every piece of the visual SLAM pipeline from scratch — the geometry, the optimization, the sparse linear algebra, the place recognition, and the robustness machinery. The next step is to implement it. Take GTSAM or g2o, synthesize a pose graph with injected outliers, run GNC, and check if SE-Sync certifies the result. That is the capstone exercise of this series.

```python
# Squared vs Huber vs Cauchy vs TLS cost comparison
import numpy as np

def rho_sq(u): return 0.5 * u**2
def rho_huber(u, k=1.0):
    return np.where(np.abs(u) <= k,
                    0.5*u**2,
                    k*(np.abs(u) - k/2))
def rho_cauchy(u, c=1.0):
    return (0.5*c**2) * np.log(1 + (u/c)**2)
def rho_tls(u, c=1.0):
    # TLS as defined in Chapter 2: u^2 (no 1/2 factor), capped at c^2
    return np.minimum(u**2, c**2)

# IRLS weight functions: w(u) = rho'(u) / u
def w_huber(u, k=1.0):
    # k / max(|u|, k) is 1 inside the threshold and k/|u| outside (no division by zero)
    return k / np.maximum(np.abs(u), k)
def w_cauchy(u, c=1.0):
    return 1.0 / (1 + (u/c)**2)
def w_tls(u, c=1.0):
    # Constant factor 2 from rho'(u)/u dropped: it cancels in the weighted mean
    return np.where(np.abs(u) <= c, 1.0, 0.0)

# IRLS on 1D data with one outlier
z = np.array([0.1, -0.2, 0.15, -0.1, 5.0])  # last = outlier
x = 0.0  # initial guess
for it in range(10):
    u = (z - x)   # residuals (sigma=1 assumed)
    w = w_cauchy(u, c=1.0)
    x = np.sum(w * z) / np.sum(w)
    print(f"iter {it}: x={x:.4f}, w_outlier={w[-1]:.4f}")

# GNC-TLS (Yang et al. 2020): piecewise weight update
def gnc_tls_weight(u, c=1.0, mu=1.0):
    u2 = np.asarray(u, dtype=float)**2
    lower = mu / (mu + 1) * c**2      # at or below: full weight (inlier)
    upper = (mu + 1) / mu * c**2      # at or above: zero weight (rejected)
    with np.errstate(divide="ignore"):
        mid = c * np.sqrt(mu * (mu + 1)) / np.sqrt(u2) - mu   # transition region
    return np.where(u2 <= lower, 1.0, np.where(u2 >= upper, 0.0, mid))

c = 1.0
x = np.mean(z)                       # step 0: plain least squares (all weights 1)
r2_max = np.max((z - x)**2)
mu = c**2 / (2 * r2_max - c**2)      # initial mu, assumes 2*r2_max > c**2
for step in range(20):
    w = gnc_tls_weight(z - x, c=c, mu=mu)       # weight update at current estimate
    x = np.sum(w * z) / (np.sum(w) + 1e-12)     # variable update (weighted LS)
    print(f"mu={mu:.3f} x={x:.4f} w_out={w[-1]:.4f}")
    mu *= 1.4                                   # anneal toward TLS
```