You spent L13 building the perfect sparse factor graph. One wrong loop closure — one single bad match from L12's place-recognition system — enters that graph and, because least-squares squares the error, it can fold your entire 2 km map. The outlier problem inside the global optimizer is distinct from RANSAC's front-end role. This lesson teaches how to make SLAM ignore the liars, with no oracle telling you which measurements are trustworthy. Then: the open frontiers where the field is moving right now. MIT 16.485 Lectures 29–30 by Luca Carlone.
The SLAM back-end from L13 is beautiful. It encodes every measurement as a factor, exploits the resulting sparsity, and solves in milliseconds via sparse Cholesky. The loop closure from L12 snaps the 2 km trajectory into a globally consistent map. This is the happy path.
Now imagine a single bad thing: L12's bag-of-words retrieval finds a visually similar but geometrically wrong match — a false positive. It says "pose Tk is back at pose T1," but it isn't. A loop-closure factor with a wrong relative-pose measurement enters the factor graph.
What happens? The optimizer minimizes the sum of squared residuals. The true inlier measurements generate small residuals. The single wrong loop closure generates a huge residual — and because the cost is squared, that one outlier contributes a cost that swamps all the true measurements combined. The optimizer stretches the entire trajectory to satisfy the lie. Your perfectly calibrated map folds into something unusable.
Three poses in 1D. True positions: x1=0, x2=1, x3=2. Odometry measurements: z12=1.0, z23=1.0 (perfect). One inlier loop closure says x3−x1≈2.0 (correct). One outlier loop closure says x3−x1≈0.0 (the lie: poses are at the same spot).
Standard least-squares cost with σ=0.1 for odometry, σ=0.05 for loop closures:
The last term has information weight 400 (=1/0.0025). Even with x1=0, x3=2, that term contributes 400×4=1600 cost units. The optimizer will move x1 and x3 toward each other to reduce it, pulling x3 down toward 0 and x1 up. The inlier odometry (weight 100 each) loses the tug-of-war. Final: x1≈0.8, x2≈1.0, x3≈1.2 — the trajectory has compressed to half its true extent. The outlier wins.
Five-pose 2D trajectory. Toggle the outlier loop closure and watch least-squares deform the map.
The squared cost is pathological for outliers because its derivative — the "influence" a measurement exerts on the solution — grows without bound as the residual grows. A measurement 10σ away pulls 100 times harder than one 1σ away. The fix: replace the squared cost with a function that saturates for large residuals. These are called M-estimators (maximum-likelihood-type estimators), introduced by Huber in 1964.
The general robust SLAM objective replaces each squared term with a robust cost ρ:
where ‖r‖Ω = (rTΩr)1⁄2 is the Mahalanobis norm. The scalar u = ‖r‖Σ−1 is the whitened residual. We choose ρ(u) to grow slowly for large u.
The influence function ψ(u) = ρ′(u) tells you how much a measurement with whitened residual u can perturb the solution. For squared cost, ρ(u)=u2⁄2, so ψ(u)=u — unbounded. For a well-designed robust cost, ψ(u) is bounded: no single measurement can pull the solution by more than a fixed amount, no matter how large its residual. This is the key property that rejects outliers.
The weight function w(u) = ψ(u)⁄u = ρ′(u)⁄u tells you the effective weight multiplied by u2. For squared cost, w(u)=1 (all equal weight). For robust costs, w(u) decreases as u grows — high-residual measurements get down-weighted. This connects M-estimators to the IRLS algorithm in Chapter 3.
Four robust costs appear throughout SLAM literature. Each represents a different tradeoff between how aggressively it down-weights outliers and how convex (and thus solver-friendly) it remains.
No down-weighting. Unbounded influence. Optimal when ALL measurements are inliers from a Gaussian distribution.
Below threshold k: squared (full weight). Above k: linear growth, so influence ψ(u)=k (capped). The weight function w(u)=k/|u| decays as 1/u for large residuals. Convex everywhere — the optimizer is well-behaved but influence only saturates (linearly), not truly bounded.
Influence is bounded and actually decreases for very large residuals. A measurement at u=100c has near-zero weight. Non-convex for u>c/√2, introducing the local-minima problem (Chapter 4).
Cost is literally constant beyond threshold c: w(u)=0 for |u|>c. The measurement is completely ignored — hard zeroing of the outlier. Highly non-convex. Most aggressive rejection, most severe local-minima problem. But when GNC (Chapter 5) solves the non-convexity, TLS gives the cleanest separation of inliers from outliers.
| Cost | ρ(u=1) | ρ(u=10) | Ratio ρ(10)/ρ(1) | w(u=10) |
|---|---|---|---|---|
| Squared | 0.5 | 50 | 100× | 1.0 |
| Huber (k=1) | 0.5 | 9.5 | 19× | 0.10 |
| Cauchy (c=1) | 0.347 | 2.40 | 6.9× | 0.0099 |
| TLS (c=1) | 1.0 | 1.0 (capped) | 1× | 0 |
The squared cost at u=10 is 100 times worse than at u=1; TLS doesn't even notice. This is the quantitative version of "outliers dominate".
Left: cost ρ(u). Right: influence ψ(u)=ρ′(u). Drag the kernel-parameter slider to watch how the threshold separates inliers from outliers.
Minimizing a robust cost function ∑ρ(ri) looks like a completely different beast from the NLLS machinery of L9. But there is an elegant bridge: Iteratively Reweighted Least Squares (IRLS).
The key observation: the M-estimator gradient condition ∂f⁄∂θ = 0 can be rewritten as a weighted least-squares normal equation with weights wi = ρ′(ui)⁄ui. This gives us an iterative algorithm:
The inner WLS step uses exactly the sparse Cholesky machinery from L13 — just with modified diagonal weights. Each outer iteration re-computes weights based on current residuals, gradually down-weighting high-residual (outlier) measurements.
Five measurements of a single scalar x. True value x=0. Four inliers: z1..4 = {0.1, -0.2, 0.15, -0.1} with σ=0.3. One outlier: z5=5.0 with the same σ=0.3 (whitened residual |z5-x|/0.3≈16 — enormous).
Iteration 0 (plain LS start): Weights all wi=1. Solution x=(0.1-0.2+0.15-0.1+5.0)/5=0.99. Clearly pulled by the outlier.
Iteration 1 (Cauchy, c=1): Compute residuals at x=0.99: r1..4≈0.89..1.19 (whitened: ~3-4), r5=4.01 (whitened: ~13.4). Weights wi=1/(1+ui2/c2): w1..4≈0.06..0.10, w5=1/(1+179)≈0.0056. New WLS: x = (∑ wizi)⁄(∑ wi). Sum of wizi for inliers: ~0.008+(-0.016)+0.012+(-0.008)=-0.004. For outlier: 0.0056×5=0.028. Denominator: ~0.078+0.0056=0.084. x≈0.028⁄0.084=0.28. Much better — outlier weight is 10× smaller than each inlier.
After 3-5 iterations: x converges to ≈0.01. The outlier's weight falls below 0.001. The inlier consensus wins.
Click Step to run one IRLS iteration. Watch the outlier's weight (red dot) shrink and the estimated mean converge to the inlier consensus.
There is a catch. Robust cost functions like Cauchy and TLS are non-convex: the cost landscape has multiple local minima. Standard gradient-based optimizers (Gauss-Newton, IRLS) converge to whichever local minimum they start near. This is a problem because:
Ironically, a robust cost can sometimes give a worse solution than plain least-squares if the initial estimate is poor. IRLS is not magic — it's a local optimizer.
Consider a 1D SLAM problem with one strong inlier constraint pulling x toward 0 and one strong outlier constraint pulling x toward 10. With squared cost: two bowls whose gradients always oppose each other — the optimizer finds the weighted average. With TLS: two flat floors with steep walls. If you start near x=10, you are in the outlier's bowl. IRLS converges to x=10. The inlier bowl at x=0 exists but you never reach it.
Total cost landscape (inlier at 0 + outlier at 10) under different cost functions. Drag the init slider to see where gradient descent converges.
Graduated Non-Convexity (GNC) is Carlone's answer to the local-minima problem. The core idea: instead of optimizing the non-convex robust cost directly, start with a convex surrogate and gradually "turn up" the non-convexity, annealing toward the true cost. Each step of this schedule is a small perturbation of a problem you just solved — so the solution tracks the global optimum.
Define a family of costs ρμ(u) parameterized by a control parameter μ such that:
Algorithm: solve at μ0, warm-start at μ1>μ0 using the previous solution, repeat until μ=μ∞.
For truncated least squares with threshold c2, define the GNC surrogate (Yang et al. 2020, Carlone group):
At μ→∞: ρμ→c2u2⁄u2=c2 (TLS plateau). At μ=1: ρ1(u)=c2u2⁄(2c2+u2) — smoother, more convex. Schedule: multiply μ by a factor μ = μ×1.4 each step. With ~20 steps, we go from convex to effectively TLS.
At μ=μ0 (convex surrogate), there is only one basin — the global optimum is the inlier consensus, because convexity guarantees uniqueness. As μ increases, the outlier basin gradually forms, but the solution is already in the inlier basin and tracks it through each small step. The outlier basin never "captures" the solution because each step's perturbation is small relative to the basin width.
One inlier at z=0 (σ=1), one outlier at z=10 (σ=1). TLS threshold c=2. Start μ=1, multiply by 1.5 each step.
| μ | winlier(z=0) | woutlier(z=10) | Estimate x* |
|---|---|---|---|
| 1 | 0.80 | 0.04 | 0.48 |
| 1.5 | 0.86 | 0.02 | 0.21 |
| 2.25 | 0.92 | 0.008 | 0.08 |
| 10 | 0.99 | 0.0004 | 0.004 |
| ∞ (TLS) | 1.0 | 0 | 0.0 |
The estimate converges to the inlier truth from any start because the convex phase pinned the solution to the correct basin.
The cost landscape at increasing μ. Watch the outlier basin (right) form while the solution (orange dot) tracks the inlier minimum (left). Drag μ manually or click Anneal.
GNC is not the only approach. Two complementary families attack the same problem from different angles.
Introduced by Sünderhauf & Protzel (2012). Each loop-closure factor gets a continuous "switch variable" si ∈ [0,1] that interpolates between "fully active" (si=1) and "fully off" (si=0). The augmented objective:
The last term is a regularizer penalizing sk away from 1 (switching off has a cost). The optimizer simultaneously solves for all poses AND all switch variables. A genuine inlier loop closure is easy to satisfy — its switch variable stays near 1. An outlier generates a large residual at sk=1 — the optimizer prefers to reduce sk toward 0, paying only the small regularizer cost. Outliers are "switched off" automatically.
The augmented state is larger (one extra scalar per loop closure), but the problem structure is still sparse. Standard sparse solvers handle it. The switch variables provide explicit "which loop closures did we trust?" output.
Olson & Agarwal (2012) model each loop-closure measurement as a mixture of two Gaussians: one tight (inlier model) and one very broad (outlier model). The cost becomes −log max(πin·N(r;0,Σin), πout·N(r;0,Σout)). The max approximation keeps the problem sparse; effectively the inlier Gaussian dominates for small residuals, the outlier Gaussian for large ones.
A pre-optimization front-end filter developed by Mangelson et al. (2018). Before adding any loop closures to the optimizer, PCM builds a "consistency graph": nodes are candidate loop closures, edges connect pairs that are geometrically consistent with each other (their combined relative-pose forms a plausible loop). Outlier loop closures tend to be inconsistent with the true inliers. PCM finds the maximum clique in the consistency graph — the largest set of mutually consistent loop closures — and only passes those to the back-end optimizer.
GNC gives empirically excellent results, but how do you know the solution you got is actually the global optimum? Can you prove it? This is the question of certifiable perception — one of Carlone's central research contributions.
Rosen et al. (2019) showed that the pose graph optimization problem (SLAM with only relative-pose measurements, no landmarks) can be reformulated as a semidefinite program (SDP). SDPs are convex — they can be solved globally. The catch: the original problem has non-convex constraints (rotation matrices must lie in SO(3)). SE-Sync relaxes these constraints to get an SDP, solves it globally, then checks if the relaxation is tight (i.e., the SDP solution is also feasible for the original problem).
The SDP gives a dual certificate: a proof that no better solution exists. Specifically, if the relaxation is tight, the dual variable (a PSD matrix called the certificate matrix) has rank 1 — meaning the global optimum of the relaxed problem coincides with a valid rotation solution. When the rank is 1, you have a certificate. When rank > 1, the relaxation is not tight and you know you need to look further.
SE-Sync is not meant to replace iSAM2 for real-time operation — one SDP solve is slow (seconds to minutes for large graphs). Instead it is used as a verification step: run iSAM2 in real-time, periodically run SE-Sync offline to certify that the accumulated map is globally optimal. If certification fails, trigger a re-optimization with GNC. Think of it as a "sanity check" layer.
A 2D SLAM loop with adjustable outlier injection and method comparison. Adjust the number of outlier loop closures, pick a solver, and see the resulting map. Which constraints get rejected? How bad does plain LS get before robust methods kick in?
Left: estimated trajectory (colored by solver). Right: which loop closures were trusted (teal) vs rejected (red). Tune outlier fraction and solver.
Dense & Neural Mapping. Classic SLAM stores landmarks as points. NeRF (Neural Radiance Fields) and 3D Gaussian Splatting store maps as neural volumetric representations — differentiable, dense, photorealistic. iNeRF-SLAM, NeRF-SLAM, Gaussian-SLAM: optimize poses and neural map jointly. Challenge: slow training, not yet real-time without GPU.
Learned Front-Ends. Replace hand-crafted SIFT (L6) with SuperPoint (self-supervised), replace BoW (L12) with NetVLAD or DINOv2 features. These generalize across illumination/weather changes that kill traditional descriptors.
Semantic & Object SLAM. The map isn't just geometry — it knows "this is a chair, this is a door." Object-level SLAM: poses are tracked relative to recognized objects, giving long-term re-localization even as furniture moves. Challenges: open-vocabulary object detection, dynamic objects.
Multi-Robot & Lifelong SLAM. A fleet of robots must merge maps, handle diverged estimates, and remain consistent across months of operation. Distributed SLAM: each robot solves locally, consensus algorithms (ADMM, distributed GNC) merge globally without a central server. Lifelong SLAM: the map must update as the world changes, not just accumulate.
Other Sensors. LiDAR: dense depth at every scan, no illumination issues — LOAM, LIO-SAM extend the factor-graph backend to 3D point-cloud registration factors. Event cameras: microsecond-latency pixel events instead of frames, ideal for fast motion and HDR — still an open research problem for SLAM.
Open Challenges. Robustness at high outlier rates (>70%), dynamic environments (moving people), scalability to city-scale maps, tight real-time integration of dense neural representations with optimization-based back-ends.
| Lesson | Topic | Key formula / concept | Link |
|---|---|---|---|
| L1 | 3D Geometry | SO(3), SE(3), T=exp(ξ̂) | L1 |
| L2 | Lie Groups | Exp/Log, BCH, retraction ⊕ | L2 |
| L3 | Control | PID, LQR (Riccati P=ATP+PA−PBR−1BTP+Q), geometric attitude | L3 |
| L4 | Traj. Opt. | Min-snap QP, differential flatness | L4 |
| L5 | Image Formation | p = K[R|t]P, pinhole model | L5 |
| L6 | Features | Harris corner, SIFT, KLT optical flow | L6 |
| L7 | Two-View Geom. | E=[t]×R, essential matrix, 8-point, triangulation | L7 |
| L8 | RANSAC | N=log(1−p)/log(1−ws), front-end outlier rejection | L8 |
| L9 | NLLS | Gauss-Newton δ = −(JTΩJ)−1JTΩr, LM | L9 |
| L10 | Manifold Opt. | Retract: T ← T·exp(δ), on-manifold GN | L10 |
| L11 | VO/VIO | Preintegration ΔRij, Δvij, Δpij; tight VIO fusion | L11 |
| L12 | Place Recog. | BoW+TF-IDF, vocab tree, RANSAC verify, false-positive risk | L12 |
| L13 | SLAM Back-End | Factor graph, H=JTΩJ (sparse), Cholesky, iSAM2, Schur complement | L13 |
| L14 | Robust SLAM | M-estimators, IRLS, GNC, switchable constraints, SE-Sync | This lesson |
The SLAM / robust estimation thread connects to other series. From the Estimation section: Bayes Filter (the filtering vs smoothing distinction), Extended Kalman Filter (the original sequential SLAM approach), Modern SLAM (systems-level: ORB-SLAM3, VINS-Mono). From AI Architectures: NeRF & 3D Gaussian Splatting (the dense neural mapping frontier). From Decision/Control: Model-Based RL (the robot needs this map for planning).
python # Squared vs Huber vs Cauchy vs TLS cost comparison import numpy as np def rho_sq(u): return 0.5 * u**2 def rho_huber(u, k=1.0): return np.where(np.abs(u) <= k, 0.5*u**2, k*(np.abs(u) - k/2)) def rho_cauchy(u, c=1.0): return (0.5*c**2) * np.log(1 + (u/c)**2) def rho_tls(u, c=1.0): return np.minimum(u**2, c**2) # IRLS weight functions: w(u) = rho'(u) / u def w_huber(u, k=1.0): return np.where(np.abs(u) <= k, 1.0, k/np.abs(u)) def w_cauchy(u, c=1.0): return 1.0 / (1 + (u/c)**2) def w_tls(u, c=1.0): return np.where(np.abs(u) <= c, 1.0, 0.0) # IRLS on 1D data with one outlier z = np.array([0.1, -0.2, 0.15, -0.1, 5.0]) # last = outlier x = 0.0 # initial guess for it in range(10): u = (z - x) # residuals (sigma=1 assumed) w = w_cauchy(u, c=1.0) x = np.sum(w * z) / np.sum(w) print(f"iter {it}: x={x:.4f}, w_outlier={w[-1]:.4f}") # GNC-TLS schedule def gnc_tls_weight(u, c=1.0, mu=1.0): # Yang et al. 2020 Eq. (11) num = c**2 * mu denom = (mu + 1) * c**2 + u**2 rho_mu = num * u**2 / denom drho = (2 * num * u * denom - num * u**2 * 2 * u) / denom**2 w = drho / (2 * u) if abs(u) > 1e-9 else num / denom return max(0, w) mu = 1.0 x = 0.0 for step in range(20): u = z - x w = np.array([gnc_tls_weight(ui, c=1.0, mu=mu) for ui in u]) x = np.sum(w * z) / (np.sum(w) + 1e-9) mu *= 1.4 print(f"mu={mu:.1f} x={x:.4f} w_out={w[-1]:.4f}")