Visual Navigation for Autonomous Vehicles · MIT 16.485 · Lecture 29–30

Robust SLAM & Frontiers

You spent L13 building the perfect sparse factor graph. One wrong loop closure — one single bad match from L12's place-recognition system — enters that graph and, because least-squares squares the error, it can fold your entire 2 km map. The outlier problem inside the global optimizer is distinct from RANSAC's front-end role. This lesson teaches how to make SLAM ignore the liars, with no oracle telling you which measurements are trustworthy. Then: the open frontiers where the field is moving right now. MIT 16.485 Lectures 29–30 by Luca Carlone.

Prerequisites: VNAV L9 (NLLS) · VNAV L10 (manifold opt.) · VNAV L12 (place recognition) · VNAV L13 (factor graphs).
10
Chapters
5
Live Canvases
Series
Capstone

Chapter 0: The One Lie That Folds Your Map

The SLAM back-end from L13 is beautiful. It encodes every measurement as a factor, exploits the resulting sparsity, and solves in milliseconds via sparse Cholesky. The loop closure from L12 snaps the 2 km trajectory into a globally consistent map. This is the happy path.

Now imagine a single bad thing: L12's bag-of-words retrieval finds a visually similar but geometrically wrong match — a false positive. It says "pose Tk is back at pose T1," but it isn't. A loop-closure factor with a wrong relative-pose measurement enters the factor graph.

What happens? The optimizer minimizes the sum of squared residuals. The true inlier measurements generate small residuals. The single wrong loop closure generates a huge residual — and because the cost is squared, that one outlier contributes a cost that swamps all the true measurements combined. The optimizer stretches the entire trajectory to satisfy the lie. Your perfectly calibrated map folds into something unusable.

The outlier problem at the back-end is fundamentally different from RANSAC (L8). RANSAC operates in the front-end on a single two-view model at a time — it finds one clean set of inliers for one pose estimate. Robust SLAM must reject outliers inside the joint global optimization over thousands of variables and all measurements simultaneously. RANSAC cannot help you here.

A concrete catastrophe — worked numbers

Three poses in 1D. True positions: x1=0, x2=1, x3=2. Odometry measurements: z12=1.0, z23=1.0 (perfect). One inlier loop closure says x3−x1≈2.0 (correct). One outlier loop closure says x3−x1≈0.0 (the lie: poses are at the same spot).

Standard least-squares cost with σ=0.1 for odometry, σ=0.05 for loop closures:

f(θ) = ((x2−x1−1)2/0.01) + ((x3−x2−1)2/0.01) + ((x3−x1−2)2/0.0025) + ((x3−x1−0)2/0.0025)

The last term has information weight 400 (=1/0.0025). Even with x1=0, x3=2, that term contributes 400×4=1600 cost units. The optimizer will move x1 and x3 toward each other to reduce it, pulling x3 down toward 0 and x1 up. The inlier odometry (weight 100 each) loses the tug-of-war. Final: x1≈0.8, x2≈1.0, x3≈1.2 — the trajectory has compressed to half its true extent. The outlier wins.

One Wrong Loop Closure Folds the Map

Five-pose 2D trajectory. Toggle the outlier loop closure and watch least-squares deform the map.

Why does a single outlier loop closure corrupt the ENTIRE map in standard least-squares SLAM, rather than affecting only the two poses it directly links?

Chapter 1: M-Estimators: Replace the Square

The squared cost is pathological for outliers because its derivative — the "influence" a measurement exerts on the solution — grows without bound as the residual grows. A measurement 10σ away pulls 100 times harder than one 1σ away. The fix: replace the squared cost with a function that saturates for large residuals. These are called M-estimators (maximum-likelihood-type estimators), introduced by Huber in 1964.

The general robust SLAM objective replaces each squared term with a robust cost ρ:

θ* = argminθ(i,j) ρ(‖rij(θ)‖Σij−1)

where ‖r‖Ω = (rTΩr)1⁄2 is the Mahalanobis norm. The scalar u = ‖r‖Σ−1 is the whitened residual. We choose ρ(u) to grow slowly for large u.

The influence function: how hard can a measurement pull?

The influence function ψ(u) = ρ′(u) tells you how much a measurement with whitened residual u can perturb the solution. For squared cost, ρ(u)=u2⁄2, so ψ(u)=u — unbounded. For a well-designed robust cost, ψ(u) is bounded: no single measurement can pull the solution by more than a fixed amount, no matter how large its residual. This is the key property that rejects outliers.

Bounded influence = outlier rejection. If the maximum influence any measurement can exert is capped at some constant, then a single outlier with a huge residual cannot dominate the solution. The inlier consensus — many measurements each pulling with moderate force — wins. This is the fundamental principle behind all robust estimators.

Weight function

The weight function w(u) = ψ(u)⁄u = ρ′(u)⁄u tells you the effective weight multiplied by u2. For squared cost, w(u)=1 (all equal weight). For robust costs, w(u) decreases as u grows — high-residual measurements get down-weighted. This connects M-estimators to the IRLS algorithm in Chapter 3.

What property of a robust cost function ρ(u) guarantees that an outlier measurement with residual u=100 cannot dominate the SLAM solution over 1000 inliers each with residual u≈1?

Chapter 2: The Four Cost Functions

Four robust costs appear throughout SLAM literature. Each represents a different tradeoff between how aggressively it down-weights outliers and how convex (and thus solver-friendly) it remains.

Squared (standard LS) — baseline

ρ(u) = u2⁄2    ψ(u) = u    w(u) = 1

No down-weighting. Unbounded influence. Optimal when ALL measurements are inliers from a Gaussian distribution.

Huber loss — piecewise quadratic/linear

ρH(u) = u2⁄2 if |u|≤k,   k(|u|−k⁄2) if |u|>k

Below threshold k: squared (full weight). Above k: linear growth, so influence ψ(u)=k (capped). The weight function w(u)=k/|u| decays as 1/u for large residuals. Convex everywhere — the optimizer is well-behaved but influence only saturates (linearly), not truly bounded.

Cauchy (Lorentzian) loss — truly bounded influence

ρC(u) = (c2⁄2) ln(1 + u2⁄c2)    ψ(u) = u ⁄ (1 + u2⁄c2)    w(u) = 1 ⁄ (1 + u2⁄c2)

Influence is bounded and actually decreases for very large residuals. A measurement at u=100c has near-zero weight. Non-convex for u>c/√2, introducing the local-minima problem (Chapter 4).

Truncated Least Squares (TLS) — hard rejection

ρTLS(u) = u2 if u≤c,   c2 if u>c

Cost is literally constant beyond threshold c: w(u)=0 for |u|>c. The measurement is completely ignored — hard zeroing of the outlier. Highly non-convex. Most aggressive rejection, most severe local-minima problem. But when GNC (Chapter 5) solves the non-convexity, TLS gives the cleanest separation of inliers from outliers.

Hand-computed comparison at u=1 and u=10 (k=c=1)

Costρ(u=1)ρ(u=10)Ratio ρ(10)/ρ(1)w(u=10)
Squared0.550100×1.0
Huber (k=1)0.59.519×0.10
Cauchy (c=1)0.3472.406.9×0.0099
TLS (c=1)1.01.0 (capped)0

The squared cost at u=10 is 100 times worse than at u=1; TLS doesn't even notice. This is the quantitative version of "outliers dominate".

Robust Cost Shapes & Influence Functions

Left: cost ρ(u). Right: influence ψ(u)=ρ′(u). Drag the kernel-parameter slider to watch how the threshold separates inliers from outliers.

Kernel κ 1.00
At whitened residual u=5 with kernel parameter c=1, rank these costs from LARGEST to SMALLEST: Squared, Huber, Cauchy, TLS.

Chapter 3: IRLS: Turning Robust Costs into Weighted LS

Minimizing a robust cost function ∑ρ(ri) looks like a completely different beast from the NLLS machinery of L9. But there is an elegant bridge: Iteratively Reweighted Least Squares (IRLS).

The key observation: the M-estimator gradient condition ∂f⁄∂θ = 0 can be rewritten as a weighted least-squares normal equation with weights wi = ρ′(ui)⁄ui. This gives us an iterative algorithm:

Initialize
Start from some θ0 (e.g., odometry chain without loop closures)
Compute residuals
ri = rit) for all measurements i
Compute weights
wi ← ρ′(‖ri‖)⁄‖ri‖ using current residuals
Solve weighted LS
θt+1 = argminθ ∑ wi ‖ri(θ)‖2 — sparse Cholesky from L13!
↻ repeat until convergence

The inner WLS step uses exactly the sparse Cholesky machinery from L13 — just with modified diagonal weights. Each outer iteration re-computes weights based on current residuals, gradually down-weighting high-residual (outlier) measurements.

Worked 1D IRLS example

Five measurements of a single scalar x. True value x=0. Four inliers: z1..4 = {0.1, -0.2, 0.15, -0.1} with σ=0.3. One outlier: z5=5.0 with the same σ=0.3 (whitened residual |z5-x|/0.3≈16 — enormous).

Iteration 0 (plain LS start): Weights all wi=1. Solution x=(0.1-0.2+0.15-0.1+5.0)/5=0.99. Clearly pulled by the outlier.

Iteration 1 (Cauchy, c=1): Compute residuals at x=0.99: r1..4≈0.89..1.19 (whitened: ~3-4), r5=4.01 (whitened: ~13.4). Weights wi=1/(1+ui2/c2): w1..4≈0.06..0.10, w5=1/(1+179)≈0.0056. New WLS: x = (∑ wizi)⁄(∑ wi). Sum of wizi for inliers: ~0.008+(-0.016)+0.012+(-0.008)=-0.004. For outlier: 0.0056×5=0.028. Denominator: ~0.078+0.0056=0.084. x≈0.028⁄0.084=0.28. Much better — outlier weight is 10× smaller than each inlier.

After 3-5 iterations: x converges to ≈0.01. The outlier's weight falls below 0.001. The inlier consensus wins.

RANSAC (L8) cleans the front-end one model at a time; IRLS rejects outliers inside the joint global optimization. IRLS never explicitly labels any measurement as an outlier — it just gives it a tiny weight. The information from inliers dominates automatically. This is the key distinction from the front-end.
IRLS Reweighting Steps on a 1D Fit

Click Step to run one IRLS iteration. Watch the outlier's weight (red dot) shrink and the estimated mean converge to the inlier consensus.

In IRLS iteration 2, measurement z5=5.0 (outlier) has whitened residual u=13.4 and Cauchy kernel c=1, giving weight w=1/(1+179)≈0.006. Four inliers each have w≈0.08. How will the estimate change in iteration 2 compared to iteration 1?

Chapter 4: The Non-Convexity Trap

There is a catch. Robust cost functions like Cauchy and TLS are non-convex: the cost landscape has multiple local minima. Standard gradient-based optimizers (Gauss-Newton, IRLS) converge to whichever local minimum they start near. This is a problem because:

  1. If initialized near a local minimum dominated by the outlier, the optimizer stays there.
  2. The "correct" minimum (where inliers agree) may be far in configuration space from the outlier-dominated start.
  3. Unlike the convex squared-cost case, there is no guarantee that gradient descent finds the global optimum.

Ironically, a robust cost can sometimes give a worse solution than plain least-squares if the initial estimate is poor. IRLS is not magic — it's a local optimizer.

Robust costs are non-convex — naively optimizing one can land in a worse local minimum than plain least-squares. If your initialization (typically: the VO chain without loop closures) happens to be near the outlier's local minimum, IRLS will happily converge there and hand you a corrupted map. This is why gradient methods alone are insufficient for robust SLAM. You need either a good initialization OR an algorithm that escapes local minima.

The 1D intuition: two bowls

Consider a 1D SLAM problem with one strong inlier constraint pulling x toward 0 and one strong outlier constraint pulling x toward 10. With squared cost: two bowls whose gradients always oppose each other — the optimizer finds the weighted average. With TLS: two flat floors with steep walls. If you start near x=10, you are in the outlier's bowl. IRLS converges to x=10. The inlier bowl at x=0 exists but you never reach it.

Non-Convexity and Local Minima in Robust Costs

Total cost landscape (inlier at 0 + outlier at 10) under different cost functions. Drag the init slider to see where gradient descent converges.

Init x0 1.0 Cost
A SLAM problem has 20 inlier loop closures centered at x=0 and 1 outlier loop closure at x=10. With TLS cost and initialization at x=8, where does gradient descent converge?

Chapter 5: Graduated Non-Convexity (GNC)

Graduated Non-Convexity (GNC) is Carlone's answer to the local-minima problem. The core idea: instead of optimizing the non-convex robust cost directly, start with a convex surrogate and gradually "turn up" the non-convexity, annealing toward the true cost. Each step of this schedule is a small perturbation of a problem you just solved — so the solution tracks the global optimum.

The GNC schedule

Define a family of costs ρμ(u) parameterized by a control parameter μ such that:

Algorithm: solve at μ0, warm-start at μ10 using the previous solution, repeat until μ=μ.

GNC-TLS: the Carlone construction

For truncated least squares with threshold c2, define the GNC surrogate (Yang et al. 2020, Carlone group):

ρμ(u) = c2μu2 ⁄ ((μ+1)c2 + u2),   μ ∈ (1, ∞)

At μ→∞: ρμ→c2u2⁄u2=c2 (TLS plateau). At μ=1: ρ1(u)=c2u2⁄(2c2+u2) — smoother, more convex. Schedule: multiply μ by a factor μ = μ×1.4 each step. With ~20 steps, we go from convex to effectively TLS.

Why GNC rejects outliers without a good initial guess

At μ=μ0 (convex surrogate), there is only one basin — the global optimum is the inlier consensus, because convexity guarantees uniqueness. As μ increases, the outlier basin gradually forms, but the solution is already in the inlier basin and tracks it through each small step. The outlier basin never "captures" the solution because each step's perturbation is small relative to the basin width.

GNC gives outlier-robust SLAM without ANY initial guess beyond the trivial (e.g., identity). This is the key advance over IRLS alone. You do not need VO-initialized poses. A UAV that crashed into an unknown environment can run GNC and get a correct map, even with 50%+ outlier loop closures — as demonstrated in Carlone's experiments.

Worked μ schedule example

One inlier at z=0 (σ=1), one outlier at z=10 (σ=1). TLS threshold c=2. Start μ=1, multiply by 1.5 each step.

μwinlier(z=0)woutlier(z=10)Estimate x*
10.800.040.48
1.50.860.020.21
2.250.920.0080.08
100.990.00040.004
∞ (TLS)1.000.0

The estimate converges to the inlier truth from any start because the convex phase pinned the solution to the correct basin.

GNC: Annealing from Convex to Non-Convex

The cost landscape at increasing μ. Watch the outlier basin (right) form while the solution (orange dot) tracks the inlier minimum (left). Drag μ manually or click Anneal.

μ 1.0
Why does GNC succeed at finding the inlier-consensus solution even when started from a random initial guess near the outlier, while naive IRLS with TLS cost fails?

Chapter 6: Switchable Constraints & Consistency Checks

GNC is not the only approach. Two complementary families attack the same problem from different angles.

Switchable Constraints (SC)

Introduced by Sünderhauf & Protzel (2012). Each loop-closure factor gets a continuous "switch variable" si ∈ [0,1] that interpolates between "fully active" (si=1) and "fully off" (si=0). The augmented objective:

f(θ, s) = ∑odo ‖rij2 + ∑lc sk2‖rk2 + ∑lc λ(1−sk)2

The last term is a regularizer penalizing sk away from 1 (switching off has a cost). The optimizer simultaneously solves for all poses AND all switch variables. A genuine inlier loop closure is easy to satisfy — its switch variable stays near 1. An outlier generates a large residual at sk=1 — the optimizer prefers to reduce sk toward 0, paying only the small regularizer cost. Outliers are "switched off" automatically.

The augmented state is larger (one extra scalar per loop closure), but the problem structure is still sparse. Standard sparse solvers handle it. The switch variables provide explicit "which loop closures did we trust?" output.

Max-Mixtures

Olson & Agarwal (2012) model each loop-closure measurement as a mixture of two Gaussians: one tight (inlier model) and one very broad (outlier model). The cost becomes −log max(πin·N(r;0,Σin), πout·N(r;0,Σout)). The max approximation keeps the problem sparse; effectively the inlier Gaussian dominates for small residuals, the outlier Gaussian for large ones.

Pairwise Consistency Maximization (PCM)

A pre-optimization front-end filter developed by Mangelson et al. (2018). Before adding any loop closures to the optimizer, PCM builds a "consistency graph": nodes are candidate loop closures, edges connect pairs that are geometrically consistent with each other (their combined relative-pose forms a plausible loop). Outlier loop closures tend to be inconsistent with the true inliers. PCM finds the maximum clique in the consistency graph — the largest set of mutually consistent loop closures — and only passes those to the back-end optimizer.

PCM removes outliers BEFORE the optimizer sees them, while GNC removes them INSIDE the optimizer. They are complementary: PCM reduces the problem size and provides a clean initialization; GNC handles any remaining false positives that slip through geometric consistency. Modern systems (Kimera, DOOR-SLAM) combine both.
A switchable-constraints SLAM system processes a loop closure with a large residual (the loop is almost certainly wrong). After optimization, what will the switch variable sk for that loop closure be, and why?

Chapter 7: Certifiable & Global Optimality

GNC gives empirically excellent results, but how do you know the solution you got is actually the global optimum? Can you prove it? This is the question of certifiable perception — one of Carlone's central research contributions.

The SE-Sync idea

Rosen et al. (2019) showed that the pose graph optimization problem (SLAM with only relative-pose measurements, no landmarks) can be reformulated as a semidefinite program (SDP). SDPs are convex — they can be solved globally. The catch: the original problem has non-convex constraints (rotation matrices must lie in SO(3)). SE-Sync relaxes these constraints to get an SDP, solves it globally, then checks if the relaxation is tight (i.e., the SDP solution is also feasible for the original problem).

Certificate of global optimality

The SDP gives a dual certificate: a proof that no better solution exists. Specifically, if the relaxation is tight, the dual variable (a PSD matrix called the certificate matrix) has rank 1 — meaning the global optimum of the relaxed problem coincides with a valid rotation solution. When the rank is 1, you have a certificate. When rank > 1, the relaxation is not tight and you know you need to look further.

Practical result from SE-Sync experiments: on most real-world SLAM datasets (KITTI, EuRoC, MIT Stata, TUM), the relaxation IS tight — meaning the iterative solution (GN/iSAM2) had already found the global optimum, and SE-Sync confirms it. The certificate fails (rank > 1) mainly when there are many outlier loop closures or very high noise — exactly when you need to know you might be in a local minimum.

Practical use in a SLAM pipeline

SE-Sync is not meant to replace iSAM2 for real-time operation — one SDP solve is slow (seconds to minutes for large graphs). Instead it is used as a verification step: run iSAM2 in real-time, periodically run SE-Sync offline to certify that the accumulated map is globally optimal. If certification fails, trigger a re-optimization with GNC. Think of it as a "sanity check" layer.

SE-Sync solves a semidefinite relaxation of SLAM. After the SDP solver returns a solution, the "certificate matrix" has rank 3 (not 1). What does this mean for the SLAM result?

Chapter 8: Showcase — Robust SLAM Live

A 2D SLAM loop with adjustable outlier injection and method comparison. Adjust the number of outlier loop closures, pick a solver, and see the resulting map. Which constraints get rejected? How bad does plain LS get before robust methods kick in?

Robust SLAM Showcase: Map Quality vs. Outlier Rate

Left: estimated trajectory (colored by solver). Right: which loop closures were trusted (teal) vs rejected (red). Tune outlier fraction and solver.

Outlier % 20% Solver

Chapter 9: Frontiers & VNAV Series Cheat Sheet

Where the field is going

Dense & Neural Mapping. Classic SLAM stores landmarks as points. NeRF (Neural Radiance Fields) and 3D Gaussian Splatting store maps as neural volumetric representations — differentiable, dense, photorealistic. iNeRF-SLAM, NeRF-SLAM, Gaussian-SLAM: optimize poses and neural map jointly. Challenge: slow training, not yet real-time without GPU.

Learned Front-Ends. Replace hand-crafted SIFT (L6) with SuperPoint (self-supervised), replace BoW (L12) with NetVLAD or DINOv2 features. These generalize across illumination/weather changes that kill traditional descriptors.

Semantic & Object SLAM. The map isn't just geometry — it knows "this is a chair, this is a door." Object-level SLAM: poses are tracked relative to recognized objects, giving long-term re-localization even as furniture moves. Challenges: open-vocabulary object detection, dynamic objects.

Multi-Robot & Lifelong SLAM. A fleet of robots must merge maps, handle diverged estimates, and remain consistent across months of operation. Distributed SLAM: each robot solves locally, consensus algorithms (ADMM, distributed GNC) merge globally without a central server. Lifelong SLAM: the map must update as the world changes, not just accumulate.

Other Sensors. LiDAR: dense depth at every scan, no illumination issues — LOAM, LIO-SAM extend the factor-graph backend to 3D point-cloud registration factors. Event cameras: microsecond-latency pixel events instead of frames, ideal for fast motion and HDR — still an open research problem for SLAM.

Open Challenges. Robustness at high outlier rates (>70%), dynamic environments (moving people), scalability to city-scale maps, tight real-time integration of dense neural representations with optimization-based back-ends.

VNAV Complete Pipeline Cheat Sheet

LessonTopicKey formula / conceptLink
L13D GeometrySO(3), SE(3), T=exp(ξ̂)L1
L2Lie GroupsExp/Log, BCH, retraction ⊕L2
L3ControlPID, LQR (Riccati P=ATP+PA−PBR−1BTP+Q), geometric attitudeL3
L4Traj. Opt.Min-snap QP, differential flatnessL4
L5Image Formationp = K[R|t]P, pinhole modelL5
L6FeaturesHarris corner, SIFT, KLT optical flowL6
L7Two-View Geom.E=[t]×R, essential matrix, 8-point, triangulationL7
L8RANSACN=log(1−p)/log(1−ws), front-end outlier rejectionL8
L9NLLSGauss-Newton δ = −(JTΩJ)−1JTΩr, LML9
L10Manifold Opt.Retract: T ← T·exp(δ), on-manifold GNL10
L11VO/VIOPreintegration ΔRij, Δvij, Δpij; tight VIO fusionL11
L12Place Recog.BoW+TF-IDF, vocab tree, RANSAC verify, false-positive riskL12
L13SLAM Back-EndFactor graph, H=JTΩJ (sparse), Cholesky, iSAM2, Schur complementL13
L14Robust SLAMM-estimators, IRLS, GNC, switchable constraints, SE-SyncThis lesson

The full SLAM pipeline in one picture

Sensors
Camera (L5), IMU (L11), LiDAR — raw pixels, acceleration, point clouds
Front-End
Feature detect & track (L6) → two-view geometry (L7) → RANSAC clean (L8) → relative poses
VIO Odometry
Preintegrate IMU (L11) + visual landmarks → dead-reckoning trajectory with bounded drift
Place Recognition
BoW retrieval (L12) + geometric verify → loop-closure candidates (with possible false positives!)
PCM Filter
Pairwise consistency → remove most outlier loop closures before they enter the graph (L14 Ch6)
Robust Back-End
Factor graph (L13) + GNC/IRLS (L14) → globally consistent map despite remaining outliers
Certify
SE-Sync (L14 Ch7) → certificate of global optimality or trigger re-optimization

Related Gleams across the site

The SLAM / robust estimation thread connects to other series. From the Estimation section: Bayes Filter (the filtering vs smoothing distinction), Extended Kalman Filter (the original sequential SLAM approach), Modern SLAM (systems-level: ORB-SLAM3, VINS-Mono). From AI Architectures: NeRF & 3D Gaussian Splatting (the dense neural mapping frontier). From Decision/Control: Model-Based RL (the robot needs this map for planning).

The Feynman closure: "What I cannot create, I do not understand." You have now built every piece of the visual SLAM pipeline from scratch — the geometry, the optimization, the sparse linear algebra, the place recognition, and the robustness machinery. The next step is to implement it. Take GTSAM or g2o, synthesize a pose graph with injected outliers, run GNC, and check if SE-Sync certifies the result. That is the capstone exercise of this series.
python
# Squared vs Huber vs Cauchy vs TLS cost comparison
import numpy as np

def rho_sq(u): return 0.5 * u**2
def rho_huber(u, k=1.0):
    return np.where(np.abs(u) <= k,
                    0.5*u**2,
                    k*(np.abs(u) - k/2))
def rho_cauchy(u, c=1.0):
    return (0.5*c**2) * np.log(1 + (u/c)**2)
def rho_tls(u, c=1.0):
    return np.minimum(u**2, c**2)

# IRLS weight functions: w(u) = rho'(u) / u
def w_huber(u, k=1.0):
    return np.where(np.abs(u) <= k, 1.0, k/np.abs(u))
def w_cauchy(u, c=1.0):
    return 1.0 / (1 + (u/c)**2)
def w_tls(u, c=1.0):
    return np.where(np.abs(u) <= c, 1.0, 0.0)

# IRLS on 1D data with one outlier
z = np.array([0.1, -0.2, 0.15, -0.1, 5.0])  # last = outlier
x = 0.0  # initial guess
for it in range(10):
    u = (z - x)   # residuals (sigma=1 assumed)
    w = w_cauchy(u, c=1.0)
    x = np.sum(w * z) / np.sum(w)
    print(f"iter {it}: x={x:.4f}, w_outlier={w[-1]:.4f}")

# GNC-TLS schedule
def gnc_tls_weight(u, c=1.0, mu=1.0):
    # Yang et al. 2020 Eq. (11)
    num = c**2 * mu
    denom = (mu + 1) * c**2 + u**2
    rho_mu = num * u**2 / denom
    drho = (2 * num * u * denom - num * u**2 * 2 * u) / denom**2
    w = drho / (2 * u) if abs(u) > 1e-9 else num / denom
    return max(0, w)

mu = 1.0
x = 0.0
for step in range(20):
    u = z - x
    w = np.array([gnc_tls_weight(ui, c=1.0, mu=mu) for ui in u])
    x = np.sum(w * z) / (np.sum(w) + 1e-9)
    mu *= 1.4
    print(f"mu={mu:.1f} x={x:.4f} w_out={w[-1]:.4f}")