Unmixing what was blended. Hand a machine a tangle of overlapping signals — voices, brain waves, mixed audio — and it pulls out the original, independent sources. The math behind solving the cocktail party problem.
You're at a crowded party. Two people are talking at once, and two microphones are recording in different corners of the room. Each microphone picks up both voices, but in different proportions — mic 1 is closer to Alice so it hears mostly her with a bit of Bob; mic 2 is closer to Bob so it's mostly him with a bit of Alice. Each recording is a mixture, a blend of the two voices.
Now the seemingly impossible question: given only the two mixed recordings, can you recover the two original voices, Alice's speech alone and Bob's speech alone? No information about the room, the microphone positions, or what either person said. Just the tangled mixtures. It sounds like trying to un-bake a cake. Astonishingly, it's not just possible: for linear mixtures of independent sources it's a well-understood problem, and the algorithm that solves it is Independent Component Analysis, ICA.
The top two waveforms are the original sources — two independent signals (think Alice's voice, Bob's voice). The bottom two are what the microphones record: each is a different blend of both sources. You can see the mixtures look like neither original cleanly — they're tangled. ICA's job: recover the top two, given only the bottom two.
This "blind source separation" problem is everywhere, far beyond parties. Brain imaging: EEG electrodes on the scalp each record a mixture of signals from many brain regions — ICA separates them into distinct neural sources (and pulls out artifacts like eye-blinks). Medical: separating a fetal heartbeat from the mother's in a combined ECG. Finance: finding independent driving factors behind correlated stock movements. Audio: isolating instruments from a mix. Anywhere independent sources get blended together, ICA can pull them back apart.
Let's make the blending precise. There are some hidden source signals, call them s, that are independent (Alice's voice, Bob's voice). What each microphone records is a linear mixture: a weighted sum of the sources, where the weights depend on how close that mic is to each speaker. With two sources and two mics:

$$
\begin{aligned}
x_1 &= a_{11} s_1 + a_{12} s_2 \\
x_2 &= a_{21} s_1 + a_{22} s_2
\end{aligned}
$$
The x's are what we observe (the recordings); the s's are the hidden sources we want; and the a's are the mixing weights: how much of each source reaches each microphone. Packaging the weights into a matrix A, the whole thing is just one clean equation:

$$x = As$$
The mixing matrix A captures the entire room geometry in a few numbers. We don't know A (we don't know the room), and we don't know s (the original voices) — all we have is x, the mixed recordings, at many points in time. Two unknowns, one observation. That's the challenge.
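To make the mixing model concrete, here is a minimal NumPy sketch. The two sources (a sine wave and uniform noise) and the particular weights in `A` are illustrative assumptions, not values from the lesson; the point is only that each microphone row is a weighted sum of the source rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
t = np.linspace(0, 8, n)

# Two hypothetical independent sources: a sine wave and uniform noise.
s = np.vstack([np.sin(2 * t), rng.uniform(-1, 1, n)])  # shape (2, n)

# Hypothetical mixing weights: row i says how much of each source mic i hears.
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])

# x = A s, applied to every time sample at once.
x = A @ s  # shape (2, n); column k is (mic1, mic2) at time k

# The first row of x is exactly a11*s1 + a12*s2.
mic1_by_hand = A[0, 0] * s[0] + A[0, 1] * s[1]
```

In a real recording only `x` would be available; `s` and `A` are visible here because we built the mixture ourselves.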
The top two signals are the independent sources. Adjust the mixing weights — how much of each source bleeds into each microphone — and watch the bottom two mixed signals change. With balanced weights the mixtures become a thorough blend; with the identity (no cross-mixing) they stay separate. This blending is exactly what the room does to the voices.
If mixing is "multiply the sources by A," then un-mixing is just the reverse: multiply the mixtures by A's inverse. Call that the unmixing matrix W = A⁻¹. If we could find W, recovering the sources would be one multiplication away:

$$s = Wx$$
So the entire problem of separating the voices reduces to one thing: find the right unmixing matrix W. Once you have W, you apply it to each recorded sample and out come the separated sources. The whole game is finding those few numbers in W.
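As a sanity check on the algebra, the sketch below cheats: it assumes we know the mixing matrix `A` (with made-up weights) and inverts it directly. ICA never has this luxury, but the example shows that the right `W` really does undo the mixing exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sources and mixing weights, chosen only for illustration.
s = rng.uniform(-1, 1, size=(2, 500))
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])
x = A @ s  # the mixtures we would observe

# With A known (the unrealistic part), the unmixing matrix is its inverse.
W = np.linalg.inv(A)
s_hat = W @ x  # recovered sources

max_error = np.max(np.abs(s_hat - s))
```

Here `max_error` is at the level of floating-point rounding, so the recovery is exact for practical purposes.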
But here's the puzzle: we don't know A, so we can't just invert it — we have to find W directly from the mixtures alone. How can we possibly know when we've found the right W, with nothing to compare against? This is where ICA's brilliant insight comes in, and it's the subject of the next chapter. The short version: we recognize the correct W by the fact that it makes the recovered signals independent. When the un-mixed outputs stop looking like blends and start looking like distinct, unrelated signals, we know we've separated them.
The mixtures are fixed. Adjust the two unmixing weights in W and watch the recovered signals at the bottom. Most settings give garbage (still-blended messes). But there's a special W that makes the two outputs match the original clean sources. Hunt for it — or press Auto-solve (ICA) to let the algorithm find it instantly.
How do we recognize the correct unmixing without knowing the sources? The answer lives in a beautiful geometric picture. Forget the waveforms for a moment and instead plot the two mixtures against each other: at every instant in time, take (mic1 value, mic2 value) and drop a point at that coordinate. Over many time samples, you get a scatter cloud — and the shape of that cloud reveals the hidden structure.
Here's the key fact. When two independent sources are mixed, the scatter cloud forms a tilted, sheared shape — for many signals, a parallelogram or rhombus. The edges of that shape point along the original source directions. Mixing rotated and sheared the natural axes; the independent sources are hiding in plain sight as the edges of the cloud. ICA's job is to find those edges — the directions along which the data, once projected, looks like a single clean source rather than a blend.
What makes a recovered signal "a single clean source" rather than "a blend"? Non-Gaussianity. A blend of independent signals tends to look more Gaussian (more bell-curved) than the individual sources. This is the Central Limit Theorem in action: sums of independent things drift toward the bell curve. So to un-blend, we run it backwards: find the directions that make the projected signal least Gaussian. The most non-Gaussian directions are the original, unmixed sources. Independence and maximal non-Gaussianity turn out to be two faces of the same target.
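The drift toward the bell curve can be checked numerically. The sketch below uses excess kurtosis (zero for a Gaussian) as the yardstick and assumes uniform sources; the helper `excess_kurtosis` is written here for illustration and is not taken from any library.

```python
import numpy as np

def excess_kurtosis(y):
    """Fourth standardized moment minus 3, so a Gaussian scores about 0."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

rng = np.random.default_rng(0)
n = 50_000

s1 = rng.uniform(-1, 1, n)  # one uniform source
s2 = rng.uniform(-1, 1, n)  # an independent second source
mix = s1 + s2               # an equal blend of the two
gauss = rng.normal(size=n)  # reference bell curve

k_source = excess_kurtosis(s1)    # theory for a uniform: -1.2
k_mix = excess_kurtosis(mix)      # theory for the equal blend: -0.6
k_gauss = excess_kurtosis(gauss)  # theory: 0
```

The blend lands between the source and the Gaussian: still non-Gaussian, but measurably closer to zero than either source on its own.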
Each dot is one time-instant, plotted as (mic1, mic2). Notice the cloud isn't a round blob — it's a sheared shape whose edges (orange lines) are the independent source directions, found by ICA. Compare them to the PCA axes (blue): PCA finds perpendicular max-variance directions, but the true sources aren't perpendicular — ICA's edges follow the actual cloud geometry. Adjust the mixing and watch ICA's axes track the cloud's true edges.
Now the most surprising and important fact about ICA, one that trips up everyone the first time: ICA cannot separate Gaussian sources. If the original signals were bell-curved, the whole method collapses (more precisely, at most one of the sources may be Gaussian). Understanding why reveals the deep reason ICA works at all.
The culprit is symmetry. A 2D Gaussian with independent, equal-variance components has a scatter cloud that's a perfectly round, rotationally symmetric blob: a circular splat. And here's the fatal problem: a circle looks identical no matter how you rotate it. So if you mix two Gaussian sources, the resulting cloud is still a featureless round (or elliptical) blob with no distinguishable edges. There's no "shape" to lock onto, because every rotation of a Gaussian blob is just another equally-valid Gaussian blob. The original source directions are washed out, mathematically unrecoverable.
Non-Gaussian sources break this symmetry, and that's what saves us. A non-Gaussian signal — a sine wave, a sawtooth, speech, anything with a distinctive non-bell shape — produces a scatter cloud with genuine corners and edges: a parallelogram, a star, a square. These have a preferred orientation, so the source directions are visible as the cloud's edges. The more non-Gaussian the sources, the sharper the corners, the easier the separation. Non-Gaussianity is not a nuisance assumption — it is the very thing that makes the sources identifiable.
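A small experiment makes the symmetry argument tangible. The sketch below rotates two unit-variance clouds, one Gaussian and one uniform, through a range of angles and measures the excess kurtosis of the first coordinate at each angle. The helper names and the choice of kurtosis as the measure are assumptions made for illustration.

```python
import numpy as np

def excess_kurtosis(y):
    """Fourth standardized moment minus 3, so a Gaussian scores about 0."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

def rotate(data, theta):
    """Rotate a (2, n) cloud of points by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ data

rng = np.random.default_rng(0)
n = 50_000
gauss = rng.normal(size=(2, n))                           # round Gaussian blob
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))  # unit-variance square

angles = np.linspace(0, np.pi / 2, 19)  # 0 to 90 degrees in 5-degree steps
k_gauss = np.array([excess_kurtosis(rotate(gauss, a)[0]) for a in angles])
k_unif = np.array([excess_kurtosis(rotate(unif, a)[0]) for a in angles])

# How much the non-Gaussianity score changes as the cloud is rotated.
spread_gauss = k_gauss.max() - k_gauss.min()
spread_unif = k_unif.max() - k_unif.min()
```

For the Gaussian cloud the score stays near zero at every angle, so no rotation stands out. For the uniform cloud the score swings clearly with the angle, and that swing is the signal ICA follows.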
Toggle the source type. With Gaussian sources, the mixed cloud is a round/elliptical blob — rotationally symmetric, no edges, so ICA has nothing to grab and separation is impossible. With non-Gaussian sources (uniform), the cloud is a sharp parallelogram with obvious edges (the source directions) — ICA locks right on. Same mixing; the only difference is the sources' shape.
Time to watch the whole thing work. Below: two original source signals, a mixing matrix that blends them into two tangled recordings, and ICA recovering the originals — with the scatter cloud showing how it does it. This is the cocktail party problem, solved before your eyes.
Top: original sources. Middle: mixed recordings. Bottom: ICA's recovered signals. The scatter (right) shows the (mic1, mic2) cloud and the source directions ICA finds. Press Run ICA to separate.
(No quiz — the lab is the test. If you can look at the tangled scatter cloud and point to where the source directions are — the cloud's edges — you understand how ICA sees what we hear as noise.)
Both PCA and ICA find a new set of directions (a new basis) for your data — but they're after fundamentally different things, and seeing the contrast crystallizes what each one does.
| | PCA | ICA |
|---|---|---|
| Goal | Directions of maximum variance | Directions whose components are statistically independent |
| Achieves | Uncorrelated components | Independent components |
| Uses | 2nd-order stats (covariance) | Higher-order stats (non-Gaussianity) |
| Directions | Always orthogonal (perpendicular) | Need not be orthogonal |
| Gaussian data | Works fine | Fails (no unique answer) |
| Typical use | Compression, dimensionality reduction | Source separation (unmixing) |
The crux is in two rows. PCA's directions are always perpendicular and it only guarantees uncorrelated outputs. But the true source directions in a mixture are usually not perpendicular — the mixing sheared them at an angle. So PCA, forced to find perpendicular axes, cannot align with the real sources; it finds the max-variance axes instead, which are a blend. ICA, free to find non-perpendicular directions and demanding full independence (not just uncorrelatedness), locks onto the actual source edges.
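The gap between "uncorrelated" and "independent" can be seen with PCA done by hand in NumPy. The sketch below assumes uniform sources and an illustrative mixing matrix `A`; it checks that the PCA axes are perpendicular, that the true source directions (the columns of `A`) are not, and that each PCA output is still a blend of both sources.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Unit-variance uniform sources and hypothetical mixing weights.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])
x = A @ s

# PCA by hand: eigenvectors of the covariance matrix of the mixtures.
xc = x - x.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(xc))
pcs = eigvecs.T @ xc  # projections onto the two PCA axes

# PCA axes are perpendicular by construction.
pca_axes_dot = eigvecs[:, 0] @ eigvecs[:, 1]

# The true source directions are the columns of A; measure their angle.
a1, a2 = A[:, 0], A[:, 1]
cos_angle = a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2))
source_angle_deg = np.degrees(np.arccos(cos_angle))

# PCA outputs are uncorrelated with each other...
pc_corr = np.corrcoef(pcs)[0, 1]
# ...yet each one correlates with BOTH sources (rows: PCs, columns: sources).
pc_vs_source = np.corrcoef(np.vstack([pcs, s]))[:2, 2:]
```

With these weights the source directions sit roughly 50 degrees apart, so no pair of perpendicular axes can line up with both, and every entry of `pc_vs_source` is well away from zero.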
The same sheared scatter cloud of mixed signals, with both methods' directions drawn. PCA (blue) finds perpendicular max-variance axes — which slice across the cloud, not along its edges. ICA (orange) finds the cloud's actual edges — the true source directions, even though they're not perpendicular. For separating sources, only ICA's answer is right.
How does ICA actually find W in practice? It's an optimization, in the same family as everything else in this course. We turn "make the outputs independent" into a number to maximize, then climb it.
The recipe has two stages. First, whiten the data (this is where PCA helps): center it and rescale so the mixtures are uncorrelated with equal variance, turning the sheared cloud into a more regular shape. After whitening, the only freedom left is a rotation, which shrinks the search to finding one angle (in 2D) or one rotation matrix (in general). Second, rotate to maximize non-Gaussianity: search for the rotation that makes the projected outputs as far from bell-curved as possible, measured by statistics like excess kurtosis (a measure of "peakedness/tailedness," zero for a Gaussian) or negentropy. The famous FastICA algorithm does exactly this, climbing toward maximal non-Gaussianity with a fast fixed-point iteration. Equivalently, you can derive ICA as maximum likelihood: choosing W to make the recovered sources most probable under an assumed non-Gaussian source distribution, then doing gradient ascent.
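The two-stage recipe can be sketched in plain NumPy for the 2D case. This is a brute-force illustration, not FastICA: it whitens the mixtures, then simply tries a grid of rotation angles and keeps the one with the largest total absolute excess kurtosis. The function names (`whiten`, `ica_2d`), the grid search, the uniform sources, and the mixing weights are all assumptions made for the example.

```python
import numpy as np

def excess_kurtosis(y):
    """Fourth standardized moment minus 3, so a Gaussian scores about 0."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

def whiten(x):
    """Center x (shape (2, n)) and rescale so its covariance is the identity."""
    xc = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(xc))
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T  # whitening matrix
    return V @ xc, V

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def ica_2d(x, n_angles=360):
    """Stage 1: whiten. Stage 2: pick the rotation with maximal non-Gaussianity."""
    z, V = whiten(x)
    # Angles beyond 90 degrees only swap or flip the outputs, so skip them.
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    scores = [sum(abs(excess_kurtosis(row)) for row in rotation(a) @ z)
              for a in angles]
    best = angles[int(np.argmax(scores))]
    return rotation(best) @ V  # full unmixing matrix W

# Demo on a synthetic mixture (sources and weights are made up).
rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(2, 20_000))
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])
x = A @ s

W = ica_2d(x)
s_hat = W @ (x - x.mean(axis=1, keepdims=True))

# |correlation| between each recovered signal (rows) and each true source (columns).
match = np.abs(np.corrcoef(np.vstack([s_hat, s]))[:2, 2:])
```

Each column of `match` should contain one entry close to 1, meaning every true source is picked up by one of the outputs. Which output it lands in, and with what sign, is not determined. A grid search is fine for a single angle in 2D, but it does not scale to many sources, which is where iterative schemes such as FastICA come in.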
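ICA can recover the sources, but with two unavoidable ambiguities, both of which, happily, don't matter for real applications:

- Order: ICA cannot tell which recovered signal is "source 1" and which is "source 2." Swapping the sources and swapping the matching columns of A produces exactly the same mixtures.
- Scale and sign: multiplying a source by any nonzero constant and dividing the matching column of A by the same constant also leaves the mixtures unchanged, so the loudness and polarity of each recovered source are arbitrary.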
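Both ambiguities, the order of the sources and their scale or sign, follow directly from the mixing equation, as this small sketch shows. The sources, the mixing weights, and the particular swap and rescaling factors are arbitrary values picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(2, 1000))
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])
x = A @ s

# Reorder the sources (P), then rescale one and flip the sign of the other (D).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
D = np.diag([2.0, -0.5])
M = D @ P

s_alt = M @ s                 # a different-looking set of "sources"
A_alt = A @ np.linalg.inv(M)  # mixing matrix adjusted to compensate
x_alt = A_alt @ s_alt         # identical mixtures
```

Since `x_alt` equals `x`, nothing in the recordings can distinguish `(A, s)` from `(A_alt, s_alt)`. In practice this is handled by conventions such as normalizing each recovered source to unit variance and labeling the outputs after the fact.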
You've reached the end of the classical ML core — and ICA is a fitting finale, because it inverts the lesson every other method taught (Gaussian-is-good) and shows the power of going after a stronger goal (independence, not mere uncorrelatedness). You can now separate the inseparable.
| Concept | What it means |
|---|---|
| Goal | Blind source separation: recover independent sources from their observed mixtures (the cocktail party problem). |
| Mixing model | x = As: observed mixtures = (unknown) mixing matrix A times the hidden sources s. |
| Unmixing | Find W = A⁻¹; then s = Wx recovers the sources. |
| The criterion | The right W makes the outputs statistically independent — equivalently, maximally non-Gaussian. |
| Non-Gaussian required | Gaussian sources give a rotationally symmetric mix with no recoverable directions. Sources must be non-Gaussian. |
| vs PCA | PCA: uncorrelated, orthogonal, max-variance (2nd-order). ICA: independent, possibly non-orthogonal (higher-order). Independence > uncorrelatedness. |
| Algorithm | Whiten (PCA), then rotate to maximize non-Gaussianity (FastICA / max-likelihood). |
| Ambiguities | Source order and scaling/sign are unrecoverable — but harmless for applications. |
```python
import numpy as np
from scipy import signal
from sklearn.decomposition import FastICA

# Classic demo: mix a sine and a sawtooth, then separate them.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), signal.sawtooth(3 * t)]  # 2 non-Gaussian sources
A = np.array([[1, 1], [0.5, 2]])                  # mixing matrix
X = S @ A.T  # observed mixtures, shape (n_samples, n_microphones)

ica = FastICA(n_components=2, whiten='unit-variance')
S_recovered = ica.fit_transform(X)  # the separated independent sources
A_est = ica.mixing_                 # estimated mixing matrix A
# X ≈ S_recovered @ A_est.T + ica.mean_: we unmixed without ever knowing A.
# The recovered columns match the sine and sawtooth up to order, scale, and sign.
```
This is the last of the CS229 classical machine learning lessons. Together they form a complete foundation: linear and logistic regression, unified by GLMs; the generative view; the bias-variance tradeoff and model selection; and unsupervised learning — k-means, EM/GMM, PCA, and now ICA. You have the toolkit a working ML engineer reaches for daily.
"The sciences do not try to explain, they hardly even try to interpret, they mainly make models." — John von Neumann. ICA's model — independent sources, linearly mixed — is simple enough to write in one line and powerful enough to pull a single voice from a roaring crowd.