Programmable Logic — Scherz & Monk, Chapter 14

Chapter 0: The One-Chip Problem

You are building a digital clock. The design calls for a 10-stage counter to divide down a crystal oscillator, a BCD-to-seven-segment decoder to drive the display digits, and a couple of glue gates to handle the colon blink and the alarm compare. Forty years ago you would have walked to the parts bin and pulled a 7490 decade counter, a 7447 decoder/driver, a 7408 quad AND, and a 7432 quad OR: a dozen chips, a rat's nest of wires, and a board the size of a paperback.

No real product is built that way anymore. The whole design has to collapse onto one chip. There are exactly two ways to do that, and they are philosophically opposite.

Option 0 — The old way: discrete logic ICs

A counter chip, a decoder chip, gate chips. Each does one fixed job. Genuine parallel hardware, but a dozen packages, fixed forever, and a layout nightmare. Nobody ships this.

↓

Option A — A microcontroller (Ch 13)

Move the hardware problem into software. One CPU, one program counter, executing your "logic" as a sequence of instructions. Cheap and flexible — but it imitates the gates one step at a time. A single thread aping parallel hardware. It can fall behind.

↓

Option B — A CPLD or FPGA (this chapter)

Describe the counter, decoder and gates you want — and the chip becomes that custom hardware. Every block runs in genuine parallel, all the time. You don't program it like a CPU; you configure its fabric.

The microcontroller is a sequential machine pretending to be your circuit. The FPGA is your circuit. That is the single idea this whole chapter unpacks. A field-programmable gate array (FPGA) is a sea of tiny configurable logic cells plus a programmable wiring matrix; you hand it a description of the hardware you want, and a synthesis tool melts that description down into cell-and-wire settings that make the chip behave exactly like your schematic.

The worked comparison: divide a clock by 10

Say the divide-by-10 stage must toggle an output once every 10 input clock pulses. On a microcontroller the CPU spins a counter variable: increment, compare to 10, branch, maybe reset, repeat — perhaps 5 instructions per pulse. On the FPGA, four flip-flops wired as a counter just are the divider; on every clock edge they all update simultaneously in one gate delay (nanoseconds), no instructions involved.

Numbers make it vivid. Suppose the input clock is 12 MHz. The FPGA divider runs at the full 12 MHz because the flip-flops update at the clock edge directly. The microcontroller, running at 12 MHz but spending ~5 cycles of overhead per pulse, can only "service" the divider at 12 MHz ÷ 5 = 2.4 MHz if it does nothing else. Add the decoder and the gates to the software loop and that ceiling drops further. The FPGA does all three jobs at once with zero interference.
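The ceiling arithmetic above can be checked in a few lines of Python. This is only a back-of-envelope sketch: the function name `max_service_rate_hz`, the flat cycles-per-pulse cost, and the extra 10 cycles for the decoder and gates are assumptions for illustration, not measurements of any real microcontroller.

```python
def max_service_rate_hz(cpu_hz: float, cycles_per_pulse: int) -> float:
    """Highest pulse rate a CPU can service if each pulse costs a fixed number of cycles."""
    return cpu_hz / cycles_per_pulse

CPU_HZ = 12_000_000                                # 12 MHz, as in the text

divider_only = max_service_rate_hz(CPU_HZ, 5)      # divider alone: ~5 cycles per pulse
# Hypothetical: emulating the decoder and gates adds 10 more cycles per pulse.
with_extras = max_service_rate_hz(CPU_HZ, 5 + 10)

print(divider_only / 1e6, "MHz")   # 2.4 MHz
print(with_extras / 1e6, "MHz")    # 0.8 MHz
```

Every job added to the software loop increases the denominator, which is the sense in which the ceiling "drops further".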

Configure, don't program. The deepest distinction in this chapter: a microcontroller has fixed hardware, and you change its behavior by changing software. In an FPGA the logic itself is not fixed: you change the actual hardware by loading a configuration. The same FPGA can be a counter today, a video scaler tomorrow, a cryptographic engine next week. The silicon is generic; your bitstream gives it identity.

Sequential vs Parallel: Three Counters

Three independent counters need to run from three different clocks. Pick a mode and press Run. In FPGA mode all three are real hardware and advance together. In microcontroller mode a single instruction pointer visits one counter at a time — watch it fall behind as the workload grows.

A microcontroller and an FPGA can both replace a board full of logic chips. What is the fundamental difference in how they do it?

The FPGA is just a faster microcontroller running the same instructions
The µC imitates the logic sequentially in software; the FPGA is configured to become the actual parallel hardware
They are identical: both store a program in flash and execute it

Chapter 1: PALs and the AND-OR Plane

Before FPGAs there were PALs — programmable array logic, the first commercially successful programmable logic device (PLD). The PAL is the simplest possible programmable chip, and it teaches the central trick directly: any combinational function can be written as a sum of products, and a sum of products maps onto a grid of programmable connections.

Recall from Chapter 12: every Boolean function has a sum-of-products (SOP) form. Each product term is an AND of some inputs (each input appearing either true or complemented); the whole function is the OR of those product terms. For example, a 1-of-2 data selector that outputs A when SEL=1 and B when SEL=0 is:

OUT = (SEL · A) + (SEL̅ · B)

Two product terms, OR'd together. That is exactly the shape a PAL implements. A PAL has two planes of wires. The AND plane takes every input and its complement and feeds them across a grid of horizontal product-term lines; at each crossing sits a programmable connection (historically a literal fuse). Blow or keep each fuse and you decide which literals feed each AND gate. The OR plane then sums selected product terms into each output. In a true PAL the AND plane is programmable and the OR plane is fixed; in its ancestor the PLA both were programmable.

Worked example: program the selector

We need two product terms. Lay out six columns, one for each input and its complement: SEL, SEL̅, A, A̅, B, B̅. For the selector we want:

Product term 1: connect fuses for SEL and A → this row computes SEL·A.
Product term 2: connect fuses for SEL̅ and B → this row computes SEL̅·B.
OR plane: route both rows into OUT → OUT = SEL·A + SEL̅·B.

Check it: when SEL=1, term 1 = 1·A = A and term 2 = 0·B = 0, so OUT = A. When SEL=0, term 1 = 0 and term 2 = 1·B = B, so OUT = B. The fuse pattern fixes the truth table. Change which fuses you keep and you change the function, without rewiring a single physical trace.
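The same check can be run exhaustively with a small software model of the AND-OR plane. This is a sketch under stated assumptions: the names `SELECTOR_FUSES` and `pal_eval` and the list-of-literals representation are invented here for illustration and do not correspond to any real PAL programming format.

```python
from itertools import product

# Each row is one product term: the literals whose fuses are kept.
# A literal is (input_name, polarity); polarity False means the complemented input.
SELECTOR_FUSES = [
    [("SEL", True), ("A", True)],     # product term 1: SEL·A
    [("SEL", False), ("B", True)],    # product term 2: SEL̅·B
]

def pal_eval(fuses, inputs):
    """AND plane: each row ANDs its connected literals. OR plane: OR of all rows."""
    def literal(name, polarity):
        return inputs[name] if polarity else 1 - inputs[name]
    terms = [all(literal(n, p) for n, p in row) for row in fuses]
    return int(any(terms))

# Compare the fuse pattern against the selector's definition for all 8 input combinations.
for sel, a, b in product((0, 1), repeat=3):
    out = pal_eval(SELECTOR_FUSES, {"SEL": sel, "A": a, "B": b})
    assert out == (a if sel else b)
```

Changing the entries in `SELECTOR_FUSES` plays the role of keeping or blowing different fuses: the evaluation code stays the same and only the pattern changes.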

Fuses encode literals. A PLD is "programmed" by deciding, at each grid crossing, whether an input reaches a product-term line. Early PALs used one-time fuses (program once, done). Later devices used EEPROM or SRAM cells so the array could be reprogrammed. But the logical model never changed: SOP = a programmable AND array feeding an (often fixed) OR array. Master this and the CPLD macrocell in the next chapter is obvious.

Build a PAL: Programmable AND-OR Plane

Two inputs A and B (with complements) feed three product-term rows. Click the grid dots to connect a literal into a row (an AND term). Each connected row that is enabled feeds the OR output. The live truth table on the right shows OUT for all four input combinations. Try to build AND, then OR, then the selector.

In a classic PAL, which plane is programmable and which is fixed?

The OR plane is programmable, the AND plane is fixed
The AND plane is programmable (you choose the product terms); the OR plane is fixed
Both planes are fixed; only the outputs are programmable

Chapter 2: CPLD vs FPGA

The PAL scales badly. A single AND-OR plane works for a function of a few inputs, but a real design has thousands of signals; one giant plane would be enormous and slow. Two architectures grew out of the PAL to fix this, and they took opposite routes.

CPLD: many PALs on one chip

A complex programmable logic device (CPLD) is, roughly, a bunch of PAL-style blocks — called macrocells — stitched together by a central, predictable interconnect. Each macrocell is a sum-of-products engine (an AND-OR plane) feeding a flip-flop. CPLDs are the direct successors of the PAL: same SOP DNA, just replicated and interconnected. They are typically built on EEPROM/flash, so the configuration is non-volatile — it survives power-off, and the chip is ready the instant you apply power. Their timing is highly deterministic because every signal crosses the same predictable central matrix, which is why CPLDs are loved for glue logic, address decoding, and state machines that must wake up instantly.

FPGA: a sea of lookup tables

An FPGA abandons the SOP plane entirely. Instead of macrocells it uses a fine-grained fabric of small lookup tables (LUTs) — tiny memories whose stored bits are a truth table — each paired with a flip-flop, and surrounded by a rich, flexible routing network. There are far more of these cells (modern parts hold 200,000 to several million), and the routing is much more general, which is what lets FPGAs implement huge, deeply pipelined designs. The price: the configuration lives in SRAM, which is volatile. Cut the power and the design evaporates; it must be reloaded at every power-up (more on this in Chapter 4 of this lesson).

Property	CPLD	FPGA
Core logic element	Macrocell (sum-of-products AND-OR plane)	Lookup table (LUT) + flip-flop
Heritage	Successor to the PAL	Gate-array fabric, SRAM-configured
Capacity	Hundreds to a few thousand cells	200,000 to several million blocks
Configuration memory	EEPROM / flash — non-volatile	SRAM — volatile, reloaded at power-up
Timing	Very deterministic (central matrix)	Depends on place & route; more variable
Sweet spot	Glue logic, decoders, instant-on state machines	Large parallel systems, DSP, video, SoC

Worked example: which one for the clock?

Our digital clock from Chapter 0 needs a 10-stage divider, a decoder, and a few gates: a few dozen flip-flops and a hundred-ish gates. That fits comfortably in a small CPLD with maybe 64–128 macrocells, and the non-volatility means the clock starts keeping time the instant batteries go in, with no external memory. Now suppose we instead want to overlay an HD video clock with anti-aliased digits and a spectrum-analyzer alarm tone: thousands of logic cells, multipliers, deep pipelines. That is FPGA territory, and we accept a ~100 ms boot from external flash. The architecture follows the workload.

Same goal, opposite grain. A CPLD spends its silicon on wide, shallow SOP logic with rock-solid timing and instant-on behavior. An FPGA spends its silicon on a vast, fine-grained, deeply routable fabric that trades determinism and instant-on for sheer scale. "CPLD = macrocells / sum-of-products; FPGA = LUTs" is the one-line answer to the most common interview question in this field.

Architecture Compare: Macrocell vs LUT Fabric

Left, a CPLD macrocell: a sum-of-products plane feeding one flip-flop. Right, a slice of FPGA fabric: many small LUT+FF cells joined by routing. Toggle the view to highlight where the configuration lives (volatile SRAM vs non-volatile flash).

What is the core architectural difference between a CPLD and an FPGA?

CPLDs are analog, FPGAs are digital
They are the same chip with different names
CPLDs use sum-of-products macrocells (non-volatile); FPGAs use LUT-based logic blocks (volatile SRAM config)

Chapter 3: Inside the FPGA — The LUT

Here is the most beautiful idea in the whole chapter, and it is almost embarrassingly simple. How does an FPGA implement any logic function of its inputs without having any actual gates wired up? It uses a lookup table — and a lookup table is just a tiny ROM.

Think back to ROM as addressable storage. A ROM with k address lines has 2^k locations; you put an address in, the byte at that location comes out. Now strip the ROM down to one bit wide. A k-input LUT is a 2^k × 1-bit ROM. Its k inputs are the address; the single bit stored at that address is the output. And here is the punchline: if you fill those 2^k stored bits with the truth table of the function you want, the LUT computes that function exactly. The stored bits don't encode the logic — they are the truth table.

k-input LUT = 2^k × 1-bit ROM → inputs = address, stored bits = truth table

Worked example A: a 2-input LUT is four memory cells

Take k = 2. Then 2^k = 2² = 4 memory cells, addressed by the input pair (A,B) in order 00, 01, 10, 11. Whatever four bits you store, you get that function. Toggle the four bits and you change the gate:

Address (A B)	00	01	10	11	Gate
Stored bits	0	0	0	1	AND
Stored bits	0	1	1	1	OR
Stored bits	0	1	1	0	XOR
Stored bits	1	1	1	0	NAND

Same four physical memory cells, same silicon, four completely different gates — chosen entirely by what you wrote into the cells. This is why an FPGA needs no fixed gates: every cell can become any 2-input gate by reloading four bits.
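A 2-input LUT is small enough to model in one line of Python, which makes the "inputs are the address" idea concrete. This is an illustrative sketch: `lut2` and `GATES` are names chosen here, and the address order (A as the high bit, B as the low bit) follows the table above.

```python
def lut2(bits, a, b):
    """2-input LUT: (A, B) forms the address, the stored bit is the output."""
    return bits[(a << 1) | b]          # address order 00, 01, 10, 11

# The four fillings from the table above.
GATES = {
    "AND":  [0, 0, 0, 1],
    "OR":   [0, 1, 1, 1],
    "XOR":  [0, 1, 1, 0],
    "NAND": [1, 1, 1, 0],
}

for name, bits in GATES.items():
    outputs = [lut2(bits, a, b) for a in (0, 1) for b in (0, 1)]
    print(f"{name:4s} {outputs}")      # the outputs simply replay the stored bits
```

The function body never changes between gates. Only the four stored bits differ, which is the whole point of the LUT.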

Worked example B: counting the functions a LUT can be

If a 2-input LUT has 4 cells, and each cell is independently 0 or 1, then the number of distinct fillings is 2⁴ = 16. Sixteen possible configurations — and that is exactly the number of distinct Boolean functions of two variables (AND, OR, XOR, NAND, NOR, XNOR, the two constants, the four "pass/invert one input" functions, and so on). A 2-input LUT can be any of them. In general:

Functions a k-input LUT can implement = 2^(2^k)

k = 2: 2^(2²) = 2⁴ = 16 functions.
k = 4: 2^(2⁴) = 2¹⁶ = 65,536 functions, from 16 stored bits.
k = 6: 2^(2⁶) = 2⁶⁴ ≈ 1.8 × 10¹⁹ functions, from a 64 × 1-bit ROM.

That last line shows what the 6-input LUT found in many modern FPGAs buys you: 64 configuration bits per cell cover literally any function of six inputs. Multi-million-cell parts give you millions of these universal little machines, each ready to be any 6-input gate.
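The counting argument is easy to confirm numerically. The sketch below (function name `function_count` is an assumption of this example) evaluates the formula and, for k = 2, brute-forces every possible filling of the four cells to show that the count really is 16.

```python
from itertools import product

def function_count(k: int) -> int:
    """Number of distinct truth tables a k-input LUT can hold: 2^(2^k)."""
    return 2 ** (2 ** k)

# Brute-force check for k = 2: enumerate every filling of the 2^2 = 4 cells.
fillings = set(product((0, 1), repeat=2 ** 2))

print(len(fillings))          # 16
print(function_count(4))      # 65536
print(function_count(6))      # 18446744073709551616
```

Python integers are arbitrary precision, so the k = 6 value is exact rather than the rounded 1.8 × 10¹⁹.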

The truth table is the program. A CPU executes instructions to evaluate logic. A LUT skips evaluation entirely: it pre-computes the answer for every possible input and just looks it up. There is no "running" — the moment the address (inputs) is present, the stored bit (output) appears in one memory-read delay. This is why FPGA logic is fast and inherently parallel: every LUT does its lookup at the same instant, independently.

2-Input LUT Bit-Toggler (Showcase Widget)

Four memory cells at addresses 00, 01, 10, 11. Click a cell to flip its stored bit. The named gate that results appears live, along with the configuration count (one of 16). Set the live inputs A and B to watch the address light up and the output read out of the table.

A 4-input lookup table stores 16 configuration bits. How many distinct logic functions can it be programmed to compute?

16 (one per stored bit)
256
65,536 (that is 2¹⁶, any function of its 4 inputs)

Chapter 4: The Logic Block & the Fabric

A bare LUT computes combinational logic, but real circuits need memory — flip-flops to hold state, to build counters and state machines. So the FPGA's basic repeating unit pairs a LUT with a flip-flop. This unit is the logic block (Xilinx calls a cluster of them a configurable logic block, or CLB; the fundamental cell is sometimes a "logic element").

LUT

A 2^k×1 ROM holding a truth table — computes any function of its k inputs.

→

Flip-flop (optional)

A configuration bit chooses: route the LUT output straight out (combinational) or through a D flip-flop clocked by a global clock (registered/sequential).

→

Routing matrix

Programmable switches connect this block's output to other blocks' inputs — this is how individual cells wire into a real circuit.

Tile thousands to millions of these logic blocks across the chip, thread them with a programmable routing matrix (a switching network whose every connection is itself a configuration bit), and ring the edge with I/O blocks that connect the fabric to physical pins. That is the whole FPGA: logic blocks + routing + I/O. Modern parts also drop in hardened helpers — block RAM, DSP multipliers, and on SoC FPGAs, full ARM CPU cores — but the soul of the device is the configurable fabric.

Configuration is volatile — and that matters

Every one of those bits — the LUT contents, the flip-flop bypass settings, every routing switch — lives in SRAM cells. SRAM is fast and infinitely re-writable, but it forgets everything when power drops. The complete set of bits is the configuration bitstream, and on most FPGA boards it is stored in a small external EEPROM/flash chip. At power-up a tiny on-chip loader streams the bitstream in and the fabric "wakes up" as your circuit — typically in under ~200 ms.

Why volatility is a fair trade. SRAM cells are tiny and unlimited-rewrite, which is exactly what lets FPGAs pack millions of reconfigurable cells and re-flash in milliseconds during development. The cost — needing an external EEPROM and a ~200 ms boot — is trivial for most systems. CPLDs make the opposite bet: flash config that is instant-on and survives power loss, at the price of far less capacity. Volatile vs non-volatile config is the FPGA-vs-CPLD trade in one sentence.

Worked example: how many bits in a bitstream?

Estimate crudely. Suppose a small FPGA has 10,000 logic blocks, each a 4-input LUT (16 bits) plus a few control bits, say ~20 configuration bits per block: that is 10,000 × 20 = 200,000 bits just for logic. Add routing — often several times the logic bits — and a real small part lands around 1–3 million configuration bits. Streaming 2,000,000 bits from a serial EEPROM at, say, 25 MHz takes 2,000,000 ÷ 25,000,000 = 0.08 s = 80 ms — comfortably inside the ~200 ms power-up budget. The arithmetic is why "FPGAs boot in a fraction of a second" is true.
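The estimate above reduces to two multiplications and a division. The following sketch repeats it in Python; `load_time_s` is a hypothetical helper, and it assumes one configuration bit is shifted in per clock with no header or startup overhead, which real devices do not guarantee.

```python
def load_time_s(total_bits: int, clock_hz: int, bits_per_clock: int = 1) -> float:
    """Time to shift a bitstream in, ignoring any header or startup overhead."""
    return total_bits / (clock_hz * bits_per_clock)

logic_bits = 10_000 * 20                 # 10,000 blocks x ~20 config bits each
total_bits = 2_000_000                   # the text's figure once routing is included
t = load_time_s(total_bits, 25_000_000)  # serial load at 25 MHz

print(logic_bits)               # 200000
print(round(t * 1000), "ms")    # 80 ms
print(t < 0.200)                # True: inside the ~200 ms budget
```

The `bits_per_clock` parameter is there only to show how a wider configuration interface would shorten the load; the text's example uses the serial case.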

FPGA Fabric: Logic Blocks, Routing & Bitstream Load

A grid of CLBs (each a LUT + flip-flop) surrounded by I/O blocks and threaded with routing channels. Click two blocks to select a source and destination, then Wire them. Hit Configure to stream the bitstream in from external EEPROM — watch blocks light up as they load. Power-cycle to see the volatile config vanish.

An FPGA's configuration lives in SRAM. What happens when power is removed and reapplied?

Nothing: the design is etched permanently into the silicon
The design is kept, but runs slower
The config is lost (volatile) and must be reloaded from external EEPROM at power-up, typically in <~200 ms

Chapter 5: Design Entry — Schematic vs HDL

You now have a chip full of universal LUTs and switchable wires. How do you tell it what circuit to become? You never set individual fuses or LUT bits by hand — that would be hopeless at scale. Instead you describe the hardware you want, and a synthesis tool maps your description onto LUTs and routing automatically. There are two ways to enter the description.

Schematic entry: draw the gates

The intuitive way, especially for beginners: drag gate and flip-flop symbols onto a canvas and wire them, exactly as in Chapter 12. For our 1-of-2 selector you would place an AND, an inverter, another AND, and an OR, and wire them per OUT = SEL·A + SEL̅·B. The tool reads the drawing and synthesizes it. Schematics are wonderfully concrete — what you draw is what you get — but they do not scale: a 4-bit counter is a manageable drawing, a 32-bit CPU is an unreadable acre of symbols.

HDL entry: describe the gates

The scalable way: a hardware description language (HDL). You write text that describes the circuit's structure and behavior, and the synthesis tool figures out the gates. The two dominant HDLs are Verilog (C-like, terse, popular in commercial and consumer work) and VHDL (verbose, strongly typed, favored in aerospace and defense where the strictness catches bugs). Our selector in Verilog is just:

module sel2(input SEL, input A, input B, output OUT);
  assign OUT = SEL ? A : B;
endmodule

That single line describes the same hardware the four-gate schematic does — and a 32-bit counter is one more line (count <= count + 1;) instead of a wall of flip-flops. This is the great leap of the chapter: the HDL looks like software but describes hardware. The synthesis tool does not compile it into instructions; it compiles it into LUT contents and routing.
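To make "compiles it into LUT contents" concrete, here is a Python sketch that derives the eight truth-table bits a single 3-input LUT would need in order to implement the selector. The address ordering (SEL, A, B) is an assumption of this example; a real synthesis tool picks its own input ordering and bit encoding.

```python
def selector(sel, a, b):
    """Same meaning as the Verilog expression  SEL ? A : B."""
    return a if sel else b

def sop(sel, a, b):
    """The sum-of-products form: SEL·A + SEL̅·B."""
    return (sel & a) | ((1 - sel) & b)

# Address = (SEL, A, B) read as a 3-bit number, SEL as the high bit.
lut_bits = [selector((addr >> 2) & 1, (addr >> 1) & 1, addr & 1) for addr in range(8)]

print(lut_bits)    # [0, 1, 0, 1, 0, 0, 1, 1]
```

The lower half of the table (SEL = 0) copies B and the upper half (SEL = 1) copies A, so the conditional expression and the four-gate schematic reduce to the same eight stored bits.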

It looks like code but it is not code. The single most common beginner mistake is reading Verilog as a program that runs top-to-bottom. It is not. Every assign and every always block describes a piece of hardware that exists simultaneously and permanently. Three counters described in three blocks are three real counters, all clocking at once — not a loop that visits them in turn. Internalize this and Verilog suddenly makes sense; miss it and nothing does.

The design flow, end to end

1. Describe

Schematic or HDL (Verilog/VHDL) capturing the desired hardware.

↓

2. Synthesize

Tool (e.g. Xilinx ISE) maps the description onto generic LUTs, flip-flops, and nets.

↓

3. Place & route

Assign each LUT to a physical block and program the routing switches to connect them.

↓

4. Generate bitstream

Produce the .bit file — the exact configuration of every cell and switch.

↓

5. Configure

Load the .bit into the FPGA (or its EEPROM). The fabric becomes your circuit.

On a hobby board like the Elbert V2 (about $29.95, built around a Xilinx Spartan XC3S50A), you run this whole flow on your laptop in Xilinx ISE, click "generate," and push the resulting bitstream to the board over USB. Minutes later the $30 chip is your custom counter.

Design Flow: From Description to Configured Fabric

Step through the flow for the 1-of-2 selector. Press Next stage to advance from HDL text, through synthesis to gates, place & route onto the fabric, and finally the loaded bitstream. Toggle the entry method to compare schematic vs HDL at the start.

A Verilog description "looks like software." What does it actually produce when synthesized for an FPGA?

A sequence of CPU instructions executed one at a time
A hardware configuration (LUT contents + routing) where independent blocks run in genuine parallel
A text file that the FPGA interprets line-by-line at runtime

Chapter 6: Verilog Essentials

Let's get concrete with the language. Verilog organizes everything into modules — a module is a block of hardware with named input and output ports. Inside, you declare signals and describe how they relate. The mental model never changes: everything you write describes wires and the logic between them, all existing at once.

Wires, registers, and buses

wire — a plain connection driven continuously by combinational logic (via assign).
reg — a signal whose value is set inside an always block; when clocked, it becomes a flip-flop (a "register"). The name is misleading — a reg is not always a hardware register.
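Bus: a multi-bit signal, e.g. reg [3:0] count; declares a 4-bit bus. Assigned in a clocked always block, it becomes four flip-flops carried together, the same four bits of state as the 4-bit ripple counter from Chapter 12, though here all four update synchronously on one clock rather than rippling.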

The always block and the sensitivity list

An always block describes logic that re-evaluates whenever something in its sensitivity list changes. The most important form, for sequential logic, triggers on a clock edge:

module counter4(input clk, input rst, output reg [3:0] count);
  always @(posedge clk) begin  // on each rising clock edge
    if (rst) count <= 4'b0000;
    else    count <= count + 1;
  end
endmodule

This describes a 4-bit counter: on every rising edge of clk, all four flip-flops update at once, either clearing to 0 or incrementing. The <= is the non-blocking assignment — it means "all these updates happen together at the clock edge," which is precisely how real flip-flops behave.
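The "all updates happen together" rule can be imitated in software by computing every next value from the current state before committing any of them. The Python sketch below is a behavioral model only: `counter4_next` and `swap_next` are names invented here, and the swap example is a common illustration of non-blocking semantics rather than something taken from the chapter.

```python
def counter4_next(count: int, rst: int) -> int:
    """Value the 4-bit register takes at the next rising clock edge."""
    return 0 if rst else (count + 1) & 0xF    # 4 bits wide, so 15 wraps to 0

def swap_next(a: int, b: int):
    """Models  a <= b; b <= a;  Both right-hand sides are sampled before either updates."""
    return b, a

count, trace = 0, []
for _ in range(18):                   # 18 rising edges with rst held low
    count = counter4_next(count, rst=0)
    trace.append(count)

print(trace)              # counts 1..15, wraps to 0, then 1, 2
print(swap_next(3, 7))    # (7, 3): the two registers exchange values
```

With blocking-style sequential updates, the swap would lose one value; sampling first and committing afterwards is what makes the two registers exchange contents cleanly.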

Worked example: the clock prescaler

Our digital clock board runs from a 12 MHz crystal, but the display digit-multiplexing wants about 1 kHz, and the seconds counter wants exactly 1 Hz. You build prescalers — counters that divide the clock by a fixed N, using f_out = f_in / N. The arithmetic:

12,000,000 Hz ÷ 12,000 = 1,000 Hz = 1 kHz
12,000,000 Hz ÷ 12,000,000 = 1 Hz

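So to get 1 kHz you build a counter that counts 12,000 input pulses and emits a one-clock pulse on its output once per cycle; for 1 Hz you count all the way to 12,000,000. (Toggling the output at each wrap instead would give a square wave at half that rate, 500 Hz for N = 12,000, so a toggling design counts to N/2.) In Verilog each is a few lines, and, crucially, both prescalers can run from the same 12 MHz clock at the same time, in separate always blocks, because they are separate hardware:

reg [13:0] cnt1k = 0;  reg tick1k = 0;   // initial values so simulation does not start at X
always @(posedge clk12)    // 12 MHz in
  if (cnt1k == 11999) begin cnt1k <= 0; tick1k <= 1; end   // one-clock pulse every 12,000 clocks
  else begin cnt1k <= cnt1k + 1; tick1k <= 0; end          // → 1 kHz pulse rate

(Note 12,000 distinct counts need a counter that reaches 11,999, requiring ⌈log₂12000⌉ = 14 bits, hence [13:0]. The bit-width arithmetic matters: a counter that is too narrow wraps early and never reaches its compare value, and synthesis or lint tools will often, but not always, warn about it.)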
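Both the bit-width rule and the divide-by-N behavior can be checked with a short Python model. This is a sketch, not a simulation of any particular tool: `counter_bits` and `pulse_clocks` are hypothetical helpers, and the model assumes the prescaler emits one pulse each time its counter wraps.

```python
def counter_bits(n: int) -> int:
    """Bits needed to count 0 .. n-1; equals ceil(log2(n)) for n >= 2."""
    return max(1, (n - 1).bit_length())

def pulse_clocks(n: int, clocks: int) -> list:
    """Clock indices at which a divide-by-n prescaler wraps and pulses."""
    cnt, hits = 0, []
    for i in range(clocks):
        if cnt == n - 1:
            cnt = 0
            hits.append(i)
        else:
            cnt += 1
    return hits

F_IN = 12_000_000
print(counter_bits(12_000))      # 14
print(pulse_clocks(8, 32))       # [7, 15, 23, 31]
print(F_IN / 12_000)             # 1000.0  pulses per second, one per wrap
print(F_IN / (2 * 12_000))       # 500.0   square-wave frequency if the output toggles per wrap
```

The last two lines show why the output style matters: a pulse per wrap gives f_in / N, while toggling on each wrap takes two wraps per output period and so gives f_in / (2N).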

Parallel by default. In software, two counting loops run one after the other unless you go out of your way to thread them. In Verilog it is the reverse: two always blocks are two pieces of hardware that run simultaneously by default, and making them interact takes deliberate effort. This inversion — parallel is free, sequential is the special case — is the whole reason FPGAs outrun microcontrollers on parallel workloads.

Verilog Counter & Prescaler Simulator

A 4-bit counter described by the always @(posedge clk) block above, plus a divide-by-N prescaler. Set the divisor N and press Step or Run; watch the 4-bit register and the divided "tick" output update on each modeled clock edge. The waveform shows that the prescaler output frequency is exactly clk ÷ N.

Two separate always @(posedge clk) blocks in a Verilog module describe two counters. How do they execute on the FPGA?

One runs, then the other, like two function calls
Only the first one synthesizes; the second is ignored
Both are independent hardware blocks running in genuine parallel, every clock edge

Chapter 7: The Parallel Payoff — Modular Design & the Race

Everything converges here. The reason you reach for an FPGA over a microcontroller is the same reason this chapter exists: genuine parallelism. To prove it, we run a race — three counters on three independent clocks — in both worlds, and watch the microcontroller's single instruction pointer lose.

Modular, parameterized design

First, the engineering practice that makes big FPGA designs tractable: build small modules and instantiate them. Our clock needs three timers (digit-mux at 1 kHz, blink at 1 Hz, alarm-compare). Rather than write three nearly-identical counters, you write one parameterized prescaler module and instantiate it three times with different divisors:

module presc #(parameter N = 12000) (input clk, output reg tick);
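  reg [23:0] c = 0;   // 24 bits covers N up to 16,777,216; initial value avoids X in simulation
  always @(posedge clk)
    if (c == N-1) begin c <= 0; tick <= 1; end   // one-clock pulse every N clocks
    else begin c <= c + 1; tick <= 0; end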
endmodule

presc #(.N(12000))    u_mux  (.clk(clk12), .tick(t_1k));   // 1 kHz
presc #(.N(12000000)) u_blink(.clk(clk12), .tick(t_1hz));  // 1 Hz
presc #(.N(6000000))  u_alarm(.clk(clk12), .tick(t_2hz));  // 2 Hz

Three instances, three real prescalers, all clocking off the same 12 MHz crystal at once. You also write a test fixture (testbench) — non-synthesizable Verilog that generates a fake clock and checks the outputs in a simulator before ever touching hardware, so you catch the off-by-one bit-width bug at your desk, not on the bench.
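The idea of checking a parameterized module at your desk can be sketched in Python as well. The class below is a software stand-in, not a Verilog testbench: `Presc` here is a hypothetical model that assumes the prescaler pulses for one clock in every N, and the divisors 4, 12 and 6 are scaled-down values chosen so the run is short.

```python
class Presc:
    """Software model of a divide-by-N prescaler: tick is 1 for one clock in every N."""

    def __init__(self, n: int, width: int = 24):
        if n - 1 >= 2 ** width:
            raise ValueError("N-1 does not fit in the counter")
        self.n, self.c, self.tick = n, 0, 0

    def clock(self) -> None:
        """One rising clock edge."""
        if self.c == self.n - 1:
            self.c, self.tick = 0, 1
        else:
            self.c, self.tick = self.c + 1, 0

# Scaled-down divisors (hypothetical) so the check runs in 24 clocks.
instances = {"mux": Presc(4), "blink": Presc(12), "alarm": Presc(6)}
pulses = {name: 0 for name in instances}
for _ in range(24):
    for name, p in instances.items():   # Python steps them in turn; in hardware they update together
        p.clock()
        pulses[name] += p.tick

print(pulses)   # {'mux': 6, 'blink': 2, 'alarm': 4}
```

The constructor's width check is the software analogue of the bit-width bug mentioned above: a divisor whose N-1 does not fit in 24 bits is rejected before anything runs.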

The race, with numbers

Now suppose all three counters must advance every microsecond. On the FPGA, three counter modules update on their clock edges simultaneously — one gate delay each, fully overlapped. Total time to service all three per tick: essentially one clock period, because they happen at once.

On the microcontroller, the single CPU must visit each counter in turn. If servicing one counter costs ~5 instructions and the CPU runs at 12 MHz (one instruction ≈ 83 ns, assuming one instruction per clock), three counters cost 3 × 5 = 15 instructions = 15 × 83 ns ≈ 1.25 µs per round. That already overruns the 1 µs budget, while the FPGA finished the same work in ~0.08 µs. As you add a fourth, fifth, sixth counter, the FPGA cost stays flat (more counters = more parallel hardware) while the microcontroller cost grows linearly, so it falls further and further behind and drops ticks.
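The linear growth is easy to tabulate. The sketch below assumes, as the text does, a flat 5 instructions per counter and one instruction per 12 MHz clock; `uc_round_time_us` is a name invented for this example, and real instruction timings vary by CPU.

```python
def uc_round_time_us(counters: int, instr_per_counter: int, cpu_hz: float) -> float:
    """Time for one CPU to service every counter once, in microseconds."""
    return counters * instr_per_counter * 1e6 / cpu_hz

DEADLINE_US = 1.0                 # every counter must advance once per microsecond
FPGA_US = 1e6 / 12e6              # one 12 MHz clock period, regardless of counter count

for n in range(1, 7):
    t = uc_round_time_us(n, 5, 12e6)
    status = "ok" if t <= DEADLINE_US else "misses deadline"
    print(f"{n} counters: uC {t:.2f} us ({status}), FPGA {FPGA_US:.2f} us")
```

Under these assumptions the microcontroller meets the 1 µs deadline with one or two counters and misses it from three onward, while the FPGA figure does not depend on the number of counters at all.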

The instruction pointer is the bottleneck. A microcontroller has exactly one program counter, so no matter how clever the code, it can only be doing one thing at one instant. An FPGA has as many "doers" as you configure — thousands of LUTs and flip-flops, every one active every clock. The FPGA doesn't time-share the work; it spatially replicates it. That is why, for wide parallel problems, configured hardware annihilates sequential software.

The Race: Three Counters, Two Worlds (Showcase)

Three counters run from three clocks. In FPGA mode they advance together every tick. In µC mode a single instruction pointer (the highlighted marker) hops between counters, spending the set number of instructions on each, and falls progressively behind the FPGA's tick count. Add workload to widen the gap. The live tally shows ticks completed by each world.

Why does the FPGA keep up with the three counters while the microcontroller falls behind as counters are added?

The FPGA's clock is always faster than any microcontroller's
The FPGA spatially replicates the counters (all advance at once); the µC has one instruction pointer that must service them in sequence
The microcontroller has no counters, so it must compute them in floating point

Chapter 8: Connections & Summary

Chapter 14 took one problem — collapse a board of logic chips onto one chip — and followed the programmable-logic answer all the way down: from the PAL's AND-OR plane, to the CPLD macrocell and the FPGA LUT, to the fabric, to HDL design entry and the parallel payoff. Here is the whole chapter on one page.

Cheat-sheet: key equations & facts

Concept	Formula / fact	Worked value
LUT as ROM	k-input LUT = 2^k × 1-bit ROM	6-input = 64 × 1-bit
Functions per LUT	2^(2^k)	k=2→16; k=4→65,536; k=6→1.8×10¹⁹
2-input LUT configs	2⁴ distinct fillings	16 (AND, OR, XOR, NAND, …)
Sum of products	OUT = OR of AND-terms; fuses pick literals	OUT = SEL·A + SEL̅·B
Clock division	f_out = f_in / N	12 MHz÷12,000 = 1 kHz; ÷12,000,000 = 1 Hz
Config (FPGA)	Volatile SRAM, reloaded from EEPROM	boot <~200 ms
Config (CPLD)	Non-volatile flash, instant-on	0 ms boot
Logic block	LUT + flip-flop (+ routing)	200K–several-million per FPGA

The vendor & tool landscape

Vendors: Xilinx and Altera together held roughly 90% of the FPGA market through the period of this text. Tool: Xilinx ISE ran the describe→synthesize→place&route→bitstream flow.

HDLs: Verilog (C-like, terse, consumer/commercial) and VHDL (verbose, strongly typed, aerospace & defense). Output: a .bit bitstream.

Board: the Elbert V2 (~$29.95, Xilinx Spartan XC3S50A) makes the whole flow accessible on a hobby budget — describe a circuit, generate the bitstream, push it over USB.

SoC FPGAs add hard ARM CPU cores beside the fabric, letting you run software and custom parallel hardware on one die — the best of Chapters 13 and 14 together.

The three ways to collapse a board, ranked

Discrete logic ICs

Parallel hardware, but dozens of packages, fixed forever. Obsolete for products.

↓

Microcontroller

One chip, cheap, flexible — but sequential. Best when the logic is modest and timing is loose.

↓

CPLD / FPGA

One chip that becomes the hardware, fully parallel. CPLD for instant-on glue logic; FPGA for large, fast, parallel systems.

The one idea to keep. A microcontroller is fixed hardware running variable software. An FPGA is variable hardware — you change what the silicon is by loading a configuration, and what you load is, at bottom, the contents of thousands of tiny truth-table ROMs plus a map of which wires connect to which. HDL lets you describe that hardware in text, and because every block is real and simultaneous, the result runs in genuine parallel. Configure, don't program.

Connections to neighboring chapters

Ch 12 — Digital Electronics: sum-of-products, gates, flip-flops, ripple counters — the raw material the LUT and macrocell implement.
Ch 13 — Microcontrollers: the sequential alternative this chapter contrasts against; SoC FPGAs fuse the two.
Ch 10 — Oscillators & Timers: the crystal clock our prescalers divide down.
Ch 15 — Motors: programmable logic generates the precise parallel PWM and step sequences motor drivers need.

"You don't program an FPGA. You configure it — and it becomes the machine you described."

You can now read a Verilog module and see the parallel hardware behind the text, trace a function from truth table to LUT bits, and choose between a microcontroller and an FPGA on principle. Next, in Chapter 15, that parallel hardware goes to work driving motors.

A 6-input LUT is equivalent to what kind of ROM?

A 6 × 1-bit ROM
A 64 × 1-bit ROM (2⁶ = 64 stored bits, one per input combination)
A 6 × 6-bit ROM
