12 Chapter 4.5 — The Visual System II: Building a World in the Cortex
The thalamic gateway, the hypercolumn, and the two great streams out of V1
12.1 Where the retina left off
We ended the last chapter at a door. The retina had done its work — transducing light, splitting it into ON and OFF, differencing it across space into edges and across the spectrum into color, sorting the result into parallel streams — and shipped the whole compressed package inward along the optic nerve and tract to a structure in the thalamus called the lateral geniculate nucleus. We named that gateway and stopped. This chapter walks through it.
I want to be honest from the outset about a shift in the kind of story I can tell. Up to now the visual system has been describable as a machine: this receptor transduces that variable, this circuit performs that subtraction, and the consequences follow with the tidiness of arithmetic. Lateral inhibition and opponency are, in the end, theorems — you can almost prove they must work the way they do. The cortex is not like that. It is a place where the clean theorems run out and the biology gets genuinely harder to read, where the best experiments earned Nobel Prizes precisely because the answers were not obvious, and where some of the most famous structural features turn out, on closer inspection, to have no agreed function at all. So this chapter is more provisional than the last one, and I will flag the soft spots as we reach them rather than paper over them. That is not a retreat. Learning where the firm ground ends is part of knowing the territory.
Here is the route. We follow the optic tract into the lateral geniculate nucleus and ask what it is doing besides relaying — including a piece of bookkeeping the retina set up and the LGN preserves, which turns out to be the key to recovering the third dimension we lost. Then we enter primary visual cortex (V1), the conceptual heart of the chapter, and watch the retina’s edges get reassembled into oriented contours, the two eyes’ signals recombined for depth, and color pulled out into its own compartments. We will meet Hubel and Wiesel’s cells and the elegant hypercolumn that packages all of it for one point in space — and the deep problem that this feed-forward picture raises if you follow it too far. Finally we watch the single visual stream fan out into a constellation of specialized areas and split into two great pathways, one heading up into the parietal lobe toward where and how, one heading down and forward into the temporal lobe toward what. We will name the destinations — color in one, motion in another — and stop at the threshold of what happens when these areas break, which is the whole of the next chapter.
12.2 The thalamic gateway: not just a relay
Recall the shared plan from the unit overview: for every sense on the common scheme, the last stop before cortex is a dedicated thalamic nucleus, and for vision that nucleus is the lateral geniculate nucleus (LGN) — geniculate meaning “knee-like,” for its folded shape. The textbook word for what the thalamus does is relay, and I will use it, but I want to register a complaint about it immediately, the same complaint the overview raised for the thalamus in general: the LGN is not a passive repeater. It is the first place where what you are paying attention to, and what your cortex predicts, can reach down and shape what gets passed along. We will come back to that. Start, though, with its structure, because the structure is doing real work.
The LGN is a layered structure, and strictly so — six principal layers stacked like the pages of a slightly fanned book. The parallel streams the retina set up are kept segregated in those layers, exactly as they were kept separate in the optic nerve. The two magnocellular layers (large cells, the fast motion-and-contrast stream from the parasol ganglion cells) sit at the bottom. The four parvocellular layers (small cells, the high-acuity red-green stream from the midget ganglion cells) sit above them. And tucked in the thin zones between those principal layers are the koniocellular cells (konio- from the Greek for “dust,” after their small, scattered appearance) carrying the blue-yellow signal and more besides. The retina’s parallel decomposition is not collapsed at the thalamus; it is faithfully maintained, layer by layer. Hold that fact — the streams stay separate here, and they will stay separate well into the cortex.
12.2.1 One more segregation: the two eyes, and why
There is a second sorting going on in those layers, and it is the one that pays off a debt the last chapter incurred. Recall the partial crossing at the optic chiasm: each half of the brain receives the opposite half of the visual world, but it receives that half-world from both eyes — the nasal fibers from one eye (crossed) and the temporal fibers from the other (uncrossed). So the left LGN gets a left-eye input and a right-eye input, both reporting the right half of the world. And the LGN keeps them apart: each of its six layers receives input from one eye only. The sequence is not a strict alternation: in the primate, numbering from the bottom, layers 1, 4, and 6 receive the contralateral eye and layers 2, 3, and 5 the ipsilateral eye. So the LGN is sorting its input two ways at once — by stream (M, P, K) and by eye of origin.
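The two interleaved sortings can be written down as a small lookup table, which is also a handy way to check the annotation of the LGN figure. This is a minimal sketch assuming the standard primate numbering (layer 1 most ventral, layer 6 most dorsal); the koniocellular cells are left out because they sit in the interlaminar zones rather than in numbered principal layers, and the helper name `layers_for` is ours, chosen for illustration.

```python
# Primate LGN principal layers, numbered 1 (ventral) to 6 (dorsal).
# Each entry: (stream, eye of origin relative to this LGN).
# Koniocellular cells lie between these layers and are omitted here.
LGN_LAYERS = {
    1: ("magnocellular", "contralateral"),
    2: ("magnocellular", "ipsilateral"),
    3: ("parvocellular", "ipsilateral"),
    4: ("parvocellular", "contralateral"),
    5: ("parvocellular", "ipsilateral"),
    6: ("parvocellular", "contralateral"),
}


def layers_for(stream=None, eye=None):
    """Return the layer numbers matching a stream and/or an eye of origin."""
    return [
        layer
        for layer, (s, e) in sorted(LGN_LAYERS.items())
        if (stream is None or s == stream) and (eye is None or e == eye)
    ]


print(layers_for(eye="contralateral"))     # [1, 4, 6]
print(layers_for(eye="ipsilateral"))       # [2, 3, 5]
print(layers_for(stream="magnocellular"))  # [1, 2]
```

Note that layers 2 and 3 both come from the ipsilateral eye: each eye gets one magnocellular and two parvocellular layers, but the order is not a simple back-and-forth.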
Why bother keeping the eyes separate? Here is where a loose thread from the last chapter gets tied. The visual system has a hard problem to solve: the world is three-dimensional, but each retina is a two-dimensional sheet, so depth — distance from you — is exactly the dimension the retinal image throws away. There is a beautiful trick for getting it back, and it depends on having two eyes a few centimeters apart. Because they view the world from slightly different positions, any object nearer or farther than the point you are looking at casts its images on slightly mismatched spots on the two retinas, and the size of that mismatch depends systematically on how far the object lies in front of or behind that point. This positional mismatch is called binocular disparity, and it is the raw material of stereoscopic depth — the vivid sense of solidity you lose when you close one eye. But the brain can only compute disparity if it has kept the two eyes’ views separate long enough to compare them. Throw them into a common pot too early and the comparison is impossible; the mismatch you need to measure has been averaged away. So the eye-by-eye segregation in the LGN is not bureaucratic tidiness. It is the system protecting the information it will need to reconstruct the third dimension, holding the two monocular images apart until cortex is ready to bring them together. As we will see, that bringing-together happens just upstairs, in V1.
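The geometry behind "the size of that mismatch depends systematically on distance" fits in a few lines. This sketch assumes an interocular separation of 6.5 cm (a typical adult value, labeled as an assumption in the code), considers only points straight ahead, and defines disparity as the difference between the two eyes' convergence angles for the object and for the fixated point; the function names are hypothetical.

```python
import math

INTEROCULAR_M = 0.065  # assumed typical adult eye separation (about 6.5 cm)


def vergence_angle(distance_m, interocular_m=INTEROCULAR_M):
    """Angle (radians) between the two eyes' lines of sight to a point straight ahead."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))


def disparity_arcmin(object_m, fixation_m, interocular_m=INTEROCULAR_M):
    """Horizontal disparity of an object straight ahead, relative to the fixated point.

    Positive: object nearer than fixation (crossed disparity).
    Negative: object farther than fixation (uncrossed disparity).
    Zero: object at the fixation distance.
    """
    delta = vergence_angle(object_m, interocular_m) - vergence_angle(fixation_m, interocular_m)
    return math.degrees(delta) * 60.0


# The same 10 cm step in depth, seen near and far:
near = disparity_arcmin(0.9, 1.0)   # 10 cm in front of a point fixated at 1 m
far = disparity_arcmin(9.9, 10.0)   # 10 cm in front of a point fixated at 10 m
print(round(near, 1), round(far, 2))  # roughly 24.8 and 0.23 arcmin
```

The same physical depth step yields about a hundred times less disparity at ten meters than at one, which is why stereopsis is most useful within arm's reach and fades with distance.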
Figure 4.5.1. The layered LGN. Six principal layers in cross-section: two magnocellular (ventral), four parvocellular (dorsal), with koniocellular cells in the interlaminar zones. Annotate each layer by stream (M/P/K) and by eye of origin (contralateral/ipsilateral), showing the two interleaved sortings. [Figure to source or redraw.]
12.2.2 What the gateway adds
If the LGN merely sorted and forwarded, “relay” would be fair. But two facts say otherwise. First, the LGN receives a massive projection running the wrong way — from cortex back down to the thalamus. By count, there are more fibers descending from V1 to the LGN than ascending from the LGN to V1 [@BriggsUsrey2011; @Sherman2005]. A structure that is mostly listening to the cortex it supposedly feeds is not a simple relay. Second, attention modulates it: signals related to what an animal is attending to can be detected in the LGN, gating what passes [@McAlonanEtAl2008; @SaalmannKastner2011]. What this descending torrent is for remains, as the overview warned for cortico-thalamic feedback generally, genuinely unsettled — prediction, attention, gain control, and gating during sleep are all live candidates and none is established [@SpacekEtAl2022]. I am not going to pretend the arrow diagram in your textbook captures it. The honest statement is that the gateway is doing something active and important with cortical guidance, and working out exactly what is one of the open problems of sensory neuroscience. For now, carry the gateway forward as a gated, cortically supervised sorting station, not a stretch of passive cable.
12.3 Into V1: the cortex starts to compute
From the LGN, the parvocellular, magnocellular, and koniocellular streams project to the back of the brain — to the occipital pole, to primary visual cortex, which goes by several names you should be able to recognize as the same place: V1, striate cortex (for the stria of Gennari, a band of myelinated fibers visible to the naked eye in a slice), and Brodmann area 17. This is where the cortical story properly begins, and it is the part of this chapter I most want you to understand, because the move V1 makes — building specific, meaningful features by combining simpler inputs — is the move the entire rest of the brain makes, over and over, for the rest of this book.
12.3.1 A map, distorted in a revealing way
First, the layout. V1 honors the shared plan in two respects. The thalamic input arrives predominantly in layer 4 — so consistently that “a fat layer 4 stuffed with thalamic axons” is practically the anatomical fingerprint of a primary sensory area. And V1 is organized as a map: the visual field is laid out across the cortical sheet in order, neighboring points in the world projecting to neighboring points on the cortex. This is retinotopy, the visual member of the same family as the somatotopic map in S1 and the tonotopic map in A1. (One quirk to file away, because it explains clinical pictures in the next chapter: the map is not only flipped left-for-right by the chiasm but also inverted up-for-down — the lower visual field projects to the upper bank of the calcarine sulcus, and vice versa. The image is, in every sense, turned over on its way in.)
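The two flips can be captured in a tiny lookup, useful later when reading visual-field defects back to a site in cortex. This is a deliberately coarse sketch: the sign convention (positive x to the right of fixation, positive y above it) and the function name are ours, eccentricity is ignored, and points lying exactly on a meridian are rejected rather than assigned.

```python
def v1_location(x_deg, y_deg):
    """Map a visual-field position to the V1 hemisphere and calcarine bank that represent it.

    x_deg: horizontal position, positive = right of fixation.
    y_deg: vertical position, positive = above fixation.
    """
    if x_deg == 0 or y_deg == 0:
        raise ValueError("points on the vertical or horizontal meridian are ambiguous here")
    # Left-right: each half of the visual field goes to the opposite hemisphere.
    hemisphere = "left" if x_deg > 0 else "right"
    # Up-down: the upper field lands on the lower bank of the calcarine sulcus, and vice versa.
    bank = "lower" if y_deg > 0 else "upper"
    return hemisphere, bank


print(v1_location(5, 3))    # upper-right field -> ('left', 'lower')
print(v1_location(-5, -3))  # lower-left field  -> ('right', 'upper')
```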
But “the visual field is mapped in order” hides something the lectures rightly insisted on every time a feature map came up: the maps are distorted, and the distortions are informative. In the tonotopic map, the frequency range of speech gets disproportionate cortical real estate. In the somatotopic map, the fingertips and lips get a swollen share. The visual map has its own version, and it is dramatic: the fovea — the central couple of degrees, where acuity is highest — is allotted a wildly disproportionate area of V1. This is the cortical magnification factor: the tiny central region of the visual field commands an enormous fraction of the cortical sheet, while the vast periphery is squeezed into what is left [@HortonHoyt1991]. The map is not a faithful scale drawing of the visual field; it is redrawn in proportion to how much each region matters, with the high-resolution center hugely enlarged. This is the cortical echo of a fact we met in the retina — the fovea’s near one-to-one wiring, the periphery’s heavy pooling — now written into the geometry of the brain itself. The same priority is enforced twice: once in how the retina samples, once in how much cortex it earns.
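Cortical magnification is usually summarized as a function M(E), millimeters of cortex per degree of visual field at eccentricity E. A minimal sketch, assuming the inverse-linear form M(E) = A / (E + E2) with the constants often quoted for the human fit of Horton and Hoyt (A = 17.3 mm, E2 = 0.75°); treat the numbers as illustrative, and note that extending the fit out to 90° is itself an assumption. Integrating M from the fovea outward gives the distance along the cortex, with the closed form A · ln((E + E2) / E2).

```python
import math

# Assumed parameters of the inverse-linear magnification fit (illustrative values).
A_MM = 17.3     # scaling constant, mm
E2_DEG = 0.75   # eccentricity (deg) at which magnification falls to half its foveal value


def magnification_mm_per_deg(ecc_deg):
    """Linear cortical magnification M(E) = A / (E + E2), in mm of V1 per degree."""
    return A_MM / (ecc_deg + E2_DEG)


def cortical_distance_mm(ecc_deg):
    """Distance along V1 from the foveal representation out to ecc_deg.

    This is the integral of M(E) from 0 to ecc_deg: A * ln((E + E2) / E2).
    """
    return A_MM * math.log((ecc_deg + E2_DEG) / E2_DEG)


central = cortical_distance_mm(2.0)
total = cortical_distance_mm(90.0)
print(f"central 2 deg: {central:.1f} mm of {total:.1f} mm ({central / total:.0%})")
```

Under these assumptions the central 2°, a little over 2% of the eccentricity range, claims roughly a quarter of the cortical distance along the eccentricity axis.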
Figure 4.5.2. Retinotopy and cortical magnification in V1. Left: the visual field with eccentricity rings and quadrants. Right: the corresponding map across the calcarine sulcus, with the foveal representation grossly enlarged (cortical magnification) and the periphery compressed; mark the upper-field-to-lower-bank inversion. [Figure to source or redraw.]
12.3.2 From spots to edges: the oriented receptive field
Now the computation. Recall the receptive field — the patch of the visual world a neuron listens to — and recall the retina’s and the LGN’s signature receptive field: the center-surround, a little circular bullseye that reports local contrast. An ON-center cell fires to a spot of light in its center and is silenced by light in its surround; an OFF-center cell does the reverse. That is what arrives at V1’s doorstep: a vast array of these concentric spot-detectors, tiled across the visual field.
Hubel and Wiesel’s great discovery — the work that won them the Nobel Prize in 1981 — was what V1 does with that array [@HubelWiesel1962; @HubelWiesel1968]. They lowered a microelectrode into the visual cortex of a cat, played visual stimuli on a screen, and listened (literally — the spikes were piped to a loudspeaker, so a cell firing made an audible patter, and they hunted for the stimulus that made it crackle). What they found is that the typical V1 cell does not want a spot at all. It wants a line — an edge or bar — and, crucially, a line at a particular orientation. Show it a bar tilted to its preferred angle and it fires hard; rotate the bar away from that angle and the response falls off, until at the orthogonal orientation the cell may fall silent. The cell is orientation-selective. V1 has taken the retina’s orientation-blind spot-detectors and built from them a detector for oriented contours.
How? The intuition is worth slowing down for, because it is the whole game in miniature. Take several center-surround cells whose little bullseyes happen to lie in a row across the visual field, and wire them all to feed a single V1 cell. Now a spot of light excites only one of them — a weak drive. But a bar of light, lying along that row, lands on the centers of all of them at once — and their combined drive is strong. Tilt the bar off the row and it falls across only one or two centers (and into the inhibitory surrounds of the rest), and the drive collapses. So a V1 cell that simply sums a row of aligned center-surround inputs is, automatically, a detector for a bar at the orientation of that row. The retina handed up edge-fragments expressed as little contrast-spots; V1 stitches the aligned ones into an actual oriented edge. Nothing mystical happened — it is the same principle as lateral inhibition, selective combination of simpler inputs to compute a more specific feature — but applied one level up, it produces something qualitatively new: a representation of contour and its angle, the alphabet from which shapes are spelled.
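The row-summing idea can be made concrete in a few lines. This is a toy, not a fitted model: the difference-of-Gaussians form of the center-surround units, their sizes, the five-unit horizontal row, the rectification, and the sampling grid are all assumptions chosen for illustration, and every name in the code is hypothetical.

```python
import math

STEP = 0.25                                   # grid spacing (arbitrary units of visual angle)
GRID = [i * STEP for i in range(-20, 21)]     # -5.0 ... 5.0
SIGMA_CENTER, SIGMA_SURROUND = 0.5, 1.5       # assumed center and surround sizes


def gaussian(r2, sigma):
    """Normalized 2-D Gaussian evaluated at squared distance r2."""
    return math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def center_surround(x, y, cx, cy):
    """ON-center difference-of-Gaussians weight at (x, y) for a unit centred on (cx, cy)."""
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    return gaussian(r2, SIGMA_CENTER) - gaussian(r2, SIGMA_SURROUND)


def bar(x, y, angle_deg, half_width=0.5):
    """1 inside a long bright bar through the origin at the given orientation, else 0."""
    a = math.radians(angle_deg)
    dist = abs(-math.sin(a) * x + math.cos(a) * y)   # distance from the bar's midline
    return 1.0 if dist <= half_width + 1e-9 else 0.0  # tolerance avoids float edge effects


def unit_response(cx, cy, angle_deg):
    """Rectified linear response of one center-surround unit (rates cannot go negative)."""
    drive = sum(center_surround(x, y, cx, cy) * bar(x, y, angle_deg)
                for x in GRID for y in GRID) * STEP ** 2
    return max(0.0, drive)


# Hypothetical V1 cell: sums five center-surround units lying in a horizontal row.
ROW = [(-2.0, 0.0), (-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]


def v1_response(angle_deg):
    return sum(unit_response(cx, cy, angle_deg) for cx, cy in ROW)


responses = {angle: v1_response(angle) for angle in (0, 45, 90)}
for angle, r in responses.items():
    print(angle, round(r, 2))
```

With these settings the horizontal bar, lying along the row, drives the model cell several times harder than the vertical bar, which reaches only the middle unit's center, and the oblique bar also falls far short of the horizontal one. The rectification in `unit_response` matters: units whose surrounds are hit simply stop contributing, which is one simple way to get the collapse in drive the text describes.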
12.3.3 Simple, complex, and end-stopped: a hierarchy of features
Hubel and Wiesel did not find one kind of oriented cell; they found a graded family, and the gradient is itself the lesson. They named three types.
Simple cells are the ones just described: orientation-selective, with receptive fields divided into distinct excitatory and inhibitory subregions — an ON stripe flanked by OFF stripes, say. Because the subregions are laid out in specific places, a simple cell is fussy not just about a bar’s orientation but about its exact position: slide the bar sideways out of the ON stripe and into an OFF flank, and excitation turns to suppression. A simple cell reports an edge of this orientation, right here.
Complex cells are orientation-selective too, but they have given up the fussiness about exact position. A complex cell responds to its preferred orientation anywhere within a larger receptive field — and, characteristically, it likes that oriented bar to be moving through its field, often in a preferred direction. It has traded positional precision for a more abstract, more useful invariant: an edge of this orientation is present and moving in this region, never mind precisely where. You can see, if you squint, how a complex cell could be built by pooling several simple cells of the same orientation but slightly different positions — and that is exactly what Hubel and Wiesel proposed: complex cells sit one rung up, assembled from simple cells the way simple cells are assembled from center-surround inputs.
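The simple-to-complex step can be sketched in one dimension, looking across the preferred orientation so that only the bar's position varies. This is a toy under stated assumptions: each hypothetical simple cell has an ON stripe one unit wide flanked by OFF stripes, its output is rectified, and the complex cell sums five such cells whose ON stripes sit at different positions, in the spirit of Hubel and Wiesel's pooling proposal. Motion preference is not modeled.

```python
def simple_cell(bar_pos, rf_center, on_half_width=0.5, flank_width=1.0):
    """Rectified response of a hypothetical simple cell to a thin bar at bar_pos."""
    offset = abs(bar_pos - rf_center)
    if offset <= on_half_width:
        drive = 1.0               # bar in the ON stripe: excitation
    elif offset <= on_half_width + flank_width:
        drive = -1.0              # bar in an OFF flank: suppression
    else:
        drive = 0.0               # bar outside the receptive field
    return max(0.0, drive)        # firing rates cannot go below zero


# Hypothetical complex cell: pools same-orientation simple cells at shifted positions.
SIMPLE_CENTERS = [-2.0, -1.0, 0.0, 1.0, 2.0]


def complex_cell(bar_pos):
    return sum(simple_cell(bar_pos, c) for c in SIMPLE_CENTERS)


for pos in (-2.3, -1.1, 0.2, 1.2, 2.4):
    print(pos, simple_cell(pos, 0.0), complex_cell(pos))
```

The simple cell centred at 0 answers only when the bar falls in its own ON stripe, while the pooled complex cell answers wherever the bar lands within the larger field: positional precision traded for an invariant.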
Hypercomplex cells — the term you will still see, though the field now more often calls them end-stopped cells — add one more constraint: length. An end-stopped cell increases its response as the bar lengthens, but only up to a point; extend the bar past the end of its receptive field and the response is suppressed. The cell wants a line that stops — that has an end, a corner, a terminus — within its field. With end-stopping you have, in principle, a detector not just for edges but for line-endings and corners, the features that mark where a contour ends or bends [@HubelWiesel1965].
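The length tuning of an end-stopped cell follows from a central excitatory zone flanked, at each end, by inhibitory zones. A minimal sketch, assuming a bar centred on the field and zone lengths and an inhibitory weight chosen purely for illustration; the function name and parameters are hypothetical.

```python
def end_stopped_response(bar_length, center_length=2.0, end_zone_length=1.5, end_weight=0.8):
    """Rectified response of a hypothetical end-stopped cell to a bar centred on its field.

    Excitation grows with the part of the bar inside the central zone; any part
    that spills into the inhibitory end zones (one at each end) subtracts from it.
    """
    inside = min(bar_length, center_length)
    spill = max(0.0, bar_length - center_length)        # total length past the central zone
    spill_into_end_zones = min(spill, 2 * end_zone_length)
    return max(0.0, inside - end_weight * spill_into_end_zones)


for length in (0.5, 1.0, 2.0, 3.0, 5.0):
    print(length, end_stopped_response(length))
```

Response rises with length until the bar fills the central zone, then falls as the bar invades the end zones, reaching zero here for a long bar: the cell prefers a line that stops.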
A word of calibration on this scheme, in the spirit of the last chapter’s honesty about tidy stories. Hubel and Wiesel framed simple → complex → hypercomplex as a strict hierarchy, each level mechanically built from the one below by pooling, and that staircase is how it is usually taught and how I have just taught it. The picture is a powerful idealization and the simple/complex distinction has survived close re-examination. But the strict feed-forward staircase is a simplification: end-stopping, once thought to be a higher-area specialty, turns up within V1 itself; the lines between the categories are softer than three clean boxes suggest; and Hubel and Wiesel themselves eventually folded the “hypercomplex” cells into the complex category as a variety rather than a separate tier. Keep the hierarchy as your scaffold — it is the right first idea, and the combination-builds-features logic it embodies is exactly right — but hold the three sharp boxes a little loosely.
Figure 4.5.3. Building oriented receptive fields. (a) A row of aligned center-surround cells summed onto one cell yields orientation selectivity. (b) Simple cell: ON/OFF subregions, position-specific. (c) Complex cell: orientation-selective across a larger field, motion-preferring. (d) End-stopped (hypercomplex) cell: response grows with bar length then is suppressed past the receptive-field end. [Figure to source or redraw.]
12.3.4 The hierarchy’s tempting trap: grandmother cells
Here is where I want to plant an idea that the rest of the book will keep returning to, because it is one of the genuinely deep problems in thinking about the brain, and V1’s hierarchy is the cleanest place to first feel its pull.
If simple cells are built from center-surround cells, and complex cells from simple cells, and end-stopped cells from complex cells, then the obvious thing to do is keep going. Pool end-stopped cells into corner-and-curve detectors; pool those into detectors for simple shapes; pool those into detectors for, eventually, whole familiar objects. Follow the staircase to the top and you arrive at a seductive endpoint: a single neuron, somewhere deep in the brain, that fires when and only when you see one specific thing — your grandmother, say. The grandmother cell: the apex of a feed-forward pyramid, the cell that is your recognition of grandmother.
It is a tidy idea, and it is almost certainly wrong as stated, for reasons worth seeing now. If recognition of each object required its own dedicated cell, you would need a separate neuron for every face, word, object, and scene you can recognize — and another for each as it appears from a new angle, in new light, at a new size. The combinatorics explode; you do not have the cells. Worse, such a system would be catastrophically fragile: lose the one cell and you lose grandmother entirely, forever. And it raises a properly vertiginous question, the homunculus problem: even if such a cell existed and fired, who is watching it? A neuron lighting up is not yet anyone’s experience of recognition; positing a cell whose firing means “grandmother” just relocates the mystery of perception to the question of who reads the cell — a little observer inside the head, who would need eyes and a cortex of their own, and so on forever. The grandmother cell does not solve perception; it hides the problem inside a single neuron and hopes you will not look.
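The combinatorial and fragility arguments can be put in numbers with a toy count. This is not a model of how the brain encodes objects, a question the text deliberately leaves open: the population size of 100 and the choice of a 10-active-cell pattern per item are assumptions made only to compare one-cell-per-item coding against a pattern spread over several cells.

```python
import math

N_NEURONS = 100   # hypothetical, tiny population
ACTIVE = 10       # cells active per item in the distributed pattern (assumed)

one_cell_per_item = N_NEURONS                   # "grandmother" code: one cell, one item
distributed = math.comb(N_NEURONS, ACTIVE)      # distinct 10-of-100 activity patterns
print(one_cell_per_item, distributed)


def fraction_surviving(pattern, lost):
    """Fraction of an item's active cells that remain after the cells in `lost` die."""
    return len(pattern - lost) / len(pattern)


grandmother = {7}                    # localist item: a single dedicated cell
distributed_item = set(range(ACTIVE))  # distributed item: cells 0..9 (arbitrary choice)
lost = {7}                           # one cell dies
print(fraction_surviving(grandmother, lost), fraction_surviving(distributed_item, lost))
```

The same hundred cells label a hundred items one at a time but more than ten trillion as patterns, and losing cell 7 erases the localist item outright while leaving nine tenths of the distributed one intact.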
I raise this not to resolve it — we cannot, here — but because the failure of the naive hierarchy is a signpost. The brain does build complex features from simpler ones; V1 proves it. But it does not build them all the way up to one-cell-per-concept, and the question of how recognition is actually distributed across populations of cells, and what it would even mean for a pattern of firing to constitute an experience, is one we will circle for the rest of the course. For now: take from V1 the real and powerful principle — features are built by combining simpler features — and take, as its shadow, the warning that you cannot just iterate that principle to infinity and call it perception. Where exactly the building stops, and what replaces it, is the open country the next chapters move into.
12.4 Color in the cortex: the blobs
Run a particular stain across V1 — cytochrome oxidase, a marker for metabolic activity — and a striking pattern jumps out that you would never see otherwise: a regular polka-dot array of darker patches, evenly spaced through the upper layers, which someone with a gift for technical nomenclature named blobs (and the regions between them, inevitably, interblobs). The blobs are real and regular, and they do something distinct from their surroundings.
What they do has a history worth a sentence, because it shows the science correcting itself. When Margaret Livingstone and David Hubel first characterized the blobs in the 1980s, they proposed them as the seat of color processing in V1, the interblobs handling orientation and form [@LivingstoneHubel1984; @LivingstoneHubel1988]. The claim was contested for years — and the honest current picture, supported by more recent recording, is that the original instinct was substantially right: blob neurons are indeed enriched for color-responsive cells, less concerned with orientation, and they are a node where the color signals carried up separately — the red-green from the parvocellular stream, the blue-yellow from the koniocellular — are brought together [@ConwayLivingstone2006; @ConwayEtAl2010]. The complication, which the lectures flagged and which I will pass on rather than smooth over, is that the division is not absolute: some color sensitivity lives in the interblobs too, and some blob cells are not purely about color. So “blobs are the color spots” is the right first approximation with a real asterisk: color processing is concentrated there, not confined there. It is one more instance of the chapter’s recurring caution — the compartments in cortex are tendencies and enrichments, not sealed boxes.
12.5 The hypercolumn: everything about one point in space
We now have several different things V1 computes for a given location — orientation (in the simple/complex/end-stopped cells), eye-of-origin (the monocular inputs in layer 4, not yet combined), color (in the blobs) — and one organizing question: how are they arranged on the cortical sheet? The answer is one of the most satisfying pieces of architecture in the brain, and it gives us the unit that V1 is really built from.
V1 is organized into columns running from the cortical surface down through its layers, and cells within a single column share properties. Drop an electrode straight down through the thickness of cortex and the cells you pass all have nearly the same receptive-field location (they care about the same point in space) and, in an orientation column, the same preferred orientation. Move the electrode sideways across the surface, though, and the preferred orientation shifts smoothly, marching through all the angles. Cutting across that, in a second dimension, the cortex alternates between stripes dominated by one eye and stripes dominated by the other — the ocular dominance columns, left-eye and right-eye territory interleaved like the stripes of a flag.
Put these together and you get Hubel and Wiesel’s lovely synthesis: the hypercolumn. Picture a small block of cortex — a chunk a millimeter or two across — that contains a full set of orientation columns (so that every angle is represented) crossed with a full set of ocular dominance columns (both eyes), with blobs (color) studded through it. That one block holds a complete analysis — every orientation, both eyes, color, luminance, the makings of motion — of one small patch of the visual field. The next hypercolumn over does the same job for the next patch of the field, and so on, tiling the whole visual scene. The retinotopic map, in other words, is a map of hypercolumns: each is a self-contained processing module that takes one point in space and computes everything V1 knows how to compute about it, and they are laid side by side in the order of the world. It is a gorgeous design — a repeating, general-purpose local circuit, replicated across the sheet and addressed by position.
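The hypercolumn's bookkeeping (every orientation, both eyes, a color compartment, one module per patch, modules addressed by position) maps naturally onto a data structure. This is a sketch of the organization only, under loud assumptions: orientation preference is sampled here at 15° steps although in cortex it varies continuously, ocular dominance is treated as a clean left/right split, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field

ORIENTATIONS_DEG = tuple(range(0, 180, 15))   # assumed sampling: 12 preferred orientations
EYES = ("left", "right")


@dataclass
class Hypercolumn:
    """Hypothetical module that analyses one patch of the visual field."""
    field_position: tuple                       # (row, col) of the patch it serves
    # One slot per orientation in each ocular dominance stripe.
    orientation_columns: dict = field(default_factory=lambda: {
        (theta, eye): 0.0 for theta in ORIENTATIONS_DEG for eye in EYES})
    blob_color_signal: float = 0.0              # stand-in for the blob (color) compartment


def build_v1_map(rows, cols):
    """Tile the visual field with hypercolumns, addressed by position (retinotopy)."""
    return {(r, c): Hypercolumn((r, c)) for r in range(rows) for c in range(cols)}


v1 = build_v1_map(3, 5)
module = v1[(2, 3)]
print(len(v1), len(module.orientation_columns))   # 15 modules, 24 orientation-by-eye slots each
```

The design choice mirrors the text: the same general-purpose module is replicated, and what distinguishes one hypercolumn from its neighbor is only the patch of the field it is addressed by.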
12.5.1 A caution about the column
I am going to teach the hypercolumn because it is real, useful, and clarifying — but I would be breaking faith with this book’s commitment to flagging soft ground if I left you thinking it is fully understood, because one of its components is, on close inspection, a genuine puzzle. The ocular dominance columns in particular have a surprisingly uncertain functional status. The natural guess is that they exist to support stereoscopic depth — keeping the eyes’ inputs sorted for binocular comparison. But the evidence does not cooperate. Ocular dominance columns are present in some species and flatly absent in others (mice, rats, and squirrels do without them) with no corresponding loss of the visual faculties you would expect them to serve; the depth-tuned cells that actually compute disparity show no systematic relationship to the columns; and — most awkwardly — the columns vary enormously within a single species and even within different patches of one individual’s cortex, sometimes nearly vanishing, with no apparent perceptual consequence [@HortonAdams2005; @AdamsHorton2009]. After fifty years of study, an influential review concludes, pointedly, that the cortical column may be “a structure without a function” [@HortonAdams2005], possibly an incidental byproduct of how axons sort themselves out during development rather than a feature that natural selection built for a purpose. I do not want to overstate this: ocular dominance columns are a real, beautiful, reliable piece of anatomy, and the orientation and retinotopic organization of V1 are on much firmer functional ground. But keep your skepticism switched on here. The textbook habit of treating every regular cortical pattern as an evolved computational device is exactly the kind of just-so reasoning this book keeps warning against — and the ocular dominance column is the standing reminder that a structure can be conspicuous and orderly and still be waiting for anyone to show what it is for.
Figure 4.5.4. The V1 hypercolumn. A block of cortex containing a complete set of orientation columns crossed with left- and right-eye ocular dominance columns, with cytochrome-oxidase blobs. The block analyzes one patch of the visual field; adjacent hypercolumns tile the retinotopic map. [Figure to source or redraw.]
12.6 The streams stay separate: V2 and the fan-out
Leaving V1, the visual signal does not converge into a single “seeing” area. It does the opposite — it fans out into a whole constellation of further visual areas, and the count is humbling in the same way the retinal ganglion-cell count was in the last chapter. In the macaque, careful mapping has identified more than thirty distinct visual areas, occupying something like half of the entire cerebral cortex [@FellemanVanEssen1991]; in humans, modern multimodal parcellation of the cortex finds dozens of visual areas among the 180 regions it draws per hemisphere [@GlasserEtAl2016]. Vision is not a small upstream module feeding a general-purpose thinker. It is one of the largest things the primate brain does, and it is carved into many specialized stages.
The first stop out of V1 is V2, the second visual area, which wraps around V1 like a collar. And V2 carries the same news the LGN and V1 did: the streams are still segregated. Apply the cytochrome-oxidase stain to V2 and instead of blobs you find stripes — alternating thick stripes, thin stripes, and pale stripes — and the different kinds of information sort into them. Speaking roughly: the thin stripes carry color (the blob output), the pale stripes carry oriented form (the interblob output), and the thick stripes carry motion and disparity (the magnocellular-dominated output). The retina began this decomposition; the LGN preserved it; V1 maintained it in its blobs and layers; and V2 still keeps these information types in separate compartments. This is the single most repeated fact of the early visual system, and it is the one to carry forward: the visual system does not reassemble a picture early. It keeps the scene decomposed into parallel, specialized descriptions for a remarkably long way up. Reassembly, to whatever extent it happens at all, happens late.
Figure 4.5.5. The CO-stain signature changes from V1 to V2. V1’s blobs/interblobs give way to V2’s thick, thin, and pale stripes, with the rough mapping: thin → color, pale → form, thick → motion/disparity. Show injection-tracer continuity from blob/interblob to the corresponding stripe type. [Figure to source or redraw.]
12.7 Two streams: what and where (and how)
Stand back far enough and the constellation of areas resolves into a now-famous large-scale plan. From V1 and V2, visual processing flows along two broad streams heading in different directions across the brain — a dichotomy introduced in a landmark synthesis by Ungerleider and Mishkin, who gathered the lesion evidence and proposed that the visual cortex splits its labor in two [@UngerleiderMishkin1982].
The ventral stream runs downward and forward, from the occipital lobe along the underside of the temporal lobe. It is dominated by parvocellular-derived information — high acuity, fine detail, color, form — and it is in the business of recognition: what is this object? It carries the signals you need to identify a face, read a word, tell an apple from a ball. It is often called the “what” pathway.
The dorsal stream runs upward, from the occipital lobe into the parietal lobe. It is dominated by magnocellular-derived information — fast, motion-sensitive, contrast-driven, largely colorblind — and it is concerned with space and movement: where is this object, where is it going, where is it relative to me? Ungerleider and Mishkin called it the “where” pathway. A later and influential reframing by Goodale and Milner argued that “where” undersells it — that the dorsal stream’s real job is to guide action, to convert vision into the reaching, grasping, and orienting of the body in real time, and so is better called the “how” pathway [@GoodaleMilner1992; @MilnerGoodale2008]. Whether you call it “where” or “how,” the contrast with the ventral stream is the load-bearing idea: one stream largely for identifying the world, one largely for acting in it.
A necessary caution, and one the lectures were emphatic about — be suspicious of clean dichotomies. The two-streams scheme is genuinely useful and you should learn it, but it is an idealization with leaks, and the leaks are not minor. The mapping of subcortical streams onto cortical streams, in particular, is looser than the tidy “magno → dorsal, parvo → ventral” slogan implies: the dorsal stream does draw heavily on magnocellular input, but it also receives koniocellular and parvocellular input, and the ventral stream’s areas — V4 most clearly — turn out to receive roughly equal magnocellular and parvocellular drive [@Skottun2014; @FerreraEtAl1994; @NassiCallaway2009]. The streams are also not sealed off from each other; they interconnect and talk throughout. So hold “what versus where/how” as a powerful organizing frame, not a pair of insulated cables. Two tendencies, richly cross-linked — not two dichotomous systems.
Figure 4.5.6. The dorsal and ventral streams. From V1/V2, the dorsal (“where”/“how”) stream to parietal cortex via MT; the ventral (“what”) stream to inferotemporal cortex via V4. Note the dominant but non-exclusive M and P contributions, and the cross-connections between streams. [Figure to source or redraw.]
12.7.1 Two destinations: a color area and a motion area
Each stream has a specialized way-station worth naming now, because they are the two areas whose failures open the next chapter, and because they make the “specialized areas” claim concrete rather than abstract.
In the ventral stream sits V4, an area heavily involved in color. True to the chapter’s running caution, it is not only a color area: V4 also contributes to form and shape, and its cells respond to complex contours and illusory edges, so it is better thought of as a color-and-form area than as a pure color module. The evidence that it matters for color is convergent and clean. Imaging studies that contrast colored with grey-scale stimuli reveal a color-responsive region in ventral occipitotemporal cortex [@ZekiEtAl1991]. Direct electrical stimulation of that region in patients can either evoke colors (floating, retinotopically localized patches of color appearing in the visual field) or disrupt color perception, wiping out the colors in a test pattern while leaving its forms intact [@AllisonEtAl1994]. Stimulate it and color appears or vanishes; that is about as direct a demonstration of a brain area’s job as neuroscience offers.
In the dorsal stream sits area MT (for middle temporal, its location in the monkey), also called V5, and MT is about motion [@BornBradley2005]. Its cells are exquisitely tuned to the direction and speed of moving stimuli, the magnocellular stream’s preoccupation carried to a specialized cortical conclusion. Imaging confirms it: a moving stimulus activates MT/V5 strongly while a static one does not, and the area sits right where the motion-and-disparity (thick-stripe) output of V2 leads [@ZihlHeywood2015]. MT is, in effect, the brain’s dedicated motion area, the place where “something is moving, this fast, in this direction” is computed as a feature in its own right, much as V4 computes color and V1 computes orientation. The principle is the same one we started with in V1 and have followed the whole way: build a specific feature by combining the right inputs, and give it its own piece of cortex.
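What “tuned to direction” means can be sketched as a tuning curve: firing rate as a function of the direction in which a stimulus moves. This is a toy under stated assumptions, not a model taken from the chapter’s sources. It assumes a von Mises (circular Gaussian) curve, one common modeling choice, and the function names `mt_response` and `direction_index` and every parameter value (preferred direction, peak and baseline rates, tuning width `kappa`) are hypothetical. Speed tuning is left out.

```python
import math

def mt_response(direction_deg, preferred_deg=90.0, peak_rate=50.0,
                baseline=5.0, kappa=2.0):
    """Firing rate (spikes/s) of a hypothetical direction-tuned MT-like unit.

    A von Mises (circular Gaussian) tuning curve: the response peaks when the
    stimulus moves in the preferred direction and is smallest for motion in
    the opposite ("null") direction. All parameter values are illustrative.
    """
    delta = math.radians(direction_deg - preferred_deg)
    # exp(kappa * (cos(delta) - 1)) equals 1 at delta = 0 and exp(-2 * kappa) at 180 degrees
    return baseline + peak_rate * math.exp(kappa * (math.cos(delta) - 1.0))

def direction_index(preferred_rate, null_rate):
    """Close to 1 for a strongly direction-selective unit, 0 for no direction preference."""
    return (preferred_rate - null_rate) / preferred_rate

# Sweep motion direction in 45-degree steps, as in a classic tuning-curve experiment.
for direction in range(0, 360, 45):
    print(f"{direction:3d} deg: {mt_response(direction):5.1f} spikes/s")

print("direction index:", round(direction_index(mt_response(90), mt_response(270)), 2))
```

The sweep peaks at the preferred direction (90 degrees here) and falls to near baseline for motion in the opposite direction; with these made-up numbers the direction index comes out at about 0.89.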
12.8 Coda: from a decomposed scene to the edge of meaning
Stand back from the whole two-chapter arc and the shape of it is clear. The retina took an overwhelming flood of light and decomposed it — into contrast, into color comparisons, into parallel streams — and shipped inward not a picture but a set of filtered, specialized reports. This chapter has watched the cortex receive those reports and do two things with them at once. It has begun to build: V1 assembles oriented edges from spots, motion and depth from edges and the two eyes, the alphabet of contour from which shapes are spelled. And it has, just as stubbornly, kept the decomposition going — color, form, and motion held in separate compartments in the blobs and stripes and streams, far longer than intuition expects, fanned out across thirty-some specialized areas rather than fused into one seeing-place.
That double character, building features upward while keeping streams apart, is the organizing fact of cortical vision, and it sets up the deep problem we touched on at the hypercolumn and the grandmother cell. If the scene stays decomposed into parallel descriptions and is never simply reassembled into a picture for an inner observer to view, then who sees, and how does the brain bind color, form, and motion back into the single coherent object you experience? We do not have the answer, and I have tried not to pretend otherwise: the cortex is where the clean theorems gave out. But notice how much we do firmly have: orientation columns, retinotopy and cortical magnification, the parallel streams maintained from retina to V2, the two large-scale pathways, a color area and a motion area you can stimulate and disrupt. That is a great deal of solid ground, and it was hard-won.
The deepest way to see what this machinery is for, and how much of ordinary seeing is the brain’s active construction rather than the world’s gift, is to watch it break. When V4 is destroyed, the world goes grey. When MT is destroyed, motion itself disappears and the world becomes a series of stills. When V1 is gone, sight can vanish from awareness while somehow still steering the hand. And when the higher reaches of the ventral stream fail, a person can see an object perfectly — its edges, its color, its form all intact — and have no idea what it is, or whose face they are looking at. These are not curiosities. They are the system showing its working, the seams of construction laid bare by damage. The next chapter takes them up — the constructive nature of vision and the strange, revealing, sometimes devastating ways the cortical machinery of seeing fails.
Reasonably settled:
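- The LGN keeps the parallel streams (M, P, K) segregated in separate layers (M and P in its six principal layers, K in the thin intercalated layers between them) and keeps the two eyes’ inputs in separate layers, the latter preserving the information from which binocular disparity, and hence stereoscopic depth, is computed just downstream in V1.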
- V1 (striate cortex, area 17) is retinotopically mapped with massive cortical magnification of the fovea, receives thalamic input in layer 4, and contains orientation-selective cells. Building oriented edge-detectors by combining aligned center-surround inputs is a well-established and foundational computation.
- Hubel and Wiesel’s simple, complex, and end-stopped (hypercomplex) cells are real response types, orientation-selective, with the orderly columnar organization (orientation columns, ocular dominance columns, blobs) summarized by the hypercolumn.
- The early visual system maintains a parallel, decomposed representation well into cortex: blobs/interblobs in V1, thick/thin/pale stripes in V2.
- The visual cortex comprises dozens of specialized areas split into a ventral (“what”) and a dorsal (“where”/“how”) stream, with V4 a color-and-form area and MT/V5 a motion area — both demonstrable by imaging, stimulation, and lesion.
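The construction summarized in the settled list, oriented edge detectors built by combining aligned center-surround inputs, can be made concrete in a few lines. This is a toy sketch of the Hubel and Wiesel idea under loudly stated assumptions: each input is modeled as a rectified difference-of-Gaussians ON-center unit, and the patch size, Gaussian widths, the five input positions, and all function names are hypothetical choices for illustration.

```python
import math

SIZE = 21          # hypothetical 21 x 21 pixel patch
HALF = SIZE // 2   # pixel coordinates run from -HALF to +HALF

def gaussian(dx, dy, sigma):
    """2-D Gaussian normalized so that it sums to about 1 over the plane."""
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

def on_center_unit(image, cx, cy, sigma_c=1.0, sigma_s=2.0):
    """Rectified response of an ON-center, OFF-surround (difference-of-Gaussians) unit at (cx, cy)."""
    total = 0.0
    for y in range(-HALF, HALF + 1):
        for x in range(-HALF, HALF + 1):
            weight = gaussian(x - cx, y - cy, sigma_c) - gaussian(x - cx, y - cy, sigma_s)
            total += weight * image[y + HALF][x + HALF]
    return max(0.0, total)  # firing rates cannot go below zero

def simple_cell(image, centers):
    """Hubel-Wiesel-style sketch: sum ON-center units whose centers lie on a line."""
    return sum(on_center_unit(image, cx, cy) for cx, cy in centers)

def bar(orientation):
    """A one-pixel-wide bright bar through the middle of a dark patch."""
    image = [[0.0] * SIZE for _ in range(SIZE)]
    for i in range(SIZE):
        if orientation == "vertical":
            image[i][HALF] = 1.0   # column x = 0, every row
        else:
            image[HALF][i] = 1.0   # row y = 0, every column
    return image

# Five hypothetical ON-center inputs stacked vertically, so the cell prefers vertical bars.
VERTICAL_ROW = [(0, y) for y in (-4, -2, 0, 2, 4)]

vertical = simple_cell(bar("vertical"), VERTICAL_ROW)
horizontal = simple_cell(bar("horizontal"), VERTICAL_ROW)
uniform = simple_cell([[1.0] * SIZE for _ in range(SIZE)], VERTICAL_ROW)
print(f"vertical bar:   {vertical:.3f}")
print(f"horizontal bar: {horizontal:.3f}")
print(f"uniform field:  {uniform:.3f}")
```

Because the five input centers are stacked vertically, a vertical bar excites all of them at once, while a horizontal bar through the middle excites only the central one; for the flanking inputs the bar falls in the surround, and rectification silences them. With these parameters the vertical bar drives the cell roughly five times as strongly as the horizontal one, and uniform illumination barely drives it at all, since each input’s surround cancels its center.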
Genuinely unsettled, or more complicated than the tidy version:
- The function of cortico-geniculate feedback. The LGN receives more descending fibers from cortex than it sends ascending fibers up to cortex; what this feedback is for (prediction, attention, gain control, sleep gating) remains open.
- The function of the ocular dominance column. Present in some species and absent in others, highly variable even within an individual, with no faculty clearly dependent on it — possibly a developmental byproduct rather than a computational device (“a structure without a function”).
- The strict simple → complex → hypercomplex hierarchy. A powerful idealization, but the categories are softer and the feed-forward staircase less clean than three sharp boxes imply.
- How the streams map onto the subcortical channels. “Magno → dorsal, parvo → ventral” is a tendency with real leaks; V4, for instance, gets roughly equal M and P input, and the streams cross-talk throughout.
- The grandmother-cell / binding / homunculus problem. How recognition is actually distributed across populations, how features bound apart in cortex are rejoined into single experienced objects, and what it would even mean for neural activity to constitute seeing — open, and circled through the rest of the book.
And, as always: there is a great deal here we are sure of. The construction of oriented features in V1, the maintained parallel streams, the two large-scale pathways, and the specialized color and motion areas are among the best-established results in systems neuroscience. You can build on them.