11 Chapter 4.4 — The Visual System I: Turning Light Into Difference
Transduction, the inverted retina, and the first two great circuit motifs
11.1 The sense that sees the future
Touch tells you what is happening to your body right now. By the time something registers on your skin it is already against you — the branch, the stove, the floor. We ended the last two chapters at that reactive edge of the sensory gradient, where the world has no lead time and the spinal cord answers before the cortex has heard the question.
Vision is the other end of that gradient. Light reaches you from things that are not yet touching you and, very often, not yet doing anything to you at all: the shape moving at the far side of the clearing, the car still two seconds from the crosswalk, the ripe fruit hanging across the gap. A sense that reaches out in space reaches out in time. That is the deep reason an animal pays for eyes. Recall the slogan from Unit I — the expensive brain is buying prediction — and recall from the unit overview that a large part of what it buys prediction with is the distance senses. Vision converts how far away something is into how much warning you get. It is, in a real and unmysterious sense, a time machine.
I want to set the camera metaphor against that idea straight away, because it is the single most misleading thing you already believe about vision. The eye does have camera-like optics: a transparent window, an adjustable aperture, a lens that throws an inverted image onto a light-sensitive surface at the back. So far the analogy holds. But a camera records an image, and the retina does almost the opposite — it takes the image apart. Before a single signal leaves your eye, the retina has thrown away most of the raw light, kept the differences, split the world into parallel streams, and built the beginnings of color out of comparisons. The optic nerve is not a cable carrying pixels to a little screen inside your skull. It carries the output of a computation that has already happened. My own view, which I will spend this chapter trying to earn, is that the retina is not a sensor attached to the brain. The retina is brain — a piece of it, pushed out toward the light.
This is the first of two chapters on vision, and the division is honest about where the biology has its natural seam. Here we stay with the front end: light, the eye, the retina, and the two computations the retina performs on the way out — the ones that turn a flood of absolute light intensities into a manageable code of differences. We will follow the optic nerve as far as its crossing point and name the thalamic gateway it is heading for, and stop there. The next chapter (4.5) picks up inside that gateway and in the cortex behind it — how the brain reassembles edges and motion and disparity into orientation, objects, faces, and scenes, and what the strange clinical failures of that machinery reveal. The retina computes; the cortex interprets. That is the seam, and it is a real one.
11.2 What the visual system has to represent, and the problems light makes
Start with the world the eye is pointed at. The things an animal needs to act on have several visual properties at once, and a good visual system has to recover all of them: how bright a surface is (its luminance); its color, which depends on the mix of wavelengths it reflects; whether it is moving, and how; where its edges and contours are, so you can tell where one object stops and the next begins; and where it sits in space, including how far away it is. These are not a tidy list to be memorized so much as a set of simultaneous demands. Light delivers information about all of them tangled together, and the nervous system has to pull them apart.
Doing so runs into three problems that are worth stating plainly at the outset, because the rest of the chapter is, in large part, the retina’s answers to them.
The first is the inverse problem. The world is three-dimensional; each retina is a two-dimensional sheet. A small object nearby and a large object far away can cast exactly the same image — the tall line, the short line, and the diagonal line in your textbook figure all paint the same length across the retina. The image does not contain an unambiguous description of its own cause. The visual system cannot read off the world; it has to infer the most likely arrangement of things that would have produced this pattern of light. (Notice that the inversion of the retinal image — top-for-bottom, left-for-right — is not the inverse problem. That part is trivial; the brain simply learns the convention. The hard problem is the lost dimension.)
The second is the compression problem, and it is severe enough to shape the entire architecture of the retina. Lay out the numbers. The human eye can operate across roughly eleven orders of magnitude of light intensity — the difference between a moonless night and a bright snowfield is something like a hundred billion-fold. Against that, the signaling range of a neuron is pitiful: a cell firing flat-out manages perhaps a couple hundred spikes a second, two orders of magnitude at best. Worse, there are about 100 million rods and 6 million cones feeding into only about 1 million ganglion-cell axons in the optic nerve — a hundred-to-one bottleneck at the door. You cannot pour a hundred-billion-fold range through a two-order-of-magnitude code down a wire that is already a hundred times too narrow. Something has to give, and the something is most of the raw light intensity. The retina cannot afford to report how bright things are. It has to decide what is worth sending. Hold onto this; it is the reason both of the circuit tricks in this chapter exist.
The third is that vision is constructive. Because the system infers rather than records, it fills in, guesses, and can be fooled — and the places where it guesses wrong are not embarrassing glitches bolted onto an otherwise perfect camera. They are the system showing its working. A visual illusion is a window onto the assumptions that make ordinary seeing possible. We will meet a few in passing, and the next chapter is full of them.
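The arithmetic of the compression problem is worth laying out once. The sketch below uses only the round figures quoted above; the variable names are mine, and taking 200 spikes a second as the value of "a couple hundred" is an assumption made for illustration.

```python
import math

# Round figures from the text; names are illustrative.
INTENSITY_ORDERS = 11        # usable range of light intensity, in powers of ten
MAX_SPIKES_PER_S = 200       # assumed value for "a couple hundred spikes a second"
RODS = 100_000_000
CONES = 6_000_000
GANGLION_AXONS = 1_000_000

intensity_range = 10 ** INTENSITY_ORDERS              # about a hundred billion-fold
rate_orders = math.log10(MAX_SPIKES_PER_S)            # range of a firing-rate code, in powers of ten
convergence = (RODS + CONES) / GANGLION_AXONS         # photoreceptors per optic-nerve axon
shortfall_orders = INTENSITY_ORDERS - rate_orders     # range that cannot fit into a rate code

print(f"intensity range: {intensity_range:.0e}-fold")
print(f"firing-rate range: about {rate_orders:.1f} orders of magnitude")
print(f"convergence: about {convergence:.0f} photoreceptors per axon")
print(f"orders of magnitude with nowhere to go: about {shortfall_orders:.1f}")
```

However the round numbers are adjusted, the shortfall stays at many orders of magnitude, which is the sense in which something has to give.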
Figure 4.4.1. The inverse problem. A single retinal image is consistent with infinitely many arrangements of the world: a small near object and a large far object can project identical images. Show three physical objects of different size and distance casting the same retinal extent, with the retina as a 2-D sheet. [Figure to source or redraw.]
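The geometry behind Figure 4.4.1 can be checked in a few lines. This is a minimal sketch, assuming flat objects viewed face-on; the three objects and their sizes are hypothetical, chosen so that size and distance grow in step.

```python
import math

def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Angle subtended at the eye by an object of a given size seen face-on."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))

# Hypothetical objects: (label, size in metres, distance in metres).
objects = [
    ("coin at arm's length", 0.02, 0.5),
    ("dinner plate across the room", 0.2, 5.0),
    ("small tree down the road", 2.0, 50.0),
]

angles = {label: visual_angle_deg(size, dist) for label, size, dist in objects}
for label, angle in angles.items():
    print(f"{label}: {angle:.2f} degrees")
```

All three subtend the same angle, so they cover the same stretch of retina. Nothing in the image alone says which of them is out there.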
11.3 A short evolutionary history of light sensing
Before the human eye, there was light sensitivity without anything we would call an eye — and it is worth a moment, because it sets up everything that follows and because the diversity of animal eyes makes a point no abstract argument can. The point is not that all eyes are the same. It is that “better vision” is not one thing, and that our own arrangement is one workable solution among many, with its own peculiar compromises baked in.
Light sensitivity is ancient, far older than image-forming eyes. Long before any animal could see a shape, cells used light as a signal: for a photosynthetic organism, light is food and its direction is worth tracking; for many animals, light meant time of day, depth, exposure, or the passing shadow of something larger. The molecular basis is a family of proteins called opsins, which hold a light-sensitive chromophore that changes shape when it absorbs a photon — and that single conformational flip, amplified, is the root of all of it. Opsins diversified very early in animal evolution, and comparative genomics shows photoreception to be older and more varied than any one design of eye [@PorterEtAl2012; @LambCollinPugh2007; @FainHardieLaughlin2010]. We met the principle already in the unit overview, in the single-celled Euglena — a protist with a pigmented eyespot and a flagellum, steering itself toward light it can use. There is a receptor, a coupling, and an effector, and the state of the world bends the movement. That is the whole logic, running in one cell with no neuron in sight.
Now watch the design problem unfold, not as a ladder but as a series of options. A flat patch of photoreceptors can tell an animal that there is light, but it is nearly useless for telling where the light is coming from — illumination from every direction lands on it alike. Put pigment behind the patch and recess it into a cup, and shadow now falls differently depending on direction: the system gains a sense of where. Narrow the cup’s opening toward a pinhole and the directionality sharpens into a crude image, at the cost of letting in less light. Cover the opening with a refractive lens and you can gather light and preserve spatial detail at once. Each of these exists in living animals right now; each is a genuine solution to “which way should I go, given what I can detect”; none is a rung on a staircase climbing toward us [@NilssonPelger1994; @Nilsson2009; @LandNilsson2012].
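The pinhole trade-off can be put in rough numbers. This is a minimal sketch, assuming simple geometric optics (a distant point of light, no diffraction) and a hypothetical cup 10 mm deep: the blur scales with the width of the opening, while the light admitted scales with its area.

```python
import math

CUP_DEPTH_MM = 10.0   # hypothetical distance from the opening to the photoreceptor sheet

def pinhole_tradeoff(aperture_mm: float) -> tuple:
    """Return (blur angle in degrees, relative light admitted) for a round opening.

    Geometric-optics assumption: a distant point of light paints a blur spot
    as wide as the opening itself, so a wider opening means a coarser image.
    """
    blur_deg = math.degrees(2.0 * math.atan(aperture_mm / (2.0 * CUP_DEPTH_MM)))
    light = math.pi * (aperture_mm / 2.0) ** 2   # proportional to the area of the opening
    return blur_deg, light

wide = pinhole_tradeoff(5.0)
narrow = pinhole_tradeoff(0.5)

print(f"blur: {wide[0]:.1f} deg (wide) vs {narrow[0]:.1f} deg (narrow)")
print(f"light lost by narrowing: {wide[1] / narrow[1]:.0f}-fold")
```

Narrowing the opening tenfold sharpens the image roughly tenfold but costs a hundredfold in light, which is the bind a lens escapes.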
You will sometimes hear that the eye is too intricate to have arisen by gradual change. It is a weak argument, and it fails for a simple reason: every intermediate stage is useful, and living animals display the intermediates. A bare light-sensitive patch is useful; Euglena does well with one. A pigmented cup is more useful. A pinhole is more useful again. A lens, more useful still. Nilsson and Pelger famously modeled a step-by-step route from flat patch to focused camera-eye, allowing only small improvements at each step, and estimated that the whole transition could run to completion in a few hundred thousand generations under ordinary selection, a geologically modest span [@NilssonPelger1994]. The precise number matters less than the logic it makes concrete: there is a smooth, everywhere-uphill path from a patch of pigment to an eagle’s eye, and evolution never has to leap. As always in this book, “the eye evolved in order to see” is shorthand for “vision was the fitness payoff that selected, step by step, for better light-handling”: no foresight required.
The variety is the lesson. Cephalopods — octopus, squid — independently evolved camera eyes that rival ours in optical quality, on a completely separate evolutionary line, and (as we will see) wired up the more sensible way round. Insects build compound eyes that sacrifice fine spatial detail but excel at wide-field motion and very fast temporal resolution. Many birds carry four cone classes and see well into the ultraviolet, with oil droplets that sharpen their color tuning [@HartHunt2007; @LindEtAl2014]; raptors add foveal specializations for acuity at distance [@MitkusEtAl2017]. The right question is never “is this eye better than ours?” but better for what — for stabilizing flight, for seeing in dim water, for spotting a mouse from altitude, for reading text, for telling ripe fruit from leaves. Evolution does not optimize “vision.” It shapes eyes around the lives that have to use them.
I will hold one of these examples in reserve — the mantis shrimp, which appears to break the rule that more receptors mean better color — until we reach color, because it makes a point about circuits that I cannot make until you have seen what a circuit is for. For now, keep the frame: light sensing is old, opsins are its common currency, and the camera eye is a thing evolution has built more than once, by more than one route.
Figure 4.4.2. Light sensing as a branching set of design options, not a ladder. Show: opsin-bearing photoreceptive cell; flat eyespot; pigmented cup; pinhole eye; lensed camera eye; compound eye; and vertebrate vs. cephalopod camera eyes side by side. Lay out as a branching bush, emphasizing independent solutions. [Figure to source or redraw.]
11.4 The eye, and the strange backwardness of the vertebrate retina
The human eye responds to a narrow band of the electromagnetic spectrum, roughly 400 to 700 nanometers of wavelength: a sliver between the ultraviolet we mostly cannot see (but many birds and insects can) and the infrared we cannot see either (but pit vipers detect with separate heat-sensing organs, and night-vision goggles convert into visible light for us). There is nothing special about this band physically; it is simply the slice our pigments were tuned to, set by the light that reaches the earth’s surface and penetrates water. Our visual world is not the whole of what is out there. It is a parochial sample of it.
The optics are quickly told. Most of the eye’s focusing power is at the cornea, the transparent curved front surface; the iris adjusts the pupil to control how much light enters; the lens fine-tunes focus for near or far by changing shape, a process called accommodation; and the focused, inverted image lands on the retina lining the back of the eye. One region of the retina, the fovea, sits at the center of gaze and is packed almost purely with cones; it is where acuity is highest, and we will see why shortly. The rest of the retina trades acuity for sensitivity and for catching motion at the edges of the scene.
Now the oddity, and it is a real one. The vertebrate retina is built backwards. Light entering the eye has to pass through several layers of neurons and a tangle of their fibers — the very wiring that carries the visual signal — before it reaches the photoreceptors, which sit at the very back, facing away from the light. Roughly fifteen percent of the incoming light is scattered or absorbed making that pointless-seeming journey. And where all those ganglion-cell axons gather and dive back through the retinal sheet to leave the eye as the optic nerve, there are no photoreceptors at all: a hole in the retina, the blind spot — a built-in scotoma in your own visual field. (You can find it with the classic demonstration: close one eye, fixate a mark, and slide a second mark sideways until it vanishes. You do not normally notice the hole because, as we will see in the next chapter, the visual system fills it in.) The cephalopods, building their camera eye independently, did it the obvious way: photoreceptors facing forward, toward the light, with the nerve fibers leaving from behind. No detour, no blind spot.
So why is ours backwards? It is tempting to make this a joke about evolution being a sloppy engineer, and I would resist the urge, not because evolution is a good engineer (it has no foresight and cannot redesign from scratch) but because the arrangement turns out to solve a genuine problem, the one the unit overview kept flagging: energy. Photoreceptors are astonishingly expensive cells. They are among the most metabolically demanding cells in the entire body; by direct measurement the retina consumes oxygen faster than the brain itself, several-fold faster per gram [@WangsaWirawanLinsenmeier2003; @JoyalEtAl2018]. The reason is a curiosity we are about to meet in detail: photoreceptors run a continuous electrical current in darkness and have to keep paying, moment by moment, to maintain it. A cell with that appetite needs to sit directly against a rich blood supply, and the retina’s inverted layout places the photoreceptors flush against the dense vascular bed (the choroid) at the back of the eye, while a second supply feeds the inner retina from the front. The “backwards” layout gives the retina a double blood supply and parks its hungriest cells against the richer one. The fovea then claws back the lost acuity locally by shoving the overlying layers and vessels aside, leaving the cones there an almost unobstructed path to the light. It is not what you would draw on a clean whiteboard. But living tissue is never designed on a clean whiteboard, and the compromise is a sensible one once you know what the photoreceptors cost.
Figure 4.4.3. Anatomy of the human eye, with the inverted retina called out. Label cornea, iris, pupil, lens, vitreous, retina, fovea/macula, optic disc (blind spot), optic nerve. Inset: light passing through ganglion-cell and inner layers before reaching the rearward-facing photoreceptors; note the choroidal blood supply behind. [Figure to source or redraw.]
11.5 Inside the retina: a three-cell highway and a lateral web
The retina develops as an outpouching of the diencephalon — it grows out of the developing brain and stays connected to it by the optic nerve. This is not a technicality; it is the anatomical fact behind the claim I keep making. The retina is central nervous tissue. It is a layered neural circuit that performs the first stage of visual computation, and it is worth learning its cells because the two great motifs of this chapter are built from them.
The main signal path runs vertically, from the back of the retina toward the front, in three cells:
photoreceptor → bipolar cell → ganglion cell.
The photoreceptors (rods and cones) transduce light into graded changes in membrane voltage. The bipolar cells carry that signal inward to the ganglion cells, whose axons bundle together as the optic nerve and leave the eye. Crucially — and we will lean on this — only the ganglion cells fire true all-or-none action potentials. The photoreceptors and bipolar cells signal with graded, continuous voltage changes, like the dendrites we studied earlier: smoothly varying analog signals, not discrete spikes. (Why graded? Almost certainly because the distances inside the retina are tiny. Action potentials exist to send signals far without their decaying; across a fraction of a millimeter you do not need them, and a graded signal carries more information — a continuous range rather than a spike count. It is reasonable, if you like, to picture the whole retina as something close to a single elaborate neuron, with the photoreceptors and bipolars as a vast dendritic tree doing analog computation and the ganglion-cell axon as the one place it finally commits to spikes. If that picture does not help you, drop it; the cells are real and the metaphor is optional.)
Cutting across that vertical highway is a horizontal web, built from two more cell types: horizontal cells in the outer retina and amacrine cells in the inner retina. These are the lateral connectors. They let activity at one point on the retina influence the signal at neighboring points — they let a cell’s output depend not only on the light falling on it but on the light falling around it. That single capacity — comparison across space — is the seed of the first circuit motif in this chapter. Hold the architecture in mind: a vertical three-cell path for “what is here,” crossed by a horizontal web for “what is around here.” Everything below is built from those two directions.
Figure 4.4.4. The retina’s vertical and horizontal organization. Show light entering from the ganglion-cell side and passing back through the inner layers to the photoreceptors; the vertical path photoreceptor → bipolar → ganglion; and the lateral connections made by horizontal cells (outer retina) and amacrine cells (inner retina). [Figure to source or redraw.]
11.5.1 Rods, cones, and the range of light
Vertebrates have two broad classes of image-forming photoreceptor, and the division of labor between them is a direct response to the eleven-order-of-magnitude range of light. Rods are extraordinarily sensitive (a fully dark-adapted rod can register the absorption of a single photon) and they carry vision in dim light, called scotopic vision. The price of that sensitivity is poor spatial acuity (many rods pool their signals together, as we will see) and no useful color discrimination. Cones are far less sensitive and need many more photons to respond, but they support high-acuity daylight vision (photopic vision) and they are the basis of color. There are about 100 million rods and 6 million cones, and they are not spread evenly: cone density peaks sharply in the fovea and falls away steeply outside it, while rods dominate the rest of the retina. Between the two extremes lies a mixed mesopic range, twilight, where both contribute. The system does not simply turn up and down across light levels; it switches operating regimes.
Humans have three cone classes, distinguished by which wavelengths they absorb best: S, M, and L for short-, medium-, and long-wavelength. It is conventional to call them “blue,” “green,” and “red” cones, and your textbook figure will label them that way, but I want you to start unlearning that habit now, because it plants exactly the wrong intuition about color — an intuition we will have to dismantle in a few pages. For the moment, note only that the three cone sensitivities are broad and heavily overlapping, especially M and L, whose curves sit almost on top of each other. That overlap is not sloppy engineering; it is the very thing color vision will exploit. The reason M and L are so similar, incidentally, is historical: they arose from a comparatively recent gene duplication in the primate lineage, which is why their genes sit in tandem on the X chromosome — and why red-green color deficiency, which follows from variation in those genes, is far more common in genetically male (XY) individuals with only one X to draw on [@NeitzNeitz2011; @Jacobs2009].
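The "only one X to draw on" argument is, underneath, a piece of arithmetic. This is a minimal sketch, assuming a single X-linked recessive variant at allele frequency q and random mating. That is a deliberate simplification, since several variants of the M and L genes are involved, and q = 0.08 is an illustrative value, not a figure from this chapter.

```python
def affected_fractions(q: float) -> tuple:
    """Expected fractions affected by an X-linked recessive variant of frequency q.

    XY individuals carry one X, so a single copy of the variant is enough: q.
    XX individuals need the variant on both X chromosomes: q * q.
    """
    return q, q * q

q = 0.08  # illustrative allele frequency (assumption)
xy_fraction, xx_fraction = affected_fractions(q)
ratio = xy_fraction / xx_fraction   # works out to 1 / q

print(f"XY affected: {xy_fraction:.1%}")
print(f"XX affected: {xx_fraction:.2%}")
print(f"XY individuals are about {ratio:.1f} times as likely to be affected")
```

The rarer the variant, the larger the gap, because the ratio between the two groups is simply 1/q.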
There is also a third kind of photoreceptor, discovered only around 2002, that is neither rod nor cone and does not contribute to seeing shapes at all. A small subset of ganglion cells — the output cells — contain their own light-sensitive pigment, melanopsin, making them directly photosensitive (hence intrinsically photosensitive retinal ganglion cells, or ipRGCs) [@BersonEtAl2002; @HattarEtAl2002; @DoYau2010]. They are slow, they report overall ambient light level rather than pattern, and they project not to the image-forming pathway but to the suprachiasmatic nucleus (the brain’s circadian clock) and to the pretectum (driving the pupil reflex). This is why many people who are completely blind to form — whose rods and cones are gone — nonetheless keep their circadian rhythms entrained to the day-night cycle and still constrict their pupils to light: the melanopsin system is intact and reporting “how much light is there,” even when “what is out there” is lost. We will meet these cells again when the course turns to sleep. For now they make a useful point in their own right: “vision” is not one system but several, braided together at the retina.
Scotopic vision is rod-driven, in dim light: sensitive, colorless, low-acuity. Photopic vision is cone-driven, in bright light: less sensitive, sharp, in color. Mesopic is the mixed regime between — twilight, dusk — where both rods and cones contribute. The vocabulary is worth keeping because it marks a real fact: the retina does not merely scale its activity up and down with light, it changes which receptors and circuits are running. This is also why your night vision is colorless and blurry, and why a dim star is easier to see slightly off to the side (on rod-rich peripheral retina) than by looking straight at it (on the cone-only, rod-free fovea).
11.5.2 Phototransduction: light turns the receptor down
Here is one of the genuinely counterintuitive facts in sensory biology, and your lecture flagged it as a “curiosity” worth sitting with: in a vertebrate photoreceptor, light does not switch the cell on. It switches it down. In darkness the photoreceptor is relatively depolarized and is busily releasing its neurotransmitter (glutamate); when light strikes, the cell hyperpolarizes and releases less. More light means less signal from the receptor. The sign is backwards from almost every other receptor you have met.
The top-line version is all you need to carry forward: a photoreceptor sits in a depolarized, transmitter-releasing state in the dark, and light closes that state down. Why this should be so — and why it is not the problem it first appears — is worth one paragraph. The apparent absurdity dissolves immediately once you remember a principle from earlier in the course: the sign of a signal carries no meaning on its own; meaning lives in who is reading it. A downstream cell can be wired to treat “less glutamate” as “more light” just as easily as the reverse. What matters is only that the relationship is reliable and that the next stage knows the convention. As we are about to see, the retina actually reads it both ways at once, and gets something valuable out of doing so. The molecular machinery behind the sign flip is elegant but not something I expect you to memorize; I have boxed it for the curious.
In darkness, an enzyme keeps levels of a small signaling molecule, cyclic GMP (cGMP), high inside the photoreceptor’s outer segment. cGMP binds to and holds open a set of ion channels (cGMP-gated channels), through which sodium and calcium flow steadily into the cell. This standing inward flow is the dark current, and it keeps the cell depolarized and releasing glutamate — and it is the reason photoreceptors are so metabolically expensive, since the cell must constantly pump those ions back out to keep the current running.
When a photon is absorbed, the opsin’s chromophore (11-cis retinal) changes shape, activating the opsin. In rods, activated rhodopsin switches on a G-protein called transducin, which activates an enzyme (phosphodiesterase, PDE) that breaks down cGMP. With cGMP falling, the cGMP-gated channels close, the inward dark current is cut, and the cell hyperpolarizes — and therefore releases less glutamate. The whole cascade is also a massive amplifier: one absorbed photon, through this enzymatic chain, can block the entry of millions of ions, which is precisely how a rod can register a single photon. The cascade is the mechanism; the take-home for the main line is unchanged — light closes channels, the cell hyperpolarizes, transmitter release falls.
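For readers who like to see a chain of signs run end to end, here is a toy steady-state version of the cascade. It is a sketch under loud assumptions, not a biophysical model: the rate constants, the cooperative gating exponent of 3, and the membrane potentials of about -40 mV in darkness and -70 mV in saturating light are all illustrative choices of mine.

```python
V_DARK_MV = -40.0    # assumed membrane potential in darkness
V_LIGHT_MV = -70.0   # assumed membrane potential in saturating light

def open_fraction(cgmp: float) -> float:
    """Fraction of cGMP-gated channels open; the exponent 3 is an assumption."""
    return cgmp ** 3 / (cgmp ** 3 + 1.0)

def photoreceptor_state(light: float) -> dict:
    """Toy steady state for a given light level (arbitrary units, 0 = darkness)."""
    # Synthesis is fixed; breakdown by PDE rises with light, so cGMP falls.
    cgmp = 1.0 / (1.0 + light)
    # Dark current relative to its value in darkness (1.0 when light == 0).
    relative_current = open_fraction(cgmp) / open_fraction(1.0)
    voltage = V_LIGHT_MV + (V_DARK_MV - V_LIGHT_MV) * relative_current
    # Glutamate release is taken to track the dark current directly.
    return {"cgmp": cgmp, "voltage_mV": voltage, "glutamate": relative_current}

levels = [0.0, 1.0, 10.0, 100.0]
states = [photoreceptor_state(x) for x in levels]
for x, s in zip(levels, states):
    print(f"light={x:>5}: cGMP={s['cgmp']:.3f}  V={s['voltage_mV']:.1f} mV  "
          f"glutamate={s['glutamate']:.3f}")
```

Every quantity falls as light rises: less cGMP, fewer open channels, a more negative membrane, less glutamate.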
11.5.3 ON and OFF: reading one signal two ways
The retina’s first clever move is to do exactly what the “sign doesn’t matter, the reader does” principle allows: it reads the photoreceptor’s output with two different kinds of bipolar cell, wired to opposite conventions, and so splits the visual world into two complementary channels at the very first synapse.
OFF bipolar cells carry ordinary ionotropic glutamate receptors — fast channels that open when glutamate binds. So in darkness, when the photoreceptor pours out glutamate, the OFF cell is driven; in light, with glutamate falling, it quiets. The OFF cell therefore signals darkness in its receptive field — light decrements.
ON bipolar cells carry a special metabotropic glutamate receptor, the kind that “rings a doorbell” rather than opening a channel directly, and this particular receptor inverts the signal: glutamate suppresses the ON cell. So in darkness the ON cell is held down, and in light, as glutamate falls, it is released and becomes active. The ON cell signals light increments.
One photoreceptor, two readers, opposite signs. This is not redundancy, and your instinct to ask “why send the same thing twice?” is the right question with a satisfying answer: it is not the same thing. A dark speck on a bright wall and a bright speck on a dark wall are different events in the world, and an animal needs both. By committing one channel to “it got lighter here” and another to “it got darker here,” the retina carries both kinds of contrast forward with equal fidelity. Keep the ON/OFF split in mind; it is the raw material the next motif works on.
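The one-signal, two-readers arrangement fits in three small functions. This is a minimal sketch with made-up units: the saturating form of the photoreceptor's output and the 0-to-1 scale are arbitrary illustrative choices, and the function names are mine.

```python
def glutamate_release(light: float) -> float:
    """Photoreceptor output: maximal (1.0) in darkness, falling as light rises.

    The form 1 / (1 + light) is an arbitrary illustrative choice.
    """
    return 1.0 / (1.0 + light)

def off_bipolar(glutamate: float) -> float:
    """Sign-preserving reader (ionotropic receptors): more glutamate, more drive."""
    return glutamate

def on_bipolar(glutamate: float) -> float:
    """Sign-inverting reader (metabotropic receptors): glutamate suppresses the cell."""
    return 1.0 - glutamate

for label, light in [("dark", 0.0), ("dim", 1.0), ("bright", 9.0)]:
    g = glutamate_release(light)
    print(f"{label:>6}: glutamate={g:.2f}  OFF={off_bipolar(g):.2f}  ON={on_bipolar(g):.2f}")
```

The same falling glutamate signal drives the OFF reader down and the ON reader up. Neither function is more correct than the other; each is simply a different convention for reading one input.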
Figure 4.4.5. ON and OFF bipolar pathways. One cone synapses onto an OFF bipolar (ionotropic glutamate receptors) and an ON bipolar (sign-inverting metabotropic receptors), producing opposite responses to a light increment. Show the dark state and the lit state side by side. [Figure to source or redraw.]
11.6 The first great motif: lateral inhibition
I am going to slow down here, because in many years of teaching this course I have learned two things about lateral inhibition. The first is that it is one of the most important ideas in the whole of sensory neuroscience — it is not a quirk of the retina but a motif, a trick the nervous system reuses everywhere it needs to sharpen a signal. The second is that students reliably bounce off it, and I think I finally understand why: it is usually taught as a wiring diagram, a tangle of pluses and minuses, when it is really the answer to a problem you already understand. So let us start from the problem, not the wiring.
The problem is the compression problem, in a specific form. The retina cannot afford to report absolute light intensity — there is far too much range, and the wire is far too narrow. But here is the thing it can exploit: absolute light intensity is mostly useless anyway. What an animal needs to know is not how many photons are arriving, but where things change — where one surface ends and another begins, where the edge is. The absolute brightness of this lecture room versus the absolute brightness of the lawn outside differs by a huge factor, yet your shirt looks the same color in both places, and the boundary between your shirt and your jacket is equally crisp in both. The information worth keeping is in the differences across space, not in the overall level. So the retina should throw away the level and keep the differences. Lateral inhibition is how it does exactly that.
Now the intuition, and I will use the version I use in lecture because it works. Picture three people standing in a row, and a broad spotlight shining on all of them — brightest on the middle one, but spilling onto the two beside her. If each simply reported “I am lit,” all three would shout, and you would learn very little about where the light actually centered. But suppose the brightly-lit middle person is allowed to quiet her neighbors in proportion to her own excitation — to inhibit them sideways. Now she shouts, and she shushes the two beside her, who were less lit to begin with. The result: a sharp peak at the middle, near-silence on either side. You started with a smooth hump of light, all positive, and you ended with a spike marking the place where the light was most concentrated. The neighbors’ job was not to report their own light but to be subtracted from the center.
That is lateral inhibition, and notice what it has done mathematically. By having each location inhibited by its neighbors, the retina is computing, at every point, the difference between the light here and the light just around here. A region of uniform light — every cell and its neighbors equally lit — produces almost no output, because the excitation and the inhibition cancel. Only a place where the light changes across space — an edge — survives the subtraction. If you have done any signal processing this will look familiar: the retina is taking something like a spatial derivative, differencing the image everywhere and reporting where the differences are. It is, among other things, an edge detector. And it solves the compression problem in the same stroke, because differences span a far smaller range than absolute intensities: the retina no longer has to encode “zero to a hundred billion,” only “how much brighter is this patch than its surround,” which fits comfortably into a couple hundred spikes a second.
The horizontal cells we met earlier are the wiring that makes this happen — they are the lateral connection through which a region’s surround pulls down its center. But I want you to hold the function more firmly than the wiring: lateral inhibition turns a code of absolute amounts into a code of local differences, sharpening edges and discarding the overall level. If you remember nothing else, remember that sentence.
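The subtraction is simple enough to run by hand, or in a few lines of code. This is a minimal one-dimensional sketch, assuming each position is inhibited by the average of its two immediate neighbours and that outputs cannot go below zero; the inhibition weight and all the input values are illustrative.

```python
def lateral_inhibition(signal, weight=1.0):
    """Subtract from each position the mean of its two neighbours, then rectify.

    Positions at either end reuse their own value as the missing neighbour.
    With weight=1.0 a uniform input cancels exactly.
    """
    out = []
    for i, here in enumerate(signal):
        left = signal[i - 1] if i > 0 else here
        right = signal[i + 1] if i < len(signal) - 1 else here
        out.append(max(0.0, here - weight * (left + right) / 2.0))
    return out

spotlight = [0.6, 1.0, 0.6]           # three people, brightest in the middle
dim_room = [10.0] * 6                 # uniform light, low level
bright_lawn = [1000.0] * 6            # uniform light, a hundred times brighter
edge = [10.0] * 3 + [1000.0] * 3      # a boundary between two surfaces

for name, stim in [("spotlight", spotlight), ("dim room", dim_room),
                   ("bright lawn", bright_lawn), ("edge", edge)]:
    print(f"{name}: {lateral_inhibition(stim)}")
```

The spotlight collapses to a single peak, both uniform scenes collapse to silence despite the hundredfold difference in level, and the edge survives at the bright side of the boundary. The dark side of the edge is clipped to zero here because this toy has only one sign of output.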
11.6.1 Center-surround receptive fields: lateral inhibition made visible
Lateral inhibition shows up, when you record from a single ganglion cell, as a beautiful and very specific structure called a center-surround receptive field. First, the idea of a receptive field: it is simply the patch of the visual world that a given neuron listens to — the region where light can change that cell’s firing. For a single cone, the receptive field is tiny (the bit of the world whose light it catches). For a ganglion cell, which pools input from many photoreceptors through the bipolar and horizontal web, the receptive field is larger, and — this is the key — it is organized into two opposed zones.
Take an ON-center ganglion cell. Light falling on the small central zone of its receptive field makes it fire harder. Light falling on the surrounding annulus makes it fire less. The two zones oppose each other, and the consequences are exactly what the spotlight story predicts:
- A small spot of light on the center alone: strong firing. (Excitation, no opposing inhibition.)
- A ring of light on the surround alone (center dark): firing is suppressed below its resting level.
- Uniform light flooding center and surround together: little or no change — the excitation and the inhibition cancel. The cell is nearly blind to the overall level.
- An edge lying across the receptive field — light covering the center and part of the surround: strong response, because the balance is broken.
And there are OFF-center ganglion cells wired in mirror image — darkness in the center excites them, light in the center suppresses them — which is the ON/OFF split from the previous section, now given a spatial structure. (This is what the two kinds of bipolar cell were for.)
Two details from your lecture are worth pinning down because students stumble on them. First, the response to uniform light being near-zero is the whole point, not a failure: a cell that ignores uniform illumination is a cell that has subtracted out the ambient level and is reporting only contrast. That is why your shirt looks the same indoors and out — the retina is differencing away the lighting and keeping the shirt. Second, these cells have a spontaneous, ongoing firing rate even in the dark, and that baseline is not noise to be apologized for — it is essential. Inhibition can only be seen as a drop below something. If a cell sat silent until excited, it would have no way to signal “less than nothing here”; the resting rate is the zero-line against which both “more” (excitation) and “less” (inhibition) can be read. The spontaneous rate is what lets a single channel carry a signed signal.
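The four cases listed above, and the role of the resting rate, can be sketched as a toy ON-center cell. This is a cartoon under stated assumptions: the baseline of 20 spikes per second, the gain, and the function name `on_center_rate` are invented for illustration, and each zone's illumination is expressed as the fraction of that zone that is lit.

```python
BASELINE = 20.0   # spontaneous rate in spikes/s (illustrative value)
GAIN = 15.0       # spikes/s per unit of center-surround imbalance (illustrative)


def on_center_rate(center_lit, surround_lit, baseline=BASELINE):
    """Firing rate of a toy ON-center cell.

    center_lit and surround_lit are the lit fraction (0.0 to 1.0) of each zone.
    The center excites, the surround inhibits, and a rate cannot go below zero.
    """
    drive = GAIN * (center_lit - surround_lit)
    return max(0.0, baseline + drive)


stimuli = {
    "darkness": (0.0, 0.0),
    "center spot": (1.0, 0.0),
    "surround ring": (0.0, 1.0),
    "uniform light": (1.0, 1.0),
    "edge (center plus half the surround)": (1.0, 0.5),
}

for name, (center, surround) in stimuli.items():
    print(f"{name}: {on_center_rate(center, surround):.1f} spikes/s")
```

Calling `on_center_rate(0.0, 1.0, baseline=0.0)` gives the same answer as darkness with the same zero baseline. With no resting rate, the surround's inhibition has nothing to pull down and becomes invisible, which is the point about the signed signal.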
Put many such cells side by side and you have a machine that, across the whole visual field, reports where light changes and stays quiet where it does not — an array of edge-finders. The retina has taken an image and handed onward, not a copy of it, but a map of its contours.
Figure 4.4.6. Center-surround receptive fields. Top: an ON-center/OFF-surround ganglion cell under four conditions — center spot, surround annulus, uniform illumination, and an edge across the field — with the corresponding spike trains, including the spontaneous baseline. Bottom: the mirror-image OFF-center/ON-surround cell. [Figure to source or redraw.]
11.7 The second great motif: color is comparison, not wavelength
If lateral inhibition is the motif students bounce off, color opponency is the one they do not even attempt — and I sympathize, because it asks you to give up a picture of color that feels obvious and correct. So I want to build it slowly, and I want to build it by showing you that it is the very same trick you just learned, run in a different space. Lateral inhibition differenced light across space. Opponency differences cone signals across the spectrum. Once you see that the two motifs are the same subtraction wearing different clothes, opponency stops being a second hard thing to learn and becomes a second instance of a thing you already understand.
Start by killing the obvious picture. The obvious picture — the one I had as an undergraduate, and the one your computer screen quietly teaches you — is that color works like RGB: there are three kinds of cone, “red,” “green,” and “blue,” and the brain reads off color by checking which cones are firing, the way a screen makes yellow by lighting its red and green dots. It is clean, intuitive, and wrong. Here is the fact that breaks it: a single cone is colorblind. It cannot, by itself, tell you anything about color at all.
Why not? Because a cone reports only how strongly it was stimulated — an intensity, a number — and that single number throws away the wavelength. Recall that each cone’s sensitivity is a broad curve over wavelength. An L-cone responds best around the long-wavelength end, but it also responds, more weakly, to medium wavelengths. So suppose an L-cone gives you a middling response. What caused it? It could be a moderate amount of light at its best wavelength, or a lot of light at a wavelength it is poor at, or anything in between — all of these produce the identical output. The cone has collapsed “which wavelength” and “how much” into one number, and once collapsed, they cannot be pulled back apart. A single cone, fired, cannot tell you the color of what fired it any more than a single ganglion cell, fired, can tell you which of its photoreceptors caught the light. The wavelength information is lost at the receptor. (This, by the way, is exactly the parallel your lecture drew to the auditory system: a single auditory nerve fiber gives the intensity of sound around its best frequency, not the frequency itself. Same collapse, same loss.)
So if one cone cannot see color, where does color come from? From comparison. The wavelength information that any single cone throws away is recoverable by comparing the outputs of cones with different sensitivities. And the operation that does the comparing is — exactly as with lateral inhibition — subtraction.
Here is the move, concretely, with the M and L cones whose curves nearly overlap. Suppose light of some wavelength arrives and you note how strongly M responds and how strongly L responds. Individually, each is ambiguous (the intensity-wavelength collapse). But take the difference, L minus M, and something new appears: a quantity that no longer depends much on the overall intensity (which raised both M and L together, and cancels in the subtraction) but does depend systematically on where in the spectrum the light sits. At long wavelengths L leads M and the difference is positive; at medium wavelengths M leads L and the difference is negative; somewhere between, they are equal and the difference is zero. A single neuron computing L − M therefore reports a position along the spectrum — a genuine color signal — wrung out of two cones that were each individually colorblind. This is the red-green opponent channel. Notice it is built by differencing, and notice the payoff is the same as before: the subtraction cancels the part you do not want (overall intensity) and keeps the part you do (spectral position).
That is the whole idea. Color opponency is lateral inhibition’s trick — take the difference, let the common part cancel, keep what varies — performed not on neighboring points in space but on different cone types looking at the same point. Two motifs, one operation.
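A small numerical sketch may help here. It rests on loud assumptions: the cone sensitivities are modeled as Gaussians peaking at 560 nm (L) and 530 nm (M) with a common width, which real cone spectra are not, and each cone signal is log-compressed before the subtraction so that overall intensity cancels exactly. With a raw L minus M the sign would still track spectral position, but the size of the difference would grow with intensity. All names and numbers are illustrative.

```python
import math

# Illustrative Gaussian sensitivity curves; real cone spectra are not Gaussian.
L_PEAK_NM = 560.0
M_PEAK_NM = 530.0
WIDTH_NM = 50.0


def cone_response(wavelength_nm, intensity, peak_nm):
    """One number per cone: intensity scaled by sensitivity at this wavelength."""
    sensitivity = math.exp(-((wavelength_nm - peak_nm) ** 2) / (2 * WIDTH_NM ** 2))
    return intensity * sensitivity


def red_green(wavelength_nm, intensity):
    """Opponent signal: log L minus log M (the log step is an assumption)."""
    l_resp = cone_response(wavelength_nm, intensity, L_PEAK_NM)
    m_resp = cone_response(wavelength_nm, intensity, M_PEAK_NM)
    return math.log(l_resp) - math.log(m_resp)


# Two different lights that give the L cone the same response.
dim_at_peak = cone_response(560.0, 1.0, L_PEAK_NM)
matched_intensity = 1.0 / cone_response(600.0, 1.0, L_PEAK_NM)
bright_off_peak = cone_response(600.0, matched_intensity, L_PEAK_NM)
print(dim_at_peak, bright_off_peak)  # equal up to rounding: L alone cannot tell them apart

# The difference signal separates the two lights...
print(red_green(560.0, 1.0), red_green(600.0, matched_intensity))
# ...and gives the same value for one wavelength at very different intensities.
print(red_green(600.0, 1.0), red_green(600.0, 1000.0))
```

In this toy the signal is positive at long wavelengths, negative at medium ones, and zero at 545 nm, halfway between the two assumed peaks.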
11.7.1 Opponency, built up one channel at a time
With the core idea in hand, the rest is bookkeeping, and it is worth doing carefully because this is where students get lost in the algebra and miss that nothing new is happening — it is all the same subtraction.
The visual system builds three channels out of the three cone types and the rods, one sum and two differences:
- A luminance channel that essentially adds cone signals (roughly L + M, with rod input): “how much light is here overall,” which — having summed rather than differenced — keeps intensity and is especially useful for fine spatial detail and brightness. This is the achromatic channel.
- A red-green channel: L − M, the one we just built. Positive toward red, negative toward green.
- A blue-yellow channel: S compared against the sum of L and M — that is, S − (L + M). Note that “yellow” is not a receptor; there is no yellow cone. Yellow is what the system computes when L and M are both active and S is not. To get the blue-yellow comparison you first have to construct “yellow” by combining L and M, then difference it against S. The channel is built in two stages, but each stage is the same elementary operation.
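The bookkeeping for the three channels fits in a few lines. This sketch assumes every cone signal carries a weight of 1 and leaves rod input out of the luminance sum. Real weights differ, and `opponent_channels` is a hypothetical name used only for illustration.

```python
def opponent_channels(l, m, s):
    """Combine three cone signals into the three channels listed above.

    Assumption: unit weights on every cone signal; rod input is omitted.
    """
    yellow = l + m                   # "yellow" is constructed, not sensed
    return {
        "luminance": l + m,          # a sum: keeps overall intensity
        "red_green": l - m,          # positive toward red, negative toward green
        "blue_yellow": s - yellow,   # positive toward blue, negative toward yellow
    }


print(opponent_channels(l=1.0, m=1.0, s=0.0))
```

With L and M equally active and S silent, the red-green value is zero and the blue-yellow value is negative: the channels report yellow, with no lean toward red or green.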
And here is the fact that first told the nineteenth century something must be going on like this, long before anyone knew cones existed. Ewald Hering noticed that certain color combinations are forbidden in perception: you can see a reddish-yellow (orange) and a bluish-green (teal), but you can never see a reddish-green or a bluish-yellow. There is no such color, anywhere, for anyone. Why should two color pairs be impossible? Because red and green are not two separate things being added — they are the two signs of one channel. A neuron computing L − M can be positive (red) or negative (green), but it cannot be both at once any more than a number can be simultaneously greater and less than zero. The same for blue and yellow on the S − (L+M) channel. The colors you cannot see are a direct readout of the opponent wiring — the proof, visible in your own experience, that color lives in differences and not in a tally of cone activations.
The other everyday fingerprint of opponency is the afterimage. Stare at a saturated red patch for half a minute and then look at a blank white wall: you will see a ghostly green. Stare at blue, see yellow. The opponent explanation is clean. Prolonged red drives the red-green channel hard in the “red” direction, and the mechanism adapts — fatigues, adjusts its baseline. Remove the red, look at neutral white (which on its own would balance the channel), and the channel, still offset by its adaptation, now reads in the opposite direction: green. The same logic explains the motion aftereffect your lecture mentions — the waterfall illusion, where staring at downward motion makes a static scene appear to drift up — but that runs on motion-opponent machinery further along, in the cortex. Afterimages are not eye-fatigue curiosities. They are the opponent channels showing you their resting point by overshooting it.
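The adaptation account of the afterimage can be sketched as a channel whose baseline drifts toward whatever it has recently been shown. This is a deliberately minimal toy: the single drifting baseline, the `rate` parameter, and the function name `adapted_output` are assumptions for illustration, not a claim about where or how the real adaptation occurs.

```python
def adapted_output(inputs, rate=0.1):
    """Run a red-green channel whose baseline drifts toward its recent input.

    inputs: raw channel values over time (+ for red, - for green, 0 for white).
    rate:   fraction of the gap the baseline closes per time step (illustrative).
    Returns the reported value at each step: raw input minus current baseline.
    """
    baseline = 0.0
    outputs = []
    for raw in inputs:
        outputs.append(raw - baseline)
        baseline += rate * (raw - baseline)
    return outputs


stare_at_red = [1.0] * 30   # thirty time steps of saturated red
then_white = [0.0] * 5      # then a neutral wall
trace = adapted_output(stare_at_red + then_white)

print(round(trace[0], 2))   # strong "red" at first
print(round(trace[29], 2))  # much weaker "red" once the channel has adapted
print(round(trace[30], 2))  # negative on white: a "green" afterimage
print(round(trace[34], 2))  # closer to zero again: the afterimage fades
```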
The opponent channels do not stop at one subtraction; they are themselves combined downstream. A useful way to see it: imagine a later neuron that receives the red-green difference wave on one input and the blue-yellow difference wave on another, and takes a difference of those differences. At wavelengths where one channel is crossing through zero (contributing nothing), the other channel alone determines the percept — which is why there exist particular wavelengths that look like a “pure,” unmixed blue or a pure yellow, with no trace of red or green: those are the points where the red-green channel reads exactly zero. The construction is two layers of the same operation — difference, then difference again — and out of just those two subtraction stages the entire two-dimensional space of perceived hue can be spanned. Nothing in this box is a new mechanism; it is the opponent subtraction, iterated. (Some of this second-stage combination happens in the retina and LGN; some in cortex. The chapter that follows takes up the cortical part.)
11.7.2 Why more cones need not mean better color: the mantis shrimp
Now I can spend the example I held back, because it makes exactly this point and it makes it beautifully. If color came from counting channels — if each cone type were a color, RGB-style — then an animal with more cone types should see more colors, with finer discrimination. The mantis shrimp is the natural test. Some species carry up to twelve distinct photoreceptor classes spanning from deep ultraviolet to far red — four times our three, the most elaborate set of color receptors known in any animal [@ThoenEtAl2014]. The RGB intuition predicts a creature that sees colors we cannot imagine and splits hairs we cannot split.
When it was finally tested behaviorally, the opposite turned out to be true. Trained to discriminate between wavelengths, mantis shrimp are worse at telling similar colors apart than we are — by some measures around ten times worse than humans, bees, or goldfish, unable to distinguish hues that to us are plainly different [@ThoenEtAl2014; @Zaidi2014]. Twelve receptors, poor discrimination. The reason is exactly the lesson of this section: fine color discrimination does not come from having many channels, it comes from comparing them — from the opponent subtractions that wring spectral position out of receptor outputs. The mantis shrimp appears to skip that comparison stage almost entirely, reading its twelve channels more like a bank of separate detectors scanned across a scene — a system built for fast recognition (“is this the right color?”) rather than fine discrimination (“which of these two near-identical colors is it?”), suited to an animal that strikes in milliseconds. It is a different solution to a different problem, and it is the cleanest possible demonstration that, in color, the circuit matters more than the receptor count. Three cones richly compared beat twelve cones barely compared. Comparison is the thing.
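The contrast between the two readout strategies can be caricatured in a few lines. This is a toy, not a model of either animal. It assumes twelve evenly spaced, narrow Gaussian detectors read out only by which one responds most, set against two broad, overlapping detectors read out by a log ratio. Peak positions, widths, and function names are all invented for illustration.

```python
import math


def response(wavelength_nm, peak_nm, width_nm):
    """Gaussian detector response (an illustrative shape, not measured data)."""
    return math.exp(-((wavelength_nm - peak_nm) ** 2) / (2 * width_nm ** 2))


# Twelve narrow detectors spread evenly from 300 to 700 nm (illustrative).
TWELVE_PEAKS = [300.0 + i * 400.0 / 11 for i in range(12)]


def labeled_detector_readout(wavelength_nm):
    """Report only which of the twelve detectors responds most strongly."""
    responses = [response(wavelength_nm, peak, 20.0) for peak in TWELVE_PEAKS]
    return responses.index(max(responses))


def comparison_readout(wavelength_nm):
    """Report the log ratio of two broad, overlapping detectors."""
    long_resp = response(wavelength_nm, 560.0, 50.0)
    medium_resp = response(wavelength_nm, 530.0, 50.0)
    return math.log(long_resp / medium_resp)


a, b = 550.0, 560.0   # two nearby wavelengths
print(labeled_detector_readout(a), labeled_detector_readout(b))  # same detector wins for both
print(comparison_readout(a), comparison_readout(b))              # the ratio differs
```

In this cartoon the twelve-detector readout cannot resolve differences finer than the spacing between its peaks, while the two-detector comparison changes continuously with wavelength.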
11.8 Acuity: why you only see sharply where you are looking
One more property the retina sets up before we follow its output to the brain: acuity, the fineness of spatial detail you can resolve — and it is bound up with the convergence we have been describing. The logic is simple once you see it. Acuity is set by how many photoreceptors funnel into a single ganglion cell. Where one ganglion cell pools input from a wide patch of receptors, it cannot tell you which of them caught the light — once it fires, that information is gone, just as a cone cannot report which wavelength fired it. A widely-pooling ganglion cell has a large receptive field and coarse acuity. Where instead a ganglion cell draws on nearly one receptor, its receptive field is tiny and its acuity is high: if it fires, you know almost exactly where the light fell.
This is the deep reason for the fovea-periphery difference. In the fovea, cones connect to ganglion cells at close to a one-to-one ratio — minimal pooling, maximal acuity. Out in the periphery, thousands of receptors may converge onto a single ganglion cell — heavy pooling, high sensitivity (many receptors catching scarce photons), but coarse acuity. So the fovea is your high-resolution patch and the periphery is a low-resolution, motion-sensitive early-warning system. There is a direct trade buried here: pooling buys sensitivity at the cost of acuity, which is precisely the bargain rods strike for night vision and cones decline for daylight detail.
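The trade between sensitivity and acuity can be seen in a small sketch. It assumes a one-dimensional row of eight receptors, non-overlapping pools, and a hard firing threshold. The name `ganglion_outputs` and all the numbers are illustrative.

```python
def ganglion_outputs(receptors, pool_size, threshold):
    """Sum receptors in consecutive groups; a cell fires if its sum reaches threshold.

    Returns one True/False per ganglion cell.
    Assumption: non-overlapping pools and a hard threshold, for simplicity.
    """
    fired = []
    for start in range(0, len(receptors), pool_size):
        pooled = sum(receptors[start:start + pool_size])
        fired.append(pooled >= threshold)
    return fired


# Dim, diffuse light: every receptor catches a little, none catches much.
dim_scene = [0.2] * 8
print(ganglion_outputs(dim_scene, pool_size=1, threshold=1.0))  # fovea-like: nothing fires
print(ganglion_outputs(dim_scene, pool_size=8, threshold=1.0))  # periphery-like: the pooled cell fires

# A bright point of light on one receptor, at two different positions.
spot_left = [0, 0, 2.0, 0, 0, 0, 0, 0]
spot_right = [0, 0, 0, 0, 0, 2.0, 0, 0]
print(ganglion_outputs(spot_left, 1, 1.0) == ganglion_outputs(spot_right, 1, 1.0))  # False: position resolved
print(ganglion_outputs(spot_left, 8, 1.0) == ganglion_outputs(spot_right, 8, 1.0))  # True: position lost
```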
And now the genuinely surprising part, which your lecture rightly flags as an illusion worth confronting. It feels as though you see the whole scene in front of you in sharp, uniform detail. You do not. Only the foveal sliver — a couple of degrees, about the width of your thumbnail at arm’s length — is actually high-resolution at any instant. If vision could be frozen, you would find the rest of the visual field is a blur, and nearly colorless toward the edges (few cones out there). The reason you do not notice is that your eyes are in nearly constant motion, flicking the high-acuity fovea from point to point several times a second, and the brain stitches the samples into a seamless impression of a detailed, stable world. The detail is real, but it is gathered actively, a piece at a time — not delivered all at once like a photograph. Your sense of seeing everything sharply is the brain’s competent summary, not the retina’s actual output. (We will return to this active, sampled character of vision in the next chapter; it matters more than it first appears.)
Figure 4.4.7. Acuity and convergence. Left: foveal pathway, near one-to-one cone-to-ganglion-cell wiring, small receptive fields, high acuity. Right: peripheral pathway, many receptors converging on one ganglion cell, large receptive fields, high sensitivity but low acuity. Inset: a line of text rendered sharp only at fixation, blurred a few degrees out. [Figure to source or redraw.]
11.9 Parallel channels leaving the eye
We have been speaking of “the ganglion cell” as if there were one kind. There is not. The retina sends its output to the brain not as a single stream but as a set of parallel channels, each a different population of ganglion cells reporting a different aspect of the scene — and this parallel, decomposed character is the last thing the retina sets up before the optic nerve. The modern count is humbling: the mammalian retina contains dozens of ganglion-cell types — well over thirty functionally distinct output channels have been catalogued — each tiling the retina and extracting its own feature [@Masland2012; @BadenEtAl2016; @SanesMasland2015]. The retina is not one camera but a stack of specialized ones, each sending its own filtered description inward.
For this course, three of those channels matter most, because they organize the early visual pathway and they have distinct destinations we will use in the next chapter:
- Midget ganglion cells feed the parvocellular (P) pathway: small receptive fields, high acuity, slow and sustained, and carriers of the red-green (L − M) opponent signal. They are dense in the fovea and make up the large majority of primate ganglion cells. The P pathway is the high-resolution, color-and-detail stream.
- Parasol ganglion cells feed the magnocellular (M) pathway: large receptive fields, fast and transient, highly sensitive to luminance contrast and motion, but not carriers of fine color. The M pathway is the fast, motion-and-contrast stream. (Mnemonic worth its keep: M for magno, motion, and massive receptive fields; P for parvo, which means small, and for precise.)
- Small bistratified ganglion cells feed the koniocellular (K) pathway, which carries prominent blue-yellow (S vs. L+M) signals.
This three-channel scheme is genuinely useful and you should learn it — but it is a simplification in two specific ways that are worth flagging honestly rather than smoothing over, and I have put the details in a box so they do not clutter the main line.
First, “rods feed the magnocellular pathway” is wrong, even though you will see it written. It is an understandable guess — rods and the M pathway are both high-sensitivity, low-acuity — but rods do not have their own private channel to the brain at all. Rod signals are routed into the existing cone pathways through a specialized interneuron, the AII amacrine cell, which distributes the rod signal into the ON and OFF cone-bipolar circuits, and from there into all three of the P, M, and K streams [@Wassle2004; @Masland2012]. So at night your vision runs largely on rods, but it reaches your brain by borrowing the cone wiring, not through a dedicated rod-to-magno line. The clean “rods → M” mapping is a teaching convenience that happens to be false.
Second, “K = the blue pathway” undersells a genuinely messy channel. The koniocellular pathway does carry the major blue-yellow signal, which is why it is taught that way. But K is not one thing: it is a heterogeneous collection of cell types with blue-ON and blue-OFF cells, cells suppressed by contrast, cells that respond to nothing the standard stimuli offer, and — unlike the orderly M and P streams that report to V1 — some K cells project past primary visual cortex directly to motion area MT [@Solomon2021]. The “third pathway” is real and important, but it is a grab-bag we understand less well than the other two, and calling it simply “blue” hides that. When you meet M, P, and K in the next chapter as the inputs to cortex, carry the clean version as your scaffold and this box as the asterisk.
The point that survives all the complication, and the one to keep: the retina does not ship a raw image inward. It decomposes the scene into parallel streams — one for fine form and color, one for fast motion and contrast, one for blue-yellow, and many more besides — each a partial, specialized description. The brain receives a set of filtered reports, not a picture. This decomposition, begun in the retina, is maintained all the way up, and it is the organizing fact of the cortical chapter to come.
Figure 4.4.8. The three principal early visual channels. Midget/parvocellular (small fields, high acuity, red-green), parasol/magnocellular (large fields, fast, motion and luminance), and small-bistratified/koniocellular (blue-yellow), each with characteristic receptive-field size, speed, color signal, and projection target. [Figure to source or redraw.]
11.10 From eye to brain: the optic nerve, the crossing, and the gateway ahead
Follow the output the rest of the way. The axons of all those ganglion cells gather and leave the eye as the optic nerve — cranial nerve II — and almost at once the two optic nerves, one from each eye, meet at a junction called the optic chiasm. The name is a clue: chiasm is from the Greek letter chi, X, a crossing. And a crossing is exactly what happens, but a partial and beautifully logical one.
Here is the logic, and it is worth getting right because it explains both normal vision and, in the next chapter, a whole family of clinical deficits. Divide each retina into a nasal half (toward the nose) and a temporal half (toward the temple). At the chiasm, the fibers from the nasal halves cross to the opposite side of the brain, while the fibers from the temporal halves stay on their own side. Work through where that sends the image, and a clean rule falls out. An object off to your right casts its light onto the nasal retina of your right eye and the temporal retina of your left eye. The right eye’s nasal fibers cross; the left eye’s temporal fibers do not. Both therefore end up on the left side of the brain. The same logic in mirror sends the left half of the world to the right brain. So, exactly as in the somatosensory system we studied earlier — and via the same general principle of contralateral organization — each half of the visual world is handled by the opposite half of the brain. Note the elegant consequence: because each eye contributes to both sides, closing one eye does not blind you to half the world; it costs you depth (you lose the two slightly-different views the brain compares for stereo distance), but not a visual field.
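The routing rule is simple enough to write down as two small functions. This sketch encodes only the logic just described. The function names are hypothetical, and the strings "left", "right", "nasal", and "temporal" are the only values it is meant to handle.

```python
def hemiretina_hit(visual_field_side, eye):
    """Which half of a given eye's retina receives light from one side of the world.

    The optics invert the image, so the right visual field lands on the left
    half of each retina: nasal in the right eye, temporal in the left eye.
    """
    return "nasal" if visual_field_side == eye else "temporal"


def hemisphere_reached(eye, hemiretina):
    """Apply the chiasm rule: nasal fibers cross, temporal fibers stay."""
    opposite = {"left": "right", "right": "left"}
    return opposite[eye] if hemiretina == "nasal" else eye


for field in ("left", "right"):
    for eye in ("left", "right"):
        retina = hemiretina_hit(field, eye)
        brain = hemisphere_reached(eye, retina)
        print(f"{field} field, {eye} eye -> {retina} retina -> {brain} hemisphere")
```

All four rows send a given side of the world to the opposite hemisphere, and each eye appears in both the left-field and right-field rows, which is why closing one eye does not remove a visual field.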
That crossing also makes the signature of damage predictable — cut the pathway at different points and you get different, diagnosable patterns of visual-field loss — and it is precisely there, at the readout of these crossings, that the clinical story of vision begins. But that story belongs to the next chapter, where it can be told properly alongside the cortex it depends on, so I will only plant the marker here: the chiasm is where the logic of “opposite half of the brain” is wired, and the lesions that exploit it are the next chapter’s opening.
For now, follow the crossed and uncrossed fibers just past the chiasm — they continue as the optic tract — to their main destination, the gateway every one of the senses on the shared plan has to pass through on the way to cortex. For vision that gateway is the lateral geniculate nucleus of the thalamus, the LGN. And here, deliberately, this chapter stops. We have brought light in through the optics, transduced it, watched the retina perform its two great compressions — differencing across space, differencing across the spectrum — split the output into parallel streams, and delivered them to the thalamic door. What the LGN does with them is not the passive relaying its “gateway” name suggests, and what the cortex behind it builds — orientation, depth, motion, objects, faces, and the strange and revealing ways all of that can break — is the subject of the chapter that follows.
Figure 4.4.9. The visual pathway and the partial decussation. Left and right visual fields; nasal and temporal hemiretinas; optic nerve, chiasm (nasal fibers crossing, temporal fibers staying), optic tract, and the LGN as destination. Mark the points whose lesions produce characteristic field defects, to be taken up in Chapter 4.5. [Figure to source or redraw.]
11.11 Coda: the retina as a biological compression engine
Stand back from the machinery and the chapter has one idea in it, stated several ways. The retina takes an overwhelming flood of photons (eleven orders of magnitude of intensity, a hundred million receptors’ worth) and turns it into something a million-axon nerve can carry and a brain can use. It does this not by recording but by computing: transducing light into a graded signal whose sign runs backwards; splitting that signal into ON and OFF; differencing it across space so that only edges and changes survive, the overall light level thrown away; differencing cone signals across the spectrum so that color is wrung from comparison rather than read off receptors; pooling tightly at the fovea for acuity and loosely in the periphery for sensitivity; and shipping the result inward as a set of parallel, specialized streams. Every one of those operations is a way of keeping the differences that matter and discarding the absolutes that do not. The retina is a difference engine.
And that is why the camera metaphor has to be handled so gently. Optically, the front of the eye is a camera — a lens throwing an image on a surface. But the surface is not film. It is neural tissue that takes the image apart before passing it on, sending the brain not a picture but a structured, compressed, already-interpreted set of evidence about the world. The two circuit motifs you met here — lateral inhibition and opponency — are your first real encounter with how nervous systems compute, and they are the same elementary move, take the difference and let the common part cancel, performed once across space and once across the spectrum. You will meet that move again and again in this book. It is one of the nervous system’s deepest and most reusable ideas, and you saw it first in the eye.
The next chapter begins exactly where this one ends — at the LGN, and the cortex behind it. There we will see how V1 takes the retina’s edges and builds orientation, how the eye-specific signals kept separate at the chiasm are recombined for depth, how the parallel streams fan out into a constellation of visual areas, and what happens — sometimes sad, sometimes bizarre, always revealing — when the machinery of cortical vision fails.
Reasonably settled:
- The eye’s optics are camera-like, but the retina is neural tissue that transforms the image rather than recording it. The retina develops from the diencephalon and is, literally, brain.
- Photoreceptors transduce light via opsins and respond to light by hyperpolarizing and releasing less glutamate. The dark current and its cost are well understood.
- The retina splits signals into ON and OFF channels and, through lateral inhibition (horizontal cells), builds center-surround receptive fields that emphasize contrast and edges over absolute illumination. This is among the best-established computations in all of neuroscience.
- Color depends on comparison among cone classes (opponency), not on one-to-one wavelength labels — a conclusion supported independently by perceptual facts (forbidden colors, afterimages), by physiology, and by comparative cases like the mantis shrimp.
- The retina decomposes the scene into many parallel channels; the midget/P, parasol/M, and small-bistratified/K streams are real and useful organizing categories.
- The partial decussation at the chiasm sends each half of the visual field to the opposite hemisphere.
- Foveal cones converge near one-to-one for high acuity; peripheral receptors pool heavily for sensitivity.
Genuinely unsettled, or more complicated than the tidy version:
- Retinal ganglion-cell diversity. The clean three-channel (M/P/K) account is a simplification of a system with thirty-plus output types whose functions are still being worked out. The koniocellular pathway in particular is heterogeneous and incompletely understood.
- How rod and cone circuits share wiring. The routing of rod signals through the AII amacrine cell into cone pathways is established in outline, but the details (and species differences) are an active area.
- The function of the massive feedback the visual thalamus receives from cortex — taken up in the next chapter — remains, as for the other senses, genuinely open.
And, as always: there is a great deal here we are sure of. The retina’s basic computations — transduction, ON/OFF, center-surround, opponency, parallel channels — are some of the most solid and most beautiful results in sensory neuroscience. You can build on them.