10 Chapter 4.3 — Audition
Hearing as a Distance Sense
10.1 A sense that buys time
We have spent the last two chapters close to the body. Somatosensation reports the world in contact with the skin; pain reports that some of that contact is doing damage. Both are, in the language of this unit, reactive senses: the stimulus has already arrived, and the job is to respond to something that is already happening to you. With audition the geometry of the problem changes. A twig cracks somewhere behind you. A truck rumbles before it rounds the corner. Someone says your name from the next room. None of these events is touching you yet, and that is precisely the point.
This is the move I want to keep front and center, because it is the spine of the rest of this unit. The distance senses convert spatial distance into time — into warning. The sound is not the event; it is a pressure wave thrown off by the event, propagating through air at roughly 340 meters per second, which on the scale of a forest or a room is slow enough to be useful. By the time you hear the predator, you have not yet been eaten, and that gap is the whole evolutionary payoff. Hearing is one of the senses that lets a control system stop merely reacting and start anticipating. It is, in the terms we set up in Unit II and have been building on since, part of how an animal buys prediction.
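To put a number on "distance into time", here is a minimal sketch. It assumes the round 340 m/s figure above, a source that heads straight for the listener at constant speed, and no reaction time. The function name and the scenario values are hypothetical, chosen only for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # round figure used in the text; varies with temperature


def warning_time_s(distance_m: float, approach_speed_m_per_s: float,
                   sound_speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Seconds between hearing a source and the source arriving.

    Assumes the sound was emitted when the source was distance_m away and
    that the source then closes the gap at a constant speed.
    """
    if distance_m < 0 or approach_speed_m_per_s <= 0:
        raise ValueError("need a non-negative distance and a positive approach speed")
    sound_delay = distance_m / sound_speed_m_per_s        # when the sound reaches you
    arrival_delay = distance_m / approach_speed_m_per_s   # when the source reaches you
    return arrival_delay - sound_delay


if __name__ == "__main__":
    # Hypothetical scenario: something 34 m away closing at 10 m/s.
    print(f"{warning_time_s(34.0, 10.0):.2f} s of warning")
```

With those made-up numbers the sound arrives after a tenth of a second and the source after 3.4 seconds, leaving about 3.3 seconds in which to act.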
I do not want to reduce everything about hearing to predator and prey. The same machinery that lets a mouse hear an owl lets you hear a cello, and a chapter that pretended music were merely a survival tool would be both wrong and joyless. But the architecture came first, and the architecture was built by selection acting on animals that needed to detect events at a distance, identify what produced them, locate them, and act before contact. Speech and music are recent tenants in a very old building. We will meet speech again in Unit VII; here I treat audition relatively briefly for exactly that reason, and concentrate on the parts that belong to sensing for action rather than to language.
This chapter has four jobs. First, place hearing in its evolutionary context, because the human ear is a recent mammalian elaboration of much older machinery — and the history is visible in the hardware. Second, follow the physics and biomechanics: how a pressure wave in air becomes a receptor potential in a hair cell. Third, trace the pathway from cochlea to cortex, which is not the tidy single cable our other sensory systems might have led you to expect. Fourth, ask what the system actually computes — frequency, intensity, and above all location, where some of the most beautiful and most genuinely unsettled neuroscience in this unit lives.
10.2 The evolution of audition: old mechanosensors in new acoustic worlds
The evolutionary story of vision is easy to turn into a cartoon: light-sensitive molecule, eye spot, cup eye, pinhole, lens, and eventually an animal that looks back at you with eyes that are annoyingly good. The cartoon leaves a great deal out, but it has an intuitive ladder. Audition resists that narration, because the first step was not an ear. The first step was a cell that could be bent.
The deep continuity in vertebrate hearing is the mechanosensory hair cell — and it is worth seeing that this is the same transduction trick we have been tracking since the molecular-transduction layer of this unit, where a mechanical force pulls open an ion channel. Hair cells are not specialized for sound. They are also the receptors of the vestibular system, where they report gravity and head acceleration, and of the lateral-line system of fishes and aquatic amphibians, where they report water movement along the body. An ancestral hair cell was not a “hearing cell”; it was a mechanically sensitive cell whose apical bundle turned deflection into an electrical signal. Sound is one way to bend such a bundle. Gravity and water flow are others. That is why audition belongs in the same family as balance, and, in aquatic vertebrates, the lateral line [@FritzschStraka2014HairCells; @LipovsekElgoyhen2023EvolutionaryTuning].
This already removes a misunderstanding. Early vertebrates lived in water, and water poses the problem differently than air does. In water, mechanical disturbances include both pressure changes and bulk particle motion. Fish inner ears use dense otoliths that lag behind when the surrounding tissue moves, bending hair-cell bundles; lateral-line organs sit at the body surface and are tuned to local water movement. A nearby predator, prey, or current can stimulate overlapping systems at once. The boundary between “hearing” and “feeling water move” is therefore far less clean in a fish than in us [@HiggsRadford2013LateralLine; @Webb2023LateralLineEvolution]. Fish hearing is not even one thing: some species are poor pressure-detectors, while others have accessory structures — gas-filled swim bladders, and in some groups tiny Weberian ossicles linking the bladder to the inner ear — that improve sensitivity. Small bones improve hearing in those fish, and small bones improve hearing in mammals, but they are not the same bones, and they were not inherited from a common small-bones-for-hearing ancestor. They are convergent answers to a shared physical problem [@LadichSchulzMirbach2016FishDiversity].
The move onto land changed the physics again, and introduced the problem the rest of this chapter keeps returning to. Airborne sound does not pass efficiently into fluid: air and inner-ear fluid have very different acoustic impedances, so a pressure wave in air mostly bounces off a fluid boundary rather than entering it. This is the impedance-matching problem, and terrestrial vertebrates solved it with tympanic middle ears that couple an eardrum to the inner ear.
Here I want to be careful, because this is exactly the kind of claim textbooks over-tidy. Tympanic hearing appears to have arisen independently in several tetrapod lineages — amphibians, mammals, and reptiles among them — which is a lovely example of convergence under a shared physical constraint. But “independent origins” is itself being revised as the evidence sharpens: a recent analysis of crown reptiles argues for a single deep origin of the tympanic middle ear within that group rather than multiple separate origins among living reptiles [@ChristensenDalsgaardCarr2008TympanicEars; @Tucker2017TympanicMiddleEar; @Bronzati2024CrownReptileTympanum]. The large point survives — land animals repeatedly faced the same impedance problem — but it is a good reminder not to harden a satisfying story while the fossils are still arriving.
Mammals then did something genuinely elegant. The mammalian middle ear has three ossicles — malleus, incus, and stapes — and two of them, the malleus and incus, derive from bones that once formed part of the jaw joint of our synapsid ancestors. As the mammalian dentary-squamosal jaw joint evolved, older jaw elements were freed from chewing and recruited into hearing. This is one of the canonical examples of exaptation: structures built for one job becoming available for another. Evolution did not plan it — it tinkered with development, jaw mechanics, and skull geometry across long spans of time — but the result is a standing reminder that the head you use to chew, speak, and listen is a historical compromise, not a designed instrument [@AnthwalJoshiTucker2013MiddleEarJaw; @MaierRuf2016MiddleEarHistorical].
One more lesson, and then we can go inside the ear. Ears did not evolve only once, and they are not all built on the vertebrate plan. Insects have evolved tympanal ears many times over, and they place them wherever development allows — on legs, wings, abdomens, even mouthparts. An ear on a cricket’s leg looks bizarre only if you assume ears belong on the side of a head. They do not. An ear is not a fleshy flap on a primate skull; it is any biological solution to the problem of extracting useful information from mechanical vibration [@WarrenNowotny2021InsectEars].
Young humans hear from roughly 20 Hz to 20 kHz, the upper limit dropping with age [@Purves2001AudibleSpectrum]. It is tempting to rank species on that scale, but “better hearing” is not a single dimension. Elephants communicate over kilometers using infrasonic rumbles near or below the bottom of our range [@Garstang2004ElephantCommunication]. Bats and toothed whales independently evolved echolocation — in air and in water — with ears, skulls, and circuits tuned to analyze returning echoes [@MossOrtizWahlberg2023Echolocation]. The greater wax moth detects ultrasound approaching 300 kHz, almost certainly as a move in its evolutionary arms race with echolocating bats [@MoirJacksonWindmill2013WaxMoth]. Barn owls are famous not because they hear every frequency well but because their ears, facial ruff, and brainstem circuits are exquisitely specialized for locating prey in darkness [@KnudsenKonishi1979BarnOwl] — a fact that will matter a great deal later in this chapter. The point is not that humans are mediocre, though that is a healthy corrective to our usual vanity. The point is that auditory systems are shaped to ecological problems. Evolution does not optimize hearing in the abstract; it tunes a control system to the world the animal must act in.
So the auditory system can feel oddly busy compared with a clean sensory-pathway diagram, and now we can see why: it is carrying several histories at once. The inner ear carries the history of balance and water motion; the middle ear carries the history of jaws; the auditory brainstem, as we will see, carries the history of rapid orienting and bilateral comparison. Human audition is not a language device bolted onto a generic mammal. It is an ancient vertebrate mechanosensory system, elaborated for the acoustic life of primates and then, very recently, recruited for speech and song.
Figure 4.3.1. A non-laddered sketch of auditory evolution: (1) an ancestral mechanosensory hair cell; (2) vestibular and lateral-line hair-cell organs in an aquatic vertebrate; (3) a fish inner ear with otoliths, with optional swim-bladder/Weberian-ossicle enhancement; (4) a tympanic middle ear as an impedance-matching device, labeled to note that the number and timing of origins differs by lineage; (5) the mammalian three-ossicle middle ear and coiled cochlea, with malleus and incus marked as former jaw elements; (6) an insect tympanal ear on a non-head segment. The figure should deliberately avoid implying a ladder of progress toward humans. [Figure to source or redraw.]
10.3 What sound is
Sound is a mechanical disturbance traveling through a medium. In air it is a train of compressions and rarefactions: a vibrating object — a tuning fork, a vocal fold, a speaker cone — pushes nearby air molecules together and then lets them spread, and that local pressure pattern propagates outward. Nothing material streams from the source to your ear. Energy moves through the medium; the molecules mostly stay home.
A pure tone is described by frequency (cycles per second, in hertz), which maps strongly though not perfectly onto perceived pitch, and amplitude, the size of the pressure swing, which maps onto loudness, again imperfectly, because the ear weights frequencies unequally and loudness is not a simple pressure meter. Two practical points matter for the rest of the chapter. First, sound level is measured in decibels, which are logarithmic: a 60 dB sound is not twice a 30 dB sound but roughly thirty times larger in pressure amplitude, and a thousand times larger in intensity. That is not perversity; the ear covers an enormous dynamic range, from a mosquito near the ear to a thunderclap, and a logarithmic scale is the sane way to handle it. Second, and this is the point I care about, the same machinery that makes the ear exquisitely sensitive also makes it mechanically fragile. The structures that amplify faint vibrations can be injured by loud ones, which is why hearing loss from noise exposure is a real and permanent thing, a fact we will return to.
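The logarithmic scale is easier to trust once you have run the arithmetic. The sketch below uses the standard definition of sound pressure level, 20 times the base-10 logarithm of a pressure ratio, with the conventional 20 micropascal reference for air. The function names are mine, not part of any library.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # conventional 0 dB SPL reference pressure in air


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


def pressure_ratio(db_difference: float) -> float:
    """How many times larger the pressure is for a given dB difference."""
    return 10.0 ** (db_difference / 20.0)


def intensity_ratio(db_difference: float) -> float:
    """How many times larger the intensity (power per area) is."""
    return 10.0 ** (db_difference / 10.0)


if __name__ == "__main__":
    step = 60.0 - 30.0
    print(f"pressure ratio:  {pressure_ratio(step):.1f}")
    print(f"intensity ratio: {intensity_ratio(step):.0f}")
```

A 30 dB step is a factor of about 32 in pressure and 1000 in intensity, which is why "twice as many decibels" is never "twice as much sound".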
Real sounds are almost never pure tones. A voice, a violin, a snapping twig, a siren — each has a spectrum, a mixture of frequencies with its own amplitudes and its own changes over time. This is why a middle C on a piano and a middle C on a clarinet share a pitch but sound nothing alike: same fundamental, different timbre. And it tells us what the auditory system is really for. It does not merely ask “what frequency is present?” It asks “what kind of event produced this pattern of frequencies changing over this span of time?” That is a question about causes, not about pressure — and answering it will turn out to require a great deal of machinery.
10.4 The ear as an impedance-matching machine
The auditory system opens with the engineering problem we met in the evolution section: pressure waves arrive in air, but the receptor cells sit in fluid, and most of the airborne energy would simply reflect off the fluid boundary. The outer and middle ear exist to solve this.
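How bad is the mismatch? A rough sketch, assuming a flat boundary, normal incidence, and inner-ear fluid treated as plain water. That last assumption is a simplification, and the impedance values are typical room-temperature figures, not numbers from this chapter.

```python
import math

# Assumed characteristic acoustic impedances in Pa*s/m (rayl).
Z_AIR = 415.0
Z_WATER = 1.48e6  # stand-in for inner-ear fluid; a simplification


def intensity_transmission(z1: float, z2: float) -> float:
    """Fraction of incident sound intensity that crosses a flat boundary
    at normal incidence between media with impedances z1 and z2."""
    return 4.0 * z1 * z2 / (z1 + z2) ** 2


def to_decibels(power_ratio: float) -> float:
    """Express a power or intensity ratio in decibels."""
    return 10.0 * math.log10(power_ratio)


if __name__ == "__main__":
    fraction = intensity_transmission(Z_AIR, Z_WATER)
    print(f"transmitted fraction: {fraction:.4%}")
    print(f"loss: {to_decibels(fraction):.1f} dB")
```

Under these assumptions only about a tenth of a percent of the energy gets across, a loss of roughly 30 dB. The real cochlea is not an open body of water, so treat the number as an order-of-magnitude guide rather than a measurement.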
Sound first meets the outer ear — the pinna and ear canal. The pinna is not just cartilage for hanging eyeglasses on. Its folds filter sound in a direction-dependent way, and together with the head and torso it imposes a characteristic, location-dependent filtering on incoming sound. Engineers call the resulting description a head-related transfer function; the biology of it is that the spectrum reaching your eardrum carries a fingerprint of where the sound came from, especially for up-down and front-back, where the two ears get nearly identical timing. We will use this when we get to localization. The ear canal adds its own resonances, boosting some frequencies — notably in the range that matters for speech. I have in the past called this an auditory “fovea,” and I want to take that back, or at least heavily qualify it. The retina has a literal fovea: a dense patch of receptors with high acuity. The ear has no such patch. What it has is a set of mechanical filters that make certain frequency ranges especially consequential. Speech benefits from that, but the filters were not built for English. They are old vertebrate hardware that speech later moved into [@Hofman1998NewEars].
The pressure wave then reaches the tympanic membrane, the eardrum, which moves with it, turning airborne pressure into mechanical motion. That motion passes through the three ossicles — malleus, incus, stapes; hammer, anvil, stirrup. The ossicles act as a lever system and, crucially, concentrate force from the relatively large eardrum onto the much smaller oval window of the cochlea. Force over a large area transferred to a small area is a pressure gain, and that is the impedance match: it is how airborne energy gets efficiently into cochlear fluid instead of bouncing off. The stapes pushes the oval window; the fluid inside the cochlea moves; and a second flexible membrane, the round window, bulges to let that fluid displacement go somewhere. Without the round window the cochlea would be a sealed, rigid bottle — push at one end and nothing useful happens.
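The pressure gain can be estimated with two multiplications. The sketch below assumes commonly quoted approximate human values: an effective eardrum area near 55 square millimeters, a stapes footplate near 3.2 square millimeters, and an ossicular lever ratio near 1.3. None of these figures comes from this chapter, and real ears vary with frequency and from person to person.

```python
import math

# Assumed, approximate human values (illustrative only).
EARDRUM_EFFECTIVE_AREA_MM2 = 55.0
STAPES_FOOTPLATE_AREA_MM2 = 3.2
OSSICULAR_LEVER_RATIO = 1.3


def middle_ear_pressure_gain(eardrum_area: float = EARDRUM_EFFECTIVE_AREA_MM2,
                             footplate_area: float = STAPES_FOOTPLATE_AREA_MM2,
                             lever_ratio: float = OSSICULAR_LEVER_RATIO) -> float:
    """Idealized pressure gain: the same force on a smaller area, times the lever."""
    area_ratio = eardrum_area / footplate_area
    return area_ratio * lever_ratio


def gain_in_db(pressure_gain: float) -> float:
    """Pressure ratios convert to decibels with a factor of 20."""
    return 20.0 * math.log10(pressure_gain)


if __name__ == "__main__":
    gain = middle_ear_pressure_gain()
    print(f"pressure gain: about {gain:.0f}x ({gain_in_db(gain):.0f} dB)")
```

With these assumed numbers the idealized gain is about 22-fold, or roughly 27 dB, which recovers most of the loss that an unassisted air-to-fluid boundary would impose.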
There is a lesson worth stating plainly, because it recurs for every sense. The receptors are not at the eardrum. The eardrum is part of an energy-transfer chain: air pressure, to membrane motion, to ossicle motion, to fluid motion, to hair-bundle deflection, to electrical signals. Every link in that chain loses something and emphasizes something else. A sensory system is not a transparent window onto the world. It is a selective transducer, and what it selects is part of what the animal ends up perceiving.
Figure 4.3.2. Cross-section of outer, middle, and inner ear: pinna, ear canal, tympanic membrane, malleus, incus, stapes, oval window, round window, cochlea, and the vestibular organs. The figure should emphasize the energy-transfer chain — air pressure → eardrum → ossicles → cochlear fluid wave — and the area difference between eardrum and oval window that produces the impedance match. [Figure to source or redraw.]
10.5 The cochlea: an unrolled map of frequency
The cochlea is coiled like a snail shell, one of those anatomical comparisons that actually helps. Unroll it conceptually and you get a tapered mechanical structure. Near the base, by the oval and round windows, the basilar membrane is narrow and stiff; near the apex it is wider and floppier. That mechanical gradient is the heart of the matter: stiff structures resonate to high frequencies, compliant ones to low. So a given frequency produces its maximum displacement at a particular place along the membrane. High frequencies peak near the base; low frequencies travel farther and peak nearer the apex. This place-to-frequency mapping is called tonotopy, and it is the auditory system’s first and most basic feature map.
The foundational demonstration goes back to Georg von Békésy, who showed that a sound launches a traveling wave along the basilar membrane — a wave that grows, peaks at a frequency-dependent location, and then decays [@Bekesy1960Experiments]. That spatial peak is one of the first ways the nervous system represents frequency: a 4000 Hz tone displaces a different stretch of membrane than a 250 Hz tone, and downstream neurons inherit that “which place moved” information.
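One way to make the place map concrete is the Greenwood function, an empirical fit relating position along the cochlea to characteristic frequency. It is not discussed in this chapter. The constants below are the commonly quoted human values and should be read as an assumption, and the function names are mine.

```python
import math

# Commonly quoted human constants for the Greenwood fit (assumed, not from this chapter).
A_HZ = 165.4
SLOPE = 2.1
K = 0.88


def frequency_at(place_from_apex: float) -> float:
    """Characteristic frequency in Hz at a fractional position along the
    basilar membrane, where 0.0 is the apex and 1.0 is the base."""
    if not 0.0 <= place_from_apex <= 1.0:
        raise ValueError("place must be a fraction between 0 and 1")
    return A_HZ * (10.0 ** (SLOPE * place_from_apex) - K)


def place_of(frequency_hz: float) -> float:
    """Inverse map: fractional distance from the apex for a given frequency."""
    return math.log10(frequency_hz / A_HZ + K) / SLOPE


if __name__ == "__main__":
    for tone in (250.0, 4000.0):
        print(f"{tone:6.0f} Hz peaks about {place_of(tone):.0%} of the way from apex to base")
```

Under this fit the 250 Hz tone peaks near the apical end and the 4000 Hz tone well toward the base, and the ends of the map land close to the 20 Hz and 20 kHz limits of young human hearing.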
But passive mechanics alone would give poor hearing — far less sensitive and far less sharply tuned than what we actually have. The cochlea is active. The trick lives in a division of labor between two kinds of hair cell. Inner hair cells are the true sensory receptors; they carry the great majority of the afferent signal to the auditory nerve. Outer hair cells are mostly amplifiers: they change length in response to voltage, driven by the motor protein prestin, and this electromotility pumps mechanical energy back into the basilar membrane, sharpening its tuning [@Ashmore2008OuterHairCell; @Dallos2008CochlearAmplification]. Knock out prestin function in mice and the cochlear amplifier collapses, with large losses of sensitivity [@Liberman2002Prestin]. The clinical corollary follows directly: damage outer hair cells and hearing becomes less sensitive and less sharply tuned; damage inner hair cells or their synapses and the signal to the brain is compromised at the source.
Two codes for frequency, then, not one. There is the place code from tonotopy, and there is a timing code: for lower-frequency sounds, auditory-nerve fibers can phase-lock to a particular part of each cycle of the wave. No single fiber can fire on every cycle at high frequencies, but populations of fibers, firing on different cycles, preserve timing information across a useful range. As usual, the nervous system declines to rely on one imperfect code when it can combine several — a theme that will return, with consequences, when we reach localization.
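The population idea can be shown with a toy simulation. This is a sketch under loud assumptions: every simulated fiber fires at the same preferred phase, skips most cycles at random, and has Gaussian timing jitter. The firing probability, jitter, and fiber count are hypothetical values, not measurements. Vector strength is a standard index of phase locking, with 1 meaning perfect locking and 0 meaning none.

```python
import cmath
import math
import random


def simulate_fiber(freq_hz, n_cycles, fire_prob, jitter_s, rng):
    """Spike times for one fiber that fires on a random subset of cycles,
    always near one quarter of the way through the cycle."""
    period = 1.0 / freq_hz
    spikes = []
    for cycle in range(n_cycles):
        if rng.random() < fire_prob:
            spikes.append((cycle + 0.25) * period + rng.gauss(0.0, jitter_s))
    return spikes


def vector_strength(spike_times, freq_hz):
    """Length of the mean phase vector of the spikes relative to the tone."""
    if not spike_times:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * freq_hz * t) for t in spike_times)
    return abs(total) / len(spike_times)


def fraction_of_cycles_marked(spike_trains, freq_hz, n_cycles):
    """Fraction of stimulus cycles that contain at least one spike."""
    period = 1.0 / freq_hz
    marked = {int(t // period) for train in spike_trains for t in train}
    return len(marked & set(range(n_cycles))) / n_cycles


if __name__ == "__main__":
    rng = random.Random(0)
    freq, cycles = 1000.0, 200  # hypothetical 1 kHz tone, 200 cycles
    trains = [simulate_fiber(freq, cycles, 0.2, 50e-6, rng) for _ in range(30)]
    pooled = [t for train in trains for t in train]
    print("one fiber marks", fraction_of_cycles_marked(trains[:1], freq, cycles))
    print("population marks", fraction_of_cycles_marked(trains, freq, cycles))
    print("pooled vector strength", round(vector_strength(pooled, freq), 3))
```

Each fiber alone marks only a minority of cycles, but the pooled spikes mark nearly all of them while staying tightly locked to one phase, which is the sense in which the population preserves timing that no single fiber carries.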
Figure 4.3.3. An unrolled cochlea showing the basilar-membrane gradient: narrow and stiff at the base (high-frequency peaks) widening to floppy at the apex (low-frequency peaks). Include an inset of a traveling wave whose peak location shifts with frequency, and label inner versus outer hair cells with their distinct roles (receptor vs. amplifier). [Figure to source or redraw.]
10.6 Hair cells: mechanotransduction, with a twist
At the receptor level, audition is mechanotransduction — the bending of a bundle opening an ion channel — but with enough distinctive features that it is worth slowing down. The “hairs” are not hairs; they are stereocilia, actin-filled projections in graded rows on top of each hair cell. When the basilar membrane moves relative to the overlying tectorial membrane, the bundle shears. Bending toward the tallest row increases tension on fine extracellular filaments called tip links; bending the other way relaxes them. The tip links are built largely from cadherin-23 and protocadherin-15 [@Kazmierczak2007TipLinks], and tension on them pulls open mechanically gated channels near the tips of the stereocilia. The molecular identity of that channel was a long-standing puzzle; the current view is that the channel complex is built around TMC1/TMC2 together with several associated proteins, and I word that carefully on purpose, because this is exactly the kind of sentence a single good structural paper can rewrite — the channel is a multi-protein machine, not a single named protein [@Pan2018TMC1Pore].
Here is the twist, and it is one students reliably get backwards because it contradicts the rule of thumb from every other neuron. The stereocilia are bathed in endolymph, an unusual extracellular fluid that is rich in potassium and sits at a positive potential relative to the hair cell’s interior. So when the transduction channels open, the main depolarizing current is potassium flowing into the cell — not the sodium influx you would expect elsewhere. Calcium matters too, especially for adaptation and for transmitter release at the cell’s base, but the first-pass cartoon should not be “sound opens calcium channels.” It is: sound bends the bundle, bundle tension opens the transduction channels, and potassium-rich endolymph drives the depolarization. Depolarization of an inner hair cell then releases glutamate onto auditory-nerve fibers, whose cell bodies sit in the spiral ganglion and whose axons form the auditory part of cranial nerve VIII.
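A short calculation shows why potassium runs "the wrong way" here. The sketch uses the Nernst equation with typical textbook values that are my assumptions, not figures from this chapter: potassium near 5 mM outside an ordinary neuron, 150 mM in endolymph, 140 mM inside cells, an endolymph potential of +80 mV, and resting potentials of -65 mV for the neuron and -50 mV for the hair cell.

```python
import math

GAS_CONSTANT = 8.314462618   # J / (mol K)
FARADAY = 96485.33212        # C / mol
BODY_TEMPERATURE_K = 310.0


def nernst_mv(conc_out_mm: float, conc_in_mm: float, valence: int = 1) -> float:
    """Equilibrium potential (inside relative to outside) in millivolts."""
    scale_mv = 1000.0 * GAS_CONSTANT * BODY_TEMPERATURE_K / (valence * FARADAY)
    return scale_mv * math.log(conc_out_mm / conc_in_mm)


def potassium_driving_force_mv(v_inside_mv: float, v_outside_mv: float,
                               conc_out_mm: float, conc_in_mm: float) -> float:
    """Membrane potential minus the K+ equilibrium potential.

    Negative values push K+ into the cell; positive values push it out.
    """
    membrane_potential = v_inside_mv - v_outside_mv
    return membrane_potential - nernst_mv(conc_out_mm, conc_in_mm)


if __name__ == "__main__":
    # Assumed illustrative values, see the lead-in.
    neuron = potassium_driving_force_mv(-65.0, 0.0, 5.0, 140.0)
    hair_bundle = potassium_driving_force_mv(-50.0, 80.0, 150.0, 140.0)
    print(f"ordinary neuron:  {neuron:+.0f} mV (K+ leaves the cell)")
    print(f"hair-cell bundle: {hair_bundle:+.0f} mV (K+ enters the cell)")
```

With these assumed numbers the potassium equilibrium potential is about -89 mV for the neuron but close to zero across the hair bundle, and the positive endolymph potential adds to the push. The result is a large inward driving force on potassium, which is the depolarizing current described above.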
One fact here has large clinical weight: in mammals, mature cochlear hair cells do not regenerate in any useful way. Birds and fish can replace hair cells after injury; mammals essentially cannot, particularly in the organ of Corti [@Choi2024HairCellRegeneration]. This is the reason noise exposure, ototoxic drugs, and ordinary aging produce permanent hearing loss — once the receptors are gone, they are gone. There is intense and genuinely promising work on regeneration, supporting-cell reprogramming, and related strategies, but as of this writing none of it is a routine clinical cure for the adult mammalian cochlea. I would be glad to revise that sentence in a future edition.
It is tempting to draw the ear as a microphone wired upward to the brain. That is not wrong, but it is incomplete. The brain sends signals back down to the cochlea through olivocochlear efferents: medial fibers contact outer hair cells and can change cochlear gain; lateral fibers influence auditory-nerve dendrites near the inner hair cells [@Guinan2006Olivocochlear]. Even at the very first stage of hearing, the system is not passively receiving input. It is regulating the input it receives, a control loop reaching all the way out to the sensory surface. This is the same architectural signature the overview raised for the great cortico-thalamic feedback projection. The existence of massive descending control is not in doubt, but what it is computationally for (protection from loud sounds, improving hearing in noise, attentional gain, or something else) is still argued. That is exactly the kind of open question the overview asked you to keep live rather than paper over. Hold onto the architecture, though; we saw descending control modify its own input in the pain system, and we will see it again in vision.
10.7 The pathway: a system that is already computing
The shared vertebrate sensory plan from the unit overview leads you to expect a tidy line — receptor to nerve to a thalamic nucleus to a primary cortical map in layer 4. Audition honors that plan: its thalamic nucleus is the medial geniculate, its primary cortex is A1, its map is tonotopic. But it complicates the line almost immediately, and the complication is the interesting part — this is one of the chapters where the overview’s “older, parallel pathways” turn out to do a great deal of the work.
The auditory nerve enters the brainstem and synapses in the cochlear nuclei. These are not passive relays; different cell types extract and preserve different features — onset, timing, intensity, spectral shape. From there a simplified ascending path runs: cochlear nuclei → superior olivary complex → nuclei of the lateral lemniscus → inferior colliculus (in the midbrain tectum) → medial geniculate body of the thalamus → primary auditory cortex on Heschl’s gyrus. That is the line to know. But it is emphatically not a single cable, and two features make audition different from the somatosensory system you already studied.
First, the system goes bilateral almost immediately. Recall that in somatosensation each side of the body maps to the opposite hemisphere, full stop — a one-sided map. Audition does not work that way. Fibers from the ventral cochlear nucleus project to the superior olive on both sides, some crossing the midline in the trapezoid body, so that the superior olive receives input from both ears. Then, higher up, the pathway crosses again at the inferior colliculus. The upshot is that each ear is represented in both hemispheres, with a contralateral bias. The behavioral signature is striking: a lesion of one primary auditory cortex does not make a person deaf in one ear, the way a lesion of primary visual cortex can blind part of the visual field. The reason audition is built this way is the subject of the next section: locating a sound requires comparing the two ears, and you cannot compare what you have not brought together. The binaural architecture is not incidental. It is the whole point.
Second, the auditory pathway intersects with action early — and here it cashes in a promissory note from the overview. The inferior colliculus sits in the tectum, the ancient midbrain structure for mapping events in space and driving orienting responses. The overview flagged these older, parallel routes — the superior colliculus snapping your eyes toward something salient before you have consciously identified it — as a recurring feature to watch for in every sensory chapter; audition is where one of them sits squarely on the main ascending path. The presence of a major auditory station inside the orienting tectum tells you the system is not merely asking “what sound is this?” but also, from early on, “where is it, and should I turn toward it?” There is even a hint of this integration in the dorsal cochlear nucleus, which receives somatosensory as well as auditory input — plausibly because the sound at your eardrum depends on the position of your head, pinna, and jaw, so the system has reason to track the body through which the sound was filtered. Sensing, here, is already entangled with the body and with action.
Figure 4.3.4. The ascending auditory pathway from cochlea through auditory nerve, dorsal and ventral cochlear nuclei, trapezoid body, superior olivary complex (with MSO, LSO, and MNTB), lateral lemniscus, inferior colliculus, medial geniculate body, and auditory cortex. Use line weight or color to show the bilateral projection and contralateral bias — and contrast it visually with the one-sided somatosensory map from 4.1. [Figure to source or redraw.]
10.8 Localizing sound: tiny timing differences, and a debate worth having
Sound localization is one of the genuine achievements of the auditory brainstem, and it is easy to underrate because it feels effortless. Consider the physical problem. Your ears are separated by maybe 20 centimeters. A sound from your left reaches the left ear slightly before the right — the maximum interaural time difference for a human is only on the order of hundreds of microseconds. And yet, under good conditions, listeners discriminate differences of tens of microseconds. That is an absurd level of timing precision for warm, wet tissue, and I mean absurd as a compliment.
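The numbers in that paragraph can be checked on the back of an envelope. The sketch below assumes the 20 cm ear separation given above, the 340 m/s speed of sound used earlier in the chapter, a distant source, and a straight-line path that ignores the way sound bends around the head. That simple sine-law geometry is an idealization.

```python
import math

EAR_SEPARATION_M = 0.20          # rough figure from the text
SPEED_OF_SOUND_M_PER_S = 340.0   # round figure used earlier in the chapter


def itd_s(azimuth_deg: float) -> float:
    """Interaural time difference in seconds for a distant source.

    Azimuth is measured from straight ahead; 90 degrees is directly to one side.
    """
    extra_path_m = EAR_SEPARATION_M * math.sin(math.radians(azimuth_deg))
    return extra_path_m / SPEED_OF_SOUND_M_PER_S


def azimuth_for_itd_deg(itd_seconds: float) -> float:
    """Inverse of itd_s: the angle that would produce a given time difference."""
    return math.degrees(math.asin(itd_seconds * SPEED_OF_SOUND_M_PER_S / EAR_SEPARATION_M))


if __name__ == "__main__":
    print(f"largest ITD: {itd_s(90.0) * 1e6:.0f} microseconds")
    print(f"10 microseconds corresponds to {azimuth_for_itd_deg(10e-6):.1f} degrees")
```

Under these assumptions the largest possible difference is a little under 600 microseconds, and a 10 microsecond difference corresponds to a shift of about one degree near the midline.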
The system uses two main cues, sorted by frequency. For low-frequency sounds it uses interaural time differences (ITDs): the sound arrives earlier at the near ear. For high-frequency sounds, where the wavelength is short relative to the head, the head casts an acoustic shadow, so the sound is louder at the near ear — an interaural level difference (ILD). The medial superior olive (MSO) is classically the ITD machine; the lateral superior olive (LSO), with powerful, precisely timed glycinergic inhibition relayed through the medial nucleus of the trapezoid body (MNTB), is classically the ILD machine. Level differences are computed as something like a neural tug-of-war: stronger input from one side, inhibiting the other, and the side that “wins” indicates the likely direction. This is exactly what stereo audio exploits — make a sound slightly earlier and louder in the left channel and you hear it on the left. Spatial audio is a controlled illusion built on real brainstem arithmetic.
Now for the part I most want you to sit with, because it is a clean example of how to hold a scientific question that is genuinely open — and of why a beautiful model is sometimes a reason for more suspicion, not less.
The classic account of ITD detection is the Jeffress model (1948). Picture two rows of axons, one from each ear, feeding a row of coincidence-detector neurons — like two lines of falling dominoes started from opposite ends. If the sound hits both ears simultaneously, the two waves of activity meet in the middle; if it hits the left ear first, the meeting point shifts toward the right. Each coincidence detector, sitting at a particular position, therefore fires best for a particular ITD, and the population forms a topographic map of azimuth — a place code for auditory space. It is an elegant idea, and I still teach the domino intuition first because it captures the essential insight: that a difference in time can be converted into a difference in place, and thus into a neural signal for direction.
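The domino picture translates almost directly into code. This is a toy sketch of the Jeffress idea, not a model of any real nucleus: the number of detectors, the maximum axonal delay, and the Gaussian coincidence window are all hypothetical values chosen so the example is easy to read.

```python
import math


def detector_responses(itd_us, n_detectors=21, max_delay_us=600.0, window_us=50.0):
    """Response of each coincidence detector along the row.

    itd_us > 0 means the sound reached the left ear first. Detector 0 sits
    at the left end of the row and the last detector at the right end.
    """
    responses = []
    for k in range(n_detectors):
        position = k / (n_detectors - 1)               # 0 = left end, 1 = right end
        left_delay = position * max_delay_us           # left axon gets longer to the right
        right_delay = (1.0 - position) * max_delay_us  # right axon gets longer to the left
        left_arrival = 0.0 + left_delay                # left ear heard the sound at t = 0
        right_arrival = itd_us + right_delay           # right ear heard it itd_us later
        mismatch = left_arrival - right_arrival
        # Fire most strongly when both inputs arrive together.
        responses.append(math.exp(-0.5 * (mismatch / window_us) ** 2))
    return responses


def best_detector(itd_us, **kwargs):
    """Index of the detector with the largest response."""
    responses = detector_responses(itd_us, **kwargs)
    return max(range(len(responses)), key=responses.__getitem__)


if __name__ == "__main__":
    for itd in (-300.0, 0.0, 300.0):
        print(f"ITD {itd:+.0f} us -> detector {best_detector(itd)}")
```

A simultaneous sound drives the middle detector hardest, and a left-leading sound shifts the peak toward the right end of the row, which is the conversion of time into place that the model proposes.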
And here is where calibration matters. The Jeffress model is not a teaching cartoon that has simply been confirmed. It has been confirmed in birds and not in mammals, and that distinction is the live science. In the barn owl, whose localization system has been studied in extraordinary detail, the predicted ingredients are largely there: delay lines and a topographic map of space in the relevant nucleus [@KnudsenKonishi1979BarnOwl]. This is the owl from the evolution box earning its place: a creature under intense selection to locate prey in darkness, in which the textbook circuit really does seem to be implemented.
In mammals, the picture looks different, and the disagreement is real, not a matter of a result that simply failed. Recording from the MSO of the gerbil, Brand, Behrend, Marquardt, McAlpine, and Grothe found responses inconsistent with a Jeffress-style topographic map, and showed that precisely timed glycinergic inhibition is essential to the tuning — pharmacologically blocking that inhibition shifted neurons’ preferred ITDs [@Brand2002Inhibition]. From this and related work, an alternative emerged: rather than reading out the peak of activity across a labeled-line map, small-headed mammals may encode ITD as a population rate code in just two broadly tuned, opponent channels — one for each hemifield — with location inferred from the difference in firing between the two sides [@McAlpine2001NeuralCode; @GrothePecka2014NaturalHistory]. There is supporting evidence that human ITD processing, too, looks more like an opponent-channel scheme than a fine topographic map.
I want to be careful to represent this as what it is. These are independent research programs genuinely disagreeing about interpretation, not a fringe challenge to a settled result. The classical delay-line/place-code lineage is well-supported in archosaurs; the opponent-channel/inhibition-based account is well-supported in small mammals; and reviewers in the field state plainly that the mammalian question “is not settled by far,” with the role of inhibition in the MSO still under active debate. That is the honest status: a real, current, multi-lab disagreement about how the mammalian brain solves a problem that birds appear to solve the textbook way. Learning to sit with that — to hold “the delay-line intuition is right about the computation” alongside “the mammalian implementation is contested” without collapsing either into false certainty — is itself a scientific skill, and a more useful one than memorizing a single diagram. There is plenty in this chapter we are sure of; this is one of the places where the right move is to keep the question open.
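For contrast with a place code, here is a toy sketch of the opponent-channel alternative described above: two broadly tuned channels, one per hemifield, with direction read from the difference in their firing. The logistic tuning curves, the slope, and the maximum rate are hypothetical choices of mine for illustration, and nothing here is fitted to data.

```python
import math

SLOPE_US = 150.0   # assumed breadth of tuning, in microseconds
MAX_RATE = 100.0   # assumed maximum firing rate, in spikes per second


def channel_rates(itd_us: float):
    """Firing rates of the two broadly tuned channels.

    itd_us > 0 means the sound reached the left ear first. Returns
    (rate of the left-leading channel, rate of the right-leading channel).
    """
    left_leading = MAX_RATE / (1.0 + math.exp(-itd_us / SLOPE_US))
    right_leading = MAX_RATE / (1.0 + math.exp(itd_us / SLOPE_US))
    return left_leading, right_leading


def decode_itd_us(left_leading: float, right_leading: float) -> float:
    """Recover the ITD from the normalized difference between the channels.

    For these logistic curves the normalized difference equals
    tanh(itd / (2 * slope)), so the inverse is an atanh.
    """
    difference = (left_leading - right_leading) / (left_leading + right_leading)
    return 2.0 * SLOPE_US * math.atanh(difference)


if __name__ == "__main__":
    for itd in (-200.0, 0.0, 200.0):
        left, right = channel_rates(itd)
        print(f"ITD {itd:+.0f} us: rates {left:.1f} / {right:.1f}, "
              f"decoded {decode_itd_us(left, right):+.0f} us")
```

No single unit here "stands for" a direction. The two rates are equal for a sound straight ahead, and the estimate changes fastest near the midline, where both curves are steepest.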
Finally, neither ITDs nor ILDs resolve everything. Many locations produce nearly identical interaural cues — the so-called cone of confusion, where front-back and up-down ambiguities live. Here the spectral cues from the pinna come back into play: the direction-dependent filtering we met earlier lets the brain disambiguate elevation and front-from-back. And because those cues depend on the exact shape of your ears, the brain has to learn them — fit people with subtly altered pinnae and they re-learn to localize over days and weeks [@Hofman1998NewEars]. Small head movements help too: rotate your head and the ambiguity changes, which a static two-ear snapshot could never resolve. Once again, perception is not passive reception. It is active, embodied inference — the animal moving its sensors to ask better questions of the world.
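The cone of confusion falls out of simple geometry, and a short sketch makes it visible. The code below assumes the crudest possible head: two point receivers 0.18 meters apart with nothing between them, a distant source, and sound traveling at the round 340 meters per second used earlier in the chapter. The ear separation, the function name, and the angle conventions are illustrative assumptions, and a real head adds diffraction that this ignores.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the round figure used earlier in the chapter
EAR_SEPARATION = 0.18   # m, assumed ear-to-ear distance (illustrative)

def itd_us(azimuth_deg, elevation_deg=0.0):
    """ITD in microseconds for a distant source, two-point-receiver model.

    Azimuth is measured from straight ahead toward the right ear
    (right = 90, directly behind = 180); elevation is measured upward
    from the horizontal plane. Positive ITD means the right ear leads.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Component of the source direction along the axis through both ears.
    lateral = math.cos(el) * math.sin(az)
    return 1e6 * EAR_SEPARATION * lateral / SPEED_OF_SOUND

if __name__ == "__main__":
    cases = [
        ("front-right, level", 30.0, 0.0),
        ("back-right, level", 150.0, 0.0),
        ("hard right, 60 deg up", 90.0, 60.0),
    ]
    for label, az, el in cases:
        print(f"{label:24s} ITD = {itd_us(az, el):6.1f} us")

    # Turn the head 20 degrees to the right: both azimuths drop by 20.
    print("after head turn:",
          f"front {itd_us(10.0):6.1f} us,",
          f"back {itd_us(130.0):6.1f} us")
```

In this model the three sources in the first loop produce the same ITD, because the ITD depends only on how far the source direction leans toward one ear. After the head turn the front and back sources no longer agree, which is the sense in which a small movement asks a better question than a static snapshot can.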
Figure 4.3.5. Sound localization in three panels: (a) an ITD for a low-frequency source on the left; (b) an ILD/head-shadow cue for a high-frequency source on the left; (c) pinna/spectral cues resolving elevation and front-back within the cone of confusion. A companion panel could contrast the avian delay-line/place-code scheme with the mammalian opponent-channel scheme, labeled as competing live hypotheses rather than as a settled mechanism. [Figure to source or redraw.]
10.9 From frequency maps to auditory cortex
Primary auditory cortex sits on Heschl’s gyrus, tucked into the lateral sulcus. Unlike visual cortex, which sprawls visibly over the occipital pole, auditory cortex is mostly hidden from the lateral surface — but hidden is not minor. Like the cochlea, it is tonotopically organized, and this is the auditory instance of the overview’s central point about primary sensory cortex: it is a map, a feature space in which nearby points represent similar things, with frequency playing the role for sound that body-location plays for touch and visual-field-location plays for vision. Stimulate auditory cortex electrically and a person hears tones; imaging reveals orderly tonotopic gradients [@Formisano2003Tonotopic]. The thalamic input from the medial geniculate arrives, as the overview’s layer-4 signature predicts, strongly in the middle cortical layers.
But tonotopy is the beginning, not the explanation, and this is worth saying bluntly: auditory cortex is not a piano keyboard. A frequency map does not by itself explain timbre, speech, music, or the segregation of a voice from background noise. The point is exactly parallel to vision, and the parallel is worth holding onto for the next chapter. Primary visual cortex is retinotopic, but we do not stop there; we go on to ask how the brain represents motion, depth, objects, and faces. In audition, tonotopy is the foundational map, not the finished perceptual world — the auditory world is no more made of isolated frequencies than the visual world is made of isolated pixels. Building auditory objects — grouping the mixture at the eardrum back into “a voice,” “a violin,” “a door closing” — is a further inference the cortex performs, and one we are only beginning to map. We will pick this up properly when language re-encounters auditory cortex in Unit VII.
There is also a real but easily cartooned hemispheric asymmetry. In broad strokes, the left auditory system tends to be more engaged by the rapid temporal structure that matters for speech, the right by spectral detail, pitch, and music [@ZatorreBelin2001TemporalSpectral]. The evidence for that generalization is real; the cartoon “left brain = language, right brain = music” is not how brains work. Speech is bilateral. Music is bilateral. The honest version is that auditory cognition has partially separable components, which is why focal damage can dissociate them: a right-temporal lesion can produce amusia, in which music no longer sounds like music while speech comprehension is relatively spared, and other lesions can do the reverse. Dissociations like these tell us the components are separable; they do not tell us one hemisphere “owns” a faculty.
Figure 4.3.6. Human auditory cortex on the superior temporal plane: Heschl’s gyrus/core, surrounding belt and parabelt regions, and several tonotopic gradients — labeled “schematic, not a one-to-one map of functions.” Pair conceptually with the retinotopy figure to come in the vision chapter, to make the map-is-only-the-beginning point across both senses. [Figure to source or redraw.]
10.10 Restoring hearing: the cochlear implant, and a moving edge
Hearing loss is common, and its weight is easy to underestimate if you think of hearing as merely tone detection. Hearing carries communication, social contact, spatial awareness, and safety, and losing it has costs that ripple well beyond the audiogram.
The most successful neural prosthesis we have is the cochlear implant, and by now you can almost reverse-engineer it from what this chapter has already covered — which is, frankly, the most satisfying way to learn how it works. Suppose the hair cells are gone but the auditory nerve survives. You have a tonotopically organized nerve with no working transducers in front of it. So: put a microphone outside; have a processor split the incoming sound into frequency bands; and place an electrode array along the cochlea so that each band stimulates the part of the nerve that — by tonotopy — already “means” that frequency. Stimulate near the base for high frequencies, nearer the apex for low. The implant does not restore the hair cells; it bypasses them, exploiting the map the cochlea was already using. The nerve and brain then have to learn to read this coarser, electrically delivered version of sound, which is why outcomes depend so heavily on early implantation and rehabilitation, and why music and hearing-in-noise remain harder than quiet speech.
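The reverse-engineering above can be sketched in a few lines of Python. This is a deliberately bare illustration and not how any commercial processor works: the eight electrodes, the 200 to 8000 Hz analysis range, the log-spaced bands, the slow textbook Fourier sum, and every function name are assumptions made for the example. Real devices also extract envelopes, compress them, and deliver pulse trains, none of which is modeled here.

```python
import math

def band_edges(n_electrodes, f_low=200.0, f_high=8000.0):
    """Log-spaced edges (n + 1 of them) for n analysis bands."""
    ratio = f_high / f_low
    return [f_low * ratio ** (i / n_electrodes) for i in range(n_electrodes + 1)]

def electrode_for_frequency(freq_hz, edges):
    """Index of the electrode whose band contains freq_hz, or None.

    Index 0 is taken to be the most apical contact (lowest band) and the
    last index the most basal (highest band), following the tonotopy.
    """
    for i in range(len(edges) - 1):
        if edges[i] <= freq_hz < edges[i + 1]:
            return i
    return None

def band_energies(samples, sample_rate, edges):
    """Sum spectral power into each electrode's band (naive Fourier sum)."""
    n = len(samples)
    energies = [0.0] * (len(edges) - 1)
    for k in range(1, n // 2):
        idx = electrode_for_frequency(k * sample_rate / n, edges)
        if idx is None:
            continue
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        energies[idx] += re * re + im * im
    return energies

def tone(freq_hz, sample_rate=16000, n=512):
    """A pure tone, standing in for the microphone signal."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

if __name__ == "__main__":
    edges = band_edges(8)
    for f in (250.0, 4000.0):
        energies = band_energies(tone(f), 16000, edges)
        busiest = max(range(len(energies)), key=energies.__getitem__)
        print(f"{f:6.0f} Hz tone drives electrode {busiest} of {len(energies)}")
```

The low tone lands on an apical contact and the high tone on a basal one. With only a handful of bands, everything inside one band is delivered to the same stretch of nerve, which gives some feel for why the electrical version of sound is coarser than the one the hair cells provided.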
Cochlear implants are now common enough that “routine” can obscure how extraordinary they are. As of July 2022, more than one million had been implanted worldwide [@Zeng2022Millionth; @NIDCD2024QuickStats] — a useful update to older lecture notes (including mine) that still quote a figure of around 500,000. Clinical neuroscience moves even when the pathway diagrams do not.
And the edge is still moving. In April 2026 the FDA granted accelerated approval to the first gene therapy for an inherited deafness — Otarmeni (lunsotogene parvec-cwha), for severe-to-profound hearing loss caused by confirmed biallelic variants in the OTOF gene, in patients with preserved outer-hair-cell function and no prior cochlear implant in the treated ear [@FDA2026Otarmeni]. The logic is elegant and worth seeing, because it depends on exactly the inner-hair-cell synapse we discussed: OTOF encodes otoferlin, a protein required for transmitter release from inner hair cells onto the auditory nerve. In these patients the hair cells can be present and mechanically responsive — the bundle bends, the channel opens — but the synapse fails, so the signal never reaches the nerve. Replacing the gene restores the missing step. It is a real clinical milestone, and it is also a narrow one: it does nothing for the common age-related or noise-induced loss of hair cells themselves, which remains the large unsolved problem. (I will note for honesty’s sake that “accelerated approval” rests on a surrogate endpoint — improved audiometry at 24 weeks — rather than long-term outcomes; the result is genuine and the caveat is also genuine.)
I use “restore” here in a clinical sense — restoring access to auditory information — and not to suggest that Deaf people are incomplete people waiting to be fixed. Deaf culture and signed languages are part of the full human story, and the biology in this section is one thread of it, not the whole cloth.
10.11 Coda: hearing buys time, and vision will buy more
Audition is a near-perfect illustration of this unit’s argument. It begins with the oldest of mechanosensory cells and with bare physics — pressure waves in a medium. It passes through biomechanics: pinna filtering, eardrum motion, ossicle leverage, the cochlear traveling wave, hair-bundle deflection. It becomes neurobiology: auditory nerve, cochlear nuclei, superior olive, inferior colliculus, medial geniculate, cortex. And then it becomes behavior: orienting, recognizing, communicating, and — only at the very end of that long history — speech and song. At no point is the system a microphone wired to a recorder. It is a stack of evolved filters and control loops, each with descending feedback, each a historical solution to a physical or ecological problem.
Set against the rest of the unit, audition occupies a specific place on the gradient the overview laid out. Somatosensation reported the world in contact, with no lead time — reactive, homeostatic, the innermost-facing of the outward senses. Audition reports the world at a distance, and converts that distance into warning: the first genuinely long-range, prediction-buying sense in our tour outward from the body surface. It also keeps the intero/extero seam honest — the dorsal cochlear nucleus listening to the body, the descending olivocochlear loop, the early entanglement with orienting and action all remind us that “sensing the world” is never cleanly separable from “regulating the animal in it.” The overview promised that as we moved outward from the skin, sensing would start buying real lead time and the brain would begin, in earnest, to predict. Audition is where that promise starts being kept.
The next chapter, Vision, takes up the other great distance sense, and the one that dominates human conscious experience. Vision buys still more time and far more spatial detail than audition can, but it pays for that detail with its own transformations and its own characteristic illusions. The lesson audition leaves us with should make us cautious going in: the world does not enter the brain as itself. It enters through machinery, and machinery always has a point of view.
What we are sure of. Sound is a pressure wave, and the ear converts it through a chain — eardrum, ossicles, cochlear fluid, hair-bundle deflection, receptor potential, auditory-nerve firing. Vertebrate hearing is built from ancient mechanosensory hair-cell machinery shared with balance and, in aquatic vertebrates, the lateral line; the mammalian middle-ear ossicles include former jaw bones. The cochlea is tonotopic, high frequencies at the base and low at the apex, and it is active — outer hair cells, via prestin, amplify and sharpen. Signals ascend through cochlear nuclei, superior olive, lateral lemniscus, inferior colliculus, and medial geniculate to a tonotopic auditory cortex, and the system is bilateral from the superior olive onward. Localization uses interaural time differences, interaural level differences, and pinna spectral cues. Cochlear implants provide real auditory access for many people with sensorineural loss, exploiting the cochlea’s own tonotopic map.
What we are still working out. The detailed molecular architecture of the hair-cell transduction channel is an active area — the channel is a multi-protein complex, and the picture is still being refined. The exact evolutionary sequence of the mammalian middle ear is clear in outline but continually sharpened by new fossils. How the mammalian brain encodes interaural time differences is genuinely unsettled — the avian delay-line/place-code scheme does not appear to transfer to mammals, and an opponent-channel rate code is a serious, independently supported alternative, with the role of MSO inhibition still debated. Routine regeneration of mammalian cochlear hair cells remains unsolved. And tonotopy, though real, does not by itself explain auditory objects, music, speech, or scene analysis — the cortical story is still being written, and we will return to it in Unit VII.