24 Audition: Hearing at a Distance

Sound, Space, and Auditory Objects

Audition begins the outward turn of this unit. Somatosensation reports events at the body surface, and pain often signals a disturbance already affecting the body. Hearing can report an event that is distant, hidden, behind the head, or still approaching. By extending the interval between detection and contact, hearing buys time for orientation, avoidance, communication, and preparation for action.

The shared sensory architecture developed in the unit overview is assumed here: physical energy is transduced by receptors, carried through ascending pathways, relayed through thalamus, and projected into mapped cortical fields embedded in recurrent networks. Audition nevertheless modifies the simplest version of that plan. The cochlea performs an active mechanical analysis before the signal reaches the brain; the inner hair cell is a receptor cell that releases transmitter onto a separate first-order neuron; and the central pathway branches bilaterally through several brainstem and midbrain stations before reaching the medial geniculate nucleus and auditory cortex.

This chapter follows sound from pressure fluctuations in a medium to auditory objects and spatial scenes. Its central questions are how the ear separates frequency, how hair cells convert motion into synaptic release, how bilateral circuits estimate source location, and how thalamocortical networks transform acoustic structure into perceptually coherent events.

24.1 Sound as a physical variable

Sound is a mechanical disturbance transmitted through a material medium. In air, a vibrating source produces alternating regions of compression and rarefaction. Nearby molecules move back and forth over small distances while energy propagates away from the source as a longitudinal pressure wave. Sound can also travel through water, tissue, bone, and other elastic materials, but it cannot propagate through a vacuum.

A simple periodic wave can be described by several physical variables. Frequency is the number of cycles completed each second, measured in hertz. Frequency contributes strongly to perceived pitch, although pitch is not a direct frequency meter: harmonic structure, temporal pattern, and context also matter, and a pitch can be heard even when energy at the corresponding fundamental frequency is absent. Amplitude describes the size of the pressure fluctuation. It contributes to loudness, but loudness also depends on frequency, duration, adaptation, and the other sounds present. Phase identifies a point within the cycle. Phase becomes especially important when the two ears receive slightly shifted versions of the same waveform.
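The claim that a pitch can persist without energy at the fundamental can be checked numerically. The sketch below assumes NumPy and an arbitrary 200 Hz fundamental. It builds a tone from harmonics 2 to 5 only, confirms that the 200 Hz bin is empty, and then uses a circular autocorrelation to find how often the waveform repeats. Autocorrelation is one classic way of modeling periodicity pitch and is used here only as an illustration. The function names are hypothetical.

```python
import numpy as np

def harmonic_complex(f0, harmonics, fs, duration):
    """Sum of equal-amplitude cosines at k * f0 for each k in `harmonics`."""
    t = np.arange(int(fs * duration)) / fs
    return sum(np.cos(2 * np.pi * k * f0 * t) for k in harmonics)

def power_at(x, fs, freq):
    """Power in the FFT bin nearest `freq` (bin spacing is fs / len(x))."""
    k = int(round(freq * len(x) / fs))
    return np.abs(np.fft.rfft(x)[k]) ** 2

def autocorrelation_period(x, fs, min_lag_s, max_lag_s):
    """Lag (s) of the largest circular-autocorrelation value within a lag window."""
    spectrum = np.fft.rfft(x)
    acf = np.fft.irfft(spectrum * np.conj(spectrum), n=len(x))
    lo, hi = int(min_lag_s * fs), int(max_lag_s * fs)
    return (lo + int(np.argmax(acf[lo:hi + 1]))) / fs

fs = 20_000
# Harmonics 2-5 of a 200 Hz fundamental; no component at 200 Hz itself.
x = harmonic_complex(200.0, harmonics=[2, 3, 4, 5], fs=fs, duration=1.0)

print(power_at(x, fs, 200.0) < 1e-9 * power_at(x, fs, 400.0))  # True: no energy at f0
print(autocorrelation_period(x, fs, 0.001, 0.0075))             # 0.005 s, the period of f0
```

The waveform still repeats every 5 ms because every component completes a whole number of cycles in that interval. A periodicity-based pitch estimate therefore recovers the missing 200 Hz.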

Most natural sounds are not single sine waves. A vowel, violin note, footstep, and closing door each contain a spectrum of frequencies whose amplitudes change over time. Their slower fluctuations form a temporal envelope; their rapid cycle-by-cycle structure is called temporal fine structure. Two instruments can play the same nominal note and still sound different because their harmonic spectra, onset transients, and envelopes differ. The acoustic signal therefore contains several partially independent dimensions from which the auditory system can infer what event produced it.
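The envelope and fine-structure split can be illustrated with the analytic signal, a standard signal-processing construction. This minimal sketch assumes NumPy and a synthetic 1 kHz carrier whose amplitude varies at 4 Hz. The magnitude of the analytic signal recovers the slow envelope, and the cosine of its phase recovers the fine structure. The decomposition describes the waveform and makes no claim about how the ear computes it. `analytic_signal` is a hypothetical helper that mirrors the FFT construction used by `scipy.signal.hilbert`.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: keep DC (and Nyquist), double positive
    frequencies, and zero negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

fs = 16_000
t = np.arange(fs) / fs                                  # 1 s of samples
true_envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)   # slow 4 Hz amplitude contour
x = true_envelope * np.cos(2 * np.pi * 1000 * t)        # 1 kHz carrier

z = analytic_signal(x)
envelope = np.abs(z)                   # temporal envelope
fine_structure = np.cos(np.angle(z))   # unit-amplitude, cycle-by-cycle structure

# Both errors are on the order of floating-point rounding for this signal.
print(np.max(np.abs(envelope - true_envelope)))
print(np.max(np.abs(fine_structure - np.cos(2 * np.pi * 1000 * t))))
```

For natural sounds the split is usually made within narrow frequency bands, because a broadband signal has no single well-defined carrier.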

Figure 24.1: Sound as pressure and spectrum. **(A)** A vibrating source displaces adjacent air molecules, producing a longitudinal wave that propagates away from the source. Regions of elevated molecular density (compressions) alternate with regions of reduced density (rarefactions). The wave travels in the direction indicated, while individual molecules oscillate about fixed positions rather than travelling with it. **(B)** The same disturbance expressed as instantaneous pressure against distance. Amplitude is the pressure deviation from ambient at a peak; wavelength (λ) is the distance between successive peaks; the period (T) is the time required for one complete cycle to pass a fixed point, from which frequency follows as f = 1/T; phase (φ) specifies position within the cycle at a chosen reference point. Peaks correspond to the compressions in (A) and troughs to the rarefactions. **(C)** A sustained vowel plotted as amplitude against time. The rapid cycle-by-cycle alternations constitute the temporal fine structure, whose periodicity is set by the glottal pulse rate. The smooth boundary tracing the outer limits of the waveform is the temporal envelope, reflecting the slower amplitude contour of onset, sustained portion, and offset. **(D)** The frequency-domain representation of the waveform in (C). A periodic complex sound decomposes into discrete components: the lowest, the fundamental frequency (f₀), corresponds to the glottal pulse rate, with higher harmonics at integer multiples (2f₀, 3f₀, …). Relative harmonic amplitudes are shaped by vocal-tract resonances and determine the perceived timbre of the vowel.

Sound pressure level and the decibel

The auditory system operates across an enormous pressure range, so sound pressure is usually expressed on a logarithmic scale. Sound pressure level is defined as

\[ L_p = 20\log_{10}\left(\frac{p_{\mathrm{rms}}}{p_0}\right), \]

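where \(p_{\mathrm{rms}}\) is the root-mean-square sound pressure and the standard reference in air is \(p_0 = 20\,\mu\mathrm{Pa}\). The factor is 20 rather than 10 because acoustic intensity is proportional to pressure squared. Expressed as an intensity ratio, the same level is \(10\log_{10}(p_{\mathrm{rms}}^2/p_0^2) = 20\log_{10}(p_{\mathrm{rms}}/p_0)\).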

A difference of 30 dB therefore corresponds to about 31.6 times the pressure amplitude and 1,000 times the acoustic intensity. A level of 0 dB SPL is not the absence of sound; it is the standard reference pressure, chosen near the threshold of hearing around 1 kHz under favorable conditions. Clinical audiograms often use dB HL, a different scale that expresses hearing level relative to frequency-specific normal thresholds. The two units should not be treated as interchangeable.
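The decibel arithmetic above is short enough to check directly. This sketch uses only Python's standard library. It converts level differences into pressure and intensity ratios and computes dB SPL from an RMS pressure with the 20 µPa reference. The function names are illustrative.

```python
import math

P0 = 20e-6  # standard reference pressure in air, 20 µPa

def spl_db(p_rms, p_ref=P0):
    """Sound pressure level (dB SPL) from RMS pressure in pascals."""
    return 20 * math.log10(p_rms / p_ref)

def pressure_ratio(delta_db):
    """Pressure-amplitude ratio corresponding to a level difference."""
    return 10 ** (delta_db / 20)

def intensity_ratio(delta_db):
    """Intensity ratio corresponding to a level difference."""
    return 10 ** (delta_db / 10)

print(round(pressure_ratio(30), 1))  # 31.6
print(round(intensity_ratio(30)))    # 1000
print(spl_db(20e-6))                 # 0.0: the reference pressure itself
print(round(spl_db(1.0), 2))         # 93.98: 1 Pa RMS
```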

24.2 From air to cochlear fluid

The outer and middle ear transfer acoustic energy into the fluid-filled cochlea. The pinna, together with the head and torso, filters sound according to its direction of arrival. Ridges and cavities in the pinna introduce frequency-dependent delays and attenuations that later help distinguish elevation and front from back. The external auditory canal adds its own resonance before the pressure wave reaches the tympanic membrane, or eardrum.

Motion of the tympanic membrane is transmitted through the three ossicles of the middle ear: the malleus, incus, and stapes. The ossicles provide more than a mechanical connection. Air and cochlear fluid have very different acoustic impedances, so direct transfer from the tympanic membrane into fluid would reflect much of the incident energy. The lever action of the ossicles and the difference in area between the tympanic membrane and the smaller oval window increase pressure at the cochlear entrance and improve energy transfer.
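The impedance argument can be made quantitative with two textbook formulas. The first is the intensity transmission coefficient for a plane wave striking a flat boundary at normal incidence, \(T = 4Z_1Z_2/(Z_1+Z_2)^2\). The second is the pressure gain of an ideal, lossless transformer, which equals the area ratio multiplied by the lever ratio. The sketch assumes approximate characteristic impedances for air and water and treats cochlear fluid as water. It also uses hypothetical round numbers for the area and lever ratios. Real middle-ear transfer is frequency dependent, and the ideal-transformer figure ignores losses.

```python
import math

# Approximate characteristic impedances (Pa·s/m). Assumption: cochlear fluid is
# treated as water, which ignores the cochlea's actual input impedance.
Z_AIR = 413.0
Z_WATER = 1.48e6

def transmitted_fraction(z1, z2):
    """Fraction of incident intensity transmitted across a flat boundary at normal incidence."""
    return 4 * z1 * z2 / (z1 + z2) ** 2

def ideal_pressure_gain(area_ratio, lever_ratio):
    """Pressure gain of an ideal, lossless area-and-lever transformer."""
    return area_ratio * lever_ratio

fraction = transmitted_fraction(Z_AIR, Z_WATER)
print(f"{fraction:.4f} transmitted, {10 * math.log10(fraction):.1f} dB")  # 0.0011 transmitted, -29.5 dB

# Hypothetical illustrative ratios, not measurements of any particular ear.
gain = ideal_pressure_gain(area_ratio=17.0, lever_ratio=1.3)
print(f"{gain:.1f}x pressure, {20 * math.log10(gain):.1f} dB")  # 22.1x pressure, 26.9 dB
```

Under these assumptions, a bare air-to-water boundary passes only about 0.1% of the incident intensity. An ideal transformer of this size recovers a gain of similar size in decibels, which is the sense in which the ossicles compensate for the mismatch.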

The footplate of the stapes moves at the oval window, launching pressure changes into the cochlear fluids. The round window provides a compliant outlet that moves in the complementary direction; without it, displacement of the nearly incompressible fluid would be severely constrained. Sound has now changed physical form several times:

air-pressure variation → tympanic motion → ossicular motion → cochlear-fluid motion

None of these structures is the auditory receptor. They form a frequency-dependent energy-transfer system that delivers mechanical motion to the receptor organ.

Figure 24.2: Mechanical transmission of sound through the outer, middle, and inner ear. Sound begins as alternating air-pressure changes collected by the pinna and directed through the external auditory canal toward the tympanic membrane. The numbered sequence emphasizes the successive physical transformations: (1) air-pressure waves travel through the canal; (2) the tympanic membrane moves inward and outward with those pressure changes; (3) this motion is transmitted through the malleus, incus, and stapes, which increase the pressure delivered to the much smaller oval window; and (4) movement of the stapes footplate at the oval window displaces fluid within the cochlea. Thus, sound transmission proceeds from air → membrane → bone → fluid. The round window, located inferior to the oval window, provides a compliant boundary that permits cochlear-fluid displacement. The nearby vestibular organs—the semicircular canals, utricle, and saccule—share the fluid-filled inner-ear labyrinth but detect head motion and orientation rather than airborne sound. The enlarged cross-section at right shows the three fluid-filled chambers of the cochlea near its base. The scala vestibuli and scala tympani contain perilymph, whereas the intervening scala media, or cochlear duct, contains endolymph. The scala media is separated from the scala vestibuli by the vestibular, or Reissner’s, membrane, and from the scala tympani by the basilar membrane, which supports the organ of Corti. During the phase illustrated, inward movement of the stapes produces pressure displacement in the scala vestibuli, deformation of the cochlear partition, and compensatory outward movement of the round-window membrane. These movements are oscillatory: all arrows reverse direction during the opposite phase of the sound cycle.

24.3 The cochlea separates frequency

The cochlea is a coiled tube built around a mechanically graded partition. If it is unrolled conceptually, the basilar membrane is narrow and stiff near the base, close to the oval and round windows, and progressively wider and more compliant toward the apex. This gradient converts frequency into place. High-frequency sounds produce their largest displacement toward the base; lower-frequency sounds travel farther before reaching a maximum nearer the apex. The resulting ordered relationship between frequency and position is tonotopy.
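A common quantitative summary of human tonotopy is the Greenwood frequency-position function, \(F = A(10^{a x} - k)\), where \(x\) is relative distance from the apex. The sketch below uses parameter values frequently quoted for the human cochlea (A = 165.4, a = 2.1, k = 0.88). These values come from the wider literature rather than from this chapter and should be treated as an approximation. The function names are hypothetical.

```python
import math

# Parameter values commonly quoted for the human cochlea (an assumption here).
A, ALPHA, K = 165.4, 2.1, 0.88

def greenwood_frequency(x):
    """Characteristic frequency (Hz) at relative position x (0 = apex, 1 = base)."""
    return A * (10 ** (ALPHA * x) - K)

def greenwood_position(freq_hz):
    """Inverse map: relative position (0 = apex, 1 = base) for a frequency in Hz."""
    return math.log10(freq_hz / A + K) / ALPHA

print(round(greenwood_frequency(0.0)), round(greenwood_frequency(1.0)))  # apex and base
for f in (250, 1000, 4000, 16000):
    print(f, "Hz ->", round(greenwood_position(f), 2))
```

With these values the map runs from about 20 Hz at the apex to about 20 kHz at the base. Note that the spacing is roughly logarithmic, so each octave occupies a broadly similar length of the partition above a few hundred hertz.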

A pressure change at the oval window launches a traveling wave along the cochlear partition. The wave grows as it approaches the region whose mechanical properties best match the stimulus frequency, reaches a maximum, and then rapidly declines. Georg von Békésy’s measurements established this traveling-wave principle [@Bekesy1960Experiments]. The position of the peak supplies an initial place code for frequency, but the living cochlea is much more sharply tuned than a passive membrane alone would permit.

The organ of Corti sits on the basilar membrane and contains one row of inner hair cells, several rows of outer hair cells, supporting cells, and the overlying tectorial membrane. Motion of the cochlear partition produces relative movement among these structures and bends the hair bundles. The two hair-cell classes make different contributions.

Outer hair cells are active mechanical elements. Changes in membrane potential alter their length through the motor protein prestin, feeding mechanical energy back into the cochlear partition. This cochlear amplifier increases sensitivity to weak sounds, sharpens frequency selectivity, and contributes to the compressive nonlinearity that allows the ear to represent a large dynamic range [@Ashmore2008OuterHairCell; @Dallos2008CochlearAmplification; @Liberman2002Prestin]. The mechanical activity can even generate faint sounds that travel outward through the middle ear as otoacoustic emissions, which provide a useful clinical measure of outer-hair-cell function.
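The effect of compression on dynamic range can be seen in a deliberately simple input-output function for one cochlear place. All numbers in this sketch are hypothetical. The active gain is 50 dB for weak inputs. Above a 30 dB knee it shrinks, so that output grows at 0.25 dB per dB, and it vanishes at high levels. Setting the gain to zero gives a crude stand-in for outer-hair-cell loss. Real basilar-membrane growth functions vary with place, stimulus frequency, and species.

```python
def bm_output_db(input_db, max_gain_db=50.0, knee_db=30.0, slope=0.25):
    """Hypothetical broken-stick input-output function for one cochlear place.

    Below `knee_db` the active gain is constant (linear growth, slope 1).
    Above it the gain shrinks so that output grows by `slope` dB per dB,
    until the gain reaches zero and growth becomes linear (passive) again.
    """
    excess = max(0.0, input_db - knee_db)
    gain = max(0.0, max_gain_db - (1.0 - slope) * excess)
    return input_db + gain

# Compressive region: 10 dB more input gives only 2.5 dB more output.
print(bm_output_db(60) - bm_output_db(50))   # 2.5

# A 90 dB input range is squeezed into a 45 dB output range.
print(bm_output_db(90) - bm_output_db(0))    # 45.0

# Without active gain, a 20 dB input yields 50 dB less output than with it.
print(bm_output_db(20, max_gain_db=0.0))     # 20.0
```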

Inner hair cells provide the principal afferent output. Damage to outer hair cells raises thresholds and broadens frequency tuning because active mechanical enhancement has been lost. Damage to inner hair cells or their synapses more directly removes the signal delivered to auditory-nerve fibers. The distinction is therefore functional as well as anatomical: outer hair cells regulate the mechanical input, while inner hair cells convert the resulting motion into synaptic output.

Figure 24.3: **Cochlear tonotopy, traveling waves, and active amplification.** A. Sound-driven motion produces a traveling wave that propagates along the cochlear partition from base to apex. Because the basilar membrane is narrower and stiffer at the base but wider and more compliant toward the apex, high-frequency waves reach their maximum near the base, whereas progressively lower frequencies peak farther apically. The plotted envelopes show relative vibration amplitude rather than absolute displacement; frequency positions and dimensions are schematic. At the helicotrema, scala vestibuli and scala tympani communicate around the apical end of scala media. B. A radial section through the organ of Corti shows one row of inner hair cells and three rows of outer hair cells, separated by the pillar cells and tunnel of Corti. Deiters’ cells support the outer hair cells, the reticular lamina forms the apical cellular surface, and the tectorial membrane overlies the hair bundles. Inner hair cells provide most of the cochlea’s afferent sensory output, whereas outer hair cells contribute primarily to mechanical amplification. C. Deflection of an outer-hair-cell bundle produces a graded receptor potential that drives prestin-dependent shortening or elongation of the cell. These rapid length changes return mechanical energy to the cochlear partition, increasing and sharpening vibration near the frequency-specific peak and thereby enhancing stimulation of nearby inner hair cells.

24.4 Hair cells convert motion into synaptic release

Auditory transduction is another form of mechanotransduction, but its receptor architecture differs from the arrangement introduced in Chapter 22. In many cutaneous receptors, mechanical force acts at the peripheral ending of the first-order sensory neuron. In the cochlea, a specialized epithelial hair cell transduces movement and then communicates with a separate first-order afferent neuron.

The apical surface of each hair cell carries an ordered bundle of actin-filled projections called stereocilia. These are not hairs and are not motile cilia. They form rows of increasing height, linked by fine extracellular filaments. Deflection of the bundle toward the tallest stereocilia increases tension through the tip-link apparatus; deflection in the opposite direction decreases it. Cadherin 23 and protocadherin 15 are major components of the tip link [@Kazmierczak2007TipLinks]. Increased tension opens mechanically gated channels near the tops of the shorter stereocilia.

The ionic environment makes the resulting receptor potential unusual. The stereocilia project into endolymph, which contains a high concentration of potassium and is electrically positive relative to the hair-cell interior. When the mechanotransduction channels open, potassium therefore flows into the cell and depolarizes it. Deflection in the opposite direction closes channels and hyperpolarizes the cell. Hair cells use graded receptor potentials rather than converting every cycle directly into an action potential.
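The receptor potential is asymmetric because only a minority of transduction channels are open at rest. A first-order Boltzmann function is a common simplified description of open probability versus bundle displacement. The sketch below uses hypothetical parameters (half-activation at 20 nm, slope factor 10 nm), which leave about 12% of channels open at rest. It ignores adaptation and other known complexities. Positive displacement can open most of the remaining channels, whereas negative displacement can close only the few already open.

```python
import math

def open_probability(x_nm, x_half_nm=20.0, slope_nm=10.0):
    """Hypothetical first-order Boltzmann: fraction of MET channels open at bundle
    displacement x (nm); positive x = toward the tallest stereocilia."""
    return 1.0 / (1.0 + math.exp(-(x_nm - x_half_nm) / slope_nm))

def relative_current(x_nm):
    """Transduction current relative to rest, in units of the maximal current.
    Positive values depolarize the cell; negative values hyperpolarize it."""
    return open_probability(x_nm) - open_probability(0.0)

print(round(open_probability(0.0), 2))  # 0.12: a minority of channels open at rest
for x in (-50.0, -20.0, 20.0, 50.0):
    print(x, round(relative_current(x), 3))
```

The output is graded rather than all-or-none, and a symmetric back-and-forth deflection produces a net depolarizing bias, because the excitatory side has far more headroom than the inhibitory side.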

Depolarization reaches the basal pole of an inner hair cell, where voltage-gated calcium channels open. Calcium triggers glutamate release at specialized ribbon synapses onto the peripheral endings of auditory-nerve fibers. The ribbon maintains a pool of synaptic vesicles near release sites, supporting rapid and sustained transmission over the duration of a sound.

The cell bodies of these first-order afferents lie in the spiral ganglion. Their peripheral processes contact hair cells, and their central axons form the auditory component of cranial nerve VIII. The sequence is therefore:

cochlear motion → inner-hair-cell receptor potential → glutamate release → spiral-ganglion action potentials

The hair cell is the receptor cell; the spiral-ganglion neuron is the first-order sensory neuron. That distinction will remain important when the pathway is traced into the brainstem.

Figure 24.4: **Hair-bundle transduction and ribbon synapse.** Deflection of the stereociliary bundle toward the tallest stereocilia increases tip-link tension and opens more mechanotransduction (MET) channels near the tips of the shorter stereocilia; deflection in the opposite direction decreases tension and closes them. Because the bundle is bathed in potassium-rich endolymph, channel opening allows K⁺ entry and depolarizes the inner hair cell, whereas opposite deflection reduces K⁺ entry and hyperpolarizes it. The resulting change in membrane potential is a **graded receptor potential** that spreads to the basal pole of the inner hair cell, where voltage-gated Ca²⁺ channels open. Calcium entry triggers glutamate release at a **ribbon synapse** onto the peripheral terminal of a spiral-ganglion neuron. Glutamate depolarizes the afferent terminal, and the spiral-ganglion neuron then generates **action potentials** that propagate centrally in the cochlear division of cranial nerve VIII. The figure emphasizes the distinction between the **inner hair cell as the receptor cell** and the **spiral-ganglion neuron as the first-order sensory neuron**.

The molecular machine at the tip link

The molecular identity of the hair-cell mechanotransduction channel was unresolved long after the mechanical action of the bundle was understood. Current evidence supports TMC1 and TMC2 as pore-forming components of a larger mechanotransduction complex rather than as an isolated channel operating alone [@Pan2018TMC1Pore]. Proteins including TMIE, LHFPL5, and CIB2/3 contribute to assembly, localization, force transmission, or channel function [@Beurg2024LHFPL5; @Giese2025TMCComplex].

The central conclusion is secure: tip-link tension gates a membrane complex that converts bundle displacement into ionic current. The complete structural arrangement, the exact path by which force reaches the pore, and parts of the gating mechanism remain under active study. A compact textbook label such as “the TMC1 channel” is therefore useful only when understood as shorthand for a larger molecular machine.

24.5 The auditory nerve carries place and time

The spiral ganglion preserves the cochlea’s frequency order. Fibers contacting inner hair cells near the base are tuned to higher frequencies, and fibers from progressively apical regions are tuned to lower ones. This tonotopic arrangement remains visible through much of the ascending pathway.

Frequency is not represented by place alone. At lower frequencies, auditory-nerve spikes can become synchronized to a particular phase of the acoustic waveform. A single neuron need not fire on every cycle for a population to preserve the timing pattern. This phase locking supplies information about temporal fine structure that is important for pitch and binaural comparison. At higher carrier frequencies, cycle-by-cycle fine-structure coding becomes less useful, while neural responses can still follow slower changes in the sound’s amplitude envelope. The upper frequency at which humans make perceptual use of fine-structure phase locking is not one settled number: binaural evidence is strongest below roughly 1.5 kHz, whereas proposed limits for some monaural uses extend higher [@Verschooten2019PhaseLocking].
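Phase locking is commonly quantified with vector strength. Each spike is treated as a unit vector at its stimulus phase, and the length of the mean vector runs from 0 (no phase preference) to 1 (perfect locking). The sketch assumes NumPy and a hypothetical fiber that fires on only about 30% of the cycles of a 500 Hz tone. Whenever it does fire, it does so near the same phase, with 0.1 ms of timing jitter. Skipped cycles do not reduce vector strength, which is why a population can preserve timing even though no single fiber fires on every cycle.

```python
import numpy as np

def vector_strength(spike_times_s, freq_hz):
    """Length of the mean phase vector: 1 = perfect locking, near 0 = no preference."""
    phases = 2 * np.pi * freq_hz * np.asarray(spike_times_s)
    return float(np.abs(np.mean(np.exp(1j * phases))))

rng = np.random.default_rng(seed=1)
freq = 500.0                     # tone frequency (Hz); period = 2 ms
cycles = np.arange(int(freq))    # 1 s of stimulus = 500 cycles

# Hypothetical fiber: spikes on a random ~30% of cycles, always near a quarter
# of the way through the cycle, with 0.1 ms of Gaussian jitter.
fired = cycles[rng.random(cycles.size) < 0.3]
locked = fired / freq + 0.25 / freq + rng.normal(0.0, 1e-4, fired.size)

# Control: the same number of spikes at random times.
unlocked = rng.uniform(0.0, 1.0, fired.size)

print(fired.size, "spikes in", cycles.size, "cycles")
print(round(vector_strength(locked, freq), 2))    # high (strong locking)
print(round(vector_strength(unlocked, freq), 2))  # low (no phase preference)
```

For Gaussian jitter with standard deviation σ (in radians), vector strength is expected to be about \(e^{-\sigma^2/2}\), roughly 0.95 here. The same 0.1 ms jitter at 5 kHz would be half a period, driving vector strength close to zero. This is one reason fine-structure coding weakens at high frequencies.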

Sound level is also represented by several interacting codes. Increasing level generally increases firing rates within their effective ranges, recruits fibers with higher thresholds, and spreads excitation across a broader region of the cochlea. Auditory-nerve fibers contacting the same inner hair cell differ in spontaneous rate, threshold, dynamic range, and vulnerability. The nerve therefore carries a population pattern rather than a single scalar report of frequency or loudness.
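Recruitment across fibers with different thresholds can be sketched with sigmoidal rate-level functions. The parameters below (thresholds, spontaneous and maximum rates, a 20 dB dynamic range) are hypothetical, and spread of excitation along the cochlea is not modeled. The point is only that once low-threshold fibers saturate, higher-threshold fibers can still signal further increases in level.

```python
import math

def firing_rate(level_db, threshold_db, spont=5.0, max_rate=250.0, dynamic_range_db=20.0):
    """Hypothetical sigmoidal rate-level function (spikes/s) for one auditory-nerve fiber.
    The rate climbs from `spont` to near `max_rate` over about `dynamic_range_db`
    above `threshold_db`."""
    midpoint = threshold_db + dynamic_range_db / 2
    scale = dynamic_range_db / 8
    drive = 1.0 / (1.0 + math.exp(-(level_db - midpoint) / scale))
    return spont + (max_rate - spont) * drive

thresholds_db = {"low": 0.0, "medium": 30.0, "high": 60.0}
for level in (10, 40, 70, 90):
    rates = {name: round(firing_rate(level, thr)) for name, thr in thresholds_db.items()}
    print(level, "dB:", rates)
```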

This diversity helps explain why a standard audiogram is informative but incomplete. An audiogram measures whether quiet tones at different frequencies can be detected. It does not directly measure how faithfully the nerve preserves rapid temporal structure, how well neural populations remain separable at higher sound levels, or how effectively a voice can be represented among competing sounds.

Cochlear synaptopathy and hidden hearing loss

Noise exposure can damage the synapses between inner hair cells and auditory-nerve fibers in animal models even when hair cells survive and behavioral thresholds later return toward normal [@KujawaLiberman2009HiddenHearingLoss; @Liberman2017NoiseAge]. The lesion is often called cochlear synaptopathy. Because an ordinary audiogram may recover while suprathreshold neural coding remains altered, the phrase hidden hearing loss was introduced to describe a deficit that threshold testing could miss.

The animal evidence is strong. Translation to humans is less settled. Human synapses cannot ordinarily be counted during life, and proposed physiological proxies are affected by outer-hair-cell function, anatomy, age, and measurement noise. Some studies have found associations consistent with synaptopathy, whereas others have found no evidence that it explains speech-in-noise difficulty in listeners with normal audiograms [@Guest2018CochlearSynaptopathy].

The appropriate conclusion is asymmetric: noise-induced synaptic loss is a well-established biological phenomenon in several animal models; its prevalence, diagnosis, and functional importance in living humans remain unresolved. “Hidden hearing loss” should not become a default diagnosis for every complaint that an audiogram fails to explain.

Descending control reaches the cochlea

Auditory traffic is not exclusively ascending. Neurons in and near the superior olivary complex send olivocochlear efferents back toward the inner ear. Medial olivocochlear fibers act mainly in the outer-hair-cell region and can change cochlear gain. Lateral olivocochlear fibers terminate near the afferent endings beneath inner hair cells and can influence auditory-nerve output [@Guinan2006Olivocochlear].

These pathways have been implicated in protection from acoustic injury, regulation of dynamic range, listening in noise, and state-dependent control. No single proposed function explains every result. Their anatomy nevertheless makes one point unambiguous: even the sensory periphery is embedded in a recurrent circuit. The brain regulates part of the signal that it subsequently receives.

24.6 Ascending auditory pathways: branches, crossings, and destinations

The auditory pathway conforms to the unit’s canonical sensory plan only at a broad level. The first-order spiral-ganglion axon reaches a brainstem sensory nucleus; later neurons ascend through midbrain and thalamus to primary cortex. Between those points, however, the pathway branches, crosses, reconverges, and performs consequential computations.

The central spine is:

inner hair cell → spiral ganglion → cochlear nuclei → bilateral brainstem and midbrain routes → medial geniculate nucleus → auditory cortex

Central axons of spiral-ganglion neurons travel in cranial nerve VIII, enter the brainstem at the pontomedullary junction, and terminate in the cochlear nuclear complex on the same side. Fibers branch among the dorsal and ventral cochlear nuclei. These nuclei are not passive relays. Their neuronal populations preserve or transform different features of the signal, including onset, duration, intensity, temporal precision, and spectral shape.

Outputs from the ventral cochlear nucleus project through several routes to the superior olivary complex on both sides. Many crossed fibers travel in the trapezoid body. Because neurons in the superior olive receive input derived from both ears, this is the first prominent station for binaural comparison. The medial superior olive is strongly associated with interaural timing, especially for lower-frequency sounds. The lateral superior olive is strongly associated with interaural level differences, especially when the head produces an acoustic shadow. These are useful functional biases rather than exclusive assignments.

Excitation and inhibition are both essential. In one well-studied circuit, input driven by the opposite ear reaches the medial nucleus of the trapezoid body, whose glycinergic neurons inhibit the lateral superior olive. The lateral superior olive can therefore compare excitation associated with one ear against inhibition associated with the other. Timing circuits in and around the medial superior olive likewise depend on precisely timed excitatory and inhibitory inputs, not simply on two unmodified excitatory streams.
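The LSO comparison described above can be caricatured as a subtraction. Ipsilateral excitation is reduced by contralaterally driven inhibition relayed through the MNTB, and the result is rectified at zero and capped at a maximum rate. All constants in this sketch are hypothetical, and the function ignores the level dependence and timing of real inhibitory inputs. Because only the difference between the ears enters, the output depends on ILD rather than on overall level.

```python
def lso_rate(ipsi_db, contra_db, gain=8.0, baseline=100.0, max_rate=200.0):
    """Hypothetical LSO neuron (spikes/s): ipsilateral excitation minus
    glycinergic inhibition driven by the contralateral ear via the MNTB,
    half-wave rectified and capped at `max_rate`."""
    net = baseline + gain * (ipsi_db - contra_db)
    return min(max_rate, max(0.0, net))

for side, ipsi_db, contra_db in [("contralateral", 50, 65), ("midline", 60, 60), ("ipsilateral", 65, 50)]:
    print(side, lso_rate(ipsi_db, contra_db))
# contralateral 0.0
# midline 100.0
# ipsilateral 200.0

# The same 5 dB ILD at two overall levels gives the same output.
print(lso_rate(40, 35), lso_rate(70, 65))  # 140.0 140.0
```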

Auditory output ascends in the lateral lemniscus, whose own nuclei perform additional temporal processing, and reaches the inferior colliculus. Not every pathway passes through the superior olive first; direct and indirect routes from the cochlear nuclei converge in the midbrain. The inferior colliculus integrates frequency, timing, level, and spatial information, participates in auditory orienting, receives extensive descending input, and communicates with other sensorimotor structures. As described in the overview’s account of the tectal plan, auditory information is therefore tied to orienting before it reaches cerebral cortex.

The principal ascending output of the inferior colliculus travels through the brachium of the inferior colliculus to the medial geniculate nucleus of the thalamus. The ventral division of the medial geniculate provides the main first-order thalamic relay to primary auditory cortex. Other medial geniculate divisions participate in broader cortical, limbic, and multisensory circuits. The thalamic anatomy and the meaning of first-order thalamic relay are developed in Section 21.3 and illustrated in Figure 21.2.

This pathway has no single decisive decussation comparable to the dorsal-column crossing in the medulla or the anterolateral crossing in the spinal cord. Crossed and uncrossed branches appear at several levels, and information from each ear is represented bilaterally above the cochlear nuclei, usually with a contralateral bias. A unilateral cortical lesion can impair localization, segregation, or recognition without ordinarily producing complete deafness in one ear. The early bilateral architecture reflects the central importance of comparing the two ears.

The dorsal cochlear nucleus adds another complication: it receives somatosensory as well as auditory input. This convergence may help distinguish external acoustic structure from changes associated with movement of the head, jaw, pinna, or body, and it may contribute to recalibration after altered peripheral input. Those interpretations are plausible rather than final. The secure anatomical point is that auditory processing begins in a nucleus already supplied with information beyond the auditory nerve.

Figure 24.5: **Bilateral ascending auditory pathways.** Simplified organizational map of the principal ascending routes from the cochlea to auditory cortex. Inner hair cells synapse onto bipolar neurons of the spiral ganglion, whose central axons travel in cranial nerve VIII and terminate in the **ipsilateral cochlear nuclear complex** at the pontomedullary junction. Within the complex, fibers distribute to the dorsal cochlear nucleus (**DCN**) and ventral cochlear nucleus (**VCN**), where different neuronal populations preserve or transform information about sound onset, duration, intensity, temporal structure, and spectral shape. The gray-green projections emphasize that the DCN also receives somatosensory information related to movements of the head, jaw, neck, and body. Outputs from the cochlear nuclei divide among several crossed and uncrossed routes. VCN projections reach the **superior olivary complex** on both sides, with many crossed fibers traveling through the **trapezoid body**. The superior olivary complex includes the medial superior olive (**MSO**), lateral superior olive (**LSO**), medial nucleus of the trapezoid body (**MNTB**), and additional nuclei not shown. Because these nuclei receive information derived from both ears, the superior olive is the first prominent station for binaural comparison. Auditory signals then ascend through the **lateral lemniscus**, whose constituent nuclei perform additional temporal processing, to the **inferior colliculus**. The multiple pathways drawn from the cochlear nuclei indicate that not all ascending fibers first synapse in the superior olive: direct and indirect routes converge in the inferior colliculus at several levels. The inferior colliculus integrates information about frequency, timing, level, and spatial location and communicates with orienting and other sensorimotor systems. Descending projections, represented schematically in gray, also influence midbrain auditory processing. 
The principal ascending output of each inferior colliculus travels through the **brachium of the inferior colliculus** to the ipsilateral **medial geniculate nucleus** (**MGN**) of the thalamus. The darker ventral division (**MGNv**) provides the main first-order thalamic relay to primary auditory cortex (A1) on Heschl’s gyrus of the superior temporal plane. Other MGN divisions participate in broader cortical, limbic, and multisensory circuits. Because crossed and uncrossed branches arise at several stages, the auditory system has no single decisive decussation. Above the cochlear nuclei, each cerebral hemisphere receives information originating from both ears, usually with a contralateral bias. The inset expands two representative superior-olivary computations. In the LSO circuit, input associated with the ipsilateral ear provides excitation, whereas input driven by the opposite ear crosses in the trapezoid body, excites the MNTB, and reaches the LSO as glycinergic inhibition. This arrangement supports comparison of ipsilateral excitation with contralaterally driven inhibition and is especially important for interaural level differences. The MSO receives precisely timed bilateral excitation together with temporally structured inhibition, supporting sensitivity to interaural timing differences, particularly at lower sound frequencies. These are functional biases rather than exclusive assignments, and the inhibitory sources in the MSO circuit are simplified. Blue pathways carry information originating at the left ear, orange pathways carry information originating at the right ear, and teal pathways represent ascending output after substantial binaural convergence. Purple T-shaped terminals denote inhibition; triangular arrowheads denote the direction of ordinary or excitatory projections. Differences in line weight indicate a qualitative contralateral bias and should not be interpreted as quantitative estimates of projection strength. 
The diagram is an organizational summary rather than a complete tract atlas.

24.7 Localizing sound

The cochlea contains a map of frequency, not a map of external space. Auditory location must be reconstructed from the different transformations imposed on a sound before it reaches the two ears. Three classes of cues are especially important: interaural timing, interaural level, and direction-dependent spectral filtering.

For lower-frequency sounds, the nervous system can use an interaural time difference (ITD). A source to the left usually delivers the waveform to the left ear slightly before the right. The maximum delay produced by the human head is measured in hundreds of microseconds, and under favorable laboratory conditions listeners can discriminate changes in the range of tens of microseconds [@Brughera2013ITD]. Such performance depends on preserving temporal precision from the cochlea through the brainstem.
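The size of human ITDs can be estimated with the Woodworth spherical-head formula, \(\mathrm{ITD} = (r/c)(\theta + \sin\theta)\), for a distant source at azimuth \(\theta\). The sketch assumes a head radius of 8.75 cm and a speed of sound of 343 m/s. The formula treats the head as a rigid sphere and is usually described as a high-frequency approximation, so measured ITDs vary with frequency and with individual head size.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed spherical-head radius
SPEED_OF_SOUND = 343.0    # m/s in air at about 20 °C

def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Woodworth spherical-head ITD (s) for a distant source.
    0° = straight ahead, 90° = directly opposite one ear; valid for 0-90°."""
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))

for az in (0, 5, 30, 60, 90):
    print(az, "deg:", round(woodworth_itd(az) * 1e6), "µs")
```

Under these assumptions, the maximum ITD at 90° is several hundred microseconds, whereas a shift of only a few degrees near the midline changes the ITD by tens of microseconds. That is the scale of difference listeners can discriminate under favorable conditions.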

For many higher-frequency sounds, the head creates an acoustic shadow. The signal is more intense at the nearer ear than at the farther one, producing an interaural level difference (ILD). Timing and level cues overlap in their useful frequency ranges rather than dividing the spectrum at one exact boundary. Their relative importance changes with frequency, bandwidth, distance, and the acoustic environment.

ITDs and ILDs do not uniquely specify every point in space. Different locations can produce similar binaural differences, especially along a cone of confusion extending around the interaural axis. The pinnae, head, and torso resolve part of this ambiguity by imposing direction-dependent peaks and notches on the spectrum. The resulting head-related transfer function differs across listeners because ear and head anatomy differ. Elevation and front–back judgments therefore depend partly on learned relationships between an individual’s anatomy and the spectral patterns reaching the eardrums.

Movement improves the estimate. A small turn of the head changes ITDs, ILDs, and pinna cues in a lawful way. The auditory system can compare those changes against motor and somatosensory information, converting an ambiguous static sample into a more informative sequence. Localization is therefore an active process even when the eventual percept seems immediate.
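Under a simplified sine-law ITD model (ears 17.5 cm apart, no head, pinnae, or torso; all assumptions), a source 30 degrees right of straight ahead and its mirror image 30 degrees right of straight behind give identical ITDs. A small head turn separates them: the sign of the ITD change, compared with the known direction of the turn, indicates front or back. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
EAR_SPACING = 0.175     # m; simplified sine-law head (assumption)

def simple_itd(azimuth_deg):
    """ITD (s) for a distant source; positive means the right ear leads."""
    return (EAR_SPACING / SPEED_OF_SOUND) * math.sin(math.radians(azimuth_deg))

def itd_change_after_turn(source_azimuth_deg, head_turn_deg):
    """Change in ITD when the head turns right by head_turn_deg.

    Turning the head right by x degrees moves a world-fixed source to
    azimuth (source - x) relative to the head.
    """
    before = simple_itd(source_azimuth_deg)
    after = simple_itd(source_azimuth_deg - head_turn_deg)
    return after - before

# Identical static ITDs, opposite dynamic signatures
front_change = itd_change_after_turn(30, 10)   # front source moves toward the midline
back_change = itd_change_after_turn(150, 10)   # back source moves toward the ear axis
print(f"front: {front_change * 1e6:+.1f} us, back: {back_change * 1e6:+.1f} us")
```

A rightward turn shrinks the ITD of the front source and enlarges the ITD of the back source, which is the kind of lawful change that motor and vestibular information can help interpret.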

Figure 24.6: **Sound-localization cues.** A, Interaural time differences (ITDs) arise because a lateral sound reaches the nearer ear first and are especially informative for lower-frequency temporal structure. B, Interaural level differences (ILDs) arise when the head attenuates sound at the farther ear, an effect that generally becomes stronger at higher frequencies. C, Locations with similar binaural cues can lie on a cone of confusion; direction-dependent filtering by the pinnae, head, and torso adds listener-specific spectral peaks and notches that support elevation and front–back judgments. The plotted spectra are schematic. D, Small head movements change ITDs, ILDs, and spectral cues in a lawful sequence, helping resolve otherwise ambiguous locations. Natural sounds usually provide several overlapping cues at once.

The Jeffress model: a powerful idea, not a literal mammalian diagram

The classic model of interaural timing, proposed by Lloyd Jeffress, combines axonal delay lines with coincidence detectors. Signals from the two ears travel along paths with different delays. A neuron responds most strongly when activity from the two sides arrives together, converting a temporal difference into selective neural activity. Closely related place-coding solutions are strongly supported in avian auditory systems, including the barn owl.
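The delay-line idea can be sketched as a bank of coincidence detectors applied to sampled signals. This is the textbook Jeffress abstraction, closely related to cross-correlation, and not a model of the mammalian circuit. Assumptions: a 100 kHz sampling rate (10 microseconds per sample), broadband Gaussian noise as the sound, and the summed product of the two inputs as a stand-in for counting coincident spikes; `make_ear_signals` and `jeffress_bank` are hypothetical helpers.

```python
import random

FS = 100_000  # samples per second (assumed): one sample = 10 microseconds

def make_ear_signals(itd_samples, n=4000, pad=100, seed=0):
    """Broadband noise reaching the left ear itd_samples before the right.

    The right-ear waveform is the left-ear waveform delayed by
    itd_samples (requires 0 <= itd_samples <= pad).
    """
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n + 2 * pad)]
    left = source[pad:pad + n]
    right = source[pad - itd_samples:pad - itd_samples + n]
    return left, right

def jeffress_bank(left, right, max_delay):
    """Response of each coincidence detector in a Jeffress-style array.

    Detector k delays the left input by k samples before combining it
    with the right input (negative k effectively delays the right input).
    """
    n = len(left)
    responses = {}
    for k in range(-max_delay, max_delay + 1):
        responses[k] = sum(left[t - k] * right[t]
                           for t in range(max_delay, n - max_delay))
    return responses

true_itd = 50  # samples = 500 microseconds, left ear leading
left, right = make_ear_signals(true_itd)
responses = jeffress_bank(left, right, max_delay=80)
best = max(responses, key=responses.get)
print(f"most active detector: internal delay {best * 1e6 / FS:.0f} microseconds")
```

The detector whose internal delay cancels the external delay receives matched inputs and responds most, converting a time difference into a place in the array. Broadband noise is used because its correlation peaks at a single delay; with a pure tone, detectors separated by one full period of the tone would respond equally.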

Mammalian brainstem neurons are also exquisitely sensitive to timing, but the implementation is not a simple copy of the avian map. The effective delays produced by excitation and inhibition, the distribution of preferred ITDs across neuronal populations, and the readout of opponent channels all contribute [@FitzpatrickKuwada2001ITD; @Encke2018ITD]. The Jeffress model therefore remains a compelling first demonstration that a circuit can turn delay into spatial information. It should not be mistaken for a complete wiring diagram of the human medial superior olive.

Spatial calibration remains plastic. When adult listeners wore molds that altered the shape of their pinnae, elevation judgments initially deteriorated but improved with continued experience; performance with the original ears remained available when the molds were removed [@Hofman1998NewEars]. The result shows that the nervous system learns the acoustic consequences of the body through which sound is received. Adult plasticity can recalibrate a cue without requiring the peripheral anatomy to return to its previous state.

24.8 From the medial geniculate to auditory cortex

The medial geniculate should be understood through the general thalamic architecture developed in the overview. Its ventral division preserves substantial tonotopic order and projects strongly to the middle layers of primary auditory cortex. Corticothalamic projections return from auditory cortex to the medial geniculate, and both ascending and descending axons interact with inhibitory thalamic circuitry. The thalamic station is therefore a controlled component of a recurrent loop rather than the final passive relay before perception.

Primary auditory cortex lies on the superior temporal plane, in and around Heschl’s gyrus, much of it hidden within the lateral sulcus. The relationship is not one gyrus to one functional field. Heschl’s gyrus varies markedly across individuals and may be single, partially divided, or duplicated; functional and cytoarchitectonic boundaries vary with that anatomy [@DaCosta2011Heschl]. A cortical drawing should therefore treat the location of A1 as approximate rather than outlining one universally fixed patch.

Like the cochlea and several subcortical nuclei, human auditory cortex is tonotopically organized. Imaging reveals multiple gradients in which neighboring cortical territories prefer neighboring frequencies [@Formisano2003Tonotopic; @Moerel2014AuditoryTopography]. These gradients provide an orderly substrate for analyzing spectral structure, just as the somatotopic maps of S1 preserve relationships across the body. They are not a complete account of the percept.

Auditory cortex is not a piano keyboard. A piano keyboard assigns one visible key to one note. Cortical populations respond to combinations of frequency, timing, bandwidth, intensity, spatial cues, and behavioral context. Pitch can depend on harmonic relationships and temporal regularity rather than on activity at one place corresponding to a fundamental frequency. Tonotopic position and pitch-related activity can therefore be dissociated [@Allen2022TonotopyPitch].
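The point that pitch need not correspond to energy at the fundamental can be demonstrated numerically. The sketch builds a complex from 400, 600, and 800 Hz only, then estimates its period with autocorrelation, one classic periodicity model of pitch. The sampling rate, search range, and function names are assumptions, and autocorrelation is used as an illustration, not as a claim about how auditory cortex computes pitch.

```python
import math

FS = 20_000                  # samples per second (assumed)
HARMONICS = (400, 600, 800)  # Hz; harmonics 2 to 4 of 200 Hz, with no 200 Hz component

def harmonic_complex(freqs, n, fs=FS):
    """Sum of equal-amplitude cosines at the given frequencies."""
    return [sum(math.cos(2 * math.pi * f * t / fs) for f in freqs) for t in range(n)]

def autocorrelation_pitch(x, min_lag, max_lag, fs=FS):
    """Return (best_lag, pitch_hz) from the autocorrelation peak.

    The lag at which the waveform best matches a shifted copy of
    itself is taken as the period.
    """
    n = len(x)
    def r(lag):
        return sum(x[t] * x[t + lag] for t in range(n - lag))
    best_lag = max(range(min_lag, max_lag + 1), key=r)
    return best_lag, fs / best_lag

signal = harmonic_complex(HARMONICS, n=20_000)
# Search periods from 1.25 ms (800 Hz) to 7.5 ms (about 133 Hz)
lag, pitch = autocorrelation_pitch(signal, min_lag=25, max_lag=150)
print(f"period {lag / FS * 1000:.1f} ms -> estimated pitch {pitch:.0f} Hz")
```

The waveform repeats every 5 ms, so the estimate lands at 200 Hz even though the signal contains no 200 Hz component. The search range stops short of 10 ms because autocorrelation also peaks at multiples of the period.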

Primary fields communicate with surrounding auditory cortex on the superior temporal plane and lateral temporal surface, as well as with parietal, frontal, motor, and limbic systems. The commonly drawn core–belt–parabelt organization provides a useful comparative framework, but its sharpest anatomical definition comes from nonhuman primates. In humans, those labels should be shown schematically rather than as universally agreed borders. Across these regions, response properties become increasingly sensitive to conjunctions that characterize voices, environmental events, music, and other natural sound categories [@LeaverRauschecker2010NaturalSounds; @NormanHaignere2022Song].

Figure 24.7: **Human auditory cortex: anatomy, functional organization, and distributed connections.** A, the superior temporal plane with the opercula removed, showing Heschl’s gyrus, adjacent planar landmarks, approximate primary and nonprimary auditory territories, schematic tonotopic gradients, and common variations in Heschl’s-gyrus anatomy. B, the comparative core–belt–parabelt framework, whose field boundaries are best established in nonhuman primates and remain approximate in humans. C1, a broad population-level progression from sensitivity to spectrotemporal structure toward longer-timescale combinations and category-biased responses to voices, music, and environmental sounds [@LeaverRauschecker2010NaturalSounds; @NormanHaignere2022Song]. C2, simplified direct and indirect interactions between auditory cortex and frontal, parietal, motor, insular, amygdalar, and medial temporal systems. Borders, gradients, and connections are schematic and vary across individuals.

24.9 From frequencies to auditory objects

An auditory object is a perceptually coherent source or event inferred from acoustic input. It need not already be recognized or named. An unfamiliar machine can be heard as one continuing source before the listener knows what it is. Object formation refers first to grouping parts of an acoustic mixture as likely consequences of the same event.

This is a difficult problem because sounds from different sources add together before reaching the ears. At each eardrum, a voice, ventilation system, footstep, and closing door contribute to one pressure waveform. The auditory system must decide which frequency components and temporal fragments belong together and which should be separated. Common onset, harmonic relationships, shared amplitude modulation, continuity of pitch or timbre, and location all provide evidence. Prior knowledge and attention can alter the grouping without replacing the acoustic constraints.
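Harmonicity, one of the grouping cues listed here, can be illustrated with a toy harmonic sieve. The sketch assumes the candidate fundamentals are already known, although a real listener must infer them too, and it ignores onset, modulation, and location cues. `group_by_harmonicity` and its 2% tolerance are hypothetical.

```python
def group_by_harmonicity(components_hz, candidate_f0s_hz, tolerance=0.02):
    """Assign each spectral component to the candidate F0 it best fits.

    A component joins the group of a candidate fundamental when it lies
    within `tolerance` (a fraction of the component frequency) of an
    integer multiple of that fundamental. Components that fit no
    candidate are returned separately.
    """
    groups = {f0: [] for f0 in candidate_f0s_hz}
    unassigned = []
    for f in components_hz:
        best_f0, best_error = None, None
        for f0 in candidate_f0s_hz:
            harmonic_number = max(1, round(f / f0))
            error = abs(f - harmonic_number * f0) / f
            if error <= tolerance and (best_error is None or error < best_error):
                best_f0, best_error = f0, error
        if best_f0 is None:
            unassigned.append(f)
        else:
            groups[best_f0].append(f)
    return groups, unassigned

# Two harmonic "sources" (F0 = 200 Hz and 310 Hz) plus one stray component
mixture = sorted([200, 400, 600, 800, 310, 620, 930, 1130])
groups, leftover = group_by_harmonicity(mixture, candidate_f0s_hz=[200, 310])
print(groups, leftover)
```

Here the interleaved mixture separates into one group built on 200 Hz and one on 310 Hz, and the stray 1130 Hz component joins neither.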

Albert Bregman called this problem auditory scene analysis [@Bregman1990AuditoryScene]. The phrase is apt because the percept is not a list of detected frequencies. It is a structured scene containing partially segregated sources: one voice continuing behind another, a vehicle approaching from the left, rain forming a background texture, or a melody persisting as its notes move across frequency.

The cortical pathways supporting source identity, spatial location, and sound-guided action show partially separable biases. Anterior temporal regions are often emphasized in accounts of auditory object identity, whereas posterior temporal and parietal regions contribute strongly to spatial and sensorimotor processing [@RauscheckerScott2009MapsStreams; @BizleyCohen2013AuditoryObjects]. The distinction should not become a rigid auditory version of two filing cabinets labeled what and where. Identity helps resolve location, location helps segregate identity, and both are continuously linked to possible action.

Perceptual illusions reveal the inference. In the ventriloquism effect, a plausible visual source can pull the apparent location of a sound toward itself. In stereo reproduction, carefully chosen timing and level differences create a phantom source between two loudspeakers. These are not arbitrary failures. They expose cue-combination rules that ordinarily help locate a common cause in a noisy environment.
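The level component of a stereo phantom source is often described by the stereophonic tangent law, a standard amplitude-panning rule that is not part of the text. The sketch assumes loudspeakers at plus and minus 30 degrees, uses positive angles for the left, and normalizes the gains to constant total power. It ignores the timing cues the paragraph also mentions and the listener's own head acoustics; `tangent_law_gains` is a hypothetical name.

```python
import math

def tangent_law_gains(target_deg, speaker_deg=30.0):
    """Constant-power loudspeaker gains for a phantom source (tangent law).

    Loudspeakers sit at +/- speaker_deg (positive = left). Returns
    (g_left, g_right) with g_left**2 + g_right**2 == 1.
    Intended for |target_deg| <= speaker_deg.
    """
    ratio = math.tan(math.radians(target_deg)) / math.tan(math.radians(speaker_deg))
    # Tangent law: (gL - gR) / (gL + gR) = ratio; start from gL + gR = 1
    g_left, g_right = (1 + ratio) / 2, (1 - ratio) / 2
    norm = math.hypot(g_left, g_right)  # rescale to constant total power
    return g_left / norm, g_right / norm

for target in (0, 15, 30):
    g_left, g_right = tangent_law_gains(target)
    print(f"target {target:>2} deg: left gain {g_left:.3f}, right gain {g_right:.3f}")
```

Equal gains place the image straight ahead, and sending all the signal to one loudspeaker places it at that loudspeaker. In between, the listener hears a single source at an intermediate location that no loudspeaker occupies.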

24.9.1 Speech and music: asymmetry without caricature

Speech and music are structured auditory events unfolding over time. Both depend on frequency, timing, timbre, grouping, expectation, and learned categories. Neither is processed by one hemisphere alone.

Population-level asymmetries are nevertheless real. Left auditory regions often show greater sensitivity to relatively rapid temporal modulations that are useful for speech, whereas right auditory regions often show greater sensitivity to fine spectral structure and slower temporal organization that support melody and pitch. Experiments that independently degraded temporal and spectral information in sung speech produced corresponding changes in speech and melody perception and in left- and right-auditory-cortical decoding [@Albouy2020SpeechMelody]. These are graded processing biases, not ownership rules in which the left hemisphere contains language and the right contains music.

A voice simultaneously carries phonetic structure, speaker identity, affect, prosody, distance, and location. Music carries pitch relationships, rhythm, meter, timbre, and learned social meaning. Human auditory cortex contains populations with selective responses to natural categories, including voices, music, and song, but such selectivity does not establish a single center that contains the category [@LeaverRauschecker2010NaturalSounds; @NormanHaignere2022Song]. The category emerges from distributed activity within a recurrent auditory and multisensory network.

Language will return in a later unit. Here the important point is architectural: language uses auditory machinery that evolved for analyzing sound, locating sources, recognizing conspecific signals, and linking heard events to action. The cochlea does not encode words, and the superior olive does not compute grammar. Linguistic interpretation is built on earlier acoustic and auditory-object processing rather than substituted for it.

24.9.2 Attention and listening in noise

Natural listening usually requires selecting one source from a mixture. Spatial attention can favor a talker at one location; knowledge of a voice can support tracking across interruptions; rhythmic regularity can predict when informative events will occur; and linguistic context can help reconstruct a partially masked word. These influences arise through recurrent interactions among auditory cortex, thalamus, brainstem, motor systems, and frontoparietal control networks rather than from a separate attention module operating after perception is complete.

A degraded peripheral signal increases the work required of these systems. Conversation in a noisy room may depend on sustained attention, visual speech cues, prediction, working memory, and inference. A listener can answer correctly while expending much more effort than a pure-tone audiogram measured in quiet would suggest. Difficulty hearing in noise is therefore not simply a softer version of difficulty detecting a tone. It tests the fidelity of peripheral coding and the capacity of central networks to segregate, track, and interpret an auditory object.

24.10 What central auditory disorders reveal

The preceding sections separated several operations that ordinary hearing compresses into one fluent experience. Acoustic energy must be transduced and transmitted; components of a mixture must be grouped into auditory objects; an object may be assigned a source, identity, and location; and one source may be selected over its competitors. Central auditory disorders show that hearing is not one indivisible act.

neural transmission without conscious sound → conscious sound without recognition → available sound without spatial selection → an auditory percept without an external source

These contrasts are not a rigid serial pipeline, and the syndromes do not identify one cortical module for each function. Lesions cross areal boundaries, interrupt white matter as well as cortex, and change during recovery. They are nevertheless informative because they reveal which auditory operations can come apart.

24.10.1 A signal without a heard world: cortical deafness

Rare bilateral damage involving auditory cortex and adjacent superior temporal regions can produce cortical deafness. A similar loss of conscious hearing can follow bilateral interruption of the auditory radiations carrying medial geniculate output toward cortex. The person may fail to orient to loud sounds and report no auditory experience even though cochlear measures and short-latency auditory brainstem responses remain present [@MendezGeehan1988CorticalAuditory; @Akiyoshi2021SubcorticalDeafness].

A preserved conventional auditory brainstem response shows that the cochlea, auditory nerve, and early brainstem pathway can respond. It does not by itself show that the signal has reached the medial geniculate nucleus or auditory cortex. Complete cortical deafness generally requires bilateral damage, consistent with the bilateral organization of the ascending pathway after the cochlear nuclei. A unilateral lesion is more likely to produce a subtler deficit in temporal analysis, localization, recognition, or listening under competition.

The boundaries become visible during recovery. In some patients, an initial cortical deafness evolves into generalized auditory agnosia, amusia, pure word deafness, or a more restricted disturbance of temporal sequencing [@MendezGeehan1988CorticalAuditory]. These outcomes are better understood as related disorders within an overlapping system than as entirely separate diseases.

24.10.2 Hearing without recognition: the auditory agnosias

Auditory agnosia refers to a failure to perceive or recognize sounds that cannot be explained by peripheral hearing loss, a general intellectual disorder, or a language deficit sufficient to account for the impairment [@SlevcShell2015AuditoryAgnosia]. The category includes several striking dissociations.

In environmental auditory agnosia, a person cannot reliably recognize nonverbal sounds such as a barking dog, running water, a telephone, or a closing door, although speech comprehension may be much better preserved. In an apperceptive disorder, the acoustic pattern itself is not organized or discriminated normally: the listener may have difficulty segregating a sound from a mixture or deciding whether two samples came from the same kind of source. In an associative disorder, a reasonably coherent auditory object is available, but it fails to activate knowledge of what produced it. The listener may hear a bounded event and still not know that it was a door closing.

The distinction is useful but not absolute. Sound segregation, nonsemantic matching of different exemplars from the same source, and semantic identification can be disrupted separately rather than failing as one strictly hierarchical chain [@Clarke1996AuditoryRecognition]. Some lesion studies have also found more acoustic confusions after right-hemisphere damage and more semantic confusions after left-hemisphere damage [@Schnider1994EnvironmentalSounds]. These are tendencies, not exclusive hemispheric centers. They show that constructing an auditory object and knowing what it means are related but distinguishable achievements.

Phonagnosia is a selective impairment in recognizing people by their voices. A person may hear that someone is speaking, understand every word, and recognize the same individual from a face, yet fail to identify the speaker from the voice. Some cases impair the discrimination of voice structure itself; others preserve discrimination of unfamiliar voices but disrupt the link from a familiar voice to person-specific knowledge. One particularly selective case followed a right anterior temporal stroke: familiar singers could no longer be identified from their voices even though voice perception, language, face perception, and familiar-face recognition were preserved [@VanLancker1988Phonagnosia; @Luzzi2018Phonagnosia]. The evidence favors a distributed, right-biased network for familiar-voice identity rather than a single cortical voice center.

Acquired amusia is a loss of one or more musical abilities after brain injury. It is not the disappearance of “music” as a unit. Pitch direction, melodic contour, tonal structure, timbre, rhythm, beat, meter, and musical memory depend on partly separable operations. One patient may lose the tonal organization of a familiar melody while retaining enough rhythm to recognize it; another may show the reverse pattern. Stroke and lesion-network studies associate persistent amusia especially with a right-biased network involving superior temporal and Heschl’s gyri, insula, striatum, connected white matter, and additional frontal and parietal regions [@Vignolo2003MusicAgnosia; @Sihvonen2016AcquiredAmusia; @Sihvonen2024AmusiaNetwork]. Acquired amusia therefore supports the chapter’s earlier conclusion: speech and music show meaningful asymmetries without belonging exclusively to opposite hemispheres.

A related syndrome, pure word deafness, disproportionately disrupts the auditory analysis of speech while reading, writing, and spontaneous speech remain comparatively available. It will be considered with the language disorders. Its existence is worth noting here because environmental sounds, voices, music, and spoken words are not guaranteed to fail together.

24.10.3 Sound without selection: auditory neglect

A sound can also be represented without being selected. After unilateral brain damage, especially within a right-hemisphere attention network, a person may fail to acknowledge sounds in contralesional space. The deficit is often clearest under competition: a left-sided sound may be reported when presented alone but omitted when a simultaneous sound occurs on the right. This is auditory extinction.

Ear and space must be distinguished. In a dichotic-listening task, different signals are delivered to the two ears; failure to report the left-ear item can result from damage to auditory cortex or its connections as well as from spatial neglect. A stronger test of hemispatial neglect uses free-field sounds or interaural time differences to create apparent sources on the left and right while both ears receive input. Some patients then show impaired allocation of attention to contralesional space, whereas others systematically mislocalize sounds toward the ipsilesional side [@Bellmann2001AuditoryNeglect; @Gutschalk2012AuditoryNeglect]. Brainstem circuits may therefore compute usable ITDs and ILDs without ensuring that a sound enters the attended spatial scene or guides an orienting response.

24.10.4 A percept without a source: musical hallucinations

The preceding disorders remove an operation from an externally driven percept. Musical hallucinations reveal the complementary possibility: a highly organized auditory percept can occur without a corresponding external sound. The percept may consist of a familiar tune, singing, instrumental music, or repetitive musical fragments. It can occur in people with acquired hearing loss who remain cognitively intact, retain insight, and show no other evidence of psychosis. Hearing loss is not the only setting; neurological lesions, epilepsy, neurodegenerative disease, medications, and psychiatric disorders can also contribute [@Hammeke1983MusicalHallucinations; @Griffiths2000MusicalHallucinosis; @Linszen2019HearingHallucinations].

Reduced peripheral input may weaken the bottom-up constraints that normally keep recurrent auditory activity aligned with the acoustic environment. On a release or deafferentation account, internally generated activity, learned predictions, or musical memories then exert disproportionate influence. This is a plausible family of mechanisms, not a complete explanation for every case, and the percept should not be described as primary auditory cortex simply replaying a stored recording.

Musical hallucinations also sharpen the distinction from tinnitus. Tinnitus is usually an unformed percept such as ringing, buzzing, or hissing; a musical hallucination has recognizable melodic, rhythmic, vocal, or instrumental organization. The boundary is not absolute, and the same person may experience both. Together they show that the absence of an external pressure wave does not imply the absence of an auditory object.

Auditory verbal hallucinations raise a broader problem involving speech, memory, agency, and source attribution. They should not be reduced to spontaneous activation of primary auditory cortex and are better treated alongside language and psychiatric disorders than as another auditory agnosia [@Barber2021InnerSpeechAVH].

The general lesson is not that the temporal lobe contains separate boxes for doors, voices, songs, and locations. Conscious auditory awareness, object construction, semantic recognition, person identity, musical structure, spatial selection, and source attribution are distinguishable operations implemented by overlapping recurrent networks. Ordinary hearing conceals those distinctions because the operations are usually completed together and fast enough to feel like one event.

24.11 When hearing is altered

The central syndromes just described arise after acoustic energy has been transduced and at least some early neural transmission remains available. Hearing can also be altered earlier in the pathway. Conductive hearing loss reduces the transfer of sound through the outer or middle ear. Sensorineural hearing loss arises principally from the cochlea or auditory nerve, including damage to outer hair cells, inner hair cells, synapses, or neural elements. Central and peripheral impairments can coexist, and similar everyday complaints can therefore arise from different anatomical causes.

Damage to cochlear hair cells is especially consequential because mature mammalian hair cells do not regenerate to any clinically useful extent. Birds and several other vertebrates can replace sensory hair cells after injury; the adult mammalian organ of Corti largely cannot. Supporting-cell reprogramming, developmental transcription factors, gene delivery, and other regenerative strategies remain active areas of research, but none currently provides a general treatment for ordinary age-related or noise-induced cochlear damage [@Choi2024HairCellRegeneration; @Wang2024RegenerationAtlas].

Tinnitus is the perception of ringing, buzzing, hissing, or another sound without a corresponding external source. It often follows peripheral hearing damage, but it is heterogeneous. Increased central gain, altered inhibition, and abnormal synchrony are important families of models, not one complete explanation [@EggermontRoberts2004Tinnitus]. Attention, sleep, stress, and affect can strongly alter the percept and its burden. Tinnitus therefore illustrates both the dependence of central activity on peripheral input and the limits of explaining a conscious auditory experience from cochlear damage alone.
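The increased-central-gain idea can be made concrete with a deliberately crude homeostatic rule. The sketch assumes, purely for illustration, that a central stage rescales its input to keep mean output constant and that part of its input is spontaneous activity unrelated to external sound. The quantities are arbitrary units and `central_gain_model` is hypothetical. It is not a model of tinnitus, only an arithmetic illustration of why restoring output after peripheral loss would also amplify activity that has no external source.

```python
def central_gain_model(driven_input, spontaneous_input=0.1, target_output=1.0):
    """Return (gain, amplified_spontaneous) after homeostatic adaptation.

    Toy assumption: the central stage scales its total input so that
    mean output returns to target_output. The spontaneous term stands
    for activity not caused by external sound (arbitrary units).
    """
    gain = target_output / (driven_input + spontaneous_input)
    return gain, gain * spontaneous_input

before = central_gain_model(driven_input=0.9)  # intact periphery
after = central_gain_model(driven_input=0.2)   # reduced peripheral drive
print(f"before: gain {before[0]:.2f}, spontaneous output {before[1]:.2f}")
print(f"after:  gain {after[0]:.2f}, spontaneous output {after[1]:.2f}")
```

With these made-up numbers, cutting driven input from 0.9 to 0.2 raises the gain more than threefold, and the spontaneous component of the output grows by the same factor.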

The public-health burden is large. The World Health Organization reported in March 2026 that more than 5% of the world’s population—about 430 million people—required rehabilitation for disabling hearing loss, with more than 700 million projected to require it by 2050 [@WHO2026HearingLoss]. Those figures include many different causes and levels of impairment; they should not be read as a count of people who need the same intervention.

24.11.1 Hearing aids and cochlear implants

A hearing aid amplifies and processes acoustic input so that residual cochlear function can use it. Modern devices can apply frequency-dependent gain, directional filtering, compression, and noise-management algorithms. They do not bypass the cochlea, and their benefit therefore depends partly on the surviving receptor and neural populations.

A cochlear implant takes a different approach. An external microphone and processor divide sound into frequency bands. An implanted receiver drives an electrode array inserted into the cochlea, mapping higher-frequency information toward the basal end and lower-frequency information farther apically. Electrical current activates surviving auditory-neural elements, bypassing damaged hair-cell transduction. The implant does not stimulate the basilar membrane as though recreating a traveling wave.
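The band-to-electrode mapping described here can be sketched in a few lines. The 12 electrodes, the 200 to 8000 Hz analysis range, and the logarithmic band spacing are assumptions for illustration; real processors use device- and patient-specific frequency allocation tables. The design point taken from the text is the ordering: the lowest band goes to the most apical electrode and the highest band to the most basal one.

```python
def log_spaced_band_edges(low_hz, high_hz, n_channels):
    """Band edges spaced evenly on a log-frequency axis (an assumed scheme)."""
    ratio = (high_hz / low_hz) ** (1 / n_channels)
    return [low_hz * ratio ** i for i in range(n_channels + 1)]

def allocate_channels(low_hz=200.0, high_hz=8000.0, n_electrodes=12):
    """Map each analysis band to an electrode, lowest band to the most apical.

    Electrodes are numbered 1 (most apical) to n_electrodes (most basal).
    Returns a list of (electrode, band_low_hz, band_high_hz).
    """
    edges = log_spaced_band_edges(low_hz, high_hz, n_electrodes)
    return [(i + 1, edges[i], edges[i + 1]) for i in range(n_electrodes)]

for electrode, band_lo, band_hi in allocate_channels():
    print(f"electrode {electrode:>2}: {band_lo:7.0f}-{band_hi:7.0f} Hz")
```

Each electrode conveys only the envelope of a whole band, which is one reason the number of effective channels limits spectral detail.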

This interface exploits tonotopy but does not reproduce normal cochlear mechanics. Spectral resolution is limited by the number and placement of electrodes, current spread through cochlear fluid, neural survival, programming, and the mismatch between the electrode map and the listener’s surviving anatomy. Many users achieve excellent speech understanding, particularly with appropriate rehabilitation, while music, localization, and speech in noise often remain more difficult [@WilsonDorman2008CochlearImplants; @DrennanRubinstein2008MusicCI]. Bilateral implantation can improve spatial hearing for many people, but it does not simply recreate the fine temporal cues available to two normally functioning ears.
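The place mismatch named here can be estimated with Greenwood's place-frequency function, a widely used empirical fit for the human cochlea. The constants below are the commonly quoted human values. The 35 mm cochlear length, the 20 mm insertion depth (treated as measured from the base for simplicity), and the 250 Hz band assigned to the most apical contact are illustrative assumptions, not values from the text.

```python
import math

# Greenwood place-frequency constants commonly used for humans;
# relative position x runs from 0 at the apex to 1 at the base.
A, ALPHA, K = 165.4, 2.1, 0.88
COCHLEA_LENGTH_MM = 35.0  # typical human cochlear length (assumption)

def greenwood_frequency(distance_from_apex_mm):
    """Approximate characteristic frequency (Hz) at a cochlear place."""
    x = distance_from_apex_mm / COCHLEA_LENGTH_MM
    return A * (10 ** (ALPHA * x) - K)

# Hypothetical array: most apical contact 20 mm from the base,
# i.e. 15 mm from the apex
apical_contact_mm = COCHLEA_LENGTH_MM - 20.0
cf = greenwood_frequency(apical_contact_mm)
assigned_band_hz = 250.0  # hypothetical lowest band sent to that contact
octaves_shift = math.log2(cf / assigned_band_hz)
print(f"place frequency ~{cf:.0f} Hz; assigned ~{assigned_band_hz:.0f} Hz; "
      f"mismatch ~{octaves_shift:.1f} octaves")
```

With these assumptions the most apical contact sits near a place tuned to roughly 1.2 kHz, a little more than two octaves above the band it is asked to convey. Listeners can partly adapt to such shifts, but the example shows why the electrode map and the listener's anatomy need not agree.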

As of July 2022, more than one million cochlear-implant devices had been implanted worldwide [@NIDCD2024QuickStats]. The date and the distinction between devices and recipients matter because some people receive bilateral implants and because the total continues to change.

Clinical language about restoring hearing refers to access to auditory information, not to restoring the worth or completeness of a person. Deaf communities include signed languages, cultural identities, and differing views of medical intervention. Hearing aids, cochlear implants, sign language, captioning, and other accommodations belong to overlapping rather than mutually exclusive human responses to hearing difference.

A current clinical frontier: OTOF gene replacement

On April 23, 2026, the U.S. Food and Drug Administration approved lunsotogene parvec-cwha (Otarmeni), the first approved gene therapy for a genetic form of hearing loss [@FDA2026Otarmeni]. The indication is narrow: severe-to-profound or profound sensorineural hearing loss, defined in the label as a hearing threshold above 90 dB HL at any frequency, associated with molecularly confirmed biallelic variants in OTOF, with preserved outer-hair-cell function and no previous cochlear implant in the treated ear.

OTOF encodes otoferlin, a protein required for efficient synaptic transmission from inner hair cells. In eligible patients, outer-hair-cell function and much of the receptor apparatus are preserved, but otoferlin deficiency disrupts transmitter release to the auditory nerve. Otarmeni uses a dual adeno-associated-virus system to deliver a functional OTOF sequence to inner hair cells, restoring otoferlin production and auditory signaling. In the ongoing pivotal study, 16 of 20 patients evaluable at 24 weeks reached an average pure-tone threshold of 70 dB HL or better—an improvement not expected from the untreated natural history [@FDA2026Otarmeni].

This treatment does not regenerate lost hair cells and is not a general therapy for age-related, noise-induced, or most genetic hearing loss. It received accelerated approval in 2026, so continued evaluation must establish durability and confirm effects on speech development and quality of life. The result is nevertheless a major proof of principle: when the peripheral apparatus is present but one molecular step in synaptic transmission is missing, targeted gene replacement can restore a route into the auditory nerve.

24.12 Development and recalibration

The cochlea can deliver auditory signals early in life, but mature hearing is not supplied by the receptor organ alone. The developing nervous system must learn the statistics of voices, speech categories, reverberant spaces, sound-source motion, and the filtering imposed by its own growing head and pinnae. Auditory object recognition and spatial calibration therefore depend on experience as well as on intact peripheral machinery.

This matters when auditory access is limited early in development and later supplied by a hearing aid, cochlear implant, or gene therapy. Earlier access is often associated with better spoken-language outcomes, but age is not the only variable. Duration and degree of deprivation, cochlear and auditory-nerve integrity, prior auditory or signed-language experience, device characteristics, rehabilitation, family interaction, education, and the wider linguistic environment all contribute [@KralSharma2012DevelopmentalPlasticity]. There is no single critical-period switch that determines one inevitable outcome.

Descriptions of wholesale cortical “reorganization” should also be used carefully. Auditory deprivation can alter responsiveness, connectivity, and the balance of inputs to auditory cortex, and visual or somatosensory signals can recruit parts of the network under some conditions. Those changes do not imply that the entire auditory system has been irreversibly converted into another modality. Restored peripheral input acts on a brain shaped by its previous history, while substantial capacity for learning remains.

Adult plasticity is evident in several forms. Listeners learn to interpret cochlear-implant stimulation, adapt to unfamiliar accents, refine pitch and timing through musical practice, and recalibrate localization after pinna cues are changed. Plasticity is therefore real but constrained: experience can improve the use of available information, while the quality of the peripheral signal and the timing of development still matter.

Gene replacement makes the same distinction visible. Restoring otoferlin can repair one peripheral transmission step, but it does not supply a mature auditory world model. The central system must still learn how the newly available patterns correspond to sources, locations, words, and actions.

24.13 Looking ahead: two ways to sense at a distance

Audition begins with pressure waves and ends with inferred events. Air-pressure variation becomes tympanic motion, ossicular motion, cochlear-fluid motion, hair-bundle deflection, receptor potentials, synaptic release, and patterned activity across bilateral neural pathways. At each stage, the signal is transformed rather than copied. The cochlea separates frequency; the brainstem compares the ears; thalamocortical networks preserve maps while constructing sources and scenes.

This organization makes audition especially effective for detecting events that unfold over time and outside the current line of sight. A sound can announce motion behind the head, a voice in another room, or an approaching source before contact. Hearing buys time because it converts the remote physical consequences of an event into information that can guide action.

The next chapters turn to vision, another distance sense with a different starting geometry. Auditory space must be reconstructed from timing, level, and spectral transformations at two ears. Vision begins with a two-dimensional receptor sheet on which spatial relationships are already laid out. That advantage does not make vision a direct picture of the world. It creates a different sequence of transformations and a different set of problems for the nervous system to solve.

What is well established, and what remains unsettled

Reasonably settled:

Sound is a mechanical pressure variation in a medium. The outer and middle ear transfer acoustic energy into cochlear-fluid motion through direction-dependent filtering and impedance matching.
A traveling wave moves along the mechanically graded cochlear partition. High-frequency components produce peak displacement toward the stiff base and low-frequency components nearer the compliant apex, establishing a tonotopic place code.
Outer hair cells provide active mechanical gain and sharpen tuning; inner hair cells provide the principal afferent output through ribbon synapses onto spiral-ganglion neurons.
Hair-bundle deflection gates a mechanotransduction complex through the tip-link apparatus. Potassium-rich endolymph supplies the principal depolarizing current, and calcium entry through voltage-gated channels at the basal synapses triggers glutamate release.
The inner hair cell is a receptor cell, while the spiral-ganglion neuron is the first-order sensory neuron. Auditory-nerve populations carry both place and temporal information.
Auditory pathways branch bilaterally after the cochlear nuclei and ascend through superior-olivary, lateral-lemniscal, inferior-collicular, and medial-geniculate circuits to auditory cortex.
Interaural timing, interaural level, pinna-derived spectral filtering, and head movement all contribute to localization.
The ventral medial geniculate projects strongly to primary auditory cortex in and around Heschl’s gyrus. Human auditory cortex contains multiple tonotopic gradients, but tonotopy alone does not explain pitch, auditory objects, speech, or music.
Central lesions can dissociate conscious auditory awareness, environmental-sound recognition, familiar-voice recognition, components of music perception, and auditory spatial attention even when cochlear function or early brainstem responses remain relatively preserved. These dissociations show that hearing is not one indivisible cortical operation.
Cochlear implants bypass damaged hair-cell transduction and can provide powerful auditory access without recreating normal cochlear mechanics. OTOF gene replacement is now an approved treatment for one narrowly defined molecular form of deafness.

Genuinely unsettled, and presented as such:

The complete structure and gating mechanism of the hair-cell mechanotransduction complex. TMC1/2 and several associated proteins are established components, but force transmission and channel operation are not fully resolved.
The human importance of cochlear synaptopathy. Noise-induced synaptic loss is robust in animal models, while its prevalence, diagnosis, and contribution to human speech-in-noise difficulty remain debated.
The population code for interaural timing in mammals. The Jeffress principle remains useful, but mammalian circuits combine coincidence, inhibition, and distributed coding in ways that are not captured by one delay-line map.
The principal functions of descending auditory pathways. Protection, gain control, selective listening, learning, and state regulation each have some experimental support, but no single account explains the system.
How tonotopic, temporal, spatial, categorical, and task-dependent organizations interact across human auditory cortex. The maps are real; the route from those maps to stable auditory objects remains incompletely understood.
How named central auditory syndromes map onto overlapping cortical and white-matter networks. Lesion extent, disconnection, task demands, premorbid expertise, and recovery all shape the observed disorder. Cortical deafness, generalized auditory agnosia, and more selective impairments may represent points along a clinical spectrum, through which a patient can move during recovery, rather than failures of separate cortical modules.
How formed auditory hallucinations arise in the absence of an external sound source. Reduced sensory input, recurrent activity, memory, prediction, epilepsy, medication effects, and broader network state may all contribute. Release or deafferentation models are plausible, particularly after hearing loss, but no single mechanism explains every case.
Whether broadly useful cochlear regeneration can be achieved in adult mammals, and how durable the benefits of newly approved gene therapy will be. Both are active clinical frontiers rather than general solutions to hearing loss.
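The Jeffress principle referred to in the list above can be made concrete with a short sketch. Each model detector receives the left-ear signal through its own internal delay and the right-ear signal directly, and its response is the summed product of the two over time, which amounts to a cross-correlation at that lag. The detector whose internal delay cancels the interaural delay responds most strongly, so the identity of the winning detector encodes the interaural time difference. This is a minimal illustration under stated assumptions, not a model of mammalian circuitry: the input is white noise, delays are whole samples at an assumed 48 kHz sample rate, and the function names are hypothetical. As noted above, mammalian circuits also rely on inhibition and distributed population coding that a single delay-line map does not capture.

```python
import random


def binaural_noise(n, itd, seed=0):
    """Return (left, right) white-noise waveforms of n samples each.

    The right ear lags the left by `itd` samples, so right[t] == left[t - itd]
    wherever both indices exist. A negative `itd` means the left ear lags.
    """
    rng = random.Random(seed)
    d = abs(itd)
    source = [rng.gauss(0.0, 1.0) for _ in range(n + d)]
    leading, lagging = source[d:d + n], source[:n]
    return (leading, lagging) if itd >= 0 else (lagging, leading)


def coincidence_profile(left, right, max_lag):
    """Responses of a bank of coincidence detectors, one per internal delay k.

    Detector k delays the left input by k samples, multiplies it with the
    right input, and sums over time (a cross-correlation at lag k).
    """
    n = len(left)
    profile = {}
    for k in range(-max_lag, max_lag + 1):
        total = 0.0
        # Only sum over times where both left[t - k] and right[t] exist.
        for t in range(max(0, k), min(n, n + k)):
            total += left[t - k] * right[t]
        profile[k] = total
    return profile


def estimate_itd(left, right, max_lag):
    """Return the internal delay (in samples) of the most active detector."""
    profile = coincidence_profile(left, right, max_lag)
    return max(profile, key=profile.get)


if __name__ == "__main__":
    fs = 48_000  # assumed sample rate in Hz
    left, right = binaural_noise(2000, itd=12)
    lag = estimate_itd(left, right, max_lag=32)
    print(lag, lag * 1e6 / fs)  # 12 250.0
```

At the assumed 48 kHz rate, one sample corresponds to about 20.8 µs, so a bank spanning ±32 samples covers roughly ±667 µs, comparable to the largest interaural delays a human head produces. Broadband noise is used because it gives a single sharp correlation peak; a pure tone would produce repeated peaks one period apart, a reminder of why timing cues become ambiguous at higher frequencies.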

The chapter’s physical sequence, receptor physiology, large-scale pathway anatomy, bilateral organization, and principal cortical maps are largely well established. Central disorders also establish that auditory awareness, object recognition, voice identity, musical analysis, and spatial selection can come apart. The principal open questions concern the molecular details of transduction, the codes built from precisely timed populations, the functions of descending control, the relationship between clinical syndromes and overlapping neural networks, and how externally driven or internally generated activity becomes a coherent auditory event.