33 The Predicting Machine

The Cerebellum, Forward Models, and Predictive Control

33.1 Prediction inside a recurrent controller

Chapter 32 ended with a control problem built into every moving body. Sensory evidence arrives after the state that produced it. By the time visual pathways report that a hand has drifted, the hand has moved farther. By the time a corrective command reaches spinal circuits and muscle, the limb, the target, and the forces between them may all have changed again. A controller that waited for a complete sensory description of the present would always act on the recent past.

The nervous system solves this problem by combining feedback with prediction. The cerebellum is a major part of that solution. Its best-established contribution to sensorimotor control can be stated directly:

The cerebellum learns forward models—predictive relations that use the estimated current state, the context, and signals related to an outgoing action to anticipate the action’s sensory and mechanical consequences. These predictions allow embodied controllers to act on an estimate of the present rather than waiting for delayed feedback. Later feedback corrects the action and recalibrates the model.

A forward model is not a miniature conscious replica of the body. It is a learned mapping. Given that the arm is here, moving at this speed, carrying this load, and receiving this command, what state is likely to follow? Given that the head is about to turn, what vestibular and proprioceptive pattern should result? Given that an eye movement has been issued, where should the image fall next? The predicted consequence can be used to estimate current state, prepare an anticipatory correction, distinguish reafference from an external event, or train a faster response for the next encounter [@miall1993smith; @wolpert1998internal; @nguyenperson2025predictive].

Prediction does not replace feedback. It changes what feedback can do. Incoming sensory evidence is compared with what the controller expected. Agreement supports the current state estimate. A discrepancy—sensory prediction error—reveals an unexpected disturbance, an inaccurate command, or a model that no longer matches the body or environment. The same recurrent loop therefore controls the current action and learns how to control the next one.

The cerebellum is not the only source of prediction in the nervous system, and it does not choose every goal or issue the final command to muscle. Its distinctive position is between rich sensory and motor-related input and widespread cerebral, brainstem, vestibular, and spinal control systems. It can learn the dynamics of the particular system with which each cerebellar region is coupled and return a phase-advanced signal that improves estimation, timing, and correction.

Its scale makes that role less surprising than its location at the back of the brain might suggest. The cerebellum contains roughly four-fifths of the neurons in the human brain, most of them small granule cells. Its cortex is folded so tightly that an unfolded human cerebellar surface would form a strip nearly a meter long and would approach four-fifths of the surface area of the neocortex [@azevedo2009neurons; @sereno2020surface]. The structure is not a minor accessory to the cerebrum. It is a vast predictive-learning sheet compressed into narrow folds.

A recognizable circuit motif recurs across much of that sheet. Mossy-fiber pathways carry mixtures of sensory state, motor-related activity, and context. Granule cells expand those inputs into an enormous population of combinations. Purkinje cells transform that population activity and inhibit cerebellar or vestibular output neurons. Climbing fibers from the inferior olive provide powerful instructive signals that help revise the mapping when prediction and consequence diverge. Regional differences in input, molecular identity, microzone, and output target determine which variables are predicted and which controller receives the result [@kozareva2021atlas; @dezeeuw2021diversity].

The chapter therefore treats predictive internal models as the central explanatory framework for cerebellar sensorimotor function. The claim is strong, but it does not require every cerebellar territory to encode the same variable or to use an identical neural code.

33.2 Why delayed feedback requires a forward model

Negative feedback is indispensable because disturbances cannot all be known in advance. A foot can land on a loose stone, a cup can be heavier than expected, and another person can move the object being reached for. Feedback reveals those departures. Delay determines whether the correction will stabilize the action or chase it.

Several delays accumulate within an ordinary movement. Receptors and peripheral axons require time to signal a change. Spinal, brainstem, thalamic, and cortical pathways add synaptic and conduction delays. Vision requires retinal and cortical processing. A new descending command then requires additional time to reach motor neurons, activate muscle, develop force, and accelerate the body. The relevant interval differs across pathways, but even a short delay matters when the limb is moving rapidly.

Body dynamics add a second problem. Moving the shoulder produces interaction torques at the elbow and wrist. Muscle force depends on length, velocity, activation history, and load. A command that worked a moment ago may no longer produce the same consequence after fatigue, growth, injury, a change of tool, or a change in the environment. The controller must know not only where the body was but how its present command is likely to change it.

A forward model supplies that missing relation. It takes an estimate of the current state together with information about the intended or ongoing action and predicts a later state or sensory consequence. The prediction can be generated before the real consequence returns through afferent pathways. A state estimator then combines that phase-advanced prediction with delayed and noisy sensory evidence to infer the body’s current condition.

The logic can be followed through a reach. A descending command begins to accelerate the arm. A motor-related signal informs the forward model that the command has been issued. The model predicts the emerging joint velocities, interaction torques, and sensory consequences. That prediction allows braking and interjoint compensation to begin before visual or proprioceptive evidence could report the complete result. When the actual evidence arrives, it is compared with the predicted evidence. A small discrepancy preserves the current estimate; a large discrepancy triggers correction and learning.

This architecture also explains how predictive and feedback control become one recurrent process. The forward model supplies an internal, rapidly available consequence of the action. External feedback later supplies the measured consequence. Their difference isolates what the model failed to explain. The controller can respond quickly to that residual rather than treating every expected sensory change as a new disturbance.
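The logic of the reach can be caricatured in a short simulation. The sketch below is a toy, not a model of any measured circuit. It assumes a one-dimensional hand whose velocity equals the command, a fixed sensory delay of ten time steps, and a forward model that happens to be perfect. The function name `simulate_reach` and every parameter value are illustrative assumptions.

```python
from collections import deque

def simulate_reach(use_forward_model, steps=300, dt=0.01, delay_steps=10,
                   gain=10.0, target=1.0):
    """Drive a 1-D 'hand' to a target using position feedback that arrives late."""
    x = 0.0                                                        # true hand position
    in_transit = deque([0.0] * delay_steps, maxlen=delay_steps)    # sensory reports still on their way
    recent_cmds = deque([0.0] * delay_steps, maxlen=delay_steps)   # commands not yet reflected in feedback
    trajectory = []
    for _ in range(steps):
        delayed_x = in_transit[0]          # where the hand was delay_steps ago
        if use_forward_model:
            # Predict the present: add the expected effect of commands
            # issued since the delayed report was generated.
            estimate = delayed_x + dt * sum(recent_cmds)
        else:
            estimate = delayed_x           # act on the recent past
        u = gain * (target - estimate)     # proportional correction
        in_transit.append(x)
        recent_cmds.append(u)
        x += u * dt                        # assumed plant: velocity equals command
        trajectory.append(x)
    return trajectory

feedback_only = simulate_reach(use_forward_model=False)
with_model = simulate_reach(use_forward_model=True)
print(f"peak position, delayed feedback only: {max(feedback_only):.2f}")
print(f"peak position, with forward model:    {max(with_model):.2f}")
```

Under these assumed values, the feedback-only controller keeps pushing until late evidence reports arrival, so it overshoots the target and rings around it. The controller that adds the predicted effect of its own recent commands approaches the target without overshoot. An imperfect model would leave a residual for later feedback to correct.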

Deeper Dive: A Smith predictor for the body

An engineering Smith predictor stabilizes a delayed system by placing a model of the controlled plant and its delay inside the controller. The model predicts how the plant should respond to a command before the delayed measurement returns. A delayed copy of that prediction can then be compared in temporal register with the real feedback. The controller acts immediately on the model’s current estimate while using the later mismatch to correct both performance and the model [@miall1993smith].

The biological analogy is not that the cerebellum contains a literal block diagram. The useful point is the division of labor. A learned model of body dynamics advances the controller beyond delayed sensory evidence. A learned model of the delay allows predicted and observed consequences to be compared at the appropriate time. The external sensory loop remains essential, but it no longer has to carry the entire burden of rapid control.

This idea makes a specific prediction about cerebellar damage. Feedback gain can be preserved while the response acquires excessive phase lag. The person can still correct, but the correction follows the target or limb too late. That signature has been measured in cerebellar ataxia, and the resulting deficit can be partly reduced by artificially advancing visual feedback [@zimmet2020feedback].
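The comparison "in temporal register" can be made concrete with a minimal sketch. It assumes a plant whose velocity equals the command, an internal model with the same dynamics, a known delay of ten steps, and an external push that the model knows nothing about. The names (`smith_predictor_run`, `push_step`, `push`) and all values are hypothetical and chosen only for illustration.

```python
from collections import deque

def smith_predictor_run(steps=120, dt=0.01, delay_steps=10, gain=10.0,
                        target=1.0, push_step=60, push=-3.0):
    """Return delayed feedback and the residual left after subtracting the delayed prediction."""
    x = 0.0   # real plant state
    m = 0.0   # internal model of the plant, driven only by the command
    plant_line = deque([0.0] * delay_steps, maxlen=delay_steps)   # sensory delay
    model_line = deque([0.0] * delay_steps, maxlen=delay_steps)   # model of that delay
    feedback, residuals = [], []
    for t in range(steps):
        y = plant_line[0]              # delayed measurement
        m_delayed = model_line[0]      # delayed copy of the model's prediction
        residual = y - m_delayed       # what the model failed to explain
        estimate = m + residual        # current prediction, corrected by the mismatch
        u = gain * (target - estimate)
        plant_line.append(x)
        model_line.append(m)
        disturbance = push if t >= push_step else 0.0   # unmodeled external force
        x += dt * (u + disturbance)
        m += dt * u
        feedback.append(y)
        residuals.append(residual)
    return feedback, residuals

feedback, residuals = smith_predictor_run()
first_surprise = next(i for i, r in enumerate(residuals) if r != 0.0)
print("feedback at step 30:", round(feedback[30], 3), "residual:", residuals[30])
print("first nonzero residual at step", first_surprise)
```

In this sketch the delayed feedback changes substantially during the self-generated movement, yet the residual stays at zero because the delayed prediction accounts for it. The residual departs from zero only after the push has had time to travel through the sensory delay, which isolates the unexpected component for correction.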

Prediction also supports learning of anticipatory policies. When the same context repeatedly requires the same correction, the controller need not recompute every detail from the beginning. A learned state–action mapping can begin the correction before an error occurs. Such a policy does not oppose the forward-model account. Prediction errors can train it, and a forward model can continue to supervise it when the body or environment changes.

The central point is therefore not that movement alternates between a feedforward mode and a feedback mode. Embodied control is recurrent throughout. Forward models allow the recurrent loop to remain stable and responsive despite delay by estimating what is happening now and what the current action is about to cause.

33.3 What happens when prediction fails

Cerebellar damage does not ordinarily resemble interruption of the corticospinal tract, a ventral root, or a peripheral motor nerve. A patient may still generate substantial force and initiate the intended action. What deteriorates is the ability to anticipate how forces, joints, sensory consequences, and time will unfold together.

The broad clinical term is ataxia. A reach can overshoot or undershoot its target, a failure called dysmetria. As the finger approaches the target, corrections may alternate around it, producing an intention tremor that becomes most visible when precision is required. A movement that should combine shoulder, elbow, wrist, and hand can break into a sequence of partially isolated components. Rapid alternation of opposing movements becomes irregular, a deficit called dysdiadochokinesia. Gait widens and becomes unstable. Speech can become poorly timed and variably stressed. Eye movements can show dysmetria, impaired holding, or nystagmus.

A predictive-control account explains why these failures cluster. If the estimate of limb state lags behind the limb, braking begins too late and the movement overshoots. If corrections depend too heavily on delayed sensory evidence, each correction can arrive after the previous one has already changed the state, producing oscillation around the target. If the consequences of motion at one joint are not anticipated at the others, multijoint movement decomposes. If the transition between agonist and antagonist activity is not predicted accurately, rapid alternation loses its rhythm. The same logic applies to the precisely sequenced muscles of speech and to the relation between head and eye movement.

This is not merely an analogy fitted after the clinical signs were known. Human perturbation experiments reveal the predicted temporal signature. Brief transcranial magnetic stimulation over the lateral cerebellum during an ongoing arm movement caused the next reach to be planned from a hand-position estimate that was approximately 138 milliseconds out of date [@miall2007state]. In a separate continuous-tracking experiment, people with cerebellar ataxia showed approximately normal feedback gain but substantially greater phase lag than control participants. A model lacking a Smith-predictor-like component reproduced the deficit, and phase-advancing the visual feedback improved the patients’ control [@zimmet2020feedback].

The important failure is therefore not an absence of feedback. Feedback pathways remain capable of changing movement. The failure is that the controller lacks the normally available prediction that places delayed evidence into the current state of the body. The patient corrects a limb that has already moved on.

Cerebellar syndromes still differ with anatomy. Damage near the midline often affects stance, gait, and axial control. More lateral lesions can produce ipsilateral limb dysmetria and impaired multijoint coordination. Vestibulocerebellar damage disrupts gaze stabilization and balance. Posterior cerebellar lesions can alter language, executive performance, spatial organization, affect, or social behavior without producing the same motor syndrome as anterior sensorimotor lesions [@stoodley2016lesion]. The predicted variables and controlled systems differ across loops.

The contrast with paralysis should not be made absolute. Cerebellar disease can reduce muscle tone, alter force scaling, and, when severe, make actions functionally impossible. The central distinction remains: elementary activation is often better preserved than prediction, timing, interjoint compensation, stability, and adaptation. The lesion syndrome is what should be expected when descending and spinal controllers can still act but have lost an important source of phase-advanced state information and learned compensation.

33.4 A map of predictive loops, not evolutionary ages

The cerebellum can be divided by lobules, fissures, molecular zones, afferent pathways, output nuclei, and patterns of activity. No single map serves every purpose. For understanding predictive control, three broad families of connections remain useful.

Vestibular and ocular-motor loops include the flocculus, nodulus, nearby vermis, vestibular inputs, and vestibular nuclei. They contribute to gaze stabilization, balance, orientation relative to gravity, and the prediction of sensory consequences produced by head and eye movement.

Spinal, brainstem, and somatic sensorimotor loops occupy much of the anterior lobe and intermediate cerebellum. They receive information related to limb and trunk state, spinal interneuronal activity, brainstem commands, and cerebral motor output. Their projections return through cerebellar nuclei to brainstem and thalamocortical motor systems. These loops are positioned to predict changing body state, compensate for interaction forces, and recalibrate posture and movement.

Cerebral association and motor loops are especially extensive in the lateral hemispheres and posterior cerebellum. Cerebral cortex reaches the cerebellum principally through the pontine nuclei, while cerebellar nuclear output returns through thalamus to motor, premotor, parietal, prefrontal, and other association territories. Viral tracing in nonhuman primates has demonstrated closed-loop organization linking different cerebellar territories with motor and prefrontal cortex [@kellystrick2003loops].

Cortical output also has a broad medial-to-lateral organization. Vermal Purkinje cells project mainly to the fastigial nucleus and to vestibular targets; intermediate cortex projects mainly to the interposed nuclei; and lateral hemispheric cortex projects mainly to the dentate nucleus. Those relations are useful first approximations, not sharply sealed channels. Nuclear territories contain several projection-cell classes, and each participates in recurrent networks with brainstem, thalamic, and cerebral targets [@dezeeuw2021diversity].

These families overlap. Vestibular signals influence limb and postural control. Somatic signals enter territories engaged during apparently cognitive tasks. Cerebral association cortex can alter eye movements, posture, and action selection. Functional imaging also reveals more than one body representation and broad maps related to distinct cerebral networks. The human cerebellum is therefore not arranged as one anterior motor region followed by one posterior cognitive region. It contains repeated, interleaved gradients of sensorimotor and association-related organization [@buckner2011organization; @king2019boundaries].

The traditional names archicerebellum, paleocerebellum, and neocerebellum encouraged the idea that vestibular, spinal, and cerebral territories were added in a simple old-to-new sequence. That evolutionary ladder is misleading. The terms describe useful anatomical biases only imperfectly, and living vertebrates do not preserve successive historical stages. Cerebellar evolution altered the relative size, geometry, inputs, outputs, and behavioral use of related predictive circuits in different lineages.

33.5 An ancient circuit for predicting consequences

The evolutionary history of the cerebellum begins before the appearance of the large, layered structure familiar in mammals. Comparative evidence suggests that vertebrate evolution assembled a predictive-learning circuit from older developmental and sensory-motor components, then repeatedly coupled that circuit to new effectors, sensory systems, and forebrain loops.

Jawless vertebrates reveal an early part of that history. Developmental studies in lamprey and hagfish have identified ancestral gene-expression programs associated in jawed vertebrates with the rhombic lip, ventricular zone, granule-cell lineage, and inhibitory cerebellar neurons [@sugahara2021cerebellum]. An adult sea-lamprey cell atlas, however, did not identify the full canonical ensemble of granule cells, Purkinje cells, and cerebellar nuclear neurons that defines the mature cerebellum of jawed vertebrates [@lamanna2023lamprey]. The most useful conclusion is neither that the lamprey possesses a tiny mammalian cerebellum nor that it lacks any relevant ancestry. It is that developmental territories and regulatory programs preceded the fully differentiated adult circuit.

A recognizable cerebellar organization is established across jawed vertebrates, but evolution did not preserve one fixed geometry. Amniotes concentrate many output neurons in deep cerebellar nuclei. Teleost fish instead place eurydendroid cells within the cerebellar tissue near Purkinje cells, where they can receive both Purkinje-cell and parallel-fiber input. The bichir Polypterus, an early-diverging ray-finned fish, shows an intermediate output arrangement rather than either the standard teleost pattern or an amniote deep nucleus [@ikenaga2022polypterus]. Major cell classes and circuit relations were conserved while their physical arrangement and projection routes changed.

Fish also possess cerebellum-like structures associated with electrosensory, mechanosensory, and related systems. These structures are not displaced pieces of the true cerebellum, but they share the same informative conjunction: peripheral sensory input, contextual parallel-fiber input, and plasticity that learns which sensory events follow the animal’s own behavior. In several species, the circuit constructs a negative image of predictable reafference. This is direct evidence that an ancient cerebellar-type architecture can learn a forward relation between action and sensory consequence.

Larval zebrafish show how that predictive contribution fits inside recurrent control. When visual feedback during the optomotor response was perturbed unpredictably, fish made rapid compensatory changes that could be described by an acute feedback controller. Disrupting the cerebellum did not abolish that immediate correction. When the altered relation between swimming and visual feedback persisted, however, the controller was recalibrated, and that longer-term adaptation required an intact cerebellum [@markov2021zebrafish]. The cerebellum did not replace feedback. It learned the changed action–consequence relation that allowed the feedback controller to work properly on subsequent movements.
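The distinction between acute correction and longer-term recalibration can be illustrated with a cartoon. It is not the model used in the zebrafish study. It assumes that each swim bout is launched from a learned estimate of how much visual motion a command produces, that within-bout feedback removes a fixed fraction of the remaining error, and that only an intact cerebellum updates the estimate between bouts. The function `run_bouts` and all values are hypothetical.

```python
def run_bouts(cerebellum_intact, n_bouts=30, true_gain=0.5, goal=1.0,
              feedback_fraction=0.7, learning_rate=0.3):
    """Return the error at bout onset and after acute feedback, for each bout."""
    gain_estimate = 1.0          # still calibrated to the old action-consequence relation
    initial_errors, final_errors = [], []
    for _ in range(n_bouts):
        command = goal / gain_estimate            # launch the bout from the learned relation
        outcome = true_gain * command             # what the altered world actually returns
        initial_error = goal - outcome
        # Acute feedback control (assumed spared by the lesion) removes part of the error.
        final_error = (1.0 - feedback_fraction) * initial_error
        if cerebellum_intact:
            observed_gain = outcome / command
            gain_estimate += learning_rate * (observed_gain - gain_estimate)
        initial_errors.append(initial_error)
        final_errors.append(final_error)
    return initial_errors, final_errors

intact_initial, intact_final = run_bouts(cerebellum_intact=True)
lesion_initial, lesion_final = run_bouts(cerebellum_intact=False)
print("intact: first and last onset error:", intact_initial[0], round(intact_initial[-1], 5))
print("lesion: first and last onset error:", lesion_initial[0], lesion_initial[-1])
```

In both conditions the within-bout correction shrinks each error, which corresponds to the preserved acute feedback response. Only the intact condition reduces the error present at bout onset across repetitions, which corresponds to the recalibration described above.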

Forebrain–cerebellar interactions expanded independently in several lineages. Mammals route extensive cortical input through the pontine nuclei. Birds possess pontine pathways but also a telencephalon-to-cerebellum relay through the medial spiriform nucleus. Comparative measurements show that this nucleus is especially enlarged in parrots and covaries with telencephalic size, indicating convergent expansion of forebrain–cerebellar communication through an anatomical route different from the mammalian cortico-ponto-cerebellar system [@gutierrezibanez2018parrots].

Within primates, the cerebellum and cerebrum scale largely together. Comparative analysis does not support a simple claim that apes acquired a wholly disproportionate cerebellum. It does support primate-wide relative expansion of particular lateral territories, especially crura I and II, as brains become larger [@magielse2023primate]. Regional reorganization is more informative than one ratio of total cerebellar to cerebral volume.

The evolutionary pattern is therefore one of conservation, duplication, and reassignment. Related circuits were embedded in vestibular, electrosensory, postural, locomotor, ocular, manipulative, vocal, and forebrain systems. The variables changed with the body and ecological niche, but the recurring problem remained: use recent context and self-generated activity to predict what should happen next, preserve sensitivity to what was not predicted, and recalibrate when the relation changes.

This view fits the larger argument of the book. Evolution did not replace older feedback controllers with an abstract forebrain planner. It added predictive loops that allowed inherited spinal, brainstem, sensory, and later cerebral controllers to operate despite delay and changing dynamics.

33.6 Predictable reafference in electric fish

Weakly electric mormyrid fish emit brief electric discharges and measure how nearby objects alter the resulting field. Each discharge changes the animal’s own receptors far more strongly than many external objects do. The sensory system must preserve small departures caused by prey or obstacles while discounting the large and highly predictable consequence of the fish’s own action.

Much of the classic work on this problem concerns the electrosensory lateral line lobe, a cerebellum-like structure rather than the true cerebellum. Peripheral electrosensory afferents carry the actual consequence of a discharge. Parallel fibers carry contextual signals related to the motor command and other events. Through associative plasticity, the parallel-fiber pathway learns a negative image of the sensory pattern that normally follows the discharge. Combining the learned negative image with the incoming electrosensory signal reduces the predictable component and leaves a larger response to what the discharge did not explain [@bell1997plasticity; @kennedy2014temporal; @requarth2014corollary].

The negative image is not a picture stored in the brain. It is a learned set of weights that maps command-related context onto the expected sensory waveform. Its amplitude and timing change when the relation between discharge and sensory consequence changes. The circuit therefore contains all of the principal elements of a forward model: information related to an action, a prediction of the resulting sensory input, comparison with the measured consequence, and plasticity driven by the residual.

This preparation is unusually decisive because the predicted signal can be separated from the external event. When the fish’s own discharge produces exactly the expected input, the negative image cancels it. When an object changes the field, the unexplained component remains. The output is not a generic reduction of sensation. It is selective suppression of the sensory consequence that the circuit has learned to attribute to the animal’s own action.
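A minimal sketch shows how a negative image could be learned and what it leaves behind. It makes strong simplifying assumptions: time after the discharge is divided into bins, one command-locked parallel fiber is active in each bin, and each weight changes in proportion to the cell's output in that bin. This delta-style rule stands in for the measured plasticity and is not a description of it. The waveform values and names are hypothetical.

```python
def learn_negative_image(reafference, trials=200, rate=0.1):
    """Learn one weight per command-locked time bin so that predictable input is cancelled."""
    weights = [0.0] * len(reafference)
    for _ in range(trials):                       # one trial = one electric discharge
        for t, sensory in enumerate(reafference):
            output = sensory + weights[t]         # sensory input plus learned prediction
            weights[t] -= rate * output           # plasticity driven by the residual
    return weights

def response(sensory, weights):
    """Output of the cell after combining input with the learned negative image."""
    return [s + w for s, w in zip(sensory, weights)]

# Hypothetical reafferent waveform that follows every discharge.
reafference = [0.0, 1.0, 0.6, -0.4, -0.1, 0.0]
# Hypothetical small change in the field caused by a nearby object.
object_effect = [0.0, 0.0, 0.05, 0.05, 0.0, 0.0]

weights = learn_negative_image(reafference)
alone = response(reafference, weights)
with_object = response([r + o for r, o in zip(reafference, object_effect)], weights)

print("negative image:", [round(w, 3) for w in weights])
print("self-generated input only:", [round(v, 3) for v in alone])
print("with nearby object:", [round(v, 3) for v in with_object])
```

After learning, the weights approximate the inverted reafferent waveform. The response to the discharge alone is close to zero, while the small object-related change passes through almost unchanged even though it is much smaller than the self-generated signal.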

The electrosensory lateral line lobe is not a mammalian cerebellar lobule, and cancellation is not the purpose of every cerebellar loop. The stronger evolutionary point is that a cerebellar-type circuit demonstrably learns an action-to-sensation forward relation. In another loop, the predicted variable can be limb state, head motion, retinal slip, or the timing of an expected event. The common operation is not suppression as such. It is prediction of the component that the animal’s own activity should produce.

33.7 The prediction can be measured and perturbed

Forward-model accounts are not supported only by the fact that cerebellar lesions produce ataxia. Experiments have perturbed the predicted state, measured the temporal signature of failed prediction, and recorded cerebellar neurons carrying combinations of motor-related and sensory information expected from an internal model.

33.7.1 A hand represented 138 milliseconds in the past

In the human stimulation experiment introduced above, participants slowly moved one hand and were occasionally instructed to interrupt that movement with a rapid reach toward a target. Transcranial magnetic stimulation over the ipsilateral lateral cerebellum increased both the initial directional error and the final endpoint error. The pattern was not random. It was what would be expected if the new reach had been planned from an estimate of hand position that lagged the real hand by about 138 milliseconds [@miall2007state].

That result is unusually specific. Disrupting the cerebellum did not simply make the movement noisy or weak. It shifted the inferred time of the state used to plan the action. A forward-model contribution predicts exactly such a failure: without phase-advancing the recent sensory and motor information, the controller launches the next command from an obsolete state.
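The geometry of planning from an obsolete state can be worked through with a small calculation. The sketch assumes that the reach vector is computed from the stale estimate and then executed from the true hand position, which is a simplification of the analysis in the study. The 138-millisecond lag comes from the text. The hand speed, target location, and function name are hypothetical.

```python
import math

LAG_S = 0.138   # lag of the hand-position estimate reported in the text

def reach_errors(hand, velocity, target, lag=LAG_S):
    """Return (initial direction error in degrees, endpoint error in meters)."""
    # Where the controller believes the hand is: its position `lag` seconds ago.
    stale = (hand[0] - velocity[0] * lag, hand[1] - velocity[1] * lag)
    planned = (target[0] - stale[0], target[1] - stale[1])   # vector planned from the stale state
    ideal = (target[0] - hand[0], target[1] - hand[1])       # vector that would hit the target
    endpoint = (hand[0] + planned[0], hand[1] + planned[1])  # planned vector executed from the true state
    angle = math.atan2(planned[1], planned[0]) - math.atan2(ideal[1], ideal[0])
    angle = (angle + math.pi) % (2 * math.pi) - math.pi      # wrap to [-pi, pi)
    return math.degrees(angle), math.dist(endpoint, target)

# Hypothetical case: hand drifting rightward at 0.1 m/s, target 0.2 m straight ahead.
direction_deg, endpoint_m = reach_errors(hand=(0.0, 0.0), velocity=(0.1, 0.0), target=(0.0, 0.2))
print(f"initial direction error: {abs(direction_deg):.2f} degrees")
print(f"endpoint error: {endpoint_m * 1000:.1f} mm")
```

For these assumed values the stale estimate produces an initial direction error of roughly 4 degrees and an endpoint error of about 14 millimeters, both in the direction of the ongoing hand motion. The errors are systematic and scale with hand velocity, so they differ from added noise or weakness.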

33.7.2 Feedback gain survives while phase advance is lost

A continuous tracking task separated how strongly participants used feedback from how late their response occurred. People with cerebellar ataxia produced feedback gains similar to those of control participants, but their movements showed substantially greater phase lag. A model containing a Smith-predictor-like element captured the control data, whereas removing that predictive element captured the patient data [@zimmet2020feedback].
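The separation of gain from phase lag can be illustrated with a short analysis sketch. It assumes a sinusoidal target and describes each response as a scaled, delayed copy of that target. This is a description of the signals, not the closed-loop model fitted in the study. The two responses below are synthetic, and their amplitudes and delays are hypothetical rather than measured.

```python
import math

def gain_and_phase_lag(target, response, freq_hz, dt):
    """Project both signals onto sine and cosine at freq_hz; return (gain, lag in degrees)."""
    def component(signal):
        s = sum(v * math.sin(2 * math.pi * freq_hz * i * dt) for i, v in enumerate(signal))
        c = sum(v * math.cos(2 * math.pi * freq_hz * i * dt) for i, v in enumerate(signal))
        return math.hypot(s, c), math.atan2(c, s)    # amplitude (up to scale) and phase
    amp_t, phase_t = component(target)
    amp_r, phase_r = component(response)
    return amp_r / amp_t, math.degrees(phase_t - phase_r) % 360.0

dt, freq, n = 0.01, 0.5, 1000    # 10 s of data, exactly 5 target cycles
target = [math.sin(2 * math.pi * freq * i * dt) for i in range(n)]

def delayed_response(delay_s, amplitude=0.9):
    """Synthetic tracking response: same frequency, scaled and shifted in time."""
    return [amplitude * math.sin(2 * math.pi * freq * (i * dt - delay_s)) for i in range(n)]

short_gain, short_lag = gain_and_phase_lag(target, delayed_response(0.10), freq, dt)
long_gain, long_lag = gain_and_phase_lag(target, delayed_response(0.25), freq, dt)
print(f"short delay: gain {short_gain:.2f}, lag {short_lag:.1f} degrees")
print(f"long delay:  gain {long_gain:.2f}, lag {long_lag:.1f} degrees")
```

Both synthetic responses return the same gain of 0.9, while the phase lag rises from 18 to 45 degrees as the delay increases from 100 to 250 milliseconds. Two responders can therefore use feedback equally strongly and still differ in how late their corrections arrive.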

The investigators then manipulated the feedback rather than the patient. Visual information about the participant’s own arm was advanced in time within a virtual display. The artificial phase advance improved control in the cerebellar group. This result ties the behavioral deficit to delay compensation: the system could use feedback, but it benefited when the experiment supplied part of the temporal advance normally provided by prediction.

33.7.3 Cerebellar neurons compare expected and actual self-motion

Vestibular control provides a preparation in which active and passive motion can be separated. The head can move because the animal generated a command or because an external apparatus imposed the same motion. The sensory receptors respond in both cases, but the nervous system should treat the two events differently. Active motion is partly predicted; passive motion is not.

When the normal relation between a voluntary head command and the resulting motion was unexpectedly altered in macaques, cerebellar neurons initially responded to the sensory discrepancy as though the motion had been externally generated. As the animal adapted, neuronal sensitivity declined with the same time course as the behavioral learning. The neurons tracked the changing difference between the predicted and actual consequence of the command [@brooks2015updating].

More recent recordings localized the predictive combination within Purkinje-cell populations of the anterior vermis. During passive head movement, the cells carried vestibular information. During attempted but mechanically prevented head movement, they carried motor-related information even though the expected movement did not occur. During active movement, the sensory and motor-related components combined so that the population predicted the sensory consequence of self-generated motion. A weighted model using the responses of roughly 40 Purkinje cells accounted for the cancellation observed downstream in early vestibular pathways [@zobeiri2024purkinje].

The vestibular system also exposes a different internal-model problem. Otolith organs respond similarly to linear acceleration and to a change in head orientation relative to gravity. Cerebellar Purkinje and nuclear neurons can separate those physically ambiguous inputs by using an internal model of gravity and motion. Under artificial motion conditions that produce an erroneous percept, the neurons represented the model-derived estimate rather than merely copying the peripheral signal [@laurens2013internal].

Together, these experiments identify more than a lesion syndrome. They show a state estimate shifted backward in time when the cerebellum is disrupted, preserved feedback gain accompanied by excessive phase lag, behavioral improvement when feedback is artificially advanced, neuronal responses that track sensory prediction error during adaptation, and Purkinje-cell populations that combine motor and sensory signals to predict self-generated motion. The forward model is therefore an empirically constrained account of what particular cerebellar circuits compute, not merely an engineering metaphor attached to ataxia.

33.8 The canonical circuit

The canonical cerebellar diagram begins with two afferent systems. Mossy fibers arise from many precerebellar sources, including pontine, vestibular, reticular, spinal, and external cuneate pathways. They terminate in the granule-cell layer and also send excitatory collaterals to cerebellar output neurons. Depending on the pathway and territory, they carry proprioceptive input, cutaneous input, vestibular information, spinal interneuronal activity, motor-cortical and association-cortical signals, task context, and mixtures of these variables.

That mixture is precisely what a forward model requires. To predict the consequence of an action, the circuit needs information about the state from which the action begins, the command or intention that will alter that state, and the context that changes the relation between command and consequence. A cup, a hammer, and an empty hand impose different dynamics on the same arm. A head turn generated voluntarily has different implications from the same motion imposed externally. Mossy-fiber pathways provide the ingredients from which such distinctions can be learned.

Each mossy fiber contacts granule cells within cerebellar glomeruli. A granule cell receives only a small number of mossy-fiber inputs, but the population is enormous. In mice, pontine motor-related input and proprioceptive input can converge on individual granule cells [@huang2013convergence]. The axon of each granule cell ascends and bifurcates into a long parallel fiber that crosses many Purkinje-cell dendritic trees. This expansion creates a high-dimensional basis in which combinations of state, command, context, and time can be separated and assigned different learned weights. Population imaging confirms that parallel-fiber activity can occupy a high-dimensional space during behavior [@lanore2021highdimensional].
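A toy example shows why an expanded representation helps when the consequence of a command depends on context. The sketch assumes binary state and command signals, granule units that fire only when both of their two mossy inputs are active, and a linear readout trained with a simple delta rule. These are illustrative simplifications, not properties established for real granule cells. The target pattern is a deliberately hard case in which the same command has opposite consequences in the two states.

```python
def granule_layer(state, command):
    """Expand two binary mossy signals into four conjunction-selective units."""
    mossy = [state, 1 - state, command, 1 - command]          # on and off fibers
    pairs = [(0, 2), (0, 3), (1, 2), (1, 3)]                   # each unit samples two fibers
    return [max(0, mossy[i] + mossy[j] - 1) for i, j in pairs]  # active only if both inputs are

def train_readout(features, targets, epochs=200, rate=0.2):
    """Fit linear readout weights with an error-driven delta rule."""
    weights = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, y in zip(features, targets):
            error = y - sum(w * xi for w, xi in zip(weights, x))
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights

def worst_error(features, targets, weights):
    """Largest absolute prediction error across conditions."""
    return max(abs(y - sum(w * xi for w, xi in zip(weights, x)))
               for x, y in zip(features, targets))

conditions = [(0, 0), (0, 1), (1, 0), (1, 1)]    # (state, command)
targets = [0.0, 1.0, 1.0, 0.0]                   # consequence depends on the conjunction

raw = [[s, c, 1] for s, c in conditions]                  # unexpanded inputs plus a bias
expanded = [granule_layer(s, c) for s, c in conditions]   # granule-like expansion

raw_error = worst_error(raw, targets, train_readout(raw, targets))
expanded_error = worst_error(expanded, targets, train_readout(expanded, targets))
print(f"worst error, raw inputs:      {raw_error:.3f}")
print(f"worst error, expanded inputs: {expanded_error:.3f}")
```

No linear readout of the raw inputs can fit this pattern, so its worst error cannot fall below 0.5. After expansion, each condition activates its own unit and the same learning rule assigns it a separate weight, which is the sense in which combinations of state and command become separable.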

Purkinje cells form the sole output of the cerebellar cortex. They fire frequent simple spikes shaped by parallel-fiber input, interneurons, intrinsic conductances, and recurrent network state. Their axons inhibit neurons in the deep cerebellar and vestibular nuclei. Because Purkinje cells fire tonically, prediction can be expressed through increases, decreases, pauses, synchrony, and changes in population pattern rather than through a simple on–off code.

Viewed within a forward-model framework, Purkinje simple-spike activity transforms the expanded mossy-fiber representation into a learned prediction or control-relevant estimate. The predicted variable need not be the same in every region. It may be expected retinal slip, the sensory consequence of head movement, the state of a limb after a command, or a timed signal that prepares a downstream response. What matters is that the output arrives early enough and in the appropriate coordinates to alter the controller before delayed feedback alone could do so.

The cerebellar and vestibular nuclei are not passive relays released whenever Purkinje inhibition falls. They receive direct excitatory collaterals from mossy and climbing fibers, inhibitory Purkinje input, and additional local and nucleocortical influences. Their projection neurons provide most of the cerebellum’s output to thalamic and brainstem targets. Synchronized changes across Purkinje cells can produce precisely timed nuclear spiking, illustrating how a population prediction can become a sharply timed command or estimate [@personraman2012synchrony].

The second afferent system originates in the inferior olive. Each Purkinje cell receives one climbing fiber in the adult mammalian cerebellum, but that fiber forms many synapses across the dendritic tree and evokes a distinctive complex spike. Olivary neurons are electrically coupled and organized with Purkinje cells and nuclear targets into narrow modules or microzones. Climbing-fiber events can signal a mismatch, an unexpected outcome, or another behaviorally important event and can powerfully change the mapping from parallel-fiber context to Purkinje output.

The circuit is therefore anatomically suited to predictive learning. Mossy and granule-cell pathways represent the conditions under which a consequence should occur. Purkinje and nuclear populations provide a learned, rapidly available transformation of those conditions. Climbing-fiber and other feedback-related signals revise the transformation when the observed consequence differs from the expected one.

The familiar sequence—mossy fiber, granule cell, parallel fiber, Purkinje cell, nucleus—is still a first approximation. The actual cerebellum contains molecular stripes, several interneuron classes, unipolar brush cells, recurrent pathways, nucleo-olivary and nucleocortical projections, and lineage-specific variants. Those details do not erase the forward-model interpretation. They reveal how many local predictive models can be embedded within a common circuit scaffold.

33.9 How prediction is learned

The regular architecture of the cerebellum encouraged unusually explicit theories of learning. Marr proposed that mossy-fiber information would be expanded into many granule-cell combinations, allowing Purkinje cells to learn which patterns mattered. Albus developed a related account in which climbing-fiber activity altered the efficacy of active parallel-fiber synapses [@marr1969theory; @albus1971theory]. Ito and colleagues then demonstrated long-lasting depression of parallel-fiber influence on Purkinje cells when parallel-fiber and climbing-fiber activity were paired [@itokano1982longlasting].

This parallel-fiber–Purkinje-cell long-term depression, or LTD, supplied a plausible mechanism for updating a forward model. Parallel-fiber activity specifies the state, command, and context associated with a prediction. A climbing-fiber event marks a consequence that was not adequately predicted. Changing the active synapses alters what Purkinje output will be produced the next time a similar state and action recur.

The modern account distributes that learning across more of the circuit. Parallel-fiber transmission can potentiate as well as depress. Inhibitory synapses, mossy-fiber pathways, intrinsic excitability, and cerebellar nuclear circuits also change with experience. Experiments that disrupt one canonical LTD pathway can leave substantial vestibulo-ocular, eyeblink, or locomotor learning intact [@schonewille2011ltd]. This does not weaken the predictive-learning account. It indicates that the model is implemented by coordinated plasticity at several sites rather than stored at a single class of synapse.

Climbing fibers also carry richer instructive information than a binary signal meaning “wrong.” Complex spikes are often related to movement error, retinal slip, unexpected outcomes, or events that can teach a correction. Their probability, synchrony, timing, and duration can carry information about error direction and magnitude. In other tasks they relate to reward, reward omission, salience, movement onset, or learned expectation [@kitazawa1998complexspikes; @ohmae2015temporaldifference; @kostadinov2019reward]. Different modules appear to receive different teaching variables, plausibly because they predict different consequences.

The common logic remains clear. A context activates a distributed granule-cell pattern. Purkinje and nuclear output express the currently learned prediction or anticipatory adjustment. The observed result returns through sensory and instructive pathways. A discrepancy modifies the mapping so that future output better matches the body’s dynamics and the environmental relation in that context.
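That loop can be written as a few lines of error-driven learning. This is a minimal sketch under stated assumptions: the four-unit "granule" patterns, the two load contexts, the consequence values, and the learning rate are invented for illustration, and the signed update abstracts over the several plasticity sites described above rather than modelling any one of them.

```python
def predict(weights, granule):
    """Learned prediction: weighted sum of the active granule-cell pattern."""
    return sum(w * g for w, g in zip(weights, granule))

def learn(weights, granule, observed, rate=0.2):
    """Adjust only the weights of active inputs, in proportion to the discrepancy."""
    error = observed - predict(weights, granule)  # stands in for an instructive signal
    updated = [w + rate * error * g for w, g in zip(weights, granule)]
    return updated, error

# Hypothetical contexts: (granule pattern, consequence that actually follows).
# The third unit is shared, so the two contexts overlap in the basis.
contexts = {
    "light_load": ([1.0, 0.0, 1.0, 0.0], 0.8),
    "heavy_load": ([0.0, 1.0, 1.0, 0.0], 0.3),
}

weights = [0.0, 0.0, 0.0, 0.0]
error_history = []
for trial in range(200):
    for granule, observed in contexts.values():
        weights, error = learn(weights, granule, observed)
        error_history.append(abs(error))

light_prediction = predict(weights, contexts["light_load"][0])
heavy_prediction = predict(weights, contexts["heavy_load"][0])
print(error_history[0], error_history[-1], light_prediction, heavy_prediction)
```

Because the same command-related unit participates in both patterns, the learner must use the context-specific units to separate the two predictions, which is the role the text assigns to the distributed granule-cell pattern.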

Learning can also migrate in its expression. Early adaptation may depend strongly on cerebellar cortical plasticity, while repeated practice changes nuclear, brainstem, spinal, or cerebral circuits that carry a faster policy. The cerebellum can then supervise a controller whose routine output is no longer generated entirely within cerebellar cortex. This is one reason a learned skill can become automatic while still requiring the cerebellum when circumstances change [@dezeeuw2021diversity].

Marr, Albus, and Ito did not provide the last word on cerebellar plasticity. They provided the essential architecture of an answer: an enormous contextual basis, a powerful instructive pathway, and modifiable output to recurrent control systems. Modern work has enlarged that architecture without removing its central relevance to forward-model learning.

33.10 Three uses of predictive control

The same predictive architecture appears in different forms across cerebellar-dependent tasks. Vestibulo-ocular adaptation recalibrates a rapid controller. Reaching adaptation updates the expected consequence of an action. Eyeblink conditioning establishes both whether an anticipated event will occur and when.

33.10.1 Vestibulo-ocular adaptation calibrates a rapid controller

When the head turns, the vestibulo-ocular reflex, or VOR, rotates the eyes in the opposite direction. The immediate drive begins with vestibular evidence that the head is moving. This short-latency response helps stabilize an image before slower visual pathways could calculate and correct retinal motion.

The reflex must be calibrated. If eye rotation is too small or too large for the head movement, the image slips across the retina. Magnifying or minifying lenses change the eye movement required for stability. Repeated exposure gradually changes VOR gain so that the eyes again compensate appropriately. The floccular cerebellum and vestibular nuclei are central to this adaptation, and climbing-fiber activity can convey the retinal-slip error that revises the predicted relation between head motion and required eye motion [@boydenraymond2003vor; @dezeeuw2021diversity].

The VOR illustrates nested control. Vestibular pathways provide rapid feedback about actual head movement. A cerebellar model predicts the eye movement needed under the current dynamics and calibrates the short-latency loop. Visual feedback reports the residual slip. Prediction, feedback, and plasticity are components of one controller operating at different delays.
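The calibration step can be sketched as a single adjustable gain driven by retinal slip. This is a deliberately reduced illustration: the sinusoidal head velocity, the sign convention for slip, the learning rate, and the function name `adapt_vor_gain` are assumptions, and real VOR adaptation is slower and distributed across cortical and nuclear sites.

```python
import math

def adapt_vor_gain(magnification, gain=1.0, rate=0.05, steps=400):
    """Adjust VOR gain until eye velocity cancels the image motion caused by head motion."""
    slip_history = []
    for t in range(steps):
        head = math.sin(2 * math.pi * t / 20)  # head velocity, arbitrary units
        eye = -gain * head                     # compensatory eye velocity
        slip = magnification * head + eye      # residual image motion on the retina
        gain += rate * slip * head             # slip correlated with head motion revises the gain
        slip_history.append(abs(slip))
    return gain, slip_history

gain_magnified, slips = adapt_vor_gain(magnification=2.0)  # magnifying lenses
gain_minified, _ = adapt_vor_gain(magnification=0.5)       # minifying lenses
print(round(gain_magnified, 3), round(gain_minified, 3))
```

In this sketch the fast pathway (`eye = -gain * head`) never waits for vision. Visual slip only changes the gain that the fast pathway will use on later head movements.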

33.10.2 Reaching adaptation updates action–consequence relations

In a visuomotor-rotation task, a participant moves a hand while a cursor is rotated away from the true hand direction. Early reaches miss the target. With practice, the motor command changes in the opposite direction, and removal of the rotation produces an aftereffect. The aftereffect shows that the relation between command and expected visual consequence has been recalibrated rather than merely corrected consciously on each trial.
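A single-state learning model is one common way to illustrate why recalibration produces an aftereffect. The sketch below is hedged accordingly: the retention and learning-rate values, the 30-degree rotation, and the trial counts are arbitrary assumptions, and the model ignores explicit aiming strategies altogether.

```python
def simulate_rotation(rotation_deg=30.0, n_adapt=80, n_washout=20,
                      retention=0.98, learning_rate=0.15):
    """Trial-by-trial cursor error while a learned compensation builds up and then washes out."""
    compensation = 0.0  # learned shift in movement direction, in degrees
    errors = []
    for trial in range(n_adapt + n_washout):
        rotation = rotation_deg if trial < n_adapt else 0.0
        error = rotation - compensation  # cursor direction relative to the target
        errors.append(error)
        # Part of the compensation is retained; part of the error is learned.
        compensation = retention * compensation + learning_rate * error
    return errors

errors = simulate_rotation()
print(round(errors[0], 2), round(errors[79], 2), round(errors[80], 2), round(errors[-1], 2))
```

The first washout trial has an error of opposite sign to the original miss. The compensation persists after the rotation is removed and must itself be unlearned, which is the signature of recalibration rather than trial-by-trial correction.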

People with cerebellar degeneration show reduced adaptation in tasks designed to isolate sensory prediction error: the difference between the sensory consequence expected from a command and the consequence that actually occurs. Allowing online corrections does not rescue this learning, and the size of abnormal online corrections is not what predicts the adaptation deficit [@tseng2007cerebellum]. These findings indicate that the cerebellum is needed specifically to use that mismatch to revise the action–consequence model.

Explicit strategies can still improve task performance. A participant may deliberately aim away from the visible target, and cerebral systems can learn reward-based solutions. Those routes explain why performance and adaptation are not identical. The cerebellar contribution is the slower, largely automatic recalibration that changes what sensory consequence the same command is expected to produce.

33.10.3 Eyeblink conditioning predicts whether and when

In delay eyeblink conditioning, a neutral cue such as a tone begins before an air puff near the eye and overlaps with it. After repeated pairings, the cue evokes a blink that peaks near the expected time of the air puff. The cerebellum and its output pathways are necessary for normal acquisition and expression of this learned response, and focal cerebellar lesions can prevent acquisition even when the unconditioned blink remains available [@lincoln1982eyeblink; @mccormickthompson1984eyeblink].

The learned response is predictive in two senses. The cue comes to specify that an aversive event is likely, and the cerebellar circuit shapes a response whose timing protects the eye without closing it unnecessarily early. Climbing-fiber activity can signal both unexpected delivery and unexpected omission of the air puff, producing a temporally structured prediction-error signal [@ohmae2015temporaldifference].
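One way to picture learned timing is a set of cue-triggered temporal basis functions whose weights are adjusted by an error arriving at the moment of the air puff. The sketch below assumes Gaussian basis functions, 10 ms bins, and a puff 250 ms after cue onset. These choices, and all names in the code, are illustrative and are not taken from the cited experiments.

```python
import math

N_BINS = 50                    # 10 ms bins after cue onset (assumed)
US_BIN = 25                    # air puff 250 ms after cue onset (assumed)
CENTERS = range(0, N_BINS, 5)  # preferred times of hypothetical cue-triggered units
WIDTH = 3.0                    # temporal width of each unit, in bins

def basis(t):
    """Activity of each time-tuned unit at bin t after the cue."""
    return [math.exp(-((t - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTERS]

def response(weights, t):
    return sum(w * b for w, b in zip(weights, basis(t)))

def train(trials=100, rate=0.1):
    """The error arrives only at puff time and changes the units active at that moment."""
    weights = [0.0] * len(CENTERS)
    active = basis(US_BIN)
    for _ in range(trials):
        error = 1.0 - response(weights, US_BIN)  # puff delivered, response too small
        weights = [w + rate * error * a for w, a in zip(weights, active)]
    return weights

weights = train()
profile = [response(weights, t) for t in range(N_BINS)]
peak_bin = max(range(N_BINS), key=lambda t: profile[t])
omission_error = 0.0 - response(weights, US_BIN)  # probe trial with no puff
print(peak_bin, round(profile[US_BIN], 3), round(omission_error, 3))
```

After training, the response profile peaks at the expected puff time, and omitting the puff yields an error of opposite sign at that same moment. Both features echo the two senses of prediction described above.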

When a gap separates the cue from the air puff in trace conditioning, forebrain structures become more important. The comparison is useful: a cerebellar loop can learn a compact predictive relation when cue and consequence overlap, while a longer temporal bridge recruits a broader network. Prediction is not an abstract forecast detached from behavior. It is learned in the coordinates and timescale of the controller that will use it.

33.11 Forward models organize the other concepts

Several terms in motor-control theory are often presented as competing explanations of the cerebellum. In practice, many describe different parts of the same predictive architecture.

Component	Role within predictive control
Motor-related signal or efference copy	Informs the predictive circuit that a command has been issued or an action is intended
Forward model	Maps estimated state, action, and context onto a predicted next state or sensory consequence
State estimator	Combines the forward prediction with delayed and uncertain sensory evidence to estimate the present
Sensory prediction error	Measures the difference between predicted and observed consequence and drives correction and learning
Adaptive filter	Provides one circuit-level method for learning the predictable component of a signal from contextual inputs
Learned policy	Maps a recurring state or context directly onto an anticipatory response, often after prediction errors have trained it

A state estimator is therefore not an alternative to a forward model. It is one of the principal uses of a forward model. Sensory receptors report the recent past; the model projects that evidence forward; the estimator combines prediction and measurement. The resulting state can guide cerebral, brainstem, or spinal controllers.
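The division of labor among measurement, model, and estimator can be sketched in one dimension. The example assumes a hand moving under a constant command, a 100 ms sensory delay, and an internal model that overestimates the plant by about 10 percent. The correction gain and every name in the code are hypothetical. It compares three estimates of current position: stale feedback alone, the forward model alone, and a prediction corrected by delayed feedback.

```python
def simulate(steps=200, dt=0.01, delay=10, command=0.5,
             plant_gain=0.9, correction_gain=0.05):
    """Compare estimates of current position after 2 s of movement."""
    true_pos = [0.0]
    model_only = [0.0]  # forward model run open loop on the efference copy
    combined = [0.0]    # forward model corrected by delayed measurements
    for t in range(steps):
        true_pos.append(true_pos[-1] + plant_gain * command * dt)
        model_only.append(model_only[-1] + command * dt)
        predicted = combined[-1] + command * dt
        if t >= delay:
            measurement = true_pos[t - delay]              # evidence about the recent past
            innovation = measurement - combined[t - delay]  # compared with the estimate for that time
            predicted += correction_gain * innovation
        combined.append(predicted)
    now = true_pos[-1]
    return {
        "stale_feedback": abs(true_pos[-1 - delay] - now),
        "forward_model_only": abs(model_only[-1] - now),
        "prediction_plus_feedback": abs(combined[-1] - now),
    }

result = simulate()
print({name: round(err, 4) for name, err in result.items()})
```

In this sketch, feedback alone lags by the distance travelled during the delay and the uncorrected model drifts without bound. The combined estimator keeps the error small and bounded even though its model is wrong.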

An adaptive filter is a possible implementation. Parallel-fiber patterns provide contextual basis functions, learned weights approximate the expected component of another signal, and the residual reveals what was not predicted [@fujita1982adaptive]. The electric-fish negative image makes this operation visible. In a limb-control loop, the same mathematical arrangement can estimate an interaction torque or tune a corrective response rather than cancel a sensory waveform.
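The cancellation form of the adaptive filter is easy to sketch. The example assumes a corollary-discharge signal that activates one contextual unit per phase of a motor command, a fixed reafferent waveform, and a simple decorrelation rule. The waveform values, learning rate, and function names are illustrative only.

```python
def learn_negative_image(reafference, trials=300, rate=0.1):
    """Learn one weight per command phase so that weight plus reafference approaches zero."""
    weights = [0.0] * len(reafference)
    for _ in range(trials):
        for phase, sensed in enumerate(reafference):
            residual = sensed + weights[phase]  # what the prediction failed to cancel
            weights[phase] -= rate * residual   # reduce the predictable residual
    return weights

def residual_signal(sensed, weights):
    """Sensory input after the learned negative image has been added."""
    return [s + w for s, w in zip(sensed, weights)]

# Hypothetical self-generated waveform following each motor command.
reafference = [0.0, 0.8, 1.0, 0.4, -0.3, 0.0]
negative_image = learn_negative_image(reafference)

# Probe trial: the same command plus an external event at phase 3.
external = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
probe = [r + e for r, e in zip(reafference, external)]
residual = residual_signal(probe, negative_image)
print([round(x, 3) for x in negative_image])
print([round(x, 3) for x in residual])
```

After learning, the residual contains only the external event, which is the sense in which the filter leaves behind "what was not predicted."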

A learned policy addresses a related but different level. With practice, a context can evoke the useful anticipatory correction directly. The nervous system need not repeatedly simulate every detail of a familiar action. Model-based learning and policy learning can cooperate: prediction error trains the policy, the policy supplies speed, and the forward model becomes especially important when the context changes or an unexpected disturbance occurs. Recent theoretical work places this interaction at the center of cerebellar predictive motor control [@nguyenperson2025predictive].

Timing is intrinsic to all of these operations. A prediction must specify not only what consequence will occur but also when it will occur and how the state will evolve in the meantime. The cerebellum’s contribution to timed responses, movement dynamics, and sequencing does not require a separate clock added beside the forward model. A forward model predicts a trajectory through time, while the recurrent circuit converts that trajectory into appropriately phased output.

Nor must a cerebellar module contain a complete simulation of the body. Local models are enough. One loop can predict retinal slip from head motion; another can estimate the limb state relevant to a reach; another can learn the sensory consequence of a vocal or respiratory command. The cerebellum can therefore contain many partial models distributed across loops, each expressed in the coordinates needed by its target.

The most useful summary is direct:

Cerebellar circuits learn phase-advanced predictions that relate state and action to consequence. Those predictions support state estimation, anticipatory control, sensory cancellation, and error-driven recalibration.

This formulation identifies what the cerebellum contributes without requiring that every region expose the same intermediate variable or that predictive control exclude learned policies, timing, or feedback.

33.12 Beyond movement

The forward-model account is best established in sensorimotor, vestibular, ocular-motor, and active-sensory circuits. Cerebral–cerebellar anatomy nevertheless extends the same circuit scaffold into premotor, prefrontal, posterior parietal, language-related, and limbic-associated networks. Functional imaging reveals extensive lateral posterior regions that are more engaged during language, working-memory, social, and affective tasks than during elementary movement. Resting-state organization likewise maps large portions of the cerebellum onto association networks rather than primary motor systems [@kellystrick2003loops; @buckner2011organization; @guell2018triple].

Lesion evidence gives those maps clinical significance. Damage concentrated in posterior cerebellar territories can produce executive, visuospatial, language, and affective changes collectively described as the cerebellar cognitive affective syndrome [@schmahmannsherman1998ccas]. Some patients show slowed or poorly organized thought, reduced verbal fluency, impaired working-memory manipulation, flattened or disinhibited affect, or difficulty adjusting behavior to context. Lesion location matters: anterior-lobe damage is more strongly associated with motor impairment, whereas posterior-lobe damage is more likely to produce cognitive and affective consequences [@stoodley2016lesion]. A 2025 meta-analysis spanning 129 studies found group-level impairment across every assessed cognitive domain, with the largest effects in processing speed, language, and social cognition; the profile differed across focal and degenerative disorders [@reumers2025cognition].

Predictive control gives these findings a mechanistic direction. A cerebral association network also unfolds through states. A sentence prepares likely continuations. A multistep action establishes expectations about what should come next. A social exchange generates predictions about timing, response, and context. Cerebellar loops could learn those sequential relations and provide phase-advanced signals that help cerebral controllers remain fluent, appropriately scaled, and rapidly adjustable.

The phrase “dysmetria of thought” captures this proposed continuity [@schmahmann1998dysmetria]. A motor action can be too large, too small, too late, or poorly coordinated with the state that surrounds it. A thought, emotional response, or behavioral sequence can likewise be poorly timed or scaled relative to context. The phrase should be treated as a network hypothesis rather than a claim that thought is a disguised movement. Its value is that it predicts a particular kind of deficit: not the simple erasure of a faculty, but degraded sequencing, calibration, and contextual adjustment.

The predictive account is more informative than saying only that the cerebellum “supports cognition.” It proposes that association loops use cerebellar learning to anticipate the next relevant state and to update that expectation when the sequence departs from experience. The exact variables remain less directly measured than in reaching or vestibular control, but the anatomical loops and clinical effects make the extension plausible and testable.

Evolution supplies a parallel clue. Primates and parrots independently expanded pathways linking flexible forebrain systems with cerebellar circuitry. That convergence is consistent with selection for applying predictive learning to increasingly elaborate behavioral sequences. It does not make the cerebellum the seat of language, planning, or social cognition. It makes the cerebellum a predictive component within the distributed controllers that produce them.

33.13 Why cerebellar signs are often ipsilateral

A lesion of the left cerebral motor cortex commonly weakens the right side of the body because the corticospinal pathway crosses. A lesion of the left cerebellar hemisphere more commonly produces ataxia of the left limbs. This apparent exception follows from the crossings within the larger loop.

Cerebral input reaches the opposite cerebellar hemisphere mainly through crossed pontocerebellar fibers. Cerebellar nuclear output then crosses in the decussation of the superior cerebellar peduncle to influence the contralateral thalamus and motor cortex. Descending corticospinal output from that cortex crosses again before reaching the limb. These two crossings on the output path return the influence to the side on which it began, so each cerebellar hemisphere relates predominantly to the ipsilateral body. Several spinocerebellar pathways also preserve or recover an ipsilateral relation through uncrossed or double-crossed routes.
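The laterality rule reduces to counting crossings: an even number returns the influence to the starting side. The short sketch below only restates that bookkeeping. The route lists are simplified labels for the output paths described above, and `side_reached` is an illustrative helper.

```python
def side_reached(start_side, decussations):
    """Flip sides once for every crossing along the route."""
    side = start_side
    for _ in decussations:
        side = "right" if side == "left" else "left"
    return side

# Simplified output routes to a limb.
cerebellar_route = ["superior cerebellar peduncle decussation", "pyramidal decussation"]
cortical_route = ["pyramidal decussation"]

left_cerebellum_affects = side_reached("left", cerebellar_route)
left_cortex_affects = side_reached("left", cortical_route)
print(left_cerebellum_affects, left_cortex_affects)
```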

The rule is most useful for lateralized appendicular signs. Damage to the vermis commonly produces truncal and gait ataxia rather than a clean unilateral limb syndrome. Vestibular and ocular-motor effects can be bilateral or direction-specific. Diffuse disease and lesions that interrupt cerebellar peduncles can produce more complicated patterns.

The bedside distinction remains valuable. Contralateral weakness suggests damage along a cerebral corticospinal system; ipsilateral dysmetria with relatively preserved strength suggests a cerebellar hemisphere or its connections. As throughout this unit, the kind of failure and the side on which it appears jointly constrain lesion localization.

33.14 How cerebellar function is studied

The case for predictive internal models rests on convergence across methods. Each method exposes a different part of the control loop.

Clinical lesions and degenerative disease show that an intact cerebellum is necessary for normal accuracy, timing, adaptation, and cognitive-affective regulation. The distribution of ataxia, ocular signs, dysarthria, cognitive change, and adaptation deficits relates those functions to cerebellar territory. Natural lesions are rarely confined to one microzone, and degeneration changes connected networks over time, so the syndrome becomes most informative when paired with a computationally designed task.

Adaptation and conditioning paradigms control the relation between an action and its consequence. Visuomotor rotations, force fields, VOR gain changes, and eyeblink conditioning can separate immediate feedback correction, explicit strategy, gradual recalibration, retention, and aftereffects. They allow sensory prediction error to be manipulated independently of task success and of the corrective movement itself.

Causal perturbation can interrupt the predictive signal at a selected moment. Cerebellar TMS can make a state estimate behave as though it were temporally stale. Reversible inactivation, stimulation, and targeted genetic manipulation can distinguish acquisition from expression and identify the regions and plasticity mechanisms needed for each phase.

Single-neuron recording and optical imaging reveal simple spikes, complex spikes, granule-cell activity, nuclear output, and population dynamics during behavior. Active versus passive movement, attempted but prevented movement, and experimentally altered action–consequence relations help separate sensory input from motor-related prediction. These designs have directly identified Purkinje-cell signals that predict the sensory consequences of self-motion.

Circuit tracing and comparative anatomy identify the pathways through which state, command, context, and instructive signals enter a module and the controllers that receive its output. Cerebellum-like electrosensory circuits provide unusually accessible examples of a learned negative image. Developmental genetics and cell-type atlases show how related circuit components were reorganized across vertebrate evolution.

Human functional imaging and connectivity map motor and nonmotor territories across the whole cerebellum. Imaging can reveal reproducible loops and task organization that guide lesion and stimulation studies. It is most persuasive when its maps converge with anatomy, behavior, physiology, and causal perturbation.

Computational models specify the variables that verbal theories can leave vague: the plant, the state, the delay, the predicted consequence, the feedback gain, and the learning rule. The Smith-predictor account, for example, predicted preserved feedback gain with excessive phase lag after cerebellar damage [@miall1993smith]. That pattern was later observed, and the model suggested the phase-advance manipulation that improved performance [@zimmet2020feedback].
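Why an internal model matters for a delayed loop can be shown with a toy simulation. The sketch below is not the analysis used in the cited studies. It assumes an integrator plant, a 200 ms feedback delay, a proportional controller, and a perfect internal model, all of them illustrative choices. The predictor branch follows the Smith-predictor arrangement: delayed feedback is supplemented by the difference between the model's current and delayed outputs.

```python
def track_step(use_predictor, gain=10.0, delay_steps=20, dt=0.01,
               steps=500, target=1.0):
    """Drive an integrator plant toward a step target using delayed feedback."""
    plant = [0.0]
    model = [0.0]  # internal forward model of the plant (assumed perfect here)
    for t in range(steps):
        delayed = max(t - delay_steps, 0)
        feedback = plant[delayed]  # sensory report of a past state
        if use_predictor:
            # Add what the model says has changed since that report was generated.
            feedback += model[t] - model[delayed]
        command = gain * (target - feedback)
        plant.append(plant[-1] + command * dt)
        model.append(model[-1] + command * dt)  # the efference copy drives the model
    return plant

without_model = track_step(use_predictor=False)
with_model = track_step(use_predictor=True)

worst_without = max(abs(x - 1.0) for x in without_model)
late_with = max(abs(x - 1.0) for x in with_model[-100:])
print(round(worst_without, 2), late_with)
```

The feedback gain is the same in both runs. With these parameters the unaided loop overshoots and oscillates because its corrections act on stale information, while the loop with the internal model settles on the target.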

The cumulative evidence is stronger than any one method. Lesions reveal the failure, controlled tasks identify the error signal, causal perturbation shifts the state estimate, neuronal recording exposes the predicted consequence, comparative circuits show how the mapping can be learned, and models explain why the operation stabilizes a delayed embodied controller. Together they support a positive account of cerebellar function rather than a definition based only on what is lost after damage.

33.15 Coda: prediction keeps control in the present

The cerebellum does not choose every goal, generate every movement, or provide the final route to muscle. Its contribution is more specific and more consequential. It learns how states change when actions are issued. It uses those learned relations to predict what the body and sensory systems are about to do. It supplies other controllers with estimates and anticipatory adjustments that arrive before delayed feedback could fully describe the result.

That account explains the characteristic lesion syndrome. Movement remains possible because cerebral, brainstem, spinal, and peripheral motor systems remain available. Movement becomes dysmetric, decomposed, unstable, and difficult to adapt because those systems are acting with an impoverished estimate of the present and an inaccurate forecast of the consequences. Corrections arrive, but they arrive late. Interaction forces occur, but they are not compensated at the right time. Repeated error no longer recalibrates the controller normally.

The forward model is not merely inferred from that syndrome. Cerebellar stimulation can make a reach begin from a state estimate approximately 138 milliseconds out of date. Patients can preserve feedback gain while losing the phase advance that allows feedback to stabilize rapid control. Advancing their visual feedback can improve performance. Cerebellar neurons track mismatches between expected and actual self-motion during adaptation, and Purkinje-cell populations combine motor-related and sensory signals to predict the consequences of active movement. Cerebellum-like circuits in electric fish reveal how contextual input and plasticity can construct a negative image of predictable reafference.

The anatomy gives those results a plausible mechanism. Mossy and granule-cell pathways provide a vast basis for representing state, action, context, and time. Purkinje and nuclear populations transform those conditions into phase-advanced output. Climbing fibers and other feedback-related signals modify the mapping when the consequence differs from the prediction. Plasticity is distributed across the circuit, but the learning problem is coherent: predict what the controlled system will do, compare that prediction with what occurred, and improve the next estimate and action.

Evolution repeatedly placed related circuitry into new loops. Fish use cerebellum-like networks to separate self-generated sensory consequences from external events. Zebrafish use the true cerebellum to recalibrate a feedback controller when the visual consequences of swimming change. Birds and mammals independently expanded routes linking flexible forebrain systems with cerebellar circuitry. The embodied-controller hierarchy became more capable not by escaping feedback, but by learning to predict across its delays.

The same framework gives a principled way to approach the cognitive cerebellum. Cerebral association systems also unfold through sequences and consequences. Cerebellar loops can help predict the next relevant state, time a response, and update an expectation when context changes. The evidence is less direct than it is for eye, head, and limb control, but the question is no longer whether the cerebellum has any role beyond movement. It is which states and consequences each association loop learns to predict.

The title “The Predicting Machine” should therefore be read literally enough to guide explanation but not so literally that the cerebellum becomes an isolated simulator. Prediction is a circuit operation embedded within recurrent control. The model receives its meaning from the body and controller to which it is connected. Feedback keeps the model honest; the model lets feedback arrive in time to matter.

The next chapter turns to the basal ganglia. The contrast is useful. Cerebellar loops predict consequences and recalibrate control; basal-ganglia loops help determine which actions gain access, how vigorously they are pursued, and when they are sustained or switched. Both systems are ancient, recurrent, and deeply interconnected with cortex and brainstem. Together they show how an embodied nervous system can select an action and control it before all of its consequences have arrived.

A note on what this chapter is sure of, and what it isn’t

We are confident that:

Sensory and motor delays create a fundamental control problem: rapid behavior requires an estimate of the present state and the likely consequences of ongoing action.
The cerebellum is a major neural substrate for predictive internal models in sensorimotor control. It combines motor-related, sensory, and contextual information to generate phase-advanced estimates and anticipatory adjustments.
Cerebellar damage commonly preserves the ability to initiate movement while producing dysmetria and impairing stability, interjoint coordination, timing, and adaptation in the pattern expected from degraded predictive control.
Causal disruption of lateral cerebellum can make human reaching use a hand-state estimate that is approximately 138 milliseconds out of date.
People with cerebellar ataxia can retain feedback gain while showing excessive phase lag, and artificially advancing visual feedback can improve their control.
Purkinje-cell populations in primate cerebellum can combine motor-related and sensory signals to predict the sensory consequences of active self-motion.
Sensory prediction errors drive important forms of cerebellum-dependent reaching, vestibulo-ocular, and eyeblink learning.
Cerebellar and cerebellum-like circuits can learn predictable relations between contextual signals and sensory consequences.
Mossy fibers, granule cells, parallel fibers, Purkinje cells, climbing fibers, and cerebellar or vestibular output neurons provide the core architecture for that predictive learning.
Cerebellar contributions extend beyond elementary movement through organized loops with cerebral association networks.

We have good reason to think that:

The cerebellum contains many local forward models, each learning the dynamics and consequences relevant to the controller with which that module is coupled.
State estimation, adaptive filtering, temporal prediction, and learned feedforward policies are complementary uses or implementations of predictive cerebellar processing rather than unrelated explanations.
Climbing fibers often provide module-specific instructive signals that include sensory prediction error, while plasticity across cerebellar cortex and nuclei stores and expresses the learned mapping.
Related predictive-learning circuits were repeatedly modified and recruited into different sensory, postural, locomotor, ocular, manipulative, vocal, and forebrain loops during vertebrate evolution.
Cerebellar association loops contribute to the sequencing, timing, prediction, and contextual calibration of cognitive and affective behavior.

We remain genuinely unsure about:

Which state variables and sensory consequences are represented in each cerebellar module, and how those representations change across tasks.
How labor is divided among explicit forward prediction, state estimation, and directly learned policies during well-practiced behavior.
How learning is distributed among cerebellar cortical, nuclear, brainstem, thalamic, spinal, and cerebral sites during acquisition, consolidation, and long-term expression.
Which aspects of complex-spike activity carry prediction error, reward, salience, timing, or other instructive variables in different modules.
How closely predictive operations in language, executive control, social behavior, and affect correspond to those demonstrated in sensorimotor loops.
What adult circuit in the last common vertebrate ancestor preceded the canonical cerebellum of jawed vertebrates.

The remaining questions concern the variables, implementation, and scope of predictive cerebellar processing. They do not reduce the cerebellum to a structure known only through the motor deficits that follow its loss.