6  Chapter 3.4 — How a Synapse Learns

Hebbian Plasticity, Eligibility, and the Dopamine Signal

This unit began with a promise about where it would end. The overview said we would finish where signaling becomes memory — with the activity-dependent changes that let a synapse record its own history, with how those changes are reinforced, and with a first look at the hardest problem a learning brain faces: how to assign credit for a reward that arrives only after the act that earned it. Two chapters of machinery now stand between that promise and its payment, and they have left us holding precisely the two pieces we need.

The fast chapter built a neuron that can be excited and can excite others, and at its very end it noticed something it had assembled almost in passing. When a neuron fires, the action potential born at the axon hillock does not only travel outward down the axon; a wave of depolarization can also propagate backward, into the soma and into much of the dendritic tree — the back-propagating action potential — so that recently active synapses, especially those on proximal and permissive dendritic regions, are given a postsynaptic signal that the cell has fired. The slow chapter built the broadcast layer, and at its end it named the molecule that layer releases when something turns out better or worse than expected: dopamine, sent not down a wire but by volume, arriving as a phasic burst over a tonic background, slow enough to shape the synapses it reaches.

Hold those two facts side by side and the architecture of this chapter is already visible in outline. A synapse, having just been active, carries a fading mark of its own recent activity — a tag that says, in effect, I was active a moment ago. A modulatory signal can arrive afterward and tell that tag whether the activity it marks turned out to be worth keeping. The fast layer marks what happened; the slow layer, arriving late, decides what it meant. This chapter assembles those two layers into a mechanism that learns.

One scope note before we begin, because it will keep a powerful teaching path from hardening into a false universal. Most of this chapter follows the best-studied case: plasticity at excitatory glutamatergic synapses — especially hippocampal and cortical synapses for Hebbian LTP, and corticostriatal synapses for the dopamine-gated reinforcement learning at the end. Other synapses learn too — inhibitory synapses, the synapses of the modulatory systems themselves, the cerebellum’s distinctive LTD, the many forms expressed on the presynaptic side — but they do not all follow this one template, and where the chapter says “a synapse,” it usually means this particular, well-mapped kind. The narrative runs cleanest if we follow that path and are honest, at the end, about how much of the brain’s plasticity it does and does not cover.

We will build it in the unit’s usual order, from the local and fast to the global and slow. First the idea that started it all — Donald Hebb’s postulate, and the molecular detector, the NMDA receptor, that turns out to implement it almost literally. Then the changes that detector triggers: long-term potentiation and depression, strengthening and weakening, with the amount of calcium as the switch between them and the trafficking of receptors as the change made visible. Then a refinement that timing forces on the whole picture — spike-timing-dependent plasticity, where the back-propagating spike from the fast chapter becomes the engine of a learning rule. Then the synapse talking back, the retrograde signal we have twice promised. And then the turn that makes the chapter a capstone rather than a list of mechanisms: the discovery that a purely correlational rule, however elegant, cannot be the whole story, because correlation is silent about value and mute about delay. The repair is the eligibility trace and the dopamine signal that reads it — the three-factor rule, and with it the credit-assignment problem the overview promised we would end on.

6.1 Hebb’s postulate

The founding idea is older than almost everything else in this unit’s molecular detail, and it was proposed before any of the machinery was known. In 1949 the Canadian psychologist Donald Hebb, in The Organization of Behavior, asked what physical change in the brain could underlie learning, and gave an answer so durable that it now reads as obvious. His proposal was that when one neuron repeatedly takes part in firing another, the connection between them is strengthened — so that thereafter the first is more effective at exciting the second. The synapse, in other words, keeps a record of its own success, and the record is written as a change in strength.

Two features of Hebb’s actual proposal are worth lingering on, because the famous slogan that grew up around it loses them both. The first is that Hebb’s rule is about coincidence with a consequence: cell A must not merely be active at the same time as cell B, it must “take part in firing” B — it must have been active and B must have fired, the two together. That is a coincidence detector’s job description, and we will shortly meet the molecule built to do exactly it. The second is that Hebb’s rule, as he wrote it, is quietly directional and causal. A takes part in firing B; the arrow runs from the input that arrives first to the output it helps produce. This is more than “they were active together.” It is the germ of a claim about which synapse deserves credit — the ones that were active just before the cell fired, and so might have helped cause it. We will see that germ flower into a precise timing rule.

What the rule does, at the level of a network, is turn correlation into structure. Inputs that are reliably active together, and reliably active when their shared target fires, grow strong together; inputs that fire at unrelated times do not. Run this over a population and the synapses come to encode the statistical regularities of whatever the network experiences — the co-occurrences, the predictable sequences, the features that travel together in the world. This is learning of a particular and limited kind: unsupervised and local. Unsupervised, because nothing outside the synapse tells it what is worth learning; it simply records what co-occurs. Local, because each synapse decides using only information physically available at that synapse — its own activity and the state of the cell it sits on. No global signal, no teacher, no reward. Just correlation, written into weights. Keep that double limitation in view, because the second half of this chapter is the story of why it is not enough.
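The correlation-into-structure claim can be made concrete with a toy rate model. What follows is a minimal sketch under stated assumptions, not a model from the text. Activity is coded as zero-mean ±1 fluctuations. The neuron's output is the weighted sum of its inputs. The rule is the bare Hebbian product `dw = eta * x * y`, with no normalization. Inputs 0 and 1 are driven by one shared event, while inputs 2 and 3 fluctuate independently. The function name `hebbian_run` and all parameter values are hypothetical.

```python
import numpy as np

def hebbian_run(n_steps=2000, eta=0.001, seed=0):
    """Plain Hebbian learning, dw = eta * x * y, on a linear toy neuron.

    Assumptions (illustrative only): activity is coded as zero-mean +/-1
    fluctuations, the output is y = w . x, and nothing normalizes or
    decays the weights.
    """
    rng = np.random.default_rng(seed)
    w = np.full(4, 0.1)                          # four synapses, equal start
    for _ in range(n_steps):
        shared = rng.choice([-1.0, 1.0])         # one event drives inputs 0 and 1 together
        noise = rng.choice([-1.0, 1.0], size=2)  # inputs 2 and 3 fluctuate independently
        x = np.array([shared, shared, noise[0], noise[1]])
        y = w @ x                                # postsynaptic activity
        w += eta * x * y                         # Hebb: change ~ pre * post
    return w

w = hebbian_run()
print(np.round(w, 2))
```

With these settings, the two co-occurring inputs should end up several times stronger than the independent ones. Only they are consistently correlated with the output they jointly drive. Notice also that every weight tends to grow and nothing in the rule stops it. That is the instability section 6.6 takes up.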

Hebb’s postulate is almost always paraphrased and almost never quoted, which has let it drift. His own formulation was that when an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change occurs in one or both cells such that A’s efficiency as one of the cells firing B is increased. Three things are notable. He explicitly allowed the change to be presynaptic, postsynaptic, or both — a question that would be fought over for decades and that we will meet again as the “locus of expression” debate. He framed it in terms of firing, not mere co-activation, building causation into the rule from the start. And he proposed it as frankly speculative; he had no mechanism, and the molecular detector that would vindicate him lay more than three decades in the future.

The catchphrase “cells that fire together wire together” is not Hebb’s. It was coined much later, by the neuroscientist Carla Shatz, as a compression of Hebb’s idea for a general audience, and it is a brilliant piece of mnemonic engineering. But it is a lossy compression. “Fire together” drops the directional, causal element — the takes part in firing — and flattens Hebb’s asymmetric, sequence-sensitive proposal into a symmetric statement about simultaneity. As we will see when we reach spike-timing-dependent plasticity, the difference between “fired together” and “one fired just before and helped cause the other” is not pedantic. It is the difference between detecting correlation and detecting prediction, and the synapse turns out to care about exactly that difference.

6.2 The molecular detector: the NMDA receptor

Hebb described a coincidence detector without knowing whether the brain contained one. It does, and we have already met it three times in this unit without watching it work. At an excitatory glutamate synapse, two ionotropic receptors sit side by side in the postsynaptic membrane. The AMPA receptor is the workhorse the fast chapter described: glutamate binds, the channel opens, sodium floods in, and the membrane depolarizes into the brief EPSP that carries the moment-to-moment signal. The NMDA receptor looks like its cousin but behaves strangely, and the strangeness is the whole point. At the resting potential, its pore is physically corked by a magnesium ion drawn into it from outside the cell (the cameo the magnesium row of the ion table was waiting for), so almost no current flows even when glutamate is bound. Substantial current passes only when the membrane is already depolarized. Depolarization weakens the electrical pull holding the positively charged magnesium in the pore, lets it escape, and unblocks the channel.

Read off what that means and you have Hebb’s detector in a single molecule. The NMDA receptor passes current only when two conditions are met at once: glutamate must be present — meaning the presynaptic cell was active and released transmitter — and the postsynaptic membrane must be sufficiently depolarized — driven there by the cell’s other inputs or by a back-propagating spike sweeping past. Presynaptic activity supplies the glutamate; local postsynaptic depolarization unplugs the magnesium. Only their conjunction lets calcium through. This is the coincidence Hebb needed — input active and target driven toward firing — detected at the level of one receptor by the meeting of a chemical cue and an electrical one. (Note the receptor reads local dendritic depolarization, which usually but not always means the cell as a whole is firing; a strong dendritic event can satisfy it without a full somatic spike, a subtlety the STDP section will return to.)
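The AND-gate logic can be written down directly. The sketch assumes the co-agonist site is occupied. For the voltage dependence of the magnesium block, it uses a widely used empirical fit attributed to Jahr and Stevens (1990). That fit is a modeling convenience, not something stated in this chapter, and the function names are hypothetical.

```python
import math

def mg_unblocked_fraction(v_mv, mg_mm=1.0):
    """Fraction of glutamate-bound NMDA channels NOT blocked by Mg2+.

    Empirical fit (Jahr and Stevens, 1990), assumed here for illustration:
    B(V) = 1 / (1 + [Mg]/3.57 * exp(-0.062 * V)), V in mV, [Mg] in mM.
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))

def nmda_open_fraction(glutamate_bound, v_mv):
    """Toy coincidence detector: current needs glutamate AND depolarization."""
    if not glutamate_bound:
        return 0.0                      # no presynaptic signal: channel stays shut
    return mg_unblocked_fraction(v_mv)  # bound: now voltage decides

for glu in (False, True):
    for v in (-70.0, -20.0):
        print(f"glutamate={glu!s:5}  V={v:6.1f} mV  "
              f"unblocked={nmda_open_fraction(glu, v):.3f}")
```

In this fit, only about 4 percent of glutamate-bound channels escape the block at -70 mV. At -20 mV, roughly half do. Without glutamate the answer is zero at any voltage, so neither condition alone opens the gate.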

And what comes through the unplugged channel is the decisive part. The NMDA receptor is unusual among glutamate-gated channels in being substantially permeable to calcium. So when, and only when, the coincidence is satisfied, calcium enters the postsynaptic spine. We met calcium in the last chapter as the brain’s great internal courier, the second messenger that triggers transmitter release on the presynaptic side; here it does the converse job on the postsynaptic side. The calcium that floods a spine through open NMDA receptors is the signal that says the Hebbian condition was just met here — and, as the next section shows, it is also the signal that decides what to do about it. The AMPA receptor carries the message; the NMDA receptor watches for coincidence and, when it finds it, admits the calcium that will rewrite the synapse. Two receptors for one molecule, two jobs, and the division of labor is exactly the molecule-versus-receptor principle this unit has pressed from the sponge onward.

The NMDA receptor’s coincidence requirement is even stricter than the two-condition story lets on, and the extra condition reaches back to the cells chapter. Opening the channel requires not only glutamate but a co-agonist bound at a separate site — either glycine or D-serine. This matters because it hands a third party partial control over whether the detector can work at all. Recall the gliotransmission deeper dive from the chapter on cells: the astrocyte wrapped around the synapse can regulate the local availability of D-serine, and we flagged then that a cell controlling D-serine “could therefore influence whether nearby synapses are capable of strengthening.” Here is that thread paid off — with a caveat the cells chapter would endorse. Astrocytes are one important regulator of this co-agonist environment, not its sole master: the relative contribution of astrocytic versus neuronal D-serine, and of glycine versus D-serine, varies across synapses and across development, and the question is still actively worked on. The defensible and interesting claim is the weaker one: the tripartite synapse is not only a housekeeper that clears glutamate and buffers potassium; through its hand in the co-agonist supply, the astrocyte can act as a conditional influence on plasticity itself. Learning at a synapse can depend, in part, on the glial cell next to it — which is already a long way from glue.

The amount and speed of the calcium signal also depend on physical geometry that the clean account omits. Calcium entering through NMDA receptors does not flood the spine uniformly; it forms steep, short-lived nanodomains near the channel mouth, and the dendritic spine — that small protrusion the cells chapter introduced — exists in part to compartmentalize this calcium, sealing it into a tiny volume so that the signal stays local to the one synapse that earned it rather than spilling to its neighbors. The narrow spine neck is an electrical amplifier, as the fast chapter noted, and also a diffusional bottleneck that keeps each synapse’s calcium its own. This is why plasticity can be synapse-specific: the spine is a biochemical isolation chamber, and the calcium verdict reached inside one spine need not be shared with the spine beside it.

6.3 Strengthening and weakening: LTP and LTD

A detector is only useful if something acts on what it detects. In 1973 Tim Bliss and Terje Lømo, recording in the hippocampus of anesthetized rabbits, supplied the missing demonstration. They delivered a brief, high-frequency burst of stimulation to a bundle of axons and found that the synaptic response it evoked grew larger — and, crucially, stayed larger for hours afterward. A few seconds of intense activity had produced a lasting increase in synaptic strength. They had found long-term potentiation, or LTP, and it remains the most studied candidate mechanism for memory in the brain. The mirror-image phenomenon, long-term depression or LTD — a lasting decrease in synaptic strength, typically produced by prolonged low-frequency activity — was characterized later, and it is just as important, because a synapse that can only strengthen is a synapse that will eventually saturate and stop carrying information. Memory needs both a way to write and a way to erase.

What determines which one a synapse undergoes? In the classic NMDA-dependent hippocampal case, an elegant first approximation is that both are triggered by the same thing — calcium entering through NMDA receptors — and that the amount of calcium is the switch. A large, rapid rise in postsynaptic calcium drives potentiation; a smaller, more prolonged rise drives depression; and a trickle below some floor does nothing at all. The same messenger, read by its concentration and time course, produces opposite verdicts. This is why high-frequency stimulation, which drives NMDA receptors hard and pours calcium in fast, tends to potentiate, while low-frequency stimulation, which lets in a thin sustained dribble, tends to depress. The picture is a genuine simplification — LTD in particular has several mechanisms, and even among glutamatergic synapses the outcome depends on receptor subtype, dendritic location, prior activity, and neuromodulatory state — but as a way to grasp how one signal yields two opposite changes, the calcium-amount switch is the right first idea: the synapse reads a single calcium signal against thresholds and chooses to strengthen, weaken, or stand pat accordingly.
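The threshold reading of calcium can be expressed as a three-way rule. The two thresholds below are hypothetical placeholders: `theta_d` for depression and `theta_p` for potentiation. The calcium units are arbitrary. Only the ordering comes from the text: nothing below a floor, depression in the middle band, potentiation at the top.

```python
def plasticity_verdict(peak_calcium, theta_d=0.35, theta_p=0.55):
    """Classify a peak postsynaptic calcium level (arbitrary units).

    Hypothetical thresholds: below theta_d nothing happens, between
    theta_d and theta_p the synapse depresses, at or above theta_p it
    potentiates.
    """
    if not theta_d < theta_p:
        raise ValueError("depression threshold must sit below potentiation threshold")
    if peak_calcium < theta_d:
        return "no change"
    if peak_calcium < theta_p:
        return "LTD"
    return "LTP"

for label, ca in [("trickle", 0.1),
                  ("low-frequency dribble", 0.45),
                  ("high-frequency surge", 0.9)]:
    print(f"{label:22s} -> {plasticity_verdict(ca)}")
```

In this sketch, a thin, sustained dribble peaking at 0.45 lands in the LTD band. A sharp surge to 0.9 crosses into LTP.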

The change itself — what “stronger” and “weaker” physically mean — turns, in many of the best-studied glutamatergic synapses, largely on the other receptor. The NMDA receptor is the detector, but it is the AMPA receptors that carry the ordinary synaptic current, so one major way to change a synapse’s strength, especially at the much-studied hippocampal CA1 synapse, is to change how many AMPA receptors it has (and how well they conduct). Potentiation inserts additional AMPA receptors into the postsynaptic membrane: with more receptors present, the same puff of glutamate opens more channels, admits more sodium, and produces a larger EPSP. Depression does the reverse, removing AMPA receptors so that the same glutamate produces a smaller response. The detector stays put and keeps watching; the workhorse population is what grows or shrinks. This postsynaptic, receptor-trafficking route is not the only way a synapse can change its weight — some forms are expressed presynaptically, as a change in how much transmitter is released, which the retrograde-signaling section will reach — but it is the cleanest and best-documented, and it is the precise sense in which the fast chapter’s promise comes due: AMPA and NMDA, the two receptors it distinguished “because they return in the plasticity chapter,” turn out to play the two distinct roles plasticity requires, one sensing, one expressing.

The calcium-amount switch is implemented by a competition between two kinds of enzyme reading the same signal. A large, fast calcium transient preferentially activates a protein kinase — calcium/calmodulin-dependent protein kinase II, or CaMKII, one of the most abundant proteins in the postsynaptic density — which phosphorylates targets that drive AMPA receptors into the synapse and is itself capable of autophosphorylation, switching into a persistently active state that can outlast the calcium pulse that started it. CaMKII is, in effect, a molecular switch with memory, and it has long been a leading candidate for part of how a brief event leaves a durable trace. A smaller, prolonged calcium rise instead preferentially engages phosphatases (calcineurin and, downstream, PP1), which dephosphorylate the same targets and drive AMPA receptors out. Kinase versus phosphatase, set against calcium concentration: that is the dual-threshold model, associated especially with John Lisman, and while real synapses are messier than any two-enzyme cartoon, the principle — opposing biochemical processes with different calcium sensitivities — is the cleanest way to understand how one messenger yields two verdicts.
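Here is a deliberately crude sketch of how two enzymes with different calcium sensitivities can turn one signal into opposite verdicts. Each enzyme's activity is modeled as a Hill curve. The phosphatase stands in for calcineurin and PP1. It gets a lower half-activation point but a smaller maximal effect. The kinase stands in for CaMKII. It gets a higher half-activation point and a larger maximal effect. Every number is hypothetical, chosen only to reproduce the qualitative shape the dual-threshold model describes.

```python
def hill(x, k, n):
    """Hill activation curve: fraction of maximal activity at input x."""
    return x**n / (x**n + k**n)

def net_phosphorylation_drive(calcium):
    """Toy kinase-versus-phosphatase competition (all parameters hypothetical).

    Positive output ~ net AMPA receptor insertion (LTP-like);
    negative output ~ net AMPA receptor removal (LTD-like).
    """
    phosphatase = 1.0 * hill(calcium, k=0.5, n=4)  # engaged by modest calcium
    kinase = 2.0 * hill(calcium, k=2.0, n=4)       # needs more calcium, wins when engaged
    return kinase - phosphatase

for ca in (0.1, 1.0, 5.0):
    print(ca, round(net_phosphorylation_drive(ca), 3))
```

With these parameters, the net drive behaves in three regimes:

- At very low calcium it is close to zero. Both curves are small there, though neither is exactly zero.
- At moderate calcium it is strongly negative.
- At high calcium it is positive.

One messenger, two verdicts.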

The receptor movements have their own rich biology. AMPA receptors are not static furniture; they cycle continuously between the membrane and internal pools and diffuse laterally within the membrane, and LTP/LTD bias this traffic toward insertion or removal at the synapse (work associated with Malenka, Malinow, Nicoll, and others). A striking special case is the silent synapse: a nascent connection that has NMDA receptors but no AMPA receptors, and so passes no current at the resting potential — it is electrically silent until the day an LTP-inducing event inserts its first AMPA receptors and “unsilences” it. This is plasticity creating a functional synapse where there was effectively none, and it is especially prominent in development. Finally, lasting potentiation is accompanied by structural change: the spine physically enlarges as it strengthens and shrinks as it weakens, so that the synapse’s electrical weight and its anatomical size move together — memory written not only in receptor counts but in the shape of the dendritic tree.
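The silent-synapse idea can be checked with back-of-envelope arithmetic, and so can the receptor-count picture of strength from the paragraphs above. The sketch rests on several simplifying assumptions:

- Every bound receptor is treated as open at the peak.
- AMPA and NMDA receptors share one single-channel conductance and a reversal potential of 0 mV.
- The NMDA magnesium block uses a commonly cited empirical fit, attributed to Jahr and Stevens (1990).

The function names and numbers are hypothetical.

```python
import math

def mg_unblocked_fraction(v_mv, mg_mm=1.0):
    """Empirical Mg2+ block fit (Jahr and Stevens, 1990): V in mV, [Mg2+] in mM."""
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))

def peak_epsc_pa(v_mv, n_ampa, n_nmda, gamma_ps=10.0, e_rev_mv=0.0):
    """Peak synaptic current in pA (negative = inward) at holding potential v_mv.

    Hypothetical simplifications: all bound receptors are open at the peak,
    both receptor types share one single-channel conductance gamma_ps (pS)
    and one reversal potential e_rev_mv, and only NMDA channels feel Mg2+ block.
    """
    g_ampa_ps = n_ampa * gamma_ps
    g_nmda_ps = n_nmda * gamma_ps * mg_unblocked_fraction(v_mv)
    # pS * mV = fA, so divide by 1000 to report pA.
    return (g_ampa_ps + g_nmda_ps) * (v_mv - e_rev_mv) / 1000.0

silent_rest = peak_epsc_pa(-70.0, n_ampa=0, n_nmda=10)
silent_depol = peak_epsc_pa(+40.0, n_ampa=0, n_nmda=10)
unsilenced_rest = peak_epsc_pa(-70.0, n_ampa=5, n_nmda=10)
print(f"silent synapse at -70 mV:      {silent_rest:.2f} pA")
print(f"silent synapse at +40 mV:      {silent_depol:.2f} pA")
print(f"after AMPA insertion, -70 mV:  {unsilenced_rest:.2f} pA")
```

Consider a synapse with ten NMDA receptors and no AMPA receptors. It passes only a fraction of a picoampere at -70 mV, but nearly 4 pA at +40 mV, where the block is relieved. Inserting five AMPA receptors makes the resting response more than ten times larger. That contrast, a response at depolarized potentials and almost none at rest, is essentially how silent synapses reveal themselves in a recording.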

6.4 Timing is everything: spike-timing-dependent plasticity

Hebb’s buried causal arrow now resurfaces and demands a sharper rule. If what should be strengthened is the input that helped cause the output, then the bare fact of co-activation is not enough — order must matter. An input that arrived just before the cell fired might have contributed to that firing and deserves credit; an input that arrived just after could not possibly have helped cause a spike that had already happened, and deserves none. In the late 1990s, experiments by Henry Markram and by Guo-qiang Bi and Mu-ming Poo, among others, showed that real synapses can follow exactly this logic, with a precision measured in milliseconds. The phenomenon is spike-timing-dependent plasticity, or STDP, and it is Hebb’s postulate made temporal.

The rule is simple to state and worth fixing in mind. If the presynaptic spike precedes the postsynaptic spike by a short interval — up to roughly twenty milliseconds — the synapse is potentiated, and the closer the timing, the stronger the effect. If the order is reversed, the postsynaptic spike preceding the presynaptic one within the same brief window, the synapse is depressed. Outside the window, in either direction, little happens. So the sign of the change flips with the order of firing, and flips sharply right around zero: a few milliseconds’ difference in timing is the difference between strengthening a synapse and weakening it. The synapse is not asking “were we both active?” It is asking “did my input lead the cell’s output, or trail it?” — and answering, in the leading case, as if to say I may have helped, strengthen me, and in the trailing case, I came too late to matter, weaken me. This is the canonical pair-based STDP rule, seen most clearly in certain excitatory synapses under controlled, low-frequency pairing; real synapses can modify it considerably depending on dendritic location, burst structure, firing rate, and neuromodulatory state, and at high firing rates the order-dependence can give way to net potentiation regardless of timing — so take the clean rule as the central case, not an invariant law (the deeper dive maps where it frays).
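The canonical pair-based rule is short enough to write out. The sketch assumes the exponential window shape commonly used to fit the experimental curve. The amplitudes `a_plus` and `a_minus` are illustrative, and the 20 ms time constants were chosen to match the window of roughly twenty milliseconds described above. Treating exact coincidence as no change is a convention, not a measurement.

```python
import math

def stdp_weight_change(dt_ms, a_plus=0.010, a_minus=0.0105,
                       tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Pair-based STDP: weight change for one pre/post spike pair.

    dt_ms = t_post - t_pre (milliseconds).
    dt_ms > 0: pre led post -> potentiation, shrinking as the gap widens.
    dt_ms < 0: post led pre -> depression, shrinking as the gap widens.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0  # exact coincidence: no change, by convention

for dt in (-40, -10, -2, 2, 10, 40):
    print(f"dt = {dt:+4d} ms -> dw = {stdp_weight_change(dt):+.5f}")
```

The sign flips between -2 ms and +2 ms while the magnitude barely changes: the sharp switch at zero. The depression amplitude is set slightly larger than the potentiation amplitude. That is a common modeling choice, assumed here, which keeps uncorrelated firing from producing net potentiation.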

The mechanism for this exquisite timing sensitivity is the very signal the fast chapter flagged as its parting gift. The postsynaptic depolarization that unplugs the NMDA receptor’s magnesium is supplied, in large part, by the back-propagating action potential — the wave that washes back over the dendrites when the cell fires. So the coincidence the NMDA receptor detects is, concretely, the overlap in time between glutamate in the cleft (the presynaptic spike) and the back-propagating spike sweeping past (the postsynaptic firing). If glutamate arrives just before the back-propagating spike, the two overlap while glutamate is still bound, the magnesium pops, calcium pours in fast and hard, and the synapse potentiates. If the back-propagating spike has already come and gone before glutamate arrives, the overlap is poor, calcium enters only as a weak trailing dribble, and the synapse depresses — the small-calcium verdict from the previous section. The timing rule and the calcium switch are the same mechanism seen from two angles. The back-propagating spike is not merely a copy of the neuron’s output sent backward for information’s sake; it is the postsynaptic half of a coincidence detector, and its arrival time, relative to the input, is what STDP measures.

The canonical STDP curve — the one reproduced in every textbook — comes largely from Bi and Poo’s 1998 experiments in dissociated hippocampal cultures. Plotting the change in synaptic strength against the time difference $\Delta t = t_{\text{post}} - t_{\text{pre}}$ yields a sharp, asymmetric, roughly double-exponential shape: positive $\Delta t$ (pre leads post) gives potentiation that decays over about twenty milliseconds, negative $\Delta t$ (post leads pre) gives depression over a similar window, and the curve switches sign almost discontinuously at $\Delta t = 0$. It is one of the most reproduced figures in cellular neuroscience, and for good reason: it is a clean, quantitative, almost suspiciously tidy confirmation that synapses care about millisecond order.
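Written as a formula, the double-exponential shape is usually fitted as below. The amplitudes $A_{+}, A_{-}$ and the time constants $\tau_{+}, \tau_{-}$ are free parameters estimated from data, not values given in this chapter. A decay over about twenty milliseconds corresponds to time constants of that order.

$$
\Delta w(\Delta t) =
\begin{cases}
A_{+}\, e^{-\Delta t/\tau_{+}}, & \Delta t > 0 \quad \text{(pre leads post: potentiation)} \\
-A_{-}\, e^{\Delta t/\tau_{-}}, & \Delta t < 0 \quad \text{(post leads pre: depression)}
\end{cases}
\qquad \text{with } \Delta t = t_{\text{post}} - t_{\text{pre}},\; A_{\pm} > 0 .
$$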

It is also less universal than its fame suggests, and a careful reader should know where it frays. The shape depends on where on the dendrite the synapse sits, because the back-propagating spike weakens as it climbs into the dendritic tree, so distal synapses see a smaller, later, less reliable depolarization than proximal ones — the same cable physics that shaped the fast chapter’s account of summation. It depends on frequency: the simple pair-based rule holds at low pairing frequencies, but at higher frequencies the order-dependence tends to wash out and net potentiation dominates regardless of sign, so STDP is really one regime of a more complex frequency- and calcium-dependent picture (work associated with Sjöström and others). It often requires bursts rather than single spikes to engage robustly, and its status as the operative rule in the intact, behaving brain — as opposed to the dish — remains genuinely debated. STDP is best held as a true and illuminating principle about coincidence and order, not as a universal law that every synapse obeys in the same shape. It is, in the unit’s recurring phrase, a powerful first approximation with real and well-mapped exceptions.
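One way to see why the pair-based rule cannot be the whole story is to apply it naively to regular spike trains at different rates. The sketch sums the exponential pair window over every pre/post pair. This all-to-all pairing scheme is itself a modeling assumption. In each train, every postsynaptic spike follows its presynaptic partner by 10 ms. All parameters are illustrative.

```python
import math

def stdp_pair(dt_ms, a_plus=0.010, a_minus=0.0105, tau_ms=20.0):
    """Pair-based STDP window, exponential shape (parameters hypothetical)."""
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0

def net_change_regular_trains(freq_hz, offset_ms=10.0, n_spikes=20):
    """Sum the pair rule over ALL pre/post pairs for two regular trains.

    Presynaptic spikes fire every 1000/freq_hz ms; each postsynaptic spike
    follows its presynaptic partner by offset_ms, so every intended pairing
    is "pre leads post".
    """
    period_ms = 1000.0 / freq_hz
    pre = [k * period_ms for k in range(n_spikes)]
    post = [t + offset_ms for t in pre]
    return sum(stdp_pair(t_post - t_pre) for t_post in post for t_pre in pre)

for f in (1.0, 10.0, 50.0):
    print(f"{f:5.1f} Hz -> net dw = {net_change_regular_trains(f):+.4f}")
```

At 1 Hz, only the intended +10 ms pairings fall inside the window, so the synapse potentiates. At 50 Hz, each postsynaptic spike also sits 10 ms before the next presynaptic spike, and potentiating and depressing pairs nearly cancel. The experiments described above, by contrast, report net potentiation at high frequency. Extensions such as triplet-based or calcium-based rules were developed partly to capture this frequency dependence.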

6.5 The synapse talks back: retrograde signaling

Everything so far has the postsynaptic cell adjusting itself — inserting and removing its own receptors, reading its own calcium, reshaping its own spines. But the cells chapter and the fast chapter both promised a second direction of traffic, “a mechanism we will meet again when we discuss synaptic plasticity”: signals running backward across the synapse, from the postsynaptic cell to the presynaptic terminal. The canonical arrow of the chemical synapse points one way, presynaptic to postsynaptic, and one of the places the neuron doctrine leaks is that this arrow is not the only one. The postsynaptic cell can answer.

The clearest case is a class of molecules called endocannabinoids — so named because they are the brain’s own internal ligands for the same receptors that the cannabis plant’s THC activates. They are made on demand: when a postsynaptic neuron is strongly active and its calcium rises, it synthesizes these lipid messengers in its membrane and releases them, and — because they are fat-soluble — they diffuse the short distance backward across the cleft to the presynaptic terminal. There they bind receptors (the CB1 receptor) that suppress further transmitter release. The consequence depends on which terminal carries those CB1 receptors, and this is the part a generic “brake” framing gets wrong. Suppress a glutamatergic terminal and you turn down excitation onto the cell — a true brake. But suppress a GABAergic terminal and you reduce inhibition, which briefly disinhibits the cell — the opposite of a brake. The two cases even have their own names: DSE, depolarization-induced suppression of excitation, and DSI, depolarization-induced suppression of inhibition. What is common to both is the direction of the message: a strongly active postsynaptic cell reaches backward to adjust the terminals feeding it, dialing its own inputs up or down at their source depending on which inputs bear the receptor. This is genuine retrograde transmission, the promised reversal of the synapse’s usual direction, and it shows that the postsynaptic cell is not merely a passive recipient tuning its own end of the junction — it is an active participant that can regulate its inputs where they originate.
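The sign point is easy to lose in prose, so here it is as arithmetic. The numbers are hypothetical. CB1 activation is reduced to a fractional cut in release at whichever terminals carry the receptor, a deliberate oversimplification.

```python
def net_drive(exc, inh, suppress_exc=0.0, suppress_inh=0.0):
    """Net synaptic drive onto a cell (arbitrary units, hypothetical numbers).

    suppress_exc / suppress_inh: fraction of release cut by CB1 activation
    at glutamatergic / GABAergic terminals respectively.
    """
    return exc * (1.0 - suppress_exc) - inh * (1.0 - suppress_inh)

baseline = net_drive(exc=10.0, inh=6.0)
dse = net_drive(exc=10.0, inh=6.0, suppress_exc=0.5)  # CB1 on excitatory terminals
dsi = net_drive(exc=10.0, inh=6.0, suppress_inh=0.5)  # CB1 on inhibitory terminals
print(f"baseline {baseline:+.1f}, after DSE {dse:+.1f}, after DSI {dsi:+.1f}")
```

The same retrograde message, a 50 percent cut in release, has opposite effects depending on where it lands. On excitatory terminals (DSE) it pushes net drive down. On inhibitory ones (DSI) it pushes net drive up.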

Retrograde signaling matters here for two reasons, one looking back over the chapter and one looking ahead. First, it is itself a form of plasticity: some lasting forms of synaptic weakening are expressed presynaptically, through endocannabinoid signaling that durably reduces release — which is exactly the presynaptic locus of change that Hebb left open and that the field argued over for decades. Plasticity is not only a postsynaptic receptor count; it can be a presynaptic release probability, adjusted by a message sent backward. Second, and setting up the turn the chapter is about to make, retrograde signaling is one of several routes by which a synapse’s strength is held under negative feedback rather than left to the runaway logic of pure Hebbian potentiation — though, as the DSI case shows, the sign of that feedback depends on the circuit, and the real work of keeping the whole system stable falls to a broader set of mechanisms we turn to next. That problem of runaway is the hinge on which the rest of the chapter turns.

The two principal endocannabinoids in the brain are 2-arachidonoylglycerol (2-AG) and anandamide (the latter named from the Sanskrit ananda, “bliss”). Unlike classical transmitters, they are not pre-synthesized and stored in vesicles; they are cleaved from membrane lipid precursors on demand, in response to postsynaptic depolarization and calcium influx and to activation of postsynaptic metabotropic glutamate receptors — the doorbell receptors of the last chapter, here triggering lipid synthesis rather than opening a channel. Made on the spot, they cross to the presynaptic terminal and act on CB1 receptors, among the most abundant G-protein-coupled receptors in the brain, to inhibit transmitter release. The textbook demonstrations are DSI and DSE — depolarization-induced suppression of inhibition and of excitation — in which strongly depolarizing the postsynaptic cell briefly quiets its own incoming GABAergic or glutamatergic terminals for a few seconds, a transient retrograde effect visible directly in a recording.

This system is also the reason cannabis touches memory and cognition, which ties the section back to the previous chapter’s theme that nearly every psychoactive drug works by leaning on the brain’s own signaling. THC is a partial agonist at CB1; it does not introduce a foreign mechanism so much as hijack an endogenous retrograde-signaling system that normally operates locally, transiently, and on demand, flooding it instead with a diffuse, sustained, exogenous drive. Because that endogenous system participates in regulating synaptic plasticity and the timing of release across wide regions including the hippocampus, blanketing it with an outside agonist perturbs exactly the machinery this chapter is about. This is widely thought to be a major cellular basis of the drug’s familiar effects on short-term memory and on the formation of new memories.

6.6 Why correlation is not enough

We now have a complete, self-consistent, and genuinely powerful account of synaptic learning — and it is not enough to explain a learning animal. The gap is worth stating sharply, because the rest of the unit exists to close it, and because seeing the gap clearly is what makes the repair intelligible rather than arbitrary. A Hebbian synapse, even sharpened by STDP and stabilized by retrograde feedback, suffers from two limitations, and they are different in kind.

The first is instability, and it is built into the rule’s logic. Hebbian potentiation is positive feedback: a synapse that helps fire its target gets stronger, which makes it better at firing its target, which makes it stronger still. Left unchecked, the strong synapses run away to saturation and the weak ones collapse to nothing, and a network that has done this has stopped learning, because it can no longer change in response to anything new. Real synapses plainly avoid this fate, which means real brains must run counter-mechanisms — the retrograde feedback of the previous section, and, more globally, homeostatic processes by which a neuron monitors its own average activity and scales its synapses up or down to keep itself in a workable range. These are essential, but they are housekeeping: they keep the Hebbian rule from destroying itself. They do not address the second limitation, which is the deep one.
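The difference between runaway and housekeeping shows up clearly in a toy rate model. What follows is a sketch under stated assumptions, not a model of real homeostatic machinery:

- Inputs are independent uniform(0, 1) rates.
- The output is the weighted sum of the inputs.
- Hebbian growth is `dw = eta * x * y`.
- "Synaptic scaling" is a cartoon: the neuron keeps a running average of its own output and multiplies all its weights by a factor that pushes that average toward a target.

The function name and every parameter are hypothetical.

```python
import numpy as np

def simulate(homeostasis, n_steps=3000, n_inputs=10, eta=0.001,
             beta=0.01, alpha=0.05, target=1.0, seed=1):
    """Rate-based Hebbian neuron, with or without multiplicative scaling.

    Returns (mean output over the last 500 steps, final weights).
    Scaling multiplies every weight by (1 + beta * (target - r_avg)),
    where r_avg is a running average of the neuron's own output.
    """
    rng = np.random.default_rng(seed)
    w = np.full(n_inputs, 0.1)
    r_avg = 0.0
    outputs = []
    for _ in range(n_steps):
        x = rng.random(n_inputs)               # presynaptic rates
        y = w @ x                              # postsynaptic rate
        w += eta * x * y                       # Hebbian positive feedback
        if homeostasis:
            r_avg += alpha * (y - r_avg)       # track own average activity
            w *= 1.0 + beta * (target - r_avg) # scale all synapses together
        outputs.append(y)
    return float(np.mean(outputs[-500:])), w

plain_rate, _ = simulate(homeostasis=False)
stable_rate, w_stable = simulate(homeostasis=True)
print(f"plain Hebb, late mean output:   {plain_rate:.1f}")
print(f"with scaling, late mean output: {stable_rate:.2f}")
```

Without scaling, the late output should be in the hundreds and still climbing. With scaling, it should settle slightly above the target, where the Hebbian push and the homeostatic pull balance. Note that scaling only keeps activity in range. Nothing in either version knows anything about value, which is the second limitation.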

The second limitation is that a Hebbian rule is silent about value. It records what co-occurs, faithfully and indiscriminately — but co-occurrence is not the same as importance, and certainly not the same as usefulness to the animal. A rat’s brain registers a thousand correlations a minute, the overwhelming majority of which mean nothing for its survival; a purely Hebbian system would strengthen the synapses encoding all of them equally, learning the irrelevant with the same diligence as the crucial. What it cannot do, on its own, is learn that this pattern was the one that led to food and that one led nowhere — because nothing in the local coincidence of pre- and postsynaptic firing carries any information about outcome. And the problem is made worse by time. The outcome that would tell a synapse whether its activity mattered — the food, the escape, the reward — typically arrives seconds after the neural activity that earned it, by which point the coincidence that should be credited is over and gone. The synapse that fired at the right moment has no way to know, at the moment it fired, that it was about to be vindicated; and by the time vindication arrives, the moment has passed. This is the credit-assignment problem, and it is the problem the overview promised this unit would end on: how can a brain assign credit for a reward that arrives only after the act that earned it?

A correlational rule cannot solve this, because it has no notion of reward and no way to bridge the delay. Solving it requires two new things that Hebb’s local rule does not contain. It requires a signal that reports value — a global broadcast that says that was better, or worse, than expected — and it requires a way for a synapse to hold its eligibility open across the gap in time, so that a verdict arriving seconds late can still find the synapse that earned it. We have, conveniently, spent the last chapter building the first, and the fast chapter built the raw material of the second. The synthesis is the rest of this chapter.

6.7 The eligibility trace

Take the delay problem first, because it has a clean and almost obvious solution once stated, and because the fast chapter already handed us the mechanism. The trouble is that the synapse’s moment of relevant activity and the moment of reward are separated in time. The solution is to make the activity leave a trace — a fading mark that lingers for a few seconds after the Hebbian condition is met, so that there is still something present for a late-arriving signal to act on. A synapse that has just satisfied the coincidence condition is left, for a brief window, in a flagged or eligible state: not yet changed, but marked as a candidate for change, with the mark decaying over seconds. If a reinforcement signal arrives while the flag is still up, the synapse consolidates its change; if no such signal arrives, the flag fades and the synapse returns to where it was, the candidate change quietly abandoned. This decaying flag is the eligibility trace, and it is the bridge across the temporal gap.
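The flag-and-window logic fits in a few lines. This is a minimal sketch, assuming an exponentially decaying trace with a one-second time constant and a threshold below which the flag counts as faded. Both numbers, and the function name `final_weight`, are hypothetical; the text commits only to a mark that decays over seconds.

```python
import math

def final_weight(reward_delay_s, tau_s=1.0, threshold=0.2, w0=1.0, dw=0.5):
    """Toy eligibility trace for one synapse (hypothetical parameters).

    At t = 0 the Hebbian coincidence raises a flag of height 1.0, which then
    decays as exp(-t / tau_s). A reinforcement signal arriving reward_delay_s
    later consolidates the candidate change only if the flag is still above
    threshold; otherwise the change is abandoned and the weight stays at w0.
    """
    trace = math.exp(-reward_delay_s / tau_s)   # flag height when reward arrives
    if trace >= threshold:
        return w0 + dw * trace                  # consolidate, scaled by the flag
    return w0                                   # flag has faded: no lasting change

for delay in (0.5, 1.0, 3.0):
    print(delay, final_weight(delay))
```

With these placeholder values the window closes at about 1.6 s (where exp(-t/τ) falls to 0.2), so a reward at 0.5 s or 1 s consolidates a change, larger for the earlier reward, while a reward at 3 s finds nothing left to act on.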

Notice how exactly this matches the two clues we opened with, and how it cashes out the lecturer’s framing that has run through the unit’s last two chapters. The back-propagating action potential is one way the postsynaptic side of the coincidence gets supplied — by carrying news of the cell’s firing back over recently active synapses, it helps leave each participating synapse marked as I was active a moment ago, and the timing was right. But the eligibility trace is not the electrical spike lingering in the dendrite; the spike is long gone within milliseconds. The trace is better understood as a short-lived biochemical state at the recently active synapse — a configuration of molecules, set in motion by the pre- and postsynaptic coincidence, that persists for seconds and decays. The back-propagating spike helps set the flag; the flag itself is chemistry, not voltage. And the reinforcement signal that reads it, arriving seconds later to decide the marked synapse’s fate, is the slow modulatory broadcast, with dopamine chief among its signals. The fast layer marks what happened, and a chemical residue of the mark persists; the slow layer, arriving late, decides what it meant. The eligibility trace is the physical embodiment of that division of labor, the thing that lets “arriving late” still work.

How real is this flag? The concept comes originally from the mathematics of reinforcement learning, where an eligibility trace is exactly the device that lets a delayed reward be assigned to an earlier action, and for years it was more a theoretical necessity than an observed object. The biological evidence has caught up, most cleanly in the striatum, where the dopamine signal has been shown to strengthen recently active corticostriatal synapses only if it arrives within a narrow window — on the order of a second or two — after the activity, exactly as an eligibility trace of that duration predicts. The molecular identity of the trace is still being worked out, and the trace appears to operate on more than one timescale, but the principle now has experimental ground under it: a synapse can hold its eligibility open for a short, definite window, waiting to learn whether what it just did was worth keeping.

There is a second, slower phenomenon that embodies the same eligibility logic at the timescale of memory consolidation rather than second-by-second reinforcement, and it is worth pairing with the eligibility trace because each illuminates the other. In 1997 Uwe Frey and Richard Morris demonstrated synaptic tagging and capture. The puzzle they addressed is that lasting, “late” LTP requires the synthesis of new proteins, but protein synthesis is a relatively global, cell-wide affair, while LTP is synapse-specific — so how do the newly made proteins know which synapses to act on? Their answer: a synapse that undergoes plasticity sets a local tag, a transient, synapse-specific mark (lasting on the order of an hour or two) that does not by itself trigger protein synthesis. A sufficiently strong event — at one synapse, or elsewhere on the cell — triggers the cell-wide production of plasticity-related proteins. Any synapse still bearing a tag can then capture those proteins and use them to convert its transient change into a lasting one. Tagged synapses consolidate; untagged ones do not, even though the proteins were available to all.
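The capture logic reduces to asking whether two windows overlap: the lifetime of a synapse’s local tag, and the period during which plasticity-related proteins are available in the cell. This is a minimal sketch, assuming both windows last 1.5 hours (a placeholder for the text’s “an hour or two”) and assuming the proteins stay available for a while after the strong event, which lets a tag set shortly afterward still capture them. The function name `captures` and all times are hypothetical.

```python
def captures(tag_time_h, prp_time_h, tag_life_h=1.5, prp_life_h=1.5):
    """Toy synaptic tagging and capture; all times in hours, values hypothetical.

    tag_time_h: when this synapse set its local tag (None = never tagged).
    prp_time_h: when a strong event triggered cell-wide production of
        plasticity-related proteins (PRPs).
    """
    if tag_time_h is None:
        return False                       # untagged: the proteins pass it by
    tag_end = tag_time_h + tag_life_h      # the tag has decayed after this
    prp_end = prp_time_h + prp_life_h      # the proteins have run out after this
    # Capture needs some moment when the tag is set AND the proteins are present.
    return tag_time_h <= prp_end and prp_time_h <= tag_end

# Four weakly stimulated synapses on one cell; one strong event at t = 1 h.
synapses = {"tagged at -2 h": -2.0, "tagged at 0 h": 0.0,
            "tagged at 2 h": 2.0, "never tagged": None}
lasting = {name: captures(t, prp_time_h=1.0) for name, t in synapses.items()}
print(lasting)
```

The synapse tagged three hours before the strong event has already lost its tag, and the untagged synapse never had one, so only the two synapses whose tags overlap the protein supply consolidate.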

The structural parallel to the eligibility trace is exact: the tag is an eligibility flag, and the protein-synthesis-triggering event is a consolidation gate, and only a synapse that is both flagged and gated undergoes lasting change. The two ideas operate at different timescales — the dopamine-reinforcement eligibility trace is a matter of seconds, the synaptic tag a matter of an hour — and they may well be distinct molecular phenomena. But they are the same architectural solution to the same problem, the problem of letting a global, late signal act selectively on the specific synapses that earned it. And the gate is, suggestively, often neuromodulatory: dopamine acting at D1/D5 receptors is one of the signals that can drive the synthesis and capture of plasticity-related proteins, which is the cellular point of contact between the consolidation machinery here and the reward machinery of the next two sections.

6.8 Dopamine and the reward-prediction error

The eligibility trace supplies a synapse waiting to be told whether it should keep its change. What signal does the telling? The overview and the last chapter have been pointing at the answer for some time: a modulatory broadcast that reports value, released by volume, slow enough to act on a trace. The best-understood such signal is dopamine, and the discovery of what it actually encodes is one of the most satisfying results in modern neuroscience, because it turned out to match a piece of theory that had been developed entirely independently.

The naive guess is that dopamine signals reward — that the midbrain dopamine neurons fire when something good happens. The work of Wolfram Schultz and colleagues, recording from midbrain dopamine neurons (in the substantia nigra and ventral tegmental area) in monkeys learning to associate cues with juice rewards, showed that the truth is more interesting and more useful. The neurons do fire to an unexpected reward. But once an animal has learned that a particular cue predicts the reward, the dopamine burst moves backward in time to the cue, and the reward itself — now fully predicted — evokes no burst at all. And if a predicted reward is then withheld, the neurons pause, dropping below their baseline rate at exactly the moment the reward was expected and did not come. Put these together and a pattern is unmistakable: in these classic conditioning tasks, many midbrain dopamine neurons behaved less like simple reward detectors and more like reward-prediction-error neurons — signaling the difference between the reward received and the reward expected. A better-than-expected outcome drives a positive burst; an as-expected outcome drives nothing; a worse-than-expected outcome drives a dip. The phasic dopamine signal, at least in large part, is a running commentary on surprise about value.

This is exactly the teaching signal a learning system needs, and exactly the one a Hebbian rule lacks. A reward-prediction error is, by construction, a measure of what the animal has not yet learned: it is large when prediction is poor and shrinks toward zero as prediction improves, switching itself off precisely when there is nothing left to learn. Broadcast by volume across the striatum and cortex, read slowly through metabotropic dopamine receptors, arriving in the seconds after the activity that might have earned it, the phasic dopamine signal is built to find eligibility traces and tell them whether to consolidate. And it exploits the tonic-versus-phasic distinction the last chapter built on the overview’s affinity principle: a tonic dopamine level sets background motivational tone, while the brief phasic burst, riding above that baseline, delivers the event-specific error signal. The same molecule says one thing steadily and another thing in a spike, read apart by receptors of differing affinity — the affinity principle, one last time, turned into a learning signal.

The theory that dopamine turned out to match was built by psychologists and computer scientists who were not studying dopamine at all. In 1972 Robert Rescorla and Allan Wagner proposed that associative learning is driven not by mere co-occurrence but by surprise. In their model, the change in the associative strength V of a cue is proportional to the gap between the outcome actually received, \lambda, and the outcome already predicted:

\Delta V = \alpha\,(\lambda - V)

where \alpha is a learning rate. The bracketed term is a prediction error: learning is fast when prediction is poor (\lambda - V large) and stops when prediction is perfect (\lambda = V, so \Delta V = 0). When several cues are present at once — the case needed to explain blocking — what matters is the total prediction made by all of them together, so the error term becomes \lambda - \sum_i V_i, the gap between the outcome and the summed prediction of every cue present. This single idea explained phenomena that simple contiguity could not — most famously blocking (Kamin): if one cue already predicts a reward, a second cue introduced alongside it learns nothing, because the first cue has already driven \sum_i V_i up to \lambda, leaving no error and so no surprise to drive learning about the newcomer. Co-occurrence is present in full; learning does not happen, because the error is gone. This is the precise sense in which value-based learning is not Hebbian: the controlling variable is a global error, not a local correlation.
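Blocking falls straight out of the summed-error form. This is a minimal sketch, assuming a single learning rate α as in the simplified equation above (the original model also has a separate outcome-specific rate), with λ = 1 on rewarded trials and 0 otherwise; α = 0.3 and the trial counts are arbitrary.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Rescorla-Wagner with a summed prediction.

    trials: list of (cues_present, rewarded) pairs. On each trial every
    present cue i is updated by dV_i = alpha * (lambda - sum_j V_j),
    where the sum runs over the cues present on that trial.
    """
    V = {}
    for cues, rewarded in trials:
        outcome = lam if rewarded else 0.0
        prediction = sum(V.get(c, 0.0) for c in cues)
        error = outcome - prediction          # one shared prediction error
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * error
    return V

# Blocking: cue A is trained alone first, then A and B are trained together.
blocking = rescorla_wagner([({"A"}, True)] * 50 + [({"A", "B"}, True)] * 50)
# Control: A and B are trained together from the start.
control = rescorla_wagner([({"A", "B"}, True)] * 50)
print(blocking, control)
```

In the blocking run V_B stays essentially at zero, because A already predicts the reward by the time B appears and the error is gone; in the control run the two cues split the prediction and end near 0.5 each, even though B co-occurred with reward equally often in both runs.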

Rescorla–Wagner handles outcomes but not timing within a trial. The extension that does is temporal-difference (TD) learning, developed by Richard Sutton and Andrew Barto. TD treats the prediction error as a comparison across successive moments:

\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)

where r_t is any reward received now, V(s) is the estimated future value of a state, and \gamma is a discount factor weighting future reward against present. The crucial term is \gamma V(s_{t+1}): because the error compares the current prediction to the next moment’s prediction plus reward, value can propagate backward in time across a sequence, so that a cue predicting a later reward gradually acquires value of its own. That backward propagation of value is exactly what Schultz observed when the dopamine burst migrated from the reward to the cue that predicted it. The identification, proposed by Montague, Dayan, and Schultz in the 1990s, is that phasic dopamine approximates \delta_t — the TD error. A theory developed to make machines learn from delayed reward had turned out to describe, with surprising accuracy, a major component of how midbrain dopamine neurons behave — one of those rare moments when an abstraction invented for engineering reasons lands squarely on a piece of biology. Modern work has refined rather than overturned this: dopamine neurons are more heterogeneous than a single scalar error suggests, many also carry movement, salience, and ramping signals, the signals differ across striatal territories, and there is evidence the brain represents a distribution of prediction errors rather than one average. The core correspondence between phasic dopamine and a reward-prediction error remains one of the field’s load-bearing results; the claim that dopamine is nothing but a scalar RPE does not, and the horizons box at the chapter’s end says where that line now sits.
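The migration of the error from reward to cue can be reproduced with a small TD(0) simulation. The sketch assumes one state per time step after the cue (a common textbook representation, not one the text specifies), a reward of 1 at the last step, γ = 1, and a pre-cue state whose value is held at zero because cue onset is unpredictable. The name `td_trials` is hypothetical; for each trial it returns the error at cue onset followed by \delta_t at each post-cue step.

```python
def td_trials(n_trials=200, n_steps=5, alpha=0.2, gamma=1.0, omit_last=False):
    """TD(0) on a cue-then-reward sequence; parameters are illustrative."""
    V = [0.0] * n_steps                  # value estimate for each post-cue step
    history = []
    for trial in range(n_trials):
        omit = omit_last and trial == n_trials - 1
        # Entering the cue from the unpredictable pre-cue state (value 0, no
        # reward): the error at cue onset is gamma * V(first post-cue step).
        deltas = [gamma * V[0]]
        for t in range(n_steps):
            r = 1.0 if (t == n_steps - 1 and not omit) else 0.0
            v_next = V[t + 1] if t + 1 < n_steps else 0.0   # terminal value is 0
            delta = r + gamma * v_next - V[t]               # TD error delta_t
            V[t] += alpha * delta
            deltas.append(delta)
        history.append(deltas)
    return history

h = td_trials(omit_last=True)
print("first trial:  ", [round(d, 2) for d in h[0]])
print("trained:      ", [round(d, 2) for d in h[-2]])
print("reward omitted:", [round(d, 2) for d in h[-1]])
```

On the first trial the only nonzero error is at the reward. After training, the error at the reward is near zero and the error at cue onset is near 1. Omitting the reward on the final trial produces an error near -1 at the moment the reward was due: the burst that moves to the cue and the pause at an omitted reward, in miniature.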

6.9 The three-factor rule: putting value into Hebb

We can now write down, in words, the rule the chapter’s reinforcement half has been building toward — and be careful to say what it is and is not the rule for. It is not a replacement for the Hebbian LTP of the hippocampus, which, as we saw, runs perfectly well on two factors and no dopamine. It is the rule for a particular and important job: reinforcement-gated plasticity, the kind that learns from reward, seen most clearly at corticostriatal synapses. For that job the rule becomes three-factor rather than two-factor. The first two factors are Hebb’s: the presynaptic cell was active, and the postsynaptic membrane was driven, in the right order and close enough in time — the coincidence that leaves an eligibility trace. The third is new: a neuromodulatory signal reporting value — dopamine carrying a reward-prediction error — arriving while the trace is still open. The first two factors mark which synapse is a candidate and how it is leaning; the third factor strongly influences whether the candidate change is kept, and in which direction. (Only “influences,” not “single-handedly decides”: the sign and size of the outcome also depend on the dopamine receptor class, the cell type, the local microcircuit, acetylcholine, endocannabinoids, and timing — the third factor is decisive but not solitary.) The relationship to Hebb is the one this chapter has insisted on from the start: not correction but completion. Two-factor Hebbian plasticity is exactly right about correlation and entirely silent about value; the third factor supplies the value, and the eligibility trace lets it arrive late enough to be useful.
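Written as an update rule, the three factors multiply. Each synapse keeps a decaying eligibility trace that is bumped when its pre- and postsynaptic activity coincide, and its weight changes only when a signed, dopamine-like error arrives, by η × error × trace. The sketch below is a deliberately minimal, hypothetical instance: the function name, time step, trace time constant, and learning rate are placeholders, and it ignores every modifier the parenthesis above lists.

```python
import math

def three_factor(coincidence_steps, dopamine, n_syn, n_steps,
                 dt=0.1, tau_e=1.0, eta=0.5):
    """Toy three-factor rule (hypothetical parameters).

    coincidence_steps: {synapse: step} at which pre- and postsynaptic activity
        coincided (factors 1 and 2); this bumps that synapse's trace.
    dopamine: {step: delta}, a signed reward-prediction error (factor 3).
    Weight update at each step: dw_i = eta * delta(step) * e_i(step).
    """
    decay = math.exp(-dt / tau_e)
    e = [0.0] * n_syn                    # eligibility traces
    w = [1.0] * n_syn                    # synaptic weights
    for step in range(n_steps):
        e = [ei * decay for ei in e]     # traces fade
        for syn, s in coincidence_steps.items():
            if s == step:
                e[syn] += 1.0            # Hebbian coincidence sets the flag
        delta = dopamine.get(step, 0.0)  # zero except when dopamine arrives
        w = [wi + eta * delta * ei for wi, ei in zip(w, e)]
    return w

# Synapse 0 coincides at 0 s, synapse 1 at 1.0 s, synapse 2 never;
# a single dopamine pulse arrives at 1.5 s (step 15 with dt = 0.1 s).
w_better = three_factor({0: 0, 1: 10}, {15: +1.0}, n_syn=3, n_steps=20)
w_worse = three_factor({0: 0, 1: 10}, {15: -1.0}, n_syn=3, n_steps=20)
print(w_better, w_worse)
```

Synapse 1, whose coincidence came 0.5 s before the dopamine, changes most; synapse 0, 1.5 s back, changes less; synapse 2, never eligible, does not change at all. Flipping the sign of the error flips the direction of every change.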

The cell where this story is cleanest is one the cells chapter already introduced and the last chapter set in place. The medium spiny neuron of the striatum sits, by its anatomy, at the precise convergence the rule requires: it receives massive glutamatergic input from the cortex — what is happening, what the animal is doing and sensing — and it receives dopaminergic input from the midbrain — how good or bad that turned out to be. A corticostriatal synapse onto a medium spiny neuron is a place where Hebbian coincidence (cortical glutamate meeting postsynaptic depolarization) leaves an eligibility trace, and where a dopamine signal arriving shortly after can convert that trace into a lasting change whose sign depends on the dopamine. It is, almost diagrammatically, a three-factor synapse — the cellular substrate where the brain stamps in the actions that paid off and stamps out the ones that did not. The basal ganglia loop the cells chapter sketched for “action selection” is, seen through this chapter’s lens, a machine for reinforcement learning, with the medium spiny neuron as its learning element and dopamine as its teacher.

This is the architecture of reinforcement, and it answers the immediate form of the question the overview promised to end on. A brain assigns credit for a briefly delayed reward by having its synapses hold their eligibility open for a second or two after they act, and by broadcasting, when the outcome arrives, a global signal of how surprising-in-value that outcome was — a signal that finds the still-open traces and tells them what their activity was worth. This solves the seconds-scale version of credit assignment; longer delays need additional, circuit-level machinery we will come to in a moment. The fast layer marks what happened and keeps a chemical trace of the mark briefly alive; the slow layer, arriving late, reads the trace and decides what it meant. Two systems, two timescales, built for two different jobs, and learning is what happens where they meet.

The three-factor rule is one of the most productive ideas in contemporary neuroscience, and it is also a place where the textbook version is cleaner than the literature, in a way this unit has made a habit of flagging. It is worth being precise about where the ground is firm.

The firmest ground is the striatum. Corticostriatal plasticity onto medium spiny neurons depends on dopamine in a well-documented, sign-dependent way, and the two populations of medium spiny neuron the cells chapter implied — those expressing D1 receptors and those expressing D2 receptors — respond to dopamine with broadly opposite plasticity. In the classic simplified model, the D1 population belongs to the “direct,” roughly go pathway and the D2 population to the “indirect,” roughly no-go pathway, so that the same dopamine signal can strengthen “do that again” synapses while weakening “don’t” ones. That cartoon is the right first picture and also genuinely a cartoon: both pathways are in fact active during movement, the D1/D2 segregation is not absolute, and the actual plasticity depends heavily on cholinergic interneurons and local microcircuit state. Direct evidence for an eligibility trace is also strongest here: experiments timing dopamine relative to synaptic activity have found a roughly one-to-two-second window during which dopamine can reinforce recent activity (work associated with Yagishita and Kasai). For the basal ganglia, the three-factor rule with an eligibility trace is not just a model; it has real mechanistic support.
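The opposite-sign cartoon can be written as one line per pathway. This is a sketch of the classic go/no-go simplification only, with a hypothetical function name and learning rate; as the paragraph stresses, real striatal plasticity depends on much more than the pathway label.

```python
def msn_update(w, eligibility, rpe, pathway, eta=0.1):
    """Cartoon of opposite dopamine effects on the two striatal pathways.

    'direct'   (D1, roughly go):    a positive RPE strengthens eligible synapses.
    'indirect' (D2, roughly no-go): the same positive RPE weakens them, and a
                                    dip (negative RPE) strengthens them.
    Hypothetical sign convention; ignores cholinergic and microcircuit effects.
    """
    sign = {"direct": 1.0, "indirect": -1.0}[pathway]
    return w + eta * sign * rpe * eligibility

burst_go = msn_update(1.0, eligibility=1.0, rpe=+1.0, pathway="direct")
burst_nogo = msn_update(1.0, eligibility=1.0, rpe=+1.0, pathway="indirect")
dip_nogo = msn_update(1.0, eligibility=1.0, rpe=-1.0, pathway="indirect")
```

A burst after an action strengthens its “do that again” synapses and weakens its “don’t” synapses; a dip does the reverse for the no-go side.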

The ground is softer elsewhere. In the hippocampus, classical NMDA-dependent LTP — the LTP of Bliss and Lømo — is induced perfectly well without dopamine; what dopamine (and other neuromodulators) chiefly does there is regulate the persistence and consolidation of LTP, gating the late, protein-synthesis-dependent phase and the synaptic capture described earlier, rather than gating induction itself. So the third factor in the hippocampus tunes how durable a Hebbian change becomes more than whether it occurs. In the neocortex, a clean, universal three-factor rule — one in which a dopamine-like error gates plasticity at every synapse — is more a theoretical aspiration than an established fact; cortical LTP and STDP have been characterized largely without it, neuromodulators clearly influence cortical plasticity but the mapping is not the tidy striatal one, and much of the appeal of a universal cortical three-factor rule comes from its computational attractiveness rather than from direct demonstration. The honest summary is the one this unit keeps reaching: the three-factor rule is a powerful and partially confirmed synthesis whose strongest evidence lives in the basal ganglia, and treating it as a single law obeyed identically by every synapse in the brain is the kind of tidy overreach the biology has not earned. It is, like the neuron doctrine and the canonical list of neuromodulators, a true and illuminating principle with seams — and knowing where the seams run is what separates understanding the idea from merely repeating it.

6.10 Where this leaves us — and where the unit ends

We set out to explain how a synapse learns, and we can now state the answer as the assembly of parts the unit spent four chapters preparing. An excitatory glutamatergic synapse detects coincidence — presynaptic glutamate meeting postsynaptic depolarization — through the NMDA receptor, whose magnesium block lets calcium through only when transmitter and depolarization arrive together; and in the classic hippocampal case the amount of that calcium is the verdict, a large fast pulse strengthening and a modest slow one weakening, the change expressed largely by trafficking AMPA receptors into or out of the synapse and reshaping the spine that holds them. Timing sharpens coincidence into causation, the back-propagating spike serving as one source of the postsynaptic depolarization that makes order matter. The postsynaptic cell can answer its inputs through retrograde signals, and a broader set of homeostatic mechanisms keeps the whole positive-feedback system from running away. But correlation alone, however refined, is silent about value and helpless against delay — and so, at the synapses built for reward learning, the synapse holds an eligibility trace open for a second or two, and a global dopamine signal, reporting how much better or worse than expected the world just turned out, arrives to read that trace and weigh what the activity was worth. Two factors of local coincidence, one factor of global value: the three-factor rule, which solves the seconds-scale version of credit assignment by letting a late signal find a still-open trace — and leaves the longer-range version for the circuits beyond this unit.
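The calcium-amount rule in this summary can be caricatured as a two-threshold function of peak postsynaptic calcium. This is a minimal sketch with hypothetical thresholds in arbitrary units; it keys only on amount and ignores the fast-versus-slow time course, so, like the rule it illustrates, it is a first approximation rather than a law.

```python
def calcium_rule(peak_ca, theta_d=0.35, theta_p=0.55):
    """Direction of plasticity from peak calcium (thresholds hypothetical).

    Returns 0 (no change) below theta_d, -1 (LTD) between the thresholds,
    and +1 (LTP) at or above theta_p.
    """
    if peak_ca < theta_d:
        return 0       # too little calcium: no lasting change
    if peak_ca < theta_p:
        return -1      # modest calcium: weaken (LTD)
    return 1           # large calcium: strengthen (LTP)

print([calcium_rule(c) for c in (0.1, 0.45, 0.9)])
```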

The unit’s threads all run through this last chapter, and it is worth naming them as we close the whole arc. The molecule-versus-receptor principle, pressed since the sponge, reached perhaps its sharpest cellular form here: glutamate the same at every synapse, yet the detector of learning at its NMDA receptor and the expression of learning at its AMPA receptors, while dopamine means one thing tonically and another in a phasic burst. The timescales that have organized the unit since the overview invoked Sapolsky are the very substance of the learning rule: a millisecond coincidence, a few-second eligibility trace, a seconds-late modulatory verdict, an hours-long consolidation into new protein and new spine — learning is precisely a conversation across timescales, the fast layer and the slow layer doing together what neither could do alone. The honest leaky category appeared again and again — Hebb’s rule true but incomplete, STDP real but not universal, the three-factor rule solid in the striatum and aspirational in the cortex — because that is what good biological principles are, and pretending otherwise would teach the wrong lesson about how this science actually stands. And the metabolic price the unit never let out of sight is present here too: every coincidence detected, every spine enlarged, every gradient spent and restored, is paid for in ATP by the pump that has been running underneath this entire unit, the same pump the sponge’s ancestors first switched on to keep from flooding.

One honesty remains, and it is the right note on which to end a unit rather than pretend to close a subject.

Plasticity is one of the fastest-moving areas in neuroscience, and a chapter this clean owes the reader a map of where the cleanness is earned and where it is a teaching convenience.

Reasonably settled:

  • Activity-dependent, long-lasting change in synaptic strength is real, and at excitatory glutamatergic synapses the NMDA receptor acts as a coincidence detector: ligand-and-co-agonist binding gates it, but at rest a magnesium block stops current, so calcium flows mainly when glutamate and postsynaptic depolarization arrive together.
  • LTP and LTD both exist and both matter; in the much-studied hippocampal CA1 case, postsynaptic calcium is a major control variable and a major expression mechanism is the trafficking of AMPA receptors into or out of the synapse, accompanied by structural growth or shrinkage of the spine.
  • Plasticity can be timing-dependent (STDP) in the canonical Hebbian direction — pre-before-post tends to strengthen, post-before-pre to weaken — at least under controlled low-frequency pairing in suitable synapses.
  • Synapses talk backward: endocannabinoids are genuine retrograde messengers, suppressing release at whichever terminal bears CB1 receptors (depolarization-induced suppression of excitation, DSE, at excitatory terminals; of inhibition, DSI, at inhibitory ones).
  • In classic conditioning tasks, midbrain dopamine neurons carry a reward-prediction-error-like signal, shifting from reward to predictive cue and dipping when a predicted reward is omitted — a major, repeatedly confirmed result.
  • At corticostriatal synapses, dopamine gates plasticity in a sign- and cell-type-dependent way, and there is direct evidence for a short (~1–2 s) eligibility window during which dopamine can reinforce recent activity. This is the firmest ground for a three-factor, reward-gated learning rule.
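The canonical Hebbian timing rule in the third item above is usually drawn as a pair of exponential windows. The sketch below assumes that pair-based form with illustrative amplitudes and time constants that are not fitted to any dataset; Δt is the postsynaptic spike time minus the presynaptic one, and returning zero at exact coincidence is a convention of the sketch.

```python
import math

def stdp_dw(dt_ms, a_plus=1.0, a_minus=0.5, tau_plus_ms=17.0, tau_minus_ms=34.0):
    """Pair-based STDP window (illustrative parameters).

    dt_ms = t_post - t_pre. Pre-before-post (dt_ms > 0) potentiates;
    post-before-pre (dt_ms < 0) depresses; both effects fade with |dt_ms|.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0

print([round(stdp_dw(d), 3) for d in (-40, -10, 10, 40)])
```

With these values potentiation is larger but shorter-lived and depression smaller but broader, a common shape for illustrations rather than a measured one.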

Genuinely unsettled, and presented as such:

  • How universal the calcium-amount rule is. The “big-fast-calcium = LTP, small-slow = LTD” account is an elegant first approximation for classic NMDA-dependent plasticity; LTD in particular has several distinct mechanisms, and outcomes depend on receptor subtype, dendritic location, prior activity, and neuromodulators (metaplasticity). Treat it as the central case, not a law.
  • Whether STDP is the brain’s operative rule in vivo. The textbook Bi–Poo curve comes largely from cultured hippocampal neurons; in intact tissue the rule bends with firing rate, bursts, dendritic location, and voltage, and its behavioral relevance is actively debated.
  • The molecular identity of the eligibility trace. The trace is a biochemical state, not a lingering spike, but exactly which molecules hold it — and how its seconds-scale (reinforcement) version relates to the hour-scale synaptic tag of tagging-and-capture — is not settled.
  • How far the three-factor rule extends beyond the striatum. In the hippocampus, dopamine chiefly modulates the persistence and consolidation of LTP rather than gating its induction; in the neocortex, a universal dopamine-gated rule is more a computational aspiration than a demonstrated mechanism. The clean rule is strongest in the basal ganglia and softens as you move outward.
  • Whether phasic dopamine is “just” a scalar reward-prediction error. The RPE correspondence is robust, but dopamine neurons are heterogeneous and also carry movement, salience, and ramping signals; the signal varies across striatal territories; and recent work argues the brain may encode a whole distribution of prediction errors. A 2024 perspective in the field frames the original scalar-RPE account as enormously influential but, in its original form, too simple.

And, as always: the core of this chapter is solid. A coincidence-detecting receptor, a calcium-gated change expressed in receptor number and spine size, a timing rule with the Hebbian sign, a backward-talking synapse, a dopamine error signal, and an eligibility trace that lets a late reward find the activity that earned it — you can build on all of it. What remains open is mostly how universal each mechanism is, and how the pieces compose into whole-behavior learning — which is to say, exactly the frontier worth your curiosity.

The mechanism we have built assigns credit beautifully for a reward that follows an action by a second or two — but most of what an animal must learn is not so kind. The reward for a good decision may arrive minutes, hours, or a lifetime later, after a long chain of intervening actions, and a single few-second eligibility trace cannot reach across that gulf on its own. How a brain solves the full credit-assignment problem — how value propagates back across long sequences of states and actions, how the basal ganglia loops and the cortex and the dopamine system together implement something like the temporal-difference learning the deeper dive sketched — is a problem this unit has only opened, exactly as the overview promised: a first look forward, not a last word. We have built the single synapse that learns from a coincidence and a reward. Assembling many such synapses into a system that can learn to act, over long delays, toward distant goals is the work of the chapters and units still ahead. The molecule, as ever, stays the humble inherited thing it was in the sponge. What we have watched it acquire, across this unit, is a memory — and the beginnings of a reason to remember one thing rather than another.