Acting on the Seen World
Visually Guided Action, and Its Disorders
Where the dorsal stream goes
The previous chapter got a command to the muscle. But notice how artificial its movements were — reaches to a target on a screen, a sequence of presses, a signature. Real movements are almost never made in a vacuum. They are directed at things: you reach for this cup, grasp that handle, step over that root. To do so, the brain must take information about an object — its location, its size, its shape, all delivered by the eyes — and convert it into the right reach and the right grip. Movement and vision must be coupled. This chapter is about how, and about what we learn when the coupling breaks.
We have, in fact, already met the relevant machinery, in the unit on vision. Recall that visual information leaving the occipital lobe divides into two great streams. The ventral stream, running forward into the temporal lobe, is the pathway of recognition — the “what” system that tells you the object is a cup. The dorsal stream, running up into the parietal lobe, was originally christened the “where” system, but David Milner and Mel Goodale argued for a better description: it is the vision-for-action system, the “how” pathway, concerned not with identifying objects but with the visual guidance of movements toward them. And here is the point that matters now: the dorsal stream does not dead-end in the parietal lobe. It flows forward, from the posterior parietal cortex into the premotor cortex, forming a frontoparietal network whose business is precisely to turn seen objects into actions upon them.
That network faces a problem that is easy to overlook. The eyes report where an object is on the retina — and the retina moves every time the eyes move, the head turns, or the body shifts. But to reach for the object, the motor system needs to know where it is relative to the hand. Somewhere between seeing and reaching, the brain must translate the target out of the coordinates of the eye and into the coordinates of the body. This sensorimotor transformation is one of the central jobs of the posterior parietal cortex, and it is more subtle than it sounds.
You might imagine the brain solving the transformation the way a programmer would: take the target’s retinal position, look up the current eye and head angles, and compute the target’s position in body-centered coordinates by trigonometry, an explicit recomputation carried out as a separate step. But that is not what the parietal cortex appears to do, and the alternative is a lovely example of this book’s recurring theme.
Richard Andersen and colleagues found that many posterior parietal neurons have ordinary retinal receptive fields (they respond to a stimulus at a particular spot on the retina) but that the size of their response to that stimulus is modulated by the current position of the eyes in the head. This modulation is called a gain field. No single neuron explicitly encodes “the target is here relative to the body.” But because each neuron’s activity jointly reflects both where the stimulus is on the retina and where the eyes are pointed, the population of such neurons implicitly carries the target’s location relative to the head; add a similar modulation by head position, which some parietal neurons also show, and the population carries its location in body-centered coordinates, available to be read out by the structures downstream that drive reaching.
Notice the shape of the explanation once again. The transformation is not computed by an algorithm running on top of the neurons; it falls out of the structure of the population’s responses. As we keep finding, from the central pattern generator to the dynamics of M1 to this, the architecture does not implement the computation as a separate step. The architecture is the computation.
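The gain-field idea can be made concrete with a toy simulation. What follows is a minimal sketch under loudly stated assumptions, not a model of real parietal data: space is one-dimensional, the head is fixed (so head-centered and body-centered positions coincide), every neuron has a Gaussian retinal receptive field, and the eye-position gain is also Gaussian (recorded gain fields are often closer to planar). All names and numbers (RETINAL_CENTERS, EYE_CENTERS, SIGMA, and the functions) are hypothetical, chosen only for illustration. The point to watch is that no model neuron is tuned to head-centered position, yet a fixed downstream readout of the population recovers it.

```python
import math

# Hypothetical 1-D model: every position is a horizontal angle in degrees.
RETINAL_CENTERS = [float(x) for x in range(-60, 61, 5)]  # preferred retinal locations
EYE_CENTERS = [float(x) for x in range(-60, 61, 5)]      # eye positions of peak gain
SIGMA = 10.0  # tuning width; an arbitrary illustrative value


def bump(x, center, sigma=SIGMA):
    """Gaussian tuning curve with a peak value of 1."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def explicit_head_centered(retinal_pos, eye_pos):
    """The 'programmer's' solution: add eye position to retinal position."""
    return retinal_pos + eye_pos


def population_response(retinal_pos, eye_pos):
    """Activity of every model neuron: retinal tuning scaled by an eye-position gain."""
    return {
        (rc, ec): bump(retinal_pos, rc) * bump(eye_pos, ec)
        for rc in RETINAL_CENTERS
        for ec in EYE_CENTERS
    }


def read_out_head_centered(responses):
    """Fixed downstream readout: the activity-weighted average of (rc + ec)."""
    total = sum(responses.values())
    weighted = sum(act * (rc + ec) for (rc, ec), act in responses.items())
    return weighted / total


def demo():
    """One head-centered target (+10 degrees) viewed from three eye positions."""
    target_head = 10.0
    rows = []
    for eye_pos in (-15.0, 0.0, 15.0):
        retinal_pos = target_head - eye_pos  # where the image lands on the retina
        responses = population_response(retinal_pos, eye_pos)
        rows.append((eye_pos, retinal_pos, read_out_head_centered(responses)))
    return rows


if __name__ == "__main__":
    for eye_pos, retinal_pos, decoded in demo():
        print(f"eye {eye_pos:+6.1f}  retinal {retinal_pos:+6.1f}  decoded {decoded:+6.2f}")
```

Across the three eye positions the retinal position of the target changes, and so does the activity of any individual model neuron, but the decoded value should stay close to the same head-centered location. The readout never performs the addition in explicit_head_centered on the stimulus itself; the sum is implicit in which neurons are active and how strongly.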
Reaching for the world
When you pick something up, you are really doing two things at once, and the brain treats them as partly separate problems. Reaching transports the hand to the object’s location. Grasping shapes the hand for the object’s form: the fingers preform an aperture matched to the object’s size and the grip is configured for its shape, all during the reach, before contact. The frontoparietal network handles both, with parietal areas and premotor cortex working together. And reaching, as the everyday phrase has it, is always reaching for something: it is motivated behavior, done with a purpose.
Within this network we begin to find neurons of a kind we did not see in the primary areas — neurons that fuse information across the senses. A single parietal neuron might respond both to a touch on a particular part of the body and to a visual stimulus near that same body part; move a panel across the monkey’s view and it fires, touch the animal in the corresponding direction and it fires again. These cells are mapping not the world at large but the space immediately around the body — the peripersonal space in which action happens, the reachable, touchable zone where seeing and doing meet. This is what the convergence of vision and touch is for: not perception in the abstract, but the guidance of movement in the space where the body can act.
The network is also sensitive to something the philosopher and psychologist J. J. Gibson — whom we met in Unit I — called affordances: the features of an object that offer themselves up for action, the natural places and ways to grip it. A long thin object affords being grasped near its middle; an awkwardly shaped one does not afford being picked up by its tip, because you could not support it that way. You compute affordances constantly and without a flicker of deliberation. Hand someone an unfamiliar object and watch: they pick it up along a sensible axis, by a workable purchase, automatically, as though the object’s graspable structure were simply visible to the hand. It is the dorsal stream, doing its job. Gibson’s intuition — that the environment presents the organism not with neutral shapes to be analyzed but with possibilities for action — turns out to have a home in the frontoparietal system.
Goals in the parietal lobe
If reaching is always for something, you might wonder whether the brain’s reaching circuitry cares about the something — about the purpose of the act, not just its geometry. It does, and the evidence is one of the more striking findings in the area.
Picture a monkey reaching to pick up a small piece of food. The grasp itself can be held constant: same object, same reach, same closing of the hand. But the purpose of the grasp can be varied. On some trials the monkey picks up the food to bring it to its mouth and eat it; on others it picks up the same food to place it in a container. The movement of picking up is, at the start, the same. Yet some neurons in the parietal lobe fire when the animal grasps in order to eat but fall silent when it grasps the identical object in order to place — and other neurons do the reverse. A double dissociation, written in single cells: the firing is keyed not to the grasp but to the goal the grasp serves, the larger act it belongs to. Some of these neurons even begin to fire before the movement, while the animal is merely looking at the object — a signature of intention rather than execution.
The lesson reaches back to the very first page of this unit. We argued there that movement is, at bottom, adaptive — organized around the organism’s purposes, in the service of its needs. Here we find that principle expressed in the firing of individual neurons in a supposedly “sensorimotor” area. Even where the brain is busy with the concrete geometry of getting a hand to an object, it is also representing why — and the why shapes the how from the beginning.
When seeing and doing come apart
We have been treating “perceiving an object” and “acting on it” as two jobs done by two streams. If that division is real, then brain damage should be able to break one while sparing the other — in both directions. It can, and the resulting pair of cases is among the most instructive in all of neuropsychology.
Consider first the patient known as D.F., studied extensively by Goodale and Milner. Damage to her ventral stream left her with a profound visual form agnosia: she could not recognize objects by sight, and she could not report their visual properties. Shown a slot and asked which way it was tilted, she could not say; asked to hold up her hand and rotate it to match the slot’s orientation, or to open her finger and thumb to indicate an object’s size, she could not do that either. By every measure of conscious visual perception of form, she was severely impaired. And yet — when she was simply asked to post a card through the slot, her hand rotated smoothly and correctly to the slot’s angle as it approached; when she reached to pick up an object, her grip aperture scaled accurately to the object’s size as she reached. The visual information she could not consciously report was nonetheless reaching her hands and guiding them with precision. Her ventral “what” stream was destroyed; her dorsal “how” stream was intact, and it could see perfectly well for the purpose of action even though she could not see for the purpose of perception.
Now consider the mirror-image case: optic ataxia, which follows damage to the dorsal stream in the posterior parietal cortex. Such a patient has the opposite profile. Their ventral stream is intact, so they recognize objects without difficulty — they can tell you exactly what they are looking at. But they cannot use vision to guide the hand. They misreach, groping past or beside a target they can plainly see; and when they try to pick something up, their grip is wrong, oriented along the wrong axis, taking hold of an awkward object by exactly the point no one would choose — at the tip, where it cannot be supported. (Optic ataxia typically appears as part of a larger parietal syndrome, Balint’s syndrome.) Here perception is preserved and action is broken — the exact converse of D.F.
Put the two cases side by side and you have a double dissociation, the strongest form of evidence neuropsychology can offer that two capacities rest on separate machinery. Conscious visual perception and the visual guidance of action are not the same thing, served by one system that can be more or less damaged. They are distinct, and either can fall while the other stands. This is a deep and somewhat unsettling conclusion: the vision that guides your hand is not the vision you are aware of seeing.
That same surprising independence reappears in the domain of force, in the small demonstration we folded into the overview — the size–weight illusion. Lift two objects of equal weight but different size and you will feel the smaller one as heavier, stubbornly, no matter how often you lift them. Yet your hand quickly applies the same lifting force to both, having worked out that they weigh the same, even as your conscious experience insists otherwise. Once again the system that acts and the system that perceives have come apart, the motor system quietly correct, perception confidently wrong. It is the same moral as D.F. and optic ataxia, told in the language of grip force rather than reach and grasp.
Acting to understand
There is one more population of neurons in the motor system whose discovery caused more excitement (and, in time, more overstatement) than perhaps any other finding in modern neuroscience. In the premotor cortex of the monkey, in an area called F5 (the rough homologue of human inferior frontal cortex, which, as you will see when we come to language, overlaps with Broca’s area), Giacomo Rizzolatti and his colleagues found neurons that fire, as expected, when the monkey performs a particular action, such as grasping a piece of food. The surprise was that the same neurons also fire when the monkey merely watches someone else perform that action, making no movement itself. Some fire even when the monkey only hears the action, such as the sound of paper being torn, without seeing it at all. And, like the goal-selective neurons we just met, many are choosy about the purpose of the observed act, not merely its kinematics. These are the famous mirror neurons.
It is not hard to see why this electrified the field. It suggests a mechanism for one of the deeper puzzles about social life: how do we grasp what another person is doing, and why? The mirror-neuron proposal — sometimes called direct matching or simulation — is that we understand others’ actions by covertly running our own motor representations of those actions: you understand my reach because watching it activates the very circuitry you would use to reach yourself. From there it is a short and tempting step to imitation, to empathy, to the evolution of language out of a gestural system in the same frontal territory. We have, in fact, met a cousin of this idea already: when you see someone’s face contort in disgust, the same regions of your own insula activate that would activate if you yourself had tasted something foul — as if you understand their disgust by partially undergoing it. Shared representation between doing and observing may be a broad principle, and mirror neurons would be its motor instance.
The honest difficulty is that the strong human claims rest on much softer evidence than the monkey recordings. We cannot routinely record single neurons in the human brain, so the human case is built largely from indirect signs: the mu rhythm, an oscillation recorded over sensorimotor cortex that is suppressed both when you make a movement and when you watch someone else make it; or brain-imaging studies showing, for instance, that expert dancers watching a familiar dance activate motor regions far more than non-dancers watching the same dance, as though their motor systems recognize what they themselves could do. Suggestive, but circumstantial. Mirror neurons had an enormous moment (for a while they were treated as the explanation for nearly everything social), and your author’s advice is to hold the excitement and the skepticism together, which the fold below tries to do.
Few findings have been simultaneously as influential and as overstated as mirror neurons, and separating the solid from the speculative is a useful exercise in scientific judgment.
What is reasonably solid. In the monkey, single neurons in premotor area F5 (and in inferior parietal cortex) genuinely do respond both during the execution of a specific goal-directed action and during the observation of the same action performed by another. That is a real and replicated phenomenon.
Where the trouble begins. The leap from “these neurons respond during observation” to “these neurons are how we understand actions” is a leap, not a deduction. A central worry is the direction of causation: perhaps the mirror response does not produce your understanding of an action but instead follows from an understanding you have already achieved by other means — the activity would then be a consequence of comprehension, not its cause. The strongest popular extensions have fared poorly. The “broken-mirror” theory of autism — the idea that autism stems from a dysfunctional mirror system — has not held up well against the evidence. And direct single-neuron confirmation in humans is sparse; a small number of mirror-like neurons have been recorded in patients during clinical monitoring, but the rich human story is still largely inferential. None of this means the phenomenon is unreal or unimportant. It means that the careful claim — some premotor and parietal neurons are active during both action and observation — is well supported, while the sweeping claim — mirror neurons are the basis of empathy, imitation, language, and social cognition — remains, at the time of writing, unproven and contested.
The disorders of skilled action
In the previous chapter we placed paralysis at the bottom of our diagnostic ladder — a failure of the path to the muscle. The disorders of this chapter sit higher, where action meets planning and perception, and they have a paradoxical character: the patient is not weak, the muscles work, the limb can move — and yet the person cannot carry out the skilled, learned, purposeful action they intend. This family of disorders is called apraxia, and it is, by definition, a failure of skilled action that cannot be blamed on weakness, on sensory loss, or on a failure to understand the request. The machinery of execution is intact; what is broken is the organization of the act.
The forms of apraxia are many — neuropsychology has a weakness for ever-finer subdivision — but two are worth knowing. In ideomotor apraxia, the patient cannot perform a familiar action to command or in pantomime. Ask them to show you how they would use a key, hammer a nail, wave goodbye, or signal traffic to stop, and they fumble, perseverate, produce a clumsy and spatially garbled version, or substitute a body part for the tool (using a finger as the comb rather than miming holding one). The revealing twist is that the same patient may perform the same action perfectly well when it arises naturally and automatically in context — reaching out and turning a real doorknob to leave the room without difficulty, then being unable to show you “how you turn a doorknob” when asked. In ideational (or conceptual) apraxia, the failure is one of sequence and tool logic: the patient cannot put the steps of a multi-stage action in the right order — attempting to put on the sock after the shoe, fumbling the ordered routine of getting dressed. (You will recognize the link to the supplementary motor area of the last chapter, the sequencer of action.) Other variants exist — a constructional apraxia of drawing and assembly, and the striking case of the musician who, asked to demonstrate how a chord is played, cannot, yet sits down and plays the piece flawlessly.
That last contrast is the key to the whole family, and it ties this chapter together. Notice how often the deficit is specifically in deliberate, explicit, on-demand performance, while the automatic, in-context, over-learned version of the very same action survives. The musician cannot demonstrate the chord but can play it; the patient cannot pantomime the key but can unlock the door. We have now seen this same dissociation three times over: D.F. could not consciously report an object’s orientation but could let it guide her reaching hand; the optic-ataxic patient could perceive the object but not act on it; and the apraxic patient can act automatically but cannot summon the action to command. In every case, a capacity and our deliberate access to that capacity have come apart. The brain holds knowledge of how to act — an abstract repertoire of skilled actions, separate from both raw execution and conscious perception — and that knowledge can be damaged on its own, or cut off from voluntary call-up while remaining available to automatic, situated behavior.
This is, in a sense, the fulfillment of the abstract motor plan we ended the last chapter with — the signature legible from hand, foot, or mouth. There is a level of the motor system that traffics in acts, not muscles and not even movements: in what it means to use a key, to comb, to dress. The apraxias are what we see when that level is injured. And like all lesion findings, they are messy — patients can do one thing and not another, succeed today and fail tomorrow — which is itself a useful reminder that the clean boxes of a textbook are idealizations laid over biological tissue that did not read the textbook.
With the cortical machinery of voluntary, visually guided, skilled action now in hand, we are ready to turn to the two great structures that sit alongside it and shape what it does — the basal ganglia, which choose among the actions on offer, and the cerebellum, which predicts and refines them.
What is well established. Visual information for the guidance of action travels in a dorsal stream that flows forward from occipital cortex through posterior parietal cortex into premotor cortex, forming a frontoparietal network for reaching and grasping. The visual guidance of action and conscious visual perception are dissociable, as shown by the double dissociation between visual form agnosia (perception impaired, action spared, as in D.F.) and optic ataxia (perception spared, action impaired). The frontoparietal network represents not only the geometry of actions but their goals. And apraxia is a genuine disorder of the organization of skilled action, distinct from weakness, sensory loss, or failure of comprehension.
What remains contested or unsettled. Whether human mirror neurons exist in the rich sense claimed, and whether mirroring explains action understanding rather than merely accompanying it, is unresolved and disputed; the broader claims linking mirror neurons to empathy, autism, and language are not well supported. Exactly how posterior parietal cortex carries out sensorimotor coordinate transformations — gain fields are part of the story, but only part — is still being worked out. The taxonomy of the apraxias is unsettled and probably over-divided, and the precise organization of the brain’s “how-to-act” knowledge (and its strong left-hemisphere bias) is incompletely understood. As always, the dissociations are cleaner in principle than in any individual patient.