learning to reach: a mathematical model · 2005-09-09 · learning to reach: a mathematical model...

13
Developmental Psychology Copyright 1996 by the American Psychological Association, Inc. 1996~ Vol. 32, No. 5, 811-823 0012-1649/96/$3.00 Learning to Reach: A Mathematical Model Neil E. Berthier University of Massachusetts at Amherst This article presents a mathematical model of the development of reaching that assumes that the major problem facing infants is their lack of lowerlevel motor control and that infants learn to adjust their reaching strategies as a consequence of their previous experience and to match their current level of control. The model hypothesizes that infant reaches are a series of submovements, with the goal being to get the hand to the target in the face of errors in executed submovements. To relate actual infant reaches to this model, reaching data were decomposed into submovements, using a polynomial fitting algorithm that assumed minimum-jerk submovements. The model makes quan- titative predictions about the course of development that are supported by existing results. The validity of the model's underlying assumptions was assessed by comparing the directional variability of the submovements with the variability assumed in the model. The development of motor control has increasingly been viewed as a significant problem in cognitive development (Lockman & Thelen, 1993). To succeed at reaching, infants must integrate visual, proprioceptive, and auditory input (Clifton, Rochat, Robin, & Berthier, 1994) and accommodate their movements to the external constraints of the task, both in terms of planning movements that will uncover objects or avoid obstacles and in terms of using appropriate muscle forces for the situation (Goldfield, Kay, & Warren, 1993; Thelen, 1994). Infants succeed even though they lack the fine motor control of adults and operate with a neuromuscular system that is contin- uously changing in size, strength, and neural connectivity. Even with these substantive difficulties, most infants successfully reach by about 16 weeks of age and are proficient at grasping small objects in the second half of the first year of life. Although the study of motor development has long been a part of developmental psychology (e.g., Gesell, 1929; McGraw, 1943), von Hofsten (1979) was the first to use modern equip- ment to examine the detailed kinematics of infant reaching. Von Hofsten focused his analysis on the hand-speed profiles of infant reaches. A hand-speed profile is a plot of the speed of the hand as a function of time during the reach. He found that in- fant and adult hand-speed profiles differ in fundamental ways. In simple reaching situations adults will reach with a single ac- celeration and deceleration of the hand, but von Hofsten found that infants show multiple accelerations and decelerations dur- ing reaches. Von Hofsten labeled these accelerations and decel- erations of the hand movement units. Von Hofsten's work has been confirmed and extended by Fetters and Todd (1987) and Mathew and Cook ( 1990 ). More recently, von Hofsten ( 1991 ) The author wishes to thank Andy Barto, Rachel Clifton, Michael McCarty, and David Rosenbaum for their helpful and insightful com- ments on this work, and to Daniel Robin and Daniel McCall for their help in the collection of data. This research was supported by NSF grant SBR 9410160 and N1H grant HD 27714. Correspondence concerning this article should be addressed to Neil E. Berthier, Department of Psychology, Tobin Hall, University of Mas- sachusetts, Amherst, Massachusetts 01003. Electronic mail may be sent via Internet to [email protected]. 811 longitudinally studied 5 infants and found that reaching trajec- tories were relatively straight within movement units, that the number of movement units decreased with age, and that the first movement unit of the reach lengthened and started to dominate the reach until the hand-speed profiles approached the appear- ance of the adult. Von Hofsten ( 1991, 1993 ) has suggested that movement units are actually the elementary action units of infant reaches. Be- cause each movement unit is defined by the acceleration of the hand, it reflects an expenditure of energy and may be the result of a control action on the part of the infant. Further support is provided by results showing that the hand usually changes direction between but not within movement units (Fetters & Todd, 1987; von Hofsten, 1979, 1991, 1993). However, Mathew and Cook (1990) showed that when movement units are de- fined as by von Hofsten ( 1991 ), there are a significant number of direction changes within movement units. Von Hofsten (1993) also suggested that action units of the reach are not the output of a single motor program because movement units have highly variable shape. Recently, others have focused on the dynamics of infant reach. Thelen et al. (1993) studied infants around the time of onset of reaching and confirmed von Hofsten's ( 1991 ) basic re- sults. Unlike von Hofsten ( 1991 ), who focused on the common features of development, Thelen et al. (1993) emphasized that different infants show different developmental paths. Thelen et al. ( 1993 ) found that 2 of their infants moved very slowly with highly damped movements at the onset of reaching and that 2 of their infants initially moved rapidly and energetically. By 22 weeks of age, all 4 infants showed similar reaching styles; the slow-moving infants had speeded up their reaches, whereas the energetic infants had slowed their reaches. Thelen et al. ( 1993 ) concluded that the central nervous system does not contain rigid programs that detail the kinematics of early reaching but that development of reaching is the result of the infant matching the dynamics of the arm to the task. Other data show that infant reaching is not simply a neural program that is triggered by the presence of a goal object, but that infants match the kinematics of their reaches to the task and their goals. Figure 1 shows two hand movements of a 6-

Upload: others

Post on 07-Jan-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Developmental Psychology Copyright 1996 by the American Psychological Association, Inc. 1996~ Vol. 32, No. 5, 811-823 0012-1649/96/$3.00

Learning to Reach: A Mathematical Model

N e i l E. B e r t h i e r University of Massachusetts at Amherst

This article presents a mathematical model of the development of reaching that assumes that the major problem facing infants is their lack of lower level motor control and that infants learn to adjust their reaching strategies as a consequence of their previous experience and to match their current level of control. The model hypothesizes that infant reaches are a series of submovements, with the goal being to get the hand to the target in the face of errors in executed submovements. To relate actual infant reaches to this model, reaching data were decomposed into submovements, using a polynomial fitting algorithm that assumed minimum-jerk submovements. The model makes quan- titative predictions about the course of development that are supported by existing results. The validity of the model's underlying assumptions was assessed by comparing the directional variability of the submovements with the variability assumed in the model.

The development of motor control has increasingly been viewed as a significant problem in cognitive development (Lockman & Thelen, 1993). To succeed at reaching, infants must integrate visual, proprioceptive, and auditory input (Clifton, Rochat, Robin, & Berthier, 1994) and accommodate their movements to the external constraints of the task, both in terms of planning movements that will uncover objects or avoid obstacles and in terms of using appropriate muscle forces for the situation (Goldfield, Kay, & Warren, 1993; Thelen, 1994). Infants succeed even though they lack the fine motor control of adults and operate with a neuromuscular system that is contin- uously changing in size, strength, and neural connectivity. Even with these substantive difficulties, most infants successfully reach by about 16 weeks of age and are proficient at grasping small objects in the second half of the first year of life.

Although the study of motor development has long been a part of developmental psychology (e.g., Gesell, 1929; McGraw, 1943), von Hofsten (1979) was the first to use modern equip- ment to examine the detailed kinematics of infant reaching. Von Hofsten focused his analysis on the hand-speed profiles of infant reaches. A hand-speed profile is a plot of the speed of the hand as a function of time during the reach. He found that in- fant and adult hand-speed profiles differ in fundamental ways. In simple reaching situations adults will reach with a single ac- celeration and deceleration of the hand, but von Hofsten found that infants show multiple accelerations and decelerations dur- ing reaches. Von Hofsten labeled these accelerations and decel- erations of the hand movement units. Von Hofsten's work has been confirmed and extended by Fetters and Todd (1987) and Mathew and Cook ( 1990 ). More recently, von Hofsten ( 1991 )

The author wishes to thank Andy Barto, Rachel Clifton, Michael McCarty, and David Rosenbaum for their helpful and insightful com- ments on this work, and to Daniel Robin and Daniel McCall for their help in the collection of data. This research was supported by NSF grant SBR 9410160 and N1H grant HD 27714.

Correspondence concerning this article should be addressed to Neil E. Berthier, Department of Psychology, Tobin Hall, University of Mas- sachusetts, Amherst, Massachusetts 01003. Electronic mail may be sent via Internet to [email protected].

811

longitudinally studied 5 infants and found that reaching trajec- tories were relatively straight within movement units, that the number of movement units decreased with age, and that the first movement unit of the reach lengthened and started to dominate the reach until the hand-speed profiles approached the appear- ance of the adult.

Von Hofsten ( 1991, 1993 ) has suggested that movement units are actually the elementary action units of infant reaches. Be- cause each movement unit is defined by the acceleration of the hand, it reflects an expenditure of energy and may be the result of a control action on the part of the infant. Further support is provided by results showing that the hand usually changes direction between but not within movement units (Fetters & Todd, 1987; von Hofsten, 1979, 1991, 1993). However, Mathew and Cook (1990) showed that when movement units are de- fined as by von Hofsten ( 1991 ), there are a significant number of direction changes within movement units. Von Hofsten (1993) also suggested that action units of the reach are not the output of a single motor program because movement units have highly variable shape.

Recently, others have focused on the dynamics of infant reach. Thelen et al. (1993) studied infants around the time of onset of reaching and confirmed von Hofsten's ( 1991 ) basic re- sults. Unlike von Hofsten ( 1991 ), who focused on the common features of development, Thelen et al. (1993) emphasized that different infants show different developmental paths. Thelen et al. ( 1993 ) found that 2 of their infants moved very slowly with highly damped movements at the onset of reaching and that 2 of their infants initially moved rapidly and energetically. By 22 weeks of age, all 4 infants showed similar reaching styles; the slow-moving infants had speeded up their reaches, whereas the energetic infants had slowed their reaches. Thelen et al. ( 1993 ) concluded that the central nervous system does not contain rigid programs that detail the kinematics of early reaching but that development of reaching is the result of the infant matching the dynamics of the arm to the task.

Other data show that infant reaching is not simply a neural program that is triggered by the presence of a goal object, but that infants match the kinematics of their reaches to the task and their goals. Figure 1 shows two hand movements of a 6-

8 12 BERTHIER

A

1400

1200 Bat

~ 1000

600

400 Reach 200

0 400 800 1200 1600 Time (ms)

B

120

g l00 g

80

~- 60

g 4o

2o a

Bat

J

1 0 460 860 1200 1600" 20'00

Time (ms)

Figure 1. Batting and reaching speed profiles for movements by a 6-month-old infant• The end of the time series is the time of contact lbr each of the movements• A: Both speed profiles are displayed on the same scale. The two movement units in the bat correspond to moving the hand back to the body and moving the hand from the body to contact the toy. B: A plot of the distance of the hand from the marker on the toy for the reach and bat. The reach occurred first, and the time delay from the end of the reach to the start of the bat was compressed on the figure, but was l s during the trial. The distance to the toy at the beginning and end of the bat is not equal because of slight movement of the toy and because the distance is measured to the position of the marker on the toy, not the toy itself. These data were obtained from the experiment described in Experiment 1.

mon th -o ld tha t resulted in contac t with a plastic toy held in front of the infant. One movement , labeled reach, started with the hand close to the body and resulted in or ien ta t ion of the hand with the toy and grasp of the object. The other movement , labeled bat, was a backward and forward movemen t of the hand tha t s tar ted with the hand in contac t with the toy. Dur ing the bat the infant moved the hand back to the body to approxi- mately the same posit ion as the beg inn ing of the reach and then moved the hand at high speed to momenta r i ly contac t the ob- ject. The two m o v e m e n t uni ts cons t i tu t ing the bat cor respond to the hand moving backward to the body and forward to the toy, respectively. In this par t icu lar exper imenta l session, the in- fant made several reaches and bats. W h e n the contact resulted in grasp of the toy, movemen t k inemat ics resembled those la- beled reach, when the toy was approached open-handed and contacted momentar i ly , the k inemat ics resembled those labeled bat. This example shows tha t infants use different k inemat ics when the goal of the reach is different.

Figure 1 suggests tha t there is someth ing abou t the task of reaching tha t leads the infant to adopt mul t ip l e -movement -un i t hand movements . Figure 1 shows tha t it is clearly possible for the infant to move the hand to the toy in a single movemen t uni t because tha t is what the in fan t does in the bat. One obvious difference between bat t ing and reaching is tha t at the end of a reach the infant is in a posi t ion to grasp the toy, bu t at the end of a bat the hand is no t in a posi t ion to grasp the toy because it is moving too rapidly. One hypothesis developed in this article is tha t infants show mul t ip l e -movement -un i t reaches because they are the most k inemat ica l ly efficient way to move the hand to grasp objects.

Thelen et al. ( 1993 ) proposed tha t the tai loring of movemen t k inemat ics to the task is one of soft assembler: So/? assembly means tha t the reacher marshals the dynamics of the body and the e n v i r o n m e n t to create a movemen t tha t is specific to the task at hand. This means tha t funct ion imposes cons t ra in ts on the action, tha t the behavior is self-organizing and optimizing, and tha t the system discovers good solutions according to task

demands. Von Hofsten (1993) has also described the p rob lem of moto r learning as a search of " task space" in which the in- fants explore the results of their actions and choose reaches tha t are the smoothes t or most economical•

The main goal of this article is to propose a mathemat ica l model of the development of reaching. The proposed model bui lds on the work of von Hofsten, Thelen, and others and makes predict ions about the k inemat ics tha t should be ob- served at various t imes in development. The model hypothe- sizes tha t infants use mult iple movement uni ts in reaching be- cause such reaches are relatively efficient given the state of thei r developing neuromuscu la r system. The generat ion of reaches is also hypothesized to occur th rough the interact ion of the infant with the env i ronmen t , using a trial and evaluat ion me thod that leads the infant to discover an optimal s t ruc tur ing of the reach. Data are presented that suppor t the basic assumpt ions of the model.

M o d e l

Infant reaching is modeled as a stochastic, or probabil is t ic , opt imal control p roblem. Meyer, Abrams , Kornb lum, Wright, and Smith (1988) have developed a s imilar model of adul t poin t ing exper iments . In a classic exper iment , Fitts (1954) found tha t adults adjust the speed of reaching to the accuracy of movement ; increasing demands for accuracy result in de- creases in reach velocities. To explain these data Meyer et al. assumed that adults a t t empt to move to a target zone in such a way as to min imize total movemen t t ime and that they reach using any n u m b e r of e lementa ry movements . Meyer et al. fur- ther assumed that rapid movements result in more endpo in t e r ror than slow movements . The Meyer et al. model produces results tha t match exper imenta l data.

The cu r ren t work extends Meyer et al.'s ( 1988 ) model to in- fants. We assume tha t infant movements are composed of a se- quence of e lementa ry units, tha t each of the submovements has some error tha t depends on the speed of the submovement , and

LEARNING TO REACH 813

that infants attempt to reach in a way that minimizes total movement time. The submovements of the current model differ from the movement units ofvon Hofsten ( 1979 ). Von Hofsten's movement units are operationally defined by an acceleration and deceleration of the hand and by definition do not overlap in time. The submovements in the current model do not require an acceleration and deceleration, and a submovement may be generated before the preceding submovement has been com- pleted. For computational simplicity, the implementation of the model presented here assumes that a submovement is com- pleted and the hand speed drops to zero before the next sub- movement is generated. It should be relatively simple to extend the current mathematical formulation to the case of overlap- ping submovements.

Infants are hypothesized to develop movement strategies through a process of"trial and evaluation" as modeled by con- nectionist (reinforcement learning) procedures. In reinforce- ment learning tasks, an agent takes actions in a particular envi- ronment (e.g., Barto, Bradtke, & Singh, 1995). Inherent in the task specifications is a goal that the agent is required to accom- plish. Reinforcement learning algorithms that solve these tasks work by executing actions and evaluating the consequences. If the agent takes an action that makes the attainment of the goal more likely, the probability of that action in the future is in- creased. Conversely, if the agent takes an action that makes at- tainment of the goal less likely, the probability of that action is decreased. Reinforcement learning algorithms generally ex- plore the effects of many possible actions to determine which action is the best. If a stochastic optimal control problem is ad- dressed by a reinforcement learning algorithm, it can be proven that the reinforcement learning algorithm will find an optimal solution under certain assumptions (Watkins & Dayan, 1992). Reinforcement learning algorithms have the advantage that in nonstationary environments the algorithm will adjust the ac- tion probabilities to select actions appropriate for the current environment. In the current context, where the infant's neuro- muscular system is constantly maturing and changing, rein- forcement learning algorithms will select the action that is most appropriate for the current state of the infant's development.

The model specifically assumes the following: 1. Infant reaches are composed of a sequence of

submovements. 2. Infants learn to reach when their cognitive, motor, and

perceptual systems are undergoing rapid development. Because of the dynamics of the arm, motor commands often have out- comes that are unanticipated by the young infant. At this time, the mapping between their motor commands and movements is constantly changing. Thus, infants learn to reach when they cannot accurately predict the consequences of particular motor commands. This uncertainty decreases with age.

3. Movement strategies of infants are adapted to deal with this uncertainty, and the strategies used by infants arise through a "trial and evaluation" process. The strategies infants use track changes in their neuromuscular system and are optimal or near- optimal strategies.

The model is specified by a mathematical model of the neu- romuscular system of the infant that has the key feature that the mapping between motor commands and movements is stochas- tic. The amount of uncertainty in the motor command to move- ment mapping is assumed to decrease with age and is modeled

by reduction of model stochasticity. A numerical learning algo- rithm that works by "trial and evaluation" is used to compute the movement strategies that bring the hand most reliably to the target in minimum time. The computed strategies for move- ment can then be compared with the strategies used by infants at various ages.

S imula t ion Exper imen t

For simplicity, the current simulations assume that the hand is moving in a plane. The model of the lower level motor systems has a single parameter k, which controls the degree of stochas- ticity in the mapping from motor commands to movements. The state of the system at step i = 0, 1 . . . . is an ordered pair si = (xi, Yi ) specifying the planar coordinates of the hand. Each coordinate is an integer between - 4 0 and 140. The goal is to move the hand into a 10 × 10 box centered on ( 100, 100). The motor command generated by the controller (higher motor systems) at each step is a triple, ai = (dxi , dyi, t~i), with dxi and dy~ being commanded distances in the x and y directions, respectively, and v~ being commanded average speed, dxi and dyi E {-60, -40, - 2 0 . . . . . 100}, and vi @ {80, 160, 240, . . . . 480 }. The present simulations use distance in mm and speed in mm/s. The motor command is applied to the model of the arm and state transitions are defined as

xi+l = xi + dxi + 7(0, 0"2) ( 1 )

Yi+l = Yi + dYi + 7(0, 0"2), (2)

where n(0, tr 2) is a Gaussian probability distribution with a mean of 0 and variance a 2 = kv~ + .2. k is constant between zero and one that controls the degree of stochasticity of the state transitions. Equations 1 and 2 define a model of the infant's arm in which the amount of uncertainty about intended movements increases with movement speed.

Q-learning was the particular reinforcement learning algo- rithm that was used to solve the stochastic optimal control problem stated above. Like other reinforcement learning algo- rithms, Q-learning works by trial and evaluation and has been shown to be computationally tractable. In the current context, Q-learning is not meant to be a specific model of infant learning but is simply used to efficiently compute movement strategies for various levels of stochasticity.

Q-learning works by computing a Q-function that estimates the cumulative costs of taking particular actions in particular states. The optimal action of each state is then the action that minimizes the total cumulative cost. Q-learning has been shown to converge to an optimal control strategy under certain condi- tions (Watkins & Dayan, 1992 ) and in practice has been shown to compute approximately optimal control strategies (Barto et al., 1995 ). A detailed description of the Q-learning procedure used in the current simulations is provided in the Appendix.

S imula t ion Results

A series of simulations were performed that differed only in the value of k, the arm model parameter that controls the de- gree of stochasticity. Computations with high stochasticity are meant to refer to infants just learning to reach, and those with

8 1 4 B E R T H I E R

A

C

E

140

120

i00

80

s o 4 0

20

0

140

120

100

80

60

40

20

0

- 2 o ',0

140

120

100

80

60

40

20

0

-2920

• °°° • •

• ...¶.,.'. ..~" . . • . .

. , , . ' . .

• " • .

• • * ! % ,* .

• ~ * % , . °% ,,

**

6 ~o go Ko ~'o ~o~o~o

• . ' ° ° • ° •

~ o.t. -, ." • . I I ~ I * *

I ° : ~, ° , oO

go 8'o 16o 1~o 14o

. ° ,t "* • . ~.::,,:...

B

D

F

140

120

i00

80

60

40

20

0

:o" 8"" 2'o

• . .. ° °, ,,

• . • .~ ~.'..~_

• ° L.~.. ~ • 4 - -

• . ,~,

40 60 80 i00 120 1~

140

120

I00

80

60

40

20

0

-20_

~'. ~'.:. " .. -. -.~.-.,... ,'...;."

% ** •,. •.. . .. , ~ ..~.

• . .*

0 0 20 40 60 80 100 120 140

. °

• :..~_~_~ •

• . - t.J-,,~.

s "

140

120

i00

80

60

40

20

0

- 2 9 2 0 6 2"0 4"0 go e'o 16o i~o 14o 6 2'o 4'0 6'0 8'o 16o ~oi~o

Figure 2. Hand positions at the end of each movement unit during 300 simulated reaches. The movements were controlled by a Q-learning-computed policy that approximated a time-optimal policy. Panel A shows the positions of the hand at the end of the first unit, Panel B at the end of the second unit, Panel C at the end of the third unit, and so on. If the hand reached the target, the reach was terminated, and hand positions were not plotted on subsequent panels. The starting position of the hand was (0, 0), and the target was the box centered at ( 100, 100). These reaches used an arm model with k = 20 × 10 5. Note that the optimal strategy is to move the hand about halfway to the target during the first movement unit and then use several small movements to reach the target.

low stochast ici ty are mean t to refer to older infants. Figure 2 shows a s u m m a r y of the k inemat ics o f 300 reaches with an a r m model with relatively high stochasticity. Each panel of the figure shows the (x , y) posi t ions o f the hand after a par t icu lar sub- m o v e m e n t across the 300 reaches. Panel A shows hand posi- t ions after the first submovement , Panel B after the second sub- movement , and so on. Each dot shows the hand posit ion for a par t icular reach. I f hand posi t ion was wi th in the box centered at ( I00, 100), the target was considered a t ta ined, and the reach was t e rmina ted . All reaches s tar ted at (0, 0 ) .

The s imula t ions with high a r m - m o d e l stochast ici ty indicate t ha t the opt imal strategy is to reach for the target using a se- quence o f s u b m o v e m e n t s o f relatively high speed. T he first un i t takes the hand abou t halfway to the target, and several smaller movemen t s take the hand the rest of the way. T he target is usu- ally a t ta ined wi th in six movemen t s units.

Figure 3 shows the k inemat ics o f 300 reaches with an a rm model of m u c h lower s tochast ici ty than Figure 2, Figure 3

shows tha t the opt imal strategy with low stochasticity is to reach for the target in one submovement , a strategy tha t works on a lmost all of the trials• Subsequent s imulat ions showed tha t the "sca t te r" seen in Panel B is the result of the coarseness of the data s t ruc ture for the Q-funct ion.

Figure 4 summar izes the results of the simulations. The mean n u m b e r ( top) , speed (midd l e ) , and dis tance (bo t tom) of move- men t uni ts in opt imal strategies is shown as a funct ion of de- creasing model stochasticity, which corresponds to increasing age of the infants. As expected, the s imula t ions showed that with increasing age the efficient strategy is to reach with fewer sub- movements of increasing distance. This predic t ion qualitatively matches the available longitudinal da ta showing tha t infants reach with fewer movemen t uni ts o f increasing dis tance as de- velopment proceeds (von Hot , ten, 1991 ; Thelen et al., 1993).

An unexpected result was seen in the t rade-off of movemen t speed and n u m b e r of submovements . The s imulat ions show tha t wi th high stochastici ty the efficient strategy is to reach for

L E A R N I N G T O R E A C H 8 1 5

A

C

140

120

i00

80

60

40

20

0

-20.

140

120

i00

80

60

40

20

0

-20

~0

20 40 60 80 100 120 140

D.

I~IIII IoI.lII:llI liIo| ~III i |,

140

120

I00

80

60

40

20

o - 2 0 .

- , : 0

D 1 4 0

120

100

8 0

6 0

4 0

20

0

-2 2

2'0 40 60 80 i00 120 140

D

0 0 2"0 40 60 80 I00 120 140

Figure 3. Hand positions at the end of each movement unit for 300 simulated reaches. Data plotted as in Figure 2. These reaches are using an arm model with k = 2 × 10 -5. Note that the optimal strategy is to try to move the hand to the target in one movement unit. The "scatter" in hand positions in Panel B is a numerical problem that is due to the coarseness of the representation of hand position in the data structure storing the Q-values.

the target with four or five relatively high speed units. As sto- chasticity decreases with age, an abrupt transition in strategy is seen. At about k = 15 × 10 5, the optimal strategy shifts from reaching with four or five high-speed units to reaching with one or two low-speed movements. Decreasing stochasticity beyond this point leads to faster and faster movements composed of one or two units. This leads to the counterintuitive prediction that a decrease in movement speed will be seen at a point in devel- opment when the number of submovements decrease.

E x p e r i m e n t 1

The current model assumes that infant reaches are a se- quence of submovements and that by moving in such a se- quence infants are able to correct for early inaccuracies in the reach by making corrections late in the sequence. Although the mathematical implementation above makes the simplifying as- sumption that the hand comes to rest between submovements, the available data clearly show that the submovements overlap in time. The goal of Experiment 1 is to relate the empirically obtained data with the simple assumptions of the mathematical model and to obtain a theoretical understanding of the form of the submovements underlying infant reaches.

Von Hofsten (1979) was the first to argue that infant reaches are composed of a series of submovements and to define sub- movements by minima of hand-speed profiles. Fetters and Todd ( 1987 ) later showed that the infant's hand changes direction at these times of minimal speed, a result that further supports the hypothesis that von Hofsten movement units are in fact func- tional submovements. Von Hofsten ( 1991 ) confirmed this anal- ysis in a longitudinal study, and von Hofsten and R6nnqvist ( 1993 ) confirmed these results in neonates.

Although these results are consistent with the view that infant reaches are composed of a sequence of submovements, two problems remain. First, changes in hand direction are often seen between speed valleys. Approximately 20% of all curvature maxima occur between speed valleys (von Hofsten, 1991; von Hofsten & R6nnqvist, 1993, Mathew & Cook, 1990). This finding is not consistent with the hypothesis that infants reach in a sequence of simple submovements. It is possible, how- ever, that the reason directional changes are observed within movement units is that the current segmentation methods do not fully decompose infant reaches into their elementary submovements.

A second problem is that the current segmentation methods are empirical in nature and do not have a rigorous theoretical justification. It is possible that the accelerations and decelera- tions of the hand during the reach are a reflection of the dynam- ics of the arm and not consequences of a sequence of actions, much like a damped pendulum accelerates and decelerates be- fore finally resting. A theoretical understanding of the submove- ments of the reach that is consistent with current theories of movement generation would provide strong evidence that infant reaches are composed of a sequence of submovements.

Examination of the literature reveals that adults typically move their hands in straight lines and show bell-shaped speed profiles when asked to move in between two points in space (Abend, Bizzi, & Morasso, 1982 ). Flash and Hogan ( 1985) sug- gested that adults plan movements between two points as a straight line and adopt movement kinematics that minimizes hand jerk. Experimental results confirm that adults use mini- mum-jerk trajectories when asked to make simple movements (Flash & Hogan, 1985). Although the minimum-jerk model

816 BERTHIER

5 "5

.o E

Z

.= 50 40 30 20 10 Stochasticity

(ID

121

500

400

300

200

100

O6' o

180

160 140

120 100

80

60 40

20

5'0 4'0 3'0 2'0 1'0 Stochasticity

s'o 4'o 3'0 2'o Stochasticity

~'o o

Figure 4. The simulated number (top), speed (middle) , and distance (bot tom) of the movement units as a function of increasing age. Increas- ing age was modeled as decreasing stochasticity in the a rm model. The values ofstochasticity are X 10 5. The speed and distance for movement units 2 and 3 are not given for stochasticities < 15 x 10 5 because these movements were almost always composed of one movement unit.

doe s n o t fit all t h e d a t a (e .g. , U n o , K a w a t o , & Suzik i , 1989) ,

m i n i m u m - j e r k p o l y n o m i a l s c a n a c c o u n t for a wide va r i e ty o f

a d u l t da ta . It is l ikely t h a t t he o b s e r v e d m i n i m u m - j e r k t r a jec to -

r ies a r i se n o t f r o m s o m e p l a n n i n g p r o c e s s as F l a sh a n d H o g a n

( 1 9 8 5 ) sugges t ed , b u t as a r e su l t o f t he d y n a m i c s o f t he a r m

itself.

Recent ly , F l a s h a n d H e n i s ( 1 9 9 1 ) e x t e n d e d m i n i m u m - j e r k

t h e o r y to t he case in w h i c h a d u l t s a r e r e q u i r e d to m o v e to a

sh i f t ed target . T h e y f o u n d t h a t t h e y c o u l d fit e x p e r i m e n t a l l y ob-

t a i n e d k i n e m a t i c d a t a by a s s u m i n g t h a t m o v e m e n t s to a sh i f t ed

ta rge t cons i s t o f two m i n i m u m - j e r k s u b m o v e m e n t s t h a t s i m p l y

s u m m e d . T h e c u r r e n t e x p e r i m e n t s eeks to e x t e n d F l a sh a n d

H e n i s ' s ( 1991 ) t h e o r y to i n f a n t r e a c h i n g m o v e m e n t s a n d s h o w

t h a t i n f a n t r e a c h e s cons i s t o f a s e q u e n c e o f m i n i m u m - j e r k sub-

m o v e m e n t s . M i n i m u m - j e r k p o l y n o m i a l s a re u s e d s i m p l y be-

c a u s e t h e y e m p i r i c a l l y fit t h e a d u l t d a t a a n d n o t b e c a u s e a n y

a s s u m p t i o n is m a d e in t he c u r r e n t m o d e l t ha t i n f a n t s act ively

p lan to m i n i m i z e m o v e m e n t jerk . I f t h i s e x t e n s i o n is success fu l ,

t h e n s t r ong ev idence w o u l d be o b t a i n e d t ha t i n f a n t s m o v e in a

s e q u e n c e o f e l e m e n t a r y s u b m o v e m e n t s a n d t ha t the u n d e r l y i n g

m o v e m e n t s a re s i m i l a r to s i m p l e p o i n t - t o - p o i n t m o v e m e n t

m a d e by adu l t s . A s i m i l a r b u t a theo re t i ca l m e t h o d h a s b een

deve loped by M i l n e r ( 1992 ) for a d u l t r eaches .

Method

Participants. Six 6.5-month-old infants were used as participants in the current experiment. Three of the infants were male and 3 were fe- male. Infants were recruited as part of a larger longitudinal study, and because of the time constraints of that study, infants were from families whose parents had ties to the university. All infants were born at full- term, in good health, and had a normal course of development.

Stimulus and apparatus. Infants were presented with a hand-held plastic replica of "Big Bird" (7 cm in length). A small rattle was held behind the toy. As part of the larger longitudinal study, inthnts were also tested in the dark with a glowing toy and in the dark with a sounding toy. Only data from the fully i l luminated trials are presented here.

The infants were videotaped throughout the session at 33 f rames / s with an infrared camera (Panasonic WVI800) that was placed to the right of the infant for a side view of the reaches, in addition to the vid- eotape, the reaches were recorded using an Optotrak motion analysis system ( Northern Digital ). This system consists of three infrared cam- eras that generate estimates of a marker 's position in three-dimensional coordinates. In the current experiments, four infrared emitting diodes ( IREDs) were placed on the infant 's a rm and used as markers. The Optotrak system estimated the positions of these markers at a rate of 100 Hz. Position data were acquired during 15-s trials. Two IREDs were taped on the back of the infant 's right hand, one just proximal to the joint of the index finger and one on the ulnar surface just proximal to the joint of the little finger. Two IREDs were used in order to keep at least one in camera view if the infant rotated his or her hand during the reach. Infants of this age are not bothered by the IREDs and tend to ignore them once they are in place. One IRED was placed on the exper- imenter 's hand that held the toy. The Optotrak cameras were placed above and to the right of the infants,

Both the video camera and the Optotrak system were fed through a date-timer ( For-A ) and into a videocassette recorder ( Panasonic Model 1950) and a video monitor (Sony Model 1271 ). The Optotrak system and the date-timer were triggered simultaneously in order to time-lock the IRED data with the videorecorded behavior for later scoring.

Procedure. Infants were seated on their parents" laps in front of the experimenter who presented the objects. The parents were asked to hold their infants firmly around the hips to support the infant and allow for free movement of the infant 's arms, The parents were further asked to refrain from at tempting to influence the infant in any way. A second experimenter was seated out of view to trigger the Optotrak system and to observe the infant on the video monitor. After the IREDs were ap- plied to the infanrs hand, the presenter indicated that the session could begin. Before each trial, the presenter got the infant 's attention while concealing the toy, and began the trial when the infant looked straight ahead and made eye contact. A trial consisted of the presenter signaling the other experimenter to trigger the Optotrak, then bringing the toy into view and shaking it slightly. The toy was presented at approxi- mately 30 ° from the midline in the infant 's right hemifield about shoul- der height and at a rm ' s length. Presenting the object in this hemifield encouraged right-handed reaching, which could be tracked with l REDs. The presenter held the toy in this position, shaking it intermittently, until the infant successfully reached for and grasped it or until the 15-s trial ended, lntertrial intervals lasted approximately 10 s.

Lighted, glowing, and sounding object trials were always presented in blocks of two trials per condition. The order of blocks was randomized

LEARNING TO REACH 8 1 7

across participants. Each infant continued to receive blocks of trials in random order until he or she had completed 18 total trials (6 per condition ) or until he or she became fussy. All infants bad at least one reach in each condition.

Data scoring. The trials were scored by viewing the Optotrak data in conjunction with the videotape. Each trial was first examined on the videotape to ascertain whether a reach was performed on that trial and approximately where during the 15-s trial the reach occurred. A reach was defined as a forward mot ion of tbe a rm that resulted in contact with the object that was not part of a turning mot ion or a torso rotation. The Optotrak data for successful reaches were then examined to determine whether the IREDs had remained within sight of the cameras for the duration of the reach. If a reach occurred, and the Optotrak data had gaps of less than 330 ms of missing data, the kinematic data were ana- lyzed. Of the two IREDs on the reaching hand, only the one that was missing the least amoun t of data were used for further analysis. If nei- ther IRED was missing any data, then the IRED that was used in the majority of the other trials for that participant was used.

Reach onset was defined as the momen t when the infant 's a rm began an uninterrupted approach toward the object and was determined by viewing the infant 's behavior on videotape. Two observers indepen- dently viewed each reach and noted the t ime of reach onset. In cases of disagreement, a third observer also viewed the videotape. If the third observer decided that reach onset was clearly one of the t imes noted by the other observers, the third observer chose that onset time. If tbe third observer could not decide which o f tbe other observer's t imes was most appropriate, the earlier t ime was used. The same procedure was used to determine the t ime of initial contact with the object, except that the later t ime was chosen in ambiguous cases.

Kinematic data analysis and computational methods. The data ob- tained from the Optotrak system are estimates of the true IRED posi- tion at the t ime o f t be sample. The dynamic p rogramming method of Busby and Trujillo (1985) was used to estimate the position, velocity, and acceleration of the hand. We used the criteria suggested by Busby and Trujillo for selecting B and used B = 1 X 10 - ~ . In the current article, the te rm speed refers to the magni tude of the velocity vector.

Following Flash and Henis (1991), we assumed that the observed speed profile is the simple sum of a series o f submovements , each sub- movement being described by a min imum-je rk speed profile. Mini- mum-jerk movements that start at position x = 0 and t ime t = 0, and end at position x = 1 and at t ime t = 1, are given by

x( t ) = 10t 3 - 15t 4 + 6t 5. (3)

Equation 3 can be parameterized to describe movements of arbitrary length and starting location (Flash & Hogan, 1985 ). The speed profile of a min imum-je rk movement is obtained by differentiating Equa- tion 3,

Jc(t) = 30t 2 - 60t 3 + 30t 4, (4)

where x is the time derivative o f x ( t ) . To fit arbitrary movements, we require a parameterization of Equa-

tion 4 that allows arbitrary shifting in t ime and scaling in t ime and height. An appropriate parameterization of Equation 4 is

x ( t )

where p is proportional to the peak speed, c is the t ime of peak, and w is the t ime duration o f tbe movement , x ( t ) is defined as zero i f t < c - w/ 2 o r t > c + w/2.

If Flash and Henis 's ( 1991 ) result is appropriate for infants and the observed speed profile is the simple sum of a series o f min imum-je rk submovements , then the observed speed profile should be the simple

sum of a set of functions described by Equation 5. If there are n sub- movements indexed by i, then the observed speed profile should be

~( t ) = ~ x i ( t ) . (6) i

The parameters that control the position and shape of the i m sub- movement will be denoted by p~, q , and w~.

Although Equation 5 is nonlinear, its parameters can be estimated relatively easily because infant submovements are separated in time. In summary , the present algorithm first estimates the parameters of the largest submovement of the reach. This submovement is then subtracted from the data, and the largest remaining submovement is fit. This sub- traction-fitting cycle proceeds until all the submovements have been fit. The parameter estimation step of each iteration uses gradient descent on the error function to compute parameter estimates.

The data are first filtered and differentiated by the algorithm of Busby and Trujillo ( 1985 ). Next, the m a x i m u m of the speed data is found, and it is assumed that the m a x i m u m occurs near the center, c;, o f tbe largest submovement . The parameters of the largest submovement are then es- t imated by fitting a movement of Equation 5 to the peak and its neigh- boring four points. Because the data were sampled at 100 Hz, this fitting was done using 50 ms of data. The parameter estimation is done by gradient descent on the mean squared error function,

j + 4

E = Z [ x ( j ) - x ( J ) ] 2, (7) J

where x is the actual segment of data to be approximated, .~ is the ap- proximation, and j is the t ime index of the first of the five points of x that we are trying to approximate. The parameters, p;, ci, and wi, are adjusted to go down the gradient,

OE OE OE zXpi = - c ~ - - ~c; = AW; = (8)

Op," -~ 7c,' - 'r Ow,"

Step sizes, a, fl, % were adjusted according to Darken and Moody ( 1991 ). The initial values for the parameters, p;, c;, and w;, were pro- vided by the peak speed of the data, the t ime of peak speed, and by a default value of 30 ms, respectively. Experimentation with various numbers of steps in the gradient descent indicated that 50,000 steps worked well in an acceptable amoun t of computer time.

After the 50 ms of the data were fit, the function was extended from t = q - w~/2 to t = c~ + wJ2. This function describes the largest sub- movement. Because Flash and Henis ( 1991 ) showed that multiple sub- movements sum, we then subtracted this submovement from the data and proceeded to fit the next largest submovement of the remaining data. A pseudo-R 2 was computed to assess the quality of the fit by di- viding the mean squared error of the approximation by the mean of the sum of the squared data and subtracting it from 1. The iterations continued until one of four criteria were met: (a) The mean squared error was less than 40.0, (b) the addition of the last submovement re- duced the mean squared error less than 20.0, (c) on the average, there was more than one submovement per 100 ms, or (d) the pseudo-R 2 was greater than .992.

Once ~he number and parameters of the submovements in a reach were determined, a global gradient descent on the error function was performed to optimize the parameters. The global gradient descent was performed as in each of the previously ment ioned fitting steps, except that all of the parameters were adjusted simultaneously. A computer program written in c implemented the decomposition algorithm (written by and available from Neil E. Berthier).

R e s u l t s

To i l lus t ra te h o w the d e c o m p o s i t i o n p r o c e d u r e works , de-

c o m p o s i t i o n s o f two m o v e m e n t s will be d e s c r i b e d in detai l . T h e

818 BERTHIER

first movement was the high-speed bat of the toy presented in Figure 1, and the second was a typical reach for the toy. The bat is particularly relevant to the question of the extension of Flash and Henis ( 1991 ) to infant data because it consists of two sub- movements and is directly analogous to the two submovements that Flash and Henis found in response to double shifts of a target.

The hand-speed profile of the bat is shown in Figure 5. At the start of this movement the baby was touching the toy and rap- idly brought her hand back toward the shoulder, reversed direc- tion, and rapidly brought the hand back in contact with the toy, hitting it at the end of the data sequence. This movement appeared to consist of two movements, up and back, and thus is directly analogous to the Flash and Henis double shift paradigm.

Panels A through C on Figure 5 show the sequential decom- position of the movement. The algorithm first chose to fit the peak occurring at 310 ms. The program chose initial values for the submovement and submitted the 50 ms around the peak to a local fit. Once the parameters of the peak were estimated, they were extended over the full domain of the submovement (250- 350 ms). The submovement after the first fit is shown in Panel A, the dashed line is a plot o fa parameterized version of Equa- tion 5. The program then chose to fit the peak at 150 ms. This peak was locally fit and the submovement extended, with the results shown in Panel B. After this step a third peak was at- tempted at 170 ms, but the mean squared error actually in- creased after the local fit so that the third peak was discarded.

The parameters representing the two submovements were then subject to global fitting. The results after the global fitting stage are shown in Panel C of Figure 5. The final fit was pseudo- R 2 = 0.9998. In all three panels the sum of the submovements is also shown by the dotted line (labeled a p p r o x i m a t i o n ) , but because it overlies either the tails of the submovement or the data itself, it is difficult to see.

Figure 6 shows the hand-speed profile for a typical reach per- formed by this infant. This movement had multiple peaks and was much lower in velocity than the movement shown in Figure 5. The decomposition of this reach is shown in Panels A through E. Panels A through D show the decomposition after the first, second, third, and fourth submovements. After the fourth sub- movement, a fifth submovement was attempted, but addition of the fifth submovement caused the mean squared error to in- crease, and it was discarded. Panel E of Figure 6 shows the de- composition after the global estimation step that resulted in a mean squared error of 124 and a pseudo-R 2 of 0.992.

Overall, the six infants reached on 52 trials in the light in their experimental sessions. Fifty of these reaches were made with the hand that had the IREDs on it, and 43 of the reaches had data with sufficient Optotrak data to be differentiated. These 43 reaches were submitted to the decomposition procedure de- scribed earlier.

The experimenter visually inspected the decompositions and provided an estimate of the quality of the fit. Of the 43 decom- positions, 35 were judged "good" by the experimenter and were similar to the decomposition displayed in Figure 6. Six other decompositions were judged "adequate" by the experimenter, and two were judged "poor." The two decompositions that were judged poor included a very slow reach in which the maximum speed was 60 mm/s and a decomposition in which the local fit was acceptable, but the global fit resulted in the widening of a single submovement to span 600 ms. The mean pseudo-R 2 was .993 for the 41 reaches judged good or adequate. The mean pseudo- R 2 was .872 for the two reaches judged poor.

Figure 7 shows boxplots for durations of the submovements, for the time delay between subsequent submovements, and for the peak speeds of the submovements based on 239 submove- ments. The plots show that the submovements used by the in- fants are not random and show certain characteristics. For ex- ample, the central half of the submovement durations ranged

E

u'J

A 1400

1200

1000

800

600

400

200

0

Data - -

5" 0 ' ' ~ ,t , J 100 150 200 250 300 350

Time (ms)

Approximation . . . . . . . . . . .

B

w

50 100 150 200 250 300 350 Time (ms)

Submovements . . . . . . . . . . .

C

. . . . .~, ' % . ,

0 50 100 150 200 250 300 350

Time (ms)

[\igure 5. Steps in the decomposition of the hand-speed profile for a rapid bat of the toy. A. The fit after the first iteration. The algorithm has added a peak in the hand-speed profile at about 300 ms. B: The fit after the second iteration, where the algorithm added a peak at about 150 ms. C: The approximation after a global fit, where the parameters of both submovements were adjusted simultaneously. The overall pseudo° R 2 for this fit was 0.9998.

LEARNING TO REACH 819

A 350

E g 25O

Q. if)

150

100

50

Data Approximation . . . . . . . . . . . Submovements ...........

B C

100 200 300 400 500 600 0 Time (ms)

D 40O

350

g 300 ~ 2 5 0

~ 200

m 150

100 /',, 0 100 200 300 400 500 600

Time (ms)

/ ".-"

100 200 300 400 500 600 Time (ms)

E

/',, # ' ' . , . . : ' ,= , ,

0 100 200 300 400 500 600 Time (ms)

:.', : ,. • ;-,4,

: , /!i?, 100 200 300 400 500 6C)0

Time (ms)

Figure 6. Decomposition of a multisubmovement reach. The algorithm adds a single submovement at each stage of the iteration from Panels A-D. Panel E shows the decomposition after the parameters of the submovements were adjusted simultaneously. The overall pseudo-R 2 for this fit was 0.992.

between 230 to 316 ms. All three distributions showed a hand- ful of numerically large outliers.

Discussion

The current results show that it is possible to decompose a large proportion of infant reaches into an underlying sequence of submovements that resemble simple movements of adults. These results provide a theoretical understanding of the sub- movements themselves and how they are combined to form a reach, and they suggest that infant reaches are composed of a sequence of simple actions as hypothesized by the current model. The data also show that a major difference between adult reaches observed in other experiments (e.g., Flash & Ho- gan, 1985 ) and infant reaches is that adults are able to generate movements that reach targets in a single submovement, whereas infants require several submovements.

Von Hofsten ( 1991 ) found that single movement units did not preserve their basic shape from action unit to action unit when movement units were defined by speed minima. Move- ment units defined by speed minima often have asymmetrical and irregular shapes. He also found that movement unit dura- tion and movement unit speed were relatively poorly correlated. These results led yon Hofsten (1991) to suggest that single movement units are not generated by a single, simple rule, but that they were generated in some complex way that was highly context dependent. The current results show it is possible to

decompose infant reaches into submovements that have a sin- gle, invariant shape. Further, the shape of these individual sub- movements is symmetrical and is the same as the shape of sim- ple adult reaching movements. The current results suggest, therefore, that generation of the reach may be simpler than yon Hofsten ( 1991 ) supposed and that the submovements may be the result of a relatively simple process.

To the extent that the current algorithm correctly identifies submovements of the reach, it is clear that submovements in infants overlap in time (e.g., Figures 5 and 6). On the other hand, the mathematical implementation of the model makes the simplifying assumption that action i is completed before action i + 1 is generated. Clearly, further development of the model will require modifications that allow for submovements to be generated before preceding submovements are completed.

An empirical finding of the current experiment is that infant reaches can be decomposed into the underlying movements for a large majority of infant reaches. This finding allows for a mi- croexamination of reach kinematics to determine how infant reaching is planned and executed. The following experiment ex- ploits this finding and analyzes single submovements to deter- mine whether the directional variability is similar to that pre- dicted by the current mathematical model.

Exper imen t 2

The model hypothesizes that multiple submovement reaches are the result of the infant's inability to precisely control the

820 BERTHIER

"G" ¢1 U)

E E

v

¢1 I1} C:L

09

g O)

E . _ I--

1200

1100

1000

900

800

700

600

500 -

400

300

200

100

0

X

O

o

I I I

Our D e l a y Pk S p e e d

Figure 7. Boxplots for submovement duration (Dur), delay between the start of subsequent submovements ( Delay ), and submovement peak speed (Pk Speed) for 239 submovements. Each plot shows the 75th and 25 t, percentiles as the upper and lower boundaries to the rectangle (the hinges), the median as the central horizontal line, and the mean as the plus sign. The whiskers on the box show the most deviant score that is less than 1.5 hingespreads from the hinges, and the outliers are labeled by circles ( 1.5-3.0 hingespreads from the hinge) and crosses (greater than 3.0 hingespreads from the hinge). See Smith and Prentice (1993) for further discussion ofboxplots.

arm. The model assumes that infants are uncertain as to the effects o f their motor commands on a rm position and as a result use a corrective sequence o f submovements to reach a target. The uncer ta inty assumpt ion is modeled by adding a Gaussian random variable to the c o m m a n d to movement mapping (Equat ions 1 & 2).

The validity of the assumpt ion that error in reaching leads to multiple submovement reaches can be directly tested by com- paring the variability of the simulated movements with the vari- ability o f movements made by infants. If the assumption is cor- rect, then the variability of movements in simulation should be comparable to the variability o f movements when infants move in multiple submovements . Exper iment 2 tests this predict ion by compar ing the directional variability o f reaches made by 8 infants at approximately the t ime of reach onset with the direc- tional variability of the simulated reaches of Figure 2.

M e t h o d

Participants. Eight infants were tested at approximately the age of onset of reaching. These infants were tested as part of a longitudinal

experiment and were brought once a week into the laboratory, starting at about 10 weeks of age. Data presented here were obtained from the session during which infants first made five successful contacts with the object ( age range = 13-18 weeks, M - 16.6 weeks).

Stimulus. apparatus, and procedure. The stimuli, apparatus, and procedure were identical to that of Experiment 1.

Data anaO'sis Data were scored, differentiated, and decomposed as in Experiment 1. The position of the hand at the time of submovement onset and offset was then determined. The direction of hand movement during the submovement was defined by the vector connecting these two points. The direction of the target at the time ofsubmovement onset was then determined. Directional error of the hand during the submove- merit was then described by an error in azimuth and an error in eleva- tion. The error in azimuth was defined as the angle to the left or right of the target, and the error in elevation was defined as the angle above or below the target.

Directional error was computed similarly for the simulated data. The time ofsubmovement onset and offset was determined directly from the data in the simulations of Figure 2. Angular error was computed as just described, but because the simulated reaches are two-dimensional movements, a single error in azimuth completely describes the angular error.

Resu l t s

The distributions of angular errors are shown in Figure 8 with the error from the simulated reaches shown by the dashed lines with boxes, the error in elevation by the solid lines, and the error in azimuth by dotted lines with crosses. The infant data are based on 113 submovements of 27 reaches. Figure 8 shows that the distributions of the infant and the simulated data are ap- proximately the same shape. The mean error in azimuth was 2.1 ° with a standard deviation of 67 °. The mean error in eleva- tion was - 4 . 5 ° with a standard deviation of 60 °. Neither mean was significantly different from zero, t( 112 ) = 0.335 and t (112) = -0 .798 , respectively, p > .05. The 95% confidence intervals

.3° !

.25

.20

o

~- .15

.10

.05

0

: Azimuth -*- ,~', Elevation -*

# :', Simulations -0- , I :', , i ,1 , a

' ~

-200 -150 -100 -50 0 50 100 150 200 Degrees

Figure 8. The distributions of the angles between the direction of the target and the direction of the hand movement. One angle (degrees elevation) indicates directional error above or below the target, and the other (degrees azimuth) indicates directional error to the left or right. Angular error is also shown from the simulated reaches of Figure 2 (Simulations).

LEARNING TO REACH 821

for the means were -10.3 < ts < 14.5 and -15.7 < ~ < 6.7, respectively. The mean of the simulated data was -2.17 degrees with a standard deviation of 29.8*.

Discussion

The close fit of the infant and simulated movement variability provides direct support for the assumptions of the mathemati- cal model. First, the comparison shows that although the infants move directly toward the target on the average, there is consid- erable directional variability on a submovement-by-submove- ment basis. As is assumed in the model, infants do seem to be attempting to move toward the target, but the motor commands they use lead to random error in execution. Second, the data show that the variability of infant movement is closely approxi- mated by the zero mean, gaussian random variable assumed by the model. Third, the amount of error in infant reaching, which is the driving force in making multiple submovement reaches more efficient, is comparable to that assumed in the simulations that lead to predicted reaches of multiple submovements.

Genera l Discussion

The current paper presents a mathematical model of the de- velopment of reaching that builds on the work of von Hofsten (1993), Thelen et al. (1993), and Meyer et al. (1988). The model assumes that infants learn to reach by evaluating the effects of movement commands on the state of their arm and on the state of the environment. The model hypothesizes that infants are primarily attempting to deal with their uncertainty in the motor command to movement mapping. Simulations were presented that qualitatively match the kinematic develop- ment of reaching. Experimental data were presented that sup- port two key assumptions of the model. Experiment 1 showed that infant movements could be decomposed into the underly- ing submovements using a principled method, and Experiment 2 showed that the angular error in infant reaches matches the form and magnitude of error assumed by the model.

The current model hypothesizes that the development of reaching is the result of infants' active exploration of their own motor abilities and the constraints of the task. This hypothesis is not a new one (e.g., Gibson, 1988, von Hofsten, 1993, Thelen et al., 1993 ), but the current work differs from earlier work in presenting a concise, mathematical model of the process. The current simulations produce movement strategies that depend jointly on the properties of the muscle-arm system and on the size of the target. Whether motor learning is described as soft assembly (Thelen et al., 1993) or as search of a task space (von Hofsten, 1993), the current simulations show that learning by exploration is an effective means of selecting efficient strategies for movement.

The model, though clearly simplified, captures many of the important features of the development of reaching. The model characterizes infants, not as using a collection of reflexes, but as individuals that explore possibilities for action and select the most efficient acts. Infants are characterized not simply as re- acting to changes in their motor systems and the environment, but as continually searching for the most efficient way of ac- complishing tasks. This view is similar to that of Haith (1980), who characterized neonates as using eye movement strategies to

maximize the collection of information about the environment. The model and data presented here also show that reach kine- matics depend on the intentions of the infant. As shown by Fig- ure 1, infants use one kinematic pattern to grasp objects and another to bat objects. Thelen et al. ( 1993 ) has also shown that infants adapt their reaching kinematics to increase the proba- bility of grasping an object.

In the current work, motor learning was modeled using Q- learning. Q-learning is a simple algorithm that relies on the ac- tor trying various motor actions and evaluating the results. Freedland and Bertenthal (1994) have emphasized that motor learning is critically dependent on generation of a rich set of possible actions from which the most appropriate can be se- lected. If too large a set of possible actions is used, improvement in performance will be slow because so many actions need to be tested. Q-learning attempts to limit the set of possible actions, but the algorithm is still slow because a significant number of actions are executed a large number of times. Perhaps the most important difference between motor learning in infants and Q- learning is that infants are much better at selecting and limiting the number of possible actions during learning (Siegler, 1994).

Q-learning stores the current information about the utility of actions as a Q-function, which in the current work was stored as a look-up table indexed by state and action. Representation of information in this way is not only psychologically unrealis- tic, but an impediment to learning. Representation of the Q- function in a more psychologically realistic way that readily al- lows for interpolation and generalization of information should lead to a better selection of actions and to faster and more real- istic learning (e.g., Albus, 1981; Poggio & Girosi, 1990; Rosen- baum, Loukopoulos, Meulenbroek, Vaughan, & Engelbrecht, 1995).

A second way in which the current simulations are too sim- plistic is in the initial movements of the arm at the beginning of learning. In the current simulations, the initial actions are essentially random because the Q-function was initialized by uniformly setting it to 0.1. However, infants do not start reach- ing by randomly moving their arms. Infants start reaching with a set of movements that are the result of maturation of the neu- romuscular system and their already extensive experience in moving their arms (Thelen et al,, 1993; von Hofsten, 1982). This initial biasing of reaching strategies has the effect of accel- erating the course of development of reaching.

The current model stresses that perceptual development must go hand in hand with motor development. Improvement in mo- tor control is contingent on the infants' ability to perceive the position of the target and the posture of the arm. Although some investigators have emphasized the infant's use of vision of the hand in early reaching (e.g., Bushnell, 1985), results are accu- mulating that emphasize the role of proprioception and effer- ence copy in estimating the posture of the arm. Clifton, Muir, Ashmead, and Clarkson (1993) Showed that lack of vision of the hand around the onset of reaching has no effect on the suc- cess of reaching. Clifton et al. (1994) showed that lack of the vision of the hand did not cause 6-month-old infants to alter the way they reach. Robin, Berthier, and Clifton (1996) showed that even when infants are required to catch a moving object, that lack of vision of the hand has little effect on the success of reach- ing. These data all suggest that learning to reach, sight of the

822 BERTHIER

target, and proprioception from the a rm are far more important than sight of the hand.

The current model makes the simplifying assumption that submovements of the reach do not overlap in time. This translates in the model into having the infant make a submove- merit, have the hand come to a stop, perceive the position of the hand, and generate the next submovement. However, the data clearly show that reach submovements do overlap (e.g., Figure 1 ). If infant reaches are a sequence of correcting submove- ments, then infants seem to be able to generate a correction while the hand is in mot ion and to generate a correction for any error in the next submovement. This process may not involve any explicit prediction of future l imb positions on the part of the infant but may be accomplished by learning a mapping be- tween the current state and the correct action (e.g., Berthier, Singh, Barto, & Houk, 1993, p. 63).

Experimental results show that adults are able to make rapid corrections while the hand is in motion (e.g., Goodale, P61isson, & Prablanc, 1986; P~lisson, Prablanc, Goodale, & Jeannerod, 1986). However, the data on infants are more equivocal. Ash- mead, McCarty, Lucas, and Belvedere (1993) analyzed the hand paths of infants in a target switching exper iment in the dark and found that 9-month-olds made in-flight corrections to switched targets. However, Ashmead et al. ( 1993 ) did not find evidence for in-flight corrections o f 5-month-olds, a result that might be the result of a lack of experimental power.

In sum, this article presents a mathematical model that gives a precise implementat ion of a theory of the development of reaching. Many ideas in the theory have been discussed by oth- ers. The model characterizes infants as actively exploring op- tions for movement and selecting the most efficient movement strategies. Infants are situated in a particular envi ronment and act in order to fulfill their goals and intentions. Because the ki- nematics of reaching depend on what infants intend to do with objects (e.g., Figure 1 ), kinematic analyses of infant reaching offers a tool with which to investigate infant cognition. Recent work using kinematic analyses has provided evidence that in- fants form mental representations of objects they can no longer see (Clifton, Rochat, Litovsky, & Perris, 1991 ), that infants an- ticipate movements of objects in accordance with the law of inertia (von Hofsten, Spelke, & Feng, 1993), and that infants aim ahead of moving objects in order to grasp them (Robin, Berthier, & Clifton, 1996).

R e f e r e n c e s

Abend, W., Bizzi, E., & Morasso, E (1982). Human arm trajectory formation. Brain, 105, 331-348.

Albus, J. (1981). Brains, behavior, and roboti~s. Peterborough, NH: Byte Books.

Ashmead, D. H., McCarty, M. E., Lucas, L. S., & Belvedere, M. C. (1993). Visual guidance in infants' reaching toward suddenly dis- placed targets. Child Development, 64, 1111-1127.

Barto, A. G., Bradtke, S. J., & Singh, S. P. ( 1995 ). Learning to act using real-time dynamic programming, Artificial Intelligence, 72, 81-138.

Berthier, N. E., Singh, S. E, Barto, A. G., & Houk, J. C. (1993). Dis- tributed representations of limb motor programs in arrays of adjust- able pattern generators. Journal of Cognitive Neuroscience, 5, 56-78.

Busby, H. R., & Trujillo, D. M. ( 1985 ). Numerical experiments with a new differentiation filter. Journal of Biomechanical Engineering, 107, 293-299.

Bushnell, E. (1985). The decline of visually guided reaching during infancy. In[ant Behavior and Development. 8, 139-155.

Cliftom R. K., Muir, D. W., Ashmead, D. H., & Clarkson, M. G. (1993). Is visually guided reaching in early infancy a myth? Child Do,elopment, 64, 1099-1110.

Clifton, R. K., Rochat, E, Litovsky, R., & Perris, E. (1991). Object representation guides infants' reaching in the dark. Journal qfExper- imental Psychology: Human Perception and Performance, 17, 323- 329.

Clifton, R. K., Rochat, P., Robin, D. J., & Berthier, N. E. (1994). Multimodel perception in the control of infant reaching. Journal qf Experimental Psychology: ttuman Perception and Perlormance, 20, 876-886.

Darken, C., & Moody, J. ( 1991 ). Note on learning rate schedules for stochastic optimization, l~k, ural h~lbrrnation Processing Systems, 3, 832-838.

Fetters, L., & Todd, J. ( 1987 ). Quantitative assessment of infant reach- ing movements. Journal q['Motor Behavior, 19, 147-166.

Fitts, E M. (1954). The information capacity of the human motor sys- tem in controlling the amplitude of movement. Journal ¢~[kL,,~peri - menial Psychology, 47, 381-391.

Flash, T., & Henis, E. ( 1991 ). Arm trajectory modifications during reaching towards visual targets. Journal ¢?[Cognitive Neuroscience, 3, 220-230.

Flash, T., & Hogan, N. (1985). The coordination of arm movements: An experimentally confirmed mathematical model. Journal q/Neu- roscience, 7, 1688-1703.

Freedland, R. L., & Bertenthal, B. 1. (1994). Developmental changes in interlimb coordination: Transition to hands-and-knees crawling. Psychological Science, 5, 26-32,

Gesell, A. ( 1929 ). lnl~mcy and human growth. New York: Macmillan. Gibson, E. J. ( 1988 ). Exploratory behavior in the development of per-

ceiving, acting, and the acquiring of knowledge. Annual R~wiew of Psychology, 39. 1-41.

Goldfield, E. C., Kay, B. A., & Warren, W. H. ( 1993 ). Infant bouncing: The assembly and tuning of action systems. Child Development, 64, 1128-1142.

Goodale, M. A., P~lisson, D., & Prablanc, C. ( 1986 ). Large adjustments in visually guided reaching do not depend on vision of the hand or perception of target displacement. Nature, 320, 728-750.

Haith, M. M. (1980). Rules babies look by: The organization of new- born visual activity Hillsdale, N J: Erlbaum.

Lockman, J. J., & Thelen. E. (1993). Developmental biodynamics: Brain, body, behavior connections. Child Do,elopment, 64, 953-959.

Mathew, A., & Cook, M. (1990). The control of reaching movements by young infants. Child Development, 61, 1238-1257.

McGraw, M. B. (1943). The neuromuscular maturation of the human injant. New York: Columbia University Press.

Meyer, D. E., Abrams, R. A., Kornblum. S., Wright, C. E., & Smith, J. E. K. ( 1988 ). Optimality in human motor performance: Ideal con- trol of rapid aimed movements. P.s3,choh~gical Ro,iew, 95, 340-370.

Milner, T. E. ( 1992 ). A model for the generation of movements requir- ing endpoint precision. Neuroscience, 49, 487-496.

P~lisson, D., Prablanc, C., Goodale, M. A., & Jeannerod, M. (1986). Visual control of reaching movements without vision of the limb. II. Evidence of fast unconscious processes correcting the trajectory of the hand to the final position of a double-step stimulus. Experimental Brain Research, 62, 303-31 i.

Poggio, T., & Girosi, E ( 1990, September). Networks for approxima- tion and learning. Proceedings ~f the 1EEE, 78, 1481-1497.

Robin, D. J , Berthier, N. E., & Clifton, R. K. ( 1996 ). Infants' predictive reaching for moving objects in the dark. Developmental Ps),chology, 32, 824-835.

Rosenbaum, D., Loukopoulos, L., Meulenbroek, R., Vaughan, J., & Engelbrecht, S. (1995). Planning reaches by evaluating stored pos- tures, t3wchological Reviews; 102, 28-67.

LEARNING TO REACH 823

Siegler, R. S. (1994). Cognitive variability: A key to understanding cog- nitive development. Current Directions in Psychological Science, 3, 1-5.

Smith, A. F., & Prentice, D. A. (1993). Exploratory data analysis. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behav- ioral sciences, statistical issues (pp. 349-390). Hillsdale, NJ: Erlbaum.

Thelen, E. (1994). Three-month-old infants learn new patterns of in- terlimb coordination. Infant Behavior & Development, 17, 978.

Thelen, E., Corbetta, D., Kamm, K., Spencer, J. P., Schneider, K., & Zernicke, R. E (1993). The transition to reaching: Mapping inten- tion to intrinsic dynamics. Child Development, 64, 1058-1098.

Uno, Y., Kawato, M., & Suziki, R. (1989). Formation and control of optimal trajectory in human multijoint arm movement. Biological Cybernetics, 61, 89-101.

von Hofsten, C. (1979). Development of visually directed reaching: The approach phase. Journal of Human Movement Studies, 5, 160- 168.

von Hofsten, C. ( 1982 ). Eye-hand coordination in the newborn. Devel- opmental Psychology. 18, 450-461.

von Hofsten, C. ( 1991 ). Structuring of early reaching movements: A longitudinal study. Journal of Motor Behavior, 23, 280-292.

von Hofsten, C. (1993). Prospective control: A basic aspect o f action development. Human Development. 36, 253-270.

von Hofsten, C., & R/Snnqvist, L. (1993). The structuring of neonatal arm movements. Child Development, 64, 1046-1057.

von Hofsten, C., Spelke, E., & Feng, Q. (1993). Principles of predictive reaching in 6-month-old infants. 34 th Annual Meeting of the Psycho- nomic Society Abstracts, 9.

Watkins, C. J. C. H., & Dayan, E (1992). Q-learning. Machine Learn- ing, 8, 279-292.

A p p e n d i x

Q - L e a r n i n g

Stochastic, discrete-time, optimal-control problems are mathemati- cally defined by a state set s, which contains all the states of the dynam- ical system, an action set a, which contains the possible actions (each action is not necessarily possible in all of the states), and the state-tran- sition probability mapping, which describes the probability of making a transition from a particular state, st, to a particular next state, s,+~, given that action a was taken. Each state transition of the dynamical system is viewed as having a particular cost. In the case where a time optimal solution is required, the cost of each state transition is the time it takes for the transition to occur. In the current context, the cost, c(s~, a, s~+~), of the transition from state st to state s~+~, given that action a was taken, is simply the time it takes for the arm to move to a new configuration. The computational task is to develop an optimal policy that maps each state to its optimal action or actions.

The true Q-function is written Q(s, a), and it is the expected time it takes to reach a goal state if action a were taken in state s. Q-learning starts with an arbitrary Q-function and iteratively computes the true Q- function. At step k of the iteration, an estimate of the true Q-function is provided by Qk(S, a). The current estimate of the best action in state s is the action with minimum Qk(S, a).

In the current simulations the Q-function was initialized by setting it uniformly to 0.1. The system was then set to the start state, (0, 0). An action was then chosen, applied to the plant, and a state-transition simulated. The Q-function was then updated for that action and state by the following rule:

Qk+ i (si, a j)

= ( 1 - a)Qk(si, aj) + a[c(si, aj, st+l) + 3"(rain Q(Si+l, a) ) l , (AI ) a ~ A

where si is the current state in which action aj was taken, and si+l is the actual next state of the dynamical system, a is a learning rate parameter, and 3' is the discount factor. The process was then repeated until a goal state was reached. The state was then reset to the start state, and the process repeated until 100,000 to 300,000 reaches were completed.

How one chooses actions in Q-learning is a key decision. Early in learning the Q function contains little usable information about how to choose actions because it was initially set uniformly to . I. Late in train- ing, the Q-function hopefully becomes a good guide to optimal actions because it has cached information about the expected results o f partic- ular actions. In the present simulations, actions were chosen according to a probabilistic exploration strategy that smoothly changes the proba- bility of actions as training proceeds ( Darken & Moody, 1991 ). Because the Q-function is of little utility early in training, the method randomly chooses actions in the initial stages of training. As training proceeds, actions are chosen with probabilities that reflect their Q-values, and late in training actions are chosen deterministically according to the Q- function.

To accelerate convergence to the optimal Q-function, neighboring states were lumped together by dividing the state space into boxes that were 10 × 10 mm. States that were within a particular box had the same Q-values. a was set to 0.1, and because the simulations were finite horizon problems, 3' was set to 1.0.

Rece ived S e p t e m b e r 30, 1994 Revis ion received O c t o b e r 24, 1995

A c c e p t e d O c t o b e r 24, 1995 •