Balancing Inverted Pendulum Using Neural Networks

8/20/2019 Balancing Inverted Pendulum Using Neural Networks

http://slidepdf.com/reader/full/balancing-inverted-pendulum-using-neurral-networks 1/6




2. INVERTED PENDULUM

The inverted pendulum consists of a movable cart and a pole mounted on it with a frictionless pivot, as shown in Fig. 1. The cart is allowed to move within the bounds of a one-dimensional horizontal track, and the pendulum is free to fall around the pivot. The objective is to balance the pole and guide the cart to a specified position on the track.

The dynamics of the system obey the following nonlinear differential equations.

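$$\frac{d\theta}{dt} = \omega \tag{1}$$

$$\frac{d\omega}{dt} = g_1(\theta, \omega, f) = \frac{g \sin\theta - \cos\theta \, \dfrac{f + m_p l \omega^2 \sin\theta}{m_c + m_p}}{\dfrac{4}{3} l - \dfrac{l \, m_p \cos^2\theta}{m_c + m_p}} \tag{2}$$

$$\frac{dx}{dt} = v \tag{3}$$

$$\frac{dv}{dt} = g_2\!\left(\theta, \omega, \frac{d\omega}{dt}, f\right) = \frac{f + m_p l \left(\omega^2 \sin\theta - \dfrac{d\omega}{dt} \cos\theta\right)}{m_c + m_p} \tag{4}$$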

where

f: force applied to the cart's center of mass,
θ: angle of the pendulum with respect to the vertical,
ω (= dθ/dt): angular velocity of the pendulum,
x: horizontal position of the cart relative to the track,
v (= dx/dt): horizontal velocity of the cart,
g (= 9.8 m/s^2): acceleration due to gravity,
m_c (= [value illegible] kg): mass of the cart,
m_p (= 0.1 kg): mass of the pole,
l (= 0.5 m): half of the pole length.

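In the computer simulation described in Section 5, the above differential equations are transformed into the following difference equations with a time step of T = 0.01 s, using the Euler approximation.

$$\theta(t+T) = \theta(t) + T \, \omega(t) \tag{5}$$

$$\omega(t+T) = \omega(t) + T \, g_1(\theta(t), \omega(t), f(t)) \tag{6}$$

$$x(t+T) = x(t) + T \, v(t) \tag{7}$$

$$v(t+T) = v(t) + T \, g_2\!\left(\theta(t), \omega(t), \frac{d\omega}{dt}, f(t)\right) \tag{8}$$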

These equations are assumed to be unknown: they were used only to simulate the dynamics of the plant.
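As an illustration of how the plant can be simulated, the following minimal sketch applies one Euler step to the frictionless cart-pole equations. The function and constant names are hypothetical. The cart mass of 1.0 kg is an assumption, since the value is not legible in the source; the other constants are the ones stated in the text.

```python
import math

# G, M_POLE, HALF_LEN and T are the values given in the text.
# M_CART is an assumption: the cart mass is not legible in the source.
G = 9.8          # gravity, m/s^2
M_CART = 1.0     # cart mass, kg (assumed)
M_POLE = 0.1     # pole mass, kg
HALF_LEN = 0.5   # half of the pole length, m
T = 0.01         # time step, s


def angular_accel(theta, omega, f):
    """g_1: angular acceleration of the pole, equation (2)."""
    total = M_CART + M_POLE
    push = (f + M_POLE * HALF_LEN * omega ** 2 * math.sin(theta)) / total
    numerator = G * math.sin(theta) - math.cos(theta) * push
    denominator = HALF_LEN * (4.0 / 3.0 - M_POLE * math.cos(theta) ** 2 / total)
    return numerator / denominator


def cart_accel(theta, omega, alpha, f):
    """g_2: acceleration of the cart, equation (4); alpha is d(omega)/dt."""
    total = M_CART + M_POLE
    swing = omega ** 2 * math.sin(theta) - alpha * math.cos(theta)
    return (f + M_POLE * HALF_LEN * swing) / total


def euler_step(state, f, dt=T):
    """Advance (theta, omega, x, v) by one Euler step under force f."""
    theta, omega, x, v = state
    alpha = angular_accel(theta, omega, f)
    accel = cart_accel(theta, omega, alpha, f)
    return (theta + dt * omega, omega + dt * alpha, x + dt * v, v + dt * accel)


# Usage: release the pole 0.05 rad from vertical with no control force.
state = (0.05, 0.0, 0.0, 0.0)
for _ in range(100):
    state = euler_step(state, 0.0)
```

Without a control force the angle grows away from zero, which is the instability the controller has to counteract.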

3. LEARNING THE MODEL

In order to determine the appropriate forces to balance the pole at the specified position, it is necessary either to know the dynamics or to learn a model of the inverted pendulum. Learning a model of the inverted pendulum is basically a system identification problem. Here it is done by a multilayer neural network capable of modeling nonlinear plants.




Fig. 2 shows a schematic diagram of the whole system. It consists of the plant (the inverted pendulum) and two neural networks, one (right in the figure) for identification and the other (left) for control. Both neural networks are standard layered networks using the following sigmoidal activation function.

$$h(z) = \frac{1 - \exp(-z)}{1 + \exp(-z)} \tag{9}$$
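A small sketch of this activation function may help when implementing the networks. It takes values in (-1, 1) and is mathematically the same as tanh(z/2), which gives a convenient form for the derivative needed by back-propagation. The helper names are hypothetical.

```python
import math


def h(z):
    """Sigmoidal activation of equation (9); the output lies in (-1, 1).

    math.exp(-z) overflows for very negative z (below about -709);
    math.tanh(0.5 * z) is an equivalent, numerically safer form.
    """
    return (1.0 - math.exp(-z)) / (1.0 + math.exp(-z))


def h_prime_from_output(y):
    """Derivative dh/dz written in terms of the output y = h(z)."""
    return 0.5 * (1.0 - y * y)
```

Expressing the derivative through the unit's output means the backward pass can reuse the values already computed in the forward pass.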

The neural identifier consists of four output units and three hidden units. The state variables θ, ω, x, and v, observed from the plant, and the control force f, provided by the controller, are applied to the network as inputs. The hidden units receive these variables, and the output units receive them as well as the outputs of the hidden units. Here we are assuming that the states of the plant are directly observable without any kind of external interference. The four outputs of the network represent the prediction of the state variables of the inverted pendulum at the next time step.
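The connectivity described above (five inputs, three hidden units, and four output units that see both the inputs and the hidden outputs) can be sketched as a forward pass. This is only an illustration under stated assumptions: the output units are taken to be linear, bias terms are omitted, and the weights are drawn uniformly from -1 to 1. All names are hypothetical.

```python
import math
import random


def h(z):
    return math.tanh(0.5 * z)  # equal to [1 - exp(-z)] / [1 + exp(-z)]


def init_weights(rows, cols, rng):
    # Assumption: uniform random initial weights in [-1, 1].
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def identifier_forward(inputs, w_hidden, w_out):
    """inputs = [theta, omega, x, v, f]; returns (hidden outputs, predictions)."""
    hidden = [h(sum(w * u for w, u in zip(row, inputs))) for row in w_hidden]
    # Output units receive the raw inputs as well as the hidden outputs.
    features = inputs + hidden
    # Assumption: linear output units, so predictions are not limited to (-1, 1).
    outputs = [sum(w * u for w, u in zip(row, features)) for row in w_out]
    return hidden, outputs


rng = random.Random(0)
w_hidden = init_weights(3, 5, rng)   # 3 hidden units, 5 inputs each
w_out = init_weights(4, 8, rng)      # 4 outputs, 5 inputs + 3 hidden each
hidden, prediction = identifier_forward([0.05, 0.0, 0.1, 0.0, 1.0], w_hidden, w_out)
```

The four entries of `prediction` stand for the predicted θ, ω, x, and v at the next time step.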

The neural network is trained using the back-propagation learning algorithm such that the output of the network closely matches the output of the real plant. The training process begins with an initial state which is generated randomly in each run. At time t, during each run, the input of the neural network is set equal to the current state of the plant. The neural network is trained, using the value of the next state of the plant as the desired response, so as to predict the next state of the plant at time t+T.

At each sampling time, the calculation is performed in three steps. In the forward propagation step, the output of each processing unit in the two neural nets is calculated layer by layer from input to output. In the backward propagation step, the error between the target output and the actual output is calculated, and it is propagated backward through the network only in the neural identifier. The weight modification takes place after the backward propagation step is completed. The weights of all interconnections between units are modified using the output of each unit and the backpropagated error.
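The three steps above can be sketched for the identifier as a single training update. This is a minimal illustration, not the authors' implementation: it assumes a squared-error loss, linear output units, no bias terms, and a hypothetical learning rate of 0.05. The training sample is made up for demonstration.

```python
import math
import random


def h(z):
    return math.tanh(0.5 * z)  # equal to [1 - exp(-z)] / [1 + exp(-z)]


def forward(u, w_hid, w_out):
    hid = [h(sum(w * a for w, a in zip(row, u))) for row in w_hid]
    feats = list(u) + hid      # outputs see the inputs and the hidden units
    out = [sum(w * a for w, a in zip(row, feats)) for row in w_out]
    return hid, feats, out


def train_step(u, target, w_hid, w_out, lr=0.05):
    """One update on a single sample; returns the loss before the update."""
    # Step 1: forward propagation, layer by layer.
    hid, feats, out = forward(u, w_hid, w_out)
    # Step 2: backward propagation of the output error.
    delta_out = [y - t for y, t in zip(out, target)]
    n_in = len(u)
    delta_hid = []
    for j, hj in enumerate(hid):
        back = sum(delta_out[k] * w_out[k][n_in + j] for k in range(len(out)))
        delta_hid.append(back * 0.5 * (1.0 - hj * hj))   # times dh/dz
    # Step 3: weight modification from unit outputs and backpropagated error.
    for k, row in enumerate(w_out):
        for i in range(len(row)):
            row[i] -= lr * delta_out[k] * feats[i]
    for j, row in enumerate(w_hid):
        for i in range(len(row)):
            row[i] -= lr * delta_hid[j] * u[i]
    return 0.5 * sum(d * d for d in delta_out)


rng = random.Random(1)
w_hid = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(3)]
w_out = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
u = [0.05, 0.0, 0.1, 0.0, 1.0]        # hypothetical [theta, omega, x, v, f]
target = [0.05, 0.02, 0.1, 0.01]      # hypothetical next state of the plant
losses = [train_step(u, target, w_hid, w_out) for _ in range(200)]
```

In an actual run the target would be the next state produced by the plant simulation, and a new sample would be presented at every sampling time.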

4. LEARNING THE CONTROLLER

The neural controller is composed of four input units (corresponding to the state variables), one output unit (producing the control signal), and three hidden units. It contains direct connections from input to output. At each sampling time, the neural controller receives the four state variables of the plant. Employing this information, the neural controller determines the force necessary for keeping the pendulum vertical and the cart moving toward the center of the track.

The objective of the learning is to teach the controller how to balance the pendulum, but the problem here is what kind of signal should be given as the teaching signal. In general, when neural networks are employed in control applications, deciding what kind of signal has to be generated as the teacher becomes a problem. Here, we utilize certain a priori knowledge suggesting which kind of control signal should be applied to balance the pendulum at a specified position.

When considering the introduction of this kind of a priori knowledge, two alternatives can be taken into account. In the first, the control rule is directly specified on the force. This approach was employed by Kitamura and Saitoh, as was mentioned in the Introduction. In the second, the knowledge is described on the state space of the plant, and the control rule specifying the force in terms of the state variables of the plant is indirectly derived from it. Here, we adopt the second approach because, in general, the knowledge is given on the state variable space with less difficulty.

The a priori knowledge considered here is as follows:

(K1) When the pendulum is falling from the vertical position and we change the angular velocity of the pendulum by a certain amount in the direction opposite to that in which the pendulum is falling, the pendulum will be forced to move in the direction of the angular velocity.

(K2) When the cart is moving at a certain distance from the center of the track and the pendulum is vertical, the pendulum must tend to fall in the direction of the center of the track. However, if the cart is moving to the right, for example, then the pendulum must tend to fall in the opposite direction (left). If it is moving to the left, the pendulum must tend to fall in the opposite direction (right).

Utilizing a combination of the above-mentioned parameters, the desired target value ω_d(t+T) of the angular velocity of the pendulum at time t+T can be written specifically in equation form as follows.

$$\omega_d(t+T) = 2h\bigl(5\theta(t) + 0.1x(t) + 0.2v(t)\bigr) \tag{10}$$

where h is the sigmoidal function defined by (9). As for the other state variables, the targets are not specified. It should be noted that the objective of the learning is described on the state space which is generated by the force, not on the force itself.

The backpropagation learning can also be used for training the controller. The training consists of three steps. In the forward propagation step, the outputs of each processing unit in the neural controller and identifier are calculated in the forward direction; this is the same as in the last section, so the calculation need not be repeated. In the backward propagation step, the error between the target output specified by equation (10) and the actual output for the angular velocity of the pole is calculated, and it is applied to the corresponding output unit of the neural identifier and is propagated backward through the neural identifier to the neural controller. The modification of the connections takes place only in the neural controller, in the same way as the learning in the neural identifier described in the last section.
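The controller update can be sketched as follows: the error on the angular-velocity output is propagated backward through the identifier, whose weights stay fixed, to obtain an error signal for the force, and only the controller weights are changed. This is a minimal sketch under several assumptions: the target coefficients and signs are taken as read from equation (10), the error is computed against the identifier's predicted angular velocity, output units are linear, biases are omitted, and the learning rate is hypothetical. The identifier weights here are random and serve only to show the gradient flow.

```python
import math
import random


def h(z):
    return math.tanh(0.5 * z)      # equal to [1 - exp(-z)] / [1 + exp(-z)]


def dh(y):
    return 0.5 * (1.0 - y * y)     # dh/dz expressed through y = h(z)


def net_forward(u, w_hid, w_out):
    """Sigmoidal hidden units; linear outputs fed by inputs and hidden units."""
    hid = [h(sum(w * a for w, a in zip(row, u))) for row in w_hid]
    feats = list(u) + hid
    out = [sum(w * a for w, a in zip(row, feats)) for row in w_out]
    return hid, feats, out


def omega_target(theta, x, v):
    """Target angular velocity, with coefficients as read from equation (10)."""
    return 2.0 * h(5.0 * theta + 0.1 * x + 0.2 * v)


def force_gradient(state, f, i_hid, i_out):
    """Return the loss on the predicted omega and its derivative w.r.t. f."""
    theta, _, x, v = state
    u = list(state) + [f]
    hid, _, pred = net_forward(u, i_hid, i_out)
    err = pred[1] - omega_target(theta, x, v)   # only omega has a target
    n_in = len(u)
    # Backward pass through the identifier; its weights are left untouched.
    delta_hid = [dh(hj) * err * i_out[1][n_in + j] for j, hj in enumerate(hid)]
    d_f = err * i_out[1][4] + sum(d * row[4] for d, row in zip(delta_hid, i_hid))
    return 0.5 * err * err, d_f


def controller_step(state, c_hid, c_out, i_hid, i_out, lr=0.01):
    """One update of the controller weights; returns the loss before the update."""
    s = list(state)
    c_h, c_feats, (f,) = net_forward(s, c_hid, c_out)
    loss, d_f = force_gradient(s, f, i_hid, i_out)
    n_in = len(s)
    delta_h = [dh(hj) * d_f * c_out[0][n_in + j] for j, hj in enumerate(c_h)]
    for i in range(len(c_out[0])):      # direct and hidden-to-output weights
        c_out[0][i] -= lr * d_f * c_feats[i]
    for j, row in enumerate(c_hid):     # input-to-hidden weights
        for i in range(n_in):
            row[i] -= lr * delta_h[j] * s[i]
    return loss


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
i_hid, i_out = rand_matrix(3, 5, rng), rand_matrix(4, 8, rng)   # identifier (fixed)
c_hid, c_out = rand_matrix(3, 4, rng), rand_matrix(1, 7, rng)   # controller
loss = controller_step([0.1, 0.0, 0.5, -0.2], c_hid, c_out, i_hid, i_out)
```

Because the target is bounded by the factor 2 in front of h, the requested angular velocity always stays within (-2, 2), whatever the state.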

5. SIMULATION RESULTS

The cart was allowed to move in an interval equal to five times the length of the pole. The pole was free to fall from a vertical position (zero degrees) to plus or minus sixty degrees.

Each run started from an initial state in which the state variables were chosen at random and ended when the pendulum or the cart surpassed the limits mentioned before or a key was pressed on the keyboard. The weights of the connections in the two neural networks were initialized at random with values ranging from -1 to 1.
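The end-of-run condition can be worked out from these limits. With a half-length of 0.5 m the pole is 1.0 m long, so the track interval is 5.0 m. The sketch below assumes that this interval is centered on the track center, which gives a cart limit of 2.5 m on either side; the names are hypothetical.

```python
import math

HALF_LEN = 0.5                       # l, half of the pole length in m (from the text)
TRACK_LEN = 5 * (2 * HALF_LEN)       # interval of five pole lengths, in m
X_LIMIT = TRACK_LEN / 2              # assumption: interval centered on the track center
THETA_LIMIT = math.radians(60.0)     # plus or minus sixty degrees, in radians


def run_ended(theta, x):
    """True when the pole angle or the cart position leaves the allowed range."""
    return abs(theta) > THETA_LIMIT or abs(x) > X_LIMIT
```

A simulation loop would call `run_ended` after every time step and start a new run with a fresh random initial state when it returns true.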

Fig. 3 shows the simulation results. The three curves illustrated in the figure represent the response of the inverted pendulum controlled by the neural controller at three times during learning. In order to evaluate the effectiveness of the learning, we stopped the learning process temporarily after the 10th, 15th, and 20th runs and observed the responses of the pendulum using the same initial values.

Fig. 3.a shows the curves corresponding to the angle of the pendulum and Fig. 3.b shows the output curves corresponding to the position of the cart. In the early stage of the learning (dotted and chained lines), the target values were not achieved. At the 20th run (solid line), the angle of the pendulum and the position of the cart change considerably at first, but gradually the desired values of the angle and position are reached and the pendulum is balanced for a long period of time, while the cart is positioned at the center of the track. Clearly, the performance improves with learning.

After the learning is finished, if an external disturbance is added, the controller is able to restore the pendulum to the vertical position (zero degrees) after the cart moves back and forth two or three times.

6. CONCLUSION

The neural identifier was able to represent the dynamics of the inverted pendulum. Nonlinearity in the model was essential for accurate modeling of the dynamics. Controlling the nonlinear kinematics of the inverted pendulum required a nonlinear controller, implemented by another layered neural network. The learning is done in a layered neural network using the back-propagation method.

Introducing a priori knowledge proved to be useful in the implementation of the learning algorithm, in which the control rule was given in the state variable space generated by the force, not on the force itself.

Simulation results showed that the learning algorithm performed very well, requiring only a few tens of runs to train the neural controller to balance the pendulum successfully while the cart was guided to the center of the track.

REFERENCES

1) A. G. Barto, R. S. Sutton, and C. W. Anderson: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5), 1983.

2) M. I. Jordan and R. A. Jacobs: Learning to control an unstable system with forward modeling.

3) D. E. Rumelhart, G. E. Hinton, and R. J. Williams: Learning internal representations by error propagation. In Rumelhart and McClelland (eds.), Parallel Distributed Processing, vol. 1, chap. 8, MIT Press, Cambridge, MA, 318-362, 1986.

4) S. Kitamura and M. Saitoh: Stability Learning Control of the inverted pendulum using neural networks (in Japanese). Knowledge and Intelligent Systems Symposium, pp. 61-64, March 1990.



Fig. 1: Inverted pendulum.

Fig. 2: Whole system (plant, controller, and model).

Fig. 3: Simulation results (legend: run number 10, run number 16, run number 20). a: Angle of the pendulum.
