Balancing Inverted Pendulum Using Neural Networks
8/20/2019 Balancing Inverted Pendulum Using Neurral Networks
http://slidepdf.com/reader/full/balancing-inverted-pendulum-using-neurral-networks 1/6
2. INVERTED PENDULUM
The inverted pendulum consists of a movable cart and a pole mounted on
it with a frictionless pivot, as shown in Fig. 1. The cart is allowed to
move within the bounds of a one-dimensional horizontal track, and the
pendulum is free to fall around the pivot. The objective is to balance
the pole and guide the cart to a specified position on the track.
The dynamics of the system obey the following nonlinear differential
equations.
$$\frac{d\theta}{dt} = \omega \tag{1}$$
$$\frac{d\omega}{dt} = g_1(\theta, \omega, f) = \frac{g\sin\theta - \cos\theta\,\dfrac{f + m_p l\,\omega^2\sin\theta}{m_c + m_p}}{\dfrac{4}{3}l - \dfrac{l\,m_p\cos^2\theta}{m_c + m_p}} \tag{2}$$
$$\frac{dx}{dt} = v \tag{3}$$
$$\frac{dv}{dt} = g_2(\theta, \omega, d\omega/dt, f) = \frac{f + m_p l\left\{\omega^2\sin\theta - (d\omega/dt)\cos\theta\right\}}{m_c + m_p} \tag{4}$$
where
$f$: force applied to the cart's center of mass,
$\theta$: angle of the pendulum with respect to the vertical,
$\omega$ ($= d\theta/dt$): angular velocity of the pendulum,
$x$: horizontal position of the cart relative to the track,
$v$ ($= dx/dt$): horizontal velocity of the cart,
$g$ ($= 9.8\ \mathrm{m/s^2}$): acceleration due to gravity,
$m_c$ (value illegible in the source): mass of the cart,
$m_p$ ($= 0.1$ kg): mass of the pole,
$l$ ($= 0.5$ m): half of the pole length.
In the computer simulation described in Section 5, the above differential
equations are transformed into difference equations with a time step of
T = 0.01 s, using the Euler approximation. These equations are assumed to
be unknown to the learning system: they were used only to simulate the
dynamics of the plant.
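The Euler discretization mentioned above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it applies a plain forward-Euler update to Eqs. (1) to (4), and the cart mass `M_CART = 1.0` is an assumption because the value is illegible in the source. The function names `derivatives` and `euler_step` are hypothetical.

```python
import math

G = 9.8        # acceleration due to gravity [m/s^2], from the text
M_POLE = 0.1   # mass of the pole [kg], from the text
L_HALF = 0.5   # half of the pole length [m], from the text
T = 0.01       # time step [s], from the text
M_CART = 1.0   # ASSUMPTION: cart mass [kg]; the source value is illegible


def derivatives(theta, omega, f):
    """Return (d omega/dt, dv/dt) as given by Eqs. (2) and (4)."""
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    num = G * sin_t - cos_t * (f + M_POLE * L_HALF * omega ** 2 * sin_t) / total
    den = L_HALF * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total)
    domega = num / den
    dv = (f + M_POLE * L_HALF * (omega ** 2 * sin_t - domega * cos_t)) / total
    return domega, dv


def euler_step(state, f, dt=T):
    """Advance (theta, omega, x, v) by one forward-Euler step of length dt."""
    theta, omega, x, v = state
    domega, dv = derivatives(theta, omega, f)
    return (theta + dt * omega,
            omega + dt * domega,
            x + dt * v,
            v + dt * dv)
```

Calling `euler_step(state, f)` repeatedly with the force chosen at each step plays the role of the plant in the experiments.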
3. LEARNING THE MODEL
In order to determine the appropriate forces to balance the pole at the
specified position, it is necessary to know the dynamics or to learn a
model of the inverted pendulum. Learning a model of the inverted pendulum
is basically a system identification problem. Here it is done by a
multilayer neural network capable of modeling nonlinear plants.
Fig. 2 shows a schematic diagram of the whole system. It consists of
the plant (the inverted pendulum) and two neural networks, one (right in
the figure) for identification and the other (left) for control. The
neural networks are both standard layered ones using the following
sigmoidal activation function.
$$h(z) = \frac{1 - \exp(-z)}{1 + \exp(-z)} \tag{9}$$
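As a small sketch of Eq. (9): the function is a bipolar sigmoid with range (-1, 1) and is mathematically equal to tanh(z/2). The helper `h_prime_from_output`, which writes the derivative in terms of the unit's output as back-propagation needs it, is a hypothetical name introduced here for illustration.

```python
import math


def h(z):
    """Bipolar sigmoid of Eq. (9): (1 - exp(-z)) / (1 + exp(-z))."""
    return (1.0 - math.exp(-z)) / (1.0 + math.exp(-z))


def h_prime_from_output(y):
    """Derivative dh/dz written through the output y = h(z)."""
    return 0.5 * (1.0 - y * y)
```

For large negative `z` the direct form can overflow in `math.exp`; `math.tanh(0.5 * z)` gives the same value without that problem.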
The neural identifier consists of four output units and three hidden
units. The state variables $\theta$, $\omega$, $x$ and $v$, observed from
the plant, and the control force $f$, provided by the controller, are
applied to the network as inputs. The hidden units receive these
variables, and the output units receive them as well as the outputs of
the hidden units. Here we assume that the states of the plant are
directly observable without any kind of external interference. The four
outputs of the network represent the prediction of the state variables of
the inverted pendulum at the next time step.
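The connectivity just described (five inputs, three hidden units, four outputs, with the outputs also fed directly by the inputs) might be sketched as below. Several details are assumptions not stated in the text: biases are omitted, the output units also use the sigmoid of Eq. (9) (so the state variables would have to be scaled into (-1, 1)), and the names `init_identifier` and `identifier_forward` are hypothetical.

```python
import math
import random

N_IN, N_HID, N_OUT = 5, 3, 4   # (theta, omega, x, v, f) -> 3 hidden -> 4 outputs


def h(z):
    """Sigmoidal activation of Eq. (9); equal to tanh(z / 2)."""
    return math.tanh(0.5 * z)


def init_identifier(rng):
    """Random weights in [-1, 1]; biases are omitted (an assumption)."""
    def u():
        return rng.uniform(-1.0, 1.0)
    return {
        "w_hid": [[u() for _ in range(N_IN)] for _ in range(N_HID)],
        # each output unit sees the 5 inputs plus the 3 hidden outputs
        "w_out": [[u() for _ in range(N_IN + N_HID)] for _ in range(N_OUT)],
    }


def identifier_forward(net, state, f):
    """Return (hidden activations, predicted next state) for one input."""
    inputs = list(state) + [f]
    hidden = [h(sum(w * a for w, a in zip(row, inputs)))
              for row in net["w_hid"]]
    seen_by_output = inputs + hidden
    outputs = [h(sum(w * a for w, a in zip(row, seen_by_output)))
               for row in net["w_out"]]
    return hidden, outputs
```

The direct input-to-output weights let the network represent the nearly linear part of the one-step map, leaving the hidden units to capture the nonlinear remainder.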
The neural network is trained using the back-propagation learning
algorithm such that the output of the network closely matches the output
of the real plant. The training process begins with an initial state
which is generated randomly in each run. At time t, during each run, the
input of the neural network is set equal to the current state of the
plant. The neural network is trained, using the value of the next state
of the plant as the desired response, so as to predict the next state of
the plant at time t+T.
At each sampling time, the calculation is performed in three steps. In
the forward propagation step, the output of each processing unit in the
two neural nets is calculated layer by layer from input to output. In the
backward propagation step, the error between the target output and the
actual output is calculated, and it is propagated backward through the
network only in the neural identifier. The weight modification takes
place after the backward propagation step is completed. The weights of
all interconnections between units are modified using the output of each
unit and the backpropagated error.
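The three steps for the identifier (forward pass, backward pass, weight update) can be illustrated with the sketch below. It is a plain gradient-descent version of back-propagation on a squared error. The learning rate `lr`, the absence of biases and momentum, and the sigmoidal output units are assumptions; the names `init_net` and `train_step` are hypothetical.

```python
import math
import random

N_IN, N_HID, N_OUT = 5, 3, 4


def h(z):
    return math.tanh(0.5 * z)          # Eq. (9)


def dh(y):
    return 0.5 * (1.0 - y * y)         # derivative of h, written via y = h(z)


def init_net(rng):
    def u():
        return rng.uniform(-1.0, 1.0)
    return {"w_hid": [[u() for _ in range(N_IN)] for _ in range(N_HID)],
            "w_out": [[u() for _ in range(N_IN + N_HID)] for _ in range(N_OUT)]}


def train_step(net, inputs, target, lr=0.1):
    """One forward / backward / update cycle.

    Returns the squared error measured before the weights are changed.
    """
    # Step 1: forward propagation, layer by layer.
    hidden = [h(sum(w * a for w, a in zip(row, inputs)))
              for row in net["w_hid"]]
    seen = list(inputs) + hidden
    out = [h(sum(w * a for w, a in zip(row, seen))) for row in net["w_out"]]

    # Step 2: backward propagation of the output error.
    d_out = [(t - y) * dh(y) for t, y in zip(target, out)]
    d_hid = [dh(hidden[j]) *
             sum(d_out[k] * net["w_out"][k][N_IN + j] for k in range(N_OUT))
             for j in range(N_HID)]

    # Step 3: weight modification from unit outputs and backpropagated errors.
    for k in range(N_OUT):
        for i in range(N_IN + N_HID):
            net["w_out"][k][i] += lr * d_out[k] * seen[i]
    for j in range(N_HID):
        for i in range(N_IN):
            net["w_hid"][j][i] += lr * d_hid[j] * inputs[i]

    return sum((t - y) ** 2 for t, y in zip(target, out))
```

In a run, `inputs` would be the current state and force at time t and `target` the plant state observed at time t+T, both scaled to the range of the output units.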
4. LEARNING THE CONTROLLER
The neural controller is composed of four input units (corresponding to
the state variables), one output unit (producing the control signal), and
three hidden units. It contains direct connections from input to output.
At each sampling time, the neural controller receives the four state
variables of the plant. Employing this information, the neural controller
determines the force necessary for maintaining the pendulum vertical and
the cart moving toward the center of the track.
The objective of the learning is to teach the controller how to balance
the pendulum, but the problem here is what kind of signal should be given
as the teaching signal. In general, when neural networks are employed in
control applications, it becomes a problem to decide what kind of signal
has to be generated as the teacher. Here, we utilize certain a priori
knowledge suggesting which kind of control signal should be applied to
balance the pendulum at a specified position.
When considering the introduction of such a priori knowledge, two
alternatives can be taken into account. In the first, the control rule is
specified directly on the force. This approach was employed by Kitamura
and Saitoh, as mentioned in the Introduction. In the second, the
knowledge is described on the state space of the plant, and the control
rule specifying the force in terms of the state variables of the plant is
indirectly derived from it. Here, we adopt the second approach because,
in general, the knowledge is given on the state-variable space with less
difficulty.
The a priori knowledge considered here is as follows:
(K1) When the pendulum is falling from the vertical position and we change
the angular velocity of the pendulum by a certain amount in the
direction opposite to that in which the pendulum is falling, the
pendulum will be forced to move in the direction of the angular
velocity.
(K2) When the cart is moving at a certain distance from the center of the
track and the pendulum is vertical, the pendulum must tend to fall in
the direction of the center of the track. However, if the cart is
moving to the right, for example, then the pendulum must tend to fall
in the opposite direction (left). If it is moving to the left, the
pendulum must tend to fall in the opposite direction (right).
Utilizing a combination of the above-mentioned parameters, the desired
target value $\omega_d(t+T)$ of the angular velocity of the pendulum at
time t+T can be written specifically in equation form as follows.
$$\omega_d(t+T) = 2h\bigl(5\theta(t) + 0.1x(t) + 0.2v(t)\bigr) \tag{10}$$
where h is the sigmoidal function defined by (9). As for the other state
variables, the targets are not specified. It should be noted that the
objective of the learning is described on the state space which is
generated by the force, not on the force itself.
The backpropagation learning can also be used for training the
controller. The training consists of three steps. In the forward
propagation step, the outputs of each processing unit in the neural
controller and identifier are calculated in the forward direction; this is
the same as in the last section, so the calculation need not be repeated.
In the backward propagation step, the error between the target output
specified by equation (10) and the actual output, that is, the actual
angular velocity of the pole, is calculated, and it is applied to the
corresponding output unit of the neural identifier and is propagated
backward through the neural identifier to the neural controller. The
modification of the connections takes place only in the neural
controller, in the same way as the learning in the neural identifier
described in the last section.
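The controller update described above, with the error injected at the angular-velocity output of the identifier and propagated back to the controller while the identifier weights stay fixed, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the sign and coefficients of Eq. (10) are taken as they appear in the extracted text, the force is the raw controller output in normalized units, biases are omitted, `lr` is an arbitrary learning rate, and `plant` is a hypothetical callable returning the next state.

```python
import math
import random


def h(z):
    return math.tanh(0.5 * z)            # Eq. (9)


def dh(y):
    return 0.5 * (1.0 - y * y)           # derivative of h via its output


def target_omega(theta, x, v):
    """Desired angular velocity of Eq. (10), as read from the source."""
    return 2.0 * h(5.0 * theta + 0.1 * x + 0.2 * v)


def make_net(n_in, n_hid, n_out, rng):
    def u():
        return rng.uniform(-1.0, 1.0)
    return {"hid": [[u() for _ in range(n_in)] for _ in range(n_hid)],
            "out": [[u() for _ in range(n_in + n_hid)] for _ in range(n_out)]}


def forward(net, inputs):
    hid = [h(sum(w * a for w, a in zip(row, inputs))) for row in net["hid"]]
    seen = list(inputs) + hid
    out = [h(sum(w * a for w, a in zip(row, seen))) for row in net["out"]]
    return hid, out


def backward(net, inputs, hid, out, err_out):
    """Return output deltas, hidden deltas, and the error reaching each input."""
    n_in = len(inputs)
    d_out = [e * dh(y) for e, y in zip(err_out, out)]
    d_hid = [dh(hid[j]) *
             sum(d_out[k] * net["out"][k][n_in + j] for k in range(len(out)))
             for j in range(len(hid))]
    d_in = [sum(d_out[k] * net["out"][k][i] for k in range(len(out))) +
            sum(d_hid[j] * net["hid"][j][i] for j in range(len(hid)))
            for i in range(n_in)]
    return d_out, d_hid, d_in


def controller_step(ctrl, ident, state, plant, lr=0.05):
    """Compute the force, observe the plant, and update only the controller."""
    theta, omega, x, v = state
    c_hid, c_out = forward(ctrl, state)
    f = c_out[0]                                   # normalized force (assumption)
    i_in = list(state) + [f]
    i_hid, i_out = forward(ident, i_in)
    omega_next = plant(state, f)[1]                # actual angular velocity at t+T
    # Error only on the omega output; other targets are not specified.
    err = [0.0, target_omega(theta, x, v) - omega_next, 0.0, 0.0]
    _, _, d_in = backward(ident, i_in, i_hid, i_out, err)
    d_out, d_hid, _ = backward(ctrl, state, c_hid, c_out, [d_in[4]])
    seen = list(state) + c_hid
    for i in range(len(seen)):
        ctrl["out"][0][i] += lr * d_out[0] * seen[i]
    for j in range(len(c_hid)):
        for i in range(len(state)):
            ctrl["hid"][j][i] += lr * d_hid[j] * state[i]
    return f
```

The identifier serves here only as a differentiable path from the force to the predicted angular velocity, so the quality of the controller update depends on how well the identifier has already been trained.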
5. SIMULATION RESULTS
The cart was allowed to move in an interval equal to five times the
length of the pole. The pole was free to fall from a vertical position
(zero degrees) to plus or minus sixty degrees.
Each run started from an initial state in which the state variables were
chosen at random, and ended when the pendulum or the cart surpassed the
limits mentioned before or a key was pressed on the keyboard. The weights
of the connections in the two neural networks were initialized at random
with values ranging from -1 to 1.
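The termination condition of a run can be written as a short check. In this sketch the pole length is taken as 2l = 1 m from the stated half-length, which gives a 5 m interval; centering that interval on x = 0 is an assumption, and `run_is_over` is a hypothetical name.

```python
import math

L_HALF = 0.5                          # half of the pole length [m], from the text
TRACK_LENGTH = 5.0 * (2.0 * L_HALF)   # five pole lengths = 5 m
X_LIMIT = TRACK_LENGTH / 2.0          # ASSUMPTION: interval centered on x = 0
THETA_LIMIT = math.radians(60.0)      # +/- 60 degrees, from the text


def run_is_over(theta, x):
    """True when the pendulum angle or the cart position leaves its limits."""
    return abs(theta) > THETA_LIMIT or abs(x) > X_LIMIT
```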
Fig. 3 shows the simulation results. The three curves illustrated in
the figure represent the response of the inverted pendulum controlled by
the neural controller at three times during learning. In order to evaluate
the effectiveness of the learning, we stopped the learning process
temporarily after the 10th, 15th, and 20th runs and observed the responses
of the pendulum utilizing the same initial values.
Fig. 3a shows the curves corresponding to the angle of the pendulum,
and Fig. 3b shows the output curves corresponding to the position of the
cart. In the early stage of the learning (dotted and chained lines), the
target values were not achieved. At the 20th run (solid line), the angle
of the pendulum and the position of the cart change considerably at
first, but gradually the desired values of the angle and position are
reached and the pendulum is balanced for a long period of time, while the
cart is positioned at the center of the track. Clearly, the performance
improves with learning.
After the learning is finished, if an external disturbance is added,
the controller is able to restore the pendulum to the vertical position
(zero degrees) after the cart moves back and forth two or three times.
6. CONCLUSION
The neural identifier was able to represent the dynamics of the
inverted pendulum. Nonlinearity in the model was essential for accurate
modeling of the dynamics. Controlling the nonlinear kinematics of the
inverted pendulum required a nonlinear controller, implemented by another
layered neural network. The learning is done in a layered neural network
using the back-propagation method.
Introducing a priori knowledge proved to be useful in the implementation
of the learning algorithm, in which the control rule was given in the
state-variable space generated by the force, not on the force itself.
Simulation results showed that the learning algorithm performed very
well, requiring only a few tens of runs to train the neural controller to
balance the pendulum successfully while the cart was guided to the center
of the track.
REFERENCES
1) A.G. Barto, R.S. Sutton, and C.W. Anderson: Neuronlike adaptive
elements that can solve difficult learning control problems. IEEE
Transactions on Systems, Man, and Cybernetics, SMC-13(5), 1983.
2) M.I. Jordan and R.A. Jacobs: Learning to control an unstable system
with forward modeling.
3) D.E. Rumelhart, G.E. Hinton, and R.J. Williams: Learning internal
representations by error propagation. In Rumelhart and McClelland (eds.),
Parallel Distributed Processing, vol. 1, chap. 8, MIT Press, Cambridge,
MA, 318-362, 1986.
4) S. Kitamura and M. Saitoh: Stability learning control of the inverted
pendulum using neural networks (in Japanese). Knowledge and Intelligent
Systems Symposium, pp. 61-64, March 1990.
Fig. 1: Inverted pendulum.
Fig. 2: Whole system (plant, controller, model).
Fig. 3a: Angle of the pendulum (curves labeled by run number).