The 1981 ACM Turing Award Lecture
Delivered at ACM '81, Los Angeles, California, November 9, 1981

The 1981 ACM Turing Award was presented to Edgar F. Codd, an IBM Fellow of the San Jose Research Laboratory, by President Peter Denning on November 9, 1981 at the ACM Annual Conference in Los Angeles, California. It is the Association's foremost award for technical contributions to the computing community.

Codd was selected by the ACM General Technical Achievement Award Committee for his fundamental and continuing contributions to the theory and practice of database management systems. The originator of the relational model for databases, Codd has made further important contributions in the development of relational algebra, relational calculus, and normalization of relations.

Edgar F. Codd joined IBM in 1949 to prepare programs for the Selective Sequence Electronic Calculator. Since then, his work in computing has encompassed logical design of computers (IBM 701 and Stretch), managing a computer center in Canada, heading the development of one of the first operating systems with a general multiprogramming capability, contributing to the logic of self-reproducing automata, developing high level techniques for software specification, creating and extending the relational approach to database management, and developing an English analyzing and synthesizing subsystem for casual users of relational databases. He is also the author of Cellular Automata, an early volume in the ACM Monograph Series.

Codd received his B.A. and M.A. in Mathematics from Oxford University in England, and his M.Sc. and Ph.D. in Computer and Communication Sciences from the University of Michigan. He is a Member of the National Academy of Engineering (USA) and a Fellow of the British Computer Society.

The ACM Turing Award is presented each year in commemoration of A. M. Turing, the English mathematician who made major contributions to the computing sciences.

Relational Database: A Practical Foundation for Productivity
E. F. Codd
IBM San Jose Research Laboratory

It is well known that the growth in demands from end users for new applications is outstripping the capability of data processing departments to implement the corresponding application programs. There are two complementary approaches to attacking this problem (and both approaches are needed): one is to put end users into direct touch with the information stored in computers; the other is to increase the productivity of data processing professionals in the development of application programs. It is less well known that a single technology, relational database management, provides a practical foundation for both approaches. It is explained why this is so.
While developing this productivity theme, it is noted that the time has come to draw a very sharp line between relational and non-relational database systems, so that the label relational will not be used in misleading ways. The key to drawing this line is something called a relational processing capability.

CR Categories and Subject Descriptors: H.2.0 [Database Management]: General; H.2.1 [Database Management]: Logical Design-data models; H.2.4 [Database Management]: Systems
General Terms: Human Factors, Languages
Additional Key Words and Phrases: database, relational database, relational model, data structure, data manipulation, data integrity, productivity

Author's Present Address: E. F. Codd, IBM Research Laboratory, 5600 Cottle Road, San Jose, CA 95193.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1982 ACM 0001-0782/82/0200-0109 $00.75
Communications of the ACM, February 1982, Volume 25, Number 2

1. Introduction
It is generally admitted that there is a productivity crisis in the development of "running code" for commercial and industrial applications. The growth in end user demands for new applications is outstripping the capability of data processing departments to implement the corresponding application programs. In the late sixties and early seventies many people in the computing field hoped that the introduction of database management systems (commonly abbreviated DBMS) would markedly increase the productivity of application programmers by removing many of their problems in handling input and output files. DBMS (along with data dictionaries) appear to have been highly successful as instruments of data control, and they did remove many of the file handling details from the concern of application programmers. Why then have they failed as productivity boosters?
There are three principal reasons:
(1) These systems burdened application programmers with numerous concepts that were irrelevant to their data retrieval and manipulation tasks, forcing them to think and code at a needlessly low level of structural detail (the "owner-member set" of CODASYL DBTG is an outstanding example¹);
(2) No commands were provided for processing multiple records at a time; in other words, DBMS did not support set processing and, as a result, programmers were forced to think and code in terms of iterative loops that were often unnecessary (here we use the word "set" in its traditional mathematical sense, not the linked structure sense of CODASYL DBTG);
(3) The needs of end users for direct interaction with databases, particularly interaction of an unanticipated nature, were inadequately recognized; a query capability was assumed to be something one could add on to a DBMS at some later time.
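The contrast drawn in reason (2) can be made concrete with a small sketch. It is not taken from the lecture: the table `emp`, its columns, and the helper names are hypothetical, and Python's standard `sqlite3` module merely stands in for a DBMS that accepts set-oriented statements. The first function shows the record-at-a-time style, with a programmer-written loop; the second states the same change in a single statement.

```python
import sqlite3


def make_db():
    """Build a small in-memory table; the schema and rows are illustrative only."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
    con.executemany(
        "INSERT INTO emp VALUES (?, ?, ?)",
        [(1, "D1", 30000), (2, "D2", 42000), (3, "D1", 35000)],
    )
    return con


def raise_record_at_a_time(con, dept, amount):
    """Record-at-a-time style: fetch every record, test it, update it in a loop."""
    rows = con.execute("SELECT emp_no, dept, salary FROM emp").fetchall()
    for emp_no, row_dept, salary in rows:
        if row_dept == dept:
            con.execute(
                "UPDATE emp SET salary = ? WHERE emp_no = ?",
                (salary + amount, emp_no),
            )


def raise_set_at_a_time(con, dept, amount):
    """Set-at-a-time style: one statement names the whole set of records to change."""
    con.execute("UPDATE emp SET salary = salary + ? WHERE dept = ?", (amount, dept))


def salaries(con):
    """Return (emp_no, salary) pairs in a fixed order for comparison."""
    return con.execute("SELECT emp_no, salary FROM emp ORDER BY emp_no").fetchall()
```

Both functions leave the table in the same state; the difference lies in how much control logic the application programmer has to write and maintain.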
Looking back at the database management systems of the late sixties, we may readily observe that there was no sharp distinction between the programmer's (logical) view of the data and the (physical) representation of data in storage. Even though what was called the logical level usually provided protection from placement expressed in terms of storage addresses and byte offsets, many storage-oriented concepts were an integral part of this level. The adverse impact on development productivity of requiring programmers to navigate along access paths to reach the target data (in some cases having to deal directly with the layout of data in storage and in others having to follow pointer chains) was enormous. In addition, it was not possible to make slight changes in the layout in storage without simultaneously having to revise all programs that relied on the previous structure. The introduction of an index might have a similar effect. As a result, far too much manpower was being invested in continual (and avoidable) maintenance of application programs.
Another consequence was that installation of these systems was often agonizingly slow, due to the large amount of time spent in learning about the systems and in planning the organization of the data at both logical and physical levels, prior to database activation. The aim of this preplanning was to "get it right once and for all" so as to avoid the need for subsequent changes in the data description that, in turn, would force coding changes in application programs. Such an objective was, of course, a mirage, even if sound principles for database design had been known at the time (and, of course, they were not).
To show how relational database management systems avoid the three pitfalls cited above, we shall first review the motivation of the relational model and discuss some of its features. We shall then classify systems that are based upon that model. As we proceed, we shall stress application programmer productivity, even though the benefits for end users are just as great, because much has already been said and demonstrated regarding the value of relational database to end users (see [23] and the papers cited therein).

¹ The crux of the problem with the CODASYL DBTG owner-member set is that it combines into one construct three orthogonal concepts: one-to-many relationship, existence dependency, and a user-visible linked structure to be traversed by application programs. It is the last of these three concepts that places a heavy and unnecessary navigation burden on application programmers. It also presents an insurmountable obstacle for end users.

2. Motivation
The most important motivation for the research work that resulted in the relational model was the objective of providing a sharp and clear boundary between the logical and physical aspects of database management (including database design, data retrieval, and data manipulation). We call this the data independence objective.
A second objective was to make the model structurally simple, so that all kinds of users and programmers could have a common understanding of the data, and could therefore communicate with one another about the database. We call this the communicability objective.
A third objective was to introduce high level language concepts (but not specific syntax) to enable users to express operations upon large chunks of information at a time. This entailed providing a foundation for set-oriented processing (i.e., the ability to express in a single statement the processing of multiple sets of records at a time). We call this the set-processing objective.
There were other objectives, such as providing a sound theoretical foundation for database organization and management, but these objectives are less relevant to our present productivity theme.

3. The Relational Model
To satisfy these three objectives, it was necessary to discard all those data structuring concepts (e.g., repeating groups, linked structures) that were not familiar to end users and to take a fresh look at the addressing of data.
Positional concepts have always played a significant role in computer addressing, beginning with plugboard addressing, then absolute numeric addressing, relative numeric addressing, and symbolic addressing with arithmetic properties (e.g., the symbolic address A + 3 in assembler language; the address X(I + 1, Y - 2) of an element in a Fortran, Algol, or PL/I array named X). In the relational model we replace positional addressing by totally associative addressing. Every datum in a relational database can be uniquely addressed by means of the relation name, primary key value, and attribute name. Associative addressing of this form enables users (yes, and even programmers also) to leave it to the system to (1) determine the details of placement of a new piece of information that is being inserted into a database and (2) select appropriate access paths when retrieving data.
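As a rough illustration of associative addressing, the sketch below addresses each datum by the triple (relation name, primary key value, attribute name) and never by position. The relations, keys, and helper functions are hypothetical, and nested Python dictionaries merely stand in for whatever placement and access paths a real system would choose.

```python
# A toy database: relation name -> {primary key value -> {attribute name -> value}}.
# All names and values here are illustrative assumptions.
DATABASE = {
    "supplier": {
        "S1": {"name": "Smith", "city": "London"},
        "S2": {"name": "Jones", "city": "Paris"},
    },
    "part": {
        "P1": {"name": "Nut", "weight": 12},
    },
}


def datum(db, relation, key, attribute):
    """Address one datum by (relation name, primary key value, attribute name)."""
    return db[relation][key][attribute]


def insert(db, relation, key, row):
    """Insert a row; the caller never states where the row is placed."""
    if key in db[relation]:
        raise KeyError(f"duplicate primary key {key!r} in {relation!r}")
    db[relation][key] = dict(row)
```

For example, `datum(DATABASE, "supplier", "S2", "city")` names the value wanted without any reference to row or column position.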
All information in a relational database is represented by values in tables (even table names appear as character strings in at least one table). Addressing data by value, rather than by position, boosts the productivity of programmers as well as end users (positions of items in sequences are usually subject to change and are not easy for a person to keep track of, especially if the sequences contain many items). Moreover, the fact that programmers and end users all address data in the same way goes a long way to meeting the communicability objective.
The n-ary relation was chosen as the single aggregate structure for the relational model, because with appropriate operators and an appropriate conceptual representation (the table) it satisfies all three of the cited objectives. Note that an n-ary relation is a mathematical set, in which the ordering of rows is immaterial.
Sometimes the following questions arise: Why call it the relational model? Why not call it the tabular model? There are two reasons: (1) At the time the relational model was introduced, many people in data processing felt that a relation (or relationship) among two or more objects must be represented by a linked data structure (so the name was selected to counter this misconception); (2) Tables are at a lower level of abstraction than relations, since they give the impression that positional (array-type) addressing is applicable (which is not true of n-ary relations), and they fail to show that the information content of a table is independent of row order. Nevertheless, even with these minor flaws, tables are the most important conceptual representation of relations, because they are universally understood.
Incidentally, if a data model is to be considered as a serious alternative for the relational model, it too should have a clearly defined conceptual representation for database instances. Such a representation facilitates thinking about the effects of whatever operations are under consideration. It is a requirement for programmer and end-user productivity. Such a representation is rarely, if ever, discussed in data models that use concepts such as entities and relationships, or in functional data models. Such models frequently do not have any operators either. Nevertheless, they may be useful for certain kinds of data type analysis encountered in the process of establishing a new database, especially in the very early stages of determining a preliminary informal organization. This leads to the question: What is a data model?
A data model is, of course, not just a data structure, as many people seem to think. It is natural that the principal data models are named after their principal structures, but that is not the whole story.
A data model [9] is a combination of at least three components:
(1) A collection of data structure types (the database building blocks);
(2) A collection of operators or rules of inference, which can be applied to any valid instances of the data types listed in (1), to retrieve, derive, or modify data from any parts of those structures in any combinations desired;
(3) A collection of general integrity rules, which implicitly or explicitly define the set of consistent database states or changes of state or both; these rules are general in the sense that they apply to any database using this model (incidentally, they may sometimes be expressed as insert-update-delete rules).
The relational model is a data model in this sense, and was the first such to be defined. We do not propose to give a detailed definition of the relational model here; the original definition appeared in [7], and an improved one in Secs. 2 and 3 of [8]. Its structural part consists of domains, relations of assorted degrees (with tables as their principal conceptual representation), attributes, tuples, candidate keys, and primary keys. Under the principal representation, attributes become columns of tables and tuples become rows, but there is no notion of one column succeeding another or of one row succeeding another as far as the database tables are concerned. In other words, the left to right order of columns and the top to bottom order of rows in those tables are arbitrary and irrelevant.
The manipulative part of the relational model consists of the algebraic operators (select, project, join, etc.) which transform relations into relations (and hence tables into tables).
The integrity part consists of two integrity rules: entity integrity and referential integrity (see [8, 11] for recent developments in this latter area). In any particular application of a data model it may be necessary to impose further (database-specific) integrity constraints, and thereby define a smaller set of consistent database states or changes of state.
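The lecture names the two integrity rules without restating them. The sketch below assumes their usual formulation: entity integrity forbids a missing (null) component in a primary key, and referential integrity requires every non-null foreign key value to match a primary key value in the referenced relation. The relations `DEPTS` and `EMPS`, their attribute names, and both checker functions are hypothetical, with `None` standing in for a null.

```python
def entity_integrity_violations(rows, key_attrs):
    """Return the rows whose primary key has a missing (None) component."""
    return [r for r in rows if any(r.get(a) is None for a in key_attrs)]


def referential_integrity_violations(rows, fk_attr, referenced_rows, pk_attr):
    """Return the rows whose non-null foreign key matches no referenced primary key."""
    known = {r[pk_attr] for r in referenced_rows}
    return [r for r in rows if r.get(fk_attr) is not None and r[fk_attr] not in known]


# Illustrative data: departments, and employees that refer to them by dept_no.
DEPTS = [{"dept_no": "D1"}, {"dept_no": "D2"}]
EMPS = [
    {"emp_no": 1, "dept_no": "D1"},
    {"emp_no": None, "dept_no": "D2"},   # violates entity integrity
    {"emp_no": 3, "dept_no": "D9"},      # violates referential integrity
    {"emp_no": 4, "dept_no": None},      # null foreign key: allowed under this reading
]
```

Database-specific constraints of the kind mentioned above would be further checks of the same shape, each narrowing the set of states regarded as consistent.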
In the development of the relational model, there has always been a strong coupling between the structural, manipulative, and integrity aspects. If the structures are defined alone and separately, their behavioral properties are not pinned down, infinitely many possibilities present themselves, and endless speculation results. It is therefore no surprise that attempts such as those of CODASYL and ANSI to develop data structure definition language (DDL) and data manipulation language (DML) in separate committees have yielded many misunderstandings and incompatibilities.

4. The Relational Processing Capability
The relational model calls not only for relational structures (which can be thought of as tables), but also for a particular kind of set processing called relational processing. Relational processing entails treating whole relations as operands. Its primary purpose is loop-avoidance, an absolute requirement for end users to be productive at all, and a clear productivity booster for application programmers.
The SELECT operator (also called RESTRICT) of the relational algebra takes one relation (table) as operand and produces a new relation (table) consisting of selected tuples (rows) of the first. The PROJECT operator also transforms one relation (table) into a new one, this time however consisting of selected attributes (columns) of the first. The EQUI-JOIN operator takes two relations (tables) as operands and produces a third consisting of rows of the first concatenated with rows of the second, but only where specified columns in the first and specified columns in the second have matching values. If redundancy in columns is removed, the operator is called NATURAL JOIN. In what follows, we use the term "join" to refer to either the equi-join or the natural join.
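The four operators just described can be sketched over small in-memory relations. This is only an illustration under stated assumptions: a relation is modeled as a Python set of tuples, each tuple as a frozenset of (attribute, value) pairs, so row and column order carry no meaning. The function names, the relations `EMP` and `DEPT`, and the requirement that joined relations use distinct attribute names are choices made here. The comprehensions iterate internally, but a caller combines whole relations without writing a loop.

```python
def relation(rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def select(rel, predicate):
    """SELECT (RESTRICT): keep the tuples for which predicate(row_as_dict) is true."""
    return {t for t in rel if predicate(dict(t))}


def project(rel, attrs):
    """PROJECT: keep only the named attributes; duplicate tuples collapse in the set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def equi_join(left, right, left_attr, right_attr):
    """EQUI-JOIN: concatenate tuples whose named attributes have equal values.

    Assumes the two relations use distinct attribute names.
    """
    return {
        lt | rt
        for lt in left
        for rt in right
        if dict(lt)[left_attr] == dict(rt)[right_attr]
    }


def natural_join(left, right, left_attr, right_attr):
    """NATURAL JOIN: an equi-join with the redundant right-hand column removed."""
    joined = equi_join(left, right, left_attr, right_attr)
    return {frozenset((a, v) for a, v in t if a != right_attr) for t in joined}


# Illustrative relations.
EMP = relation([
    {"emp_no": 1, "emp_dept": "D1"},
    {"emp_no": 2, "emp_dept": "D2"},
    {"emp_no": 3, "emp_dept": "D1"},
])
DEPT = relation([
    {"dept_no": "D1", "city": "San Jose"},
    {"dept_no": "D2", "city": "Los Angeles"},
])
```

Because every operator returns a relation, the results compose: `project(natural_join(EMP, DEPT, "emp_dept", "dept_no"), {"emp_no", "city"})` is itself a relation that can be fed to further operators.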
The relational algebra, which includes these and other operators, is intended as a yardstick of power. It is not intended to be a standard language, to which all relational systems should adhere. The set-processing objective of the relational model is intended to be met by means of a data sublanguage² having at least the power of the relational algebra without making use of iteration or recursion statements.
Much of the derivability power of the relational algebra is obtained from the SELECT, PROJECT, and JOIN operators alone, provided the JOIN is not subject to any implementation restrictions having to do with predefinition of supporting physical access paths. A system has an unrestricted join capability if it allows joins to be taken wherein any pair of attributes may be matched, providing only that they are defined on the same domain or data type (for our present purpose, it does not matter whether the domain is syntactic or semantic and it does not matter whether the data type is weak or strong, but see [10] for circumstances in which it does matter).

² A data sublanguage is a specialized language for database management, supporting at least data definition, data retrieval, insertion, update, and deletion. It need not be computationally complete, and usually is not. In the context of application programming, it is intended to be used in conjunction with one or more programming languages.

Occasionally, one finds systems in which join is supported only if the attributes to be matched have the same name or are supported by a certain type of predeclared access path. Such restrictions significantly impair the power of the system to derive relations from the base relations. These restrictions consequently reduce the system's capability to handle unanticipated queries by end users and reduce the chances for application programmers to avoid coding iterative loops.
Thus, we say that a data sublanguage L has a relational processing capability if the transformations specified by the SELECT, PROJECT, and unrestricted JOIN operators of the relational algebra can be specified in L without resorting to commands for iteration or recursion. For a database management system to be called relational it must support:
(1) Tables without user-visible navigation links between them;
(2) A data sublanguage with at least this (minimal) relational processing capability.
One consequence of this is that a DBMS that does not support relational processing should be considered non-relational. Such a system might be more appropriately called tabular, providing that it supports tables without user-visible navigation links between tables. This term should replace the term "semi-relational" used in [8], because there is a large difference in implementation complexity between tabular systems, in which the programmer does his own navigation, and relational systems, in which the system does the navigation for him, i.e., the system provides automatic navigation.
The definition of relational DBMS given above intentionally permits a lot of latitude in the services provided. For example, it is not required that the full relational algebra be supported, and there is no requirement in regard to support of the two integrity rules of the relational model (entity integrity and referential integrity). Full support by a relational system of these latter two parts of the model justifies calling that system fully relational [8]. Although we know of no systems that qualify as fully relational today, some are quite close to qualifying, and no doubt will soon do so.
In Fig. 1 we illustrate the distinction between the various kinds of relational and tabular systems. For each class the extent of shading in the S box is intended to show the degree of fidelity of members of that class to the structural requirements of the relational model. A similar remark applies to the M box with respect to the manipulative requirements, and to the I box with respect to the integrity requirements.
m denotes the minimal relational processing capability. c denotes relational completeness (a capability corresponding to a two-valued first order predicate logic without nulls). When the manipulation box M is fully shaded, this denotes a capability corresponding to the full relational algebra defined in [8] (a three-valued predicate logic with a single kind of null). The question mark in the integrity box for each class except the fully relational is an indication of the present inadequate support for integrity in relational systems. Stronger support for domains and primary keys is needed [10], as well as the kind of facility discussed in [14].

[Fig. 1. Classification of DBMS. Four classes are shown: Tabular (previously called semi-relational), Minimally Relational, Relationally Complete, and Fully Relational. Each class is drawn with three boxes: S = Structural, M = Manipulative, I = Integrity. Within the M box, m = minimal relational processing capability and c = relational completeness.]

Note that a relational DBMS may package its relational processing capability in any convenient way. For example, in the INGRES system of Relational Technology, Inc., the RETRIEVE statement of QUEL [29] embodies all three operators (select, project, join) in one statement, in such a way that one can obtain the same effect as any one of the operators or any combination of them.
In the definition of the relational model there are several prohibitions. To cite two examples: user-visible navigation links between tables are ruled out, and database information must not be represented (or hidden) in the ordering of tuples within base relations. Our experience is that DBMS designers who have implemented non-relational systems do not readily understand and accept these prohibitions. By contrast, users enthusiastically understand and accept the enhanced ease of learning and ease of use resulting from these prohibitions.
Incidentally, the Relational Task Group of the American National Standards Institute has recently issued a report [4] on the feasibility of developing a standard for relational database systems. This report contains an enlightening analysis of the features of a dozen relational systems, and its authors clearly understand the relational model.

5. The Uniform Relational Property
In order to have wide applicability most relational DBMS have a data sublanguage which can be interfaced with one or more of the commonly used programming languages (e.g., Cobol, Fortran, PL/I, APL). We shall refer to these latter languages as host languages. A relational DBMS usually supports at least one end-user oriented data sublanguage, sometimes several, because the needs of these users may vary. Some prefer string languages such as QUEL or SQL [5], while others prefer the screen-oriented two-dimensional data sublanguage of Query-by-Example [33].
Now, some relational systems (e.g., System R [6], INGRES [29]) support a data sublanguage that is usable in two modes: (1) interactively at a terminal and (2) embedded in an application program written in a host language. There are strong arguments for such a double-mode data sublanguage:
(1) With such a language application programmers can separately debug at a terminal the database statements they wish to incorporate in their application programs; people who have used SQL to develop application programs claim that the double-mode feature significantly enhances their productivity;
(2) Such a language significantly enhances communication among programmers, analysts, end users, database administration staff, etc.;
(3) Frivolous distinctions between the languages used in these two modes place an unnecessary learning and memory burden on those users who have to work in both modes.
The importance of this feature in productivity suggests that relational DBMS be classified according to whether they possess this feature or not. Accordingly, we call those relational DBMS that support a double-mode sublanguage uniform relational. Thus, a uniform relational DBMS supports relational processing at both an end-user interface and at an application programming interface using a data sublanguage common to both interfaces.
The natural term for all other relational DBMS is non-uniform relational. An example of a non-uniform relational DBMS is the TANDEM ENCOMPASS [19]. With this system, when retrieving data interactively at a terminal, one uses the relational data sublanguage ENFORM (a language with relational processing capability). When writing a program to retrieve or manipulate data, one uses an extended version of Cobol (a language that does not possess the relational processing capability). Common to both levels of use are the structures: tables without user-visible navigation links between them.
A question that immediately arises is this: how can a
data sublanguage with relational processing capability
be interfaced with a language such as Cobol or PL/I that
can handle data one record at a time only (i.e., that is
incapable of treating a set of records as a single operand)?
To solve this problem we must separate the following
two actions from one another: (1) definition of the relation
to be derived; (2) presentation of the derived relation
to the host language program.
One solution (adopted in the Peterlee Relational Test
Vehicle [31]) is to cast a derived relation in the form of
a file that can be read record-by-record by means of host
language statements. In this case delivery of records is
delegated to the file system used by the pertinent host
language.
Another solution (adopted by System R) is to keep
the delivery of records under the control of data sublanguage
statements and, hence, under the control of the
relational DBMS optimizer. A query statement Q of
SQL (the data sublanguage of System R) may be embedded
in a host language program, using the following
kind of phrase (for expository reasons, the syntax is not
exactly that of SQL):
    DECLARE C CURSOR FOR Q
where C stands for any name chosen by the programmer.
Such a statement associates a cursor named C with the
defining expression Q. Tuples from the derived relation
defined by Q are presented to the program one at a time
by means of the named cursor. Each time a FETCH per
this cursor is executed, the system delivers another tuple
from the derived relation. The order of delivery is
system-determined unless the SQL statement Q defining
the derived relation contains an ORDER BY clause.
It is important to note that in advancing a cursor over
a derived relation the programmer is not engaging in
navigation to some target data. The derived relation is
itself the target data. It is the DBMS that determines
whether the derived relation should be materialized en
bloc prior to the cursor-controlled scan or materialized
piecemeal during the scan. In either case, it is the system
(not the programmer) that selects the access paths by
which the derived data is to be generated. This takes a
significant burden off the programmer's shoulders,
thereby increasing his productivity.
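
The cursor mechanism described above can be sketched with the sqlite3 module from the Python standard library. This is only an illustration under stated assumptions, not the embedded SQL of System R: the table `parts`, its columns, and the sample rows are hypothetical, and the sqlite3 cursor object merely plays the part of the DECLARE ... CURSOR and FETCH statements. The query Q defines the derived relation, each `fetchone()` call delivers one more tuple, and the ORDER BY clause fixes the order of delivery.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (pno TEXT PRIMARY KEY, city TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [("P1", "London", 12), ("P2", "Paris", 17), ("P3", "London", 14)],
)

# Q defines the derived relation. The ORDER BY clause makes the order of
# delivery explicit instead of leaving it system-determined.
Q = "SELECT pno, weight FROM parts WHERE city = ? ORDER BY weight DESC"

cursor = conn.execute(Q, ("London",))  # plays the role of DECLARE C CURSOR FOR Q

delivered = []
while True:
    row = cursor.fetchone()            # plays the role of FETCH: one tuple per call
    if row is None:                    # the derived relation is exhausted
        break
    delivered.append(row)

print(delivered)
```

The host program names no index and follows no links; it only states which relation it wants and then consumes tuples one at a time.
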
6. Skepticism About Relational Systems
There has been no shortage of skepticism concerning
the practicality of the relational approach to database
management. Much of this skepticism stems from a lack
of understanding, some from a fear of the numerous
theoretical investigations that are based on the relational
model [1, 2, 15, 16, 24]. Instead of welcoming a theoretical
foundation as providing soundness, the attitude
seems to be: if it's theoretical, it cannot be practical. The
absence of a theoretical foundation for almost all
non-relational DBMS is the prime cause of their
ungepotchket quality. (This is a Yiddish word, one of
whose meanings is patched up.)
On the other hand, it seems reasonable to pose the
following two questions:
(1) Can a relational system provide the range of services
that we have grown to expect from other DBMS?
(2) If (1) is answered affirmatively, can such a system
perform as well as non-relational DBMS? (a)
We look at each of these in turn.
6.1 Range of Services
A full-scale DBMS provides the following capabilities:
o data storage, retrieval, and update;
o a user-accessible catalog for data description;
o transaction support to ensure that all or none of a
  sequence of database changes are reflected in the
  pertinent databases (see [17] for an up-to-date summary
  of transaction technology);
o recovery services in case of failure (system, media,
  or program);
o concurrency control services to ensure that concurrent
  transactions behave the same way as if run in
  some sequential order;
o authorization services to ensure that all access to
  and manipulation of data be in accordance with
  specified constraints on users and programs [18];
o integration with support for data communication;
o integrity services to ensure that database states and
  changes of state conform to specified rules.
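
Two of the services listed above, transaction support and integrity services, can be illustrated with a short sketch using Python's sqlite3 module. Everything named here is an assumption made for the example: the `account` table, the `transfer` helper, and the rule that a balance may not go negative are all hypothetical. The point is only that either both updates of a transfer are reflected in the database or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause is a specified integrity rule: no negative balances.
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Hypothetical helper: apply both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # both updates are applied
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so both updates are undone
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to B has already been executed when the debit to A violates the rule; the rollback removes the credit as well, so no partial change survives.
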
Certain relational prototypes developed in the early
seventies fell far short of providing all these services
(possibly for good reasons). Now, however, several
relational systems are available as software products and
provide all these services with the exception of the last.
Present versions of these products are admittedly weak
in the provision of integrity services, but this is rapidly
being remedied [10].
Some relational DBMS actually provide more complete
data services than the non-relational systems. Three
examples follow.
As a first example, relational DBMS support the
extraction of all meaningful relations from a database,
whereas non-relational systems support extraction only
where there exist statically predefined access paths.
Footnote (a): One should bear in mind that the non-relational
ones always employ comparatively low level data sublanguages
for application programming.

As a second example of the additional services provided
by some relational systems, consider views. A view
is a virtual relation (table) defined by means of an
expression or sequence of commands. Although not directly
supported by actual data, a view appears to a user
as if it were an additional base table kept up-to-date and
in a state of integrity with the other base tables. Views
are useful for permitting application programs and users
at terminals to interact with constant view structures,
even when the base tables themselves are undergoing
structural changes at the logical level (providing that the
pertinent views are still definable from the new base
tables). They are also useful in restricting the scope of
access of programs and users. Non-relational systems
either do not support views at all or else support much
more primitive counterparts, such as the CODASYL
subschema.
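
The two uses of views described above can be sketched as follows, again with Python's sqlite3 module standing in for a relational DBMS. The tables `employee`, `person`, and `pay` and the view `staff` are hypothetical names chosen for the illustration. The view hides the salary column (restricting the scope of access), and the application's query against the view stays the same after the base table is split in two and the view is redefined over the new base tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('Ann', 'Sales', 300), ('Bob', 'Lab', 250);
    -- The view exposes names and departments only; salaries stay hidden.
    CREATE VIEW staff AS SELECT name, dept FROM employee;
""")

# The application is written against the view, not the base table.
VIEW_QUERY = "SELECT name, dept FROM staff ORDER BY name"
before = conn.execute(VIEW_QUERY).fetchall()

# Restructure the base tables at the logical level, then redefine the
# view so that it is derived from the new base tables.
conn.executescript("""
    DROP VIEW staff;
    CREATE TABLE person (name TEXT, dept TEXT);
    CREATE TABLE pay (name TEXT, salary INTEGER);
    INSERT INTO person SELECT name, dept FROM employee;
    INSERT INTO pay SELECT name, salary FROM employee;
    DROP TABLE employee;
    CREATE VIEW staff AS SELECT name, dept FROM person;
""")

after = conn.execute(VIEW_QUERY).fetchall()  # same query text as before
print(before, after)
```

Only the view definition changed; the text of `VIEW_QUERY` did not.
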
As a third example, some systems (e.g., SQL/DS [28]
and its prototype predecessor System R) permit a variety
of changes to be made to the logical and physical organization
of the data dynamically--while transactions are
in progress. These changes rarely require application
programs to be recoded. Thus, there is less of a program
maintenance burden, leaving programmers to be more
productive doing development rather than maintenance.
This capability is made possible in SQL/DS by the fact
that the system has complete control over access path
selection.

In non-relational systems such changes would normally
require all other database activities including
transactions in progress to be brought to a halt. The
database then remains out of action until the organizational
changes are completed and any necessary recompiling
done.
6.2 Performance
Naturally, people would hesitate to use relational
systems if these systems were sluggish in performance.
All too often, erroneous conclusions are drawn about the
performance of relational systems by comparing the time
it might take for one of these systems to execute a
complex transaction with the time a non-relational system
might take to execute an extremely simple transaction.
To arrive at a fair performance comparison, one
must compare these systems on the same tasks or applications.
We shall present arguments to show why relational
systems should be able to compete successfully
with non-relational systems.
Good performance is determined by two factors: (1)
the system must support performance-oriented physical
data structures; (2) high-level language requests for data
must be compiled into lower-level code sequences at
least as good as the average application programmer can
produce by hand.
The first step in the argument is that a program
written in a Cobol-level language can be made to perform
efficiently on large databases containing production
data structured in tabular form with no user-visible
navigation links between them. This step in the argument
is supported by the following information [19]: as of
August 1981, Tandem Computer Corp. had manufactured
and installed 760 systems; of these, over 700 were
making use of the Tandem ENCOMPASS relational
database management system to support databases containing
production data. Tandem has committed its own
manufacturing database to the care of ENCOMPASS.
ENCOMPASS does not support links between the database
tables, either user-visible (navigation) links or
user-invisible (access method) links.
In the second step of the argument, suppose we take
the application programs in the above-cited installations
and replace the database retrieval and manipulation
statements by statements in a database sublanguage with
a relational processing capability (e.g., SQL). Clearly, to
obtain good performance with such a high level language,
it is essential that it be compiled into object code
(instead of being interpreted), and it is essential that that
object code be efficient.
Compilation is used in System R and its product
version SQL/DS. In 1976 Raymond Lorie developed an
ingenious pre- and post-compiling scheme for coping
with dynamic changes in access paths [21]. It also copes
with early (and hence efficient) authorization and integrity
checking (the latter, however, is not yet implemented).
This scheme calls for compiling in a rather
special way the SQL statements embedded in a host
language program. This compilation step transforms the
SQL statements into appropriate CALLs within the
source program together with access modules containing
object code. These modules are then stored in the database
for later use at runtime. The code in these access
modules is generated by the system so as to optimize the
sequencing of the major operations and the selection of
access paths to provide runtime efficiency. After this
pre-compilation step, the application program is compiled
by a regular compiler for the pertinent host language. If
at any subsequent time one or more of the access paths
is removed and an attempt is made to run the program,
enough source information has been retained in the
access module to enable the system to re-compile a new
access module that exploits the now existing access paths
without requiring a re-compilation of the application
program.
Incidentally, the same data sublanguage compiler is
used on ad hoc queries submitted interactively from a
terminal and also on queries that are dynamically generated
during the execution of a program (e.g., from
parameters submitted interactively). Immediately after
compilation, such queries are executed and, with the
exception of the simplest of queries, the performance is
better than that of an interpreter.
The generation of access modules (whether at the
initial compiling or re-compiling stage) entails a quite
sophisticated optimization scheme [27], which makes use
of system-maintained statistics that would not normally
be within the programmer's knowledge. Thus, only on
the simplest of all transactions would it be possible for
an average application programmer to compete with this
optimizer in generation of efficient code. Any attempts
to compete are bound to reduce the programmer's productivity.
Thus, the price paid for extra compile-time
overhead would seem to be well worth paying.
Assuming non-linked tabular structures in both cases,
we can expect SQL/DS to generate code comparable
with average hand-written code in many simple cases,
and superior in many complex cases. Many commercial
transactions are extremely simple. For example, one may
need to look up a record for a particular railroad wagon
to find out where it is or find the balance in someone's
savings account. If suitably fast access paths are supported
(e.g., hashing), there is no reason why a high-level
language such as SQL, QUEL, or QBE should result in
less efficient run-time code for these simple transactions
than a lower level language, even though such transactions
make little use of the optimizing capability of the
high-level data sublanguage compiler.
7. Future Directions
If we are to use relational database as a foundation
for productivity, we need to know what sort of developments
may lie ahead for relational systems.

Let us deal with near-term developments first. In
some relational systems stronger support is needed for
domains and primary keys per suggestions in [10]. As
already noted, all relational systems need upgrading with
regard to automatic adherence to integrity constraints.
Existing constraints on updating join-type views need to
be relaxed (where theoretically possible), and progress is
being made on this problem [20]. Support for outer joins
is needed.
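
For readers unfamiliar with the term, the difference between an ordinary join and an outer join can be shown with a small sketch in Python's sqlite3 module. The `supplier` and `shipment` tables and their rows are hypothetical. An ordinary join drops a supplier that has no matching shipment, whereas a left outer join keeps that supplier and fills the missing quantity with a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno TEXT, city TEXT);
    CREATE TABLE shipment (sno TEXT, qty INTEGER);
    INSERT INTO supplier VALUES ('S1', 'London'), ('S2', 'Paris');
    INSERT INTO shipment VALUES ('S1', 300);   -- S2 has no shipments
""")

# Ordinary join: only suppliers with at least one matching shipment.
inner = conn.execute(
    "SELECT s.sno, sh.qty FROM supplier s "
    "JOIN shipment sh ON s.sno = sh.sno ORDER BY s.sno"
).fetchall()

# Left outer join: every supplier appears; unmatched ones carry a null qty.
outer = conn.execute(
    "SELECT s.sno, sh.qty FROM supplier s "
    "LEFT OUTER JOIN shipment sh ON s.sno = sh.sno ORDER BY s.sno"
).fetchall()

print(inner)
print(outer)
```
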
Marked improvements are being made in optimizing
technology, so we may reasonably expect further improvements
in performance. In certain products, such as
the ICL CAFS [22] and the Britton-Lee IDM500 [13],
special hardware support has been implemented. Special
hardware may help performance in certain types of
applications. However, in the majority of applications
dealing with formatted databases, software-implemented
relational systems can compete in performance with
software-implemented non-relational systems.

At present, most relational systems do not provide
any special support for engineering and scientific databases.
Such support, including interfacing with Fortran,
is clearly needed and can be expected.
Catalogs in relational systems already consist of ad-
ditional relations that can be interrogated just l ike the
rest of the database us ing the same query language. A
natural development that can and should be swiftly put
in place is the expansion of these catalogs into full-fledged
active dictionaries to provide additional on-line
data control.
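
The observation that a catalog can be interrogated with the same query language as the rest of the database can be sketched with Python's sqlite3 module. SQLite is used here purely as an example system: its catalog relation is named `sqlite_master`, and the `supplier`, `shipment`, and `paris_supplier` objects are hypothetical. An ordinary SELECT against the catalog lists the tables and views just defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE shipment (sno TEXT, pno TEXT, qty INTEGER);
    CREATE VIEW paris_supplier AS SELECT sno FROM supplier WHERE city = 'Paris';
""")

# The catalog is itself a relation, so an ordinary SELECT works on it.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

print(catalog)
```
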
Finally, in the near term, we may expect database
design aids suited for use with relational systems both at
the logical and physical levels.
In the longer term we may expect support for relational
databases distributed over a communications network
[25, 30, 32] and managed in such a way that
application programs and interactive users can manipulate
the data (1) as if all of it were stored at the local
node--location transparency--and (2) as if no data were
replicated anywhere--replication transparency. All three
of the projects cited above are based on the relational
model. One important reason for this is that relational
databases offer great decomposition flexibility when
planning how a database is to be distributed over a
network of computer systems, and great recomposition
power for dynamic combination of decentralized information.
By contrast, CODASYL DBTG databases are
very difficult to decompose and recompose due to the
entanglement of the owner-member navigation links.
This property makes the CODASYL approach extremely
difficult to adapt to a distributed database environment
and may well prove to be its downfall. A second reason
for use of the relational model is that it offers concise
high level data sublanguages for transmitting requests
for data from node to node.
The ongoing work in extending the relational model
to capture in a formal way more meaning of the data
can be expected to lead to the incorporation of this
meaning in the database catalog in order to factor it out
of application programs and make these programs even
more concise and simple. Here, we are, of course, talking
about meaning that is represented in such a way that the
system can understand it and act upon it.

Improved theories are being developed for handling
missing data and inapplicable data (see for example
[3]). This work should yield improved treatment of null
values.
As it stands today, relational database is best suited
to data with a rather regular or homogeneous structure.
Can we retain the advantages of the relational approach
while handling heterogeneous data also? Such data may
include images, text, and miscellaneous facts. An affirmative
answer is expected, and some research is in progress
on this subject, but more is needed.

Considerable research is needed to achieve a rapprochement
between database languages and programming
languages. Pascal/R [26] is a good example of work
in this direction. Ongoing investigations focus on the
incorporation of abstract data types into database languages
on the one hand [12] and relational processing
into programming languages on the other.
8. Conclusions
We have presented a series of arguments to support
the claim that relational database technology offers dramatic
improvements in productivity both for end users
and for application programmers. The arguments center
on the data independence, structural simplicity, and
relational processing defined in the relational model and
implemented in relational database management systems.
All three of these features simplify the task of
developing application programs and the formulation of
queries and updates to be submitted from a terminal. In
addition, the first feature tends to keep programs viable
in the face of organizational and descriptive changes in
the database and therefore reduces the effort that is
normally diverted into the maintenance of programs.
Why, then, does the title of this paper suggest that
relational database provides only a foundation for improved
productivity and not the total solution? The
reason is simple: relational database deals only with the
shared data component of application programs and
end-user interactions. There are numerous complementary
technologies that may help with other components
or aspects, for example, programming languages that
support relational processing and improved checking of
data types, improved editors that understand more of the
language being used, etc. We use the term "foundation,"
because interaction with shared data (whether by program
or via terminal) represents the core of so much
data processing activity.
The practicality of the relational approach has been
proven by the test and production installations that are
already in operation. Accordingly, with relational systems
we can now look forward to the productivity boost
that we all hoped DBMS would provide in the first place.
Acknowledgments. I would like to express my indebtedness
to the System R development team at IBM
Research, San Jose for developing a full-scale, uniform
relational prototype that entailed numerous language
and system innovations; to the development team at the
IBM Laboratory, Endicott, N.Y. for the professional way
in which they converted System R into product form; to
the various teams at universities, hardware manufacturers,
software firms, and user installations, who designed
and implemented working relational systems; to
the QBE team at IBM Yorktown Heights, N.Y.; to the
PRTV team at the IBM Scientific Centre in England;
and to the numerous contributors to database theory
who have used the relational model as a cornerstone. A
special acknowledgement is due to the very few colleagues
who saw something worth supporting in the early
stages, particularly, Chris Date and Sharon Weinberg.
Finally, it was Sharon Weinberg who suggested the
theme of this paper.

Received 10/81; revised and accepted 12/81
References
1. Beeri, C., Bernstein, P., Goodman, N. A sophisticate's introduction
to database normalization theory. Proc. Very Large Data Bases, West
Berlin, Germany, Sept. 1978.
2. Bernstein, P.A., Goodman, N., Lai, M-Y. Laying phantoms to
rest. Report TR-03-81, Center for Research in Computing
Technology, Harvard University, Cambridge, Mass., 1981.
3. Biskup, J.A. A formal approach to null values in database
relations. Proc. Workshop on Formal Bases for Data Bases, Toulouse,
France, Dec 1979; published in [16] (see below) pp 299-342.
4. Brodie, M. and Schmidt, J. (Eds), Report of the ANSI Relational
Task Group, (to be published ACM SIGMOD Record).
5. Chamberlin, D.D., et al. SEQUEL2: A unified approach to data
definition, manipulation, and control. IBM J. Res. Dev., 20, 6,
(Nov. 1976) 560-565.
6. Chamberlin, D.D., et al. A history and evaluation of System R.
Comm. ACM, 24, 10, (Oct. 1981) 632-646.
7. Codd, E.F. A relational model of data for large shared data
banks. Comm. ACM, 13, 6, (June 1970) 377-387.
8. Codd, E.F. Extending the database relational model to capture
more meaning. ACM TODS, 4, 4, (Dec. 1979) 397-434.
9. Codd, E.F. Data models in database management. ACM
SIGMOD Record, 11, 2, (Feb. 1981) 112-114.
10. Codd, E.F. The capabilities of relational database management
systems. Proc. Convencio Informatica Llatina, Barcelona, Spain, June
[illegible]-12, 1981, pp 13-26; also available as Report 3132, IBM Research
Lab., San Jose, Calif.
11. Date, C.J. Referential integrity. Proc. Very Large Data Bases,
Cannes, France, September 9-11, 1981, pp 2-12.
12. Ehrig, H., and Weber, H. Algebraic specification schemes for
data base systems. Proc. Very Large Data Bases, West Berlin,
Germany, Sept 13-15, 1978, 427-440.
13. Epstein, R., and Hawthorne, P. Design decisions for the
intelligent database machine. Proc. NCC 1980, AFIPS, Vol. 49, May
1980, pp 237-241.
14. Eswaran, K.P., and Chamberlin, D.D. Functional specifications
of a subsystem for database integrity. Proc. Very Large Data Bases,
Framingham, Mass., Sept. 1975, pp 48-68.
15. Fagin, R. Horn clauses and database dependencies. Proc. 1980
ACM SIGACT Symp. on Theory of Computing, Los Angeles, CA, pp
123-134.
16. Gallaire, H., Minker, J., and Nicolas, J.M. Advances in Data Base
Theory. Vol 1, Plenum Press, New York, 1981.
17. Gray, J. The transaction concept: virtues and limitations. Proc.
Very Large Data Bases, Cannes, France, September 9-11, 1981, pp
144-154.
18. Griffiths, P.G., and Wade, B.W. An authorization mechanism for
a relational data base system. ACM TODS, 1, 3, (Sept 1976) 242-255.
19. Held, G. ENCOMPASS: A relational data manager. Data Base/81,
Western Institute of Computer Science, Univ. of Santa Clara,
Santa Clara, Calif., August 24-28, 1981.
20. Keller, A.M. Updates to relational databases through views
involving joins. Report RJ3282, IBM Research Laboratory, San Jose,
Calif., October 27, 1981.
21. Lorie, R.A., and Nilsson, J.F. An access specification language
for a relational data base system. IBM J. Res. Dev., 23, 3, (May
1979) 286-298.
22. Maller, V.A.J. The content addressable file store--CAFS. ICL
Technical J., 1, 3, (Nov. 1979) 265-279.
23. Reisner, P. Human factors studies of database query languages:
A survey and assessment. ACM Computing Surveys, 13, 1, (March
1981) 13-31.
24. Rissanen, J. Theory of relations for databases--A tutorial survey.
Proc. Symp. on Mathematical Foundations of Computer Science,
Zakopane, Poland, September 1978, Lecture Notes in Computer
Science, No. 64, Springer Verlag, New York, 1978.
25. Rothnie, J.B., Jr, et al. Introduction to a system for distributed
databases (SDD-1). ACM TODS, 5, 1, (March 1980) 1-17.
26. Schmidt, J.W. Some high level language constructs for data of
type relation. ACM TODS, 2, 3, (Sept 1977) 247-261.
27. Selinger, P.G., et al. Access path selection in a relational database
system. Proc. 1979 ACM SIGMOD International Conference on
Management of Data, Boston, MA, May 1979, pp 23-34.
28. ---, SQL/Data system for VSE: A relational data system for
application development. IBM Corp. Data Processing Division,
White Plains, N.Y., G320-6590, Feb 1981.
29. Stonebraker, M.R., et al. The design and implementation of
INGRES. ACM TODS, 1, 3, (Sept. 1976) 189-222.
30. Stonebraker, M.R., and Neuhold, E.J. A distributed data base
version of INGRES. Proc. Second Berkeley Workshop on Distributed
Data Management and Computer Networks, Lawrence-Berkeley Lab.,
Berkeley, Calif., May 1977, pp 19-36.
31. Todd, S.J.P. The Peterlee relational test vehicle--A system
overview. IBM Systems J., 15, 4, 1976, 285-308.
32. Williams, R. et al. R*: An overview of the architecture. Report
RJ3325, IBM Research Laboratory, San Jose, Calif., October 27,
1981.
33. Zloof, M.M. Query by example. Proc. NCC, AFIPS Vol 44, May
1975, pp 431-438.