ed 094 758 18 000 958 reintjes, j. francis; marcus
TRANSCRIPT
DOCUMENT 'RESUME
ED 094 758 18 000 958
AUTHOR Reintjes, J. Francis; Marcus, Richard S.TITLE Research in the Coupling'of Interactive Information
Systems. Final Report. ESL-FR-556.INSTITUTION Massachusetts Inst. of Tech., Cambridge. Electronic
Systems Lab.SPONS AGENCY National S6iende Foundation, Washington, D.C.REPORT NO ESL-FR-556; MIT-OSP-80720PUB DATE 30 Jun 74'NOTE 62p.
EDRS PRICE ,MF-$0.75 HC -$3.15 PLUS POSTAGE'DESCRIPTORS *Computers; Data Bases; *Information Networks;
*Information Retrieval; Information Science;Information Services;'*Information Systegis
IDENTIFIERS ARPANET; Compatibility; MEDLINE; NetworkInterfaces
ABSTRACTThis reported research centered on development of the
concept of a translating computer interface by which the networkingof heterogeneous interactive information systems may be achievedduring the period in which information retrieval system and network'standards are evolving. The particular concepts and techniquesinvestigated are'thevirtual system concept, a common commandlanguage, a master index and thesaurus, and.a common bibliographicdata structure. In addition to the theoretical study of the problem,
-an experimental interface haS been developed that connects theMEDLIgE and Interex retrieval system via ARPANET communication linksand that performs some of the networking functions of the virtualsystem. (Author/WH)
U.J
A
I
June 30, '1974 Report ESC-R-556
MIT-OSP Proiect.80720 \.,`NSF Grant GN-36520 ,
RESEARCH IN THE COUPLING OF INTERACTIVE
INFORMATION SYSTEMS FINAL REPORT
J. Francis Reintjes
Richard S. Marcus
Hccironic Swems Laburator)
I
MASSACHUSETTS INSTITUTE OF TECHNOLOGY, CAMBRIDGE. MASSACHUSETTS um
Dr pdritural of frit di optsccfmg
CO
CD
w June 30, V.974' Report ESL -FR -556
RESEARCH IN THE COUPLINGOF
INTERACTIVE INFORMATION SYSTEMS
FINAL REPORT
J. Francis ReintjesRichard S. Marcus
U1 DI PA4TMENT Of NEAL TolDUCATION I WELFARE
NATIONAL ...SMUT( OfE Du( A TION
11,11 1V1 N1 1111 WI 1,o,1 . 1 A AL 1+1 e 1 .1 1 1 1,11 A..
off PI . AN /A 141,4 .114
,11, 111 ...11 4111. nu wi Pwl
The research reported herein was made possiblethrough the support extended by the NationalScience Foundation through Grant GN-36520.
Electronic Systems LaboratoryDepartment of Electrical EngineeringMassachusetts Institute of Technology
Cambridge, Massachusetts 021 39
JACKNOWLEDGEMENT
.The authors would like to acknowledge the efforts
of several project co-workers: Dr. Charles W. Therrien
and M. Don atntor in the area of computer systems;
Mr. Alan Benenfeld in the area of bibliographic data
structures; Dr. Peter Kugel in the area of reei.1:6al
language analysis; and Messrs. Charles.E. Hurlburt,
Leonard Goodman, and Hsin-Kuo Kan in the area of computer
/ programming and analysis.
We would also like to acknowledge the support and
cooperation of Mr. Davis McCarn and Mrs. Barbara Sternick
of the staff of the National Library of Medicine in our
efforts to couple, experimentally, MEDLINE to our local
interface.
ABSTRACT
This report describes results of an 18-monthresearch effort in the coupling of interactive informationsystems. The research has' centered on development of the
-concept of a translating computer interface by which thenetworking of heterogeneous interactive information systemsmay be achieved in the period during which I-R system andnetwork standards are evolving. Particular concepts andtechniques which have been investigatld are:. (1) thevirtual system concept by which users perceive the networkas a single homogeneous system; (2) a common commandlanguage synthesized from a basic language of primitive I-Rfunctions; (3) a master index and thesaurus which stores thevocabularies of the separate data bases along with indexterm interrelationships and counts; and (4) a common biblio-graphic data structure in which the data elements forbibliographic information are hierarchically structured andinterrelated among different data bases. In addition to thetheoretical study of the problem, an experimental interfacehas been developed that connects the MEDLINE and Intrexretrieval system via ARPANET communication links and thatperforms some of the networking functions of the virtualsystem.
ii
I. INTRODUCTION
This report describes results obtained under National
Science Foundation Grant GN-36520, entitled Research in
the Coupling Of Interactive Information SysteMs', The
period of the grantwas January 1, 1973 through June 30, 1974.
The research was motivated by our concern that the
information systems that are beginning to appear in
operational environments will not be effectively utilized
because of an inability of users and information specialists
to master them.' Degradation in the quality of service is
likely to occur because of the many differences that exist
among the systems and the time required to gain an
underdtanding of their specialized features.
Under the present state of affairs, in which each
information system has its own unique features and is
accessed in accordance with its own set of special rules,
it is out of the question to expect users themselves to
make eilicient use of the various systems. ThcFre are just.
too many subtle procedures to master in a short time. We
have even observed that considerable training; of informa-
tion sTecialieNt is required to brine, them to a MO level
2
of proficiexty and we foresee an upper limit to the
number of disparately designed systems the specialists
will be able to handle.
It is clear that until such time as the user
community becomes highly proficient in online-access
pr educes, information specialists must be on hand td
serve user_needs. Although it may turn out that a certain
percentage of users will always want to delegate search
responsibiXities to the specialist, one would like to move
in the direction 9f se'lf- service as aqueans of getting
the user involved directly in the solution of his own
informational problem, thereby reducing'his costs and
improving personal satisfaction. This will not be_
possible unless system access is made simple and straight-
forward.
Hence, future courses of action become obvious;.
either uniform standards must be adopted for all systems,
or computerized interfaces must be developed which
accommodate nonuniformities and re-present them to the
user in a single, standardized form. It is along the
latter line that we haiie been working.
3
I.C. DISCUSSION : THE NEED FOR A NETWORK OF HETEROGENEOUS
INFORMATION SYSTEMS
A. Recent Advances in Interactive Retrieval Systets.
A number of interactive bibliographic information retrieval
systems, have been developed in recent years.*
This type of
online computer system has been widely acclaimed by users
for rapid and easy access to large data bases of bibliographic
references. The economic viability of these systems is
attested to by their continued growth and by the fact that a
number of commercially sponsored systems are currently
**available. In fact, it is now possible togain access
from most points in this country at costs ranging from about
$6 to 00 per connect hour to these retrieval systems. These
A collection of descriptions of several of these systemsis found in Walker, Donald E. (ed), Interactive BiblOgraphicSearch: The User/Computer Interface, AFIPS Press, Montvale,N.J., 1971
Economic viability is diseused in the following two_papers.:
C.W. Therrien and J.F. Reintjes, Modeling of Informa-tion Systems; Proceedings of the Sixth Annual PrincetonConference on Information Sciences and Systems, PrincetonUniversity, March, 1972
Davis B. McCarn and Joseph Letter, On-Line Servicesin Medicine and Beyond, Science, Vol. 181, No. 4097,27 July 1973, pp 318-324
syStems contain in the aggregate references to documents
numbering in the millions in such subject areas as
chemistry, aeronautics and astronautics, education, agricul-
ture,'nuclear science,.toxicology, medicine, engineering,
and environmental studies as well as data bases covering
several subject areas for such document
articles, government-sponsored reports,
cataloged monographs, .and news articles.
(types as journal
Library of Congress
B. Limitations of Present Systems. A major
limitation of current systems is the size of the data base
that can be stored online. A data base containing bibliol
graphic information for a million documents is aboutjhe
maximum size for effective online operation with current
hatdware/software environments for single computer systems.
HOwever, a collection of this size represents a very few00
dOcuments when measured against the total amount of pub-
lished literature. In particular, a million documents
will cover the literature of a single discipline fof
example, chemistry -4-- for only a very few years.
One might argue tnat most researchers work only in a
fairly narrow area and could be adequately served by a data
base of the size of a million documents. There are
several problems with this kind of argument. In thefAk
place, the information needs, of users are often most
critical in areas outside their own specialty. Thus, for
example, when starting out in a new aspect or when seeking
an experimental devide for instrumentation, a-researcher
may have more need for information than when-working strict-.
ly in his own specialty. In Project Intrex we found that111.
there were many more serious users who were from outside
the 5 specialties for which the-data base was se.lected than
there were from within thosekspecialties. Also; there seems to
be a growing trend toward interdisciplinary activity with
the concomitant need for multiple data bases. Even users of
systems with many hundreds of thousands of documents regular-
ly ask for a broader coverage. Then, too, much of-the use
of tbese systems is by information specialists 'acting as
delegated searchers; these specialists may have many clients
from different subject areas and, hence, a need for a
multiplicity of large data bases.
Another limitation on current syStems is their
capacity in terms of number of simultaneous online users.
See Project Intrex Semiannual Activity Report15 September 1971, Massachusetts Institute of Technology,pp 6-6. PB 202 860.
6
This capacity is usually numbered in the tens wheieasr
there is a potential for thousands of simultaneous users,
even if only the United States is considered.
C. .The Ultimate Uniform I4etwork Solution. Ultimately,
.,the solution to these problems may necessitate the"construc-
tion of a large-scale, on-line information retrieval net-.
work made up tof many similar.-- preferably identical
computer nodes, each node being associated with an online
to base of a millioh or more documents on a separate set
of topics. For maximum efficiency users connected to each
node --- there might be several hundred online users at
any one time-- would make requests in a common retrieval
language. Such requests would lead to parallel searching
of e data bases whi4h would orgLiized
within a standard file,structure.' Intercommunication among
computer nodes would be accomplished over high-speed.
communication lipls for hich data-concentrator techniques
would be employed to gain-further efficiencies and to
"reduce response times.
Thus, in order to achieve economies of scale,
with data bases created only once and used many times by
a large user community, and to: provide easy transfer of
information among this large community, the ultiMate
solutionrappears to be a uniform network of standardized
parts. The telephone and railroad (standard gauge)
networks would seem to provide good analogies.
D. dpbstacles to Immediate Implementation of the
Ultimate Network. For the next several years, however:the
degree of standardization required for the ultimate network
is unlikelyiin view of the\41ready heavy investments that
have been made in existing heterogeneous, nonstandaidized
.retrieval systems. Lack of standardization is &pervasive
barrier to intercommunication. A potential user of dif-
ferent retrieval systems ig faced with a series of-obstacles
right from the start': the,necessity to discover these
'systems in the-first place, to make separate procedures to
gain access and account for dosts, and-, quite possibly, to
make actual acceis via different terminals and separate
locations. Other obstacles face the user once access is
-made: different command languages, retrieval' functions,
,indexing vocabularies, and output. formats. If the pro-
grammers of one system wanted their system to communicate
directly with another, they would face problems of dif-
ferent operating systems, hardware, programming languages,
t
character codes, word /byte /bit orgahizaion, file
organizations, and most directly for the majority of
8
I-R sy6tems, no established computer.-to-computer
communication links.I
Because of the established character of the
different I-R systems, their environments,, and their user
clientele, and the cost of remakinedata bases in dif-
ferent file organizations --- even if peimissionto do so
were .granted -, it seems unlikely that any. existing I-R'.
system and environment will soon become a de facto standard.
I
E. The Computer Interface. In view of the
foregoing obstacles to the immediate implementation of thea
ultimate network, we decided on a course of action
for the intermediate term which seeks to approximate the,-/
effectivends and efficiency inherent in the ultimo.%
'work as best as possible through a currently achievable
.network based on computer-interface techniques. Such
an interface achieves compatibility among systems of
heterogeneous hardware and software components through
translating and conversion algorithms. It is this kind of
interfae that we have investigated and are reporting upon
here.
9
The over-all objective of our work was to establish
through analysis and experiment the feasibility (or in-
feasibility) of a computer-stored.common language interface
for disparately designed information systems. Our program
was conducted under three major4teadings:
Study of I-R Systems and Research Planning
Advanced Network Research
Uperimental Interface Decigr., Implementationand Analysis
10
III. RESULTS OF TEE PROJECT
A. Stud of I-R S stems and Research Plannin
effort called for an examination of several online I-R
systems from the viewpoint of the requirements they would
impose on a computer interface design. The purpose was to
find at least one I-R system outside which would be
suitable for our network experiments and whose administra-
tors and technical staff would he willing to cooperate LT,
these experiments_
We revieljied many-of the important I-R systems.
We participated in the feature analysis of twelve 1t0er-.
act:-.re fetrms: performed by Dr. Thomas Martin f :itanfotO
an 4ticndcd 4 three-day oftsirtar for this putptisc at
:aanford University: Our revi.ele of theitc ayptcluite led wa
to the CfrIclual(rn that Q4t otlEftlal idcaa fot 4 fletiecti.
interface mstitEd (Icfailc4 4cyclopmcNt It+ astWiticitt. we
iclatttlISetl the NSLDI1NL tctticvai tvctric of tfic Naticterial
I II,: at v 1'40(11c lItc a. th,c t.c t (:4c f<3 ttc t o (hoopc at tic
fitat aotc,u, tctiutc Sttim M 1,7. with which chvcticuct,t
Cite tcacc,ttc th,JoaltiE MVA.T.Nt. int ltvIc
MAI t iti, t:ctv aI I. Itti vc I cIt V S t t, 11 ot t.,1 a
; t I tT,c lt,zt It k:tc fot ttInstr,stii. at 1 ,to,
1.,cccc1,1, ct ctc c(WqICA;1.1., A 4tt. iWted .OritlpAit4
I SVC atlti4yglif
( 1) 1311,1 staff wqp very r.(popctst ivy In Aiding(flit TP1:1G41"( h Pi int t . Ploy pr(iliit 3 rsr1 brit INV Anil Tcar1V a( cFp t,n t hc.
y t
( 2 ) tic c- r rent t r,1 ) c p1,,i1pry341 pt irv1,1c c ,ritt t fni Pt, 1,c tt3c.nti33it tc1 t t, j IP I ;it: 140;1c* t-
Int tr-P rot ricvA1 evetctt; ,)1 pi 1.7
( ) 1/01. has t A1,1 i ci.cl An cirt,cr ; tucnt Ar rnr)P r t irm tr, the At. .r t lc API A r,c t to, t1 4/1. I ( to
W4'/3 (1 j,/ fr".1c he- j,f I In I t ts/i ,,,q7 ,33,3pt, rtcIw( 71. cititct Itticnt c
in the r r.t)/ t c r,f ftsfl" thic t s,c we
tiat)cnc al 11,c I cvIcw c1111.1 It. 310c 11.c. Icy; c-w ck ct
( t C ;lctWr,7lrc I 4 11 LID C 1.1 r.ctWrrtlr
c j Pc 3" 1113c rt Tiilz cv3 Pvc t c`wzI .1 nk In t i.I wc.
.1; e. t,%-ct cc3 bin( we c1Ict t Svc 3v .;cc in rtL;1 cfpci 3ntel'Ag1
I! kt; rA.3'31:1 3:14341 T17VVI-.7 I 1c r 3tituthit ci e.c. I 3,43 t le I t he.4
P I c t c t F.0 # t au* tr.r. t i. (.11 10,1 a t i r'T, t I.
Isc 3 a t 3 trna 1 en ; .c ; I 1 ; s.j_*! 4 i c I-I C- t c-ilia 41: c
s. one-11 1.c JUP I 14 t I.J 1 7 33: 11 sAt3c
hc-wluz ti ( t 1.. rt, I I...Luc t 3. a C ( (A 40.
rf 3 3 I t .1 c t 3 Isc 1i I ; r. =: cizt cct I;
.;:t c.tipc I 1i:4c:A =1 tic t w. c- .I 4c .1 In.; c 13- Ic I .rot
...1 tc3 rat i. 1 le cictru.: vc 1c
c c . c .l cti c i C : 4 :- 0., 4 i,c 1 t . 1'2'1
12
frero each EtrJupo 'Pc 711C4WPF. SyAtc-m DcP1,111k4cnt
Cntri4)/otirtn (FD0C). to At(mair Vnotgy Comvoloolcin. Aotto130.
tanfnrd Ohl-vas-yr-Pity, and !WA.
01434v*T4,-4-4 )14-tv,41-4. ;,c-scot,-J4 to.zi, 4-41.444
Et ant woo a 1.7 (,34 Loy c-4 4 f 1)fly f , f fc ot41,3c..tv ( i f Int CI-
rytorac-f t irift, rf I vc Oc giro, kccktOph; f 43 iy ccpSit 141 11-4 (4/ aka 1 crn
4Tott le31 ovette.tibt A ttsts.t tictet jt:c,.r.k t hc, cal ly Tirroolt
#,a Itccr, Ttct,atCr rat I 102 tc.pt)3It wctc *16..4 141accr,tc.i3 At
the Po t w(11 ; nt (rnr,c( t ; (Tr) far,c. 3 orf 1)kc aarer,wa; Luc c ; r,k ,,1
t tic AP.MA ;ct a' ff./ It414',Ittasit Z.( or.t
tut441,c/ Itart, of j.r a to4c1co, Co3:ft1n10 $110,330-ite
Ihcec jcetylt mat* klycr tc.1.44to
1 c Cormirstrir-A C ( ostip; t c t I t.t t, fat c
.1) F to t.-.; 3 f (-t 3-rti et t tetrte, t Itrik
; t .01 41 res= It ar.0 at E kauts(i Ci 41.1 At f c I ( -ttactw, r 1,c ci A., I 1A-A ttif r t usAt IkAt t I Aril Ipt r, ft tic .lt,t c fa( c flits*1 f rl ki au.
tt.I t , k 1.&r,E EA . Cr,.l I 1-, f 11,A t I,1T4 1Ct lAtml Avs:kzul ct C '17
I t 1.c Mai A I Ar,.; ( ; c1 i so:4 11-4 t c1,AI at 1.rt4) 1. Int1,(trLi c*, I ;at rrt ft tin ;t,E t-ar 14Utic I VC, 103
:t.ic.ictt Lt, Ic ,:: ( kt t.k I (r.tIE lizEc 7...ILrl.,&11 I anti 11-
I 1 gum . c c ( 1 I. tut .41 t :t.t t C S.-I t .ptaivat CI4 C Cl sa:, :c. 1,1... 1. Lt. hitt rLzt ZI ;' :444
IPA LA41 inc r. t , t t if IfIVIC I C
13
t :Ect,c,114 ittf,,77F#2t t# ?At 7 ic,'A) C ye t Fre Lc cl--"ac '3
t ,}trrt, r, f Itt,,, , c ir.tctfa,c , p tr: rn2 t 7 C v C. f(.7
cffc, t it, t t rc., t PII,Pyl A .,!Cc-7
.1.4. C C 113. 1 1 3
T;L! 1
,1; c c 1 i, cvctc th: ; r, . flurry ,t, f 7 AtItc 1,4, 71
(0,37 t t 11 C 'Jr 1,1C j113C- C t c.r,, cr.; 1,,
; 1L t c u:1-1;r-
; r, ; t.4 t :A roc c. ,1 < r ; t c c ; a F, ; .:41 3, r ; r Fri I t,
i :.; 1 ; , , , i -1 1 .c , mot, et, ; f.t c t'l :., c c }, ,,11; 1,c, .,c... 1 F.c..
i, F . i.c rII, ,f :5 ,..:,1::::+1 evetem 11,4...I
),,I e 1. I .1 et. 1 i,e z 7
a3 0;t,iic :1:1,11:, It I},C ;.:C't Urtl, :133 }ht.- ,,f,- `C:'C't V
, .t1.1, .111 lc: , f = vett ;c c; :,:tc,rt ;#cto;l: Doc k;,-ct.
;r,f'.c t t F.c ic., t , : , ttsti,, r,c ,r. t : I 1 11.c ;r1c1fA(c. i;ctc,`} 1,c
i , 'to,
;;,, , ,c t , c..r.,1 tr, I C mon cr t Ir c
. /CI, t I t< c C i 1
el. c:t4c r. t c , c 7 i,c t . ,
c ; c
1 I '71 v t 1.; , ,;;;;
c : :`,c c : I oe. 133, t C.
c
: sy
0 1 11 t I t
///
4.
I
Mc ,*ct ./.14 .ro C .
s t h4 I, 1*,es,r, ,
SP I 1S,1.1 'pH vs ,/p t!"*44,* !I( 0 #0c, It
ft
,111
t
11.
4,
;
14
5 1, v .5 114' I I: 4
rp' ,ss r 441..4( 46% 1: IC
s i 1 ,
f.... i', ..- "...,
., --; 4-'.F... ,
1 1
1 , 4 4 411 , o r , S ' ,.0., A ;
C 'tit vv., t .1,4, Ir 114:
c, vr x It. t. Ikt , .
S.: :SO: 4... 0-1,4 /011, I: t. t I, WI.'
::041 r t 114
'
4*,.. vs: *r*, p t C I ! , ?toys:
t ',IL I, 41..4 t C.14111, 0.), ' Yrt 4 ISCS4
IticC1,4,t4 Sr at tG Lae tve111:14 'II t yr I c.,
15
fully in :',oction C. Tho development of more advancedt
c crurmini ( t onP ri,anno1a Pwa t. c a Tro)re c(Imp1cte expoaiticm
of t onc, tr1 be ;11 ormrd by t he interface
1 c,i wino rn Cc-JIMA-1,1(i
1 of OT MA 4 ( 4
I 1.0)'# E o
Out A t i)dy of t hr r (mmand 1 anEuale f or Ar vet. a 1
xctricval AyPt 414 41'40 41 4._ 0;:i411( t. A (
a
d uc; I n a n1.$44lrC3 of 1 wpor t a'nt . 1f At 1 11
tentative, onc. 11)0 1 ono
4
a poi ticval 1-4 ic pc Tall A. ( Epp?' c ;Iv 3 c of3r(dividua1 f uric .tri a .
TI, ata 1 y; r hr (irriavrio!:. ticityAtc3y, 1 hoti lec tut /0 i
truc late t hr H.= v pi 'a ivr f t117, 3 co.:7 1 cm,
t i cc Ings...21-,0 c *4-; t .c *i;c0 tt.AC2 c A
lic fwc.Vat ;c!l" acct (,r,vet;lcttcc 11 1e 7IC4 c c ' I I
clatu.27.1(1: i,C tItA Iztl re* t t 1+=t$ i 1,t! 3 v I 41.4* !yr11:: t S t Ttlz
V.11 A 1 t. c trttlttiat,c1 1 AT ti.:(F.:,c v c-/ri,;:, f t t
hc v 7tc !tit a arc c lic 1 v 1 r f
c(millc:c .1 ((v.STIchrl-1:1vc I .c r..1:21c 1 tt.. cfs
tic r ( f (,13:4 .O
,,s t1.c z.1.2c.vc 1...c:tic I 1 : :r 1c (
At.11.1At c cfA( S 1. 11t.:z. At,r I.:IN-et. ...outt.'44.4 I:. c
A t 1 313, c 4 Inri4.2 1141 t I 11;11: C
1,h 1; t t c-31.: 1 .c-11 out:3.33131,1 :Atec. 1c (.a:.a
C 41 I .1 it I c c c nam, 4,; I
=:11, s- : t c
7.c. (.11 (. hc 1 ,,At !is, f L33 t1,cI zit,z,
10
It is a common misconception that the difficulty intranslating among command languages results ft6mthe different names given to the functionally similarcommandn and their argumentn in the differentsystems. As nuggehted above it _in rather, thediver-Pity of 4function bundlers" and the incompati-bility of exact translation that are the primedifficulties. It in, however, convenient for theliner to he able to reassign command names into morefamiliar or Pony-to-remember labels. Thin in onerequirement of an interface language which, in effect,allow; many dialects within the common-language,framework.
II. The ry f or , it is proc1pnt to ronrarc h (-cry-ary1 lanFoJap,c,4a 1 onp the 1011(twitig tn.; 4 :
Ann1ynin of retrieval commanda in order todinnect thvm into their ptisnitiva functions.
Development of 4 common command lanRunEecqnnintiny. of macron of these primitive funcltiona.
iii 1=464e:tip 141_1 Cl/41190CD foie aaB1Pt10E 00Ct P inflituaW;rit of Kenetal Income- at lit y and its
cx:Kt4iam of trannlatimi. as well ac, Ficncral
gricla fort capict owatem UDC.
iv initial cmIlhapio on acccza to tbc 1N netwotitt tt301 Ibe f cletevrti let EC t 11C: t ban t (10E11the lanEneKce of env of tfic 'ecvclal I 14 eyztemcthcaricc1vcc.
Opcn cncicclncpa, flexii,;11tv en4 motinialitT incicciEn of the fnucTontl lanEncE.c in ileeliable
in he env oquzieni of tc..layco 1 N
cycjemc.
10(clfacillg P.c(Wccti itivctac Vocal,nlaiicc
10 (*tA cat iT effoite we inveetiEsted the notitql
r jut, tI.c It,11cx 12 Cc rc i atL11.ci v tcc Itr,icltica 101 1411 CIC.C:
17
decomposition and stemming, to alleviate the difficulties
of searching data bases that have been indexed under
different controlled vocabularies. During the reseIrch,
this notion was extended to comprise the concept of
the Master Index and Thesaurus (MIT). The MIT contains all
the index and thenauron elements of each of the damn banes,
including an ordered lint of all vocabulary ,terms used for
indexing t01-;ether with the country of the number of docu-
mcnta indexed by each and the thesaurus relations for each
In additioa. Iffirougb ur.ie of the techniques of phrase.de-
ompoa t ion ( that ia, brcakins, a phrase down into it et
Ind v dna I word j and A t earn nF. (dropping: word endings r4o a a
Co «rapider only thr,w0td atrMa) we can automatically identi-
fV mat intetvoc=t-ulaty relationahip.a in addition to the
idetitv telatitphip.
vhe lot readlr etoriny. and referrIng
tt, t!..e.te 7elat.,ra:-. t( rtovidp itt thc mactcl iti(1,..x and
The c =LA %Jr:: t ei et en, ea ti. a 11 indcp, %o( al=n 1 at v t el ma under
cal 1, k= t rl ztem that ai,pealm in that total. ( :.cc r14,. 2 )
4
:I es, am/41 I he 1,A7,A fetal etc, ts i( al inzn 1 at i00 is
I'.!. A Thc : A 1 1.1.ahr t i=a1 ltpdat e :fept 1 .4 7 1
electr-
icityicol
is
H T Nelectric fields T,
Helectric intolotinq popert
Helectric sporl1
al electric/1 intirIntinnT'N
T
RT
4/11.11.1 electrical intulotont
18
;owl-
otorsN
ationT 'N
1. MO
oting
Omo
ftift11.
.N..
thermal intalotionT'N
B1--NT ill copocitor poperT
,RT ... i asbestos rN
..RT dielectricsN
ft.ft.
1 T,NRIwiring
UTft.ft.
U
RT ieloctrIcol portcloinT
PAY
relationship ettalithed outomaticollycl 01 i en from e,ittin9 thesaurus
Don, TIST Tt4ISAIIRUS
Jv NASA 1.11SAlJRUS
R1 RIAA11.0 11RM
N1 NARROW, R 11RM
liT fiROAtitR T(KM
111 11SIO FOR
f i0. 2 Sarnplo Relationship aftx>ng Ttren as Maintained In Master Index and Thesaurus
19
is automatically found to be related to the following
TEST* terms in the way specified: specific to electricity
and insulation, generic to electric insulating papers,
synonymous with electrical insulators and, of course,
electrical insulation, and otherwise related to electric
fields, electric sparks, and thermal insulation.
The MIT can be used to help determine which terms
to search under and, indeed, which data bases to search in.
The MST can be used either in a purely automatic mode ar in
a Manual or prompting mode in whith the user is given a
display of terms to search under. Interestingly, the MIT
should be a useful adjunct even for a single system with a
single data base having a controlled vocabulary. In sh6rt-,
the MIT concept clearly has important potential in the
development of I-R networks**
5. Bibliographic Data Elements and.Structures
In our research we have determined that another
prime consideration in the development of means for users
to in
*
teract conveniently with different data bases is
to.
Thesaurus of Engineering and SCientific Terms, preparedfor U.S. Department of Defense by O,ffice of Naval ResearchProject LEX, 1967
**See Section C.6 and the appendices for additiohal details.
the interrelation of the div rse4data elements and
structures Lim those data es. Three ways in which
.
the interrelations are imp r etant thay be enumerated. First,
searching is done on one or more data elements:, in order
to translate a search done on ope system into another,
the correct correspondence of data eleme is must be found.,...
20
Similarly, user output requests iequire he specification
of combinations of data elements from the catalog records.
Finally, in order -to combine retrieved document sets from
different data bases and to create keychable document
sets from separate'data bases, we need to: identify when
document references from different systems refer to the
same document: establish dommonreference formats: and
create common index (inverted file) and catalog data
structures.
Our basic solution to this probleceis the
concept of a common data structure based on the identifica-
tion of data primitives or basic data elements analogous
to the basic component functions of the common command
language. Compound data elements in any system can then.
be translated into, or composed from, combinations of
basic data elements in the common data structure. The
21
basic dataelements would be hietarchically arranged'
into a data structure' and, typically, the compound data
elements.of a system would be equated to a higher level
node of the common data structure.-
At the'highest level we haVe subdivided the
common data structure for bibliogiaphic data ineo
seven major categories. An initial breakdown of one of
these, the Abstract-Indexing-Contpntg category, is shown
in Fig. 3. There are 21 basic data elements identified
in this category with 14' higher level hierarchical
groupings. Note that the abstract sentences, which are434
included under the abstraCt grouping, are separated out
individually to make it easier to use this information in
subject indexing.
Three additional major categories which have been
similarly analyzed are Titles, Names-and-Relations, and
Related-Document-Citations as shown in Figs. 4-6, respective-
ly. The other three major categories which were identified --
but not fully analyzed --- are DescriptLve Document Features,
Library Systems Holdings and Shelving, and COntrol Fields.
Alr
Ir1A
.: T
.4...
0:
AL
S rP
,AC
I$T
IAT
.
&!
A(
a4S1
14{1
.
C C
Of
Mrf
Cf.
As
CA v.
1.9
I
SO, C
AS
..A41
4 44,
tnn.
,*
'Sr,
,,.. ,
fl.S
4,fe
r,rn
S.
.A,..
"A
AA
nstr
,,1
.10
9.11
I .1
1yr
IU1.
1
(SC
Cl
IiC
CO
E
S -
C11
C .0E
-O
A -
rASS
-1A
: r,
cia
mae
4
C L
A S
S r
5
FTL
ASS
-
,
C
CL
ASS
-.y
.Q..1
11
C A
SS
jA
:2
S'IØ
5).
'C
ei
az,
1 A
OC
4-I
._R
NI0
.«,-
I-.
7C
-:;.-
-cIC
iJ
C.T
.AT
E E
.,Ar
PHR
ASE
'*5
A-
Ccc
i
'-.1.
Ai a
-t55
Ci
I
of A
l1,
15
Fig.
3C
omm
onB
iblio
grap
hic
Dat
a N
hine
nts
and
Str
uctu
refo
rth
e In
dexi
ng C
ateg
ory
(bni
tiol V
ersi
on)
v.( ,ztillt'V
1*1_1.1:33
S*
J:
c.,rz:
2I'
-
Vitir
J
I-
r-i-*
2i. i.0.,
.
go,,r. _
,: 164 .1:.,.,
10, ,I
.-.1.
.s.1
1,1 '415
;,.
, 11-1.
C.
is.
-
Zy 3
t.-
t4:-).T
1S
1L1
-vrelf
I
I3111: - S- 311.11.
I
0,, 4
."II .%
wi,. tic
14I.
..V
V-
.
I
.
10.
..I a:
iti
It,'II.
t tA
' -I'A
O,.
IkI
',oil
4,,1.- ',1
.1Al.,.
..t;1.21i4IG
-ot%J.. a .1
.,;.*.;
.V
- MI
14
, :14.
V
1.61...,
It,...I
I1.
Etl# 1.v.
nr.*V
, VP
..9-
,i4161111:,.
5,U
.I
.,I
i II
I.
I.t.
Iv
'
Oh ...A
.4.1 J-55,51
01I55A
O -I
....I.e.
0_..
(...
U...
1r.7 .6
'41.)tyit1.1
r.1.1:41 4,h.44)
,1.411/eii.)* 4.
ay
.tv1.1.314.,3.
,/et
434T
A:
-
0...f3/4/4:1
e43.4140
IA,
111.0
111
,
43.Ia
.0,'
t!
1 0,
ay
l
-4
A.
IA
.443
Act
elfrV. 1
"131.
10.1
s
4:11031I
3.
1,3WC
I3.4
JrI
W.'
IV...131_ .
.r* ,.:4433
1-61
-;,10,
..14- 47.:4
V
34'4 ...IN
.3.4,
; r
; Ic ,IC t 't t Cr C tc I fl. C 111C t ! : r .
!
WC, Itt.: c t :., I c. c f
1 C : COY
c : : c i , t.c , c. t c 1 :.,, c :.-,- c A, i sC : C C.?:,:
1.C. C 1 rt-44c 1-101c. r, t ; r , : tr. c c I :,1 t , c. 1
4
t t c c t t ; tc Ct
c t a C c
(.cs. ctrl .0 C zit ,,! 1 jtttcY atc
1,c
-11 it 2, I ; tcY IttC 1,t ..t
j1i1C/ f.:sf C
:)C -111, .t! t c t
t!t it?. r t ler . T tut :kr.
lc Icv:tj trci
ccI ; c-,(4;c7 r crItt.tet t it n t t cl t } t
c f c anEltai..:c cctsti a
?lc an: 5, Y t Y an clat 1 ; nt, thic at]rua c t
cithrt
tctlat.tiucc f ait1It. in tltc t; ant lat 1.-ttinklr it Vt.( aF.31:11 cl
1 c vitic Si lc at (1, and t. f (,E. out putI tt t hr t I 1:11:1/1 11 1
%7
t ccA7,i .1,f1c/ct,tc t r, A Ittestv or, f i I c
, i Al jilt.; to %let C;tht 7 t'-Ctctt, t, tt,c ftillt C t,t 1 it:\t, :7 I cr,t 33 t icc t,c t,E
I t IttrliA r.'3 A 7.E t LA c 1 f c, ,1.;c1Icit
1 '1,<( r 1 a I c:t Lc-0 r t cpt,ct ittscnt lxith the°iL! Pf io t c 1-4.r. r;:,c. ttic- t .t,c ,,t hc
It ;, 1 :(,i I. ; Ti ct.. t IAt ,,tC t 1114 t 1,.ttA( }.1C vAt it 1.47, t }cat r t jc.cc
A 3r ;. 3 4 3c nt. ,1,31:41 t 1 ht. :( 11:4t v ptt,c n 1
I c v , t 1 . 1 a : i r ;JO I 4 j Leno. cac at,: 1,c
-r,: 1 t c r, t t (Pt? (otiplyrti (ft1,1,1 I
lcitft t hc r)c 1 4 tic :pi T.C1 t1J3 ittit1A1
i n t (-7 l a ( t c l : a r t .1 ( t 1 N 3 7 1 ; iAr (i S 1 ; 0 that t l,c 1 ( ( 1 1 ( 1 w -
I pC I- A t jt cr,t)111 I C, IC f ()7 :tit 41 1):0 t,h tl.c 3 11 C f f c
A :+c 1r t t (roc t wc, C 1 ;) cnt (trticct r0 j )1 t c- 1)1't) Ac4:think (can thIA :,),CtcM A)
1 :,cicc t c-Ithr.t t I tc « Farb( rn (:t di IT 1ar:EtsaFc c.y t lirc.f t c:x., A In Ch I( h it, AD:
II :.yttctr) A la Fc is L.:cc ct an l' tctjcicc,ta n I'..vctc-a, A that atc 1tt,r-r-i1)1c wittii:1 that ayatcrn.
41 I CON I .1 1 att!,:nat.:c7 c pct t tic ly41(.nEc IltUak.47,3 ti A
I . Maltc Ic pc, at t tt t ccincnt
it . ria14.4- a t ittplc tr1i.tlt IrtucDt t (. r,cc tata1t1hill! ()I t i t tl (1r1 vt'd 1 I it:: tit,i t ; t otstld i ti t tic3,1 cv lour_ It I
iii itartc a f i) c int tl.c tccttl1C 3.1 tt.c1,7jttt 7 c (.;)Ic C t an lc Cl (r7 f t ct$1,
c-cf4lIcnt ; ttk
a V ; 7 ;r 7 tr. V cµ'i alcc '')°, IZI caVcrjt4 t iii).
'Pr-7 I i '777 e re t hc fipct cr ihtic4 P770t hcr:11117 f(4' a1r1 in 4c. i 114. i At ccat( f, t crmc
t.c. o cur I tic 333/133a3-331c k.1vc-n jn ( a) anti (I)) cnau 23, ualcr t an() oc h .1 hc hcr I cyttcmv.),13" h ;e t;rt cnt ( tai,
)',3 t c t 1,zt t ht (tk,101 j)7 i At c (07a017)At
t hc cc ( telt:AA:At tFtt tcs,)11c (,f scot ( hinE in 01 lIcTcnt
vc..tc-tIA Can tc '11c, tc(1 1t1 a, f igittoyn fl lc
fl.c t 1 c 12 t at tie (if (q4t implement rot (yn
I 17 1 (lc c En 10 E vcta 1,c 1(444
4 Al Int (ttInc, t I (MC and INC I Wci7 1, nistitin IL at I.< kti
0117 (it I El nal 1 racnt ct31 %cap t ci tua Mc a s <ttit3cc t S Ott of
t ttc Int Cl r. AC C to I110 cx Ictt irval tvc.tc-333 at M I I atul
.:Ir :a( tr trt i rya rlx; tat 41;11 chinE t (ntrauni Z. a-
2.,,1 (In lc 1 tilt t t) 41.) thta G 1.011 3 C11 c c I (('3 t than c ()1 I E. I tIA 1 1 V
.u.ut it i pat r tl liowcyc t we I na 11 v t vctt ccvc.3al tli111 3. 131
t IC0 and %ea Qta ac 131c-Vc a 3333x311itt 13 at jtt11C haac t hat colt
1,1v paucc 31 out trti Eoailu, 1:3 pat t It ul al (1t11
t c l ; at c III (iF. 3 att3 t r I tlic t1 CAIN 1 ( (01111c t ((r7 I 07 he t 10:03 itct;
I T, f ; a.a t I, n 71 vcc, c TtR t a)]c ro-) t
'in.; r ,,r!!! ci jr.t c V;1"; 7 I (,f. c 1.,.ct t T
tl,c Atr4,;.1.7 t _.(i)x 47 int c-1 1;4, j c ,c c. (141: Ar c. c. 1
,;:c..7 f t , orl, 1 LT t itc t
iiC- I 144.,.
c T ..cc- I t F. J
711C. ;Tit c_ j l At C., IA, t, 11411tt :1i; 1 A I c 1:1 I I
, i c t t i c v p ; ,,1 VA b41
IC I r.at rti 4,11 At. /144114,),...1 7 c f ir :1 t nccc is tLtt
11 -if cp :vat I. i 4 cr.,1 l t cc l (lc IA tit, I hc )1 I 7.
7 () / t fftLVljt C 7'; rc ccr ta,,J C Vtt Ctli is 1.1U)1 4 NI v, is
I ,1c tit Atl I 1N,!4, () / 1 *, lr, rtc t Plat v la nil In
4 clt11 t tun we. c c t Al, 1 i C 11 I hc c An: t r, (.7,roc-c t t,,11 int c.
1 a c, with TY1VtI;7 kit t }..c I It y ((rtuutit,ItAIc
1,c1 t 1. t vc paw tit C,(.1 rilot 1,cri (11:1.11 c vtt
t be Ii1A11K, :vc t t he 1'.at t c-11c 1,7, : yot vat,
the I n $ o t tua t it .L( v c A \CC 1 A: Al., A CS Si= t ;1St ,I1C.
t t n'ITIC I tt 'II I I N:r ya t ic-tr.,
ft tai I rt.: der:, 1X1.1 i I tl.c ttc., n=t, I th: I...1.i, it
t ci onnc, t ion: r tit vc(1 i r. lvct, it. c rs,t c-
(Alai lc a 1. ilartt ict, Dat a ( trti:r4ttn 11 at iota: I ci Al I . i,C 1 1111C tat A
121; :a ket ,c 1 ,i,c.t '. ii Intel A c rf, 1 1,, 11, IA,izl,oint. Ictl.tlit al .1r 1:1( '1 A1111.:;.:, i..l ; 11
t 1 1'4
t .
vc c , .4,7.
r, ii), r.r 1 ,C mr11,01', A I f4f,e
I 7 s.i.r., t .7.: I c 1.c ; t
C
.4n
C 7,:: A 1 f. 1..t !. .4-1'4! 7 7 c tT.4 ) nl I
t ;'1..rcc.7c1 .1n 11.1. ;A/ Cc?: f
I ;1 iAl AtDc-tct C he cpteit,A) 3 t, C C t C ,t4i1,c, 1 )(I ; c c tt, r.:; 1 e. ; cc I v he- 3 ic 5IIA ct t c ::1 T.:4 I 1 t 1.c I c sr Thc, frwc-e
t 1 sx t III ;4; ;!*1 1c ) y ss.i3c t I, A 11(4 t in the.
NA1 I 01. r c AT;11.7: 7 II' 71.0 ht ir3 1-11 cil V
(1. Me- ; r I C 3 vc l C ticc f f 1 c ty:/c h iitUAC
t r.: f t 11;1.1 : nt t,c t he Nr.,7, e,,, t hA i we
Luny 15143 c,ic t Tx. 114 c 3 3 ",:tic tit t
t wc Cr'. t I,c. !1 )1 c t,t1 t 1,c 713 itcrtc atc 1-v
ol, AC 1 . :.0
1C I fo.'c c t I I.c I x.: c S A. c Ca;J,1
;:: I c ; .1.: : LICVC .1 t 1.1 A I, t t I.c 1 11
: A t a 1 4::tei 1. 1, t I 11
I I , 1..t A.:11. 1 11 t 4; 1.1 144:c\I
Hi . t 4:11i c I,N ; ct ctt:-.11,11ch
; :.!e., . 1.1 .::. lIc 113 1. .'t.c 1.AtA Ccl AId
,
I
L...
1.
r'Lrs
,
.o
::N"'"
--JN
,,g,
,..,
,,1.
.4' A
.
--".
.N,
1-
,1
5.'
: 1%
.Z
-77"
.`--
--,,_
,..__
,,,,,
Th,
....-
--...
..,,,
...._
..._.
.:---
[i
tt
th_
11, \
\ \ \
.--"
-.,
!41
\-,
.N
._--
-1_j
Vi.
1
....-
---.
...e
<
,
\--
.'1
444,
v, .,
-J
t
I.ty
I*V
' IA
s2'
71.4
,,, "
A I
11:
:'1
7.
!I. .
OV
A ''
.
TT
A14
1,14
.-,t
;;,7
I17
1-3
.72
1-
:71.
44.1
071/
'g4
' 5''t
5 d
'zte
'TT
S
32
zrttini: remain pararrrtrrt in the TIP for thr connection.
irtminal it thn onne, tro imul data Art
to the '1.1.7. MI 370/116 in which iP located.
Finallv. a dircit «mnrction- in entahliphed hetwern the
tr flata thr couinonication link
.r:twrrn thr APPANVT TIP and Intrrx. The electrical and
twihanicai hAdwAte by which the teiminal 14 nwitched from
TIP to 370 (o,DIYoter, and hy which the two data Netn are
connected toi..,rther. haa been termed the "network
crocaovrt boy" And war.: deriined and conntrucied at the
irrtyonit :iyatemtz Imhoratory. (Sec t4chematic dinKram
in,
Thr mar. mrchanitim. jr card to entablinh
(c-nnrction brtwrrn thr TIP and thr TYMNET network by
taryrly rnakinE thr urcond terminal call be to the local
1y!1NLI aatr11itc (omputrr rathrt than to thr M.I.TI 370
t ):11.111 r.hould he noted that the network crOSGover-
hAr, the ilryihility to he connected (nt
diflricnt tittirt) to diffrivnt coaputrrt,. Thin Ilexibility
is of imVoltan.c in c4pclimcntinF. with diiierent 1-11
nyntrma.fr
(to 370)
./DATA SET B
33
(to TIP)
DATA SET AV
CROSSOVER BOXTTY INPUT
J
POWERON/OFF
SELECTORSWITCH
RS-232 INPUT
INPUT SELECT
TTY PRINT/NO PRINT
MODEL 33 TELETYPE
Fig. 8 Hardware Components Used in the TIP to System/370 Connection'
a
34
In any .case, once any TIP has been connected
to a non-ARPANET computer, the CONIT interface program
must establish access to the specified TIP port through
appropriate calls on ARPANET software. This having been
done, CONIT can send commands to either of the two
currently connected I-R systems through the ,respective_
TIPs and receive responses from these I-R systems Over
these same full. duplex connections.
Thus, at any one time, CONIT can be
communicating ith either of the two I-R-systems current --I,
ly connected: one being the MEDLINE system through the
NBS TIP, the other being either Intrex or one'of the' -
TYMNET. systems through the CCA TIP. The different TYMNET
systems can be connected to CONIT sequentially by having
CONIT send the commands to logout the currently connecte1
system and thenlogon a second TYMNET I-R system. _Finally,
CONIT can switch from a TYMNET system to Intrex, or viceJ
versa, by redialing the connection o the CCA TIP from:,
the network crossover box.
3. Thelr'Basic Interface Programs -
The CONIT-1 system contains as a nucleus a set
of program modules ..that enable a user to have an
3.5
interactive dialog with the computer. These modules
include, for example, routines,that provide for message
transfer between a user terminal and CONIT, the parsing
of user requests into individual command and argument
strings, transfer of control tp routindthat execute
commands, and the storage and selection of messages sent
the user in, full or abbrTiated_formats.
Another set of routines enables and disables
the ARPANEt TIP connections,,as mentioned in the previous
section.
A third set of routines allows for the
translation of user commands into commands in CONIT or
external retrieval systems. These routines are based
upon the concept of translation table. CONIT automatical- .
ly selects the appropriate translation table when, for
example, the user is speaking the CONIT language to
MEDLINE or Intrek. Special purpose routines also perform
other functions reqdired for proper translation tp extern-
al systems; far-example, MEDLINE.
in upper case, and user commands
verted.
must receive characters
to MEDLINE are so con-'
36
4. CON IT -1 Command Language
In the design of command language for CONIT-1
the question naturally arose as to the structure and
characteristics of such a language. We have inclined toward
a simple command-name/argument string type format with common
English words for the vocabulary (Ibbreadations allowed),
very simple punctuation (e.g., spaces) for delimiters, and
general indepenakice of command and argument ordering. This
structure seems preferred for ease of use --- especially com-
pared to a more complicated programming-type language.
English, used as a common language, suffers from the twin
. defects of being of complicated structure and being very am--
bigupous; these defects make it hard to explain to users what
precisely can be accomplished in the system at hand and even
more difficult for the stem to parse the user's request
syntactically and semantically.
These views on language structure have been supported
by our own previous work in Project Intrex*and the direction
See, Marcus, R.S., Benenfeld, A.R,, and Kugel, P., _"TheUser Interface for the Intrex Retrieval System" infnOetactive Bibliographic Search, D.E. Walker, (ed), AFIPSPreas 1971; pp: 159-201.
.
31
we see existing systems taking as evidenced, for example,
by the previously mentioned Stanford seminar. With these
views in mind, we began the design and implementation-of
the CONIT command 1.4nguage. The initial set of commands for
CONIT-1 that were implemented are listed in Table 1. Note
that in several cases e.g., FIND, PRINT only the
simplest default conditions were implemented. In addition
to these regular CONIT commands, if a language other than
CONIT is being spoken, any command of that language may be
given; i.e., a "transparent" mode is possible. It can be
seen by reviewing the explanation of the commands that
with one exception all the goals for CONIT-1 as set
forth in the initial design given in Section -C.1 above have
been met including the translation of a couple of commands
from the common command language into the ,languages of.two
target systems. (The status,of the one exception --- display
of the Master Index and Thesaurus --- is des"cribed below.)
The details of the CONIT command language, while
following the general principles described above, were de-
veloped in only a very preliminary stage; should our work
be continued, they are subject to change. Nevertheless, the
Table 1. List of Commands Executable in CONIT-1
COMMAND
SELECT x:
SPEAK x:
LIST CONDITIONS:
EXPLANATION
38
Select a system x to sdarch in andenable TIP connections.
Select command language x.
List current language and system beingsearched.
VERBOSE: Have CONIT give full-form, instructiveresponses.
BRIEF: Have CONIT give abbreviated responses.
SET TABLE x: Pick translation table named x.'
REPLACE $x "y: In current translation table, add therule that the string x is replaced by y.
DELETE RULE $x: Delete from translation table that rulewith x as the string to be replaced.
LIST TABLE: Print out rules in current translationtable.
DISCONNECT x: Disable TIP connection to system x,
(T) FIND x: Do a simple search on x in system currentlyselected.
(T) PRINT: Print out standard catalog information ondocuments in current list.
NAME FILE x: Select file x (create a new file if xdoes not already exist) to be used to savefuture catalog output.
Add response of next request to currentlyselected system'to save file.
EXIT: Leave CONIT and return to MULTICS commandlevel.
FILE:
*(T) indicates Xhat a translation to the selected sfttem isrequired.
39
decision-making process for choosing certain of these
details may he of ROMP interest. The use of the command name
FIND for the basin search request deserves some comment.
It in different in form from both of the systems it now
translates into. In Intrex there are three different search
commands (SUBJECT, AUTHOR, TITLE) to specify the different in-
verted files or inverted file subsets (title words are a
subset Of the subject inverted file) to be searched. In
MEDLINE the absence of a command name implies a search request
(just the term to be searched is given), although one may use,
of need to use in certain situations, the form "FIND x".
Thus, both Intrex and MEDLINE may be characterized as
having, in general, the search command being a default situa-
tion; i.e., being unspecified as such. Our Intrex experience
has suggested that users find the command language easier to
learn and remember when it is more consistent. The default
command upsets the simple rule: command name first then ar-
guments. Leaving off the command name does not save much
since an abbreviation --- as short as one letter --- may be
used. It also makes parsing and error checking more difficult.
Thus an explicit command, one in common use, was chosen.
It should also be noted that even in this rudimentary
interface it can be seen how a number of the complexities of
40
tine of multiple diverge systems can be mitigated. For
example, the CONIT unclY need not know the intricacies of
logging on to the remote systems --- only the SELECT command
need be given; the rent is taken care of for him by CONIT.
In the case of regular MEDLINE use through\the ARPANET, there
is the special problem of accessing through one of 5 TIP
ports. CONIT automatically transmits thp proper ?rotocols
and even cycles through the S ports, one at a time, until
it finds one that is available. Similarly, CONIT given the
proper log off, or QUIT, message to system x whether because
of a DISCONNECT x or implicitly required through an EXIT1
from CONIT. Proper termination of a uqer from a system is
important not only for the individual user (e.g., for proper
charging) but also for other users who might otherwise face
problems in logging on.
5. Implications for Command Language DesignpT
In consideqng how to design a common
language that could serve as an interface to existing
retrieval systems, we came to a rather paradoxical conclusion:
it is both impossible and not too difficult to translate from
such a language to the languages of the existing systems.
It is impossible in the sense described above in Section B.3
61
in that existing nyntemn are usually rather limited in the
functions they can perform and simply calloot perform an
arbitrary request. On the other hand, if an approximate
translation in allowable, it in often not too difficult Co
make a reasonably good one. For example, the simple PRINT I
command in CONIT can be translated to the "PRINT" command
in MEDLINE or the OUTPUT command in intrex with reanenably
close results even though the catalog information printed
might occasionally vary slightly in the default modes of
thene commands.
Two ways to aid in the translation process pre
described elsewhere in thin report: the Master Index and
Thesaurus (MIT) andjthe common bibiographic data structure.
Two other elements we have found important in the interface
problem are the identification of search statements and the
disambiguation of named entities (commands, arguments, neta,
etc.), as discussed below.
Firstly, it should be recognized that the names given
entities in the target systems can be isolated from those
names given in the common language, an ldng an an unambiguous
translation in possible from common-to-target language. Thus
the main task is to provide a mechanism for insuring an
42
unambiguous set of named entities at the common system
leVel. (Of course, the naming of entitieR, such as
coananda, at the common level should consider such factors
an the advantage of using already entablinhed widely used
terms_ --- e.g., MINT.) The common interface then needs to
keep account of all entities that are named a priori in the
coamon command langungn or that may be named in the course
of a given uner'n neanion. Such entities can be generically
classified as:
1. command names
2. names of argumento to commanda
3. namen given to retrieval aearch setsI
4. new names given by a user (an through aREPLACE function) to any one or combinationof entities in the above 3 categories.
Argument names cover such entities AS OAMC5 of systems, data
bases, paved filed, catalog elements, vocabularien, vocabulary
elements, modes of command operation, and so forth. The
common interface would insure Against dmbiguitiea by checking
namen a user might initiate against the current list of named
entities and request )that the user choose again if a duplica-
tion is detected. It is pasible, of course, to diaambiguate
at times on the basin of context, but it seems simpler and
tt 43
easier to explain ayatcm %Jae to a oart if all ambiguities
are avoided in the ffrnt place.
The identification and recording of nearch ntatementn
and their renultn appears to be an especially vital tank where
multiple nyntemn and data banen are involved. The elements
aanociated with a nearch that need to be recorded include:
the aearch statement number (annigned by apnea));
2. an alternate nearch ntatment name given by user;
3. the nenrch ntntement itnelf;
4. the aystem(n) and data bane(n) for whichintended;
5. the number of references (or documents)retrieved;
6. the name of search given in any target system;
7. the actual hint of references retrieved;
8. for an subnenrchrn nrcennitated by thinuenrch either at level of common system orthe target nyntem, all the above items ofinformation.
All items of information above --- except, for acme systems.
itrm (7) --- could be obtained and maintained ntsthr vommon
interface level without requiring modilication of the target
See the discussion in Section, C.6 below of the automaticUNC Of the MIT for an exempla of how subsearcheN may begenerated.
44
6 The Mater lndPs and Theaaoros
The Maator Index and Theeaurus (MIT) concept
described in Section 13.4 %cap developed for inclusion
in the experimental interface. A detailed deaign war* pre-
pared describing the specific information to be contained
in the MIT and the logical interrelations among, the various
parametera no contained. Appendix A describes 6hia initial
deaign.
How the MIT could be diaplayed under manual control
in deacribed in Appendix B. How the MIT could be uacid in an
automatic way has alao been atudied. Aaaume a toter mAkan a
search request of the form FIND A 13 C, where A, 13, and C are,
ordinary Engliah words. The ayatem could then search the
MIT for all words or phraaca that had a word atom that
matched the stem for thy word A. Call these trrma Al; A2,
A3.... Then
11a
Or4rCh consiating of the union of OCAY0104 on
all throe trrul A would be found; i.e.; As FIND Al OR A2
OR A3.... Similarly, B and C are found. Thy search art
answering the initial request would than be the intersection
ofthrthrreartaA.AND Ds AND C.. Previous Project Intrex
/45
'work quEgrrtg that tide treuling Pct . w'ruld generally give
very rffrctivo mrarrh rreulta
Of courgr, many of }'per modra Of IICP of thP MT may be
fieeirabir in pattirular fontrxtA in lading auttnatit. arlrc-
tion/A aynonymrtua and Aprcific rrlatrd trrma and URFT aelec-
tion from Bata of trrma loin-id automatically. as abovc.
Programp were wTittrn to traneform Information frcnn
the data b4PPP Of oia two main rxperimental ayAtema
MLIAINL and INTRO( into the Maatrr Index and Tbraaurun
itorganization ahown in Appendix A The Intrrx data haat,
waa grnrratrd from INSPLC tapro containing bibliographic
data fin thrrr arpatatr date baara: Science Abatracta;
Electrical Englnrerink Abatracta; and Computrra and Control
Abatracta. The MIT formcd from thrar data baara contained
both the frar and controlled-vocahulary information that
could by gleaned from both the INSITC cata10).!. record:a as
reformattrd in the Inttvx r:yatcm and the Itarcx inverted file
Qtroultin, Ito= invctiing certain of the INSPLC firlda. In
particular. thy controllrd-vocabulary tenon both claaaifica-
See. 1) Project Intrex Srmiannual Activity Report. I1 -1:,
15 Srptrmbrt 19/1, Maaaachuartta Inatitutr of Trchnology.pP. 29-47.
2) Ovyvh4F.e. C.F.J.. and Rcintjeu. I. "ProjectIntrrx. A General Revirw." Co be publiahed in Information
St.Pr4L.c_11!1d
**It IN of intr.:nut to note that the catalog relcoVdri for thelntrex IN SPEC data baar were organi:.r,I In thr hierarchicaldata structure an dracribed in Section h.5.
146
tiOn tprmg and controlled index terms and their Index
counts are captured an are all free-vocabulary word stems
and their index counts from titlen, abntractn, and free
subject expresnionn. In addition, the noting of controlled
terra under each of the word ntemn they Contain wan generated.
Similarly, programa were written to generate MIT
information from MEDLINE data batten in the new MEDLARS-II
format. This information included the thesaurus information
given in the MEDLARS vocabulary filen.
sIV. CONCLUSIONS
A. Work AccoTpliahed. Research on the coupling of
interactive information aystemn has progrenned in a
nignificant way. The research has fc\cunncd on the concept of
a trannlating computer interface by which the networking of
heterogeneous interactive information retrieval nystems can
hr achieved. Thin concept appears to be a viable approach to
the development of I-R networks in the interim per.lod during
which I-R nyntem and network ntandardn are gradually evolving.
Particular concepts and techniques which'appear --- through
analyain and the denign, implementation, and tenting of ex-
perimental int_erfackn --- to be especially useful in developing
47
the over-all interface concept include: (1). the virtual
system concept by which users perceive the network as a
single homogeneous system; (2) a common - command language
synthesized from a basic language of atomic I-R functions;
(3) a master index and thesaurus which. stores theo
vocabularies, of the separate data bases along with index
term interrelatioilships gild counts; (4) a common
bibliographic data structure by which the data elements for
`bibliographic information may be enumerated, hierarchically
structured, and interrelated among different data bases.
B. Recommendations for Further Work. While a basis
for research into the coupling of retrieval systems has been
laid; much additional work is obviously needed including the
further elaboration of the techniques listed above; their
implementation in additional demonstration systems which
connect several systems and several data bases and cover most
retrieval functions; the testing and evaluation of these
systems with real users; and the development of more effective
computer-to-computer communications. Also, we see the need
for additional study of the relationship of the computer
interface to the network of I-R systems including such
48
questions as:
1. How many of the I-R functions shbuld be.
performed within the interface as distinct, from
being performed by the separate I-R systems?
2. What are the technological and economic
reasons for treating the separate I-R systems withIn
the network as _inviolate :"black b,oxesq (tt'llAsumption
we have made in our research to date), as cons asted1
with the alternate concept,that they should be modi-4
fied to interact more effectively with the interfate?
3.. To what extent can and should the common
interface concept becbme a de facto standard toward
which existing systemslimay evolve in order to take
maximum advantage of networking potentials?
OP"
L_
APPENDIX A
DATA 'COMPONENTS FOR THE MASTER INDEX AND THESA4URUS (MIT)
Each entry in the Master Index and Thesaurus contains up tofive fields. These fields with their data components arelisted below.
A. Header
1. Is'the entry for a single word stem or a phrase? (1 bit)
2. Entry type (3 bits) (e.g., subject, author, other).
3. Name (variable length A/N string) (e.g., "magnet ","magnetic resonance", "Smith", "Van Pelt")
4. (a) Number of English words in phrase (6-bits). (phrase4entry only)
(b) Number of affixes (stem entry only)
5.. Counts (document and reference counts for each collec=tion --- i.e., data base, for which counts are beingkept in MIT)
B. Affixes (This field needed only for word-stem entry).
This field contains, the set Of affixes for a word stem.
For each affix there would be:
1. Relative (to this entry) affix number (6 bits) (sortendings alphabetically)
2. Absolute affix code (12+bits) (at present this would bethe Intrex 12-bit suffix code; however, we should allowfor futuFe expansion to permit prefixing, as well).
'49
50
3. Document and Reference counts (i.e., 2nanc
counts,
where na= number of affixes, and n
c= number of
collections as in A.5).'
C. Controlled Vocabulary Terms (may be null)
This field lists all the cons olled vocabulary terms forthis entry.
For each term there would be:
1. Term numbgr (relative to this entry) (6,bits)
2. Controlled vocabulary code (6 bits) (i.e., what vocabu-lary the term is in). (sort by ending, this code, andthen type).
3. Type of term (4 bits) (e.g., classification or indexterms).
4. Af.fix code (5 bits) (needed only for single word --code comes from B.1) .
5. Status (2 bits). Is this a currently used term? (Ifnot, we will eventually include dates when it wasused --- also date of initial use for recent term).
6. Reference counts fr each collection, i.e., nent counts
where nt= number of controlled vocabulary terms; note
nt
includes separate item for each vocabulary - type -
affix combination.
D. Phrases With This Stem (For word-stem entries only; may be null).
For each such phrase there would be
1.' Phrase number (6 bits).I
2. Term number of phraie (from Field C.1 of entry forphrase) (6 bits)
0
51
2
3. Type of term of phrase (from Field C.3 of ehtry orphrase) (4 bits).
4.. Word number of word stem in phrase (4 bits).
5. Controlled vocabulary code (6 bits).
6. Phrase name (variable-length A/N string) (sortalphabetically, then by vocabulary and term type).
7. Reference counts for each collection.
E. Thesaurus Relations (may be null)
The standard thesaurus relations are given here. For eachrelation there would be:
y. Term code for giveti term (6 bits) (from Field C.1).
2., Type of relation (4 bits) (e.g., broader term, narrowerterm, synonym, use, used for, related term,morphological variant).
3. Controlled vocabulary for related term (6 bits).(Could conceivably be different than for given term - --note could also relate "free vocabulary' to controlledvocabulary).
4. Name of related term (variable-length A/N string). (sortalphabetically).
5. Is related term a word or a phrase? (1 bit).
6. Automatic search expansion? (2 bits) --- should therelated term be automatically added tq a search when userasks for given term.
7. Remarks (variable-length A/N string) (probably null).(Remarks in free-form English on nature of the relationand when it should be applied).
8. Reference counts for related term for each collection.
APPENDIX B
DISPLAYING THE MASTER INDEX AND THESAURUS
Preliminary specifications for the display of the MIT underuser control are given below.
The command RELATE is used to look up and display informationfrom the MIT. The syntax is
RELATE A B C X
The last argument (X) is that word or phrase eo be looked upin the MIT. The first argument (A) specifies the type ofrelation to be displayed as follows:
ALPHA--terms that surround X alphabetically
PHRASE--terms having wora'stems in common with X
THESAURUS--thesaurus relations for X
Eventually, some combination of relation types may be permissibleon one display. For now, the default option should probably beALPHA)
The second argument (B) specifies that only terms and relationsin a particular vocabulary be considered. Sample values for Bmight be MESH or INSPEC or INSPEC PHYSICS etc. Sekreral
vocabularies could be specified at once, so that the actualsyntax is Bl B2 B3...;-i.e., the-connector OR is implicit. Thedefault condition would mean all vocabularies considered.
The third argument (C) specifies, analagoUsly to B, the particularcollection(s) to be considered; e.g., MEDLINE, SDILINE, INTREX/INSPEC, etc. 'Again, the default condition means all collectionsconsidered.
These specifications will be further illustrated through examples ofdisplay output for some RELATE commands. Note the default optionsfor arguments B and C. The index terms are not meant to be actualterms from the given vocabularies.
52
53
A. SAMPLE DISPLAY OF ALPHABETICAL RELATIONS
READY
relate alpha radiation
The index terms alphabetically near the term radiation (7) area.
listed below, (1)(MRA
(6)
TERM NO. (3) : TERM(4)
: VOCABU1ARY (5): COLLECTION
220 TY rabid dogs MESH; HEADING MEDLINE
15 TZ rachet wheels INSPEC EE; TERM INTREX
20 INSPEC COMP; TERM INTREA
305 TA rad- (English stem) INTREX
21 SDILINE:
107 TA1 radical (English) INTREX
5 SDILLNE
203 TA2 radiation (English) INTREX
17 SDILLNE
1276 MESH; HEADING MEDLINE
45 SDILLNE
410410 INSPEC EE; HEADING INTREX
13 TA3 radius (English) INTREX
192 TB radiation Y(8)
damage INSPEC; HEADING INTKI.
0 217 MESH; HEADING MEDLINE
130 TC radiation ef-fects ontissues MESH;HEADING MEDLINE
(8)
If you want to see terms that follow alphabetically,type relate more.
To see phrase thesaurus relations type relate phrase X or relate thesaurus' X,
where X is the name or'term no. (see above) of the term whose relations(6)
you want to see. <MRAZ>
ft
54
NOTES FOR SAMPLE DISPLAY OF ALPHABETICAL RELATIONS
(1) This is the VERBOSE mode message: in TERSE mode no message
would be -liven. If the argument X did not correspond to
any term in the MIT the following message/would be given:
There is no index term radiation. [kowever, there
are terms with the same stem: rad-.] Terms alphabetically
near radiation are listed below. <MRA3>(6)
Theowntence in brackets would be inserted when the stem
for (the single word) X does exist in the MIT, although the
full word X does not. In the former case, the display would.be:
TA radiation (No Such Term)
(2) The document (or reference) counts are given in this column
when available. (A stem where no free vocabulary, word
exists would not have a count.)
(3) The' term number is assigned byCONIT relative to the term for
argument X, which is TA: The two terms coming alphabetically
before TA are given first and labeled TY and TZ. As many
terms after TA are given so of to have at least 20 lines of
term display. Note full words under word stems are given
numbered final ch'aracters. If a continuation display is
requested (relate more), then numbering is continued from
previous display.
(4) Full words under word stems are indented one space. If a term
is repeated (or a different vocabulary or different collection),
its printing is not repeated. For each full word in Field B of
55
the MIT, first the free-vocabulary countslrould be given
(from 'Field B.3) and then the controlled counts (from
Field C).
(5) The vocabulary is given followed by a semicolon and then
an indication of the type of vocabulary term (e.g., class
heading or index term). The vocabulary and type term
parameters come from Field C.2 and C.3, respectively, of
the MIT (see Appendix A).
(6) These are message names.' They could be arguments to a
SUPPRESS command which thereafter causes their deletion (or
replacement by TERSE form) or the EXPMIN command.
(7) Some emphasis on the playback ot argument X is desirable.
Here, for example, radiation could be in a different color
or capitalized.
(8) If information under a given column is too long to fit in
that column, some convention such as suggested here will
be desirable.
56
READY
B. SAMPLE DISPLAY OF PHRASE RELATIONS
1
relate phrase radfaiii?n damage
The index terms with words whose stems match the-stem(s)(1)
of
rad-cation (and damag-e)(1)
are listed below: (MRP>
POSTINGS TERM NO. : STEM 1(4)
: VOCABULARY : COLLECTION
305 TA 'rad- (English Stem) INTREX
21 SDILINE4
107 TA1 radial ,(English) INTREX(2)
13 TA3 rgdius (English) INTREX
1
PHRASES WITH STEM(4)
50 TB polarized radiation INSPEC EE: HEAD INTREX
192 TC radiation damage INSPEC EE: HEAD INTREX
217 MESH: HEAD MEDLINElor
: STEM 2
TD damag- (Stem) INTREX
PHRASES WITH STEM
18 TE damaged goods BUSINESS INFORM
* see term TC above(3)radiation damage
..
To see thesaurus relations type relate thesaurus X, where X is
the name or term no. (see above) of the term whose relations you
want to see. C1tRP25
READY
57
NOTES FOR SAMPLE DISPLAY OFT' PHRASE: RELATIONS
(I) Delete parenqhetical parts if only one word in X. Null
result message:
There are no index terms with words whose stems
match the stem(9 for rad-iation (or damag-e). (MRP3>
(2) The full display for this stem, as in Section A, goes here.
(3) Duplicate phrases are indicated in this fashion.
(4) Stem information is given in addition to multi-word phrases
as a convenient way to indicate what the stemming is for
the given words in argument X as well as to show single-
word "phrases".
58
SAMPLE DISPLAY OF THESAURUS RELATIONS
READY
relate thesaurus radiation damage
The index terror; with thesaurus relations to the term radiation
damage are given below:(1)
KMRT>
POSTINGS TERM NO. TYPE(2)
: RELATED TERM : VOCABULARY : COLLECTION'
NONE TA CODE 6.21 4NSPEC
NONE TB WOE 11.50.31 MESH
2005 TC BROADER radiology -MESH: HEAD MEDLINE
75 TD NARROWER radiation dosage MESH: HEAD MEDLINE
READY
NOTES FOR SAMPLE DISPLAY OF THESAURUS RELATIONS
(1) Null messages:
a
There is no index term radiation damage cHRT2>
There are no thesaurus relations for the term radiation damage cMRT3>
(2) From Field E.2 of MIT (see Appendix A)
9