

Automated collection documentation system at the National Museum of Natural History, Smithsonian Institution, Washington, D.C.

T. Gary Gautier

After fifteen years of development and use, computer-assisted storage and retrieval of specimen information has become an important part of collection management at the National Museum of Natural History (NMNH), Smithsonian Institution. All seven scientific departments (anthropology, botany, entomology, invertebrate zoology, mineral sciences, palaeobiology and vertebrate zoology) use data processing to document and index collections, and the scope and volume of the applications are steadily increasing. The data banks already contain over 1.5 million records covering 3 to 4 million specimens, and over 150,000 new records are added each year. These data banks serve collection management and research and are used to answer questions from the public, and their value is growing as they become larger and more complete.

The museum began looking into automated documentation systems in the early 1960s, when a data-processing committee was formed. A systems analyst was hired later to help with planning, and in 1967 the museum received a grant from the United States Department of Health, Education and Welfare to begin developing and using a generalized museum computer system. By the end of 1969, SIIRS (Smithsonian Institution Information Retrieval System) had been developed by the Smithsonian's computer centre, now known as the Office of Computer Services (OCS), and several thousand records of crustaceans, cephalopods, birds and rocks had been entered. This pilot project confirmed that data processing could be an effective tool in meeting the routine documentation needs of specimen management while greatly increasing the accessibility of the collection information and of the specimens themselves.

In 1970, after the grant ended, a central organization named the Automatic Data Processing Program was formed under the direction of James F. Mello to develop and co-ordinate permanent data-processing services. Management and technical-staff members were selected for their extensive training or experience in both museum-related subjects and data processing. The early years of the programme were devoted to encouraging departments and curators to use computer methods, and to developing and testing procedures, setting standards, procuring equipment, organizing project plans and schedules, obtaining funds, and hiring and training personnel. The programme now has thirty-four staff members; twenty-four of these are assigned to the departments to assist with preparation and typing of data, seven provide technical services in the central office, and three are programmers assigned to OCS. The programme also administers a central computer budget used by all of its projects; additional personnel and computer funds are provided by the departments if needed.

Computer system and facilities

An important function of the ADP programme is to serve as the liaison between the museum and OCS, which operates the computer and provides all organizations of the Smithsonian with systems analysis and programming services. Soon after the programme was established, it began working closely with OCS to develop a new computer system to replace SIIRS, which had some undesirable features. The new system was named SELGEM (for SELf-GEnerating Master), and it has become the workhorse of data processing in NMNH as well as at several other museums, universities and government agencies.

SELGEM is a general-purpose information-management system consisting of over thirty-five computer programs, most of which are written in COBOL 74. Some of the major capabilities of SELGEM are: (a) to create and update files with transactions from paper tape, keypunch cards, OCR, key-to-disc terminals, time-sharing terminals and mark-sense forms; (b) to sort records into different sequences; (c) to print preformatted or custom-formatted reports with tabulations; (d) to print on index cards, label stock and microfilm; (e) to retrieve records according to simple or complex criteria; (f) to split and merge files; (g) to index on key words or phrases; (h) to reconfigure data formats; (i) to verify or add data using tables or authority files; (j) to print a variety of reports designed to simplify error detection; and (k) to interface files with statistical packages, mapping programs and other plotting programs. Data may be stored in either hierarchical or unit-record format, and records and fields may be variable in length (a record may hold up to 999 fields, each up to 6,330 characters long). A key feature of SELGEM is the ease with which new users can learn the basic principles and begin building their own files.
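The record structure and two of the capabilities listed above (sorting and retrieval by criteria) can be illustrated with a short modern sketch. This is not SELGEM code, which is written in COBOL; the class, function and field names below are assumptions made for illustration, and only the two size limits come from the text.

```python
from dataclasses import dataclass, field

MAX_FIELDS = 999          # fields per record, as quoted in the text
MAX_FIELD_CHARS = 6330    # characters per field, as quoted in the text


@dataclass
class Record:
    """One record holding a variable number of variable-length fields."""
    fields: dict = field(default_factory=dict)

    def set_field(self, name, value):
        """Store a field after checking the two size limits."""
        if len(value) > MAX_FIELD_CHARS:
            raise ValueError(f"{name}: longer than {MAX_FIELD_CHARS} characters")
        if name not in self.fields and len(self.fields) >= MAX_FIELDS:
            raise ValueError(f"record already has {MAX_FIELDS} fields")
        self.fields[name] = value


def make_record(**values):
    """Build a record from keyword arguments (hypothetical field names)."""
    record = Record()
    for name, value in values.items():
        record.set_field(name, value)
    return record


def retrieve(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [r for r in records
            if all(r.fields.get(k) == v for k, v in criteria.items())]


def sort_records(records, *keys):
    """Return records sorted on the named fields; absent fields sort first."""
    return sorted(records, key=lambda r: tuple(r.fields.get(k, "") for k in keys))


# Invented sample data, for illustration only.
catalogue = [
    make_record(number="3", continent="Africa", year="1910"),
    make_record(number="1", continent="Asia", year="1910"),
    make_record(number="2", continent="Africa", year="1910", remarks="skull only"),
]
hits = retrieve(catalogue, continent="Africa", year="1910")
print([r.fields["number"] for r in sort_records(hits, "number")])
```

Storing fields in a dictionary keeps each record variable in length: a record carries only the fields that were actually entered, as with the optional remarks field in the sample data.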

SELGEM processing at the Smithsonian is done on a Honeywell-6605 computer, which was installed in September 1977. Versions of SELGEM are also available for models of IBM, Burroughs, Univac, and CDC computers. With the exception of a new interactive (user carries on a direct dialogue with the computer) data-entry program, all SELGEM programs operate in batch mode (jobs are put in a queue to be processed later). In NMNH, most batch jobs are set up in the central office of the ADP programme by four persons who are experts on SELGEM and on the job-control language of the computer. Customers submit requests on a standard form, which is filed later to provide a processing history for each data bank. Most jobs are submitted to the computer through a remote-batch terminal that has a high-speed card reader and line printer. Over 4,000 jobs are run each year, and more than half of them are returned to the customers within twenty-four hours.

Most data entry is now done either through terminals connected to a minicomputer at OCS or with office typewriters that have a type font that is readable by OCR-scanning (Optical Character Recognition) machines. Keypunch cards may be used to submit corrections, but their use is declining. We are also experimenting with on-line data entry to the main computer, and with terminals that have storage devices such as tape cassettes. Our goal is to convert as much of our data entry as possible to terminals to decrease turn-around time, increase production, and utilize automatic editing features of terminals and computer programs.

Prior to 1974, when OCR techniques were implemented, most data entry was done on automatic typewriters that punched paper tape. More than fifteen of these machines were in operation at one time, but now only three seemingly immortal machines are still functional. They probably will be phased out in a few months.

1. For further information about SELGEM or to [...], Smithsonian Institution, Washington, D.C. 20560.


14 Technicians preparing data on type specimens of plants.

15 Entering data about bats through a terminal in the Division of Mammals.


We have also had success with using a commercial typing service. The costs are very competitive with our internal costs, the typing is highly accurate, and the turn-around time is excellent. One batch of 100,000 lines was put on magnetic tape within two weeks at a cost of only $0.05 per line. This process seems to work best with large batches of data recorded in a uniform, easily read format.

16 Computer operator preparing to use the remote-job-entry computer terminal consisting of (from left) high-speed printer, operator console and high-speed card reader.

Computing at the Smithsonian is not subsidized, i.e. the users pay for whatever computer resources they use. The price schedule is complex, so it is difficult to estimate costs exactly, but ranges of costs for operating SELGEM can be given. Processing a complete catalogue record through SELGEM costs 10 to 20 cents, depending largely upon the number of data elements in the record, the data-entry techniques used, and the types of final products such as labels and index cards that are produced. Simple queries (e.g. a list of all the specimens collected in Africa in 1910) cost about one dollar per thousand records searched, and a locality index for a collection can be printed for two to three dollars per thousand records. We have reduced costs considerably in recent years by optimizing computer programs and by improving operations, and we expect to reduce costs even further in coming months as we adapt to the new computer.
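The cost ranges quoted above can be combined into a rough budget estimate. The sketch below is only an illustration of the arithmetic: the function name, its parameters and the sample workload are assumptions, and the actual price schedule is described in the text as more complex than three flat rates.

```python
# All prices in US cents, taken from the ranges quoted in the text.
CATALOGUE_CENTS_PER_RECORD = (10, 20)
QUERY_CENTS_PER_1000_SEARCHED = 100
INDEX_CENTS_PER_1000_RECORDS = (200, 300)


def estimate_cost(new_records, file_size, queries):
    """Return a (low, high) estimate in dollars.

    new_records: records catalogued
    file_size:   records in the file that is searched and indexed once
    queries:     number of simple queries run against the whole file
    """
    query_cents = queries * file_size * QUERY_CENTS_PER_1000_SEARCHED / 1000
    totals = []
    for catalogue_rate, index_rate in zip(CATALOGUE_CENTS_PER_RECORD,
                                          INDEX_CENTS_PER_1000_RECORDS):
        cents = (new_records * catalogue_rate
                 + query_cents
                 + file_size * index_rate / 1000)
        totals.append(cents / 100)   # convert cents to dollars
    return tuple(totals)


# Hypothetical workload: 5,000 new records, a 50,000-record file, 10 queries.
low, high = estimate_cost(new_records=5_000, file_size=50_000, queries=10)
print(f"${low:,.2f} to ${high:,.2f}")
```

For this invented workload, cataloguing accounts for $500 to $1,000, the ten queries for about $500, and one locality index for $100 to $150.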

Applications

Each scientific department determines its own data-processing priorities according to its own collection-management or research needs. The collection projects fall into three broad categories that have different objectives and processing requirements: new cataloguing, retrospective cataloguing, and collection indexing.

In both new and retrospective cataloguing, a complete set of data is entered for each specimen regardless of the estimated relative value of the data elements to the future uses of the data files. Specimen labels, index cards, and other standard curatorial documents are often printed from new catalogues by the computer or by the data-input devices at the time of data entry. Retrospective cataloguing is usually done to complete a data file for a certain taxonomic group, geographic area or other unit of the existing collection of current interest to research or collection management. More time is required for verifying and standardizing data in retrospective cataloguing than in new cataloguing, especially if very old collections are being processed, but less time is required for management and for production of labels and catalogue cards. Retrospective cataloguing can also take advantage of the existing organization of the collection and of the existing documentation to process records more quickly, particularly by reducing the amount of redundant typing.

In collection indexing, only the most important data are entered, and data verification prior to entry is often kept to a minimum. Indexes to the collection can then be produced that will facilitate collection use, but users must still rely on labels and other documents to find data not entered. Index files are usually built when a department has an immediate need for an inventory or for cross-indexes to a collection, but does not have the resources to prepare all of the data for each specimen. Index files may also be built as the first phase of a long-term goal to complete an automated catalogue of the collection. This gives a department an abbreviated but usable file for the entire collection far ahead of the time when the complete catalogue can be finished. The index file may even be helpful in planning the completion of the catalogue and in the early discovery of problems such as missing records or specimens.

A variety of methods for recording data are used. In palaeobiology, for example, data for type specimens are gathered from the literature and are recorded on preprinted data sheets, whereas in the crustacean project the typist works directly from standard catalogue ledgers, which serve quite well in the place of data sheets. In botany, the original index cards for the type specimens serve as the recording forms, and in the Division of Mammals the specimen labels themselves are often used. One curator in anthropology is planning to experiment with portable data-entry terminals in the field, which will eliminate handwriting of data altogether.

Much attention is given to developing data standards to ensure consistency in the completeness, accuracy, format, vocabulary and syntax of data entered into the computer files. All departments use highly similar standards for the data elements they share, such as country name, collector's name, specimen name, date catalogued, and number of items, so it would be a relatively easy task to produce a combined species list or a geographic index for all completed files. Many aspects of the standards still need development or improvement, however. For example, the use of historical or obsolete geographic terms could be better controlled, and a standardized vocabulary for ethnological objects would help. Refinement of standards is a continuing process which grows and diversifies as the museum comes to rely more on automated documentation methods.
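One way to picture such a standard is a check against an authority table before a record is accepted. The sketch below is an assumption-laden illustration, not the museum's procedure: the table contents, the field names and the list of required fields are all hypothetical, chosen only to show how obsolete geographic terms could be mapped to accepted ones.

```python
# Hypothetical authority table: accepted country names keyed by the
# upper-cased variants (including obsolete terms) that map to them.
COUNTRY_AUTHORITY = {
    "SRI LANKA": "Sri Lanka",
    "CEYLON": "Sri Lanka",
    "THAILAND": "Thailand",
    "SIAM": "Thailand",
}

# Assumed names for three of the shared data elements mentioned in the text.
REQUIRED_FIELDS = ("country", "collector", "specimen_name")


def standardize_country(raw):
    """Return the accepted country name, or None if the term is unknown."""
    key = " ".join(raw.split()).upper()   # collapse spacing, ignore case
    return COUNTRY_AUTHORITY.get(key)


def check_record(record):
    """Return (cleaned_record, problems) for one dictionary of fields."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not record.get(name, "").strip()]
    cleaned = dict(record)
    if record.get("country", "").strip():
        accepted = standardize_country(record["country"])
        if accepted is None:
            problems.append(f"unknown country {record['country']!r}")
        else:
            cleaned["country"] = accepted
    return cleaned, problems


cleaned, problems = check_record(
    {"country": " ceylon", "collector": "A. Smith", "specimen_name": "Felis catus"}
)
print(cleaned["country"], problems)
```

Records that come back with a non-empty problem list would be listed for correction rather than loaded, which is the role the text assigns to error-detection reports.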

Here are summaries of what the departments are doing with data processing.

Anthropology

An index file for the entire collection of 30,000 human skeletons has been built as part of the Palaeopathology Registry, which also contains photographs of specimens in other museums and a bibliographic computer file. Besides being used on numerous occasions by curators and visitors, the file has been interfaced with computer-mapping programs to produce distribution maps of collecting sites to aid research on the spread of disease in ancient populations.

An index file has also been compiled from existing documentation for the entire ethnology collection, and another is under way for archaeology. These files will be used to help conduct a physical inventory of the collections prior to their transfer to another building in a few years. An important part of the inventory process will be the location of specimens needing conservation. Technicians working through the specimens will record conservation observations on forms by circling appropriate key words that will later be entered into the computer files so that lists of specimens needing conservation can be produced. The word list consists of terms a novice can understand but which convey a great deal of information to conservation specialists.

Botany

A retrospective catalogue of the collection of type specimens is over half completed. High priority is given to verifying all of the data against the publications themselves. An interesting use of the file was made by one curator studying the structure of scientific names. The generic and specific names were printed out in alphabetical order but spelled backwards, which brought all words with the same suffixes together.
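The backwards-spelling listing is easy to reproduce. The following sketch assumes nothing about how the original print-out was made; the function name is hypothetical and the six genus names are sample input only.

```python
def reversed_spelling_list(names):
    """Return the names spelled backwards, in alphabetical order."""
    return sorted(name.lower()[::-1] for name in names)


# Sample input only; any list of generic or specific names would do.
genera = ["Quercus", "Rosa", "Pinus", "Salvia", "Rubus", "Acer"]

for backwards in reversed_spelling_list(genera):
    # Print the reversed form next to the name it came from.
    print(f"{backwards:<10}{backwards[::-1]}")
```

Because sorting compares the reversed strings from their first character, names that share an ending such as "-us" sort next to one another, which is the grouping by suffix described above.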

Entomology

One curator has built a large retrospective catalogue for selected ground beetles, and uses it extensively for research. His field notes are written so that the data-entry operator may work directly from them, eliminating the need for redundant transcriptions.

A retrospective catalogue is virtually complete for over 80,000 records of ticks and their hosts. Most of the collection is housed at the Rocky Mountain Laboratory in Montana, where it is used for research on diseases transmitted by ticks. One of the scientists connected with the project is on an extended visit to the British Museum to restudy some of the specimens covered by the file. He was able to take with him a complete master list, locality index and species index printed on a two-centimetre stack of microfiche. The same printouts on paper make a stack over four metres tall.

17 Newly installed Honeywell-6605 computer.

18 Products from the data files: specimen labels, microfiche (beneath labels), specimen ledger (beneath microfiche), page-size print-out from Xerox-1200 printer (right foreground), two sizes of index cards, computer-plotted distribution map published in the book on the far left and publications in which many pages were printed from data files by computer-driven typesetting machines or by photo-offset reproduction of computer print-outs.


Invertebrate zoology

A complete catalogue for all crustaceans accessioned since 196[?] has been built, and retrospective catalogues are almost complete for cephalopods, pectens, littorinids and tridacnids, all of which are of current research interest. The department plans to expand its data-processing activities, starting with converting all cataloguing to automated methods.

Index cards and specimen labels are produced for Crustacea at the time of data entry on automatic typewriters. The process may soon be changed to print the documents directly from the computer files using either the line printer or terminals, which will permit quicker data-entry methods to be used. Special paper and terminals with small type will be used to print microlabels for specimens stored in liquids.

Processing one new accession of 7,900 lots was easier than most because the data had already been entered into another computer system by the collector. A computer program was written to transfer the data into SELGEM, and to assign catalogue numbers and other NMNH data to the records automatically. Afterwards, the computer printed the index cards and specimen labels. Such a process has been used several times, underscoring the value of exchanges of machine-readable data files between museums.
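The transfer program written for the 7,900-lot accession is not described in detail, so the following is only a sketch of the general idea: rename the collector's fields to the museum's, then number the records consecutively. Every field name, the mapping table and the sample values are assumptions.

```python
from itertools import count

# Assumed mapping from the collector's field names to the museum's.
FIELD_MAP = {
    "taxon": "specimen_name",
    "site": "locality",
    "n": "number_of_items",
}


def import_accession(rows, first_catalogue_number, accession_number,
                     field_map=FIELD_MAP):
    """Convert the collector's rows and assign consecutive catalogue numbers."""
    numbers = count(first_catalogue_number)
    imported = []
    for row in rows:
        # Keep only the mapped fields, renamed to the museum's names.
        record = {ours: row[theirs]
                  for theirs, ours in field_map.items() if theirs in row}
        record["catalogue_number"] = next(numbers)
        record["accession_number"] = accession_number
        imported.append(record)
    return imported


# Invented rows standing in for the collector's machine-readable file.
collector_rows = [
    {"taxon": "Callinectes sapidus", "site": "Chesapeake Bay", "n": "4"},
    {"taxon": "Uca pugnax", "site": "Cape Cod"},
]
for record in import_accession(collector_rows, 170001, "A-0001"):
    print(record)
```

Once the records carry catalogue numbers, index cards and labels can be printed from the file in the same way as for any other catalogue.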

Mineral sciences

An index file has been compiled for the entire mineral and gem collection, and another for the petrology collection is well under way. The department is now reviewing and verifying the mineral and gem data with the specimens as part of an overall reorganization of the collection. The file is one of the most used in the museum.

Palaeobiology

New and retrospective catalogues are being built for the type specimens, with high priority being given to checking the data with the literature. The Catalog of Type Specimens of Invertebrate Fossils: Conodonta was the museum's first computer-produced publication.

A catalogue of the rapidly growing sediment collection contains over 20,000 records. The presence of the latitude/longitude co-ordinates in the records made it possible for the computer to plot a map of the distribution of sampling sites in the western hemisphere. The map will be used to set priorities for accessioning of new samples and for selection of unprocessed samples for cataloguing.

Vertebrate zoology

New catalogues are under way for fish, mammals, reptiles and amphibians, and retrospective catalogues are complete for over 20,000 birds and for about one-third of the mammal collection. An index file for all reptiles and amphibians from the United States has been compiled and is being checked, and another for the type specimens of fish is about half finished. Listings from a complete file of current bird names are used in various ways in the management of the bird collection.

For years, skull-vial labels of small size were produced in the Division of Mammals by automatic typewriters with small print. Now the labels are printed from the computer files with a Xerox-1200 printer, which prints on 8.5-inch by 11-inch paper. The character spacing is 13.5 characters per inch, which is satisfactory for labels. The ADP programme does most of its bulk printing on the Xerox-1200 printer because it is cheaper than the line printers, and the page-size printout is easier to read, easier to handle and more convenient to store.


Specimen loans are managed with SELGEM by the Division of Fishes. Borrower data and specimen data are entered for each loan into a hierarchical file from which can be printed chronological listings of outstanding loans, alphabetical lists of specimens, statistical summaries of loans for the current year and other useful lists. Expansion of automated loan control to other departments is being considered.
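A hierarchical loan file of this kind can be pictured as borrower records that each own a list of specimen records. The sketch below is a modern illustration under that assumption, not the Division of Fishes' file layout; the class, the function names and the sample loans are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Loan:
    """Parent record (borrower data) with its child records (specimens)."""
    borrower: str
    date_sent: date
    specimens: list = field(default_factory=list)
    returned: bool = False


def outstanding_loans(loans):
    """Chronological listing of loans not yet returned."""
    return sorted((loan for loan in loans if not loan.returned),
                  key=lambda loan: loan.date_sent)


def specimens_on_loan(loans):
    """Alphabetical list of specimens in outstanding loans."""
    return sorted(name for loan in loans if not loan.returned
                  for name in loan.specimens)


def yearly_summary(loans, year):
    """Count the loans sent in a year and the specimens they contain."""
    sent = [loan for loan in loans if loan.date_sent.year == year]
    return {"loans": len(sent),
            "specimens": sum(len(loan.specimens) for loan in sent)}


# Invented loans, for illustration only.
loans = [
    Loan("Museum B", date(1978, 3, 2), ["Salmo trutta", "Esox lucius"]),
    Loan("University A", date(1977, 11, 20), ["Perca fluviatilis"]),
    Loan("Museum C", date(1978, 1, 15), ["Anguilla anguilla"], returned=True),
]
print([loan.borrower for loan in outstanding_loans(loans)])
print(specimens_on_loan(loans))
print(yearly_summary(loans, 1978))
```

Keeping the specimens inside the loan record means each listing is a different traversal of the same file, which is the advantage of the hierarchical format for this application.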

Other applications

Some of the major applications of SELGEM in NMNH do not involve collections directly, but support research in biosystematics, geology and anthropology. Several bibliographic files are being built by scientists in subject areas such as volcanoes, palaeopathology and plant anatomy, and the address file of the Scientific Event Alert Network is in SELGEM. X-ray analyses of sediment samples are automatically punched on to paper tape and later read into a file that is then interfaced with statistical programs being used by a sedimentologist. A file of chemical analyses of deep-sea rocks is maintained by another scientist who distributes copies of it upon request. All of the collecting-locality data for the ship Albatross has been entered to aid documentation of the extensive Albatross collection in the museum, to provide a means of distributing the data to interested scientists and to provide a central file where scientists may enter corrections and additions to the locality data. Extensive research into scientific and historical records has gone into building the Volcano Reference File, which contains eruption data on all the active volcanoes of the world and which will soon be published. A file of the endangered and threatened flora of the United States was recently published and one for the endangered and threatened fauna of the world has been built for internal use.

SELGEM is also used to prepare manuscripts for publication via computer-driven typesetting equipment. A large two-volume synoptic catalogue of the Hymenoptera of America north of Mexico is at the press, and work is far along on bibliographies for the Handbook of North American Indians and on Index Nominum Genericorum, which contains publication information on all the genera of plants. The publications will be of the highest quality, using all of the desirable fonts and special characters. Publication costs for the first issues will be lower than if conventional typesetting methods had been used, and the data banks can be updated later to produce supplements at much reduced costs. In the meantime, the data banks can also be used for information retrieval; the Hymenoptera file was recently queried to produce a list of the bees that visit a certain species of cactus, for instance.

Future

The museum's data-processing future brightened considerably last September when time-sharing and other technical improvements were installed with the new computer. A system review is currently under way to determine how these new capabilities can best be used and to form a revised plan for future system development. Among the applications put forth for consideration are (a) an interactive museum-wide specimen registration system; (b) an interactive loan-control system; (c) an automated data element dictionary/directory; (d) a system to help auditing of the status and performance of collection systems; and (e) an interactive query program. These and other ideas are generally aimed at reducing costs, streamlining operations, expanding data processing into new application areas, and, most important, giving curators greater flexibility, more control and quicker responses in their uses of data files.

A major new area of growth will be the Museum Support Center (MSC), which will be constructed in a few years to ease the storage problems at NMNH. Data processing is expected to play a key role in managing the collections at the MSC, and in providing a quick and convenient means for conveying specimen information between the MSC and the main museum, where many of the scientists will be located. As in the Department of Anthropology, data processing may also be used by other departments to help plan and control the initial transfer of collections to the MSC. This will probably require an increased pace in file building and greater emphasis on index files.

To some, exploitation of the museum's data banks by scientists is the ultimate goal and measure of success of our data-processing projects, and this is certainly one area where we are still on the frontiers of development. Greater attention is being given to encouraging the use of files to point out where development is needed and to find weaknesses that can be eliminated by modified data standards and processing objectives. Exchange of data between museums is an important way of enhancing the research value of data files and collections, and NMNH is anxious to work with other museums to co-ordinate data standards, promote cataloguing and indexing of machine-readable data files, and share ideas and systems.