developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf ·...

9
Developing services for open eprint archives: globalisation, integration and the impact of links Steve Hitchcock, Les Carr, Zhuoan Jiao, Wendy Hall and Stevan Harnad Multimedia Research Group, Department of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK Contact for correspondence: Steve Hitchcock [email protected] ABSTRACT The rapid growth of scholarly information resources available in electronic form and their organisation by digital libraries is providing fertile ground for the development of sophisticated new services, of which citation linking will be one indispensable example. Many new projects, partnerships and commercial agreements have been announced to build citation linking applications. This paper describes on the Open Citation (OpCit) project, which will focus on linking papers held in freely accessible eprint archives such as the Los Alamos physics archives and the other distributed archives, and which will build on the work of the Open Archives initiative to make the data held in such archives available to authorised services. The paper emphasises the work of the project in the context of emerging digital library information environments, explores how a range of new linking tools might be combined and identifies ways in which different linking applications might converge. Some early results of linked pages from the OpCit project are reported.. KEYWORDS: electronic publishing, digital library information architectures, citation linking, distributed collections, eprint archives, open archives INTRODUCTION The process of scholarly communication, in particular the aggregation of formal academic papers in journals, is probably about to enter its fastest period of change since 1994-6. The immediate platform for this change was the emergence of the World Wide Web as a popular medium in 1994 and the subsequent conversion of most established journals to electronic facsimiles delivered over the Web, a process which began to build momentum in 1996 (Hitchcock et al. 1996). While growth in the number of e- journals continues to accelerate towards an estimated 10,000 (Maclennan 1999, Hunter 1999), it is prior developments, such as the establishment of free electronic archives, or eprint archives, that are beginning to influence wider changes, however. Launched in 1991, the importance of the Los Alamos physics eprint archive, the first and preeminent archive of its kind, (Ginsparg 1994) is widely recognised, but its practical ramifications have so far been largely confined to its home community in physics. Many have questioned, because of cultural differences between different academic disciplines (Kling and McKim 1999), whether the eprint model will be accepted beyond physics. That contention will be challenged by the most significant new eprint archives to have emerged since 1991: PubMed Central, launched at the beginning of this year and sponsored by the National Institutes of Health, covering all fields in biomedical and life sciences (Varmus 1999); and the Computing Research Repository (CoRR), sponsored by the ACM and the British Computer Society (Halpern and Lagoze 1998). An immediate effect of PubMedCentral has been the announcement by some biomedical publishers, notably the British Medical Journal (Delamothe and Smith 1999) and the Current Science Group (Anon. 1999), of freely accessible archives of electronic copies of current and past papers, both pre- and post-publication. As well as discipline-based archives, institutionally based initiatives such as Scholars Forum are planned. (Buck et al. 1999) The essential feature of the Los Alamos eprint archive model is author self-archiving based on a free-to-archive, free-to-access service. Compared with journals, eprint archives provide an almost wholly automated and highly efficient organisational framework and distribution mechanism based on the Internet, but without many of the additional services that journals provide, such as peer review, and other services that mostly require human intervention. As more archives attract more papers the test for journals is how they respond to the effective loss of exclusivity that most depend on, and how they cope with the new economics of journal publishing in which every facet of traditional value-adding is re-evaluated against the for-free services. There is another dimension to consider too: globalisation. With no geographical or financial barriers the next inevitable step is, if not universalisation where archives all adopt the same technical infrastructure, then integration of the archives based on new services. As part of this agenda archives are not only freely accessible to users but are becoming open to independent, third-party computational processes, for example Web crawler techniques used for indexing, on which these services will be built. This SPACE FOR ACM COPYRIGHT INFORMATION. REMEMBER TO DELETE THIS BEFORE SUBMITTING FINAL VERSION. (USE A COLUMN BREAK IN MS WORD TO STOP TEXT FROM OVERWRITING THIS AREA.

Upload: others

Post on 05-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

Developing services for open eprint archives:globalisation, integration and the impact of links

Steve Hitchcock, Les Carr, Zhuoan Jiao,Wendy Hall and Stevan Harnad

Multimedia Research Group, Department ofElectronics and Computer Science,

University of Southampton, SO17 1BJ, UKContact for correspondence: Steve Hitchcock

[email protected]

ABSTRACTThe rapid growth of scholarly information resourcesavailable in electronic form and their organisation by digitallibraries is providing fertile ground for the development ofsophisticated new services, of which citation linking will beone indispensable example. Many new projects, partnershipsand commercial agreements have been announced to buildcitation linking applications. This paper describes on theOpen Citation (OpCit) project, which will focus on linkingpapers held in freely accessible eprint archives such as theLos Alamos physics archives and the other distributedarchives, and which will build on the work of the OpenArchives initiative to make the data held in such archivesavailable to authorised services. The paper emphasises thework of the project in the context of emerging digital libraryinformation environments, explores how a range of newlinking tools might be combined and identifies ways inwhich different linking applications might converge. Someearly results of linked pages from the OpCit project arereported..

KEYWORDS: electronic publishing, digital libraryinformation architectures, citation linking, distributedcollections, eprint archives, open archives

INTRODUCTIONThe process of scholarly communication, in particular theaggregation of formal academic papers in journals, isprobably about to enter its fastest period of change since1994-6. The immediate platform for this change was theemergence of the World Wide Web as a popular medium in1994 and the subsequent conversion of most establishedjournals to electronic facsimiles delivered over the Web, aprocess which began to build momentum in 1996(Hitchcock et al. 1996). While growth in the number of e-journals continues to accelerate towards an estimated10,000 (Maclennan 1999, Hunter 1999), it is priordevelopments, such as the establishment of free electronicarchives, or eprint archives, that are beginning to influencewider changes, however.

Launched in 1991, the importance of the Los Alamosphysics eprint archive, the first and preeminent archive ofits kind, (Ginsparg 1994) is widely recognised, but itspractical ramifications have so far been largely confined toits home community in physics. Many have questioned,because of cultural differences between different academicdisciplines (Kling and McKim 1999), whether the eprintmodel will be accepted beyond physics. That contentionwill be challenged by the most significant new eprintarchives to have emerged since 1991: PubMed Central,launched at the beginning of this year and sponsored by theNational Institutes of Health, covering all fields inbiomedical and life sciences (Varmus 1999); and theComputing Research Repository (CoRR), sponsored by theACM and the British Computer Society (Halpern andLagoze 1998). An immediate effect of PubMedCentral hasbeen the announcement by some biomedical publishers,notably the British Medical Journal (Delamothe and Smith1999) and the Current Science Group (Anon. 1999), offreely accessible archives of electronic copies of currentand past papers, both pre- and post-publication. As well asdiscipline-based archives, institutionally based initiativessuch as Scholars Forum are planned. (Buck et al. 1999)

The essential feature of the Los Alamos eprint archivemodel is author self-archiving based on a free-to-archive,free-to-access service. Compared with journals, eprintarchives provide an almost wholly automated and highlyefficient organisational framework and distributionmechanism based on the Internet, but without many of theadditional services that journals provide, such as peerreview, and other services that mostly require humanintervention. As more archives attract more papers the testfor journals is how they respond to the effective loss ofexclusivity that most depend on, and how they cope withthe new economics of journal publishing in which everyfacet of traditional value-adding is re-evaluated against thefor-free services.

There is another dimension to consider too: globalisation.With no geographical or financial barriers the nextinevitable step is, if not universalisation where archives alladopt the same technical infrastructure, then integration ofthe archives based on new services. As part of this agendaarchives are not only freely accessible to users but arebecoming open to independent, third-party computationalprocesses, for example Web crawler techniques used forindexing, on which these services will be built. This

SPACE FOR ACM COPYRIGHT INFORMATION.REMEMBER TO DELETE THIS BEFORESUBMITTING FINAL VERSION. (USE ACOLUMN BREAK IN MS WORD TO STOP TEXTFROM OVERWRITING THIS AREA.

Page 2: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

approach is being formalised by the Open Archivesinitiative, a group that includes archive managers, potentialservice developers and academic librarians (Ginsparg et al.1999).

It is anticipated within the publishing industry that links oncitations within scholarly papers will be one of the primarynew services driving integration between scholarly sources(Needleman et al. 1999). Linking services cannot beimplemented piecemeal on such a scale. A large group ofjournal publishers have announced a collaboration (Reuters1999) to use the Digital Object Identifier (DOI) system(Davidson and Douglas 1998) to form citation links (Lyonsand Ratner 1999) between their, separately maintained,journal contents. Most traditional journals service providers- aggregators, subscription agents and secondary publishers(e.g. Brunelle and Johnson 1999) - as well as newproducers highlight the important role of citation links intheir services.This paper considers the implications of the new wave ofeprint archives and the development of open archives. Itsmain focus is an example of an open archive service beingdeveloped by the Open Citation (OpCit) project, funded bythe joint NSF-JISC international digital librariesprogramme, that will initially build citation links unitinglarge, high-profile and distributed archives but which isplanned to be extensible to other services that provideaccess to scholarly papers.

OPEN ARCHIVES: SEPARATING CONTENT FROMSERVICESEprint archives are noticeably entering a new phase. Notonly have significant new archives launched, but newservices are being developed to complement the traditionalcontent management functions of the archives. The keyfeature is that many of these new services will beindependent of the underlying archive contents. There are anumber of reasons for this. Growth in the number ofarchives and greater use create opportunities for third-partydevelopers, and such developments thus do not add to thelow-cost base of most archives and can be developed eithercommercially or non-commercially.More importantly, as the use of archives crosses newboundaries between disciplines, the role of new serviceswill be to enable the user to view an integrated, completeset, or selected subset, of all archives, with usercustomisation features to set the scope of a service likely tobecome increasingly common. For the archives predicatedon free and open access it is recognised that capitalising onthis feature means not constraining users within theboundaries of a single discipline or archive but supportingcross-disciplinary navigation. The aim is that navigationsupport such as indexing, searching and linking shouldprovide a consistent interface and seamless serviceregardless of which archives are accessed by the user, whoneed have no knowledge of the structure of an archive or,apart from perhaps noting the archive source and publisheridentities, from where a viewed document originated.Integration of free eprint archives through independentservices is currently the preferred view of the Open Archive

initiative (Ginsparg et al. 1999) of the way in which sucharchives may pave the way for a unified global scholarlyliterature, encompassing not just the archives but journalsand other contributing literatures.The way in which this level of service will be achieved willbe by supporting interoperability between the archives, forexample the protocols through which services cancommunicate with the archives, and metadata forms thatwill expose the data structures and the contents of archives.Methods will be open and published to encouragewidespread adoption, but also to ensure conformity of safeand trusted practices that do not compromise the integrityof the archives.A well established protocol for data communication indigital library applications is Dienst, developed at CornellUniversity (Marisa et al. 1998). An implementation of partof the Dienst protocol has been adopted by the OpenArchives initiative for data harvesting. The OpCit project,in which the Dienst research group is one of the principalpartners, is similarly building on this core technology for itslinking applications.

LINKING THE OPEN ARCHIVES: THE OPCIT PROJECTLinking is an apparently simple concept, especially themodel implemented by the Web in which a unidirectionalpoint-and-click event presents the user with the page fromthe location pointed at by a locator, the URL, that isauthored into the linking page. This simplified form ofhypertext linking, it has long been argued in the hypertextcommunity of researchers, is inadequate to support robustservices required for large-scale information environments,such as might be represented by the contents of scholarlycommunications and organised via digital libraries.In terms of citation linking services, an early pioneer andpredecessor of the OpCit project was the Open Journalproject (Hitchcock et al. 1998a). The informationenvironment envisaged in that project centred on publishedjournal contents but was essentially unbounded, allowinglinks to take users to other types of materials, such asabstracting and indexing services, dictionaries andbiological databases. In practice different implementationsof Open Journals had to be bounded, principally to managethe user interface more effectively.Originally supported by four publishers, by its close 12publishers were involved. It is safe to assume that manymore scholarly publishers now recognise the importanceand power of links in electronic documents, especially linkswhich implement the long-established, non-electronic formof linking inherent to the scholarly or scientific paper, thereference.With this wider participation has come the recognition thatlinking may not be as simple as originally envisaged(Caplan and Arms 1999). There are a number of reasons forthis. Expectations are high, of links from almost everyreference in every electronically-accessible paper, bothbackwards in time as provided by a typical reference list ina paper, and forwards in time in a manner made familiar bythe Science Citation Indexes (Hitchcock et al. 1998b).Constraining these expectations are accuracy, reliabilityand availability of the necessary contents in electronic form

Page 3: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

(Hitchcock et al. 1998c). There may be multiple, but notidentical, versions of the same document at multiplelocations (Caplan and Flecker 1999). Finally, there are thefinancial and authorisation barriers imposed by commercialjournals and services. Access requirements can differ foreach user, for each access location, for each document andfor each document location. Competing publishers may bebecoming allies to support cross-publisher referencelinking, but although there are various possible solutions tothe problem of linking across distributed collections, thereis as yet no convincing demonstration or detail of how thismight be achieved in this exclusively commercialenvironment (Atkins 1999).In this context linking is more than just a technical processbut must be viewed as part of the social and businessphenomena that are shaping the new informationenvironments. This is recognised in the scope andpartnerships that form the OpCit project. Ironically, in thefirst environments the project will explore, selected openarchives, such is the accessibility of so much content thatthe project could almost reduce the problem to its technicalissues, but the longer-term commitment is to collaboratingwith others in the wider environments discussed below.Three principal objectives elaborated by the project (seeoriginal proposal; Harnad et al. 1999) concern scale-compatibility-universality:• Scale: to hyperlink each of the over 100 000 papers in

Los Alamos's unique online physics archive to everyother paper in the archive that it cites

• Compatibility: to develop and integrate a family ofgeneric linking tools and to design author and userinterfaces to enable easy adoption by other archives

• Universality: to promote the power of this remarkablenew way of navigating the scientific journal literatureand induce authors in other fields to create interlinkedonline archives like Los Alamos across disciplines andaround the world

Primary partners in the project are Paul Ginsparg from LosAlamos, Southampton University with its expertise inlinking applications, also home of the Cogprints eprintarchive for cognitive sciences and a mirror site for the LosAlamos archives, and Cornell University which is a majorplayer in building NCSTRL based on Dienst. In relatedwork the Cornell group will apply the linking technologyused in OpCit to the ACM Digital Library.When the Open Journal project began in 1995 the linkingtools used then were all developed at SouthamptonUniversity. Now there are a range of tools created to suitdifferent linking application requirements:• The Distributed Link Service (Southampton

University): much modified since its use in the OpenJournal project, now more modular to work withcontributing software 'agent' applications, but stillessentially does the job of making links visible andactionable for users in documents of various formats atany Web location. (Carr et al. 1998)

• Research Index (NEC Research, Princeton): formerlycalled Citeseer, an impressive tool for resourcediscovery, data collection and indexing of documentsfound on the Web. (Lawrence et al. 1999) Citeseer has

been referred to as building an ISI-like service on theWeb, so there is another interesting parallel with theOpen Journal project which produced its mostsuccessful citation linking demonstrator using dataprovided by ISI.

• SFX (University of Ghent): similar linking capabilitiesas DLS, but tailored for knowledge management in asingle host library environment. (Van de Sompel andHochstenbach 1999a)

• SLinkS (Openly Informatics): a software-basedintermediary service that enables the URL to becalculated of a cited document published in a journaland held within the publisher's authorised service.Unlike the DOI service there are no identifiers, butcooperating publishers and content providers specifyrules that enable URL calculation from standardbibliographic information. New services can be builton this open specification. (Hellman 1998)

An early objective of the OPCit project is to examine thestructure of these tools at a software level and elaboratehow they might be used to work together, in technical termsexposing the respective application programming interfaces(APIs). By publishing these details it is hoped to establish ageneric foundation for emerging digital library linkingapplications, recognising that these research tools willconstantly change and evolve and that new tools will bedeveloped. This is a novel approach in that linkingapplications of all types have typically been tool based, andmany such tools, although often highly functional, havetended to be used independently.

Figure 1. Information environments and link services: arange of possible scenarios and tools for OpCit

Page 4: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

LINKING SERVICES AND INFORMATIONENVIRONMENTSInformation environments organised via digital librariescontinue to be part of the Web but are distinguished byservices that apply to contents that are deemed to be withinthem, determined not by physical location but by the natureand selection of those contents. Examples of theseenvironments might include the contents of libraries as inthe UK eLib-funded Distributed National ElectronicReserve (DNER), single-publisher collections such as theACM Digital Library, larger collections of publishedjournal papers accessed via DOI-based services, ordistributed archives such as the Networked ComputerScience Technical Report Library, NCSTRL. In essence, inthese environments the Web is transformed from adocument delivery service into a dynamic, computationalframework. The information environments being exploredfor citation linking by the OpCit project are outlined inFigure 1.

The scope of the project and that shown in Figure 1 isintentionally wide, although of immediate concern is thearea bounded by the left-hand vertical arrows and theinformation environments denoted by 'Southampton' and'Cornell'. The first results reported below specifically relateto Southampton's work with the Los Alamos physicsarchive.These scenarios are highly flexible and may be viewed inother ways by different applications. For example, theconnecting arrows are all possible scenarios, although suchapplications may not have been implemented yet. An SFXapplication linking various resources - notably a number ofabstracting and indexing (A&I) services, some publishers'full-text content as well as the Los Alamos archives - via anSFX database for library environments was described byVan de Sompel and Hochstenbach (1999b). It could beenvisaged that the multi-publisher linking initiativedescribed above might likely route through the DOIresolver to a library or aggregated journal environment. ISIhas announced a number of agreements with publishers tolink between Web of Science and full-texts. Each of theseapplications can be identified in some form in Figure 1.In addition, the roles imputed to the linking tools in Figure1 may not reflect their wider capabilities. Citeseer and SFXvariously contain information retrieval, database andlinking functions. This is not shown for Citeseer.Demonstrator services based on these tools have createduser interfaces. In this respect the tools could legitimatelybe indicated as part of the information environments at thetop of the diagram, but are not in this view. Apart fromCiteseer, methods for data extraction from the archives -mini-Dienst and new metadata conventions - that will beused to create citation databases are ongoing developmentsof the Open Archive initiative.Thus it can be seen how interchangeable these componentsare, and it is anticipated that this flexibility will be a driver

for significant innovation in citation linking. For now,Figure 1 should be considered a perspective onenvironments for citation linking held by the OpCit projectbut not necessarily by others.

Figure 2. Reference section of article hep-th/9907001with added links indicated by coloured boxes. For a

colour key see Figure 5

OPCIT: EARLY IMPLEMENTATIONS AND RESULTSThe process of adding citation links dynamically todocuments retrieved from an archive involves parsing thedocument during download to identify and read citations.The data are compared with a precompiled link or citationdatabase, and a link to the cited work added where an exactmatch is found. For more details, one method for doing thiswas described by Hitchcock et al. (1997).

Figure 3. Activating the link for reference [5] in Figure 2returns an SFX page offering the user a choice of

sources

A similar method has been adopted for OpCit, but in thiscase the application demands that a larger, richer citationdatabase is compiled. Broadly the stages involved in thecompilation of this database are:1. transforming original documents to a format (e.g. plain

text) for extracting citations;

Page 5: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

2. parsing documents to identify and read citations;3. design database schema for storing citation information

in an information-rich, easy-to-use and flexiblemanner, and accommodating future extensions.

Citeseer excels in this respect, but the algorithms havereportedly not been so successful when applied to the LosAlamos physics archive because the citations contain toolittle information. The project aims to build tools tosupplement the results produced by Citeseer for otherarchives.The importance of stage 3 is that richer citation databasescan provide users with useful information, apart fromlinking, effectively a highly automated version of Garfield'sfamous citation analyses:• which are the most frequently cited papers?• what papers refer to the current paper (forward

linking)?• who co-reference the same papers, and to what extent?

(identifying similar research interests)• in what context is a paper is cited? (i.e. citation

context)• which journals are the most popular?In a preliminary implementation of the linking model inOpCit in which one of the objectives was to integrateservices provided by some of the linking tools describedabove, a successfully linked citation directs the user to anintermediate page offering the user a choice: eitherdownload the text from the archive or look up somecontextual information on the citation. In this example thelinks in the original document are added by the DLS, andthe intermediate page is produced from an SFX databasewhich maintains some knowledge of the user privileges andcan offer all versions of the cited paper that are accessibleto the user (in this case only the archive versions - abstract,and link-enhanced and author-linked (the original) full textsare available, but in principle if the user or a librarysubscribes to the journal in which the cited paper waspublished that version could be linked from the SFX screentoo; also, other versions of the paper, in abstracting services

for example). The different stages of retrieval are shown inFigs 2-4. Contextual information would be retrieved from adatabase compiled by Citeseer (those results are not shownhere, but an example of this service can be tried online).

CITATION LINK ANALYSISWorking initially just within the Los Alamos physics eprintarchive, citations were analysed from a subset of paperssubmitted during 1999 to the end of October from onesection of the archive, hep-th (theoretical high-energyphysics), a total of 2170 papers. These papers containedover 65 000 citations.Citations in physics are notoriously terse. In papers in thearchive typically they include author names, and then eitheran archive identifier number or a standard abbreviation forthe journal title followed by some undifferentiatednumbers, usually volume number, start page number andthe year of publication (see Figure 2). Sometimes both thearchive ID and journal data are included.For our subset of documents the relative success inautomatically recognising and resolving to thecorresponding document in the bibliographic databasecompiled from the archive can be gauged from Figure 5.The colours in this chart correspond to the link colours inFigure 2. Some links were simply derived from explicitlycited IDs for the archive documents, others were derivedpurely from the bibliographic data in the citation. Wherearchive IDs are not included directly, they can alternativelybe derived from bibliographic data in the archive journal-ref metadata or from other more intensively-maintained,overlapping bibliographic databases in physics such asSPIRES. (SPIRES is maintained by the Stanford LinearAccelerator Center (SLAC), an associate partner in theOpCit project.)

Fig 4aFig 4b Fig 4c

Figure 4. Selecting the first three options from Figure 3: a, abstract of cited paper; b, PDF version enhanced withreference links; c, the same, original full-text from the archive (the dashed link is created by the archive on explicitly

cited identifiers). For a colour key see Figure 5

Page 6: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

Figure 5. Resolving and linking citations in a subset ofhep-th papers: what proportion could be linked, what

could not and why. For an example of a single referencelist showing this range of results see Figure 2

Where resolution of reference data against the database,and therefore linking, was unsuccessful there could anumber of reasons, and the relative occurrence of theseproblems is also shown in Figure 5. In some cases a citationwas correctly recognised, but the reference is too old tofeature in the archive, or a citation was recognised but notfound in the database (the cited paper is not in the archive).The remaining citations could not be resolved, possibly dueto poor formatting, incorrect data, etc. It can be seen thatjust over half of the references were successfully linkedwithin the archive. The number of successful links could beincreased significantly if the archives were supplementedwith other, older sources, online archival journals say.About 16 per cent of citations from this subset may neverbe resolvable (again, consistent with the experiencereported by Electronic Press).These percentages might be generalisable across thephysics archive but not necessarily to other archives orapplications, although it is interesting to compare broadmeasures of success in citation linking such as this example(52 per cent of citations linked within the archive) with thatreported by Electronic Press for Medline linking in which"on average only about 60% of references in a typicalmedical paper are contained in the Medline data". Of theseMedline citations only 85% were reliably resolved by thelinking software (Hitchcock et al. 1998c). Factors thatcontrol these figures include the size and accessibility ofthe archive and other document sources, and the accuracy,quality and completeness of the reference data.

OPCIT DESIGN AND EVALUATIONThere is another way tackle potentially unresolvablecitations for new and future submissions to the archive: atsource, when the papers are deposited by authors. As wellas developing linking services, important and interestingtasks for the project are citation analysis, interface design,and user testing and evaluation. Designing user interfacesfor the linked archives concerns not just navigation butauthor deposit too. It is argued that a barrier to widerparticipation in eprint archives beyond physics is the needfor more user-friendly procedures for authors. Given thatoriginal submissions to the archives. i.e. the preprints ratherthan the later refereed reprints, are unedited andunmoderated, the responsibility for the quality and accuracyof such submissions lies wholly with its authors, so the

challenge is to make the deposit process both easier andmore intuitive but with a greater degree of responsivenessand immediate feedback to encourage high standards ofcorrectness and completeness.Citation linking obviously depends strongly on theaccuracy of data provided by authors and so improvingcorrectness in this area is of particular interest to theproject. One idea is is to provide dynamic checking forreferences in newly submitted papers using the sameprocess as used to produce citation links and inviting theauthor to respond. In this case correct, linkable citationswould appear as standard links, but other citations could behighlighted by different link colours, say, indicatingpossible problems with a citation, suggesting a reason forthe problem and highlighting which ones the author couldusefully amend. Such a scheme might look like that shownin Figure 2 with link types as defined in Figure 5 (althoughthe model implementation was not designed for thispurpose).While each of the components developed by the projectwill be subjected to user evaluation by standard means, incitation analysis there is an inherent means of evaluatingcommon practice and usage of the archive by its mostimportant constituency: its authors.Since the project is experimenting with stored data from thearchive rather than real live data there are other effects thatcan be used to monitor user behaviour. The stored data is asnapshot of the archive contents at that time and usingsimple difference computation can be compared with laterdatasets to reveal the extent of changes, minor or major. Ofparticular interest is the proportion of pre-publicationpapers in the archive that are replaced by the finalpublished version. Only the authors of a paper can changeor update it in the archive, so there is no standard procedurefor replacing pre- with post-publication copies, but analysisis revealing some common patterns.

CONCLUSIONThe Internet is regarded by some as introducing a paradigmshift in communication (Valauskas 1997). With major newfree-to-use eprint archives joining the well established LosAlamos archives, and with progress towards integration andglobalisation of these archives motivated by the OpenArchives initiative, the new paradigm beckons for scholarlycommunication. For scholarly communication thesedevelopments bring closer the critical and long heralded(Dyson 1994) transition promised by the Internet in whichprimary value accrues to services rather than just to content.For any such transition the main issue is access. Thearchives provide access to content and projects such asOpCit and others are demonstrating how that can beexploited and value added through new services. (Harnad1998)For commercial scholarly publishing, which by its natureimposes financial barriers to access, the picture is less clear.It is ironic that as some major publishers agree to explorepossible mechanisms for managing links between theirjournals, they are failing to respond to the issue ofimproving access. For example, users can access primaryjournal papers via abstracting, indexing and aggregation

Page 7: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

services now transformed into electronic forms andsporting new brand names but mostly supported by familiarcorporate interests. A possible scenario was sketched byMorrow (1999):"Sites who have signed up for the ScienceDirect trialthrough BIDS now have access through four differentroutes ...* Selecting the Elsevier option for the BIDS route toScienceDirect does NOT preclude access to ScienceDirectvia Web of Science through MIMAS. Even after July 2000,Web of Science users (with the appropriate linking licence)at sites who have selected the Elsevier licensing option willcontinue to be able to access ScienceDirect material (inexactly the same way as those who sign a NESLI/Swetsagreement)."Individually these are perfectly good services that arelegitimately exploring new solutions at a critical moment ofchange. The question it raises, however, is does this helpusers? This is not a paradigm shift, but a muddle that is theresult of self-preservation driven by service providers ratherthan users. It arises because it fails to recognise the shift invalue from content to services. Some publishers thatmaintain close links with the research community, typicallyprofessional societies, have been required to pay closeattention to the development of the archives. Theyunderstand the distinction and have acted (e.g. the BMJ andthe Current Science Group), or are prepared to act (Doyle1999), to free their electronic archives and to free authors toself-archive.This paper has highlighted the real paradigm shift takingplace in scholarly communication, towards more open andaccessible information. This benefits linking services, butmany other digital library services can benefit too. A newphase of convergence in digital library services is beginningand is being driven by embracing pragmatic, widely sharedinterests throughout the scholarly community.

REFERENCES1. Anon. (1999) Science publishing - beginning of a

revolution. Current Science Group, press release, 26Aprilhttp://www.genomebiology.com/pressrelease26apr99.asp

2. Atkins, Helen (1999) The ISI Web of Science - Linksand Electronic Journals. D-Lib Magazine, Vol. 5 No. 9,Septemberhttp://www.dlib.org/dlib/september99/atkins/09atkins.html

3. Brunelle, Bette and Johnson, Dana (1999) Connectingthe Docs: New Models and New Tools to LinkBibliographic Databases and Full Text Journals. CNIFall 1999 Task Force Meeting: Project Briefingshttp://www.cni.org/tfms/1999b.fall/PBrief99Ftf.html

4. Buck, Anne M., Flagan, Richard C. and Coles, Betsy(1999) Scholar’s Forum: A New Model For ScholarlyCommunication. March

http://library.caltech.edu/publications/ScholarsForum

5. Caplan, Priscilla and Arms, William Y. (1999)Reference Linking for Journal Articles. D-LibMagazine, Vol. 5, No. 7/8, July/August http://www.dlib.org/dlib/july99/caplan/07caplan.html

6. Caplan, Priscilla and Flecker, Dale (1999) Choosing theAppropriate Copy. Draft version circulated to Ref-Links listserv <[email protected], 14 Octoberhttp://www.doi.org/mail-archive/ref-link/msg00060.html

7. Carr, L. A., Hall, W. and Hitchcock, S. (1998) LinkServices or Link Agents? Ninth ACM Conference onHypertext, Pittsburgh, Junehttp://www.staff.ecs.soton.ac.uk/~lac/LinksOrAgents.pdf

8. Davidson, Lloyd and Douglas, Kimberly (1998) DigitalObject identifiers: Promise and Problems for ScholarlyPublishing. Journal of Electronic Publishing, Vol. 4,No. 2, December http://www.press.umich.edu/jep/04-02/davidson.html

9. Delamothe, Tony and Smith, Richard (1999) Movingbeyond journals: the future arrives with a crash: Newways to disseminate research from NIH and the BMJ.British Medical Journal, Vol. 318, 19 June, 1637-1639http://www.bmj.com/cgi/content/full/318/7199/1637

10. D-Lib Magazine, Vol. 5 No. 10, Octoberhttp://www.dlib.org/dlib/october99/van_de_sompel/10van_de_sompel.html

11. Doyle, Mark (1999) Re: PubMedCentral. Messageposted on September 1998 American Scientist Forumlistserv <[email protected], 22 Septemberhttp://listserver.sigmaxi.org/scripts/wa.exe?A2=ind99&L=september98-forum&D=1&O=D&F=l&P=26434

12. Dyson, Esther (1994) Intellectual Property on the Net.Release 1.0, Decemberhttp://www.edventure.com/release1/1294.html

13. Ginsparg, Paul (1994) First Steps Towards ElectronicResearch Communication. Computers in Physics, Vol.8, No. 4, Jul/Aughttp://xxx.lanl.gov/blurb/blurb.ps.Z

14. Ginsparg, Paul, Luce, Rick and Van de Sompel, Herbert(1999) First meeting of the Open Archives initiative. 29Octoberhttp://vole.lanl.gov/ups/ups1-press.htm

Page 8: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

15. Halpern, Joseph Y. and Lagoze, Carl (1998) TheComputing Research Repository: Promoting the RapidDissemination and Archiving of Computer ScienceResearch. CoRR eprint, cs.DL/9812020, Decemberhttp://xxx.lan.gov/abs/cs/9812020

16. Harnad, S. (1998) On-Line Journals and Financial Fire-Walls. Nature, 395, September 10, 127-128http://www.cogsci.soton.ac.uk/~harnad/nature.html

17. Harnad, Stevan et al. (1999) Integrating and NavigatingEprint Archives Through Citation-Linking. Proposal toNSF-JISC International Digital Libraries ResearchProgrammehttp://www.cogsci.soton.ac.uk/~harnad/citation.html

18. Hellman, Eric S. (1998) Scholarly Link SpecificationFramework (SLinkS)http://www.openly.com/SLinkS/SLinkS.html

19. Hitchcock, S., Carr, L., Harris, S., Hey, J. M. N. andHall, W. (1997) Citation Linking: Improving Access toOnline Journals. Proceedings of the 2nd ACMInternational Conference on Digital Libraries, edited byRobert B. Allen and Edie Rasmussen, 1997 (New York,USA: Association for Computing Machinery), pp. 115-122http://journals.ecs.soton.ac.uk/acmdl97.htm

20. Hitchcock, Steve, Carr, Leslie and Hall, Wendy (1996)A Survey of STM Online Journals 1990-95: the CalmBefore the Storm. Directory of Electronic JournalsNewsletters and Academic Discussion Lists, edited byD. Mogge, sixth edition (Washington, D.C.:Association of Research Libraries), pp. 7-32http://journals.ecs.soton.ac.uk/survey/survey.html

21. Hitchcock, Steve, Carr, Leslie, Harris, Steve, Hall,Wendy, Probets, Steve, Evans, David and Brailsford,David (1998a) Linking Electronic Journals: Lessonsfrom the Open Journal Project. D-Lib Magazine,Decemberhttp://www.dlib.org/dlib/december98/12hitchcock.html

22. Hitchcock, Steve, Kimberley, Robert, Harris, Steve,Carr, Leslie and Hall, Wendy (1998b) Webs ofResearch: Putting the User in Control. InternetResearch and Information for Social Scientists (IRISS)conference, Bristol, Marchhttp://sosig.ac.uk/iriss/papers/paper42.htm

23. Hitchcock, Steve, Quek, Freddie, Carr, Leslie, Hall,Wendy, Witbrock, Andrew, Tarr, Ian (1998c) TowardsUniversal Linking for Electronic Journals. SerialsReview, Vol. 24, No. 1, Spring, 21-33

http://journals.ecs.soton.ac.uk/IFIP-SerRev98.html

24. Hunter, Karen (1999) Journals Online: PubMed Centraland Beyond, page 4. HMS Beagle: The BioMedNetMagazine, No. 61, September 3rd

http://www.biomednet.com/hmsbeagle/61/viewpts/page4 (free registration required)

25. Kling, Rob and McKim, Geoffrey (1999) Not Just aMatter of Time: Field Differences in the Shaping ofElectronic Media in Supporting ScientificCommunication. Journal of the American Society forInformation Sciencehttp://xxx.lanl.gov/ftp/cs/papers/9909/9909008.pdf

26. Lawrence, Steve, Giles, C. Lee and Bollacker, Kurt(1999) Digital Libraries and Autonomous CitationIndexing. IEEE Computer, Vol. 32, No. 6, 67-71http://www.neci.nj.nec.com/~lawrence/papers/aci-computer98/

27. Lyons, Catherine and Ratner, Howard (1999) DOIsUsed for Reference Linking: DOI-X (version 1.0).Octoberhttp://dx.doi.org/10.1000/6382-1

28. Maclennan, Birdie (1999) Presentation and AccessIssues for Electronic Journals in a Medium-SizedAcademic Institution. Journal of Electronic Publishing,Vol. 5, No. 1, Septemberhttp://www.press.umich.edu/jep/05-01/maclennan.html

29. Marisa, Richard, Davis, J.R. and Lagoze, C. (1998)Dienst protocol version 5.1. Decemberhttp://brolly.cit.cornell.edu/dienst51.html

30. Morrow, Terry (1999) BIDS services now linked toElsevier's ScienceDirect. Message posted on lis-scitechlistserv <[email protected], 27 Septemberhttp://www.mailbase.ac.uk/lists/lis-scitech/1999-09/0025.html

31. Needleman, Mark (1999) Meeting Report of the NISOLinking Workshop. Washington, D.C., Februaryhttp://www.niso.org/linkrpt.html

32. Reuters (1999) Scientific Minds Meet on the Net.Wired News, 17 Novemberhttp://www.wired.com/news/reuters/0,1349,32592,00.html

33. Valauskas, Edward J. (1997) Waiting for ThomasKuhn: First Monday and the evolution of electronicjournals. Journal of Electronic Publishing, Vol. 3, No.1, September http://www.press.umich.edu/jep/03-01/FirstMonday.html

Page 9: Developing services for open eprint archives ...opcit.eprints.org/dl00/htdl00-submitted.pdf · Developing services for open eprint archives: globalisation, integration and the impact

34. Van de Sompel, Herbert and Hochstenbach, Patrick(1999) Reference Linking in a Hybrid LibraryEnvironment, Part 2: SFX, a Generic Linking Solution.D-Lib Magazine, Vol. 5, No. 4, Aprilhttp://www.dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt2.html

35. Van de Sompel, Herbert and Hochstenbach, Patrick(1999b) Reference Linking in a Hybrid Library

Environment: Part 3: Generalizing the SFX solution inthe "SFX@Ghent & SFX@LANL" experiment.

36. Varmus, Harold (1999) Journals Online: PubMedCentral and Beyond, page 1. HMS Beagle: TheBioMedNet Magazine, No. 61, September 3rdhttp://www.biomednet.com/hmsbeagle/61/viewpts/page1 (free registration required)