28 April 2004 Second Nordic Conference on Scholarly Communication

Citation Analysis for the Free, Online Literature

Tim Brody

Intelligence, Agents, Multimedia Group

University of Southampton

Content

• Current services for Open Access Literature

• Institutional Archives Registry

• Metadata Harvesting through Celestial

• Citebase Search
  – Citation Linking
  – Search and Navigation Service

• Web Impact as a predictor of Citation Impact

Institutional Archives Registry

Sites in the IAR

• Things we want to know:
  – GNU EPrints sites
  – Other research collections (Other Archives, Open Journals)
  – BOAI 1. vs BOAI 2.

• A submission form consisting of:
  – URL, Name, OAI URL, Country, ‘type’, full-text, software

• Can’t (yet) track full-texts

• (Create a master-list so archives only register once?)

Celestial

• Designed to:
  – Be an abstraction over OAI-PMH versions
  – Cache OAI metadata records

• Technological questions:
  – How big can the OAI-PMH go (ok for 5 million records so far)
  – How reliable are OAI-PMH implementations

• Feeds Citebase, IAR, some external users
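
As an illustration of the kind of metadata harvesting described above, here is a minimal Python sketch of an OAI-PMH 2.0 `ListRecords` exchange. It is not Celestial's code: the function names, the `example.org` base URL and the sample response are assumptions made for this example. The sketch only builds the request URL and parses a response held in a string, so it runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        header = rec.find(OAI_NS + "header")
        title = rec.find(".//" + DC_NS + "title")
        records.append({
            "identifier": header.findtext(OAI_NS + "identifier"),
            "datestamp": header.findtext(OAI_NS + "datestamp"),
            "title": title.text if title is not None else None,
        })
    token = root.find(".//" + OAI_NS + "resumptionToken")
    token_text = token.text if token is not None and token.text else None
    return records, token_text


# Made-up response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2004-04-28</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A caching harvester would presumably repeat the request with each returned resumption token until none is given, storing the records as it goes.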

Services for Open Access Literature

[Diagram: layers of services for Open Access literature]

• Self-Archived Full-texts (Pre/Post-prints); Open Access Publishing

• Citation Analysis/Linking Services (Citebase / Citeseer / OpenURL / DOI)

• Version Linking Services

• Search Engines; Navigation Tools

• Analysis & Assessment

Labels in the diagram: Citebase, Citeseer, Google, BMC, arXiv.org, OAI-PMH Transport, OAIster, Scirus

n.b. Scirus/OAIster aren’t citation-analysis aware yet, Google indexes Citeseer. Not an exhaustive list …

Citation Analysis & Linking

• A citation is a reference from one work to another [as a hyperlink: a citation link]

• Citation analysis uses citation relationships to analyse patterns in research

• As a graph, a work (paper, book etc.) is a vertex and a citation is an edge

• ‘Bibliometrics’
  – (study of patterns in literature)
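
The graph view can be made concrete with a short Python sketch. The paper identifiers and function names below are hypothetical and chosen only for illustration. Each citation is stored as a directed edge from the citing work to the cited work, and a simple citation count falls out as the number of incoming edges.

```python
from collections import defaultdict

# Hypothetical citation links: (citing work, cited work).
CITATIONS = [
    ("paper_c", "paper_a"),
    ("paper_c", "paper_b"),
    ("paper_d", "paper_a"),
]


def build_graph(citations):
    """Return (references, cited_by) adjacency maps for a directed graph."""
    references = defaultdict(set)  # work -> works it cites (outgoing edges)
    cited_by = defaultdict(set)    # work -> works citing it (incoming edges)
    for citing, cited in citations:
        references[citing].add(cited)
        cited_by[cited].add(citing)
    return references, cited_by


def citation_counts(cited_by):
    """Number of incoming citation edges per cited work."""
    return {work: len(citers) for work, citers in cited_by.items()}


references, cited_by = build_graph(CITATIONS)
counts = citation_counts(cited_by)
```

Keeping both directions of each edge makes it cheap to answer both "what does this work cite?" and "what cites this work?".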

Digitometric/Infometric Analysis

• Bibliometrics for the online age

• Couple citation analysis with Web analysis
  – (how many times has x been accessed?)

• Similar to readership studies, but easier to survey and more comprehensive
  – (though subject to the same problems of copies being re-distributed, multiple accesses etc.)

Citebase Search

[Diagram: Citebase Search architecture. Components: Repositories; Metadata Harvest (OAI-PMH); Full-text Harvest; Meta Database; References Database; Citation Database; Web Interface; OAI-PMH Interface]

Citation Linking

• Retrieve and cache full-texts
  – LaTeX, PDF, XML

• Extract reference list

• Extract individual references

• Parse references into components
  – Author, year, title, journal, volume, pagination

• Store in structured database
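
The last two steps (parsing a reference into components and storing it) might look roughly like the Python sketch below. This is not Citebase's parser. It assumes a single, hypothetical reference layout, "Authors (Year) Title. Journal Volume, Pages.", whereas real reference lists vary far more widely. The table name `refs` and all function names are also invented for the example.

```python
import re
import sqlite3

# Assumed (hypothetical) reference layout:
#   Authors (Year) Title. Journal Volume, FirstPage-LastPage.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?)\s+(?P<volume>\d+),\s*"
    r"(?P<pages>\d+(?:-\d+)?)\.?$"
)


def parse_reference(text):
    """Split one reference string into components, or return None."""
    match = REFERENCE_PATTERN.match(text.strip())
    return match.groupdict() if match else None


def store_references(conn, citing_id, parsed_refs):
    """Store parsed references for one citing work in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS refs ("
        "citing_id TEXT, authors TEXT, year INTEGER, title TEXT, "
        "journal TEXT, volume TEXT, pages TEXT)"
    )
    conn.executemany(
        "INSERT INTO refs VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (citing_id, r["authors"], int(r["year"]), r["title"],
             r["journal"], r["volume"], r["pages"])
            for r in parsed_refs
        ],
    )
    conn.commit()


# Made-up reference string for illustration.
raw = "Smith, J. and Jones, K. (2001) A study of citation patterns. Journal of Examples 12, 345-350."
parsed = parse_reference(raw)

conn = sqlite3.connect(":memory:")
store_references(conn, "oai:example.org:1", [parsed])
stored = conn.execute("SELECT COUNT(*) FROM refs").fetchone()[0]
```

References that do not fit the assumed layout come back as `None`, so a caller can count or log the unparsed ones rather than storing partial data.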

Citebase Search

Citebase Search: Navigation by Citation Links

[Diagram labels: Current Article; Co-cited; Article with reference list; Reference link; Future; Past; Related]
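
One way to read this slide is sketched below in Python. The mapping is an interpretation, not something the slide spells out: "Past" is taken to mean the works the current article references, "Future" the works that cite it, and "Related" the works co-cited with it (cited by at least one of the same articles). All identifiers are hypothetical.

```python
def navigate(current, citations):
    """Return the past, future and co-cited neighbours of `current`.

    `citations` is a list of (citing, cited) pairs.
    """
    # Past: works that the current article references.
    past = {cited for citing, cited in citations if citing == current}
    # Future: works that cite the current article.
    future = {citing for citing, cited in citations if cited == current}
    # Co-cited: other works cited by an article that also cites `current`.
    co_cited = {
        cited
        for citing, cited in citations
        if citing in future and cited != current
    }
    return {"past": past, "future": future, "co_cited": co_cited}


# Hypothetical links: b cites a; c cites b and x; d cites b and y.
LINKS = [("b", "a"), ("c", "b"), ("c", "x"), ("d", "b"), ("d", "y")]
neighbours = navigate("b", LINKS)
```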

Predicting Citation Impact

• The Web gives us access to new metrics
  – Download/access frequency

• Can early-day ‘download’ frequency give an indication of longer-term citation frequency?

• (Web logs from the UK arXiv.org mirror, Citation data from Citebase Search)

• Pearson correlation after 6 months of web logs = 0.42 for the High Energy Physics sub-arXiv
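
For readers unfamiliar with the statistic, the Pearson correlation between per-paper download counts and citation counts can be computed as in the Python sketch below. The numbers are invented for illustration and are unrelated to the arXiv.org logs or to the 0.42 result reported above. The function name is also an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-paper counts: early downloads and later citations.
downloads = [10, 20, 30, 40]
citations = [1, 3, 2, 5]
r = pearson_r(downloads, citations)
```

A value near 1 means papers with more early downloads tend to collect more citations later. A value near 0 means download counts say little about later citations.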

[Figure: Correlation (r), axis from 0 to 0.5, plotted against days since deposit, axis from 0 to 800]

Assessing Research(ers)

• Citation Impact
  – By-Paper, Author, [Journal, Institution]

• Web Impact
  – Predictor of citation-impact, combine with citation-impact

• Search Engines

• More detailed research assessment

Comparing Online/Offline Impact

• Using ISI CD-ROM data

• Use Web crawlers to find ‘online’ articles

• Compare citation impact of online and offline articles
  – By discipline, by journal, by author?

• Initial results for Physics show 2-3x increase
  – arXiv.org

• Southampton, U. Quebec, Oldenburg (de)

Relevant Web Pages

• EPrints
  – http://www.eprints.org/
  – IAR: http://archives.eprints.org/

• Citebase Search
  – http://citebase.eprints.org/

• Celestial
  – http://celestial.eprints.org/

• Correlation Generator
  – http://citebase.eprints.org/analysis/correlation.php

• Tim Brody <tdb01r@ecs.soton.ac.uk>
