Page 1

Research Library, Los Alamos National Laboratory

@ November, 2007, NISO - Dallas, TX

Digital Library Research & Prototyping Team

The MESUR project: an update from the trenches..

Johan Bollen

Digital Library Research & Prototyping Team

Los Alamos National Laboratory - Research Library

jbollen@lanl.gov

Acknowledgements:

Herbert Van de Sompel (LANL), Marko A. Rodriguez (LANL), Lyudmila L. Balakireva (LANL), Wenzhong Zhao (LANL), Aric Hagberg (LANL)

Research supported by the Andrew W. Mellon Foundation.

NISO Usage Data Forum • November 1-2, 2007 • Dallas, TX

Page 2

Citations and the journal impact factor.

Journal Impact Factor: mean 2-year citation rate

2003 citations to 2001 and 2002 articles in X, divided by the number of articles published in X in 2001 and 2002

[Figure labels: 2001, 2002, 2003; Journal X; All (2003)]

Citation data:
• Gold standard of scholarly evaluation
• Citation = scholarly influence
• Extracted from published materials
• Main bibliometric data source for scholarly evaluation

[Figure labels: low influence, high influence]

Widely applied:
• Fair approximation of journal “status”, … but
• Used to rank authors, departments, institutions, regions, nations, etc.
• Now common in tenure, promotion and other evaluation procedures!
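The impact factor definition on this slide can be written as a one-line calculation. The sketch below is illustrative only: the function name and the citation and article counts are hypothetical, not values from the talk.

```python
def impact_factor(citations_in_year: int, articles_prev_two_years: int) -> float:
    """Mean 2-year citation rate: citations received in year Y to articles
    from years Y-1 and Y-2, divided by the number of those articles."""
    if articles_prev_two_years <= 0:
        raise ValueError("journal published no articles in the two-year window")
    return citations_in_year / articles_prev_two_years


# Hypothetical journal X: 300 citations in 2003 to its 2001 and 2002 articles,
# and 70 + 80 articles published in 2001 and 2002.
if_2003 = impact_factor(300, 70 + 80)
print(if_2003)  # 2.0
```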

Page 3

From ranking to assessment

Ranking 0.6:
- single criterion of value
- single data source, e.g. citation data
- impact: frequentist, e.g. average citation rates (ISI Impact Factor)

[Figure labels: Mean citation rate; best, good, not so good, bad]

“Assessment 3.0”:
- situate item in value landscape
- multiple sources of scholarly information, e.g. citation, usage and bibliographic data
- impact: relational indicators: trust, reputation, centrality, etc.

[Figure labels: ?, ?, ?]

1 significant problem: which dimensions to choose?

Page 4

Many possible facets of scholarly impact

Page 5

The MESUR project.

Johan Bollen (LANL): Principal investigator.
Herbert Van de Sompel (LANL): Architectural consultant.
Aric Hagberg (LANL): Mathematical and statistical consultant.
Marko Rodriguez (LANL): PhD student (Cognitive Science, UCSC).
Lyudmila Balakireva (LANL): Database management and development.
Wenzhong Zhao (LANL): Data processing, normalization and ingestion.

“The Andrew W. Mellon Foundation has awarded a grant to Los Alamos National Laboratory (LANL) in support of a two-year project that will investigate metrics derived from the network-based usage of scholarly information. The Digital Library Research & Prototyping Team of the LANL Research Library will carry out the project. The project's major objective is enriching the toolkit used for the assessment of the impact of scholarly communication items, and hence of scholars, with metrics that derive from usage data.”

Page 6

Usage data

Citation data pertain to 4 levels in the scholarly communication process:
• Community: authors of journal articles.
• Artifacts: journal articles.
• Data: citation data (+1 year publication delay).
• Metrics: mean citation rate reigns supreme.
• Scale: expensive to extract.

However, for usage data:
• Community: all users including most authors.
• Artifacts: all that is accessible.
• Data: recorded upon publication.
• Metrics: a range of web and web2.0 inspired metrics, e.g. clickstream and datamining.
• Scale: automatically recorded at point of service.

Hence, various initiatives focused on usage data: COUNTER, IRS, SUSHI, CiteBase. But where are the metrics?

Page 7

Divergence and convergence.

CSU
Rank  Usage PR  IF (2003)  Title (abbv.)
1     78.565    21.455     JAMA-J AM MED ASSOC
2     71.414    29.781     SCIENCE
3     60.373    30.979     NATURE
4     40.828    3.779      J AM ACAD CHILD PSY
5     39.708    7.157      AM J PSYCHIAT

LANL
Rank  Usage PR  IF (2003)  Title (abbv.)
1     60.196    7.035      PHYS REV LETT
2     37.568    2.950      J CHEM PHYS
3     34.618    1.179      J NUCL MATER
4     31.132    2.202      PHYS REV E
5     30.441    2.171      J APPL PHYS

MSR
Rank  Usage PR  IF (2005)  Title (abbv.)
1     15.830    30.927     SCIENCE
2     15.167    29.273     NATURE
3     12.798    10.231     PNAS
4     10.131    0.402      LECT NOTES COMP SCI
5     8.409     5.854      J BIOL CHEM

Convergence!
• Open research questions:
  o Is this guaranteed?
  o To what? A common baseline?
• What we do know:
  o Institutional perspective can be contrasted to baseline.
  o As aggregation increases in size, so does value.
  o Cross-validation is key.

Page 8

MESUR (pronounced “measure”): Metrics from Scholarly Usage of Resources.

Andrew W. Mellon Foundation funded study of usage-based metrics (2006-2008)

Executed at the Digital Library Research and Prototyping team, Los Alamos National Laboratory Research Library

Objectives:
1. Create a model of the scholarly communication process.
2. Create a large-scale reference data set (semantic network) that relates all relevant bibliographic, citation and usage data according to (1).
3. Characterize reference data set.
4. Survey usage-based metrics on basis of reference data set.

Page 9

Project data flow and work plan.

[Figure: data flow diagram with steps numbered 1 to 4]

Page 10

Project timeline.

[Figure: project timeline with a “We are here!” marker]

Page 11

Presentation structure: an update on the MESUR project

1) Usage data characterization
2) Analysis
   1) Usage graphs
   2) Metrics analysis
3) Results
4) Discussion

Page 12

Presentation structure: an update on the MESUR project

1) Usage data characterization
2) Analysis
   1) Usage graphs
   2) Metrics analysis
3) Results
4) Discussion

Page 13

Usage data providers

Publisher recorded usage data:
- COUNTER statistics
- Item-level data

Aggregator recorded usage data:
- COUNTER statistics
- Item-level data

Institutionally recorded usage data:
- SFX logs
- Custom log formats (XML)

Page 14

Usage data collected

• Data: > 1B usage events and 1B citations
  o At this point, 247,083,481 usage events loaded
  o Another +1,000,000,000 on the way
• Documents: > 50M documents
• Journals: 326,000
• Community: > 100M users and authors combined

Data providers:
• Publishers: 8
• Aggregators: 3
• Institutions: 4

Page 15

Data acquired: timelines

Span:
• Majority: -1 years
• Some minor 2002-2003 data

Sharing models:
• Historical data for period [t-x,t]
• Periodical updates

Main issues:
• Restoration of archives
• Digital preservation issues
  o All data fields intact
  o Integration of various sources of usage data

Page 16

Presentation structure: an update on the MESUR project

1) Usage data characterization
2) Analysis
   1) Usage graphs
   2) Metrics analysis
3) Results
4) Discussion

Page 17

MESUR’s usage data representation framework.

Assumptions:
1. Sessions identify sequences of events (same user - same document)
2. Documents tied to aggregate request objects
3. Request objects consist of series of service requests and the date and time at which each request took place

Implications:
1. Sequence preserved
2. Most usage data and statistics can be reconstructed from framework
3. Lends itself to XML and RDF format
4. Permits request type filtering

Example:
1. COUNTER stats = aggregate request counts for each document (journal) grouped by date/time (month)
2. Usage graph: overlay sessions with same document pairs

• Out of 13 MESUR providers so far, only 3 natively follow this model.
• The usage data of another 8 contains the necessary information for conversion.
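A minimal sketch of how such a framework might be represented, assuming a flat list of request records. The class, field names and sample values are hypothetical illustrations, not MESUR's actual schema. It shows the two implications named on the slide: COUNTER-style monthly counts can be reconstructed (with optional request type filtering), and the sequence of documents within a session is preserved.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Request:
    """One service request within a session (all field names are hypothetical)."""
    session_id: str
    document_id: str   # e.g. a journal or article identifier
    request_type: str  # e.g. "fulltext" or "abstract"
    timestamp: datetime


def monthly_counts(requests, request_type=None):
    """Aggregate request counts per (document, year-month), optionally by type."""
    counts = Counter()
    for r in requests:
        if request_type is not None and r.request_type != request_type:
            continue
        counts[(r.document_id, r.timestamp.strftime("%Y-%m"))] += 1
    return counts


def sessions(requests):
    """Group document ids by session, preserving time order within each session."""
    grouped = {}
    for r in sorted(requests, key=lambda r: r.timestamp):
        grouped.setdefault(r.session_id, []).append(r.document_id)
    return grouped


# Hypothetical log entries
log = [
    Request("s1", "journalA", "fulltext", datetime(2006, 3, 1, 10, 0)),
    Request("s1", "journalB", "abstract", datetime(2006, 3, 1, 10, 5)),
    Request("s2", "journalA", "fulltext", datetime(2006, 4, 2, 9, 0)),
]
print(monthly_counts(log, "fulltext"))
print(sessions(log))
```

Aggregated statistics follow from the item-level records, but not the other way round, which is why only item-level data supports the session-based analysis.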

Page 18

Implications for structural analysis of usage data

• Sequence preservation allows:
  o Reconstruction of user behavior
  o Usage graphs!
• Statistics do not allow this type of analysis BUT are useful for:
  o validating results
  o rankings

Page 19

How to generate a usage graph.

Documents are associated by co-occurrence in same session

• Same session, same user: common interest

• Frequency of co-occurrence in same session: degree of relationship

Usage data:

• Works for journals and articles

• Anything for which usage was recorded

Options:

• Strict pair-wise sequence

• All within session

Note: not something we invented. Association rule learning in data mining. Beer and diapers!
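The two options on this slide can be sketched as follows. This is an illustration under stated assumptions, not the MESUR implementation: sessions are given as ordered lists of document ids, the function name and `strict_pairs` flag are hypothetical, and edge weights are raw co-occurrence frequencies.

```python
from collections import Counter
from itertools import combinations


def usage_graph(sessions, strict_pairs=True):
    """Return a Counter mapping (doc_a, doc_b) to co-occurrence frequency."""
    edges = Counter()
    for docs in sessions:
        if strict_pairs:
            # Option 1: only consecutive requests, kept as directed pairs.
            pairs = zip(docs, docs[1:])
        else:
            # Option 2: every unordered pair of distinct documents in the session.
            pairs = combinations(sorted(set(docs)), 2)
        for a, b in pairs:
            if a != b:  # skip repeated requests for the same document
                edges[(a, b)] += 1
    return edges


# Hypothetical sessions
example_sessions = [
    ["journal1", "journal2", "journal3"],
    ["journal1", "journal2"],
]
strict = usage_graph(example_sessions, strict_pairs=True)
loose = usage_graph(example_sessions, strict_pairs=False)
print(strict)
print(loose)
```

The strict variant yields fewer, directed edges; the all-within-session variant also links journal1 and journal3, which never appear consecutively.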

Page 20

Usage graphs

[Figure labels: journal1, journal2]

MESUR graph created:
• 200M usage events
• Usage restricted to 2006
• Journals clipped to the 7600 journals in the 2004 JCR
• Pair-wise sequences
  o Within session, only consecutive pairs
  o Raw frequency weights

Page 21

Tracing the flow of information.

Marko Rodriguez

Page 22

Presentation structure: an update on the MESUR project

1) Usage data characterization
2) Analysis
   1) Usage graphs
   2) Metrics analysis
3) Results
4) Discussion

Page 23

Metric types

Note:
• Metrics can be calculated both on citation and usage data
• Structural metrics require graphs
  o Citation graph, e.g. 2004 JCR
  o Usage graph, e.g. created by MESUR

Wasserman (1994) Social network analysis

Page 24

Frequentist metrics

Raw cites:
• Count number of citations to document or journal
• Count number of times document or journal was accessed

Normalized:
• Journal Impact Factor:
  o Number of citations to journal
  o Divided by number of articles published in journal
• Usage Impact Factor:
  o Number of requests for journal or article
  o Divided by number of articles published in journal

Johan Bollen. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. Journal of the American Society for Information Science and Technology, 59(1), 2008
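As a rough illustration of the normalized usage metric described above, the sketch below divides raw request counts by article counts per journal. The function name, input shapes and numbers are assumptions made for the example; the cited paper defines the actual metric and its time windows.

```python
from collections import Counter


def usage_impact_factor(requests, articles_published):
    """requests: iterable of journal ids, one entry per recorded request.
    articles_published: dict mapping journal id to number of articles published.
    Returns journal id -> requests per published article."""
    raw = Counter(requests)  # raw (frequentist) usage counts
    return {j: raw[j] / n for j, n in articles_published.items() if n > 0}


# Hypothetical data: J1 was requested 4 times, J2 twice.
requests = ["J1", "J1", "J2", "J1", "J2", "J1"]
articles = {"J1": 2, "J2": 4}
uif = usage_impact_factor(requests, articles)
print(uif)  # {'J1': 2.0, 'J2': 0.5}
```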

Page 25

Structural metrics calculated from usage graph

Classes of metrics:
• Degree
• Shortest path
• Random walk
• Distribution

Degree:
• In-degree
• Out-degree

Shortest path:
• Closeness
• Betweenness
• Newman

Random walk:
• PageRank
• Eigenvector

Distribution:
• In-degree entropy
• Out-degree entropy
• Bucket Entropy

Each can be defined to take into account weights, i.e. an unweighted and weighted variant.
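A small sketch of the degree and distribution classes, in unweighted and weighted variants. The slides do not define in-degree entropy; this example assumes it is the Shannon entropy of a node's incoming edge weights, which is one plausible reading. The function name and edge data are hypothetical.

```python
import math
from collections import defaultdict


def degree_metrics(edges):
    """edges: dict mapping (source, target) to a positive weight.
    Returns node -> dict of unweighted and weighted degree metrics."""
    in_w = defaultdict(dict)   # target -> {source: weight}
    out_w = defaultdict(dict)  # source -> {target: weight}
    for (s, t), w in edges.items():
        in_w[t][s] = w
        out_w[s][t] = w
    nodes = set(in_w) | set(out_w)
    result = {}
    for n in nodes:
        weights = list(in_w[n].values())
        total = sum(weights)
        # Assumed definition: Shannon entropy (bits) of incoming weight shares.
        entropy = -sum((w / total) * math.log2(w / total) for w in weights) if total else 0.0
        result[n] = {
            "in_degree": len(in_w[n]),
            "out_degree": len(out_w[n]),
            "weighted_in_degree": total,
            "weighted_out_degree": sum(out_w[n].values()),
            "in_degree_entropy": entropy,
        }
    return result


# Hypothetical weighted usage graph
edges = {("A", "C"): 2, ("B", "C"): 2, ("C", "A"): 1}
metrics = degree_metrics(edges)
print(metrics["C"])
```

Under this reading, a node reached evenly from many sources scores higher on entropy than one reached mostly from a single source, even when their in-degrees match.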

Page 26

Social network metrics: different aspects of impact I

Degree metrics:
- In-degree/IF
- Degree centrality

Shortest path metrics:
- Closeness centrality
- Betweenness centrality
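To make the shortest-path class concrete, here is a sketch of closeness centrality on an unweighted directed graph using breadth-first search. Several normalizations of closeness exist; this one (reachable nodes divided by total distance) is an assumption for illustration, and the graph is hypothetical.

```python
from collections import deque


def closeness(adjacency, node):
    """Closeness of `node`: number of reachable nodes divided by the sum of
    shortest-path lengths to them (unweighted, following edge direction)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical graph: A reaches B and C in one step, D in two.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": []}
print(closeness(graph, "A"))  # 0.75
```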

Page 27

Social network metrics: different aspects of impact II

Random walk metrics, e.g. PageRank

Basic idea:
- Random walkers follow edges
- + Probability of random teleportation
- Visitation numbers converge ~ PageRank
- “Stationary probability distribution”

[Figure: from wikipedia.org]
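The random-walk idea above can be sketched with a plain power iteration. This is a textbook-style version, not the code used by MESUR; the damping value 0.85, the iteration count, the uniform handling of nodes without outgoing edges, and the example graph are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """edges: dict mapping (source, target) to a positive weight.
    Returns node -> score; the scores sum to 1."""
    nodes = sorted({n for pair in edges for n in pair})
    out_total = {n: 0.0 for n in nodes}
    for (s, _), w in edges.items():
        out_total[s] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Walkers on nodes without outgoing edges teleport uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new = {n: base for n in nodes}
        # Remaining walkers follow edges in proportion to edge weight.
        for (s, t), w in edges.items():
            new[t] += damping * rank[s] * w / out_total[s]
        rank = new
    return rank


# Hypothetical graph: a cycle A -> B -> C -> A, plus D -> A.
edges = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1, ("D", "A"): 1}
scores = pagerank(edges)
print(scores)
```

After enough iterations the scores approximate the stationary distribution: A, which receives walkers from both C and D, ranks highest, and D, which nothing links to, keeps only its teleportation share.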

Page 28

Presentation structure: an update on the MESUR project

1) Usage data characterization
2) Analysis
   1) Usage graphs
   2) Metrics analysis
3) Results
4) Discussion

Page 29

Set of metrics calculated on MESUR data set

List of metrics:

JCR 2004: CITE-BE, CITE-ID, CITE-IE, CITE-IF, CITE-OD, CITE-OE, CITE-PG, CITE-UBW, CITE-UBW-UN, CITE-UCL, CITE-UCL-UN, CITE-UNM, CITE-UNM-UN, CITE-UPG, CITE-UPR, CITE-WBW, CITE-WBW-UN, CITE-WCL, CITE-WCL-UN, CITE-WID, CITE-WNM, CITE-WNM-UN, CITE-WOD, CITE-WPR

Usage-based metrics:

MESUR 2006: USES-BE, USES-ID, USES-IE, USES-OD, USES-OE, USES-PG, USES-UBW, USES-UBW-UN, USES-UCL, USES-UCL-UN, USES-UNM, USES-UNM-UN, USES-UPG, USES-UPR, USES-WBW, USES-WBW-UN, USES-WCL, USES-WCL-UN, USES-WID, USES-WNM, USES-WNM-UN, USES-WOD, USES-WPR

Page 30

Overlaps and discrepancies: Usage Impact Factor

Page 31

Overlaps and discrepancies

Citation PageRank vs. citation IF

For Computer Science only

- Map metrics themselves
- We can examine structure of metric correlations and arrive at a structural model of metric relationships
- Metric semantics -> what do they tell us about scholarly impact?
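One way to examine the structure of metric correlations is a pairwise rank correlation matrix over the journals. The slides do not say which correlation measure was used; the sketch below assumes Spearman rank correlation, and the metric names and values are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical metric values for the same four journals
journal_metrics = {
    "impact_factor": [4.0, 3.0, 2.0, 1.0],
    "pagerank": [0.40, 0.30, 0.20, 0.10],
    "in_degree_entropy": [1.0, 2.0, 3.0, 4.0],
}
names = list(journal_metrics)
matrix = {(a, b): spearman(journal_metrics[a], journal_metrics[b]) for a in names for b in names}
print(matrix[("impact_factor", "pagerank")])
print(matrix[("impact_factor", "in_degree_entropy")])
```

A matrix like this can then be fed into clustering or mapping techniques to see which metrics group together.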

Page 32

Citation rankings

2004 Impact Factor
Rank  Value   Journal
1     49.794  CANCER
2     47.400  ANNU REV IMMUNOL
3     44.016  NEW ENGL J MED
4     33.456  ANNU REV BIOCHEM
5     31.694  NAT REV CANCER

Citation PageRank
Rank  Value   Journal
1     0.0116  SCIENCE
2     0.0111  J BIOL CHEM
3     0.0108  NATURE
4     0.0101  PNAS
5     0.006   PHYS REV LETT

Betweenness
Rank  Value  Journal
1     0.076  PNAS
2     0.072  SCIENCE
3     0.059  NATURE
4     0.039  LECT NOTES COMPUT SC
5     0.017  LANCET

Closeness
Rank  Value     Journal
1     7.02e-05  PNAS
2     6.72e-05  LECT NOTES COMPUT SC
3     6.43e-05  NATURE
4     6.37e-05  SCIENCE
5     6.37e-05  J BIOL CHEM

In-degree
Rank  Value  Journal
1     3448   SCIENCE
2     3182   NATURE
3     2913   PNAS
4     2190   LANCET
5     2160   NEW ENGL J MED

In-degree entropy
Rank  Value  Journal
1     9.849  LANCET
2     9.748  SCIENCE
3     9.701  NEW ENGL J MED
4     9.611  NATURE
5     9.526  JAMA

Page 33

Usage rankings

2004 Impact Factor
Rank  Value   Journal
1     49.794  CANCER
2     47.400  ANNU REV IMMUNOL
3     44.016  NEW ENGL J MED
4     33.456  ANNU REV BIOCHEM
5     31.694  NAT REV CANCER

In-degree
Rank  Value  Journal
1     4195   SCIENCE
2     4019   NATURE
3     3562   PNAS
4     2438   J BIOL CHEM
5     2432   LECT NOTES COMPUT SC

In-degree entropy
Rank  Value  Journal
1     9.364  MED HYPOTHESES
2     9.152  PNAS
3     9.027  LIFE SCI
4     8.939  LANCET
5     8.858  INT J BIOCHEM CELL B

Betweenness
Rank  Value  Journal
1     0.035  SCIENCE
2     0.032  NATURE
3     0.020  PNAS
4     0.017  LECT NOTES COMPUT SC
5     0.006  LANCET

Closeness
Rank  Value  Journal
1     0.670  SCIENCE
2     0.665  NATURE
3     0.644  PNAS
4     0.591  LECT NOTES COMPUT SC
5     0.587  BIOCHEM BIOPH RES CO

Citation PageRank
Rank  Value   Journal
1     0.0016  SCIENCE
2     0.0015  NATURE
3     0.0013  PNAS
4     0.0010  LECT NOTES COMPUT SC
5     0.0008  J BIOL CHEM

Page 34

Metrics relationship

Page 35

Metrics relationships

• Citation and usage metrics reveal an entirely different pattern
  o Citation is split in 2 sections:
    - Degree metrics (right)
    - Shortest path and random walk (left)
  o Usage is split in 4 clusters:
    - Degree metrics
    - PageRank and entropy
    - Closeness
    - Betweenness
• Usage pattern can be caused by:
  o Noise in usage graph
  o Higher density of usage/nodes

Page 36

Hierarchical analysis

Page 37

Presentation structure: an update on the MESUR project

1) Usage data characterization
2) Analysis
   1) Usage graphs
   2) Metrics analysis
3) Results
4) Discussion

Page 38

MESUR: an update

Usage data:
• Creation of single largest reference data set of usage, citation and bibliographic data
• +1,000,000,000 usage events loaded in next month
• Usage data obtained from multiple publishers, aggregators and institutions
• Infrastructure for a continued research program in this domain
• Results will guide scholarly evaluation and may help produce standards for usage data representation

Usage graphs:
• Adequate data model for item-level usage data naturally leads to this
• Reduced distortion compared to raw usage: structure counts, not raw hits
• Several options on how to create: MESUR investigates options

Metrics:
• Frequentist and structural metrics
• Each can represent different facets of scholarly impact
• Simple metrics can produce adequate results, e.g. in-degree vs. PageRank. How far should we go?
• Note increasing convergence of usage metrics to citation metrics as sample increases.

Page 39

Some relevant publications.

Marko A. Rodriguez, Johan Bollen and Herbert Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage, In Proceedings of the Joint Conference on Digital Libraries, Vancouver, June 2007

Johan Bollen and Herbert Van de Sompel. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. (cs.DL/0610154)

Johan Bollen and Herbert Van de Sompel. An architecture for the aggregation and analysis of scholarly usage data. In Joint Conference on Digital Libraries (JCDL2006), pages 298-307, June 2006.

Johan Bollen and Herbert Van de Sompel. Mapping the structure of science through usage. Scientometrics, 69(2), 2006.

Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel. Journal status. Scientometrics, 69(3), December 2006 (arxiv.org:cs.DL/0601030)

Johan Bollen, Herbert Van de Sompel, Joan Smith, and Rick Luce. Toward alternative metrics of journal impact: a comparison of download and citation data. Information Processing and Management, 41(6):1419-1440, 2005.