research library, los alamos national laboratory @ november, 2007, niso - dallas, tx digital library...
TRANSCRIPT
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
The MESUR project: an update from the trenches..
Johan BollenDigital Library Research & Prototyping Team
Los Alamos National Laboratory - Research Library
Acknowledgements:
Herbert Van de Sompel (LANL), Marko A. Rodriguez (LANL), Lyudmila L. Balakireva (LANL)
Wenzhong Zhao (LANL), Aric Hagberg (LANL)
Research supported by the Andrew W. Mellon Foundation.
NISO Usage Data Forum • November 1-2, 2007 • Dallas, TX
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Citations and the journal impact factor.
Journal Impact Factor: mean 2-year citation rate
2003 citations to 2001 and 2002 articles in X
divided by
number of articles published in X in 2001 and 2002
20032001 2002
Journal x All (2003)
Citation data:• Golden standard of scholarly evaluation• Citation = scholarly influences.• Extracted from published materials.• Main bibliometric data source for scholarly
evaluation.
Lowinfluence
Highinfluence
Widely applied• Fair approximation of journal “status”,…but• Used to rank authors, departments, institutions, regions, nations,
etc.• Now common in tenure, promotion and other evaluation
procedures!
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
From ranking to assessment
best
good
not so good
Ranking 0.6:- single criterion of value- single data source, e.g. citation data- impact: frequentist, e.g. average citation rates
(ISI Impact Factor)
“Assessment 3.0”:- situate item in value landscape- multiple sources of scholarly information, e.g. citation, usage and bibliographic data.- impact: relational indicators: trust, reputation, centrality, etc.
?
?
?
1 significant problem:which dimensions to choose?
Mean citation rate
bad
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Many possible facets of scholarly impact
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
The MESUR project.
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Johan Bollen (LANL): Principal investigator.Herbert Van de Sompel (LANL): Architectural consultant.Aric Hagberg (LANL): Mathematical and statistical consultant.Marko Rodriguez (LANL): PhD student (Cognitive Science, UCSC).Lyudmila Balakireva (LANL): Database management and development.Wenzhong Zhao (LANL): Data processing, normalization and ingestion.
“The Andrew W. Mellon Foundation has awarded a grant to Los Alamos National Laboratory (LANL) in support of a two-year project that will investigate metrics derived from the network-based usage of scholarly information. The Digital Library Research & Prototyping Team of the LANL Research Library will carry out the project. The project's major objective is enriching the toolkit used for the assessment of the impact of scholarly communication items, and hence of scholars, with metrics that derive from usage data.”
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Usage data
Citation data pertain to 4 levels in the scholarly communication process:
• Community: authors of journal articles.• Artifacts: journal articles.• Data: citation data (+1 year publication delay).• Metrics: mean citation rate rules supreme.• Scale: expensive to extract.
Hence, various initiatives focused on usage data: COUNTER, IRS, SUSHI, CiteBase.But where are the metrics?
However, for usage data:• Community: all users including most authors.• Artifacts: all that is accessible.• Data: recorded upon publication.• Metrics: a range of web and web2.0 inspired metrics, e.g. clickstream and datamining.• Scale: automatically recorded at point of service.
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Divergence and convergence.
CSU Usage PR IF (2003) Title (abbv.)
1 78.565 21.455 JAMA-J AM MED ASSOC
2 71.414 29.781 SCIENCE
3 60.373 30.979 NATURE
4 40.828 3.779 J AM ACAD CHILD PSY
5 39.708 7.157 AM J PSYCHIAT
LANL Usage PR IF (2003) Title (abbv.)
1 60.196 7.035 PHYS REV LETT
2 37.568 2.950 J CHEM PHYS
3 34.618 1.179 J NUCL MATER
4 31.132 2.202 PHYS REV E
5 30.441 2.171 J APPL PHYS
MSR Usage PR IF (2005) Title (abbv.)
1 15.830 30.927 SCIENCE
2 15.167 29.273 NATURE
3 12.798 10.231 PNAS
4 10.131 0.402 LECT NOTES COMP SCI
5 8.409 5.854 J BIOL CHEM
Convergence!• Open research questions:
o Is this guaranteed?o To what? A common-baseline?
• What we do know: o Institutional perspective can be
contrasted to baseline.o As aggregation increases in size, so
does value.o Cross-validation is key.
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
MESUR1: Metrics from Scholarly Usage of Resources.
1. Pronounced “measure”
Andrew W. Mellon Foundation funded study of usage-based metrics (2006-2008)
Executed at the Digital Library Research and Prototyping team, Los Alamos National Laboratory Research Library
Objectives:1. Create a model of the scholarly
communication process.2. Create a large-scale reference data
set (semantic network) that relates all relevant bibliographic, citation and usage data according to (1).
3. Characterize reference data set.4. Survey usage-based metrics on
basis of reference data set.
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Project data flow and work plan.
1
2 3 4
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Project timeline.
We are here!
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Presentation structure:an update on the MESUR project
1) Usage data characterization2) Analysis
1) Usage graphs2) Metrics analysis
3) Results4) Discussion
12
3
4
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Presentation structure:an update on the MESUR project
1) Usage data characterization2) Analysis
1) Usage graphs2) Metrics analysis
3) Results4) Discussion
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Usage data providers
Publisher recorded usage data:- COUNTER statistics- Item-level data
Aggregator recorded usage data:- COUNTER statistics- Item-level data
Institutionally recorded usage data:- SFX logs- Custom log formats (XML)
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Usage data collected
• Data: > 1B usage events and 1B citations
o At this point, 247,083,481 usage events loaded
o Another +1,000,000,000 on the way
• Documents: > 50M documents• Journals: 326,000
• Community: > 100M users and authors combined
Data providers:
• Publishers: 8
• Aggregators: 3
• Institutions: 4
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Data acquired: timelines
Span:• Majority: -1 years• Some minor 2002-2003 data
Sharing models:• Historical data for period [t-x,t]• Periodical updates
Main issues:• Restoration of archives• Digital preservation issues
o All data fields intacto Integration of various sources of usage data
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Presentation structure:an update on the MESUR project
1) Usage data characterization2) Analysis
1) Usage graphs2) Metrics analysis
3) Results4) Discussion
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
MESUR’s usage data representation framework.
Assumptions:1. Sessions identify sequences of events
(same user - same document)2. Documents tied to aggregate request
objects3. Request objects consist of series of
service requests and the date and time at which request took place.
Implications:1. Sequence preserved.2. Most usage data and statistics can be
reconstructed from framework3. Lends itself to XML and RDF format4. Permits request type filtering
Example:1. COUNTER stats= aggregate request
counts for each document (journal) grouped by date/time (month)
2. Usage graph: overlay sessions with same document pairs
• Out of 13 MESUR providers so far, only 3 natively follow this model.• The usage data of another 8 contains the necessary information for conversion
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Implications for structural analysis of usage data
• Sequence preservation allows:o Reconstruction of user behavioro Usage graphs!
• Statistics do not allow this type of analysis BUT are useful for:
o validating resultso rankings
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
How to generate a usage graph.
Documents are associated by co-occurrence in same session
• Same session, same user: common interest
• Frequency of co-occurrence in same session: degree of relationship
Usage data:
• Works for journals and articles
• Anything for which usage was recorded
Options:
• Strict pair-wise sequence
• All within session
Note: not something we invented. Association rule learning in data mining. Beer and diapers!
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Usage graphs
journal1journal2
MESUR graph created:• 200M usage events• Usage restricted to 2006• Journals clipped to 7600
2004 JCR journals• Pair-wise sequences
o Within session, only consecutive pairs
o Raw frequency weights
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Tracing the flow of information.
Marko Rodriguez
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Presentation structure:an update on the MESUR project
1) Usage data characterization2) Analysis
1) Usage graphs2) Metrics analysis
3) Results4) Discussion
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Metric types
Note:• Metrics can be calculated
both on citation and usage data
• Structural metrics require graphs
o Citation graph, e.g. 2004 JCR
o Usage graph, e.g. created by MESUR
Wasserman (1994) Social network analysis
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Frequentist metrics
Raw cites:• Count number of
citations to document or journal
• Count number of times document or journal was accessed
Normalized:
•Journal Impact factor:o Number of citations to journalo Divided by number of articles published in journal
•Usage Impact Factor
• Number of request for journal or article
• Divided by number of articles published in journal
Johan Bollen. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. Journal of the American Society for Information Science and Technology, 59(1), 2008
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Structural metrics calculated from usage graph
Classes of metrics:• Degree• Shortest path• Random walk• Distribution
Degree• In-degree• Out-degree
Shortest path• Closeness• Betweenness• Newman
Random walk• PageRank• Eigenvector
Distribution• In-degree entropy• Out-degree entropy• Bucket Entropy
Each can be defined to take into account weights, i.e. an unweighted and weighted variant.
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Social network metrics: different aspects of impact I
In-degree/IFDegree
centrality
Closenesscentrality
Betweennesscentrality
Shortest pathmetrics
Degreemetrics
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Social network metrics: different aspects of impact II
Basic idea:- Random walkers follow edges- + Probability of random teleportation- Visitation numbers converge ~ PageRank
- “Stationary Probability distribution”
From wikipedia.org
Random walkMetrics, e.g. PageRank
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Presentation structure:an update on the MESUR project
1) Usage data characterization2) Analysis
1) Usage graphs2) Metrics analysis
3) Results4) Discussion
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Set of metrics calculated on MESUR data set
List of metrics:JCR 2004 • CITE-BE• CITE-ID• CITE-IE• CITE-IF• CITE-OD• CITE-OE• CITE-PG• CITE-UBW• CITE-UBW-UN• CITE-UCL• CITE-UCL-UN• CITE-UNM• CITE-UNM-UN• CITE-UPG• CITE-UPR• CITE-WBW• CITE-WBW-UN• CITE-WCL• CITE-WCL-UN• CITE-WID• CITE-WNM• CITE-WNM-UN• CITE-WOD• CITE-WPR
Usage-based metrics:
MESUR 2006• USES-BE, • USES-ID• USES-IE• USES-OD• USES-OE• USES-PG• USES-UBW• USES-UBW-UN• USES-UCL• USES-UCL-UN• USES-UNM• USES-UNM-UN• USES-UPG• USES-UPR• USES-WBW• USES-WBW-UN• USES-WCL• USES-WCL-UN• USES-WID• USES-WNM• USES-WNM-UN• USES-WOD• USES-WPR
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Overlaps and discrepancies: Usage Impact Factor
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Overlaps and discrepancies
Citation PageRank vs. citation IF
For Computer Science only
- Map metrics themselves- We can examine structure of metric correlations and arrive at a structural model of metric relationships- Metric semantics -> what do they tell us about scholarly impact?
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Citation rankings
2004 Impact Factorvalue journal1 49.794 CANCER2 47.400 ANNU REV IMMUNOL3 44.016 NEW ENGL J MED4 33.456 ANNU REV BIOCHEM5 31.694 NAT REV CANCER
Citation Pagerankvalue journal1 0.0116 SCIENCE2 0.0111 J BIOL CHEM3 0.0108 NATURE4 0.0101 PNAS5 0.006 PHYS REV LETT
betweennessvalue journal1 0.076 PNAS2 0.072 SCIENCE3 0.059 NATURE4 0.039 LECT NOTES COMPUT SC5 0.017 LANCET
Closenessvalue journal1 7.02e-05 PNAS2 6.72e-05 LECT NOTES COMPUT SC3 6.43e-05 NATURE4 6.37e-05 SCIENCE5 6.37e-05 J BIOL CHEM
In-Degree value journal1 3448 SCIENCE2 3182 NATURE3 2913 PNAS4 2190 LANCET5 2160 NEW ENGL J MED
In-degree entropy Value journal1 9.849 LANCET2 9.748 SCIENCE3 9.701 NEW ENGL J MED4 9.611 NATURE5 9.526 JAMA
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Usage rankings
2004 Impact Factorvalue journal1 49.794 CANCER2 47.400 ANNU REV IMMUNOL3 44.016 NEW ENGL J MED4 33.456 NNU REV BIOCHEM5 31.694 NAT REV CANCER
In-Degree value journal1 4195 SCIENCE2 4019 NATURE3 3562 PNAS4 2438 J BIOL CHEM5 2432 LECT NOTES COMPUT SC
In-degree entropy Value journal1 9.364 MED HYPOTHESES2 9.152 PNAS3 9.027 LIFE SCI4 8.939 LANCET5 8.858 INT J BIOCHEM CELL B
betweennessvalue journal1 0.035 SCIENCE2 0.032 NATURE3 0.020 PNAS4 0.017 LECT NOTES COMPUT SC5 0.006 LANCET
Closenessvalue journal1 0.670 SCIENCE2 0.665 NATURE3 0.644 PNAS4 0.591 LECT NOTES COMPUT SC5 0.587 BIOCHEM BIOPH RES CO
Citation Pagerankvalue journal1 0.0016 SCIENCE2 0.0015 NATURE3 0.0013 PNAS4 0.0010 LECT NOTES COMPUT SC5 0.0008 J BIOL CHEM
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Metrics relationship
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Metrics relationships
• Citation and usage metrics reveal an entirely different pattern
o Citation is split in 2 section:- Degree metrics (right)
- Shortest path and random walk (left)
o Usage is split in 4 clusters:- Degree metrics
- PageRank and entropy
- Closeness
- Betweenness
• Usage pattern can be causedo Noise in usage grapho Higher density of usage/nodes
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Hierarchical analysis
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Presentation structure:an update on the MESUR project
1) Usage data characterization2) Analysis
1) Usage graphs2) Metrics analysis
3) Results4) Discussion
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
MESUR: an update
Usage data:• Creation of single largest reference data set of usage, citation and bibliographic data• +1,000,000,000 usage events loaded in next month• Usage data obtained from multiple publishers, aggregators and institutions• Infrastructure for a continued research program in this domain• Results will guide scholarly evaluation and may help produce standards for usage data representation
Usage graphs:• Adequate data model for item-level usage data naturally leads to this• Reduced distortion compared to raw usage: structure counts, not raw hits• Several option on how to create: MESUR investigates option
Metrics:• Frequentist and structural metrics• Each can represent different facets of scholarly impact• Simple metrics can produce adequate results, e.g. in-degree vs. PageRank. How far should we go?• Note increasing convergence of usage-metrics to citation metrics as sample increases.
QuickTime™ and aTIFF (LZW) decompressorare needed to see this picture.Research Library, Los Alamos National Laboratory
@ November, 2007, NISO - Dallas, TX
Digital Library Research & Prototyping Team
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Some relevant publications.
Marko A. Rodriguez, Johan Bollen and Herbert Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage, In Proceedings of the Joint Conference on Digital Libraries, Vancouver, June 2007
Johan Bollen and Herbert Van de Sompel. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. (cs.DL/0610154)
Johan Bollen and Herbert Van de Sompel. An architecture for the aggregation and analysis of scholarly usage data. In Joint Conference on Digital Libraries (JCDL2006), pages 298-307, June 2006.
Johan Bollen and Herbert Van de Sompel. Mapping the structure of science through usage. Scientometrics, 69(2), 2006.
Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel. Journal status. Scientometrics, 69(3), December 2006 (arxiv.org:cs.DL/0601030)
Johan Bollen, Herbert Van de Sompel, Joan Smith, and Rick Luce. Toward alternative metrics of journal impact: a comparison of download and citation data. Information Processing and Management, 41(6):1419-1440, 2005.