Repository Statistics Peter Millington Technical Development Officer SHERPA, University of Nottingham

Post on 27-Mar-2015


Page 1: Repository Statistics Peter Millington Technical Development Officer SHERPA, University of Nottingham

Repository Statistics

Peter Millington

Technical Development Officer

SHERPA, University of Nottingham


Overview

Introduction

Global statistics

The what & why of repository statistics

Benchmarks & data sources

Compilation methods

Web usage logging tools

Google Analytics demo

Problems and solutions

Group session – Key issues


Global Repository Statistics

Data Sources – Global lists of repositories
• OpenDOAR - http://www.opendoar.org/
• ROAR - http://roar.eprints.org/
• Repository66 - http://www.repository66.org/

May be useful for advocacy work

Examples of types of chart & presentation


ROAR – Individual Growth Charts


ROAR – Individual Source Data

Month   Records  Archives
200407       12
200408       34
200409       77
200410      106
200411      149
200412      164
200501      187
200502      212
200503      272
200504      324
200505      389
200506      426
200507      446
200508      492
200509      547
200510      607
200511      631
200512      750
200601      794
200602      860
200603     1019
200604     1090
200605     1128
200606     1307
200607     1347
200608     1405
200609     1469
200610     1530
200611     1610
200612     1705
200701     1768
200702     1853
200703     1934
200704     2042
200705     2169
200706     2239
200707     2264
200708     2352
200709     2374
200710     2400
200711     2438
200712     2484
200801     2540
200802     2573
200803     2611
200804     2643
200805     2681
200806     2689
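
The source data above can be turned into a monthly deposit rate by differencing consecutive values. The following is a minimal sketch, assuming the figures are cumulative record totals and that the count before the first listed month was zero; the function name is illustrative and not part of ROAR.

```python
# Cumulative record counts copied from the first six rows of the table above.
cumulative = [
    ("200407", 12),
    ("200408", 34),
    ("200409", 77),
    ("200410", 106),
    ("200411", 149),
    ("200412", 164),
]


def monthly_deposits(rows):
    """Return (month, new_records) pairs from cumulative (month, total) rows.

    Assumes the totals are cumulative and start from zero before the first row.
    """
    deposits = []
    previous_total = 0
    for month, total in rows:
        deposits.append((month, total - previous_total))
        previous_total = total
    return deposits


for month, added in monthly_deposits(cumulative):
    print(month, added)
```

Plotting the differences rather than the totals makes the effect of advocacy events or a new mandate easier to spot, because a burst of deposits shows up as a spike instead of a slightly steeper slope.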


Delegates’ What and Why of Statistics

Rate of growth
• For advocacy
• Measure of success – for our paymasters

Rate of usage
• Targeting weak areas – departments
• Measure of success
• Justifying funding

Most downloaded author/paper
• Promotes interest and engagement from authors


Delegates’ What and Why of Statistics

Where are visitors coming from – referrers
• Curiosity – is it being seen by the right people?

Citation statistics
• To demonstrate the beneficial impact of repositories

Drilling down for more detail
• For a sense of reality

Steep slopes, animation, etc.
• Glitzy marketing


Individual Repositories - Content

Growth & Deposition rates
• Measure of progress
• Impact of advocacy events
• Impact of mandatory deposition

Types of document or item
• Trend-watching?

Breakdown by department and/or author
• How much is everyone contributing?

Proportion of full text v metadata only
• Measure of usefulness


Item types: Universidade do Minho


Individual Repositories - Performance

Proportion of publications deposited
• How comprehensive is the archive?

Proportion of authors who are depositing
• Are they complying with local mandates?

Compliance with funders’ mandates
• Are you meeting your obligations?

Repository administration
• Are your turnaround times acceptable?


Compliance with the CERN Mandate


Compliance Benchmarks

Counting publications
• Institution-wide bibliographies
  • e.g. Maintained by research managers
• Publication lists on departmental web pages
• Public/Commercial databases – ISI, Medline, etc.

Counting authors
• Who qualifies as an author?
  • Academic staff, Research students, Managers
• University Calendars & Departmental staff lists


Individual Repositories - Usage

Rates of usage
• Measure of usefulness
• Impact of news-related items

Most downloaded items
• Identifying research(ers) with most impact?
• Engendering competition between authors?

Downloads according to author
• Performance reviews?

Geographical distribution of users
• Are you reaching your intended audience?


Sources of Data

Repository’s own database

OAI-PMH

Server’s access log

Remote logging


Compilation Methods

Repository’s own database
• Copying from the human interface
• Interactive SQL commands


Copying from the Human Interface


Interactive SQL Commands

mysql> SELECT type,COUNT(*) FROM eprint GROUP BY type;

+-----------------+----------+
| type            | COUNT(*) |
+-----------------+----------+
| article         |      456 |
| book            |        5 |
| book_section    |       39 |
| conference_item |      173 |
| exhibition      |        1 |
| monograph       |       18 |
| other           |        3 |
| thesis          |        4 |
+-----------------+----------+
8 rows in set (0.00 sec)

[Pie chart of item types: article 64%, book 1%, book_section 6%, conference_item 25%, exhibition 0%, monograph 3%, other 0%, thesis 1%]
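
The counts returned by the SQL query above can be converted into percentage shares for a chart. This is a minimal sketch, assuming the counts have been copied by hand from the query output; the dictionary and function names are illustrative only.

```python
# Item counts per type, copied from the SQL query output above.
counts = {
    "article": 456,
    "book": 5,
    "book_section": 39,
    "conference_item": 173,
    "exhibition": 1,
    "monograph": 18,
    "other": 3,
    "thesis": 4,
}


def percentage_shares(type_counts):
    """Return each type's share of the total number of items, in percent."""
    total = sum(type_counts.values())
    return {name: 100.0 * n / total for name, n in type_counts.items()}


shares = percentage_shares(counts)
for name, share in shares.items():
    print(f"{name:16s} {share:5.1f}%")
```

Working from the raw counts rather than from rounded chart labels keeps small categories such as exhibition and other visible, even though they round to 0% on a pie chart.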


Compilation Methods

Repository’s own database
• Copying from the human interface
• Interactive SQL commands

OAI-PMH
• Harvesting programs – e.g. ROAR’s Celestial


OAI-PMH ListIdentifiers


OAI-PMH ListRecords
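
For readers without the screenshots, an OAI-PMH request is an ordinary HTTP GET with a verb parameter, and the response is XML in the OAI-PMH 2.0 namespace. The sketch below builds a ListIdentifiers URL and counts the identifiers in a response. The base URL is hypothetical, and the sample response is a hand-written illustration (not real harvested output) so that the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListIdentifiers request URL for a repository's OAI base URL."""
    query = urlencode({"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hand-written sample response for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:bora.uib.no:1956/2270</identifier>
      <datestamp>2007-06-18</datestamp>
    </header>
    <header>
      <identifier>oai:bora.uib.no:1956/2272</identifier>
      <datestamp>2007-06-25</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def extract_identifiers(xml_text):
    """Return the record identifiers found in a ListIdentifiers response."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# "repository.example.org" is a placeholder, not a real endpoint.
print(list_identifiers_url("http://repository.example.org/oai"))
print(len(extract_identifiers(SAMPLE_RESPONSE)), "identifiers")
```

A real harvest would also need to follow resumptionToken elements, because large repositories return their lists in several pages.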


ROAR - Celestial

date      identifier                 url
20070618  oai:bora.uib.no:1956/2270  Department of Earth Science
20070625  oai:bora.uib.no:1956/2272  Department of History
20070625  oai:bora.uib.no:1956/2273  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2274  Section for Endocrinology
20070626  oai:bora.uib.no:1956/2275  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2276  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2277  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2278  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2279  Department of Oral Sciences
20070626  oai:bora.uib.no:1956/2281  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2282  Department of Sociology
20070626  oai:bora.uib.no:1956/2283  Else Æyen
20070628  oai:bora.uib.no:1956/2284  Section for Art History
20070629  oai:bora.uib.no:1956/2285  Section for Russian
20070629  oai:bora.uib.no:1956/2286  Department of Geography
20070629  oai:bora.uib.no:1956/2287  Department of Greek, Latin and Egyptology
20070702  oai:bora.uib.no:1956/2288  Section for Spanish
20070702  oai:bora.uib.no:1956/2289  Department of Mathematics
20070702  oai:bora.uib.no:1956/2290  Department of Geography
20070702  oai:bora.uib.no:1956/2291  Department of Geography
20070702  oai:bora.uib.no:1956/2292  Department of Biology
20070703  oai:bora.uib.no:1956/2293  Department of Biology


Compilation Methods

Repository’s own database
• Copying from the human interface
• Interactive SQL commands

OAI-PMH
• Harvesting programs – e.g. ROAR’s Celestial

Server’s access log
• Web usage statistics tools


Raw Web Access Logs

209.237.238.179 - - [10/Apr/2005:05:34:06 +0100] "GET /portfolio.css HTTP/1.0" 200 816 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:16:27 +0100] "GET /DAWN_Index.htm HTTP/1.0" 200 8392 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:17:44 +0100] "GET /Eric.htm HTTP/1.0" 200 6975 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:21:12 +0100] "GET /Library_Form.htm HTTP/1.0" 200 7709 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:22:48 +0100] "GET /cleansing.htm HTTP/1.0" 200 11016 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:25:02 +0100] "GET /index.htm HTTP/1.0" 200 7613 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:28:19 +0100] "GET /integration.htm HTTP/1.0" 200 8027 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:31:35 +0100] "GET /merging.htm HTTP/1.0" 200 9132 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:34:39 +0100] "GET /publication.htm HTTP/1.0" 200 5327 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:08:22:38 +0100] "GET /ABACUS_Index.htm HTTP/1.0" 200 5421 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:08:27:34 +0100] "GET /limitations.htm HTTP/1.0" 200 3781 "-" "ia_archiver"
210.173.179.17 - - [20/Dec/2004:13:22:03 +0000] "GET /robots.txt HTTP/1.1" 404 - "-" "gazz/5.0 ([email protected])"
210.173.179.17 - - [20/Dec/2004:13:23:51 +0000] "GET / HTTP/1.1" 200 7613 "-" "gazz/5.0 ([email protected])"
210.173.179.17 - - [20/Dec/2004:13:25:34 +0000] "GET /Logo.gif HTTP/1.1" 200 3838 "-" "gazz/5.0 ([email protected])"
210.173.179.17 - - [20/Dec/2004:13:27:17 +0000] "GET /contact.htm HTTP/1.1" 200 4626 "-" "gazz/5.0 ([email protected])"
210.173.179.17 - - [20/Dec/2004:13:29:00 +0000] "GET /profile.htm HTTP/1.1" 200 10533 "-" "gazz/5.0 ([email protected])"
210.173.179.17 - - [20/Dec/2004:13:37:35 +0000] "GET /index.htm HTTP/1.1" 200 7613 "-" "gazz/5.0 ([email protected])"
210.173.179.17 - - [20/Dec/2004:13:47:55 +0000] "GET /publication.htm HTTP/1.1" 200 5327 "-" "gazz/5.0 ([email protected])"
210.173.179.17 - - [20/Dec/2004:13:49:39 +0000] "GET /InsideInfo.jpg HTTP/1.1" 200 19372 "-" "gazz/5.0 ([email protected])"

Recorded fields include:
• IP Address of the computer requesting a file
• Date & time transaction completed
• Name of file requested
• Success code – usually 200 for “successfully completed”
• File size in bytes
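
The fields listed above can be pulled out of each log line with a regular expression. This is a minimal sketch, assuming the log uses the layout of the sample lines shown (IP address, timestamp, request, status, size, referrer, user agent); the pattern and function names are illustrative and would need adjusting for a server configured with a different log format.

```python
import re

# Pattern for one access-log line in the layout of the samples above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


sample = (
    '209.237.238.179 - - [10/Apr/2005:05:34:06 +0100] '
    '"GET /portfolio.css HTTP/1.0" 200 816 "-" "ia_archiver"'
)
entry = parse_log_line(sample)
print(entry["ip"], entry["timestamp"], entry["path"], entry["status"], entry["size"])
```

Returning None for lines that do not match lets a caller count and inspect malformed entries instead of silently dropping them.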


Web Usage Statistics Tools

Analog
• http://www.analog.cx/

Webalizer
• http://www.mrunix.net/webalizer/

AWStats
• http://awstats.sourceforge.net/

etc.


Sample output from the Analog Statistics Package


Sample output from the Webalizer Statistics Package


Sample output from the AWStats Statistics Package


Compilation Methods

Repository’s own database
• Copying from the human interface
• Interactive SQL commands

OAI-PMH
• Harvesting programs – e.g. ROAR’s Celestial

Server’s access log
• Web usage statistics tools

Remote logging
• Google Analytics


Google Analytics

http://www.google.com/analytics

Sign up to a Google Account

Specify the URL to be logged

Obtain snippet of JavaScript code

Insert snippet into HTML of pages to be logged
• Ideally into a template file
• Make sure the modified pages are live!

Logging starts automatically

Log in to your account to view the analytics


Google Analytics

JavaScript snippet

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-3477654-3");
pageTracker._initData();
pageTracker._trackPageview();
</script>

Find URL Containing/Excluding
• String
  • e.g. “pdf”
• Regular expressions
  • e.g. /[0-9]*/ for EPrints IDs
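
The same containing/excluding filters can be tried offline on a list of URLs before relying on them in a reporting tool. This is a minimal sketch: the URL paths are hypothetical examples of an EPrints-style layout, and the function names are illustrative. Note that the slide's pattern /[0-9]*/ also matches an empty pair of slashes, because * allows zero digits; the sketch uses + and anchors the pattern at the start of the path, which is an assumption about what is wanted.

```python
import re

# Hypothetical request paths, assuming an EPrints-style layout of /<id>/...
urls = [
    "/123/",
    "/123/1/paper.pdf",
    "/view/year/2008.html",
    "/style/auto.css",
]


def containing(paths, text):
    """Keep only the paths that contain the given string."""
    return [p for p in paths if text in p]


def excluding(paths, text):
    """Drop the paths that contain the given string."""
    return [p for p in paths if text not in p]


def matching(paths, pattern):
    """Keep only the paths in which the regular expression is found."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.search(p)]


# One or more digits between slashes at the start of the path.
EPRINT_ID = r"^/[0-9]+/"

print(containing(urls, "pdf"))
print(excluding(urls, "pdf"))
print(matching(urls, EPRINT_ID))
```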


Problems

Web bots and crawlers
• Inflating usage volume
• Skewing usage time series

Auxiliary files & non-eprint pages
• CSS style sheet files
• Image files – jpeg, gif, etc.
• Index pages

Linking URLs to bibliographic references
• What does that eprint number mean?


Problems and Solutions

Web bots and crawlers
• Use robots.txt & meta robots tags to prevent crawling
• Filtering out known bots
• Still leaves maverick hackers’ & students’ bots

Auxiliary files & non-eprint pages
• Configuring & tuning the analysis tool
• Filter using ‘regular expressions’

Linking URLs to bibliographic references
• Programmatic concordance
  • e.g. IRStats
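
The first two solutions above can be sketched as a filter applied to parsed log entries before counting. This is a rough illustration under stated assumptions: the bot list is a small hand-picked sample (ia_archiver and gazz appear in the raw log extract earlier, googlebot is added as an assumption), the file-extension list is illustrative, and the two Mozilla requests are hypothetical. As the slide notes, a list of known bots will not catch unknown ones.

```python
import re

# Illustrative list of user-agent fragments treated as bots; not exhaustive.
KNOWN_BOT_AGENTS = ("ia_archiver", "gazz", "googlebot")

# Auxiliary files that should not count as eprint usage (illustrative list).
AUXILIARY_FILE = re.compile(r"\.(css|gif|jpe?g|png|js|ico)$", re.IGNORECASE)


def is_bot(agent):
    """Return True if the user-agent string contains a known bot name."""
    agent = agent.lower()
    return any(name in agent for name in KNOWN_BOT_AGENTS)


def is_auxiliary(path):
    """Return True for style sheets, images and similar supporting files."""
    return path == "/robots.txt" or AUXILIARY_FILE.search(path) is not None


def countable_requests(requests):
    """Keep requests that are neither from known bots nor for auxiliary files."""
    return [
        r for r in requests
        if not is_bot(r["agent"]) and not is_auxiliary(r["path"])
    ]


requests = [
    {"path": "/portfolio.css", "agent": "ia_archiver"},
    {"path": "/Logo.gif", "agent": "gazz/5.0"},
    {"path": "/123/1/paper.pdf", "agent": "Mozilla/5.0"},  # hypothetical reader
    {"path": "/style/site.css", "agent": "Mozilla/5.0"},   # hypothetical reader
]

for request in countable_requests(requests):
    print(request["path"])
```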


Over to Chris for DSpace statistics…


What are your priorities for statistics?


Peter Millington

[email protected]