Ithaka
A Systemwide View of Library Collections
Brian Lavoie, OCLC Research
Roger C. Schonfeld, Ithaka
CNI Spring Task Force Meeting April 5, 2005
Systemwide View of Library Collections
Print collections have been changing, as the distinction between local and external resources is increasingly blurred due to resource sharing
Digitization combined with network technologies creates opportunities for one “copy” of a resource to be shared across many libraries
These forces will inevitably shift the focus to the resources of the “system” rather than to individual library collections
Mass Digitization
A great deal of public and private investment in digitization programs … e.g., JSTOR and ARTstor, and of course the mass digitization spearheaded by Google Print
Digitization opportunities are unlimited; resources are not …
• How do we determine priorities? What programs of digitization will be necessary to meet the needs of the scholarly community?
Print Preservation
From a systemwide perspective, what preservation framework makes most sense for print resources?
How have preservation frameworks changed over time?
As retrospective materials become increasingly available in digital form, will new frameworks for print preservation be necessary?
What Are We Going to Do Today?
The collaborations needed to take advantage of a systemwide perspective are very hard to achieve, from both economic and political standpoints
We will not be proposing any answers!
Instead, we use WorldCat, which affords the broadest view of print collections, to build a bridge from a local perspective to the beginnings of a systemwide perspective
Today’s presentation focuses on print books
Data Sources
WorldCat: the world’s largest and most comprehensive bibliographic database
• More than 20,000 libraries worldwide have contributed to the development of WorldCat
Copy of WorldCat from January 2005:
• ~55 million records
Copy of WorldCat holdings file from January 2005:
• ~950 million holdings
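Every count that follows depends on linking bibliographic records to the holdings file. A minimal sketch of that join, assuming a hypothetical holdings file reduced to (record ID, library symbol) pairs; the layout, IDs, and library symbols are invented and are not WorldCat's actual format:

```python
from collections import defaultdict

# Hypothetical holdings rows: (record_id, library_symbol). Invented toy data,
# not the real WorldCat holdings file layout.
holdings_rows = [
    ("rec001", "LIB_A"),
    ("rec001", "LIB_B"),
    ("rec001", "LIB_B"),  # repeated row for the same library
    ("rec002", "LIB_C"),
]


def holdings_per_record(rows):
    """Count the distinct libraries attached to each bibliographic record."""
    libraries = defaultdict(set)
    for record_id, library in rows:
        libraries[record_id].add(library)
    return {record_id: len(libs) for record_id, libs in libraries.items()}


counts = holdings_per_record(holdings_rows)
print(counts)  # {'rec001': 2, 'rec002': 1}

# Order-of-magnitude check using the January 2005 totals quoted above:
# ~950 million holdings spread over ~55 million records.
average_holdings = 950e6 / 55e6
print(f"{average_holdings:.1f} holdings per record on average")
```

Deduplicating by library before counting matters because the analysis asks how many libraries hold something, not how many holding rows exist. The overall average (roughly 17) spans all record types, not just print books.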
Data Source Limitations
Not all published materials are cataloged in WorldCat
Not all library holdings are represented in WorldCat
Largely reflects North American library collections
So … WorldCat does not embody the whole universe of library collections and holdings – but it’s a very good approximation!
1. The “Systemwide Collection”
• Size
• Age
How Many “Books” Are Held in the Systemwide Collection?
[Chart: number of WorldCat records at each stage of narrowing; y-axis 0 to 60,000,000]
• Total WorldCat records: 54,831,000
• Language-based or manuscript monographs: 45,269,000
• Language-based or manuscript monographs, excluding government documents and theses/dissertations: 35,251,000
• Language-based or manuscript monographs, excluding government documents and theses/dissertations, in print format only: 31,923,000
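The four bars are successive filters over the same record set. A minimal sketch, assuming hypothetical record fields: the slide names the criteria but not how they were applied to MARC data, so `is_print_book` and its field names are illustrative only. The counts are the ones reported on the slide:

```python
# Counts at each narrowing stage, as reported on the slide.
stages = [
    ("Total WorldCat records", 54_831_000),
    ("Language-based or manuscript monographs", 45_269_000),
    ("... excluding government documents and theses/dissertations", 35_251_000),
    ("... in print format only", 31_923_000),
]

total = stages[0][1]
share_of_total = {label: count / total for label, count in stages}
for label, count in stages:
    print(f"{label}: {count:,} ({share_of_total[label]:.1%} of all records)")


def is_print_book(record):
    """Hypothetical predicate mirroring the slide's filters.

    The field names are invented for illustration; a real selection would be
    made from MARC21 fields, which the slide does not spell out.
    """
    return (
        record["content"] in {"language", "manuscript"}
        and record["level"] == "monograph"
        and not record["government_document"]
        and not record["thesis"]
        and record["format"] == "print"
    )


sample = {"content": "language", "level": "monograph",
          "government_document": False, "thesis": False, "format": "print"}
print(is_print_book(sample))  # True
```

On the slide's own figures, about 58% of all WorldCat records survive every filter.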
Works and Manifestations
FRBR (Functional Requirements for Bibliographic Records):
• Hierarchy of bibliographic entities
• Works, Expressions, Manifestations, Items
Work: a distinct intellectual or artistic creation
• e.g., Macbeth
Manifestation: the physical embodiment of an expression of a work
• e.g., Macbeth, Folger Shakespeare Library edition, published in paperback by Washington Square Press (2004)
WorldCat records describe FRBR manifestations
Works are identified using the OCLC “FRBRization” algorithm
• Converts MARC21 bibliographic databases into FRBR “work-sets”
• http://www.oclc.org/research/software/frbr/
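To make the manifestation-to-work step concrete, here is a drastically simplified stand-in that clusters records on a normalized author/title key. It is not the OCLC algorithm, which is documented at the URL above and is considerably more elaborate; `work_key` and the sample records (apart from the Washington Square Press edition) are hypothetical:

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, drop accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def work_key(record):
    """Hypothetical work key: normalized author plus normalized title."""
    return normalize(record["author"]), normalize(record["title"])


def group_into_works(manifestations):
    """Cluster manifestation records that share a work key."""
    works = defaultdict(list)
    for record in manifestations:
        works[work_key(record)].append(record)
    return dict(works)


# Toy manifestation records; apart from the first, the editions are invented.
manifestations = [
    {"author": "Shakespeare, William", "title": "Macbeth",
     "publisher": "Washington Square Press", "year": 2004},
    {"author": "Shakespeare, William.", "title": "Macbeth.",
     "publisher": "Publisher B", "year": 1990},
    {"author": "Shakespeare, William", "title": "Hamlet",
     "publisher": "Publisher C", "year": 1985},
]

works = group_into_works(manifestations)
for key, members in works.items():
    print(key, len(members))
```

The two Macbeth records differ only in trailing punctuation, so they fall into one work; a key this naive would also merge distinct works that share an author and title, which is one reason the real algorithm does more.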
Most Book Works Have Few Manifestations
[Chart: Manifestations 31,923,000; Works 26,025,000; y-axis 0 to 35,000,000]
Language-based or manuscript monographs, excluding government documents and theses/dissertations, in print format only
Print Book Manifestations and Works – and Digital Manifestations
[Chart: Manifestations 31,923,000; Works 26,025,000; Digital Manifestations 121,689; y-axis 0 to 35,000,000]
Language-based or manuscript monographs, excluding government documents and theses/dissertations, in print format only
How Old Are the Components of the Systemwide Collection?
Cumulative Book Works/Manifestations Over Time
[Chart: cumulative number of book manifestations and works by publication year, 1700 to 2000; y-axis 0 to 35,000,000]
How Old Are the Components of the Systemwide Collection?
Book Works/Manifestations per Year
[Chart: number of book manifestations and works published per year, 1700 to 2000; y-axis 0 to 800,000]
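The per-year and cumulative charts are two views of the same counts, and the summary's observation that half of the books date from since 1977 is the median of that per-year distribution. A minimal sketch with invented per-year counts (`cumulative` and `median_year` are illustrative names):

```python
from itertools import accumulate


def cumulative(counts_by_year):
    """Running total of items, in publication-year order."""
    years = sorted(counts_by_year)
    running = accumulate(counts_by_year[year] for year in years)
    return list(zip(years, running))


def median_year(counts_by_year):
    """Earliest year by which at least half of all items had been published."""
    series = cumulative(counts_by_year)
    if not series:
        return None
    half = series[-1][1] / 2
    for year, running_total in series:
        if running_total >= half:
            return year


# Invented per-year counts for illustration only.
per_year = {1900: 10, 1950: 30, 1980: 20, 2000: 40}
print(cumulative(per_year))   # [(1900, 10), (1950, 40), (1980, 60), (2000, 100)]
print(median_year(per_year))  # 1980
```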
Age of Works and Manifestations: Relative to 1923 (millions)
[Chart: y-axis 0 to 30 million. Manifestations: pre-1923 18%, 1923 and after 82%. Works: pre-1923 17%, 1923 and after 83%]
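The 1923 cutoff presumably reflects U.S. copyright: as of 2005, works published in the United States before 1923 were in the public domain. Splitting a collection at that line is a simple partition of per-year counts; a sketch with invented counts (`split_at` is an illustrative name):

```python
def split_at(counts_by_year, cutoff=1923):
    """Return (share published before cutoff, share published in or after it)."""
    total = sum(counts_by_year.values())
    if total == 0:
        raise ValueError("no items to split")
    before = sum(n for year, n in counts_by_year.items() if year < cutoff)
    return before / total, (total - before) / total


# Invented counts for illustration only.
toy_counts = {1850: 5, 1910: 15, 1923: 30, 1990: 50}
pre, post = split_at(toy_counts)
print(f"pre-1923: {pre:.0%}, 1923 and after: {post:.0%}")  # pre-1923: 20%, 1923 and after: 80%
```

Publication year alone does not settle rights status (place of publication, renewal, and later editions all matter), so a split like this is a first approximation of the intellectual property question raised in the summary.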
2. Individual Collections Cumulate to Form the System
How will digitization bring them together virtually?
Minimal Overlap
Book Works Held by X or More Libraries (in millions)
[Chart: x-axis Number of Libraries (1 or more through 10 or more, and 100 or more); y-axis 0 to 30 million works]
Works Held Broadly
Book Works Held by X or More Libraries (in millions)
[Chart: x-axis Number of Libraries (10, 50, 100, 200, 300, 400, 500 or more); y-axis 0 to 7 million works]
Works Held Broadly
Book Works Held by X or More Libraries, as Percent of Total Book Works
[Chart: x-axis Number of Libraries; y-axis 0% to 30%]
• 10 or more: 24%
• 50 or more: 9%
• 100 or more: 6%
• 200 or more: 4%
• 300 or more: 2%
• 400 or more: 2%
• 500 or more: 1%
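The "held by X or more libraries" charts are a complementary cumulative count over per-work holdings. A minimal sketch, assuming holdings have already been rolled up to one count per work; the numbers are invented:

```python
from bisect import bisect_left


def held_by_at_least(holdings_per_work, thresholds):
    """For each threshold k, count works held by k or more libraries."""
    ordered = sorted(holdings_per_work)
    n = len(ordered)
    # bisect_left finds the first position whose value is >= k,
    # so everything from there to the end meets the threshold.
    return {k: n - bisect_left(ordered, k) for k in thresholds}


def as_percent_of_total(counts, total):
    """Express each threshold's count as a percentage of all works."""
    return {k: 100 * v / total for k, v in counts.items()}


# Invented holdings counts, one entry per work.
holdings_per_work = [1, 1, 1, 2, 3, 12, 55, 120]
counts = held_by_at_least(holdings_per_work, [1, 2, 10, 50, 100])
print(counts)  # {1: 8, 2: 5, 10: 3, 50: 2, 100: 1}
print(as_percent_of_total(counts, len(holdings_per_work)))
```

Sorting once and bisecting keeps each threshold lookup logarithmic, which matters at the scale of tens of millions of works.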
The Virtual System in Practice
Google Print digitization initiative
Questions:
• How many print books does this initiative potentially impact?
• What proportion of the “systemwide print book collection” does this represent?
• Overlap (how much is held broadly? how much is held uniquely?)
A forthcoming paper from OCLC researchers will offer some perspective on these questions
Hopefully, work like this will help establish a set of important questions and metrics to address when:
• Considering digitization initiatives
• Considering the implications of a changing world of research and learning for collections
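One hypothetical way to frame the overlap questions, assuming holdings are available as a mapping from each work to the set of libraries holding it, and that an initiative's scope is approximated by its participants' collections. Both are simplifying assumptions, not the approach of the forthcoming paper, and the library and work identifiers are invented:

```python
def coverage(work_holders, participants):
    """Summarize how a set of participating libraries overlaps the system."""
    if not work_holders:
        raise ValueError("no works supplied")
    # Works held by at least one participating library.
    covered = {w for w, libs in work_holders.items() if libs & participants}
    return {
        "works_in_system": len(work_holders),
        "works_covered": len(covered),
        "share_covered": len(covered) / len(work_holders),
        # Covered works that no non-participating library also holds.
        "covered_works_held_only_by_participants": sum(
            1 for w in covered if work_holders[w] <= participants),
        # Covered works held by exactly one library in the whole system.
        "covered_works_held_once": sum(
            1 for w in covered if len(work_holders[w]) == 1),
    }


# Invented holdings: work -> set of holding libraries.
work_holders = {
    "work1": {"A", "B", "C"},
    "work2": {"A"},
    "work3": {"D"},
    "work4": {"B", "D"},
}
summary = coverage(work_holders, participants={"A", "B"})
print(summary)
```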
3. How Is Rareness Distributed through the System?
Systemwide Holdings of Print Works
[Chart: share of print works by number of holdings]
• 1 holding: 37%
• 2 holdings: 14%
• 3-5 holdings: 16%
• More than 5 holdings: 33%
More than 9 million works are held only once
[Chart: number of works by holdings count (1, 2, 3, 4, 5, 6 to 10, 11 to 20, 21-50, 51-100, 100+ holdings); y-axis 0 to 12,000,000]
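Both rareness charts bin works by their holdings count. A minimal sketch of that binning with the pie chart's categories; the holdings counts are invented, and `bin_holdings` and `shares` are illustrative names:

```python
from collections import Counter

# Bins from the pie chart: (lowest count, highest count or None, label).
PIE_BINS = [
    (1, 1, "1 holding"),
    (2, 2, "2 holdings"),
    (3, 5, "3-5 holdings"),
    (6, None, "More than 5 holdings"),
]


def bin_holdings(holdings_per_work, bins):
    """Count works falling into each holdings bin."""
    counts = Counter()
    for n in holdings_per_work:
        for low, high, label in bins:
            if n >= low and (high is None or n <= high):
                counts[label] += 1
                break
    return counts


def shares(counts):
    """Convert bin counts into fractions of all binned works."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


toy = [1, 1, 1, 2, 4, 9, 30]  # invented holdings counts
binned = bin_holdings(toy, PIE_BINS)
print(binned)
print(shares(binned))
```

Swapping in finer bins (6 to 10, 11 to 20, and so on) reproduces the layout of the bar chart.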
4. What Systemwide Preservation Frameworks Have Served Us?
The Growth and Peak in Average Holdings Over Time
[Chart: average holdings (y-axis, 0 to 45) by age in years (x-axis, 0 to 200), for Manifestations and Works]
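Average holdings by age can be computed by bucketing items on age and averaging within each bucket. A sketch under stated assumptions: age is measured from 2005 (the date of the WorldCat copy), the 25-year bucket width simply mirrors the chart's axis ticks, and the data are invented:

```python
from collections import defaultdict
from statistics import mean


def average_holdings_by_age(items, as_of=2005, bucket=25):
    """Mean holdings per age bucket; keys are each bucket's starting age."""
    groups = defaultdict(list)
    for pub_year, holdings in items:
        age = as_of - pub_year
        groups[(age // bucket) * bucket].append(holdings)
    return {start: mean(values) for start, values in sorted(groups.items())}


# Invented (publication year, holdings) pairs for illustration only.
items = [(2004, 3), (2000, 5), (1975, 40), (1960, 30), (1900, 10)]
by_age = average_holdings_by_age(items)
print(by_age)                         # {0: 4, 25: 35, 100: 10}
print(max(by_age, key=by_age.get))    # 25 (age bucket with the peak average)
```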
Steady, Gradual Nineteenth Century Growth in Works Held Many Times…
[Chart: number of works by decade of publication, 1801-1810 through 1891-1900, stacked by holdings band (2 to 10, 11 to 50, 51 to 100, 101 to 200, 201 to 400, 400 to 1000, 1000+); y-axis 0 to 200,000]
…Rapid Postwar Increase in Works Held Many Times
[Chart: number of works by decade of publication, 1911-1920 through 1991-2000, stacked by holdings band (2 to 10, 11 to 50, 51 to 100, 101 to 200, 201 to 400, 400 to 1000, 1000+); y-axis 0 to 2,500,000]
Of Works with Multiple Holdings, Steady Increase Through the 1960s in the Proportion Held Many Times
[Chart: percentage of multiply held works in each holdings band (2 to 10, 11 to 50, 51 to 100, 101 to 200, 201 to 400, 400 to 1000, 1000+) by decade of publication, 1801-1810 through 1991-2000; y-axis 0% to 100%]
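The three decade charts group multiply held works by decade of publication and holdings band; the last one normalizes each decade to 100%. A sketch with invented data, where the band boundaries are an assumption because the slide's labels overlap at 400:

```python
from collections import Counter, defaultdict

# Holdings bands from the charts: (lowest, highest or None, label).
# The slide labels overlap at 400; here 400 is counted in "201 to 400"
# (an assumption), while the "400 to 1000" label is kept as shown.
BANDS = [
    (2, 10, "2 to 10"),
    (11, 50, "11 to 50"),
    (51, 100, "51 to 100"),
    (101, 200, "101 to 200"),
    (201, 400, "201 to 400"),
    (401, 1000, "400 to 1000"),
    (1001, None, "1000+"),
]


def decade_of(year):
    """Map a year to the charts' decade labels, e.g. 1810 -> '1801-1810'."""
    start = (year - 1) // 10 * 10 + 1
    return f"{start}-{start + 9}"


def band_of(holdings):
    """Return the band label, or None for works held only once."""
    for low, high, label in BANDS:
        if holdings >= low and (high is None or holdings <= high):
            return label
    return None


def band_shares_by_decade(items):
    """Share of multiply held works in each band, per decade of publication."""
    per_decade = defaultdict(Counter)
    for pub_year, holdings in items:
        label = band_of(holdings)
        if label is not None:  # works held once are excluded, as on the chart
            per_decade[decade_of(pub_year)][label] += 1
    return {
        decade: {label: n / sum(counts.values()) for label, n in counts.items()}
        for decade, counts in sorted(per_decade.items())
    }


# Invented (publication year, holdings) pairs for illustration only.
items = [(1805, 1), (1805, 3), (1809, 60),
         (1955, 5), (1955, 500), (1958, 1200), (1960, 12)]
for decade, mix in band_shares_by_decade(items).items():
    print(decade, mix)
```

Dropping the final normalization (keeping the raw `Counter` per decade) gives the stacked counts behind the two preceding charts.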
Summary and Discussion
Summary: Findings
1. Roughly 26 million print title works, represented in 32 million print title manifestations, are held by OCLC member libraries. This figure should be seen as a lower bound on the number of books printed over time. Half of the books date from the period since 1977. How can a mass digitization strategy effectively manage the intellectual property ramifications of this finding?
2. Publications are distributed across a large number of libraries, and any mass digitization strategy that ignores this distribution is likely to omit numerous works. How should this finding affect the library system’s planning for a massive format migration?
3. Rareness is very common within the system. This has been recognized by many librarians but is not always taken into account in policy development. How will any future print preservation strategy address this reality? Can data on rareness help to inform digitization strategies?
4. Redundancy in holdings across the system has changed over time. Has this made our framework for preservation more or less secure? What lessons should be drawn as we consider other print preservation strategies, such as paper repositories, particularly in the era of mass digitization? What lessons might there be for digital preservation?
More information …
More in-depth article forthcoming …
Contact us with comments and questions:
• Brian Lavoie: [email protected]
• Roger C. Schonfeld: [email protected]