Copac: Reengineering the UK national academic union catalogue to serve the 21st Century researcher
Redesign, collection analysis, recommendations
Joy Palmer, Mimas, University of Manchester
Key points
• Background & context of Copac
• Development in progress
• Strategic issues and directions
• R&D/Innovations work
  – Collections management project
  – Surfacing the Academic Long Tail project
• Aggregation of 50+ research & specialist libraries
• 40 million records
• Approx. 1 million search sessions per month
• Primary academic use case – locating long tail materials
• Primary workflow use case – cataloguing & ILL support
• Funded by JISC since 1996
• Sponsored by RLUK (and based on RLUK data +)
• In re-engineering process
• Expanding consistently to include specialist libraries
Copac…
Others include….
• Imperial War Museum
• Chetham’s Library
• Windsor Castle
• National Maritime Museum
• British Museum
• French Institute
• University of Exeter Special Collections
• The Women’s Library
• Institute of Education
• Royal Academy of Music
• Kew Royal Botanic Gardens
• Tate Gallery Library
• Natural History Museum
Half our users
• Advanced researchers
• Humanities-based
• Been with us a while…
• Looking for specific items
• More later…
And the rest are mostly librarians
• Cataloguing support
• Collections management
• ILL support
• Researcher support
Current data model
[Architecture diagram, last updated 07/12/07. Labels recoverable from the figure:]
• Copac database: consolidated records & single records plus local holdings; 50m merged records; currently OpenText LiveLink Discovery Server
• Coverage: includes details of books, journal titles, proceedings, theses, maps & plans, print & recorded music, video & film, spoken word, electronic materials, articles, archives, etc.
• Data sources: contributing library catalogues; RLUK MARC21 db; ESTC at the BL; Nielsen Bookdata (ToC, reviews, summaries, URLs)
• Processing: pre-processing of contributing libraries' data; deduplication & conversion to MODS XML; harvesting; search updates
• Transfer labels: ftp, http, Z39.50, Z39.50 & CGI, M2M
• Access points: user interfaces; admin; OpenURL router; HTTP, RSS, OpenURL, Z39.50, SRU/SRW; COinS; social media
• Linked services: live circulation data; ILL/copy via users’ OpenURL server; live request for book cover images; database cross-search
• Users: open access; use by HE, FE, NHS, libraries, schools, general public; c.1m sessions p/m
Development activity in progress
• New hardware (Oracle)
• Enhanced de-duplication
• Improved search (ranking, facets)
• ‘FRBR-ised’ record display
• Enhanced user interface
• Additional specialist libraries
• Graphic redesign
Strategic issues -- macro
• Changing technological landscape and user expectations
• Death of the physical?
• ‘Good enough’ = just-in-time (not a specific item)
• eBook search and discovery challenges
• Integrated and cross-domain search
Strategic issues - micro
• Leadership and community role
• Identity and positioning
• Enabling infrastructure to support library workflows, or resource discovery service?
• Governance
• Collections policy (uniqueness vs. comprehensiveness?)
• Innovation vs. service delivery
Copac Collections Management Project
Can I release this book?
How does my collection compare in strength to that of other UK libraries?
Project background
• Builds on the work of the White Rose Consortium
• Partners are Leeds, York and Sheffield Universities
• Funded by JISC as part of the Discovery initiative (making Copac data ‘work harder’)
• Sponsored & facilitated by RLUK
• 7 months and limited in budget
How it works
• Web-based
• Identifies the locations where items/batches are held
• Search by ISBN, RLUK #, author, title, subject
• Batch search (comma-delimited sets)
• Data visualisation of results
  – Map views
  – Graphs
• Record export in MODS, CSV
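The batch search and the "last copy" use case can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `HOLDINGS` table, its placeholder identifiers and the helper names `parse_batch`, `batch_search` and `last_copies` are all hypothetical, and the real tool queries Copac rather than an in-memory dictionary.

```python
# Hypothetical holdings table: identifier -> set of holding libraries.
# The identifiers and holdings are placeholder values, not real Copac data.
HOLDINGS = {
    "9780140449136": {"Leeds", "York", "Sheffield"},
    "9780199535569": {"Leeds"},
    "9780521657044": {"York", "Sheffield"},
}


def parse_batch(batch):
    """Split a comma-delimited batch into identifiers, dropping hyphens and blanks."""
    return [part.strip().replace("-", "") for part in batch.split(",") if part.strip()]


def batch_search(batch, holdings=HOLDINGS):
    """Map each identifier in the batch to a sorted list of holding libraries."""
    return {ident: sorted(holdings.get(ident, set())) for ident in parse_batch(batch)}


def last_copies(results, library):
    """Return identifiers for which `library` is the only holder in the results."""
    return [ident for ident, locations in results.items() if locations == [library]]


results = batch_search("978-0140449136, 9780199535569, 9780000000000")
for ident, locations in results.items():
    print(ident, locations or "not held")
print("Last copies at Leeds:", last_copies(results, "Leeds"))
```

An identifier held only by the querying library is a candidate "last copy", which is the check a librarian would want before answering "Can I release this book?".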
Exploratory and iterative in approach
Development/testing cycle with partners trialling and providing structured feedback
Six use cases developed
• Identifying last copies among titles considered for withdrawal
• Identifying collection strengths
• Deciding whether to conserve a book
• Reviewing a collection at the shelves
• Prioritising a collection or items for digitisation
• Subject strengths – collection development and marketing/differentiation
Findings
• Need for further development and refinement of the tools (esp. duplication & user interface issues)
• Significant potential for answering strategic questions about the status of collections
Particularly
• Overlap between the holdings of major UK research libraries in particular subject areas;
• Differences in that overlap between different subject disciplines and areas;
• The proportion of unique titles within those collections;
• The extent to which researchers will find that they no longer have access to such a wide range of research materials in future years, given current pressures on space and the widespread severe deterioration of printed materials through brittle paper and collapsing bindings.
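The overlap and uniqueness questions above reduce to set arithmetic over holdings. The sketch below is illustrative only: the `collections` data is invented, and `pairwise_overlap` and `unique_proportion` are hypothetical helper names rather than functions of the Copac tool.

```python
from itertools import combinations

# Hypothetical holdings: library -> set of title identifiers (invented data).
collections = {
    "Leeds": {"t1", "t2", "t3", "t4"},
    "York": {"t2", "t3", "t5"},
    "Sheffield": {"t3", "t6"},
}


def pairwise_overlap(collections):
    """Count the titles shared by each pair of libraries."""
    return {
        (a, b): len(collections[a] & collections[b])
        for a, b in combinations(sorted(collections), 2)
    }


def unique_proportion(collections, library):
    """Fraction of a library's titles that no other library in the set holds."""
    own = collections[library]
    others = set().union(
        *(titles for name, titles in collections.items() if name != library)
    )
    return len(own - others) / len(own) if own else 0.0


print(pairwise_overlap(collections))
print(unique_proportion(collections, "Leeds"))
```

Running the same calculation on subsets filtered by subject classification would give the per-discipline comparison described in the findings, assuming subject data is available on the records.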
Next proposed steps
• Expansion of test libraries and resilience testing of the tool.
• Address evidence of scalability.
• Building collaborations and alliances with interested organisations pursuing complementary activity
• Addressing the development of a business model for a service beyond a pilot
• More targeted communications and dissemination of the activity
Surfacing the Academic Long Tail
SALT
Hypothesis…
Library circulation activity data can be used to support humanities research by surfacing underused ‘long tail’ library materials through search
And also… how sustainable would an API-based national shared service be?
Can such a service support users and also library workflows such as collections management?
RLUK, M25, Leeds University, Cambridge University, Sussex University.
John Rylands University Library:
• 1.3 million bib records
• 600,000 search sessions per month
• 23% of records unique (cross-checked against WorldCat)
• 40,000 students
10 years of circulation data
Why aren’t we there yet?
Where’s the business case?
• user demand
• benefits
• value
• sustainability
arts & humanities researchers borrow books…
market research reveals these users as…
• Centrifugal searchers
• Berrypickers from various trails
• Quite isolated and prone to pitfalls
And increasingly they just don’t ask librarians…
They ask their tutors and each other where to look…
Researchers are suspicious about UGC, especially ratings & reviews, but… they could see the immediate benefit of ‘tacit’ recommender functions…
What if?
this represented a national aggregation of data gathered from the usage activity of these researchers, collected as they worked with a national aggregation of unique or rare research collections?
In humanities research it’s
all the way
What can this mean?
• Surfacing and increasing usage of hidden collections (& demonstrating value)
• Providing new routes to discovery based on use and disciplinary contexts (not traditional classification).
• Powering ‘centrifugal searching’ and discovery through serendipity
• Enabling new, original research – academic excellence…
» Targeted academic researchers
» Examine relationship between relevance and frequency of borrowing
» Does frequency of borrowing correlate to increased relevancy?
What are users saying?
• 3 focus groups (18 people)
• MA/PhD humanities students (mixed ages)
• Recommendations are already key to them:
  – Supervisors/Peers
  – Amazon
  – Bib citations
• Don’t accept recommendations blindly
Focus groups and user testing
• 3 focus groups (18 people)
• MA/PhD humanities students (mixed ages)
• How relevant/useful are the recommendations at first glance?
• Do any other recommendations look useful?
• Were you previously aware of these texts?
• How likely would you be to borrow the recommended item?
What are users saying?
• Serendipity important (but not to all)
What are users saying?
• Very supportive, but in practice found results too generalist, irrelevant, and sometimes bizarre!
• Lower-ranked recommendations much better
• Did you find something you’d borrow? (yes!)
• Find something new? (mixed)
• Would you use it? (yes!)
• Useful for searching more widely
• More university data needed to improve results
Relation between key critical texts at the nose
And the other stuff here
Can we make the data work harder to solve other shared problems?
Issues for sustainability
• Is there a clear-cut case for a national shared service here?
• Data model:
  – data out = easy
  – data in = not so much
• Licensing & attribution: collective ownership of a collective pot?
• Is proof of our hypothesis key to sustainability?
Key findings
• Lower thresholds will throw up ‘long tail’ items, but relevance and usefulness are not evident (but what is The Long Tail?)
• Users aren’t concerned about data privacy
• This can be successful without a significant backlog of data
• A shared service needs to aggregate activity data from more libraries (but not many more)
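The effect of the threshold can be illustrated with a generic "borrowers of this item also borrowed" count. The slides do not describe the SALT algorithm, so this is only a sketch of one common approach: the `loans` data is invented, and `co_borrow_counts` and `recommend` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical anonymised circulation data: borrower -> set of items borrowed.
loans = {
    "b1": {"A", "B", "C"},
    "b2": {"A", "B"},
    "b3": {"A", "B", "D"},
    "b4": {"A", "C"},
}


def co_borrow_counts(loans):
    """Count, for each ordered item pair, how many borrowers took out both."""
    counts = Counter()
    for items in loans.values():
        for x, y in combinations(sorted(items), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts


def recommend(item, loans, threshold):
    """Items co-borrowed with `item` by at least `threshold` borrowers, most frequent first."""
    counts = co_borrow_counts(loans)
    scored = [
        (other, n)
        for (first, other), n in counts.items()
        if first == item and n >= threshold
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(recommend("A", loans, threshold=2))  # frequently co-borrowed items only
print(recommend("A", loans, threshold=1))  # lower threshold admits rarer items
```

In this toy data, lowering the threshold from 2 to 1 admits item "D", which only one borrower took out alongside "A". That is the sense in which a lower threshold surfaces long tail items, at the cost of weaker evidence that they are relevant.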
Proposed next steps
• Aggregate more data
• Assess impact over time
• Gather requirements and costs for a shared service
• Establish more data extraction recipes
• Investigate utility for collections management further
• Investigate usefulness for teachers & supervisors
Thanks for listening….