Copac: Reengineering the UK national academic union catalogue to serve the 21st Century researcher

Redesign, collection analysis, recommendations

Joy Palmer, Mimas, University of Manchester

Key points

• Background & context of Copac
• Development in progress
• Strategic issues and directions
• R&D/Innovations Work
  – Collections management project
  – Surfacing the Academic Long Tail project

• Aggregation of 50+ research & specialist libraries
• 40 million records
• Approx. 1 million search sessions per month
• Primary academic use case – locating long tail materials
• Primary workflow use case – cataloguing & ILL support
• Funded by JISC since 1996
• Sponsored by RLUK (and based on RLUK data +)
• In re-engineering process
• Expanding consistently to include specialist libraries

Copac…

Others include…

• Imperial War Museum
• Chetham’s Library
• Windsor Castle
• National Maritime Museum
• British Museum
• French Institute
• University of Exeter Special Collections
• The Women’s Library
• Institute of Education
• Royal Academy of Music
• Kew Royal Botanic Gardens
• Tate Gallery Library
• Natural History Museum

Half our users

• Advanced researchers
• Humanities-based
• Been with us a while…
• Looking for specific items
• More later…

And the rest are mostly librarians

• Cataloguing Support
• Collections Mgt
• ILL Support
• Researcher Support

Current data model (diagram, last updated 07/12/07)

• Contributing libraries: library catalogues (consolidated records & single records plus local holdings) and the RLUK MARC21 db (ftp); search updates
• Pre-processing: deduplication & conversion to MODS XML
• Copac database: 50m merged records; currently OpenText LiveLink Discovery Server; Admin
• User interfaces and M2M access: HTTP, RSS, OpenURL, Z39.50, SRU/SRW; c.1m sessions p/m
• Users: open access; use by HE, FE, NHS, libraries, schools, general public
• OpenURL router: ILL/copy via users’ OpenURL server
• Other links and data sources: live circulation data; social media; ESTC at the BL (Z39.50); Nielsen Bookdata ToC, reviews, summaries, URLs (ftp); live request for book cover images (http); COinS
• Database cross-search (Z39.50 & CGI)
• Google harvesting (http)
• Includes details of: books, journal titles, proceedings, theses, maps & plans, print & recorded music, video & film, spoken word, electronic materials, articles, archives, etc.
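The data model routes ILL/copy requests to each user’s own OpenURL server and lists COinS among its outputs. As a rough illustration of what that involves, the sketch below encodes a book record as an OpenURL 1.0 (Z39.88-2004) key/encoded-value context object and wraps the same string in a COinS span. It is a sketch under assumptions, not Copac’s implementation: the choice of fields and the function names (`openurl_kev`, `openurl_link`, `coins_span`) are hypothetical, and the resolver URL in the usage note is a placeholder.

```python
from html import escape
from urllib.parse import urlencode


def openurl_kev(isbn=None, title=None, author=None, date=None):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV context object for a book."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.genre", "book"),
    ]
    # Only include the metadata fields that are actually present.
    for key, value in (("rft.isbn", isbn), ("rft.btitle", title),
                       ("rft.au", author), ("rft.date", date)):
        if value:
            params.append((key, value))
    return urlencode(params)


def openurl_link(resolver_base, **metadata):
    """Append the context object to a user's own (hypothetical) resolver base URL."""
    separator = "&" if "?" in resolver_base else "?"
    return resolver_base + separator + openurl_kev(**metadata)


def coins_span(**metadata):
    """Embed the same context object in an HTML page as a COinS span."""
    return '<span class="Z3988" title="{}"></span>'.format(
        escape(openurl_kev(**metadata), quote=True))
```

For example, `openurl_link("https://resolver.example.ac.uk/openurl", title="Pride and Prejudice", author="Austen, Jane")` builds a request for that placeholder resolver; passing the same keyword arguments to `coins_span` yields markup that COinS-aware browser tools can pick up without knowing anything about the user’s resolver.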

Development activity in progress

• New hardware (Oracle)
• Enhanced de-duplication
• Improved search (ranking, facets)
• ‘FRBR-ised’ record display
• Enhanced user interface
• Additional specialist libraries
• Graphic redesign
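Enhanced de-duplication is listed here without a description of the method. One common approach is to build a match key per record and merge records that share it; the sketch below assumes a deliberately simple key (normalised ISBN when present, otherwise normalised title + first word of the author heading + year). The key design, the field names and the `merge_records` helper are illustrative assumptions, not Copac’s algorithm.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(text):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_key(record):
    """Hypothetical match key: ISBN if present, else title + first author word + year."""
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn", "")).upper()
    if isbn:
        return ("isbn", isbn)
    author_words = normalise(record.get("author", "")).split()
    first_word = author_words[0] if author_words else ""
    return ("title", normalise(record.get("title", "")), first_word, record.get("year", ""))


def merge_records(records):
    """Group records sharing a match key into one merged record listing all holdings."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    merged = []
    for recs in groups.values():
        base = dict(recs[0])  # keep the first contributor's description as-is
        base["holdings"] = sorted({r["library"] for r in recs})
        merged.append(base)
    return merged
```

A key this crude has obvious limits: ISBN-10 and ISBN-13 forms of the same book are not reconciled, and a record with an ISBN never matches one without, so a production service would need to weigh several fields rather than rely on a single exact key.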

Strategic issues - macro

• Changing technological landscape and user expectations
• Death of the physical?
• ‘Good enough’ = just-in-time (not a specific item)
• eBook search and discovery challenges
• Integrated and cross-domain search

Strategic issues - micro

• Leadership and community role
• Identity and positioning
• Enabling infrastructure to support library workflows, or resource discovery service?
• Governance
• Collections policy (uniqueness vs. comprehensiveness?)
• Innovation vs. service delivery

Copac Collections Management Project

Can I release this book?

How does my collection compare in strength to that of other UK libraries?

Project background

• Builds on the work of the White Rose Consortium

• Partners are Leeds, York and Sheffield Universities

• Funded by JISC as part of the Discovery initiative (making Copac data ‘work harder’).

• Sponsored & facilitated by RLUK
• 7 months and limited in budget

How it works

• Web-based
• Identifies the locations at which items/batches exist
• Search by ISBN, RLUK #, author, title, subject
• Batch search (comma-delimited sets)
• Data visualisation of results
  – Map views
  – Graphs
• Record export in MODS, CSV
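A minimal sketch of the batch-search idea: split a comma-delimited set of ISBNs, look up where each item is held, and export the result as CSV. It assumes holdings are available as an in-memory map from ISBN to holding libraries; the real tool’s data source and interface are not described here, so `parse_batch`, `locate` and `to_csv` are hypothetical.

```python
import csv
import io
import re


def parse_batch(batch):
    """Split a comma-delimited batch of identifiers, dropping blanks, spaces and hyphens."""
    ids = []
    for part in batch.split(","):
        cleaned = re.sub(r"[\s-]", "", part).upper()
        if cleaned and cleaned not in ids:  # keep first occurrence, skip repeats
            ids.append(cleaned)
    return ids


def locate(batch, holdings):
    """Return (identifier, [libraries]) for each identifier in the batch.

    `holdings` is a hypothetical map of ISBN -> set of library names, standing in
    for whatever index the real service queries.
    """
    return [(isbn, sorted(holdings.get(isbn, ()))) for isbn in parse_batch(batch)]


def to_csv(rows):
    """Export results as CSV: one row per identifier, libraries joined by ';'."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["isbn", "library_count", "libraries"])
    for isbn, libraries in rows:
        writer.writerow([isbn, len(libraries), ";".join(libraries)])
    return out.getvalue()
```

`to_csv(locate(batch, holdings))` gives one row per identifier with a holding count and the holding libraries; the count column is the kind of figure a graph could be drawn from, and the library list is what a map view would plot.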

Exploratory and iterative in approach

Development/testing cycle with partners trialling and providing structured feedback

Six use cases developed

• Identifying last copies among titles considered for withdrawal
• Identifying collection strengths
• Deciding whether to conserve a book
• Reviewing a collection at the shelves
• Prioritising a collection or items for digitisation
• Subject strengths – collection development and marketing/differentiation
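The first use case can be expressed as a simple check over holdings data. The sketch below assumes a hypothetical map from title identifier to the set of libraries recorded as holding it, and sorts withdrawal candidates into bands; the `scarce_threshold` of 3 is an illustrative policy parameter, not a figure from the project.

```python
from typing import Dict, Iterable, List, Set


def withdrawal_review(candidates: Iterable[str], holdings: Dict[str, Set[str]],
                      my_library: str, scarce_threshold: int = 3) -> Dict[str, List[str]]:
    """Sort withdrawal candidates into 'last copy', 'scarce' and 'widely held' bands.

    scarce_threshold is an illustrative local policy value, not a Copac rule.
    """
    result: Dict[str, List[str]] = {"last copy": [], "scarce": [], "widely held": []}
    for title_id in candidates:
        # The reviewing library holds every candidate by definition.
        holders = holdings.get(title_id, set()) | {my_library}
        if len(holders) == 1:
            result["last copy"].append(title_id)
        elif len(holders) <= scarce_threshold:
            result["scarce"].append(title_id)
        else:
            result["widely held"].append(title_id)
    return result
```

‘Last copy’ here means last among the libraries whose holdings the data covers; a title missing from the map is treated as held only by the reviewing library, which errs on the side of keeping it.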

Findings

• Need for further development and refinement of the tools (esp. duplication & user interface issues)

• Significant potential for answering strategic questions about the status of collections

Particularly:

• Overlap between the holdings of major UK research libraries in particular subject areas;
• Differences in that overlap between different subject disciplines and areas;
• The proportion of unique titles within those collections;
• The extent to which researchers will find that they no longer have access to such a wide range of research materials in future years, given current pressures on space and the widespread severe deterioration of printed materials through brittle paper and collapsing bindings.
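Two of the questions above, overlap between holdings and the proportion of unique titles, reduce to set arithmetic once records are de-duplicated. A sketch, assuming each library’s collection (or its holdings in one subject area) is available as a set of merged-record identifiers; the Jaccard index is one reasonable overlap measure chosen here for illustration, not necessarily the project’s.

```python
from itertools import combinations


def overlap_report(collections):
    """collections: library name -> set of de-duplicated title identifiers.

    Returns (pairwise Jaccard overlap, share of each library's titles held nowhere else).
    """
    pairwise = {}
    for a, b in combinations(sorted(collections), 2):
        shared = collections[a] & collections[b]
        union = collections[a] | collections[b]
        pairwise[(a, b)] = len(shared) / len(union) if union else 0.0
    unique_share = {}
    for name, titles in collections.items():
        # Everything held by any other library in the comparison set.
        others = set().union(*(t for n, t in collections.items() if n != name))
        unique_share[name] = len(titles - others) / len(titles) if titles else 0.0
    return pairwise, unique_share
```

Running it separately on each subject area’s subsets gives the discipline-by-discipline comparison in the second bullet; ‘unique’ is always relative to the libraries included, so adding contributors can only lower it.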

Next proposed steps

• Expansion of test libraries and resilience testing of the tool.
• Address the need for evidence of scalability.
• Building collaborations and alliances with interested organisations pursuing complementary activity.
• Addressing the development of a business model for a service beyond a pilot.
• More targeted communications and dissemination of the activity.

Surfacing the Academic Long Tail (SALT)

Hypothesis: library circulation activity data can be used to support humanities research by surfacing underused ‘long tail’ library materials through search.
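One way to turn circulation data into ‘tacit’ recommendations is co-borrowing: items borrowed by the same people are treated as related. The sketch below assumes loans arrive as (borrower, item) pairs and counts how often each pair of items shares a borrower; the `threshold` and `limit` parameters and the function names are illustrative assumptions, not SALT’s published method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_borrowing_counts(loans):
    """loans: iterable of (borrower_id, item_id). Count item pairs sharing a borrower."""
    items_by_borrower = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)
    pairs = Counter()
    for items in items_by_borrower.values():
        # Sorting makes each unordered pair appear once, as (smaller, larger).
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(item, pairs, threshold=2, limit=10):
    """Items co-borrowed with `item` at least `threshold` times, most frequent first."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if count >= threshold:
            if a == item:
                scores[b] = count
            elif b == item:
                scores[a] = count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked][:limit]
```

Lowering `threshold` admits pairs seen only once or twice, which is where less-borrowed ‘long tail’ items tend to appear, and also where noise creeps in.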

And also… how sustainable would an API-based national shared service be?

Can such a service support users and also library workflows such as collections management?

RLUK, M25, Leeds University, Cambridge University, Sussex University.

John Rylands University Library:

• 1.3 million bib records
• 600,000 search sessions per month
• 23% of records unique (cross-checked against WorldCat)
• 40,000 students
• 10 years of circulation data

Why aren’t we there yet?

Where’s the business case?

• User demand
• Benefits
• Value
• Sustainability

Arts & humanities researchers borrow books…

Market research reveals these users as…

• Centrifugal searchers
• Berrypickers from various trails
• Quite isolated and prone to pitfalls

And increasingly they just don’t ask librarians… They ask their tutors and each other where to look…

Researchers are suspicious of UGC (user-generated content), especially ratings & reviews, but… they could see the immediate benefit of ‘tacit’ recommender functions…

What if this represented a national aggregation of data gathered from the usage activity of these researchers, collected as they worked with a national aggregation of unique or rare research collections?

In humanities research it’s

all the way

What can this mean?

• Surfacing and increasing usage of hidden collections (and demonstrating value)

• Providing new routes to discovery based on use and disciplinary contexts (not traditional classification).

• Powering ‘centrifugal searching’ and discovery through serendipity

• Enabling new, original research – academic excellence…

» Targeted academic researchers

» Examine relationship between relevance and frequency of borrowing

» Does frequency of borrowing correlate with increased relevance?
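The last question can be framed as a rank correlation between how often each recommended item has been borrowed and how relevant researchers judge it. A sketch using Spearman’s rho, assuming relevance is collected on an ordinal scale (say 1 to 5) for each recommended item; Spearman is chosen here because ratings are ordinal and loan counts are typically skewed, and the project’s actual analysis may differ. `average_ranks` and `spearman` are hypothetical helpers.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant sequence has no rank correlation")
    return cov / (spread_x * spread_y)
```

A value of `spearman(loan_counts, ratings)` near +1 would mean heavily borrowed items are rated more relevant; a value near 0 would suggest borrowing frequency says little about relevance, which matters when deciding how far to trust low-threshold, long-tail recommendations.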

What are users saying?

• 3 focus groups (18 people)
• MA/PhD humanities students (mixed ages)
• Recommendations are already key to them:
  – Supervisors/Peers
  – Amazon
  – Bib citations
• Don’t accept recommendations blindly

Focus groups and user testing

• How relevant/useful are the recommendations at first glance?
• Do any other recommendations look useful?
• Were you previously aware of these texts?
• How likely would you be to borrow the recommended item?


• Serendipity important (but not to all)


• Very supportive, but in practice found results too generalist, irrelevant, and sometimes bizarre!

• Lower-ranked recommendations much better
• Did you find something you’d borrow? (yes!)
• Find something new? (mixed)
• Would you use it? (yes!)
• Useful for searching more widely
• More university data needed to improve results

Relation between key critical texts at the nose and the other stuff here

Can we make the data work harder to solve other shared problems?

Issues for sustainability

• Is there a clear-cut case for a national shared service here?
• Data model:
  – data out = easy
  – data in = not so much

• Licensing & Attribution: collective ownership of a collective pot?

• Is proof of our hypothesis key to sustainability?

Key findings

• Lower thresholds will throw up ‘long tail’ items, but their relevance and usefulness are not evident (but what is The Long Tail?)
• Users aren’t concerned about data privacy
• This can be successful without a significant backlog of data
• A shared service needs to aggregate activity data from more libraries (but not many more)
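The last finding can be illustrated by pooling per-library co-borrowing counts: pairs too rare to pass a threshold in one library can pass it once counts are summed. A sketch, assuming every library’s pairs use the same item identifiers (for example merged union-catalogue record IDs) and the same (smaller, larger) ordering within each pair; `aggregate_pairs` and `coverage_above_threshold` are hypothetical names.

```python
from collections import Counter


def aggregate_pairs(per_library_pairs):
    """Sum co-borrowing pair counts contributed by several libraries.

    per_library_pairs: library name -> Counter mapping (item_a, item_b) -> count.
    """
    total = Counter()
    for pairs in per_library_pairs.values():
        total.update(pairs)  # Counter.update adds counts rather than replacing them
    return total


def coverage_above_threshold(pairs, threshold):
    """Distinct items appearing in at least one pair whose count meets the threshold."""
    items = set()
    for (a, b), count in pairs.items():
        if count >= threshold:
            items.update((a, b))
    return items
```

For example, with a threshold of 2, a pair borrowed together once in each of two libraries only qualifies after aggregation, which is one way of seeing why a few more contributors may be enough to widen coverage.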

Proposed next steps

• Aggregate more data
• Assess impact over time
• Gather requirements and costs for a shared service
• Establish more data extraction recipes
• Investigate utility for collections mgt further
• Investigate usefulness for teachers & supervisors
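What a ‘data extraction recipe’ contains is not specified here; the sketch below shows what a minimal one might look like, assuming a local loans export in CSV with hypothetical `borrower_id` and `item_id` columns. Borrower IDs are replaced by a keyed HMAC-SHA256 pseudonym so that co-borrowing can still be counted without sharing patron identifiers. The column names and the key handling are assumptions.

```python
import csv
import hashlib
import hmac
import io


def extract_loans(csv_text, secret_key, id_field="borrower_id", item_field="item_id"):
    """Hypothetical extraction recipe: read a loans export and pseudonymise borrowers.

    secret_key (bytes) stays inside the contributing library; only the
    (pseudonym, item) pairs are shared.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        borrower = (row.get(id_field) or "").strip()
        item = (row.get(item_field) or "").strip()
        if not borrower or not item:
            continue  # skip incomplete rows rather than guess
        token = hmac.new(secret_key, borrower.encode("utf-8"), hashlib.sha256).hexdigest()
        rows.append((token, item))
    return rows
```

Keeping the key inside the library means the shared output cannot be reversed simply by hashing a list of known borrower IDs; rotating the key between extracts breaks linkage across them, which may or may not be wanted when assessing impact over time.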

Thanks for listening….

