exploring a world of networked information built from free-text metadata

Shenghui Wang Rob Koopman Exploring a world of networked information built from free-text metadata OCLC Research EMEA ELAG2015


Post on 28-Jul-2015


TRANSCRIPT

Page 1: Exploring a world of networked information built from free-text metadata

Shenghui Wang, Rob Koopman

Exploring a world of networked information built from free-text metadata

OCLC Research EMEA

ELAG2015

Page 2: Exploring a world of networked information built from free-text metadata

What would you do if you were interested in a topic?

Page 5: Exploring a world of networked information built from free-text metadata

Difficult to answer these questions:
• What are the different aspects of this topic?
• Are there related aspects missing in my search terms?
• Who are the most prominent authors about this topic?
• Which journals publish most about this topic?
• How have others (e.g. librarians) described and classified this topic?

Page 6: Exploring a world of networked information built from free-text metadata

Demo

• http://thoth.pica.nl/relate?input=opac

Page 7: Exploring a world of networked information built from free-text metadata

How do we do this?

• OFFLINE: generates a semantic representation for each entity

• ONLINE: finds the most related entities and uses multidimensional scaling to display them

Page 8: Exploring a world of networked information built from free-text metadata

Build semantic representation

• Basic assumptions
– An entity can be represented by its context
– Entities which share more context are more likely to be related
• Context is the textual environment where an entity occurs

• The effects of state prekindergarten programs on young children’s school readiness in five states

• [author:jung kwanghee]
• [subject:readiness for school]
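The idea of representing an entity by its context can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the field names `title` and `entities`, the whitespace tokenisation, and the second toy record are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; the first reuses the example title from the slide,
# the second is invented so that one entity appears in two contexts.
records = [
    {
        "title": "The effects of state prekindergarten programs on young "
                 "children's school readiness in five states",
        "entities": ["author:jung kwanghee", "subject:readiness for school"],
    },
    {
        "title": "School readiness and later achievement",
        "entities": ["subject:readiness for school"],
    },
]

def build_context(records):
    """Count, for every entity, the title terms it co-occurs with."""
    context = defaultdict(Counter)
    for rec in records:
        # Assumption: lower-cased whitespace tokens stand in for topical terms.
        terms = rec["title"].lower().split()
        for entity in rec["entities"]:
            context[entity].update(terms)
    return context

context = build_context(records)
print(context["subject:readiness for school"].most_common(3))
```

Each entity ends up with one row of term counts. Entities whose rows overlap heavily share context and, under the assumptions above, are candidates for being related.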

Page 9: Exploring a world of networked information built from free-text metadata

Dataset

● ArticleFirst, 65 million articles
● Selected 4 million entities (topical terms, authors, ISSNs, Dewey decimal codes)
● Represented by 1 million topical terms

But a matrix of 4M x 1M is too big to process
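A quick back-of-the-envelope calculation shows why the full matrix is impractical. The 4 bytes per cell is an assumption (dense 32-bit values); a sparse encoding would be smaller, but the slide's point about scale still holds.

```python
entities = 4_000_000      # rows: selected entities
terms = 1_000_000         # columns: topical terms
bytes_per_cell = 4        # assumption: dense 32-bit values

cells = entities * terms
size_tb = cells * bytes_per_cell / 10**12  # decimal terabytes

print(f"{cells:.1e} cells, about {size_tb:.0f} TB if stored densely")
```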

Page 10: Exploring a world of networked information built from free-text metadata

Dimension reduction based on Random Projection

C: a co-occurrence matrix

R: a random matrix of +/-1

C’: approximation of C after random projection (the semantic matrix)
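With the symbols above, random projection is usually written as C’ = C · R, where R has far fewer columns than C. The sketch below shows this on a tiny matrix in plain Python. The function name, the target dimension `k`, and the fixed seed are assumptions for illustration; a scaling factor, which is often applied, is left out for simplicity.

```python
import random

def random_projection(c, k, seed=0):
    """Project each row of c onto k dimensions using a random +/-1 matrix."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_cols = len(c[0])
    # R: n_cols x k matrix whose entries are +1 or -1 with equal probability.
    r = [[rng.choice((-1, 1)) for _ in range(k)] for _ in range(n_cols)]
    # C' = C . R, computed row by row.
    return [
        [sum(row[j] * r[j][d] for j in range(n_cols)) for d in range(k)]
        for row in c
    ]

# Hypothetical co-occurrence counts: 3 entities x 4 terms.
c = [
    [1, 0, 2, 0],
    [1, 0, 2, 0],
    [0, 3, 0, 1],
]
c_reduced = random_projection(c, k=2)
print(c_reduced)
```

Rows with identical context map to identical reduced vectors, and rows with similar context tend to stay close, which is what makes the reduced matrix usable for similarity search.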

Page 11: Exploring a world of networked information built from free-text metadata

Online interface

• Find mutual nearest neighbors

• Use multidimensional scaling to display
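The display step can be illustrated with classical (Torgerson) multidimensional scaling. The slides do not say which MDS variant is used, so this is a stand-in under that assumption. It is written in plain Python with power iteration, and the four points are a made-up example.

```python
import math

def classical_mds(dist, dims=2, iters=200):
    """Embed items in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    # Double-centre the squared distances to get the Gram matrix B.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector (deterministic start).
        v = [float(i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords

# Hypothetical items whose true positions form a 3 x 4 rectangle.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
layout = classical_mds(dist)
```

The layout is only determined up to rotation and reflection, but the pairwise distances between the embedded points reproduce the input distances for this example.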

Page 12: Exploring a world of networked information built from free-text metadata

Nearest neighbors

Page 13: Exploring a world of networked information built from free-text metadata

Mutual nearest neighbors
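The difference between nearest neighbors and mutual nearest neighbors can be sketched as below, assuming the usual definition: B is a mutual neighbor of A only if each appears in the other's top-k list. Cosine similarity, the function names, and the toy vectors are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_neighbors(vectors, name, k):
    """Return the k entities most similar to `name`."""
    scores = [(cosine(vectors[name], vec), other)
              for other, vec in vectors.items() if other != name]
    scores.sort(reverse=True)
    return [other for _, other in scores[:k]]

def mutual_nearest_neighbors(vectors, name, k):
    """Keep only neighbors that also list `name` among their own top k."""
    return [other for other in nearest_neighbors(vectors, name, k)
            if name in nearest_neighbors(vectors, other, k)]

# Hypothetical 2-dimensional semantic vectors.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.6, 0.4],
    "d": [0.0, 1.0],
    "e": [0.1, 0.9],
}
print(nearest_neighbors(vectors, "c", 1))
print(mutual_nearest_neighbors(vectors, "c", 1))
print(mutual_nearest_neighbors(vectors, "a", 1))
```

In this toy data "c" has a nearest neighbor ("b"), but "b" is closer to "a" than to "c", so "c" has no mutual nearest neighbor at k = 1. The mutual filter drops such one-sided links.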

Page 15: Exploring a world of networked information built from free-text metadata

Possible applications

• Explorative interface
• Context based search:
– brain
• Journal finder
– Arctic ice journals
– http://brain.oxfordjournals.org/
• Author name disambiguation
– pre kindergarten

Page 17: Exploring a world of networked information built from free-text metadata

Ariadne (demo): http://thoth.pica.nl/relate

• An extremely fast way of navigating large-scale heterogeneous entities

• Generalisable to different datasets
– Full WorldCat
– Small but highly curated astrophysics dataset

• Supports explorative information retrieval and entity disambiguation

Page 18: Exploring a world of networked information built from free-text metadata

References

• Koopman, Rob, and Shenghui Wang. 2014. “Where Should I Publish? Detecting Journal Similarity Based on What Has Been Published There.” In Proceedings of Digital Libraries 2014, 483–484. London, United Kingdom. Association for Computing Machinery.

• Koopman, Rob, Shenghui Wang, Andrea Scharnhorst, and Gwenn Englebienne. 2015. “Ariadne’s Thread - Interactive Navigation in a World of Networked Information.” In CHI '15 Extended Abstracts on Human Factors in Computing Systems. ACM, Seoul, South Korea.

• Koopman, Rob, Shenghui Wang, and Andrea Scharnhorst. 2015. “Contextualization of topics - browsing through terms, authors, journals and cluster allocations.” In Proceedings of 15th International Conference on Scientometrics & Informetrics. Istanbul, Turkey.

Page 19: Exploring a world of networked information built from free-text metadata

Explore. Share. Magnify.

Thank you

Shenghui Wang, Rob Koopman

OCLC Research EMEA
