More Than Just Black and White: A Case for Grey Literature References in Scientific Paper Information Retrieval Systems. Aravind Sesagiri Raamkumar, Schubert Foo & Natalie Pang. Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore. Presentation for ICADL’15, December 10th 2015.


Page 1: More Than Just Black and White: A Case for Grey Literature References in Scientific Paper Information Retrieval Systems

More Than Just Black and White: A Case for Grey Literature References in Scientific Paper Information Retrieval Systems

Aravind Sesagiri Raamkumar, Schubert Foo & Natalie Pang

Wee Kim Wee School of Communication and Information
Nanyang Technological University, Singapore

Presentation for ICADL’15
December 10th 2015


What is this study about?

“How about retrieving/recommending Grey Literature (GL) materials along with scientific articles during the literature review sessions of researchers?”

Look at past evidence => Examine the extent of GL referencing in published papers

How to push GL => Propose and evaluate boosting techniques for the relevant settings:

a) Scientific Paper Information Retrieval (SPIR)

b) Scientific Paper Recommender Systems (SPRS) [Future Work]


BACKGROUND

Scientific papers are the key information resources of researchers

Why? => Rigorous review process of publishers

Grey Literature (GL): “Information produced on all levels of government, academia, business and industry in electronic and print formats not controlled by commercial publishing” (Schöpfel et al., 2005)

Examples: Websites, technical reports and dissertations


BACKGROUND ON GL

GL Initiatives
• Conferences (GL international conference)
• Journals (TGJ – The Grey Journal)
• Open Access Repositories (OpenGrey) (Stock & Henrot, 2011)

Recognition of GL referencing in papers (Di Cesare et al., 2010; Booth et al., 2011)

Studies on ascertaining extent of GL in meta-analyses (Farace & Schöpfel, 2010; Yasin & Hasnain, 2012)

Debate on inclusion of GL in systematic literature reviews and meta-analyses (Farace & Schöpfel, 2010)


OBJECTIVES

RO1: Analyze the referencing extent of GL materials in an extract of papers from ACM Digital Library (ACM DL)

RO2: Propose boosting techniques for pushing GL materials in Scientific Paper Information Retrieval (SPIR) system settings due to their applicability in digital libraries


RO1: Analyze the referencing extent of GL materials in an extract of papers from ACM Digital Library (ACM DL)


RO1: DETAILS OF ACM DL EXTRACT

Extract: Articles from the ACM DL published between the years 1951 and 2010

Articles Count: 122,406 articles, comprising 103,739 proceedings articles and 18,667 periodicals articles

Bibliography Parsing: AnyStyle (http://anystyle.io/) was used to extract the title, reference type and date from the references in the bibliographies of the articles. Reference types are based on the BibTeX entry types.

Parsed References: 2,320,345 references were parsed and the required fields were extracted


RO1: ANALYSIS OF BIBLIOGRAPHIC REFERENCES

GL Percentage (GLP):
• Proceedings: 17.61%
• Periodicals: 14.48%

Survey articles have the lowest GLP (12.86%)

Among article types, demos (18.71%) and tutorials (17.98%) have the highest GLP in their references


RO1: ANALYSIS OF BIBLIOGRAPHIC REFERENCES

Top 5 GL Reference-types in ACM DL Extract

Reference-type    Count     Percentage of Total
misc              268610    14.69%
techreport        33539     1.83%
thesis            20024     1.10%
unpublished       618       0.03%
patent            615       0.03%

Miscellaneous (misc) is the most referenced GL type (14.69% of total)
It includes publicly viewable websites such as Wikipedia and YouTube
Wrongly formatted references also fall under this category

Technical reports (1.83%) and PhD/MSc theses (1.10%) are the second and third most referenced types
M.Sc. project reports and theses are referenced as technical reports
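The GLP figures above are the share of references whose parsed type is a GL type. A minimal sketch of that calculation follows. It assumes that the five reference types listed in the table are the complete GL set and that each reference has already been reduced to a type label; the function name `gl_percentage` and the sample data are hypothetical, not taken from the study.

```python
from collections import Counter

# Assumption: only the five reference types listed on the slide count as GL.
GL_TYPES = {"misc", "techreport", "thesis", "unpublished", "patent"}


def gl_percentage(reference_types):
    """Return the percentage of references whose type label is a GL type."""
    counts = Counter(reference_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no references parsed, so no GL share to report
    gl_total = sum(n for ref_type, n in counts.items() if ref_type in GL_TYPES)
    return 100.0 * gl_total / total


# Illustrative input only: type labels as a bibliography parser might emit them.
sample = [
    "article", "inproceedings", "misc", "techreport",
    "book", "article", "thesis", "inproceedings",
]
print(gl_percentage(sample))
```

The same function could be applied per article type (survey, demo, tutorial) by grouping the references before calling it.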


RO2: Propose boosting techniques for pushing GL materials in Scientific Paper Information Retrieval (SPIR)


RO2: GL BOOSTING TECHNIQUE FOR SPIR

Traditional SPIR setting
Article metadata and full-text are the indexed data
Documents are retrieved based on similarity score
Documents are sorted/ranked based on:
• Citation Count
• Publication Date
• Relevancy
• Hybrid

Document Boosting Technique
A new index field, Boosting Weight (BW), is added to the list of indexed fields
BW is multiplied with the similarity score => Better ranking for GL


RO2: GL BOOSTING TECHNIQUE FOR SPIR

Scenarios for Considering GL References
• Theses and Dissertations
• Higher Reference Count

Boosting Rules for Experiment

Rule 1: If the article or the reference is of non-GL type, BW is 1.0

Rule 2: If the reference is of GL type and its reference count is more than 2, BW is 1.5

Rule 3: If the reference type is a thesis, BW is 1.25
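The three rules and the multiplication of BW with the similarity score can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the slides do not say which rule wins when a thesis also has a reference count above 2, so the sketch applies Rule 2 first; it also assumes GL items matching neither Rule 2 nor Rule 3 keep a weight of 1.0. The `Doc` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    ref_type: str         # e.g. "article", "thesis", "techreport", "misc"
    is_gl: bool           # whether the item is classified as grey literature
    reference_count: int  # how often the item is referenced in indexed bibliographies
    similarity: float     # similarity score from the retrieval model


def boosting_weight(doc):
    """Return the Boosting Weight (BW) for a document under the three rules."""
    if not doc.is_gl:
        return 1.0   # Rule 1: non-GL items are not boosted
    if doc.reference_count > 2:
        return 1.5   # Rule 2 (assumed to take precedence over Rule 3)
    if doc.ref_type == "thesis":
        return 1.25  # Rule 3
    return 1.0       # assumption: other GL items are left unboosted


def rank_with_boost(docs):
    """Sort documents by similarity score multiplied by BW, best first."""
    return sorted(docs, key=lambda d: d.similarity * boosting_weight(d), reverse=True)


# Illustrative data only.
docs = [
    Doc("journal article", "article", False, 10, 10.0),
    Doc("phd thesis", "thesis", True, 1, 8.5),
    Doc("tech report", "techreport", True, 5, 7.0),
    Doc("web page", "misc", True, 1, 9.0),
]
for d in rank_with_boost(docs):
    print(d.title, d.similarity * boosting_weight(d))
```

In a search engine, BW would more likely be stored as an index field and applied at query time; the in-memory sort here only shows the effect on the ordering.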


RO2: EXPERIMENT DETAILS

Ten research topics were used as the search keywords for the experiment.

Ranking Techniques
• CCT – Citation Count
• BST – BM25 Similarity Score
• CST – Combined Score Technique (CCT + BST)
• GBT – GL Boosting Technique


RO2: EXPERIMENT DETAILS - EVALUATION

Grey Literature Availability Measure (GLAM)
Accounts for both the count of retrieved GL articles and their corresponding ranks
Summation of two internal metrics: GL Count (GLC) and in-query MRR (iMRR)
Higher GLAM values indicate a higher presence of GL articles in the query results, along with better ranks

GL Count (GLC)
Count of GL materials that are retrieved in the query

In-query MRR (iMRR)
A modified version of Mean Reciprocal Rank (Voorhees, 1999)
iMRR is calculated within a single query (unlike MRR)
• Step 1: The reciprocal ranks of GL articles in the query are identified
• Step 2: The sum of the reciprocal ranks is divided by the GL articles count (GLC) to form the iMRR value
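Read literally, the definitions above give GLAM = GLC + iMRR, with iMRR = (sum of 1/rank over the retrieved GL articles) / GLC. A minimal sketch of that reading follows. The function name `glam`, the boolean-list input and the optional cutoff `n` (for lists such as N@10 or N@20) are illustrative choices; returning 0.0 when no GL article is retrieved is an assumption, since the slides do not cover that case.

```python
def glam(is_gl_flags, n=None):
    """Compute GLAM for one ranked result list.

    is_gl_flags: booleans in rank order (index 0 is rank 1), True for GL items.
    n: optional cutoff, so that only the top n results are considered.
    """
    results = is_gl_flags if n is None else is_gl_flags[:n]

    # Step 1: reciprocal ranks of the GL articles in this query's results.
    reciprocal_ranks = [
        1.0 / rank for rank, is_gl in enumerate(results, start=1) if is_gl
    ]

    glc = len(reciprocal_ranks)  # GL Count (GLC)
    if glc == 0:
        return 0.0  # assumption: no GL retrieved means a GLAM of zero

    # Step 2: sum of reciprocal ranks divided by GLC gives iMRR.
    imrr = sum(reciprocal_ranks) / glc
    return glc + imrr


# Illustrative ranking with GL items at ranks 2 and 4.
flags = [False, True, False, True, False]
print(glam(flags))       # GLC = 2, iMRR = (1/2 + 1/4) / 2
print(glam(flags, n=3))  # only the GL item at rank 2 is inside the cutoff
```

Because GLC is an integer and iMRR never exceeds 1, the count dominates the measure and the rank component acts as a tie-breaker between lists with the same number of GL items.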


RO2: EXPERIMENT DETAILS - EVALUATION

Evaluation with two ranked lists (N@10 & N@20) for each research topic

Most users rarely go beyond the top 2 or 3 search results pages (Van Deursen & Van Dijk, 2009).

For N@10, GBT has higher GLAM than the benchmarking techniques

For N@20, two research topics ‘interaction design’ and ‘wireless networks’ are exceptions


LIMITATIONS

RO1 – Analysis of Bibliographic References in ACM DL Extract

Limited to quantifying the referencing of GL articles in the extract

In-depth qualitative studies are required to identify the citation motivations of authors based on the citation contexts of GL references in articles

RO2 – GL Boosting Experiment for SPIR

Applicable only for systems where bibliographic references are parsed and indexed

Few boosting rules with manually set boosting weights

Applicable only for referenced GL materials and not new GL materials

Results need to be validated with datasets from other disciplines


FUTURE WORK

Implement GL boosting technique for recommending papers for the literature search task “Finding similar papers based on a seed set of papers”

Conduct offline evaluation for predicting previously referenced GL materials

Conduct user evaluation study to measure user opinion and satisfaction levels

Evaluate GL boosting technique for Scientific Paper Recommender Systems (SPRS) setting, specifically for Collaborative Filtering algorithms

Re-train AnyStyle parsing service for identifying different types of GL in bibliographic references of articles


CONCLUSION

Analyzed the bibliographic references of articles from an extract of the ACM DL for ascertaining the level of GL referencing

GL materials were found in nearly 16% of the references

GL referencing was higher in proceedings article-types, particularly tutorial, demo and workshop papers

Proposed GL boosting technique for SPIR settings

An IR experiment was conducted to validate the effectiveness of the boosting technique using a novel evaluation metric, with ten research topics

Case raised for considering GL references while retrieving/recommending research papers for literature search tasks


THANK YOU