tagcurate: crowdsourcing the verification of biomedical … · 2020-06-18 · introduction...

24
TagCurate: Crowdsourcing the Verification of Biomedical Annotations to Mobile Users Bahar Sateli Sebastien Luong Ren´ e Witte Semantic Software Lab Department of Computer Science and Software Engineering Concordia University, Montr´ eal, Canada NETTAB 2013 Bahar Sateli, Sebastien Luong, Ren´ e Witte The TagCurate System 1 / 16

Upload: others

Post on 08-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

TagCurate:Crowdsourcing the Verification of

Biomedical Annotations to Mobile Users

Bahar Sateli Sebastien Luong Rene Witte

Semantic Software LabDepartment of Computer Science and Software Engineering

Concordia University, Montreal, Canada

NETTAB 2013

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 1 / 16

Page 2: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Natural Language ProcessingTagCurate Motivation

1 Introduction

2 TagCurate System

3 Android-NLP Integration

4 Conclusion

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 2 / 16

Page 3: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Natural Language ProcessingTagCurate Motivation

Natural Language Processing (NLP)

Definition

A branch of Artificial Intelligence that uses various techniques to processcontent written in a natural language, e.g., English or German.

Bottleneck: Gold Standard Corpora

Manually annotated documents required for training & testing NLP pipelines(especially for machine learning components).

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 3 / 16

Page 4: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Natural Language ProcessingTagCurate Motivation

Can we ‘crowdsource’ some of this work to mobile users?

Challenge: Current Web-based annotation frameworks (e.g., GATE Teamware)not designed for mobile use

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 4 / 16

Page 5: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

1 Introduction

2 TagCurate SystemSystem ArchitectureWeb-based InterfaceAndroid App

3 Android-NLP Integration

4 Conclusion

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 5 / 16

Page 6: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

System Architecture

The Crowd Task ManagerTagCurate System

Web Server

Database

Tagreement

Client-Server Model

RESTful communication over HTTPTagreement component is responsible for managing the crowdsourcing aswell as measuring (dis)agreements

User Groups

Task Managers, define verification tasks using the web-based interface

e.g., NLP pipeline developers, literature curators, . . .

The Crowd, verify (biomedical) annotations using the Android app

i.e., Virtually anyone with access to an Android-enabled device

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 6 / 16

Page 7: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

Tagreement Web-based Interface

Task Managers can define and supervise crowdsourcing tasks

Currently, only accepts GATE-formatted corpora

Stores an internal representation of each tag for distributed verification

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 7 / 16

Page 8: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

Tagreement Web-based Interface

Task Managers can define and supervise crowdsourcing tasks

Currently, only accepts GATE-formatted corpora

Stores an internal representation of each tag for distributed verification

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 7 / 16

Page 9: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

TagCurate Android App

Developed based on the latest Android distribution (Jelly Bean version 4.3)

Responsive design for phones and tablets

Users authenticate themselves on the server

Users pull tags from server

Temporary storage of verification history

View tags in context

Verify whether a tag is a case of:

True Positive (correct)

False Positive (spurious)

Modify tags features

Pairs of < key, value >

Modifications reflect in the tag representation

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 8 / 16

Page 10: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

TagCurate Android App

Developed based on the latest Android distribution (Jelly Bean version 4.3)

Responsive design for phones and tablets

Users authenticate themselves on the server

Users pull tags from server

Temporary storage of verification history

View tags in context

Verify whether a tag is a case of:

True Positive (correct)

False Positive (spurious)

Modify tags features

Pairs of < key, value >

Modifications reflect in the tag representation

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 8 / 16

Page 11: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

TagCurate Android App

Developed based on the latest Android distribution (Jelly Bean version 4.3)

Responsive design for phones and tablets

Users authenticate themselves on the server

Users pull tags from server

Temporary storage of verification history

View tags in context

Verify whether a tag is a case of:

True Positive (correct)

False Positive (spurious)

Modify tags features

Pairs of < key, value >

Modifications reflect in the tag representation

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 8 / 16

Page 12: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

TagCurate Android App

Developed based on the latest Android distribution (Jelly Bean version 4.3)

Responsive design for phones and tablets

Users authenticate themselves on the server

Users pull tags from server

Temporary storage of verification history

View tags in context

Verify whether a tag is a case of:

True Positive (correct)

False Positive (spurious)

Modify tags features

Pairs of < key, value >

Modifications reflect in the tag representation

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 8 / 16

Page 13: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

System ArchitectureWeb-based InterfaceAndroid App

What about the missing tags?

Manual Annotation

Users select a text span and assign type and features to the generated tag.

Pros

Human-generated tags usually have a higher quality

Cons

Difficult task on devices with small screenDifficult to achieve an adequate inter-annotator agreementRequires well-established annotation guidelines

Automatic Annotation

Users invoke domain-specific text mining pipelines that generate various tagsfrom text.

Pros

Reuse existing text mining pipelines

Cons

Text mining techniques are resource-intensive

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 9 / 16

Page 14: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

1 Introduction

2 TagCurate System

3 Android-NLP IntegrationMobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

4 Conclusion

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 10 / 16

Page 15: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Mobile Applications of NLP

Automatic SummarizationCondensed version of document(s)Various types: Generic, Focused, Updatee.g., Summly

Question AnsweringAnswering factual questionse.g., Apple’s Siri App

Information Extraction (IE)Identifying instances of specific classes

e.g., Persons, Organization, Events, etc.

Content DevelopmentCombining other NLP servicesGenerate new or complementary content

Other domain-specific servicese-Health, e-Learning, etc.

(Image Courtesy of Yahoo!)

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 11 / 16

Page 16: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Mobile Applications of NLP

Automatic SummarizationCondensed version of document(s)Various types: Generic, Focused, Updatee.g., Summly

Question AnsweringAnswering factual questionse.g., Apple’s Siri App

Information Extraction (IE)Identifying instances of specific classes

e.g., Persons, Organization, Events, etc.

Content DevelopmentCombining other NLP servicesGenerate new or complementary content

Other domain-specific servicese-Health, e-Learning, etc.

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 11 / 16

Page 17: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Mobile Applications of NLP

Automatic SummarizationCondensed version of document(s)Various types: Generic, Focused, Updatee.g., Summly

Question AnsweringAnswering factual questionse.g., Apple’s Siri App

Information Extraction (IE)Identifying instances of specific classes

e.g., Persons, Organization, Events, etc.

Content DevelopmentCombining other NLP servicesGenerate new or complementary content

Other domain-specific servicese-Health, e-Learning, etc.

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 11 / 16

Page 18: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Mobile Applications of NLP

Automatic SummarizationCondensed version of document(s)Various types: Generic, Focused, Updatee.g., Summly

Question AnsweringAnswering factual questionse.g., Apple’s Siri App

Information Extraction (IE)Identifying instances of specific classes

e.g., Persons, Organization, Events, etc.

Content DevelopmentCombining other NLP servicesGenerate new or complementary content

Other domain-specific servicese-Health, e-Learning, etc.

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 11 / 16

Page 19: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Mobile Applications of NLP

Automatic SummarizationCondensed version of document(s)Various types: Generic, Focused, Updatee.g., Summly

Question AnsweringAnswering factual questionse.g., Apple’s Siri App

Information Extraction (IE)Identifying instances of specific classes

e.g., Persons, Organization, Events, etc.

Content DevelopmentCombining other NLP servicesGenerate new or complementary content

Other domain-specific servicese-Health, e-Learning, etc.

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 11 / 16

Page 20: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Mobile Natural Language Processing

What we know

Numerous mobile applications can benefit from NLP support

Robust, open-source NLP frameworks are already available

However, NLP analysis is a very resource-intensive task!

Semantic Assistants Android-NLP Integration

Novel Android-NLP integration approach

Provides Separation of ConcernsNLP developer does not need to know AndroidAndroid app developer does not need to know NLP

Android library for NLP service execution, rather than multiple apps

Enable users to benefit from complex NLP services in their tasks

[B. Sateli, G. Cook, R. Witte, “Smarter Mobile Apps through IntegratedNatural Language Processing Services”, MobiWIS 2013]

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 12 / 16

Page 21: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Semantic Assistants Framework

Existing open-source (AGPL3) service-oriented architecture

Brokers NLP pipelines as standard W3C Web services

Avoids context-switching of user to external text mining applications

Brings NLP analysis directly to various applications via plug-ins

NLP ServiceResult

...

− Calling an NLP Service

FocusedSummarization

− Runtime Parameters

Word Processor

Client

NLP Service 1

NLP Service 2

NLP Service n

Server

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 13 / 16

Page 22: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Semantic Assistants NLP Intents

Web

Serv

er

Language

Descriptions

Service

Semantic Assistants

NLP Service Connector

Semantic Assistants Server

Linux Kernel

Ap

plic

atio

ns

Lib

rary

User Android−Enabled Device

Service Information

Service InvocationSystem

Libraries Abstraction Layer

...Assistants

SA Android

Semantic

Other

Any

AppApp

Client-Server ModelClient is an Android appServer-side component is the Semantic Assistants server

RESTful communication over HTTP(S)

Handles various NLP service result formatsAnnotation, e.g., a person name in textDocument, e.g., summary of a long webpageFiles, e.g., an HTML document

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 14 / 16

Page 23: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Mobile Applications of NLPSemantic Assistants FrameworkDeveloping NLP Android Apps

Developing NLP Android Apps

Separation of Concerns

Android Developer

Identify the NLP task

Extend the SA intents by choosing aunique package name for this new service

Embed the SA Android library in a newAndroid app

Invoke the intent in app using the library

NLP Developer

Develop the concrete NLP pipeline

Deploy the pipeline on a SA server

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 15 / 16

Page 24: TagCurate: Crowdsourcing the Verification of Biomedical … · 2020-06-18 · Introduction TagCurate System Android-NLP Integration Conclusion Natural Language Processing TagCurate

IntroductionTagCurate System

Android-NLP IntegrationConclusion

Summary and Outlook

Summary and Outlook

Summary

Distribute annotation jobs to large user groups

Expert annotators can focus on quality control and difficult cases

Easily bring NLP pipelines to (Android) mobile apps

Ongoing work

TagCurate app facelift

Expanding the user profiles

Finding incentives and introducing social aspects

Add annotation capabilities (both manual and semi-automatic)

Find out more. . .

Twitter: @SemSoft

Web: http://www.semanticsoftware.info/

Bahar Sateli, Sebastien Luong, Rene Witte The TagCurate System 16 / 16