CBioC: Collaborative Bio-Curation. Chitta Baral, Department of Computer Science and Engineering, Arizona State University

Post on 19-Dec-2015



CBioC: Collaborative Bio-Curation

Chitta Baral

Department of Computer Science and Engineering

Arizona State University


Agenda

Introduction

Using the C-BioCurator System

    Overall Architecture
    Installation
    User Authentication
    User Interaction
    Text extraction systems
    Existing databases

System Implementation

Conclusion and Future Work


Introduction

Motivation

Our goal in this paper is to help extract information nuggets from articles and abstracts and store them in a database.

The challenge is that the number of articles is huge and keeps growing, and that the text must be processed as natural language.

The two existing approaches are human curation and automatic information extraction systems. Neither meets the challenge: the first is expensive, while the second is error-prone.


Introduction (cont’d)

Approach

We propose a solution that is inexpensive and that scales up.

Our approach takes advantage of automatic information extraction methods as a starting point.

It is based on the premise that if there are a lot of articles, then there must be a lot of readers and authors of these articles.

We provide a mechanism by which the readers of the articles can participate and collaborate in the curation of information.

We refer to our approach as “Collaborative Curation”.


Introduction (cont’d)

Results: We report on our system CBioC (short for Collaborative Bio-Curator), which facilitates collaborative curation.

Availability: A prototype of the web interaction version is currently available at http://www.cbioc.org


Using the C-BioCurator System

Overall Architecture

The two main components of our CBioC system are (i) the CBioC interface and (ii) the CBioC database.

The user interacts with the CBioC system through the CBioC interface.

The curated or extracted data (from the abstracts and texts of the articles), together with the user interactions with respect to these data, are stored in the CBioC database.


Using the C-BioCurator System (cont’d)

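[Figure: architecture diagram of the Collaborative Bio-Curation System. Labeled components: Extractor Systems; Download Agent (two instances); Text DB; Existing DB; Data Format Exchange System; BioPax; CBioC Database; CBioC Interface. Interface functions: Browse Facts, Vote Facts, Add/Modify New Facts, Add New Schema, Invoke IntExtractor, User Management. Existing database sources: DIP, Reactome, ... Text sources: Nature, Science, Pubmed, ...]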


Using the C-BioCurator System (cont’d)

Installation and Invocation

A researcher needs to download our system and install it on her computer.

Whenever the researcher accesses a web page from which she can access an article or an abstract, the CBioC system wakes up and creates an interaction frame.


[Figure: With Web Band Version, panels (a) and (b). Callouts: "Toggle on/off the Web Band", "Modify/Add Fact".]


[Figure: Without Web Band Version, panels (a) and (b).]


Using the C-BioCurator System (cont’d)

User authentication

Authentication is necessary because different kinds of users are allowed different levels of interaction by our system.

For example, anonymous (non-registered) users are only allowed to browse, and are not allowed to leave any lasting impression (such as adding facts or voting).
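The access rule on this slide can be illustrated with a small role-to-permission table. This is a minimal sketch, not the CBioC implementation: the role names, the action names (`browse`, `add_fact`, `vote`), and the `is_allowed` helper are all hypothetical, chosen only to mirror the two user kinds the slide mentions.

```python
from enum import Enum


class Role(Enum):
    ANONYMOUS = "anonymous"    # non-registered user
    REGISTERED = "registered"  # authenticated user


# Hypothetical action names; the slide only mentions browsing,
# adding facts, and voting.
PERMISSIONS = {
    Role.ANONYMOUS: {"browse"},
    Role.REGISTERED: {"browse", "add_fact", "vote"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role may perform the action."""
    return action in PERMISSIONS.get(role, set())
```

Keeping the rules in one table means a new kind of user (for example an expert curator) could be added without touching the check itself.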




Using the C-BioCurator System (cont’d)

User Interaction

After user authentication, CBioC uses the PubMed ID passed to it to search the database for any data about that article.

If it finds such data, it displays them in the interaction frame, taking into account the researcher’s preferences.

It allows registered researchers to vote on the correctness of individual data tuples.
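The lookup-and-vote flow described here can be sketched with an in-memory store keyed by PubMed ID. Everything named below (`Fact`, `FactStore`, `vote`, `tally`) is hypothetical, and the rule that each user holds at most one vote per fact is an assumption; the slides do not say how repeat votes are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    pubmed_id: str
    text: str
    # user name -> True (judged correct) or False (judged incorrect)
    votes: dict = field(default_factory=dict)

    def vote(self, user: str, correct: bool) -> None:
        # Assumption: one vote per user; a repeat vote overwrites the old one.
        self.votes[user] = correct

    def tally(self) -> tuple:
        """Return (votes for correct, votes for incorrect)."""
        yes = sum(1 for v in self.votes.values() if v)
        return yes, len(self.votes) - yes


class FactStore:
    """Hypothetical stand-in for the CBioC database, indexed by PubMed ID."""

    def __init__(self):
        self._by_pmid = {}

    def add(self, fact: Fact) -> None:
        self._by_pmid.setdefault(fact.pubmed_id, []).append(fact)

    def facts_for(self, pubmed_id: str) -> list:
        # An unknown PubMed ID yields an empty list rather than an error.
        return list(self._by_pmid.get(pubmed_id, []))
```

A caller would fetch `store.facts_for(pmid)` when the interaction frame opens and call `fact.vote(user, True)` when a registered researcher confirms a tuple.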


Using the C-BioCurator System (cont’d)

Text extraction systems

We periodically run (off-line) the best available automated text extraction systems on the PubMed abstracts and store the results in the CBioC database.

If no information regarding a particular abstract is found in the CBioC database, then the information extraction systems are run (on-line) on that abstract and the results are displayed.
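The off-line/on-line split amounts to a look-up with a fallback. The sketch below assumes the database can be modeled as a dictionary from PubMed ID to a list of extracted facts and that an extractor is any function from abstract text to facts; both are simplifications, and `facts_for_abstract` is a hypothetical name. Storing the on-line result is also an assumption: the slide only says the results are displayed.

```python
from typing import Callable, Dict, List


def facts_for_abstract(
    pubmed_id: str,
    abstract_text: str,
    database: Dict[str, List[str]],
    extractor: Callable[[str], List[str]],
) -> List[str]:
    """Return stored facts if present, otherwise extract them on-line."""
    if pubmed_id in database:
        # Results of an earlier (off-line) extraction run.
        return database[pubmed_id]
    # Nothing stored for this abstract: run the extractor on-line.
    facts = extractor(abstract_text)
    # Assumption: on-line results are kept so the next reader gets them directly.
    database[pubmed_id] = facts
    return facts
```

With this shape, the off-line batch run and the on-line fallback can share the same extractor function, so the two paths cannot drift apart.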


Using the C-BioCurator System (cont’d)

Existing databases

    Protein Interaction (Extracted, Exchanged (e.g., BIND))
    Reference
    User account
    Voting
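If the four items above are read as tables in the CBioC database (an interpretation, not something the slide states), a possible layout can be sketched with SQLite. All table and column names below are hypothetical; the slides give no schema.

```python
import sqlite3

# Hypothetical schema: one table per item on the slide.
SCHEMA = """
CREATE TABLE article_reference (
    pubmed_id TEXT PRIMARY KEY
);
CREATE TABLE user_account (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE protein_interaction (
    interaction_id INTEGER PRIMARY KEY,
    pubmed_id TEXT NOT NULL REFERENCES article_reference(pubmed_id),
    protein_a TEXT NOT NULL,
    protein_b TEXT NOT NULL,
    -- 'extracted' by a text extraction system, or 'exchanged' from another database
    source TEXT NOT NULL CHECK (source IN ('extracted', 'exchanged'))
);
CREATE TABLE voting (
    interaction_id INTEGER NOT NULL REFERENCES protein_interaction(interaction_id),
    user_id        INTEGER NOT NULL REFERENCES user_account(user_id),
    is_correct     INTEGER NOT NULL CHECK (is_correct IN (0, 1)),
    PRIMARY KEY (interaction_id, user_id)  -- assumption: one vote per user per fact
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The composite primary key on `voting` is one simple way to keep a user from voting twice on the same interaction.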


Implementation

[Figure: implementation diagram. Labeled components: CBioC Database; Extractor Systems; Web Application; Web Forms; Server Controls; DB Connection/Adapter; Web Band; Band Object; Communication; Event Handler; Browser Helper; Specified Text Watcher; ES Adapter; Network. Edge labels: invoke, retrieve/store, trigger, inform, store.]


Conclusion and Future Work

We have presented a vision that suggests a solution to the seemingly insurmountable problem of curating information nuggets from the extremely large and fast-growing body of bio-medical literature.

We have developed a prototype implementing our solution, and it will be improved continuously.

We believe that our proposed solution could have a big impact on bio-medical research, hence this paper.


Conclusion and Future Work (cont’d)

Our approach of using mass collaboration to curate bio-medical texts can be generalized to the web as a whole (or to other document repositories), where a group of people interested in a group of documents can collaborate to extract the knowledge buried in those documents, while using automated extraction systems as a first step.

We refer to this as the collaborative meta-web, and are working on expanding it to many other domains.