
1

CBioC: Collaborative Bio-Curation

Chitta Baral

Department of Computer Science and Engineering

Arizona State University

2

Agenda

Introduction

Using the C-BioCurator System

  Overall Architecture
  Installation
  User Authentication
  User Interaction
  Text extraction systems
  Existing databases

System Implementation

Conclusion and Future Work

3

Introduction

Motivation

Our goal in this paper is to help extract information nuggets from articles and abstracts and store them in a database.

The challenge is that the number of articles is huge and keeps growing, and that their text must be processed as natural language.

The two existing approaches are human curation and the use of automatic information extraction systems.

Neither meets the challenge: the first is expensive, while the second is error-prone.

4

Introduction (cont’d)

Approach:

We propose a solution that is inexpensive and that scales up.

Our approach takes advantage of automatic information extraction methods as a starting point.

It is based on the premise that if there are a lot of articles, then there must be a lot of readers and authors of these articles.

We provide a mechanism by which the readers of the articles can participate and collaborate in the curation of information.

We refer to our approach as "Collaborative Curation".

5

Introduction (cont’d)

Results: We report on our system CBioC (short for Collaborative Bio-Curator), which facilitates collaborative curation.

Availability: A prototype of the web interaction version is currently available at http://www.cbioc.org

6

Using the C-BioCurator System

Overall Architecture:

The two main components of our CBioC system are (i) the CBioC interface and (ii) the CBioC database.

The user interacts with the CBioC system through the CBioC interface.

The curated or extracted data (from the abstracts and texts of the articles), together with the user interactions with respect to these data, are stored in the CBioC database.

7

Using the C-BioCurator System (cont’d)

[Figure: overall architecture of the Collaborative Bio-Curation System. Labels: Extractor Systems; Download Agent; Text DB; Existing DB; Data Format Exchange System; BioPax; CBioC Database; CBioC Interface (Browse Facts, Vote Facts, Add/Modify New Facts, Add New Schema, Invoke IntExtractor, User Management). Existing databases shown: DIP, Reactome, ... Text sources shown: Nature, Science, Pubmed, ...]

8


Installation and Invocation

A researcher needs to download our system and install it on her computer.

Whenever the researcher accesses a web page from which she can access an article or an abstract, the CBioC system wakes up and creates an interaction frame.
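The slides do not say how the system recognizes an article or abstract page. As a minimal sketch, assuming the trigger is a URL check, the function below pulls a PubMed ID out of a present-day PubMed address of the form https://pubmed.ncbi.nlm.nih.gov/<PMID>/. The name `pmid_from_url` and the choice of URL pattern are illustrative assumptions, not part of CBioC.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: only present-day PubMed article URLs are recognized.
PUBMED_HOST = "pubmed.ncbi.nlm.nih.gov"


def pmid_from_url(url: str) -> Optional[str]:
    """Return the PubMed ID in an article URL, or None for any other page."""
    parsed = urlparse(url)
    if parsed.hostname != PUBMED_HOST:
        return None
    # Article pages have a path made only of digits, e.g. "/12345678/".
    match = re.fullmatch(r"/(\d+)/?", parsed.path)
    return match.group(1) if match else None
```

A page whose URL yields an ID would be one where the interaction frame could be opened; any other page would be ignored.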

9

[Figure: With Web Band version. (a) Toggle on/off the Web Band. (b) Modify/Add Fact.]

10

[Figure: Without Web Band version, panels (a) and (b).]

11


User authentication

Authentication is necessary because different kinds of users are allowed different levels of interaction by our system.

For example, anonymous (non-registered) users are only allowed to browse; they are not allowed to leave any impression for the future (such as adding facts or voting).
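The access levels described here can be sketched as a small permission table. This is an illustration only: the role and action names are hypothetical, and granting registered users the right to add facts is an assumption inferred from the restriction placed on anonymous users.

```python
from enum import Enum


class Role(Enum):
    ANONYMOUS = "anonymous"    # non-registered user
    REGISTERED = "registered"


class Action(Enum):
    BROWSE = "browse"
    VOTE = "vote"
    ADD_FACT = "add_fact"


# Hypothetical permission table; the real system may define more roles.
PERMISSIONS = {
    Role.ANONYMOUS: {Action.BROWSE},
    # Assumption: registered users may both vote and add facts.
    Role.REGISTERED: {Action.BROWSE, Action.VOTE, Action.ADD_FACT},
}


def is_allowed(role: Role, action: Action) -> bool:
    """Check whether a user with the given role may perform the action."""
    return action in PERMISSIONS[role]
```

Keeping the rules in one table means a new kind of user could be added without touching the check itself.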

12

[Figure: panels (a), (b), (c), (d)]

13


User Interaction

After user authentication, CBioC uses the PubMed ID passed to it to search the database for any data about that article.

If it finds such data, it displays them in the interaction frame, taking into account the researcher's preferences.

It allows registered researchers to vote on the correctness of individual data tuples.
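The lookup and voting steps can be sketched with an in-memory SQLite database. The table and column names below (`fact`, `vote`, `agent1`, `relation`, `agent2`) are hypothetical and are not the actual CBioC schema; letting a user replace an earlier vote is likewise an assumption.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fact (id INTEGER PRIMARY KEY, pmid TEXT,
                           agent1 TEXT, relation TEXT, agent2 TEXT);
        CREATE TABLE vote (fact_id INTEGER, user TEXT, correct INTEGER,
                           PRIMARY KEY (fact_id, user));
    """)
    return conn


def facts_for(conn: sqlite3.Connection, pmid: str) -> list:
    """Return each fact for an article with its yes and no vote counts."""
    query = """
        SELECT f.id, f.agent1, f.relation, f.agent2,
               COALESCE(SUM(v.correct), 0),
               COUNT(v.correct) - COALESCE(SUM(v.correct), 0)
        FROM fact f LEFT JOIN vote v ON v.fact_id = f.id
        WHERE f.pmid = ?
        GROUP BY f.id
        ORDER BY f.id
    """
    return conn.execute(query, (pmid,)).fetchall()


def cast_vote(conn: sqlite3.Connection, fact_id: int, user: str,
              correct: bool) -> None:
    """Record one vote per user and fact; a later vote replaces the earlier."""
    conn.execute("INSERT OR REPLACE INTO vote VALUES (?, ?, ?)",
                 (fact_id, user, int(correct)))
```

The composite primary key on `vote` is what keeps a single user from counting twice on the same tuple.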

14


Text extraction systems

We periodically run (off-line) the best available automated text extraction systems on the PubMed abstracts and store the results in the CBioC database.

If no information regarding a particular abstract is found in the CBioC database, then the information extraction systems are run (on-line) on that abstract and the results are displayed.
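The off-line/on-line policy amounts to a lookup with a fallback. The sketch below stands in a plain dictionary for the CBioC database and takes the abstract fetcher and the extractor as caller-supplied functions; all names are hypothetical. Storing the on-line results for later visits is an assumption, since the slide only says they are displayed.

```python
from typing import Callable, Dict, List, Tuple

Fact = Tuple[str, str, str]  # hypothetical (agent1, relation, agent2) tuple


def get_extractions(
    pmid: str,
    stored: Dict[str, List[Fact]],
    fetch_abstract: Callable[[str], str],
    extract: Callable[[str], List[Fact]],
) -> List[Fact]:
    """Return stored facts for an abstract, extracting on-line if none exist."""
    if pmid in stored:
        # Results of an earlier off-line (or on-line) run are reused.
        return stored[pmid]
    facts = extract(fetch_abstract(pmid))
    # Assumption: on-line results are kept so the extractor is not rerun.
    stored[pmid] = facts
    return facts
```

With this shape, the periodic off-line run is simply a loop that fills `stored` ahead of time for many PubMed IDs.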

15


Existing databases

Protein Interaction (Extracted, Exchanged (e.g., BIND))

Reference

User account

Voting

16

Implementation

[Figure: implementation diagram. Component labels: CBioC Database; Extractor Systems; Web Application; Web Forms; Server Controls; DB Connection/Adapter; Web Band; Band Object; Communication; Event Handler; Browser Helper; Specified Text Watcher; ES Adapter; Network. Arrow labels: invoke, retrieve/store, trigger, inform, store.]

17

Conclusion and Future Work

We have presented a vision that suggests a solution to the seemingly insurmountable problem of curating information nuggets from the extremely large and fast-growing body of bio-medical literature.

We have developed a prototype implementing our solution, and it will be improved continuously.

We believe that our proposed solution could have a big impact on bio-medical research, and hence this paper.

18

Conclusion and Future Work (cont’d)

Our approach of using mass collaboration to curate bio-medical texts can be further generalized to the web as a whole (or to other document repositories), where a group of people interested in a group of documents can collaborate to extract the knowledge buried in those documents, while simultaneously using automated extraction systems as a first step.

We refer to this as the collaborative meta-web, and we are working on expanding it to many other domains.
