Post on 19-Dec-2015
1
CBioC: Collaborative Bio-Curation
Chitta Baral
Department of Computer Science and Engineering
Arizona State University
2
Agenda
Introduction
Using the C-BioCurator System
  Overall Architecture
  Installation
  User Authentication
  User Interaction
  Text extraction systems
  Existing databases
System Implementation
Conclusion and Future Work
3
Introduction
Motivation: Our goal in this paper is to help extract information nuggets from articles and abstracts and store them in a database.
The challenge is that the number of articles is huge and keeps growing, and that the articles are written in natural language, which must be processed.
The two existing approaches, human curation and automatic information extraction systems, are not able to meet this challenge: the first is expensive, while the second is error-prone.
4
Introduction (cont’d)
Approach: We propose a solution that is inexpensive and that scales up. Our approach takes automatic information extraction methods as a starting point. It is based on the premise that if there are many articles, then there must also be many readers and authors of these articles. We provide a mechanism by which the readers of the articles can participate and collaborate in the curation of information.
We refer to our approach as “Collaborative Curation”.
5
Introduction (cont’d)
Results: We report on our system CBioC (short for Collaborative Bio-Curator), which facilitates collaborative curation.
Availability: A prototype of the web interaction version is currently available at http://www.cbioc.org
6
Using the C-BioCurator System
Overall Architecture: The two main components of our CBioC system are (i) the CBioC interface and (ii) the CBioC database.
The user interacts with the CBioC system through the CBioC interface, and the curated or extracted data (from the abstracts and texts of the articles), together with the users' interactions with these data, are stored in the CBioC database.
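To make the idea of storing data "together with the user interaction" concrete, here is a minimal in-memory sketch of such a store. The class and field names (`Fact`, `Interaction`, `CBioCStore`) and the `source` and `kind` values are hypothetical illustrations, not CBioC's actual data model; the sketch merely stands in for the CBioC database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Fact:
    """One data tuple about an article, e.g. a protein interaction (hypothetical shape)."""
    pubmed_id: str
    agent: str
    action: str
    target: str
    source: str  # assumed values: "extracted" (by a system) or "curated" (by a user)


@dataclass(frozen=True)
class Interaction:
    """A user's action on a stored fact (hypothetical shape)."""
    user: str
    fact: Fact
    kind: str  # assumed values: "add", "modify", "vote_up", "vote_down"


@dataclass
class CBioCStore:
    """In-memory stand-in for the CBioC database: facts plus interactions on them."""
    facts: Dict[str, List[Fact]] = field(default_factory=dict)
    interactions: List[Interaction] = field(default_factory=list)

    def add_fact(self, fact: Fact) -> None:
        # Facts are grouped by the article (PubMed ID) they were taken from.
        self.facts.setdefault(fact.pubmed_id, []).append(fact)

    def record(self, interaction: Interaction) -> None:
        # Every user action is kept, so later readers can see how a fact was judged.
        self.interactions.append(interaction)

    def facts_for(self, pubmed_id: str) -> List[Fact]:
        return list(self.facts.get(pubmed_id, []))

    def interactions_on(self, fact: Fact) -> List[Interaction]:
        return [i for i in self.interactions if i.fact == fact]
```

Keeping the interaction log next to the facts, rather than only a running score, is one design choice among several; it lets the system recompute scores or audit a fact's history later.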
7
Using the C-BioCurator System (cont’d)
[Figure: CBioC system architecture. Components: Extractor Systems; Download Agent; Text DB; Existing DB; Data Format Exchange System (BioPax); CBioC Database; Collaborative Bio-Curation System; CBioC Interface (Browse Facts, Vote Facts, Add/Modify New Facts, Add New Schema, Invoke IntExtractor, User Management). Two Download Agents are shown, one with sources DIP, Reactome, … and one with sources Nature, Science, PubMed, ...]
8
Using the C-BioCurator System (cont’d)
Installation and Invocation: A researcher needs to download our system and install it on her computer.
Whenever the researcher visits a web page from which she can access an article or an abstract, the CBioC system wakes up and creates an interaction frame.
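The slides do not say how the installed system recognizes an article page. One plausible trigger, sketched here under stated assumptions, is to look for a PubMed ID in the URL of the page being visited. The two URL shapes (the `pubmed.ncbi.nlm.nih.gov/<PMID>/` path and a `pmid=` query parameter) and the functions `pubmed_id_from_url` and `on_page_load` are hypothetical; CBioC may have used other cues.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed URL shapes; illustrative only, not CBioC's actual trigger rules.
_PUBMED_ARTICLE_PATH = re.compile(r"/([0-9]+)/?")
_DIGITS = re.compile(r"[0-9]+")


def pubmed_id_from_url(url: str) -> Optional[str]:
    """Return a PubMed ID if the URL looks like an article page, else None."""
    parsed = urlparse(url)
    # Case 1: https://pubmed.ncbi.nlm.nih.gov/<PMID>/
    if parsed.netloc.lower() == "pubmed.ncbi.nlm.nih.gov":
        match = _PUBMED_ARTICLE_PATH.fullmatch(parsed.path)
        if match:
            return match.group(1)
    # Case 2: any page carrying a (hypothetical) ?pmid=<digits> query parameter
    values = parse_qs(parsed.query).get("pmid", [])
    if values and _DIGITS.fullmatch(values[0]):
        return values[0]
    return None


def on_page_load(url: str) -> Optional[str]:
    """Hypothetical browser hook: wake up only on article pages."""
    pmid = pubmed_id_from_url(url)
    if pmid is None:
        return None  # not an article page; stay dormant
    return f"interaction frame for PMID {pmid}"
```

The extracted ID is what the later steps would pass to the database lookup.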
9
[Screenshots: With Web Band version, panels (a) and (b); callouts: "Toggle on/off the Web Band", "Modify/Add Fact".]
10
[Screenshots: Without Web Band version, panels (a) and (b).]
11
Using the C-BioCurator System (cont’d)
User Authentication: Authentication is necessary because our system allows different kinds of users different levels of interaction.
For example, anonymous (non-registered) users are only allowed to browse; they cannot leave any impression (such as adding facts or voting) for future users.
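The access rule above can be sketched as a small permission table. Only the anonymous/browse-only restriction comes from the slides; the `Role` and `Action` names and the exact set of actions granted to registered users are assumptions for illustration.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(Enum):
    ANONYMOUS = "anonymous"    # not registered
    REGISTERED = "registered"


class Action(Enum):
    BROWSE = "browse"
    VOTE = "vote"
    ADD_FACT = "add_fact"
    MODIFY_FACT = "modify_fact"


# Anonymous users may only browse. Granting registered users every action
# listed here is an assumption made for this sketch.
PERMISSIONS: Dict[Role, FrozenSet[Action]] = {
    Role.ANONYMOUS: frozenset({Action.BROWSE}),
    Role.REGISTERED: frozenset(Action),  # iterating an Enum yields all members
}


def is_allowed(role: Role, action: Action) -> bool:
    """True if a user with this role may perform the action."""
    return action in PERMISSIONS[role]
```

A table like this keeps the policy in one place, so adding a further role later would not touch the code that checks it.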
12
[Screenshots: panels (a), (b), (c) and (d).]
13
Using the C-BioCurator System (cont’d)
User Interaction: After user authentication, CBioC uses the PubMed ID passed to it to search the database for any data about that article.
If it finds such data, it displays them in the interaction frame, taking into account the researcher's preferences.
It allows registered researchers to vote on the correctness of individual data tuples.
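The lookup-display-vote cycle can be sketched as follows. The slides do not define the researcher's preferences or the voting arithmetic, so the `min_score` threshold, the +1/-1 scoring and the one-vote-per-user rule are assumptions, and `ArticleView` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A data tuple is modelled as (agent, relation, target); this shape is an assumption.
DataTuple = Tuple[str, str, str]


@dataclass
class ArticleView:
    """Facts keyed by PubMed ID plus at most one vote per user per tuple (hypothetical)."""
    by_pubmed_id: Dict[str, List[DataTuple]] = field(default_factory=dict)
    votes: Dict[DataTuple, Dict[str, int]] = field(default_factory=dict)

    def vote(self, user: str, item: DataTuple, correct: bool, registered: bool) -> None:
        """Record +1 (correct) or -1 (incorrect); a repeat vote replaces the earlier one."""
        if not registered:
            raise PermissionError("only registered researchers may vote")
        self.votes.setdefault(item, {})[user] = 1 if correct else -1

    def score(self, item: DataTuple) -> int:
        """Net votes for a tuple (0 if nobody has voted)."""
        return sum(self.votes.get(item, {}).values())

    def display(self, pubmed_id: str, min_score: int = 0) -> List[Tuple[DataTuple, int]]:
        """Tuples for one article, best-voted first, hiding those below min_score.

        min_score stands in for "the researcher's preferences" (an assumption).
        """
        items = self.by_pubmed_id.get(pubmed_id, [])
        scored = [(item, self.score(item)) for item in items]
        kept = [pair for pair in scored if pair[1] >= min_score]
        return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Keeping each user's latest vote, rather than counting every click, prevents one researcher from inflating a tuple's score.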
14
Using the C-BioCurator System (cont’d)
Text Extraction Systems: We periodically run (off-line) the best available automated text extraction systems on the PubMed abstracts and store the results in the CBioC database.
If no information about a particular abstract is found in the CBioC database, the information extraction systems are run (on-line) on that abstract and the results are displayed.
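The off-line/on-line split described above behaves like a precomputed cache with an on-demand fallback. Below is a minimal sketch of that pattern. The `Extractor` signature, the merging of several extractors' results, and storing the on-line results for later readers are assumptions; the slides only say the on-line results are displayed.

```python
from typing import Callable, Dict, List

# Assumed extractor signature: abstract text -> list of extracted facts.
Extractor = Callable[[str], List[str]]


class ExtractionCache:
    """Off-line batch results with an on-line fallback (hypothetical sketch)."""

    def __init__(self, extractors: List[Extractor]) -> None:
        self.extractors = extractors
        self.db: Dict[str, List[str]] = {}  # stands in for the CBioC database
        self.online_runs = 0  # counts on-line extractions, for illustration

    def _extract(self, abstract: str) -> List[str]:
        # Run every extractor and merge results, dropping duplicates.
        facts: List[str] = []
        for extractor in self.extractors:
            for fact in extractor(abstract):
                if fact not in facts:
                    facts.append(fact)
        return facts

    def run_offline(self, abstracts: Dict[str, str]) -> None:
        """Periodic batch job over abstracts keyed by PubMed ID."""
        for pmid, text in abstracts.items():
            self.db[pmid] = self._extract(text)

    def facts_for(self, pmid: str, abstract: str) -> List[str]:
        """Return stored facts; if the abstract was never processed, extract on-line.

        Saving the on-line result (so the next reader gets it from the
        database) is an assumption of this sketch.
        """
        if pmid not in self.db:
            self.online_runs += 1
            self.db[pmid] = self._extract(abstract)
        return self.db[pmid]
```

With a toy extractor that picks out upper-case tokens, an abstract processed in the batch is served from the store, while an unseen one triggers exactly one on-line run.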
15
Using the C-BioCurator System (cont’d)
Existing databases:
  Protein Interaction (Extracted; Exchanged, e.g., BIND)
  Reference
  User account
  Voting
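Read as tables of the CBioC database (an interpretation of the slide), the list above might map onto a relational schema like the following SQLite sketch. All table and column names are hypothetical; `origin` distinguishes extracted facts from facts exchanged with other databases such as BIND, and the vote table's primary key enforces one vote per user per fact.

```python
import sqlite3

# Hypothetical schema loosely following the slide's list; all names are assumptions.
SCHEMA = """
CREATE TABLE article_reference (
    pubmed_id   TEXT PRIMARY KEY,
    title       TEXT
);
CREATE TABLE user_account (
    user_id     INTEGER PRIMARY KEY,
    username    TEXT UNIQUE NOT NULL
);
CREATE TABLE protein_interaction (
    fact_id     INTEGER PRIMARY KEY,
    pubmed_id   TEXT NOT NULL REFERENCES article_reference(pubmed_id),
    protein_a   TEXT NOT NULL,
    interaction TEXT NOT NULL,
    protein_b   TEXT NOT NULL,
    origin      TEXT NOT NULL CHECK (origin IN ('extracted', 'exchanged')),
    source_db   TEXT  -- e.g. 'BIND' when origin = 'exchanged'
);
CREATE TABLE vote (
    fact_id     INTEGER NOT NULL REFERENCES protein_interaction(fact_id),
    user_id     INTEGER NOT NULL REFERENCES user_account(user_id),
    vote_value  INTEGER NOT NULL CHECK (vote_value IN (-1, 1)),
    PRIMARY KEY (fact_id, user_id)  -- one vote per user per fact
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def net_votes(conn: sqlite3.Connection, fact_id: int) -> int:
    """Sum of +1/-1 votes for a fact; 0 when it has no votes."""
    row = conn.execute(
        "SELECT COALESCE(SUM(vote_value), 0) FROM vote WHERE fact_id = ?", (fact_id,)
    ).fetchone()
    return row[0]
```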
16
Implementation
[Figure: CBioC implementation diagram. Components: Web Application (Web Forms, Server Controls, DB Connection/Adapter); Web Band (Band Object, Communication, Event Handler); Browser Helper (Specified Text Watcher, Communication); ES Adapter; Extractor Systems; CBioC DataBase. Connection labels: invoke, retrieve/store, trigger, inform, network, store.]
17
Conclusion and Future Work
We have presented a vision that suggests a solution to the seemingly insurmountable problem of curating information nuggets from the extremely large and fast-growing body of biomedical literature.
We have developed a prototype implementing our solution, and it will be improved continuously.
We believe that our proposed solution could have a major impact on biomedical research, which is why we wrote this paper.
18
Conclusion and Future Work (cont’d)
Our approach of using mass collaboration to curate biomedical texts can be further generalized to the web as a whole (or to other document repositories), where a group of people interested in a group of documents can collaborate to extract the knowledge buried in those documents, while using automated extraction systems as a first step.
We refer to this as the collaborative meta-web, and we are working on expanding it to many other domains.