cbioc: massive collaborative curation of biomedical literature future directions
Post on 21-Dec-2015
217 views
TRANSCRIPT
Recap: The Problem
Curation of “knowledge” nuggets from Biomedical articles. About 15 million abstracts in Pubmed 3 million published by US and EU researchers
during 1994-2004 (800 articles per day) 300 K articles published so far reporting
protein-protein interactions in human, yeast and mouse. BIND (in 7 yrs) -- 23K ; DIP – 3K; MINT – 2.4K.
Recap: our proposed solution
Harness available human power: scientists around the world
Seamlessly provide curation platform (web-based) to pop up while they research
Even a little input from each counts Collaborators get immediate rewards
Future work & projects
Extraction of other relationships (gene-disease, gene-organ...)
Have prototype in related project, need improvement and formal testing (measuring accuracy)
Extraction of organism info for each entity in a relationship
High-priority. Use existing software for extraction, but need to use biological databases and algorithms for deducing info (not explicit), and allow users to correct this info.
Example, PMID 16107876.
Use ontologies and some automated tools to ensure consistency and cross-link info
2 people. Information entered by users needs to be validated against existing DB & ontologies. Also, need to tag our data for cross-reference. Example
Future work & projects (2)
Support query processing in CBioC at a basic level
Users want/need to access the facts directly, not only “related articles” but facts about a specific vote patterns, entities, etc.
Incorporate data from other interaction databases
Done for one (BIND), but needs to be revamped to include other databases & left semi-automatized for updates
Integrate CBioC data w/ other traditionally curated databases
Allow users to transparently access and query all the biological interaction databases. Need to map schemas, select appropriate sources “on the fly”, and provide provenance explanation on query results.
Future work & projects (3)
Image extension - extracts images & information about images and allows collaborative curation.
Take PDFs & other structured documents, and extract images with their captions & references within the text, then let users polish. Related.
Develop adaptable software platform for similar applications.
This is to be a flexible (adaptable) system that users can “generate” online for their own scientific needs. A “non-scientific” example.
Curating & representing pathways: linking related facts
There are others that have done representation, but need to design & implement UI consistent w/ CBioC for curation. Example.
Future work & projects (4)
Recommender system that uses data from “user network” (votes, authors, etc)
Have a related project that recommends, but need to take advantage of CBioC’s data.
Handle incomplete data in CBioC Data obtained from text extraction or data integration is inherently incomplete. Here, we seek to predict missing values –using domain knowledge- and process queries even w/ the incomplete data
Handle uncertain data in CBioC Associate confidence levels to all the facts curated by CBioC based on user trustworthiness and use these appropriately while processing user queries.
Support advanced query processing
UI to allow uncertainty & incompletness handling features described above.