Data Landscapes neuinfo.org Anita Bandrowski, Ph. D. University of California, San Diego

Post on 18-Dec-2015

Overview

• Brief overview of NIF philosophy
• Examples of data about addiction
• Why you should never use Google to answer any scientific question
• How can we make Google better?

Power!

• How many subjects/patients do we need to be relatively certain that we are correct?

• More than you can afford?

• If YFGM gave each of you 1B dollars, would that solve the problem?

• But, what if:
– Big data from small data?
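The question of how many subjects are needed can be made concrete with a standard power calculation. The sketch below uses the normal-approximation formula for comparing two group means, n per group = 2 * ((z_(1-alpha/2) + z_(power)) / d)^2, where d is the standardized effect size. The function name and the effect sizes tried are illustrative assumptions, not figures from the talk.

```python
import math
from statistics import NormalDist


def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison.

    Uses the normal approximation; effect_size is the standardized mean
    difference (Cohen's d). An exact t-based calculation gives slightly
    larger numbers.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical effect sizes: large, medium, small.
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: {subjects_per_group(d)} subjects per group")
```

Under these assumptions a small effect needs several hundred subjects per group, which is the situation where pooling many small datasets becomes attractive.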

Addiction is a large problem

Solving the large problems of science?

• Observation
• Experimentation
• Modeling
• Cooperative data intensive science

A SHARED UNDERSTANDING OF THE GENETICS OF ADDICTION, HOW CAN EVERYONE PLAY?

• NIF is an initiative of the NIH Blueprint consortium of institutes
– What types of resources (data, tools, materials, services) are available to the neuroscience community?
– How many are there?
– What domains do they cover? What domains do they not cover?
– Where are they?
  • Web sites
  • Databases
  • Literature
  • Supplementary material
  • PDF files
  • Desk drawers
– Who uses them?
– Who creates them?
– How can we find them?
– How can we make them better in the future?

http://neuinfo.org

NIF: A New Type of Entity for New Modes of Scientific Dissemination

• NIF’s mission is to maximize the awareness of, access to and utility of digital resources produced worldwide to enable better science and promote efficient use
– NIF unites neuroscience information without respect to domain, funding agency, institute or community
– NIF is a library for scholarly output that is a web enabled resource and not a paper
– Aggregates all the different databases, tools and resources now produced by the scientific community
– Makes them searchable from a single interface
– A practical approach to the data deluge
– Educate neuroscientists and students about effective data sharing

Surveying the resource landscape

NIF resource registry: listing of > 6000 databases, tools, materials, services, websites (> 2500 databases)

NIF data federation: PubMed Central for data

NIF was designed to accommodate the multiplicity of heterogeneous and distributed data resources, providing deep query of the contents and unified views

200 sources, > 360 M records

NIF Semantic Framework: NIFSTD ontology

• NIF covers multiple structural scales and domains of relevance to neuroscience
• Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology

[Figure: NIFSTD modules: Organism, NS Function, Molecule, Investigation, Subcellular structure, Macromolecule, Gene, Molecule Descriptors, Techniques, Reagent, Protocols, Cell, Resource, Instrument, Dysfunction, Quality, Anatomical Structure]

Ontologies provide the universals for integrating across disparate data by linking them to human knowledge models
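One way to picture this integration is a lookup that maps each source's local label to a shared ontology term before records are counted. The sketch below is a minimal illustration only: the source names, the labels, and the deliberately simplified synonym table are made-up assumptions and do not reflect actual NIF or NIFSTD content.

```python
from collections import Counter

# Hypothetical synonym table: local label (lowercased) -> shared term.
SYNONYMS = {
    "striatum": "Striatum",
    "neostriatum": "Striatum",
    "caudoputamen": "Striatum",
    "cerebral cortex": "Cerebral cortex",
    "cortex": "Cerebral cortex",
}

# Hypothetical records from two sources that label regions differently.
RECORDS = [
    {"source": "source_a", "region": "Neostriatum"},
    {"source": "source_a", "region": "Cortex"},
    {"source": "source_b", "region": "caudoputamen"},
    {"source": "source_b", "region": "Cerebral Cortex"},
    {"source": "source_b", "region": "area X"},
]


def normalize(label):
    """Return the shared term for a local label, or None if unmapped."""
    return SYNONYMS.get(label.strip().lower())


def count_by_term(records):
    """Count records per shared term; collect labels with no mapping."""
    counts = Counter()
    unmapped = []
    for record in records:
        term = normalize(record["region"])
        if term is None:
            unmapped.append(record["region"])
        else:
            counts[term] += 1
    return counts, unmapped


counts, unmapped = count_by_term(RECORDS)
print(dict(counts))  # records per shared term, across both sources
print(unmapped)      # labels that still need a mapping
```

Keeping the unmapped labels visible, rather than dropping them, shows where the shared vocabulary has gaps.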

Neurolex: Machine-processable concepts for neuroscience

• Machine-processable lexical units
• Connected via relationships
• Identified by a unique identifier (URL)
• Computable index for neuroscience
• Framework for linking knowledge, claims and data

Built using a semantic wiki
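The idea of lexical units that carry a URL identifier and are connected by relationships can be sketched as a small in-memory index. Everything here is hypothetical: the `example.org` URLs are placeholders rather than real Neurolex identifiers, and the `is_part_of` relation and the three concepts are chosen only for illustration.

```python
from dataclasses import dataclass, field

BASE = "http://example.org/concept/"  # placeholder, not a real Neurolex URL


@dataclass
class Concept:
    """A lexical unit identified by a URL and linked to other concepts."""
    url: str
    label: str
    # relationship name -> list of target concept URLs
    relations: dict = field(default_factory=dict)


def build_index(concepts):
    """Index concepts by their URL identifier."""
    return {c.url: c for c in concepts}


def ancestors(index, url, relation="is_part_of"):
    """Follow one relationship type transitively; return labels in order."""
    found = []
    seen = {url}
    frontier = [url]
    while frontier:
        current = frontier.pop(0)
        for target in index[current].relations.get(relation, []):
            if target not in seen:
                seen.add(target)
                found.append(index[target].label)
                frontier.append(target)
    return found


index = build_index([
    Concept(BASE + "brain", "Brain"),
    Concept(BASE + "striatum", "Striatum", {"is_part_of": [BASE + "brain"]}),
    Concept(BASE + "caudate", "Caudate nucleus",
            {"is_part_of": [BASE + "striatum"]}),
])

print(ancestors(index, BASE + "caudate"))
```

Because every concept is addressed by its identifier rather than its label, a query about the caudate nucleus can be widened to the structures that contain it without any string matching.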

NIF Analytics: The Neuroscience Landscape

Ontologies provide a semantic framework for understanding data/resource landscape

Where are the data?

[Figure: data by brain region (Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain) and data source]

Vadim Astakhov, Kepler Workflow Engine

A data homunculus?

Genetics of addiction?

• Gene
• Protein
• Subcellular components
• Cells
• Cell microcircuits
• Cell macrocircuits
• Networks
• Brain regions
• PNS
• Whole organism
• Behaving organism (environment)
• Networks of organisms
• Populations

Genetics of addiction?

• Addiction is a disease of subpopulations of humans who take sociologically undesirable drugs or sociologically desirable drugs at undesirable concentrations

• Drug is a molecule that does not exist in the body, an environmental factor

• Drugs are metabolized by the digestive system and act after crossing the BBB

• Drugs modify the activity of existing proteins on vastly different time scales

• Drugs modify behaviors that depend on the actions of an orchestra of neurons acting within circuits that all have a purpose that is not to take drugs

The ecosystem is diverse and messy (and that’s OK)

NIF favors a hybrid, tiered, federated system

• Domain knowledge
– Ontologies
• Claims and observations
– Virtuoso RDF triples
• Data
– Data federation
– Spatial data
– Workflows
• Narrative
– Full text access

[Figure: tiers arranged from Data to Knowledge]

Ontology classes: Neuron, Brain part, Disease, Organism, Gene

Example claims:
• Caudate projects to SNpc
• Grm1 is upregulated in chronic cocaine
• Betz cells degenerate in ALS
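The example claims in the diagram above fit the subject-predicate-object shape of a triple. The sketch below does not use Virtuoso or any RDF library; it stores the three claims as plain tuples and matches them against a pattern with wildcards, just to show the shape of such a query. The predicate names are hypothetical.

```python
# Each claim is a (subject, predicate, object) triple. The predicate names
# are hypothetical; a real store would use URIs from shared ontologies.
CLAIMS = [
    ("Caudate", "projects_to", "SNpc"),
    ("Grm1", "is_upregulated_in", "chronic cocaine"),
    ("Betz cells", "degenerate_in", "ALS"),
]


def match(claims, subject=None, predicate=None, obj=None):
    """Return claims matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        claim for claim in claims
        if all(want is None or want == got
               for want, got in zip(pattern, claim))
    ]


print(match(CLAIMS, predicate="is_upregulated_in"))
print(match(CLAIMS, subject="Betz cells"))
```

Claims stored this way can be queried from either end, for example "what is known about Grm1?" or "what is reported for ALS?", without a schema designed for each question.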

Wish list: Cooperative science

• A mission that will engage the entire neuroscience community and beyond
• An active community contribution model where everyone is expected to contribute their outputs, not just a selected few
– Diverse contributions are tracked and recognized
– Spatial-semantic-genetic-temporal frameworks make data discoverable-usable-integratable and help fill in the gaps
• A platform that moves neuroscience into the web
– Networking data, knowledge, tools, models, efforts, people, compute resources, simulation
– Supports digital research objects as first order contributions, not just narrative
– Works through and with existing platforms to improve them where possible

Cooperative system: “...individual components that appear to be “selfish” and independent work together to create a highly complex, greater-than-the-sum-of-its-parts system.”

neurolex.org

• INCF Community encyclopedia
• Standardize vocabulary
• Define all vocabulary, terms, protocols, brain structures, diseases, etc.
• Living review articles
• Build and maintain working ontologies
• Links to data, models and literature
• Semantic organization, search, analysis and integration
• Global directory of all shared vocabularies, CDEs, etc.

Slide courtesy of Sean Hill

Community Platforms: Researchers-tools-data-computing