future biology is semantic information...
TRANSCRIPT
Future biology is Semantic
information science
Tetsuro Toyoda, Ph.D.
Director of Bioinformatics And Systems Engineering Division, RIKEN,
Japan 1
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. www.base.riken.jp
Nature of Life=Information
Nature of Replication
(Self-replication)
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Life evolving
through replications and selections
Replication of genome information (DNA replication)
Various replications (Fertilized eggs)
Various groups of individuals
Natural Selection
Evolution Cycle (About 3.8billion years)
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Genome is blueprint of Life (Genetic Information )
Genomic information is recorded by combined DNA blocks. DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences of DNA blocks is Genome sequencing. Sequenced genomic information is represented by character information as shown below. ATGCTAGTCGCGTGCTAGTCGATCTCGTGCA・・・・・・ Human’s genome represented by about 3 billion letters.
Image of DNA blocks
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Biological diversity resulting from diversity of Genomic information
Sequences of DNA blocks evolve ↓means Information is evolved (Biological diversity)
DNA of human and microorganisms are consisted of the same 4 kinds of DNA blocks (G,A,C, and T). ↓ Any living organism hasn’t evolved as a material. (Common among living organisms)
Informatics viewpoint
Material viewpoint
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Phylogenetic Tree of Life
No matter the Information, Copy and Selection is the Driving Force of evolution
Phylogenetic Tree of
Open-Source Software
Information
Evolution of Open-source
programs undergo the same
Copy and Selection
data evolution process
as the Genome.
Previously, Digital and Genomic Information were separate
Recording media for information Expression of information (phenotype)
Digital In
form
ation
G
eno
mic In
form
ation
Cell
DNA
Copied DNA (Copied information)
CD、DVD
Copied CD (Copied information)
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Digital and Genomic Information merged with introduction of machine decoding of genomic
information from DNA
Digital
Info
rmatio
n
Gen
om
ic In
form
ation
DNA
CD, DVD
DNA Sequencer
DNA Synthesizer
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
“Intelligent” Cloud with DNA information
DNA Sequencer
DNA Synthesizer
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Could be a New Realm of Evolution
Natural Form of Evolution
SciNetS
Infrastructure for Collaboration
Part 2
Bioinformatics And Systems Engineering Division, RIKEN,
Japan
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. scinets.org
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
SciNetS Purpose, Methods and System
SciNetS: Science for the Planet and Society
SciNetS: Science, Innovation, Education, Technology, Synergy
SciNetS: Scientist's Networking System
What is SciNetS?
What is SciNetS? Cloud Computing
and Collaboration
sinets.org
As an integrated database, SciNetS provides a versatile database container compatible with international standards. Using the semantic web, SciNetS provides a „total incubation infrastructure system‟ for life science-related Virtual laboratories.
The Cloud as a Center for Virtual Laboratory Collaboration
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. sinets.org
1 10 100 1,000
10,000
100,000
1,000,000
Number of databases to be integrated
Number of databases hosted
by RIKEN SciNeS
10,000
Hosting databases separately
has a limitation
Number of databases (Virtual Labs) hosted
is larger than the number of servers in the
system.
Databases > Server computers > System
The many database projects provided by SciNetS
→ SciNetS is expected to function as a new academic
medium for publishing individual databases as entire
sets of research results.
Available at
http://database.riken.jp
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Single system hosts numerous databases
Scale out
sinets.org
Total file size: 4.5 Terabytes
Total data files: 120 Million
Public databases: 192
incl. DBs for mammals 23
DBs for plants 29
DBs for protein 17
Private databases: 190
Data records (instances): 7.5 Million
Semantic relationships: 26 Million
SciNetS Data Cloud
sinets.org
Databases contain meta data, a Semantic Web
archives
“Meta data” (RDF)
includes semantic
links among data
items and
resources
“Data resources”
include
experimental
results or statistical
results
Mouse Phenotype Database
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. sinets.org
Upper Ontology
Gene Allele System Phenotype
ENU
Allele
BRC
Mouse
Cell
Collaborating on Ontology Basis for Classification of Databases in the Cloud
sinets.org Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Illustration by
Hiroshi Masiya
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
A Semantic web of Mouse databases
sinets.org/db/mammal
Visualizing the Semantic Web
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. sinets.org/db/mammal
InterPhenome: International Data Sharing of Mouse Phenotypes
http://www.interphenome.org/
SciNetS
for Life Sciences
Part 3
Bioinformatics And Systems Engineering Division, RIKEN,
Japan
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
SciNetS Purpose, Methods and System
for Life Sciences
SciNetS: Science, Innovation, Education, Technology, Synergy
SciNetS: Scientist's Networking System
“Superbrain” solving biological problems by simulating artificial brain understanding the knowledge of biology
Insight
RIKEN Super Petaflop Computer
Superbrain
Collective
Intelligence
Limit of general search through database
Proposition: “My job is a jail”
job jail
Boolean understands that job ≠ jail
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Open-endedness in the brain
Proposition: “My job is a jail”
job jail
Restriction?
Brain tries to understand beyond the
framework of general concept Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Developing a brain type database(Superbrain)
↑ All calculations take within 1 second →Realization of high-speed web research
Research Result
Display the hit
document by
rounding up concept
units such as
ontology class,
candidate gene, etc.
Gene
1
Gene
3
Gene
5
Keyword Search by user
keywords (Diabetes)
Conditions (chromosomal region)
hit
hit
hit
Gene
1
Gene
4
Momentary
inference !
Documents including
Omics information,
literature, etc.
~10 million Documentron
Concepts Unit
~100 thousands
neuron
Document file for browse
~10 million
Documentron
Inference layer
(attractor)
Concept layer
(statistical test) Expression
layer (Document
summary )
Input layer (text search)
Out put Layer (web Service)
Concepts Unit
~100 thousands
neuron
Large scale system which includes over 10 million documents and concepts for each neuron.
New high speed technology calculates tens of thousands of spreadsheets
↑based on this table, statistical inferences such as calculation of statistic, Fisher’s exact test, can be done.
Gene X
hit
hit
hit
The technology, creating 3-dimension spreadsheets
for every gene (X=1,2,,,several tens of thousands of )
at once.
各種概念とキーワードとの関連性を網羅的かつ、高速・統計的に評価できる
Including
search word
Excluding
search word
Include
Gene X
“A “units of
Documents
“B “units of
Documents
Exclude
Gene X
“C “units of
Documents
“D “units of
Documents
Documentation files including literatures
(~10 millions)
Gene name , concept units, etc.
(~100 thousands)
Correspondence relationship
which curators associated
genes with documents.
Classifications by types of
document database
(Medline, OMIM, PPI,
Bio resources, etc.)
Keyword search by
users.
4 ↑ All calculations within 1 second →Realization of high-speed web research
(Calculating signal processing among neurons on brain-type database)
Ranking results of
candidate genes↓ Keyword : type 2 diabetes
Genome Condition: gene of no.6 chromosome
Interval
Inference of candidate gene
Rough mapping based on crossing
experiment
(~10Mbp wide interval)
Too many genes
Searching for candidate genes by
positional cloning.
Identification of causative gene variation
So far contributed to 65 achievements of searching researches on disease-causing genes.
Conditions of
compound retrieval
Search example: Inference of causative gene by positional cloning.
http://omicspace.riken.jp/PosMed/
PosMed: “Positional MEDLINE” assists your positional-cloning studies
SciNetS
for the Planet and Society
Part 4
Bioinformatics And Systems Engineering Division, RIKEN,
Japan
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. genocon.org
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
SciNetS Purpose, Methods and System
for the Planet and Society
SciNetS: Science for the Planet and Society
SciNetS: Science, Innovation, Education, Technology, Synergy
SciNetS: Scientist's Networking System
Fossil Resources Biomass
Petrochemistry
process
Energy Chemical
Products Materials
Distillation
He
avy O
il
Gas o
il
Kero
sene
Ga
so
line
Nap
hth
a
Bioprocess
Syngas
Oil, e
tc
Sugar
Development
of
Innovation
Technology
21st Century - Sustainable Society
~ Carbon-neutral ~ 20th Century - Consumptive Society
Creation of
New Industry
CO2
Gly
co
syla
tion
CO
2
CO2
CO2
Bio
Materials Chemical
Products
Crude
oil
Energy
Applications
of
Renewable
Resource
Plants
↓ Search for resources Development of oilfield (Fossil resources)
↓ Search for resources Development of genome (Biological Resource)
Sustainable society supported by Green Biotechnology
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Toward an age of Rational Design using genomic data.
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Creating bio resources from information
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
genocon.org
GenoCon International Rational Genome
Design Contest offers contestants the chance to compete in technologies for rational genome design using a browser-based programming environment provided at http://genocon.org
GenoCon Open platform for rational Genome design
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Optimize genome design for plants that absorb and resolve formaldehyde to prevent sick house syndrome.
• Introduce two enzymes, extracted
from bacteria, into Calvin cycle of
plants
•
• Enhancement of formaldehyde-
resistance by shepherd's-purse and
Tobacco.
sh
ep
he
rd„s
-purs
e
To
ba
cco
AB: Gene introduction strain
CK: Control
PHI
HPS Hexulose 6-phosphate synthase
Hexulose 6-phosphate isomerase
Introduced
enzymes Patent applied by Prof. Izui,
Kyoto/Kinki University
Formaldehyde tolerance designed by Prof. Izui is the inaugural challenge in GenoCon
34
34 genocon.org
SciNetS provides a web-based programming interface for a user to process each data item in our database cloud, where all the programs and files the user has created are stored.
Programming on a web browser
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
genocon.org
Creating Bio Resources from Information Resources
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. genocon.org
SciNetS
Semantic JSON
PowerTool
Part 5
Bioinformatics And Systems Engineering Division, RIKEN,
Japan
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. semantic-json.org
Short URL services such as http://bit.ly are often used in social media such as twitter. These offer similar domain mapping services by mapping external URLs under various domain names to internal URLs of a uniform domain with a short internal ID that corresponds to the external one in the repository.
During semantic inference, a user gains access to a Semantic-JSON repository by invoking Semantic-JSON URIs, consisting of an internal ID and a command name, along with step-by-step receipt of a fragment of Semantic Web data in the JSON format.
Semantic-JSON: Uniform Domain Mapping
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Semantic-JSON upgrades the mapping service to have semantic relationships among the internal IDs that can be used to accelerate semantic communications in social media of the future.
domain/json/command/internalID
bit.ly/internalID
semantic-json.org
Sample Operation in Mathematica
Load the Semantic-JSON package
Set the server’s URL
Obtain labels of two individuals
Obtain the shortest paths between two individuals
Similarly, Semantic-JSON is available in popular languages including Java, Ruby, Perl, Python.
Semantic-JSON supports most languages used by Bioinformaticians
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. semantic-json.org
provided by Semantic-JSON for user to process each data item in our database cloud where the user stored all the programs and files they created
Web-based programming interface
semantic-json.org
Semantic JSON Application Snapshot of the Plant Ontology tree viewer embedded in SciNetS.org.
When the user opens the web page the Tree structure is programmably drawn using Semantic-JSON and published into human readable content.
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
Ontology
Browser semantic-json.org
Semantic-JSON: API to access SemanticWeb data with JSON
Please Visit http://semantic-json.org
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. semantic-json.org
SemanticTable
PowerTool
Part 6
Bioinformatics And Systems Engineering Division, RIKEN,
Japan
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. SematicTable.org
Current Cloud Table Applications don’t support Semantic Web, so can not be used for Life-Sciences
• Current Table applications only use simple Semantic Data, so do not harness the power of the Semantic web + Ontology
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
http://dabbledb.com
http://socrata.com
http://factual.com
http://google.com/fusiontables
Introduction >> Biological data formats used in life science research .
• Biologists like to use the table format for their research results
• Since most biological research results are essentially structured kinds of data, however, it is appropriate to represent these research results as Semantic Web graphs consisting of subject-property-object triples.
>> Characteristics of table data and semantic data .
SematicTable.org
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. SematicTable.org
What is the Semantic Table
• SciNetS SemanticTable is a data container with advantages of both tables and Semantic Web graphs: 1. It provides a table view for biological data convertible to TSV format (tab-separated-value) text table data popular among biologists for computer processing.. 2. It is convertible to explicit Semantic Web graph such as RDF that a computer can read and process intelligently.
What is the SciNetS Semantic Table?
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. SematicTable.org
Explicit Correspondence Semantic
Table and Semantic Web Graph
This table presents implicitly the
following semantic meanings, which
may be understood by people, but not
by computers
Yellow row - Table Header
(Class or property name of data)
Blue Row - Table Data
Green Column - Subjects
Red column - Objects
- The first row means:
“The Person named Michael is 23 years
of age and his gender is Male”
- The second row means: “ The Person
named Jennifer is 27 years of age and
her gender is Female”
For Intelligent processing by a computer,
the table data needs to be presented
with explicit semantic data as follows in
computer- readable Semantic Web
graph format.
Correspondence between Semantic Table and Semantic Web Graph
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. SematicTable.org
SemanticTable uses Ontology to represent Semantic Web data about the considered subject
Resource
B1
Resource
B2
Resource
B3
Resource
B4
Predicate
Predicate
Predicate
Predicate
Resource
C1
Resource
C4
Predicate
Predicate
Resource
C2 Predicate
Resource
C4 Predicate
Resource
A1
Resource
A2
Table about Class B
Table about Class C
Table about Class A
Class A Class B Class C
Resource
A3
SematicTable.org Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
SemanticTable for Life Sciences
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.
By adding Ontology on a huge scale, we can apply SemanticTable to the life sciences
SematicTable.org
About SemanticTable.org • A "Table" is the most common type of data format used in biological research. The Semantic Web, on the other hand, where knowledge is represented by statements comprising a subject, predicate and object, is also becoming a major container for data accessible by automatic computational reasoners; i.e. computers. Here we present "SemanticTable.org", a data-mart server providing a huge amount of biological semantic-web data as simple table-style data files that are reversely convertible to the semantic-web format.
The SemanticTable.org server contains 644 tables and 7.5 million rows of SemanticTable data as of December 2010. Conversion from table data files to semantic-web data files and vice versa are supported by our web site (http://semantictable.org) and the World Wide Web Consortium (W3C) web site RDFa Distiller and Parser page at (http://www.w3.org/2007/08/pyRdfa/).
• We propose SemanticTable, based on the standard RDFa specification, as the most understandable way for biologists not familiar with semantic-web specifications to utilize semantic-web data.
About Semantic Table
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. SematicTable.org
Using Semantic data could allow
automatic drag and drop field
rearrangement by a program or
browser running on a PC.
Semantic Table Merging
SemanticTable Merge Program
Features: ・Merging Multiple Class Semantic Tables
・Subject Column Swapping
・Dumping to Text
SemanticTable
RDF
SemanticTable
RDF
SemanticTable
join
Merging SciNetS Semantic Tables
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. SematicTable.org
Join the SemanticTable Consortium
• Call for developers to join the consortium to create and author such programs;
the fundamental semantic groundwork is ready to use.
• Open collaboration for the standard will jumpstart the new platform.
• Open source programs for SemanticTable run on cloud-computing technology,
as do the programs for spreadsheet tables.
• We hope the SemanticTable format will be distributed broadly and freely for
general use, as well as for scientific use.
Bioinformatics And Systems Engineering (BASE) division,
RIKEN Yokohama Institute of Chemical and Physical Science
SciNetS.org Semantictable.org
Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.