o'reilly webcast: organizing the internet of things - actionable insight through ontologies

ORGANIZING THE INTERNET OF THINGS: ACTIONABLE INSIGHT THROUGH ONTOLOGIES, Boris Adryan

Upload: boris-adryan

Post on 31-Jul-2015


Page 1: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

ORGANIZING THE INTERNET OF THINGS

ACTIONABLE INSIGHT THROUGH ONTOLOGIES

Boris Adryan

Page 2: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

• Computational biologist
• Research group leader
• Advisor at
• 2015 Fellow of the

Who is @BorisAdryan

Page 3: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT HOUR… (including questions!)

Page 4: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT 10 MINUTES

Page 5: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

DNA = storage of a blueprint

RNA = ‘active copy’ of DNA

protein = the building blocks of cells and tissues

LIFE AS WE KNOW IT

transcription

translation

Gregor Johann Mendel, exhibited in the Library at the NIMR

Page 6: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

‣ Reading DNA information

‣ Determining “the sequence of a gene” was a PhD in the early 1980s

‣ Data processing was mainly transcribing the observation into a research paper

BIOLOGY THEN AND NOW: SEQUENCE INFORMATION

Sanger sequencing ca. 1980

http://www.eplantscience.com

Page 7: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

189,739,230,107 base pairs on 15th April 2015 (up from 159,813,411,760 base pairs in April 2014)

‣ We can sequence a human genome in half a day

‣ Sequence databases grow faster than storage capacity

‣ Data processing is the key step in scientific understanding

BIOLOGY THEN AND NOW: SEQUENCE INFORMATION

1990: automation kilobases a day

2007: next-gen seq megabases a day

2015: 1000s of instruments world-wide

Page 8: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

BIOLOGY THEN AND NOW: GENE ACTIVITY INFORMATION

‣ When are genes needed?

‣ Classical molecular biology workflow, taking days…

‣ Data is semi-quantitative; testing one gene at a time

Northern blot, ca. 1995

‣ High-throughput gene expression profiling since mid-1990s

‣ Quantitative information for every gene in an organism

‣ Key challenge is the graphical representation and interpretation of the data

screenshot from FlyBase, today

Page 9: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

26 ATP

‣ Signal transduction and metabolic pathways

‣ Characterisation of proteins and substrates that mediate chemical reactions

‣ Nobel prize material

BIOLOGY THEN AND NOW: BIOCHEMISTRY

Page 10: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

‣ We know about 250k metabolites

‣ 100k protein structures

‣ on the order of 10k different chemical reactions

BIOLOGY THEN AND NOW: BIOCHEMISTRY

“The Robot Scientist”

“small molecules” (Organic & Biomolecular Chemistry Blog)

protein (via the Protein Databank, www.pdb.org)

Page 11: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

‣ Everything is connected

‣ Big, noisy, often unstructured data

‣ We are learning how biological entities depend on each other

DNA > RNA > proteins

Page 12: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT 5 MINUTES

Page 13: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

‣ Everything is connected

‣ Big, noisy, often unstructured data

www.thingslearn.com

Analytics, context integration, machine learning and predictive modelling for the IoT.

Page 14: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

0 clean shirts left

+ washing machine estimates 97% of your last pack of powder used

+ it’s Wednesday, 23:55

+ the last four Thursdays had a morning business meeting

+ the car is parked 20 m from a shop

+ last retail activity: 8 sec ago

Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse

“need identified” + “notification appropriate”

Actionable insight. From everything.

Page 15: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

NO ANALYTICAL FLEXIBILITY IN M2M/IOT

Matt Hatton, Machina Research, The BLN IoT ‘14

Internet replaces wire

It’s all about the context

M2M

consumer

IoT

defined I-P-O like it’s 1975

context

Is this hot?

Page 16: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

LIFE SCIENCE STRATEGIES DON’T WORK IN THE IOT

- There is no commonly accepted
  - ‘catalogue’ of things,
  - ‘ontology’ of things,
  - ‘data format’ of things,
  - ‘meta data’ for things.

- Most businesses are driven by revenue, not long-term strategic vision

- Service providers have no need to publish

- Data can be highly personal (cheap excuse)

unless they’re

Page 17: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

Trojan Room coffee pot - ca. 1993

Oct. 1995

“The Internet of Things”, Kevin Ashton, ca. 1999

20 YEARS OF NON-CONVERGENT EVOLUTION

FIRST DATA POTENTIAL RECOGNISED TODAY’S REALITY

“ignorant coexistence”

➡ Commonly accepted platforms and formats for data exchange

➡ Meta-data deposition is a must

➡ Infrastructure provides entry point for computational knowledge inference

“designed to ask questions”
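
The "meta-data deposition is a must" point can be made concrete with a small sketch. This is only an illustration under stated assumptions: the required keys in `REQUIRED_META`, the `Reading` class and the `accept` function are hypothetical names, not taken from any existing IoT standard or from the talk.

```python
from dataclasses import dataclass, field

# Hypothetical minimal set of meta-data keys (an assumption for illustration,
# not an existing standard).
REQUIRED_META = {"device_type", "unit", "location", "owner"}


@dataclass
class Reading:
    value: float
    timestamp: str  # ISO 8601 string
    meta: dict = field(default_factory=dict)

    def missing_meta(self) -> set:
        """Return the required meta-data keys that were not deposited."""
        return REQUIRED_META - set(self.meta)


def accept(reading: Reading) -> bool:
    """A repository that insists on meta-data rejects incomplete readings."""
    return not reading.missing_meta()


# A bare measurement without any "data about the data".
bare = Reading(21.5, "2015-07-31T10:00:00Z")

# The same measurement with the (hypothetical) minimal annotation.
annotated = Reading(
    21.5,
    "2015-07-31T10:00:00Z",
    {
        "device_type": "thermostat",
        "unit": "degC",
        "location": "living room",
        "owner": "household-1",
    },
)

print(accept(bare), accept(annotated))
```

The design choice mirrors the life-science repositories described on the following slides: a submission is refused until its minimal annotation is complete, rather than being cleaned up afterwards.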

Page 18: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT 10 MINUTES

Page 19: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

Oct. 1995

TOWARDS MIAMI STANDARD AND DATA REPOSITORIES

cf. IoT

Nov. 1993

MInimal Annotation for MIcroarray Info

Page 20: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

META DATA, SHARING AND DATA REPOSITORIES

founded in Nov. 1999

But this is a complex and ambitious project, and is one of the biggest challenges that bioinformatics has yet faced. Major difficulties stem from the detail required to describe the conditions of an experiment, and the relative and imprecise nature of measurements of expression levels. The potentially huge volume of data only adds to these difficulties.

Nature, Feb. 2000

Nov. 2000 Oct. 2002

Wide adoption as requirement for publication in scientific journals

Page 21: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

META DATA, SHARING AND DATA REPOSITORIES

cf. IoT 2014

since 2003

http://en.wikipedia.org/wiki/Silo

Page 22: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

THE LIFE SCIENCES FIXED THEIR KNOWLEDGE REPRESENTATION PROBLEM

Page 23: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

FORMALISING KNOWLEDGE

Page 24: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

FORMALISING KNOWLEDGE WITH GENE ONTOLOGY

Page 25: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

CURRENT GOVERNMENT INVESTMENTS INTO GENE ONTOLOGY

NIH alone spent $44,616,906 on the ontology structure since 2001 (I don’t have data for UK/EU spending)

~100 full-time salaries for experts with domain-specific knowledge

~40,000 terms

Page 26: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

story

measurements + meta data

open, public repositories

human curators

ontology terms

community

PUBLISH OR PERISH

ok?

journal

informal exchange - no credit!

funders

assessment

The majority of this infrastructure is paid for by governments and charities

industry!

Page 27: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies
Page 28: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

OUR PROBLEM IS KNOWLEDGE

DATA != INSIGHT

WITHOUT ORGANISING IT

Page 29: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

• Why a biologist is interested in large, unstructured data

• What is wrong with the IoT in its current state

• How biologists deal with similar problems

• Which academic concepts would be useful in the IoT

WHAT TO EXPECT IN THE NEXT 10 MINUTES

Page 30: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

measurements + meta data

storage & provenance

human curators

ontology terms

user

PUBLISH OR YOU’RE NOT DOING IOT

ok?

Maybe the majority of this infrastructure should be paid for by governments?

company cloud

device registration

“ “

privileges

data

added value

Page 31: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

WHAT IS AN ONTOLOGY?

used to establish conceptual connection between entities

knowledge inference

finger

ontology structure

- body part
- limb
- arm
- hand
- thumb
- finger

ontology rules

‣ controlled vocabulary
‣ clearly defined relationships

is a

is a

connects to

part of

With ontological reasoning, a computer can infer that “finger is a body part”, although we haven’t explicitly defined it that way.
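
A minimal sketch of this kind of inference, assuming a toy version of the body-part structure. Which relationship sits on which edge is an assumption (the slide only lists the terms and the relationship types), and the sketch simply follows both "is a" and "part of" edges upwards, which is far cruder than a real ontology reasoner. The "connects to" relationship is left out.

```python
# Edges as (child, relationship, parent). The assignment of relationships
# to edges is an assumption made for illustration.
EDGES = [
    ("limb", "is_a", "body part"),
    ("arm", "is_a", "limb"),
    ("hand", "part_of", "arm"),
    ("finger", "part_of", "hand"),
    ("thumb", "is_a", "finger"),
]


def ancestors(term, edges=EDGES):
    """Collect every term reachable by following edges upwards."""
    parents = {}
    for child, _relationship, parent in edges:
        parents.setdefault(child, []).append(parent)

    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def belongs_to(term, target):
    """True if `target` can be inferred as an ancestor of `term`."""
    return target in ancestors(term)


# No edge links "finger" to "body part" directly; the link is inferred.
print(belongs_to("finger", "body part"))
```

Only five relationships are stated explicitly, yet every term below "body part" can be traced back to it. That is the sense in which a controlled vocabulary plus clearly defined relationships enables knowledge inference.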

Page 32: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

ARE PEOPLE NOT ALREADY USING ONTOLOGIES IN THE IOT?

Semantic Sensor Network Ontology

“thermostat”

The idea is not new! Cf. extension of the semantic web with the Semantic Sensor Network.

‣ catalogs
‣ conventions

http://www.w3.org/2005/Incubator/ssn/ssnx/ssn

Page 33: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

ONTOLOGIES HAVE TO BE PRAGMATIC COMPROMISES

Gene Ontology annotation

15 years of research

47 publications

100+ authors

50+ PhDs

15 direct annotations

~150 inferred annotations

Page 34: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

THE THREE BRANCHES OF

Adapted from Anurag et al., Mol. BioSyst., 2012, 8, 346-352

Localization: Where is an entity acting?

Function: What does the entity do?

Process: When is the entity needed?

Page 35: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

inferences on “is a”

“part of”

“regulates”

“has part”

from geneontology.org from Ashburner et al., Nat Genet. 2000, 25(1):25-9.

GO AND CONTEXT

Page 36: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

THE BRANCHES OF GO AND THE IOT

Localization: inside, (my?) home, living room

Function: measures temperature, regulates temperature, interacts with user directly, interacts with user via app

Process: regulation of temperature, measurement of ambient temperature

‘is proxy / is avatar’ for: presence, fire, ice age
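
One way to picture the three branches applied to a thing is a small annotation record. This is a sketch under assumptions: the `ThingAnnotation` class, the `find` helper and the second device (`door sensor`) are hypothetical and only serve to show how annotations in the same vocabulary make different things comparable.

```python
from dataclasses import dataclass, field


@dataclass
class ThingAnnotation:
    """Annotation of one thing along the three GO-inspired branches."""
    name: str
    localization: set = field(default_factory=set)
    function: set = field(default_factory=set)
    process: set = field(default_factory=set)


# Terms for the thermostat follow the slide.
thermostat = ThingAnnotation(
    name="thermostat",
    localization={"inside", "home", "living room"},
    function={
        "measures temperature",
        "regulates temperature",
        "interacts with user via app",
    },
    process={
        "regulation of temperature",
        "measurement of ambient temperature",
        "proxy for presence",
    },
)

# A second, purely hypothetical device for comparison.
door_sensor = ThingAnnotation(
    name="door sensor",
    localization={"inside", "home", "hallway"},
    function={"detects door opening"},
    process={"proxy for presence"},
)


def find(things, branch, term):
    """Names of all things annotated with `term` in the given branch."""
    return [t.name for t in things if term in getattr(t, branch)]


print(find([thermostat, door_sensor], "process", "proxy for presence"))
```

Two devices with unrelated functions turn up in the same query because both can act as a proxy for presence. Finding such links is what the "process" branch is meant to allow.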

Page 37: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

A LAST WORD ON PRAGMATISM

“perfect” ontology

The SSN Ontology allows for inference entirely on the basis of its structure and annotation. In reality, many parameters are difficult to establish and the effort to annotate things outweighs the utility.

“crude” ontology

A simplified structure allows for quick annotation even by non-specialists. The lack of details can lead to clashes in the ontology => more smartness has to go into software; more coding effort.

1 billion

different things

1 million

use cases

Page 38: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

0 clean shirts left

+ washing machine estimates 97% of your last pack of powder used

+ it’s Wednesday, 23:55

+ the last four Thursdays had a morning business meeting

+ the car is parked 20 m from a shop

+ last retail activity: 8 sec ago

Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse

“need identified” + “notification appropriate”

Actionable insight. From everything.

“not home”

“buying”

credit card: “highly personal device” ~ alive and awake

3% left and

not pressed

“indicator of esteem”
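
The chain from raw observations to context to action can be sketched roughly as follows. Everything here is an assumption made for illustration: the dictionary keys, the thresholds (95% powder used, 50 m to the shop, 60 s since the last purchase) and the tag names are hypothetical, and in a real system the tags would come from ontology annotations rather than hard-coded rules.

```python
# Raw observations from the slide; key names are hypothetical.
observations = {
    "clean_shirts": 0,
    "powder_used_fraction": 0.97,
    "weekday": "Wednesday",
    "thursday_morning_meetings": 4,  # out of the last four Thursdays
    "car_distance_to_shop_m": 20,
    "seconds_since_retail_activity": 8,
}


def derive_context(obs):
    """Translate raw values into context tags (thresholds are assumptions)."""
    tags = set()
    if (
        obs["clean_shirts"] == 0
        and obs["weekday"] == "Wednesday"
        and obs["thursday_morning_meetings"] >= 4
    ):
        tags.add("clean shirt needed tomorrow")
    if obs["powder_used_fraction"] >= 0.95:
        tags.add("powder nearly empty")
    if obs["car_distance_to_shop_m"] <= 50:
        tags.add("near shop")
    if obs["seconds_since_retail_activity"] <= 60:
        tags.add("alive and awake")
    return tags


def decide(tags):
    """Act only if a need is identified AND a notification is appropriate."""
    need_identified = {"clean shirt needed tomorrow", "powder nearly empty"} <= tags
    notification_appropriate = {"near shop", "alive and awake"} <= tags
    if need_identified and notification_appropriate:
        return "send reminder: pick up washing powder"
    return None


print(decide(derive_context(observations)))
```

No single observation triggers the reminder. It fires only when the tags behind "need identified" and those behind "notification appropriate" are all present.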

Page 39: O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

Today’s biology is a quantitative, data-rich science.

Infrastructure for ‘big data’ was driven by academics.

Data is only useful if it can be turned into knowledge.

Understanding of data requires ‘data about the data’.

Meta-data should be in a universally understood format.

Ontologies provide context.

Gene Ontology (GO) is a de facto standard.

Human curation is key to GO.

Public funders and industry contribute significantly to GO.

Should governments be involved in IoT?

GO is not a ‘one fits all’, but has a few useful concepts.

What does the thing do? Thing function.

For what can the thing be an avatar? Thing process.

Where is the thing? Thing localization.

@BorisAdryan