allegrograph - cognitive probability graph webcast
TRANSCRIPT
The power of the
Cognitive Probability Graph(aka Cognitive Computing)
June 2016
Jans Aasman
10 years ago
Structured Data
7 years ago
Structured Data Unstructured Data
4 to 5 years ago
Structured DataUnstructured
Data
KnowledgeDomain knowledge
Linked Open Data
Vocabularies
Taxonomies/Ontologies
New #1: Learning. Feed output of data
science back into data infrastructure
Structured
Data
Unstructured
Data
KnowledgeDomain knowledge
Linked Open Data
Vocabularies
Taxonomies/Ontolo
gies
Probabilistic
Inferences.
New # 2: everything in one (distributed)
semantic graph
Structured
Data
Unstructured
Data
KnowledgeDomain
knowledge
Linked Open Data
Vocabularies
Taxonomies/Ontol
ogies
Probabilistic
Inferences.
Unstructured
Data
KnowledgeDomain
knowledge
Linked Open Data
Vocabularies
Taxonomies/Ontol
ogies
AKA: Cognitive Computing
Structured
Data
Unstructured
Data
KnowledgeDomain
knowledge
Linked Open Data
Vocabularies
Taxonomies/Ontol
ogies
Probabilistic
Inferences.
Unstructured
Data
KnowledgeDomain
knowledge
Linked Open Data
Vocabularies
Taxonomies/Ontol
ogies
Examples
Examples
• Healthcare: If I have this class of diagnostics and I get this procedure what are some of the new
symptoms I might get in the next two years.
• eCommerce and brand protection: find all my products based on product similarity
• Logistics: what can I statistically predict about part P breaking down and what other parts do I
usually buy after that part breaks down.
• Police Intelligence: find the most plausible story of a temporally orderend shortest path between
two criminal through observed (hard) facts and inferred (soft facts)
• Fraud detection: find links between your local chamber of commerce and the panama papers
through similar names and addresses.
Example healthcare
• Franz and Montefiore are partners in the Semantic Data Lake project.
One cognitive computing platform for all healthcare analytics
Example healthcare
• We created a single data centric platform that can serve any type of
analytic without building a new data mart for every new question.
• Currently 2.7 million patients with 10 years of data
• All data captured in a Unified Clinical Event model with 350 classes of
events.
Healthcare: structured and unstructured data
Structured patient data combined with complex
integrated terminology
Provenance for every value
Healthcare: the knowledge bases
• More > 180 vocabularies and terminology systems integrated in on
unified terminology system (Mesh, Snomed, UMLS, RxNorm, LOINC etc,
etc)
• External databases and
• Linked Open Data
OMOP
11089001
6600349
11894800
5
7534205
16790501
14667809
35896705
9209005
1732609
9908905
1469609
329005
LOINC
113345001 140460009
skos:semanticRelation
skos:narrower
118948005
skos:broader
“9209005”
SNOMEDCT
M0024135
M0008124 M0004742
skos:semanticRelation
skos:narrower
M0015742
skos:broader“Abdominal Pain”
“M0
02
41
35
”
skos:exactMatch
A0
54
93
02
A0978543
skos-xl:prefLabel
9209005
“Abdominal Pain”
SAB
AUI
SUI
MeSH
SNOMEDCT
MedDRA
rdfs:subClassOf
rdf:type
C0172359 C0232487
C0238551
“Abdominal Pain”
“C0000737”
skos:semanticRelation
skos:broader
UMLS - MTH
skos:notation
S035799
skos-xl:label
MTH
STR
C000737
Everything linked through SKOS
SK
OS
/SK
OS
-XL
ConceptScheme
Concept
UMLS - Semantic Net
Entity Event
Label
Population,
Community
Time
Pt.Pt.Pt.Pt.
SDL Paradigm:
Pt.Pt.Pt.Pt.
Diagnosis
Codes
Disease
Classification
OMIM,
GONG
Genetic
Profile
Procedure
CodeHCPC
Manufacturers
PharmKGB
Drug
Classification
Drug
Codes
DrugBank
ClinicalTrials
CER
PubMed
Analytic Tapestry(closed loop analytics)
Healthcare: probabilistic inferences
Why is this so important?
• Usually the output of data science results in reports and publications but
• No formal trace where the data came from
• No formal link to the actual methods you used, or who did it, or when you did it
• Cannot be compared to earlier results
• Cannot be used as building blocks for further research
• In general : the output is not queryable
• This is not good for delivery of care, reproducibility of research findings,
security and compliance, and results in loss of value-added information,
and enterprise intellectual property and assets, and unnecessary
duplication of efforts
Odds ratio
Association rules
K-means clustering
And then a query you could do never before
• Using the Knowledge Base, the Structured Data and the Probabilistic
inferences all at the same time.
• To find the statistical links between Diabetes and Vision problems in our
Semantic Data Lake
• Find the set of ICD9s that are connected via one or more steps to
concepts in the KB that mention Diabetes
• Find the set of ICD9s that are connected via one or more steps to
vision* or eye* or retinal*
• An show how those two sets are related in the space of odds ratios
And then just a few other examples
In the ecommerce world: find similar objects based on > 10
criteria, including description, product codes, pictures, etc
Returns a Graph in a Table (ughh )
But powerful when visualized
Or like this
And linking with the panama papers
And now the researchers can start investigating
Summary: this is the new paradigm of computing
Structured
Data
Unstructured
Data
KnowledgeDomain
knowledge
Linked Open Data
Vocabularies
Taxonomies/Ontol
ogies
Probabilistic
Inferences.
Unstructured
Data
KnowledgeDomain
knowledge
Linked Open Data
Vocabularies
Taxonomies/Ontol
ogies