Natural Language Processing & Semantic Models in an Imperfect World
![Page 1: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/1.jpg)
Confidential
Presenter: Marc Hadfield
Natural Language Processing
& Semantic Models in an Imperfect World
Copyright Alitora Systems, Inc. 2009
![Page 2: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/2.jpg)
Marc Hadfield
• CTO of Alitora Systems
• Computer Science Research in Bioinformatics
  • NLP
  • Big (Fuzzy) Networks
• Generalized Semantic Data Platform
![Page 3: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/3.jpg)
Alitora Systems
System Approach
…Talk about Systems & Apps more than Modules.
![Page 4: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/4.jpg)
Discussion Today
• Storing Data – Semantic Repository
• Generating Data – NLP
• Modeling Data – Semantic Models
• Analyze Data – Methodology
• Using Data – Application
![Page 5: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/5.jpg)
Alitora Systems Architecture
![Page 6: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/6.jpg)
Alitora Systems API (ASAPI)
• User Interfaces
• ASAPI
• Collaboration
• kHarmony™ Semantic DB
• Alitora Foundry
  • Text-Mining
• UMIS
  • Secure Distributed URIs
  • URI to Named Graphs
![Page 7: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/7.jpg)
ASAPI Cloud
Multi-Billion Triples
![Page 8: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/8.jpg)
kHarmony™ Semantic DB
• Semantic / Graph DB
• Cloud Deployable
  • Distribute Data over Servers
  • Layers of Cache
• Data Analytics / Clustering
  • Determine High-Value Knowledge
  • Knowledge Relevancy
• Embedded Scripting
• Data Entitlements
  • Users, Teams, Organizations, Colleagues
• Base Ontology
![Page 9: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/9.jpg)
Alitora Foundry
• Manages NLP processes
• Annotators which add metadata to text
  • Includes external services like OpenCalais as annotators
• Workflows to link annotators together
• Common data representation across components
  • RDF in, RDF out
• Ontology includes representation of certainty, error
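The annotator and workflow arrangement above can be sketched roughly as follows. This is a minimal illustration, not Foundry's actual API: `Document`, `sentence_counter`, `gene_tagger`, and `run_workflow` are hypothetical names, plain tuples stand in for RDF statements, and the certainty values are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (subject, predicate, object, certainty): a stand-in for an RDF statement
# that carries a certainty value. The tuple layout is an assumption.
Statement = Tuple[str, str, str, float]


@dataclass
class Document:
    uri: str
    text: str
    statements: List[Statement] = field(default_factory=list)


# An annotator takes a document and returns it with metadata added.
Annotator = Callable[[Document], Document]


def sentence_counter(doc: Document) -> Document:
    """Toy rule-based annotator: records how many sentences the text has."""
    count = sum(1 for part in doc.text.split(".") if part.strip())
    doc.statements.append((doc.uri, "sentenceCount", str(count), 1.0))
    return doc


def gene_tagger(doc: Document) -> Document:
    """Toy dictionary annotator; the 0.8 certainty is made up."""
    for name in ("Bim", "Gadd45a"):
        if name in doc.text:
            doc.statements.append((doc.uri, "mentions", name, 0.8))
    return doc


def run_workflow(doc: Document, annotators: List[Annotator]) -> Document:
    """Apply each annotator in order; later steps see earlier output."""
    for annotate in annotators:
        doc = annotate(doc)
    return doc


doc = run_workflow(
    Document("doc:1", "Suppression of endogenous Bim greatly inhibits "
                      "Gadd45a induction of apoptosis."),
    [sentence_counter, gene_tagger],
)
for statement in doc.statements:
    print(statement)
```

Because every step reads and writes the same statement list, an external service could be wrapped as one more `Annotator` without changing the workflow runner.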
![Page 10: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/10.jpg)
Foundry Workflow
• Independent Workflows based on type of text
• Combine ML & Rule-based systems
![Page 11: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/11.jpg)
Foundry Data Model
• Two dimensional representation of tokens
  • Labels/Spans to tag token ranges (features in machine learning)
• Allows multiple interpretations of tokens
  • Chemical names tokenized differently than personal names
• Sequence Recognition and Categorization (with scoring/likelihood)
  • Entities, Entity Types, Normalized (Disambiguated) Entities (ER vs. ER)
• Shared across workflow steps
• Direct RDF representation
“Span”
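One way to picture labeled spans over a token sequence is the sketch below. The `Span` class, its half-open range convention, and all labels and scores are assumptions made for illustration; the slides do not show the actual data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """A label over the half-open token range [start, end) with a score."""
    start: int
    end: int
    label: str
    score: float  # likelihood assigned by whichever step produced the span


def spans_covering(spans: List[Span], token_index: int) -> List[Span]:
    """Return every span whose range includes the given token."""
    return [s for s in spans if s.start <= token_index < s.end]


tokens = ["Suppression", "of", "endogenous", "Bim", "greatly", "inhibits",
          "Gadd45a", "induction", "of", "apoptosis", "."]

# Spans may overlap, so the same tokens can carry competing interpretations.
spans = [
    Span(2, 4, "gene_or_protein", 0.90),  # "endogenous Bim"
    Span(3, 4, "gene_or_protein", 0.95),  # "Bim" alone
    Span(6, 7, "gene_or_protein", 0.92),  # "Gadd45a"
    Span(9, 10, "process", 0.88),         # "apoptosis"
]

for span in spans_covering(spans, 3):
    print(span.label, tokens[span.start:span.end], span.score)
```

Keeping spans separate from tokens is what lets two readings of the same token range coexist until a later step picks one.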
![Page 12: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/12.jpg)
NLP In Action
![Page 13: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/13.jpg)
Sentence
“Suppression of endogenous Bim greatly inhibits Gadd45a induction of apoptosis.”
Parse
[action, inhibit,
  [action, suppress, [unknown], [gp, endogenous Bim] ],
  [action, induce, [gp, Gadd45a], [process, apoptosis] ],
]
Foundry Relationship Extraction
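The nested parse for the example sentence can be flattened into individual relations, as in this sketch. It assumes each action node has the shape `[action, verb, agent, target]`, which is a reading of the notation on the slide rather than a documented format; `flatten` and the generated ids are hypothetical.

```python
from typing import List, Tuple

parse = ["action", "inhibit",
         ["action", "suppress", ["unknown"], ["gp", "endogenous Bim"]],
         ["action", "induce", ["gp", "Gadd45a"], ["process", "apoptosis"]]]


def flatten(node: list, out: List[Tuple[str, str, str, str]]) -> str:
    """Walk the parse depth-first and return a label for `node`.

    Each action node appends (id, verb, agent, target) to `out`; nested
    actions are referred to by their generated id.
    """
    kind = node[0]
    if kind != "action":
        # Leaf node: [type] or [type, text].
        return node[1] if len(node) > 1 else kind
    _, verb, agent, target = node
    agent_label = flatten(agent, out)
    target_label = flatten(target, out)
    action_id = f"a{len(out) + 1}"
    out.append((action_id, verb, agent_label, target_label))
    return action_id


relations: List[Tuple[str, str, str, str]] = []
flatten(parse, relations)
for relation in relations:
    print(relation)
```

The outer `inhibit` relation ends up pointing at the two inner relations by id, which mirrors the idea of a relation whose arguments are themselves relations.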
![Page 14: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/14.jpg)
Alitora Knowledge Ontology
Data Representation:
Each Object is a Named Graph with a unique URI.
“chunks” of RDF
OWL2
“Core” Model
![Page 15: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/15.jpg)
Alitora Knowledge Ontology
Named Graphs:
• URI
• “Reified”
• Provenance
  • Hash/Signature
  • Creation, Modification, Expiration Dates
• Certainty/Error
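A named graph carrying the metadata listed above might look like the following sketch. The field names, the 0.0 to 1.0 certainty scale, the `urn:example:` URIs, and the choice of SHA-256 are all assumptions; the slides do not say how the hash or signature is computed.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class NamedGraph:
    uri: str
    triples: FrozenSet[Triple]
    source: str                  # provenance: where the statements came from
    created: datetime
    modified: datetime
    expires: Optional[datetime]  # None means no expiration date
    certainty: float             # assumed scale: 0.0 to 1.0

    def signature(self) -> str:
        """SHA-256 over the sorted triples, so equal content hashes equally."""
        canonical = "\n".join(" ".join(t) for t in sorted(self.triples))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


now = datetime(2009, 1, 1, tzinfo=timezone.utc)  # placeholder timestamp
graph = NamedGraph(
    uri="urn:example:graph/1",
    triples=frozenset({("ex:Gadd45a", "ex:induces", "ex:apoptosis")}),
    source="urn:example:document/42",
    created=now, modified=now, expires=None,
    certainty=0.8,
)
print(graph.uri, graph.signature()[:12], graph.certainty)
```

Since the metadata lives on the graph rather than on each triple, a whole chunk of RDF can be trusted, expired, or re-scored as one unit.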
![Page 16: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/16.jpg)
Alitora Knowledge Ontology
Lesson:
“Reification” at the model level.
Expose the topology of the knowledge.
![Page 17: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/17.jpg)
Semantic Knowledge Statements
Domain Ontology + Instance Statements
Alitora Knowledge Ontology
![Page 18: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/18.jpg)
Semantic Collaborative Statements
Alitora Knowledge Ontology
![Page 19: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/19.jpg)
Alitora Knowledge Ontology
Fact Representation
• This example has 9 Named Graphs
• The “Relation” is the head
• Any number of Relation-Parts
• Relation-Parts are chained
“Company Merger”
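A head relation with chained relation-parts could be modeled as a linked list, as sketched here. This does not reproduce the nine named graphs in the slide's figure; the class names, roles, companies, and date are placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class RelationPart:
    uri: str
    role: str
    value: str
    next_part: Optional["RelationPart"] = None  # link to the next part


@dataclass
class Relation:
    uri: str
    relation_type: str
    certainty: float
    first_part: Optional[RelationPart] = None

    def parts(self) -> Iterator[RelationPart]:
        """Follow the chain from the head until it ends."""
        part = self.first_part
        while part is not None:
            yield part
            part = part.next_part


# Placeholder values only; none of this comes from the slide.
date = RelationPart("urn:example:part/3", "effectiveDate", "2009-06-30")
target = RelationPart("urn:example:part/2", "acquired", "Company B", date)
acquirer = RelationPart("urn:example:part/1", "acquirer", "Company A", target)
merger = Relation("urn:example:relation/1", "CompanyMerger", 0.9, acquirer)

print([(p.role, p.value) for p in merger.parts()])
```

Chaining means a relation can take any number of parts without changing the head's shape, and each part can carry its own URI and metadata.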
![Page 20: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/20.jpg)
• OWL
• “Reified”
• Knowledge Representation
• Certainty, Error, Provenance, …
• Graph + Semantic
• Topology Interpretation
• Logical Interpretation
Alitora Knowledge Ontology
![Page 21: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/21.jpg)
Memomics BioOntology (Domain)
• Extends Alitora Knowledge Ontology
  • Inherits knowledge representation structures
• OWL
• Domain Specific
  • Defines types of “facts” specific to biomedical domain
• A general AKO fact can be mapped/asserted into a Memomics BioOntology fact
![Page 22: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/22.jpg)
Where are we?
• Store Data
• Generate data with NLP
• Represent data in a general knowledge model
• Have a domain specific ontology
  • Where the “action” happens
• Need some analysis to push facts into the domain ontology
• Query, Inference using the domain ontology
![Page 23: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/23.jpg)
Relevancy
The shape or “topology” of the graph helps to identify relevant knowledge.
The “paths” connecting a User to knowledge, based on search usage, factor into Relevancy
“Knowledge Rank”
“Best” facts
Relevancy based on Graph Topology
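The slides do not say how “Knowledge Rank” is computed. As a stand-in, the sketch below uses plain PageRank by power iteration to show how graph topology alone can rank facts; the graph, the node names, and the damping value are assumptions.

```python
from typing import Dict, List


def rank_nodes(links: Dict[str, List[str]], damping: float = 0.85,
               iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank by power iteration over an adjacency list."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / len(nodes)
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: users linked to facts their searches reached,
# plus one link between facts.
links = {
    "user:alice": ["fact:1", "fact:2"],
    "user:bob": ["fact:2"],
    "fact:1": ["fact:2"],
    "fact:2": [],
}
ranks = rank_nodes(links)
best = max((n for n in ranks if n.startswith("fact:")), key=ranks.get)
print(best, round(sum(ranks.values()), 6))
```

Including user nodes in the graph is one way to let search usage paths influence the ranking, as the slide describes.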
![Page 24: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/24.jpg)
Scripting, Analysis, Inference
• Submitted Scripts applied over Graph Walk
  • Groovy Scripts (Java Interface)
  • Can calculate “scores”
• Offline Clustering and Analysis Algorithms
  • Grid/Cloud based
• Inference process utilizes knowledge
  • Asserting statements (Relation Statement)
  • Prolog, HiLog, F-Logic
  • Use all features in inferencing (such as certainty)
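A submitted script applied over a graph walk can be pictured as a scoring callback handed to a traversal. The platform described here uses Groovy scripts; the sketch below uses Python purely for illustration, and `walk_and_score`, `closeness`, and the sample graph are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]
ScoreFn = Callable[[str, int], float]  # (node, depth from start) -> score


def walk_and_score(graph: Graph, start: str, script: ScoreFn,
                   max_depth: int = 2) -> Dict[str, float]:
    """Breadth-first walk from `start`, applying `script` to each node once."""
    scores: Dict[str, float] = {}
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, depth = queue.popleft()
        scores[node] = script(node, depth)
        if depth < max_depth:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
    return scores


def closeness(node: str, depth: int) -> float:
    """Example submitted script: nearer nodes score higher."""
    return 1.0 / (1 + depth)


graph = {"user:alice": ["fact:1"], "fact:1": ["fact:2"], "fact:2": ["fact:3"]}
result = walk_and_score(graph, "user:alice", closeness)
print(result)
```

Separating the walk from the scoring function is what lets different scripts be submitted against the same traversal.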
![Page 25: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/25.jpg)
Certainty
• How accurate (F-score) are your NLP extractions?
• How accurate is the source material?
• How dynamic is your domain?
• Can facts be independently verified?
  • Do multiple sources reinforce a “fact”?
• Can your community of users curate or validate information?
• How sensitive are you to error?
  • Will users tolerate error (such as in search) or are you trying to infer over absolute “truth”?
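Two of these questions lend themselves to a small calculation. The F-score function below follows the standard precision/recall definition. The noisy-OR combination of sources is only one possible way to model reinforcement and assumes the sources are independent; the slides do not specify a method, and all counts and certainties are invented.

```python
from typing import Iterable


def f_score(true_pos: int, false_pos: int, false_neg: int,
            beta: float = 1.0) -> float:
    """F-beta from extraction counts; beta=1 gives the usual F1."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def combined_certainty(certainties: Iterable[float]) -> float:
    """Noisy-OR: chance that at least one independent source is right."""
    miss = 1.0
    for c in certainties:
        miss *= (1.0 - c)
    return 1.0 - miss


print(f_score(true_pos=80, false_pos=20, false_neg=20))
print(combined_certainty([0.6, 0.5]))
```

Under the independence assumption, two moderately certain sources yield a higher combined certainty than either alone, which is the reinforcement effect the slide asks about.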
![Page 26: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/26.jpg)
Certainty
Choose to assert facts (or not) based on certainty assessments
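A threshold decision of this kind might be sketched as below. `GeneralFact`, `partition_by_certainty`, the entity ids, and the 0.7 threshold are illustrative assumptions, not part of the Alitora ontology.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GeneralFact:
    subject: str
    relation: str
    obj: str
    certainty: float


def partition_by_certainty(
    facts: List[GeneralFact], threshold: float
) -> Tuple[List[GeneralFact], List[GeneralFact]]:
    """Split facts into (asserted, held back).

    Held-back facts stay in the general model and can be reconsidered
    when more evidence arrives.
    """
    asserted = [f for f in facts if f.certainty >= threshold]
    held_back = [f for f in facts if f.certainty < threshold]
    return asserted, held_back


# Made-up entity ids and certainties.
facts = [
    GeneralFact("ex:geneA", "induces", "ex:processX", 0.85),
    GeneralFact("ex:geneB", "inhibits", "ex:geneA", 0.40),
]
asserted, held_back = partition_by_certainty(facts, threshold=0.7)
print(len(asserted), len(held_back))
```

Nothing is discarded: a fact below the threshold is simply not promoted into the domain model yet.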
![Page 27: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/27.jpg)
Guided Inference
Inference is guided by ranked knowledge
Analysis can be performed offline
![Page 28: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/28.jpg)
Guided Inference
• Dynamic Inference / Rules
• A question/query is posed to initiate the inference
• Knowledge base is queried to collect relevant data
  • Certainty Thresholds can be used
  • Relevancy Thresholds can be used
• AKO Relations are asserted as “facts” to extend the inference
• Process is repeated to add assertions
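The loop described above (query with thresholds, assert, repeat) can be sketched as a small forward-chaining routine. The single transitivity rule, the `ScoredRelation` type, and the knowledge base contents are assumptions for illustration; the actual system is described as using Prolog, HiLog, and F-Logic.

```python
from typing import List, NamedTuple, Set, Tuple


class ScoredRelation(NamedTuple):
    subject: str
    predicate: str
    obj: str
    certainty: float
    relevancy: float


Fact = Tuple[str, str, str]


def collect(kb: List[ScoredRelation], min_certainty: float,
            min_relevancy: float) -> Set[Fact]:
    """Query step: keep only relations that pass both thresholds."""
    return {(r.subject, r.predicate, r.obj) for r in kb
            if r.certainty >= min_certainty and r.relevancy >= min_relevancy}


def infer(facts: Set[Fact], predicate: str) -> Set[Fact]:
    """Repeat one transitivity rule until no new fact is asserted."""
    facts = set(facts)
    while True:
        derived = {(a, predicate, d)
                   for (a, p1, b) in facts if p1 == predicate
                   for (c, p2, d) in facts
                   if p2 == predicate and b == c and a != d}
        new = derived - facts
        if not new:
            return facts
        facts |= new


kb = [
    ScoredRelation("A", "activates", "B", 0.9, 0.8),
    ScoredRelation("B", "activates", "C", 0.8, 0.7),
    ScoredRelation("C", "activates", "D", 0.3, 0.9),  # fails certainty threshold
]
facts = infer(collect(kb, min_certainty=0.5, min_relevancy=0.5), "activates")
print(("A", "activates", "C") in facts, ("A", "activates", "D") in facts)
```

Because the low-certainty relation never enters the fact set, nothing derived from it can be asserted either, which keeps error from propagating through the inference.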
![Page 29: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/29.jpg)
Demonstrations
• Alitora News Tracker
• Sage Commons, Biomedical Domain
• Match Engine, Consumer Application
![Page 30: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/30.jpg)
Alitora News Tracker
• Track highly relevant news in domain niche
• Use NLP to extract entities and relations of interest
• Use certainty assessments as thresholds to consider entities/relations
• Use a score (an embedded script) to assign a relevancy to news articles
  • Heuristic including entity types in articles, relationship types, et cetera
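A relevancy heuristic of the kind described might look like this sketch. The weights, type names, and threshold are invented; the real scoring script is not shown in the slides.

```python
from typing import Dict, List, Tuple

# (kind, type, certainty), where kind is "entity" or "relation".
Extraction = Tuple[str, str, float]

# Hypothetical weights for a news niche.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("entity", "Company"): 1.0,
    ("entity", "Drug"): 2.0,
    ("relation", "Acquisition"): 3.0,
}


def article_relevancy(extractions: List[Extraction],
                      min_certainty: float = 0.6) -> float:
    """Sum type weights for extractions that pass the certainty threshold."""
    return sum(WEIGHTS.get((kind, type_), 0.0)
               for kind, type_, certainty in extractions
               if certainty >= min_certainty)


article = [
    ("entity", "Company", 0.9),
    ("entity", "Drug", 0.5),          # below threshold, ignored
    ("relation", "Acquisition", 0.7),
]
print(article_relevancy(article))
```

Applying the certainty threshold before scoring means a doubtful extraction cannot push an article up the ranking.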
![Page 31: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/31.jpg)
Application: News Tracker
![Page 32: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/32.jpg)
Application: Sage Commons
• Share networks of biomedical data across the community of researchers
  • Million node networks, billions of triples
• Extended AKO with Sage Ontology
• Use for structured data and unstructured data
  • Allow combination of structured data with NLP derived data
• Use certainty thresholds to cut down on noise
• Use relevancy for efficient queries
• Expose data for guided inferencing
![Page 33: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/33.jpg)
![Page 34: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/34.jpg)
![Page 35: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/35.jpg)
![Page 36: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/36.jpg)
Application: Match Engine
Match Engine
• Extended AKO with Match Ontology
• Foundry for extracting music event entities
  • Performer, Venue, Price, Genre
• Certainty for reducing noise
• Match Engine uses inference with multiple sources of “evidence” to match users with events
Demo Application: Bandalay Facebook App
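Matching users to events from several pieces of evidence could be sketched as a certainty-weighted vote. The `Evidence` type, the scoring rule, and the sample events are assumptions; the slides do not describe the Match Engine's inference in detail.

```python
from typing import Dict, List, NamedTuple


class Evidence(NamedTuple):
    source: str       # which extracted field produced the evidence
    matches: bool     # does it support matching this user to the event?
    certainty: float  # certainty of the underlying extraction


def match_score(evidence: List[Evidence], min_certainty: float = 0.5) -> float:
    """Certainty-weighted share of supporting evidence, in [0, 1]."""
    usable = [e for e in evidence if e.certainty >= min_certainty]
    total = sum(e.certainty for e in usable)
    if total == 0:
        return 0.0
    return sum(e.certainty for e in usable if e.matches) / total


# Invented events and evidence for a single user.
events: Dict[str, List[Evidence]] = {
    "event:jazz-night": [
        Evidence("genre", True, 0.9),
        Evidence("venue", True, 0.6),
        Evidence("price", False, 0.5),
    ],
    "event:metal-fest": [
        Evidence("genre", False, 0.9),
        Evidence("performer", True, 0.3),  # too uncertain, dropped
    ],
}
ranked = sorted(events, key=lambda e: match_score(events[e]), reverse=True)
print(ranked)
```

Dropping low-certainty evidence first is the noise reduction step; the remaining evidence then contributes in proportion to how certain it is.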
![Page 37: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/37.jpg)
![Page 38: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/38.jpg)
![Page 39: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/39.jpg)
![Page 40: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/40.jpg)
NLP and (Un)Certainty
• Capture Error / Uncertainty in Model from NLP
• “Reify” relationships so metadata will “fit”
• Use multiple types of analysis
  • Rules, Machine Learning, Topology, Curation, User Feedback
• Separate general model and domain model
  • Allows asserting a fact in the domain model or not (don’t “decide” everything at once)
• Use semantics to make decisions about data
  • Inference can use thresholds to decide to assert facts (or not)
  • Guided Inference can make informed choice about facts to add/remove from model
![Page 41: Natural Language Processing & Semantic Modelsin an Imperfect World](https://reader034.vdocuments.net/reader034/viewer/2022051313/5481cc235906b514058b45b9/html5/thumbnails/41.jpg)
Contact Information
750 Menlo Ave, Suite 340
Menlo Park, CA 94025
(415) 310-4406

155 Water Street
Brooklyn, NY 11201
(917) 463-4776