Smart Data for Smart Labs:
Utilizing Semantic Technologies for Improved
Integration and Sharing of Laboratory Data
Eric Little, PhD
VP Data Science
Slide 2
Outline
The Current Laboratory Data Situation
The Growing Importance of Data As A Corporate Asset
What is Semantic Technology and How Can It Help?
Moving Beyond Semantics – Big Data & Analytics
Smart Labs for the 21st Century
Slide 3
The Current Lab Situation
Many challenges exist for
data to be captured,
integrated and shared
Data Silos
Incompatible
instruments and
software systems
Legacy architectures are
brittle and rigid
SME knowledge resides
in people’s heads
Data schemas are not
explicitly understood
Lack of common vision
between business units
and scientists
Slide 4
Pharma Example in Action
Documentation
Initial Step
Local
Regulatory
Affiliate
Calibration
SME
Instrumentation Marketing
R&D Data
R&D Tech
R&D Data Stores
Production Data
External
Regulatory
Affiliate
Manual Data
Verification
Process
Verify
OK?
NO
YES
Finalized
Report
• This process can take weeks to complete
because it often has to be repeated several
times due to errors.
• Relations must be built by hand on the
user side from flat files or spreadsheets.
The relations therefore cannot be retained
over time or regenerated automatically later.
• The DBs are not built for retrieval of
different information types – the joins are
not always there.
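A minimal sketch of what building relations by hand from flat files can look like. The file contents, column names, and the manual_join helper are hypothetical and chosen only for illustration; they are not taken from the pharma example itself.

```python
import csv
import io

# Hypothetical flat-file exports; column names and values are illustrative only.
CALIBRATION_CSV = """instrument_id,calibrated_on
HPLC-01,2016-01-10
MS-02,2016-02-03
"""

RESULTS_CSV = """sample_id,instrument_id,result
S-100,HPLC-01,0.82
S-101,MS-02,1.47
S-102,GC-09,0.33
"""


def manual_join(calibration_text, results_text):
    """Link each result row to its calibration row by instrument_id."""
    calibration = {
        row["instrument_id"]: row
        for row in csv.DictReader(io.StringIO(calibration_text))
    }
    joined, unmatched = [], []
    for row in csv.DictReader(io.StringIO(results_text)):
        match = calibration.get(row["instrument_id"])
        if match is None:
            # No relation can be built: the key is missing from the other file.
            unmatched.append(row["sample_id"])
        else:
            joined.append({**row, "calibrated_on": match["calibrated_on"]})
    return joined, unmatched


joined, unmatched = manual_join(CALIBRATION_CSV, RESULTS_CSV)
print(len(joined), "rows linked; unmatched samples:", unmatched)
```

The link between a result and its calibration exists only while this script runs. Nothing in either file records it, which is why the work has to be repeated each time.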
Slide 5
Why Data Matters
Enterprise systems are increasingly
“hybrid” in their design and architectures
Legacy Data Sources combined with new
tech
Integrating data is becoming more
complex
The size of data sources continues to grow
Different user groups within organizations
Answers need to reflect increasingly
complex patterns
Finding and utilizing key data within an
organization is of increasing importance
Data is a valuable corporate asset
The fundamentals of data management
have changed. Basic storage & retrieval has
given way to analytics and
responsiveness.
Slide 6
Analytics and Data Science for the 21st Century
The rate of change in digital information is growing exponentially
Cloud Computing is now critical for scaling an enterprise
New data types are being created and hold significant value
Data is becoming more personalized and context-based
The effect of data is changing the business landscape
90% of the world’s data was produced in the last 2 years – how well can
you mine/leverage this data? What is this worth to a company?
$900 Billion/year: cost of lowered employee productivity and reduced
innovation from information overload – how can we avoid these costs?
“Increasing volume and detail of enterprise information, multimedia, social media, and the
Internet of Things will fuel exponential growth in data for the foreseeable future.”
“The use of big data will become a key basis of competition and growth for individual firms.”
McKinsey: “Big data: The next frontier for innovation, competition, and productivity”, May 2011
Semantic Technologies:
What Are They & How Are They Used?
Slide 8
The Value of Semantics
Has its origins in philosophy - generally understood as the abstract
study of meaning
Distinguished from syntax – which is the rules-based grammar of a
language
“Washington”
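One way to read the "Washington" example is that a single string can stand for several different things. The sketch below contrasts a syntax-level match with a lookup that also uses meaning. The three senses, the ex: identifiers, and both function names are assumptions made for illustration; the slide does not say which senses it showed.

```python
# Hypothetical identifiers; a real system would use IRIs from a shared vocabulary.
ENTITIES = {
    "ex:WashingtonState": {"label": "Washington", "type": "US state"},
    "ex:WashingtonDC": {"label": "Washington", "type": "city"},
    "ex:GeorgeWashington": {"label": "Washington", "type": "person"},
}


def match_by_string(label):
    """Syntax-level lookup: every entity whose label equals the string."""
    return sorted(k for k, v in ENTITIES.items() if v["label"] == label)


def match_by_meaning(label, wanted_type):
    """Semantic lookup: the label plus the kind of thing that is meant."""
    return sorted(
        k
        for k, v in ENTITIES.items()
        if v["label"] == label and v["type"] == wanted_type
    )


print(match_by_string("Washington"))             # all three candidates
print(match_by_meaning("Washington", "person"))  # one candidate
```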
Slide 9
Semantic Web and IT Evolution: Evolving from
Code-Centric to Data-Centric IT
Semantic technologies: IT evolution from code to data centricity
In the Code-Centric years, data was often stored in flat files
The creation of databases, specifically Network and RDBMS, was
one of the first steps leading to Data-Centric evolution
The last decade has seen standards such as XML, RDF, Web
Services, and now OWL, that further evolve IT to a Data-Centric
environment
2016
Slide 10
Utilizing Taxonomies for Reference Data
Management
Taxonomies provide important
structure to data - as acyclic
tree graphs
2 Types of Applications:
• Captures sub-class and super-
class relationships
• Captures broad/narrow
relationships between terms
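A minimal sketch of a taxonomy held as an acyclic tree, assuming each term has at most one parent. The instrument terms and the function names are hypothetical; they are not drawn from any published taxonomy.

```python
# Hypothetical terms, stored as child -> parent. One parent per term gives a tree.
PARENT = {
    "mass spectrometer": "instrument",
    "chromatograph": "instrument",
    "liquid chromatograph": "chromatograph",
    "gas chromatograph": "chromatograph",
}


def ancestors(term, parent=PARENT):
    """Walk up the tree and fail loudly if a cycle was introduced by mistake."""
    seen = []
    while term in parent:
        term = parent[term]
        if term in seen:
            raise ValueError("cycle detected at " + term)
        seen.append(term)
    return seen


def is_narrower_than(term, broader, parent=PARENT):
    """True when `broader` is a super-class (or broader term) of `term`."""
    return broader in ancestors(term, parent)


print(ancestors("liquid chromatograph"))
print(is_narrower_than("gas chromatograph", "instrument"))
```

The same structure serves both applications listed on the slide: read the edges as sub-class/super-class links for classes, or as broader/narrower links for terms.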
Slide 11
Allotrope Foundation Taxonomies (AFT)
mass
intensity
af-m:AFM_0000350
af-r:AFR_0000495
Slide 12
Utilizing the Semantic Spectrum
(Moving Beyond Taxonomies)
Code (Lists) Terms (Soil, Plant, etc.)
Controlled Vocabulary (Agreed Upon Terms)
Taxonomy (Hierarchy)
Thesaurus (Preferred Labels, Synonyms, etc.)
RDF Models (Triples as Graphs)
OWL Ontologies (RDF + Axioms)
Reasoning (Rule-based Logics: Discover New Patterns)
Ontologies and Reasoning add
Axioms and Advanced Logic
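A minimal sketch of the upper end of the spectrum: triples stored as a graph, plus a small rule-based reasoner that derives facts nobody entered by hand. The ex: names and the infer function are hypothetical. The two rules applied are the standard RDFS ones (sub-class transitivity and type propagation), implemented here in plain Python rather than with a real triple store or OWL reasoner.

```python
# Hypothetical triples as (subject, predicate, object).
TRIPLES = {
    ("ex:LiquidChromatograph", "rdfs:subClassOf", "ex:Chromatograph"),
    ("ex:Chromatograph", "rdfs:subClassOf", "ex:Instrument"),
    ("ex:hplc01", "rdf:type", "ex:LiquidChromatograph"),
}


def infer(triples):
    """Forward-chain two rules until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if o != s2 or p2 != "rdfs:subClassOf":
                    continue
                if p == "rdfs:subClassOf":
                    # A subClassOf B, B subClassOf C  =>  A subClassOf C
                    new.add((s, "rdfs:subClassOf", o2))
                elif p == "rdf:type":
                    # x type A, A subClassOf B  =>  x type B
                    new.add((s, "rdf:type", o2))
        if new <= facts:
            return facts
        facts |= new


inferred = infer(TRIPLES)
print(("ex:hplc01", "rdf:type", "ex:Instrument") in inferred)
```

Only one fact about ex:hplc01 was asserted; the reasoner supplies the rest from the class hierarchy.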
Slide 13
Levels of Semantic Expressivity
Semantics can be modeled at many levels
Finding the right level is a tradeoff of expressivity, performance,
decidability, and other factors
The weakest representation is basic syntax matching
The strongest representation is higher order logic
Semantic representation in RDF and ontologies is roughly in the
middle
Using knowledge representation one can separate schema
level from data level
Data becomes much more flexible and reusable
Allows easier transformation of data to knowledge creation
Raises computational value (now data can be more easily
extracted from legacy systems, shared, and used across an
enterprise).
Slide 14
Benefits of Semantic Technology
Interoperability
Searching/Browsing
Reuse
Architectural Intent
Automated Reasoning
Development Lifecycle
Moving From Semantics to
Big Data Analytics
Slide 16
The power of analytics is now just
beginning to be felt
Moore’s Law pertaining to
processing is not the problem
Focus on the growth of Analysis:
From 1988-2003 Computer
processing speed grew by
1000x
In the same period, algorithm
performance improved by 43,000x
What does this tell you about
the direction in which we are
headed?
As data grows, so too will the need
to utilize it more effectively
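The two growth figures above can be put side by side with a little arithmetic. This is only a sketch: the variable names are made up, and treating the two gains as independent factors that multiply is an assumption, not something the slide states.

```python
hardware_gain = 1_000    # processing-speed growth, 1988-2003 (figure from the slide)
algorithm_gain = 43_000  # algorithm growth over the same period (figure from the slide)

# Assumption: the two gains are independent and compound multiplicatively.
combined_gain = hardware_gain * algorithm_gain

# How many times larger the algorithm gain is than the hardware gain.
algorithm_vs_hardware = algorithm_gain / hardware_gain

print(f"combined: {combined_gain:,}x")
print(f"algorithm gain is {algorithm_vs_hardware:.0f}x the hardware gain")
```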
The Rise of Analytics is Changing the Game
ANALYTICS
Slide 17
Understanding the 4V’s of Big Data
Normally the focus –
Big Data Analysis is
more than just size
Performance is
Critical to Success
Data complexity is
increasing – Model
complexity
Uncertainty abounds
– requires statistics
and probabilities
Majority of Big Data analytics
approaches treat these two V’s
Semantic
technologies provide
clear advantages
Mathematical
Clustering
Techniques
provide clear
advantages
Slide 18
Why Semantics Matters for Data Analytics
Big Data approaches require proper metadata
and terminologies to integrate information well
Relationships matter in the data
Understanding perspective (context) is crucial for
success in today’s world
Semantics provides better data models/schemas
Slide 19
Smart Labs for the 21st Century
Smart labs in the future will provide
customers with:
Integrated Data – common reference
data structures (vocabularies)
Sharable Data – easier interaction
across teams and business units
Scalability – Big data applications
that can be highly elastic
Conceptual Representations –
context and perspective are captured
Advanced Analytics – complex &
automated problem-solving
capabilities