Scalable Smart Data Management in the Cloud
Alex Simov & Yavor Petkov
Cloud2Days, 2015
Scalable Smart Data Management in the Cloud, Cloud2Days, Sofia #1 Nov 2015
Presentation outline
• Why we developed the Self-Service Semantic Suite (S4)
• What is S4
• S4 features
• Cloud architecture
• S4 for developers
About Ontotext
• Provides products & solutions for content enrichment and metadata management
– 70 employees, headquartered in Sofia (Bulgaria)
– Sales presence in London, Washington & Boston
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
Some of our clients
Why we developed the Self-Service Semantic Suite (S4)
Typical challenges for our customers
• How can we unlock more insight from text?
• How can we interlink & search across text and structured data sources?
• How can we improve data & content reuse?
• How can we integrate data sources faster?
• How can we reuse external open data sources?
• How can we discover relations between entities?
Why did we create S4?
• Unlock the value of semantic technologies for SMEs
– Most success stories so far come from bigger companies
• Lower the technology adoption barriers and risks
– Challenge: perceived risks associated with new technology adoption
– Challenge: insufficient resources to implement new technologies
– Challenge: procurement & provisioning processes
S4 benefits
• Utilise semantic technology for smart data applications
– Extract more value hidden in text
– Interlink structured and unstructured data sources
– Semantic search (instead of keyword-based search)
– Reuse open knowledge graphs
• Low adoption cost and risk
• No need for complex planning & procurement
• Pay only for what you use
What is S4
What is S4?
• Self-service capabilities for text analytics, content enrichment and metadata management
– Access to large open knowledge graphs
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
• Available anytime, anywhere
– Simple RESTful services
• Simple, pay-per-use pricing
– No upfront commitments
Knowledge Graphs
Knowledge graphs with S4
• SPARQL query endpoint to the FactForge knowledge graph
– 500 million entities
– 5 billion triples
• Key LOD datasets integrated
– DBpedia, Freebase, GeoNames, WordNet
– Dublin Core, SKOS, PROTON ontologies and vocabularies
Knowledge graph query example
SPARQL query using DBpedia data
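
The query shown on the slide did not survive extraction, so the following is only an illustrative sketch of how a client might prepare a SPARQL request against a knowledge graph endpoint. The endpoint URL, the query itself, and the function name `build_sparql_request` are assumptions made for this example; the request shape follows the W3C SPARQL 1.1 Protocol (query sent as a URL-encoded POST body).

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; the real address comes from the S4 documentation.
SPARQL_ENDPOINT = "https://example.invalid/s4/sparql"

# Illustrative query over DBpedia data (not the query from the slide).
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?city ?name WHERE {
  ?city a dbo:City ;
        dbo:country <http://dbpedia.org/resource/Bulgaria> ;
        rdfs:label ?name .
  FILTER (lang(?name) = "en")
} LIMIT 10
"""


def build_sparql_request(endpoint, query, auth_header):
    """Build (but do not send) a SPARQL 1.1 Protocol POST request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
            # Value of the HTTP Authorization header for the caller's API key.
            "Authorization": auth_header,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the query; the sketch stops short of that so it can be inspected without network access.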
Text Analytics
Text analytics with S4
• Text analytics services
– News annotation
– News categorisation
– Biomedical
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
News analytics example
S4 result
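
The example request and result on these slides were images and are not recoverable. As a rough sketch of what submitting a plain-text document for annotation could look like, the snippet below prepares a JSON POST request. The endpoint URL, the function name, and the payload field names `document` and `documentType` are assumptions for illustration only; the actual request format is described in the S4 documentation.

```python
import json
import urllib.request

# Hypothetical URL; the real service address is listed in the S4 documentation.
NEWS_ENDPOINT = "https://example.invalid/s4/news"


def build_annotation_request(endpoint, text, auth_header):
    """Build (but do not send) a POST request that submits text for analysis."""
    # The field names below are assumed for this sketch, not confirmed by the slides.
    payload = {"document": text, "documentType": "text/plain"}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",  # the services return JSON output
            "Authorization": auth_header,
        },
    )
```

The response body could then be decoded with `json.loads`; its structure is not shown here because the result slide is not available in text form.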
RDF Data Management
RDF DBaaS Overview
• Self-managed RDF database
– Available from AWS Marketplace
– Variety of hardware configurations
– Manage large data volumes
– Pay-per-hour pricing
– Free trial evaluation (one time)
• Fully-managed RDF DBaaS
– Low-cost DBaaS available 24/7
– Ideal for small & moderate data volumes
– Zero administration: automated operations, maintenance & upgrades
– Users pay only for the actual database utilization
– Free data hosting tier
Fully-managed RDF DBaaS (cont’d)
• Instantly deploy new databases when needed
• Accessible as REST services
• Isolation of the multi-tenant databases
• Fair use of shared resources
• A DBaaS on S4 is…
– A GraphDB instance
– Running within a Docker container
– With a private EBS data volume
DBaaS management console
Amazon Cloud Architecture
S4 on AWS
• Why AWS?
– Innovation
– Elasticity
– Rich infrastructure and platform services
– Reliability
• S4 builds upon…
– Compute: EC2, EBS
– Storage: S3, Glacier
– Databases: SimpleDB, DynamoDB
– Infrastructure: ELB, ASG, SQS, SNS, SES, …
– Management: CloudWatch
S4 Architecture
[Architecture diagram. Components shown: applications, Web UI, routing nodes, data nodes, coordinator, storage, notifications, Docker repository, account / quota management, monitoring & logging, metadata store, text analytics, document queue, FactForge semantic warehouse]
S4 Architecture
• Routing nodes
– Forward client requests to the proper data node
– Queue text processing requests
– Access control & quota checks
• Text processing nodes
• Data nodes
– Multiple Docker containers (GraphDB + EBS volume) per node
• Coordinator (single)
– Distribute DB initialisation / creation tasks to data nodes
• Management Console
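
To make the routing node responsibilities more concrete, here is a minimal in-memory sketch of a quota check followed by a lookup of the data node that hosts a given database. It is not the S4 implementation: the class `Router`, its fields, and both exception types are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when the caller has no remaining requests."""


class UnknownDatabase(Exception):
    """Raised when no data node hosts the requested database."""


@dataclass
class Router:
    # Database id -> address of the data node hosting its container.
    placements: dict = field(default_factory=dict)
    # API key -> remaining requests in the current period.
    remaining: dict = field(default_factory=dict)

    def route(self, api_key, database_id):
        """Return the data node that should serve this request."""
        # Quota check happens before any forwarding decision.
        if self.remaining.get(api_key, 0) <= 0:
            raise QuotaExceeded(api_key)
        try:
            node = self.placements[database_id]
        except KeyError:
            raise UnknownDatabase(database_id) from None
        # Only successfully routed requests count against the quota.
        self.remaining[api_key] -= 1
        return node
```

In a real deployment the placement and quota data would live in shared stores rather than in process memory, so that any routing node can serve any request.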
Dealing with Failures
[Architecture diagram. Components shown: applications, Web UI, routing nodes, data nodes, coordinator, storage, notifications, Docker repository, account / quota management, monitoring & logging, metadata store, text analytics, document queue, FactForge semantic warehouse]
For Developers
S4 APIs overview
Supported HTTP methods per S4 service:
• Text analytics
– POST: submit a document for processing
• Knowledge graph access
– GET, POST: SPARQL query
• Self- or fully managed RDF graph databases (OpenRDF REST API)
– GET: list repositories, query data, read data, get the database configuration
– POST: query data, update data
– PUT: create repositories, update data (RDF document), update the database configuration
– DELETE: delete repositories, delete data
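
As a rough illustration of how the database operations map onto HTTP methods and paths, the sketch below builds method/URL pairs following the public OpenRDF Sesame HTTP protocol layout (`/repositories`, `/repositories/{id}`, `/repositories/{id}/statements`). The base URL and function names are hypothetical, and the exact paths exposed by S4 may differ, so treat this as an assumption to check against the S4 documentation.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL of a database service.
BASE = "https://example.invalid/s4/db"


def _repo_url(base, repo):
    # Percent-encode the repository id so it is safe as a path segment.
    return f"{base}/repositories/{quote(repo, safe='')}"


def list_repositories(base):
    return ("GET", f"{base}/repositories")


def query_data(base, repo, sparql):
    return ("GET", f"{_repo_url(base, repo)}?{urlencode({'query': sparql})}")


def read_data(base, repo):
    return ("GET", f"{_repo_url(base, repo)}/statements")


def add_data(base, repo):
    # The RDF document to add would be sent as the request body.
    return ("POST", f"{_repo_url(base, repo)}/statements")


def delete_data(base, repo):
    return ("DELETE", f"{_repo_url(base, repo)}/statements")
```

Keeping URL construction separate from sending makes it easy to plug these pairs into any HTTP client.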
S4 APIs overview (cont’d)
• Security & access control
– TLS 1.2 for all HTTP communication
– HTTP Basic Authentication
– API keys: flexible access control mechanism
• Supported data formats
– Text analytics: various textual input / JSON output
– Knowledge graphs and DBaaS: any W3C recommended RDF format
• Free monthly quotas
– Text analytics: 250 MB of data processed
– Knowledge graph access: 5000 requests
– Fully-managed RDF database: 1 million hosted triples
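
The security settings above can be sketched on the client side with the Python standard library alone. The example assumes that the two halves of the API key pair are used as the username and password of HTTP Basic Authentication; the function names `basic_auth_header` and `tls12_context` are hypothetical helpers for this illustration.

```python
import base64
import ssl


def basic_auth_header(api_key, key_secret):
    """Return the Authorization header value for HTTP Basic Authentication."""
    # Assumption: the key pair plays the role of username and password.
    raw = f"{api_key}:{key_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def tls12_context():
    """Return an SSL context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()  # verifies certificates by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The header value goes into the `Authorization` header of each request, and the context can be passed as the `context` argument of `urllib.request.urlopen`.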
Getting started in minutes
1. Register a personal account at s4.ontotext.com
2. Generate an API key pair
3. Check out the docs, demos & code at docs.s4.ontotext.com
4. Contact us with questions!
Supporting materials
• Java, Python & C# SDKs
• Sample code
– Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
– cURL examples for the most impatient
• GATE/UIMA plugins
• Firefox/Chrome plugins
• Online documentation
Thank you!
s4.ontotext.com
Nov 2nd, 2015