
Page 1: Scalable Smart Data Management in the Cloud

Scalable Smart Data Management in the Cloud

Alex Simov & Yavor Petkov

Cloud2Days, 2015

Scalable Smart Data Management in the Cloud, Cloud2Days, Sofia #1 Nov 2015

Page 2: Scalable Smart Data Management in the Cloud

Presentation outline

• Why we developed the Self-Service Semantic Suite (S4)

• What is S4

• S4 features

• Cloud architecture

• S4 for developers


Page 3: Scalable Smart Data Management in the Cloud

About Ontotext

• Provides products & solutions for content enrichment and metadata management

– 70 employees, headquartered in Sofia (Bulgaria)

– Sales presence in London, Washington & Boston

• Major clients and industries

– Media & Publishing

– Health Care & Life Sciences

– Cultural Heritage & Digital Libraries

– Government

– Education


Page 4: Scalable Smart Data Management in the Cloud

Some of our clients


Page 5: Scalable Smart Data Management in the Cloud

Why we developed the Self-Service Semantic Suite (S4)


Page 6: Scalable Smart Data Management in the Cloud

Typical challenges for our customers

• How can we unlock more insight from text?

• How can we interlink & search across text and structured data sources?

• How can we improve data & content reuse?

• How can we integrate data sources faster?

• How can we reuse external open data sources?

• How can we discover relations between entities?


Page 7: Scalable Smart Data Management in the Cloud

Why did we create S4?

• Unlock the value of semantic technologies for SMEs

– Most success stories so far come from bigger companies

• Lower the technology adoption barriers and risks

– Challenge: perceived risks associated with new technology adoption

– Challenge: insufficient resources to implement new technologies

– Challenge: procurement & provisioning processes


Page 8: Scalable Smart Data Management in the Cloud

S4 benefits

• Utilise semantic technology for smart data applications

– Extract more value hidden in text

– Interlink structured and unstructured data sources

– Semantic search (instead of keyword-based search)

– Reuse open knowledge graphs

• Low adoption cost and risk

• No need for complex planning & procurement

• Pay only for what you use


Page 9: Scalable Smart Data Management in the Cloud

What is S4


Page 10: Scalable Smart Data Management in the Cloud

What is S4?

• Self-service capabilities for text analytics, content enrichment and metadata management

– Access to large open knowledge graphs

– Text analytics for news, life sciences and social media

– RDF graph database as-a-service

• Available anytime, anywhere

– Simple RESTful services

• Simple, pay-per-use pricing

– No upfront commitments


Page 11: Scalable Smart Data Management in the Cloud

What is S4?


Page 12: Scalable Smart Data Management in the Cloud

Knowledge Graphs


Page 13: Scalable Smart Data Management in the Cloud

Knowledge graphs with S4

• SPARQL query endpoint to FactForge knowledge graph

– 500 million entities

– 5 billion triples

• Key LOD datasets integrated

– DBpedia, Freebase, GeoNames, WordNet

– Dublin Core, SKOS, PROTON ontologies and vocabularies


Page 14: Scalable Smart Data Management in the Cloud

Knowledge graph query example

[Screenshot: SPARQL query using DBpedia data]
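The query shown on this slide is not recoverable from the transcript. As a stand-in, here is a minimal sketch of how a SPARQL query over DBpedia data could be packaged for a SPARQL endpoint such as the FactForge one. The endpoint URL is a placeholder (the real S4 address is not given in the slides), and both the query and the `build_sparql_request` helper are illustrative assumptions. The request shape follows the standard SPARQL 1.1 Protocol (GET with a `query` parameter).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: the real S4 / FactForge URL is not given in the slides.
ENDPOINT = "https://example.org/sparql"

# Illustrative query using DBpedia ontology terms:
# the ten most populous cities in Bulgaria.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Bulgaria ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return Request(url, headers=headers, method="GET")


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:50] + "...")
```

Passing `request` to `urllib.request.urlopen` would send it; against a real endpoint the call would also need the account credentials.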

Page 15: Scalable Smart Data Management in the Cloud

Text Analytics


Page 16: Scalable Smart Data Management in the Cloud

Text analytics with S4

• Text analytics services

– News annotation

– News categorisation

– Biomedical

– Twitter

• Entity linking & disambiguation

– Mappings to DBpedia & GeoNames instances

– Mappings to biomedical data sources (LinkedLifeData)

• HTML, MS Word, XML, plain text input

• Simple JSON output


Page 17: Scalable Smart Data Management in the Cloud

News analytics example

[Screenshot: S4 result]
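The slides describe the text analytics services as accepting a document and returning JSON. A rough sketch of what submitting a plain-text news snippet might look like follows. The service URL and the JSON field names (`document`, `documentType`) are assumptions made for illustration and should be checked against the S4 documentation.

```python
import json
from urllib.request import Request

# Placeholder URL: the real service address is not given in the slides.
NEWS_SERVICE_URL = "https://example.org/text-analytics/news"


def build_annotation_request(text: str, mime_type: str = "text/plain") -> Request:
    """Build (but do not send) a POST request that submits one document."""
    # Assumed field names, not confirmed by the slides.
    payload = {"document": text, "documentType": mime_type}
    return Request(
        NEWS_SERVICE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_annotation_request("Ontotext is headquartered in Sofia, Bulgaria.")
body = json.loads(request.data.decode("utf-8"))
print(request.get_method(), body["documentType"])
```

The response would be parsed with `json.loads` in the same way; its exact structure is not shown in the transcript, so it is left out here.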

Page 18: Scalable Smart Data Management in the Cloud

RDF Data Management


Page 19: Scalable Smart Data Management in the Cloud

RDF DBaaS Overview

• Self-managed RDF database

– Available from AWS Marketplace

– Variety of hardware configurations

– Manage large data volumes

– Pay-per-hour pricing

– Free trial evaluation (one time)

• Fully-managed RDF DBaaS

– Low-cost DBaaS available 24/7

– Ideal for small & moderate data volumes

– Zero administration: automated operations, maintenance & upgrades

– Users pay only for the actual database utilization

– Free data hosting tier


Page 20: Scalable Smart Data Management in the Cloud

Fully-managed RDF DBaaS (cont’d)

• Instantly deploy new databases when needed

• Accessible as REST services

• Isolation of the multi-tenant databases

• Fair use of shared resources

• A DBaaS on S4 is…

– A GraphDB instance

– Running within a Docker container

– With a private EBS data volume

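To make the "GraphDB instance in a Docker container with a private EBS volume" idea concrete, here is a minimal sketch that assembles a `docker run` command for one tenant database. It is not S4's actual provisioning code: the image name, container port, mount paths and resource limits are all hypothetical. Only the Docker flags themselves (`-d`, `--name`, `--memory`, `--cpu-shares`, `-v`, `-p`) are standard.

```python
import shlex
from typing import List


def docker_run_command(
    tenant_db: str,
    ebs_mount: str,
    host_port: int,
    image: str = "example/graphdb:latest",  # hypothetical image name
    memory: str = "2g",                     # hypothetical per-tenant memory cap
    cpu_shares: int = 512,                  # hypothetical relative CPU weight
) -> List[str]:
    """Return the argument list for starting one isolated database container."""
    return [
        "docker", "run", "-d",
        "--name", f"graphdb-{tenant_db}",
        "--memory", memory,                 # cap memory: fair use of the shared host
        "--cpu-shares", str(cpu_shares),    # weight CPU time between tenants
        "-v", f"{ebs_mount}:/data",         # private EBS volume mounted as the data dir
        "-p", f"{host_port}:8080",          # container port 8080 is an assumption
        image,
    ]


cmd = docker_run_command("tenant42", "/mnt/ebs/tenant42", 9001)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

One container per database with its own volume gives the isolation the slide mentions, while the resource caps are one plausible way to enforce fair use of a shared host.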

Page 21: Scalable Smart Data Management in the Cloud

DBaaS management console


Page 22: Scalable Smart Data Management in the Cloud

Amazon Cloud Architecture


Page 23: Scalable Smart Data Management in the Cloud

S4 on AWS

• Why AWS?

– Innovation

– Elasticity

– Rich infrastructure and platform services

– Reliability

• S4 builds upon…

– Compute: EC2, EBS

– Storage: S3, Glacier

– Databases: SimpleDB, DynamoDB

– Infrastructure: ELB, ASG, SQS, SNS, SES, …

– Management: CloudWatch


Page 24: Scalable Smart Data Management in the Cloud

S4 Architecture

[Architecture diagram. Components: applications, Web UI, routing nodes, data nodes, coordinator, storage, notifications, Docker repository, account / quota management, monitoring & logging, metadata store, text analytics, document queue, FactForge semantic warehouse]

Page 25: Scalable Smart Data Management in the Cloud

S4 Architecture

• Routing nodes

– Forward client requests to the proper data node

– Queue text processing requests

– Access control & quota checks

• Text processing nodes

• Data nodes

– Multiple Docker containers (GraphDB + EBS) per node

• Coordinator (single)

– Distribute DB initialisation / creation tasks to data nodes

• Management Console

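The division of labour between the coordinator and the routing nodes can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class names, the least-loaded placement rule and the per-account request counter are invented for illustration, and the slides do not say how S4 actually places databases or stores quotas.

```python
from dataclasses import dataclass, field
from typing import Dict


class QuotaExceeded(Exception):
    """Raised when an account has no requests left in its quota."""


@dataclass
class Coordinator:
    """Single coordinator: hands each new database to the least-loaded data node."""
    load: Dict[str, int] = field(default_factory=dict)       # data node -> container count
    placement: Dict[str, str] = field(default_factory=dict)  # database -> data node

    def create_database(self, db_name: str) -> str:
        node = min(self.load, key=self.load.get)  # assumed placement policy
        self.load[node] += 1
        self.placement[db_name] = node
        return node


@dataclass
class RoutingNode:
    """Checks the quota, then forwards the request to the node hosting the database."""
    coordinator: Coordinator
    quota: Dict[str, int]  # account -> remaining requests

    def route(self, account: str, db_name: str) -> str:
        if self.quota.get(account, 0) <= 0:
            raise QuotaExceeded(account)
        self.quota[account] -= 1
        return self.coordinator.placement[db_name]


coordinator = Coordinator(load={"data-node-1": 2, "data-node-2": 0})
coordinator.create_database("alice-db")
router = RoutingNode(coordinator, quota={"alice": 1})
print(router.route("alice", "alice-db"))  # prints: data-node-2
```

A second call to `router.route("alice", "alice-db")` raises `QuotaExceeded`, since the single request in the quota has been spent.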

Page 26: Scalable Smart Data Management in the Cloud

Dealing with Failures

[Architecture diagram, same components as on the S4 Architecture slide: applications, Web UI, routing nodes, data nodes, coordinator, storage, notifications, Docker repository, account / quota management, monitoring & logging, metadata store, text analytics, document queue, FactForge semantic warehouse]

Page 27: Scalable Smart Data Management in the Cloud

For Developers


Page 28: Scalable Smart Data Management in the Cloud

S4 APIs overview

• Text analytics

– POST: submit a document for processing

– GET, PUT, DELETE: not supported

• Knowledge graph access

– GET: SPARQL query

– POST: SPARQL query

– PUT, DELETE: not supported

• Self- or fully-managed RDF graph databases (OpenRDF REST API)

– GET: list repositories, query data, read data, get the database configuration

– POST: query data, update data

– PUT: create repositories, update data (RDF document), update the database configuration

– DELETE: delete repositories, delete data

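For the RDF database services, the method-to-operation mapping above follows the OpenRDF (Sesame) REST protocol, where repositories live under `/repositories` and their data under `/repositories/{id}/statements`. The sketch below builds the corresponding URLs. The base URL is a placeholder and the helper names are illustrative, not part of any SDK.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL: the real database address is not given in the slides.
BASE_URL = "https://example.org/openrdf"


def repositories_url() -> str:
    """GET on this URL lists the repositories."""
    return f"{BASE_URL}/repositories"


def repository_url(repo_id: str) -> str:
    """GET or POST on this URL (with a `query` parameter) evaluates a SPARQL query."""
    return f"{repositories_url()}/{quote(repo_id, safe='')}"


def statements_url(repo_id: str) -> str:
    """GET reads data, POST adds data, PUT replaces data, DELETE removes data."""
    return f"{repository_url(repo_id)}/statements"


def query_url(repo_id: str, sparql: str) -> str:
    """URL for a SPARQL query sent as a GET request."""
    return repository_url(repo_id) + "?" + urlencode({"query": sparql})


for method, url in [
    ("GET", repositories_url()),
    ("GET", query_url("my-repo", "ASK { ?s ?p ?o }")),
    ("POST", statements_url("my-repo")),
    ("DELETE", statements_url("my-repo")),
]:
    print(method, url)
```

Keeping URL construction separate from sending makes the mapping easy to test offline; any HTTP client can then issue the requests with the account's credentials.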

Page 29: Scalable Smart Data Management in the Cloud

S4 APIs overview (cont’d)

• Security & access control

– TLS 1.2 for all HTTP communication

– HTTP Basic Authentication

– API keys: flexible access control mechanism

• Supported data formats

– Text analytics: various textual input / JSON output

– Knowledge graphs and DBaaS: any W3C recommended RDF format

• Free monthly quotas

– Text analytics: 250 MB of data processed

– Knowledge graph access: 5,000 requests

– Fully-managed RDF database: 1 million hosted triples

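As an illustration of the security points, the following sketch prepares a request that carries an API key pair via HTTP Basic Authentication and builds a TLS context that refuses anything older than TLS 1.2. The URL is a placeholder, and treating the key and its secret as the Basic username and password is an assumption based on the slide. The header encoding itself is standard (RFC 7617).

```python
import base64
import ssl
from urllib.request import Request


def basic_auth_header(api_key: str, key_secret: str) -> str:
    """Encode a key/secret pair as an HTTP Basic `Authorization` header value."""
    raw = f"{api_key}:{key_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def tls12_context() -> ssl.SSLContext:
    """Default certificate-verifying context that rejects anything below TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


# Placeholder URL and credentials, for illustration only.
request = Request(
    "https://example.org/s4/some-service",
    headers={"Authorization": basic_auth_header("my-api-key", "my-key-secret")},
)
print(request.get_header("Authorization").split()[0])  # prints: Basic
```

Sending would then be `urllib.request.urlopen(request, context=tls12_context())`.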

Page 30: Scalable Smart Data Management in the Cloud

Getting started in minutes

1. Register a personal account at s4.ontotext.com

2. Generate an API key pair

3. Check out the docs, demos & code at docs.s4.ontotext.com

4. Contact us with questions!

Page 31: Scalable Smart Data Management in the Cloud

Supporting materials

• Java, Python & C# SDKs

• Sample code

– Java, C#, NodeJS, JavaScript, Python, PHP, Groovy

– cURL examples for the most impatient

• GATE/UIMA plugins

• Firefox/Chrome plugins

• Online documentation


Page 32: Scalable Smart Data Management in the Cloud

Thank you!

s4.ontotext.com

Nov 2nd, 2015
