The New Database Frontier: Harnessing the Cloud

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Upload: inside-analysis

Post on 01-Jul-2015


Category:

Technology



DESCRIPTION

The Briefing Room with Rick Sherman and MarkLogic

Live Webcast on May 13, 2014

Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=9cd8eec52f7968721fdcd922e4f70369

The number of data types and sources is increasing almost daily, which poses serious challenges for analytics and discovery. With many of these data sets in the Cloud, analysts are realizing that merging such public resources with internal information assets can be quite problematic. Solutions like virtualization and federation can get the job done, but another option is to employ a database that can natively connect to all these external sources.

Register for this episode of The Briefing Room to hear veteran Analyst Rick Sherman explain how the changing needs of the user are driving database innovation. He’ll be briefed by Ken Krupa of MarkLogic, who will tout his company’s NoSQL document database. Krupa will discuss the importance of expanding the definition of what it means to be a database, and he’ll show how MarkLogic’s ability to tap into more sources than ever creates a scale-out data nerve center, thus delivering faster and better insights.

Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: The New Database Frontier: Harnessing the Cloud

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: The New Database Frontier: Harnessing the Cloud

The Briefing Room

The New Database Frontier: Harnessing the Cloud

Page 3: The New Database Frontier: Harnessing the Cloud

Twitter Tag: #briefr

The Briefing Room

Welcome

Host: Eric Kavanagh

[email protected] @eric_kavanagh

Page 4: The New Database Frontier: Harnessing the Cloud

Mission

•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today’s innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!

Page 5: The New Database Frontier: Harnessing the Cloud


Topics

This Month: DATABASE

June: ANALYTICS & MACHINE LEARNING

July: INNOVATIVE TECHNOLOGY

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

Page 6: The New Database Frontier: Harnessing the Cloud

“We are stuck with technology when what we really want is just stuff that works.” -Douglas Adams

Page 7: The New Database Frontier: Harnessing the Cloud


Analyst: Rick Sherman

Rick Sherman is CEO of Athena IT Solutions

Page 8: The New Database Frontier: Harnessing the Cloud

MarkLogic

•  MarkLogic offers a distributed, scale-out, enterprise NoSQL database
•  The platform comprises a database, search engine and application services
•  MarkLogic can run directly on the Hadoop file system, and it features full text search, location services and geospatial alerting

Page 9: The New Database Frontier: Harnessing the Cloud


Guest: Ken Krupa

Ken Krupa is Chief Field Architect at MarkLogic. With 24 years of professional IT experience, Mr. Krupa has a unique breadth and depth of expertise within nearly all aspects of IT architecture. Prior to joining MarkLogic, Ken consulted at some of the largest North American Financial institutions during difficult economic times, advising senior and C-level executives. Prior to that, he consulted with Sun Microsystems as a direct partner and also served as Chief Architect of GFI Group, a Wall St. inter-dealer brokerage. Although his work primarily involves high-level technology strategy, Mr. Krupa remains an active hands-on engineer. In 2005, Ken was awarded patent #6,915,304 – “System and method for converting an XML data structure into a relational database.” Today Ken continues to pursue both individual and community-based engineering activities.

Page 10: The New Database Frontier: Harnessing the Cloud

© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Expect More From Your Database

Ken Krupa, Chief Field Architect, MarkLogic Corporation

Page 11: The New Database Frontier: Harnessing the Cloud

Overview

§  Evolution of the Database
§  The Enterprise Data Warehouse (EDW)
§  Big Data
§  NoSQL
§  Enterprise NoSQL
§  The Logical Data Warehouse (LDW)
§  Unified Database
§  Parting Thoughts

Page 12: The New Database Frontier: Harnessing the Cloud

We Are the New Generation Database

Hierarchical Era: “For your application data!”
•  Application- and hardware-specific

Relational Era: “For all your structured data!”
•  Normalized, tabular model
•  Application-independent query
•  User control

Any Structure Era: “For all your data!”
•  Schema-agnostic
•  Massive scale
•  Query and search
•  Analytics
•  Application services
•  Faster time-to-results

Page 13: The New Database Frontier: Harnessing the Cloud

RDBMS: One Tool, Many Contortions

§  OLTP
    o  3rd normal form, updates, simple query
§  Reporting DB
    o  Because the OLTP app slowed down during heavy query use
§  Enterprise Data Warehouse
    o  Star schema - unified view of the enterprise
§  Data Marts
    o  Because the EDW didn’t have everything – Also star schema
§  Federated
    o  Because it took too long to agree on a standard model
§  Hybrid
    o  Because Federated is too slow

Page 14: The New Database Frontier: Harnessing the Cloud

It’s Complicated

[Diagram label: OLTP]

Page 15: The New Database Frontier: Harnessing the Cloud

It’s Complicated

[Diagram labels: OLTP, ETL (repeated), Warehouse, Data Marts, Archives, Reference Data]

Page 16: The New Database Frontier: Harnessing the Cloud

It’s Complicated

[Diagram labels: OLTP, ETL (repeated), Warehouse, Data Marts, Archives, Reference Data, Search, Metadata]

[“Unstructured” sources shown: Video, Audio; Signals, Logs, Streams; Social; Documents, Messages]

Page 17: The New Database Frontier: Harnessing the Cloud

Look closely…

[Diagram label: ETL]

Page 18: The New Database Frontier: Harnessing the Cloud

Look closely… at the hidden “M”

[Diagram labels: ETL, M]

Page 19: The New Database Frontier: Harnessing the Cloud

At a crossroads…

§  Pre-requisite modeling has become an unsustainable friction point
    o  Can/should we wait until the data is perfectly modeled to do discovery?
    o  A real dollar cost before value is realized
        -  “Cost per column”
§  Moving data around is also becoming a friction point
    o  There’s too much of it to do it for all cases
    o  Also a real dollar cost before value is realized
§  Traditional data warehousing largely leaves out “the 80%”
    o  Most of the world’s data is unstructured
    o  Dimensional warehouses as we know them are not up to the task

Page 20: The New Database Frontier: Harnessing the Cloud

Gartner’s Take

§  Organizations that failed to deploy strategies to address data complexity and volume issues for their analytics by 2012 will experience more than doubling costs of ownership for their data warehouse and mart environments in disorganized attempts to meet this new demand.

§  By 2014, 85% of organizations will fail to deploy new strategies to address data complexity and volume in their analytics.

§  By 2014, organizations that have deployed analytics to support new complex data types and large volumes of data in analytics will outperform their market peers by more than 20% in revenue, margins, penetration and retention.

Source: Gartner Does the 21st Century, Beyer and Feinberg.

Page 21: The New Database Frontier: Harnessing the Cloud


What’s being done?

Page 22: The New Database Frontier: Harnessing the Cloud

What about Hadoop?

[Diagram labels: Staging, Analytics, Persistence, Aggregates, Models]

Page 23: The New Database Frontier: Harnessing the Cloud

What about Hadoop?

[Diagram labels: Staging, Analytics, Persistence, Aggregates, Models, and a “?” beside Updates and Queries]

Page 24: The New Database Frontier: Harnessing the Cloud


Hadoop + RDBMS

Distill into RDBMS

…or spill-over into Hadoop

Page 25: The New Database Frontier: Harnessing the Cloud


Hadoop + RDBMS

ETL Distill into RDBMS

…or spill-over into Hadoop

Page 26: The New Database Frontier: Harnessing the Cloud

Hadoop – What You Get

Advantages
§  HDFS provides scale and economies of scale
§  File-based nature allows for greater Variety
§  Raw data is fine and any shape will do
§  Schema-on-read possible
§  Map-reduce enables massive parallel scaling

Limitations
§  Hadoop was designed for batch processing
§  Does not support real-time applications on its own
§  Requires expertise to configure, deploy and manage
§  Has security limitations
§  Is not a database
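
Editor's illustration (not part of the original slides): the "schema-on-read" advantage listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions; the sample records, the `read_with_schema` helper and the `spend_schema` mapping are all hypothetical, and the code uses plain in-memory strings rather than HDFS or any Hadoop API.

```python
import json

# Raw records land as-is; nothing is validated or reshaped at write time.
# (Hypothetical sample data for illustration only.)
RAW_LINES = [
    '{"user": "ann", "amount": "12.50", "ts": "2014-05-13"}',
    '{"user": "bob", "amount": 7, "note": "no timestamp"}',
    'not json at all',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields, coerce types, skip bad rows."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw data may be any shape; unreadable rows are skipped
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row

# The schema belongs to the reader, not to the stored data.
spend_schema = {"user": str, "amount": float}
rows = list(read_with_schema(RAW_LINES, spend_schema))
total = sum(r["amount"] for r in rows)
```

A different reader could apply a different schema to the same raw lines, which is the point of deferring the model until read time.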

Page 27: The New Database Frontier: Harnessing the Cloud

Enter NoSQL

§  Agility
§  Flexible data models (or none at all)
§  Many different types
    o  Simple Key/Value
    o  Columnar
    o  Document
    o  Graph
    o  Etc.
§  Enterprise features?
§  More confusion…?

Page 28: The New Database Frontier: Harnessing the Cloud

Tough Choice…

Legacy RDBMS
§  Indexes
§  Transactions
§  Security
§  Enterprise operations

“NoSQL”
§  Flexible data model
§  Commodity scale out
§  Distributed, fault-tolerant
§  Hadoop integration

[Diagram labels: Cashflows, PartyID, Net, Date, Reference, Payer, TradeID, Amount, Receiver]

Page 29: The New Database Frontier: Harnessing the Cloud

Enterprise NoSQL – Best of Both Worlds…

§  Flexible, schema-agnostic, document-oriented data model
§  Comprehensive indexes
    o  Documents: Hierarchy, text, values, tags (schema “on-demand”)
    o  Scalars: Aggregates and range filters, including geospatial
    o  Triples: Linked facts and inferencing
    o  Permissions: Users, roles, compartments, and privileges
    o  Queries: Reverse indexes for alerting, matching
§  Ad-hoc dimensions
§  Real-time transformation and/or schema on read
§  Lock-free reads
§  Strict consistency throughout
§  Oh yeah.. SQL too
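
Editor's illustration (not part of the original slides): the "reverse indexes for alerting, matching" bullet describes indexing stored queries so that an incoming document can find the queries it satisfies. The Python below is a toy sketch of that general idea only. The `ReverseIndex` class, the query names and the whitespace tokenizer are assumptions made for illustration, and nothing here represents MarkLogic's actual implementation or API.

```python
from collections import defaultdict

class ReverseIndex:
    """Index stored queries by term so a new document finds matching queries."""

    def __init__(self):
        self._queries = {}                # query id -> set of required terms
        self._by_term = defaultdict(set)  # term -> ids of queries using it

    def register(self, query_id, terms):
        required = {t.lower() for t in terms}
        self._queries[query_id] = required
        for term in required:
            self._by_term[term].add(query_id)

    def match(self, text):
        """Return ids of stored queries whose terms all appear in the text."""
        words = set(text.lower().split())
        candidates = set()
        for word in words:
            candidates |= self._by_term.get(word, set())
        return sorted(q for q in candidates if self._queries[q] <= words)

# Hypothetical stored alerts.
alerts = ReverseIndex()
alerts.register("lithium-alert", ["lithium", "toxicity"])
alerts.register("roche-watch", ["roche"])

hits = alerts.match("risk of lithium toxicity is very high")
misses = alerts.match("lithium prices rise")
```

The lookup cost depends on the words in the incoming document rather than on the number of stored queries, which is why this shape suits alerting.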

Page 30: The New Database Frontier: Harnessing the Cloud


Universal Index

A Unified Platform

Page 31: The New Database Frontier: Harnessing the Cloud

A Unified Platform

Universal Index

§  Content: Words, phrases, entities, positions, etc.
    o  “... Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) ...”
    o  “... ACE inhibitors, since the risk of lithium toxicity is very high in such patients...”
§  Structure: Label, Author, Ing, Comp, ID, Para, Org
§  Values: name:sorbitol, date:2012-06-04, company:Roche
§  Security: Role:researcher-worldwide
§  Geospatial: Lat: 46.946584, Long: 93.076172
§  Relationships: Trenton isCityOf NewJersey; James livesIn Trenton
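
Editor's illustration (not part of the original slides): one way to picture a single index covering content, structure and values is a set of posting lists keyed by kind. The Python below is a toy sketch under that assumption. The `index_document` and `lookup` helpers and the two sample documents are hypothetical (the field values borrow the slide's examples), and the sketch says nothing about how MarkLogic builds its index.

```python
from collections import defaultdict

def index_document(index, doc_id, doc, path=""):
    """Walk a nested document, indexing structure, values, and words."""
    for key, value in doc.items():
        here = f"{path}/{key}"
        index[("structure", here)].add(doc_id)
        if isinstance(value, dict):
            index_document(index, doc_id, value, here)
        else:
            index[("value", here, str(value))].add(doc_id)
            for word in str(value).lower().split():
                index[("word", word)].add(doc_id)

index = defaultdict(set)
index_document(index, "d1", {"name": "sorbitol", "company": "Roche",
                             "para": "risk of lithium toxicity"})
index_document(index, "d2", {"name": "aspirin",
                             "meta": {"company": "Roche"}})

def lookup(*keys):
    """Intersect posting sets: every key must match the same document."""
    sets = [index.get(k, set()) for k in keys]
    return sorted(set.intersection(*sets)) if sets else []

by_word = lookup(("word", "roche"))
by_value = lookup(("value", "/company", "Roche"))
combined = lookup(("word", "roche"), ("structure", "/meta"))
```

Because words, paths and values share one index, a text search and a structural constraint can be answered by a single intersection instead of a join across separate systems.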

Page 32: The New Database Frontier: Harnessing the Cloud


Page 33: The New Database Frontier: Harnessing the Cloud

DB + Search

§  DB separate from search is another unnecessary friction point
§  Best if integrated at the DB layer:
    o  Quicker time to information
    o  Greater than the sum of its parts
    o  No separate indexes to maintain
§  Search as a text processing engine
    o  Query capability for unstructured text
    o  Turn text into numbers, create new dimensions
    o  Infer new information from text search and enrich

Page 34: The New Database Frontier: Harnessing the Cloud


The advanced interface

Page 35: The New Database Frontier: Harnessing the Cloud

Semantics and RDF

Data stored in Triples

§  Expressed as Subject : Predicate : Object
    o  John Smith : LivesIn : Brooklyn
    o  Brooklyn : PartOf : New York
§  Can make inferences – e.g. John Smith LivesIn New York
§  Can create relationships on-the-fly
    o  “We’ve identified a special relationship between a drug and an interaction…”
§  Machine-comprehensible “knowledge”
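
Editor's illustration (not part of the original slides): the inference on this slide can be reproduced with a single hand-written rule applied until no new facts appear. This is a minimal Python sketch, assuming the rule "X LivesIn A and A PartOf B implies X LivesIn B". The `infer_lives_in` function is hypothetical and stands in for what a real triple store would do with a rule set or a SPARQL query.

```python
# The two facts from the slide, as (subject, predicate, object) tuples.
triples = {
    ("John Smith", "LivesIn", "Brooklyn"),
    ("Brooklyn", "PartOf", "New York"),
}

def infer_lives_in(facts):
    """Rule: X LivesIn A and A PartOf B  =>  X LivesIn B (to a fixed point)."""
    known = set(facts)
    while True:
        derived = {
            (s, "LivesIn", o2)
            for (s, p, o) in known if p == "LivesIn"
            for (s2, p2, o2) in known if p2 == "PartOf" and s2 == o
        }
        if derived <= known:
            return known  # nothing new was derived; stop
        known |= derived

closure = infer_lives_in(triples)
```

Adding a new fact, for example a further PartOf link, extends the derived facts without changing any schema, which is the "relationships on-the-fly" point the slide makes.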

Page 36: The New Database Frontier: Harnessing the Cloud

The World of Triples

[Diagram labels: Semantic World, Document World, MarkLogic]

§  Linked Open Data (Free semantic facts available to anyone)
§  Proprietary Semantic Facts (Facts and Taxonomies in your organization)
§  Facts from Free-Flowing Text (Derived from semantic enrichment)
§  Facts in Documents (Part of metadata or added with authoring tools)

Page 37: The New Database Frontier: Harnessing the Cloud

Data Everywhere

[Diagram labels: Application; Data stored in MarkLogic]

Page 38: The New Database Frontier: Harnessing the Cloud

Data Everywhere

[Diagram labels: Application; Data stored in MarkLogic; On Tiered Storage; Widely Accessible]

Page 39: The New Database Frontier: Harnessing the Cloud

Data Everywhere

[Diagram labels: Application; Data stored in MarkLogic; On Tiered Storage; Widely Accessible; Calling out to endpoints: RDBMS, SPARQL, REST]

Page 40: The New Database Frontier: Harnessing the Cloud

Data Everywhere

[Diagram labels: Application; Data stored in MarkLogic; On Tiered Storage; Widely Accessible; Calling out to endpoints: RDBMS, SPARQL, REST; Searchable and Queryable]

Logical Data Warehouse Reimagined

Page 41: The New Database Frontier: Harnessing the Cloud

More from Gartner…

§  64% of surveyed organizations either have invested in big data already (30%) or have plans to invest within 24 months

§  Through 2017, 90% of the information assets from big data analytic efforts will be siloed and unleveragable across multiple business processes

§  By 2016, excessive focus of truth over trust in big data will prompt leadership change in 75% of projects

§  Through 2017, premiums for big data-related technology and project skills will remain 20% to 30% above norms for traditional information management skills

Source: Gartner – Predicts 2014: Big Data, Heudecker, Beyer, et al

§  Companies will spend more on application integration than on new application systems

§  By 2018, more than 50% of the cost of implementing 90% of new large systems will be spent on integration

Source: Gartner – Predicts 2013: Application Integration, Lheureux, Pezzini, et al

Page 42: The New Database Frontier: Harnessing the Cloud


Integration…?

Hadoop Ecosystem

Page 43: The New Database Frontier: Harnessing the Cloud

The New Data Life-cycle

[Diagram labels: Textual, Structured, Multi-media, Geospatial, Social, Semantic; Discovery/Model Loop]

Page 44: The New Database Frontier: Harnessing the Cloud

Some parting thoughts

§  ETL is a friction point that should be minimized
§  Integration is another friction point
§  Modeling should no longer be pre-requisite to discovery
    o  Evolve your model alongside discovery
§  Expect more from your database
    o  Schema agility of Hadoop but in a DBMS
    o  Enterprise capabilities of traditional DBMS
        -  ACID, Security, HA/DR, etc.
    o  Support for indexing and analyzing heterogeneous information assets (text, data, geospatial, semantics, etc.)
    o  Support for heterogeneous locality of data for strategy execution
        -  Operational data + LDW with fewer moving parts
        -  Tiered Storage

Page 45: The New Database Frontier: Harnessing the Cloud

Thank You

Ken Krupa
[email protected]
@kenkrupa

Check out these pages:
www.marklogic.com
developer.marklogic.com

Now reimagine the possibilities! Join us in a city near you!
May 15 Amsterdam
May 20 London
June 3 New York City
June 5 Chicago
June 19 Baltimore
June 24 Washington DC

Page 46: The New Database Frontier: Harnessing the Cloud


Perceptions & Questions

Analyst: Rick Sherman

Page 47: The New Database Frontier: Harnessing the Cloud

Rick Sherman
Athena IT Solutions
rsherman@athena-solutions.com

Copyright © 2014 Athena IT Solutions

The Briefing Room:
The New Database Frontier

Page 48: The New Database Frontier: Harnessing the Cloud

The New Database Frontier: Our History

•  Relational emerged in 1980s & went mainstream in 1990s
    o  Transactional
    o  Data warehousing, Business Intelligence & Analytics
•  Relational keeps adding features for BI & DW
    o  More infrastructure for I/O & memory
    o  More complexity
    o  More skills
    o  Can “one size fit all”?
•  OLAP (On-line Analytical Processing) emerges
    o  Successful but…not pervasive
    o  Proprietary: Database, ETL & BI
    o  Specialized skills
    o  Scalability & extensibility issues

Page 49: The New Database Frontier: Harnessing the Cloud

The New Database Frontier: Data & Analytical Needs Have Expanded

•  Data Volume, Velocity & Variety Exploding
•  Data Integration has Succeeded in Many Ways
    o  5 C’s: comprehensive, consistent, clean & current
    o  Achieved at a cost
•  BI & Analytics has Evolved & Expanded

Page 50: The New Database Frontier: Harnessing the Cloud

The New Database Frontier: Do We Need to Change?

•  Volume & Velocity for Structured Data
    o  Typically can be handled by traditional DW platform with enhancements & emerging technologies
        -  In-memory, Columnar, MPP, other
        -  Infrastructure, Appliances, Cloud
        -  Better architecture & design
    o  But are there other approaches that can do it better?
•  Variety is the key difference & requires a different approach
    o  Unstructured: text, audio, video, click streams, log files, social media
    o  Semi-structured: XML, RSS feeds, machine data
    o  None of the big 3 (ETL, databases & BI) were built for it

Page 51: The New Database Frontier: Harnessing the Cloud

The New Database Frontier: NoSQL Data Stores

•  NoSQL stores differ from relational databases
    o  Structure
    o  Purpose
    o  SQL not used as primary query language
•  Structures:
    o  Wide Column Store / Column Families
    o  Document Store
    o  Key Value / Tuple Store
    o  Graph Databases
•  Characteristics
    o  Scalable, flexible, commodity hardware
    o  Supports 3 V’s
    o  Fixed table schemas not needed
    o  May not guarantee ACID (atomicity, consistency, isolation, durability)
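
Editor's illustration (not part of the original slides): the four structures listed above differ mainly in how much of a record the store can see. The Python below is a rough sketch that holds one made-up trade record in each shape using only built-in containers. The identifiers, field names and the `amounts` helper are hypothetical, and no real NoSQL product or client library is involved.

```python
import json

# One hypothetical trade (identifiers and values are made up), in four shapes.

# Key/value: an opaque value behind a single key; the store cannot see inside.
key_value = {"trade:42": json.dumps({"payer": "ACME", "amount": 100})}

# Document: a self-describing, nested record.
document = {"tradeId": 42, "payer": "ACME",
            "cashflows": [{"date": "2014-05-13", "amount": 100}]}

# Wide column: row key -> column family -> column -> value.
wide_column = {"trade:42": {"party": {"payer": "ACME"},
                            "money": {"amount": 100}}}

# Graph: nodes joined by labeled edges.
graph_edges = [("trade:42", "paidBy", "ACME"), ("trade:42", "amount", 100)]

def amounts():
    """Read the same amount back from each shape."""
    return {
        "key_value": json.loads(key_value["trade:42"])["amount"],
        "document": document["cashflows"][0]["amount"],
        "wide_column": wide_column["trade:42"]["money"]["amount"],
        "graph": next(o for s, p, o in graph_edges if p == "amount"),
    }

result = amounts()
```

None of the four shapes needs a fixed table schema, which matches the "fixed table schemas not needed" characteristic above. What differs is the access path each one makes cheap.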

Page 52: The New Database Frontier: Harnessing the Cloud

The New Database Frontier: Big Data is Evolving

Big data platforms are evolving
•  NoSQL tools versus platforms
•  Cloud deployments
•  Integration
•  Advanced analytics

Benefits:
•  Increased capabilities & reduced programming
•  Shift roles
    o  IT & Business
    o  Data scientists & business analysts
    o  Services
•  Lower costs & time to market

Page 53: The New Database Frontier: Harnessing the Cloud

Q&A

•  Big Data Implementations
    o  How are you addressing high cost, manual coding, time to market & skills shortage?
•  Use Cases: (Assume used for unstructured data)
    o  When would you use your database to store structured data?
    o  Would your database be used for operational & transaction processing applications?
•  Data Ingestion & Integration:
    o  How would data sources typically be ingested?
    o  Are there data integration capabilities similar to ETL available?
•  BI:
    o  In order to use BI tools or SQL, is a data model needed?
    o  How would extension keywords (“MATCH”) be used?
•  Modeling data:
    o  Contrast semantic triples/RDF vs dimensional models
    o  Compare skill sets needed
•  What are the differences in how your database handles:
    o  Data capture versus information analysis
    o  Data ingestion versus processing business transactions or processes

Page 54: The New Database Frontier: Harnessing the Cloud


Page 55: The New Database Frontier: Harnessing the Cloud


Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: DATABASE

June: ANALYTICS & MACHINE LEARNING

July: INNOVATIVE TECHNOLOGY

Page 56: The New Database Frontier: Harnessing the Cloud


THANK YOU for your ATTENTION!