the new database frontier: harnessing the cloud
Post on 01-Jul-2015
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
The New Database Frontier: Harnessing the Cloud
Twitter Tag: #briefr
Welcome
Host: Eric Kavanagh
eric.kavanagh@bloorgroup.com @eric_kavanagh
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
This Month: DATABASE
June: ANALYTICS & MACHINE LEARNING
July: INNOVATIVE TECHNOLOGY
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
“We are stuck with technology when what we really want is just stuff that works.” - Douglas Adams
Analyst: Rick Sherman
Rick Sherman is CEO of Athena IT Solutions
MarkLogic
• MarkLogic offers a distributed, scale-out, enterprise NoSQL database
• The platform comprises a database, search engine and application services
• MarkLogic can run directly on the Hadoop file system, and it features full text search, location services and geospatial alerting
Guest: Ken Krupa
Ken Krupa is Chief Field Architect at MarkLogic. With 24 years of professional IT experience, Mr. Krupa has a unique breadth and depth of expertise within nearly all aspects of IT architecture. Prior to joining MarkLogic, Ken consulted at some of the largest North American financial institutions during difficult economic times, advising senior and C-level executives. Prior to that, he consulted with Sun Microsystems as a direct partner and also served as Chief Architect of GFI Group, a Wall St. inter-dealer brokerage. Although his work primarily involves high-level technology strategy, Mr. Krupa remains an active hands-on engineer. In 2005, Ken was awarded patent #6,915,304 – “System and method for converting an XML data structure into a relational database.” Today Ken continues to pursue both individual and community-based engineering activities.
© COPYRIGHT 2013 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Expect More From Your Database Ken Krupa, Chief Field Architect, MarkLogic Corporation
Overview
§ Evolution of the Database § The Enterprise Data Warehouse (EDW) § Big Data § NoSQL § Enterprise NoSQL § The Logical Data Warehouse (LDW) § Unified Database § Parting Thoughts
We Are the New Generation Database
Hierarchical Era: “For your application data!” • Application- and hardware-specific
Relational Era: “For all your structured data!” • Normalized, tabular model • Application-independent query • User control
Any Structure Era: “For all your data!” • Schema-agnostic • Massive scale • Query and search • Analytics • Application services • Faster time-to-results
RDBMS: One Tool, Many Contortions
§ OLTP § 3rd normal form, updates, simple query
§ Reporting DB § Because the OLTP app slowed down during heavy query use
§ Enterprise Data Warehouse § Star schema - unified view of the enterprise
§ Data Marts § Because the EDW didn’t have everything – Also star schema
§ Federated § Because it took too long to agree on a standard model
§ Hybrid § Because Federated is too slow
It’s Complicated
[Figure: OLTP, Warehouse, Data Marts, Archives and Reference Data linked by multiple ETL steps; “unstructured” sources (video, audio, signals, logs, streams, social, documents, messages, metadata) linked by further ETL to a separate Search system]
Look closely…
ETL
…at the hidden “M”
At a crossroads…
§ Prerequisite modeling has become an unsustainable friction point § Can/should we wait until the data is perfectly modeled to do discovery? § A real dollar cost before value is realized
§ “Cost per column” § Moving data around is also becoming a friction point
§ There’s too much of it to do it for all cases § Also a real dollar cost before value is realized
§ Traditional data warehousing largely leaves out “the 80%” § Most of the world’s data is unstructured § Dimensional warehouses as we know them are not up to the task
Gartner’s Take
§ Organizations that failed to deploy strategies to address data complexity and volume issues for their analytics by 2012 will experience more than doubling costs of ownership for their data warehouse and mart environments in disorganized attempts to meet this new demand.
§ By 2014, 85% of organizations will fail to deploy new strategies to address data complexity and volume in their analytics.
§ By 2014, organizations that have deployed analytics to support new complex data types and large volumes of data in analytics will outperform their market peers by more than 20% in revenue, margins, penetration and retention.
Source: Gartner Does the 21st Century, Beyer and Feinberg.
What’s being done?
What about Hadoop?
[Figure: Hadoop in the roles of staging, persistence, analytics, and aggregates/models, with a “?” over updates and queries]
Hadoop + RDBMS
ETL Distill into RDBMS
…or spill-over into Hadoop
Hadoop – What You Get
Advantages
§ HDFS provides scale and economies of scale
§ File-based nature allows for greater Variety
§ Raw data is fine and any shape will do
§ Schema-on-read possible
§ Map-reduce enables massive parallel scaling
Limitations
§ Hadoop was designed for batch processing
§ Does not support real-time applications on its own
§ Requires expertise to configure, deploy and manage
§ Has security limitations
§ Is not a database
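To make “schema-on-read” and map-reduce concrete, here is a minimal single-process Python sketch. The log lines, field names and the choice of counting by `action` are hypothetical; a real Hadoop job would run the map and reduce functions in parallel across HDFS blocks rather than in one loop.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw log lines: stored as-is, no schema imposed at load time.
RAW_LINES = [
    "2014-05-01 INFO  user=alice action=login",
    "2014-05-01 WARN  user=bob action=upload size=42",
    "2014-05-02 INFO  user=alice action=upload",
]


def parse(line):
    """Schema-on-read: structure is interpreted only when the line is read."""
    fields = line.split()
    record = {"date": fields[0], "level": fields[1]}
    for token in fields[2:]:
        key, _, value = token.partition("=")
        record[key] = value
    return record


def map_phase(line):
    """Emit (key, 1) pairs; here the key is the 'action' field."""
    yield parse(line).get("action", "unknown"), 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(run_job(RAW_LINES))  # {'login': 1, 'upload': 2}
```

Note that the second line carries an extra `size` field the others lack; nothing breaks, because no fixed schema was declared up front.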
Enter NoSQL
§ Agility § Flexible data models (or none at all) § Many different types
o Simple Key/Value o Columnar o Document o Graph o Etc.
§ Enterprise features? § More confusion…?
Tough Choice…
Legacy RDBMS § Indexes § Transactions § Security § Enterprise operations
“NoSQL” § Flexible data model § Commodity scale out § Distributed, fault-tolerant § Hadoop integration
[Figure: example Cashflows data model with PartyID, Net, Date, Reference, Payer, TradeID, Amount and Receiver]
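The flexible-data-model side of this choice can be sketched with plain Python dictionaries standing in for documents. The field names come from the Cashflows diagram; the values, the nested `Reference` block and the helper functions are hypothetical. Documents of different shapes sit side by side, and a path is only interpreted when a query reads it.

```python
# Hypothetical cashflow "documents" using field names from the slide.
# The second one carries an optional nested Reference block that a fixed
# relational table would need a schema change (or another table) to hold.
docs = [
    {"TradeID": "T-1", "Date": "2014-05-01", "Payer": "P-9",
     "Receiver": "P-3", "Amount": 1000.0},
    {"TradeID": "T-2", "Date": "2014-05-02", "Payer": "P-3",
     "Receiver": "P-9", "Amount": 250.0,
     "Reference": {"PartyID": "P-3"}},
]


def get_path(doc, path):
    """Resolve a dotted path such as 'Reference.PartyID'; None if absent."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def find(docs, path, value):
    """Return the TradeIDs of documents whose value at `path` equals `value`."""
    return [d["TradeID"] for d in docs if get_path(d, path) == value]


print(find(docs, "Payer", "P-3"))              # ['T-2']
print(find(docs, "Reference.PartyID", "P-3"))  # ['T-2']
print(find(docs, "Receiver", "P-3"))           # ['T-1']
```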
Enterprise NoSQL – Best of Both Worlds…
§ Flexible, schema agnostic, document oriented data model § Comprehensive indexes
o Documents: Hierarchy, text, values, tags—schema “on-demand” o Scalars: Aggregates and range filters, including geospatial o Triples: Linked facts and inferencing o Permissions: Users, roles, compartments, and privileges o Queries: Reverse indexes for alerting, matching
§ Ad-hoc dimensions § Real-time transformation and/or schema on read § Lock-free reads § Strict consistency throughout § Oh yeah… SQL too
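The “reverse indexes for alerting” idea inverts the usual search: queries are stored and indexed, and each incoming document is matched against them. The toy class below is a sketch under simple assumptions (a query is a set of words that must all appear; tokenization is whitespace-only), and the class and query names are hypothetical; it is not how MarkLogic implements alerting.

```python
from collections import defaultdict


class ReverseQueryIndex:
    """Toy reverse index: stored queries are matched against incoming documents.

    Assumption: a query is a set of words that must all appear in the
    document; tokenization is lower-cased whitespace splitting only.
    """

    def __init__(self):
        self.queries = {}                 # query id -> frozenset of required words
        self.by_term = defaultdict(set)   # word -> ids of queries that use it

    def register(self, query_id, words):
        terms = frozenset(w.lower() for w in words)
        self.queries[query_id] = terms
        for term in terms:
            self.by_term[term].add(query_id)

    def match(self, text):
        words = set(text.lower().split())
        # Only queries sharing at least one word with the document are checked.
        candidates = set()
        for word in words:
            candidates |= self.by_term.get(word, set())
        return sorted(q for q in candidates if self.queries[q] <= words)


idx = ReverseQueryIndex()
idx.register("lithium-alert", ["lithium", "toxicity"])
idx.register("roche-alert", ["Roche"])
print(idx.match("the risk of lithium toxicity is very high"))  # ['lithium-alert']
```

Indexing the queries by term means a new document only has to be checked against queries that share at least one word with it, rather than against every stored query.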
Universal Index
A Unified Platform
Content: Words, phrases, entities, positions, etc. (e.g., “...Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C)...”, “...ACE inhibitors, since the risk of lithium toxicity is very high in such patients...”)
Structure: element names such as Label, Author, Ing, Comp, ID, Para, Org
Values: name:sorbitol, date:2012-06-04, company:Roche
Security: Role:researcher-worldwide
Geospatial: Lat: 46.946584, Long: 93.076172
Relationships: Trenton isCityOf NewJersey; James livesIn Trenton
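One way to picture a single index covering content, structure and values is an inverted index whose keys are tagged by kind. This Python sketch is illustrative only; the key scheme and the `index_document`/`search` helpers are hypothetical, and security, geospatial and relationship entries are left out for brevity.

```python
from collections import defaultdict


def index_document(index, doc_id, doc):
    """Add a nested-dict 'document' to one inverted index whose keys mix
    structure (element names), values (element=value) and words."""
    def walk(node):
        for name, value in node.items():
            index[("element", name)].add(doc_id)
            if isinstance(value, dict):
                walk(value)
            else:
                index[("value", name, str(value))].add(doc_id)
                for word in str(value).lower().split():
                    index[("word", word)].add(doc_id)
    walk(doc)


def search(index, *keys):
    """Return ids of documents matching every key (posting-set intersection)."""
    postings = [index.get(key, set()) for key in keys]
    return set.intersection(*postings) if postings else set()


index = defaultdict(set)
index_document(index, 1, {"Label": {"name": "sorbitol", "company": "Roche"},
                          "Para": "low toxicity"})
index_document(index, 2, {"Author": "Smith", "Para": "risk of lithium toxicity"})

print(search(index, ("word", "toxicity")))                          # {1, 2}
print(search(index, ("word", "toxicity"), ("element", "company")))  # {1}
print(search(index, ("value", "company", "Roche")))                 # {1}
```

Because word, structure and value keys live in one index, a mixed query like the second one is just an intersection of posting sets, with no separate search engine to keep in sync.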
DB + Search
§ DB separate from search is another unnecessary friction point § Best if integrated at the DB layer:
§ Quicker time to information § Greater than the sum of its parts § No separate indexes to maintain
§ Search as a text processing engine § Query capability for unstructured text § Turn text into numbers, create new dimensions § Infer new information from text search and enrich
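“Turn text into numbers, create new dimensions” can be illustrated with a toy enrichment step: derive a field from free text, then aggregate it like any other dimension. The vocabulary, sample texts and helper names are hypothetical, and simple word matching stands in for real entity extraction.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary; real enrichment would use entity extraction.
DRUGS = {"lithium", "sorbitol"}


def enrich(doc):
    """Derive a new 'drugs' dimension from free text and attach it to the doc."""
    words = set(re.findall(r"[a-z]+", doc["text"].lower()))
    return {**doc, "drugs": sorted(words & DRUGS)}


def facet_counts(docs, field):
    """Turn the derived dimension into numbers: documents per value."""
    return Counter(value for doc in docs for value in doc[field])


docs = [enrich(d) for d in [
    {"id": 1, "text": "Risk of lithium toxicity is very high."},
    {"id": 2, "text": "Sorbitol and lithium were both noted."},
    {"id": 3, "text": "No drug mentioned here."},
]]
print(facet_counts(docs, "drugs"))  # Counter({'lithium': 2, 'sorbitol': 1})
```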
The advanced interface
Semantics and RDF
Data stored in Triples § Expressed as Subject : Predicate : Object
Example: John Smith : LivesIn : Brooklyn; Brooklyn : PartOf : New York
§ Can make inferences – e.g. John Smith LivesIn New York § Can create relationships on-the-fly
§ “We’ve identified a special relationship between a drug and an interaction…”
§ Machine-comprehensible “knowledge”
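The inference in the John Smith example can be sketched as forward chaining over a set of triples. The rule (X LivesIn A and A PartOf B implies X LivesIn B) is an assumption written out for illustration, not a rule any particular triple store applies by default; the function names are hypothetical.

```python
# Triples as (subject, predicate, object) tuples, following the slide's example.
facts = {
    ("John Smith", "LivesIn", "Brooklyn"),
    ("Brooklyn", "PartOf", "New York"),
}


def infer(triples):
    """Forward chaining with one assumed rule, repeated until nothing new appears:
    if X LivesIn A and A PartOf B, then X LivesIn B."""
    known = set(triples)
    while True:
        new = {
            (x, "LivesIn", b)
            for (x, p1, a) in known if p1 == "LivesIn"
            for (a2, p2, b) in known if p2 == "PartOf" and a2 == a
        } - known
        if not new:
            return known
        known |= new


def objects(triples, subject, predicate):
    """All objects linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects(infer(facts), "John Smith", "LivesIn"))  # ['Brooklyn', 'New York']
```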
The World of Triples
[Figure panels: Semantic World; Document World; MarkLogic]
Linked Open Data (Free semantic facts available to anyone)
Proprietary Semantic Facts (Facts and Taxonomies in your organization)
Facts from Free-Flowing Text (Derived from semantic enrichment)
Facts in Documents (Part of metadata or added with authoring tools)
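“Facts from free-flowing text” can be illustrated with a deliberately naive extractor that turns two sentence patterns into triples, using the predicates from the Universal Index slide. The regular expressions and the `extract_triples` helper are hypothetical; production semantic enrichment relies on NLP and entity recognition rather than fixed patterns.

```python
import re

# A capitalized name of one or more words, e.g. "James" or "New Jersey".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical sentence patterns mapped to predicates.
PATTERNS = [
    (re.compile(rf"({NAME}) lives in ({NAME})"), "livesIn"),
    (re.compile(rf"({NAME}) is a city of ({NAME})"), "isCityOf"),
]


def extract_triples(text):
    """Return (subject, predicate, object) triples matched by the patterns."""
    triples = []
    for pattern, predicate in PATTERNS:
        for subject, obj in pattern.findall(text):
            triples.append((subject, predicate, obj))
    return triples


text = "James lives in Trenton. Trenton is a city of New Jersey."
print(extract_triples(text))
```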
Data Everywhere
Application
Data stored in MarkLogic
On Tiered Storage, Widely Accessible
Calling out to endpoints: RDBMS, SPARQL, REST
Searchable and Queryable
Logical Data Warehouse Reimagined
More from Gartner…
§ 64% of surveyed organizations either have invested in big data already (30%) or have plans to invest within 24 months
§ Through 2017, 90% of the information assets from big data analytic efforts will be siloed and unleveragable across multiple business processes
§ By 2016, excessive focus of truth over trust in big data will prompt leadership change in 75% of projects
§ Through 2017, premiums for big data-related technology and project skills will remain 20% to 30% above norms for traditional information management skills
Source: Gartner – Predicts 2014: Big Data, Heudecker, Beyer, et al
§ Companies will spend more on application integration than on new application systems
§ By 2018, more than 50% of the cost of implementing 90% of new large systems will be spent on integration
Source: Gartner – Predicts 2013: Application Integration, Lheureux, Pezzini, et al
Integration…?
Hadoop Ecosystem
The New Data Life-cycle
Textual • Structured • Multi-media • Geospatial • Social • Semantic
Discovery/Model Loop
Some parting thoughts
§ ETL is a friction point that should be minimized § Integration is another friction point § Modeling should no longer be a prerequisite to discovery
§ Evolve your model alongside discovery § Expect more from your database
§ Schema agility of Hadoop but in a DBMS § Enterprise capabilities of traditional DBMS
§ ACID, Security, HA/DR, etc. § Support for indexing and analyzing heterogeneous information assets (text, data, geospatial, semantics, etc.) § Support for heterogeneous locality of data for strategy execution
§ Operational data + LDW with fewer moving parts § Tiered Storage
Thank You
Ken Krupa Ken.krupa@marklogic.com @kenkrupa Check out these pages: www.marklogic.com developer.marklogic.com
Now reimagine the possibilities! Join us in a city near you! May 15 Amsterdam May 20 London June 3 New York City June 5 Chicago June 19 Baltimore June 24 Washington DC
Perceptions & Questions
Analyst: Rick Sherman
Rick Sherman, Athena IT Solutions, rsherman@athena-solutions.com
Copyright © 2014 Athena IT Solutions
The Briefing Room:
The New Database Frontier
The New Database Frontier: Our History
• Relational databases emerged in the 1980s & went mainstream in the 1990s o Transactional o Data warehousing, Business Intelligence & Analytics
• Relational databases kept adding features for BI & DW o More infrastructure for I/O & memory o More complexity o More skills o Can “one size fit all”?
• OLAP (Online Analytical Processing) emerged o Successful but… not pervasive o Proprietary: Database, ETL & BI o Specialized skills o Scalability & extensibility issues
The New Database Frontier: Data & Analytical Needs Have Expanded
• Data Volume, Velocity & Variety Exploding
• Data Integration has Succeeded in Many Ways o 5 C’s: comprehensive, consistent, conformed, clean & current o Achieved at a cost
• BI & Analytics has Evolved & Expanded
The New Database Frontier: Do We Need to Change?
• Volume & Velocity for Structured Data o Can typically be handled by the traditional DW platform with enhancements & emerging technologies - In-memory, Columnar, MPP, other - Infrastructure, Appliances, Cloud - Better architecture & design
o But are there other approaches that can do it better?
• Variety is the key difference & requires a different approach o Unstructured: text, audio, video, click streams, log files, social media o Semi-structured: XML, RSS feeds, machine data o None of the big 3 (ETL, databases & BI) were built for it
The New Database Frontier: NoSQL Data Stores
• NoSQL data stores differ from relational databases in: o Structure o Purpose o SQL not used as primary query language
• Structures: o Wide Column Store / Column Families o Document Store o Key Value / Tuple Store o Graph Databases
• Characteristics o Scalable, flexible, commodity hardware o Supports 3 V’s o Fixed table schemas not needed o May not guarantee ACID (atomicity, consistency, isolation, durability)
The New Database Frontier: Big Data is Evolving
Big data platforms are evolving • NoSQL tools versus platforms • Cloud deployments • Integration • Advanced analytics
Benefits: • Increased capabilities & reduced programming • Shift roles
o IT & Business o Data scientists & business analysts o Services
• Lower costs & time to market
Q&A
• Big Data Implementations o How are you addressing high cost, manual coding, time to market & skills shortage?
• Use Cases: (Assume used for unstructured data) o When would you use your database to store structured data? o Would your database be used for operational & transaction processing applications?
• Data Ingestion & Integration: o How would data sources typically be ingested? o Are there data integration capabilities similar to ETL available?
• BI: o Is a data model needed in order to use BI tools or SQL? o How would extension keywords (e.g., “MATCH”) be used?
• Modeling data: o Contrast semantic triples/RDF vs dimensional models o Compare skill sets needed
• What are the differences in how your database handles: o Data capture versus information analysis o Data ingestion versus processing business transaction or processes
Upcoming Topics
www.insideanalysis.com
THANK YOU for your ATTENTION!