Roundtable 1: Relational and Analytic Database Innovations

Wednesday, February 22, 12

Upload: inside-analysis

Post on 22-May-2015

Category:

Technology


DESCRIPTION

Slides from the live webcast on Feb. 22, 2012. Watch this Roundtable Webcast to learn about what’s happening in the relational and specialized “analytics” database market. The discussion includes four veteran analysts: Robin Bloor of The Bloor Group, Mark Madsen of Third Nature, Malcolm Chisholm of AskGet, and Rajeev Rawat of BI Results. For more information visit: http://www.databaserevolution.com

Watch this and the entire series at: http://www.youtube.com/playlist?list=PLE1A2D56295866394

TRANSCRIPT

Page 1: Roundtable 1: Relational and Analytic Database Innovations

Page 2: Roundtable 1: Relational and Analytic Database Innovations

Eric Kavanagh [email protected]

Twitter Tag: #briefr

Page 3: Roundtable 1: Relational and Analytic Database Innovations

To conduct an Open Research program that invites the participation of both IT users and technology vendors

To assist IT buyers in understanding database technology and the architecture that surrounds it.

Allow audience members to pose serious questions... and get answers!

Publish all findings

Page 4: Roundtable 1: Relational and Analytic Database Innovations

Your Host: Eric Kavanagh

Research Leader: Mark Madsen - Third Nature

Primary Collaborator: Robin Bloor - The Bloor Group

Guest Analyst 1: Rajeev Rawat - BI Results

Guest Analyst 2: Malcolm Chisholm - Consultant

Page 5: Roundtable 1: Relational and Analytic Database Innovations

Rajeev Rawat is the founder and CEO of BI Results. His career has involved leading large cross-functional teams at both IBM and Xerox, where he held direct customer-facing roles as well as headquarters assignments. His headquarters positions with worldwide responsibility included strategic assignments for alliances and relationships with technology partners, product management and product marketing. Other responsibilities included restructuring business models, testing new technology platforms, and sales coverage plans. Rajeev led the introduction of new technologies and solutions for Xerox and IBM.

www.biresults.com, [email protected]

Page 6: Roundtable 1: Relational and Analytic Database Innovations

The Bloor Group

©Copyright BI Results, LLC 2012

Fit for Purpose: The New Database Revolution
The Bloor Group – February 22, 2012

Rajeev Rawat
Serving to achieve your full potential

Five Years of Incredible Excitement In Information Acrobatics!

- Seismic shift in data Variety, Volume, Velocity

Page 7: Roundtable 1: Relational and Analytic Database Innovations

The Next Five Years
The Most Exciting Times In Information Acrobatics

New Venture Funding

New (Needed) Functionality

New Skills

New Ventures

Innovative Code

Lots of Great Innovation

Reports of the Death of The RDBMS Are Highly Exaggerated

NoSQL Innovation
Apache Project, Amazon, Facebook, Google, Open Source Community, Twitter

Key Value Store, Big Table, Graph DB, Document DB

Page 8: Roundtable 1: Relational and Analytic Database Innovations

RDBMS Still Dominates
Reliable Heavy Lifting

Strengths
- Robust (ACID, Fail-proof)
- Structure (Granular, Scalable, Fast)
- Governance (Backups, Precision)
- Tools (ETL, Analytics, Reporting)
- Ecosystem (Global deep collaboration)
- Skills (Certifications, Experience)
- Policies, Procedures (Reliability)
- Documentation (Support, Training)

Reports of the Death of RDBMS Are Highly Exaggerated

Photo: Watchmojo.com

RDBMS Vs. NoSQL?

Page 9: Roundtable 1: Relational and Analytic Database Innovations

- Co-Existence, Transition, NoSQL Only

- Meta Tag, Master Data, Other Scheme(s)

- Data Governance, Controls, Authentication, Security

- Deep Analytics on Mixed Datasets

Fantastic Growth Opportunity
Skills, Investing

NoSQL
Being Tested, Validated, Calibrated

Key Value Store, Big Table, Graph DB, Document DB

Complexity, Semi-Structured, Highly Connected Data

Page 10: Roundtable 1: Relational and Analytic Database Innovations

NoSQL, RDBMS Innovation
Fantastic Opportunity for Growth

Gaps You Can Help Close

- Mapping Big Data with Legacy Data

- Strategy and Policy for Governance, Precision, Controls

- Opportunities at all sides
  - Enterprise
  - Legacy Vendors
  - Innovative Ventures
  - Technology and Business

Time to Rise To The Top

Skills, Investing

Tested For Prime Time

Finish Line

The Race Is On!

Page 11: Roundtable 1: Relational and Analytic Database Innovations

Dissection & Discussion

Page 12: Roundtable 1: Relational and Analytic Database Innovations

Robin Bloor is Chief Analyst at The Bloor Group.

[email protected]

Page 13: Roundtable 1: Relational and Analytic Database Innovations

Page 14: Roundtable 1: Relational and Analytic Database Innovations

RDBMS

Page 15: Roundtable 1: Relational and Analytic Database Innovations

The SQL Barrier

SQL has:
- DDL (for data definition)
- DML (for Select, Project and Join)
But it has no MML or TML

Usually result sets are brought to the client for further manipulation, but using them for further data access becomes problematic.

Conclusions:
- This separation of data from process is arbitrary and unhelpful

[Diagram: Analytic DBMS, SQL Barrier, SQL. Labels: "Results processing must be done here", "Or results processing must be done here"]
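A minimal sketch of the round trip described on this slide, assuming Python's built-in sqlite3 module and a hypothetical `sales` table (neither appears in the original slides). SQL selects the rows, the result set is brought to the client for processing, and the derived values then have to cross the barrier again to drive further data access.

```python
import sqlite3
from statistics import median

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("east", 50.0),
     ("west", 5.0), ("west", 7.0), ("west", 100.0)],
)

# Step 1: the SQL side of the barrier selects the raw rows.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# Step 2: the client side of the barrier processes the result set.
# The median is computed in application code because this sketch
# assumes the engine offers no median aggregate.
by_region = {}
for region, amount in rows:
    by_region.setdefault(region, []).append(amount)
medians = {region: median(vals) for region, vals in by_region.items()}

# Step 3: using the client-side result for further data access means
# crossing the barrier again, with one query per derived value.
above_median = {}
for region, med in medians.items():
    cur = conn.execute(
        "SELECT amount FROM sales WHERE region = ? AND amount > ? "
        "ORDER BY amount",
        (region, med),
    )
    above_median[region] = [amount for (amount,) in cur.fetchall()]
```

The second loop is the problematic part: every value derived on the client needs its own trip back through SQL.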

Page 16: Roundtable 1: Relational and Analytic Database Innovations

That MapReduce Thing

There are two fundamental approaches to parallelism:
- Data partitioning
- Process partitioning

MapReduce implements an approach oriented to the first of these, and thus proves to be suited to many “big data” tasks.

It is not the end of the parallel processing story by any means.
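A minimal sketch of data partitioning in the MapReduce style, written in plain Python rather than any real MapReduce framework. The function names and the word-count task are illustrative assumptions, and the partitions run in sequence here where a cluster would run them on separate nodes. The sketch also merges per-partition counts directly instead of shuffling by key, which is a simplification.

```python
from collections import Counter
from functools import reduce


def partition(records, n_parts):
    """Data partitioning: split the input into n_parts slices."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_phase(part):
    """Run the same process on one partition: count words locally."""
    counts = Counter()
    for line in part:
        counts.update(line.split())
    return counts


def reduce_phase(partials):
    """Merge the per-partition counts into a single result."""
    return reduce(lambda a, b: a + b, partials, Counter())


lines = ["big data big", "data tasks", "big tasks"]
parts = partition(lines, n_parts=2)
# Each partition is independent, so this step could run in parallel.
partials = [map_phase(p) for p in parts]
totals = reduce_phase(partials)
```

The same code runs on every slice of the data. Process partitioning would instead split the work into different stages applied to the same data.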

Page 17: Roundtable 1: Relational and Analytic Database Innovations

Malcolm Chisholm has 25+ years of experience in data management, working in finance, insurance, manufacturing, government, defense, pharmaceuticals, and retail. He specializes in data governance, MDM, metadata engineering, business rules management/execution, data architecture and design. He is a well-known presenter at conferences in the U.S. and Europe, writes columns in trade journals, and has authored the books Managing Reference Data in Enterprise Databases; How to Build a Business Rules Engine; and Definitions in Information Management. In 2011, Malcolm was presented with the prestigious DAMA International Professional Achievement Award for contributions to Master Data Management. He can be contacted at [email protected].

Page 18: Roundtable 1: Relational and Analytic Database Innovations

Dissection & Discussion

Page 19: Roundtable 1: Relational and Analytic Database Innovations

The New Database Revolution: Relational Roundtable

Malcolm Chisholm [email protected]

Telephone 732-687-9283 • Fax 407-264-6809

www.refdataportal.com

www.bizrulesengine.com

© AskGet.com Inc., 2012. All rights reserved

The Virtual Circle

February 22, 2012

San Francisco

Page 20: Roundtable 1: Relational and Analytic Database Innovations

Relational Paradigm

“Big Data” Is Used Differently

• The relational paradigm is different to ULS “Big Data”. [ULS = Ultra-Large Scale - usually Petabyte scale]

• Difficult to rely on relational thinking in Cloud databases

ULS Dataspace in Cloud

“Set at a time” processing

Behavior of populations of identical things

Event data predominates

Exception reporting for singular things/events (but still top-down)

Uncover individual facts

Surf and drill

Can aggregate from individual facts (but bottom-up)

Much is master data

Events are not as much repetitive transactions

Heavy data entry supported

Data entry is to support analysis

Page 21: Roundtable 1: Relational and Analytic Database Innovations

Sources

• Sources provide data to the ULS dataspace
• One source can provide many data formats
• Many sources can provide the same format
• Sources may duplicate the same data
• HINT – Think metadata

[Diagram: Sources A to E pass through INGESTION into the ULS Dataspace in Cloud. Data formats shown: Emails, Documents, Web Pages, XML, Relational, Flat Files, Audio, Image, Video]

Page 22: Roundtable 1: Relational and Analytic Database Innovations

Segments in Dataspace

• The ULS dataspace is not a single “blob” of data
• It will have different segments with different kinds of data in it
• The segments will be derived from the originally ingested data
• MapReduce (M/R) is the equivalent of ETL to move data around and transform it (filter, summarize)

[Diagram: Sources A, B, C ... N pass through INGESTION into the Ingested Data Store within the ULS Dataspace in Cloud. M/R steps connect the segments: Extracted Master Data, Deduplicated Master Data, Terms in Documents, Document-Term Inverted Index]
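A minimal sketch of how an M/R step might derive the Document-Term Inverted Index segment from ingested documents. It is plain Python, not a real MapReduce framework, and the document ids, sample text, and function names are illustrative assumptions.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct term in a document."""
    return [(term, doc_id) for term in set(text.lower().split())]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_term(term, doc_ids):
    """Reduce: collapse the ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


# Hypothetical ingested documents.
docs = {
    "d1": "relational data",
    "d2": "big data",
    "d3": "Big relational data",
}
pairs = [pair for doc_id, text in docs.items()
         for pair in map_doc(doc_id, text)]
inverted_index = dict(
    reduce_term(term, ids) for term, ids in shuffle(pairs).items()
)
```

The map step plays the role of extract and filter, and the reduce step plays the role of summarize, which is the ETL analogy the slide draws.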

Page 23: Roundtable 1: Relational and Analytic Database Innovations

No Common Notation for Columnar Designs

• E/R diagramming techniques allow us to visualize a relational database
• There is nothing that is quite the same for columnar databases
• (a) It is sparse and columns may be missing
• (b) How do you show the MapReduce transformations (not quite relations)?

         Col A   Col B   Col C   Col D   Col E
Row 01   Val1A
Row 02   Val2A   Val2B   Val2C   Val2D   Val2E
Row 03   Val3A           Val3C           Val3E

?
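A minimal sketch of the sparse layout in the table above, assuming each row is held as a mapping from column name to value and absent columns are simply not stored. The helper names are hypothetical, and Row 01 carries only Col A because that is all the transcript shows for it.

```python
# Each row stores only the columns it actually has.
rows = {
    "Row 01": {"Col A": "Val1A"},
    "Row 02": {"Col A": "Val2A", "Col B": "Val2B", "Col C": "Val2C",
               "Col D": "Val2D", "Col E": "Val2E"},
    "Row 03": {"Col A": "Val3A", "Col C": "Val3C", "Col E": "Val3E"},
}


def columns_seen(rows):
    """Return every column name used by at least one row."""
    return sorted({col for cells in rows.values() for col in cells})


def missing_columns(rows):
    """For each row, list the known columns that the row does not store."""
    all_cols = columns_seen(rows)
    return {key: [c for c in all_cols if c not in cells]
            for key, cells in rows.items()}


def get(rows, row_key, column, default=None):
    """Read one cell, returning a default when the column is absent."""
    return rows.get(row_key, {}).get(column, default)
```

There is no fixed schema to draw here. The set of columns can only be discovered by scanning the rows, which is part of why an E/R-style diagram does not carry over.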

Page 24: Roundtable 1: Relational and Analytic Database Innovations

Need a Data Dictionary

• The ULS dataspace can grow quickly and have many data objects
• Without a DD developers and users will get hopelessly lost (none of the logic imposed by the relational model)
• The fundamental unit is the field – show where it occurs in rows, ColQuals and payloads
• Tables less important than in relational
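A minimal sketch of a data dictionary keyed by field rather than by table, recording where each field occurs in rows, ColQuals and payloads. The class names, the `customer_id` field, and its description are hypothetical. The segment names are borrowed from the dataspace diagram earlier in the deck.

```python
from dataclasses import dataclass, field

# The three places a field can occur, as named on the slide.
LOCATIONS = ("row", "colqual", "payload")


@dataclass
class FieldEntry:
    name: str
    description: str
    # Set of (segment, location) pairs where the field occurs.
    occurrences: set = field(default_factory=set)


class DataDictionary:
    def __init__(self):
        self._fields = {}

    def register(self, name, description, segment, location):
        """Record that a field occurs in a given segment and location."""
        if location not in LOCATIONS:
            raise ValueError(f"unknown location: {location}")
        entry = self._fields.setdefault(name, FieldEntry(name, description))
        entry.occurrences.add((segment, location))
        return entry

    def where_used(self, name):
        """List every (segment, location) in which the field occurs."""
        return sorted(self._fields[name].occurrences)


dd = DataDictionary()
dd.register("customer_id", "Hypothetical customer key",
            "Ingested Data Store", "payload")
dd.register("customer_id", "Hypothetical customer key",
            "Extracted Master Data", "row")
usage = dd.where_used("customer_id")
```

Keying on the field means the lookup starts from the field and returns its locations, rather than starting from a table and listing its columns.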

Page 25: Roundtable 1: Relational and Analytic Database Innovations

Dissection & Discussion

Page 26: Roundtable 1: Relational and Analytic Database Innovations

Mark Madsen is founder of Third Nature, a research and consulting firm focused on analytics, BI and decision-making. Mark spent the past two decades working on analysis and decision support in many industries and countries. He is an award-winning architect and former CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.

Page 27: Roundtable 1: Relational and Analytic Database Innovations

One Size Doesn’t Fit All

February 22, 2012

Mark R. Madsen
http://ThirdNature.net

Page 28: Roundtable 1: Relational and Analytic Database Innovations

The future of data is the database

Page 29: Roundtable 1: Relational and Analytic Database Innovations

You keep using that word. I do not think it means what you think it means.

Page 30: Roundtable 1: Relational and Analytic Database Innovations

The relational database is the franchise technology for storing and retrieving data, but…

1. Global, static schema model

2. No rich typing system

3. Many are not a good fit for network parallel computing, aka cloud

4. Limited API in atomic SQL statement syntax & simple result set return

Good conceptual model, but a prematurely standardized implementation

Page 31: Roundtable 1: Relational and Analytic Database Innovations

Plus, if they’re all the same why are there so many?

Sybase IQ, ASE
Teradata, Aster Data
Oracle, RAC
Microsoft SQLServer, PDW
IBM DB2s, Netezza
Paraccel
Kognitio
EMC/Greenplum
Oracle Exadata
SAP HANA
Infobright
MySQL
MarkLogic
Tokyo Cabinet

EnterpriseDB
LucidDB
Vectorwise
MonetDB
Exasol
Illuminate
Vertica
InfiniDB
1010 Data
SAND
Endeca
Xtreme Data
IMS
Hive

Algebraix
Intersystems Caché
Streambase
SQLStream
Coral8
Ingres
Postgres
Cassandra
CouchDB
Mongo
Hbase
Redis
RainStor
Scalaris

And a few hundred more.

Page 32: Roundtable 1: Relational and Analytic Database Innovations

The future of data is the relational database?

SQL noSQL

Page 33: Roundtable 1: Relational and Analytic Database Innovations

The future of data is the relational database?

SQL noSQL

Page 34: Roundtable 1: Relational and Analytic Database Innovations

Technologies are not perfect replacements for one another.

When replacing the old with the new (or ignoring the new over the old) you always make tradeoffs, and usually you won’t see them for a long time.

Page 35: Roundtable 1: Relational and Analytic Database Innovations

Dissection & Discussion

Page 36: Roundtable 1: Relational and Analytic Database Innovations

Page 37: Roundtable 1: Relational and Analytic Database Innovations

March:
- Vendor Research
- March 14th: Second Round Table focusing on NoSQL databases and their application
- DB Revolution Survey conducted

April:
- Vendor Research
- Publishing of Round Table Transcripts, with comments

May:
- Authoring of White Paper
- Publishing of White Paper
- Publishing of survey activity

Page 38: Roundtable 1: Relational and Analytic Database Innovations

March 14th: Second DB Revolution Round Table

March Briefing Room: Integration

April Briefing Room: Discovery

May Briefing Room: Analytics

Page 39: Roundtable 1: Relational and Analytic Database Innovations

Thank You For Your Attention
