Roundtable 1: Relational and Analytic Database Innovations
DESCRIPTION
Slides from the live webcast on Feb. 22, 2012. Watch this Roundtable Webcast to learn about what’s happening in the relational and specialized “analytics” database market. The discussion includes four veteran analysts: Robin Bloor of The Bloor Group, Mark Madsen of Third Nature, Malcolm Chisholm of AskGet, and Rajeev Rawat of BI Results. For more information visit: http://www.databaserevolution.com Watch this and the entire series at: http://www.youtube.com/playlist?list=PLE1A2D56295866394
TRANSCRIPT
Wednesday, February 22, 12
Eric [email protected]
Twitter Tag: #briefr
To conduct an Open Research program that invites the participation of both IT users and technology vendors
To assist IT buyers in understanding database technology and the architecture that surrounds it.
Allow audience members to pose serious questions... and get answers!
Publish all findings
Your Host: Eric Kavanagh
Research Leader: Mark Madsen - Third Nature
Primary Collaborator: Robin Bloor - The Bloor Group
Guest Analyst 1: Rajeev Rawat - BI Results
Guest Analyst 2: Malcolm Chisholm - Consultant
Rajeev Rawat is the founder and CEO of BI Results. His career has involved leading large cross-functional teams at both IBM and Xerox, where he worked in direct customer-facing roles and took part in headquarters assignments. His headquarters positions with worldwide responsibility included strategic assignments for alliances and relationships with technology partners, product management and product marketing. Other responsibilities included restructuring business models, testing new technology platforms, and sales coverage plans. Rajeev led the introduction of new technologies and solutions for Xerox and IBM.
www.biresults.com, [email protected]: Rajeev Rawat
The Bloor Group
©Copyright BI Results, LLC 2012
Fit for Purpose: The New Database Revolution
The Bloor Group – February 22, 2012
Rajeev Rawat
Serving to achieve your full potential
Five Years of Incredible Excitement In Information Acrobatics!
-Seismic shift in data Variety, Volume, Velocity
The Next Five Years
The Most Exciting Times In Information Acrobatics
New Venture Funding
New (Needed) Functionality
New Skills
New Ventures
Innovative Code
Lots of Great Innovation
Reports of the Death of the RDBMS Are Highly Exaggerated
NoSQL Innovation
Apache Project, Amazon, Facebook, Google, Open Source Community, Twitter
Key Value Store, Big Table, Graph DB, Document DB
RDBMS Still Dominates
Reliable Heavy Lifting
Strengths
- Robust (ACID, Fail-proof)
- Structure (Granular, Scalable, Fast)
- Governance (Backups, Precision)
- Tools (ETL, Analytics, Reporting)
- Ecosystem (Global deep collaboration)
- Skills (Certifications, Experience)
- Policies, Procedures (Reliability)
- Documentation (Support, Training)
Reports of the Death of the RDBMS Are Highly Exaggerated
Photo: Watchmojo.com
RDBMS Vs. NoSQL?
- Co-Existence, Transition, NoSQL Only
- Meta Tag, Master Data, Other Scheme(s)
- Data Governance, Controls, Authentication, Security
- Deep Analytics on Mixed Datasets
Fantastic Growth Opportunity
Skills, Investing
NoSQL
Being Tested, Validated, Calibrated
Key Value Store, Big Table, Graph DB, Document DB
Complexity, Semi-Structured, Highly Connected Data
NoSQL, RDBMS Innovation
Fantastic Opportunity for Growth
Gaps You Can Help Close
- Mapping Big Data with Legacy Data
- Strategy and Policy for Governance, Precision, Controls
- Opportunities at all sides
  - Enterprise
  - Legacy Vendors
  - Innovative Ventures
  - Technology and Business
Time to Rise To The Top
Skills, Investing
Tested For Prime Time
Finish Line
The Race Is On!
Dissection & Discussion
Robin Bloor is Chief Analyst at The Bloor Group.
RDBMS
The SQL Barrier
SQL has:
- DDL (for data definition)
- DML (for Select, Project and Join)
But it has no MML or TML
Usually result sets are brought to the client for further manipulation, but using them for further data access becomes problematic.
Conclusions:
This separation of data from process is arbitrary and unhelpful
Analytic DBMS
SQL Barrier
SQL
Results processing must be done here
Or results processing must be done here
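The round trip this slide describes can be sketched in a few lines. This is an illustration only, using Python's built-in sqlite3 module; the `sales` table, its columns, and the choice of a median as the client-side step are assumptions made for the example and do not come from the slides.

```python
import sqlite3
from statistics import median

# Hypothetical in-memory table, used only to illustrate the pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES
  ('east', 10), ('east', 30), ('east', 50),
  ('west', 20), ('west', 40);
""")

# Step 1: SQL selects and projects; the result set crosses to the client.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# Step 2: further manipulation happens on the client side.
by_region = {}
for region, amount in rows:
    by_region.setdefault(region, []).append(amount)
medians = {region: median(values) for region, values in by_region.items()}

# Step 3: using those results for further data access means shipping
# values back across the barrier, one extra query per region here.
above_median = {}
for region, m in medians.items():
    cur = conn.execute(
        "SELECT COUNT(*) FROM sales WHERE region = ? AND amount > ?",
        (region, m),
    )
    above_median[region] = cur.fetchone()[0]
```

Steps 1 and 3 both move data across the database/client boundary, which is the cost the slide attributes to separating data from process.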
That MapReduce Thing
There are two fundamental approaches to parallelism:
- Data partitioning
- Process partitioning
MapReduce implements an approach oriented to the first of these, which makes it well suited to many “big data” tasks. It is not the end of the parallel processing story by any means.
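Data partitioning can be shown with a small single-process sketch: split the input, run the same function on each partition independently, then merge the partial results. The function names and sample lines are hypothetical, and this is plain Python rather than any real MapReduce framework.

```python
from collections import Counter
from functools import reduce

def partition(records, n):
    """Split the records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def map_partition(lines):
    """Count words in one partition; needs no data from other partitions."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def merge_counts(left, right):
    """Reduce step: combine two partial counts."""
    return left + right

lines = ["big data big", "data store", "big table"]

# Each partition could be processed on a different node; here they run in turn.
partials = [map_partition(part) for part in partition(lines, 2)]
total = reduce(merge_counts, partials, Counter())
```

Because each partition is processed without reference to the others, adding partitions (and nodes) does not change the result, only how the work is spread.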
Malcolm Chisholm has 25+ years experience in data management working in finance, insurance, manufacturing, government, defense, pharmaceuticals, and retail. He specializes in data governance, MDM, metadata engineering, business rules management/execution, data architecture and design. He is a well-known presenter at conferences in the U.S. and Europe, writes columns in trade journals, and has authored the books: Managing Reference Data in Enterprise Databases; How to Build a Business Rules Engine; and Definitions in Information Management. In 2011, Malcolm was presented with the prestigious DAMA International Professional Achievement Award for contributions to Master Data Management. He can be contacted at [email protected].
Dissection & Discussion
The New Database Revolution: Relational Roundtable
Malcolm Chisholm [email protected]
Telephone 732-687-9283 • Fax 407-264-6809
www.refdataportal.com
www.bizrulesengine.com
© AskGet.com Inc., 2012. All rights reserved
The Virtual Circle
February 22, 2012
San Francisco
Relational Paradigm
“Big Data” Is Used Differently
• The relational paradigm is different to ULS “Big Data”. [ULS = Ultra-Large Scale - usually Petabyte scale]
• Difficult to rely on relational thinking in Cloud databases
ULS Dataspace in Cloud
“Set at a time” processing
Behavior of populations of identical things
Event data predominates
Exception reporting for singular things/events (but still top-down)
Uncover individual facts
Surf and drill
Can aggregate from individual facts (but bottom-up)
Much is master data
Events are not as much repetitive transactions
Heavy data entry supported
Data entry is to support analysis
Sources
• Sources provide data to the ULS dataspace
• One source can provide many data formats
• Many sources can provide the same format
• Sources may duplicate the same data
• HINT – Think metadata
ULS Dataspace in Cloud
Emails
Documents
Web Pages
XML
Relational
Flat Files
Audio
Image
Video
INGESTION
Source A
Source B
Source C
Source D
Source E
Segments in Dataspace
• The ULS dataspace is not a single “blob” of data
• It will have different segments with different kinds of data in it
• The segments will be derived from the originally ingested data
• MapReduce (M/R) is the equivalent of ETL to move data around and transform it (filter, summarize)
ULS Dataspace in Cloud
Source A
Source B
Source C
Source N
INGESTION
Ingested Data Store
Extracted Master Data
Terms in Documents
M/R
M/R
M/R
M/R
Deduplicated Master Data
Document-Term Inverted Index
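One of the M/R steps in this diagram, deriving a document-term inverted index from ingested documents, can be sketched as explicit map, shuffle and reduce phases. This is a single-process illustration under stated assumptions: the document IDs, sample text and function names are hypothetical, and a real deployment would run these phases on a MapReduce framework.

```python
from collections import defaultdict

def map_terms(doc_id, text):
    """Map phase: emit one (term, doc_id) pair per distinct term."""
    for term in set(text.lower().split()):
        yield term, doc_id

def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_postings(term, doc_ids):
    """Reduce phase: produce a sorted posting list for one term."""
    return term, sorted(doc_ids)

# Hypothetical stand-in for the "Ingested Data Store" segment.
ingested = {
    "doc1": "Relational data store",
    "doc2": "Columnar data store",
    "doc3": "relational paradigm",
}

pairs = [
    pair
    for doc_id, text in ingested.items()
    for pair in map_terms(doc_id, text)
]
inverted_index = dict(
    reduce_postings(term, doc_ids) for term, doc_ids in shuffle(pairs).items()
)
```

The output is a new segment derived from the ingested data, which is the ETL-like role the slide assigns to M/R.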
No Common Notation for Columnar Designs
• E/R diagramming techniques allow us to visualize a relational database
• There is nothing that is quite the same for columnar databases
• (a) It is sparse and columns may be missing
• (b) How do you show the MapReduce transformations (not quite relations)?
        Col A   Col B   Col C   Col D   Col E
Row 01  Val1A
Row 02  Val2A   Val2B   Val2C   Val2D   Val2E
Row 03  Val3A           Val3C           Val3E
?
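The sparseness in point (a) is easy to see when each row is held as a mapping from column name to value. The sketch below is an assumption-laden illustration: it reads the flattened slide text as Row 01 holding only Col A and Row 03 lacking Col B and Col D, and the helper name is hypothetical.

```python
# Hypothetical in-memory stand-in for the sparse rows on the slide.
rows = {
    "Row 01": {"Col A": "Val1A"},
    "Row 02": {
        "Col A": "Val2A", "Col B": "Val2B", "Col C": "Val2C",
        "Col D": "Val2D", "Col E": "Val2E",
    },
    "Row 03": {"Col A": "Val3A", "Col C": "Val3C", "Col E": "Val3E"},
}

def column_coverage(rows):
    """Map each column to the rows that actually carry a value for it."""
    coverage = {}
    for row_key, cells in rows.items():
        for column in cells:
            coverage.setdefault(column, []).append(row_key)
    return coverage

coverage = column_coverage(rows)

# Columns that exist somewhere in the data but are absent from Row 03.
missing_in_row_03 = sorted(set(coverage) - set(rows["Row 03"]))
```

No fixed schema says which columns a row must have, so the set of columns can only be discovered from the data, which is part of why a fixed diagram notation is hard to apply.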
Need a Data Dictionary
• The ULS dataspace can grow quickly and have many data objects
• Without a DD developers and users will get hopelessly lost (none of the logic imposed by the relational model)
• The fundamental unit is the field – show where it occurs in rows, ColQuals and payloads
• Tables less important than in relational
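A field-centric data dictionary of the kind described here might look like the following minimal sketch. It assumes "ColQuals" means column qualifiers; the class, method names, field names and data object names are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Places a field can occur, following the slide: rows, ColQuals, payloads.
POSITIONS = {"row", "colqual", "payload"}

class DataDictionary:
    """Index keyed by field, not by table."""

    def __init__(self):
        self._occurrences = defaultdict(set)

    def register(self, field_name, data_object, position):
        """Record that a field occurs in a data object at a given position."""
        if position not in POSITIONS:
            raise ValueError(f"unknown position: {position}")
        self._occurrences[field_name].add((data_object, position))

    def where_used(self, field_name):
        """List every (data object, position) where the field occurs."""
        return sorted(self._occurrences.get(field_name, ()))

dd = DataDictionary()
dd.register("customer_id", "ingested_data_store", "row")
dd.register("customer_id", "deduplicated_master_data", "colqual")
dd.register("term", "document_term_inverted_index", "row")

usage = dd.where_used("customer_id")
```

Keying the dictionary by field rather than by table reflects the slide's point that the field, not the table, is the fundamental unit in this kind of dataspace.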
Dissection & Discussion
Mark Madsen is founder of Third Nature, a research and consulting firm focused on analytics, BI and decision-making. Mark spent the past two decades working on analysis and decision support in many industries and countries. He is an award-winning architect and former CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker, a contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
One Size Doesn’t Fit All
February 22, 2012
Mark R. Madsen
http://ThirdNature.net
The future of data is the database
You keep using that word. I do not think it means what you think it means.
The relational database is the franchise technology for storing and retrieving data, but…
1. Global, static schema model
2. No rich typing system
3. Many are not a good fit for network parallel computing, aka cloud
4. Limited API in atomic SQL statement syntax & simple result set return
Good conceptual model, but a prematurely standardized implementation
Plus, if they’re all the same why are there so many?
Sybase IQ, ASE; Teradata, Aster Data; Oracle, RAC; Microsoft SQLServer, PDW; IBM DB2s, Netezza; Paraccel; Kognitio; EMC/Greenplum; Oracle Exadata; SAP HANA; Infobright; MySQL; MarkLogic; Tokyo Cabinet
EnterpriseDB; LucidDB; Vectorwise; MonetDB; Exasol; Illuminate; Vertica; InfiniDB; 1010 Data; SAND; Endeca; Xtreme Data; IMS; Hive
Algebraix; Intersystems Caché; Streambase; SQLStream; Coral8; Ingres; Postgres; Cassandra; CouchDB; Mongo; Hbase; Redis; RainStor; Scalaris
And a few hundred more.
The future of data is the relational database?
SQL noSQL
Technologies are not perfect replacements for one another.
When replacing the old with the new (or ignoring the new over the old) you always make tradeoffs, and usually you won’t see them for a long time.
Dissection & Discussion
March:
- Vendor Research
- March 14th: Second Round Table focusing on NoSQL databases and their application
- DB Revolution Survey conducted
April:
- Vendor Research
- Publishing of Round Table Transcripts, with comments
May:
- Authoring of White Paper
- Publishing of White Paper
- Publishing of survey activity
March 14th: Second DB Revolution Round Table
March Briefing Room: Integration
April Briefing Room: Discovery
May Briefing Room: Analytics
Thank You For Your Attention