Choosing which big data, NoSQL or database technology to use
DESCRIPTION
Basic overview of how to evaluate and match workloads to the various database technologies available, from NoSQL to relational. Workloads have different characteristics. If you don’t understand them you can end up implementing the wrong solution for the problem you have. The video from this presentation is available at https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=4953842&rKey=d03b10ecd9163770
TRANSCRIPT
One Size Doesn’t Fit All
Choosing which big data, NoSQL or database technology to use
March 14, 2012
Mark R. Madsen
http://ThirdNature.net
The problem of “big” is three problems of volume
Number of users!
Computations!
Amount of data!
Unstructured data isn’t really unstructured.
The problem is that this data is unmodeled.
The real challenge is complexity.
Big data?
The holy grail of databases under current market hype
A key problem is that when we talk about “big data” and analytics, we are mostly talking about computation over data, which is a potential mismatch for both relational and NoSQL databases.
Solving the Problem Depends on the Diagnosis
You must understand your workload ‐ throughput and response time requirements aren’t enough.
▪ 100 simple queries accessing month‐to‐date data
▪ 90 simple queries accessing month‐to‐date data plus 10 complex queries using two years of history
▪ Hazard calculation for the entire customer master
▪ Performance problems are rarely due to a single factor.
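To see why a throughput figure alone can mislead, here is a rough sketch comparing the first two example workloads above. Both issue 100 queries, but they touch very different amounts of history. The unit ("months of data scanned"), the function name, and the assumption that a simple query reads at most one month are all illustrative; real cost depends on indexing, partitioning, and the engine.

```python
def months_scanned(simple_queries, complex_queries,
                   simple_span_months=1, complex_span_months=24):
    """Upper bound on months of data touched by a batch of queries.

    Assumes, for illustration only, that a simple query reads at most the
    current month and a complex query reads two full years of history.
    """
    return (simple_queries * simple_span_months
            + complex_queries * complex_span_months)


all_simple = months_scanned(100, 0)  # 100 simple month-to-date queries
mixed = months_scanned(90, 10)       # 90 simple plus 10 complex queries

print(all_simple, mixed)
```

Under these assumptions the mixed workload touches 330 month-units against 100, more than three times the data for the same query count.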
Workload: One big query or many small queries?
Retrieval: small return set or large?
Selectivity: large volume of data scanned or small?
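The retrieval and selectivity questions can be put into numbers with a small record per query. This is a minimal sketch; the `QueryProfile` class, its field names, and the sample row counts are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical record of one query's footprint (all counts in rows)."""
    table_rows: int     # rows in the data set being queried
    rows_scanned: int   # rows the engine had to read
    rows_returned: int  # rows sent back to the caller

    @property
    def scanned_fraction(self) -> float:
        """Share of the data set that was read; small means selective."""
        return self.rows_scanned / self.table_rows

    @property
    def returned_fraction(self) -> float:
        """Share of the scanned rows that were returned to the caller."""
        if self.rows_scanned == 0:
            return 0.0
        return self.rows_returned / self.rows_scanned


# A key lookup: small scan, small return set.
lookup = QueryProfile(table_rows=1_000_000, rows_scanned=10, rows_returned=10)
# A summary report: full scan, small return set.
report = QueryProfile(table_rows=1_000_000, rows_scanned=1_000_000, rows_returned=50)
```

Both sample queries return few rows, yet one reads a sliver of the data and the other reads all of it, which is why retrieval size and selectivity are asked about separately.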
Important workload parameters to know
• Read‐intensive vs. write‐intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
• Short vs. long access latency
• Predictable vs. unpredictable data access patterns
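One way to keep these five parameters in view is to record them explicitly for each workload before looking at products. The sketch below assumes a hypothetical `WorkloadProfile` class; the two sample profiles are illustrative guesses, not measurements or recommendations from the presentation.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WorkloadProfile:
    """One flag per workload parameter listed above."""
    write_intensive: bool          # False means read-intensive
    mutable_data: bool             # False means immutable data
    eventual_consistency_ok: bool  # False means immediate consistency is required
    long_latency_ok: bool          # False means short access latency is required
    predictable_access: bool       # False means unpredictable access patterns


def differing(a: WorkloadProfile, b: WorkloadProfile) -> list:
    """Names of the parameters on which two workloads disagree."""
    return [f.name for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)]


# Illustrative profiles only; profile your own workloads.
order_entry = WorkloadProfile(True, True, False, False, True)
ad_hoc_analysis = WorkloadProfile(False, False, True, True, False)

print(differing(order_entry, ad_hoc_analysis))
```

Two workloads that disagree on most parameters are unlikely to be served well by a single technology choice.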
Types of workloads
Write‐biased:
▪ OLTP
▪ OLTP, batch
▪ OLTP, lite
▪ Object persistence
▪ Data ingest, batch
▪ Data ingest, real‐time
Read‐biased:
▪ Query
▪ Query, simple retrieval
▪ Query, complex
▪ Query‐hierarchical / object / network
▪ Analytic
Mixed? Inline analytic execution, operational BI
Matching to parameters, at assumption of data scale
Workload parameters: Write‐biased, Read‐biased, Updateable data, Eventual consistency OK, Unpredictable query path, Compute intensive
Technologies compared: Standard RDBMS, Parallel RDBMS, NoSQL (kv, dht, obj), Hadoop*, Streaming database
You see the problem: it’s an intersection of multiple parameters, and this chart only includes the first tier of parameters. Plus, workload factors can completely invert these general rules of thumb.
Matching to parameters, at assumption of data scale
Workload parameters: Complex queries, Selective queries, Low latency queries, High concurrency, High ingest rate
Technologies compared: Standard RDBMS, Parallel RDBMS, NoSQL (kv, dht, obj), Hadoop, Streaming database
You have to look at the combination of workload factors: data scale, concurrency, latency & response time, then chart the parameters.
Always build a proof of concept!
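Charting the parameters can be as simple as a lookup from each candidate to the workload factors you believe it handles well, filtered by what the workload requires. The sketch below is a minimal illustration: the `shortlist` function and the engine names are hypothetical, and the fit table is placeholder data, not the ratings from the charts in this presentation. The result is only a list of candidates to carry into a proof of concept.

```python
def shortlist(required, fit):
    """Return candidates whose fit set covers every required parameter.

    required: set of workload parameter names the workload needs.
    fit: dict mapping candidate name -> set of parameters it handles well.
    """
    return sorted(name for name, handles in fit.items()
                  if required <= handles)


# Placeholder fit table; fill in your own assessment for real products.
fit = {
    "engine_a": {"complex queries", "selective queries"},
    "engine_b": {"selective queries", "low latency queries", "high concurrency"},
    "engine_c": {"high ingest rate", "low latency queries", "high concurrency"},
}

required = {"low latency queries", "high concurrency"}
candidates = shortlist(required, fit)
print(candidates)
```

A set-based filter treats every factor as pass or fail, which is deliberately crude; it narrows the field without pretending to rank the survivors.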
Image Attributions
Thanks to the people who supplied the images used in this presentation:
Holy Grail – © Monty Python Ltd.
Cupcakes – <lost attribution on Flickr>
rock‐fall‐roadblock.jpg ‐ http://www.flickr.com/photos/wsdot/4679360979/
roadblock‐sheep.jpg ‐ http://www.flickr.com/photos/brizo_the_scot/4013939756/
About the Presenter
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institution. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
About Third Nature
Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure, then you’re at the right place.
Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.