Data Modelers Still Have Jobs: Adjusting for the NoSQL Environment
DESCRIPTION
Data modeling emerged in the 1970s in response to the needs of database designers. This accident of history has influenced perceptions and practices of data modeling in harmful ways. Most notably, business-focused requirements analysis has been wrongly commingled with relational modeling. Compounding the problem, vendors have produced data-modeling tools that blur the important distinction between the client’s problem and the technologist’s solution.

Enter NoSQL, with its promise of liberating practitioners from the tiresome burden of designing relational databases. The chance to dispense with relational modeling was embraced enthusiastically, but for many organizations, it has meant discarding the only rigorous activity that had any hope of formally expressing the client’s data needs. This is a textbook case of throwing out the baby with the bathwater.

This presentation shows you how to save the baby, and your career as a data modeler. Understanding the client’s data problem remains essential, regardless of the technology used to build the solution. For that matter, understanding the client’s data problem is the first step toward making an informed choice of technology for the solution.

Using concrete, real-world examples, the presenter will show the following:
- How abandoning modeling altogether is a recipe for disaster, even in, or especially in, NoSQL environments
- How experienced relational modelers can leverage their skills for NoSQL projects
- How the NoSQL context both simplifies and complicates the modeling endeavor
- How lessons learned modeling for NoSQL projects can make you a more effective modeler for any kind of project

TRANSCRIPT
Data Modelers Save Their Careers: Surviving and Thriving with NoSQL
Joe Maguire
Data Quality Strategies, LLC
http://www.DataQualityStrategies.com/
© 2013 Data Quality Strategies, LLC
Thesis
• Relational DBMSs have dominated,
• ...so relational modeling subsumed other forms, including conceptual modeling.
• As R-DBMS wanes, so does relational modeling – and sadly, whatever it subsumed.
• Conceptual modeling must be saved.
• Relational modelers can step in to save it...
• ...with some significant effort.
25 June 2013 © 2013 Data Quality Strategies, LLC
My Perspective
• Over three decades in industry
• Career is a three-legged stool
  – Product development for software vendors
  – Solution design for enterprises
  – Author, Industry Analyst, Thought Leader
• Specialize in
  – Modeling
  – Requirements analysis
  – Data architecture
  – Data quality
• [email protected]
Agenda
• History
• Current Events
• Your Future as a Data Modeler
• Q&A
A Big-Picture Framework
Meta-model | Data Perspective
Conceptual | Entities; Attributes; Relationships; Identifiers
Logical | Tables; Columns; Primary and foreign keys
Physical | Indexes; Table spaces; Vertical and horizontal partitioning; Denormalizations
Good Ideas in the Framework
• Information Hiding
  – e.g., conceptual excludes implementation details
• The Type/Instance distinction
  – Models describe categories, data describes members
• Application/Data Independence
  – Data modeling is separate from process modeling
• User Requirements ≠ System Requirements
  – Users should not participate in logical and physical
• Model-Driven Development
  – Forward and reverse engineering across model levels
A Big-Picture Framework, distorted
Meta-model | Data Perspective
Relational | Entities / Tables; Attributes / Columns; Relationships / FKs; Identifiers / PKs
Physical | Indexes; Table spaces; Vertical and horizontal partitioning; Denormalizations
How the Distortion Happens
• Tool Vendors Dismiss Conceptual Modeling
  – Because their tools cannot support it anyway
• Info Mgmt Specialists Confuse Models with Reality
  – E.g., believing the relational model suffices to describe the universe
• Institutionalized Expediency
  – We know about conceptual modeling, but to save time, we combine it with relational modeling...
  – ...then we formalize that into our dev processes...
  – ...and eventually, that becomes the “best practices.”
Distortions, Revisited
• Summary of Distortions:
  – Distortion: Conceptual means vague
  – Distortion: Logical implies relational
    • Rather than implying XML, OO, KV Store, Array Database, Graph Database
• Results of Distortions:
  – Two levels only: relational and physical
  – Relational modeling used for user requirements
Agenda
• History
• Current Events
• Your Future as a Data Modeler
• Q&A
Current Events: NoSQL
• The “Just Say No” Interpretation
Meta-model | Data Perspective
Logical / Relational | Entities / Tables; Attributes / Columns; Relationships / FKs; Identifiers / PKs
Physical | NO LONGER RELATIONAL: Schemas Based on Big Table Implementations; Alien DDL language; Limited Support from Modeling Tools
Current Events: NoSQL
• The “Not Only SQL” Interpretation
  – Okay, so there might be some work for you
  – But you’re at risk of being marginalized
Agenda
• History
• Current Events
• Your Future as a Data Modeler
• Summary
• Q&A
Your Future as a Modeler
• Remaining Relevant
  – Selfishly: Saving your career
  – Nobly: Serving your client / company / customer
• What You Can Do:
  – Wait for relational projects
  – Become a NoSQL database designer
  – Help your client choose data platforms
    • That starts with understanding the problems
      – which starts with CONCEPTUAL MODELING.
A New (?) Modeling Framework
• Conceptual Modeling
• Choosing a Logical Meta-model
• Logical Modeling
• Physical Modeling
• Tool Support?
Conceptual Modeling
• Behaviors and constructs will compare to relational modeling:
  – Keep some
  – Discard some
  – Stress some
  – Change some
Conceptual Data Model Example
Keep Some
• Keep Entities
• Keep Attributes
• Keep Relationships
• Keep Identifiers
• Keep Maximum Cardinality of Relationships
Keep Entities
• Minimum Expressiveness
• Entities, Not Tables
  – Don’t express horizontal or vertical partitioning for performance
    • But yes if motivated by privacy/security/risk
• Entity names, not table names
  – Honor user vocabulary, not IT naming standards
Keep Attributes
• Honor The User Phenomenon
  – Attributes are part of user discourse
• Attributes, Not Columns
  – Worry about scale (nominal, numeric, ordinal, Boolean, cyclic), not data type
  – Attribute names, not column names
• Support In-Progress Models
  – During which attributes can become entities
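
One way to picture "scale, not data type" is a minimal Python sketch like the one below. The five scale names come from the slide; the `Attribute` class, the sample attributes, and the `allows_ordering` helper are hypothetical names invented here for illustration, not part of the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    """The measurement scales named on the slide."""
    NOMINAL = "nominal"
    NUMERIC = "numeric"
    ORDINAL = "ordinal"
    BOOLEAN = "boolean"
    CYCLIC = "cyclic"


@dataclass(frozen=True)
class Attribute:
    name: str     # the users' own term, not a column name
    scale: Scale  # how values compare, not how they are stored


# Hypothetical attributes of a "meeting" entity, for illustration only.
meeting_attributes = [
    Attribute("location", Scale.NOMINAL),
    Attribute("number of seats", Scale.NUMERIC),
    Attribute("priority", Scale.ORDINAL),
    Attribute("is public", Scale.BOOLEAN),
    Attribute("day of week", Scale.CYCLIC),
]


def allows_ordering(attribute: Attribute) -> bool:
    """Assumption: only numeric and ordinal values answer 'greater than' questions.

    Cyclic values (such as day of week) wrap around, so they are excluded here.
    """
    return attribute.scale in {Scale.NUMERIC, Scale.ORDINAL}
```

Nothing in the sketch mentions integers, strings, or lengths; those decisions belong to the logical and physical levels.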
Keep Relationships
• Minimum Expressiveness
  – Relationships are part of user discourse
• Allow Many-Many and Collection Entities
  – If the latter seem illegal, you’ve been in IT too long
• Relationships, not FKs
Keep Relationships
• Relationships, not Foreign Keys
  – (achievement DOES NOT have code or creatureID)
Keep Relationships
• Many-Many Allowed
Keep Identifiers
• Identifiers, Not PKs
  – IDs are not motivated by computerization, but by typography
  – IDs predate the information revolution
    • and the automotive revolution, for that matter
  – Allow collection entities
• Support In-Progress Modeling
  – IDs help the modeler ferret out the homonym problem
Keep Identifiers
• Identifiers, not PKs. (E.g., Collection Entities):
– (each squad is identified by the skaters on it.)
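
The collection-entity idea can be sketched in Python, under the assumption that a skater is adequately identified by a name for the purposes of the example. The `Skater` and `Squad` classes and the sample names are hypothetical; the point is only that the squad carries no surrogate key, so two squads with the same skaters are the same squad.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skater:
    name: str  # stands in for whatever identifies a skater to the users


@dataclass(frozen=True)
class Squad:
    # The collection itself is the identifier: there is no squad number.
    skaters: frozenset


def make_squad(*skaters: Skater) -> Squad:
    """Build a squad from its members; order of arrival is irrelevant."""
    return Squad(frozenset(skaters))


a = make_squad(Skater("Ann"), Skater("Bea"), Skater("Cy"))
b = make_squad(Skater("Cy"), Skater("Ann"), Skater("Bea"))
c = make_squad(Skater("Ann"), Skater("Bea"))

same_squad = a == b       # same members, so the same squad
different_squad = a != c  # one member fewer, so a different squad
```

Whether a later logical model keeps this identifier or introduces a generated key is a separate, downstream decision.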
Discard Some
• Discard Foreign Keys
  – They’re relational
• Discard Minimum Cardinality
  – A function of process or policy, not data
  – Over-reported by users
• Discard Most Constraints
  – A function of process or policy, not data
  – Are over-reported by users
Discard Minimum Cardinality
• Must EVERY instance of meeting have a person?
  – No. E.g., CassandraSummit 2014 already has a date and location but has zero persons associated with it.
• More generally: Should the DBMS refuse to store incomplete data?
  – People get interrupted and want to save their partial work.
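
A minimal Python sketch of the same point, assuming a hypothetical `Meeting` class: the data structure accepts a meeting with zero persons, and the "must have attendees" rule lives in a separate policy check. The date and location values are placeholders, because the slide does not state them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Meeting:
    name: str
    date: Optional[str] = None
    location: Optional[str] = None
    # Maximum cardinality is "many"; no minimum is enforced by the structure.
    persons: set = field(default_factory=set)


def is_ready_to_hold(meeting: Meeting) -> bool:
    """A process or policy rule (hypothetical), kept outside the data model."""
    return bool(meeting.date and meeting.location and meeting.persons)


# Placeholder strings: the slide only says a date and location already exist.
summit = Meeting("CassandraSummit 2014", date="(date set)", location="(location set)")
```

The incomplete meeting can be stored as it is; `is_ready_to_hold(summit)` stays false until someone is associated with it.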
Keep/Discard Rule of Thumb
• Keep
  – Anything that helps you and the users together discover and name the user categories
• Discard
  – Anything else
Conceptual Data Model Examples
Stress Some
• Stress Consistency Requirements
  – Relational modelers (of non-distributed databases) have not been asking about these.
• Stress Data Volume / Velocity Requirements
  – Can lead or force you to relax application-data independence
Change Some
• Change Your Process
  – From math-y normalization to English-y conversation with users
  – Very difficult to achieve rigor conversationally
• More help:
  – Mastering Data Modeling: A User-Driven Approach by Carlis & Maguire
A New Modeling Framework
• Conceptual Modeling
• Choosing a Logical Meta-Model
• Logical Modeling
• Physical Modeling
• Tool Support?
Choosing a Logical Meta-Model
• Don’t Assume Relational (Duh...)
• Don’t Assume Big Table, KV-Store, Cassandra
• Lots of Choices
  – Relational
  – Key-Value Store
  – XML/Document Database
  – Graph database
  – Array database
  – ...
A New Modeling Framework
• Conceptual Modeling
• Choosing a Logical Meta-Model
• Logical Modeling
• Physical Modeling
• Tool Support?
Logical, Physical, and Tool Support
• Minimal Support From Modeling Tools
  – Because few tools support conceptual modeling
  – Because vendors have not caught up to NoSQL yet
• Community Needs to Develop Shapes
  – And the attendant transformations from conceptual shapes to Big-Table shapes
• During Logical NoSQL Modeling, Process Requirements Will Infiltrate
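
As a rough illustration of a conceptual-to-Big-Table transformation, and of how process requirements creep in, the Python sketch below takes one many-many relationship and produces two different wide-row layouts, one per question the application might ask. The `wide_rows` function, the sample pairs, and the dict-of-sets layout are assumptions made for this sketch; they do not represent any particular product's storage format or a transformation described in the talk.

```python
from collections import defaultdict

# Conceptual shape: a many-many relationship, stated as plain pairs.
# The names are hypothetical examples, not data from the talk.
attends = {
    ("Ann", "Design review"),
    ("Ann", "Kickoff"),
    ("Bea", "Kickoff"),
}


def wide_rows(pairs, key_position):
    """Group pairs into one row per key, as a Big-Table-style layout might.

    key_position picks which end of the relationship becomes the row key,
    so the caller has to know which question the application will ask.
    """
    rows = defaultdict(set)
    for pair in pairs:
        key = pair[key_position]
        other = pair[1 - key_position]
        rows[key].add(other)
    return dict(rows)


# "Which meetings does this person attend?" -> row key is the person.
meetings_by_person = wide_rows(attends, key_position=0)

# "Who attends this meeting?" -> row key is the meeting.
persons_by_meeting = wide_rows(attends, key_position=1)
```

The conceptual relationship is the same in both cases; only knowledge of the queries decides which layout, or both, the logical model needs.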
Agenda
• History
• Current Events
• Your Future as a Data Modeler
• Summary
• Q&A
Summary
• Recommit to Conceptual Modeling for Requirements Analysis
  – Some but not all relational-modeling skills will apply
  – Must learn to focus on user communication, not nerdy stuff like intermediate normal forms
Summary
• Remember the fundamentals, so that you can make informed decisions about relaxing them
  – Application-data independence (relax knowingly)
  – Distinguish problems from solutions (relax at your own peril)
  – Consistency level as a user requirement (as you ask, you’ll find immediate consistency is often negotiable)
Summary
• Additional Benefits
  – Users will like you better
  – Agile developers will like you better
  – This framework works in traditional, all-SQL environments
Q&A
• [email protected]
• www.DataQualityStrategies.com