Michael Franklin, UC Berkeley My CIDR Epiphany: Real World Data, Schema, and Environment Michael Franklin UC Berkeley Post SIGMOD PC Research Symposium (old persons track) February 11, 2005

My CIDR Epiphany: Real World Data, Schema, and Environment

Michael Franklin, UC Berkeley

Post SIGMOD PC Research Symposium

(old persons track)

February 11, 2005

How it Happened, or why it sometimes pays to hang around until the end of a conference

• The “gloom and doom” panel

• DeWitt’s gong show challenge

• Grappa consumption & staying up too late

• A great last session on sensor/stream processing, including:
– Jennifer Widom’s Trio Talk
– Shawn Jeffery’s HiFi Talk
– Sam Madden’s Probabilistic Sensor Net Talk

The SIGMOD Credo

Codd made relations,

all else is the work of man.

Leopold Kronecker (paraphrased by Raghu Ramakrishnan)

Database Management: Then

Database Management: Now

The Relational Model (RM) has been tremendously successful, but at a cost

• Shoehorn the world into regular, flat tables.
– This works particularly well for data that looks like regular, flat tables.

• Ignore inconvenient facts about the real world.
– Source of a multi-billion $/yr consulting industry.

• But new applications, environments, devices, and user expectations are finally reaching a tipping point, stretching the model beyond its inherent capabilities.

Relational Model Assumptions: Real World Data

All data in the database is 100% Valid

The facts in the database are self-consistent

Anything outside of the DB does not exist

Time and space are just regular attributes

Data items unambiguously map to real world entities

RM Assumptions: Schema

All data conforms to a strict schema

These schemas and their relationship to the data don't change much

Everyone agrees on the meaning of the data

No one cares where the data came from

RM Assumptions: Environment

Users know exactly what they want to ask of the database

Users want absolute answers (no satisficing)

Queries can be independent of the user’s context

All data is always available

Bridging the Physical Divide

• We need to build systems that more realistically model the real world (and all its ambiguity)

• We need to build systems that support users and conform to their goals, requirements, and habits (not vice versa)

• This is going to require new data and query models, and likely another 30 years of work to get it right.

RM Assumption Cheat Sheet (A baker’s dozen)

Real World Data
1) All data in the database is 100% Valid
2) The facts in the database are self-consistent
3) Anything outside of the DB does not exist
4) Time and space are just regular attributes
5) Data items unambiguously map to real world entities

Schema
6) All data conforms to a strict schema
7) These schemas and their relationship to the data don't change much
8) Everyone agrees on the meaning of the data
9) No one cares where the data came from

Environment
10) Users know exactly what they want to ask of the database
11) Users want absolute answers (no satisficing)
12) Queries can be independent of the user’s context
13) All data is always available