Graceful Database Schema Evolution: the PRISM Workbench
Carlo A. Curino, Hyun J. Moon, and Carlo Zaniolo
VLDB 2008
Presented by: Kristian Torp
Database Specialization Course 2010 2
Motivation
• Problems
  select * from emp
  insert into emp (name, dept) values (:x, :y)
• Current status
  Manual, error-prone, time-consuming process
  Missing methods and tools
Version n:
  Name  Dept
  Jens  HR
  Lone  RD
New requirements lead to Version n+1:
  Name  Dept  Salary
  Jens  HR    10
  Lone  RD    20
“Database Refactoring”
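The effect of the step from version n to version n+1 on the two statements above can be tried out with a minimal sketch. It assumes an in-memory SQLite database, and the extra employee ("Kim", "IT") is a hypothetical sample row that is not on the slide.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Version n: emp(name, dept), with the rows from the slide.
db.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
db.executemany("INSERT INTO emp (name, dept) VALUES (?, ?)",
               [("Jens", "HR"), ("Lone", "RD")])
width_before = len(db.execute("SELECT * FROM emp").fetchone())

# Version n+1: a new requirement adds the salary column.
db.execute("ALTER TABLE emp ADD COLUMN salary INTEGER")
db.execute("UPDATE emp SET salary = 10 WHERE name = 'Jens'")
db.execute("UPDATE emp SET salary = 20 WHERE name = 'Lone'")

# The unchanged "select *" now returns one more column per row,
# which can break application code that relies on the old shape.
width_after = len(db.execute("SELECT * FROM emp").fetchone())

# The unchanged insert still runs, but silently leaves salary unset.
# ("Kim", "IT") is a hypothetical row used only for illustration.
db.execute("INSERT INTO emp (name, dept) VALUES (:x, :y)",
           {"x": "Kim", "y": "IT"})
new_row = db.execute(
    "SELECT name, dept, salary FROM emp WHERE name = 'Kim'").fetchone()
```

Neither statement raises an error here, which is part of the problem: the breakage shows up later, in the application code that consumes the rows.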
Overview
• Challenges
• Schema Modification Operators (SMOs)
• The tool and performance
• Conclusion
• Critique
The Current Situation and Solution
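[Figure: two timelines. “Rock Climbing”: a single large step from v1 to v2 over time. “Stepping Stones”: small steps v1, v2, v3, v4, v5 over time, with forward and backward movement between versions.]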
Main Challenges
• Schema evolution
  Dramatic impact on data and queries
  Extensive application maintenance
  Can cause unacceptable system downtime
• Becoming a bigger problem
  More collaboration (web databases)
  No downtime tolerated (300K$/hour)
  Larger databases
• Method/tool must support
  Predicting and evaluating the effect of schema changes
  Rewriting queries and applications
  Migrating the data
Problem Context: The Power of RDBMSs
[Figure: the database levels Physical, Logical, and View 1, View 2, ..., View n, annotated with the labels “DBMSs” and “Prism”.]
Wikimedia Schema Change
V41
  old(oid, title, user, minor, text, timestamp)
  cur(oid, title, user, minor, text, timestamp, is_new, is_redirect)
V42
  page(pid, title, is_new, is_redirect, latest)
  revision(rid, pageid, user, minor, timestamp)
  text(tid, text)
Schema Change: Effect on Queries
• Get current information about the Auckland entry
V41
  SELECT cur.text
  FROM cur
  WHERE cur.title = 'Auckland'
V42
  SELECT text.text
  FROM page, revision, text
  WHERE page.pid = revision.page
  AND revision.rid = text.tid
  AND page.latest = revision.rid
  AND page.title = 'Auckland'
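To see that the two queries ask for the same information under the two schema versions, here is a minimal sketch that runs both against an in-memory SQLite database. The tables are reduced to the columns the queries touch, the row values are hypothetical sample data, and the column name revision.page follows the query on the slide (the schema diagram lists it as pageid).

```python
import sqlite3

db = sqlite3.connect(":memory:")

# V41, reduced to the columns used by the query (assumption for brevity).
db.execute("CREATE TABLE cur (oid INTEGER PRIMARY KEY, title TEXT, text TEXT)")
db.execute("INSERT INTO cur VALUES (1, 'Auckland', 'current text')")

# V42, again reduced to the columns used by the query.
db.execute("CREATE TABLE page (pid INTEGER PRIMARY KEY, title TEXT, latest INTEGER)")
db.execute("CREATE TABLE revision (rid INTEGER PRIMARY KEY, page INTEGER)")
db.execute("CREATE TABLE text (tid INTEGER PRIMARY KEY, text TEXT)")

# Hypothetical data: one page with an older and a latest revision.
db.execute("INSERT INTO page VALUES (1, 'Auckland', 11)")
db.executemany("INSERT INTO revision VALUES (?, ?)", [(10, 1), (11, 1)])
db.executemany("INSERT INTO text VALUES (?, ?)",
               [(10, "older text"), (11, "current text")])

query_v41 = """
    SELECT cur.text
    FROM cur
    WHERE cur.title = 'Auckland'
"""
query_v42 = """
    SELECT text.text
    FROM page, revision, text
    WHERE page.pid = revision.page
      AND revision.rid = text.tid
      AND page.latest = revision.rid
      AND page.title = 'Auckland'
"""

result_v41 = db.execute(query_v41).fetchall()
result_v42 = db.execute(query_v42).fetchall()
```

The V42 query needs two joins plus the page.latest condition to pick out what V41 stored directly in cur; this is the kind of rewriting that has to be done for every affected query when done by hand.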
SMO: Diagram and Specification
SMO: Examples
partition merge
decompose join
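The two operator pairs above can be illustrated with a minimal sketch in plain SQL, run through an in-memory SQLite database. This is an assumed rendering of what the operators do to schema and data, not PRISM's SMO syntax or the code the tool generates. The table names emp_hr, emp_rest, emp_dept, and emp_salary are hypothetical, and name is assumed to be a key of emp.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emp (name TEXT PRIMARY KEY, dept TEXT, salary INTEGER)")
db.executemany("INSERT INTO emp VALUES (?, ?, ?)",
               [("Jens", "HR", 10), ("Lone", "RD", 20)])
original = db.execute(
    "SELECT name, dept, salary FROM emp ORDER BY name").fetchall()

# partition: split the rows horizontally on a condition.
# "IS NOT" (SQLite) also keeps rows with a NULL dept in the second table.
db.execute("CREATE TABLE emp_hr AS SELECT * FROM emp WHERE dept = 'HR'")
db.execute("CREATE TABLE emp_rest AS SELECT * FROM emp WHERE dept IS NOT 'HR'")

# merge: the inverse, union the two partitions back together.
merged = db.execute("""
    SELECT name, dept, salary FROM emp_hr
    UNION ALL
    SELECT name, dept, salary FROM emp_rest
    ORDER BY name
""").fetchall()

# decompose: split the columns vertically, keeping the key in both tables.
db.execute("CREATE TABLE emp_dept AS SELECT name, dept FROM emp")
db.execute("CREATE TABLE emp_salary AS SELECT name, salary FROM emp")

# join: the inverse, recombine the two tables on the shared key.
joined = db.execute("""
    SELECT d.name, d.dept, s.salary
    FROM emp_dept d JOIN emp_salary s ON d.name = s.name
    ORDER BY d.name
""").fetchall()
```

In both pairs the second operator restores the original rows, which is what makes moving forward and backward between schema versions possible without losing data.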
SMO Overview (Schema and Data)
SMO Characteristics
Prism: The Architecture
Prism: The Workflow
[source: http://yellowstone.cs.ucla.edu:8180/Prism2/Prism.html#]
Prism: A Screen Dump
[source: http://yellowstone.cs.ucla.edu:8180/Prism2/Prism.html#]
Prism: Performance
Conclusion
• Tool that supports schema evolution and data/query migration
  Provides a method (and naturally a tool) for the DBA
  Available online for testing
• Tool used on real-world, non-trivial examples
• Data provenance
  Documentation of what is done
• Schema modification can be almost entirely automated
  97% in the tested cases, i.e., 3% done manually
• Data/query migration performs well
Good
• Relevant problem
  Bridges the theoretical world with practice
  Provides an operational tool
• Real-world, non-trivial running example
• Makes practical use of existing research
• Validation:
  “practical soundness and completeness”
  “theoretical advances into practical solutions”
• Round-trip, i.e., forward and backward
  Data provenance
Could be improved
• What happens if you drop a table: is the data then gone?
  Can you then go back (flash-back)?
  “system stores and maintains the schema layout history….”
• “chase-based query rewriting” (p. 764) unclear
• Better description of key mappings in running example
  Handling of constraints in general
• Too many footnotes
• Minor errors
  Some spelling errors, e.g., “gygabytes”
  DED introduced multiple times
• Table 1 misplaced
• Dislikes SELECT * (p. 763) but uses it later (p. 769)
• Is (unique, perfect) values (no, yes) impossible in Table 3?