lecture 4: transactions (recovery)
DESCRIPTION
Lecture 4: Transactions (Recovery). Wednesday, October 19, 2011. Homework 3. The key concepts here: Connect to db and call SQL from java Dependent joins Integrate two databases Transactions Amount of work: 20 SQL queries+180 lines Java ≈ 12 hours (?). Outline. Transaction basics - PowerPoint PPT PresentationTRANSCRIPT
Dan Suciu -- CSEP544 Fall 2011 1
Lecture 4:Transactions (Recovery)
Wednesday, October 19, 2011
Homework 3The key concepts here:• Connect to db and call SQL from java• Dependent joins• Integrate two databases• TransactionsAmount of work:• 20 SQL queries+180 lines Java ≈ 12 hours (?)
2
Dan Suciu -- CSEP544 Fall 2011 3
Outline
• Transaction basics• Recovery• Concurrency control (next lecture)
Reading Material for Lectures 4/5
Textbook (Ramakrishnan): Ch. 16, 17, 18
Second textbook (Garcia-Molina)• Ch. 17.2, 17.3, 17.4• Ch. 18.1, 18.2, 18.3, 18.8, 18.9
Optional: M. Franklin, Concurrency Control and Recovery
Dan Suciu -- CSEP544 Fall 2011 5
TransactionDefinition: a transaction is a unit of program execution that accesses/updates various data items• It consists of all operations between:
BEGIN TRANSACTIONEND TRANSACTION
• The collection of steps must appear to the user as a single, indivisible unit
Dan Suciu -- CSEP544 Fall 2011
Transaction
6
BEGIN TRANSACTION
[SQL statements]
COMMIT or ROLLBACK (=ABORT)
May be omitted:first SQL query
starts txn
In ad-hoc SQL: each statement = one transaction
Dan Suciu -- CSEP544 Fall 2011
Turing Awards to Database Researchers
• Charles Bachman 1973 for CODASYL
• Edgar Codd 1981 for relational databases
• Jim Gray 1998 for transactions
7
Dan Suciu -- CSEP544 Fall 2011
Implementing TransactionsWhat can go wrong ?
• System crash– Division by 0, power failure
• Interferences with other users– “Anomalies” – 3 have famous names
8
Dan Suciu -- CSEP544 Fall 2011 9
System Crash
What can go wrong?
Client 1:BEGIN TRANSACTIONUPDATE AccountsSET balance= balance - 500WHERE name= ‘Fred’
UPDATE AccountsSET balance = balance + 500WHERE name= ‘Joe’COMMIT
Crash !
Dan Suciu -- CSEP544 Fall 2011 10
1st Famous Anomaly:Lost Update
Client 1:BEGIN TRANSACTION
UPDATE CustomerSET rentals= rentals + 1WHERE cname= ‘Fred’
COMMIT
Client 2:BEGIN TRANSACTION
UPDATE CustomerSET rentals= rentals + 1WHERE cname= ‘Fred’
COMMIT
What can go wrong ?
Dan Suciu -- CSEP544 Fall 2011 11
2nd Famous Anomaly:Inconsistent Read
Client 1: move from gizmogadgetBEGIN TRANSACTIONUPDATE Products SET quantity = quantity + 5WHERE product = ‘gizmo’
UPDATE Products SET quantity = quantity - 5WHERE product = ‘gadget’COMMIT
Client 2: inventory….BEGIN TRANSACTIONSELECT sum(quantity) FROM ProductCOMMIT
What can go wrong ?
3rd Famous Anomaly:Dirty Reads
What can go wrong ?
-- Client 1: transfer $100 acc1 acc2BEGIN TRANSACTIONX = Account1.balance; Account2.balance += 100
If (X>=100 and other stuff OK…) { Account1.balance -=100; COMMIT }else {print(“Denied !”); ROLLBACK}
-- Client 2: transfer $100 acc2 acc3BEGIN TRANSACTIONX = Account2.balance; Account3.balance += 100
If (X>=100 and other stuff OK…) { Account2.balance -=100; COMMIT }else {print(“Denied !”); ROLLBACK}
13
The Three Famous anomalies• Lost update
– Two tasks T and T’ both modify the same data– T and T’ both commit– Final state shows effects of only T, but not of T’
• Dirty read– T reads data written by T’ while T’ has not committed– What can go wrong: T’ write more data (which T has
already read), or T’ aborts
• Inconsistent read– One task T sees some but not all changes made by T’
Dan Suciu -- CSEP544 Fall 2011 14
ACID Properties• Atomic
– State shows either all the effects of txn, or none of them
• Consistent– Txn moves from a state where integrity holds, to
another where integrity holds• Isolated
– Effect of txns is the same as txns running one after another (ie looks like batch mode)
• Durable– Once a txn has committed, its effects remain in the
database
Dan Suciu -- CSEP544 Fall 2011 15
ACID: Atomicity
• Two possible outcomes for a transaction– It commits: all the changes are made– It aborts: no changes are made
• That is, transaction’s activities are all or nothing
Dan Suciu -- CSEP544 Fall 2011 16
ACID: Isolation
• A transaction executes concurrently with other transaction
• Isolation: the effect is as if each transaction executes in isolation of the others
17
ACID: Consistency• The database satisfies integrity constraints
– Account numbers are unique– Stock amount can’t be negative– Sum of debits and of credits is 0
• Consistency = if the database satisfied the constraints at the beginning of the transaction, and if the application is written correctly, then the constraints must hold at the end of the transactions
• Introduced as a requirement in the 70s, but today we understand it is a consequence of atomicity and isolation
Dan Suciu -- CSEP544 Fall 2011 18
ACID: Durability
• The effect of a transaction must continue to exists after the transaction, or the whole program has terminated
• Means: write data to disk
• Sometimes also means recovery
Dan Suciu -- CSEP544 Fall 2011 19
Reasons for Rollback
• Explicit in the application– E.g. use it freely in HW 3
• System-initiated abort– System crash– Housekeeping, e.g. due to timeouts
Dan Suciu -- CSEP544 Fall 2011 20
Outline
• Recovery from failures (the A in ACID)– Today
• Concurrency Control (the C in ACID)– Next lecture
Dan Suciu -- CSEP544 Fall 2011 21
Log-based Recovery
Simple recovery algorithms (Garcia-Molina Ch. 17.2, 17.3, 17.4)
• Undo logging• Redo logging• Redo/undo loggingState of the art (Ramakrishnan Ch. 18)• Aries
22
Buffer Management in a DBMS
DB
disk page
free frame
BUFFER POOLREADWRITE
INPUTOUTUPT
Application(Database server)
Large gap between disk I/O and memory Buffer pool
Page Replacement Policies
• LRU = expensive– Next slide
• Clock algorithm = cheaper alternative– Read in the book
Both work well in OS, but not always in DB
Dan Suciu -- CSEP544 Fall 2011 23
Least Recently Used (LRU)
Dan Suciu -- CSEP544 Fall 2011 24
P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)
??
Most recent Least recent
Least Recently Used (LRU)
Dan Suciu -- CSEP544 Fall 2011 25
P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)
P6, P5, P2, P8, P4, P1, P9, P3, P7
Most recent Least recent
Least Recently Used (LRU)
Dan Suciu -- CSEP544 Fall 2011 26
P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)
P6, P5, P2, P8, P4, P1, P9, P3, P7
Read(P10)
??
Most recent Least recent
Least Recently Used (LRU)
Dan Suciu -- CSEP544 Fall 2011 27
P5, P2, P8, P4, P1, P9, P6, P3, P7Read(P6)
P6, P5, P2, P8, P4, P1, P9, P3, P7
Input(P10)
P10, P6, P5, P2, P8, P4, P1, P9, P3
Read(P10)
Most recent Least recent
28
RecoveryType of Crash Prevention
Wrong data entry Constraints andData cleaning
Disk crashes Redundancy: e.g. RAID, archive
Fire, theft, bankruptcy… Remote backups
System failures:e.g. power
DATABASERECOVERY
Dan Suciu -- CSEP544 Fall 2011 29
Transactions
• Assumption: the database is composed of elements– Usually 1 element = 1 block– Can be smaller (=1 record) or larger (=1
relation)• Assumption: each transaction consists
of a sequence of reads/writes of elements
Dan Suciu -- CSEP544 Fall 2011 30
Primitive Operations of Transactions
• READ(X,t)– copy element X to transaction local variable t
• WRITE(X,t)– copy transaction local variable t to element X
• INPUT(X)– read element X to memory buffer
• OUTPUT(X)– write element X to disk
31
Example
Atomicity:Both A and Bare multiplied by 2,or none is.
Dan Suciu -- CSEP544 Fall 2011
BEGIN TRANSACTIONREAD(A,t); t := t*2;WRITE(A,t); READ(B,t); t := t*2;WRITE(B,t)COMMIT;
32
Action t Mem A Mem B Disk A Disk B
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT
Buffer pool DiskTransaction
READ(A,t); t := t*2; WRITE(A,t); READ(B,t); t := t*2; WRITE(B,t)
Is this bad ?
Crash !
Action t Mem A Mem B Disk A Disk B
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT
Is this bad ?
Crash !
Action t Mem A Mem B Disk A Disk B
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT
Yes it’s bad: A=16, B=8….
Is this bad ?
Crash !
Action t Mem A Mem B Disk A Disk B
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT
Is this bad ?
Crash !
Action t Mem A Mem B Disk A Disk B
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT
Yes it’s bad: A=B=16, but not committed
Is this bad ?
Crash !
Action t Mem A Mem B Disk A Disk B
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT
Is this bad ?
Crash !
Action t Mem A Mem B Disk A Disk B
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT
No: that’s OK
39
Atomic Transactions
• FORCE or NO-FORCE– Should all updates of a transaction be forced to
disk before the transaction commits?• STEAL or NO-STEAL
– Can an update made by an uncommitted transaction overwrite the most recent committed value of a data item on disk?
Dan Suciu -- CSEP544 Fall 2011
Dan Suciu -- CSEP544 Fall 2011 40
Force/No-steal
• FORCE: Pages of committed transactions must be forced to disk before commit
• NO-STEAL: Pages of uncommitted transactions cannot be written to disk
Easy to implement (how?) and ensures atomicity
41
No-Force/Steal
• NO-FORCE: Pages of committed transactions need not be written to disk
• STEAL: Pages of uncommitted transactions may be written to disk
Dan Suciu -- CSEP544 Fall 2011
In either case, Atomicity is violated; need WAL
Dan Suciu -- CSEP544 Fall 2010
42
Write-Ahead LogThe Log: append-only file containing log records• Enables NO-FORCE and STEAL• Records every single action of every TXN• Force log entry to disk• After a system crash, use log to recoverThree types: UNDO, REDO, UNDO-REDO
43
UNDO Log
Dan Suciu -- CSEP544 Fall 2011
FORCE and STEAL
Dan Suciu -- CSEP544 Fall 2011 44
Undo LoggingLog records• <START T>
– transaction T has begun• <COMMIT T>
– T has committed• <ABORT T>
– T has aborted• <T,X,v>
– T has updated element X, and its old value was v
45
Action t Mem A Mem B Disk A Disk B UNDO Log
<START T>
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,8>
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,8>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT <COMMIT T>
46
Crash !
WHAT DO WE DO ?
Action t Mem A Mem B Disk A Disk B UNDO Log
<START T>
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,8>
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,8>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT <COMMIT T>
47
Crash !
WHAT DO WE DO ?
Action t Mem A Mem B Disk A Disk B UNDO Log
<START T>
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,8>
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,8>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT <COMMIT T>
We UNDO by setting B=8 and A=8
48Crash !
Action t Mem A Mem B Disk A Disk B UNDO Log
<START T>
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,8>
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,8>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT <COMMIT T>
What do we do now ?
49Crash !
Action t Mem A Mem B Disk A Disk B UNDO Log
<START T>
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,8>
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,8>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT <COMMIT T>
What do we do now ? Nothing: log contains COMMIT
Dan Suciu -- CSEP544 Fall 2011 50
After Crash
• In the first example:– We UNDO both changes: A=8, B=8– The transaction is atomic, since none of its actions
has been executed
• In the second example– We don’t undo anything– The transaction is atomic, since both it’s actions
have been executed
Dan Suciu -- CSEP544 Fall 2011 51
Recovery with Undo Log
After system’s crash, run recovery manager
• Idea 1. Decide for each transaction T whether it is completed or not– <START T>….<COMMIT T>…. = yes– <START T>….<ABORT T>……. = yes– <START T>……………………… = no
• Idea 2. Undo all modifications by incomplete transactions
52
Recovery with Undo Log……<T6,X6,v6>……<START T5><START T4><T1,X1,v1><T5,X5,v5><T4,X4,v4><COMMIT T5><T3,X3,v3><T2,X2,v2>
Question1: Which updatesare undone ?
Question 2:How far backdo we need toread in the log ?
Question 3:What happens if thereis a second crash,during recovery ?Crash
Dan Suciu -- CSEP544 Fall 2011 53
When Recovery Fails
• Redo the recovery!• Reason: all undo commands are
idempotent– If we perform them a second time, no
harm is done–E.g. if there is a system crash during
recovery, simply restart recovery from scratch
Action t Mem A Mem B Disk A Disk B UNDO Log
<START T>
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,8>
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,8>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT <COMMIT T>
When mustwe force pagesto disk ?
?
?
55
Action t Mem A Mem B Disk A Disk B UNDO Log
<START T>
INPUT(A) 8 8 8
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,8>
INPUT(B) 16 16 8 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,8>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
COMMIT <COMMIT T>
RULES: log entry before OUTPUT before COMMIT
FORCE
Dan Suciu -- CSEP544 Fall 2011 56
Undo-Logging Rules
U1: If T modifies X, then <T,X,v> must be written to disk before OUTPUT(X)
U2: If T commits, then OUTPUT(X) must be written to disk before <COMMIT T>
• Hence: OUTPUTs are done early, before the transaction commits
FORCE
Dan Suciu -- CSEP544 Fall 2011 57
Recovery with Undo Log
When do we stop reading the log ?• We cannot stop until we reach the
beginning of the log file• This is impractical
Instead: use checkpointing
Dan Suciu -- CSEP544 Fall 2011 58
Checkpointing
Checkpoint the database periodically• Stop accepting new transactions• Wait until all current transactions
complete• Flush log to disk• Write a <CKPT> log record, flush• Resume transactions
59
Undo Recovery with Checkpointing
……<T9,X9,v9>……(all completed)<CKPT><START T2><START T3<START T5><START T4><T1,X1,v1><T5,X5,v5><T4,X4,v4><COMMIT T5><T3,X3,v3><T2,X2,v2>
During recovery,Can stop at first<CKPT>
transactions T2,T3,T4,T5
other transactions
60
Nonquiescent Checkpointing
• Problem with checkpointing: database freezes during checkpoint
• Would like to checkpoint while database is operational
• Idea: nonquiescent checkpointing
Quiescent = being quiet, still, or at rest; inactiveNon-quiescent = allowing transactions to be active
Dan Suciu -- CSEP544 Fall 2011 61
Nonquiescent Checkpointing
• Write a <START CKPT(T1,…,Tk)>where T1,…,Tk are all active transactions
• Continue normal operation• When all of T1,…,Tk have completed,
write <END CKPT>
62
Undo Recovery with Nonquiescent Checkpointing
………………<START CKPT T4, T5, T6>…………<END CKPT>………
During recovery,Can stop at first<CKPT>
T4, T5, T6, plus later transactions
earlier transactions plus T4, T5, T6
later transactionsQ: do we need <END CKPT> ?
Dan Suciu -- CSEP544 Fall 2011
Implementing ROLLBACK
• A transaction ends in COMMIT or ROLLBACK• Use the undo-log to implement ROLLBCACK
• LSN = Log Seqence Number• Log entries for the same transaction are
linked, using the LSN’s• Read log in reverse, using LSN pointers
63
64
REDO Log
Dan Suciu -- CSEP544 Fall 2011
NO-FORCE and NO-STEAL
Dan Suciu -- CSEP544 Fall 2011 65
Redo Logging
Log records• <START T> = transaction T has begun• <COMMIT T> = T has committed• <ABORT T>= T has aborted• <T,X,v>= T has updated element X, and
its new value is v
66
Action t Mem A Mem B Disk A Disk B
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
COMMIT
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16Crash !
Is this bad ?
67
Action t Mem A Mem B Disk A Disk B
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
COMMIT
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16Crash !
Is this bad ? Yes, it’s bad: A=16, B=8
68
Action t Mem A Mem B Disk A Disk B
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
COMMIT
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
Crash !
Is this bad ?
69
Action t Mem A Mem B Disk A Disk B
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8
COMMIT
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
Crash !
Is this bad ? Yes, it’s bad: T committed but A=B=8
70
Action t Mem A Mem B Disk A Disk B REDO Log
<START T>
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,16>
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,16>
COMMIT <COMMIT T>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
71
Action t Mem A Mem B Disk A Disk B REDO Log
<START T>
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,16>
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,16>
COMMIT <COMMIT T>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16Crash !
How do we recover ?
72
Action t Mem A Mem B Disk A Disk B REDO Log
<START T>
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,16>
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,16>
COMMIT <COMMIT T>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16Crash !
How do we recover ? We REDO by setting A=16 and B=16
73
Recovery with Redo Log
After system’s crash, run recovery manager
• Step 1. Decide for each transaction T whether we need to redo or not– <START T>….<COMMIT T>…. = yes– <START T>….<ABORT T>……. = no– <START T>……………………… = no
• Step 2. Read log from the beginning, redo all updates of committed transactions
Dan Suciu -- CSEP544 Fall 2011 74
Recovery with Redo Log<START T1><T1,X1,v1><START T2><T2, X2, v2><START T3><T1,X3,v3><COMMIT T2><T3,X4,v4><T1,X5,v5>
Show actionsduring recovery
75
Action t Mem A Mem B Disk A Disk B REDO Log
<START T>
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,16>
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,16>
COMMIT <COMMIT T>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
When mustwe force pagesto disk ?
?
?
76
Action t Mem A Mem B Disk A Disk B REDO Log
<START T>
READ(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,16>
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,16>
COMMIT <COMMIT T>
OUTPUT(A) 16 16 16 16 8
OUTPUT(B) 16 16 16 16 16
RULE: OUTPUT after COMMIT
NO-STEAL
Dan Suciu -- CSEP544 Fall 2011 77
Redo-Logging Rules
R1: If T modifies X, then both <T,X,v> and <COMMIT T> must be written to disk before OUTPUT(X)
• Hence: OUTPUTs are done late
NO-STEAL
78
Checkpointing
• To ensure recovery we must read from the beginning of the log: this is too inefficient
• Checkpointing: periodically write information to the log to allow us to process only the tail of the log
Dan Suciu -- CSEP544 Fall 2011
Dan Suciu -- CSEP544 Fall 2011 79
Nonquiescent Checkpointing
• Write a <START CKPT(T1,…,Tk)>where T1,…,Tk are all active transactions
• Flush to disk all blocks of committed transactions (dirty blocks), while continuing normal operation
• When all blocks have been flushed, write <END CKPT>
Note: this differs significantly from ARIES (later this lecture)
80
Redo Recovery with Nonquiescent Checkpointing
…<START T1>…<COMMIT T1>…<START T4>…<START CKPT T4, T5, T6>…………<END CKPT>………<START CKPT T9, T10>…
Step 1: look forThe last<END CKPT>
Step 2: redofrom theearlieststart ofT4, T5, T6ignoringtransactionscommittedearlier
All OUTPUTs of T1 are guaranteedto be on disk
Cannotuse
81
Comparison Undo/Redo
• Undo logging: OUTPUT must be done early: – Inefficient
• Redo logging: OUTPUT must be done late: – Inflexible
Dan Suciu -- CSEP544 Fall 2011
82
UNDO/REDO Log
Dan Suciu -- CSEP544 Fall 2011
NO-FORCE and STEAL
Dan Suciu -- CSEP544 Fall 2011 83
Undo/Redo Logging
Log records, only one change• <T,X,u,v>= T has updated element X, its
old value was u, and its new value is v
Action T Mem A Mem B Disk A Disk B Log
<START T>
REAT(A,t) 8 8 8 8
t:=t*2 16 8 8 8
WRITE(A,t) 16 16 8 8 <T,A,8,16>
READ(B,t) 8 16 8 8 8
t:=t*2 16 16 8 8 8
WRITE(B,t) 16 16 16 8 8 <T,B,8,16>
OUTPUT(A) 16 16 16 16 8
<COMMIT T>
OUTPUT(B) 16 16 16 16 16
Can OUTPUT whenever we want: before/after COMMIT
Dan Suciu -- CSEP544 Fall 2011 85
Recovery with Undo/Redo Log
After system’s crash, run recovery manager • Redo all committed transaction, top-down• Undo all uncommitted transactions, bottom-
up
Dan Suciu -- CSEP544 Fall 2011 86
Recovery with Undo/Redo Log<START T1><T1,X1,v1><START T2><T2, X2, v2><START T3><T1,X3,v3><COMMIT T2><T3,X4,v4><T1,X5,v5>
Dan Suciu -- CSEP544 Fall 2011 87
Undo/Redo-Logging Rule
UR1: If T modifies X, then <T,X,u,v> must be written to disk before OUTPUT(X)
Note: we are free to OUTPUT early or late relative to <COMMIT T>
NO-FORCE and STEAL
88
ARIES
Dan Suciu -- CSEP544 Fall 2011
Dan Suciu -- CSEP544 Fall 2010
89
Aries
• ARIES pieces together several techniques into a comprehensive algorithm
• Developed at IBM Almaden, by Mohan• IBM botched the patent, so everyone
uses it now• Several variations, e.g. for distributed
transactions
90
ARIES Recovery Manager
• A redo/undo log• Physiological logging
– Physical logging for REDO– Logical logging for UNDO
• Efficient checkpointing
Dan Suciu -- CSEP544 Fall 2011
Why ?
91
ARIES Recovery Manager
Log entries:• <START T> -- when T begins• Update: <T,X,u,v>
– T updates X, old value=u, new value=v– In practice: undo only and redo only entries
• <COMMIT T> or <ABORT T>• CLR’s – we’ll talk about them later.
Dan Suciu -- CSEP544 Fall 2011
92
ARIES Recovery Manager
Rule:• If T modifies X, then <T,X,u,v> must be
written to disk before OUTPUT(X)
We are free to OUTPUT early or late
Dan Suciu -- CSEP544 Fall 2011
93
LSN = Log Sequence Number• LSN = identifier of a log entry
– Log entries belonging to the same txn are linked
• Each page contains a pageLSN:– LSN of log record for latest update to that page– Will serve to determine if an update needs to be
redone
Dan Suciu -- CSEP544 Fall 2011
94
ARIES Data Structures• Active Transactions Table
– Lists all running txns (active txns)– For each txn: lastLSN = most recent update by txn
• Dirty Page Table– Lists all dirty pages– For each dirty page: recoveryLSN (recLSN)= first LSN
that caused page to become dirty• Write Ahead Log
– LSN, prevLSN = previous LSN for same txn
Dan Suciu -- CSEP544 Fall 2011
ARIES Data Structures
pageID recLSNP5 102
P6 103
P7 101
LSN prevLSN transID pageID Log entry101 - T100 P7
102 - T200 P5
103 102 T200 P6
104 101 T100 P5
Dirty pages Log (WAL)
transID lastLSNT100 104
T200 103
Active transactionsP8 P2 . . .
. . .
P5PageLSN=104
P6PageLSN=103
P7PageLSN=101
Buffer Pool
WT100(P7)WT200(P5)WT200(P6)WT100(P5)
96
ARIES Normal Operation
T writes page P• What do we do ?
Dan Suciu -- CSEP544 Fall 2011
97
ARIES Normal Operation
T writes page P• What do we do ?
• Write <T,P,u,v> in the Log• pageLSN=LSN• lastLSN=LSN• recLSN=if isNull then LSN
Dan Suciu -- CSEP544 Fall 2011
98
ARIES Normal Operation
Buffer manager wants to OUTPUT(P)• What do we do ?
Buffer manager wants INPUT(P)• What do we do ?
Dan Suciu -- CSEP544 Fall 2011
99
ARIES Normal Operation
Buffer manager wants to OUTPUT(P)• Flush log up to pageLSN• Remove P from Dirty Pages tableBuffer manager wants INPUT(P)• Create entry in Dirty Pages table
recLSN = NULL
Dan Suciu -- CSEP544 Fall 2011
100
ARIES Normal Operation
Transaction T starts• What do we do ?
Transaction T commits/aborts• What do we do ?
Dan Suciu -- CSEP544 Fall 2011
101
ARIES Normal Operation
Transaction T starts• Write <START T> in the log• New entry T in Active TXN;
lastLSN = nullTransaction T commits/aborts• Write <COMMIT T> in the log• Flush log up to this entry
Dan Suciu -- CSEP544 Fall 2011
102
Checkpoints
Write into the log
• Entire active transactions table• Entire dirty pages table
Dan Suciu -- CSEP544 Fall 2011
Recovery always starts by analyzing latest checkpoint
Background process periodically flushes dirty pages to disk
103
ARIES Recovery1. Analysis pass
– Figure out what was going on at time of crash– List of dirty pages and active transactions
2. Redo pass (repeating history principle)– Redo all operations, even for transactions that will not commit– Get back to state at the moment of the crash
3. Undo pass– Remove effects of all uncommitted transactions– Log changes during undo in case of another crash during undo
Dan Suciu -- CSEP544 Fall 2011
104
ARIES Method Illustration
[Figure 3 from Franklin97]Dan Suciu -- CSEP544 Fall 2011
First undo and first redo log entry might bein reverse order
105
1. Analysis Phase• Goal
– Determine point in log where to start REDO– Determine set of dirty pages when crashed
• Conservative estimate of dirty pages– Identify active transactions when crashed
• Approach– Rebuild active transactions table and dirty pages table– Reprocess the log from the checkpoint
• Only update the two data structures– Compute: firstLSN = smallest of all recoveryLSN
Dan Suciu -- CSEP544 Fall 2011
1. Analysis Phase(crash)Checkpoint
Dirtypages
Activetxn
Log
pageID recLSN pageID
transID lastLSN transID
firstLSN= ??? Where do we startthe REDO phase ?
1. Analysis Phase(crash)Checkpoint
Dirtypages
Activetxn
Log
pageID recLSN pageID
transID lastLSN transID
firstLSN=min(recLSN)
1. Analysis Phase(crash)Checkpoint
Dirtypages
Activetxn
Log
pageID recLSN pageID
transID lastLSN transID
pageID recLSN pageID
transID lastLSN transID
Replayhistory
firstLSN
109
2. Redo Phase
Main principle: replay history• Process Log forward, starting from
firstLSN• Read every log record, sequentially• Redo actions are not recorded in the log• Needs the Dirty Page Table
Dan Suciu -- CSEP544 Fall 2011
110
2. Redo Phase: Details
For each Log entry record LSN: <T,P,u,v>• Re-do the action P=u and WRITE(P)• But which actions can we skip, for
efficiency ?
Dan Suciu -- CSEP544 Fall 2011
111
2. Redo Phase: Details
For each Log entry record LSN: <T,P,u,v>• If P is not in Dirty Page then no update• If recLSN > LSN, then no update• INPUT(P) (read page from disk):
If pageLSN > LSN, then no update• Otherwise perform update
Dan Suciu -- CSEP544 Fall 2011
112
2. Redo Phase: Details
What happens if system crashes during REDO ?
Dan Suciu -- CSEP544 Fall 2011
113
2. Redo Phase: Details
What happens if system crashes during REDO ?
We REDO again ! Each REDO operation is idempotent: doing it twice is the as as doing it once.
Dan Suciu -- CSEP544 Fall 2011
Dan Suciu -- CSEP544 Fall 2011 114
3. Undo Phase
• Cannot “unplay” history, in the same way as we “replay” history
• WHY NOT ?
Dan Suciu -- CSEP544 Fall 2011 115
3. Undo Phase
• Cannot “unplay” history, in the same way as we “replay” history
• WHY NOT ?
• Need to support ROLLBACK: selective undo, for one transaction
• Hence, logical undo v.s. physical redo
116
3. Undo Phase
Main principle: “logical” undo• Start from end of Log, move backwards• Read only affected log entries• Undo actions are written in the Log as special
entries: CLR (Compensating Log Records)• CLRs are redone, but never undone
Dan Suciu -- CSEP544 Fall 2011
117
3. Undo Phase: Details• “Loser transactions” = uncommitted
transactions in Active Transactions Table
• ToUndo = set of lastLSN of loser transactions
Dan Suciu -- CSEP544 Fall 2011
118
3. Undo Phase: Details
While ToUndo not empty:• Choose most recent (largest) LSN in ToUndo• If LSN = regular record <T,P,u,v>:
– Undo v– Write a CLR where CLR.undoNextLSN = LSN.prevLSN
• If LSN = CLR record:– Don’t undo !
• if CLR.undoNextLSN not null, insert in ToUndootherwise, write <END TRANSACTION> in log
Dan Suciu -- CSEP544 Fall 2011
119
3. Undo Phase: Details
[Figure 4 from Franklin97]
Dan Suciu -- CSEP544 Fall 2011
120
3. Undo Phase: Details
What happens if system crashes during UNDO ?
Dan Suciu -- CSEP544 Fall 2011
121
3. Undo Phase: Details
What happens if system crashes during UNDO ?
We do not UNDO again ! Instead, each CLR is a REDO record: we simply redo the undo
Dan Suciu -- CSEP544 Fall 2011
122
Physical v.s. Logical Loging
Why are redo records physical ?
Why are undo records logical ?
Dan Suciu -- CSEP544 Fall 2011
123
Physical v.s. Logical Loging
Why are redo records physical ?• Simplicity: replaying history is easy, and
idempotent
Why are undo records logical ?• Required for transaction rollback: this not
“undoing history”, but selective undo
Dan Suciu -- CSEP544 Fall 2011