Dai Min
School of Computer Science and Technology, Tianjin University of
Technology
Advanced Database Technology
Slides adapted from material by Profs. Jeff Ullman
(Stanford) and Art Keller (UCSC)
?2006Dai Min
2
§Major concern so far:
?How to manipulate database efficiently?
§Now shift focus to:
?How to ensure correctness of database manipulation?
?2006Dai Min 3Integrity or correctness of data
§Would like data to be “accurate”or
“correct ”at all times
EMP
Name White
Green Gray
Age 5234211
?2006Dai Min 4
Integrity or consistency constraints
§Predicates that the data must satisfy §Examples:
-x is key of relation R
-x →y holds in R
-Domain(x) = {Red, Blue, Green}-no employee should make more than twice the average salary
Definition:
§Consistent state:satisfies all constraints §Consistent DB:DB in a consistent state
Consistency
§DB cannot be consistent always!
Example:a1 + a2 +…. an = TOT (constraint)
Transaction: Deposit $100 in a2: a2 ←a2 + 100
TOT ←TOT + 100
..50..1000
..150..1000
.
.150..1100
a2TOT
State 1:consistent
State 2:“inconsistent”
State 3:consistent
?2006Dai Min
7
Consistency
§事务的ACID 性质
?Atomicity, Consistency, Isolation, Durability
consistency of transaction Consistent DB
Consistent DB’
T
但事务内部不保证DB 的一致性
?2006Dai Min
8
Correctness
DB
Reality
DB should reflect real world
In general, various policies to ensure that the DB corresponds to "real world"
?2006Dai Min
9
Correctness
DB should reflect real world
Example :A telephone number 3601123 --correct abcdefg --not correct
Can be preserved by explicit constraints !
?2006Dai Min
10
Correctness
DB should reflect real world
Example :A telephone number 3601123 --correct 9999999 --Is it correct ?
Not correct in reality, but can DB know this?
Answer : NO!
Correctness
§Correctness of DB ≠Correctness of reality Correctness of DB
如果数据库在事务开始执行时是一致的,并且事务执行结束后数据库仍处于一致状态,则数据库满足正确性.
Consistency of DB + ACID of transaction
Correctness of DB
Transaction: collection of actions
that preserve consistency (≈process)
Consistent DB
Consistent DB’
T
SQL: each query or modification statement Embedded SQL: Sequence of DB operations up to COMMIT or ROLLBACK ("abort")
?2006Dai Min
13Big assumption:
If transaction T
starts with consistent state +executes in isolation ?T leaves consistent state
Correctness (informally)
§If we stop running transactions, DB left consistent
§Each transaction sees a consistent DB
?2006Dai Min 14
How can constraints be violated?
§Transaction bug §DBMS bug
§Hardware failure
e.g., disk crash alters balance of account
§Simultaneous transactions accessing shared data
e.g.: T1: give 10% raise to programmers
T2: change programmers ?systems analysts
?2006Dai Min 15How can we prevent/fix violations?
§Chapter [17]: due to failures only
§Chapter [18]: due to data concurrency only §Chapter [19]: due to failures and concurrency
?2006Dai Min 16
Will not consider:
§How to write correct transactions §How to write correct DBMS §Constraint checking & repair
That is, solutions studied here do not need to know constraints
Outline
§Failure Model §Undo Log §Redo Log
§Undo/Redo log §Checkpoint §Media failure
Chapter [17]:Crash Recovery here
Events
Desired Undesired
Expected Unexpected
1. Failure Model
?2006Dai Min
191) Context of failure model
processor
memory
disk
-volatile,
-nonvolatile
disappears when power off
CPU
M
D
?2006Dai Min 20
Desired events:see product manuals….Undesired expected events:System failures
-memory lost -CPU halts, resets
Undesired Unexpected:Everything else!
that ’s it!!
?2006Dai Min 21Examples:
§Disk data is lost
§Memory lost without CPU halt
§Aliens attack and destroy the building….
Undesired Unexpected:Everything else!
(can protect against this by archive backups)
?2006Dai Min 22
Is this model reasonable?
Approach:Add low level checks +
redundancy to increase probability that the model holds E.g., Replicate disk storage (stable store)
Memory parity CPU checks
2) Interacting Address Spaces:
Storage hierarchy
Memory Disk
x
x
t
Input(x)
Output(x)
Read(x,t)
Write(x,t)
§Input (x): block containing x →memory §Output (x): block containing x →disk §Read (x,t): t ←value of x in block
(do input(x) if necessary)
§Write (x,t): value of x in block ←t
Note on "database elements" X:
§Anything that can have value and be accessed of modified by transactions:
?a relation (or extent of an object class)?a disk block ?a record
§Easiest to use blocks as database elements
?2006Dai Min 253) Key problem:Unfinished transaction
Example
Constraint: A=B T 1: A ← A ×2
B ← B ×2
?2006Dai Min
26
T 1:Read (A,t); t ←t ×2
Write (A,t);
Read (B,t); t ←t ×2Write (B,t);Output (A);
Output (B);
A: 8B: 8A: 8B: 8
memory
disk
16
16failure! ?>A != B 16
?2006Dai Min 27
§Need atomicity:either execute all actions of a transaction or none at all
?2006Dai Min 28
§Failure Model §Undo Log §Redo Log
§Undo/Redo log §Checkpoint §Media failure
Outline
here
One solution:undo logging (immediate
modification)
1) Log:sequence of log records, recording
actions of transactions T i :
old value was v
2. Undo Logging
T 1:Read (A,t); t ←t ×2
Write (A,t);
Read (B,t); t ←t ×2Write (B,t);Output (A);Output (B);
A:8B:8A:8B:8memory
disk
log
Undo logging (Immediate modification)
1616
16
constraint:A=B
?2006Dai Min 31One “complication ”
§Log is first written in memory §Not written to disk on every action
memory DB Log
A: 8 16B: 8 16Log:
A: 8B: 8
16BAD STATE
# 1
?2006Dai Min 32
One “complication ”
§Log is first written in memory §Not written to disk on every action
memory DB Log
A: 8 16B: 8 16Log:
A: 8B: 8
16BAD STATE
# 2
...
?2006Dai Min
332) Undo logging rules
(1) Generate an undo log record for every update action (with the old value)(2) Before modifying element X on disk, any log records for X must be flushed to disk (write -ahead logging)(3) All WRITE of transaction T must be OUTPUT to disk before
?2006Dai Min
34
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log
< T, Start >
Log
Initial:A=1000B=2000
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log
Fail here
A:900B:1900
A:1000B:2000
…
< T, Start >
…
Memory
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log
Fail here
A:900B:1900
A:900B:2000
…
< T, Start >
…
< T,Start >
Memory
Disk
?2006Dai Min
37
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log
Fail here
A:900B:1900
A:900
B:1900
…
< T, Start >
…
< T, Start >
Memory
Disk
?2006Dai Min 38
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log
Success!
Memory
A:900B:1900
A:900B:1900
…
< T, Start >
…
< T, Start >
Disk
?2006Dai Min
393) Recovery rules: Undo logging
§For every Ti with
-If < Ti,commit > or < Ti,abort >
in log, do nothing
-Else For all < Ti, X , v > in log:
write (X, v )output (X )
Write
yIS THIS CORRECT??
?2006Dai Min 40
T 1: A ←A+1; B ←B+1; T 2: A ←A+1; B ←B+1;
A:10B:10
DB on disk
log
Recovery with Undo logging
A should be restored to 10, not 11
11constraint:A=B
Undo logging Recovery :(more precisely)
(1) Let S = set of transactions with
(2) For each
in reverse order (latest →earliest) do:-if Ti ∈S then -write (X, v)
-output (X)
(3) For each Ti ∈S do
-write
§Failure Model §Undo Log §Redo Log
§Undo/Redo log §Checkpoint §Media failure
Outline
here
?2006Dai Min 433. Redo logging (deferred modification)
T 1:Read(A,t); t t ×2; write (A,t);
Read(B,t); t t ×2; write (B,t);Output(A); Output(B)
A: 8B: 8A: 8B: 8
memory
DB
LOG
1616
output
16?2006Dai Min 44
1) Redo logging rules
(1) For every disk update, generate a redo log record (with new value)
(2) Before modifying DB element X on disk, all log records for the transaction that (including COMMIT) must be flushed to disk
?2006Dai Min 45
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);
Log
Initial:A=1000B=2000
?2006Dai Min
46
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);
Fail here
A:900B:1900
A:1000B:2000
…
…
Memory
Disk
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);
Fail here
A:900B:1900
A:1000B:2000
…
…
T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);
Fail here
A:900B:1900
A:900B:2000
…
…
?2006Dai Min 49T1:
Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);
Fail here
A:900B:1900
A:900B:1900
…
< T, Start>
…
< T, Start>
Disk
?2006Dai Min 50
§For every Ti with
?For all
Write(X, v)Output(X)
2) Recovery rules:Redo logging
This time, the last updates should remain in effect
?2006Dai Min 51(1) Let S = set of transactions with
-if Ti ∈S then Write (X, v)
Output(X)
Recovery with Redo Logging:(more precisely)
?2006Dai Min 52
Recovery is very, very
SLOW !
Redo log:
First T1 wrote A,B
Last Record Committed a year ago
Record
(1 year ago)--> STILL, Need to redo after crash!!
...
...
...
Crash
3) Solution:Checkpointing
(for redo -logging , simple version)
Periodically:
(1) Stop accepting new transactions
(2) Wait all transactions to finish (commit/abort)(3) Flush all log records to disk (log)(4) Flush all buffers to disk (DB)
(5) Write & flush a
Example: what to do at recovery?
Redo log (on disk):
C h e c k p o i n t
Crash
..................
Result of transactions completed prior to the checkpoint are stored permanently on disk
-> Redo
write(B, 17)output(B)
?2006Dai Min
55Undo vs. Redo
§Undo 基于立即更新(Immediately Update)§Redo 基于延迟更新(Deferred Update)
?2006Dai Min
56
T1:
Read (A,t); t ←t -100;Write (A,t);Flush Log Output (A);Read (B,t);t ←t -100;Write (B,t);
Flush Log
Output (B);
Flush Log
T2:
Read (A,t);
t ←t -100;Write (A,t);Output (A);Read (B,t);t ←t -100;Write (B,t);Output (B);
Flush Log
延迟更新(Redo log)
More IOs But less buffers
More buffers
But less IOs
Undo vs. Redo
?2006Dai Min
57
Drawbacks:
§Undo logging:
?System forced to update disk at the end of transactions (may increase disk I/O)
?cannot redo recent updates to refresh a backup copy of the DB
§Redo logging:
?need to keep all modified blocks in memory until commit (may lead to shortage of buffer pool)
?2006Dai Min 58
§Failure Model §Undo Log §Redo Log
§Undo/Redo log §Checkpoint §Media failure
Outline
here
4. Solution: Undo/redo logging!
Update record ?
1) Rules for undo/redo logging
(1) Log record
flushed to disk before writing X to disk (2) Flush the log at commit
?Element X can be written to disk either before or after the commit of Ti
?2006Dai Min
612) Undo/Redo Recovery Policy
§To recover from a crash,
?undo updates by any incomplete
transactions (in backward -order, latest first)
?redo updates by any committed
transactions (in forward -order, earliest first)
?2006Dai Min
62
Log file segment < T1 Start >
Undo List: {T3,T4}; Redo List:{T1,T2}?Undo
T4: D=1000T3: B=1900T3: C=3000?Redo
T1:B=1900T2:A=900?Write log
?2006Dai Min 633) Problem with Simple Checkpointing
§Simple checkpointing stops the system from accepting new transactions -> system effectively shuts down §Solution: "nonquiescent" checkpointing ?accepts new transactions during a checkpoint
?2006Dai Min 64
Nonquiescent Checkpointing
§Slightly different for various logging policies §Rules for undo/redo logging:
?Write log record (and flush log)
where T1, …, Tk are the active transactions
?Flush to disk all dirty buffers which contain changed database elements (Corresponding log records first!)=> Effect of writes prior to the chkpt made permanent ?Write & flush log record
L O G
for undo
dirty buffer pool pages flushed
...
...
......
Nonquiescent Checkpointing
Examples what to do at recovery time?
no T1 commit
L O G
Ckpt T 1
...
Ckpt end ... b, b'> ...yUndo T 1 (restore Y to b and X to a) ?2006Dai Min 67Example L O G ... T 1a ......T 1b ......T 1c ...T 1cmt ...ckpt-end ckpt-s T 1yRedo T1: (redo update of elements to b and to c) ?2006Dai Min 68 Recovery process: §Backwards pass (end of log ülatest checkpoint start) ?construct set S of committed transactions ?undo actions of transactions not in S §Undo pending transactions ?follow undo chains for transactions in (checkpoint active list) -S §Forward pass (latest checkpoint start üend of log) ?redo actions of transactions in set S backward pass forward pass start check-point ?2006Dai Min 69 < T1, START > < T3,START > ?2006Dai Min 70 §Failure Model §Undo Log §Redo Log §Undo/Redo log §Checkpoint §Media failure Outline here 5. Media failure (loss of non -volatile storage)A: 16 Solution:Make copies of data! Disk crash! 1)Solution a:Use mirrored or redundant disks §Mirroring: copies on separate disks ?Output(X) --> three outputs ?Input(X) --> three inputs + vote D1 D2D3 ?2006Dai Min 73Example #2Redundant writes,Single reads §Keep N copies on separate disks §Output(X) --> N outputs §Input(X) --> Input one copy -if ok, done -else try another one óAssumes bad data can be detected ?2006Dai Min 74 2) Solution b:Archiving backup database active database log ?If active database is lost, –restore active database from backup (archive)–bring up-t o-date using redo entries in log ?2006Dai Min 75When can we discard the log? check -point db dump last needed undo not needed for media recovery not needed for undo after system failure not needed for redo after system failure log time ?2006Dai Min 76 Summary §Consistency of data §Recovery from system failures ?Logging (undo/redo/undo-redo), checkpoints ?Recovery: using log to reconstruct the DB §Recovery form media failures ?reducing risk through redundancy ?backups / archiving