当前位置：文档库 › DB系统实现(5)

DB系统实现(5)

Dai Min

School of Computer Science and Technology, Tianjin University of

Technology

Ａｄｖａｎｃｅｄ　Ｄａｔａｂａｓｅ　Ｔｅｃｈｎｏｌｏｇｙ

Slides adapted from material by Profs. Jeff Ullman

(Stanford) and Art Keller (UCSC)

?2006Dai Min

§Major concern so far:

?How to manipulate database efficiently?

§Now shift focus to:

?How to ensure correctness of database manipulation?

?2006Dai Min 3Integrity or correctness of data

§Would like data to be “accurate”or

“correct ”at all times

EMP

Name White

Green Gray

Age 5234211

?2006Dai Min 4

Integrity or consistency constraints

§Predicates that the data must satisfy §Examples:

-x is key of relation R

-x →y holds in R

-Domain(x) = {Red, Blue, Green}-no employee should make more than twice the average salary

Definition:

§Consistent state:satisfies all constraints §Consistent DB:DB in a consistent state

Consistency

§DB cannot be consistent always!

Example:a1 + a2 +…. an = TOT (constraint)

Transaction: Deposit $100 in a2: a2 ←a2 + 100

TOT ←TOT + 100

..50..1000

..150..1000

.150..1100

a2TOT

State 1:consistent

State 2:“inconsistent”

State 3:consistent

?2006Dai Min

Consistency

§事务的ACID 性质

?Atomicity, Consistency, Isolation, Durability

consistency of transaction Consistent DB

Consistent DB’

但事务内部不保证DB 的一致性

?2006Dai Min

Correctness

Reality

DB should reflect real world

In general, various policies to ensure that the DB corresponds to "real world"

?2006Dai Min

Correctness

DB should reflect real world

Example ：A telephone number 3601123 －－correct abcdefg －－not correct

Can be preserved by explicit constraints ！

?2006Dai Min

Correctness

DB should reflect real world

Example ：A telephone number 3601123 －－correct 9999999 －－Is it correct ？

Not correct in reality, but can DB know this?

Answer : NO!

Correctness

§Correctness of DB ≠Correctness of reality Correctness of DB

如果数据库在事务开始执行时是一致的，并且事务执行结束后数据库仍处于一致状态，则数据库满足正确性.

Consistency of DB + ACID of transaction

Correctness of DB

Transaction: collection of actions

that preserve consistency (≈process)

Consistent DB

Consistent DB’

SQL: each query or modification statement Embedded SQL: Sequence of DB operations up to COMMIT or ROLLBACK ("abort")

?2006Dai Min

13Big assumption:

If transaction T

starts with consistent state +executes in isolation ?T leaves consistent state

Correctness (informally)

§If we stop running transactions, DB left consistent

§Each transaction sees a consistent DB

?2006Dai Min 14

How can constraints be violated?

§Transaction bug §DBMS bug

§Hardware failure

e.g., disk crash alters balance of account

§Simultaneous transactions accessing shared data

e.g.: T1: give 10% raise to programmers

T2: change programmers ?systems analysts

?2006Dai Min 15How can we prevent/fix violations?

§Chapter [17]: due to failures only

§Chapter [18]: due to data concurrency only §Chapter [19]: due to failures and concurrency

?2006Dai Min 16

Will not consider:

§How to write correct transactions §How to write correct DBMS §Constraint checking & repair

That is, solutions studied here do not need to know constraints

Outline

§Failure Model §Undo Log §Redo Log

§Undo/Redo log §Checkpoint §Media failure

Chapter [17]：Crash Recovery here

Events

Desired Undesired

Expected Unexpected

1. Failure Model

?2006Dai Min

191) Context of failure model

processor

memory

disk

-volatile,

-nonvolatile

disappears when power off

CPU

?2006Dai Min 20

Desired events:see product manuals….Undesired expected events:System failures

-memory lost -CPU halts, resets

Undesired Unexpected:Everything else!

that ’s it!!

?2006Dai Min 21Examples:

§Disk data is lost

§Memory lost without CPU halt

§Aliens attack and destroy the building….

Undesired Unexpected:Everything else!

(can protect against this by archive backups)

?2006Dai Min 22

Is this model reasonable?

Approach:Add low level checks +

redundancy to increase probability that the model holds E.g., Replicate disk storage (stable store)

Memory parity CPU checks

2) Interacting Address Spaces:

Storage hierarchy

Memory Disk

Input(x)

Output(x)

Read(x,t)

Write(x,t)

§Input (x): block containing x →memory §Output (x): block containing x →disk §Read (x,t): t ←value of x in block

(do input(x) if necessary)

§Write (x,t): value of x in block ←t

Note on "database elements" X:

§Anything that can have value and be accessed of modified by transactions:

?a relation (or extent of an object class)?a disk block ?a record

§Easiest to use blocks as database elements

?2006Dai Min 253) Key problem:Unfinished transaction

Example

Constraint: A=B T 1: A ← A ×2

B ← B ×2

?2006Dai Min

T 1:Read (A,t); t ←t ×2

Write (A,t);

Read (B,t); t ←t ×2Write (B,t);Output (A);

Output (B);

A: 8B: 8A: 8B: 8

memory

disk

16failure! ?>A != B 16

?2006Dai Min 27

§Need atomicity:either execute all actions of a transaction or none at all

?2006Dai Min 28

§Failure Model §Undo Log §Redo Log

§Undo/Redo log §Checkpoint §Media failure

Outline

here

One solution:undo logging (immediate

modification)

1) Log:sequence of log records, recording

actions of transactions T i :

: transaction has begun : completed successfully

: could not complete successfully : T i has updated element X, whose

old value was v

2. Undo Logging

T 1:Read (A,t); t ←t ×2

Write (A,t);

Read (B,t); t ←t ×2Write (B,t);Output (A);Output (B);

A:8B:8A:8B:8memory

disk

log

Undo logging (Immediate modification)

1616

constraint:A=B

?2006Dai Min 31One “complication ”

§Log is first written in memory §Not written to disk on every action

memory DB Log

A: 8 16B: 8 16Log:

A: 8B: 8

16BAD STATE

# 1

?2006Dai Min 32

One “complication ”

§Log is first written in memory §Not written to disk on every action

memory DB Log

A: 8 16B: 8 16Log:

A: 8B: 8

16BAD STATE

# 2

...

?2006Dai Min

332) Undo logging rules

(1) Generate an undo log record for every update action (with the old value)(2) Before modifying element X on disk, any log records for X must be flushed to disk (write -ahead logging)(3) All WRITE of transaction T must be OUTPUT to disk before is flushed to log

?2006Dai Min

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log

< T, Start >

Log

Initial:A=1000B=2000

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log

Fail here

A:900B:1900

A:1000B:2000

…

< T, Start >

…

Memory

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log

Fail here

A:900B:1900

A:900B:2000

…

< T, Start >

…

< T,Start >

Memory

Disk

?2006Dai Min

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log

Fail here

A:900B:1900

A:900

B:1900

…

< T, Start >

…

< T, Start >

Memory

Disk

?2006Dai Min 38

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);Flush Log

Success!

Memory

A:900B:1900

…

< T, Start >

…

< T, Start >

Disk

?2006Dai Min

393) Recovery rules: Undo logging

§For every Ti with in log:

-If < Ti,commit > or < Ti,abort >

in log, do nothing

-Else For all < Ti, X , v > in log:

write (X, v )output (X )

Write to log

yIS THIS CORRECT??

?2006Dai Min 40

T 1: A ←A+1; B ←B+1; T 2: A ←A+1; B ←B+1;

A:10B:10

DB on disk

log

Recovery with Undo logging

CRASH =>

A should be restored to 10, not 11

11constraint:A=B

Undo logging Recovery :(more precisely)

(1) Let S = set of transactions with in log, but no (or ) record in log

(2) For each in log,

in reverse order (latest →earliest) do:-if Ti ∈S then -write (X, v)

-output (X)

(3) For each Ti ∈S do

-write to log

§Failure Model §Undo Log §Redo Log

§Undo/Redo log §Checkpoint §Media failure

Outline

here

?2006Dai Min 433. Redo logging (deferred modification)

T 1:Read(A,t); t t ×2; write (A,t);

Read(B,t); t t ×2; write (B,t);Output(A); Output(B)

A: 8B: 8A: 8B: 8

memory

LOG

1616

output

16?2006Dai Min 44

1) Redo logging rules

(1) For every disk update, generate a redo log record (with new value)

(2) Before modifying DB element X on disk, all log records for the transaction that (including COMMIT) must be flushed to disk

?2006Dai Min 45

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);

Log

Initial:A=1000B=2000

?2006Dai Min

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);

Fail here

A:900B:1900

A:1000B:2000

…

Memory

Disk

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);

Fail here

A:900B:1900

A:1000B:2000

…

T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);

Fail here

A:900B:1900

A:900B:2000

…

?2006Dai Min 49T1:

Read (A,t); t ←t -100;Write (A,t);Read (B,t);t ←t -100;Write (B,t);Flush Log Output (A);Output (B);

Fail here

A:900B:1900

…

< T, Start>

…

< T, Start>

Disk

?2006Dai Min 50

§For every Ti with in log:

?For all in log:

Write(X, v)Output(X)

2) Recovery rules:Redo logging

This time, the last updates should remain in effect

?2006Dai Min 51(1) Let S = set of transactions with

in log (2) For each in log, in forward order (earliest →latest) do:

-if Ti ∈S then Write (X, v)

Output(X)

Recovery with Redo Logging:(more precisely)

?2006Dai Min 52

Recovery is very, very

SLOW !

Redo log:

First T1 wrote A,B

Last Record Committed a year ago

Record

(1 year ago)--> STILL, Need to redo after crash!!

...

Crash

3) Solution:Checkpointing

(for redo -logging , simple version)

Periodically:

(1) Stop accepting new transactions

(2) Wait all transactions to finish (commit/abort)(3) Flush all log records to disk (log)(4) Flush all buffers to disk (DB)

(5) Write & flush a record on disk (log)(6) Resume transaction processing

Example: what to do at recovery?

Redo log (on disk):

C h e c k p o i n t

Crash

..................

Result of transactions completed prior to the checkpoint are stored permanently on disk

-> Redo

write(B, 17)output(B)

?2006Dai Min

55Undo vs. Redo

§Undo 基于立即更新(Immediately Update)§Redo 基于延迟更新(Deferred Update)

?2006Dai Min

T1:

Read (A,t); t ←t -100;Write (A,t);Flush Log Output (A);Read (B,t);t ←t -100;Write (B,t);

Flush Log

Output (B);

Flush Log

T2:

Read (A,t);

t ←t -100;Write (A,t);Output (A);Read (B,t);t ←t -100;Write (B,t);Output (B);

Flush Log 立即更新(Undo log)

延迟更新(Redo log)

More IOs But less buffers

More buffers

But less IOs

Undo vs. Redo

?2006Dai Min

Drawbacks:

§Undo logging:

?System forced to update disk at the end of transactions (may increase disk I/O)

?cannot redo recent updates to refresh a backup copy of the DB

§Redo logging:

?need to keep all modified blocks in memory until commit (may lead to shortage of buffer pool)

?2006Dai Min 58

§Failure Model §Undo Log §Redo Log

§Undo/Redo log §Checkpoint §Media failure

Outline

here

4. Solution: Undo/redo logging!

Update record ?Provides more flexibility to order actions

1) Rules for undo/redo logging

(1) Log record

flushed to disk before writing X to disk (2) Flush the log at commit

?Element X can be written to disk either before or after the commit of Ti

?2006Dai Min

612) Undo/Redo Recovery Policy

§To recover from a crash,

?undo updates by any incomplete

transactions (in backward -order, latest first)

?redo updates by any committed

transactions (in forward -order, earliest first)

?2006Dai Min

Log file segment < T1 Start >< T2 Start >< T1 Commit >< T2 Commit >< T4 Start >

Undo List: {T3,T4}; Redo List:{T1,T2}?Undo

T4: D=1000T3: B=1900T3: C=3000?Redo

T1:B=1900T2:A=900?Write log

?2006Dai Min 633) Problem with Simple Checkpointing

§Simple checkpointing stops the system from accepting new transactions -> system effectively shuts down §Solution: "nonquiescent" checkpointing ?accepts new transactions during a checkpoint

?2006Dai Min 64

Nonquiescent Checkpointing

§Slightly different for various logging policies §Rules for undo/redo logging:

?Write log record (and flush log)

where T1, …, Tk are the active transactions

?Flush to disk all dirty buffers which contain changed database elements (Corresponding log records first!)=> Effect of writes prior to the chkpt made permanent ?Write & flush log record

L O G

for undo

dirty buffer pool pages flushed

...

......

Nonquiescent Checkpointing

Examples what to do at recovery time?

no T1 commit

L O G

...

Ckpt T 1

...

Ckpt end ...

b, b'>

...yUndo T 1 (restore Y to b and X to a)

?2006Dai Min 67Example

L O G

...

T 1a ......T 1b ......T 1c ...T 1cmt

...ckpt-end ckpt-s T 1yRedo T1: (redo update of elements to b and to c)

?2006Dai Min

Recovery process:

§Backwards pass (end of log ülatest checkpoint start)

?construct set S of committed transactions

?undo actions of transactions not in S

§Undo pending transactions

?follow undo chains for transactions in (checkpoint active list) -S

§Forward pass (latest checkpoint start üend of log)

?redo actions of transactions in set S

backward pass

forward pass

start check-point

?2006Dai Min 69

< T1, START >< T2, START >< T1, COMMIT >

< T3,START >< T2, COMMIT >< T3, COMMIT >

?2006Dai Min

§Failure Model §Undo Log §Redo Log

§Undo/Redo log §Checkpoint §Media failure

Outline

here

5. Media failure (loss of non -volatile storage)A: 16

Solution:Make copies of data!

Disk crash!

1)Solution a:Use mirrored or redundant disks

§Mirroring: copies on separate disks

?Output(X) --> three outputs ?Input(X) --> three inputs + vote

D2D3

?2006Dai Min 73Example #2Redundant writes,Single reads §Keep N copies on separate disks §Output(X) --> N outputs §Input(X) --> Input one copy

-if ok, done

-else try another one

óAssumes bad data can be detected

?2006Dai Min 74

2) Solution b:Archiving

backup database

active database

log

?If active database is lost,

–restore active database from backup (archive)–bring up-t o-date using redo entries in log

?2006Dai Min 75When can we discard the log?

check -point

db dump

last needed undo

not needed for media recovery

not needed for undo after system failure

not needed for

redo after system failure

log

time

?2006Dai Min 76

Summary

§Consistency of data

§Recovery from system failures

?Logging (undo/redo/undo-redo), checkpoints ?Recovery: using log to reconstruct the DB

§Recovery form media failures

?reducing risk through redundancy ?backups / archiving