CS 600.419 Storage Systems: Dangers of Replication. Materials taken from “J. Gray, P. Helland, P. O’Neil, and D. Shasha. The Dangers of Replication and a Solution. SIGMOD, 1996.” http://research.microsoft.com/~gray/replicas.ps

CS 600.419 Storage Systems

Dangers of Replication

Materials taken from “J. Gray, P. Helland, P. O’Neil, and D. Shasha. The Dangers of Replication and a Solution. SIGMOD, 1996.”

http://research.microsoft.com/~gray/replicas.ps

What’s the danger?

• Replication of transactional data results in unstable system performance

• For consistent replication:

– Waits and deadlocks

• For update-anywhere-anytime replication:

– Reconciliations

• Both grow polynomially (w/ meaningful exponents) in the number of clients

– Based on simple, lower bounds derived from mean-value analysis

What’s the point?

• This theme is predicated on the knowledge that globally consistent replication does not scale

Replication Policies

• Eager replication:

– Copies are updated as part of the original transaction.

• Lazy replication:

– One replica is updated by the original transaction. The other copies are updated asynchronously.

• Update policy:

– Group: any node can update its replica.

– Master: only the master updates its replica. The remaining replicas are read-only.
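
The two choices above (when copies are updated, and who may update) are independent, which gives four combinations. A minimal sketch of that taxonomy follows; the names `Propagation`, `Ownership`, `Policy`, `may_update` and `replicas_in_transaction` are invented here for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Propagation(Enum):
    EAGER = "eager"   # copies updated as part of the original transaction
    LAZY = "lazy"     # one replica updated now, the others asynchronously


class Ownership(Enum):
    GROUP = "group"    # any node can update its replica
    MASTER = "master"  # only the master updates; other replicas are read-only


@dataclass(frozen=True)
class Policy:
    propagation: Propagation
    ownership: Ownership


def may_update(policy: Policy, node: str, master: str) -> bool:
    """Return True if `node` is allowed to update its replica of an object."""
    if policy.ownership is Ownership.GROUP:
        return True
    return node == master


def replicas_in_transaction(policy: Policy, replica_count: int) -> int:
    """Number of replicas written before the original transaction finishes."""
    return replica_count if policy.propagation is Propagation.EAGER else 1


lazy_master = Policy(Propagation.LAZY, Ownership.MASTER)
eager_group = Policy(Propagation.EAGER, Ownership.GROUP)
print(may_update(lazy_master, "node-b", master="node-a"))
print(replicas_in_transaction(eager_group, replica_count=5))
```

Keeping the two axes as separate enums mirrors how the later slides analyze eager group, lazy group and lazy master as distinct schemes.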

Representing Writes

Mastered and Group Replication

The Scale-up Pitfall

• Replication works well on small, prototype systems

– But, at deployment scale, replication is unstable

• At larger scales:

– Message propagation delay increases

– Transaction rates are higher

• For eager replication:

– More transactions, with each transaction taking longer

• For lazy replication:

– Delays in reconciliation lead to system delusion (the database becomes inconsistent with no obvious way to repair it)

Analysis of Eager Group Replication

• Scaling laws (deadlock rate):

– Third power of the number of nodes

– Fifth power of the number of operations per transaction

• Problems with eager replication:

– Cannot be used by disconnected nodes

– Probability of deadlocks (failed transactions) increases with system size
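
A worked scaling relation may make these exponents concrete. It takes the scaled quantity to be the deadlock rate and holds everything else fixed; the symbols R (rate), N (number of nodes) and A (operations per transaction) are introduced here for illustration only.

```latex
R_{\text{eager}} \propto N^{3} A^{5}
\quad\Longrightarrow\quad
\frac{R_{\text{eager}}(2N, A)}{R_{\text{eager}}(N, A)} = 2^{3} = 8,
\qquad
\frac{R_{\text{eager}}(N, 2A)}{R_{\text{eager}}(N, A)} = 2^{5} = 32
```

Under this reading, growing a system from 10 to 100 nodes multiplies the deadlock rate by 10^3 = 1000.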

Analysis of Lazy Group Replication

• Scaling laws (reconciliation rate):

– Third power of the number of nodes

– Third power of the number of operations per transaction

• Better than eager, but still not good

Analysis of Lazy Master Replication

• Scaling laws (deadlock rate):

– Second power of the number of nodes

– Fifth power of the number of operations per transaction
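
The exponents from the three analysis slides can be put side by side with a short calculation. This is only a sketch: it uses nothing but the exponents stated on the slides, ignores every constant factor, and the names `EXPONENTS` and `growth_factor` are made up for this example. The rates measure different events (deadlocks for eager group and lazy master, reconciliations for lazy group), so the numbers show how fast each scheme degrades, not which one fails more often in absolute terms.

```python
# Exponents as stated on the analysis slides: (nodes, operations per transaction).
EXPONENTS = {
    "eager group": (3, 5),
    "lazy group": (3, 3),
    "lazy master": (2, 5),
}


def growth_factor(scheme: str, node_scale: float, op_scale: float) -> float:
    """Relative growth of the failure rate when the number of nodes and the
    operations per transaction are multiplied by the given factors."""
    node_exp, op_exp = EXPONENTS[scheme]
    return node_scale ** node_exp * op_scale ** op_exp


for scheme in EXPONENTS:
    # Ten times the nodes, twice the operations per transaction.
    print(f"{scheme:12s} {growth_factor(scheme, 10, 2):>10,.0f}x")
```

With ten times the nodes and twice the operations per transaction, the factors are 32,000 for eager group, 8,000 for lazy group and 3,200 for lazy master.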

Status of Replication

• Negative scaling results:

– The analysis doesn’t account for message delays (so real behavior is worse)

– Can’t escape these by choosing lazy over eager

• No reason for group replication:

– Master is the same (eager) or better (lazy)

• So, what do we do?

– Avoid scale, keep systems small

Two-Tier Replication

• Two node types:

– Base nodes: always connected, store a replica, master most objects

– Mobile nodes: often disconnected, store a replica, issue tentative transactions

• Two version types:

– Master version:

• Exists at the object owner; other nodes may have older versions

– Tentative version:

• Local version is updated by tentative transactions
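
The interplay of the two node types and two version types can be sketched in a few lines. This is a toy model under stated assumptions, not the paper's algorithm: a single base node owns every object, tentative transactions are replayed at the base node on reconnect, and each carries an acceptance test that decides whether the replay is kept. All class, method and field names (`TentativeTxn`, `BaseNode`, `MobileNode`, `withdraw`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Store = Dict[str, int]


@dataclass
class TentativeTxn:
    apply: Callable[[Store], None]    # the update itself
    accept: Callable[[Store], bool]   # hypothetical acceptance test


@dataclass
class BaseNode:
    master: Store = field(default_factory=dict)  # master versions

    def run(self, txn: TentativeTxn) -> bool:
        """Re-execute a transaction against the master versions."""
        trial = dict(self.master)
        txn.apply(trial)
        if not txn.accept(trial):
            return False              # rejected; master versions unchanged
        self.master = trial
        return True


@dataclass
class MobileNode:
    master_copy: Store = field(default_factory=dict)  # possibly stale
    tentative: Store = field(default_factory=dict)    # local tentative versions
    pending: List[TentativeTxn] = field(default_factory=list)

    def run_tentative(self, txn: TentativeTxn) -> None:
        """Update only the local tentative version while disconnected."""
        txn.apply(self.tentative)
        self.pending.append(txn)

    def reconnect(self, base: BaseNode) -> List[bool]:
        """Replay pending transactions at the base node, then refresh."""
        results = [base.run(txn) for txn in self.pending]
        self.pending.clear()
        self.master_copy = dict(base.master)
        self.tentative = dict(base.master)
        return results


def withdraw(amount: int) -> TentativeTxn:
    def apply(store: Store) -> None:
        store["balance"] = store.get("balance", 0) - amount
    return TentativeTxn(apply, lambda store: store["balance"] >= 0)


base = BaseNode({"balance": 100})
mobile = MobileNode(dict(base.master), dict(base.master))

mobile.run_tentative(withdraw(70))   # tentative balance is now 30
base.run(withdraw(50))               # meanwhile another update reaches the base
outcomes = mobile.reconnect(base)    # replay fails: 50 - 70 would go negative
print(outcomes, base.master)
```

The tentative withdrawal looked fine against the mobile node's stale copy but is rejected when replayed against the current master version, after which the mobile node discards its tentative state and adopts the master versions.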

Pictures to Entertain

System Principles

• Hierarchies to reduce scale:

– Nodes (Master & Mobile-disconnected)

– Transactions (Tentative and Eager/Consistent)

• Techniques:

– Convergence (Bayou-like eventual consistency)

– Idempotence: encode writes in non-conflicting ways

• Does it fix any of Bayou’s semantic problems?
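
One possible reading of "encode writes in non-conflicting ways" is sketched below. It rests on two assumptions that the slide does not spell out: each write carries a unique identifier so that re-delivery is a no-op (idempotence), and each write is expressed as a delta rather than an absolute value so that application order does not matter. The `Write` and `Replica` classes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Write:
    op_id: str   # hypothetical unique identifier for this update
    key: str
    delta: int   # change to apply, rather than an absolute new value


@dataclass
class Replica:
    values: Dict[str, int] = field(default_factory=dict)
    applied: Set[str] = field(default_factory=set)

    def apply(self, w: Write) -> None:
        """Apply a write at most once; re-delivering it changes nothing."""
        if w.op_id in self.applied:
            return
        self.values[w.key] = self.values.get(w.key, 0) + w.delta
        self.applied.add(w.op_id)


deposit = Write("w1", "x", 5)
debit = Write("w2", "x", -2)

r1, r2 = Replica(), Replica()
for w in (deposit, debit, deposit):   # duplicate delivery of w1
    r1.apply(w)
for w in (debit, deposit, debit):     # different order, duplicate of w2
    r2.apply(w)
print(r1.values == r2.values)
```

Both replicas end with the same value despite different delivery orders and duplicates, which is the convergence property the slide pairs with idempotence.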

Conclusions

• Eager: waits and deadlocks

• Lazy: converts waits and deadlocks into reconciliations

• Neither scales.

• Two-tier replication:

– Supports mobile nodes

– Combines eager master replication with local updates