Replica Placement Strategy for Wide-Area Storage Systems
Byung-Gon Chun and Hakim Weatherspoon
RADS Final Presentation, December 9, 2004

Page 1: Replica Placement Strategy for Wide-Area Storage Systems

Replica Placement Strategy for Wide-Area Storage Systems

Byung-Gon Chun and Hakim Weatherspoon

RADS Final Presentation, December 9, 2004

Page 2: Replica Placement Strategy for Wide-Area Storage Systems

Environment
• Store large quantities of data persistently and availably
• Storage strategy
  – Redundancy - duplicate data to protect against data loss
  – Place data throughout the wide area for availability and durability
    • Avoid correlated failures
  – Continuously repair lost redundancy as needed
    • Detect permanent node failures and trigger data recovery

Page 3: Replica Placement Strategy for Wide-Area Storage Systems

Assumptions
• Data is maintained on nodes, in the wide area, and in well-maintained sites
• Sites contribute resources
  – Nodes (storage, CPU)
  – Network - bandwidth
• Nodes collectively maintain data
  – Adaptive - constant change, self-organizing, self-maintaining
• Costs
  – Data recovery
    • Process of maintaining data availability
  – Limit wide-area bandwidth used to maintain data

Page 4: Replica Placement Strategy for Wide-Area Storage Systems

Challenge
• Avoiding correlated failures/downtime with careful data placement
  – Minimize cost of resources used to maintain data
    • Storage
    • Bandwidth
  – Maximize
    • Data availability

Page 5: Replica Placement Strategy for Wide-Area Storage Systems

Outline
• Analysis of correlated failures
  – Show that correlated failures exist and are significant
• Effects of common subnet (admin area, geographic location, etc.)
  – Pick a threshold and extra redundancy
• Effects of extra redundancy
  – Vary extra redundancy
  – Compare random, random w/ constraint, and oracle placement
  – Show that margin between oracle and random is small

Page 6: Replica Placement Strategy for Wide-Area Storage Systems

Analysis of PlanetLab - Trace characteristics
• Trace-driven simulation
• Model maintaining data on PlanetLab
• Create trace using all-pairs ping*
  – Collected from February 16, 2003 to October 6, 2004
• Measure
  – Correlated failures vs. time
  – Probability of k nodes down simultaneously
  – {5th percentile, median} number of available replicas vs. time
  – Cumulative number of triggered data recoveries vs. time

*Jeremy Stribling, http://infospect.planet-lab.org/pings
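One of the measures listed above, the probability of k nodes being down simultaneously, can be estimated empirically from a trace. The following is only a minimal sketch: it assumes the trace has already been reduced to one set of down nodes per time slot, which is a hypothetical format and not necessarily how the all-pairs ping data is stored.

```python
from collections import Counter

def down_count_distribution(trace):
    """Empirical P(exactly k nodes down) over all time slots.

    trace: list of time slots, each a set of node ids that were down
    in that slot (hypothetical format, assumed for illustration).
    """
    counts = Counter(len(down) for down in trace)
    total = len(trace)
    return {k: c / total for k, c in sorted(counts.items())}

# Toy trace with four slots; node names are made up.
toy_trace = [set(), {"a"}, {"a", "b"}, set()]
dist = down_count_distribution(toy_trace)
```

If failures were independent, the tail of this distribution would fall off quickly; a heavy tail is one sign of correlated downtime.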

Page 7: Replica Placement Strategy for Wide-Area Storage Systems

Analysis of PlanetLab II - Correlated failures

Page 8: Replica Placement Strategy for Wide-Area Storage Systems

Analysis I - Node characteristics

Page 9: Replica Placement Strategy for Wide-Area Storage Systems

Analysis II - Correlated Failures

Page 10: Replica Placement Strategy for Wide-Area Storage Systems

Correlated Failures

Page 11: Replica Placement Strategy for Wide-Area Storage Systems

Correlated Failures (machines with downtime <= 1000 slots)

Page 12: Replica Placement Strategy for Wide-Area Storage Systems

Availability Trace

Page 13: Replica Placement Strategy for Wide-Area Storage Systems

Replica Placement Strategies
• Random
• RandomSite
  – Avoid placing multiple replicas in the same site
  – A site in PlanetLab is identified by a 2-byte IP address prefix
• RandomBlacklist
  – Avoid using blacklisted machines, i.e. the top k machines with the longest downtime
• RandomSiteBlacklist
  – Combine RandomSite and RandomBlacklist
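The four strategies above differ only in which candidates a random choice is allowed to pick. A rough sketch under stated assumptions follows: nodes are dotted-quad IPv4 strings, the site key is the first two bytes of the address, and the function names (`site_of`, `place`, `top_k_downtime`) and the downtime figures are hypothetical, introduced only for illustration.

```python
import random

def site_of(ip):
    """Site key: the 2-byte prefix of a dotted-quad IPv4 address."""
    return ".".join(ip.split(".")[:2])

def top_k_downtime(downtime, k):
    """Blacklist: the k nodes with the longest recorded downtime."""
    return frozenset(sorted(downtime, key=downtime.get, reverse=True)[:k])

def place(nodes, n, rng, one_per_site=False, blacklist=frozenset()):
    """Pick n nodes at random.

    one_per_site=True gives RandomSite, a non-empty blacklist gives
    RandomBlacklist, both together give RandomSiteBlacklist.
    """
    candidates = [ip for ip in nodes if ip not in blacklist]
    rng.shuffle(candidates)
    chosen, used_sites = [], set()
    for ip in candidates:
        if len(chosen) == n:
            break
        if one_per_site and site_of(ip) in used_sites:
            continue  # this site already holds a replica
        chosen.append(ip)
        used_sites.add(site_of(ip))
    if len(chosen) < n:
        raise ValueError("not enough eligible nodes")
    return chosen

# Toy inputs using documentation address ranges; downtime values are made up.
nodes = ["192.0.2.1", "192.0.2.2", "198.51.100.1", "203.0.113.1"]
downtime = {"192.0.2.1": 40, "192.0.2.2": 5, "198.51.100.1": 900, "203.0.113.1": 12}
rng = random.Random(0)
by_site = place(nodes, 3, rng, one_per_site=True)
blacklist = top_k_downtime(downtime, 1)
no_bad = place(nodes, 2, rng, blacklist=blacklist)
```

Filtering the candidate list before the random draw keeps the plain Random strategy as the default case, with the site and blacklist constraints as optional restrictions on top of it.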

Page 14: Replica Placement Strategy for Wide-Area Storage Systems

Comparison of simple strategies (m=1, th=9, n=14, |blacklist|=35)

Strategy               # of repairs   Improvement (%)
Random                 9075           -
RandomSite             8581           5.44
RandomBlacklist        8691           4.23
RandomSiteBlacklist    8160           10.08
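The improvement figures are consistent with a relative reduction in repairs compared with Random. The short check below assumes that definition (the slide does not state the formula) and uses only the repair counts reported above.

```python
# Repair counts as reported on the slide.
repairs = {
    "Random": 9075,
    "RandomSite": 8581,
    "RandomBlacklist": 8691,
    "RandomSiteBlacklist": 8160,
}

baseline = repairs["Random"]

# Assumed definition: percentage reduction in repairs relative to Random.
improvement = {
    name: round(100 * (baseline - count) / baseline, 2)
    for name, count in repairs.items()
    if name != "Random"
}
```

Under that definition the computed values match the reported 5.44, 4.23, and 10.08.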

Page 15: Replica Placement Strategy for Wide-Area Storage Systems

Simulation setup
• Placement algorithm
  – Random vs. Oracle
  – Oracle strategies
    • Max-Lifetime-Availability
    • Min-Max-TTR, Min-Sum-TTR, Min-Mean-TTR
• Simulation parameters
  – Replication m = 1, threshold th = 9, total replicas n = 15
  – Initial repository size 2TB
  – Write rate 1Kbps per node and 10Kbps per node
    • 300 storage nodes
    • System increases in size at rate of 3TB and 30TB per year, respectively
• Metrics
  – Number of available nodes
  – Number of data repairs
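To make the roles of th and n concrete, here is a minimal sketch of how a trace-driven simulation might count data repairs for a single object. It is not the authors' simulator: it assumes a repair fires whenever fewer than th replicas are reachable, that the repair tops the reachable count back up to n using randomly chosen live nodes, and that replicas on unreachable nodes are written off at that point. The trace format and the name `count_repairs` are hypothetical.

```python
import random

def count_repairs(trace, initial_holders, n, th, rng):
    """Count threshold-triggered repairs for one object.

    trace: list of time slots, each the set of node ids that are up
    (hypothetical format). initial_holders: nodes given a replica at time 0.
    """
    holders = set(initial_holders)
    repairs = 0
    for up in trace:
        available = holders & up
        # A repair needs at least one reachable replica to copy from.
        if available and len(available) < th:
            spare = sorted(up - available)  # live nodes without a replica
            need = min(n - len(available), len(spare))
            # Assumption: replicas on unreachable nodes are written off here.
            holders = available | set(rng.sample(spare, need))
            repairs += 1
    return repairs

# Toy run with n=3, th=2; the slide's setting would be n=15, th=9.
toy_trace = [{0, 1, 2}, {0, 3, 4}, {0, 3, 4}]
toy_repairs = count_repairs(toy_trace, {0, 1, 2}, n=3, th=2, rng=random.Random(0))
```

In the toy run, nodes 1 and 2 go down in the second slot, the reachable count drops to one, and a single repair restores three reachable replicas. The gap between n and th is the extra redundancy that absorbs short outages without triggering a repair.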

Page 16: Replica Placement Strategy for Wide-Area Storage Systems

Comparison of simple strategies (m=1, th=9)

Page 17: Replica Placement Strategy for Wide-Area Storage Systems

Results - Random Placement (1Kbps)

Page 18: Replica Placement Strategy for Wide-Area Storage Systems

Results - Oracle Max-Lifetime-Avail (1Kbps)

Page 19: Replica Placement Strategy for Wide-Area Storage Systems

Results - Breakdown of Random (1Kbps)

Page 20: Replica Placement Strategy for Wide-Area Storage Systems

Results - Random (10Kbps)

Page 21: Replica Placement Strategy for Wide-Area Storage Systems

Results - Breakdown of Random (10Kbps)

Page 22: Replica Placement Strategy for Wide-Area Storage Systems

Conclusion
• Correlated downtimes do exist.
• Random is sufficient
  – A minimum data availability threshold and extra redundancy are sufficient to absorb most correlation.