Contribution to the Design & Implementation of the Highly Available Scalable and Distributed Data Structure: LH*RS. Rim Moussa, [email protected], http://ceria.dauphine.fr/rim/rim.html. Thesis Presentation in Computer Science (Distributed Databases). Thesis Supervisor: Pr. Witold Litwin. Examiners: Pr. Thomas J.E. Schwarz, Pr. Toré Risch. Jury President: Pr. Gérard Lévy. Paris Dauphine University, CERIA Lab., 4 October 2004.

Author: kesia

Post on 18-Jan-2016


  • Contribution to the Design & Implementation of the Highly Available Scalable and Distributed Data Structure: LH*RS
    Rim Moussa, [email protected], http://ceria.dauphine.fr/rim/rim.html
    Thesis Presentation in Computer Science (Distributed Databases)
    Thesis Supervisor: Pr. Witold Litwin
    Examiners: Pr. Thomas J.E. Schwarz, Pr. Toré Risch
    Jury President: Pr. Gérard Lévy
    Paris Dauphine University, CERIA Lab., 4 October 2004

    R. Moussa, U. Paris Dauphine

  • Outline
    Issue
    State of the Art
    LH*RS Scheme
    LH*RS Manager
    Experimentations
      LH*RS File Creation
      Bucket Recovery
      Parity Bucket Creation
    Conclusion & Future Work

  • Facts
    Volume of information grows by 30% per year.
    Technology:
      Network infrastructure >> Gilder's Law: bandwidth triples every year.
      Evolution of PCs' storage & computing capacities >> Moore's Law: these double every 18 months.
    Bottleneck of disk accesses & CPUs.
    Need for Distributed Data Storage Systems: SDDSs (LH*, RP*) >> High Throughput

  • Facts
    Frequent & costly failures >> Statistics published by Contingency Planning Research in 1996: for a brokerage application, one hour of service interruption costs $6.45 million.
    Need for Distributed & Highly-Available Data Storage Systems.
    Multicomputers >> Modular architecture >> Good price/performance tradeoff

  • State of the Art
    Data Replication
      (+) Good response time, mirrors are functional
      (-) High storage overhead (factor n for n replicas)
    Parity Calculus
      Criteria to evaluate erasure-resilient codes:
        Encoding rate (parity volume / data volume)
        Update penalty (parity volumes)
        Group size used for data reconstruction
        Encoding & decoding complexity
        Recovery capabilities

  • Parity Schemes
    1-Available Schemes
      XOR Parity Calculus: RAID Technology (levels 3, 4, 5) [PGK88], SDDS LH*g [L96]
    k-Available Schemes
      Binary Linear Codes: [H94] >> tolerate max. 3 failures
      Array Codes: EVENODD [B94], X-code [XB99], RDP [C+04] >> tolerate max. 2 failures
      Reed-Solomon Codes: IDA [R89], RAID X [W91], FEC [B95], Tutorial [P97], LH*RS [LS00, ML02, MS04, LMS04] >> tolerate k failures (k > 3)

  • Outline
    Issue
    State of the Art
    LH*RS Scheme
      LH*RS?
      SDDSs?
      Reed-Solomon Codes?
      Encoding/Decoding Optimizations
    LH*RS Manager
    Experimentations

  • LH*RS?
    LH*: Scalable & Distributed Data Structure
      Distribution using Linear Hashing (LH*LH [KLR96]); LH*LH Manager [B00]
      >> Scalability & High Throughput
    LH*RS [LS00]
      Parity Calculus using Reed-Solomon Codes [RS63]
      >> High Availability

  • SDDSs Principles
    (1) Dynamic File Growth
    [Diagram: clients, network, data buckets, coordinator]

  • SDDSs Principles (2)
    (2) No Centralized Directory Access
    [Diagram: clients, each holding its own file image, and data buckets]
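
    How a client can address a bucket from its own file image, with no central directory, can be illustrated with the usual linear hashing rule. This is a minimal sketch and not code from the thesis: it assumes a file that starts with one bucket and the hash family h_i(key) = key mod 2^i, and the names `ClientImage`, `h` and `client_address` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClientImage:
    """The client's (possibly outdated) view of the file state."""
    i: int = 0  # file level known to the client
    n: int = 0  # split pointer known to the client


def h(level: int, key: int) -> int:
    """Linear hashing function h_level(key) = key mod 2^level (assumed family)."""
    return key % (2 ** level)


def client_address(key: int, image: ClientImage) -> int:
    """Bucket address computed locally from the image, without a directory."""
    a = h(image.i, key)
    if a < image.n:
        # Buckets below the split pointer have already split at this level,
        # so the next-level function applies to them.
        a = h(image.i + 1, key)
    return a


image = ClientImage(i=2, n=1)
print(client_address(5, image), client_address(4, image))
```

    An outdated image can produce a wrong address. In LH* schemes the receiving server then forwards the request and the client's image is corrected afterwards; that part is not shown here.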

  • Reed-Solomon Codes
    Encoding: from m data symbols, calculus of n parity symbols.
    Data representation: Galois Field
      Fields with finite size: q
      Closure property: addition, subtraction, multiplication, division.
    In GF(2^w):
      (1) Addition (XOR)
      (2) Multiplication (tables: gflog and antigflog)
          e1 * e2 = antigflog[ gflog[e1] + gflog[e2] ]
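
    The two GF(2^w) operations named on this slide can be sketched for w = 8. The table names `gflog` and `antigflog` come from the slide; the primitive polynomial 0x11D and the doubled antilog table (which avoids a modulo on the index sum) are assumptions chosen for illustration, not details confirmed by the presentation.

```python
W = 8
FIELD_SIZE = 1 << W   # 256 elements in GF(2^8)
PRIM_POLY = 0x11D     # x^8 + x^4 + x^3 + x^2 + 1 (assumed polynomial)

gflog = [0] * FIELD_SIZE
antigflog = [0] * (2 * FIELD_SIZE)  # doubled: index sums never need a modulo

x = 1
for power in range(FIELD_SIZE - 1):
    antigflog[power] = x
    antigflog[power + FIELD_SIZE - 1] = x  # second copy for sums >= 255
    gflog[x] = power
    x <<= 1
    if x & FIELD_SIZE:                     # degree overflow: reduce
        x ^= PRIM_POLY


def gf_add(e1: int, e2: int) -> int:
    """Addition (and subtraction) in GF(2^w) is a bitwise XOR."""
    return e1 ^ e2


def gf_mul(e1: int, e2: int) -> int:
    """e1 * e2 = antigflog[gflog[e1] + gflog[e2]]; zero has no logarithm."""
    if e1 == 0 or e2 == 0:
        return 0
    return antigflog[gflog[e1] + gflog[e2]]
```

    The zero test is needed because the logarithm of 0 is undefined; every other product costs two `gflog` lookups, one integer addition and one `antigflog` lookup.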

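  • RS Encoding
    The m × m identity matrix followed by the m × (n−m) matrix of coefficients C:

    $$
    \left(\begin{array}{ccccc|ccccc}
    1 & 0 & 0 & \cdots & 0 & C_{1,1} & \cdots & C_{1,j} & \cdots & C_{1,n-m} \\
    0 & 1 & 0 & \cdots & 0 & C_{2,1} & \cdots & C_{2,j} & \cdots & C_{2,n-m} \\
    0 & 0 & 1 & \cdots & 0 & C_{3,1} & \cdots & C_{3,j} & \cdots & C_{3,n-m} \\
    \vdots & & & \ddots & \vdots & \vdots & & \vdots & & \vdots \\
    0 & 0 & 0 & \cdots & 1 & C_{m,1} & \cdots & C_{m,j} & \cdots & C_{m,n-m}
    \end{array}\right)
    $$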
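
    Systematic encoding with a matrix of the form [I | C] can be sketched as follows: the data symbols are copied unchanged and each parity symbol is the GF dot product of the data with one column of C. This is an illustration only. The Cauchy matrix used for C, the polynomial 0x11D, and the sizes M = 4 and K = 2 are assumptions and not the coefficients of the thesis.

```python
from functools import reduce
from operator import xor

M, K = 4, 2  # data symbols and parity symbols per codeword (illustrative)

# GF(2^8) log/antilog tables, primitive polynomial 0x11D (assumed)
gflog = [0] * 256
antigflog = [0] * 512
_x = 1
for _p in range(255):
    antigflog[_p] = antigflog[_p + 255] = _x
    gflog[_x] = _p
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else antigflog[gflog[a] + gflog[b]]


def gf_inv(a: int) -> int:
    return antigflog[255 - gflog[a]]  # a must be non-zero


# Coefficient matrix C (M rows, K columns): Cauchy entries 1 / (i + (M + j)).
C = [[gf_inv(i ^ (M + j)) for j in range(K)] for i in range(M)]


def encode(data):
    """Return the codeword: the M data symbols followed by K parity symbols."""
    parity = [reduce(xor, (gf_mul(data[i], C[i][j]) for i in range(M)), 0)
              for j in range(K)]
    return list(data) + parity


codeword = encode([11, 22, 33, 44])
print(codeword[:M], codeword[M:])
```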

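  • RS Decoding
    Data symbols S_1 ... S_m, parity symbols P_1 ... P_{n-m}:

    $$
    (S_1\; S_2\; S_3\; S_4\; \cdots\; S_m)\cdot
    \left(\begin{array}{ccccc|ccccc}
    1 & 0 & 0 & \cdots & 0 & C_{1,1} & C_{1,2} & C_{1,3} & \cdots & C_{1,n-m} \\
    0 & 1 & 0 & \cdots & 0 & C_{2,1} & C_{2,2} & C_{2,3} & \cdots & C_{2,n-m} \\
    0 & 0 & 1 & \cdots & 0 & C_{3,1} & C_{3,2} & C_{3,3} & \cdots & C_{3,n-m} \\
    \vdots & & & \ddots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
    0 & 0 & 0 & \cdots & 1 & C_{m,1} & C_{m,2} & C_{m,3} & \cdots & C_{m,n-m}
    \end{array}\right)
    = (S_1\; S_2\; \cdots\; S_m \mid P_1\; P_2\; P_3\; \cdots\; P_{n-m})
    $$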
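
    Decoding can be sketched as: keep m surviving symbols, take the matching m columns of [I | C] as a square matrix H, invert H in the Galois field, and multiply the surviving symbols by H^-1 to get the data symbols back. The sketch below makes the same assumptions as any illustration here (GF(2^8) with polynomial 0x11D, a Cauchy matrix for C, M = 4, K = 2); all function names are hypothetical.

```python
from functools import reduce
from operator import xor

M, K = 4, 2  # illustrative sizes

gflog = [0] * 256
antigflog = [0] * 512
_x = 1
for _p in range(255):
    antigflog[_p] = antigflog[_p + 255] = _x
    gflog[_x] = _p
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D  # assumed primitive polynomial


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else antigflog[gflog[a] + gflog[b]]


def gf_inv(a):
    return antigflog[255 - gflog[a]]  # a must be non-zero


def column(j):
    """Column j of [I | C]: a unit vector for j < M, a Cauchy column otherwise."""
    if j < M:
        return [1 if i == j else 0 for i in range(M)]
    return [gf_inv(i ^ j) for i in range(M)]


def encode(data):
    return list(data) + [reduce(xor, (gf_mul(d, c) for d, c in zip(data, column(j))), 0)
                         for j in range(M, M + K)]


def gf_invert_matrix(a):
    """Gauss-Jordan inversion over GF(2^8) on the augmented matrix [A | I]."""
    n = len(a)
    a = [row[:] + [1 if r == c else 0 for c in range(n)] for r, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        inv = gf_inv(a[col][col])
        a[col] = [gf_mul(v, inv) for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v ^ gf_mul(f, p) for v, p in zip(a[r], a[col])]
    return [row[n:] for row in a]


def decode(survivors):
    """survivors: {codeword index: symbol} with at least M entries; returns the data."""
    idx = sorted(survivors)[:M]
    h = [[column(j)[i] for j in idx] for i in range(M)]  # H: selected columns
    h_inv = gf_invert_matrix(h)
    s = [survivors[j] for j in idx]
    return [reduce(xor, (gf_mul(s[r], h_inv[r][c]) for r in range(M)), 0)
            for c in range(M)]
```

    For example, after `code = encode([11, 22, 33, 44])`, losing symbols 1 and 3 leaves `{0: code[0], 2: code[2], 4: code[4], 5: code[5]}`, from which `decode` rebuilds the four data symbols.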

  • Optimizations
    [Tabs: Galois Field | Parity Matrix | GF Multiplication]
    Galois Field: GF(2^16) vs. GF(2^8)
    (+) GF(2^16) halves the number of symbols, hence the number of operations in the GF.
        GF(2^8): 1 symbol = 1 byte
        GF(2^16): 1 symbol = 2 bytes
    (-) Size of the multiplication tables
        GF(2^8): 0.768 KB
        GF(2^16): 393.216 KB (512 × 0.768)
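
    The two table sizes on this slide can be reproduced with a short calculation. The layout assumed here is not stated on the slide: a log table of 2^w entries plus an antilog table of twice that length, each entry holding one symbol of w/8 bytes. The function name `table_bytes` is hypothetical.

```python
def table_bytes(w: int) -> int:
    """Size in bytes of the log and antilog tables for GF(2^w) (assumed layout)."""
    symbol_bytes = w // 8
    log_entries = 2 ** w
    antilog_entries = 2 * 2 ** w  # doubled table, so index sums need no modulo
    return (log_entries + antilog_entries) * symbol_bytes


small = table_bytes(8)    # 768 bytes     = 0.768 KB
large = table_bytes(16)   # 393216 bytes  = 393.216 KB
print(small, large, large // small)
```

    Under that assumption the ratio between the two sizes is 512, which matches the "512 × 0.768" note on the slide.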

  • Optimizations (2)
    [Tabs: Galois Field | Parity Matrix | GF Multiplication]
    Parity Matrix
      1st column of 1s: encoding of the 1st PB uses XOR calculus only >> gain in encoding & decoding
      1st row of 1s: any update from the 1st DB is processed with XOR calculus >> performance gain of 4% (case of PB creation, m = 4)
    Example parity matrix (hexadecimal):
      0001 0001 0001
      0001 eb9b 2284
      0001 2284 974
      0001 9e44 d7f1
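
    With a column of 1s, every GF multiplication for the first parity bucket is a multiplication by 1, so its content is the plain XOR of the data buckets, and a single lost data bucket is rebuilt by XOR as well. The sketch below illustrates this on four small byte buffers; the buffer contents and the name `xor_parity` are made up for the example.

```python
from functools import reduce


def xor_parity(buffers):
    """Bytewise XOR of equally sized buffers."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*buffers))


data_buckets = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # m = 4 data buckets
first_pb = xor_parity(data_buckets)                  # first parity bucket

# Data bucket 2 is lost: XOR the three survivors with the first parity bucket.
recovered = xor_parity([data_buckets[0], data_buckets[1], data_buckets[3], first_pb])
print(recovered == data_buckets[2])
```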

  • Optimizations (3)
    [Tabs: Galois Field | Parity Matrix | GF Multiplication]
    GF Multiplication
      Goal: reduce GF multiplication complexity. e1 * e2 = antigflog[ gflog[e1] + gflog[e2] ]
      Encoding: log pre-calculus of the coefficients of the P matrix >> improvement of 3.5%
        0000 0000 0000
        0000 5ab5 e267
        0000 e267 0dce
        0000 784d 2b66
      Decoding: log pre-calculus of the coefficients of the H^-1 matrix and of the OK symbols vector >> improvement of 4% to 8%, depending on the number of buckets to recover
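
    The log pre-calculus idea can be sketched as follows: the logarithms of the matrix coefficients are computed once, so each multiplication during encoding needs only one `gflog` lookup (for the data symbol) instead of two. The sketch uses GF(2^8) with polynomial 0x11D and four arbitrary non-zero coefficients; these values and the function names are assumptions for illustration.

```python
gflog = [0] * 256
antigflog = [0] * 512
_x = 1
for _p in range(255):
    antigflog[_p] = antigflog[_p + 255] = _x
    gflog[_x] = _p
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D  # assumed primitive polynomial

coefs = [0x01, 0x1D, 0x4C, 0x9D]       # one matrix column (illustrative values)
log_coefs = [gflog[c] for c in coefs]  # computed once, before any encoding


def parity_plain(symbols):
    """Two gflog lookups per multiplication."""
    p = 0
    for c, s in zip(coefs, symbols):
        if c and s:
            p ^= antigflog[gflog[c] + gflog[s]]
    return p


def parity_prelogged(symbols):
    """One gflog lookup per multiplication: the coefficient log is already known."""
    p = 0
    for log_c, s in zip(log_coefs, symbols):
        if s:
            p ^= antigflog[log_c + gflog[s]]
    return p
```

    Since the logarithm of 1 is 0, a coefficient of 1 becomes 0 in the pre-logged matrix, which is consistent with the 0000 entries in the first row and column shown on the slide.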

  • LH*RS - Parity Groups
    Grouping concept: m = #data buckets, k = #parity buckets.
    Data buckets hold data records (Key; Data); each record gets an insert rank r.
    Parity buckets hold parity records (Rank r; [Key-list]; Parity).
    A k-available group survives the failure of k buckets.
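
    The record structures on this slide can be sketched in code. The sketch assumes that group g is made of data buckets g·m to g·m + m − 1, and it shows only the XOR-based first parity bucket; the class and function names are hypothetical and the thesis implementation may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

M = 4  # data buckets per parity group (illustrative)


def group_of(bucket: int) -> int:
    """Parity group of a data bucket, assuming consecutive buckets are grouped."""
    return bucket // M


@dataclass
class ParityRecord:
    """(Rank r; [Key-list]; Parity) for all data records of rank r in one group."""
    rank: int
    key_list: List[Optional[int]] = field(default_factory=lambda: [None] * M)
    parity: bytearray = field(default_factory=bytearray)

    def apply_insert(self, bucket: int, key: int, payload: bytes) -> None:
        """Record the key in the bucket's slot and XOR the payload into the parity."""
        self.key_list[bucket % M] = key
        if len(self.parity) < len(payload):
            self.parity.extend(bytes(len(payload) - len(self.parity)))
        for i, byte in enumerate(payload):
            self.parity[i] ^= byte


rec = ParityRecord(rank=0)
rec.apply_insert(bucket=4, key=100, payload=b"\x0f\x0f")
rec.apply_insert(bucket=5, key=200, payload=b"\xf0\x01")
print(group_of(5), rec.key_list, bytes(rec.parity))
```

    The key list tells the recovery procedure which data records took part in the parity field; an empty slot means that the corresponding bucket has no record at this rank.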

  • Outline
    1. Issue
    2. State of the Art
    3. LH*RS Scheme
    4. LH*RS Manager
         Communication
         Gross Architecture
    5. Experimentations
    6. File Creation
       Bucket Recovery

  • Communication
    [Tabs: TCP/IP | UDP | Multicast]
    UDP >> Performance
      Individual operations (Insert, Update, Delete, Search)
      Record recovery
      Control messages

  • Communication
    [Tabs: TCP/IP | UDP | Multicast]
    TCP/IP >> Performance & Reliability
      Large buffer transfers:
        New parity buckets
        Transfer of parity updates & records (bucket split)
        Bucket recovery

  • Communication
    [Tabs: TCP/IP | UDP | Multicast]
    Multicast >> Multipoint communication
      Looking for new data/parity buckets

  • Architecture
    Enhancements to the SDDS2000 architecture:
    (1) TCP/IP Connection Handler
        TCP/IP connections are passive OPEN, RFC 793 [ISI81]; TCP/IP under Win2K Server OS [MB00].
        1 bucket recovery (3.125 MB): SDDS2000: 6.7 s; SDDS2000-TCP: 2.6 s >> improvement of 60%
        (Hardware config.: 733 MHz CPU machines, 100 Mbps network)
    (2) Flow Control & Message Acknowledgement (FCMA)
        Principle of sending credit & message conservation until delivery [J88, GRS97, D01]
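
    The principle of sending credit with message conservation until delivery can be sketched as a sender that keeps at most `credit` messages unacknowledged, retains each message until its acknowledgement arrives, and can retransmit what is still pending. This is a generic illustration, not the FCMA code of SDDS2000; the class `CreditSender` and the `transmit` callback are hypothetical.

```python
from collections import OrderedDict, deque


class CreditSender:
    def __init__(self, credit, transmit):
        self.credit = credit           # maximum number of unacknowledged messages
        self.transmit = transmit       # callable(msg_id, payload) that puts a message on the wire
        self.next_id = 0
        self.unacked = OrderedDict()   # msg_id -> payload, kept until acknowledged
        self.backlog = deque()         # payloads waiting for credit

    def send(self, payload):
        self.backlog.append(payload)
        self._drain()

    def ack(self, msg_id):
        """An acknowledgement frees one credit and lets a waiting message go."""
        self.unacked.pop(msg_id, None)
        self._drain()

    def retransmit(self):
        """Resend every message that is still waiting for its acknowledgement."""
        for msg_id, payload in self.unacked.items():
            self.transmit(msg_id, payload)

    def _drain(self):
        while self.backlog and len(self.unacked) < self.credit:
            payload = self.backlog.popleft()
            msg_id = self.next_id
            self.next_id += 1
            self.unacked[msg_id] = payload
            self.transmit(msg_id, payload)
```

    With a credit of 2, a third `send` is held back until one of the first two messages is acknowledged. A real implementation would also need timers to decide when to call `retransmit`.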

  • Architecture (2)
    (3) Dynamic IP Addressing Structure
        Before: pre-defined and static table of IP addresses.
        To tag new servers (data or parity) using multicast:
        [Diagram: coordinator, multicast group of blank data buckets, multicast group of blank parity buckets, created buckets]

  • Architecture (3)
    [Diagram: ports, threads and structures of a server]
    Ports: multicast listening port, UDP listening port, UDP sending port, TCP/IP port.
    Threads: multicast listening thread, multicast working thread, UDP listening thread, TCP listening thread, pool of working threads, ACK management threads.
    Structures: message queues; ACK structure holding the messages waiting for ACK (unacknowledged messages) and free zones.
    Network

  • Experimentation
    Performance evaluation: CPU time, communication time.
    Experimental environment:
      5 machines (Pentium IV: 1.8 GHz, RAM: 512 MB)
      Ethernet network: 1 Gbps
      O.S.: Win2K Server
      Tested configuration: 1 client, a group of 4 data buckets, k parity buckets (k = 0, 1, 2, 3).

  • Outline
    Issue
    State of the Art
    LH*RS Scheme
    LH*RS Manager
    Experimentations
      File Creation
      Parity Update
      Performance
      Bucket Recovery
      Parity Bucket Creation

  • File Creation
    Client operations: propagation of data record inserts/updates/deletes to parity buckets.
      Update: send only the Δ-record.
      Deletes: management of free ranks within data buckets.
    Data bucket split (N1: #remaining records, N2: #leaving records):
      Parity group of the splitting data bucket: N1+N2 deletes + N1 inserts
      Parity group of the new data bucket: N2 inserts
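
    The parity traffic caused by one data bucket split can be counted directly from the formulas on this slide. The sketch below only restates that arithmetic; the function name and the even split used in the example are assumptions.

```python
def split_parity_operations(n1: int, n2: int) -> dict:
    """Parity operations for a split with n1 remaining and n2 leaving records."""
    return {
        "splitting_group": {"deletes": n1 + n2, "inserts": n1},
        "new_group": {"inserts": n2},
    }


# Example: a full bucket of 10 000 records that splits evenly (assumed split ratio).
ops = split_parity_operations(n1=5000, n2=5000)
total = (ops["splitting_group"]["deletes"]
         + ops["splitting_group"]["inserts"]
         + ops["new_group"]["inserts"])
print(ops, total)
```

    Read literally, a split therefore generates 2·(N1 + N2) parity operations per parity bucket involved, which helps explain why split time grows noticeably once parity buckets are present.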

  • Performances
    [Tabs: Config. | Client Window = 1 | Client Window = 5]
    Config.:
      Max bucket size = 10 000 records
      File of 25 000 records
      1 record = 104 bytes
    No difference between GF(2^8) and GF(2^16) (we don't wait for ACKs between DBs and PBs).

  • Performances
    [Tabs: Config. | Client Window = 1 | Client Window = 5]
    k = 0 to k = 1: performance degradation of 20%
    k = 1 to k = 2: performance degradation of 8%
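
    The slide does not state how the degradation percentages are derived. One reading that reproduces them from the file creation times labelled on this slide's chart (7,896 s, 9,990 s and 10,963 s for k = 0, 1, 2) is the extra time divided by the time of the higher-k configuration, truncated to an integer. This is an assumption, sketched below with a hypothetical function name.

```python
def degradation_percent(t_lower_k: float, t_higher_k: float) -> float:
    """Extra time of the higher-k run, relative to the higher-k run itself (assumed formula)."""
    return 100.0 * (t_higher_k - t_lower_k) / t_higher_k


t_k0, t_k1, t_k2 = 7.896, 9.990, 10.963   # file creation times in seconds

d01 = degradation_percent(t_k0, t_k1)     # truncates to 20
d12 = degradation_percent(t_k1, t_k2)     # truncates to 8
print(int(d01), int(d12))
```

    Dividing by the lower-k time instead would give roughly 26% and 10%, which does not match the slide.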

    [Chart: file creation time (sec) versus number of inserted keys, one curve each for k = 0, k = 1 and k = 2. Times for the full file of 25 000 records: 7.896 s (k = 0), 9.990 s (k = 1), 10.963 s (k = 2). A second copy of the chart labels the vertical axis "Insert Time (sec)".]

    [Chart data sheets, GF(2^8): "k = 0", "(k = 1) + RS", "(k = 1) + New Matrix", "(k = 2) + New Matrix". Each sheet gives the cumulative insert time (sec) and the average time per record (ms), sampled every 500 keys up to 25 000 keys, for three trials. Outside bucket splits the average time per record is mostly 0.28 to 0.31 ms for k = 0, 0.31 to 0.34 ms for k = 1 and 0.34 to 0.41 ms for k = 2.]

    Data bucket split times (sec), Trial 1 / Trial 2 / Trial 3:

    Configuration          DB 0 split              DB 1 split              DB 2 split
    k = 0                  0.218 / 0.203 / 0.219   0.219 / 0.219 / 0.219   0.094 / 0.094 / 0.094
    k = 1, RS              0.875 / 0.890 / 0.875   0.859 / 0.860 / 0.906   0.250 / 0.250 / 0.250
    k = 1, New Matrix      0.875 / 0.875 / 0.875   0.875 / 0.891 / 0.875   0.250 / 0.250 / 0.250 (row labelled "DB 0 split" in the sheet)
    k = 2, New Matrix      0.906 / 0.906 / 0.891   0.922 / 0.921 / 0.891   0.281 / 0.282 / 0.281

    Comparison: total file creation time (sec) for 25 000 records

    Configuration           Trial 1   Trial 2   Trial 3   Mean     Improvement (%),   Improvement (%),
                                                                   New Matrix         GF[2^8] / GF[2^16]
    k = 0, GF[2^8]          7.859     7.953     7.875     7.896
    k = 0, GF[2^16]         7.907     8.062     7.985     7.985                       1.115
    k = 1, RS, GF[2^8]      9.922     10.141    9.969     10.011
    k = 1, XOR, GF[2^8]     9.969     10.031    9.969     9.990    0.210
    k = 1, RS, GF[2^16]     10.218    10.062    10.172    10.151                      1.379
    k = 1, XOR, GF[2^16]    10.156    10.156    10.062    10.125   0.256              1.333
    k = 2, GF[2^8]          11.187    10.859    10.843    10.963
    k = 2, GF[2^16]         10.984    10.938    11.000    10.974                      0.100
    Average                                                        0.233              0.982

  • Performances
    [Tabs: Config. | Client Window = 1 | Client Window = 5]
    k = 0 to k = 1: performance degradation of 37%
    k = 1 to k = 2: performance degradation of 10%

    [Chart: file creation time (sec) versus number of inserted keys, one curve each for k = 0, k = 1 and k = 2, plotted from the per-k averages of three trials. Times for the full file of 25 000 records: 4.349 s (k = 0), 6.940 s (k = 1), 7.720 s (k = 2).]

    [Chart data sheet "UDP listen priority = highest": cumulative insert time (sec), sampled every 500 keys up to 25 000 keys, for k = 0, k = 1 and k = 2. The sheet lists three trials per k, but only two k = 2 columns carry values. Total times at 25 000 keys: k = 0: 3.921, 4.531, 4.391; k = 1: 7.641, 6.187, 7.735; k = 2: 8.500, 19.141.]

    k = 0

    Trial 1: 6 DBs; {DB0, DB1, DB4, DB5}: 3125 records each; {DB2, DB3}: 6250 records each

    Trial 2: 5 DBs; {3125, 6251, 6249, 6249, 3126} records respectively for {DB0, ..., DB4}

    Trial 3: 4 DBs; {6251, 6250, 6249, 6250} records respectively for {DB0, ..., DB3}

    k = 1 and k = 2

    [Per-trial counter sheets: for each data bucket, number of records stored, inserts received, messages sent to PBs and forwards; for each parity bucket, number of records stored and inserts received.]

    Loss rate between the messages sent to the PBs and the inserts received by the PBs:

    k = 1, Trial 1 (5 DBs): 0.05% in the first parity group, 0% in the second
    k = 1, Trial 2 (4 DBs): 0.00%
    k = 1, Trial 3 (6 DBs): 0.04% in the first parity group, 0% in the second
    k = 2, Trial 1 (5 DBs): 0.40% in the first parity group, 0% in the second
    k = 2, Trial 2 (6 DBs): 0.06% in the first parity group, 0% in the second

    Trial 2: repeated runs give the same result (about 19 s for the file creation time). Data records are lost at the client level, which causes message retransmissions.

    Fast + Ack [client -- DB], without loss: file creation time (sec) by number of inserted keys (T1 to T3 = trials 1 to 3)

           k = 0                k = 1                k = 2                Average
    Key    T1     T2     T3     T1     T2     T3     T1     T2     T3     k = 0  k = 1  k = 2
    0      0      0      0      0      0      0      0      0      0      0      0      0
    500    0.093  0.109  0.109  0.125  0.156  0.141  0.156  0.156  0.171  0.104  0.14   0.16
    1000   0.172  0.219  0.219  0.265  0.281  0.266  0.313  0.313  0.312  0.203  0.27   0.31
    1500   0.265  0.328  0.328  0.390  0.422  0.406  0.469  0.453  0.468  0.307  0.41   0.46
    2000   0.359  0.438  0.422  0.531  0.562  0.547  0.625  0.609  0.625  0.406  0.55   0.62
    2500   0.453  0.547  0.531  0.656  0.687  0.672  0.781  0.766  0.781  0.510  0.67   0.78
    3000   0.547  0.656  0.641  0.797  0.844  0.813  0.938  0.922  0.937  0.615  0.82   0.93
    3500   0.640  0.766  0.750  0.937  0.969  0.953  1.094  1.078  1.093  0.719  0.95   1.09
    4000   0.734  0.875  0.891  1.062  1.109  1.094  1.250  1.219  1.234  0.833  1.09   1.23
    4500   0.812  0.984  1.000  1.187  1.234  1.219  1.406  1.375  1.39   0.932  1.21   1.39
    5000   0.906  1.094  1.109  1.328  1.375  1.360  1.563  1.531  1.531  1.036  1.35   1.54
    5500   1.015  1.203  1.219  1.453  1.516  1.500  1.719  1.688  1.687  1.146  1.49   1.70
    6000   1.109  1.313  1.328  1.594  1.656  1.641  1.891  1.844  1.843  1.250  1.63   1.86
    6500   1.218  1.438  1.438  1.734  1.812  1.781  2.047  2      2      1.365  1.78   2.02
    7000   1.297  1.547  1.531  1.859  1.937  1.906  2.203  2.156  2.156  1.458  1.90   2.17
    7500   1.390  1.641  1.641  1.984  2.078  2.047  2.360  2.297  2.296  1.557  2.04   2.32
    8000   1.484  1.750  1.750  2.125  2.203  2.172  2.500  2.453  2.437  1.661  2.17   2.46
    8500   1.578  1.891  1.844  2.250  2.344  2.313  2.656  2.594  2.593  1.771  2.30   2.61
    9000   1.672  2.000  1.953  2.375  2.469  2.438  2.813  2.75   2.734  1.875  2.43   2.77
    9500   1.750  2.109  2.063  2.515  2.609  2.578  2.969  2.906  2.89   1.974  2.57   2.92
    10000  1.843  2.219  2.172  2.640  2.750  2.719  3.125  3.063  3.031  2.078  2.70   3.07
    10001  1.859  2.219  2.172  2.640  2.750  2.719  3.125  3.063  3.031  2.083  2.70   3.07
    10500  2.156  2.547  2.469  3.703  4.406  3.844  4.860  4.234  4.25   2.391  3.98   4.45
    11000  2.203  2.609  2.531  3.765  4.469  3.922  4.922  4.313  4.312  2.448  4.05   4.52
    11500  2.265  2.672  2.594  3.844  4.531  3.985  4.985  4.391  4.39   2.510  4.12   4.59
    12000  2.312  2.734  2.656  3.984  4.578  4.063  5.047  4.469  4.484  2.567  4.21   4.67
    12500  2.375  2.797  2.703  4.062  4.641  4.125  5.110  4.563  4.562  2.625  4.28   4.75
    13000  2.422  2.859  2.766  4.140  4.703  4.203  5.172  4.641  4.64   2.682  4.35   4.82
    13500  2.484  2.953  2.828  4.203  4.750  4.266  5.235  4.719  4.718  2.755  4.41   4.89
    14000  2.531  3.016  2.922  4.281  4.812  4.344  5.297  4.797  4.796  2.823  4.48   4.96
    14500  2.625  3.063  2.984  4.344  4.859  4.406  5.360  4.875  4.875  2.891  4.54   5.04
    15000  2.672  3.125  3.031  4.422  4.922  4.485  5.422  4.953  4.953  2.943  4.61   5.11
    15500  2.718  3.188  3.094  4.484  4.984  4.547  5.485  5.031  5.031  3.000  4.67   5.18
    16000  2.765  3.234  3.141  4.562  5.047  4.625  5.547  5.109  5.109  3.047  4.74   5.26
    16500  2.812  3.297  3.203  4.640  5.094  4.703  5.610  5.188  5.187  3.104  4.81   5.33
    17000  2.875  3.359  3.266  4.703  5.156  4.766  5.688  5.266  5.265  3.167  4.88   5.41
    17500  2.922  3.422  3.313  4.781  5.219  4.844  5.750  5.344  5.343  3.219  4.95   5.48
    18000  2.968  3.484  3.375  4.859  5.266  4.922  5.813  5.438  5.437  3.276  5.02   5.56
    18500  3.015  3.531  3.438  4.922  5.328  4.985  5.875  5.516  5.515  3.328  5.08   5.64
    19000  3.078  3.594  3.500  5.000  5.391  5.063  5.938  5.594  5.593  3.391  5.15   5.71
    19500  3.125  3.656  3.547  5.078  5.453  5.141  6.016  5.688  5.671  3.443  5.22   5.79
    20000  3.187  3.719  3.609  5.156  5.516  5.203  6.078  5.766  5.75   3.505  5.29   5.86
    20001  3.187  3.719  3.609  5.156  5.516  5.203  6.078  5.766  5.75   3.505  5.29   5.86
    20002  3.187  3.719  3.609  5.156  5.516  5.203  6.078  5.766  5.75   3.505  5.29   5.86
    20003  3.437  3.719  3.609  5.156  5.516  5.203  6.078  5.766  5.75   3.588  5.29   5.86
    20004  3.437  3.719  3.859  5.156  5.516  5.219  6.078  5.766  5.75   3.672  5.30   5.86
    20005  3.437  3.969  3.859  5.156  5.516  5.219  6.078  5.766  5.765  3.755  5.30   5.87
    20500  3.593  4.141  4.141  6.469  5.875  7.047  6.469  7.703  7.671  3.958  6.46   7.28
    21000  3.625  4.188  4.188  6.515  5.937  7.094  6.516  7.75   7.718  4.000  6.52   7.33
    21500  3.672  4.250  4.219  6.578  5.984  7.141  6.563  7.797  7.75   4.047  6.57   7.37
    22000  3.703  4.297  4.266  6.640  6.047  7.188  6.610  7.828  7.796  4.089  6.63   7.41
    22500  3.750  4.344  4.297  6.687  6.094  7.235  6.656  7.875  7.843  4.130  6.67   7.46
    23000  3.781  4.391  4.328  6.734  6.156  7.297  6.703  7.922  7.89   4.167  6.73   7.51
    23500  3.828  4.453  4.375  6.797  6.203  7.344  6.750  7.969  7.937  4.219  6.78   7.55
    24000  3.859  4.500  4.406  6.844  6.250  7.391  6.797  8.016  7.984  4.255  6.83   7.60
    24500  3.890  4.547  4.453  6.906  6.312  7.438  6.844  8.094  8.031  4.297  6.89   7.66
    25000  3.953  4.609  4.484  6.969  6.359  7.500  6.906  8.141  8.109  4.349  6.94   7.72

    Split times (sec):
    DB 0 split  0.250  0.984  0.984  1.063  1.062  1.110  1.125
    DB 1 split  0.093  0.265  0.312  0.266  0.343  0.313  0.281
    DB 0 split  0.265  0.984  0.609  0.967  0.594  1.015  1.000
    DB 0 split  0.547  0.546  0.547

    Configuration: all entities (DBs and client) run on 1.8 GHz machines with a 1 Gbps network

    Window size = number of working threads in a DB = 5

    PBs and any other buckets, if created, are on the 2.6 GHz machine

    k = 1

    Trial 1

                    DB0      DB1      DB2      DB3      Total    PB1      Loss
    # recs          6250     6250     6250     6250     -        5887     -
    # recv ins      16288    6286     1222     1215     -        24632    1.50%
    # sent to PBs   16282    6281     1222     1215     25000    -        -
    # forwards      6        5        0        0

    Trial 2

                    DB0      DB1      DB2      DB3      Total    PB1      Loss
    # recs          6250     6250     6250     6250     -        6207     -
    # recv ins      13792    6256     3720     1244     -        24822    0.70%
    # sent to PBs   13785    6252     3719     1244     25000    -        -
    # forwards      7        4        1        0

    Trial 3

                    DB0      DB1      DB2      DB3      DB4      Total 1  PB1      Loss 1   Total 2  PB2      Loss 2
    # recs          3125     6250     6249     6250     3126     -        6250     -        -        3126     -
    # recv ins      15690    6285     1222     1216     598      -        24402    0%       -        598      0%
    # sent to PBs   15683    6281     1222     1216     598      24402    -        -        598      -        -
    # forwards      7        4        0        0        0

    k = 2

    Trial 1

                    DB0      DB1      DB2      DB3      Total    PB1      PB2      Loss
    # recs          6250     6251     6250     6249     -        5957     5985     -
    # recv ins      13788    6250     3727     1246     -        24263    24253    3.00%
    # sent to PBs   27557    12494    7452     2492     49995    -        -        -
    # forwards      7        3        1        0

    Trial 2

                    DB0      DB1      DB2      DB3      DB4      Total 1  PB1      PB2      Loss 1   Total 2  PB3      PB4      Loss 2
    # recs          3125     6250     6251     6250     3124     -        6251     6251     -        -        3124     3124     -
    # recv ins      15701    6292     1226     1208     600      -        24400    24397    0%       -        600      600      0%
    # sent to PBs   31367    12572    2442     2416     1200     48797    -        -        -        1200     -        -        -
    # forwards      166500

    Trial 3

                    DB0      DB1      DB2      DB3      DB4      Total 1  PB1      PB2      Loss 1   Total 2  PB3      PB4      Loss 2
    # recs          3125     6251     6250     6249     3125     -        6121     6119     -        -        3125     3125     -
    # recv ins      15683    6287     1228     1213     605      -        24257    24252    0.60%    -        605      605      0%
    # sent to PBs   31341    12564    2456     2426     1210     48787    -        -        -        1210     -        -        -
    # forwards      115000

    [Chart: File Creation Time (sec) vs. Number of Inserted Keys; Fast + Ack [client -- DB]; series k = 0, k = 1, k = 2; end-point labels: 4.349 s (k = 0), 6.940 s (k = 1), 7.720 s (k = 2)]

  • Outline
    1. Issue
    2. State of the Art
    3. LH*RS Scheme
    4. LH*RS Manager
    5. Experimentations
    6. File Creation
    7. Bucket Recovery: Scenario, Performances
    8. Parity Bucket Creation

  • Scenario: Failure Detection
    The Coordinator asks the Data Buckets and the Parity Buckets: "Are you alive?"

  • Scenario (2): Waiting for Responses
    Data Buckets and Parity Buckets reply to the Coordinator: "OK"
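    The failure-detection step above can be sketched as follows. This is a minimal illustration, not the LH*RS manager's code: the function name `detect_failures` and the `ask` callback (standing in for the "Are you alive?" exchange with its time-out) are assumptions made for the example.

```python
from typing import Callable, List


def detect_failures(buckets: List[str], ask: Callable[[str], bool]) -> List[str]:
    """Return the buckets that did not answer "OK" to "Are you alive?".

    `ask` is a hypothetical transport hook: it returns True when the bucket
    answers "OK", returns False otherwise, or raises TimeoutError when no
    reply arrives in time.
    """
    failed = []
    for bucket in buckets:
        try:
            alive = ask(bucket)
        except TimeoutError:
            alive = False  # no reply within the time-out counts as a failure
        if not alive:
            failed.append(bucket)
    return failed
```

    With a stub such as `lambda b: b != "DB1"`, the call `detect_failures(["DB0", "DB1", "PB0"], ...)` reports `["DB1"]` as unavailable.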

  • Scenario (3): Searching Spare Buckets
    The Coordinator asks the multicast group of blank Data Buckets: "Wanna be spare?"

  • Scenario (4): Waiting for Replies
    Blank Data Buckets of the multicast group reply to the Coordinator: "I would"
    Each volunteer launches UDP listening, launches TCP listening, launches working threads, then waits for confirmation*
    * If the time-out elapses, cancel everything

  • Scenario (5): Spare Selection
    Messages between the Coordinator and the multicast group of blank Data Buckets: "You are hired", "Confirmed" (twice), "Cancellation"
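    A possible reading of the spare-selection step, as a sketch only: the slides list the messages but not the selection rule, so hiring the first volunteers in reply order is an assumption, and `select_spares` is a hypothetical name.

```python
from typing import Dict, List


def select_spares(volunteers: List[str], needed: int) -> Dict[str, str]:
    """Decide which message each volunteer ("I would") receives.

    Assumption: the first `needed` volunteers are hired, all others are
    cancelled. If too few buckets volunteered, everything is cancelled.
    """
    if len(volunteers) < needed:
        return {v: "Cancellation" for v in volunteers}
    return {
        v: "You are hired" if i < needed else "Cancellation"
        for i, v in enumerate(volunteers)
    }
```

    For two failed buckets and three volunteers, two buckets are hired and one is cancelled, which matches the message counts listed for Scenario (5).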

  • Scenario (6): Recovery Manager Selection
    The Coordinator asks one of the Parity Buckets: "Recover Failed Buckets"

  • Scenario (7): Query Phase
    Actors: Data Buckets, Parity Buckets, Recovery Manager, Spare Buckets
    The Recovery Manager asks the buckets participating in the recovery: "Send me records of rank in [r, r+slice-1]"

  • Scenario (8): Reconstruction Phase
    Actors: Data Buckets, Parity Buckets, Recovery Manager, Spare Buckets
    The buckets participating in the recovery return the requested buffers
    Decoding phase, in parallel with the query phase
    Recovered slices are sent to the Spare Buckets
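    The slice-by-slice recovery can be illustrated for the simplest case: one lost data bucket rebuilt from the surviving data buckets and one XOR parity bucket. This sketch assumes equal-length records and equally filled buckets; the names `xor_parity` and `recover_bucket` are illustrative, and the RS decoding used for several failures is not shown.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_parity(buckets: List[List[bytes]]) -> List[bytes]:
    """Parity record of rank r = XOR of the rank-r records of all buckets."""
    return [reduce(xor_bytes, same_rank) for same_rank in zip(*buckets)]


def recover_bucket(survivors: List[List[bytes]], parity: List[bytes],
                   slice_size: int) -> List[bytes]:
    """Rebuild the single missing data bucket, one slice of ranks at a time."""
    recovered: List[bytes] = []
    for r in range(0, len(parity), slice_size):
        # Query phase: ask every participant for ranks [r, r + slice_size - 1].
        buffers = [b[r:r + slice_size] for b in survivors]
        buffers.append(parity[r:r + slice_size])
        # Decoding phase: XOR the same-rank records of the returned buffers.
        recovered.extend(reduce(xor_bytes, same_rank) for same_rank in zip(*buffers))
    return recovered
```

    The slice size only bounds how many records are held in memory at once; the recovered bucket is the same for any slice size.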

  • Performances: Config.
    File info: file of 125 000 records; record size = 100 bytes; bucket size = 31250 records (3.125 MB)
    Group of 4 Data Buckets (m = 4), k-available with k = 1, 2, 3
    Decoding: GF(2^16); RS+ decoding (RS + log pre-calculus of H^-1 and of the OK symbols vector)
    Recovery per slice (adaptive to PCs' storage & computing capacities)

  • Performances: 1 DB (XOR)

    Slice    Total Time (sec)   CPU Time (sec)   Com. Time (sec)
    1250     0.625              0.266            0.348
    3125     0.588              0.255            0.323
    6250     0.552              0.240            0.312
    15625    0.562              0.255            0.302
    31250    0.578              0.250            0.328

  • Performances: 1 DB (RS)

    Slice    Total Time (sec)   CPU Time (sec)   Com. Time (sec)
    1250     0.734              0.349            0.365
    3125     0.688              0.359            0.323
    6250     0.656              0.354            0.297
    15625    0.667              0.360            0.297
    31250    0.688              0.360            0.328

  • Performances: XOR vs. RS
    Time to recover 1 DB with XOR: 0.58 sec
    Time to recover 1 DB with RS: 0.67 sec
    XOR in GF(2^16) realizes a gain of 13% in total time (and 30% in CPU time)

  • Performances: 2 DBs

    Slice    Total Time (sec)   CPU Time (sec)   Com. Time (sec)
    1250     0.976              0.577            0.375
    3125     0.932              0.589            0.338
    6250     0.883              0.562            0.321
    15625    0.875              0.562            0.281
    31250    0.875              0.562            0.313

  • Performances: 3 DBs

    Slice    Total Time (sec)   CPU Time (sec)   Com. Time (sec)
    1250     1.281              0.828            0.406
    3125     1.250              0.828            0.390
    6250     1.211              0.852            0.352
    15625    1.188              0.823            0.361
    31250    1.203              0.828            0.375

  • Performances: Summary
    Time to recover f buckets < f × time to recover 1 bucket
    The query phase is factorized; the extra cost ("the +") is the decoding time and the time to send the recovered buffers

    f         Bucket Size (MB)   Total Time (sec)   Recovery Speed (MB/sec)
    1 (XOR)   3.125              0.58               5.38
    1 (RS)    3.125              0.67               4.66
    2         6.250              0.9                6.94
    3         9.375              1.23               7.62
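    The recovery speeds in the summary follow from the recovered volume divided by the total time. The short check below recomputes them from the figures on the slides (100-byte records, 31250 records per bucket); the helper name `recovery_speed` is illustrative. The results agree with the slide to within rounding of the last digit.

```python
RECORD_SIZE_BYTES = 100
RECORDS_PER_BUCKET = 31250
BUCKET_MB = RECORD_SIZE_BYTES * RECORDS_PER_BUCKET / 1e6  # 3.125 MB

# (number of recovered buckets f, total recovery time in seconds)
MEASURED = {"1 (XOR)": (1, 0.58), "1 (RS)": (1, 0.67), "2": (2, 0.9), "3": (3, 1.23)}


def recovery_speed(f: int, total_time: float) -> float:
    """Recovered volume (MB) per second when f buckets are rebuilt."""
    return f * BUCKET_MB / total_time


for label, (f, seconds) in MEASURED.items():
    print(f"f = {label}: {recovery_speed(f, seconds):.2f} MB/sec")

# Recovering f buckets at once is cheaper than f separate single-bucket
# recoveries, because the query phase is shared.
single_rs = MEASURED["1 (RS)"][1]
assert MEASURED["2"][1] < 2 * single_rs
assert MEASURED["3"][1] < 3 * single_rs
```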

  • Performances: GF(2^8)
    XOR in GF(2^8) improves decoding performance by 60% compared to RS in GF(2^8).

    RS/RS+ decoding in GF(2^16) realizes a gain of 50% compared to decoding in GF(2^8).

  • Outline
    1. Issue
    2. State of the Art
    3. LH*RS Scheme
    4. LH*RS Manager
    5. Experimentations
    6. File Creation
    7. Bucket Recovery
    8. Parity Bucket Creation: Scenario, Performances

  • Scenario: Searching for a New Parity Bucket
    The Coordinator asks the multicast group of blank Parity Buckets: "Wanna join group g?"

  • Scenario (2): Waiting for Replies
    Blank Parity Buckets of the multicast group reply to the Coordinator: "I would"
    Each volunteer launches UDP listening, launches TCP listening, launches working threads, then waits for confirmation*
    * If the time-out elapses, cancel everything

  • Scenario (3): New Parity Bucket Selection
    Messages between the Coordinator and the multicast group of blank Parity Buckets: "You are hired", "Confirmed", "Cancellation" (twice)

  • Scenario (4): Auto-creation, Query Phase
    The new Parity Bucket asks the group of Data Buckets: "Send me your contents!"

  • Scenario (5): Auto-creation, Encoding Phase
    The new Parity Bucket encodes the contents received from the group of Data Buckets

  • Performances: Config.
    Max bucket size: 5000 .. 50000 records
    Bucket load factor: 62.5%
    Record size: 100 bytes
    Group of 4 Data Buckets
    Encoding: GF(2^16), RS++ (log pre-calculus & row of 1s: XOR encoding to process the 1st DB buffer)
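    The "log pre-calculus" idea can be sketched with table-based Galois-field multiplication. The example uses GF(2^8) to keep the tables small (the slides mainly report GF(2^16)); the primitive polynomial 0x11D and the names `gf_mul` and `parity_symbol` are assumptions for illustration, not the thesis code. A coefficient equal to 1 needs no multiplication, which is why a row of 1s reduces to plain XOR.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed; a common choice)

# Pre-computed antilog (EXP) and log (LOG) tables for the generator 2.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] needs no modulo


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) symbols with two table lookups and one addition."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def parity_symbol(coeffs, symbols) -> int:
    """One parity symbol: XOR-sum of coefficient * data symbol over the group."""
    p = 0
    for c, s in zip(coeffs, symbols):
        p ^= s if c == 1 else gf_mul(c, s)  # coefficient 1: plain XOR
    return p
```

    With all coefficients equal to 1, `parity_symbol` is the XOR of the data symbols, i.e. the XOR parity used for the first parity bucket.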

  • Performances: XOR
    Same encoding rate
    For any bucket size: CPU time is about 74% of total time

    Bucket Size   Total Time (sec)   CPU Time (sec)   Com. Time (sec)
    5000          0.190              0.140            0.029
    10000         0.429              0.304            0.066
    25000         1.007              0.738            0.144
    50000         2.062              1.484            0.322

  • Performances: RS
    Same encoding rate
    For any bucket size: CPU time is about 74% of total time

    Bucket Size   Total Time (sec)   CPU Time (sec)   Com. Time (sec)
    5000          0.193              0.149            0.035
    10000         0.446              0.328            0.059
    25000         1.053              0.766            0.153
    50000         2.103              1.531            0.322

  • Performances: XOR vs. RS
    For bucket size = 50000 records:
    XOR encoding time: 2.062 sec
    RS encoding time: 2.103 sec
    XOR realizes a performance gain of 5% in CPU time (only about 2% in total time)

  • Performances: GF(2^8)
    As with GF(2^16), CPU time = 3/4 of total time
    XOR in GF(2^8) improves CPU time by 22%

  • Performance (Wintel P4, 1.8 GHz, 1 Gbps)
    File creation rate: 0.33 MB/s for k = 0; 0.25 MB/s for k = 1; 0.23 MB/s for k = 2
    Record insert time: 0.29 ms for k = 0; 0.33 ms for k = 1; 0.36 ms for k = 2
    Bucket recovery rate: 4.66 MB/s from 1-unavailability; 6.94 MB/s from 2-unavailability; 7.62 MB/s from 3-unavailability
    Record recovery time: about 1.3 ms
    Key search time: individual > 0.24 ms; bulk > 0.056 ms

  • Conclusion
    Experiments prove:
    Encoding/decoding optimizations
    Impact of the architecture on performance
    Good recovery performance

  • Future Work
    Update propagation to Parity Buckets: reliability, performance
    Reduce Coordinator tasks
    Parity declustering
    Investigation of new erasure-resilient codes

  • References
    [PGK88] D. A. Patterson, G. Gibson & R. H. Katz, A Case for Redundant Arrays of Inexpensive Disks, Proc. of ACM SIGMOD Conf., pp. 109-116, June 1988.
    [ISI81] Information Sciences Institute, RFC 793: Transmission Control Protocol (TCP) Specification, Sept. 1981, http://www.faqs.org/rfcs/rfc793.html
    [MB 00] D. MacDonald, W. Barkley, MS Windows 2000 TCP/IP Implementation Details, http://secinf.net/info/nt/2000ip/tcpipimp.html
    [J88] V. Jacobson, M. J. Karels, Congestion Avoidance and Control, Computer Communication Review, Vol. 18, No. 4, pp. 314-329.
    [XB99] L. Xu & J. Bruck, X-Code: MDS Array Codes with Optimal Encoding, IEEE Trans. on Information Theory, 45(1), pp. 272-276, 1999.
    [CEG+ 04] P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, S. Sankar, Row-Diagonal Parity for Double Disk Failure Correction, Proc. of the 3rd USENIX Conf. on File and Storage Technologies, April 2004.
    [R89] M. O. Rabin, Efficient Dispersal of Information for Security, Load Balancing and Fault Tolerance, Journal of the ACM, Vol. 36, No. 2, April 1989, pp. 335-348.
    [W91] P. E. White, RAID X tackles design problems with existing design RAID schemes, ECC Technologies, ftp://members.aol.com.mnecctek.ctr1991.pdf
    [GRS97] J. C. Gomez, V. Rego, V. S. Sunderam, Efficient Multithreaded User-Space Transport for Network Computing: Design & Test of the TRAP Protocol, Journal of Parallel & Distributed Computing, 40(1), 1997.

  • References (2)
    [BK+ 95] J. Blömer, M. Kalfane, R. Karp, M. Karpinski, M. Luby & D. Zuckerman, An XOR-Based Erasure-Resilient Coding Scheme, ICSI Tech. Rep. TR-95-048, 1995.
    [LS00] W. Litwin & T. Schwarz, LH*RS: A High-Availability Scalable Distributed Data Structure using Reed Solomon Codes, Proc. of ACM SIGMOD 2000, pp. 237-248.
    [KLR96] J. Karlsson, W. Litwin & T. Risch, LH*LH: A Scalable High Performance Data Structure for Switched Multicomputers, EDBT 96, Springer Verlag.
    [RS60] I. Reed & G. Solomon, Polynomial Codes over Certain Finite Fields, Journal of the Society for Industrial and Applied Mathematics, 1960.
    [P97] J. S. Plank, A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems, Software Practice & Experience, 27(9), Sept. 1997, pp. 995-1012.
    [D01] A. W. Diène, Contribution à la Gestion de Structures de Données Distribuées et Scalables, PhD Thesis, Nov. 2001, Université Paris Dauphine.
    [B00] F. Sahli Bennour, Contribution à la Gestion de Structures de Données Distribuées et Scalables, PhD Thesis, June 2000, Université Paris Dauphine.
    More references: http://ceria.dauphine.fr/rim/theserim.pdf

  • Publications
    [ML02] R. Moussa, W. Litwin, Experimental Performance Analysis of LH*RS Parity Management, Carleton Scientific Records of the 4th International Workshop on Distributed Data & Structures: WDAS 2002, pp. 87-97.
    [MS04] R. Moussa, T. Schwarz, Design and Implementation of LH*RS: A Highly-Available Scalable Distributed Data Structure, Carleton Scientific Records of the 6th International Workshop on Distributed Data & Structures: WDAS 2004.
    [LMS04] W. Litwin, R. Moussa, T. Schwarz, Prototype Demonstration of LH*RS: A Highly Available Distributed Storage System, Proc. of VLDB 2004 (Demo Session), pp. 1289-1292.
    [LMS04-a] W. Litwin, R. Moussa, T. Schwarz, LH*RS: A Highly Available Distributed Storage System, journal version submitted, under revision.

  • Thank You for Your Attention
    Questions?

    The client has an image of the file.
    Computing one parity symbol (RS): m GF multiplications + m-1 XORs
    Computing one parity symbol (XOR): m-1 XORs
    Multiply by the number of symbols per record
    Multiply by the number of records

    Deduct the number of dummy data slots
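    As a worked instance of the operation counts in this note, the sketch below applies them to the configuration given on the slides (m = 4, 100-byte records, 31250 records per bucket). The function name `parity_ops` is illustrative, and the deduction for dummy data slots mentioned in the note is not modeled.

```python
def parity_ops(m: int, record_bytes: int, symbol_bits: int,
               records: int, rs: bool = True) -> dict:
    """Count the operations needed to encode one parity bucket.

    Per parity symbol: m GF multiplications + (m - 1) XORs for RS,
    or (m - 1) XORs only for XOR parity.
    """
    symbols_per_record = record_bytes * 8 // symbol_bits
    total_symbols = symbols_per_record * records
    return {
        "symbols": total_symbols,
        "gf_mults": m * total_symbols if rs else 0,
        "xors": (m - 1) * total_symbols,
    }


# RS parity in GF(2^16): 2-byte symbols, 50 symbols per 100-byte record.
print(parity_ops(m=4, record_bytes=100, symbol_bits=16, records=31250))
# XOR parity counted per byte-sized symbol, for comparison.
print(parity_ops(m=4, record_bytes=100, symbol_bits=8, records=31250, rs=False))
```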


    Performance of GF(2^16) compared to GF(2^8):

    The 1 is the identity

    In parallel: after the decoding phase, the query phase

    Recovery speeds