practical web-based delta sync for cloud storage services · web apps with local storage or log...

26
He Xiao Zhenhua Li Ennan Zhai Tianyin Xu Practical Web-based Delta Sync for Cloud Storage Services [email protected] July 10, 2017 Hotstorage’17

Upload: others

Post on 08-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

HeXiao Zhenhua Li Ennan Zhai Tianyin Xu

PracticalWeb-basedDeltaSyncforCloudStorageServices

[email protected],2017Hotstorage’17

Page 2: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

NetworkTrafficisOverwhelminginCloudStorage

FileSync

2

CloudTraffichas30%CAGR(CompoundAverage Growth Rate)

SeverClient

NetworkTrafficUsers Vendors

Page 3: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

DeltaSyncImproves Network Efficiency

DeltaSynciscrucialforreducingcloudstoragenetworktraffic.

10MB1B

DeltaSync

DeltaData

3

NewFile OldFile

Delta sync support in nine state-of-the-art cloud storage services 10MB

FullSyncNewFile OldFile

FullFile

Page 4: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

No Web-basedDeltaSync

Whyweb-baseddeltasyncisnotsupportedbytoday’scloudstorageservices?

4

WebAppswithlocalstorageorlogfilesneedweb-basedDeltaSync

WebisthemostpervasiveandOS-independent cloudstorageaccessmethod

Web-baseddeltasyncisessentialforcloudstoragewebclientsandwebapps

Page 5: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

Contribution

• Wequantitatively studywhyweb-baseddeltasyncisnotoffered bytoday’scloudstorageservices.

• Webuildapracticalweb-baseddeltasyncsolutionforcloudstorageservices.• Byreversing traditionaldeltasyncprocess,wemaketheoverheadaffordableatthewebclientside.• Byexploitingthelocality ofusers’editsandtradingoffhashalgorithms,wemakethecomputationoverheadaffordableattheserverside.

5

Page 6: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

WebRsync:ImplementDeltaSynconWeb

• Implementrsync onrealcloudstoragewith nativewebtech:JavaScript + HTML5 + WebSocket• rsync isthedefactosolutionofdeltasyncincloudstorage

JavaScriptImplementationofRsync

WebServer

LocalFile System

HTML5FileAPI

WebSocket

StorageBackendAliyun OSS/OpenStack Swift

High-SpeedInternalNetwork

Web Browser

CImplementationofRsync

6

Page 7: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

WebRsyncvs.rsync

7

Sync time of WebRsync vs rsync

Average Client CPU utilization

Page 8: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

StagnationduetoJavaScript’sSingle-thread EventLoopModel

//printtimestampevery100mssetInterval(print(timestamp),100) //printthetimestampofeverykeystone( startorendofatask)on_start(task); print(task.id, timestamp) on_finish(task); print(task.id, timestamp)

8

StagMeter

Page 9: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

1.SendmetadataWaitserver

2.ChecksumSearchandComparison

3.SendtokensandliteralbytesWaitserver

High CPU Utilizationwhencomputing

TimestampPrintingissuspendedWebisunderstagation state

StagMeteronWebRsync

9

Sync Process (Second)

Page 10: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

WebR2sync:Client-sideOptimizationReverseComputationProcess

Client Server

RequestforSyncingFilef’

ChecksumListoffSegmentationFingerprinting

SearchingComparing

GeneratetokensandLiteralBytes Construct

NewFilefACK

10

WebRsync

Page 11: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

WebR2sync:Client-sideoptimizationReverseComputationProcess

• Web Reverse Rsync: Reverse complicated computation fromserver to client.

Client Server

RequestforSyncingFilef’

SegmentationFingerprinting

GenerateTokensAndLiteralBytes

ConstructNewFilefACK

SearchingComparing

ChecksumListoff

11

Page 12: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

PerformanceofWebR2sync

Edit Size (Byte)

Sync

Tim

e (S

econ

d)

12

Edit Size (Byte)

Sync

Tim

e (S

econ

d)

Issue:Servertakesseverelyheavyoverhead.

Page 13: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

Server-sideOverheadProfiling

Checksumsearching andblockcomparison occupy80%ofthecomputingtime

MD5 Computing Checksum Search

13

Ø UsefasterhashfunctionstoreplaceMD5Ø Reducechecksumsearchingoverhead

Page 14: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

ReplacingMD5withSipHashinChunkComparison

HashFunction CollisionProbability

CyclesperByte

MD5 Low 5.58

Murmur3 High 0.33

Spooky High 0.14

SipHash Low 1.13

SipHashremainlowCollisionProbabilityat muchfasterspeed

14

A comparison of pseudorandom hash functions

Page 15: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

SolvePossibleHashCollision

• ReplaceMD5withSipHash,maycausepotentialcollisions(Probabilityp),sodoesMD5.

• OurSolution:UseSpooky(fastestmethod,collisionprobabilityp’).• Theprobabilityofcollisionsisp*p’

• Alternative:UseMD5orotherstronghashfunctionsasaglobalverification.• ComputeMD5overwholefileisexpensive.

15

Page 16: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

ReduceChunkSearchingbyExploitingLocality ofFileEdits.

16

MD5-4

HashTableAdler32-1 Adler32-2 Adler32-3 Adler32-4

MD5-1 MD5-2 MD5-3

Block1 Block2 Block3 Block4

Checksumsearch

Compare

95%synchronizedfileshavelessthan10 edits.

Page 17: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

EvaluationSetup

17

Basic experiment setup visualized in a map of China

Page 18: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

SyncTime

18

1 10 100 1K 10K 100kEdit Size (Byte)

10-1

100

101Sy

nc T

ime

(Sec

ond) WebRsync

WebR2syncWebR2sync+rsync

WebR2sync+is2-3 times fasterthanWebR2syncand15-20timesfasterthanWebRsync

Page 19: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

Throughput

19

0 2000 4000 6000 8000Number of Concurrent Users

NoWebRsync

WebRsync

WebR2sync

WebR2sync+

rsync

Thisthroughputisas4 timesasthatofWebR2sync/rsyncandas 9timesasthatofNoWebRsync.

Page 20: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

FutureWork

• Evaluateourapproachunderdifferenteditmodes• delete,insert,append

• Evaluatetrafficefficiency• allthemethodsshouldhavesimilartrafficefficiency

• Understandtheeffectsofthreeoptimizations• evaluatethemseparately

20

Page 21: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

Discussion

• Probabilityofcollisionsoffilechecksums

• Characteristicsoffileoperationsinreal-worldscenariosfromtheperspectiveofsync

• Localitymeasurefordecidingwhethertoapplylocality-basedoptimization.

21

Page 22: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

Conclusion

•WebR2sync+isapracticalsolutionforweb-baseddeltasync• lightweightcomputation attheclientside• optimizedoverheadattheserverside• theserver-sideoptimizationscanbeadoptedinthetraditionalcloudstoragearchitecture

22

Page 23: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

Thanks!discussion

23

Page 24: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

WebRsyncDetailed Description

Block1

Block2

Block3

Adler32 MD5

Adler32 MD5

Adler32 MD5

… …

WeakChecksumSearch

StrongChecksumCompare

1 block offset

YES

YES

NO

NO

MatchedTokens LiteralBytes ConstructNewFile

Client Server

1 byte offset

Rolling Adler32O(1): Adler(i)=>Adler(i+1)

24

Page 25: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

WebR2sync:FlowchartandData structure

ConstructNewFilesClient Server

WeakChecksumSearch

StrongChecksumCompare

YES

NO

NO

1 byte offsetNo further Operation

YESBlock 1Block 2Block 3Block 4

Block 1Block 2Block 3Block 4

Whenfind amatch,recordtheassociatedindex

25

Page 26: Practical Web-based Delta Sync for Cloud Storage Services · Web Apps with local storage or log files need web -based Delta Sync ... overhead affordable at the web client side. •

SyncTimedecomposed

26

1 10 100 1K 10K 100KEdit Size (Byte)

0

0.05

0.1

0.15

0.2Sy

nc T

ime

(Sec

ond) Server

NetworkClient

WebR2sync+clienttakesstableandshortertime.BecauseoftheServer-sideoptimization,computingtimeismuchshorterbothinclientandserver.