virtualization technique for replica synchronization
Virtualization Technique
For Replica Synchronization
By: Ashwin G. Sancheti
Email: [email protected]
Instructor: Prof. Randal Burns
Date: 19th Feb 2008
Roadmap
• Motivation/Goals
• What is Virtualization?
• Advantages?
• My previous work
• Architecture and design
• Algorithm and working
• Pros and cons of my work
• New proposed idea
• Different approaches for the same
• Conclusion
Problem and Motivation
• Redundancy in replica synchronization
• Redundant information that may already be present on the receiver side is transferred anyway.
• Time required to transfer the data is very high.
• Disk space usage
• More I/O
e.g.: Patch update example
Goal
• To identify the part common to two virtual machines and transfer only the differing part (the delta) with minimum transmission delay.
• Reduce the I/O traffic
• Reduce the disk space
Two Solutions:
• TAPER Algorithm
• Replayfs trace (VFS approach)
Non-virtual machine and VM
Advantages
• Server consolidation
• Testing and development
• Dynamic load balancing
• Disaster recovery
• Resource sharing
• Reduce power consumption
• Reduce the land requirement
• Many others…
My Work
• Current problem: Large disk space is needed to store multiple virtual machines.
• Goal: To get the delta part between two virtual machines and delete one virtual machine.
• Saves lots of space.
• Example: Windows 2K Server (Plain Vanilla) and Windows 2K + Exchange Server
Architecture and Design
• VMDK architecture
• Sparse header
• Descriptor table
• Grain directories
• Grain tables
• Grain size
• CRC computation
• Algorithm to compute the delta part
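The slides do not spell out the delta algorithm, so the following is only a minimal sketch of one way a grain-level CRC comparison could work. It assumes both disks are available as flat byte strings and uses a hypothetical fixed grain size of 64 KB; the function names `grain_crcs`, `delta_grains` and `apply_delta` are illustrative and not taken from the original work. Real VMDK sparse extents would first have to be resolved through their grain directories and grain tables.

```python
import zlib
from typing import Dict, List

GRAIN_SIZE = 64 * 1024  # assumed grain size in bytes (hypothetical default)


def grain_crcs(image: bytes, grain_size: int = GRAIN_SIZE) -> List[int]:
    """Return the CRC32 of every fixed-size grain of a flat disk image."""
    return [zlib.crc32(image[off:off + grain_size])
            for off in range(0, len(image), grain_size)]


def delta_grains(base: bytes, derived: bytes,
                 grain_size: int = GRAIN_SIZE) -> Dict[int, bytes]:
    """Collect the grains of `derived` whose CRC differs from the grain at
    the same index in `base`, plus any grains beyond the end of `base`.

    Caveat: equal CRCs do not prove equal content; a real tool would
    confirm a match with a byte comparison or a stronger hash.
    """
    base_crcs = grain_crcs(base, grain_size)
    delta = {}
    for index, off in enumerate(range(0, len(derived), grain_size)):
        grain = derived[off:off + grain_size]
        if index >= len(base_crcs) or zlib.crc32(grain) != base_crcs[index]:
            delta[index] = grain
    return delta


def apply_delta(base: bytes, delta: Dict[int, bytes], derived_len: int,
                grain_size: int = GRAIN_SIZE) -> bytes:
    """Rebuild the derived image from the base image and the stored delta."""
    out = bytearray()
    for index, off in enumerate(range(0, derived_len, grain_size)):
        out += delta.get(index, base[off:off + grain_size])
    return bytes(out)
```

With this layout only the base image and the delta grains need to be kept; the second image can be regenerated on demand with `apply_delta`.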
Test cases and Results
• VM 1: Win 2000 (Plain), size: 1 GB
• VM 2: Win 2000 + Exchange Server, size: 1.3 GB
• Delta VM: Only Exchange Server, size: 120 MB
New proposal
• Goal: To find the common part between the client machine and the remote machine and transmit only the delta, with minimum transmission delay.
• Advantages: No need to transmit the whole virtual machine.
• Less time to transmit the data.
First Approach
• TAPER algorithm: Directory tree synchronization protocol between source and target node
• Works in four phases:
• Directory tree
• Large chunks
• Smaller blocks
• Bytes
First Approach (Contd.)
• Directory matching
• Eliminates identical portions of the directory tree that are common in content and structure
• Hierarchical hash tree implementation
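As an illustration of the hierarchical hash tree idea, here is a minimal sketch. It assumes a directory tree is modelled as nested Python dicts (name to sub-dict for directories, name to bytes for files), which is a simplification for the example and not TAPER's actual data structure. The names `node_hash` and `unmatched_paths` are hypothetical.

```python
import hashlib


def node_hash(node) -> str:
    """Hash a file by its content and a directory by the sorted
    (name, child hash) pairs of its entries."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"F" + node).hexdigest()
    h = hashlib.sha1(b"D")
    for name in sorted(node):
        h.update(name.encode() + b"\0" + node_hash(node[name]).encode() + b"\0")
    return h.hexdigest()


def unmatched_paths(source, target, prefix: str = "") -> list:
    """Return the source file paths that are not matched by identical
    content at the same path in the target.

    A subtree whose hash equals the target's is pruned without being
    descended into. For brevity, hashes are recomputed at every level;
    a real implementation would cache them in the tree.
    """
    if target is not None and node_hash(source) == node_hash(target):
        return []  # identical in content and structure: nothing to send
    if isinstance(source, bytes):
        return [prefix]
    target_children = target if isinstance(target, dict) else {}
    result = []
    for name in sorted(source):
        result += unmatched_paths(source[name], target_children.get(name),
                                  prefix + "/" + name)
    return result
```

Only the files returned by `unmatched_paths` would move on to the chunk-matching phase; everything under a matching directory hash is eliminated in one comparison.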
First Approach (Contd.)
• Matching chunks: We are now left with unmatched files at the source and the target.
• Use content-defined chunking (CDC) to reduce the unmatched data.
• Chunk boundaries are defined by Rabin fingerprinting (?)
• The target sends the SHA values for all the remaining files to the source.
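The chunk-matching phase can be sketched as follows. Note that this sketch substitutes a simple polynomial rolling hash for a true Rabin fingerprint, and the window size, boundary mask and function names (`cdc_chunks`, `chunk_digests`, `unmatched_chunks`) are assumptions chosen for the example, not TAPER's parameters.

```python
import hashlib

WINDOW = 16            # bytes covered by the rolling hash (assumed)
MASK = (1 << 6) - 1    # boundary when the low 6 bits are zero (~64-byte chunks)
BASE = 257
MOD = (1 << 31) - 1


def cdc_chunks(data: bytes) -> list:
    """Split data at positions where the hash of the last WINDOW bytes
    has its low bits equal to zero, so boundaries depend on content
    rather than on absolute offsets."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a boundary
    return chunks


def chunk_digests(data: bytes) -> set:
    """SHA-1 digests of all chunks, as the target would send them."""
    return {hashlib.sha1(c).hexdigest() for c in cdc_chunks(data)}


def unmatched_chunks(source_data: bytes, target_digests: set) -> list:
    """Chunks at the source whose digest the target did not report."""
    return [c for c in cdc_chunks(source_data)
            if hashlib.sha1(c).hexdigest() not in target_digests]
```

Because boundaries are chosen from content, inserting a few bytes near the start of a file shifts only the chunks around the insertion; later chunks keep the same digests and still match.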
First Approach (Contd.)
• Matching blocks: Each file at the source will be a series of matched and unmatched regions (holes).
• Fine-grained block matching is performed, so we are left with unmatched data blocks at the source side.
• Matching bytes: The blocks in the unmatched data are delta-encoded against similar blocks in the matched set.
• Finally, the remaining unmatched data and delta bytes are sent to the target using a standard compression algorithm.
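A rough sketch of the block-matching step and the compressed transfer is shown below. It is an assumption-laden simplification: it hashes the block at every offset with SHA-1 instead of using a rolling checksum, uses a tiny hypothetical block size, and does not attempt the byte-level delta encoding against similar blocks. The names `block_index`, `encode` and `decode` are illustrative only.

```python
import hashlib
import zlib

BLOCK = 8  # assumed block size in bytes, kept tiny for the example


def block_index(target: bytes, block: int = BLOCK) -> dict:
    """Map the digest of each fixed-size target block to its offset."""
    return {hashlib.sha1(target[o:o + block]).digest(): o
            for o in range(0, len(target) - block + 1, block)}


def encode(source: bytes, index: dict, block: int = BLOCK) -> list:
    """Describe the source as copies of target blocks plus compressed
    literal runs for the unmatched data."""
    ops, literal, i = [], bytearray(), 0
    while i < len(source):
        window = source[i:i + block]
        digest = hashlib.sha1(window).digest()
        if len(window) == block and digest in index:
            if literal:  # flush pending unmatched bytes first
                ops.append(("literal", zlib.compress(bytes(literal))))
                literal = bytearray()
            ops.append(("copy", index[digest]))
            i += block
        else:
            literal.append(source[i])
            i += 1
    if literal:
        ops.append(("literal", zlib.compress(bytes(literal))))
    return ops


def decode(ops: list, target: bytes, block: int = BLOCK) -> bytes:
    """Rebuild the source at the target from its own blocks and the literals."""
    out = bytearray()
    for kind, value in ops:
        out += target[value:value + block] if kind == "copy" else zlib.decompress(value)
    return bytes(out)
```

Only the `literal` payloads carry file data over the network; each `copy` entry is just an offset into data the target already holds.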
Second Approach
• Perform the comparison at the VFS level.
• Replayfs: Replaying file system traces at the VFS level.
• Main goal: To reproduce the original file system workload as accurately as possible
Replayfs Component
• Raw traces and replayable traces
• Trace compiler: takes the raw traces and produces replayable traces
• Command
• Sequence of VFS operations with their associated timestamps and process IDs
• The actual return value is compared with the return value captured in the original trace.
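The replay-and-compare idea can be illustrated with a small user-space toy. This is not Replayfs itself, which replays inside the kernel at the VFS level; the `Command` record, the `ToyFS` class and the `replay` function are hypothetical stand-ins, and the negative return codes merely imitate errno-style values.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """One traced operation: when it ran, who ran it, and what it returned."""
    timestamp: float
    pid: int
    op: str
    args: tuple
    expected: object  # return value captured in the original trace


class ToyFS:
    """In-memory stand-in for a file system, used only for this sketch."""

    def __init__(self):
        self.files = {}

    def create(self, path):
        if path in self.files:
            return -17  # imitates EEXIST
        self.files[path] = b""
        return 0

    def write(self, path, data):
        if path not in self.files:
            return -2   # imitates ENOENT
        self.files[path] += data
        return len(data)

    def unlink(self, path):
        if path not in self.files:
            return -2   # imitates ENOENT
        del self.files[path]
        return 0


def replay(trace, fs):
    """Replay commands in timestamp order and collect every command whose
    actual return value differs from the one recorded in the trace."""
    mismatches = []
    for cmd in sorted(trace, key=lambda c: c.timestamp):
        actual = getattr(fs, cmd.op)(*cmd.args)
        if actual != cmd.expected:
            mismatches.append((cmd, actual))
    return mismatches
```

An empty mismatch list means the replayed file system behaved the same as the traced one for every recorded operation; any entry points to a place where the two diverge.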
Replayfs (Contd.)
• Resource Allocation Table (RAT): Used to refer to the command parameters and the return values.
• Always kept in memory.
Replayfs (Contd.)
• Memory buffers: Largest component of the Replayfs trace
• Necessary to replay the trace.
• Include file names and buffers to be written in the future.
Conclusion
• Detecting the common part between two virtual machines drastically reduces the disk space required.
• TAPER is one algorithm for finding the common part.
• Another approach is to capture traces at the VFS level (Replayfs).
References
• TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization
University of Texas at Austin
• Accurate and Efficient Replaying of File System Traces
Stony Brook University
THANK YOU
Any questions?