
Z. Qian et al. (Eds.): Recent Advances in CSIE 2011, LNEE 125, pp. 399–413. springerlink.com © Springer-Verlag Berlin Heidelberg 2012

Impact of Chunk Size on Deduplication and Disk Prefetch

Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Cyrille Artho, and Yoshihito Watanabe

Abstract. CAS (Content Addressable Storage) systems reduce the total volume of virtual disks through deduplication. The effect of deduplication has been evaluated and confirmed in several papers. Most evaluations, however, used small chunk sizes (4KB-8KB) and did not account for I/O optimization (disk prefetch) under real usage. An effective disk prefetch is larger than such a chunk and therefore triggers many CAS operations. Furthermore, previous evaluations did not consider the ratio of effective data in a chunk, which can be improved by file-system block reallocation guided by an access profile. Chunk size should be decided with these effects in mind. This paper evaluates the effectiveness of deduplication for large chunks on a CAS system that accounts for disk-prefetch optimization and the ratio of effective data in a chunk. The optimization targets the boot procedure, because booting is a mandatory operation on any operating system. The results show that a large chunk (256KB) is effective for booting Linux while maintaining the benefit of deduplication.

1 Introduction

Content Addressable Storage (CAS) has become a popular method of managing disk images for many virtual machine instances [1,2,3]. In CAS systems, data is managed in chunks of a certain size, and each chunk is addressed not by its physical location but by a name derived from its content (usually a secure hash serves as the unique name). A CAS system reduces its total volume through deduplication, which lets chunks with identical content share a single named copy.
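To make the naming scheme concrete, the following sketch (ours, for illustration only) stores fixed-size chunks under their SHA-1 names; writing a chunk whose content is already known adds nothing, which is exactly the deduplication effect described above.

```python
import hashlib

CHUNK_SIZE = 4096  # 4KB, the traditional file-system block size

class SimpleCAS:
    """Toy content-addressable store: chunks are named by their SHA-1."""
    def __init__(self):
        self.chunks = {}  # SHA-1 hex digest -> chunk data

    def put(self, data: bytes) -> str:
        """Store one chunk; identical content deduplicates to one entry."""
        name = hashlib.sha1(data).hexdigest()
        self.chunks.setdefault(name, data)  # no-op if already present
        return name

    def get(self, name: str) -> bytes:
        """Retrieve a chunk, verifying it against its own name."""
        data = self.chunks[name]
        assert hashlib.sha1(data).hexdigest() == name, "corrupted chunk"
        return data

cas = SimpleCAS()
# 3 zero-filled chunks + 1 data chunk: 4 references, but only 2 stored.
image = b"\x00" * (3 * CHUNK_SIZE) + bytes(range(256)) * 16
table = [cas.put(image[i:i + CHUNK_SIZE])
         for i in range(0, len(image), CHUNK_SIZE)]
print(len(table), "chunk references,", len(cas.chunks), "chunks stored")
```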

Most CAS systems use a 4 or 8 KB chunk size. This size matches the traditional block size of file systems and simplifies the abstraction of a block device. However, it is too small from the viewpoint of efficient disk access. If each 4KB chunk is managed by a

Kuniyasu Suzaki · Toshiki Yagi · Kengo Iijima · Cyrille Artho National Institute of Advanced Industrial Science and Technology, 1-1-1 Umezono, Tsukuba, Ibaraki, Japan 305-8568 Yoshihito Watanabe Alpha Systems Inc., 6-6-1 Kamidanaka, Nakahara-ku, Kawasaki, Kanagawa, Japan 211-0053


SHA-1 digest (20 bytes), a 1TB virtual disk is described by 5GB of digests (1TB / 4KB x 20 bytes). This is only 0.49% of the disk volume, but the management overhead is not trivial. A larger chunk is desirable from a management viewpoint.
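A quick calculation (ours) of this digest overhead for several chunk sizes:

```python
# Metadata needed to name every chunk of a 1TB disk with 20-byte SHA-1 digests.
DISK = 1 << 40   # 1TB
DIGEST = 20      # bytes per SHA-1 digest
for kb in (4, 8, 64, 256, 512):
    meta = DISK // (kb << 10) * DIGEST
    print(f"{kb:>3}KB chunks: {meta / (1 << 30):5.2f}GB of digests "
          f"({meta / DISK:.4%} of the disk)")
# 4KB chunks -> 5.00GB (0.4883%); 256KB chunks -> 0.08GB (0.0076%)
```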

On current operating systems, high I/O throughput is obtained by block-access optimization, especially disk prefetch. Disk prefetch is a widely used technique that reduces the number of I/O operations by reading extra blocks and keeping them in memory as a cache. Retrieving many small CAS chunks per disk prefetch works against this optimization, because it requires many chunk lookups in the CAS system.

Locality of reference (access locality) is important in a CAS system, because scattered (i.e., fragmented) data decreases the effective data in a chunk and increases the number of chunks needed to satisfy read requests. Locality of reference is improved by file-system block reallocation guided by an access profile. High locality of reference lets the disk-prefetch window grow, because it increases the cache hit ratio and reduces the number of accesses.

However, a large chunk reduces the effect of deduplication. Furthermore, a larger chunk reduces the ratio of effective data in a chunk, because locality of access is limited in the general case. The chunk size must therefore be adjusted to balance the effects of I/O optimization and deduplication under real usage.

In this paper, the effect of chunk size on deduplication and disk prefetch is evaluated on LBCAS (Loopback Content Addressable Storage) [4], which manages each chunk as a file and reconstructs a loopback block device from them. The access-profile optimization is applied to the boot procedure, because booting is a mandatory operation on any operating system. We estimate the impact of chunk size on deduplication and disk prefetch.

This paper is organized as follows. Section 2 reviews related work and clarifies the features of CAS systems. Section 3 introduces issues in CAS systems: retrieval overhead and alignment. Section 4 describes the implementation details of LBCAS. Section 5 analyzes the effect of disk prefetch on LBCAS. Section 6 evaluates the effect of deduplication and its interaction with disk prefetch. Section 7 discusses directions for future work, and Section 8 summarizes our conclusions.

2 Related Work

This section describes the features of existing CAS systems and compares them with LBCAS, which is used in this paper.

Venti, a pioneering CAS system, was developed for Plan 9 data archiving [5,6]. Venti is a block storage system in which chunks are identified by a collision-resistant cryptographic hash. A chunk is created when its data first appears in the CAS system. Chunks are stored in the database of the CAS system, which retrieves data by hash. Chunks are never removed and are read many times; this property is called WORM (Write Once, Read Many). A chunk is self-verifying with its hash digest and can maintain data integrity over a network. LBCAS has the same properties, but each chunk is saved as a file whose name is its SHA-1 digest, because the file abstraction is easy to handle and requires no special database.

Current CAS systems fall into two categories, using either fixed-length or variable-length chunks. Foundation [7] uses fixed-length chunks, while DeepStore [8] and NEC HYDRAstor [9] use variable-length chunks. Variable-length chunks can find more deduplication opportunities, but they cause internal fragmentation in the database. From the viewpoint of management overhead, a fixed alignment is easy to handle. LBCAS uses fixed-length chunks for maintainability.


The effect of deduplication on several OS images was evaluated in [2,3]. These papers examined deduplication from many aspects: within a single OS image, among related OS images, and between updated OS images. The results showed the advantage of deduplication, but they were obtained with small chunk sizes (from 512B to 8KB). This paper evaluates the effect of larger chunk sizes, considering access optimization under real usage.

For high throughput, chunk aggregation was introduced for distributed CAS systems [10]. In that work, 8KB chunks were aggregated into larger container objects (4MB), which proved efficient on the network. The system in [11] used the same technique and achieved 100MB/s throughput. However, these systems used special packing methods and ignored the access pattern of the client OS. Our proposal aggregates data with a general block-reallocation technique that follows the access profile.

3 Issues on CAS Systems

A CAS system is a virtual block device with its own problems. Hash collision is the best-known one, but there are other problems related to chunk size. This section summarizes them.

3.1 Deduplication

The effects of deduplication appear in many forms, which are categorized into three types: intra, inter, and update [3]. Intra deduplication covers duplicated chunks that appear within one disk image. Inter deduplication appears among disk images: when different OS images contain identical chunks, those chunks are shared. Update deduplication appears between versions of a disk image, comparing an updated disk image with the previous one. Updates yield high deduplication, because most of the disk is unchanged by an update.

Intra deduplication benefits each VM instance, because a VM can reduce its total accesses to the disk image. In general, most intra-deduplicated chunks are zero-filled. Unfortunately, zero-filled chunks bring no benefit to a VM instance, because the OS never reads them. The remaining intra-deduplicated chunks reduce the number of accesses and contribute to performance in real use.

Inter and update deduplication benefit the server side, because they reduce the consumption of physical storage. Fortunately, a normal update changes only part of a disk image, and the remainder is deduplicated.
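The three categories can be computed directly from the chunk-name lists of the images. A minimal sketch (ours), reusing SHA-1 naming:

```python
import hashlib

def chunk_names(image: bytes, size: int) -> list:
    """Split an image into fixed-size chunks, each named by its SHA-1."""
    return [hashlib.sha1(image[i:i + size]).hexdigest()
            for i in range(0, len(image), size)]

def intra_ratio(names: list) -> float:
    """Fraction of chunks duplicating another chunk of the same image."""
    return 1 - len(set(names)) / len(names)

def inter_shared(a: list, b: list) -> int:
    """Number of distinct chunks two images (e.g. two OSes) share."""
    return len(set(a) & set(b))

def update_reuse(old: list, new: list) -> float:
    """Fraction of the updated image's chunks reused from the old image."""
    old_set = set(old)
    return sum(n in old_set for n in new) / len(new)
```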

3.2 Retrieval Overhead

Recent disk-prefetch mechanisms issue accesses larger than 4KB in order to hide disk access latency, saving the extra data in memory as a cache. A CAS system may thus have to retrieve multiple chunks per disk prefetch, which works against the prefetch optimization. To avoid such multiple retrievals, a CAS system should use a suitable chunk size.

However, the disk access size varies with the cache hit ratio, whereas a constant access size suits CAS systems better. The disk-prefetch mechanism must therefore be reconsidered.


3.3 Access Alignment

CAS systems with a fixed chunk size impose an access alignment. If an access crosses an alignment boundary, it must fetch several chunks. With a small chunk size there are many such crossings, which increases retrieval overhead in the CAS system. This behavior differs from a physical block device. The chunk size should be decided with alignment and access size in mind.
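The number of chunks a single request touches follows directly from the alignment. A small sketch (ours): with 4KB chunks a single prefetch fans out into many CAS lookups, while with 256KB chunks it usually stays within one.

```python
def chunks_touched(offset: int, length: int, chunk_size: int) -> range:
    """Chunk indices that a read of `length` bytes at `offset` must fetch."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)

# A 33KB readahead (the average observed later for normal ext3):
print(len(chunks_touched(10_000, 33 * 1024, 4 * 1024)))    # 9 chunks on 4KB
print(len(chunks_touched(10_000, 33 * 1024, 256 * 1024)))  # 1 chunk on 256KB
```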

4 LBCAS

LBCAS is a CAS system that was called HTTP-FUSE CLOOP in [4]. It was developed as an Internet block device, but it is also used as a local CAS system. Data chunks are saved as files named with their SHA-1 digests and serve as parts of a loopback block device. The abstraction level is high and the overhead is not trivial; however, the file abstraction is easy to handle, because many content-delivery technologies (e.g., file sharing, proxies) can be reused.

LBCAS is created from an existing block device. The device is divided into fixed-size chunks, and each chunk is saved to a small block file. The saved data is also compressed. Compression reduces traffic, but it causes internal fragmentation in the file system, because a normal file system manages files in 4KB data blocks. The same problem occurs in a CAS database that uses fixed-size records with compression. This loss is quantified in a later section.
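The creation step can be sketched as follows (ours; the real LBCAS tool and the layout of its mapping table may differ, and zlib stands in for whatever compressor LBCAS uses):

```python
import hashlib, os, zlib

def build_lbcas(device_path: str, out_dir: str, chunk_size: int = 256 * 1024):
    """Split a disk image into compressed block files named by SHA-1,
    and write a chunk-index -> file-name mapping table."""
    os.makedirs(out_dir, exist_ok=True)
    table = []
    with open(device_path, "rb") as dev:
        while chunk := dev.read(chunk_size):
            # Names derive from the uncompressed chunk here; the real
            # tool may hash the compressed file contents instead.
            name = hashlib.sha1(chunk).hexdigest()
            path = os.path.join(out_dir, name)
            if not os.path.exists(path):           # deduplication
                with open(path, "wb") as f:
                    f.write(zlib.compress(chunk))  # compression saves traffic
            table.append(name)
    with open(os.path.join(out_dir, "map01.idx"), "w") as idx:
        idx.writelines(f"{i} {n}\n" for i, n in enumerate(table))
```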

Each block file is named with the SHA-1 digest of its contents. The addresses of block files are managed by a mapping table, which relates physical addresses to SHA-1 file names. Figure 1 shows the creation of block files and the mapping-table file "map01.idx". Block files are network-transparent between client and server. A virtual block device (loopback file) is reconstructed from the mapping-table file on a client, with the client's local storage acting as a cache. The files are verified against their SHA-1 hash values when they are mapped into the virtual disk, which maintains the integrity of the block device.

When an access is issued to the virtual disk on a client, the relevant block file is downloaded and its data is mapped into the virtual disk. Block files are fetched only when required and can be removed on the client if needed; that is, only the necessary block files are used, on demand.

LBCAS has two levels of cache to prevent redundant downloads and decompression, called the "storage cache" and the "memory cache". The storage cache keeps downloaded block files on local storage, eliminating repeated downloads of the same block file. If all necessary block files are in the storage cache, LBCAS works without a network connection. In the current implementation, the volume of the storage cache is managed by a watermark algorithm with LIFO eviction: the most recently downloaded block files are removed when the volume exceeds the watermark, because older block files are likely to be needed again at boot time. The memory cache keeps one uncompressed block file in the memory of the LBCAS driver, eliminating decompression when the same block file is accessed in succession. Its coverage is the size of one block file, so it should be coordinated with the disk prefetch of the guest OS.
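The two cache levels can be modeled as below (ours; the watermark value and the exact eviction rule are simplifications of the behavior described above):

```python
import zlib

class LBCASCache:
    """Sketch of the two LBCAS cache levels.
    Storage cache: compressed block files kept locally, bounded by a
    watermark with LIFO eviction, so early downloads (e.g. boot files)
    tend to survive. Memory cache: one uncompressed chunk, so repeated
    hits on the same block file skip decompression."""
    def __init__(self, fetch, watermark=1024):
        self.fetch = fetch        # name -> compressed bytes (e.g. via HTTP)
        self.watermark = watermark
        self.storage = {}         # name -> compressed bytes
        self.order = []           # download order, newest last
        self.mem = (None, b"")    # (name, uncompressed bytes)

    def read(self, name: str) -> bytes:
        if self.mem[0] == name:                 # memory cache hit
            return self.mem[1]
        comp = self.storage.get(name)
        if comp is None:                        # download on miss
            comp = self.fetch(name)
            self.storage[name] = comp
            self.order.append(name)
            if len(self.order) > self.watermark:      # LIFO: evict the
                del self.storage[self.order.pop(-2)]  # newest except `name`
        data = zlib.decompress(comp)            # a storage-cache hit still
        self.mem = (name, data)                 # needs decompression
        return data
```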

Fig. 1 Server and client for LBCAS.

5 Disk Prefetch

Most operating systems have a disk-prefetch function to reduce I/O operations. Disk prefetch reads extra data from a block device and saves it to memory as a cache, hiding the latency of I/O when later read requests hit the cache. When the page cache does not hit, the next readahead shrinks the window size. A suitable window size achieves efficient use of cache memory and I/O requests, and it depends on the locality of reference.

Disk-prefetch techniques exist at both the kernel level and the user (application) level. Block reallocation in the file system can be considered part of disk prefetching, because it increases the locality of reference and improves the cache hit ratio. These techniques are closely related to the chunk-size problem on CAS.

5.1 Kernel Level

When a read request is issued, the kernel reads extra data and saves it to main memory as a cache. This reduces the number of I/O operations and hides the I/O delay. The function in the Linux kernel is called "readahead" [12,13]. The readahead window is extended or shrunk according to the profile of cache hits and misses.

Figure 2 shows the action of readahead on LBCAS. When a readahead operation is issued, several block files are downloaded and mapped to the virtual block device. When the chunk size is small, many small chunks are used per disk prefetch, which reduces the effect of readahead.

The figure also shows the effect of the caches of LBCAS. When the same block file is required in succession, it is held in the memory cache of LBCAS and its decompression is eliminated.

Fig. 2 Relation between LBCAS and readahead.
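A toy model (ours, not the actual Linux policy from [12,13]) of how an adaptive readahead window behaves, which is what ties prefetch efficiency to locality of reference:

```python
def next_window(current_kb: int, hit: bool,
                min_kb: int = 16, max_kb: int = 128) -> int:
    """Grow the readahead window on a page-cache hit, shrink it on a miss
    (a simplified stand-in for the kernel's adaptive policy)."""
    return min(current_kb * 2, max_kb) if hit else max(current_kb // 2, min_kb)

window = 32
for hit in (True, True, False, True):
    window = next_window(window, hit)
    print(window)  # 64, 128, 64, 128: good locality keeps the window wide
```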

5.2 User Level

Disk-prefetch techniques have also been developed at the user level. For example, libprefetch [14] is a prefetching library that reduces seek overhead. In particular, Linux has offered the system call "readahead" since kernel 2.4.13, which populates the page cache with the data of a file. The name is the same as the kernel-level "readahead", but the action differs: the whole file is loaded into the page cache before the file is read.

Some Linux distributions use the readahead system call to speed up the boot process. The files requested at boot time are listed in "/etc/readahead/boot", and their data is loaded into the page cache in advance at boot time. Unfortunately, user-level readahead requires a lot of memory, because it loads whole files; the total memory use has to be balanced. Ubuntu has a tool called "preload" that logs the files requested at boot time and arranges the file list for readahead.
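In a script, the same pre-population can be approximated with the standard posix_fadvise hint, which Python exposes directly; this is an analogous page-cache hint, not Ubuntu's actual readahead implementation:

```python
import os

def populate_page_cache(list_path: str = "/etc/readahead/boot"):
    """Hint the kernel to load each listed file into the page cache.
    POSIX_FADV_WILLNEED has an effect similar to the readahead(2)
    system call used by the boot-time readahead tools."""
    with open(list_path) as lst:
        for line in lst:
            path = line.strip()
            if not path or not os.path.isfile(path):
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                # length 0 means "to the end of the file" for fadvise
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            finally:
                os.close(fd)
```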

5.3 Block Reallocation

Most file systems have defragmentation tools that reallocate file-system blocks, which is said to increase locality of reference and speed up access. These tools, however, reallocate blocks so as to make files contiguous or to expand spare space; quick access is only a side effect of file contiguity. To solve this problem, "ext2/3optimizer" was developed [15]. Ext2/3optimizer takes a profile of the accessed blocks of an ext2/3 file system and reallocates those blocks in line.

Figure 3 shows the block layout of an ext2/3 file system before (left) and after (right) applying ext2/3optimizer. Ext2/3optimizer changes only the data-block pointers of inodes: it aggregates the profiled data blocks at the head of the device to increase locality of reference, while the rest of the ext2/3 structure, namely its metadata, is preserved. Because the reallocation follows a real access profile, the cache hit ratio stays high, and the readahead window can stay large. On LBCAS, the effective data in a chunk increases and the number of necessary block files decreases, as the sketch after Figure 3 illustrates.


Fig. 3 Reallocation by ext2/3optimizer and its effect on LBCAS block files. Red blocks are those accessed in real usage.
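The reallocation policy itself is simple to state: profiled blocks move to the head of the device in access order, and everything else keeps its relative position. A sketch of the mapping (ours; the real ext2/3optimizer additionally rewrites the inode pointers on disk):

```python
def reallocation_map(profile: list, total_blocks: int) -> dict:
    """Old block number -> new block number: profiled blocks are packed
    at the head of the device in first-access order."""
    new_of = {}
    for blk in profile:               # boot-time blocks first
        if blk not in new_of:
            new_of[blk] = len(new_of)
    for blk in range(total_blocks):   # the rest keep their relative order
        if blk not in new_of:
            new_of[blk] = len(new_of)
    return new_of

m = reallocation_map([907, 13, 511, 14], total_blocks=1024)
print([m[b] for b in (907, 13, 511, 14)])  # [0, 1, 2, 3]: packed at the head
```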

6 Evaluations

This section presents the results of deduplication at several chunk sizes and the performance of disk prefetch.

6.1 Deduplication on Larger Chunks

The effect of deduplication on LBCAS was evaluated with Debian GNU/Linux and Ubuntu. Debian (Lenny) and Ubuntu (9.04) were each installed on an 8GB LBCAS virtual disk, with the chunk size varied from 4KB to 512KB. We also evaluated Windows XP at several chunk sizes; those results are omitted for lack of space, but they showed the same characteristics.

1) Intra Deduplication. The left part of Table 1 shows the effect of intra deduplication. Deduplicated chunks are categorized as "non-zero data chunks" (NonZero) and "zero-filled chunks" (Zero). All zero-filled chunks are represented by a single block file, and they covered more than 50% of the image in every case. Intra deduplication of NonZero chunks, in contrast, was small (less than 5%). The NonZero effect shrank as the chunk grew, while the Zero share was not severely affected. Intra deduplication was dominated by zero-filled chunks.

The right half (three columns) of Table 1 shows volume usage. "Amount of native compressed chunk" (A) is the volume ignoring internal fragmentation. "Used volume in file system" (B) is the volume including internal fragmentation on ext3. "Disk waste rate" (A/B) shows how much of the allocated 4KB ext3 blocks holds real data; the loss is internal fragmentation caused by compression on LBCAS. The 4KB chunk caused heavy internal fragmentation, because the chunk size equals the ext3 block size, so every compressed block file wastes part of a 4KB block; the disk waste rate then reflects the true benefit of compression. Other chunk sizes were also affected by compression-induced fragmentation, but their waste rate improves because most compressed chunks exceed 4KB.


The same problem occurs in a CAS database with irregularly compressed chunks. Larger chunks mitigated this fragmentation problem; in particular, chunks of 64KB and above showed less than 10% loss on the 4KB blocks of ext3.
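The disk waste rate A/B of Table 1 can be reproduced with a few lines (ours; zlib again stands in for the actual compressor):

```python
import zlib

FS_BLOCK = 4096  # ext3 data-block size

def disk_waste_rate(chunks: list) -> float:
    """A/B: native compressed volume over the volume actually allocated,
    where every block file is rounded up to whole 4KB file-system blocks."""
    sizes = [len(zlib.compress(c)) for c in chunks]
    a = sum(sizes)                                        # compressed bytes
    b = sum(-(-n // FS_BLOCK) * FS_BLOCK for n in sizes)  # allocated bytes
    return a / b

# A 4KB chunk compressing to ~1KB still occupies a full 4KB block (A/B ~ 0.25);
# a 256KB chunk compressing to ~65KB wastes at most part of one block.
```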

Table 1 Intra deduplication at 4KB-512KB chunk sizes and disk usage. Upper: Debian; lower: Ubuntu. NonZero and Zero are duplicated (shared) chunks; A is the native compressed volume, B the used volume in the file system, and A/B the disk waste rate (fragmentation).

Debian
Chunk size | NonZero (%) | Zero (%) | Unique (%) | A (MB) | B (MB) | A/B
4KB        | 4.26 | 68.2 | 27.5 | 929.5 | 2568.4 | 0.36
8KB        | 2.68 | 67.9 | 29.4 | 932.5 | 1606.7 | 0.56
16KB       | 1.68 | 67.5 | 30.9 | 940.2 | 1296.4 | 0.72
32KB       | 0.82 | 66.7 | 32.5 | 945.6 | 1104.9 | 0.83
64KB       | 0.21 | 65.6 | 34.2 | 945.2 | 1039.2 | 0.90
128KB      | 0.02 | 64.0 | 36.0 | 941.5 |  991.1 | 0.95
256KB      | 0    | 61.8 | 38.2 | 937.9 |  964.1 | 0.97
512KB      | 0    | 59.1 | 40.9 | 936.0 |  949.8 | 0.98

Ubuntu
Chunk size | NonZero (%) | Zero (%) | Unique (%) | A (MB) | B (MB) | A/B
4KB        | 2.63 | 69.0 | 28.4 | 940.1 | 2578.7 | 0.36
8KB        | 1.08 | 68.5 | 30.4 | 914.9 | 1639.9 | 0.58
16KB       | 0.53 | 68.0 | 31.5 | 892.4 | 1247.9 | 0.73
32KB       | 0.16 | 67.2 | 32.6 | 874.4 | 1055.7 | 0.86
64KB       | 0.01 | 66.1 | 33.9 | 860.4 |  954.7 | 0.91
128KB      | 0    | 64.5 | 35.5 | 852.1 |  901.3 | 0.95
256KB      | 0    | 62.2 | 37.8 | 848.0 |  874.2 | 0.97
512KB      | 0    | 58.9 | 41.1 | 846.3 |  860.1 | 0.99

2) Inter Deduplication. Table 2 shows the effect of inter deduplication between Debian and Ubuntu. The effect was about 10% at a 4KB chunk, but it too shrank as the chunk grew. Unfortunately, 4KB is too small once the internal-fragmentation problem is considered. A storage server need not care about inter deduplication except for the zero-filled block, whose coverage exceeded 50% (Table 1).


Table 2 Inter deduplication between Debian and Ubuntu.

Chunk size | Debian block files | Ubuntu block files | Same block files | Sharing Debian (%) | Sharing Ubuntu (%)
4KB        | 603,341 | 612,967 | 65,457 | 10.85 | 10.68
8KB        | 316,848 | 322,442 | 14,042 |  4.43 |  4.35
16KB       | 165,159 | 165,764 |  2,884 |  1.75 |  1.74
32KB       |  86,186 |  85,670 |    474 |  0.55 |  0.55
64KB       |  44,947 |  44,427 |     85 |  0.19 |  0.19
128KB      |  23,599 |  23,263 |     24 |  0.10 |  0.10
256KB      |  12,527 |  12,388 |     11 |  0.09 |  0.09
512KB      |   6,705 |   6,742 |      4 |  0.06 |  0.06

3) Update Deduplication. The effect of update deduplication was also evaluated on Debian and Ubuntu; each was updated every week for security.

Table 3 shows the ratio of reused chunks, which indicates deduplication between updates. The ratio was high at every chunk size, because most of the disk image is unchanged by an update. Deduplication is therefore effective across updates at any chunk size.

Table 3 Reuse (%) of deduplication between updates on Debian.

Update period (package volume)    |  4KB |  8KB | 16KB | 32KB | 64KB | 128KB | 256KB | 512KB
29/05/2009-05/06/2009 (22.94MB)   | 94.1 | 91.3 | 88.7 | 87.0 | 84.8 |  81.5 |  76.7 |  69.1
05/06/2009-12/06/2009 (4.32MB)    | 98.0 | 97.3 | 96.9 | 96.5 | 96.0 |  95.3 |  94.4 |  92.6

6.2 Effect of Disk Prefetch

We compared the effect of ext2/3optimizer and user-level readahead (the readahead system call) on LBCAS. Both were applied to a guest OS (Ubuntu 9.04, Linux kernel 2.6.28, ext3 file system) on the KVM virtual machine (version 60). The Ubuntu installation used 1.98GB of an 8GB LBCAS volume.

1) Access Pattern of Disk Prefetch. The access pattern of the boot procedure was investigated. We confirmed the characteristics of the access pattern and its locality of reference, and applied ext2/3optimizer to it. From here on, we refer to user-level readahead as "u-readahead" to distinguish it from kernel-level readahead.


a) Block Reallocation: ext2/3optimizer. Figure 4 shows the data allocation on ext3, visualized by DAVL (Disk Allocation Viewer for Linux) [16]. The left figure shows the original allocation; the right figure shows the allocation optimized by ext2/3optimizer.

In the figure, green plots indicate the ext3 metadata, aligned at the right edge; we confirmed that ext2/3optimizer preserves this ext3 structure. Blue plots indicate contiguously allocated file data blocks, and yellow plots indicate non-contiguous allocation. Ext2/3optimizer moved non-contiguous data to the head of the virtual disk: it exploited the profiled data blocks and aggregated them there. As a result, ext2/3optimizer increased fragmentation from the file's point of view; DAVL reported 0.21% fragmentation for normal ext3 but 1.11% after optimization. The relocation, however, was good for the page cache: the coverage of readahead was expected to stay large, and the occupancy of LBCAS block files to be high.

Figure 5 shows the access trace of the boot procedure. The x axis is the physical address and the y axis is the elapsed time. Red "+" plots mark accesses on normal ext3; blue "X" plots mark accesses on ext3 optimized by ext2/3optimizer. Accesses on the normal system were scattered: locality of reference was poor, so the effect of the page cache and the occupancy of LBCAS block files would be low. With ext2/3optimizer, locality of reference increased, because most accesses fell at the head of the disk; the remaining scattered accesses were metadata and were small in volume.

Fig. 4 Visualization of data allocation on ext3 by DAVL (left: normal, right: ext2/3optimizer).


Fig. 5 Access trace of the boot procedure (red "+": normal, blue "X": ext2/3optimizer).

b) User-Level Readahead (system call readahead). Ubuntu has a mechanism to populate the page cache with the files required at boot time. The files are listed in "/etc/readahead/boot" and "/etc/readahead/desktop". The former listed 937 files totaling 54.1MB; the latter listed 281 files totaling 25.0MB. These lists did not cover all files required at boot time: Ubuntu 9.04 required 2,250 files (203MB), and about half of them were loaded into the page cache by u-readahead before they were actually needed.

c) Effect of Readahead. Figure 6 shows the frequency of each readahead coverage size for normal, u-readahead, and ext2/3optimizer. Ext2/3optimizer reduced the small I/O requests: the number of I/O requests fell from 6,379 to 2,129, and the average readahead coverage grew from 33KB to 67KB. The total I/O was 140MB with ext2/3optimizer versus 208MB for normal. I/O requests were thus about twice as wide and one third as frequent; the drop in frequency exceeded the inverse of the widening, indicating that the locality of reference was much improved.

u-readahead, on the other hand, showed the same tendency as normal. Small requests decreased and big requests increased slightly, and the total I/O grew from 208MB (normal) to 231MB. The readahead coverage expanded to 41KB, still smaller than with ext2/3optimizer. The reason is that u-readahead could not eliminate the small I/O caused by poor locality of reference. The number of I/O requests was 5,827, less than the normal 6,379, even though the total I/O increased. These results indicate that ext2/3optimizer is much more effective than u-readahead from the viewpoint of disk prefetch (readahead).


Table 4 Volume transitions at each processing level. The upper table shows the volume transition on the guest OS; the lower table shows the volume transition on LBCAS.

Guest OS                                                 | Normal                     | u-readahead               | ext2/3opt
Volume of opened files (count, average)                  | 203MB (2,248 files, av. 92KB), identical in all cases
Volume of purely requested chunks                        | 127MB, identical in all cases
Volume of required access incl. readahead coverage       | 208MB (6,379, av. 33KB)    | 231MB (5,827, av. 41KB)   | 140MB (2,129, av. 67KB)
(number of accesses, average readahead size)

LBCAS: downloaded size MB (uncompressed size MB), occupancy %
Chunk size  | 64K               | 128K              | 256K             | 512K
Normal      | 86.1 (247), 51.5% | 96.8 (290), 43.9% | 114 (358), 35.5% | 144 (474), 26.9%
u-readahead | 93.4 (272), 46.9% | 104 (315), 40.3%  | 123 (386), 35.0% | 153 (508), 25.1%
ext2/3opt   | 55.3 (144), 88.7% | 55.3 (149), 85.3% | 55.6 (159), 80.0% | 55.6 (176), 71.8%

2) Effect of Chunk Size on a Real Access Pattern. Table 4 shows the volume transitions at each processing level. The upper table shows the total volume requested by the guest OS at three levels: (1) the volume of files opened by the boot procedure, (2) the chunk volume purely required by the boot procedure, and (3) the volume of accesses issued to LBCAS (including redundant data covered by readahead). The lower table shows the status of LBCAS for each chunk size: the volume of downloaded block files, the volume after decompression, and the occupancy of effective data in LBCAS.

The results show that the purely used chunks amounted to 63% (127MB/203MB) of the volume of files opened at boot time; the remaining 37% was unused and made the readahead access requests inefficient. Readahead on normal ext3 issued 208MB of access to LBCAS, so 81MB (208MB - 127MB) was redundant. u-readahead was even worse, with 104MB redundant. Ext2/3optimizer mitigated the problem significantly: readahead with ext2/3optimizer required only 140MB, i.e., 67% of the normal access volume.

The lower table shows the status of LBCAS. With ext2/3optimizer the downloaded files stayed below 56MB at every chunk size, whereas the normal case with a 512KB chunk required 144MB, 1.67 times more than with a 64KB chunk (86.1MB), due to poor locality of reference. With ext2/3optimizer the occupancy was almost the same at every chunk size, while in the normal case it dropped from 51.5% at 64KB to 26.9% at 512KB. These results indicate that block reallocation is necessary for LBCAS.

Table 5 shows the frequency of each LBCAS function for normal, u-readahead, and ext2/3optimizer. I/O requests are issued by the guest OS, so their frequency is independent of LBCAS; the remaining columns concern LBCAS functions. The number of decompressions (U) is the sum of downloads (D) and storage-cache hits (S), and the sum of decompressions (U) and memory-cache hits (M) is the total number of block files used by LBCAS.


Table 5 Frequency of LBCAS functions. The upper, middle, and lower parts show the normal, u-readahead, and ext2/3optimizer cases, respectively. "Requests" (R) is the number of I/O requests issued by the guest OS; the remaining columns count LBCAS operations: downloads (D), storage-cache hits (S), decompressions (U) = (D)+(S), and memory-cache hits (M). "Files per request" splits the requests by the number of block files each touched, so (R) = (1)+(2)+(3) and (U)+(M) = (1)+2*(2)+3*(3).

Normal (av. 33KB)
Chunk size | R     | D     | S     | U=D+S | M     | Files per request (1) / (2) / (3)
64K        | 6,338 | 3,958 | 1,663 | 5,621 | 3,647 | 4,148 / 1,450 / 740
128K       | 6,381 | 2,321 | 1,729 | 4,050 | 3,793 | 4,919 / 1,462 / -
256K       | 6,379 | 1,435 | 1,748 | 3,183 | 3,908 | 5,667 / 712 / -
512K       | 6,395 |   948 | 1,769 | 2,717 | 4,019 | 6,054 / 341 / -

u-readahead (av. 41KB)
64K        | 5,825 | 4,344 | 1,172 | 5,516 | 3,626 | 3,537 / 1,259 / 1,029
128K       | 5,834 | 2,526 | 1,200 | 3,726 | 3,761 | 4,181 / 1,653 / -
256K       | 5,827 | 1,544 | 1,179 | 2,723 | 3,908 | 5,032 / 804 / -
512K       | 5,822 | 1,015 | 1,172 | 2,187 | 4,023 | 5,434 / 388 / -

ext2/3opt (av. 67KB)
64K        | 2,165 | 2,296 |   626 | 2,922 | 1,311 |   941 / 380 / 844
128K       | 2,148 | 1,189 |   593 | 1,882 | 1,398 | 1,116 / 1,032 / -
256K       | 2,129 |   634 |   576 | 1,210 | 1,409 | 1,639 / 490 / -
512K       | 2,132 |   353 |   517 |   870 | 1,520 | 1,874 / 258 / -

The results show that the storage cache and the memory cache worked well, especially at large chunk sizes, where the combined frequency of storage-cache and memory-cache hits exceeded the frequency of downloads and decompressions.


7 Discussions

Security of a CAS system is important, especially when it is used over the Internet. Many CAS systems ensure the integrity of contents, because the secure-hash names serve as verification; confidentiality, however, is not ensured. One solution is to run a secure file system on top of CAS, but then the CAS system has to consider the compatibility of access pattern and compression algorithm, because it does not know the details of the secure file system. Convergent encryption is used in [17] and secret sharing in [18] for CAS systems. These security mechanisms are built into the CAS system and do not change the access pattern, so this type of implementation can reuse the file-system optimization.

The data blocks are reallocated in line according to the access profile, which keeps the readahead coverage large during the boot procedure. This reduces boot time, but the data blocks become fragmented from the file's point of view. Unfortunately, the optimization is tightly fitted to one access pattern: if the reallocated data blocks are used by another application, its access pattern will not obtain large readahead coverage. The boot procedure, however, is special, and several files are used only at boot time. We have to estimate these boot-only files and their ratio.

8 Conclusions

This paper showed the impact of chunk size on deduplication and disk prefetch in a CAS system. Most CAS systems assume a small chunk size (4KB-8KB) to obtain the benefit of deduplication. However, the disk-prefetch size is larger when prefetch is optimized by block reallocation based on an access profile. The effective chunk size has to balance the benefits of deduplication and I/O optimization.

Experimental measurements were made on LBCAS, which saves each chunk in a file named by its SHA-1 digest. The LBCAS image was optimized by block reallocation for the boot procedure. The results showed that a small chunk size is inefficient in many cases, even though it maximizes deduplication. A larger chunk can mitigate the performance degradation with the help of file-system block reallocation and consistently large disk prefetches; it reduced the boot time of Linux on the KVM virtual machine with little traffic.

This paper introduced several aspects of optimization for CAS systems. Chunk size is an additional factor, and the optimization has to take it into account for each real situation.

References

1. Tolia, N., Kozuch, M., Satyanarayanan, M., Karp, B., Bressoud, T., Perrig, A.: Opportunistic use of content addressable storage for distributed file systems. In: Proceedings of the USENIX Annual Technical Conference (2003)

2. Liguori, A., Hensbergen, E.C.: Experiences with Content Addressable Storage and Virtual Disks. In: Workshop on I/O Virtualization, WIOV 2008 (2008)


3. Jin, K., Miller, E.L.: The Effectiveness of Deduplication on Virtual Machine Disk Images. In: The Israeli Experimental Systems Conference, SYSTOR 2009 (2009)

4. Suzaki, K., Yagi, T., Iijima, K., Quynh, N.A.: OS Circular: Internet Client for Reference. In: Large Installation System Administration Conference, LISA (2007)

5. Quinlan, S., Dorward, S.: Venti: a new approach to archival storage. In: Proceedings of the Conference on File and Storage Technologies, FAST (2002)

6. Lukkien, M.: Venti analysis and memventi implementation. Master's thesis, University of Twente (2008)

7. Rhea, S., Cox, R., Pesterev, A.: Fast, inexpensive content-addressed storage in Foundation. In: USENIX Annual Technical Conference (2008)

8. You, L.L., Pollack, K.T., Long, D.D.E.: DeepStore: An archival storage system architecture. In: Proceedings of the 21st International Conference on Data Engineering, ICDE (2005)

9. Dubnicki, C., Gryz, L., Heldt, L., Kaczmarczyk, M., Kilian, W., Strzelczak, P., Szczepkowski, J., Ungureanu, C., Welnicki, M.: HYDRAstor: A Scalable Secondary Storage. In: USENIX Conference on File and Storage Technologies, FAST (2009)

10. Eaton, P., Weatherspoon, H., Kubiatowicz, J.: Efficiently binding data to owners in distributed content-addressable storage systems. In: Proceedings of the Security in Storage Workshop, SISW 2005 (2005)

11. Zhu, B., Li, K., Patterson, H.: Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. In: Proceedings of the USENIX Conference on File and Storage Technologies, FAST (2008)

12. Wu, F., Xi, H., Xu, C.: On the design of a new Linux readahead framework. ACM SIGOPS Operating Systems Review 42(5) (July 2008)

13. Wu, F., Xi, H., Li, J., Zou, N.: Linux readahead: less tricks for more. In: Proceedings of the Linux Symposium (2007)

14. VanDeBogart, S., Frost, C., Kohler, E.: Reducing Seek Overhead with Application-Directed Prefetching. In: USENIX Annual Technical Conference (2009)

15. Kitagawa, K., Tan, H., Abe, D., Chiba, D., Suzaki, K., Iijima, K., Yagi, T.: File System (Ext2) Optimization for Compressed Loopback Device. In: 13th International Linux System Technology Conference (2006)

16. DAVL (Disk Allocation Viewer for Linux), http://sourceforge.net/projects/davl/

17. Storer, M.W., Greenan, K., Long, D.D.E., Miller, E.L.: Secure Data Deduplication. In: Proceedings of the 4th ACM International Workshop on Storage Security and Survivability, StorageSS (2008)

18. Douceur, J.R., Adya, A., Bolosky, W.J., Simon, D., Theimer, M.: Reclaiming space from duplicate files in a serverless distributed file system. In: Proceedings of the International Conference on Distributed Computing Systems, ICDCS (2002)