An Efficient Hybrid I/O Caching Architecture Using Heterogeneous SSDs

Reza Salkhordeh, Mostafa Hadizadeh, and Hossein Asadi

Abstract—The storage subsystem is considered the performance bottleneck of computer systems in data-intensive applications. Solid-State Drives (SSDs) are emerging storage devices which, unlike Hard Disk Drives (HDDs), do not have mechanical parts and therefore offer superior performance. Due to the high cost of SSDs, entirely replacing HDDs with SSDs is not economically justified. Additionally, SSDs can endure only a limited number of writes before failing. To mitigate the shortcomings of SSDs while taking advantage of their high performance, SSD caching is practiced in both academia and industry. Previously proposed caching architectures have focused on either performance or endurance and have neglected to address both parameters together. Moreover, the cost, reliability, and power consumption of such architectures have not been evaluated. This paper proposes a hybrid I/O caching architecture that offers higher performance than previous studies while also improving power consumption with a similar budget. The proposed architecture uses DRAM, a Read-Optimized SSD (RO-SSD), and a Write-Optimized SSD (WO-SSD) in a three-level cache hierarchy and tries to efficiently redirect read requests to either DRAM or the RO-SSD while sending writes to the WO-SSD. To provide high reliability, dirty pages are written to at least two devices, which removes any single point of failure. Power consumption is also managed by reducing the number of accesses issued to the SSDs. The proposed architecture reconfigures itself between performance- and endurance-optimized policies based on the workload characteristics to maintain an effective tradeoff between performance and endurance. We have implemented the proposed architecture on a server equipped with industrial SSDs and HDDs. The experimental results show that, compared to state-of-the-art studies, the proposed architecture improves performance and power consumption by an average of 8% and 28%, respectively, and reduces cost by 5%, at the price of a 4.7% endurance cost and a negligible reliability penalty.

Index Terms—Solid-State Drives, I/O Caching, Performance, Data Storage Systems.


1 INTRODUCTION

Hard Disk Drives (HDDs) are traditional storage devices that are commonly used in storage systems due to their low cost and high capacity. The performance gap between HDDs and other components of computer systems has significantly increased in recent years. This is because HDDs have mechanical parts, which places an upper limit on their performance. To compensate for the low performance of HDDs, storage system designers have proposed several hybrid architectures consisting of HDDs and faster storage devices such as Solid-State Drives (SSDs).

SSDs are non-mechanical storage devices that offer higher performance in random workloads and asymmetric read/write performance as compared to HDDs. SSD manufacturers design and produce several types of SSDs with different performance and cost levels to match a wide range of user requirements. The relative performance and cost of SSDs compared to HDDs and Dynamic Random Access Memory (DRAM) is shown in Fig. 1. Due to the very high price of SSDs, replacing the entire disk array in data storage systems with SSDs is not practical in the Big Data era. In addition, SSDs have a restricted lifetime due to the limited number of reliable writes which can be committed to them. Power outages can also cause data loss in SSDs,

• Reza Salkhordeh, Mostafa Hadizadeh, and Hossein Asadi (corresponding author) are with the Department of Computer Engineering, Sharif University of Technology. Emails: [email protected], [email protected], and [email protected].

Fig. 1: Storage Devices Characteristics. [Figure: relative cost vs. performance of Consumer HDD, Enterprise HDD, SSD, PCI-E SSD, and DRAM; an SSD costs roughly 1.5x an enterprise HDD for about 4x the performance, and a PCI-E SSD roughly 3x an SSD's cost for about 10x the performance.]

as reported in [1]. Although SSDs have such shortcomings, they have received significant attention from both academia and industry, and many SSD-based architectures for the I/O stack have been proposed in recent years.

One promising application of SSDs that has emerged in recent years is alleviating the low performance of HDDs with minimal cost overhead by using SSDs as a caching layer for HDD-based storage systems. The main focus of previous studies in caching architectures is on improving performance and/or endurance. Three main approaches have been proposed to this end: a) prioritizing request types such as filesystem metadata, random, and read requests, b) optimizing baseline algorithms, and c) modifying the traditional single-level cache. As shown in Fig. 1, caching



architectures offer various performance levels with significantly different costs depending on their choice of SSDs. Previous studies have neglected to consider the effect of choosing different SSDs on performance. Additionally, they mostly focus on performance, while other major system metrics such as power consumption, endurance, and reliability also need to be considered in caching architectures. To our knowledge, none of the previous studies has considered all of the mentioned parameters simultaneously.

This paper proposes a Three-level I/O Cache Architecture (TICA) which aims to improve the performance and power consumption of SSD-based I/O caching while having minimal impact on endurance. TICA employs a Read-Optimized SSD (RO-SSD), a Write-Optimized SSD (WO-SSD), and DRAM as three levels of I/O cache. Employing heterogeneous SSDs decreases the probability of correlated failure of SSDs since such SSDs either belong to different brands or have different internal data structures/algorithms. To enhance the performance of both read and write requests, TICA is configured in write-back mode such that it buffers the most frequently accessed read- and write-intensive data pages in DRAM to reduce the number of writes committed to the SSDs and to increase their lifespan. In order to guarantee the reliability of write requests, all write requests are committed to both DRAM and the WO-SSD before the write is acknowledged to the user. Dirty data pages in DRAM are asynchronously flushed to the RO-SSD to free up allocated DRAM space for future requests.

In order to efficiently optimize the proposed architecture for read- and write-intensive applications, we offer two cache policies in which data pages evicted from DRAM are either moved to the SSDs or removed from the cache. The first policy, called Write to Endurance Disk (TICA-WED), improves performance since the next access to the data page will be supplied by the SSDs instead of the HDD. The shortcoming of TICA-WED is reduced SSD lifetime due to the extra writes for moving the data page from DRAM to SSD. To alleviate this shortcoming, the second policy, called Endurance Friendly (TICA-EF), can be employed. In TICA-EF, performance is slightly decreased while the lifetime of the SSDs is significantly extended. To select between TICA-WED and TICA-EF, we propose a state machine which analyzes the running workload and dynamically selects the most effective policy for TICA. With such a data flow, TICA improves the performance and power consumption of the I/O cache while having negligible endurance overhead and no cost or reliability impact.

To verify the efficiency of TICA, we first extracted I/O traces from a server equipped with two Intel Xeon processors, 32 GB of memory, and two 512 GB SSDs. The I/O traces are extensively analyzed and characterized to help tune the parameters of TICA towards higher performance. The experimental setup consists of a rackmount server equipped with a RO-SSD, a WO-SSD, and 128 GB of memory. The benchmarking suites for the experiments consist of over 15 traces from Microsoft research traces [2], HammerDB [3], and FileBench [4]. Experimental results show that, despite reducing cost by 5% compared to state-of-the-art architectures, TICA enhances performance and power consumption, on average, by 8% (and up to 45%) and by 28% (and up to 70%), respectively, while having only 4.7% endurance overhead

and negligible reliability penalty.

In summary, we make the following contributions:

• By carefully analyzing state-of-the-art SSDs available in the market and their characteristics, we select two types of SSDs to design a low-cost hybrid caching architecture capable of providing high performance in both read- and write-intensive applications.

• We propose a three-level caching architecture, called TICA, employing DRAM, a RO-SSD, and a WO-SSD to improve the performance and power consumption of storage systems while having a negligible endurance penalty.

• TICA reduces the correlated failure rate of SSDs in I/O caching architectures by using heterogeneous SSDs, while, unlike traditional heterogeneous architectures, performance is not limited by the slower SSD.

• To balance performance and endurance, we propose the Endurance-Friendly (TICA-EF) and Write to Endurance Disk (TICA-WED) policies for TICA, where the first policy prioritizes endurance and the second tries to further improve performance.

• We also propose a state-machine model to select either the TICA-EF or TICA-WED policy based on the workload characteristics. This model can identify the most effective policy for TICA with negligible overhead while running I/O-intensive applications.

• We have implemented TICA on a physical server equipped with enterprise SSDs and HDDs and conducted an extensive set of experiments to accurately evaluate TICA, considering all optimizations and buffering in the storage devices and the Operating System (OS).

The remainder of this paper is organized as follows. Previous studies are discussed in Section 2. The motivation for this work is presented in Section 3. Section 4 introduces the proposed caching architecture. In Section 5, the experimental setup and results are presented. Finally, Section 6 concludes the paper.

2 PREVIOUS STUDIES

Previous studies in SSD-based I/O caching can be categorized into three groups: a) prioritizing various request types based on storage device characteristics, b) optimizing baseline eviction and promotion policies, and c) proposing multi-level caching architectures. The first category tries to characterize the performance of HDDs and SSDs. Based on this characterization, request types which show a higher performance gap between HDDs and SSDs are prioritized for buffering. A comprehensive study on workload characteristics and request types is conducted in [5]. The different response times of SSDs on read and write requests are considered in [6] to prioritize data pages. The locality of data pages is employed in RPAC [7] to improve both the performance and endurance of the caching architecture. ReCA [8] tries to characterize several request and workload types and selects suitable data pages for caching. Filesystem metadata is one of the primary request types which has been shown to be very efficient for caching [9], [10]. OODT [9] considers randomness and frequency of accesses to prioritize data pages. To reduce migrations between HDD and SSD, [11] considers the dirty state of the data pages in memory buffers. ECI-Cache [12] prioritizes data pages based on the


request type (read/write) in addition to the reuse distance. The optimizations of the previous studies in this category are mostly orthogonal to TICA and can be employed in the eviction/promotion policies of the SSDs in TICA.

The studies in the second category try to optimize the eviction policy of caching architectures. To prevent cache pollution, Lazy Adaptive Replacement Cache (LARC) [13] has been suggested, which promotes data pages to the cache on the second access to the data page. This technique, however, cannot react in a timely fashion when the workload is not stable. mARC [14] tries to select the more suitable option from ARC and LARC based on the workload characteristics. In [15], various management policies based on ARC for DRAM-SSD caching architectures are compared. A more general approach to prevent repetitive replacement of data pages in SSDs is suggested in [16], which gives buffered data pages more chance to be accessed again and therefore stay in the cache. S-RAC [17] characterizes workloads into six groups. Based on the benefit of buffering requests in each category, it decides which data pages are best suited for caching. S-RAC tries to reduce the number of writes in the SSD to improve its lifetime with minimal impact on cache performance. H-ARC [18] partitions the cache space into clean and dirty sections, where each section is maintained by the ARC algorithm. D-ARC [19] also tries to improve ARC by prioritizing data pages based on their clean/dirty state. Me-CLOCK [20] tries to reduce the memory overhead of SSD caching architectures by using a bloom filter. RIPQ [21] suggests a segmented-LRU caching algorithm which aggregates small random writes and also places user data with the same priority close to each other. In WEC [22], write-efficient data pages are kept in the cache for longer periods to reduce the writes due to cache replacements. This category of previous studies is also orthogonal to TICA, and such policies can be employed jointly with TICA to further improve performance and/or endurance.

Among previous studies that try to enhance the performance of I/O caching by utilizing multi-level cache hierarchies, LLAMA [23] employs a DRAM-SSD architecture for designing an Application Programming Interface (API) suitable for database management systems. FASC [24] suggests a DRAM-SSD buffer cache which tries to reduce the cost of evictions from the buffer cache as well as write overheads on the SSD. Employing exclusive DRAM-SSD caching is investigated in [15], which shows the impact of this technique on improving SSD endurance. In [25], separate promotion/demotion policies for the DRAM/SSD cache levels are replaced with a unified promotion/demotion policy to improve both performance and endurance. uCache [26] also employs a DRAM-SSD architecture and tries to reduce the number of writes in the SSD due to read misses. In case of a power loss, all dirty data pages in DRAM will be lost, which significantly reduces the reliability of uCache. Additionally, no redundant device is employed, and both DRAM and SSD are single points of failure. MDIS [27] uses a combination of DRAM, SSD, and NVM to improve the performance of I/O caching. Although performance and energy consumption are improved in MDIS, cost and reliability have not been taken into account. Graphene [28] suggests a DRAM-SSD architecture to improve the performance of graph computing for large graphs. SSD caching is also

suggested in distributed and High Performance Computing (HPC) environments [29], [30], [31], [32], [33].

Optimizing SSDs for key-value stores has been discussed in previous studies. DIDACache [34] allows the key-value SSD cache to directly manage the internal SSD structure to improve both performance and endurance. WiscKey [35] separates key and value storage in the SSD to improve random lookups and database loading. Deduplication and compression can also be employed to extend SSD lifetime [36], [37], [38]. Modifying the existing interface between the OS and SSDs has also been suggested in previous studies to design efficient caching architectures [39], [40]. In [40], a new interface for SSDs is designed which does not allow overwriting of data pages, in order to reduce the size of the required DRAM in the SSD and also to improve performance. F2FS [41] employs an append-only logging approach to reduce the need for overwriting data pages in SSDs. KAML [42] suggests a customized interface for SSDs for storing and retrieving key-value data. FStream [43] employs streamID to hint the Flash Translation Layer (FTL) about the lifetime of user data so that the FTL places data pages with the same lifetime on the same physical block. Optimizing SSD caching architectures by leveraging information from the SSD's internal data structures such as the FTL has also been suggested in previous studies [44], [45]. FLIN [46] provides a fair scheduler for SSDs servicing multiple applications simultaneously. A scheduler to maximize the efficiency of parallelism inside the SSD is also proposed in [47]. SHRD [48] tries to optimize the physical placement of data pages in the SSD to reduce the FTL overheads on random write requests. AGCR [49] characterizes the workload behavior and increases the program time of read-intensive data pages in the flash chips so that their read time can be decreased. Such architectures require hardware modifications, which are not in the scope of this paper.

In general, one of the main shortcomings of previous studies is neglecting to consider the differences between various SSD brands and models in terms of cost and read/write performance. Many types of SSDs are optimized towards read operations while others are optimized to provide higher write performance. In addition, the tradeoff between performance, power consumption, endurance, reliability, and cost has not been considered in previous works, which is crucial for I/O caching architectures.

3 MOTIVATION

In this section, we detail the three shortcomings of state-of-the-art caching architectures which motivate us to propose a three-level caching architecture employing SSDs in addition to DRAM. First, we show the diverse characteristics of the SSDs in the market and the performance impact of employing such SSDs as the caching layer for HDDs. Second, we evaluate the write overhead of caching read misses in SSDs. Finally, we investigate the performance of mirrored heterogeneous SSDs employed to overcome correlated SSD failures.

SSD manufacturers employ Single-Level Cell (SLC), Multi-Level Cell (MLC), or Triple-Level Cell (TLC) NAND chips in their products. SLC SSDs have the highest performance and endurance at a cost of more than 2x that of MLC SSDs. The read performance of MLC SSDs, however, is comparable


Fig. 2: Performance per Cost for various SSDs. [Figure: read and write IOPS per $ for C-SSD, RO-SSD, and WO-SSD.]

TABLE 1: Power Consumption, Cost, Reliability, and Endurance of Storage Devices in the Cache Architectures

Device  | MTTF (h) | $/GB  | Writes/GB | Read/Write/Idle Power (W)
DRAM    | 4M       | 7.875 | ∞         | 4/4/4
C-SSD   | 1.5M     | 0.375 | 750       | 3.3/3.4/0.07
RO-SSD  | 2M       | 0.74  | 1,171     | 3.3/3.4/0.07
WO-SSD  | 2M       | 0.842 | 6,416     | 2.4/3.1/1.3

to that of SLC SSDs due to the nature of NAND flash. Table 1 reports the performance and endurance of several types of SSDs. Using high-cost SSDs is not economically justifiable for several workload types. Fig. 2 shows the read and write IOPS per $ for various SSDs. In read-intensive workloads, employing a RO-SSD or Consumer-SSD (C-SSD) results in higher performance per cost. RO- and C-SSDs, however, fail to provide high performance per cost in write-intensive workloads. This experiment reveals that high-cost and low-cost SSDs can each be efficient in different workload types, and using only one SSD type cannot provide suitable performance per cost across all workload types.

In the Write-Back (WB) cache policy, which is commonly practiced in previous studies, each read miss requires writing a data page to the SSD, while all write requests are directed to the SSD. Therefore, the total number of writes in the SSD will be higher than the number of write requests in the workload. This results in reduced lifetime for SSDs employed as a WB cache. To evaluate the amplification of writes in previous studies, we introduce the Cache Write Amplification Factor (CWAF) parameter, which is calculated based on Equation 1. Fig. 3 shows the CWAF parameter for various workloads. In the Stg_1 and Webserver workloads, CWAF is greater than 4.5, which shows the importance of read misses for SSD lifetime. By reducing the number of writes due to read misses on SSDs, we can significantly improve SSD endurance.

One of the reliability concerns of employing SSDs, especially in Redundant Array of Independent Disks (RAID) configurations, is correlated failures due to either software or hardware defects [50]. Since the SSDs in a RAID configuration are identical, and in mirrored RAID configurations they receive the same accesses, any software defect will probably trigger on both SSDs, resulting in data loss. Additionally, due to the same aging pattern and lifetime, both SSDs are expected to fail within a close time interval, which also results in data loss. To mitigate this problem and reduce the probability of double disk failures, heterogeneous SSDs with different internal algorithms and/or from different

Fig. 3: CWAF for various workloads. [Figure: per-workload CWAF bars for Rsrch_0, Ts_0, Usr_0, Wdev_0, Src1_2, Stg_1, TPCC, Webserver, Postmark, MSNFS, Exchange, LiveMapsBE, DevToolRel, Hm_1, Mds_0, Prn_0, Prxy_0, and the average; two bars exceed the axis, labeled 14.27 and 4.57.]

\[ \mathrm{CWAF} = \frac{Writes_{SSD}}{Writes_{workload}} \tag{1} \]
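To make the metric concrete, the following minimal Python sketch computes CWAF from trace counters; the function and the example numbers are illustrative, not taken from the paper's implementation.

    def cache_write_amplification(workload_writes, read_misses):
        # CWAF = Writes_SSD / Writes_workload (Equation 1). In a write-back
        # SSD cache, every workload write reaches the SSD and every read
        # miss additionally writes the fetched page into the SSD.
        ssd_writes = workload_writes + read_misses
        return ssd_writes / workload_writes

    # Example: 1,000,000 workload writes plus 3,500,000 read-miss fills
    # give CWAF = 4.5, in the range reported above for Stg_1 and Webserver.
    print(cache_write_amplification(1_000_000, 3_500_000))  # 4.5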

brands can be practiced. Here, we investigate the effect of employing this technique on various MLC-TLC SSD combinations. Fig. 4 shows the normalized performance of various mirrored (RAID-1) configurations of heterogeneous SSDs compared to the performance of homogeneous mirrored SSDs. As can be seen in this figure, performance is limited by the slower SSD, especially for write requests, which results in lower overall performance per cost. For instance, replacing one SSD in a mirrored WO-SSD pair with a RO-SSD results in almost 5x performance degradation for write requests. The write performance of two mirrored RO-SSDs is equal to that of a mirrored WO-SSD and RO-SSD, while the cost and power consumption of the latter architecture are higher. For read requests, the performance degradation of employing heterogeneous architectures is lower than for write requests, since the performance gap between different SSDs is smaller for reads. This experiment shows that there is a need for heterogeneous SSD architectures with high performance per cost that simultaneously improve both performance and reliability.
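This limitation follows from a simple first-order model of a two-disk mirror, sketched below under the usual assumptions that a mirrored write completes only when both copies are written and reads are balanced across both disks; the IOPS numbers are hypothetical, not measurements from Fig. 4.

    def mirror_write_iops(iops_a, iops_b):
        # A mirrored write must complete on both disks,
        # so throughput is capped by the slower one.
        return min(iops_a, iops_b)

    def mirror_read_iops(iops_a, iops_b):
        # Reads can be load-balanced across both mirrors.
        return iops_a + iops_b

    # Hypothetical write IOPS: WO-SSD = 50,000, RO-SSD = 10,000.
    print(mirror_write_iops(50_000, 50_000))  # 2xWO-SSD: 50,000
    print(mirror_write_iops(50_000, 10_000))  # WO-SSD/RO-SSD: 10,000 (~5x slower)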

4 PROPOSED ARCHITECTURE

An efficient I/O caching architecture should provide high performance, endurance, and reliability with reasonable cost overhead in order to be integrated into storage and high-performance servers. Previous caching architectures have neglected to simultaneously consider these important parameters of I/O caching and have focused on improving only one of them without investigating the corresponding impact on the other parameters. The proposed architecture is motivated by the lack of a comprehensive caching architecture which is able to mitigate the shortcomings of previous studies discussed in Section 3.

Fig. 4: Performance of Heterogeneous RAID Architectures. [Figure: write and read IOPS for the 2xWO-SSD, WO-SSD/SSD-C, WO-SSD/RO-SSD, 2xSSD-C, SSD-C/RO-SSD, and 2xRO-SSD mirrored configurations.]


Fig. 5: Proposed Architecture. [Figure: DRAM, RO-SSD, and WO-SSD form a three-level cache between the block I/O driver and the disk subsystem; labeled paths include read hit, read miss, write, async. write, dirty eviction, and TRIM.]

For this purpose, we try to improve the performance, power consumption, and lifetime of the I/O cache by using DRAM and high-performance SSDs and by reducing the number of writes committed to the SSDs. To address the reliability concern, TICA is designed such that it does not have any single point of failure and, in addition, a failure in any of the caching devices will not result in data loss. Meanwhile, the cost overhead is kept as small as possible compared to traditional caching architectures. TICA is also architected in such a way that the optimizations proposed in previous studies for increasing the cache hit ratio and prioritizing request types can be directly integrated with TICA in order to further improve performance and/or endurance.

4.1 High-Level Architecture

To design an efficient caching architecture, we leverage the traditional cache architecture and use three different storage devices for I/O caching. A DRAM module alongside a RO-SSD and a WO-SSD forms the three levels of the proposed architecture. In order to decrease the probability of data loss, a small battery-backup unit is added to the DRAM which can sustain a cold system reboot. Such a heterogeneous architecture improves reliability by reducing the probability of double disk failures due to correlated failures between SSDs of the same model. Fig. 5 depicts the proposed architecture, which consists of three hardware modules. Data migration inside the I/O cache or between the cache and the main storage device is done using a Direct-Memory Access (DMA) unit to reduce the CPU overhead. Since a data page might exist in more than one caching device at any time, devices are looked up by priority, ordered as DRAM, RO-SSD, and then WO-SSD for read requests. TICA works in write-back mode and, as such, all write requests are buffered. If the old data page resides in any of the caching devices, it is invalidated. In addition to invalidation in the mapping data structures, a TRIM¹ request is sent to the SSDs to improve their performance on subsequent write requests.
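A minimal Python sketch of this read-path lookup order follows; the toy cache-level class and the helper function are hypothetical stand-ins for TICA's internal data structures, not its actual code.

    class CacheLevel:
        """Toy cache level: address -> page, with a placeholder LRU hook."""
        def __init__(self, name):
            self.name, self.pages = name, {}
        def lookup(self, address):
            return self.pages.get(address)
        def lru_touch(self, address):
            pass  # placeholder: promote the page in this level's LRU queue

    def read_page(address, dram, ro_ssd, wo_ssd, read_from_hdd):
        # Levels are searched in order of read performance:
        # DRAM first, then RO-SSD, then WO-SSD, then the HDD on a miss.
        for level in (dram, ro_ssd, wo_ssd):
            page = level.lookup(address)
            if page is not None:
                level.lru_touch(address)
                return page
        return read_from_hdd(address)  # cache miss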

The proposed architecture also employs DRAM in addition to the SSDs in the caching layers, where it is partitioned into read and asynchronous-write cache sections. The read cache partition is used for caching read miss requests. The requested data page is moved to DRAM using DMA, and afterwards the data page is copied from the DRAM cache to the destination memory address in the user space.

1. TRIM informs the disk about data blocks which are no longer in use by the OS.

Fig. 6: Average Response Time of Cache Operations Normalized to RO-SSD Read Latency. [Figure: normalized latency of read hit, read miss, write, and eviction for the RO-SSD, WO-SSD, Mixed, and TICA configurations.]

User write requests arriving at the cache are redirected to both the WO-SSD and DRAM, where they are stored in the second (asynchronous-write) partition of DRAM. An asynchronous thread goes through this partition, sends the data pages to the RO-SSD, and removes them from DRAM. The sizes of the partitions are adjusted dynamically at runtime based on the percentage of write requests arriving at DRAM.
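The resulting write path can be sketched as follows; the dict-backed devices and queue-based destage thread are illustrative stand-ins for TICA's DMA-based implementation, assumed only for this sketch.

    import queue

    flush_q = queue.Queue()  # dirty pages awaiting destage to the RO-SSD

    def write_page(address, data, dram_write_partition, wo_ssd):
        # The write is acknowledged only after it is stored on two devices
        # (DRAM write partition and WO-SSD), so no single failure loses it.
        dram_write_partition[address] = data
        wo_ssd[address] = data
        flush_q.put(address)  # schedule the asynchronous destage
        return "ack"          # the user write completes here

    def flusher(dram_write_partition, ro_ssd):
        # Runs in a background thread: copy dirty pages to the RO-SSD
        # and free the DRAM write-partition space.
        while True:
            address = flush_q.get()
            ro_ssd[address] = dram_write_partition.pop(address)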

To enhance the performance of the proposed architecture, the RO-SSD and WO-SSD are configured in such a way that each resides in the critical path of the requests it can handle most efficiently. This way, TICA can have optimal performance on both read and write requests without having to use ultra-high-performance SSDs, which significantly reduces the total cost of the I/O cache. In order to show the difference between the proposed architecture and traditional RAID-1 configurations, the normalized average response time under various cache operations is depicted in Fig. 6. All configurations use two SSDs: in the first two configurations, the SSDs in RAID-1 are identical, while in the third configuration (Mixed) and in TICA, one RO-SSD and one WO-SSD are employed. In order to have a fair comparison in Fig. 6, the DRAM module of the proposed architecture is ignored in this experiment. As shown in Fig. 6, TICA has near-optimal performance on every cache operation since the critical path of operations and the optimal operation for each SSD are considered.

4.2 Detailed Algorithm

Algorithm 1 depicts the workflow of the proposed architecture upon request arrival. If the request is a write to a data page that exists in the cache, the page is invalidated. Lines 5 through 8 check the DRAM write cache partition for free space. If there is no space available, the size of the write cache partition is extended. The calculation for extending the write cache size considers a baseline cache size called defwritecachesize, and if the current write cache size is greater than this value, the write cache size is extended by smaller amounts. This technique prevents the write cache partition from over-extending, which would reduce the number of read hits from DRAM. In addition to DRAM, the WO-SSD is checked for free space, and if there is no space left, a victim data page is selected and discarded from both SSDs (lines 9 through 11). The victim data page is removed from the RO-SSD as well, since leaving a dirty data page in the RO-SSD risks data loss in case this SSD fails. After allocating a page in both DRAM and the WO-SSD, the write request is issued. The request


Algorithm 1 Proposed Caching Algorithm
 1: procedure ACCESS(Request)
 2:   capacityEstimator(Request)
 3:   if Request.iswrite then
 4:     IssueDiscards(Request.address)
 5:     if DRAMwritecache.isfull then
 6:       writecachesize ← writecachesize + 2^(−(writecachesize − defwritecachesize))
 7:       DiscardfromDRAM(writecachesize)
 8:       waitforFreeup
 9:     if WOSSD.isfull then
10:       FreeupWOSSD
11:       FreeupROSSD
12:     Issue writes to WOSSD and DRAM
13:     Wait for issued writes
14:     Update LRU_DRAM and LRU_WOSSD
15:     Issue async. write to ROSSD
16:   else
17:     if inDRAM(Request.address) then
18:       ReadfromDRAM(Request.address)
19:       Update LRU_DRAM
20:     else if inROSSD(Request.address) then
21:       ReadfromROSSD(Request.address)
22:       Update LRU_ROSSD
23:       Update LRU_WOSSD
24:     else if inWOSSD(Request.address) then
25:       ReadfromWOSSD(Request.address)
26:       Update LRU_WOSSD
27:     else
28:       if DRAMReadcache.isfull then
29:         writecachesize ← max(defwritecachesize, writecachesize − 2^(writecachesize − defwritecachesize))
30:         DiscardfromDRAM(writecachesize)
31:       if TICA is in WED mode then
32:         Copy evicted page to WOSSD
33:       Issue page fault for Request.address

for flushing the page from DRAM to the RO-SSD is issued after completion of the user request.

If an incoming request is a read of a data page, the caching devices are searched in order of their read performance (DRAM, RO-SSD, and WO-SSD). If the request is served from DRAM, LRU_DRAM is updated, and if the request hits in either of the SSDs, the LRU queues of both SSDs are updated. If the request misses in the cache while the DRAM read cache is full and the DRAM write cache size is greater than defwritecachesize, the DRAM write cache is shrunk. In order to shrink the write cache, it is necessary to wait for the completion of one of the ongoing asynchronous writes to the RO-SSD, which would stall the current request. On the other hand, evicting a data page from the DRAM read cache imposes no I/O overhead. Therefore, in the proposed architecture, a request is sent to the disk for reading the data page while a watcher thread waits for the completion of one of the asynchronous writes to the RO-SSD. If one of the asynchronous writes finishes before the disk I/O, its place is allocated to the data page; if not, a data page is evicted from the DRAM read cache to make the required space available. TICA reserves a few data pages in DRAM and the SSDs for internal operations such as migrating data pages. The default behavior of TICA is to discard data pages evicted from DRAM, which we call TICA-EF. There is an alternative approach, called TICA-WED, which copies the evicted data page to the WO-SSD. TICA-WED and the algorithm for selecting the TICA policy are detailed next.
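The grow and shrink rules of Algorithm 1 (lines 6 and 29) can be illustrated numerically; the sketch below treats cache sizes as abstract units, which is an assumption since the paper does not state the unit.

    def grow(size, default):
        # Line 6 of Algorithm 1: the increment shrinks exponentially as the
        # write cache exceeds its baseline, preventing over-extension.
        return size + 2 ** (-(size - default))

    def shrink(size, default):
        # Line 29: shrink back toward the baseline, never below it.
        return max(default, size - 2 ** (size - default))

    size = 4.0   # hypothetical current write-cache size, in abstract units
    base = 4.0   # defwritecachesize
    for _ in range(3):
        size = grow(size, base)
        print(round(size, 3))  # 5.0, 5.5, 5.854: growth slows down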

4.3 TICA-EF vs. TICA-WED

As mentioned earlier, the endurance of SSD caching architectures is penalized by read misses. TICA-EF

Fig. 7: TICA-EF Hit Ratio. [Figure: per-workload hit ratio (%) for RAID1 and TICA-EF.]

Fig. 8: Total Number of Writes Committed to SSDs. [Figure: normalized per-workload SSD writes for RAID1-Cache, TICA-EF, and TICA-WED.]

eliminates the writes in the SSDs due to read misses and is therefore called Endurance-Friendly. This approach, however, imposes a performance cost since data pages are evicted early from the cache and the cache hit ratio is decreased. Fig. 7 shows the evaluation of TICA-EF in terms of cache hit ratio compared to the baseline RAID-1 configuration. TICA-EF fails to provide high performance in several workloads such as Usr_0, Hm_1, and Wdev_0. Our investigation reveals that this is due to the large working set size of the read-intensive data pages. Such data pages can only be buffered in DRAM, and since DRAM is small, data pages are evicted before being re-referenced. Therefore, TICA-EF needs to access the HDD more often to bring evicted data pages back to DRAM.

Copying the data pages evicted from DRAM to the SSD can improve performance at the cost of reduced endurance. To show the effect of this technique, we propose TICA-WED, which copies data pages evicted from DRAM to the WO-SSD. As mentioned in the motivation section (Section 3), this decreases the endurance of the SSDs. Fig. 8 shows the number of writes committed to SSDs in TICA-WED compared to TICA-EF. In read-intensive workloads with small working set sizes, TICA-EF has endurance efficiency close to TICA-WED. In the other workloads, however, TICA-EF has higher endurance efficiency. We conclude that each of the TICA-EF and TICA-WED policies is the suitable choice for a specific workload type, and a general approach is required to select between them based on the workload characteristics.

4.4 Adaptive TICA

To select an effective policy for TICA, we have analyzed the performance of TICA-EF. The performance behavior of TICA-EF in the Wdev_0, Usr_0, Ts_0, and Rsrch_0 workloads reveals that there are two reasons for its low performance: 1) the DRAM size is smaller than the working set size, and 2) cold data pages are trapped in the SSDs. To mitigate the performance degradation of TICA-EF, we propose


Algorithm 2 DRAM Low Capacity Identifier
 1: windowSize ← 2 ∗ DRAMsize
 2: requestCounter, EQHit, DRAMReadHit ← 0
 3: procedure CAPACITYESTIMATOR(request)
 4:   requestCounter ← requestCounter + 1
 5:   if request.isRead then
 6:     if Hit in DRAM then
 7:       DRAMReadHit ← DRAMReadHit + 1
 8:     else if Hit in EQ then
 9:       EQHit ← EQHit + 1
10:   if requestCounter == windowSize then
11:     if (EQHit + DRAMReadHit) > Tmax then
12:       Switch to TICA-WED
13:     else if EQHit > Tmin then
14:       Switch to TICA-WED
15:     else
16:       Switch to TICA-EF
17:     requestCounter, EQHit, DRAMReadHit ← 0

TICA-Adaptive (TICA-A), which switches the policy from TICA-EF to TICA-WED when one of the two above conditions is detected. In the following, the algorithms for identifying these two conditions are detailed.

4.4.1 DRAM Low Capacity Identifier

Due to the small size of DRAM in TICA, the thrashing problem [51] is likely to occur if the working set size of the workload is larger than the DRAM size. To identify this condition, we keep a queue of data pages evicted from DRAM, called the Evicted Queue (EQ). Data pages evicted from DRAM enter the EQ if they were copied to DRAM due to a read miss. The hit ratio with and without considering the EQ is calculated periodically, and if their difference is greater than a predefined threshold (Tmin), TICA switches to the TICA-WED policy. Employing a threshold for the minimum difference between hit ratios prevents constant switching between the two policies.
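The EQ is in effect a ghost queue: it tracks only the addresses of pages evicted from the DRAM read cache, not their contents. A minimal sketch, assuming a fixed-capacity FIFO of addresses:

    from collections import OrderedDict

    class EvictedQueue:
        """Ghost queue of addresses evicted from the DRAM read cache."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.addresses = OrderedDict()  # address -> None (no data kept)

        def record_eviction(self, address):
            self.addresses[address] = None
            self.addresses.move_to_end(address)
            if len(self.addresses) > self.capacity:
                self.addresses.popitem(last=False)  # drop the oldest entry

        def hit(self, address):
            # A hit here means the page would have been a DRAM hit had
            # the DRAM cache been larger: a sign of thrashing.
            return address in self.addresses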

Since TICA-EF lowers the Total Cost of Ownership (TCO) by extending the SSDs' lifetime, we prefer it over TICA-WED. Therefore, if TICA-EF has a high hit ratio, regardless of the hit ratio of the EQ, we switch to TICA-EF. The corresponding threshold (Tmax), however, should be set conservatively to ensure negligible performance degradation. Modifying the thresholds enables us to prefer one of the two policies based on the I/O demand of the storage system and/or the overall system status. For instance, when most SSDs in the array are old, we would prefer TICA-EF to prolong their lifetime and reduce the probability of data loss. Algorithm 2 shows the flow of identifying thrashing in DRAM. Switching between the policies is considered once the number of incoming requests to the cache reaches twice the size of DRAM. For each request, the counters for hits in DRAM and the EQ are updated in lines 5 through 9. In lines 11 to 17, the hit ratios are checked and the TICA policy is changed if required.

4.4.2 Preventing Cold Data Trapped in SSDs

In TICA-EF, only write accesses are redirected to the SSDs and all read accesses are supplied by either DRAM or the HDD. Therefore, in read-intensive workloads, the SSDs become idle, and previously hot data pages which are now cold reside in the SSDs without any means of eviction. To prevent this problem, we propose a State Machine Based Insertion (SMBI) scheme to conservatively switch from TICA-EF to TICA-WED in order to replace the cold data pages in the SSDs.

Fig. 9: State Machine for Preventing Cold Data Trapped in SSD. [Figure: an Initial state (TICA-EF), a WED state (TICA-WED), and a Wait state (TICA-EF with a decrementing counter); transitions are driven by the "too many HDD reads" and "high read hit" conditions.]

Fig. 10: Hardware Architecture of Experimental Setup. [Figure: DRAM and a RAID controller attached over PCI-E 8x; a SAS expander connects the RO-SSD and WO-SSD through 12 Gbps links.]

The simplified model of SMBI is shown in Fig. 9. We identify two conditions: 1) too many HDD reads and 2) a high hit ratio. When both conditions are met in the workload, TICA switches to TICA-WED until one of the conditions no longer holds. Too many HDD reads indicates that the read working set size is larger than the DRAM size. In this condition, we allow data pages evicted from DRAM to enter the WO-SSD to increase its hit ratio and reduce the number of HDD reads. We cannot rely solely on the number of HDD reads for switching to WED since, in workloads with low locality, the number of HDD reads is also high, and copying the evicted data pages from DRAM to the WO-SSD would only impose endurance cost without any performance improvement. Therefore, SMBI stays in the WED state as long as both the number of HDD reads and the hit ratio are high. A high hit ratio with a low number of HDD reads shows that the working set size is smaller than the DRAM size, and SMBI switches back to the EF policy.

If the hit ratio decreases while the number of HDD reads is still high, SMBI enters a waiting state which prevents re-entering WED mode for the next few windows. This state prevents constant switching between the two policies. Algorithm 3 shows the detailed flow of SMBI. Line 13 switches the policy to TICA-WED if the number of HDD read requests in the current window is greater than the Thdd threshold. In lines 18 through 28, SMBI checks both conditions, and if one of them no longer holds, it switches back to the TICA-EF policy.
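Since Fig. 9 survives here only as a placeholder, the transition structure of SMBI can be summarized as a table; the state and condition names in this Python snippet are shorthand for those in Algorithm 3, not identifiers from the paper.

    # States: EF (Initial, runs TICA-EF), WED (runs TICA-WED),
    # WAIT (runs TICA-EF while a cool-down counter drains).
    # Window conditions: hdd = too many HDD reads, hit = high read hit ratio.
    TRANSITIONS = {
        ("EF",   "hdd"):                "WED",   # read working set exceeds DRAM
        ("WED",  "hdd and hit"):        "WED",   # filling the WO-SSD is paying off
        ("WED",  "hdd and not hit"):    "WAIT",  # low locality: back off
        ("WED",  "hit and not hdd"):    "EF",    # working set fits in DRAM again
        ("WAIT", "counter == 0 or hit"): "EF",   # cool-down over, resume TICA-EF
    }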


Fig. 11: Normalized Response Time: TICA vs. Conventional Caching Architectures. [Figure: per-workload response time normalized to uCache for TICA-EF, TICA-WED, uCache, S-RAC, and TICA-A; off-scale bars are labeled 2.91, 1.3, 3.15, and 1.25.]

Algorithm 3 State Machine Based Insertion
 1: counter ← steps
 2: currentState ← initialState, nextState ← initialState
 3: diskRead, readHit, requestCounter ← 0
 4: procedure SMBI(request)
 5:   requestCounter ← requestCounter + 1
 6:   if request.isRead then
 7:     if Hit in cache then
 8:       readHit ← readHit + 1
 9:     else
10:       diskRead ← diskRead + 1
11:   if requestCounter == sampleSize then
12:     if currentState == initialState then
13:       if diskRead > Thdd then
14:         Switch to TICA-WED policy
15:         counter ← steps − 1
16:         currentState ← WEDState
17:     else if currentState == WEDState then
18:       if diskRead > Thdd then
19:         if readHit > Tread then
20:           Switch to TICA-WED policy
21:           currentState ← WEDState
22:         else
23:           Switch to TICA-EF policy
24:           currentState ← waitState
25:       else if readHit > Tread then
26:         Switch to TICA-EF policy
27:         counter ← steps
28:         currentState ← initialState
29:     else if currentState == waitState then
30:       if counter == 0 or readHit > Tread then
31:         Switch to TICA-EF policy
32:         counter ← steps
33:         currentState ← initialState
34:       else
35:         counter ← counter − 1

5 EXPERIMENTAL RESULTS

In this section, the performance, power consumption, endurance, and reliability of the proposed architecture are evaluated. We compare TICA with a state-of-the-art multi-level caching architecture (uCache [26]) and a state-of-the-art SSD caching architecture (S-RAC [17]). In order to have a fair comparison, uCache is modified so that all single points of failure are removed, improving its reliability. Support for RAID-1 SSDs is also added to uCache. Since S-RAC is a single-level cache architecture, a first-level DRAM cache is added so that all three examined architectures benefit from both DRAM and SSD. To show the effect of the different TICA policies, both TICA-EF and TICA-WED are evaluated in addition to TICA-A. The detailed characteristics of the workloads are reported in Table 2.

TABLE 2: Workload Characteristics

Workload   | Total Requests Size | Read Requests    | Write Requests
TPCC       | 43.932 GB           | 1,352,983 (70%)  | 581,112 (30%)
Webserver  | 7.607 GB            | 418,951 (61%)    | 270,569 (39%)
DevToolRel | 3.133 GB            | 108,507 (68%)    | 52,032 (32%)
LiveMapsBE | 15.646 GB           | 294,493 (71%)    | 115,862 (28%)
MSNFS      | 10.251 GB           | 644,573 (65%)    | 349,485 (35%)
Exchange   | 9.795 GB            | 158,011 (24%)    | 502,716 (76%)
Postmark   | 19.437 GB           | 1,272,148 (29%)  | 3,172,014 (71%)
Stg_1      | 91.815 GB           | 1,400,409 (64%)  | 796,452 (36%)
Rsrch_0    | 13.11 GB            | 133,625 (9%)     | 1,300,030 (91%)
Src1_2     | 1.65 TB             | 21,112,615 (57%) | 16,302,998 (43%)
Wdev_0     | 10.628 GB           | 229,529 (20%)    | 913,732 (80%)
Ts_0       | 16.612 GB           | 316,692 (18%)    | 1,485,042 (82%)
Usr_0      | 51.945 GB           | 904,483 (40%)    | 1,333,406 (60%)
Hm_1       | 9.45 GB             | 580,896 (94%)    | 28,415 (6%)
Mds_0      | 11.4 GB             | 143,973 (31%)    | 1,067,061 (69%)
Prn_0      | 63.44 GB            | 602,480 (22%)    | 4,983,406 (78%)
Prxy_0     | 61.03 GB            | 383,524 (5%)     | 12,135,444 (95%)

5.1 Experimental Setup

To conduct the experiments, we employed a rackmount server equipped with an Intel Xeon CPU, 128 GB of memory, and a dedicated SSD for the operating system, to reduce the effect of the OS and other running applications on the obtained results. Fig. 10 shows the actual server running the experiments and the interfaces between the I/O cache layers. The SAS expander is capable of supporting both SATA and SAS disk drives. The RAID controller is configured in Just a Bunch Of Disks (JBOD) mode, where disks are provided directly to the OS without any processing by the controller. Employing a SAS expander and RAID controller enables us to run experiments on various SATA/SAS SSDs without the need for disk replacement or a server reboot.

The WO- and RO-SSDs are enterprise-grade SSDs of the kind employed in datacenters. We warm up the SSDs before each experiment by issuing requests until the SSD reaches a stable latency. The traces are replayed on the devices using our in-house trace player, which is validated against the blktrace [52] tool. The requests sent to the disk by our trace player are compared to the original trace file to ensure the expected behavior. The characteristics of the DRAM and SSDs employed in the experiments are reported in Table 1. In the experiments, the sizes of the SSDs and DRAM are set to 10% and 1% of the working set size, respectively. The values of Tmin, Tmax, Thdd, and Tread are set to 0.15, 0.25, 0.2, and 0.2, respectively.
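For reference, the stated experimental configuration can be collected into a single snippet; the snippet is descriptive only, and the key names are ours, not identifiers from the authors' harness.

    TICA_CONFIG = {
        "ssd_size":  0.10,  # each SSD sized to 10% of the working set
        "dram_size": 0.01,  # DRAM sized to 1% of the working set
        "T_min":  0.15,     # EQ-vs-DRAM hit-ratio gap threshold (Algorithm 2)
        "T_max":  0.25,     # combined hit-ratio threshold (Algorithm 2)
        "T_hdd":  0.20,     # HDD-read threshold (Algorithm 3)
        "T_read": 0.20,     # read-hit threshold (Algorithm 3)
    }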


5.2 Performance

Fig. 11 shows the response time of TICA compared to uCache and S-RAC, all normalized to uCache. TICA-WED, which is optimized toward higher performance, reduces the response time by 12% on average compared to uCache and S-RAC. The highest performance improvement of TICA belongs to the Ts_0 workload, with a 45% reduction in response time (compared to S-RAC). Although TICA-WED and TICA-EF differ only in their read miss policy and Ts_0 is a write-dominant workload (80% write requests), TICA-WED still performs better than TICA-EF, with 42% lower response time. This shows the significant impact of writing read misses to the SSDs and thereby forcing dirty data pages to be evicted from the cache. TICA-WED also improves performance in read-dominant workloads such as TPCC by copying the data pages evicted from DRAM to the WO-SSD.

TICA-EF, optimized toward better endurance, outperforms TICA-WED in a few workloads such as Webserver and Exchange. Our investigation reveals that this is due to a) the limited space of the SSDs and b) the forced eviction of dirty data pages from the SSD, which is conducted aggressively in TICA-WED. In the Webserver workload, TICA-A also identifies this problem and manages the copying of evicted data pages from DRAM to the WO-SSD. Therefore, it has better performance-efficiency in the Webserver workload than both the TICA-EF and TICA-WED policies. By managing the data pages evicted from DRAM, TICA-A improves performance compared to previous studies by up to 45% and by 8% on average. We conclude that TICA-A is performance-efficient in both read- and write-intensive workloads.

5.3 Power Consumption

To evaluate the power consumption of TICA, we estimate the total energy consumed per workload. In addition to the read and write requests, the idle power consumption of the devices is included in the energy consumption to further increase its accuracy. The read and write operations of background tasks, such as copying dirty data pages from DRAM to the RO-SSD and flushing such data pages from the RO-SSD to disk, are also included in the energy consumption formula. Equation 2 shows the formula for estimating the total energy consumption. All parameters are detailed in Table 3.

TICA improves power consumption by a) employing power-efficient SSDs while maintaining performance and b) reducing the number of accesses to the SSDs. Previous studies employ two identical SSDs in a mirrored RAID configuration to provide high reliability, while, as discussed in Section 3, heterogeneous SSDs are not performance-efficient in traditional mirrored RAID configurations. As such, state-of-the-art architectures such as uCache and S-RAC need to employ two WO-SSDs, which have high power consumption. TICA, on the other hand, employs one WO-SSD and one RO-SSD, which results in lower power consumption than using two WO-SSDs. Additionally, by reducing the response time of the requests, the SSDs enter the idle state more often, and therefore the total power consumption is decreased. Fig. 12 shows the normalized energy consumption of TICA compared to uCache and S-RAC, normalized to uCache. In all workloads, TICA

TABLE 3: Parameters Description

Parameter   | Description
Read_wo     | WO-SSD Total Read Requests
Write_wo    | WO-SSD Total Write Requests
Read_ro     | RO-SSD Total Read Requests
Write_ro    | RO-SSD Total Write Requests
Read_D      | DRAM Total Read Requests
Write_D     | DRAM Total Write Requests
RLat_wo     | WO-SSD Read Latency
WLat_wo     | WO-SSD Write Latency
RLat_ro     | RO-SSD Read Latency
WLat_ro     | RO-SSD Write Latency
Lat_D       | DRAM Latency
RP_wo       | WO-SSD Read Power
WP_wo       | WO-SSD Write Power
RP_ro       | RO-SSD Read Power
WP_ro       | RO-SSD Write Power
P_D         | DRAM Power
IP_wo       | WO-SSD Idle Power
IP_ro       | RO-SSD Idle Power
IP_D        | DRAM Idle Power
Idle_wo     | Total WO-SSD Idle Time
Idle_ro     | Total RO-SSD Idle Time
Idle_D      | Total DRAM Idle Time
R_Device    | Device Reliability
U_Device    | Device Unreliability
MTTF_Device | Device Mean Time To Failure

policies improve power consumption, which shows the effectiveness of replacing one WO-SSD with a RO-SSD. The only exceptions are the Usr_0 and Hm_1 workloads, where TICA-EF has 2.35x and 2.71x higher power consumption than uCache, respectively. This is due to the high response time of the requests in these workloads, which prevents the SSDs from entering the idle state. TICA-A improves power consumption by an average of 28%, and the maximum improvement in power consumption (70%) belongs to the Ts_0 workload, in comparison with S-RAC.

5.4 Endurance

TICA-WED redirects all data pages evicted from DRAM to the WO-SSD and therefore significantly increases the number of writes in the SSDs. TICA-EF, on the other hand, does not copy such data pages to the SSDs, preserving their lifetime. Fig. 13 shows the number of writes in the SSDs compared to uCache and S-RAC, normalized to uCache. uCache, S-RAC, and TICA-EF have almost the same number of writes in the SSDs since they all limit writes to the SSDs. TICA-EF improves SSD lifetime by an average of 1.3% compared to uCache and S-RAC.

TICA-WED places all data pages evicted from DRAM in the SSD and therefore increases the number of writes in the SSDs. For instance, in the Stg_1 workload, TICA-WED reduces the SSDs' lifetime by more than 7x. TICA-A, which tries to balance performance and endurance, has only a 4.7% average lifetime overhead compared to uCache and S-RAC. Since TICA employs an unbalanced writing scheme between the WO-SSD and RO-SSD, the lifetime of the RO-SSD is not affected by the increase in the number of writes. Note that TICA-A still improves SSD lifetime by an average of 38% compared to single-level SSD caching architectures (not shown in Fig. 13).

5.5 Hit Ratio

TICA, uCache, and S-RAC do not comply with a simple LRU algorithm (there is no global LRU queue for all data pages). Hence, the hit ratio of such multi-level caching architectures needs to be evaluated separately. Although the performance of


Fig. 12: Normalized Power Consumption: TICA vs. Conventional Caching Architectures. [Figure: per-workload energy normalized to uCache for TICA-EF, TICA-WED, uCache, S-RAC, and TICA-A; off-scale bars are labeled 2.35, 1.3, and 2.71.]

\[
Energy = \sum^{Read_{wo}} (RLat_{wo} \cdot RP_{wo}) + \sum^{Write_{wo}} (WLat_{wo} \cdot WP_{wo}) + (Idle_{wo} \cdot IP_{wo})
+ \sum^{Read_{ro}} (RLat_{ro} \cdot RP_{ro}) + \sum^{Write_{ro}} (WLat_{ro} \cdot WP_{ro}) + (Idle_{ro} \cdot IP_{ro})
+ \sum^{Read_{D}+Write_{D}} (Lat_{D} \cdot P_{D}) + (Idle_{D} \cdot IP_{D}) \tag{2}
\]
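A direct transcription of Equation 2 into Python is shown below; the per-device counts, latencies, and powers would come from the trace statistics and Table 1, and the argument layout is an assumption for this sketch.

    def total_energy(devices):
        # Equation 2: active read energy, active write energy, and idle
        # energy, summed over the WO-SSD, RO-SSD, and DRAM. For DRAM,
        # pass the same latency/power for reads and writes.
        return sum(
            d["reads"] * d["r_lat"] * d["r_pow"]
            + d["writes"] * d["w_lat"] * d["w_pow"]
            + d["idle_time"] * d["idle_power"]
            for d in devices
        )  # joules, if latencies are in seconds and powers in watts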

such architectures is already investigated in Section 5.2, the hit ratio evaluation enables us to predict performance in other hardware setups, such as using RAID or Storage Area Network (SAN) storage as the backend of the caching architecture.

Fig. 14 shows the hit ratios of TICA-EF, TICA-WED, TICA-A, and S-RAC normalized to uCache. TICA-WED, which is optimized toward performance, has the highest average hit ratio. TICA-A improves the hit ratio compared to uCache and S-RAC in all workloads, by an average of 6.4%. The highest improvement belongs to the DevToolRel workload, where TICA-WED and TICA-A improve the hit ratio by 46% and 39%, respectively. S-RAC, however, has a hit ratio comparable to TICA in this workload. This is due to the ghost queue employed in S-RAC to identify the requests with higher caching benefit. Due to the DRAM eviction policy of TICA-EF, it has a lower hit ratio than uCache and the other TICA policies. The hit ratio degradation of TICA-EF, however, is negligible in most workloads.

5.6 Reliability

uCache and S-RAC employ two WO-SSDs, while TICA uses a WO-SSD alongside a RO-SSD in its architecture. Both uCache and S-RAC fail only when both WO-SSDs fail. There are two conditions which can result in failure of TICA: a) failure of both WO-SSD and RO-SSD, and b) failure of both WO-SSD and DRAM. Since RO-SSD has a lower Mean Time To Failure (MTTF) [53] compared to WO-SSD, TICA might reduce the overall reliability of the system. DRAM, on the other hand, has a higher MTTF compared to WO-SSD. Therefore, the actual reliability of TICA depends on the duration of keeping dirty data pages in DRAM and RO-SSD.

The reliability of caching architectures is calculated based on Reliability Block Diagrams (RBD) [54]. To calculate the system reliability, RBD uses 1) the reliability of the system components (storage devices in our use case) and 2) the dependency of system failure on the failure of components. The reliability of storage devices is computed based on MTTF [53], by assuming an exponential distribution for faults in SSDs, which is formulated in Equation 3. The MTTF value for storage devices is extracted from their datasheets. Although other distributions such as Weibull might be more suitable, they require additional parameters beyond MTTF to model reliability. Field studies on SSD failures do not disclose the brands of the examined SSDs [50], [55], [56], and therefore, we cannot use such models. If such field studies become available, we can employ a more accurate MTTF for devices in the exponential distribution. This can be done by estimating the real MTTF of the device based on the Cumulative Distribution Function (CDF) of the Weibull distribution, as discussed in [57]. The description of the parameters in Equation 3 is available in Table 3. Equation 4 and Equation 5 show the formulas for calculating the reliability of TICA and uCache, respectively. Note that the reliability of S-RAC is calculated using the same formula as uCache. The α variable denotes the weight of each failure scenario. In traditional RAID architectures, α is equal to one. In TICA, α depends on the running workload and the number of write requests. Since TICA employs a RO-SSD instead of a WO-SSD, compared to uCache and S-RAC, it is expected that TICA slightly reduces the reliability. Considering 0.8 as the value of α, which is close to the actual value of α in our experiments, TICA has an unreliability of 1.27 × 10^-5, while the unreliability of uCache and S-RAC is 1.14 × 10^-5. Note that the hardware cost of TICA is lower than that of uCache, and TICA has the same reliability as uCache if the same hardware is employed for both architectures.

5.7 Overall

We can summarize the experimental results with the following observations: 1) TICA improves performance and hit ratio compared to previous state-of-the-art architectures. 2) The power consumption is also improved in TICA by reducing the number of accesses to the SSDs. 3) The lifetime of SSDs is extended in TICA compared to single-level SSD caching architectures, while it is negligibly reduced compared to uCache and S-RAC. 4) The reliability of TICA is the same as previous studies when the same hardware is employed; reducing the total cost in TICA can result in slightly lower reliability.


Fig. 13: Normalized Number of Writes Committed to SSDs


Fig. 14: Hit Ratio of Caching Architectures

\begin{align}
R_{Device} &= e^{-\frac{365 \times 24}{MTTF_{Device}}} \tag{3} \\
R_{TICA} &= \alpha \times \left(1 - (1 - R_{WO\text{-}SSD}) \times (1 - R_{D})\right) \nonumber \\
&\quad + (1 - \alpha) \times \left(1 - (1 - R_{WO\text{-}SSD}) \times (1 - R_{RO\text{-}SSD})\right) \tag{4} \\
R_{uCache} &= 1 - (1 - R_{WO\text{-}SSD}) \times (1 - R_{WO\text{-}SSD}) \tag{5}
\end{align}
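As a sanity check on Equations (3)-(5), the short Python sketch below derives one-year device reliabilities from datasheet-style MTTFs and combines them with the RBD formulas above. The MTTF values and the α of 0.8 are placeholders standing in for the paper's device datasheets and measured workload mix, not the actual numbers used in the evaluation.

```python
import math

# Sketch of Equations (3)-(5). MTTF is in hours (as in device datasheets);
# the mission time is one year (365 * 24 hours).

HOURS_PER_YEAR = 365 * 24

def r_device(mttf_hours):
    """Equation (3): one-year reliability under an exponential fault model."""
    return math.exp(-HOURS_PER_YEAR / mttf_hours)

def r_tica(r_wo, r_ro, r_dram, alpha):
    """Equation (4): alpha weighs the WO-SSD+DRAM failure scenario
    against the WO-SSD+RO-SSD failure scenario."""
    return (alpha * (1 - (1 - r_wo) * (1 - r_dram))
            + (1 - alpha) * (1 - (1 - r_wo) * (1 - r_ro)))

def r_ucache(r_wo):
    """Equation (5): uCache (and S-RAC) fail only if both WO-SSDs fail."""
    return 1 - (1 - r_wo) ** 2

# Example with made-up MTTFs (hours):
r_wo = r_device(2_000_000)   # WO-SSD
r_ro = r_device(1_500_000)   # RO-SSD
r_d  = r_device(4_000_000)   # DRAM
print("TICA unreliability:  ", 1 - r_tica(r_wo, r_ro, r_d, alpha=0.8))
print("uCache unreliability:", 1 - r_ucache(r_wo))
```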

Fig. 15 shows the overall comparison of TICA policies with uCache and S-RAC. All parameters are normalized to the highest value, and higher values are better for all parameters. Fig. 16 also shows the overall benefit of the caching architectures. Benefit is computed by multiplying the normalized performance, endurance, cost, and power consumption. uCache and S-RAC, which focus on optimizing only one parameter, have a lower benefit compared to the TICA variants. TICA-A provides the highest benefit since it considers all mentioned parameters in the design of the caching architecture and balances performance and endurance based on the workload characteristics.
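The benefit metric itself reduces to a one-line product. The sketch below, with made-up scores, only illustrates how a figure like Fig. 16 could be reproduced from the four normalized parameters; the numbers are not the measured values.

```python
# Hypothetical sketch of the benefit metric in Fig. 16: the product of the
# normalized performance, endurance, cost, and power-consumption scores,
# each scaled so that higher is better. Scores below are illustrative.

def benefit(perf, endurance, cost, power):
    # All inputs normalized to [0, 1], higher = better.
    return perf * endurance * cost * power

scores = {
    "TICA-A": benefit(0.97, 0.95, 0.98, 0.95),
    "uCache": benefit(0.90, 0.96, 0.93, 0.80),
}
baseline = scores["uCache"]
for name, b in scores.items():
    print(name, round(b / baseline, 2))  # benefit normalized to uCache
```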

6 CONCLUSION

In this paper, we demonstrated that simultaneously employing different SSDs in traditional architectures is not performance-efficient. In addition, state-of-the-art architectures neglected to consider all aspects of caching architectures. To mitigate such problems, we proposed a three-level caching architecture, called TICA, which, by employing a RO-SSD and a WO-SSD, tries to reduce the cost and improve the performance and power consumption. TICA does not have any single point of failure, offering a highly reliable I/O cache architecture, while the endurance cost of the proposed architecture is only 4.7% higher than state-of-the-art caching architectures.

[Fig. 15 is a radar chart over five axes: Performance, Endurance, Reliability, Energy Efficiency, and GB/$; the plotted series are TICA-EF, TICA-WED, Adaptive TICA (TICA-A), uCache, and S-RAC.]

Fig. 15: Overall Comparison of Caching Architectures (Higher values are better)

Additionally, the hardware cost of TICA is 5% less than that of conventional architectures. The SSD lifetime is extended by up to 38% compared to single-level SSD caching architectures. The experimental results demonstrated that our architecture can improve performance and power consumption compared to previous studies by an average of 8% and 28%, respectively.

ACKNOWLEDGMENTS

This work has been partially supported by Iran National Science Foundation (INSF) under grant number 96006071 and by HPDS Corp.


Fig. 16: Normalized Benefit of Various Caching Architectures

REFERENCES

[1] S. Ahmadian, F. Taheri, M. Lotfi, M. Karimi, and H. Asadi, "Investigating power outage effects on reliability of solid-state drives," to appear in Design, Automation Test in Europe Conference Exhibition (DATE), March 2018.

[2] Storage Networking Industry Association, "Microsoft enterprise traces," http://iotta.snia.org/traces/130, accessed: 2015-08-10.

[3] S. Shaw, HammerDB: the open source oracle load test tool, 2012, accessed: 2017-08-10. [Online]. Available: http://www.hammerdb.com/

[4] V. Tarasov, E. Zadok, and S. Shepler, "Filebench: A flexible framework for file system benchmarking," USENIX ;login:, vol. 41, 2016.

[5] M. Tarihi, H. Asadi, A. Haghdoost, M. Arjomand, and H. Sarbazi-Azad, "A hybrid non-volatile cache design for solid-state drives using comprehensive I/O characterization," IEEE Transactions on Computers (TC), vol. 65, no. 6, pp. 1678–1691, 2016.

[6] X. Wu and A. L. N. Reddy, "Managing storage space in a flash and disk hybrid storage system," in IEEE International Symposium on Modeling, Analysis Simulation of Computer and Telecommunication Systems (MASCOTS), Sept 2009, pp. 1–4.

[7] F. Ye, J. Chen, X. Fang, J. Li, and D. Feng, "A regional popularity-aware cache replacement algorithm to improve the performance and lifetime of SSD-based disk cache," in IEEE International Conference on Networking, Architecture and Storage (NAS), Aug 2015, pp. 45–53.

[8] R. Salkhordeh, S. Ebrahimi, and H. Asadi, "ReCA: an efficient reconfigurable cache architecture for storage systems with online workload characterization," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. PP, no. 99, pp. 1–1, 2018.

[9] R. Salkhordeh, H. Asadi, and S. Ebrahimi, "Operating system level data tiering using online workload characterization," The Journal of Supercomputing, vol. 71, no. 4, pp. 1534–1562, 2015.

[10] S. Liu, J. Jiang, and G. Yang, "Macss: A metadata-aware combo storage system," in Proceedings of the International Conference on Systems and Informatics (ICSAI), May 2012, pp. 919–923.

[11] M. Lin, R. Chen, J. Xiong, X. Li, and Z. Yao, "Efficient sequential data migration scheme considering dying data for HDD/SSD hybrid storage systems," IEEE Access, vol. 5, pp. 23366–23373, 2017.

[12] S. Ahmadian, O. Mutlu, and H. Asadi, "ECI-Cache: A high-endurance and cost-efficient I/O caching scheme for virtualized platforms," in Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). ACM, 2018.

[13] S. Huang, Q. Wei, D. Feng, J. Chen, and C. Chen, "Improving flash-based disk cache with lazy adaptive replacement," ACM Transactions on Storage (TOS), vol. 12, no. 2, pp. 8:1–8:24, Feb. 2016.

[14] R. Santana, S. Lyons, R. Koller, R. Rangaswami, and J. Liu, "To ARC or Not to ARC," in Proceedings of the 7th USENIX Conference on Hot Topics in Storage and File Systems (HotStorage), 2015, pp. 14–14.

[15] R. Appuswamy, D. C. van Moolenbroek, and A. S. Tanenbaum, "Cache, cache everywhere, flushing all hits down the sink: On exclusivity in multilevel, hybrid caches," in IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST), May 2013, pp. 1–14.

[16] Y. Liang, Y. Chai, N. Bao, H. Chen, and Y. Liu, "Elastic Queue: A universal SSD lifetime extension plug-in for cache replacement algorithms," in Proceedings of the 9th ACM International on Systems and Storage Conference (SYSTOR). ACM, 2016, pp. 5:1–5:11. [Online]. Available: http://doi.acm.org/10.1145/2928275.2928286

[17] Y. Ni, J. Jiang, D. Jiang, X. Ma, J. Xiong, and Y. Wang, "S-RAC: SSD friendly caching for data center workloads," in Proceedings of the 9th ACM International on Systems and Storage Conference. ACM, 2016, pp. 8:1–8:12. [Online]. Available: http://doi.acm.org/10.1145/2928275.2928284

[18] Z. Fan, D. Du, and D. Voigt, "H-ARC: A non-volatile memory based cache policy for solid state drives," in Mass Storage Systems and Technologies (MSST), June 2014, pp. 1–11.

[19] X. Chen, W. Chen, Z. Lu, P. Long, S. Yang, and Z. Wang, "A duplication-aware SSD-based cache architecture for primary storage in virtualization environment," IEEE Systems Journal, vol. 11, no. 4, pp. 2578–2589, Dec 2017.

[20] Z. Chen, N. Xiao, Y. Lu, and F. Liu, "Me-CLOCK: a memory-efficient framework to implement replacement policies for large caches," IEEE Transactions on Computers (TC), vol. 65, no. 8, pp. 2665–2671, Aug 2016.

[21] L. Tang, Q. Huang, W. Lloyd, S. Kumar, and K. Li, "RIPQ: Advanced photo caching on flash for Facebook," in Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST), 2015, pp. 373–386.

[22] Y. Chai, Z. Du, X. Qin, and D. Bader, "WEC: Improving durability of SSD cache drives by caching write-efficient data," IEEE Transactions on Computers (TC), vol. PP, no. 99, pp. 1–1, 2015.

[23] J. Levandoski, D. Lomet, and S. Sengupta, "LLAMA: A cache/storage subsystem for modern hardware," Proceedings of the VLDB Endowment, vol. 6, no. 10, pp. 877–888, Aug. 2013. [Online]. Available: http://dx.doi.org/10.14778/2536206.2536215

[24] J. Wang, Z. Guo, and X. Meng, "An efficient design and implementation of multi-level cache for database systems," in Database Systems for Advanced Applications. Springer International Publishing, 2015, pp. 160–174.

[25] C. Yuxia, C. Wenzhi, W. Zonghui, Y. Xinjie, and X. Yang, "AMC: an adaptive multi-level cache algorithm in hybrid storage filesystems," Concurrency and Computation: Practice and Experience, vol. 27, no. 16, pp. 4230–4246, 2015. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.3530

[26] D. Jiang, Y. Che, J. Xiong, and X. Ma, "uCache: A utility-aware multilevel SSD cache management policy," in IEEE 10th International Conference on High Performance Computing and Communications, IEEE International Conference on Embedded and Ubiquitous Computing, Nov 2013, pp. 391–398.

[27] S. K. Yoon, Y. S. Youn, S. J. Nam, M. H. Son, and S. D. Kim, "Optimized memory-disk integrated system with DRAM and nonvolatile memory," IEEE Transactions on Multi-Scale Computing Systems, vol. PP, no. 99, pp. 1–1, 2016.

[28] H. Liu and H. H. Huang, "Graphene: Fine-grained IO management for graph computing," in Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST). USENIX Association, 2017, pp. 285–299.

[29] S. He, Y. Wang, and X. H. Sun, "Improving performance of parallel I/O systems through selective and layout-aware SSD cache," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 27, no. 10, pp. 2940–2952, Oct 2016.

[30] E. Kakoulli and H. Herodotou, "OctopusFS: A distributed file system with tiered storage management," in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2017, pp. 65–78.

[31] L. Wu, Q. Zhuge, E. H. M. Sha, X. Chen, and L. Cheng, "BOSS: An efficient data distribution strategy for object storage systems with hybrid devices," IEEE Access, vol. 5, pp. 23979–23993, 2017.

[32] S. He, Y. Wang, Z. Li, X. H. Sun, and C. Xu, "Cost-aware region-level data placement in multi-tiered parallel I/O systems," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 28, no. 7, pp. 1853–1865, July 2017.

[33] D. Arteaga, J. Cabrera, J. Xu, S. Sundararaman, and M. Zhao, "CloudCache: On-demand flash cache management for cloud computing," in Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST), ser. FAST'16. Berkeley, CA, USA: USENIX Association, 2016, pp. 355–369. [Online]. Available: http://dl.acm.org/citation.cfm?id=2930583.2930610

[34] Z. Shen, F. Chen, Y. Jia, and Z. Shao, "DIDACache: a deep integration of device and application for flash based key-value caching," in Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST). USENIX Association, 2017, pp. 391–405.

[35] L. Lu, T. S. Pillai, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, "WiscKey: Separating keys from values in SSD-conscious storage," in Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST), 2016, pp. 133–148.


[36] W. Li, G. Jean-Baptise, J. Riveros, G. Narasimhan, T. Zhang, and M. Zhao, "CacheDedup: In-line deduplication for flash caching," in Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST). USENIX Association, 2016, pp. 301–314.

[37] H. Wu, C. Wang, Y. Fu, S. Sakr, K. Lu, and L. Zhu, "A differentiated caching mechanism to enable primary storage deduplication in clouds," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 29, no. 6, pp. 1202–1216, June 2018.

[38] X. Zhang, J. Li, H. Wang, K. Zhao, and T. Zhang, "Reducing solid-state storage device write stress through opportunistic in-place delta compression," in Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST). USENIX Association, 2016, pp. 111–124.

[39] M. Saxena and M. M. Swift, "Design and prototype of a solid-state cache," ACM Transactions on Storage (TOS), vol. 10, no. 3, pp. 1–34, 2014.

[40] S. Lee, M. Liu, S. Jun, S. Xu, J. Kim, and Arvind, "Application-managed flash," in Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST), 2016, pp. 339–353.

[41] C. Lee, D. Sim, J. Hwang, and S. Cho, "F2FS: A new file system for flash storage," in Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST), 2015, pp. 273–286.

[42] Y. Jin, H. Tseng, Y. Papakonstantinou, and S. Swanson, "KAML: A flexible, high-performance key-value SSD," in IEEE International Symposium on High Performance Computer Architecture (HPCA), 2017, pp. 373–384.

[43] E. Rho, K. Joshi, S.-U. Shin, N. J. Shetty, J. Hwang, S. Cho, D. D. Lee, and J. Jeong, "FStream: Managing flash streams in the file system," in Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST), 2018, pp. 257–264.

[44] Q. Xia and W. Xiao, "High-performance and endurable cache management for flash-based read caching," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 27, no. 12, pp. 3518–3531, Dec 2016.

[45] J. Wan, W. Wu, L. Zhan, Q. Yang, X. Qu, and C. Xie, "DEFT-Cache: A cost-effective and highly reliable SSD cache for RAID storage," in IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2017, pp. 102–111.

[46] A. Tavakkol, M. Sadrosadati, S. Ghose, J. Kim, Y. Luo, Y. Wang, N. M. Ghiasi, L. Orosa, J. Gómez-Luna, and O. Mutlu, "FLIN: enabling fairness and enhancing performance in modern NVMe solid state drives," in ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), June 2018, pp. 397–410.

[47] N. Elyasi, M. Arjomand, A. Sivasubramaniam, M. T. Kandemir, C. R. Das, and M. Jung, "Exploiting intra-request slack to improve SSD performance," in Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, 2017, pp. 375–388.

[48] H. Kim, D. Shin, Y. H. Jeong, and K. H. Kim, "SHRD: Improving spatial locality in flash storage accesses by sequentializing in host and randomizing in device," in Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST), 2017, pp. 271–283.

[49] Q. Li, L. Shi, C. J. Xue, K. Wu, C. Ji, Q. Zhuge, and E. H.-M. Sha, "Access characteristic guided read and write cost regulation for performance improvement on flash memory," in Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST), 2016, pp. 125–132.

[50] J. Meza, Q. Wu, S. Kumar, and O. Mutlu, "A large-scale study of flash memory failures in the field," in Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, 2015, pp. 177–190.

[51] V. Seshadri, O. Mutlu, M. A. Kozuch, and T. C. Mowry, "The evicted-address filter: A unified mechanism to address both cache pollution and thrashing," in Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012, pp. 355–366. [Online]. Available: http://doi.acm.org/10.1145/2370816.2370868

[52] A. D. Brunelle, "Block I/O layer tracing: blktrace," in Gelato-Itanium Conference and Expo (gelato-ICE), 2006.

[53] J. Lienig and H. Bruemmer, Fundamentals of Electronic Systems Design. Springer International Publishing, 2017.

[54] E. Dubrova, Fault-Tolerant Design. Springer Publishing Company, Incorporated, 2013.

[55] I. Narayanan, D. Wang, M. Jeon, B. Sharma, L. Caulfield, A. Sivasubramaniam, B. Cutler, J. Liu, B. Khessib, and K. Vaid, "SSD failures in datacenters: What? when? and why?" in Proceedings of the 9th ACM International on Systems and Storage Conference (SYSTOR), 2016, pp. 7:1–7:11.

[56] B. Schroeder, R. Lagisetty, and A. Merchant, "Flash reliability in production: The expected and the unexpected," in 14th USENIX Conference on File and Storage Technologies (FAST), 2016, pp. 67–80.

[57] M. Kishani and H. Asadi, "Modeling impact of human errors on the data unavailability and data loss of storage systems," IEEE Transactions on Reliability (TR), vol. 67, no. 3, pp. 1111–1127, Sept 2018.

Reza Salkhordeh received the B.Sc. degree in computer engineering from Ferdowsi University of Mashhad in 2011, and the M.Sc. degree in computer engineering from Sharif University of Technology (SUT) in 2013. He has been a member of the Data Storage, Networks, and Processing (DSN) lab since 2011. He was also a member of the Iran National Elites Foundation from 2012 to 2015. He has been the director of the Software division at HPDS corporation since 2015. He is currently a Ph.D. candidate at SUT. His research interests include operating systems, solid-state drives, memory systems, and data storage systems.

Mostafa Hadizadeh received the B.Sc. degree in computer engineering from Shahid Beheshti University (SBU), Tehran, Iran, in 2016. He is currently pursuing the M.Sc. degree in computer engineering at Sharif University of Technology (SUT), Tehran, Iran. He has been a member of the Data Storage, Networks, and Processing (DSN) Laboratory since 2017. From December 2016 to May 2017, he was a member of the Dependable Systems Laboratory (DSL) at SUT. His research interests include computer architecture, memory systems, dependable systems, and systems on chip.


Hossein Asadi (M'08, SM'14) received the B.Sc. and M.Sc. degrees in computer engineering from SUT, Tehran, Iran, in 2000 and 2002, respectively, and the Ph.D. degree in electrical and computer engineering from Northeastern University, Boston, MA, USA, in 2007.

He was with EMC Corporation, Hopkinton, MA, USA, as a Research Scientist and Senior Hardware Engineer, from 2006 to 2009. From 2002 to 2003, he was a member of the Dependable Systems Laboratory, SUT, where he researched hardware verification techniques. From 2001 to 2002, he was a member of the Sharif Rescue Robots Group. He has been with the Department of Computer Engineering, SUT, since 2009, where he is currently a tenured Associate Professor. He is the Founder and Director of the Data Storage, Networks, and Processing (DSN) Laboratory, Director of the Sharif High-Performance Computing (HPC) Center, Director of the Sharif Information and Communications Technology Center (ICTC), and President of the Sharif ICT Innovation Center. He spent three months in the summer of 2015 as a Visiting Professor at the School of Computer and Communication Sciences at the École Polytechnique Fédérale de Lausanne (EPFL). He is also the co-founder of HPDS Corp., designing and fabricating midrange and high-end data storage systems. He has authored and co-authored more than eighty technical papers in reputed journals and conference proceedings. His current research interests include data storage systems and networks, solid-state drives, operating system support for I/O and memory management, and reconfigurable and dependable computing.

Dr. Asadi was a recipient of the Technical Award for the Best Robot Design from the International RoboCup Rescue Competition, organized by AAAI and RoboCup, a recipient of the Best Paper Award at the 15th CSI International Symposium on Computer Architecture & Digital Systems (CADS), the Distinguished Lecturer Award from SUT in 2010, the Distinguished Researcher Award and the Distinguished Research Institute Award from SUT in 2016, and the Distinguished Technology Award from SUT in 2017. He is also a recipient of the Extraordinary Ability in Science visa from US Citizenship and Immigration Services in 2008. He has also served as the publication chair of several national and international conferences, including CNDS2013, AISP2013, and CSSE2013, during the past four years. Most recently, he has served as a Guest Editor of IEEE Transactions on Computers, an Associate Editor of Microelectronics Reliability, a Program Co-Chair of CADS2015, and the Program Chair of the CSI National Computer Conference (CSICC2017).