Department of EIT Hochschule Rosenheim Page 1
Table of Contents
1 . INTRODUCTION TO MEMORY STORAGE DRIVES ......................................................................................... 7
1.1 AS IT EXISTS TODAY ............................................................................................................................... 7
1.2 SOLID STATE DRIVES – A BRIEF OVERVIEW ........................................................................................... 7
1.3 THESIS OBJECTIVE ................................................................................................................................. 8
1.4 SUMMARY OF CHAPTERS ....................................................................................................................... 8
2 . SYSTEM MEMORY OVERVIEW ..................................................................................................................... 9
2.1 SYSTEM ARCHITECTURE ........................................................................................................................ 9
2.2 MEMORY ............................................................................................................................................. 11
2.3 STORAGE HIERARCHY.......................................................................................................................... 13
2.4 MEMORY CONTROLLER ....................................................................................................................... 17
2.5 SUMMARY ............................................................................................................................................ 18
3 . MAGNETIC DISK STORAGE ........................................................................................................................ 19
3.1 HARD DISK DRIVES ............................................................................................................................. 19
3.2 HARD DISK DRIVE SYSTEM ARCHITECTURE .......................................................................................... 23
3.3 HARD DISK DRIVE INTERFACES ........................................................................................................... 25
3.4 EXTERNAL HARD DISK DRIVES ........................................................................................................... 28
3.5 FUTURE OF HARD DISK DRIVES ........................................................................................................... 29
4 . SOLID STATE DRIVES ................................................................................................................................. 32
4.1 FLASH MARKET DEVELOPMENT .......................................................................................................... 32
4.2 SOLID STATE DRIVES ........................................................................................................................... 33
4.3 PHYSICAL LAYOUT ............................................................................................................................... 38
4.4 FLASH TRANSLATION LAYER (FTL) .................................................................................................... 40
4.5 SOLID STATE DRIVE INTERFACES ........................................................................................................ 44
4.6 SSD MARKET ...................................................................................................................................... 45
4.7 FUTURE ................................................................................................................................................ 46
4.8 SUMMARY ............................................................................................................................................ 47
4.9 TYPICAL CHARACTERISTICS OF HDD AND SSD ................................................................................... 47
5 . PERFORMANCE: HDD VS SSD .................................................................................................................... 49
5.1 BENCHMARK ........................................................................................................................................ 49
5.2 BENCHMARK ENVIRONMENT ................................................................................................................ 50
5.3 TPC-H BENCHMARK ........................................................................................................................... 50
5.4 ENERGY EFFICIENCY TEST ................................................................................................................... 54
5.5 HD TUNE BENCHMARK ....................................................................................................................... 56
5.6 SUMMARY ............................................................................................................................................ 58
6 . BETTER INVESTMENT: SSD OR ADDITIONAL RAM? .................................................................................... 59
6.1 BENCHMARK ENVIRONMENT ............................................................................................................... 59
6.2 RESULTS .............................................................................................................................................. 60
6.3 CONCLUSION ....................................................................................................................................... 61
6.4 BENCHMARK PROBLEMS ...................................................................................................................... 62
7 . REVERSE ENGINEERING ............................................................................................................................. 63
7.1 INTEL X25-EXTREME ........................................................................................................................... 63
7.2 CRUCIAL REAL C300 ........................................................................................................................... 65
7.3 SUMMARY ............................................................................................................................................ 68
7.4 CONCLUSION ....................................................................................................................................... 69
8 . DESIGNING OPTIMAL PERFORMANCE BASED SSD SYSTEM LEVEL ARCHITECTURE AND ITS CONTROLLER COST ESTIMATION ......................................................................................................................... 70
8.1 COST ESTIMATION OF CONTROLLER FOR SYSTEM DESIGNED TO MEET PERFORMANCE SPECIFICATION . 70
8.2 IMPLEMENTATION FACTORS IN OPTIMIZATION TOOL ............................................................................ 71
8.3 OPTIMIZATION TOOL CONSISTENCY TEST FOR CONTROLLER SIZE ......................................................... 76
8.4 HINTS TO USE TOOL FOR OPTIMAL SYSTEM DESIGN AND CONTROLLER COST ESTIMATION ................... 77
9 . SUMMARY ................................................................................................................................................ 78
9.1 CONCLUSION ....................................................................................................................................... 78
9.2 FUTURE WORK ON SSD ........................................................................................................................ 79
APPENDIX A...............................................................................................................................................80
APPENDIX B...............................................................................................................................................83
List of Figures
Figure 1: View of personal computer system [25] _______________________________________________ 10
Figure 2: Interconnections of memory components ______________________________________________ 10
Figure 3: Forms of storage, divided according to their distance from the CPU [19] ____________________ 15
Figure 4: Memory hierarchy in comparison with Cost/MB, Size & Access speed [32] ___________________ 17
Figure 5: Memory controller hub ____________________________________________________________ 17
Figure 6 : Hard Disk Drive [27] ____________________________________________________________ 20
Figure 7 : Representations of sectors, blocks and tracks on platter surface [27] _______________________ 21
Figure 8 : Representation of Hard Disk Drive as blocks __________________________________________ 23
Figure 9 : Role of Cache buffer in Hard disk ___________________________________________________ 24
Figure 10 : Typical IDE/ATA ribbon cable and its socket on a motherboard [28] _________________________ 26
Figure 11: A single-drop 68-conductor SCSI ribbon cable [28] ____________________________________ 27
Figure 12: Close-ups of SATA cable and its slots on a motherboard [28] _____________________________ 28
Figure 13: A Seagate 1TB external hard drive [28] ______________________________________________ 29
Figure 14 Moving Parts in Hard Disk Drives [29] ______________________________________________ 30
Figure 15: Evolution in density of NAND flash memory __________________________________________ 33
Figure 16: HDD and SSD [30] _____________________________________________________________ 34
Figure 17: NAND flash memory chip [30] _____________________________________________________ 35
Figure 18: Flash memory overwrite mechanism ________________________________________________ 36
Figure 19 : A generic overview of a Flash memory bank [5] _______________________________________ 37
Figure 20 : Components of SSD _____________________________________________________________ 39
Figure 21: organization of conventional SSD __________________________________________________ 40
Figure 22 : Address translation in solid state drive [8] ___________________________________________ 41
Figure 23 : Internal structure of solid state drive [6] ____________________________________________ 42
Figure 24 : x4 PCI-Express card with NAND flash chips on it [31] ___________________________________ 44
Figure 25 : SSD Market development _________________________________________________________ 46
Figure 26 : TPC-H benchmark application outline ______________________________________________ 52
Figure 27 : TPC-H benchmark performance results _______________________________________________ 53
Figure 28: Comparison for energy efficiency ___________________________________________________ 55
Figure 29: Read speed comparison __________________________________________________________ 57
Figure 30: Access time comparison __________________________________________________________ 58
Figure 31: performance comparison between HDD with 12GB system RAM vs SSDs with 2GB system RAM _ 60
Figure 32: performance comparison between HDD with 2GB, 8GB, and 12GB system RAM _____________ 61
Figure 33: Intel X25 – Extreme SSD _________________________________________________________ 64
Figure 34: Controller from Marvell on Intel X25-E SSD board ____________________________________ 65
Figure 35: Crucial Real C300 SSD __________________________________________________________ 66
Figure 36: Controller from Marvell on Crucial Real C300 SSD board _______________________________ 67
Figure 37: Design tool outlook ______________________________________________________________ 74
Figure 38: Warning when the system is over-designed or under-designed with respect to the specified performance ____ 74
Figure 39: Cost calculation tool _____________________________________________________________ 75
Figure 40 : Process flow for flip chip BGA and wire bonded BGA packaging. ________________________ 73
Figure 41 : Controller size for the system with SATA 2.0 interface __________________________________ 76
Figure 42 : Controller size for the system with SATA 3.0 interface __________________________________ 76
Figure 43: procedure to create application package _____________________________________________ 81
List of Tables
Table 4-1 SLC vs MLC [9] .................................................................................................................................. 35
Table 5-1 Overview of drives in Benchmark environment .................................................................................... 50
Table 6-1 Overview of drives in Benchmark environment .................................................................................... 59
Table 7-1 Controller chip details of Intel X25- E and Crucial Real C300 SSD ................................................... 68
Table 7-2 Controller chip details of Intel X25- E and Crucial Real C300 SSD ................................................... 68
Table 7-3 Interface compatibility of Intel X25- E and Crucial Real C300 SSD .................................................. 69
Table 8-1 System Interface types and their performances .................................................................................... 71
Table 8-2 Buffer Cache types and their performances ......................................................................................... 71
Table 8-3 SSD Controller Interface signals.......................................................................................................... 72
Abbreviations:
Acronym Definition
BA Bank Address
BGA Ball Grid Array
CS Chip Select
CK Clock
CKE Clock Enable
CAS Column Address Strobe
CLK Clock
DRAM Dynamic Random Access Memory
DQ Data Bus
DQS Data Strobe
DM Data Mask
MA Memory Address
MLC Multi Level Cell
RST Reset
RAS Row Address Strobe
REF_CLK_P/N PCI Express Clock
SATA Serial ATA
SSD Solid State Drive
SLC Single Level Cell
PATA Parallel ATA
PET_P/N PCI Express differential signal
HDD Hard Disk Drive
ODT On-Die Termination
UART Universal Asynchronous Receiver Transmitter
WE Write Enable
Chapter 1
1 . Introduction to Memory Storage Drives
1.1 As it exists today
Fifty years after the first commercial drive, the disk drive remains the prevailing storage
medium in almost every computer system. Surprisingly, despite the technological
improvements in storage capacity and operational performance, modern disk drives are still
based on the same physical and electromagnetic principles. With rapidly changing
technologies and innovations, electronic storage devices such as computer hard disks are
becoming ever more sophisticated in both design and performance. Even though traditional
hard disk drives (HDD) are threatened to a certain extent by flash-based storage devices, they
are still the most popular form of storage for computing today. Hard drives are used in
everything from servers to desktops and notebooks, and offer higher storage capacities than
their flash-based competitors.
1.2 Solid State Drives – A brief overview
Solid state drives currently cost significantly more per unit capacity than their rotating
counterparts, but there are numerous applications where they can be applied to great benefit.
For example, in transaction-processing systems, disk capacity is often wasted in order to
improve operation throughput: many small (cost-inefficient) rotating disks are deployed
purely to increase I/O parallelism. Large SSDs, suitably optimized for random read and write
performance, could effectively replace whole farms of slow, rotating disks. Small SSDs are
already starting to appear in laptops because of their reduced power profile and their
reliability (for example, shock resistance) in portable environments. As the cost of flash
continues to decline, the potential application space for solid state drives will certainly
continue to grow. Solid state drives are among the most talked-about storage devices on the
electronic hardware market today, and it is worth examining what makes them so special.
1.3 Thesis Objective
The objectives of this thesis are:
Performance evaluation of hard disk drives and solid state drives through an extensive
comparison followed by benchmarking,
Analysis of the architecture of an SSD controller by means of reverse engineering,
Finally, the development of a tool which suggests the most optimal and cost-efficient
system-level SSD architecture based on a selected interface.
1.4 Summary of Chapters
Chapter 2 gives an overview of memory design architectures in traditional computer systems
and a glance at the storage hierarchy.
Chapter 3 goes into detail about hard disk drives, covering their architecture, physical
structure and operation, followed by a discussion of the different types of host interfaces
used by hard disk drives today.
Chapter 4 is dedicated to flash memory technology and gives a taste of solid state drive
technology: its architecture, operation, and advantages over its counterpart, the hard disk
drive.
In Chapter 5, the performance of magnetic disks and SSDs is analysed in different scenarios.
The aim is to identify the main characteristics and to point out possible weaknesses, along
with solutions to these drawbacks.
Chapter 6 gives insights into deciding between an SSD and additional RAM for performance
enhancements.
Chapter 7 covers the reverse engineering of solid state drive controllers and their structure
at package level. Here, the different factors responsible for the varied performance of solid
state drives are listed.
Chapter 8 describes the tool that was developed, which suggests a system-level solid state
drive architecture for optimal performance, with an estimated controller cost based on the
selected host interface.
Chapter 9 summarizes the results and discusses the future of solid state drives.
Chapter 2
2 . System Memory Overview
2.1 System Architecture
The system architecture determines the main hardware components that make up the physical
computer system and the way in which they are interconnected. The main components
required for a computer system are listed below:
Central processing unit (CPU),
Random access memory (RAM),
Read-only memory (ROM),
Input / output (I/O) ports,
The system bus,
A power supply unit (PSU).
In addition to these core components, further components are required to extend the
functionality of the system and to provide a computing environment with which a human
operator can interact more easily. These could include:
Secondary storage devices (e.g. disk drives),
Input devices (e.g. keyboard, mouse, scanner)
Output devices (e.g. display adapter, monitor, printer)
The core system components are mounted on a backplane, more commonly referred to as a
mainboard (or motherboard). The mainboard is a relatively large printed circuit board that
provides the electronic channels (buses) that carry data and control signals between the
various components, as well as the necessary interfaces (in the form of slots or sockets) to
allow the CPU, Memory cards and other components to be plugged into the system. In most
cases, the ROM chip is built into the mainboard, and the CPU and RAM must be compatible
with the mainboard in terms of their physical format and electronic configuration. Internal
I/O ports are provided on the mainboard for devices such as internal disk drives and optical
drives.
Figure 1: View of personal computer system [25]
The relationship between the elements that make up the core of the system is illustrated
below.
Figure 2: Interconnections of memory components
The data flows back and forth between the processor and the memory over shared electrical
conduits called buses which carry address, data, and control signals. Depending on the
particular bus design, data and address signals can share the same set of wires, or they can
use different sets.
External I/O ports are also provided on the mainboard to enable the system to be connected to
external peripheral devices such as the keyboard, mouse, video display unit, and audio
speakers. Both the video adapter and audio card may be provided on-board (i.e. built into the
mainboard), or as separate plug-in circuit boards that are mounted in an appropriate slot on
the mainboard. The mainboard also provides much of the control circuitry required by the
various system components, allowing the CPU to concentrate on its main role, which is to
execute programs. Memory is the most important integral part of a computational system. In
this chapter, the focus is on memory organization as a clear understanding of these ideas is
vital for the analysis of system performance.
2.2 Memory
Memory lies at the heart of the stored-program computer. The system memory is the place
where the computer holds current programs and data that are in use. Although memory is
used in many different forms in modern PC systems, it can be divided into two essential
types:
Read-only memory (ROM)
Random access memory (RAM)
2.2.1 ROM
ROM refers to non-volatile memory, which means that it retains data even after power is
removed; in fact, it needs very little charge to retain its contents. It is used to store permanent
or semi-permanent data that persists even while the system is turned off, typically small
start-up programs such as the BIOS, which is used to bootstrap the computer. There are
several extended types of ROM, namely:
PROM: Programmable ROM
EPROM: Erasable PROM
EEPROM: Electrically Erasable PROM (Flash)
Flash memory is essentially EEPROM with the added benefit that data can be written or
erased in blocks, removing the one-byte-at-a-time limitation. This makes flash memory faster
than conventional EEPROM.
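The block-oriented behaviour described above can be sketched in a few lines. The following toy model is purely illustrative (the class name, block size and values are invented, and real NAND blocks are far larger); it captures the key asymmetry of flash: programming can only clear bits, while setting bits again requires erasing a whole block at once.

```python
BLOCK_SIZE = 4  # bytes per erase block (tiny, for illustration only)

class FlashBlock:
    def __init__(self):
        self.cells = [0xFF] * BLOCK_SIZE  # erased state: all bits set to 1

    def program(self, offset, value):
        # Programming can only clear bits (bitwise AND), never set them.
        self.cells[offset] &= value

    def erase(self):
        # Erase resets the entire block in one operation.
        self.cells = [0xFF] * BLOCK_SIZE

blk = FlashBlock()
blk.program(0, 0b10101010)
assert blk.cells[0] == 0b10101010
blk.program(0, 0b11110000)          # further programming only clears bits
assert blk.cells[0] == 0b10100000   # 10101010 AND 11110000
blk.erase()                         # setting bits back to 1 needs a block erase
assert blk.cells[0] == 0xFF
```

Because the erase unit is the whole block, flash controllers copy any still-valid data elsewhere before erasing, a point that becomes important for the flash translation layer discussed in Chapter 4.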
2.2.2 RAM
RAM refers to volatile memory which means that the data is lost once the power is turned
off. There are two types of RAM, Static RAM (SRAM) and Dynamic RAM (DRAM).
SRAM: It consists of circuits similar to the D flip-flop and therefore does not need to be
refreshed, unlike its counterpart, DRAM. SRAM is faster and much more expensive than
DRAM and is used to build cache memory.
DRAM: It stores each bit of data in a separate capacitor within an integrated circuit. The
capacitor can be either charged or discharged; these two states are taken to represent the two
values of a bit, conventionally called '0' and '1'. Capacitors leak the charge stored in them
slowly over time and thus must be refreshed every few milliseconds to prevent data loss.
DRAM is "cheap" memory owing to its simple design compared to SRAM. Designers use
DRAM because it is much denser, uses less power and generates less heat than SRAM. For
these reasons, DRAMs are preferred over SRAMs for building the main memory.
There are many kinds of DRAM memories and new kinds appear in the market with
regularity as manufacturers attempt to keep up with rapidly increasing processor speeds. Each
design is based on the conventional DRAM cell, with optimizations that improve the speed
with which the basic DRAM cells can be accessed.
Synchronous DRAM (SDRAM)
SDRAM has a synchronous interface, meaning that it waits for a clock signal before
responding to control inputs and is therefore synchronized with the computer's system
bus. The clock is used to drive an internal finite state machine that pipelines incoming
instructions. This allows the chip to have a more complex pattern of operation, enabling
higher speeds.
Double Data Rate SDRAM (DDR SDRAM)
DDR SDRAM has the same working principle. The difference is that DDR SDRAM
doubles the bandwidth by double-pumping (transferring data on both the rising and the
falling edge of the clock signal) without increasing the clock frequency.
DDR2 SDRAM
DDR2 is the next generation of memory developed after DDR. DDR2 increased the data
transfer rate (the bandwidth) by raising the operational frequency to match high FSB
frequencies and by doubling the pre-fetch buffer depth. Like DDR SDRAM, DDR2 transfers
data on both the rising and the falling edge of the clock signal. The trade-off is that internal
operations are carried out at only half the clock rate.
DDR3 SDRAM
DDR3 is the successor to DDR2. DDR3 increased the pre-fetch buffer size to 8 bits and
raised the operating frequency once again, resulting in higher data transfer rates than its
predecessor DDR2. Like DDR2 SDRAM, DDR3 transfers data on both the rising and the
falling edge of the clock signal, although internal operations are limited to only a quarter
of the clock rate.
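The effect of double-pumping and bus width on peak bandwidth is simple arithmetic, as the following sketch shows. The frequencies used are nominal examples, not exact module specifications.

```python
def peak_bandwidth_mb_s(bus_clock_mhz, transfers_per_clock, bus_width_bits):
    """Peak rate in MB/s: clock rate * transfers per clock * bytes per transfer."""
    return bus_clock_mhz * transfers_per_clock * (bus_width_bits // 8)

# Single data rate: one transfer per clock (rising edge only).
sdr = peak_bandwidth_mb_s(133, 1, 64)   # e.g. a PC133 SDRAM module
# DDR: double-pumped, two transfers per clock at the same bus frequency.
ddr = peak_bandwidth_mb_s(133, 2, 64)   # e.g. DDR-266
assert ddr == 2 * sdr  # doubling transfers per clock doubles peak bandwidth
```

The same formula explains the DDR2/DDR3 generations: they keep two transfers per clock edge pair but raise the bus clock, while the internal array runs at a half or a quarter of it.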
Rambus DRAM (RDRAM)
This is an alternative proprietary technology with a higher maximum bandwidth than
DDR SDRAM. Compared to other contemporary standards, Rambus shows a slight
increase in latency, heat output, manufacturing complexity, and cost.
Video RAM (VRAM)
VRAM has two ports namely, DRAM port and video port. The second port, the video
port, is typically read-only and is dedicated to providing a high bandwidth data channel
for the graphics chipset. This is used in the frame buffers of graphics systems.
2.3 Storage Hierarchy
Storage hierarchy refers to the different types of memory devices and equipment configured
into an operational computer system to provide the necessary attributes of storage capacity,
speed, access time, and cost to make a cost-effective practical system.
In practice, almost all computers use a variety of memory types, organized in a storage
hierarchy around the CPU, as a trade-off between performance and cost. Generally, the lower
a storage technology sits in the hierarchy, the lower its bandwidth and the greater its access
latency from the CPU. The traditional division of storage into primary, secondary, tertiary
and off-line storage is also guided by cost per bit.
2.3.1 Primary storage
Primary storage, commonly referred to as main memory, is the memory which is
directly accessible to the CPU. The CPU continuously reads instructions stored there
and executes them as required. Any data actively operated on is also stored there in
a uniform manner.
Besides the main large-capacity RAM, primary storage includes two additional sub-layers,
namely processor registers and processor cache, as shown in Figure 3.
Processor registers are located inside the processor. Each register typically holds a
word of data (often 32 or 64 bits). CPU instructions instruct the arithmetic and logic
unit to perform various calculations or other operations on this data (or with the help
of it). Registers are the fastest of all forms of computer data storage.
Processor cache is an intermediate stage between the ultra-fast registers and the much
slower main memory. It is introduced solely to increase the performance of the computer.
The most actively used information in main memory is duplicated in the cache, which is
faster but of much smaller capacity; conversely, the cache is much slower but much larger
than the processor registers. A multi-level hierarchical cache setup is also commonly used,
with the primary cache being the smallest and fastest and located inside the processor, and
the secondary cache being somewhat larger and slower.
The secondary cache is the L2 cache, traditionally contained on the motherboard. However,
more and more chip makers are putting this cache on the processor die itself. The benefit is
that it then runs at the same speed as the processor, and it costs less to put it on the chip than
to set up a bus and logic external to the processor.
The hierarchy continues with the L3 cache. This cache used to be the L2 cache on the
motherboard, but now that some processors include L1 and L2 cache on the chip, it becomes
the L3 cache. It usually runs slower than the processor, but faster than main memory.
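The benefit of placing a cache between the registers and main memory can be quantified with the standard average-memory-access-time formula. The latencies and miss rate below are hypothetical round numbers chosen only to illustrate the calculation.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: cost of a hit plus the weighted miss cost."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical latencies in nanoseconds: cache hit 1 ns, main memory 100 ns.
no_cache = 100.0
with_cache = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0)
assert with_cache < no_cache  # roughly 6 ns with a 95% hit rate
```

Even a modest hit rate therefore hides most of main memory's latency, which is why multi-level caches pay off despite their small capacity.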
The main random-access memory itself is small in size, but quite expensive at the same
time. (The particular types of RAM used for primary storage are also volatile, i.e. they lose
the information when not powered.)
Main memory is directly or indirectly connected to the central processing unit via a
memory bus, which is actually two buses: an address bus and a data bus. The CPU first
sends a memory address that indicates the desired location of data, and then reads or
writes the data itself using the data bus. Additionally, a memory management unit
(MMU), a small device between the CPU and RAM, recalculates the actual memory
address, for example to provide an abstraction of virtual memory or to perform other tasks.
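The MMU's address recalculation can be sketched as a simple page-table lookup. The page size and table contents below are invented for illustration; a real MMU uses hardware page-table walks and a TLB.

```python
PAGE_SIZE = 4096  # hypothetical 4 KiB pages

def translate(virtual_addr, page_table):
    """Minimal MMU sketch: split the virtual address into page number and
    offset, look up the physical frame, and recombine them."""
    page = virtual_addr // PAGE_SIZE
    offset = virtual_addr % PAGE_SIZE
    frame = page_table[page]          # a missing entry would be a page fault
    return frame * PAGE_SIZE + offset

page_table = {0: 7, 1: 3}             # virtual page -> physical frame
assert translate(0x0010, page_table) == 7 * PAGE_SIZE + 0x0010
assert translate(PAGE_SIZE + 5, page_table) == 3 * PAGE_SIZE + 5
```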
Figure 3: Forms of storage, divided according to their distance from the CPU [19]
2.3.2 Secondary Storage
Secondary storage is also known as external memory or auxiliary storage. The term
'secondary' refers to the inability of the CPU to access it directly: data in secondary
storage must first be transferred into primary storage before the CPU can operate on it,
and the computer accesses its secondary storage through its various input/output
channels. Secondary storage is non-volatile, which means it does not lose the data
when the device is powered down. Per unit, it is typically also two orders of
magnitude less expensive than primary storage. Consequently, modern computer
systems typically have two orders of magnitude more secondary storage than primary
storage, and data is kept there for a longer time. In modern computers, hard disk drives
are commonly used as secondary storage.
2.3.3 Offline Storage
Offline storage is where removable storage media reside, such as tape cartridges and
optical discs (CDs and DVDs). Offline storage can be used to transfer data between
systems, but it also allows data to be secured off-site so that companies always have a
copy of valuable data in the event of a disaster.
2.3.4 Tertiary Storage
Tertiary storage is mainly used for backup and archival of data. Although based on
the slowest devices, it can be classed as the most important tier in terms of protecting
data against the variety of disasters that can affect an IT infrastructure. Most devices
in this segment are automated via robotics and software to reduce management costs
and the risk of human error, and they consist primarily of disk- and tape-based backup
devices.
Figure 4: Memory hierarchy in comparison with Cost/MB, Size & Access speed [32]
*The values are approximated for illustration
2.4 Memory Controller
The memory controller is a digital circuit which manages the flow of data going to and
from the main memory. It can be a separate chip or integrated into another chip, such as
on the die of a microprocessor. This is also called a Memory Chip Controller (MCC).
Figure 5: Memory controller hub
The memory controller scans for the type and speed of the RAM connected. It also
determines the maximum size of each individual memory module and the overall memory
capacity of the system.
Memory controllers contain the logic necessary to read, write and refresh the main
memory. Taking DRAM as an example, reading and writing are performed by presenting
the row and column addresses of the target cell in turn; a demultiplexer on the DRAM chip
uses these inputs to select the correct memory location and return the data, which is then
passed back through a multiplexer that consolidates the data in order to reduce the bus
width required for the operation.
Bus width is the number of parallel lines available to communicate with the memory cell.
Memory controller bus widths range from 8 to 64-bit. In complicated systems, memory
controllers are operated in parallel such as a four 64-bit bus operating in parallel, though
some are designed to operate in "gang mode" where two 64-bit memory controllers can be
used to access a 128-bit memory device.
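As a concrete illustration of multiplexed addressing, the sketch below splits a flat address into the row and column halves that a controller would drive over a shared address bus. The 16-bit address and the 8-bit row/column split are illustrative assumptions, not taken from any specific part.

```python
# Sketch: how a memory controller multiplexes row and column addresses
# onto a shared address bus (hypothetical 16-bit address, 8-bit halves).

def split_address(addr: int, col_bits: int = 8):
    """Split a flat address into (row, column), as a DRAM controller
    does before driving the multiplexed address bus."""
    row = addr >> col_bits               # upper bits select the row
    col = addr & ((1 << col_bits) - 1)   # lower bits select the column
    return row, col

def join_address(row: int, col: int, col_bits: int = 8) -> int:
    """Inverse operation, performed conceptually on the DRAM side."""
    return (row << col_bits) | col

row, col = split_address(0xABCD)
assert (row, col) == (0xAB, 0xCD)
assert join_address(row, col) == 0xABCD
```

Halving the number of address pins in this way is the reason DRAM accepts its address in two strobes (row, then column) rather than all at once.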
2.5 Summary
In this chapter, an introduction to the different memory technologies in a computer system was given. The system memory hierarchy was analysed closely, giving a much better idea of how to choose a storage technology suited to the type and size of the data. The coming chapters give more insight into the technologies currently used for secondary storage. The next chapter focuses on the current state of secondary storage, represented by magnetic disks, also called hard disk drives.
Chapter 3
3 . Magnetic Disk Storage
As computing capacity increases, so does the need for secondary storage. The most important device of this class is the Hard Disk Drive (HDD), which stores information permanently based on magnetic principles. HDDs have a cost per byte at least two orders of magnitude smaller than that of DRAM, making them suitable for storing vast amounts of data. Hence, hard disk drives are used as secondary memory in most computer systems.
This chapter gives an insight into the current state of disk storage, represented by magnetic disks, beginning in section 3.1.
3.1 Hard Disk Drives
The hard disk drive is by far the most common secondary storage device in use today. Having been in use for half a century, hard disk drives are today considered very mature and have seen many major improvements. Hard Disk Drives (HDD) are storage devices containing one or more rotating platters made of a non-magnetic material and coated with a thin layer of magnetic material. Small sections of this material are manipulated into different magnetic states, making it possible to store data. Magnetic disks have shown a great ability to scale capacity and continue to do so today. An internal view of a magnetic disk can be seen in Figure 6.
3.1.1 Physical layout
Hard disk drives are so called because of the rotating magnetic platters inside them which are used for storage. The platters often use both sides for storage, and each surface is divided into tracks and sectors. The intersection of a single sector and a single track makes up a block. As seen in Figure 7, tracks on the outer part of the platter are made up of more sectors. This is because the outer surface passes faster under the disk head, and the greater surface area allows more data to be stored in these tracks. These different sections are called zones. To get a higher data capacity in a disk, several platters are stacked on a common spindle. The disk arm has a separate head for each surface and is thus able to write to more sectors without seeking to a different track. The same track across all surfaces is called a cylinder. Cylinders make it possible to speed up read and write operations, as the disk arm can operate on multiple surfaces without needing to move to a different position.
Figure 6 : Hard Disk Drive [27]
3.1.2 Working Principle
The platters are made from a non-magnetic material and are coated with a thin layer of
magnetic material. Read-and-write heads are positioned on top of the disks. The platters are
spun at very high speeds with a motor. A typical hard drive has two electric motors, one to
spin the disks and one to position the read/write head assembly. Information is examined or
altered on the platter as it rotates past the read/write heads. The read-and-write head can
detect and modify the magnetization of the ferromagnetic material immediately under it.
3.1.3 Disk access time
Disk access time in magnetic disks is made up of three different operations. The time each operation takes varies with the position of the disk head, the rotational position of the disk surface, and the physical abilities of the disk. Disks read and write data in sector-sized blocks. The access time for a sector has three main components:
I. Seek time: To read the contents of some target sector, the arm first positions the head over the track that contains the target sector. The time required to move the arm is called the seek time. The seek time, Tseek, depends on the previous position of the head and the speed at which the arm moves across the surface. The average seek time in modern drives, Tavg seek, is measured by taking the mean of several thousand seeks to random sectors.
II. Rotational latency: this depends on rotational speed of the disk (RPM). Once the
head is in position over the track, the drive waits for the first bit of the target sector to
pass under the head. The performance of this step depends on both the position of the
surface when the head arrives at the target sector and the rotational speed of the disk.
In the worst case, the head just misses the target sector and waits for the disk to make a full rotation. Thus, the maximum rotational latency (in seconds) is given by
TMax rotation = (1/RPM) x (60 s / 1 min)
and the average rotational latency is roughly half of this, TAvg rotation = (1/2) x TMax rotation.
Figure 7 : Representations of sectors, blocks and tracks on platter surface [27]
III. Transfer time: When the first bit of the target sector is under the head, the drive can begin to read or write the contents of the sector. The transfer time for one sector depends on the rotational speed and the number of sectors per track. Thus, the average transfer time for one sector can be roughly estimated as
TAvg transfer = (1/RPM) x (1 / average number of sectors per track) x (60 s / 1 min)
With these characteristics, the seek time and rotational delay become a significant part of a random read or write operation. For sequential operations, the disk is able to work on entire tracks/cylinders at a time, continuing with neighbouring tracks/cylinders. Because of the short physical distance between the locations of the data, sequential reads minimize the time spent on seeks, resulting in an overall lower access time for the data.
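Putting the three components together, a back-of-the-envelope access-time estimate can be sketched as follows. The drive parameters used (7200 RPM, 9 ms average seek, 400 sectors per track) are illustrative assumptions, not the figures of any specific drive.

```python
def avg_access_time_ms(rpm: float, avg_seek_ms: float,
                       sectors_per_track: int) -> float:
    """Estimate average access time for one random sector, in ms:
    Tseek + Tavg_rotation + Tavg_transfer."""
    t_full_rotation = 60.0 / rpm * 1000.0       # one rotation, in ms
    t_avg_rotation = t_full_rotation / 2        # wait half a turn on average
    t_transfer = t_full_rotation / sectors_per_track  # one sector passes by
    return avg_seek_ms + t_avg_rotation + t_transfer

# Illustrative 7200 RPM drive: ~9 ms seek + ~4.17 ms rotation + ~0.02 ms
# transfer, so seek and rotation dominate a random access.
t = avg_access_time_ms(7200, 9.0, 400)
```

The numbers make the point of the paragraph above concrete: the mechanical components (seek and rotation) are orders of magnitude larger than the transfer time itself.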
3.1.4 Addressing
The location of a specific sector is referenced using its cylinder number, head number and
sector number (this addressing scheme is often abbreviated to CHS). Indeed, the total number
of sectors on the drive could be calculated by multiplying the number of cylinders by the
number of read/write heads, and then multiplying the result by the number of sectors per
track. Since the introduction of zoned bit recording (as mentioned above, this is a drive
geometry in which the number of sectors per track is smaller at the centre of the disk) this
calculation can no longer be used. The way in which sectors are addressed has also become
more abstract, relieving the operating system software of the need to know about physical
drive geometry. Note that sectors that are logically sequential are not necessarily physically
contiguous. After reading a sector, there may be a small delay before the drive controller is
ready to read another sector. Sectors that are logically sequential may therefore be spaced at
discrete intervals on the disk to give the drive controller time to get ready to read the next
sector - a technique known as interleaving. If an interleave factor of 3:1 were used for
example, it would take three full rotations for the controller to read all of the sectors on a
single track. Thanks to advances in technology, most modern hard drives do not need to use
interleaving.
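The interleaving scheme described above can be sketched as a simple layout computation. `interleaved_layout` is a hypothetical helper for illustration; real controllers of that era computed such maps in firmware.

```python
def interleaved_layout(num_sectors: int, factor: int) -> list:
    """Return which logical sector occupies each physical slot on a
    track, for a given interleave factor (3:1 -> factor 3)."""
    layout = [None] * num_sectors
    slot = 0
    for logical in range(num_sectors):
        while layout[slot] is not None:        # skip slots already used
            slot = (slot + 1) % num_sectors
        layout[slot] = logical
        slot = (slot + factor) % num_sectors   # leave a gap of factor-1 slots
    return layout

# 9 sectors at 3:1 interleave: reading the track in physical order yields
# logical sectors 0,3,6,1,4,7,... so three rotations cover 0..8 in order.
print(interleaved_layout(9, 3))
```

With factor 1 the layout degenerates to the identity mapping, which is what modern drives use, since their controllers keep up with the platter.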
Modern hard disk drives use logical block addressing (LBA), a simple linear addressing scheme in which each sector is given an integer index number, starting with 0. The drive controller translates each logical block address into a cylinder, head and sector number in order to obtain the physical location of the sector on disk. The maximum number of sectors that can be addressed depends on the number of bits used for the logical block address.
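A minimal sketch of the CHS-to-LBA translation described above, assuming the classic fixed geometry with a constant number of sectors per track (i.e. before zoned bit recording); the function names are illustrative.

```python
def chs_to_lba(c: int, h: int, s: int,
               heads: int, sectors_per_track: int) -> int:
    """Flatten a (cylinder, head, sector) triple to a linear index.
    CHS sector numbers traditionally start at 1, hence the s - 1."""
    return (c * heads + h) * sectors_per_track + (s - 1)

def lba_to_chs(lba: int, heads: int, sectors_per_track: int):
    """Inverse translation, as performed by the drive controller."""
    c, rem = divmod(lba, heads * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1

# Example geometry (16 heads, 63 sectors/track): a round trip is exact.
lba = chs_to_lba(2, 3, 4, heads=16, sectors_per_track=63)
```

The total sector count mentioned above (cylinders x heads x sectors per track) is exactly the range of valid LBA values under this scheme.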
3.2 Hard disk drive system architecture
The system-level design of a hard disk drive is characterized by the use of a few highly-integrated chips; their interconnection is represented as a block diagram in Figure 8.
As one can see in the figure, the whole layout is based upon the chips below:
- a system controller chip including the read/write channel, disk controller and RISC control processor (microcontroller),
- a Flash ROM chip containing the drive firmware,
- a RAM chip used as a cache buffer.
Figure 8 : Representation of Hard Disk Drive as blocks
The disk controller is the most complicated drive component and determines the speed of data exchange between the HDD and the host.
The disk controller has four ports, used for connection to the host, the microcontroller, the buffer RAM, and the data exchange channel to the head disk assembly. The disk controller is an automatic device driven by the microcontroller; from the host side only the standard task file registers are accessible. The disk controller is programmed by the microcontroller at the initialization stage; during this procedure it sets up the data encoding methods, selects the polynomial used for error correction, defines flexible or hard partitioning into sectors, and so on.
The buffer manager is a functional part of the disk controller governing the operation of the buffer RAM, referred to as the cache. In modern HDDs the capacity of this cache ranges from 512KB to 16MB. The buffer manager splits the whole buffer RAM into separate sectioned buffers. Special registers accessible from the microcontroller contain the initial addresses of those sectioned buffers. While the host exchanges data with one of the buffers, the read/write channel can exchange data with another buffer section. Thus the system achieves multi-sequencing between the processes of reading/writing data from/to the disk and exchanging data with the host.
3.2.1 Hard disk drive cache
Hard disk drives contain an integrated cache, also often called a buffer. The purpose of this cache is not dissimilar to other caches used in the PC, even though it is not normally thought of as part of the regular PC cache hierarchy. The function of the cache is to act as a buffer between a relatively fast device and a relatively slow one. For hard disks, the cache is used to hold the results of recent reads from the disk and also to 'pre-fetch' information that is likely to be requested in the near future, for example, the sector or sectors immediately after the one just requested.
Figure 9 : Role of Cache buffer in Hard disk
The basic principle behind the operation of a simple cache is straightforward. Reading data
from the hard disk is generally done in blocks of various sizes, not just one 512-byte sector at
a time. The cache is broken into segments, or pieces, each of which can contain one block of
data. When a request is made for data from the hard disk, the cache circuitry is first queried to
see if the data is present in any of the segments of the cache. If it is present, it is supplied to
the logic board without access to the hard disk's platters being necessary. If the data is not in the cache, it is read from the hard disk, supplied to the controller, and then placed into the cache in case it is asked for again. Since the cache is limited in size, only so many pieces of data can be held before the segments must be recycled. Typically the oldest piece of data is replaced with the newest one. This is called circular, first-in first-out (FIFO) or wrap-around caching.
The use of a cache improves the performance of any hard disk by reducing the number of physical accesses to the disk on repeated reads and by allowing data to stream from the disk uninterrupted when the bus is busy.
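The circular FIFO caching described above can be sketched as follows; this is a toy model of the behaviour, not the firmware of any real drive.

```python
from collections import OrderedDict

class FifoDiskCache:
    """Toy model of a drive's segmented cache: a fixed number of
    segments, with the oldest entry recycled first (wrap-around)."""
    def __init__(self, segments: int):
        self.segments = segments
        self.store = OrderedDict()   # block_id -> data, insertion-ordered

    def read(self, block_id, read_from_platter):
        if block_id in self.store:            # hit: no platter access
            return self.store[block_id]
        data = read_from_platter(block_id)    # miss: go to the platters
        if len(self.store) >= self.segments:
            self.store.popitem(last=False)    # recycle oldest segment
        self.store[block_id] = data
        return data

# Count physical accesses to see the cache absorbing a repeated read.
accesses = []
def read_from_platter(block_id):
    accesses.append(block_id)                 # record a physical access
    return "data%d" % block_id

cache = FifoDiskCache(segments=2)
cache.read(1, read_from_platter)   # miss
cache.read(1, read_from_platter)   # hit, no platter access
cache.read(2, read_from_platter)   # miss
cache.read(3, read_from_platter)   # miss: block 1 is recycled
```

Note the FIFO policy ignores recency of use, which is why it is cheap enough for drive firmware; a hit after eviction (reading block 1 again here) goes back to the platters.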
3.3 Hard Disk Drive Interfaces
The host interface, also called the drive interface, defines the characteristics of the electronic interface between the disk drive and the computer. The type of interface used will to a great extent depend on the purpose for which the computer is to be used, and on the type of interface(s) supported by the system motherboard. A number of different interfaces have been developed over the years, some of which are described below.
3.3.1 Advanced Technology Attachment (ATA)
ATA has in the past been somewhat incorrectly referred to as Integrated Drive Electronics
(IDE) and has been retrospectively renamed as Parallel ATA (PATA) to distinguish it from
the more recent Serial ATA (SATA) interface. The use of the popular IDE misnomer comes
from the fact that this interface was the first in widespread use to have the drive controller
built into the drive itself. Previously, the drive controller was a separate add-on card that
occupied one of the ISA slots on the computer's motherboard. The drive was connected to the
motherboard using a 40 or 80-conductor ribbon cable that connected a 40-pin socket on the
drive itself to a similar socket on the motherboard and transferred sixteen bits of data in parallel. Each ribbon cable could connect two ATA drives in a master-slave configuration.
Enhanced IDE, introduced in anticipation of changes to the ATA standard, allowed the use of
direct memory access (DMA) which meant that data could be transferred directly between the
disk and memory without involving the CPU in the data transfer process. This freed up the
CPU for other tasks.
Figure 10 : Typical IDE/ATA ribbon cable and its socket on a motherboard [28]
3.3.2 Small Computer System Interface (SCSI)
SCSI disk and tape drives were standard fare on servers and high-performance workstations
and despite advances in ATA technology can still be found in many high-performance server
applications. SCSI can be used to connect a wide range of devices, and the SCSI standard
defines command sets for many specific types of peripheral device. The SCSI interface
allows a maximum of either 8 or 16 peripheral devices to connect to the host computer via a
shared parallel bus.
Servers typically employ RAID drives in which multiple disks are connected to a SCSI RAID
controller card via a SCSI backplane inside a disk enclosure. The connection between the
backplane and the controller card will typically be a 68 or 80-conductor single drop ribbon
cable. Multiple non-RAID devices could also be connected to a SCSI controller card using
multi-drop cables. SCSI drives have not been widely adopted for personal computers due to
their cost, and the availability of relatively inexpensive ATA drives that provide perfectly
adequate performance for most desktop computing environments. SCSI controller cards are
nonetheless still available for personal computers, and can be mounted in a standard PCI-X or PCI-E expansion slot. Parallel SCSI has largely been superseded in server and mass storage
applications by Fibre Channel (FC) or Serially Attached SCSI (SAS), both of which use a
high-speed serial interface.
Figure 11: A single-drop 68-conductor SCSI ribbon cable [28]
3.3.3 Serial Advanced Technology Attachment (SATA)
SATA is the successor to Parallel ATA. One of the most obvious differences is the use of a
high-speed serial signal cable instead of the parallel ribbon cable used for ATA drives. It has
two pairs of wires for carrying data and 3 ground wires, giving a total of seven wires. The
cable is cheaper and less bulky than its PATA counterpart, allowing a better flow of air
within the system case and making it easier to install. A SATA signal cable connects a single
drive to a SATA socket on the motherboard - there is no master/slave arrangement. SATA
drives use a 15-pin power connector rather than the 4-pin Molex power connectors used for
PATA drives, although adapters are available to enable a SATA drive to be connected to a
power supply via a 4-pin Molex power cable should the need arise.
The first version of the SATA standard is officially designated as Serial ATA International
Organization: Serial ATA Revision 1.0 (the technology itself should be referred to as SATA
1.5 Gbps) and specifies a gross transfer rate of 1.5 gigabits per second. Taking encoding into
account, this equates to 1.2 gigabits (150 megabytes) of data per second. Subsequent revisions have
doubled and redoubled the transfer rates. Revision 2.0 (SATA 3.0 Gbps) is capable of a gross
transfer rate of 3.0 gigabits per second, and Revision 3.0 (SATA 6.0 Gbps) has a gross
transfer rate of 6.0 gigabits per second. As of 2010, most installed hard drives and PC
chipsets implement SATA 3.0 Gb/s, although SATA 6.0 Gbps products are now becoming
available (the Version 3.0 standard was released in May 2010). Most motherboards produced since 2003 have integrated SATA controllers (although an add-on controller card can be
installed in a PCI or PCI-E slot). The SATA controller can use the Advanced Host Controller Interface (AHCI) in order to take advantage of advanced features such as the hot-swapping of drives, provided that both the motherboard and the operating system support AHCI. If not, SATA controllers are capable of operating in "IDE emulation" mode.
Figure 12: Close-ups of SATA cable and its slots on a motherboard [28]
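The arithmetic behind the quoted rates can be sketched as follows, assuming SATA's 8b/10b line encoding, in which every 8 data bits travel as 10 line bits.

```python
def sata_effective_mb_per_s(line_rate_gbps: float) -> float:
    """Effective payload rate of a SATA link: 8b/10b encoding carries
    8 data bits in every 10 line bits, and 8 bits make one byte."""
    data_bits_per_s = line_rate_gbps * 1e9 * 8 / 10   # strip coding overhead
    return data_bits_per_s / 8 / 1e6                  # bits -> megabytes

# 1.5 Gbps -> 150 MB/s, 3.0 Gbps -> 300 MB/s, 6.0 Gbps -> 600 MB/s
rates = [sata_effective_mb_per_s(g) for g in (1.5, 3.0, 6.0)]
```

This is why a "1.5 gigabit" link delivers only 150 megabytes per second of payload: a quarter of the raw line rate in bytes is lost neither to the drive nor the cable, but to the encoding itself (2 of every 10 bits).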
3.4 External Hard Disk Drives
External hard disk drives are generally standard ATA, SCSI or SATA hard disk drives
mounted in a suitable portable disk enclosure. The drive can be connected to a computer via a
Universal Serial Bus (USB) or Firewire port, or in the case of SATA drives via an eSATA
(external SATA) or eSATAp (power over eSATA) interface. If an eSATA or eSATAp port is
not available on the system, one can usually be added using a PCI add-on card. The use of an
eSATA interface has the advantage that data transfer rates are generally faster than for
contemporary versions of either USB or Firewire. Having said that, a future iteration of
Firewire is predicted to be able to achieve a data transfer rate of 6.4 Gbps, which will be
slightly faster than the SATA 6.0 Gbps version of eSATA, while USB 3.0 will not be far
behind with a data transfer rate of 4.8 Gbps. Unlike USB or Firewire however, eSATA makes low-level drive features such as SMART (Self-Monitoring, Analysis, and Reporting Technology) available to the host. Unlike Firewire, neither USB 2.0 nor eSATA is capable of providing the 12V power supply required by some 3.5" external hard disk drives (such as the 1TB Seagate external drive pictured below), which means they need a separate power supply. The introduction of eSATAp is intended to resolve this issue, while USB 3.0
will reportedly be able to provide voltages of 5V, 12V or 24V. At the time of writing, the
storage capacity of a typical external hard drive can range from a few hundred gigabytes up
to 4 terabytes.
Figure 13: A Seagate 1TB external hard drive [28]
To meet the demands of fast-growing interface technologies, the data access time must be reduced immensely, which is possible either by simply increasing the rotational speed of the platters or by increasing the cache size to hide the latency.
3.5 Future of Hard Disk Drives
Magnetic disks have followed Moore's Law during the last decades, doubling in capacity roughly every 12 months. Bandwidth has also followed this trend. Latency, however, improves by a smaller factor, making random seeks more and more expensive [13]. Continuing this trend requires either rethinking the way magnetic disks are used or moving to an alternative storage solution.
Considering the future of magnetic disk storage technology, there are a few bottlenecks in continuing this trend, the chief among them being the rotational speed of the platters (RPM). Disk RPM is a critical component of hard disk drive performance, as it directly impacts the latency and the data transfer rate of the disk. The faster the disk spins, the more data passes under the spindle head in a given time; the slower the RPM, the higher the mechanical latencies.
A Fujitsu white paper, Trends in Enterprise Hard Disk Drives [10], states:
"Ultrahigh-speed HDDs rotating at speeds exceeding 20,000 rpm have also been researched
but not commercialized due to heat generation, power consumption, noise, vibration and
other problems in characteristics, and a lack of long term reliability."
Companies have tried ingenious designs to reduce the excessive heat produced by a high spin rate. Generally, the physical disk platters of a standard 3.5 inch hard disk have an approximate diameter of 3 inches. In some drives, however, such as the Pegasus II, the platter size has been reduced further to 2.5 inches.
The smaller platters cause less air friction and therefore reduce the amount of heat generated
by the drive. In addition, the actual drive chassis is one big heat fin, which also helps
dissipate the heat. The disadvantage, however, is that since the platters are smaller they have less data capacity. This can be overcome by using more of them in a stack, but consequently the height of the drive increases.
To get higher data rates from HDDs, manufacturers can:
- Spin the disks faster, but at 20,000 RPM enterprise-class HDD platters are already under severe mechanical stress.
- Increase the number of read/write heads that can be active simultaneously, which constitutes a radical, substantial, and costly architectural and electronic change to HDD design.
- Add a second servo actuator with another set of read/write heads and another set of read/write electronics, which is completely out of the question from an economic perspective.
Figure 14 : Moving Parts in Hard Disk Drives [29]
Taken together, these trends suggest that what customers of big multi-user servers would really like is faster disk drives with lower power consumption. But that is getting ever tougher with hard disk technology.
Chapter 4
4 . Solid State Drives
In the past few years flash memory has become more and more important. In many mobile devices such as mobile phones, digital cameras, USB memory sticks and MP3 players, flash memory has been used in small amounts for years. But as the price of flash memory rapidly decreases and the storage density of flash memory chips grows, it becomes feasible to use flash memory even in notebooks, desktop computers and servers. It is now possible to construct devices containing an array of single flash chips such that the amount of memory is sufficient for main storage. Such a device is called a Solid State Drive (SSD).
Solid State Drives are increasingly common in small form factor computing such as notebooks, but SSDs are also used in the desktop and enterprise server space by those looking to leverage the speed of an SSD for maximum performance. While solid state drives have several benefits, including speed, longevity and practically no noise output, they are not always the best choice, as hard drives still dominate in both capacity and cost.
4.1 Flash Market Development
The market for flash memory is changing. The density of NAND flash is increasing drastically: while flash memory chips are made ever smaller, their capacity doubles approximately every year.
This development is leading to widespread usage of flash memory. Today solid state drives are mainly used in notebooks. In the future they might even be used in server architectures as the standard configuration.
Another interesting development is the cost of flash memory, whose price is dropping rapidly. Every month flash memory devices and flash memory storage cards get cheaper, and new products with larger capacities emerge on the market.
Figure 15: Evolution in density of NAND flash memory
4.2 Solid State Drives
Solid state drives do not need any mechanical parts. They are fully electronic devices and use solid state memory to store data persistently. Two different kinds of storage chip are used: flash memory or SDRAM memory chips. This thesis considers only flash memory based solid state drives, which are the kind mostly used today.
Figure 16: HDD and SSD [30]
Flash memory is the cornerstone of the Solid State Drive. With the increasing use of flash-based secondary storage, a detailed understanding of flash behaviour, which affects operating system design and performance, becomes important.
This chapter provides detailed information about flash memory. The internal parts of a solid state drive are then discussed over multiple sections. Section 4.4 describes the flash translation layer and the techniques which ensure the functionality of the solid state drive.
4.2.1 Flash Memory
Flash memory is a specific type of EEPROM that can be electrically erased and programmed in blocks. Flash memory is non-volatile. There are two different types of flash memory cells:
- NOR flash memory cells,
- NAND flash memory cells.
In the early days of flash memory, NOR flash was often used. It can be addressed by the processor directly and is handy for small amounts of storage.
Figure 17: NAND flash memory chip [30]
Today, NAND flash memory is used to store the data. It offers a much higher density, which is more suitable for large amounts of data. Its cost is lower and its endurance much longer than that of NOR flash. NAND flash can only be addressed at the page level. Flash memory comes with either single-level cells (SLC) or multi-level cells (MLC). The difference between the two cell models is that an SLC can store only 1 bit per cell (1 or 0), whereas an MLC can store multiple bits (e.g. 00, 01, 10 or 11). Internally these values are managed by holding different voltage levels. Both flash memory cell types are similar in their design. MLC flash devices cost less and allow a higher storage density; therefore MLC cells are used in most mass-produced devices. SLC flash devices provide faster write performance and greater reliability, so SLC flash cells are usually used in high-performance storage solutions. Table 4-1 compares the two cell models.

                               SLC    MLC
High Density                           X
Low Cost per Bit                       X
Endurance                       X
Operating Temperature Range     X
Low Power Consumption           X
Write/Erase Speeds              X
Write/Erase Endurance           X

Table 4-1 SLC vs MLC [9]
Flash memory only allows two possible states:
- erased,
- programmed.
When a flash memory cell is in the erased state, its bits are all set to zero (or one, depending on the flash device). Only when a flash cell is in the erased state can the controller write to that cell; in this example this means the 0 can be set to 1. Now the cell is programmed and, in effect, frozen. It is not possible to simply change the 1 back to a 0 and write again; the flash memory cell has to be erased first (Figure 18). Worse still, it is not possible to erase just a few cells: the erase operation has to be done on a much larger scale. It can only be done at the granularity of erase units, which are e.g. 512KB in size. If the amount of data being written is small compared to the erase unit, a correspondingly large penalty is incurred in writing the data. The flash memory architecture is divided into blocks of flash memory; the smallest erasable block is called an erase unit. If the position of the written data overlaps two blocks, then both blocks have to be erased. However, this erase operation need not necessarily be executed right before or after the write: the controller of the device might simply choose a new block for the write request and update its internal addressing map.
Figure 18: Flash memory overwrite mechanism
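The overwrite restriction illustrated in Figure 18 can be modelled in a few lines. The sketch below takes the erased value to be logical one (as in the page-level description later in this chapter; as noted above, some devices use zero), so programming may only flip bits from 1 to 0.

```python
class FlashPage:
    """Toy model of the programmed/erased constraint: erased cells hold
    logical 1; programming can only flip 1 -> 0. Flipping 0 -> 1
    requires erasing the whole page first."""
    def __init__(self, nbits: int = 8):
        self.bits = [1] * nbits                  # erased state: all ones

    def program(self, new_bits):
        if any(o == 0 and n == 1 for o, n in zip(self.bits, new_bits)):
            raise ValueError("cell already programmed; erase first")
        self.bits = list(new_bits)

    def erase(self):
        self.bits = [1] * len(self.bits)         # reset every cell to 1

page = FlashPage(4)
page.program([1, 0, 1, 0])   # fine: only 1 -> 0 transitions
page.erase()                 # whole page back to all ones
page.program([0, 1, 1, 1])   # fine again after the erase
```

Attempting `page.program([1, 1, 1, 1])` directly after the first write would raise, which is exactly the "erase-before-write" constraint the FTL (section 4.4) exists to hide.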
4.2.1.1 Flash Structure
NAND flash memory is organized into blocks, where each block consists of a fixed number of pages. Each page stores data together with the corresponding metadata and error correction code (ECC) information. A single page is the smallest read and write unit. The internal structure of flash memory is rarely identical from chip to chip; as the technology has matured over the years, many smaller architectural changes have been made. There are, however, a few fundamentals of how flash memory is constructed. Each chip has a large number of storage cells which, in order to store data, are arranged into rows and columns [1]. This is called the flash array. The flash array is connected to a data register and a cache register (Figure 19). These registers are used when reading or writing data to or from the flash array. By having a cache register in addition to the data register, the memory bank can accept the next request while data from the previous one is still being transferred. This enables the flash memory bank to internally process the next request while data is being read or written.
Figure 19 : A generic overview of a Flash memory bank [5]
4.2.1.2 Page
Pages in a flash array are the smallest unit any higher level of abstraction works on. The size of a page may vary depending on the specifics of the physical structure, but is typically 4kB [6, 5]. With 128 pages per block, the next larger unit in the flash memory hierarchy is the erase unit of 512KB; this can vary from drive to drive. In addition, each page also has an allotted space for Error-Correction Code (ECC). During a read operation, all the data from the page is transferred to the data register. In a similar way, write operations to a page write all data in the data register to the cells within the page.
Recall that flash cells support only two states when writing: a cell can be in a neutral or a negative state. When writing data to a page, it is only possible to change cells from the neutral (logical one) to the negative (logical zero) state, meaning that to change a bit from zero back to one, the entire page needs to be reset. As a whole, flash chips can be grouped together in so-called planes to increase storage capacity, and multiple planes can be accessed in parallel to enhance data throughput [12].
4.2.1.3 Erase Block
When resetting cell state with field emission, multiple pages are affected by the reset. This group of pages is called an erase block. A typical number of pages contained is 128 [7], but this can differ depending on how the flash cells are structured. Given a page size of 4kB, an erase block would then be 512kB in size. This means that changing the contents of any of the pages within the erase block requires rewriting all 512kB. For this simple reason, in-place writes are not possible in flash memory.
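The block arithmetic above can be checked directly, along with the write penalty it implies:

```python
# Figures from the text: 128 pages of 4 kB each form one erase block.
page_size_kb = 4
pages_per_block = 128
erase_block_kb = page_size_kb * pages_per_block   # 512 kB

# Worst-case write amplification of an in-place update: touching a single
# 4 kB page forces the whole 512 kB erase block to be rewritten.
penalty_factor = erase_block_kb // page_size_kb   # 128x
```

This factor-of-128 penalty is what motivates the out-of-place write strategy of the Flash Translation Layer in section 4.4.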
4.2.1.4 Cell degradation
Each time a flash cell is erased, the stress on the cell from the field emission contributes to cell degradation [1]. Modern flash memory banks are usually rated for approximately 10^5 erase cycles, but to be able to handle a small number of faulty cells, each page is fitted with ECC data.
4.3 Physical layout
While flash memory is the cornerstone of the Solid State Drive, before data gets to the flash memory it must pass through several other SSD components. An SSD does not actually have many unique parts, and the differentiation between SSDs from different manufacturers often happens in the controller and firmware more than anything else.
Little information is released by hardware manufacturers about drive layout and how data is organized. To illustrate this, consider the entirety of what the Intel® X25-E datasheet has to say about its architecture:
Figure 20 : Components of SSD
The Intel® X25-E SATA Solid State Drives utilize a cost effective System on Chip (SOC)
design to manage a full SATA 3Gbps bandwidth with the host while managing multiple flash
memory devices on multiple channels internally [2].
The structure of the Flash memory banks shown in Figure 19 gives a general idea
of what to expect, but only for a simple read/write operation. As seen in the block diagram in
Figure 23, an SSD connects several flash memory banks together through a Flash Controller (FC).
A single SSD usually contains multiple FCs, which are commonly called channels. As
implied by the name, each channel can process requests independently, giving SSDs
the ability to internally process a number of operations in parallel.
Figure 21: Organization of a conventional SSD
4.4 Flash Translation Layer (FTL)
In order to alleviate the "erase-before-write" problem in flash memory, most flash memory
storage devices are equipped with a software or firmware layer called Flash Translation
Layer (FTL) [11]. An FTL makes a flash memory storage device look like a hard disk drive
to the upper layers. One key role of an FTL is to redirect each logical page write from the
host to a clean flash memory page which has been erased, and to remap the logical page
address from an old physical page to a new physical page. In addition to this address
mapping, an FTL is also responsible for data consistency and uniform wear-leveling. The
concept of the FTL is implemented by the controller of the solid state drive. The layer tries to
efficiently manage the read and write access to the underlying flash memory chips. It hides
all the details from the user. So when writing to the solid state drive the user does not have to
worry about free blocks and the erase operation. All the managing is done internally by the
FTL. It provides a mechanism to ensure that writes are distributed uniformly across the
media. This process is called wear-leveling and prevents flash memory cells from wearing
out.
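As a rough illustration of the remapping role described above, here is a minimal, hypothetical page-mapping FTL sketch; the class and its structure are invented for illustration and correspond to no vendor's firmware:

```python
# Minimal page-mapping FTL sketch: every logical write is redirected to a
# clean physical page and the mapping table is updated; the old physical
# page becomes invalid and awaits a later erase.

class PageMapFTL:
    def __init__(self, num_pages):
        self.mapping = {}                   # logical page -> physical page
        self.free = list(range(num_pages))  # pre-erased physical pages
        self.invalid = set()                # stale pages awaiting erase

    def write(self, logical_page):
        phys = self.free.pop(0)             # redirect to a clean page
        old = self.mapping.get(logical_page)
        if old is not None:
            self.invalid.add(old)           # the old copy becomes garbage
        self.mapping[logical_page] = phys
        return phys

    def read(self, logical_page):
        return self.mapping[logical_page]

ftl = PageMapFTL(num_pages=8)
ftl.write(0)            # logical page 0 lands on physical page 0
ftl.write(0)            # rewrite: remapped to physical page 1
print(ftl.read(0))      # 1
print(ftl.invalid)      # {0} -- stale copy, erased later by garbage collection
```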
4.4.1 Controller
The controller of a solid state drive manages all the internal processes to make the FTL work.
It contains a mapping table that performs the logical-to-physical mapping. The logical address that
comes from the request is mapped to the physical address which points to the flash memory
block where the data is in fact stored. Whenever a read or write request arrives in the solid
state drive the logical block address (LBA) first has to be translated into the physical block
address (PBA) (Figure 22). The LBA is the block address used by the operating system to
Department of EIT Hochschule Rosenheim Page 41
read or write a block of data on the flash drive. The PBA is the physical address of a block of
data on the flash drive. Note that over time the PBA corresponding to one and the same LBA
can change often.
Figure 22 : Address translation in solid state drive [8]
The controller handles the wear-leveling process (see section 4.4.2). When a write request
arrives at the solid state drive, a free block is selected, the data is written and the address
translation table is updated. Internally, the old block does not have to be erased immediately. The
controller can also choose to postpone the erasure and perform a kind of garbage collection
when the amount of free blocks falls below a certain limit, or wait
until the drive is not busy. Certain data structures are used to maintain a free block list
and to keep track of the used blocks. In a flash memory block there is a little overhead memory where
meta-data can be stored to help manage these structures. For example, a counter stores how
many times each block has already been erased.
Like conventional hard disk drives, SSDs usually have an internal DRAM cache to buffer
write requests or store prefetched pages. This buffer enables solid-state disks to back up and
restore pages during erase cycles and to keep in-memory information, e.g., page-mapping
structures. Enlarging the DRAM cache and adding more intelligent techniques to
organize requests can make a huge difference. By using an FTL, it is possible to avoid most
drawbacks of flash chips while making use of their advantages. Therefore, the FTL is a major
performance-critical part of every SSD. For such an SSD controller one can think of many
optimizations.
Figure 23 : Internal structure of solid state drive [6]
Pre-fetching data when sequential read patterns occur (much as a conventional hard disk drive
fills its buffer) might speed up the reading process. A controller could also write
to different flash chips in parallel (Figure 23). Since flash memory is purely electronic,
parallelization is not very hard to add; flash memory can be seen as
many memory cells ordered in parallel. With parallelization, the I/O requests, the
erase process and the internal maintenance of the data structures become more complicated, but a
much higher performance can be accomplished. One could even think of constructing an SSD
that internally combines several drives in a RAID configuration.
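The channel parallelism described above can be sketched with a simple round-robin striping model; the channel count and page numbering are illustrative assumptions:

```python
# Sketch: consecutive logical pages striped round-robin over independent
# channels, so a sequential request is served by several chips at once.
NUM_CHANNELS = 4

def channel_for(page):
    return page % NUM_CHANNELS

requests = list(range(8))          # eight consecutive pages
by_channel = {}
for p in requests:
    by_channel.setdefault(channel_for(p), []).append(p)

print(by_channel)  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
# With 4 independent channels, the 8-page transfer takes roughly the time
# of 2 per-channel page operations instead of 8 serial ones.
```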
4.4.2 Garbage collection and Wear-leveling
Garbage collection and wear-leveling are two other important tasks of the FTL. Garbage
collection is needed because blocks must be erased before they can be reused. The garbage
collector works by scanning the SSD blocks for invalid pages and then reclaiming those invalid
pages. Wear-leveling is necessary because most workloads write to a subset of blocks
frequently, while rarely writing to other blocks. Because each block of flash memory only survives
a limited number of write-erase cycles before it is worn out, without wear-leveling the frequently
written blocks would wear out well before the other blocks. Wear-leveling helps
solve this problem by shuffling cold (unused or less frequently used) blocks with hot (frequently
used) blocks to balance the number of writes over all of the flash memory.
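The two policies can be sketched as follows; the victim-selection and block-selection rules shown are common textbook heuristics, not the algorithm of any particular drive:

```python
# Illustrative FTL bookkeeping: garbage collection picks the block with the
# most invalid pages (most space reclaimed per erase), and wear-leveling
# steers the next write to the least-erased free block.
blocks = [
    {"id": 0, "invalid_pages": 120, "erase_count": 40},
    {"id": 1, "invalid_pages": 10,  "erase_count": 3},
    {"id": 2, "invalid_pages": 90,  "erase_count": 25},
]

# Garbage collection: reclaiming block 0 frees the most pages per erase.
victim = max(blocks, key=lambda b: b["invalid_pages"])
print(victim["id"])   # 0

# Wear-leveling: direct the next write to the block erased least often.
target = min(blocks, key=lambda b: b["erase_count"])
print(target["id"])   # 1
```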
4.4.3 Write Amplification
As seen in the earlier sections, SSDs need workarounds to emulate in-place writing of data.
That is, changing a few bytes of data requires either moving the entire erase block or
rewriting the entire erase block.
Write amplification is a measure of the number of bytes actually written to flash when the host
writes a certain number of bytes. For example, if you write a 4K file, on average the drive may write
40K bytes worth of data. This comes back to the flash characteristics: at some point, data from
several partially used blocks must be combined to free up pages for new data to be
written. Write amplification has an impact on the life of a drive. One effective way to
measure drive lifetime is to measure how many bytes can be written to the drive over its
lifetime.
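The measure can be written out using the text's own example numbers:

```python
# Write amplification factor (WAF): bytes physically written to flash
# divided by bytes the host asked to write. Numbers are the text's
# example, not a measurement.
host_bytes_written = 4 * 1024        # host writes a 4K file
flash_bytes_written = 40 * 1024      # drive writes 40K internally

waf = flash_bytes_written / host_bytes_written
print(waf)   # 10.0

# With a rated endurance of roughly 10**5 erase cycles per block, a higher
# WAF directly shortens how many host bytes the drive can absorb.
```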
4.4.4 Error correction
Knowing that a cell will lose its ability to properly store data after a certain number of
writes, the SSD controller needs to be able to handle erroneous pages in a graceful manner.
To detect errors, each page has an allotted space for ECC. This makes it possible to check the
consistency of the data on writes. The ECC can handle a given number of
damaged cells, but will at some point reach an uncorrectable amount of noise. Such a page is
then marked as invalid and no longer used by the FTL.
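As a toy illustration of per-page ECC, the following sketch uses a Hamming(7,4) code, which corrects any single flipped bit; real SSDs use far stronger codes (e.g. BCH) over much larger data units, so this is only an analogy:

```python
# Hamming(7,4): 4 data bits protected by 3 parity bits; the syndrome gives
# the 1-based position of a single bit error (0 means no error).

def hamming74_encode(d):                 # d: list of 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # positions 1..7

def hamming74_correct(c):                # c: received 7-bit codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]       # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]       # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]       # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1             # flip the damaged bit back
    return [c[2], c[4], c[5], c[6]]      # recovered data bits

data = [1, 0, 1, 1]
code = hamming74_encode(data)
code[4] ^= 1                             # one worn cell flips a bit
print(hamming74_correct(code) == data)   # True
```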
4.4.5 Trim
Trim is a function of the operating system that tells the drive a page is no longer valid.
This helps to reduce write amplification because stale pages are not copied. There will also
be fewer pages to copy, which speeds up the process of freeing partially valid blocks.
When it is time to consolidate blocks to free up space, the SSD must copy all of the data it
considers valid to a new block before it can erase the current block. Without trim, the SSD
does not know a page is invalid unless the LBA associated with it has been rewritten.
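The saving can be sketched with illustrative page counts:

```python
# Effect of trim on block consolidation; page counts are illustrative.
PAGES_PER_BLOCK = 128
deleted_by_host = 50   # pages the file system no longer needs
overwritten = 30       # pages the SSD already knows are stale

# Without trim the SSD still treats deleted-but-not-rewritten pages as valid.
copied_without_trim = PAGES_PER_BLOCK - overwritten
# With trim those pages are marked invalid too and are simply skipped.
copied_with_trim = PAGES_PER_BLOCK - overwritten - deleted_by_host

print(copied_without_trim)  # 98 pages copied before the erase
print(copied_with_trim)     # 48 pages copied before the erase
```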
4.5 Solid State Drive Interfaces
Solid State Drives are available with a variety of system interfaces, chosen primarily for the
performance requirements of the SSD in the system. Also, since SSDs are generally used in
conjunction with, or interchangeably with, magnetic disk drives, a common mass storage bus
interface is used in most cases. This also allows the system software to manage both drive
types in a similar way, making system integration nearly plug-and-play. There are also
interfaces initially designed for other purposes that have been adopted by SSDs in some cases.
Generally, SSDs are supported with SATA, Serial Attached SCSI and ATA/IDE, just like HDDs,
but SSDs also support the latest revised versions of these standards. To meet higher demands,
SSDs with PCI Express interfaces are used. In many server applications, PCI Express is the
end interface for HDDs only when they are used in RAID mode, simply to fill the performance
bandwidth of PCI Express. With a PCI Express SSD, however, the flash chips are
directly placed on the PCI-Express card, extracting better performance from the flash chips
than HDDs in RAID reached through other interfaces.
Figure 24 : x4 PCI-Express card with NAND flash chips on it [31]
4.5.1 PCI Express
PCI Express is a 2.5 Gigatransfers/second serial differential point-to-point high-speed
interconnect with added flexibility and scalability. The immediate benefit is increased
bandwidth. PCI Express offers 4GB/s of peak bandwidth per direction for a x16 link and 8
GB/s concurrent bandwidth. This allows for the highest performance in gaming and video
capture. In addition, PCI Express is designed for cost parity: the PCI Express x16 connector
is expected to be at cost parity with the high volume standard connectors.
Peripheral Component Interconnect Express (PCI-e) is an internal interface, so an
SSD would be on a circuit board plugged into a PCI Express slot on the motherboard.
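The bandwidth figures quoted above follow from the line rate and the 8b/10b encoding of first-generation PCI Express:

```python
# First-generation PCI Express bandwidth arithmetic: 2.5 GT/s per lane,
# 8b/10b line coding (10 bits on the wire per data byte).
gt_per_s = 2.5e9               # transfers per second per lane
bits_per_byte_on_wire = 10     # 8b/10b encoding overhead

lane_bw = gt_per_s / bits_per_byte_on_wire   # bytes/s per lane, per direction
x16_bw = lane_bw * 16

print(lane_bw / 1e6)   # 250.0 MB/s per lane per direction
print(x16_bw / 1e9)    # 4.0 GB/s per direction; 8 GB/s with both directions
```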
4.6 SSD Market
The sources predict that SSDs will have a major impact on the storage market. As of today,
companies like Crucial, Intel and Fusion-io, just to name a few, have already released high speed
SSDs to the market. The architectural technologies that can speed up performance and IOPS
are independently developed by various original equipment manufacturers to suit particular
products or markets. The key architectural features which will increase throughput and shrink
the asymmetry gap in read/write IOPS are:
- Parallelization of the internal flash arrays
- Improved flash management technology
- Faster flash controllers
- Faster host interface controllers (and faster interfaces driven by the needs of the SSD market rather than adapted from the HDD market)
- Hybridizing on-board memory technologies, for example using faster RAM-like non-volatile memory in some parts of the device and slower flash-like memory in the bulk storage arrays
A lot of trial and error will be involved as original equipment manufacturers throw products
at the market which tweak the technologies they understand best, and see which products
stick. Some of these will enhance currently known architectures, while others may make
some architectural features obsolete. In the coming years, flash SSD technology is
expected to reach a point where the architecture of an ideal SSD is well established,
and ongoing developments will be driven more by process changes than anything else.
Figure 25 : SSD Market development
4.7 Future
The availability and maturity of SSD technology has changed drastically over the last couple
of years, going from a vastly more expensive technology that proved better in only a small
subset of scenarios to a serious alternative. With ONFI (Open NAND Flash Interface) [3] working
intensively on NAND technology, the future of SSDs seems bright. ONFI has created the
Block Abstracted NAND addendum specification to simplify the host controller design by
relieving the host of the complexities of ECC, bad block management, and other low-level
NAND management tasks. The ONFI block abstracted NAND revision 1.1 specification adds
the high speed source synchronous interface, which provides up to a 5X improvement in
bandwidth compared with the traditional asynchronous NAND interface. The ONFI
workgroup continues to evolve the ONFI specifications to meet the needs of a rapidly
growing and changing industry.
ONFI 2.1 [3] contains a plethora of new features that deliver speeds of 166 MB/s and 200
MB/s, plus other enhancements to improve power, performance, and ECC capabilities. Along
with ONFI, SSD manufacturers are designing their products for fast
interface technologies such as SATA III and PCI Express.
ONFI is dedicated to simplifying NAND flash integration into consumer electronic products,
computing platforms, and any other application that requires solid state mass storage.
4.8 Summary
This chapter has given an overview of the technology behind SSDs. Flash cells
are at a point where production and technology are mature enough to make storage devices
capable of competing with magnetic disks. Some of the challenges SSDs face when using
these Flash cells for bulk storage (FTL, wear-leveling) were also discussed.
4.9 Typical characteristics of HDD and SSD
Reliability of the drive
HDD drives use mechanical parts whose lifespan is limited, while SSDs using flash memory
can sustain approximately 10^5 write cycles per cell [21].
Access Speed
The typical access time for a Flash-based SSD is about 35-100 microseconds, whereas
that of a rotating disk is around 5,000-10,000 microseconds. That makes a Flash-based
SSD approximately 100 times faster than a rotating disk.
Consistent read performance
Read performance does not change based on where data is stored on an SSD. In an HDD, if
data is written in a fragmented way, reading it back will show varying response times.
Defragmentation
SSDs do not benefit from defragmentation because there is little benefit to reading data
sequentially, and any defragmentation process adds additional writes to the NAND flash,
which already has a limited cycle life [22]. HDDs may require defragmentation after
continued operations of erasing and writing data, especially involving large files or where
the disk space becomes low.
Audible noise
HDDs produce audible clicks and crunching sounds, while SSDs are quieter
because they have no mechanical parts.
Size
Flash-based SSDs are manufactured in standard 2.5″ and 3.5″ form factors. 2.5″ SSDs are
normally used in laptops or notebooks, while the 3.5″ form factor is used in desktops.
Vibration
SSDs are naturally more rugged than HDDs. An SSD can sustain up to 1,000 Gs/0.5ms of
shock [16] before sustaining damage or a drop in performance, while HDDs can
withstand up to 63 Gs/2ms while operating and 350 Gs/1ms [24] when turned off.
Power Consumption
SSDs have lower power consumption than HDDs.
Heat Dissipation
Along with the lower power consumption, there is also much less heat dissipation in
systems using Flash-based SSDs as their data storage solution. This is due to the absence of
heat generated by rotating/movable media, which is certainly one of the
main advantages of Flash-based SSDs relative to a traditional HDD. With less heat
dissipation, the Flash-based SSD serves as an ideal data storage solution for mobile systems
such as PDAs, notebooks, etc.
Mean Time Between Failures (MTBF)
Average MTBF for SSDs is approximately 2,000,000 hours [16], while MTBF for HDDs is
approximately 700,000 hours [24].
Cost Considerations
As of February 2011, NAND flash SSDs cost about (US) $1.20–2.00 per GB and HDDs
cost about (US) $0.05/GB for 3.5 inch and $0.10/GB for 2.5 inch drives.
Chapter 5
5 . Performance: HDD vs SSD
An earlier chapter indicated that SSDs have the advantage of not having moving parts, giving
them an overall low latency. Magnetic disks, on the other hand, have a harder time keeping latency
low, due to seek and rotational latency. In this chapter, the focus is on how the above
mentioned general performance characteristics add up when faced with specific application
scenarios. The goal is to get a clear profile of both SSD and HDD, to make the right choice
when it comes to performance.
There are various techniques used to analyse the performance of storage drives and the
architecture behind that performance.
5.1 Benchmark
In computing, a benchmark is the act of running a computer program, a set of programs, or
other operations, in order to assess the relative performance of an object, normally by
running a number of standard tests and trials against it [23].
Benchmarks provide a method of comparing the performance of various subsystems across
different chip/system architectures.
The performance of both SSDs and magnetic disks can be difficult to summarize with just a
few numbers. As discussed earlier, certain aspects of a disk might give different performance
results, and one might get different performance depending on the workload. In addition to
these uncertainties, different file systems will store data in fundamentally different ways. All
this put together, it is hard to get a clear answer on the level of performance a given
application can expect to achieve by only looking at numbers from datasheets.
To investigate performance levels, up-to-date high-end SATA consumer and enterprise flash
solid state drives are benchmarked against a mechanical hard disk drive. When choosing
drives for the benchmark, the focus was on mid-range alternatives; two of the most popular SSDs in
the market today are considered, namely the Intel X25-E and the Crucial Real C300. The two SSDs
are differentiated by the type of memory and the system interface technology used. An HDD
from Seagate is considered for benchmarking.
Disk Specification

Make                   Seagate [24]       Intel X25-E [16]   Crucial Real C300 [17]
Type                   HDD                SSD                SSD
Size (GB)              80                 32                 128
Form factor            3.5"               2.5"               2.5"
Interface              SATA               SATA               SATA
Transfer rate (Gbps)   1.5/3              1.5/3              6/3/1.5
Rotation (rpm)         7200               -                  -
Memory                 Magnetic platter   SLC NAND           MLC NAND
Average access time    4.16ms             0.08ms             <0.1ms
Sequential read        -                  250MB/s            355MB/s
Sequential write       -                  170MB/s            140MB/s

Table 5-1: Overview of drives in the benchmark environment
5.2 Benchmark environment
The benchmark environment consists of two different SSDs, one from Intel and the other from
Crucial, along with a magnetic disk drive from Seagate. Information about the drives, as
provided in their datasheets, is available for comparison in Table 5-1. During the benchmarks, for
simplicity, the drives are referred to by their manufacturer. The test PC consists of an Intel®
Xeon® Processor 5600 with the Intel® 5520 Chipset at 2.67GHz and 2GB of Random Access
Memory (RAM), running the Windows XP (32bit) operating system.
In section 5.3, different benchmarks are run on both SSDs and the magnetic disk, and their
performance is analysed with the resulting benchmark values.
5.3 TPC-H Benchmark
Transaction Processing Performance Council (TPC) is a non-profit organization founded in
1988 to define transaction processing and database benchmarks and to disseminate objective,
verifiable TPC performance data to the industry. TPC benchmarks are widely used today in
evaluating the performance of computer systems.
The TPC-H benchmark is widely used in the database community as a yardstick to
assess the performance of database management systems against large scale decision support
applications. The benchmark is designed and maintained by the Transaction Processing
Performance Council.
5.3.1 Background and significance of TPC-H
The TPC-H benchmark [13] tests the performance of analytics servers used by decision
support systems by measuring the performance of ad-hoc queries against a data set (called a
scale factor) of a specific size while the underlying data is being modified. The objective is to
simulate an on-line production database environment with an unpredictable query load that
represents a business oriented decision support workload where a DBA must balance query
performance and operational requirements such as locking levels and refresh functions.
Results are usually expressed as QphH@Size for performance or $/QphH@Size where
"Size" indicates the database size or scale factor used for the testing. The performance metric
reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric
(QphH@Size), and reflects multiple aspects of the capability of the system to process queries.
TPC-H benchmarking database sizes are currently 1GB, 10GB, 30GB, 100GB, 300GB,
1,000GB, 3,000GB, 10,000GB, 30,000GB, and 100,000GB but the TPC discourages
comparing results across different database sizes since database size is a major and obvious
factor in performance.
Although any benchmark, including the TPC-H, is unlikely to represent any particular
customer’s decision support workload or environment, TPC-H is an important test because of
the high level of stress it puts on many parts of a decision support system, and is used by
virtually all major platform vendors, and many decision support system suppliers to
demonstrate the performance attributes of their systems.
In this thesis, one important consideration has to be noted: in contrast to expressing
results as QphH@Size, the time taken by individual queries to run against
the database set is measured.
The TPC provides a set of tools to build the TPC-H benchmark, distributed as source code.
The tools provided with TPC-H include a database population generator (DBGEN) and a
query template translator (QGEN).
DBGEN and QGEN are written in ANSI 'C' for portability, and have been successfully
ported to over a dozen different systems. While the TPC-H specification allows an
implementer to use any utility to populate the benchmark database and to create the
benchmark query sets, the resultant population must exactly match the output of DBGEN.
The source codes have been provided to make building a compliant database population
and query sets as simple as possible.
A TPC-H benchmark application package was created and bound to a database. This
application measures the time taken by each individual query to run against an application-specific
10 GB database, generated using DBGEN. The overview of the TPC-H benchmark
application is shown in Figure 26. Several steps must be followed in order to
create the application package; for the detailed procedure see appendix A.
TPC benchmark results are expected to be accurate representations of system
performance. Hence, there are certain guidelines that are expected to be followed when
measuring those results. The approach or methodology used in measurements is explicitly
described in the specification [13].
Figure 26 : TPC-H benchmark application outline
[Figure 27 chart: average execution time in seconds for TPC-H queries 1-17 on the Seagate HDD, Intel X25-E SSD and Crucial C300 SSD]
5.3.2 Test scenario
SSDs are somewhat slower at random reads than at sequential reads, and it is therefore
interesting to observe how they perform as the queries change the conditions for data seeks
while scanning through a large database. In addition to looking at read performance, it is
interesting to know how much time the same operations, such as scanning tables during
database traversals, consume on the Intel X25-E and Crucial Real C300, as opposed to the
Seagate HDD.
The application package was created using 17 queries from QGEN, each of which accesses
the 10 GB database. The application database was created on all three drives independently.
Identical applications were created and run on the drives by binding the packages to the
respective databases. The execution time of every individual query was calculated.
To discover the level of impact access time has on performance, the same set of
queries was tested across the three different disks listed in Table 5-1. Each query
was run 115 times and the initial 15 runs were excluded when calculating the
average time taken. This was done to ensure the standard deviation between query execution
times is less than 3 percent and that the processor is only running the application query.
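This measurement procedure can be sketched as follows; `run_query` is a placeholder for executing one TPC-H query against the database:

```python
# Sketch of the methodology above: run each query many times, discard the
# warm-up runs, and report the mean over the steady-state runs.
import statistics
import time

def run_query():                 # stand-in for a real database query
    time.sleep(0.001)

RUNS, WARMUP = 115, 15
timings = []
for _ in range(RUNS):
    start = time.perf_counter()
    run_query()
    timings.append(time.perf_counter() - start)

steady = timings[WARMUP:]        # exclude the initial 15 runs
mean = statistics.mean(steady)
spread = statistics.stdev(steady) / mean   # accepted only if under 3 percent
print(len(steady))               # 100 runs enter the average
```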
5.3.3 Results
Comparing the results in Figure 27, the query execution times of the
Intel X25-E and Crucial Real C300 are low and comparable, in contrast to the very high
execution times of the Seagate HDD.
Figure 27 : TPCH benchmark performance results
The difference in execution time is not consistent over the queries executed; this depends
on how the database is laid out across the drives. Considering the query execution time,
on average, the Intel X25-E is 8 times and the Crucial C300 is 10 times faster than the Seagate HDD.
Both the Intel X25-E and the Crucial Real C300 perform relatively close to their advertised
speeds when doing reads, due to the symmetrical latency properties of SSDs.
Except for queries 1, 8, 10, 12 and 16, both SSDs achieve
significantly lower execution times. This can be attributed to the fact that although the database
is distributed across flash memory chips, the flash memory banks are channelled in parallel,
hence the lower execution time. When receiving a series of requests for data located on
different channels, the SSDs are able to handle these requests in parallel.
For queries 1, 8, 10, 12 and 16, both SSDs and the Seagate HDD
show comparable execution times, because these queries access sequentially stored data.
Summarizing, solid state drives perform significantly better than hard disk drives in random
operations, in contrast to sequential operations.
5.4 Energy Efficiency Test
When talking about solid state drives, power consumption becomes an interesting point of
discussion. Nowadays there are many huge server architectures that run 24 hours a day and
consume quite an amount of power.
An advantage of flash memory is its low power consumption. The one
advantage of a solid state drive over a good hard disk drive in terms of power consumption is
that the operating power consumption is lower. On the contrary, several measurements have shown that
the idle power consumption can be much lower in a hard drive. Workloads can surely be
found where the hard disk drive wins and others where the solid state drive shows better
energy efficiency. Newer high performance solid state drives often contain an additional
amount of DRAM cache which also uses some additional power. A general power
consumption comparison is not that easy to do; no general statement can be made, as every
hard disk drive or solid state drive has slightly different power consumption. See the critical
article from Tom's Hardware [15] for more information.
5.4.1 Test scenario
It is interesting to know how much power, on average, the Seagate hard disk drive
consumes in comparison to the Intel X25-E and Crucial Real C300.
The power consumption was measured while running each of the 17 TPC-H queries 10
times. For the measurement, the Cost Control 3000, a product from Base Tech, was used, which
measures not only power consumption but also the corresponding energy costs.
The power consumption indicated is not purely from the drives alone but also from the
system on which they run; the variations are quite acceptable given the moving parts of the
Seagate hard disk.
5.4.2 Results
The results are displayed in Figure 28.
Figure 28: Comparison for energy efficiency
These values are indirectly influenced by the total amount of time taken by the individual drives
to execute all the TPC-H queries. The Seagate drive consumes approximately 6 times more
power than the Solid State Drives under test. This is mainly due to the moving parts of the Seagate
drive; especially during random reads, the head needs to be moved repeatedly. In the case of SSDs,
no additional power is required to drive platters or mechanical arms.
[Figure 28 chart: power consumption in kWh for the Seagate HDD, Intel X25-E and Crucial C300]
However, as said earlier, no specific conclusion can be drawn from this, but the
picture shows the overall amount of power that can be saved by replacing an HDD with a solid state
drive while performing the same task.
5.5 HD Tune Benchmark
HD Tune is a hardware-independent utility that administrators can use to perform a variety of
hard disk diagnostics, regardless of manufacturer, to confirm hard disk health and
performance.
HD Tune is a hard disk utility with many functions, namely benchmarking (measuring
low level read/write performance), file benchmark, random access, health status check and drive
temperature display. However, our main interest in HD Tune is the benchmark functionality;
the benchmark function itself offers four different tests:
Transfer rate
The data transfer rate is measured across the entire disk surface (default) or across the
selected capacity [14] for the specified data block size. It offers the option to measure the transfer
rate for both read and write. To prevent accidental data loss, the write test can only be performed
on a disk with no partitions.
During the transfer test, certain parameters have to be set according to our requirements:
Test speed/accuracy: the full test reads or writes every sector on the disk. This gives
the most accurate results but the test time will be very long. By choosing the
partial test, the transfer speed is sampled across the disk surface. The test time
and accuracy can be chosen by moving a slider.
Block size: the block size used during the transfer rate test. Lower values may give
lower test results. The default and recommended value is 64 KB.
Access time
The average access time is measured and displayed in milliseconds (ms).
Burst rate
The burst rate is the highest speed (in megabytes per second) at which data can be transferred
from the drive interface (IDE, SATA, USB) to the operating system.
CPU usage
The CPU usage shows how much CPU time (in %) the system needs to read data from the
hard disk.
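A much-simplified version of such a transfer-rate test can be sketched in Python; unlike HD Tune, which reads raw sectors, this reads an ordinary file (so the OS page cache influences the result), and the file path is an assumption:

```python
# Toy transfer-rate measurement: read a test file in fixed-size blocks and
# count the bytes moved. Real benchmarks bypass the file cache and read
# raw sectors; this only illustrates the blockwise-read idea.
import os
import tempfile
import time

BLOCK_SIZE = 64 * 1024                     # HD Tune's default block size
path = os.path.join(tempfile.gettempdir(), "transfer_test.bin")

with open(path, "wb") as f:                # create a small 4 MB test file
    f.write(os.urandom(4 * 1024 * 1024))

total = 0
start = time.perf_counter()
with open(path, "rb") as f:
    while chunk := f.read(BLOCK_SIZE):     # sequential reads, block by block
        total += len(chunk)
elapsed = time.perf_counter() - start      # total / elapsed gives bytes/s

print(total // (1024 * 1024))              # 4 MB read
os.remove(path)
```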
As seen in section 5.3.3, the time consumed by the Seagate hard disk to run the queries was quite
high in comparison to the Intel X25-E and Crucial Real C300. This variation could be due to
various factors such as access time and average transfer rate. In section 5.5.1, these
factors are examined by running HD Tune on all three drives.
5.5.1 Test scenario
The HD utility was run over all three drives. For the transfer rate test, the block size setting
was assigned to 4MB with full test in fast mode.
5.5.2 Results
Transfer rate:
Figure 29: Read speed comparison
As shown in Figure 29, the sequential read speed of the Seagate hard disk is low compared to
the solid state drives; the Seagate HDD is nearly 4 times slower than the Intel X25-E and 5 times
slower than the Crucial C300. This is mainly because of its large access time.
[Figure 29 chart: average read speed in MB/s for the Seagate HDD, Intel X25-E and Crucial C300]
Figure 30: Access time comparison
This access time difference itself shows how valuable solid state drives could be in high
speed real time applications. The SSDs access data nearly 150 times faster than the HDD.
5.6 Summary
The benchmark results match many of the observations made in section 3.1.3. Magnetic disks
showed overall low performance on read operations, due to seek time and rotational delay. On
average, SSDs are 8-10 times faster for reads, and 150 times faster in access time, compared
to the hard disk drive. SSDs showed high performance on read operations, with an even higher
degree of performance on random reads in the TPC-H benchmark. Most likely, this can be
attributed to the fact that an SSD consists of multiple flash memory chips connected in
parallel, as discussed in section 4.3. SSDs depend heavily on the FTL, since each channel can
handle requests in parallel.
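The channel parallelism mentioned above can be illustrated with a small sketch; the round-robin page mapping, the 50 µs page read time, and the channel counts are illustrative assumptions, not parameters taken from the tested drives:

```python
# Toy model of flash channel parallelism: logical pages are striped
# round-robin across channels (a simplified FTL mapping), so a large
# read is served by several channels at once.

def stripe_pages(num_pages, num_channels):
    """Map logical page numbers to channels round-robin."""
    mapping = {ch: [] for ch in range(num_channels)}
    for page in range(num_pages):
        mapping[page % num_channels].append(page)
    return mapping

def read_time_us(num_pages, num_channels, page_read_us=50):
    """Total read time: channels work in parallel, each reads its own pages."""
    mapping = stripe_pages(num_pages, num_channels)
    return max(len(pages) for pages in mapping.values()) * page_read_us

print(read_time_us(80, 1))   # 4000 us on a single channel
print(read_time_us(80, 10))  # 400 us across 10 channels
```

With ten channels the same 80-page read finishes ten times sooner, which is exactly the effect the FTL's channel striping exploits.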
The overall results indicate that solid state drives perform better than the HDD, but on
closer observation, a considerable performance difference can be seen between the two solid
state drives. The Crucial Real C300 SSD outperforms the Intel X25-E SSD in most of the
benchmark tests conducted above. The difference could be due to various factors, such as the
flash type, controller, system architecture, or cache buffer used. The coming chapters analyze
this performance difference.
[Bar chart for Figure 30: access time (ms, scale 0-18). Seagate HDD: 15.7, Intel X25-E: 0.1, Crucial C300: 0.1]
Chapter 6
6 . Better Investment: SSD or additional RAM?
In comparison to hard disk drives, solid state drives are faster, quieter, and more energy
efficient, but also costlier.
To get better performance from hard disk drives, systems are usually installed with more
Random Access Memory. Since the cost per gigabyte of hard disk storage is low, the performance
of a hard-disk-based system can be tuned simply by adding RAM. For example, a system working
on 10 GB of data, if installed with more than 10 GB of RAM, should see its performance
increase immensely, since the data can be loaded completely into RAM, avoiding the access
time of the hard disk drive. RAM, however, is costly by itself, as stated in section 2.3.1.
Adding more RAM to a system can thus increase performance immensely; as seen in the previous
chapter, performance can also be enhanced by replacing the HDD with an SSD. To get a clearer
picture of which is the better buy, section 6.1 compares the performance of the HDD with
additional RAM against the SSDs with 2 GB of RAM. This gives an idea of whether RAM or an SSD
is the better investment for enhancing system performance.
For this analysis, the same two solid state drives used in section 5.1 and the Seagate hard
disk drive are considered, with the TPC-H benchmark used for the performance comparison.
6.1 Benchmark Environment
Drive Specification
                  Seagate [24]       Intel X25-E [16]   Crucial Real C300 [17]
Type              HDD                SSD                SSD
Size (GB)         80                 32                 128
Interface         SATA 2.0           SATA 2.0           SATA 2.0
Rotation (rpm)    7200               -                  -
Memory            Magnetic platter   SLC NAND           MLC NAND
System RAM (GB)   2 / 8 / 12         2                  2
Table 6-0-1 Overview of drives in the benchmark environment
The TPC-H benchmark with a scale factor of 10 was first run on all drives with 2 GB of system
RAM. The benchmark was then repeated on the Seagate HDD with 8 GB and 12 GB of RAM.
CPU: Intel® Xeon® Processor 5600
Main board: Intel® 5520
OS: Microsoft Windows 7 Professional x64
Memory:
o 2 GB DDR3-1333 SDRAM (Kingston)
o 4 GB DDR3-1333 SDRAM (Kingston)
o 8 GB DDR3-1333 SDRAM (Micron)
Each TPC-H query was run 115 times; when calculating the average query execution time, the
first 15 results were excluded to ensure that the processor was busy only with the
application query.
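The averaging procedure above can be sketched as follows (the run times are invented for illustration):

```python
# Average query execution time, excluding warm-up runs: of 115 runs,
# the first 15 are dropped so that caching and startup effects do not
# distort the steady-state average.

def average_query_time(times_ms, warmup=15):
    steady = times_ms[warmup:]
    return sum(steady) / len(steady)

runs = [200.0] * 15 + [100.0] * 100   # 115 runs: slow warm-up, then steady state
print(average_query_time(runs))       # 100.0 (warm-up runs excluded)
```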
6.2 Results
To match the performance of the SSDs, the system RAM was increased in steps.
Figure 31: performance comparison between HDD with 12GB system RAM vs SSDs with 2GB system RAM
Analysis of Figure 31 indicates that, except for queries 1, 3, 5, 12 and 16, the performance
of the SSDs with 2 GB RAM cannot be matched even by increasing the RAM of the Seagate HDD
system to 12 GB. Increasing the RAM beyond 12 GB to attain better performance is not
productive in this scenario. Since databases are normally huge, a large amount of RAM would
have to be added consistently as the database grows in order to sustain the performance.
Considering overall query performance, the SSDs provide better results.
The effect of varying the RAM with the HDD is shown in Figure 32. The performance appears to
saturate irrespective of the increase in RAM, except in queries 1, 2, 8, 10, 12 and 15.
Increasing RAM beyond the actual database size to attain better performance is not productive.
Figure 32: performance comparison between HDD with 2GB, 8GB, and 12GB system RAM
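The saturation behaviour can be captured by a minimal cache model, assuming round-number latencies (0.0001 ms for RAM, and the 15.7 ms HDD access time measured earlier) and a hit ratio proportional to how much of the database fits in RAM; these modelling assumptions are illustrative, not taken from the benchmark:

```python
# Minimal disk-cache model: expected access time falls as more of the
# database is cached in RAM, and stops improving once RAM >= database size.

def expected_access_ms(ram_gb, db_gb, ram_ms=0.0001, disk_ms=15.7):
    hit_ratio = min(ram_gb / db_gb, 1.0)   # fraction of data served from RAM
    return hit_ratio * ram_ms + (1 - hit_ratio) * disk_ms

for ram in (2, 8, 12, 16):
    print(ram, "GB ->", round(expected_access_ms(ram, db_gb=10), 3), "ms")
```

Beyond the point where the 10 GB database fits in RAM, the model output no longer changes, mirroring the saturation seen in Figure 32.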
6.3 Conclusion
Database (server) applications benefit greatly from random disk access speed, which is why
servers have large DRAM footprints used as disk cache. Solid state drives, however, provide
significantly better performance for random reads, and the test results indicate that they
perform better when dealing with large data. For server applications, it is therefore worth
investing in solid state drives rather than RAM; solid state drives are the better choice for
performance enhancement.
A definitive decision cannot be taken just from the above results, as the scenario cannot be
generalised. For applications where random reads and writes are rare compared to sequential
ones, an HDD with additional RAM is the better buy, saving a significant part of the
investment. Care is therefore needed in choosing where SSDs are used; otherwise it is
difficult to justify their additional cost.
6.4 Benchmark problems
There are no standard benchmarking tools specifically built to test solid state drives.
Benchmarking SSDs with tools developed for HDDs causes several unique problems that will only
be solved by benchmarking software that catches up with the technology.
As mentioned, SSDs use different strategies and data geometry than conventional HDDs. This
causes some functional differences and, more importantly, makes some benchmarks inadequate,
particularly those optimized for the standard platter configuration of HDDs. Due to these
addressing issues, some benchmarks can show radically different results on the transfer
graphs, or average the performance values incorrectly. The same algorithms applied to a
functionally different device will not yield the same "realistic" performance values: many of
the test samples will fall within one flash block, but others will span from the end of one
block to the beginning of another. The latter delays the completion of the reads or writes,
and since the test samples are relatively small in size, it results in low "calculated"
performance values. Because the stride size is constant in most benchmarks and the page size
is constant too, these values produce a saw-tooth pattern in the performance graph, simply as
a consequence of the periodicity of the two address patterns. Some benchmarks use test
patterns that do not work well with SSDs and thus generate artefacts. Benchmarks therefore
cannot be viewed as fully representative of SSD performance.
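The block-boundary effect described above can be made concrete with a short sketch; the sample, stride, and block sizes are arbitrary illustrative values:

```python
# Count benchmark samples that straddle a flash block boundary. With a
# constant stride and a constant block size, the straddling positions
# recur periodically, which is what produces saw-tooth transfer graphs.

def crossings(num_samples, sample_kb, stride_kb, block_kb):
    count = 0
    for i in range(num_samples):
        start = i * stride_kb
        end = start + sample_kb - 1
        if start // block_kb != end // block_kb:   # sample spans two blocks
            count += 1
    return count

print(crossings(8, 64, 192, 512))   # 0: stride and sample align with blocks
print(crossings(6, 64, 100, 512))   # 1: one sample straddles a boundary
```

Whether a run reports many or few boundary crossings depends only on how the benchmark's fixed stride happens to align with the drive's block size, not on the drive's true performance.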
Chapter 7
7 . Reverse engineering
As seen in section 5.3.3, the Crucial Real C300 performed better than the Intel X25-E. This
chapter takes a deeper look at the system-level structure of the two solid state drives and
tries to analyze the factors behind the difference in performance. To this end, both the
Intel X25-E and the Crucial Real C300 were reverse engineered.
Reverse engineering is the process of discovering the technological principles of a human
made device, object or system through analysis of its structure, function and operation. It
often involves taking something (a mechanical device, electronic component, or software
program) apart and analyzing its workings in detail to be used in maintenance, or to try to
make a new device or program that does the same thing without using or simply duplicating
(without understanding) any part of the original.
Reverse engineering has its origins in the analysis of hardware for commercial or
military advantage. The purpose is to deduce design decisions from end products with little or
no additional knowledge about the procedures involved in the original production.
The basic building blocks of a solid state drive are the flash chip array, the host interface,
and the controller chip, which holds the other two together and manages the entire system.
The Intel X25-E and Crucial Real C300 are analyzed in terms of these building blocks.
7.1 Intel X25-Extreme
The Intel X25-E [16] board, shown below in Figure 33, mainly carries three kinds of chips: a
set of flash chips, a controller, and a DRAM chip.
The Intel X25-E uses 50 nm Single Level Cell (SLC) flash to build its flash array. It employs
a 10-channel storage controller backed by 16 MB of cache, provided, interestingly, by a
Samsung K4S281632I-UC60 SDRAM chip. The storage controller is a sophisticated Intel design
supporting not only SMART monitoring but also Native Command Queuing (NCQ). NCQ was
originally designed to compensate for the rotational latency inherent to mechanical hard
drives; here, the SATA drive's ability to queue and re-order commands is instead used to
maximize execution efficiency, since a little time passes (time is of course relative for an
SSD whose access latency is measured in microseconds) between when the system completes one
request and issues the next.
Figure 33: Intel X25 – Extreme SSD
The Intel X25-E is compatible with SATA at 1.5 Gbps and 3 Gbps. The flash packages, of course,
are only the building blocks of an SSD; much of the magic comes from the architecture and
optimizations of the SSD controller logic.
7.1.1 Controller Analysis
The Intel X25-E controller was scanned and opened up to gather more information about its
internal structure. The decapsulation pictures of the Intel X25-E are shown below.
Figure 34: Controller from Marvell on Intel X25-E SSD board
The controller is a ball grid array (BGA) package with single-row wire bonding; the bonding
wires are made of gold. Although at first glance the controller chip seems to be from Intel,
the markings on the die indicate that it is from Marvell.
The analog and digital sections of the controller die are clearly distinguishable. The
orientation of the die on the Intel X25-E board clearly reveals the SATA controller, DRAM
controller, and flash controller sections.
The specifications of the controller are given in Table 7-1, Table 7-2 and Table 7-3.
7.2 Crucial Real C300
The Crucial Real C300 [17] features 16 MLC flash memory chips split evenly between the two
sides of the circuit board. The 128 GB model uses 8 GB flash chips with two NAND dies apiece,
supplied by Micron. The flash chips in modern solid state drives usually conform to the Open
NAND Flash Interface (ONFI) 1.0 standard, as in the Intel X25-E, but the Crucial Real SSD's
flash chips follow the much more recent ONFI 2.1 specification.
The ONFI 2.1 specification pushes NAND performance into a new range: 166 MB/s to 200 MB/s. It
is the first NAND specification to specifically address the performance needs of solid state
drives, offering faster data transfer rates in combination with other new technologies such
as SATA 6 Gbps, USB 3.0 and PCI Express Gen2.
Figure 35: Crucial Real C300 SSD
To wring more than 300 MB/s from mechanical hard drives, several of them have to be combined
in a RAID. Solid state drive makers face the same challenge: individual flash chips do not
necessarily offer higher sequential throughput than traditional hard drives, so an SSD
seeking to maximize performance must distribute the load across numerous chips tied to
multiple memory channels, effectively creating a multi-channel array within the confines of a
single drive.
The Crucial Real SSD inherits its 6Gbps Serial ATA support from Marvell's 88SS9174 flash
controller, which supports the TRIM command set. TRIM works in conjunction with Marvell's
garbage collection routine, which runs in the background to reclaim flash pages marked as
available by the command. The frequency with which garbage collection is performed
depends on how the drive is being used and how much free capacity it has available. With
eight memory channels, the Marvell controller is two short of the ten channels Intel squeezed
into its X25E SSD. Crucial claims the C300 SSD can sustain a sequential read rate of
355MB/s when connected to a 6Gbps SATA interface. The drive's sequential read
performance purportedly drops to 265MB/s when using a 3Gbps link.
Flipping the C300's circuit board reveals a DDR memory chip that serves as the drive's cache.
The 128 MB Micron DDR3 DRAM module offers decent cache performance for fast transaction
buffering, which becomes more important with SATA III 6.0 Gbps transfers.
7.2.1 Controller Analysis
Since the Marvell controller provides much better performance, a closer look is worthwhile.
The scanned and opened-up pictures provide more information about its internal structure.
Figure 36: Controller from Marvell on Crucial Real C300 SSD board
Unlike the Intel X25-E controller, the Crucial Real C300 controller die did not yield a clear
picture. However, from the orientation of the die on the board, the SATA and cache
interconnections could be identified. A closer look showed that the controller chip is a ball
grid array (BGA) package with wire bonding. In contrast to the single-row bonding on the
Intel X25-E controller die, the C300's Marvell controller uses three rows of bonding, except
for the SATA signals, which use a single row. The bond pads are neatly arranged in multiple
rows to shrink the die size.
The surface of the die looks like an FPGA, but a detailed analysis showed that it is a mesh
used by Marvell to prevent competitors from copying the design. A deeper analysis would have
been of interest, but due to certain limitations the analysis was concluded at this stage.
7.3 Summary
The reverse engineering analysis of the controllers from the Intel X25-E and Crucial C300 is
summarized in Table 7-1. With the same package technology and an increased number of balls,
the die from the Crucial Real C300 SSD controller is comparatively small. Although both SSDs
use a Marvell controller, the Crucial C300 uses a newer release with improved firmware
features.
7.3.1 Controller Specification
SSD                             Intel X25-E SSD (32GB)   Crucial Real C300 SSD (128GB)
Controller manufacturer, year   Marvell, 2007            Marvell, 2009
Chip size (cm)                  1.9 x 1.9                1.7 x 1.7
Part number                     PC29AS21AA0              88SS9174-BJP2
Package technology              BGA                      BGA
Balls                           409                      521
Die size (mm)                   5.9 x 5.9                4.4 x 4.4
Bonding                         Wire                     Wire
Bonding rows                    1 row                    3 rows (SATA signals: 1 row)
Table 7-1 Controller chip details of the Intel X25-E and Crucial Real C300 SSDs
As seen in section 4.4.1, better performance can be obtained by increasing the number of
channels, giving SSDs the ability to process a number of operations internally in parallel.
The Intel X25-E uses 10 channels with 20 flash chips, yet the Crucial C300 achieves better
performance with only 8 channels and 16 flash chips. This is because the flash chips in the
Crucial C300 use the more advanced ONFI 2.1 interface standard, in contrast to the Intel
X25-E's ONFI 1.0. ONFI 2.1 offers a simplified synchronous flash controller design while
pushing performance to a higher range of 166 MB/s to 200 MB/s. This is summarized in Table 7-2.
7.3.2 Flash Interface
SSD                                Intel X25-E SSD (32GB)   Crucial Real C300 SSD (128GB)
Flash chip                         SLC                      MLC
Number of channels                 10                       8
ONFI standard                      1.0                      2.1
Speed per flash chip (MB/s) [26]   50                       166-200
Cache                              SDRAM                    DDR3 SDRAM
Table 7-2 Flash interface details of the Intel X25-E and Crucial Real C300 SSDs
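As a rough cross-check of Table 7-2, the theoretical flash-interface bandwidth of each drive can be estimated as channels times per-chip interface speed (this sketch assumes one chip streaming per channel at a time and ignores controller and host-link limits):

```python
# Theoretical aggregate flash-interface bandwidth (an upper bound).

def flash_bandwidth_mb_s(channels, per_chip_mb_s):
    return channels * per_chip_mb_s

print(flash_bandwidth_mb_s(10, 50))   # Intel X25-E, ONFI 1.0: 500 MB/s
print(flash_bandwidth_mb_s(8, 166))   # Crucial C300, ONFI 2.1: 1328 MB/s
```

Despite having fewer channels, the C300's faster ONFI 2.1 interface gives it far more internal bandwidth, consistent with the benchmark results.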
The advanced flash interface in the Crucial Real C300 is complemented by a SATA III (6 Gbps)
host interface, while the Intel X25-E supports only SATA II.
7.3.3 Host Interface
SSD                            Intel X25-E SSD (32GB)   Crucial Real C300 SSD (128GB)
SATA interface compatibility   1.5 Gbps, 3 Gbps         1.5 Gbps, 3 Gbps, 6 Gbps
Table 7-3 Interface compatibility of the Intel X25-E and Crucial Real C300 SSDs
Note: in our tests, a common SATA 3 Gbps link was used for both SSDs.
7.4 Conclusion
The latest Marvell SSD controller and the advanced flash chip interface standard found in the
Crucial Real C300 provide higher bandwidth which, backed by a hefty DDR3 buffer, enables the
drive to meet the faster data rates demanded by SATA 6 Gbps. These features allow the Crucial
Real C300 SSD to outperform the Intel X25-E SSD.
The controller is the central component: it ties the interfaces of the surrounding units
together to deliver the combined performance, masking the bottlenecks of the individual
parts. Overall NAND performance is an important factor at a time when faster speeds are a
critical design goal for solid state drives, especially as the interfaces those SSDs connect
to offer faster data rates with Serial ATA 6 Gbps, USB 3.0, and PCI Express Gen2.
Chapter 8
8 . Designing an optimal performance-based SSD system-level architecture and estimating its
controller cost
System designers perform a series of trade-offs when selecting a particular controller for their
target product and target market(s).
The trade-offs include:
Programmatic – cost, schedule, support, warranty, and availability.
Technical – performance, power, package options, features, scalability, and
flexibility.
Other – commonality, compatibility, documentation, development support, testing,
and reputation.
In the process of controller selection, the system designer is also doing the same analysis for
the flash parts and other parts needed in the design. It is an iterative process to find the right
combination of components to best meet the requirements for the particular product.
Due to proprietary concerns, not all the controller design data is available to the general
public over the Internet. There is however a significant amount of application detail that can
be learned for each of the SSD controllers on the market by studying their use in existing
SSDs. In order to meet the known performance bandwidth specifications of current interface
technologies through the SSD, and to put them into economic perspective, this section
performs a package-level cost-effectiveness analysis of controllers by varying the
system-level architecture of the SSD, taking SSD controller price per performance as the
metric of choice.
8.1 Cost estimation of controller for a system designed to
meet performance specification
8.1.1 MATLAB GUIDE
GUIDE, the MATLAB graphical user interface development environment, provides a set of
tools for creating graphical user interfaces (GUIs). These tools simplify the process of laying
out and programming GUIs.
Using the GUIDE Layout Editor, the user can populate a GUI by clicking and dragging GUI
components (such as axes, panels, buttons, text fields, and sliders) into the layout area.
The user can also create menus and context menus for the GUI. From the Layout Editor, the
user can size the GUI, modify the component look and feel, align components, set the tab
order, view a hierarchical list of the component objects, and set GUI options.
A tool was created using MATLAB GUIDE whose GUI presents options for the user to design a
custom system. The tool determines the controller size and its cost for the designed system.
The tool is a system-level optimization tool, designed to optimize the different interfaces
in a solid state storage system to obtain the best performance and cost for a desired host
interface. It determines the controller that meets the performance of the host interface by
varying the quantity of the other selected SSD components.
8.2 Implementation factors in optimization tool
While designing the optimization tool, several factors, listed below, were taken into
consideration.
Performance Factors
Host Interface
Host Interface   Coding technique   Data rate   Transfers/second   Clock
SATA 2.0         8b/10b             300 MB/s    3 GT/s             3 GHz
SATA 3.0         8b/10b             600 MB/s    6 GT/s             6 GHz
PCI-e 2.0*       8b/10b             500 MB/s    5 GT/s             5 GHz
PCI-e 3.0*       128b/130b          ~1 GB/s     8 GT/s             8 GHz
Table 8-1 System interface types and their performance
* Per Lane
Buffer Cache Interface
Cache Type Transfers/second
DDR1 200-400 MT/s
DDR2 400-1066 MT/s
DDR3 800-2133 MT/s
Table 8-2 Buffer Cache types and their performances
When setting the buffer cache interface for the designed system, its bandwidth is dimensioned
to four times the maximum performance of the selected host/system interface.
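The 4x rule can be sketched as below; the per-generation transfer rates are the upper values of Table 8-2, and the 8-byte (64-bit) channel width is an assumption of this sketch, not a value from the tool:

```python
# Buffer cache sizing: the cache interface is dimensioned to 4x the
# host interface's maximum data rate.

DDR_MT_S = {"DDR1": 400, "DDR2": 1066, "DDR3": 2133}   # mega-transfers/s

def buffer_channels_needed(host_mb_s, ddr_type, bytes_per_transfer=8):
    target_mb_s = 4 * host_mb_s                    # the 4x rule
    per_channel_mb_s = DDR_MT_S[ddr_type] * bytes_per_transfer
    return -(-target_mb_s // per_channel_mb_s)     # ceiling division

print(buffer_channels_needed(600, "DDR3"))   # SATA 3.0 host with DDR3: 1 channel
```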
Flash Interface
The flash interface is the part of the solid state drive that determines the performance of
the designed system. The performance is controlled by varying the number of flash channels,
the number of flash chips per channel, and the channel width. The tool offers two flash
options, Single Level Cell (SLC) and Multi Level Cell (MLC), to be chosen while designing the
system.
Note: the flash read performance per chip per channel is treated as a variable, as it depends
on the manufacturer.
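The tool's over/under-design check can be sketched as follows; the 2x slack threshold for flagging over-design is an assumption of this sketch, and the per-chip flash speed is a free parameter just as it is in the tool:

```python
# Compare theoretical aggregate flash bandwidth against the selected
# host interface's maximum data rate.

HOST_MB_S = {"SATA 2.0": 300, "SATA 3.0": 600}

def design_check(host, channels, chips_per_channel, chip_mb_s):
    flash_mb_s = channels * chips_per_channel * chip_mb_s
    host_mb_s = HOST_MB_S[host]
    if flash_mb_s < host_mb_s:
        return "under-designed"       # flash cannot saturate the host link
    if flash_mb_s > 2 * host_mb_s:    # assumed slack threshold
        return "over-designed"        # flash bandwidth is wasted
    return "optimal"

print(design_check("SATA 3.0", 8, 1, 50))    # 400 < 600 MB/s: under-designed
print(design_check("SATA 2.0", 10, 1, 50))   # 500 MB/s vs 300: optimal
```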
Controller Size Factors
While calculating the controller size resulting from the designed system, the different
interfaces to be handled and their resulting signal pins are considered. The table below
lists the signal pins considered for each interface on the controller.
Signals considered per interface:
Flash chips: control signals (chip select, write enable, command latch enable, ready/busy, reset/write protect); data signal (DQ)
DRAM**: control signals (CK, CK#, CKE, RST#, RAS#, CAS#, WE#, CS#, ODT); memory address (MA); bank address (BA); data signals (DQ, DQS, DQS#, DM)
SATA: control signal (CLK); data signals (Transmitter[+,-], Receiver[+,-])
PCI-Express*: control signals (REF_CLK_P, REF_CLK_N); data signals (PET_P[x:0], PET_N[x:0] transmit; PER_P[x:0], PER_N[x:0] receive)
UART: control signal (CLK); data signals (Transmitter, Receiver)
Table 8-3 SSD Controller interface signals
* X: 2^Number of lanes ** More signals with varying quantity
The power to ground pin ratio for signal pins is set as variable.
Controller Cost Factors
The tool calculates the controller cost for two package types: flip-chip BGA (FCBGA) and
wire-bonded BGA. The packaging cost for FCBGA and wire-bonded BGA is calculated as a function
of variables such as die cost, number of I/Os, wafer-level die yield, and assembly process
yield. The cost of the designed controller is calculated in two parts: the cost of the die
and the cost of the package.
The size of the die depends on the number of I/O pads, the pad pitch, and their arrangement.
Package type FCBGA Wirebonded BGA
Pad pitch (microns) 150 80
Bond pad configuration Area array pads Peripheral pads
The gross die per wafer (DPW) for a wafer of diameter d (mm) and die size S (mm²) can be
estimated by the standard expression

DPW = π·(d/2)² / S - π·d / √(2·S)

The cost of the die is then given by

Cost per Die ($) = Wafer cost ($) / (DPW × Die yield)

The cost of the respective package depends on the cost of each process step indicated in
Figure 37.
Figure 37: Process flow for flip-chip BGA and wire-bonded BGA packaging.
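The calculation can be sketched end to end; the die-per-wafer and cost formulas are the standard textbook ones, the pad pitches and pad arrangements follow the table above, and the 300 mm wafer, $3000 wafer cost, and 90% yield are illustrative assumptions:

```python
import math

def die_side_mm(num_pads, pitch_um, area_array):
    """Estimate the die edge length from the I/O pad count and pad pitch."""
    if area_array:                       # FCBGA: pads in a square area array
        pads_per_side = math.ceil(math.sqrt(num_pads))
    else:                                # wire-bonded BGA: peripheral pads
        pads_per_side = math.ceil(num_pads / 4)
    return pads_per_side * pitch_um / 1000.0

def gross_die_per_wafer(d_mm, die_mm2):
    """DPW = pi*(d/2)^2/S - pi*d/sqrt(2*S), the edge-loss-corrected estimate."""
    return int(math.pi * (d_mm / 2) ** 2 / die_mm2
               - math.pi * d_mm / math.sqrt(2 * die_mm2))

def cost_per_die(wafer_cost_usd, d_mm, die_mm2, die_yield):
    return wafer_cost_usd / (gross_die_per_wafer(d_mm, die_mm2) * die_yield)

# 521 pads at a 150 um area-array pitch (a C300-class pin count):
side = die_side_mm(521, 150, area_array=True)
print(round(side, 2))                                   # 3.45 mm per edge
print(round(cost_per_die(3000, 300, side * side, 0.9), 2))
```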
8.2.1 The properties of the tool are as defined below
1. Verifies the designed system for optimality by warning about over-design and under-design
relative to the selected system interface's maximum performance.
2. Estimates the number of pins on the controller chip based on the designed system.
3. Based on the number of pins indicated, the die size required and the cost is calculated
with selected technology and resources.
4. Calculates the cost of the estimated controller chip for the designed system.
Figure 38: Design tool outlook
Figure 39: Warning- System is over designed or under designed with respect to performance specified.
Figure 40: Cost calculation tool
8.2.2 Advantages of the tool
The user can design the SSD system architecture for the highest degree of performance by
varying the flash type, the buffer type, their sizes, the number of channels, and the
channel bandwidth.
The user can set the performance of the flash modules (one of the main contributors to
system performance), based on which the tool calculates the overall SSD system performance.
Based on the flash module specifications, the tool calculates the system capacity.
The tool warns the user when the system is over- or under-designed.
The tool automatically suggests the number of buffer channels required, based on the type,
size, and channel bandwidth of the buffer selected.
The tool provides options to select the controller package type before calculating the cost.
8.2.3 Limitations
The tool has a limited number of package options when calculating the cost.
The performance values indicated by the tool during design are all theoretical.
8.3 Optimization tool consistency test for controller size
Systems were designed using the optimization tool with SATA 2.0 and SATA 3.0 interfaces. The
controller sizes of the designed systems are compared below with those of the Intel X25-E and
Crucial Real C300.
SSD controller (system interface)   Tool design (pins/balls)   Company design (pins/balls)
Intel X25-E (SATA 2.0)              448                        409
Crucial C300 (SATA 3.0)             554                        521
Figure 41 : Controller size for the system with SATA 2.0 interface
Figure 42 : Controller size for the system with SATA 3.0 interface
The values from the tool indicate that the controller sizes are comparable. The difference
could be due to factors such as the signal-to-power pin ratio; the values serve only for
comparison.
The use of the tool for optimal system design and controller cost estimation is illustrated
in Appendix B.
8.4 Hints to use tool for optimal system design and
controller cost estimation:
Select the desired host interface to set the performance specification for the system to
be designed.
Fill in inputs such as the flash chip performance and the power/ground-to-I/O pin ratio,
and select the desired cache type.
While designing the system, watch for the warning message about over-design or under-design
of the system. This helps the user design an optimal system architecture for the performance
specified in the first step.
Press the Calculate button to obtain the total number of pins (balls) on the controller chip
for the designed system.
To obtain the controller cost for the designed system, press "Die Cost ($)" at the bottom
left of the screen, select the desired node technology and package type, and finally press
the "Chip cost" button to view the cost.
Chapter 9
9 . Summary
9.1 Conclusion
In conclusion, the common-sense intuition that flash-based solid state drives (SSDs) provide
superior performance for large read I/O is validated. As shown, SSDs are several times faster
for reads on average, and dramatically faster in access time, compared to hard disk drives.
Solid state drives are also comparatively more efficient in power consumption.
Heavily used transactional databases with an excessive random I/O workload benefit the most
from SSD technology, as it additionally helps to negate disk configuration issues. With HDD
devices, how the database structure is laid out, the number of spindles, and so on are
critical; for SSD-based systems, it matters little how the data is laid out, or whether
column- or row-oriented storage is used, as all of the data space ultimately delivers the
same performance.
Although the result may be application specific, the test results indicate that, with regard
to performance, investing in solid state drives pays off better than investing in additional
Random Access Memory.
The system-level optimization tool simulates scenarios that help in studying solid state
storage systems and in exploring trade-offs to enhance system performance while saving cost
before a practical implementation. The tool is very effective in designing a solid state
storage system architecture for the best performance for a desired host interface. A detailed
analysis of the factors considered in the tool helps guide the design decision and clarifies
the effect of the variables on the controller cost.
Despite the tremendous hype, the genuine excitement about the potential of SSDs, and the rate
at which manufacturers are improving the technology, seeing SSDs dominate the market is still
some way from reality at this point.
9.2 Future work on SSD
DRAM-based SSDs will continue as a niche product, as cost and capacity will continue to limit
their use. If this continues, improvements will have to be made in data management to make
better use of these SSDs, bringing a tiered storage architecture into an area that was
traditionally just a flat file system. Developing middleware software to take advantage of
SSDs, however, will require a much longer time frame than simply improving an existing
product, along with a certain amount of discipline to manage a more graduated storage
architecture with a better level of overall management.
The controller is a vital part of a solid state drive, and companies have to focus on
developing better architectures. SSD controller technology has to be targeted in parallel
with the flash interface. Major SSD designs have already moved from 4-channel to 10-channel
controllers, and controllers with even more channels will have to be implemented, allowing
SSDs to perform much faster.
Improvements in MLC technology in terms of reliability, capacity and cost will increase the
appeal of SSDs. Alternative technologies such as phase-change memory (PRAM) and resistive
memory (RRAM) are also waiting in the wings, and may eventually offer more appealing cost
and performance than SSDs can achieve.
Appendix A
Building TPC-H benchmark
The TPC Benchmark H (TPC-H) is a decision support benchmark. It consists of a suite of
business oriented ad-hoc queries and concurrent data modifications. The queries and the data
populating the database have been chosen to have broad industry-wide relevance. This
benchmark illustrates decision support systems that examine large volumes of data, execute
queries with a high degree of complexity, and give answers to critical business questions.
The TPC-H benchmark is an embedded SQL database application: it connects to the database and
executes embedded SQL statements, which are embedded within a host-language application.
To build the TPC-H benchmark, TPC provides a set of tools, namely a database population
generator (DBGEN) and a query template translator (QGEN). With IBM DB2 as the platform, DBGEN
provides the data for the database and QGEN provides the SQL queries. Using these, the TPC-H
benchmark application can be created in two parts: creating the TPC-H database, and creating
a package binding the query source file to the database.
TPC-H database generation
DBGEN generates 8 separate ASCII files, each containing pipe-delimited data.
Create 8 tables under a database schema named TPC-H and import each ASCII file into the
corresponding table defined in the TPC-H database schema.
Assign keys by altering the tables in the TPC-H database as per the TPC-H specification [13].
TPC-H application package
The source file is created by embedding the TPC-H queries/SQL statements in the 'C'
programming language. To run applications written in compiled host languages, the packages
needed by the database manager at execution time must first be created. Figure 43 shows the
order of these steps, along with the various modules of a typical compiled DB2 application [4].
1. Create source files that contain programs with TPC-H queries.
2. Connect to a TPC-H database generated using DBGEN, then precompile each source
file to convert the embedded SQL source statements into a form the database manager
can use.
Figure 43: Procedure to create the application package
[The precompiler converts embedded SQL statements directly into DB2 run-time
services API calls. When the precompiler processes a source file, it specifically looks
for SQL statements and leaves the non-SQL host language untouched. PRECOMPILE (PREP) is an
application process that modifies source files containing embedded SQL statements
(*.sqc) and yields source files (*.c) consisting of host-language calls, plus a
package. It is at precompile time that the TIMESTAMP, also known as
the UNIQUE ID or CONSISTENCY TOKEN, is generated and associated with the
package through the bind file and the modified source code.]
3. Compile the modified source files (*.c) using the host language compiler.
4. Link the object files with the DB2 and host language libraries to produce an
executable program.
Compiling and linking (steps 3 and 4) create the required object modules.
5. The BIND command invokes the bind utility. It prepares the SQL statements stored in the
bind file generated by the precompiler and creates a package that is stored in the
database. Bind the bind file to create the package, and rebind it if a different
database is to be accessed. Binding creates the package used by the database manager
when the program is run.
6. Run the TPC-H benchmark application. The application accesses the TPC-H database
using the access plans.
Appendix B
System level optimization tool
This appendix illustrates the system level optimization tool, which is designed to optimize the
different interfaces in a solid state storage system so as to obtain the best performance and
cost for a desired system interface.
The different interfaces to the controller that influence the performance of solid state drives:
Host Interface
Flash Interface
    Number of channels
    Channel width
    Flash chip read performance
Buffer cache Interface
    Cache type
    Cache standard
    I/O channel width
    Number of channels
    Cache size
The tool is operated in two sections:
Section 1: Designing a solid state drive for optimal performance
The tool provides options to select the different interfaces mentioned above while designing the
solid state storage system. Based on the selected system interface (Host Interface) and its
maximum performance, the tool focuses on performance optimality: it warns if the
system is over- or under-designed as the different interfaces that influence the
performance are selected. This is illustrated below in steps.
Step 1: Select the desired Host Interface, the critical factor around which the
system is designed. The system is designed to match the maximum performance offered by the
selected Host Interface; here SATA 2.0 is selected for illustration.
Step 2: Enter all the parameters needed to design the system and calculate its performance,
such as the flash chip read performance and the signal-to-power pin ratio.
Step 3: Select the type of cache and its standard so that the number of cache channels can be
varied suitably. The buffer cache channels are selected such that the buffer
cache is 4 times faster than the selected system interface. Here DDR2 is selected and the flash
chip read performance is entered as 25 ns (nanoseconds).
Step 4: The Signal-pins button calculates the designed system performance along with the
number of balls on the designed controller. In this case the tool shows a system warning
because the system is under-designed for the desired system interface: the desired and
designed system performances are 300 MBps and 80 MBps respectively. The Vcc-Vdd/IO pin
ratio is taken as 1 (every 2 I/O pins require 1 Vdd and 1 Vcc).
Step 5: To increase the performance, the number of parallel flash channels is increased;
in this case from 2 to 4 channels.
Step 6: The system warning indicates that the system is still under-designed, so either the
number of flash channels or the channel width must be increased to attain better
performance. In this case the channel width is increased from 8 bits to 16 bits.
Step 7: When the designed system performance matches the desired system performance
within +/- 10 percent, the warning stops, indicating that the system design is optimal
with respect to the selected Host Interface. The designed system is optimal in performance
and the resulting controller has 340 pins (balls).
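The numbers in Steps 4 through 7 are consistent with a simple model in which each flash channel transfers one word of its width per read cycle, with the channels operating in parallel. A minimal sketch (the function names, and the exact formula the tool uses, are assumptions reconstructed from the figures above):

```python
def designed_throughput_mbps(channels, width_bits, read_cycle_ns):
    # Each channel moves width_bits/8 bytes per read cycle; channels run
    # in parallel. Result is in MB/s (bytes per ns times 1000).
    return channels * (width_bits / 8) / read_cycle_ns * 1000

def is_optimal(designed_mbps, desired_mbps, tolerance=0.10):
    # Step 7's criterion: designed performance within +/-10% of desired
    return abs(designed_mbps - desired_mbps) <= tolerance * desired_mbps

# Step 4: 2 channels x 8 bit at 25 ns -> 80 MB/s, well under SATA 2.0's 300 MB/s
step4 = designed_throughput_mbps(2, 8, 25)
# Steps 5-6: 4 channels x 16 bit -> 320 MB/s, within +/-10% of 300 MB/s (Step 7)
step7 = designed_throughput_mbps(4, 16, 25)
```

Under these assumptions the model reproduces both the 80 MBps under-designed case and the 320 MBps optimal case.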
Step 8: If the designed system exceeds the maximum performance of the host interface, the tool
warns of an over-designed system. The system should then be altered by comparing the host
interface performance and the designed system performance.
Step 9: The designed system capacity can be increased by varying the Flash Chips/Channel
menu and by selecting the number of dies per flash chip.
Section 2: Cost calculation of the controller for the designed Solid State Drive
Continuing the previous example, the designed system has 340 pins (balls), as seen in Section 1,
Step 7. This section of the tool calculates the cost of the die based on the resulting number
of pins along with selected parameters such as the node technology, the wafer diameter and
the package. By considering similar packages available from Texas Instruments, the cost of
the controller of the designed system can be estimated.
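The tool's exact cost model is not listed here, but a die-cost estimate of this kind typically follows the dies-per-wafer and cost formulas given in Patterson and Hennessy [18]. A sketch under that assumption; all numeric inputs below are illustrative, not the tool's actual parameters, and package cost is excluded:

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    r = wafer_diameter_mm / 2
    # Usable dies: wafer area over die area, minus a correction for
    # partial dies lost along the wafer's rim (Patterson & Hennessy)
    return (math.pi * r**2 / die_area_mm2
            - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def die_cost(wafer_cost, wafer_diameter_mm, die_area_mm2, yield_fraction):
    # Cost per good die: wafer cost spread over the yielding dies
    return wafer_cost / (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                         * yield_fraction)

# Illustrative inputs: 300 mm wafer, 100 mm^2 controller die,
# $3000 per wafer, 80% die yield
cost = die_cost(3000.0, 300, 100.0, 0.8)
```

The node technology and pin count enter such a model through the die area and the package choice; the tool's quoted $9.89 figure includes the package as well.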
The cost of the controller chip for the designed system is approximately $9.89.
Bibliography
[1] R. Bez, E. Camerlenghi, A. Modelli, and A. Visconti. Introduction to flash memory.
Proceedings of the IEEE, 91(4):489–502, April 2003.
[2] Intel X25-E Extreme SATA Solid-State Drive.
http://download.intel.com/design/flash/nand/extreme/319984.pdf
[3] http://onfi.org/specifications/
[4] IBM DB2 Guide -IBM public library
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.
apdv.embed.doc/doc/c0021136.html
[5] Cagdas Dirik and Bruce Jacob. The performance of PC solid-state disks (SSDs) as a
function of bandwidth, concurrency, device architecture, and system organization. In ISCA
'09: Proceedings of the 36th annual international symposium on Computer architecture,
pages 279–289, New York, NY, USA, 2009. ACM.
[6] Nitin Agrawal, Vijayan Prabhakaran. Design tradeoffs for SSD performance.
www.usenix.org/event/usenix08/tech/full_papers/agrawal/agrawal.pdf
[7] David Roberts, Taeho Kgil, and Trevor Mudge. Integrating NAND flash devices onto
servers. Commun. ACM, 52(4):98–103, 2009.
[9] Super Talent Technology. SLC vs. MLC: An Analysis of Flash Memory.
http://www.supertalent.com/datasheets/SLC_vs_MLCwhitepaper.pdf.
[10] Trends in Enterprise Hard Disk Drives, Seiichi Sugaya (June 30, 2005)
http://www.fujitsu.com/downloads/MAG/vol42-1/paper08.pdf
[11] Intel Corporation. Intel - Understanding the Flash Translation Layer (FTL) Specification.
http://www.embeddedfreebsd.org/Documents/Intel-FTL.pdf, 1998.
[12] Imation. Solid State Drives - Data Reliability and Lifetime.
http://www.imation.com/PageFiles/83/SSD-Reliability-Lifetime-White-Paper.pdf.
[13] TPC Benchmark H - www.tpc.org/tpch/spec/tpch2.1.0.pdf
[14] HD tune pro manual, hdtunepro.pdf
[15] Tom's hardware. Flash SSD Update: More Results, Answers.
http://www.tomshardware.com/reviews/ssd-hard-drive,1968-4.html.
[16] Intel® X25-E Extreme SATA Solid-State Drives
http://download.intel.com/design/flash/nand/extreme/319984.pdf
[17] RealSSD™ C300 2.5 Technical Specifications – Crucial-
www.crucial.com/pdf/Datasheets-letter_C300_RealSSD_v2-5-10_online.pdf
[18] Computer Organization and Design: The Hardware/Software Interface by David
A. Patterson, John L. Hennessy (pages 450-475)
[19] en.wikipedia.org/wiki/computer_data_storage
[20] Intel: Disk Interface Technology, Quick reference guide – NP2108.pdf 1040211
[21] Simona Boboila and Peter Desnoyers. Write Endurance in Flash Drives: Measurements
and Analysis. http://www.usenix.org/event/fast10/tech/full_papers/boboila.pdf
[22] Intel High Performance Solid State Drive - Solid State Drive Frequently Asked Questions
http://www.intel.com/support/ssdc/hpssd/sb/CS-029623.htm#5
[23] http://en.wikipedia.org/wiki/Benchmark_(computing)
[24] Barracuda 7200.10 – www.seagate.com/docs/pdf/datasheet/ds_7200_10.pdf
[25] en.wikipedia.org
[26] http://onfi.org/wp-content/uploads/2011/03/20100818_S104_Grunzke.pdf
[27] http://www.novopc.com/2008/09/hard-disk/
[28] http://www.easy-computer-tech.com
[29] http://www.ramsan.com/resources/SSDOverview
[30] http://www.datarecoverytools.co.uk
[31] www.bit-tech.net
[32] http://tjliu.myweb.hinet.net
Index
A
Addressing, 22
Advanced Technology Attachment, 25
B
ball grid array package (BGA), 65
Benchmark, 49
C
cache, 12, 14, 23, 24, 25, 29, 37, 41, 54, 58, 63, 67
Cell degradation, 38
Controller, 40
Crucial Real C300, 65
D
Disk access time, 21
E
Erase Block, 38
F
FLASH MEMORY, 34
Flash Structure, 37
Flash Translation Layer, 40
G
Garbage collection, 42
H
Hard Disk Drives, 19
HD Tune, 56
I
Intel X25-Extreme, 63
M
Marvell, 65, 66, 67, 68, 69
MATLAB GUIDE, 71
Memory, 11
MLC, 35
O
Offline Storage, 16
P
Page, 37
PCI Express, 44
Primary storage, 14
Processor cache, 14
R
RAM, 12
Reverse engineering, 63
ROM, 11
Rotational latency, 21
S
Secondary Storage, 16
Seek time, 21
Serial Advanced Technology Attachment, 27
SLC, 35
Small Computer System Interface, 26
Solid State Drives, 32
Storage Hierarchy, 13
System Architecture, 9
T
Tertiary Storage, 16
TPC-H, 50, 51, 52, 60, 81, 82, 83
Trim, 43
W
Wear-leveling, 42
Write Amplification, 43