1
UC Berkeley
A Disk and Thermal Emulation Model for RAMP
Zhangxi Tan and David Patterson
2
Outline
• Introduction and retrospective overview
• Improvement since June 06
• Disk and temperature emulation
• Future work
3
June '06 status
• Internet-in-a-Box version 0
– 3 Xilinx XUP boards ($299 × 3) with 12 processors
– uClinux and a research application (i3)
• Limitations
– Software base is poor
• No MMU, no fork, no full version of Linux
• All software needs porting
– Processor is too slow (100 MHz vs. 3 GHz)
– No local storage per node
[Figure: network diagram of the three XUP boards connected through a router over 100 Mbps Ethernet, with links to other devices and to the external network; each node has a 192.168.x.y address]
4
Improvements: Jun '06 → Jan '07
Processor
• MicroBlaze → LEON3
• 32-bit RISC microcontroller → 32-bit SPARC V8
• No MMU → MMU with configurable TLB
• Single-precision floating point → IEEE 754 floating point
• Direct-mapped cache → direct-mapped/set-associative cache
OS and Software
• uClinux 2.4 (no protection, no fork) → full Linux 2.6.18.1
• All software needs porting → runs latest Debian GNU/Linux binaries directly (supports apt-get)
Others
• No disk emulation → emulate local disk with Ethernet-attached storage
• Slow processor only → emulate fast systems with "time dilation"; emulate system temperature
5
Agenda
• Introduction and retrospective overview
• Improvement since June 06
• Disk and temperature emulation
• Future work
6
Disk and Thermal Emulation
• Local disk is an essential part of a datacenter
– Local physical storage
– Variable disk specifications (a VM only has a functional module)
– In the context of real workloads
• Temperature is a critical issue in the DC
– Cooling, reliability
– How the workload affects datacenter temperature is an interesting topic
7
Methodology
• HW emulator (FPGA): 32-bit LEON3, 50 MHz, 90 MHz DDR memory, 8 KB L1 cache (4 KB instruction and 4 KB data)
– Target system: Linux 2.6 kernel, 50 MHz – 2 GHz
• PC: storage, trace logger, and model solver (offline or online)
[Figure: emulation architecture — the FPGA runs the target platform (32-bit LEON) with an AoE kernel driver and activity monitors; it connects over Ethernet to a PC hosting the AoE parser, storage, DiskSim (timing info), and the Mercury thermal emulator]
• Emulating an IDE disk with Ethernet-based network storage (ATA over Ethernet) plus DiskSim
– AoE: encapsulates IDE commands in Ethernet packets
– DiskSim: a widely used disk simulator (provides access timing based on the disk specification)
• Thermal emulation is done with the Mercury suite (ASPLOS '06)
– CPU/disk activities are sampled periodically and sent to a central emulator
– The emulator takes the system configuration and predicts temperature from Newton's law of cooling
– Disk state helps with power estimation
• Time dilation makes the "target" look faster
– Reprogram the HW timer to make 'jiffies' longer in wall-clock terms
– Slow down memory accordingly when speeding up the processor
8
Experiments
• Thermal emulation model (validated in Mercury)
– Physical layout from a Dell PowerEdge 2850
• 3 GHz Xeon, 10K RPM SCSI
• Emulated disk model (validated disk model in DiskSim)
– Seagate Cheetah 9LP
• 10K RPM, 5 ms average seek time
• Several programs run in the target system with various time dilation factors
– Dhrystone: a CPU-intensive benchmark
– PostMark: a file system benchmark (disk-intensive)
– Unix commands with a pipe (both disk- and CPU-intensive)
• cat alargefile | grep 'a search pattern' > searchresultfile
• 100 MB file size
• Emulation output
– Performance statistics
– System temperature
9
Dhrystone results (w/o memory TD)
• How close is this to a 3 GHz x86 (~8000 Dhrystone MIPS)? The gap comes down to memory, cache, and CPI.
10
Dhrystone with memory TD
• Memory access latency held constant: 90 MHz DDR DRAM with 200 ns latency for all targets (50 MHz to 2 GHz)
• The latency is pessimistic, but it reflects the trend
11
Postmark file system benchmark
• The speed-up factor is larger than the TDF (overhead)
• How close to a modern SATA disk? Twice the throughput when running the same benchmark.
12
Disk emulation performance
• Overhead analysis
– <1.4 ms to send a packet (no zero-copy; VM)
– Bursts of requests (service time < 10 ms, including DiskSim); AoE protocol segmentation
– A larger TDF offsets the overhead
• Overall, the emulated disk time is still a little longer than the simulated timing in DiskSim (~2.8 ms)
13
Emulated disk R/W time in the target
• Fairly deterministic results across different TDFs
14
CPU Temperature Emulation
[Panels: 50 MHz, 250 MHz, 500 MHz, 1 GHz, 2 GHz]
• Calibration is needed to get the correct absolute values
• The trend is accurate
15
Disk Temperature Emulation
[Panels: 50 MHz, 250 MHz, 500 MHz, 1 GHz, 2 GHz]
16
Limitations and Conclusion
• Limitations
– AoE limits the maximum number of R/W sectors to 2 (an Ethernet packet-size limitation)
– Naïve memory dilation (constant delay)
• Conclusion
– Disk emulation in SW is fairly "lightweight", provided that
• Time dilation makes the SW disk fast enough
• A separate network channel is used for disk emulation
• Future work
– A better statistical time dilation model (CPI, distribution) with still-simple HW
– Emulating a real-life disk controller (e.g., Intel ICH) for less overhead