eecs 252 graduate computer architecture lec 17 – advanced memory hierarchy 2

34
EECS 252 Graduate Computer Architecture Lec 17 – Advanced Memory Hierarchy 2 David Patterson Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~pattrsn http://vlsi.cs.berkeley.edu/cs252-s06

Upload: hal

Post on 21-Jan-2016

50 views

Category:

Documents


0 download

DESCRIPTION

EECS 252 Graduate Computer Architecture Lec 17 – Advanced Memory Hierarchy 2. David Patterson Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~pattrsn http://vlsi.cs.berkeley.edu/cs252-s06. Review [1]. - PowerPoint PPT Presentation

TRANSCRIPT

  • EECS 252 Graduate Computer Architecture

    Lec 17 Advanced Memory Hierarchy 2

    David PattersonElectrical Engineering and Computer SciencesUniversity of California, Berkeley

    http://www.eecs.berkeley.edu/~pattrsnhttp://vlsi.cs.berkeley.edu/cs252-s06

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Review [1]Memory wall inspires optimizations since so much performance lost thereReducing hit time: Small and simple caches, Way prediction, Trace cachesIncreasing cache bandwidth: Pipelined caches, Multibanked caches, Nonblocking cachesReducing Miss Penalty: Critical word first, Merging write buffersReducing Miss Rate: Compiler optimizationsReducing miss penalty or miss rate via parallelism: Hardware prefetching, Compiler prefetchingAuto-tuners search replacing static compilation to explore optimization space?DRAM Continuing Bandwidth innovations: Fast page mode, Synchronous, Double Data Rate

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Review [2/2]VM Monitor presents a SW interface to guest software, isolates state of guests, and protects itself from guest software (including guest OSes)Virtual Machine RevivalOvercome security flaws of large OSesManage Software, Manage HardwareProcessor performance no longer highest priority

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*OutlineVirtual MachinesXen VM: Design and PerformanceAdministriviaAMD Opteron Memory HierarchyOpteron Memory Performance vs. Pentium 4Fallacies and PitfallsDiscuss Virtual Machines paperConclusion

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Virtual Machine Monitors (VMMs)Virtual machine monitor (VMM) or hypervisor is software that supports VMsVMM determines how to map virtual resources to physical resourcesPhysical resource may be time-shared, partitioned, or emulated in software VMM is much smaller than a traditional OS; isolation portion of a VMM is 10,000 lines of code

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*VMM Overhead?Depends on the workloadUser-level processor-bound programs (e.g., SPEC) have zero-virtualization overhead Runs at native speeds since OS rarely invokedI/O-intensive workloads OS-intensive execute many system calls and privileged instructions can result in high virtualization overhead For System VMs, goal of architecture and VMM is to run almost all instructions directly on native hardwareIf I/O-intensive workload is also I/O-bound low processor utilization since waiting for I/O processor virtualization can be hidden low virtualization overhead

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Requirements of a Virtual Machine MonitorA VM Monitor Presents a SW interface to guest software, Isolates state of guests from each other, and Protects itself from guest software (including guest OSes)Guest software should behave on a VM exactly as if running on the native HW Except for performance-related behavior or limitations of fixed resources shared by multiple VMsGuest software should not be able to change allocation of real system resources directlyHence, VMM must control everything even though guest VM and OS currently running is temporarily using themAccess to privileged state, Address translation, I/O, Exceptions and Interrupts,

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Requirements of a Virtual Machine MonitorVMM must be at higher privilege level than guest VM, which generally run in user mode Execution of privileged instructions handled by VMME.g., Timer interrupt: VMM suspends currently running guest VM, saves its state, handles interrupt, determine which guest VM to run next, and then load its state Guest VMs that rely on timer interrupt provided with virtual timer and an emulated timer interrupt by VMMRequirements of system virtual machines are same as paged-virtual memory: At least 2 processor modes, system and userPrivileged subset of instructions available only in system mode, trap if executed in user modeAll system resources controllable only via these instructions

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*ISA Support for Virtual MachinesIf plan for VM during design of ISA, easy to reduce instructions executed by VMM, speed to emulateISA is virtualizable if can execute VM directly on real machine while letting VMM retain ultimate control of CPU: direct executionSince VMs have been considered for desktop/PC server apps only recently, most ISAs were created ignoring virtualization, including 80x86 and most RISC architecturesVMM must ensure that guest system only interacts with virtual resources conventional guest OS runs as user mode program on top of VMMIf guest OS accesses or modifies information related to HW resources via a privileged instructione.g., reading or writing the page table pointerit will trap to VMMIf not, VMM must intercept instruction and support a virtual version of sensitive information as guest OS expects

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Impact of VMs on Virtual MemoryVirtualization of virtual memory if each guest OS in every VM manages its own set of page tables?VMM separates real and physical memory Makes real memory a separate, intermediate level between virtual memory and physical memorySome use the terms virtual memory, physical memory, and machine memory to name the 3 levelsGuest OS maps virtual memory to real memory via its page tables, and VMM page tables map real memory to physical memoryVMM maintains a shadow page table that maps directly from the guest virtual address space to the physical address space of HWRather than pay extra level of indirection on every memory accessVMM must trap any attempt by guest OS to change its page table or to access the page table pointer

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*ISA Support for VMs & Virtual MemoryIBM 370 architecture added additional level of indirection that is managed by the VMM Guest OS keeps its page tables as before, so the shadow pages are unnecessary(AMD Pacifica proposes same improvement for 80x86)To virtualize software TLB, VMM manages the real TLB and has a copy of the contents of the TLB of each guest VMAny instruction that accesses the TLB must trapTLBs with Process ID tags support a mix of entries from different VMs and the VMM, thereby avoiding flushing of the TLB on a VM switch

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Impact of I/O on Virtual MemoryI/O most difficult part of virtualizationIncreasing number of I/O devices attached to the computer Increasing diversity of I/O device typesSharing of a real device among multiple VMsSupporting many device drivers that are required, especially if different guest OSes are supported on same VM systemGive each VM generic versions of each type of I/O device driver, and let VMM to handle real I/OMethod for mapping virtual to physical I/O device depends on the type of device:Disks partitioned by VMM to create virtual disks for guest VMsNetwork interfaces shared between VMs in short time slices, and VMM tracks messages for virtual network addresses to ensure that guest VMs only receive their messages

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Example: Xen VMXen: Open-source System VMM for 80x86 ISA Project started at University of Cambridge, GNU license modelOriginal vision of VM is running unmodified OSSignificant wasted effort just to keep guest OS happyparavirtualization - small modifications to guest OS to simplify virtualization 3 Examples of paravirtualization in Xen:To avoid flushing TLB when invoke VMM, Xen mapped into upper 64 MB of address space of each VM Guest OS allowed to allocate pages, just check that didnt violate protection restrictions To protect the guest OS from user programs in VM, Xen takes advantage of 4 protection levels available in 80x86 Most OSes for 80x86 keep everything at privilege levels 0 or at 3.Xen VMM runs at the highest privilege level (0) Guest OS runs at the next level (1) Applications run at the lowest privilege level (3)

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Xen changes for paravirtualizationPort of Linux to Xen changed 3000 lines, or 1% of 80x86-specific code Does not affect application-binary interfaces of guest OSOSes supported in Xen 2.0

    http://wiki.xensource.com/xenwiki/OSCompatibility

    OS Runs as host OSRuns as guest OS Linux 2.4 Yes Yes Linux 2.6 Yes Yes NetBSD 2.0 No Yes NetBSD 3.0 Yes Yes Plan 9 No Yes FreeBSD 5 No Yes

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Xen and I/OTo simplify I/O, privileged VMs assigned to each hardware I/O device: driver domains Xen Jargon: domains = Virtual MachinesDriver domains run physical device drivers, although interrupts still handled by VMM before being sent to appropriate driver domain Regular VMs (guest domains) run simple virtual device drivers that communicate with physical devices drivers in driver domains over a channel to access physical I/O hardware Data sent between guest and driver domains by page remapping

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*CS252: Administrivia Advanced Memory Hierarchy (Ch 5) this weekMonday 4/10: James Laudon on Sun T1/NiagaraPh.D. Stanford 94 multiprocessor architect DASH / SGI OriginWed 4/12 Mon 4/17 Storage (Ch 6)Handout Chapter 6Project Update Meeting Wednesday 4/19Monday 4/24 Quiz 2 5-8 PM (Mainly Ch 4 to 6)Wed 4/26 Bad Career Advice / Bad Talk Advice Project Presentations Monday 5/1 (all day)Project Posters 5/3 Wednesday (11-1 in Soda)Final Papers due Friday 5/5 (email Archana, who will post papers on class web site)

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Xen PerformancePerformance relative to native Linux for Xen for 6 benchmarks from Xen developers

    Slide 6: User-level processor-bound programs? I/O-intensive workloads? I/O-Bound I/O-Intensive?

    CS252 s06 Adv. Memory Hieriarchy 2

    Chart1

    1

    0.9704797048

    0.9186046512

    0.9527421237

    0.956937799

    0.9922779923

    Xen/Linux

    Performance relative to native Linux

    Sheet1

    L5671.00Linux (native)Linux (native)Xen (VM)Vmware (VM)Linux (User Mode)

    X5671.00Xen (VM)SPEC INT20001.001.000.980.97

    V5540.98Vmware (VM)Linux build time1.000.970.790.49

    U5500.97Linux (User Mode)OSDB-IR1.000.920.470.38

    SPEC INT2000 (score)OSDB-OLTP1.000.950.120.18

    L2631.00dbench1.000.960.740.27

    X2710.97SPEC WEB991.000.990.290.33

    V3340.79Xen/LinuxVMware Workstation 3.2User Mode Linux

    U5350.49SPEC INT2000100%1.01.0

    Linux build time (s)Linux build time97%0.80.5

    L1721.00PostgreSQL Inf. Retrieval92%0.50.4

    X1580.92PostgreSQL OLTP95%0.10.2

    V800.47dbench96%0.70.3

    U650.38SPEC WEB9999%0.30.3

    OSDB-IR (tup/s)

    L17141.00

    X16330.95

    V1990.12

    U3060.18

    OSDB-OLTP (tup/s)

    L4181.00

    X4000.96

    V3100.74

    U1110.27

    dbench (score)

    L5181.00

    X5140.99

    V1500.29

    U1720.33

    SPEC WEB99 (score)

    Sheet1

    00

    00

    00

    00

    00

    00

    Xen/Linux

    VMware Workstation 3.2

    Performance relative to native Linux

    Sheet2

    0

    0

    0

    0

    0

    0

    Xen/Linux

    Performance relative to native Linux

    Sheet3

  • *CS252 s06 Adv. Memory Hieriarchy 2*Xen Performance, Part IISubsequent study noticed Xen experiments based on 1 Ethernet network interfaces card (NIC), and single NIC was a performance bottleneck

    CS252 s06 Adv. Memory Hieriarchy 2

    Chart1

    942942849

    18821878849

    24621539849

    24461593849

    Linux

    Xen-privileged driver VM ("driver domain")

    Xen-guest VM + driver VM

    Number of Network Interface Cards

    Receive Throughput (Mbits/sec)

    Figure 8 events web server

    # commands

    # set style data histogram

    # set style fill pattern border -1

    # set yrange [0:4.5]

    # set size 0.6,0.6

    # plot 'file' using ($2/$3):xtic(1) title 2, '' using ($3/$3) title 3, '' using ($4/$3) title 4, '' using ($5/$3) title 5

    # set xlabel "Profiled Hardware Event"

    # set ylabel "Relative costs"

    # set terminal postscript eps

    # profile for the 1 NIC runs

    #profiled parameterLinuxxen-domain0xen-guest0xen-guest1

    arbitLinux (1 CPU)Xen-priviledged driver VM (1 CPU)Xen-guest VM + driver VM (1 CPU)Xen-guest VM + driver VM (2 CPUs)

    "Instr"13005151473422734815

    "L-2"319832001383510470

    "I-TLB"423268781196512159

    "D-TLB"1119130872462627262

    Linux (1 CPU)Xen-priviledged driver VM (1 CPU)Xen-guest VM + driver VM (1 CPU)Xen-guest VM + driver VM (2 CPUs)

    "Instr"13005151473422734815

    1.22.62.7

    "L-2"319832001383510470

    1.04.33.3

    "I-TLB"423268781196512159

    "D-TLB"1119130872462627262

    11.722.024.4

    all_knot_prof.data for figure 8

    Relative to Xen-domain0

    Linux (1 CPU)Xen-priviledged driver VM (1 CPU)Xen-guest VM + driver VM (1 CPU)Xen-guest VM + driver VM (2 CPUs)

    Intructions0.861.002.262.30

    L2 misses1.001.004.323.27

    I-TLB misses0.621.001.741.77

    D-TLB misses0.091.001.882.08

    Figure 8

    Figure 8 events web server

    0000000000000000

    0000000000000000

    0000000000000000

    0000000000000000

    Linux (1 CPU)

    Xen-priviledged driver VM (1 CPU)

    Xen-guest VM + driver VM (1 CPU)

    Xen-guest VM + driver VM (2 CPUs)

    Linux (1 CPU)

    Xen-priviledged driver VM (1 CPU)

    Xen-guest VM + driver VM (1 CPU)

    Xen-guest VM + driver VM (2 CPUs)

    Linux (1 CPU)

    Xen-priviledged driver VM (1 CPU)

    Xen-guest VM + driver VM (1 CPU)

    Xen-guest VM + driver VM (2 CPUs)

    Linux (1 CPU)

    Xen-priviledged driver VM (1 CPU)

    Xen-guest VM + driver VM (1 CPU)

    Xen-guest VM + driver VM (2 CPUs)

    Event count relative to Xen-priviledged driver domain

    Figure 7 events web server

    Request rate (reqs/sec)Throughput (Mbits/sec)

    LinuxXen-priviledged driver domainXen-guest0Xen-guest1

    10008.138.138.138.13

    200016.2716.2716.2716.27

    300024.424.424.424.4

    400032.5232.5232.5232.52

    500040.6640.5240.5240.52

    600048.848.845.5148.8

    700056.9556.9543.156.95

    800065.165.143.165.1

    900073.273.145073.14

    1000081.4181.4152.581.41

    1100089.5389.5353.0389.53

    1200097.4697.465397.46

    13000105.76105.75105.75

    14000113.9113.9113.9

    15000122.05122.04107.1

    16000130.298.23106.8

    17000138.3370.9491.7

    18000146.4765.53104.6

    19000154.660.26104.51

    20000122.7557.02104.5

    21000119.2257.14

    22000103.0866.33

    2300096.8955.83

    2400089.7250.53

    Figure 7 events web server

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    Linux

    Xen-priviledged driver domain

    Xen-guest0

    Xen-guest1

    Figure 3 Rcv Thruput

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    Xen guest VM +privileged driver VM

    Xen privileged driver VM only

    Lunix

    Linux

    Xen-priviledged driver domain

    Xen-guest0

    Request Rate (Reqs/sec)

    Throughput (Mbits/sec)

    # commands

    # set style data histogram

    # set style fill pattern border -1

    # set yrange [0:3000]

    # set size 0.6,0.6

    # plot 'file' using 2:xtic(1) title 2, '' using 3 title 3

    # set xlabel "Number of NICs"

    # set ylabel "Aggregate Throughput (Mb/s)"

    # set terminal postscript eps

    #number of NICSMb/s(linux)xenXen-domain0

    arbitLinuxXen-privileged driver VM ("driver domain")

    1942942

    218821878

    324621539

    424461593

    LinuxXen-privileged driver VM ("driver domain")Xen-guest VM + driver VM

    1942942849

    218821878849

    324621539849

    424461593849

    Number of NICs

    0

    000

    000

    000

    000

    Linux

    Xen-privileged driver VM ("driver domain")

    Xen-guest VM + driver VM

    Number of Network Interface Cards

    Receive Throughput (Mbits/sec)

    MBD00A6E44D.xls

    Chart5

    0.858585858612.25965537732.2984749455

    0.99937514.32343753.271875

    0.615295143911.73960453621.76781041

    0.085504699311.88171467872.0831359364

    Linux

    Xen-privileged driver VM only

    Xen-guest VM + driver VM

    Xen-guest VM + driver VM (2 CPUs)

    Event count relative to Xen-priviledged driver domain

    Figure 8 events web server

    # commands

    # set style data histogram

    # set style fill pattern border -1

    # set yrange [0:4.5]

    # set size 0.6,0.6

    # plot 'file' using ($2/$3):xtic(1) title 2, '' using ($3/$3) title 3, '' using ($4/$3) title 4, '' using ($5/$3) title 5

    # set xlabel "Profiled Hardware Event"

    # set ylabel "Relative costs"

    # set terminal postscript eps

    # profile for the 1 NIC runs

    #profiled parameterLinuxxen-domain0xen-guest0xen-guest1

    arbitLinuxXen-privileged driver VM onlyXen-guest VM + driver VMXen-guest VM + driver VM (2 CPUs)

    "Instr"13005151473422734815

    "L-2"319832001383510470

    "I-TLB"423268781196512159

    "D-TLB"1119130872462627262

    LinuxXen-privileged driver VM onlyXen-guest VM + driver VMXen-guest VM + driver VM (2 CPUs)

    "Instr"13005151473422734815

    "L-2"319832001383510470

    "I-TLB"423268781196512159

    "D-TLB"1119130872462627262

    11.722.024.4

    all_knot_prof.data for figure 8

    Relative to Xen-domain0

    LinuxXen-privileged driver VM onlyXen-guest VM + driver VMXen-guest VM + driver VM (2 CPUs)

    Intructions0.861.002.262.30

    L2 misses1.001.004.323.27

    I-TLB misses0.621.001.741.77

    D-TLB misses0.091.001.882.08

    Figure 8

    Figure 8 events web server

    Linux

    Xen-privileged driver VM only

    Xen-guest VM + driver VM

    Event count relative to Xen-priviledged driver domain

    Figure 7 events web server

    Request rate (reqs/sec)Throughput (Mbits/sec)

    LinuxXen-priviledged driver domainXen-guest0Xen-guest1

    10008.138.138.138.13

    200016.2716.2716.2716.27

    300024.424.424.424.4

    400032.5232.5232.5232.52

    500040.6640.5240.5240.52

    600048.848.845.5148.8

    700056.9556.9543.156.95

    800065.165.143.165.1

    900073.273.145073.14

    1000081.4181.4152.581.41

    1100089.5389.5353.0389.53

    1200097.4697.465397.46

    13000105.76105.75105.75

    14000113.9113.9113.9

    15000122.05122.04107.1

    16000130.298.23106.8

    17000138.3370.9491.7

    18000146.4765.53104.6

    19000154.660.26104.51

    20000122.7557.02104.5

    21000119.2257.14

    22000103.0866.33

    2300096.8955.83

    2400089.7250.53

    Figure 7 events web server

    Linux

    Xen-priviledged driver domain

    Xen-guest0

    Xen-guest1

    Figure 3 Rcv Thruput

    Xen guest VM+driver VM (1 CPU)

    Xen priviledged driver VM (1 CPU)

    Lunix(1 CPU)

    Xen guest VM+driver VM (2 CPUs)

    Linux

    Xen-priviledged driver domain

    Xen-guest1

    Xen-guest0

    Request Rate (Reqs/sec)

    Throughput (Mbits/sec)

    # commands

    # set style data histogram

    # set style fill pattern border -1

    # set yrange [0:3000]

    # set size 0.6,0.6

    # plot 'file' using 2:xtic(1) title 2, '' using 3 title 3

    # set xlabel "Number of NICs"

    # set ylabel "Aggregate Throughput (Mb/s)"

    # set terminal postscript eps

    #number of NICSMb/s(linux)xenXen-domain0

    arbitLinuxXen-priviledged driver VM ("driver domain")

    1942942

    218821878

    324621539

    424461593

    LinuxXen-priviledged driver VM ("driver domain")Xen-guest VM + driver VM

    1942942800

    218821878930

    324621539930

    424461593930

    Number of NICs

    0

    Linux

    Xen-priviledged driver VM ("driver domain")

    Xen-guest VM + driver VM

    Number of Network Interface Cards

    Receive Throughput (Mbits/sec)

  • *CS252 s06 Adv. Memory Hieriarchy 2*Xen Performance, Part III> 2X instructions for guest VM + driver VM> 4X L2 cache misses12X 24X Data TLB misses

    CS252 s06 Adv. Memory Hieriarchy 2

    Chart3

    0.858585858612.2596553773

    0.99937514.3234375

    0.615295143911.7396045362

    0.085504699311.8817146787

    Linux

    Xen-privileged driver VM only

    Xen-guest VM + driver VM

    Event count relative to Xen-priviledged driver domain

    Chart5

    0.858585858612.25965537732.2984749455

    0.99937514.32343753.271875

    0.615295143911.73960453621.76781041

    0.085504699311.88171467872.0831359364

    Linux

    Xen-privileged driver VM only

    Xen-guest VM + driver VM

    Xen-guest VM + driver VM (2 CPUs)

    Event count relative to Xen-priviledged driver domain

    Figure 8 events web server

    # commands

    # set style data histogram

    # set style fill pattern border -1

    # set yrange [0:4.5]

    # set size 0.6,0.6

    # plot 'file' using ($2/$3):xtic(1) title 2, '' using ($3/$3) title 3, '' using ($4/$3) title 4, '' using ($5/$3) title 5

    # set xlabel "Profiled Hardware Event"

    # set ylabel "Relative costs"

    # set terminal postscript eps

    # profile for the 1 NIC runs

    #profiled parameterLinuxxen-domain0xen-guest0xen-guest1

    arbitLinuxXen-privileged driver VM onlyXen-guest VM + driver VMXen-guest VM + driver VM (2 CPUs)

    "Instr"13005151473422734815

    "L-2"319832001383510470

    "I-TLB"423268781196512159

    "D-TLB"1119130872462627262

    LinuxXen-privileged driver VM onlyXen-guest VM + driver VMXen-guest VM + driver VM (2 CPUs)

    "Instr"13005151473422734815

    "L-2"319832001383510470

    "I-TLB"423268781196512159

    "D-TLB"1119130872462627262

    11.722.024.4

    all_knot_prof.data for figure 8

    Relative to Xen-domain0

    LinuxXen-privileged driver VM onlyXen-guest VM + driver VMXen-guest VM + driver VM (2 CPUs)

    Intructions0.861.002.262.30

    L2 misses1.001.004.323.27

    I-TLB misses0.621.001.741.77

    D-TLB misses0.091.001.882.08

    Figure 8

    Figure 8 events web server

    000

    000

    000

    000

    Linux

    Xen-privileged driver VM only

    Xen-guest VM + driver VM

    Event count relative to Xen-priviledged driver domain

    Figure 7 events web server

    Request rate (reqs/sec)Throughput (Mbits/sec)

    LinuxXen-priviledged driver domainXen-guest0Xen-guest1

    10008.138.138.138.13

    200016.2716.2716.2716.27

    300024.424.424.424.4

    400032.5232.5232.5232.52

    500040.6640.5240.5240.52

    600048.848.845.5148.8

    700056.9556.9543.156.95

    800065.165.143.165.1

    900073.273.145073.14

    1000081.4181.4152.581.41

    1100089.5389.5353.0389.53

    1200097.4697.465397.46

    13000105.76105.75105.75

    14000113.9113.9113.9

    15000122.05122.04107.1

    16000130.298.23106.8

    17000138.3370.9491.7

    18000146.4765.53104.6

    19000154.660.26104.51

    20000122.7557.02104.5

    21000119.2257.14

    22000103.0866.33

    2300096.8955.83

    2400089.7250.53

    Figure 7 events web server

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    Linux

    Xen-priviledged driver domain

    Xen-guest0

    Xen-guest1

    Figure 3 Rcv Thruput

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    0000

    Xen guest VM+driver VM (1 CPU)

    Xen priviledged driver VM (1 CPU)

    Lunix(1 CPU)

    Xen guest VM+driver VM (2 CPUs)

    Linux

    Xen-priviledged driver domain

    Xen-guest1

    Xen-guest0

    Request Rate (Reqs/sec)

    Throughput (Mbits/sec)

    # commands

    # set style data histogram

    # set style fill pattern border -1

    # set yrange [0:3000]

    # set size 0.6,0.6

    # plot 'file' using 2:xtic(1) title 2, '' using 3 title 3

    # set xlabel "Number of NICs"

    # set ylabel "Aggregate Throughput (Mb/s)"

    # set terminal postscript eps

    #number of NICSMb/s(linux)xenXen-domain0

    arbitLinuxXen-priviledged driver VM ("driver domain")

    1942942

    218821878

    324621539

    424461593

    LinuxXen-priviledged driver VM ("driver domain")Xen-guest VM + driver VM

    1942942800

    218821878930

    324621539930

    424461593930

    Number of NICs

    0

    000

    000

    000

    000

    Linux

    Xen-priviledged driver VM ("driver domain")

    Xen-guest VM + driver VM

    Number of Network Interface Cards

    Receive Throughput (Mbits/sec)

  • *CS252 s06 Adv. Memory Hieriarchy 2*Xen Performance, Part IV> 2X instructions: page remapping and page transfer between driver and guest VMs and due to communication between the 2 VMs over a channel4X L2 cache misses: Linux uses zero-copy network interface that depends on ability of NIC to do DMA from different locations in memory Since Xen does not support gather DMA in its virtual network interface, it cant do true zero-copy in the guest VM12X 24X Data TLB misses: 2 Linux optimizationsSuperpages for part of Linux kernel space, and 4MB pages lowers TLB misses versus using 1024 4 KB pages. Not in XenPTEs marked global are not flushed on a context switch, and Linux uses them for its kernel space. Not in XenFuture Xen may address 2. and 3., but 1. inherent?

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Protection and Instruction Set ArchitectureExample Problem: 80x86 POPF instruction loads flag registers from top of stack in memoryOne such flag is Interrupt Enable (IE)In system mode, POPF changes IE In user mode, POPF simply changes all flags except IE Problem: guest OS runs in user mode inside a VM, so it expects to see changed a IE, but it wontHistorically, IBM mainframe HW and VMM took 3 steps:Reduce cost of processor virtualizationIntel/AMD proposed ISA changes to reduce this costReduce interrupt overhead cost due to virtualizationReduce interrupt cost by steering interrupts to proper VM directly without invoking VMM2. and 3. not yet addressed by Intel/AMD; in the future?

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*80x86 VM Challenges18 instructions cause problems for virtualization:Read control registers in user model that reveal that the guest operating system in running in a virtual machine (such as POPF), and Check protection as required by the segmented architecture but assume that the operating system is running at the highest privilege levelVirtual memory: 80x86 TLBs do not support process ID tags more expensive for VMM and guest OSes to share the TLB each address space change typically requires a TLB flush

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Intel/AMD address 80x86 VM ChallengesGoal is direct execution of VMs on 80x86Intel's VT-xA new execution mode for running VMs An architected definition of the VM state Instructions to swap VMs rapidly Large set of parameters to select the circumstances where a VMM must be invoked VT-x adds 11 new instructions to 80x86Xen 3.0 plan proposes to use VT-x to run Windows on Xen AMDs Pacifica makes similar proposalsPlus indirection level in page table like IBM VM 370Ironic adding a new modeIf OS start using mode in kernel, new mode would cause performance problems for VMM since 100 times too slow

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*OutlineVirtual MachinesXen VM: Design and PerformanceAdministriviaAMD Opteron Memory HierarchyOpteron Memory Performance vs. Pentium 4Fallacies and PitfallsDiscuss Virtual Machines paperConclusion

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*AMD Opteron Memory Hierarchy12-stage integer pipeline yields a maximum clock rate of 2.8 GHz and fastest memory PC3200 DDR SDRAM48-bit virtual and 40-bit physical addressesI and D cache: 64 KB, 2-way set associative, 64-B block, LRUL2 cache: 1 MB, 16-way, 64-B block, pseudo LRUData and L2 caches use write back, write allocate L1 caches are virtually indexed and physically taggedL1 I TLB and L1 D TLB: fully associative, 40 entries 32 entries for 4 KB pages and 8 for 2 MB or 4 MB pages L2 I TLB and L1 D TLB: 4-way, 512 entities of 4 KB pagesMemory controller allows up to 10 cache misses8 from D cache and 2 from I cache

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Opteron Memory Hierarchy PerformanceFor SPEC2000I cache misses per instruction is 0.01% to 0.09% D cache misses per instruction are 1.34% to 1.43% L2 cache misses per instruction are 0.23% to 0.36% Commercial benchmark (TPC-C-like)I cache misses per instruction is 1.83% (100X!)D cache misses per instruction are 1.39% ( same)L2 cache misses per instruction are 0.62% (2X to 3X)How compare to ideal CPI of 0.33?

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*CPI breakdown for Integer ProgramsCPI above base attributable to memory 50%L2 cache misses 25% overall (50% memory CPI)Assumes misses are not overlapped with the execution pipeline or with each other, so the pipeline stall portion is a lower bound

    CS252 s06 Adv. Memory Hieriarchy 2

    Chart2

    0.33333333330.106240.2604266667

    0.33333333330.096530.2901366667

    0.33333333330.003690.4829766667

    0.33333333330.151050.3756166667

    0.33333333330.137180.3894866667

    0.33333333330.296490.2501766667

    0.33333333330.565210.1014566667

    0.33333333330.302830.3838366667

    0.33333333330.346250.6004166667

    0.33333333331.204060.2426066667

    0.33333333330.931910.5847566667

    0.33333333331.251510.9851566667

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    Sheet3

    Base CPIPipeline StallMemory CPITotal CPI% CPI-base in Memory

    TPC-C like0.330.921.312.5759%

    perlbmk0.330.240.130.7035%

    crafty0.330.200.190.7249%

    eon0.330.480.000.821%

    gzip0.330.320.210.8639%

    gap0.330.370.150.8629%

    vortex0.330.170.380.8869%

    bzip20.330.060.611.0091%

    gcc0.330.360.331.0248%

    parser0.330.540.401.2843%

    vpr0.33(0.01)1.461.78101%

    twolf0.330.511.011.8566%52%

    TPC-C0.330.921.312.5759%53%

    CFP2000 Avg0.330.330.491.1560%

    sixtrack0.330.260.030.6312%

    mesa0.330.350.090.7821%

    wupwise0.330.180.310.8363%

    mgrid0.330.220.340.8960%

    applu0.330.010.620.9798%

    facerec0.330.030.711.0796%

    galgel0.330.170.571.0777%

    apsi0.330.260.581.1769%

    ammp0.330.280.581.1968%

    fma3d0.330.430.571.3457%

    lucas0.330.520.881.7363%

    swim0.331.000.551.8835%

    equake0.331.210.802.3540%

    art0.330.961.733.0364%59%

    Sheet3

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    Base CPI

    Pipeline Stall

    Memory CPI

    CPI

    Sheet2

    0.33333333330.26238666670.03428

    0.33333333330.35187666670.09479

    0.33333333330.18241666670.31425

    0.33333333330.22061666670.33605

    0.33333333330.01378666670.62288

    0.33333333330.03115666670.70551

    0.33333333330.16735666670.56931

    0.33333333330.25520666670.58146

    0.33333333330.27824666670.57842

    0.33333333330.43484666670.57182

    0.33333333330.52112666670.87554

    0.33333333331.00030666670.54636

    0.33333333331.21468666670.80198

    0.33333333330.96347666671.73319

    Base CPI

    Min Pipeline Stall

    Max Memory CPI

    CPI

    CPI breakdown

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like2.5718.3413.896.183.259.000.091.71

    CINT2000 total1.300.9014.273.570.2512.470.001.06

    Miss Penalty771607777

    Peak CPI0.3333333333

    memory CPI

    TPC-C like1.310.130.100.990.020.060.000.01

    CINT2000 total0.770.010.100.570.000.090.00.01

    Base CPI0.33

    Memory CPI (no overlap)0.77

    Pipeline Stalls (no overlap)0.19

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    Base CPIMemory CPIPipeline StallL2 Memory% Memory L2per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like0.331.310.920.9975%2.5718.3413.896.183.259.000.091.71

    Miss Pen771607777

    CINT2000 total0.330.770.190.5774%1.300.9014.273.570.2512.470.001.06INT

    gzip0.330.210.320.028%0.860.0116.030.100.0111.060.000.09INT

    vpr0.331.46(0.01)0.9263%1.780.0223.365.730.0150.520.003.22INT

    gcc0.330.330.360.1444%1.021.9419.040.900.794.530.000.19INT

    mcf0.3318.20(5.47)16.6191%13.060.02148.90103.820.0150.490.0026.98INT

    crafty0.330.190.200.015%0.723.154.050.060.1618.070.000.01INT

    parser0.330.400.540.2153%1.280.0814.801.340.0111.560.000.65INT

    eon0.330.000.480.00%0.820.060.450.000.010.050.000.00INT

    perlbmk0.330.130.240.0754%0.701.362.410.430.933.510.000.31INT

    gap0.330.150.370.0960%0.860.764.270.580.053.380.000.33INT

    vortex0.330.380.170.1949%0.883.675.861.170.6815.780.001.38INT

    bzip20.330.610.060.4778%1.000.0110.572.940.008.170.000.63INT

    twolf0.331.010.510.7271%1.850.0826.184.490.0214.790.000.01INT

    52%

    CFP2000 total0.330.490.330.3674%1.150.0813.432.260.013.700.000.79FP

    wupwise0.330.310.180.2785%0.830.006.561.660.000.220.000.17FP

    swim0.330.551.000.3259%1.880.0130.872.020.000.590.000.41FP

    mgrid0.330.340.220.2264%0.890.0116.541.350.000.350.000.25FP

    applu0.330.620.010.5588%0.970.018.483.410.002.420.000.13FP

    mesa0.330.090.350.0222%0.780.031.580.130.018.780.000.17FP

    galgel0.330.570.170.3867%1.070.0118.632.380.007.620.000.67FP

    art0.331.730.961.3276%3.030.0056.968.270.001.200.000.41FP

    equake0.330.801.210.5366%2.350.0637.293.300.001.200.000.59FP

    facerec0.330.710.030.6389%1.070.019.313.940.001.210.000.20FP

    ammp0.330.580.280.3866%1.190.0216.582.370.008.610.003.25FP

    lucas0.330.880.520.7080%1.730.0017.354.360.004.800.003.27FP

    fma3d0.330.570.430.4885%1.340.2011.843.020.050.360.000.21FP

    sixtrack0.330.030.260.0375%0.630.030.530.160.010.660.000.01FP

    apsi0.330.580.260.4068%1.170.5013.812.480.0110.370.001.69FP

    71%

    CINT2000 Avg0.330.190.771.30

    mcf0.33(5.47)18.20

    Base CPIMin Pipeline StallMax Memory CPITotal CPI

    TPC-C like0.330.921.312.57

    perlbmk0.330.240.130.70

    crafty0.330.200.190.72

    eon0.330.480.000.82

    gzip0.330.320.210.86

    gap0.330.370.150.86

    vortex0.330.170.380.88

    bzip20.330.060.611.00

    gcc0.330.360.331.02

    parser0.330.540.401.28

    vpr0.33(0.01)1.461.78

    twolf0.330.511.011.85

    TPC-C0.330.921.312.57

    CFP2000 Avg0.330.330.491.15

    sixtrack0.330.260.030.63

    mesa0.330.350.090.78

    wupwise0.330.180.310.83

    mgrid0.330.220.340.89

    applu0.330.010.620.97

    facerec0.330.030.711.07

    galgel0.330.170.571.07

    apsi0.330.260.581.17

    ammp0.330.280.581.19

    fma3d0.330.430.571.34

    lucas0.330.520.881.73

    swim0.331.000.551.88

    equake0.331.210.802.35

    art0.330.961.733.03

    CPI breakdown

    0.33333333330.106240.2604266667

    0.33333333330.096530.2901366667

    0.33333333330.003690.4829766667

    0.33333333330.151050.3756166667

    0.33333333330.137180.3894866667

    0.33333333330.296490.2501766667

    0.33333333330.565210.1014566667

    0.33333333330.302830.3838366667

    0.33333333330.346250.6004166667

    0.33333333331.204060.2426066667

    0.33333333330.931910.5847566667

    0.33333333331.251510.9851566667

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    Sheet1

    0.33333333330.034280.2623866667

    0.33333333330.094790.3518766667

    0.33333333330.314250.1824166667

    0.33333333330.336050.2206166667

    0.33333333330.622880.0137866667

    0.33333333330.705510.0311566667

    0.33333333330.569310.1673566667

    0.33333333330.581460.2552066667

    0.33333333330.578420.2782466667

    0.33333333330.571820.4348466667

    0.33333333330.875540.5211266667

    0.33333333330.546361.0003066667

    0.33333333330.801981.2146866667

    0.33333333331.733190.9634766667

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    Opteron SPEC CPU2000 Rates

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like2.5718.3413.896.183.259.000.091.71

    CINT2000 total1.300.9014.273.570.2512.470.001.06

    Miss Penalty771607227

    Peak CPI0.3333333333

    memory CPI

    TPC-C like1.270.130.100.990.020.020.000.01

    CINT2000 total0.710.010.100.570.000.020.00.01

    Base CPI0.33

    Memory CPI (no overlap)0.71

    Pipeline Stalls (no overlap)0.26

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    Base CPIMemory CPIPipeline StallL2 Memory% Memory L2per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like0.331.250.990.9979%2.5718.3413.896.183.259.000.091.71

    Miss Pen771602277

    CINT2000 total0.330.710.260.5780%1.300.9014.273.570.2512.470.001.06INT

    perlbmk0.330.110.260.0765%0.701.362.410.430.933.510.000.31INT

    crafty0.330.100.290.0110%0.723.154.050.060.1618.070.000.01INT

    eon0.330.000.480.00%0.820.060.450.000.010.050.000.00INT

    gzip0.330.150.380.0211%0.860.0116.030.100.0111.060.000.09INT

    gap0.330.140.390.0968%0.860.764.270.580.053.380.000.33INT

    vortex0.330.300.250.1963%0.883.675.861.170.6815.780.001.38INT

    bzip20.330.570.100.4783%1.000.0110.572.940.008.170.000.63INT

    gcc0.330.300.380.1448%1.021.9419.040.900.794.530.000.19INT

    parser0.330.350.600.2162%1.280.0814.801.340.0111.560.000.65INT

    vpr0.331.200.240.9276%1.780.0223.365.730.0150.520.003.22INT

    twolf0.330.930.580.7277%1.850.0826.184.490.0214.790.000.01INT

    mcf0.3317.94(5.22)16.6193%13.060.02148.90103.820.0150.490.0026.98INT

    58%

    CFP2000 total0.330.470.350.3677%1.150.0813.432.260.013.700.000.79FP

    sixtrack0.330.030.270.0383%0.630.030.530.160.010.660.000.01FP

    mesa0.330.050.400.0241%0.780.031.580.130.018.780.000.17FP

    wupwise0.330.310.180.2785%0.830.006.561.660.000.220.000.17FP

    mgrid0.330.330.220.2265%0.890.0116.541.350.000.350.000.25FP

    applu0.330.610.030.5589%0.970.018.483.410.002.420.000.13FP

    galgel0.330.530.210.3872%1.070.0118.632.380.007.620.000.67FP

    facerec0.330.700.040.6390%1.070.019.313.940.001.210.000.20FP

    apsi0.330.530.310.4075%1.170.5013.812.480.0110.370.001.69FP

    ammp0.330.540.320.3871%1.190.0216.582.370.008.610.003.25FP

    fma3d0.330.570.440.4885%1.340.2011.843.020.050.360.000.21FP

    lucas0.330.850.550.7082%1.730.0017.354.360.004.800.003.27FP

    swim0.330.541.000.3259%1.880.0130.872.020.000.590.000.41FP

    equake0.330.801.220.5366%2.350.0637.293.300.001.200.000.59FP

    art0.331.730.971.3277%3.030.0056.968.270.001.200.000.41FP

    74%

    CINT2000 Avg0.330.260.711.30

    mcf0.33(5.47)18.20

    Base CPIMin Pipeline StallMax Memory CPITotal CPI

    TPC-C like0.330.991.252.57

    perlbmk0.330.260.110.70

    crafty0.330.290.100.72

    eon0.330.480.000.82

    gzip0.330.380.150.86

    gap0.330.390.140.86

    vortex0.330.250.300.88

    bzip20.330.100.571.00

    gcc0.330.380.301.02

    parser0.330.600.351.28

    vpr0.330.241.201.78

    twolf0.330.580.931.85

    TPC-C0.330.991.252.57

    CFP2000 Avg0.330.350.471.15

    sixtrack0.330.260.030.63

    mesa0.330.350.090.78

    wupwise0.330.180.310.83

    mgrid0.330.220.340.89

    applu0.330.010.620.97

    facerec0.330.030.711.07

    galgel0.330.170.571.07

    apsi0.330.260.581.17

    ammp0.330.280.581.19

    fma3d0.330.430.571.34

    lucas0.330.520.881.73

    swim0.331.000.551.88

    equake0.331.210.802.35

    art0.330.961.733.03

    perlbmk0.330.260.110.70

    crafty0.330.290.100.72

    eon0.330.480.000.82

    gzip0.330.380.150.86

    gap0.330.390.140.86

    vortex0.330.250.300.88

    bzip20.330.100.571.00

    gcc0.330.380.301.02

    parser0.330.600.351.28

    vpr0.330.241.201.78

    twolf0.330.580.931.85

    TPC-C0.330.991.252.57

    sixtrack0.330.260.030.63

    mesa0.330.350.090.78

    wupwise0.330.180.310.83

    mgrid0.330.220.340.89

    applu0.330.010.620.97

    facerec0.330.030.711.07

    galgel0.330.170.571.07

    apsi0.330.260.581.17

    ammp0.330.280.581.19

    fma3d0.330.430.571.34

    lucas0.330.520.881.73

    swim0.331.000.551.88

    equake0.331.210.802.35

    art0.330.961.733.03

    Opteron SPEC CPU2000 Rates

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 missesIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like2.5718.3413.896.183.259.000.091.711.83%1.39%0.62%0.33%0.90%0.01%0.17%

    %%%%%%%

    CINT2000 total1.300.9014.273.570.2512.470.001.06INT0.09%1.43%0.36%0.03%1.25%0.00%0.11%

    164.gzip0.860.0116.030.100.0111.060.000.09INT0.001%1.603%0.010%0.001%1.106%0.000%0.009%

    175.vpr1.780.0223.365.730.0150.520.003.22INT0.002%2.336%0.573%0.001%5.052%0.000%0.322%

    176.gcc1.021.9419.040.900.794.530.000.19INT0.194%1.904%0.090%0.079%0.453%0.000%0.019%

    181.mcf13.060.02148.90103.820.0150.490.0026.98INT0.002%14.890%10.382%0.001%5.049%0.000%2.698%

    186.crafty0.723.154.050.060.1618.070.000.01INT0.315%0.405%0.006%0.016%1.807%0.000%0.001%

    197.parser1.280.0814.801.340.0111.560.000.65INT0.008%1.480%0.134%0.001%1.156%0.000%0.065%

    252.eon0.820.060.450.000.010.050.000.00INT0.006%0.045%0.000%0.001%0.005%0.000%0.000%

    253.perlbmk0.701.362.410.430.933.510.000.31INT0.136%0.241%0.043%0.093%0.351%0.000%0.031%

    254.gap0.860.764.270.580.053.380.000.33INT0.076%0.427%0.058%0.005%0.338%0.000%0.033%

    255.vortex0.883.675.861.170.6815.780.001.38INT0.367%0.586%0.117%0.068%1.578%0.000%0.138%

    256.bzip21.000.0110.572.940.008.170.000.63INT0.001%1.057%0.294%0.000%0.817%0.000%0.063%

    300.twolf1.850.0826.184.490.0214.790.000.01INT0.008%2.618%0.449%0.002%1.479%0.000%0.001%

    CFP2000 total1.150.0813.432.260.013.700.000.79FP0.01%1.34%0.23%0.00%0.37%0.00%0.08%

    168.wupwise0.830.006.561.660.000.220.000.17FP0.000%0.656%0.166%0.000%0.022%0.000%0.017%

    171.swim1.880.0130.872.020.000.590.000.41FP0.001%3.087%0.202%0.000%0.059%0.000%0.041%

    172.mgrid0.890.0116.541.350.000.350.000.25FP0.001%1.654%0.135%0.000%0.035%0.000%0.025%

    173.applu0.970.018.483.410.002.420.000.13FP0.001%0.848%0.341%0.000%0.242%0.000%0.013%

    177.mesa0.780.031.580.130.018.780.000.17FP0.003%0.158%0.013%0.001%0.878%0.000%0.017%

    178.galgel1.070.0118.632.380.007.620.000.67FP0.001%1.863%0.238%0.000%0.762%0.000%0.067%

    179.art3.030.0056.968.270.001.200.000.41FP0.000%5.696%0.827%0.000%0.120%0.000%0.041%

    183.equake2.350.0637.293.300.001.200.000.59FP0.006%3.729%0.330%0.000%0.120%0.000%0.059%

    187.facerec1.070.019.313.940.001.210.000.20FP0.001%0.931%0.394%0.000%0.121%0.000%0.020%

    188.ammp1.190.0216.582.370.008.610.003.25FP0.002%1.658%0.237%0.000%0.861%0.000%0.325%

    189.lucas1.730.0017.354.360.004.800.003.27FP0.000%1.735%0.436%0.000%0.480%0.000%0.327%

    191.fma3d1.340.2011.843.020.050.360.000.21FP0.020%1.184%0.302%0.005%0.036%0.000%0.021%

    200.sixtrack0.630.030.530.160.010.660.000.01FP0.003%0.053%0.016%0.001%0.066%0.000%0.001%

    301.apsi1.170.5013.812.480.0110.370.001.69FP0.050%1.381%0.248%0.001%1.037%0.000%0.169%

    2.020.41.01.713.00.70.01.6Min0.000%0.045%0.000%0.000%0.005%0.000%0.000%

    2.2229.31.02.7325.02.40.02.2Max0.367%14.890%10.382%0.093%5.052%0.000%2.698%

    Inverse0.77Icache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    0.87per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    0.38910505842.57Not MCF0.000%0.045%0.000%0.000%0.005%0.000%0.000%

    Notes:NOT MCF0.367%5.696%0.827%0.093%5.052%0.000%0.327%

    - Data measured on uniprocessor Opteron based systems

    - Metrics are averaged over the entire benchmark run

    - Only SPEC CPU2000 base results are given. All CPU2000 benchmarks run in 64-bit mode.

    - TPC-C like benchmark is run in 32-bit mode.

    - Contact: [email protected] 512-602-5581 or 512-576-9485 mobile

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like2.5718.3413.896.183.259.000.091.71

    CINT2000 total1.300.9014.273.570.2512.470.001.06INT

    164.gzip0.860.0116.030.100.0111.060.000.09INT

    175.vpr1.780.0223.365.730.0150.520.003.22INT

    176.gcc1.021.9419.040.900.794.530.000.19INT

    181.mcf13.060.02148.90103.820.0150.490.0026.98INT

    186.crafty0.723.154.050.060.1618.070.000.01INT

    197.parser1.280.0814.801.340.0111.560.000.65INT

    252.eon0.820.060.450.000.010.050.000.00INT

    253.perlbmk0.701.362.410.430.933.510.000.31INT

    254.gap0.860.764.270.580.053.380.000.33INT

    255.vortex0.883.675.861.170.6815.780.001.38INT

    256.bzip21.000.0110.572.940.008.170.000.63INT

    300.twolf1.850.0826.184.490.0214.790.000.01INT

    CFP2000 total1.150.0813.432.260.013.700.000.79FP

    168.wupwise0.830.006.561.660.000.220.000.17FP

    171.swim1.880.0130.872.020.000.590.000.41FP

    172.mgrid0.890.0116.541.350.000.350.000.25FP

    173.applu0.970.018.483.410.002.420.000.13FP

    177.mesa0.780.031.580.130.018.780.000.17FP

    178.galgel1.070.0118.632.380.007.620.000.67FP

    179.art3.030.0056.968.270.001.200.000.41FP

    183.equake2.350.0637.293.300.001.200.000.59FP

    187.facerec1.070.019.313.940.001.210.000.20FP

    188.ammp1.190.0216.582.370.008.610.003.25FP

    189.lucas1.730.0017.354.360.004.800.003.27FP

    191.fma3d1.340.2011.843.020.050.360.000.21FP

    200.sixtrack0.630.030.530.160.010.660.000.01FP

    301.apsi1.170.5013.812.480.0110.370.001.69FP

    2.020.41.01.713.00.70.01.6

    2.2229.31.02.7325.02.40.02.2

    Notes:

    - Data measured on uniprocessor Opteron based systems

    - Metrics are averaged over the entire benchmark run

    - Only SPEC CPU2000 base results are given. All CPU2000 benchmarks run in 64-bit mode.

    - TPC-C like benchmark is run in 32-bit mode.

    - Contact: [email protected] 512-602-5581 or 512-576-9485 mobile

  • *CS252 s06 Adv. Memory Hieriarchy 2*CPI breakdown for Floating Pt. ProgramsCPI above base attributable to memory 60%L2 cache misses 40% overall (70% memory CPI)Assumes misses are not overlapped with the execution pipeline or with each other, so the pipeline stall portion is a lower bound

    CS252 s06 Adv. Memory Hieriarchy 2

    Chart4

    0.33333333330.034280.2623866667

    0.33333333330.094790.3518766667

    0.33333333330.314250.1824166667

    0.33333333330.336050.2206166667

    0.33333333330.622880.0137866667

    0.33333333330.705510.0311566667

    0.33333333330.569310.1673566667

    0.33333333330.581460.2552066667

    0.33333333330.578420.2782466667

    0.33333333330.571820.4348466667

    0.33333333330.875540.5211266667

    0.33333333330.546361.0003066667

    0.33333333330.801981.2146866667

    0.33333333331.733190.9634766667

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    Sheet3

    Base CPIPipeline StallMemory CPITotal CPI% CPI-base in Memory

    TPC-C like0.330.921.312.5759%

    perlbmk0.330.240.130.7035%

    crafty0.330.200.190.7249%

    eon0.330.480.000.821%

    gzip0.330.320.210.8639%

    gap0.330.370.150.8629%

    vortex0.330.170.380.8869%

    bzip20.330.060.611.0091%

    gcc0.330.360.331.0248%

    parser0.330.540.401.2843%

    vpr0.33(0.01)1.461.78101%

    twolf0.330.511.011.8566%52%

    TPC-C0.330.921.312.5759%53%

    CFP2000 Avg0.330.330.491.1560%

    sixtrack0.330.260.030.6312%

    mesa0.330.350.090.7821%

    wupwise0.330.180.310.8363%

    mgrid0.330.220.340.8960%

    applu0.330.010.620.9798%

    facerec0.330.030.711.0796%

    galgel0.330.170.571.0777%

    apsi0.330.260.581.1769%

    ammp0.330.280.581.1968%

    fma3d0.330.430.571.3457%

    lucas0.330.520.881.7363%

    swim0.331.000.551.8835%

    equake0.331.210.802.3540%

    art0.330.961.733.0364%59%

    Sheet3

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    Base CPI

    Pipeline Stall

    Memory CPI

    CPI

    Sheet2

    0.33333333330.26238666670.03428

    0.33333333330.35187666670.09479

    0.33333333330.18241666670.31425

    0.33333333330.22061666670.33605

    0.33333333330.01378666670.62288

    0.33333333330.03115666670.70551

    0.33333333330.16735666670.56931

    0.33333333330.25520666670.58146

    0.33333333330.27824666670.57842

    0.33333333330.43484666670.57182

    0.33333333330.52112666670.87554

    0.33333333331.00030666670.54636

    0.33333333331.21468666670.80198

    0.33333333330.96347666671.73319

    Base CPI

    Min Pipeline Stall

    Max Memory CPI

    CPI

    CPI breakdown

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like2.5718.3413.896.183.259.000.091.71

    CINT2000 total1.300.9014.273.570.2512.470.001.06

    Miss Penalty771607777

    Peak CPI0.3333333333

    memory CPI

    TPC-C like1.310.130.100.990.020.060.000.01

    CINT2000 total0.770.010.100.570.000.090.00.01

    Base CPI0.33

    Memory CPI (no overlap)0.77

    Pipeline Stalls (no overlap)0.19

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    Base CPIMemory CPIPipeline StallL2 Memory% Memory L2per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like0.331.310.920.9975%2.5718.3413.896.183.259.000.091.71

    Miss Pen771607777

    CINT2000 total0.330.770.190.5774%1.300.9014.273.570.2512.470.001.06INT

    gzip0.330.210.320.028%0.860.0116.030.100.0111.060.000.09INT

    vpr0.331.46(0.01)0.9263%1.780.0223.365.730.0150.520.003.22INT

    gcc0.330.330.360.1444%1.021.9419.040.900.794.530.000.19INT

    mcf0.3318.20(5.47)16.6191%13.060.02148.90103.820.0150.490.0026.98INT

    crafty0.330.190.200.015%0.723.154.050.060.1618.070.000.01INT

    parser0.330.400.540.2153%1.280.0814.801.340.0111.560.000.65INT

    eon0.330.000.480.00%0.820.060.450.000.010.050.000.00INT

    perlbmk0.330.130.240.0754%0.701.362.410.430.933.510.000.31INT

    gap0.330.150.370.0960%0.860.764.270.580.053.380.000.33INT

    vortex0.330.380.170.1949%0.883.675.861.170.6815.780.001.38INT

    bzip20.330.610.060.4778%1.000.0110.572.940.008.170.000.63INT

    twolf0.331.010.510.7271%1.850.0826.184.490.0214.790.000.01INT

    52%

    CFP2000 total0.330.490.330.3674%1.150.0813.432.260.013.700.000.79FP

    wupwise0.330.310.180.2785%0.830.006.561.660.000.220.000.17FP

    swim0.330.551.000.3259%1.880.0130.872.020.000.590.000.41FP

    mgrid0.330.340.220.2264%0.890.0116.541.350.000.350.000.25FP

    applu0.330.620.010.5588%0.970.018.483.410.002.420.000.13FP

    mesa0.330.090.350.0222%0.780.031.580.130.018.780.000.17FP

    galgel0.330.570.170.3867%1.070.0118.632.380.007.620.000.67FP

    art0.331.730.961.3276%3.030.0056.968.270.001.200.000.41FP

    equake0.330.801.210.5366%2.350.0637.293.300.001.200.000.59FP

    facerec0.330.710.030.6389%1.070.019.313.940.001.210.000.20FP

    ammp0.330.580.280.3866%1.190.0216.582.370.008.610.003.25FP

    lucas0.330.880.520.7080%1.730.0017.354.360.004.800.003.27FP

    fma3d0.330.570.430.4885%1.340.2011.843.020.050.360.000.21FP

    sixtrack0.330.030.260.0375%0.630.030.530.160.010.660.000.01FP

    apsi0.330.580.260.4068%1.170.5013.812.480.0110.370.001.69FP

    71%

    CINT2000 Avg0.330.190.771.30

    mcf0.33(5.47)18.20

    Base CPIMin Pipeline StallMax Memory CPITotal CPI

    TPC-C like0.330.921.312.57

    perlbmk0.330.240.130.70

    crafty0.330.200.190.72

    eon0.330.480.000.82

    gzip0.330.320.210.86

    gap0.330.370.150.86

    vortex0.330.170.380.88

    bzip20.330.060.611.00

    gcc0.330.360.331.02

    parser0.330.540.401.28

    vpr0.33(0.01)1.461.78

    twolf0.330.511.011.85

    TPC-C0.330.921.312.57

    CFP2000 Avg0.330.330.491.15

    sixtrack0.330.260.030.63

    mesa0.330.350.090.78

    wupwise0.330.180.310.83

    mgrid0.330.220.340.89

    applu0.330.010.620.97

    facerec0.330.030.711.07

    galgel0.330.170.571.07

    apsi0.330.260.581.17

    ammp0.330.280.581.19

    fma3d0.330.430.571.34

    lucas0.330.520.881.73

    swim0.331.000.551.88

    equake0.331.210.802.35

    art0.330.961.733.03

    CPI breakdown

    0.33333333330.106240.2604266667

    0.33333333330.096530.2901366667

    0.33333333330.003690.4829766667

    0.33333333330.151050.3756166667

    0.33333333330.137180.3894866667

    0.33333333330.296490.2501766667

    0.33333333330.565210.1014566667

    0.33333333330.302830.3838366667

    0.33333333330.346250.6004166667

    0.33333333331.204060.2426066667

    0.33333333330.931910.5847566667

    0.33333333331.251510.9851566667

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    Sheet1

    0.33333333330.034280.2623866667

    0.33333333330.094790.3518766667

    0.33333333330.314250.1824166667

    0.33333333330.336050.2206166667

    0.33333333330.622880.0137866667

    0.33333333330.705510.0311566667

    0.33333333330.569310.1673566667

    0.33333333330.581460.2552066667

    0.33333333330.578420.2782466667

    0.33333333330.571820.4348466667

    0.33333333330.875540.5211266667

    0.33333333330.546361.0003066667

    0.33333333330.801981.2146866667

    0.33333333331.733190.9634766667

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    Opteron SPEC CPU2000 Rates

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like2.5718.3413.896.183.259.000.091.71

    CINT2000 total1.300.9014.273.570.2512.470.001.06

    Miss Penalty771607227

    Peak CPI0.3333333333

    memory CPI

    TPC-C like1.270.130.100.990.020.020.000.01

    CINT2000 total0.710.010.100.570.000.020.00.01

    Base CPI0.33

    Memory CPI (no overlap)0.71

    Pipeline Stalls (no overlap)0.26

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    Base CPIMemory CPIPipeline StallL2 Memory% Memory L2per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like0.331.250.990.9979%2.5718.3413.896.183.259.000.091.71

    Miss Pen771602277

    CINT2000 total0.330.710.260.5780%1.300.9014.273.570.2512.470.001.06INT

    perlbmk0.330.110.260.0765%0.701.362.410.430.933.510.000.31INT

    crafty0.330.100.290.0110%0.723.154.050.060.1618.070.000.01INT

    eon0.330.000.480.00%0.820.060.450.000.010.050.000.00INT

    gzip0.330.150.380.0211%0.860.0116.030.100.0111.060.000.09INT

    gap0.330.140.390.0968%0.860.764.270.580.053.380.000.33INT

    vortex0.330.300.250.1963%0.883.675.861.170.6815.780.001.38INT

    bzip20.330.570.100.4783%1.000.0110.572.940.008.170.000.63INT

    gcc0.330.300.380.1448%1.021.9419.040.900.794.530.000.19INT

    parser0.330.350.600.2162%1.280.0814.801.340.0111.560.000.65INT

    vpr0.331.200.240.9276%1.780.0223.365.730.0150.520.003.22INT

    twolf0.330.930.580.7277%1.850.0826.184.490.0214.790.000.01INT

    mcf0.3317.94(5.22)16.6193%13.060.02148.90103.820.0150.490.0026.98INT

    58%

    CFP2000 total0.330.470.350.3677%1.150.0813.432.260.013.700.000.79FP

    sixtrack0.330.030.270.0383%0.630.030.530.160.010.660.000.01FP

    mesa0.330.050.400.0241%0.780.031.580.130.018.780.000.17FP

    wupwise0.330.310.180.2785%0.830.006.561.660.000.220.000.17FP

    mgrid0.330.330.220.2265%0.890.0116.541.350.000.350.000.25FP

    applu0.330.610.030.5589%0.970.018.483.410.002.420.000.13FP

    galgel0.330.530.210.3872%1.070.0118.632.380.007.620.000.67FP

    facerec0.330.700.040.6390%1.070.019.313.940.001.210.000.20FP

    apsi0.330.530.310.4075%1.170.5013.812.480.0110.370.001.69FP

    ammp0.330.540.320.3871%1.190.0216.582.370.008.610.003.25FP

    fma3d0.330.570.440.4885%1.340.2011.843.020.050.360.000.21FP

    lucas0.330.850.550.7082%1.730.0017.354.360.004.800.003.27FP

    swim0.330.541.000.3259%1.880.0130.872.020.000.590.000.41FP

    equake0.330.801.220.5366%2.350.0637.293.300.001.200.000.59FP

    art0.331.730.971.3277%3.030.0056.968.270.001.200.000.41FP

    74%

    CINT2000 Avg0.330.260.711.30

    mcf0.33(5.47)18.20

    Base CPIMin Pipeline StallMax Memory CPITotal CPI

    TPC-C like0.330.991.252.57

    perlbmk0.330.260.110.70

    crafty0.330.290.100.72

    eon0.330.480.000.82

    gzip0.330.380.150.86

    gap0.330.390.140.86

    vortex0.330.250.300.88

    bzip20.330.100.571.00

    gcc0.330.380.301.02

    parser0.330.600.351.28

    vpr0.330.241.201.78

    twolf0.330.580.931.85

    TPC-C0.330.991.252.57

    CFP2000 Avg0.330.350.471.15

    sixtrack0.330.260.030.63

    mesa0.330.350.090.78

    wupwise0.330.180.310.83

    mgrid0.330.220.340.89

    applu0.330.010.620.97

    facerec0.330.030.711.07

    galgel0.330.170.571.07

    apsi0.330.260.581.17

    ammp0.330.280.581.19

    fma3d0.330.430.571.34

    lucas0.330.520.881.73

    swim0.331.000.551.88

    equake0.331.210.802.35

    art0.330.961.733.03

    perlbmk0.330.260.110.70

    crafty0.330.290.100.72

    eon0.330.480.000.82

    gzip0.330.380.150.86

    gap0.330.390.140.86

    vortex0.330.250.300.88

    bzip20.330.100.571.00

    gcc0.330.380.301.02

    parser0.330.600.351.28

    vpr0.330.241.201.78

    twolf0.330.580.931.85

    TPC-C0.330.991.252.57

    sixtrack0.330.260.030.63

    mesa0.330.350.090.78

    wupwise0.330.180.310.83

    mgrid0.330.220.340.89

    applu0.330.010.620.97

    facerec0.330.030.711.07

    galgel0.330.170.571.07

    apsi0.330.260.581.17

    ammp0.330.280.581.19

    fma3d0.330.430.571.34

    lucas0.330.520.881.73

    swim0.331.000.551.88

    equake0.331.210.802.35

    art0.330.961.733.03

    Opteron SPEC CPU2000 Rates

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    Base CPI

    Max Memory CPI

    Min Pipeline Stall

    CPI

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 missesIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like2.5718.3413.896.183.259.000.091.711.83%1.39%0.62%0.33%0.90%0.01%0.17%

    %%%%%%%

    CINT2000 total1.300.9014.273.570.2512.470.001.06INT0.09%1.43%0.36%0.03%1.25%0.00%0.11%

    164.gzip0.860.0116.030.100.0111.060.000.09INT0.001%1.603%0.010%0.001%1.106%0.000%0.009%

    175.vpr1.780.0223.365.730.0150.520.003.22INT0.002%2.336%0.573%0.001%5.052%0.000%0.322%

    176.gcc1.021.9419.040.900.794.530.000.19INT0.194%1.904%0.090%0.079%0.453%0.000%0.019%

    181.mcf13.060.02148.90103.820.0150.490.0026.98INT0.002%14.890%10.382%0.001%5.049%0.000%2.698%

    186.crafty0.723.154.050.060.1618.070.000.01INT0.315%0.405%0.006%0.016%1.807%0.000%0.001%

    197.parser1.280.0814.801.340.0111.560.000.65INT0.008%1.480%0.134%0.001%1.156%0.000%0.065%

    252.eon0.820.060.450.000.010.050.000.00INT0.006%0.045%0.000%0.001%0.005%0.000%0.000%

    253.perlbmk0.701.362.410.430.933.510.000.31INT0.136%0.241%0.043%0.093%0.351%0.000%0.031%

    254.gap0.860.764.270.580.053.380.000.33INT0.076%0.427%0.058%0.005%0.338%0.000%0.033%

    255.vortex0.883.675.861.170.6815.780.001.38INT0.367%0.586%0.117%0.068%1.578%0.000%0.138%

    256.bzip21.000.0110.572.940.008.170.000.63INT0.001%1.057%0.294%0.000%0.817%0.000%0.063%

    300.twolf1.850.0826.184.490.0214.790.000.01INT0.008%2.618%0.449%0.002%1.479%0.000%0.001%

    CFP2000 total1.150.0813.432.260.013.700.000.79FP0.01%1.34%0.23%0.00%0.37%0.00%0.08%

    168.wupwise0.830.006.561.660.000.220.000.17FP0.000%0.656%0.166%0.000%0.022%0.000%0.017%

    171.swim1.880.0130.872.020.000.590.000.41FP0.001%3.087%0.202%0.000%0.059%0.000%0.041%

    172.mgrid0.890.0116.541.350.000.350.000.25FP0.001%1.654%0.135%0.000%0.035%0.000%0.025%

    173.applu0.970.018.483.410.002.420.000.13FP0.001%0.848%0.341%0.000%0.242%0.000%0.013%

    177.mesa0.780.031.580.130.018.780.000.17FP0.003%0.158%0.013%0.001%0.878%0.000%0.017%

    178.galgel1.070.0118.632.380.007.620.000.67FP0.001%1.863%0.238%0.000%0.762%0.000%0.067%

    179.art3.030.0056.968.270.001.200.000.41FP0.000%5.696%0.827%0.000%0.120%0.000%0.041%

    183.equake2.350.0637.293.300.001.200.000.59FP0.006%3.729%0.330%0.000%0.120%0.000%0.059%

    187.facerec1.070.019.313.940.001.210.000.20FP0.001%0.931%0.394%0.000%0.121%0.000%0.020%

    188.ammp1.190.0216.582.370.008.610.003.25FP0.002%1.658%0.237%0.000%0.861%0.000%0.325%

    189.lucas1.730.0017.354.360.004.800.003.27FP0.000%1.735%0.436%0.000%0.480%0.000%0.327%

    191.fma3d1.340.2011.843.020.050.360.000.21FP0.020%1.184%0.302%0.005%0.036%0.000%0.021%

    200.sixtrack0.630.030.530.160.010.660.000.01FP0.003%0.053%0.016%0.001%0.066%0.000%0.001%

    301.apsi1.170.5013.812.480.0110.370.001.69FP0.050%1.381%0.248%0.001%1.037%0.000%0.169%

    2.020.41.01.713.00.70.01.6Min0.000%0.045%0.000%0.000%0.005%0.000%0.000%

    2.2229.31.02.7325.02.40.02.2Max0.367%14.890%10.382%0.093%5.052%0.000%2.698%

    Inverse0.77Icache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    0.87per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    0.38910505842.57Not MCF0.000%0.045%0.000%0.000%0.005%0.000%0.000%

    Notes:NOT MCF0.367%5.696%0.827%0.093%5.052%0.000%0.327%

    - Data measured on uniprocessor Opteron based systems

    - Metrics are averaged over the entire benchmark run

    - Only SPEC CPU2000 base results are given. All CPU2000 benchmarks run in 64-bit mode.

    - TPC-C like benchmark is run in 32-bit mode.

    - Contact: [email protected] 512-602-5581 or 512-576-9485 mobile

    BenchmarkAvg CPIIcache missesDcache missesL2 missesITLB L1 missesDTLB L1 missesITLB L2 missesDTLB L2 misses

    per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.per 1k instr.

    TPC-C like2.5718.3413.896.183.259.000.091.71

    CINT2000 total1.300.9014.273.570.2512.470.001.06INT

    164.gzip0.860.0116.030.100.0111.060.000.09INT

    175.vpr1.780.0223.365.730.0150.520.003.22INT

    176.gcc1.021.9419.040.900.794.530.000.19INT

    181.mcf13.060.02148.90103.820.0150.490.0026.98INT

    186.crafty0.723.154.050.060.1618.070.000.01INT

    197.parser1.280.0814.801.340.0111.560.000.65INT

    252.eon0.820.060.450.000.010.050.000.00INT

    253.perlbmk0.701.362.410.430.933.510.000.31INT

    254.gap0.860.764.270.580.053.380.000.33INT

    255.vortex0.883.675.861.170.6815.780.001.38INT

    256.bzip21.000.0110.572.940.008.170.000.63INT

    300.twolf1.850.0826.184.490.0214.790.000.01INT

    CFP2000 total1.150.0813.432.260.013.700.000.79FP

    168.wupwise0.830.006.561.660.000.220.000.17FP

    171.swim1.880.0130.872.020.000.590.000.41FP

    172.mgrid0.890.0116.541.350.000.350.000.25FP

    173.applu0.970.018.483.410.002.420.000.13FP

    177.mesa0.780.031.580.130.018.780.000.17FP

    178.galgel1.070.0118.632.380.007.620.000.67FP

    179.art3.030.0056.968.270.001.200.000.41FP

    183.equake2.350.0637.293.300.001.200.000.59FP

    187.facerec1.070.019.313.940.001.210.000.20FP

    188.ammp1.190.0216.582.370.008.610.003.25FP

    189.lucas1.730.0017.354.360.004.800.003.27FP

    191.fma3d1.340.2011.843.020.050.360.000.21FP

    200.sixtrack0.630.030.530.160.010.660.000.01FP

    301.apsi1.170.5013.812.480.0110.370.001.69FP

    2.020.41.01.713.00.70.01.6

    2.2229.31.02.7325.02.40.02.2

    Notes:

    - Data measured on uniprocessor Opteron based systems

    - Metrics are averaged over the entire benchmark run

    - Only SPEC CPU2000 base results are given. All CPU2000 benchmarks run in 64-bit mode.

    - TPC-C like benchmark is run in 32-bit mode.

    - Contact: [email protected] 512-602-5581 or 512-576-9485 mobile

  • *CS252 s06 Adv. Memory Hieriarchy 2*Pentium 4 vs. Opteron Memory Hierarchy*Clock rate for this comparison in 2005; faster versions existed

    CPUPentium 4 (3.2 GHz*)Opteron (2.8 GHz*)Instruction CacheTrace Cache (8K micro-ops)2-way associative, 64 KB, 64B blockData Cache8-way associative, 16 KB, 64B block, inclusive in L22-way associative, 64 KB, 64B block, exclusive to L2L2 cache8-way associative, 2 MB, 128B block16-way associative, 1 MB, 64B blockPrefetch8 streams to L21 stream to L2Memory200 MHz x 64 bits200 MHz x 128 bits

    CS252 s06 Adv. Memory Hieriarchy 2

  • *CS252 s06 Adv. Memory Hieriarchy 2*Misses Per Instruction: Pentium 4 vs. OpteronOpteron betterPentium betterD cache miss: P4 is 2.3X to 3.4X vs. OpteronL2 cache miss: P4 is 0.5X to 1.5X vs. OpteronNote: Same ISA, but not same instruction count2.3X3.4X0.5X1.5X

    CS252 s06 Adv. Memory Hieriarchy 2

    Chart1

    3.49905844833.4651969004

    1.77015906670.3413784983

    1.62797720580.8163025337

    1.2339812360.1755664967

    4.68415928980.19326934

    4.66914217021.9451292097

    5.01586036616.1920687672

    1.3099034460.723184898

    3.58570534940.4991951275

    3.96956736461.6074951627

    SPECint2000

    SPECfp2000

    D cache: P4/Opteron

    L2 cache: P4/Opteron

    Ratio of MPI: Pentium 4/Opteron

    D & L2 MPI, SPEC

    Pentium 43.2 GHzOpteron2.8 GHzP4/O

    12K16 K2 MBL1/L264 K64 K1 MBL1/L2

    I cacheD cacheL2 cacheI cacheD cacheL2 cacheI cacheD cacheL2 cache

    164.gzip0.0556.095.610.351620.0116.031.600.101605.03.53.5

    175.vpr0.3541.354.141.96210.0223.362.345.73417.61.80.3

    176.gcc3.9931.003.100.73421.9419.041.900.90212.11.60.8

    181.mcf0.08183.7418.3718.23100.02148.9014.89103.8213.81.20.2

    186.crafty5.5518.971.900.011,6363.154.050.410.06681.84.70.2

    168.wupwise0.0530.633.063.2390.016.560.661.6649.44.71.9

    171.swim0.07154.8415.4812.51120.0130.873.092.02156.85.06.2

    172.mgrid0.0221.672.170.98220.0116.541.651.35121.81.30.7

    173.applu0.0330.413.041.70180.018.480.853.4122.93.60.5

    177.mesa0.516.270.630.21300.031.580.160.131216.94.01.6

    MIN0.026.270.630.019.490.011.580.160.061.431.761.230.18

    MAX5.55183.7418.3718.231,635.963.15148.9014.89103.82160.3017.635.026.19

    I Pent4I OptD Pent4D OptL2 Pent4L2 Opt

    gzip0.050.0156.0916.030.350.10

    vpr0.350.0241.3523.361.965.73

    gcc3.991.9431.0019.040.730.90

    mcf0.080.02183.74148.9018.23103.82

    crafty5.553.1518.974.050.010.06

    wupeise0.050.0130.636.563.231.66

    swim0.070.01154.8430.8712.512.02

    mgrid0.020.0121.6716.540.981.35

    applu0.030.0130.418.481.703.41

    mesa0.510.036.271.580.210.13

    I cacheD cacheL2 cache

    gzip5.03.53.5

    vpr17.61.80.3

    gcc2.11.60.8

    mcf3.81.20.2

    crafty1.84.70.2

    wupwise9.44.71.9

    swim6.85.06.2

    mgrid1.81.30.7

    applu2.93.60.5

    mesa16.94.01.6

    D cache: P4/OpteronL2 cache: P4/Opteron

    gzip33.5

    vpr20.3

    gcc20.8

    mcf10.2

    crafty50.2

    wupwise51.9

    swim56.2

    mgrid10.7

    applu40.5

    mesa41.6

    Median3.50.8

    geomean2.760.86

    gstdev

    SPECratio2000base

    P4OpteronRatioRatio Op/P4

    gzip106612840.831.20

    vpr110312440.891.13

    gcc188014311.310.76

    mcf196211631.690.59

    crafty118318990.621.61

    wupwise259728380.921.09

    swim264022531.170.85

    mgrid146217230.851.18

    applu152621220.721.39

    mesa140318400.761.31

    All GM int158417430.941.10

    All GM fp184319761.081.07

    GM 5 int1,3871,3821.001.00

    GM 5 fp1,8462,1220.871.15

    D & L2 MPI, SPEC

    000000

    000000

    000000

    000000

    000000

    000000

    000000

    000000

    000000

    000000

    I Pent4

    I Opt

    D Pent4

    D Opt

    L2 Pent4

    L2 Opt

    Misses per Intruction

    New GeoMean

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    I cache

    D cache

    L2 cache

    GeoMean

    00

    00

    00

    00

    00

    00

    00

    00

    00

    00

    SPECint2000

    SPECfp2000

    D cache: P4/Opteron

    L2 cache: P4/Opteron

    Ratio of MPI: Pentium 4/Opteron

    SPEC2000 Pent4

    000

    000

    000

    000

    000

    000

    000

    000

    000

    000

    D cache: P4/Opteron

    L2 cache: P4/Opteron

    SPECRatio

    Ratio of MPI: Pentium 4/Opteron

    SPEC2000 Opt

    BenchmarkItanium 2 SPECRatioln(Sample)ln(GM)(ln(S)-ln(GM))**2

    gzip0.83(0.19)0.000.04

    vpr0.89(0.12)0.000.02

    gcc1.310.270.000.07

    mcf1.690.520.000.27

    Bold are beyondcrafty0.62(0.47)0.000.23

    1 St.Dev. Range

    N5

    1/N0.2Sum ln(S)0.02Sum[(ln(S)-ln(GM))**2]0.62

    Product1.0163530546div n0.00Sum[]/n0.12

    GM (old)1.00GM (via ln)1.00expSQRT(Sum[]/n)0.35SQRT(Su

    GM calculated old wayStDev(Sum(ln(S))0.39St Devexp(SQRT(Sum[]/n))1.42Gstdev

    exp(StDev(Sum(ln(S)))1.48Gstdev

    GM/Gstdev0.68lowGM/Gstdev0.71low

    GM*Gstdev1.49highGM*Gstdev1.43high

    GM, Gstdev, Range calculated beforeGM, Gstdev, Range calculated as on web page

    BenchmarkItanium 2 SPECRatioln(Sample)ln(GM)(ln(S)-ln(GM))**2

    wupwise0.92(0.09)0.000.01

    swim1.170.160.000.02

    mgrid0.85(0.16)0.000.03

    applu0.72(0.33)0.000.11

    Bold are beyondmesa0.76(0.27)0.000.08

    1 St.Dev. Range

    N5

    1/N0.2Sum ln(S)(0.70)Sum[(ln(S)-ln(GM))**2]0.25

    Product0.4988998701div n(0.14)Sum[]/n0.05

    GM (old)0.87GM (via ln)0.87expSQRT(Sum[]/n)0.22SQRT(Su

    GM calculated old wayStDev(Sum(ln(S))0.19St Devexp(SQRT(Sum[]/n))1.25Gstdev

    exp(StDev(Sum(ln(S)))1.21Gstdev

    GM/Gstdev0.72lowGM/Gstdev0.70low

    GM*Gstdev1.05highGM*Gstdev1.09high

    GM, Gstdev, Range calculated beforeGM, Gstdev, Range calculated as on web page

    BenchmarkPentium 4 / Opteron RatiolnD cacheBenchmarkPentium 4 / Opteron RatiolnD cacheBenchmarkPentium 4 / Opteron RatiolnD cache

    gzip3.51.25gzip3.51.25

    vpr1.80.57vpr1.80.57

    gcc1.60.49gcc1.60.49

    mcf1.20.21mcf1.20.21

    crafty4.71.54crafty4.71.54

    wupwise4.71.54wupwise4.71.54

    swim5.01.61swim5.01.61

    mgrid1.30.27mgrid1.30.27

    applu3.61.28applu3.61.28

    mesa4.01.38mesa4.01.38

    1055

    0.125450.224509197810.14Sum ln0.258.28435863264.07Sum ln0.2436.65616481446.08Sum ln

    GM2.761.01div nGM2.250.81div nGM3.371.22div n

    2.76exp2.25exp3.37exp

    0.56St Dev0.56St Dev0.54St Dev

    1.76Gstdev1.75Gstdev1.72Gstdev

    1.57low1.29low1.96low

    4.84high3.95high5.82high

    BenchmarkPentium 4 / Opteron RatiolnL2 cacheBenchmarkPentium 4 / Opteron RatiolnL2 cacheBenchmarkPentium 4 / Opteron RatiolnL2 cache

    gzip3.51.24gzip3.51.24

    vpr0.3(1.07)vpr0.3(1.07)

    gcc0.8(0.20)gcc0.8(0.20)

    mcf0.2(1.74)mcf0.2(1.74)

    crafty0.2(1.64)crafty0.2(1.64)

    wupwise1.90.67wupwise1.90.67

    swim6.21.82swim6.21.82

    mgrid0.7(0.32)mgrid0.7(0.32)

    applu0.5(0.69)applu0.5(0.69)

    mesa1.60.47mesa1.60.47

    1055

    0.10.2290200047(1.47)Sum ln0.20.0327657288(3.42)Sum ln0.26.98962035021.94Sum ln

    GM0.86(0.15)div nGM0.50(0.68)div nGM1.480.39div n

    0.86exp0.50exp1.48exp

    1.19St Dev1.24St Dev0.98St Dev

    3.30Gstdev3.45Gstdev2.66Gstdev

    0low0low1low

    3high2high4high

    BenchmarkPentium 4 / Opteron RatiolnSPECBenchmarkPentium 4 / Opteron RatiolnL2 cacheBenchmarkPentium 4 / Opteron RatiolnL2 cache

    gzip0.83(0.19)gzip0.83(0.19)

    vpr0.89(0.12)vpr0.89(0.12)

    gcc1.310.27gcc1.310.27

    mcf1.690.52mcf1.690.52

    crafty0.62(0.47)crafty0.62(0.47)

    wupwise0.92(0.09)wupwise0.92(0.09)

    swim1.170.16swim1.170.16

    mgrid0.85(0.16)mgrid0.85(0.16)

    applu0.72(0.33)applu0.72(0.33)

    mesa0.76(0.27)mesa0.76(0.27)

    1055

    0.10.507058407(0.68)Sum ln0.21.01635305460.02Sum ln0.20.4988998701(0.70)Sum ln

    GM0.93(0.07)div nGM1.000.00div nGM0.87(0.14)div n

    0.93exp1.00exp0.87exp

    0.30St Dev0.39St Dev0.19St Dev

    1.35Gstdev1.48Gstdev1.21Gstdev

    1low1low1low

    1high1high1high

    SPECint2000BaseBaseBasePeakPeakPeak

    BenchmarksRef TimeRun TimeRatioRef TimeRun TimeRatio

    164.gzip1400131106614001321063

    175.vpr1400127110314001251119

    176.gcc110058.518801100591865

    181.mcf180091.81962180091.71963

    186.crafty100084.51183100084.51183

    197.parser1800134133818001341340

    252.eon130064.82006130064.82006

    253.perlbmk180093.21932180093.21932

    254.gap110062.81751110062.51760

    255.vortex190070.12709190070.12709

    256.bzip21500124120515001251205

    300.twolf3000183163830001831638

    SPECint_base20001584

    SPECint20001585

    SPECfp2000BaseBaseBasePeakPeakPeak

    BenchmarksRef TimeRun TimeRatioRef TimeRun TimeRatio

    168.wupwise160061.62597160061.62597

    171.swim3100117264031001172640

    172.mgrid1800123146218001231462

    173.applu2100138152621001381526

    177.mesa140099.81403140099.81403

    178.galgel290085.73384290085.73384

    179.art260055.34702260055.34702

    183.equake13004727631300472763

    187.facerec190098.71925190098.71925

    188.ammp2200196112122001961121

    189.lucas200091.42188200091.42188

    191.fma3d2100143147121001431471

    200.sixtrack11001736351100173635

    301.apsi2600207125726002071257

    SPECfp_base20001843

    SPECfp20001843

    SPEC CINT2000 Summary

    Fujitsu Siemens Computers CELSIUS M440, Intel Pentium 4 640

    Wed Nov 30 11:08:35 2005

    HARDWARE

    --------

    Hardware Vendor: Fujitsu Siemens Computers

    Model Name: CELSIUS M440, Intel Pentium 4 640

    CPU: Intel Pentium 4 processor 640

    CPU MHz: 3200

    FPU: Integrated

    CPU(s) enabled: 1 core, 1 chip, 1 core/chip (Hyper-Threading Technology disabled)

    CPU(s) orderable: 1

    Parallel: No

    Primary Cache: 12k micro-ops I + 16KBD on chip

    Secondary Cache: 2048KB(I+D) on chip

    L3 Cache: N/A

    Other Cache: N/A

    Memory: 4 GB (4x1 GB DDR2-533, 2rank, CL4-4-4, with ECC)

    Disk Subsystem: Serial ATA 7200 rpm

    Other Hardware:

    SOFTWARE

    --------

    Operating System: Windows XP Professional, Service Pack 2

    Compiler: Intel(R) C++ Compiler for 32-bit app., Version 9.0,

    - Build 20050912Z Package ID: W_CC_C_9.0.024

    Microsoft Visual Studio .NET 2003 (for libraries)

    MicroQuill SmartHeap Library 7.4

    File System: NTFS

    System State: Default

    NOTES

    -----

    Portability flags:

    176.gcc: -Dalloca=_alloca /F10000000

    186.crafty: -DNT_i386

    253.perlbmk: -DSPEC_CPU2000_NTOS -DPERLDLL /MT

    254.gap: -DSYS_HAS_CALLOC_PROTO -DSYS_HAS_MALLOC_PROTO

    Feedback optimization:

    +FDO: PASS1= -Qprof_gen PASS2= -Qprof_use

    Baseline Tuning Flags:

    for C programs:

    -fast +FDO shlW32M.lib

    for C++ program 252.eon:

    -fast -Qcxx_features +FDO

    Peak Tuning Flags:

    164.gzip: -fast +FDO

    175.vpr: -fast +FDO

    176.gcc: -fast +FDO

    181.mcf: -fast +FDO shlW32M.lib

    186.crafty: -fast +FDO

    197.parser: -fast +FDO

    252.eon: -fast -Qcxx_features +FDO

    253.perlbmk: -fast +FDO shlW32M.lib

    254.gap: -fast +FDO

    255.vortex -fast +FDO shlW32M.lib

    256.bzip2: -fast +FDO

    300.twolf: -fast +FDO shlW32M.lib

    Extra Libraries:

    shlW32M.lib = Microquill SmartHeap Library 7.4

    see www.microquill.com

    The system bus runs at 800 MHz

    For information about Fujitsu Siemens Computers in your country please see:

    http://www.fujitsu-siemens.com/countries

    -----------------------------------------------------------------------------

    For questions about this result, please contact the tester.

    For other inquiries, please contact [email protected].

    Copyright 1999-2005 Standard Performance Evaluation Corporation

    Generated on Tue Dec 27 11:33:15 2005 by SPEC CPU2000 ASCII formatter v2.1

    SPECint2000BaseBaseBasePeakPeakPeak

    BenchmarksRef TimeRun TimeRatioRef TimeRun TimeRatio

    164.gzip140083.11685140082.8169114001091284140090.71543*

    175.vpr14001021373140098.5142114001131244140090.21552*

    176.gcc110055.91969110057.21923110076.91431110053.62052*

    181.mcf1800236764180014812141800155116318001191516*

    186.crafty100040.12492100039.22553100052.718991000501998*

    197.parser18001401285180011016321800103174118001041733*

    252.eon130044.82905130038.83352130050.12593130044.92895*

    253.perlbmk180095.61883180083.321621800872068180085.22112*

    254.gap110060.71813110060.7181311006018331100601833*

    255.vortex190066287719006230651900692753190065.92881*

    256.bzip2150096.41556150096.4155615001021476150099.21513*

    300.twolf30001891587300014920083000171175430001332254*

    SPECint_base200017431708

    SPECint200019451940

    SPECfp2000BaseBaseBasePeakPeakPeak

    BenchmarksRef TimeRun TimeRatioRef TimeRun TimeRatio

    168.wupwise160056.42838160049.13261

    171.swim3100138225331001332326

    172.mgrid18001041723180096.81859

    173.applu2100992122210085.82448

    177.mesa140076.11840140057.82423

    178.galgel2900903222290080.73596

    179.art260098.52639260068.33805

    183.equake130076.21706130069.21880

    187.facerec190071.42661190071.82647

    188.ammp2200134164122001251755

    189.lucas2000108184920001021963

    191.fma3d2100123171321001221720

    200.sixtrack11001179391100112979

    301.apsi2600152171126001461781

    SPECfp_base20001976

    SPECfp20002191

    SPEC CFP2000 Summary

    Advanced Micro Devices TYAN Tomcat K8E (S2865), AMD Opteron (TM) 154

    Sun Dec 11 13:58:07 2005

    Sun Dec 11 12:05:44 2005

    Hardware Vendor: Advanced Micro Devices

    Model Name: TYAN Tomcat K8E (S2865), AMD Opteron (TM) 154

    CPU: AMD Opteron (TM) 154 (939 pin)

    CPU MHz: 2800

    FPU: Integrated

    CPU(s) enabled: 1 core, 1 chip, 1 core/chip

    CPU(s) orderable: 1

    Parallel: No

    Primary Cache: 64KBI + 64KBD on chip

    Secondary Cache: 1024KB (I+D) on chip

    L3 Cache: N/A

    Other Cache: N/A

    Memory: 2x512MB, DDR400 CL2

    Disk Subsystem: IDE, 160 GB

    Other Hardware: None

    SOFTWARE

    --------

    Operating System: SuSE Linux Enterprise Server 9 for AMD64

    Compiler: PathScale EKOPath(TM) Compiler

    Suite, Release 2.3

    File System: Linux/ext3

    System State: Multi-user, run level 3

    NOTES

    -----

    Tested by Advanced Micro Devices

    +FDO: PASS1= -fb_create fbdata PASS2= -fb_opt fbdata

    Baseline optimization flags:

    C programs: -Ofast +FDO

    C++ programs: -Ofast +FDO

    Portability Flags:

    186.crafty: -DLINUX_i386

    252.eon: -DHAS_ERRLIST -DSPEC_CPU2000_LP64

    253.perlbmk: -DSPEC_CPU2000_LINUX_I386 -DSPEC_CPU2000_NEED_BOOL

    -DSPEC_CPU2000_GLIBC22 -DSPEC_CPU2000_LP64

    254.gap: -DSYS_IS_USG -DSYS_HAS_IOCTL_PROTO -DSYS_HAS_TIME_PROTO

    -DSYS_HAS_CALLOC_PROTO -DSPEC_CPU2000_LP64

    255.vortex: -DSPEC_CPU2000_LP64

    Peak Tuning:

    164.gzip: -O3 -ipa -WOPT:val=0 -OPT:unroll_size=0 +FDO

    175.vpr: -O3 -ipa -m32 +FDO

    176.gcc: -O3 -IPA:plimit=10000 -LNO:opt=0 -OPT:goto=off +FDO

    181.mcf: -O3 -ipa -IPA:field_reorder=on -m32 +FDO

    186.crafty: -Ofast -CG:local_fwd_sched=on -LNO:opt=0 -WOPT:val=0 +FDO

    197.parser: -O3 -ipa -m32 -IPA:ctype=on +FDO

    252.eon: -Ofast -CG:gcm=off:p2align_freq=1:prefetch=off -IPA:plimit=4000

    -OPT:treeheight=on -TENV:X=4:frame_pointer=off -fno-exceptions

    -LNO:fu=10:full_unroll_outer=on -GRA:optimize_boundary=on +FDO

    253.perlbmk: -O2 -ipa -OPT:Ofast:transform_to_memlib=off

    -fno-math-errno -IPA:plimit=10000 +FDO

    254.gap: basepeak = true

    255.vortex: -Ofast -OPT:goto=off -CG:p2align=on

    -GRA:optimize_boundary=on -IPA:min_hotness=120 +FDO

    256.bzip2: basepeak = true

    300.twolf: -O2 -CG:gcm=off:p2align_freq=100000

    -OPT:Ofast:unroll_times_max=8:unroll_size=256:alias=disjoint

    -WOPT:mem_opnds=on -m32 +FDO

    Corsair CMX512-3200XL memory used in Dual Channel configuration

    Memory timings manually set in BIOS: CAS=2, Trcd=2, Tras=5, Trp=2

    BIOS rev 3.01

    The tested system can be assembled using a standard ATX case and an Antec True 550

    watt EPS12V Power Supply

  • *CS252 s06 Adv. Memory Hieriarchy 2*Fallacies and PitfallsNot delivering high memory bandwidth in a cache-based system10 Fastest computers at Stream benchmark [McCalpin 2005]Only 4/10 computers rely on data caches, and their memory BW per processor is 7X to 25X slower than NEC SX7

    CS252 s06 Adv. Memory Hieriarchy 2

    Chart1

    876174.727380.459375

    8540621668.08984375

    60749237968.25

    541178.133823.63125

    43478413587

    4145731619.42578125

    4073516364.859375

    34944614560.25

    347712.35433.0046875

    33255141568.875

    System Memory BW

    Per Processor Memory BW

    Sheet1

    Machine IDCOPYPer CPUncpus

    ---------------------------------------------

    NEC_SX-7 (32)876,17527,380132

    SGI_Altix_3000 (512)854,0621,6682512

    NEC_SX-5-16A (16)607,49237,968316

    NEC_SX-7 (16)541,17833,824416

    NEC_SX-4 (32)434,78413,587532

    SGI_Altix_3000 (256)414,5731,6196256

    HP_AlphaServer (64)407,3516,365764

    NEC_SX-4 (24)349,44614,560824

    HP_AlphaServer (64)347,7125,433964

    NEC_SX-5-16A (8)332,55141,569108

    Sheet1

    00

    00

    00

    00

    00

    00

    00

    00

    00

    00

    Megabytes/sec

    Bandwidth

    00

    00

    00

    00

    00

    00

    00

    00

    00

    00

    System Memory BW

    Per Processor Memory BW

    STREAM "standard" results

    This is the "standard" set of STREAMbenchmark results.

    This set excludes the followingcategories:

    Results for 32-bit operations and operands.

    Assembly language, hand-coded,simulated, orother experimentalresults.

    Obsolete or superceded results.

    STREAM Memory Bandwidth --- John D.McCalpin,[email protected]

    Revised to Mon Oct 3 20:33:16CDT 25

    All results are in MB/s --- 1 MB=10^6 B, *not* 2^20 B

    ------------------------------------------------------------------------

    Machine IDncpusCOPYPer CPUSCALEADDTRIAD

    ------------------------------------------------------------------------

    NEC_SX-732876,17527,380865144.1869179.2872259.1data1

    SGI_Altix_3000512854,0621,66885433810085940 1007828.0 data2

    NEC_SX-5-16A16607,49237,968590390607412583069data3

    NEC_SX-716541,17833,824539823.1548731.5548585.5data4

    NEC_SX-432434,78413,587432886437358436954data5

    SGI_Altix_3000256414,5731,619412108485323488274data6

    HP_AlphaServer_GS1280-130064407,3516,365400142437010431450data7

    NEC_SX-424349,44614,560347354342247345825data8

    HP_AlphaServer_GS128064347,7125,433341890.6373126.5377727.8data9

    NEC_SX-5-16A8332,55141,569332551371160366690data10

    Cray_T932_321024-3E32310,7219,710302182359841359270data11

    NEC_SX-78275,30534,413274726.8278324.5278244.7data12

    NEC_SX-416247,44015,465247343250262250231data

    Cray_T3E-1200512242,981246031288345284920data

    Cray_T3E-900512240,428233501243368265803

    Cray_T932_321024-3E24237,415230100281038282300data

    HP_AlphaServer_GS1280-130032207,441207441226230224390data

    NEC_SX-68202,627192306.2190231.3213024.3data

    HP_AlphaServer_GS128032177,662171680.2186545.2180608.2data

    NEC_SX-5-16A4168,486168509189555189517data

    Cray_T3D-512161,4791681939177595304

    Cray_T932_321024-3E16160,263154880193335194562data

    IBM_eServer_p5-59564157,560152770.7168974173564.2data

    NEC_SX-74138,980138876.3140393.1140384.7data

    NEC_SX-64126,896126620.5127643127633.9data

    NEC_SX-48126,084126084126725126724data

    Cray_T3E-1200256121,503123064144679142580data

    Cray_T3E-900256119,483116861107532126061

    SGI_Altix_300064106,457105359123968124601data

    Cray_C9016105,497104656101736103812data

    HP_AlphaServer_GS1280-130016104,024104024114561114514data

    Cray_T3D-25698,31684241.547824.145248.1

    HP_AlphaServer_GS12801692,03892038100902100902data

    SGI_Origin3800-50025687,02085514.4101695.699680.2data

    NEC_SX-5-16A284,853848539535295328data

    HP_Integrity_SuperDome_16cell6482,276812698303784049data

    HP_Integrity_SuperDome_16cell3281,855808638273482353data

    Cray_T932_321024-3E880,929781319870199343data

    Cray_X1474,417775778362984455data

    NEC_SX-7270,08469856.870572.670572.6data

    Sun_F25K-1050-72chip14468,18467462.775402.776097.8data

    NEC_SX-6263,77163665.763908.663908.4data

    NEC_SX-4463,537635366369463692data

    Cray_T3E-120012860,882615547233571283data

    Sun_F25K-1050-64chip12860,41259495.266334.766506.4data

    Cray_T3E-90012860,175587075717764952

    Cray_C90855,07255391.860843.363229.6data

    Sun_F15K7254,66547703.746090.750724.3data

    SGI_Altix_30003253,431529336220162521data

    HP_AlphaServer_GS1280-1300850,584526745750657506data

    Cray_T3D-12849,15642128.22391222625.2

    HP_AlphaServer_GS1280845,203461025112550384data

    HP_AlphaServer_ES80-1150844,415452034909149729d