Cache Coherence Simulation using GEMS Adam Dyess Dennis Cox

Post on 28-Dec-2015

Cache Coherence Simulation using GEMS

Adam Dyess

Dennis Cox

Cache Coherence

• Caches are essential for high performance

• A multiprocessor has many caches that must be kept consistent

• Cache Coherence Protocols
  – Dependent on architecture and applications
  – Can be difficult to validate for correctness
  – Simulation is invaluable

Cache Coherence Simulators

• LIMES

• RSIM

• M5

• ccSIM

• TLA+/TLC

GEMS Overview

• Fully Functional simulation

• Timing focus; Simics handles functionality

• Ruby - Memory simulator
  – Cache coherence protocol
  – Interconnection network
  – Memory architecture

• Opal - Out-of-order execution simulator

SLICC

• Specification Language for Implementing Cache Coherence

• Protocol specified using
  – States
  – Events
  – Actions
  – Transitions
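The four ingredients above (states, events, actions, transitions) can be pictured as a lookup table. The sketch below is plain Python, not SLICC syntax. It uses a simplified three-state MSI protocol with no transient states, and every name in it (State, Event, step, the action functions) is hypothetical and chosen only for illustration.

```python
from enum import Enum, auto


class State(Enum):
    I = auto()  # Invalid
    S = auto()  # Shared
    M = auto()  # Modified


class Event(Enum):
    LOAD = auto()
    STORE = auto()
    OTHER_GETS = auto()  # another cache requests a shared copy
    OTHER_GETX = auto()  # another cache requests an exclusive copy


# Actions only record what a real controller would do.
def issue_gets(log): log.append("issue GETS")
def issue_getx(log): log.append("issue GETX")
def send_data(log): log.append("send data")
def invalidate(log): log.append("invalidate")


# (current state, event) -> (next state, actions to run)
TRANSITIONS = {
    (State.I, Event.LOAD): (State.S, [issue_gets]),
    (State.I, Event.STORE): (State.M, [issue_getx]),
    (State.S, Event.LOAD): (State.S, []),
    (State.S, Event.STORE): (State.M, [issue_getx]),
    (State.S, Event.OTHER_GETX): (State.I, [invalidate]),
    (State.M, Event.LOAD): (State.M, []),
    (State.M, Event.STORE): (State.M, []),
    (State.M, Event.OTHER_GETS): (State.S, [send_data]),
    (State.M, Event.OTHER_GETX): (State.I, [send_data, invalidate]),
}


def step(state, event, log):
    """Apply one event; pairs missing from the table leave the state unchanged."""
    if (state, event) not in TRANSITIONS:
        return state
    next_state, actions = TRANSITIONS[(state, event)]
    for action in actions:
        action(log)
    return next_state


log = []
state = step(State.I, Event.STORE, log)     # I -> M, issues GETX
state = step(state, Event.OTHER_GETS, log)  # M -> S, sends data
```

A real protocol such as the MOSI variant used later in the deck adds an Owned state and transient states (for example IS_AD and IM_AD in the statistics dump), which is where most of the validation difficulty comes from.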

SLICC Documentation

Installation (SIMICS)

• Assessing the Host Machine

• Acquiring a Simics License

• Downloading Simics

• Follow Simics Installation Instructions

• Test Simics

Preparing Simics

• Install Solaris

• Edit the Hardware Configuration
  – Create a CDROM image if you want to import pre-compiled information
  – Load the CDROM image file into the Hardware Configuration

• Startup New Hardware

• Save Checkpoint

Installation (GEMS)

• Download and Install
  – Copy Simics into the GEMS directory

• Compile Ruby, Opal, and a Cache Coherency Protocol

• Start up Simics
  – The GEMS documentation is excellent at describing how to start Simics using the newly compiled cache coherency protocol
  – http://www.cs.wisc.edu/gems/doc/wiki/moin.cgi
  – Load the checkpoint
  – Init Ruby (and optionally Opal)

Pitfalls

• Getting files in and out of Simics

• Setting RUBY parameters properly

• Running Simics over X Windows

Tested Simulation

• Heat distribution problem built on Pthreads

• The operating system would distribute the 8 threads across 8 different processors

• Threads exchanged data through shared memory

• Barriers were used to synchronize the threads' accesses to the shared data
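The workload shape described above (8 threads, shared memory, barrier synchronization) can be sketched as follows. This is not the authors' Pthreads program. It is a hypothetical one-dimensional heat relaxation in Python, with the grid size, step count and boundary temperatures chosen only for illustration.

```python
import threading

NUM_THREADS = 8
CELLS_PER_THREAD = 4
STEPS = 50


def simulate(num_threads=NUM_THREADS, cells_per_thread=CELLS_PER_THREAD, steps=STEPS):
    """Relax a 1-D rod whose left end is held at 100 and right end at 0."""
    n = num_threads * cells_per_thread
    current = [0.0] * n
    current[0] = 100.0
    grids = [current, list(current)]  # double buffer shared by all threads
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        lo = tid * cells_per_thread
        hi = lo + cells_per_thread
        for step in range(steps):
            src = grids[step % 2]
            dst = grids[(step + 1) % 2]
            for i in range(lo, hi):
                if i == 0 or i == n - 1:
                    dst[i] = src[i]  # boundary cells stay fixed
                else:
                    # Reads of src[i - 1] and src[i + 1] may touch a neighbour's cells.
                    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
            # Nobody starts the next step until every thread has finished this one.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return grids[steps % 2]


result = simulate()
```

Cells at the edge of each thread's slice are read by the neighbouring thread on every step, which is the kind of sharing that generates coherence traffic in the simulated system.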

Simulation Results?

Ruby Configuration
------------------
protocol: MOSI_SMP_bcast
simics_version: simics-2.0.28
compiled_at: 12:22:02, Mar 16 2005
RUBY_DEBUG: false
hostname: eb22909.eng.uah.edu
g_RANDOM_SEED: 1
g_DEADLOCK_THRESHOLD: 50000
g_FORWARDING_ENABLED: false
RANDOMIZATION: false
g_SYNTHETIC_DRIVER: false
g_DETERMINISTIC_DRIVER: false
g_FILTERING_ENABLED: false
g_DISTRIBUTED_PERSISTENT_ENABLED: true
g_DYNAMIC_TIMEOUT_ENABLED: true
g_RETRY_THRESHOLD: 1
g_FIXED_TIMEOUT_LATENCY: 300
g_trace_warmup_length: 1000000
g_bash_bandwidth_adaptive_threshold: 0.75
g_tester_length: 0
g_synthetic_locks: 2048
g_deterministic_addrs: 1
g_SpecifiedGenerator: DetermInvGenerator
g_callback_counter: 0
g_NUM_COMPLETIONS_BEFORE_PASS: 0
g_think_time: 5
g_hold_time: 5
g_wait_time: 5
PROTOCOL_DEBUG_TRACE: true
DEBUG_FILTER_STRING: none
DEBUG_VERBOSITY_STRING: none
DEBUG_START_TIME: 0
DEBUG_OUTPUT_FILENAME: none
SIMICS_RUBY_MULTIPLIER: 2
OPAL_RUBY_MULTIPLIER: 2
TRANSACTION_TRACE_ENABLED: false
USER_MODE_DATA_ONLY: false
PROFILE_HOT_LINES: false
PROFILE_ALL_INSTRUCTIONS: false
PRINT_INSTRUCTION_TRACE: false
BLOCK_STC: false
PERFECT_MEMORY_SYSTEM: false
DATA_BLOCK: false
REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH: false
g_SIMICS: true
L1_CACHE_ASSOC: 4
L1_CACHE_NUM_SETS_BITS: 8
L2_CACHE_ASSOC: 4
L2_CACHE_NUM_SETS_BITS: 16
g_MEMORY_SIZE_BYTES: 1073741824
g_DATA_BLOCK_BYTES: 64
g_PAGE_SIZE_BYTES: 4096
g_NUM_PROCESSORS: 8
g_NUM_L2_BANKS: 8
g_NUM_MEMORIES: 8
g_PROCS_PER_CHIP: 1
g_NUM_CHIPS: 8
g_NUM_CHIP_BITS: 3
g_MEMORY_SIZE_BITS: 30
g_DATA_BLOCK_BITS: 6
g_PAGE_SIZE_BITS: 12
g_NUM_PROCESSORS_BITS: 3
g_PROCS_PER_CHIP_BITS: 0
g_NUM_L2_BANKS_BITS: 3
g_NUM_L2_BANKS_PER_CHIP_BITS: 0
g_NUM_L2_BANKS_PER_CHIP: 1
g_NUM_MEMORIES_BITS: 3
g_NUM_MEMORIES_PER_CHIP: 1
g_MEMORY_MODULE_BITS: 21
g_MEMORY_MODULE_BLOCKS: 2097152
MAP_L2BANKS_TO_LOWEST_BITS: false
DIRECTORY_CACHE_LATENCY: 6
NULL_LATENCY: 1
ISSUE_LATENCY: 2
CACHE_RESPONSE_LATENCY_MINUS_1: 11
MEMORY_LATENCY: 80
DIRECTORY_LATENCY: 80
NETWORK_LINK_LATENCY: 14
COPY_HEAD_LATENCY: 4
ON_CHIP_LINK_LATENCY: 1
RECYCLE_LATENCY: 10
L2_RECYCLE_LATENCY: 5
TIMER_LATENCY: 10000
L1_BANK_LATENCY_MINUS_1: 2
L2_BANK_LATENCY_MINUS_2: 4
TBE_RESPONSE_LATENCY: 1
PERIODIC_TIMER_WAKEUPS: true
L1_REQUEST_LATENCY: 2
L2_REQUEST_LATENCY: 4
SINGLE_ACCESS_L2_BANKS: true
SEQUENCER_TO_CONTROLLER_LATENCY: 4
L1CACHE_TRANSITIONS_PER_RUBY_CYCLE: 32
L2CACHE_TRANSITIONS_PER_RUBY_CYCLE: 32
DIRECTORY_TRANSITIONS_PER_RUBY_CYCLE: 32
g_SEQUENCER_OUTSTANDING_REQUESTS: 16
NUMBER_OF_TBES: 128
NUMBER_OF_L1_TBES: 32
NUMBER_OF_L2_TBES: 32
FINITE_BUFFERING: false
FINITE_BUFFER_SIZE: 3
PROCESSOR_BUFFER_SIZE: 10
PROTOCOL_BUFFER_SIZE: 32
TSO: false
g_MASK_PREDICTOR_CONFIG: AlwaysBroadcast
g_TOKEN_REISSUE_THRESHOLD: 2
g_PERSISTENT_PREDICTOR_CONFIG: None
g_NETWORK_TOPOLOGY: HIERARCHICAL_SWITCH
g_CACHE_DESIGN: NUCA
g_endpoint_bandwidth: 10000
g_adaptive_routing: true
NUMBER_OF_VIRTUAL_NETWORKS: 4
FAN_OUT_DEGREE: 4
g_PRINT_TOPOLOGY: false
[Profiler printConfig]

Network Configuration
---------------------
network: SIMPLE_NETWORK
virtual_net_0: active, ordered
virtual_net_1: active, unordered
virtual_net_2: inactive
virtual_net_3: inactive

Simics ruby multiplier: 2
Simics stall time: 2000000000

Chip Config
-----------
TBEs_per_TBETable: 128
Cache config: L1Cache_0_L1I
  cache_associativity: 4
  num_cache_sets_bits: 8
  num_cache_sets: 256
  cache_set_size_bytes: 16384
  cache_set_size_Kbytes: 16
  cache_set_size_Mbytes: 0.015625
  cache_size_bytes: 65536
  cache_size_Kbytes: 64
  cache_size_Mbytes: 0.0625
Cache config: L1Cache_0_L1D
  cache_associativity: 4
  num_cache_sets_bits: 8
  num_cache_sets: 256
  cache_set_size_bytes: 16384
  cache_set_size_Kbytes: 16
  cache_set_size_Mbytes: 0.015625
  cache_size_bytes: 65536
  cache_size_Kbytes: 64
  cache_size_Mbytes: 0.0625
Cache config: L1Cache_0_L2
  cache_associativity: 4
  num_cache_sets_bits: 16
  num_cache_sets: 65536
  cache_set_size_bytes: 4194304
  cache_set_size_Kbytes: 4096
  cache_set_size_Mbytes: 4
  cache_size_bytes: 16777216
  cache_size_Kbytes: 16384
  cache_size_Mbytes: 16
sequencer: STD_Sequencer - SC
Store buffer entries: 128 (Only valid if TSO is enabled)
memory_bits: 30
memory_size_bytes: 1073741824
memory_size_Kbytes: 1.04858e+06
memory_size_Mbytes: 1024
memory_size_Gbytes: 1
module_bits: 21
module_size_lines: 2097152
module_size_bytes: 134217728
module_size_Kbytes: 131072
module_size_Mbytes: 128
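The cache and memory sizes in the chip configuration follow from the associativity, set-count bits and block size listed under Ruby Configuration. The short check below reproduces that arithmetic. The function name is hypothetical, and the formula is the standard capacity calculation rather than code taken from Ruby.

```python
def cache_size_bytes(assoc, num_sets_bits, block_bytes):
    """Capacity = ways x sets x bytes per block."""
    return assoc * (1 << num_sets_bits) * block_bytes


DATA_BLOCK_BYTES = 64          # g_DATA_BLOCK_BYTES
MEMORY_SIZE_BYTES = 1 << 30    # g_MEMORY_SIZE_BITS: 30
NUM_MEMORIES = 8               # g_NUM_MEMORIES

l1_bytes = cache_size_bytes(4, 8, DATA_BLOCK_BYTES)    # L1_CACHE_ASSOC, L1_CACHE_NUM_SETS_BITS
l2_bytes = cache_size_bytes(4, 16, DATA_BLOCK_BYTES)   # L2_CACHE_ASSOC, L2_CACHE_NUM_SETS_BITS
module_bytes = MEMORY_SIZE_BYTES // NUM_MEMORIES
module_lines = module_bytes // DATA_BLOCK_BYTES

print(l1_bytes, l2_bytes, module_bytes, module_lines)
```

The results match the dump: 65536 bytes (64 KB) for each L1, 16777216 bytes (16 MB) for the L2, and 134217728 bytes (2097152 lines) per memory module.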

Real time: Apr/20/2005 16:26:33

Profiler Stats
--------------
Elapsed_time_in_seconds: 8368
Elapsed_time_in_minutes: 139.467
Elapsed_time_in_hours: 2.32444
Elapsed_time_in_days: 0.0968519

Ruby_current_time: 26376000
Ruby_start_time: 1
Ruby_cycles: 26375999

mbytes_resident: 232.309
mbytes_total: 247.68
resident_ratio: 0.937987

L1D_cache cache stats:
  L1D_cache_total_misses: 28732
  L1D_cache_total_demand_misses: 28732
  L1D_cache_total_prefetches: 0
  L1D_cache_total_sw_prefetches: 0
  L1D_cache_total_hw_prefetches: 0
  L1D_cache_misses_per_transaction: 28732
  L1D_cache_misses_per_instruction: 7.66225e-05
  L1D_cache_instructions_per_misses: 13051

  L1D_cache_request_type_LD: 51.3156%
  L1D_cache_request_type_ST: 43.0565%
  L1D_cache_request_type_ATOMIC: 5.62787%

  L1D_cache_access_mode_type_SupervisorMode: 24019 83.5967%
  L1D_cache_access_mode_type_UserMode: 4713 16.4033%
  L1D_cache_request_size: [binsize: log2 max: 64 count: 28732 average: 25.3078 | standard deviation: 27.779 | 0 1878 661 7732 8723 0 0 9738 ]

L1I_cache cache stats:
  L1I_cache_total_misses: 21542
  L1I_cache_total_demand_misses: 21542
  L1I_cache_total_prefetches: 0
  L1I_cache_total_sw_prefetches: 0
  L1I_cache_total_hw_prefetches: 0
  L1I_cache_misses_per_transaction: 21542
  L1I_cache_misses_per_instruction: 5.74482e-05
  L1I_cache_instructions_per_misses: 17407

  L1I_cache_request_type_IFETCH: 100%

  L1I_cache_access_mode_type_SupervisorMode: 17190 79.7976%
  L1I_cache_access_mode_type_UserMode: 4352 20.2024%
  L1I_cache_request_size: [binsize: log2 max: 4 count: 21542 average: 4 | standard deviation: 0 | 0 0 0 21542 ]

L2_cache cache stats:
  L2_cache_total_misses: 28512
  L2_cache_total_demand_misses: 28512
  L2_cache_total_prefetches: 0
  L2_cache_total_sw_prefetches: 0
  L2_cache_total_hw_prefetches: 0
  L2_cache_misses_per_transaction: 28512
  L2_cache_misses_per_instruction: 7.60358e-05
  L2_cache_instructions_per_misses: 13151.7

  L2_cache_request_type_LD: 33.165%
  L2_cache_request_type_ST: 38.9766%
  L2_cache_request_type_ATOMIC: 3.22671%
  L2_cache_request_type_IFETCH: 24.6317%

  L2_cache_access_mode_type_SupervisorMode: 24839 87.1177%
  L2_cache_access_mode_type_UserMode: 3673 12.8823%
  L2_cache_request_size: [binsize: log2 max: 64 count: 28512 average: 24.927 | standard deviation: 28.0464 | 0 1357 577 11484 5424 0 0 9670 ]

Total_misses: 28512
total_misses: 28512 [ 2098 797 849 608 635 3632 18105 1788 ]
user_misses: 3673 [ 0 0 0 0 0 234 3103 336 ]
supervisor_misses: 24839 [ 2098 797 849 608 635 3398 15002 1452 ]

instruction_executed: 374981341 [ 52017683 51976805 51917161 52160663 52135482 48815032 15010842 50947673 ]
cycles_per_instruction: 0.562716 [ 0.507058 0.507457 0.50804 0.505668 0.505913 0.540325 1.75713 0.517708 ]
misses_per_thousand_instructions: 0.0760358 [ 0.0403324 0.0153338 0.016353 0.0116563 0.0121798 0.0744033 1.20613 0.0350948 ]

transactions_started: 0 [ 0 0 0 0 0 0 0 0 ]
transactions_ended: 0 [ 0 0 0 0 0 0 0 0 ]
instructions_per_transaction: 0 [ 0 0 0 0 0 0 0 0 ]
cycles_per_transaction: 0 [ 0 0 0 0 0 0 0 0 ]
misses_per_transaction: 0 [ 0 0 0 0 0 0 0 0 ]
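The derived figures above can be recomputed from the raw per-processor counts. The sketch below copies the numbers from the dump. The formulas are inferred from the fact that they reproduce the reported values (the aggregate CPI appears to be Ruby cycles times processor count divided by total instructions), not taken from the Ruby source.

```python
RUBY_CYCLES = 26375999
INSTRUCTIONS = [52017683, 51976805, 51917161, 52160663,
                52135482, 48815032, 15010842, 50947673]
MISSES = [2098, 797, 849, 608, 635, 3632, 18105, 1788]


def cycles_per_instruction(cycles, instructions):
    return cycles / instructions


def misses_per_thousand(misses, instructions):
    return 1000.0 * misses / instructions


total_instr = sum(INSTRUCTIONS)
total_misses = sum(MISSES)

# Every processor runs for the same number of Ruby cycles.
aggregate_cpi = cycles_per_instruction(RUBY_CYCLES * len(INSTRUCTIONS), total_instr)
aggregate_mpki = misses_per_thousand(total_misses, total_instr)
per_cpu_cpi = [cycles_per_instruction(RUBY_CYCLES, n) for n in INSTRUCTIONS]
per_cpu_mpki = [misses_per_thousand(m, n) for m, n in zip(MISSES, INSTRUCTIONS)]

print(round(aggregate_cpi, 6), round(aggregate_mpki, 7))
```

Processor 6 stands out: it executed far fewer instructions than the others (about 15 million against roughly 50 million) while taking most of the misses, which is why its CPI and miss rate are the highest in the list.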

Busy Controller Counts:
L1Cache-0:0 L1Cache-1:0 L1Cache-2:0 L1Cache-3:0 L1Cache-4:0 L1Cache-5:0 L1Cache-6:0 L1Cache-7:0
Directory-0:0 Directory-1:0 Directory-2:0 Directory-3:0 Directory-4:0 Directory-5:0 Directory-6:0 Directory-7:0

Busy Bank Count:0

L1TBE_usage: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
L2TBE_usage: [binsize: 1 max: 0 count: 28512 average: 0 | standard deviation: 0 | 28512 ]
StopTable_usage: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
sequencer_requests_outstanding: [binsize: 1 max: 1 count: 50274 average: 1 | standard deviation: 0 | 0 50274 ]
store_buffer_size: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
unique_blocks_in_store_buffer: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]

All Non-Zero Cycle Demand Cache Accesses----------------------------------------miss_latency: [binsize: 4 max: 610 count: 50274 average: 113.155 | standard deviation: 97.9074 | 0 21762 0 0 0 0 0 0 0 0 0 0 0 0 0 0 430 104 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2383 63 332 164 131 238 23 246 5 150 2 23 8 0 0 0 1 21736 100 1846 10 164 119 10 94 2 41 0 15 0 0 0 0 0 0 1 27 1 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 15 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 11 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]

miss_latency_LD: [binsize: 4 max: 429 count: 14744 average: 122.583 | standard deviation: 91.6827 | 0 5288 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1534 47 302 158 130 237 23 241 4 147 1 23 8 0 0 0 1 5970 19 452 4 54 34 6 31 1 14 0 1 0 0 0 0 0 0 0 9 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]miss_latency_ST: [binsize: 4 max: 610 count: 12371 average: 177.71 | standard deviation: 66.447 | 0 1258 0 0 0 0 0 0 0 0 0 0 0 0 0 0 336 101 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 646 14 22 5 1 0 0 4 1 1 1 0 0 0 0 0 0 9352 27 496 1 29 14 3 12 0 3 0 1 0 0 0 0 0 0 1 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 10 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]miss_latency_ATOMIC: [binsize: 4 max: 427 count: 1617 average: 103.586 | standard deviation: 95.0209 | 0 697 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 187 2 8 1 0 1 0 1 0 2 0 0 0 0 0 0 0 551 6 48 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]miss_latency_IFETCH: [binsize: 2 max: 249 count: 21542 average: 70.3483 | standard deviation: 95.4592 | 0 0 14519 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5863 26 22 846 4 4 0 79 2 0 71 1 0 0 51 1 0 0 24 0 0 13 ]miss_latency_NULL: [binsize: 4 max: 610 count: 50274 average: 113.155 | standard deviation: 97.9074 | 0 21762 0 0 0 0 0 0 0 0 0 0 0 0 0 0 430 104 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2383 63 332 164 131 238 23 246 5 150 2 23 8 0 0 0 1 21736 100 1846 10 164 119 10 94 2 41 0 
15 0 0 0 0 0 0 1 27 1 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 15 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 11 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]miss_latency_L2Miss: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
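Each histogram in the dump has the same bracketed layout: a header with binsize, max, count and average, then the standard deviation, then the per-bin counts. A small parser makes the long latency histograms easier to inspect. This is a sketch that assumes the layout seen in this dump; parse_histogram is a hypothetical helper, not part of GEMS.

```python
import re


def parse_histogram(text):
    """Parse '[binsize: B max: M count: C average: A | standard deviation: S | b0 b1 ... ]'."""
    inner = text[text.index("[") + 1:text.rindex("]")]
    header, stddev, bins = (part.strip() for part in inner.split("|"))
    fields = dict(re.findall(r"(\w+):\s*(\S+)", header))
    return {
        "binsize": fields["binsize"],  # kept as text because it may be 'log2'
        "max": float(fields["max"]),
        "count": int(fields["count"]),
        "average": float(fields["average"]),  # 'NaN' parses to a float NaN
        "stddev": float(stddev.split(":")[1]),
        "bins": [float(b) for b in bins.split()],
    }


SAMPLE = ("sequencer_requests_outstanding: [binsize: 1 max: 1 count: 50274 "
          "average: 1 | standard deviation: 0 | 0 50274 ]")
hist = parse_histogram(SAMPLE)
busiest_bin = hist["bins"].index(max(hist["bins"]))
print(hist["count"], busiest_bin)
```

Applied to the miss_latency lines, the same function returns the bin counts as a list, so the clusters of latencies can be located by index (bin index times binsize gives the approximate cycle count).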

All Non-Zero Cycle SW Prefetch Requests
------------------------------------
prefetch_latency: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
prefetch_latency_L2Miss:[binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
multicast_retries: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
gets_mask_prediction_count: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
getx_mask_prediction_count: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
explicit_training_mask: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]

conflicting_histogram: [binsize: log2 max: 26374003 count: 28512 average: 1.38371e+07 | standard deviation: 1.55252e+07 | 0 0 0 5 0 0 0 0 1 8 9 19 38 63 72 0 0 0 0 0 116 1136 398 7457 8077 11113 ]
conflicting_histogram_percent: [binsize: log2 max: 26374003 count: 28512 average: 1.38371e+07 | standard deviation: 1.55252e+07 | 0 0 0 0.0175365 0 0 0 0 0.0035073 0.0280584 0.0315657 0.0666386 0.133277 0.22096 0.252525 0 0 0 0 0 0.406846 3.98429 1.3959 26.1539 28.3284 38.9766 ]

Request Profile
---------------
I M GETS 885 3.10396
I M GETX 118 0.413861
I M GET_INSTR 2 0.00701459
I OS GETS 322 1.12935
I OS GETX 5 0.0175365
I OSS GETS 1192 4.1807
I OSS GETX 17 0.059624
NP C GETS 5813 20.3879
NP C GETX 9244 32.4214
NP C GET_INSTR 4656 16.33
NP M GETS 453 1.5888
NP M GETX 158 0.554153
NP M GET_INSTR 14 0.0491021
NP OS GETS 33 0.115741
NP OSS GETS 9 0.0315657
NP S GETS 512 1.79574
NP S GETX 22 0.0771605
NP S GET_INSTR 1293 4.53493
NP SS GETS 237 0.831229
NP SS GETX 2 0.00701459
NP SS GET_INSTR 1058 3.71072
O M GETX 1 0.0035073
O OS GETX 301 1.0557
O OSS GETX 235 0.824214
S M GETX 63 0.22096
S OS GETX 538 1.88692
S OSS GETX 78 0.273569
S S GETX 1186 4.15965
S SS GETX 65 0.227974
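The last column of each request-profile row is that row's share of all requests, and the row counts add up to the 28512 L2 misses reported earlier. The sketch below checks a few rows copied from the dump. The meaning assigned to the first two columns (coherence states of the requester and of the rest of the system) is an assumption, so the code treats them as opaque labels.

```python
TOTAL_REQUESTS = 28512  # L2_cache_total_misses from the dump

# (label, request type, count, percent as printed in the dump)
ROWS = [
    ("I M", "GETS", 885, 3.10396),
    ("NP C", "GETS", 5813, 20.3879),
    ("NP C", "GETX", 9244, 32.4214),
    ("NP C", "GET_INSTR", 4656, 16.33),
    ("S S", "GETX", 1186, 4.15965),
]


def percent_of_total(count, total=TOTAL_REQUESTS):
    return 100.0 * count / total


recomputed = [(label, req, percent_of_total(count)) for label, req, count, _ in ROWS]
for label, req, pct in recomputed:
    print(f"{label:5s} {req:10s} {pct:8.4f}%")
```

The three "NP C" rows together account for roughly 69% of all requests in this run.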

filter_action: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]

Message Delayed Cycles
----------------------
Total_delay_cycles: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
Total_nonPF_delay_cycles: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
  virtual_network_0_delay_cycles: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
  virtual_network_1_delay_cycles: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
  virtual_network_2_delay_cycles: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
  virtual_network_3_delay_cycles: [binsize: 1 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]

Resource Usage
--------------
page_size: 4096
user_time: 8259
system_time: 8
page_reclaims: 71016
page_faults: 14
swaps: 0
block_inputs: 0
block_outputs: 0
MessageBuffer: [Chip 0 0, L1Cache, mandatoryQueue_in] stats - msgs:2604 full:0
MessageBuffer: [Chip 1 0, L1Cache, mandatoryQueue_in] stats - msgs:797 full:0
MessageBuffer: [Chip 2 0, L1Cache, mandatoryQueue_in] stats - msgs:855 full:0
MessageBuffer: [Chip 3 0, L1Cache, mandatoryQueue_in] stats - msgs:608 full:0
MessageBuffer: [Chip 4 0, L1Cache, mandatoryQueue_in] stats - msgs:635 full:0
MessageBuffer: [Chip 5 0, L1Cache, mandatoryQueue_in] stats - msgs:4010 full:0
MessageBuffer: [Chip 6 0, L1Cache, mandatoryQueue_in] stats - msgs:38932 full:0
MessageBuffer: [Chip 7 0, L1Cache, mandatoryQueue_in] stats - msgs:1833 full:0

Network Stats
-------------

switch_0_inlinks: 1
switch_0_outlinks: 1
links_utilized_percent_switch_0: 0.0302487
  links_utilized_percent_switch_0_link_0: 0.0302487 bw: 10000 base_latency: 14
  outgoing_messages_switch_0_link_0_Control: 2098 16784 [ 2098 0 0 0 ] base_latency: 14
  outgoing_messages_switch_0_link_0_Data: 875 63000 [ 0 875 0 0 ] base_latency: 14

switch_1_inlinks: 1
switch_1_outlinks: 1
links_utilized_percent_switch_1: 0.00678496
  links_utilized_percent_switch_1_link_0: 0.00678496 bw: 10000 base_latency: 14
  outgoing_messages_switch_1_link_0_Control: 797 6376 [ 797 0 0 0 ] base_latency: 14
  outgoing_messages_switch_1_link_0_Data: 160 11520 [ 0 160 0 0 ] base_latency: 14

switch_2_inlinks: 1
switch_2_outlinks: 1
links_utilized_percent_switch_2: 0.00898999
  links_utilized_percent_switch_2_link_0: 0.00898999 bw: 10000 base_latency: 14
  outgoing_messages_switch_2_link_0_Control: 849 6792 [ 849 0 0 0 ] base_latency: 14
  outgoing_messages_switch_2_link_0_Data: 235 16920 [ 0 235 0 0 ] base_latency: 14

switch_3_inlinks: 1
switch_3_outlinks: 1
links_utilized_percent_switch_3: 0.00561116
  links_utilized_percent_switch_3_link_0: 0.00561116 bw: 10000 base_latency: 14
  outgoing_messages_switch_3_link_0_Control: 608 4864 [ 608 0 0 0 ] base_latency: 14
  outgoing_messages_switch_3_link_0_Data: 138 9936 [ 0 138 0 0 ] base_latency: 14

switch_4_inlinks: 1
switch_4_outlinks: 1
links_utilized_percent_switch_4: 0.00451926
  links_utilized_percent_switch_4_link_0: 0.00451926 bw: 10000 base_latency: 14
  outgoing_messages_switch_4_link_0_Control: 635 5080 [ 635 0 0 0 ] base_latency: 14
  outgoing_messages_switch_4_link_0_Data: 95 6840 [ 0 95 0 0 ] base_latency: 14

switch_5_inlinks: 1
switch_5_outlinks: 1
links_utilized_percent_switch_5: 0.0330998
  links_utilized_percent_switch_5_link_0: 0.0330998 bw: 10000 base_latency: 14
  outgoing_messages_switch_5_link_0_Control: 3632 29056 [ 3632 0 0 0 ] base_latency: 14
  outgoing_messages_switch_5_link_0_Data: 809 58248 [ 0 809 0 0 ] base_latency: 14

switch_6_inlinks: 1
switch_6_outlinks: 1
links_utilized_percent_switch_6: 0.0840127
  links_utilized_percent_switch_6_link_0: 0.0840127 bw: 10000 base_latency: 14
  outgoing_messages_switch_6_link_0_Control: 18105 144840 [ 18105 0 0 0 ] base_latency: 14
  outgoing_messages_switch_6_link_0_Data: 1066 76752 [ 0 1066 0 0 ] base_latency: 14

switch_7_inlinks: 1
switch_7_outlinks: 1
links_utilized_percent_switch_7: 0.0181438
  links_utilized_percent_switch_7_link_0: 0.0181438 bw: 10000 base_latency: 14
  outgoing_messages_switch_7_link_0_Control: 1788 14304 [ 1788 0 0 0 ] base_latency: 14
  outgoing_messages_switch_7_link_0_Data: 466 33552 [ 0 466 0 0 ] base_latency: 14

switch_8_inlinks: 1
switch_8_outlinks: 1
links_utilized_percent_switch_8: 0.0817288
  links_utilized_percent_switch_8_link_0: 0.0817288 bw: 10000 base_latency: 14
  outgoing_messages_switch_8_link_0_Data: 2994 215568 [ 0 2994 0 0 ] base_latency: 14

switch_9_inlinks: 1
switch_9_outlinks: 1
links_utilized_percent_switch_9: 0.0821929
  links_utilized_percent_switch_9_link_0: 0.0821929 bw: 10000 base_latency: 14
  outgoing_messages_switch_9_link_0_Data: 3011 216792 [ 0 3011 0 0 ] base_latency: 14

switch_10_inlinks: 1
switch_10_outlinks: 1
links_utilized_percent_switch_10: 0.0805005
  links_utilized_percent_switch_10_link_0: 0.0805005 bw: 10000 base_latency: 14
  outgoing_messages_switch_10_link_0_Data: 2949 212328 [ 0 2949 0 0 ] base_latency: 14

switch_11_inlinks: 1
switch_11_outlinks: 1
links_utilized_percent_switch_11: 0.0836397
  links_utilized_percent_switch_11_link_0: 0.0836397 bw: 10000 base_latency: 14
  outgoing_messages_switch_11_link_0_Data: 3064 220608 [ 0 3064 0 0 ] base_latency: 14

switch_12_inlinks: 1
switch_12_outlinks: 1
links_utilized_percent_switch_12: 0.0838581
  links_utilized_percent_switch_12_link_0: 0.0838581 bw: 10000 base_latency: 14
  outgoing_messages_switch_12_link_0_Data: 3072 221184 [ 0 3072 0 0 ] base_latency: 14

switch_13_inlinks: 1
switch_13_outlinks: 1
links_utilized_percent_switch_13: 0.0812921
  links_utilized_percent_switch_13_link_0: 0.0812921 bw: 10000 base_latency: 14
  outgoing_messages_switch_13_link_0_Data: 2978 214416 [ 0 2978 0 0 ] base_latency: 14

switch_14_inlinks: 1
switch_14_outlinks: 1
links_utilized_percent_switch_14: 0.0824659
  links_utilized_percent_switch_14_link_0: 0.0824659 bw: 10000 base_latency: 14
  outgoing_messages_switch_14_link_0_Data: 3021 217512 [ 0 3021 0 0 ] base_latency: 14

switch_15_inlinks: 1
switch_15_outlinks: 1
links_utilized_percent_switch_15: 0.0818653
  links_utilized_percent_switch_15_link_0: 0.0818653 bw: 10000 base_latency: 14
  outgoing_messages_switch_15_link_0_Data: 2999 215928 [ 0 2999 0 0 ] base_latency: 14

switch_16_inlinks: 4
switch_16_outlinks: 1
links_utilized_percent_switch_16: 0.0516348
  links_utilized_percent_switch_16_link_0: 0.0516348 bw: 10000 base_latency: 14
  outgoing_messages_switch_16_link_0_Control: 4352 34816 [ 4352 0 0 0 ] base_latency: 14
  outgoing_messages_switch_16_link_0_Data: 1408 101376 [ 0 1408 0 0 ] base_latency: 14

switch_17_inlinks: 4
switch_17_outlinks: 1
links_utilized_percent_switch_17: 0.139776
  links_utilized_percent_switch_17_link_0: 0.139776 bw: 10000 base_latency: 14
  outgoing_messages_switch_17_link_0_Control: 24160 193280 [ 24160 0 0 0 ] base_latency: 14
  outgoing_messages_switch_17_link_0_Data: 2436 175392 [ 0 2436 0 0 ] base_latency: 14

switch_18_inlinks: 4
switch_18_outlinks: 1
links_utilized_percent_switch_18: 0.328062
  links_utilized_percent_switch_18_link_0: 0.328062 bw: 10000 base_latency: 14
  outgoing_messages_switch_18_link_0_Data: 12018 865296 [ 0 12018 0 0 ] base_latency: 14

switch_19_inlinks: 4
switch_19_outlinks: 1
links_utilized_percent_switch_19: 0.329481
  links_utilized_percent_switch_19_link_0: 0.329481 bw: 10000 base_latency: 14
  outgoing_messages_switch_19_link_0_Data: 12070 869040 [ 0 12070 0 0 ] base_latency: 14

switch_20_inlinks: 4
switch_20_outlinks: 4
links_utilized_percent_switch_20: 0.255573
  links_utilized_percent_switch_20_link_0: 0.199682 bw: 10000 base_latency: 14
  links_utilized_percent_switch_20_link_1: 0.736133 bw: 10000 base_latency: 14
  links_utilized_percent_switch_20_link_2: 0.0423597 bw: 10000 base_latency: 14
  links_utilized_percent_switch_20_link_3: 0.0441189 bw: 10000 base_latency: 14
  outgoing_messages_switch_20_link_0_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_20_link_0_Data: 4147 298584 [ 0 4147 0 0 ] base_latency: 14
  outgoing_messages_switch_20_link_1_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_20_link_1_Data: 23799 1713528 [ 0 23799 0 0 ] base_latency: 14
  outgoing_messages_switch_20_link_2_Control: 13966 111728 [ 13966 0 0 0 ] base_latency: 14
  outgoing_messages_switch_20_link_3_Control: 14546 116368 [ 14546 0 0 0 ] base_latency: 14

switch_21_inlinks: 1
switch_21_outlinks: 4
links_utilized_percent_switch_21: 0.114848
  links_utilized_percent_switch_21_link_0: 0.141265 bw: 10000 base_latency: 14
  links_utilized_percent_switch_21_link_1: 0.107252 bw: 10000 base_latency: 14
  links_utilized_percent_switch_21_link_2: 0.108617 bw: 10000 base_latency: 14
  links_utilized_percent_switch_21_link_3: 0.102257 bw: 10000 base_latency: 14
  outgoing_messages_switch_21_link_0_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_21_link_0_Data: 2007 144504 [ 0 2007 0 0 ] base_latency: 14
  outgoing_messages_switch_21_link_1_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_21_link_1_Data: 761 54792 [ 0 761 0 0 ] base_latency: 14
  outgoing_messages_switch_21_link_2_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_21_link_2_Data: 811 58392 [ 0 811 0 0 ] base_latency: 14
  outgoing_messages_switch_21_link_3_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_21_link_3_Data: 578 41616 [ 0 578 0 0 ] base_latency: 14

switch_22_inlinks: 1
switch_22_outlinks: 4
links_utilized_percent_switch_22: 0.249035
  links_utilized_percent_switch_22_link_0: 0.102994 bw: 10000 base_latency: 14
  links_utilized_percent_switch_22_link_1: 0.183767 bw: 10000 base_latency: 14
  links_utilized_percent_switch_22_link_2: 0.575514 bw: 10000 base_latency: 14
  links_utilized_percent_switch_22_link_3: 0.133867 bw: 10000 base_latency: 14
  outgoing_messages_switch_22_link_0_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_22_link_0_Data: 605 43560 [ 0 605 0 0 ] base_latency: 14
  outgoing_messages_switch_22_link_1_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_22_link_1_Data: 3564 256608 [ 0 3564 0 0 ] base_latency: 14
  outgoing_messages_switch_22_link_2_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_22_link_2_Data: 17915 1289880 [ 0 17915 0 0 ] base_latency: 14
  outgoing_messages_switch_22_link_3_Control: 28512 228096 [ 28512 0 0 0 ] base_latency: 14
  outgoing_messages_switch_22_link_3_Data: 1736 124992 [ 0 1736 0 0 ] base_latency: 14

switch_23_inlinks: 1
switch_23_outlinks: 4
links_utilized_percent_switch_23: 0.0105899
  links_utilized_percent_switch_23_link_0: 0.0101183 bw: 10000 base_latency: 14
  links_utilized_percent_switch_23_link_1: 0.0107613 bw: 10000 base_latency: 14
  links_utilized_percent_switch_23_link_2: 0.00997877 bw: 10000 base_latency: 14
  links_utilized_percent_switch_23_link_3: 0.0115014 bw: 10000 base_latency: 14
  outgoing_messages_switch_23_link_0_Control: 3336 26688 [ 3336 0 0 0 ] base_latency: 14
  outgoing_messages_switch_23_link_1_Control: 3548 28384 [ 3548 0 0 0 ] base_latency: 14
  outgoing_messages_switch_23_link_2_Control: 3290 26320 [ 3290 0 0 0 ] base_latency: 14
  outgoing_messages_switch_23_link_3_Control: 3792 30336 [ 3792 0 0 0 ] base_latency: 14

switch_24_inlinks: 1
switch_24_outlinks: 4
links_utilized_percent_switch_24: 0.0110297
  links_utilized_percent_switch_24_link_0: 0.011007 bw: 10000 base_latency: 14
  links_utilized_percent_switch_24_link_1: 0.0106885 bw: 10000 base_latency: 14
  links_utilized_percent_switch_24_link_2: 0.011556 bw: 10000 base_latency: 14
  links_utilized_percent_switch_24_link_3: 0.0108675 bw: 10000 base_latency: 14
  outgoing_messages_switch_24_link_0_Control: 3629 29032 [ 3629 0 0 0 ] base_latency: 14
  outgoing_messages_switch_24_link_1_Control: 3524 28192 [ 3524 0 0 0 ] base_latency: 14
  outgoing_messages_switch_24_link_2_Control: 3810 30480 [ 3810 0 0 0 ] base_latency: 14
  outgoing_messages_switch_24_link_3_Control: 3583 28664 [ 3583 0 0 0 ] base_latency: 14
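In each outgoing_messages line the first number is a message count and the second is consistent with a byte total: Control messages work out to 8 bytes each and Data messages to 72 bytes each (the 64-byte data block plus what looks like an 8-byte header). The check below infers those sizes from a few rows copied from the dump. The interpretation of the columns is an assumption based on the arithmetic, not on the Ruby source.

```python
# (message class, message count, second column of the dump line)
SAMPLES = [
    ("Control", 2098, 16784),     # switch_0_link_0
    ("Data", 875, 63000),         # switch_0_link_0
    ("Control", 28512, 228096),   # switch_20_link_0
    ("Data", 23799, 1713528),     # switch_20_link_1
    ("Data", 12018, 865296),      # switch_18_link_0
]


def bytes_per_message(samples):
    """Return {class: size} if every sample of a class implies the same whole size."""
    sizes = {}
    for kind, count, total in samples:
        size, remainder = divmod(total, count)
        if remainder != 0:
            raise ValueError(f"{kind}: {total} is not a multiple of {count}")
        if sizes.setdefault(kind, size) != size:
            raise ValueError(f"{kind}: inconsistent size {size} vs {sizes[kind]}")
    return sizes


sizes = bytes_per_message(SAMPLES)
print(sizes)
```

Under this reading, switch_20_link_1 and switch_22_link_2 carry by far the most data traffic in the run.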

Simics Driver Transaction Stats
----------------------------------
Insn requests: 374980900
Data requests: 88705149
Memory mapped IO register accesses: 58
Device initiated accesses: 0
Other initiated accesses: 0
Atomic load accesses: 5066
Exceptions: 5871
Non stallable accesses: 17230
Prefetches: 0
Cache Flush: 737

Requests of asi 0x4: 191870
Requests of asi 0x10: 10080
Requests of asi 0x11: 7735
Requests of asi 0x14: 893
Requests of asi 0x24: 5066
Requests of asi 0x71: 90
Requests of asi 0x80: 463460432
Requests of asi 0xf0: 9883

Simics Driver Transaction Results Stats
------------------------------------------
Fast path: 463618487
Request missed: 50274
Sequencer not ready: 0
Duplicate instruction fetches: 21541
Hit return: 27115
Atomic last accesses: 1617

Chip Stats
----------

 --- L1Cache ---
 - Event Counts -
Load 14744
Ifetch 21542
Store 13988
L1_to_L2 37772
L2_to_L1D 7442
L2_to_L1I 14520
L2_Replacement 0
Own_GETS 9456
Own_GET_INSTR 7023
Own_GETX 12033
Own_PUTX 0
Other_GETS 66192
Other_GET_INSTR 49161
Other_GETX 84231
Other_PUTX 0
Data 27977

 - Transitions -
NP Load 7057
NP Ifetch 7021
NP Store 9426
NP Other_GETS 50429
NP Other_GET_INSTR 43420
NP Other_GETX 79638
NP Other_PUTX 0 <--

I Load 2399
I Ifetch 2
I Store 140
I L1_to_L2 189
I L2_to_L1D 139
I L2_to_L1I 1
I L2_Replacement 0 <--
I Other_GETS 3181
I Other_GET_INSTR 0 <--
I Other_GETX 1024
I Other_PUTX 0 <--

S Load 3180
S Ifetch 14205
S Store 1930
S L1_to_L2 24732
S L2_to_L1D 3208
S L2_to_L1I 14205
S L2_Replacement 0 <--
S Other_GETS 2833
S Other_GET_INSTR 4903
S Other_GETX 1965
S Other_PUTX 0 <--

O Load 175
O Ifetch 54
O Store 537
O L1_to_L2 319
O L2_to_L1D 207
O L2_to_L1I 54
O L2_Replacement 0 <--
O Other_GETS 1358
O Other_GET_INSTR 0 <--
O Other_GETX 635
O Other_PUTX 0 <--

M Load 1933
M Ifetch 260
M Store 1955
M L1_to_L2 12532
M L2_to_L1D 3888
M L2_to_L1I 260
M L2_Replacement 0 <--
M Other_GETS 1310
M Other_GET_INSTR 16
M Other_GETX 260
M Other_PUTX 0 <--

IS_AD Load 0 <--
IS_AD Ifetch 0 <--
IS_AD Store 0 <--
IS_AD L1_to_L2 0 <--
IS_AD L2_to_L1D 0 <--
IS_AD L2_to_L1I 0 <--
IS_AD L2_Replacement 0 <--
IS_AD Own_GETS 9456
IS_AD Own_GET_INSTR 7023
IS_AD Other_GETS 3376
IS_AD Other_GET_INSTR 411
IS_AD Other_GETX 111
IS_AD Other_PUTX 0 <--
IS_AD Data 0 <--

IM_AD Load 0 <--
IM_AD Ifetch 0 <--
IM_AD Store 0 <--
IM_AD L1_to_L2 0 <--
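One sanity check on the chip statistics is that each event count equals the sum of the transitions taken on that event across all states. The sketch below does this for the processor-side events (Load, Ifetch, Store) using the stable states NP, I, S, O and M copied from the dump. The transient states listed in the dump report zero for these events, and the transition list is truncated in the transcript, so the other events are not checked here.

```python
EVENT_COUNTS = {"Load": 14744, "Ifetch": 21542, "Store": 13988}

# Per-state transition counts for the processor-side events.
TRANSITIONS = {
    "NP": {"Load": 7057, "Ifetch": 7021, "Store": 9426},
    "I": {"Load": 2399, "Ifetch": 2, "Store": 140},
    "S": {"Load": 3180, "Ifetch": 14205, "Store": 1930},
    "O": {"Load": 175, "Ifetch": 54, "Store": 537},
    "M": {"Load": 1933, "Ifetch": 260, "Store": 1955},
}


def totals_by_event(transitions):
    """Sum the per-state counts for every event."""
    totals = {}
    for per_state in transitions.values():
        for event, count in per_state.items():
            totals[event] = totals.get(event, 0) + count
    return totals


def mismatches(event_counts, transitions):
    """Return the events whose per-state sum differs from the reported total."""
    totals = totals_by_event(transitions)
    return {e: (event_counts[e], totals.get(e, 0))
            for e in event_counts if event_counts[e] != totals.get(e, 0)}


print(totals_by_event(TRANSITIONS), mismatches(EVENT_COUNTS, TRANSITIONS))
```

Reading the rows this way also shows where requests found their block: most loads and stores arrived in NP (not present), while most instruction fetches hit blocks already in S.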

Weaknesses

• Requires a highly capable host machine

• No modeling of bus-based architectures

• No simple way to disable performance statistics

Conclusion

• Cache coherency protocols are complex

• Excellent interface for testing new protocols

• GEMS is useful if left running for days or weeks simulating a real operating system environment

• GEMS is not useful for a quick comparison of coherency protocols