
計算機構成特論ーースーパースカラプロセッサ

立命館大学電子情報工学科

孟林

2020/9/28

1

Advanced Topics in Computer Architecture

ーSuperscalar Processor

Department of Electronic and Computer Engineering,

Ritsumeikan University

Lin Meng

2020/9/28

2

What is computer architecture? What is a processor?

3

Types of processors
• Single-cycle processor
• Multi-cycle processor (scalar processor)
• Superscalar processor
• Multi-core processor
• Many-core processor
• GPU
• VLIW
• Heterogeneous processor (computing)
• Homogeneous

What is computer architecture? What is a processor?

5

Types of CPU
• Single-cycle processor
• Multi-cycle processor (scalar processor)
• Superscalar processor
• Multi-core processor
• Many-core processor
• GPGPU (general-purpose computing on graphics processing units)
• VLIW (very long instruction word)
• Heterogeneous computing
• Homogeneous

7

Single-cycle processor

1 clock cycle

8

Single-cycle processor 1 clock cycle

9

Multi-cycle processor

1 clock cycle

10

Multi-cycle processor

1 clock cycle

11

Pipeline processor

12

Pipeline Processor

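Superscalar processor

• Scalar processor
  • A pipelined processor that can run the stages from fetch through execution at most once per clock cycle
• Superscalar processor
  • A pipelined processor that can run the stages from fetch through execution multiple times per clock cycle

[Figure: pipeline timing diagram of instructions INS 1 to INS 8 against clock cycles, with stages IF, DE, EX, MA, WB]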

13

Superscalar Processor

[Figure: pipeline timing diagram of instructions INS 1 to INS 8 against clock cycles]

• Scalar processor
  • A scalar processor is a pipelined processor that is designed to fetch and issue at most one instruction every cycle.
• Superscalar processor
  • A superscalar processor is designed to fetch and issue multiple instructions every cycle.
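The difference between the two definitions can be made concrete with a small cycle count. The sketch below assumes an ideal pipeline with no stalls or hazards; the function name `ideal_cycles`, the 5-stage depth, and the issue width of 2 are illustrative assumptions, not values from the slides.

```python
import math


def ideal_cycles(num_instructions: int, num_stages: int = 5, issue_width: int = 1) -> int:
    """Cycles needed to finish all instructions in an ideal pipeline (no stalls).

    issue_width=1 models a scalar pipeline; issue_width>1 models a superscalar one.
    """
    if num_instructions <= 0:
        return 0
    # Instructions enter the pipeline in groups of `issue_width` per cycle.
    groups = math.ceil(num_instructions / issue_width)
    # The first group needs `num_stages` cycles; every later group finishes one cycle after the previous one.
    return num_stages + groups - 1


scalar = ideal_cycles(8, issue_width=1)       # 5 + 8 - 1 = 12 cycles
superscalar = ideal_cycles(8, issue_width=2)  # 5 + 4 - 1 = 8 cycles
print(scalar, superscalar)
```

Real processors fall short of this bound because of the data, control, and resource constraints discussed later in the deck.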

14

15

Instruction set (MIPS): Execution-type

Format (R-type): op | rs | rt | rd | shamt | funct

op: opcode

rs: operand

rt: operand

rd: destination

Example: add $s1, $s2, $s3 is encoded as 0 | 18 | 19 | 17 | 0 | 32

16

Instruction set (MIPS): Execution-type

Format (R-type): op | rs | rt | rd | shamt | funct

op: operation code

rs: operand

rt: operand

rd: destination operand

Example: add $s1, $s2, $s3   # $s1 = $s2 + $s3
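As a minimal sketch of how the six R-type fields pack into one 32-bit word, the hypothetical helper `encode_r_type` below uses the standard MIPS field widths (6, 5, 5, 5, 5, 6 bits) and the field values 0, 18, 19, 17, 0, 32 that the Japanese version of this slide gives for `add $s1, $s2, $s3`.

```python
def encode_r_type(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the R-type fields into a 32-bit instruction word."""
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        # Shift the previous fields left and append the next field.
        word = (word << width) | value
    return word


# add $s1, $s2, $s3: rs = $s2 (18), rt = $s3 (19), rd = $s1 (17), funct = 32
word = encode_r_type(op=0, rs=18, rt=19, rd=17, shamt=0, funct=32)
print(f"{word:#010x}")
```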

17

Executing load instructions

Instruction format: I-type

op | rs | rt | address

Example: lw $s1, 20($s2) # $s1 = mem[20 + $s2]

Example: sw $s1, 20($s2) # mem[20 + $s2] = $s1

18

Load-Store instruction: I-type

op | rs | rt | address

Example: lw $s1, 20($s2) # $s1 = mem[20 + $s2]

Example: sw $s1, 20($s2) # mem[20 + $s2] = $s1

19

Executing branch instructions (beq)

Instruction format: I-type

op | rs | rt | address

Example A: beq $s1, $s2, 25
# if ($s1 == $s2) goto PC + 25 + 4
# else goto PC + 4

[Figure: control-flow graph with blocks B: R1=R1+1; A: beq $s1, $s2, 25; I: R2=R2+1, J: Jump C; C: R1=R1+1, D: R2=R2+1, E: R3=R3+1; address labels PC, PC+4, PC+25+4]

20

Branch instruction (beq): I-type

op | rs | rt | address

Example A: beq $s1, $s2, 25
# if ($s1 == $s2) goto PC + 25 + 4
# else goto PC + 4

[Figure: control-flow graph with blocks B: R1=R1+1; A: beq $s1, $s2, 25; I: R2=R2+1, J: Jump C; C: R1=R1+1, D: R2=R2+1, E: R3=R3+1; address labels PC, PC+4, PC+25+4]

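Pipeline stages
• Fetch
  • Fetch the instruction from instruction memory
• Decode
  • Decode the instruction: decode the operands and the opcode
• Execution
  • Execute the instruction
• Memory access
  • Access the data memory
• Writeback/Commit
  • Access the register file or memory, store the execution result, and complete the instruction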

The pipeline stages of a superscalar processor

• Fetch: fetch the instruction from the instruction cache
• Decode: decode the instruction
  • Opcode decode, operand decode
• Execution: execute the instruction
• Memory access: access the data memory
• Writeback/Commit
  • Write the execution results back into memory or the register file

22

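Structure of a superscalar processor

[Figure: block diagram. Blocks: instruction cache, instruction decode, register renaming, branch prediction, instruction window, functional units (load/store, branch, shift, arithmetic logic unit), register file, reorder buffer, data cache. Regions: front-end, dispatch, execution core, back-end, commit]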

Structure of a superscalar processor

[Figure: block diagram. Blocks: instruction cache, instruction decode, register renaming, branch prediction, instruction window, functional units (load/store, branch, shift), register file, reorder buffer, data cache. Regions: front-end, dispatch, execution core, back-end, commit]

24

25

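[Figure: Ins1 (Add $s1 $s5 $s4) and Ins2 (Sub $s2 $s1 $s6) on ALU1 over time; C: R1←R1+1, D: R2←R2+1]

Constraints
• Data dependence
  • The source operands must already have been produced
• Control dependence
  • The next instruction is executed only after the result of the branch instruction is known
• Resource dependence
  • The available resources must not be exceeded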

26


[Figure: Ins2 (Sub $s2 $s1 $s6) on ALU1 over time, with operands $s5, $s4, $s1, $s6; C: R1←R1+1, D: R2←R2+1]

Constraints
• Data dependence
  • The source operands must have been generated
• Control dependence
  • The control instruction must be executed before it is known which instruction will be fetched next
• Resource constraints
  • No over-subscription of resources

How to improve performance

• Add hardware? Raise the clock frequency?
  • Unrealistic
• Scheduling
• Prediction
  • Use history to predict
  • Branch prediction
  • Data prediction

27

How to improve the performance

• Add hardware? Raise the clock frequency?
  • Unrealistic
• Scheduling
• Prediction
  • Use the history for predicting
  • Branch prediction
  • Data prediction

28

29

Data dependence and how to relax it

• True data dependence: write -> read (RAW hazard)
    r1 = r2 + r3
    r4 = r1 + r5
• Output dependence: write -> write (WAW hazard)
    r2 = r2 + r3
    …
    r2 = r1 + r5
• Anti-dependence: read -> write (WAR hazard)
    r3 = r2 + r3
    …
    r2 = r1 + r5
• Register renaming
    r2 = r2 + r3
    …
    I12 = r1 + r5

30

Data dependence and the reduction method

• True dependence: write -> read (RAW hazard)
    r1 = r2 + r3
    r4 = r1 + r5
• Output dependence: write -> write (WAW hazard)
    r2 = r2 + r3
    …
    r2 = r1 + r5
• Anti-dependence: read -> write (WAR hazard)
    r3 = r2 + r3
    …
    r2 = r1 + r5
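The three dependence types can be checked mechanically from the destination and source registers of two instructions in program order. The following is a minimal sketch; the function name `classify_dependences` and the `(dest, sources)` tuple layout are assumptions made for illustration.

```python
def classify_dependences(first, second):
    """Return the set of dependences from `first` to a later instruction `second`.

    Each instruction is a (dest, sources) tuple, e.g. ("r1", ("r2", "r3")).
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 in srcs2:
        kinds.add("RAW")  # true dependence: second reads what first writes
    if dest1 == dest2:
        kinds.add("WAW")  # output dependence: both write the same register
    if dest2 in srcs1:
        kinds.add("WAR")  # anti-dependence: second overwrites what first reads
    return kinds


true_dep = classify_dependences(("r1", ("r2", "r3")), ("r4", ("r1", "r5")))
output_dep = classify_dependences(("r2", ("r2", "r3")), ("r2", ("r1", "r5")))
anti_dep = classify_dependences(("r3", ("r2", "r3")), ("r2", ("r1", "r5")))
# Renaming the second destination (r2 -> I12) removes the WAW and WAR dependences.
renamed = classify_dependences(("r2", ("r2", "r3")), ("I12", ("r1", "r5")))
print(true_dep, output_dep, anti_dep, renamed)
```

Note that the output-dependence example from the slide also carries an anti-dependence on r2, because the first instruction reads r2 before the second one overwrites it.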

31

Scheduling: in-order

[Figure: single-instruction in-order issue. One instruction (OP, Dtag, Src1, Src2) reads a register file whose entries each hold a ready bit (R) and a value; output: Ready]

[Figure: multiple-instruction in-order issue. Instructions 0 to i-1 read the same register file; 3 x i register numbers go in and Ready 0…i-1 (x i) comes out]

32

Scheduling: in-order

[Figure: single-instruction in-order issue. One instruction (OP, Dtag, Src1, Src2) reads a register file whose entries each hold a ready bit (R) and a value; output: Ready]

[Figure: multiple-instruction in-order issue. Instructions 0 to i-1 read the same register file; 3 x i register numbers go in and Ready 0…i-1 (x i) comes out]

33

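Scheduling: Tomasulo algorithm

[Figure: multiple-instruction out-of-order issue. Entries 0 to i-1 each hold OP, Dtag, and for each of two source operands an Stag, a ready bit (R), and a Value; tag comparators (comp) on every source operand; Select logic]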

34

Scheduling: Tomasulo algorithm

[Figure: multiple-instruction out-of-order issue. Entries 0 to i-1 each hold OP, Dtag, and for each of two source operands an Stag, a ready bit (R), and a Value; tag comparators (comp) on every source operand; Select logic]
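The fields in the figure (OP, Dtag, and Stag / R / Value per source operand) suggest the usual wakeup-and-select loop. The sketch below is one possible reading of it, not the lecture's exact design: the class names, the oldest-ready-first selection policy, and the single broadcast per call are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operand:
    stag: Optional[int]  # tag of the producing instruction, None once the value is present
    ready: bool          # the R bit
    value: int = 0


@dataclass
class Entry:
    op: str
    dtag: int            # tag broadcast when this instruction produces its result
    src: List[Operand]

    def is_ready(self) -> bool:
        return all(s.ready for s in self.src)


def wakeup(window: List[Entry], tag: int, value: int) -> None:
    """Compare the broadcast tag against every waiting Stag (the `comp` blocks)."""
    for entry in window:
        for s in entry.src:
            if not s.ready and s.stag == tag:
                s.ready, s.value, s.stag = True, value, None


def select(window: List[Entry]) -> Optional[Entry]:
    """Remove and return the oldest entry whose operands are all ready, if any."""
    for i, entry in enumerate(window):
        if entry.is_ready():
            return window.pop(i)
    return None


window = [
    Entry("sub", dtag=2, src=[Operand(stag=1, ready=False), Operand(None, True, 6)]),
    Entry("add", dtag=3, src=[Operand(None, True, 1), Operand(None, True, 2)]),
]
first = select(window)                # "add" issues first: out-of-order issue
stalled = select(window)              # "sub" is still waiting for tag 1
wakeup(window, tag=1, value=10)       # the producer broadcasts its result
second = select(window)               # now "sub" can issue
```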

Methods for relaxing data dependence

• Chaining / ALU cascading
• Load value prediction

35

True data dependence reduction methods

• Chaining / ALU cascading
• Load value prediction

36

37

Chaining / ALU Cascading

[Figure: timing within one clock. ALU1 takes $s5 and $s4 and produces $s1; ALU2 takes $s1 and $s6 and produces $s2 for Ins2 (Sub $s2 $s1 $s6)]

38


39

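Data dependence on load/store instructions

i1: LW $1 0($2)
i2: ADD $4 $1 $5

[Figure: pipeline diagram. i1: IF ID EX MA WB; i2: IF ID, then EX MA WB after $1 is available from i1]

i1 and i2 cannot be executed in parallel.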

40

Data dependency on load/store instruction

i1: LW $1 0($2)
i2: ADD $4 $1 $5

[Figure: pipeline diagram. i1: IF ID EX MA WB; i2: IF ID, then EX MA WB after $1 is available from i1]

i1 and i2 cannot be executed in parallel.

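Load value prediction

i1: LW $1 0($2)
i2: ADD $4 $1 $5

[Figure: pipeline diagram. i1: IF ID EX MA WB; i2: IF ID EX MA WB, using a predicted value of $1 from load value prediction]

Load value prediction: performance increase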

Load value prediction

i1: LW $1 0($2)
i2: ADD $4 $1 $5

[Figure: pipeline diagram. i1: IF ID EX MA WB; i2: IF ID EX MA WB, using a predicted value of $1 from load value prediction]

Load value prediction: performance increase

Performance discussion of load value prediction

• Without load value prediction
  • Primary cache access: 1 cycle
  • Secondary cache access: 8 cycles
  • Memory access: 24 cycles
• Miss penalty of load value prediction
  • 30 clock cycles
• Both the prediction rate and the prediction accuracy are important

Default of SimpleScalar

$$\text{Prediction rate} = \frac{\text{number of predictions}}{\text{number of load instruction executions}}$$

$$\text{Prediction accuracy} = \frac{\text{number of successful predictions}}{\text{number of predictions}}$$

Discussion about Load Value Predictor

• Without load value prediction
  • Primary cache access: 1 cycle
  • Secondary cache access: 8 cycles
  • Memory access: 24 cycles
• When a prediction misses
  • Penalty is 30 cycles
• Both a high prediction rate and a high prediction accuracy are important for performance improvement

Default of SimpleScalar

$$\text{Prediction rate} = \frac{\text{number of predictions}}{\text{number of load instruction executions}}$$

$$\text{Prediction accuracy} = \frac{\text{number of successful predictions}}{\text{number of predictions}}$$
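The two ratios are easy to mix up, so a small numeric sketch may help. The function names and the counts below (1000 executed loads, 600 predictions, 540 of them correct) are hypothetical values chosen only to illustrate the definitions.

```python
def prediction_rate(num_predictions: int, num_load_executions: int) -> float:
    """Fraction of executed loads for which a prediction was made."""
    return num_predictions / num_load_executions if num_load_executions else 0.0


def prediction_accuracy(num_successful: int, num_predictions: int) -> float:
    """Fraction of the predictions that turned out to be correct."""
    return num_successful / num_predictions if num_predictions else 0.0


# Hypothetical counts, not measurements from the lecture.
rate = prediction_rate(num_predictions=600, num_load_executions=1000)
accuracy = prediction_accuracy(num_successful=540, num_predictions=600)
print(f"rate={rate:.2f}, accuracy={accuracy:.2f}")
```

A predictor that rarely predicts can have a high accuracy but a low rate, and the reverse is also possible, which is why the slide asks for both to be high.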

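• One hop address renaming
  • Predicts the load value by renaming the load instruction address into the load value

[Figure: one hop. The load address (Load addr) indexes a prediction table holding prediction information (last value prediction, stride, DFCM) and yields a value; example entries i1: 10, i4: 20 for a loop containing i1: LW $1 0($2) and i4: LW $5 0($6)]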

Conventional Method

• One hop address renaming method
  • Predicts the load value by renaming the load instruction address into the load value

[Figure: the load address (Load addr) indexes a prediction table holding prediction information (last value prediction, stride, DFCM) and yields a value; example entries i1: 10, i4: 20 for a loop containing i1: LW $1 0($2) and i4: LW $5 0($6)]

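Conventional Method

• Last value prediction
  • Predicts the value of a load instruction from the value of the last load/store
• Stride prediction
  • Predicts by adding the last value and the stride value

[Figure: last value prediction uses the last value (1) as the predicted value; stride prediction adds last value 3 and stride 2 to give the value 5]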

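Conventional Method

• Last value prediction
  • Predicts the load value by the last value
• Stride prediction
  • Predicts by adding the last value and the stride value

[Figure: last value prediction uses the last value (1) as the predicted value; stride prediction adds last value 3 and stride 2 to give the value 5]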
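Both conventional predictors fit in a few lines. The sketch below keeps one table entry per load instruction address; the class names, the unbounded dictionary tables, and the initial stride of 0 are assumptions for illustration, not details given on the slide.

```python
class LastValuePredictor:
    """Predict that a load returns the same value as last time."""

    def __init__(self):
        self.table = {}  # load instruction address -> last value

    def predict(self, pc):
        return self.table.get(pc)  # None when there is no history yet

    def update(self, pc, value):
        self.table[pc] = value


class StridePredictor:
    """Predict last value + stride, where stride is the last observed difference."""

    def __init__(self):
        self.table = {}  # load instruction address -> (last value, stride)

    def predict(self, pc):
        if pc not in self.table:
            return None
        last, stride = self.table[pc]
        return last + stride

    def update(self, pc, value):
        if pc in self.table:
            last, _ = self.table[pc]
            self.table[pc] = (value, value - last)
        else:
            self.table[pc] = (value, 0)  # assumed initial stride


lv = LastValuePredictor()
lv.update(0x40, 1)
sp = StridePredictor()
sp.update(0x40, 1)
sp.update(0x40, 3)   # last value 3, stride 2
print(lv.predict(0x40), sp.predict(0x40))
```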

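Conventional Method

• Differential Finite Context Method (DFCM)
  • Predicts by using the pattern of stride values

Stride pattern table
pattern    next stride value
0 1 2      4
1 2 4      8
2 4 8      16

[Figure: value sequence 1, 2, 4, 8 with strides +1, +2, +4, +8; the last value 8 plus the stride found in the stride pattern table gives the predicted value (16?)]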

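Conventional Method

• Differential Finite Context Method (DFCM)
  • Predicts by the pattern of stride values

Stride pattern table
pattern    next stride value
0 1 2      4
1 2 4      8
2 4 8      16

[Figure: value sequence 1, 2, 4, 8 with strides +1, +2, +4, +8; the last value 8 plus the stride found in the stride pattern table gives the predicted value (16?)]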
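A minimal sketch of the idea, for a single load instruction: the recent strides form the context, and the table maps that context to the stride that followed it last time. The class name, the order-3 context, and the unhashed dictionary are simplifying assumptions; a hardware DFCM would hash the context into a fixed-size table.

```python
class DFCMPredictor:
    """Differential finite context method for one load instruction (sketch)."""

    def __init__(self, order: int = 3):
        self.order = order
        self.last_value = None
        self.history = (0,) * order   # the most recent `order` strides
        self.table = {}               # stride pattern -> next stride value

    def predict(self):
        if self.last_value is None or self.history not in self.table:
            return None
        return self.last_value + self.table[self.history]

    def update(self, value: int) -> None:
        if self.last_value is not None:
            stride = value - self.last_value
            self.table[self.history] = stride             # learn what followed this pattern
            self.history = self.history[1:] + (stride,)   # shift the new stride in
        self.last_value = value


# Training on 1, 2, 4, 8, 16 learns the first two rows of the slide's table.
trained = DFCMPredictor()
for v in (1, 2, 4, 8, 16):
    trained.update(v)

# The situation on the slide: pattern (1, 2, 4) is known to be followed by stride 8.
p = DFCMPredictor()
p.table = {(0, 1, 2): 4, (1, 2, 4): 8, (2, 4, 8): 16}
p.last_value, p.history = 8, (1, 2, 4)
print(p.predict())
```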

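Using confidence estimation

• Improving the prediction accuracy reduces the miss penalty
• The confidence information is a 2-bit saturating counter stored in the prediction table
• The value prediction result is used only when the confidence is high

[Figure: e.g. state diagram of the 2-bit counter with states 00, 01, 10, 11 between low confidence and high confidence, with Correct and Miss transitions]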

51

Confidence estimation

• Improving the precision of prediction is important to decrease the miss penalty.
• Confidence information is represented by a 2-bit counter and is stored in the prediction table.
• The predicted value is used only when the confidence is high.

[Figure: e.g. state diagram of the 2-bit counter with states 00, 01, 10, 11 between low confidence and high confidence, with Correct and Miss transitions]
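A 2-bit saturating confidence counter can be sketched as follows. The slide does not state which state counts as "high", so this sketch assumes that a correct prediction counts up, a miss counts down, and that only the top state (11) allows the prediction to be used; the class name and the `threshold` parameter are illustrative.

```python
class ConfidenceCounter:
    """Saturating counter that gates the use of a value prediction (sketch)."""

    def __init__(self, bits: int = 2, threshold=None):
        self.max_value = (1 << bits) - 1   # 3 for a 2-bit counter
        # Assumption: use the prediction only in the highest state unless told otherwise.
        self.threshold = self.max_value if threshold is None else threshold
        self.value = 0

    def update(self, correct: bool) -> None:
        if correct:
            self.value = min(self.value + 1, self.max_value)  # saturate at 11
        else:
            self.value = max(self.value - 1, 0)               # saturate at 00

    def is_confident(self) -> bool:
        return self.value >= self.threshold


c = ConfidenceCounter()
for outcome in (True, True, True, True):
    c.update(outcome)
confident_after_hits = c.is_confident()   # counter saturated at 3
c.update(False)
confident_after_miss = c.is_confident()   # counter dropped to 2
```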

52

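Interesting miss analysis results

• Frequently occurring miss patterns
  • Counter type
    • 1,2,3,4,5,6…
  • Iteration of two values
    • 100,200,100,200,100,200…
  • Iteration of multiple occurrences of two values
    • 150,150,250,250,100,100…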

53

Interesting miss analysis results

• Counter type
  • 1,2,3,4,5,6…
• Iteration of two values
  • 100,200,100,200,100,200…
• Iteration of multiple occurrences of two values
  • 150,150,250,250,100,100…

54

Up to 8th(addr) counter type Iteration of two values

Iteration of multiple occurrences of two values other

1st(4270584) 100%

2nd(4270256) 50% 50%

3rd(4254408) 100%

4th(4270424) 100%

5th(4255400) 100%

6th(4270496) 100%

7th(4270560) 100%

8th(4255496) 100%

Miss analysis results on the bzip benchmark

55

Up to 8th(addr) counter type Iteration of two values

Iteration of multiple occurrences of two values other

1st(4270584) 100%

2nd(4270256) 50% 50%

3rd(4254408) 100%

4th(4270424) 100%

5th(4255400) 100%

6th(4270496) 100%

7th(4270560) 100%

8th(4255496) 100%

Miss pattern on bzip benchmark

56

Proposed method

• Base Predictor + Predictor for Miss Biased Instructions
• Dynamically determine miss biased instructions
• Pattern oriented prediction for miss biased instructions

[Figure: the load address (Load addr) feeds the Base Predictor (last value prediction) and the Dedicated Predictor for Miss Biased Instructions (miss pattern extraction; pattern1, pattern2, pattern3), which produce the predicted value]

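Proposed method

• Base predictor + predictor for miss-biased instructions
• Dynamically detect the instructions that are mispredicted
• Predict according to the pattern tendency

[Figure: the load address (Load addr) feeds the base predictor (last value prediction) and the predictor dedicated to miss-biased instructions (miss pattern extraction; pattern1, pattern2, pattern3), which produce the predicted value]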

61

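Branch predictors and their classification

• Branch prediction
  • Uses the history of branch instructions to predict whether a branch is taken
  • Static prediction, dynamic prediction
• Classification of branch predictors
  • Single predictors
    • Bimodal, Gshare
  • Hybrid predictors
    • Combining, Bimode, Bimode-Plus, Agree, Hybrid, PPM-Like, L-TAGE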

Control dependence and the reduction method

• Branch prediction
  • Use the history of branches to predict whether the current branch instruction is taken or not taken
  • Static prediction / dynamic prediction
• Branch predictor
  • Single branch predictor
    • Bimodal, Gshare
  • Hybrid branch predictor
    • Combining, Bimode, Bimode-Plus, Agree, Hybrid, PPM-Like, L-TAGE

62

63

Single predictor

64

[Figure: branch predictors, showing a single predictor and a hybrid predictor built on a Pattern History Table (PHT)]

65

Hybrid predictor

• Combining
  ▫ Combines Gshare and Bimodal and selects the result with a Selector
• Bimode
  ▫ Combines two Gshare predictors, one for Taken and one for NotTaken, and selects the result with Bimodal

66

Hybrid predictor

• Combining
  ▫ Combines the Gshare and Bimodal predictors and uses a Selector to select the result
• Bimode
  ▫ Uses two Gshare predictors, one for Taken and the other for NotTaken, then uses Bimodal to select the result
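The Combining scheme described above can be sketched as two tables of 2-bit counters plus a selector table. Everything concrete here is an assumption made for illustration: the class name, the 1024-entry tables, the weakly-not-taken initial counters, indexing by the low PC bits, and XOR of PC and global history for Gshare.

```python
def _bump(counter: int, up: bool) -> int:
    """Move a 2-bit saturating counter up or down."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class CombiningPredictor:
    """Bimodal + Gshare with a per-branch selector (sketch)."""

    def __init__(self, index_bits: int = 10):
        size = 1 << index_bits
        self.mask = size - 1
        self.bimodal = [1] * size    # counter >= 2 means "predict taken"
        self.gshare = [1] * size
        self.selector = [1] * size   # counter >= 2 means "trust Gshare"
        self.history = 0             # global branch history register

    def _indices(self, pc: int):
        return pc & self.mask, (pc ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        b, g = self._indices(pc)
        counter = self.gshare[g] if self.selector[b] >= 2 else self.bimodal[b]
        return counter >= 2

    def update(self, pc: int, taken: bool) -> None:
        b, g = self._indices(pc)
        bimodal_ok = (self.bimodal[b] >= 2) == taken
        gshare_ok = (self.gshare[g] >= 2) == taken
        if gshare_ok != bimodal_ok:
            # Train the selector only when the two components disagree.
            self.selector[b] = _bump(self.selector[b], gshare_ok)
        self.bimodal[b] = _bump(self.bimodal[b], taken)
        self.gshare[g] = _bump(self.gshare[g], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# An alternating branch (T, N, T, N, ...) defeats Bimodal but is learned by Gshare.
predictor = CombiningPredictor()
outcomes = [i % 2 == 0 for i in range(100)]
hits = []
for taken in outcomes:
    hits.append(predictor.predict(0x80) == taken)
    predictor.update(0x80, taken)
late_hits = sum(hits[50:])
```

In this run the selector ends up trusting Gshare for the alternating branch, which is the behaviour the Combining scheme is meant to provide.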

67

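Hybrid predictor

• Agree
  ▫ Stores the branch target in the BTB entry and also stores the branch bias
• Bimode-Plus
  ▫ Some branches are always Taken or always NotTaken until the end of the program. A Bias Table is provided, and only the biased branch instructions are predicted with the Bias Table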

68

Hybrid predictor

• Agree
  ▫ Uses the BTB to keep the branch bias
• Bimode-Plus
  ▫ Some branch instructions are always Taken or always NotTaken. The method uses a Bias Table to keep the bias and to make the prediction

69

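Problems of conventional predictors

• Destructive aliasing
  • Different branch instructions access the same PHT entry, which causes mispredictions
  • If every branch instruction had its own entry, destructive aliasing would be reduced, but the amount of hardware would be enormous

Analyze the conventional branch predictors with the aim of reducing destructive aliasing

[Figure: branch instruction 1 and branch instruction 2 in instruction memory map to the same PHT entry (aliasing)]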

70

The problem of current predictors

• Aliasing
  • Different branch instructions access the same entry of the PHT
  • If every branch instruction had a private entry, aliasing would be reduced. However, the hardware would become huge.

Analyze the current predictors in order to reduce aliasing

[Figure: Branch 1 and Branch 2 in instruction memory map to the same PHT entry (aliasing)]

71

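Miss-prediction bias in Combining

• Predictor size: 8KB, 16KB, 32KB
• Miss-prediction bias branch numbers: 8, 16
• The 8 most frequently mispredicted branch instructions account for 75% of the misses, and the 16 most frequently mispredicted ones account for 85% of the total

[Figure: share of mispredictions against instruction number; 8 instructions: 75%, 16 instructions: 85%]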

72

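Miss-Prediction Bias in Combining

• Predictor size: 8KB, 16KB, 32KB
• Miss-prediction bias branch numbers: 8, 16
• 8 branches account for 75% of the mispredictions, and 16 branches account for 85%

[Figure: share of mispredictions against instruction number; 8 instructions: 75%, 16 instructions: 85%]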

73

Using branch miss prediction bias

• Detect the miss bias dynamically
  • Use the BTB to count the number of misses
• Re-predict the miss-biased branches
  • Every miss-biased instruction has a local predictor and uses it for prediction
  • This may reduce the aliasing

74

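Using the miss-prediction bias

• Dynamically detect the miss-biased branch instructions
  • Use the BTB to count the number of mispredictions
• Re-predict the miss-biased branch instructions
  • Predict each branch instruction with its own local history
  • This is expected to reduce destructive aliasing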

75

Proposed predictor

[Figure: block diagram. The PC (bits n-1 … 0) accesses the Extended BTB (EBTB; fields Tag, MCT, TAddr; tag comparison gives hit/miss) in Miss Bias Prediction (MBP). Local History Branch Prediction (LHBP) contains the Miss Bias Buffer (MBB; fields Addr, LH, U, FR) and LPHT1 … LPHTn (fields NTCT, CF) indexed by LH. A Selector chooses between LHBP and the Base Predictor (Global History) to give the prediction]

MCT: Miss Counter
LH: Local Branch History
LPHT: Local Pattern History Table
FR: Trace Failure Rate
NTCT: 2-bit Taken/NotTaken Saturating Counter

76

Proposed Branch Predictor

[Figure: block diagram. The PC (bits n-1 … 0) accesses the Extended BTB (EBTB; fields Tag, MCT, TAddr; tag comparison gives hit/miss) in the Miss Bias Detector (MBD). Local History Branch Prediction (LHBP) contains the Miss Bias Buffer (MBB; fields Addr, LH, U, FR) and the Local Pattern History Table (LPHT; fields NTCT, CF) indexed by [MBB entry address, LH]. A Selector chooses between LHBP and the Base Predictor (GBH) to give the prediction]

MCT: Miss Counter
LH: Local Branch History
U: Use Bit
FR: Trace Failure Rate
LPHT: Local Pattern History Table
NTCT: 2-bit Taken/NotTaken Saturating Counter
CF: Confidence

77

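Detecting and registering miss-biased branch instructions

[Figure: branch A hits in the Extended BTB (EBTB; fields Tag, MCT, TAddr) of the Miss Bias Detector (MBD) and its MCT is incremented; A is registered in the Miss Bias Buffer (MBB; fields Addr, LH, U, FR) as (A, LH, 1, 0). Local History Branch Prediction (LHBP; NTCT, CF), the Base Predictor (GBH), and the Selector give the prediction]

• When a mispredicted branch instruction commits
  • Increment its miss counter (MCT)
• When MCT reaches the threshold
  • Register the branch instruction in the Miss Bias Buffer
    • Address, Local history (LH), U (Use bit), FR (Trace Failure Rate)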


78

Detect & Register Miss-Prediction Bias Branches

[Figure: branch A hits in the Extended BTB (EBTB; fields Tag, MCT, TAddr) and its MCT is incremented; A is registered in the buffer with fields Addr, LH, U, FR as (A, LH, 1, 0). NTCT and CF, the Base Predictor (GBH), and the Selector give the prediction]

• When a mispredicted branch commits
  • Increment its miss counter (MCT)
• When MCT exceeds the threshold
  • Register the branch information into the Miss Bias Buffer
    • Address, Local history (LH), U (Use bit), FR (Trace Failure Rate)
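The detect-and-register step can be sketched in software as a pair of tables. The class name, the threshold of 4, the 16-entry buffer, and the choice to skip registration when the buffer is full are all assumptions; the slides only say that MCT is incremented at commit and that the branch is registered with U = 1 and FR = 0 once MCT passes the threshold.

```python
class MissBiasDetector:
    """Count mispredictions per branch and register the miss-biased ones (sketch)."""

    def __init__(self, threshold: int = 4, mbb_entries: int = 16):
        self.threshold = threshold        # assumed value
        self.mbb_entries = mbb_entries    # assumed capacity
        self.miss_count = {}              # stands in for the MCT field of the EBTB
        self.mbb = {}                     # Miss Bias Buffer: address -> fields

    def commit(self, addr: int, mispredicted: bool) -> None:
        """Call once for every committed branch instruction."""
        if not mispredicted or addr in self.mbb:
            return
        self.miss_count[addr] = self.miss_count.get(addr, 0) + 1   # MCT++
        if self.miss_count[addr] > self.threshold and len(self.mbb) < self.mbb_entries:
            # Register the branch: local history cleared, Use bit set, failure rate 0.
            self.mbb[addr] = {"LH": 0, "U": 1, "FR": 0}


detector = MissBiasDetector(threshold=4)
for _ in range(4):
    detector.commit(0x4000, mispredicted=True)
registered_early = 0x4000 in detector.mbb     # MCT == 4, not above the threshold yet
detector.commit(0x4000, mispredicted=True)
registered_late = 0x4000 in detector.mbb      # MCT == 5 exceeds the threshold
```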


79

Proposed predictor: prediction using the LPHT

[Figure: branch A is looked up in the Extended BTB (EBTB; fields Tag, MCT, TAddr) and in the Miss Bias Buffer (entry A, LH, 1, 0; fields Addr, LH, U, FR); LH indexes LPHT1 … LPHTn (fields NTCT, CF); the Selector chooses between this prediction and the Base Predictor (Global History)]

• When A is fetched
  • Search for A in the miss bias buffer
• When A exists in the MBB
  ▫ Get NTCT (Taken/NotTaken counter) and CF (Confidence)
  ▫ When CF is 3, select the NTCT prediction

80

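Using LPHT to Predict

[Figure: branch A is looked up in the Extended BTB (EBTB; fields Tag, MCT, TAddr) and in the miss bias buffer (entry A, LH, 1, 0; fields Addr, LH, U, FR); (MBB entry address + LH) indexes the LPHT; the Selector chooses between this prediction and the Base Predictor (GBH)]

• When A is fetched
  • Look up A associatively in the miss bias buffer
• When A exists in the miss bias buffer
  • Get NTCT and confidence (CF) from the LPHT
  • When the confidence is at its maximum, use NTCT as the prediction result

CF: Confidence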


81

Proposed predictor: updating the MBB and LPHT

[Figure: branch A in the Extended BTB (EBTB; fields Tag, MCT, TAddr) and in the Miss Bias Buffer (entry A, LH, 1, FR); LPHT1 … LPHTn (fields NTCT, CF); Base Predictor (Global History); Selector; prediction]

• When A is committed
  ▫ LPHT update: NTCT, CF (resetting counter)
  ▫ MBB update: LH, FR

82

Renew MBB and LPHT

[Figure: branch A in the Extended BTB (EBTB; fields Tag, MCT, TAddr) of the Miss Bias Detector (MBD) and in the miss bias buffer (entry A, LH, 1, FR); Local History Branch Prediction (LHBP) with the LPHT (Local Pattern History Table); Base Predictor (GBH); Selector; prediction]

• When A is committed
  • Look up A associatively in the miss bias buffer
• When A exists in the miss bias buffer (MBB)
  • Renew LH and FR in the miss bias buffer
  • Renew NTCT and CF (resetting counter) in the LPHT

CF: Confidence

83

Number of Miss-predictions Per Kilo Instructions

Predictor                 SPECint2000                       Commbench
SIZE                      8KB     16KB    32KB    64KB      8KB     16KB    32KB    64KB
Combining                 5.53    5.30    5.06    4.84      7.34    7.20    6.96    6.67
Combining + Proposal      4.76    4.60    4.43    4.30      6.48    6.43    6.35    6.18
Bimode                    5.40    5.14    4.93    4.73      7.34    7.03    6.84    6.52
Bimode + Proposal         4.76    4.55    4.38    4.28      6.50    6.37    6.27    6.15
Bimode-Plus               5.36    5.12    4.90    4.71      7.29    7.00    6.82    6.51
Bimode-Plus + Proposal    4.71    4.50    4.43    4.24      6.45    6.34    6.23    6.13
Agree                     6.71    6.49    6.35    6.21      7.51    7.44    7.35    7.20
Agree + Proposal          5.94    5.79    5.69    5.61      6.78    6.88    6.82    6.75
SIZE                      10.5KB  17.8KB  30KB    60.5KB    10.5KB  17.8KB  30KB    60.5KB
Hybrid                    7.50    6.23    5.73    5.32      7.77    7.00    6.76    6.51
Hybrid + Proposal         6.92    5.76    5.27    5.03      7.40    7.40    6.60    5.39

Adding 3KB of hardware to an 8KB base predictor achieves almost the same performance as a 64KB base predictor.

84

Number of Miss-predictions Per Kilo Instructions

Predictor                 SPECint2000                       Commbench
SIZE                      8KB     16KB    32KB    64KB      8KB     16KB    32KB    64KB
Combining                 5.53    5.30    5.06    4.84      7.34    7.20    6.96    6.67
Combining + Proposal      4.76    4.60    4.43    4.30      6.48    6.43    6.35    6.18
Bimode                    5.40    5.14    4.93    4.73      7.34    7.03    6.84    6.52
Bimode + Proposal         4.76    4.55    4.38    4.28      6.50    6.37    6.27    6.15
Bimode-Plus               5.36    5.12    4.90    4.71      7.29    7.00    6.82    6.51
Bimode-Plus + Proposal    4.71    4.50    4.43    4.24      6.45    6.34    6.23    6.13
Agree                     6.71    6.49    6.35    6.21      7.51    7.44    7.35    7.20
Agree + Proposal          5.94    5.79    5.69    5.61      6.78    6.88    6.82    6.75
SIZE                      10.5KB  17.8KB  30KB    60.5KB    10.5KB  17.8KB  30KB    60.5KB
Hybrid                    7.50    6.23    5.73    5.32      7.77    7.00    6.76    6.51
Hybrid + Proposal         6.92    5.76    5.27    5.03      7.40    7.40    6.60    5.39

Adding 3KB of hardware to an 8KB current predictor gives the same performance as a 64KB current predictor
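To read the table as relative improvements, the mispredictions per kilo-instruction can be turned into a percentage reduction. The sketch below uses the Combining row for SPECint2000 from the table; the helper name `mpki_reduction` is an assumption.

```python
def mpki_reduction(base: float, proposed: float) -> float:
    """Percentage reduction in mispredictions per kilo-instruction."""
    return (base - proposed) / base * 100.0


# (base, base + proposal) for Combining on SPECint2000, copied from the table.
combining_specint = {
    "8KB": (5.53, 4.76),
    "16KB": (5.30, 4.60),
    "32KB": (5.06, 4.43),
    "64KB": (4.84, 4.30),
}

for size, (base, proposed) in combining_specint.items():
    print(f"{size}: {mpki_reduction(base, proposed):.1f}% fewer mispredictions")

# The closing claim of the slide: 8KB + proposal is at least as good as plain 64KB.
small_with_proposal = combining_specint["8KB"][1]
large_base = combining_specint["64KB"][0]
```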

85

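What is Control Independence (CI)

• A technique that, after a misprediction, reuses instructions that would conventionally be squashed
• This work proposes a method that uses CI

[Figure: control-flow graph. B: R1=R1+1 and branch A: Br R1,1; Control Dependent (CD) instructions I: R2=R2+1, J: Jump C; Control Independent (CI) instructions C: R1=R1+1, D: R2=R2+1, E: R3=R3+1]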

Control Independence (CI)

Branch mis-predictions limit the performance

• Improve the predictor accuracy
• Dual path
• Exploit control independence (CI) to reduce the mis-prediction penalty

This research: a new method to exploit CI

[Figure: control-flow graph. B: R1=R1+1 and branch A: Br R1,1; Control Dependent (CD) instructions I: R2=R2+1, J: Jump C; Control Independent (CI) instructions C: R1=R1+1, D: R2=R2+1, E: R3=R3+1]

86

87

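Flow of using CI

• Current misprediction recovery
  • After a branch misprediction, squash all instructions after the branch instruction
  • Fetch and execute again
  • The CI instructions are squashed, then fetched and executed again (waste)
• Recovery when using CI
  • After a miss, squash only the CD instructions
  • Fetch/execute the CD instructions on the correct path
  • Reuse the CI instructions (less waste)

[Figure: two copies of the control-flow graph (A: Br R1,1; B: R1=R1+1; I: R2=R2+1, J: Jump C; C: R1=R1+1, D: R2=R2+1, E: R3=R3+1), one for each recovery scheme]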

Exploiting CI

• Current recovery
  • Squash the whole mis-predicted path
  • Fetch/execute the whole correct path
  • CI instructions are squashed, re-fetched, and re-executed (waste)
• CI recovery
  • Squash only the mis-predicted CD instructions
  • Fetch/execute only the correct-path CD instructions
  • CI instructions are not fetched again
  • Not all CI instructions are re-executed

[Figure: two copies of the control-flow graph (A: Br R1,1; B: R1=R1+1; I: R2=R2+1, J: Jump C; C: R1=R1+1, D: R2=R2+1, E: R3=R3+1), one for each recovery scheme]

89

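Problems in using CI

• Problems
  • Not all CI results can be reused as they are
  • The data dependences between the CI instructions and the correct path have to be updated

[Figure: example program A: Br R1,1; B: R1=R1+1; I: R2=R2+1; J: Jump C; C: R1=R1+1; D: R2=R2+1; E: R3=R3+1. Prediction starts: A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1. A miss occurs: the correct-path CD (I: P16=P2+1, J: Jump C) is inserted and CI is used. What is P10? Is P2 correct?]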

Why exploiting CI is difficult

• Problem
  • The data dependences between the CI instructions and the correct path need to be renewed

[Figure: start on the predicted path: A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1. Exploit CI and insert the correct-path CD (I: P16=P2+1, J: Jump C). What is P10? Is P2 correct?]

90

91

Some definitions for using CI

• Convergence point
  • Instruction C
• CIDD instructions
  • Control Independent Data Dependent instructions (C, D, E)
• CIDDSop
  • Control Independent Data Dependent Source operand
  • A source operand whose producer cannot be determined
    • Insn C: source operand R1
    • Insn D: source operand R2
    • Insn E: source operand R3

[Figure: A: Br R1,1 followed by C: R1=R1+1, D: R2=R2+1, E: R3=R3+1, each with an unknown (?) producer; Br: R3, K]

92

Definitions for Using CI

• Convergence point
  • Instruction C
• CIDD instructions
  • Control Independent Data Dependent instructions (C, D, E)
• CIDDSop
  • Control Independent Data Dependent Source operand
  • A source operand for which we do not know which instruction it depends on
    • Insn C: source operand R1
    • Insn D: source operand R2
    • Insn E: source operand R3

[Figure: A: Br R1,1 followed by C: R1=R1+1, D: R2=R2+1, E: R3=R3+1, each with an unknown (?) producer; Br: R3, K]

Problems in using CI

• Problem
  • The dependences need to be updated

A:Br P1,1

B:P10=P1+1

C:P11=P10+1D:P12=P2+1E:P13=P3+1


I:P16=P2+1J: Jump C


A:Br P1,1

C:P11=P10+1D:P12=P2+1E:P13=P3+1


93



94

95

Prior work: Walker [Rotenberg+, HPCA’99]

• Updates the dependences by re-renaming all CI instructions
  ▫ Drawback: the re-renaming hardware is large
  ▫ Drawback: the re-renaming overhead is large

[Figure: prediction starts: A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1. A miss occurs: CI is used and the correct-path CD (I: P16=P2+1, J: Jump C) is inserted; C and D are re-renamed and re-dispatched as C: P11=P1+1 and D: P12=P16+1, and E is re-renamed]

Walker [Rotenberg+, HPCA’99]

• Walk all of the CI instructions to keep the data dependences
  ▫ Re-rename all CI instructions
  ▫ Re-dispatch an instruction when its input has changed
  + Less waste than the current processor
  - Re-renaming has to be done again (for all CI instructions)

Re-renaming is hard

[Figure: predicted path A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1. After the correct-path CD (I: P16=P2+1, J: Jump C) is inserted, C and D are re-renamed and re-dispatched as C: P11=P1+1 and D: P12=P16+1, and E is re-renamed]

96

97

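Prior work: SBR [Gandhi+, HPCA’04]

• Updates the dependences by inserting instructions
  • Squashes the mis-predicted-path CD, inserts register transfer instructions to restore the data dependences, and re-dispatches the CIDD instructions
• Drawback: only a limited pattern (if-then) can be handled

[Figure: example program A: Br R1,1; B: R1=R1+1; C: R1=R1+1, D: R2=R2+1, E: R3=R3+1. Prediction starts: A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1. Using CI: the transfer P10=P1 is converted and dispatched, then C, D, E are re-dispatched]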

SBR [Gandhi+, HPCA’04]

• Insert instructions to keep the data dependences
  • After squashing the wrong-path CD, insert convert code
  • Dispatch the convert code, then re-dispatch the CIDD instructions
• + No re-renaming of CI instructions
• - Works only for a limited pattern (if-then), so it cannot deliver a large performance gain

[Figure: target example A: Br R1,1; B: R1=R1+1; C: R1=R1+1, D: R2=R2+1, E: R3=R3+1. Predicted path: A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1. Exploit CI: the convert code P10=P1 is converted and dispatched, then C, D, E are re-dispatched]

98

99

Prior work: Ginger [Hilton+, ISCA’07]

• Uses checkpoints to keep the correct dependences
• Checkpoints on the predicted path
  • Branch instruction (CK A), convergence point (CK cA), MT (Mapping Table)
• The MT is updated after a misprediction

[Figure: prediction starts: A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1, with checkpoints over (R1, R2, R3): CK A = (P1, P2, P3), CK cA = (P10, P2, P3), and MT = (P11, P12, P13). Using CI and inserting the correct CD (I: P16=P2+1, J: Jump C): the MT becomes (P1, P2, P3) and then (P1, P16, P3)]

Ginger [Hilton+, ISCA’07]

• Uses checkpoints to keep the data dependences
• When starting on the predicted path
  • Make a checkpoint
• After a mis-prediction
  • Insert the correct CD and renew the MT

[Figure: predicted path A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1, with checkpoints over (R1, R2, R3): CK ciA = (P10, P2, P3), CK recE = (P11, P12, P13), and MT = (P11, P12, P13). After the correct-path CD (I: P16=P2+1, J: Jump C) is inserted, the MT is (P1, P2, P3) with R2 renewed to P16]

100

101

Prior work: Ginger [Hilton+, ISCA’07]

• After the correct-path CD has been inserted, the checkpoint is used to recover the correct dependences
  • Check tag, tag-rewriting, re-dispatch
• Drawback: tag rewriting needs a large CAM

[Figure: after inserting the correct-path CD (I: P16=P2+1, J: Jump C): check tag, tag-rewriting, re-dispatch. MT over (R1, R2, R3) = (P1, P16, P3) and CK ciA = (P10, P2, P3); C and D have their tags rewritten and are re-dispatched as C: P11=P1+1 and D: P12=P16+1]

Ginger [Hilton+, ISCA’07]

• Uses checkpoints to keep the data dependences
• After the correct-path CD has been inserted
  • Check tag, tag-rewriting, and re-dispatch

[Figure: after inserting the correct-path CD (I: P16=P2+1, J: Jump C): check tag, tag-rewriting, re-dispatch. MT over (R1, R2, R3) = (P1, P2, P3) with R2 renewed to P16, and CK ciA = (P10, P2, P3); C and D have their tags rewritten and are re-dispatched as C: P11=P1+1 and D: P12=P16+1]

102

103

What is Dual Renaming

• When renaming, give each source operand one more tag (used on a misprediction)

Program:
  C: R1=R1+1
  D: R2=R2+1
  E: R3=R3+1

Conventional tag renaming, called the P-tag (Primary Tag):
  C: P11=P10+1
  D: P12=P2+1
  E: P13=P3+1

Dual Renaming, called the S-tag (Second Tag):
  C’: P11=S1+1
  D’: P12=S2+1
  E’: P13=S3+1

104

What is Dual Renaming

• Give one more tag to each source operand when renaming

Program:
  C: R1=R1+1
  D: R2=R2+1
  E: R3=R3+1

Current tag renaming, P-tag (Primary Tag):
  C: P11=P10+1
  D: P12=P2+1
  E: P13=P3+1

Dual Renaming, S-tag (Second Tag):
  C’: P11=S1+1
  D’: P12=S2+1
  E’: P13=S3+1

105

Architecture

[Figure: pipeline block diagram with Fetch, Decode, Dual Renaming, Dispatch Queue, Dispatch, Issue Queue, Commit, Re-Dispatch Queue, and Tag Conversion]

• On prediction
  ▫ Dual-rename the CIDDSop
  ▫ Store the dual-renamed instructions in the re-dispatch queue
  ▫ Update the S-tag table
• On a misprediction
  ▫ Check the S-tags with the S-tag table and convert them
  ▫ Re-dispatch

Micro-architecture

[Figure: pipeline block diagram with Fetch, Decode, Dual Renaming, Dispatch Queue, Dispatch, Issue Queue, Commit, Re-Dispatch Queue, and Tag Conversion]

• On the predicted path
  ▫ Dual-rename the CIDDSop
  ▫ Put the instructions into the re-dispatch queue
  ▫ Update the S-tag table
• On a mis-prediction
  ▫ Check the S-tags by using the S-tag table, then re-dispatch

106

107

Example of Dual Renaming

• Renaming a CIDDSop
  • Give it a P-tag (primary tag) as before and additionally an S-tag (second tag)
  • Store the dependence information in the MT and the S-tag table (ST)
  • Store the dual-renamed CI instructions in the FIFO re-dispatch queue

[Figure: fetched program A: Br R1,1; B: R1=R1+1; C: R1=R1+1, D: R2=R2+1, E: R3=R3+1. Renaming on the predicted path: A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1. MT over (R1, R2, R3) = (P10, P2, P3), then (P11, P12, P13); ST maps (S1, S2, S3) to (R1, R2, R3). Re-dispatch queue: C’: P11=S1+1, D’: P12=S2+1, E’: P13=S3+1]

Dual Renaming in CI

• When renaming a CIDDSop
  • Give it a P-tag (primary tag) and add an S-tag (second tag)
  • Keep the dependence information in the MT and the S-tag table (ST)
  • Keep the dual-renamed CI instructions in a FIFO (re-dispatch queue)

[Figure: fetched program A: Br R1,1; B: R1=R1+1; C: R1=R1+1, D: R2=R2+1, E: R3=R3+1. Renaming on the predicted path: A: Br P1,1; B: P10=P1+1; C: P11=P10+1, D: P12=P2+1, E: P13=P3+1. MT over (R1, R2, R3) = (P10, P2, P3), then (P11, P12, P13); ST maps (S1, S2, S3) to (R1, R2, R3). Re-dispatch queue: C’: P11=S1+1, D’: P12=S2+1, E’: P13=S3+1]

108

109

Tag Conversion

• After a misprediction
  • Insert the correct path and update the MT
• After the correct path has been inserted, convert the tags
  • Check the S-tags and find the tags that have changed
  • When an S-tag has changed, convert it and re-dispatch

[Figure: after the misprediction the MT over (R1, R2, R3) is (P1, P2, P3); inserting the correct-path CD (I: P16=P2+1, J: Jump C) updates it to (P1, P16, P3). ST maps (S1, S2, S3) to (R1, R2, R3). Each entry of the re-dispatch queue (C’: P11=S1+1, D’: P12=S2+1, E’: P13=S3+1) is checked; C’ is converted and re-dispatched as P11=P1+1 and D’ as P12=P16+1]

Tag Conversion

• On a mis-prediction
  • Insert the correct-path CD, renew the MT, and mark the changed tags
• When the CD insertion has finished, perform tag conversion
  • Check which S-tags have been changed
  • When an S-tag has changed, convert it and re-dispatch the instruction

[Figure: after the mis-prediction the MT over (R1, R2, R3) is (P1, P2, P3); inserting the correct-path CD (I: P16=P2+1, J: Jump C) updates it to (P1, P16, P3). ST maps (S1, S2, S3) to (R1, R2, R3). Each entry of the re-dispatch queue (C’: P11=S1+1, D’: P12=S2+1, E’: P13=S3+1) is checked; C’ is converted and re-dispatched as P11=P1+1 and D’ as P12=P16+1]
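The check-and-convert step can be traced in software with the values from the figure. This is a simplified sketch rather than the hardware mechanism: it assumes one S-tag per queued instruction, and it detects a changed tag by comparing the mapping seen on the predicted path with the mapping after the correct-path CD was inserted. The function and variable names are hypothetical.

```python
def tag_conversion(redispatch_queue, s_tag_table, predicted_mt, corrected_mt):
    """Return the queued CI instructions that must be converted and re-dispatched.

    redispatch_queue: list of (name, dest_tag, s_tag) in FIFO order
    s_tag_table:      S-tag -> logical register
    predicted_mt:     logical register -> physical tag used on the predicted path
    corrected_mt:     logical register -> physical tag after inserting the correct-path CD
    """
    redispatched = []
    for name, dest, s_tag in redispatch_queue:
        logical = s_tag_table[s_tag]
        if corrected_mt[logical] != predicted_mt[logical]:
            # The producer changed: rewrite the source tag and re-dispatch.
            redispatched.append((name, dest, corrected_mt[logical]))
    return redispatched


queue = [("C'", "P11", "S1"), ("D'", "P12", "S2"), ("E'", "P13", "S3")]
st = {"S1": "R1", "S2": "R2", "S3": "R3"}
predicted_mt = {"R1": "P10", "R2": "P2", "R3": "P3"}   # when C, D, E were renamed
corrected_mt = {"R1": "P1", "R2": "P16", "R3": "P3"}   # after I: P16=P2+1 was inserted

result = tag_conversion(queue, st, predicted_mt, corrected_mt)
print(result)
```

With these inputs C’ and D’ are re-dispatched with sources P1 and P16, while E’ keeps its result because its source P3 did not change, which matches the figure.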

110

111

A Branch Target Address Predictor for Reducing BTB Miss by Using CAM

• Purpose
  • Reduce BTB misses and improve processor performance
• Proposal: divide the BTB into an NBTB and a CBTB to reduce BTB misses
  • NBTB (SRAM)
    • Predicts non-conditional branches
  • CBTB (CAM)
    • Predicts conditional branches
    • Renew algorithm is FIFO or CAM
• Experiment result
  • The FIFO algorithm is best for our proposal when using 128 entries
  • Adding 2.87% hardware brings a 3.44% performance improvement

[Figure: CBTB: the PC address (bits n-1 … 0, split into tag and index at bit i) is compared (=) with the stored Tag to give Hit/Miss and the target address. NBTB: the PC address selects the target address]
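A fully associative table with FIFO replacement, as used for the CBTB, can be sketched in software as follows. The class name `FifoCBTB` and the use of an ordered dictionary to stand in for the CAM are assumptions; only the 128-entry size and the FIFO policy come from the slide.

```python
from collections import OrderedDict


class FifoCBTB:
    """Fully associative branch target buffer with FIFO replacement (sketch)."""

    def __init__(self, entries: int = 128):
        self.entries = entries
        self.table = OrderedDict()   # branch PC -> target address, oldest entry first

    def lookup(self, pc: int):
        """Return the predicted target address, or None on a BTB miss."""
        return self.table.get(pc)

    def update(self, pc: int, target: int) -> None:
        if pc in self.table:
            self.table[pc] = target          # overwrite in place; FIFO order is unchanged
            return
        if len(self.table) >= self.entries:
            self.table.popitem(last=False)   # evict the oldest entry
        self.table[pc] = target


cbtb = FifoCBTB(entries=2)
cbtb.update(0x100, 0x200)
cbtb.update(0x104, 0x300)
hit = cbtb.lookup(0x100)
cbtb.update(0x108, 0x400)        # the table is full, so the oldest entry (0x100) is evicted
miss = cbtb.lookup(0x100)
```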

112

