anshul kumar, cse iitd csl718 : main memory cpu-cache-main memory performance 9th mar, 2006

Anshul Kumar, CSE IITD

CSL718 : Main Memory

CPU-Cache-Main Memory Performance

9th Mar, 2006


A Simple Model

tav = tc + pm . tc.miss

where

tav = average memory access time as seen by CPU

tc = cache access time

pm = miss probability (consider only read misses, if write penalties are hidden by buffers)

tc.miss = cache miss penalty

[Figure: block diagram, CPU - Cache - Memory]
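The simple model can be evaluated directly. The following is a minimal sketch in Python; the function name and the sample numbers are illustrative assumptions, not values from the lecture.

```python
def average_access_time(t_c, p_m, t_c_miss):
    """Return tav = tc + pm * tc.miss; all times in the same unit (e.g. cycles)."""
    if not 0.0 <= p_m <= 1.0:
        raise ValueError("p_m is a probability and must lie in [0, 1]")
    return t_c + p_m * t_c_miss


# Hypothetical numbers (assumed for illustration only):
# 1-cycle cache access, 5% read-miss probability, 20-cycle miss penalty.
print(average_access_time(t_c=1.0, p_m=0.05, t_c_miss=20.0))
```

With a fixed miss penalty, tav grows linearly in pm, so halving the miss probability halves the time added on top of tc.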


Cache miss penalty

Depends on

• Various cache policies

– Read policy

– Load policy

– Write policy

– Write buffers etc.

• Main memory organization

– Interleaving

– Page mode


Read Policies

Sequential Simple: Teff = (1 - pm) . 1 + pm . (T + 2)

Concurrent Simple: Teff = (1 - pm) . 1 + pm . (T + 1)

Sequential Forward: Teff = (1 - pm) . 1 + pm . (T + 1)

Concurrent Forward: Teff = (1 - pm) . 1 + pm . T

[Figure: cache and memory timing diagram for each policy, built from 1-cycle steps and a memory access of T cycles]
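To compare the four read policies at one operating point, the formulas can be tabulated as below. This is a sketch only: the dictionary and function names are assumptions made for illustration, and the sample values of pm and T are hypothetical.

```python
# Cycles spent on a miss beyond the memory time T, read off the four formulas.
EXTRA_MISS_CYCLES = {
    "sequential simple": 2,
    "concurrent simple": 1,
    "sequential forward": 1,
    "concurrent forward": 0,
}


def effective_access_time(policy, p_m, T):
    """Teff = (1 - pm) * 1 + pm * (T + extra), in cycles."""
    extra = EXTRA_MISS_CYCLES[policy]
    return (1 - p_m) * 1 + p_m * (T + extra)


# Hypothetical operating point: 5% misses, T = 10 cycles.
for policy in EXTRA_MISS_CYCLES:
    print(f"{policy:20s} {effective_access_time(policy, 0.05, 10):.3f}")
```

The policies differ only in the miss term, so the gap between them scales with pm.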


Load policies

Example: 4 AU block (AUs 0 1 2 3), cache miss on AU 1

• Block Load

• Load Forward

• Fetch Bypass (wrap around load)

[Figure: the block of AUs 0 1 2 3 under each load policy]


Analyzing Write Policies: CPU time

Policy                 Read hit   Read miss   Write hit   Write miss
Hit:WB, Miss:WB        1          Tb + i      1           1
Hit:WB, Miss:WTWA      1          Tb + i      1           1
Hit:WB, Miss:WTNWA     1          Tb + i      1           1
Hit:WT, Miss:WB        1          Tb + i      1           1
Hit:WT, Miss:WTWA      1          Tb + i      1           1
Hit:WT, Miss:WTNWA     1          Tb + i      1           1

i depends on read policy


Analyzing Write Policies: Bus time

Policy                 Read hit   Read miss    Write hit   Write miss
Hit:WB, Miss:WB        0          Tb (2-Pc)    0           Tb (2-Pc)
Hit:WB, Miss:WTWA      0          Tb (2-Pc)    0           Tb (2-Pc) + Tw
Hit:WB, Miss:WTNWA     0          Tb (2-Pc)    0           Tw
Hit:WT, Miss:WB        0          Tb (2-Pc)    Tw          Tb (2-Pc)
Hit:WT, Miss:WTWA      0          Tb           Tw          Tb + Tw
Hit:WT, Miss:WTNWA     0          Tb           Tw          Tw
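The bus-time entries can be combined into an expected bus time per memory reference. The sketch below rests on assumptions that the slides do not state: Tb is taken as the block transfer time, Tw as the time to write one word, and Pc as the probability that the replaced line is clean. The read fraction and miss ratios passed in are hypothetical parameters.

```python
def bus_time_table(Tb, Tw, Pc):
    """Bus time per (hit policy, miss policy) as
    (read hit, read miss, write hit, write miss)."""
    wb = Tb * (2 - Pc)  # block fetch plus write-back of a dirty line (assumed meaning)
    return {
        ("WB", "WB"): (0, wb, 0, wb),
        ("WB", "WTWA"): (0, wb, 0, wb + Tw),
        ("WB", "WTNWA"): (0, wb, 0, Tw),
        ("WT", "WB"): (0, wb, Tw, wb),
        ("WT", "WTWA"): (0, Tb, Tw, Tb + Tw),
        ("WT", "WTNWA"): (0, Tb, Tw, Tw),
    }


def expected_bus_time(row, read_frac, read_miss, write_miss):
    """Weight one table row by the reference mix (all three arguments are assumed inputs)."""
    read_hit_t, read_miss_t, write_hit_t, write_miss_t = row
    reads = (1 - read_miss) * read_hit_t + read_miss * read_miss_t
    writes = (1 - write_miss) * write_hit_t + write_miss * write_miss_t
    return read_frac * reads + (1 - read_frac) * writes


table = bus_time_table(Tb=8, Tw=1, Pc=0.5)
for policy, row in table.items():
    print(policy, expected_bus_time(row, read_frac=0.75, read_miss=0.1, write_miss=0.2))
```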


Interleaving with Fast Page Mode

$T_{line\,access} = T_a + T_c \left( \frac{L}{m} - 1 \right) + T_{bus} \left( L - \frac{L}{m} \right)$


A Refined Model

tav = tc + pm . (tc.miss + tinterference + tw-interference + tIO-interference )

where

tinterference = interference among line transfers

tw-interference = interference between word writes and line transfers

tIO-interference = interference between I/O and line transfers


Interference among line transfers

What happens when another miss occurs in the interval tbusy = tm.miss - tc.miss?

tinterference = additional delay due to this

= expected number of misses during tbusy * delay per miss

= (λ * tbusy * pm) * (tbusy / 2)

where λ = memory request rate of processor

[Figure: timeline marking tc, tc.miss and tm.miss, with the intervals "CPU blocked", "CPU executing" and "Memory busy"]
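The interference estimate is the expected number of misses arriving while memory is still busy, times an average wait of half the busy interval. A small sketch, with the function name and sample numbers assumed for illustration:

```python
def line_transfer_interference(lam, p_m, t_m_miss, t_c_miss):
    """tinterference = (lam * tbusy * pm) * (tbusy / 2),
    where tbusy = tm.miss - tc.miss and lam is the processor's memory request rate."""
    t_busy = t_m_miss - t_c_miss
    expected_misses = lam * t_busy * p_m  # misses arriving while memory is still busy
    return expected_misses * (t_busy / 2)  # each waits half of tbusy on average


# Hypothetical values: 0.5 requests/cycle, 10% misses,
# memory busy for 30 cycles, CPU blocked for 20 cycles.
print(line_transfer_interference(lam=0.5, p_m=0.1, t_m_miss=30.0, t_c_miss=20.0))
```

The delay grows with the square of tbusy, so shortening the time memory stays busy after the CPU resumes pays off quickly.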


Interference: I/Os and writes

delay = prob that memory is busy when request arrives * average waiting period

What happens when memory is found to be busy serving one request and some other requests are waiting?

[Figure: timeline of request arrivals against memory busy periods, with requests marked served or waiting]


I/O Interference

tIO-interference = delay due to I/O contention

= probability that memory is occupied with I/O * average time taken to complete ongoing I/O

= ( ) * (tservice + tIO-wait)/2

tservice = time to service (block read/write time)

tIO-wait = waiting time

= 0, if CPU has a higher priority

≠ 0, otherwise (estimate using queuing model)


Write Interference Delay

tw-interference = probability that a write through is occupying the memory when a read miss occurs *

average time taken to complete ongoing write


Memory performance using queuing model

[Figure: arrival of requests (from processor/cache) → requests queued for service → servicing of requests (by memory)]

Statistical behaviour of arrivals? Statistical behaviour of service?

Model nomenclature: arrival / service / number of servers

M / G / 1     G : General
M / M / 1     M : Poisson/Exponential
M / D / 1     D : Constant
MB / D / 1    MB : Binomial


Modeling memory requests

prob of a request in one cycle = $p$

prob of no request in one cycle = $1 - p$

prob of no request in $T/\tau$ cycles = $(1 - p)^{T/\tau}$

prob of at least one req in $T/\tau$ cycles = $1 - (1 - p)^{T/\tau}$

prob of k requests in n (= $T/\tau$) cycles = $\binom{n}{k} p^k (1 - p)^{n-k}$ (Binomial distribution)

expected no. of requests in n cycles = $n p$

$T$ : interval (memory cycle time)

$\tau$ : processor cycle
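The binomial request model can be checked numerically. A minimal sketch, assuming n processor cycles per memory cycle and a per-cycle request probability p; the values of n and p below are hypothetical.

```python
from math import comb


def prob_k_requests(k, n, p):
    """Binomial probability of exactly k requests in n cycles."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least_one(n, p):
    """Probability of one or more requests in n cycles."""
    return 1 - (1 - p) ** n


n, p = 8, 0.1  # hypothetical: 8 processor cycles per memory cycle
dist = [prob_k_requests(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(dist))
print("total probability:", sum(dist))
print("expected requests:", mean, "vs n*p =", n * p)
print("at least one request:", prob_at_least_one(n, p))
```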


Poisson Approximation

If processor cycles are small

(i.e., $\tau \to 0$, $p \to 0$, $n \to \infty$, $n p \to \lambda T$),

Binomial distribution → Poisson distribution, request rate = $\lambda$

prob of k requests in interval T = $\frac{(\lambda T)^k e^{-\lambda T}}{k!}$

expected no. of requests in interval T = $\lambda T$

Interval between two consecutive requests has an exponential distribution, prob (inter arrival interval > t) = $e^{-\lambda t}$
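The limit can be observed numerically by holding the mean n p fixed while n grows and comparing the binomial probabilities with the Poisson probabilities (λT)^k e^(-λT) / k!. The sketch below is illustrative only; the function names, the mean of 2.0 and the cut-off k_max are assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Poisson probability of k requests when the expected count is `mean`."""
    return mean**k * exp(-mean) / factorial(k)


def max_gap(n, mean, k_max=10):
    """Largest difference between the two distributions over k = 0..k_max."""
    p = mean / n  # keep n * p fixed while n grows
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
               for k in range(k_max + 1))


for n in (10, 100, 10_000):
    print(n, max_gap(n, mean=2.0))
```

The gap shrinks roughly in proportion to 1/n, which is why the Poisson model is a good fit when the processor cycle is much shorter than the memory cycle.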


Modeling Service

• Each request is served in constant time

e.g. cache write through requests, cache block transfer requests

or

• Service time has an exponential distribution

e.g. I/O requests with varying block sizes where small blocks are more common than large blocks


M / G / 1 Model

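Average waiting time = $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2 (1 + c^2)}{2 (1 - \rho)}$

Average queue length = $Q = \frac{\rho^2 (1 + c^2)}{2 (1 - \rho)}$

where

$\rho$ = occupancy of server = $\lambda / \mu$, $\mu$ = average service rate

$c^2 = \mu^2 \sigma_t^2$, $\sigma_t^2$ = variance of service time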
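The M/G/1 results Q = ρ²(1 + c²) / (2(1 - ρ)) and Tw = Q / λ are easy to evaluate for any service-time variability. A minimal sketch, with function names and sample rates assumed for illustration; c2 stands for c², which is 1 for exponential service (M/M/1) and 0 for constant service (M/D/1).

```python
def mg1_queue_length(rho, c2):
    """Average number waiting: Q = rho^2 (1 + c^2) / (2 (1 - rho))."""
    if not 0 <= rho < 1:
        raise ValueError("occupancy rho must satisfy 0 <= rho < 1")
    return rho**2 * (1 + c2) / (2 * (1 - rho))


def mg1_waiting_time(lam, mu, c2):
    """Average wait: Tw = Q / lam, with rho = lam / mu."""
    return mg1_queue_length(lam / mu, c2) / lam


# Hypothetical rates: one request every 2 time units, service rate 1 per unit.
lam, mu = 0.5, 1.0
print("M/M/1:", mg1_queue_length(lam / mu, 1.0), mg1_waiting_time(lam, mu, 1.0))
print("M/D/1:", mg1_queue_length(lam / mu, 0.0), mg1_waiting_time(lam, mu, 0.0))
```

At the same occupancy, constant service time halves both the queue length and the wait relative to exponential service.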


Special cases: M/M/1, M/D/1

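M/M/1 (c = 1): $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2}{1 - \rho}$, $Q = \frac{\rho^2}{1 - \rho}$

M/D/1 (c = 0): $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2}{2 (1 - \rho)}$, $Q = \frac{\rho^2}{2 (1 - \rho)}$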


M/D/1 with low server occupancy



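$T_w = \frac{1}{\lambda} \cdot \frac{\rho^2}{2 (1 - \rho)}$, $Q = \frac{\rho^2}{2 (1 - \rho)}$

when $\rho$ is small, $T_w \approx \frac{1}{\lambda} \cdot \frac{\rho^2}{2} = \frac{1}{2} \cdot \frac{\lambda}{\mu^2}$

Compare this with $\frac{1}{2} \lambda \, p_m \, t_{busy}^2$, the interference among line transfers: it is the same expression with request rate $\lambda p_m$ and service time $1/\mu = t_{busy}$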
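The low-occupancy limit of the M/D/1 waiting time, λ / (2 μ²), can be set against the line-transfer interference estimate (λ tbusy pm)(tbusy / 2). The sketch below assumes that misses arrive at rate λ pm and that each one occupies memory for tbusy = 1/μ; all names and numbers are illustrative.

```python
def md1_wait(lam, mu):
    """Exact M/D/1 average wait: rho^2 / (2 lam (1 - rho))."""
    rho = lam / mu
    return rho**2 / (2 * lam * (1 - rho))


def md1_wait_low_occupancy(lam, mu):
    """Small-rho approximation: lam / (2 mu^2)."""
    return lam / (2 * mu**2)


def line_interference(lam_cpu, p_m, t_busy):
    """(lam * tbusy * pm) * (tbusy / 2) from the refined model."""
    return (lam_cpu * t_busy * p_m) * (t_busy / 2)


# Hypothetical: 0.5 requests/cycle, 2% misses, memory busy 10 cycles per miss.
lam_cpu, p_m, t_busy = 0.5, 0.02, 10.0
miss_rate, mu = lam_cpu * p_m, 1.0 / t_busy
print("exact M/D/1 wait :", md1_wait(miss_rate, mu))
print("low-occupancy    :", md1_wait_low_occupancy(miss_rate, mu))
print("interference term:", line_interference(lam_cpu, p_m, t_busy))
```

The exact wait exceeds the approximation by the factor 1 / (1 - ρ), so the two agree only while occupancy stays low.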


Designing buffer to hold the queue

How to design a buffer so that buffer overflow, or stalling due to a full buffer, stays within a certain limit?

For M/M/1 model,

prob(queue size > buffer size BF) = $\rho^{BF+1}$

Choose BF so that this probability is below a desired value.
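Choosing BF amounts to finding the smallest buffer for which the M/M/1 overflow probability ρ^(BF+1) drops to the target. A minimal sketch; the function name and the 1% target are assumptions for illustration.

```python
def min_buffer_size(rho, max_overflow_prob):
    """Smallest BF such that rho ** (BF + 1) <= max_overflow_prob (M/M/1 model)."""
    if not 0 < rho < 1:
        raise ValueError("occupancy rho must satisfy 0 < rho < 1")
    if not 0 < max_overflow_prob < 1:
        raise ValueError("target probability must lie strictly between 0 and 1")
    bf = 0
    while rho ** (bf + 1) > max_overflow_prob:
        bf += 1
    return bf


# Hypothetical target: overflow in at most 1% of cases.
for rho in (0.3, 0.5, 0.8):
    bf = min_buffer_size(rho, 0.01)
    print(f"rho={rho}: BF={bf}, overflow probability={rho ** (bf + 1):.4f}")
```

The required buffer grows quickly as ρ approaches 1, since each extra entry reduces the overflow probability only by a factor of ρ.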


Open and Closed Queues

[Figure: arrival of requests → queue → servicing of requests (by memory)]

• Processor is not blocked by queuing delays and request rate remains unaffected – Open queue

• Processor is blocked due to queuing delays and request rate drops – Closed queue


Open and Closed Queues

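[Figure: arrival of requests → queue → servicing of requests (by memory)]

                   Queue               Server
Time               $T_w$               $1/\mu$
Number (open)      $Q = \lambda T_w$   $\rho = \lambda/\mu$
Number (closed)    $Q_a$               $\rho_a$

occupancy (open q) = occupancy (closed q) + waiting (closed q), i.e. $\rho = \rho_a + Q_a$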


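M/D/1 Closed Queue

Reduced request rate = $\lambda_a$

Reduced occupancy = $\rho_a = \lambda_a / \mu$

Requests being served = $\rho_a$

Requests waiting = $Q_a = \frac{\rho_a^2}{2 (1 - \rho_a)}$

$\rho = \rho_a + Q_a = \rho_a + \frac{\rho_a^2}{2 (1 - \rho_a)}$

$\rho_a = (1 + \rho) - \sqrt{(1 + \rho)^2 - 2 \rho} = (1 + \rho) - \sqrt{1 + \rho^2}$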
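Solving ρ = ρa + ρa² / (2(1 - ρa)) for the reduced occupancy gives ρa = 1 + ρ - sqrt(1 + ρ²). The sketch below evaluates that closed form and substitutes it back as a check; the function names and the sample values of ρ are assumptions for illustration.

```python
import math


def closed_queue_occupancy(rho):
    """Reduced occupancy rho_a of the M/D/1 closed queue for offered occupancy rho."""
    return 1 + rho - math.sqrt(1 + rho**2)


def requests_waiting(rho_a):
    """Q_a = rho_a^2 / (2 (1 - rho_a)) for constant service time."""
    return rho_a**2 / (2 * (1 - rho_a))


for rho in (0.2, 0.5, 0.9):
    rho_a = closed_queue_occupancy(rho)
    q_a = requests_waiting(rho_a)
    # rho_a + q_a should reproduce the offered occupancy rho
    print(f"rho={rho}: rho_a={rho_a:.4f}, Q_a={q_a:.4f}, rho_a + Q_a={rho_a + q_a:.4f}")
```

The achieved occupancy ρa is always below ρ, and the shortfall is exactly the number of requests waiting.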


Deriving queue length, wait time

Let $t_i$ = time when request i is being served

$r_i$ = no. of arrivals during $t_i$

$n_i$ = queue length at the end of $t_i$, including item in service

Assume occupancy of server = $\rho = \lambda / \mu < 1$ ⇒ process reaches a steady state

Expected value $E(t_i) = E(t) = T = 1/\mu$

$E(r_i) = E(r) = \lambda E(t) = \lambda / \mu = \rho$

$E(n_i) = E(n) = N$


Relating $n_{i+1}$ to $n_i$

$n_{i+1} = n_i$ + arrivals – departures

two cases need to be considered:

i) $n_i \ne 0$

ii) $n_i = 0$

[Figure: queue holding $C_{i+1}$, $C_{i+2}$, $C_{i+3}$ behind $C_i$; queue length $n_i$]


When $n_i \ne 0$

$C_{i+1}$ arrived before $C_i$ left

$n_{i+1} = n_i + r_{i+1} - 1$

[Figure: time axis with $C_i$ served during $t_i$, $C_i$ leaves, $C_{i+1}$ served during $t_{i+1}$, $C_{i+1}$ leaves]


When $n_i = 0$

$C_{i+1}$ arrived after $C_i$ left

$n_{i+1} = n_i + 1 + r_{i+1} - 1 = n_i + r_{i+1}$

[Figure: time axis with $C_i$ served during $t_i$, $C_i$ leaves, $C_{i+1}$ arrives, $C_{i+1}$ served during $t_{i+1}$, $C_{i+1}$ leaves]


Combining the two cases

$n_{i+1} = n_i + r_{i+1} - 1 + \delta_i$

where $\delta_i = 0$ when $n_i \ne 0$ and $\delta_i = 1$ when $n_i = 0$

note that $n_i \delta_i = 0$ and $\delta_i^2 = \delta_i$

$E(n_{i+1}) = E(n_i) + E(r_{i+1}) - 1 + E(\delta_i)$

in steady state, $E(n) = E(n) + E(r) - 1 + E(\delta)$

that is, $E(\delta) = 1 - E(r) = 1 - \rho$, so prob($n \ne 0$) = $\rho$


Combining the two cases

$n_{i+1} = n_i + r_{i+1} - 1 + \delta_i$

$n_{i+1}^2 = n_i^2 + (r_{i+1} - 1)^2 + \delta_i^2 + 2 n_i (r_{i+1} - 1) + 2 (r_{i+1} - 1) \delta_i + 2 n_i \delta_i$

$n_{i+1}^2 = n_i^2 + (r_{i+1} - 1)^2 + \delta_i + 2 n_i (r_{i+1} - 1) + 2 (r_{i+1} - 1) \delta_i$

$E(n_{i+1}^2) = E(n_i^2) + E[(r_{i+1} - 1)^2] + E(\delta_i) + 2 E[n_i (r_{i+1} - 1)] + 2 E[(r_{i+1} - 1) \delta_i]$

$0 = E[(r - 1)^2] + E(\delta) + 2 E[n (r - 1)] + 2 E[(r - 1) \delta]$

$0 = E(r^2) - 2 \rho + 1 + (1 - \rho) + 2 E(n) (\rho - 1) + 2 (\rho - 1)(1 - \rho)$

$2 E(n) (1 - \rho) = E(r^2) - 2 \rho^2 + \rho$


continued

$2 E(n) (1 - \rho) = E(r^2) - 2 \rho^2 + \rho$

This is valid for G/G/1

$N = E(n) = \frac{E(r^2) - 2 \rho^2 + \rho}{2 (1 - \rho)} = \rho + \frac{E(r^2) - \rho}{2 (1 - \rho)}$


Consider Poisson arrival

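$P(r_i) = \frac{(\lambda t_i)^{r_i} e^{-\lambda t_i}}{r_i!}$

mean $E(r_i) = \lambda t_i$

variance $\sigma_{r_i}^2 = \lambda t_i$

$\sigma_{r_i}^2 = E(r_i^2) - |E(r_i)|^2$

$E(r_i^2) = \sigma_{r_i}^2 + |E(r_i)|^2 = \lambda t_i + \lambda^2 t_i^2$

Take expectation over i

$E(r^2) = \lambda E(t) + \lambda^2 E(t^2)$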


continued

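mean $E(t) = 1/\mu$

variance $\sigma_t^2$

$E(t^2) = \sigma_t^2 + [E(t)]^2 = \sigma_t^2 + 1/\mu^2$

Recall $E(r^2) = \lambda E(t) + \lambda^2 E(t^2)$

Therefore, $E(r^2) = \lambda/\mu + \lambda^2 (\sigma_t^2 + 1/\mu^2) = \rho + \lambda^2 \sigma_t^2 + \rho^2$

$N = E(n) = \rho + \frac{E(r^2) - \rho}{2 (1 - \rho)} = \rho + \frac{\lambda^2 \sigma_t^2 + \rho^2}{2 (1 - \rho)} = \rho + \frac{\rho^2 (1 + c^2)}{2 (1 - \rho)}$

where $c^2 = \mu^2 \sigma_t^2$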
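The result N = ρ + ρ²(1 + c²) / (2(1 - ρ)) can be checked by iterating the recurrence for the queue length left behind by each departing request (add the arrivals during the next service, subtract the one departure, and add 1 back when the queue was empty). The sketch below does this for constant service time (c² = 0) with Poisson arrivals. The function names, the seed and the rates are assumptions for illustration, and the simulated value is only a statistical estimate.

```python
import random


def simulate_mean_queue_length(lam, service_time, steps, seed=1):
    """Iterate n <- n + r - 1 + delta with Poisson arrivals, constant service."""
    rng = random.Random(seed)
    n, total = 0, 0
    for _ in range(steps):
        # Count Poisson arrivals within one service time by summing
        # exponentially distributed inter-arrival gaps.
        r = 0
        elapsed = rng.expovariate(lam)
        while elapsed <= service_time:
            r += 1
            elapsed += rng.expovariate(lam)
        delta = 1 if n == 0 else 0
        n = n + r - 1 + delta
        total += n
    return total / steps


def mean_number_in_system(rho, c2):
    """N = rho + rho^2 (1 + c^2) / (2 (1 - rho))."""
    return rho + rho**2 * (1 + c2) / (2 * (1 - rho))


lam, service_time = 0.5, 1.0  # hypothetical rates, so rho = 0.5
print("simulated:", simulate_mean_queue_length(lam, service_time, steps=100_000))
print("formula  :", mean_number_in_system(lam * service_time, c2=0.0))
```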


Direct Derivation for M/M/1

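$P(n; t)$ = prob that there are n req in the system at time t (in queue + in service)

$P(n; t + \Delta t) = P(n; t)(1 - \lambda \Delta t - \mu \Delta t) + P(n-1; t) \lambda \Delta t + P(n+1; t) \mu \Delta t$

$P(0; t + \Delta t) = P(0; t)(1 - \lambda \Delta t) + P(1; t) \mu \Delta t$

Prob of more than one event in $\Delta t$ is neglected ($\Delta t^2$ term)

$dP(n; t)/dt = P(n; t)(-\lambda - \mu) + P(n-1; t) \lambda + P(n+1; t) \mu$

$dP(0; t)/dt = P(0; t)(-\lambda) + P(1; t) \mu$

In steady state, we can drop ";t" and the derivatives tend to 0

$0 = P(n)(-\lambda - \mu) + P(n-1) \lambda + P(n+1) \mu$

$0 = P(0)(-\lambda) + P(1) \mu$

$\lambda P(n) - \mu P(n+1) = \lambda P(n-1) - \mu P(n)$

$\lambda P(0) - \mu P(1) = 0$

$\lambda P(n-1) - \mu P(n) = 0$ ⇒ $P(n) = \rho P(n-1)$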



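$P(n) = \rho P(n-1)$

$P(n) = \rho^n P(0)$

$\sum_{i=0}^{\infty} P(i) = P(0) \sum_{i=0}^{\infty} \rho^i = \frac{P(0)}{1 - \rho} = 1$

$P(0) = 1 - \rho$ and $P(n) = (1 - \rho) \rho^n$

$E(n) = \sum_{i=0}^{\infty} i P(i) = (1 - \rho) \sum_{i=0}^{\infty} i \rho^i = (1 - \rho) \frac{\rho}{(1 - \rho)^2} = \frac{\rho}{1 - \rho}$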



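Prob(items in queue + server $\le k$) $= \sum_{i=0}^{k} P(i) = (1 - \rho) \sum_{i=0}^{k} \rho^i = \frac{(1 - \rho)(1 - \rho^{k+1})}{1 - \rho} = 1 - \rho^{k+1}$

Prob(items in queue + server $> k$) $= \rho^{k+1}$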
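The M/M/1 steady-state results P(n) = (1 - ρ) ρ^n, E(n) = ρ / (1 - ρ) and Prob(more than k items) = ρ^(k+1) can be confirmed by summing the distribution numerically. A minimal sketch; the function names, the truncation at 200 terms and the value ρ = 0.5 are assumptions for illustration.

```python
def mm1_prob(n, rho):
    """Steady-state probability of n requests in the system (queue + server)."""
    return (1 - rho) * rho**n


def mm1_tail(k, rho):
    """Probability of more than k requests in the system."""
    return rho ** (k + 1)


rho, terms = 0.5, 200  # truncating the infinite sums at 200 terms
total = sum(mm1_prob(i, rho) for i in range(terms))
mean = sum(i * mm1_prob(i, rho) for i in range(terms))
tail_by_sum = 1 - sum(mm1_prob(i, rho) for i in range(4))  # more than k = 3 items

print("sum of P(i)  :", total)
print("E(n)         :", mean, "vs rho/(1-rho) =", rho / (1 - rho))
print("Prob(n > 3)  :", tail_by_sum, "vs rho^4 =", mm1_tail(3, rho))
```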
