CSE 8383 - Advanced Computer Architecture
![Page 1: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/1.jpg)
CSE 8383 - Advanced Computer Architecture
Week 5
Week of Feb 9, 2004
engr.smu.edu/~rewini/8383
![Page 2: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/2.jpg)
Contents
Project/Schedule
Introduction to Multiprocessors
Parallelism
Performance
PRAM Model
….
![Page 3: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/3.jpg)
Warm Up
Parallel Numerical Integration
Parallel Matrix Multiplication
In class: Discuss with your neighbor!
Videotape: Think about it!
What kind of architecture do we need?
![Page 4: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/4.jpg)
Explicit vs. Implicit Parallelism
[Figure labels: Sequential program, Parallelizer, Parallel program, Programming Environment, Parallel Architecture]
![Page 5: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/5.jpg)
Motivation
One-processor systems are not capable of delivering solutions to some problems in reasonable time
Multiple processors cooperate to jointly execute a single computational task in order to speed up its execution
Speed-up versus Quality-up
![Page 6: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/6.jpg)
Multiprocessing
One-processor
Multiprocessor
Speed-up Quality-up Sharing
Physical limitations
N processors cooperate to solve a single computational task
![Page 7: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/7.jpg)
Flynn’s Classification - revisited
SISD (single instruction stream over a single data stream)
SIMD (single instruction stream over multiple data streams)
MIMD (multiple instruction streams over multiple data streams)
MISD (multiple instruction streams over a single data stream)
![Page 8: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/8.jpg)
SISD (single instruction stream over a single data stream)
[Figure: SISD uniprocessor architecture, showing CU, PU, MU, and I/O blocks linked by IS and DS paths]
Captions:
CU = control unit
PU = processing unit
MU = memory unit
IS = instruction stream
DS = data stream
PE = processing element
LM = local memory
![Page 9: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/9.jpg)
SIMD (single instruction stream over multiple data streams)
[Figure: SIMD architecture. Labels: CU, IS, PE1 … PEn, LM1 … LMn, DS; program loaded from host; data sets loaded from host]
![Page 10: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/10.jpg)
MIMD (multiple instruction streams over multiple data streams)
[Figure: MIMD architecture (with shared memory). Labels: CU1 … CUn, PU1 … PUn, IS, DS, I/O, Shared Memory]
![Page 11: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/11.jpg)
MISD (multiple instruction streams over a single data stream)
[Figure: MISD architecture (the systolic array). Labels: Memory (program and data), CU1, CU2, …, CUn, PU1, PU2, …, PUn, IS, DS, I/O]
![Page 12: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/12.jpg)
System Components
Three major components:
Processors
Memory Modules
Interconnection Network
![Page 13: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/13.jpg)
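Memory Access
Shared Memory
Distributed Memory
[Figure: with shared memory, the processors (P) share one memory (M); with distributed memory, each processor (P) has its own memory (M)]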
![Page 14: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/14.jpg)
Interconnection Network Taxonomy
Interconnection Network: Static or Dynamic
Static: 1-D, 2-D, HC
Dynamic, bus-based: Single, Multiple
Dynamic, switch-based: SS, MS, Crossbar
![Page 15: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/15.jpg)
MIMD Shared Memory Systems
[Figure: processors (P) connected to memory modules (M) through interconnection networks]
![Page 16: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/16.jpg)
Shared Memory
Single address space
Communication via read & write
Synchronization via locks
![Page 17: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/17.jpg)
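Bus-based & switch-based SM Systems
[Figure: a bus-based system in which processors (P), each with a cache (C), share a global memory; a switch-based system in which processors (P) with caches (C) connect to memory modules (M)]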
![Page 18: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/18.jpg)
Cache Coherent NUMA
[Figure: four nodes, each with a processor (P), a cache (C), and a memory (M), joined by an interconnection network]
![Page 19: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/19.jpg)
MIMD Distributed Memory Systems
[Figure: four processor (P) and memory (M) pairs joined by interconnection networks]
![Page 20: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/20.jpg)
Distributed Memory
Multiple address spaces
Communication via send & receive
Synchronization via messages
![Page 21: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/21.jpg)
SIMD Computers
[Figure: a von Neumann computer alongside sixteen processor (P) / memory (M) pairs joined by some interconnection network]
![Page 22: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/22.jpg)
SIMD (Data Parallel)
Parallel operations within a computation are partitioned spatially rather than temporally
Scalar instructions vs. array instructions
Processors are incapable of operating autonomously; they must be driven by the control unit
![Page 23: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/23.jpg)
Past Trends in Parallel Architecture (inside the box)
Completely custom-designed components (processors, memory, interconnects, I/O)
Longer R&D time (2-3 years)
Expensive systems
Quickly becoming outdated
Bankrupt companies!!
![Page 24: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/24.jpg)
New Trends in Parallel Architecture (outside the box)
Advances in commodity processors and network technology
Network of PCs and workstations connected via LAN or WAN forms a parallel system
Network computing
Compete favorably (cost/performance)
Utilize unused cycles of systems sitting idle
![Page 25: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/25.jpg)
Clusters
[Figure: three nodes, each with a processor (P), cache (C), memory (M), I/O, and OS, together with middleware, a programming environment, and an interconnection network]
![Page 26: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/26.jpg)
Grids
Grids are geographically distributed platforms for computation.
They provide dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.
![Page 27: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/27.jpg)
Problem
Assume that a switching component such as a transistor can switch in zero time. We propose to construct a disk-shaped computer chip with such a component. The only limitation is the time it takes to send electronic signals from one edge of the chip to the other. Make the simplifying assumption that electronic signals travel 300,000 kilometers per second. What must be the diameter of a round chip so that it can switch $10^9$ times per second? What would the diameter be if the switching requirement were $10^{12}$ times per second?
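One way to approach the problem is sketched below. It assumes that a signal must cross the full diameter once per switching period, so the largest admissible diameter is the signal speed divided by the switching rate. The function name `max_chip_diameter_m` is illustrative and not part of the course material.

```python
SIGNAL_SPEED_M_PER_S = 3.0e8  # 300,000 km/s, as given in the problem


def max_chip_diameter_m(switches_per_second: float) -> float:
    """Largest diameter (in metres) a signal can cross in one switching period.

    Assumption: one edge-to-edge traversal must fit in one period,
    and every other delay is zero, as the problem statement allows.
    """
    if switches_per_second <= 0:
        raise ValueError("switching rate must be positive")
    return SIGNAL_SPEED_M_PER_S / switches_per_second


if __name__ == "__main__":
    for rate in (1e9, 1e12):
        diameter_cm = max_chip_diameter_m(rate) * 100
        print(f"{rate:.0e} switches/s -> diameter of about {diameter_cm:.4g} cm")
```

Under this assumption the diameter shrinks linearly with the switching rate: about 30 cm at $10^9$ switches per second and about 0.3 mm at $10^{12}$.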
![Page 28: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/28.jpg)
Grosch’s Law (1960s)
“To sell a computer for twice as much, it must be four times as fast”
Vendors skip small speed improvements in favor of waiting for large ones
Buyers of expensive machines would wait for a twofold improvement in performance for the same price.
![Page 29: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/29.jpg)
Moore’s Law
Gordon Moore (cofounder of Intel)
Processor performance would double every 18 months
This prediction has held for several decades
Unlikely that single-processor performance continues to increase indefinitely
![Page 30: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/30.jpg)
Von Neumann’s bottleneck
Great mathematician of the 1940s and 1950s
Single control unit connecting a memory to a processing unit
Instructions and data are fetched one at a time from memory and fed to the processing unit
Speed is limited by the rate at which instructions and data are transferred from memory to the processing unit.
![Page 31: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/31.jpg)
Parallelism
Multiple CPUs
Within the CPU
One pipeline
Multiple pipelines
![Page 32: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/32.jpg)
Speedup
S = Speed(new) / Speed(old)
S = [Work/time(new)] / [Work/time(old)]
S = time(old) / time(new)
S = time(before improvement) / time(after improvement)
![Page 33: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/33.jpg)
Speedup
Time (one CPU): T(1)
Time (n CPUs): T(n)
Speedup: S
S = T(1)/T(n)
![Page 34: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/34.jpg)
Amdahl’s Law
The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used
![Page 35: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/35.jpg)
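Example
Travel from A to B: 200 miles can be covered by any means, plus a stretch that must be walked and takes 20 hours.

| Mode | Speed | Total time | Speedup |
|---|---|---|---|
| Walk | 4 miles/hour | 50 + 20 = 70 hours | S = 1 |
| Bike | 10 miles/hour | 20 + 20 = 40 hours | S = 1.8 |
| Car-1 | 50 miles/hour | 4 + 20 = 24 hours | S = 2.9 |
| Car-2 | 120 miles/hour | 1.67 + 20 = 21.67 hours | S = 3.2 |
| Car-3 | 600 miles/hour | 0.33 + 20 = 20.33 hours | S = 3.4 |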
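The figures in the travel example can be recomputed with a short Python sketch. The constant and function names are illustrative; the numbers (200 miles, 20 hours of mandatory walking, walking at 4 miles/hour as the baseline) come from the slide.

```python
WALK_ONLY_HOURS = 20.0   # stretch that must be walked regardless of vehicle
DISTANCE_MILES = 200.0   # stretch where a faster mode can be used
BASELINE_MPH = 4.0       # walking speed, used as the reference


def trip_hours(speed_mph: float) -> float:
    """Total trip time: the improvable stretch plus the fixed walking stretch."""
    return DISTANCE_MILES / speed_mph + WALK_ONLY_HOURS


def speedup(speed_mph: float) -> float:
    """Speedup relative to walking the whole way."""
    return trip_hours(BASELINE_MPH) / trip_hours(speed_mph)


if __name__ == "__main__":
    modes = [("Walk", 4), ("Bike", 10), ("Car-1", 50), ("Car-2", 120), ("Car-3", 600)]
    for name, mph in modes:
        print(f"{name:6s} {trip_hours(mph):6.2f} hours  S = {speedup(mph):.2f}")
```

However fast the vehicle, the speedup stays below 70 / 20 = 3.5, because the 20 hours of walking cannot be shortened.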
![Page 36: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/36.jpg)
Amdahl’s Law (1967)
$\alpha$: The fraction of the program that is naturally serial
$(1 - \alpha)$: The fraction of the program that is naturally parallel
![Page 37: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/37.jpg)
$S = T(1)/T(N)$
$T(N) = \alpha T(1) + \dfrac{T(1)(1 - \alpha)}{N}$
$S = \dfrac{1}{\alpha + \dfrac{1 - \alpha}{N}} = \dfrac{N}{\alpha N + (1 - \alpha)}$
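As a small numeric illustration of Amdahl's Law, the sketch below evaluates S = 1 / (alpha + (1 - alpha)/N), where `alpha` stands for the naturally serial fraction and `n` for the number of processors. The function name is illustrative.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Speedup on n processors when a fraction alpha of the work is serial."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / n)


if __name__ == "__main__":
    # With 10% serial work the speedup approaches, but never reaches, 1/alpha = 10.
    for n in (1, 10, 100, 1000):
        print(f"N = {n:5d}  S = {amdahl_speedup(0.10, n):.3f}")
```

For a fixed serial fraction, adding processors gives diminishing returns: the speedup is bounded above by 1/alpha.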
![Page 38: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/38.jpg)
Amdahl’s Law
![Page 39: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/39.jpg)
Gustafson-Barsis Law
$N$ and $\alpha$ are not independent of each other
$\alpha$: The fraction of the program that is naturally serial
$T(N) = 1$
$T(1) = \alpha + (1 - \alpha) N$
$S = N - (N - 1)\alpha$
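The scaled speedup can be evaluated the same way. In this sketch `alpha` stands for the naturally serial fraction measured on the N-processor run (normalized to T(N) = 1) and `n` for the number of processors; the function name is illustrative.

```python
def gustafson_speedup(alpha: float, n: int) -> float:
    """Scaled speedup S = N - (N - 1) * alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return n - (n - 1) * alpha


if __name__ == "__main__":
    # With 10% serial time the scaled speedup keeps growing linearly in N.
    for n in (1, 10, 100, 1000):
        print(f"N = {n:5d}  S = {gustafson_speedup(0.10, n):.1f}")
```

Unlike the fixed-size case, the speedup here grows without bound as N increases, because the parallel part of the workload is assumed to grow with the machine.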
![Page 40: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/40.jpg)
Gustafson-Barsis Law
![Page 41: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/41.jpg)
Comparison of Amdahl’s Law vs Gustafson-Barsis’ Law
![Page 42: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/42.jpg)
Example
for I = 1 to 10 do
begin
  S[I] = 0.0;
  for J = 1 to 10 do
    S[I] = S[I] + M[I, J];
  S[I] = S[I]/10;
end
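The loop computes the average of each row of a 10 x 10 matrix M. The slide does not state which point it illustrates, so the Python sketch below rests on one assumption: since no iteration of the outer loop reads a value written by another, each row can be handled as an independent task. The thread pool only expresses that decomposition and is not a claim about measured speedup; all names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def row_average(row):
    """Inner loop: accumulate one row, then divide by its length."""
    total = 0.0
    for value in row:
        total += value
    return total / len(row)


def row_averages_sequential(matrix):
    """Outer loop executed one row after another."""
    return [row_average(row) for row in matrix]


def row_averages_parallel(matrix, workers=4):
    """Outer loop with each row submitted as an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_average, matrix))


if __name__ == "__main__":
    m = [[float(10 * i + j) for j in range(10)] for i in range(10)]
    print(row_averages_sequential(m) == row_averages_parallel(m))
```

The inner loop, by contrast, carries a dependence through the running sum, so it would need a reduction rather than a plain split.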
![Page 43: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/43.jpg)
![Page 44: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/44.jpg)
Distributed Computing Performance
Single Program Performance
Multiple Program Performance
![Page 45: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/45.jpg)
PRAM Model
![Page 46: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/46.jpg)
What is a Model?
According to Webster’s Dictionary, a model is “a description or analogy used to help visualize something that cannot be directly observed.”
According to The Oxford English Dictionary, a model is “a simplified or idealized description or conception of a particular system, situation or process.”
![Page 47: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/47.jpg)
Why Models?
In general, the purpose of modeling is to capture the salient characteristics of phenomena with clarity and the right degree of accuracy to facilitate analysis and prediction.
Maggs, Matheson and Tarjan (1995)
![Page 48: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/48.jpg)
Models in Problem Solving
Computer scientists use models to help design problem-solving tools such as:
Fast algorithms
Effective programming environments
Powerful execution engines
![Page 49: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/49.jpg)
An Interface
A model is an interface separating high-level properties from low-level ones
[Figure: the MODEL sits between applications and architectures; it provides operations and requires implementation]
![Page 50: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/50.jpg)
PRAM Model
Synchronized Read Compute Write Cycle
EREW, ERCW, CREW, CRCW
Complexity: T(n), P(n), C(n)
[Figure: processors P1, P2, …, Pp, each with a private memory, attached to a control unit and a global memory]
![Page 51: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/51.jpg)
The PRAM model and its variations (cont.)
There are different modes for read and write operations in a PRAM:
Exclusive read (ER)
Exclusive write (EW)
Concurrent read (CR)
Concurrent write (CW): Common, Arbitrary, Minimum, Priority
Based on the different modes described above, the PRAM can be further divided into the following four subclasses:
EREW-PRAM model
CREW-PRAM model
ERCW-PRAM model
CRCW-PRAM model
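The four concurrent-write variants differ in how simultaneous writes to one memory cell are resolved. The slide names them without defining them, so the sketch below rests on assumed semantics: "common" succeeds only if every processor writes the same value, "arbitrary" lets any one writer win, "minimum" lets the lowest-numbered processor win, and "priority" lets the writer with the highest caller-supplied priority win. The function and parameter names are hypothetical.

```python
def resolve_concurrent_write(requests, policy, priority=None):
    """Return the value stored after simultaneous writes to one cell.

    requests: list of (processor_id, value) pairs.
    policy:   "common", "arbitrary", "minimum", or "priority".
    priority: dict processor_id -> rank, used only by "priority".
    The semantics of each policy are assumptions, not taken from the slides.
    """
    if not requests:
        raise ValueError("no write requests")
    if policy == "common":
        values = {value for _, value in requests}
        if len(values) > 1:
            raise ValueError("common CW: processors wrote different values")
        return next(iter(values))
    if policy == "arbitrary":
        # Any single writer may win; this sketch simply takes the first.
        return requests[0][1]
    if policy == "minimum":
        # Assumed: the processor with the smallest index wins.
        return min(requests, key=lambda request: request[0])[1]
    if policy == "priority":
        if priority is None:
            raise ValueError("priority CW needs a priority table")
        # Assumed: a larger rank means a higher priority.
        return max(requests, key=lambda request: priority[request[0]])[1]
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    writes = [(3, "c"), (1, "a"), (2, "b")]
    print(resolve_concurrent_write(writes, "minimum"))
    print(resolve_concurrent_write(writes, "priority", {1: 0, 2: 5, 3: 1}))
```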
![Page 52: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/52.jpg)
Analysis of Algorithms
Sequential Algorithms
Time Complexity
Space Complexity
An algorithm whose time complexity is bounded by a polynomial is called a polynomial-time algorithm. An algorithm is considered to be efficient if it runs in polynomial time.
![Page 53: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/53.jpg)
Analysis of Sequential Algorithms
[Figure: the relationships among P, NP, NP-complete, and NP-hard]
![Page 54: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/54.jpg)
Analysis of parallel algorithms
The performance of a parallel algorithm is expressed in terms of how fast it is and how many resources it uses when it runs.
Run time, defined as the time elapsed during the execution of the algorithm
Number of processors the algorithm uses to solve a problem
The cost of the parallel algorithm, which is the product of the run time and the number of processors
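The cost measure can be illustrated with a short sketch. The example workload is not from the slides: it assumes the familiar tree-style summation of n numbers, which takes log2(n) steps on n/2 processors, and compares its cost with the n - 1 additions of a sequential sum. All names are illustrative.

```python
import math


def parallel_cost(run_time: float, processors: int) -> float:
    """Cost = run time x number of processors."""
    return run_time * processors


def tree_sum_cost(n: int) -> float:
    """Cost of summing n values (n a power of two) with n/2 processors.

    Assumed workload: pairwise tree reduction, log2(n) parallel steps.
    """
    steps = math.log2(n)
    return parallel_cost(steps, n // 2)


if __name__ == "__main__":
    n = 1024
    print("parallel cost:", tree_sum_cost(n))
    print("sequential additions:", n - 1)
```

A parallel algorithm can therefore be much faster than the sequential one while still costing more in total processor-time.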
![Page 55: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/55.jpg)
Analysis of parallel algorithms
The NC-class and P-completeness
[Figure: the relationships among P, NP, NP-complete, NP-hard, NC, and P-complete (if P ≠ NP and NC ≠ P)]
![Page 56: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/56.jpg)
Simulating multiple accesses on an EREW PRAM
Broadcasting mechanism:
P1 reads x and makes it known to P2.
P1 and P2 make x known to P3 and P4, respectively, in parallel.
P1, P2, P3 and P4 make x known to P5, P6, P7 and P8, respectively, in parallel.
These eight processors will make x known to another eight processors, and so on.
![Page 57: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/57.jpg)
Simulating multiple accesses on an EREW PRAM (cont.)
[Figure: Simulating concurrent read on an EREW PRAM with eight processors using Algorithm Broadcast_EREW. Panels (a) to (d) show the array L filling with copies of x as P1, then P2, then P3 and P4, then P5 to P8 receive the value.]
![Page 58: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/58.jpg)
Simulating multiple accesses on an EREW PRAM (cont.)
Algorithm Broadcast_EREW
Processor P1
  y (in P1’s private memory) ← x
  L[1] ← y
for i = 0 to log p - 1 do
  forall Pj, where 2^i + 1 ≤ j ≤ 2^(i+1) do in parallel
    y (in Pj’s private memory) ← L[j - 2^i]
    L[j] ← y
  endfor
endfor
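A sequential Python simulation of Algorithm Broadcast_EREW is sketched below. It assumes p is a power of two and reads the index range as inclusive (2^i + 1 through 2^(i+1)); the assertion inside the loop checks the EREW property that no two processors touch the same cell of L in one step. The function name and the returned step count are illustrative additions.

```python
def broadcast_erew(x, p):
    """Simulate Broadcast_EREW; return (values seen by P1..Pp, parallel steps).

    Assumption: p is a power of two, so log2(p) is an integer.
    """
    if p < 1 or p & (p - 1) != 0:
        raise ValueError("p must be a power of two")
    shared = [None] * (p + 1)      # shared array L, 1-indexed; cell 0 unused
    shared[1] = x                  # P1 copies x into L[1]
    steps = p.bit_length() - 1     # log2(p)
    for i in range(steps):
        half = 2 ** i
        readers = range(half + 1, 2 * half + 1)   # Pj with 2^i + 1 <= j <= 2^(i+1)
        read_cells = [j - half for j in readers]
        # Exclusive read / exclusive write: every cell is used at most once.
        assert len(set(read_cells)) == len(read_cells)
        assert set(read_cells).isdisjoint(readers)
        for j in readers:          # these assignments happen "in parallel"
            shared[j] = shared[j - half]
    return shared[1:], steps


if __name__ == "__main__":
    values, steps = broadcast_erew("x", 8)
    print(values, steps)
```

Each step doubles the number of processors that know x, so p processors are reached in log2(p) steps rather than p - 1.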
![Page 59: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/59.jpg)
Bus-based Shared Memory
Collection of wires and connectors
Only one transaction at a time
Bottleneck!! How can we solve the problem?
[Figure: five processors (P) sharing a global memory over a bus]
![Page 60: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/60.jpg)
Single Processor caching
[Figure: a processor (P) with a cache holding x, backed by a memory holding x]
Hit: data in the cache
Miss: data is not in the cache
Hit rate: h
Miss rate: m = (1-h)
![Page 61: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/61.jpg)
Writing in the cache

| State | Cache | Memory |
|---|---|---|
| Before | x | x |
| Write through | x’ | x’ |
| Write back | x’ | x |
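The two write policies can be modelled with a one-line cache. The sketch below is a simplification: the class, its dirty flag, and the `evict` method are assumptions used to show when memory catches up under write back, and they are not taken from the slides.

```python
class SingleLineCache:
    """One cached copy of a memory word, under a chosen write policy."""

    def __init__(self, memory_value, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.memory = memory_value   # value held in main memory
        self.cache = memory_value    # value held in the cache
        self.dirty = False           # True when the cache is newer than memory

    def write(self, value):
        """Processor writes a new value."""
        self.cache = value
        if self.policy == "write-through":
            self.memory = value      # memory is updated immediately
        else:
            self.dirty = True        # memory is updated later, on eviction

    def evict(self):
        """Replace the line; a dirty line is written back first (assumed)."""
        if self.dirty:
            self.memory = self.cache
            self.dirty = False


if __name__ == "__main__":
    for policy in ("write-through", "write-back"):
        line = SingleLineCache("x", policy)
        line.write("x'")
        print(policy, "cache:", line.cache, "memory:", line.memory)
```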
![Page 62: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/62.jpg)
Using Caches
[Figure: processors P1, P2, P3, …, Pn, each with its own cache C1, C2, C3, …, Cn, sharing a global memory]
- Cache Coherence problem
- How many processors?
![Page 63: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/63.jpg)
Group Activity
Variables
Number of processors (n)
Hit rate (h)
Bus bandwidth (B)
Processor speed (v)
Condition: $n(1 - h)v \le B$
Maximum number of processors: $n = \dfrac{B}{(1 - h)v}$
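The bound can be turned into a small calculator. The sketch assumes that B and v are expressed in the same unit (for example, memory requests per second), so that (1 - h) * v is the bus traffic one processor generates on cache misses; the sample numbers are made up for illustration.

```python
import math


def max_processors(bandwidth: float, hit_rate: float, speed: float) -> int:
    """Largest n satisfying n * (1 - h) * v <= B.

    Assumption: bandwidth and speed share the same unit.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit rate must lie in [0, 1)")
    if bandwidth <= 0 or speed <= 0:
        raise ValueError("bandwidth and speed must be positive")
    return math.floor(bandwidth / ((1.0 - hit_rate) * speed))


if __name__ == "__main__":
    # Hypothetical values: B = 100, v = 8, in the same unit.
    for h in (0.5, 0.75):
        print(f"h = {h}: up to {max_processors(100, h, 8)} processors")
```

Raising the hit rate reduces the bus traffic per processor, which is why caches let a single bus support more processors.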
![Page 64: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/64.jpg)
Cache Coherence
[Figure: copies of x held in several of the caches of P1, P2, P3, …, Pn and in memory]
- Multiple copies of x
- What if P1 updates x?
![Page 65: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/65.jpg)
Cache Coherence Policies
Writing to cache in the 1-processor case
Write Through
Write Back
Writing to cache in the n-processor case
Write Update - Write Through
Write Update - Write Back
Write Invalidate - Write Through
Write Invalidate - Write Back
![Page 66: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/66.jpg)
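Write-invalidate
P1 writes x’; I marks an invalidated copy.

| State | P1 cache | Other cached copy | Memory |
|---|---|---|---|
| Before | x | x | x |
| Write through | x’ | I | x’ |
| Write back | x’ | I | x |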
![Page 67: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/67.jpg)
Write-Update
P1 writes x’.

| State | P1 cache | Other cached copy | Memory |
|---|---|---|---|
| Before | x | x | x |
| Write through | x’ | x’ | x’ |
| Write back | x’ | x’ | x |
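The four combinations of coherence action (invalidate or update) and memory policy (write through or write back) can be replayed with a small sketch. The function name, the dictionary representation of the caches, and the choice of P3 as the second holder of x are assumptions made for illustration.

```python
INVALID = "I"  # marker for an invalidated cache copy


def write_shared(caches, writer, new_value, memory, coherence, memory_policy):
    """Apply one write by `writer`; return (new cache states, new memory value).

    caches: dict processor -> cached value, or None if it holds no copy.
    coherence: "invalidate" or "update".
    memory_policy: "write-through" or "write-back".
    """
    if coherence not in ("invalidate", "update"):
        raise ValueError(f"unknown coherence action: {coherence}")
    if memory_policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown memory policy: {memory_policy}")
    updated = dict(caches)
    updated[writer] = new_value
    for processor, value in caches.items():
        if processor == writer or value is None:
            continue                      # nothing to do for absent copies
        updated[processor] = INVALID if coherence == "invalidate" else new_value
    new_memory = new_value if memory_policy == "write-through" else memory
    return updated, new_memory


if __name__ == "__main__":
    before = {"P1": "x", "P2": None, "P3": "x"}   # assumed initial sharing
    for coherence in ("invalidate", "update"):
        for policy in ("write-through", "write-back"):
            print(coherence, policy, write_shared(before, "P1", "x'", "x", coherence, policy))
```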
![Page 68: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/68.jpg)
Synchronization
[Figure: P1, P2, and P3 each execute Lock … unlock around a section guarded by locks; while one processor holds the lock, the others wait]
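The lock/unlock pattern can be sketched with Python threads standing in for the processors P1 to P3. The shared counter and the function name are illustrative; `threading.Lock` is from the Python standard library.

```python
import threading


def run_workers(num_workers: int = 3, increments: int = 1000) -> int:
    """Each worker repeatedly locks, updates shared data, and unlocks."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:           # lock: other workers wait here
                counter += 1     # section that touches shared data
            # leaving the with-block unlocks

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(run_workers())
```

Because only one worker can hold the lock at a time, every increment is applied and the final count equals the number of workers times the number of increments.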
![Page 69: CSE 8383 - Advanced Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022081501/56814fe5550346895dbdaf6e/html5/thumbnails/69.jpg)
Superscalar Parallelism
Scheduling