Multi-core Processing
The Past and The Future
Amir Moghimi, ASIC Course, UT ECE
The Past
• Instruction Level Parallelism (ILP) Enhanced Processors
• Wide Dynamic Execution [1] with techniques such as:
  • speculative execution (using branch prediction)
  • out-of-order execution (using register renaming and reservation stations)
  • superscalar issue (using a multiple-issue instruction cache and reorder buffer)
• e.g. Intel P6 micro-architecture
  • Used in: Pentium® Pro, Pentium® II, and Pentium® III processors
ILP Limitations
• Window size limitation [2] due to:
  • 2450 comparisons for register dependency detection among 50 instructions in one clock cycle!
  • A branch instruction every 5 instructions on average
• Imperfect branch prediction
• Serial nature of the application with true data dependencies
• So, how can we use the huge amount of extra silicon that arrives every year and a half?
  • Use multiple cores on a single die
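The 2450 comparisons quoted above can be reproduced with a short count. This is a minimal sketch under a simplifying assumption that is not spelled out on the slide: every instruction is register-register with two source operands, and each source is compared against the destination register of every earlier instruction in the same group. The function name `dependency_comparisons` is hypothetical and used only for illustration.

```python
def dependency_comparisons(n: int) -> int:
    """Count register comparisons needed to check n instructions issued together.

    Assumption: each instruction has two source registers, and each source is
    compared against the destination of every earlier instruction in the group.
    """
    total = 0
    for earlier in range(n):   # 'earlier' = number of instructions ahead of this one
        total += 2 * earlier   # two sources, each checked against 'earlier' destinations
    return total


if __name__ == "__main__":
    for n in (4, 8, 50):
        # The loop above sums to the closed form n*n - n.
        print(n, dependency_comparisons(n), n * n - n)
```

Under this model the count grows quadratically with the window size, which is why widening the issue window quickly becomes impractical within a single clock cycle.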
Multi-core Basics
• A multi-core chip combines two or more independent processing cores on a single die (also known as a Chip Multi-Processor)
• Four main questions arise [3]:
  • How is the application developed?
  • How do the cores share data?
  • How do they physically communicate?
  • How scalable is the architecture?
Given Answers
• For parallel application development, use the thread concept formerly proposed for discrete multi-processor systems
Category               Type                 # of Proc
Communication model    Message passing      8 to 2048
                       Shared address
                         NUMA               8 to 256
                         UMA                2 to 64
Physical connection    Network              8 to 256
                       Bus                  2 to 36
[3]
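The two communication models in the table can be contrasted with a small thread-based sketch. It is illustrative only: the function names `shared_address_sum` and `message_passing_sum` are hypothetical, and in standard CPython builds the global interpreter lock serializes these threads, so the example shows the programming model rather than a real parallel speedup.

```python
import queue
import threading


def shared_address_sum(values, workers=4):
    """Shared address model: all threads update one variable, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        partial = sum(chunk)   # private work, no sharing needed
        with lock:             # the shared location must be protected
            total += partial

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(values, workers=4):
    """Message passing model: threads share nothing and send results as messages."""
    mailbox = queue.Queue()

    def work(chunk):
        mailbox.put(sum(chunk))   # send the partial result to the collector

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in range(workers))


if __name__ == "__main__":
    data = list(range(101))
    print(shared_address_sum(data), message_passing_sum(data))
```

The shared address version needs explicit synchronization around the shared variable, while the message passing version moves that coordination into the queue.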
Chip Level Multi-threading
• Implemented in superscalar processors before multi-core chips were introduced
• Multi-threading methods:
  • Fine-grained
  • Coarse-grained
  • Simultaneous MT (SMT)
• e.g. Intel HyperThreading Technology (an SMT implementation)
[Figure: 4-way threading processor [3]. Issue slots (horizontal) versus time (vertical) for Thread A, Thread B, Thread C, and Thread D under coarse-grained MT, fine-grained MT, and SMT]
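The difference between the three multi-threading methods can be sketched with a toy issue-slot model. Everything here is an assumption made for illustration and not taken from the slides: the 4-wide issue width, the `READY` table (how many instructions each thread could issue in each cycle), strict round-robin for fine-grained MT, and a one-cycle switch penalty for coarse-grained MT. The model also ignores that real threads carry unissued instructions over to later cycles.

```python
WIDTH = 4  # assumed issue slots per cycle

# Hypothetical per-cycle ready counts for threads A, B, C, D (0 = stalled).
READY = [
    [2, 2, 0, 1, 2, 0],  # Thread A
    [1, 1, 2, 2, 0, 1],  # Thread B
    [0, 1, 1, 0, 3, 2],  # Thread C
    [3, 1, 0, 2, 1, 1],  # Thread D
]


def coarse_grained(ready, width=WIDTH):
    """Run one thread until it stalls; the stall cycle is lost to the switch."""
    current, used = 0, 0
    for cycle in range(len(ready[0])):
        n = ready[current][cycle]
        if n == 0:
            current = (current + 1) % len(ready)  # switch, issue nothing this cycle
        else:
            used += min(width, n)
    return used


def fine_grained(ready, width=WIDTH):
    """Switch threads every cycle in strict round-robin order."""
    used = 0
    for cycle in range(len(ready[0])):
        thread = cycle % len(ready)
        used += min(width, ready[thread][cycle])
    return used


def smt(ready, width=WIDTH):
    """Fill the slots of each cycle with instructions from all threads."""
    used = 0
    for cycle in range(len(ready[0])):
        used += min(width, sum(thread[cycle] for thread in ready))
    return used


if __name__ == "__main__":
    slots = WIDTH * len(READY[0])
    for name, policy in (("coarse", coarse_grained),
                         ("fine", fine_grained),
                         ("SMT", smt)):
        print(f"{name}: {policy(READY)} of {slots} issue slots used")
```

In this model SMT can never use fewer slots than the other two policies in any cycle, because it draws from every thread at once, whereas the single-thread-per-cycle policies are limited by whichever thread is selected.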
Now Multi-core Processing
• A simple look at a multi-core processor (IBM Xenon used in MS-Xbox 360)
• Simple but effective
[Figure: three cores (Core 0, Core 1, Core 2), each with its own L1D and L1I caches, sharing a 1MB UL2 [4]]
A More Powerful Design
• STI Cell (used in PS3)
[8]
A Comparison
• Sun UltraSPARC T1
[5]
[Figure: eight 4-way MT SPARC pipes connected through a crossbar to a 4-way banked L2, memory controllers, and I/O shared functions [3]]
UltraSPARC T1 vs. Pentium EE
[5]
UltraSPARC T1 vs. Pentium EE
Performance comparison using SPEC JBB 2000, TPC-C, TPC-W, and XML Test as server benchmarks and SPEC CPU2000 as the serial benchmark [5]
Pentium Extreme Edition Die Photo [5]
Now the Trend
• Intel will deliver a quad-core (4 full execution cores) processor in the first quarter of 2007 [1]
• “We forecast that more than 85 percent of our server processors and more than 70 percent of our mobile and desktop Pentium® family processor shipments will be multi-core–based by the end of 2006” [7]
• Intel plans to have 32 cores on a die by 2015 [7]
• But do not forget the high power density and memory bandwidth issues!
Thanks
• Any questions?
References
1. http://www.intel.com/technology/architecture/coremicro/index.htm
2. John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach 2nd Edition. Morgan Kaufmann, 1999.
3. PSU CSE 431, Mary Jane Irwin, Computer Architecture, Fall 2005, Lecture 28.
4. http://www-128.ibm.com/developerworks/power/library/pa-fpfxbox/?ca=dgr-lnxw09XBoxDesign
5. http://www.dns-gmbh.de/dnsgmbh/unternehmen/event-kalender/23b3b3ee6c4e487e6f4205fa03e783bc.0.0/Niagara_CMT.pdf
6. James Laudon: Performance/Watt: the new server focus. SIGARCH Computer Architecture News 33(4): 5-13 (2005)
7. http://www.intel.com/technology/computing/multi-core/index.htm
8. http://www.pcstats.com/articleimages/200502