Instruction-Level Parallelism for Low-Power Embedded Processors

January 23, 2001

Presented by Anup Gangwar


Embedded Systems Group IIT Delhi

Slide 2

Introduction

Need for high-performance, low-power processors

Synergistic hardware-compiler design for EPIC- or VLIW-like architectures

A new variable instruction length scheme

Full predication support in hardware

Slide 3

Outline

Instruction-Level Parallelism

Power Consumption in VLSI Circuits

A Look at Available Mobile and DSP Processors

High-Level Evaluation of A Low-Power VLIW Processor

The DEVIL Low-Power Processor

A Step Towards Predicated Execution

Conclusion

Slide 4

ILP: Concepts and Limitations

Data Dependences

Flow Dependence or RAW

Anti Dependence or WAR

Output Dependence or WAW

Reduction of critical path

Control Dependences

Resource Conflicts
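The three data dependences listed on this slide can be illustrated with a small sketch. It assumes a toy instruction form with one destination register and a tuple of source registers; the names `Instr` and `classify` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Toy instruction (an assumption): one destination, a tuple of sources."""
    dest: str
    srcs: tuple


def classify(first: Instr, second: Instr) -> set:
    """Data dependences of `second` on `first`, where `first` comes earlier."""
    deps = set()
    if first.dest in second.srcs:
        deps.add("RAW")  # flow: second reads what first writes
    if second.dest in first.srcs:
        deps.add("WAR")  # anti: second overwrites what first reads
    if first.dest == second.dest:
        deps.add("WAW")  # output: both write the same register
    return deps


i1 = Instr("r1", ("r2", "r3"))  # r1 = r2 + r3
i2 = Instr("r4", ("r1", "r5"))  # r4 = r1 + r5
i3 = Instr("r2", ("r6", "r7"))  # r2 = r6 + r7
i4 = Instr("r1", ("r8", "r9"))  # r1 = r8 + r9

print(classify(i1, i2), classify(i1, i3), classify(i1, i4))
```

Only the RAW case carries a real value from one instruction to the next; WAR and WAW arise from register reuse, which is why renaming can remove them.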


Slide 6

Achieving ILP: Pipelining

Control dependencies affect pipelined execution

Data dependencies affect pipelined execution

Resource conflicts affect pipelined execution

Slide 7

Achieving ILP: Superscalar Architectures

In-order issue with in-order completion

In-order issue with out-of-order completion

Out-of-order issue with out-of-order completion


Slide 11

Achieving ILP: VLIW Processors

Lower circuit overhead than superscalar processors

Limited number of resources

Explicit insertion of NOPs increases code size
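To make the NOP overhead concrete, here is a minimal sketch that pads each cycle of a schedule to a fixed number of slots and measures how many slots end up as NOPs. The issue width of 4, the operation names, and the helper names are all assumptions chosen for illustration.

```python
ISSUE_WIDTH = 4  # hypothetical number of slots per long instruction word


def pad_bundles(schedule, width=ISSUE_WIDTH):
    """Pad each cycle's operations with explicit NOPs up to `width` slots."""
    padded = []
    for ops in schedule:
        if len(ops) > width:
            raise ValueError("more operations than issue slots")
        padded.append(list(ops) + ["nop"] * (width - len(ops)))
    return padded


def nop_fraction(padded):
    """Fraction of all instruction slots that hold a NOP."""
    slots = [op for bundle in padded for op in bundle]
    return slots.count("nop") / len(slots)


# Three cycles carrying 3, 1 and 2 useful operations (made-up example).
schedule = [["add", "mul", "ld"], ["sub"], ["st", "add"]]
padded = pad_bundles(schedule)
print(padded)
print(nop_fraction(padded))
```

In this made-up schedule half of the stored slots are NOPs, which is the kind of code-size growth the slide refers to.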


Slide 13

Extracting ILP: Basic Block Scheduling
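The transcript does not say which algorithm the slide showed. One common approach to scheduling inside a basic block is greedy list scheduling, sketched below under two simplifying assumptions: every operation takes one cycle, and program order serves as the priority. The function name and the example block are hypothetical.

```python
def list_schedule(deps, width):
    """Greedy list scheduling of one basic block.

    `deps` maps each operation to the set of operations it depends on.
    Every operation is assumed to take one cycle. Returns a list of
    cycles, each a list of operations issued together.
    """
    done, cycles = set(), []
    remaining = list(deps)  # program order doubles as priority
    while remaining:
        # An operation is ready once all of its predecessors have completed.
        ready = [op for op in remaining if deps[op] <= done]
        if not ready:
            raise ValueError("cyclic dependence graph")
        issued = ready[:width]  # at most `width` operations per cycle
        cycles.append(issued)
        done.update(issued)
        remaining = [op for op in remaining if op not in issued]
    return cycles


block = {
    "a": set(),
    "b": set(),
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"c", "d"},
}
print(list_schedule(block, 2))
print(list_schedule(block, 1))
```

With two issue slots the five operations fit in three cycles; with one slot they need five, so the dependence chain a, c, e bounds how much extra width can help.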

Slide 14

Extracting ILP: Superblock Scheduling

Slide 15

Extracting ILP: Predicated Execution
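Predicated execution guards each operation with a predicate, so a short branch can be replaced by straight-line code whose operations are nullified when their predicate is false. The sketch below simulates this for `if a > b: m = a else: m = b`. The tuple encoding and the name `run_predicated` are assumptions made for illustration, not the instruction format used in the talk.

```python
def run_predicated(program, regs):
    """Execute (predicate, dest, function) triples in order.

    An operation whose predicate register is false is nullified;
    a predicate of None means the operation always executes.
    """
    regs = dict(regs)
    for pred, dest, fn in program:
        if pred is None or regs[pred]:
            regs[dest] = fn(regs)
    return regs


# If-converted form of:  if a > b: m = a  else: m = b
# A compare sets predicate p, its complement np guards the else path,
# and both assignments appear in the instruction stream without a branch.
program = [
    (None, "p", lambda r: r["a"] > r["b"]),
    (None, "np", lambda r: not r["p"]),
    ("p", "m", lambda r: r["a"]),
    ("np", "m", lambda r: r["b"]),
]

print(run_predicated(program, {"a": 3, "b": 7})["m"])
print(run_predicated(program, {"a": 9, "b": 2})["m"])
```

Both assignments occupy issue slots on every execution, so predication trades fetched-but-nullified operations for the removal of the branch.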

Slide 16

Power Consumption in CMOS Circuits: Parallelism for Energy Efficiency
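The usual argument behind this slide title can be written out with the standard dynamic-power relation for CMOS. This is a sketch under idealized assumptions: leakage and the overhead of the extra routing and control are ignored, and the symbols $\alpha$ (switching activity), $C$ (switched capacitance), $V_{dd}$, $V_{\text{low}}$ and $f$ are introduced here and do not appear in the transcript. Duplicating the hardware doubles $C$, but each copy can run at $f/2$ for the same throughput, and the relaxed timing may allow a lower supply voltage $V_{\text{low}}$.

```latex
\[
\begin{aligned}
P_{\text{dyn}} &= \alpha \, C \, V_{dd}^{2} \, f \\
P_{\text{par}} &= \alpha \,(2C)\, V_{\text{low}}^{2} \, \frac{f}{2}
               = \alpha \, C \, V_{\text{low}}^{2} \, f \\
\frac{P_{\text{par}}}{P_{\text{dyn}}} &= \left( \frac{V_{\text{low}}}{V_{dd}} \right)^{2}
\end{aligned}
\]
```

Under these assumptions the saving comes entirely from the quadratic dependence on supply voltage; parallelism alone, at unchanged voltage, leaves dynamic power the same.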


Slide 18

Available Mobile and VLIW Processors

The ARM Family

The ARM7 Generation

The StrongARM

The ARM Thumb Option

The ARM Piccolo Option

The ARM9 and ARM10

Slide 19

Available Mobile and VLIW Processors

The Motorola M-Core

The LSI TinyRisc

The Hitachi SuperH Family

VLIW Processors

The Motorola-Lucent Star*Core

The Philips TriMedia

The HP/Intel IA-64

Slide 20

High Level Evaluation of A Low-Power VLIW Processor

Energy consumption distribution

Slide 21

High Level Evaluation of A Low-Power VLIW Processor

NOP Elimination in VLIW Processor
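The transcript does not describe how NOPs are eliminated in the evaluated design. One generic way to avoid storing them is to keep, per long instruction word, a bit mask of occupied slots plus only the real operations. The sketch below shows that idea; the encoding and the names `compress` and `expand` are assumptions and should not be read as the scheme used in the talk.

```python
def compress(bundle):
    """Encode a fixed-width bundle as (mask, ops).

    Bit i of `mask` is set when slot i holds a real operation.
    """
    mask, ops = 0, []
    for i, op in enumerate(bundle):
        if op != "nop":
            mask |= 1 << i
            ops.append(op)
    return mask, ops


def expand(mask, ops, width):
    """Rebuild the fixed-width bundle, re-inserting NOPs for clear mask bits."""
    it = iter(ops)
    return [next(it) if mask & (1 << i) else "nop" for i in range(width)]


bundle = ["add", "nop", "nop", "ld"]
mask, ops = compress(bundle)
print(bin(mask), ops)
print(expand(mask, ops, len(bundle)))
```

Here four stored slots shrink to two operations plus a 4-bit mask, at the cost of an expansion step in the fetch path.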

Slide 22

High Level Evaluation of A Low-Power VLIW Processor

Speed-up Comparison

Slide 23

High Level Evaluation of A Low-Power VLIW Processor

Energy Comparison

Slide 24

High Level Evaluation of A Low-Power VLIW Processor

Energy-Delay Product Comparison
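For reference, the energy-delay product multiplies the energy consumed by the time taken, so lower is better. The symbols below are introduced here for illustration only: $E$ and $T$ are the baseline energy and execution time, $S$ is the speed-up of a new design, and $r$ is the ratio of its energy to the baseline energy.

```latex
\[
\mathrm{EDP} = E \cdot T,
\qquad
\frac{\mathrm{EDP}_{\text{new}}}{\mathrm{EDP}_{\text{old}}}
  = \frac{(rE)\,(T/S)}{E\,T}
  = \frac{r}{S}
\]
```

A design therefore improves on the baseline EDP whenever its speed-up exceeds its relative energy increase, that is, whenever $S > r$.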

Slide 25

The DEVIL Low-Power Processor

Complexity in VLIW Architectures

Hardware Duplication

FUs and number of registers as well as ports

Number of FUs versus type of FU

Number of FUs versus available ILP

Slide 26

The DEVIL Low-Power Processor

Code Memory

Slide 27

The DEVIL Low-Power Processor

Slide 28

The DEVIL Low-Power Processor

Instruction Fetch Mechanism

Slide 29

The DEVIL Low-Power Processor

Branch Prediction Mechanism

Slide 30

The DEVIL Low-Power Processor

Performance with and without superscalar optimizations

Slide 31

The DEVIL Low-Power Processor

Effect of superscalar optimization on code size

Slide 32

The DEVIL Low-Power Processor

Effect of NOP elimination on code size

Slide 33

The DEVIL Low-Power Processor

Effect of NOP elimination on the number of accesses to code memory

Slide 34

The DEVIL Low-Power Processor

Effect of instruction fetch mechanism on code size

Slide 35

The DEVIL Low-Power Processor

Code size comparison with existing mobile processors

Slide 36

A Step Towards Predicated Execution

Compiler techniques for reducing predicate code size

Reduction of number of Control Instructions

Predicate promotion and Instruction merging

Instruction reduction for advanced code generation

Slide 37

A Step Towards Predicated Execution: Reduction of number of Control Instructions

Slide 38

A Step Towards Predicated Execution: Predicate promotion and Instruction merging

Slide 39

A Step Towards Predicated Execution

Introducing predication support into processor

Effect on code size of full predication

Predication code size and Execution Characteristics

Prefix-based predication

Slide 40

A Step Towards Predicated Execution

Relative number of predicated instructions

Slide 41

A Step Towards Predicated Execution

Code expansion considering predication

Slide 42

A Step Towards Predicated Execution

Code reductions due to predicated execution

Slide 43

Conclusions

A synergistic hardware-compiler approach for low-power processors

A new VLIW architecture that limits the increase in code size

A prefix-based predicated execution architecture framework