Towards a Java Multiprocessor

Christof Pitter, Martin Schoeberl, Vienna University of Technology, Austria, 27 September 2007

Page 1: Towards a Java Multiprocessor

Towards a Java Multiprocessor

Christof Pitter, Martin Schoeberl

Vienna University of Technology, Austria

27 September 2007

Page 2: Towards a Java Multiprocessor

04/19/23 Towards a Java Multiprocessor 2

Motivation

• Chip multiprocessing (CMP)

• Current trend in server & desktop systems

• Embedded systems

• Challenge for hard real-time systems

• RT-Java: promising topic for the future

[Figures: LEON3 and ARM11 MPCore multiprocessors]

Page 3: Towards a Java Multiprocessor

Our Goal

• Chip multiprocessor (CMP)
  – Global shared memory

• Java Optimized Processor (JOP) = Java VM in hardware

• Time predictable

• Still good performance

• Implementation in FPGA

Page 4: Towards a Java Multiprocessor

Our Goal II

Page 5: Towards a Java Multiprocessor

Agenda

• CMP Architecture
  – Memory Model
  – Cache Memory
  – Synchronization

• FPGA Implementation

• Benchmark Results

• Conclusion

• Future Work

Page 6: Towards a Java Multiprocessor

Memory Model

[Figure: Shared Memory vs. Distributed Shared Memory]

Page 7: Towards a Java Multiprocessor

Why Shared Memory?

JVM memory areas: 2 shared data areas (heap, method area)

Shared Memory                  Distributed Shared Memory
Physically centralized         Physically distributed
Symmetric access time (UMA)    Access time varies with location (NUMA)
Arbiter                        Interconnection network + message passing
Low bandwidth, high # CPUs     Higher bandwidth

Page 8: Towards a Java Multiprocessor

Why no NoC?

• No use for a network

• Multiple masters to a slave

• Masters communicate through memory

• May introduce long latencies

• Hardware Overhead

SoC bus

Page 9: Towards a Java Multiprocessor

Cache Memory

• Cache coherence conflicts avoided by architecture

• Stack cache: private data for each thread

• Method cache: read-only memory

• Heap not cached

Page 10: Towards a Java Multiprocessor

Synchronization

• Protect parallel access to shared objects
  – JVM: associates a lock with each object
  – JOP: activation & deactivation of interrupts
  – CMP:
    • Use of one global lock for the heap
    • Future work: multiple locks

• Avoidance of priority inversion
  – Priority inheritance locks
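
The single global lock can be pictured with a small software analogy. This is only a sketch under assumptions: the slides do not show how the lock is realized on the JOP CMP, and the names `GlobalHeapLock`, `HEAP_LOCK`, and `runWorkers` are hypothetical. Every thread that touches shared state takes the same lock, so updates never interleave, at the cost of serializing all heap accesses.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Toy model: all accesses to shared state go through one global lock. */
final class GlobalHeapLock {
    // One lock for the whole "heap" (hypothetical name, not from the slides).
    private static final ReentrantLock HEAP_LOCK = new ReentrantLock();
    // Stands in for a field of a shared object on the heap.
    private static int sharedCounter = 0;

    static void increment() {
        HEAP_LOCK.lock();
        try {
            sharedCounter++; // critical section: one thread at a time
        } finally {
            HEAP_LOCK.unlock();
        }
    }

    /** Starts the given number of worker threads and returns the final counter value. */
    static int runWorkers(int workers, int incrementsPerWorker) {
        sharedCounter = 0;
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerWorker; n++) {
                    increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join makes the workers' updates visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return sharedCounter;
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(3, 10000)); // prints 30000
    }
}
```

With multiple locks (listed as future work), threads working on unrelated objects would no longer block each other.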

Page 11: Towards a Java Multiprocessor

Proposed Architecture

Page 12: Towards a Java Multiprocessor

FPGA Implementation

• Up to 3 JOPs

• Memory arbiter

• SoC bus (SimpCon)

• External shared memory

• Development board:
  – Altera Cyclone EP1C12
  – 1 Mbyte SRAM

Page 13: Towards a Java Multiprocessor

Simple SoC Interconnect (SimpCon)

• Synchronous SoC bus

• Point-to-point communication

• Master-Slave interconnection

• Signals only valid for 1 cycle
  – Master can continue execution

• Signal rdy_cnt:
  – Informs master of available data
  – Fast data transfer due to pipelining
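
The role of rdy_cnt can be illustrated with a toy cycle-level model. It is a sketch, not the real interface: it assumes rdy_cnt is a small saturating counter of the cycles remaining until read data is valid, with 0 meaning the data is available and 3 meaning "three or more cycles". The slide itself only states that the signal informs the master of available data. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy cycle-level model of a SimpCon read (illustrative only). */
final class SimpConToy {
    // Assumption: rdy_cnt saturates at 3; 0 means read data is available.
    static final int RDY_CNT_MAX = 3;

    /** rdy_cnt value for a given number of cycles left until data is valid. */
    static int rdyCnt(int remainingCycles) {
        return Math.min(remainingCycles, RDY_CNT_MAX);
    }

    /** rdy_cnt values a master observes, one per cycle, for a slave with the given latency. */
    static List<Integer> readTrace(int latencyCycles) {
        List<Integer> trace = new ArrayList<>();
        for (int remaining = latencyCycles; remaining >= 0; remaining--) {
            trace.add(rdyCnt(remaining));
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(readTrace(5)); // prints [3, 3, 3, 2, 1, 0]
    }
}
```

Because the master sees the countdown before the data arrives, it can prepare the next step of its pipeline instead of waiting for a plain ready flag.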

Page 14: Towards a Java Multiprocessor

Memory Arbiter I

• Resolves conflicts of competing memory requests

• SimpCon interface:
  – Masters with arbiter
  – Arbiter with slave

• Scalable for variable # of CPUs

[Figure: CPUs (SimpCon masters) connect to the arbiter (slave side); the arbiter (master side) connects via SimpCon to the shared memory (slave)]

Page 15: Towards a Java Multiprocessor

Memory Arbiter II

• Fixed priority arbitration scheme

• Priority established by unique CPU ID
  – Lowest ID is top priority

• Zero-cycle arbitration:
  – Arbitration process happens in same cycle
  – No bus request phase (unlike AMBA)
  – Increases memory bandwidth
  – Will it scale? Reduces fmax
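
The fixed-priority rule can be written as a pure function of the requests pending in the current cycle, which is what makes a zero-cycle decision possible in principle. The sketch below is a software illustration, not the hardware description: the bit-mask encoding (bit i set means CPU i is requesting) and the names `FixedPriorityArbiter` and `grant` are assumptions.

```java
/** Toy model of fixed-priority arbitration: the lowest requesting CPU ID wins. */
final class FixedPriorityArbiter {
    /**
     * Returns the ID of the CPU granted memory access this cycle,
     * or -1 if no CPU is requesting.
     * Assumed encoding: bit i of requestMask is set when CPU i requests.
     */
    static int grant(int requestMask) {
        if (requestMask == 0) {
            return -1;
        }
        // Index of the lowest set bit equals the lowest requesting CPU ID.
        return Integer.numberOfTrailingZeros(requestMask);
    }

    public static void main(String[] args) {
        System.out.println(grant(0b110)); // CPUs 1 and 2 request: prints 1
        System.out.println(grant(0b101)); // CPUs 0 and 2 request: prints 0
    }
}
```

The decision depends on every request line at once, so the combinational path grows with the number of CPUs, which is consistent with the slide's concern about fmax.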

Page 16: Towards a Java Multiprocessor

Experiments

• Performance measurements on real hardware

• Benchmark: JavaBenchEmbedded

• Real-world application tasks:
  – Lift (elevator controller in an automation factory)
  – Kfl (node of a distributed motor control system)

• One task per CPU

• Performance measured in iterations/s

Page 17: Towards a Java Multiprocessor

Benchmark Results I

• Comparison of dual JOP against single JOP
  – Same frequency (80 MHz)

• Single JOP result:
  – Lift: 13138 iterations/s

• Dual JOP result:

Processor JOP0 JOP1

Lift 12951 12951

$$\text{Speedup}_{\text{dual JOP}} = \frac{12951 + 12951}{13138} = 1.97$$

Page 18: Towards a Java Multiprocessor

Benchmark Results II

• Comparison of triple JOP against single JOP
  – Maximum frequencies

• Single JOP result at 100 MHz:
  – Lift: 16425 iterations/s

• Triple JOP result at 75 MHz:

Processor JOP0 JOP1 JOP2

Lift 11736 11538 11260

$$\text{Speedup}_{\text{triple JOP}} = \frac{11736 + 11538 + 11260}{16425} = 2.10$$
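
The speedup figures in this deck follow from summing the per-core Lift throughput and dividing by the single-JOP throughput. The sketch below recomputes them from the numbers on the slides; the class and method names are hypothetical. The 1.58 value quoted in the conclusion appears to be the dual JOP at 80 MHz measured against the single JOP at 100 MHz, which is an inference from the numbers rather than something the slides state.

```java
import java.util.Locale;

/** Recomputes the speedup figures from the iterations/s values on the slides. */
final class JopSpeedup {
    /** Total multi-core throughput divided by the single-core throughput. */
    static double speedup(int singleCoreIterations, int... perCoreIterations) {
        long total = 0;
        for (int iterations : perCoreIterations) {
            total += iterations;
        }
        return (double) total / singleCoreIterations;
    }

    public static void main(String[] args) {
        // Dual JOP vs. single JOP, both at 80 MHz: prints 1.97
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(13138, 12951, 12951));
        // Dual JOP at 80 MHz vs. single JOP at 100 MHz: prints 1.58
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(16425, 12951, 12951));
        // Triple JOP at 75 MHz vs. single JOP at 100 MHz: prints 2.10
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(16425, 11736, 11538, 11260));
    }
}
```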

Page 19: Towards a Java Multiprocessor

Speedup

[Figure: Speedup vs. number of JOPs; x-axis: JOP (number), 1 to 3; y-axis: speedup, 0.0 to 2.5]

Page 20: Towards a Java Multiprocessor

Resource Consumption

• Cyclone EP1C12Q240 by Altera (12060 LE, 29.25 KB)

Processor     Resources (LE)   Memory (KB)   fmax (MHz)
JOP           2815             7.63          100
Dual JOP      5540             15.62         80
Triple JOP    8219             23.42         75

Page 21: Towards a Java Multiprocessor

Maximum Frequency

[Figure: Max. frequency vs. number of JOPs; x-axis: JOP (number), 1 to 3; y-axis: fmax (MHz), 0 to 100]

Page 22: Towards a Java Multiprocessor

Conclusion

• Proposed Java CMP with shared memory

• Verification of CMP architecture
  – Dual JOP & triple JOP prototypes running on real hardware

• Performance measurements:
  – Dual JOP: 1.58 times better performance @ fmax
  – Triple JOP: 2.1 times better performance @ fmax

Page 23: Towards a Java Multiprocessor

Future Work

• Synchronization: multiple locks

• Improvement of memory arbiter:
  – Different arbitration schemes for time predictability
  – Zero-cycle latency?

• Experiments with more cores on FPGA

• RT-Scheduling for CMP

Page 24: Towards a Java Multiprocessor

Thank You!

Questions & Comments