Towards a Java Multiprocessor
Christof Pitter, Martin Schoeberl
Vienna University of Technology, Austria
27 September 2007
04/19/23 Towards a Java Multiprocessor 2
Motivation
• Chip multiprocessing (CMP)
• Current trend in server and desktop systems
• Embedded systems
• Challenge for hard real-time systems
• RT-Java: a promising topic for the future
[Figures: LEON3 and ARM11 MPCore]
Our Goal
• Chip multiprocessor (CMP)
  – Global shared memory
• Java Optimized Processor (JOP) = Java VM in hardware
• Time predictable
• Good performance nonetheless
• Implementation in an FPGA
Our Goal II
Agenda
• CMP Architecture
  – Memory Model
  – Cache Memory
  – Synchronization
• FPGA Implementation
• Benchmark Results
• Conclusion
• Future Work
Memory Model
[Figure: shared memory vs. distributed shared memory]
Why Shared Memory?
• JVM memory areas: 2 shared data areas (heap and method area)

Property       Shared Memory                             Distributed Shared Memory
Location       Physically centralized                    Physically distributed
Access time    Symmetric (UMA)                           Varies with location (NUMA)
Interconnect   Arbiter                                   Interconnection network + message passing
Bandwidth      Low bandwidth with a high number of CPUs  Higher bandwidth
Why No NoC?
• No need for a network-on-chip (NoC):
  – Multiple masters access a single slave (the memory)
  – Masters communicate through memory
• A NoC may introduce long latencies
• Hardware overhead
• Instead: an SoC bus
Cache Memory
• Cache coherence problems avoided by the architecture:
  – Stack cache: private data for each thread
  – Method cache: read-only memory
  – Heap not cached
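The caching policy above can be illustrated with a small Java model. This is a sketch under assumptions, not JOP's cache implementation: memory is modeled as a word-addressed array, and the names `CachePolicyDemo`, `Core`, `pushLocal`, `loadMethod`, and `readHeap` are hypothetical. Stack data stays in a per-core cache, method code is cached read-only, and heap accesses always go to shared memory, so there is never a stale heap copy that a coherence protocol would have to invalidate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the caching policy only; not JOP's real cache design.
class CachePolicyDemo {
    // Word-addressed shared memory visible to all cores (assumed size).
    static final int[] sharedMemory = new int[1024];

    static final class Core {
        // Stack cache: holds only this core's thread-private stack data.
        private final int[] stackCache = new int[64];
        // Method cache: bytecode is read-only, so a cached copy can never go stale.
        private final Map<Integer, int[]> methodCache = new HashMap<>();

        void pushLocal(int slot, int value) { stackCache[slot] = value; }
        int local(int slot) { return stackCache[slot]; }

        // Method code is copied from shared memory once and never written afterwards.
        int[] loadMethod(int address, int length) {
            return methodCache.computeIfAbsent(address, a -> {
                int[] code = new int[length];
                System.arraycopy(sharedMemory, a, code, 0, length);
                return code;
            });
        }

        // Heap accesses bypass any cache and go straight to shared memory.
        void writeHeap(int address, int value) { sharedMemory[address] = value; }
        int readHeap(int address) { return sharedMemory[address]; }
    }

    public static void main(String[] args) {
        Core cpu0 = new Core();
        Core cpu1 = new Core();
        cpu0.writeHeap(100, 42);
        // cpu1 sees the update at once: there is no cached heap copy to invalidate.
        System.out.println(cpu1.readHeap(100)); // 42
        cpu0.pushLocal(0, 7);
        cpu1.pushLocal(0, 9);
        // Stack caches are private, so each core keeps its own value.
        System.out.println(cpu0.local(0) + " " + cpu1.local(0)); // 7 9
    }
}
```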
Synchronization
• Protect parallel access to shared objects
  – JVM: associates a lock with each object
  – JOP: disabling and re-enabling interrupts
  – CMP:
    • Use of one global lock for the heap
    • Future work: multiple locks
• Avoidance of priority inversion
  – Priority inheritance locks
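A minimal Java sketch of the single global heap lock, assuming every synchronized access to any shared object is routed through one lock (the names `GlobalHeapLock`, `synchronizedOn`, and `run` are hypothetical). It uses `java.util.concurrent.locks.ReentrantLock`, which does not implement priority inheritance, and it does not model how the lock is realized on the JOP CMP; it only shows that accesses to different objects still exclude each other.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one global lock serializes synchronized access to all heap objects.
class GlobalHeapLock {
    private static final ReentrantLock LOCK = new ReentrantLock();

    // The target object is ignored: every object maps to the same global lock.
    static void synchronizedOn(Object ignoredTarget, Runnable criticalSection) {
        LOCK.lock();
        try {
            criticalSection.run();
        } finally {
            LOCK.unlock();
        }
    }

    static final class Counter { int value; }

    // Two threads update two different objects; the global lock still serializes them.
    static int[] run(int iterations) {
        Counter a = new Counter();
        Counter b = new Counter();
        Runnable work = () -> {
            for (int i = 0; i < iterations; i++) {
                synchronizedOn(a, () -> a.value++);
                synchronizedOn(b, () -> b.value++);
            }
        };
        Thread t0 = new Thread(work);
        Thread t1 = new Thread(work);
        t0.start();
        t1.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { a.value, b.value };
    }

    public static void main(String[] args) {
        int[] result = run(10_000);
        System.out.println(result[0] + " " + result[1]); // 20000 20000
    }
}
```

With one global lock, even unrelated critical sections serialize; multiple locks (listed as future work) would relax this.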
Proposed Architecture
FPGA Implementation
• Up to 3 JOPs
• Memory arbiter
• SoC bus (SimpCon)
• External shared memory
• Development board:
  – Altera Cyclone EP1C12
  – 1 MB SRAM
Simple SoC Interconnect (SimpCon)
• Synchronous SoC bus
• Point-to-point communication
• Master-slave interconnection
• Signals valid for only 1 cycle
  – Master can continue execution
• Signal rdy_cnt:
  – Informs the master of available data
  – Fast data transfer due to pipelining
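A cycle-level Java model of a SimpCon-style read, as a sketch under stated assumptions: the slave has a fixed access latency, `rd` and the address are valid for only one cycle, and `rdy_cnt` is simplified to a countdown of the cycles remaining until `rd_data` is valid (0 = data ready). The class and method names are hypothetical, and this model does not reproduce the exact signal encoding of the SimpCon specification. It shows the point of the slide: after the single-cycle request, the master is free to do other work until `rdy_cnt` reaches 0.

```java
// Hypothetical cycle-level model of a SimpCon-style read; not the real VHDL interface.
class SimpConReadDemo {
    record ReadResult(int data, int cycles, int overlappedCycles) {}

    static final class Slave {
        private final int[] memory;
        private final int latency;      // assumed fixed access latency in cycles
        private int rdyCnt = 0;         // cycles left until rd_data is valid (0 = ready)
        private int latchedAddress = 0;

        Slave(int[] memory, int latency) {
            this.memory = memory;
            this.latency = latency;
        }

        // One clock edge. rd is a single-cycle strobe; the slave latches the address.
        void clock(boolean rd, int address) {
            if (rd) {
                latchedAddress = address;
                rdyCnt = latency;
            } else if (rdyCnt > 0) {
                rdyCnt--;
            }
        }

        int rdyCnt() { return rdyCnt; }
        int rdData() { return memory[latchedAddress]; }
    }

    // The master issues the read, then keeps executing until rdy_cnt reaches 0.
    static ReadResult read(Slave slave, int address) {
        slave.clock(true, address);     // cycle 1: address and rd are valid only here
        int cycles = 1;
        int overlapped = 0;
        while (slave.rdyCnt() > 0) {
            overlapped++;               // master is free to do independent work
            slave.clock(false, 0);
            cycles++;
        }
        return new ReadResult(slave.rdData(), cycles, overlapped);
    }

    public static void main(String[] args) {
        int[] memory = new int[16];
        memory[3] = 99;
        ReadResult r = read(new Slave(memory, 2), 3);
        System.out.println(r.data() + " after " + r.cycles() + " cycles"); // 99 after 3 cycles
    }
}
```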
Memory Arbiter I
• Resolves conflicts between competing memory requests
• SimpCon interfaces:
  – Between the masters and the arbiter
  – Between the arbiter and the slave
• Scalable to a variable number of CPUs
[Figure: CPUs (SimpCon masters) connected to the arbiter (SimpCon slave), which in turn acts as a SimpCon master to the shared memory (slave)]
Memory Arbiter II
• Fixed-priority arbitration scheme
• Priority established by unique CPU ID
  – Lowest ID has the highest priority
• Zero-cycle arbitration:
  – Arbitration happens in the same cycle as the request
  – No bus request phase (unlike AMBA)
  – Increases memory bandwidth
  – Will it scale? Reduces fmax
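The fixed-priority scheme can be sketched in Java as a pure function over a request bit vector (bit i set means CPU i requests memory), so the grant is decided in the same step as the request, in the spirit of zero-cycle arbitration. The names are hypothetical, the model handles at most 32 CPUs, and the simulation assumes every memory access takes exactly one cycle; the real arbiter is hardware, and the simulation only shows how lower-priority CPUs wait behind higher-priority ones.

```java
import java.util.Arrays;

// Hypothetical software model of a fixed-priority memory arbiter (at most 32 CPUs).
class FixedPriorityArbiter {
    // Returns the ID of the granted CPU, or -1 if no CPU requests.
    // The lowest set bit (lowest CPU ID) has the highest priority.
    static int grant(int requests) {
        return requests == 0 ? -1 : Integer.numberOfTrailingZeros(requests);
    }

    // Each CPU starts with some pending accesses; each access is assumed to take one cycle.
    // Returns, per CPU, the cycle in which its last access completes.
    static int[] finishCycles(int[] pendingAccesses) {
        int[] remaining = pendingAccesses.clone();
        int[] finish = new int[remaining.length];
        int cycle = 0;
        while (true) {
            int requests = 0;
            for (int id = 0; id < remaining.length; id++) {
                if (remaining[id] > 0) requests |= 1 << id;
            }
            int granted = grant(requests);  // decided in the same step, no request phase
            if (granted < 0) break;
            remaining[granted]--;
            cycle++;
            if (remaining[granted] == 0) finish[granted] = cycle;
        }
        return finish;
    }

    public static void main(String[] args) {
        System.out.println(grant(0b110));                                        // 1
        System.out.println(Arrays.toString(finishCycles(new int[] {2, 2, 2})));  // [2, 4, 6]
    }
}
```

Although all three CPUs request at the same time, CPU 2 always finishes last under fixed priority.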
Experiments
• Performance measurements on real hardware
• Benchmark: JavaBenchEmbedded
• Real-world application tasks:
  – Lift (lift controller in an automation factory)
  – Kfl (node of a distributed motor control system)
• One task per CPU
• Performance measured in iterations/s
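As a rough illustration of "one task per CPU, performance in iterations/s", here is a Java sketch that runs one thread per task for a fixed time window and reports each thread's iteration rate. The names `IterationsPerSecond` and `measure` are hypothetical, the task body is a placeholder rather than the Lift or Kfl benchmark, and on a desktop JVM threads are scheduled by the operating system rather than pinned one per core as on the JOP CMP.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntUnaryOperator;

// Hypothetical harness: one task per thread, each thread reports its own iterations/s.
class IterationsPerSecond {
    static double[] measure(IntUnaryOperator task, int threads, long windowMillis) {
        AtomicBoolean running = new AtomicBoolean(true);
        long[] counts = new long[threads];
        int[] sinks = new int[threads];             // keeps results live so work is not discarded
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int state = id;
                long n = 0;
                while (running.get()) {
                    state = task.applyAsInt(state);  // one iteration of the placeholder task
                    n++;
                }
                counts[id] = n;
                sinks[id] = state;
            });
        }
        long start = System.nanoTime();
        for (Thread w : workers) w.start();
        try {
            Thread.sleep(windowMillis);
            running.set(false);                      // end of the measurement window
            double seconds = (System.nanoTime() - start) / 1e9;
            for (Thread w : workers) w.join();
            double[] rates = new double[threads];
            for (int t = 0; t < threads; t++) rates[t] = counts[t] / seconds;
            return rates;
        } catch (InterruptedException e) {
            running.set(false);
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder task standing in for a real benchmark iteration.
        IntUnaryOperator task = x -> x * 31 + 7;
        System.out.println(Arrays.toString(measure(task, 2, 200)));
    }
}
```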
Benchmark Results I
• Comparison of a dual JOP against a single JOP
  – Same frequency (80 MHz)
• Single JOP result:
  – Lift: 13138 iterations/s
• Dual JOP result (iterations/s):

Processor   JOP0    JOP1
Lift        12951   12951
$$\text{Speedup}_{\text{dual JOP}} = \frac{12951 + 12951}{13138} = 1.97$$
Benchmark Results II
• Comparison of a triple JOP against a single JOP
  – Each at its maximum frequency
• Single JOP result at 100 MHz:
  – Lift: 16425 iterations/s
• Triple JOP result at 75 MHz (iterations/s):

Processor   JOP0    JOP1    JOP2
Lift        11736   11538   11260
$$\text{Speedup}_{\text{triple JOP}} = \frac{11736 + 11538 + 11260}{16425} = 2.10$$
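The speedup is the sum of the per-CPU iteration rates on the multiprocessor divided by the single-JOP rate. A small Java sketch (the `Speedup.speedup` helper is hypothetical) reproduces the numbers from the tables, including the 1.58 at maximum frequencies quoted in the conclusion, which compares the dual JOP at 80 MHz with the single JOP at 100 MHz.

```java
import java.util.Locale;

class Speedup {
    // Speedup = (sum of per-CPU iterations/s on the CMP) / (iterations/s of a single JOP).
    static double speedup(int singleJop, int... perCpu) {
        long total = 0;
        for (int iterations : perCpu) total += iterations;
        return (double) total / singleJop;
    }

    public static void main(String[] args) {
        // Dual JOP vs. single JOP, both at 80 MHz.
        System.out.println(String.format(Locale.ROOT, "%.2f", speedup(13138, 12951, 12951)));        // 1.97
        // Triple JOP at 75 MHz vs. single JOP at 100 MHz (each at its fmax).
        System.out.println(String.format(Locale.ROOT, "%.2f", speedup(16425, 11736, 11538, 11260))); // 2.10
        // Dual JOP at 80 MHz vs. single JOP at 100 MHz (each at its fmax).
        System.out.println(String.format(Locale.ROOT, "%.2f", speedup(16425, 12951, 12951)));        // 1.58
    }
}
```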
Speedup
[Figure: Speedup vs. number of JOPs; x-axis: number of JOPs (1 to 3), y-axis: speedup (0.0 to 2.5)]
Resource Consumption
• Cyclone EP1C12Q240 by Altera (12060 LE, 29.25 KB)

Processor    Resources (LE)   Memory (KB)   fmax (MHz)
JOP          2815             7.63          100
Dual JOP     5540             15.62         80
Triple JOP   8219             23.42         75
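To relate the table to the device, here is a short Java sketch that computes what fraction of the EP1C12's 12060 LEs and 29.25 KB of on-chip memory each configuration uses, plus the logic cost of each added core. The names `ResourceUsage` and `percent` are hypothetical; the inputs are the values from the resource table.

```java
import java.util.Locale;

class ResourceUsage {
    static final int DEVICE_LE = 12060;        // Cyclone EP1C12 logic elements
    static final double DEVICE_MEM_KB = 29.25; // Cyclone EP1C12 on-chip memory

    static double percent(double used, double available) {
        return 100.0 * used / available;
    }

    public static void main(String[] args) {
        String[] names = {"JOP", "Dual JOP", "Triple JOP"};
        int[] le = {2815, 5540, 8219};
        double[] memKb = {7.63, 15.62, 23.42};
        for (int i = 0; i < names.length; i++) {
            // Prints e.g. "Triple JOP LE 68.2%  memory 80.1%"
            System.out.println(String.format(Locale.ROOT, "%-10s LE %.1f%%  memory %.1f%%",
                    names[i], percent(le[i], DEVICE_LE), percent(memKb[i], DEVICE_MEM_KB)));
        }
        // Additional logic elements needed for each added core: +2725, then +2679.
        for (int i = 1; i < le.length; i++) {
            System.out.println(names[i] + ": +" + (le[i] - le[i - 1]) + " LE");
        }
    }
}
```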
Maximum Frequency
[Figure: Max. frequency vs. number of JOPs; x-axis: number of JOPs (1 to 3), y-axis: fmax in MHz (0 to 100)]
Conclusion
• Proposed a Java CMP with shared memory
• Verification of the CMP architecture
  – Dual JOP and triple JOP prototypes running in real hardware
• Performance measurements (relative to a single JOP, each at its fmax):
  – Dual JOP: 1.58 times better performance
  – Triple JOP: 2.1 times better performance
Future Work
• Synchronization: multiple locks
• Improvement of the memory arbiter:
  – Different arbitration schemes for time predictability
  – Zero-cycle latency?
• Experiments with more cores on the FPGA
• RT scheduling for CMP
Thank You!
Questions & Comments