Page 1: JAVA MICROARCHITECTURES - Springer978-1-4615-0993-6/1.pdf · JAVA MICROARCHITECTURES ... Mario I. Wolczko Sun Microsystems, Inc. SPRINGER SCIENCE+BUSINESS MEDIA, ... James Power and

JAVA MICROARCHITECTURES

THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE

JAVA MICROARCHITECTURES

Edited by

Vijaykrishnan Narayanan Pennsylvania State University

Mario I. Wolczko Sun Microsystems, Inc.

SPRINGER SCIENCE+BUSINESS MEDIA, LLC

ISBN 978-1-4613-5341-6
ISBN 978-1-4615-0993-6 (eBook)
DOI 10.1007/978-1-4615-0993-6

Library of Congress Cataloging-in-Publication Data

A C.I.P. Catalogue record for this book is available from the Library of Congress.

Copyright © 2002 Springer Science+Business Media New York
Originally published by Kluwer Academic Publishers in 2002
Softcover reprint of the hardcover 1st edition 2002

All rights reserved. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Printed on acid-free paper.

Contents

List of Figures  vii

List of Tables  xi

Preface  xiii

1 Benchmarking the Java Virtual Architecture  1
David Gregg, James Power and John Waldron

2 A Study of Memory Behavior of Java Workloads  19
Yefim Shuf, Mauricio J. Serrano, Manish Gupta and Jaswinder Pal Singh

3 An Efficient Hardware Implementation of Java Bytecodes, Threads, and Processes for Embedded and Real-Time Applications  41
David S. Hardin, Allen P. Mass, Michael H. Masters and Nick M. Mykris

4 Stack Dependency Resolution for Java Processors based on Hardware Folding and Translation: A Bytecode Processing Analysis  55
M. Watheq El-Kharashi, Fayez Gebali and Kin F. Li

5 Improving Java Performance in Embedded and General-Purpose Processors  79
Ramesh Radhakrishnan, Lizy K. John, Ravi Bhargava and Deepu Talla

6 The Delft-Java Engine  105
John Glossner and Stamatis Vassiliadis

7 Quicksilver: A Quasi-static Java Compiler for Embedded Systems  123
Samuel P. Midkiff, Pramod G. Joisha, Mauricio Serrano, Manish Gupta, Anthony Bolmarcich and Peng Wu

8 Concurrent Garbage Collection Using Hardware-Assisted Profiling  143
Timothy Heil and James E. Smith

9 Space-Time Dimensional Computing for Java Programs on the MAJC Architecture  161
Shailender Chaudhry and Marc Tremblay

10 Java Machine and Integrated Circuit Architecture (JAMAICA)  187
Ahmed El-Mahdy, Ian Watson and Greg Wright

11 Dynamic Java Threads on the JAMAICA Single-Chip Multiprocessor  207
Greg Wright, Ahmed El-Mahdy and Ian Watson

References  231

Index  251

List of Figures

1.1 Average dynamic bytecode percentages for the top 10 methods in terms of bytecodes executed  7
1.2 A summary of dynamic percentages of category usage by the applications in the SPEC JVM98 suite  11
2.1 Characterization of heap accesses and accesses to object fields  24
2.2 Characterization of hot spots  26
2.3 Simulation results  28
2.4 Classification of data related misses  31
2.5 Assessment of opportunities for prefetching  35
3.1 JEMCore Java Processor Core Architecture  44
3.2 Java Grande Forum Synchronization Benchmark Results  46
3.3 Multiple Java Virtual Machine Data Structures  48
3.4 The JEMBuilder Application Builder  50
3.5 Timer interrupt handler code in Java  52
3.6 Timer interrupt notification thread  52
3.7 aJ-100 Block Diagram  53
3.8 aJ-100 Package (larger than actual size)  54
4.1 Proposed Java processor architecture  58
4.2 Dual-architecture Java processor pipeline compared with a pure mc processor pipeline and a RISC pipeline  59
4.3 Percentages of eliminated instructions relative to all instructions and relative to stack instructions (producers and non-anchor consumers) only  70
4.4 Speedup of folding  70
4.5 Percentages of occurrence of different folding cases recognized by the folding information generation (FIG) unit  72
4.6 Percentages of occurrence of different folding operations performed by the bytecode queue manager (BQM)  72
4.7 Percentages of occurrence of different folding patterns at the output of the folding translator unit (FT)  72

4.8 Percentages of occurrence of different operations performed by the local variable file (LVF)  72
4.9 Percentages of occurrence of different folding patterns processed by the load/store unit (LS)  73
4.10 Percentages of usage of different execution units (EXs)  73
5.1 Block diagram of the picoJava-II microprocessor core  83
5.2 Basic pipeline of the picoJava-II core  83
5.3 Increasing decode bandwidth using a fill unit and DB-Cache  84
5.4 Trends in decode rate and hit rate for different DB-Cache sizes  86
5.5 Performance improvement when adding a fill unit, DB-Cache (64-16K entries) and instruction execute width of two to a picoJava-II processor  88
5.6 Relative performance of picoJava-II using the fill unit, DB-Cache (64-16K entries), execution width of two and stack disambiguation  90
5.7 Available ILP in Java workloads  91
5.8 The Hardware Interpreter (Hard-Int) architecture  93
5.9 Translating bytecodes in the Hard-Int architecture  94
5.10 Execution cycles for different execution modes on a 4-way machine  98
5.11 Execution cycles for different execution modes on a 16-way machine  99
5.12 Cycles executed per bytecode on a 4-way machine  102
6.1 DELFT-JAVA concurrent multi-threaded processor organization showing multiple thread units, local and global processor units, thread register files, cache memory, control unit, and Link Translation Buffer (LTB)  107
6.2 Indirect register access mechanism showing indirect memory locations (idx), update adders, underflow/overflow signal, and resolved register address multiplexor  113
6.3 Indirect register mapping showing how a resolved register address is mapped to main memory  114
6.4 Performance results of a vector-multiply routine for various processor models showing speedup normalized to an implementable pipelined stack model  120
7.1 The indirection scheme for quasi-static compilation  128
7.2 Pseudo-code showing explicit checks for reference resolution  131
7.3 Timing measurements for an input size of 100  138
7.4 Timing measurements for an input size of 10  139
7.5 Comparing indirection table update strategies  140

8.1 Example concurrent reference mutation  146
8.2 Concurrent GC RPA query  150
8.3 The relational profiling architecture contains the profile control table (PCT) and the query engine  152
8.4 Generational write-barrier pseudo-code  155
8.5 Time line for the second GC in the Strata benchmark  156
8.6 System-on-a-chip design  157
9.1 An illustration of the Java Stack for a Java Thread  166
9.2 Java Object Structure  171
9.3 Block diagram for a MAJC implementation  176
9.4 Efficiency of the Speculative Thread for various Overheads and Savings  184
10.1 Dynamic bytecode execution frequencies for various bytecode classes  191
10.2 Normalized dynamic instruction execution counts for various execution models  192
10.3 Cumulative distribution of local variable access for selected SPEC JVM98 programs  193
10.4 Method call depth distribution for selected SPEC JVM98 programs  194
10.5 Register-windows miss ratios versus the number of register-windows, for selected SPEC JVM98 programs  196
10.6 Per-procedure visible registers and argument-passing operation  198
10.7 Normalized static instruction counts, broken down into various bytecode-mapping overheads, for selected SPEC JVM98 kernels  200
10.8 Active temporary variables distribution for selected SPEC JVM98 kernels  201
10.9 Distribution of active temporary variables that need saving across method calls, for selected SPEC JVM98 kernels  202
10.10 The effect of the proposed optimizations on static instruction counts, for selected SPEC JVM98 kernels  204
11.1 Token/thread life-cycle  211
11.2 Serial & parallel executions  212
11.3 Speedup of nfib in the current configuration  216
11.4 Speedup of nfib in the future configuration  217
11.5 Speedup of nfib, current configuration, token passing vs. oracle  218
11.6 Speedup of jnfib, using light RTS  219
11.7 Speedup of jnfib, using medium RTS  220

11.8 Speedup of jnfib, current configuration, light RTS, P=32, T=2  221
11.9 The Empty program  222
11.10 Speedup vs. outer loop iterations for the Empty program  223
11.11 Load balance of Empty program, LN = 219, current configuration  224
11.12 mpeg2encode results  226
11.13 jmpeg2decode results  227

List of Tables

1.1 Measurements of total number of method calls by SPEC JVM98 applications  5
1.2 Measurements of Java method calls made and bytecodes executed by SPEC JVM98 applications  5
1.3 Calls to non-native methods in the class library  5
1.4 Bytecode instructions executed in the class library  6
1.5 Dynamic method execution frequencies for the SPEC JVM98 programs, excluding native methods  8
1.6 Bytecode based dynamic percentages of local variable array sizes, as well as temporary and parameter sizes for SPEC JVM98 programs  10
1.7 Dynamic percentages of category usage by the applications in the Java SPEC JVM98 suite  11
1.8 Total SPEC dynamic bytecode usage increases  12
1.9 SPEC bytecode usage for compress using the different compilers  13
1.10 SPEC bytecode usage for db using the different compilers  14
1.11 SPEC bytecode usage for jess using the different compilers  14
1.12 SPEC bytecode usage for mtrt using the different compilers  15
4.1 JVM instruction categories  60
4.2 Folding templates recognized by the FIG unit  62
4.3 Summary of BQM folding operations  63
4.4 Mapping different anchor instructions to the folding operations performed by the bytecode queue manager (BQM)  65
4.5 Mapping folding templates to FT output  66
4.6 Mapping anchors to LVF operations  67
4.7 SPEC JVM98 Java benchmark suite summary  69
4.8 A trace for a Java code execution  74
4.9 Associating instruction categories with JVM basic requirements and our processor modules  75

4.10 Comparison between the three approaches in supporting Java in hardware: direct stack execution, hardware interpretation, and hardware translation  76
5.1 Description of the SPEC JVM98 benchmarks used in this study  85
5.2 Percentage of instructions executed in parallel when using a DB-Cache of 128 entries  87
5.3 Percentage of instructions executed in parallel when using stack disambiguation with a 128 entry DB-Cache  89
5.4 Execution statistics for the SPEC JVM98 benchmarks  95
5.5 Configurations of simulated processor  97
5.6 Cache performance for the SPEC JVM98 benchmarks  101
5.7 Translated code buffer performance for the SPEC JVM98 benchmarks  101
6.1 Java Virtual Machine instructions with special support in the DELFT-JAVA processor  118
6.2 Processor organization characteristics for various processor models  119
6.3 Processor performance and speedup for various processor models normalized to an implementable pipelined stack model  120
7.1 Method code and indirection table sizes for SPEC JVM98 with input size=10  136
7.2 Method code and indirection table sizes for SPEC JVM98 with input size=100  137
8.1 Benchmark characteristics  157
8.2 GC Performance Characteristics  159
8.3 Write-barrier work eliminated by the RPA  159
9.1 Speedup obtained due to STC  182
9.2 Efficiency for Speculative Thread  183
10.1 Brief descriptions of the selected benchmark programs from the SPEC JVM98 suite  190
10.2 Relative method call depths for selected SPEC JVM98 programs  195
10.3 Comparing static normalized Ideal components for the kernels, with the corresponding Ideal components obtained dynamically for the full programs  200

Preface

Java is an exciting new object-oriented technology. Hardware for supporting objects and other features of Java, such as multithreading, dynamic linking and loading, is the focus of this book. The impact of Java's features on microarchitectural resources and issues in the design of Java-specific architectures are interesting topics that require the immediate attention of the research community. While Java has become an important part of desktop applications, it is now being used widely in high-end server markets and is making forays into low-end embedded computing.

A study of the behavior of Java applications is essential in guiding the design of new architectural support features. The first chapter provides a characterization of a set of Java applications at a platform-independent level. Specifically, various characteristics of bytecode execution are considered. The second chapter delves into the memory characteristics of Java programs. The growing performance disparity between the processor core and the memory system makes memory behavior an important factor influencing performance. Further, a detailed understanding of various Java-specific features such as heap allocation, object manipulation and garbage collection, which are memory-intensive, would help in identifying appropriate architectural support.

Java is becoming increasingly popular in embedded/portable environments. It is estimated that Java-enabled devices such as cell-phones, PDAs and pagers will grow from 176 million in 2001 to 721 million in 2005 [Tak01]. One of the reasons for this is that Java enables service providers to create new features very easily as it is based on the abstract Java Virtual Machine (JVM). Thus, it is currently portable to 80-95% of platforms and lets developers design and implement portable applications without the special tools and libraries that coding in C or C++ normally requires [Pau01]. In addition, Java allows application writers to embed animation, sound, and other features within their applications easily, an important plus in web-based portable computing. Chapters 3 to 6 focus on providing architectural support for Java execution in embedded environments. These chapters discuss various commercial and academic approaches to designing hardware for direct bytecode execution. Various hardware features to support stack folding, code translation, dynamic linking and object management are addressed. In addition to the bytecode engines covered in this volume, various Java accelerators have also been announced over the past year. A good overview of these architectures can be found in the four-part series by Levy [Lev01a, Lev01b, Lev01c, Lev01d], and we do not attempt to duplicate this commendable effort.

Chapters 7 and 8 focus on compilation and architectural support targeted at the memory system. Chapter 7 presents the use of a novel compilation technique that helps Java applications meet the small memory footprint constraints of embedded devices. As embedded JVMs are designed to run for long periods of time on limited-memory embedded systems, creating and managing Java objects is of critical importance. The garbage collector (GC) is an important part of the Java virtual machine responsible for the automatic reclamation of unused memory. Chapter 8 focuses on providing support for garbage collection.

The final three papers of this volume focus on high-performance single-chip multiprocessor architectures to support Java execution. Such high-performance processors would be ideal for Java servers and workstations. A common characteristic of these architectures is their support for executing multiple threads. Chapter 9 discusses the concept of space-time computing and describes the MAJC architecture, which supports this. Chapters 10 and 11 explain the JAMAICA architecture, which supports the execution of dynamic Java threads.

Many of the papers presented in this book are revised versions of papers presented at the Workshop on Hardware Support for Objects and Microarchitectures for Java, held in conjunction with ICCD in 1999 and 2000. We would like to thank all the authors for their contribution. We also wish to express our sincere gratitude to all those who reviewed manuscripts for the book. Narayanan would like to acknowledge grants from the National Science Foundation (CAREER 0093085 and 00773419) that supported him during this endeavor.

The URLs cited in the bibliography were correct at the time of writing.

VIJAYKRISHNAN NARAYANAN, PENNSYLVANIA STATE UNIVERSITY

MARIO WOLCZKO, SUN MICROSYSTEMS, INC.

JANUARY 2002