introduction - 123seminarsonly.comor any form of parametric polymorphism) and user-defined operator...
TRANSCRIPT
Seminar Report’03 HPJava
Introduction
The explosion of java over the last year has been driven
largely by its in role in bringing a new generation of interactive
web pages to World Wide Web. Undoubtedly various features of
the languages-compactness, byte code portability, security, and
so on—make it particularly attractive as an implementation
languages for applets embedded in web pages. But it is clear
that the ambition of the Java development team go well beyond
enhancing the functionality of HTML documents.
“Java is designed to meet the chalanges of application
development on the context of heterogeneous, network-wide
distributed environments. Paramaount amoung these chalanges
is secure delivery of applications that consume the minimum of
systems resources, can run on any hardware and software
platform, can be extended dynamically.”
Several of these concerns are mirrored in developments in
the High Prerformance Computing world over a number of years.
A decade ago the focus of interest in the parallel computing
community was on parallel hardware. A parallel computer was
typically built from specialized processers through a proprietary
high-performance communication switch. If the machine also
had to be programmed in a proprietary language, that was an
acceptable price for the benefits of using a supercomputer. This
attitude was not sustainable as one parallel architecture gave
way to another, and cost of porting software became exorbitant.
Dept. of CSE MESCE, Kuttippuram1
Seminar Report’03 HPJava
For several years now, portability across platforms had been a
central concern in parallel computing.
HPJava is a programming language extended from Java to
support parallel programming, especially (but not exclusively)
data parallel programming on message passing and distributed
memory systems, from multi-processor systems to workstation
clusters.
Although it has a close relationship with HPF, the design of
HPJava does not inherit the HPF programming model. Instead
the language introduces a high-level structured SPMD
programming style--the HPspmd model. A program written in
this kind of language explicitly coordinates well-defined process
groups. These cooperate in a loosely synchronous manner,
sharing logical threads of control. As in a conventional
distributed-memory SPMD program, only a process owning a
data item such as an array element is allowed to access the
item directly. The language provides special constructs that
allow programmers to meet this constraint conveniently.
Besides the normal variables of the sequential base
language, the language model introduces classes of global
variables that are stored collectively across process groups.
Primarily, these are distributed arrays. They provide a global
name space in the form of globally subscripted arrays, with
assorted distribution patterns. This helps to relieve
programmers of error-prone activities such as the local-to-
Dept. of CSE MESCE, Kuttippuram2
Seminar Report’03 HPJava
global, global-to-local subscript translations which occur in data
parallel applications.
In addition to special data types the language provides
special constructs to facilitate both data parallel and task
parallel programming. Through these constructs, different
processors can either work simultaneously on globally
addressed data, or independently execute complex procedures
on locally held data. The conversion between these phases is
seamless.
In the traditional SPMD mold, the language itself does not
provide implicit data movement semantics. This greatly
simplifies the task of the compiler, and should encourage
programmers to use algorithms that exploit locality. Data on
remote processors is accessed exclusively through explicit
library calls. In particular, the initial HPJava implementation
relies on a library of collective communication routines originally
developed as part of an HPF runtime library. Other distributed-
array-oriented communication libraries may be bound to the
language later. Due to the explicit SPMD programming model,
low level MPI communication is always available as a fall-back.
The language itself only provides basic concepts to organize
data arrays and process groups. Different communication
patterns are implemented as library functions. This allows the
possibility that if a new communication pattern is needed, it is
relatively easily integrated through new libraries.
2. Overview of HPJava
Dept. of CSE MESCE, Kuttippuram3
Seminar Report’03 HPJava
HPJava stands for “high performance java”. Java already
provides parallelism through threads. But that model of
parallelism can only be easily exploited on shared memory
computers. HPJava is targetted at distributed memory parallel
computers (most likely, networks of PCs and workstations).
2.1. HPJava History
Active in the early 1990s, the High Performance Fortran
Forum brought together leading high-performance computing
practitioners to define a common language for data parallel
computing. Inspired by the success of parallel dialects of Fortran
such as Connection Machine Fortran, the resulting high
performance Fortran(HPF) language definition extended the
then-standard Fortran 90 with several parallel features.
HPJava project started around 1997, growing out of group’s
earlier HPF work.People were starting to talking about java
Grande and the possibility that eventually java could be a high-
performance language. The reason was that the syntax was
considerably simpler, and perhaps there was scope to extend it
with the features we wanted for data-parallel computing.
2.2. HPjava Philosophy
HPJava lifts some ideas directly from HPF, and its domain
of applicability is likely to overlap HPF to a large extent. It has
an almost equivalent model of distributed arrays(the only very
Dept. of CSE MESCE, Kuttippuram4
Seminar Report’03 HPJava
significant difference in this respect is that eventually
abandoned the “block cyclic” distribution format); you can write
HPJava programs that very look like corresponding HPF
programs. But the philosophies of the languages differ in some
significant ways. HPJava was designed in “bottom up” manner.
3. Characterestics
We have explored the practicality of doing parallel
computing in Java, and of providing Java interfaces to High
Performance Computing software. For various reasons, the
success of this exercise was not a foregone conclusion. Java sits
on a virtual machine model that is significantly different to the
hardware-oriented model which C or Fortran exploit directly.
Java discourages or prevents direct access to the some of the
fundamental resources of the underlying hardware (most
extremely, its memory).
Which is the better strategy? In the long term Java may
become a major implementation language for large software
packages like MPI. It certainly has advantages in respect of
portability that could simplify implementations dramatically. In
the immediate term recoding these packages does not appear
so attractive. Java wrappers to existing software look more
sensible. On a cautionary note, our experience with MPI
suggests that interfacing Java to non-trivial communication
packages may be less easy than it sounds. Nevertheless, we
Dept. of CSE MESCE, Kuttippuram5
Seminar Report’03 HPJava
intend in the future to create a Java interface to an existing run-
time library for data parallel computation.
So is Java, as it stands, a good language for High
Performance Computing?
It still has to be demonstrated that Java can be compiled to
code of efficiency comparable with C or Fortran. Many avenues
are being followed simultaneously towards a higher
performance Java. Besides the Java chip effort of Sun, it has
been reported at this workshop that IBM is developing an
optimizing Java compiler which produces binary code directly,
that Rice University and Rochester University are working on
optimization and restructuring of bytecode generated by javac,
and that Indiana University is working on source restructuring to
parallelize Java. Parallel interpretation of bytecode is also an
emerging practice. For example, the IBM JVM, an
implementation of JVM on shared memory architectures, was
released in spring 1996, and UIUC has recently started work
aimed at parallel interpretation of Java bytecode for distributed
memory systems.
Another promising approach under investigation is to
integrate interpretation and compilation techniques for parallel
execution of Java programs. In such a system, a partially
ordered set of interpretive frames is generated by an II/CVM
compiler. A frame is a description of some subtask, whose
granularity may range from a single scalar assignment
statement to a solver for a system of equations. Under
Dept. of CSE MESCE, Kuttippuram6
Seminar Report’03 HPJava
supervision of the virtual machine (II/CVM), the actions specified
in a frame may be performed in one of three ways:
• Executed by an interpretive module directly,
which also incorporates JIT compilation capability.
• Some precompiled computational library
function is invoked locally to accomplish the task; this
function may be executed sequentially or in parallel.
• The frame is sent to some registered remote
system, which will get the work done, once again either
sequentially or in parallel.
With this approach, optimized binary codes for well formed
computation subtasks exist in runtime libraries, supporting a
high level interpretive environment. Task parallelism is observed
among different frames executed by the three mechanisms
simultaneously, while data parallelism is observed in the
execution of some of the runtime functions.
Presuming these efforts satisfactorily address the
performance issue, the second aspect in question concerns
expressiveness of the Java language. Our final interface to MPI is
quite elegant, and provides much of the functionality of the
standard C and Fortran bindings. But creating this interface was
a more difficult process than one might hope, both in terms of
getting a good specification, and in terms of making the
implementation work. The lack of features like C++ templates
(or any form of parametric polymorphism) and user-defined
operator overloading (available in many modern languages,
from functional programming languages to Fortran) made it
difficult to produce a completely satisfying interface to a data
Dept. of CSE MESCE, Kuttippuram7
Seminar Report’03 HPJava
parallel library. The Java language as currently defined imposes
various limits to the creativity of the programmer.
In many respects Java is undoubtedly a better language
than Fortran. It is object-oriented to the core and highly
dynamic, and there is every reason to suppose that such
features will be as valuable in scientific computing as in any
other programming discipline. But to displace established
scientific programming languages Java will surely have to
acquire some of the facilities taken for granted in those
languages.
Popular acclaim aside, there are some reasons to think
that Java may be a good language for scientific and parallel
programming .
● Java is a descendant of C++ .C and C++ are used
increasingly in scientific programming; they are already
used almost universally by implementers of parallel libraries
and compilers. In recent years numerous variations on the
theme of C++ for parallel computing have appeared.
● Java omits various features of C and C++ that are
considered “difficult”—notably, pointers. Poor compiler
analysis has often been blamed on these features. The
inference is that Java, like Fortran, may be a suitable source
language for highly optimizing compilers (although direct
evidence for this belief is still lacking).
● Java comes in built in multithreading. Independent threads
may be scheduled on different processor by a suitable
runtime. In any case multithreading can be very convenient
in explicit message-passing styles of parallel programming.
Dept. of CSE MESCE, Kuttippuram8
Seminar Report’03 HPJava
HPJava is a language for parallel programming, especially
suitable for programming massively parallel, distributed
memory computers.
3.1. Multidimensional arrays
First we describe a modest extension to Java that adds a
class of true multi-dimensional arrays to the standard Java
language. The new arrays allow regular section subscripting,
similar to Fortran 90 arrays. The syntax described in this section
is a subset of the syntax introduced later for parallel arrays and
algorithms: the only motivation for discussing the sequential
subset first is to simplify the overall presentation. No attempt is
made to integrate the new multidimensional arrays with the
standard Java arrays: they are a new kind of entity that coexists
in the language with ordinary Java arrays. There are good
technical reasons for keeping the two kinds of array separate.
The type-signatures and constructors of the
multidimensional array use double brackets to distinguish
them from ordinary arrays:
int [[,]] a = new int [[5, 5]] ;
float [[,,]] b = new float [[10, n, 20]] ;
int [[]] c = new int [[100]] ;
a, b and c are respectively 2-, 3- and one- dimensional arrays.
Of course c is very similar in structure to the standard array d,
created by
int [] d = new int [100] ;
c and d are not identical, though.
Dept. of CSE MESCE, Kuttippuram9
Seminar Report’03 HPJava
Access to individual elements of a multidimensional array
goes through a subscripting operation involving single brackets,
for example
for(int i = 0 ; i < 4 ; i++)
a [i, i + 1] = i + c [i] ;
For reasons that will become clearer in later sections, this
style of subscripting is called local subscripting. In the current
sequential context, apart from the fact that a single pair of
brackest may include several comma-separated subscripts, this
kind of subscripting works just like ordinary Java array
subscripting. Subscripts always start at zero, in the ordinary Java
or C style (there is no Fortran-like lower bound).
In general our language has no idea of Fortran-like array
assignments.
int [[,]] e = new int [[n, m]] ;
...
a = e ;
In the assignment simply copies a handle to object
referenced by e into a. There is no element-by-element copy
involved. Similarly we introduce no idea of elemental arithmetic
or elemental function application. If e and a are arrays, the
expressions
e + a
Math.cos(e)
are type errors.
Our HPJava does import a Fortran-90-like idea of array
regular sections. The syntax for section subscripting is different
Dept. of CSE MESCE, Kuttippuram10
Seminar Report’03 HPJava
to the syntax for local subscripting. Double brackets are used.
These brackets can include scalar subscripts or subscript
triplets.
A section is an object in its own right--its type is that of a
suitable multi-dimensional array. It describes some subset of the
elements of the parent array. This is slightly different to the
situation in Fortran, where sections cannot usually be captured
as named entities.
int [[]] e = a [[2, 2 :]] ;
foo(b [[ : , 0, 1 : 10 : 2]]) ;
e becomes an alias for the 3rd row of elements of a. The
procedure foo should expect a two-dimensional array as
argument. It can read or write to the set of elements of b
selected by the section. As in Fortran, upper or lower bounds
can be omitted in triplets, defaulting to the actual bound of the
parent array, and the stride entry of the triplet is optional. The
subscripts of e, like any other array, start at 0, although the first
element is identified with a [2, 2].
In our language, unlike Fortran, it is not allowed to use
vectors of integers as subscripts. The only sections recognized
are regular sections defined through scalar and triplet
subscripts.
The language provides a library of functions for
manipulating its arrays, closely analogous to the array
transformational intrinsic functions of Fortran 90:
int [[,]] f = new int [[5, 5]] ;
Dept. of CSE MESCE, Kuttippuram11
Seminar Report’03 HPJava
HPJlib.shift(f, a, -1, 0, CYCL) ;
float g = HPJlib.sum(b) ;
int [[]] h = new int [[100]] ;
HPJlib.copy(h, c) ;
The shift operation with shift-mode CYCL executes a cyclic
shift on the data in its second argument, copying the result to
its first argument--an array of the same shape. In the example
the shift amount is -1, and the shift is performed in dimension 0
of the array--the first of its two dimensions. The sum operation
simply adds all elements of its argument array. The copy
operation copies the elements of its second argument to its
first--it is something like an array assignment. These functions
may have to be overloaded to apply to some finite set of array
types, eg they may be defined for arrays with elements of any
suitable Java primitive type, up to some maximum rank of
array. Alternatively the type-hierarchy for arrays can be defined
in a way that allows these functions to be more polymorphic.
3.2. Process arrays
HPJava adds class libraries and some additional syntax for
dealing with distributed arrays. These arrays are viewed as
coherent global entities, but their elements are divided across a
set of cooperating processes. As a pre-requisite to introducing
distributed arrays we discuss the process arrays over which
their elements are scattered.
Dept. of CSE MESCE, Kuttippuram12
Seminar Report’03 HPJava
An abstract base class Procs has subclasses Procs1,
Procs2, ..., representing one-dimensional process arrays, two-
dimensional process arrays, and so on.
Procs2 p = new Procs2(2, 2) ;
Procs1 q = new Procs1(4) ;
These declarations set p to represent a 2 by 2 process
array and q to represent a 4-element, one-dimensional process
array. In either case the object created describes a group of 4
processes. At the time the Procs constructors are executed the
program should be executing on four or more processes. Either
constructor selects four processes from this set and identifies
them as members of the constructed group.
Procs has a member function called member, returning a
boolean value. This is true if the local process is a member of
the group, false otherwise.
if(p.member()) {
...
}
The code inside the if is executed only if the local process
is a member p. We will say that inside this construct the active
process group is restricted to p.
The multi-dimensional structure of a process array is
reflected in its set of process dimensions. An object is
associated with each dimension. These objects are accessed
through the inquiry member dim:
Dept. of CSE MESCE, Kuttippuram13
Seminar Report’03 HPJava
Dimension x = p.dim(0) ;
Dimension y = p.dim(1) ;
Dimension z = q.dim(0) ;
The object returned by the dim inquiry has class
Dimension. The members of this class include the inquiry crd.
This returns the coordinate of the local process with respect to
the process dimension. The result is only well-defined if the local
process is a member of the parent process array. The inner
body code in
if(p.member())
if(x.crd() == 0)
if(y.crd() == 0) {
...
}
will only execute on the first process from p, with
coordinates(0,0).
Distributed arrays
Some or all of the dimensions of a multi-dimensional
array can be declared to be distributed ranges. In general a
distributed range is represented by an object of class Range. A
Range object defines a range of integer subscripts, and defines
how they are mapped into a process array dimension. In fact the
Dimension class introduced in the previous section is a subclass
of Range. In this case the integer range is just the range of
coordinate values associated with the dimension. Each value in
the range is mapped, of course, to the process (or slice of
Dept. of CSE MESCE, Kuttippuram14
Seminar Report’03 HPJava
processes) with that coordinate. This kind of range is also called
a primitive range. More complex subclasses of Range implement
more elaborate maps from integer ranges to process
dimensions. Some of these will be introduced in later sections.
For now we concentrate on arrays constructed with Dimension
objects as their distributed ranges.
The syntax of section 2 is extended in the following way to
support distributed arrays
• A distributed range object may appear in place of an integer
extent in the ``constructor'' of the array (the expression
following the new keyword).
• If a particular dimension of the array has a distributed range,
the corresponding slot in the type signature of the array
should include a # symbol.
• In general the constructor of the distributed array must be
followed by an on clause, specifying the process group
over which the array is distributed. Distributed ranges of
the array must be distributed over distinct dimensions of
this group.
Assume p, x and y are declared as in the previous section, then
float [[#,#,]] a = new float [[x, y, 100]] on p ;
defines a as a 2 by 2 by 100 array of floating point numbers.
Because the first two dimensions of the array are distributed
ranges--dimensions of p--a is actually realized as four segments
of 100 elements, one in each of the processes of p. The process
in p with coordinates i, j holds the section a [[i, j, :]].
The distributed array a is equivalent in terms of storage to
four local arrays defined by
Dept. of CSE MESCE, Kuttippuram15
Seminar Report’03 HPJava
float [] b = new float [100] ;
But because a is declared as a collective object we can
apply collective operations to it. The HPJlib functions introduced
in section 2 apply equally well to distributed arrays, but now
they imply inter-processor communication.
float [[#,#,]] a = new float [[x, y, 100]] on p,
b = new float [[x, y, 100]] on p ;
HPJlib.shift(a, b, -1, 0, CYCL) ;
The shift operation causes the local values of a to be
overwritten with values of b from a processor adjacent in the x
dimension.
There is a catch in this. When subscripting the distributed
dimensions of an array it is simply disallowed to use subscripts
that refer to off-processor elements. While this:
int i = x.crd(), j = y.crd() ;
a [i, j, 20] = a [i, j, 21] ;
is allowed, this:
int i = x.crd(), j = y.crd() ;
a [i, j, 20] = b [(i + 1) % 2, j, 20] ;
is forbidden. The second example could apparently be
implemented using a nearest neighbour communication, quite
similar to the shift example above. But our language imposes an
strict policy distinguishing it from most data parallel languages:
while library functions may introduce communications, language
primitives such as array subscripting never imply
communication.
Dept. of CSE MESCE, Kuttippuram16
Seminar Report’03 HPJava
If subscripting distributed dimensions is so restricted, why
are the i, j subscripts on the arrays needed at all? In the
examples of this section these subscripts are only allowed one
value on each processor. Well, the inconvience of specifying the
subscripts will be reduced by language constructs introduced
later, and the fact that only one subscript value is local is a
special feature of the primitive ranges used here. The higher
level distributed ranges introduced later map multiple elements
to individual processes. Subscripting will no longer look so
redundant.
The on construct and the active process group
if(p.member()) {
...
}
appeared. Our language provides a short way of writing this
construct
on(p) {
...
}
In fact the on construct provides some extra value.
Informally we said in section 3 that the active process group is
restricted to p inside the body of the p.member() conditional
construct. The language incorporates a more formal idea of an
active process group (APG). At any point of execution some
process group is singled out as the APG. An on(p) construct
specifically changes the value of the APG to p. On exit from the
construct, the APG is restored to its value on entry.
Dept. of CSE MESCE, Kuttippuram17
Seminar Report’03 HPJava
Elevating the active process group to a part of the
language allows some simplifications. For example, it provides a
natural default for the on clause in array constructors. More
importantly, formally defining the active process group
simplifies the statement of various rules about what operations
are legal inside distributed control constructs like on.
Other features
We have already described most of the important
language features we propose to implement. Two additional
features that are quite important in practice but have not been
discussed are subranges and subgroups. A subrange is simply a
range which is a regular section of some other range, created by
syntax like x [0 : 49]. Subranges are created tacitly when a
distributed array is subscripted with a triplet, and they can also
be used directly to create distributed arrays with general HPF-
like alignments. A subgroup is some slice of a process array,
formed by restricting process coordinates in one or more
dimensions to single values. Again they may be created
implicitly by section subscripting, this time using a scalar
subscript. They also formally describe the state of the active
process group inside at and over constructs.
The framework described is much more powerful than
space allows us to demonstrate in this paper. This power comes
in part from the flexibility to add features by extending the
libraries associated with the language. We have only illustrated
the simplest kinds of distribution format. But any HPF 1.0 array
distribution format, plus various others, can be incorporated by
Dept. of CSE MESCE, Kuttippuram18
Seminar Report’03 HPJava
extending the Range hierarchy in the run-time library. We have
only illustrated shift and writeHalo operations from the
communication library, but the library also includes much more
powerful operations for remapping arrays and performing
irregular data accesses. Our intention is to provide minimal
language support for distributed arrays, just enough to facilitate
further extension through construction of new libraries.
For a more complete description of a slightly earlier
version of the proposed language, see
4. Considerations in HPJava language design
This section discuss some design and implementation
issues in the HPJava language. The language is briefly reviewed,
then the class library that forms the foundation of the
translation scheme is described. Through example codes, we
illustrate how HPJava source codes can be translated
straightforwardly to ordinary SPMD Java programs calling this
library. This is followed by a discussion of the rationale for
introducing the language in the first place, and of how various
language features have been designed to facilitate efficient
implementation.
4.1. Translation scheme
The initial HPJava compiler is implemented as a source-to-
source translator converting an HPJava program to a Java node
program, with calls to runtime functions. The runtime system is
built on the NPAC PCRC runtime library, which has a kernel
Dept. of CSE MESCE, Kuttippuram19
Seminar Report’03 HPJava
implemented in C++ and a Java interface implemented in Java
and C++.
4.1.1. Java packages for HPspmd programming
The current runtime interface for HPJava is called adJava. It
consists of two Java packages. The first is the HPspmd runtime
proper. It includes the classes needed to translate language
constructs. The second package provides communication and
some simple I/O functions. These two packages will be outlined
in this section.
The classes in the first package include an environment
class, distributed array ``container classes'', and related classes
describing process groups and index ranges. The environment
class SpmdEnv provides functions to initialize and finalize the
underlying communication library (currently MPI). Constructors
call native functions to prepare the lower level communication
package. An important field, apg, defines the group of processes
that is cooperating in ``loose synchrony'' at the current point of
execution.
The other classes in this package correspond directly to
HPJava built-in classes. The first hierarchy is based on Group. A
group, or process group, defines some subset of the processes
executing the SPMD program. Groups have two important roles
in HPJava. First they are used to describe how program variables
such as arrays are distributed or replicated across the process
pool. Secondly they are used to specify which subset of
processes execute a particular code fragment. Important
Dept. of CSE MESCE, Kuttippuram20
Seminar Report’03 HPJava
members of adJava Group class include the pair on(), no() used
to translate the on construct.
Figure 1: The HPJava Group hierarchy
The most common way to create a group object is through
the constructor for one of the subclasses representing a process
grid. The subclass Procs represents a grid of processes and
carries information on process dimensions: in particular an
inquiry function dim(r) returns a range object describing the -
th process dimension. Procs is further subclassed by Procs0,
Procs1, Procs2, ...which provide simpler constructors for fixed
dimensionality process grids. The class hierarchy of groups and
process grids is shown in figure 1.
The second hierarchy in the package is based on Range. A
range is a map from the integer interval into some
process dimension (ie, some dimension of a process grid).
Ranges are used to parametrize distributed arrays and the
overall distributed loop.
Dept. of CSE MESCE, Kuttippuram21
Seminar Report’03 HPJava
Figure 2: The HPJava Range hierarchy
The most common way to create a range object is to use
the constructor for one of the subclasses representing ranges
with specific distribution formats. The current class hierarchy is
given in figure 2. Simple block distribution format is
implemented by BlockRange, while CyclicRange and
BlockCyclicRange represent other standard distribution formats
of HPF. The subclass CollapsedRange represents a sequential
(undistributed range). Finally, DimRange represents the range of
coordinates of a process dimension itself--just one element is
mapped to each process.
The related adJava class Location represents an individual
location in a particular distributed range. Important members of
the adJava Range class include the function location(i) which
returns the th location in a range and its inverse, idx(l), which
returns the global subscript associated with a given location.
Important members of the Location class include at() and ta(),
used in the implementation of the HPJava that at construct.
Finally in this package we have the rather complex
hierarchy of classes representing distributed arrays. HPJava
Dept. of CSE MESCE, Kuttippuram22
Seminar Report’03 HPJava
global arrays declared using [[ ]] are represented by Java
objects belonging to classes such as:
Array1dI, Array1cI,
Array2ddI, Array2dcI, Array2cdI, Array2ccI,
...
Array1dF, Array1cF,
Array2ddF, Array2dcF, Array2cdF, Array2ccF,
...
Generally speaking the class Arrayndc...T represents -
dimensional distributed array with elements of type T, currently
one of I, F, ..., meaning int, float, ... . The penultimate part of
the class name is a string of ``c''s and ``d''s specifying
whether each dimension is collapsed or distributed. These
correlate with presence or absence of an asterisk in slots of the
HPJava type signature. The concrete Array... classes implement
a series of abstract interfaces. These follow a similar naming
convention, but the root of their names is Section rather than
Array (so Array2dcI, for example, implements Section2dcI). The
hierarchy of Section interfaces is illustrated in figure 3.
Dept. of CSE MESCE, Kuttippuram23
Seminar Report’03 HPJava
Figure 3: The adJava Section hierarchy
The need to introduce the Section interfaces should be
evident from the hierarchy diagram. The type hierarchy of
HPJava involves a kind of multiple inheritance. The array type int
[[*, *]], for example, is a specialization of both the types int
[[*, ]] and int [[, *]]. Java allows ``multiple inheritance'' only
from interfaces, not classes.
Here we mention some important members of the Section
interfaces. The inquiry dat() returns an ordinary one dimensional
Java array used to store the locally held elements of the
distributed array. The member pos(i, ...), which takes
arguments, returns the local offset of the element specified by
its list of arguments. Each argument is either a location (if the
corresponding dimension is distributed) or an integer (if it is
collapsed). The inquiry grp() returns the group over which
Dept. of CSE MESCE, Kuttippuram24
Seminar Report’03 HPJava
elements of the array are distributed. The inquiry rng(d) returns
the th range of the array.
The second package in adJava is the communication
library. The adJava communication package includes classes
corresponding to the various collective communication
schedules provided in the NPAC PCRC kernel. Most of them
provide of a constructor to establish a schedule, and an execute
method, which carries out the data movement specified by the
schedule. The communication schedules provided in this
package are based on the NPAC runtime library. Different
communication models may eventually be added through
further packages.
The collective communication schedules can be used
directly by the programmer or invoked through certain wrapper
functions. A class named Adlib is defined with static members
that create and execute communication schedules and perform
simple I/O functions. This class includes, for example, the
following methods, each implemented by constructing the
appropriate schedule and then executing it.
static public void remap(Section dst, Section src)
static public void shift(Section dst, Section src,
int shift, int dim, int mode)
static public void copy(Section dst, Section src)
static public void writeHalo(Section src,
int[] wlo, int[] whi, int[] mode)
Use of these functions will be illustrated in later examples.
Polymorphism is achieved by using arguments of class Section.
4.2. Issues in the language design
Dept. of CSE MESCE, Kuttippuram25
Seminar Report’03 HPJava
With some of the implementation mechanisms exposed,
we can better discuss the language design itself.
4.2.1. Extending the Java language
The first question to answer is why use Java as a base
language? Actually, the programming model embodied in
HPJava is largely language independent. It can bound to other
languages like C, C++ and Fortran. But Java is a convenient
base language, especially for initial experiments, because it
provides full object-orientation--convenient for describing
complex distributed data--implemented in a relatively simple
setting, conducive to implementation of source-to-source
translators. It has been noted elsewhere that Java has various
features suggesting it could be an attractive language for
science and engineering
4.2.2. Datatypes in HPJava
In a parallel language, it is desirable to have both local
variables (like the ones in MPI programming) and global
variables (like the ones in HPF programming). The former
provide flexibility and are ideal for task parallel programming;
the latter are convenient especially for data parallel
programming.
In HPJava, variable names are divided into two sets. In
general those declared using ordinary Java syntax represent
local variables and those declared with [[ ]] represent global
Dept. of CSE MESCE, Kuttippuram26
Seminar Report’03 HPJava
variables. The two sectors are independent. In the
implementation of HPJava the global variables have special data
descriptors associated with them, defining how their
components are divided or replicated across processes. The
significance of the data descriptor is most obvious when dealing
with procedure calls.
Passing array sections to procedure calls is an important
component in the array processing facilities of Fortran90 [1].
The data descriptor of Fortran90 will include stride information
for each array dimension. One can assume that HPF needs a
much more complex kind of data descriptor to allow passing
distributed arrays across procedure boundaries. In either case
the descriptor is not visible to the programmer.
Java has a more explicit data descriptor concept; its arrays are
considered as objects, with, for example, a publicly accessible
length field. In HPJava, the data descriptors for global data are
similar to those used in HPF, but more explicitly exposed to
programmers. Inquiry functions such as grp(), rng() have a
similar role in global data to the field length in an ordinary Java
array.
Keeping two data sectors seems to complicate the
language and its syntax. But it provides convenience for both
task and data parallel processing. There is no need for things
like the LOCAL mechanism in HPF to call a local procedure on
the node processor. The descriptors for ordinary Java variables
are unchanged in HPJava. On each node processor ordinary Java
data will be used as local varables, like in an MPI program
Dept. of CSE MESCE, Kuttippuram27
Seminar Report’03 HPJava
4.2.3. Programming convenience
The language provides some special syntax for the
programmer's convenience. Unlike the syntax for data
declaration, which has fundamental significance in the
programming model, these extensions are purely provide
syntactic conveniences.
There are a limited number of Java operators overloaded. A
group object can be restricted by a location using the /
operation, and a sub-range or location can be obtained from a
range using the [ ] operator enclosing a triplet expression or an
integer, These pieces of syntax can be considered as shorthand
for suitable constructors in the corresponding classes. This is
comparable to the way Java provides special syntax support for
String class constructor.
Another kind of overloading occurs in location shift, which
is used to support ghost regions. A shift operator + is defined
between a location and an integer. It will be illustrated in the
examples in the next section. This is a restricted operation--it
has meaning (and is legal) only in an array subscript expression.
5. Discussion and related work
We have described a conservative set of extensions to
Java. In the context of an explicitly SPMD programming
environment with a good communication library, we claim these
extensions provide much of the concise expressiveness of HPF,
without relying on very sophisticated compiler analysis. The
object-oriented features of Java are exploited to give an elegant
Dept. of CSE MESCE, Kuttippuram28
Seminar Report’03 HPJava
parameterization of the distributed arrays of the extended
language. Because of the relatively low-level programming
model, interfacing to other parallel-programming paradigms is
more natural than in HPF. With suitable care, it is possible to
make direct calls to, say, MPI from within the data parallel
program. In [3] we suggest a concrete Java binding for MPI.
We will mention two related projects. Spar [11] is a Java-
based language for array-parallel programming. Like our
language it introduces multi-dimensional arrays, array sections,
and a parallel loop. There are some similarities in syntax, but
semantically Spar is very different to our language. Spar
expresses parallelism but not explicit data placement or
communication--it is a higher level language. ZPL [10] is a new
programming language for scientific computations. Like Spar, it
is an array language. It has an idea of performing computations
over a region, or set of indices. Within a compound statement
prefixed by a region specifier, aligned elements of arrays
distributed over the same region can be accessed. This idea has
certain similarities to our over construct. Communication is
more explicit than Spar, but not as explicit as in the language
discussed in this article.
Implementation of Collectives
In this section we will discusses Java implementation of the
Adlib collective operations. For illustration we concentrate on
the important Remap operation. Although it is a powerful and
general operation, it is actually one of the more simple
collectives to implement in the HPJava framwork. General
algorithms for this primitive have been described by other
Dept. of CSE MESCE, Kuttippuram29
Seminar Report’03 HPJava
authors. For example it is essentially equivalent to the operation
called Regular_Section_Copy_Sched in [2]. In this section we
want to illustrate how this kind of operation can be implemented
in term of the particular Range and Group hierarchies of HPJava
(complemented by suitable set of messaging primitives). All
collective operations in the library are based on communication
schedule objects. Each kind of operation has an associated class
of schedules. Particular instances of these schedules, involving
particular data arrays and other parameters, are created by the
class constructors. Executing a schedule initiates the
communications required to effect the operation. A single
schedule may be executed many times, repeating the same
communication pattern. In this way, especially for iterative
programs, the cost of computations and negotiations involved in
constructing a schedule can often be amortized over many
executions. This paradigm was pioneered in the CHAOS/PARTI
libraries [8]. If a communication pattern is to be executed only
once, simple wrapper functions can be made available to
construct a schedule, execute it, then destroy it. The overhead
of creating the schedule is essentially unavoidable, because
even in the single-use case individual data movements
generally have to be sorted and aggregated, for efficiency. The
associated data structures are just those associated with
schedule construction. Constructor and public method of the
remap schedule for distributed arrays of float element can be
described as follows:
Dept. of CSE MESCE, Kuttippuram30
Seminar Report’03 HPJava
The remap schedule combines two functionalities: it
reorganizes data in the way indicated by the distribution formats
of source and destination array. Also, if the destination array has
a replicated distribution format, it broadcasts data to all copies
of the destination. Here we will concentrate on the former
aspect, which is handled by an object of class RemapSkeleton
contained in every Remap object. During construction of a
RemapSkeleton schedule, all send messages, receive messages,
and internal copy operations implied by execution of the
schedule are enumerated and stored in light-weight data
structures. These messages have to be sorted before sending,
for possible message agglomeration, and to ensure a deadlock-
free communication schedule. These algorithms, and
maintenance of the associated data structures, are dealt with in
a base class of RemapSkeleton called BlockMessSchedule. The
API for the superclass is outlined in Figure 11. To set-up such a
low-level schedule, one makes a series of calls to sendReq and
recvReq to define the required messages. Messages are
characterized by an offset in some local array segment, and a
set of strides and extents parameterizing a multi-dimensional
patch of the (flat Java) array. Finally the build() operation does
any necessary processing of the message lists. The schedule is
executed in a ``forward'' or ``backward'' direction by invoking
gather() or scatter().
Dept. of CSE MESCE, Kuttippuram31
Seminar Report’03 HPJava
Figure 11: API of the class BlockMessSchedule
5.2. Titanium
The Titanium language is designed to support high-
performance scientific applications. Historically, few languages
that made such a claim have achieved a significant degree of
serious use by scientific programmers. Among the reasons are
the high learning curve for such languages, the dependence on
“heroic” parallelizing compiler technology and the consequent
absence of compilers and tools, and the incompatibilities with
languages used for libraries. Our goal is to provide a language
that gives its users access to modern program structuring
through the use of object-oriented technology, that enables its
users to write explicitly parallel code to exploit their
understanding of the computation, and that has a compiler that
uses optimizing compiler technology where it is reliable and
gives predictable results. The starting design point for Titanium
is Java. We have chosen Java for several reasons. Although
Dept. of CSE MESCE, Kuttippuram32
Seminar Report’03 HPJava
Titanium project extend the Java language for scientific
computing but “compile down” to C or C++.Those approaches
offer the hope of tuning for higher ultimate performance, but
sacrifice various benefits of the full Java platform.
5.3. FIDIL
The multidimensional array support in HPJava is strongly
influenced by FIDIL maps and domains [6, 11]. HPJava, however,
sacrifices expressiveness for performance. FIDIL maps have
arbitrary shapes. FIDIL has only a general domain type, thus
making it harder to optimize code that uses the more common
rectangular kind.
5.4. Split-C
The parallel execution model and global address space
support in HPJava are closely related to Split-C [5] and AC [3].
HPJava shares a common communication layer with Split-C on
distributed memory machines, which we have extended as part
of the HPJava project to run on shared memory machines. Split-
C differs from HPJava in that the default pointer type is local
rather than global; a local pointer default simplifies interfacing
to existing sequential code, but a global default makes it easier
to port shared memory applications to distributed memory
machines. Split-C uses sequential consistency as its default
consistency model, but provides explicit operators to allow non-
blocking operations to be used. In AC the compiler introduces
Dept. of CSE MESCE, Kuttippuram33
Seminar Report’03 HPJava
non-blocking memory operations automatically, using only
dependence information, not parallel program analysis.
Dept. of CSE MESCE, Kuttippuram34
Seminar Report’03 HPJava
6. Conclusion
Our experience thus far is that Java is a good choice as a
base language: it is easy to extend, and its safety features
greatly simplify the compiler writer’s task. We also believe that
extending Java is easier than obtaining high performance within
Java’s strict language specs (assuming that the latter is at all
feasible). Many of the features of HPJava would be hard or
impossible to achieve as Java libraries, and the compiler would
not be able to perform static analysis and optimizations on
them. HPJava will be most helpful for problems that have some
degree of regularity. To a first approximation, the HPJava’s
domain is similar to HPF’s. Many of the most challenging
problems in modern computational science have irregular
structure, so the value of our language features in those
domains is more controversial.
Dept. of CSE MESCE, Kuttippuram35
Seminar Report’03 HPJava
7. References
[1] G. Fox, “Java and Grande Application,” Computing in
Science & Engineering
[2] URL: http://www.hpjava.org/index.html
[3] URL: http://www.communitygrids.iu.edu/IC2.html
[4] URL: http://www.nacse.org/HPJava/index.html
[5] URL: http://www.hpjava.org/mpijava.html
Dept. of CSE MESCE, Kuttippuram36
Seminar Report’03 HPJava
CONTENTS
1 Introduction
2 Over view of HPJava
2.1 HPJava History
2.2 HPJava Philosophy
3 Characteretics
3.1 Multidimensional arrays
3.2 Process arrays
3.3 Distributed arrays
3.4 The on construct and the active process group
3.5 Other features
4 Considerations in HPJava language design
4.1 Translation scheme
4.1.1 Java packages for HPspmd programming
4.2 Issues in the language design
4.2.1 Extending the Java language
4.2.2 Datatypes in HPJava
4.2.3 Programming convenience
5 Discussion and related work
5.1 Implementation of Collectives
5.2 Titanium
5.3 FIDIL
5.4 Split-C
6 Conclusion
7 References
Dept. of CSE MESCE, Kuttippuram37
Seminar Report’03 HPJava
ABSTRACT
The idea that Java may enable new programming
environments, combining attractive user interfaces with high
performance computation, is gaining increasing attention
amongst computational scientists. Java boasts a direct simplicity
reminiscent of Fortran, but also incorporates many of the
important ideas of modern object-oriented programming. Of
course it comes with an established track-record in the domains
of Web and Internet programming.
The language outlined here provides HPF-like distributed
arrays as language primitives, and new distributed control
constructs to facilitate access to the local elements of these
arrays. In the SPMD mold, the model allows processors the
freedom to independently execute complex procedures on local
elements: it is not limited by SIMD-style array syntax. All access
to non-local array elements must go through library functions--
typically collective communication operations. This puts an
extra onus on the programmer; but making communication
explicit encourages the programmer to write algorithms that
exploit locality, and simplifies the task of the compiler writer. On
the other hand, by providing distributed arrays as language
primitives we are able to simplify error-prone tasks such as
converting between local and global array subscripts and
determining which processor holds a particular element. As in
HPF, it is possible to write programs at a natural level of
abstraction where the meaning is insensitive to the detailed
mapping of elements. Lower-level styles of programming are
also possible.
Dept. of CSE MESCE, Kuttippuram38
Seminar Report’03 HPJava
ACKNOWLEDGEMENTS
I express my sincere thanks to Prof. M.N Agnisarman
Namboothiri (Head of the Department, Computer Science and
Engineering, MESCE), Mr. Zainul Abid (Staff incharge) for their
kind co-operation for presenting the seminar.
I also extend my sincere thanks to all other members of
the faculty of Computer Science and Engineering Department
and my friends for their co-operation and encouragement.
Rony V John
Dept. of CSE MESCE, Kuttippuram39