This article was downloaded by: [George Mason University]
On: 06 July 2011, At: 12:40
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
International Journal of Geographical Information Science
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tgis20
TeraGrid GIScience Gateway: Bridging cyberinfrastructure and GIScience
Shaowen Wang a & Yan Liu a
a Department of Geography and National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Available online: 22 Jun 2011
To cite this article: Shaowen Wang & Yan Liu (2009): TeraGrid GIScience Gateway: Bridging cyberinfrastructure and GIScience, International Journal of Geographical Information Science, 23:5, 631-656
To link to this article: http://dx.doi.org/10.1080/13658810902754977
Research Article
TeraGrid GIScience Gateway: Bridging cyberinfrastructure and GIScience
SHAOWEN WANG* and YAN LIU
Department of Geography and National Center for Supercomputing Applications,
University of Illinois at Urbana-Champaign, Urbana, IL, USA
(Received 29 January 2008; in final form 15 January 2009)
Cyberinfrastructure (CI) represents the integrated information and communica-
tion technologies for distributed information processing and coordinated knowl-
edge discovery, and is promising to revolutionize how science and engineering are
conducted in the twenty-first century. The value of bridging CI and GIScience is
significant to advance CI and benefit GIScience research and education, particu-
larly in distributed geographic information processing (DGIP). This article pre-
sents a holistic framework that bridges CI and GIScience by integrating CI
capabilities to empower GIScience research and education and establish generic
DGIP services supported by CI. The framework, the TeraGrid GIScience
Gateway, is based on a CI science gateway approach developed on the National
Science Foundation (NSF) TeraGrid – a key element of US and world CI. This
gateway develops a unifying service-oriented framework with respect to its architec-
ture, design, and implementation as well as its integration with the TeraGrid. The
functions of the gateway focus on enabling parallel and distributed processing for
geographical analysis, managing the complexity of TeraGrid software environment,
and establishing a Web-based GIS for the GIScience community to gain shared and
collaborative access to TeraGrid-based geospatial processing services. The gateway
implementation uses Web 2.0 technologies to create a highly configurable and
interactive multiuser environment. Two case studies, Bayesian geostatistical modeling
and a spatial statistic $G_i^*(d)$ for detecting local clustering, are used to demonstrate
the gateway functions and user environment. The service transformation for these
analyses is applied to create a shared, decentralized, and collaborative geographical
analysis environment in which GIScience community users can contribute new
analysis services and reuse existing gateway services.
Keywords: Cyberinfrastructure; Geographical analysis; Parallel and distributed
processing; Service-oriented architecture; Web 2.0; Web-based GIS
1. Background
Like the physical infrastructure of transportation, electricity power grids, commu-
nication, and other utility systems that support modern society, cyberinfrastructure
(CI) represents the distributed computer, information, and communication technol-
ogies combined with the personnel and integrating components that provide a long-
term platform to empower the modern scientific research endeavor (Atkins et al.
2003). CI also refers to the development and use of integrated and distributed
information infrastructure that enables and accelerates the discoveries of science and
engineering. E-Science is a similar term originating in Europe with an emphasis on the
transformative roles of CI in science and engineering practices. Although coming
from slightly different perspectives, both CI and e-Science represent the powerful
paradigm in which distributed computer and knowledge systems and information and
communication technologies are integrated to provide services to enable large-scale
and collaborative sciences and engineering.

*Corresponding author. Email: [email protected]
International Journal of Geographical Information Science, ISSN 1365-8816 print/ISSN 1362-3087 online © 2009 Taylor & Francis
http://www.tandf.co.uk/journals DOI: 10.1080/13658810902754977
International Journal of Geographical Information Science, Vol. 23, No. 5, May 2009, 631–656

Many domains have developed and used CI to support their sciences, and reported
promising results (GEON 2008, NEON 2008, SEEK 2008, WATERS 2008).
However, significant challenges remain to fully exploit CI in science and engineering
inquiry mainly because 1) CI technologies are sophisticated and continue to evolve;
and 2) educating a CI workforce in science and engineering requires significant effort.
From a technological point of view, CI comprises computing systems, data, information
resources, networking, digitally enabled sensors, instruments, virtual organizations
(VOs), and observatories, along with an interoperable suite of software services
and tools (NSF 2007). Among these components, three types of capabilities form the
current CI core: high-performance computing; data handling, analysis, and visualiza-
tion; and VO for distributed communities (NSF 2007). These capabilities collectively
offer great opportunities to revolutionize the way that Geographical Information
Science (GIScience) evolves based on developing and using high-performance,
distributed, and collaborative Geographical Information Systems (GIS).
The GIScience community has been exploiting individual CI capabilities for years.
Web-based GIS solutions (e.g., Tsou and Buttenfield 2002, Yang et al. 2005) have
been developed to support online data access and map-based visualization, and
extended to facilitate the coupling of GIS and modeling (Wright et al. 2003). This
type of coupling can further be enhanced to enable collaborative and/or computa-
tionally intensive distributed geographical information processing (DGIP, Yang and
Raskin 2009) by scaling Web-based GIS from Web servers to CI. High-performance
and/or Grid computing have been used to solve computationally intensive geographical
analysis problems (Armstrong and Densham 1992, Wang and Armstrong 2003,
Wang et al. 2008). Moreover, ontology-driven GIS have been recognized to address
semantic integration issues for geographic knowledge development and communica-
tion (Fonseca et al. 2002). Although research and development in these areas help
understand how GIScience and CI may benefit from each other, to effectively estab-
lish and exploit the bridge between GIScience and CI requires the treatment of CI as
an integrative framework comprising a balanced, seamless blending of individual CI
capabilities (NSF 2007).
The GIScience community can contribute to the understanding and development of
generic geospatial CI to enable high-performance, distributed, and collaborative GIS as
it has been examining the theoretical and practical underpinnings for geospatial tech-
nologies to be applied across many application domains. This contribution is essential
to establish coherent generic geospatial CI for domain science communities, such as
geosciences and ecological science (see GEON 2008, NEON 2008, SEEK
2008), to develop their domain-specific geospatial CI. Although these communities
have developed their domain-specific VO based on middleware provided by, for
example, the Globus Toolkit (http://globus.org), bridging GIScience to CI would help
address their common needs of generic geospatial CI (Zhang and Tsou 2009).
Recognizing that CI itself is an evolving research area, exhibiting complexity at
both its framework and its capability levels, a science or engineering gateway
approach has been proposed as an integrated community-specific set of tools, appli-
cations, and data collections that provide a research community with a shared online
environment to enable domain-specific collaborative problem solving through seam-
less access to CI capabilities (Catlett et al. 2007, Wilkins-Diehr 2007). The purpose of
this research is to address the ultimate challenge of bridging CI and GIScience by
establishing the TeraGrid GIScience Gateway (hereafter, the Gateway).
In the remainder of this article, Section 2 discusses the development strategies of the
Gateway. Section 3 describes its Service-Oriented Architecture (SOA) and design for
a service infrastructure that supports the deployment of services and a collaborative
geospatial processing environment. On the basis of this SOA, Section 4 presents a
Web service-based implementation of the service infrastructure and user environment.
Section 5 presents two case studies to demonstrate how to integrate geographical
analysis into the Gateway and leverage TeraGrid capabilities for collaborative and
computationally intensive geospatial processing. Section 6 concludes and
discusses the significance of this research, ongoing research themes, and future
research directions.
2. Introduction and strategies
The Gateway, based on the NSF TeraGrid (a key element of US and world CI; see
http://www.teragrid.org), integrates CI capabilities and manages the associated com-
plexity to empower GIScience and establish generic geospatial CI. The Gateway
provides an open CI-based platform for GIScience community users to conduct and
share collaborative, computationally intensive geographical analysis.
2.1 TeraGrid GIScience Gateway
TeraGrid is an open scientific discovery infrastructure that consolidates high-
performance computing resources at eleven partner sites to provide an integrated
and persistent computational resource. The Gateway is one of over two dozen science
and engineering gateways that facilitate application development and TeraGrid
operations. In general, these gateways take different design and implementation
approaches that are tailored to specific computation needs of individual domain
communities. For example, differences are reflected in user interfaces based on the
adoption and customization of diverse command-line or Web-based technologies.
Although TeraGrid has started to analyze and identify common infrastructure uti-
lities that would be applicable across gateways, few such utilities have been developed
to focus on application-level capabilities.
The Gateway is based on the GISolve Toolkit (http://www.gisolve.org) and focuses
on the development and provision of generic geospatial CI capabilities and DGIP
services. The Gateway is accessible through Web interfaces based on an SOA that
bridges CI capabilities with GIS and geographical analysis functions to manage CI
complexity and enable collaborative and computationally intensive geographical
analysis. SOA has become an effective approach to building interoperable and
reusable software components and integrating software components for the develop-
ment of scalable systems. Most cutting-edge CI capabilities are based on SOA, and
implemented using Web services. Whereas geospatial Web services have been studied
to support spatial queries, map browsing, and metadata indexing, the SOA-based
Gateway focuses on geospatial services that are integrated with CI capabilities.
2.2 Requirements
The Gateway was designed to enable GIScience community users to simultaneously
conduct collaborative and computationally intensive geographical analysis based on
scalable access to high-performance TeraGrid resources. Specific requirements are
summarized as follows:
• Scalability: TeraGrid access should be scalable to 1) the quantity of TeraGrid
computational resources; 2) the number of GIScience community users; and 3)
the computational intensity (Wang 2008) of geographical analyses.
• Usability and reusability: The Gateway capabilities should be user-friendly,
shared, and reused by a large number of GIScience community users.
• Interoperability: Gateway capabilities should be interoperable with each other,
and the underlying TeraGrid capabilities.
• Sustainability: The Gateway should be adaptive to advancements in geographical
analysis methods and TeraGrid technologies. As the implementations of the
Gateway capabilities are improved, their interfaces should remain consistent for
the sustainability of the Gateway access.
2.3 Strategies
The requirements discussed above have been addressed in the Gateway development
based on three strategies: tackling computational intensity of geographical analysis,
managing TeraGrid complexity, and establishing a scalable SOA.
2.3.1 Tackling computational intensity of geographical analysis. Geographical ana-
lysis based on the use of heuristic search, simulation, optimization, and statistical
methods is often computationally intensive (Malanson and Armstrong 1996,
Densham and Armstrong 1998, Krzanowski and Raper 1999, Huang et al. 2002,
Wang and Zhu 2008). Enormous computational resources are needed to store and
manage geographical information, and conduct computationally intensive geographical
analysis. These required resources are often unavailable on a single computer.
Consequently, researchers have begun to exploit advances in CI to access required
computer resources from disparate and heterogeneous sources, for example, provided
by TeraGrid (Wang et al. 2008). A key strategy to tackle computationally intensive
geographical analysis on TeraGrid is to parallelize and distribute geographical infor-
mation processing. TeraGrid can be viewed as a massive parallel and distributed
computing environment that integrates individual computing resources built on various
parallel computing architectures. Parallelization and distribution can be used in
geographical analysis to exploit the computing power available at the level of individual
computing resources as well as the distributed parallel computing capacity integrated
through the TeraGrid software environment. Each geographical analysis may be
decomposed into parts that are computed in parallel. The Gateway supports parallel
and distributed processing of geographical analysis tasks submitted by multiple users.
To achieve high-performance geographical analysis and scalable resource use, it must
assure load balancing in the decomposed parts of each analysis and the entire set of
concurrent analyses.
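The load-balancing requirement can be illustrated with a minimal sketch. It assumes that each decomposed part of an analysis carries a numeric estimate of its computational cost, and it uses the greedy longest-processing-time heuristic, a standard scheduling technique chosen here for illustration rather than the Gateway's documented method. The function name `balance_parts` and the part identifiers are hypothetical.

```python
import heapq


def balance_parts(part_costs, n_workers):
    """Assign decomposed analysis parts to workers with a greedy LPT heuristic.

    part_costs: dict mapping a (hypothetical) part id to its estimated cost.
    Returns (assignment, loads); assignment maps worker index -> list of part ids.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Min-heap of (current load, worker index): the least-loaded worker is on top.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    # Place the most expensive parts first so that cheaper parts even out the loads.
    for part, cost in sorted(part_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(part)
        heapq.heappush(heap, (load + cost, worker))
    loads = {w: sum(part_costs[p] for p in parts) for w, parts in assignment.items()}
    return assignment, loads
```

The same routine could in principle be applied at two levels, to the parts of one analysis and to the set of concurrent analyses, by treating each analysis as a part with an aggregate cost.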
2.3.2 Managing TeraGrid complexity. In recent years, significant progress has been
made in the use of TeraGrid capabilities for collaborative problem solving based on
access to high-performance computing resources. However, the TeraGrid will remain
complex to many scientists because its capabilities 1) have not been developed to
directly focus on the requirements of domain-specific problem solving; and 2) will
continue to evolve as driven by CI advancements. As a specific strategy to serve
rapidly growing GIScience community users, the Gateway uses the TeraGrid community
account model (TeraGrid 2008) to facilitate scalable user access.
TeraGrid capabilities are provided through a Coordinated TeraGrid Software and
Services (CTSS 2008) environment that is deployed on each TeraGrid resource. Major
software components in this environment are designed for 1) remote computation, 2)
distributed data transfer, 3) information services for resource management and dis-
covery, and 4) security mechanisms for shared computation resource access by dis-
tributed users. The environment requires system administration for high-performance
computational resources to carefully configure these components and their interfaces.
The effective use of these components for geographical analysis is complex because
• each component includes sophisticated Application Programming Interfaces
(API) and software tools;
• interactions among components require significant efforts to coordinate and
control in geographical analysis; and
• each component has various interfaces with respect to the management of
heterogeneous high-performance and distributed computational resources.
As a bridge between TeraGrid and GIScience, the Gateway naturally fits require-
ments for CI complexity management that enables the efficient and effective use of
TeraGrid for geographical analysis.
2.3.3 Establishing a scalable service framework. The Gateway integrates TeraGrid
capabilities and computationally intensive geographical analysis through the development
and use of an SOA. The Gateway SOA supports the decoupling of the
publication, implementation, deployment, and ownership of both geographical ana-
lysis and TeraGrid services to facilitate a collaborative environment in which various
service providers can contribute.
SOA and its Web service implementations have been widely adopted to build
scalable information technology infrastructures in both academic research and indus-
try (Bieberstein et al. 2005). This trend has been reflected in recent GIS development
(see http://www.opengeospatial.org and http://www.esri.com). Web services use
XML artifacts to support interoperability among SOA components (Foster 2006).
According to the World Wide Web Consortium (http://www.w3c.org), ‘an SOA is a
specific type of distributed system in which a service is a software agent that performs some
well-defined operation (i.e., provides a service) and can be invoked outside of the context
of a larger application’. By separating the service interface description and its imple-
mentation and making its service interface universally addressable (e.g., using a uni-
versal resource identifier, often referred to as URI), a service is flexible and adaptable in
implementation by service providers and in on-demand remote access by service consumers.
A Gateway service, generally defined as a functional unit of the Gateway that
originates from components in the TeraGrid software environment and geographical
analysis, is identified and published according to the principle of generality. Once
published, a service is self-contained and operationally loosely coupled with other
services.
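The separation of a service interface from its implementation can be sketched as follows. This is a minimal in-process illustration, not the Gateway's Web service stack: the `ServiceRegistry` class, the `urn:example:...` identifier, and the two `mean_*` functions are hypothetical names. The point is only that a consumer depends on the URI and the request and response shapes, so a provider may swap the implementation without affecting consumers.

```python
from statistics import fmean
from typing import Callable, Dict


class ServiceRegistry:
    """Maps a service URI to whatever callable currently implements it."""

    def __init__(self) -> None:
        self._impls: Dict[str, Callable[[dict], dict]] = {}

    def bind(self, uri: str, impl: Callable[[dict], dict]) -> None:
        # A provider can rebind the same URI to a new implementation at any time.
        self._impls[uri] = impl

    def invoke(self, uri: str, request: dict) -> dict:
        # Consumers rely only on the URI and the request/response structure.
        if uri not in self._impls:
            raise LookupError(f"no service bound to {uri}")
        return self._impls[uri](request)


def mean_v1(request: dict) -> dict:
    """First implementation of the hypothetical 'mean' service."""
    values = request["values"]
    return {"mean": sum(values) / len(values)}


def mean_v2(request: dict) -> dict:
    """Alternative implementation behind the same interface."""
    return {"mean": fmean(request["values"])}
```

Rebinding the URI from `mean_v1` to `mean_v2` leaves every `invoke` call unchanged, which mirrors the requirement that interfaces remain consistent while implementations improve.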
Several TeraGrid services (see Table 1) are available in the Gateway to support
geographical analysis. These services are provided by the service-oriented Globus
Toolkit (Foster 2006) and deployed as interoperable Grid services. When a geogra-
phical analysis is executed, these TeraGrid services are invoked dynamically for
gateway authentication and authorization, data transfer, computation resource dis-
covery and selection, computation execution and monitoring, and visualization. A
basic principle for the management of TeraGrid complexity is to develop a set of
common utilities for geographical analysis that encapsulate the functions of TeraGrid
software components. This encapsulation is achieved within a scalable SOA to enable
the GIScience community to focus on geographical analysis when using the TeraGrid.
3. Service-oriented architecture and design
This section presents a unifying Gateway SOA and its design to integrate TeraGrid and
geospatial services in a scalable manner. The Gateway SOA is developed to manage
TeraGrid complexity and supports the parallel and distributed processing of computa-
tionally intensive geographical analysis. Compared to conventional GIS solutions
(e.g., ESRI software) that have been developed through the direct support of operating
systems (e.g., Windows and Linux), the Gateway adheres to an SOA approach using a
suite of interoperable middleware that integrate CI capabilities across multiple operating
systems (Figure 1). Furthermore, conventional GIS are primarily run in single desktop
or server mode, whereas the Gateway is run in an integrated environment of networked
high-performance computers that are geographically distributed and orchestrated using
TeraGrid middleware. The user environment is designed to achieve scalable Web content
integration and presentation through the Gateway SOA.

Table 1. A list of TeraGrid services.
Type          TeraGrid services
Framework     WS-Core (core Web services)
Computing     WS-GRAM (Web service-based Grid resource allocation and management)
Data          RFT (reliable file transfer service)
Information   WS-MDS (Web service-based monitoring and discovery system)
Security      WS-AA (Web service-based authentication and authorization)

Figure 1. Gateway design and its analogy to ESRI GIS solutions (EGEE: Enabling Grids for
E-Science, see http://public.eu-egee.org; OSG: Open Science Grid, see http://www.opensciencegrid.org).
3.1 Architecture
The Gateway SOA mainly includes a service infrastructure and user environment
(Figure 2). The service infrastructure provides a decentralized service computing
platform based on publish-subscribe models (Eugster et al. 2003) to support the
provisioning and execution of TeraGrid and geospatial services. The user environ-
ment interacts with users and needed services to conduct geographical analysis. The
Gateway uses the Grid Security Infrastructure (Welch et al. 2004), a de facto CI
security standard, as a unified solution to secure access to the Gateway and TeraGrid.
Besides runtime support, the SOA enables the development and integration of service-based geographical analysis.
Gateway services are built into the service infrastructure and provide support for
explicit user interactions. A user interface is defined as part of a service interface to
support user interactions with individual Gateway services. As illustrated in
Figure 3, the adoption of SOA helps achieve the interoperability, scalability, and
reusability of software components. By transforming geographical analysis (e.g., A,
B, and C in Figure 3) into service components and deploying them into service
containers in the service infrastructure, users can access individual services regardless
of where these services are provided or hosted. Because service interface definitions
are described in machine-readable formats with XML artifacts, individual
services can be independently developed by community users, deployed on community
service providers, and invoked as on-demand services, as opposed to static
building and sequential executions in traditional development of geographical
applications. As the Gateway evolves, its services are reusable through service
composition that implements new analysis logic by dynamically binding existing
services. Access to single or multiple TeraGrid resources required by geographical
analysis is triggered at an individual service level and supported by TeraGrid hardware
and software infrastructure.

Figure 2. Gateway architecture.

Figure 3. Conceptual illustration of the Gateway (TeraGrid image source: http://www.teragrid.org).
In the Gateway SOA, each geographical analysis is designed as a composite service
(Figure 4). The execution of each service is driven by coordinated user interventions
and invocations specified within the service definition. Each composite service is
created and managed based on a publish-subscribe model using a set of provisioning
services with the following functions (Figure 5):
• Service creation, which transforms a geographical analysis into a Gateway
composite service. The service interface is defined using the Web Service
Description Language (WSDL), a standard service definition language.
• Service discovery, responsible for finding interesting and available services.
During the execution of services, dynamically available service instances are
discovered for runtime service selection.
• Service orchestration, which supports service runtime management.
• Service provenance, created and managed for each service to track its state
evolution and runtime information.

Figure 4. Streamlined execution of a composite service with four component services (each
component service has user interaction defined as part of service interface and can be optionally
configured in a composite service; the execution of a composite service interacts with the Gateway
user environment according to geographical analysis logic).

Figure 5. Internal architecture of the Gateway publish-subscribe framework (DB: database).
The publish-subscribe model allows a service to be registered by publishing a
service interface definition and runtime address for service discovery. The service
interface is considered static metadata for service management. Whereas static service
information facilitates reuse and composition, dynamic service information commu-
nicates service availability and status.
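A publish-subscribe registry of this kind can be sketched in a few lines. The sketch below is an in-memory illustration under stated assumptions: the class names, the `status` strings, and the example service are hypothetical, and a real deployment would persist records in a database and exchange WSDL documents rather than plain strings. It keeps the distinction drawn above between static metadata (interface definition) and dynamic metadata (availability and status).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceRecord:
    name: str
    interface: str             # static metadata, e.g. the location of a WSDL document
    address: str               # runtime address of the service instance
    status: str = "available"  # dynamic metadata, updated while the service runs


class PublishSubscribeRegistry:
    """In-memory registry: providers publish records, consumers subscribe to changes."""

    def __init__(self) -> None:
        self._records: Dict[str, ServiceRecord] = {}
        self._subscribers: Dict[str, List[Callable[[ServiceRecord], None]]] = {}

    def publish(self, record: ServiceRecord) -> None:
        # Registering a service makes it discoverable and notifies subscribers.
        self._records[record.name] = record
        self._notify(record)

    def subscribe(self, name: str, callback: Callable[[ServiceRecord], None]) -> None:
        self._subscribers.setdefault(name, []).append(callback)

    def update_status(self, name: str, status: str) -> None:
        # Only the dynamic part of the record changes at runtime.
        record = self._records[name]
        record.status = status
        self._notify(record)

    def discover(self, status: str = "available") -> List[str]:
        # Runtime discovery filters on dynamic status, not on the static interface.
        return sorted(r.name for r in self._records.values() if r.status == status)

    def _notify(self, record: ServiceRecord) -> None:
        for callback in self._subscribers.get(record.name, []):
            callback(record)
```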
3.2 Design
The Gateway SOA design focuses on building geospatial services and providing a
Gateway user environment. A geographical analysis is ported into the SOA through a
componentization design process, which decomposes an analysis computing logic into
loosely coupled, self-contained component services. The design of geographical analysis
services as Gateway composite services is placed within the context of the GISolve
Toolkit, which provides a problem-solving environment based on high-performance
and Grid computing (Wang et al. 2005). The Gateway integrates parallel processing,
data handling, and visualization capabilities from GISolve into the Gateway geogra-
phical analysis services. A service-based user environment is designed based on user
interface elements that are incorporated within each service.
3.2.1 Parallel and distributed processing of geographical analysis. Services for parallel
and distributed processing of geographical analysis mainly include functions for domain
decomposition and task scheduling. Domain decomposition services decompose a large
geographical analysis problem into components that can be handled in parallel
using TeraGrid resources. TeraGrid provides a diverse set of high-performance computational
resources that can be used to exploit the parallelism in geographical analysis.
Depending on feasible parallelization in a particular analysis, a specific parallel computing
architecture of TeraGrid resources must be determined before designing an assignment
of decomposed analysis components to appropriate computing resources. The
parallel computing architecture of individual TeraGrid resources includes both shared-memory
and distributed-memory, whereas multiple TeraGrid resources can be dynamically
aggregated based on the Grid architecture.
According to a theoretical approach proposed by Wang and Armstrong (2009),
domain decomposition services are developed in two stages. First, region quadtrees
and space-filling curves are used to decompose spatial computational domains, as
such domains can be represented as a field-based model and region quadtrees are
appropriate for the decomposition of field-based representations. Second, the results
of decomposing spatial computational domains are used to guide the decomposition
based on appropriate spatial data structures such as those summarized by Ding and
Densham (1996). This decoupling approach guides the service-oriented design of
parallel processing functions for geographical analysis in the Gateway. The input
for task-scheduling services includes the estimation of computational intensity for a
particular geographical analysis based on the representation of spatial computational
domain and dynamic resource status (Wang and Armstrong 2009). The output is an
execution plan that assigns each geographical analysis part to an available TeraGrid
resource (e.g., CPU in tightly coupled supercomputing architecture, or batch queue in
the Grid architecture). Domain decomposition is designed as a generalized service for
geographical analysis whereas task-scheduling solutions are specific to the architec-
ture of selected TeraGrid resources.
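The first decomposition stage can be sketched as a region quadtree over a grid of computational-intensity estimates. The sketch assumes a square grid whose side is a power of two and a simple splitting rule (split while the summed intensity exceeds a threshold); both are illustrative assumptions rather than the published algorithm, and the function name `quadtree_decompose` is hypothetical. Because the recursion always visits quadrants in the same order, the leaves come out in Z-order, one common space-filling curve.

```python
def quadtree_decompose(grid, max_load):
    """Split a square 2^k x 2^k intensity grid into region-quadtree leaves.

    A region is split into four quadrants while its summed intensity exceeds
    max_load and it is larger than one cell. Leaves are returned as
    (row, col, size, load) tuples in Z-order, because quadrants are always
    visited in the order NW, NE, SW, SE.
    """
    n = len(grid)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with a power-of-two side")

    def load_of(r, c, size):
        # Summed computational intensity of the size x size block at (r, c).
        return sum(grid[i][j] for i in range(r, r + size) for j in range(c, c + size))

    leaves = []

    def visit(r, c, size):
        load = load_of(r, c, size)
        if size == 1 or load <= max_load:
            leaves.append((r, c, size, load))
            return
        half = size // 2
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
            visit(r + dr, c + dc, half)

    visit(0, 0, n)
    return leaves
```

Leaves that are adjacent in the returned list tend to be spatially close, so cutting the list into contiguous runs of similar total load is one simple way to turn the decomposition into an execution plan for a task-scheduling step.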
3.2.2 Service-oriented user environment. The Gateway is designed as a research and
education environment for the GIScience community where each community user
can be a contributor and/or consumer of geographical analysis services. Through
the Gateway, users can conduct collaborative analysis that harnesses enormous
TeraGrid resources. SOA defines a service interface in a machine-readable format
such as WSDL, whereas the Web-based user interface is based on HTML and Web
scripting languages (e.g., JavaScript). A service interface defines the input and
output of the service logic. In the execution of a composite service that invokes
other services, user interactions are often needed at the component service level. For
example, a Gateway user may explicitly specify the input parameters of a geogra-
phical analysis service, check the progress of component services being executed,
and retrieve and visualize service output. Therefore, it is necessary to include interface
support for user interactions in the service definition, which could enable the
seamless integration of user and service interactions in a streamlined composite
service.
Web 2.0 technologies play an important role in establishing the service-oriented
user environment (Goodchild 2007). Generally, Web 2.0 technologies have been
applied to support geospatial services and portals [see Yang et al. (2007) for details].
Specifically, we use Web 2.0 technologies for Web content management in the
Gateway user environment and to facilitate user interactions. For example, a service
interface uses portlet (JSR-168 2008) and AJAX (2008) technologies to achieve on-demand
content loading for dynamic presentation of analysis status. Unlike conventional
Web content management systems, each Gateway service has its own user
interface information to enable interaction with the individual service, which is
designed through a novel use of Web 2.0 technologies. When a composite service is
orchestrated, a Web-based user environment is automatically created by combining
user interface data from the services through mashups. The user environment is,
therefore, designed and maintained as a decentralized Web content managementsystem.
3.2.3 User environment management. The Gateway user environment is designed to
be highly configurable and user-friendly for TeraGrid-based geographical analysis.
Through interactions with the service infrastructure, the user environment integrates
user-related events, contents, and service information to enable the execution,
monitoring, and visualization of Web-based service execution. This integration is
supported by the following functions:
• User management support for user registration, service orchestration and
subscription, and community security enforcement.
• User data management support for data transfer and service result retrieval.
• Service management support for the interactive creation of composite services
and interactions with Gateway service infrastructure to create the runtime user
environment.
• User interface management support for interactions between users and services.
4. Implementation
The Gateway is implemented according to its SOA design. For each service, a subset of
the Web service protocol stack is used to define the service interface, which is
implementation-independent and universally addressable. To support the execution
of composite services and user-service interactions, services are implemented as
‘stateful’ Web services by following the Web Service Resource Framework (WSRF 2008)
standard. A service definition includes four parts: WSDL schema, service classification
metadata, service status metadata, and user interaction metadata. With service state
information, the execution of a composite service is managed as a workflow that
proceeds based on the progress of each invoked service.
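To make the four-part service definition and the state-driven workflow more concrete, the following minimal sketch may help. It is an illustration only, not the Gateway's implementation: the class `ServiceDefinition`, the enum `ServiceState`, the helper `next_runnable`, and the sample file names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class ServiceState(Enum):
    """Assumed lifecycle states for a stateful service (hypothetical)."""
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class ServiceDefinition:
    """Hypothetical container for the four parts of a service definition."""
    wsdl_schema: str                                       # WSDL schema location
    classification: dict                                   # service classification metadata
    status: ServiceState = ServiceState.PENDING            # service status metadata
    user_interaction: dict = field(default_factory=dict)   # user interaction metadata


def next_runnable(workflow):
    """Return the first component service that is not finished, or None.

    The composite service advances only when the preceding component
    reports DONE, so progress is driven by service state.
    """
    for service in workflow:
        if service.status is ServiceState.FAILED:
            raise RuntimeError("component service failed: " + service.wsdl_schema)
        if service.status is not ServiceState.DONE:
            return service
    return None


decompose = ServiceDefinition("decompose.wsdl", {"topic": "domain-decomposition"})
schedule = ServiceDefinition("schedule.wsdl", {"topic": "task-scheduling"})
workflow = [decompose, schedule]
```

Here `next_runnable(workflow)` returns `decompose` until its status is set to `ServiceState.DONE`, after which it returns `schedule`.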
The reuse and composition of services rely on a publish-subscribe model that is
accessed through Gateway provisioning services. The publish-subscribe model is a
topic-based implementation in which service classification metadata is considered as
the topic description of a service. The current implementation of the publish-subscribe
model uses a centralized method based on databases. The user environment is implemented
using Web portal technologies, following the Java Portlet API standard (JSR-168
2008, also see Yang et al. 2007). Each execution of a geographical analysis service
is presented in the portal as a portal layout that is configured and rendered dynamically.
This layout includes a set of portlets responsible for rendering user interfaces of
the services involved.
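A topic-based publish-subscribe model backed by a central database can be sketched as follows. This is an assumption-laden illustration rather than the Gateway's provisioning service: the class `ServiceRegistry`, its table layout, and the topic strings are hypothetical, and an in-memory SQLite database stands in for whatever database the Gateway uses.

```python
import sqlite3


class ServiceRegistry:
    """Hypothetical centralized, database-backed publish-subscribe registry."""

    def __init__(self):
        # An in-memory database keeps the sketch self-contained.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE services (name TEXT, topic TEXT)")
        self.db.execute("CREATE TABLE subscriptions (subscriber TEXT, topic TEXT)")

    def publish(self, name, topic):
        """Register a service under its topic and return the subscribers to notify."""
        self.db.execute("INSERT INTO services VALUES (?, ?)", (name, topic))
        rows = self.db.execute(
            "SELECT subscriber FROM subscriptions WHERE topic = ? ORDER BY subscriber",
            (topic,),
        )
        return [row[0] for row in rows]

    def subscribe(self, subscriber, topic):
        """Subscribe to a topic and return the services already published on it."""
        self.db.execute("INSERT INTO subscriptions VALUES (?, ?)", (subscriber, topic))
        rows = self.db.execute(
            "SELECT name FROM services WHERE topic = ? ORDER BY name", (topic,)
        )
        return [row[0] for row in rows]


registry = ServiceRegistry()
```

In this sketch the service classification metadata is reduced to a single topic string; a richer classification would need a matching rule in place of the equality test.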
5. Case study
The Gateway currently supports three types of computationally intensive geographical
analysis: Bayesian geostatistical modeling (Yan et al. 2007), detection of local spatial
clustering (Wang et al. 2008), and inverse distance weighted spatial interpolation (Wang
and Armstrong 2003). Other geographical analysis services (e.g., simulation models; see
Wang and Zhu 2008) are being developed. In this section we present a case study on
incorporating the Gi*(d) spatial statistic for detecting local clustering as a composite
service into the Gateway for shared access by the GIScience community. A second case
study shows how the Gateway supports shared access to the integrated computation of
collaborative Bayesian geostatistical modeling using tightly coupled high-performance
computers individually, as well as collectively as Grids.
5.1 The Gi*(d) spatial statistic
Gi*(d), a spatial statistical approach originally introduced by Getis and Ord (1992), is
a measure of the local spatial association among point-referenced observations. This
statistic has been widely applied to identify clusters of points (i.e., hot spots) where
values of a variable are significantly high or low compared with the remaining
geographical region. Gi*(d) focuses on local spatial structure, in contrast to popular
global assessment measures, such as Geary’s C and Moran’s I (Cliff and Ord 1973).
All pair-wise distances between measurement locations must be computed to obtain
Gi*(d). Consequently, Gi*(d) analysis consumes significant memory and is
computationally intensive when the number of locations is large. Wang et al. (2008)
have developed a parallel processing method to compute the Gi*(d) spatial statistic
using the TeraGrid. It is beyond the scope of this case study to investigate this
parallelization method; instead, our focus is on how the method is adapted to the
Gateway services for future access and sharing through the user environment.
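To illustrate why the pair-wise distance requirement makes Gi*(d) expensive, the sketch below computes the unstandardized statistic sequentially. It assumes binary distance-band weights (a location j counts toward location i when their distance is at most d, with i itself included) and strictly positive values, so that Gi*(d) at i is the sum of values within distance d divided by the sum of all values. The function name `gi_star` is hypothetical, and this is neither the parallel method of Wang et al. (2008) nor the standardized z-score form.

```python
import math


def gi_star(points, values, d):
    """Unstandardized Gi*(d) for each location, using a binary distance band.

    points: list of (x, y) coordinates.
    values: positive observations, one per point.
    d:      distance threshold; a point is always within d of itself.
    """
    total = sum(values)
    result = []
    for xi, yi in points:
        local = 0.0
        # Every location is compared with every other one: O(n^2) distances.
        for (xj, yj), value in zip(points, values):
            if math.hypot(xi - xj, yi - yj) <= d:
                local += value
        result.append(local / total)
    return result


scores = gi_star([(0, 0), (1, 0), (10, 0)], [4, 2, 2], d=1.5)
```

For n locations the double loop evaluates n² distances, which is the cost that motivates decomposing the spatial domain before distributing the computation.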
5.1.1 Gi*(d) service. The Gi*(d) service is based on existing and new services that
support the specific needs of Gi*(d) analysis (see Figure 6). The execution of a Gi*(d)
service instance can be summarized in the following steps:
(1) Select service hosts and services;
(2) Get TeraGrid security proxy from a Gateway community account;
(3) Create a Gi*(d) data set through data simulation/query services;
(4) Transfer data sets from the portal to a domain decomposition service host;
(5) Invoke a domain decomposition service (including the selection of optimal
spatial decomposition strategies);
(6) Select TeraGrid computing resources;
(7) Invoke a task scheduling service to create a runtime schedule;
(8) Transfer decomposed data to selected TeraGrid resources;
(9) Execute the schedule on selected TeraGrid resources and monitor computation
progress; and
(10) Transfer results from TeraGrid resources to the Gateway portal.
Figure 6. Gi*(d) service execution (DC: domain decomposition; TS: task scheduling; RFT: reliable file transfer service; WS-GRAM: Web service-based Grid resource allocation and management; WS-AA: Web service-based authentication and authorization).
Reused services include TeraGrid services (2, 4, 6, 8, 9, and 10 in Figure 6), service
infrastructure services (1 in Figure 6), and geographical analysis services (5 and 7 in
Figure 6). Users interact with these services through interfaces defined as Velocity
(http://velocity.apache.org) templates and rendered in the Gi*(d) portal layout (see
Figure 7A and B).
The current implementation of the domain decomposition service is specific to Gi*(d)
analysis (Figure 8). Additional domain decomposition strategies may be contributed
as new services, and should include a spatial computational intensity measure
for each decomposed part of the corresponding analysis. Task-scheduling services
require this type of measure to evaluate load balance in using TeraGrid resources.
Both domain decomposition and task-scheduling services provide user interfaces
for choosing alternative decomposition and scheduling strategies. Figure 6 illustrates
interactions among the component services. On the user portal side, six
portlets are designed for users to interact with the component services (see
Figure 7A and B). Depending on the progress of a Gi*(d) analysis, user interaction
focus is dynamically placed on a particular portlet by maximizing or highlighting
the portlet.
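One way a task-scheduling service could use a per-part computational intensity measure to balance load is a greedy longest-processing-time heuristic, sketched below. This heuristic is swapped in for illustration; the article does not state which scheduling strategy the Gateway uses, and the function `schedule_parts` and the sample intensities are hypothetical.

```python
import heapq


def schedule_parts(intensities, n_resources):
    """Greedy longest-processing-time assignment of decomposed parts.

    intensities: mapping of part id -> computational intensity estimate.
    Returns one (total_load, [part ids]) pair per resource, in resource order.
    """
    # Heap entries are (current load, resource index, assigned parts); the
    # unique resource index breaks ties before the lists are ever compared.
    heap = [(0.0, r, []) for r in range(n_resources)]
    heapq.heapify(heap)
    for part, cost in sorted(intensities.items(), key=lambda kv: -kv[1]):
        load, r, parts = heapq.heappop(heap)      # least-loaded resource
        parts.append(part)
        heapq.heappush(heap, (load + cost, r, parts))
    return [(load, parts) for load, r, parts in sorted(heap, key=lambda t: t[1])]


plan = schedule_parts({"a": 8, "b": 5, "c": 4, "d": 3}, n_resources=2)
```

With these sample intensities the two resources end up with loads of 11 and 9, which shows why the scheduler needs an intensity estimate for every decomposed part.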
5.1.2 Service security overhead. Interactions between services and users through
portlets add overhead to the performance of Gi*(d) analysis. Service overhead is
introduced, in no particular order, by
(1) communications between portal server and service host, which are negligible
compared with other sources of impact;
(2) security added to message exchange;
(3) middleware implementation of Web services; and
(4) Web server performance of processing XML-based requests.
Although the first type of overhead is often negligible and the third and fourth
types can be well managed in production-quality systems (Tian et al. 2003,
Humphrey et al. 2005), the second type remains the most important factor.
The Gateway’s security assurance must be implemented within all types of services to
enable collaborative geographical analysis services to be shared online. We therefore
focus on assessing the service overhead added by security. Gateway service
interactions are secured using X.509 certificate-based authentication, authorization,
and message encryption. Specifically, an X.509 certificate is used to create
symmetric keys for message encryption and to sign messages for mutual authentication
between senders and receivers of service messages. Authorization is based on GSI
identity mapping (Welch et al. 2004).
In this case study, security-related overhead is measured using average round-trip
service response time. Three scenarios were designed for the evaluation of security-
related overhead: no security, transport-level security, and message-level security.
When the sender and receiver of a message are located within a single protected
network, network communication performance can be improved using plain
HTTP (given sufficient protection enforcement on the network). This no-security
scenario applies only if the Gateway user environment coexists with service hosts in a
protected network environment with firewalls for external access. Transport-level
security establishes secure HTTP communication channels without support for delegated
message delivery, which is needed when computation job submission does not
directly communicate with execution sites (e.g., inter-Grid job submission). If a
composite service involves frequent communications between service containers,
transport-level security can be used to secure the entire communication channel to
reuse channel-based security contexts on all messages carried over the channel.
Message-level security supports end-to-end message delivery by applying digital
signatures and encryption to messages and is, therefore, suitable for composite
services in which a message traverses multiple service containers.
Figure 7. (A and B) User environment for Gi*(d) analysis.
Figure 8. Domain decomposition service definition for Gi*(d) spatial statistic.
A set of experiments was conducted in a local area network environment with a
service host and a group of service requesters. A Gi*(d) domain decomposition service
was used (Wang et al. 2008). Average response time (excluding computation time) was
measured as a function of the number of requesters who invoke the service simultaneously.
As illustrated in Figure 9, security-enabled message communications
increase the service response time by an order of magnitude from tens of milliseconds
without security to hundreds with security. When security is added, message-level
security has larger overhead because of its message-by-message encryption and signing.
Transport-level security leads to less overhead because encryption and signing are
streamlined on all bits transferred through communication channels. On the basis of
these experimental results, communications between the Gateway portal and service
hosts, which involve frequent service invocations, should use transport-level security.
A composite service that invokes services hosted by different service containers should
use message-level security to sign and encrypt Simple Object Access Protocol (SOAP)
messages instead of creating multiple secure communication channels between the
portal and the host services.
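The measurement described above, average round-trip response time as a function of the number of simultaneous requesters, can be sketched with a thread pool. This is a hedged illustration of the measurement idea only: `average_response_time` and `stub_service` are hypothetical names, and the stub merely sleeps instead of invoking a secured Web service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def average_response_time(invoke, n_requesters):
    """Mean round-trip time in milliseconds for n simultaneous requesters."""

    def timed_call(_):
        start = time.perf_counter()
        invoke()
        return (time.perf_counter() - start) * 1000.0

    # One worker per requester so that all invocations overlap in time.
    with ThreadPoolExecutor(max_workers=n_requesters) as pool:
        times = list(pool.map(timed_call, range(n_requesters)))
    return sum(times) / len(times)


def stub_service():
    """Stand-in for a service invocation (hypothetical): waits about 10 ms."""
    time.sleep(0.01)


mean_ms = average_response_time(stub_service, n_requesters=4)
```

Repeating the call for increasing `n_requesters` and for each security scenario would yield curves of the kind shown in Figure 9, provided `invoke` wraps a real service request.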
5.2 Bayesian geostatistical modeling
Compared to classical geostatistical modeling, Bayesian geostatistical modeling provides
realistic estimation of prediction error as well as the ability to combine information
from disparate data sources (Cowles et al. 2002). However, Bayesian geostatistical
modeling, when using Markov chain Monte Carlo (MCMC) methods, is even more
computationally intensive than conventional geostatistical modeling (Cowles 2003).
Because an MCMC sampler must be run for thousands of iterations, each requiring
computationally intensive linear algebra operations, the runtime for sequential
Bayesian geostatistical modeling algorithms can be prohibitive. Even with parallel
MCMC algorithms running on a single high-performance computer, runtime may be
unacceptable, especially for large geographical data sets.
Figure 9. Performance evaluation of service security overhead (ms: millisecond).
5.2.1 Bayesian geostatistical modeling service. We focus on demonstrating how
Bayesian geostatistical modeling is incorporated into the Gateway so that the
GIScience community can share the modeling approach enabled by the TeraGrid.
Unlike the componentization of the Gi*(d) service, which produces computation
jobs based on the Grid architecture, the componentization of Bayesian geostatistical
modeling addresses the efficient design of a composite service that supports both tightly
and loosely coupled parallelization within a single geographical analysis. To enable the
sharing of Bayesian geostatistical modeling among numerous users, the Gateway exploits
this two-level parallelism, manages MPICH-G2 (Karonis et al. 2002) execution across
multiple TeraGrid high-performance computers, and harnesses a significant amount of
computational resources based on the unified TeraGrid software environment (Figure 10).
Component services are devised to assure computational resource availability, failure
recovery of an analysis process, and scalable use of a set of matrix manipulation libraries
customized to individual TeraGrid resources. The analysis is also implemented to support
interactive parameter tuning and Markov chain convergence diagnostics during
execution. The selection of statistical parameters can be refined during the analysis.
Therefore, the componentization considers how to manage and present intermediate
results and provides stop-resume support during analysis computation.
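Stop-resume support for an MCMC analysis amounts to checkpointing the chain's current value, its samples so far, and the random number generator state. The sketch below shows the idea with a toy random-walk Metropolis sampler for a normal mean (unit variance, flat prior). It is an assumption made for illustration, far simpler than a Bayesian geostatistical model, and the names `run_chain` and `log_posterior` are hypothetical.

```python
import math
import random


def log_posterior(mu, data):
    """Log-posterior of a normal mean with unit variance and a flat prior."""
    return -0.5 * sum((x - mu) ** 2 for x in data)


def run_chain(data, n_iter, state=None, seed=0, step=0.5):
    """Run or resume a random-walk Metropolis chain.

    `state` is a checkpoint returned by an earlier call; passing it back
    resumes the chain exactly where it stopped.
    """
    rng = random.Random(seed)
    if state is None:
        state = {"mu": 0.0, "samples": [], "rng": rng.getstate()}
    rng.setstate(state["rng"])                 # restore the generator state
    mu, samples = state["mu"], list(state["samples"])
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        log_ratio = log_posterior(proposal, data) - log_posterior(mu, data)
        # Accept with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            mu = proposal
        samples.append(mu)
    return {"mu": mu, "samples": samples, "rng": rng.getstate()}


data = [1.0, 1.2, 0.8, 1.1]
uninterrupted = run_chain(data, 200)
paused = run_chain(data, 100)                   # stop after 100 iterations
resumed = run_chain(data, 100, state=paused)    # resume for 100 more
```

Because the checkpoint carries the generator state, the resumed chain reproduces the uninterrupted one; a user who changes `step` or other settings before resuming would instead continue from the same point with the new configuration.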
The Bayesian geostatistical modeling service is composed of component services
categorized as TeraGrid, provisioning, and geospatial services [based on the same
componentization process as for the Gi*(d) service]. Specific to the Bayesian geostatistical
modeling service, a new computation management service is developed to
coordinate computation execution based on MPICH-G2. This service invokes
MPICH-G2 on a selected TeraGrid computer. MPICH-G2 then launches MPI jobs
on the reserved TeraGrid computers. User interfaces for specifying Bayesian inference
parameters and TeraGrid resource reservation are defined in the service.
5.2.2 Collaborative Bayesian geostatistical modeling. The Bayesian geostatistical
modeling service is developed to enable collaborative analysis based on the Gateway
VO capabilities to share data, analysis progress, and results among Gateway users
(through the TeraGrid community account model). Because the specification of prior
distributions and initial parameter values and the diagnostics of Markov chains require
iterative refining and expert knowledge for analysis, collaboration among users becomes
desirable to cross-validate the analysis and expedite the convergence of analysis
computation. This service allows a user to form dynamic VOs with collaborators to
co-analyze a shared data set. Each user can specify initial parameter values, prior
distributions, and the number of MCMC iterations, and choose to share initial specifications,
dynamic MCMC convergence information, and results with selected collaborators. Each
user can then compare analysis information with her/his collaborators. A particular
analysis instance can be interactively stopped and resumed by its owner to refine
parameter values based on comparison among collaborators.
Figure 10. Using TeraGrid to compute Bayesian geostatistical modeling.
5.2.3 Experiment. An experiment was designed to demonstrate how the Bayesian
geostatistical modeling service enables a VO of 30 users to collaboratively analyze a
data set by aggregating a substantial amount of TeraGrid resources (Figure 11A and B).
Figure 11. (A and B) User interface for the Bayesian geostatistical modeling service.
Each user ran 5–10 Markov chains, each of which required 16 CPUs and approximately
10 hours of computation. In total, 38,000 CPU hours were contributed by three TeraGrid
sites to support this experiment. When a user logs in to the Gateway user environment, an
integrated interface is rendered for the composite service to provide a comprehensive view
of computation progress, analysis configurations, results, and collaborative space. Each
user can view the analysis progress (including parameters, runtime results, and visualization)
of other participants through a collaboration interface (i.e., the ‘Similar Analyses’ area in
Figure 11A). The content of the collaboration interface is rendered and updated dynamically
and asynchronously by the backend service using AJAX technology. For example,
user3 was at the beginning of an analysis that provides early stage visualization of
modeling parameters. Through the collaboration interface, user1 was found to have
achieved converged results. Therefore, the information about user1’s analysis (i.e., parameter
settings and plots that show Markov chain diagnosis information) was examined by
user3. By validating and/or learning from user1’s results, user3 can further tune an
in-progress computation by stopping it (marked ‘paused by user’ in the collaboration
interface), supplying a modified configuration file, and resuming the computation. This
experiment shows that a sizable group collaborated on a single analysis through the use of
the Gateway VO support for the dynamic sharing of analysis results and computation
progress. This collaborative analysis allows self-organizing contributions of expert
knowledge and exploration of modeling parameter space. Without imposing any overhead
on end users, the service is able to aggregate significant computational power and
facilitate a collaborative analysis to achieve high-quality results that individual users
might not achieve on their own.
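The reported total can be checked against the per-chain figures with a few lines of arithmetic. The sketch below uses only the numbers stated above (30 users, 5–10 chains each, 16 CPUs and roughly 10 hours per chain); the variable and function names are hypothetical, and the implied average number of chains is a derived estimate, not a figure from the experiment.

```python
USERS = 30
CPUS_PER_CHAIN = 16
HOURS_PER_CHAIN = 10            # approximate, as reported
REPORTED_CPU_HOURS = 38_000


def cpu_hours(chains_per_user):
    """Total CPU hours if every user ran the given number of chains."""
    return USERS * chains_per_user * CPUS_PER_CHAIN * HOURS_PER_CHAIN


low, high = cpu_hours(5), cpu_hours(10)             # 24,000 and 48,000
implied_chains = REPORTED_CPU_HOURS / cpu_hours(1)  # about 7.9 chains per user
```

The reported 38,000 CPU hours falls between the two bounds and corresponds to an average of roughly eight chains per user under these approximations.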
6. Conclusion
A primary purpose of this research is to develop a shared, decentralized geographical
analysis environment in which a large number of GIScience community users can
contribute and share new analysis services and reuse existing services. The SOA
enables the Gateway to adapt to changes in geographical analysis methods and
their underlying TeraGrid capabilities by conforming to interoperable service standards.
The user environment is implemented to emphasize support for users to carry
out service-based geographical analysis. DGIP methods are incorporated as generalized
services to enable scalable access to TeraGrid resources. The generalization is
derived from a theoretical construct, the spatial computational domain representation
developed by Wang and Armstrong (2009).
Currently, the Gateway has implemented an online collaborative DGIP environment
that is shared by hundreds of users from multiple disciplines (e.g., biology,
computer science, geography, linguistics, public health, and statistics) who are interested
in advancing GIScience or using CI and GIS. Three types of computationally
intensive geographical analyses (Bayesian geostatistical modeling, detection of local
spatial clustering, and inverse distance weighted spatial interpolation) are made
available as production-level services. Gateway users collaborate by contributing
and sharing component and composite services, and working together on a common
analysis such as the Bayesian geostatistical modeling.
Two case studies are used to demonstrate the utility of the Gateway functions and
user environment. The studies prove the feasibility of developing generic geospatial
CI services that provide CI-based collaborative geographical analysis functions
shared by numerous users while managing the complexity of CI hardware and
software environments. This feasibility shows that GIScience enhances CI by coupling
GIS and geographical analysis as generic geospatial CI services, and further
indicates that CI empowers GIScience by enabling high-performance, distributed,
and collaborative geographical analysis.
In summary, our Gateway experience has demonstrated the feasibility of establishing
an online CI-based environment in which many users can dynamically create a VO
(i.e., collaboratory) to contribute and share geospatial services, and collaborate on
spatial analysis investigations that harness enormous CI resources. On the basis of an
SOA, the Gateway makes it straightforward for domain scientists to run and share
their own CI-empowered geographical analysis without having to resolve significant
challenges of managing CI complexity. Its architecture is designed to be open for
incorporating new CI and geographical analysis services and scalable to a large
number of users and analyses, massive CI resources, and significant computational
intensity of geographical analysis.
7. Discussions
Past experiences in bridging CI and domain sciences suggest that participation of
domain scientists is critical to developing CI and realizing its significant impacts
(Wang and Zhu 2008). This article presents an SOA framework for holistically bridging
CI and GIScience by describing the TeraGrid GIScience Gateway, a Web-based
GIS environment. The Gateway manages the complexity of the TeraGrid and provides
a collaborative problem-solving environment for GIScience community
users to conduct geographical analysis.
On the basis of the SOA, a novel strategy of this research is to transform a
geographical analysis into a composite service that can invoke other component
services. This strategy is key to meeting the requirements for the Gateway service
infrastructure and user environment. The service infrastructure is based on a
publish-subscribe model to support service composition, provisioning, and execution. The
user environment is developed by extending the service interface definition to include user
interface metadata. This extension allows users to directly interact with the Gateway
services at the individual service level. To achieve highly configurable and interactive
service management interfaces, Web 2.0 technologies are used to implement this
support for user-service interactions.
The transformation of geographical analysis into geospatial CI services in the
Gateway is achieved in the context of integrating the following CI and GIScience
capabilities:
• high-performance computing, VO and Grid computing, and data and visualization
capabilities from CI perspectives; and
• parallel and distributed processing of geographical analysis, service-oriented
Web GIS architecture, and collaborative geographical analysis capabilities
from GIScience perspectives.
This integration realizes a key vision of geospatial CI, emphasizing that geospatial
CI should provide geographical analysis services without exposing individual CI
components to end users. Geographical analysis services are viewed as an integral
part of geospatial CI that encapsulates the aforementioned CI and GIScience capabilities.
Geographical analysis services empowered by CI through the Gateway are
discussed in this section from three perspectives: 1) mechanisms to convert new
geographical analysis into reusable services, 2) strategies to achieve service interoperability
to enable a user-centric online geographical analysis environment, and 3)
service and workflow management. In addition, education and outreach experience
using the Gateway is summarized. Future work is described to advance the Gateway
toward a CI-empowered online DGIP environment that is customizable to satisfy
individual user preferences and meet collaboration needs.
7.1 Submit new services to the Gateway
The scalable Gateway SOA provides an open architecture for users to contribute their
geographical analysis as interoperable services. To convert a new analysis into a
Gateway service, Web-based tools are provided to automatically create component
services for accessing TeraGrid resources, encapsulating the analysis logic as a standard
Web service, and interacting with the user environment. Future research aims to
improve the Gateway scalability by continuing tool development to reduce the overhead
of transforming legacy geographical analysis into the service infrastructure and
user environment. A major challenge for computationally intensive geographical analysis
is to address parallel and distributed processing. We are investigating sophisticated
geographical analyses such as agent-based modeling and spatial evolutionary algorithms
to understand the issues involved in automatically incorporating such
computationally intensive analyses into the Gateway as services.
7.2 Service interoperability
To make the Gateway an open and scalable platform for numerous users to contribute
and share geographical analysis that is empowered by CI, the conversion tools
are compliant with the Geospatial Web Processing Service Specifications (developed
by the Open Geospatial Consortium, OGC) in terms of basic service description and
profiles. In addition, the Gateway’s data and visualization services adopt the Keyhole Markup
Language (KML) standard that has been accepted as an OGC standard (effective
version 2.2). Although such compliance or adoption is key to service interoperability
and an open and collaborative environment for geographical analysis, Grid computing
standards developed within the Open Grid Forum are necessary to enable full
integration of geographical analysis services with generic CI services. Recently, the
first OGC and Open Grid Forum Collaboration Workshop focused on exploring the
use of the Geospatial Web Processing Service that bridges Grid computing services.
We expect the Gateway services developed in this research to serve as initial best
practices for interoperable CI-based geographical analysis services.
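As an illustration of what adopting KML for data and visualization services can look like, the sketch below writes a single analysis location as a KML 2.2 Placemark using only the Python standard library. The function `placemark_kml` and the sample name and coordinates are hypothetical; the article does not describe how the Gateway generates its KML.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"   # namespace of OGC KML 2.2


def placemark_kml(name, lon, lat):
    """Return a KML 2.2 document string with a single point Placemark."""
    ET.register_namespace("", KML_NS)        # serialize with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML orders coordinates as longitude,latitude[,altitude].
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


document = placemark_kml("Sample site", -88.2, 40.1)
```

Because the output is plain KML, any KML-aware viewer can display it without knowledge of the service that produced it, which is the interoperability benefit discussed above.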
7.3 Service and workflow management
The orchestration and execution of geographical analysis services handle service–service
and user–service interactions, and sequentially invoke component services based
on analysis logic. Analysis logic is represented as a directed acyclic graph. The
interpretation of such graphs is handled by the publish-subscribe model through
provisioning services. Various evolving Web service-based workflow approaches
(e.g., Krishnan et al. 2002, BPEL 2008, WSFL 2008, XLANG 2008) are being
investigated to link composite services in support of more sophisticated geospatial
decision-making and problem-solving. In addition, the use of CI resources requires
asynchronous service invocation on computation job submissions and data transfers
because time-consuming jobs and large data transfers are common.
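Representing analysis logic as a directed acyclic graph and invoking component services sequentially can be sketched with a topological sort. The example assumes Python 3.9 or later for the standard `graphlib` module; the function `run_workflow` and the service names are hypothetical, and the sketch invokes services synchronously rather than modeling asynchronous job submission.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, invoke):
    """Invoke each component service after all of its predecessors.

    dependencies: mapping of service name -> set of services it depends on.
    invoke:       callable taking a service name; stands in for a real call.
    """
    order = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for service in TopologicalSorter(dependencies).static_order():
        invoke(service)
        order.append(service)
    return order


analysis_logic = {
    "decompose": {"transfer_in"},
    "schedule": {"decompose"},
    "execute": {"schedule"},
    "transfer_out": {"execute"},
}
invoked = []
order = run_workflow(analysis_logic, invoked.append)
```

For this linear graph the services run in the order transfer_in, decompose, schedule, execute, transfer_out; graphs with independent branches admit several valid orders.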
7.4 Education and outreach
The Gateway has been used in a series of graduate and undergraduate courses
(Table 2) to teach CI, GIScience, and spatial statistics. The four GIS courses were
taught by GIScience faculty, whereas the two statistics courses were taught by
statistics faculty. Our preliminary findings from these courses suggest that the
Gateway may promote the active participation of students in learning CI,
GIScience, and geographical analysis, and enable the training of skills for collaborative
problem solving and decision making. In addition, the Gateway was used as a
TeraGrid-based geospatial problem-solving environment for a student research competition
at the recent Second TeraGrid Annual Conference. Our Gateway, providing an
infectious disease risk-assessment problem based on spatial interpolation analysis,
was ranked the most reliable gateway. According to feedback from conference
organizers, students found the user environment user-friendly and were
impressed by the power of coupling GIS and TeraGrid to solve scientific problems.
Besides showing the successful bridging between CI and GIScience
within education contexts, these experiences have helped stress-test the Gateway’s
scalability to many users and TeraGrid resources.
Acknowledgments
This research was supported in part by the NSF through the award OCI-0503697 and
the TeraGrid computation resource award TG-SES070007N. We thank Dr. Wenwu
Tang at the University of Illinois at Urbana-Champaign and Dr. Jun Yan at the
University of Connecticut for their insightful comments. We are also grateful for
the insightful comments of the editors and three anonymous reviewers.
References
AJAX, 2008, AJAX, available online at: http://developer.mozilla.org/en/docs/AJAX (accessed
January 2008).
ARMSTRONG, M.P. and DENSHAM, P.J., 1992, Domain decomposition for parallel processing of
spatial problems. Computers, Environment and Urban Systems, 16, pp. 497–513.
ATKINS, D.E., et al., 2003, Revolutionizing Science and Engineering through Cyberinfrastructure:
Report of the National Science Foundation Blue-Ribbon Advisory Panel on
Cyberinfrastructure, available online at: http://www.communitytechnology.org/
nsf_ci_report/ (accessed January 2008).
Table 2. Gateway course information.
Course name (times used)          Typical number of students   Level
Introduction to GIS (once)        30                           Primarily undergraduate
Foundations of GIS (twice)        50                           Undergraduate
Principles of GIS (twice)         20                           Undergraduate and graduate
Advanced GIS (once)               20                           Undergraduate and graduate
Bayesian Statistics (once)        20                           Undergraduate and graduate
Computing in Statistics (once)    20                           Primarily graduate
BIEBERSTEIN, N., et al., 2005, Impact of service-oriented architecture on enterprise systems,
organizational structures, and individuals. IBM Systems Journal, 44, pp. 691–708.
BPEL, 2008, OASIS Web Services Business Process Execution Language, available online at:
http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html (accessed January 2008).
CATLETT, C., et al., 2007, TeraGrid: analysis of organization, system architecture, and middleware
enabling new types of applications. In L. Grandinetti (Ed.), HPC and Grids in
Action, Advances in Parallel Computing Series, pp. 9–18 (Amsterdam: IOS Press).
CLIFF, A.D. and ORD, J.K., 1973, Spatial Autocorrelation (London, UK: Pion Press).
COWLES, M.K., 2003, Efficient model-fitting and model-comparison for high-dimensional Bayesian
geostatistical models. Journal of Statistical Planning and Inference, 112, pp. 221–239.
COWLES, M.K., et al., 2002, Combining snow water equivalent data from multiple sources to estimate
spatio-temporal trends and compare measurement systems. Journal of Agricultural,
Biological, and Environmental Statistics, 7, pp. 536–557.
CTSS (Coordinated TeraGrid Software and Services), 2008, available online at:http://www.ter-
agrid.org/userinfo/software/ctss.php (accessed January 2008).
DENSHAM, P.J. and ARMSTRONG, M.P., 1998, Spatial analysis. In R. Healey, S. Dowers, B.,
Gittings and M.Mineter (Eds), Parallel Processing Algorithms for GIS, pp.387–413
(Bristol, PA: Taylor and Francis).
DING, Y. and DENSHAM P.J., 1996, Spatial strategies for parallel spatial modeling. International
Journal of Geographical Information Systems, 10, pp. 669–698.
EUGSTER, P.Th., et al., 2003, The many faces of publish/subscribe. ACM Computing Surveys,
35(2), pp. 114–131.
FONSECA, F.T., et al., 2002, Using ontologies for integrated geographic information systems.
Transactions in GIS, 6(3), pp. 231–257.
FOSTER, I., 2006, Globus Toolkit Version 4: Software for service-oriented systems. In H. Jin,
D.A. Reed and W. Jiang (Eds), Proceedings of the IFIP International Conference on
Network and Parallel Computing, Tokyo, Japan, pp. 2–13 (Berlin: Springer).
GEON, 2008, Geosciences Network, available online at: http://www.geongrid.org/ (accessed
January 2008).
GETIS, A. and ORD, J.K., 1992, The analysis of spatial association by use of distance statistics.
Geographical Analysis, 24, pp. 189–206.
GOODCHILD, M.F., 2007, Citizens as voluntary sensors: spatial data infrastructure in the world of
Web 2.0. International Journal of Spatial Data Infrastructures Research, 2, pp. 24–32.
HUANG, H.-C., CRESSIE, N. and GABROSEK, J., 2002, Fast, resolution-consistent spatial
prediction of global processes from satellite data. Journal of Computational and Graphical
Statistics, 11, pp. 63–88.
HUMPHREY, M., et al., 2005, State and events for Web services: a comparison of five WS-resource
framework and WS-notification implementations. In Proceedings of the 14th IEEE
International Symposium on High Performance Distributed Computing (HPDC-14),
24–27 July 2005, Research Triangle Park, NC, pp. 3–13.
JSR-168, 2008, JSR-168 Portlet API, available online at:
http://developers.sun.com/portalserver/reference/techart/jsr168/ (accessed January 2008).
KARONIS, N., TOONEN, B. and FOSTER, I., 2002, MPICH-G2: a Grid-enabled implementation of
the message passing interface. Journal of Parallel and Distributed Computing (JPDC),
63(5), pp. 551–563.
KRISHNAN, S., WAGSTROM, P. and LASZEWSKI, G.V., 2002, GSFL: a workflow framework for grid
services. Technical Reports (Argonne, Chicago, IL: Argonne National Laboratory),
available online at: http://www-unix.globus.org/cog/papers/gsfl-paper.pdf (accessed
January 2008).
KRZANOWSKI, R.M. and RAPER, J., 1999, Hybrid genetic algorithm for transmitter location in
wireless networks. Computers, Environment and Urban Systems, 23, pp. 359–382.
LEAD, 2008, Linked Environments for Atmospheric Discovery, available online at:
https://portal.leadproject.org/gridsphere/gridsphere?cid=lead-grid (accessed January 2008).
MALANSON, G.P. and ARMSTRONG, M.P., 1996, Dispersal probability and forest diversity in a
fragmented landscape. Ecological Modelling, 87, pp. 91–102.
NEON, 2003, NEON: Addressing the Nation’s Environmental Challenges (Washington, DC:
National Academies Press).
NEON, 2008, National Ecological Observatory Network, available online at:
http://www.neoninc.org/ (accessed January 2008).
NSF (National Science Foundation), 2007, Cyberinfrastructure Vision for 21st Century
Discovery, available online at: http://www.nsf.gov/pubs/2007/nsf0728/index.jsp
(accessed January 2008).
SEEK, 2008, The Science Environment for Ecological Knowledge, available online at:
http://seek.ecoinformatics.org/ (accessed March 2008).
TERAGRID, 2008, TeraGrid Community Account Policy, available online at:
http://www.teragridforum.org/mediawiki/images/8/81/TGD-10.doc (accessed January 2008).
TIAN, M., et al., 2003, Performance impact of Web services on Internet servers. In T. Gonzalez
(Ed.), Proceedings of 2003 Parallel and Distributed Computing and Systems, November 2003,
pp. 162–184 (Marina del Rey, CA: ACTA).
TSOU, M.H. and BUTTENFIELD, B.P., 2002, A dynamic architecture for distributing geographic
information services. Transactions in GIS, 6(4), pp. 355–381.
WANG, S., 2008, Formalizing computational intensity of spatial analysis. In Proceedings of the
5th International Conference on Geographic Information Science, 23–26 September, Park
City, UT.
WANG, S. and ARMSTRONG, M.P., 2003, A quadtree approach to domain decomposition for
spatial interpolation in grid computing environments. Parallel Computing, 29, pp.
1481–1504.
WANG, S. and ARMSTRONG, M.P., 2009, A theoretical approach to the use of cyberinfrastructure
in geographical analysis. International Journal of Geographical Information Science,
23(2), pp. 169–193.
WANG, S., et al., 2005, GISolve: a Grid-based problem solving environment for
computationally intensive geographic information analysis. In Proceedings of the 14th International
Symposium on High Performance Distributed Computing (HPDC-14) – Challenges of
Large Applications in Distributed Environments (CLADE) Workshop, pp. 3–12
(Research Triangle Park, NC: IEEE Press).
WANG, S., COWLES, M.K. and ARMSTRONG, M.P., 2008, Grid computing of spatial statistics:
using the TeraGrid for Gi*(d) analysis. Concurrency and Computation: Practice and
Experience, 20(14), pp. 1697–1720.
WANG, S. and ZHU, X.-G., 2008, Coupling cyberinfrastructure and geographic information
systems to empower ecological and environmental research. BioScience, 58(2), pp. 94–95.
WATERS, 2008, WATer and Environmental Research Systems Network (WATERS Network),
available online at: http://www.watersnet.org/ (accessed January 2008).
WELCH, V., et al., 2004, X.509 proxy certificates for dynamic delegation. In Proceedings of the
3rd Annual Public Key Infrastructure (PKI) Research and Development Workshop,
12–14 April, Gaithersburg, MD.
WILKINS-DIEHR, N., 2007, Special issue: science Gateways – common community interfaces to Grid
resources. Concurrency and Computation: Practice and Experience, 19(6), pp. 743–749.
WRIGHT, D.J., et al., 2003, Why Web GIS may not be enough: a case study with the Virtual
Research Vessel. Marine Geodesy, 26(1–2), pp. 73–86.
WSFL, 2008, Web Service Flow Language, available online at: http://www.ebpml.org/wsfl.htm
(accessed April 2008).
WSRF, 2008, WSRF, available online at: http://www.globus.org/wsrf/ (accessed January 2008).
XLANG, 2008, XLANG, available online at: http://www.ebpml.org/xlang.htm (accessed April
2008).
YAN, J., et al., 2007, Parallelizing MCMC for Bayesian spatiotemporal geostatistical models.
Statistics and Computing, 17, pp. 323–335.
YANG, C. and RASKIN, R., 2009, Introduction to distributed geographic information processing
(DGIP) research. International Journal of Geographical Information Science, 23(5), pp.
553–560.
YANG, C., et al., 2005, Performance improving techniques in WebGIS. International Journal of
Geographical Information Science, 19(3), pp. 319–342.
YANG, C., et al., 2007, The emerging concepts and applications of the spatial web portal.
Photogrammetric Engineering & Remote Sensing, 73(6), pp. 691–698.
ZHANG, T. and TSOU, M.H., 2009, Developing a grid-enabled spatial Web portal for Internet
GIServices and geospatial cyberinfrastructure. International Journal of Geographical
Information Science, 23(5), pp. 605–630.