This article was downloaded by: [George Mason University] On: 06 July 2011, At: 12:40 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK International Journal of Geographical Information Science Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tgis20 TeraGrid GIScience Gateway: Bridging cyberinfrastructure and GIScience Shaowen Wang a & Yan Liu a a Department of Geography and National Center for Supercomputing Applications, University of Illinois at Urbana- Champaign, Urbana, IL, USA Available online: 22 Jun 2011 To cite this article: Shaowen Wang & Yan Liu (2009): TeraGrid GIScience Gateway: Bridging cyberinfrastructure and GIScience, International Journal of Geographical Information Science, 23:5, 631-656 To link to this article: http://dx.doi.org/10.1080/13658810902754977 PLEASE SCROLL DOWN FOR ARTICLE Full terms and conditions of use: http://www.tandfonline.com/page/terms-and- conditions This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan, sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.



Research Article

TeraGrid GIScience Gateway: Bridging cyberinfrastructure and GIScience

SHAOWEN WANG* and YAN LIU

Department of Geography and National Center for Supercomputing Applications,

University of Illinois at Urbana-Champaign, Urbana, IL, USA

(Received 29 January 2008; in final form 15 January 2009)

Cyberinfrastructure (CI) represents the integrated information and communication technologies for distributed information processing and coordinated knowledge discovery, and is promising to revolutionize how science and engineering are conducted in the twenty-first century. The value of bridging CI and GIScience is significant to advance CI and benefit GIScience research and education, particularly in distributed geographic information processing (DGIP). This article presents a holistic framework that bridges CI and GIScience by integrating CI capabilities to empower GIScience research and education and establish generic DGIP services supported by CI. The framework, the TeraGrid GIScience Gateway, is based on a CI science gateway approach developed on the National Science Foundation (NSF) TeraGrid, a key element of US and world CI. This gateway develops a unifying service-oriented framework with respect to its architecture, design, and implementation as well as its integration with the TeraGrid. The functions of the gateway focus on enabling parallel and distributed processing for geographical analysis, managing the complexity of TeraGrid software environment, and establishing a Web-based GIS for the GIScience community to gain shared and collaborative access to TeraGrid-based geospatial processing services. The gateway implementation uses Web 2.0 technologies to create a highly configurable and interactive multiuser environment. Two case studies, Bayesian geostatistical modeling and a spatial statistic $G_i^*(d)$ for detecting local clustering, are used to demonstrate the gateway functions and user environment. The service transformation for these analyses is applied to create a shared, decentralized, and collaborative geographical analysis environment in which GIScience community users can contribute new analysis services and reuse existing gateway services.

Keywords: Cyberinfrastructure; Geographical analysis; Parallel and distributed processing; Service-oriented architecture; Web 2.0; Web-based GIS

1. Background

Like the physical infrastructure of transportation, electric power grids, communication, and other utility systems that support modern society, cyberinfrastructure (CI) represents the distributed computer, information, and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor (Atkins et al. 2003). CI also refers to the development and use of integrated and distributed information infrastructure that enables and accelerates the discoveries of science and engineering. E-Science is a similar term originating in Europe with an emphasis on the transformative roles of CI in science and engineering practices. Although coming from slightly different perspectives, both CI and e-Science represent the powerful paradigm in which distributed computer and knowledge systems and information and communication technologies are integrated to provide services to enable large-scale and collaborative sciences and engineering.

*Corresponding author. Email: [email protected]

International Journal of Geographical Information Science, ISSN 1365-8816 print/ISSN 1362-3087 online © 2009 Taylor & Francis, http://www.tandf.co.uk/journals, DOI: 10.1080/13658810902754977, Vol. 23, No. 5, May 2009, 631–656

Many domains have developed and used CI to support their sciences, and reported promising results (GEON 2008, NEON 2008, SEEK 2008, WATERS 2008). However, significant challenges remain to fully exploit CI in science and engineering inquiry mainly because 1) CI technologies are sophisticated and continue to evolve; and 2) educating the CI workforce in science and engineering requires significant effort. From a technological point of view, CI comprises computing systems, data, information resources, networking, digitally enabled sensors, instruments, virtual organizations (VOs), and observatories, along with an interoperable suite of software services and tools (NSF 2007). Among these components, three types of capabilities form the current CI core: high-performance computing; data handling, analysis, and visualization; and VO for distributed communities (NSF 2007). These capabilities collectively offer great opportunities to revolutionize the way that Geographical Information Science (GIScience) evolves based on developing and using high-performance, distributed, and collaborative Geographical Information Systems (GIS).

The GIScience community has been exploiting individual CI capabilities for years. Web-based GIS solutions (e.g., Tsou and Buttenfield 2002, Yang et al. 2005) have been developed to support online data access and map-based visualization, and extended to facilitate the coupling of GIS and modeling (Wright et al. 2003). This type of coupling can further be enhanced to enable collaborative and/or computationally intensive distributed geographical information processing (DGIP, Yang and Raskin 2009) by scaling Web-based GIS from Web servers to CI. High-performance and/or Grid computing have been used to solve computationally intensive geographical analysis problems (Armstrong and Densham 1992, Wang and Armstrong 2003, Wang et al. 2008). Moreover, ontology-driven GIS have been recognized to address semantic integration issues for geographic knowledge development and communication (Fonseca et al. 2002). Although research and development in these areas help understand how GIScience and CI may benefit from each other, effectively establishing and exploiting the bridge between GIScience and CI requires the treatment of CI as an integrative framework comprising a balanced, seamless blending of individual CI capabilities (NSF 2007).

The GIScience community can contribute to the understanding and development of generic geospatial CI to enable high-performance, distributed, and collaborative GIS as it has been examining the theoretical and practical underpinnings for geospatial technologies to be applied across many application domains. This contribution is essential to establish coherent generic geospatial CI for domain science communities, such as geosciences and ecological science (see GEON 2008, Yan Liu 2008, NEON 2008, SEEK 2008), to develop their domain-specific geospatial CI. Although these communities have developed their domain-specific VO based on middleware provided by, for example, the Globus Toolkit (http://globus.org), bridging GIScience to CI would help address their common needs of generic geospatial CI (Zhang and Tsou 2009).

Recognizing that CI itself is an evolving research area, exhibiting complexity at both its framework and its capability levels, a science or engineering gateway approach has been proposed as an integrated community-specific set of tools, applications, and data collections that provide a research community with a shared online environment to enable domain-specific collaborative problem solving through seamless access to CI capabilities (Catlett et al. 2007, Wilkins-Diehr 2007). The purpose of this research is to address the ultimate challenge of bridging CI and GIScience by establishing the TeraGrid GIScience Gateway (hereafter, the Gateway).

In the remainder of this article, Section 2 discusses the development strategies of the Gateway. Section 3 describes its Service-Oriented Architecture (SOA) and design for a service infrastructure that supports the deployment of services and a collaborative geospatial processing environment. On the basis of this SOA, Section 4 presents a Web service-based implementation of the service infrastructure and user environment. Section 5 presents two case studies to demonstrate how to integrate geographical analysis into the Gateway and leverage TeraGrid capabilities for collaborative and computationally intensive geospatial processing. Section 6 concludes and Section 7 discusses the significance of this research, ongoing research themes, and future research directions.

2. Introduction and strategies

The Gateway, based on the NSF TeraGrid (a key element of US and world CI; see http://www.teragrid.org), integrates CI capabilities and manages the associated complexity to empower GIScience and establish generic geospatial CI. The Gateway provides an open CI-based platform for GIScience community users to conduct and share collaborative, computationally intensive geographical analysis.

2.1 TeraGrid GIScience Gateway

TeraGrid is an open scientific discovery infrastructure that consolidates high-performance computing resources at eleven partner sites to provide an integrated and persistent computational resource. The Gateway is one of over two dozen science and engineering gateways that facilitate application development and TeraGrid operations. In general, these gateways take different design and implementation approaches that are tailored to specific computation needs of individual domain communities. For example, differences are reflected in user interfaces based on the adoption and customization of diverse command-line or Web-based technologies. Although TeraGrid has started to analyze and identify common infrastructure utilities that would be applicable across gateways, few such utilities have been developed to focus on application-level capabilities.

The Gateway is based on the GISolve Toolkit (http://www.gisolve.org) and focuses on the development and provision of generic geospatial CI capabilities and DGIP services. The Gateway is accessible through Web interfaces based on an SOA that bridges CI capabilities with GIS and geographical analysis functions to manage CI complexity and enable collaborative and computationally intensive geographical analysis. SOA has become an effective approach to building interoperable and reusable software components and integrating software components for the development of scalable systems. Most cutting-edge CI capabilities are based on SOA, and implemented using Web services. Whereas geospatial Web services have been studied to support spatial queries, map browsing, and metadata indexing, the SOA-based Gateway focuses on geospatial services that are integrated with CI capabilities.


2.2 Requirements

The Gateway was designed to enable GIScience community users to simultaneously conduct collaborative and computationally intensive geographical analysis based on scalable access to high-performance TeraGrid resources. Specific requirements are summarized as follows:

• Scalability: TeraGrid access should be scalable to 1) the quantity of TeraGrid computational resources; 2) the number of GIScience community users; and 3) the computational intensity (Wang 2008) of geographical analyses.
• Usability and reusability: The Gateway capabilities should be user-friendly, shared, and reused by a large number of GIScience community users.
• Interoperability: Gateway capabilities should be interoperable with each other, and the underlying TeraGrid capabilities.
• Sustainability: The Gateway should be adaptive to advancements in geographical analysis methods and TeraGrid technologies. As the implementations of the Gateway capabilities are improved, their interfaces should remain consistent for the sustainability of the Gateway access.

2.3 Strategies

The requirements discussed above have been addressed in the Gateway development based on three strategies: tackling computational intensity of geographical analysis, managing TeraGrid complexity, and establishing a scalable SOA.

2.3.1 Tackling computational intensity of geographical analysis. Geographical analysis based on the use of heuristic search, simulation, optimization, and statistical methods is often computationally intensive (Malanson and Armstrong 1996, Densham and Armstrong 1998, Krzanowski and Raper 1999, Huang et al. 2002, Wang and Zhu 2008). Enormous computational resources are needed to store and manage geographical information, and conduct computationally intensive geographical analysis. These required resources are often unavailable on a single computer. Consequently, researchers have begun to exploit advances in CI to access required computer resources from disparate and heterogeneous sources, for example, provided by TeraGrid (Wang et al. 2008). A key strategy to tackle computationally intensive geographical analysis on TeraGrid is to parallelize and distribute geographical information processing. TeraGrid can be viewed as a massive parallel and distributed computing environment that integrates individual computing resources built on various parallel computing architectures. Parallelization and distribution can be used in geographical analysis to exploit the computing power available at the level of individual computing resources as well as the distributed parallel computing capacity integrated through the TeraGrid software environment. Each geographical analysis may be decomposed into parts that are computed in parallel. The Gateway supports parallel and distributed processing of geographical analysis tasks submitted by multiple users. To achieve high-performance geographical analysis and scalable resource use, the Gateway must ensure load balancing both among the decomposed parts of each analysis and across the entire set of concurrent analyses.
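The load-balancing requirement can be illustrated with a minimal sketch. The article does not state which balancing algorithm the Gateway uses, so the greedy longest-processing-time rule below is an assumption chosen for brevity, and the function name `balance_load`, the part identifiers, and the cost units are all hypothetical.

```python
import heapq


def balance_load(task_costs, n_workers):
    """Greedy longest-processing-time assignment (illustrative assumption).

    task_costs: dict mapping a hypothetical part id to its estimated cost.
    Returns one (total_cost, [part ids]) tuple per worker.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Min-heap of (accumulated cost, worker index): the lightest worker pops first.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    # Place the most expensive parts first; break ties by id for determinism.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))
    loads = [0.0] * n_workers
    for load, worker in heap:
        loads[worker] = load
    return [(loads[w], assignment[w]) for w in range(n_workers)]


if __name__ == "__main__":
    parts = {"part-a": 8, "part-b": 7, "part-c": 6, "part-d": 5, "part-e": 4}
    for load, tasks in balance_load(parts, 2):
        print(load, tasks)
```

The rule is a heuristic and does not guarantee an optimal split. The same function could be applied to the parts of one analysis or to the pooled parts of several concurrent analyses, which are the two levels of balancing named above.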

2.3.2 Managing TeraGrid complexity. In recent years, significant progress has been made in the use of TeraGrid capabilities for collaborative problem solving based on access to high-performance computing resources. However, the TeraGrid will remain complex to many scientists because its capabilities 1) have not been developed to directly focus on the requirements of domain-specific problem solving; and 2) will continue to evolve as driven by CI advancements. As a specific strategy to serve a rapidly growing number of GIScience community users, the Gateway uses the TeraGrid community account model (TeraGrid 2008) to facilitate scalable user access.

TeraGrid capabilities are provided through a Coordinated TeraGrid Software and Services (CTSS 2008) environment that is deployed on each TeraGrid resource. Major software components in this environment are designed for 1) remote computation, 2) distributed data transfer, 3) information services for resource management and discovery, and 4) security mechanisms for shared computation resource access by distributed users. The environment requires system administrators of high-performance computational resources to carefully configure these components and their interfaces. The effective use of these components for geographical analysis is complex because

• each component includes sophisticated Application Programming Interfaces (API) and software tools;
• interactions among components require significant efforts to coordinate and control in geographical analysis; and
• each component has various interfaces with respect to the management of heterogeneous high-performance and distributed computational resources.

As a bridge between TeraGrid and GIScience, the Gateway naturally fits requirements for CI complexity management that enables the efficient and effective use of TeraGrid for geographical analysis.

2.3.3 Establishing a scalable service framework. The Gateway integrates TeraGrid capabilities and computationally intensive geographical analysis through the development and use of an SOA. The Gateway SOA supports the decoupling of the publication, implementation, deployment, and ownership of both geographical analysis and TeraGrid services to facilitate a collaborative environment in which various service providers can contribute.

SOA and its Web service implementations have been widely adopted to build scalable information technology infrastructures in both academic research and industry (Bieberstein et al. 2005). This trend has been reflected in recent GIS development (see http://www.opengeospatial.org and http://www.esri.com). Web services use XML artifacts to support interoperability among SOA components (Foster 2006). According to the World Wide Web Consortium (http://www.w3c.org), ‘an SOA is a specific type of distributed system in which a service is a software agent that performs some well-defined operation (i.e., provides a service) and can be invoked outside of the context of a larger application’. By separating the service interface description from its implementation and making its service interface universally addressable (e.g., using a uniform resource identifier, often referred to as URI), a service is flexible and adaptable in implementation by service providers and in on-demand remote access by service consumers. A Gateway service, generally defined as a functional unit of the Gateway that originates from components in the TeraGrid software environment and geographical analysis, is identified and published according to the principle of generality. Once published, a service is self-contained and operationally loosely coupled with other services.

Several TeraGrid services (see Table 1) are available in the Gateway to support geographical analysis. These services are provided by the service-oriented Globus Toolkit (Foster 2006) and deployed as interoperable Grid services. When a geographical analysis is executed, these TeraGrid services are invoked dynamically for gateway authentication and authorization, data transfer, computation resource discovery and selection, computation execution and monitoring, and visualization. A basic principle for the management of TeraGrid complexity is to develop a set of common utilities for geographical analysis that encapsulate the functions of TeraGrid software components. This encapsulation is achieved within a scalable SOA to enable the GIScience community to focus on geographical analysis when using the TeraGrid.

3. Service-oriented architecture and design

This section presents a unifying Gateway SOA and its design to integrate TeraGrid and geospatial services in a scalable manner. The Gateway SOA is developed to manage TeraGrid complexity and support the parallel and distributed processing of computationally intensive geographical analysis. Compared to conventional GIS solutions (e.g., ESRI software) that have been developed through the direct support of operating systems (e.g., Windows and Linux), the Gateway adheres to an SOA approach using a suite of interoperable middleware that integrates CI capabilities across multiple operating systems (Figure 1). Furthermore, conventional GIS are primarily run in single desktop or server mode, whereas the Gateway is run in an integrated environment of networked high-performance computers that are geographically distributed and orchestrated using TeraGrid middleware. The user environment is designed to achieve scalable Web content integration and presentation through the Gateway SOA.

Table 1. A list of TeraGrid services.

Type          TeraGrid services
Framework     WS-Core (core Web services)
Computing     WS-GRAM (Web service-based Grid resource allocation and management)
Data          RFT (reliable file transfer service)
Information   WS-MDS (Web service-based monitoring and discovery system)
Security      WS-AA (Web service-based authentication and authorization)

Figure 1. Gateway design and its analogy to ESRI GIS solutions (EGEE: Enabling Grids for E-Science, see http://public.eu-egee.org; OSG: Open Science Grid, see http://www.opensciencegrid.org).

3.1 Architecture

The Gateway SOA mainly includes a service infrastructure and user environment (Figure 2). The service infrastructure provides a decentralized service computing platform based on publish-subscribe models (Eugster et al. 2003) to support the provisioning and execution of TeraGrid and geospatial services. The user environment interacts with users and needed services to conduct geographical analysis. The Gateway uses the Grid Security Infrastructure (Welch et al. 2004), a de facto CI security standard, as a unified solution to secure access to the Gateway and TeraGrid. Besides runtime support, the SOA enables the development and integration of service-based geographical analysis.

Figure 2. Gateway architecture.

Gateway services are built into the service infrastructure and provide support for explicit user interactions. A user interface is defined as part of a service interface to support user interactions with individual Gateway services. As illustrated in Figure 3, the adoption of SOA helps achieve the interoperability, scalability, and reusability of software components. Once geographical analyses (e.g., A, B, and C in Figure 3) are transformed into service components and deployed into service containers in the service infrastructure, users can access individual services regardless of where these services are provided or hosted. Because service interface definitions are described in machine-readable formats with XML artifacts, individual services can be independently developed by community users, deployed on community service providers, and invoked as on-demand services, as opposed to static building and sequential executions in traditional development of geographical applications. As the Gateway evolves, its services are reusable through service composition that implements new analysis logic by dynamically binding existing services. Access to single or multiple TeraGrid resources required by geographical analysis is triggered at an individual service level and supported by TeraGrid hardware and software infrastructure.
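Service composition by binding existing services can be sketched in a few lines. This is only an in-process illustration under stated assumptions: real Gateway services are remote Web services described in WSDL, whereas here each service is a plain function over a dictionary, and the names `compose`, `decompose`, `compute`, and `merge` are hypothetical.

```python
def compose(steps):
    """Return a composite service that runs named component services in order.

    steps: list of (name, service) pairs; each service maps a dict to a dict.
    """
    def composite(payload):
        state = dict(payload)  # copy so the caller's input is not mutated
        trace = []
        for name, service in steps:
            state = service(state)
            trace.append(name)
        state["trace"] = trace  # record which component services ran
        return state
    return composite


# Hypothetical component services standing in for published Gateway services.
def decompose(state):
    n = state["n_parts"]
    cells = state["cells"]
    return {**state, "parts": [cells[i::n] for i in range(n)]}


def compute(state):
    return {**state, "partials": [sum(part) for part in state["parts"]]}


def merge(state):
    return {**state, "result": sum(state["partials"])}


if __name__ == "__main__":
    analysis = compose([("decompose", decompose), ("compute", compute), ("merge", merge)])
    out = analysis({"cells": [1, 2, 3, 4, 5, 6], "n_parts": 3})
    print(out["result"], out["trace"])
```

A new analysis could reuse `decompose` and `merge` and bind a different `compute` step without changing the existing services, which is the kind of reuse the paragraph above describes.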

Figure 3. Conceptual illustration of the Gateway (TeraGrid image source: http://www.teragrid.org).


In the Gateway SOA, each geographical analysis is designed as a composite service (Figure 4). The execution of each service is driven by coordinated user interventions and invocations specified within the service definition. Each composite service is created and managed based on a publish-subscribe model using a set of provisioning services with the following functions (Figure 5):

• Service creation, which transforms a geographical analysis into a Gateway composite service. The service interface is defined using the Web Service Description Language (WSDL), a standard service definition language.
• Service discovery, which finds relevant and available services. During the execution of services, dynamically available service instances are discovered for runtime service selection.
• Service orchestration, which supports service runtime management.
• Service provenance, which is created and managed for each service to track its state evolution and runtime information.

Figure 4. Streamlined execution of a composite service with four component services (each component service has user interaction defined as part of service interface and can be optionally configured in a composite service; the execution of a composite service interacts with the Gateway user environment according to geographical analysis logic).

Figure 5. Internal architecture of the Gateway publish-subscribe framework (DB: database).

The publish-subscribe model allows a service to be registered by publishing a service interface definition and runtime address for service discovery. The service interface is considered static metadata for service management. Whereas static service information facilitates reuse and composition, dynamic service information communicates service availability and status.
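The split between static and dynamic service information can be sketched as a small in-memory registry. This is a hedged illustration, not the Gateway's implementation: the class and method names (`ServiceRegistry`, `publish`, `subscribe`, `update_status`, `discover`), the status strings, and the example addresses are all assumptions, and the interface is a plain dictionary standing in for a WSDL document.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    name: str
    interface: dict  # static metadata (stands in for a WSDL definition)
    instances: dict = field(default_factory=dict)  # runtime address -> status


class ServiceRegistry:
    """Minimal publish-subscribe registry (hypothetical API)."""

    def __init__(self):
        self._records = {}
        self._subscribers = {}

    def subscribe(self, name, callback):
        # callback(name, address, status) is called on every status change.
        self._subscribers.setdefault(name, []).append(callback)

    def publish(self, name, interface, address):
        # The first published interface is kept as the static definition.
        record = self._records.setdefault(name, ServiceRecord(name, interface))
        record.instances[address] = "available"
        self._notify(name, address, "available")

    def update_status(self, name, address, status):
        self._records[name].instances[address] = status
        self._notify(name, address, status)

    def discover(self, name):
        # Return the addresses of instances that are currently available.
        record = self._records.get(name)
        if record is None:
            return []
        return sorted(a for a, s in record.instances.items() if s == "available")

    def _notify(self, name, address, status):
        for callback in self._subscribers.get(name, []):
            callback(name, address, status)


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.subscribe("gi-star", lambda *event: print("event:", event))
    iface = {"inputs": ["points", "distance"], "outputs": ["z_scores"]}
    registry.publish("gi-star", iface, "https://host-a.example/gi-star")
    registry.publish("gi-star", iface, "https://host-b.example/gi-star")
    registry.update_status("gi-star", "https://host-a.example/gi-star", "busy")
    print(registry.discover("gi-star"))
```

Keeping the interface fixed per service name while letting instance status change is one way to reflect the distinction drawn above: composition relies on the static part, and runtime selection relies on the dynamic part.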

3.2 Design

The Gateway SOA design focuses on building geospatial services and providing a Gateway user environment. A geographical analysis is ported into the SOA through a componentization design process, which decomposes the computing logic of an analysis into loosely coupled, self-contained component services. The design of geographical analysis services as Gateway composite services is placed within the context of the GISolve Toolkit, which provides a problem-solving environment based on high-performance and Grid computing (Wang et al. 2005). The Gateway integrates parallel processing, data handling, and visualization capabilities from GISolve into the Gateway geographical analysis services. A service-based user environment is designed based on user interface elements that are incorporated within each service.

3.2.1 Parallel and distributed processing of geographical analysis. Services for parallel and distributed processing of geographical analysis mainly include functions for domain decomposition and task scheduling. Domain decomposition services decompose a large geographical analysis problem into components that can be handled in parallel using TeraGrid resources. TeraGrid provides a diverse set of high-performance computational resources that can be used to exploit the parallelism in geographical analysis. Depending on feasible parallelization in a particular analysis, a specific parallel computing architecture of TeraGrid resources must be determined before designing an assignment of decomposed analysis components to appropriate computing resources. The parallel computing architecture of individual TeraGrid resources includes both shared-memory and distributed-memory, whereas multiple TeraGrid resources can be dynamically aggregated based on the Grid architecture.

According to a theoretical approach proposed by Wang and Armstrong (2009),

domain decomposition services are developed in two stages. First, region quadtreesand space-filling curves are used to decompose spatial computational domains, as

such domains can be represented as a field-based model and region quadtrees are

appropriate for the decomposition of field-based representations. Second, the results

of decomposing spatial computational domains are used to guide the decomposition

based on appropriate spatial data structures such as those summarized by Ding and

Densham (1996). This decoupling approach guides the service-oriented design of

parallel processing functions for geographical analysis in the Gateway. The input

640 S. Wang and Y. Liu

Dow

nloa

ded

by [

Geo

rge

Mas

on U

nive

rsity

] at

12:

40 0

6 Ju

ly 2

011

for task-scheduling services includes the estimation of computational intensity for a

particular geographical analysis based on the representation of spatial computational

domain and dynamic resource status (Wang and Armstrong 2009). The output is an

execution plan that assigns each geographical analysis part to an available TeraGrid

resource (e.g., CPU in tightly coupled supercomputing architecture, or batch queue in the Grid architecture). Domain decomposition is designed as a generalized service for

geographical analysis whereas task-scheduling solutions are specific to the architec-

ture of selected TeraGrid resources.
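The first decomposition stage can be illustrated with a minimal Python sketch. It assumes that the spatial computational domain is a square grid of per-cell intensity values whose side is a power of two, and that a quadrant is split whenever its summed intensity exceeds a threshold. The function names, the threshold rule, and the choice of a Z-order (Morton) curve as the space-filling curve are illustrative assumptions, not the Gateway's implementation.

```python
from typing import List, Tuple

Leaf = Tuple[int, int, int, float]  # (row, col, size, summed intensity)


def quadtree_decompose(grid: List[List[float]], threshold: float) -> List[Leaf]:
    """Recursively split a square grid until each leaf's intensity <= threshold."""
    leaves: List[Leaf] = []

    def total(r: int, c: int, size: int) -> float:
        return sum(grid[i][j] for i in range(r, r + size) for j in range(c, c + size))

    def split(r: int, c: int, size: int) -> None:
        load = total(r, c, size)
        if size == 1 or load <= threshold:
            leaves.append((r, c, size, load))
            return
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                split(r + dr, c + dc, half)

    split(0, 0, len(grid))
    return leaves


def morton_key(row: int, col: int, bits: int = 16) -> int:
    """Interleave the bits of row and col to obtain a Z-order curve index."""
    key = 0
    for b in range(bits):
        key |= ((col >> b) & 1) << (2 * b)
        key |= ((row >> b) & 1) << (2 * b + 1)
    return key


def ordered_leaves(grid: List[List[float]], threshold: float) -> List[Leaf]:
    """Quadtree leaves sorted along the Z-order curve of their upper-left corners."""
    return sorted(quadtree_decompose(grid, threshold),
                  key=lambda leaf: morton_key(leaf[0], leaf[1]))
```

Ordering the leaves along the curve keeps spatially adjacent leaves close together in the resulting sequence, so contiguous runs of leaves can be handed to the same resource; the summed intensity stored with each leaf is the kind of per-part measure a task-scheduling step would consume.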

3.2.2 Service-oriented user environment. The Gateway is designed as a research and

education environment for the GIScience community where each community user

can be a contributor and/or consumer of geographical analysis services. Through

the Gateway, users can conduct collaborative analysis that harnesses enormous TeraGrid resources. SOA defines a service interface in a machine-readable format

such as WSDL, whereas the Web-based user interface is based on HTML and Web

scripting languages (e.g., JavaScript). A service interface defines the input and

output of the service logic. In the execution of a composite service that invokes

other services, user interactions are often needed at the component service level. For

example, a Gateway user may explicitly specify the input parameters of a geogra-

phical analysis service, check the progress of component services being executed,

and retrieve and visualize service output. Therefore, it is necessary to include interface support for user interactions in the service definition, which could enable the

seamless integration of user and service interactions in a streamlined composite

service.

Web 2.0 technologies play an important role in establishing the service-oriented

user environment (Goodchild 2007). Generally, Web 2.0 technologies have been

applied to support geospatial services and portals [see Yang et al. (2007) for details].

Specifically, we use Web 2.0 technologies for Web content management in the

Gateway user environment and to facilitate user interactions. For example, a service
interface uses portlet (JSR-168 2008) and AJAX (2008) technologies to achieve on-demand
content loading for dynamic presentation of analysis status. Unlike conventional Web
content management systems, each Gateway service has its own user

interface information to enable interaction with the individual service, which is

designed through a novel use of Web 2.0 technologies. When a composite service is

orchestrated, a Web-based user environment is automatically created by combining

user interface data from the services through mashups. The user environment is,

therefore, designed and maintained as a decentralized Web content management system.

3.2.3 User environment management. The Gateway user environment is designed to

be highly configurable and user-friendly for TeraGrid-based geographical analysis.

Through interactions with the service infrastructure, the user environment integrates

user-related events, contents, and service information to enable the execution, mon-

itoring, and visualization of Web-based service execution. This integration is sup-

ported by the following functions:

• User management support for user registration, service orchestration and subscription, and community security enforcement.

• User data management support for data transfer and service result retrieval.

• Service management support for the interactive creation of composite services and interactions with Gateway service infrastructure to create the runtime user environment.

• User interface management support for interactions between users and services.

4. Implementation

The Gateway is implemented according to its SOA design. For each service, a subset of

the Web service protocol stack is used to define the service interface, which is

implementation-independent and universally addressable. To support the execution

of composite services and user-service interactions, services are implemented as ‘stateful’
Web services by following the Web Services Resource Framework (WSRF 2008) standard. A service
definition includes four parts: WSDL schema, service classification

metadata, service status metadata, and user interaction metadata. With service state

information, the execution of a composite service is managed as a workflow that

proceeds based on the progress of each invoked service.

The reuse and composition of services rely on a publish-subscribe model that is

accessed through Gateway provisioning services. The publish-subscribe model is a

topic-based implementation in which service classification metadata is considered as

the topic description of a service. The current implementation of the publish-subscribe
model uses a centralized method based on databases. The user environment is implemented
using Web portal technologies, following the Java Portlet API standard (JSR-168 2008; see
also Yang et al. 2007). Each execution of a geographical analysis service

is presented in the portal as a portal layout that is configured and rendered dynami-

cally. This layout includes a set of portlets responsible for rendering user interfaces of

the services involved.
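A topic-based publish-subscribe registry of this kind can be sketched in a few lines of Python. The sketch below is an in-memory stand-in for the centralized, database-backed implementation described above; the class names, fields, and endpoint string are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    endpoint: str            # hypothetical service address
    topics: FrozenSet[str]   # classification metadata, used as topics


class ServiceRegistry:
    """Centralized, in-memory stand-in for a database-backed registry."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, List[ServiceRecord]] = defaultdict(list)
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, user: str, topic: str) -> None:
        """Record that a user wants to hear about services on a topic."""
        self._subscribers[topic].add(user)

    def publish(self, record: ServiceRecord) -> Set[str]:
        """Register a service under its topics and return the users to notify."""
        notified: Set[str] = set()
        for topic in record.topics:
            self._by_topic[topic].append(record)
            notified |= self._subscribers[topic]
        return notified

    def find(self, topic: str) -> List[str]:
        """Names of all services published under a topic."""
        return [record.name for record in self._by_topic.get(topic, [])]
```

For example, a user who subscribes to a "domain-decomposition" topic would be returned by `publish` when a new decomposition service is registered under that topic, and `find` lets a composite service look up reusable components by classification.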

5. Case study

The Gateway currently supports three types of computationally intensive geographical

analysis: Bayesian geostatistical modeling (Yan et al. 2007), detection of local spatial clustering (Wang et al. 2008), and inverse distance weighted spatial interpolation (Wang

and Armstrong 2003). Other geographical analysis services (e.g., simulation models; see

Wang and Zhu 2008) are being developed. In this section we present a case study on

incorporating the Gi*(d) spatial statistic for detecting local clustering as a composite

service into the Gateway for shared access by the GIScience community. A second case

study shows how the Gateway supports shared access to the integrated computation of

collaborative Bayesian geostatistical modeling using tightly coupled high-performance

computers individually, as well as collectively as Grids.

5.1 The Gi*(d) spatial statistic

Gi*(d), a spatial statistical approach originally introduced by Getis and Ord (1992), is
a measure of the local spatial association among point-referenced observations. This
statistic has been widely applied to identify clusters of points (i.e., hot spots) where
values of a variable are significantly high or low compared with the remaining
geographical region. Gi*(d) focuses on local spatial structure, in contrast to popular
global assessment measures, such as Geary’s C and Moran’s I (Cliff and Ord 1973).
All pair-wise distances between measurement locations must be computed to obtain
Gi*(d). Consequently, Gi*(d) analysis consumes significant memory and is

computationally intensive when the number of locations is large. Wang et al. (2008)

have developed a parallel processing method to compute the Gi*(d) spatial statistic

using the TeraGrid. It is beyond the scope of this case study to investigate this

parallelization method; instead, our focus is on how the method is adapted to the

Gateway services for future access and sharing through the user environment.
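To make the computational cost concrete, the following sequential Python sketch evaluates the statistic in the ratio form commonly attributed to Getis and Ord (1992), Gi*(d) = Σj wij(d) xj / Σj xj, with binary weights wij(d) = 1 when location j lies within distance d of location i (j = i included). The function name and the use of Euclidean distance are assumptions made for illustration; this is not the parallel method of Wang et al. (2008), and standardized variants of the statistic are not shown.

```python
import math
from typing import List, Sequence, Tuple


def gi_star(points: Sequence[Tuple[float, float]],
            values: Sequence[float],
            d: float) -> List[float]:
    """Return Gi*(d) = sum_j w_ij(d) x_j / sum_j x_j for every location i.

    w_ij(d) is 1 when location j lies within distance d of location i
    (j == i included, which is what the star denotes) and 0 otherwise.
    """
    if len(points) != len(values):
        raise ValueError("points and values must have the same length")
    total = sum(values)
    if total == 0:
        raise ValueError("the sum of values must be non-zero")
    result = []
    for pi in points:
        # Every pair-wise distance is evaluated, so the work grows as O(n^2).
        local = sum(x for pj, x in zip(points, values) if math.dist(pi, pj) <= d)
        result.append(local / total)
    return result
```

This sketch recomputes distances on the fly rather than storing them; an implementation that caches the full distance matrix trades that recomputation for memory that grows quadratically with the number of locations, which is the memory pressure noted above.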

5.1.1 Gi*(d) service. The Gi*(d) service is based on existing and new services that
support the specific needs of Gi*(d) analysis (see Figure 6). The execution of a Gi*(d)
service instance can be summarized in the following steps:

(1) Select service hosts and services;
(2) Get TeraGrid security proxy from a Gateway community account;
(3) Create a Gi*(d) data set through data simulation/query services;
(4) Transfer data sets from the portal to a domain decomposition service host;
(5) Invoke a domain decomposition service (including the selection of optimal spatial decomposition strategies);
(6) Select TeraGrid computing resources;
(7) Invoke a task scheduling service to create a runtime schedule;
(8) Transfer decomposed data to selected TeraGrid resources;
(9) Execute the schedule on selected TeraGrid resources and monitor computation progress; and
(10) Transfer results from TeraGrid resources to the Gateway portal.

Figure 6. Gi*(d) service execution (DC: domain decomposition; TS: task scheduling; RFT: reliable file transfer service; WS-GRAM: Web service-based Grid resource allocation and management; WS-AA: Web service-based authentication and authorization).

Reused services include TeraGrid services (2, 4, 6, 8, 9, and 10 in Figure 6), service

infrastructure services (1 in Figure 6), and geographical analysis services (5 and 7 in

Figure 6). Users interact with these services based on the interfaces defined as Velocity
(http://velocity.apache.org) templates and rendered in the Gi*(d) portal layout (see
Figure 7A and B). The current implementation of the domain decomposition service is specific to Gi*(d)

analysis (Figure 8). Additional domain decomposition strategies may be contrib-

uted as new services, and should include a spatial computational intensity measure

for each decomposed part of the corresponding analysis. Task-scheduling services

require this type of measure to evaluate load balance in using TeraGrid resources.

Both domain decomposition and task-scheduling services provide user interfaces

for choosing alternative decomposition and scheduling strategies. Figure 6 illustrates
interactions among the component services. On the user portal side, six portlets are
designed for users to interact with the component services (see Figure 7A and B).
Depending on the progress of a Gi*(d) analysis, user interaction

focus is dynamically placed on a particular portlet by maximizing or highlighting

the portlet.
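The role of the computational intensity measure in load balancing can be shown with a small Python sketch. It assigns decomposed parts to resources with a greedy longest-processing-time heuristic and reports the resulting imbalance. The heuristic, the function names, and the imbalance ratio are illustrative assumptions; the text above does not specify which scheduling strategy the Gateway services apply.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def schedule_lpt(intensities: Sequence[float],
                 resources: Sequence[str]) -> Dict[str, List[int]]:
    """Greedy longest-processing-time assignment of parts to resources."""
    plan: Dict[str, List[int]] = {name: [] for name in resources}
    # Min-heap of (accumulated load, resource name).
    heap: List[Tuple[float, str]] = [(0.0, name) for name in resources]
    heapq.heapify(heap)
    # Heaviest parts first; each goes to the currently least-loaded resource.
    order = sorted(range(len(intensities)),
                   key=lambda i: intensities[i], reverse=True)
    for part in order:
        load, name = heapq.heappop(heap)
        plan[name].append(part)
        heapq.heappush(heap, (load + intensities[part], name))
    return plan


def load_imbalance(intensities: Sequence[float],
                   plan: Dict[str, List[int]]) -> float:
    """Maximum resource load divided by mean load (1.0 is perfectly balanced)."""
    loads = [sum(intensities[i] for i in parts) for parts in plan.values()]
    return max(loads) / (sum(loads) / len(loads))
```

With intensities `[8, 7, 6, 5, 4]` and two resources, the heuristic yields loads of 17 and 13, an imbalance of about 1.13; a decomposition strategy that produced more evenly sized parts would bring the ratio closer to 1.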

5.1.2 Service security overhead. Interactions between services and users through

portlets add overhead to the performance of Gi*(d) analysis. Service overhead arises,

in no particular order, from

(1) communications between portal server and service host, which are negligible

compared with other sources of impact;

(2) security added to message exchange;

(3) middleware implementation of Web services; and

(4) Web server performance of processing XML-based requests.

Although the first type of overhead is often negligible and the third and fourth

types can be well managed in production-quality systems (Tian et al. 2003,

Humphrey et al. 2005), the second type remains as the most important factor.

Gateway’s security assurance must be implemented within all types of services to

enable collaborative geographical analysis services to be shared online. We there-

fore focus on assessing the service overhead added by security. Gateway service

interactions are secured using X.509 certificate-based authentication, authorization, and message encryption. Specifically, an X.509 certificate is used to create

symmetric keys for message encryption and to sign messages for mutual authentication

between senders and receivers of service messages. Authorization is based on GSI

identity mapping (Welch et al. 2004).

In this case study, security-related overhead is measured using average round-trip

service response time. Three scenarios were designed for the evaluation of security-

related overhead: no security, transport-level security, and message-level security.

When the sender and receiver of a message are located within a single protected network, network communication performance can be improved using plain

HTTP (given sufficient protection enforcement on the network). This no-security

scenario applies only if the Gateway user environment coexists with service hosts in a

protected network environment with firewalls for external access. Transport-level

security establishes secure HTTP communication channels without support for dele-

gated message delivery that is needed when computation job submission does not

directly communicate with execution sites (e.g., inter-Grid job submission). If a

composite service involves frequent communications between service containers,

transport-level security can be used to secure the entire communication channel to reuse channel-based security contexts on all messages carried over the channel.

Message-level security supports end-to-end message delivery by applying digital
signatures and encryption to messages and is, therefore, suitable for composite
services in which a message traverses multiple service containers.

Figure 7. (A and B) User environment for Gi*(d) analysis.

Figure 8. Domain decomposition service definition for Gi*(d) spatial statistic.

A set of experiments was conducted in a local area network environment with a
service host and a group of service requesters. A Gi*(d) domain decomposition service
was used (Wang et al. 2008). Average response time (excluding computation time) was

measured as a function of the number of requesters who invoke the service simulta-

neously. As illustrated in Figure 9, security-enabled message communications

increase the service response time by an order of magnitude from tens of milliseconds

without security to hundreds with security. When security is added, message-level

security has larger overhead because of its message-by-message encryption and signing.
Transport-level security leads to less overhead because encryption and signing are
streamlined on all bits transferred through communication channels. On the basis of

these experimental results, communications between the Gateway portal and service

hosts, which involve frequent service invocations, should use transport-level security.

A composite service that invokes services hosted by different service containers should

use message-level security to sign and encrypt Simple Object Access Protocol (SOAP)

messages instead of creating multiple secure communication channels between the

portal and the host services.
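The measurement itself, average round-trip response time as a function of the number of simultaneous requesters, can be sketched in Python as follows. The `stub_service` function is a placeholder that sleeps instead of sending a secured message; in a real experiment it would be replaced by a client call configured for one of the three security scenarios. All names here are hypothetical, and the sketch is not the harness used for Figure 9.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable


def timed_call(service: Callable[[], object]) -> float:
    """Round-trip time of one invocation, in milliseconds."""
    start = time.perf_counter()
    service()
    return (time.perf_counter() - start) * 1000.0


def average_response_ms(service: Callable[[], object], requesters: int) -> float:
    """Mean round-trip time when `requesters` clients invoke the service at once."""
    with ThreadPoolExecutor(max_workers=requesters) as pool:
        futures = [pool.submit(timed_call, service) for _ in range(requesters)]
        return mean(f.result() for f in futures)


def stub_service(delay_s: float = 0.01) -> None:
    """Placeholder for a remote call; sleeps instead of sending a message."""
    time.sleep(delay_s)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, "requesters:", round(average_response_ms(stub_service, n), 1), "ms")
```

Running the same loop once per security scenario and plotting the three curves against the number of requesters would produce a comparison of the same shape as the one reported above.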

5.2 Bayesian geostatistical modeling

Compared to classical geostatistical modeling, Bayesian geostatistical modeling pro-

vides realistic estimation of prediction error as well as the ability to combine informa-

tion from disparate data sources (Cowles et al. 2002). However, Bayesian geostatistical

modeling, when using Markov chain Monte Carlo (MCMC) methods, is even more

computationally intensive than conventional geostatistical modeling (Cowles 2003).

Because an MCMC sampler must be run for thousands of iterations, each requiring computationally intensive linear algebra operations, the runtime for sequential

Bayesian geostatistical modeling algorithms can be prohibitive. Even with parallel

MCMC algorithms running on a single high-performance computer, runtime may be

unacceptable, especially for large geographical data sets.

Figure 9. Performance evaluation of service security overhead (ms: millisecond).

5.2.1 Bayesian geostatistical modeling service. We focus on demonstrating how

Bayesian geostatistical modeling is incorporated into the Gateway so that the

GIScience community can share the modeling approach enabled by the TeraGrid.

Unlike the componentization of the Gi*(d) service, which produces computation

jobs based on the Grid architecture, the componentization of the Bayesian geostatistical modeling addresses the efficient design of a composite service that supports both tightly

and loosely coupled parallelization within a single geographical analysis. To enable the

sharing of Bayesian geostatistical modeling among numerous users, the Gateway exploits

the two-level parallelism, manages MPICH-G2 (Karonis et al. 2002) execution across

multiple TeraGrid high-performance computers, and harnesses a significant amount of

computational resource based on the unified TeraGrid software environment (Figure 10).

Component services are devised to assure computational resource availability, failure

recovery of an analysis process, and scalable use of a set of matrix manipulation libraries customized to individual TeraGrid resources. The analysis is also implemented to support

interactive parameter tuning and Markov chain convergence diagnostics during the

execution. The selection of statistical parameters can be refined during the analysis.

Therefore, the componentization considers how to manage and present intermediate

results and provides stop-resume support during analysis computation.
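As an illustration of what a convergence diagnostic over several Markov chains can look like, the Python sketch below computes the Gelman-Rubin potential scale reduction factor for one parameter. The text above does not name the diagnostics used by the service, so the choice of this statistic is an assumption made purely for illustration, as are the function name and input layout.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def potential_scale_reduction(chains: Sequence[Sequence[float]]) -> float:
    """Gelman-Rubin R-hat for one parameter sampled by equal-length chains."""
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least 2")
    within = mean(variance(c) for c in chains)           # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])    # B: between-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")
    pooled = (n - 1) / n * within + between / n          # pooled variance estimate
    return sqrt(pooled / within)
```

Values close to 1 suggest that the chains are sampling the same distribution, whereas values well above 1 indicate that more iterations or different starting values may be needed; a statistic of this kind is one way intermediate results could be summarized for users who stop, adjust, and resume an analysis.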

The Bayesian geostatistical modeling service is composed of component services

categorized as TeraGrid, provisioning, and geospatial services [based on the same

componentization process for the Gi*(d) service]. Specific to the Bayesian geostatistical modeling service, a new computation management service is developed to

coordinate computation execution based on MPICH-G2. This service invokes

MPICH-G2 on a selected TeraGrid computer. MPICH-G2 then launches MPI jobs

on the reserved TeraGrid computers. User interfaces for specifying Bayesian inference

parameters and TeraGrid resource reservation are defined in the service.

Figure 10. Using TeraGrid to compute Bayesian geostatistical modeling.

5.2.2 Collaborative Bayesian geostatistical modeling. The Bayesian geostatistical
modeling service is developed to enable collaborative analysis based on the Gateway
VO capabilities to share data, analysis progress, and results among Gateway users
(through the TeraGrid community account model). Because the specification of prior
distributions and initial parameter values and the diagnostics of Markov chains require
iterative refining and expert knowledge for analysis, collaboration among users becomes
desirable to cross-validate the analysis and expedite the convergence of analysis
computation. This service allows a user to form dynamic VOs with collaborators to
co-analyze a shared data set. Each user can specify initial parameter values, prior
distributions, and the number of MCMC iterations, and choose to share initial
specifications, dynamic MCMC convergence information, and results with selected
collaborators. Each user can then compare analysis information with her/his
collaborators. A particular analysis instance can be interactively stopped and resumed
by its owner to refine parameter values based on comparison among collaborators.

5.2.3 Experiment. An experiment was designed to demonstrate how the Bayesian

geostatistical modeling service enables a VO of 30 users to collaboratively analyze a data set by aggregating a substantial amount of TeraGrid resources (Figure 11A and B).

Figure 11. (A and B) User interface for the Bayesian geostatistical modeling service.

Each user ran 5–10 Markov chains, each of which required 16 CPUs and approximately

10-hour computations. In total, 38,000 CPU hours were contributed by three TeraGrid

sites to support this experiment. When a user logs in to the Gateway user environment, an

integrated interface is rendered for the composite service to provide a comprehensive view

of computation progress, analysis configurations, results, and collaborative space. Each user can view analysis progress (including parameters, runtime results, and visualization)

of other participants through a collaboration interface (i.e., the ‘Similar Analyses’ area in

Figure 11A). The content of the collaboration interface is rendered and updated dyna-

mically and asynchronously by the backend service using AJAX technology. For exam-

ple, user3 was at the beginning of an analysis that provides early stage visualization of

modeling parameters. Through the collaboration interface, user1 was found to have

achieved converged results. Therefore, the information about user1’s analysis (i.e.,
parameter settings and plots that show Markov chain diagnosis information) is examined by
user3. By validating and/or learning from user1’s results, user3 can further tune an
in-progress computation by stopping it (marked ‘paused by user’ in the collaboration

interface), supplying a modified configuration file, and resuming the computation. This

experiment shows that a sizable group collaborated on a single analysis through the use of

the Gateway VO support for the dynamic sharing of analysis results and computation

progress. This collaborative analysis allows self-organizing contributions of expert

knowledge and exploration of modeling parameter space. Without imposing any overhead
on end users, the service is able to aggregate significant computational power and
facilitate a collaborative analysis to achieve high-quality results that an individual
user might not achieve alone.
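The resource figures reported for this experiment can be cross-checked with simple arithmetic. The short Python calculation below uses only the numbers stated above (30 users, 5 to 10 chains per user, 16 CPUs and roughly 10 hours per chain, 38,000 CPU hours in total); the variable names are ours.

```python
USERS = 30
CPUS_PER_CHAIN = 16
HOURS_PER_CHAIN = 10          # "approximately 10-hour computations"
CHAINS_PER_USER = (5, 10)     # reported range
REPORTED_CPU_HOURS = 38_000

cpu_hours_per_chain = CPUS_PER_CHAIN * HOURS_PER_CHAIN
# Lower and upper bounds implied by the reported range of chains per user.
low, high = (USERS * k * cpu_hours_per_chain for k in CHAINS_PER_USER)
# Average number of chains per user implied by the reported total.
implied_chains_per_user = REPORTED_CPU_HOURS / (USERS * cpu_hours_per_chain)

print(f"{low:,} to {high:,} CPU hours")                               # 24,000 to 48,000 CPU hours
print(f"implied chains per user: {implied_chains_per_user:.1f}")      # 7.9
```

The reported total therefore falls inside the range implied by the per-user figures and corresponds to an average of about eight chains per user.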

6. Conclusion

A primary purpose of this research is to develop a shared, decentralized geographical

analysis environment in which a large number of GIScience community users can

contribute and share new analysis services and reuse existing services. The SOA enables the Gateway to adapt to changes in geographical analysis methods and

their underlying TeraGrid capabilities by conforming to interoperable service stan-

dards. The user environment is implemented to emphasize support for users to carry

out service-based geographical analysis. DGIP methods are incorporated as general-

ized services to enable scalable access to TeraGrid resources. The generalization is

derived from a theoretical construct – spatial computational domain representation

developed by Wang and Armstrong (2009).

Currently, the Gateway has implemented an online collaborative DGIP environment that is shared by hundreds of users from multiple disciplines (e.g., biology,

computer science, geography, linguistics, public health, and statistics) who are inter-

ested in advancing GIScience or using CI and GIS. Three types of computationally

intensive geographical analyses (Bayesian geostatistical modeling, detection of local

spatial clustering, and inverse distance weighted spatial interpolation) are made

available as production-level services. Gateway users collaborate by contributing

and sharing component and composite services, and working together on a common

analysis such as the Bayesian geostatistical modeling.

Two case studies are used to demonstrate the utility of the Gateway functions and

user environment. The studies prove the feasibility of developing generic geospatial

CI services that provide CI-based collaborative geographical analysis functions

shared by numerous users while managing the complexity of CI hardware and

software environments. This feasibility shows that GIScience enhances CI by cou-

pling GIS and geographical analysis as generic geospatial CI services, and further

indicates that CI empowers GIScience by enabling high-performance, distributed,

and collaborative geographical analysis.

In summary, our Gateway experience has demonstrated the feasibility of establishing an online CI-based environment in which many users can dynamically create a VO

(i.e., collaboratory) to contribute and share geospatial services, and collaborate on

spatial analysis investigations that harness enormous CI resources. On the basis of a

SOA, the Gateway makes it straightforward for domain scientists to run and share

their own CI-empowered geographical analysis without having to resolve significant

challenges of managing CI complexity. Its architecture is designed to be open for

incorporating new CI and geographical analysis services and scalable to a large

number of users and analyses, massive CI resources, and significant computational intensity of geographical analysis.

7. Discussions

Past experiences in bridging CI and domain sciences suggest that participation of

domain scientists is critical to developing CI and realizing its significant impacts

(Wang and Zhu 2008). This article presents an SOA framework for holistically bridging
CI and GIScience by describing the TeraGrid GIScience Gateway, a Web-based GIS
environment. The Gateway manages the complexity of the TeraGrid and provides a
collaborative problem-solving environment for the GIScience community

users to conduct geographical analysis.

On the basis of the SOA, a novel strategy of this research is to transform a

geographical analysis into a composite service that can invoke other component

services. This strategy is a key to meeting the requirements for the Gateway service

infrastructure and user environment. The service infrastructure is based on a
publish-subscribe model to support service composition, provisioning, and execution. The
user environment is developed by extending service interface definition to include user

interface metadata. This extension allows users to directly interact with the Gateway

services at the individual service level. To achieve highly configurable and interactive

service management interfaces, Web 2.0 technologies are used to implement this

support for user-service interactions.

The transformation of geographical analysis into geospatial CI services in the

Gateway is achieved in the context of integrating the following CI and GIScience

capabilities:

• high-performance computing, VO and Grid computing, and data and visualization capabilities from CI perspectives; and

• parallel and distributed processing of geographical analysis, service-oriented Web GIS architecture, and collaborative geographical analysis capabilities from GIScience perspectives.

This integration realizes a key vision of geospatial CI, emphasizing that geospatial

CI should provide geographical analysis services without exposing individual CI

components to end users. Geographical analysis services are viewed as an integral

part of geospatial CI that encapsulates the aforementioned CI and GIScience cap-

abilities. Geographical analysis services empowered by CI through the Gateway are

discussed in this section from three perspectives: 1) mechanisms to convert new

geographical analysis into reusable services, 2) strategies to achieve service interoper-

ability to enable a user-centric online geographical analysis environment, and 3)

service and workflow management. In addition, education and outreach experience

using the Gateway is summarized. Future work is described to advance the Gateway

toward a CI-empowered online DGIP environment that is customizable to satisfy individual user preferences and meet collaboration needs.

7.1 Submit new services to the Gateway

The scalable Gateway SOA provides an open architecture for users to contribute their

geographical analysis as interoperable services. To convert a new analysis into a

Gateway service, Web-based tools are provided to automatically create component

services for accessing TeraGrid resources, encapsulating the analysis logic as a standard Web service, and interacting with the user environment. Future research aims to

improve the Gateway scalability by continuing tool development to reduce the over-

head of transforming legacy geographical analysis into the service infrastructure and

user environment. A major challenge for computationally intensive geographical ana-

lysis is to address parallel and distributed processing. We are investigating sophisticated

geographical analyses such as agent-based modeling and spatial evolutionary algo-

rithms to gain an understanding about issues of automatic incorporation of such

computationally intensive analyses into the Gateway as services.

7.2 Service interoperability

To make the Gateway an open and scalable platform for numerous users to con-

tribute and share geographical analysis that is empowered by CI, the conversion tools

are compliant with the Geospatial Web Processing Service Specifications (developed

by the Open Geospatial Consortium, OGC) in terms of basic service description and

profiles. In addition, its data and visualization services adopt the Keyhole Markup Language (KML) standard that has been accepted as an OGC standard (effective

version 2.2). Although such compliance or adoption is a key to service interoperability

and an open and collaborative environment for geographical analysis, Grid comput-

ing standards developed within the Open Grid Forum are necessary to enable full

integration of geographical analysis services with generic CI services. Recently, the

first OGC and Open Grid Forum Collaboration Workshop focused on exploring the

use of the Geospatial Web Processing Service that bridges Grid computing services.

We expect the Gateway services developed in this research to serve as initial best practices for interoperable CI-based geographical analysis services.

7.3 Service and workflow management

The orchestration and execution of geographical analysis services handle service–ser-

vice and user–service interactions, and sequentially invoke component services based

on analysis logic. Analysis logic is represented as a directed acyclic graph. The

interpretation of such graphs is handled by the publish-subscribe model through

provisioning services. Various evolving Web service-based workflow approaches (e.g., Krishnan et al. 2002, BPEL 2008, WSFL 2008, XLANG 2008) are being

investigated to link composite services in support of more sophisticated geospatial

decision-making and problem-solving. In addition, the use of CI resources requires

asynchronous service invocation on computation job submissions and data transfers

because time-consuming jobs and large data transfers are common.
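Sequential invocation driven by a directed acyclic graph can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The sketch maps each component service to the set of services it depends on and invokes them in a valid order. The function name and the way actions are supplied are hypothetical; this is not the Gateway's provisioning logic, and it runs each step synchronously rather than asynchronously.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_composite(dependencies: Dict[str, Set[str]],
                  actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Invoke component services one at a time in dependency order.

    `dependencies` maps each service name to the names it depends on;
    `actions` maps each service name to a callable that invokes it.
    """
    executed: List[str] = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for service in TopologicalSorter(dependencies).static_order():
        actions[service]()
        executed.append(service)
    return executed
```

For a chain such as data staging, domain decomposition, task scheduling, execution, and result transfer, the function calls the actions in exactly that order. An asynchronous variant could use the sorter's `prepare`, `get_ready`, and `done` methods to launch every service whose prerequisites have completed, which matches the need noted above to avoid blocking on long-running jobs and transfers.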

7.4 Education and outreach

The Gateway has been used in a series of graduate and undergraduate courses

(Table 2) to teach CI, GIScience, and spatial statistics. The four GIS courses were

taught by GIScience faculty, whereas the two statistics courses were taught by

statistics faculty. Our preliminary findings from these courses suggest that the

Gateway may promote the active participation of students in learning CI,

GIScience, and geographical analysis, and enable the training of skills for collaborative
problem solving and decision making. In addition, the Gateway was used as a
TeraGrid-based geospatial problem-solving environment for a student research competition
at the recent Second TeraGrid Annual Conference. Our Gateway, providing an

infectious disease risk-assessment problem based on spatial interpolation analysis,

was ranked the most reliable gateway. On the basis of feedback from conference

organizers, students reported that the user environment is user-friendly and were

impressed by the power of coupling GIS and TeraGrid to solve scientific problems.

These experiences show the successful bridging between CI and GIScience
within education contexts; they have also helped stress-test the Gateway's scalability
to many users and TeraGrid resources.

Acknowledgments

This research was supported in part by the NSF through the award OCI-0503697 and

the TeraGrid computation resource award TG-SES070007N. We thank Dr. Wenwu

Tang at the University of Illinois at Urbana-Champaign and Dr. Jun Yan at the

University of Connecticut for their insightful comments. We are also grateful for

the insightful comments of the editors and three anonymous reviewers.

References

AJAX, 2008, AJAX, available online at: http://developer.mozilla.org/en/docs/AJAX (accessed

January 2008).

ARMSTRONG, M.P. and DENSHAM, P.J., 1992, Domain decomposition for parallel processing of

spatial problems. Computers, Environment and Urban Systems, 16, pp. 497–513.

ATKINS, D.E., et al., 2003, Revolutionizing Science and Engineering through Cyberinfrastructure:

Report of the National Science Foundation Blue-Ribbon Advisory Panel on

Cyberinfrastructure, available online at: http://www.communitytechnology.org/

nsf_ci_report/ (accessed January 2008).

Table 2. Gateway course information.

Course name (times used)          Typical number of students   Level
Introduction to GIS (once)        30                           Primarily undergraduate
Foundations of GIS (twice)        50                           Undergraduate
Principles of GIS (twice)         20                           Undergraduate and graduate
Advanced GIS (once)               20                           Undergraduate and graduate
Bayesian Statistics (once)        20                           Undergraduate and graduate
Computing in Statistics (once)    20                           Primarily graduate

BIEBERSTEIN, N., et al., 2005, Impact of service-oriented architecture on enterprise systems,

organizational structures, and individuals. IBM Systems Journal, 44, pp. 691–708.

BPEL, 2008, OASIS Web Services Business Process Execution Language, available online at:

http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html (accessed January 2008).

CATLETT, C., et al., 2007, TeraGrid: analysis of organization, system architecture, and middleware

enabling new types of applications. In L. Grandinetti (Ed.), HPC and Grids in

Action, Advances in Parallel Computing Series, pp. 9–18 (Amsterdam: IOS Press).

CLIFF, A.D. and ORD, J.K., 1973, Spatial Autocorrelation (London, UK: Pion Press).

COWLES, M.K., 2003, Efficient model-fitting and model-comparison for high-dimensional Bayesian

geostatistical models. Journal of Statistical Planning and Inference, 112, pp. 221–239.

COWLES, M.K., 2002, Combining snow water equivalent data from multiple sources to estimate

spatio-temporal trends and compare measurement systems. Journal of Agricultural,

Biological, and Environmental Statistics, 7, pp. 536–557.

CTSS (Coordinated TeraGrid Software and Services), 2008, available online at:

http://www.teragrid.org/userinfo/software/ctss.php (accessed January 2008).

DENSHAM, P.J. and ARMSTRONG, M.P., 1998, Spatial analysis. In R. Healey, S. Dowers, B.

Gittings and M. Mineter (Eds), Parallel Processing Algorithms for GIS, pp. 387–413

(Bristol, PA: Taylor and Francis).

DING, Y. and DENSHAM, P.J., 1996, Spatial strategies for parallel spatial modeling. International

Journal of Geographical Information Systems, 10, pp. 669–698.

EUGSTER, P.Th., et al., 2003, The many faces of publish/subscribe. ACM Computing Surveys,

35(2), pp. 114–131.

FONSECA, F.T., et al., 2002, Using ontologies for integrated geographic information systems.

Transactions in GIS, 6(3), pp. 231–257.

FOSTER, I., 2006, Globus Toolkit Version 4: Software for service-oriented systems. In H. Jin,

D.A. Reed and W. Jiang (Eds), Proceedings of the IFIP International Conference on

Network and Parallel Computing, Tokyo, Japan, pp. 2–13 (Berlin: Springer).

GEON, 2008, Geosciences Network, available online at: http://www.geongrid.org/ (accessed

January 2008).

GETIS, A. and ORD, J.K., 1992, The analysis of spatial association by use of distance statistics.

Geographical Analysis, 24, pp. 189–206.

GOODCHILD, M.F., 2007, Citizens as voluntary sensors: spatial data infrastructure in the world of

Web 2.0. International Journal of Spatial Data Infrastructures Research, 2, pp. 24–32.

HUANG, H.-C., CRESSIE, N. and GABROSEK, J., 2002, Fast, resolution-consistent spatial prediction

of global processes from satellite data. Journal of Computational and Graphical

Statistics, 11, pp. 63–88.

HUMPHREY, M., et al., 2005, State and events for Web services: a comparison of five WS-resource

framework and WS-notification implementations. In Proceedings of the 14th IEEE

International Symposium on High Performance Distributed Computing (HPDC-14),

24–27 July 2005, Research Triangle Park, NC, pp. 3–13.

JSR-168, 2008, JSR-168 Portlet API, available online at:

http://developers.sun.com/portalserver/reference/techart/jsr168/ (accessed January 2008).

KARONIS, N., TOONEN, B. and FOSTER, I., 2002, MPICH-G2: a Grid-enabled implementation of

the message passing interface. Journal of Parallel and Distributed Computing (JPDC),

63(5), pp. 551–563.

KRISHNAN, S., WAGSTROM, P. and LASZEWSKI, G.V., 2002, GSFL: a workflow framework for grid

services. Technical Reports (Argonne, Chicago, IL: Argonne National Laboratory),

available online at: http://www-unix.globus.org/cog/papers/gsfl-paper.pdf (accessed

January 2008).

KRZANOWSKI, R.M. and RAPER, J., 1999, Hybrid genetic algorithm for transmitter location in

wireless networks. Computers, Environment and Urban Systems, 23, pp. 359–382.

LEAD, 2008, Linked Environments for Atmospheric Discovery, available online at:

https://portal.leadproject.org/gridsphere/gridsphere?cid=lead-grid (accessed January 2008).

MALANSON, G.P. and ARMSTRONG, M.P., 1996, Dispersal probability and forest diversity in a

fragmented landscape. Ecological Modelling, 87, pp. 91–102.

NEON, 2003, Neon: Addressing the Nation’s Environmental Challenges (Washington, DC:

National Academies Press).

NEON, 2008, National Ecological Observatory Network, available online at:

http://www.neoninc.org/ (accessed January 2008).

NSF (National Science Foundation), 2007, Cyberinfrastructure Vision for 21st Century

Discovery, available online at: http://www.nsf.gov/pubs/2007/nsf0728/index.jsp

(accessed January 2008).

SEEK, 2008, The Science Environment for Ecological Knowledge, available online at:

http://seek.ecoinformatics.org/ (accessed March 2008).

TERAGRID, 2008, TeraGrid Community Account Policy, available online at:

http://www.teragridforum.org/mediawiki/images/8/81/TGD-10.doc (accessed January 2008).

TIAN, M., et al., 2003, Performance impact of Web services on Internet servers. In T. Gonzalez (Ed.),

Proceedings of 2003 Parallel and Distributed Computing and Systems. November 2003,

pp. 162–184 (Marina del Rey, CA: ACTA).

TSOU, M.H. and BUTTENFIELD, B.P., 2002, A dynamic architecture for distributing geographic

information services. Transactions in GIS, 6(4), pp. 355–381.

WANG, S., 2008, Formalizing computational intensity of spatial analysis. In Proceedings of the

5th International Conference on Geographic Information Science, 23–26 September, Park

City, UT.

WANG, S. and ARMSTRONG, M.P., 2003, A quadtree approach to domain decomposition for

spatial interpolation in grid computing environments. Parallel Computing, 29, pp.

1481–1504.

WANG, S. and ARMSTRONG, M.P., 2009, A theoretical approach to the use of cyberinfrastructure

in geographical analysis. International Journal of Geographical Information Science,

23(2), pp. 169–193.

WANG, S., et al., 2005, GISolve: a Grid-based problem solving environment for computationally

intensive geographic information analysis. In Proceedings of the 14th International

Symposium on High Performance Distributed Computing (HPDC-14) – Challenges of

Large Applications in Distributed Environments (CLADE) Workshop, pp. 3–12

(Research Triangle Park, NC: IEEE Press).

WANG, S., COWLES, M.K. and ARMSTRONG, M.P., 2008, Grid computing of spatial statistics:

using the TeraGrid for Gi*(d) analysis. Concurrency and Computation: Practice and

Experience, 20(14), pp. 1697–1720.

WANG, S. and ZHU, X.-G., 2008, Coupling cyberinfrastructure and geographic information

systems to empower ecological and environmental research. BioScience, 58(2), pp. 94–95.

WATERS, 2008, WATer and Environmental Research Systems Network (WATERS Network),

available online at: http://www.watersnet.org/ (accessed January 2008).

WELCH, V., et al., 2004, X.509 proxy certificates for dynamic delegation. In Proceedings of the

3rd Annual Public Key Infrastructure (PKI) Research and Development Workshop,

12–14 April, Gaithersburg, MD.

WILKINS-DIEHR, N., 2007, Special issue: science Gateways – common community interfaces to Grid

resources. Concurrency and Computation: Practice & Experience, 19(6), pp. 743–749.

WRIGHT, D.J., et al., 2003, Why Web GIS may not be enough: a case study with the Virtual

Research Vessel. Marine Geodesy, 26(1–2), pp. 73–86.

WSFL, 2008, Web Service Flow Language, available online at: http://www.ebpml.org/wsfl.htm

(accessed April 2008).

WSRF, 2008, WSRF, available online at: http://www.globus.org/wsrf/ (accessed January 2008).

XLANG, 2008, XLANG, available online at: http://www.ebpml.org/xlang.htm (accessed April

2008).

YAN, J., et al., 2007, Parallelizing MCMC for Bayesian spatiotemporal geostatistical models.

Statistics and Computing, 17, pp. 323–335.

YANG, C. and RASKIN, R., 2009, Introduction to distributed geographic information processing

(DGIP) research. International Journal of Geographical Information Science, 23(5), pp.

553–560.

YANG, C., et al., 2005, Performance improving techniques in WebGIS. International Journal of

Geographical Information Science, 19(3), pp. 319–342.

YANG, C., et al., 2007, The emerging concepts and applications of the spatial web portal.

Photogrammetric Engineering & Remote Sensing, 73(6), pp. 691–698.

ZHANG, T. and TSOU, M.H., 2009, Developing a grid-enabled spatial Web portal for Internet

GIServices and geospatial cyberinfrastructure. International Journal of Geographical

Information Science, 23(5), pp. 605–630.
