Cyberinfrastructure and its Role in Science

Cameron Kiddle
Research Fellow, Grid Research Centre
Adjunct Assistant Professor, Department of Computer Science, University of Calgary
Distributed Systems Architect, WestGrid

DESCRIPTION

This presentation examines some of the challenges scientists face and describes various cyberinfrastructure technologies that help address these challenges. Example projects employing cyberinfrastructure technologies that we have worked on at the Grid Research Centre, including the GeoChronos project, are also presented. This presentation was given at the IAI International Wireless Sensor Networks Summer School held at the University of Alberta on July 6th, 2009.

TRANSCRIPT

Page 1: Cyberinfrastructure and its Role in Science

Cyberinfrastructure and its Role in Science

Cameron Kiddle

Research Fellow, Grid Research Centre

Adjunct Assistant Professor, Department of Computer Science, University of Calgary

Distributed Systems Architect, WestGrid

Page 2: Cyberinfrastructure and its Role in Science

Outline
  Challenges
  Cyberinfrastructure
  Cyberinfrastructure Technologies
  Examples
    ICE Force Project
    Molecular Dynamics Simulations
    GT4-based Grid for Canada
    Fire Dynamics Simulator
    Rendering on the Cloud
    GeoChronos

IAI Summer School July 6, 2009

Cyberinfrastructure - 2

Page 3: Cyberinfrastructure and its Role in Science

Collaboration Challenges
  Familiarity/awareness of collaboration tools
  Keeping all interested parties in the loop
  Finding related work and researchers
  Keeping up to date with current research
  Collaboration while working in the field

Page 4: Cyberinfrastructure and its Role in Science

Data Challenges
  Acquisition of data
    Many different data sources
    Large quantities of data
    Different regulations/mechanisms for accessing data
    Lack of automation
    Finding the right data
    Bandwidth constraints
  Managing data
    Scattered and unorganized data
    Inadequate tools for recording/maintaining metadata
      Data without metadata is meaningless
      Lack of suitable metadata standards
      Validation of metadata
    Tracking provenance of data
  Pre-processing of data
    Raw data typically cannot be directly analyzed
    Significant amount of time spent preparing data for analysis
    Lack of automation

Page 5: Cyberinfrastructure and its Role in Science

Application Challenges
  Limited availability of computing resources
  Access to and familiarity with heterogeneous computing resources
  Fault tolerance and reliability
  Access to software available in the research lab while in the field or at other locations
  Installing, configuring and updating software
  System dependencies of software
  Awareness and suitability of available software
  Sharing applications and results

Page 6: Cyberinfrastructure and its Role in Science

Cyberinfrastructure


“Like the physical infrastructure of roads, bridges, power grids, telephone lines, and water systems that support modern society, "cyberinfrastructure" refers to the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor.”

Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, 2003.

Page 7: Cyberinfrastructure and its Role in Science

Cyberinfrastructure Technologies
  Grid Computing
  Cloud Computing
  Virtualization
  Web 2.0 / Social Networking
  Web Portals / Scientific Gateways
  Semantic Web
  …

Page 8: Cyberinfrastructure and its Role in Science

Grid Computing
  Many different definitions/uses
    computational grids, data grids, desktop grids, campus grids, sensor grids, access grids
  Coordinated sharing of heterogeneous resources across administrative domains

[Figure: resources in Domains A, B and C shared by Virtual Organization X and Virtual Organization Y]

Page 9: Cyberinfrastructure and its Role in Science

Grid Middleware
  The layer between users/applications and grid resources that glues everything together
  Example grid middleware
    Globus Toolkit
      GT2 – pre-standards
      GT4 – Web Services based
    UNICORE
    gLite
    ARC
    NAREGI

Page 10: Cyberinfrastructure and its Role in Science

Key Grid Middleware Services
  Security Services
    Concerned with authentication, authorization, secure communication, …
  Information Services
    Provide information about resources, policy, services and applications to tools and users
  Data Management Services
    Manage movement and replication of data as well as metadata about data
  Execution Management Services
    Handle placement, provisioning and lifetime management of jobs and workflows

Page 11: Cyberinfrastructure and its Role in Science

Benefits of Grid Computing
  Easier access to more resources
    Users/organizations can share resources
    Single sign-on
    Common interface (hide heterogeneity)
  Improved data management
    Efficient file transfers
    Abstraction of physical location of data
  Automated execution of jobs and workflows

Page 12: Cyberinfrastructure and its Role in Science

Example Grid Projects

LHC Computing Grid (http://lcg.web.cern.ch/)
  Data storage and analysis infrastructure for the high energy physics community using the Large Hadron Collider (LHC) at CERN (ATLAS Tier-1 site at TRIUMF in British Columbia)

Network for Earthquake Engineering Simulation (NEES) (http://www.nees.org/)
  A US national network of 15 facilities to study the impact of earthquakes on buildings, bridges, etc.

Expanding GEOsciences on DEmand (EGEODE) (http://www.egeode.org/)
  A virtual organization (VO) associated with EGEE that is dedicated to research in geoscience for both public and private industrial R&D and academic laboratories

International Virtual Observatory Alliance (IVOA) (http://www.ivoa.net/)
  Development of standards and infrastructure to share and analyze astronomical archives from around the world

Page 13: Cyberinfrastructure and its Role in Science

Cloud Computing
  Transparent access to scalable and dynamic services over the Internet
  Key features:
    Everything as a Service (EaaS)
    Utility/On-demand
    Accessibility/Transparency
    Scalability
    Virtualization

Page 14: Cyberinfrastructure and its Role in Science

Cloud Computing Solutions


Page 15: Cyberinfrastructure and its Role in Science

Benefits of Cloud Computing
  Reduce capital, support and maintenance costs
    Pay only for what you use
    Get access to more/fewer resources when needed
  Ready to use for users
    No more downloads, installations or updates
  Simplify and speed up software development
    Don’t have to support multiple platforms
  Application popularity and lifespan difficult to predict
    Scale applications according to user demand

Page 16: Cyberinfrastructure and its Role in Science

Cloud Computing Case Study: Application Popularity on Facebook
  Difficult to predict popularity and lifespan of applications
  Facebook Application Growth
    Sep. 2007: ~3700
    Sep. 2008: ~39000
  Facebook Application Popularity (Sep. 12, 2008)
    39181 applications
    Active user data for 37155 apps
    3 apps > 10 million active users
    80% apps < 1000 active users

[Figure: Monthly Active Users vs. Rank of Facebook Applications (September 12, 2008)]

Page 17: Cyberinfrastructure and its Role in Science

Cloud Computing Case Study: Shrek (Dreamworks)
  Shrek (2001) – 5 million CPU render hours
  Shrek 2 (2004) – 10 million CPU render hours
  Shrek 3 (2007) – 20 million CPU render hours

Time to Render

           1 CPU        100 CPUs     10000 CPUs
Shrek      571 years    5.7 years    21 days
Shrek 2    1142 years   11.4 years   42 days
Shrek 3    2283 years   22.8 years   83 days

(Source: R. Rowe. DreamWorks Animation "Shrek the Third": Linux Feeds an Ogre. Linux Journal. June 5, 2007. (http://www.linuxjournal.com/article/9653))
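The render-time figures follow directly from the quoted CPU hours. A minimal sketch of the arithmetic, assuming perfect linear scaling across CPUs and a 365-day year (both are simplifying assumptions, and the function names are illustrative only):

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365  # assumption: leap years ignored

# CPU render hours quoted on the slide
render_hours = {"Shrek": 5_000_000, "Shrek 2": 10_000_000, "Shrek 3": 20_000_000}


def time_to_render(cpu_hours, cpus):
    """Wall-clock hours, assuming the work divides evenly across all CPUs."""
    return cpu_hours / cpus


def describe(hours):
    """Format a duration in years when it is a year or longer, else in days."""
    if hours >= HOURS_PER_YEAR:
        years = hours / HOURS_PER_YEAR
        return f"{years:.0f} years" if years >= 100 else f"{years:.1f} years"
    return f"{hours / HOURS_PER_DAY:.0f} days"


for film, hours in render_hours.items():
    row = [describe(time_to_render(hours, n)) for n in (1, 100, 10_000)]
    print(film, row)
```

Real render farms rarely scale perfectly, so these numbers are best read as lower bounds on wall-clock time.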

Page 18: Cyberinfrastructure and its Role in Science

Cloud Computing Case Study: Animoto
  Animoto (http://animoto.com)
    Produces professional quality videos from images
    Runs on Amazon EC2
  Popularity soared when promoted on Facebook
  During the course of 4 days:
    Jumped from 8 to 450 renderings per minute
    ~20000 new users per hour
    3500 instances running on Amazon EC2 at peak

(Source: D. Barker. You Need 3,500 Servers by When?! On-demand Enterprise. 2008.07.07)
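To put the Animoto figures in perspective, the sketch below computes the demand multiplier and a rough sizing rule. The sizing rule is hypothetical: it assumes every peak instance was busy rendering, which the source does not state, and `instances_needed` is an illustrative name rather than anything Animoto or Amazon EC2 provides.

```python
import math

# Figures quoted on the slide
RENDERS_BEFORE = 8       # renderings per minute before the surge
RENDERS_PEAK = 450       # renderings per minute at peak
INSTANCES_PEAK = 3500    # EC2 instances running at peak

# Demand multiplier over the four days
growth = RENDERS_PEAK / RENDERS_BEFORE


def instances_needed(renders_per_minute):
    """Hypothetical sizing rule: scale instances in proportion to demand.

    Assumes the peak ratio of instances to renderings holds at any load.
    """
    return math.ceil(renders_per_minute * INSTANCES_PEAK / RENDERS_PEAK)


print(f"demand grew {growth:.2f}x")
print("instances for 8 renders/min:", instances_needed(8))
print("instances for 450 renders/min:", instances_needed(450))
```

The point of the case study is that a pay-per-use provider lets the instance count follow such a rule in both directions, rather than buying hardware for the peak.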

Page 19: Cyberinfrastructure and its Role in Science

Virtualization
  Can transform a single physical machine into multiple virtual machines (VMs), each with its own OS and software stack
  Virtualization software
    Xen, KVM, VMware
    Support allocation, deallocation, checkpointing and migration of VMs
  Benefits
    Custom environments (root access)
    More efficient use of resources (consolidation)
    System maintenance without disruption

Page 20: Cyberinfrastructure and its Role in Science

Web 2.0 – The “Social Web”

  Aimed at:
    Providing feature rich user environments
    Making it easier for users to generate Web content
    Improving online social connectivity
  Example Web 2.0 technologies
    Blogs (WordPress, TypePad)
    Wikis (Wikipedia)
    Mashups (HousingMaps, ChicagoCrime)
    Widgets/Gadgets (iGoogle, Netvibes)
    Social networks (Facebook, MySpace, YouTube)

Page 21: Cyberinfrastructure and its Role in Science

Social Networking Sites/Platforms


Page 22: Cyberinfrastructure and its Role in Science

Web Portals / Scientific Gateways

  Aimed at providing a community of users access to computing resources through a common Web-based interface
  Web portal development tools
    GridSphere (portlet based)
    Web 2.0/Social Networking
  Examples
    TeraGrid Scientific Gateways (over 30 of them)
    nanoHUB

Page 23: Cyberinfrastructure and its Role in Science

Semantic Web
  Aimed at representing knowledge, not just information
  Connecting and relating data in a way understandable by machines
  Semantic Web standards
    Resource Description Framework (RDF)
    Web Ontology Language (OWL)
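RDF expresses knowledge as subject-predicate-object triples. A minimal sketch of that data model using plain Python tuples rather than an RDF library; every `ex:` identifier is made up for illustration, and only `rdf:type` is a real RDF term:

```python
# Each fact is a (subject, predicate, object) triple, as in the RDF data model.
# Assumption: the "ex:" names are hypothetical and not taken from any real ontology.
triples = {
    ("ex:GeoChronos", "rdf:type", "ex:Project"),
    ("ex:GeoChronos", "ex:uses", "ex:Elgg"),
    ("ex:SpectralLibrary", "ex:partOf", "ex:GeoChronos"),
    ("ex:Elgg", "rdf:type", "ex:SocialNetworkingPlatform"),
}


def match(pattern, store=triples):
    """Return the triples matching a pattern, where None acts as a wildcard."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# Everything stated about GeoChronos
print(match(("ex:GeoChronos", None, None)))
# Everything declared to have a type
print(match((None, "rdf:type", None)))
```

Because every fact has the same shape, a program can relate data from different sources without knowing their original file formats, which is the sense in which the data becomes machine-understandable.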

Page 24: Cyberinfrastructure and its Role in Science

Confederation Bridge ICE Force Monitoring Project
  Monitoring of forces on the Confederation Bridge
  Data analyzed by civil engineering groups at University of Calgary and Carleton University
  GRC developed solution to automate data management as part of a CANARIE AAP project

(http://www.confederationbridge.com)

Page 25: Cyberinfrastructure and its Role in Science

ICE Force - Technologies Used
  Grid Middleware
    GT4
  Data Management
    Proactive Data Management Service (PDMS)
    Data Transfer - GridFTP, RFT
    Replication Management – RLS
    Metadata Management - MCS
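Replication management rests on mapping a logical file name to the physical copies that hold it, which is what lets users ignore where data physically lives. A toy sketch of that idea follows; it is not the Globus RLS API, and the class, file names and hosts are all hypothetical:

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy logical-to-physical name mapping (illustrative, not the RLS API)."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record one physical copy of a logical file, ignoring duplicates."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical copies, or an empty list if none exist."""
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
# Assumption: hypothetical hosts and paths, shown with GridFTP-style URLs
catalog.register("bridge/2009-07-06.dat", "gsiftp://storage-a.example.org/data/0706.dat")
catalog.register("bridge/2009-07-06.dat", "gsiftp://storage-b.example.org/mirror/0706.dat")
print(catalog.lookup("bridge/2009-07-06.dat"))
```

A transfer service can then pick whichever replica is closest or least loaded, while analysis code refers only to the logical name.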

Page 26: Cyberinfrastructure and its Role in Science

Molecular Dynamics Simulations (GROMACS)
  GROMACS
    Parallel molecular dynamics simulation application
    Can simulate hundreds to millions of particles
  Simulation runs can take days, weeks or months
  Issues with long running jobs
    Fault tolerance
    Scheduler policy constraints

(http://moose.bio.ucalgary.ca/)

Page 27: Cyberinfrastructure and its Role in Science

GROMACS - Grid Enabled Solution
  Automated grid enabled solution developed by GRC to manage GROMACS simulations as part of a CANARIE AAP project
  Long jobs split into a series of shorter jobs
  Automates checkpointing, migration and reconfiguration of jobs
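Splitting a long run into shorter jobs can be sketched as a simple planning step. This is an illustration under assumptions, not the GRC system: `plan_segments` and its parameters are hypothetical, and each segment is assumed to resume from a checkpoint written by the previous one.

```python
import math


def plan_segments(total_hours, max_walltime_hours):
    """Split one long run into consecutive segments under a walltime limit.

    Each segment after the first is assumed to restart from the checkpoint
    written at the end of the previous segment.
    """
    if total_hours <= 0 or max_walltime_hours <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_hours / max_walltime_hours)
    segments = []
    start = 0
    for index in range(count):
        end = min(start + max_walltime_hours, total_hours)
        segments.append({
            "index": index,
            "start_hour": start,
            "end_hour": end,
            "resume_from": index - 1 if index > 0 else None,
        })
        start = end
    return segments


# A 100-hour simulation on a scheduler that caps jobs at 24 hours
for segment in plan_segments(100, 24):
    print(segment)
```

Keeping segments short addresses both issues named earlier: a failure loses at most one segment of work, and each job fits within scheduler policy limits. Since every segment starts from a checkpoint, it can also be submitted to a different resource than its predecessor.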

Page 28: Cyberinfrastructure and its Role in Science

GROMACS - Portal

Page 29: Cyberinfrastructure and its Role in Science

GROMACS - Technologies Used
  Grid Middleware
    GT4
  Information Services
    WS MDS
  Data Management
    PDMS (GridFTP, RFT, RLS, MCS)
  Execution Management
    Custom system (Condor-G, WS GRAM)
  Portal
    GridSphere

Page 30: Cyberinfrastructure and its Role in Science

Web Service based Grid Environment for Canada
  Established a GT4-based grid environment from resources across Canada (CANARIE CIIP)

Page 31: Cyberinfrastructure and its Role in Science

GT4-based Grid - Model Schemas
  Models developed to describe systems, applications and scheduler policy (GRC Model Schema)

[Figure: System Model Class Diagram]

Page 32: Cyberinfrastructure and its Role in Science

GT4-based Grid – Viewing Resource Information
  Used WebMDS, a customizable Web based interface for viewing resource information published by WS MDS

Page 33: Cyberinfrastructure and its Role in Science

GT4-based Grid - Technologies Used

  Grid Middleware
    GT4
  Data Management
    GridFTP, RFT
  Information Services
    GRC Model Schema, WS MDS, WebMDS
  Execution Management
    Condor-G, WS GRAM

Page 34: Cyberinfrastructure and its Role in Science

Example: Fire Simulation
  Developed a comprehensive environment for the Fire Dynamics Simulator (FDS) as part of a collaborative project between GRC and HP Labs
  Deployed on HP Labs Data Centre at University of Calgary
  Initial focus of project
    Leverage Web 2.0 technologies
    Explore use of virtualization in a utility/cloud computing environment

Page 35: Cyberinfrastructure and its Role in Science

Fire Simulation - Technologies Used
  User level
    Web 2.0/social networking technology (Facebook)
  Service provider level
    LAMP environment (Linux, Apache, MySQL, Perl/Python/PHP)
    Simulation (FDS, Condor)
    Visualization (Smokeview, VNC)
  Resource (utility) provider level
    Cloud computing technology (ASPEN)
    Virtual machine technology (Xen)

Page 36: Cyberinfrastructure and its Role in Science

Example: Rendering on the Cloud
  GRC created an on-demand cloud rendering service for EDM Studio
  Cybera Pilot Project
  Technologies used:
    Cloud computing technology (ASPEN)
    Virtual machine technology (Xen)
    Social networking technology (Ning/Elgg)

Page 37: Cyberinfrastructure and its Role in Science

GeoChronos
  An on-line platform
  For:
    Earth Observation Scientists
  Facilitating:
    Collaboration between scientists
    Data access, management and sharing
    Application access, management and sharing
  Leveraging:
    Web 2.0 / social networking technologies (Elgg)
    Semantic Web technologies (RDF, OWL)
    Cloud computing and virtualization technologies (ASPEN, Xen)

Page 38: Cyberinfrastructure and its Role in Science

GeoChronos - Collaboration

  Social networking portal
    Elgg-based (elgg.org)
  Social networking services
    Blogs
    Tags
    Media/document sharing
    Wikis
    Friends/contacts
    Groups
    Discussions
    Message boards
    Calendars
    Status
    News Feeds

http://geochronos.org/

Page 39: Cyberinfrastructure and its Role in Science

GeoChronos - Data
  Data Acquisition
    Automated acquisition of data from sensors (ground, airborne, satellite) or third parties
  Data Storage
    Store, share, browse and search data
    e.g., spectral library
  Data Processing
    Automated data workflows
    e.g., mosaic, reproject and subset MODIS data

Page 40: Cyberinfrastructure and its Role in Science

GeoChronos - Applications
  Interactive Application Service (IAS)
    On-line, on-demand access to scientific applications
    Share application sessions and data with other users
    Access control to applications
  Batch Processing Service
    Batch processing environment for longer running data processing tasks or simulations
    For use directly by individual users or as part of automated data workflows

Page 41: Cyberinfrastructure and its Role in Science

GeoChronos - Project Team

  Principal Investigators
    Dr. Arturo Sanchez-Azofeifa, University of Alberta
    Dr. John Gamon, University of Alberta
    Dr. Benoit Rivard, University of Victoria
    Dr. Rob Simmonds, University of Calgary

[Figure: team members grouped as Project Coordination, Platform Development and Domain Scientists]

Page 42: Cyberinfrastructure and its Role in Science

GeoChronos - Virtual Organization


Page 43: Cyberinfrastructure and its Role in Science

Contact Information

  Cameron Kiddle
  [email protected]
  http://pages.cspc.ucalgary.ca/~kiddlec/
  http://grid.ucalgary.ca/