Cyberinfrastructure and California

Post on 05-Jan-2016

Page 1: Cyberinfrastructure and California

UNIVERSITY OF CALIFORNIA

SAN DIEGO SUPERCOMPUTER CENTER

Fran Berman

UCSD

Dr. Francine Berman

Director, San Diego Supercomputer Center

Professor and High Performance Computing Endowed Chair, UC San Diego

Cyberinfrastructure and California

Page 2: Cyberinfrastructure and California

The Digital World

Commerce

Entertainment

Information

Science

Page 3: Cyberinfrastructure and California

Today’s Technology is a Team Sport

• Today’s “computer” is a coordinated set of hardware, software, data, and services providing an “end-to-end” resource.

[Figure: The “computer” as an integrated set of resources. Diagram labels: computer, network, storage, DATA, viz, sensors, field instrument, wireless]

• Cyberinfrastructure captures the integrated character of today’s IT environment

Page 4: Cyberinfrastructure and California

Cyberinfrastructure -- An Integrating Concept

Cyberinfrastructure = Resources (computers, data storage, networks, scientific instruments, experts, etc.) + “Glue” (integrating software, systems, and organizations)

Page 5: Cyberinfrastructure and California

How does Cyberinfrastructure Work? Cyberinfrastructure-enabled Neurosurgery

• PROBLEM: Neurosurgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue

• The brain deforms during surgery

• Surgeons must align the preoperative brain image with intra-operative images to have the best opportunity for intra-surgical navigation

Radiologists and neurosurgeons at Brigham and Women’s Hospital (BWH), Harvard Medical School, are exploring transmission of 30/40 MB brain images (generated during surgery) to SDSC for analysis and alignment

Finite element simulation on a biomechanical model for volumetric deformation is performed at SDSC; output results are sent to BWH, where updated images are shown to surgeons

Transmission is repeated every hour during a 6-8 hour surgery.

Transmission and output must take on the order of minutes
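The timing constraint above can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the image sizes (read here as 30 to 40 MB, which is an interpretation of "30/40 MB") and the hourly cadence come from the slide, while the time budgets and the function name `required_mbps` are assumptions and not figures from the BWH/SDSC project.

```python
# Rough network budget for moving one intra-operative brain image.
# Image sizes and the 6-8 hour surgery length come from the slide;
# the time budgets below are assumed values chosen for illustration.


def required_mbps(image_mb: float, budget_seconds: float) -> float:
    """Sustained rate in megabits per second needed to move one image in time.

    Uses decimal units: 1 MB = 8 megabits.
    """
    return image_mb * 8 / budget_seconds


if __name__ == "__main__":
    for budget in (30, 60, 120):  # assumed transfer budgets, in seconds
        low = required_mbps(30, budget)
        high = required_mbps(40, budget)
        print(f"{budget:>4} s budget: {low:.1f} to {high:.1f} Mbit/s sustained")

    # Total image data shipped per surgery at one image per hour.
    print("total per surgery:", 6 * 30, "to", 8 * 40, "MB")
```

The estimate covers the transfer only. The finite element simulation and the return of results share the same "order of minutes" window, so the real budget for the network leg would be tighter than the numbers printed here.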

Page 6: Cyberinfrastructure and California

SDSC

• National facility funded by NSF, NIH, DOE, Library of Congress, NARA, etc.

• Employs nearly 400 researchers, staff and students

• National Facility and UCSD Organized Research Unit

• Home to many associated activities including

• Protein Data Bank

• Biomedical Informatics Research Network (BIRN) Coordinating Center

• Geosciences Network (GEON)

• NEES IT Center, etc.

SDSC is a National Cyberinfrastructure Center

Grid and Cluster Computing

Data-oriented Science and Engineering

Networking

High Performance Computing

Data and Knowledge Systems

Computational Science and Engineering

Community Databases and Data Collections

SW tools, workbenches, toolkits

Page 7: Cyberinfrastructure and California

SDSC Resources Are Available to the Community

COMPUTE SYSTEMS

• DataStar
  • 2,528 Power4+ processors
  • IBM p655 8-way and p690 32-way nodes
  • 7 TB total memory
  • Up to 3 GBps I/O to disk
• TeraGrid Cluster
  • 512 Itanium2 IA-64 processors
  • 1 TB total memory
  • Also 128 2-way data nodes
• Blue Gene Data
  • First academic IBM Blue Gene system
  • 2,048 PowerPC processors
  • 128 I/O nodes

http://www.sdsc.edu/user_services/

SCIENCE and TECHNOLOGY STAFF, SOFTWARE, SERVICES

• User Services
• Application/Community Collaborations
• Education and Training
• SDSC Synthesis Center
• Community SW, toolkits, portals, codes
• http://www.sdsc.edu/

DATA ENVIRONMENT

• 1.4 PB Storage-area Network (SAN)
• 6 PB StorageTek tape library
• HPSS and SAM-QFS archival systems
• DB2, Oracle, MySQL
• Storage Resource Broker
• 72-CPU Sun Fire 15K
• IBM p690s – HPSS, DB2, etc.
• http://datacentral.sdsc.edu/

Support for community data collections and databases

Data management, mining, analysis, and preservation

Page 8: Cyberinfrastructure and California

Cyberinfrastructure Can Help Harness Today’s Deluge of Data

• Over the next decade, data will come from everywhere
  • Scientific instruments
  • Experiments
  • Sensors and sensornets
  • New devices (personal digital devices, computer-enabled clothing, cars, …)

• And be used by everyone
  • Scientists
  • Consumers
  • Educators
  • General public

• Cyberinfrastructure must support unprecedented diversity, globalization, integration, scale, and use

[Figure labels: Data from sensors; Data from simulations; Data from instruments; Data from analysis; Volunteer Data]

Page 9: Cyberinfrastructure and California

How much Data is there?*

Kilo = 10^3
Mega = 10^6
Giga = 10^9
Tera = 10^12
Peta = 10^15
Exa = 10^18

1 low-resolution photo = 100 KiloBytes
1 novel = 1 MegaByte
iPod Shuffle (up to 120 songs) = 512 MegaBytes
Printed materials in the Library of Congress = 10 TeraBytes
1 human brain at the micron level = 1 PetaByte
SDSC HPSS tape archive = 6 PetaBytes
All worldwide information in one year = 2 ExaBytes

* Rough/average estimates
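To get a feel for the jumps between these scales, the prefixes and estimates above can be put into a small lookup and compared. This is a minimal sketch: the byte counts are the slide's rough estimates, prefixes are treated as decimal powers of ten as on the slide, and the names `PREFIX_BYTES`, `EXAMPLES`, `to_bytes`, and `ratio` are made up for this illustration.

```python
# Decimal byte prefixes as listed on the slide.
PREFIX_BYTES = {
    "Kilo": 10**3,
    "Mega": 10**6,
    "Giga": 10**9,
    "Tera": 10**12,
    "Peta": 10**15,
    "Exa": 10**18,
}


def to_bytes(value: int, prefix: str) -> int:
    """Convert a value with a decimal prefix into a plain byte count."""
    return value * PREFIX_BYTES[prefix]


# Rough/average estimates quoted on the slide.
EXAMPLES = {
    "low-resolution photo": to_bytes(100, "Kilo"),
    "novel": to_bytes(1, "Mega"),
    "iPod Shuffle": to_bytes(512, "Mega"),
    "Library of Congress print": to_bytes(10, "Tera"),
    "human brain at micron level": to_bytes(1, "Peta"),
    "SDSC HPSS tape archive": to_bytes(6, "Peta"),
    "worldwide information per year": to_bytes(2, "Exa"),
}


def ratio(larger: str, smaller: str) -> float:
    """How many copies of `smaller` fit into `larger`."""
    return EXAMPLES[larger] / EXAMPLES[smaller]


if __name__ == "__main__":
    print("novels per HPSS archive:", ratio("SDSC HPSS tape archive", "novel"))
    print("Library of Congress print collections per HPSS archive:",
          ratio("SDSC HPSS tape archive", "Library of Congress print"))
    print("HPSS archives per year of worldwide information:",
          ratio("worldwide information per year", "SDSC HPSS tape archive"))
```

Because every input is a rough estimate, the ratios are only meaningful as orders of magnitude.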

Page 10: Cyberinfrastructure and California

Cyberinfrastructure and Data: Using Data for Analysis and Simulation

Page 11: Cyberinfrastructure and California

Major Earthquakes on the San Andreas Fault, 1680-present

1906: M 7.8

1857: M 7.8

1680: M 7.7

How dangerous is the southern San Andreas Fault?

• The SCEC TeraShake simulation is a result of immense effort from the Geoscience community for over 10 years

• Focus is on understanding big earthquakes and how they will impact sediment-filled basins.

• Simulation combines massive amounts of data, high-resolution models, large-scale supercomputer runs

• TeraShake results provide new information enabling better

• Estimation of seismic risk

• Emergency preparation, response and planning

• Design of next generation of earthquake-resistant structures

• Such simulations offer potentially immense benefits: saving many lives and avoiding billions in economic losses

Cyberinfrastructure-enabled Disaster Preparedness

Page 12: Cyberinfrastructure and California

Domain: 600 km x 300 km x 80 km

Mesh dimension: 3000 x 1500 x 400

Spatial resolution = 200 m

Simulated time = 200 s

Number of time steps = 20,000

• What you’re looking at:

• L.A. experiences strong ground motion from the S->N scenario

• The N->S rupture generates strong reverberations in the Imperial Valley, ultimately hitting Mexicali and other northern Mexico cities.

• Large local peaks in ground motion near Palm Springs, resulting in immense damage.
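The simulation parameters on this slide are mutually consistent, which a few lines of arithmetic can confirm. The sketch below assumes a uniform grid (one cell per 200 m in each direction) and evenly spaced time steps. Both are assumptions made for illustration, and the function names are hypothetical rather than taken from the TeraShake code.

```python
# TeraShake parameters as stated on the slide.
DOMAIN_KM = (600, 300, 80)   # domain extent in km
RESOLUTION_M = 200           # spatial resolution in meters
SIMULATED_SECONDS = 200      # simulated time
TIME_STEPS = 20_000          # number of time steps


def mesh_dimensions(domain_km, resolution_m):
    """Cells along each axis, assuming a uniform grid."""
    return tuple(km * 1000 // resolution_m for km in domain_km)


def grid_points(dims):
    """Total number of cells in the mesh."""
    total = 1
    for n in dims:
        total *= n
    return total


def time_step_seconds(simulated_seconds, steps):
    """Length of one time step, assuming evenly spaced steps."""
    return simulated_seconds / steps


if __name__ == "__main__":
    dims = mesh_dimensions(DOMAIN_KM, RESOLUTION_M)
    print("mesh:", dims)
    print("cells:", grid_points(dims))
    print("time step (s):", time_step_seconds(SIMULATED_SECONDS, TIME_STEPS))
```

Under these assumptions the domain and resolution reproduce the stated 3000 x 1500 x 400 mesh, which works out to 1.8 billion cells advanced through 20,000 steps of 0.01 s each.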

Page 13: Cyberinfrastructure and California

Making Terashake Work -- Resources

• Data Storage
  • 47 TB archival tape storage on Sun StorEdge SAM-QFS
  • 47 TB backup on High Performance Storage System (HPSS)
  • SRB Collection with 1,000,000 files

• Funding
  • SDSC Cyberinfrastructure resources for TeraShake funded by NSF
  • Southern California Earthquake Center is an NSF-funded geoscience research and development center

• Computers and Systems
  • 80,000 hours on 240 processors of DataStar
  • 256 GB memory p690 used for testing, p655s used for production run, TG (TeraGrid) used for porting
  • 30 TB Global Parallel file system (GPFS)
  • Run-time 100 MB/s data transfer from GPFS to SAM-QFS
  • 27,000 hours post-processing for high resolution rendering

• People
  • 20+ people involved in information technology support
  • 20+ people involved in geoscience modeling and simulation
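The storage figures above give a sense of why the data path mattered as much as the compute. The following sketch estimates how long the archive would take to move at the stated run-time rate and what the mean file size would be. It rests on assumptions the slide does not state: decimal units, a transfer sustained at exactly 100 MB/s, all 47 TB passing through that path, and the 1,000,000-file SRB collection holding the full 47 TB. The function names are hypothetical.

```python
TB = 10**12  # decimal terabyte, assumed
MB = 10**6   # decimal megabyte, assumed


def transfer_days(total_bytes: int, rate_bytes_per_s: int) -> float:
    """Days needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 86_400


def mean_file_size_mb(total_bytes: int, file_count: int) -> float:
    """Average file size in MB if the bytes are spread over file_count files."""
    return total_bytes / file_count / MB


if __name__ == "__main__":
    archive = 47 * TB   # archival storage stated on the slide
    rate = 100 * MB     # run-time GPFS to SAM-QFS rate stated on the slide
    print(f"days to move archive: {transfer_days(archive, rate):.2f}")
    print(f"mean file size (MB): {mean_file_size_mb(archive, 1_000_000):.1f}")
```

Under these assumptions the transfer alone would occupy several days of sustained I/O, which is consistent with the slide's emphasis on moving data to the archive during the run rather than afterwards.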

Page 14: Cyberinfrastructure and California

Cyberinfrastructure and Data: Preserving our Scientific and Cultural Heritage

Page 15: Cyberinfrastructure and California

Data Preservation

• Many Science, Cultural, and Official Collections must be sustained for the foreseeable future

• Critical collections must be preserved:

• community reference data collections (e.g. Protein Data Bank)

• irreplaceable collections (e.g. Shoah collection)

• longitudinal data (e.g. PSID – Panel Study of Income Dynamics)

• No plan for preservation often means that data is lost or damaged

“… the progress of science and useful arts … depends on the reliable preservation of knowledge and information for generations to come.”

“Preserving Our Digital Heritage”, Library of Congress

Page 16: Cyberinfrastructure and California

Key Challenges for Digital Preservation

• What should we preserve?
  • What materials must be “rescued”?
  • How to plan for preservation of materials by design?

• How should we preserve it?
  • Formats
  • Storage media
  • Stewardship – who is responsible?

• Who should pay for preservation?
  • The content generators?
  • The government?
  • The users?

• Who should have access?

Print media provides easy access for long periods of time but is hard to data-mine

Digital media is easier to data-mine but requires management of evolution of media and resource planning over time

Page 17: Cyberinfrastructure and California

Planning Ahead for Preservation

• Comprehensive approach to infrastructure for long-term preservation requires the integration of
  • Collection ingestion
  • Access and Services
  • Research and development for new functionality and adaptation to evolving technologies
  • Business model, data policies, and management issues critical to success of the infrastructure

[Diagram labels: Ingestion, Services, Policy, R&D, Consortium]

Page 18: Cyberinfrastructure and California

Cyberinfrastructure Resources at SDSC

Page 19: Cyberinfrastructure and California

SDSC Data Central

• First program of its kind to support research and community data collections and databases

• Comprehensive resources
  • Disk: 400 TB accessible via HPC systems, Web, SRB, GridFTP
  • Databases: DB2, Oracle, MySQL
  • SRB: Collection management
  • Tape: 6 PB, accessible via file system, HPSS, Web, SRB, GridFTP

• Data collection and database hosting
• Batch oriented access
• Collection management services
• Collaboration opportunities:
  • Long-term preservation
  • Data technologies and tools

New Allocated Data Collections include

• Bee Behavior (Behavioral Science)
• C5 Landscape DB (Art)
• Molecular Recognition Database (Pharmaceutical Sciences)
• LIDAR (Geoscience)
• LUSciD (Astronomy)
• NEXRAD-IOWA (Earth Science)
• AMANDA (Physics)
• SIO_Explorer (Oceanography)
• Tsunami and Landsat Data (Earthquake Engineering)
• UC Merced Library Japanese Art Collection (Art)
• Terabridge (Structural Engineering)

[email protected]

Page 20: Cyberinfrastructure and California

SDSC Cyberinfrastructure Resources Heavily Used by UC faculty and students

• UC PIs account for 329+ trillion bytes of data stored at SDSC

• In FY05, over 5 million CPU hours on HPC machines at SDSC were used by UC faculty and students at all campuses

• UCSD faculty make up 40% of the top users of SDSC compute resources

SDSC Academic Associates Program Targets Enabling Cyberinfrastructure Collaborations

SDSC/UC Academic Associates Program Cyberinfrastructure and “Seeding” Activities

• Targeted workshops

• Priority SW installation and support

• Priority participation for Cyberinfrastructure Summer Institute

• Focused assistance with developing successful proposals for national allocation programs

• Targeted user services

• Special UC compute and data allocations

• Priority for “early usage” of new national resources

Page 21: Cyberinfrastructure and California

Cyberinfrastructure is Fundamental for California

• Cyberinfrastructure captures the practice and potential of modern science and engineering

• Cyberinfrastructure is the focus of an increasing number of federal programs
  • NSF (all directorates), NIH (BISTI, Bioinformatics, Computational Biology, etc.), DOE (Science Grid), etc.

• Cyberinfrastructure is critical for success in modern research and education initiatives
  • Stem cell research
  • Grid computing
  • Multi-disciplinary science and engineering

• Multi-disciplinary science and engineering

Leadership in Cyberinfrastructure provides a competitive edge to California researchers, educators, practitioners, and business leaders

Page 22: Cyberinfrastructure and California

Thank You

[email protected]