HPC, Grid and Cloud Computing - The Past, Present, and Future Challenge

HPC, Grid and Cloud Computing - The Past, Present and Future. Jason Shih, Academia Sinica Grid Computing. FBI 極簡主義 (Minimalism), Nov 3rd, 2010

Upload: jason-shih

Post on 23-Jan-2015

Page 1: Hpc, grid and cloud computing - the past, present, and future challenge

HPC, Grid and Cloud Computing - The Past, Present and Future

Jason Shih, Academia Sinica Grid Computing

FBI 極簡主義 (Minimalism), Nov 3rd, 2010

Page 2: Hpc, grid and cloud computing - the past, present, and future challenge

Outline

- Trend in HPC
- Grid: eScience Research @ PetaScale
- Cloud Hype and Observation
- Future Exploration Path of Computing
- Summary

Page 3: Hpc, grid and cloud computing - the past, present, and future challenge


About ASGC

Large Hadron Collider (LHC)

Avian Flu Drug Discovery Grid Application Platform

A Worldwide Grid Infrastructure

Asia Pacific Regional Operation Center

- >280 sites, >45 countries
- >80,000 CPUs, >20 petabytes
- >14,000 users, >200 VOs
- >250,000 jobs/day

Best Demo Award of EGEE’07!

Lightweight Problem Solving Framework!

1. Most Reliable T1: 98.83%
2. Very highly performing and most stable site in CCRC08

Max CERN/T1-ASGC Point2Point Inbound : 9.3 Gbps!

100 meters underground, 27 km in circumference; located near Geneva

Page 4: Hpc, grid and cloud computing - the past, present, and future challenge

Emerging Trends and Technologies: 2009-2010

Page 5: Hpc, grid and cloud computing - the past, present, and future challenge

Hype Cycle for Storage Technologies - 2010

Page 6: Hpc, grid and cloud computing - the past, present, and future challenge

Trend in High Performance Computing

Page 7: Hpc, grid and cloud computing - the past, present, and future challenge

Ugly? Performance of HPC Cluster

- 272 (52%) of the world's fastest clusters have an efficiency (Rmax/Rpeak) below 80%
- Only 115 (18%) reach over 90% of theoretical peak
- Sampled from the Top500 list of HPC clusters

Trend of Cluster Efficiency 2005-2009
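
The efficiency figure used on this slide is the ratio Rmax/Rpeak. A minimal sketch of how such shares could be computed is given below; the function names and the sample (Rmax, Rpeak) pairs are hypothetical and are not taken from the Top500 list.

```python
def linpack_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Return Rmax/Rpeak as a fraction between 0 and 1."""
    if rpeak_tflops <= 0:
        raise ValueError("Rpeak must be positive")
    return rmax_tflops / rpeak_tflops


def share_below(systems, threshold: float) -> float:
    """Fraction of systems whose efficiency is strictly below the threshold."""
    below = [s for s in systems if linpack_efficiency(*s) < threshold]
    return len(below) / len(systems)


# Hypothetical (Rmax, Rpeak) pairs in TFlop/s, for illustration only.
sample = [(80.0, 100.0), (45.0, 90.0), (95.0, 100.0), (60.0, 80.0)]
efficiencies = [linpack_efficiency(rmax, rpeak) for rmax, rpeak in sample]
low_share = share_below(sample, 0.80)
```

With these made-up inputs, two of the four systems fall below the 80% line, so `low_share` is 0.5.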

Page 8: Hpc, grid and cloud computing - the past, present, and future challenge

Performance and Efficiency

- The top-performing 20% of clusters contribute 60% of total computing power (27.98 PF)
- 5 clusters have an efficiency below 30%

Page 9: Hpc, grid and cloud computing - the past, present, and future challenge

Impact Factor: Interconnectivity - Capacity and Cluster Efficiency

- Over 52% of clusters are based on GbE, with an efficiency of only around 50%
- InfiniBand is adopted by ~36% of HPC clusters

Page 10: Hpc, grid and cloud computing - the past, present, and future challenge

HPC Cluster - Interconnect Using IB

- SDR, DDR and QDR in Top500
  - Promising efficiency >= 80%
- The majority of IB-ready clusters adopt DDR (87%) (Nov 2009)
  - Contributes 44% of total computing power (total ~28 Pflops)
  - Avg efficiency ~78%
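
For readers unfamiliar with the SDR, DDR and QDR labels, the sketch below computes the usable data rate of an InfiniBand link from its per-lane signaling rate. It assumes the common 4x link width and 8b/10b encoding; the function name is hypothetical and nothing here comes from the slide's own figures.

```python
# Per-lane signaling rate in Gbit/s for the three generations named on the slide.
LANE_SIGNALING_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def ib_data_rate_gbps(generation: str, lanes: int = 4) -> float:
    """Usable data rate: signaling rate x lanes, reduced by 8b/10b encoding."""
    raw = LANE_SIGNALING_GBPS[generation] * lanes
    return raw * 8 / 10


rates = {gen: ib_data_rate_gbps(gen) for gen in LANE_SIGNALING_GBPS}
```

Under these assumptions a 4x DDR link carries 16 Gbit/s of payload out of 20 Gbit/s signaled.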

Page 11: Hpc, grid and cloud computing - the past, present, and future challenge

Trend in HPC Interconnects: Infiniband Roadmap

Page 12: Hpc, grid and cloud computing - the past, present, and future challenge

Common semantics

- Programmer productivity
- Ease of deployment
- HPC filesystems are more mature, with a wider feature set:
  - Highly concurrent reads and writes
  - In the comfort zone of programmers (vs. cloud filesystems)
- Wide support, adoption, acceptance possible
  - pNFS working to be equivalent
- Reuse standard data management tools
  - Backup, disaster recovery and tiering

Page 13: Hpc, grid and cloud computing - the past, present, and future challenge

Evolution of Processors

Page 14: Hpc, grid and cloud computing - the past, present, and future challenge

Trend in HPC

Page 15: Hpc, grid and cloud computing - the past, present, and future challenge

Some Observations & Looking for Future (I)

Computing Paradigm

- (Almost) Free FLOPS
- (Almost) Logic Operation
- Data Access (Memory) Is a Major Bottleneck
- Synchronization Is the Most Expensive
- Data Communication Is a Big Factor in Performance
- I/O Is Still a Major Programming Consideration
- MPI Coding Is the Motherhood of Large-Scale Computing
- Computing in Conjunction with Massive Data Management
- Finding Parallelism Is Not the Whole Issue in Programming
  - Data Layout
  - Data Movement
  - Data Reuse
  - Frequency of Interconnected Data Communication
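
The claim that FLOPS are nearly free while memory access is the bottleneck can be illustrated with a simple roofline-style bound. This is a sketch under stated assumptions: the node figures (100 GFlop/s peak, 25 GB/s memory bandwidth) are hypothetical, and the roofline model is brought in here for illustration rather than taken from the talk.

```python
def attainable_gflops(peak_gflops: float, mem_bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: min(compute peak, bandwidth x arithmetic intensity)."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Hypothetical node: 100 GFlop/s peak, 25 GB/s memory bandwidth.
PEAK_GFLOPS, BANDWIDTH_GBS = 100.0, 25.0

# Vector update y[i] = a * x[i] + y[i] in double precision:
# 2 flops per element, 24 bytes moved (read x, read y, write y).
axpy_intensity = 2 / 24
axpy_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, axpy_intensity)

# A kernel that reuses data heavily (10 flops per byte) hits the compute peak.
dense_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 10.0)
```

On this made-up node the vector update is capped at roughly 2 GFlop/s, about 2% of peak, which is why data layout, movement and reuse matter more than raw arithmetic.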

Page 16: Hpc, grid and cloud computing - the past, present, and future challenge

Some Observations & Looking for Future (II)

Emerging New Possibility

- Massive "Small" Computing Elements with On-Board Memory
- Computing Nodes Can Be Configured Dynamically (including failure recovery)
- Network Switches (within an on-site complex) Will Nearly Match Memory Performance
- Parallel I/O Support for Massively Parallel Systems
- Asynchronous Computing/Communication Operation
- Sophisticated Data Pre-fetch Schemes (Hardware/Algorithm)
- Automated Dynamic Load-Balancing Methods
- Very High Order Difference Schemes (also Implicit Methods)
- Full Coupling of Formerly Split Operators
- Fine Numerical Computational Grid (grid number > 10,000)
- Full Simulation of Protein
- Full Coupling of Computational Models
- Grid Computing for All

Page 17: Hpc, grid and cloud computing - the past, present, and future challenge

Some Observations & Looking for Future (III)

Systems will get more complicated and computing tools more sophisticated:

Vendor Support & User Readiness?

Page 18: Hpc, grid and cloud computing - the past, present, and future challenge

Grid: eScience Research @ PetaScale

Page 19: Hpc, grid and cloud computing - the past, present, and future challenge

WLCG Computing Model - The Tier Structure

- Tier-0 (CERN)
  - Data recording
  - Initial data reconstruction
  - Data distribution
- Tier-1 (11 centres)
  - Permanent storage
  - Re-processing
  - Analysis
- Tier-2 (~130 centres)
  - Simulation
  - End-user analysis
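
The division of work between tiers can be written down as a small lookup structure. The sketch below only restates the roles listed on the slide; the `Tier` class and `tiers_for` helper are hypothetical names introduced for illustration and are not part of any WLCG software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    roles: tuple


# Roles copied from the slide; nothing else is implied about the real system.
TIERS = (
    Tier("Tier-0", ("data recording", "initial data reconstruction",
                    "data distribution")),
    Tier("Tier-1", ("permanent storage", "re-processing", "analysis")),
    Tier("Tier-2", ("simulation", "end-user analysis")),
)


def tiers_for(role: str) -> list:
    """Return the names of the tiers that list the given role."""
    return [tier.name for tier in TIERS if role in tier.roles]
```

For example, `tiers_for("simulation")` returns `["Tier-2"]`.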

Page 20: Hpc, grid and cloud computing - the past, present, and future challenge

Enabling Grids for E-sciencE (EGEE-II, INFSO-RI-031688); EGEE07, Budapest, 1-5 October 2007

Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …

Page 21: Hpc, grid and cloud computing - the past, present, and future challenge

Objectives

- Building a sustainable research and collaboration infrastructure
- Supporting research through e-Science, for data-intensive sciences and applications that require cross-disciplinary distributed collaboration

Page 22: Hpc, grid and cloud computing - the past, present, and future challenge

ASGC Milestone

- Operational since the deployment of LCG0 in 2002
- ASGC CA established in 2005 (IGTF in the same year)
- Tier-1 Center responsibility started in 2005
- The federated Taiwan Tier-2 center (Taiwan Analysis Facility, TAF) is also collocated at ASGC
- Representative of the EGEE e-Science Asia Federation since joining EGEE in 2004
- Providing Asia Pacific Regional Operation Center (APROC) services to the region-wide WLCG/EGEE production infrastructure since 2005
- Initiated the Avian Flu Drug Discovery Project in collaboration with EGEE in 2006
- Start of the EUAsiaGrid Project in April 2008

Page 23: Hpc, grid and cloud computing - the past, present, and future challenge

LHC First Beam – Computing at the Petascale

- ATLAS: General Purpose, pp, heavy ions
- CMS: General Purpose, pp, heavy ions
- ALICE: Heavy ions, pp
- LHCb: B-physics, CP Violation

Page 24: Hpc, grid and cloud computing - the past, present, and future challenge

Size of LHC Detector

[Figure: size of the ATLAS and CMS detectors compared with Bld. 40. Labels: 7,000 tons; 25 meters in height; 45 meters in length]

Page 25: Hpc, grid and cloud computing - the past, present, and future challenge

UNESCO Information Preservation debate, April 2007

http://www.damtp.cam.ac.uk/user/gr/public/bb_history.html

Standard Cosmology: a good model from 0.01 sec after the Big Bang, supported by considerable observational evidence.

Elementary Particle Physics: from the Standard Model into the unknown, towards energies of 1 TeV and beyond (the Terascale).

Towards Quantum Gravity: from the unknown into the unknown...

[Figure axes: Time; Energy, Density, Temperature]

Page 26: Hpc, grid and cloud computing - the past, present, and future challenge

WLCG Timeline

  First Beam on LHC, Sep. 10, 2008

  Severe Incident after 3w operation (3.5TeV)

Page 27: Hpc, grid and cloud computing - the past, present, and future challenge

Petabyte Scale Data Challenges

- Why Petabyte?
  - Experiment Computing Model
  - Comparing with conventional data management
- Challenges
  - Performance: LAN and WAN activities
    - Sufficient B/W between CPU Farm
    - Eliminate Uplink Bottleneck (Switch Tiers)
  - Fast response to critical events
    - Fabric Infrastructure & Service Level Agreement
  - Scalability and Manageability
    - Robust DB engine (Oracle RAC)
    - KB and Adequate Administration (Training)
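
The uplink bottleneck between switch tiers is usually expressed as an oversubscription ratio: total downlink capacity divided by total uplink capacity. The sketch below shows the arithmetic; the port counts and speeds are hypothetical and do not describe the ASGC network.

```python
def oversubscription_ratio(downlink_ports: int, downlink_gbps: float,
                           uplink_ports: int, uplink_gbps: float) -> float:
    """Aggregate downlink bandwidth divided by aggregate uplink bandwidth."""
    downlink = downlink_ports * downlink_gbps
    uplink = uplink_ports * uplink_gbps
    if uplink <= 0:
        raise ValueError("uplink capacity must be positive")
    return downlink / uplink


# Hypothetical edge switch: 48 x 1 GbE to worker nodes, 2 x 10 GbE uplinks.
ratio = oversubscription_ratio(48, 1.0, 2, 10.0)

# A ratio above 1 means the uplinks saturate before the node ports do.
is_bottleneck = ratio > 1.0
```

In this made-up case the ratio is 2.4:1, so adding uplinks or flattening the switch tiers would be needed to remove the bottleneck.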

Page 28: Hpc, grid and cloud computing - the past, present, and future challenge

Tier Model and Data Management Components

Page 29: Hpc, grid and cloud computing - the past, present, and future challenge

Disk Pool Configuration - T1 MSS (CASTOR)

Page 30: Hpc, grid and cloud computing - the past, present, and future challenge

Distribution of Free Capacity - Per Disk Servers vs. per Pool

Page 31: Hpc, grid and cloud computing - the past, present, and future challenge

Storage Server Generation - Drive vs. Net Capacity (Raid6)

[Figure: net capacity per disk server (DS) by storage server generation: 15 TB/DS, 21 TB/DS, 31 TB/DS, 40 TB/DS]
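
The gap between drive capacity and net capacity under RAID 6 comes from the two drives' worth of parity in each array. A minimal sketch of that arithmetic follows; the drive count and drive size are hypothetical, and filesystem overhead and hot spares are ignored.

```python
def raid6_net_capacity_tb(drives: int, drive_tb: float) -> float:
    """Usable capacity of one RAID 6 array: two drives' worth goes to parity."""
    if drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (drives - 2) * drive_tb


# Hypothetical array: 16 drives of 1 TB each.
raw_tb = 16 * 1.0
net_tb = raid6_net_capacity_tb(16, 1.0)
parity_overhead = 1 - net_tb / raw_tb
```

For this example 16 TB of raw drives yield 14 TB net, a parity overhead of 12.5%.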

Page 32: Hpc, grid and cloud computing - the past, present, and future challenge

IDC Collocation

- Facility installation completed on Mar 27th
- Tape system delayed until after Apr 9th
  - Realignment
  - RMA for faulty parts

Page 33: Hpc, grid and cloud computing - the past, present, and future challenge

Storage Farm

- ~110 RAID subsystems deployed since 2003
- Supports both the Tier-1 and Tier-2 storage fabric
- DAS connection to front-end blade servers
  - Flexible switching of front-end servers according to performance requirements
  - 4-8G Fibre Channel connectivity

Page 34: Hpc, grid and cloud computing - the past, present, and future challenge

Computing/Storage System Infrastructure

Page 35: Hpc, grid and cloud computing - the past, present, and future challenge

Throughput of WLCG Experiments

- Throughput is defined as job efficiency x number of running jobs
- Characteristics of the 4 LHC experiments, showing that inefficiency is due to poor coding
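
The throughput definition on this slide can be sketched as follows. The slide does not say how job efficiency is measured; the CPU-time over wall-clock-time ratio used here is an assumption, and all numbers are hypothetical.

```python
def job_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Assumed definition: CPU time divided by wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return cpu_seconds / wall_seconds


def throughput(mean_efficiency: float, running_jobs: int) -> float:
    """Slide definition: job efficiency x number of running jobs."""
    return mean_efficiency * running_jobs


# Two hypothetical workloads with the same number of running jobs.
well_tuned = throughput(job_efficiency(3240.0, 3600.0), 1000)
io_bound = throughput(job_efficiency(1620.0, 3600.0), 1000)
```

With these made-up inputs the well-tuned workload delivers the equivalent of 900 fully busy job slots and the I/O-bound one only 450, even though both occupy 1000 slots.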

Page 36: Hpc, grid and cloud computing - the past, present, and future challenge

Reliability from Different Perspectives

Page 37: Hpc, grid and cloud computing - the past, present, and future challenge

Storage Fabric Management – The Challenges: Events Management

Page 38: Hpc, grid and cloud computing - the past, present, and future challenge

Cloud Hype and Observation

Open Cloud Consortium

Page 39: Hpc, grid and cloud computing - the past, present, and future challenge
Page 40: Hpc, grid and cloud computing - the past, present, and future challenge

Cloud Hype

- Metacomputing (~1987, L. Smarr)
- Grid Computing (~1997, I. Foster, K. Kesselman)
- Cloud Computing (~2007, E. Schmidt?)

Page 41: Hpc, grid and cloud computing - the past, present, and future challenge

Type of Infrastructure

- Proprietary solutions by public providers
  - Turnkey solutions developed internally, as they own the software and hardware solution/tech.
- Cloud-specific support
  - Developers of specific hardware and/or software solutions that are utilized by service providers or used internally when building a private cloud
- Traditional providers
  - Leverage or tweak their existing

Page 42: Hpc, grid and cloud computing - the past, present, and future challenge

Grid and Cloud: Comparison

- Cost & Performance
- Scale & Usability
- Service Mapping
- Interoperability
- Application Scenarios

Page 43: Hpc, grid and cloud computing - the past, present, and future challenge

Cloud Computing: “X” as a Service

- Type of Cloud
- Layered Service Model
- Reference Model

Page 44: Hpc, grid and cloud computing - the past, present, and future challenge

Virtualization is not Cloud computing

Ref: Linux-based virtualization for HPC clusters.

- Performance overhead
  - FV (full virtualization) vs. PV (paravirtualization)
- Disk I/O and network throughput (VM scalability)

Page 45: Hpc, grid and cloud computing - the past, present, and future challenge

Cloud Infrastructure Best Practice & Real-World Performance

- Start up: 60 ~ 44s
- Restart: 30 ~ 27s
- Deletion: 60 ~ <5s
- Migrate
  - 30 VM ~ 26.8s
  - 60 VM ~ 40s
  - 120 VM ~ 89s
- Stop
  - 30 VM ~ 27.4s
  - 60 VM ~ 26s
  - 120 VM ~ 57s
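
One way to read the migrate and stop timings is as seconds per VM at each batch size. The sketch below does that arithmetic on the figures from the slide, under the assumption (not stated on the slide) that each figure is the total elapsed time for the whole batch; the helper name is hypothetical.

```python
# Batch size -> elapsed seconds, as listed on the slide.
MIGRATE_SECONDS = {30: 26.8, 60: 40.0, 120: 89.0}
STOP_SECONDS = {30: 27.4, 60: 26.0, 120: 57.0}


def seconds_per_vm(timings: dict) -> dict:
    """Average seconds per VM for each batch size, rounded to 3 decimals."""
    return {count: round(elapsed / count, 3)
            for count, elapsed in timings.items()}


migrate_per_vm = seconds_per_vm(MIGRATE_SECONDS)
stop_per_vm = seconds_per_vm(STOP_SECONDS)
```

Under that reading, the per-VM cost of both operations stays below one second at every batch size measured.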

Page 46: Hpc, grid and cloud computing - the past, present, and future challenge


Page 47: Hpc, grid and cloud computing - the past, present, and future challenge

Virtualization: HEP Best Practice

Page 48: Hpc, grid and cloud computing - the past, present, and future challenge
Page 49: Hpc, grid and cloud computing - the past, present, and future challenge

Grid over Cloud or Cloud over Grid?

Page 50: Hpc, grid and cloud computing - the past, present, and future challenge

Power Consumption Challenge

Page 51: Hpc, grid and cloud computing - the past, present, and future challenge

Conclusion: My Opinion

- Future of Computing: Technology-Push & Demand-Pull
  - Emergence of a new science paradigm
- Virtualization: a promising technology, but overemphasized
- Green: Cloud Service Transparency & Common Platform
  - More computing power ~ power consumption challenge
- Private clouds will be the predominant way
  - Commercial (public) clouds are not expected to evolve fast

Page 52: Hpc, grid and cloud computing - the past, present, and future challenge

Acknowledgment

- Thanks for valuable discussions and input from TCloud (Cloud OS: Elaster)
- Professional technical support from Silvershine Tech. at the beginning of the collaboration

The interesting thing about Cloud Computing is that we’ve defined Cloud Computing to include everything that we already do….. I don’t understand what we would do differently in the light of Cloud Computing other than change the wording of some of our ads.

Larry Ellison, quoted in the Wall Street Journal, Sep 26, 2008

Page 53: Hpc, grid and cloud computing - the past, present, and future challenge

Issues

- Scalability?
  - Infrastructure operation vs. performance
- Assessment
  - Application aware - Cloud service
  - Cost analysis
  - Data center power usage - PUE
- Cloud Myth
  - Top 10 Cloud Computing Trend
    - http://www.focus.com/articles/hosting-bandwidth/top-10-cloud-computing-trends/
- Use Cases & Best Practice
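
PUE (Power Usage Effectiveness), mentioned under Assessment, is total facility power divided by the power delivered to IT equipment. A minimal sketch follows; the kilowatt figures are hypothetical and do not describe any real data center.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical data center: 1500 kW drawn in total, 1000 kW reaching IT gear.
example_pue = pue(1500.0, 1000.0)

# Share of total power spent on cooling, power distribution and other overhead.
overhead_share = 1 - 1 / example_pue
```

Here the PUE is 1.5, meaning one third of the facility's power never reaches the computing equipment.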

Page 54: Hpc, grid and cloud computing - the past, present, and future challenge

Issues (II)

- Volunteer computing (BOINC)?
  - Total capacity & performance
  - Success stories & research disciplines
- What's hindering cloud adoption? Try humans.
  - http://gigaom.com/cloud/whats-hindering-cloud-adoption-how-about-humans/
- Future projection?
  - Service readiness? Service level? Technical barriers?