Big Data Everywhere Chicago: High Performance Computing – Contributions Towards Big Data (HPC)


Sharan Kalwani / sharan.kalwani@acm.org

www.linkedin.com/sharankalwani

Outline

o History of Supercomputing*

o Technologies

o Modern Day HPC: Current State of the Art

o Peering beyond the Horizon: Next Set of Technologies

* aka High Performance Computing (HPC)

History

Computing Demand: driven by needs far beyond contemporary capability

Early adopters: (1970s)

LANL (Los Alamos National Lab) and NCAR (National Center for Atmospheric Research)

Characteristics: domain specific needs

Features: High Speed Calculations: PDE, Matrices

1972: Seymour Cray (ex-CDC) founds Cray Research Inc.

1st model: the Cray-1

History

Cray-1 Characteristics: (1975-1976)

64 bit word length

12.5 nanosecond clock period (80 MHz)

“original” RISC: 1 clock == 1 instruction

Vector instruction set, a true multiplier effect: single instruction, multiple data (sketched in code below)

Matrix operations, pipelining, chained add+multiply!

memory ↔ processor balance

Cray-1, Cray X-MP, Cray Y-MP, Cray-2, Cray-3
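
The vector idea is easiest to see in code. Below is a minimal sketch in C (sizes illustrative, not from the deck) of the kind of loop the Cray-1 vectorized: a single instruction stream applied across many data elements, with the multiply and add chained.

```c
/* SAXPY (y = a*x + y): the classic vectorizable kernel. On the Cray-1 this
   loop became a handful of vector instructions with the multiply and add
   chained through the pipeline; modern compilers map the same pattern onto
   SIMD units. N is an illustrative size. */
#include <stdio.h>

#define N 1024

int main(void) {
    float a = 2.0f, x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }

    /* one operation, many data elements */
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[%d] = %f\n", N - 1, y[N - 1]);
    return 0;
}
```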

History

Enter the domain of MPP

Massively parallel processors

Introduction of Torus architectures

Still seen in some vendor offerings today

Cray T3D/T3E…. (1st machine to break the 1,000,000,000,000 calculations/sec barrier on a real application)

Cray T3 architecture (logical) circa 1993, looks a lot like a cluster, eh?

Hardware Contributions: Phase 1

Profusion of technologies: RISC inspiration (1 clock cycle → 1 instruction)

Solid State Disk – recognized the need for keeping CPU busy all the time

multi-core software – coherence + synchronization

De-coupling of I/O from compute

Massive memory

I/O technologies – HIPPI (HIgh Performance Parallel Interface)

Visualization driver

Chip set design, ECL -> CMOS integration

Parallel processing software foundation -> MPI

Solid State Disk (grandpa of the USB stick)

The first CRAY X-MP system had SSD in 1982.

Designed for nearly immediate reading and writing of very large data files.

Data transfer rates of up to 1.250 GBytes/second,

far exceeding *any* other data transfer device of its time.

SSDs offered in sizes of 64, 128, or 256 million bytes of storage.

The hole in the cabinet was for attaching a very high speed (VHISP) data channel to the SSD; the link was referred to as the "skyway."

Via software, the SSD is logically accessed as a disk unit.

SSD driver ~ 1200 lines of C code!
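
A purely hypothetical sketch (not the original CRAY driver) of the "logically accessed as a disk" idea: a block read/write interface mapped onto fast, flat memory. All names below are invented for illustration.

```c
/* Toy "SSD as logical disk": block-addressed reads and writes over a flat
   memory region. Hypothetical names throughout; the real ~1200-line CRAY
   driver handled channels, errors and scheduling besides this core idea. */
#include <string.h>
#include <stdlib.h>

#define BLOCK_SIZE 4096
#define NUM_BLOCKS 1024

static char *ssd_mem;  /* stands in for the solid-state storage */

void ssd_init(void) { ssd_mem = calloc(NUM_BLOCKS, BLOCK_SIZE); }

/* exactly a disk-sector interface, but backed by memory speeds */
void ssd_read(size_t block, void *buf) {
    memcpy(buf, ssd_mem + block * BLOCK_SIZE, BLOCK_SIZE);
}

void ssd_write(size_t block, const void *buf) {
    memcpy(ssd_mem + block * BLOCK_SIZE, buf, BLOCK_SIZE);
}

int main(void) {
    char in[BLOCK_SIZE] = "hello, skyway", out[BLOCK_SIZE];
    ssd_init();
    ssd_write(7, in);   /* write block 7 */
    ssd_read(7, out);   /* read it back  */
    return out[0] == 'h' ? 0 : 1;
}
```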

History Marches on….

Battle of technologies:

Silicon v. Gallium Arsenide

Vector v. Killer Micros

The Accelerated Strategic Computing Initiative (ASCI, mid-90s) changed direction for everyone

Speed

Everybody was focused on clock speed

Or Floating Point Operations (FLOPS/sec)

At a 12.5 ns clock: 80 million Flops/sec (peak)

Leading to the famous Macho FLOPS race: USA v Japan (90s)

Megaflops → Gigaflops (1000 MF)

Gigaflops → Teraflops (1000 GF)

Teraflops → Petaflops (1000 TF)

In 2018 the industry expects an ExaFlop machine!

Speed

First GigaFlop/sec* System

Cray YMP and Cray-2

First TeraFlop/sec System

Sandia National Lab ASCI “Red” (Intel)

First PetaFlop/sec System

LANL “RoadRunner” (IBM)

First ExaFlop/sec System

???

* SUSTAINED!

Is anyone keeping score?

Birth of the Top500 list

1993 – Dongarra, Strohmaier, Meuer & Simon

LINPACK (Linear Algebra Package) benchmark as its basis – see the sketch below

Offshoots:

Green500 (power efficiency)

Graph500 (search oriented – little to no floating point computation)

At SC13 a new replacement metric was proposed

Is anyone keeping score? We will return to this …..
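
The Top500 principle, time dense floating-point work and report the rate, fits in a few lines of C. This is not LINPACK itself (which solves a dense linear system), just the same FLOPS bookkeeping on a naive matrix multiply; size and timer are illustrative choices.

```c
/* Count 2*N^3 floating point operations for a naive matrix multiply and
   report MFLOPS. Real LINPACK runs use far larger, tuned solvers. */
#include <stdio.h>
#include <time.h>

#define N 512

static double a[N][N], b[N][N], c[N][N];

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(i - j);
        }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < N; i++)         /* 2*N^3 flops in total */
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.3f s, %.1f MFLOPS\n", secs, 2.0 * N * N * N / secs / 1e6);
    return 0;
}
```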

Cluster Growth propelled by HPC

Linux Growth propelled by HPC

HPC top500 – factoids

• Current #1 system has 3,120,000 cores

– Located in China, called Tianhe-2 “Milky Way”

– Peak speed of 33.9 PetaFlops/second (quadrillions of calculations per second)

– Needs 17.8 MW of power

• Current #2 system @ ORNL (US Government DoE) in Tennessee

– Has 560,640 cores, called Titan

– Peak speed of 17.6 PetaFlops/second

– Needs 8.21 MW of power


Treemap of countries in HPC

Operating Systems: History

Early HPC OS:

tiny assembled loaders

CTSS (Cray Time Sharing System) – derived from LTSS

Cray Operating System (COS)

CRAY UNIX (UNICOS)

microkernel (mk) – CHORUS (basis for UNICOS/mk)

Beowulf cluster – Linux appears

Linux Contributions: History

Linux – attack of the killer micros, 1992

Linux Contributions: History

NOW – Network of Workstations, 1993

Linux Contributions: History 1993-1994

133 nodes – the Stone SouperComputer

First Beowulf cluster

Concept pioneered at NASA/Caltech by Thomas Sterling and Donald Becker


Linux Contributions: History 1993-1994

• Beowulf

• NASA

• LSU

• Indiana University

[photo: Thomas Sterling]

Linux Contributions: History

• Beowulf Components:

– Parallel Virtual Machine (PVM) – U Tennessee

– Message Passing Interface (MPI) – several folks

– Jack Dongarra, Tony Hey and David Walker

– Support of NSF and ARPA

– Today we have the MPI Forum

– MPI 2 and now MPI 3

– OpenMPI, MPICH, etc

– Future: Pthreads and OpenMP (see the minimal MPI sketch below)
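
To ground the alphabet soup above, here is a minimal sketch of the programming model MPI standardized: every rank runs the same program, learns its identity, and cooperates through explicit messages. The calls are standard MPI; the reduction workload is only illustrative.

```c
/* Build with an MPI wrapper, e.g.: mpicc sketch.c && mpirun -np 4 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* who am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many of us? */

    /* each rank contributes a partial value; rank 0 gathers the sum */
    double partial = (double)rank, total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %g\n", size, total);

    MPI_Finalize();
    return 0;
}
```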

HPC and Linux

• Beowulf Attributes (or cluster features):

• Open Source

• Low Cost

• Elasticity

• Equal Spread of work (seeds of cloud computing here!!)

• These days the Linux kernel can handle 64 cores! HPC pushes this limit even further….

HPC and Linux pushing the boundaries

• File systems:

– Large number of high performance file systems

– Lustre, now in version 2.5

– Beats HDFS several times over!!

– You can host HDFS over many HPC filesystems for massive gains

Typical Stack

Pick Distro – Linux based (usually Enterprise class)

[stack diagram, layered down to the hardware]

Hardware Contributions: Phase 2

Profusion of technologies:

In-memory processing, many HPC sites implemented this

In 1992, special systems were built for use in cryptography using these techniques

Graph traversal systems – now available as appliances by HPC vendors

Massive memory: single-memory systems over several TB in size

InfiniBand interconnects: hitting 100 Gbit/sec – you can buy such switches now

Parallel processing software foundation -> replacements for the MPI stack are being worked on

Modern Day HPC

• Building the ExaScale machine:

– Exascale is 1 quintillion calculations/second

– 1000x Petaflops/sec

– Also known as 10^18 (hence the 2018 projections)

– 1,000,000,000,000,000,000 floating point calculations/second (sustained)

– How to feed this monster? (back-of-envelope below)
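
A quick back-of-envelope using the Top500 figures quoted earlier; the bytes-per-flop feed rate is an assumed illustrative value, not a slide figure.

```c
/* How far is ExaScale from today's #1, and what does "feeding it" imply? */
#include <stdio.h>

int main(void) {
    double exaflop = 1e18;          /* target: calcs/second (sustained) */
    double tianhe2 = 33.9e15;       /* current #1 from the slides       */
    double bytes_per_flop = 0.1;    /* ASSUMED memory traffic per flop  */

    printf("ExaScale = %.0fx the current #1 system\n", exaflop / tianhe2);
    printf("feeding it: ~%.0e bytes/sec of memory traffic (at %.1f B/flop)\n",
           exaflop * bytes_per_flop, bytes_per_flop);
    return 0;
}
```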

Modern Day HPC

• Solutions for the ExaScale monster:

• Inevitably, the Big Data community should watch, support, and benefit from the issues we are tackling now:

– Memory matters!

– Resiliency in software

– Robustness in hardware

– Co-Design critical

– Power Consumption and Cooling (estimate: several megawatts with present-day approaches)

– Utterly new architectures needed

Applications: What did we use all this for?

Weather

Automotive and Aerospace Design

Traditional Sciences

Energy (Nuclear, Oil & Gas)

Bioinformatics

Cryptography

and…… big data

Applications: traditional HPC

Automotive & (similar) Aerospace Design

o Car Crash Analysis – prime usage, 50%

o Each physical crash test costs $0.5 million

o Virtual Prototype test - $1000 (or less)

Applications: The real deal vs. HPC

• NHTSA requires physical validation

• Before, physical crash tests cost a total of $100 million/year

• Limited to a small suite: 12 tests

• Today we can do 140+ different tests (for each vehicle), and with:

– Less cost (we instead increased the # of tests!)

– Faster response (12 months vs. 5 years)

– Many more design iterations (hundreds vs. 10; rough arithmetic below)
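
Putting those figures together, a rough per-vehicle cost sketch; the arithmetic only combines the numbers quoted above.

```c
/* 12 physical tests at $0.5M each vs. 140 virtual tests at $1000 each. */
#include <stdio.h>

int main(void) {
    double physical = 12 * 0.5e6;    /* old physical suite    */
    double virtual_ = 140 * 1000.0;  /* today's virtual suite */
    printf("physical suite: $%.1fM\n", physical / 1e6);
    printf("virtual suite:  $%.2fM (%.0fx cheaper)\n",
           virtual_ / 1e6, physical / virtual_);
    return 0;
}
```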

HPC for weather forecasting

Whither HPC and the Cloud?

HPC for Crisis Assist

Technology Marches On! Or: why does simulation matter?

• Increasing resolving power – greater problem fidelity

• Decreasing product design turnaround times

• Increased cost-effectiveness relative to experiment & observation

• Reducing uncertainty

• Ramping up the ease of use by non-experts

• Powerful tool in resolving scientific questions, engineering designs, and policy support

• Co-execution environments for simulation, large-scale data enabled science, and scientific visualization

• Simply: Better Answers, thus delivering an…..

– Attractiveness to the creative and entrepreneurial classes

– Straightforward case for national economic competitiveness!!!

We need more HPC because….

What about costs????

HPC is indispensable!

• Establish Capability

• Enable Adding of Complexity

• Gain a real and better Understanding

• And do not forget all that data!

• How do we tie it in?......

Approaching the eXtreme Scale

• Current paradigm: Simulation – lots of equations which mimic or model actual situations; the “Third” Paradigm

• Operate without models (Big Data); the “Fourth” Paradigm

Operate without models (Big Data)

• BEFORE: Models/Theory lead, with data in support. NOW/FUTURE: DATA leads, with models/theory derived from it.

Best Example….. tell us what we do not know!

• Recent Success:

• Solar observations (actual data)

• Unknown Surface Perturbations or Energy

• Could not be explained by all classical models

• Resorted to automated machine learning driven alternate search

• Answer: Solar Earthquakes and Thunderclaps, classic acoustic signature!

• New profession: Solar Seismologists!

A Peek at the Future…….

• Yes…we should definitely care

• The bigger and more relevant questions are:

– What architecture? What programming model?

– Power consumption will dominate

• Currently 4 approaches:

– Stay the course? Not!!

– All GPGPU based?

– ARM based?

– Quantum Computing ??

GPGPU perspective….

[chart: Multi-GPU acceleration of a 16-core ANSYS Fluent simulation of external aero – four Tesla K20X GPUs added to Xeon E5-2667 CPUs on a 16-core server node give a 2.9X solver speedup over the CPU-only configuration]

A Peek at the Future……. – GPGPU

[diagram: GPU with GDDR memory and CPU with DDR memory, connected through an I/O hub over PCI-Express]

A Peek at the Future……. – GPGPU based?

– http://www.anl.gov/events/overview-nvidia-exascale-processor-architecture-co-design-philosophy-and-application-results

– Echelon

– DragonFly

– http://www.nvidia.com/content/PDF/sc_2010/theater/Dally_SC10.pdf

A Peek at the Future…….

Quantum Computing…….

• D-WAVE systems installed at NASA Ames lab

• Uses a special chip, the 512-qubit “Vesuvius”

• Uses 12 KW of power

• Cooled to 0.02 Kelvin (100 times colder than outer space)

• RF shielding

Quantum Computing…….

• Shor’s Integer Factorization Algorithm

• Problem: Given a composite n-bit integer, find a nontrivial factor.

– Best-known deterministic algorithm on a classical computer has time complexity exp(O(n^(1/3) log^(2/3) n)).

• A quantum computer can solve this problem in O(n^3) operations.

Peter Shor, “Algorithms for Quantum Computation: Discrete Logarithms and Factoring”, Proc. 35th Annual Symposium on Foundations of Computer Science, 1994, pp. 124-134.

Quantum Computing…….

• Classical: number field sieve

– Time complexity: exp(O(n^(1/3) log^(2/3) n))

– Time for 512-bit number: 8400 MIPS years

– Time for 1024-bit number: 1.6 billion times longer

• Quantum: Shor’s algorithm

– Time complexity: O(n^3)

– Time for 512-bit number: 3.5 hours

– Time for 1024-bit number: 31 hours

• (assuming a 1 GHz quantum machine; see the scaling check below)

See M. Oskin, F. Chong, I. Chuang, “A Practical Architecture for Reliable Quantum Computers”, IEEE Computer, 2002, pp. 79-87.
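
A small sanity check of the quoted quantum figures: with O(n^3) scaling, doubling the key size from 512 to 1024 bits should multiply the work by (1024/512)^3 = 8, predicting about 28 hours from the quoted 3.5; the quoted 31 hours is consistent up to a constant factor.

```c
/* O(n^3) scaling check for Shor's algorithm, using the slide's numbers. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double hours_512 = 3.5;                     /* quoted figure    */
    double ratio = pow(1024.0 / 512.0, 3.0);    /* cubic growth: 8x */
    printf("predicted 1024-bit time: ~%.0f hours (quoted: 31)\n",
           hours_512 * ratio);
    return 0;
}
```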

What I will be looking into…….

• Julia

• Programming environment which combines *all* the elements of:

– R (expressive data handling)

– Scientific and Engineering process (e.g. MATLAB like)

– Parallel processing and distributed computing functional approaches (similar to Scala, Erlang and others)

– Python and other integration packages already there

– Happy marriage of several arenas

– Now in early release

• Feel free to contact or follow up with me on this

SUMMARY: Core Competencies Across HPC

Core Competencies:

• Extreme scale

• Architecture

• Compute

• I/O

• Memory

• Storage/data management – Tera, Peta, Exabytes….

• Visualization and analytics

• Fast fabrics

• Future architectural direction

• Parallelism to extreme parallelism

• Multi-core

• Programming models

• Big Data – models, applications, applied analytics; structured, unstructured data types

The need for a new discipline: HPC experts + Domain Expertise == Simulation Specialists

Where would this Computational Specialist work?

• National security

• Fraud detection

• Grand challenge science – Physics, Chemistry, Biology, Weather/climate, energy etc.

• Bio/life sciences

• Healthcare

• Energy/Geophysics

• Financial modeling, high-frequency and algorithmic trading

• Entertainment/media

• Auto/aero/mfg.

• Consumer electronics

• Risk informatics: insurance, global, financial, medical etc.

• Optimization models

• Discovery analytics

On a lighter note…..

Further reading.…..

Currently reviewing

Innovative uses of HPC (LinkedIn.com)

Thank you…..

• Email: sharan dot kalwani at acm dot org
