High Productivity Computing: Taking HPC Mainstream Lee Grant Technical Solutions Professional High Performance Computing leegrant@microsoft.com


Page 1

High Productivity Computing: Taking HPC Mainstream

Lee Grant
Technical Solutions Professional, High Performance Computing
leegrant@microsoft.com

Page 2

Challenge: High Productivity Computing

“Make high-end computing easier and more productive to use.

Emphasis should be placed on time to solution, the major metric of value to high-end computing users…

A common software environment for scientific computation encompassing desktop to high-end systems will enhance productivity gains by promoting ease of use and manageability of systems.”

-- 2004 High-End Computing Revitalization Task Force, Office of Science and Technology Policy, Executive Office of the President

Page 3

X64 Server

Page 4

The Data Pipeline

Data Gathering; Discovery and Browsing; Science Exploration; Domain-specific analyses; Scientific Output

• "Raw" data includes sensor output, data downloaded from agency or collaboration web sites, and papers (especially for ancillary data).

• "Raw" data browsing for discovery (do I have enough data in the right places?), cleaning (does the data look obviously wrong?), and lightweight science via browsing.

• "Science variables" and data summaries for early science exploration and hypothesis testing. Similar to discovery and browsing, but with science variables computed via gap filling, unit conversions, or simple equations.

• "Science variables" combined with models, other specialized code, or statistics for deep science understanding.

• Scientific results via packages such as MATLAB or R; special rendering packages such as ArcGIS.

• Paper preparation.
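The "science variables" step above mentions gap filling and unit conversions; a minimal sketch of both follows, with hypothetical data and function names (Python here purely for illustration, since it is among the languages the platform supports):

```python
# Sketch of the "science variables" step: linear gap filling plus a unit
# conversion (Fahrenheit to Celsius). Data and names are illustrative.

def fill_gaps(series):
    """Linearly interpolate None entries between known neighbours.
    Leading/trailing gaps are left as-is."""
    out = list(series)
    i = 0
    while i < len(out):
        if out[i] is None:
            lo, hi = i - 1, i
            while hi < len(out) and out[hi] is None:
                hi += 1
            if lo >= 0 and hi < len(out):
                step = (out[hi] - out[lo]) / (hi - lo)
                for j in range(i, hi):
                    out[j] = out[lo] + step * (j - lo)
            i = hi
        else:
            i += 1
    return out

def fahrenheit_to_celsius(values):
    return [(v - 32.0) * 5.0 / 9.0 for v in values]

raw = [32.0, None, None, 50.0, 68.0]     # raw sensor output with gaps
filled = fill_gaps(raw)                  # [32.0, 38.0, 44.0, 50.0, 68.0]
celsius = fahrenheit_to_celsius(filled)  # science variable in SI-friendly units
```

Real pipelines would of course pick interpolation and quality rules per instrument; the point is only that this stage is ordinary, automatable computation.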

Page 5

Free Lunch Is Over For Traditional Software

No Free Lunch for traditional software (without highly concurrent software it won't get any faster!)

[Chart: operations per second for serial code versus additional operations per second if code can take advantage of concurrency, comparing a 3 GHz single core against hypothetical 6, 12 and 24 GHz single cores and against 3 GHz parts with 2, 4 and 8 cores.]
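The chart's argument can be put in code: a serial loop gets no faster on a fixed-rate core, while the same work split into chunks gains throughput from extra cores. A sketch with a hypothetical prime-counting workload (Python and the 4-way split are illustrative, not from the deck):

```python
# Serial vs. concurrent throughput: the same CPU-bound work either runs on
# one core or is split into chunks that multiprocessing can spread over
# several cores. Workload and worker count are illustrative.
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately CPU-bound)."""
    lo, hi = bounds
    total = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += 1
    return total

def count_primes_serial(limit):
    return count_primes((2, limit))

def count_primes_parallel(limit, workers=4):
    """Split [0, limit) into one chunk per worker and sum the partial counts."""
    step = limit // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], limit)  # absorb rounding in the last chunk
    with Pool(workers) as pool:
        return sum(pool.map(count_primes, chunks))

if __name__ == "__main__":
    # Same answer either way; on a multi-core machine the parallel version
    # finishes roughly `workers` times sooner.
    assert count_primes_serial(200_000) == count_primes_parallel(200_000)
```

The decomposition, not the clock rate, is what buys the extra operations per second; that is exactly the "highly concurrent software" the slide calls for.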

Page 6

“Provide the platform, tools and broad ecosystem to reduce the complexity of HPC by making parallelism more accessible to address future computational needs.”

Microsoft’s Vision for HPC

Reduced Complexity
• Ease deployment for larger-scale clusters
• Simplify management for clusters of all scales
• Integrate with existing infrastructure

Mainstream HPC
• Address needs of traditional supercomputing
• Address emerging cross-industry computation trends
• Enable non-technical users to harness the power of HPC

Developer Ecosystem
• Increase number of parallel applications and codes
• Offer choice of parallel development tools, languages and libraries
• Drive larger universe of developers and ISVs

Page 7

Microsoft HPC++ Solution

Application Benefits: the most productive distributed application development environment

System Benefits: a cost-effective, reliable and high-performance server operating system

Cluster Benefits: a complete HPC cluster platform integrated with the enterprise infrastructure

Page 8

Windows HPC Server 2008

Systems Management
• Rapid large-scale deployment and built-in diagnostics suite
• Integrated monitoring, management and reporting
• Familiar UI and rich scripting interface
• Integrated security via Active Directory

Job Scheduling
• Support for batch, interactive and service-oriented applications
• High-availability scheduling
• Interoperability via OGF's HPC Basic Profile

MPI
• MS-MPI stack based on the MPICH2 reference implementation
• Performance improvements for RDMA networking and multi-core shared memory
• MS-MPI integrated with Windows Event Tracing

Storage
• Access to SQL, Windows and Unix file servers
• Key parallel file server vendor support (GPFS, Lustre, Panasas)
• In-memory caching options

Page 9

Page 10

Page 11

Group compute nodes based on hardware, software and custom attributes; Act on groupings.

Pivoting enables correlating nodes and jobs

Track long-running operations and access operation history

Receive alerts for failures

List or Heat Map views show the cluster at a glance

Page 12

Page 13

Page 14

Page 15

Page 16

Page 17

Integrated Job Scheduling

Service-oriented HPC apps

Expanded job policies

Support for job templates

Improved interoperability with mixed IT infrastructure

Page 18

Node/Socket/Core Allocation

[Diagram: two nodes, each with four sockets (S0–S3) of four cores (P0–P3). Job J1 (/numsockets:3 /exclusive:false) occupies three whole sockets, J2 (/numnodes:1) an entire node, and J3 (/numcores:4 /exclusive:false) four individual cores.]

J1: /numsockets:3 /exclusive:false
J2: /numnodes:1
J3: /numcores:4 /exclusive:false

Windows HPC Server can help your application make the best use of multi-core systems.
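The allocation units in the diagram (whole nodes, whole sockets, or individual cores) can be modeled with a toy allocator. This is a simplified sketch of the idea only, not the actual HPC Server scheduling algorithm; every name here is invented:

```python
# Toy model of node/socket/core allocation: J1-style jobs grab whole
# sockets, J3-style jobs grab individual cores. Policy details (ordering,
# exclusivity, backfill) are omitted.

def make_cluster(nodes=2, sockets=4, cores=4):
    # free[node][socket] = number of free cores left on that socket
    return [[cores] * sockets for _ in range(nodes)]

def alloc_sockets(cluster, count, cores_per_socket=4):
    """Grab `count` entirely free sockets (cf. /numsockets:3)."""
    grabbed = []
    for n, node in enumerate(cluster):
        for s, free in enumerate(node):
            if free == cores_per_socket and len(grabbed) < count:
                node[s] = 0
                grabbed.append((n, s))
    return grabbed

def alloc_cores(cluster, count):
    """Grab `count` individual cores wherever they are free (cf. /numcores:4)."""
    grabbed = []
    for n, node in enumerate(cluster):
        for s in range(len(node)):
            while node[s] > 0 and len(grabbed) < count:
                node[s] -= 1
                grabbed.append((n, s))
    return grabbed

cluster = make_cluster()          # 2 nodes x 4 sockets x 4 cores, as drawn
j1 = alloc_sockets(cluster, 3)    # three whole sockets, like J1
j3 = alloc_cores(cluster, 4)      # four cores from what remains, like J3
```

The distinction matters because socket-granular placement keeps a job's threads sharing a cache and memory controller, while core-granular placement packs leftover capacity.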

Page 19

Job submission: 3 methods

• Command line
  – job submit /headnode:Clus1 /numprocessors:124 /nodegroup:Matlab
  – job submit /corespernode:8 /numnodes:24
  – job submit /failontaskfailure:true /requestednodes:N1,N2,N3,N4
  – job submit /numprocessors:256 mpiexec \\share\mpiapp.exe
  – (Complete PowerShell system management commands are available as well)

• Programmatic: support for C++ and .NET languages

using Microsoft.Hpc.Scheduler;

class Program
{
    static void Main()
    {
        IScheduler store = new Scheduler();
        store.Connect("localhost");
        ISchedulerJob job = store.CreateJob();
        job.AutoCalculateMax = true;
        job.AutoCalculateMin = true;
        ISchedulerTask task = job.CreateTask();
        task.CommandLine = "ping 127.0.0.1 -n *";
        task.IsParametric = true;
        task.StartValue = 1;
        task.EndValue = 10000;
        task.IncrementValue = 1;
        task.MinimumNumberOfCores = 1;
        task.MaximumNumberOfCores = 1;
        job.AddTask(task);
        store.SubmitJob(job, @"hpc\user", "p@ssw0rd");
    }
}

• Web interface: Open Grid Forum "HPC Basic Profile"

Page 20

Scheduling MPI jobs

• job submit /numprocessors:7800 mpiexec hostname
• Start time: 1 second; completion time: 27 seconds

Page 21

NetworkDirect
A new RDMA networking interface built for speed and stability

• Verbs-based design for close fit with native, high-performance networking interfaces
• Equal to hardware-optimized stacks for MPI micro-benchmarks: 2 µs latency, 2 GB/sec bandwidth on ConnectX
• OpenFabrics driver for Windows includes support for NetworkDirect, Winsock Direct and IPoIB protocols

[Diagram: an MPI app runs over MS-MPI and a NetworkDirect provider, and a socket-based app over Windows Sockets (Winsock + WSD) with a WinSock Direct provider; both RDMA paths bypass the kernel to reach the RDMA networking hardware, while conventional TCP/Ethernet traffic crosses into kernel mode through TCP, IP and NDIS to a mini-port driver. Components are marked as OS component, CCP component, IHV component or (ISV) app.]

Page 22

Windows Compute Cluster Server 2003:
• Spring 2006, NCSA, #130: 896 cores, 4.1 TF
• Spring 2007, Microsoft, #106: 2048 cores, 9 TF, 58.8% efficiency
• Fall 2007, Microsoft, #116: 2048 cores, 11.8 TF, 77.1% efficiency

Windows HPC Server 2008:
• Spring 2008, NCSA, #23: 9472 cores, 68.5 TF, 77.7% efficiency
• Spring 2008, Umeå, #40: 5376 cores, 46 TF, 85.5% efficiency
• Spring 2008, Aachen, #100: 2096 cores, 18.8 TF, 76.5% efficiency

30% efficiency improvement

Page 23

November 2008 Top500

Page 24

“Ferrari is always looking for the most advanced technological solutions and, of course, the same applies for software and engineering. To achieve industry leading power-to-weight ratios, reduction in gear change times, and revolutionary aerodynamics, we can rely on Windows HPC Server 2008. It provides a fast, familiar, high performance computing platform for our users, engineers and administrators.”

-- Antonio Calabrese, Responsabile Sistemi Informativi (Head of Information Systems), Ferrari

“It is important that our IT environment is easy to use and support. Windows HPC is improving our performance and manageability.”

-- Dr. J.S. Hurley, Senior Manager, Head Distributed Computing, Networked Systems Technology, The Boeing Company

Customers

“Our goal is to broaden HPC availability to a wider audience than just power users. We believe that Windows HPC will make HPC accessible to more people, including engineers, scientists, financial analysts, and others, which will help us design and test products faster and reduce costs.”

-- Kevin Wilson, HPC Architect, Procter & Gamble

Page 25

“We are very excited about utilizing the Cray CX1 to support our research activities,” said Rico Magsipoc, Chief Technology Officer for the Laboratory of Neuro Imaging. “The work that we do in brain research is computationally intensive but will ultimately have a huge impact on our understanding of the relationship between brain structure and function, in both health and disease. Having the power of a Cray supercomputer that is simple and compact is very attractive and necessary, considering the physical constraints we face in our data centers today.”

Page 26

Porting Unix Applications

• Windows Subsystem for UNIX Applications
  – Complete SVR-5 and BSD UNIX environment with 300 commands, utilities, shell scripts, compilers
  – Visual Studio extensions for debugging POSIX applications
  – Support for 32- and 64-bit applications

• Recent port of WRF weather model
  – 350K lines, Fortran 90 and C, using MPI and OpenMP
  – Traditionally developed for Unix HPC systems
  – Two dynamical cores, full range of physics options

• Porting experience
  – Fewer than 750 lines of code changed in makefiles/scripts
  – Level of effort similar to a port to any new version of UNIX
  – Performance on par with Linux systems

• India Interoperability Lab, MTC Bangalore
  – Industry Solutions for Interop, jointly with partners
  – HPC Utility Computing Architecture
  – Open-source applications on HPC Server 2008 (NAMD, DL_POLY, GROMACS)

Page 27

High Productivity Modeling

Languages/Runtimes: C++, C#, VB, F#, Python, Ruby, JScript; Fortran (Intel, PGI); OpenMP, MPI

Team Development: team portal (version control, scheduled build, bug tracking); test and stress generation; code analysis and code coverage; performance analysis

IDE: rapid application development, parallel debugging, multiprocessor builds, workflow design

.NET Framework: LINQ (language-integrated query), Dynamic Language Runtime, Fx/JIT/GC improvements, native support for web services

Page 28

MSFT || Computing Technologies

Task Concurrency

Data Parallelism

Distributed/Cloud Computing

Local Computing

[Quadrant chart: scenarios such as a robotics-based manufacturing assembly line, the Silverlight Olympics viewer, enterprise search/OLTP/collaboration, animation and CGI rendering, weather forecasting, seismic monitoring, oil exploration, automotive control systems, Internet-based photo services, ultrasound imaging equipment, media encode/decode, image processing/enhancement, and data visualization are mapped onto the corresponding technologies: IFx / CCR, Maestro, TPL / PPL, Cluster-TPL, Cluster-PLINQ, MPI / MPI.NET, WCF, Cluster SOA, WF, PLINQ, CDS and OpenMP.]

Page 29

Page 30

[Diagram: head nodes support SOA functionality; WCF brokers dispatch user-defined function (UDF) tasks to compute nodes, each of which performs UDF tasks as called from the WCF broker.]

Page 31

Page 32

SOA Broker Performance

[Charts: "Low latency" plots round-trip latency (ms, 0.2–1.6) against message size (1 byte to 16384 bytes) for WSD, IPoIB and GigE; "High throughput" plots messages/sec (0–6000, with 25 ms compute time) against number of clients (0–200) for 0k, 1k, 4k and 16k ping-pong messages.]

Page 33

MPI.NET
• Supports all .NET languages (C#, C++, F#, ..., even Visual Basic!)
• Natural expression of MPI in C#
• Negligible overhead (relative to C) over TCP

if (world.Rank == 0)
    world.Send("Hello, World!", 1, 0);
else
{
    string msg = world.Receive<string>(0, 0);
}

string[] hostnames = comm.Gather(MPI.Environment.ProcessorName, 0);

double pi = 4.0 * comm.Reduce(dartsInCircle, (x, y) => x + y, 0) / totalDartsThrown;
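The pi line above reduces per-rank dart counts with MPI; a single-process sketch of the underlying Monte Carlo computation (sample and rank counts are illustrative, not from the slide):

```python
# Monte Carlo pi: throw darts at the unit square, count hits inside the
# quarter circle, then combine per-rank counts as the MPI Reduce would.
import random

def darts_in_circle(samples, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(samples_per_rank, ranks=4):
    # sum() stands in for comm.Reduce(dartsInCircle, (x, y) => x + y, 0)
    total = sum(darts_in_circle(samples_per_rank, seed=r) for r in range(ranks))
    return 4.0 * total / (samples_per_rank * ranks)

pi_estimate = estimate_pi(100_000)  # close to 3.14
```

Because each rank's work is independent until the final reduction, this pattern scales almost perfectly, which is why it is a standard MPI teaching example.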

Page 34

Allinea DDT VS Debugger Add-in

Page 35

NetPIPE Performance

[Chart: throughput (Mbps, log scale 0.01–100) versus message size (1 byte to 10^7 bytes) for C (native), C# (primitive) and C# (serialized).]

Page 36

Parallel Extensions to .NET

• Declarative data parallelism (PLINQ)

• Imperative data and task parallelism (TPL)

• Data structures and coordination constructs

var q = from n in names.AsParallel()        where n.Name == queryInfo.Name && n.State == queryInfo.State && n.Year >= yearStart && n.Year <= yearEnd        orderby n.Year ascending        select n;

Parallel.For(0, n, i=> {  result[i] = compute(i);});

Page 37

Example: Tree Walk

Sequential

static void ProcessNode<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;
    ProcessNode(tree.Left, action);
    ProcessNode(tree.Right, action);
    action(tree.Data);
}

Thread Pool

static void ProcessNode<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;

    Stack<Tree<T>> nodes = new Stack<Tree<T>>();
    Queue<T> data = new Queue<T>();

    nodes.Push(tree);
    while (nodes.Count > 0)
    {
        Tree<T> node = nodes.Pop();
        data.Enqueue(node.Data);
        if (node.Left != null) nodes.Push(node.Left);
        if (node.Right != null) nodes.Push(node.Right);
    }

    using (ManualResetEvent mre = new ManualResetEvent(false))
    {
        int waitCount = Environment.ProcessorCount;

        WaitCallback wc = delegate
        {
            bool gotItem;
            do
            {
                T item = default(T);
                lock (data)
                {
                    if (data.Count > 0) { item = data.Dequeue(); gotItem = true; }
                    else gotItem = false;
                }
                if (gotItem) action(item);
            } while (gotItem);

            if (Interlocked.Decrement(ref waitCount) == 0) mre.Set();
        };

        for (int i = 0; i < Environment.ProcessorCount - 1; i++)
        {
            ThreadPool.QueueUserWorkItem(wc);
        }

        wc(null);
        mre.WaitOne();
    }
}

Page 38

Example: Tree Walk

Parallel Extensions (with Task)

static void ProcessNode<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;
    Task t = Task.Create(delegate { ProcessNode(tree.Left, action); });
    ProcessNode(tree.Right, action);
    action(tree.Data);
    t.Wait();
}

Parallel Extensions (with Parallel)

static void ProcessNode<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;
    Parallel.Do(
        () => ProcessNode(tree.Left, action),
        () => ProcessNode(tree.Right, action),
        () => action(tree.Data));
}

Parallel Extensions (with PLINQ)

static void ProcessNode<T>(Tree<T> tree, Action<T> action)
{
    tree.AsParallel().ForAll(action);
}

Page 39

F# is... a functional, object-oriented, imperative and explorative programming language for .NET: strongly typed, succinct, scalable, interoperable and efficient, with rich libraries.

Page 40

Interactive F# Shell

C:\fsharpv2>bin\fsi

MSR F# Interactive, (c) Microsoft Corporation, All Rights Reserved F# Version 1.9.2.9, compiling for .NET Framework Version v2.0.50727

NOTE: NOTE: See 'fsi --help' for flags NOTE: NOTE: Commands: #r <string>;; reference (dynamically load) the given DLL. NOTE: #I <string>;; add the given search path for referenced DLLs. NOTE: #use <string>;; accept input from the given file. NOTE: #load <string> ...<string>;; NOTE: load the given file(s) as a compilation unit. NOTE: #time;; toggle timing on/off. NOTE: #types;; toggle display of types on/off. NOTE: #quit;; exit. NOTE: NOTE: Visit the F# website at http://research.microsoft.com/fsharp. NOTE: Bug reports to [email protected]. Enjoy!

> let rec f x = (if x < 2 then x else f (x-1) + f (x-2));;

val f : int -> int

> f 6;;

val it : int = 8

Page 41

Example: Taming Asynchronous I/O

using System;
using System.IO;
using System.Threading;

public class BulkImageProcAsync
{
    public const String ImageBaseName = "tmpImage-";
    public const int numImages = 200;
    public const int numPixels = 512 * 512;

    // ProcessImage has a simple O(N) loop, and you can vary the number
    // of times you repeat that loop to make the application more CPU-
    // bound or more IO-bound.
    public static int processImageRepeats = 20;

    // Threads must decrement NumImagesToFinish, and protect
    // their access to it through a mutex.
    public static int NumImagesToFinish = numImages;
    public static Object[] NumImagesMutex = new Object[0];
    // WaitObject is signalled when all image processing is done.
    public static Object[] WaitObject = new Object[0];

    public class ImageStateObject
    {
        public byte[] pixels;
        public int imageNum;
        public FileStream fs;
    }

    public static void ReadInImageCallback(IAsyncResult asyncResult)
    {
        ImageStateObject state = (ImageStateObject)asyncResult.AsyncState;
        Stream stream = state.fs;
        int bytesRead = stream.EndRead(asyncResult);
        if (bytesRead != numPixels)
            throw new Exception(String.Format(
                "In ReadInImageCallback, got the wrong number of " +
                "bytes from the image: {0}.", bytesRead));
        ProcessImage(state.pixels, state.imageNum);
        stream.Close();

        // Now write out the image.
        // Using asynchronous I/O here appears not to be best practice.
        // It ends up swamping the threadpool, because the threadpool
        // threads are blocked on I/O requests that were just queued to
        // the threadpool.
        FileStream fs = new FileStream(ImageBaseName + state.imageNum +
            ".done", FileMode.Create, FileAccess.Write, FileShare.None,
            4096, false);
        fs.Write(state.pixels, 0, numPixels);
        fs.Close();

        // This application model uses too much memory.
        // Releasing memory as soon as possible is a good idea,
        // especially global state.
        state.pixels = null;
        fs = null;
        // Record that an image is finished now.
        lock (NumImagesMutex)
        {
            NumImagesToFinish--;
            if (NumImagesToFinish == 0)
            {
                Monitor.Enter(WaitObject);
                Monitor.Pulse(WaitObject);
                Monitor.Exit(WaitObject);
            }
        }
    }

    public static void ProcessImagesInBulk()
    {
        Console.WriteLine("Processing images...  ");
        long t0 = Environment.TickCount;
        NumImagesToFinish = numImages;
        AsyncCallback readImageCallback = new AsyncCallback(ReadInImageCallback);
        for (int i = 0; i < numImages; i++)
        {
            ImageStateObject state = new ImageStateObject();
            state.pixels = new byte[numPixels];
            state.imageNum = i;
            // Very large items are read only once, so you can make the
            // buffer on the FileStream very small to save memory.
            FileStream fs = new FileStream(ImageBaseName + i + ".tmp",
                FileMode.Open, FileAccess.Read, FileShare.Read, 1, true);
            state.fs = fs;
            fs.BeginRead(state.pixels, 0, numPixels, readImageCallback, state);
        }

        // Determine whether all images are done being processed.
        // If not, block until all are finished.
        bool mustBlock = false;
        lock (NumImagesMutex)
        {
            if (NumImagesToFinish > 0)
                mustBlock = true;
        }
        if (mustBlock)
        {
            Console.WriteLine("All worker threads are queued. " +
                " Blocking until they complete. numLeft: {0}",
                NumImagesToFinish);
            Monitor.Enter(WaitObject);
            Monitor.Wait(WaitObject);
            Monitor.Exit(WaitObject);
        }
        long t1 = Environment.TickCount;
        Console.WriteLine("Total time processing images: {0}ms", (t1 - t0));
    }
}

Processing 200 images in parallel

Page 42

Example: Taming Asynchronous I/O (equivalent F# code, same perf)

let ProcessImageAsync(i) =
    async { // open the file synchronously
            let inStream = File.OpenRead(sprintf "source%d.jpg" i)
            // read from the file, asynchronously
            let! pixels = inStream.ReadAsync(numPixels)
            let pixels' = TransformImage(pixels, i)
            let outStream = File.OpenWrite(sprintf "result%d.jpg" i)
            // write the result asynchronously
            do! outStream.WriteAsync(pixels')
            do Console.WriteLine "done!" }

let ProcessImagesAsync() =
    // generate the tasks and queue them in parallel
    Async.Run (Async.Parallel
        [ for i in 1 .. numImages -> ProcessImageAsync(i) ])

Page 43

The Coming of Accelerators

Page 44

Current Offerings

[Table of accelerator programming stacks by vendor:
• Microsoft: Accelerator, Compute Shader; any processor; libraries: D3DX, DaVinci, FFT, Scan
• AMD: Brook+, CAL; AMD CPU or GPU; libraries: ACML-GPU
• nVidia: CUDA; nVidia GPU; libraries: cuFFT, cuBLAS, cuPP
• Intel: Ct, LRB Native; Intel CPU, Larrabee; libraries: MKL++
• RapidMind: any processor
• Apple: OpenCL, Grand Central; any processor; libraries: CoreImage, CoreAnim]

Page 45

DirectX11 Compute Shader

• A new processing model for GPUs
  – Integrated with Direct3D
  – Supports more general constructs
  – Enables more general data structures
  – Enables more general algorithms

• Image/post processing:
  – Image reduction, histogram, convolution, FFT
  – Video transcode, super-resolution, etc.

• Effect physics
  – Particles, smoke, water, cloth, etc.

• Ray-tracing, radiosity, etc.
• Gameplay physics, AI

Page 46

FFT Performance Example

• Complex 1024x1024 2-D FFT:
  – Software: 42 ms, 6 GFlops
  – Direct3D9: 15 ms, 17 GFlops (3x)
  – CUFFT: 8 ms, 32 GFlops (5x)
  – Prototype DX11: 6 ms, 42 GFlops (6x)
  – Latest chips: 3 ms, 100 GFlops

• Shared register space and random-access writes enable ~2x speedups

Page 47

IMSL .NET Numerical Library
• Linear Algebra
• Eigensystems
• Interpolation and Approximation
• Quadrature
• Differential Equations
• Transforms
• Nonlinear Equations
• Optimization
• Basic Statistics
• Nonparametric Tests
• Goodness of Fit
• Regression
• Variances, Covariances and Correlations
• Multivariate Analysis
• Analysis of Variance
• Time Series and Forecasting
• Distribution Functions
• Random Number Generation

Page 48

Data acquisition from source systems and integration

Data transformation and synthesis

Data enrichment, with business logic, hierarchical views

Data discovery via data mining

Data presentation and distribution

Data access for the masses

Integrate Analyze Report

Research

Page 49

Data Browsing with Excel

Annual Mean

Monthly Mean

Weekly Mean

Courtesy Catherine van Ingen, MSR

Page 50

Datamining with Excel

Integrated algorithms:
• Text Mining
• Neural Nets
• Naïve Bayes
• Time Series
• Sequence Clustering
• Decision Trees
• Association Rules

Page 51

Workflow Design for SharePoint

Page 52

Microsoft HPC++ Labs: Academic Computational Finance Service

Page 53

Page 54

Page 55

Page 56

Page 57

Page 58

Page 59

Page 60

Page 61

Page 62

Taking HPC Mainstream

Page 63

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.