300x Matlab - MIT Lincoln Laboratory
TRANSCRIPT
MIT Lincoln Laboratory
300x Matlab
Dr. Jeremy Kepner, MIT Lincoln Laboratory
September 25, 2002, HPEC Workshop, Lexington, MA
This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the Department of Defense.
Outline

• Introduction
  – Motivation
  – Challenges
• Approach
• Performance Results
• Future Work and Summary
Motivation: DoD Need
[Figure annotation: Cost = 4 lines of DoD code]

• DoD has a clear need to rapidly develop, test, and deploy new techniques for analyzing sensor data
  – Most DoD algorithm development and simulations are done in Matlab
  – Sensor analysis systems are implemented in other languages
  – Transformation involves years of software development, testing, and system integration
• MatlabMPI allows any Matlab program to become a high performance parallel program
Challenges: Why Has This Been Hard?
• Productivity
  – Most users will not touch any solution that requires other languages (even cmex)
• Portability
  – Most users will not use a solution that could potentially make their code non-portable in the future
• Performance
  – Most users want to do very simple parallelism
  – Most programs have long latencies (do not require low latency solutions)
Outline

• Introduction
• Approach
  – Basic Requirements
  – File I/O based messaging
• Performance Results
• Future Work and Summary
Modern Parallel Software Layers
[Layered diagram: Application layer (Input, Analysis, Output); Parallel Library layer (Vector/Matrix and Task components, Conduits) providing the User Interface; Hardware layer (Math Kernel, Messaging Kernel) providing the Hardware Interface; runs on a workstation, PowerPC cluster, or Intel cluster]
• Can build any parallel application/library on top of a few basic messaging capabilities
• MatlabMPI provides this Messaging Kernel
MatlabMPI “Core Lite”
• Parallel computing requires eight capabilities
  – MPI_Run launches a Matlab script on multiple processors (see the launch sketch below)
  – MPI_Comm_size returns the number of processors
  – MPI_Comm_rank returns the id of each processor
  – MPI_Send sends Matlab variable(s) to another processor
  – MPI_Recv receives Matlab variable(s) from another processor
  – MPI_Init called at beginning of program
  – MPI_Finalize called at end of program
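As a quick illustration, a parallel job is typically started from an interactive Matlab session with MPI_Run. The sketch below is hedged: the script name and machine list are invented for the example, and the exact MPI_Run calling convention should be confirmed in the MatlabMPI documentation.

% Hedged launch sketch (illustrative script name and machine list).
machines = {};                                        % empty list: run every copy locally
                                                      % (e.g. {'node1','node2'} on a cluster)
eval( MPI_Run('basic_send_receive', 2, machines) );   % launch the script on 2 processors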
Key Insight: File I/O based messaging
• Any messaging system can be implemented using file I/O
• File I/O provided by Matlab via load and save functions (see the sketch below)
  – Takes care of complicated buffer packing/unpacking problem
  – Allows basic functions to be implemented in ~250 lines of Matlab code
[Figure: processes exchanging messages through a shared file system]
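A minimal sketch of why save and load remove the buffer packing problem; the file name and variable are invented for the illustration.

% Any Matlab variable (matrix, struct, cell array, ...) round-trips through
% a .mat file with one call each way, so no explicit packing code is needed.
A = rand(4);                  % data to be sent (illustrative)
save('message.mat', 'A');     % "pack": serialize the variable to a file
s = load('message.mat');      % "unpack": read it back on the receiving side
B = s.A;                      % B is identical to A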
MatlabMPI: Point-to-point Communication
[Diagram: the Sender saves a variable to a Data file on the shared file system and then creates a Lock file; the Receiver detects the Lock file and loads the variable from the Data file]
MPI_Send (dest, tag, comm, variable);
variable = MPI_Recv (source, tag, comm);
• Sender saves variable in Data file, then creates Lock file
• Receiver detects Lock file, then loads Data file
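Sketched in plain Matlab, the protocol above might look roughly like the following. This is a hedged illustration of the idea, not MatlabMPI's actual internal code; the file names and polling loop are assumptions (real message files encode source, destination, and tag).

% --- Sender side (hedged sketch) ---
variable = 1:10;                               % data to send (illustrative)
save('p0_to_p1_tag1.mat', 'variable');         % 1. Save variable in the Data file.
fclose(fopen('p0_to_p1_tag1.lock', 'w'));      % 2. Create the (empty) Lock file.

% --- Receiver side (hedged sketch) ---
while ~exist('p0_to_p1_tag1.lock', 'file')     % 1. Wait for the Lock file to appear
  pause(0.01);                                 %    (poll the shared file system).
end
s = load('p0_to_p1_tag1.mat');                 % 2. Load the Data file.
variable = s.variable;                         % received data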
Example: Basic Send and Receive
MPI_Init;                               % Initialize MPI.
comm = MPI_COMM_WORLD;                  % Create communicator.
comm_size = MPI_Comm_size(comm);        % Get size.
my_rank = MPI_Comm_rank(comm);          % Get rank.
source = 0;                             % Set source.
dest = 1;                               % Set destination.
tag = 1;                                % Set message tag.

if (comm_size == 2)                     % Check size.
  if (my_rank == source)                % If source.
    data = 1:10;                        % Create data.
    MPI_Send(dest, tag, comm, data);    % Send data.
  end
  if (my_rank == dest)                  % If destination.
    data = MPI_Recv(source, tag, comm); % Receive data.
  end
end

MPI_Finalize;                           % Finalize MatlabMPI.
exit;                                   % Exit Matlab.
• Uses standard message passing techniques
• Will run anywhere Matlab runs
• Only requires a common file system
MatlabMPI Additional Functionality
• Important MPI convenience functions
  – MPI_Abort kills all jobs
  – MPI_Bcast broadcasts a message (exploits symbolic links to allow for true multi-cast)
  – MPI_Probe returns a list of all incoming messages (allows more dynamic message reading; sketched below)
• MatlabMPI specific functions
  – MatMPI_Delete_all cleans up all files after a run
  – MatMPI_Save_messages toggles deletion of messages (individual messages can be inspected for debugging)
  – MatMPI_Comm_settings lets the user set MatlabMPI internals (rsh or ssh, location of Matlab, unix or windows, …)
• Other
  – Processor specific directories
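As a hedged illustration of the more dynamic reading style MPI_Probe enables: the argument and return conventions below are assumed from the one-line description above, not taken from the MatlabMPI documentation.

% Hedged sketch: check what has arrived before blocking in MPI_Recv.
% Assumed convention: MPI_Probe returns ranks and tags of pending messages.
[ranks, tags] = MPI_Probe(source, tag, comm);  % list matching incoming messages
for i = 1:length(ranks)
  data = MPI_Recv(ranks(i), tags(i), comm);    % read each message as it is found
end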
Outline

• Introduction
• Approach
• Performance Results
  – Bandwidth
  – Parallel Speedup
• Future Work and Summary
MatlabMPI vs MPI bandwidth
• Bandwidth matches native C MPI at large message size
• Primary difference is latency (35 milliseconds vs. 30 microseconds)
[Plot: Bandwidth (Bytes/sec, 1e5 to 1e8) vs. Message Size (Bytes, 1K to 32M) on an SGI Origin2000, comparing C MPI and MatlabMPI]
MatlabMPI bandwidth scalability
[Plot: Bandwidth (Bytes/sec, 1e5 to 1e9) vs. Message Size (Bytes, 1K to 16M) on a Linux cluster with Gigabit Ethernet, for 2 and 16 processors]
• Bandwidth scales to multiple processors
• Cross mounting eliminates bottlenecks
Image Filtering Parallel Performance
[Plots: (left) Speedup vs. Number of Processors (1 to 64) for a fixed problem size on an SGI Origin2000, MatlabMPI vs. linear speedup; (right) Gigaflops vs. Number of Processors (1 to 1000) for a scaled problem size on an IBM SP2, MatlabMPI vs. linear scaling]
• Achieved "classic" super-linear speedup on fixed problem
• Achieved speedup of ~300 on 304 processors on scaled problem
Productivity vs. Performance
[Chart: Lines of Code (10 to 1000) vs. Peak Performance (0.1 to 1000) across single-processor, shared-memory, and distributed-memory implementations; points for Matlab, C, C++, VSIPL, VSIPL/OpenMP, VSIPL/MPI, PVL, MatlabMPI, and Parallel Matlab*; current practice and current research regions indicated]
• Programmed image filtering several ways
  – Matlab
  – VSIPL
  – VSIPL/OpenMP
  – VSIPL/MPI
  – PVL
  – MatlabMPI
• MatlabMPI provides
  – high productivity
  – high performance
Current MatlabMPI deployment
• Lincoln Signal processing (7.8 on 8 cpus, 9.4 on 8 duals)
• Lincoln Radar simulation (7.5 on 8 cpus, 11.5 on 8 duals)
• Lincoln Hyperspectral Imaging (~3 on 3 cpus)
• MIT LCS Beowulf (11 Gflops on 9 duals)
• MIT AI Lab Machine Vision
• OSU EM Simulations
• ARL SAR Image Enhancement
• Wash U Hearing Aid Simulations
• So. Ill. Benchmarking
• JHU Digital Beamforming
• ISL Radar simulation
• URI Heart modelling
• Rapidly growing MatlabMPI user base
• Web release may create hundreds of users
Outline
• Introduction
• Approach
• Performance Results
• Future Work and Summary
Future Work: Parallel Matlab Toolbox
[Diagram: High Performance Matlab Applications (DoD Sensor Processing, DoD Mission Planning, Scientific Simulation, Commercial Applications) built on the Parallel Matlab Toolbox, which spans the User Interface and Hardware Interface down to Parallel Computing Hardware]
• Parallel Matlab need has been identified
  – HPCMO (OSU)
• Required user interface has been demonstrated
  – Matlab*P (MIT/LCS)
  – PVL (MIT/LL)
• Required hardware interface has been demonstrated
  – MatlabMPI (MIT/LL)
• Allows parallel programs with no additional lines of code
Future Work: Scalable Perception
• Data explosion
  – Advanced perception techniques must process vast (and rapidly growing) amounts of sensor data
• Component scaling
  – Research has given us high-performance algorithms and architectures for sensor data processing
  – But these systems are not modular, reflective, or radically reconfigurable to meet new goals in real time
• Scalable perception
  – Will require a framework that couples the computationally daunting problems of real-time multimodal perception to the infrastructure of modern high-performance computing, algorithms, and systems
• Such a framework must exploit:
  – High-level languages
  – Graphical / linear algebra duality
  – Scalable architectures and networks
Summary
• MatlabMPI has the basic functions necessary for parallel programming
  – Size, rank, send, receive, launch
  – Enables complex applications or libraries
• Performance can match native MPI at large message sizes
• Demonstrated scaling into hundreds of processors
• Demonstrated productivity
• Available on HPCMO systems
• Available on Web
Acknowledgements
• Support
  – Charlie Holland DUSD(S&T) and John Grosh OSD
  – Bob Bond and Ken Senne (Lincoln)
• Collaborators
  – Stan Ahalt and John Nehrbass (Ohio St.)
  – Alan Edelman, John Gilbert and Ron Choy (MIT LCS)
• Lincoln Applications
  – Gil Raz, Ryan Haney and Dan Drake
  – Nick Pulsone and Andy Heckerling
  – David Stein
• Centers
  – Maui High Performance Computing Center
  – Boston University
http://www.ll.mit.edu/MatlabMPI