Advanced Parallel Primitives in SPM.Python for Inheriting Fault-Tolerance, and Scalable
Processing of Data and Graphs
Minesh B. Amin
mamin@mbasciences.com
http://www.mbasciences.com
HPC Advisory Council / Stanford Workshop 2011
Stanford University, CA
Dec 7, 2011
© 2011 MBA Sciences, Inc. All rights reserved.
Problem Statement
... exploiting parallelism using libraries
Problem Statement
... exploiting parallelism using frameworks
libraries
Problem Statement
... exploiting parallelism using parallel primitives
frameworks
libraries
Parallel primitives:
● Clone
● CloneRepeat
● PartitionAggregate (Decentralized)
● PartitionAggregate (Centralized)
● PartitionList
● PartitionDAG
● Single, self-contained parallel environment
● Patented Technology ...
Partition/OpenMPI
Enable any OpenMPI application to inherit support for:
● Fault tolerance
● Timeout
● Detection of deadlocks
Partition/HybridFlow
Suites of parallel primitives to process data and graphs in parallel
Terminology: "Exploiting Parallelism"
Parallelism: The management of a collection of serial tasks
Management: The policies by which:
● tasks are scheduled,
● premature terminations are handled,
● preemptive support is provided,
● communication primitives are enabled/disabled, and
● resources are obtained and released
Serial Tasks: Are classified as either:
● Coarse grain ... where tasks may not communicate prior to conclusion, or
● Fine grain ... where tasks may communicate prior to conclusion.
Management policies codify how serial tasks are to be managed ... independent of what they may be
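The separation described above, where the policy says how tasks are managed regardless of what they compute, can be sketched in plain Python. This is an illustration built on the standard library only, not SPM.Python's API; `ManagementPolicy`, `manage`, and both field names are hypothetical, and only two of the listed policies (scheduling and premature termination) are modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ManagementPolicy:
    """Hypothetical policy record; field names are illustrative only."""
    max_workers: int = 2   # scheduling: how many tasks run at once
    max_attempts: int = 2  # premature termination: attempts per task


def manage(tasks: Sequence[Callable[[], Any]],
           policy: ManagementPolicy) -> List[Tuple[str, Any]]:
    """Run coarse-grain serial tasks under `policy`.

    The manager never inspects what a task computes; it only applies
    the policy, and the tasks do not communicate before they conclude.
    """
    def run_one(task: Callable[[], Any]) -> Tuple[str, Any]:
        last_error = None
        for _ in range(policy.max_attempts):
            try:
                return ("ok", task())
            except Exception as exc:  # any exception counts as a premature termination
                last_error = exc
        return ("failed", repr(last_error))

    with ThreadPoolExecutor(max_workers=policy.max_workers) as pool:
        # map() preserves the order of `tasks` in the results.
        return list(pool.map(run_one, tasks))
```

Swapping in a different `ManagementPolicy` changes how the same tasks are run without touching the tasks themselves, which is the point the slide makes.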
Terminology: "The Big Picture"
Question: Is exploiting parallelism {easy, hard}? What makes it so?
Supposition: The gap between the developer's intent and the API of the PET (parallel enabling technology) ...
Terminology: "Parallel Enabling Technologies"
Means to the end
Bottom-up: OpenMPI, OpenMP, CUDA, OpenGL
● Maximum flexibility
● Maximum headaches
● Must implement fault tolerance
Top-down: Hadoop, Goldenorb, GraphLab
● Limited flexibility
● Fewer headaches
● Fault tolerance is inherited
Self-contained environment: SPM.Python
● Maximum flexibility
● Fewest headaches
● Fault tolerance is inherited
N environments/installations for N frameworks
One environment/installation, N suites of pclosures
>>> createVirtualCloud -async
>>> cmdA
>>> cmdA -parallel
>>> cmdB
>>> cmdB -parallel
>>> cmdC
>>> cmdC -parallel
>>> cmdD
>>> cmdD -parallel
SPM.Python: Typical Flow
Domains: Visualization, Life Sciences, Finance, IT, Software Development, EDA, Analytics
Gap between intent and API of parallel primitives
Architectural
● Scalable vocabulary
Developer
● Correct-by-construction: fault tolerance, self-cleaning
● Construct-by-correction: rapid prototyping
IT
● No certification (!)
SPM.Python: Typical Flow (Cont'd)
Fundamental Prerequisite
Ability to express parallelism in terms of parallel primitives (pclosures)
Partition/OpenMPI: Prologue
GNU/Linux [] mpirun ... ./hello_world -prefix "api"
Typical OpenMPI application ... lacks support for:
● fault tolerance
● timeout
● detection of deadlocks
⇒ Prototyping is (deeply)^∞ frustrating
Partition/OpenMPI: Problem Statement
Prototyping should be frictionless
Must use original OpenMPI application
● original source code
● original binary
Original OpenMPI application must inherit support for:
● fault tolerance
● timeout
● detecting deadlocks
GNU/Linux [] spm.python ...
    mpirun ... ./hello_world -prefix "api"
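As a rough illustration of what "inheriting" a timeout and a relaunch policy around an unmodified binary could look like, here is a sketch using only Python's standard `subprocess` module. It is not how SPM.Python works internally (the slides do not show that); `run_with_timeout` and its parameters are hypothetical names, and the sketch supervises only the single local process it starts, not the remote ranks of a real MPI job.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str],
                     timeout_s: float,
                     max_attempts: int = 2) -> Optional[subprocess.CompletedProcess]:
    """Run `cmd` unchanged; relaunch if it times out or exits non-zero.

    Returns the first successful CompletedProcess, or None if every
    attempt failed. The command itself is never modified.
    """
    for _ in range(max_attempts):
        try:
            done = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout_s)
        except subprocess.TimeoutExpired:
            continue  # subprocess.run() has already killed the child
        if done.returncode == 0:
            return done
    return None


if __name__ == "__main__":
    # Stand-in for the real launch line; any argv list works here.
    result = run_with_timeout([sys.executable, "-c", "print('hello')"],
                              timeout_s=10.0)
    print(result.stdout if result else "all attempts failed")
```

The original launch line would be passed as the argv list in place of the stand-in command, so neither the source nor the binary changes.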
Partition/OpenMPI: Problem Statement (Cont'd)
GNU/Linux [] spm.python ...
    mpirun ... ./hello_world -prefix "api"
[Figure labels: A, B]
Exploiting two very different forms of parallelism:
● Using same resources
● At the same time
Drop-in replacement for mpirun
Multiple sessions of mpirun within a single session of spm.python
Can use same resources for:
● Checkpoint based parallelism
● What-if analysis
● Stress testing
Partition/OpenMPI: Anatomy - Timeline
GNU/Linux [] spm.python ...
    mpirun ./hello_world -prefix "api"
Partition/OpenMPI: Anatomy - Timeline (Cont'd)
[Timeline figure. Columns: Hub, mpirun, Spoke, orted, wrapper, Application. Steps 1-7, each process ending in exit(). Case shown: Normal Execution]
● Launch: mpirun. Monitor: mpirun, Spokes
● Launch: orted. Monitor: orted, wrapper
● Launch: Application. Monitor/Timeout: Application
Establish a nervous system over the OpenMPI application
Populate nervous system with streams of time-series data
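The launch, monitor, and timeout roles in the timeline, together with the idea of feeding a stream of time-series data, can be approximated for a single process as follows. This is a sketch under stated assumptions: `launch_and_monitor`, the `Sample` tuple, and the polling interval are all hypothetical, and the real system monitors several layers (mpirun, orted, wrapper) across hosts rather than one local child.

```python
import subprocess
import sys
import time
from typing import List, Tuple

Sample = Tuple[float, str]  # (seconds since launch, observed state)


def launch_and_monitor(cmd: List[str],
                       timeout_s: float,
                       poll_s: float = 0.05) -> Tuple[str, List[Sample]]:
    """Launch `cmd`, poll it until exit or timeout, and record samples."""
    samples: List[Sample] = []
    start = time.monotonic()
    child = subprocess.Popen(cmd)
    while True:
        elapsed = time.monotonic() - start
        code = child.poll()
        if code is not None:
            samples.append((elapsed, "exited"))
            return ("ok" if code == 0 else "failed", samples)
        if elapsed >= timeout_s:
            child.kill()
            child.wait()  # reap the child so no zombie is left behind
            samples.append((elapsed, "timeout"))
            return ("timeout", samples)
        samples.append((elapsed, "running"))
        time.sleep(poll_s)


if __name__ == "__main__":
    status, series = launch_and_monitor([sys.executable, "-c", "pass"],
                                        timeout_s=10.0)
    print(status, len(series))
```

Each returned sample is one point in the time series; a supervisor higher up the chain could consume these to decide whether the layer below it is still healthy.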
Partition/OpenMPI: Anatomy - Breakdown
[Timeline figure. Columns: Hub, mpirun, Spoke, orted, wrapper, Application. Steps 1-7, each process ending in exit(). Case shown: Normal Execution]
● Launch: mpirun. Monitor: mpirun, Spokes
● Launch: orted. Monitor: orted, wrapper
● Launch: Application. Monitor/Timeout: Application
Built-in Package Management System
● Selectively change default OpenMPI env
Redirection of library calls
● Augment libmpi.so, libc.so ... with libSPM.so
Second Parallel Capability
● ∼ 60-line Python script
● Authored by developer
Partition/OpenMPI: Second Parallel Capability

    @spm.util.dassert(predicateCb = spm.sys.sstat.amOffline)
    @spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
    def __init():
        return spm.pclosure.macro.papply.template.openMPI.\
               policyA.defun(signature = 'signature::Hub',
                             stage1Cb  = __taskStat,
                             );

    __pc = __init();

Declaration + Definition of Pclosure
Partition/OpenMPI: Second Parallel Capability

    @spm.util.dassert(predicateCb = spm.sys.sstat.amOffline)
    @spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
    def main(pool,
             taskApiArgs,
             taskTimeout):
        # Initialize 'stage0'.
        __pc.stage0.init.main(typedef = ...);
        hdl = __pc.stage0.payload.tie();
        # Populate the template task
        hdl.spm.meta.label   = '***'; # Not interested.
        hdl.spm.meta.apiArgs = taskApiArgs;
        hdl.spm.meta.timeout = taskTimeout;
        # Invoke the pmanager (values elided on the slide)
        __pc.stage0.event.manage(pool                 = pool,
                                 nSpokesMin           = ...,
                                 nSpokesMax           = ...,
                                 timeoutWaitForSpokes = ...,
                                 timeoutExecution     = ...);
        return;

Population + Invocation of Pclosure
![Page 48: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec4d8e926e62b306404b9fc/html5/thumbnails/48.jpg)
Partition/OpenMPI: Second Parallel Capability
r"""
task<template> ::struct {
    # SPM component ...
    spm ::struct {
        meta ::struct {
            label ::scalar<stringSnippet> = deferred;
            apiArgs ::dict<string,mixed> = deferred;
            timeout ::scalar<timeout> = deferred;
        };
        core ::struct {
            relaunchPre ::scalar<bool> = None;
            relaunchPost ::scalar<bool> = None;
            nameHost ::scalar<auto> = None;
            whoAmI ::scalar<auto> = None;
        };
        stat ::struct {
            exception ::scalar<auto> = None;
            returnValue ::scalar<record> = None;
        };
    };
    # non-SPM component ...
};
"""
Typedef for Template Task
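The typedef above can be mirrored in plain Python to make its shape easier to see. This is a minimal sketch using standard-library dataclasses, not the SPM.Python typedef machinery. The names `DEFERRED`, `Meta`, `Core`, `Stat`, `TemplateTask` and `unset_fields` are hypothetical, and reading `deferred` as "must be populated before the task is submitted" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


class _Deferred:
    """Sentinel type for fields that have no value yet (assumed meaning of 'deferred')."""

    def __repr__(self):
        return "deferred"


DEFERRED = _Deferred()


@dataclass
class Meta:
    # Fields the caller is expected to populate (mirrors spm.meta).
    label: Any = DEFERRED
    apiArgs: Any = DEFERRED
    timeout: Any = DEFERRED


@dataclass
class Core:
    # Bookkeeping fields that default to None (mirrors spm.core).
    relaunchPre: Optional[bool] = None
    relaunchPost: Optional[bool] = None
    nameHost: Any = None
    whoAmI: Any = None


@dataclass
class Stat:
    # Outcome fields read back after the task has run (mirrors spm.stat).
    exception: Any = None
    returnValue: Any = None


@dataclass
class TemplateTask:
    meta: Meta = field(default_factory=Meta)
    core: Core = field(default_factory=Core)
    stat: Stat = field(default_factory=Stat)


def unset_fields(task):
    """Return the names of meta fields that still hold the DEFERRED sentinel."""
    return [name for name, value in vars(task.meta).items() if value is DEFERRED]
```

Populating `label`, `apiArgs` and `timeout` on a `TemplateTask()` instance empties the list returned by `unset_fields`, which corresponds to the "populate the template task" step in the sketch's terms.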
![Page 49: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec4d8e926e62b306404b9fc/html5/thumbnails/49.jpg)
Partition/OpenMPI: Second Parallel Capability
@spm.util.dassert(predicateCb = spm.sys.sstat.amOnline)
@spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
def __taskStat(pc):
    try:
        hdl = pc.stage1.payload.tie();
        returnValue = hdl.spm.stat.returnValue;
        # Report whichever output streams the task recorded.
        if (returnValue.Has(attr = 'stdOut')):
            print("\tstdOut : %s" % (returnValue.stdOut,));
        if (returnValue.Has(attr = 'stdErr')):
            print("\tstdErr : %s" % (returnValue.stdErr,));
        if (returnValue.Has(attr = 'stdOutErr')):
            print("\tstdOutErr: %s" % (returnValue.stdOutErr,));
    except (SPMTaskDropped,
            SPMTaskLoad,
            SPMTaskEval,), (hdl,):
        pass;
    return (pc.stage1.event.done(),
            None,)[-1];
Callback for Status Reports
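The reporting logic of the callback can be tried outside SPM.Python with a stand-in record. The sketch below is plain Python. `Record` is a hypothetical substitute for the object held in `hdl.spm.stat.returnValue`, modelling only the `Has(attr = ...)` check used on the slide, and `format_report` is an illustrative helper, not part of any SPM API.

```python
class Record:
    """Hypothetical stand-in for a task's return record."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def Has(self, attr):
        # True when the record carries the named field.
        return attr in self.__dict__


def format_report(record):
    """Build one report line per output stream present in the record."""
    lines = []
    for attr in ("stdOut", "stdErr", "stdOutErr"):
        if record.Has(attr=attr):
            lines.append("\t%-9s: %s" % (attr, getattr(record, attr)))
    return lines


if __name__ == "__main__":
    for line in format_report(Record(stdOut="app => 0", stdErr="")):
        print(line)
```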
![Page 50: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec4d8e926e62b306404b9fc/html5/thumbnails/50.jpg)
Partition/OpenMPI: SPM.Python Session
GNU/Linux [] spm.3.111116.trial.A.python
(Trial Edition)
Spm.Python 3.111116 / Python 2.4.6
[GCC 4.4.3 (64 bit) on linux2]
NOTE
>>>> Trial period ends at <<<<
>>>> 24:00 hrs (Pacific Standard Time) <<<<
>>>> December 29, 2011 <<<<
Type "help", "copyright", "credits", "license" or "spm.Api()" for more information.
Type "spm.DemoExtract(dirname = ...)" to extract demo scripts.
Please visit www.mbasciences.com for the latest and growing
collection of scripts and technical briefs classified in terms of
parallel management patterns.
>>> import pool
>>> import demo
>>> import os;
>>> taskApiArgs = \
    dict(app = os.getcwd() + '/hello_world',
         appOptions = "-prefix='app'",
        );
>>> taskTimeout = spm.util.timeout.after(seconds = 10);
>>> demo.main(pool = pool.intraAll(),
              taskApiArgs = taskApiArgs,
              taskTimeout = taskTimeout)
#: MetaStatus (hub): Waiting - ForSpokes ...
#: MetaStatus (hub): Tasks - Eval
app => 0
app => 1
#: MetaStatus (hub): Tasks - EvalDone
>>> demo.main(pool = pool.intraOnePerServer(),
              taskApiArgs = taskApiArgs,
              taskTimeout = taskTimeout)
#: MetaStatus (hub): Waiting - ForSpokes ...
#: MetaStatus (hub): Tasks - Eval
#: MetaStatus (hub): Tasks - EvalDone
>>> exit()
GNU/Linux []
![Page 51: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec4d8e926e62b306404b9fc/html5/thumbnails/51.jpg)
Problem Statement
... exploiting parallelism using parallel primitives
frameworks
libraries
● Clone
● Clone Repeat
● PartitionAggregate Decentralized
● PartitionAggregate Centralized
● Partition List
● Partition DAG
● Single, self-contained parallel environment
● Patented Technology ...
Partition/OpenMPI
Enable any OpenMPI application to inherit support for:
● Fault tolerance
● Timeout
● Detection of deadlocks
Partition/HybridFlow
Suites of parallel primitives to process data and graphs in parallel
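The kind of supervision listed for Partition/OpenMPI above (failure handling and timeouts around an external application) can be illustrated on a single machine with the Python 3 standard library. This is only an analogy under stated assumptions: it uses `subprocess.run`, not SPM.Python, and the helper name `run_with_timeout` is hypothetical. It also cannot tell a deadlock from a slow run; a hung child simply shows up as a timeout.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv and classify the outcome as 'ok', 'failed', or 'timeout'."""
    try:
        finished = subprocess.run(
            argv,
            capture_output=True,      # collect stdout/stderr instead of inheriting them
            text=True,                # decode output as str
            timeout=timeout_seconds,  # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    status = "ok" if finished.returncode == 0 else "failed"
    return (status, finished.stdout)


if __name__ == "__main__":
    # A well-behaved child, then one that would hang without supervision.
    print(run_with_timeout([sys.executable, "-c", "print('app => 0')"], 10))
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```

A supervisor built this way could relaunch a task after a `failed` or `timeout` outcome; how SPM.Python itself relaunches tasks is not shown on the slides.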
![Page 52: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec4d8e926e62b306404b9fc/html5/thumbnails/52.jpg)
Partition/HybridFlow: Basic Template
while (not done):
    try:
        for work in pc.generate(...):
            eval(work);                  # Local Python/C/C++/GPU computation
            pc.counter.async += 1;       # Update parallel data structure(s)
            if (some condition):
                raise pc.exception(...); # Parallel exception
            if (some condition):
                pc.emit(...);            # Emit work/report
        done = True;
    except (pc.exception,), (val,):
        if (some condition):
            continue;                    # Repeat with new consensus ('val')
        done = True;
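The control flow of the template can be traced with a serial, single-process emulation. The sketch below is plain Python 3 and does not use SPM.Python: `ConsensusError` stands in for `pc.exception`, the `for` loop over `items` for `pc.generate(...)`, the local `counter` for `pc.counter.async`, and the `emitted` list for `pc.emit(...)`. All of these names, and the "grow the capacity until every item fits" task, are hypothetical and chosen only to give the loop something concrete to do.

```python
class ConsensusError(Exception):
    """Stand-in for a parallel exception; carries the value to agree on."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def run_template(items, capacity, max_rounds=10):
    """Process items under a capacity bound, restarting with a larger bound when needed."""
    done = False
    rounds = 0
    counter = 0   # counts every unit of work evaluated, across all rounds
    emitted = []
    while not done:
        rounds += 1
        try:
            emitted = []                          # each round starts with fresh output
            for work in items:                    # generate work
                counter += 1                      # update the shared counter
                if work > capacity:
                    raise ConsensusError(work)    # abandon the round, propose a new bound
                emitted.append((work, capacity))  # emit a result
            done = True
        except ConsensusError as err:
            if rounds < max_rounds:
                capacity = err.value              # repeat with the new consensus
                continue
            done = True                           # give up; 'emitted' may be partial
    return capacity, counter, emitted


if __name__ == "__main__":
    print(run_template([3, 7, 5], capacity=4))
```

With `items = [3, 7, 5]` and `capacity = 4`, the first round is abandoned at the item `7`, and the second round completes with the capacity raised to `7`.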
![Page 63: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec4d8e926e62b306404b9fc/html5/thumbnails/63.jpg)
Partition/HybridFlow: Suite of Parallel Primitives
while (not done):
    try:
        for work in pc.generate(...):
            eval(work);
            pc.counter.async += 1;
            if (some condition):
                raise pc.exception(...);
            if (some condition):
                pc.emit(...);
        done = True;
    except (pc.exception,), (val,):
        if (some condition):
            continue;
        done = True;
pc.generator(...);
pc.emit(...);
pc.exception(...);
pc.counter.async;
BAP, BSP Speculative, BSP, DAG, ...
![Page 64: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec4d8e926e62b306404b9fc/html5/thumbnails/64.jpg)
Conclusion
... exploiting parallelism using parallel primitives
frameworks
libraries
● Clone
● Clone Repeat
● PartitionAggregate Decentralized
● PartitionAggregate Centralized
● Partition List
● Partition DAG
● Single, self-contained parallel environment
● Patented Technology ...
Partition/OpenMPI
Enable any OpenMPI application to inherit support for:
● Fault tolerance
● Timeout
● Detection of deadlocks
Partition/HybridFlow
Suites of parallel primitives to process data and graphs in parallel
![Page 65: Advanced Parallel Primitives in SPM.Python for Inheriting ......Parallelism: The management of a collection of serial tasks Management: The policies by which: tasks are scheduled,](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec4d8e926e62b306404b9fc/html5/thumbnails/65.jpg)
Conclusion (Cont’d)
http://www.mbasciences.com
● SPM.Python distribution
● Technical Briefs
● Parallel Management Patterns
Elementary Parallel Primitives
● Clone: Once, Repeat
● Partition: DAG, List
● PartitionAggregate: Centralized, Decentralized
HPC Parallel Primitives (In Limited Beta)
● Partition: Grid/OpenMPI
Data / Graph Parallel Primitives (Limited Beta: Jan 24, 2012)
● Partition: Data Flow, Graph