

Page 1: DARPA:ARMS Phase II PI Meeting

Use or disclosure of this data outside the ARMS Program or Government is restricted without the express written permission of the Lockheed Martin Company.

Phase II PI Meeting
Lockheed Martin Advanced Technology Laboratories
April 11-13, 2006
DARPA:ARMS

Page 2: Team Overview

Phase II Activity: Extended Team

• Project Leadership: Tom Damiano, Patrick Lardieri, Gautam Thaker
• Resource Allocation and Control Engine (RACE): Tom Damiano (ATL) & Ed Mulholland, Jaiganesh Balasubramanian, Will Otte, Nilabja Roy, Nishanth Shankaran (Vanderbilt)
• Technology Transition: Patrick Lardieri & Tom Damiano (ATL), Doug Schmidt (Vanderbilt)
• CIAO DDS Integration: Ming Xiao (Vanderbilt)
• Company Resource Management: Don Krecker (ATL), Blake Ross (LM), Rose Daley & I-Jeng Weng (APL), Yiaming Je (BBN)
• Certification Technologies: Gautam Thaker (ATL), Chenyang Lu & Yuanfang Zhang, Chris Gill (Washington University St. Louis)
• Gate Test 2: Gautam Thaker (ATL), Raj Rajkumar & Gaurav Bhatia (CMU), Joe Cross (DARPA)
• Gate Test 1: Michael Price, Ed Mulholland, & Tom Damiano (ATL), Matt Gillen (BBN), Doug Stuart (Boeing), John Cosgrove (Raytheon), Will Otte (Vanderbilt)

Page 3: Technical Accomplishments/Progress

Phase II - Gate Test I
Experimental Results

Page 4: Technical Accomplishments/Progress
Phase II - Gate Test 1: CONOPS - Do No Harm

Gate Test 1 was conducted using two scenarios, GT-1A and GT-1B, involving two pools, three nodes per pool, and two application strings.

GT-1A

Pre-Condition: The TSCE is operating normally.

Scenario: A fault occurs which is detected by MLRM. MLRM begins dynamic reconfiguration when an artificial fault is induced within the MLRM. The MLRM detects the failure to dynamically reconfigure and deploys a feasible static configuration.

Post-Condition: The TSCE is operating with the static configuration.

GT-1B

Pre-Condition: The TSCE is in a MLRM determined configuration following a failure(s).

Scenario: A human operator signals the system to ‘fallback’ to a feasible static configuration.

Post-Condition: The TSCE is operating with the static configuration.

[Figure: deployment diagram.
Pool-1: Node-Checkmate, Node-Mako, Node-Champion
Pool-2: Node-Chaparal, Node-Javelin, Node-Hogfish
Application strings: GM3-string 2.2, GM3-string1.1
Component lists shown: (ed-1, ed-2, plan-3, plan-1, cfgop-1, eff-1, eff-7, eff-8, eff-12, eff-13) and (smm-1, plan-3, plan-4, plan-1)]
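The GT-1A control flow described above (attempt dynamic reconfiguration, then fall back to a static configuration if that attempt fails) can be sketched in a few lines. This is only an illustration of the "do no harm" idea, not the MLRM implementation: `recover`, `reconfigure_dynamically`, `deploy`, `ReconfigurationError`, and the plan names are all hypothetical.

```python
class ReconfigurationError(Exception):
    """Hypothetical error: dynamic reconfiguration could not produce a plan."""


def recover(reconfigure_dynamically, deploy, static_plan):
    """Try dynamic reconfiguration; on failure deploy the static fallback.

    All three arguments are hypothetical stand-ins for MLRM internals.
    Returns the plan that ended up deployed.
    """
    try:
        plan = reconfigure_dynamically()
    except ReconfigurationError:
        # GT-1A: the induced fault ends up here, and the system reverts
        # to a configuration that is known in advance to be feasible.
        plan = static_plan
    deploy(plan)
    return plan


def failing_reconfiguration():
    # Stands in for the artificial fault induced within the MLRM.
    raise ReconfigurationError("induced fault during reconfiguration")


deployed = []
result = recover(failing_reconfiguration, deployed.append, "static-config")
```

GT-1B corresponds to skipping the `try` entirely: the operator's fallback command would call `deploy(static_plan)` directly.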

Page 5: Technical Accomplishments/Progress
Phase II - Gate Test 1A: Final Experimental Results

[Figure: four-panel plot of the GT-1A measurements.
Run Sequence Plot: Time (ms) vs. Test Case
Lag Plot: Time t (ms) vs. Time t-1 (ms)
Histogram Plot: Total Test Case vs. Elapsed Time (ms)
Normal Probability Plot: Ordered Response vs. Normal Order Statistic Medians
Annotation: Outliers are due to Non-RT OS]

[Figure: Scenario Time Line, Time (ms). Labels shown: Pool Failure Detected; Pool Mgr Receives New Deployment; Resource Allocator Executes … Induced Error Occurs; PM Detects RA Error; IA Notified of Redeploy Failure; app performs useful work; Pool 1.B Fails (X); PM Receives Static Fallback; WLGs Started; IA Declares Redeployment Complete; Data Collection Period]

Code Base: CVS Branch PHASE2_GM1

Environment: Emulab build phase2-gm1-emulholl

Location Measures          Dispersion Measures
Mid-Range     101.41       Range                                      101.43
Mean           68.59       (unlabeled; presumably std. deviation)      17.22
Median         66.79       Minimum                                     50.69
Lower ¼        61.56       Upper ¼                                     70.31
Observations      30       Maximum                                    152.12

Results on ARMS wiki
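Two of the summary figures for GT-1A follow directly from the minimum and maximum, which gives a quick consistency check on the table. The sketch below uses only the values printed on the slide; the function names are illustrative.

```python
def mid_range(minimum, maximum):
    """Midpoint between the smallest and largest observation."""
    return (minimum + maximum) / 2


def value_range(minimum, maximum):
    """Spread between the smallest and largest observation."""
    return maximum - minimum


# GT-1A values as printed on the slide (milliseconds).
gt1a_min, gt1a_max = 50.69, 152.12

gt1a_mid = mid_range(gt1a_min, gt1a_max)      # about 101.405; slide shows 101.41
gt1a_range = value_range(gt1a_min, gt1a_max)  # about 101.43; slide shows 101.43
```

The mid-range (101.41) sits well above the mean (68.59) and median (66.79), which is what one would expect when a few large outliers stretch the maximum.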

Page 6: Technical Accomplishments/Progress
Phase II - Gate Test 1B: Final Experimental Results

Code Base: CVS Branch PHASE2_GM1

Environment: Emulab build phase2-gm1-emulholl

[Figure: Scenario Time Line, Time (ms). Labels shown: System in MLRM Determined State; app performs useful work; Operator Initiated Fallback; IA Notified of Static Deployment Request; ASM Suspends Execution of Affected Apps; NP Kills Affected Apps (X); PM Receives Static Fallback; ASM Starts/Resumes new Apps; WLGs Started; IA Declares Redeployment Complete; Data Collection Period]

Location Measures          Dispersion Measures
Mid-Range     315.54       Range                                        2.94
Mean          315.19       (unlabeled; presumably std. deviation)       1.22
Median        314.39       Minimum                                    314.07
Lower ¼       314.22       Upper ¼                                    316.31
Observations       5       Maximum                                    317.01

Note: Timeline includes startup of WLGs

Results on ARMS wiki

Page 7: Technical Accomplishments/Progress
Phase II - Gate Test 1: Gate Test Completed

GT-1A Metrics:
• Does the MLRM deploy a feasible static configuration? YES
• Time between the occurrence of the fault and restored operation using the statically defined configuration. Mean = 68 ms.

GT-1B Metrics:
• Does the MLRM deploy a feasible static configuration? YES
• Time between the issuance of a command and restored operation using the statically defined configuration. Current Mean = 315.2 ms.

Gate Test Passed!

Page 8: Phase II - Gate Test II

Experimental Results

Page 9: Technical Accomplishments/Progress
Phase II - Gate Test 2: Objectives

• Provide efficient algorithms for finding a feasible allocation solution when one exists for Bob(X)-scale problems and beyond
• Exploit special features of practical aspects of the problem in a provable way
  – Presence of ‘slack’ in the packing
  – Discrete object sizes
  – Expected number of bins and/or objects
• Employ an Ensemble approach: run multiple heuristics in sequence (or in parallel)
  – If one heuristic does better in one particular part of the problem space, a solution will be found by one of these heuristics with a very high probability
  – Framework uses multiple heuristics in sequence until one succeeds or all fail. Sequence ordered based on properties of the problem set, e.g. non-zero slack, zero slack, size_ratio, etc.
• No time limit specified
• Assumption: feasible allocation to be found within 1 second

Acceptable failure probability:

Duration                                        1 sec      1 yr          5 yrs         10 yrs
Probability of meteor strike within duration    1.0E-15    3.1536E-08    1.5768E-07    3.1536E-07
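The longer-duration thresholds in the table are consistent with scaling the 1-second figure linearly by the number of seconds in the period. The sketch below reproduces them under that assumption, taking a year as 365 days; the constant and function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000; leap days ignored
P_PER_SECOND = 1.0e-15                  # 1-second threshold from the slide


def meteor_bound(years):
    """Linear scaling of the per-second probability over `years` years.

    This is the small-probability approximation of 1 - (1 - p)**n,
    which is accurate here because n * p is far below 1.
    """
    return P_PER_SECOND * SECONDS_PER_YEAR * years


bounds = {years: meteor_bound(years) for years in (1, 5, 10)}
```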

Page 10: Technical Accomplishments/Progress
Phase II - Gate Test 2: Ensemble & Results

[Figure: Runtime (s) vs. Problem Size, with the 1 second limit marked]

• 100,000 tests each
• ~0.5% quantization (bin size of 210, object size: multiple of 1)
• Problem size(x): x2 bins and x3 objects

Ensemble Heuristics:
• WFD (Worst-Fit-Decreasing): spreads objects across bins (load-balancing heuristic)
• FFD (First-Fit-Decreasing)
• BFD (Best-Fit-Decreasing)
• Efficient SubsetSums enumeration
• Base SubsetSums with preference for low homogeneity subset sums
• Base SubsetSums with preference for high homogeneity subset sums
• LSUBS (developed by Gautam Thaker)
• Java Kimchee (developed by Dr. Joe Cross)

Only a small number of size 3.3 cases fail the strict G2 test.
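The ensemble idea can be illustrated with the three classical heuristics from the list above. The sketch below is a simplified, assumption-laden version: it covers only WFD, FFD, and BFD (the SubsetSums variants, LSUBS, and Kimchee are not shown), it tries the heuristics in a fixed order rather than ordering them by problem properties, and all function names are hypothetical.

```python
def pack(sizes, capacity, n_bins, choose):
    """Place objects largest-first; `choose` picks among the bins that fit.

    Returns {object index: bin index}, or None if some object fits nowhere.
    """
    free = [capacity] * n_bins
    assignment = {}
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        fits = [b for b in range(n_bins) if free[b] >= sizes[idx]]
        if not fits:
            return None                      # this heuristic failed
        chosen = choose(fits, free)
        free[chosen] -= sizes[idx]
        assignment[idx] = chosen
    return assignment


HEURISTICS = [
    ("WFD", lambda fits, free: max(fits, key=lambda b: free[b])),  # emptiest bin
    ("FFD", lambda fits, free: fits[0]),                           # first bin
    ("BFD", lambda fits, free: min(fits, key=lambda b: free[b])),  # tightest bin
]


def ensemble(sizes, capacity, n_bins, heuristics=HEURISTICS):
    """Run heuristics in sequence until one succeeds or all fail."""
    for name, choose in heuristics:
        result = pack(sizes, capacity, n_bins, choose)
        if result is not None:
            return name, result
    return None, None


# Zero-slack example: total size 20 in two bins of capacity 10.
example_sizes = [5, 5, 4, 3, 3]
winner, example_assignment = ensemble(example_sizes, 10, 2)
```

In this small example WFD spreads the two size-5 objects across both bins and then cannot place the last object, while FFD finds the exact packing, so the ensemble succeeds where its first member fails.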

Page 11: Technical Accomplishments/Progress
Phase II - Gate Test 2: Ensemble Runs for size=3.3

Randomly generated 100,000 zero-slack tasksets for the most difficult size_3.3 case.

Heuristic                             # Successes    % Success    % Failure
WFD                                            25      0.00025      0.99975
FFD                                          1997      0.01997      0.8003
BFD                                          2349      0.02349      0.91451
Efficient Subset Sums                       91451      0.91451      0.08549
Subset Sums with Lo Homogeneity             97120      0.9712       0.0288
Subset Sums with Hi Homogeneity             97811      0.97811      0.02189
LSubs with a 1-second timeout               99332      0.99332      0.00668
Kimchee with a 60-second timeout            99992      0.99992      0.00008

Complete Failure Probability if the heuristics were independent: 2.10743E-11
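Under the independence assumption stated on the slide, the complete failure probability is the product of the individual failure fractions. The sketch below reproduces the printed 2.10743E-11 from the "% Failure" column as printed. Note that the printed failure values for FFD (0.8003) and BFD (0.91451) are not one minus the corresponding success fractions; recomputing every failure fraction from the success counts gives a slightly larger product, about 2.76E-11. Variable names are illustrative.

```python
from math import prod

TRIALS = 100_000

# "% Failure" column exactly as printed on the slide.
printed_failure = [0.99975, 0.8003, 0.91451, 0.08549,
                   0.0288, 0.02189, 0.00668, 0.00008]

# "# Successes" column, in the same heuristic order.
successes = [25, 1997, 2349, 91451, 97120, 97811, 99332, 99992]

# Product of the printed failure fractions (matches the slide's total).
joint_printed = prod(printed_failure)

# Product of failure fractions derived from the success counts instead.
joint_from_counts = prod(1 - s / TRIALS for s in successes)
```

Either way the result is an idealization: the heuristics share structure, so their failures on hard instances are unlikely to be fully independent.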

Page 12: Technical Accomplishments/Progress
Phase II - Gate Test 2: Ensemble Observations

• The Ensemble approach is an excellent scheme to adopt.
  – A collection of heuristics (each of which has < 100% success rates) can yield 100% success rates
  – Runtimes decrease significantly since the most complex schemes are invoked only when the efficient ones fail.
• However, as used, it does NOT meet the strict 1-second time limit we assumed in GT-2
  – Can take 20 seconds or longer in the worst case

A Practical Assumption:
• Accepting that there are quantization levels

Page 13: Technical Accomplishments/Progress
Phase II - Gate Test 2: The Quantization Effect

Bin-Packing Ensemble Failure Probability (from 10^9 cases)

Quantization Level | 5% | 2.5% | 1% | ~0.5% | Acceptance Threshold
# of failures with a 1s timeout | 0 | 0 | 237 | 8042 | -
Probability of allocation failure with 1s timeout (p) | 0* | 0* | 2.37*10^-7 | 8.04*10^-6 | 1.0E-15
If an allocation occurs every hour for 1 year, probability of at least one failure with 1s timeout = 1-(1-p)^(365*24) | 0 | 0 | 2.07E-03 | 6.93E-02 | 3.1536E-08
If an allocation occurs every hour for 5 years, probability of at least 1 failure = 1-(1-p)^(5*365*24) | 0 | 0 | 1.03E-02 | 3.02E-01 | 1.5768E-07
If an allocation occurs every hour for 10 years, probability of at least 1 failure = 1-(1-p)^(10*365*24) | 0 | 0 | 2.05E-02 | 5.12E-01 | 3.1536E-07

Note: Failures occur only for size 3.3

* A lot more samples are needed to observe this extremely improbable event.
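The multi-year rows apply the formula 1-(1-p)^n with n equal to the number of hourly allocations in the period. The sketch below works this through for the 1% quantization column (p = 2.37*10^-7) and yields the 2.07E-03, 1.03E-02, and 2.05E-02 entries. It evaluates the formula through `log1p`/`expm1` for numerical stability with very small p, which is an implementation choice made here, not something stated on the slide.

```python
import math

HOURS_PER_YEAR = 365 * 24   # one allocation per hour; leap days ignored


def p_at_least_one_failure(p, years):
    """Probability of at least one failure in hourly allocations over `years`.

    Mathematically equal to 1 - (1 - p) ** n, with n hourly allocations.
    """
    n = HOURS_PER_YEAR * years
    return -math.expm1(n * math.log1p(-p))


p_1pct = 2.37e-7   # per-allocation failure probability, 1% quantization
row_1pct = {years: p_at_least_one_failure(p_1pct, years) for years in (1, 5, 10)}
formatted = {years: f"{value:.2E}" for years, value in row_1pct.items()}
```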

Page 14: Technical Accomplishments/Progress
Phase II - Gate Test 2: Related Observations

• Other failure thresholds considered in practice:
  – Air Traffic Control availability requirement is 99.99999%, i.e. failure probability at any given instant is 10^-7.
  – Hardware / software failure probability is of the order of 10^-7 to 10^-8 even in reliable systems
• In the (very unlikely) event of an allocation failure
  – critical tasks can be allocated very efficiently first (with non-zero slack >= 20%, even the basic heuristics succeed)
• As the 1-second time limit is relaxed, the failure probability decreases exponentially even at low quantization levels
  – With a 10-second timeout and 0.5% granularity, probability of allocation failure over 1 year drops to 3.36E-04 (from 6.93E-02)
  – With a 50-second timeout and 0.5% granularity, probability of allocation failure over 1 year drops to 1.24E-07 (from 6.93E-02)

Page 15: Technical Accomplishments/Progress
Phase II - Gate Test 2: Gate Test Completed

Gate Test II requirements have been satisfied:
• Feasible allocation found over independent, large sample, problem sets.
• Feasible allocation found in all cases in less than 1 second except size_3_3, where there were a small number of outliers.
• Solution was demonstrated for no slack stress cases and more realistic slack cases.
• A careful study of impact of distribution of item sizes, item size quantization and overall problem size was completed
• Parallel ensemble execution shows a collection of heuristics (each of which has < 100% success rates) can yield 100% success overall
• With allowance for quantization, even the most demanding cases can meet the “Meteorite-bound”

Related additional research completed beyond strict requirements:
• Extend to multi-dimensional bin-packing
  – Constraints along each dimension must be satisfied
  – In the Bob(X) context, the dimensions of processor utilization, network utilization, and memory needs are typical.

Gate Test Passed!

Page 16: Node Alive Research and Results

Page 17: Technical Accomplishments/Progress
Node Failure Detection: UDP Push Model

UDP-PUSH Approach to Node-Failure-Detection

• All clients “push” node-alive messages to a monitor at 100 Hz
  – Inter-arrival of messages at the monitor should be 10 msec, confirmed in data that is collected (see graphic).
• Node-Alive Monitor “sweeps” over received messages at 50 Hz
• Monitor declares client node failure after 2 sweeps without receiving a beat from a client
• Current testing is at Emulab using up to 20 real nodes and 380 virtual nodes.
• Failures are simulated by the clients by suppressing 10 messages at every 60 second mark
• Fastest detection is 40 msec, slowest 60 msec, confirmed in current testing (see graphic).
• A RT Linux kernel was used to obtain accurate 100 Hz and 50 Hz loops (Ingo Molnar kernel with real-time patches, version 2.6.15-rt15-smp).
• With 380 nodes monitor receives 38,000 messages/sec
  – Monitor load has been observed to be about 8%
  – It is estimated that a Hierarchical solution (not yet implemented) will cut this down to < 2% at cost of increase in maximum detection time.
• In current tests no UDP packets are lost, so there are no false alarms
• Further testing and hierarchical implementation underway

[Figure annotations: “10 msec mean interarrivals for”; “2.2% of samples exceed theoretical max of 60 msec”]
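The 40 to 60 msec detection window follows from the sweep rule: with 20 msec sweeps and failure declared on the second consecutive empty sweep, the delay depends only on where the last beat falls relative to the sweep boundary. The sketch below is a simplified single-threaded model of that rule, not the actual monitor: there is no UDP or real-time scheduling, time advances in 1 msec ticks, a beat arriving in the same tick as a sweep is assumed to be seen by that sweep, and all names are hypothetical.

```python
class NodeAliveMonitor:
    """Declares a node failed after `misses_allowed` sweeps with no beat."""

    def __init__(self, misses_allowed=2):
        self.misses_allowed = misses_allowed
        self.seen = set()     # nodes that sent a beat since the last sweep
        self.missed = {}      # node -> consecutive sweeps without a beat

    def register(self, node):
        self.missed[node] = 0

    def beat(self, node):
        self.seen.add(node)

    def sweep(self):
        """Return the nodes newly declared failed by this sweep."""
        failed = []
        for node in self.missed:
            if node in self.seen:
                self.missed[node] = 0
            else:
                self.missed[node] += 1
                if self.missed[node] == self.misses_allowed:
                    failed.append(node)
        self.seen.clear()
        return failed


def detection_delay_ms(last_beat_ms, beat_period=10, sweep_period=20,
                       horizon=1000):
    """Milliseconds from a client's last beat to the failure declaration."""
    monitor = NodeAliveMonitor()
    monitor.register("client")
    phase = last_beat_ms % beat_period
    for t in range(horizon):
        if t <= last_beat_ms and t % beat_period == phase:
            monitor.beat("client")          # 100 Hz beats until the failure
        if t > 0 and t % sweep_period == 0 and monitor.sweep():
            return t - last_beat_ms         # 50 Hz sweep declared failure
    return None
```

In this model a last beat that lands exactly on a sweep gives the 40 msec best case, and one that lands just after a sweep approaches the 60 msec worst case.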

Page 18: Technical Accomplishments/Progress
Node Failure Detection: Performance

Acceptable Real-time Performance w/ Up to 1000 Clients

• Observed a 5x increase in CPU load when using Linux w/ complete preemption patches
• Initiated technical exchanges with RT Linux group (Ted Tso, Ingo Molnar, others)

Page 19: Technical Accomplishments/Progress
Node Failure Detection: Performance (continued)

SMP Kernel w/ Preemption Patches has 3x Larger Latency

Page 20: Certification of DRM Systems Technologies and Methods

Page 21: Technical Accomplishments/Progress
Certification: Problem

Problem Space

• Certification is a process to verify that system behavior remains within safety and effectiveness parameters
• DRE system effectiveness typically requires performing a subset of tasks within temporal bounds or deadlines
  – In many cases the deadlines apply to an end-to-end string
• Dynamic Resource Management generates new system configurations and thereby moves part of the certification process into the system runtime

Solution Space

• Use schedulability analysis techniques (periodic and strict aperiodic, and transient periodic tasks triggered by aperiodic events) to predict whether a particular allocation will meet deadlines while bounding pessimism.
• Automate the process of determining feasible and appropriate deployment placements by providing algorithms for release, development, and integration that determine the appropriate allocation, based on the QoS requirements and constraints of the applications and operational strings.
• Approach Certification with Simple DRM Capabilities and Full DRM Capabilities

Page 22: Technical Accomplishments/Progress
Certification: Approach

Simple DRM Capabilities

• Add constraint capabilities to current allocation methodologies (Phase II)
  – Mutual Placement Constraints (e.g. replicas)
  – Attribute Matching Constraints (e.g. OS type)
• Introduce multi-dimensional bin packing algorithms (Phase II)
• Engineering Support Tools (Phase II)
  – Provide capabilities to Bob(X) to build a pedigree of cases, while providing from static generation tools (Phase III)
  – ARMS (Phase III)
• Provide RACE Capability for Online Use (Phase III)

• Schedulability Method for simple QoS Allocation (Phase I)
• Schedulability Method for ARMS (Phase II)
• Constraint Method (Phase III)
• Online Capabilities in RACE (Phase III)
  – Constraint Capable Bin-Packer Planner
  – Attribute Matching Constraints (e.g. OS type)
• Full QoS Allocation (Phase III)
• Verification (Phase III)
  – Delta from static plans w/small perturbations

Full DRM Capabilities

Page 23: Technical Accomplishments/Progress
Certification: QoS Driven Allocation Tools Capabilities

– Possible use Offline and Online
  • Offline tool-suite with integrated algorithm support for generation of static deployment plans and research into new algorithms.
  • Pluggable algorithms: usable “as-is” both online and offline; i.e. the same components run online (within RACE) and offline (within the tool-suite).
– Statistical history capture
– Output adaptation to accommodate varying needs for deployment configuration file generation
– Support for ensemble algorithm runs
– Flexible test input distribution generation for validating algorithms and Extensions for Schedulability Analysis
– Variations of simple bin packing and heuristics-based algorithms for more challenging (e.g. zero slack) problems.
– Multi-Dimensional variations on allocation algorithms, including 3-D bin-packing along CPU, memory, and network bandwidth dimensions.
– Constraint-Based allocation
– Incorporation of Schedulability

[Figure labels: Runtime; Offline; Deployment Configuration]
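As an illustration of 3-D bin-packing along CPU, memory, and network bandwidth, the sketch below applies a first-fit-decreasing rule in which an item is placed on a node only if it fits along every dimension. It is a generic textbook-style heuristic written for this note, not the tool-suite's algorithm; the ordering key (largest single-dimension demand), the percent units, and all names are assumptions.

```python
def first_fit_decreasing_3d(items, nodes):
    """Place items on nodes so that no dimension is over-committed.

    items: {item name: (cpu, memory, network)} demands, in percent.
    nodes: {node name: (cpu, memory, network)} capacities, in percent.
    Returns {item name: node name}, or None if some item fits nowhere.
    """
    free = {node: list(cap) for node, cap in nodes.items()}
    placement = {}
    # Assumed ordering: items with the largest single-dimension demand first.
    for item, demand in sorted(items.items(), key=lambda kv: -max(kv[1])):
        for node, avail in free.items():
            # The constraint along each dimension must be satisfied.
            if all(d <= a for d, a in zip(demand, avail)):
                for k, d in enumerate(demand):
                    avail[k] -= d
                placement[item] = node
                break
        else:
            return None          # no node can host this item
    return placement


example_items = {"a": (60, 20, 10), "b": (50, 50, 50), "c": (30, 70, 20)}
example_nodes = {"n1": (100, 100, 100), "n2": (100, 100, 100)}
example_placement = first_fit_decreasing_3d(example_items, example_nodes)
```

A single scarce dimension is enough to block a placement: two items that each need little CPU can still fail to share a node if their memory demands together exceed its capacity.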

Page 24: Technical Accomplishments/Progress
Towards Certification: Aperiodic Tasks - Overview

Schedulability analyses for end-to-end aperiodic tasks with hard deadlines
• 1st Approach: Aperiodic Utilization Bound (AUB) - Online
• 2nd Approach: Deferrable Server (DS) - Offline

Accomplishments
• Implemented AUB and DS schedulability analyses
• Developed heuristics for tuning Deferrable Server
• Compared two approaches via numerical studies
• Implementation on TAO federated event channel
  – The first DS implementation in middleware
  – Online admission control based on AUB
• Empirical results on a Linux cluster
  – Validation of schedulability analysis
  – Run-time overhead

On-going
• Developing deferrable server mechanisms in TAO’s federated event channel
• Validating schedulability analyses via empirical studies on TAO

Page 25: Technical Accomplishments/Progress
Towards Certification: Deferrable Server

Overview

• Incorporate aperiodic tasks in periodic scheduling
  – Server: a periodic task responsible for processing aperiodic requests.
  – Budget: maximum time the server can run in a period
• Algorithm
  – Server is suspended when its budget runs out
  – Budget is replenished in the beginning of each period
• Bound aperiodic tasks’ impact on periodic tasks

Implementation

• Challenge: Implement bandwidth preserving servers on top of priority-based operating systems.
• Solution
  – Server thread processes aperiodic events (2nd highest priority)
  – Budget thread manages the budget and controls the execution of server threads (highest priority)
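The budget rule can be seen in a small discrete-time model. The sketch below is not the TAO implementation: it has no threads or priorities, it assumes the server is never preempted while it has budget and pending work, and it uses one tick as the unit of both time and demand. The function and parameter names are hypothetical.

```python
def run_deferrable_server(period, budget, arrivals, horizon):
    """Simulate a deferrable server in unit time ticks.

    arrivals: {tick: units of aperiodic work arriving at that tick}.
    Returns the list of ticks in which the server executed.
    """
    remaining = 0      # budget left in the current period
    backlog = 0        # aperiodic work waiting to be served
    executed = []
    for t in range(horizon):
        if t % period == 0:
            remaining = budget           # replenished at the start of each period
        backlog += arrivals.get(t, 0)
        if backlog > 0 and remaining > 0:
            backlog -= 1
            remaining -= 1
            executed.append(t)
        # Otherwise the server is idle (no work) or suspended (no budget).
        # Unused budget is kept until the end of the period, which is what
        # distinguishes a deferrable server from a polling server.
    return executed


# Period 5, budget 2; three units of aperiodic work arrive at tick 1.
ticks = run_deferrable_server(period=5, budget=2, arrivals={1: 3}, horizon=10)
```

Here the server runs in ticks 1 and 2, is suspended for ticks 3 and 4 once the budget is exhausted, and finishes the remaining unit at tick 5 after replenishment. Because the server never runs more than `budget` ticks per period, its interference with periodic tasks is bounded.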

Page 26: Technical Accomplishments/Progress
Towards Certification: Deferrable Server Validation

4 processors; 4 aperiodic tasks + 8 periodic tasks

• Correctness: No schedulable task sets had deadline misses.
• Pessimism: Some of the unschedulable task sets also met deadlines.

Page 27: Technical Accomplishments/Progress
Towards Certification: Deferrable Server Overhead

• Budget manager: < 89us per server period
• Server thread: < 159us per aperiodic subtask

Page 28

Technical Accomplishments/Progress
Towards Certification: Admission Control (AC)

• Central admission controller for end-to-end tasks.
• Admission test
  – If the system remains within the feasible region
    • admit the new task into the system
    • increase the synthetic utilization
  – Decrement synthetic utilization
    • at the deadlines of aperiodic tasks
    • [resetting rule] when the CPU idles

AC Policies
• Soft Tasks
  – Send an event to notify the central admission controller
  – Hold the task in a waiting queue and wait for the reply
• Hard Tasks
  – Release immediately, then notify AC
  – AC may eject soft periodic tasks when it receives the notification
• Aperiodic Tasks
  – Admission test for every job
  – When the CPU idles, the idle thread reports the departed aperiodic tasks to AC
• Periodic Tasks
  – Admit once and maintain the reservation for the task
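The admission test and the decrement rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give the feasible region, so a fixed per-processor `bound` stands in for it, and the class name, method names, and the `{cpu: share}` demand format are all hypothetical.

```python
class AdmissionController:
    """Toy central admission controller driven by synthetic utilization."""

    def __init__(self, num_processors, bound=0.5):
        # `bound` is an assumed placeholder for the feasible-region test.
        self.bound = bound
        self.util = [0.0] * num_processors   # synthetic utilization per CPU
        self.jobs = {}                       # job id -> {cpu: share}

    def try_admit(self, job_id, demand):
        """demand maps a CPU index to (execution time / deadline) on that CPU."""
        candidate = list(self.util)
        for cpu, share in demand.items():
            candidate[cpu] += share
        if any(u > self.bound + 1e-9 for u in candidate):
            return False                     # outside the region: reject
        self.util = candidate                # admit: increase utilization
        self.jobs[job_id] = dict(demand)
        return True

    def _release(self, job_id, cpus=None):
        demand = self.jobs.get(job_id)
        if demand is None:
            return
        for cpu in list(demand):
            if cpus is None or cpu in cpus:
                self.util[cpu] = max(0.0, self.util[cpu] - demand.pop(cpu))
        if not demand:
            del self.jobs[job_id]

    def on_deadline(self, job_id):
        """Decrement synthetic utilization at an aperiodic task's deadline."""
        self._release(job_id)

    def on_idle(self, cpu, departed_job_ids):
        """Resetting rule: an idle CPU drops the shares of departed jobs."""
        for job_id in departed_job_ids:
            self._release(job_id, cpus={cpu})
```

In this sketch a rejected request leaves the controller state untouched, and the resetting rule frees capacity earlier than waiting for the deadline would, which is consistent with the later slide reporting that resetting increased the number of admitted tasks.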

Page 29

Technical Accomplishments/Progress
Towards Certification: AC Latency for Soft Tasks

Round-trip latency for admitting a soft task

Hard tasks are admitted immediately

Page 30

Technical Accomplishments/Progress
Towards Certification: AC – Admission Ratio

• Online admission control significantly outperformed offline analysis.
  – All task sets are unschedulable under offline analysis.
  – Resetting significantly increased the number of admitted tasks.

3 processors + 1 AC processor
4 soft aperiodic tasks and 5 soft periodic tasks

Page 31

RACE Workshop and Demonstration

Page 32

RACE Demo and Workshop
Tool Chain: Demonstration Highlights

The RACE demonstration is composed of three scenarios. These scenarios involve RACE (control and allocation), DAnCE, PICML, CUTS, CoWorkEr and the BMW elements.

Scenario 1 - Demonstrates RACE Control by reacting to deadline misses in a critical path modeled into the RT1H operation string. The critical path exceeds its EED threshold due to the introduction of a competing operation string that consumes excessive CPU.

Scenario 2 - Demonstrates the ability of the tool chain to handle Shared Components. Two operation strings are deployed with shared components between them. After deployment, one string is torn down to show that the other (involving the shared component) is still operational.

Scenario 3 - Demonstrates FT extensions to PICML to capture fault-tolerance requirements. The concepts of SRG and FOU are shown, and an integrated interpreter is used to run an offline constraint-based algorithm for replica placement.

[Diagram labels: PICML, Flat Deployment Plan, Flat Deployment Plan (modified), RACE, DAnCE, Hierarchical Plan]

Many of the initial capabilities being shown will support GT-4, or are extendable to do so.

Page 33

RACE Demo and Workshop
Tool Chain: New Capability Highlights

The RACE demonstration highlights many of the new capabilities developed for the RACE framework and related tool chain, many of which are intended to support GT-4.

* Importance Attr. (supports GT-4)
* Static and Dynamic Plans (supports GT-4)
* Component Dynamic Placeability Attr. (supports GT-4)
* Shared Components (supports GT-4)
* Hierarchical Descriptors (supports GT-4)
* PICML Modifications (supports GT-4)
  o FT Elements
  o Shared Components
  o QoS Attributes
* DAnCE Modifications (supports GT-4)
  o ReDAC
  o Priority Control
  o Component-Process Mapping
  o Shared Component Support
* Web and Interactive Input Adapters
* RACE Control (supports GT-4)
  o EED Monitoring
  o Reactive control of OS priority based on importance
* WLG-2 Capabilities
  o Code Generation
  o BMW Integration
  o BDC Integration
* Ensemble Planner
* Target Manager
* Fault Model Elements
  o Failover Unit
  o Replication Group
  o CCM IOGRs
  o Shared Risk Group
  o Constraint-Based Allocation
    # metrics (e.g. distance, co-failure)
    # integrated in interpreter - offline analysis
    # motivates constraint-based allocation

Page 34

RACE Demo and Workshop
Tool Chain: Future Capabilities

The RACE framework and tool chain will require additional capabilities to support the current GT-4 CONOPS.

* RACE Follow-on work
  o Plan State (supports GT-4)
    + ReDAC Integration
    + (Re)plan on Importance
    + Include FT simplex deployments
    + Integration of Node Alive Solution
  o Events on plan progress and status (supports GT-4)
  o Warfighter Value/Importance Constraints on Placement (supports GT-4)
  o Submission of Multiple Plans Simultaneously (supports GT-4)
* Multi-D Planner
  o Multiple Heuristics: FFD, WFD, BFD, Efficient Subset Sums
  o Modeled 3 dimensions: CPU Utilization, Memory, Network Bandwidth

Algorithms available and initial development done.
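As an illustration of one heuristic named for the Multi-D Planner, the following is a minimal first-fit-decreasing (FFD) sketch over the three modeled dimensions. It is not the RACE planner: the function name, the tuple layout `(cpu, memory, bandwidth)`, and the choice to order components by their largest single demand are assumptions made for this example, and demands are assumed to be expressed in the same units as node capacities.

```python
def first_fit_decreasing(components, nodes):
    """Place components on nodes with a 3-dimensional FFD heuristic.

    components: {name: (cpu, memory, bandwidth)} demands
    nodes:      {name: (cpu, memory, bandwidth)} capacities
    Returns {component: node}, or None if some component fits nowhere.
    """
    remaining = {node: list(cap) for node, cap in nodes.items()}
    placement = {}
    # "Decreasing": consider the most demanding components first.
    # Ordering by the largest single dimension is an assumed convention.
    order = sorted(components, key=lambda c: max(components[c]), reverse=True)
    for comp in order:
        demand = components[comp]
        for node, free in remaining.items():
            # "First fit": take the first node with room in every dimension.
            if all(d <= f + 1e-9 for d, f in zip(demand, free)):
                for i, d in enumerate(demand):
                    free[i] -= d
                placement[comp] = node
                break
        else:
            return None   # no node can host this component
    return placement
```

Swapping the inner loop's node order (for example, trying the emptiest or fullest node first) would give worst-fit or best-fit variants of the same skeleton.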

Page 35

Controller Research
RACE Control Research

Page 36

Technical Accomplishments/Progress
RACE Control: Flexible Maximum Urgency First (FMUF)

• Task model
  • Soft, end-to-end deadlines
  • Two types of tasks: critical tasks and non-critical tasks
• Goals
  • Performance isolation: protect critical tasks against disturbance from non-critical ones
  • Minimize deadline misses: improve overall performance
• Handle uncertainties and dynamics
  • Task arrival/departure
  • Fluctuation in execution times
• Practical, application-transparent adaptation
  • Actuator: Priority adjustment
  • Sensor: CPU utilization, deadline miss
  • Planned for future RACE implementation

Page 37

Technical Accomplishments/Progress
RACE Control: The MUF Approach

• Two priority classes
  • Each class is scheduled by a real-time policy (RMS, EDF)
  • Critical tasks → high-priority class
• Feedback control
  • Dynamically change the priority class of non-critical tasks based on deadline misses in the high-priority class
    • No miss: Non-critical tasks → high-priority class
    • Miss: Non-critical tasks → low-priority class
  • Avoid oscillation based on measured CPU utilization
  • Maximize #tasks in the high-priority class without causing deadline misses in that class
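One step of the feedback loop described above might look like the sketch below. It is a simplified reading of the slide rather than the FMUF algorithm itself: moving one task per step, the `util_threshold` guard used to damp oscillation, and all names are assumptions for illustration. Critical tasks are not listed because they always remain in the high-priority class.

```python
def adjust_classes(high, low, missed_deadline, utilization, util_threshold=0.7):
    """Run one control step over the NON-critical tasks.

    high, low:        non-critical task names in each priority class
    missed_deadline:  True if the high-priority class missed a deadline
    utilization:      measured CPU utilization in [0, 1]
    util_threshold:   assumed headroom limit that guards promotions
    Returns the new (high, low) lists; the inputs are not modified.
    """
    high, low = list(high), list(low)
    if missed_deadline:
        if high:
            low.append(high.pop())     # miss: demote a non-critical task
    elif low and utilization < util_threshold:
        high.append(low.pop(0))        # no miss and headroom: promote one
    return high, low
```

Promoting only when measured utilization is below the threshold is one way to keep a task from bouncing between classes on every sample; the slide states the goal (avoid oscillation) but not the mechanism.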

Page 38

Phase III Future Work

Page 39

Phase III Ideas
Possible Phase III Research Areas

• Schedulability Analysis (integrated with the allocation/placement problem)
• Multi-Dimensional Allocation
• Constraint-Based Allocation/Placement
• Certifiability of these approaches
  – Including a framework for testing and researching new algorithms
  – Verifying allocations meet certification constraints (e.g. differ from a static plan in a specified manner or according to specified rules)
• Offline and Online capability for this analysis and planning
  – Offline tool-suite with integrated algorithm support for generation of static deployment plans and research into new algorithms.
  – Pluggable algorithms usable "as-is" both online and offline; i.e. the same components run online (within RACE) and offline (within the tool-suite).

Page 40

Page 41

Backup Slides
Main Presentation Support Slides

Page 42

Technical Accomplishments/Progress
Phase II - Gate Test 1A: Test Scenario

• TSCE is operating normally – as configured by MLRM
• A fault occurs and is detected by MLRM
• An artificial error causes MLRM dynamic allocation to fail
• MLRM attempts dynamic re-allocation
• MLRM deploys a feasible static allocation

[Diagram labels: Pool-1.A and Pool-1.B; nodes Javelin, Chaparal, Mako, Champion, Checkmate, Hogfish; components smm.1, eff.1, eff.7, eff.8, eff.12, eff.13, ed.1, ed.2, plan.1, plan.3, plan.4, cfgop.1 (shared); MLRM events: Fault Detected, Dynamic Allocation (marked X); legend: Primary WLG, Redeployed WLG]

Page 43

Technical Accomplishments/Progress
Phase II - Gate Test 1B: Test Scenario

• TSCE is operating normally – as configured by MLRM
• Operator elects to fall back to a feasible static allocation
• MLRM deploys a feasible static allocation
• MLRM tears down existing dynamically allocated strings

[Diagram labels: Pool-1.A (Mako, Champion, Checkmate) and Pool-1.B (Javelin, Hogfish, Chaparal); components smm.1, eff.1, eff.7, eff.8, eff.12, ed.1, ed.2, plan.3, plan.4, plan.12, cfgop.1 (shared); MLRM event: Static Allocation Request; legend: Primary WLG, Redeployed WLG]

Page 44

Technical Accomplishments/Progress
Certification Model: Constrained Perturbation

[Diagram labels: Template (analogous to a static plan), the traditionally certifiable class of plan; Dynamic Plans (DRM generated plans); constraints, parameters; transformation domain from plan class; class transformation relation-pair with inverse verification (φR, φR-1 and ΨR, ΨR-); φ; legal dynamic domain for class of plan; certifiably constraint-isomorphic; feasibly schedulable; feasibly allocatable; Φ, П; isomorphic transformation; Γ boolean; certification metrics; DRM certification gauntlet]

Page 45

RACE Demonstration and Workshop

Lockheed Martin Advanced Technology Laboratories
and
Vanderbilt University

DARPA:ARMS

Page 46

RACE Demo and Workshop
Tool Chain Demonstration

The RACE demonstration is composed of three scenarios. These scenarios involve RACE (control and allocation), DAnCE, PICML, CUTS, CoWorkEr and the BMW elements.

Scenario 1 - Demonstrates RACE Control by reacting to deadline misses in a critical path modeled into the RT1H operation string. The critical path exceeds its EED threshold due to the introduction of a competing operation string that consumes excessive CPU.

Scenario 2 - Demonstrates the ability of the tool chain to handle Shared Components. Two operation strings are deployed with shared components between them. After deployment, one string is torn down to show that the other (involving the shared component) is still operational.

Scenario 3 - Demonstrates FT extensions to PICML to capture fault-tolerance requirements. The concepts of SRG and FOU are shown, and an integrated interpreter is used to run an offline constraint-based algorithm for replica placement.

[Diagram labels: PICML, Flat Deployment Plan, Flat Deployment Plan (modified), RACE, DAnCE, Hierarchical Plan]

Many of the initial capabilities being shown will support GT-4, or are extendable to do so.

Page 47

RACE Demo and Workshop
Physical Demo Setup: ISIS Lab

ISIS Lab: wiki.isis.vanderbilt.edu/support/isislab.htm

[Diagram labels: RACE, DAnCE, PICML, ISIS Lab, Internet, Local Demo Laptop, RACE Demo GUI; RACE Controller; Target Manager; Resource Monitors reporting Resource Utilization (system and per application); CUTS BDC; Op-string 1, 2, ... n QoS Monitors reporting QoS Information (Op-string QoS); RACE Control Agent; CKRM Control Agent; FCS Control Agent; CPU Broker Control Agent; Op-string 1 Control Agent; OS Priority Agent]

Page 48

RACE Control Critical Path
End-to-End Deadline Monitoring and Reactive Control

Page 49

RACE Demo and Workshop
RACE Controller: RACE Components

The RACE Controller receives plans from the RACE allocation planners.
Key Elements: Target Manager, RACE Controller, CUTS BDC, and DAnCE

[Diagram labels: Hierarchical Packages & Deployment Plans; RACE Controller; Target Manager; Resource Monitors reporting Resource Utilization (system and per application); CUTS BDC; Op-string 1, 2, ... n QoS Monitors reporting QoS Information (Op-string QoS); RACE Control Agent; CKRM Control Agent; FCS Control Agent; CPU Broker Control Agent; Op-string 1 Control Agent; OS Priority Agent]

Page 50

RACE Demo and Workshop
RACE Controller: Scenario One

First Demo Scenario:
1. Deploy the RT1H Operational String, which has an EED requirement specified. View the deployment after RACE processing.
2. Monitor EED
3. Deploy the competing (CPU hog) Hog_String. View the deployment after RACE processing.
4. Monitor EED Miss
5. Observe RACE Reactive Control

All deployments occur through DAnCE.

[Screenshots: Deployment After RACE Processing; RACE Demo GUI]

Page 51

Shared CCM Components
RACE and DAnCE handle deployment of shared WLGs

Page 52

RACE Demo and Workshop
Shared Components: Scenario Two

Second Demo Scenario:
1. Deploy the RT1H_Shared_A Operational String, which has shared components. View the dynamically planned deployment after RACE processing.
2. Deploy the RT1H_Shared_B Operational String, which shares components with RT1H_Shared_A. RACE uses a planner to dynamically place the string. View the deployment after RACE processing.
3. Tear down an Op-String and observe that the remaining string stays operational.

All deployments occur through DAnCE.

[Screenshots: Deployment After RACE Processing; RACE Demo GUI]

Page 53

CCM Fault Tolerance
Modeling Concepts and Demonstration

Page 54

RACE Demo and Workshop
Introduction: FT Modeling Concepts and Demonstration

Shared Risk Groups (SRG)
SRGs are an FT modeling element added to PICML that allows a modeler to capture associations related to risk. The interpreter then uses this risk association to constrain replica placement decisions, attempting to minimize the risk of a failure affecting both the primary and its replica(s).

Failover Units (FOU)
FOUs are used to model FT requirements on a component or string. The FOU specifies the number of replicas (among other things) and is used by the interpreter to inject replica components into the deployment and establish the correct connections.

Constraint-Based Node Assignment
• Offline analysis and planning
• Metrics
  • Composite Distance
    • Distance to primary
    • Comparing two placements
    • Penalties
      • Uniformity
      • Replica Pair-wise Distance (future)
  • Co-Failure Probability (another formulation)

FT Interpreter
• Injection
  • Components
  • Connections - CCM IOGR
• Placement

Page 55

RACE Demo and Workshop
FT Modeling Demonstration: Scenario Three

Example of an offline constraint placement approach within the interpreter

Third Demo Scenario:
1. Model Components and Strings in PICML
2. Create Deployment Plan
3. Model FOU
4. Model SRG
5. Interpreter automatically injects replicas and associated CCM IOGRs
6. Replicas placed according to distance-based constraint algorithm using SRG information.

[Diagram labels: GME/PICML model; Model Information (Domain, Deployment, SRG, and FOU); FT Interpreter (injection, Replica Placement Algorithm); Deployment Plans; Plan Viewer]

Page 56

RACE Workshop Slides
Panel Support Material

Page 57

RACE Demonstration and Workshop
Model Driven DRM: Tool Suite

[Diagram labels: PICML, Flat Deployment Plan, Flat Deployment Plan (modified), RACE, DAnCE, Hierarchical Plan]

RACE is an extensible CCM framework that integrates multiple resource management algorithms for dynamically (re)deploying and (re)configuring application components.

RACE decouples resource allocation and system adaptation logic from the underlying middleware deployment, configuration, and control mechanisms.

Page 58

RACE Demonstration and Workshop
RACE: A Primer

• Pluggable Input Adapters are responsible for translating input provided to RACE into IDL data structures.

• The Plan Analyzer is responsible for examining metadata in the plan and selecting pluggable planners to be run on the plan.

• The Plan Manager executes the planners selected by the Plan Analyzer.

• Pluggable Output Adapters are responsible for translating the provisioned deployment plans into a native format for deployment.

• The Controller is responsible for reacting to events presented by the Monitors and actuating any required changes to the configuration and deployment through deployed agents.
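The planning path through these elements (input adapter, Plan Analyzer, Plan Manager, output adapter) can be pictured as a small plug-in pipeline. The sketch below is a language-neutral illustration in Python, not RACE code: RACE itself is a CCM framework with IDL data structures, and every function name, the dictionary-based plan, and the `"planners"` metadata key here are hypothetical.

```python
def parse_text(raw):
    """Hypothetical input adapter: 'c1,c2|n1,n2' -> plan dictionary."""
    components, nodes = raw.split("|")
    return {
        "components": components.split(","),
        "nodes": nodes.split(","),
        "planners": ["allocate"],      # metadata the analyzer inspects
    }


def simple_allocator(plan):
    """Hypothetical planner: put every component on the first node."""
    placement = {c: plan["nodes"][0] for c in plan["components"]}
    return {**plan, "placement": placement}


def select_planners(plan, registry):
    """Plan Analyzer role: choose planners named in the plan metadata."""
    return [registry[n] for n in plan.get("planners", []) if n in registry]


def run_planners(plan, planners):
    """Plan Manager role: execute the selected planners in order."""
    for planner in planners:
        plan = planner(plan)
    return plan


def to_pairs(plan):
    """Hypothetical output adapter: emit (component, node) pairs."""
    return sorted(plan["placement"].items())


def run_pipeline(raw, input_adapter, registry, output_adapter):
    plan = input_adapter(raw)
    plan = run_planners(plan, select_planners(plan, registry))
    return output_adapter(plan)
```

Because planners are looked up by name in a registry, a new allocation or schedulability algorithm can be added without touching the adapters, which is the decoupling the primer describes. The Controller's monitoring and actuation path is not modeled here.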

Page 59

RACE Demonstration and Workshop
RACE: A Primer (continued)

The RACE infrastructure and tool chain provides…

A model-driven process that allows a complete description of information required for managing, deploying, and configuring RTE applications.

A Platform-Independent Component Modeling Language (PICML) is used to capture all pertinent model elements (e.g. AIM and DIM).

Interpreters capture the information in an OMG-compliant DnC deployment specification.

Output from the model drives a flexible and extensible CCM-based Resource Allocation and Control Engine (RACE).

RACE analyzes and constructs deployment plans (deployable through DAnCE, for example) based on a plug-in framework where planning steps such as allocation and schedulability analysis contribute to a final configuration.

RACE monitors and adjusts deployments based on prevailing conditions within its domain of control.

[Diagram labels: PICML model, RACE, CIAO/DAnCE]

Page 60

RACE Demonstration and Workshop
Tool Chain: New Capability Highlights

The RACE demonstration highlights many of the new capabilities developed for the RACE framework and related tool chain, many of which are intended to support GT-4.

* Importance Attr. (supports GT-4)
* Static and Dynamic Plans (supports GT-4)
* Component Dynamic Placeability Attr. (supports GT-4)
* Shared Components (supports GT-4)
* Hierarchical Descriptors (supports GT-4)
* PICML Modifications (supports GT-4)
  o FT Elements
  o Shared Components
  o QoS Attributes
* DAnCE Modifications (supports GT-4)
  o ReDAC
  o Priority Control
  o Component-Process Mapping
  o Shared Component Support
* Web and Interactive Input Adapters
* RACE Control (supports GT-4)
  o EED Monitoring
  o Reactive control of OS priority based on importance
* WLG-2 Capabilities
  o Code Generation
  o BMW Integration
  o BDC Integration
* Ensemble Planner
* Target Manager
* Fault Model Elements
  o Failover Unit
  o Replication Group
  o CCM IOGRs
  o Shared Risk Group
  o Constraint-Based Allocation
    # metrics (e.g. distance, co-failure)
    # integrated in interpreter - offline analysis
    # motivates constraint-based allocation

Page 61: Use or disclosure of this data outside the ARMS Program or Government is restricted without the express written permission of the Lockheed Martin Company

61ARMS Phase II PI Meeting April 11-13, 2006Use or disclosure of this data outside the ARMS Program or Government is restricted without the express written permission of the Lockheed Martin Company.

RACE Demonstration and WorkshopTool Chain: Future CapabilitiesRACE Demonstration and WorkshopTool Chain: Future Capabilities

The RACE framework and tool chain will require additional capabilities to support the current GT-4GT-4 CONOPS.

* RACE Follow-on work
  o Plan State (supports GT-4)
    + ReDAC Integration
    + (Re)plan on Importance
    + Include FT simplex deployments
    + Integration of Node Alive Solution
  o Events on plan progress and status (supports GT-4)
  o Warfighter Value/Importance Constraints on Placement (supports GT-4)
  o Submission of Multiple Plans Simultaneously (supports GT-4)
* Multi-D Planner
  o Multiple Heuristics: FFD, WFD, BFD, Efficient Subset Sums
  o Modeled 3 dimensions: CPU Utilization, Memory, Network Bandwidth
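
To illustrate the kind of heuristic the Multi-D Planner bullet names, here is a minimal sketch of first-fit decreasing (FFD) over the three modeled dimensions. The function name, the data layout, the normalized node capacity of 1.0 per dimension, and the choice to sort components by their largest single-dimension demand are all assumptions made for illustration; the slides do not describe the planner's actual implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical resource vector: (cpu_utilization, memory, network_bandwidth),
# each expressed as a fraction of one node's capacity.
Demand = Tuple[float, float, float]


def first_fit_decreasing(
    components: Dict[str, Demand],
    node_count: int,
    capacity: Demand = (1.0, 1.0, 1.0),
) -> Optional[Dict[str, int]]:
    """Place components on nodes with FFD; return None if one does not fit."""
    # Remaining capacity per node, one entry per dimension.
    free: List[List[float]] = [list(capacity) for _ in range(node_count)]
    placement: Dict[str, int] = {}

    # "Decreasing": handle the most demanding components first. Ordering by
    # the largest single dimension is an assumption, not the source's rule.
    ordered = sorted(components.items(), key=lambda kv: (-max(kv[1]), kv[0]))

    for name, demand in ordered:
        for node, remaining in enumerate(free):
            # "First fit": take the first node with room in every dimension.
            if all(d <= r + 1e-9 for d, r in zip(demand, remaining)):
                for i, d in enumerate(demand):
                    remaining[i] -= d
                placement[name] = node
                break
        else:
            return None  # no node can host this component
    return placement


# Made-up demands for four components on two identical nodes.
demo = {
    "A": (0.6, 0.2, 0.1),
    "B": (0.5, 0.5, 0.2),
    "C": (0.3, 0.4, 0.7),
    "D": (0.2, 0.1, 0.1),
}
result = first_fit_decreasing(demo, node_count=2)
```

Worst-fit and best-fit variants (WFD, BFD) would differ only in which feasible node is selected. With more than one dimension, that selection needs some scalar measure of remaining capacity, which is a further design choice not specified here.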


RACE Demo and Workshop
Shared Risk Group (SRG): Example

[Figure: hierarchy of shared risk groups. Labels shown: Ship_SRG; DataCenter1_SRG, DataCenter2_SRG; Rack1_SRG, Rack2_SRG; Shelf1_SRG (appears twice), Shelf2_SRG; Node1 (blade31), Node2 (blade32); Blade29, Blade30, Blade33, Blade34, Blade36.]


RACE Demo and Workshop
Shared Risk Group (SRG): Example (continued)

Choose a feasible replica placement based on Composite Distance constraints.

[Figure: SRG hierarchy (Ship_SRG; DataCenter1_SRG, DataCenter2_SRG; Rack1_SRG, Rack2_SRG; Shelf1_SRG, Shelf2_SRG; Node1 (blade31), Node2 (blade32); Blade29, Blade30, Blade33, Blade34, Blade36) annotated with Primary (P), Replica1 (R1), Replica2 (R2), Replica3 (R3), and the Composite Distance.]


RACE Demo and Workshop
Shared Risk Group (SRG): Distance Metric Calculation

Choose a feasible replica placement based on Composite Distance constraints.

[Figure: Primary (P) and replicas R1, R2, R3.]

Formulation of Replica Distance from Primary

Define N orthogonal vectors, one for each of the distance values computed for the N components (with respect to a primary), and vector-sum these to obtain a resultant. Compute the magnitude of the resultant as a representation of the composite distance captured by the placement.

1. Compute the distance from each of the replicas to the primary for a placement.
2. Record each distance as a vector, where all vectors are orthogonal.
3. Add the vectors to obtain a resultant.
4. Compute the magnitude of the resultant.
5. Use the magnitude of the resultant in all comparisons (either among placements or against a threshold).
6. Apply a penalty function to the composite distance (e.g. pair-wise replica distance or uniformity).
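
The steps above can be sketched in Python as follows. Because the per-replica distances lie on mutually orthogonal axes, the magnitude of their vector sum reduces to the square root of the sum of the squared distances. Everything beyond that is an assumption made for illustration: the function names, the sample distance values, the uniformity penalty (the spread between the farthest and nearest replica), and the convention that a larger composite distance is preferable and that the threshold is a minimum. The slides do not specify how the individual distances or the penalty are computed.

```python
import math
from typing import Dict, List, Sequence


def composite_distance(distances: Sequence[float]) -> float:
    """Magnitude of the resultant of N orthogonal distance vectors."""
    # Each distance lies on its own axis, so the resultant's components are
    # the distances themselves and its magnitude is the Euclidean norm.
    return math.sqrt(sum(d * d for d in distances))


def penalized_distance(distances: Sequence[float], weight: float = 1.0) -> float:
    """Composite distance minus a hypothetical uniformity penalty."""
    # Assumed penalty: spread between the farthest and nearest replica.
    spread = max(distances) - min(distances)
    return composite_distance(distances) - weight * spread


def best_placement(candidates: Dict[str, List[float]], threshold: float) -> str:
    """Pick the feasible candidate with the largest penalized distance."""
    # Assumed feasibility rule: composite distance must reach the threshold.
    feasible = {
        name: penalized_distance(d)
        for name, d in candidates.items()
        if composite_distance(d) >= threshold
    }
    if not feasible:
        raise ValueError("no placement meets the distance threshold")
    return max(feasible, key=feasible.get)


# Made-up replica-to-primary distances for R1, R2, R3 in two placements.
candidates = {"placement_x": [3.0, 4.0, 0.0], "placement_y": [3.0, 3.0, 3.0]}
chosen = best_placement(candidates, threshold=4.0)
```

In this made-up case both placements meet the threshold, but the uneven one is penalized for leaving a replica at distance zero from the primary, so the uniform placement is selected.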


RACE Demo and Workshop
Failover Unit (FOU): Component FOU Example

[Figure: six container/component servers, each with an FPC and HB, hosting components A, B, C, A’, B’, C’. Other labels: “client”, IOGR (several), primary IOR, secondary IOR, periodic FPC heartbeat.]


RACE Demo and Workshop
Failover Unit (FOU): OpString FOU Example

[Figure: a primary string (A, B, C) and a replica string (A’, B’, C’), each component in its own container/component server. Other labels: “client”, IOGR, primary IOR, secondary IOR, FPC (two), HB (several), periodic FPC heartbeat, intra-FOU heartbeat.]
