Increased Reliability Through Failure Predictive Scheduling with Temperature Sensor Feedback Wesley Emeneker CSE 534 Dr. Sandeep Gupta

Post on 18-Jan-2018

Page 1

Increased Reliability Through Failure Predictive Scheduling with Temperature Sensor Feedback

Wesley Emeneker

CSE 534

Dr. Sandeep Gupta

Page 2

Background

High temperatures reduce computer reliability

Thermal scheduling helps, but it does not consider the failure rates of components

Estimating failure rates allows tasks to be scheduled on the most reliable components

Page 3

Project Goals

Use feedback from sensor networks to predict which components are most reliable

Increase reliability of system as seen by tasks through failure predictive scheduling
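A minimal sketch of the scheduling idea in these goals, assuming a failure probability has already been predicted for each component from sensor feedback. The function name `pick_most_reliable`, the node names, and the probability values are hypothetical and chosen only for illustration; the slides do not specify the scheduler's interface.

```python
def pick_most_reliable(failure_prob_by_node, tasks_needed):
    """Return the `tasks_needed` node names with the lowest predicted
    failure probability (hypothetical helper, not from the slides)."""
    # Sort node names by their predicted failure probability, lowest first.
    ranked = sorted(failure_prob_by_node, key=failure_prob_by_node.get)
    return ranked[:tasks_needed]


# Illustrative (made-up) predictions for three nodes.
predicted = {"node0": 0.020, "node1": 0.005, "node2": 0.011}
print(pick_most_reliable(predicted, 2))  # ['node1', 'node2']
```

On a lightly loaded system there are spare nodes to choose from, so a ranking like this can steer tasks away from the least reliable ones.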

Page 4

Methodology

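MTBF half-life is between 5 and 10 degrees C

MTBF calculation:

$$\mathrm{MTBF} = \mathrm{MTBF}_0 \cdot 2^{-T_c / T_{\mathrm{halflife}}}$$

Temperature floats to max based on an equation modeled after measured values:

[Equation garbled in extraction; surviving fragments: 10, 501, sTT, cc eTT]

Combined failure probability for distributed tasks:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Failure prediction is a random variable:

$$1 - e^{-1 / \mathrm{MTBF}}$$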
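The MTBF model, the failure probability, and the combined failure probability from this slide can be sketched as follows. This is an illustration under stated assumptions, not the author's simulator: it assumes MTBF = MTBF_0 * 2^(-T_c / T_halflife) with T_c read as a temperature rise above a reference point, a failure probability of 1 - e^(-1/MTBF) per simulation time step, and independent failures so that P(A and B) = P(A) * P(B). All function and parameter names are hypothetical.

```python
import math
import random


def mtbf_at_temperature(mtbf0, temp_rise_c, half_life_c):
    """MTBF halves for every `half_life_c` degrees C of temperature rise.
    Assumption: `temp_rise_c` is measured relative to the point where MTBF = mtbf0."""
    return mtbf0 * 2.0 ** (-temp_rise_c / half_life_c)


def failure_probability(mtbf):
    """Probability of failure within one time step: 1 - exp(-1 / MTBF).
    Assumption: MTBF is expressed in simulation time steps."""
    return 1.0 - math.exp(-1.0 / mtbf)


def combined_failure_probability(p_a, p_b):
    """P(A or B) = P(A) + P(B) - P(A and B).
    Assumption: failures are independent, so P(A and B) = p_a * p_b."""
    return p_a + p_b - p_a * p_b


def fails_this_step(mtbf, rng):
    """Draw the failure prediction as a random variable for one time step."""
    return rng.random() < failure_probability(mtbf)


# Example: a 10-degree rise with a 10-degree half-life halves the MTBF.
hot_mtbf = mtbf_at_temperature(1000.0, 10.0, 10.0)
print(hot_mtbf)  # 500.0
print(fails_this_step(hot_mtbf, random.Random(0)))
```

A distributed task that spans two processors fails if either one fails, which is why the combined probability uses the union rather than the product.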

Page 5

Results

Tasks on lightly loaded systems are more reliable with failure predictive scheduling

[Figure: 3 Processors. Simulation Time (y-axis, 0 to 80) for Avg Finish, Avg StdDev, and Avg Max Finish; series: Optimal and Non-optimal]

Page 6

Results

Tasks on heavily loaded systems do not benefit from predictive scheduling

[Figure: 10 Processors. Simulation Time (y-axis, 0 to 350) for Avg Finish, Avg StdDev, and Avg Max Finish; series: Optimal and Non-optimal]

Page 7

Conclusions

Reliability scheduling with respect to thermal management can make a significant difference for lightly loaded systems

Heavily loaded systems do not see a benefit from reliability scheduling