Energy Prediction for I/O Intensive Workflow Applications
MASc Exam
Hao Yang
NetSysLab
The Electrical and Computer Engineering Department
The University of British Columbia
Background - Workflow Applications
[Figure: Montage Workflow; legend: Computation, File Dependency]
Characteristics:
• File based communication
• Large number of tasks
• Large amount of I/O
• Common data access patterns
Background - Application Execution
[Figure: Workflow Runtime Engine; App. tasks, each with Local storage; Central Storage System (e.g., GPFS, NFS). Annotations: File based communication, Large I/O volume, I/O Bottleneck.]
Background - Intermediate Storage System
[Figure: Workflow Runtime Engine; Compute Nodes (App. tasks with Local storage, Intermediate Storage); Central Storage System (e.g., GPFS, NFS). Annotations: Stage In, Stage Out.]
Background - Context of this thesis
This work focuses on workflow application execution on intermediate storage systems.
Research Problem – Energy Consumption
• The pursuit of performance used to dominate conventional computing.
• Energy efficiency is the new concern.
Computing Equipment Energy Bill
Research Problem - Configuration Decisions
Montage Workload Energy Delay Product (EDP)
Configuring the runtime system is complex (Example: resource allocation decision)
Research Problem - Questions
• Q1: What performance optimizations in storage systems lead to energy savings?
• Q2: What is the performance and energy impact of power-centric tuning techniques?
• Q3: How can users balance time-to-solution and energy consumption when given a target application?
Methodology – Building Energy Consumption Predictor
The goal of this work is to build an energy consumption predictor to aid system configuration and provisioning decisions.
• Answer what-if questions (e.g., is configuration A better than configuration B from an energy perspective?)
• Customize the optimization metric (e.g., energy consumption, performance-energy product)
Methodology – Energy Model
[Figure: Compute Nodes (App. tasks labeled A, B, C, D, each with Local storage; Intermediate Storage) and the Workflow Runtime Engine. Panel title: Power Profiles.]
Execution States:
• Idle
• Network Transfer
• Storage I/O
• Task Processing
Methodology – Energy Model
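Execution States: Idle, Network Transfer, I/O ops (read, write), Task Processing
$$E = \sum_{s \,\in\, \text{Execution States}} P_s \times T_s$$
where $P_s$ is the power profile of state $s$ and $T_s$ is the predicted time spent in that state.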
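The energy model on this slide (power profile times predicted time, accumulated over the execution states) can be illustrated with a minimal sketch. The state names, wattages, and durations below are hypothetical placeholders chosen for the example, not values from the thesis, and `predict_energy_joules` is an illustrative helper rather than the predictor's actual interface.

```python
# Hypothetical per-state power draw in watts (placeholders, not measured values).
POWER_PROFILE_W = {
    "idle": 90.0,
    "network_transfer": 120.0,
    "storage_io": 125.0,
    "task_processing": 150.0,
}


def predict_energy_joules(predicted_times_s, power_profile_w=POWER_PROFILE_W):
    """Sum power * predicted time over all execution states (W * s = J)."""
    return sum(
        power_profile_w[state] * seconds
        for state, seconds in predicted_times_s.items()
    )


# Hypothetical predicted time spent in each state on one node, in seconds.
times_s = {
    "idle": 40.0,
    "network_transfer": 10.0,
    "storage_io": 20.0,
    "task_processing": 30.0,
}

energy_j = predict_energy_joules(times_s)
print(f"Predicted energy: {energy_j:.0f} J")
```

Keeping the power profile and the time estimates as separate inputs mirrors the slide: either one can be re-seeded (a new platform, a new configuration) without touching the other.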
Methodology – Energy Model
How to seed the energy model?
• Power states: use synthetic benchmarks to measure the power consumption in each state.
• Time estimates: augment a performance predictor to track the time spent in each state.
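One way the power-state seeding could look, as a rough sketch: run a synthetic benchmark that keeps a node in a single state, collect the power-meter readings for that run, and average them. The sample values, state names, and helper functions below are hypothetical; the sketch only assumes evenly spaced readings (the evaluation platform reports a 1Hz sampling rate).

```python
def mean_power_w(samples_w):
    """Average power over one benchmark run from evenly spaced meter samples."""
    if not samples_w:
        raise ValueError("need at least one power sample")
    return sum(samples_w) / len(samples_w)


def measured_energy_j(samples_w, sample_period_s=1.0):
    """Approximate energy as the sum of samples times the sampling period."""
    return sum(samples_w) * sample_period_s


# Hypothetical 1 Hz readings (watts) recorded while each benchmark ran.
benchmark_runs_w = {
    "idle": [90.0, 92.0, 91.0, 91.0],
    "storage_io": [128.0, 130.0, 129.0, 129.0],
}

# Seed the power profile: one average wattage per execution state.
power_profile_w = {
    state: mean_power_w(samples) for state, samples in benchmark_runs_w.items()
}
print(power_profile_w)
```

The same samples also give the measured energy of a run, which is what a prediction would be compared against.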
Methodology – Building Energy Consumption Predictor
L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of the 28th ACM International Conference on Supercomputing, ICS'14, June 2014 (Acceptance Rate: 20%).
L. B. Costa, S. Al-Kiswany, A. Barros, H. Yang, and M. Ripeanu, “Predicting Intermediate Storage Performance for Workflow Applications”, In Proceedings of PDSW'13, 2013.
Sources of inaccuracies:
• Homogeneity, power meter
• Time prediction
• Model simplification (metadata, scheduling, …)
Evaluation Outline
• Synthetic benchmarks: workflow patterns
• Real workflow applications
• Predicting energy impact of power-tuning techniques
• Predicting energy-performance tradeoffs
Evaluation - Platform
Grid5000 Lyon site
• Taurus Cluster (11 nodes): two 2.3GHz Intel Xeon E5-2630 CPUs (each with 6 cores), 32GB memory, 10 Gbps NIC
• Sagittaire Cluster (16 nodes): two 2.4GHz AMD Opteron CPUs (each with one core), 2GB RAM, 1 Gbps NIC
• SME Omegawatt power meter per node: 0.01W power resolution at 1Hz sampling rate
[Figure legend: Idle, App, Storage I/O, Net transfer]
Evaluation – Synthetic benchmarks: Workflow Patterns
Using Default Storage System Configuration (DSS)
• Average 88% accuracy
• 20x-30x faster than running the actual benchmark
• 200x-300x fewer resources (machines * runtime)
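The resource comparison on this slide uses machines * runtime as the cost. A small worked example, with hypothetical machine counts and runtimes picked only to fall inside the reported ranges, shows how the speedup and the resource ratio relate:

```python
def resource_cost(machines, runtime_s):
    """Resource cost as defined on the slide: machines * runtime."""
    return machines * runtime_s


# Hypothetical numbers: the real benchmark on 10 machines for 500 s,
# versus the predictor on a single machine for 20 s.
actual_machines, actual_runtime_s = 10, 500.0
predictor_machines, predictor_runtime_s = 1, 20.0

speedup = actual_runtime_s / predictor_runtime_s
resource_ratio = (
    resource_cost(actual_machines, actual_runtime_s)
    / resource_cost(predictor_machines, predictor_runtime_s)
)
print(f"{speedup:.0f}x faster, {resource_ratio:.0f}x fewer resources")
```

In this illustration the resource ratio is the speedup multiplied by the number of machines the actual run would have occupied.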
Evaluation – Synthetic benchmarks: Workflow Patterns
S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), Under Review, Submitted in June 2014.
L. B. Costa, H. Yang, E. Vairavanathan, A. Barros, K. Maheshwari, G. Fedak, D. S. Katz, M. Wilde, M. Ripeanu and S. Al-Kiswany, “The Case for Workflow-Aware Storage: An Opportunity Study using MosaStore”, Journal of Grid Computing, 2014.
Pipeline Energy Consumption
DSS – Default Storage System Configuration
WOSS – Workflow Optimized Storage System Configuration
Q1: What are the energy savings that performance optimizations in storage can bring?
• Accurate in both configurations.
• Suggests which configuration to use from an energy perspective.
Evaluation – Real Workflow Applications
BLAST Result (Energy 89%, Time 95% )
Montage Result (Energy 84%, Time 86% )
Evaluation – CPU Throttling
• CPU throttling is an important technique where processors run at less-than-maximum frequency to conserve power.
• This technique lowers instantaneous power but can prolong the execution time.
Q2: What is the energy and performance impact of CPU throttling? Is it application-specific?
CPU bound application: BLAST
I/O bound application: pipeline benchmark
Evaluation – CPU Throttling
BLAST Result (panels: Energy, Time)
Pipeline Result (panels: Energy, Time)
17% savings when using maximum throttling
96% cost when using maximum CPU throttling
Frequency Level: 1200MHz, 1800MHz, 2300MHz
Conclusion:
• The computational and I/O characteristics of an application determine whether CPU throttling yields energy savings or energy costs.
• The predictor can be used to make these decisions.
Evaluation – Predicting Energy Delay Product
User’s optimization metric:
• Performance (use more machines)
• Energy
• Energy-Delay Product (EDP, energy * time)
Approach:
• Consider the allocation decision.
• Use the Montage workload on two clusters to demonstrate prediction.
Q3: How can users balance time-to-solution and energy consumption when given a target application?
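As a sketch of how predictions could feed the allocation decision, the snippet below scores a few candidate node counts under each optimization metric named on this slide and picks the lowest score. The predicted times and energies are hypothetical numbers for illustration, not Montage results, and `best_allocation` is an assumed helper name.

```python
# Each metric maps a prediction to a score; lower is better.
METRICS = {
    "performance": lambda p: p["time_s"],
    "energy": lambda p: p["energy_j"],
    "edp": lambda p: p["energy_j"] * p["time_s"],  # Energy-Delay Product
}

# Hypothetical predictor output, keyed by the number of allocated nodes.
predictions = {
    2: {"time_s": 400.0, "energy_j": 80_000.0},
    4: {"time_s": 220.0, "energy_j": 90_000.0},
    8: {"time_s": 150.0, "energy_j": 140_000.0},
}


def best_allocation(predictions, metric):
    """Return the node count whose prediction minimizes the chosen metric."""
    score = METRICS[metric]
    return min(predictions, key=lambda nodes: score(predictions[nodes]))


for metric in METRICS:
    print(metric, "->", best_allocation(predictions, metric), "nodes")
```

With these made-up numbers the three metrics disagree: more machines win on time, fewer on energy, and EDP lands in between, which is the kind of tradeoff Q3 asks about.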
Conclusion
• This thesis presents an energy consumption predictor in the workflow application domain.
• The proposed energy model and prediction framework achieve adequate accuracy to be useful for the energy-oriented configurations this work targets.
Resulting Publications
Energy Prediction
• H. Yang, L. B. Costa and M. Ripeanu, “Energy Prediction for I/O Intensive Workflows Applications”, submitted to the 7th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS) 2014 (co-located with Supercomputing/SC 2014), under review.
Performance Prediction and Provisioning
• L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration and Provisioning for I/O Intensive Workflows”, In Preparation.
• L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of ICS'14, June 2014 (Acceptance rate: 20%).
• L. B. Costa, S. Al-Kiswany, A. Barros, H. Yang, and M. Ripeanu, “Predicting Intermediate Storage Performance for Workflow Applications”, In Proceedings of PDSW'13, 2013.
Evaluating Storage Systems for Scientific Data in the Cloud
• K. Maheshwari, J. Wozniak, H. Yang, D. S. Katz, M. Ripeanu, V. Zavala, M. Wilde, “Evaluating Storage Systems for Scientific Data in the Cloud”, In Proceedings of the 5th Workshop on Scientific Cloud Computing (ScienceCloud), co-located with ACM HPDC 2014 (Best Paper Award).
A Workflow-Optimized Storage System
• S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “A Software Defined Storage for Scientific Workflow Applications”, In Preparation.
• S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), Under Review, Submitted in June 2014.
• L. B. Costa, H. Yang, E. Vairavanathan, A. Barros, K. Maheshwari, G. Fedak, D. S. Katz, M. Wilde, M. Ripeanu and S. Al-Kiswany, “The Case for Workflow-Aware Storage: An Opportunity Study using MosaStore”, accepted by Journal of Grid Computing, 2014.
• The system model
• Model seeding
• Workload description
System Deployment Configuration:
• Number of storage nodes
• Number of client nodes
• Chunk size
• Replication level
• …
Platform Performance Parameters:
• Manager service time ($\mu_{ma}$)
• Storage service time ($\mu_{sm}$)
• Client service time ($\mu_{cli}$)
• Remote network service time ($\mu_{re\text{-}net}$)
• Local network service time ($\mu_{lo\text{-}net}$)
I/O traces, Task Dependency Graph
L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of the 28th ACM International Conference on Supercomputing, ICS'14, June 2014.
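The predictor inputs listed on this slide could be grouped into two small records, sketched below. The class names, field names, units (bytes, seconds), the positivity check, and all values are assumptions made for this illustration; the slide names the parameters but does not specify how they are represented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeploymentConfig:
    """System deployment configuration: the knobs a what-if question varies."""
    num_storage_nodes: int
    num_client_nodes: int
    chunk_size_bytes: int
    replication_level: int

    def __post_init__(self):
        # Assumed sanity check: every knob must be a positive number.
        for name, value in vars(self).items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


@dataclass(frozen=True)
class PlatformParameters:
    """Platform service times used to seed the model (seconds assumed)."""
    manager_service_time_s: float
    storage_service_time_s: float
    client_service_time_s: float
    remote_network_service_time_s: float
    local_network_service_time_s: float


# Hypothetical baseline and a what-if variant with more storage nodes.
baseline = DeploymentConfig(
    num_storage_nodes=4,
    num_client_nodes=8,
    chunk_size_bytes=1 << 20,
    replication_level=1,
)
what_if = replace(baseline, num_storage_nodes=8)
print(baseline)
print(what_if)
```

Immutable records keep each candidate configuration distinct, so several what-if variants can be derived from one baseline and compared side by side.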
Backup Slides
Limitations:
• Simplification of the model
• Short tasks / small workload
• Not validated using new devices (e.g., SSD)
Combined states
Apply benchmarks in parallel to get the power of a combined state, e.g., run the storage and network benchmarks in parallel.
91.6W, 129.0W, 127.7W
Energy Composition (pipeline benchmark):
• Idle energy: 64%
• App processing: 9.2%
• Storage operations: 15.8%
• Network transfer: 10.6%