psweep: a lightweight pattern for distributed computational experiments

Post on 02-Jan-2016


PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Christopher Mueller and Andrew Lumsdaine

Open Systems Lab, Indiana University

Introduction

Parameter sweeps are common cluster applications.

Approaches:
  Scripts (sh, perl: ssh, mpi)
  Low-level applications (C++, Fortran: MPI)
  Parameter sweep applications (e.g., Nimrod)

Problems:
  Custom solutions become tangled quickly
  Applications are not available on all platforms

How do we use our clusters?

    Job ID          Username Queue    Jobname    SessID NDS TSK Memory  Time S   Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - ------
    882576.aviss.av silin    iq       SL_DBJ014Q  14636   2   4     -- 200:0 R 109:48
    890917.aviss.av baikgrp  bg       DA_NPJ001V  27673   1   2     -- 168:0 R  83:32
    890932.aviss.av baikgrp  bg       DA_NPJ002V  18006   1   2     -- 168:0 R  87:31
    959929.aviss.av rllord   iq       RL1_NCQ02V  11982   1   2     -- 120:0 R  56:27
    960044.aviss.av shawnli  bg       Hairy2b     13703   1   1     -- 100:0 R  42:52
    960045.aviss.av shawnli  bg       Xxbp1       21294   1   1     -- 100:0 R  42:51
    960046.aviss.av shawnli  bg       Foxa1       15908   1   1     -- 100:0 R  42:49
    960047.aviss.av shawnli  bg       Foxa2       19881   1   1     -- 100:0 R  42:49
    960048.aviss.av shawnli  bg       Foxd3       19073   1   1     -- 100:0 R  42:49
    960050.aviss.av shawnli  bg       Gsc         20886   1   1     -- 100:0 R  42:04
    960215.aviss.av shawnli  bg       Foxa1mamma  18296   1   1     -- 100:0 R  35:23
    960216.aviss.av shawnli  bg       Foxa2mamma  14926   1   1     -- 100:0 R  34:43
    960217.aviss.av shawnli  bg       Foxd3mamma  15016   1   1     -- 100:0 R  34:43
    960218.aviss.av shawnli  bg       Gata4mamma   7421   1   1     -- 100:0 R  33:11
    960220.aviss.av shawnli  bg       Glimammal    7525   1   1     -- 100:0 R  33:11
    960221.aviss.av shawnli  bg       Gscmammal   16626   1   1     -- 100:0 R  33:03
    960222.aviss.av shawnli  bg       Hairy2bmam  16760   1   1     -- 100:0 R  33:03
    960224.aviss.av shawnli  bg       Hoxd1mamma  32101   1   1     -- 100:0 R  33:01
    960225.aviss.av shawnli  bg       Mixermamma  27958   1   1     -- 100:0 R  32:09
    960279.aviss.av dkberry  mdgrape  run13_07m    5570   1   1     -- 36:00 R  17:04
    960283.aviss.av dbaronia iq       batch.sh    23862   3   6     -- 24:00 R  22:41
    960426.aviss.av cwillenb bg       CWOA_005    18980   1   1     -- 100:0 R  04:52
    960428.aviss.av cwillenb bg       CWOA_006a    1941   1   1     -- 100:0 R  04:52
    960429.aviss.av cwillenb bg       CWOA_007       --   1   1     -- 100:0 Q     --
    960430.aviss.av cwillenb bg       CWOA_008       --   1   1     -- 100:0 Q     --
    960431.aviss.av cwillenb bg       CWOA_009       --   1   1     -- 100:0 Q     --
    960432.aviss.av cwillenb bg       CWOA_010       --   1   1     -- 100:0 Q     --
    960433.aviss.av cwillenb bg       CWOA_011       --   1   1     -- 100:0 Q     --
    960434.aviss.av cwillenb bg       CWOA_012       --   1   1     -- 100:0 Q     --
    963115.aviss.av xsong    bg       par.241        --   8  16     -- 24:00 Q     --
    963116.aviss.av xsong    bg       par.242        --   8  16     -- 24:00 Q     --
    963121.aviss.av xsong    bg       par.53.7       --   8  16     -- 02:00 Q     --
    963122.aviss.av xsong    bg       par.53.8       --  16  32     -- 02:00 Q     --
    963133.aviss.av honfan   iq       HF_MJ370    23299   3   6     -- 120:0 R  07:13
    963167.aviss.av whpitcoc iq       WP_C572_L0  30829   1   2     -- 24:00 R  01:11
    963171.aviss.av whpitcoc iq       WP_C572_L0  17995   1   2     -- 24:00 R  01:11
    963186.aviss.av whpitcoc iq       WP_C572_TS   5235   1   2     -- 24:00 R  00:08
    963187.aviss.av whpitcoc iq       WP_C572_TS  25746   1   2     -- 24:00 R  00:09
    963188.aviss.av whpitcoc iq       WP_C572_TS  13846   1   2     -- 24:00 R  00:09
    963189.aviss.av whpitcoc iq       WP_C572_TS  26613   1   2     -- 24:00 R  00:08


Anatomy of a Parameter Sweep

    for i in range(rank, n, size):
        if process: load_image(i)
        elif stats: query_image(i)

        for j in [1, 2, 4, 8]:
            if process: time(i, j)

            for k in ['motion', 'gaussian']:
                if process: process_image(i,j,k)
                elif stats: image_stats(i,j,k)
                else:
                    print 'ssh n%d run %d %d' % (i, j, k)

                if process: clear_process(k)
                elif bgi: clear_temp(k)

        if process: unload_image(i)

Parameters and Enumeration Order

* Resource distribution is handled by the execution environment, e.g., mpirun.

Tasks and Experiments


Artifacts and Errors


User’s View

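[Figure: the user's view of a parameter sweep. Experiments are assembled from tasks, driven by parameters [i, j, k], and run on resources.
  Experiment "process": load_image() / unload_image(), time(), process_image(), clear_process()
  Experiment "stats": query_image(), image_stats()
  Experiment "script gen": print (one line per state: 0, 0.01, 10; 0, 0.01, 12; 0, 0.01, 14; 0, 0.1, 10; …)
  Example parameter values: [0, n], [.01, .1, 1.0], [10, 12, 14]]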

The PSWEEP Pattern

Abstracting the Loops

Parameter. A Parameter is an iterator or container that supplies the values for a variable in the experiment.

Enumerator. The enumerator takes an ordered list of parameters and enumerates all possible combinations of their values in lexicographic order.

State. The state contains the current value of each parameter, in order.

    i = ['house.jpg', 'lena.jpg']
    j = [1, 2, 4, 8]
    k = ['motion', 'gaussian']

    params = [i, j, k]
    e = enumerator(params)

    for state in e:
        process_image(state)
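The enumeration order can be sketched with the Python standard library. This is only an illustration, not the PSWEEP implementation: it assumes an enumerator can be modeled as a generator over `itertools.product`, which varies the last parameter fastest, the same order as the equivalent nested loops.

```python
from itertools import product

def enumerator(params):
    """Yield one state per combination of parameter values.

    A state is a tuple holding the current value of each parameter,
    in the order the parameters were given. The last parameter varies
    fastest, as it would in the innermost of a set of nested loops.
    """
    for state in product(*params):
        yield state

i = ['house.jpg', 'lena.jpg']
j = [1, 2, 4, 8]
k = ['motion', 'gaussian']

states = list(enumerator([i, j, k]))
for state in states[:3]:
    print(state)
print(len(states), 'states in total')
```

The first states are ('house.jpg', 1, 'motion'), ('house.jpg', 1, 'gaussian'), ('house.jpg', 2, 'motion'), and the sweep covers 2 x 4 x 2 = 16 states.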

Abstracting the Experiments

Task. A Task is any unit of work performed when a parameter value changes. A Task is subdivided into setup and cleanup operations, corresponding to the work done at the beginning and end of a block of code in a loop, respectively.

Experiment. An Experiment is a collection of tasks.

    def PrepareImage(state, img):
        # Setup
        db_load(img, './current.jpg')
        yield  # suspend the function
        # Cleanup
        delete('./current.jpg')

    def ProcessImage(state, alg):
        data = load('./current.jpg')
        img = process(data, alg(value))
        save(img, str(state) + '.jpg')

        return  # no cleanup
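How a driver might run such tasks can be sketched as follows. The helper names `start` and `finish` are hypothetical, and the tasks only record what they do in a list instead of touching image files. The assumption is that a task written as a generator runs its setup up to the `yield` and its cleanup when resumed, while a plain function is setup only.

```python
import inspect

log = []

def prepare_image(state, img):
    log.append('setup ' + img)     # setup: everything before the yield
    yield                          # suspend until the enclosing block ends
    log.append('cleanup ' + img)   # cleanup: everything after the yield

def process_image(state, alg):
    log.append('process ' + alg)   # plain function: no cleanup phase

def start(task, state, value):
    """Run a task's setup and return a handle for its cleanup (or None)."""
    result = task(state, value)
    if inspect.isgenerator(result):
        next(result, None)         # advance to the first yield
        return result
    return None

def finish(handle):
    """Resume a suspended task so that its cleanup code runs."""
    if handle is not None:
        next(handle, None)         # the default absorbs StopIteration

state = ('house.jpg', 'gaussian')
handle = start(prepare_image, state, 'house.jpg')
start(process_image, state, 'gaussian')
finish(handle)
print(log)
```

The cleanup of the outer task runs after the inner task, which is the order a loop body would impose.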

Binding Experiments to State

Bound Task Semantics. Tasks must execute in the same order they would if the parameter sweep were expanded into nested loops.

    for img in images:
        PrepareImage.setup(img)
        for alg in algs:
            ProcessImage.setup(alg)
        PrepareImage.cleanup(img)

    e = enumerator([images, algs])
    e.bind(images, PrepareImage)
    e.bind(algs, ProcessImage)

    for state in e: pass

These examples are equivalent.
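One way to obtain bound task semantics is sketched below. The `Enumerator` class is a hypothetical stand-in, not the PSWEEP source: it assumes tasks are bound to a parameter by object identity, that generator tasks split into setup and cleanup at their `yield`, and that a task's cleanup runs whenever its parameter (or any parameter outside it) moves to a new value.

```python
import inspect
from itertools import product

class Enumerator:
    def __init__(self, params):
        self.params = params
        self.tasks = [[] for _ in params]   # tasks bound to each parameter

    def bind(self, param, task):
        level = next(n for n, p in enumerate(self.params) if p is param)
        self.tasks[level].append(task)

    def _setup(self, level, state):
        handles = []
        for task in self.tasks[level]:
            result = task(state, state[level])
            if inspect.isgenerator(result):
                next(result, None)           # run setup up to the yield
                handles.append(result)
        return handles

    @staticmethod
    def _cleanup(handles):
        for gen in reversed(handles):
            next(gen, None)                  # run the code after the yield

    def __iter__(self):
        open_levels = []                     # suspended tasks, outermost first
        previous = None
        for index in product(*[range(len(p)) for p in self.params]):
            state = tuple(p[n] for p, n in zip(self.params, index))
            # Outermost parameter whose position changed since the last state.
            changed = 0 if previous is None else next(
                n for n, (a, b) in enumerate(zip(previous, index)) if a != b)
            while len(open_levels) > changed:    # close inner blocks first
                self._cleanup(open_levels.pop())
            for level in range(changed, len(self.params)):
                open_levels.append(self._setup(level, state))
            yield state
            previous = index
        while open_levels:
            self._cleanup(open_levels.pop())

log = []

def prepare_image(state, img):
    log.append('setup ' + img)
    yield
    log.append('cleanup ' + img)

def process_image(state, alg):
    log.append('process %s %s' % (state[0], alg))

images = ['house.jpg', 'lena.jpg']
algs = ['motion', 'gaussian']
e = Enumerator([images, algs])
e.bind(images, prepare_image)
e.bind(algs, process_image)
for state in e:
    pass
print('\n'.join(log))
```

The log shows each image being set up once, processed with both algorithms, and cleaned up before the next image is set up, matching the nested-loop version.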

Distributing the Workload

DistributedEnumerator. DistributedEnumerator is an Enumerator that distributes the state to multiple instances across multiple computing resources.

    e = RoundRobin(params)
    for state in e: pass

States:

    p1:                        p2:
    [house.jpg, 1, motion]     [house.jpg, 1, gaussian]
    [house.jpg, 2, motion]     [house.jpg, 2, gaussian]
    [house.jpg, 4, motion]     [house.jpg, 4, gaussian]
    [lena.jpg, 1, motion]      [lena.jpg, 1, gaussian]
    [lena.jpg, 2, motion]      [lena.jpg, 2, gaussian]
    [lena.jpg, 4, motion]      [lena.jpg, 4, gaussian]

    e = Domain(params, images)
    for state in e: pass

States:

    p1:                        p2:
    [house.jpg, 1, motion]     [lena.jpg, 1, motion]
    [house.jpg, 1, gaussian]   [lena.jpg, 1, gaussian]
    [house.jpg, 2, motion]     [lena.jpg, 2, motion]
    [house.jpg, 2, gaussian]   [lena.jpg, 2, gaussian]
    [house.jpg, 4, motion]     [lena.jpg, 4, motion]
    [house.jpg, 4, gaussian]   [lena.jpg, 4, gaussian]

    e = MasterWorker(params)
    for state in e: pass

States:

    p1:                        p2:
    [house.jpg, 1, motion]     [house.jpg, 1, gaussian]
    [house.jpg, 2, motion]     [house.jpg, 2, gaussian]
    [house.jpg, 4, motion]     [house.jpg, 4, gaussian]
    [lena.jpg, 1, motion]      [lena.jpg, 1, gaussian]
    [lena.jpg, 2, motion]      [lena.jpg, 2, gaussian]
    [lena.jpg, 4, motion]      [lena.jpg, 4, gaussian]

Each DistributedEnumerator must ensure that bound task semantics are satisfied.
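The round-robin and domain state assignments can be reproduced without a cluster. In this sketch the functions `round_robin` and `domain` are hypothetical, and `rank` and `size` are passed in explicitly; in a real run they would presumably come from the MPI environment. Task binding and the master-worker scheme are left out.

```python
from itertools import islice, product

def round_robin(params, rank, size):
    """States number rank, rank + size, rank + 2*size, ... of the full sweep."""
    return islice(product(*params), rank, None, size)

def domain(params, split, rank, size):
    """Divide the values of one parameter (`split`) among the processes.

    Each process sweeps all other parameters in full, so all states that
    share a value of `split` stay on the same process.
    """
    local = [p[rank::size] if p is split else p for p in params]
    return product(*local)

images = ['house.jpg', 'lena.jpg']
j = [1, 2, 4]
algs = ['motion', 'gaussian']
params = [images, j, algs]

for rank in range(2):
    print('round robin p%d:' % (rank + 1), list(round_robin(params, rank, 2)))
    print('domain p%d:' % (rank + 1), list(domain(params, images, rank, 2)))
```

With two processes, round robin gives every 'motion' state to p1 and every 'gaussian' state to p2, while the domain split gives house.jpg to p1 and lena.jpg to p2. Both schemes hand each process its states in the original enumeration order.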

Implementations

Python
  Designed around iterators and generators
  DistributedEnumerator based on pyMPI
  Ideal for managing experiments on clusters

C++
  Template metaprogramming techniques remove abstraction penalties
  Ideal for applications with many nested loops

C++ Example

Generate HTML tables for the days of the week, with hours for the rows and minutes for the columns.

Task classes:

    #include <iostream>
    #include <boost/tuple/tuple.hpp>

    struct table_task {
        // Open a table titled with the current parameter value.
        void setup(State& state) {
            std::cout << "<table title=\"";
            print_last_param()(state);
            std::cout << "\">\n";
        }

        // Close the table when the parameter moves on.
        void cleanup(State&) {
            std::cout << "</table>\n";
        }
    };

    struct table_row_task {
        // As above with <tr>
    };

    struct table_data_task {
        // As above with <td>
    };

Parameter sweep:

    int main()
    {
        using boost::make_tuple;

        sweep(make_tuple("Sat", "Sun",
              make_tuple(range(24),
              make_tuple(range(0,60,10)))),
              empty_state().
                  bind<0>(table_task()).
                  bind<1>(table_row_task()).
                  bind<2>(table_data_task()),
              print_last_param());

        return 0;
    }

Conclusions

PSWEEP cleanly separates concerns:
  Parameters
  Tasks
  Resources

Modern languages enable flexible and high-performance implementations

Reference

http://www.osl.iu.edu/~chemuell/new/psweep.php

Christopher Mueller, Douglas Gregor, and Andrew Lumsdaine. A Lightweight Pattern for Managing Distributed Computational Experiments. Submitted to HPDC 2006.

Questions?
