psweep: a lightweight pattern for distributed computational experiments

PSWEEP: A Lightweight Pattern for Distributed Computational Experiments Christopher Mueller and Andrew Lumsdaine Open Systems Lab, Indiana University


Page 1: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Christopher Mueller and Andrew Lumsdaine

Open Systems Lab, Indiana University

Page 2: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Introduction

Parameter Sweeps are common cluster applications

Approaches
- Scripts (sh, perl: ssh, mpi)
- Low-level applications (C++, Fortran: MPI)
- Parameter sweep applications (e.g., Nimrod)

Problems
- Custom solutions become tangled quickly
- Applications are not available on all platforms

Page 3: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

How do we use our clusters?

Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
882576.aviss.av silin    iq       SL_DBJ014Q  14636   2   4     -- 200:0 R 109:48
890917.aviss.av baikgrp  bg       DA_NPJ001V  27673   1   2     -- 168:0 R 83:32
890932.aviss.av baikgrp  bg       DA_NPJ002V  18006   1   2     -- 168:0 R 87:31
959929.aviss.av rllord   iq       RL1_NCQ02V  11982   1   2     -- 120:0 R 56:27
960044.aviss.av shawnli  bg       Hairy2b     13703   1   1     -- 100:0 R 42:52
960045.aviss.av shawnli  bg       Xxbp1       21294   1   1     -- 100:0 R 42:51
960046.aviss.av shawnli  bg       Foxa1       15908   1   1     -- 100:0 R 42:49
960047.aviss.av shawnli  bg       Foxa2       19881   1   1     -- 100:0 R 42:49
960048.aviss.av shawnli  bg       Foxd3       19073   1   1     -- 100:0 R 42:49
960050.aviss.av shawnli  bg       Gsc         20886   1   1     -- 100:0 R 42:04
960215.aviss.av shawnli  bg       Foxa1mamma  18296   1   1     -- 100:0 R 35:23
960216.aviss.av shawnli  bg       Foxa2mamma  14926   1   1     -- 100:0 R 34:43
960217.aviss.av shawnli  bg       Foxd3mamma  15016   1   1     -- 100:0 R 34:43
960218.aviss.av shawnli  bg       Gata4mamma   7421   1   1     -- 100:0 R 33:11
960220.aviss.av shawnli  bg       Glimammal    7525   1   1     -- 100:0 R 33:11
960221.aviss.av shawnli  bg       Gscmammal   16626   1   1     -- 100:0 R 33:03
960222.aviss.av shawnli  bg       Hairy2bmam  16760   1   1     -- 100:0 R 33:03
960224.aviss.av shawnli  bg       Hoxd1mamma  32101   1   1     -- 100:0 R 33:01
960225.aviss.av shawnli  bg       Mixermamma  27958   1   1     -- 100:0 R 32:09
960279.aviss.av dkberry  mdgrape  run13_07m    5570   1   1     -- 36:00 R 17:04
960283.aviss.av dbaronia iq       batch.sh    23862   3   6     -- 24:00 R 22:41
960426.aviss.av cwillenb bg       CWOA_005    18980   1   1     -- 100:0 R 04:52
960428.aviss.av cwillenb bg       CWOA_006a    1941   1   1     -- 100:0 R 04:52
960429.aviss.av cwillenb bg       CWOA_007       --   1   1     -- 100:0 Q    --
960430.aviss.av cwillenb bg       CWOA_008       --   1   1     -- 100:0 Q    --
960431.aviss.av cwillenb bg       CWOA_009       --   1   1     -- 100:0 Q    --
960432.aviss.av cwillenb bg       CWOA_010       --   1   1     -- 100:0 Q    --
960433.aviss.av cwillenb bg       CWOA_011       --   1   1     -- 100:0 Q    --
960434.aviss.av cwillenb bg       CWOA_012       --   1   1     -- 100:0 Q    --
963115.aviss.av xsong    bg       par.241        --   8  16     -- 24:00 Q    --
963116.aviss.av xsong    bg       par.242        --   8  16     -- 24:00 Q    --
963121.aviss.av xsong    bg       par.53.7       --   8  16     -- 02:00 Q    --
963122.aviss.av xsong    bg       par.53.8       --  16  32     -- 02:00 Q    --
963133.aviss.av honfan   iq       HF_MJ370    23299   3   6     -- 120:0 R 07:13
963167.aviss.av whpitcoc iq       WP_C572_L0  30829   1   2     -- 24:00 R 01:11
963171.aviss.av whpitcoc iq       WP_C572_L0  17995   1   2     -- 24:00 R 01:11
963186.aviss.av whpitcoc iq       WP_C572_TS   5235   1   2     -- 24:00 R 00:08
963187.aviss.av whpitcoc iq       WP_C572_TS  25746   1   2     -- 24:00 R 00:09
963188.aviss.av whpitcoc iq       WP_C572_TS  13846   1   2     -- 24:00 R 00:09
963189.aviss.av whpitcoc iq       WP_C572_TS  26613   1   2     -- 24:00 R 00:08


Page 4: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Anatomy of a Parameter Sweep

for i in range(rank, n, size):
    if process: load_image(i)
    elif stats: query_image(i)

    for j in [1, 2, 4, 8]:
        if process: time(i, j)

        for k in ['motion', 'gaussian']:
            if process: process_image(i,j,k)
            elif stats: image_stats(i,j,k)
            else:
                print 'ssh n%d run %d %d' % (i, j, k)

        if process: clear_process(k)
        elif bgi: clear_temp(k)

    if process: unload_image(i)

Parameters and Enumeration Order

* Resource distribution is handled by the execution environment, e.g., mpirun.
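
The strided outer loop `range(rank, n, size)` is what divides the images among processes. A minimal sketch of that striding in isolation follows, written in Python 3 and assuming `rank` and `size` are passed in by hand rather than supplied by the execution environment; the helper name `my_indices` is hypothetical and does not come from the slides.

```python
def my_indices(rank, size, n):
    """Outer-loop indices handled by process `rank` out of `size` processes."""
    return list(range(rank, n, size))


# Simulate a 3-process run over n = 8 images without launching MPI.
assignment = {rank: my_indices(rank, 3, 8) for rank in range(3)}
print(assignment)
```

Every index is handled by exactly one process, and no process needs to know what the others are doing.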

Page 5: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Anatomy of a Parameter Sweep

Tasks and Experiments

for i in range(rank, n, size):
    if process: load_image(i)
    elif stats: query_image(i)

    for j in [1, 2, 4, 8]:
        if process: time(i, j)

        for k in ['motion', 'gaussian']:
            if process: process_image(i,j,k)
            elif stats: image_stats(i,j,k)
            else:
                print 'ssh n%d run %d %d' % (i, j, k)

        if process: clear_process(k)
        elif bgi: clear_temp(k)

    if process: unload_image(i)

Page 6: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Anatomy of a Parameter Sweep

Artifacts and Errors

for i in range(rank, n, size):
    if process: load_image(i)
    elif stats: query_image(i)

    for j in [1, 2, 4, 8]:
        if process: time(i, j)

        for k in ['motion', 'gaussian']:
            if process: process_image(i,j,k)
            elif stats: image_stats(i,j,k)
            else:
                print 'ssh n%d run %d %d' % (i, j, k)

        if process: clear_process(k)
        elif bgi: clear_temp(k)

    if process: unload_image(i)

Page 7: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

User’s View

[Diagram with three labeled parts: Experiments, Parameters, Resources]

Experiments
- process: load_image(), unload_image(), time(), process_image(), clear_process()
- stats: query_image(), image_stats()
- script gen: print …

Parameters [i, j, k]
- [0, n]
- [.01, .1, 1.0]
- [10, 12, 14]

States
- 0, 0.01, 10
- 0, 0.01, 12
- 0, 0.01, 14
- 0, 0.1, 10
- 0, 0.1, 12
- …

Resources

Page 8: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

The PSWEEP Pattern

Page 9: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Abstracting the Loops

Parameter. A Parameter is an iterator or container that supplies the values for a variable in the experiment.

Enumerator. The enumerator takes an ordered list of parameters and enumerates every combination of their values in lexicographic order.

State. The state contains the current value of each parameter, in order.

i = ['house.jpg', 'lena.jpg']
j = [1, 2, 4, 8]
k = ['motion', 'gaussian']

params = [i, j, k]
e = enumerator(params)

for state in e:
    process_image(state)
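
One way to picture the enumerator is as a thin wrapper over the standard library's `itertools.product`, which already yields combinations with the first parameter varying slowest. The sketch below assumes that reading and is written in Python 3; it is not the authors' implementation, and it represents a state as a plain tuple.

```python
from itertools import product


def enumerator(params):
    """Yield one state per combination of parameter values.

    A state is a tuple holding the current value of each parameter, in
    order. The first parameter varies slowest, as in nested for-loops.
    """
    for state in product(*params):
        yield state


i = ['house.jpg', 'lena.jpg']
j = [1, 2, 4, 8]
k = ['motion', 'gaussian']

states = list(enumerator([i, j, k]))
print(len(states))   # 2 * 4 * 2 combinations
print(states[0])
print(states[-1])
```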

Page 10: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Abstracting the Experiments

Task. A Task is any unit of work performed when a parameter value changes. A Task is subdivided into setup and cleanup operations, corresponding to the work done at the beginning and end of a block of code in a loop, respectively.

Experiment. An Experiment is a collection of tasks.

def PrepareImage(state, img):
    # Setup
    db_load(img, './current.jpg')
    yield  # suspend the function
    # Cleanup
    delete('./current.jpg')

def ProcessImage(state, alg):
    data = load('./current.jpg')
    img = process(data, alg(value))
    save(img, str(state) + '.jpg')

    return  # no cleanup
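
The setup/cleanup split relies on a generator pausing at `yield`. A minimal sketch of how a driver might run such tasks is shown below in Python 3. The helper `run_task` and the two stand-in tasks are hypothetical, and the stand-ins only record what they would do instead of touching image files.

```python
def run_task(task, *args):
    """Start a task and return a callable that performs its cleanup.

    A generator task runs up to its first `yield` (setup); calling the
    returned function resumes it so the code after `yield` (cleanup) runs.
    A plain function does all of its work immediately and has no cleanup.
    """
    result = task(*args)
    if not hasattr(result, '__next__'):   # plain function: nothing left to do
        return lambda: None
    next(result)                          # run the setup half

    def cleanup():
        try:
            next(result)                  # run the cleanup half
        except StopIteration:
            pass
    return cleanup


log = []


def prepare_image(img):                   # hypothetical stand-in for PrepareImage
    log.append('load ' + img)
    yield
    log.append('delete ' + img)


def process_image(alg):                   # hypothetical stand-in for ProcessImage
    log.append('process ' + alg)


finish = run_task(prepare_image, 'house.jpg')   # setup runs here
run_task(process_image, 'motion')()             # no cleanup to perform
finish()                                        # cleanup runs here
print(log)
```

Keeping setup and cleanup in one function lets them share local variables, which is why a generator is a natural fit for a task.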

Page 11: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Binding Experiments to State

Bound Task Semantics. Tasks must execute in the same order as they would if the parameter sweep were expanded into nested loops.

for img in images:
    PrepareImage.setup(img)
    for alg in algs:
        ProcessImage.setup(alg)
    PrepareImage.cleanup(img)

e = enumerator([images, algs])
e.bind(images, PrepareImage)
e.bind(algs, ProcessImage)

for state in e:
    pass

These examples are equivalent.
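
To make the equivalence concrete, here is a minimal Python 3 sketch of an enumerator with `bind`. It is an illustration under assumptions, not the authors' implementation: the class names `Enumerator` and `Recorder` are hypothetical, and tasks are modeled as objects with `setup(value)` and `cleanup(value)` methods. When a parameter advances, the blocks at that level and below are closed (innermost first) and then reopened (outermost first).

```python
from itertools import product


class Enumerator:
    """Hypothetical enumerator that fires bound tasks in nested-loop order."""

    def __init__(self, params):
        self.params = params
        self.tasks = [None] * len(params)

    def bind(self, param, task):
        # Match the parameter by identity, as in e.bind(images, PrepareImage).
        level = next(n for n, p in enumerate(self.params) if p is param)
        self.tasks[level] = task

    def _run(self, phase, levels, state):
        for level in levels:
            if self.tasks[level] is not None:
                getattr(self.tasks[level], phase)(state[level])

    def __iter__(self):
        depth = len(self.params)
        previous = state = None
        # Enumerate index tuples so that repeated values still count as changes.
        for indices in product(*(range(len(p)) for p in self.params)):
            if previous is None:
                changed = 0
            else:
                # Outermost loop level that advanced since the last state.
                changed = next(n for n in range(depth) if indices[n] != previous[n])
                # Close the finished blocks, innermost first.
                self._run('cleanup', reversed(range(changed, depth)), state)
            state = tuple(p[n] for p, n in zip(self.params, indices))
            # Open the new blocks, outermost first.
            self._run('setup', range(changed, depth), state)
            previous = indices
            yield state
        if state is not None:
            self._run('cleanup', reversed(range(depth)), state)


class Recorder:
    """Hypothetical task that records its setup and cleanup calls."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def setup(self, value):
        self.log.append((self.name, 'setup', value))

    def cleanup(self, value):
        self.log.append((self.name, 'cleanup', value))


images = ['house.jpg', 'lena.jpg']
algs = ['motion', 'gaussian']

# Explicit nested loops.
expected = []
for img in images:
    expected.append(('prepare', 'setup', img))
    for alg in algs:
        expected.append(('process', 'setup', alg))
        expected.append(('process', 'cleanup', alg))
    expected.append(('prepare', 'cleanup', img))

# The same sweep through the enumerator.
log = []
e = Enumerator([images, algs])
e.bind(images, Recorder('prepare', log))
e.bind(algs, Recorder('process', log))
for state in e:
    pass

print(log == expected)
```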

Page 12: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Distributing the Workload

DistributedEnumerator. A DistributedEnumerator is an Enumerator that distributes the states across multiple instances running on multiple computing resources.

e = RoundRobin(params)
for state in e: pass

States:

p1: [house.jpg, 1, motion]
p2: [house.jpg, 1, gaussian]
[house.jpg, 2, motion]
[house.jpg, 2, gaussian]
[house.jpg, 4, motion]
[house.jpg, 4, gaussian]
[lena.jpg, 1, motion]
[lena.jpg, 1, gaussian]
[lena.jpg, 2, motion]
[lena.jpg, 2, gaussian]
[lena.jpg, 4, motion]
[lena.jpg, 4, gaussian]

e = Domain(params, images)
for state in e: pass

States:

p1: [house.jpg, 1, motion]
    [house.jpg, 1, gaussian]
    [house.jpg, 2, motion]
    [house.jpg, 2, gaussian]
    [house.jpg, 4, motion]
    [house.jpg, 4, gaussian]
p2: [lena.jpg, 1, motion]
    [lena.jpg, 1, gaussian]
    [lena.jpg, 2, motion]
    [lena.jpg, 2, gaussian]
    [lena.jpg, 4, motion]
    [lena.jpg, 4, gaussian]

e = MasterWorker(params)
for state in e: pass

States:

p1: [house.jpg, 1, motion]
p2: [house.jpg, 1, gaussian]
[house.jpg, 2, motion]
[house.jpg, 2, gaussian]
[house.jpg, 4, motion]
[house.jpg, 4, gaussian]
[lena.jpg, 1, motion]
[lena.jpg, 1, gaussian]
[lena.jpg, 2, motion]
[lena.jpg, 2, gaussian]
[lena.jpg, 4, motion]
[lena.jpg, 4, gaussian]

The DistributedEnumerators must ensure that bound task semantics are satisfied.
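
The two static strategies can be sketched without MPI by passing `rank` and `size` explicitly. The Python 3 functions below are hypothetical illustrations, not the pyMPI-based implementation: `round_robin` deals states out in turn, and `domain` partitions the values of one chosen parameter so that all states sharing a value stay on one process. MasterWorker is omitted because it assigns states at run time and needs inter-process communication.

```python
from itertools import product


def round_robin(params, rank, size):
    """States owned by `rank` when states are dealt out in turn."""
    return [s for n, s in enumerate(product(*params)) if n % size == rank]


def domain(params, param, rank, size):
    """States owned by `rank` when the values of `param` are divided up."""
    owned = param[rank::size]
    # Substitute the owned slice for the partitioned parameter (matched by identity).
    local = [owned if p is param else p for p in params]
    return list(product(*local))


images = ['house.jpg', 'lena.jpg']
threads = [1, 2, 4]
algs = ['motion', 'gaussian']
params = [images, threads, algs]

for rank in range(2):
    print('round robin p%d:' % (rank + 1), round_robin(params, rank, 2))
    print('domain      p%d:' % (rank + 1), domain(params, images, rank, 2))
```

With two processes, the domain split gives each process a single image, so a task bound to `images` is set up once per image overall. Under round robin both processes visit both images, so each process has to run that task's setup and cleanup itself for the bound task semantics to hold locally.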

Page 13: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Implementations

Python
- Designed around iterators and generators
- DistributedEnumerator based on pyMPI
- Ideal for managing experiments on clusters

C++
- Template metaprogramming techniques remove abstraction penalties
- Ideal for applications with many nested loops

Page 14: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

C++ Example

Generate HTML tables for days of the week, with hours for the rows and minutes for the columns.

Task Classes

// State, sweep, range, empty_state, and print_last_param are provided by
// the PSWEEP C++ library; its header is not shown on the slide.
#include <iostream>
#include <boost/tuple/tuple.hpp>

struct table_task {
  // Open a table titled with the current value of the bound parameter.
  void setup(State& state) {
    std::cout << "<table title=\"";
    print_last_param()(state);
    std::cout << "\">\n";
  }

  // Close the table once the parameter advances.
  void cleanup(State&) {
    std::cout << "</table>\n";
  }
};

struct table_row_task {
  // As above with <tr>
};

struct table_data_task {
  // As above with <td>
};

Parameter Sweep

int main()
{
  using boost::make_tuple;

  sweep(make_tuple("Sat", "Sun",
        make_tuple(range(24),
        make_tuple(range(0,60,10)))),
        empty_state().
          bind<0>(table_task()).
          bind<1>(table_row_task()).
          bind<2>(table_data_task()),
        print_last_param());

  return 0;
}

Page 15: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Conclusions

PSWEEP cleanly separates concerns
- Parameters
- Tasks
- Resources

Modern languages enable flexible and high-performance implementations

Page 16: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Reference

http://www.osl.iu.edu/~chemuell/new/psweep.php

Christopher Mueller, Douglas Gregor, and Andrew Lumsdaine. A Lightweight Pattern for Managing Distributed Computational Experiments. Submitted to HPDC 2006.

Page 17: PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Questions?