![Page 1: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/1.jpg)
David P. Anderson
Space Sciences Laboratory
University of California – Berkeley
Public Distributed Computing with BOINC
![Page 2: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/2.jpg)
Public-resource computing
● 1 billion Internet-connected PCs in 2010
● >50% of PCs are privately owned
● Assume 100M participants
– At least 100 PetaFLOPs
– At least 1 Exabyte (10^18 bytes) storage
● Problems
– incentive, security, failures, ...
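A quick check of what the aggregate figures above imply per participant. The totals come from the slide; the even split across hosts is a simplifying assumption.

```python
# Back-of-the-envelope check of the slide's aggregate estimates.
participants = 100_000_000      # 100M participants (from the slide)
total_flops = 100e15            # 100 PetaFLOPs
total_storage_bytes = 1e18      # 1 Exabyte

# Assumption: capacity is spread evenly across all participants.
flops_per_host = total_flops / participants
storage_per_host = total_storage_bytes / participants

print(f"{flops_per_host / 1e9:.0f} GFLOPs per host")   # 1 GFLOPs per host
print(f"{storage_per_host / 1e9:.0f} GB per host")     # 10 GB per host
```

So the "at least" totals correspond to about 1 GFLOPs and 10 GB contributed by each host.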
![Page 3: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/3.jpg)
SETI@home
● Started May 1999
● ~600,000 active participants
● ~60 TeraFLOPs
● Problems with current software
– hard to change/add algorithms
– can't share participants w/ other projects
– inflexible data architecture
![Page 4: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/4.jpg)
SETI@home data architecture
[Figure: two diagrams, labeled "current" and "ideal". Labels: tapes; Berkeley; Stanford; USC; Internet2 (free); commercial Internet; 50 Mbps; participants.]
![Page 5: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/5.jpg)
BOINC: Berkeley Open Infrastructure for Network Computing
● Multiple projects
– easy to develop and operate
– independent
● Support wide range of tasks
– computation/storage
– task “topologies”
● Participant features
– can choose projects, resource allocation
– configurable; invisible on participant hosts
– many platforms supported
![Page 6: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/6.jpg)
BOINC server architecture
[Diagram components: work generator; project DB; BOINC DB; timeout/retry; validator; assimilator; file deleter; data servers; scheduling server; Web interfaces (PHP).]
![Page 7: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/7.jpg)
BOINC client architecture
[Diagram components: BOINC core client; screensaver; applications, each linked with the BOINC library; files, shared memory, messages; schedulers, data servers.]
![Page 8: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/8.jpg)
Data architecture
● Files
– immutable, replicated
– may originate on client or project
– may remain resident on client
● Persistent, non-intrusive file transfers
● XML descriptor:

    <file_info>
      <name>arecibo_3392474_jun_23_01</name>
      <url>http://ds.ssl.berkeley.edu/a3392474</url>
      <url>http://dt.ssl.berkeley.edu/a3392474</url>
      <md5_cksum>uwi7eyufiw8e972h8f9w7</md5_cksum>
      <nbytes>10000000</nbytes>
    </file_info>
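A minimal sketch of how a client might read such a descriptor and check a downloaded file against it. This is not BOINC's own code: the function names, the `example.org` URLs, and the 5-byte payload are made up for illustration, and only the element names come from the slide.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical payload and descriptor; the checksum is computed so the pair is consistent.
PAYLOAD = b"hello"
DESCRIPTOR = f"""
<file_info>
  <name>example_input</name>
  <url>http://example.org/a/example_input</url>
  <url>http://example.org/b/example_input</url>
  <md5_cksum>{hashlib.md5(PAYLOAD).hexdigest()}</md5_cksum>
  <nbytes>{len(PAYLOAD)}</nbytes>
</file_info>
"""


def parse_file_info(xml_text):
    """Extract the name, replica URLs, checksum and size from a <file_info> element."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "urls": [u.text for u in root.findall("url")],  # one entry per replica
        "md5": root.findtext("md5_cksum"),
        "nbytes": int(root.findtext("nbytes")),
    }


def verify(info, data):
    """Accept the data only if both size and MD5 match the descriptor."""
    return len(data) == info["nbytes"] and hashlib.md5(data).hexdigest() == info["md5"]


info = parse_file_info(DESCRIPTOR)
print(info["urls"])
print(verify(info, PAYLOAD), verify(info, b"hellp"))
```

Because files are immutable, a checksum mismatch can only mean a bad transfer, so the client can simply retry from another replica URL.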
![Page 9: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/9.jpg)
BOINC applications
● Any language (C, C++, Fortran)
● BOINC API
– filename translation
– checkpoint/restart, % done, CPU time
– graphics (based on OpenGL, GLUT)
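The checkpoint/restart idea can be illustrated with a small sketch. It is written in Python for brevity and does not use the BOINC API; the function `run`, the JSON state file, and the `stop_after` parameter (which simulates the host being interrupted) are all assumptions made for this example.

```python
import json
import os
import tempfile


def run(n_iter, state_path, checkpoint_every=100, stop_after=None):
    """Sum 0..n_iter-1, resuming from the last checkpoint if one exists."""
    i, total = 0, 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            saved = json.load(f)
        i, total = saved["i"], saved["total"]

    while i < n_iter:
        total += i
        i += 1
        if i % checkpoint_every == 0:
            # Write to a temporary file, then rename, so a crash never
            # leaves a half-written checkpoint behind.
            tmp_path = state_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"i": i, "total": total, "fraction_done": i / n_iter}, f)
            os.replace(tmp_path, state_path)
        if stop_after is not None and i >= stop_after:
            return None  # simulated interruption
    return total


with tempfile.TemporaryDirectory() as workdir:
    state = os.path.join(workdir, "state.json")
    first = run(1000, state, stop_after=250)   # interrupted; last checkpoint at i=200
    second = run(1000, state)                  # resumes from i=200
    print(first, second)                       # None 499500
```

The work done between the last checkpoint and the interruption (iterations 200 to 249 here) is repeated on restart, which is the usual trade-off when choosing a checkpoint interval.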
![Page 10: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/10.jpg)
![Page 11: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/11.jpg)
Work units
● Template for a computation
● Resource estimates
– Integer, FP ops; memory; disk space
● Delay bound
– determines retry, client abort

    <file_info>
      <name>arecibo_3392474_jun_23_01</name>
      ...
    </file_info>
    <workunit>
      <name>ar_13323313</name>
      <file_ref>
        <name>arecibo_3392474_jun_23_01</name>
        <open_name>input_file</open_name>
      </file_ref>
      <command_line>-niter 1000</command_line>
    </workunit>
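A sketch of the filename translation that the `<file_ref>` element makes possible: the application opens the logical name (`open_name`) and is handed the physical file (`name`). The function `logical_to_physical` is a hypothetical helper, not part of the BOINC API; the XML is the workunit from the slide.

```python
import xml.etree.ElementTree as ET

WORKUNIT = """
<workunit>
  <name>ar_13323313</name>
  <file_ref>
    <name>arecibo_3392474_jun_23_01</name>
    <open_name>input_file</open_name>
  </file_ref>
  <command_line>-niter 1000</command_line>
</workunit>
"""


def logical_to_physical(workunit_xml):
    """Map each logical name the application opens to the physical file name."""
    root = ET.fromstring(workunit_xml)
    return {
        ref.findtext("open_name"): ref.findtext("name")
        for ref in root.findall("file_ref")
    }


mapping = logical_to_physical(WORKUNIT)
args = ET.fromstring(WORKUNIT).findtext("command_line").split()

print(mapping)  # {'input_file': 'arecibo_3392474_jun_23_01'}
print(args)     # ['-niter', '1000']
```

The same application binary can then run every workunit unchanged, since only the mapping differs between them.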
![Page 12: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/12.jpg)
Results
● An instance of a computation (completed or not)
● Includes: host ID, claimed/granted credit

    <file_info>
      <name>arecibo_3392474_jun_23_01.out</name>
      ...
    </file_info>
    <result>
      <workunit_name>ar_13323313</workunit_name>
      <file_ref>
        <name>arecibo_3392474_jun_23_01.out</name>
        <open_name>output_file</open_name>
      </file_ref>
    </result>
![Page 13: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/13.jpg)
Scheduling
● Work buffering on client
– upper, lower bounds
● Host attributes
– FP/int/mem speeds, disk/memory sizes
– network bandwidth up/down
– fraction of time connected, computing
● Scheduler policy:
– send as much work as requested, subject to feasibility, WU deadlines
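One way to read the scheduler policy is sketched below. This is an illustration under stated assumptions, not the actual scheduler: the `Host` and `WorkUnit` fields, the runtime estimate, and the greedy order are all invented for the example, and only disk space and the delay bound are checked for feasibility.

```python
from dataclasses import dataclass


@dataclass
class Host:
    flops: float            # measured FP speed, operations per second
    free_disk: float        # bytes available to the project
    active_fraction: float  # fraction of time the host is computing


@dataclass
class WorkUnit:
    name: str
    fp_ops: float       # estimated FP operations
    disk_bytes: float   # estimated disk space needed
    delay_bound: float  # seconds allowed before the result is due


def est_seconds(wu, host):
    """Estimated wall-clock time, scaled by how often the host computes."""
    return wu.fp_ops / (host.flops * host.active_fraction)


def pick_work(host, requested_seconds, candidates):
    """Greedily send work until the requested duration is covered."""
    sent, queued, disk = [], 0.0, host.free_disk
    for wu in candidates:
        if queued >= requested_seconds:
            break
        t = est_seconds(wu, host)
        if wu.disk_bytes > disk:
            continue  # infeasible: not enough disk space
        if queued + t > wu.delay_bound:
            continue  # would finish after its deadline
        sent.append(wu)
        queued += t
        disk -= wu.disk_bytes
    return sent


host = Host(flops=1e9, free_disk=100e6, active_fraction=0.5)
candidates = [
    WorkUnit("a", 5e11, 10e6, 86400),   # about 1000 s, fits
    WorkUnit("b", 5e11, 200e6, 86400),  # needs too much disk
    WorkUnit("c", 5e12, 10e6, 5000),    # about 10000 s, misses its deadline
    WorkUnit("d", 1e12, 10e6, 86400),   # about 2000 s, fits
    WorkUnit("e", 5e11, 10e6, 86400),   # not reached: request already covered
]
names = [wu.name for wu in pick_work(host, 2500, candidates)]
print(names)  # ['a', 'd']
```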
![Page 14: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/14.jpg)
Client/server protocol (XML-RPC)
● Request
– Authentication
– Host description
– Persistent file descriptions
– Result descriptions
– Duration of work requested
● Reply
– Application, workunit, result descriptors
– Result acknowledgements
– Preferences
– Control messages (redirect, back off, etc.)
![Page 15: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/15.jpg)
Work sequences
● Handle long (weeks or months) computations with large local state
● Sequence normally stays on one host; move to different host if failure
● Scheduling, redundancy checking are tricky
[Diagram labels: Upload state; Check for abort]
![Page 16: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/16.jpg)
Redundant computing
● Create several results per workunit
● Find “canonical result” with project-specific consensus policy
● Generate additional copies as needed, up to error thresholds
● One result per WU per user
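The consensus policy is project-specific, so the following is only one possible policy, sketched for illustration: outputs must match exactly, and a result becomes canonical once at least `quorum` results agree. The function name and data layout are assumptions.

```python
from collections import Counter


def find_canonical(results, quorum):
    """Return the ID of a canonical result, or None if there is no consensus yet.

    results: list of (result_id, output) pairs for successfully returned results.
    """
    if len(results) < quorum:
        return None
    counts = Counter(output for _, output in results)
    output, n = counts.most_common(1)[0]
    if n < quorum:
        return None
    # Any result in the agreeing group can serve as canonical; take the first.
    for result_id, out in results:
        if out == output:
            return result_id


agree = find_canonical([(1, "a"), (2, "b"), (3, "a")], quorum=2)
split = find_canonical([(1, "a"), (2, "b")], quorum=2)
print(agree, split)  # 1 None
```

When no consensus is found, the server would generate another copy of the workunit, up to the error thresholds. A project with floating-point output would likely compare within a tolerance rather than exactly.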
![Page 17: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/17.jpg)
Participant Credit
● Goals:
– credit for work actually done (CPU, network, storage)
– don't know workunit size in advance
– cheat-proof
● Integration with redundancy
– claimed credit = benchmark * CPU time
– granted credit = minimum claimed credit
● Handling graphics coprocessors
– project-specific benchmarks
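The two credit formulas can be written out directly. The units are an assumption (benchmark as credit per CPU second), and the three claims are invented numbers chosen to show why taking the minimum resists inflated claims.

```python
def claimed_credit(benchmark, cpu_seconds):
    """claimed credit = benchmark * CPU time (both reported by the host)."""
    return benchmark * cpu_seconds


def granted_credit(claims):
    """granted credit = minimum claimed credit among the agreeing results."""
    return min(claims)


claims = [
    claimed_credit(2.0, 100),  # fast host, short run: 200
    claimed_credit(1.0, 250),  # slow host, long run: 250
    claimed_credit(4.0, 500),  # inflated benchmark and CPU time: 2000
]
print(granted_credit(claims))  # 200.0
```

Every host in the group receives the same granted credit, so exaggerating a benchmark or CPU time cannot raise anyone's credit above the lowest claim.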
![Page 18: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/18.jpg)
Work unit lifecycle
● Work generator: create WU, N results
● Timeout check
– create new results if needed
– detect too many errors, too many results without consensus
● Validator
– find canonical result; grant credit
● Assimilator
– merge canonical result into project DB
● File deleter
– delete input and output files when no longer needed
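The timeout check step might look roughly like the sketch below, applied to a workunit that has no canonical result yet. The state names, parameter names, and return format are hypothetical and chosen only to mirror the three bullets above.

```python
def timeout_check(states, target_results, max_errors, max_total):
    """Decide what to do with a workunit that has no canonical result yet.

    states: one entry per result, each 'pending', 'success', 'error' or 'timed_out'.
    Returns (status, number_of_new_results_to_create).
    """
    errors = sum(s in ("error", "timed_out") for s in states)
    if errors > max_errors:
        return ("error: too many errors", 0)
    if len(states) > max_total:
        return ("error: too many results without consensus", 0)
    # Results that may still count toward consensus.
    live = sum(s in ("pending", "success") for s in states)
    return ("ok", max(0, target_results - live))


retry = timeout_check(["success", "timed_out", "pending"], 3, 10, 10)
give_up = timeout_check(["error"] * 11, 3, 10, 20)
print(retry)    # ('ok', 1)
print(give_up)  # ('error: too many errors', 0)
```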
![Page 19: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/19.jpg)
Participating in a BOINC project
User Project web site
create account
email account ID
download core client
core client
enter account ID, project URL
get list of scheduling servers
scheduler RPC
![Page 20: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/20.jpg)
Windows GUI
● Multi-language
● Operations: suspend/resume, attach/detach projects, etc.
![Page 21: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/21.jpg)
Participant preferences
![Page 22: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/22.jpg)
Project-specific preferences
![Page 23: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/23.jpg)
User-visible web features
● User profiles
– user of the day
● Forums
● Self-moderating FAQs
● Teams
● XML data export (3rd party statistics reporting)
![Page 24: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/24.jpg)
Project configuration file

    <boinc>
    <config>
      <db_name>ap</db_name>
      <db_passwd></db_passwd>
      <shmem_key>0x35740417</shmem_key>
      <key_dir>/mydisks/a/users/boincadm/keys</key_dir>
      <upload_url>http://setiboinc.ssl.berkeley.edu/ap_cgi/file_upload_handler</upload_url>
      <upload_dir>/mydisks/a/users/boincadm/projects/AstroPulse_Beta/upload</upload_dir>
      <cgi_url>http://setiboinc.ssl.berkeley.edu/ap_cgi</cgi_url>
      <log_dir>/mydisks/a/users/boincadm/projects/AstroPulse_Beta/log</log_dir>
      <disable_account_creation/>
    </config>
    <daemons>
      <daemon><cmd>feeder -d 1</cmd></daemon>
      <daemon><cmd>validate_test -d 2 -app AstroPulse -quorum 3</cmd></daemon>
      <daemon><cmd>timeout_check -d 2 -app AstroPulse -nerror 10 -ndet 10 -nredundancy 3</cmd></daemon>
      <daemon><cmd>assimilator -d 2 -app AstroPulse</cmd></daemon>
      <daemon><cmd>file_deleter -d 2</cmd></daemon>
    </daemons>
    <tasks>
      <task><cmd>update_stats -update_users -update_hosts -update_teams</cmd><period>1 hour</period></task>
      <task><cmd>get_load</cmd><period>5 min</period></task>
      <task><cmd>db_count "user"</cmd><output>count_users.out</output><period>5 min</period></task>
      <task><cmd>db_count "result"</cmd><output>count_results_all.out</output><period>5 min</period></task>
    </tasks>
    </boinc>
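A sketch of how a control program could read the daemon and task lists from such a file. The XML here is a shortened copy of the configuration above; `period_seconds` is a hypothetical helper, and it handles only the two units that appear on the slide (`min` and `hour`).

```python
import xml.etree.ElementTree as ET

CONFIG = """
<boinc>
  <config><db_name>ap</db_name></config>
  <daemons>
    <daemon><cmd>feeder -d 1</cmd></daemon>
    <daemon><cmd>file_deleter -d 2</cmd></daemon>
  </daemons>
  <tasks>
    <task><cmd>get_load</cmd><period>5 min</period></task>
    <task><cmd>update_stats -update_users -update_hosts -update_teams</cmd><period>1 hour</period></task>
  </tasks>
</boinc>
"""

# Assumption: only these two units are supported in this sketch.
UNIT_SECONDS = {"min": 60, "hour": 3600}


def period_seconds(text):
    """Convert a period such as '5 min' or '1 hour' to seconds."""
    count, unit = text.split()
    return int(count) * UNIT_SECONDS[unit]


root = ET.fromstring(CONFIG)
daemons = [d.findtext("cmd") for d in root.iter("daemon")]
tasks = [
    (t.findtext("cmd"), period_seconds(t.findtext("period")))
    for t in root.iter("task")
]

print(daemons)  # ['feeder -d 1', 'file_deleter -d 2']
print(tasks[0])  # ('get_load', 300)
```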
![Page 25: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/25.jpg)
Project control
● Single control program
– enable, disable
– cron
– status
● uses PID files to keep track of daemons
● uses timestamp file for periodic tasks
● uses lockfiles for mutual exclusion
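The timestamp-file technique for periodic tasks can be sketched as follows. This is an assumption about how such a check might work, not the control program's actual code: the file's modification time records the last run, and `is_due`, `mark_run`, and the `now` parameter (which makes the example deterministic) are invented names.

```python
import os
import tempfile
import time


def is_due(stamp_path, period_seconds, now=None):
    """A task is due if it has never run or its period has elapsed."""
    now = time.time() if now is None else now
    try:
        last_run = os.path.getmtime(stamp_path)
    except FileNotFoundError:
        return True
    return now - last_run >= period_seconds


def mark_run(stamp_path, now=None):
    """Create the timestamp file if needed and set its mtime to the run time."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None if now is None else (now, now))


with tempfile.TemporaryDirectory() as workdir:
    stamp = os.path.join(workdir, "get_load.stamp")
    before = is_due(stamp, 300, now=1000.0)   # never run: due
    mark_run(stamp, now=1000.0)
    too_soon = is_due(stamp, 300, now=1200.0) # 200 s later: not due
    later = is_due(stamp, 300, now=1300.0)    # 300 s later: due again
    print(before, too_soon, later)            # True False True
```

Run from cron every few minutes, a check like this lets tasks with different periods share one crontab entry.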
![Page 26: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/26.jpg)
Python-based testing system
● Create objects representing projects, hosts, applications, work, etc.
● Activate objects to realize (create databases and directories, run servers and clients)
● Simulate various types of failures
● Check correctness of final system state (database, result files, etc.)

    host = Host()
    user = UserUC()
    for i in range(2):
        ProjectUC(users=[user], hosts=[host], redundancy=5,
                  short_name="test_1sec_%d" % i,
                  resource_share=[1, 5][i])
    run_check_all()
![Page 27: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/27.jpg)
Monitoring/debugging tools
● All backend processes create log files
– web/grep tool for tracking particular WU/result
● Database browsing tools
– summary of activity; entry point for browsing
● Strip charts
– record, graph measures of system health
● Watchdogs
– detect system failures; ring pager
![Page 28: David P. Anderson Space Sciences Laboratory University of California – Berkeley](https://reader036.vdocuments.net/reader036/viewer/2022062409/568145e9550346895db2e9d9/html5/thumbnails/28.jpg)
Summary and status
● BOINC is funded by a 3-year NSF grant
● Computing projects at Space Sciences Lab
– Astropulse (in beta test)
– SETI@home (original, Australian)
● Other projects
– Folding@home
– Climateprediction.net
● Source code is free for noncommercial use