scientific computing in the consumer digital infrastructure david p. anderson space sciences lab...
TRANSCRIPT
Scientific Computing in theConsumer Digital Infrastructure
David P. Anderson
Space Sciences LabUniversity of California, Berkeley
The Austin ForumNovember 7, 2013
Science needs computing power
● High-performance computing● High-throughput computing
– Thousands or millions of independent jobs
– What matters is the rate of job completion, not the turnaround time of individual jobs
High-throughput computing applications
● Physical simulation
– particle collision– atomic/molecular (bio, nano)– Earth climate system
● Compute-intensive data analysis
– particle physics (LHC)– Astrophysics (radio, gravitational)– genomics
● Bio-inspired optimization
– genetic algorithms, flocking, ant colony etc.
Approaches to HTC
● Cluster computing– lots of commodity or rack-mounted PCs in a
room● Grid computing
– share clusters between organizations● Cloud computing
– rent cluster nodes, e.g. Amazon EC2● Volunteer computing
– use computers owned by consumers
The Consumer Digital Infrastructure
● Computing devices– Desktop and laptop computers– Mobiles devices: tablets, smart phones– Game consoles– Set-top boxes, DVRs– Appliances
● Commodity Internet– Cable, DSL, fiber to the home, cell networks
Measures of computing speed
● Floating-point operation (FLOP)● GigaFLOPS (109/sec): 1 Central Processing Unit (CPU)● TeraFLOPS (1012/sec): 1 Graphics Processing Unit
(GPU)● PetaFLOPS (1015/sec): 1 supercomputer● ExaFLOPS (1018/sec): current Holy Grail
CDI performance potential
● 1 billion Desktop/laptop PCs– CPUs: 10 ExaFLOPS– GPUs: 1,000 ExaFLOPS
● 2.5 billion smartphones– CPUs: 10 ExaFLOPS
Volunteer computing
● Consumers donate computing capacity to– support science– be in a community– compete
● History– 1997: GIMPS, distributed.net– 1999: SETI@home, Folding@home– 2003: BOINC
Limiting factors
● Volunteership– Study of college students [Toth 2006]
● 5% would “definitely participate”● 10% would “possible participate”
● PC availability– 65% average availability [Kondo 2008]– 35% of PCs are available 24/7
Other limiting factors
● Network bandwidth (client, server)– Commodity Internet
● Memory, disk usage– new PCs average 6 GB RAM
BOINC: middleware for volunteer computing
● Supported by NSF since 2002● Open source (LGPL)● Based at University of California, Berkeley● http://boinc.berkeley.edu
Volunteer computing with BOINC
volunteers projects
CPDN
LHC@home
WCGattachments
How to volunteer
Choose projects
Configure
Community
Creating a BOINC project
● Install BOINC server software on a Linux box
● Compile apps for Windows/Mac/Linux● Attract volunteers
– develop web site– generate publicity– communicate with volunteers
Volunteer computing today
● 500,000 active computers● 50 projects● 15 PetaFLOPS average
Some BOINC-based projects
● IBM World Community Grid● Einstein@home● Climateprediction.net● LHC@home● Rosetta@home
Cost
The cost of 10 TeraFLOPS for 1 year:● CPU cluster: $1.5M● Amazon EC2: $4M
– 5,000 small instances● Volunteer: ~ $0.1M
How BOINC works
home PC
BOINCclient
project
HTTP
download data, executables
compute
upload outputs
BOINCserver
get jobs
Issues handled by BOINC
● Heterogeneous computers● Untrusted, anonymous computers
– Result validation● replication, adaptive replication
● Credit: amount of work done● Consumer-friendly client
Using GPUs
● BOINC detects and schedules GPUs– NVIDIA, AMD, Intel– multiple/mixed GPUs– various language systems (CUDA, OpenCL,
CAL)● Issues
– non-preemptive GPU scheduling– no paging of GPU memory
Multicore apps
● Next-generation PCs may have 100 cores● BOINC supports multi-core apps
– OpenMP, MPI– OpenCL CPU apps
Using VM technology
● CDI platforms:– 85% Windows– 7% Linux– 7% Mac OS X
● Developing and maintaining versions for different platforms is hard
● Even making a portable Linux executable is hard
Virtual machines
Host operating system
Guest operating system
application
Virtual machines
Windows 7
Debian Linux 2.6
application
BOINC VM support
● Create a VM image for your favorite environment
● Create executables for that environment
BOINCclient
VirtualBoxexecutive
Vboxwrapper
VM instanceshared directory:executableinput, output files
VM advantages
● Develop in your favorite environment– No need for multiple versions
● A VM is a strong “sandbox”– Can run untrusted applications
● Free “checkpointing”
BOINC on Android
● New GUI● Battery-related issues● Released July 2013
– Google, Amazon App Stores– ~50K active devices
Why hasn’t volunteer computing gained traction?
● “Ecosystem of projects” model– Lots of competing projects
● Problems with this model– Creating/operating a project is too hard and
risky– Volunteers need simplicity– No coherent PR; too many brands
Umbrella projects
● One project serves many scientists● Examples
– CAS@home (Chinese Academy of Science)– World Community Grid (IBM)– U. of Westminster (desktop grid)– Ibercivis (Spanish consortium)
Integrating BOINC
● HTCondor (U. of Wisconsin)– Goal: BOINC-based back end for Open
Science Grid or any Condor pool
BOINCserver
HTCondor node
Grid manager
BOINC GAHP
Job submission
Integrating BOINC
● HUBzero (Purdue)– Goal: BOINC-based back end for science
portals such as nanoHUB
BOINCserver
HubprojectsprojectsPCs
Proposal: Science@home
● Single “brand” for volunteer computing● Volunteers register for science areas
rather than projects● How to allocate computing power?
– Involve the HPC, scientific funding communities
projectsprojects
Implementing Science@home
● BOINC “account manager” architecture
Science@home
BOINCclient
projects
Summary
● Volunteer computing is– Usable for most HTC applications– A path to ExaFLOPS computing– A way to popularize science
● BOINC provides the software infrastructure
● Barriers are largely organizational
Contacts
● http://boinc.berkeley.edu● [email protected]