

Scientific Computing in the Consumer Digital Infrastructure

David P. Anderson

Space Sciences Lab
University of California, Berkeley

The Austin Forum
November 7, 2013

Science needs computing power

● High-performance computing
● High-throughput computing
  – Thousands or millions of independent jobs
  – What matters is the rate of job completion, not the turnaround time of individual jobs

High-throughput computing applications

● Physical simulation
  – particle collision
  – atomic/molecular (bio, nano)
  – Earth climate system
● Compute-intensive data analysis
  – particle physics (LHC)
  – Astrophysics (radio, gravitational)
  – genomics
● Bio-inspired optimization
  – genetic algorithms, flocking, ant colony etc.

Approaches to HTC

● Cluster computing
  – lots of commodity or rack-mounted PCs in a room
● Grid computing
  – share clusters between organizations
● Cloud computing
  – rent cluster nodes, e.g. Amazon EC2
● Volunteer computing
  – use computers owned by consumers

The Consumer Digital Infrastructure

● Computing devices
  – Desktop and laptop computers
  – Mobile devices: tablets, smart phones
  – Game consoles
  – Set-top boxes, DVRs
  – Appliances
● Commodity Internet
  – Cable, DSL, fiber to the home, cell networks

Measures of computing speed

● Floating-point operation (FLOP)
● GigaFLOPS (10^9/sec): 1 Central Processing Unit (CPU)
● TeraFLOPS (10^12/sec): 1 Graphics Processing Unit (GPU)
● PetaFLOPS (10^15/sec): 1 supercomputer
● ExaFLOPS (10^18/sec): current Holy Grail

CDI performance potential

● 1 billion desktop/laptop PCs
  – CPUs: 10 ExaFLOPS
  – GPUs: 1,000 ExaFLOPS
● 2.5 billion smartphones
  – CPUs: 10 ExaFLOPS
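As a quick check on these aggregates, the sketch below divides each total by the device count to see what average per-device speed it implies. The per-device figures are derived here by division and are not stated on the slide; the helper name `per_device_gflops` is made up for illustration.

```python
# Aggregate figures from the slide, expressed in FLOPS.
EXA = 10**18
GIGA = 10**9

pcs = 1_000_000_000        # 1 billion desktop/laptop PCs
phones = 2_500_000_000     # 2.5 billion smartphones

pc_cpu_total = 10 * EXA
pc_gpu_total = 1_000 * EXA
phone_cpu_total = 10 * EXA


def per_device_gflops(total_flops: int, devices: int) -> float:
    """Average per-device speed (GigaFLOPS) implied by an aggregate figure."""
    return total_flops / devices / GIGA


print("PC CPU:    ", per_device_gflops(pc_cpu_total, pcs), "GFLOPS")
print("PC GPU:    ", per_device_gflops(pc_gpu_total, pcs), "GFLOPS")
print("Phone CPU: ", per_device_gflops(phone_cpu_total, phones), "GFLOPS")
```

The implied averages (about 10 GFLOPS per PC CPU, 1 TFLOPS per PC GPU, 4 GFLOPS per phone) line up with the orders of magnitude given on the "Measures of computing speed" slide.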

Volunteer computing

● Consumers donate computing capacity to
  – support science
  – be in a community
  – compete
● History
  – 1997: GIMPS, distributed.net
  – 1999: SETI@home, Folding@home
  – 2003: BOINC

Limiting factors

● Volunteership
  – Study of college students [Toth 2006]
    ● 5% would “definitely participate”
    ● 10% would “possibly participate”
● PC availability
  – 65% average availability [Kondo 2008]
  – 35% of PCs are available 24/7
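To see how much these two factors could cut the CPU potential, here is a back-of-envelope sketch. It rests on loud assumptions that the slides do not make: that the 5% "definitely participate" rate from the college-student study applies to all PC owners, that the 65% average availability applies uniformly, and that the starting point is the 10 ExaFLOPS PC CPU figure from the "CDI performance potential" slide. The function name is hypothetical.

```python
EXA = 10**18
PETA = 10**15


def effective_flops(potential_flops: float,
                    participation: float,
                    availability: float) -> float:
    """Scale a raw potential by participation rate and average availability."""
    return potential_flops * participation * availability


# Assumption: slide figures combined multiplicatively, purely for illustration.
estimate = effective_flops(10 * EXA, participation=0.05, availability=0.65)
print(f"{estimate / PETA:.0f} PetaFLOPS")
```

Under these assumptions the usable capacity is a few hundred PetaFLOPS, a small fraction of the raw potential.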

Other limiting factors

● Network bandwidth (client, server)
  – Commodity Internet
● Memory, disk usage
  – new PCs average 6 GB RAM

BOINC: middleware for volunteer computing

● Supported by NSF since 2002
● Open source (LGPL)
● Based at University of California, Berkeley
● http://boinc.berkeley.edu

Volunteer computing with BOINC

[Diagram: volunteers linked by attachments to projects (CPDN, LHC@home, WCG)]

How to volunteer

Choose projects

Configure

Community

Creating a BOINC project

● Install BOINC server software on a Linux box
● Compile apps for Windows/Mac/Linux
● Attract volunteers
  – develop web site
  – generate publicity
  – communicate with volunteers

Volunteer computing today

● 500,000 active computers
● 50 projects
● 15 PetaFLOPS average

Some BOINC-based projects

● IBM World Community Grid
● Einstein@home
● Climateprediction.net
● LHC@home
● Rosetta@home

Cost

The cost of 10 TeraFLOPS for 1 year:
● CPU cluster: $1.5M
● Amazon EC2: $4M
  – 5,000 small instances
● Volunteer: ~$0.1M
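The comparison can be normalized to cost per TeraFLOPS-year, and the EC2 line can be broken down per instance. The sketch below uses only the slide's figures; the per-instance values are derived by division and assume a 365-day year with instances running continuously.

```python
TFLOPS = 10  # capacity being priced, per the slide

# Yearly cost in USD for 10 TeraFLOPS (the volunteer figure is approximate).
costs_usd = {
    "CPU cluster": 1_500_000,
    "Amazon EC2": 4_000_000,
    "Volunteer": 100_000,
}

# Cost of one TeraFLOPS for one year under each approach.
cost_per_tflops_year = {name: cost / TFLOPS for name, cost in costs_usd.items()}

# How many times more expensive each approach is than volunteer computing.
ratio_to_volunteer = {
    name: cost / costs_usd["Volunteer"] for name, cost in costs_usd.items()
}

# EC2 breakdown: 5,000 small instances share the cost and the capacity.
ec2_instances = 5_000
hours_per_year = 365 * 24
ec2_usd_per_instance_hour = costs_usd["Amazon EC2"] / ec2_instances / hours_per_year
ec2_gflops_per_instance = TFLOPS * 1_000 / ec2_instances

for name in costs_usd:
    print(f"{name}: ${cost_per_tflops_year[name]:,.0f} per TFLOPS-year, "
          f"{ratio_to_volunteer[name]:.0f}x volunteer")
print(f"EC2: ${ec2_usd_per_instance_hour:.3f} per instance-hour, "
      f"{ec2_gflops_per_instance:.0f} GFLOPS per instance")
```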

How BOINC works

[Diagram: the BOINC client on a home PC talks over HTTP to the project's BOINC server: get jobs, download data and executables, compute, upload outputs]
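The cycle in the diagram (get jobs, download, compute, upload) can be sketched as a toy simulation. This is not BOINC code and uses no BOINC API: `ToyServer`, `Job`, `APPS`, and `client_pass` are hypothetical names, the "server" is an in-memory object rather than an HTTP endpoint, and "downloading an executable" is reduced to a dictionary lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    executable: str    # name of the app the job needs
    input_data: list   # stands in for downloaded input files


@dataclass
class ToyServer:
    """In-memory stand-in for a project server (hypothetical)."""
    queue: list = field(default_factory=list)
    results: dict = field(default_factory=dict)

    def get_jobs(self, max_jobs: int) -> list:
        # Hand out up to max_jobs queued jobs and remove them from the queue.
        jobs, self.queue = self.queue[:max_jobs], self.queue[max_jobs:]
        return jobs

    def upload(self, job_id: int, output) -> None:
        self.results[job_id] = output


# Stand-in for executables the client would download from the project.
APPS = {"sum_squares": lambda xs: sum(x * x for x in xs)}


def client_pass(server: ToyServer, max_jobs: int = 2) -> int:
    """One client cycle: get jobs, 'download', compute, upload outputs."""
    jobs = server.get_jobs(max_jobs)
    for job in jobs:
        app = APPS[job.executable]         # download executable
        output = app(job.input_data)       # compute on the input data
        server.upload(job.job_id, output)  # upload the output
    return len(jobs)


server = ToyServer(queue=[
    Job(1, "sum_squares", [1, 2, 3]),
    Job(2, "sum_squares", [4, 5]),
    Job(3, "sum_squares", [6]),
])
while client_pass(server):  # keep asking until the server has no more work
    pass
print(server.results)
```

Note that the client initiates every exchange, which matches the arrows in the diagram: the server never has to reach into the home PC.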

Issues handled by BOINC

● Heterogeneous computers
● Untrusted, anonymous computers
  – Result validation
    ● replication, adaptive replication
● Credit: amount of work done
● Consumer-friendly client
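Validation by replication can be illustrated with a minimal sketch: the same job is sent to several hosts, and an output is accepted only if enough of them agree. The function names, the quorum of 2, and the use of exact equality are assumptions for illustration (numerical outputs might need a tolerance-based comparison); this is not BOINC's validator.

```python
from collections import Counter


def validate(results: dict, quorum: int = 2):
    """Return the agreed output if at least `quorum` replicas match, else None.

    `results` maps a host identifier to the output that host returned.
    """
    if not results:
        return None
    output, votes = Counter(results.values()).most_common(1)[0]
    return output if votes >= quorum else None


def agreeing_hosts(results: dict, canonical) -> list:
    """Hosts whose output matches the accepted (canonical) output."""
    return sorted(host for host, out in results.items() if out == canonical)


# Three replicas of one job; host_c returned a wrong answer.
replicas = {"host_a": 42, "host_b": 42, "host_c": 41}
canonical = validate(replicas)
print(canonical, agreeing_hosts(replicas, canonical))

# Two replicas that disagree: no quorum, so more replicas would be needed.
print(validate({"host_a": 1, "host_b": 2}))
```

Adaptive replication, as named on the slide, would reduce this overhead; one plausible reading is that jobs are replicated less often for hosts with a record of agreeing results, but the sketch above does not model that.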

Using GPUs

● BOINC detects and schedules GPUs
  – NVIDIA, AMD, Intel
  – multiple/mixed GPUs
  – various language systems (CUDA, OpenCL, CAL)
● Issues
  – non-preemptive GPU scheduling
  – no paging of GPU memory

Multicore apps

● Next-generation PCs may have 100 cores
● BOINC supports multi-core apps
  – OpenMP, MPI
  – OpenCL CPU apps

Using VM technology

● CDI platforms:
  – 85% Windows
  – 7% Linux
  – 7% Mac OS X

● Developing and maintaining versions for different platforms is hard

● Even making a portable Linux executable is hard

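Virtual machines

[Diagram: an application running on a guest operating system, on top of a host operating system]

Virtual machines

[Diagram, example: an application running on Debian Linux 2.6 as the guest, on top of Windows 7 as the host]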

BOINC VM support

● Create a VM image for your favorite environment
● Create executables for that environment

[Diagram: BOINC client, Vboxwrapper, VirtualBox executive, and a VM instance with a shared directory holding the executable and the input and output files]

VM advantages

● Develop in your favorite environment
  – No need for multiple versions
● A VM is a strong “sandbox”
  – Can run untrusted applications

● Free “checkpointing”

BOINC on Android

● New GUI
● Battery-related issues
● Released July 2013
  – Google, Amazon App Stores
  – ~50K active devices

Why hasn’t volunteer computing gained traction?

● “Ecosystem of projects” model
  – Lots of competing projects
● Problems with this model
  – Creating/operating a project is too hard and risky
  – Volunteers need simplicity
  – No coherent PR; too many brands

Umbrella projects

● One project serves many scientists
● Examples
  – CAS@home (Chinese Academy of Science)
  – World Community Grid (IBM)
  – U. of Westminster (desktop grid)
  – Ibercivis (Spanish consortium)

Integrating BOINC

● HTCondor (U. of Wisconsin)
  – Goal: BOINC-based back end for Open Science Grid or any Condor pool

[Diagram: job submission, HTCondor node, grid manager, BOINC GAHP, BOINC server]

Integrating BOINC

● HUBzero (Purdue)
  – Goal: BOINC-based back end for science portals such as nanoHUB

[Diagram: Hub, BOINC server, projects, PCs]

Proposal: Science@home

● Single “brand” for volunteer computing
● Volunteers register for science areas rather than projects
● How to allocate computing power?
  – Involve the HPC, scientific funding communities

[Diagram: projects]

Implementing Science@home

● BOINC “account manager” architecture

[Diagram: Science@home, BOINC client, projects]
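One way to picture the proposal: the volunteer picks science areas, and a Science@home service translates them into the set of projects the client should work for. The sketch below is hypothetical throughout. The area-to-project table, the function name, and the equal-share allocation are assumptions (the talk leaves allocation as an open question), and nothing here reflects the actual account manager protocol.

```python
# Hypothetical catalogue: science area -> projects working in that area.
AREA_TO_PROJECTS = {
    "astrophysics": ["Einstein@home"],
    "particle physics": ["LHC@home"],
    "climate": ["Climateprediction.net"],
    "biology": ["Rosetta@home"],
}


def projects_for(areas: list) -> dict:
    """Map a volunteer's chosen areas to {project: share of computing power}.

    Assumption: power is split equally among all matching projects.
    """
    projects = []
    for area in areas:
        for project in AREA_TO_PROJECTS.get(area, []):
            if project not in projects:  # avoid duplicates across areas
                projects.append(project)
    if not projects:
        return {}
    share = 1 / len(projects)
    return {project: share for project in projects}


print(projects_for(["climate", "biology"]))
```

The point of the indirection is that the mapping can change over time without the volunteer doing anything, since the volunteer only ever chose areas.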

Summary

● Volunteer computing is
  – Usable for most HTC applications
  – A path to ExaFLOPS computing
  – A way to popularize science

● BOINC provides the software infrastructure

● Barriers are largely organizational

Contacts

● http://boinc.berkeley.edu
● davea@ssl.berkeley.edu
