Scientific Computing in the Consumer Digital Infrastructure David P. Anderson Space Sciences Lab University of California, Berkeley The Austin Forum November 7, 2013

Page 1: Scientific Computing in the Consumer Digital Infrastructure David P. Anderson Space Sciences Lab University of California, Berkeley The Austin Forum November

Scientific Computing in the Consumer Digital Infrastructure

David P. Anderson

Space Sciences Lab
University of California, Berkeley

The Austin Forum
November 7, 2013

Science needs computing power

● High-performance computing
● High-throughput computing
  – Thousands or millions of independent jobs
  – What matters is the rate of job completion, not the turnaround time of individual jobs
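The difference between throughput and turnaround can be illustrated with a small calculation. This is only a sketch: the host count, job length, and job total below are hypothetical numbers chosen for illustration, not figures from the talk.

```python
# Hypothetical numbers, chosen only to illustrate the distinction.
hosts = 1_000            # independent machines, each running one job at a time
hours_per_job = 10.0     # turnaround time of a single job on one host
total_jobs = 1_000_000   # size of the whole batch

# In steady state, each host finishes one job every `hours_per_job` hours.
throughput = hosts / hours_per_job            # jobs completed per hour
hours_to_finish = total_jobs / throughput     # wall-clock time for the batch

print(f"turnaround per job: {hours_per_job} h")
print(f"throughput: {throughput:.0f} jobs/h")
print(f"time to finish {total_jobs:,} jobs: {hours_to_finish:,.0f} h")
```

Adding hosts raises the throughput and shortens the batch, even though no individual job finishes any sooner.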

High-throughput computing applications

● Physical simulation
  – particle collision
  – atomic/molecular (bio, nano)
  – Earth climate system
● Compute-intensive data analysis
  – particle physics (LHC)
  – astrophysics (radio, gravitational)
  – genomics
● Bio-inspired optimization
  – genetic algorithms, flocking, ant colony etc.

Approaches to HTC

● Cluster computing
  – lots of commodity or rack-mounted PCs in a room
● Grid computing
  – share clusters between organizations
● Cloud computing
  – rent cluster nodes, e.g. Amazon EC2
● Volunteer computing
  – use computers owned by consumers

The Consumer Digital Infrastructure

● Computing devices
  – Desktop and laptop computers
  – Mobile devices: tablets, smartphones
  – Game consoles
  – Set-top boxes, DVRs
  – Appliances
● Commodity Internet
  – Cable, DSL, fiber to the home, cell networks

Measures of computing speed

● Floating-point operation (FLOP)
● GigaFLOPS (10^9/sec): 1 Central Processing Unit (CPU)
● TeraFLOPS (10^12/sec): 1 Graphics Processing Unit (GPU)
● PetaFLOPS (10^15/sec): 1 supercomputer
● ExaFLOPS (10^18/sec): current Holy Grail

CDI performance potential

● 1 billion desktop/laptop PCs
  – CPUs: 10 ExaFLOPS
  – GPUs: 1,000 ExaFLOPS
● 2.5 billion smartphones
  – CPUs: 10 ExaFLOPS
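As a sanity check, the totals on this slide can be divided by the device counts to see what per-device speed they imply. The sketch below only rearranges the slide's own numbers; the helper name `per_device_gflops` is made up for illustration.

```python
EXA = 1e18   # FLOPS in one ExaFLOPS
GIGA = 1e9   # FLOPS in one GigaFLOPS

def per_device_gflops(total_exaflops, devices):
    """Per-device speed (GigaFLOPS) implied by an aggregate total."""
    return total_exaflops * EXA / devices / GIGA

pc_cpu = per_device_gflops(10, 1e9)        # 1 billion PCs, CPUs
pc_gpu = per_device_gflops(1_000, 1e9)     # 1 billion PCs, GPUs
phone_cpu = per_device_gflops(10, 2.5e9)   # 2.5 billion smartphones

print(f"PC CPU:    {pc_cpu:,.0f} GFLOPS each")
print(f"PC GPU:    {pc_gpu:,.0f} GFLOPS each")
print(f"Phone CPU: {phone_cpu:,.0f} GFLOPS each")
```

The implied figures (about 10 GigaFLOPS per PC CPU, 1 TeraFLOPS per GPU, 4 GigaFLOPS per phone) are consistent with the orders of magnitude given under "Measures of computing speed".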

Volunteer computing

● Consumers donate computing capacity to
  – support science
  – be in a community
  – compete
● History
  – 1997: GIMPS, distributed.net
  – 1999: SETI@home, Folding@home
  – 2003: BOINC

Limiting factors

● Volunteership
  – Study of college students [Toth 2006]
    ● 5% would “definitely participate”
    ● 10% would “possibly participate”
● PC availability
  – 65% average availability [Kondo 2008]
  – 35% of PCs are available 24/7
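One rough way to read these limits is to scale the 10 ExaFLOPS PC CPU potential from the earlier slide by participation and availability. This is a back-of-the-envelope sketch, not a calculation from the talk: it assumes the college-student survey generalizes to all PC owners and that participation and availability are independent, neither of which the slides claim.

```python
def effective_exaflops(potential_exaflops, participation, availability):
    """Scale a raw potential by the share of owners who volunteer and
    by the average fraction of time their PCs are available."""
    return potential_exaflops * participation * availability

PC_CPU_POTENTIAL = 10   # ExaFLOPS, from the "CDI performance potential" slide
AVAILABILITY = 0.65     # average availability [Kondo 2008]

# Assumption: only the "definitely participate" group volunteers (5%).
low = effective_exaflops(PC_CPU_POTENTIAL, 0.05, AVAILABILITY)
# Assumption: the "possibly participate" group joins as well (5% + 10%).
high = effective_exaflops(PC_CPU_POTENTIAL, 0.15, AVAILABILITY)

print(f"{low:.3f} to {high:.3f} ExaFLOPS")
```

Under these assumptions the CPU-only figure lands between roughly a third of an ExaFLOPS and one ExaFLOPS.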

Other limiting factors

● Network bandwidth (client, server)
  – Commodity Internet
● Memory, disk usage
  – new PCs average 6 GB RAM

BOINC: middleware for volunteer computing

● Supported by NSF since 2002
● Open source (LGPL)
● Based at University of California, Berkeley
● http://boinc.berkeley.edu

Volunteer computing with BOINC

[Diagram: volunteers are linked by attachments to projects such as CPDN, LHC@home, and WCG]

How to volunteer

Choose projects

Configure

Community

Creating a BOINC project

● Install BOINC server software on a Linux box
● Compile apps for Windows/Mac/Linux
● Attract volunteers
  – develop web site
  – generate publicity
  – communicate with volunteers

Volunteer computing today

● 500,000 active computers
● 50 projects
● 15 PetaFLOPS average
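These figures can be put next to the potential quoted earlier in the talk. The sketch below uses only numbers from the slides; comparing against the 10 ExaFLOPS PC CPU figure alone is a simplifying choice made for this example.

```python
PETA, GIGA, EXA = 1e15, 1e9, 1e18

avg_flops = 15 * PETA   # 15 PetaFLOPS average
computers = 500_000     # active computers

# Average contribution of one active computer.
per_computer_gflops = avg_flops / computers / GIGA

# Share of the 10 ExaFLOPS PC CPU potential from the earlier slide.
share_of_pc_cpu_potential = avg_flops / (10 * EXA)

print(f"{per_computer_gflops:.0f} GFLOPS per active computer")
print(f"{share_of_pc_cpu_potential:.2%} of the PC CPU potential")
```

On these numbers, volunteer computing in 2013 was using well under one percent of what consumer PCs could in principle supply.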

Some BOINC-based projects

● IBM World Community Grid
● Einstein@home
● Climateprediction.net
● LHC@home
● Rosetta@home

Cost

The cost of 10 TeraFLOPS for 1 year:
● CPU cluster: $1.5M
● Amazon EC2: $4M
  – 5,000 small instances
● Volunteer: ~$0.1M
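The comparison is easier to see when normalized. The following sketch simply divides the slide's figures; the variable names are illustrative.

```python
TFLOPS = 10  # capacity being priced, for one year

costs = {               # dollars per year, from the slide
    "CPU cluster": 1.5e6,
    "Amazon EC2": 4e6,
    "Volunteer": 0.1e6,
}

# Cost of one TeraFLOPS for one year under each approach.
per_tflops_year = {name: cost / TFLOPS for name, cost in costs.items()}

# How many times more expensive each approach is than volunteer computing.
ratio_vs_volunteer = {name: cost / costs["Volunteer"] for name, cost in costs.items()}

# Implied yearly price of one of the 5,000 small EC2 instances.
ec2_per_instance_year = costs["Amazon EC2"] / 5_000

for name in costs:
    print(f"{name}: ${per_tflops_year[name]:,.0f} per TFLOPS-year, "
          f"{ratio_vs_volunteer[name]:.0f}x volunteer")
print(f"EC2 small instance: ${ec2_per_instance_year:,.0f} per year")
```

By these figures a cluster costs about 15 times as much as volunteer computing for the same capacity, and EC2 about 40 times as much.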

How BOINC works

[Diagram: the BOINC client on a home PC talks over HTTP to the project's BOINC server. Steps: get jobs; download data, executables; compute; upload outputs]
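The cycle in the diagram (get jobs, download, compute, upload) can be sketched as a toy simulation. Everything here is hypothetical: `ToyServer`, `Job`, and `run_client` are stand-ins invented for this example, there is no real HTTP traffic, and none of it is the BOINC API.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: int
    data: list   # stands in for downloaded input files

@dataclass
class ToyServer:
    """Hypothetical stand-in for a project server; not the BOINC API."""
    pending: list
    results: dict = field(default_factory=dict)

    def get_jobs(self, max_jobs):
        # Hand out up to `max_jobs` jobs and remove them from the queue.
        batch, self.pending = self.pending[:max_jobs], self.pending[max_jobs:]
        return batch

    def upload(self, job_id, output):
        self.results[job_id] = output

def run_client(server, batch_size=2):
    """Toy client loop: get jobs, compute, upload, until no work is left."""
    while True:
        jobs = server.get_jobs(batch_size)   # "get jobs" + "download data"
        if not jobs:
            break
        for job in jobs:
            output = sum(x * x for x in job.data)   # "compute" (placeholder work)
            server.upload(job.job_id, output)       # "upload outputs"

server = ToyServer(pending=[Job(i, [i, i + 1]) for i in range(5)])
run_client(server)
print(server.results)
```

The point of the sketch is the direction of control: the client pulls work from the server and pushes results back, which suits home PCs on commodity Internet connections.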

Issues handled by BOINC

● Heterogeneous computers
● Untrusted, anonymous computers
  – Result validation
    ● replication, adaptive replication
● Credit: amount of work done
● Consumer-friendly client
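Validation by replication can be sketched as a quorum check: the same job is sent to several computers, and a result is accepted only if enough of them agree. This is a minimal illustration, not BOINC's validator; the function name `validate` and the exact-equality comparison are assumptions made for the example, and a real project may need an application-specific comparison (for instance, tolerating small floating-point differences). Adaptive replication is not modeled here.

```python
from collections import Counter

def validate(results, quorum=2):
    """Return the agreed result if at least `quorum` replicas match, else None.

    `results` holds the outputs returned by different computers for the
    same job. Outputs are compared by exact equality (an assumption).
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

print(validate([42, 42, 7]))   # two replicas agree
print(validate([42, 7]))       # no agreement yet; more replicas are needed
```

When no quorum is reached, a server following this scheme would issue the job to additional computers until enough matching results arrive.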

Using GPUs

● BOINC detects and schedules GPUs
  – NVIDIA, AMD, Intel
  – multiple/mixed GPUs
  – various language systems (CUDA, OpenCL, CAL)
● Issues
  – non-preemptive GPU scheduling
  – no paging of GPU memory

Multicore apps

● Next-generation PCs may have 100 cores
● BOINC supports multi-core apps
  – OpenMP, MPI
  – OpenCL CPU apps

Using VM technology

● CDI platforms:
  – 85% Windows
  – 7% Linux
  – 7% Mac OS X
● Developing and maintaining versions for different platforms is hard
● Even making a portable Linux executable is hard

Virtual machines

[Diagram: an application runs in a guest operating system, which runs on the host operating system]

Virtual machines

[Diagram: example with Windows 7 as the host, Debian Linux 2.6 as the guest, and the application running in the guest]

BOINC VM support

● Create a VM image for your favorite environment
● Create executables for that environment

[Diagram: BOINC client, Vboxwrapper, VirtualBox executive, and a VM instance with a shared directory holding the executable and the input and output files]

VM advantages

● Develop in your favorite environment
  – No need for multiple versions
● A VM is a strong “sandbox”
  – Can run untrusted applications
● Free “checkpointing”

BOINC on Android

● New GUI
● Battery-related issues
● Released July 2013
  – Google, Amazon App Stores
  – ~50K active devices

Why hasn’t volunteer computing gained traction?

● “Ecosystem of projects” model
  – Lots of competing projects
● Problems with this model
  – Creating/operating a project is too hard and risky
  – Volunteers need simplicity
  – No coherent PR; too many brands

Umbrella projects

● One project serves many scientists
● Examples
  – CAS@home (Chinese Academy of Science)
  – World Community Grid (IBM)
  – U. of Westminster (desktop grid)
  – Ibercivis (Spanish consortium)

Integrating BOINC

● HTCondor (U. of Wisconsin)
  – Goal: BOINC-based back end for Open Science Grid or any Condor pool

[Diagram: job submission, HTCondor node (grid manager, BOINC GAHP), BOINC server]

Integrating BOINC

● HUBzero (Purdue)
  – Goal: BOINC-based back end for science portals such as nanoHUB

[Diagram: Hub, projects, BOINC server, PCs]

Proposal: Science@home

● Single “brand” for volunteer computing
● Volunteers register for science areas rather than projects
● How to allocate computing power?
  – Involve the HPC, scientific funding communities

Implementing Science@home

● BOINC “account manager” architecture

[Diagram: BOINC client, Science@home, projects]

Summary

● Volunteer computing is
  – Usable for most HTC applications
  – A path to ExaFLOPS computing
  – A way to popularize science
● BOINC provides the software infrastructure
● Barriers are largely organizational

Contacts

● http://boinc.berkeley.edu
● [email protected]