Data, Data Everywhere Why We Need Broadband Connectivity By Ruzena Bajcsy


Page 1: Data, Data Everywhere

Data, Data Everywhere

Why We Need Broadband Connectivity

By Ruzena Bajcsy

Page 2: Data, Data Everywhere

Who Generates the Data?

• Astronomers
• Biologists
• High Energy Physicists
• Geophysicists
• Archeologists and Anthropologists
• Psychologists
• Engineers
• Artists

Page 3: Data, Data Everywhere

A Year of Innovation and Accomplishment

UC Santa Cruz

Center for Information Technology Research in the Interest of Society

Page 4: Data, Data Everywhere

Solving Societal-Scale Problems

Energy Conservation
Emergency Response and Homeland Defense
Transportation Efficiency

Page 5: Data, Data Everywhere

Solving Societal-Scale Problems

Monitoring Health Care
Land and Environment
Education

Page 6: Data, Data Everywhere

Societal-Scale Systems

“Client”

“Server”

Clusters

Massive Cluster

Gigabit Ethernet

Secure, non-stop utility
Diverse components
Adapts to interfaces/users
Always connected

MEMS Sensors

Scalable, Reliable, Secure Services

Information Appliances

Page 7: Data, Data Everywhere

[Photos: mote hardware generations — February 2000, February 2001, August 2001, February 2002]

Page 8: Data, Data Everywhere

Seismic Monitoring of Buildings: Before CITRIS

$8,000 each

Page 9: Data, Data Everywhere

Seismic Monitoring of Buildings: With CITRIS Wireless Motes

$70 each

Page 10: Data, Data Everywhere

Ad-hoc sensor networks work

• 29 Palms Marine Base, March 2001
  – 10 Motes dropped from an airplane landed, formed a wireless network, detected passing vehicles, and radioed information back

• Intel Developers Forum, Aug 2001
  – 800 Motes running TinyOS hidden in auditorium seats started up and formed a wireless network as participants passed them around

• tinyos.millennium.berkeley.edu

Page 11: Data, Data Everywhere

Recent Progress:

Energy Efficiency and

Smart Buildings

Arens, Culler, Pister, Orens, Rabaey, Sastry

Page 12: Data, Data Everywhere

The Inelasticity of California’s Electrical Supply

[Chart: power-exchange market price for electricity ($/MWh, 0–800) versus load (20,000–45,000 MW), California, Summer 2000]

Page 13: Data, Data Everywhere

How to Address the Inelasticity of the Supply

• Spread demand over time (or reduce peak)
  – Make cost of energy
    • visible to end-user
    • a function of the load curve (e.g. hourly pricing)
  – “demand-response” approach

• Reduce average demand (demand side)
  – Eliminate wasteful consumption
  – Improve efficiency of equipment and appliances

• Improve efficiency of generation and distribution network (supply side)

Enabled by Information!
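The "demand-response" idea lends itself to a small sketch. The Python below is purely illustrative — the price function, the knee load, and the load numbers are hypothetical, not from the talk: the per-kWh cost rises with grid load, and a smart appliance schedules deferrable work into the cheapest hours.

```python
# Hypothetical demand-response price signal: flat $/MWh price below a
# "knee" load, rising linearly above it. All constants are made up.

def hourly_price(load_mw, base=30.0, knee=24000.0, slope=0.02):
    """Price in $/MWh as a function of instantaneous grid load."""
    return base + max(0.0, load_mw - knee) * slope

def cheapest_hours(load_curve, k):
    """Indices of the k hours with the lowest implied price."""
    return sorted(range(len(load_curve)),
                  key=lambda h: hourly_price(load_curve[h]))[:k]

# A toy 6-hour load curve (MW): demand peaks in hours 2-3.
loads = [25000, 28000, 42000, 45000, 33000, 26000]

# A 2-hour deferrable job (say, a dryer run) lands off-peak:
print(cheapest_hours(loads, 2))  # → [0, 5]
```

Making the price visible to the end-user is exactly what turns this from an open-loop tariff into a feedback loop.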

Page 14: Data, Data Everywhere

Energy Consumption in Buildings (US 1997)

End Use                    Residential   Commercial
Space heating                      6.7          2.0
Space cooling                      1.5          1.1
Water heating                      2.7          0.9
Refrigerator/Freezer               1.7          0.6
Lighting                           1.1          3.8
Cooking                            0.6            -
Clothes dryers                     0.6            -
Color TVs                          0.8            -
Ventilation/Furnace fans           0.4          0.6
Office equipment                     -          1.4
Miscellaneous                      3.0          4.9
Total                             19.0         15.2

Source: Interlaboratory Working Group, 2000

(Units: quads per year; 1 quad/yr ≈ 1.05 EJ/yr)
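As a sanity check on the units, a small sketch converting the residential column to SI (table entries are rounded to 0.1 quad, so the computed sum differs slightly from the printed 19.0 total):

```python
# Residential end uses from the table, in quads/yr; the conversion factor
# is the one stated on the slide (1 quad/yr ≈ 1.05 EJ/yr).
QUAD_TO_EJ = 1.05

residential = {"space heating": 6.7, "space cooling": 1.5,
               "water heating": 2.7, "refrigerator/freezer": 1.7,
               "lighting": 1.1, "cooking": 0.6, "clothes dryers": 0.6,
               "color TVs": 0.8, "ventilation/furnace fans": 0.4,
               "miscellaneous": 3.0}

total_quads = sum(residential.values())   # ≈ 19 (entries are rounded)
total_ej = total_quads * QUAD_TO_EJ

print(f"{total_quads:.1f} quads/yr ≈ {total_ej:.1f} EJ/yr")  # → 19.1 quads/yr ≈ 20.1 EJ/yr
```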

Page 15: Data, Data Everywhere

A Three-Phase Approach

• Phase 1: Passive Monitoring
  – The availability of cheap, connected (wired or wireless) sensors makes it possible for the end-user to monitor the energy usage of buildings and individual appliances and act on it.
  – Primary feedback on usage
  – Monitor health of the system (30% inefficiency!)

• Phase 2: Quasi-Active Monitoring and Control
  – Combining the monitoring information with instantaneous feedback on the cost of usage closes the feedback loop between end-user and supplier.

• Phase 3: Active Energy Management through Feedback and Control – Smart Buildings and Intelligent Appliances
  – Adding instantaneous, distributed control functionality to the sensing and monitoring functions increases energy efficiency and user comfort.

Page 16: Data, Data Everywhere

Cory Hall Energy Monitoring Network

50 nodes on 4th floor
30-second sampling
250K samples to database over 6 weeks
Moved to Intel Lab – come play!

Page 17: Data, Data Everywhere

Smart Buildings

Dense wireless network of sensor, control, and actuator nodes

• Task/ambient conditioning systems allow conditioning in small, localized zones, to be individually controlled by building occupants and environmental conditions

• Joint projects among BWRC/BSAC, Center for the Built Environment (CBE), IEOR, Intel Lab, LBNL

Page 18: Data, Data Everywhere

Control of HVAC Systems
Underfloor Air Distribution vs. Conventional Overhead System

Page 19: Data, Data Everywhere

Control of HVAC Systems

• An underfloor system can save energy because air near the ceiling is allowed to run hotter

• Project with CBE (Arens, Federspiel)
• Need temperature sensors at different heights
• Simulation results
  – Hot August day in Sacramento
  – Underfloor HVAC saves 46% of energy

• Future: test in instrumented room

Page 20: Data, Data Everywhere

More sensors – air velocity

• Uses time of flight of sound to determine 3D air velocity

• Significance
  – Heat transfer (energy)
  – Air quality
  – Perception of temperature
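The time-of-flight principle can be made concrete for one axis. A minimal sketch under assumed geometry — transducer spacing, speed of sound, and the true velocity are made-up numbers: sound travels at c + v downstream and c − v upstream, so the velocity component along the axis falls out of the two transit times; repeating on three orthogonal axes yields the 3D velocity.

```python
# One axis of an ultrasonic anemometer: two transducers a distance L apart
# ping each other in both directions.

def axis_velocity(L, t_down, t_up):
    """Air-velocity component (m/s) along the transducer axis,
    from downstream and upstream times of flight."""
    return (L / 2.0) * (1.0 / t_down - 1.0 / t_up)

# Synthetic check: L = 0.2 m, speed of sound c = 343 m/s, true v = 2 m/s.
L, c, v = 0.2, 343.0, 2.0
t_down, t_up = L / (c + v), L / (c - v)
print(axis_velocity(L, t_down, t_up))  # recovers 2.0 m/s
```

Note the speed of sound itself cancels out, which is why the technique is robust to temperature drift.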

Page 21: Data, Data Everywhere

Smart Dust Goes National

Academia: UCSD, UCLA, USC, MIT, Rutgers, Dartmouth, U. Illinois UC, NCSA, U. Virginia, U. Washington, Ohio State

Industry: Intel, Crossbow, Bosch, Accenture, Mitre, Xerox PARC, Kestrel

Government: National Center of Supercomputing, Wright Patterson AFB

Page 22: Data, Data Everywhere

Why Broadband Connectivity When Memory Is So Cheap?

• Because users want to interact with the data in real time

• Users need to access the data at the right time and at the right place

• They need to access data in the right format
• They want the right amount of data

Page 23: Data, Data Everywhere

Examples

• Distributed computation

• Cluster technology

• The Berkeley Millennium Project

Page 24: Data, Data Everywhere

Cluster Counts

• NOW (circa 1994): 4-proc HP -> 36-proc SPARC10 -> 100-proc Ultra1

• Millennium Central Cluster (Intel donation)
  – 99 Dell 2300/6400/6450 Xeon dual/quad: 332 processors
  – Total: 211 GB memory, 3 TB disk
  – Myrinet 2000 + 1000 Mb fiber Ethernet

• OceanStore/ROC cluster, Astro cluster, Math cluster, Cory cluster, more

• CITRIS Pilot Cluster: 3/2002 deployment (Intel donation)
  – 4 Dell Precision 730 Itanium duals: 8 processors
  – Total: 20 GB memory, 128 GB disk
  – Myrinet 2000 + 1000 Mb copper Ethernet

Page 25: Data, Data Everywhere

Current Network

Page 26: Data, Data Everywhere

CITRIS Network Rollout

Page 27: Data, Data Everywhere

Network Rollout

• Millennium Cluster
  – Keep existing Nortel 1200/1100/8600
  – New Foundry FastIron 1500

• CITRIS Cluster
  – New Foundry FastIron 1500

• Backbone
  – 2 Foundry BigIron 8000

• Cost of expansion $280K (SimMillennium)

Page 28: Data, Data Everywhere

Millennium Cluster Tools

• Rootstock Installation
• Ganglia Cluster Monitoring
• gEXEC – remote execution/load balancing
• Pcp – parallel copying/job staging

All in production, open source, cluster community development on sourceforge.net

Page 29: Data, Data Everywhere

Rootstock Installation Tool

• Installation configuration stored centrally
• Build local cluster-specific root from central root
• Install/reinstall cluster nodes from local rootstock
• http://rootstock.millennium.berkeley.edu/
• Has become the basis for the http://rocks.npaci.edu/ cluster distribution.

Page 30: Data, Data Everywhere

Ganglia Monitoring

• Coherent distributed hash of cluster information
  – Static: CPU speed, total memory, software versions, boot time, upgrade time, etc.
  – Dynamic: load, CPU idle, memory available, system clock, etc.
  – Heartbeat
  – Customizable with a simple API for any other metric
• Data is exchanged in well-defined XML and XDR
• Lightweight – small memory footprint and minimal communication (tunable)
• Scalable – tested on several 512+ node clusters
• Trusted-hosts feature allows clusters of clusters to be linked within a single monitoring and execution domain
• Ported to Linux, FreeBSD, Solaris, AIX, and IRIX; active development by the community for other ports
• Dell Open Cluster Group seriously evaluating this as the basis for their cluster computing tool distribution: “The only monitoring that scales over 64 nodes”
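Because Ganglia exchanges state as well-defined XML, a consumer can be sketched with nothing but the standard library. The XML snippet below is hand-written in the spirit of Ganglia's host/metric report, not captured output, so element and attribute names should be checked against a real gmond before relying on them:

```python
# Sketch: extract one metric per host from a Ganglia-style XML report.
import xml.etree.ElementTree as ET

report = """
<GANGLIA_XML>
 <CLUSTER NAME="millennium">
  <HOST NAME="node01">
   <METRIC NAME="load_one" VAL="0.25"/>
   <METRIC NAME="mem_free" VAL="1048576"/>
  </HOST>
  <HOST NAME="node02">
   <METRIC NAME="load_one" VAL="1.75"/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>
"""

def host_metric(xml_text, metric):
    """Map host name -> metric value for one metric across the cluster."""
    root = ET.fromstring(xml_text)
    return {h.get("NAME"): float(m.get("VAL"))
            for h in root.iter("HOST")
            for m in h.iter("METRIC") if m.get("NAME") == metric}

loads = host_metric(report, "load_one")
least_loaded = min(loads, key=loads.get)  # the pick a load balancer wants
print(least_loaded)  # → node01
```

This host-to-load map is exactly the input that Ganglia-based load balancing (as in gEXEC, below) consumes.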

Page 31: Data, Data Everywhere
Page 32: Data, Data Everywhere

gEXEC – remote execution

• History
  – GLUnix from NOW
  – rEXEC from Millennium
  – gEXEC: UCB/Caltech collaboration
• Lightweight – minimal number of threads on frontend + fanout
• Decentralized – no central point of failure
• Fault tolerant – fallback ability + failure checks at runtime
• Interactive – feels like a single machine
• Load balanced from Ganglia Monitoring data
• Scalable to at least 512 nodes
• Unix authorization plus cluster keys

e.g. gexec –n 3 hostname
     gexec –n 0 render –in input.${VNN} –out output.${VNN}
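The "frontend + fanout" design can be quantified with a toy model: the frontend contacts a few nodes, each of which forwards the command to a few more, so reaching N nodes takes O(log N) forwarding rounds instead of N direct connections. This sketch is illustrative only — gEXEC's actual tree construction and wire protocol are not shown here, and the fanout value is an assumption:

```python
# Toy k-ary fanout tree over nodes 0..n-1, rooted at the frontend (node 0).

def fanout_children(rank, k, n):
    """Nodes to which node `rank` forwards the command."""
    first = rank * k + 1
    return [c for c in range(first, first + k) if c < n]

def rounds_to_reach(n, k):
    """Forwarding rounds until the root's command has reached all n nodes."""
    rounds, frontier, reached = 0, 1, 1
    while reached < n:
        frontier *= k       # each frontier node forwards to k children
        reached += frontier
        rounds += 1
    return rounds

print(fanout_children(0, 8, 512))  # frontend's direct children: nodes 1..8
print(rounds_to_reach(512, 8))     # → 3 rounds with fanout 8
```

With fanout 8, a 512-node cluster is covered in 3 rounds, which is why the "scalable to at least 512 nodes" claim is cheap for the frontend.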

Page 33: Data, Data Everywhere

Pcp – parallel copy

• Newest addition to the cluster suite
• Fanout copy of files/directories to nodes
• Scalable
• Used for job staging
• The future of this tool is to wrap it up as an option in gEXEC.

Page 34: Data, Data Everywhere

• Centre National De La Recherche Scientifique http://www.in2p3.fr
• SDSC http://www.sdsc.edu
• IE&M http://iew3.technion.ac.il/
• GMX http://www.gmx.fr
• CAS, Chemical Abstracts Service http://www.cas.org
• Keldysh Institute of Applied Mathematics (Russia) http://www.kiam1.rssi.ru
• LUCIE (Linux Universal Config. & Install Engine) http://matsu-www.is.titech.ac.jp/~takamiya/lucie/
• Mellanox Technologies http://www.mellanox.co.il/
• TerraSoft Solutions (PowerPC Linux) http://terraplex.com/tss_about.shtml
• Intel http://www.intel.com/
• BellSouth Internet Services http://services.bellsouth.net/external/
• ArrayNetworks http://www.clickarray.com/
• MandrakeSoft http://www.mandrakesoft.com
• Technische Universitat Graz http://www.TUGraz.at/
• GeoCrawler http://www.geocrawler.com/
• Cray http://www.cray.com/
• Unlimited Scale http://www.unlimitedscale.com/
• UCSF Computer Science http://cs.usfca.edu/
• RoadRunner http://www.houston.rr.com
• Veritas Geophysical Integrity http://www.veritasdgc.com
• Dow http://www.dow.com/
• The Max Planck Society for the Advancement of Science http://www.mpg.de
• Lockheed Martin http://www.lockheedmartin.com
• Duke University http://www.duke.edu
• Framestore Computer Film Company http://www.framestore-cfc.com
• nVidia http://www.nvidia.com/
• SAIC http://www.saic.com
• Paralogic http://www.plogic.com/
• Singapore Computer Systems Limited http://www.scs.com.sg/
• Hughes Network Solutions http://www.hns.com
• University of Washington, Computer Science http://www.cs.washington.edu
• Experian http://www.experian.com
• L'Universite de Geneva http://www.unige.ch
• Purdue Physics Department http://www.physics.purdue.edu/
• Atos Origin Engineering Services http://www.aoes.nl/
• Teraport http://www.teraport.se
• Daresbury Laboratory http://www.dl.ac.uk

• Clinica Sierra Vista http://www.clinicasierravista.org
• LondonTown http://www.londontown.com/
• National Hellenic Research Foundation http://www.eie.gr
• RightNow Technologies http://www.rightnow.com/
• Idaho National Engineering and Environmental Laboratory http://www.inel.gov
• WesternGeco http://www.westerngeco.com
• 80/20 Software Tools http://rc.explosive.net
• Optiglobe Brazil http://www.optiglobe.com.br
• Brunel University http://www.brunel.ac.uk
• Cinvestav Instituto Politecnico Nacional http://www.ira.cinvestav.mx
• Conexant http://www.hotrail.com
• Dell http://www.dell.com/
• SuSE Linux http://www.suse.de
• Arabic on Linux http://www.planux.com
• Delgado Community College, New Orleans http://www.dcc.edu
• Boeing http://www.boeing.com
• RedHat http://www.redhat.com/
• University of Pisa, Italy http://www.df.unipi.it
• Ecole Normale Superieure De Lyon http://www.ens-lyon.fr
• iMedium http://www.imedium.com
• Moving Picture Company http://www.moving-picture.com
• Professional Service Super Computers http://www.pssclabs.com
• AlgoNomics http://www.algonomics.com
• Ocimum Biosolutions http://www.ocimumbio.com
• Caltech http://www.caltech.edu
• VitalStream http://www.publichost.com
• Sandia National Laboratory http://www.sandia.gov/
• UC Irvine http://www.uci.edu
• Guide Corporation http://www.guidecorp.com/
• Matav http://www.matav.hu
• Math Tech, Denmark http://www.math-tech.dk
• Istituto Trentino Di Cultura http://www.itc.it
• Compaq http://www.compaq.com/
• National Research Council Canada http://www.nrc.ca
• Overture http://www.overture.com
• Petroleum Geo-Services http://www.pgs.com
• National Research Laboratory of the US Navy http://www.nrl.navy.mil
• White Oak Technologies, Inc. http://www.woti.com/

Known Sites Using the Ganglia Cluster Toolkit
The most popular cluster and distributed-computing software on sourceforge.net
Over 7,000 downloads since the 1/2002 release

Page 35: Data, Data Everywhere

Grid computing

• Working with key cluster software developers from research and industry to standardize cluster tools within the Global Grid Forum (GGF).

Page 36: Data, Data Everywhere

CITRIS Cluster

• Goal is to build a production-level cluster environment that supports and is driven by CITRIS applications
  – NOW: mostly experimental
  – Millennium: ½ developmental, ½ production

• Clusters adopted as primary compute platform
  – ~800 current Millennium users
  – 65% average CPU utilization on the Millennium cluster, many times 100% utilization
  – 50% of the top 20 PACI users compute on Linux clusters for development and production runs

Page 37: Data, Data Everywhere

[Diagram: proposed CITRIS cluster — 100 dual-Itanium compute nodes (1 TFlop, 1.6 TB memory), 10 storage nodes attached to 50 TB of Fibre Channel storage, and 2 frontend nodes, interconnected by Myrinet 2000 and Gigabit Ethernet through Foundry 8000 and Foundry 1500 switches to the campus core.]

Page 38: Data, Data Everywhere

Steve Brenner Project: Large Molecular Sequence and Structure Databases

• These databases are gigabytes in size
• They provide web services in which low latency is important
• They often work remotely
• The campus 70 Mbit limit is increasingly saturated, making it impossible to effectively provide services and do the work
• They need tele/video conferencing over IP

Page 39: Data, Data Everywhere

Background of the Brain Imaging Center at Berkeley

• Campus-wide resource dedicated to functional Magnetic Resonance Imaging (fMRI) research
• Non-invasive “neuroimaging” technique used to investigate the blood-flow correlates of neural activity
• BIC houses a Varian 4 Tesla scanner and a Neuroimaging Computational Facility, providing collaboration among neuroscientists, physicists, chemists, statisticians, and EE and CS scientists

Page 40: Data, Data Everywhere

Current LAN

• Due to high volume of data, we established high speed connections between computers in buildings around the campus

• The LAN consists of two Cisco Catalyst 6500 switches connected by optical fiber, communicating at Gigabit Ethernet speed

• Workstations connected to network at Fast Ethernet speed (100 Mbits/sec, full duplex)

Page 41: Data, Data Everywhere

WAN Needs

• Geographically distributed collaborative researchers and immense data sets make high speed networking a priority.

• Collaborations exist between researchers at UCSD, UCSF, UC Davis, Stanford, Varian Inc. and NASA Ames.

• With spiral imaging, we will soon be capable of generating data in excess of 1 MB/s per scanner.
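The 1 MB/s figure can be put against the 70 Mbit campus limit cited in the Brenner slide with back-of-the-envelope arithmetic (a sketch; scanner duty cycle and protocol overheads are ignored):

```python
# How much of the shared uplink does one always-on scanner consume,
# and how much data does it accumulate per day?

MBIT_PER_SCANNER = 1 * 8       # 1 MB/s = 8 Mbit/s
CAMPUS_LIMIT_MBIT = 70         # campus limit mentioned in this talk

share = MBIT_PER_SCANNER / CAMPUS_LIMIT_MBIT   # fraction of uplink per scanner
gb_per_day = 1 * 86400 / 1000                  # 1 MB/s sustained for a day, in GB

print(f"{share:.0%} of the uplink per scanner, {gb_per_day:.1f} GB/day")
# → 11% of the uplink per scanner, 86.4 GB/day
```

A handful of scanners streaming concurrently would claim most of the shared campus uplink, which is the broadband argument in miniature.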

Page 42: Data, Data Everywhere

NASDAQ vs. O'Reilly Tech Book Sales at Amazon January 1, 1999 through September 30, 2001

[Chart: normalized O'Reilly unit sales at Amazon and normalized NASDAQ index value (both scaled 0–1), plotted by date.]

Page 43: Data, Data Everywhere

CITRIS Network in Smart Classroom

Page 44: Data, Data Everywhere