Edinburgh - at the Frontiers of e-Science
Richard Kenway
e-science = searching for the unknown
discovery science
in vast amounts of data
electronic ‘needle in a haystack’
• to find the Higgs boson
– and explain where mass comes from
• you need to build a Grid
and
… are not enough
LHC computing challenge
• one bunch crossing per 25 ns
• 100 triggers per second
• each event is ~1 MByte
Online System
Tier 0: Offline Farm ~20,000 PCs, CERN Computer Centre >20,000 PCs
Tier 1: RAL Regional Centre, US Regional Centre, French Regional Centre, Italian Regional Centre
Tier 2: Tier2 Centres ~1000 PCs, ScotGRID++ ~1000 PCs
Tier 3: Institutes ~200 PCs, Physics data cache
Tier 4: Workstations
data rates labelled on the diagram: ~PByte/sec, ~100 MByte/sec, ~Gbit/sec or Air Freight, ~Gbit/sec, 100 - 1000 Mbit/sec
physicists work on analysis “channels”
each institute has ~10 physicists working on one or more channels
data for these channels is cached by the institute server
assumes PC = ~ 25 SpecInt95
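The trigger figures on this slide can be checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide (25 ns bunch spacing, 100 triggers per second, ~1 MByte per event); the function names are illustrative and not part of any LHC software.

```python
# Figures quoted on the slide (all approximate).
BUNCH_CROSSING_INTERVAL_NS = 25   # one bunch crossing per 25 ns
TRIGGERS_PER_SECOND = 100         # events accepted by the trigger
EVENT_SIZE_MBYTE = 1              # each event is ~1 MByte


def crossings_per_second(interval_ns: float) -> float:
    """Bunch crossings per second for a crossing interval given in ns."""
    return 1e9 / interval_ns


def recorded_rate_mbyte_per_sec(triggers_per_sec: float, event_mbyte: float) -> float:
    """Data rate after the trigger, in MByte/sec."""
    return triggers_per_sec * event_mbyte


def trigger_reduction_factor(interval_ns: float, triggers_per_sec: float) -> float:
    """How many bunch crossings occur for each event that is kept."""
    return crossings_per_second(interval_ns) / triggers_per_sec


if __name__ == "__main__":
    print(crossings_per_second(BUNCH_CROSSING_INTERVAL_NS))
    print(recorded_rate_mbyte_per_sec(TRIGGERS_PER_SECOND, EVENT_SIZE_MBYTE))
    print(trigger_reduction_factor(BUNCH_CROSSING_INTERVAL_NS, TRIGGERS_PER_SECOND))
```

With these inputs the trigger keeps one crossing in 400,000, and 100 events of ~1 MByte each per second gives the ~100 MByte/sec shown on the diagram.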
the web on steroids
• 1989: Tim Berners-Lee invented the web
– so physicists around the world could share documents
• 1999: Grids add to the web
– computing power
– data management
– big instruments
– (eventually) sensors
a new global infrastructure
• the Grid is an emergent infrastructure to deliver dependable, pervasive and uniform access to globally distributed, dynamic and heterogeneous resources
• problems of scalability, interoperability, fault tolerance, resource management and security
sensor nets
data archives
computers
software
colleagues
instruments
• information on demand - like power from a socket
underpinning technology
why now?
• for 50 years, we have been riding the crest of an IT wave
– building vast untapped global resources
– hundreds of millions of (mostly) idle PCs
• big science is facing a data tsunami
and
3.5 million users, 22 teraflops
increase in MIPS per chip
Figure: MIPS/chip (axis from 1 to 1,000,000) against Year (1985 to 2010)
chips plotted: 286*, 386*, 486*, Pentium*, Pentium Pro, P7 (Merced), P8, P12
microprocessor speeds double every 18 months (Moore’s Law)
MIPS - Millions of instructions per second
*Pentium, 286, 386 and 486 are registered trademarks of Intel Corp.
internet hosts (million), actual and projected
Jul-95: 8.2
Jul-96: 16.7
Jul-97: 26.1
Jul-98: 36.7
Jul-99: 56.2
Jul-00: 85
Jul-01: 120
Jul-02: 150
Jul-03: 180
Source: ITU “Challenges to the Network: Internet for Development, 1999”
Internet Software Consortium (www.isc.org), RIPE (www.ripe.net)
network capacity doubles every 9 months
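The two doubling times quoted on these slides (18 months for microprocessor speed, 9 months for network capacity) can be compared directly. The sketch below is only an illustration of that arithmetic. The second function estimates a doubling time from two chart values; it assumes the internet-host values pair with the dates in order, so that 8.2 million belongs to Jul-95 and 56.2 million to Jul-99.

```python
import math


def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` for a fixed doubling time."""
    return 2 ** (12 * years / doubling_months)


def implied_doubling_months(start: float, end: float, months: float) -> float:
    """Doubling time implied by growth from `start` to `end` over `months`."""
    return months * math.log(2) / math.log(end / start)


if __name__ == "__main__":
    for years in (3, 6, 9):
        cpu = growth_factor(years, 18)   # Moore's Law figure from the slide
        net = growth_factor(years, 9)    # network capacity figure from the slide
        print(f"{years} years: processors x{cpu:.0f}, networks x{net:.0f}")
    # Assumed pairing of chart values: Jul-95 = 8.2M hosts, Jul-99 = 56.2M hosts.
    print(f"host doubling time: {implied_doubling_months(8.2, 56.2, 48):.1f} months")
```

Over three years the quoted rates give a factor of 4 for processors and 16 for networks, which is why the network, not the processor, is described as the faster-growing resource.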
fixed lines, mobile phones & internet users
Figure: millions (axis from 0 to 1,200) against year (1995 to 2003)
series plotted: fixed-line telephones, mobile phones, estimated Internet users
note: columns show actual and projected users at end of year
source: ITU
Quality of Service on the internet
• aim to distinguish types of traffic
– high priority fast lanes
– low priority slow lanes
• hard to configure
• intersim simulation tool
– detailed model of network
– understand and validate configurations
EPCC + Cisco Systems
Grid applications
whole-system simulations
• landing gear models: braking performance, steering capabilities, traction, dampening capabilities
• wing models: lift capabilities, drag capabilities, responsiveness
• stabilizer models: deflection capabilities, responsiveness
• airframe models
• human models: crew capabilities (accuracy, perception, stamina, reaction times, SOP’s)
• engine models: thrust performance, reverse thrust performance, responsiveness, fuel consumption
NASA Information Power Grid: coupling all sub-system simulations
global in-flight engine diagnostics
in-flight data
airline
maintenance centre
ground station
global network eg SITA
internet, e-mail, pager
DS&S Engine Health Center
data centre
Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York
National Airspace Simulation Environment
NASA Information Power Grid: aircraft, flight paths, airport operations and the environment are combined to get a virtual national airspace
Virtual National Air Space (VNAS)
GRC engine models
LaRC
airframe models
landing gear models
ARC
wing models
stabilizer models
human models
• FAA ops data
• weather data
• airline schedule data
• digital flight data
• radar tracks
• terrain data
• surface data
22,000 commercial US flights a day
50,000 engine runs
22,000 airframe impact runs
132,000 landing/take-off gear runs
48,000 human crew runs
66,000 stabilizer runs
44,000 wing runs
simulation drivers
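To get a feel for the scale of this simulation, the run counts on the slide can be totalled and expressed per flight. The sketch below assumes the run counts refer to the same day as the 22,000 flights, which the slide implies but does not state; the dictionary keys are just labels copied from the slide.

```python
# Figures quoted on the slide.
FLIGHTS_PER_DAY = 22_000
RUNS = {
    "engine": 50_000,
    "airframe impact": 22_000,
    "landing/take-off gear": 132_000,
    "human crew": 48_000,
    "stabilizer": 66_000,
    "wing": 44_000,
}


def total_runs(runs: dict[str, int]) -> int:
    """Total number of sub-system simulation runs."""
    return sum(runs.values())


def runs_per_flight(runs: dict[str, int], flights: int) -> dict[str, float]:
    """Average number of runs of each sub-system model per flight."""
    return {name: count / flights for name, count in runs.items()}


if __name__ == "__main__":
    print("total runs:", total_runs(RUNS))
    for name, ratio in runs_per_flight(RUNS, FLIGHTS_PER_DAY).items():
        print(f"{name}: {ratio:.2f} runs per flight")
```

Under that assumption the total is 362,000 sub-system runs, with exactly 1 airframe, 2 wing, 3 stabilizer and 6 landing/take-off gear runs per flight.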
from genome to function
• gene expression as an embryo develops
EPCC MouseGrid: optical tomography image reconstruction in real time
digital radiology on the Grid
• 28 petabytes/year for 2000 hospitals
• must satisfy privacy laws
University of Pennsylvania
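The radiology figure is easier to picture per hospital. The sketch below divides the quoted 28 petabytes/year across the 2000 hospitals and converts the total to an average sustained rate. It assumes decimal units (1 petabyte = 1000 terabytes = 10^9 MByte) and a 365-day year, neither of which is stated on the slide.

```python
# Figures quoted on the slide.
TOTAL_PBYTE_PER_YEAR = 28
HOSPITALS = 2000

# Assumptions (not from the slide): decimal units and a 365-day year.
TBYTE_PER_PBYTE = 1000
MBYTE_PER_PBYTE = 1e9
SECONDS_PER_YEAR = 365 * 24 * 3600


def tbyte_per_hospital_per_year(total_pbyte: float, hospitals: int) -> float:
    """Average yearly volume per hospital, in terabytes."""
    return total_pbyte * TBYTE_PER_PBYTE / hospitals


def average_rate_mbyte_per_sec(total_pbyte: float) -> float:
    """Average sustained rate for the whole system, in MByte/sec."""
    return total_pbyte * MBYTE_PER_PBYTE / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(tbyte_per_hospital_per_year(TOTAL_PBYTE_PER_YEAR, HOSPITALS))
    print(round(average_rate_mbyte_per_sec(TOTAL_PBYTE_PER_YEAR)))
```

On these assumptions each hospital produces about 14 terabytes a year, and the system as a whole averages a little under 900 MByte/sec.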
emergency response teams
• bring sensors, data, simulations and experts together
– wildfire: predict movement of fire & direct fire-fighters
– also earthquakes, peacekeeping forces, battlefields,…
Los Alamos National Laboratory: wildfire
National Earthquake Simulation Grid
Earth observation
• ENVISAT
– € 3.5 billion
– 400 terabytes/year
– 700 users
• ground deformation prior to a volcano
Grid development
data, information and knowledge
• virtual data …from the grid
– from a database somewhere
– computed on request
– measured on request
• automated knowledge …from computer science
– data: un-interpreted bits and bytes
– information: data equipped with meaning
– knowledge: information applied to solve a problem
three layer Grid abstraction
Information Grid
Knowledge Grid
Computation/Data Grid
Data to Knowledge
Control
the Grid as an evolving concept
• enabler for transient ‘virtual organisations’
• anatomy: a software infrastructure that enables flexible, secure, co-ordinated resource sharing among dynamic collections of individuals, institutions and resources
– Foster, Kesselman & Tuecke (2001)
• evolution of and integration with web services
• physiology: everything is a Grid service ie a service that conforms to a set of conventions for management and exchanging messages
– Foster, Kesselman, Nick & Tuecke (2002)
• Global Grid Forum: define a standard Grid architecture
– big business and big science working together
e-science in Scotland
UK e-Science programme
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor
Director General of Research Councils
Office of Science and Technology
UK e-Science funding
£80m Collaborative projects
E-Science Steering Committee
DG Research Councils
Director
Director’s Management Role
Director’s Awareness and Co-ordination Role
Generic Challenges EPSRC (£15m), DTI (£15m)
Industrial Collaboration (£40m)
Academic Application Support Programme
Research Councils (£74m), DTI (£5m)
PPARC (£26m) BBSRC (£8m) MRC (£8m) NERC (£7m) ESRC (£3m) EPSRC (£17m) CLRC (£5m)
Grid TAG
UK e-science centres
Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Soton, London, Belfast, DL, RAL, Hinxton
AccessGrid always-on video walls
National e-Science Centre
• Edinburgh + Glasgow Universities
– Physics & Astronomy 2
– Informatics, Computing Science
– EPCC
• £6M EPSRC/DTI + £2M SHEFC over 3 years
www.nesc.ac.uk
• e-Science Institute
– visitors, workshops, co-ordination, outreach
• middleware development
– 50 : 50 industry : academia
• ‘last-mile’ networking
data, data everywhere…
• Scottish e-Data Information & Knowledge Transformation Centre (eDIKT)
– proposal to SHEFC for a centre to develop scalable database tools
– astronomy, bioinformatics, geophysics, particle physics & commerce
• globally distributed heterogeneous databases are growing very fast
– science is at the frontier
– commerce, healthcare, entertainment are not far behind
Scotland at the frontier… leading
• UK AstroGrid
– virtual observatory
– linked to EU AVO
• UK GridPP + ScotGrid
– particle physics data analysis
– linked to EU DataGrid
• UK core e-science
– data integration
– linked to US Globus
• EU enacts + GRIDSTART
– supercomputer centres
– EU grid projects
Scotland at the frontier… participating
• EU DataGrid: particle physics, biology & medical imaging, Earth observation
• US DARPA Control of Agent-Based Systems Grid: multinational military operations
• UK RealityGrid: interactively couple experiments, simulations and visualisation
over 100 scientists engaged in grid development by the end of 2002
imagine a political party reception…
the leader enters…
a rumour is started…
and propagates across the room
from little acorns…
“… a billion people interacting with a million e-businesses with a trillion intelligent devices interconnected”
Lou Gerstner, IBM (2000)
“It is worth noting that an essential feature of the type of theory which has been described in this note is the prediction of incomplete multiplets of scalar and vector bosons.”
Peter Higgs (1964)
another technological revolution is underway