The Worldwide LHC Computing Grid, GridPP and...
29/08/2013 S 2
Overview
• The Grid
– Motivation and history
• Thanks David Britton
– Live Demo
29/08/2013 S 3
Introduction
The physics
The LHC
The Grid
The Experiments
July 4th 2012:
Rolf Heuer, CERN Director General
29/08/2013 S 4 4 David Britton, University of Glasgow
Outline
Why we need the Grid.
What is a Grid?
How the Grid works.
Grid usage and impact.
Evolution.
Summary.
29/08/2013 S 5
Challenge – The Data Volume
[Figure: event rates for All Events; Standard Model W, Z, Jets; Higgs. The span is 10 Orders of Magnitude.]
Notionally 40 TB/sec
200 MB/sec recorded
Higgs?
29/08/2013 S 6
Challenge: Data Complexity
Multiple separate interactions
during each “collision”.
TABLE OF MILLIONS
Collisions 40-million times a second,
each a composite of many interactions.
150-million electronic channels on
ATLAS and CMS detectors.
15-million gigabytes (15 Petabytes) of
data recorded per year.
Expect a few per million recorded
collisions to contain a Higgs (but
individually you can’t tell that they are
Higgs).
29/08/2013 S 7
Data Pyramid
Real data: Raw Data; Reconstructed Data; Analysis Objects; Tag Data; Ntuples; H
Simulation: Monte Carlo Data; Reconstructed Data; Analysis Objects; Tag Data; Ntuples; H
[Plot: Total ATLAS Disk Used, 2008 to 2012; axis label 100 PB]
1 Petabyte = 1000 Terabytes
1 Terabyte = 1000 Gigabytes
1 Gigabyte = 1000 Megabytes
29/08/2013 S 8
What is the Grid?
29/08/2013 S 9
Web:
Focused historically on sharing information
(high level data - text, picture, music, video)
Allows a limited set of predetermined actions
(data processing) such as search, filter, sort,
stream, etc.
Grid: The idea is to share storage and
computing power more directly,
enabling much larger data sets to be
shared with user-determined data
processing.
Web vs Grid
29/08/2013 S 10
Evolution towards Grid
29/08/2013 S 11
Why “Grid?”
29/08/2013 S 12
How does it work?
E = mc²
Grid
Middleware
29/08/2013 S 13
Middleware
Single PC:
Application Layer: Word/Excel, Email/Web, Your Program, Games
OPERATING SYSTEM
CPU, Disks, etc
Grid:
Your Program, User Interface Machine
MIDDLEWARE: Resource Broker, Information Service, Replica Catalogue, Bookkeeping Service
CPU Cluster, CPU Cluster, CPU Cluster, Disk Server
Middleware is the Operating System of a distributed computing system.
29/08/2013 S 14
How does it work?
Getting Started
1. Get a digital certificate (UK Certificate Authority)
2. Join a Virtual Organisation (VO)
Authentication – who you are
Authorisation – what you are allowed to do
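The difference between the two checks can be sketched in a few lines of Python. This is only an illustration: the issuer name, the VO name and the subject string are hypothetical, and real Grid authentication relies on X.509 certificates and VO membership services rather than dictionaries.

```python
# Hypothetical stand-ins for a trusted Certificate Authority and a VO
# membership list; none of these names come from the talk.
TRUSTED_ISSUERS = {"Example Certificate Authority"}
VO_MEMBERS = {"example-vo": {"CN=alice example"}}


def authenticate(certificate: dict) -> bool:
    """Authentication (who you are): was the certificate issued by a trusted CA?"""
    return certificate["issuer"] in TRUSTED_ISSUERS


def authorise(certificate: dict, vo: str) -> bool:
    """Authorisation (what you may do): is the holder a member of the VO?"""
    return certificate["subject"] in VO_MEMBERS.get(vo, set())


def may_use_grid(certificate: dict, vo: str) -> bool:
    """Both checks must pass before any work is accepted."""
    return authenticate(certificate) and authorise(certificate, vo)


alice = {"subject": "CN=alice example", "issuer": "Example Certificate Authority"}
untrusted = {"subject": "CN=alice example", "issuer": "Unknown Authority"}

print(may_use_grid(alice, "example-vo"))      # trusted CA and VO member
print(may_use_grid(alice, "other-vo"))        # trusted CA, but not a member
print(may_use_grid(untrusted, "example-vo"))  # member name, untrusted CA
```

Keeping the two functions separate mirrors the two steps on the slide: the certificate establishes identity, and the Virtual Organisation decides what that identity is allowed to do.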
29/08/2013 S 15
How does it work?
The details
[Diagram: numbered steps 3 to 10 connecting the Submitter to the services VOMS, WMS, JS, RB, LFC, BDII and Logging & Bookkeeping, and to four sets of Grid Enabled Resources (CPU Nodes, Storage) on The Grid]
glite-wms-job-submit myjob.jdl
myjob.jdl:
JobType = "Normal";
Executable = "/sum.exe";
InputData = "LF:testbed0-00019";
DataAccessProtocol = "gridftp";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" &&
    other.GlueHostOperatingSystemRelease == "Red Hat 6.2" &&
    other.GlueCEPolicyMaxWallClockTime > 10000;
Rank = other.GlueCEStateFreeCPUs;
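How the Requirements and Rank expressions are used can be sketched as a filter followed by a sort. The Python below is a simplified illustration, not the broker's actual implementation: the four computing elements and their values are invented, and only the attribute names are taken from the JDL above.

```python
# Hypothetical published state of four computing elements (CEs).
computing_elements = [
    {"name": "ce-a", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 6.2",
     "GlueCEPolicyMaxWallClockTime": 20000, "GlueCEStateFreeCPUs": 12},
    {"name": "ce-b", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 6.2",
     "GlueCEPolicyMaxWallClockTime": 5000, "GlueCEStateFreeCPUs": 400},
    {"name": "ce-c", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 6.2",
     "GlueCEPolicyMaxWallClockTime": 50000, "GlueCEStateFreeCPUs": 40},
    {"name": "ce-d", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxWallClockTime": 50000, "GlueCEStateFreeCPUs": 900},
]


def requirements(other: dict) -> bool:
    """Python rendering of the JDL Requirements expression."""
    return (other["GlueHostOperatingSystemName"] == "linux"
            and other["GlueHostOperatingSystemRelease"] == "Red Hat 6.2"
            and other["GlueCEPolicyMaxWallClockTime"] > 10000)


def rank(other: dict) -> int:
    """Python rendering of the JDL Rank expression: prefer more free CPUs."""
    return other["GlueCEStateFreeCPUs"]


def match(ces: list):
    """Keep the CEs that satisfy the requirements, then pick the best ranked."""
    candidates = [ce for ce in ces if requirements(ce)]
    return max(candidates, key=rank) if candidates else None


best = match(computing_elements)
print(best["name"] if best else "no matching resource")
```

In this made-up example ce-b has the most free CPUs among the Red Hat 6.2 sites but fails the wall-clock requirement, and ce-d fails the release requirement, so the job would go to ce-c.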
[Diagram labels: gridui; JDL; step 0: voms-proxy-init; steps 1, 2 and 11; Job Status?]
29/08/2013 S 16
Storage Development
The similar exponential increase in the storage
density, and the corresponding fall in the cost
of data storage.
CPU Development
The sustained exponential increase in the
density of transistors, and the corresponding
fall in the cost of computational power.
Enabling Technology
Network Development
The similar exponential increase in the
available bandwidth and the corresponding fall
in the cost of moving data.
Density of storage
29/08/2013 S 17
When is a Grid useful?
Problems that are highly parallelizable
[Diagram: Problem, Grid, Solution]
Input data is independent, e.g. images.
Simulation using different parameters:
A=2, B=3; A=3, B=3; A=2, B=4
Not so good for closely
coupled problems
These pieces may be
independent
These pieces will have
to interact
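The parameter-scan case can be sketched as follows. This is a toy illustration under stated assumptions: local threads stand in for Grid worker nodes, and the simulate function is a placeholder rather than any real physics code. The point is only that each run needs nothing from the others.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(params):
    """Placeholder simulation: depends only on its own (A, B) pair."""
    a, b = params
    return {"A": a, "B": b, "result": a ** b}


# The three parameter sets shown on the slide.
parameter_sets = [(2, 3), (3, 3), (2, 4)]


def run_all(parameter_sets):
    """Run every parameter set independently; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(simulate, parameter_sets))


for outcome in run_all(parameter_sets):
    print(outcome)
```

Because no run waits on another, the same pattern scales from three threads to thousands of Grid jobs. A closely coupled problem, where the pieces must exchange data while running, does not fit this pattern.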
29/08/2013 S 18
Structure of the Grid
Studies in the late 90’s led to a hierarchical structure:
Online system
Offline farm
Tier 0: CERN computer centre
Tier 1 National centres: RAL, UK; France; Italy; Germany; USA
Tier 2 Regional groups: ScotGrid; NorthGrid; SouthGrid; London
Institutes: Glasgow; Edinburgh; Durham
Workstations
Diagram labels: “GridPP” (RAL, UK; ScotGrid; NorthGrid; SouthGrid; London; Glasgow; Edinburgh; Durham) and “WLCG”
Useful model for Particle Physics but not necessary for others
29/08/2013 S 19
Multiple Grids
Worldwide LHC Computing Grid
(WLCG) combines:
•EGI (European Grid Infrastructure).
•OSG (Open Science Grid) in the US.
•NorduGrid in the Nordic countries.
Combined Resources (August 2012):
•152 Sites in 36 Countries
•325,000 logical CPUs
•210 Petabytes of disk
•180 Petabytes of tape
Comparing number of cores (which is not a fair measure of the computing
power) “CERN’s (distributed) SuperComputer” would rank 3rd in the current
top-10 supercomputers worldwide.
29/08/2013 S 20
UK Contribution
UK Resources (August 2012):
•19 Sites
•36,000 logical CPUs
•21 Petabytes of disk
•5+ Petabytes of tape
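As a rough check of scale, the UK figures can be compared with the combined WLCG figures quoted for the same month. The short Python sketch below uses only numbers from the two slides; tape is left out because the UK figure is given as "5+".

```python
# Figures quoted in the talk for August 2012.
worldwide = {"sites": 152, "logical_cpus": 325_000, "disk_pb": 210}
uk = {"sites": 19, "logical_cpus": 36_000, "disk_pb": 21}


def share_percent(part: float, whole: float) -> float:
    """Percentage of the whole contributed by the part."""
    return 100 * part / whole


for key in worldwide:
    print(f"{key}: {share_percent(uk[key], worldwide[key]):.1f}% of the combined total")
```

On these numbers the UK provides roughly a tenth of the combined logical CPUs and disk.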
CPU contributions in 2012
USA
UK
29/08/2013 S 21
Grid in Action
29/08/2013 S 22
Data Transfer
Nominal
design rate
was 1.3 GB/s
29/08/2013 S 23
Moving Data – Quick quiz
• With a 1 Gbit/s connection, how long does it take to move
– 1 GB (Gbyte)?
– 1 TB?
– 10 TB?
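The quiz comes down to one conversion: 8 bits per byte. A minimal Python sketch, assuming an ideal link that runs at its full 1 Gbit/s with no protocol overhead, and the decimal units defined earlier in the talk:

```python
def transfer_time_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over the link, ignoring all overheads."""
    return size_bytes * 8 / link_bits_per_second


GBIT_LINK = 1e9  # 1 Gbit/s

for label, size in [("1 GB", 1e9), ("1 TB", 1e12), ("10 TB", 1e13)]:
    seconds = transfer_time_seconds(size, GBIT_LINK)
    print(f"{label}: {seconds:,.0f} s (about {seconds / 3600:.1f} hours)")
```

Under these assumptions 1 GB takes 8 seconds, 1 TB takes 8,000 seconds (a little over 2 hours) and 10 TB takes 80,000 seconds (about 22 hours). Real transfers are slower.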
29/08/2013 S 24
Worldwide Usage
1 million jobs/day
29/08/2013 S 25
Impact
“Wealth Creation” “Quality of Life”
Immense (pictures); Cambridge Ontology (n-grams);
Econophysica (financial); Total Oil
(exploration); Constellation Tech (software)
Avian Flu (biomed); Malaria (Wisdom
project); Landslide prediction; nano-CMOS;
photonics; etc.
29/08/2013 S 26
Evolution-I
Tier-structure for WLCG designed in the late
90’s assumed 600Mbps links. Today’s multi-Gigabit
links enable a more flexible and robust
architecture, which may increase complexity.
New CPU architectures require more
application development in order to exploit the
increase in computing capacity. This is a
challenge for legacy code.
[Plots: Maximum Sustained Bandwidth; Density of storage]
Although storage density continues to
increase it is getting more difficult to use,
which puts demands on the architecture and
applications, increasing the complexity.
Evolution of computing models: Hierarchy to Mesh
29/08/2013 S 27
Evolution-II
[Diagram, shown twice: Your Program and the User Interface Machine; the MIDDLEWARE (Resource Broker, Information Service, Replica Catalogue, Bookkeeping Service); the Grid resources (CPU Clusters, Disk Server)]
Too much middleware actually resides in the
application-layer and is unique to an individual
user group (virtual organisation).
In addition, there are multiple middleware
stacks (gLite; ARC, Unicore, etc) used by
different user groups.
Some degree of rationalisation and consolidation is required - this is a
natural part of the process when working in a development
environment.
29/08/2013 S 28
Evolution-III
The boundary between web and Grid has
become blurred as Grid ideas are taken up.
The web is becoming much more machine-readable;
data movement is becoming more
automated and more extensive as bandwidth
improvements enable new services.
The Grid is also about collaboration: This is somewhat different in the
commercial world where partners tend not to share internal
infrastructure but out-source to a third-party. So we’ve seen the
growth of “Cloud Computing”. Again, the boundaries are becoming
blurred with Grids of Clouds and Clouds of Grids likely in the future.
29/08/2013 S 30
Summary
29/08/2013 S 31
Summary
A Large Hadron Collider Delivering collisions up to 40 million times per second
A Global Supercomputer
29/08/2013 Support for Non LHC VOs 32
VO usage (3 months)
29/08/2013 Support for Non LHC VOs 33
Non LHC VO share
http://pprc.qmul.ac.uk/~walker/votable.html
29/08/2013 S 34
Demo
• Submitting a job
– Helloworld
– Running a script
• Managing data
– Copying a file
• LFC
– Mounting via WebDAV
• In future redirect via LFC
29/08/2013 S 35
Hello world example
walker@heplt019:~/talks/2013-daresbury/londongrid-example$ cat helloworld.jdl
#############Hello World#################
Executable = "/bin/echo";
Arguments = "Hello welcome to londongrid ";
StdOutput = "hello.out";
StdError = "hello.err";
OutputSandbox = {"hello.out","hello.err"};
######################################
29/08/2013 Support for Non LHC VOs 36
“Live Demo – mounting storage”
heplt019:~# mount -t davfs https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ /mnt/liverpool
Please enter the username to authenticate with server
https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ or hit enter for none.
Username:
Please enter the password to authenticate user with server
https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ or hit enter for none.
Password:
Please enter the password to decrypt client
certificate /etc/davfs2/certs/private/my_cert.p12.
Password:
/sbin/mount.davfs: the server certificate is not trusted
issuer: Authority, eScienceCA, UK
subject: CSD, Liverpool, eScience, UK
identity: hepgrid11.ph.liv.ac.uk
fingerprint: 34:c1:2d:63:57:2d:ff:07:10:21:cc:1d:a7:7a:ad:58:f9:bd:4d:b0
You only should accept this certificate, if you can
verify the fingerprint! The server might be faked
or there might be a man-in-the-middle-attack.
Accept certificate for this session? [y,N] y
/sbin/mount.davfs: warning: the server does not support locks
29/08/2013 Support for Non LHC VOs 37
“Live Demo”
heplt019:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 90G 78G 7.2G 92% /
tmpfs 2.0G 0 2.0G 0% /lib/init/rw
udev 2.0G 316K 2.0G 1% /dev
tmpfs 2.0G 0 2.0G 0% /dev/shm
https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/
26G 13G 13G 50% /mnt/liverpool
heplt019:~# cd /mnt/liverpool/atlas/atlasscratchdisk
heplt019:/mnt/liverpool/atlas/atlasscratchdisk# echo "hello webdav" >cjwhello
heplt019:/mnt/liverpool/atlas/atlasscratchdisk# cat cjwhello
hello webdav
29/08/2013 S 38
What does this mean for you?
• GridPP expertise: Big Data
– Compute
– Data transfer over Wide area network
– Federated access
• If your problem fits our solution:
– Talk to us
• Share experience
• Some resources
– Scientific Linux (RHEL/CentOS compatible)
29/08/2013 S 39
Lessons
• Grid good for
– embarrassingly parallel problems
• Need to deal with failure
– Bookkeeping difficult
– Ganga and DIRAC solutions
• FTS for file transfers
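The point about failure and bookkeeping can be sketched in Python. This is a toy illustration, not the Ganga or DIRAC interface: the function names and the record layout are hypothetical. It shows the minimum a user ends up tracking once some jobs fail: how many attempts each job has had and how it finished.

```python
def run_with_retries(job_ids, run_job, max_attempts=3):
    """Run each job, resubmitting failures, and keep a record per job.

    run_job is any callable that takes a job id and returns True on success.
    Returns a dict: job id -> {"attempts": int, "status": "done" or "failed"}.
    """
    book = {}
    for job_id in job_ids:
        record = {"attempts": 0, "status": "failed"}
        while record["attempts"] < max_attempts:
            record["attempts"] += 1
            if run_job(job_id):
                record["status"] = "done"
                break
        book[job_id] = record
    return book


# Hypothetical flaky job: fails the first time it is tried, then succeeds.
_seen = set()


def flaky_job(job_id):
    if job_id in _seen:
        return True
    _seen.add(job_id)
    return False


book = run_with_retries(["job-1", "job-2"], flaky_job)
for job_id, record in book.items():
    print(job_id, record)
```

With a million jobs a day across the Grid, keeping this kind of record by hand is not practical, which is why dedicated job management tools exist.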
29/08/2013 S 40
What does this mean for you?
• GridPP expertise: Big Data
– Compute
– Storage
– Data transfer over Wide area network
– Federated access
• Lots of people accessing the same data
• How do I learn more?
– Talk to me
– Talk to your local grid admin (high energy physics group)
29/08/2013 S 41
GridPP sites in the UK
• If you want to know more about GridPP, talk to your local site admin
29/08/2013 S 42
Conclusions
• Overview of Grid computing
– LHC and the Higgs
• GridPP
• Demo
– Submitted some jobs
– Transferred some data
29/08/2013 S 43
Acknowledgements
• GridPP
• David Britton
– Many of the slides