Big Data from the LHC Commissioning: Practical Lessons from Big Science - Simon Metson (Cloudant)


Big Data from the LHC Commissioning


Practical Lessons from Big Science

Simon/@drsm79

Hello!

Bristol University, Cloudant

Time at places I’ve worked

[Chart: 2002-2013, 0-100 scale, legend: Python, Perl, Bash, C++, Java, Javascript, Fortran]

The formula

G * E

(G: fixed. E: usually fixed.)

The formula

Grant * Effectiveness
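One way to write the same idea down (a sketch; "Output" is my label, not the talk's): the grant is fixed, so the only term left to work on is effectiveness.

    \text{Output} = G \times E,\qquad G \text{ fixed} \;\Rightarrow\; \text{Output} \propto E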

The life of LHC data

1. Detected by experiment

2. “Online” filtering (hardware and software)

3. Transferred to CERN main campus, archived & reconstructed

4. Transferred to T1 sites, archived, reconstructed & skimmed

5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed

6. Written into locally analysable files, put on laptops

7. Turned into a plot in a paper


Dig big tunnels

Chain up series of “atom smashers”

Put sensitive cameras in awkward places

Record events

Process data on high end machines

http://www.chilton-computing.org.uk

The life of LHC data

1. Detected by experiment

2. “Online” filtering (hardware and software)

3. Transferred to CERN main campus, archived & reconstructed

4. Transferred to T1 sites, archived, reconstructed & skimmed

5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed

6. Written into locally analysable files, put on laptops

7. Turned into a plot in a paper

CMS online data flow

We have a big digital camera

It takes photos of this

[image courtesy of James Jackson]

which come out like this

[image courtesy of James Jackson]


CMS data flow

We have a big digital camera

Which goes into lots of computers (the HLT)

Which goes into lots of disk (the Storage Manager)

Write to HLT at ~200GB/s

Write to Storage Manager at ~2GB/s

Write to T0 at ~2GB/s
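As a rough sanity check on those rates (a sketch, not official numbers; the duty factor below is my assumption), the 200 GB/s to 2 GB/s step is a ~100x reduction in volume, and 2 GB/s sustained for part of a year lands in the same ballpark as the ~10 PB/year quoted on the next slide:

    # Back-of-envelope check of the quoted CMS data rates.
    # The duty factor (fraction of the year data is actually being written)
    # is an illustrative assumption, not a number from the talk.
    hlt_in_gb_s = 200        # write rate into the HLT farm, GB/s
    storage_gb_s = 2         # write rate to the Storage Manager / T0, GB/s

    reduction = hlt_in_gb_s / storage_gb_s
    print(f"online filtering keeps about 1/{reduction:.0f} of the data by volume")

    seconds_per_year = 365 * 24 * 3600
    assumed_duty_factor = 0.15            # LHC does not take data all year
    pb_per_year = storage_gb_s * seconds_per_year * assumed_duty_factor / 1e6
    print(f"~{pb_per_year:.0f} PB written out per year at that rate")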

The life of LHC data

1. Detected by experiment

2. “Online” filtering (hardware and software)

3. Transferred to CERN main campus, archived & reconstructed

4. Transferred to T1 sites, archived, reconstructed & skimmed

5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed

6. Written into locally analysable files, put on laptops

7. Turned into a plot in a paper

10 PB of data/year

The life of LHC data

1. Detected by experiment

2. “Online” filtering (hardware and software)

3. Transferred to CERN main campus, archived & reconstructed

4. Transferred to T1 sites, archived, reconstructed & skimmed

5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed

6. Written into locally analysable files, put on laptops

7. Turned into a plot in a paper

1PB/week

Why transfer so much data?

To process all the data taken in one year on one computer would take ~64,000 years
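The flip side of that number, as a sketch (the worker count is an illustrative order of magnitude, not an official WLCG figure):

    # If one machine needs ~64,000 years, the only way to finish in useful
    # time is to spread the work across many machines and, therefore,
    # to move the data to them.  Worker count is illustrative.
    single_machine_years = 64_000
    assumed_workers = 100_000
    years = single_machine_years / assumed_workers
    print(f"~{years:.1f} years (~{years * 12:.0f} months) with {assumed_workers:,} workers")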

The life of LHC data

1. Detected by experiment

2. “Online” filtering (hardware and software)

3. Transferred to CERN main campus, archived & reconstructed

4. Transferred to T1 sites, archived, reconstructed & skimmed

5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed

6. Written into locally analysable files, put on laptops

7. Turned into a plot in a paper

Analysis

• Each analysis is ~unique

• Query language is C++

• Runs on distributed system and local resources

• A series of “cut” selections identifies interesting events (sketched below)

• Data in the final plot may be substantially reduced from the original dataset
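Analyses are really written in C++ against the experiments' frameworks, but the shape of a cut flow is language independent; a minimal sketch in Python, with invented event fields and thresholds:

    # Minimal cut-flow sketch.  Event fields and thresholds are made up for
    # illustration; real analyses run C++ over the experiment data formats.
    events = [
        {"n_muons": 2, "missing_et": 35.0, "leading_pt": 52.0},
        {"n_muons": 1, "missing_et": 80.0, "leading_pt": 24.0},
        {"n_muons": 2, "missing_et": 10.0, "leading_pt": 61.0},
    ]

    cuts = [
        ("two muons",  lambda e: e["n_muons"] >= 2),
        ("missing ET", lambda e: e["missing_et"] > 30.0),
        ("leading pT", lambda e: e["leading_pt"] > 50.0),
    ]

    selected = events
    for name, passes in cuts:
        selected = [e for e in selected if passes(e)]
        print(f"after '{name}': {len(selected)} events left")
    # The few surviving events are what end up in the final plot.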

Workflow ladder

Rungs of the ladder, from small to large:

• Private datasets (0.1-10 GB), simple computation

• Shared datasets (0.1-10 GB), simple computation

• Shared datasets (10-100 GB), simple computation

• Shared datasets (10-500 GB), complex computation

• Shared datasets (>500 GB), complex computation

• Large datasets (>100 TB), simple computation

• Large datasets (>100 TB), complex computation

These group into three ways of working:

• Work on laptop/desktop machine, store resulting datasets to Grid storage

• Work on departmental resources, store resulting datasets to Grid storage

• Use Grid compute and storage exclusively

The other axis of the diagram is the number of users at each rung (a rough sketch of the decision logic follows).
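One plausible reading of that ladder as code (a sketch; the exact mapping of rungs to resources is my reading of the diagram, and the function is purely illustrative):

    # Toy restatement of the workflow ladder: pick where to run based on
    # dataset size and computation complexity.  The thresholds echo the
    # slide; the rung-to-resource mapping is an assumption.
    def where_to_work(dataset_gb: float, complex_computation: bool) -> str:
        if dataset_gb <= 10 and not complex_computation:
            return "laptop/desktop, store results to Grid storage"
        if dataset_gb <= 100 and not complex_computation:
            return "departmental resources, store results to Grid storage"
        return "Grid compute and storage exclusively"

    print(where_to_work(5, False))         # small shared/private dataset
    print(where_to_work(50, False))        # mid-sized shared dataset
    print(where_to_work(200_000, True))    # >100 TB, complex computation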

The life of LHC simulated data

1. Simulated by experimentalists at T0/T1/T2 sites

2. Transferred to T1 sites, archived, possibly reconstructed & skimmed

3. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed

4. Written into locally analysable files, put on laptops

5. Turned into a plot in a paper

Most events get cut

“We are going to die, and that makes us the lucky ones. Most people are never going to die because they are never going to be born.”

- Richard Dawkins

Adoption & Use

Setup

• Maybe a bit different to other people

• Many sites (>100), with 100s of TB of storage and 10,000s of worker nodes

• Global system

• Why not at one site?

• politics, power budget, cost

The grid

We Have a “Big Data” Problem

We Have a Big “Data Problem”

Do what you do best, outsource the rest

What's interesting is that big data isn't interesting any more

NIH (Not Invented Here)

Define and refine workflows

Our situation

• Expert users, who are not interested in infrastructure

• Will work around things they perceive as unnecessary limitations

Disruptive users

How to engage disruptive users?

Open access

1PB/week

Open access

Our situation

• Limited resources for integration/testbed style activities

• Strange organisation

Data temperature

There is no such thing as now

Keep things as local as possible

Defining monitoring is difficult

Small files are bad, m'kay
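One way to feel why (a back-of-envelope sketch; the file sizes are illustrative, not CMS numbers): catalogues, transfers and tape systems all pay a per-file cost, so the same volume in tiny files means far more objects to track.

    # Same data volume, very different numbers of files to catalogue,
    # transfer and archive.  File sizes are illustrative.
    total_pb = 10
    total_bytes = total_pb * 1e15
    for size_mb in (1, 100, 2_000):
        n_files = total_bytes / (size_mb * 1e6)
        print(f"{total_pb} PB in {size_mb} MB files -> {n_files:,.0f} files")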

Compartmentalise metadata

Recognise, embrace and communicate failures

People are harder than computers

People are important

The formula


Consequences

• Automate all the things

• Learn to love a configuration management system

• Make sure everyone in the team knows how to interact with it

• Simple human solutions go a long way

Build good abstractions

Encourage collaboration

Workflow ladder

Rungs of the ladder, from small to large:

• Private datasets (0.1-10 GB), simple computation

• Shared datasets (0.1-10 GB), simple computation

• Shared datasets (10-100 GB), simple computation

• Shared datasets (10-500 GB), complex computation

• Shared datasets (>500 GB), complex computation

• Large datasets (>100 TB), simple computation

• Large datasets (>100 TB), complex computation

These group into three ways of working:

• Work on laptop/desktop machine, store resulting datasets to Grid storage

• Work on departmental resources, store resulting datasets to Grid storage

• Use Grid compute and storage exclusively

The other axis of the diagram is the number of users at each rung.

Summary
