More speed, more data, more automation, more work? Alun Ashton

Post on 13-Jan-2016

TRANSCRIPT

Page 1: More speed, more data, more automation, more work? Alun Ashton

More speed,

more data,

more automation,

more work?

Alun Ashton

Page 2: More speed, more data, more automation, more work? Alun Ashton

Thanks to organisers.

Page 3: More speed, more data, more automation, more work? Alun Ashton

1.75+ million man-hours
2,100 tons of steel
35,000 m³ of concrete
33,000 m² of roofing

Joint venture company between CCLRC (86%) and Wellcome Trust (14%)

Electron Beam Energy 3 GeV

Circumference 561.6 m

Diameter of outer wall 235 m

Beam current 300 mA (500 mA)

Start March 2003: Users January 2007

Diamond Light Source

Page 4: More speed, more data, more automation, more work? Alun Ashton

Beamlines

Page 5: More speed, more data, more automation, more work? Alun Ashton

Computing at Diamond.

• Data Acquisition and Scientific Computing

• Controls

• IT support

• External groups

Page 6: More speed, more data, more automation, more work? Alun Ashton

Scientific Computing

Data Analysis

Data Visualisation

eScience

Data Curation

Data Acquisition

Automation

Simulation and Theory

Page 7: More speed, more data, more automation, more work? Alun Ashton

Macromolecular Crystallography computing at Diamond

Phase I (2007)
• 3 MX (0.5 – 2.5 Å, optimised for 0.98 Å) with double crystal monochromator, Kirkpatrick Baez horizontal and vertical focusing mirrors; focal spot size ~ 94 µm (h) x 17 µm (v) (FWHM); estimated flux at 12.6 keV 3.5 x 10^12 ph/s; fully automated sample handler; cryo cooling; CCD detector.
• One station will have a containment three facility for pathogenic samples

Phase II
• Microfocus beamline
• Fixed wavelength side station (0.96 Å) (MR & ligand binding studies)
• Long wavelength side station for sulphur anomalous (1.5 – 2.5 Å)
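The slide quotes both wavelengths (0.98 Å, 0.96 Å, 1.5 – 2.5 Å) and a photon energy (12.6 keV). As a rough cross-check, the two are related by λ = hc / E. The sketch below is illustrative only; the function names are made up for this example and the constant is the standard value of hc expressed in keV·Å.

```python
# Convert between X-ray photon energy (keV) and wavelength (Å).
# Uses lambda = h*c / E with h*c ≈ 12.398 keV·Å.
HC_KEV_ANGSTROM = 12.398419843  # h*c in keV·Å


def energy_to_wavelength(energy_kev: float) -> float:
    """Return the wavelength in Å for a photon energy in keV."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return HC_KEV_ANGSTROM / energy_kev


def wavelength_to_energy(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength in Å."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    # 12.6 keV is the energy at which the flux is quoted on the slide.
    print(f"12.6 keV -> {energy_to_wavelength(12.6):.3f} Å")
    # 0.96 Å is the fixed wavelength of the Phase II side station.
    print(f"0.96 Å  -> {wavelength_to_energy(0.96):.2f} keV")
```

With these numbers, 12.6 keV comes out at about 0.98 Å, which agrees with the wavelength the Phase I beamlines are optimised for.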

Page 8: More speed, more data, more automation, more work? Alun Ashton

More speed

Page 9: More speed, more data, more automation, more work? Alun Ashton

MX computing at Diamond on the beamline

On each of the 3 Beamlines

2 CPU server for Data Acquisition

2 CPU server for Data Analysis

20 TB (raw) beamline storage
1 read and 1 write server
(approx. 1 month data storage)

4 beamline user workstations per beamline:
3 RedHat Linux (2 with dual monitors)
1 Windows XP

1 in-hutch computer, similar to a tablet PC with touch screen.

Networking is 1 Gbit on the beamline and 10 Gbit between MX beamlines and MX “near” beamline computers.

Page 10: More speed, more data, more automation, more work? Alun Ashton

MX computing at Diamond “near” the beamline

180 TB (raw) secondary MX storage
(shared between 3 Phase 1 beamlines, approx. 3 months data storage)
Administered by 8 servers

24 dual dual-core (2x2) CPU cluster
(50% InfiniBand fast interconnects; running Sun Grid Engine queuing system)

Local user backup via USB and Firewire drives (small scale CD and DVD writing facilities available)

Long term data storage and backup:

CCLRC Atlas Data Store – petabyte data storage

Page 11: More speed, more data, more automation, more work? Alun Ashton

Near Beamline computing

Crunchie the cluster

Page 12: More speed, more data, more automation, more work? Alun Ashton

Where does everything fit?

Synchrotron

Crystallization

PIMS (Protein Production)

Data Processing & Structure Solution Pipelines

Collection DB

e-HTPX

Page 13: More speed, more data, more automation, more work? Alun Ashton

More data

Page 14: More speed, more data, more automation, more work? Alun Ashton

PiMS

www.pims-lims.org

Thanks to Chris Morris and

PiMS developers

Page 15: More speed, more data, more automation, more work? Alun Ashton

General Introduction

Page 16: More speed, more data, more automation, more work? Alun Ashton

www.pims-lims.org

Why is Data Modelling Important?

■ A Data Model is a plan for building a database
  ■ detailed enough to be used to create the physical structure
  ■ simple enough to communicate to the end user the data structure
■ The Unified Modelling Language (UML)

Page 17: More speed, more data, more automation, more work? Alun Ashton

www.pims-lims.org

Database

■ Record keeping is an important aspect of most businesses today
■ A stable and clean repository of data
  ■ Constraints to enforce data integrity
■ Open interface
  ■ Allow users to access, search and retrieve data easily
  ■ Multiple concurrent access
■ Extensible
  ■ New data added
■ Maintainable
  ■ Database provides maintenance tools, plus industry standards to ensure long-term compatibility
■ Robust
  ■ “industrial strength”
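To make "constraints to enforce data integrity" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (sample, experiment, barcode, outcome) are invented for illustration and are not taken from the PiMS data model; the point is only that the database itself rejects inconsistent rows.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE sample (
            id      INTEGER PRIMARY KEY,
            barcode TEXT NOT NULL UNIQUE
        );
        CREATE TABLE experiment (
            id        INTEGER PRIMARY KEY,
            sample_id INTEGER NOT NULL REFERENCES sample(id),
            outcome   TEXT NOT NULL CHECK (outcome IN ('success', 'failure'))
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run one INSERT; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo_db()
    add_experiment = "INSERT INTO experiment (sample_id, outcome) VALUES (?, ?)"
    print(try_insert(db, "INSERT INTO sample (id, barcode) VALUES (?, ?)", (1, "ABC123")))
    print(try_insert(db, add_experiment, (1, "failure")))   # valid: negative results are kept
    print(try_insert(db, add_experiment, (99, "success")))  # rejected: no such sample
    print(try_insert(db, add_experiment, (1, "maybe")))     # rejected: outcome not allowed
```

Putting the rules in the schema rather than in each client program means every tool that writes to the database is held to the same checks.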

Page 18: More speed, more data, more automation, more work? Alun Ashton

www.pims-lims.org

Scientific goals

■ Recording laboratory information
  ■ A lot of data keeping
  ■ 10,000s of experiments
  ■ 1,000,000s of samples
■ Data interchange and interoperation
  ■ Collaboration in protein production
  ■ Share data between stages and sites
  ■ Data transfer to beamline or NMR ops
■ Data mining and reporting
  ■ Analysis
  ■ Negative results can be mined to improve methods
  ■ Scientific publications
  ■ Data deposition
■ All made feasible by data model
  ■ … plus common understanding of it

Page 19: More speed, more data, more automation, more work? Alun Ashton

www.pims-lims.org

Acknowledgements

■ PiMS developers
  ■ Chris Morris (CCP4)
  ■ Ed Daniel (Daresbury)
  ■ Peter Troshin (MPSI)
  ■ Bill Lin (CCP4)
  ■ Jo van Niekerk (SSPF)
  ■ Susy Griffiths (YSBL)
  ■ Jon Diprose (OPPF)
  ■ Marc Savitsky (OPPF)
  ■ Anne Pajon (EBI)
■ Crystallization developers
  ■ Ian Berry (OPPF)
  ■ Gael Seroul (EMBL-Grenoble)
  ■ Diederick de Vries (NKI-Amsterdam)
  ■ Sabrina Haquin (Paris)
■ CCPN developers
  ■ Wayne Boucher
  ■ Rasmus Fogh
  ■ Tim Stevens
  ■ Wim Vranken

Page 20: More speed, more data, more automation, more work? Alun Ashton

What does ‘PiMS’ mean for Diamond and Diamond users?

Page 21: More speed, more data, more automation, more work? Alun Ashton

Synchrotron data

Page 22: More speed, more data, more automation, more work? Alun Ashton

Image format

Page 23: More speed, more data, more automation, more work? Alun Ashton

Images off the beamlines

• ADSC Q315
– ADSC image size – 20-80 MB
– ADSC image rate - <>60 MB/second
• ImgCIF/CBF
– 30% size of ADSC uncompressed images
• NeXus
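As a back-of-envelope illustration of what these image sizes mean for storage, the sketch below counts how many images fit in the 20 TB of raw beamline storage quoted earlier, with and without the 30% imgCIF/CBF size ratio. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and ignores file-system overhead; the function name is made up for this example.

```python
MB = 10**6   # decimal megabyte, an assumption for this estimate
TB = 10**12  # decimal terabyte, an assumption for this estimate


def images_that_fit(storage_bytes: int, image_bytes: int, size_percent: int = 100) -> int:
    """Number of whole images that fit when each is stored at size_percent of its raw size."""
    if image_bytes <= 0 or not 0 < size_percent <= 100:
        raise ValueError("image size must be positive and size_percent in 1..100")
    # Integer arithmetic avoids floating point rounding in the division.
    return (storage_bytes * 100) // (image_bytes * size_percent)


if __name__ == "__main__":
    storage = 20 * TB
    for size_mb in (20, 80):
        raw = images_that_fit(storage, size_mb * MB)
        cbf = images_that_fit(storage, size_mb * MB, size_percent=30)
        print(f"{size_mb} MB images: {raw:,} uncompressed, {cbf:,} at 30% size")
```

Under these assumptions the same disk holds a little over three times as many images once they are stored at 30% of the raw size.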

Page 24: More speed, more data, more automation, more work? Alun Ashton

imgCIF/CBF

ADSC header
• HEADER_BYTES= 512;
• DIM=2;
• BYTE_ORDER=little_endian;
• TYPE=unsigned_short;
• PIXEL_SIZE=0.1026;
• BIN=2x2;
• ADC=fast;
• DETECTOR_SN=922;
• DATE=Fri Sep 15 10:07:46 2006;
• TIME=1.00;
• DISTANCE=250.000;
• OSC_RANGE=1.000;
• PHI=0.000;
• OSC_START=0.000;
• TWOTHETA=0.000;
• AXIS=phi;
• WAVELENGTH=1.0000;
• BEAM_CENTER_X=10.000;
• BEAM_CENTER_Y=20.000;
• CREV=1;
• CCD=TH7899;
• BIN_TYPE=HW;
• ACC_TIME=1781;
• UNIF_PED=1500;
• IMAGE_PEDESTAL=40;
• SIZE1=3072;
• SIZE2=3072;
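The header above is a flat list of KEY=VALUE; entries, so it is simple to read programmatically. The following is a minimal sketch, not a full ADSC reader: the function names are invented here, the sample text is a subset of the header shown on the slide, and the size calculation assumes the pixel data follow the header directly with no compression.

```python
import re

# A subset of the header shown on the slide, as it might appear in a file.
SAMPLE_HEADER = """{
HEADER_BYTES=  512;
DIM=2;
BYTE_ORDER=little_endian;
TYPE=unsigned_short;
DATE=Fri Sep 15 10:07:46 2006;
SIZE1=3072;
SIZE2=3072;
}"""

# Bytes per pixel for the only pixel type that appears on the slide.
BYTES_PER_PIXEL = {"unsigned_short": 2}


def parse_adsc_header(text: str) -> dict[str, str]:
    """Collect KEY=VALUE; entries into a dict of strings."""
    header = {}
    for match in re.finditer(r"([A-Z0-9_]+)\s*=\s*([^;]*);", text):
        header[match.group(1)] = match.group(2).strip()
    return header


def expected_file_size(header: dict[str, str]) -> int:
    """Header bytes plus uncompressed pixel data, in bytes."""
    pixels = int(header["SIZE1"]) * int(header["SIZE2"])
    return int(header["HEADER_BYTES"]) + pixels * BYTES_PER_PIXEL[header["TYPE"]]


if __name__ == "__main__":
    hdr = parse_adsc_header(SAMPLE_HEADER)
    print(hdr["DATE"])
    print(expected_file_size(hdr), "bytes")
```

For the 3072 x 3072 binned image described by this header, the estimate is roughly 19 MB, in line with the lower end of the 20-80 image size range quoted on the previous slide.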

Page 25: More speed, more data, more automation, more work? Alun Ashton

• Synchrotron and Beamline
• Beam conditions: ring energy and current
• Beam size
• Attenuation
• If available, estimate of photon flux coming out of the collimator.
• Backstop type, size and position wrt sample
• Date and time
• Detector type and serial number
• Goniostat (manufacturer and model)
• Method of sample mounting (by hand, arcs/tongs or by robotics (type))
• Temperature of sample
• Sample code (barcode?)
• Text field to allow any special comments relevant to this experiment to be stored, e.g. if the crystal has been annealed and, if so, what the conditions were; whether the crystal has been cryocooled in a capillary, etc.

Page 26: More speed, more data, more automation, more work? Alun Ashton

• Record the mode the synchrotron is running in.

• Attenuation - this should be a calculated factor

• Photon flux + error. Maybe an intensity reading

• A record of an experiment number, this would give us the link back to everything else e.g. user etc.

• An image of the crystal, with the cross hairs marking the beam and beam size?

• Beam size at sample and beam size on detector.
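One way to picture this wish list is as a single record per data collection, where anything not yet captured is easy to spot. The sketch below is purely illustrative: the class, its field names and units are assumptions made for this example and do not describe an existing imgCIF, NeXus or Diamond schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExperimentMetadata:
    """Hypothetical per-collection record; every field starts out unknown (None)."""
    experiment_number: Optional[str] = None
    ring_mode: Optional[str] = None
    ring_energy_gev: Optional[float] = None
    ring_current_ma: Optional[float] = None
    attenuation_factor: Optional[float] = None
    photon_flux_ph_s: Optional[float] = None
    photon_flux_error_ph_s: Optional[float] = None
    beam_size_at_sample_um: Optional[tuple[float, float]] = None
    beam_size_on_detector_um: Optional[tuple[float, float]] = None
    detector_serial: Optional[str] = None
    sample_temperature_k: Optional[float] = None
    sample_code: Optional[str] = None
    comments: str = ""  # free text, e.g. annealing conditions

    def missing(self) -> list[str]:
        """Names of the fields that have not been filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    record = ExperimentMetadata(
        experiment_number="hypothetical-001",
        ring_energy_gev=3.0,
        ring_current_ma=300.0,
    )
    print("still to record:", ", ".join(record.missing()))
```

A record like this could be filled partly by the acquisition software and partly by the user, with the missing() list showing what remains.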

Page 27: More speed, more data, more automation, more work? Alun Ashton

NeXus

• All Diamond data collection runs will produce NeXus files

• NeXus will serve as a longer term data storage format.

Page 28: More speed, more data, more automation, more work? Alun Ashton

More automation

Page 29: More speed, more data, more automation, more work? Alun Ashton

• Joint collaboration between Daresbury SRD and Diamond.

• GDA sits ‘above’ EPICS, which does the majority of low level/component/compound motion control.

Generic Data Acquisition (GDA)

Page 30: More speed, more data, more automation, more work? Alun Ashton

Design considerations

• A single software framework which can be applied to all beamlines
• Must be flexible / adaptable – “plug and play”
– must work with both EPICS and non-EPICS hardware
– highly configurable system: different GUIs and hardware on different beamlines, but all work within the same overall architecture
• Similar look and feel across all beamlines
– users can visit different beamlines without learning new software every time
• A single window to operate the beamline
• Framework defines more than just code: includes programming methodologies, coding conventions etc.
• Result is a system which is simpler and easier to maintain
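The “plug and play” requirement can be illustrated with a small interface sketch: experiment code is written against one abstract device, and EPICS or non-EPICS hardware is swapped in behind it. This is not GDA's actual API; the class and method names, and the PV name, are hypothetical, and both devices are simulated in memory.

```python
from typing import Protocol


class Motor(Protocol):
    """What scan code needs from any motor, whatever drives it."""

    def move_to(self, position: float) -> None: ...

    def position(self) -> float: ...


class SimulatedEpicsMotor:
    """Stand-in for an EPICS-backed motor; a real one would talk to a process variable."""

    def __init__(self, pv_name: str) -> None:
        self.pv_name = pv_name  # hypothetical PV name, unused in the simulation
        self._position = 0.0

    def move_to(self, position: float) -> None:
        self._position = position

    def position(self) -> float:
        return self._position


class SimulatedStepperMotor:
    """Stand-in for non-EPICS hardware that works in whole steps."""

    def __init__(self, steps_per_unit: int) -> None:
        self.steps_per_unit = steps_per_unit
        self._steps = 0

    def move_to(self, position: float) -> None:
        self._steps = round(position * self.steps_per_unit)

    def position(self) -> float:
        return self._steps / self.steps_per_unit


def scan(motor: Motor, start: float, stop: float, n_points: int) -> list[float]:
    """Step the motor through evenly spaced positions and return the readbacks."""
    if n_points < 2:
        raise ValueError("need at least two points")
    step = (stop - start) / (n_points - 1)
    readbacks = []
    for i in range(n_points):
        motor.move_to(start + i * step)
        readbacks.append(motor.position())
    return readbacks


if __name__ == "__main__":
    print(scan(SimulatedEpicsMotor("HYPOTHETICAL:MOTOR1"), 0.0, 1.0, 3))
    print(scan(SimulatedStepperMotor(steps_per_unit=1000), 0.0, 1.0, 3))
```

The scan function never needs to know which kind of hardware it is driving, which is the property the design considerations above ask for.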

Page 31: More speed, more data, more automation, more work? Alun Ashton

Experiment automation

• automateD collectioN of datA – DNA
– Automated strategy calculation using BEST
– Multi crystal ranking and data collection
– Automated autoindex with Mosflm
– Automated integration with Mosflm
– Quick Scaling results for data quality
– Basic radiation damage consideration
– Data reading and writing into beamline database
– MiniKappa incorporation with STAC

Page 32: More speed, more data, more automation, more work? Alun Ashton

DNA

• Acknowledgements
– Cambridge - MRC
– Diamond
– EMBL Grenoble
– EMBL Hamburg
– ESRF
– GlobalPhasing
– Soleil
– SRD Daresbury
– Brookhaven
– Users

• DNA 2.0…..

Page 33: More speed, more data, more automation, more work? Alun Ashton

ISPyB

• Management of experimental data produced in protein crystallography

• Management of experiment related information (shipping of samples, beam time allocation, safety information…)

• Tracking your progress through the experimental process:
– Retrieves information from DataCollection automatically
– Stores both Beamline and Experimental information
– Allows disparate groups to monitor projects
– Communicates with other systems (Sample Changer, DNA, …)
– Portable Interface (using PDA + wireless DataMatrix reader) to track Samples
– User friendly web interface
– Custom interface and access restricted based on privileges
– Generates report

Page 34: More speed, more data, more automation, more work? Alun Ashton

23/11/2005 http://ispyb.esrf.fr

ISPyB: Webservice or web based user interface …

Webservices available for:

• Crystal details

• Shipment

• Diffraction and Screening plan

• Diffraction results

Page 35: More speed, more data, more automation, more work? Alun Ashton

LIMS

DataBase
Solange Delageniere
Ricardo Leal

Darren Spruce
Dominique Porte & MIS Group
Lilian Cardonne
Matias Guijarro
Olof Svensson
Jose Gabadinho

Collaboration to develop joint system ISPyB & associated

BM14

eHTPX

eHTPX members and associated collaborations

Ludovic Launer
Martin Walsh

Hugo Caserotto
Max Nanao
Jean-Baptiste Reiser
Hassan Belrhali

Laurent Geoffroy (Maatel)

David Stuart, Robert Esnouf Oxford, Colin Nave, Rob Allan, Martyn Winn, Daresbury, Kim Henrick EBI, Kevin Cowtan York, Martin Walsh Grenoble

DEVELOPERS: Chris Mayo, Ian Berry (Oxford), Graeme Winter, Ronan Keegan, David Meredith (Daresbury), Joel Fillon (EBI), Paul Young (York), Ludovic Launer (Grenoble)

Florent Cipriani
Franck Felisaz
Jean-Sebastien Aksoy
Bernard Lavault
Arnaud Clere
Julien Huet
S. Cusack

Page 36: More speed, more data, more automation, more work? Alun Ashton

Where does everything fit?

Synchrotron

Crystallization

PIMS (Protein Production)

Data Processing & Structure Solution Pipelines

Collection DB

e-HTPX

Page 37: More speed, more data, more automation, more work? Alun Ashton

Remote data collection

• Remote data monitoring
– ISPyB
• Remote experiment monitoring
– ISPyB
• Remote experiment control
– GDA
– VNC
• eInfrastructure!

Page 38: More speed, more data, more automation, more work? Alun Ashton

10 second pause

Page 39: More speed, more data, more automation, more work? Alun Ashton

How do MX ‘legacy’ projects bespoke solutions fit into a bigger picture?

More work!

Page 40: More speed, more data, more automation, more work? Alun Ashton

e-Science Infrastructure for Diamond Light Source

Page 41: More speed, more data, more automation, more work? Alun Ashton

Phase 1

• Single Sign On
• Automatic cataloguing of data and metadata relating to a scientific experiment.
• Backup all Diamond’s data to the Atlas Data Centre for long term storage.
• Be able to view and retrieve your data.
• Works in conjunction with Diamond’s current computing infrastructure.
• Backbone for further e-Science work

Page 42: More speed, more data, more automation, more work? Alun Ashton

Single Sign On

Page 43: More speed, more data, more automation, more work? Alun Ashton

[Diagram labels: GDA; DDHStorageD; Data / metadata; Nexus File & Data; DUO; DUO Desk; IKitten; DLS ICAT; SRB; People DB; Active Directory; DataPortal; Diamond Proposal Web pages; Atlas Data Store. Legend: Diamond, CICT, Modified by e-Science]

Page 45: More speed, more data, more automation, more work? Alun Ashton

SRB in practice

Page 46: More speed, more data, more automation, more work? Alun Ashton

What Next?

• Work towards live collection of data on Beamlines. Gain operational experience.

• Have a consultation period with scientists to get feedback on the work and input into what metadata to collect.

• Work more closely with the science community to understand what metadata best describes the experiments.

• Add analytical framework.

Page 47: More speed, more data, more automation, more work? Alun Ashton

What's really next?

• More work!

• Plenty of software to demonstrate

Page 48: More speed, more data, more automation, more work? Alun Ashton

Acknowledgements