william o'mullane european space astronomy centre 1 inauguration of institute for data...
TRANSCRIPT
William O'Mullane European Space Astronomy Centre1
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Data Intensive Engineering and Science
Gaia and Virtualisation
Bringing the process to the data – Virtualization?
William O’MullaneGaia Science Operations Development Manager
European Space Astronomy Centre (ESAC)
Madrid,Spain
http://www.rssd.esa.int/Gaia
William O'Mullane European Space Astronomy Centre2
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
SatelliteMission:
Stereoscopic Census of Galaxy arcsec Astrometry G<20 (10^9
sources)Radial Velocities G<16Photometry G < 20
• Status: ESA Corner Stone 6
– ESA provide the hardware and launch Launch: Spring 2012. Satellite In development
EADS/Astrium
William O'Mullane European Space Astronomy Centre3
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Cruise to L2
Graphic -EADSAstrium
William O'Mullane European Space Astronomy Centre4
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Lissajous Scanning L2Full sky 3 fold every six months5 year coverage
Lindegren
William O'Mullane European Space Astronomy Centre5
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Giga Pixel Focal Plane
Image motion
22
106 CCDs , 938 million pixels, 2800 cm2
Star motion in 10 s
Astrometric Field CCDs
Blue P
hotometer C
CD
s
Sky Mapper CCDs
Red P
hotometer C
CD
s Radial-Velocity Spectrometer
CCDs
Basic Angle
Monitor
Wave Front Sensor
Basic Angle
Monitor
Wave Front Sensor
42.3
5cm
104.26cm
William O'Mullane European Space Astronomy Centre6
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Pixels and images ….• Need the astrometric centroid of the CCD
image determined to an accuracy of 1% of the pixel size!
• There will be 10^12 images• Images are ‘windowed’ on board
– Only binned windows are down linked• Never actually get to see the Gaia ‘picture’
• Milimag Photometry also difficult (calib)• Spectra – serious blending problems• And then there is radiation (CTI effects)
William O'Mullane European Space Astronomy Centre7
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Processing• Done by the community
– Data Processing and Analysis Consortium (DPAC)– ~360 active (means >=10%) participants – Divied in 9 Coordination Units (CU) – Top level Executive (DPACE) and Project Office (PO)– 6 Data Processing Centres to run Software
• All code in Java (only one exception ) – for portability have to run till 2020– Maintainability, testability etc.. JUnit, Hudson– Easier to write CORRECT code in higher level language
• Fewer Core Dumps• OK so its replaced by the NPE (Null Pointer Exception)
• Several ‘Relational’ Databases – Oracle– Postgress– MySql– Derby for Testing
William O'Mullane European Space Astronomy Centre8
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
ArchitectureHighly distributed
Multiple independent DPCs and CUs
Want/need decoupleReduce
dependencies
risk
Hub and spokes
Max flexibility for CUs and DPCs
Minimum ICDs = Interface Control Document
William O'Mullane European Space Astronomy Centre9
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
ESA/SOC and DPAC• The Science Operations Centre (SOC) is
funded by ESA (part of Gaia CAC)– Will carry out Science Operations of Gaia – Get the data to DPAC for processing
• Initial processing software to run in ESAC
• SOC is also embedded directly in the DPAC structure since outset:– Provides Architecture and Technical
advice/guidance (CU1) • CU1 also has CNES and other DPC people• All CU leaders in Executive
– Provides one of the six Data Processing Centres– Provides Technical support for Core processing
• Specifically significant effort in Astrometric Solution
William O'Mullane European Space Astronomy Centre10
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
AGIS• Astrometric Global Iterative Solution
(Lindegren,Lammers)• Provide rigid independent reference frame for Gaia
Observations– rotate to ICRS using quasars
• perhaps about 10% of all the processing – only deals with about 10% of data (well behaved stars to make
the grid)
• Block iterative solution– using Gauss-Seidel “preconditioner” with simple iterations– Moving to Conjugate Gradient
• Collaboration with Yoshiyuki Yamada for Nano Jasmine processing – Picardo – last year, Parache now
William O'Mullane European Space Astronomy Centre11
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Global Iterative Solution
Sky scans(highest accuracy
along scan)
Scan width: 0.7°
1. Objects are matched in successive scans2. Attitude and calibrations are updated3. Objects positions etc. are solved4. Higher-order terms are solved5. More scans are added6. Whole system is iterated
William O'Mullane European Space Astronomy Centre12
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
AGIS – Observation ModelThe centroid of a star image is modelled as
noiseoffset
CCD/pixel
Attitude
instrument
position
Source
frame ref.
Global
location
Observed
6 astrometricparameters
fixed
r 000
geom. calibration +
chromaticity +
CTI shift
e.g.PPN white
gaussian, known
nCASGO Symbolically:
quaternion q(t)represented by
cubic splinecoefficients
William O'Mullane European Space Astronomy Centre13
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
AGIS – How?Block-iterative least-squares solution of the over-determined system of equations
initialize S, A, C, G
GCAOS
GCSOA
GASOC
CASOG
one star at a time
one attitude interval at a time
one calibration unit a time + renormalise*
for the whole data set
iterate until convergence renormalise** Sand adjust A
* defines origin of instrument axes** defines origin of celestial axes
(order of operations may vary)
nCASGO
William O'Mullane European Space Astronomy Centre14
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
AGIS Architecture
Optimised AGISDatabase
SourceAttitudeGlobalCalibration
Datatrains drive through AGIS Database passing observations to algorithms.
SourceAttitudeGlobalCalibrationSourceAttitudeGlobalCalibration
SourceAttitudeGlobalCalibrationSourceAttitudeGlobalCalibration
There can be as many Datatrains in parallel as we wish
SourceAttitudeGlobalCalibration
ElemetaryTakers
ObjectFactoryStore GaiaTable
Data Access Layer
AstroElementary
William O'Mullane European Space Astronomy Centre15
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Scheduling• Very simple ..
Keep all machines busy all the time!Busy = CPU ~90%
Post jobs on whiteboard (like OPUS blackboard)
Job 1Job 2Job 3Job 4Job ..Job N
Trains/Workers Mark Jobs – and do them
*
*
Mark finished – repeat until done
**
*
Previous attempt had much more general scheduling It was also ~1000 times slower.
William O'Mullane European Space Astronomy Centre16
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Some important architecture pointsFrom the outset we try in Gaia to:• Keep it just as simple as possible• Isolate algorithms form Data
– Already tried to virtualize algorithms
• Let Data drive the system (DataTrain)– Algorithms mostly not allowed to ‘query’– Specific data access patterns Data orgainised accordingly– Similar to Ferris-Wheel idea (Szalay ) but no hopping on/off!
• Access any piece of data on disk exactly once – preload some data on each node– E.g. 5 years attitude quaternions fit in 150-250Mb
• Be distributed – try to avoid large memory processes– Again in some cases it makes sense ..
William O'Mullane European Space Astronomy Centre17
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Notes on AGIS Implementation• Highly distributed usually running on >40
nodes has run on >100 (1400 threads).• Only uses Java no special MPI libraries
needed – new languages come with almost all you need.– Hard part is breaking problem in
distributable parts – no language really helps with that.
• Truly portable – can run on laptops desktops, clusters and even Amazon cloud.
William O'Mullane European Space Astronomy Centre18
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
AGIS Evolution –selected iterations
• Assuming:– Need 40 Iterations – more complexity – scaling to full data – availability of a x25 better machine ~10TFLOP
• Final AGIS would take ~50days in ESAC
William O'Mullane European Space Astronomy Centre19
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Efficiency - an asideHigh load, low network, high CPU!Some HPC people tell us we should rewrite in C (save energy)
• On SOME machines C is faster • The energy bill is large (see later)• The (re)coding effort also large• IMHO Cost more than the energy• Maintainability ? (to 2020)
Supercomputer centres seem to have very specific macros to include in C code to make it efficient for THEIR machine.
• looks a little like a virtual machine • Why not provide a better JVM for
their machine ?• Or windows CLR ?
William O'Mullane European Space Astronomy Centre20
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Virtualization• Started looking at virtualization ~2007• Seemed ideal for the multiple test setups
needed (VMWARE)• Agreed Cloud experiment 2009 (with
Parsons)• Had to be convincing• RUN AGIS obvious choice
– Already 4 years in development– In Java
• so its portable right!
William O'Mullane European Space Astronomy Centre21
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
AGIS on the cloud• Took one person less than one week to get
running (Parsons,Olias).– Main problem DB config– Also found scalability problem in our code (never
had one hundred nodes before)• It ran at similar performance to our in house
cheap cluster.– E2C indeed is no super computer– Oracle image was available already– AGIS image was straightforward to construct but
was time consuming – better get it correct !• Availability of large number of nodes very
interesting –not affordable in house.
William O'Mullane European Space Astronomy Centre22
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Cost effectiveness of E2C.• AGIS runs intermittently with growing Data volume.
• Estimate 2015 ~1.1MEuro (machine) + 3Meuro (energy bill more?) = ~4Meuro– In fact staggered spending for machines– buy machines as data volume increase
• Estimate on Amazon at today prices -340K for final run + 1.7MEuro for intermittent runs (less data) = ~2Meuro– Possibility to use more nodes and finish faster !
• Reckon you still need in house machine to avoid wasting time testing on E2C
• Old nut, Vendor lock-in ? (Sayeed railways..)
William O'Mullane European Space Astronomy Centre23
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Final cloudy thought- the title!• Gaia Archive will have multi parameter data in time
series– Solar System sources – Galactic sources– Extra Galactic sources
• Cloud seems ideal way to allow complex access to an archive. - Ton Hey tells us MS is doing it – Amazon offer free storage for public datasets..
• Make Data available as Database on cloud– Provide VM to user – User codes in favorite language directly against DB api– BUT it runs local to the Data !
• Should we consider a new type of Archive?• Could VO = Virtualized Observatory ?
William O'Mullane European Space Astronomy Centre24
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Questions ??
Ariane V188 carrying Herschel and Planck (May 14 2009)
William O'Mullane European Space Astronomy Centre25
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
AGIS matrix
calibration (~106)
Filled
Sparse
Zeroes
s1s2s3 ... a1 a2 a3 ... c
source attitude calibration
source (5·108)
attitude (4·107)
0 0
0
0Gauss-Seidel
Pre-Conditioner
(Lammers)
William O'Mullane European Space Astronomy Centre26
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Data Train load
William O'Mullane European Space Astronomy Centre27
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
EC2 Instance Types
Small Large Extra Large
High CPU Medium
High CPU Large
Bits 32 64 64 32 64
RAM 1.7 GB 7.5 GB 15 GB 1.7 GB 7 GB
Disk 160 GB 850 GB 1690 GB 350 GB 1690 GB
EC2 Compute Units 1 4 8 5 20
I/O Performance Medium High High High High
Firewall Yes Yes Yes Yes Yes
William O'Mullane European Space Astronomy Centre28
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
Architecture in the Cloud
ConvergenceMonitor
ConvergenceMonitor
Attitude UpdateServer
Attitude UpdateServer
StoreStore
GaiaTable
Object FactoryObject Factory
AstroElementaryElementaryDataTrain
ElementaryDataTrain
Request AstroElementaries between a range (x,y)
Calibration CollectorCalibration Collector
Attitucde CollectorAttitucde Collector
Source CollectorSource Collector
Global CollectorGlobal Collector
Source CollectorSource Collector
Data Trains
AGIS DB
RunManagerRunManager
1x Large Instance
AGIS AMI
Elastic IP
<n> x Extra Large or High CPU Large instances
AGIS AMI1x Large instance
Oracle AMI
Elastic IP
3 x Extra Large instances
AGIS AMI
William O'Mullane European Space Astronomy Centre29
Inau
gura
tion
of
Inst
itut
e fo
r D
ata
inte
nsiv
e E
ngin
eeri
ng a
nd S
cien
ce
JHU
Bal
tim
ore
MD
, US
A. A
ugus
t 25,
26th 2
009
FLOPS and FLOP count estimatesCurrent hardware
18 Dual-processor, single-core Xeon blades8 Dual-processor, quad-core Xeon blades5 TB FibreChannel SANGives about 400 GFLOPS
Run time of one outer iteration: 1h/106 stars, so, 1 cycle with 40 iterations takes about 2d
CPU occupancy >90% (I/O never a problem)FLOP count estimates:
1.4 · 1020 for creation of final catalog2.2 · 1019 for final cycle [50d on 10 TFLOP machine]Estimate is based on a simple modelRegularly updated