Project Athena Overview
Project Athena: Origins
· The World Modeling Summit (WMS) in May 2008 called for a revolution in climate modeling to more rapidly advance improvements in accuracy and reliability
· The WMS recommended petascale supercomputers dedicated to climate modeling based in at least 3 international facilities
· Dedicated petascale machines are needed to provide enough computational capability and a controlled environment to support long runs and the management, analysis and stewardship of very large (petabyte) data sets
· The U.S. National Science Foundation, recognizing the importance of the problem, realized that a resource (Athena) was available to meet the challenge of the World Modeling Summit and offered to dedicate the Athena supercomputer for 6 months in 2009-2010
· An international collaboration was formed among groups in the U.S., Japan and the U.K. to use Athena to take up the challenge
Project Athena: Collaborating Groups
· COLA - Center for Ocean-Land-Atmosphere Studies, USA (NSF-funded)
· ECMWF - European Centre for Medium-range Weather Forecasts, UK
· JAMSTEC - Japan Agency for Marine-Earth Science and Technology, Research Institute for Global Change, Japan
· University of Tokyo, Japan
· NICS - National Institute for Computational Sciences, USA (NSF-funded)
· Cray Inc.
Codes
· NICAM: Nonhydrostatic Icosahedral Atmospheric Model
· IFS: ECMWF Integrated Forecast System
Supercomputers
· Athena: Cray XT4 - 4512 quad-core Opteron nodes (18,048 cores); #30 on the Top500 list (November 2009); dedicated Oct 2009 - Mar 2010
· Kraken: Cray XT5 - 8256 dual hex-core Opteron nodes (99,072 cores); #3 on the Top500 list (November 2009); replaced Athena; allocation of 5M SUs
Straus/GMU/COLA AGU Dec 2010 3
Athena Experiments
http://wxmaps.org/athena/home/
[Figures: surface pressure and potential vorticity fields]
Blocking Index: 13-month integrations of the ECMWF model (at T159 and T1279), DJFM 1960-2003, compared with ERA-40.
[Figure: blocking index for ERA-40, T159 and T1279]
DJFM Weather Regimes, Euro-Atlantic Region: 500 hPa geopotential height - ERA, DJFM 1960-2007
Winds@850 hPa: DJF 1990-2007
[Figure: 850 hPa wind differences. Panels: (a) T159 - ERA, (b) T511 - T159, (c) T1279 - T511, (d) T2047 - T1279. Domain: 100E-60W, 40S-40N; reference vector 5.0 m/s]
Aircraft observations showing spectra of wind components and temperature, plotted as log(E) vs. log(k), so that the slope of the straight lines indicates the exponent n in the previous slide.
Other in-situ observations have confirmed these results!
Atmospheric Spectra: Power Laws
Two scaling regimes in the log-log plot of energy vs. wavenumber, the second spanning roughly 100 - 10 km.
ECMWF Dec-March Simulations: Eddy Kinetic Energy Spectrum 250 hPa
Total Eddy Kinetic Energy, ECMWF; 250 hPa; 5 DJF seasons
Black: T511 (40 km grid); Red: T1279 (16 km grid); Blue: T2047 (10 km grid)
Note the sudden downturn in the spectra: it suggests a dissipation regime.
Hint of two regimes at T1279 and T2047but not at T511
Slope of Total Eddy Kinetic Energy
Black: T511 (40 km grid); Red: T1279 (16 km grid); Blue: T2047 (10 km grid); 250 hPa level; 5 DJF seasons
Local Spectral Slope b: E_n ~ n^(-b)
Least squares fit of log10(eddy kinetic energy) to a line with slope -b, locally over a range of constant log10(n)
Large slope indicates dissipation regime
y-axis is b; x-axis is log10(n)
T2047, T1279 show weak shallowing of spectra at higher wavenumbers
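The local least-squares slope fit described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the fixed index window, and its width are assumptions.

```python
import numpy as np

def local_spectral_slope(E, n, window=9):
    """Estimate the local slope b in E_n ~ n^(-b) by a least-squares fit
    of log10(E) against log10(n) over a sliding window of wavenumbers
    (here a fixed index window, as a stand-in for a window of constant
    log10(n) width)."""
    logn, logE = np.log10(n), np.log10(E)
    half = window // 2
    b = np.full(len(n), np.nan)  # edges where the window does not fit stay NaN
    for i in range(half, len(n) - half):
        sl = slice(i - half, i + half + 1)
        # np.polyfit returns [slope, intercept]; the fitted slope equals -b
        b[i] = -np.polyfit(logn[sl], logE[sl], 1)[0]
    return b

# Check on a synthetic pure power-law spectrum E_n = n^(-3):
n = np.arange(1, 200)
E = n.astype(float) ** -3.0
b = local_spectral_slope(E, n)  # interior values should be very close to 3
```

On real spectra the slope varies with n, which is exactly what the slide's plot of b against log10(n) shows.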
Athena Results
• Seasonal Length Runs
• Results shown for 5 DJF seasons and 5 JJA seasons
• Results for both ECMWF and NICAM models
Cluster analysis methodology
The (modified) K-means cluster analysis method (K is the number of clusters into which the data will be grouped; this number must be specified in advance) (Straus et al. 2007) can be summarized in the following four steps:
1) Identification of clusters in the reduced phase space defined by the empirical orthogonal functions (EOFs). The leading EOFs (explaining about 80% of the space-time variance) are retained.
2) For a given number k of clusters, the optimum partition of the data into k clusters is found by an algorithm that takes an initial cluster assignment (based on the distance from pseudorandom seed points) and iteratively changes it by assigning each element to the cluster with the closest centroid, until a "stable" classification is achieved. (A cluster centroid is defined by the average of the PC coordinates of all states that lie in that cluster.)
3) This process is repeated many times (using different seeds), and for each partition the ratio r*_k of the variance among cluster centroids (weighted by the population) to the average intra-cluster variance is recorded.
4) The partition that maximises this ratio is the optimal one.
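The four steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes the data have already been projected onto the leading PCs, and the function names and number of restarts are assumptions.

```python
import numpy as np

def kmeans_partition(X, k, rng):
    """One K-means run: assign each state to the nearest of k seed points
    drawn from the data, then reassign to the nearest centroid until the
    classification is stable (steps 1-2)."""
    seeds = X[rng.choice(len(X), k, replace=False)]
    labels = np.argmin(((X[:, None] - seeds[None]) ** 2).sum(-1), axis=1)
    while True:
        # centroid = average of the PC coordinates of the states in the cluster
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        new = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        if np.array_equal(new, labels):
            return labels, centroids
        labels = new

def variance_ratio(X, labels, centroids):
    """Ratio r_k of the population-weighted variance among cluster
    centroids to the average intra-cluster variance (step 3)."""
    k = len(centroids)
    pops = np.array([(labels == j).sum() for j in range(k)])
    gm = X.mean(axis=0)
    among = (pops * ((centroids - gm) ** 2).sum(-1)).sum() / pops.sum()
    within = np.mean([((X[labels == j] - centroids[j]) ** 2).sum(-1).mean()
                      for j in range(k)])
    return among / within

def best_partition(X, k, n_tries=50, seed=0):
    """Repeat with many pseudorandom seeds and keep the partition that
    maximises the variance ratio (step 4)."""
    rng = np.random.default_rng(seed)
    return max((kmeans_partition(X, k, rng) for _ in range(n_tries)),
               key=lambda lc: variance_ratio(X, *lc))
```

The ratio maximised here is the r*_k recorded in step 3 and reused below in the significance test.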
Cluster analysis - Significance
The goal is to assess the strength of the clustering compared to that expected from an appropriate reference distribution, such as a multidimensional Gaussian distribution.
In assessing whether the null hypothesis of multi-normality can be rejected, it is therefore necessary to perform Monte-Carlo simulations using a large number M of synthetic data sets. Each synthetic data set has precisely the same length as the original data set against which it is compared, and it is generated from a series of n dimensional Markov processes, whose mean, variance and first-order auto-correlation are obtained from the observed data set.
A cluster analysis is performed for each one of the simulated data sets. For each k-partition the ratio r^m_k of the variance among cluster centroids to the average intra-cluster variance is recorded. Since the synthetic data are assumed to have a unimodal distribution, the proportion P_k of red-noise samples for which r^m_k < r*_k is a measure of the significance of the k-cluster partition of the actual data, and 1 - P_k is the corresponding confidence level for the existence of k clusters.
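The surrogate-generation and comparison steps can be sketched as below. This is a hedged illustration: it assumes independent AR(1) (first-order Markov) processes per dimension, and `ratio_fn`, standing in for the full clustering-and-ratio computation of the previous slide, is a hypothetical placeholder.

```python
import numpy as np

def ar1_surrogate(X, rng):
    """One synthetic data set of the same length as X, generated from
    first-order Markov (AR(1)) processes whose mean, variance and
    lag-1 autocorrelation match the observed data, dimension by dimension."""
    T, d = X.shape
    mu, sig = X.mean(0), X.std(0)
    z = (X - mu) / sig
    phi = (z[1:] * z[:-1]).mean(0)      # lag-1 autocorrelation estimate
    eps = rng.standard_normal((T, d)) * np.sqrt(1.0 - phi ** 2)
    y = np.empty((T, d))
    y[0] = rng.standard_normal(d)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + eps[t]  # unit-variance red noise
    return mu + sig * y

def cluster_significance(X, k, ratio_fn, M=100, seed=0):
    """P_k: the proportion of M red-noise surrogates whose variance
    ratio r^m_k falls below the actual data's ratio r*_k."""
    rng = np.random.default_rng(seed)
    r_star = ratio_fn(X, k)
    r_noise = np.array([ratio_fn(ar1_surrogate(X, rng), k) for _ in range(M)])
    return float((r_noise < r_star).mean())
```

A `ratio_fn` built from the K-means procedure would return the optimal r*_k for a given data set; the source's Monte Carlo uses a large number M of such surrogates.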
Cluster analysis - How many clusters?
The need to specify the number of clusters can be a disadvantage of the K-means method if we don't know in advance the best cluster partition of the data set in question. However, there are some criteria that can be used to choose the optimal partition.
Significance: the partition with the highest significance with respect to predefined multinormal distributions.
Reproducibility: as a measure of reproducibility we can use the mean-squared error between best-matching cluster centroids from N pairs of randomly chosen half-length datasets drawn from the full one. The partition with the highest reproducibility is chosen.
Consistency: consistency can be assessed both with respect to variable (for example, comparing clusters obtained from dynamically linked variables) and with respect to domain (tests of sensitivity to the lateral or vertical domain).
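The reproducibility measure can be sketched as follows. This is an illustration under stated assumptions: `cluster_fn`, which should return the k cluster centroids for a data set, is a hypothetical placeholder, and the brute-force centroid matching assumes a small k.

```python
import numpy as np
from itertools import permutations

def centroid_match_error(ca, cb):
    """Mean-squared error between two centroid sets under the best
    one-to-one matching (brute force over permutations; fine for small k)."""
    return min(float(((ca - cb[list(p)]) ** 2).mean())
               for p in permutations(range(len(cb))))

def reproducibility(X, k, cluster_fn, n_pairs=25, seed=0):
    """Average centroid-matching error over n_pairs random splits of the
    data into two half-length sets; a smaller error means the partition
    is more reproducible."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_pairs):
        idx = rng.permutation(len(X))
        half = len(X) // 2
        ca = cluster_fn(X[idx[:half]], k)   # centroids from first half
        cb = cluster_fn(X[idx[half:]], k)   # centroids from second half
        errs.append(centroid_match_error(ca, cb))
    return float(np.mean(errs))
```

Comparing this error across candidate values of k is one way to operationalise "the partition with the highest reproducibility".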