![Page 1: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/1.jpg)
Parallelization work to be completed
1. Decide on the parallel domain decomposition scheme.
2. timcom preprocessor in f95 with dynamically allocated memory (inmets, indata, bounds).
3. timcom main code in f95 with dynamically allocated memory.
4. EVP solver in f95 with dynamically allocated memory.
5. Subroutines a2o/o2a with cross-CPU-core data exchange, or
6. Rewrite timcom so that each CPU core can handle the northern and southern hemispheres at the same time.
7. NetCDF input and output.
![Page 2: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/2.jpg)
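Domain decomposition schemes
1) Adopt the timcom layout. Subroutines such as a2o/o2a must be modified so that they can exchange timcom and echam data across nodes. Pro: the ghost-zone transfer volume in the y direction is half that of scheme 2). Con: an additional (llon*llat-2*ng*llat) of data must be exchanged across cores.
2) Adopt the echam layout. In this case every CPU core must compute the ocean domain of both the northern and southern hemispheres at the same time, which requires modifying part of the timcom code. Pro: with the same number of CPUs it will be faster than 1), because less data has to be exchanged across cores (under the condition llon>2).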
![Page 3: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/3.jpg)
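[Figure: echam-style decomposition of the glon x glat grid with nproca (4) processors along latitude and nprocb (3) along longitude. Processor numbers by latitude row, from one pole to the other: 1 5 9 / 2 6 10 / 3 7 11 / 4 8 12 / EQ / 4 8 12 / 3 7 11 / 2 6 10 / 1 5 9, so each processor holds one latitude band in each hemisphere, mirrored about the equator.]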
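The numbering in the figure can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `hemisphere_mirrored_layout` is hypothetical and is not part of echam or timcom, and it assumes that processors are numbered column by column, starting from 1, as the figure suggests.

```python
def hemisphere_mirrored_layout(nproca, nprocb):
    """Return the processor number of every latitude row and longitude column.

    Assumption: processors are numbered 1..nproca*nprocb, column by column,
    and each one owns the same latitude band in both hemispheres.
    """
    # Rows for one hemisphere, from the pole towards the equator.
    one_hemisphere = [
        [ib * nproca + ia + 1 for ib in range(nprocb)]
        for ia in range(nproca)
    ]
    # The other hemisphere repeats the same rows in reverse order.
    return one_hemisphere + one_hemisphere[::-1]


layout = hemisphere_mirrored_layout(nproca=4, nprocb=3)
for row in layout:
    print(*row)
```

With nproca = 4 and nprocb = 3 the first four rows run from processor row 1 to 4 and the last four run back from 4 to 1, which is the mirrored pattern drawn on the slide.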
![Page 4: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/4.jpg)
[Figure: timcom grid layout of one subdomain including ng ghost cells. The X axis runs from X0DEG to X1DEG with indices 1, 2 (jow0), ..., I1 (ioe0), I0; the Y axis runs from Y0DEG to Y1DEG with indices 1, 2 (jos0), 3, ..., J1 (jon0), J0. Labelled quantities: YDEG(j) and Y(j), YVDEG(j) and YV(j), and the grid spacings DX(j), DY(j) and DYV(j).]
![Page 5: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/5.jpg)
![Page 6: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/6.jpg)
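Parallel Consideration
Many settings in the current version are still problematic, so it crashes almost immediately. Still, it is worth a try.
In addition, if possible, we suggest merging the subroutines in the current mo_ocean that have the same function as those in the original timcom into the standalone parallelized timcom version. This makes it easier to test whether they work correctly. In particular, we hope to develop an ocean-only model with ng>=2, written in f90 with dynamically allocated memory. These tests will help us merge the code into echam later.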
![Page 7: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/7.jpg)
Information for the whole ECHAM domain
• nlon: number of longitudes of the global domain
• nlat: number of latitudes of the global domain
• nlev: number of levels of the global domain

Information valid for all processes of a model instance
• nprocb: number of processors for the dimension that counts longitudes
• nproca: number of processors for the dimension that counts latitudes
• d_nprocs: number of processors used in the model domain (nproca × nprocb)
• spe, epe: index numbers of the first and last processor that handle this model domain
• mapmesh(ib,ia): array mapping from a logical 2-d mesh to the processor index numbers within the global decomposition table, with ib = 1, nprocb and ia = 1, nproca
![Page 8: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/8.jpg)
General local information
• pe: processor identifier. This number is used in the MPI send and receive routines.
• set_b: index of the processor in the direction of longitudes. This number determines the location within the array mapmesh. Processors with ascending numbers handle subdomains with increasing longitudes.
• set_a: index of the processor in the direction of latitudes. This number determines the location within the array mapmesh. Processors with ascending numbers handle subdomains with decreasing values of absolute latitude.
![Page 9: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/9.jpg)
Grid space decomposition
• nglat, nglon: number of latitudes and longitudes in grid space handled by this processor
• nglpx: number of longitudes allocated
• glats(1:2), glate(1:2): start and end values of the global latitude indices
• glons(1:2), glone(1:2): start and end values of the global longitude indices
• glat(1:nglat): global latitude index
• glon(1:nglon): offset to the global longitude index
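The two-element shape of glats and glate follows from each processor owning one latitude band per hemisphere. The sketch below shows how such start and end indices could be derived. It is an assumption-laden illustration, not ECHAM code: the function name `latitude_bands` is hypothetical, latitudes are assumed to be numbered 1 to nlat from north to south, nlat/2 is assumed to be divisible by nproca, and element 1 is taken as the northern band and element 2 as the southern band.

```python
def latitude_bands(nlat, nproca, set_a):
    """Return (glats, glate) as pairs of 1-based global latitude indices.

    Assumptions (not taken from the ECHAM source): latitudes are numbered
    north to south, nlat // 2 is divisible by nproca, and set_a = 1 owns
    the bands closest to the poles.
    """
    per_band = (nlat // 2) // nproca
    north_start = (set_a - 1) * per_band + 1
    north_end = set_a * per_band
    # Mirror the northern band about the equator to get the southern one.
    south_start = nlat - north_end + 1
    south_end = nlat - north_start + 1
    glats = (north_start, south_start)
    glate = (north_end, south_end)
    return glats, glate


glats, glate = latitude_bands(nlat=64, nproca=4, set_a=1)
print(glats, glate)
```

Under these assumptions, processors with larger set_a get bands closer to the equator, in line with the description of set_a above.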
![Page 10: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/10.jpg)
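The echam memory_g3b variables (such as sitwt and sitwu) are all local variables. They are not scattered from a main process and then collected from the individual processors; instead, each node computes them separately. However, the echam data layout still differs from that of timcom.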
![Page 11: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/11.jpg)
The Lin-Rood Finite Volume (FV) Dynamical Core: Tutorial
Christiane Jablonowski
National Center for Atmospheric Research
Boulder, Colorado
NCAR Tutorial, May 31, 2005
![Page 12: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/12.jpg)
Topics that we discuss today
• The Lin-Rood Finite Volume (FV) dynamical core
 – History: where, when, who, …
 – Equations & some insights into the numerics
 – Algorithm and code design
• The grid
 – Horizontal resolution
 – Grid staggering: the C-D grid concept
 – Vertical grid and remapping technique
• Practical advice when running the FV dycore
 – Namelist and netcdf variables (input & output)
 – Dynamics - physics coupling
• Hybrid parallelization concept
 – Distributed-shared memory parallelization approach: MPI and OpenMP
• Everything you would like to know
![Page 13: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/13.jpg)
Who, when, where, …
• FV transport algorithm developed by S.-J. Lin and Ricky Rood (NASA GSFC) in 1996
• 2D shallow water model in 1997
• 3D FV dynamical core around 1998/1999
• Until 2000: FV dycore mainly used in the data assimilation system at NASA GSFC
• Also: transport scheme in ‘Impact’, offline tracer transport
• In 2000: FV dycore was added to NCAR’s CCM3.10 (now CAM3)
• Today (2005): the FV dycore
 – might become the default in CAM3
 – is used in WACCM
 – is used in the climate model at GFDL
![Page 14: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/14.jpg)
Dynamical cores of General Circulation Models
Dynamics
Physics
FV: No explicit diffusion (besides divergence damping)
![Page 15: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/15.jpg)
The NASA/NCAR finite volume dynamical core
• 3D hydrostatic dynamical core for climate and weather prediction:
 – 2D horizontal equations are very similar to the shallow water equations
 – 3rd dimension in the vertical direction is a floating Lagrangian coordinate: pure 2D transport with vertical remapping steps
• Numerics: Finite volume approach
 – conservative and monotonic 2D transport scheme
 – upwind-biased orthogonal 1D fluxes, operator splitting in 2D
 – van Leer second order scheme for time-averaged numerical fluxes
 – PPM third order scheme (piecewise parabolic method) for prognostic variables
 – Staggered grid (Arakawa D-grid for prognostic variables)
![Page 16: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/16.jpg)
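The 3D Lin-Rood Finite-Volume Dynamical Core

Momentum equation in vector-invariant form:

$$\frac{\partial \vec{v}_h}{\partial t} + (\zeta + f)\,\hat{k} \times \vec{v}_h + \nabla (K + \Phi - \nu D) + \frac{1}{\rho} \nabla p = 0$$

$\frac{1}{\rho} \nabla p$: pressure gradient term in finite volume form

Continuity equation:

$$\frac{\partial\, \delta p}{\partial t} + \nabla \cdot (\vec{v}_h\, \delta p) = 0$$

Thermodynamic equation, also for tracers (replace $\Theta$ with the tracer mixing ratio):

$$\frac{\partial (\delta p\, \Theta)}{\partial t} + \nabla \cdot (\vec{v}_h\, \delta p\, \Theta) = 0$$

The prognostic variables are $u$, $v$, $\Theta$, $\delta p$, with $\Phi = gz$.

$\delta p$: pressure thickness; $\Theta = T p^{-\kappa}$: scaled potential temperature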
![Page 17: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/17.jpg)
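Finite volume principle

Continuity equation in flux form:

$$\frac{\partial\, \delta p}{\partial t} + \nabla \cdot (\vec{v}_h\, \delta p) = 0$$

Integrate over one time step $\Delta t$ and the 2D finite volume $\Omega$ with area $A$:

$$\int_{t_n}^{t_{n+1}} \int_{\Omega} \frac{\partial\, \delta p}{\partial t}\, d\Omega\, dt + \int_{t_n}^{t_{n+1}} \int_{\Omega} \nabla \cdot (\vec{v}_h\, \delta p)\, d\Omega\, dt = 0$$

Integrate and rearrange:

$$A \left( \overline{\delta p}^{\,n+1} - \overline{\delta p}^{\,n} \right) + \Delta t \int_{\Omega} \nabla \cdot \vec{F}\, d\Omega = 0$$

$\vec{F}$: time-averaged numerical flux
$\overline{\delta p}$: spatially averaged pressure thickness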
![Page 18: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/18.jpg)
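Finite volume principle

Apply the Gauss divergence theorem:

$$\overline{\delta p}^{\,n+1} - \overline{\delta p}^{\,n} + \frac{\Delta t}{A} \oint_{\partial \Omega} \vec{F} \cdot \hat{n}\, dl = 0$$

$\hat{n}$: unit normal vector

Discretize:

$$\overline{\delta p}^{\,n+1} = \overline{\delta p}^{\,n} - \frac{\Delta t}{A} \sum_{i=1}^{4} \vec{F}_i \cdot \hat{n}_i\, \delta l_i$$

$$\overline{\delta p}_{i,j}^{\,n+1} = \overline{\delta p}_{i,j}^{\,n} - \frac{\Delta t}{A_{i,j}} \left( \Delta y_{i+\frac{1}{2},j}\, F_{i+\frac{1}{2},j} - \Delta y_{i-\frac{1}{2},j}\, F_{i-\frac{1}{2},j} \right) - \frac{\Delta t}{A_{i,j}} \left( \Delta x_{i,j+\frac{1}{2}}\, G_{i,j+\frac{1}{2}} - \Delta x_{i,j-\frac{1}{2}}\, G_{i,j-\frac{1}{2}} \right)$$

$$\vec{F} = (F, G)^T$$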
![Page 19: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/19.jpg)
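Orthogonal fluxes across cell interfaces
[Figure: cell (i,j) with the fluxes F i-1/2,j and F i+1/2,j across its x interfaces and G i,j-1/2 and G i,j+1/2 across its y interfaces; an arrow marks the wind direction.]
• F: fluxes in x direction
• G: fluxes in y direction
• Flux form ensures mass conservation
• Upwind-biased with respect to the wind direction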
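The statement that the flux form ensures mass conservation can be checked numerically. The sketch below applies the flux-form update to a small doubly periodic grid with uniform spacing and arbitrary interface fluxes. The function name `flux_form_update`, the periodic boundaries and the uniform dx and dy are assumptions made for this illustration; they are not taken from the FV dycore code.

```python
import random


def flux_form_update(dp, F, G, dt, dx, dy):
    """Advance cell-mean values dp by one step of the flux-form update.

    F[i][j] is the time-averaged flux through the face at i-1/2 and
    G[i][j] the flux through the face at j-1/2 (periodic in both directions).
    """
    ni, nj = len(dp), len(dp[0])
    area = dx * dy
    new = [[0.0] * nj for _ in range(ni)]
    for i in range(ni):
        for j in range(nj):
            ip, jp = (i + 1) % ni, (j + 1) % nj
            # Net outflow: flux leaving through i+1/2 and j+1/2 minus
            # flux entering through i-1/2 and j-1/2.
            net = dy * (F[ip][j] - F[i][j]) + dx * (G[i][jp] - G[i][j])
            new[i][j] = dp[i][j] - dt / area * net
    return new


random.seed(0)
ni, nj = 6, 5
dp = [[random.random() for _ in range(nj)] for _ in range(ni)]
F = [[random.random() for _ in range(nj)] for _ in range(ni)]
G = [[random.random() for _ in range(nj)] for _ in range(ni)]
new = flux_form_update(dp, F, G, dt=0.1, dx=1.0, dy=2.0)
mass_before = sum(map(sum, dp))
mass_after = sum(map(sum, new))
```

Every interface flux is subtracted from one cell and added to its neighbour, so the total is unchanged up to round-off whatever values F and G take.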
![Page 20: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/20.jpg)
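Quasi semi-Lagrange approach in x direction
[Figure: cell (i,j) with the fluxes G i,j-1/2 and G i,j+1/2 in the y direction and F i+1/2,j in the x direction; the upstream footprint in x reaches back to F i-5/2,j.]
• CFLx = u * Δt/Δx > 1 possible: implemented as an integer shift and a fractional flux calculation
• CFLy = v * Δt/Δy < 1 required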
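The integer shift plus fractional flux idea can be sketched in one dimension. The example below assumes a uniform, non-negative Courant number on a periodic grid and uses a constant (1st order) subgrid distribution for the fractional part, which is a simplification; the FV dycore would use van Leer or PPM there. The function name `ffsl_step` is hypothetical.

```python
import math


def ffsl_step(q, cfl):
    """One 1D flux-form semi-Lagrangian step on a periodic grid.

    q holds cell means and cfl = u * dt / dx >= 0 is uniform.
    The mass swept through each left face is split into an integer number
    of whole upstream cells plus a fraction of the next cell upstream.
    """
    n = len(q)
    shift = int(math.floor(cfl))
    frac = cfl - shift
    swept = []
    for i in range(n):
        whole = sum(q[(i - 1 - m) % n] for m in range(shift))
        part = frac * q[(i - 1 - shift) % n]
        swept.append(whole + part)
    # Flux form: what enters through the left face minus what leaves
    # through the right face.
    return [q[i] + swept[i] - swept[(i + 1) % n] for i in range(n)]


pulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
shifted = ffsl_step(pulse, cfl=2.0)
spread = ffsl_step(pulse, cfl=2.5)
```

With cfl = 2.0 the pulse moves exactly two cells. With cfl = 2.5 it is split evenly between the cells two and three positions downstream, and in both cases the sum of q is preserved even though the Courant number exceeds 1.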
![Page 21: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/21.jpg)
Numerical fluxes & subgrid distributions
• 1st order upwind
 – constant subgrid distribution
• 2nd order van Leer
 – linear subgrid distribution
• 3rd order PPM (piecewise parabolic method)
 – parabolic subgrid distribution
• ‘Monotonicity’ versus ‘positive definite’ constraints
• Numerical diffusion

Explicit time stepping scheme: requires short time steps that are stable for the fastest waves (e.g. gravity waves)

CGD web page for CAM3: http://www.ccsm.ucar.edu/models/atm-cam/docs/description/
![Page 22: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/22.jpg)
Subgrid distributions: constant (1st order)
[Figure: constant subgrid distribution of u in each of the cells x1, x2, x3, x4.]
![Page 23: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/23.jpg)
Subgrid distributions: piecewise linear (2nd order)
[Figure: piecewise linear (van Leer) subgrid distribution of u in the cells x1, x2, x3, x4.]
See details in van Leer 1977
![Page 24: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/24.jpg)
Subgrid distributions: piecewise parabolic (3rd order)
[Figure: piecewise parabolic (PPM) subgrid distribution of u in the cells x1, x2, x3, x4.]
See details in Carpenter et al. 1990 and Colella and Woodward 1984
![Page 25: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/25.jpg)
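Monotonicity constraint
[Figure: van Leer subgrid distribution of u in the cells x1, x2, x3, x4 with the monotonicity constraint applied; a reconstruction that over- or undershoots is marked "not allowed".]
• The monotonicity constraint results in discontinuities
• Prevents over- and undershoots
• Adds diffusion
See details of the monotonicity constraint in van Leer 1977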
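To make the effect of the constraint concrete, the sketch below computes a limited slope for one cell from its two neighbours. It uses one common way of writing a van Leer type monotonicity limiter (centred slope, clipped so that the linear profile stays within the range of the neighbouring cell means). Whether this matches the exact form used in the FV dycore is an assumption, and the function name `limited_slope` is hypothetical.

```python
import math


def limited_slope(q_left, q_center, q_right):
    """Return the slope (difference across the cell) of the linear subgrid
    distribution in the centre cell, limited to keep it monotonic."""
    centred = 0.5 * (q_right - q_left)
    q_min = min(q_left, q_center, q_right)
    q_max = max(q_left, q_center, q_right)
    # Clip so that the cell-edge values q_center +/- slope/2 stay between
    # the smallest and largest of the three cell means.
    magnitude = min(abs(centred), 2.0 * (q_center - q_min), 2.0 * (q_max - q_center))
    return math.copysign(magnitude, centred)


smooth = limited_slope(0.0, 1.0, 2.0)    # smooth data: centred slope is kept
extremum = limited_slope(0.0, 1.0, 0.0)  # local maximum: slope is set to zero
steep = limited_slope(0.0, 0.1, 10.0)    # steep gradient: slope is clipped
```

At a local extremum the slope is flattened, which is why the constraint prevents over- and undershoots but also adds diffusion and leaves jumps between neighbouring cells.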
![Page 26: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/26.jpg)
Simplified flow chart
[Flow chart with the routines stepon, dynpkg, physpkg, cd_core, trac2d, te_map, d_p_coupling and p_d_coupling.]
• cd_core (subcycled):
 – c_sw: 1/2 Δt only, compute C-grid time-mean winds
 – d_sw: full Δt, update all D-grid variables
• te_map: vertical remapping
![Page 27: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/27.jpg)
Grid staggerings (after Arakawa)
[Figure: placement of the wind components u and v and of the scalars on the Arakawa A, B, C and D grids.]
Scalars: Θ, δp
![Page 28: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/28.jpg)
Regular latitude - longitude grid
• Converging grid lines at the poles decrease the physical spacing Δx
• Digital and Fourier filters remove unstable waves at high latitudes
• Pole points are mass-points
![Page 29: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/29.jpg)
Typical horizontal resolutions

Defaults:

| Δlat x Δlon | Lat x Lon | Max. Δx (km) | Δt (s) | ≈ spectral |
| --- | --- | --- | --- | --- |
| 4° x 5° | 46 x 72 | 556 | 7200 | T21 (32x64) |
| 2° x 2.5° | 91 x 144 | 278 | 3600 | T42 (64x128) |
| 1° x 1.25° | 181 x 288 | 139 | 1800 | T85 (128x256) |

• The time step Δt is the ‘physics’ time step
• Dynamics are subcycled using the time step Δt/nsplit
• ‘nsplit’ is typically 8 or 10

CAM3: check (dtime = 1800 s due to physics?)
WACCM: check (nsplit = 4, dtime = 1800 s for 2° x 2.5°?)
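The maximum grid spacing and the dynamics sub-step follow from simple arithmetic. The sketch below recomputes the maximum Δx at the equator from the number of longitudes and derives the dynamics time step from the physics time step and nsplit. The Earth radius of 6371 km and the function names are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def max_dx_km(nlon):
    """Largest zonal grid spacing (at the equator) for nlon longitudes."""
    return 2.0 * math.pi * EARTH_RADIUS_KM / nlon


def dynamics_dt(dtime, nsplit):
    """Time step of the subcycled dynamics for a physics step dtime."""
    return dtime / nsplit


for nlon in (72, 144, 288):
    print(nlon, round(max_dx_km(nlon)))

print(dynamics_dt(1800.0, 8))
```

For 72, 144 and 288 longitudes the rounded spacings are 556, 278 and 139 km, and a physics step of 1800 s with nsplit = 8 gives a dynamics step of 225 s.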
![Page 30: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/30.jpg)
Idealized baroclinic wave test case
Jablonowski and Williamson 2005
The coarse resolution does not capture the evolution of the baroclinic wave
![Page 31: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/31.jpg)
Idealized baroclinic wave test case
Finer resolution: Clear intensification of the baroclinic wave
![Page 32: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/32.jpg)
Idealized baroclinic wave test case
Finer resolution: Clear intensification of the baroclinic wave, it starts to converge
![Page 33: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/33.jpg)
Idealized baroclinic wave test case
Baroclinic wave pattern converges
![Page 34: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/34.jpg)
Idealized baroclinic wave test case: Convergence of the FV dynamics
• Solution starts converging at 1 deg
• Global L2 error norms of ps
• Shaded region indicates the uncertainty of the reference solution
![Page 35: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/35.jpg)
Floating Lagrangian vertical coordinate
• 2D transport calculations with moving finite volumes (Lin 2004)
• Layers are material surfaces, no vertical advection
• Periodic re-mapping of the Lagrangian layers onto the reference grid
• WACCM: 66 vertical levels with model top around 130 km
• CAM3: 26 levels with model top around 3 hPa (40 km)
• http://www.ccsm.ucar.edu/models/atm-cam/docs/description/
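The periodic re-mapping step can be illustrated with a conservative remap in one column. The sketch below assumes a piecewise constant distribution inside each deformed Lagrangian layer, which is a simplification chosen for brevity; the FV dycore selects the subgrid distribution for this step through KORD. The function name `remap_conservative` and the layer edges are hypothetical.

```python
def remap_conservative(src_edges, src_values, dst_edges):
    """Remap layer means from deformed layers onto reference layers.

    Each destination layer receives the overlap-weighted average of the
    source layers, so the column integral is preserved.
    """
    remapped = []
    for lo, hi in zip(dst_edges[:-1], dst_edges[1:]):
        total = 0.0
        for a, b, value in zip(src_edges[:-1], src_edges[1:], src_values):
            overlap = min(hi, b) - max(lo, a)
            if overlap > 0.0:
                total += value * overlap
        remapped.append(total / (hi - lo))
    return remapped


# Two deformed layers remapped onto three reference layers.
src_edges = [0.0, 1.5, 3.0]
src_values = [2.0, 4.0]
dst_edges = [0.0, 1.0, 2.0, 3.0]
result = remap_conservative(src_edges, src_values, dst_edges)
```

The column integral is 2.0 * 1.5 + 4.0 * 1.5 = 9.0 before the remap and the three reference layers again sum to 9.0 afterwards.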
![Page 36: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/36.jpg)
Physics - Dynamics coupling
• Prognostic data are vertically remapped (in cd_core) before dp_coupling is called (in dynpkg)
• The vertical remapping routine computes the vertical velocity and the surface pressure ps
• d_p_coupling and p_d_coupling (module dp_coupling) are the interfaces to the CAM3/WACCM physics package
• Copy / interpolate the data from the ‘dynamics’ data structure to the ‘physics’ data structure (chunks), A-grid
• Time-split physics coupling:
 – instantaneous updates of the A-grid variables
 – the order of the physics parameterizations matters
 – physics tendencies for u & v updates on the D grid are collected
![Page 37: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/37.jpg)
Practical tips
Namelist variables:
• What do IORD, JORD, KORD mean?
• IORD and JORD at the model top are different (see cd_core.F90)
• Relationship between
 – dtime
 – nsplit (what happens if you don’t select nsplit or nsplit = 0: the default is computed in the routine d_split in dynamics_var.F90)
 – time interval for the physics & vertical remapping step
Input / Output:
• Initial conditions: staggered wind components US and VS required (D-grid)
• Wind at the poles not predicted but derived
User’s Guide: http://www.ccsm.ucar.edu/models/atm-cam/docs/usersguide/
![Page 38: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/38.jpg)
Practical tips
Namelist variables:
IORD, JORD, KORD determine the numerical scheme
– IORD: scheme for flux calculations in x direction
– JORD: scheme for flux calculations in y direction
– KORD: scheme for the vertical remapping step
Available options:
• -2: linear subgrid, van Leer, unconstrained
• 1: constant subgrid, 1st order
• 2: linear subgrid, van Leer, monotonicity constraint (van Leer 1977)
• 3: parabolic subgrid, PPM, monotonic (Colella and Woodward 1984)
• 4: parabolic subgrid, PPM, monotonic (Lin and Rood 1996, see FFSL3)
• 5: parabolic subgrid, PPM, positive definite constraint
• 6: parabolic subgrid, PPM, quasi-monotone constraint
Defaults: 4 (PPM) on the D grid (d_sw), -2 on the C grid (c_sw)
![Page 39: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/39.jpg)
‘Hybrid’ Computer Architecture
• SMP: symmetric multi-processor
• Hybrid parallelization technique possible:
 – Shared memory (OpenMP) within a node
 – Distributed memory approach (MPI) across nodes
Example: NCAR’s Bluesky (IBM) with 8-way and 32-way nodes
![Page 40: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/40.jpg)
Schematic parallelization technique
1D distributed memory parallelization (MPI) across the latitudes:
[Figure: the globe from SP through Eq. to NP, with longitudes from 0 to 340 on the horizontal axis, split into latitude bands assigned to Proc. 1, 2, 3 and 4.]
![Page 41: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/41.jpg)
Schematic parallelization technique
Each MPI domain contains ‘ghost cells’ (halo regions): copies of the neighboring data that belong to different processors
[Figure: the latitude band of Proc. 2 between SP, Eq. and NP, with longitudes from 0 to 340, surrounded by 3 ghost cells for PPM.]
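The bookkeeping behind the ghost cells can be sketched without any MPI calls. The example below computes, for a 1D decomposition across latitudes, which rows a processor owns and which extra rows it needs as copies from its neighbours, using 3 ghost rows as on the slide. The function name `rows_with_halo`, the zero-based row numbering and the equal split of rows are assumptions for illustration; the actual exchange would be done with MPI send and receive calls.

```python
def rows_with_halo(nlat, nprocs, rank, nghost=3):
    """Return the owned and the halo-extended latitude row ranges.

    Ranges are half-open (start, end) with zero-based row indices.
    Assumption: nlat is divisible by nprocs. There are no ghost rows
    beyond the first and last latitude row.
    """
    rows_per_proc = nlat // nprocs
    start = rank * rows_per_proc
    end = start + rows_per_proc
    halo_start = max(0, start - nghost)
    halo_end = min(nlat, end + nghost)
    return (start, end), (halo_start, halo_end)


owned, with_halo = rows_with_halo(nlat=40, nprocs=4, rank=1)
print(owned, with_halo)
```

An interior processor therefore stores 6 more rows than it updates, and those rows have to be refreshed from the neighbouring processors before each step that needs the wider stencil.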
![Page 42: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/42.jpg)
Schematic parallelization technique
Shared memory parallelization (in CAM3 most often) in the vertical direction via OpenMP compiler directives.

Typical loop:

    do k = 1, plev
      …
    enddo

Can often be parallelized with OpenMP (check dependencies):

    !$OMP PARALLEL DO …
    do k = 1, plev
      …
    enddo
![Page 43: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/43.jpg)
Schematic parallelization technique
Shared memory parallelization (in CAM3 most often) in the vertical direction via OpenMP compiler directives.

E.g.: assume 4 parallel ‘threads’ and a 4-way SMP node (4 CPUs):

    !$OMP PARALLEL DO …
    do k = 1, plev
      …
    enddo

[Figure: the levels k = 1, ..., plev are divided into contiguous blocks, one block per CPU, for example levels 1 to 4 on CPU 1 and levels 5 to 8 on the next CPU.]
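How the k loop might be divided among the threads can be sketched as follows. The example splits the levels into contiguous blocks of nearly equal size, which is what a static schedule typically does. The exact split is left to the OpenMP implementation, so this is an assumption rather than a guarantee, and the function name `static_chunks` is hypothetical.

```python
def static_chunks(plev, nthreads):
    """Split the levels 1..plev into contiguous blocks, one per thread.

    Returns a list of inclusive (first_level, last_level) pairs. When plev
    is not divisible by nthreads, the first threads get one extra level.
    """
    base, extra = divmod(plev, nthreads)
    chunks = []
    first = 1
    for thread in range(nthreads):
        size = base + (1 if thread < extra else 0)
        chunks.append((first, first + size - 1))
        first += size
    return chunks


print(static_chunks(16, 4))
print(static_chunks(26, 4))
```

With 16 levels and 4 threads every CPU gets 4 levels. With the 26 levels of CAM3 the blocks hold 7, 7, 6 and 6 levels, so the load is slightly uneven.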
![Page 44: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/44.jpg)
Thank you! Any questions?
Tracer transport? Fortran code…
![Page 45: 需完成之平行化工作 1. 平行化 domain decomposition 之方案確定。 2. timcom 之 preprocessor with f95 and dynamic allocated memory (inmets, indata, bounds) 3. timcom](https://reader035.vdocuments.net/reader035/viewer/2022081421/5697bfca1a28abf838ca97ab/html5/thumbnails/45.jpg)
References
Carpenter, R. L., K. K. Droegemeier, P. R. Woodward, and C. E. Hane, 1990: Application of the Piecewise Parabolic Method (PPM) to Meteorological Modeling. Mon. Wea. Rev., 118, 586-612
Colella, P., and P. R. Woodward, 1984: The piecewise parabolic method (PPM) for gas-dynamical simulations. J. Comput. Phys., 54, 174-201
Jablonowski, C., and D. L. Williamson, 2005: A baroclinic instability test case for atmospheric model dynamical cores. Submitted to Mon. Wea. Rev.
Lin, S.-J., and R. B. Rood, 1996: Multidimensional Flux-Form Semi-Lagrangian Transport Schemes. Mon. Wea. Rev., 124, 2046-2070
Lin, S.-J., and R. B. Rood, 1997: An explicit flux-form semi-Lagrangian shallow water model on the sphere. Quart. J. Roy. Meteor. Soc., 123, 2477-2498
Lin, S.-J., 1997: A finite volume integration method for computing pressure gradient forces in general vertical coordinates. Quart. J. Roy. Meteor. Soc., 123, 1749-1762
Lin, S.-J., 2004: A ‘Vertically Lagrangian’ Finite-Volume Dynamical Core for Global Models. Mon. Wea. Rev., 132, 2293-2307
van Leer, B., 1977: Towards the ultimate conservative difference scheme. IV. A new approach to numerical convection. J. Comput. Phys., 23, 276-299