Backtrajectory analysis in the Source Apportionment: models, methods and applications

"All models are wrong, but some are useful." (G. Box)

Stefano Crocchianti
Dipartimento di Chimica, Biologia e Biotecnologie, Università di Perugia
Environmental Chemistry and Technologies & Consortium for computational molecular and materials sciences
Via Elce di Sotto 8, Perugia (I-06123)
20 Feb 2019




General motivation

Objective: source localization & apportionment, i.e. assess the extent of long-range source contributions to evaluate the pollutants emitted by national sources.

Models: single particle calculations:
1. atmospheric motions
2. Lagrangian models: potentialities and weaknesses
3. spotlight on Hysplit, on/off-line pros and cons
4. estimating accuracy (a.k.a. avoiding garbage in → garbage out)

Source localization through trajectories classifications:
1. wind roses
2. cluster analysis
3. trajectories frequency


Source localization through source-receptor correlation:
1. R language
2. Openair
3. PSCF
4. CWT


Atmospheric Motions

Vertical:
- convection (buoyancy)
- convergence/divergence of horizontal flows
- horizontal flow over topographic features

Horizontal:
- pressure gradient
- Coriolis force


Turbulence

The atmospheric wind flow is turbulent: it is irregular and varies randomly with time.

This randomness can be considered as generated by rotating air masses (eddies) transferring momentum, heat, and mass between adjacent air parcels.

Turbulence can be caused by heat released in the atmosphere (thermal turbulence) and by air passing obstacles and surface roughness (mechanical turbulence).

The length and time scales of these motions cover wide ranges, from millimeters and seconds to thousands of kilometers and days.


Mathematically, turbulence arises from the nonlinear term

$$U \cdot \nabla U$$

(U = wind velocity) in the Navier-Stokes equations.

Such nonlinearity can induce chaotic behavior in the solution $U(x, y, z, t)$, in the sense that, starting from slightly different initial conditions, the system diverges at a sufficiently fast rate toward totally different final states.

Velocities are usually random and their exact values can never be predicted precisely.

They are usually expressed as the sum of a slowly varying mean and a rapid fluctuation around it:

$$U = \langle U \rangle + U'$$

$U'$ is associated with the irregular and stochastic nature of the motions.
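The decomposition into a mean and a fluctuation can be illustrated numerically. The following is a minimal R sketch on a synthetic wind-speed series; the series itself, the 10-minute sampling and the one-hour averaging window are illustrative assumptions, not data from the presentation. The mean ⟨U⟩ is estimated as a block average and U′ is the residual.

```r
# Synthetic 1-D wind speed: slow modulation plus random fluctuation (illustrative only)
set.seed(1)
n_per_block <- 6                      # assumed: 6 samples (10 min each) per 1-hour block
n_blocks    <- 24
t <- seq_len(n_per_block * n_blocks)
u <- 5 + 2 * sin(2 * pi * t / length(t)) + rnorm(length(t), sd = 0.8)

# <U>: block (hourly) average, repeated for every sample of the block
block  <- rep(seq_len(n_blocks), each = n_per_block)
u_mean <- ave(u, block)               # ave() returns the group mean for each element

# U': fluctuation around the mean
u_prime <- u - u_mean

# By construction the fluctuations average to zero inside every block
max_block_mean <- max(abs(tapply(u_prime, block, mean)))
```

The choice of the averaging window is what separates "slow" from "rapid" here; a different window would give a different split of the same series.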


Lagrangian models

Continuity equation:

$$\frac{dC_i}{dt} = E(t) + P(t) - D(t) - L(t)$$

where
E = emissions, D = depositions,
P = chemical production, L = chemical loss.


Lagrangian Particle Dispersion Models (LPDMs)

$$U = \frac{dx}{dt} \qquad x(t_0 + \Delta t) = x(t_0) + U(t_0)\,\Delta t + \dots$$

Given a Δt, the trajectory is calculated at discrete points called endpoints.

NOTE: the calculation of a single trajectory implies neglecting the turbulent (stochastic) component U′.

Lagrangian models (e.g. FlexTraj, FlexPart, Hysplit):
- have lower numerical transport errors than Eulerian ones
- are better suited for tracking the transport of pollution
- are receptor oriented (source-receptor correlation)
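The first-order update x(t0 + Δt) = x(t0) + U(t0)Δt can be sketched in a few lines of R. The wind field below (a steady solid-body rotation) and the function names `wind` and `trajectory` are hypothetical choices made only for illustration. Real models interpolate gridded meteorological data and use more accurate integration schemes.

```r
# Hypothetical steady 2-D wind field (solid-body rotation), for illustration only
wind <- function(x, y) c(-y, x)

# First-order (Euler) integration; dt < 0 gives a backward trajectory
trajectory <- function(x0, y0, dt, n_steps) {
  pts <- matrix(NA_real_, nrow = n_steps + 1, ncol = 2,
                dimnames = list(NULL, c("x", "y")))
  pts[1, ] <- c(x0, y0)
  for (k in seq_len(n_steps)) {
    U <- wind(pts[k, "x"], pts[k, "y"])
    pts[k + 1, ] <- pts[k, ] + U * dt   # x(t0 + dt) = x(t0) + U(t0) dt
  }
  pts                                   # one row per endpoint
}

# Forward trajectory, then a backward one started at its termination point
fwd  <- trajectory(1, 0, dt = 0.001, n_steps = 500)
back <- trajectory(fwd[501, "x"], fwd[501, "y"], dt = -0.001, n_steps = 500)

# Distance between the original start and the end of the reversed trajectory:
# small but not zero, because of the truncation error of the scheme
closure_err <- sqrt(sum((back[501, ] - c(1, 0))^2))
```

In this deterministic field the mismatch comes from numerical truncation alone, since the stochastic component U′ is not represented.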


Hysplit (https://www.ready.noaa.gov/HYSPLIT.php) [1]

Hybrid approach:
- Lagrangian for advection and diffusion
- Eulerian to compute pollutant air concentrations

Uses:
- single particle trajectories
- Lagrangian Particle Dispersion Model (LPDM)

[1] Shutdown permitting: from 26 Dec 2018 to 25 Jan 2019 the web and ftp servers were unavailable.


Hysplit Web interface

Note: limited to 500 trajectories/day


On-line run

Model submitted on: Wed Dec 5 10:46:01 EST 2018
Trajs generated on: Wed Dec 5 10:50:52 EST 2018
Trajectory length: 8.5 days (204 h)

4.85 min to calculate 3 backtrajectories propagated for 204 hours (meteo resolution 0.5° × 0.5°).
The use of the off-line version is strongly recommended.


hysplit4.tcl: Tcl/Tk Graphical User Interface or command line

Selecting "Run Model" is equivalent to executing hyts_std at the command line.

The "Setup Run" interface creates the CONTROL file.
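For batch runs the CONTROL file can also be written by a script instead of the GUI. The R sketch below assumes the usual line order of a trajectory CONTROL file as described in the HYSPLIT user guide (start time, starting locations, run time, vertical motion option, model top, meteorological files, output file). The helper name `control_lines`, the coordinates, the file names and all the values are illustrative assumptions and should be checked against the guide of the installed version.

```r
# Build the lines of a trajectory CONTROL file (assumed layout; values are illustrative)
control_lines <- function(start, lat, lon, height_m, run_hours,
                          met_dir, met_file, out_dir, out_file) {
  c(start,                                          # start time: YY MM DD HH
    "1",                                            # number of starting locations
    sprintf("%.2f %.2f %.1f", lat, lon, height_m),  # lat, lon, height (m a.g.l.)
    as.character(run_hours),                        # run time (h); negative = backward
    "0",                                            # vertical motion option (0 = from data)
    "10000.0",                                      # top of model (m a.g.l.)
    "1",                                            # number of meteorological files
    met_dir, met_file,                              # meteo directory and file name
    out_dir, out_file)                              # output directory and file name
}

ctl <- control_lines(start = "15 07 10 00", lat = 78.92, lon = 11.93,
                     height_m = 500, run_hours = -120,
                     met_dir = "./", met_file = "gdas1.jul15.w2",
                     out_dir = "./", out_file = "tdump")

# Written to a temporary directory here; hyts_std expects it in its working directory
control_path <- file.path(tempdir(), "CONTROL")
writeLines(ctl, control_path)
```

Looping over start times and heights with such a helper is one way to generate the thousands of trajectories needed for a yearly statistic.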


GDAS1 repository ftp://arlftp.arlhq.noaa.gov/archives/gdas1/

Note: 1 week of 1° × 1° data ≈ 580 MB


GDAS05 repository ftp://arlftp.arlhq.noaa.gov/archives/gdas0p5/

Note: 1 day of 0.5° × 0.5° data ≈ 480 MB


Trajectory inaccuracy [2] (a.k.a. avoiding garbage in → garbage out)

Main sources of inaccuracy are:
- failing to consider turbulence:
  "Real air parcels of finite dimensions are drawn asunder and are being deformed by inhomogeneities of the wind field, by turbulent and convective motions and by precipitation processes. [...] in general a single trajectory is not sufficient for a description of the path and that the mass center of gravity of the air parcel does not exactly follow the path of the computed trajectory."
  Pflüger et al., German Weather Service, 1990
- uncertainties in source height;
- model-versus-actual terrain heights;
- numerical integration truncation errors;
- time and space wind interpolation errors;
- wind field errors (analysis/forecast).

[2] A. Stohl, Atm. Env. 32 (1998), 947-966


Estimating accuracy (assuming LPDM calculations are unfeasible)

Trajectory ensemble [3,4]: multiple trajectories slightly offset

"If all the trajectories follow a similar path then the user can have confidence that any one trajectory represents the mean flow. However, if the trajectories diverge significantly [...] the user needs to consider using the dispersion model to calculate many particle trajectories with turbulent mixing (dispersion) to best represent the pollutant transport." [5]

[3] Kahl et al., Tellus, 41B (1989), 524-536
[4] Stein et al., Bull. Amer. Meteor. Soc., 96 (2015), 2059-2077
[5] Rolph et al., Env. Mod. & Soft., 95 (2017), 210-228


Reversing a trajectory [6]: compare a forward trajectory to a backward one started at the termination point of the forward one.

[6] HYSPLIT Tutorial (last revision 21 June 2012)


Different resolution of meteorological data: Hysplit v. 4.513

             NOAA GDAS1                       NOAA GDASp05
period       1 Jan to 31 Dec 2015
# trajs      4x1460 (4x4 t/d); 1 h time step, 5 days
end points   50, 500, 1000, 3000 m a.g.l.
meteo        1° × 1°, 584 MB/week             0.5° × 0.5°, 460 MB/day
proc.        2x4 Intel Xeon E4572 @ 3 GHz     2x3 Intel Xeon E4572 @ 3 GHz
elapsed t.   2:56 h                           29:52 h
p. mem.      148 MB                           1.01 GB
output       76 MB                            76 MB


Effects of meteo resolution (Ny Ålesund, 10 Jul 2015)


Effects of meteo resolution (Ny Ålesund, 8 Apr 2015)


Trajectories Classification [7]

Classification of air mass pathways can help to match trajectories and concentrations.

The statistical classification groups trajectories using a mathematical technique, whereas the geographical sector classification sorts air masses by designating air mass sectors (e.g. wind roses), possibly comparing them with measured concentrations.

Cluster analysis is a multivariate statistical technique that groups trajectories with similar directions and lengths in a virtually objective [8] way.

The Hysplit CLUSTER program uses Ward's minimum variance clustering method to minimize the dissimilarity among the trajectories within each cluster while maximizing the dissimilarity between different clusters [9].

[7] Fleming et al., Atm. Env., 104-105 (2012), 1-39
[8] It should be noted, though, that the grouping depends on the clustering algorithm, and the use of more than one technique is recommended.
[9] Stunder et al., J. Appl. Meteor. 35 (1996), 1319-1331


Cluster analysis

The measure of dissimilarity is based on the latitude and longitude of each trajectory endpoint (i.e. horizontal speed). Thus, vertical motion is not considered.

Clusters are then represented by their mean trajectory.

$$\min \sum_{j=1}^{\text{trajs in cluster } k} \ \sum_{i=1}^{\text{endpoints}} d_i^2$$

The optimal number of clusters is determined [10] by calculating the Total Spatial Variance (TSV), i.e. the sum of the spatial variances of all clusters, for every possible number of clusters, until the total variance of the individual trajectories about their cluster mean starts to increase substantially.

[10] Stein et al., Bull. Amer. Meteor. Soc., 96 (2015), 2059-2077
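The selection of the number of clusters from the TSV can be sketched in R on synthetic data. The example uses base R `hclust` with Ward's method as a stand-in and does not reproduce the Hysplit CLUSTER program. The synthetic trajectories, the planar coordinates and the definition of the percent change (relative increase in TSV when going from k to k - 1 clusters) are assumptions made for illustration.

```r
set.seed(42)
n_traj <- 30                            # synthetic: 30 trajectories
n_end  <- 10                            # 10 endpoints each
heading <- rep(c(0, 2, 4), each = 10)   # three assumed flow directions (radians)
step <- seq_len(n_end)

# One row per trajectory: x of the 10 endpoints followed by y of the 10 endpoints
X <- t(sapply(heading, function(h)
  c(step * cos(h), step * sin(h)) + rnorm(2 * n_end, sd = 0.05)))

hc <- hclust(dist(X), method = "ward.D2")   # Ward's minimum variance method

# TSV for k clusters: squared distances of every trajectory endpoint
# from the corresponding endpoint of its cluster-mean trajectory
tsv <- function(k) {
  cl <- cutree(hc, k)
  sum(sapply(split(seq_len(n_traj), cl), function(idx) {
    m <- colMeans(X[idx, , drop = FALSE])
    sum(sweep(X[idx, , drop = FALSE], 2, m)^2)
  }))
}

k_max <- 6
TSV <- sapply(1:k_max, tsv)

# % increase in TSV when going from k to k - 1 clusters (k = 2 ... k_max)
pct_change <- 100 * (TSV[-k_max] - TSV[-1]) / TSV[-1]
names(pct_change) <- 2:k_max

# Number of clusters just before the largest increase
best_k <- as.integer(names(which.max(pct_change)))
```

With three well-separated synthetic flow directions the largest jump occurs when three clusters are merged into two. Real data rarely give such a clear-cut answer, which is why the second-ranked choice is examined as well.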


Monte Martano 2014, 4x365 5-day backtrajectories [11]

The step just before the large increase in TSV gives the final number of clusters.

n. clusters   TSV % change
2             118.94
8             110.88
3             44.02
4             34.02

[11] Chiara Petroselli, PhD Thesis (2017), Perugia University (IT)


Cluster components [11]

One advantage of the method is that the errors in the individual trajectories tend to average out.

The clusters give information on the flow patterns (i.e. directions) affecting the site and on their relative importance more accurately than wind roses; however, there is some uncertainty about the source location along the path.


Wind roses

In situ wind roses are often used to trace the direction of pollutants reaching a site, though they fail to take into account the synoptic scale of the flow field: wind roses are representative only of the local circulation (2-3 h before the station), whereas trajectories refer to different times in different positions.


Clusters selection [11]

[Figure panels: 1st ranked; 2nd ranked]

Lhs cluster n. 2 appears as rhs clusters n. 4 + 6, but lhs n. 2 is only loosely related to the remaining rhs clusters.


Monte Martano 2010, 4x365 5-day backtrajectories [11]

[Figure panels: 1st ranked; 2nd ranked]

n. clusters   TSV % change
8             98.02
2             81.38
5             48.09
6             35.76
3             32.08

Which lhs cluster corresponds to rhs cluster n. 1 and which to rhs cluster n. 2?


Cluster composition [11]

Monte Martano 2009-2016, winter Saharan 5-day backtrajectories (best choice)

n. clusters   TSV % change
3             103.38
2             61.56
7             47.27
16            33.70

After deleting the trajectories in cluster #3:

n. clusters   TSV % change
4             92.10
2             88.44
11            50.64

Deleting one cluster gave four clusters as 1st choice instead of two!


The 2nd ranked clusters are strongly related to the original clusters.


Trajectories frequency

A step forward might be represented by programs such as the Hysplit on-line version or the TRAJFREQ program, both capable of calculating trajectory frequencies.


A user-defined grid can be superimposed on the calculated trajectories and the number of trajectories passing over each grid cell evaluated:

$$\mathrm{Freq}_{i,j} = \frac{n_{i,j}}{N}$$

$n_{i,j}$: number of trajectories crossing the cell i, j
$N$: total number of trajectories


The number of trajectories passing over each grid cell can be counted in two ways:

1. $n_{i,j}$ is incremented by 1 when a new trajectory crosses the cell;
2. $n_{i,j}$ is incremented by 1 for every endpoint falling into the cell (residence time).

When residence time is used, the time a trajectory spends in each cell is taken into account.
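The two counting rules can be compared on a toy example. The R sketch below uses made-up endpoints on an assumed 1° × 1° grid. A trajectory is taken to cross a cell when at least one of its endpoints falls into it, which is an approximation. The normalization of the residence-time counts by the total number of endpoints is one possible choice, not necessarily the one used by TRAJFREQ.

```r
# Hypothetical endpoints: one row per endpoint, with trajectory id and position
endpoints <- data.frame(
  traj = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  lon  = c(0.2, 0.6, 1.4, 0.3, 1.2, 1.7, 0.8, 0.9, 0.7),
  lat  = c(0.5, 0.5, 0.5, 0.4, 0.6, 0.5, 0.2, 0.4, 0.6)
)
N <- length(unique(endpoints$traj))          # total number of trajectories

# Assumed 1 x 1 degree grid: cell indices i (longitude) and j (latitude)
endpoints$i <- floor(endpoints$lon) + 1
endpoints$j <- floor(endpoints$lat) + 1

# Way 1: a trajectory is counted once per cell it crosses
crossings  <- unique(endpoints[, c("traj", "i", "j")])
n_cross    <- table(i = crossings$i, j = crossings$j)
freq_cross <- n_cross / N

# Way 2 (residence time): every endpoint falling into the cell is counted
n_resid    <- table(i = endpoints$i, j = endpoints$j)
freq_resid <- n_resid / nrow(endpoints)
```

Trajectory 3 stays in the first cell for all its endpoints, so it contributes 1 to that cell with the first rule and 3 with the second. This is how slow air masses gain weight when residence time is switched on.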


Trajectories frequency: 12 backtrajectories, GDAS1, 120 h

[Figure panels: residence time off; residence time on]

TRAJFREQ also allows selecting a stratum within which the endpoints must be contained, as well as three different types of normalization.


Source-receptor correlation

Trajectory frequency and, to a lesser extent, cluster analysis provide a map of the areas where a hypothetical source would exert a higher influence on the receptor site during the selected period.

However, associating high trajectory frequency with high measured concentrations implicitly assumes that all the sources have the same strength (i.e. emission capacity) and that the accumulated signal is proportional to the frequency.

This approach would regard as improbable an intense source affecting the site for a short period.

Moreover, the greatly increased time resolution of the samplers, coupled with their capability of unattended recording for considerably long periods, has provided a wealth of information that is often underutilized.


It would be useful to label the backtrajectories on the basis of their arrival time and to compare them with the time modulation of the pollutant concentration.

Studying the correlation between the concentration at the receptor and the backtrajectories gives much better information about where the source might be located (e.g. along the path of the trajectory).


Defining high concentrations

Many methods that aim to classify air masses or air sectors on the basis of their effects on the composition of the samples collected at the receptor site start by isolating the highest measured concentrations of a compound or, better, its exceedance levels, and then try to find out the regions they came from.

There is no consensus in the literature about when a concentration should be considered high. Often the limit is taken at the 90th percentile [a].

Therefore the n concentrations are sorted in increasing order of magnitude and the values above the 90th percentile are taken, interpolating if necessary.

[a] https://www.itl.nist.gov/div898/handbook/prc/section2/prc262.htm
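In R the percentile threshold can be obtained with `quantile()`. The sketch below uses made-up concentrations. R's default `type = 7` interpolates linearly between order statistics, which may differ slightly from the definition given in the NIST handbook cited above.

```r
# Hypothetical concentrations (arbitrary units), one value per sample
conc <- c(12, 7, 30, 18, 9, 25, 14, 41, 11, 16)

# 90th percentile, with linear interpolation between order statistics (type = 7)
threshold <- quantile(conc, probs = 0.90, names = FALSE)

# Samples regarded as "high" are those above the threshold
high <- conc > threshold
high_values <- conc[high]
```

With ten samples the threshold falls between the two largest values, so only one sample is flagged. Short series therefore leave very few trajectories to correlate.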


Potential Source Contribution Function (PSCF)

To estimate the probability that a source is located in a particular area, an arbitrary grid is superimposed on the region and, for each cell, the ratio of the trajectories that reached the site when concentrations were high to the total number of trajectories is evaluated.


PSCF [12] calculates [13] this proportion as the probability that a trajectory associated with a high concentration crosses a specific cell, divided by the total probability that a trajectory crosses it:

$$\mathrm{PSCF}_{i,j} = \frac{m_{i,j}}{n_{i,j}}$$

$m_{i,j}$: correlating trajectories
$n_{i,j}$: total trajectories

In other words, the method implements a source-receptor correlation by means of backtrajectories whose arrival time is set equal to the sampling date.

[12] Petroselli et al., Atm. Env. 204 (2018), 67-77
[13] Ashbaugh et al., Atm. Env. 19 (1985), 1263-1270
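The ratio m/n can be computed directly once every endpoint carries its cell indices and the concentration measured at the arrival time of its trajectory. The R sketch below works on made-up data. The data frame layout, the threshold value and the rule "a trajectory crosses a cell if one of its endpoints falls into it" are assumptions for illustration only.

```r
# Hypothetical input: one row per trajectory endpoint; 'conc' is the concentration
# measured at the receptor at the arrival time of the trajectory
ep <- data.frame(
  traj = rep(1:5, each = 2),
  i    = c(1, 2,  1, 2,  1, 3,  2, 3,  1, 3),   # cell index along longitude
  j    = c(1, 1,  1, 1,  1, 1,  1, 1,  1, 1),   # cell index along latitude
  conc = rep(c(50, 8, 45, 10, 6), each = 2)
)
high_limit <- 40                                  # assumed threshold for "high"

# Count each trajectory once per crossed cell
cells <- unique(ep[, c("traj", "i", "j", "conc")])
n_ij  <- table(i = cells$i, j = cells$j)          # all trajectories

# Same grid for the correlating trajectories (factor levels keep empty cells)
high <- cells[cells$conc > high_limit, ]
m_ij <- table(i = factor(high$i, levels = rownames(n_ij)),
              j = factor(high$j, levels = colnames(n_ij)))

pscf <- m_ij / n_ij
```

Here cell 1 is crossed by four trajectories, two of which arrived during high concentrations, giving a PSCF of 0.5. Cells 2 and 3 each score 1/3.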


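Residence time

Given $\mathrm{PSCF}_{i,j} = m_{i,j}/n_{i,j}$ ($m_{i,j}$: correlating trajectories, $n_{i,j}$: total trajectories), there are two ways to count the number of trajectories in each cell:

1. every trajectory crossing the cell is counted as 1 (e.g. $\mathrm{PSCF}_{i,j} = \frac{3}{3+4}$);
2. every trajectory endpoint falling into the cell is counted, i.e. the residence time ($\mathrm{PSCF}_{i,j} = \frac{\sum \text{green}}{\sum \text{green} + \sum \text{red}}$, where green and red refer to the colours of the endpoints in the original figure).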


Weighting function

The residence time option attempts to account for the fact that a slow trajectory passing above a source can collect more pollutant than a fast one.

A few limitations of the PSCF approach are:
- unfitness to identify local sources, since the trajectories all converge to the receptor point;
- false positive areas upwind and downwind of the sources, since the probabilities are evenly distributed along the path (tailing effect);
- tendency to give the same probability to cells with very few counts and to much more reliable cells containing a high number of endpoints.

These limitations are more severe when the number of samples is small.

Their effects are often diminished by introducing a weighting function that decreases the probabilities computed for cells having a small number of endpoints.
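A weighting function of this kind is typically a step function of the number of endpoints in the cell. The R sketch below is only an illustration: the function name `pscf_weight`, the thresholds (multiples of the average number of endpoints per non-empty cell) and the weights are arbitrary assumptions, and the values used in a study should be chosen and reported case by case.

```r
# Hypothetical weighting function: cells with few endpoints are down-weighted.
# Thresholds and weights are arbitrary illustrative choices.
pscf_weight <- function(n, n_avg = mean(n[n > 0])) {
  w <- rep(1, length(n))
  w[n <= 2.0 * n_avg] <- 0.7
  w[n <= 1.0 * n_avg] <- 0.4
  w[n <= 0.5 * n_avg] <- 0.2
  w
}

n_ij <- c(40, 12, 6, 2)         # endpoints per cell (made-up)
pscf <- c(0.30, 0.50, 0.50, 1)  # raw PSCF per cell (made-up)

w     <- pscf_weight(n_ij)      # weights per cell
wpscf <- w * pscf               # weighted PSCF
```

The last cell has a raw PSCF of 1 based on only two endpoints; after weighting it drops to 0.2, below the well-sampled first cell.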


The R language

R [14] is an interpreted language for data analysis and statistical computing. It comes with a run-time environment providing graphics, a debugger and access to system functions, and it can run script files.

Its main strengths are the operators for carrying out calculations on arrays and a large collection of tools for data analysis.

R is free software (a GNU project [15]) available for Unix, Unix-like and Microsoft Windows (MS) systems.

Notice: the examples in the following part were all executed on a Linux installation. However, I did my best to make them platform independent. They should be portable with a minimum number of changes, if any.
For example, an MS user might need to translate the Unix path separator:
Unix: /folder/subfolder/filename
MS: \folder\subfolder\filename

[14] https://www.r-project.org/
[15] http://www.gnu.org/
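One way to limit path translation is to build paths with `file.path()` rather than typing the separator. The sketch below uses placeholder folder names. R accepts "/" in paths on MS Windows as well, whereas a literal backslash must be doubled inside an R string.

```r
# Build a path from its components instead of hard-coding the separator
p <- file.path("folder", "subfolder", "filename")
p                      # "folder/subfolder/filename"

# A Windows-style path written as an R string (each backslash is doubled)
win_style <- "folder\\subfolder\\filename"

# Convert backslashes to forward slashes
converted <- gsub("\\\\", "/", win_style)
```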


R command line execution

For MS Windows users, R can be run in two ways [16]:

1. in a terminal window (command prompt, cmd.exe) using r.exe, rcmd.exe or rterm.exe (recommended);

[Figure: cmd window showing the cmd prompt, the R execution and the R prompt; source: https://www.jube.io/r-blog/navigate-to-and-launch-the-r-command-line]

All the commands are issued at the R prompt.

[16] N. Venables et al., An Introduction to R v. 3.5.2, p. 95


The working directory, where files without a pathname are saved by default, is the starting directory if:
- the environment variables R_USER and HOME are undefined;
- "My Documents" can't be accessed;
- HOMEDRIVE and HOMEPATH are also undefined [17].

The command getwd() returns the absolute path of the working directory, while setwd(dir), where dir is the pathname of a directory, sets the working directory to dir:

    > getwd()
    [1] "C:/Documents and Settings"
    > setwd("C:/Documents and Settings/Data")

[17] N. Venables et al., An Introduction to R v. 3.5.2, p. 95


Rgui

2. in a console-based Graphical User Interface such as:
   1. Rgui.exe, part of the default R installation (it is the command corresponding to the system menu and the desktop icons):

[Figure: Rgui window; source: https://geonetcast.wordpress.com/2018/02/19/geonetclass-manipulating-goes-16-data-with-r-a-contribution-from-unalm/]

In this case the working directory can be set in the "Start In" field of the R shortcut [18].

[18] N. Venables et al., An Introduction to R v. 3.5.2, p. 9


Rstudio

2. in a console-based Graphical User Interface such as:
   2. Rstudio [19], a free and open-source environment for R, recommended also by the Openair creator:

[Figure: Rstudio window; source: https://upload.wikimedia.org/wikipedia/commons/4/4d/Rstudio.png]

[19] https://www.rstudio.com/


R nuts and bolts

Regardless of the invocation, R commands are issued at the prompt; therefore, hereafter the bare R prompt will be shown in the examples:

    > version
    .......
    major          3
    minor          3.3
    language       R
    .......

(bear in mind when comparing)

    > 2+2
    [1] 4

    > q()
    Save workspace image? [y/n/c]: n

    > help(quantile)
    quantile                package:stats                R Documentation
    .............
    The generic function "quantile" produces sample quantiles
    probability of 1.
    .............
    quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
             names = TRUE, type = 7, ...)


Openair20 (http://www.openair-project.org/)

Openair is an impressive open-source software package, free of charge, designed to facilitate the study of atmospheric problems and written as an R package.

The first 50 pages of its manual21 present the language syntax, paying specific attention to commands often used for pollution data analysis (recommended reading), while Appendix A gives a short description of the R installation.

Since Openair is written in R, it must be installed and loaded while R is running. It is available on the official R package repository (CRAN), therefore it can be installed either by selecting it in the "packages" menu of the GUI, by issuing the command:

    > install.packages("openair")

or by first downloading the separate packages for the off-line procedure.
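A minimal sketch of an "install only if missing" check, which may be handy in scripts that are run repeatedly. The helper name ensure_package is hypothetical (it is not part of R or Openair), and the installation branch assumes a configured CRAN mirror and network access:

```r
# Hypothetical helper (not part of R or Openair): install a package from CRAN
# only when it is not already available, then report whether it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs a configured CRAN mirror and network access
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with every R installation, so this call never installs anything.
stats_available <- ensure_package("stats")
```

The same call with "openair" as argument would trigger the installation only on a machine where the package is absent.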

20 Carslaw et al., Environ. Model. Softw. 27-28, (2012)
21 D.C. Carslaw, The openair manual for version 2.6-0, (2018) University of York, https://www.dropbox.com/s/2n7wdyursdul8dk/openairManual.pdf?dl=0


Openair functions

Openair provides a great many handy functions for atmospheric data analysis, such as:

- wind and pollution roses;
- scatter plots;
- pollutant linear relationships;
- automatic plots of day of the week, daily, weekly, monthly trends;
- model evaluation:
  - model vs models/measurements statistics (mean bias, mean gross error, rmse, etc.);
  - Taylor diagrams;
- running averages.

Openair needs to be loaded (i.e. "read") into memory by R, either by choosing it in the "Load package" GUI menu or with the command:

    > library(openair)


Openair syntax

The package version I am using is:

    > packageVersion("openair")
    [1] ‘2.6.0’

Note that the R language is case sensitive.

Most of Openair's utilities can be used as functions:

    > functionName(data, options, ...)

The output of a function or operation can be assigned to a variable:

    > a <- 2+2
    > a
    [1] 4
    > b <- windRose(mydata)


Backtrajectory functions

Pre-calculated backtrajectories stored at the London Air data archive can be imported in Openair using the importTraj() function22. They are calculated using Hysplit, for the whole year, every 3 hours, starting at 10 m a.g.l.

Unfortunately:

"So far only a few receptors are available to users but in time the number will increase. It should be feasible for example to run back trajectories for the past 20 years at all the EMEP sites in Europe.a"

a It takes about 15 hours to run 20 years of 96-hour back trajectories at 3-hour intervals.

The openair manual ver. 15th November 2018, p. 195

The importTraj() function can also import local pre-calculated backtrajectories (see the help(importTraj) command).

22 The openair manual ver. 15th November 2018, p. 62


.RData backtrajectory fields

Local backtrajectories must be stored as an .RData file (a format used by R to store data) and contain, for each trajectory, the fields:

    date        rpt23  year  month  day  hour  hour.inc  lat     lon     height  press
    2013-04-20  1      2013  4      20   0     0         42.805  12.565  500.0   891.7
    2013-04-20  1      2013  4      19   23    -1        42.644  12.488  509.3   897.1
    2013-04-20  1      2013  4      19   22    -2        42.500  12.396  495.7   907.1
    2013-04-20  1      2013  4      19   21    -3        42.377  12.279  462.4   920.7
    2013-04-20  1      2013  4      19   20    -4        42.277  12.138  419.2   937.8
    2013-04-20  1      2013  4      19   19    -5        42.205  11.986  380.5   953.2
    ...
    2013-04-20  1      2013  4      15   0     -120      42.922  7.606   910.5   919.2

(please note the replicated "date" and "rpt" fields).
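As an illustration of the layout above, the following sketch builds a three-endpoint table with the same field names and round-trips it through an .RData file. The object name traj_one and the file name are hypothetical, and whether importTraj() expects further conventions (object name, file name pattern) is not checked here:

```r
# Hypothetical single-trajectory table with the fields listed above
# (first three endpoints only; values copied from the example table).
traj_one <- data.frame(
  date     = as.POSIXct("2013-04-20 00:00:00", tz = "GMT"),  # start time, replicated
  rpt      = 1L,                                              # receptor number, replicated
  year     = 2013L,
  month    = 4L,
  day      = c(20L, 19L, 19L),
  hour     = c(0L, 23L, 22L),
  hour.inc = c(0, -1, -2),
  lat      = c(42.805, 42.644, 42.500),
  lon      = c(12.565, 12.488, 12.396),
  height   = c(500.0, 509.3, 495.7),
  press    = c(891.7, 897.1, 907.1)
)

rdata_file <- file.path(tempdir(), "traj_example.RData")  # hypothetical file name
save(traj_one, file = rdata_file)

rm(traj_one)
loaded_names <- load(rdata_file)   # restores the object under its saved name
```

Note that load() returns the names of the restored objects, so the data frame is available again as traj_one.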

The user's manual Appendix D provides instructions to run Hysplit within (R and) Openair, and then analyze the resulting trajectories.

23 rpt: receptor number (currently 1).


Trajectory files provision

The advantage of the latter approach is the use of a single, integrated procedure (run_Hysplit.R) that performs all the steps (meteo file download, Hysplit run setup and execution, loading the trajectories in R). Little user intervention is required if the procedure works well.

The main drawbacks are debugging and computer performance.

Alternatively, Hysplit can be run first and the backtrajectories then loaded for the analysis24.

24 This procedure is also recommended by the author for high meteo resolution calculations (see The openair manual ver. 15th November 2018, p. 195).


Trajectory file format

To be loaded, the files containing the trajectories have to:

1 be in the same folder;
2 be named using a common suffix (e.g. "tdump");
3 contain only one trajectory.

Fortunately, these requirements are compatible with the Hysplit standard procedure:

[Figure: Hysplit setup window; callouts: "current folder", "suffix"]


tdump00: an example of trajectory file

         5     1
      GDAS    13     4     8     0     0
      GDAS    13     4    15     0     0
      GDAS    13     4    22     0     0
      GDAS    13     4    29     0     0
      GDAS    13     5     1     0     0
         1 BACKWARD OMEGA
        13     4    20     0   42.805   12.565    500.0
         2 PRESSURE MIXDEPTH
         1     1    13     4    20     0     0     0      0.0   42.805   12.565    500.0    891.7     97.2
         1     1    13     4    19    23     0     1     -1.0   42.644   12.488    509.3    897.1     94.8
         1     1    13     4    19    22     0     2     -2.0   42.500   12.396    495.7    907.1     91.2
         1     1    13     4    19    21     0     3     -3.0   42.377   12.279    462.4    920.7     85.4
         1     1    13     4    19    20     0     2     -4.0   42.277   12.138    419.2    937.8     96.1
         1     1    13     4    19    19     0     1     -5.0   42.205   11.986    380.5    953.2    110.5
         1     1    13     4    19    18     0     0     -6.0   42.156   11.826    355.2    959.7    122.6
    ......
         1     1    13     4    15     5     0     1   -115.0   42.740    7.357    759.8    934.7     82.9
         1     1    13     4    15     4     0     2   -116.0   42.790    7.407    783.6    931.8     93.1
         1     1    13     4    15     3     0     3   -117.0   42.826    7.469    805.0    929.2    104.5
         1     1    13     4    15     2     0     2   -118.0   42.855    7.528    830.5    927.4     99.3
         1     1    13     4    15     1     0     1   -119.0   42.888    7.573    866.0    923.9     93.8
         1     1    13     4    15     0     0     0   -120.0   42.922    7.606    910.5    919.2     87.5


run_Hysplit.R

Openair does not include a built-in function to load trajectory files. However, Appendix D points to the GitHub gist https://gist.github.com/davidcarslaw/c67e33a04ff6e1be0cd7357796e4bdf5.

run_Hysplit.R can be downloaded as a zip file or saved through the browser after clicking on the "Raw" button.


Adapting run_Hysplit.R

Although run_Hysplit.R was meant to run the model inside Openair, it can be used merely to load the pre-calculated trajectory files.

For this use there is no need to install the devtools package required in Appendix D. The script nonetheless needs a small change to cope with a variable number of Hysplit input meteo files.

Since run_Hysplit.R is a text (ASCII) file, it can be changed using a text editor25 like Notepada, Wordpada, MS Worda, Rguib and Rstudio.

a Remember to save it as a text file without the "txt" extension.
b Selecting the "Open script..." item in the "File" menu.

25 The openair manual ver. 15th November 2018, p. 43-44.


run_Hysplit.R, rows 389-402. The characters to add (pink in the original slide) are the lines without a row number and the "# " that comments out row 395.

    389: read_hysplit_file <- function(file, drop) {
    390:
           # Adapted to trajs with unknown number of meteo files
           # by S. Crocchianti
           skp <- as.numeric(read.table(file, nrows=1)[1]) + 4
    391:   # Load file, error catching is for when two or three input met files are used
    392:   # and results in a different length file header
    393:   df <- tryCatch({
    394:
    395:     # read.table(file, header = FALSE, skip = 6)
             read.table(file, header = FALSE, skip = skp)
    396:
    397:   }, error = function(e) {
    398:
    399:     read.table(file, header = FALSE, skip = 7)
    400:
    401:   }
    402:   )
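The header arithmetic behind the added skp line can be checked on a small file. In the tdump00 example the header consists of one line with the number of meteo files, one line per meteo file, one line with the trajectory count, one line with the starting point and one line with the diagnostic variables, i.e. the number of meteo files plus 4 when a single trajectory is stored. The sketch below is only an illustration: the file name is hypothetical, only two endpoints are kept, and scan() is used in place of the read.table() call of the listing to read the first number.

```r
# Minimal tdump-like file: header copied from the tdump00 example,
# followed by the first two endpoints only.
tdump_lines <- c(
  "     5     1",
  "    GDAS    13     4     8     0     0",
  "    GDAS    13     4    15     0     0",
  "    GDAS    13     4    22     0     0",
  "    GDAS    13     4    29     0     0",
  "    GDAS    13     5     1     0     0",
  "     1 BACKWARD OMEGA",
  "    13     4    20     0   42.805   12.565    500.0",
  "     2 PRESSURE MIXDEPTH",
  "     1     1    13     4    20     0     0     0      0.0   42.805   12.565    500.0    891.7     97.2",
  "     1     1    13     4    19    23     0     1     -1.0   42.644   12.488    509.3    897.1     94.8"
)
tdump_file <- file.path(tempdir(), "tdump_sketch")   # hypothetical file name
writeLines(tdump_lines, tdump_file)

# Number of meteo files = first number on the first line.
n_met <- scan(tdump_file, what = numeric(), nlines = 1, quiet = TRUE)[1]
skp   <- n_met + 4                                   # header lines to skip

endpoints <- read.table(tdump_file, header = FALSE, skip = skp)
```

With five meteo files the header is 9 lines long, so the fixed values 6 and 7 of the original script (two or three meteo files) would not reach the first endpoint.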


First Openair execution

Once run_Hysplit.R has been saved, it can be loaded in R using the command source().

To load all the tdump files contained in a directory with a long pathname, it might be convenient to define a specific variable, dir. Then the files can be read using the function ldply():

    > library(openair)
    > source("run_hysplit.R")
    > dir <- "/data1/hysplit/hysplit/trunk/working_201304/analysis_mm/"
    > file_list <- list.files(dir, "tdump", full.name = TRUE)
    > NROW(file_list)
    [1] 1009
    > traj <- plyr::ldply(file_list, read_hysplit_file, drop = TRUE)

The file_list variable contains the list of all the filenames with the absolute pathname prepended (1009 files).
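For readers who do not have plyr at hand, the "list the files, read each one, stack the results" step can be sketched in base R. Everything here is illustrative: read_one is a hypothetical stand-in for read_hysplit_file(), and the folder and its toy files are created on the fly in the temporary directory.

```r
# Hypothetical stand-in for read_hysplit_file(): reads a headerless table.
read_one <- function(file) read.table(file, header = FALSE)

# Toy folder with three "tdump" files plus one unrelated file.
work_dir <- file.path(tempdir(), "tdump_demo")       # hypothetical folder
dir.create(work_dir, showWarnings = FALSE)
for (i in 0:2) {
  writeLines(c("1 0 42.8 12.5", "1 -1 42.6 12.4"),
             file.path(work_dir, sprintf("tdump%02d", i)))
}
writeLines("not a trajectory", file.path(work_dir, "notes.txt"))

# Select the files by the common name fragment and keep the full path.
file_list <- list.files(work_dir, pattern = "tdump", full.names = TRUE)

# Read every file and stack the resulting data frames row-wise.
traj_all <- do.call(rbind, lapply(file_list, read_one))
```

The pattern argument is a regular expression, so any file whose name contains "tdump" is selected while notes.txt is ignored.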


The traj data frame (i.e. a "table") content

traj contains the trajectory endpoints in the abovementioned format.

    > head(traj, n=14L)
       hour.inc    lat    lon height pressure   V14               date2       date
    1         0 42.805 12.565  500.0    891.7  97.2 2013-04-20 00:00:00 2013-04-20
    2        -1 42.644 12.488  509.3    897.1  94.8 2013-04-19 23:00:00 2013-04-20
    3        -2 42.500 12.396  495.7    907.1  91.2 2013-04-19 22:00:00 2013-04-20
    4        -3 42.377 12.279  462.4    920.7  85.4 2013-04-19 21:00:00 2013-04-20
    5        -4 42.277 12.138  419.2    937.8  96.1 2013-04-19 20:00:00 2013-04-20
    6        -5 42.205 11.986  380.5    953.2 110.5 2013-04-19 19:00:00 2013-04-20
    7        -6 42.156 11.826  355.2    959.7 122.6 2013-04-19 18:00:00 2013-04-20
    8        -7 42.116 11.643  339.1    963.7 186.9 2013-04-19 17:00:00 2013-04-20
    9        -8 42.078 11.428  331.0    968.6 198.8 2013-04-19 16:00:00 2013-04-20
    10       -9 42.041 11.190  343.0    971.6 163.9 2013-04-19 15:00:00 2013-04-20
    11      -10 42.016 10.957  382.7    970.9 137.2 2013-04-19 14:00:00 2013-04-20
    12      -11 42.004 10.762  441.1    966.2 141.6 2013-04-19 13:00:00 2013-04-20
    13      -12 42.001 10.614  507.7    959.9 138.3 2013-04-19 12:00:00 2013-04-20
    14      -13 42.005 10.501  571.8    952.6 122.2 2013-04-19 11:00:00 2013-04-20
    > dim(traj)
    [1] 122089      8

(callout on the first, unnamed column: row)

Note: traj has 8 columns and 122089 (= 1009 files × 121 hoursa) rows.

a The starting hour plus 120 hours of propagation.


Plotting trajectories

To plot the trajectories use the Openair command trajPlot(). Trajectory starting dates can be selected by the function selectByDate():

    > trajPlot(traj, main="4/20/2013-6/1/2013 MM hourly backtrajectories")
    > trajPlot(selectByDate(traj, start = "01/5/2013",
               end="01/5/2013"), xlim=c(-10,25), ylim=c(25,60))


Concentration files

To perform a source-receptor correlation analysis, the measured concentrations must be loaded too.

A simple way is to prepare CSV (e.g. using MS Excel) input files like:

    year month day hour Fe[ug/m3]
    2013 04 20 00 0.026251
    2013 04 21 00 0.015347
    2013 04 22 00 0.104164
    2013 04 23 00 0.003226
    2013 04 24 00 0.106046
    2013 04 25 00 0.144395
    2013 04 26 00 0.263490
    2013 04 27 00 0.301360
    2013 04 28 00 0.223864
    ......................
    ......................


Loading concentrations

The file can thus be loaded using the function read.csv(). Dates must be defined in GMT since the trajectories refer to this timea. A new column named date is added to the data frame dat.
Then, unneeded columns are removed using subset(), where the columns to delete are specified with a minus sign:

    > dat<-read.csv("data/mm_pm10_fe", sep=" ")
    > dat$date <- with(dat, ISOdatetime(year, month, day, hour,
                       min = 0, sec = 0, tz = "GMT"))
    > dat <- subset(dat, select=-c(year, month, day, hour))
    > head(dat)
      Fe.ug.m3.       date
    1  0.026251 2013-04-19
    2  0.015347 2013-04-20
    3  0.104164 2013-04-21
    4  0.003226 2013-04-22
    5  0.106046 2013-04-23
    6  0.144395 2013-04-24

(callout on the Fe.ug.m3. column header: name)

a They are the key of the correlations.
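The loading steps can be tried without any file on disk. The sketch below feeds three records, in the layout of the input file shown earlier, to read.csv() through its text argument; the object names csv_text and conc are hypothetical.

```r
# Space-separated records in the layout of the input file (first three rows only).
csv_text <- "year month day hour Fe[ug/m3]
2013 04 20 00 0.026251
2013 04 21 00 0.015347
2013 04 22 00 0.104164"

conc <- read.csv(text = csv_text, sep = " ")
# read.csv() makes the header syntactically valid: "Fe[ug/m3]" becomes "Fe.ug.m3."

# Build the GMT time stamp from the separate date fields ...
conc$date <- with(conc, ISOdatetime(year, month, day, hour,
                                    min = 0, sec = 0, tz = "GMT"))
# ... and drop the fields that are no longer needed.
conc <- subset(conc, select = -c(year, month, day, hour))
```

This also shows where the column name Fe.ug.m3. used in the following commands comes from: it is the sanitized version of the header Fe[ug/m3].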


Merging backtrajectories and measurements

Finally, trajectory data and concentrations can be merged in a single data frame using merge() on the basis of the "date" field:

    > mtraj<-merge(traj, dat, by="date")
    > head(mtraj, n=5L)
            date hour.inc    lat   lon height pressure   V14
    1 2013-04-20     -110 42.507 7.218  610.2    951.9  45.4
    2 2013-04-20      -72 41.869 9.454  551.9    914.3 120.6
    3 2013-04-20      -40 42.084 8.065  307.2    979.6 104.9
    4 2013-04-20     -105 42.420 7.509  444.3    968.6 117.1
    5 2013-04-20      -38 42.080 8.219  263.8    972.2 174.7
                    date2 Fe.ug.m3.
    1 2013-04-15 10:00:00  0.015347
    2 2013-04-17 00:00:00  0.015347
    3 2013-04-18 08:00:00  0.015347
    4 2013-04-15 15:00:00  0.015347
    5 2013-04-18 10:00:00  0.015347

The association "by date" creates a large data frame where each endpoint has the same concentration value measured at the starting date.


PSCF

PSCF probabilities can be calculated using the trajLevel() function:

    > trajLevel(mtraj, pollutant = "Fe.ug.m3.", statistic="pscf",
                percentile=90, col="default", smooth=T,
                xlim=c(-10,25), ylim=c(10,60),
                map.fill=T, grid.col="black")

[Figure: PSCF maps; panels: smooth=T, smooth=F]
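To make the quantity plotted by statistic="pscf" concrete, the sketch below computes, for each grid cell, the fraction of endpoints that belong to trajectories associated with a concentration above a threshold. It is an illustration on invented data, not Openair's implementation: the 1 × 1 degree cells obtained with floor(), the threshold value and all the numbers are assumptions, and no weighting or smoothing is applied.

```r
# Toy merged table: one row per endpoint, with the concentration measured at
# the trajectory start repeated on every endpoint (hypothetical values).
toy <- data.frame(
  lat  = c(42.2, 42.4, 42.6, 43.1, 43.3, 43.7, 42.1, 43.9),
  lon  = c(12.1, 12.3, 12.8, 12.2, 12.6, 12.9, 12.5, 12.4),
  conc = c(0.1,  0.1,  0.9,  0.9,  0.9,  0.9,  0.1,  0.1)
)

threshold <- 0.5                      # assumed exceedance level

# Assign every endpoint to a 1 x 1 degree cell (assumed gridding).
toy$cell_lat <- floor(toy$lat)
toy$cell_lon <- floor(toy$lon)

# Flag the endpoints associated with a high concentration.
toy$high <- toy$conc > threshold

# PSCF per cell = (endpoints flagged as high) / (all endpoints in the cell);
# the mean of a logical vector is exactly that fraction.
pscf <- aggregate(high ~ cell_lat + cell_lon, data = toy, FUN = mean)
names(pscf)[names(pscf) == "high"] <- "pscf"
```

In this toy case the cell starting at 42°N gets 0.25 (one high endpoint out of four) and the cell starting at 43°N gets 0.75.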


Choosing a threshold

Since the choice of a particular threshold to define high concentration values is arbitrary, it is recommended to compare PSCF probabilities calculated with different values (e.g. 50%, 70%, 90%):

[Figure: PSCF maps; panels: 50th percentile, 70th percentile, 90th percentile]

Very low thresholds tend to spread the probability over the whole area, whereas very high values might localize the probability along the path and miss some source areas.


Percentile components

trajPlot() can be used to find which trajectories contribute to the probabilities:

    > trajPlot(mtraj, xlim=c(-10,25), ylim=c(25,60))

(callout on the plot: what happened to all the remaining trajectories?)

The merged data frame shrank with respect to traj:

    > dim(mtraj)
    [1] 2783    9


Percentile components

2783 endpoints correspond to 23 trajectories of 121 endpoints each.

The contraction occurred during the merging procedure:

    > ?merge
    merge    package:base    R Documentation

    Merge two data frames by common columns or row names, or do other
    versions of database _join_ operations.
    ..........
    By default the data frames are merged on the columns with names
    they both have, but separate specifications of the columns can be
    given by 'by.x' and 'by.y'. The rows in the two data frames that
    match on the specified columns are extracted, and joined together.
    ..........

Therefore, trajectories not starting at the recorded concentration hours are discarded (986 out of 1009).
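The discarding effect can be reproduced on a toy case. In the sketch below (hypothetical data and object names) two hourly trajectories are merged with a single sample time-stamped at 00:00 GMT: the default merge() keeps only the matching trajectory, while all.x = TRUE, a standard merge() option, keeps every endpoint and leaves NA where no concentration matches.

```r
# Toy endpoints: two trajectories identified by their start 'date',
# two endpoints each (hypothetical values).
toy_traj <- data.frame(
  date     = as.POSIXct(c("2013-04-20 00:00:00", "2013-04-20 00:00:00",
                          "2013-04-20 01:00:00", "2013-04-20 01:00:00"),
                        tz = "GMT"),
  hour.inc = c(0, -1, 0, -1)
)

# Toy daily sample, time-stamped at 00:00 GMT.
toy_dat <- data.frame(
  date = as.POSIXct("2013-04-20 00:00:00", tz = "GMT"),
  Fe   = 0.015347
)

inner <- merge(toy_traj, toy_dat, by = "date")                # matching rows only
outer <- merge(toy_traj, toy_dat, by = "date", all.x = TRUE)  # keeps unmatched endpoints
```

Whether keeping the unmatched endpoints is useful for a PSCF calculation is a separate question; the sketch only shows where the rows go.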


Trajectories representativeness

The high probability areas are determined by those backtrajectories, among the 23, that correlate with concentrations exceeding the limit:

    > quantile(dat[["Fe.ug.m3."]], probs = 90./100, na.rm = TRUE)
         90%
    1.134477

The use of a single trajectory to represent all the contributions to a daily sample might be too coarse and, to overcome the "matching limitation", each sample record could be duplicated with a 12:00 time stamp.

To this end, the dataset is read again into a second data frame named dat1 and the recording hour is set to 12:00 during the GMT conversion:

    > dat1<-read.csv("data/mm_pm10_fe", sep=" ")
    > dat1$date <- with(dat1, ISOdatetime(year, month, day,
                        hour=12, min = 0, sec = 0, tz = "GMT"))
    > dat1 <- subset(dat1, select=-c(year, month, day, hour))
    > doubledat <- rbind(dat, dat1)


Increasing the number of trajectories

Now the doubledat data frame has the same concentrations recorded twice, and a new PSCF plot can be calculated after reloading the trajectories and repeating the merging procedure:

    > doubledat
       Fe.ug.m3.                date
    1   0.026251 2013-04-20
    2   0.015347 2013-04-21
    3   0.104164 2013-04-22
    ...................
    24  0.026251 2013-04-20 12:00:00
    25  0.015347 2013-04-21 12:00:00
    26  0.104164 2013-04-22 12:00:00
    ...................

The dim() function confirms:

    > dim(dmtraj)
    [1] 5566    9


Percentiles of duplicate sets

Unfortunately in this way the threshold is altered:

    > quantile(doubledat[["Fe.ug.m3."]], probs = 90./100, na.rm = TRUE)
         90%
    1.137519

leading to the exclusion of one measure (1.137519 µg/m3).

In contrast to what one would expect, the duplication of the values alters the quantile:

    > val<- c(1.0, 2.0, 3.0, 4.0)
    > val1<-c(val, val)
    > val1
    [1] 1 2 3 4 1 2 3 4
    > quantile(val, probs = 90./100, na.rm = TRUE)
    90%
    3.7
    > quantile(val1, probs = 90./100, na.rm = TRUE)
    90%
      4
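The reason can be seen by computing the quantile by hand. R's default (type = 7, as shown in the help(quantile) output) interpolates linearly at the fractional position h = (n - 1)p + 1 of the sorted sample, so h depends on the sample size n. The function name quantile_type7 below is hypothetical; it is a sketch of that rule, not a replacement for quantile().

```r
# Type-7 sample quantile computed by hand (R's default, 'type = 7').
quantile_type7 <- function(x, p) {
  x  <- sort(x)
  n  <- length(x)
  h  <- (n - 1) * p + 1        # fractional position in the sorted sample
  lo <- floor(h)
  hi <- min(lo + 1, n)         # guard for p = 1
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

val  <- c(1, 2, 3, 4)
val1 <- c(val, val)

q_single  <- quantile_type7(val,  0.9)   # h = 3.7: 70% of the way from 3 to 4
q_doubled <- quantile_type7(val1, 0.9)   # h = 7.3: between the two copies of 4
```

With 4 values the position 3.7 falls between 3 and 4; with 8 values the position 7.3 falls between the 7th and 8th sorted values, which are both 4.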


Openair's way to calculate percentiles

As a consequence, it may prove rather laborious to increase the number of trajectories associated with low frequency (daily, weekly, etc.) sampling.

A second Openair peculiarity is how it implements the percentile calculation, whose result differs depending on whether it is computed on the merged data or on the concentrations alone:

    > quantile(mtraj[["Fe.ug.m3."]], probs=90./100, na.rm=TRUE)
         90%
    1.137519
    > quantile(dat[["Fe.ug.m3."]], probs=90./100, na.rm=TRUE)
         90%
    1.134477

changing again the exceedance level and excluding one measure (1.137519 µg/m3).


Openair's way to calculate percentiles

The discrepancy arises from the way Openair's code calculates quantilesa:

    Q90 <- quantile(mydata[[pollutant]], probs=percentile/100,
                    na.rm=TRUE)
    mydata <- group_by(mydata, UQS(syms(vars))) %>%
        summarise(N = length(date), date = head(date, 1),
                  count = length(which(UQ(sym(pollutant)) > Q90)) / N
        )

which is calculated on the pollutant column of the data frame, e.g. the column "Fe.ug.m3." of the merged mtraj object in the previous examples.

Since mtraj has as many rows as there are trajectory endpoints, the quantile ends up being calculated on the concentrations replicated once per endpoint.
Accordingly, even if a single trajectory is used for each sample, the threshold might be different depending on the number of samples and the propagation time, a factor that might be misleading in comparisons.

a Note the exclusion of the Q90 value due to the > operator in the mydata assignment.


Correlating trajectories

To plot the trajectories correlating with concentrations exceeding the two quantiles, it is sufficient to extract them using the function subset():

    > trjq90 <- subset(mtraj, Fe.ug.m3.>1.134477)
    > trjq9x <- subset(mtraj, Fe.ug.m3.>1.137519)
    > trajPlot(trjq90, xlim=c(-10,25),
               ylim=c(25,60), pollutant="Fe.ug.m3.",
               main="Fe>1.134477 h=00 trajectories correlation")
    > trajPlot(trjq9x, xlim=c(-10,25),
               ylim=c(25,60), pollutant="Fe.ug.m3.",
               main="Fe>1.137519 h=00 trajectories correlation")

Note the missing trajectory on the right hand panel, correlating with concentration 1.137519 µg/m3 and corresponding to an omitted PSCF probability area located south-east.


Trajectories not included

The same operation can be carried out on the doubled dataset:

[Figure: trajectory plots for the doubled dataset]

Both images show how important the south-east contribution is, although it is neglected because of the combined effect of the raised threshold and the impracticability of using more than one trajectory per sample.


Trajectories not included

Accordingly, the PSCF probabilities are:

[Figure: PSCF probability map]

    > subset(doubledat, Fe.ug.m3.>1.137519)
       Fe.ug.m3.                date
    10  1.162659 2013-04-29 00:00:00
    14  1.220515 2013-05-03 00:00:00
    33  1.162659 2013-04-29 12:00:00
    37  1.220515 2013-05-03 12:00:00

There is a slight mismatch between PSCF areas and correlating trajectories. The subset() function extracts from the doubled dataset the concentrations exceeding the threshold 1.137519 calculated by Openair PSCF.

Yet, surprisingly, there are 4 concentrations but just 3 plotted trajectories: one trajectory is missing.

The problem is associated with the documented inability of trajPlot to plot shorter trajectories:


Short trajectories

"Note that trajPlot will only plot full length trajectories. This can be important when plotting something like a single month e.g. by using selectByDate when on partial sections of some trajectories may be selected."

The openair manual ver. 15th November 2018, p. 198

Thus, the missing trajectory can be plotted by using selectByDate():

    > trajPlot(selectByDate(traj, start = "03/5/2013",
               end="03/5/2013", hour=12),
               main="03/5/2013 12:00 backtrajectory",
               xlim=c(-10,25), ylim=c(25,60))

corresponding to a trajectory 110 hours long.
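Short trajectories can also be spotted before plotting, by counting the endpoints of each trajectory. The sketch below works on invented data (two trajectories with an expected length of 6 endpoints); with the real traj data frame the expected length would be 121, and the column names date and hour.inc are those shown by head(traj).

```r
# Toy trajectory table: 'date' identifies the trajectory start, 'hour.inc' the
# backward step. The second trajectory stops early (hypothetical values).
toy_traj <- data.frame(
  date = as.POSIXct(rep(c("2013-05-03 00:00:00", "2013-05-03 12:00:00"),
                        times = c(6, 4)), tz = "GMT"),
  hour.inc = c(0:-5, 0:-3)
)

expected_len <- 6   # endpoints of a full-length trajectory in this toy case

# Count the endpoints belonging to each starting time.
start_label <- format(toy_traj$date, "%Y-%m-%d %H:%M", tz = "GMT")
endpoints_per_traj <- table(start_label)

# Starting times of the trajectories shorter than expected.
short_traj <- names(endpoints_per_traj)[as.vector(endpoints_per_traj) < expected_len]
```

The starting times collected in short_traj are the ones to pass to selectByDate() when a trajectory seems to be missing from a plot.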


Openair weights

Openair's implementation of PSCF applies a weighting function to reduce the effect of cells with few endpoints. Unfortunately, no mention is made in the manual of the particular parameterization adopted.

The weights are hardcoded into the Openair sources and cannot be excluded by the user.

The probability for each cell (i, j) is multiplied by W(n_{ij}):

\[
W(n_{ij}) =
\begin{cases}
1.00, & n_{ij} > 2n \\
0.75, & n < n_{ij} \le 2n \\
0.50, & n/2 < n_{ij} \le n \\
0.15, & n_{ij} \le n/2
\end{cases}
\]

where n is the average number of endpoints per cell, computed on every cell with at least one endpoint.
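The piecewise definition above translates directly into a small vectorized function. This is a sketch of the formula as written on this slide, not a copy of the Openair source; the name pscf_weight and the example counts are hypothetical.

```r
# Weight W(n_ij) as defined above; n_mean is the average number of endpoints
# per cell (computed on cells with at least one endpoint).
pscf_weight <- function(n_ij, n_mean) {
  ifelse(n_ij > 2 * n_mean, 1.00,
  ifelse(n_ij > n_mean,     0.75,
  ifelse(n_ij > n_mean / 2, 0.50, 0.15)))
}

# Example: cells with 25, 20, 15, 8 and 3 endpoints when the average is 10.
weights <- pscf_weight(c(25, 20, 15, 8, 3), n_mean = 10)
```

A cell holding exactly twice the average (20 endpoints here) falls in the 0.75 class, since the top class requires strictly more than 2n.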


Weights on/off

However, one of Openair's advantages is that its sources are publicly available, so trajLevel() can be modified to set all the weights to 1:

[Figure: PSCF maps with and without weights; callout: potential tailing effects]

The effect of the weighting function is clearly visible in the cells far from the receptor (those with a smaller number of endpoints), which would have a much higher probability if the weighting was not applied.


Concentration Weighted Trajectory or Concentration Field26

CWT was developed in order to weight trajectories by means of the associated concentrations:

    > trajLevel(mtraj, pollutant="Fe.ug.m3.",
                statistic="cwt", col="default", smooth=T,
                xlim=c(-10,25), ylim=c(10,60),
                map.fill=T, grid.col="black")

\[
\log(C_{ij}) = \frac{1}{\sum_{k=1}^{N} \tau_{ijk}} \sum_{k=1}^{N} \log(c_k)\, \tau_{ijk}
\]

where C_{ij} is the mean concentration in cell (i, j), c_k is the concentration at the receptor associated with trajectory k, and \tau_{ijk} is the residence time of trajectory k in cell (i, j).

CWT gives similar results to a mid percentile PSCF but without the possibility to test different thresholds to distinguish between significant and background concentrations.

26 Hsu et al., Atm. Env. 37 (2003), 545-562
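For a single cell the formula above is a residence-time-weighted mean of the logarithms of the concentrations. The sketch below evaluates it on invented numbers; the function name cwt_cell and the values are hypothetical, and the weighting and smoothing that Openair may apply on top are not reproduced.

```r
# CWT for one cell (i, j): weighted log-mean of the receptor concentrations c_k,
# the weights being the residence times tau_ijk of the N trajectories.
cwt_cell <- function(conc, tau) {
  exp(sum(log(conc) * tau) / sum(tau))
}

conc <- c(0.1, 0.1, 1.0)   # hypothetical concentrations at the receptor
tau  <- c(1, 1, 2)         # hypothetical residence times in the cell

cwt_value <- cwt_cell(conc, tau)
```

Here the two low-concentration trajectories and the single high-concentration one carry the same total residence time, so the result is the geometric mean of 0.1 and 1.0.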


Trajectory days

To tag the trajectory day, a further column day can be added to the traj dataframe:

```r
# Tag each endpoint with its calendar day, then colour trajectories by day
traj$day <- as.Date(traj$date)
trajPlot(traj, group="day",
         main="4/20/2013-6/1/2013",
         col="jet", xlim=c(-10,25),
         ylim=c(10,60))
```

Unfortunately, the group option cannot select the week mode.

Nevertheless, grouping can be used to catch a glimpse of the trajectory paths during the period.
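A possible workaround, by analogy with the day column, is to build a week tag by hand. The sketch below only shows the tagging step on a toy dataframe: traj_toy and the column name isoweek are hypothetical, and passing such a column to group is an assumption that has not been verified here.

```r
# Toy stand-in for the traj dataframe; only the date column matters here.
traj_toy <- data.frame(
  date = as.POSIXct(c("2013-04-20 12:00", "2013-04-22 00:00", "2013-04-28 18:00"),
                    tz = "UTC")
)

# ISO 8601 week-based year and week number, e.g. "2013-W16".
traj_toy$isoweek <- format(traj_toy$date, "%G-W%V")
print(traj_toy)

# Assumption, untested: the new column could then be used like day, e.g.
# trajPlot(traj, group = "isoweek")
```

The column is deliberately not called week, to avoid any clash with names that Openair may treat specially.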


Correlating trajectories

If the pollutant option is used instead of the group one, the trajectories are plotted colored according to the associated concentration:

```r
# Colour each trajectory by the Fe concentration measured at the receptor
trajPlot(mtraj, col="jet",
         pollutant="Fe.ug.m3.",
         main="4/20/2013-6/1/2013",
         xlim=c(-10,25),
         ylim=c(10,60))
```

Note the reduced number of trajectories, since the function was applied to the mtraj dataframe.


Trajectories frequency

trajLevel() can also calculate the trajectory frequency:

```r
# Gridded trajectory frequency; no pollutant is needed
trajLevel(traj, smooth=FALSE,
          statistic="frequency",
          main="traj. freq.",
          xlim=c(-10,25),
          ylim=c(10,60))
```

Since the function does not involve pollutant concentrations, the whole trajectory dataset (traj) can be used.


Further reading

G. P. Brasseur and D. J. Jacob, Modeling of atmospheric chemistry, Cambridge University Press (2017);
Lin et al., Lagrangian Modeling of the Atmosphere, Geophysical Monograph Series 200, American Geophysical Union (2012);
A. Stohl, Computation, accuracy and applications of trajectories – a review and bibliography, Atm. Env. 32 (1998), 947-966;
D.C. Carslaw, The openair manual, University of York (2018);
Fleming et al., Review: Untangling the influence of air-mass history in interpreting observed atmospheric composition, Atm. Res. 104-105 (2012), 1-39;
Petroselli et al., Disentangling the major source areas for an intense aerosol advection in the Central Mediterranean on the basis of Potential Source Contribution Function modeling of chemical and size distribution measurements, Atm. Env. 204 (2018), 67-77.

http://orfeo.chm.unipg.it/temp/workshop_ias_2019.pdf

Acknowledgments to:

NOAA Air Resources Laboratory and READY website; D. Carslaw’s OpenAir project; R Foundation; GNU project; Free Software Foundation.

Version 1.1.1