Probabilistic Forecasts of Mesoscale Convective System Initiation Using the Random Forest Data Mining Technique

DAVID AHIJEVYCH AND JAMES O. PINTO
National Center for Atmospheric Research,* Boulder, Colorado

JOHN K. WILLIAMS
The Weather Company, Andover, Massachusetts

MATTHIAS STEINER
National Center for Atmospheric Research,* Boulder, Colorado

(Manuscript received 3 September 2015, in final form 26 January 2016)

* The National Center for Atmospheric Research is sponsored by the National Science Foundation.

Corresponding author address: David Ahijevych, Mesoscale and Microscale Meteorology Laboratory, National Center for Atmospheric Research, P.O. Box 3000, Boulder, CO 80307-3000. E-mail: ahijevyc@ucar.edu

APRIL 2016, AHIJEVYCH ET AL., 581. DOI: 10.1175/WAF-D-15-0113.1. © 2016 American Meteorological Society

ABSTRACT

A data mining and statistical learning method known as a random forest (RF) is employed to generate 2-h forecasts of the likelihood for initiation of mesoscale convective systems (MCS-I). The RF technique uses an ensemble of decision trees to relate a set of predictors [in this case radar reflectivity, satellite imagery, and numerical weather prediction (NWP) model diagnostics] to a predictand (in this case MCS-I). The RF showed a remarkable ability to detect MCS-I events. Over 99% of the 550 observed MCS-I events were detected to within 50 km. However, this high detection rate came with a tendency to issue false alarms either because of premature warning of an MCS-I event or in the continued elevation of RF forecast likelihoods well after an MCS-I event occurred. The skill of the RF forecasts was found to increase with the number of trees and the fraction of positive events used in the training set. The skill of the RF was also highly dependent on the types of predictor fields included in the training set and was notably better when a more recent training period was used. The RF offers advantages over high-resolution NWP because it can be run in a fraction of the time and can account for nonlinearly varying biases in the model data. In addition, as part of the training process, the RF ranks the importance of each predictor, which can be used to assess the utility of new datasets in the prediction of MCS-I.

1. Introduction

Because of their large size, intensity, and longevity, mesoscale convective systems (MCSs) impact society in many ways: public safety (flash flooding); wind farm energy generation, above ground transmission of electricity, and cellular communication towers (severe wind events); agricultural practices (water usage); and safe and efficient air travel (turbulence, wind shear, hail). Better forecasts of MCSs will lead to more advanced public warning of severe weather (Stensrud et al. 2013), improved ability to protect wind farm assets from extreme winds (Mahoney et al. 2012), improved response time for energy and communications infrastructure repairs due to damage caused by MCSs, and improved airline safety and air traffic efficiency by routing aircraft around potential MCS initiation events (Colavito et al. 2011, 2012; Robinson 2014).

Operational high-resolution numerical weather prediction (NWP) models with advanced data assimilation, such as the High Resolution Rapid Refresh (HRRR; Benjamin et al. 2014), are beginning to show promise in providing skillful forecasts of MCSs. Advances in the assimilation of radar reflectivity have improved the initialization of existing MCSs in NWP models, but predicting the timing and location of MCS initiation remains a particularly vexing problem (e.g., Clark et al. 2007, 2014; Pinto et al. 2015; Trier et al. 2014, 2015).


Very short-term predictions of the initiation of an MCS (MCS-I) require a high-resolution depiction of the evolving stability, shear profile, and potential forcing mechanisms such as surface boundaries or elevated propagating waves (e.g., Jirak and Cotton 2007; Houze 2004). High-resolution models with advanced data assimilation can provide a three-dimensional estimate of the evolving environment, but imperfections in the model and poorly constrained errors in temperature and moisture mean that NWP predictions of MCS-I are still prone to a great deal of uncertainty (Pinto et al. 2015).

Statistical techniques (e.g., linear regression, k-nearest neighbor, analogs, neural networks, random forest, and genetic algorithms) can operate on data much more quickly than a human analyst, enabling the rapid digestion of frequently updating datasets (e.g., surface mesonets, radar, satellite) along with NWP models as often as new data arrive. In this study, we evaluate the utility and predictive skill of a random forest (RF) at predicting MCS-I. The RF technique is still relatively new to most meteorologists, yet has shown promise in several other complex weather prediction applications, as described below.

Statistical models have long been a part of weather forecasting. For example, model output statistics (MOS) based on multiple linear regressions are routinely used to compensate for systematic model biases and to generate reliable probabilistic forecasts of precipitation, cloud cover, and other variables (Glahn and Lowry 1972). Analog statistical techniques identify similar past weather patterns and give probabilistic projections based on the observed evolution of those past patterns (Hamill and Whitaker 2006; Delle Monache et al. 2013). The tropical weather community uses statistical models to predict the probability of tropical cyclogenesis, rapid intensification, and eyewall replacement cycles (Rozoff and Kossin 2011; DeMaria and Kaplan 1994). Marzban et al. (2007) used neural networks to predict cloud ceiling and visibility, and Coniglio et al. (2007) used logistic regression to predict MCS maintenance based on vertical wind and stability profiles. More recently, Roebber (2015) used evolutionary programming techniques to generate probabilistic forecasts of minimum surface temperatures.

In past studies, the skill of the RF has been shown to vary with implementation and application. Prior to its use in meteorology, the RF statistical technique was used successfully in biomedical research to select and classify genes relevant to diseases (e.g., Díaz-Uriarte and de Andrés 2006). More recently, the RF approach was used to diagnose regions of atmospheric turbulence due to convection from radar and satellite observations and NWP model data (Williams et al. 2007, 2008c; McGovern et al. 2011; Williams 2014). Williams et al. (2008a,b) showed how RFs could be used to predict areas where convective storms were likely. Gagne et al. (2009) compared the RF technique to a host of other machine learning algorithms and found it to be better than all other algorithms at classifying radar-based storm type. Another comparative study described by Lakshmanan et al. (2010) found that RF had a slight edge over competing artificial intelligence learning techniques in classifying storm type. Hall et al. (2011) found that the RF was one of the best algorithms in terms of overall skill metrics for short-term clear-sky forecasts, although its underconfidence (Wilks 2006, p. 288) made it statistically less reliable than other statistical data mining techniques. Recently, Gagne et al. (2014) used RF to add skill to an ensemble of storm-scale precipitation forecasts, while Mecikalski et al. (2015) found RF performed slightly worse than logistic regression in forecasting small-scale convection initiation with NWP and geostationary satellite data.

In this paper, we demonstrate how RFs can be trained to predict the very challenging forecast problem of large-scale convective storm initiation (MCS-I) following an approach similar to that used by Williams (2014) for predicting atmospheric turbulence. Section 2 introduces the input predictor fields and our quantitative definition of MCS-I. Section 2 also describes the predictor selection process and documents the improved skill resulting from expanding the predictor list from a small set of NWP model fields to a combination of smoothed NWP output and observations. The sensitivity of prediction skill to various RF parameters is explored in section 3. Section 4 shows case studies to demonstrate the value that the RF technique offers when compared to the individual constituent data sources. Finally, in section 5, the results are summarized and presented along with a discussion of the strengths and weaknesses of the technique.

2. Methodology

The RF data mining technique requires the definition of a forecast variable of interest, or predictand, and a set of predictor fields that are thought to be related to the predictand. For this study, the predictand is the binary variable representing whether or not MCS-I occurred at a given time and location, and the predictors are derived from radar reflectivity, satellite data, and NWP model output. The RF was trained using data collected during June–August (JJA) of 2011 and evaluated on data from the summer of 2013 to provide a stringent test of its ability to capture MCS-I, even when the NWP model changes. The datasets, predictand, and RF methodology are described in detail below.

582 WEATHER AND FORECASTING VOLUME 31

a. Datasets

The model diagnostics used to train the RF came from the HRRR (Benjamin et al. 2014). The HRRR is a convection-permitting model run over the entire continental United States (CONUS) with hourly cycling and 3-km grid spacing. The 2011 version of the HRRR gained information on the location of existing storms indirectly via the three-dimensional variational data assimilation (3DVAR) of radar reflectivity into the 13-km Rapid Refresh (RAP) model, which was used to initialize the HRRR forecasts. In 2013, the HRRR was updated to include direct assimilation of radar reflectivity into its 3-km grid (Benjamin et al. 2014). This change notably improved the performance of the HRRR, particularly its ability to capture existing MCSs (Pinto et al. 2015). As a result, training the RF on 2011 HRRR data and testing it on 2013 HRRR data demonstrates whether or not the RF is robust for use with different NWP model analysis systems.

Extrapolated radar observations started with composite reflectivity provided by the National Mosaic and Multi-Sensor Quantitative Precipitation Estimation (NMQ) system from the National Severe Storms Laboratory (Zhang et al. 2011). This product merges multiple radar volumes into a 3D grid with 1-km spacing in the horizontal and 0.5-km spacing in the vertical, and then derives 2D fields such as composite reflectivity.

Satellite observations came from the Geostationary Operational Environmental Satellite system (GOES) operated by the National Environmental Satellite, Data, and Information Service (NESDIS). Brightness temperature in the longwave IR channel (10.7 μm) was subtracted from the CO2 channel (13.3 μm) to yield a satellite brightness temperature difference (SBTD) field. The SBTD has been shown to distinguish between growing cumulonimbi and low cumulus or thin cirrus (Mecikalski and Bedka 2006), so it is useful to delineate areas of growing cumuli that may consolidate into an MCS. Thin cirrus and shallow cumulus have brightness temperature differences of less than −25°C, while developing storms exhibit values between −25° and −5°C, and difference values near zero indicate mature cumulonimbi. Mecikalski and Bedka (2006) and Mecikalski et al. (2008) included SBTD as one of the components of their satellite-based convection initiation algorithm.
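The SBTD thresholds quoted above can be written as a small lookup. What follows is a minimal Python sketch, not the authors' code: the function names are hypothetical, and treating every difference warmer than −5°C as "near zero" is an assumption made for illustration.

```python
def sbtd(tb_13p3, tb_10p7):
    """Satellite brightness temperature difference: 13.3-um minus 10.7-um.

    Both inputs must share a unit (K or deg C); the difference is the same
    in either unit.
    """
    return tb_13p3 - tb_10p7


def classify_sbtd(diff):
    """Map an SBTD value (deg C) to the cloud categories named in the text."""
    if diff < -25.0:
        return "thin cirrus or shallow cumulus"
    if diff <= -5.0:
        return "developing storm"
    # Assumption: anything warmer than -5 deg C counts as "near zero".
    return "mature cumulonimbus"
```

For example, a pixel with brightness temperatures of 230 K at 13.3 μm and 240 K at 10.7 μm has an SBTD of −10 and falls in the developing-storm range.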

The radar reflectivity and SBTD were interpolated onto the HRRR 3-km grid using bilinear interpolation. These fields were then advected to their expected downstream locations at later times based on the motions detected in the vertically integrated liquid water (VIL) field from the Corridor Integrated Weather System (CIWS; Evans and Ducot 2006; Dupree et al. 2009).

b. Definition of MCS initiation

National composites of VIL that are available from CIWS were used to identify MCSs following the method described in Pinto et al. (2015). As in Pinto et al. (2015), in this study we define MCSs as consisting of an area of VIL exceeding 3.5 kg m⁻² with a horizontal extent of at least 100 km (allowing gaps of up to 10 km). These conditions must be met for at least two consecutive tops of the hour. While not essential to the conclusions of the paper, the criteria that have been adopted to classify MCSs are similar, but not identical, to those used in many prior studies (e.g., Geerts 1998; Houze 2004; Coniglio et al. 2010). The lifetime threshold was set relatively low to ensure an adequate sample size for developing the training dataset using data obtained for a limited time period. Larger-sized storms of longer duration are much less frequently occurring (e.g., Davis et al. 2006) and, therefore, would require a longer period from which to draw an adequate number of representative cases.
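The spatial part of this definition (a VIL threshold, a minimum extent, and a gap tolerance) can be sketched in a few lines of Python. This is an illustrative sketch only, not the Pinto et al. (2015) algorithm: the function names, the use of cell-center distances, and the rule that two cells belong together when their centers are no farther apart than the gap tolerance plus one grid length are all assumptions. The persistence requirement (two consecutive tops of the hour) is not modeled.

```python
import math
from itertools import combinations

VIL_THRESHOLD = 3.5    # kg m^-2, from the MCS definition in the text
MIN_EXTENT_KM = 100.0  # minimum horizontal extent
MAX_GAP_KM = 10.0      # gaps up to this width are bridged


def find_clusters(vil, dx_km):
    """Group above-threshold cells, bridging gaps of up to MAX_GAP_KM."""
    cells = [(r, c) for r, row in enumerate(vil)
             for c, v in enumerate(row) if v > VIL_THRESHOLD]
    parent = list(range(len(cells)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    link_km = MAX_GAP_KM + dx_km  # center-to-center distance that still links
    for i, j in combinations(range(len(cells)), 2):
        (r1, c1), (r2, c2) = cells[i], cells[j]
        if math.hypot(r1 - r2, c1 - c2) * dx_km <= link_km:
            parent[find(i)] = find(j)

    clusters = {}
    for i, cell in enumerate(cells):
        clusters.setdefault(find(i), []).append(cell)
    return list(clusters.values())


def extent_km(cluster, dx_km):
    """Largest center-to-center distance in the cluster plus one cell width."""
    if len(cluster) < 2:
        return dx_km
    longest = max(math.hypot(r1 - r2, c1 - c2)
                  for (r1, c1), (r2, c2) in combinations(cluster, 2))
    return longest * dx_km + dx_km


def mcs_candidates(vil, dx_km):
    """Clusters that satisfy the extent criterion at a single time."""
    return [cl for cl in find_clusters(vil, dx_km)
            if extent_km(cl, dx_km) >= MIN_EXTENT_KM]
```

The pairwise loop is quadratic in the number of above-threshold cells, which is fine for a demonstration but would need a spatial index or image-morphology routine on a national grid.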

Once the MCS definition is satisfied, the area spanned by the core area of high VIL is dilated by 125 km, as shown in Fig. 1, to define the MCS region. VIL is used to detect MCSs instead of radar reflectivity because it is relatively insensitive to brightband contamination and anomalous propagation artifacts (e.g., Smalley and Bennett 2002). VIL also includes the integrated effect of hydrometeors at all vertical levels, making its intensity more closely related to convective vigor than a single level of radar reflectivity.

After MCSs are identified for each time, they are checked to see if they qualify as an initiation event (MCS-I). To qualify, an MCS must be at least 125 km removed from any previously existing MCS that was present during the previous 2 h, and it must persist for at least 1 h. MCS-I is evaluated only at the top of each hour, when HRRR model forecasts are valid, and an MCS-I event occurs only in the first hour that a temporally and spatially isolated MCS is identified. A detailed description of the MCS-I identification algorithm is given in Pinto et al. (2015). The data points around the MCS-I are used in the RF training set as positive events, while all others are nonevents. The expansion of the MCS region accounts for potential offsets or timing errors between the observed MCS-I and the environmental conditions as represented by the model. This increases the number of training data points that go into the RF and allows for some positional error in a forecast.
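The isolation test for an initiation event can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the Pinto et al. (2015) algorithm: each MCS is reduced to a single representative point in kilometers, `history` is a hypothetical mapping from hour to the MCS positions found at that hour, and the 1-h persistence requirement is left out.

```python
import math


def is_initiation(candidate_xy, hour, history, min_sep_km=125.0, lookback_h=2):
    """Return True if the candidate MCS is isolated in space and time.

    candidate_xy : (x_km, y_km) position of the newly identified MCS
    hour         : integer top-of-hour index of the candidate
    history      : dict mapping hour -> list of (x_km, y_km) MCS positions
    """
    for past_hour in range(hour - lookback_h, hour):
        for x, y in history.get(past_hour, []):
            separation = math.hypot(candidate_xy[0] - x, candidate_xy[1] - y)
            if separation < min_sep_km:
                return False  # too close to a recently existing MCS
    return True
```

In the study the separation is measured between MCS regions rather than points, so a faithful version would replace the point distance with a distance between outlines.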

FIG. 1. VIL and associated MCS detections at 1900 UTC 5 Aug 2013. The large black rectangle denotes the geographical extent of data points used in the RF training suite. The dark gray regions have no radar coverage and are not used. The black outline encircles an ongoing MCS that initiated previously, while the magenta contours encircle newly initiated MCSs (MCS-I). The outlines are obtained by extending outward each vertex of the MCS by 125 km.

c. Random forest algorithm

FIG. 2. Given all the predictor values at a point (i.e., a specific location and time), each tree in the RF votes on the outcome. Together, the trees act as an ensemble of experts. Each tree is different because 1) each one is trained with a bootstrapped sample of the full training suite and 2) at each node, symbolized by the blue circles, only a subset of the original predictors are randomly chosen as candidates for splitting.

A decision tree is a common tool in machine learning (Breiman et al. 1984; De'ath and Fabricius 2000; Dattatreya 2009), and an RF is an ensemble of weakly correlated decision trees (Breiman 2001). Collectively, the trees function as an ensemble of "experts," each casting a vote for whether MCS-I will occur (e.g., Fig. 2). All of the nodes of a decision tree can be reduced to simple rules of the form: if predictor P is x or less (where x is any number), then follow branch A; otherwise, follow branch B. A predictor may be used at multiple nodes in the same tree. Each branch will either lead to another node or terminate with a "yes" or "no" vote. When a decision tree is being trained, the algorithm finds a predictor and a threshold that "splits" the training data instances that reach a node into two subsets in a way that maximizes the homogeneity of the subsets with respect to the predictand, for example, by minimizing the "Gini impurity" (Breiman 1996).
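To make the split search concrete, the following Python sketch finds the threshold on a single predictor that minimizes the weighted Gini impurity of the two resulting subsets. It is a minimal illustration for a binary predictand (1 = MCS-I, 0 = null); the function names are hypothetical and no claim is made that the authors' RF software is organized this way.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, impurity) for the rule `value <= threshold`.

    Every distinct predictor value except the largest is tried as a threshold.
    If no split improves on the unsplit node, the threshold is None.
    """
    n = len(values)
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

In a full tree this search is repeated over the candidate predictors at each node, and the predictor and threshold with the lowest impurity define the node's rule.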

RF trees differ from conventional decision trees in that each RF tree is trained on a bootstrapped sample of the training cases (illustrated by Fig. 3). Additionally, at each node of the tree, only a limited, randomly selected subset of predictors is chosen as candidates for splitting, whereas standard decision trees consider all predictors as candidates. (The predictor candidates are selected randomly with replacement, so that any predictor may be a candidate at any node.) An implication of bootstrapping is that roughly one-third of the training cases are not used for any given tree, and these "out of bag" cases are used as test cases to quantify the importance of each predictor field. Bootstrapping the training cases and ignoring some predictors at each node make individual trees weaker, but these steps also ensure the trees are not strongly correlated with each other. Thus, the forest is less susceptible to overfitting the peculiarities of the training set and can provide probabilistic information. The number of trees and number of predictors chosen as candidates for splitting at each node are tunable parameters to which predictive performance sensitivity is tested below.
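The "roughly one-third" figure follows from the bootstrap itself: when N cases are drawn with replacement from N, the chance that a given case is never drawn is (1 − 1/N)^N, which approaches 1/e ≈ 0.37 for large N. The Python sketch below illustrates the two sources of randomness. The function names are hypothetical, and drawing the candidate predictors afresh from the full list at every node (without repeats inside a node) is one reading of the parenthetical remark above, stated here as an assumption.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n case indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, in_bag):
    """Indices that were never drawn; these are the out-of-bag cases."""
    chosen = set(in_bag)
    return [i for i in range(n) if i not in chosen]


def candidate_predictors(predictors, m, rng):
    """Pick m distinct candidates for one node from the full predictor list.

    The full list is passed in at every node, so any predictor may be a
    candidate at any node.
    """
    return rng.sample(predictors, m)
```

With a seeded `random.Random` generator the draws are reproducible, which helps when comparing forests built with different numbers of trees or candidates per node.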

The RF has several advantages over other data mining techniques. For one, the empirical model created from the RF ensemble does not require the predictors to be monotonically related to the predictand, meaning that it can represent a variety of functional relationships. Alternative techniques like logistic regression are inherently linear. The decision trees in the RF are also human readable, such that relationships between data and how they were used to predict can be explored.

In addition to their predictive capabilities, RFs can rank the importance of individual predictors (Breiman 2001; Topic et al. 2014). A predictor's importance is quantified by scrambling its values in the out-of-bag training cases for each tree and seeing how much the classification accuracy of the RF goes down. For example, the expected importance of a random variable is zero. The importance value is often scaled by dividing it by a quantity akin to its standard error (e.g., see supplemental material for Díaz-Uriarte and de Andrés 2006). Importance scores provide a helpful starting point for comparing the potential contributions of different variables and selecting a small but skillful subset of predictors.
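The scrambling idea can be shown with a short Python sketch. It is a simplified illustration, not the RF implementation used in the study: `predict` is a hypothetical stand-in for a trained model, the cases are tuples of predictor values, and the accuracy drop is computed once on a single set of held-out cases rather than per tree on each tree's out-of-bag cases.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of cases for which predict(row) matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, column, rng):
    """Accuracy lost when the values of one predictor column are shuffled.

    rows is a list of tuples; column is the index of the predictor to scramble.
    """
    baseline = accuracy(predict, rows, labels)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    scrambled = [r[:column] + (v,) + r[column + 1:]
                 for r, v in zip(rows, shuffled)]
    return baseline - accuracy(predict, scrambled, labels)
```

A predictor that the model never uses yields an importance of exactly zero in this sketch, while a predictor that drives the decision produces a clear accuracy drop.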

1) TRAINING

The RF is trained to use predictors available at a given time to forecast the occurrence of MCS-I 2 h in the future. To create the RF training suite, predictor values and associated MCS-I "truth" values from 2 h later were interpolated onto a cylindrical grid covering 25°–48°N and 125°–67°W with horizontal spacing of about 4 km (0.036° latitude and 0.038° longitude). The geographical coverage of data points used in the training suite is shown in Fig. 1. Points over the Atlantic Ocean and parts of Canada and Mexico that are beyond the WSR-88D radar network coverage were not included. The analysis was done using data available at the top of each hour.

FIG. 3. Example of bootstrap sampling from a hypothetical set of 26 cases, a–z. Twenty-six cases are randomly selected with replacement to create a 26-element set T. Cases may be selected multiple times or not at all. This process is repeated for each tree. Those cases not selected are called out-of-bag cases and are used to assess predictor importance.

Over the 3-month period from June through August 2011, there were over 200 million potential data points. Even though there were many cases to choose from, most of them were null events (no MCS-I). Even in the most MCS-I-prone geographical regions in the United States, MCS-I events occur only 3% of the time (Pinto et al. 2015). The average MCS-I frequency for the entire domain is only 0.3%. This rarity makes MCS-I a difficult predictand for any statistical forecast algorithm to handle. One can achieve 99.7% accuracy by always predicting a null event at every grid point. To help the RF algorithm discriminate between MCS-I and non-MCS-I cases, the MCS-I cases are oversampled such that they make up 30% of the training set. This artificial increase in the proportion of events in the training set can be accounted for in the RF vote calibration phase.
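Two numbers in this paragraph can be checked with a few lines of Python: the accuracy of an always-null forecast, and the effect of training on a 30% event fraction when the true frequency is 0.3%. The rescaling below is a standard prior-shift correction for classifiers trained on resampled data; it is shown as one possible way to undo oversampling and is an assumption of this sketch, not a description of the vote calibration used in the study. The function names are hypothetical.

```python
def null_accuracy(event_frequency):
    """Accuracy of a forecast that always predicts 'no event'."""
    return 1.0 - event_frequency


def correct_for_oversampling(p_train, train_rate=0.30, true_rate=0.003):
    """Rescale a probability from the oversampled world to the real one.

    p_train    : event probability produced by a model whose training set
                 contained events at fraction train_rate
    true_rate  : event frequency in the real population
    """
    w_event = true_rate / train_rate
    w_null = (1.0 - true_rate) / (1.0 - train_rate)
    return p_train * w_event / (p_train * w_event + (1.0 - p_train) * w_null)
```

Under this correction a raw output equal to the training event fraction (0.30) maps back to the climatological frequency (0.003), and outputs of 0 and 1 are left unchanged.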

The RF parameter sensitivity tests and predictor importance analyses were conducted using 10 disjoint training sets: 5 sets of 18 000 cases each were drawn randomly without replacement from odd days, and another 5 sets were drawn similarly from even days. The standard deviation of skill over these 10 training sets provides a means for assessing the relative significance of differences in mean skill score when the RF parameters are changed. While one standard deviation is not a particularly stringent requirement, there is only a 2.2% chance that the mean of 10 samples will differ by more than one standard deviation from the mean of another 10 samples drawn from the same population. Selecting the sets from even or odd days also permits testing a model trained on even days against independent data from odd days and vice versa.

In general, one wants as many training cases as possible to fully sample the general population of weather scenarios. On the other hand, given finite resources, one must limit the number of cases. We found that 18 000 cases allowed for efficient training of the RF while fully sampling the parameter space. This number of cases is actually quite large compared to other recent studies. For example, McGovern et al. (2011) successfully trained an RF to predict atmospheric turbulence with only 2055 cases, and Mecikalski et al. (2015) predicted the onset of radar reflectivity ≥ 35 dBZ at the −10°C level (i.e., a formal definition for convective initiation) with only 9015 cases. While the number of cases is relatively large, it is important to note that a higher number of training examples is often needed when the event is rare, so that both verifying classes are sufficiently sampled.

2) PREDICTOR SELECTION

As a proof of concept, the RF was first trained with a select group of diagnostic fields obtained from the HRRR. Later, the value of adding observational predictor fields was explored. Diagnostic output from the HRRR included 17 two-dimensional fields deemed to be relevant for the prediction of MCS-I (Table 1). Environmental factors that contribute to the development of MCSs are discussed in Houze (2004). Undoubtedly, there are other fields that may be derived from the full three-dimensional HRRR dataset that would have potential for adding value to the prediction of MCS-I (e.g., vertical wind shear), but for simplicity, we limited our training sets to fields available within the HRRR two-dimensional data stream. In addition, local solar time was added as a predictor field as a simple way to account for differing mechanisms responsible for daytime and nocturnal MCS-I.
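Table 1 describes the local solar time predictor as the UTC hour plus the east longitude divided by 15° per hour. A minimal Python sketch of that formula follows; the function name and the wrap into the range 0 to 24 h are assumptions added here for illustration.

```python
def local_solar_time(utc_hour, lon_deg_east):
    """Approximate local solar time in hours.

    utc_hour     : hour of day in UTC (0-24)
    lon_deg_east : longitude in degrees east (negative for the western
                   hemisphere); 15 degrees of longitude equals 1 h
    """
    # Assumption: wrap the result into [0, 24) so the value stays a time of day.
    return (utc_hour + lon_deg_east / 15.0) % 24.0
```

For example, 1900 UTC at 105°W (−105°E) corresponds to a local solar time of 12 h.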

TABLE 1. Predictors sorted by mean selection count.

| Predictor | Description | Unit | Mean selection count |
| --- | --- | --- | --- |
| PWAT_EATM | Precipitable water in model column | kg m⁻² | 50.4 |
| PRES_SFC | Surface pressure (a proxy for terrain height) | hPa | 49.0 |
| Local solar time | UTC hour + (°E)/15° h⁻¹ | h | 45.9 |
| REFC_EATM | Max reflectivity in model column | dBZ | 40.8 |
| TSOIL_SFC | Soil temperature at the surface | K | 38.3 |
| CAPE_SFC | CAPE of surface parcel | J kg⁻¹ | 34.8 |
| HPBL_SFC | Height of planetary boundary layer | m | 28.3 |
| RH_HTGL | 2-m relative humidity | % | 28.0 |
| DZDT1Hr_SIGL | Vertical velocity in 900–700-hPa layer | m s⁻¹ | 26.0 |
| APCP1Hr_SFC | Accumulated model precipitation | mm | 24.6 |
| DSWRF_SFC | Downward shortwave radiation flux at surface | W m⁻² | 23.2 |
| ULWRF_NTAT | Upward longwave radiation flux at top of atmosphere | W m⁻² | 22.0 |
| SHTFL_SFC | Sensible heat flux at surface | W m⁻² | 20.9 |
| LHTFL_SFC | Latent heat flux at surface | W m⁻² | 20.0 |
| DPT_HTGL | 2-m dewpoint | K | 18.4 |
| 34LFTX_SPDL (a) | Best (four layer) lifted index | K | 15.9 |
| SPFH_HTGL (a) | Specific humidity | g kg⁻¹ | 15.2 |
| CIN_SFC | CIN of surface parcel | J kg⁻¹ | 9.3 |

(a) These two predictors were removed after the mean selection count analysis for reasons described in the text.

As noted by Hall et al. (2011), "one of the most effective ways to select features that are predictive of some phenomena is manually based on subject matter expertise." Thus, each variable available in the HRRR 2D data stream that we thought might be of value in the prediction of MCS-I was evaluated. Hall et al. (2011) also note that, while the RF was designed to effectively utilize large numbers of predictors, it can be susceptible to noise from extraneous or redundant features. To reduce the number of predictors used in the training sets, we implemented a method that systematically determines which predictor fields should be retained, allowing some of the correlated predictor fields to be eliminated. As will be discussed below, choosing which predictors to retain depends on the entire set of predictor fields under evaluation. This is particularly true for RFs since, by utilizing decision trees that split on multiple predictors in succession, an RF captures and exploits relationships between the predictors.

A predictor selection trial was performed using a series of two forward selection steps and one backward elimination step. At each forward selection step, all unselected predictors were tested individually as candidates for retention by joining them to the predictors already selected and evaluating the resulting RF's predictive skill on an independent testing set. The predictor whose inclusion made the RF most skillful was retained for the next step. After two forward selection steps, all the retained variables were tested to see which one's removal caused the smallest drop in the RF's skill (backward elimination). The predictor associated with the smallest drop in skill was then removed from the retained variable group and added back into the group of unselected predictor fields. This process was repeated until all 18 variables were retained. Each predictor selection trial was repeated 10 times with different training sets, with training on odd days and testing on even days, and then vice versa. Figure 4 summarizes the results obtained using 10 trials. After step 1, model reflectivity (REFC_EATM) was retained for 9 out of 10 trials and model precipitable water (PWAT_EATM) was retained once. After step 2, the most frequently retained variables were model reflectivity (9 out of 10 trials) and model precipitable water (9 out of 10 trials), but model surface pressure (PRES_SFC, which is indicative of terrain height) and model lifted index (34LFTX_SPDL) were also retained once. The average number of steps per trial for which a predictor was retained is given in Table 1. The results suggest that the presence of a deep column of water vapor is important for MCS-I, given that model precipitable water was the most frequently retained predictor (50.4 steps per trial). Fixed parameters such as solar time and surface pressure, which is a proxy for terrain height, were also retained quite often, indicating the importance of temporal and geographic regimes. On the other hand, model convective inhibition (CIN_SFC) was retained least often (9.3 steps per trial). It is unclear why CIN_SFC seems to have lower importance in the training set; however, it is retained owing to previous reports of its importance in the prediction of MCS-I (e.g., Jirak and Cotton 2007).

FIG. 4. Cumulative results of 10 predictor selection trials for 2-h forecasts obtained using RFs of varying sizes. Predictors (see Table 1) are listed along the y axis and the selection steps increase to the right. Every three steps, two predictors are added to the forest and one is removed, so the size of the predictor suite increases by one. The colors indicate the number of times (summed over 10 trials) a predictor was selected in the predictor suite after that step. By the 52nd step, all 18 predictors were used.
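The selection cycle described above (two forward steps, then one backward step, repeated until every predictor is retained) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and `toy_skill` with its invented weights merely stands in for the expensive step of training an RF and scoring it on the independent testing set.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

SkillFn = Callable[[FrozenSet[str]], float]
Step = Tuple[str, str, FrozenSet[str]]  # (action, predictor, retained set)


def select_predictors(candidates: Sequence[str], skill: SkillFn) -> List[Step]:
    """Cycle through two forward-selection steps and one backward-elimination
    step until every candidate is retained; return one record per step."""
    retained: FrozenSet[str] = frozenset()
    history: List[Step] = []
    position = 0  # 0 and 1 are forward steps, 2 is the backward step
    while len(retained) < len(candidates):
        if position < 2:
            # Forward step: keep the unselected predictor that helps skill most.
            unselected = [c for c in candidates if c not in retained]
            best = max(unselected, key=lambda c: skill(retained | {c}))
            retained = retained | {best}
            history.append(("add", best, retained))
        else:
            # Backward step: drop the predictor whose removal hurts skill least.
            weakest = max(sorted(retained), key=lambda c: skill(retained - {c}))
            retained = retained - {weakest}
            history.append(("remove", weakest, retained))
        position = (position + 1) % 3
    return history


# Toy stand-in for "train an RF and score it on the testing set".
# These weights are invented for illustration only.
TOY_WEIGHTS = {"PWAT_EATM": 0.4, "REFC_EATM": 0.3, "PRES_SFC": 0.2, "CIN_SFC": 0.1}


def toy_skill(predictors: FrozenSet[str]) -> float:
    return sum(TOY_WEIGHTS[p] for p in predictors)


if __name__ == "__main__":
    steps = select_predictors(list(TOY_WEIGHTS), toy_skill)
    for number, (action, name, kept) in enumerate(steps, start=1):
        print(number, action, name, len(kept))
```

Because the toy skill is purely additive, a predictor removed in a backward step is simply re-added in the next forward step. With a real RF the skill of a predictor depends on which others are present, which is what makes the backward step informative.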

3) SCORING AND EVALUATION

The primary objective measure used to assess the performance of the probabilistic RF 2-h MCS-I forecasts was the area under the receiver operating characteristic (ROC) curve (AUC; Marzban 2004). ROC curves (Fig. 5) are obtained by finding the relationship between the hit rate [Hits/(Hits + Misses)] and false alarm rate [False_Positive/(False_Positive + Correct_Null)] for a range of RF vote thresholds. AUC has a long history in evaluating machine learning algorithms. The ROC curve maps hit rate as a function of false alarm (FA) rate across a range of thresholds available within the prediction (e.g., RF vote counts or likelihood values). An AUC value of one is indicative of a perfect forecast, while an AUC value of 0.5 is indicative of a purely random forecast. We also used the Gilbert skill score, commonly known as the equitable threat score (ETS), as a second metric to evaluate the RF forecasts. In this case, we took the maximum ETS value over all RF vote thresholds. Both AUC and maximum ETS can be used to compare RF performance to that of other forecasts, even if they have different units or are calibrated differently (Wilks 2006). Finally, we used the symmetric extreme dependency score (SEDS) to evaluate and intercompare the performance of the RF and other short-term forecasts in the real-time prediction of MCS-I observed during a 5-week period in 2013. The SEDS, which is described by Hogan et al. (2009), is an equitable skill score designed to more effectively evaluate the performance of forecasts of infrequently occurring events such as MCS-I.
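The scores named above can all be computed from a 2 x 2 contingency table built at each vote threshold. The sketch below uses the hit rate and false alarm rate definitions given in the text, the standard Gilbert skill score (ETS) formula, and the SEDS formula as commonly stated from Hogan et al. (2009). The function names, the trapezoidal integration of the ROC curve, and the toy data are assumptions made for illustration; the paper does not describe its scoring code.

```python
import math
from typing import List, Sequence, Tuple

Table = Tuple[int, int, int, int]  # hits, false positives, misses, correct nulls


def contingency(votes: Sequence[int], observed: Sequence[bool], threshold: int) -> Table:
    """Count outcomes when 'yes' is forecast wherever votes >= threshold."""
    a = b = c = d = 0
    for vote, event in zip(votes, observed):
        forecast_yes = vote >= threshold
        if forecast_yes and event:
            a += 1
        elif forecast_yes:
            b += 1
        elif event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def hit_rate(a: int, b: int, c: int, d: int) -> float:
    return a / (a + c) if a + c else 0.0


def false_alarm_rate(a: int, b: int, c: int, d: int) -> float:
    return b / (b + d) if b + d else 0.0


def ets(a: int, b: int, c: int, d: int) -> float:
    """Gilbert skill score: threat score corrected for hits expected by chance."""
    a_random = (a + b) * (a + c) / (a + b + c + d)
    denominator = a + b + c - a_random
    return (a - a_random) / denominator if denominator else 0.0


def seds(a: int, b: int, c: int, d: int) -> float:
    """Symmetric extreme dependency score; undefined (nan) without hits."""
    n = a + b + c + d
    if a == 0 or a == n:
        return float("nan")
    return (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0


def roc_auc(votes: Sequence[int], observed: Sequence[bool], n_trees: int):
    """Sweep every vote threshold and integrate the ROC curve by trapezoids."""
    points: List[Tuple[float, float]] = []
    for threshold in range(n_trees + 1, -1, -1):  # from "never yes" to "always yes"
        table = contingency(votes, observed, threshold)
        points.append((false_alarm_rate(*table), hit_rate(*table)))
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, area


def max_ets(votes: Sequence[int], observed: Sequence[bool], n_trees: int) -> float:
    return max(ets(*contingency(votes, observed, t)) for t in range(1, n_trees + 1))
```

For example, with a hypothetical 10-tree forest, `roc_auc([9, 8, 3, 1], [True, True, False, False], 10)` separates events from nonevents perfectly and returns an area of 1.0.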

d. Predictor field optimization

In the first optimization step, redundant predictors were removed in order to reduce the amount of unnecessary information going into the training set. Highly correlated or redundant predictors, like CAPE and lifted index or 2-m dewpoint and 2-m specific humidity, were compared. It was found that the better variable to use depended on the number of variables in the training set. Lifted index was selected more often than CAPE when the suite was limited to five or fewer predictors (before step 15 in Fig. 4), but for more than five predictors, CAPE was selected more often, meaning that it was more valuable in combination with the other predictors in the larger set. Likewise, specific humidity was preferred when the number of predictor variables was small, but dewpoint worked better when a greater number of predictors were used. Since our final predictor suite has a larger number of predictors, CAPE and dewpoint were retained.

In the second set of predictor optimization experiments, the impact of predictor field smoothing was explored. Each of the remaining HRRR forecast fields was smoothed with circular filters with radii ranging from 10 to 80 km. It was found that using a 40-km circular smoothing filter resulted in the best skill scores. Figure 5 shows ROC curves obtained for RF predictions that were based on HRRR data only at raw resolution versus those obtained using a 40-km circular smoothing filter. There is no overlap between the 10 curves obtained using raw resolution and those obtained using a 40-km filter for hit rates between 0.3 and 0.9, indicating that the improved probability of detection associated with the smoothing is significant. The average AUC increased from 0.84 to 0.86 and the maximum ETS increased from 0.33 to 0.37 (Table 2). Both increases were large relative to the standard deviation across the 10 training sets, further indicating the significance of this result.
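A circular smoothing filter of the kind described above can be sketched in a few lines. The text does not specify the filter weights, the grid spacing of the smoothed fields, or the treatment of domain edges, so this sketch assumes a uniform (unweighted) mean over a disk, a 3-km grid spacing, and averaging over only the in-bounds cells near the edges. The function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def disk_offsets(radius_km: float, spacing_km: float) -> List[Tuple[int, int]]:
    """Grid offsets (rows, columns) whose centers lie within the radius."""
    radius_cells = radius_km / spacing_km
    reach = int(radius_cells)
    return [(di, dj)
            for di in range(-reach, reach + 1)
            for dj in range(-reach, reach + 1)
            if di * di + dj * dj <= radius_cells * radius_cells]


def circular_smooth(field: Grid, radius_km: float, spacing_km: float = 3.0) -> Grid:
    """Replace each value by the mean of the in-bounds cells inside the disk."""
    offsets = disk_offsets(radius_km, spacing_km)
    n_rows, n_cols = len(field), len(field[0])
    smoothed = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            count = 0
            for di, dj in offsets:
                ii, jj = i + di, j + dj
                if 0 <= ii < n_rows and 0 <= jj < n_cols:
                    total += field[ii][jj]
                    count += 1
            smoothed[i][j] = total / count  # count >= 1: offset (0, 0) is always valid
    return smoothed


if __name__ == "__main__":
    spike = [[0.0] * 5 for _ in range(5)]
    spike[2][2] = 5.0
    for row in circular_smooth(spike, radius_km=3.0, spacing_km=3.0):
        print(row)
```

The nested loops keep the sketch dependency free; for full model grids a convolution from an array library would be the practical choice.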

The final predictor optimization step was designed to assess the impact of observation-based variables on the skill of the RF forecasts. The value of adding radar reflectivity and 13.3–10-μm SBTD was assessed both individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

FIG. 5. False alarm rate vs hit rate (ROC curve) using unsmoothed HRRR (blue) and smoothed HRRR (red). The two sets (unsmoothed and smoothed HRRR) of 10 RFs were trained on even days and tested on odd days using summer (JJA) 2011 data. The predictor fields used are listed in Table 1. Note that lifted index (34LFTX_SPDL) and specific humidity (SPFH_HTGL) were omitted from the training set based on analyses described in the text.

3. Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with a greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors and the best maximum ETS was with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case, the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.
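As a small worked check of the default quoted above, the integer part of the square root of 19 is 4. The sketch below also shows how a random subset of candidate predictors could be drawn at a node, which is the general random forest idea; the function names and the placeholder predictor names are hypothetical, and nothing here reproduces the interface of the Topic et al. (2014) software.

```python
import math
import random
from typing import List, Sequence


def default_candidates_per_split(n_predictors: int) -> int:
    """Integer part of the square root of the number of predictors."""
    return math.isqrt(n_predictors)


def draw_split_candidates(predictors: Sequence[str], m: int,
                          rng: random.Random) -> List[str]:
    """Pick m distinct predictors to compete for the split at one node."""
    return rng.sample(list(predictors), m)


if __name__ == "__main__":
    names = [f"predictor_{i}" for i in range(19)]  # placeholder names
    print(default_candidates_per_split(len(names)))
    # Two candidates per node, the value used for the rest of the experiments.
    print(draw_split_candidates(names, 2, random.Random(0)))
```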

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best event percentage was between 20% and 40%. The best AUC was achieved with 40% events and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event percentage of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.
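One simple way to reach a target event percentage is to keep every event and subsample the nonevents, as sketched below. The text does not say how the rebalancing was actually carried out, so this approach, the function name, and the toy counts are assumptions for illustration only.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rebalance(events: Sequence[T], nonevents: Sequence[T],
              event_fraction: float, seed: int = 0) -> List[T]:
    """Keep all events and draw just enough nonevents (without replacement)
    that events make up event_fraction of the returned training set."""
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must lie strictly between 0 and 1")
    n_nonevents = round(len(events) * (1.0 - event_fraction) / event_fraction)
    if n_nonevents > len(nonevents):
        raise ValueError("not enough nonevents for the requested fraction")
    rng = random.Random(seed)
    training = list(events) + rng.sample(list(nonevents), n_nonevents)
    rng.shuffle(training)
    return training


if __name__ == "__main__":
    # Toy sample with a 0.3% event frequency: 30 events among 10 000 points.
    events = [("event", i) for i in range(30)]
    nonevents = [("nonevent", i) for i in range(9970)]
    training = rebalance(events, nonevents, event_fraction=0.30)
    n_events = sum(1 for label, _ in training if label == "event")
    print(len(training), n_events)
```

In this toy case, raising the event share from 0.3% to 30% shrinks the training set from 10 000 points to 100, which shows how much nonevent data is set aside by rebalancing.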

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, p = V/200, where p is the likelihood of an MCS-I event and V is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.

TABLE 2. Skill scores for predictor optimization experiments.

Predictors                                AUC    Std dev of AUC   Max ETS   Std dev of ETS
Raw HRRR (17) + local solar time (LST)    0.84   0.004            0.33      0.008
Smoothed HRRR + LST                       0.86   0.003            0.37      0.011
Smoothed HRRR + LST + SBTD                0.87   0.003            0.38      0.010
Smoothed HRRR + LST + RadarREFC           0.88   0.003            0.40      0.007
Smoothed HRRR + LST + RadarREFC + SBTD    0.88   0.004            0.40      0.010

¹ Note that the predictions are not actually probabilities since no attempt has been made to calibrate the predicted likelihood values.


Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.


The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88 from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.
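The displacement-tolerant verification described above can be sketched as a neighborhood test: an observed event counts as detected if any forecast point at or above the likelihood threshold lies within 50 km. The evaluation in the text was performed manually and also matched forecasts in time, so the functions, the flat x/y coordinates in kilometers, and the toy points below are illustrative assumptions only.

```python
import math
from typing import Iterable, Sequence, Tuple

ForecastPoint = Tuple[float, float, float]  # x_km, y_km, RF likelihood
Location = Tuple[float, float]              # x_km, y_km


def detected(event: Location, forecast: Iterable[ForecastPoint],
             threshold: float = 0.1, radius_km: float = 50.0) -> bool:
    """True if any forecast point within radius_km meets the threshold."""
    ex, ey = event
    return any(p >= threshold and math.hypot(x - ex, y - ey) <= radius_km
               for x, y, p in forecast)


def neighborhood_hit_rate(events: Sequence[Location],
                          forecast: Sequence[ForecastPoint],
                          threshold: float = 0.1,
                          radius_km: float = 50.0) -> float:
    hits = sum(detected(e, forecast, threshold, radius_km) for e in events)
    return hits / len(events)


if __name__ == "__main__":
    forecast = [(0.0, 0.0, 0.4), (100.0, 0.0, 0.05)]
    events = [(30.0, 40.0), (100.0, 10.0), (200.0, 200.0)]
    print(neighborhood_hit_rate(events, forecast))
    # The statistic quoted in the text: 549 of 550 events detected.
    print(round(100 * 549 / 550, 1))
```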

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

                                                     Extrapolated         Observed climatological
                           RF     HRRR reflectivity  reflectivity         MCS-I frequency
                           eUS   SE    GP    eUS   SE    GP    eUS   SE    GP    eUS   SE    GP
Hit rate for FA rate = 0.1 0.31  0.27  0.28  0.24  0.21  0.23  0.20  0.14  0.19  0.22  0.21  0.12
AUC                        0.76  0.73  0.75  0.59  0.57  0.60  0.55  0.52  0.53  0.61  0.65  0.52
Max SEDS                   0.19  0.17  0.18  0.14  0.13  0.14  0.11  0.06  0.11  0.15  0.13  0.03

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FAs (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can be used as guidance for the short-term prediction of discrete high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class 1
  Description: RF probabilities are always elevated in areas of ongoing MCSs.
  Comment: Use RF along with current radar observations to disregard these areas.
  Region of most common occurrence: Entire domain.

Class 2
  Description: RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event.
  Comment: RF provides an early indication of regions where MCS-I is possible in the next 2–4 h.
  Region of most common occurrence: Southeastern United States.

Class 3
  Description: Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria.
  Comment: Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments.
  Region of most common occurrence: Southeastern United States.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF forecast likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product in areas with existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts by providing advance notice for the potential of an MCS-I event.

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011, while the ROC curves shown in Fig. 9 were obtained using RFs trained using 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
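The sampling idea in the preceding paragraph is stated only qualitatively, so the following is a purely hypothetical sketch of one way it could be expressed. Each past instance gets a weight that decays with its day-of-year distance from the current date and is boosted if it belongs to the current year. The exponential form, the 30-day scale, the boost factor of 2, and sampling with replacement are all invented for illustration and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Instance = Tuple[int, int]  # (day_of_year, year) of a training instance


def circular_day_gap(day_a: int, day_b: int, days_in_year: int = 365) -> int:
    """Shortest distance in days between two days of the year."""
    gap = abs(day_a - day_b) % days_in_year
    return min(gap, days_in_year - gap)


def training_weight(instance: Instance, today: Instance,
                    day_scale: float = 30.0, season_boost: float = 2.0) -> float:
    """Larger for instances near today's day of year and in the current year."""
    day, year = instance
    today_day, today_year = today
    weight = math.exp(-circular_day_gap(day, today_day) / day_scale)
    if year == today_year:
        weight *= season_boost
    return weight


def sample_training_set(instances: Sequence[Instance], today: Instance,
                        k: int, seed: int = 0) -> List[Instance]:
    """Draw k instances with replacement, in proportion to their weights."""
    weights = [training_weight(instance, today) for instance in instances]
    return random.Random(seed).choices(list(instances), weights=weights, k=k)


if __name__ == "__main__":
    today = (180, 2013)
    archive = [(180, 2013), (150, 2013), (180, 2011), (240, 2011)]
    for instance in archive:
        print(instance, round(training_weight(instance, today), 3))
```

The weights only express a preference; choosing the decay scale would require the same kind of sensitivity testing described for the other RF parameters.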

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
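Relating forecast categories to observed frequencies can be sketched as a simple binned lookup: group past forecasts into likelihood bins, record how often the event was observed in each bin, and replace a new likelihood by the observed frequency of its bin. This is a generic reliability-style calibration written under stated assumptions (ten equal-width bins, hypothetical function names, invented toy data); it is not the approach of Williams (2014).

```python
from typing import List, Optional, Sequence


def bin_index(likelihood: float, n_bins: int) -> int:
    """Equal-width bin on [0, 1]; a likelihood of exactly 1 goes in the last bin."""
    return min(int(likelihood * n_bins), n_bins - 1)


def reliability_table(likelihoods: Sequence[float], observed: Sequence[bool],
                      n_bins: int = 10) -> List[Optional[float]]:
    """Observed event frequency per forecast bin (None where a bin is empty)."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for likelihood, event in zip(likelihoods, observed):
        k = bin_index(likelihood, n_bins)
        counts[k] += 1
        events[k] += 1 if event else 0
    return [e / c if c else None for e, c in zip(events, counts)]


def calibrate(likelihood: float, table: Sequence[Optional[float]]) -> Optional[float]:
    """Map a raw likelihood to the observed frequency of its bin."""
    return table[bin_index(likelihood, len(table))]


if __name__ == "__main__":
    # Invented toy history: high likelihoods verify only one time in four.
    raw = [0.05, 0.05, 0.85, 0.85, 0.85, 0.85]
    seen = [False, False, True, False, False, False]
    table = reliability_table(raw, seen)
    print(calibrate(0.85, table))
```

In the toy history a raw likelihood of 0.85 maps to a calibrated value of 0.25, which illustrates the reduction in dynamic range that the text anticipates for a forecast with a high bias.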

The findings presented herein, along with the positive results of Mecikalski et al. (2015) on the prediction of small-scale storm initiation and Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 2.4. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 13.6. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A “weather-ready” view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 6.3. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 3.6. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 1.2. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J1.4. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


Very short-term predictions of the initiation of an MCS (MCS-I) require a high-resolution depiction of the evolving stability, shear profile, and potential forcing mechanisms such as surface boundaries or elevated propagating waves (e.g., Jirak and Cotton 2007; Houze 2004). High-resolution models with advanced data assimilation can provide a three-dimensional estimate of the evolving environment, but imperfections in the model and poorly constrained errors in temperature and moisture mean that NWP predictions of MCS-I are still prone to a great deal of uncertainty (Pinto et al. 2015). Statistical techniques (e.g., linear regression, k-nearest neighbor, analogs, neural networks, random forest, and genetic algorithms) can operate on data much more quickly than a human analyst, enabling the rapid digestion of frequently updating datasets (e.g., surface mesonets, radar, satellite) along with NWP models as often as new data arrive. In this study, we evaluate the utility and predictive skill of a random forest (RF) at predicting MCS-I. The RF technique is still relatively new to most meteorologists, yet it has shown promise in several other complex weather prediction applications, as described below.

Statistical models have long been a part of weather forecasting. For example, model output statistics (MOS) based on multiple linear regressions are routinely used to compensate for systematic model biases and to generate reliable probabilistic forecasts of precipitation, cloud cover, and other variables (Glahn and Lowry 1972). Analog statistical techniques identify similar past weather patterns and give probabilistic projections based on the observed evolution of those past patterns (Hamill and Whitaker 2006; Delle Monache et al. 2013). The tropical weather community uses statistical models to predict the probability of tropical cyclogenesis, rapid intensification, and eyewall replacement cycles (Rozoff and Kossin 2011; DeMaria and Kaplan 1994). Marzban et al. (2007) used neural networks to predict cloud ceiling and visibility, and Coniglio et al. (2007) used logistic regression to predict MCS maintenance based on vertical wind and stability profiles. More recently, Roebber (2015) used evolutionary programming techniques to generate probabilistic forecasts of minimum surface temperatures.

In past studies, the skill of the RF has been shown to vary with implementation and application. Prior to its use in meteorology, the RF statistical technique was used successfully in biomedical research to select and classify genes relevant to diseases (e.g., Díaz-Uriarte and de Andrés 2006). More recently, the RF approach was used to diagnose regions of atmospheric turbulence due to convection from radar and satellite observations and NWP model data (Williams et al. 2007, 2008c; McGovern et al. 2011; Williams 2014). Williams et al. (2008a,b) showed how RFs could be used to predict areas where convective storms were likely. Gagne et al. (2009) compared the RF technique to a host of other machine learning algorithms and found it to be better than all other algorithms at classifying radar-based storm type. Another comparative study, described by Lakshmanan et al. (2010), found that RF had a slight edge over competing artificial intelligence learning techniques in classifying storm type. Hall et al. (2011) found that the RF was one of the best algorithms in terms of overall skill metrics for short-term clear-sky forecasts, although its underconfidence (Wilks 2006, p. 288) made it statistically less reliable than other statistical data mining techniques. Recently, Gagne et al. (2014) used RF to add skill to an ensemble of storm-scale precipitation forecasts, while Mecikalski et al. (2015) found RF performed slightly worse than logistic regression in forecasting small-scale convection initiation with NWP and geostationary satellite data.

In this paper, we demonstrate how RFs can be trained to address the very challenging forecast problem of large-scale convective storm initiation (MCS-I), following an approach similar to that used by Williams (2014) for predicting atmospheric turbulence. Section 2 introduces the input predictor fields and our quantitative definition of MCS-I. Section 2 also describes the predictor selection process and documents the improved skill resulting from expanding the predictor list from a small set of NWP model fields to a combination of smoothed NWP output and observations. The sensitivity of prediction skill to various RF parameters is explored in section 3. Section 4 shows case studies to demonstrate the value that the RF technique offers when compared to the individual constituent data sources. Finally, in section 5, the results are summarized and presented along with a discussion of the strengths and weaknesses of the technique.

2. Methodology

The RF data mining technique requires the definition of a forecast variable of interest, or predictand, and a set of predictor fields that are thought to be related to the predictand. For this study, the predictand is the binary variable representing whether or not MCS-I occurred at a given time and location, and the predictors are derived from radar reflectivity, satellite data, and NWP model output. The RF was trained using data collected during June–August (JJA) of 2011 and evaluated on data from the summer of 2013 to provide a stringent test of its ability to capture MCS-I even when the NWP model changes. The datasets, predictand, and RF methodology are described in detail below.

a. Datasets

The model diagnostics used to train the RF came from the HRRR (Benjamin et al. 2014). The HRRR is a convection-permitting model run over the entire continental United States (CONUS) with hourly cycling and 3-km grid spacing. The 2011 version of the HRRR gained information on the location of existing storms indirectly via the three-dimensional variational data assimilation (3DVAR) of radar reflectivity into the 13-km Rapid Refresh (RAP) model, which was used to initialize the HRRR forecasts. In 2013, the HRRR was updated to include direct assimilation of radar reflectivity into its 3-km grid (Benjamin et al. 2014). This change notably improved the performance of the HRRR, particularly its ability to capture existing MCSs (Pinto et al. 2015). As a result, training the RF on 2011 HRRR data and testing it on 2013 HRRR data demonstrates whether or not the RF is robust for use with different NWP model analysis systems.

Extrapolated radar observations started with composite reflectivity provided by the National Mosaic and Multi-Sensor Quantitative Precipitation Estimation (NMQ) system from the National Severe Storms Laboratory (Zhang et al. 2011). This product merges multiple radar volumes into a 3D grid with 1-km spacing in the horizontal and 0.5-km spacing in the vertical and then derives 2D fields such as composite reflectivity.

Satellite observations came from the Geostationary Operational Environmental Satellite system (GOES) operated by the National Environmental Satellite, Data, and Information Service (NESDIS). Brightness temperature in the longwave IR channel (10.7 μm) was subtracted from that in the CO2 channel (13.3 μm) to yield a satellite brightness temperature difference (SBTD) field. The SBTD has been shown to distinguish between growing cumulonimbi and low cumulus or thin cirrus (Mecikalski and Bedka 2006), so it is useful for delineating areas of growing cumuli that may consolidate into an MCS. Thin cirrus and shallow cumulus have brightness temperature differences of less than −25°C, while developing storms exhibit values between −25° and −5°C, and difference values near zero indicate mature cumulonimbi. Mecikalski and Bedka (2006) and Mecikalski et al. (2008) included SBTD as one of the components of their satellite-based convection initiation algorithm.
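The SBTD thresholds quoted above can be written as a small lookup. This is a sketch for illustration, not the algorithm of Mecikalski and Bedka (2006): the function names are hypothetical, and treating every difference above −5°C as "near zero" is an assumption made here.

```python
def sbtd(tb_13_3, tb_10_7):
    """Satellite brightness temperature difference: 13.3-um minus 10.7-um."""
    return tb_13_3 - tb_10_7

def classify_sbtd(difference):
    """Coarse cloud class from the SBTD (degrees C, equivalently K).

    The -25 and -5 thresholds come from the text; assigning all values
    above -5 to the mature class is an assumption of this sketch.
    """
    if difference < -25.0:
        return "thin cirrus or shallow cumulus"
    if difference <= -5.0:
        return "developing storm"
    return "mature cumulonimbus"
```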

The radar reflectivity and SBTD were interpolated onto the HRRR 3-km grid using bilinear interpolation. These fields were then advected to their expected downstream locations at later times based on the motions detected in the vertically integrated liquid water (VIL) field from the Corridor Integrated Weather System (CIWS; Evans and Ducot 2006; Dupree et al. 2009).

b. Definition of MCS initiation

National composites of VIL that are available from CIWS were used to identify MCSs following the method described in Pinto et al. (2015). As in Pinto et al. (2015), in this study we define MCSs as consisting of an area of VIL exceeding 3.5 kg m⁻² with a horizontal extent of at least 100 km (allowing gaps of up to 10 km). These conditions must be met for at least two consecutive tops of the hour. While not essential to the conclusions of the paper, the criteria that have been adopted to classify MCSs are similar, but not identical, to those used in many prior studies (e.g., Geerts 1998; Houze 2004; Coniglio et al. 2010). The lifetime threshold was set relatively low to ensure an adequate sample size for developing the training dataset using data obtained for a limited time period. Larger storms of longer duration occur much less frequently (e.g., Davis et al. 2006) and therefore would require a longer period from which to draw an adequate number of representative cases.
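The size criterion can be illustrated with a simplified grid search. The sketch below is not the identification method of Pinto et al. (2015): it uses a 4-connected flood fill, measures extent by the longer side of the bounding box, and omits both the 10-km gap allowance and the two-hour persistence test. The function name and grid layout are assumptions.

```python
def find_mcs_candidates(vil, grid_km, vil_min=3.5, extent_km=100.0):
    """Return connected regions of VIL > vil_min spanning at least extent_km.

    vil: 2D list of VIL values (kg m^-2); grid_km: grid spacing in km.
    Each region is a list of (row, col) cells.
    """
    rows, cols = len(vil), len(vil[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or vil[r0][c0] <= vil_min:
                continue
            # Flood fill over 4-connected neighbors.
            stack, cells = [(r0, c0)], []
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                cells.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and vil[nr][nc] > vil_min):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            # Extent is approximated by the longer bounding-box side.
            height = max(r for r, _ in cells) - min(r for r, _ in cells) + 1
            width = max(c for _, c in cells) - min(c for _, c in cells) + 1
            if max(height, width) * grid_km >= extent_km:
                regions.append(cells)
    return regions
```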

Once the MCS definition is satisfied, the region spanned by the core area of high VIL is dilated by 125 km, as shown in Fig. 1, to define the MCS region. VIL is used to detect MCSs instead of radar reflectivity because it is relatively insensitive to brightband contamination and anomalous propagation artifacts (e.g., Smalley and Bennett 2002). VIL also includes the integrated effect of hydrometeors at all vertical levels, making its intensity more closely related to convective vigor than a single level of radar reflectivity.

After MCSs are identified for each time, they are checked to see if they qualify as an initiation event (MCS-I). To qualify, an MCS must be at least 125 km removed from any previously existing MCS that was present during the previous 2 h, and it must persist for at least 1 h. MCS-I is evaluated only at the top of each hour, when HRRR model forecasts are valid, and an MCS-I event occurs only in the first hour that a temporally and spatially isolated MCS is identified. A detailed description of the MCS-I identification algorithm is given in Pinto et al. (2015). The data points around the MCS-I are used in the RF training set as positive events, while all others are nonevents. The expansion of the MCS region accounts for potential offsets or timing errors between the observed MCS-I and the environmental conditions as represented by the model. This increases the number of training data points that go into the RF and allows for some positional error in a forecast.
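The two qualification tests can be expressed compactly. The following sketch assumes each MCS is represented by a list of (x, y) points in kilometers on a flat map and that the persistence check has been done elsewhere; the representation and function names are hypothetical, and the full algorithm is the one described in Pinto et al. (2015).

```python
import math

def min_separation_km(region_a, region_b):
    """Smallest distance between any two points of two regions (x, y in km)."""
    return min(math.hypot(xa - xb, ya - yb)
               for xa, ya in region_a for xb, yb in region_b)

def is_initiation(new_mcs, prior_mcss, persists_1h, isolation_km=125.0):
    """True if an MCS is isolated from every MCS of the previous 2 h and persists.

    prior_mcss: regions of all MCSs present during the previous 2 h.
    persists_1h: whether the new MCS lasts at least 1 h (determined elsewhere).
    """
    isolated = all(min_separation_km(new_mcs, prior) >= isolation_km
                   for prior in prior_mcss)
    return isolated and persists_1h
```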

c. Random forest algorithm

FIG. 1. VIL and associated MCS detections at 1900 UTC 5 Aug 2013. The large black rectangle denotes the geographical extent of data points used in the RF training suite. The dark gray regions have no radar coverage and are not used. The black outline encircles an ongoing MCS that initiated previously, while the magenta contours encircle newly initiated MCSs (MCS-I). The outlines are obtained by extending outward each vertex of the MCS by 125 km.

FIG. 2. Given all the predictor values at a point (i.e., a specific location and time), each tree in the RF votes on the outcome. Together, the trees act as an ensemble of experts. Each tree is different because 1) each one is trained with a bootstrapped sample of the full training suite and 2) at each node, symbolized by the blue circles, only a subset of the original predictors is randomly chosen as candidates for splitting.

A decision tree is a common tool in machine learning (Breiman et al. 1984; De'ath and Fabricius 2000; Dattatreya 2009), and an RF is an ensemble of weakly correlated decision trees (Breiman 2001). Collectively, the trees function as an ensemble of “experts,” each casting a vote for whether MCS-I will occur (e.g., Fig. 2). All of the nodes of a decision tree can be reduced to simple rules of the form: if predictor P is x or less (where x is any number), then follow branch A; otherwise, follow branch B. A predictor may be used at multiple nodes in the same tree. Each branch will either lead to another node or terminate with a “yes” or “no” vote. When a decision tree is being trained, the algorithm finds a predictor and a threshold that “splits” the training data instances that reach a node into two subsets in a way that maximizes the homogeneity of the subsets with respect to the predictand, for example, by minimizing the “Gini impurity” (Breiman 1996).
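To make the split criterion concrete, the sketch below searches one predictor for the threshold that minimizes the size-weighted Gini impurity of the two subsets. It is a minimal illustration for a binary predictand with hypothetical function names, not the RF software used in this study.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Threshold x of the rule 'value <= x' with the lowest weighted impurity.

    Returns (threshold, impurity); threshold is None if no split is possible.
    """
    best = (None, float("inf"))
    for x in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [l for v, l in zip(values, labels) if v <= x]
        right = [l for v, l in zip(values, labels) if v > x]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (x, score)
    return best
```

A full tree builder would repeat this search over the candidate predictors at each node and recurse on the two subsets.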

RF trees differ from conventional decision trees in that each RF tree is trained on a bootstrapped sample of the training cases (illustrated by Fig. 3). Additionally, at each node of the tree, only a limited, randomly selected subset of predictors is chosen as candidates for splitting, whereas standard decision trees consider all predictors as candidates. (The predictor candidates are selected randomly with replacement, so that any predictor may be a candidate at any node.) An implication of bootstrapping is that roughly one-third of the training cases are not used for any given tree, and these “out of bag” cases are used as test cases to quantify the importance of each predictor field. Bootstrapping the training cases and ignoring some predictors at each node make individual trees weaker, but these steps also ensure the trees are not strongly correlated with each other. Thus, the forest is less susceptible to overfitting the peculiarities of the training set and can provide probabilistic information. The number of trees and number of predictors chosen as candidates for splitting at each node are tunable parameters to which predictive performance sensitivity is tested below.
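The "roughly one-third" figure follows from the chance that a given case is missed in n draws with replacement, (1 − 1/n)^n, which approaches 1/e ≈ 0.368 for large n. A short simulation, with an arbitrary case count and seed chosen here for illustration, shows the same behavior:

```python
import random

def bootstrap_indices(n_cases, rng):
    """Draw n_cases indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    out_of_bag = sorted(set(range(n_cases)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(42)  # fixed seed so the example is repeatable
in_bag, out_of_bag = bootstrap_indices(10000, rng)
oob_fraction = len(out_of_bag) / 10000  # expected to be close to 1/e
```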

The RF has several advantages over other data mining techniques. For one, the empirical model created from the RF ensemble does not require the predictors to be monotonically related to the predictand, meaning that it can represent a variety of functional relationships. Alternative techniques like logistic regression are inherently linear. The decision trees in the RF are also human readable, such that relationships between data and how they were used to predict can be explored.

In addition to their predictive capabilities, RFs can rank the importance of individual predictors (Breiman 2001; Topic et al. 2014). A predictor's importance is quantified by scrambling its values in the out-of-bag training cases for each tree and seeing how much the classification accuracy of the RF goes down. For example, the expected importance of a random variable is zero. The importance value is often scaled by dividing it by a quantity akin to its standard error (e.g., see supplemental material for Díaz-Uriarte and de Andrés 2006). Importance scores provide a helpful starting point for comparing the potential contributions of different variables and selecting a small, but skillful, subset of predictors.
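The scrambling idea can be sketched for any classifier. The example below is a simplification: it shuffles one predictor column in a single held-out set and reports the drop in accuracy, whereas the RF procedure does this per tree on that tree's out-of-bag cases. The `predict` callable and the function names are assumptions of this sketch.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, column, seed=0):
    """Accuracy drop after shuffling one predictor column among the rows.

    rows: list of lists of predictor values; column: index to scramble.
    """
    rng = random.Random(seed)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    scrambled = [row[:column] + [value] + row[column + 1:]
                 for row, value in zip(rows, shuffled)]
    return accuracy(predict, rows, labels) - accuracy(predict, scrambled, labels)
```

A predictor that the classifier never consults yields an importance of exactly zero, since scrambling it leaves every prediction unchanged.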

1) TRAINING

The RF is trained to use predictors available at a given time to forecast the occurrence of MCS-I 2 h in the future. To create the RF training suite, predictor values and associated MCS-I “truth” values from 2 h later were interpolated onto a cylindrical grid covering 25°–48°N and 125°–67°W with horizontal spacing of about 4 km (0.036° latitude and 0.038° longitude). The geographical coverage of data points used in the training suite is shown in Fig. 1. Points over the Atlantic Ocean and parts of Canada and Mexico that are beyond the WSR-88D radar network coverage were not included. The analysis was done using data available at the top of each hour.

FIG. 3. Example of bootstrap sampling from a hypothetical set of 26 cases, a–z. Twenty-six cases are randomly selected with replacement to create a 26-element set T. Cases may be selected multiple times or not at all. This process is repeated for each tree. Those cases not selected are called out-of-bag cases and are used to assess predictor importance.

Over the 3-month period from June through August 2011, there were over 200 million potential data points. Even though there were many cases to choose from, most of them were null events (no MCS-I). Even in the most MCS-I-prone geographical regions in the United States, MCS-I events occur only 3% of the time (Pinto et al. 2015). The average MCS-I frequency for the entire domain is only 0.3%. This rarity makes MCS-I a difficult predictand for any statistical forecast algorithm to handle. One can achieve 99.7% accuracy by always predicting a null event at every grid point. To help the RF algorithm discriminate between MCS-I and non-MCS-I cases, the MCS-I cases are oversampled such that they make up 30% of the training set. This artificial increase in the proportion of events in the training set can be accounted for in the RF vote calibration phase.
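A sketch of such oversampling is given below. How the events were actually drawn is not specified here, so drawing events with replacement and null cases without replacement is an assumption, as are the function names.

```python
import random

def oversample(events, nulls, size, event_fraction=0.30, seed=0):
    """Build a training set of `size` cases in which events make up event_fraction.

    Assumption: events are drawn with replacement (they are scarce) and
    null cases without replacement (they are plentiful).
    """
    rng = random.Random(seed)
    n_events = round(size * event_fraction)
    chosen_events = rng.choices(events, k=n_events)
    chosen_nulls = rng.sample(nulls, size - n_events)
    training = chosen_events + chosen_nulls
    rng.shuffle(training)
    return training

def null_forecast_accuracy(base_rate):
    """Accuracy obtained by always forecasting 'no event'."""
    return 1.0 - base_rate
```

With a base rate of 0.003, `null_forecast_accuracy` returns 0.997, the trivial score that makes raw accuracy a poor measure of skill for rare events.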

The RF parameter sensitivity tests and predictor importance analyses were conducted using 10 disjoint training sets: 5 sets of 18 000 cases each were drawn randomly without replacement from odd days, and another 5 sets were drawn similarly from even days. The standard deviation of skill over these 10 training sets provides a means for assessing the relative significance of differences in mean skill score when the RF parameters are changed. While one standard deviation is not a particularly stringent requirement, there is only a 2.2% chance that the mean of 10 samples will differ by more than one standard deviation from the mean of another 10 samples drawn from the same population. Selecting the sets from even or odd days also permits testing a model trained on even days against independent data from odd days, and vice versa.
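The construction of disjoint sets can be sketched as follows. Cases are assumed to be (day, payload) tuples, and whether "odd" and "even" refer to the day of the month or the day of the year is not specified here, so the parity test on the first tuple element is a placeholder.

```python
import random

def split_by_day_parity(cases):
    """Separate (day, payload) cases into odd-day and even-day pools."""
    odd = [c for c in cases if c[0] % 2 == 1]
    even = [c for c in cases if c[0] % 2 == 0]
    return odd, even

def disjoint_sets(cases, n_sets, set_size, seed=0):
    """Split a random selection of cases into n_sets non-overlapping sets."""
    rng = random.Random(seed)
    picked = rng.sample(cases, n_sets * set_size)  # sampling without replacement
    return [picked[i * set_size:(i + 1) * set_size] for i in range(n_sets)]
```

Calling `disjoint_sets` once on each pool with `n_sets=5` yields the 10 training sets, and any set from the odd pool is independent of every set from the even pool.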

In general, one wants as many training cases as possible to fully sample the general population of weather scenarios. On the other hand, given finite resources, one must limit the number of cases. We found that 18 000 cases allowed for efficient training of the RF while fully sampling the parameter space. This number of cases is actually quite large compared to other recent studies. For example, McGovern et al. (2011) successfully trained an RF to predict atmospheric turbulence with only 2055 cases, and Mecikalski et al. (2015) predicted the onset of radar reflectivity ≥ 35 dBZ at the −10°C level (i.e., a formal definition for convective initiation) with only 9015 cases. While the number of cases is relatively large, it is important to note that a higher number of training examples is often needed when the event is rare, so that both verifying classes are sufficiently sampled.

2) PREDICTOR SELECTION

As a proof of concept, the RF was first trained with a select group of diagnostic fields obtained from the HRRR. Later, the value of adding observational predictor fields was explored. Diagnostic output from the HRRR included 17 two-dimensional fields deemed to be relevant for the prediction of MCS-I (Table 1). Environmental factors that contribute to the development of MCSs are discussed in Houze (2004). Undoubtedly, there are other fields that may be derived from the full three-dimensional HRRR dataset that would have potential for adding value to the prediction of MCS-I (e.g., vertical wind shear), but for simplicity, we limited our training sets to fields available within the HRRR two-dimensional data stream. In addition, local solar time was added as a predictor field as a simple way to account for differing mechanisms responsible for daytime and nocturnal MCS-I.
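Local solar time is given in Table 1 as the UTC hour plus the east longitude divided by 15° per hour. A one-line sketch follows; wrapping the result into the range 0–24 h is an assumption added here, and the function name is hypothetical.

```python
def local_solar_time(utc_hour, longitude_east):
    """Approximate local solar time (h): UTC hour + (deg E) / (15 deg per hour).

    Wrapping into [0, 24) is an assumption of this sketch.
    """
    return (utc_hour + longitude_east / 15.0) % 24.0
```

For example, 1800 UTC at 105°W (a longitude of −105°E) corresponds to a local solar time of 11 h.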

As noted by Hall et al (2011) lsquolsquoone of the most ef-

fective ways to select features that are predictive of

some phenomena is manually based on subject matter

expertisersquorsquo Thus each variable available in the HRRR

TABLE 1. Predictors sorted by mean selection count.

| Predictor | Description | Unit | Mean selection count |
|---|---|---|---|
| PWAT_EATM | Precipitable water in model column | kg m⁻² | 50.4 |
| PRES_SFC | Surface pressure (a proxy for terrain height) | hPa | 49.0 |
| Local solar time | UTC hour + (°E)/(15° h⁻¹) | h | 45.9 |
| REFC_EATM | Max reflectivity in model column | dBZ | 40.8 |
| TSOIL_SFC | Soil temperature at the surface | K | 38.3 |
| CAPE_SFC | CAPE of surface parcel | J kg⁻¹ | 34.8 |
| HPBL_SFC | Height of planetary boundary layer | m | 28.3 |
| RH_HTGL | 2-m relative humidity | % | 28.0 |
| DZDT1Hr_SIGL | Vertical velocity in 900–700-hPa layer | m s⁻¹ | 26.0 |
| APCP1Hr_SFC | Accumulated model precipitation | mm | 24.6 |
| DSWRF_SFC | Downward shortwave radiation flux at surface | W m⁻² | 23.2 |
| ULWRF_NTAT | Upward longwave radiation flux at top of atmosphere | W m⁻² | 22.0 |
| SHTFL_SFC | Sensible heat flux at surface | W m⁻² | 20.9 |
| LHTFL_SFC | Latent heat flux at surface | W m⁻² | 20.0 |
| DPT_HTGL | 2-m dewpoint | K | 18.4 |
| 34LFTX_SPDL (a) | Best (four layer) lifted index | K | 15.9 |
| SPFH_HTGL (a) | Specific humidity | g kg⁻¹ | 15.2 |
| CIN_SFC | CIN of surface parcel | J kg⁻¹ | 9.3 |

(a) These two predictors were removed after the mean selection count analysis for reasons described in the text.

2D data stream that we thought might be of value in the prediction of MCS-I was evaluated. Hall et al. (2011) also note that while the RF was designed to effectively utilize large numbers of predictors, it can be susceptible to noise from extraneous or redundant features. To reduce the number of predictors used in the training sets, we implemented a method that systematically determines which predictor fields should be retained, allowing some of the correlated predictor fields to be eliminated. As will be discussed below, choosing which predictors to retain depends on the entire set of predictor fields under evaluation. This is particularly true for RFs since, by utilizing decision trees that split on multiple predictors in succession, an RF captures and exploits relationships between the predictors.

A predictor selection trial was performed using a series of two forward selection steps and one backward elimination step. At each forward selection step, all unselected predictors were tested individually as candidates for retention by joining them to the predictors already selected and evaluating the resulting RF's predictive skill on an independent testing set. The predictor whose inclusion made the RF most skillful was retained for the next step. After two forward selection steps, all the retained variables were tested to see which one's removal caused the smallest drop in the RF's skill (backward elimination). The predictor associated with the smallest drop in skill was then removed from the retained variable group and added back into the group of unselected predictor fields. This process was repeated until all 18 variables were retained. Each predictor selection trial was repeated 10 times with different training sets, training on odd days and testing on even days, and then vice versa. Figure 4 summarizes the results obtained using 10 trials. After step 1, model reflectivity (REFC_EATM) was retained for 9 out of 10 trials and model precipitable water (PWAT_EATM) was retained once. After step 2, the most frequently retained variables were model reflectivity (9 out of 10 trials) and model precipitable water (9 out of 10 trials), but model surface pressure (PRES_SFC, which is indicative of terrain height) and model lifted index (34LFTX_SPDL) were also retained once. The average number of steps per trial for which a predictor was retained is given in Table 1. The results suggest that the presence of a deep column of water vapor is important for MCS-I, given that model precipitable water was the most frequently retained predictor (50.4 steps per trial). Fixed parameters such as solar time and surface pressure, which is a proxy for terrain height, were also retained quite often, indicating the importance of temporal and geographic regimes. On the other hand,

FIG. 4. Cumulative results of 10 predictor selection trials for 2-h forecasts obtained using RFs of varying sizes. Predictors (see Table 1) are listed along the y axis and the selection steps increase to the right. Every three steps, two predictors are added to the forest and one is removed, so the size of the predictor suite increases by one. The colors indicate the number of times (summed over 10 trials) a predictor was selected in the predictor suite after that step. By the 52nd step, all 18 predictors were used.

model convective inhibition (CIN_SFC) was retained least often (9.3 steps per trial). It is unclear why CIN_SFC seems to have lower importance in the training set; however, it is retained owing to previous reports of its importance in the prediction of MCS-I (e.g., Jirak and Cotton 2007).
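The selection trial described above (two forward selection steps followed by one backward elimination step, repeated) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `skill` callable stands in for training an RF on a candidate suite and scoring it on independent data, the toy weights are invented, and the stopping rule (stop once every predictor is retained) is an assumption.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Step = Tuple[str, str, Tuple[str, ...]]  # (action, predictor, suite after the step)


def stepwise_selection(
    predictors: Sequence[str],
    skill: Callable[[FrozenSet[str]], float],
) -> List[Step]:
    """Repeat two forward selection steps and one backward elimination step."""
    selected: List[str] = []
    unselected: List[str] = list(predictors)
    history: List[Step] = []
    while unselected:
        for _ in range(2):  # forward selection: add the most helpful candidate
            if not unselected:
                break
            best = max(unselected, key=lambda p: skill(frozenset(selected + [p])))
            unselected.remove(best)
            selected.append(best)
            history.append(("add", best, tuple(selected)))
        if not unselected:  # assumed stopping rule: every predictor is retained
            break
        # Backward elimination: drop the predictor whose removal hurts skill least.
        drop = max(selected, key=lambda p: skill(frozenset(selected) - {p}))
        selected.remove(drop)
        unselected.append(drop)
        history.append(("remove", drop, tuple(selected)))
    return history


def selection_counts(history: List[Step]) -> Dict[str, int]:
    """Number of steps after which each predictor was in the suite."""
    counts: Dict[str, int] = {}
    for _, _, suite in history:
        for name in suite:
            counts[name] = counts.get(name, 0) + 1
    return counts


# Toy stand-in for "train an RF and score it on independent data" (invented numbers).
WEIGHTS = {"PWAT": 0.5, "REFC": 0.4, "CAPE": 0.3, "LFTX": 0.28}


def toy_skill(suite: FrozenSet[str]) -> float:
    total = sum(WEIGHTS[name] for name in suite)
    if {"CAPE", "LFTX"} <= suite:  # a redundant pair adds little when combined
        total -= 0.25
    return total


history = stepwise_selection(list(WEIGHTS), toy_skill)
print(selection_counts(history))
```

Averaging `selection_counts` over repeated trials gives a quantity analogous to the mean selection count in Table 1: predictors that enter early and survive elimination accumulate the most steps.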

3) SCORING AND EVALUATION

The primary objective measure used to assess the performance of the probabilistic RF 2-h MCS-I forecasts was the area under the receiver operating characteristic (ROC) curve (AUC; Marzban 2004). ROC curves (Fig. 5) map the hit rate [Hits/(Hits + Misses)] as a function of the false alarm (FA) rate [False_Positive/(False_Positive + Correct_Null)] across a range of thresholds available within the prediction (e.g., RF vote counts or likelihood values). AUC has a long history in evaluating machine learning algorithms. An AUC value of one is indicative of a perfect forecast, while an AUC value of 0.5 is indicative of a purely random forecast. We also used the Gilbert skill score, commonly known as the equitable threat score (ETS), as a second metric to evaluate the RF forecasts. In this case, we took the maximum ETS value over all RF vote thresholds. Both AUC and maximum ETS can be used to compare RF performance to that of other forecasts, even if they have different units or are calibrated differently (Wilks 2006). Finally, we used the symmetric extreme dependency score (SEDS) to evaluate and intercompare the performance of the RF and other short-term forecasts in the real-time prediction of MCS-I observed during a 5-week period in 2013. The SEDS, which is described by Hogan et al. (2009), is an equitable skill score designed to more effectively evaluate the performance of forecasts of infrequently occurring events such as MCS-I.
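The scores named above can be computed from a 2 × 2 contingency table. The sketch below uses the standard definitions of ETS and SEDS and a trapezoidal estimate of AUC; the function names, the treatment of degenerate tables, and the sample counts are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Table = Tuple[int, int, int, int]  # (hits, false positives, misses, correct nulls)


def contingency(forecast_yes: Sequence[bool], observed: Sequence[bool]) -> Table:
    pairs = list(zip(forecast_yes, observed))
    hits = sum(1 for f, o in pairs if f and o)
    false_pos = sum(1 for f, o in pairs if f and not o)
    misses = sum(1 for f, o in pairs if not f and o)
    nulls = sum(1 for f, o in pairs if not f and not o)
    return hits, false_pos, misses, nulls


def hit_rate(a: int, b: int, c: int, d: int) -> float:
    return a / (a + c) if a + c else 0.0


def false_alarm_rate(a: int, b: int, c: int, d: int) -> float:
    return b / (b + d) if b + d else 0.0


def ets(a: int, b: int, c: int, d: int) -> float:
    """Equitable threat (Gilbert) score; a_random is the number of hits expected by chance."""
    a_random = (a + b) * (a + c) / (a + b + c + d)
    denom = a + b + c - a_random
    return (a - a_random) / denom if denom else 0.0


def seds(a: int, b: int, c: int, d: int) -> float:
    """Symmetric extreme dependency score; undefined (NaN) without hits or with only hits."""
    n = a + b + c + d
    if a == 0 or a == n:
        return float("nan")
    a_random = (a + b) * (a + c) / n
    return math.log(a_random / a) / math.log(a / n)


def roc_auc(scores: Sequence[float], observed: Sequence[bool]) -> float:
    """Trapezoidal area under the ROC curve, thresholding at every distinct score."""
    points: List[Tuple[float, float]] = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        table = contingency([s >= t for s in scores], observed)
        points.append((false_alarm_rate(*table), hit_rate(*table)))
    points.append((1.0, 1.0))
    return sum((f1 - f0) * (h0 + h1) / 2.0 for (f0, h0), (f1, h1) in zip(points, points[1:]))


def max_ets(scores: Sequence[float], observed: Sequence[bool]) -> float:
    """Largest ETS over all thresholds, as done for the RF vote thresholds."""
    return max(ets(*contingency([s >= t for s in scores], observed)) for t in set(scores))


# Invented contingency counts for a rare event
print(ets(30, 70, 20, 880), seds(30, 70, 20, 880))
```

Both AUC and maximum ETS depend only on how the forecast values rank the cases, which is why they can compare forecasts expressed in different units.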

d. Predictor field optimization

In the first optimization step, redundant predictors were removed in order to reduce the amount of unnecessary information going into the training set. Highly correlated or redundant predictors, like CAPE and lifted index or 2-m dewpoint and 2-m specific humidity, were compared. It was found that the better variable to use depended on the number of variables in the training set. Lifted index was selected more often than CAPE when the suite was limited to five or fewer predictors (before step 15 in Fig. 4), but for more than five predictors CAPE was selected more often, meaning that it was more valuable in combination with the other predictors in the larger set. Likewise, specific humidity was preferred when the number of predictor variables was small, but dewpoint worked better when a greater number of predictors were used. Since our final predictor suite has a larger number of predictors, CAPE and dewpoint were retained.

In the second set of predictor optimization experiments, the impact of predictor field smoothing was explored. Each of the remaining HRRR forecast fields was smoothed with circular filters with radii ranging from 10 to 80 km. It was found that using a 40-km circular smoothing filter resulted in the best skill scores. Figure 5 shows ROC curves obtained for RF predictions that were based on HRRR data only at raw resolution versus those obtained using a 40-km circular smoothing filter. There is no overlap between the 10 curves obtained using raw resolution and those obtained using a 40-km filter for hit rates between 0.3 and 0.9, indicating that the improved probability of detection associated with the smoothing is significant. The average AUC increased from 0.84 to 0.86 and the maximum ETS increased from 0.33 to 0.37 (Table 2). Both increases were large relative to the standard deviation across the 10 training sets, further indicating the significance of this result.
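A circular smoothing filter of the kind described above can be sketched as a disk-shaped moving average. This is only an illustration: the text does not specify the filter weights or the edge treatment, so uniform weights and averaging over valid points near the boundary are assumptions, as is expressing the radius in grid cells (a 40-km radius on a grid with roughly 3-km spacing would be about 13 cells).

```python
from typing import List

Grid = List[List[float]]


def circular_smooth(field: Grid, radius_cells: float) -> Grid:
    """Replace each grid point by the mean of all points within radius_cells of it."""
    ny, nx = len(field), len(field[0])
    r = int(radius_cells)
    # Offsets (dy, dx) that fall inside the disk
    offsets = [
        (dy, dx)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if dy * dy + dx * dx <= radius_cells * radius_cells
    ]
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            total, count = 0.0, 0
            for dy, dx in offsets:
                jj, ii = j + dy, i + dx
                if 0 <= jj < ny and 0 <= ii < nx:  # near edges, use valid points only
                    total += field[jj][ii]
                    count += 1
            out[j][i] = total / count
    return out


# A single 5.0 spike on a 5 x 5 grid, smoothed over a one-cell radius
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 5.0
smoothed = circular_smooth(spike, 1.0)
print(smoothed[2])
```

Smoothing spreads the information in an isolated model storm over its neighborhood, which makes the predictor more tolerant of small displacement errors in the forecast.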

The final predictor optimization step was designed to assess the impact of observation-based variables on the skill of the RF forecasts. The value of adding radar reflectivity and 13.3–10-μm SBTD was assessed both

FIG. 5. False alarm rate vs hit rate (ROC curve) using unsmoothed HRRR (blue) and smoothed HRRR (red). The two sets (unsmoothed and smoothed HRRR) of 10 RFs were trained on even days and tested on odd days using summer (JJA) 2011 data. The predictor fields used are listed in Table 1. Note that lifted index (34LFTX_SPDL) and specific humidity (SPFH_HTGL) were omitted from the training set based on analyses described in the text.

individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

3. Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors and the best maximum ETS was with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.
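The default rule quoted above (integer part of the square root of the number of predictors) is easy to check. The function below only mirrors the rule as described in the text; its name is hypothetical and it is not code from the Topic et al. (2014) software.

```python
import math


def default_candidates_per_split(n_predictors: int) -> int:
    """Integer part of the square root of the number of predictors."""
    return math.isqrt(n_predictors)


# With 19 predictors the rule gives 4, the default quoted in the text.
print(default_candidates_per_split(19))
```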

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best ratio was between 20% and 40%. The best AUC was achieved with 40% events and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event ratio of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.
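One simple way to build a training set with a prescribed event percentage is to draw events and nonevents separately. The sketch below assumes random sampling without replacement from each class, which is a choice made here for illustration; the text reports the target percentages and the 18 000-case training size but not the resampling procedure. The synthetic pool mimics a climatological event rate of about 0.3%.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rebalance(
    events: Sequence[T],
    nonevents: Sequence[T],
    n_total: int,
    event_fraction: float,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Draw n_total training cases with the requested fraction of events.

    Sampling is without replacement (an assumption), so both pools must be
    large enough to supply their share.
    """
    n_events = round(n_total * event_fraction)
    n_nonevents = n_total - n_events
    if n_events > len(events) or n_nonevents > len(nonevents):
        raise ValueError("not enough cases to reach the requested mix")
    rng = random.Random(seed)
    return rng.sample(events, n_events), rng.sample(nonevents, n_nonevents)


# Synthetic case identifiers: 6000 events among 2 000 000 samples (0.3%)
pool_events = range(6_000)
pool_nonevents = range(6_000, 2_000_000)

train_events, train_nonevents = rebalance(pool_events, pool_nonevents, 18_000, 0.30)
print(len(train_events), len(train_nonevents))
```

Oversampling the rare class in this way changes the base rate seen by the trees, which is one reason the resulting vote fractions should be read as relative likelihoods rather than calibrated probabilities.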

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, p = V/200, where p is the likelihood of an MCS-I event and V is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.
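The linear transform from vote counts to likelihood values is small enough to show in full. The grid layout (a list of rows) and the function name are assumptions for illustration; the divisor of 200 is the forest size stated in the text.

```python
from typing import List

N_TREES = 200  # forest size used for the real-time forecasts


def votes_to_likelihood(votes: List[List[int]], n_trees: int = N_TREES) -> List[List[float]]:
    """Apply p = V / n_trees at every grid point (uncalibrated likelihood)."""
    return [[v / n_trees for v in row] for row in votes]


# Four grid points with 0, 20, 100, and 200 trees voting for MCS-I
print(votes_to_likelihood([[0, 20], [100, 200]]))
```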

TABLE 2. Skill scores for predictor optimization experiments.

| Predictors | AUC | Std dev of AUC | Max ETS | Std dev of ETS |
|---|---|---|---|---|
| Raw HRRR (17) + local solar time (LST) | 0.84 | 0.004 | 0.33 | 0.008 |
| Smoothed HRRR + LST | 0.86 | 0.003 | 0.37 | 0.011 |
| Smoothed HRRR + LST + SBTD | 0.87 | 0.003 | 0.38 | 0.010 |
| Smoothed HRRR + LST + RadarREFC | 0.88 | 0.003 | 0.40 | 0.007 |
| Smoothed HRRR + LST + RadarREFC + SBTD | 0.88 | 0.004 | 0.40 | 0.010 |

¹ Note that the predictions are not actually probabilities, since no attempt has been made to calibrate the predicted likelihood values.

Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.

The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.
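The day/night grouping behind the informed climatology can be sketched as a frequency table keyed by period and location. The input format (one record per hourly sample, with a hypothetical `cell` label for the grid point or region) and the use of simple relative frequencies are assumptions; the text states only the two UTC periods and the range of frequency thresholds.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def period_of(utc_hour: int) -> str:
    """Day hours are 1200-2300 UTC; night hours are 0000-1100 UTC."""
    return "day" if 12 <= utc_hour <= 23 else "night"


def informed_climatology(
    samples: Iterable[Tuple[int, str, bool]],
) -> Dict[Tuple[str, str], float]:
    """Relative MCS-I frequency per (period, cell) from (utc_hour, cell, is_event) samples."""
    events: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for utc_hour, cell, is_event in samples:
        key = (period_of(utc_hour), cell)
        totals[key] += 1
        events[key] += int(is_event)
    return {key: events[key] / totals[key] for key in totals}


# Thresholds used to trace the climatology ROC curve: 0, 0.005, ..., 0.05
FREQUENCY_THRESHOLDS: List[float] = [round(k * 0.005, 3) for k in range(11)]

# Invented samples for two cells labeled "SE" and "GP"
samples = [
    (18, "SE", True), (19, "SE", False), (3, "SE", False),
    (4, "SE", False), (20, "GP", False), (2, "GP", True),
]
climo = informed_climatology(samples)
print(climo[("day", "SE")], FREQUENCY_THRESHOLDS[-1])
```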

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88 from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative gain in skill over the informed climatology is seen to be regionally dependent, with the RF skill gain being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.

RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.
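The displacement-tolerant verification described above can be approximated by counting an observed event as detected when any forecast point within 50 km reaches the likelihood threshold. The sketch below is an assumption-laden illustration: the evaluation in the text was manual, the matching in time is ignored here, and the point-list inputs and haversine distance are choices made for this example.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def neighborhood_hit_rate(
    events: Sequence[Tuple[float, float]],
    forecast_points: Sequence[Tuple[float, float, float]],
    threshold: float = 0.1,
    radius_km: float = 50.0,
) -> float:
    """Fraction of observed events (lat, lon) with a forecast point
    (lat, lon, likelihood) at or above threshold within radius_km."""
    detected = 0
    for elat, elon in events:
        if any(
            p >= threshold and great_circle_km(elat, elon, flat, flon) <= radius_km
            for flat, flon, p in forecast_points
        ):
            detected += 1
    return detected / len(events)


# Invented example: the first event has a qualifying forecast point about 24 km away;
# the second has only a distant point and a nearby point below the threshold.
events = [(35.0, -100.0), (30.0, -85.0)]
forecast_points = [(35.2, -100.1, 0.3), (30.0, -84.0, 0.5), (30.1, -85.0, 0.05)]
print(neighborhood_hit_rate(events, forecast_points))
```

For reference, detecting 549 of 550 events corresponds to 549/550 ≈ 0.998, the 99.8% hit rate quoted in the text.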

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.

reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

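TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In the region column, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

| Forecast | Region | Hit rate for FA rate = 0.1 | AUC | Max SEDS |
|---|---|---|---|---|
| RF | eUS | 0.31 | 0.76 | 0.19 |
| RF | SE | 0.27 | 0.73 | 0.17 |
| RF | GP | 0.28 | 0.75 | 0.18 |
| HRRR reflectivity | eUS | 0.24 | 0.59 | 0.14 |
| HRRR reflectivity | SE | 0.21 | 0.57 | 0.13 |
| HRRR reflectivity | GP | 0.23 | 0.60 | 0.14 |
| Extrapolated reflectivity | eUS | 0.20 | 0.55 | 0.11 |
| Extrapolated reflectivity | SE | 0.14 | 0.52 | 0.06 |
| Extrapolated reflectivity | GP | 0.19 | 0.53 | 0.11 |
| Observed climatological MCS-I frequency | eUS | 0.22 | 0.61 | 0.15 |
| Observed climatological MCS-I frequency | SE | 0.21 | 0.65 | 0.13 |
| Observed climatological MCS-I frequency | GP | 0.12 | 0.52 | 0.03 |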

increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. The RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FAs (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

| Class | Description | Comment | Region of most common occurrence |
|---|---|---|---|
| 1 | RF probabilities are always elevated in areas of ongoing MCSs | Use RF along with current radar observations to disregard these areas | Entire domain |
| 2 | RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event | RF provides an early indication of regions where MCS-I is possible in the next 2–4 h | Southeastern United States |
| 3 | Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria | Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments | Southeastern United States |

be used as guidance for the short-term prediction of discrete, high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF forecast likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product in areas with existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts by providing advance notice for the potential of an MCS-I event.

The basic process of training and optimizing the RF was discussed here; however, a
number of additional pre- and postprocessing steps could be employed to further enhance
performance. Both terrain and time of day ranked highly in terms of importance as
predictors in the training set. Thus, one area of future research would be to assess
the value of developing separate training sets by region and time of day. It was also
found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using
more recently obtained training data. The ROC curves shown in Fig. 5 were obtained
using training data from the even days of JJA 2011 to make forecasts on the odd days
of JJA 2011,

while the ROC curves shown in Fig. 9 were obtained using RFs trained using 2011 data
and used to make forecasts employing 2013 data. In fact, an ideal approach for
operational use would be to retrain the RF each day using the latest available
datasets. To do this, one would have to determine the ideal length for the period of
recent data used to train the RF. There are trade-offs one must consider in doing
this: one would like the training period to be long enough to capture the full range
of conditions that lead to the event occurring, while at the same time one would like
the training period to be short enough for the RF to respond to changes in the skill
of the predictor fields (e.g., as a result of changing NWP models or evolving weather
regimes). This might be accomplished by sampling a training set more heavily from
instances that occurred near the current Julian date and more from the current
convective season than from previous years.

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating
the forecast categories to observed frequencies; however, because of the high biases
described above, the calibration process would necessarily reduce the dynamic range of
likelihood values.
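
As an illustration only, relating forecast categories to observed frequencies can be
sketched in Python. The function name calibration_table, the 10 equal-width bins, and
the toy data are assumptions made for this sketch and are not taken from the study.

```python
def calibration_table(likelihoods, observed, n_bins=10):
    """Map each forecast-likelihood bin to the observed event frequency.

    likelihoods: RF vote fractions in [0, 1] (hypothetical input).
    observed:    matching 0/1 flags, 1 where the event verified.
    Returns {bin_index: observed_frequency} for every nonempty bin.
    """
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(likelihoods, observed):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        counts[b] += 1
        hits[b] += o
    return {b: hits[b] / counts[b] for b in range(n_bins) if counts[b]}


# Toy data: forecasts of 0.85 verify only half the time, so a calibrated
# product would relabel that bin with a likelihood of 0.5.
table = calibration_table([0.05, 0.05, 0.85, 0.85], [0, 0, 1, 0])
```

In this toy case the highest forecast category maps to a lower observed frequency,
which is the reduction in dynamic range that a high-biased forecast would experience.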

The findings presented herein, along with the positive results of Mecikalski et al.
(2015) in the prediction of small-scale storm initiation and of Williams et al. (2008c)
and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit
of using RF techniques for difficult nowcasting problems that require analysis and
interpretation of large amounts of data in a short amount of time to predict discrete,
high-impact events. As such, the RF represents a class of data mining techniques that
can be used to digest the ever-increasing wealth of observational and model data into
a single probabilistic product that alerts forecasters to the possibility of an
impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the
Federal Aviation Administration (FAA). Partial support also came from the National
Science Foundation. The views expressed are those of the authors and do not
necessarily represent the official policy or position of the FAA or NSF. We thank
Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing
the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL
data used to generate the MCS-I truth dataset and the CIWS motion vectors used to
extrapolate the satellite and radar reflectivity to the forecast valid time. The
authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for
insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated
NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth
Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic
Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at
https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach.
Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32,
doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and
Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season:
The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal
precipitation cycle in convection-resolving and non-convection-resolving mesoscale
models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of
object-based time-domain diagnostics for tracking precipitation systems in convection
allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011:
Enhancements in convective weather forecasting for NAS traffic flow management (TFM).
Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA,
Amer. Meteor. Soc., 136. [Available online at
https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in
convective weather forecasting for NAS traffic flow management: Results of the 2010
and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation,
Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management
Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at
https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the
maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22,
556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the
upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon.
Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the
Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer,
77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of
precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea.
Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful
yet simple technique for ecological data analysis. Ecology, 81, 3178–3192,
doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.

Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013:
Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141,
3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme
(SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220,
doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of
microarray data using random forest. BMC Bioinformatics, 7, 3,
doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast
demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online
at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln
Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas
using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353,
doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of
storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea.
Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during
1994–95: A survey. Wea. Forecasting, 13, 860–869,
doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in
objective weather forecasting. J. Appl. Meteor., 11, 1203–1211,
doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J.
Lareau, 2011: Performance of observation-based prediction algorithms for very
short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor.
Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation
forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134,
3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud
fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003,
doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of
mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus
through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427,
doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid
integration. IEEE Trans. Sustainable Energy, 3, 670–682,
doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea.
Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via
neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams,
2011: Using spatiotemporal relational random forests to improve our understanding of
severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by
monitoring the evolution of moving cumulus in daytime GOES imagery. Mon. Wea. Rev.,
134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical
evaluation of GOES cloud-top properties for predicting convective initiation. Mon.
Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker,
2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary
satellite observations and numerical weather prediction model data. J. Appl. Meteor.
Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution
Rapid Refresh model's ability to predict large convective storms using object-based
verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A
"weather-ready" view of air traffic management needs, challenges, opportunities, and
lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our
Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events,
Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at
https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143,
1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the
prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689,
doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to
support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace
Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at
https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast.
Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code
Archive, accessed 26 June 2014. [Available online at
http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel
buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I:
Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966,
doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C.
Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection
initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev.,
143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed.
International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach.
Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.

Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning
and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on
Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer.
Meteor. Soc., 12. [Available online at
https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining
observations and model data for short-term storm forecasting. Remote Sensing
Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and
J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings,
Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S.
Dettling, 2008b: A machine learning approach to finding weather regimes and skillful
predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on
Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation,
Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available
online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and
diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather
Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International
Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804,
doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system:
Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338,
doi:10.1175/2011BAMS-D-11-00047.1.

predictand and RF methodology are described in detail below.

a. Datasets

The model diagnostics used to train the RF came from the HRRR (Benjamin et al. 2014).
The HRRR is a convection-permitting model run over the entire continental United
States (CONUS) with hourly cycling and 3-km grid spacing. The 2011 version of the HRRR
gained information on the location of existing storms indirectly via the
three-dimensional variational data assimilation (3DVAR) of radar reflectivity into the
13-km Rapid Refresh (RAP) model, which was used to initialize the HRRR forecasts. In
2013, the HRRR was updated to include direct assimilation of radar reflectivity into
its 3-km grid (Benjamin et al. 2014). This change notably improved the performance of
the HRRR, particularly its ability to capture existing MCSs (Pinto et al. 2015). As a
result, training the RF on 2011 HRRR data and testing it on 2013 HRRR data
demonstrates whether or not the RF is robust for use with different NWP model analysis
systems.

Extrapolated radar observations started with composite reflectivity provided by the
National Mosaic and Multi-Sensor Quantitative Precipitation Estimation (NMQ) system
from the National Severe Storms Laboratory (Zhang et al. 2011). This product merges
multiple radar volumes into a 3D grid with 1-km spacing in the horizontal and 0.5-km
spacing in the vertical and then derives 2D fields such as composite reflectivity.
Satellite observations came from the Geostationary Operational Environmental Satellite
system (GOES) operated by the National Environmental Satellite, Data, and Information
Service (NESDIS). Brightness temperature in the longwave IR channel (10.7 μm) was
subtracted from the CO2 channel (13.3 μm) to yield a satellite brightness temperature
difference (SBTD) field. The SBTD has been shown to distinguish between growing
cumulonimbi and low cumulus or thin cirrus (Mecikalski and Bedka 2006), so it is
useful to delineate areas of growing cumuli that may consolidate into an MCS. Thin
cirrus and shallow cumulus have brightness temperature differences of less than −25°C,
while developing storms exhibit values between −25° and −5°C, and difference values
near zero indicate mature cumulonimbi. Mecikalski and Bedka (2006) and Mecikalski
et al. (2008) included SBTD as one of the components of their satellite-based
convection initiation algorithm.
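
The SBTD ranges quoted above can be written as a small lookup, shown here as a sketch
rather than as part of the processing used in this study. The function names are
hypothetical, and treating −5°C as the boundary between "developing" and "mature" is
an assumption, since the text states only that values near zero indicate mature
cumulonimbi.

```python
def sbtd(tb_13_3, tb_10_7):
    """SBTD: 13.3-um minus 10.7-um brightness temperature (same units for both)."""
    return tb_13_3 - tb_10_7


def classify_sbtd(diff_c):
    """Label a brightness temperature difference (deg C) by the quoted ranges.

    The -5 deg C upper bound for "developing storm" is an assumed cutoff.
    """
    if diff_c < -25.0:
        return "thin cirrus or shallow cumulus"
    if diff_c <= -5.0:
        return "developing storm"
    return "mature cumulonimbus"


# Example: a pixel whose 13.3-um channel is 15 degrees colder than its 10.7-um channel.
label = classify_sbtd(sbtd(-40.0, -25.0))
```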

The radar reflectivity and SBTD were interpolated onto the HRRR 3-km grid using
bilinear interpolation. These fields were then advected to their expected downstream
locations at later times based on the motions detected in the vertically integrated
liquid water (VIL) field from the Corridor Integrated Weather System (CIWS; Evans and
Ducot 2006; Dupree et al. 2009).

b. Definition of MCS initiation

National composites of VIL that are available from CIWS were used to identify MCSs
following the method described in Pinto et al. (2015). As in Pinto et al. (2015), in
this study we define MCSs as consisting of an area of VIL exceeding 3.5 kg m⁻² with a
horizontal extent of at least 100 km (allowing gaps of up to 10 km). These conditions
must be met for at least two consecutive tops of the hour. While not essential to the
conclusions of the paper, the criteria that have been adopted to classify MCSs are
similar, but not identical, to those used in many prior studies (e.g., Geerts 1998;
Houze 2004; Coniglio et al. 2010). The lifetime threshold was set relatively low to
ensure an adequate sample size for developing the training dataset using data obtained
for a limited time period. Larger storms of longer duration occur much less frequently
(e.g., Davis et al. 2006) and therefore would require a longer period from which to
draw an adequate number of representative cases.

Once the MCS definition is satisfied, the area spanned by the core area of high VIL is
dilated by 125 km, as shown in Fig. 1, to define the MCS region. VIL is used to detect
MCSs instead of radar reflectivity because it is relatively insensitive to brightband
contamination and anomalous propagation artifacts (e.g., Smalley and Bennett 2002).
VIL also includes the integrated effect of hydrometeors at all vertical levels, making
its intensity more closely related to convective vigor than a single level of radar
reflectivity.

After MCSs are identified for each time, they are checked to see if they qualify as an
initiation event (MCS-I). To qualify, an MCS must be at least 125 km removed from any
previously existing MCS that was present during the previous 2 h, and it must persist
for at least 1 h. MCS-I is evaluated only at the top of each hour, when HRRR model
forecasts are valid, and an MCS-I event occurs only in the first hour that a
temporally and spatially isolated MCS is identified. A detailed description of the
MCS-I identification algorithm is given in Pinto et al. (2015). The data points around
the MCS-I are used in the RF training set as positive events, while all others are
nonevents. The expansion of the MCS region accounts for potential offsets or timing
errors between the observed MCS-I and the environmental conditions as represented by
the model. This increases the number of training data points that go into the RF and
allows for some positional error in a forecast.
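
The isolation and persistence tests can be sketched as follows. This is a simplified
illustration, not the identification algorithm of Pinto et al. (2015): each MCS is
reduced to a single representative (latitude, longitude) point instead of an outline,
distances are great-circle distances on a spherical earth, and all function and
parameter names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_mcs_initiation(candidate, prior_mcs_points, persistence_h,
                      min_sep_km=125.0, min_persistence_h=1.0):
    """Apply the two MCS-I criteria quoted in the text.

    candidate:        (lat, lon) of the newly identified MCS.
    prior_mcs_points: (lat, lon) of every MCS present during the previous 2 h.
    persistence_h:    how long the candidate goes on to persist, in hours.
    """
    if persistence_h < min_persistence_h:
        return False
    return all(great_circle_km(*candidate, *prior) >= min_sep_km
               for prior in prior_mcs_points)
```

For example, a candidate at 35°N, 97°W is rejected when an earlier MCS sits one degree
of longitude away (about 91 km) but accepted when the nearest earlier MCS is three
degrees away.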

c. Random forest algorithm

A decision tree is a common tool in machine learning (Breiman et al. 1984; De'ath and
Fabricius 2000;

Dattatreya 2009), and an RF is an ensemble of weakly correlated decision trees
(Breiman 2001). Collectively, the trees function as an ensemble of "experts," each
casting a vote for whether MCS-I will occur (e.g., Fig. 2). All of the nodes of a
decision tree can be reduced to simple rules of the form: if predictor P is x or less
(where x is any number), then follow branch A; otherwise, follow branch B. A predictor
may be used at multiple nodes in the same tree. Each branch will either lead to
another node or terminate with a "yes" or "no" vote. When a decision tree is being

FIG. 1. VIL and associated MCS detections at 1900 UTC 5 Aug 2013. The large black
rectangle denotes the geographical extent of data points used in the RF training
suite. The dark gray regions have no radar coverage and are not used. The black
outline encircles an ongoing MCS that initiated previously, while the magenta contours
encircle newly initiated MCSs (MCS-I). The outlines are obtained by extending outward
each vertex of the MCS by 125 km.

FIG. 2. Given all the predictor values at a point (i.e., a specific location and
time), each tree in the RF votes on the outcome. Together, the trees act as an
ensemble of experts. Each tree is different because 1) each one is trained with a
bootstrapped sample of the full training suite and 2) at each node, symbolized by the
blue circles, only a subset of the original predictors is randomly chosen as
candidates for splitting.

trained, the algorithm finds a predictor and a threshold that "splits" the training
data instances that reach a node into two subsets in a way that maximizes the
homogeneity of the subsets with respect to the predictand, for example, by minimizing
the "Gini impurity" (Breiman 1996).
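
A minimal sketch of such a split search for a single predictor and a binary predictand
is given below. The function names gini and best_split are hypothetical and are not
taken from the RF software used in this study; the sketch scans every observed value
as a candidate threshold for the rule "predictor is x or less."

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p0^2 - p1^2 = 2 p (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the rule 'value <= threshold'.

    The weighted impurity is the case-weighted mean Gini impurity of the two
    subsets; the threshold that minimizes it gives the most homogeneous split.
    """
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Four cases whose outcome flips between predictor values 2 and 3.
threshold, impurity = best_split([1, 2, 3, 4], [0, 0, 1, 1])
```

In an RF, this search would be repeated at each node over the randomly chosen
candidate predictors, and the predictor with the lowest weighted impurity would be
used for the split.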

RF trees differ from conventional decision trees in that each RF tree is trained on a
bootstrapped sample of the training cases (illustrated by Fig. 3). Additionally, at
each node of the tree, only a limited, randomly selected subset of predictors is
chosen as candidates for splitting, whereas standard decision trees consider all
predictors as candidates. (The predictor candidates are selected randomly with
replacement, so that any predictor may be a candidate at any node.) An implication of
bootstrapping is that roughly one-third of the training cases are not used for any
given tree, and these "out of bag" cases are used as test cases to quantify the
importance of each predictor field. Bootstrapping the training cases and ignoring some
predictors at each node make individual trees weaker, but these steps also ensure the
trees are not strongly correlated with each other. Thus, the forest is less
susceptible to overfitting the peculiarities of the training set and can provide
probabilistic information. The number of trees and the number of predictors chosen as
candidates for splitting at each node are tunable parameters; the sensitivity of
predictive performance to both is tested below.
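
The out-of-bag fraction follows from the sampling itself: when n cases are drawn with
replacement from n cases, the chance that a given case is never drawn is (1 − 1/n)ⁿ,
which approaches 1/e, or about 0.37, for large n. The sketch below checks this
numerically; the function name bootstrap_split and the fixed seed are illustrative
choices only.

```python
import random


def bootstrap_split(n_cases, rng):
    """Draw n_cases indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    out_of_bag = sorted(set(range(n_cases)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(18000, rng)
oob_fraction = len(out_of_bag) / 18000  # close to 1/e, i.e., roughly one-third
```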

The RF has several advantages over other data mining techniques. For one, the
empirical model created from the RF ensemble does not require the predictors to be
monotonically related to the predictand, meaning that it can represent a variety of
functional relationships. Alternative techniques like logistic regression are
inherently linear. The decision trees in the RF are also human readable, such that
relationships between data and how they were used to predict can be explored.

In addition to their predictive capabilities, RFs can rank the importance of
individual predictors (Breiman 2001; Topic et al. 2014). A predictor's importance is
quantified by scrambling its values in the out-of-bag training cases for each tree and
seeing how much the classification accuracy of the RF goes down. For example, the
expected importance of a random variable is zero. The importance value is often scaled
by dividing it by a quantity akin to its standard error (e.g., see supplemental
material for Díaz-Uriarte and de Andrés 2006). Importance scores provide a helpful
starting point for comparing the potential contributions of different variables and
selecting a small but skillful subset of predictors.
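
The scrambling idea can be sketched in a simplified form. Unlike the per-tree,
out-of-bag procedure described above, this sketch shuffles one predictor across a
single test set and scores one model; toy_model is a hypothetical stand-in for a
trained forest, and the unscaled accuracy drop is returned without the standard-error
scaling.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, column, rng):
    """Accuracy lost when one predictor column is shuffled across the rows."""
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    scrambled = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return accuracy(predict, rows, labels) - accuracy(predict, scrambled, labels)


rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(500)]
labels = [int(r[0] > 0.5) for r in rows]  # only column 0 carries signal


def toy_model(row):
    """Hypothetical stand-in for a trained forest; it ignores column 1."""
    return int(row[0] > 0.5)


useful = permutation_importance(toy_model, rows, labels, 0, rng)
useless = permutation_importance(toy_model, rows, labels, 1, rng)  # exactly 0.0
```

Scrambling the informative predictor costs the toy model a large share of its
accuracy, while scrambling the predictor it never uses costs nothing.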

1) TRAINING

The RF is trained to use predictors available at a given time to forecast the
occurrence of MCS-I 2 h in the future. To create the RF training suite, predictor
values and associated MCS-I "truth" values from 2 h later were interpolated onto a
cylindrical grid covering 25°–48°N and 125°–67°W with horizontal spacing of about 4 km
(0.036° latitude and 0.038° longitude). The geographical coverage of data points used
in the training suite is shown in Fig. 1. Points over the Atlantic Ocean and parts of
Canada and Mexico that are beyond the WSR-88D radar network coverage were not
included. The analysis was done using data available at the top of each hour. Over the
3-month period from June through August 2011, there were over 200 million potential
data points. Even though there were many cases to choose from, most of them were null
events (no MCS-I). Even in the most MCS-I-prone geographical regions in the United
States, MCS-I events occur only 3% of the time (Pinto et al. 2015). The average MCS-I
frequency for the entire domain is only 0.3%. This rarity makes MCS-I a difficult

FIG. 3. Example of bootstrap sampling from a hypothetical set of 26 cases, a–z.
Twenty-six cases are randomly selected with replacement to create a 26-element set T.
Cases may be selected multiple times or not at all. This process is repeated for each
tree. Those cases not selected are called out-of-bag cases and are used to assess
predictor importance.

predictand for any statistical forecast algorithm to handle. One can achieve 99.7%
accuracy by always predicting a null event at every grid point. To help the RF
algorithm discriminate between MCS-I and non-MCS-I cases, the MCS-I cases are
oversampled such that they make up 30% of the training set. This artificial increase
in the proportion of events in the training set can be accounted for in the RF vote
calibration phase.
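
The oversampling step can be sketched as stratified sampling without replacement. The
function name build_training_set, the tuple representation of a case, and the small
case pool are assumptions made for illustration; only the 30% event fraction, the
default of 18 000 cases, and the 0.3% climatological frequency come from the text.

```python
import random


def build_training_set(events, nonevents, n_cases=18000, event_fraction=0.30, rng=None):
    """Sample without replacement so events make up event_fraction of the set."""
    rng = rng or random.Random()
    n_events = round(n_cases * event_fraction)
    chosen = rng.sample(events, n_events) + rng.sample(nonevents, n_cases - n_events)
    rng.shuffle(chosen)
    return chosen


# Hypothetical pool of 200 000 cases with a 0.3% event frequency.
events = [("event", i) for i in range(600)]
nonevents = [("null", i) for i in range(199400)]

training = build_training_set(events, nonevents, n_cases=1000, rng=random.Random(0))
event_share = sum(1 for kind, _ in training if kind == "event") / len(training)

# Accuracy of a forecast that always predicts a null event on the full pool.
null_accuracy = len(nonevents) / (len(events) + len(nonevents))
```

The last line reproduces the 99.7% accuracy of the always-null forecast, which is why
raw accuracy is a poor guide to skill for an event this rare.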

The RF parameter sensitivity tests and predictor importance analyses were conducted
using 10 disjoint training sets: 5 sets of 18 000 cases each were drawn randomly
without replacement from odd days, and another 5 sets were drawn similarly from even
days. The standard deviation of skill over these 10 training sets provides a means for
assessing the relative significance of differences in mean skill score when the RF
parameters are changed. While one standard deviation is not a particularly stringent
requirement, there is only a 2.2% chance that the mean of 10 samples will differ by
more than one standard deviation from the mean of another 10 samples drawn from the
same population. Selecting the sets from even or odd days also permits testing a model
trained on even days against independent data from odd days, and vice versa.

In general, one wants as many training cases as possible to fully sample the general
population of weather scenarios. On the other hand, given finite resources, one must
limit the number of cases. We found that 18 000 cases allowed for efficient training
of the RF while fully sampling the parameter space. This number of cases is actually
quite large compared to other recent studies. For example, McGovern et al. (2011)
successfully trained an RF to predict atmospheric turbulence with only 2055 cases, and
Mecikalski et al. (2015) predicted the onset of radar reflectivity ≥ 35 dBZ at the
−10°C level (i.e., a formal definition for convective initiation) with only 9015
cases. While the number of cases is relatively large, it is important to note that a
higher number of training examples is often needed when the event is rare so that both
verifying classes are sufficiently sampled.

2) PREDICTOR SELECTION

As a proof of concept, the RF was first trained with a select group of diagnostic
fields obtained from the HRRR. Later, the value of adding observational predictor
fields was explored. Diagnostic output from the HRRR included 17 two-dimensional
fields deemed to be relevant for the prediction of MCS-I (Table 1). Environmental
factors that contribute to the development of MCSs are discussed in Houze (2004).
Undoubtedly, there are other fields that may be derived from the full
three-dimensional HRRR dataset that would have potential for adding value to the
prediction of MCS-I (e.g., vertical wind shear), but for simplicity we limited our
training sets to fields available within the HRRR two-dimensional data stream. In
addition, local solar time was added as a predictor field as a simple way to account
for differing mechanisms responsible for daytime and nocturnal MCS-I.
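
Table 1 defines local solar time as the UTC hour plus the longitude in degrees east
divided by 15° h⁻¹. A minimal sketch of that conversion follows; the function name is
hypothetical, and wrapping the result into the range 0–24 h is an assumption, since
the table does not say how values outside that range were handled.

```python
def local_solar_time(utc_hour, lon_deg_east):
    """Local solar time in hours: UTC hour + (degrees east) / (15 degrees per hour).

    Western longitudes are negative. The modulo keeps the result in [0, 24),
    which is an assumed convention.
    """
    return (utc_hour + lon_deg_east / 15.0) % 24.0


# 1900 UTC at 105 degrees W corresponds to local solar noon.
noon = local_solar_time(19, -105.0)
```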

As noted by Hall et al. (2011), "one of the most effective ways to select features
that are predictive of some phenomena is manually based on subject matter expertise."
Thus, each variable available in the HRRR

TABLE 1. Predictors sorted by mean selection count.

Predictor          Description                                           Unit     Mean selection count
PWAT_EATM          Precipitable water in model column                    kg m⁻²   50.4
PRES_SFC           Surface pressure (a proxy for terrain height)         hPa      49.0
Local solar time   UTC hour + (°E)/15° h⁻¹                               h        45.9
REFC_EATM          Max reflectivity in model column                      dBZ      40.8
TSOIL_SFC          Soil temperature at the surface                       K        38.3
CAPE_SFC           CAPE of surface parcel                                J kg⁻¹   34.8
HPBL_SFC           Height of planetary boundary layer                    m        28.3
RH_HTGL            2-m relative humidity                                 %        28.0
DZDT1Hr_SIGL       Vertical velocity in 900–700-hPa layer                m s⁻¹    26.0
APCP1Hr_SFC        Accumulated model precipitation                       mm       24.6
DSWRF_SFC          Downward shortwave radiation flux at surface          W m⁻²    23.2
ULWRF_NTAT         Upward longwave radiation flux at top of atmosphere   W m⁻²    22.0
SHTFL_SFC          Sensible heat flux at surface                         W m⁻²    20.9
LHTFL_SFC          Latent heat flux at surface                           W m⁻²    20.0
DPT_HTGL           2-m dewpoint                                          K        18.4
4LFTX_SPDLᵃ        Best (four layer) lifted index                        K        15.9
SPFH_HTGLᵃ         Specific humidity                                     g kg⁻¹   15.2
CIN_SFC            CIN of surface parcel                                 J kg⁻¹   9.3

ᵃ These two predictors were removed after the mean selection count analysis for
reasons described in the text.

2D data stream that we thought might be of value in the prediction of MCS-I was
evaluated. Hall et al. (2011) also note that, while the RF was designed to effectively
utilize large numbers of predictors, it can be susceptible to noise from extraneous or
redundant features. To reduce the number of predictors used in the training sets, we
implemented a method that systematically determines which predictor fields should be
retained, allowing some of the correlated predictor fields to be eliminated. As will
be discussed below, choosing which predictors to retain depends on the entire set of
predictor fields under evaluation. This is particularly true for RFs since, by
utilizing decision trees that split on multiple predictors in succession, an RF
captures and exploits relationships between the predictors.

A predictor selection trial was performed using a series of two forward selection steps and one backward elimination step. At each forward selection step, all unselected predictors were tested individually as candidates for retention by joining them to the predictors already selected and evaluating the resulting RF's predictive skill on an independent testing set. The predictor whose inclusion made the RF most skillful was retained for the next step. After two forward selection steps, all the retained variables were tested to see which one's removal caused the smallest drop in the RF's skill (backward elimination). The predictor associated with the smallest drop in skill was then removed from the retained variable group and added back into the group of unselected predictor fields. This process was repeated until all 18 variables were retained. Each predictor selection trial was repeated 10 times with different training sets, with training on odd days and testing on even days, and then vice versa. Figure 4 summarizes the results obtained using 10 trials. After step 1, model reflectivity (REFC_EATM) was retained for 9 out of 10 trials and model precipitable water (PWAT_EATM) was retained once. After step 2, the most frequently retained variables were model reflectivity (9 out of 10 trials) and model precipitable water (9 out of 10 trials), but model surface pressure (PRES_SFC, which is indicative of terrain height) and model lifted index (34LFTX_SPDL) were also retained once. The average number of steps per trial for which a predictor was retained is given in Table 1. The results suggest that the presence of a deep column of water vapor is important for MCS-I, given that model precipitable water was the most frequently retained predictor (50.4 steps per trial). Fixed parameters such as solar time and surface pressure, which is a proxy for terrain height, were also retained quite often, indicating the importance of temporal and geographic regimes. On the other hand,

FIG. 4. Cumulative results of 10 predictor selection trials for 2-h forecasts obtained using RFs of varying sizes. Predictors (see Table 1) are listed along the y axis, and the selection steps increase to the right. Every three steps, two predictors are added to the forest and one is removed, so the size of the predictor suite increases by one. The colors indicate the number of times (summed over 10 trials) a predictor was selected in the predictor suite after that step. By the 52nd step, all 18 predictors were used.


model convective inhibition (CIN_SFC) was retained least often (9.3 steps per trial). It is unclear as to why CIN_SFC seems to have lower importance in the training set; however, it is retained owing to previous reports of its importance in the prediction of MCS-I (e.g., Jirak and Cotton 2007).
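The alternating forward selection and backward elimination procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in this study: the `skill` callback is a hypothetical stand-in for training an RF on a predictor subset and scoring it on an independent testing set, and the stopping rule (halt as soon as no unselected predictors remain) is our assumption.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Step = Tuple[str, str, FrozenSet[str]]  # (action, predictor, retained set after the step)


def stepwise_selection(
    predictors: Sequence[str],
    skill: Callable[[FrozenSet[str]], float],
) -> List[Step]:
    """Repeat two forward-selection steps and one backward-elimination step.

    `skill` is a hypothetical callback: it should train an RF on the given
    predictor subset and return its skill on an independent testing set.
    """
    selected: set = set()
    unselected = list(predictors)
    log: List[Step] = []

    def forward() -> None:
        # Join each unselected predictor to the retained group; keep the best one.
        best = max(unselected, key=lambda p: skill(frozenset(selected | {p})))
        selected.add(best)
        unselected.remove(best)
        log.append(("add", best, frozenset(selected)))

    def backward() -> None:
        # Drop the predictor whose removal causes the smallest drop in skill,
        # i.e., the one whose absence leaves the most skillful remaining set.
        weakest = max(sorted(selected), key=lambda p: skill(frozenset(selected - {p})))
        selected.remove(weakest)
        unselected.append(weakest)
        log.append(("drop", weakest, frozenset(selected)))

    # Assumed stopping rule: stop once every predictor has been retained.
    while unselected:
        forward()
        if unselected:
            forward()
        if unselected:
            backward()
    return log
```

Counting, for each predictor, the number of logged steps after which it is in the retained set and averaging over trials would give a quantity analogous to the mean selection count in Table 1.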

3) SCORING AND EVALUATION

The primary objective measure used to assess the performance of the probabilistic RF 2-h MCS-I forecasts was the area under the receiver operating characteristic (ROC) curve (AUC; Marzban 2004). ROC curves (Fig. 5) are obtained by finding the relationship between the hit rate [Hits/(Hits + Misses)] and false alarm rate [False_Positive/(False_Positive + Correct_Null)] for a range of RF vote thresholds. AUC has a long history in evaluating machine learning algorithms. The ROC curve maps hit rate as a function of false alarm (FA) rate across a range of thresholds available within the prediction (e.g., RF vote counts or likelihood values). An AUC value of one is indicative of a perfect forecast, while an AUC value of 0.5 is indicative of a purely random forecast. We also used the Gilbert skill score, commonly known as the equitable threat score (ETS), as a second metric to evaluate the RF forecasts. In this case, we took the maximum ETS value over all RF vote thresholds. Both AUC and maximum ETS can be used to compare RF performance to that of other forecasts, even if they have different units or are calibrated differently (Wilks 2006). Finally, we used the symmetric extreme dependency score (SEDS) to evaluate and intercompare the performance of the RF and other short-term forecasts in the real-time prediction of MCS-I observed during a 5-week period in 2013. The SEDS, which is described by Hogan et al. (2009), is an equitable skill score designed to more effectively evaluate the performance of forecasts of infrequently occurring events such as MCS-I.
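For readers who wish to reproduce these scores, the following sketch computes them from a list of RF vote counts and matching observations. It is a minimal illustration, not the verification code used here. The function names are ours, a "yes" forecast is assumed wherever the vote count meets or exceeds the threshold, the ETS follows the standard Gilbert skill score definition, and the SEDS follows the form commonly attributed to Hogan et al. (2009). The sample must contain at least one event and one nonevent.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[int, int, int, int]  # hits, misses, false positives, correct nulls


def contingency(votes: Sequence[float], truth: Sequence[bool], threshold: float) -> Table:
    """Count outcomes when 'yes' is forecast wherever votes >= threshold (assumed rule)."""
    hits = misses = false_pos = correct_null = 0
    for v, event in zip(votes, truth):
        yes = v >= threshold
        if yes and event:
            hits += 1
        elif event:
            misses += 1
        elif yes:
            false_pos += 1
        else:
            correct_null += 1
    return hits, misses, false_pos, correct_null


def roc_auc(votes: Sequence[float], truth: Sequence[bool]) -> float:
    """Trapezoidal area under the (FA rate, hit rate) curve over all vote thresholds."""
    points = [(0.0, 0.0)]  # threshold above every vote: nothing is forecast
    for thr in sorted(set(votes), reverse=True):
        h, m, f, c = contingency(votes, truth, thr)
        points.append((f / (f + c), h / (h + m)))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ets(h: int, m: int, f: int, c: int) -> float:
    """Gilbert skill score (equitable threat score)."""
    n = h + m + f + c
    h_random = (h + m) * (h + f) / n  # hits expected by chance
    denom = h + m + f - h_random
    return (h - h_random) / denom if denom else 0.0


def max_ets(votes: Sequence[float], truth: Sequence[bool]) -> float:
    """Maximum ETS over all vote thresholds."""
    return max(ets(*contingency(votes, truth, thr)) for thr in set(votes))


def seds(h: int, m: int, f: int, c: int) -> float:
    """Symmetric extreme dependency score; requires at least one hit."""
    n = h + m + f + c
    return (math.log((h + f) / n) + math.log((h + m) / n)) / math.log(h / n) - 1.0
```

Because all three scores depend only on how the forecast values rank events against nonevents at each threshold, they can be applied unchanged to vote counts, likelihoods, or reflectivity.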

d. Predictor field optimization

In the first optimization step, redundant predictors were removed in order to reduce the amount of unnecessary information going into the training set. Highly correlated or redundant predictors, like CAPE and lifted index or 2-m dewpoint and 2-m specific humidity, were compared. It was found that the better variable to use depended on the number of variables in the training set. Lifted index was selected more often than CAPE when the suite was limited to five or fewer predictors (before step 15 in Fig. 4), but for more than five predictors, CAPE was selected more often, meaning that it was more valuable in combination with the other predictors in the larger set. Likewise, specific humidity was preferred when the number of predictor variables was small, but dewpoint worked better when a greater number of predictors were used. Since our final predictor suite has a larger number of predictors, CAPE and dewpoint were retained.

In the second set of predictor optimization experiments, the impact of predictor field smoothing was explored. Each of the remaining HRRR forecast fields was smoothed with circular filters with radii ranging from 10 to 80 km. It was found that using a 40-km circular smoothing filter resulted in the best skill scores. Figure 5 shows ROC curves obtained for RF predictions that were based on HRRR data only at raw resolution versus those obtained using a 40-km circular smoothing filter. There is no overlap between the 10 curves obtained using raw resolution and those obtained using a 40-km filter for hit rates between 0.3 and 0.9, indicating that the improved probability of detection associated with the smoothing is significant. The average AUC increased from 0.84 to 0.86 and the maximum ETS increased from 0.33 to 0.37 (Table 2). Both increases were large relative to the standard deviation across the 10 training sets, further indicating the significance of this result.
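A circular smoothing filter of this kind can be illustrated as follows. The sketch assumes a uniform (unweighted) mean over all grid points within the given radius and averages only in-domain points near the edges; neither choice is specified in the text. The function name and the `dx_km` grid-spacing argument are hypothetical.

```python
from typing import List


def circular_smooth(field: List[List[float]], radius_km: float, dx_km: float) -> List[List[float]]:
    """Replace each grid value with the mean of all values within radius_km.

    Assumptions (not from the paper): uniform weights inside the disk, and
    points outside the domain are simply left out of the average.
    """
    r = radius_km / dx_km  # filter radius in grid lengths
    reach = int(r)
    # Precompute the grid offsets that fall inside the disk.
    offsets = [
        (dj, di)
        for dj in range(-reach, reach + 1)
        for di in range(-reach, reach + 1)
        if dj * dj + di * di <= r * r
    ]
    ny, nx = len(field), len(field[0])
    smoothed = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            total, count = 0.0, 0
            for dj, di in offsets:
                jj, ii = j + dj, i + di
                if 0 <= jj < ny and 0 <= ii < nx:
                    total += field[jj][ii]
                    count += 1
            smoothed[j][i] = total / count
    return smoothed
```

For example, `circular_smooth(cape, radius_km=40.0, dx_km=3.0)` would apply a 40-km filter to a field on a grid with 3-km spacing (an illustrative value). A production implementation would more likely use an array library with a convolution routine than nested Python loops.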

The final predictor optimization step was designed to assess the impact of observation-based variables on the skill of the RF forecasts. The value of adding radar reflectivity and 13.3–10-μm SBTD was assessed both

FIG. 5. False alarm rate vs hit rate (ROC curve) using unsmoothed HRRR (blue) and smoothed HRRR (red). The two sets (unsmoothed and smoothed HRRR) of 10 RFs were trained on even days and tested on odd days using summer (JJA) 2011 data. The predictor fields used are listed in Table 1. Note that lifted index (34LFTX_SPDL) and specific humidity (SPFH_HTGL) were omitted from the training set based on analyses described in the text.


individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

3. Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors and the best maximum ETS was with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case, the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.
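The default rule and the random draw of candidate predictors at a node can be illustrated with a short sketch. The function names are hypothetical and do not correspond to the interface of the software cited above; only the integer-square-root rule comes from the text.

```python
import math
import random
from typing import List, Sequence


def default_candidates_per_split(n_predictors: int) -> int:
    """Integer part of the square root of the number of predictors."""
    return math.isqrt(n_predictors)


def draw_split_candidates(predictors: Sequence[str], k: int, rng: random.Random) -> List[str]:
    """Randomly pick k distinct predictors to consider at one tree node."""
    return rng.sample(list(predictors), k)


n_default = default_candidates_per_split(19)  # 4, as stated in the text
```

With the setting adopted here, each node would instead draw `k = 2` candidates, for example `draw_split_candidates(names, 2, random.Random(0))` for some list of predictor names.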

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best ratio was between 20% and 40%. The best AUC was achieved with 40% events, and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event ratio of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.
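One simple way to rebalance a training set to a target event fraction is to keep every event and randomly subsample the nonevents. The sketch below illustrates that idea only; the text does not say how the rebalancing was carried out here, and the function name and arguments are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rebalance(events: Sequence[T], nonevents: Sequence[T],
              event_fraction: float, seed: int = 0) -> List[T]:
    """Keep all events and subsample nonevents so events make up event_fraction.

    Assumed strategy (not from the paper): undersample the majority class.
    """
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must be strictly between 0 and 1")
    n_nonevents = round(len(events) * (1.0 - event_fraction) / event_fraction)
    if n_nonevents > len(nonevents):
        raise ValueError("not enough nonevents to reach the requested fraction")
    rng = random.Random(seed)
    return list(events) + rng.sample(list(nonevents), n_nonevents)
```

For an event fraction of 0.3, every 30 events are paired with 70 randomly chosen nonevents, whereas at the climatological frequency of 0.3% the same 30 events would be accompanied by roughly 10 000 nonevents.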

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, $p = V/200$, where $p$ is the likelihood of an MCS-I event and $V$ is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.
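As a small worked illustration of the linear transform from vote count to likelihood (the function name is ours):

```python
N_TREES = 200  # forest size used for the real-time forecasts


def votes_to_likelihood(votes: int, n_trees: int = N_TREES) -> float:
    """Uncalibrated likelihood: the fraction of trees that voted for MCS-I."""
    if not 0 <= votes <= n_trees:
        raise ValueError("vote count must lie between 0 and the number of trees")
    return votes / n_trees
```

Under this transform, a likelihood threshold of 0.1 corresponds to at least 20 of the 200 trees voting for MCS-I.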

TABLE 2. Skill scores for predictor optimization experiments.

| Predictors | AUC | Std dev of AUC | Max ETS | Std dev of ETS |
|---|---|---|---|---|
| Raw HRRR (17) + local solar time (LST) | 0.84 | 0.004 | 0.33 | 0.008 |
| Smoothed HRRR + LST | 0.86 | 0.003 | 0.37 | 0.011 |
| Smoothed HRRR + LST + SBTD | 0.87 | 0.003 | 0.38 | 0.010 |
| Smoothed HRRR + LST + RadarREFC | 0.88 | 0.003 | 0.40 | 0.007 |
| Smoothed HRRR + LST + RadarREFC + SBTD | 0.88 | 0.004 | 0.40 | 0.010 |

¹ Note that the predictions are not actually probabilities, since no attempt has been made to calibrate the predicted likelihood values.


Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts, such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.


The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.
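The construction of ROC points from a deterministic reflectivity field can be sketched as below. This is an illustration under assumptions: a "yes" forecast is taken to mean reflectivity at or above the threshold, forecasts and observations are assumed to be already matched point by point, the sample must contain at least one event and one nonevent, and the function name is ours.

```python
from typing import Iterable, List, Sequence, Tuple


def roc_points_from_reflectivity(
    forecast_dbz: Sequence[float],
    observed_event: Sequence[bool],
    thresholds: Iterable[int] = range(0, 70, 5),  # 0, 5, ..., 65 dBZ
) -> List[Tuple[int, float, float]]:
    """Return (threshold, FA rate, hit rate) for each reflectivity threshold."""
    points = []
    for thr in thresholds:
        hits = misses = false_pos = correct_null = 0
        for dbz, event in zip(forecast_dbz, observed_event):
            yes = dbz >= thr  # assumed decision rule
            if yes and event:
                hits += 1
            elif event:
                misses += 1
            elif yes:
                false_pos += 1
            else:
                correct_null += 1
        fa_rate = false_pos / (false_pos + correct_null)
        hit_rate = hits / (hits + misses)
        points.append((thr, fa_rate, hit_rate))
    return points
```

The same loop applies to the informed climatology if the thresholds are replaced by MCS-I frequencies from 0 to 0.05 in steps of 0.005.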

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88, from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.


RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.


reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

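TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In the Region column, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

| Forecast | Region | Hit rate for FA rate = 0.1 | AUC | Max SEDS |
|---|---|---|---|---|
| RF | eUS | 0.31 | 0.76 | 0.19 |
| RF | SE | 0.27 | 0.73 | 0.17 |
| RF | GP | 0.28 | 0.75 | 0.18 |
| HRRR reflectivity | eUS | 0.24 | 0.59 | 0.14 |
| HRRR reflectivity | SE | 0.21 | 0.57 | 0.13 |
| HRRR reflectivity | GP | 0.23 | 0.60 | 0.14 |
| Extrapolated reflectivity | eUS | 0.20 | 0.55 | 0.11 |
| Extrapolated reflectivity | SE | 0.14 | 0.52 | 0.06 |
| Extrapolated reflectivity | GP | 0.19 | 0.53 | 0.11 |
| Observed climatological MCS-I frequency | eUS | 0.22 | 0.61 | 0.15 |
| Observed climatological MCS-I frequency | SE | 0.21 | 0.65 | 0.13 |
| Observed climatological MCS-I frequency | GP | 0.12 | 0.52 | 0.03 |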


increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FAs (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.


size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

| Class | Description | Comment | Region of most common occurrence |
|---|---|---|---|
| 1 | RF probabilities are always elevated in areas of ongoing MCSs | Use RF along with current radar observations to disregard these areas | Entire domain |
| 2 | RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event | RF provides an early indication of regions where MCS-I is possible in the next 2–4 h | Southeastern United States |
| 3 | Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria | Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments | Southeastern United States |


be used as guidance for the short-term prediction of discrete, high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 550 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I

events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF-forecasted likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product using the locations of existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts because it provides advance notice of the potential for an MCS-I event.

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011,

596 WEATHER AND FORECAST ING VOLUME 31

Unauthenticated | Downloaded 051722 0650 PM UTC

while the ROC curves shown in Fig 9 were obtained

using RFrsquos trained using 2011 data and used to make

forecasts employing 2013 data In fact an ideal approach

for operational use would be to retrain the RF each day

using the latest available datasets To do this one would

have to determine the ideal length for the period of re-

cent data used to train the RF There are trade-offs one

must consider in doing this one would like the training

period to be long enough to capture the full range of

conditions that lead to the event occurring while at the

same time one would like the training period to be short

enough for the RF to respond to changes in the skill of

the predictor fields (eg as a result of changing NWP

models or evolving weather regimes) This might be

accomplished by sampling a training set more heavily

from instances that occurred near the current Julian

date and more from the current convective season than

from previous years

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
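One way to picture this calibration is a reliability table: forecast likelihoods are binned, and each bin is mapped to the event frequency actually observed for that bin. The following is a minimal sketch under stated assumptions; the function names, the number of bins, and the toy data are illustrative and are not taken from the study.

```python
def reliability_table(forecasts, outcomes, n_bins=5):
    """Bin forecast likelihoods (0-1) and return, per bin,
    (lower_edge, upper_edge, case_count, observed_frequency)."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for p, occurred in zip(forecasts, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[k] += 1
        events[k] += int(occurred)
    return [
        (k / n_bins, (k + 1) / n_bins, counts[k],
         events[k] / counts[k] if counts[k] else None)
        for k in range(n_bins)
    ]


def calibrate(p, table):
    """Replace a raw likelihood with the observed frequency of its bin.
    Bins with no cases leave the value unchanged."""
    k = min(int(p * len(table)), len(table) - 1)
    observed = table[k][3]
    return p if observed is None else observed


# Toy, made-up data: high likelihoods verify only one time in four (a high bias).
raw = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
obs = [1, 0, 0, 0, 0, 0]
table = reliability_table(raw, obs)
calibrated = [calibrate(p, table) for p in raw]
```

In this toy case the calibrated values span a narrower range than the raw likelihoods, which is the loss of dynamic range noted in the text.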

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and of Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 2.4. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 13.6. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.

Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving cumulus in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 6.3. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 3.6. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.

Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 1.2. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J1.4. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


Dattatreya 2009), and an RF is an ensemble of weakly correlated decision trees (Breiman 2001). Collectively, the trees function as an ensemble of "experts," each casting a vote for whether MCS-I will occur (e.g., Fig. 2). All of the nodes of a decision tree can be reduced to simple rules of the form: if predictor P is x or less (where x is any number), then follow branch A; otherwise, follow branch B. A predictor may be used at multiple nodes in the same tree. Each branch will either lead to another node or terminate with a "yes" or "no" vote. When a decision tree is being trained, the algorithm finds a predictor and a threshold that "splits" the training data instances that reach a node into two subsets in a way that maximizes the homogeneity of the subsets with respect to the predictand, for example, by minimizing the "Gini impurity" (Breiman 1996).

FIG. 1. VIL and associated MCS detections at 1900 UTC 5 Aug 2013. The large black rectangle denotes the geographical extent of data points used in the RF training suite. The dark gray regions have no radar coverage and are not used. The black outline encircles an ongoing MCS that initiated previously, while the magenta contours encircle newly initiated MCSs (MCS-I). The outlines are obtained by extending outward each vertex of the MCS by 125 km.

FIG. 2. Given all the predictor values at a point (i.e., a specific location and time), each tree in the RF votes on the outcome. Together, the trees act as an ensemble of experts. Each tree is different because 1) each one is trained with a bootstrapped sample of the full training suite and 2) at each node, symbolized by the blue circles, only a subset of the original predictors are randomly chosen as candidates for splitting.
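The split search at a single node can be illustrated with a short sketch. It assumes a binary predictand (1 for MCS-I, 0 for a null event) and one candidate predictor; the function names and the toy precipitable-water values are hypothetical and only illustrate the idea of minimizing the weighted Gini impurity of the two subsets.

```python
def gini_impurity(labels):
    """Gini impurity of a list of 0/1 labels: 2 p (1 - p) for two classes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Find the threshold x that minimizes the case-weighted Gini impurity of
    the subsets 'value <= x' and 'value > x'.
    Returns (threshold, weighted_impurity)."""
    n = len(values)
    best = (None, float("inf"))
    # Splitting at the largest value would leave one side empty, so skip it.
    for x in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= x]
        right = [y for v, y in zip(values, labels) if v > x]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        if score < best[1]:
            best = (x, score)
    return best


# Made-up example: precipitable water (kg m^-2) and whether MCS-I followed.
pwat = [10, 20, 30, 40, 50, 60]
initiated = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(pwat, initiated)
```

A full tree repeats this search over every candidate predictor at every node; in an RF, only a random subset of predictors is searched at each node.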

RF trees differ from conventional decision trees in that each RF tree is trained on a bootstrapped sample of the training cases (illustrated by Fig. 3). Additionally, at each node of the tree, only a limited, randomly selected subset of predictors is chosen as candidates for splitting, whereas standard decision trees consider all predictors as candidates. (The predictor candidates are selected randomly with replacement, so that any predictor may be a candidate at any node.) An implication of bootstrapping is that roughly one-third of the training cases are not used for any given tree, and these "out-of-bag" cases are used as test cases to quantify the importance of each predictor field. Bootstrapping the training cases and ignoring some predictors at each node make individual trees weaker, but these steps also ensure the trees are not strongly correlated with each other. Thus, the forest is less susceptible to overfitting the peculiarities of the training set and can provide probabilistic information. The number of trees and the number of predictors chosen as candidates for splitting at each node are tunable parameters; the sensitivity of predictive performance to them is tested below.
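The "roughly one-third" figure follows from the bootstrap itself: when n cases are drawn with replacement from a set of n, the chance that a given case is never drawn is (1 − 1/n)^n, which approaches e^−1 ≈ 0.37 for large n. The sketch below checks this numerically; the set size and random seed are arbitrary choices made for illustration.

```python
import math
import random


def out_of_bag_fraction(n_cases, rng):
    """Draw n_cases indices with replacement and return the fraction of
    cases that were never drawn (the out-of-bag fraction)."""
    drawn = {rng.randrange(n_cases) for _ in range(n_cases)}
    return 1.0 - len(drawn) / n_cases


n = 10000  # arbitrary illustrative training-set size
expected = (1.0 - 1.0 / n) ** n  # probability a given case is never drawn
limit = math.exp(-1.0)           # large-n limit of the expression above
simulated = out_of_bag_fraction(n, random.Random(0))
```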

The RF has several advantages over other data mining techniques. For one, the empirical model created from the RF ensemble does not require the predictors to be monotonically related to the predictand, meaning that it can represent a variety of functional relationships. Alternative techniques, like logistic regression, are inherently linear. The decision trees in the RF are also human readable, such that the relationships between the data and how they were used to make a prediction can be explored.

In addition to their predictive capabilities, RFs can rank the importance of individual predictors (Breiman 2001; Topic et al. 2014). A predictor's importance is quantified by scrambling its values in the out-of-bag training cases for each tree and seeing how much the classification accuracy of the RF goes down. For example, the expected importance of a random variable is zero. The importance value is often scaled by dividing it by a quantity akin to its standard error (e.g., see supplemental material for Díaz-Uriarte and de Andrés 2006). Importance scores provide a helpful starting point for comparing the potential contributions of different variables and selecting a small but skillful subset of predictors.
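The scrambling idea can be sketched in a few lines. This is a simplified illustration, not the per-tree out-of-bag procedure used by an RF: a single hypothetical classifier (`predict`) stands in for the forest, and the two-predictor toy dataset is made up so that one predictor carries all the signal and the other is pure noise.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of cases for which the classifier matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, column, rng):
    """Drop in accuracy after shuffling one predictor column across cases."""
    baseline = accuracy(predict, rows, labels)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    scrambled = [r[:column] + (v,) + r[column + 1:]
                 for r, v in zip(rows, shuffled)]
    return baseline - accuracy(predict, scrambled, labels)


rng = random.Random(1)
# Column 0 determines the label; column 1 is unrelated noise.
rows = [(rng.random(), rng.random()) for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]


def predict(row):
    """Hypothetical stand-in classifier that uses only column 0."""
    return int(row[0] > 0.5)


signal_importance = permutation_importance(predict, rows, labels, 0, rng)
noise_importance = permutation_importance(predict, rows, labels, 1, rng)
```

Scrambling the noise column leaves the accuracy unchanged, so its importance is zero, consistent with the statement above about a random variable.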

1) TRAINING

The RF is trained to use predictors available at a given time to forecast the occurrence of MCS-I 2 h in the future. To create the RF training suite, predictor values and associated MCS-I "truth" values from 2 h later were interpolated onto a cylindrical grid covering 25°–48°N and 125°–67°W with horizontal spacing of about 4 km (0.036° latitude and 0.038° longitude). The geographical coverage of data points used in the training suite is shown in Fig. 1. Points over the Atlantic Ocean and parts of Canada and Mexico that are beyond the WSR-88D radar network coverage were not included. The analysis was done using data available at the top of each hour.

FIG. 3. Example of bootstrap sampling from a hypothetical set of 26 cases, a–z. Twenty-six cases are randomly selected with replacement to create a 26-element set T. Cases may be selected multiple times or not at all. This process is repeated for each tree. Those cases not selected are called out-of-bag cases and are used to assess predictor importance.

Over the 3-month period from June through August 2011, there were over 200 million potential data points. Even though there were many cases to choose from, most of them were null events (no MCS-I). Even in the most MCS-I-prone geographical regions in the United States, MCS-I events occur only 3% of the time (Pinto et al. 2015). The average MCS-I frequency for the entire domain is only 0.3%. This rarity makes MCS-I a difficult predictand for any statistical forecast algorithm to handle. One can achieve 99.7% accuracy by always predicting a null event at every grid point. To help the RF algorithm discriminate between MCS-I and non-MCS-I cases, the MCS-I cases are oversampled such that they make up 30% of the training set. This artificial increase in the proportion of events in the training set can be accounted for in the RF vote calibration phase.
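A training set with a prescribed event fraction can be drawn as in the sketch below. This is only one plausible way to oversample; the text does not state how the sampling was implemented, and the function name, the set size, and the synthetic 0.3% event rate are assumptions made for illustration.

```python
import random


def draw_training_set(cases, labels, n_total, event_fraction, rng):
    """Draw n_total (case, label) pairs so that events (label 1) make up
    event_fraction of the result. Sampling is without replacement when
    enough cases exist, and with replacement otherwise."""
    events = [c for c, y in zip(cases, labels) if y == 1]
    nulls = [c for c, y in zip(cases, labels) if y == 0]
    n_events = round(n_total * event_fraction)
    n_nulls = n_total - n_events

    def pick(pool, k):
        return rng.sample(pool, k) if k <= len(pool) else rng.choices(pool, k=k)

    chosen = ([(c, 1) for c in pick(events, n_events)]
              + [(c, 0) for c in pick(nulls, n_nulls)])
    rng.shuffle(chosen)
    return chosen


# Synthetic population in which about 0.3% of cases are events.
cases = list(range(100000))
labels = [1 if i % 333 == 0 else 0 for i in cases]
training = draw_training_set(cases, labels, 1000, 0.30, random.Random(0))
event_share = sum(y for _, y in training) / len(training)
```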

The RF parameter sensitivity tests and predictor importance analyses were conducted using 10 disjoint training sets: 5 sets of 18 000 cases each were drawn randomly without replacement from odd days, and another 5 sets were drawn similarly from even days. The standard deviation of skill over these 10 training sets provides a means for assessing the relative significance of differences in mean skill score when the RF parameters are changed. While one standard deviation is not a particularly stringent requirement, there is only a 2.2% chance that the mean of 10 samples will differ by more than one standard deviation from the mean of another 10 samples drawn from the same population. Selecting the sets from even or odd days also permits testing a model trained on even days against independent data from odd days, and vice versa.

In general, one wants as many training cases as possible to fully sample the general population of weather scenarios. On the other hand, given finite resources, one must limit the number of cases. We found that 18 000 cases allowed for efficient training of the RF while fully sampling the parameter space. This number of cases is actually quite large compared to other recent studies. For example, McGovern et al. (2011) successfully trained an RF to predict atmospheric turbulence with only 2055 cases, and Mecikalski et al. (2015) predicted the onset of radar reflectivity ≥ 35 dBZ at the −10°C level (i.e., a formal definition for convective initiation) with only 9015 cases. While the number of cases is relatively large, it is important to note that a higher number of training examples is often needed when the event is rare, so that both verifying classes are sufficiently sampled.

2) PREDICTOR SELECTION

As a proof of concept, the RF was first trained with a select group of diagnostic fields obtained from the HRRR. Later, the value of adding observational predictor fields was explored. Diagnostic output from the HRRR included 17 two-dimensional fields deemed to be relevant for the prediction of MCS-I (Table 1). Environmental factors that contribute to the development of MCSs are discussed in Houze (2004). Undoubtedly, there are other fields that may be derived from the full three-dimensional HRRR dataset that would have potential for adding value to the prediction of MCS-I (e.g., vertical wind shear), but for simplicity we limited our training sets to fields available within the HRRR two-dimensional data stream. In addition, local solar time was added as a predictor field as a simple way to account for differing mechanisms responsible for daytime and nocturnal MCS-I.

As noted by Hall et al. (2011), "one of the most effective ways to select features that are predictive of some phenomena is manually based on subject matter expertise." Thus, each variable available in the HRRR 2D data stream that we thought might be of value in the prediction of MCS-I was evaluated. Hall et al. (2011) also note that, while the RF was designed to effectively utilize large numbers of predictors, it can be susceptible to noise from extraneous or redundant features. To reduce the number of predictors used in the training sets, we implemented a method that systematically determines which predictor fields should be retained, allowing some of the correlated predictor fields to be eliminated. As will be discussed below, choosing which predictors to retain depends on the entire set of predictor fields under evaluation. This is particularly true for RFs since, by utilizing decision trees that split on multiple predictors in succession, an RF captures and exploits relationships between the predictors.

TABLE 1. Predictors sorted by mean selection count.

Predictor          Description                                           Unit      Mean selection count
PWAT_EATM          Precipitable water in model column                    kg m⁻²    50.4
PRES_SFC           Surface pressure (a proxy for terrain height)         hPa       49.0
Local solar time   UTC hour + (°E)/(15° h⁻¹)                             h         45.9
REFC_EATM          Max reflectivity in model column                      dBZ       40.8
TSOIL_SFC          Soil temperature at the surface                       K         38.3
CAPE_SFC           CAPE of surface parcel                                J kg⁻¹    34.8
HPBL_SFC           Height of planetary boundary layer                    m         28.3
RH_HTGL            2-m relative humidity                                 %         28.0
DZDT1Hr_SIGL       Vertical velocity in 900–700-hPa layer                m s⁻¹     26.0
APCP1Hr_SFC        Accumulated model precipitation                       mm        24.6
DSWRF_SFC          Downward shortwave radiation flux at surface          W m⁻²     23.2
ULWRF_NTAT         Upward longwave radiation flux at top of atmosphere   W m⁻²     22.0
SHTFL_SFC          Sensible heat flux at surface                         W m⁻²     20.9
LHTFL_SFC          Latent heat flux at surface                           W m⁻²     20.0
DPT_HTGL           2-m dewpoint                                          K         18.4
34LFTX_SPDL (a)    Best (four layer) lifted index                        K         15.9
SPFH_HTGL (a)      Specific humidity                                     g kg⁻¹    15.2
CIN_SFC            CIN of surface parcel                                 J kg⁻¹    9.3

(a) These two predictors were removed after the mean selection count analysis for reasons described in the text.

FIG. 4. Cumulative results of 10 predictor selection trials for 2-h forecasts obtained using RFs of varying sizes. Predictors (see Table 1) are listed along the y axis, and the selection steps increase to the right. Every three steps, two predictors are added to the forest and one is removed, so the size of the predictor suite increases by one. The colors indicate the number of times (summed over 10 trials) a predictor was selected in the predictor suite after that step. By the 52nd step, all 18 predictors were used.

A predictor selection trial was performed using a series of two forward selection steps and one backward elimination step. At each forward selection step, all unselected predictors were tested individually as candidates for retention by joining them to the predictors already selected and evaluating the resulting RF's predictive skill on an independent testing set. The predictor whose inclusion made the RF most skillful was retained for the next step. After two forward selection steps, all the retained variables were tested to see which one's removal caused the smallest drop in the RF's skill (backward elimination). The predictor associated with the smallest drop in skill was then removed from the retained variable group and added back into the group of unselected predictor fields. This process was repeated until all 18 variables were retained. Each predictor selection trial was repeated 10 times with the different training sets, with training on odd days and testing on even days, and then vice versa. Figure 4 summarizes the results obtained using 10 trials. After step 1, model reflectivity (REFC_EATM) was retained for 9 out of 10 trials and model precipitable water (PWAT_EATM) was retained once. After step 2, the most frequently retained variables were model reflectivity (9 out of 10 trials) and model precipitable water (9 out of 10 trials), but model surface pressure (PRES_SFC, which is indicative of terrain height) and model lifted index (34LFTX_SPDL) were also retained once. The average number of steps per trial for which a predictor was retained is given in Table 1. The results suggest that the presence of a deep column of water vapor is important for MCS-I, given that model precipitable water was the most frequently retained predictor (50.4 steps per trial). Fixed parameters such as solar time and surface pressure, which is a proxy for terrain height, were also retained quite often, indicating the importance of temporal and geographic regimes. On the other hand, model convective inhibition (CIN_SFC) was retained least often (9.3 steps per trial). It is unclear why CIN_SFC seems to have lower importance in the training set; however, it is retained owing to previous reports of its importance in the prediction of MCS-I (e.g., Jirak and Cotton 2007).
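The selection procedure described in this section (two forward selection steps followed by one backward elimination step, repeated until every predictor is retained) can be sketched as follows. The `skill` callable is hypothetical: in practice it would train an RF on the given predictor subset and score it on an independent test set, whereas here a made-up additive score and shortened predictor names stand in for it.

```python
def select_predictors(predictors, skill):
    """Alternate two forward-selection steps with one backward-elimination
    step until all predictors are retained. Returns the retained list after
    every step."""
    retained = []
    history = []
    step = 0
    while len(retained) < len(predictors):
        step += 1
        if step % 3 != 0:
            # Forward step: add the unselected predictor giving the best skill.
            candidates = [p for p in predictors if p not in retained]
            best = max(candidates, key=lambda p: skill(retained + [p]))
            retained = retained + [best]
        else:
            # Backward step: remove the predictor whose removal costs the least.
            least = max(retained,
                        key=lambda p: skill([q for q in retained if q != p]))
            retained = [q for q in retained if q != least]
        history.append(list(retained))
    return history


# Made-up additive skill contributions, for illustration only.
weights = {"REFC": 0.5, "PWAT": 0.4, "PRES": 0.3, "CIN": 0.1}


def toy_skill(subset):
    return sum(weights[p] for p in subset)


history = select_predictors(list(weights), toy_skill)
# Number of steps for which each predictor was retained in this single trial.
selection_count = {p: sum(p in kept for kept in history) for p in weights}
```

Averaging `selection_count` over several trials gives a quantity analogous to the mean selection count in Table 1.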

3) SCORING AND EVALUATION

The primary objective measure used to assess the performance of the probabilistic RF 2-h MCS-I forecasts was the area under the receiver operating characteristic (ROC) curve (AUC; Marzban 2004). ROC curves (Fig. 5) are obtained by finding the relationship between the hit rate [Hits/(Hits + Misses)] and the false alarm (FA) rate [False_Positive/(False_Positive + Correct_Null)] across the range of thresholds available within the prediction (e.g., RF vote counts or likelihood values). AUC has a long history in evaluating machine learning algorithms. An AUC value of one is indicative of a perfect forecast, while an AUC value of 0.5 is indicative of a purely random forecast. We also used the Gilbert skill score, commonly known as the equitable threat score (ETS), as a second metric to evaluate the RF forecasts. In this case, we took the maximum ETS value over all RF vote thresholds. Both AUC and maximum ETS can be used to compare RF performance to that of other forecasts, even if they have different units or are calibrated differently (Wilks 2006). Finally, we used the symmetric extreme dependency score (SEDS) to evaluate and intercompare the performance of the RF and other short-term forecasts in the real-time prediction of MCS-I observed during a 6-week period in 2013. The SEDS, which is described by Hogan et al. (2009), is an equitable skill score designed to more effectively evaluate the performance of forecasts of infrequently occurring events such as MCS-I.
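For reference, these scores can be computed from a 2 × 2 contingency table as in the sketch below. The ETS and SEDS expressions follow their standard definitions (the SEDS form is that of Hogan et al. 2009); the function names, the trapezoidal AUC helper, and the counts in the example are illustrative assumptions, not values from the study.

```python
import math


def scores(hits, misses, false_alarms, correct_nulls):
    """Hit rate, FA rate, ETS, and SEDS from contingency-table counts.
    SEDS requires at least one hit."""
    n = hits + misses + false_alarms + correct_nulls
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_nulls)
    # Hits expected by chance for a random forecast with the same marginals.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    seds = ((math.log((hits + false_alarms) / n)
             + math.log((hits + misses) / n)) / math.log(hits / n)) - 1.0
    return {"hit_rate": hit_rate, "fa_rate": fa_rate, "ets": ets, "seds": seds}


def roc_auc(points):
    """Trapezoidal area under ROC points given as (fa_rate, hit_rate) pairs.
    The end points (0, 0) and (1, 1) are added automatically."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Made-up counts for a single vote threshold.
s = scores(hits=30, misses=10, false_alarms=60, correct_nulls=900)
auc_one_point = roc_auc([(s["fa_rate"], s["hit_rate"])])
```

Evaluating `scores` at every vote threshold yields the full set of ROC points for `roc_auc`, and the largest `ets` over those thresholds corresponds to the maximum ETS used in the text.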

d. Predictor field optimization

In the first optimization step, redundant predictors were removed in order to reduce the amount of unnecessary information going into the training set. Highly correlated or redundant predictors, like CAPE and lifted index or 2-m dewpoint and 2-m specific humidity, were compared. It was found that the better variable to use depended on the number of variables in the training set. Lifted index was selected more often than CAPE when the suite was limited to five or fewer predictors (before step 15 in Fig. 4), but for more than five predictors, CAPE was selected more often, meaning that it was more valuable in combination with the other predictors in the larger set. Likewise, specific humidity was preferred when the number of predictor variables was small, but dewpoint worked better when a greater number of predictors were used. Since our final predictor suite has a larger number of predictors, CAPE and dewpoint were retained.

In the second set of predictor optimization experiments, the impact of predictor field smoothing was explored. Each of the remaining HRRR forecast fields was smoothed with circular filters with radii ranging from 10 to 80 km. It was found that using a 40-km circular smoothing filter resulted in the best skill scores. Figure 5 shows ROC curves obtained for RF predictions that were based on HRRR data only at raw resolution versus those obtained using a 40-km circular smoothing filter. There is no overlap between the 10 curves obtained using raw resolution and those obtained using a 40-km filter for hit rates between 0.3 and 0.9, indicating that the improved probability of detection associated with the smoothing is significant. The average AUC increased from 0.84 to 0.86, and the maximum ETS increased from 0.33 to 0.37 (Table 2). Both increases were large relative to the standard deviation across the 10 training sets, further indicating the significance of this result.
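A circular smoothing filter of this kind can be sketched as a moving average over a disk. The exact filter weights used in the study are not given here, so the uniform weighting, the edge handling, and the function name below are assumptions. With the roughly 4-km grid spacing described earlier, a 40-km radius would correspond to about 10 grid cells.

```python
def circular_mean_filter(field, radius_cells):
    """Replace each grid value with the unweighted mean of all values whose
    grid distance from it is at most radius_cells. Near the edges, only the
    cells that exist are averaged."""
    n_rows, n_cols = len(field), len(field[0])
    r = int(radius_cells)
    offsets = [(di, dj)
               for di in range(-r, r + 1)
               for dj in range(-r, r + 1)
               if di * di + dj * dj <= radius_cells * radius_cells]
    smoothed = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            values = [field[i + di][j + dj]
                      for di, dj in offsets
                      if 0 <= i + di < n_rows and 0 <= j + dj < n_cols]
            smoothed[i][j] = sum(values) / len(values)
    return smoothed


# Toy 5 x 5 field with a single nonzero value at the center.
field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 1.0
out = circular_mean_filter(field, 1)
```

This direct implementation costs on the order of the number of grid points times the number of cells in the disk; for large grids a convolution routine from a numerical library would be the more practical choice.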

The final predictor optimization step was designed to assess the impact of observation-based variables on the skill of the RF forecasts. The value of adding radar reflectivity and 13.3–10-μm SBTD was assessed both

FIG. 5. False alarm rate vs hit rate (ROC curve) using unsmoothed HRRR (blue) and smoothed HRRR (red). The two sets (unsmoothed and smoothed HRRR) of 10 RFs were trained on even days and tested on odd days using summer (JJA) 2011 data. The predictor fields used are listed in Table 1. Note that lifted index (34LFTX_SPDL) and specific humidity (SPFH_HTGL) were omitted from the training set based on analyses described in the text.


individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

3. Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with a greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors, and the best maximum ETS was with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case, the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.
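
As a small worked illustration of the default described above, the sketch below computes the integer part of the square root of the predictor count and then draws a random subset of candidate predictors for one node. The function names and the placeholder predictor names are hypothetical, and this is not the interface of the software cited in the text.

```python
import math
import random

def default_candidate_count(n_predictors):
    """Integer part of the square root of the number of predictors."""
    return math.isqrt(n_predictors)

def draw_candidates(predictors, k, rng):
    """Pick k distinct predictors to consider at one tree node."""
    return rng.sample(predictors, k)

predictors = [f"predictor_{i}" for i in range(19)]  # placeholder names
print(default_candidate_count(len(predictors)))      # 4
print(draw_candidates(predictors, 2, random.Random(0)))
```

With 19 predictors the default is 4; the second call mimics the two-candidate setting adopted for the remaining experiments.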

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best ratio was between 20% and 40%. The best AUC was achieved with 40% events, and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event ratio of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.
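
One way to picture the rebalancing discussed above is the following sketch, which keeps every event and randomly subsamples nonevents until events make up a requested fraction of the training set. Whether the original study subsampled nonevents, oversampled events, or used some other scheme is not stated in this section, so the approach and the function name are assumptions.

```python
import random

def rebalance(events, nonevents, event_fraction, seed=0):
    """Keep every event and subsample nonevents to reach event_fraction."""
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must be between 0 and 1")
    # Number of nonevents needed so that events / total == event_fraction.
    n_non = round(len(events) * (1.0 - event_fraction) / event_fraction)
    n_non = min(n_non, len(nonevents))  # cannot draw more than exist
    rng = random.Random(seed)
    return list(events), rng.sample(list(nonevents), n_non)
```

For example, 30 events with `event_fraction=0.30` would be paired with 70 sampled nonevents, giving a 100-sample training set.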

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, $p = V/200$, where $p$ is the likelihood of an MCS-I event and $V$ is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.
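
The linear transform from vote count to likelihood can be written as a one-line function. The sketch below assumes a 200-tree forest as stated in the text; the function name is hypothetical.

```python
def vote_likelihood(votes, n_trees=200):
    """Convert a forest vote count V into an uncalibrated likelihood p = V / n_trees."""
    if not 0 <= votes <= n_trees:
        raise ValueError("votes must lie between 0 and n_trees")
    return votes / n_trees
```

Under this transform, a likelihood threshold of 0.1 corresponds to at least 20 of the 200 trees voting for MCS-I.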

TABLE 2. Skill scores for predictor optimization experiments.

| Predictors | AUC | Std dev of AUC | Max ETS | Std dev of ETS |
|---|---|---|---|---|
| Raw HRRR (17) + local solar time (LST) | 0.84 | 0.004 | 0.33 | 0.008 |
| Smoothed HRRR + LST | 0.86 | 0.003 | 0.37 | 0.011 |
| Smoothed HRRR + LST + SBTD | 0.87 | 0.003 | 0.38 | 0.010 |
| Smoothed HRRR + LST + RadarREFC | 0.88 | 0.003 | 0.40 | 0.007 |
| Smoothed HRRR + LST + RadarREFC + SBTD | 0.88 | 0.004 | 0.40 | 0.010 |

¹ Note that the predictions are not actually probabilities since no attempt has been made to calibrate the predicted likelihood values.


Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.
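
For readers who want to reproduce the categorical scores named above, the following sketch computes hit rate, false alarm rate, ETS, and SEDS from a 2 x 2 contingency table. The formulas are the commonly cited forms (SEDS as given by Hogan et al. 2009) and are assumed here to match the definitions used earlier in the paper; the function name and the dictionary layout are illustrative only.

```python
import math

def contingency_scores(a, b, c, d):
    """Scores from a 2x2 table: a hits, b false alarms, c misses, d correct negatives.

    Assumes a > 0 and that both event and nonevent cases are present.
    """
    n = a + b + c + d
    hit_rate = a / (a + c)
    fa_rate = b / (b + d)
    a_random = (a + b) * (a + c) / n  # hits expected by chance
    ets = (a - a_random) / (a + b + c - a_random)
    seds = (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0
    return {"hit_rate": hit_rate, "fa_rate": fa_rate, "ets": ets, "seds": seds}
```

Both ETS and SEDS are zero when the number of hits equals the number expected by chance, which is why they are preferred over raw hit rate for rare events.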

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.


The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.
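
The thresholding procedure for a deterministic field can be sketched as follows. The code sweeps thresholds from 0 to 65 dBZ in 5-dBZ steps, records the (FA rate, hit rate) pair at each, and integrates the resulting curve with the trapezoidal rule. It assumes point-by-point matching of forecasts and observations and adds the (0, 0) and (1, 1) endpoints before integrating; both choices, and the function names, are assumptions rather than details from the paper.

```python
def roc_points(forecast_dbz, observed, thresholds=range(0, 70, 5)):
    """Return (fa_rate, hit_rate) for each reflectivity threshold."""
    points = []
    for t in thresholds:
        a = b = c = d = 0
        for f, o in zip(forecast_dbz, observed):
            yes = f >= t  # forecast "event" when reflectivity meets the threshold
            if yes and o:
                a += 1  # hit
            elif yes:
                b += 1  # false alarm
            elif o:
                c += 1  # miss
            else:
                d += 1  # correct negative
        hit = a / (a + c) if a + c else 0.0
        fa = b / (b + d) if b + d else 0.0
        points.append((fa, hit))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve, with (0,0) and (1,1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

The same two functions could be applied to a climatological frequency field by passing frequency thresholds (for example 0 to 0.05 in steps of 0.005) in place of the reflectivity thresholds.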

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88 from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.


RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.


reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

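TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

| Score | RF (eUS) | RF (SE) | RF (GP) | HRRR reflectivity (eUS) | HRRR reflectivity (SE) | HRRR reflectivity (GP) | Extrapolated reflectivity (eUS) | Extrapolated reflectivity (SE) | Extrapolated reflectivity (GP) | Observed climatological MCS-I frequency (eUS) | Observed climatological MCS-I frequency (SE) | Observed climatological MCS-I frequency (GP) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hit rate for FA rate = 0.1 | 0.31 | 0.27 | 0.28 | 0.24 | 0.21 | 0.23 | 0.20 | 0.14 | 0.19 | 0.22 | 0.21 | 0.12 |
| AUC | 0.76 | 0.73 | 0.75 | 0.59 | 0.57 | 0.60 | 0.55 | 0.52 | 0.53 | 0.61 | 0.65 | 0.52 |
| Max SEDS | 0.19 | 0.17 | 0.18 | 0.14 | 0.13 | 0.14 | 0.11 | 0.06 | 0.11 | 0.15 | 0.13 | 0.03 |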


increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FA (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.


size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

| Class | Description | Comment | Region of most common occurrence |
|---|---|---|---|
| 1 | RF probabilities are always elevated in areas of ongoing MCSs | Use RF along with current radar observations to disregard these areas | Entire domain |
| 2 | RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event | RF provides an early indication of regions where MCS-I is possible in the next 2–4 h | Southeastern United States |
| 3 | Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria | Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments | Southeastern United States |


be used as guidance for the short-term prediction of discrete high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 550 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF-forecasted likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product in areas with existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts by providing advanced notice for the potential of an MCS-I event.
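
The automatic masking of class 1 false alarms mentioned above could look like the following sketch. It assumes the likelihood field and a Boolean mask of ongoing MCS areas (already including any buffer) are available on the same grid as nested lists; the function name and the choice of fill value are hypothetical.

```python
def mask_ongoing_mcs(likelihood, ongoing_mcs, fill=0.0):
    """Replace likelihoods with `fill` wherever an MCS already exists."""
    return [[fill if active else p for p, active in zip(p_row, m_row)]
            for p_row, m_row in zip(likelihood, ongoing_mcs)]
```

A masked product of this kind would remove elevated values inside existing systems while leaving likelihoods elsewhere untouched.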

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011,


while the ROC curves shown in Fig. 9 were obtained using RFs trained using 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
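
The sampling idea in the preceding paragraph is only outlined in the text, so the sketch below is one hypothetical way to express it: each candidate training date gets a weight that decays with its day-of-year distance from the current date and is boosted if it falls in the current year. The exponential form, the 15-day scale, and the factor-of-2 boost are all illustrative assumptions, not values from the paper.

```python
import math
import random
from datetime import date

def sample_weight(sample_date, today, day_scale=15.0, current_season_boost=2.0):
    """Weight a training date by closeness in day of year and by season."""
    gap = abs(sample_date.timetuple().tm_yday - today.timetuple().tm_yday)
    gap = min(gap, 365 - gap)  # wrap around the year end (ignores leap days)
    weight = math.exp(-gap / day_scale)
    if sample_date.year == today.year:
        weight *= current_season_boost  # favor the current convective season
    return weight

def draw_training_dates(dates, today, k, seed=0):
    """Draw k dates with replacement, favoring recent and seasonally close ones."""
    rng = random.Random(seed)
    weights = [sample_weight(d, today) for d in dates]
    return rng.choices(dates, weights=weights, k=k)
```

For instance, with `today = date(2013, 7, 1)`, a sample from 1 July 2011 would receive weight 1.0 and a sample from 1 July 2013 would receive weight 2.0 under these assumed settings.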

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
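
A minimal sketch of calibration by binning is given below: forecast likelihoods are grouped into equal-width categories, the observed event frequency is computed for each category, and a new likelihood is mapped to the observed frequency of its category. The number of bins and the function names are assumptions; the approach described in Williams (2014) may differ in detail.

```python
def reliability_table(likelihoods, observed, n_bins=10):
    """Observed event frequency in each equal-width likelihood bin (None if empty)."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for p, o in zip(likelihoods, observed):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[k] += 1
        events[k] += 1 if o else 0
    return [events[k] / counts[k] if counts[k] else None for k in range(n_bins)]

def calibrate(p, table):
    """Map a raw likelihood to the observed frequency of its bin."""
    k = min(int(p * len(table)), len(table) - 1)
    return table[k]
```

If forecasts near 0.95 verified only one time in four, `calibrate(0.97, table)` would return 0.25, which illustrates how a high bias compresses the range of calibrated values.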

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 2.4. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 13.6. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growths and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A “weather-ready” view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 6.3. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 3.6. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 1.2. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J1.4. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


Page 5: Probabilistic Forecasts of Mesoscale Convective System

trained, the algorithm finds a predictor and a threshold that “splits” the training data instances that reach a node into two subsets in a way that maximizes the homogeneity of the subsets with respect to the predictand, for example, by minimizing the “Gini impurity” (Breiman 1996).
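The split search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the midpoint choice of candidate thresholds, and the precipitable water values are hypothetical and are not taken from the software used in this study.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    n = len(values)
    candidates = sorted(set(values))
    best = (None, float("inf"))
    # Assumption: try thresholds halfway between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical precipitable water values (kg m^-2) and labels (1 = MCS-I).
pwat = [12.0, 18.0, 25.0, 31.0, 38.0, 44.0]
event = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(pwat, event)
```

In this toy case the two classes separate cleanly, so the chosen threshold leaves both subsets pure and the weighted impurity is zero.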

RF trees differ from conventional decision trees in that each RF tree is trained on a bootstrapped sample of the training cases (illustrated by Fig. 3). Additionally, at each node of the tree, only a limited, randomly selected subset of predictors is chosen as candidates for splitting, whereas standard decision trees consider all predictors as candidates. (The predictor candidates are selected randomly with replacement, so that any predictor may be a candidate at any node.) An implication of bootstrapping is that roughly one-third of the training cases are not used for any given tree, and these “out of bag” cases are used as test cases to quantify the importance of each predictor field. Bootstrapping the training cases and ignoring some predictors at each node make individual trees weaker, but these steps also ensure the trees are not strongly correlated with each other. Thus, the forest is less susceptible to overfitting the peculiarities of the training set and can provide probabilistic information. The number of trees and number of predictors chosen as candidates for splitting at each node are tunable parameters to which predictive performance sensitivity is tested below.
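The "roughly one-third" out-of-bag fraction follows from sampling with replacement: the chance that a given case is never drawn in n draws is (1 − 1/n)^n, which approaches e⁻¹ ≈ 0.368. A minimal sketch, assuming integer case identifiers and a fixed random seed (both hypothetical choices made for illustration):

```python
import random

def bootstrap_sample(cases, rng):
    """Draw len(cases) cases with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.choice(cases) for _ in cases]
    chosen = set(in_bag)
    out_of_bag = [case for case in cases if case not in chosen]
    return in_bag, out_of_bag

rng = random.Random(0)        # fixed seed so the sketch is repeatable
cases = list(range(18000))    # stand-in identifiers for one training set
in_bag, out_of_bag = bootstrap_sample(cases, rng)

oob_fraction = len(out_of_bag) / len(cases)
# Probability that a given case is never drawn in n draws with replacement.
expected_fraction = (1 - 1 / len(cases)) ** len(cases)
```

Repeating the draw for every tree gives each tree its own in-bag and out-of-bag sets, which is what keeps the trees only weakly correlated.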

The RF has several advantages over other data mining techniques. For one, the empirical model created from the RF ensemble does not require the predictors to be monotonically related to the predictand, meaning that it can represent a variety of functional relationships. Alternative techniques like logistic regression are inherently linear. The decision trees in the RF are also human readable, such that relationships between data and how they were used to predict can be explored.

In addition to their predictive capabilities, RFs can rank the importance of individual predictors (Breiman 2001; Topic et al. 2014). A predictor's importance is quantified by scrambling its values in the out-of-bag training cases for each tree and seeing how much the classification accuracy of the RF goes down. For example, the expected importance of a random variable is zero. The importance value is often scaled by dividing it by a quantity akin to its standard error (e.g., see supplemental material for Díaz-Uriarte and de Andrés 2006). Importance scores provide a helpful starting point for comparing the potential contributions of different variables and selecting a small but skillful subset of predictors.
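The scrambling procedure can be illustrated with a toy example. The sketch below is not the implementation of Topic et al. (2014): the `predict` function is a hypothetical stand-in for a trained tree that uses only the first predictor, and the cases are synthetic. It shows why a predictor the model never uses receives an importance of zero.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, column, rng):
    """Drop in accuracy after shuffling one predictor column across the cases."""
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    scrambled = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
    return accuracy(predict, rows, labels) - accuracy(predict, scrambled, labels)

# Hypothetical stand-in for a trained tree: it only looks at column 0.
def predict(row):
    return 1 if row[0] > 0.5 else 0

rng = random.Random(1)
rows = [(rng.random(), rng.random()) for _ in range(2000)]  # (useful, irrelevant)
labels = [predict(r) for r in rows]

importance_useful = permutation_importance(predict, rows, labels, 0, rng)
importance_noise = permutation_importance(predict, rows, labels, 1, rng)
```

In an actual forest the accuracy drop would be computed per tree on that tree's out-of-bag cases and then averaged over trees.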

1) TRAINING

The RF is trained to use predictors available at a given time to forecast the occurrence of MCS-I 2 h in the future. To create the RF training suite, predictor values and associated MCS-I “truth” values from 2 h later were interpolated onto a cylindrical grid covering 25°–48°N and 125°–67°W with horizontal spacing of about 4 km (0.036° latitude and 0.038° longitude). The geographical coverage of data points used in the training suite is shown in Fig. 1. Points over the Atlantic Ocean and parts of Canada and Mexico that are beyond the WSR-88D radar network coverage were not included. The analysis was done using data available at the top of each hour.

FIG. 3. Example of bootstrap sampling from a hypothetical set of 26 cases, a–z. Twenty-six cases are randomly selected with replacement to create a 26-element set T. Cases may be selected multiple times or not at all. This process is repeated for each tree. Those cases not selected are called out-of-bag cases and are used to assess predictor importance.

Over the 3-month period from June through August 2011, there were over 200 million potential data points. Even though there were many cases to choose from, most of them were null events (no MCS-I). Even in the most MCS-I-prone geographical regions in the United States, MCS-I events occur only 3% of the time (Pinto et al. 2015). The average MCS-I frequency for the entire domain is only 0.3%. This rarity makes MCS-I a difficult predictand for any statistical forecast algorithm to handle. One can achieve 99.7% accuracy by always predicting a null event at every grid point. To help the RF algorithm discriminate between MCS-I and non-MCS-I cases, the MCS-I cases are oversampled such that they make up 30% of the training set. This artificial increase in the proportion of events in the training set can be accounted for in the RF vote calibration phase.
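The rarity problem and the rebalancing step can be made concrete with a small sketch. The pool size, the helper name `build_training_set`, and the sampling details are assumptions for illustration; only the 0.3% event frequency, the 30% event fraction, and the 18 000-case set size come from the text.

```python
import random

def build_training_set(events, nulls, n_cases, event_fraction, rng):
    """Sample without replacement so events make up event_fraction of n_cases."""
    n_events = round(n_cases * event_fraction)
    return rng.sample(events, n_events) + rng.sample(nulls, n_cases - n_events)

rng = random.Random(2)
# Hypothetical pool of cases with the domain-average event frequency of 0.3%.
pool = [1] * 6000 + [0] * 1994000
# Accuracy of a forecast that predicts a null event everywhere.
always_null_accuracy = pool.count(0) / len(pool)

events = [i for i, label in enumerate(pool) if label == 1]
nulls = [i for i, label in enumerate(pool) if label == 0]
training_ids = build_training_set(events, nulls, 18000, 0.30, rng)
training_event_fraction = sum(pool[i] for i in training_ids) / len(training_ids)
```

The always-null forecast scores 99.7% accuracy while detecting nothing, which is why accuracy alone is a poor guide here and why the event class is enriched before training.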

The RF parameter sensitivity tests and predictor importance analyses were conducted using 10 disjoint training sets: 5 sets of 18 000 cases each were drawn randomly without replacement from odd days, and another 5 sets were drawn similarly from even days. The standard deviation of skill over these 10 training sets provides a means for assessing the relative significance of differences in mean skill score when the RF parameters are changed. While one standard deviation is not a particularly stringent requirement, there is only a 2.2% chance that the mean of 10 samples will differ by more than one standard deviation from the mean of another 10 samples drawn from the same population. Selecting the sets from even or odd days also permits testing a model trained on even days against independent data from odd days, and vice versa.

In general, one wants as many training cases as possible to fully sample the general population of weather scenarios. On the other hand, given finite resources, one must limit the number of cases. We found that 18 000 cases allowed for efficient training of the RF while fully sampling the parameter space. This number of cases is actually quite large compared to other recent studies. For example, McGovern et al. (2011) successfully trained an RF to predict atmospheric turbulence with only 2055 cases, and Mecikalski et al. (2015) predicted the onset of radar reflectivity ≥ 35 dBZ at the −10°C level (i.e., a formal definition for convective initiation) with only 9015 cases. While the number of cases is relatively large, it is important to note that a higher number of training examples is often needed when the event is rare so that both verifying classes are sufficiently sampled.

2) PREDICTOR SELECTION

As a proof of concept, the RF was first trained with a select group of diagnostic fields obtained from the HRRR. Later, the value of adding observational predictor fields was explored. Diagnostic output from the HRRR included 17 two-dimensional fields deemed to be relevant for the prediction of MCS-I (Table 1). Environmental factors that contribute to the development of MCSs are discussed in Houze (2004). Undoubtedly, there are other fields that may be derived from the full three-dimensional HRRR dataset that would have potential for adding value to the prediction of MCS-I (e.g., vertical wind shear), but for simplicity we limited our training sets to fields available within the HRRR two-dimensional data stream. In addition, local solar time was added as a predictor field as a simple way to account for differing mechanisms responsible for daytime and nocturnal MCS-I.

As noted by Hall et al. (2011), “one of the most effective ways to select features that are predictive of some phenomena is manually based on subject matter expertise.” Thus, each variable available in the HRRR 2D data stream that we thought might be of value in the prediction of MCS-I was evaluated. Hall et al. (2011) also note that while the RF was designed to effectively utilize large numbers of predictors, it can be susceptible to noise from extraneous or redundant features. To reduce the number of predictors used in the training sets, we implemented a method that systematically determines which predictor fields should be retained, allowing some of the correlated predictor fields to be eliminated. As will be discussed below, choosing which predictors to retain depends on the entire set of predictor fields under evaluation. This is particularly true for RFs since, by utilizing decision trees that split on multiple predictors in succession, an RF captures and exploits relationships between the predictors.

TABLE 1. Predictors sorted by mean selection count.

| Predictor | Description | Unit | Mean selection count |
|---|---|---|---|
| PWAT_EATM | Precipitable water in model column | kg m⁻² | 50.4 |
| PRES_SFC | Surface pressure (a proxy for terrain height) | hPa | 49.0 |
| Local solar time | UTC hour + (°E)/(15° h⁻¹) | h | 45.9 |
| REFC_EATM | Max reflectivity in model column | dBZ | 40.8 |
| TSOIL_SFC | Soil temperature at the surface | K | 38.3 |
| CAPE_SFC | CAPE of surface parcel | J kg⁻¹ | 34.8 |
| HPBL_SFC | Height of planetary boundary layer | m | 28.3 |
| RH_HTGL | 2-m relative humidity | % | 28.0 |
| DZDT1Hr_SIGL | Vertical velocity in 900–700-hPa layer | m s⁻¹ | 26.0 |
| APCP1Hr_SFC | Accumulated model precipitation | mm | 24.6 |
| DSWRF_SFC | Downward shortwave radiation flux at surface | W m⁻² | 23.2 |
| ULWRF_NTAT | Upward longwave radiation flux at top of atmosphere | W m⁻² | 22.0 |
| SHTFL_SFC | Sensible heat flux at surface | W m⁻² | 20.9 |
| LHTFL_SFC | Latent heat flux at surface | W m⁻² | 20.0 |
| DPT_HTGL | 2-m dewpoint | K | 18.4 |
| 34LFTX_SPDLᵃ | Best (four layer) lifted index | K | 15.9 |
| SPFH_HTGLᵃ | Specific humidity | g kg⁻¹ | 15.2 |
| CIN_SFC | CIN of surface parcel | J kg⁻¹ | 9.3 |

ᵃ These two predictors were removed after the mean selection count analysis for reasons described in the text.

FIG. 4. Cumulative results of 10 predictor selection trials for 2-h forecasts obtained using RFs of varying sizes. Predictors (see Table 1) are listed along the y axis, and the selection steps increase to the right. Every three steps, two predictors are added to the forest and one is removed, so the size of the predictor suite increases by one. The colors indicate the number of times (summed over 10 trials) a predictor was selected in the predictor suite after that step. By the 52nd step, all 18 predictors were used.

A predictor selection trial was performed using a series of two forward selection steps and one backward elimination step. At each forward selection step, all unselected predictors were tested individually as candidates for retention by joining them to the predictors already selected and evaluating the resulting RF's predictive skill on an independent testing set. The predictor whose inclusion made the RF most skillful was retained for the next step. After two forward selection steps, all the retained variables were tested to see which one's removal caused the smallest drop in the RF's skill (backward elimination). The predictor associated with the smallest drop in skill was then removed from the retained variable group and added back into the group of unselected predictor fields. This process was repeated until all 18 variables were retained. Each predictor selection trial was repeated 10 times with the different training sets, with training on odd days and testing on even days and then vice versa. Figure 4 summarizes the results obtained using 10 trials. After step 1, model reflectivity (REFC_EATM) was retained for 9 out of 10 trials and model precipitable water (PWAT_EATM) was retained once. After step 2, the most frequently retained variables were model reflectivity (9 out of 10 trials) and model precipitable water (9 out of 10 trials), but model surface pressure (PRES_SFC, which is indicative of terrain height) and model lifted index (34LFTX_SPDL) were also retained once. The average number of steps per trial for which a predictor was retained is given in Table 1. The results suggest that the presence of a deep column of water vapor is important for MCS-I, given that model precipitable water was the most frequently retained predictor (50.4 steps per trial). Fixed parameters such as solar time and surface pressure, which is a proxy for terrain height, were also retained quite often, indicating the importance of temporal and geographic regimes. On the other hand, model convective inhibition (CIN_SFC) was retained least often (9.3 steps per trial). It is unclear as to why CIN_SFC seems to have lower importance in the training set; however, it is retained owing to previous reports of its importance in the prediction of MCS-I (e.g., Jirak and Cotton 2007).
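The forward/forward/backward schedule described above can be sketched as follows. Several parts are assumptions made for illustration: the function `select_predictors`, the additive toy skill function (a real trial would train and score an RF at every evaluation), and the stopping rule, which ends the trial when a forward step finds nothing left to add. That rule was chosen because it reproduces the 52 steps quoted in the Fig. 4 caption for 18 predictors; the text itself does not spell it out.

```python
def select_predictors(predictors, skill):
    """Two forward steps, then one backward step, repeated.

    skill(subset) is any user-supplied score (higher is better) for a model
    trained on that subset. Returns the retained list after each step.
    """
    retained, unselected, history = [], list(predictors), []
    step = 0
    while True:
        step += 1
        if step % 3 != 0:
            if not unselected:
                break  # assumed stopping rule: nothing left to add
            # Forward selection: keep the candidate that maximizes skill.
            best = max(unselected, key=lambda p: skill(retained + [p]))
            unselected.remove(best)
            retained.append(best)
        else:
            # Backward elimination: drop the predictor whose removal hurts least.
            drop = max(retained,
                       key=lambda p: skill([q for q in retained if q != p]))
            retained.remove(drop)
            unselected.append(drop)
        history.append(list(retained))
    return history

# Toy skill function (an assumption): fixed additive weight per predictor.
weights = {f"P{i:02d}": 18 - i for i in range(18)}
history = select_predictors(list(weights),
                            lambda subset: sum(weights[p] for p in subset))
steps_retained = {p: sum(p in suite for suite in history) for p in weights}
```

Counting the steps at which each predictor appears in `history`, as `steps_retained` does, gives the analogue of the mean selection count in Table 1 for a single trial.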

3) SCORING AND EVALUATION

The primary objective measure used to assess the performance of the probabilistic RF 2-h MCS-I forecasts was the area under the receiver operating characteristic (ROC) curve (AUC; Marzban 2004). ROC curves (Fig. 5) are obtained by finding the relationship between the hit rate [Hits/(Hits + Misses)] and false alarm rate [False_Positive/(False_Positive + Correct_Null)] for a range of RF vote thresholds. AUC has a long history in evaluating machine learning algorithms. The ROC curve maps hit rate as a function of false alarm (FA) rate across a range of thresholds available within the prediction (e.g., RF vote counts or likelihood values). An AUC value of one is indicative of a perfect forecast, while an AUC value of 0.5 is indicative of a purely random forecast. We also used the Gilbert skill score, commonly known as the equitable threat score (ETS), as a second metric to evaluate the RF forecasts. In this case, we took the maximum ETS value over all RF vote thresholds. Both AUC and maximum ETS can be used to compare RF performance to that of other forecasts, even if they have different units or are calibrated differently (Wilks 2006). Finally, we used the symmetric extreme dependency score (SEDS) to evaluate and intercompare the performance of the RF and other short-term forecasts in the real-time prediction of MCS-I observed during a 5-week period in 2013. The SEDS, which is described by Hogan et al. (2009), is an equitable skill score designed to more effectively evaluate the performance of forecasts of infrequently occurring events such as MCS-I.
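The scores named above all derive from the 2 × 2 contingency table at a given threshold. The sketch below uses the standard definitions of the ETS and of the SEDS (Hogan et al. 2009), with a = hits, b = false alarms, c = misses, and d = correct nulls. The function names, the trapezoidal AUC, and the toy scores are assumptions for illustration, and the ETS and SEDS functions do not guard against empty cells (e.g., a = 0).

```python
import math

def contingency(scores, observed, threshold):
    """Counts (a, b, c, d) when an event is forecast wherever score >= threshold."""
    a = b = c = d = 0
    for score, event in zip(scores, observed):
        forecast = score >= threshold
        if forecast and event:
            a += 1
        elif forecast:
            b += 1
        elif event:
            c += 1
        else:
            d += 1
    return a, b, c, d

def hit_rate(a, b, c, d):
    return a / (a + c)

def false_alarm_rate(a, b, c, d):
    return b / (b + d)

def ets(a, b, c, d):
    """Equitable threat (Gilbert skill) score."""
    n = a + b + c + d
    a_random = (a + b) * (a + c) / n  # hits expected by chance
    return (a - a_random) / (a + b + c - a_random)

def seds(a, b, c, d):
    """Symmetric extreme dependency score."""
    n = a + b + c + d
    return (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0

def roc_auc(scores, observed, thresholds):
    """Trapezoidal area under the (FA rate, hit rate) points, anchored at the corners."""
    points = [(0.0, 0.0), (1.0, 1.0)]
    for t in thresholds:
        table = contingency(scores, observed, t)
        points.append((false_alarm_rate(*table), hit_rate(*table)))
    points.sort()
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy likelihoods and observations (hypothetical values).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
observed = [1, 1, 0, 1, 0, 0, 0, 0]
thresholds = [0.25, 0.5, 0.75]
auc = roc_auc(scores, observed, thresholds)
max_ets = max(ets(*contingency(scores, observed, t)) for t in thresholds)
```

Because the AUC and the maximum ETS are taken over all thresholds, neither depends on how the forecast values are scaled, which is what allows uncalibrated RF votes to be compared with reflectivity-based forecasts.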

d. Predictor field optimization

In the first optimization step, redundant predictors were removed in order to reduce the amount of unnecessary information going into the training set. Highly correlated or redundant predictors, like CAPE and lifted index or 2-m dewpoint and 2-m specific humidity, were compared. It was found that the better variable to use depended on the number of variables in the training set. Lifted index was selected more often than CAPE when the suite was limited to five or fewer predictors (before step 15 in Fig. 4), but for more than five predictors, CAPE was selected more often, meaning that it was more valuable in combination with the other predictors in the larger set. Likewise, specific humidity was preferred when the number of predictor variables was small, but dewpoint worked better when a greater number of predictors were used. Since our final predictor suite has a larger number of predictors, CAPE and dewpoint were retained.

In the second set of predictor optimization experiments, the impact of predictor field smoothing was explored. Each of the remaining HRRR forecast fields was smoothed with circular filters with radii ranging from 10 to 80 km. It was found that using a 40-km circular smoothing filter resulted in the best skill scores. Figure 5 shows ROC curves obtained for RF predictions that were based on HRRR data only at raw resolution versus those obtained using a 40-km circular smoothing filter. There is no overlap between the 10 curves obtained using raw resolution and those obtained using a 40-km filter for hit rates between 0.3 and 0.9, indicating that the improved probability of detection associated with the smoothing is significant. The average AUC increased from 0.84 to 0.86, and the maximum ETS increased from 0.33 to 0.37 (Table 2). Both increases were large relative to the standard deviation across the 10 training sets, further indicating the significance of this result.
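A circular smoothing filter of the kind described above can be sketched as a disk-shaped moving average. The uniform weights, the truncation of the disk at the grid edges, the constant 4-km spacing, and the function name are all assumptions; the text does not specify how the filter was implemented.

```python
def circular_mean_filter(field, radius_km, spacing_km=4.0):
    """Average each grid point with all neighbours within radius_km (edges truncated)."""
    reach = int(radius_km // spacing_km)
    # Precompute the (dy, dx) offsets that fall inside the disk.
    offsets = [(dy, dx)
               for dy in range(-reach, reach + 1)
               for dx in range(-reach, reach + 1)
               if (dy * spacing_km) ** 2 + (dx * spacing_km) ** 2 <= radius_km ** 2]
    n_rows, n_cols = len(field), len(field[0])
    smoothed = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            values = [field[i + dy][j + dx] for dy, dx in offsets
                      if 0 <= i + dy < n_rows and 0 <= j + dx < n_cols]
            smoothed[i][j] = sum(values) / len(values)
    return smoothed

# Hypothetical 41 x 41 reflectivity field with a single 50-dBZ grid point.
raw = [[0.0] * 41 for _ in range(41)]
raw[20][20] = 50.0
smooth = circular_mean_filter(raw, radius_km=40.0)
```

On a 4-km grid, a 40-km radius spreads a single-point signal over the 317 grid points of the disk, so a forecast that misplaces a storm by a few tens of kilometers still contributes to the predictor value at the verifying location.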

FIG. 5. False alarm rate vs hit rate (ROC curve) using unsmoothed HRRR (blue) and smoothed HRRR (red). The two sets (unsmoothed and smoothed HRRR) of 10 RFs were trained on even days and tested on odd days using summer (JJA) 2011 data. The predictor fields used are listed in Table 1. Note that lifted index (34LFTX_SPDL) and specific humidity (SPFH_HTGL) were omitted from the training set based on analyses described in the text.

The final predictor optimization step was designed to assess the impact of observation-based variables on the skill of the RF forecasts. The value of adding radar reflectivity and 13.3–10-μm SBTD was assessed both individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

3. Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors, and the best maximum ETS was with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best ratio was between 20% and 40%. The best AUC was achieved with 40% events, and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event ratio of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, p = V/200, where p is the likelihood of an MCS-I event and V is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.
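The linear vote-to-likelihood transform is simple enough to state directly in code. The function names, the range check, and the small vote grid below are hypothetical; the 200-tree forest size comes from the text, and the 0.1 threshold is the one used later in the manual evaluation.

```python
def vote_likelihood(vote_count, n_trees=200):
    """Uncalibrated likelihood p = V / n_trees from the number of trees voting for MCS-I."""
    if not 0 <= vote_count <= n_trees:
        raise ValueError("vote count must lie between 0 and the number of trees")
    return vote_count / n_trees

def exceeds_threshold(vote_grid, likelihood_threshold, n_trees=200):
    """Boolean mask of grid points whose likelihood meets the threshold."""
    return [[vote_likelihood(v, n_trees) >= likelihood_threshold for v in row]
            for row in vote_grid]

votes = [[0, 12, 20], [45, 150, 200]]  # hypothetical vote counts on a 2 x 3 grid
mask = exceeds_threshold(votes, 0.1)
```

With 200 trees, a likelihood threshold of 0.1 corresponds to at least 20 trees voting for an event.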

TABLE 2. Skill scores for predictor optimization experiments.

| Predictors | AUC | Std dev of AUC | Max ETS | Std dev of ETS |
|---|---|---|---|---|
| Raw HRRR (17) + local solar time (LST) | 0.84 | 0.004 | 0.33 | 0.008 |
| Smoothed HRRR + LST | 0.86 | 0.003 | 0.37 | 0.011 |
| Smoothed HRRR + LST + SBTD | 0.87 | 0.003 | 0.38 | 0.010 |
| Smoothed HRRR + LST + RadarREFC | 0.88 | 0.003 | 0.40 | 0.007 |
| Smoothed HRRR + LST + RadarREFC + SBTD | 0.88 | 0.004 | 0.40 | 0.010 |

¹ Note that the predictions are not actually probabilities, since no attempt has been made to calibrate the predicted likelihood values.

Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.

The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an “informed” climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88, from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.


RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.
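For readers less familiar with the verification scores quoted here, the following Python sketch computes hit rate, FA rate, ETS, and SEDS from a 2 x 2 contingency table. It is an illustration under stated assumptions rather than the authors' verification code: the function name `contingency_scores` and the example counts are hypothetical, the FA rate is taken to be false alarms divided by observed nonevents, and the ETS and SEDS expressions are the standard textbook forms (SEDS as defined by Hogan et al. 2009).

```python
import math


def contingency_scores(hits, false_alarms, misses, correct_nulls):
    """Scores from a 2 x 2 contingency table (all counts assumed nonzero)."""
    a, b, c, d = hits, false_alarms, misses, correct_nulls
    n = a + b + c + d
    hit_rate = a / (a + c)          # fraction of observed events forecast
    fa_rate = b / (b + d)           # fraction of nonevents forecast as events
    a_random = (a + b) * (a + c) / n  # hits expected by chance
    ets = (a - a_random) / (a + b + c - a_random)
    # Symmetric extreme dependency score: 1 is perfect, 0 is no skill.
    seds = (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0
    return {"hit_rate": hit_rate, "fa_rate": fa_rate, "ets": ets, "seds": seds}


# Hypothetical counts for a rare event (not taken from the paper).
print(contingency_scores(hits=30, false_alarms=970, misses=70, correct_nulls=8930))
```

Because SEDS is built from logarithms of the forecast and observed event frequencies, it stays informative when events are rare, which is one reason it suits a predictand as infrequent as MCS-I.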

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.


reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

                             RF                  HRRR reflectivity   Extrapolated        Observed climatological
                                                                     reflectivity        MCS-I frequency
                             eUS   SE    GP      eUS   SE    GP      eUS   SE    GP      eUS   SE    GP
Hit rate for FA rate = 0.1   0.31  0.27  0.28    0.24  0.21  0.23    0.20  0.14  0.19    0.22  0.21  0.12
AUC                          0.76  0.73  0.75    0.59  0.57  0.60    0.55  0.52  0.53    0.61  0.65  0.52
Max SEDS                     0.19  0.17  0.18    0.14  0.13  0.14    0.11  0.06  0.11    0.15  0.13  0.03


increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FA (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.


size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide whether the possibility of MCS-I warranted more attention.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FAs in RF forecasts of MCS-I.

Class 1
  Description: RF probabilities are always elevated in areas of ongoing MCSs.
  Comment: Use RF along with current radar observations to disregard these areas.
  Region of most common occurrence: Entire domain.

Class 2
  Description: RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event.
  Comment: RF provides an early indication of regions where MCS-I is possible in the next 2–4 h.
  Region of most common occurrence: Southeastern United States.

Class 3
  Description: Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria.
  Comment: Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments.
  Region of most common occurrence: Southeastern United States.


be used as guidance for the short-term prediction of discrete high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem because of the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was a clear point of diminishing returns. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF forecast likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product wherever MCSs already exist. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts, since it provides advance notice of the potential for an MCS-I event.
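The automatic masking of class 1 FAs suggested above could be sketched as follows. This is a hypothetical illustration, not an operational algorithm: the function name `mask_ongoing_mcs`, the representation of grid points and MCS locations as (x, y) positions in kilometers, and the choice of masking radius are all assumptions.

```python
import math


def mask_ongoing_mcs(likelihoods, grid_points_km, mcs_locations_km, radius_km):
    """Zero out RF likelihoods within radius_km of any ongoing MCS.

    likelihoods:      RF likelihood at each grid point (0 to 1)
    grid_points_km:   (x, y) position of each grid point, in km
    mcs_locations_km: (x, y) positions of currently observed MCSs, in km
    radius_km:        masking radius (a tunable, assumed parameter)
    """
    masked = []
    for value, (x, y) in zip(likelihoods, grid_points_km):
        near_mcs = any(
            math.hypot(x - mx, y - my) <= radius_km
            for mx, my in mcs_locations_km
        )
        masked.append(0.0 if near_mcs else value)
    return masked


# Toy example: one ongoing MCS at the origin.
rf = [0.8, 0.6, 0.3]
points = [(0.0, 0.0), (100.0, 0.0), (400.0, 0.0)]
print(mask_ongoing_mcs(rf, points, [(0.0, 0.0)], radius_km=125.0))
```

In practice the mask would be built from current radar observations, and, given that likelihoods can remain elevated for hours after an MCS dissipates, one might also retain recently decayed systems in the list of masked locations.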

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011,


while the ROC curves shown in Fig. 9 were obtained using RFs trained with 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
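One way such date-weighted sampling might be set up is sketched below in Python. Everything specific in it is an assumption introduced for illustration: the exponential weighting, the scale parameters `doy_scale` and `year_scale`, the function names, and the representation of a case as a (day of year, year, case id) tuple. The paper does not prescribe a weighting scheme.

```python
import math
import random


def sampling_weight(case_doy, case_year, today_doy, today_year,
                    doy_scale=30.0, year_scale=1.0):
    """Weight that decays with distance in day of year and with age in years."""
    diff = abs(case_doy - today_doy) % 365
    doy_distance = min(diff, 365 - diff)  # circular distance across New Year
    age_years = max(today_year - case_year, 0)
    return math.exp(-doy_distance / doy_scale) * math.exp(-age_years / year_scale)


def draw_training_cases(cases, today_doy, today_year, size, seed=0):
    """Draw `size` cases, with replacement, favoring recent and seasonally close ones.

    cases: list of (day_of_year, year, case_id) tuples
    """
    rng = random.Random(seed)
    weights = [sampling_weight(doy, year, today_doy, today_year)
               for doy, year, _ in cases]
    return rng.choices(cases, weights=weights, k=size)


# Toy archive spanning three summers; "today" is day 200 of 2013.
archive = [(doy, year, f"{year}-{doy}")
           for year in (2011, 2012, 2013) for doy in range(152, 244)]
sample = draw_training_cases(archive, today_doy=200, today_year=2013, size=1000)
print(sum(1 for _, year, _ in sample if year == 2013) / len(sample))
```

The two scale parameters express the trade-off described in the text: larger values lengthen the effective training period, while smaller values let the forest respond faster to model changes or shifting weather regimes.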

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
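A simple form of this calibration is sketched below, assuming equal-width forecast categories and a lookup table of observed relative frequencies. The function names `build_calibration_table` and `calibrate`, the number of bins, and the toy data are hypothetical and are not taken from the paper.

```python
def build_calibration_table(forecasts, observed, n_bins=10):
    """Observed event frequency for each forecast-likelihood category.

    Returns a list of length n_bins; entries are None for empty categories.
    """
    counts = [0] * n_bins
    events = [0] * n_bins
    for f, o in zip(forecasts, observed):
        k = min(int(f * n_bins), n_bins - 1)  # category index for forecast f
        counts[k] += 1
        events[k] += o
    return [events[k] / counts[k] if counts[k] else None for k in range(n_bins)]


def calibrate(forecast, table):
    """Replace a raw likelihood with the observed frequency of its category."""
    k = min(int(forecast * len(table)), len(table) - 1)
    return table[k] if table[k] is not None else forecast


# Toy history: likelihoods near 0.8 verified only one time in four.
table = build_calibration_table([0.85, 0.82, 0.88, 0.81], [1, 0, 0, 0])
print(calibrate(0.86, table))
```

The toy output shows the effect noted in the text: when the raw likelihoods are biased high, the calibrated values are compressed toward the much lower observed frequencies.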

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and of Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]
Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.
Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.
Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.
Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.
Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.
Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.
Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 136. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]
Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]
Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.
Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.
Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.
Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.
De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.
DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.
Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.
Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]
Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.
Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.
Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.
Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.
Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.
Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.
Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.
Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.
Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.
Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.
Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.
Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.
Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.
Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.
McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.
Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.
Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.
Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.
Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.
Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]
Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.
Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.
Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]
Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Environ., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.
Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]
Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.
Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.
Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 12. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]
Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.
Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]
Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.
Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.



predictand for any statistical forecast algorithm to handle. One can achieve 99.7% accuracy by always predicting a null event at every grid point. To help the RF algorithm discriminate between MCS-I and non-MCS-I cases, the MCS-I cases are oversampled such that they make up 30% of the training set. This artificial increase in the proportion of events in the training set can be accounted for in the RF vote calibration phase.
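The oversampling step can be sketched in a few lines of Python. This is an assumed illustration rather than the authors' procedure: the function name `build_training_set`, the fallback to sampling events with replacement when too few are available, and the toy case lists are hypothetical.

```python
import random


def build_training_set(events, nonevents, size, event_fraction=0.30, seed=0):
    """Assemble a training set in which events make up `event_fraction`.

    events, nonevents: lists of cases (any objects)
    Returns a shuffled list of (case, label) pairs, label 1 for events.
    """
    rng = random.Random(seed)
    n_events = round(size * event_fraction)
    n_nonevents = size - n_events
    if len(events) >= n_events:
        chosen_events = rng.sample(events, n_events)     # without replacement
    else:
        chosen_events = rng.choices(events, k=n_events)  # assumed fallback
    chosen_nonevents = rng.sample(nonevents, n_nonevents)
    training = [(case, 1) for case in chosen_events]
    training += [(case, 0) for case in chosen_nonevents]
    rng.shuffle(training)
    return training


# Toy population with a 0.3% event frequency, as in the text.
events = list(range(30))
nonevents = list(range(1000, 10970))
training = build_training_set(events, nonevents, size=1000)
print(sum(label for _, label in training) / len(training))
```

Because the forest is trained on a 30% event frequency rather than the climatological 0.3%, its raw vote fractions overstate the true likelihood, which is why a later calibration step is needed.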

The RF parameter sensitivity tests and predictor importance analyses were conducted using 10 disjoint training sets: 5 sets of 18 000 cases each were drawn randomly without replacement from odd days, and another 5 sets were drawn similarly from even days. The standard deviation of skill over these 10 training sets provides a means for assessing the relative significance of differences in mean skill score when the RF parameters are changed. While one standard deviation is not a particularly stringent requirement, there is only a 2.2% chance that the mean of 10 samples will differ by more than one standard deviation from the mean of another 10 samples drawn from the same population. Selecting the sets from even or odd days also permits testing a model trained on even days against independent data from odd days, and vice versa.
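The construction of disjoint training sets by day parity can be sketched as follows. The function names, the representation of a case as a (day of month, case id) tuple, and the small set sizes in the example are assumptions made for illustration; the study itself used sets of 18 000 cases.

```python
import random


def disjoint_sets(cases, n_sets, set_size, rng):
    """Draw n_sets non-overlapping sets of set_size cases without replacement."""
    pool = rng.sample(cases, n_sets * set_size)
    return [pool[i * set_size:(i + 1) * set_size] for i in range(n_sets)]


def parity_training_sets(cases, n_sets=5, set_size=18000, seed=0):
    """Split cases by odd/even day, then draw disjoint sets from each half.

    cases: list of (day_of_month, case_id) tuples
    Returns (odd_day_sets, even_day_sets).
    """
    rng = random.Random(seed)
    odd = [c for c in cases if c[0] % 2 == 1]
    even = [c for c in cases if c[0] % 2 == 0]
    return (disjoint_sets(odd, n_sets, set_size, rng),
            disjoint_sets(even, n_sets, set_size, rng))


# Toy archive: 10 cases on each of 30 days, with small sets for brevity.
toy_cases = [(day, i) for i, day in enumerate(list(range(1, 31)) * 10)]
odd_sets, even_sets = parity_training_sets(toy_cases, n_sets=5, set_size=20)
print(len(odd_sets), len(even_sets), len(odd_sets[0]))
```

A forest trained on any of the odd-day sets can then be verified against even-day data, and vice versa, so that training and verification samples never share a day.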

In general, one wants as many training cases as possible to fully sample the general population of weather scenarios. On the other hand, given finite resources, one must limit the number of cases. We found that 18 000 cases allowed for efficient training of the RF while fully sampling the parameter space. This number of cases is actually quite large compared to other recent studies. For example, McGovern et al. (2011) successfully trained an RF to predict atmospheric turbulence with only 2055 cases, and Mecikalski et al. (2015) predicted the onset of radar reflectivity ≥ 35 dBZ at the −10°C level (i.e., a formal definition for convective initiation) with only 9015 cases. While the number of cases is relatively large, it is important to note that a higher number of training examples is often needed when the event is rare, so that both verifying classes are sufficiently sampled.

2) PREDICTOR SELECTION

As a proof of concept, the RF was first trained with a select group of diagnostic fields obtained from the HRRR. Later, the value of adding observational predictor fields was explored. Diagnostic output from the HRRR included 17 two-dimensional fields deemed to be relevant for the prediction of MCS-I (Table 1). Environmental factors that contribute to the development of MCSs are discussed in Houze (2004). Undoubtedly, there are other fields that may be derived from the full three-dimensional HRRR dataset that would have potential for adding value to the prediction of MCS-I (e.g., vertical wind shear), but for simplicity we limited our training sets to fields available within the HRRR two-dimensional data stream. In addition, local solar time was added as a predictor field as a simple way to account for differing mechanisms responsible for daytime and nocturnal MCS-I.

As noted by Hall et al. (2011), "one of the most effective ways to select features that are predictive of some phenomena is manually based on subject matter expertise." Thus, each variable available in the HRRR 2D data stream that we thought might be of value in the prediction of MCS-I was evaluated. Hall et al. (2011) also note that, while the RF was designed to effectively utilize large numbers of predictors, it can be susceptible to noise from extraneous or redundant features. To reduce the number of predictors used in the training sets, we implemented a method that systematically determines which predictor fields should be retained, allowing some of the correlated predictor fields to be eliminated. As will be discussed below, choosing which predictors to retain depends on the entire set of predictor fields under evaluation. This is particularly true for RFs since, by utilizing decision trees that split on multiple predictors in succession, an RF captures and exploits relationships between the predictors.

TABLE 1. Predictors sorted by mean selection count.

Predictor | Description | Unit | Mean selection count
PWAT_EATM | Precipitable water in model column | kg m⁻² | 50.4
PRES_SFC | Surface pressure (a proxy for terrain height) | hPa | 49.0
Local solar time | UTC hour + (°E)/15° h⁻¹ | h | 45.9
REFC_EATM | Max reflectivity in model column | dBZ | 40.8
TSOIL_SFC | Soil temperature at the surface | K | 38.3
CAPE_SFC | CAPE of surface parcel | J kg⁻¹ | 34.8
HPBL_SFC | Height of planetary boundary layer | m | 28.3
RH_HTGL | 2-m relative humidity | % | 28.0
DZDT1Hr_SIGL | Vertical velocity in 900–700-hPa layer | m s⁻¹ | 26.0
APCP1Hr_SFC | Accumulated model precipitation | mm | 24.6
DSWRF_SFC | Downward shortwave radiation flux at surface | W m⁻² | 23.2
ULWRF_NTAT | Upward longwave radiation flux at top of atmosphere | W m⁻² | 22.0
SHTFL_SFC | Sensible heat flux at surface | W m⁻² | 20.9
LHTFL_SFC | Latent heat flux at surface | W m⁻² | 20.0
DPT_HTGL | 2-m dewpoint | K | 18.4
34LFTX_SPDL (a) | Best (four layer) lifted index | K | 15.9
SPFH_HTGL (a) | Specific humidity | g kg⁻¹ | 15.2
CIN_SFC | CIN of surface parcel | J kg⁻¹ | 9.3

(a) These two predictors were removed after the mean selection count analysis for reasons described in the text.

A predictor selection trial was performed using a series of two forward selection steps and one backward elimination step. At each forward selection step, all unselected predictors were tested individually as candidates for retention by joining them to the predictors already selected and evaluating the resulting RF's predictive skill on an independent testing set. The predictor whose inclusion made the RF most skillful was retained for the next step. After two forward selection steps, all the retained variables were tested to see which one's removal caused the smallest drop in the RF's skill (backward elimination). The predictor associated with the smallest drop in skill was then removed from the retained variable group and added back into the group of unselected predictor fields. This process was repeated until all 18 variables were retained. Each predictor selection trial was repeated 10 times with the different training sets, with training on odd days and testing on even days, and then vice versa. Figure 4 summarizes the results obtained using 10 trials. After step 1, model reflectivity (REFC_EATM) was retained for 9 out of 10 trials and model precipitable water (PWAT_EATM) was retained once. After step 2, the most frequently retained variables were model reflectivity (9 out of 10 trials) and model precipitable water (9 out of 10 trials), but model surface pressure (PRES_SFC, which is indicative of terrain height) and model lifted index (34LFTX_SPDL) were also retained once. The average number of steps per trial for which a predictor was retained is given in Table 1. The results suggest that the presence of a deep column of water vapor is important for MCS-I, given that model precipitable water was the most frequently retained predictor (50.4 steps per trial). Fixed parameters such as solar time and surface pressure, which is a proxy for terrain height, were also retained quite often, indicating the importance of temporal and geographic regimes. On the other hand, model convective inhibition (CIN_SFC) was retained least often (9.3 steps per trial). It is unclear why CIN_SFC seems to have lower importance in the training set; however, it is retained owing to previous reports of its importance in the prediction of MCS-I (e.g., Jirak and Cotton 2007).

FIG. 4. Cumulative results of 10 predictor selection trials for 2-h forecasts obtained using RFs of varying sizes. Predictors (see Table 1) are listed along the y axis, and the selection steps increase to the right. Every three steps, two predictors are added to the forest and one is removed, so the size of the predictor suite increases by one. The colors indicate the number of times (summed over 10 trials) a predictor was selected in the predictor suite after that step. By the 52nd step, all 18 predictors were used.
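The two-forward, one-backward selection procedure described in the text can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: `skill` is a hypothetical callback that would train an RF on the given predictor list and return its score on independent test data, and the stopping rule (stop as soon as a forward step leaves nothing unselected) is a simplification.

```python
def select_predictors(candidates, skill):
    """Alternate two forward-selection steps with one backward-elimination step.

    `candidates` is a list of predictor names; `skill(predictors)` returns a
    score for a model built from that list (higher is better).
    Returns a history of (action, predictor, retained_after_step) tuples.
    """
    retained, unselected, history = [], list(candidates), []

    def forward():
        # Keep the candidate whose inclusion gives the most skillful model.
        best = max(unselected, key=lambda p: skill(retained + [p]))
        unselected.remove(best)
        retained.append(best)
        history.append(("add", best, tuple(retained)))

    def backward():
        # Drop the predictor whose removal costs the least skill.
        least = max(retained, key=lambda p: skill([q for q in retained if q != p]))
        retained.remove(least)
        unselected.append(least)
        history.append(("drop", least, tuple(retained)))

    while unselected:
        forward()
        if not unselected:
            break
        forward()
        if not unselected:
            break
        backward()
    return history
```

Counting, for each predictor, the number of steps at which it appears in `retained_after_step` and averaging over trials would give a quantity analogous to the mean selection count in Table 1.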

3) SCORING AND EVALUATION

The primary objective measure used to assess the performance of the probabilistic RF 2-h MCS-I forecasts was the area under the receiver operating characteristic (ROC) curve (AUC; Marzban 2004). ROC curves (Fig. 5) are obtained by finding the relationship between the hit rate [Hits/(Hits + Misses)] and false alarm rate [False_Positive/(False_Positive + Correct_Null)] for a range of RF vote thresholds. AUC has a long history in evaluating machine learning algorithms. The ROC curve maps hit rate as a function of false alarm (FA) rate across a range of thresholds available within the prediction (e.g., RF vote counts or likelihood values). An AUC value of one is indicative of a perfect forecast, while an AUC value of 0.5 is indicative of a purely random forecast. We also used the Gilbert skill score, commonly known as the equitable threat score (ETS), as a second metric to evaluate the RF forecasts. In this case, we took the maximum ETS value over all RF vote thresholds. Both AUC and maximum ETS can be used to compare RF performance to that of other forecasts, even if they have different units or are calibrated differently (Wilks 2006). Finally, we used the symmetric extreme dependency score (SEDS) to evaluate and intercompare the performance of the RF and other short-term forecasts in the real-time prediction of MCS-I observed during a 5-week period in 2013. The SEDS score, which is described by Hogan et al. (2009), is an equitable skill score designed to more effectively evaluate the performance of forecasts of infrequently occurring events such as MCS-I.
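The scores named above can be computed from a 2 × 2 contingency table, as in the following sketch. The hit rate and false alarm rate follow the definitions in the text; the ETS and SEDS expressions are the standard textbook forms (the SEDS form is the one usually attributed to Hogan et al. 2009), and the function names and trapezoidal AUC are choices made for this example, not a description of the authors' software.

```python
import math

def contingency(likelihoods, observed, threshold):
    """Return (hits, false_alarms, misses, correct_nulls) for forecasts >= threshold."""
    hits = false_alarms = misses = correct_nulls = 0
    for p, obs in zip(likelihoods, observed):
        forecast_yes = p >= threshold
        if forecast_yes and obs:
            hits += 1
        elif forecast_yes:
            false_alarms += 1
        elif obs:
            misses += 1
        else:
            correct_nulls += 1
    return hits, false_alarms, misses, correct_nulls

def hit_rate(h, fa, m, cn):
    return h / (h + m) if h + m else 0.0

def false_alarm_rate(h, fa, m, cn):
    return fa / (fa + cn) if fa + cn else 0.0

def ets(h, fa, m, cn):
    """Equitable threat (Gilbert) score: hits corrected for random chance."""
    n = h + fa + m + cn
    h_random = (h + fa) * (h + m) / n
    denom = h + fa + m - h_random
    return (h - h_random) / denom if denom else 0.0

def seds(h, fa, m, cn):
    """Symmetric extreme dependency score; undefined when h is 0 or h equals n."""
    n = h + fa + m + cn
    if h == 0 or h == n:
        return float("nan")
    return (math.log((h + fa) / n) + math.log((h + m) / n)) / math.log(h / n) - 1.0

def roc_auc(likelihoods, observed, thresholds):
    """Trapezoidal area under the (FA rate, hit rate) points, anchored at (0,0) and (1,1)."""
    points = {(0.0, 0.0), (1.0, 1.0)}
    for t in thresholds:
        table = contingency(likelihoods, observed, t)
        points.add((false_alarm_rate(*table), hit_rate(*table)))
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def max_ets(likelihoods, observed, thresholds):
    return max(ets(*contingency(likelihoods, observed, t)) for t in thresholds)
```

The same functions apply to a deterministic field such as reflectivity: the thresholds are then reflectivity values rather than vote fractions.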

d. Predictor field optimization

In the first optimization step, redundant predictors were removed in order to reduce the amount of unnecessary information going into the training set. Highly correlated or redundant predictors, like CAPE and lifted index, or 2-m dewpoint and 2-m specific humidity, were compared. It was found that the better variable to use depended on the number of variables in the training set. Lifted index was selected more often than CAPE when the suite was limited to five or fewer predictors (before step 15 in Fig. 4), but for more than five predictors, CAPE was selected more often, meaning that it was more valuable in combination with the other predictors in the larger set. Likewise, specific humidity was preferred when the number of predictor variables was small, but dewpoint worked better when a greater number of predictors were used. Since our final predictor suite has a larger number of predictors, CAPE and dewpoint were retained.

In the second set of predictor optimization experiments, the impact of predictor field smoothing was explored. Each of the remaining HRRR forecast fields was smoothed with circular filters with radii ranging from 10 to 80 km. It was found that using a 40-km circular smoothing filter resulted in the best skill scores. Figure 5 shows ROC curves obtained for RF predictions that were based on HRRR data only at raw resolution versus those obtained using a 40-km circular smoothing filter. There is no overlap between the 10 curves obtained using raw resolution and those obtained using a 40-km filter for hit rates between 0.3 and 0.9, indicating that the improved probability of detection associated with the smoothing is significant. The average AUC increased from 0.84 to 0.86, and the maximum ETS increased from 0.33 to 0.37 (Table 2). Both increases were large relative to the standard deviation across the 10 training sets, further indicating the significance of this result.
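A circular smoothing filter of this kind can be sketched as a simple disk average. The sketch below is an assumption-laden illustration: it uses an unweighted mean, a uniform grid spacing passed as `grid_km` (3 km is used as the default only as a plausible value), and near the edges it averages only the points that fall inside the domain. The text does not state how the authors weighted the filter or handled boundaries.

```python
def circular_smooth(field, radius_km=40.0, grid_km=3.0):
    """Replace each grid value with the mean of all points within radius_km.

    `field` is a list of equal-length rows (a 2D grid of floats).
    """
    ny, nx = len(field), len(field[0])
    r = int(radius_km // grid_km)
    # Grid offsets whose centers fall inside the circle.
    offsets = [(dy, dx)
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1)
               if (dy * grid_km) ** 2 + (dx * grid_km) ** 2 <= radius_km ** 2]
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            total, count = 0.0, 0
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < ny and 0 <= xx < nx:  # skip points outside the domain
                    total += field[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out
```

The nested loops are written for clarity; on a full model grid a convolution routine from a numerical library would be far faster.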

The final predictor optimization step was designed to assess the impact of observation-based variables on the skill of the RF forecasts. The value of adding radar reflectivity and 13.3–10-μm SBTD was assessed both individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

FIG. 5. False alarm rate vs hit rate (ROC curve) using unsmoothed HRRR (blue) and smoothed HRRR (red). The two sets (unsmoothed and smoothed HRRR) of 10 RFs were trained on even days and tested on odd days using summer (JJA) 2011 data. The predictor fields used are listed in Table 1. Note that lifted index (34LFTX_SPDL) and specific humidity (SPFH_HTGL) were omitted from the training set based on analyses described in the text.

3. Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors, and the best maximum ETS was with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best ratio was between 20% and 40%. The best AUC was achieved with 40% events, and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event ratio of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, p = V/200, where p is the likelihood of an MCS-I event and V is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.

TABLE 2. Skill scores for predictor optimization experiments.

Predictors | AUC | Std dev of AUC | Max ETS | Std dev of ETS
Raw HRRR (17) + local solar time (LST) | 0.84 | 0.004 | 0.33 | 0.008
Smoothed HRRR + LST | 0.86 | 0.003 | 0.37 | 0.011
Smoothed HRRR + LST + SBTD | 0.87 | 0.003 | 0.38 | 0.010
Smoothed HRRR + LST + RadarREFC | 0.88 | 0.003 | 0.40 | 0.007
Smoothed HRRR + LST + RadarREFC + SBTD | 0.88 | 0.004 | 0.40 | 0.010

¹ Note that the predictions are not actually probabilities, since no attempt has been made to calibrate the predicted likelihood values.

Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.

The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88 from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains. The four forecast methods are the RF, HRRR reflectivity (HRRR), extrapolated reflectivity (Extrap), and observed climatological MCS-I frequency (Clim).

Metric | RF eUS | RF SE | RF GP | HRRR eUS | HRRR SE | HRRR GP | Extrap eUS | Extrap SE | Extrap GP | Clim eUS | Clim SE | Clim GP
Hit rate for FA rate = 0.1 | 0.31 | 0.27 | 0.28 | 0.24 | 0.21 | 0.23 | 0.20 | 0.14 | 0.19 | 0.22 | 0.21 | 0.12
AUC | 0.76 | 0.73 | 0.75 | 0.59 | 0.57 | 0.60 | 0.55 | 0.52 | 0.53 | 0.61 | 0.65 | 0.52
Max SEDS | 0.19 | 0.17 | 0.18 | 0.14 | 0.13 | 0.14 | 0.11 | 0.06 | 0.11 | 0.15 | 0.13 | 0.03

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FAs (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can be used as guidance for the short-term prediction of discrete high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class | Description | Comment | Region of most common occurrence
1 | RF probabilities are always elevated in areas of ongoing MCSs | Use RF along with current radar observations to disregard these areas | Entire domain
2 | RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event | RF provides an early indication of regions where MCS-I is possible in the next 2–4 h | Southeastern United States
3 | Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria | Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments | Southeastern United States

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the ratio of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF forecast likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product where MCSs already exist. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts because it provides advance notice of a potential MCS-I event.

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found, in comparing Figs. 5 and 9, that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011,


while the ROC curves shown in Fig. 9 were obtained using RFs trained using 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
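One way to picture such a calibration is the minimal sketch below, which assumes forecasts are grouped into equal-width likelihood categories. The bin count, the function names, and the toy data are illustrative assumptions and are not taken from the procedure used in this study.

```python
def reliability_table(likelihoods, outcomes, n_bins=10):
    """Map each forecast category (bin) to its observed event frequency."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for p, y in zip(likelihoods, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[k] += 1
        events[k] += y
    # Bins that received no forecasts have no observed frequency (None).
    return [events[k] / counts[k] if counts[k] else None for k in range(n_bins)]


def calibrate(p, table):
    """Replace a raw likelihood with the observed frequency of its bin."""
    n_bins = len(table)
    k = min(int(p * n_bins), n_bins - 1)
    return table[k] if table[k] is not None else p


# Toy example: high raw likelihoods that verify only half of the time.
raw = [0.05, 0.05, 0.95, 0.95, 0.95, 0.95]
obs = [0, 0, 1, 0, 0, 1]
table = reliability_table(raw, obs, n_bins=2)
calibrated = calibrate(0.9, table)
```

In this toy case the overforecast upper category is pulled down toward its observed frequency, which illustrates why calibrating a high-biased forecast compresses the range of likelihood values.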

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete, high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 136. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A “weather-ready” view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 12. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


2D data stream that we thought might be of value in the prediction of MCS-I was evaluated. Hall et al. (2011) also note that, while the RF was designed to effectively utilize large numbers of predictors, it can be susceptible to noise from extraneous or redundant features. To reduce the number of predictors used in the training sets, we implemented a method that systematically determines which predictor fields should be retained, allowing some of the correlated predictor fields to be eliminated. As will be discussed below, choosing which predictors to retain depends on the entire set of predictor fields under evaluation. This is particularly true for RFs since, by utilizing decision trees that split on multiple predictors in succession, an RF captures and exploits relationships between the predictors.

A predictor selection trial was performed using a series of two forward selection steps and one backward elimination step. At each forward selection step, all unselected predictors were tested individually as candidates for retention by joining them to the predictors already selected and evaluating the resulting RF's predictive skill on an independent testing set. The predictor whose inclusion made the RF most skillful was retained for the next step. After two forward selection steps, all the retained variables were tested to see which one's removal caused the smallest drop in the RF's skill (backward elimination). The predictor associated with the smallest drop in skill was then removed from the retained variable group and added back into the group of unselected predictor fields. This process was repeated until all 18 variables were retained. Each predictor selection trial was repeated 10 times with different training sets, with training on odd days and testing on even days, and then vice versa. Figure 4 summarizes the results obtained using 10 trials. After step 1, model reflectivity (REFC_EATM) was retained for 9 out of 10 trials and model precipitable water (PWAT_EATM) was retained once. After step 2, the most frequently retained variables were model reflectivity (9 out of 10 trials) and model precipitable water (9 out of 10 trials), but model surface pressure (PRES_SFC, which is indicative of terrain height) and model lifted index (34LFTX_SPDL) were also retained once. The average number of steps per trial for which a predictor was retained is given in Table 1. The results suggest that the presence of a deep column of water vapor is important for MCS-I, given that model precipitable water was the most frequently retained predictor (50.4 steps per trial). Fixed parameters such as solar time and surface pressure, which is a proxy for terrain height, were also retained quite often, indicating the importance of temporal and geographic regimes. On the other hand,

FIG. 4. Cumulative results of 10 predictor selection trials for 2-h forecasts obtained using RFs of varying sizes. Predictors (see Table 1) are listed along the y axis, and the selection steps increase to the right. Every three steps, two predictors are added to the forest and one is removed, so the size of the predictor suite increases by one. The colors indicate the number of times (summed over 10 trials) a predictor was selected in the predictor suite after that step. By the 52nd step, all 18 predictors were used.


model convective inhibition (CIN_SFC) was retained least often (9.3 steps per trial). It is unclear as to why CIN_SFC seems to have lower importance in the training set; however, it is retained owing to previous reports of its importance in the prediction of MCS-I (e.g., Jirak and Cotton 2007).
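The alternation of two forward selection steps with one backward elimination step can be sketched as follows. This is a minimal illustration rather than the code used in the study: `skill` is a hypothetical caller-supplied function standing in for "train an RF on these predictors and score it on the independent testing set", and the stopping rule (halt as soon as no unselected predictor remains) is an assumption.

```python
def select_predictors(predictors, skill):
    """Alternate two forward-selection steps with one backward-elimination step.

    `skill` is a hypothetical scoring function: it takes a list of predictor
    names and returns a number, with higher meaning a more skillful RF.
    Returns the selected suite recorded after every step.
    """
    selected, unselected = [], list(predictors)
    history = []

    def forward():
        # Retain the candidate whose inclusion makes the RF most skillful.
        best = max(unselected, key=lambda p: skill(selected + [p]))
        unselected.remove(best)
        selected.append(best)
        history.append(list(selected))

    def backward():
        # Remove the predictor whose removal causes the smallest drop in
        # skill, i.e., leaves the most skillful reduced suite.
        drop = max(selected, key=lambda p: skill([q for q in selected if q != p]))
        selected.remove(drop)
        unselected.append(drop)
        history.append(list(selected))

    while unselected:
        forward()
        if not unselected:
            break
        forward()
        if not unselected:
            break
        backward()
    return history


# Toy usage with an additive, made-up skill function (not real RF skill).
weights = {"PWAT": 3.0, "PRES": 2.0, "CIN": 1.0}
steps = select_predictors(list(weights), lambda subset: sum(weights[p] for p in subset))
```

Because a real RF skill score is not additive, a predictor dropped at a backward step can re-enter later in combination with others, which is the behavior the selection trials are designed to expose.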

3) SCORING AND EVALUATION

The primary objective measure used to assess the performance of the probabilistic RF 2-h MCS-I forecasts was the area under the receiver operating characteristic (ROC) curve (AUC; Marzban 2004). ROC curves (Fig. 5) are obtained by finding the relationship between the hit rate [Hits/(Hits + Misses)] and false alarm rate [False_Positive/(False_Positive + Correct_Null)] for a range of RF vote thresholds. AUC has a long history in evaluating machine learning algorithms. The ROC curve maps hit rate as a function of false alarm (FA) rate across a range of thresholds available within the prediction (e.g., RF vote counts or likelihood values). An AUC value of one is indicative of a perfect forecast, while an AUC value of 0.5 is indicative of a purely random forecast. We also used the Gilbert skill score, commonly known as the equitable threat score (ETS), as a second metric to evaluate the RF forecasts. In this case, we took the maximum ETS value over all RF vote thresholds. Both AUC and maximum ETS can be used to compare RF performance to that of other forecasts, even if they have different units or are calibrated differently (Wilks 2006). Finally, we used the symmetric extreme dependency score (SEDS) to evaluate and intercompare the performance of the RF and other short-term forecasts in the real-time prediction of MCS-I observed during a 5-week period in 2013. The SEDS, which is described by Hogan et al. (2009), is an equitable skill score designed to more effectively evaluate the performance of forecasts of infrequently occurring events such as MCS-I.
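For concreteness, the scores named above can be written out from a 2 x 2 contingency table as in the sketch below. The function names are illustrative, the ETS and SEDS expressions are the standard textbook forms (the SEDS as given by Hogan et al. 2009), and no guard is included for empty categories, so every denominator is assumed to be nonzero.

```python
import math


def contingency(likelihoods, outcomes, threshold):
    """Count hits, false positives, misses, and correct nulls at one threshold."""
    hits = fps = misses = nulls = 0
    for p, y in zip(likelihoods, outcomes):
        yes = p >= threshold
        if yes and y:
            hits += 1
        elif yes:
            fps += 1
        elif y:
            misses += 1
        else:
            nulls += 1
    return hits, fps, misses, nulls


def hit_rate(hits, fps, misses, nulls):
    return hits / (hits + misses)


def false_alarm_rate(hits, fps, misses, nulls):
    return fps / (fps + nulls)


def ets(hits, fps, misses, nulls):
    """Equitable threat (Gilbert skill) score."""
    n = hits + fps + misses + nulls
    hits_random = (hits + fps) * (hits + misses) / n  # hits expected by chance
    return (hits - hits_random) / (hits + fps + misses - hits_random)


def seds(hits, fps, misses, nulls):
    """Symmetric extreme dependency score."""
    n = hits + fps + misses + nulls
    num = math.log((hits + fps) / n) + math.log((hits + misses) / n)
    return num / math.log(hits / n) - 1


def roc_auc(likelihoods, outcomes, thresholds):
    """Trapezoidal area under the (FA rate, hit rate) curve."""
    pts = {(0.0, 0.0), (1.0, 1.0)}  # end points of every ROC curve
    for t in thresholds:
        table = contingency(likelihoods, outcomes, t)
        pts.add((false_alarm_rate(*table), hit_rate(*table)))
    pts = sorted(pts)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

The maximum ETS used in the text would then be the largest `ets(*contingency(likelihoods, outcomes, t))` over all vote thresholds `t`.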

d. Predictor field optimization

In the first optimization step, redundant predictors were removed in order to reduce the amount of unnecessary information going into the training set. Highly correlated or redundant predictors, like CAPE and lifted index or 2-m dewpoint and 2-m specific humidity, were compared. It was found that the better variable to use depended on the number of variables in the training set. Lifted index was selected more often than CAPE when the suite was limited to five or fewer predictors (before step 15 in Fig. 4), but for more than five predictors, CAPE was selected more often, meaning that it was more valuable in combination with the other predictors in the larger set. Likewise, specific humidity was preferred when the number of predictor variables was small, but dewpoint worked better when a greater number of predictors were used. Since our final predictor suite has a larger number of predictors, CAPE and dewpoint were retained.

In the second set of predictor optimization experiments, the impact of predictor field smoothing was explored. Each of the remaining HRRR forecast fields was smoothed with circular filters with radii ranging from 10 to 80 km. It was found that using a 40-km circular smoothing filter resulted in the best skill scores. Figure 5 shows ROC curves obtained for RF predictions that were based on HRRR data only at raw resolution versus those obtained using a 40-km circular smoothing filter. There is no overlap between the 10 curves obtained using raw resolution and those obtained using a 40-km filter for hit rates between 0.3 and 0.9, indicating that the improved probability of detection associated with the smoothing is significant. The average AUC increased from 0.84 to 0.86 and the maximum ETS increased from 0.33 to 0.37 (Table 2). Both increases were large relative to the standard deviation across the 10 training sets, further indicating the significance of this result.
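A circular smoothing filter of this kind can be sketched as a flat disk average over a 2D grid. The version below is only illustrative: the 3-km default grid spacing, the equal weighting of all points inside the disk, and the treatment of grid edges (points off the grid are simply ignored) are assumptions, since the text does not specify them.

```python
def circular_smooth(field, radius_km, dx_km=3.0):
    """Average each grid point over all points within radius_km (flat disk).

    `field` is a list of rows; dx_km is the grid spacing (assumed value).
    Points outside the grid are ignored, so edges average over fewer cells.
    """
    ny, nx = len(field), len(field[0])
    r = int(radius_km // dx_km)  # kernel half-width in grid points
    offsets = [(dj, di)
               for dj in range(-r, r + 1)
               for di in range(-r, r + 1)
               if (dj * dx_km) ** 2 + (di * dx_km) ** 2 <= radius_km ** 2]
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            vals = [field[j + dj][i + di]
                    for dj, di in offsets
                    if 0 <= j + dj < ny and 0 <= i + di < nx]
            out[j][i] = sum(vals) / len(vals)
    return out


# Toy usage: a single nonzero point is spread over the surrounding disk.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
smoothed = circular_smooth(grid, radius_km=3.0, dx_km=3.0)
```

The pure-Python loops keep the sketch self-contained; on a full model grid an array library would be the practical choice.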

The final predictor optimization step was designed to assess the impact of observation-based variables on the skill of the RF forecasts. The value of adding radar reflectivity and 13.3–10-μm SBTD was assessed both

FIG. 5. False alarm rate vs hit rate (ROC curve) using unsmoothed HRRR (blue) and smoothed HRRR (red). The two sets (unsmoothed and smoothed HRRR) of 10 RFs were trained on even days and tested on odd days using summer (JJA) 2011 data. The predictor fields used are listed in Table 1. Note that lifted index (34LFTX_SPDL) and specific humidity (SPFH_HTGL) were omitted from the training set based on analyses described in the text.


individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

3. Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors and the best maximum ETS was with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case, the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best ratio was between 20% and 40%. The best AUC was achieved with 40% events and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event ratio of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.
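Rebalancing a training set toward a target event fraction can be sketched as below. This is one possible approach under stated assumptions: it keeps every event and randomly subsamples the nonevents, whereas the text does not say whether nonevents were subsampled or events were oversampled. The function name and the fixed random seed are illustrative.

```python
import random


def rebalance(events, nonevents, event_fraction=0.3, seed=0):
    """Keep all events and subsample nonevents to reach the target event fraction."""
    rng = random.Random(seed)
    # Number of nonevents needed so that events make up `event_fraction`.
    n_non = round(len(events) * (1 - event_fraction) / event_fraction)
    n_non = min(n_non, len(nonevents))
    sample = events + rng.sample(nonevents, n_non)
    rng.shuffle(sample)
    return sample


# Toy usage: 30 events drawn against a pool of 10 000 nonevents.
events = [("event", i) for i in range(30)]
nonevents = [("nonevent", i) for i in range(10000)]
training_set = rebalance(events, nonevents, event_fraction=0.3)
```

With 30 events and a 30% target, the sketch draws 70 nonevents, so the training set holds 100 instances even though events make up only about 0.3% of the original pool.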

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, p = V/200, where p is the likelihood of an MCS-I event and V is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.
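The vote-count transform is simple enough to state directly in code. In the short sketch below, the function name and the range check are illustrative additions, while the divisor of 200 is the forest size given in the text.

```python
N_TREES = 200  # forest size used in the text


def votes_to_likelihood(votes, n_trees=N_TREES):
    """Linear transform p = V / n_trees from vote count to likelihood."""
    if not 0 <= votes <= n_trees:
        raise ValueError("vote count must lie between 0 and n_trees")
    return votes / n_trees


# Example: 50 of 200 trees voting for an MCS-I event.
p = votes_to_likelihood(50)
```

The result is an uncalibrated likelihood, so it is best read in a relative sense rather than as a probability.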

TABLE 2. Skill scores for predictor optimization experiments.

Predictors                                 AUC    Std dev of AUC   Max ETS   Std dev of ETS
Raw HRRR (17) + local solar time (LST)     0.84   0.004            0.33      0.008
Smoothed HRRR + LST                        0.86   0.003            0.37      0.011
Smoothed HRRR + LST + SBTD                 0.87   0.003            0.38      0.010
Smoothed HRRR + LST + Radar REFC           0.88   0.003            0.40      0.007
Smoothed HRRR + LST + Radar REFC + SBTD    0.88   0.004            0.40      0.010

¹ Note that the predictions are not actually probabilities since no attempt has been made to calibrate the predicted likelihood values.


Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.


The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.
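The threshold sweep described above can be sketched as follows. This is an illustrative example rather than the authors' implementation: the function names, the sample values, and the trapezoidal AUC with the curve anchored at (0, 0) and (1, 1) are all assumptions. The input must contain at least one event and one nonevent.

```python
def roc_points(forecast, observed, thresholds):
    """(FA rate, hit rate) for the rule 'forecast >= threshold' at each threshold."""
    pairs = list(zip(forecast, observed))
    points = []
    for t in thresholds:
        hits = sum(1 for f, o in pairs if f >= t and o)
        misses = sum(1 for f, o in pairs if f < t and o)
        fas = sum(1 for f, o in pairs if f >= t and not o)
        nulls = sum(1 for f, o in pairs if f < t and not o)
        points.append((fas / (fas + nulls), hits / (hits + misses)))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical reflectivity forecasts (dBZ) and observed event flags.
dbz = [10, 45, 30, 5, 50, 20, 40, 15]
event = [0, 1, 0, 0, 1, 0, 1, 0]
curve = roc_points(dbz, event, range(0, 70, 5))  # 0 to 65 dBZ in 5-dBZ steps
area = auc(curve)
```

The same two functions apply to the climatology or to RF likelihoods by swapping the threshold list, for example frequencies from 0 to 0.05 in steps of 0.005.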

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88, from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than an informed climatology; however, the relative gain in skill over the informed climatology is seen to be regionally dependent, with the RF skill gain being much greater for forecasts made over the Great Plains (GP) region than for those made over the Southeast region. Also of note, the RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

                                                                      Extrapolated        Observed climatological
                              RF                  HRRR reflectivity   reflectivity        MCS-I frequency
                              eUS   SE    GP      eUS   SE    GP      eUS   SE    GP      eUS   SE    GP
Hit rate for FA rate = 0.1    0.31  0.27  0.28    0.24  0.21  0.23    0.20  0.14  0.19    0.22  0.21  0.12
AUC                           0.76  0.73  0.75    0.59  0.57  0.60    0.55  0.52  0.53    0.61  0.65  0.52
Max SEDS                      0.19  0.17  0.18    0.14  0.13  0.14    0.11  0.06  0.11    0.15  0.13  0.03

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause of FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FA (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide whether the possibility of MCS-I warrants more attention.

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can be used as guidance for the short-term prediction of discrete high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class 1
  Description: RF probabilities are always elevated in areas of ongoing MCSs.
  Comment: Use RF along with current radar observations to disregard these areas.
  Region of most common occurrence: Entire domain.

Class 2
  Description: RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event.
  Comment: RF provides an early indication of regions where MCS-I is possible in the next 2–4 h.
  Region of most common occurrence: Southeastern United States.

Class 3
  Description: Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria.
  Comment: Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments.
  Region of most common occurrence: Southeastern United States.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events relative to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was clearly a point of diminishing returns. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.
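As an illustration of how a training set with a prescribed event fraction might be assembled, the sketch below keeps every event and randomly undersamples nonevents. This section does not state how the 30% fraction was achieved, so undersampling, the function name, and all parameter names are assumptions made for the example.

```python
import random

def resample_to_event_fraction(samples, labels, event_fraction, seed=0):
    """Keep every event and randomly undersample nonevents so that events
    make up roughly `event_fraction` of the returned (sample, label) pairs."""
    rng = random.Random(seed)
    events = [s for s, y in zip(samples, labels) if y]
    nonevents = [s for s, y in zip(samples, labels) if not y]
    # Number of nonevents needed to reach the requested event fraction.
    n_non = round(len(events) * (1.0 - event_fraction) / event_fraction)
    n_non = min(n_non, len(nonevents))
    kept = rng.sample(nonevents, n_non)
    data = [(s, 1) for s in events] + [(s, 0) for s in kept]
    rng.shuffle(data)
    return data

# Hypothetical sample: 30 events in 10 000 points (0.3%, as in the text).
labels = [1 if i < 30 else 0 for i in range(10000)]
training = resample_to_event_fraction(list(range(10000)), labels, 0.30)
```

With a 0.3% base rate, reaching a 30% event fraction discards most nonevents, which is one reason the raw RF likelihoods should not be read as calibrated probabilities.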

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF-forecasted likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product using the locations of existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts, since it provides advance notice of the potential for an MCS-I event.

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011, while the ROC curves shown in Fig. 9 were obtained using RFs trained on 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
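One way such recency-weighted sampling could look is sketched below. The Gaussian day-of-year kernel, the per-year decay factor, the default parameter values, and all function names are hypothetical choices made for illustration; the paper does not specify a weighting scheme.

```python
import math
import random

def recency_weight(sample_doy, sample_year, now_doy, now_year,
                   doy_scale=30.0, year_decay=0.5):
    """Hypothetical weight: larger for samples close to the current day of
    year and for samples from the current convective season."""
    # Circular day-of-year distance so late December is near early January.
    d = abs(sample_doy - now_doy)
    d = min(d, 365 - d)
    seasonal = math.exp(-(d / doy_scale) ** 2)
    age = year_decay ** (now_year - sample_year)
    return seasonal * age

def draw_training_set(samples, now_doy, now_year, size, seed=0):
    """Draw `size` samples with replacement, in proportion to their weights.
    Each sample is a (day_of_year, year, payload) tuple."""
    rng = random.Random(seed)
    weights = [recency_weight(doy, yr, now_doy, now_year)
               for doy, yr, _ in samples]
    return rng.choices(samples, weights=weights, k=size)

# Hypothetical archive: one record per day for days 152-243 of 2011-2013.
archive = [(doy, yr, None) for yr in (2011, 2012, 2013) for doy in range(152, 244)]
subset = draw_training_set(archive, now_doy=180, now_year=2013, size=500)
```

The two scale parameters encode the trade-off discussed above: a small `doy_scale` or `year_decay` tracks recent conditions closely, while larger values retain a wider range of conditions.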

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
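A minimal sketch of this kind of calibration is given below, assuming equal-width likelihood bins and a simple lookup of the observed event frequency in each bin. The function names, the bin count, and the sample data are hypothetical, and this is not necessarily the approach of Williams (2014).

```python
def fit_calibration(likelihoods, observed, n_bins=10):
    """Observed event frequency in each equal-width likelihood bin."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for p, o in zip(likelihoods, observed):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        counts[k] += 1
        events[k] += 1 if o else 0
    # Bins with no training samples are left as None (no estimate).
    return [e / c if c else None for e, c in zip(events, counts)]

def calibrate(p, table):
    """Map a raw likelihood to the observed frequency of its bin."""
    k = min(int(p * len(table)), len(table) - 1)
    return table[k]

# Hypothetical raw RF likelihoods and whether MCS-I was observed.
raw = [0.05, 0.05, 0.85, 0.85, 0.85, 0.85]
obs = [0, 0, 1, 0, 0, 0]
table = fit_calibration(raw, obs)
```

In this toy sample a raw likelihood of 0.85 maps to an observed frequency of 0.25, which illustrates how calibrating a high-biased forecast compresses the range of likelihood values.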

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 136. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 12. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


model convective inhibition (CIN_SFC) was retained least often (9.3 steps per trial). It is unclear why CIN_SFC seems to have lower importance in the training set; however, it is retained owing to previous reports of its importance in the prediction of MCS-I (e.g., Jirak and Cotton 2007).

3) SCORING AND EVALUATION

The primary objective measure used to assess the performance of the probabilistic RF 2-h MCS-I forecasts was the area under the receiver operating characteristic (ROC) curve (AUC; Marzban 2004). ROC curves (Fig. 5) map the hit rate [Hits/(Hits + Misses)] as a function of the false alarm (FA) rate [False_Positive/(False_Positive + Correct_Null)] across the range of thresholds available within the prediction (e.g., RF vote counts or likelihood values). AUC has a long history in evaluating machine learning algorithms. An AUC value of one is indicative of a perfect forecast, while an AUC value of 0.5 is indicative of a purely random forecast. We also used the Gilbert skill score, commonly known as the equitable threat score (ETS), as a second metric to evaluate the RF forecasts. In this case, we took the maximum ETS value over all RF vote thresholds. Both AUC and maximum ETS can be used to compare RF performance to that of other forecasts, even if they have different units or are calibrated differently (Wilks 2006). Finally, we used the symmetric extreme dependency score (SEDS) to evaluate and intercompare the performance of the RF and other short-term forecasts in the real-time prediction of MCS-I observed during a 5-week period in 2013. The SEDS, which is described by Hogan et al. (2009), is an equitable skill score designed to more effectively evaluate the performance of forecasts of infrequently occurring events such as MCS-I.
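The scores above can be illustrated with a minimal Python sketch. It is not the authors' verification code: the function names and the toy data are hypothetical, the ETS and SEDS expressions follow the standard contingency-table definitions (Wilks 2006; Hogan et al. 2009), and the AUC is approximated with the trapezoidal rule over the supplied thresholds.

```python
import math

def contingency(scores, observed, threshold):
    """Return (hits, false_pos, misses, correct_nulls) for one threshold."""
    hits = false_pos = misses = correct_nulls = 0
    for score, event in zip(scores, observed):
        forecast_yes = score >= threshold
        if forecast_yes and event:
            hits += 1
        elif forecast_yes:
            false_pos += 1
        elif event:
            misses += 1
        else:
            correct_nulls += 1
    return hits, false_pos, misses, correct_nulls

def hit_rate(hits, false_pos, misses, correct_nulls):
    return hits / (hits + misses)

def false_alarm_rate(hits, false_pos, misses, correct_nulls):
    return false_pos / (false_pos + correct_nulls)

def ets(hits, false_pos, misses, correct_nulls):
    """Gilbert skill score: hits beyond those expected by chance."""
    n = hits + false_pos + misses + correct_nulls
    hits_random = (hits + false_pos) * (hits + misses) / n
    return (hits - hits_random) / (hits + false_pos + misses - hits_random)

def seds(hits, false_pos, misses, correct_nulls):
    """Symmetric extreme dependency score (undefined when hits == 0)."""
    n = hits + false_pos + misses + correct_nulls
    forecast_freq = (hits + false_pos) / n
    base_rate = (hits + misses) / n
    return (math.log(forecast_freq) + math.log(base_rate)) / math.log(hits / n) - 1.0

def roc_auc(scores, observed, thresholds):
    """Trapezoidal area under the (FA rate, hit rate) points."""
    points = {(0.0, 0.0), (1.0, 1.0)}
    for threshold in thresholds:
        table = contingency(scores, observed, threshold)
        points.add((false_alarm_rate(*table), hit_rate(*table)))
    ordered = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]))

# Toy example (made-up likelihoods and outcomes, not data from this study).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
observed = [1, 1, 0, 1, 0, 0, 0, 0]
table = contingency(scores, observed, 0.5)
max_ets = max(ets(*contingency(scores, observed, t)) for t in scores)
auc = roc_auc(scores, observed, scores)
```

Taking the maximum ETS over all thresholds, as in `max_ets`, mirrors the way the maximum ETS is defined in the text.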

d. Predictor field optimization

In the first optimization step, redundant predictors were removed in order to reduce the amount of unnecessary information going into the training set. Highly correlated or redundant predictors, like CAPE and lifted index or 2-m dewpoint and 2-m specific humidity, were compared. It was found that the better variable to use depended on the number of variables in the training set. Lifted index was selected more often than CAPE when the suite was limited to five or fewer predictors (before step 15 in Fig. 4), but for more than five predictors, CAPE was selected more often, meaning that it was more valuable in combination with the other predictors in the larger set. Likewise, specific humidity was preferred when the number of predictor variables was small, but dewpoint worked better when a greater number of predictors were used. Since our final predictor suite has a larger number of predictors, CAPE and dewpoint were retained.

In the second set of predictor optimization experiments, the impact of predictor field smoothing was explored. Each of the remaining HRRR forecast fields was smoothed with circular filters with radii ranging from 10 to 80 km. It was found that using a 40-km circular smoothing filter resulted in the best skill scores. Figure 5 shows ROC curves obtained for RF predictions that were based on HRRR data only at raw resolution versus those obtained using a 40-km circular smoothing filter. There is no overlap between the 10 curves obtained using raw resolution and those obtained using a 40-km filter for hit rates between 0.3 and 0.9, indicating that the improved probability of detection associated with the smoothing is significant. The average AUC increased from 0.84 to 0.86, and the maximum ETS increased from 0.33 to 0.37 (Table 2). Both increases were large relative to the standard deviation across the 10 training sets, further indicating the significance of this result.
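As an illustration of the smoothing step, the sketch below applies a circular mean filter to a small 2D grid in plain Python. The function name, the grid spacing argument, and the choice to average only over in-domain points near the edges are assumptions made for this example; the text does not specify how the filter was implemented.

```python
def circular_mean_filter(field, radius_km, grid_spacing_km):
    """Replace each grid value with the mean of all values within radius_km."""
    radius = radius_km / grid_spacing_km  # filter radius in grid lengths
    reach = int(radius)
    # Offsets of every grid point that falls inside the circular footprint.
    offsets = [(di, dj)
               for di in range(-reach, reach + 1)
               for dj in range(-reach, reach + 1)
               if di * di + dj * dj <= radius * radius]
    nrows, ncols = len(field), len(field[0])
    smoothed = [[0.0] * ncols for _ in range(nrows)]
    for i in range(nrows):
        for j in range(ncols):
            total, count = 0.0, 0
            for di, dj in offsets:
                ii, jj = i + di, j + dj
                if 0 <= ii < nrows and 0 <= jj < ncols:  # skip out-of-domain points
                    total += field[ii][jj]
                    count += 1
            smoothed[i][j] = total / count
    return smoothed

# Toy example: a single spike on a 5 x 5 grid, filter radius of one grid length.
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
smoothed_spike = circular_mean_filter(spike, radius_km=1.0, grid_spacing_km=1.0)
```

For the 40-km filter discussed above, one would call the function with `radius_km=40.0` and the grid spacing of the model fields. The nested loops are written for clarity rather than speed.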

The final predictor optimization step was designed to assess the impact of observation-based variables on the skill of the RF forecasts. The value of adding radar reflectivity and 13.3–10-μm SBTD was assessed both

FIG. 5. False alarm rate vs hit rate (ROC curve) using unsmoothed HRRR (blue) and smoothed HRRR (red). The two sets (unsmoothed and smoothed HRRR) of 10 RFs were trained on even days and tested on odd days using summer (JJA) 2011 data. The predictor fields used are listed in Table 1. Note that lifted index (34LFTX_SPDL) and specific humidity (SPFH_HTGL) were omitted from the training set based on analyses described in the text.


individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

3. Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with a greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors, and the best maximum ETS was with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.
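The default described above is simple arithmetic, sketched here in Python. The function name is hypothetical and is not part of the Topic et al. (2014) software.

```python
import math

def default_split_candidates(n_predictors):
    """Integer part of the square root of the number of predictors."""
    return math.isqrt(n_predictors)

default_for_this_study = default_split_candidates(19)  # 19 predictors, as in the text
```

With 19 predictors the square root is about 4.36, so the integer part is 4, which is twice the value of two candidates adopted for the remaining experiments.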

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best event percentage was between 20% and 40%. The best AUC was achieved with 40% events, and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event percentage of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.
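One way to build a training set with a chosen event percentage is to keep every event and draw a random subset of nonevents. The sketch below assumes that approach, and the function and variable names are hypothetical. The text does not state whether nonevents were undersampled or events were oversampled.

```python
import random

def rebalance(event_samples, nonevent_samples, event_fraction, seed=0):
    """Keep all events and subsample nonevents to reach event_fraction."""
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must be strictly between 0 and 1")
    n_events = len(event_samples)
    # Number of nonevents so that events / (events + nonevents) = event_fraction.
    n_nonevents = round(n_events * (1.0 - event_fraction) / event_fraction)
    n_nonevents = min(n_nonevents, len(nonevent_samples))
    rng = random.Random(seed)
    chosen = rng.sample(nonevent_samples, n_nonevents)
    training = [(sample, 1) for sample in event_samples]
    training += [(sample, 0) for sample in chosen]
    rng.shuffle(training)
    return training

# Toy example: 30 events and 10 000 nonevents (roughly the 0.3% climatology).
events = list(range(30))
nonevents = list(range(1000, 11000))
training_set = rebalance(events, nonevents, event_fraction=0.30)
```

In this toy case the rebalanced set holds 30 events and 70 nonevents, so events make up 30% of the training samples instead of about 0.3%.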

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, p = V/200, where p is the likelihood of an MCS-I event and V is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.
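The linear transform from vote count to likelihood can be written as a one-line function. The sketch below uses a hypothetical function name and adds a range check that is not mentioned in the text.

```python
def votes_to_likelihood(votes, n_trees=200):
    """Fraction of trees voting for MCS-I, p = V / n_trees."""
    if not 0 <= votes <= n_trees:
        raise ValueError("vote count must lie between 0 and n_trees")
    return votes / n_trees

likelihood_20_votes = votes_to_likelihood(20)    # 20 of 200 trees
likelihood_140_votes = votes_to_likelihood(140)  # 140 of 200 trees
```

Under this transform, a likelihood of 0.1 corresponds to 20 of the 200 trees voting for MCS-I, and a likelihood of 0.7 corresponds to 140 trees.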

TABLE 2. Skill scores for predictor optimization experiments.

Predictors                                AUC    Std dev of AUC   Max ETS   Std dev of ETS
Raw HRRR (17) + local solar time (LST)    0.84   0.004            0.33      0.008
Smoothed HRRR + LST                       0.86   0.003            0.37      0.011
Smoothed HRRR + LST + SBTD                0.87   0.003            0.38      0.010
Smoothed HRRR + LST + RadarREFC           0.88   0.003            0.40      0.007
Smoothed HRRR + LST + RadarREFC + SBTD    0.88   0.004            0.40      0.010

¹ Note that the predictions are not actually probabilities, since no attempt has been made to calibrate the predicted likelihood values.


Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts, since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.


The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.
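The threshold sweep described above can be sketched in Python as follows. The function name and the toy reflectivity values are hypothetical, and a grid point is counted as a "yes" forecast when its value meets or exceeds the threshold, which is an assumption of this example.

```python
def roc_points(values, observed, thresholds):
    """Return (threshold, FA rate, hit rate) for each threshold."""
    points = []
    for threshold in thresholds:
        hits = misses = false_pos = correct_nulls = 0
        for value, event in zip(values, observed):
            forecast_yes = value >= threshold
            if forecast_yes and event:
                hits += 1
            elif forecast_yes:
                false_pos += 1
            elif event:
                misses += 1
            else:
                correct_nulls += 1
        points.append((threshold,
                       false_pos / (false_pos + correct_nulls),
                       hits / (hits + misses)))
    return points

# Thresholds quoted in the text.
dbz_thresholds = list(range(0, 70, 5))               # 0, 5, ..., 65 dBZ
climo_thresholds = [i * 0.005 for i in range(11)]    # 0, 0.005, ..., 0.05

# Toy example (made-up reflectivity in dBZ and MCS-I outcomes).
reflectivity = [50, 40, 10, 30, 5, 0]
mcs_initiation = [1, 1, 0, 0, 0, 0]
curve = roc_points(reflectivity, mcs_initiation, dbz_thresholds)
```

Each deterministic field therefore contributes 14 points to its ROC curve, and the climatology contributes 11.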

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88 from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.


RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40, and the maximum ETS peaks at 0.407 when the event percentage is 30.


reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

                            RF                  HRRR                Extrapolated        Observed climatological
                                                reflectivity        reflectivity        MCS-I frequency
                            eUS   SE    GP      eUS   SE    GP      eUS   SE    GP      eUS   SE    GP
Hit rate for FA rate = 0.1  0.31  0.27  0.28    0.24  0.21  0.23    0.20  0.14  0.19    0.22  0.21  0.12
AUC                         0.76  0.73  0.75    0.59  0.57  0.60    0.55  0.52  0.53    0.61  0.65  0.52
Max SEDS                    0.19  0.17  0.18    0.14  0.13  0.14    0.11  0.06  0.11    0.15  0.13  0.03


increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FAs (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.


size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class 1
  Description: RF probabilities are always elevated in areas of ongoing MCSs.
  Comment: Use RF along with current radar observations to disregard these areas.
  Region of most common occurrence: Entire domain.

Class 2
  Description: RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event.
  Comment: RF provides an early indication of regions where MCS-I is possible in the next 2–4 h.
  Region of most common occurrence: Southeastern United States.

Class 3
  Description: Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria.
  Comment: Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments.
  Region of most common occurrence: Southeastern United States.


be used as guidance for the short-term prediction of discrete, high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF-forecasted likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product in areas with existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts, as it provides advance notice of the potential for an MCS-I event.

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011,


while the ROC curves shown in Fig. 9 were obtained using RFs trained using 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time, one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
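A minimal sketch of such a calibration is given below, assuming equal-width likelihood bins and a simple lookup of the observed event frequency in each bin. The function names, the bin count, and the toy data are hypothetical. This is not the approach of Williams (2014), which is not reproduced here.

```python
def calibration_table(likelihoods, observed, n_bins=10):
    """Observed event frequency in each equal-width likelihood bin."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for p, event in zip(likelihoods, observed):
        k = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls in the last bin
        counts[k] += 1
        events[k] += 1 if event else 0
    # None marks bins that contain no forecasts.
    return [events[k] / counts[k] if counts[k] else None for k in range(n_bins)]

def calibrate(p, table):
    """Map a raw likelihood to the observed frequency of its bin."""
    k = min(int(p * len(table)), len(table) - 1)
    return table[k]

# Toy example (made-up likelihoods and outcomes, not data from this study).
raw = [0.05, 0.15, 0.15, 0.85, 0.85, 0.85, 0.85]
outcome = [0, 0, 0, 1, 0, 0, 1]
table = calibration_table(raw, outcome)
calibrated_high = calibrate(0.82, table)
```

In the toy data, forecasts near 0.85 verify only half of the time, so the calibrated value drops to 0.5. This illustrates how a high bias compresses the range of calibrated likelihoods.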

The findings presented herein, along with the positive results of Mecikalski et al. (2015) on the prediction of small-scale storm initiation and Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete, high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 136. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 12. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


individually and in combination. These two fields were added to include information regarding the location and spacing of cloud and precipitation areas that have yet to reach the size threshold required to be classified as an MCS. For consistency, these fields were also smoothed using a 40-km circular filter. To account for storm motion, these fields were extrapolated to their expected locations 2 h later, to be consistent with the corresponding HRRR forecast fields. Adding smoothed SBTD had little impact on the skill of the RF forecasts, whether added alone or in combination with radar reflectivity, while adding radar reflectivity resulted in a significant increase in skill (Table 2). This increase in skill associated with including radar reflectivity as a predictor field is comparable to that obtained by smoothing the model data. Despite the failings of the SBTD, this field was retained because of the value found in other studies (e.g., Mecikalski and Bedka 2006; Mecikalski et al. 2015).

3 Sensitivity to RF parameters

We tested the sensitivity of the RF performance to several parameters that control aspects of the training. These parameters include the size of the forest (the number of trees), the percentage of positive events in the training set, and the number of candidate variables to use for splitting at each tree node.

A forest with more trees will generally be more skillful than one with fewer trees because it can accommodate more of the nuances of the training set. However, there comes a point when the rate of improvement with more trees is negligible. Using the datasets described above, forests were trained with sizes ranging from 4 to 500 trees. The AUC and maximum ETS affirm that more trees lead to better scores (Fig. 6). However, the improvement slows greatly after about 50 trees. The mean scores for the 50-tree forests were within one standard deviation of the mean scores for the 500-tree forests. This pattern of diminishing returns with a greater number of trees is similar to that found by McGovern et al. (2011). Henceforth, 200 trees are used in all the forests.

The RF skill was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the Topic et al. (2014) software uses the integer value of the square root of the total number of predictors for this parameter. With 19 total predictors, 4 would be the default. Our analysis reveals that using fewer predictors was slightly better (Fig. 7). The best AUC was achieved with two predictors, and the best maximum ETS was achieved with three predictors (Fig. 7). Most of the ±1 standard deviation ranges overlap, so in any case the results are not overly sensitive to this RF parameter. For the rest of our experiments, splitting of the RF at nodes is done using two predictors.
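The default described above, and the random draw of candidate predictors at a node, can be sketched as follows. The function names and the predictor labels are hypothetical; this is not the interface of the Topic et al. (2014) software.

```python
import math
import random


def default_candidate_count(n_predictors):
    """Integer part of the square root of the number of predictors."""
    return math.isqrt(n_predictors)


def draw_candidates(predictor_names, n_candidates, rng):
    """Pick, without replacement, the predictors considered at a single node."""
    return rng.sample(predictor_names, n_candidates)
```

For 19 predictors, `default_candidate_count(19)` evaluates to 4, matching the default quoted in the text, whereas the experiments here would call `draw_candidates(names, 2, rng)` at each node.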

The AUC and maximum ETS of the RFs were most sensitive to the ratio of events to nonevents in the training set. Williams (2014) alluded to the importance of rebalancing the proportion of events to nonevents in the training set when trying to predict very rare events. Results of our sensitivity analysis indicate that the best event percentage was between 20% and 40%. The best AUC was achieved with 40% events, and the best maximum ETS was achieved with 30% events (Fig. 8). It is clear that using an event percentage of 5%, which is closest to the climatological frequency of occurrence of MCS-I over the entire domain (0.3%), resulted in the worst performance.
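One simple way to obtain a training set with a prescribed event percentage is to keep every event and subsample the nonevents. The sketch below assumes that approach; the text does not state how the rebalancing was actually carried out, and the function name is hypothetical.

```python
import random


def rebalance(events, nonevents, event_fraction, seed=0):
    """Keep all events and subsample nonevents to reach event_fraction.

    Returns (events, kept_nonevents) such that
    len(events) / (len(events) + len(kept_nonevents)) is about event_fraction.
    """
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must lie strictly between 0 and 1")
    n_keep = round(len(events) * (1.0 - event_fraction) / event_fraction)
    if n_keep > len(nonevents):
        raise ValueError("not enough nonevents to reach the requested fraction")
    rng = random.Random(seed)
    return events, rng.sample(nonevents, n_keep)
```

Starting from a climatological mix of 30 events among 10 000 samples (0.3%), a target of 30% keeps all 30 events and only 70 of the 9970 nonevents.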

4. Evaluation and case studies

Based on these sensitivity experiments, we used a training set that consisted of 30% MCS-I events selected from the JJA period in 2011 to train a 200-tree RF to make 2-h forecasts of MCS-I in real time. A new RF-based MCS-I forecast was issued every hour. The predictive skill of these forecasts was evaluated over the period 11 June–5 August 2013, inclusive. Vote counts were converted to interest or likelihood¹ values using a simple linear transform, p = V/200, where p is the likelihood of an MCS-I event and V is the vote count. Only forecasts for which a complete set of predictors was available were evaluated, resulting in a total of 654 forecasts during the evaluation period.
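The linear transform from vote count to likelihood amounts to a single division, sketched here with a hypothetical function name and a range check added for safety.

```python
def vote_likelihood(votes, n_trees=200):
    """Convert the number of trees voting for MCS-I into a likelihood p = V / n_trees."""
    if not 0 <= votes <= n_trees:
        raise ValueError("vote count must lie between 0 and n_trees")
    return votes / n_trees
```

A grid point at which 20 of the 200 trees vote for MCS-I therefore receives a likelihood of 0.1, and one with 100 votes receives 0.5.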

TABLE 2. Skill scores for predictor optimization experiments.

Predictors                                AUC    Std dev of AUC   Max ETS   Std dev of ETS
Raw HRRR (17) + local solar time (LST)    0.84   0.004            0.33      0.008
Smoothed HRRR + LST                       0.86   0.003            0.37      0.011
Smoothed HRRR + LST + SBTD                0.87   0.003            0.38      0.010
Smoothed HRRR + LST + RadarREFC           0.88   0.003            0.40      0.007
Smoothed HRRR + LST + RadarREFC + SBTD    0.88   0.004            0.40      0.010

¹ Note that the predictions are not actually probabilities, since no attempt has been made to calibrate the predicted likelihood values.


Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts, such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts, since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts,² and RF-based MCS-I likelihood forecasts) were available.
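For readers unfamiliar with the SEDS, the sketch below computes it from a 2 × 2 contingency table. It assumes the symmetric extreme dependency score as commonly defined following Hogan et al. (2009), SEDS = [ln((a + b)/n) + ln((a + c)/n)] / ln(a/n) − 1, with a hits, b false alarms, c misses, and n the total sample size; the function name is hypothetical and the formula is not quoted from this paper.

```python
import math


def seds(hits, false_alarms, misses, correct_negatives):
    """Symmetric extreme dependency score from contingency-table counts."""
    n = hits + false_alarms + misses + correct_negatives
    if hits == 0:
        raise ValueError("SEDS is undefined when there are no hits")
    numerator = math.log((hits + false_alarms) / n) + math.log((hits + misses) / n)
    return numerator / math.log(hits / n) - 1.0
```

Under this definition a perfect forecast scores 1 and a forecast with no more hits than expected by chance scores 0, regardless of how rare the event is.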

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

² Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.


The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.
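The thresholding procedure described above can be sketched as follows. The code assumes that "FA rate" denotes the fraction of nonevents that were forecast (the x axis of a ROC diagram) and that the AUC is integrated with the trapezoidal rule; both function names are hypothetical.

```python
def roc_points(forecast, observed, thresholds):
    """Return one (FA rate, hit rate) pair per threshold.

    A forecast value at or above the threshold counts as a "yes" forecast.
    Both events and nonevents must be present in `observed`.
    """
    n_events = sum(1 for o in observed if o)
    n_nonevents = len(observed) - n_events
    points = []
    for t in thresholds:
        hits = sum(1 for f, o in zip(forecast, observed) if o and f >= t)
        false_alarms = sum(1 for f, o in zip(forecast, observed) if not o and f >= t)
        points.append((false_alarms / n_nonevents, hits / n_events))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

For a reflectivity field, the thresholds would be `range(0, 70, 5)` (0 to 65 dBZ in 5-dBZ steps); for the climatology, the MCS-I frequencies from 0 to 0.05 in steps of 0.005 play the same role.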

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88, from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than for those made over the Southeast region. Also of note, the

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.


RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.


reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

                             RF                  HRRR reflectivity   Extrapolated        Observed climatological
                                                                     reflectivity        MCS-I frequency
                             eUS   SE    GP      eUS   SE    GP      eUS   SE    GP      eUS   SE    GP
Hit rate for FA rate = 0.1   0.31  0.27  0.28    0.24  0.21  0.23    0.20  0.14  0.19    0.22  0.21  0.12
AUC                          0.76  0.73  0.75    0.59  0.57  0.60    0.55  0.52  0.53    0.61  0.65  0.52
Max SEDS                     0.19  0.17  0.18    0.14  0.13  0.14    0.11  0.06  0.11    0.15  0.13  0.03


increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause of FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FA (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.


size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class 1
  Description: RF probabilities are always elevated in areas of ongoing MCSs.
  Comment: Use RF along with current radar observations to disregard these areas.
  Region of most common occurrence: Entire domain

Class 2
  Description: RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event.
  Comment: RF provides an early indication of regions where MCS-I is possible in the next 2–4 h.
  Region of most common occurrence: Southeastern United States

Class 3
  Description: Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria.
  Comment: Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments.
  Region of most common occurrence: Southeastern United States


be used as guidance for the short-term prediction of discrete, high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the ratio of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.
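As a rough illustration of the event-fraction tuning described above, the following sketch undersamples nonevents until events make up a target share of the training set. It is not the authors' code: the function name, the label encoding (1 for an MCS-I event, 0 for a nonevent), and the choice to keep every event while undersampling nonevents are assumptions made for illustration.

```python
import random

def resample_to_event_fraction(labels, target_fraction, seed=0):
    """Return indices of a training subset in which events (label 1)
    make up roughly `target_fraction` of the samples.

    Assumption: every event is kept and nonevents (label 0) are randomly
    undersampled; the paper does not state how its training mix was built.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    rng = random.Random(seed)
    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]
    # Number of nonevents needed so that events / total == target_fraction.
    n_keep = round(len(events) * (1.0 - target_fraction) / target_fraction)
    n_keep = min(n_keep, len(nonevents))
    subset = events + rng.sample(nonevents, n_keep)
    rng.shuffle(subset)
    return subset

# Synthetic labels at the 0.3% climatological frequency quoted in the text.
labels = [1] * 30 + [0] * 9970
subset = resample_to_event_fraction(labels, target_fraction=0.30)
event_share = sum(labels[i] for i in subset) / len(subset)
print(len(subset), round(event_share, 2))
```

Undersampling is only one way to reach a given event share; oversampling events or weighting samples would be alternatives, and the text does not indicate which was used.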

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF-forecasted likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product wherever MCSs already exist. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts, since it provides advance notice of the potential for an MCS-I event.
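The masking of class 1 false alarms mentioned above could look something like the following sketch. Everything here is hypothetical: the grid layout (lists of rows), the circular buffer measured in grid cells, and the fill value are assumptions, since the text only says that such forecasts could be masked out automatically.

```python
def mask_near_ongoing_mcs(likelihood, ongoing, radius_cells, fill=0.0):
    """Return a copy of `likelihood` with values replaced by `fill` within
    `radius_cells` grid cells (Euclidean distance) of any ongoing-MCS cell.

    likelihood: list of rows of RF likelihood values
    ongoing:    list of rows of booleans, True where an MCS already exists
    """
    n_rows, n_cols = len(likelihood), len(likelihood[0])
    masked = [row[:] for row in likelihood]
    for i in range(n_rows):
        for j in range(n_cols):
            if not ongoing[i][j]:
                continue
            for di in range(-radius_cells, radius_cells + 1):
                for dj in range(-radius_cells, radius_cells + 1):
                    if di * di + dj * dj > radius_cells * radius_cells:
                        continue  # outside the circular buffer
                    r, c = i + di, j + dj
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        masked[r][c] = fill
    return masked

# A 5 x 5 field of uniformly elevated likelihood with one ongoing MCS cell.
field = [[0.6] * 5 for _ in range(5)]
mcs = [[False] * 5 for _ in range(5)]
mcs[2][2] = True
result = mask_near_ongoing_mcs(field, mcs, radius_cells=1)
for row in result:
    print(row)
```

A real product would also have to decide how long to keep masking after an MCS dissipates, which this sketch does not address.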

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011, while the ROC curves shown in Fig. 9 were obtained using RFs trained using 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
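One way to picture the recency-weighted sampling suggested above is sketched below. This is purely illustrative: the Gaussian weighting in day of year, the per-year discount, and both tuning constants (`doy_scale`, `prior_year_factor`) are assumptions, not values or methods taken from the paper.

```python
import math
import random

def recency_weight(sample_doy, sample_year, current_doy, current_year,
                   doy_scale=15.0, prior_year_factor=0.5):
    """Hypothetical weight favoring samples near the current day of year
    and from the current season. Both constants are illustrative."""
    # Circular day-of-year distance (leap years ignored for simplicity).
    diff = abs(sample_doy - current_doy)
    diff = min(diff, 365 - diff)
    seasonal = math.exp(-0.5 * (diff / doy_scale) ** 2)
    years_back = max(current_year - sample_year, 0)
    return seasonal * prior_year_factor ** years_back

def draw_training_set(samples, current_doy, current_year, size, seed=0):
    """Draw `size` samples with replacement, weighted by recency."""
    rng = random.Random(seed)
    weights = [recency_weight(doy, year, current_doy, current_year)
               for doy, year in samples]
    return rng.choices(samples, weights=weights, k=size)

# Each sample is (day of year, year); real samples would also carry predictors.
archive = [(doy, year) for year in (2011, 2012, 2013) for doy in range(152, 244)]
training = draw_training_set(archive, current_doy=200, current_year=2013, size=1000)
share_current = sum(1 for _, year in training if year == 2013) / len(training)
print(round(share_current, 2))
```

With these assumed constants a little over half of the drawn samples come from the current season; a longer `doy_scale` or a larger `prior_year_factor` would trade responsiveness for a broader range of conditions, which is the trade-off the text describes.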

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
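A minimal sketch of calibrating likelihoods against observed frequencies is given below, using equal-width bins. The binning scheme, the function names, and the fallback for empty bins are assumptions; the paper points to Williams (2014) for the approach it has in mind, and that approach may differ.

```python
def build_calibration_table(likelihoods, outcomes, n_bins=10):
    """Map each likelihood bin to the observed event frequency in that bin.
    Bins that received no samples are stored as None."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(likelihoods, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[b] += 1
        hits[b] += y
    return [h / c if c else None for h, c in zip(hits, counts)]

def calibrate(p, table):
    """Replace a raw likelihood with the observed frequency of its bin.
    Assumption: raw values in empty bins are passed through unchanged."""
    b = min(int(p * len(table)), len(table) - 1)
    observed = table[b]
    return p if observed is None else observed

# Synthetic, overforecast likelihoods: 0.75 verifies 30% of the time,
# and 0.25 verifies 10% of the time.
raw = [0.75] * 10 + [0.25] * 10
obs = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
table = build_calibration_table(raw, obs)
calibrated = [calibrate(p, table) for p in raw]
print(min(raw), max(raw))
print(min(calibrated), max(calibrated))
```

In this synthetic case the raw values span 0.25 to 0.75 while the calibrated values span only 0.1 to 0.3, which mirrors the reduction in dynamic range noted in the text.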

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and of Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 2.4. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 13.6. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 6.3. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 3.6. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 1.2. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J1.4. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.



Because of the uncertainties associated with the prediction of convection initiation and the processes responsible for the upscale growth of convective storms into an MCS, the probabilistic nature of the RF forecasts is advantageous compared to deterministic forecasts such as those provided by either extrapolated reflectivity or a single HRRR forecast. While the likelihood values obtained using RF are not inherently calibrated, the values can still be used in a relative sense. No attempt has been made to calibrate the RF forecasts, since the relative variations in the RF likelihood field alone were found to be highly useful; however, this could be done using the approach described in Williams (2014).

The relative performance of four different forecast techniques was assessed using the ROC diagram, AUC, and the SEDS. Skill scores were accumulated from 34 days during the evaluation period for which all forecast datasets (extrapolated reflectivity, HRRR composite reflectivity forecasts [see footnote 2], and RF-based MCS-I likelihood forecasts) were available.
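For readers unfamiliar with the scores named above, the sketch below computes the hit rate, FA rate, ETS, and SEDS from a 2 x 2 contingency table using their standard definitions (SEDS as defined by Hogan et al. 2009). The function name and the example counts are made up, and treating "FA rate" as the probability of false detection, b / (b + d), is an assumption about the paper's usage.

```python
import math

def contingency_scores(a, b, c, d):
    """Verification scores from a 2 x 2 contingency table.

    a: hits, b: false alarms, c: misses, d: correct negatives.
    Requires a > 0, a + c > 0, and b + d > 0.
    """
    n = a + b + c + d
    hit_rate = a / (a + c)
    fa_rate = b / (b + d)  # probability of false detection (assumed meaning)
    # Hits expected by chance for a random forecast with the same marginals.
    a_random = (a + b) * (a + c) / n
    ets = (a - a_random) / (a + b + c - a_random)
    # Symmetric extreme dependency score.
    seds = (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0
    return {"hit_rate": hit_rate, "fa_rate": fa_rate, "ets": ets, "seds": seds}

# Hypothetical counts for a rare event that is overforecast.
scores = contingency_scores(a=30, b=70, c=10, d=890)
for name, value in scores.items():
    print(name, round(value, 3))
```

Both ETS and SEDS equal 1 for a perfect forecast and 0 for a forecast with no skill beyond chance; SEDS is designed to remain informative as the event becomes rarer.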

FIG. 6. Bar plots of (top) average AUC and (bottom) maximum ETS for different-sized forests. The bars mark the average over 10 trials, while the whiskers span ±1 standard deviation. There are incremental gains as the number of trees increases, but the return per additional tree gets progressively smaller.

Footnote 2: Note that 4-h forecasts of composite reflectivity from the HRRR are evaluated in this study to account for a 2-h latency of the HRRR forecast products used in the RF predictions.


The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.
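The thresholding procedure described above can be sketched as follows for a deterministic reflectivity field. The function names, the toy data, and the trapezoidal AUC are illustrative assumptions; the paper does not say how its AUC values were integrated.

```python
def roc_points(forecast_dbz, observed, thresholds):
    """(FA rate, hit rate) of a deterministic field at each threshold.
    A grid point counts as a 'yes' forecast when its value >= threshold."""
    n_events = sum(observed)
    n_nonevents = len(observed) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one nonevent")
    points = []
    for t in thresholds:
        hits = sum(1 for f, o in zip(forecast_dbz, observed) if f >= t and o == 1)
        false_alarms = sum(1 for f, o in zip(forecast_dbz, observed) if f >= t and o == 0)
        points.append((false_alarms / n_nonevents, hits / n_events))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule, with the
    end points (0, 0) and (1, 1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Toy data: forecast reflectivity (dBZ) and whether an event was observed.
forecast = [50, 40, 30, 20, 10, 0]
observed = [1, 1, 0, 1, 0, 0]
thresholds = range(0, 70, 5)  # 0, 5, ..., 65 dBZ as in the text
points = roc_points(forecast, observed, thresholds)
auc = auc_trapezoid(points)
print(round(auc, 3))
```

The same two functions would apply to the climatology-based curve by replacing the dBZ thresholds with frequency thresholds from 0 to 0.05 in steps of 0.005.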

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88 from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.
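The displacement-tolerant verification described above can be pictured with the following sketch, which counts an event as detected when any forecast point within a given distance meets the likelihood threshold. The data layout (coordinates in km on a flat plane) and the function name are assumptions, and the matching in time that the actual evaluation also required is omitted.

```python
import math

def hit_rate_with_tolerance(events_km, forecast_points, threshold=0.1,
                            max_dist_km=50.0):
    """Fraction of events with a forecast likelihood >= threshold
    somewhere within max_dist_km.

    events_km:       list of (x_km, y_km) event locations
    forecast_points: list of (x_km, y_km, likelihood) grid points
    """
    detected = 0
    for ex, ey in events_km:
        if any(p >= threshold and math.hypot(fx - ex, fy - ey) <= max_dist_km
               for fx, fy, p in forecast_points):
            detected += 1
    return detected / len(events_km)

# Toy example: the first event has a qualifying point exactly 50 km away;
# the second has one 60 km away and is therefore missed.
events = [(0.0, 0.0), (200.0, 0.0)]
grid = [(30.0, 40.0, 0.2), (200.0, 60.0, 0.5), (500.0, 500.0, 0.9)]
rate = hit_rate_with_tolerance(events, grid)
print(rate)

# The hit rate quoted in the text: all but 1 of 550 events detected.
print(round(100.0 * 549 / 550, 1))
```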

Reasons for RF's tendency to FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE ~ 1500 J kg^-1) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m^-2, was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In the Region column, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

Forecast method                          Region  Hit rate for FA rate = 0.1  AUC   Max SEDS
RF                                       eUS     0.31                        0.76  0.19
RF                                       SE      0.27                        0.73  0.17
RF                                       GP      0.28                        0.75  0.18
HRRR reflectivity                        eUS     0.24                        0.59  0.14
HRRR reflectivity                        SE      0.21                        0.57  0.13
HRRR reflectivity                        GP      0.23                        0.60  0.14
Extrapolated reflectivity                eUS     0.20                        0.55  0.11
Extrapolated reflectivity                SE      0.14                        0.52  0.06
Extrapolated reflectivity                GP      0.19                        0.53  0.11
Observed climatological MCS-I frequency  eUS     0.22                        0.61  0.15
Observed climatological MCS-I frequency  SE      0.21                        0.65  0.13
Observed climatological MCS-I frequency  GP      0.12                        0.52  0.03

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FAs (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can be used as guidance for the short-term prediction of discrete high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class 1
  Description: RF probabilities are always elevated in areas of ongoing MCSs.
  Comment: Use RF along with current radar observations to disregard these areas.
  Region of most common occurrence: Entire domain

Class 2
  Description: RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event.
  Comment: RF provides an early indication of regions where MCS-I is possible in the next 2–4 h.
  Region of most common occurrence: Southeastern United States

Class 3
  Description: Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria.
  Comment: Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments.
  Region of most common occurrence: Southeastern United States

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events relative to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.
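As a rough illustration of the event-fraction tuning described above, the following Python sketch undersamples nonevents until events make up a chosen share of the training set. The function name, the undersampling strategy, and the toy counts are assumptions made here for illustration; the text does not state how the training sets were actually resampled.

```python
import random


def resample_to_event_fraction(samples, labels, event_fraction, seed=0):
    """Keep every event and draw just enough nonevents (without replacement)
    so that events make up roughly `event_fraction` of the returned set.

    Hypothetical helper: undersampling nonevents is one possible strategy.
    """
    rng = random.Random(seed)
    events = [s for s, y in zip(samples, labels) if y]
    nonevents = [s for s, y in zip(samples, labels) if not y]

    # Number of nonevents needed for the requested event share.
    wanted = round(len(events) * (1.0 - event_fraction) / event_fraction)
    wanted = min(wanted, len(nonevents))

    data = [(s, 1) for s in events] + [(s, 0) for s in rng.sample(nonevents, wanted)]
    rng.shuffle(data)
    return data


# Toy data (made up): 30 events among 10,000 points, i.e., a 0.3% base rate.
samples = list(range(10000))
labels = [1] * 30 + [0] * 9970
training = resample_to_event_fraction(samples, labels, event_fraction=0.30)
print(len(training), sum(y for _, y in training))
```

Keeping all events and discarding nonevents preserves every rare positive case, at the cost of ignoring most of the negative samples; oversampling events would be an equally plausible alternative.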

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF-forecasted likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product in areas with existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts, since it provides advance notice of the potential for an MCS-I event.
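The masking of class 1 FAs mentioned above could be sketched along the following lines. This is only an illustration: the planar coordinates in kilometers, the function name, and the reuse of the 125-km buffer from the figure captions as the masking radius are all assumptions, not details given in the text.

```python
import math


def mask_near_ongoing_mcs(forecast_points, mcs_points, buffer_km=125.0):
    """Zero out forecast likelihoods within `buffer_km` of any ongoing MCS.

    forecast_points: list of (x_km, y_km, likelihood) tuples.
    mcs_points: list of (x_km, y_km) locations of ongoing MCSs.
    The buffer value is an assumed choice borrowed from the figure captions.
    """
    masked = []
    for x, y, likelihood in forecast_points:
        near_mcs = any(
            math.hypot(x - mx, y - my) <= buffer_km for mx, my in mcs_points
        )
        masked.append((x, y, 0.0 if near_mcs else likelihood))
    return masked


# Toy data (made up): one grid point close to an ongoing MCS, one far away.
forecast = [(0.0, 0.0, 0.8), (300.0, 0.0, 0.6)]
ongoing = [(50.0, 0.0)]
print(mask_near_ongoing_mcs(forecast, ongoing))
```

An operational version would work on the forecast grid with proper map distances; the point here is only that the mask depends on observed MCS locations, not on the RF output itself.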

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011, while the ROC curves shown in Fig. 9 were obtained using RFs trained with 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
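One way to picture the date-weighted sampling suggested above is the following Python sketch. The exponential weighting, the 30-day scale, the per-year discount of 0.5, and all names are hypothetical choices made for illustration only; the text proposes the idea without specifying a scheme.

```python
import math
import random


def sampling_weight(sample_doy, sample_year, today_doy, today_year,
                    doy_scale=30.0, year_discount=0.5):
    """Weight that favors instances near the current day of year and from
    recent seasons. All parameter values are illustrative assumptions."""
    gap = abs(sample_doy - today_doy)
    gap = min(gap, 365 - gap)  # wrap around the turn of the year
    seasonal = math.exp(-gap / doy_scale)
    age_in_years = today_year - sample_year
    return seasonal * (year_discount ** age_in_years)


def draw_training_set(instances, today_doy, today_year, k, seed=0):
    """Draw k instances (with replacement) using the weights above.

    instances: list of (day_of_year, year, features) tuples.
    """
    rng = random.Random(seed)
    weights = [sampling_weight(doy, yr, today_doy, today_year)
               for doy, yr, _ in instances]
    return rng.choices(instances, weights=weights, k=k)


# Toy pool (made up): the feature field is just a label here.
pool = [(170, 2011, "a"), (180, 2012, "b"), (179, 2013, "c"), (230, 2013, "d")]
print(draw_training_set(pool, today_doy=180, today_year=2013, k=5))
```

The two scale parameters express the trade-off discussed in the text: a longer `doy_scale` or a `year_discount` closer to 1 captures a wider range of conditions, while shorter values let the training set track recent changes more closely.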

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
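The calibration step described above can be illustrated with a simple binning scheme, sketched here in Python. The ten equal-width bins, the function names, and the toy data are assumptions for illustration; the text does not specify how the forecast categories would be defined.

```python
def observed_frequency_by_bin(probs, outcomes, n_bins=10):
    """Observed event frequency for each forecast-likelihood bin.

    Returns a list with one entry per bin; empty bins are marked None.
    """
    counts = [0] * n_bins
    events = [0] * n_bins
    for p, occurred in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)
        counts[i] += 1
        events[i] += 1 if occurred else 0
    return [events[i] / counts[i] if counts[i] else None for i in range(n_bins)]


def calibrate(p, table):
    """Map a raw likelihood to the observed frequency of its bin.

    Falls back to the raw value when the bin had no training cases.
    """
    n_bins = len(table)
    i = min(int(p * n_bins), n_bins - 1)
    return p if table[i] is None else table[i]


# Toy data (made up): raw likelihoods of 0.85 verified only 2 times in 10.
raw = [0.85] * 10 + [0.15] * 10
obs = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0] + [0] * 10
table = observed_frequency_by_bin(raw, obs)
print(calibrate(0.85, table), calibrate(0.15, table))
```

In this toy case a raw likelihood of 0.85 maps to 0.2, which shows how a high bias compresses the calibrated values into a narrower range.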

The findings presented herein, along with the positive results of Mecikalski et al. (2015) for the prediction of small-scale storm initiation and of Williams et al. (2008c) and Williams (2014) for the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete, high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data, and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 136. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growths and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.

Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Environ., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.

Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 12. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.

Page 11: Probabilistic Forecasts of Mesoscale Convective System

The ROC curves shown in Fig. 9 indicate the ability of forecasts to discriminate between events and nonevents. To generate ROC curves from the deterministic HRRR reflectivity and extrapolated reflectivity, the hit rate and FA rate were obtained at 5-dBZ intervals for thresholds ranging from 0 to 65 dBZ. The skill of each method is compared with that obtained using an "informed" climatology as the forecast. The informed climatology was obtained by grouping MCS-I occurrences observed across the eastern two-thirds of the United States during 2011 into two periods [day hours (1200–2300 UTC) and night hours (0000–1100 UTC)]. This aggregation was necessary to build a useful regional climatology from a single summer of data. ROC curves were obtained from the MCS-I climatology using MCS-I frequencies ranging from 0 to 0.05 with an interval of 0.005.
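The thresholding procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and toy data are made up, the FA rate is assumed to be false alarms divided by the number of nonevents (the usual ROC convention), and the area is computed with a simple trapezoidal rule.

```python
def roc_points(values, events, thresholds):
    """Return one (fa_rate, hit_rate) pair per threshold.

    values: forecast values at each sample point (e.g., reflectivity in dBZ).
    events: 1 where the event was observed, 0 otherwise.
    A point counts as a "yes" forecast when its value meets the threshold.
    """
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    points = []
    for t in thresholds:
        hits = sum(1 for v, e in zip(values, events) if v >= t and e)
        false_alarms = sum(1 for v, e in zip(values, events) if v >= t and not e)
        points.append((false_alarms / n_nonevents, hits / n_events))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    curve = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (made up): reflectivity forecasts and whether an event was observed.
dbz = [10, 40, 55, 20, 35, 5]
observed = [0, 1, 1, 0, 0, 0]
thresholds = range(0, 70, 5)  # 0, 5, ..., 65 dBZ, as in the text

curve = roc_points(dbz, observed, thresholds)
print(area_under_curve(curve))
```

The same two functions would apply to the climatology-based forecast by passing MCS-I frequencies as `values` and frequency thresholds from 0 to 0.05 in steps of 0.005.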

As can be seen in Fig. 9 and Table 3, the RF outperforms the other forecast methods. While the RF AUC values are much lower than those obtained when the training and verification truth are both from the same year (0.76 versus 0.88, from Table 3 and Table 2, respectively), the AUC values obtained for the RF MCS-I forecasts made in 2013 are much higher than those obtained with the other forecast methods. The RF forecasts also have the highest SEDS and the highest hit rate for all FA rates of 0.1 or greater. The RF forecasts are also clearly more skillful than using an informed climatology; however, the relative pickup in skill over the informed climatology is seen to be regionally dependent, with the RF skill pickup being much greater for forecasts made over the Great Plains (GP) region than those made over the Southeast region. Also of note, the RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.

FIG. 7. Bar plots of (top) average AUC and (bottom) maximum ETS for different numbers of candidate predictors used for splitting at each node.

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.
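The verification scores quoted in this section can all be derived from a 2 x 2 contingency table. The Python sketch below assumes the standard textbook definitions of the equitable threat score and the symmetric extreme dependency score (e.g., Wilks 2006; Hogan et al. 2009); the function name is hypothetical, and the exact formulations used for the results here are not restated in this part of the text.

```python
import math


def contingency_scores(hits, false_alarms, misses, correct_negatives):
    """Scores from a 2 x 2 contingency table (assumed standard definitions)."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    n = a + b + c + d

    hit_rate = a / (a + c)   # fraction of observed events that were forecast
    fa_rate = b / (b + d)    # fraction of nonevents that were forecast

    # Equitable threat score: threat score corrected for hits due to chance.
    a_random = (a + b) * (a + c) / n
    ets = (a - a_random) / (a + b + c - a_random)

    # Symmetric extreme dependency score: 0 for random, 1 for perfect.
    seds = (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0

    return {"hit_rate": hit_rate, "fa_rate": fa_rate, "ets": ets, "seds": seds}


# Hit rate quoted in the text: all but 1 of 550 events detected.
print(round(100 * 549 / 550, 1))

# A perfect forecast on a toy table (made-up counts) scores 1 on ETS and SEDS.
print(contingency_scores(hits=10, false_alarms=0, misses=0, correct_negatives=90))
```

The sketch does no guarding against empty table cells, so it would need extra checks before being used on real verification counts.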

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg^-1) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m^-2, was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains. HRRR denotes HRRR reflectivity, Extrap. denotes extrapolated reflectivity, and Clim. denotes observed climatological MCS-I frequency.

| Metric | RF eUS | RF SE | RF GP | HRRR eUS | HRRR SE | HRRR GP | Extrap. eUS | Extrap. SE | Extrap. GP | Clim. eUS | Clim. SE | Clim. GP |
| Hit rate for FA rate = 0.1 | 0.31 | 0.27 | 0.28 | 0.24 | 0.21 | 0.23 | 0.20 | 0.14 | 0.19 | 0.22 | 0.21 | 0.12 |
| AUC | 0.76 | 0.73 | 0.75 | 0.59 | 0.57 | 0.60 | 0.55 | 0.52 | 0.53 | 0.61 | 0.65 | 0.52 |
| Max SEDS | 0.19 | 0.17 | 0.18 | 0.14 | 0.13 | 0.14 | 0.11 | 0.06 | 0.11 | 0.15 | 0.13 | 0.03 |

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FA (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide whether the possibility of MCS-I warranted more attention.

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can be used as guidance for the short-term prediction of discrete high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that the technique should be broadly applicable to similar forecast problems.

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class 1
  Description: RF probabilities are always elevated in areas of ongoing MCSs.
  Comment: Use RF along with current radar observations to disregard these areas.
  Region of most common occurrence: Entire domain.

Class 2
  Description: RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event.
  Comment: RF provides an early indication of regions where MCS-I is possible in the next 2–4 h.
  Region of most common occurrence: Southeastern United States.

Class 3
  Description: Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria.
  Comment: Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments.
  Region of most common occurrence: Southeastern United States.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the fraction of events in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but with clearly diminishing returns: a forest of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.
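The event fraction discussed above can be illustrated with a short sketch. It assumes one simple scheme, keeping every event and randomly undersampling nonevents until events make up the target fraction; the function name `resample_to_event_fraction` and the synthetic labels are hypothetical and are not taken from the study.

```python
import random


def resample_to_event_fraction(labels, event_fraction=0.30, seed=0):
    """Return indices of a training subset in which events (label 1)
    make up roughly `event_fraction` of the samples.

    All events are kept and nonevents (label 0) are randomly
    undersampled. This is an assumed scheme for illustration only.
    """
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must be between 0 and 1")
    rng = random.Random(seed)
    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]
    # Nonevents needed so that events / (events + nonevents) == event_fraction.
    n_non = round(len(events) * (1.0 - event_fraction) / event_fraction)
    n_non = min(n_non, len(nonevents))
    subset = events + rng.sample(nonevents, n_non)
    rng.shuffle(subset)
    return subset


# Synthetic labels with a 0.3% event frequency (30 events in 10 000 points).
labels = [1] * 30 + [0] * 9970
subset = resample_to_event_fraction(labels, event_fraction=0.30)
fraction = sum(labels[i] for i in subset) / len(subset)
print(len(subset), round(fraction, 2))
```

Because only the nonevents are thinned, the rare events are all retained, which is one way to raise the event fraction to 100 times its climatological value without discarding any positive cases.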

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF forecast likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), areas with existing MCSs could be automatically masked out of an operational product. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts because it provides advance notice of the potential for an MCS-I event.
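The automatic masking of class 1 FAs suggested above could look something like the following sketch. It assumes the forecast and the observed ongoing-MCS field share a regular grid, and that the mask is widened by a fixed number of grid cells; the function names and the buffer radius are hypothetical choices, not part of the study.

```python
def dilate(mask, radius):
    """Expand True cells of a 2D boolean grid by `radius` cells in every
    direction (a square neighbourhood, used here as a simple buffer)."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        out[ii][jj] = True
    return out


def mask_ongoing_mcs(likelihood, ongoing_mcs, radius=1, fill=0.0):
    """Replace forecast likelihoods with `fill` wherever an ongoing MCS
    (plus a buffer of `radius` cells) is observed."""
    buffered = dilate(ongoing_mcs, radius)
    return [
        [fill if masked else p for p, masked in zip(prow, mrow)]
        for prow, mrow in zip(likelihood, buffered)
    ]


likelihood = [
    [0.1, 0.6, 0.7, 0.2],
    [0.1, 0.8, 0.9, 0.3],
    [0.0, 0.2, 0.4, 0.6],
]
ongoing = [
    [False, False, True, False],
    [False, False, False, False],
    [False, False, False, False],
]
masked = mask_ongoing_mcs(likelihood, ongoing, radius=1)
for row in masked:
    print(row)
```

Elevated values away from the existing system (the bottom row in this toy grid) are left untouched, so the product would still highlight candidate initiation areas.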

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found, in comparing Figs. 5 and 9, that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011, while the ROC curves shown in Fig. 9 were obtained using RFs trained using 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
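One way to picture the sampling strategy suggested above is a weight that decays with distance from the current day of year and with the age of the sample in years. The sketch below is only an assumption about how such weights might be formed; the exponential form, the scale parameters `doy_scale` and `year_scale`, and the function names are hypothetical.

```python
import math
import random


def day_of_year_distance(d1, d2, year_length=365):
    """Shortest distance in days between two days of year, wrapping
    around the end of the year."""
    diff = abs(d1 - d2) % year_length
    return min(diff, year_length - diff)


def sampling_weight(sample_doy, sample_year, current_doy, current_year,
                    doy_scale=30.0, year_scale=2.0):
    """Weight that favours samples near the current date and from
    recent seasons (assumed exponential decay in both)."""
    seasonal = math.exp(-day_of_year_distance(sample_doy, current_doy) / doy_scale)
    recency = math.exp(-(current_year - sample_year) / year_scale)
    return seasonal * recency


def draw_training_set(samples, current_doy, current_year, size, seed=0):
    """Draw `size` samples with replacement, weighted by date proximity.
    Each sample is a (day_of_year, year) pair."""
    rng = random.Random(seed)
    weights = [sampling_weight(doy, year, current_doy, current_year)
               for doy, year in samples]
    return rng.choices(samples, weights=weights, k=size)


samples = [(165, 2013), (172, 2013), (230, 2013), (170, 2011), (20, 2012)]
training = draw_training_set(samples, current_doy=170, current_year=2013, size=5)
print(training)
```

Larger scale values approach uniform sampling over the archive (a long training period), while smaller values concentrate on the most recent, seasonally similar cases, which is the trade-off described in the text.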

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
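A minimal sketch of such a calibration follows, assuming forecast likelihoods are grouped into equal-width bins and each bin is mapped to the event frequency observed in it. The bin count, the function names, and the toy data are hypothetical; bins with no training data leave the forecast unchanged.

```python
def build_calibration_table(forecasts, outcomes, n_bins=10):
    """Observed event frequency in each equal-width likelihood bin.
    Bins with no forecasts are marked None."""
    counts = [0] * n_bins
    events = [0] * n_bins
    for p, observed in zip(forecasts, outcomes):
        b = min(int(p * n_bins), n_bins - 1)
        counts[b] += 1
        events[b] += observed
    return [events[b] / counts[b] if counts[b] else None for b in range(n_bins)]


def calibrate(p, table):
    """Map a raw likelihood to the observed frequency of its bin."""
    n_bins = len(table)
    b = min(int(p * n_bins), n_bins - 1)
    return p if table[b] is None else table[b]


# Toy data with a high bias: high likelihoods verify far less often than forecast.
forecasts = [0.05, 0.05, 0.65, 0.65, 0.65, 0.65, 0.95, 0.95]
outcomes = [0, 0, 0, 0, 0, 1, 0, 1]
table = build_calibration_table(forecasts, outcomes)
print(calibrate(0.65, table), calibrate(0.95, table))
```

In this toy case the largest calibrated value is well below the largest raw likelihood, which illustrates the reduction in dynamic range noted above.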

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 2.4. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 13.6. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.

Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 6.3. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 3.6. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.

Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 1.2. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J1.4. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


RF was somewhat more skillful at predicting daytime MCS-I (AUC, SEDS = 0.77, 0.21) than nighttime MCS-I (AUC, SEDS = 0.75, 0.17), indicating the differing nature/predictability of surface-based and nocturnal (more often elevated) MCS-I.
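For readers less familiar with the scores quoted here, the following sketch computes ETS and SEDS from a 2 x 2 contingency table using their standard definitions (SEDS as given by Hogan et al. 2009). The function name and the example counts are hypothetical and are not results from this study; the FA rate is taken to be false alarms divided by all nonevents, the quantity plotted on a ROC axis.

```python
import math


def contingency_scores(hits, false_alarms, misses, correct_negatives):
    """Verification scores from a 2 x 2 contingency table.

    Requires at least one hit so that the logarithms in SEDS are defined.
    """
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    n = a + b + c + d
    # Hits expected from a random forecast with the same marginal totals.
    a_random = (a + b) * (a + c) / n
    return {
        "hit_rate": a / (a + c),
        "fa_rate": b / (b + d),
        "ets": (a - a_random) / (a + b + c - a_random),
        "seds": math.log(a_random / a) / math.log(a / n),
    }


# Hypothetical counts for a rare event forecast with many false alarms.
scores = contingency_scores(hits=30, false_alarms=70, misses=20,
                            correct_negatives=880)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Both scores equal 1 for a perfect forecast and 0 for a forecast with no skill beyond chance, but SEDS is designed to remain informative as the event becomes rarer, which is relevant for an event with a base rate as low as that of MCS-I.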

Further, more detailed manual evaluation of RF performance using slightly less stringent verification (i.e., allowing for some displacement error) revealed that using an RF-based likelihood threshold of 0.1 detects all but 1 of the 550 observed MCS-I events to within 50 km. That is a hit rate of 99.8%. However, this impressive statistic and the RF's ability to achieve higher hit rates come at the expense of a tendency toward FAs. For example, using the ROC diagram in Fig. 9, it is seen that a hit rate of over 70% can be achieved using RF, but at the expense of the FA rate exceeding 30%.
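The trade-off between hit rate and FA rate can be made concrete with a small sketch that sweeps a likelihood threshold, collects the resulting ROC points, and integrates the area under the curve with the trapezoid rule. The function names and the toy forecasts are hypothetical; only the 549-of-550 arithmetic comes from the text.

```python
def roc_points(likelihoods, observed, thresholds):
    """(FA rate, hit rate) pairs for forecasts of 'yes' when the
    likelihood is at or above each threshold."""
    n_events = sum(observed)
    n_nonevents = len(observed) - n_events
    points = []
    for t in thresholds:
        hits = sum(1 for p, o in zip(likelihoods, observed) if p >= t and o == 1)
        fas = sum(1 for p, o in zip(likelihoods, observed) if p >= t and o == 0)
        points.append((fas / n_nonevents, hits / n_events))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve, with the (0, 0) and (1, 1) end points added."""
    pts = sorted(set(points + [(0.0, 0.0), (1.0, 1.0)]))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hit rate quoted in the text: all but 1 of 550 events detected.
displacement_hit_rate = 549 / 550
print(f"{displacement_hit_rate:.1%}")

# Toy forecasts: three events and five nonevents.
likelihoods = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
observed = [1, 1, 0, 1, 0, 0, 0, 0]
points = roc_points(likelihoods, observed, sorted(set(likelihoods)))
auc = auc_trapezoid(points)
print(round(auc, 3))
```

Lowering the threshold moves along the curve toward higher hit rates and higher FA rates, which is why a very low threshold such as 0.1 can detect nearly every event while still producing many false alarms.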

Reasons for the RF's tendency toward FAs are described below using a couple of representative case studies (Figs. 10 and 11). In these figures, forecasts obtained using RF, HRRR reflectivity, and extrapolated reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.

FIG. 8. (top) Average AUC and (bottom) maximum ETS as a function of the event percentage in the training set. These were tested with a 200-tree RF using two candidate predictors for splitting at each node. The AUC peaks at 0.878 with an event percentage of 40%, and the maximum ETS peaks at 0.407 when the event percentage is 30%.

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In the region column, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

Forecast                                  Region   Hit rate for FA rate = 0.1   AUC    Max SEDS
RF                                        eUS      0.31                         0.76   0.19
RF                                        SE       0.27                         0.73   0.17
RF                                        GP       0.28                         0.75   0.18
HRRR reflectivity                         eUS      0.24                         0.59   0.14
HRRR reflectivity                         SE       0.21                         0.57   0.13
HRRR reflectivity                         GP       0.23                         0.60   0.14
Extrapolated reflectivity                 eUS      0.20                         0.55   0.11
Extrapolated reflectivity                 SE       0.14                         0.52   0.06
Extrapolated reflectivity                 GP       0.19                         0.53   0.11
Observed climatological MCS-I frequency   eUS      0.22                         0.61   0.15
Observed climatological MCS-I frequency   SE       0.21                         0.65   0.13
Observed climatological MCS-I frequency   GP       0.12                         0.52   0.03

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.
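As a rough illustration of how the trend between successive forecasts might be extracted, the sketch below flags grid points where the likelihood rose sharply from one forecast to the next and now exceeds a chosen level. The thresholds `min_increase` and `min_latest`, the function name, and the toy grids are hypothetical; the study does not define a specific trend product.

```python
def flag_rising_likelihood(previous, latest, min_increase=0.2, min_latest=0.5):
    """Return (row, column, increase) for grid points whose likelihood
    rose by at least `min_increase` and now reaches `min_latest`."""
    flagged = []
    for i, (prev_row, latest_row) in enumerate(zip(previous, latest)):
        for j, (p_old, p_new) in enumerate(zip(prev_row, latest_row)):
            increase = p_new - p_old
            if increase >= min_increase and p_new >= min_latest:
                flagged.append((i, j, increase))
    return flagged


# Toy grids: one point jumps from a low value to over 0.5, while another
# point stays high (for example, near an ongoing MCS) without rising much.
previous = [[0.10, 0.15],
            [0.05, 0.60]]
latest = [[0.55, 0.20],
          [0.05, 0.65]]
flagged = flag_rising_likelihood(previous, latest)
print(flagged)
```

Requiring an increase as well as a high value separates newly developing signals from likelihoods that are persistently elevated, which may help a forecaster focus on areas where confidence is growing.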

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FAs (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide if the possibility of MCS-I warranted more attention or not.

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

The analyses discussed above clearly indicate that the

RF-based MCS-I forecasts add value over the HRRR

model forecasts in the short term It does this by effec-

tively combining available data (both observations and

model data) to predict the likelihood of anMCS-I event

Key features of the RF technique include its ability to

remove biases in each predictor field and to form com-

plex nonlinear relationships between the predictors as

part of the training process thereby condensing a great

deal of data into a single probabilistic forecast that can

FIG 11 RF 2-hMCS-I forecasts valid at (a) 2000 and (c) 2100UTC 14 Jun 2013 Black contours indicate locations of ongoingMCSs with

a 125-km buffer and magenta contours indicate the locations of MCS-I events with a 125-km buffer (b) Observed VIL at 2000 UTC

(d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at

2000 UTC

TABLE 4 Classification of FA in RF forecasts of MCS-I

Class Description Comment

Region of most common

occurrence

1 RF probabilities are always elevated

in areas of ongoing MCSs

Use RF along with current radar

observations to disregard these areas

Entire domain

2 RF probabilities are often elevated in

forecasts valid up to 2 h prior to the

MCS-I event

RF provides an early indication of regions

where MCS-I is possible in the next 2ndash4 h

Southeastern United States

3 Elevated RF probabilities can occur

in areas where convective storms

form but fail to reach MCS size criteria

Reflects the difficulty of predicting upscale

growth of storm cells into clusters large

enough to be considered MCSs in weakly

forced environments

Southeastern United States

APRIL 2016 AH I J EVYCH ET AL 595

Unauthenticated | Downloaded 051722 0650 PM UTC

be used as guidance for the short-term prediction of

discrete high-impact events like the initiation of MCSs

Predicting the exact timing and location of an MCS-I

event is an extremely challenging problem due to the

discrete nature of the predictand and the complex

nonlinear and somewhat chaotic processes responsible

for the development of MCSs Thus the success of the

RF on this problem suggests that it should be broadly

applicable

5 Summary and conclusions

An RF data mining technique was used to objectively

rank a set of predictor fields and evaluate their potential

to predict MCS-I Predicting the initiation of MCSs is an

extremely challenging forecast problem owing to its dis-

crete nature (ie occurring at a specific instant in time)

and infrequency (occurring in about 03 of the sample

points across the eastern two-thirds of the United States

during the period JunendashAugust in 2011) As a proof of

concept the RF was trained to predict MCS-I using a set

of 2Dfields that are available from theHRRRmodel An

iterative method for selecting which variables have the

most predictive skill was described It was found that

precipitablewater was themost useful predictor ofMCS-I

with local solar time and surface pressure (ie terrain

height) ranked highly as well Interestingly soil temper-

ature also ranked very highly while 2-m moisture vari-

ables were found to be less useful In addition it was

found that CAPE was a good predictor of MCS-I while

CIN was not It is not clear why the model-derived CIN

provided little in the way of predictive skill in nowcasting

MCS-I One possible explanation is that surface-based

CIN calculations underrepresent the true large-scale

stability of the atmosphere especially during daytime

hours when a superadiabatic layer often exists at the

surface

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the ratio of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was definitely a point of diminishing return. A forest size of 200 trees was found to have AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.
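The rebalancing of the training set described above can be illustrated with a minimal sketch. It assumes binary labels (1 for an MCS-I sample point, 0 otherwise) and reaches the target event fraction by keeping every event and randomly discarding nonevents. The function name `resample_to_event_fraction` and this particular subsampling strategy are illustrative assumptions, not a description of the software used in this study.

```python
import random


def resample_to_event_fraction(labels, event_fraction=0.30, seed=0):
    """Return indices of a training subset in which roughly `event_fraction`
    of the samples are events (label 1).

    Hypothetical helper: every event is kept and the nonevents (label 0)
    are randomly subsampled to reach the requested fraction.
    """
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must lie strictly between 0 and 1")
    rng = random.Random(seed)
    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]
    # Nonevents needed so that events / (events + nonevents) = event_fraction.
    n_keep = round(len(events) * (1.0 - event_fraction) / event_fraction)
    n_keep = min(n_keep, len(nonevents))
    subset = events + rng.sample(nonevents, n_keep)
    rng.shuffle(subset)
    return subset
```

With 30 events among 10 000 sample points (a 0.3% event frequency), this keeps all 30 events and 70 randomly chosen nonevents. In a library such as scikit-learn, the forest size and the number of candidate split variables would correspond to the `n_estimators` and `max_features` arguments of `RandomForestClassifier`; that library is mentioned only as an example and is not assumed to be the one used here.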

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF-forecasted likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product in areas with existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred prior to the observed MCS-I event by 1–2 h. However, this feature could be considered a strength of the RF MCS-I forecasts because it provides advance notice of the potential for an MCS-I event.
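The automatic masking of class 1 FAs suggested above might look like the following sketch. It assumes the RF likelihoods are held in a 2D list on a regular grid, that existing MCSs are given as a list of (row, column) grid cells, and that the grid spacing is 3 km; the function name, the masking radius, and the use of `None` as the masked value are all illustrative assumptions.

```python
import math


def mask_near_existing_mcs(likelihood, mcs_cells, radius_km, grid_km=3.0):
    """Return a copy of `likelihood` (2D list) with values set to None
    wherever a grid point lies within `radius_km` of an existing-MCS cell.

    Hypothetical helper: distances are Euclidean in grid space, scaled by
    an assumed uniform grid spacing `grid_km`.
    """
    masked = [row[:] for row in likelihood]
    for j, row in enumerate(likelihood):
        for i in range(len(row)):
            for mj, mi in mcs_cells:
                if math.hypot(j - mj, i - mi) * grid_km <= radius_km:
                    masked[j][i] = None
                    break  # one nearby MCS cell is enough to mask the point
    return masked
```

The brute-force loop over all MCS cells is kept for clarity; a distance transform would be a more efficient choice on a full forecast grid.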

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011,

596 WEATHER AND FORECASTING VOLUME 31


while the ROC curves shown in Fig. 9 were obtained using RFs trained on 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date, and more from the current convective season than from previous years.
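One way to realize the sampling strategy described above is sketched here. Each archived instance is given a weight that decays with its day-of-year distance from the current date and with its age in years, and the training set is then drawn with those weights. The exponential form, the 30-day scale, and the factor of 0.5 per year are purely illustrative assumptions; suitable values would have to be determined experimentally.

```python
import math
import random


def day_of_year_distance(doy_a, doy_b, year_length=365):
    """Circular distance in days between two days of the year."""
    d = abs(doy_a - doy_b) % year_length
    return min(d, year_length - d)


def sampling_weight(sample_doy, sample_year, current_doy, current_year,
                    doy_scale=30.0, year_decay=0.5):
    """Hypothetical weight: larger for instances near the current day of
    year and for instances from the current (or a recent) season."""
    seasonal = math.exp(-day_of_year_distance(sample_doy, current_doy) / doy_scale)
    age = max(current_year - sample_year, 0)
    return seasonal * year_decay ** age


def draw_training_indices(samples, current_doy, current_year, n_draw, seed=0):
    """Draw `n_draw` indices (with replacement) from `samples`, a list of
    (day_of_year, year) pairs, in proportion to their sampling weights."""
    rng = random.Random(seed)
    weights = [sampling_weight(doy, year, current_doy, current_year)
               for doy, year in samples]
    return rng.choices(range(len(samples)), weights=weights, k=n_draw)
```

Drawing with replacement keeps the sketch short; drawing without replacement, or combining these weights with a target event fraction, would be equally reasonable design choices.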

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
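The calibration step mentioned above can be sketched as a simple binning procedure: forecast likelihoods from a training period are grouped into categories, and each category is mapped to the frequency with which the event was actually observed. The equal-width bins and the function names are illustrative assumptions rather than the procedure of any particular operational system.

```python
def bin_index(f, n_bins):
    """Index of the equal-width bin containing likelihood f in [0, 1]."""
    return min(int(f * n_bins), n_bins - 1)  # f == 1.0 falls in the last bin


def observed_frequencies(forecasts, observations, n_bins=10):
    """Map each nonempty forecast bin to the observed event frequency.

    `forecasts` are raw likelihoods in [0, 1]; `observations` are 0 or 1.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for f, o in zip(forecasts, observations):
        k = bin_index(f, n_bins)
        counts[k] += 1
        hits[k] += o
    return {k: hits[k] / counts[k] for k in range(n_bins) if counts[k] > 0}


def calibrate(f, freq_by_bin, n_bins=10):
    """Replace a raw likelihood by the observed frequency of its bin; the
    raw value is returned unchanged if that bin was empty in training."""
    return freq_by_bin.get(bin_index(f, n_bins), f)
```

If, for example, raw likelihoods near 0.75 verified only 20% of the time during training, `calibrate` maps 0.75 to 0.2, which illustrates how a high bias compresses the range of calibrated values.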

The findings presented herein, along with the positive results of Mecikalski et al. (2015) for the prediction of small-scale storm initiation and of Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete, high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data, and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 136. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growths and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 12. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.



reflectivity are compared with observed ongoing MCS events (black contours) and instantaneous MCS-I events (magenta contours). The contours given in these figures provide a 125-km extension surrounding each observed MCS and MCS-I event, indicating the size of the region considered positive in the training set. The case shown in Fig. 10 provides an example of simultaneous MCS-I events that occurred around 1800 UTC and spanned regions with vastly different MCS formation mechanisms. The timing of the MCS-I event observed over the southeastern United States on this day was fairly typical (e.g., Geerts 1998), but the MCS-I event occurring over the high plains was unusually early compared to climatology (Carbone and Tuttle 2008), as it was triggered in an area of moderate instability (CAPE 1500 J kg⁻¹) through the interaction between a stationary front and an old outflow boundary. The MCS-I events observed over the Florida panhandle and Mississippi were well forecasted by both the RF (with likelihoods greater than 0.7) and the HRRR composite reflectivity forecast (areas of 35 dBZ exceeding 100 km in length). Both the HRRR and the RF forecasts provide a much weaker indication of MCS-I in northwestern Kansas, with elevated RF likelihood values that peak around 0.4 and the HRRR-forecasted reflectivity being too low to be considered an MCS.
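The size criterion applied to the HRRR reflectivity above (areas of 35 dBZ exceeding 100 km in length) can be illustrated with a rough sketch. It assumes a 2D list of composite reflectivity on a regular grid with 3-km spacing, labels 8-connected regions at or above the threshold, and measures each region's length as its longest cell-to-cell distance. The function name, the connectivity rule, and this definition of length are assumptions made for illustration; the sketch also does not merge nearby cells separated by small gaps.

```python
import math
from collections import deque


def storm_lengths_km(refl, threshold_dbz=35.0, grid_km=3.0):
    """Return the length (km) of each 8-connected region of `refl`
    (2D list, dBZ) at or above `threshold_dbz`.

    Hypothetical helper: length is taken as the longest distance between
    any two cells of the region, scaled by an assumed grid spacing.
    """
    nrow, ncol = len(refl), len(refl[0])
    seen = [[False] * ncol for _ in range(nrow)]
    lengths = []
    for j in range(nrow):
        for i in range(ncol):
            if seen[j][i] or refl[j][i] < threshold_dbz:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(j, i)])
            seen[j][i] = True
            while queue:
                cj, ci = queue.popleft()
                region.append((cj, ci))
                for dj in (-1, 0, 1):
                    for di in (-1, 0, 1):
                        nj, ni = cj + dj, ci + di
                        if (0 <= nj < nrow and 0 <= ni < ncol
                                and not seen[nj][ni]
                                and refl[nj][ni] >= threshold_dbz):
                            seen[nj][ni] = True
                            queue.append((nj, ni))
            # Pairwise search is quadratic in region size; fine for a sketch.
            longest = max(math.hypot(pj - qj, pi - qi)
                          for pj, pi in region for qj, qi in region)
            lengths.append(longest * grid_km)
    return lengths
```

A forecast field would then be flagged as containing an MCS-sized feature if any returned length is at least 100 km.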

A multistorm MCS-I event over the high plains is shown in Fig. 11. In this case, a long line of convection formed between 2000 and 2100 UTC along a cold front/dryline in an area with no previously existing radar echoes, as evidenced by the lack of extrapolated reflectivity (Fig. 11d). The RF forecast had likelihood values of between 0.05 and 0.20 in the approximate location of the observed MCS-I and with the correct orientation. However, these values were much lower than those routinely obtained for RF-based forecasts of MCS-I in the southeastern United States. The HRRR predicted a broken line of convective cells with the correct orientation, but the storm cells were too far apart (i.e., the distance between grid points with 35 dBZ, analogous to VIL of 3.5 kg m⁻², was greater than 10 km) for this area of convection predicted by the HRRR to be considered an MCS (Fig. 11d). The RF was able to determine that storms larger than those indicated in the HRRR reflectivity forecast were possible in 2 h. In addition, MCS-I likelihoods obtained from the RF forecast issued 1 h later (Fig. 11c) increased dramatically (to over 0.5), "catching up" to the MCS-I event 1 h before it actually occurred (i.e., providing an indication of increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that could be used by forecasters to ascertain whether or not an MCS may initiate soon.

FIG. 9. (top) ROC curves for 2-h random forest predictions (red), 4-h forecasts of composite reflectivity from HRRR (black), MCS-I climatology (cyan), and 2-h extrapolations of VIL (green). Data were obtained for the period 12 June–5 August 2013 and were matched for availability. Skillful forecasts lie above the dotted 1:1 line. (bottom) As in the top panel, but zoomed in on the dashed area.

TABLE 3. Hit rate, AUC, and SEDS obtained for times in which all forecasts were available between 11 Jun and 5 Aug 2013. In column headers, eUS stands for eastern United States, SE stands for Southeast, and GP stands for Great Plains.

Score | RF (eUS) | RF (SE) | RF (GP) | HRRR reflectivity (eUS) | HRRR reflectivity (SE) | HRRR reflectivity (GP) | Extrapolated reflectivity (eUS) | Extrapolated reflectivity (SE) | Extrapolated reflectivity (GP) | Observed climatological MCS-I frequency (eUS) | Observed climatological MCS-I frequency (SE) | Observed climatological MCS-I frequency (GP)
Hit rate for FA rate = 0.1 | 0.31 | 0.27 | 0.28 | 0.24 | 0.21 | 0.23 | 0.20 | 0.14 | 0.19 | 0.22 | 0.21 | 0.12
AUC | 0.76 | 0.73 | 0.75 | 0.59 | 0.57 | 0.60 | 0.55 | 0.52 | 0.53 | 0.61 | 0.65 | 0.52
Max SEDS | 0.19 | 0.17 | 0.18 | 0.14 | 0.13 | 0.14 | 0.11 | 0.06 | 0.11 | 0.15 | 0.13 | 0.03
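The scores reported in Table 3 can all be derived from a 2 × 2 contingency table obtained by thresholding the forecast likelihoods. The sketch below assumes that "FA rate" means the probability of false detection, b/(b + d), as on the horizontal axis of an ROC curve, and writes the symmetric extreme dependency score (SEDS) in the form given by Hogan et al. (2009). The function names and the trapezoidal estimate of the AUC are illustrative choices.

```python
import math


def contingency(forecasts, observations, threshold):
    """Return (a, b, c, d): hits, false alarms, misses, correct negatives."""
    a = b = c = d = 0
    for f, o in zip(forecasts, observations):
        if f >= threshold:
            if o:
                a += 1
            else:
                b += 1
        elif o:
            c += 1
        else:
            d += 1
    return a, b, c, d


def hit_rate(a, b, c, d):
    return a / (a + c)  # undefined if no events were observed


def false_alarm_rate(a, b, c, d):
    return b / (b + d)  # assumed meaning of "FA rate"; undefined if no nonevents


def seds(a, b, c, d):
    """Symmetric extreme dependency score; requires at least one hit."""
    n = a + b + c + d
    return (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0


def roc_auc(forecasts, observations, thresholds):
    """Trapezoidal area under the ROC curve traced by `thresholds`."""
    points = {(0.0, 0.0), (1.0, 1.0)}
    for t in thresholds:
        table = contingency(forecasts, observations, t)
        points.add((false_alarm_rate(*table), hit_rate(*table)))
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

The "Max SEDS" row of Table 3 would correspond to evaluating `seds` over a range of thresholds and keeping the largest value.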

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. It turns out that the RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause for FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FAs (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide whether the possibility of MCS-I warranted more attention.

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can be used as guidance for the short-term prediction of discrete, high-impact events like the initiation of MCSs.

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class | Description | Comment | Region of most common occurrence
1 | RF probabilities are always elevated in areas of ongoing MCSs. | Use RF along with current radar observations to disregard these areas. | Entire domain
2 | RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event. | RF provides an early indication of regions where MCS-I is possible in the next 2–4 h. | Southeastern United States
3 | Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria. | Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments. | Southeastern United States

Predicting the exact timing and location of an MCS-I

event is an extremely challenging problem due to the

discrete nature of the predictand and the complex

nonlinear and somewhat chaotic processes responsible

for the development of MCSs Thus the success of the

RF on this problem suggests that it should be broadly

applicable

5 Summary and conclusions

An RF data mining technique was used to objectively

rank a set of predictor fields and evaluate their potential

to predict MCS-I Predicting the initiation of MCSs is an

extremely challenging forecast problem owing to its dis-

crete nature (ie occurring at a specific instant in time)

and infrequency (occurring in about 03 of the sample

points across the eastern two-thirds of the United States

during the period JunendashAugust in 2011) As a proof of

concept the RF was trained to predict MCS-I using a set

of 2Dfields that are available from theHRRRmodel An

iterative method for selecting which variables have the

most predictive skill was described It was found that

precipitablewater was themost useful predictor ofMCS-I

with local solar time and surface pressure (ie terrain

height) ranked highly as well Interestingly soil temper-

ature also ranked very highly while 2-m moisture vari-

ables were found to be less useful In addition it was

found that CAPE was a good predictor of MCS-I while

CIN was not It is not clear why the model-derived CIN

provided little in the way of predictive skill in nowcasting

MCS-I One possible explanation is that surface-based

CIN calculations underrepresent the true large-scale

stability of the atmosphere especially during daytime

hours when a superadiabatic layer often exists at the

surface

Adding extrapolated radar reflectivity to the set of

predictors significantly increased the RFrsquos skill while

adding SBTD did not It is believed that the radar re-

flectivity helped to capture the more slowly evolving

MCS-I events that occur in the southeastern United

States It was somewhat surprising that the SBTD did

not improve the skill of the RF forecasts as a recent

study byMecikalski et al (2015) indicated it had value in

nowcasting convective initiation (albeit at much smaller

scales and shorter lead times than explored in this

study) It is also possible that the CIWS motion vectors

were not as well suited for extrapolating SBTD as they

were for radar reflectivity Further studies are needed to

assess the utility of the full range of satellite measure-

ments in the forecasting of MCS-I but such work is

beyond the scope of this paper

The sensitivity of RF forecast skill to several tuning

parameters was explored Results were most sensitive to

the fraction of events to nonevents in the training set

Our best results came when 30 of the training set

consisted of MCS-I events which is 100 times the cli-

matological frequency of 03 The RF skill increased

with more trees but there was definitely a point of di-

minishing return A forest size of 200 trees was found to

have AUC and ETS values that were roughly 99 of

those obtained for a 500-tree forest The best number of

candidate variables to split on was 2 or 3 depending on

the verification metric It should be noted that the op-

timal number of candidate split variables will depend on

the number and type of predictor fields used in the

training set

Case studies were used to demonstrate the strengths

and weaknesses of the RF in the prediction of MCS-I

The probabilistic RF forecasts captured nearly all of the

654 MCS-I events observed during the 6-week evalua-

tion period timed to the closest hour and to within

50km In many cases the RF was able to detect MCS-I

events that were not explicitly predicted by the de-

terministic HRRR forecast used as input to the RF The

RF is able to do this by accounting for biases in the

model and by developing nonlinear relationships be-

tween the HRRR-based predictor fields and the two

observational inputs While the RF was able to detect a

large percentage of the MCS-I events observed during

the evaluation period it also produced a larger than

optimal FA rate The largest class of FA (termed class 1)

was when RF forecasted likelihoods remaining high in

the vicinity of existing MCSs While this high FA rate

contributed to overprediction of MCS-I (ie a high

bias) these forecasts could be automatically masked out

of an operational product with existing MCSs Class 2

FAs happened when elevated RF forecast likelihoods

occurred prior to the observed MCS-I event by 1ndash2 h

However this feature could be considered a strength of

the RF MCS-I forecasts by providing advanced notice

for the potential of an MCS-I event

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found, in comparing Figs. 5 and 9, that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011, while the ROC curves shown in Fig. 9 were obtained using RFs trained on 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs to consider: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
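
One way the recency-weighted sampling suggested above might be sketched is shown below. The weighting function, its 30-day scale, and the factor-of-two decay per season are purely hypothetical choices made for illustration; the paper does not specify a weighting scheme. "Julian date" is interpreted here as day of year.

```python
import random


def recency_weight(sample_doy, sample_year, today_doy, today_year,
                   doy_scale=30.0, year_decay=0.5):
    """Hypothetical sampling weight: largest for samples near the current
    day of year, and multiplied by `year_decay` for each season of age."""
    gap = abs(sample_doy - today_doy)
    gap = min(gap, 365 - gap)                  # wrap around the new year
    seasonal = 1.0 / (1.0 + gap / doy_scale)
    age = year_decay ** (today_year - sample_year)
    return seasonal * age


def draw_training_set(samples, today_doy, today_year, n, seed=0):
    """Draw n training instances (with replacement), favoring recent ones."""
    rng = random.Random(seed)
    weights = [recency_weight(s["doy"], s["year"], today_doy, today_year)
               for s in samples]
    return rng.choices(samples, weights=weights, k=n)


# Made-up pool of candidate training instances.
pool = [
    {"doy": 200, "year": 2013},
    {"doy": 230, "year": 2013},
    {"doy": 200, "year": 2012},
    {"doy": 170, "year": 2011},
]
training = draw_training_set(pool, today_doy=200, today_year=2013, n=5)
```

With these assumed settings, an instance from the same day of year in the previous season and an instance 30 days away in the current season both receive half the weight of an instance from today.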

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
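
A minimal sketch of this kind of calibration is given below, assuming ten equal-width likelihood bins and a small made-up verification sample. The function names and data are illustrative only and are not taken from the study.

```python
def build_calibration(forecasts, observed, n_bins=10):
    """Map each forecast-likelihood bin to the observed event frequency
    among the cases that fell into that bin (None if the bin is empty)."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(forecasts, observed):
        b = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes in the top bin
        counts[b] += 1
        hits[b] += o
    return [h / c if c else None for h, c in zip(hits, counts)]


def calibrate(p, table):
    """Replace a raw likelihood with the observed frequency of its bin."""
    n_bins = len(table)
    return table[min(int(p * n_bins), n_bins - 1)]


# Made-up, overforecast sample: raw likelihoods span 0.05 to 0.95.
forecasts = [0.05, 0.05, 0.45, 0.45, 0.45, 0.45, 0.95, 0.95, 0.95, 0.95]
observed = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0]
table = build_calibration(forecasts, observed)
calibrated_high = calibrate(0.95, table)
calibrated_mid = calibrate(0.45, table)
```

In this toy sample the raw values range from 0.05 to 0.95 but the calibrated values only range from 0.0 to 0.5, which illustrates how calibrating a high-biased forecast compresses its dynamic range.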

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and of Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete, high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection-allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 136. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 12. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


increased confidence in the likelihood of an MCS-I event). The increase in RF likelihood values between successive RF forecasts issued before an observed MCS-I event happened a number of times during the evaluation period, indicating that the trend in the RF likelihood forecasts may provide additional information that forecasters could use to ascertain whether or not an MCS may initiate soon.
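
As a rough illustration of how such a trend might be extracted, the sketch below flags grid points where the likelihood rose between two successive forecasts. The function name and both thresholds (a rise of at least 0.1 to a value of at least 0.5) are assumptions made for this example; the study does not define a trend criterion.

```python
def rising_likelihood(previous, current, min_increase=0.1, min_value=0.5):
    """Return (row, col) indices where the RF likelihood rose by at least
    `min_increase` between two successive forecasts and is now at least
    `min_value`. Both thresholds are hypothetical."""
    flagged = []
    for j, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for i, (p, c) in enumerate(zip(prev_row, cur_row)):
            if c - p >= min_increase and c >= min_value:
                flagged.append((j, i))
    return flagged


# Made-up likelihood fields from two successive forecasts.
earlier = [[0.20, 0.70],
           [0.40, 0.90]]
later = [[0.25, 0.85],
         [0.60, 0.90]]
flagged = rising_likelihood(earlier, later)
```

Here the two points whose likelihood climbed noticeably are flagged, while the point that stayed at 0.9 is not, since a persistently high value carries no trend information.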

While the RF was able to capture nearly every MCS-I event, this forecast tool requires forecaster intervention because of the high FA rate. The RF forecasts have three failure modes, as listed in Table 4. The most common cause of these FAs (class 1) results from the inability of the RF to distinguish between an MCS-I event and an ongoing MCS. Detailed analyses reveal that ongoing MCSs are nearly always coincident with RF likelihoods of greater than 0.5. Examples of the RF's tendency to remain elevated for ongoing MCSs are evident within the black contours (previously existing MCS locations) of Figs. 10 and 11. Oftentimes, the RF likelihood values will remain elevated up to 3 h after the MCS has dissipated, further exacerbating the FA problem. This failure mode can easily be recognized by forecasters, who thus may ignore these elevated RF values in areas of ongoing or recently decayed MCSs.

The second most common cause of FAs (class 2) results from the RF's tendency to predict MCS-I earlier than observed. An example of this type of FA is shown in Figs. 10a and 10c for the two MCS-I events that occurred in the Southeast. In this case, the RF forecast valid 1 h prior to the MCS-I events observed over Mississippi and Florida exhibited likelihood values of up to 0.7 (Fig. 10a), rising to over 0.8 at the observed time of MCS-I (Fig. 10c). While this type of forecast bias leads to FAs, it could also be used constructively by providing forecasters early warning of areas worth further exploration. The third type of FA (class 3) seen in the RF MCS-I forecasts occurs in areas where convective storms are observed but do not reach MCS size criteria. An example of class 3 FAs is evident in Fig. 10, where an arc of elevated RF likelihood values extends from the Gulf Stream across eastern Georgia and the western portions of the Carolinas. Convective storms are evident in eastern Georgia but fail to reach MCS size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide whether the possibility of MCS-I warranted more attention.

FIG. 10. RF 2-h MCS-I forecasts valid at (a) 1800 and (c) 1900 UTC 5 Aug 2013. Black contours indicate the locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 1900 UTC. (d) HRRR 5-h reflectivity forecast (color shading) and extrapolated radar reflectivity (25-dBZ contour indicated by gold line) valid at 1900 UTC.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. They do this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can be used as guidance for the short-term prediction of discrete, high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem because of the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FAs in RF forecasts of MCS-I.

Class | Description | Comment | Region of most common occurrence
1 | RF probabilities are always elevated in areas of ongoing MCSs | Use RF along with current radar observations to disregard these areas | Entire domain
2 | RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event | RF provides an early indication of regions where MCS-I is possible in the next 2–4 h | Southeastern United States
3 | Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria | Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments | Southeastern United States

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the proportion of events relative to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was a clear point of diminishing returns: a forest of 200 trees had AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.
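
The event-fraction tuning described above amounts to resampling the training data. The sketch below shows one way this could be done by undersampling nonevents; the function name and the made-up sample (30 events in 10 000 points, i.e., the 0.3% climatological frequency) are assumptions for illustration, and the study's own resampling procedure may differ.

```python
import random


def resample_to_event_fraction(events, nonevents, target_fraction, seed=0):
    """Undersample nonevents so that events make up `target_fraction`
    of the returned training set (all events are kept)."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    n_nonevents = round(len(events) * (1.0 - target_fraction) / target_fraction)
    n_nonevents = min(n_nonevents, len(nonevents))
    rng = random.Random(seed)
    return events + rng.sample(nonevents, n_nonevents)


# Made-up sample at the climatological frequency: 30 events in 10 000 points.
events = [("event", k) for k in range(30)]
nonevents = [("nonevent", k) for k in range(9970)]

training = resample_to_event_fraction(events, nonevents, target_fraction=0.30)
n_events = sum(1 for label, _ in training if label == "event")
event_fraction = n_events / len(training)
```

In this toy case the 30 events are paired with 70 randomly drawn nonevents, giving the 30% event fraction, which is 100 times the 0.3% base rate. The other tuning parameters mentioned above (number of trees and number of candidate split variables) belong to the forest itself and are not part of this sketch.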


mdashmdash S Leyton and B Colman 2007 Ceiling and visibility fore-

casts via neural networks Wea Forecasting 22 466ndash479

doi101175WAF9941

McGovern A D J Gagne II N Troutman R A Brown

J Basara and J K Williams 2011 Using spatiotemporal

relational random forests to improve our understanding of

severe weather processes Stat Anal Data Mining 4 407ndash429

doi101002sam10128

Mecikalski J R and K M Bedka 2006 Forecasting convective

initiation bymonitoring the evolution of moving convection in

daytime GOES imagery Mon Wea Rev 134 49ndash78

doi101175MWR30621

mdashmdash mdashmdash S J Paech and L A Litten 2008 A statistical evalu-

ation of GOES cloud-top properties for predicting convective

initiation Mon Wea Rev 136 4899ndash4914 doi101175

2008MWR23521

mdashmdash J Williams C Jewett D Ahijevych A LeRoy and

J Walker 2015 Probabilistic 0ndash1-h convective initia-

tion nowcasts that combine geostationary satellite obser-

vations and numerical weather prediction model data

J Appl Meteor Climatol 54 1039ndash1059 doi101175

JAMC-D-14-01291

Pinto J O J A Grim and M Steiner 2015 Assessment of the

High-Resolution Rapid Refresh Modelrsquos ability to predict

large convective storms using object-based verification Wea

Forecasting 30 892ndash913 doi101175WAF-D-14-001181

Robinson M 2014 Significant weather impacts on the national

airspace system A lsquolsquoweather-readyrsquorsquo view of air traffic man-

agement needs challenges opportunities and lessons learned

Proc Second Symp on Building a Weather-Ready Nation

Enhancing Our Nationrsquos Readiness Responsiveness and Re-

silience to High Impact Weather Events Atlanta GA Amer

Meteor Soc 63 [Available online at httpsamsconfexcom

ams94AnnualwebprogramPaper241280html]

Roebber P J 2015 Adaptive evolutionary programming Mon

Wea Rev 143 1497ndash1505 doi101175MWR-D-14-000951

Rozoff C M and J P Kossin 2011 New probabilistic forecast

schemes for the prediction of tropical cyclone rapid in-

tensification Wea Forecasting 26 677ndash689 doi101175

WAF-D-10-050591

Smalley D J and B J Bennett 2002 Using ORPG to enhance

NEXRAD products to support FAA critical systems Pre-

prints 10th Conf on Aviation Range and Aerospace Meteo-

rology Portland OR Amer Meteor Soc 36 [Available

online at httpsamsconfexcomamspdfpapers38861pdf]

Stensrud D J and Coauthors 2013 Progress and challenges with

warn-on-forecast Atmos Environ 123 2ndash16 doi101016

jatmosres201204004

Topic G and Coauthors 2014 Parallel random forest algorithm

usage Google Code Archive accessed 26 June 2014 [Avail-

able online at httpcodegooglecompparfwikiUsage]

Trier S B C A Davis D A Ahijevych and K W Manning

2014 Use of the parcel buoyancy minimum (Bmin) to diagnose

simulated thermodynamic destabilization Part I Methodol-

ogy and case studies of MCS initiation Mon Wea Rev 142

945ndash966 doi101175MWR-D-13-002721

mdashmdash G S Romine D A Ahijevych R J Trapp R S

Schumacher M C Coniglio and D J Stensrud 2015

Mesoscale thermodynamic influences on convection initia-

tion near a surface dryline in a convection-permitting en-

semble Mon Wea Rev 143 3726ndash3753 doi101175

MWR-D-15-01331

Wilks D S 2006 Statistical Methods in the Atmospheric Sciences

2d ed International Geophysics Series Vol 91 Academic

Press 627 pp

Williams J K 2014 Using random forests to diagnose avia-

tion turbulence Mach Learn 95 51ndash70 doi101007

s10994-013-5346-7

598 WEATHER AND FORECAST ING VOLUME 31

Unauthenticated | Downloaded 051722 0650 PM UTC

mdashmdash J Craig A Cotter and J K Wolff 2007 A hybrid machine

learning and fuzzy logic approach to CIT diagnostic devel-

opment Preprints Fifth Conf on Artificial Intelligence Ap-

plications to Environmental Science San Antonio TX Amer

Meteor Soc 12 [Available online at httpsamsconfexcom

ams87ANNUALwebprogramPaper120119html]

mdashmdash D Ahijevych S Dettling and M Steiner 2008a Combining

observations and model data for short-term storm forecasting

Remote Sensing Applications for Aviation Weather Hazard

Detection and Decision Support W Feltz and J Murray Eds

International Society for Optical Engineering (SPIE Pro-

ceedings Vol 7088) 708805 doi10111712795737

mdashmdashmdashmdashC J Kessinger T R SaxenM Steiner and S Dettling

2008b A machine learning approach to finding weather re-

gimes and skillful predictor combinations for short-term storm

forecasting Preprints Sixth Conf on Artificial Intelligence

Applications to Environmental Science13th Conf on Avia-

tion Range and Aerospace Meteorology New Orleans LA

Amer Meteor Soc J14 [Available online at httpsams

confexcomamspdfpapers135663pdf]

mdashmdash R Sharman J Craig and G Blackburn 2008c Remote

detection and diagnosis of thunderstorm turbulence Remote

Sensing Applications for Aviation Weather Hazard Detection

and Decision Support W Feltz and J Murray Eds In-

ternational Society for Optical Engineering (SPIE Pro-

ceedings Vol 7088) 708804 doi10111712795570

Zhang J and Coauthors 2011 National Mosaic andMulti-Sensor

QPE (NMQ) system Description results and future plans

Bull Amer Meteor Soc 92 1321ndash1338 doi101175

2011BAMS-D-11-000471

APRIL 2016 AH I J EVYCH ET AL 599

Unauthenticated | Downloaded 051722 0650 PM UTC

Page 15: Probabilistic Forecasts of Mesoscale Convective System

size criteria. The HRRR model forecast had small-scale storms throughout this region that, when coupled with the other HRRR predictor fields and SBTD, yielded MCS-I likelihood values exceeding 0.6 in this region. In this case, the RF could not discriminate the environmental conditions responsible for upscale growth from those that inhibit upscale growth. Nonetheless, the warning information could be evaluated by a forecaster, who in turn may decide whether the possibility of MCS-I warranted more attention.

The analyses discussed above clearly indicate that the RF-based MCS-I forecasts add value over the HRRR model forecasts in the short term. The RF does this by effectively combining available data (both observations and model data) to predict the likelihood of an MCS-I event. Key features of the RF technique include its ability to remove biases in each predictor field and to form complex nonlinear relationships between the predictors as part of the training process, thereby condensing a great deal of data into a single probabilistic forecast that can be used as guidance for the short-term prediction of discrete high-impact events like the initiation of MCSs. Predicting the exact timing and location of an MCS-I event is an extremely challenging problem due to the discrete nature of the predictand and the complex, nonlinear, and somewhat chaotic processes responsible for the development of MCSs. Thus, the success of the RF on this problem suggests that it should be broadly applicable.

FIG. 11. RF 2-h MCS-I forecasts valid at (a) 2000 and (c) 2100 UTC 14 Jun 2013. Black contours indicate locations of ongoing MCSs with a 125-km buffer, and magenta contours indicate the locations of MCS-I events with a 125-km buffer. (b) Observed VIL at 2000 UTC. (d) HRRR 4-h reflectivity forecast (color shading) and extrapolated reflectivity (25-dBZ contour indicated by gold line) valid at 2000 UTC.

TABLE 4. Classification of FA in RF forecasts of MCS-I.

Class 1
  Description: RF probabilities are always elevated in areas of ongoing MCSs.
  Comment: Use RF along with current radar observations to disregard these areas.
  Region of most common occurrence: Entire domain.

Class 2
  Description: RF probabilities are often elevated in forecasts valid up to 2 h prior to the MCS-I event.
  Comment: RF provides an early indication of regions where MCS-I is possible in the next 2–4 h.
  Region of most common occurrence: Southeastern United States.

Class 3
  Description: Elevated RF probabilities can occur in areas where convective storms form but fail to reach MCS size criteria.
  Comment: Reflects the difficulty of predicting upscale growth of storm cells into clusters large enough to be considered MCSs in weakly forced environments.
  Region of most common occurrence: Southeastern United States.

5. Summary and conclusions

An RF data mining technique was used to objectively rank a set of predictor fields and evaluate their potential to predict MCS-I. Predicting the initiation of MCSs is an extremely challenging forecast problem owing to its discrete nature (i.e., occurring at a specific instant in time) and infrequency (occurring in about 0.3% of the sample points across the eastern two-thirds of the United States during the period June–August in 2011). As a proof of concept, the RF was trained to predict MCS-I using a set of 2D fields that are available from the HRRR model. An iterative method for selecting which variables have the most predictive skill was described. It was found that precipitable water was the most useful predictor of MCS-I, with local solar time and surface pressure (i.e., terrain height) ranked highly as well. Interestingly, soil temperature also ranked very highly, while 2-m moisture variables were found to be less useful. In addition, it was found that CAPE was a good predictor of MCS-I, while CIN was not. It is not clear why the model-derived CIN provided little in the way of predictive skill in nowcasting MCS-I. One possible explanation is that surface-based CIN calculations underrepresent the true large-scale stability of the atmosphere, especially during daytime hours when a superadiabatic layer often exists at the surface.

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the proportion of events relative to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was a clear point of diminishing returns: a forest of 200 trees had AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.
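
As an illustration of the event-fraction tuning described above, the following minimal Python sketch resamples a highly imbalanced population so that a chosen fraction of the training rows are events. It is not the procedure used in this study: the function name, the choice to draw events with replacement, and the synthetic population are all assumptions made for the example.

```python
import random


def resample_to_event_fraction(events, nonevents, event_fraction, n_total, seed=0):
    """Build a training list in which `event_fraction` of the rows are events.

    Assumption: the rare events are drawn with replacement (oversampling) and
    the nonevents without replacement whenever enough of them are available.
    """
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    n_events = round(event_fraction * n_total)
    n_nonevents = n_total - n_events
    drawn_events = rng.choices(events, k=n_events)
    if n_nonevents <= len(nonevents):
        drawn_nonevents = rng.sample(nonevents, n_nonevents)
    else:
        drawn_nonevents = rng.choices(nonevents, k=n_nonevents)
    # Label events 1 and nonevents 0, then shuffle so the classes are mixed.
    training = [(row, 1) for row in drawn_events]
    training += [(row, 0) for row in drawn_nonevents]
    rng.shuffle(training)
    return training


# Hypothetical population with 0.3% events (the climatological frequency).
population_events = list(range(300))
population_nonevents = list(range(300, 100_000))
training_set = resample_to_event_fraction(
    population_events, population_nonevents, event_fraction=0.30, n_total=1000
)
n_pos = sum(label for _, label in training_set)
print(f"{n_pos} events in {len(training_set)} training rows")
```

Because the resampled set no longer reflects the climatological event frequency, likelihoods from a forest trained this way should not be read as calibrated probabilities without a further calibration step.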

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FA (termed class 1) occurred when RF forecast likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product using the locations of existing MCSs. Class 2 FAs happened when elevated RF forecast likelihoods occurred 1–2 h prior to the observed MCS-I event. However, this feature could be considered a strength of the RF MCS-I forecasts because it provides advance notice of the potential for an MCS-I event.
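
The masking of class 1 false alarms mentioned above could be sketched as follows. This is a hypothetical illustration only: the grid representation (nested lists), the square buffer measured in grid cells, and the choice to set masked likelihoods to 0.0 are assumptions, not details taken from this study.

```python
def mask_near_existing_mcs(likelihood, mcs_mask, buffer_cells):
    """Return a copy of `likelihood` with values set to 0.0 wherever a cell
    lies within `buffer_cells` rows and columns of an existing MCS cell.

    likelihood: 2D list of forecast likelihoods (rows x columns)
    mcs_mask:   2D list of booleans, True where an MCS is currently observed
    """
    n_rows = len(likelihood)
    n_cols = len(likelihood[0])
    masked = [row[:] for row in likelihood]  # leave the input untouched
    for i in range(n_rows):
        for j in range(n_cols):
            if not mcs_mask[i][j]:
                continue
            # Zero out the square neighborhood around this MCS cell.
            for di in range(-buffer_cells, buffer_cells + 1):
                for dj in range(-buffer_cells, buffer_cells + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        masked[r][c] = 0.0
    return masked


# Hypothetical 5 x 5 grid: uniform likelihood, one ongoing MCS in a corner.
raw = [[0.7] * 5 for _ in range(5)]
ongoing = [[False] * 5 for _ in range(5)]
ongoing[0][0] = True
for row in mask_near_existing_mcs(raw, ongoing, buffer_cells=1):
    print(row)
```

In practice the buffer would be chosen in physical distance rather than grid cells, and masked cells might be flagged as "ongoing MCS" rather than set to zero so that forecasters can still see why the likelihood was suppressed.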

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011, while the ROC curves shown in Fig. 9 were obtained using RFs trained using 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
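
One way such date-weighted sampling might look is sketched below. The exponential decay, the two scale parameters, and the function names are assumptions chosen for illustration; the study does not specify a weighting scheme.

```python
import math
import random


def sampling_weight(instance_doy, instance_year, current_doy, current_year,
                    doy_scale=30.0, year_scale=1.0):
    """Relative weight for drawing one archived instance into the training set.

    Assumptions: exponential decay in both the circular day-of-year distance
    and the age in years; leap days are ignored (365-day calendar).
    """
    gap = abs(instance_doy - current_doy) % 365
    gap = min(gap, 365 - gap)  # wrap around the end of the year
    years_old = max(current_year - instance_year, 0)
    return math.exp(-gap / doy_scale) * math.exp(-years_old / year_scale)


def draw_training_instances(archive, current_doy, current_year, k, seed=0):
    """Draw k instances (with replacement) from (doy, year, payload) tuples."""
    rng = random.Random(seed)
    weights = [sampling_weight(doy, year, current_doy, current_year)
               for doy, year, _ in archive]
    return rng.choices(archive, weights=weights, k=k)


# Hypothetical archive: one instance per day for June-August of three years.
archive = [(doy, year, f"{year}-{doy}")
           for year in (2011, 2012, 2013)
           for doy in range(152, 244)]
sample = draw_training_instances(archive, current_doy=200, current_year=2013, k=1000)
share_current = sum(1 for _, year, _ in sample if year == 2013) / len(sample)
print(f"share of draws from the current season: {share_current:.2f}")
```

The two scale parameters express the trade-off described above: larger values lengthen the effective training period, while smaller values let the forest respond faster to model changes or shifting weather regimes.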

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
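
A simple binned calibration of this kind can be sketched as below. The equal-width bins, the function names, and the toy data are assumptions for illustration; this is not the calibration procedure of any particular operational system.

```python
def build_calibration_table(forecasts, outcomes, n_bins=10):
    """Observed event frequency for each equal-width forecast likelihood bin.

    forecasts: raw likelihoods in [0, 1]; outcomes: 1 if the event occurred,
    else 0. Bins that received no forecasts are stored as None.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for p, o in zip(forecasts, outcomes):
        b = min(int(p * n_bins), n_bins - 1)
        counts[b] += 1
        hits[b] += o
    return [hits[b] / counts[b] if counts[b] else None for b in range(n_bins)]


def calibrate(p, table):
    """Map a raw likelihood to the observed frequency of its bin.

    Falls back to the raw value when the bin had no training forecasts.
    """
    n_bins = len(table)
    b = min(int(p * n_bins), n_bins - 1)
    return p if table[b] is None else table[b]


# Toy high-bias case: raw likelihoods of 0.85 verify only one time in four.
raw_forecasts = [0.85, 0.85, 0.85, 0.85, 0.15, 0.15, 0.15, 0.15]
observed = [1, 0, 0, 0, 0, 0, 0, 0]
table = build_calibration_table(raw_forecasts, observed)
print(calibrate(0.85, table), calibrate(0.15, table))
```

In this toy case the raw likelihoods span 0.15 to 0.85, whereas the calibrated values span only 0.0 to 0.25, which illustrates how calibrating a high-bias forecast compresses the dynamic range.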

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and of Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 24. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 136. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 63. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 36. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 12. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J14. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.


Page 16: Probabilistic Forecasts of Mesoscale Convective System

be used as guidance for the short-term prediction of

discrete high-impact events like the initiation of MCSs

Predicting the exact timing and location of an MCS-I

event is an extremely challenging problem due to the

discrete nature of the predictand and the complex

nonlinear and somewhat chaotic processes responsible

for the development of MCSs Thus the success of the

RF on this problem suggests that it should be broadly

applicable

5 Summary and conclusions

An RF data mining technique was used to objectively

rank a set of predictor fields and evaluate their potential

to predict MCS-I Predicting the initiation of MCSs is an

extremely challenging forecast problem owing to its dis-

crete nature (ie occurring at a specific instant in time)

and infrequency (occurring in about 03 of the sample

points across the eastern two-thirds of the United States

during the period JunendashAugust in 2011) As a proof of

concept the RF was trained to predict MCS-I using a set

of 2Dfields that are available from theHRRRmodel An

iterative method for selecting which variables have the

most predictive skill was described It was found that

precipitablewater was themost useful predictor ofMCS-I

with local solar time and surface pressure (ie terrain

height) ranked highly as well Interestingly soil temper-

ature also ranked very highly while 2-m moisture vari-

ables were found to be less useful In addition it was

found that CAPE was a good predictor of MCS-I while

CIN was not It is not clear why the model-derived CIN

provided little in the way of predictive skill in nowcasting

MCS-I One possible explanation is that surface-based

CIN calculations underrepresent the true large-scale

stability of the atmosphere especially during daytime

hours when a superadiabatic layer often exists at the

surface

Adding extrapolated radar reflectivity to the set of predictors significantly increased the RF's skill, while adding SBTD did not. It is believed that the radar reflectivity helped to capture the more slowly evolving MCS-I events that occur in the southeastern United States. It was somewhat surprising that the SBTD did not improve the skill of the RF forecasts, as a recent study by Mecikalski et al. (2015) indicated it had value in nowcasting convective initiation (albeit at much smaller scales and shorter lead times than explored in this study). It is also possible that the CIWS motion vectors were not as well suited for extrapolating SBTD as they were for radar reflectivity. Further studies are needed to assess the utility of the full range of satellite measurements in the forecasting of MCS-I, but such work is beyond the scope of this paper.

The sensitivity of RF forecast skill to several tuning parameters was explored. Results were most sensitive to the ratio of events to nonevents in the training set. Our best results came when 30% of the training set consisted of MCS-I events, which is 100 times the climatological frequency of 0.3%. The RF skill increased with more trees, but there was a clear point of diminishing returns: a forest of 200 trees had AUC and ETS values that were roughly 99% of those obtained for a 500-tree forest. The best number of candidate variables to split on was 2 or 3, depending on the verification metric. It should be noted that the optimal number of candidate split variables will depend on the number and type of predictor fields used in the training set.
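
As an illustration of the event-fraction tuning described above, the following is a minimal sketch of how a training set could be rebalanced so that events make up roughly 30% of the samples. It is not the procedure used in this study: the function name `resample_to_event_fraction`, the choice to keep every event and randomly subsample nonevents, and the `seed` argument are all assumptions made for the example.

```python
import random


def resample_to_event_fraction(events, nonevents, event_fraction=0.30, seed=0):
    """Keep all events and randomly subsample nonevents so that events make up
    roughly `event_fraction` of the returned (sample, label) training set.

    Hypothetical helper: subsampling nonevents is one assumed way to reach the
    target fraction; other approaches (e.g., oversampling events) also exist.
    """
    if not 0.0 < event_fraction < 1.0:
        raise ValueError("event_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Solve n_events / (n_events + n_non) = event_fraction for n_non.
    n_non = round(len(events) * (1.0 - event_fraction) / event_fraction)
    # Never request more nonevents than are available.
    n_non = min(n_non, len(nonevents))
    sampled = rng.sample(nonevents, n_non)
    training = [(x, 1) for x in events] + [(x, 0) for x in sampled]
    rng.shuffle(training)
    return training


if __name__ == "__main__":
    events = list(range(30))                # stand-ins for MCS-I samples
    nonevents = list(range(1000, 11000))    # stand-ins for non-MCS-I samples
    train = resample_to_event_fraction(events, nonevents)
    n_events = sum(label for _, label in train)
    print(len(train), n_events / len(train))
```

Because the event fraction in such a training set is far above climatology, likelihoods from a forest trained this way should not be read as calibrated probabilities without further processing.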

Case studies were used to demonstrate the strengths and weaknesses of the RF in the prediction of MCS-I. The probabilistic RF forecasts captured nearly all of the 654 MCS-I events observed during the 6-week evaluation period, timed to the closest hour and to within 50 km. In many cases, the RF was able to detect MCS-I events that were not explicitly predicted by the deterministic HRRR forecast used as input to the RF. The RF is able to do this by accounting for biases in the model and by developing nonlinear relationships between the HRRR-based predictor fields and the two observational inputs. While the RF was able to detect a large percentage of the MCS-I events observed during the evaluation period, it also produced a larger than optimal FA rate. The largest class of FAs (termed class 1) occurred when RF forecast likelihoods remained high in the vicinity of existing MCSs. While this high FA rate contributed to overprediction of MCS-I (i.e., a high bias), these forecasts could be automatically masked out of an operational product wherever MCSs already exist. Class 2 FAs happened when elevated RF forecast likelihoods occurred 1–2 h prior to the observed MCS-I event. However, this feature could be considered a strength of the RF MCS-I forecasts, because it provides advance notice of the potential for an MCS-I event.
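
The masking of class 1 false alarms suggested above could look something like the following sketch. It assumes the forecast likelihoods and an existing-MCS mask are available on the same regular grid as nested lists; the function name `mask_near_existing_mcs`, the square (Chebyshev) neighborhood, and the `radius_cells` parameter are illustrative choices, not part of the study.

```python
def mask_near_existing_mcs(likelihood, mcs_mask, radius_cells=1):
    """Return a copy of `likelihood` with values set to 0.0 at every grid point
    within `radius_cells` (Chebyshev distance) of an existing MCS cell.

    likelihood: 2D list of floats (RF forecast likelihoods).
    mcs_mask:   2D list of bools on the same grid, True where an MCS exists.
    Both the grid layout and the radius are assumptions for illustration.
    """
    n_rows = len(likelihood)
    n_cols = len(likelihood[0]) if n_rows else 0
    masked = [row[:] for row in likelihood]  # leave the input untouched
    for i in range(n_rows):
        for j in range(n_cols):
            if not mcs_mask[i][j]:
                continue
            # Zero out the square neighborhood around this MCS cell.
            for di in range(-radius_cells, radius_cells + 1):
                for dj in range(-radius_cells, radius_cells + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n_rows and 0 <= jj < n_cols:
                        masked[ii][jj] = 0.0
    return masked


if __name__ == "__main__":
    grid = [[0.8] * 4 for _ in range(3)]
    mcs = [[False] * 4 for _ in range(3)]
    mcs[0][0] = True
    for row in mask_near_existing_mcs(grid, mcs):
        print(row)
```

In practice the radius would be chosen in physical units (for example, a distance comparable to the 50-km matching tolerance) and converted to grid cells.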

The basic process of training and optimizing the RF was discussed here; however, a number of additional pre- and postprocessing steps could be employed to further enhance performance. Both terrain and time of day ranked highly in terms of importance as predictors in the training set. Thus, one area of future research would be to assess the value of developing separate training sets by region and time of day. It was also found in comparing Figs. 5 and 9 that the skill of the RF increases notably when using more recently obtained training data. The ROC curves in Fig. 5 were obtained using training data from the even days of JJA 2011 to make forecasts on the odd days of JJA 2011, while the ROC curves shown in Fig. 9 were obtained using RFs trained on 2011 data and used to make forecasts employing 2013 data. In fact, an ideal approach for operational use would be to retrain the RF each day using the latest available datasets. To do this, one would have to determine the ideal length for the period of recent data used to train the RF. There are trade-offs one must consider in doing this: one would like the training period to be long enough to capture the full range of conditions that lead to the event occurring, while at the same time one would like the training period to be short enough for the RF to respond to changes in the skill of the predictor fields (e.g., as a result of changing NWP models or evolving weather regimes). This might be accomplished by sampling a training set more heavily from instances that occurred near the current Julian date and more from the current convective season than from previous years.
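
One way the recency-weighted sampling idea could be sketched is shown below. This is only an illustration under stated assumptions: "Julian date" is taken to mean day of year, and the exponential decay with `doy_scale`, the `current_season_boost` factor, and both function names are hypothetical choices that the study does not specify.

```python
import math
import random


def recency_weight(sample_doy, sample_year, current_doy, current_year,
                   doy_scale=30.0, current_season_boost=2.0):
    """Sampling weight that favors instances near the current day of year and
    from the current year. The functional form and defaults are assumptions."""
    # Circular day-of-year distance (365-day approximation), so that late
    # December and early January count as close together.
    diff = abs(sample_doy - current_doy) % 365
    diff = min(diff, 365 - diff)
    weight = math.exp(-diff / doy_scale)
    if sample_year == current_year:
        weight *= current_season_boost
    return weight


def draw_training_set(samples, current_doy, current_year, n_draw, seed=0):
    """Draw `n_draw` training instances with replacement.

    samples: list of (day_of_year, year, record) tuples.
    """
    rng = random.Random(seed)
    weights = [recency_weight(doy, year, current_doy, current_year)
               for doy, year, _ in samples]
    return rng.choices(samples, weights=weights, k=n_draw)


if __name__ == "__main__":
    pool = [(doy, year, f"rec-{year}-{doy}")
            for year in (2011, 2012, 2013) for doy in range(152, 244)]
    drawn = draw_training_set(pool, current_doy=200, current_year=2013, n_draw=1000)
    share_current = sum(1 for _, year, _ in drawn if year == 2013) / len(drawn)
    print(f"share drawn from current season: {share_current:.2f}")
```

The decay scale controls the trade-off discussed above: a small `doy_scale` lets the forest respond quickly to regime changes, whereas a large value retains a wider range of conditions.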

Finally, if desired, the RF forecast likelihood fields can be calibrated by relating the forecast categories to observed frequencies; however, because of the high biases described above, the calibration process would necessarily reduce the dynamic range of likelihood values.
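
The calibration step described above can be sketched as a simple lookup table that maps each forecast category (bin) to the event frequency observed for that bin. The equal-width bins, the function names `fit_calibration` and `calibrate`, and the fallback for empty bins are assumptions made for this example, not details taken from the study.

```python
def fit_calibration(forecasts, outcomes, n_bins=10):
    """Return the observed event frequency for each equal-width forecast bin.

    forecasts: iterable of likelihoods in [0, 1].
    outcomes:  iterable of 0/1 observations (1 = event occurred).
    Bins that received no forecasts are stored as None.
    """
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(forecasts, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[b] += 1
        hits[b] += o
    return [h / c if c else None for h, c in zip(hits, counts)]


def calibrate(p, table):
    """Map a raw likelihood to the observed frequency of its bin.
    Falls back to the raw value when the bin was empty during fitting."""
    n_bins = len(table)
    b = min(int(p * n_bins), n_bins - 1)
    return p if table[b] is None else table[b]


if __name__ == "__main__":
    raw = [0.95, 0.92, 0.91, 0.97, 0.15, 0.12]
    obs = [1, 0, 0, 0, 0, 0]
    table = fit_calibration(raw, obs)
    print(calibrate(0.93, table))
```

In this toy case, raw likelihoods above 0.9 verify only one time in four, so the calibrated value for that bin is 0.25, which illustrates how a high bias compresses the dynamic range after calibration.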

The findings presented herein, along with the positive results of Mecikalski et al. (2015) in the prediction of small-scale storm initiation and Williams et al. (2008c) and Williams (2014) in the diagnosis of turbulence, demonstrate the potential benefit of using RF techniques for difficult nowcasting problems that require analysis and interpretation of large amounts of data in a short amount of time to predict discrete, high-impact events. As such, the RF represents a class of data mining techniques that can be used to digest the ever-increasing wealth of observational and model data into a single probabilistic product that alerts forecasters to the possibility of an impending high-impact event that warrants further attention.

Acknowledgments. This research is in response to requirements and funding by the Federal Aviation Administration (FAA). Partial support also came from the National Science Foundation. The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA or NSF. We thank Drs. Stan Benjamin, Curtis Alexander, and Steven Weygandt of NOAA/GSD for providing the HRRR data and Dr. Haig Iskenderian of MIT/LL for providing access to the CIWS VIL data used to generate the MCS-I truth dataset and the CIWS motion vectors used to extrapolate the satellite and radar reflectivity to the forecast valid time. The authors also thank Dr. Stan Trier (NCAR/MMM) and three anonymous reviewers for insightful reviews and constructive comments that helped improve the manuscript.

REFERENCES

Benjamin, S. G., and Coauthors, 2014: The 2014 HRRR and Rapid Refresh: Hourly updated NWP guidance from NOAA for aviation, improvements for 2013–2016. Proc. Fourth Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, Atlanta, GA, Amer. Meteor. Soc., 2.4. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper240012.html.]

Breiman, L., 1996: Technical note: Some properties of splitting criteria. Mach. Learn., 24, 41–47, doi:10.1023/A:1018094028462.

Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324.

Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone, 1984: Classification and Regression Trees. CRC Press, 358 pp.

Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S. warm season: The diurnal cycle. J. Climate, 21, 4132–4146, doi:10.1175/2008JCLI2275.1.

Clark, A. J., W. A. Gallus Jr., and T. C. Chen, 2007: Comparison of the diurnal precipitation cycle in convection-resolving and non-convection-resolving mesoscale models. Mon. Wea. Rev., 135, 3456–3473, doi:10.1175/MWR3467.1.

Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong, 2014: Application of object-based time-domain diagnostics for tracking precipitation systems in convection allowing models. Wea. Forecasting, 29, 517–542, doi:10.1175/WAF-D-13-00098.1.

Colavito, J. A., S. McGettigan, M. Robinson, J. L. Mahoney, and M. Phaneuf, 2011: Enhancements in convective weather forecasting for NAS traffic flow management (TFM). Preprints, 15th Conf. on Aviation, Range, and Aerospace Meteorology, Los Angeles, CA, Amer. Meteor. Soc., 13.6. [Available online at https://ams.confex.com/ams/14Meso15ARAM/techprogram/paper_191100.htm.]

Colavito, J. A., S. McGettigan, H. Iskenderian, and S. A. Lack, 2012: Enhancements in convective weather forecasting for NAS traffic flow management: Results of the 2010 and 2011 evaluations of CoSPA and discussion of FAA plans. Proc. Third Aviation, Range, and Aerospace Meteorology Special Symp. on Weather–Air Traffic Management Integration, New Orleans, LA, Amer. Meteor. Soc. [Available online at https://ams.confex.com/ams/92Annual/webprogram/Paper202520.html.]

Coniglio, M. C., H. E. Brooks, S. J. Weiss, and S. F. Corfidi, 2007: Forecasting the maintenance of quasi-linear mesoscale convective systems. Wea. Forecasting, 22, 556–570, doi:10.1175/WAF1006.1.

Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental factors in the upscale growth and longevity of MCSs derived from Rapid Update Cycle analyses. Mon. Wea. Rev., 138, 3514–3539, doi:10.1175/2010MWR3233.1.

Dattatreya, G. R., 2009: Decision trees. Artificial Intelligence Methods in the Environmental Sciences, S. E. Haupt, C. Marzban, and A. Pasini, Eds., Springer, 77–101.

Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. Mon. Wea. Rev., 134, 1785–1795, doi:10.1175/MWR3146.1.

De'ath, G., and K. E. Fabricius, 2000: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology, 81, 3178–3192, doi:10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2.


Delle Monache, L., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, doi:10.1175/MWR-D-12-00281.1.

DeMaria, M., and J. Kaplan, 1994: A statistical hurricane intensity prediction scheme (SHIPS) for the Atlantic basin. Wea. Forecasting, 9, 209–220, doi:10.1175/1520-0434(1994)009<0209:ASHIPS>2.0.CO;2.

Díaz-Uriarte, R., and S. A. de Andrés, 2006: Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3, doi:10.1186/1471-2105-7-3.

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A "weather-ready" view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 6.3. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 3.6. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Environ., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.


Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 1.2. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J1.4. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.

APRIL 2016 AH I J EVYCH ET AL 599

Unauthenticated | Downloaded 051722 0650 PM UTC

Page 17: Probabilistic Forecasts of Mesoscale Convective System

while the ROC curves shown in Fig 9 were obtained

using RFrsquos trained using 2011 data and used to make

forecasts employing 2013 data In fact an ideal approach

for operational use would be to retrain the RF each day

using the latest available datasets To do this one would

have to determine the ideal length for the period of re-

cent data used to train the RF There are trade-offs one

must consider in doing this one would like the training

period to be long enough to capture the full range of

conditions that lead to the event occurring while at the

same time one would like the training period to be short

enough for the RF to respond to changes in the skill of

the predictor fields (eg as a result of changing NWP

models or evolving weather regimes) This might be

accomplished by sampling a training set more heavily

from instances that occurred near the current Julian

date and more from the current convective season than

from previous years

Finally if desired the RF forecast likelihood fields

can be calibrated by relating the forecast categories

to observed frequencies however because of the

high biases described above the calibration process

would necessarily reduce the dynamic range of

likelihood values

The findings presented herein along with the positive

results of Mecikalski et al (2015) of the prediction of

small-scale storm initiation and Williams et al (2008c)

and Williams (2014) in the diagnosis of turbulence

demonstrate the potential benefit of using RF tech-

niques for difficult nowcasting problems that require

analysis and interpretation of large amounts of data in a

short amount of time to predict discrete high-impact

events As such the RF represents a class of data mining

techniques that can be used to digest the ever-increasing

wealth of observational and model data into a single

probabilistic product that alerts forecasters to the pos-

sibility of an impending high-impact event that warrants

further attention

Acknowledgments This research is in response to re-

quirements and funding by the Federal Aviation Ad-

ministration (FAA) Partial support also came from the

National Science Foundation The views expressed are

those of the authors and do not necessarily represent the

official policy or position of the FAA or NSF We thank

Drs Stan Benjamin Curtis Alexander and Steven

Weygandt of NOAAGSD for providing the HRRR

data and Dr Haig Iskenderian of MITLL for providing

access to the CIWSVIL data used to generate theMCS-I

truth dataset and the CIWS motion vectors used to

extrapolate the satellite and radar reflectivity to the

forecast valid time The authors also thank Dr Stan

Trier (NCARMMM) and three anonymous reviewers

for insightful reviews and constructive comments that

helped improve the manuscript

REFERENCES

Benjamin S G and Coauthors 2014 The 2014 HRRR and Rapid

Refresh Hourly updated NWP guidance from NOAA for

aviation improvements for 2013ndash2016 Proc Fourth Aviation

Range and Aerospace Meteorology Special Symp on

WeatherndashAir Traffic Management Integration Atlanta GA

Amer Meteor Soc 24 [Available online at httpsams

confexcomams94AnnualwebprogramPaper240012html]

Breiman L 1996 Technical note Some properties of splitting cri-

teria Mach Learn 24 41ndash47 doi101023A1018094028462

mdashmdash 2001 Random forests Mach Learn 45 5ndash32 doi101023

A1010933404324

mdashmdash J Friedman R A Olshen and C J Stone 1984 Classifi-

cation and Regression Trees CRC Press 358 pp

Carbone R E and J D Tuttle 2008 Rainfall occurrence in the

US warm season The diurnal cycle J Climate 21 4132ndash4146

doi1011752008JCLI22751

Clark A J W A Gallus Jr and T C Chen 2007 Comparison of

the diurnal precipitation cycle in convection-resolving and

non-convection-resolving mesoscale modelsMon Wea Rev

135 3456ndash3473 doi101175MWR34671

mdashmdash R G Bullock T L Jensen M Xue and F Kong 2014 Ap-

plication of object-based time-domain diagnostics for tracking

precipitation systems in convection allowing models Wea

Forecasting 29 517ndash542 doi101175WAF-D-13-000981

Colavito J A S McGettigan M Robinson J L Mahoney and

M Phaneuf 2011 Enhancements in convective weather fore-

casting for NAS traffic flow management (TFM) Preprints

15th Conf on Aviation Range and Aerospace Meteorology

Los Angeles CA Amer Meteor Soc 136 [Available online

at httpsamsconfexcomams14Meso15ARAMtechprogram

paper_191100htm]

mdashmdashmdashmdash H Iskenderian and S A Lack 2012 Enhancements in

convective weather forecasting for NAS traffic flow manage-

ment Results of the 2010 and 2011 evaluations of CoSPA and

discussion of FAA plans Proc Third Aviation Range and

Aerospace Meteorology Special Symp on WeatherndashAir Traffic

Management Integration New Orleans LA Amer Meteor

Soc [Available online at httpsamsconfexcomams

92AnnualwebprogramPaper202520html]

Coniglio M C H E Brooks S J Weiss and S F Corfidi 2007

Forecasting themaintenance of quasi-linearmesoscale convective

systemsWea Forecasting 22 556ndash570 doi101175WAF10061

mdashmdash J Y Hwang andD J Stensrud 2010 Environmental factors

in the upscale growths and longevity of MCSs derived from

Rapid Update Cycle analyses Mon Wea Rev 138 3514ndash

3539 doi1011752010MWR32331

DattatreyaGR 2009Decision treesArtificial IntelligenceMethods

in the Environmental Sciences S E Haupt C Marzban and

A Pasini Eds Springer 77ndash101

Davis C A B G Brown and R G Bullock 2006 Object-based

verification of precipitation forecasts Part II Application to

convective rain systems Mon Wea Rev 134 1785ndash1795

doi101175MWR31461

Dersquoath G and K E Fabricius 2000 Classification and regression

treesApowerful yet simple technique for ecological data analysis

Ecology 81 3178ndash3192 doi1018900012-9658(2000)081[3178

CARTAP]20CO2

APRIL 2016 AH I J EVYCH ET AL 597

Unauthenticated | Downloaded 051722 0650 PM UTC

Delle Monache L F A Eckel D L Rife B Nagarajan and

K Searight 2013 Probabilistic weather prediction with an

analog ensembleMon Wea Rev 141 3498ndash3516 doi101175

MWR-D-12-002811

DeMaria M and J Kaplan 1994 A statistical hurricane intensity

prediction scheme (SHIPS) for the Atlantic basin Wea Fore-

casting 9 209ndash220 doi1011751520-0434(1994)0090209

ASHIPS20CO2

Diacuteaz-Uriarte R and S A de Andreacutes 2006 Gene selection and

classification of microarray data using random forest BMC

Bioinformatics 7 3 doi1011861471-2105-7-3

DupreeW and Coauthors 2009 The advanced storm prediction for

aviation forecast demonstration WMO Symp on Nowcasting

Whistler BC Canada WMO [Available online at httpswww

llmitedumissionaviationpublicationspublication-files

ms-papersDupree_2009_WSN_MS-38403_WW-16540pdf]

Evans J E and E R Ducot 2006 Corridor Integrated Weather

System MIT Lincoln Lab J 16 59ndash80

Gagne D J A McGovern and J Brotzge 2009 Classification of

convective areas using decision trees J Atmos Oceanic

Technol 26 1341ndash1353 doi1011752008JTECHA12051

mdashmdash mdashmdash and M Xue 2014 Machine learning enhancement of

storm-scale ensemble probabilistic quantitative precipitation

forecasts Wea Forecasting 29 1024ndash1043 doi101175

WAF-D-13-001081

Geerts B 1998Mesoscale convective systems in the southeastUnited

States during 1994ndash95 A survey Wea Forecasting 13 860ndash869

doi1011751520-0434(1998)0130860MCSITS20CO2

Glahn H R and D A Lowry 1972 The use of model output

statistics (MOS) in objective weather forecasting J Appl

Meteor 11 1203ndash1211 doi1011751520-0450(1972)0111203

TUOMOS20CO2

Hall T J C N Mutchler G J Bloy R N Thessin S K Gaffney

and J J Lareau 2011 Performance of observation-based

prediction algorithms for very short-range probabilistic clear-

sky condition forecasting J Appl Meteor Climatol 50 3ndash19

doi1011752010JAMC25291

Hamill T M and J S Whitaker 2006 Probabilistic quantitative

precipitation forecasts based on reforecast analogs Theory

and applicationMon Wea Rev 134 3209ndash3229 doi101175

MWR32371

Hogan R J E J OrsquoConnor and A J Illingworth 2009 Verifi-

cation of cloud fraction forecastsQuart J Roy Meteor Soc

135 1494ndash1511 doi101002qj481

Houze R A Jr 2004 Mesoscale convective systems Rev Geo-

phys 42 RG4003 doi1010292004RG000150

Jirak I L and W R Cotton 2007 Observational analysis of the

predictability of mesoscale convective systems Wea Fore-

casting 22 813ndash838 doi101175WAF10121

Lakshmanan V K L Elmore andM B Richman 2010 Reaching

scientific consensus through a competitionBull Amer Meteor

Soc 91 1423ndash1427 doi1011752010BAMS28701

Mahoney W P and Coauthors 2012 A wind power forecasting

system to optimize grid integration IEEE Trans Sustainable

Energy 3 670ndash682 doi101109TSTE20122201758

Marzban C 2004 The ROC curve and the area under it as perfor-

mance measures Wea Forecasting 19 1106ndash1114 doi101175

8251

mdashmdash S Leyton and B Colman 2007 Ceiling and visibility fore-

casts via neural networks Wea Forecasting 22 466ndash479

doi101175WAF9941

McGovern A D J Gagne II N Troutman R A Brown

J Basara and J K Williams 2011 Using spatiotemporal

relational random forests to improve our understanding of

severe weather processes Stat Anal Data Mining 4 407ndash429

doi101002sam10128

Mecikalski J R and K M Bedka 2006 Forecasting convective

initiation bymonitoring the evolution of moving convection in

daytime GOES imagery Mon Wea Rev 134 49ndash78

doi101175MWR30621

mdashmdash mdashmdash S J Paech and L A Litten 2008 A statistical evalu-

ation of GOES cloud-top properties for predicting convective

initiation Mon Wea Rev 136 4899ndash4914 doi101175

2008MWR23521

mdashmdash J Williams C Jewett D Ahijevych A LeRoy and

J Walker 2015 Probabilistic 0ndash1-h convective initia-

tion nowcasts that combine geostationary satellite obser-

vations and numerical weather prediction model data

J Appl Meteor Climatol 54 1039ndash1059 doi101175

JAMC-D-14-01291

Pinto J O J A Grim and M Steiner 2015 Assessment of the

High-Resolution Rapid Refresh Modelrsquos ability to predict

large convective storms using object-based verification Wea

Forecasting 30 892ndash913 doi101175WAF-D-14-001181

Robinson M 2014 Significant weather impacts on the national

airspace system A lsquolsquoweather-readyrsquorsquo view of air traffic man-

agement needs challenges opportunities and lessons learned

Proc Second Symp on Building a Weather-Ready Nation

Enhancing Our Nationrsquos Readiness Responsiveness and Re-

silience to High Impact Weather Events Atlanta GA Amer

Meteor Soc 63 [Available online at httpsamsconfexcom

ams94AnnualwebprogramPaper241280html]

Roebber P J 2015 Adaptive evolutionary programming Mon

Wea Rev 143 1497ndash1505 doi101175MWR-D-14-000951

Rozoff C M and J P Kossin 2011 New probabilistic forecast

schemes for the prediction of tropical cyclone rapid in-

tensification Wea Forecasting 26 677ndash689 doi101175

WAF-D-10-050591

Smalley D J and B J Bennett 2002 Using ORPG to enhance

NEXRAD products to support FAA critical systems Pre-

prints 10th Conf on Aviation Range and Aerospace Meteo-

rology Portland OR Amer Meteor Soc 36 [Available

online at httpsamsconfexcomamspdfpapers38861pdf]

Stensrud D J and Coauthors 2013 Progress and challenges with

warn-on-forecast Atmos Environ 123 2ndash16 doi101016

jatmosres201204004

Topic G and Coauthors 2014 Parallel random forest algorithm

usage Google Code Archive accessed 26 June 2014 [Avail-

able online at httpcodegooglecompparfwikiUsage]

Trier S B C A Davis D A Ahijevych and K W Manning

2014 Use of the parcel buoyancy minimum (Bmin) to diagnose

simulated thermodynamic destabilization Part I Methodol-

ogy and case studies of MCS initiation Mon Wea Rev 142

945ndash966 doi101175MWR-D-13-002721

mdashmdash G S Romine D A Ahijevych R J Trapp R S

Schumacher M C Coniglio and D J Stensrud 2015

Mesoscale thermodynamic influences on convection initia-

tion near a surface dryline in a convection-permitting en-

semble Mon Wea Rev 143 3726ndash3753 doi101175

MWR-D-15-01331

Wilks D S 2006 Statistical Methods in the Atmospheric Sciences

2d ed International Geophysics Series Vol 91 Academic

Press 627 pp

Williams J K 2014 Using random forests to diagnose avia-

tion turbulence Mach Learn 95 51ndash70 doi101007

s10994-013-5346-7

598 WEATHER AND FORECAST ING VOLUME 31

Unauthenticated | Downloaded 051722 0650 PM UTC

mdashmdash J Craig A Cotter and J K Wolff 2007 A hybrid machine

learning and fuzzy logic approach to CIT diagnostic devel-

opment Preprints Fifth Conf on Artificial Intelligence Ap-

plications to Environmental Science San Antonio TX Amer

Meteor Soc 12 [Available online at httpsamsconfexcom

ams87ANNUALwebprogramPaper120119html]

mdashmdash D Ahijevych S Dettling and M Steiner 2008a Combining

observations and model data for short-term storm forecasting

Remote Sensing Applications for Aviation Weather Hazard

Detection and Decision Support W Feltz and J Murray Eds

International Society for Optical Engineering (SPIE Pro-

ceedings Vol 7088) 708805 doi10111712795737

mdashmdashmdashmdashC J Kessinger T R SaxenM Steiner and S Dettling

2008b A machine learning approach to finding weather re-

gimes and skillful predictor combinations for short-term storm

forecasting Preprints Sixth Conf on Artificial Intelligence

Applications to Environmental Science13th Conf on Avia-

tion Range and Aerospace Meteorology New Orleans LA

Amer Meteor Soc J14 [Available online at httpsams

confexcomamspdfpapers135663pdf]

mdashmdash R Sharman J Craig and G Blackburn 2008c Remote

detection and diagnosis of thunderstorm turbulence Remote

Sensing Applications for Aviation Weather Hazard Detection

and Decision Support W Feltz and J Murray Eds In-

ternational Society for Optical Engineering (SPIE Pro-

ceedings Vol 7088) 708804 doi10111712795570

Zhang J and Coauthors 2011 National Mosaic andMulti-Sensor

QPE (NMQ) system Description results and future plans

Bull Amer Meteor Soc 92 1321ndash1338 doi101175

2011BAMS-D-11-000471

APRIL 2016 AH I J EVYCH ET AL 599

Unauthenticated | Downloaded 051722 0650 PM UTC

Page 18: Probabilistic Forecasts of Mesoscale Convective System

Delle Monache L F A Eckel D L Rife B Nagarajan and

K Searight 2013 Probabilistic weather prediction with an

analog ensembleMon Wea Rev 141 3498ndash3516 doi101175

MWR-D-12-002811

DeMaria M and J Kaplan 1994 A statistical hurricane intensity

prediction scheme (SHIPS) for the Atlantic basin Wea Fore-

casting 9 209ndash220 doi1011751520-0434(1994)0090209

ASHIPS20CO2

Diacuteaz-Uriarte R and S A de Andreacutes 2006 Gene selection and

classification of microarray data using random forest BMC

Bioinformatics 7 3 doi1011861471-2105-7-3

Dupree, W., and Coauthors, 2009: The advanced storm prediction for aviation forecast demonstration. WMO Symp. on Nowcasting, Whistler, BC, Canada, WMO. [Available online at https://www.ll.mit.edu/mission/aviation/publications/publication-files/ms-papers/Dupree_2009_WSN_MS-38403_WW-16540.pdf.]

Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather System. MIT Lincoln Lab. J., 16, 59–80.

Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, doi:10.1175/2008JTECHA1205.1.

Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, doi:10.1175/WAF-D-13-00108.1.

Geerts, B., 1998: Mesoscale convective systems in the southeast United States during 1994–95: A survey. Wea. Forecasting, 13, 860–869, doi:10.1175/1520-0434(1998)013<0860:MCSITS>2.0.CO;2.

Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, doi:10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Hall, T. J., C. N. Mutchler, G. J. Bloy, R. N. Thessin, S. K. Gaffney, and J. J. Lareau, 2011: Performance of observation-based prediction algorithms for very short-range, probabilistic clear-sky condition forecasting. J. Appl. Meteor. Climatol., 50, 3–19, doi:10.1175/2010JAMC2529.1.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, doi:10.1175/MWR3237.1.

Hogan, R. J., E. J. O'Connor, and A. J. Illingworth, 2009: Verification of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc., 135, 1494–1511, doi:10.1002/qj.481.

Houze, R. A., Jr., 2004: Mesoscale convective systems. Rev. Geophys., 42, RG4003, doi:10.1029/2004RG000150.

Jirak, I. L., and W. R. Cotton, 2007: Observational analysis of the predictability of mesoscale convective systems. Wea. Forecasting, 22, 813–838, doi:10.1175/WAF1012.1.

Lakshmanan, V., K. L. Elmore, and M. B. Richman, 2010: Reaching scientific consensus through a competition. Bull. Amer. Meteor. Soc., 91, 1423–1427, doi:10.1175/2010BAMS2870.1.

Mahoney, W. P., and Coauthors, 2012: A wind power forecasting system to optimize grid integration. IEEE Trans. Sustainable Energy, 3, 670–682, doi:10.1109/TSTE.2012.2201758.

Marzban, C., 2004: The ROC curve and the area under it as performance measures. Wea. Forecasting, 19, 1106–1114, doi:10.1175/825.1.

Marzban, C., S. Leyton, and B. Colman, 2007: Ceiling and visibility forecasts via neural networks. Wea. Forecasting, 22, 466–479, doi:10.1175/WAF994.1.

McGovern, A., D. J. Gagne II, N. Troutman, R. A. Brown, J. Basara, and J. K. Williams, 2011: Using spatiotemporal relational random forests to improve our understanding of severe weather processes. Stat. Anal. Data Mining, 4, 407–429, doi:10.1002/sam.10128.

Mecikalski, J. R., and K. M. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving convection in daytime GOES imagery. Mon. Wea. Rev., 134, 49–78, doi:10.1175/MWR3062.1.

Mecikalski, J. R., K. M. Bedka, S. J. Paech, and L. A. Litten, 2008: A statistical evaluation of GOES cloud-top properties for predicting convective initiation. Mon. Wea. Rev., 136, 4899–4914, doi:10.1175/2008MWR2352.1.

Mecikalski, J. R., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, doi:10.1175/JAMC-D-14-0129.1.

Pinto, J. O., J. A. Grim, and M. Steiner, 2015: Assessment of the High-Resolution Rapid Refresh Model's ability to predict large convective storms using object-based verification. Wea. Forecasting, 30, 892–913, doi:10.1175/WAF-D-14-00118.1.

Robinson, M., 2014: Significant weather impacts on the national airspace system: A “weather-ready” view of air traffic management needs, challenges, opportunities, and lessons learned. Proc. Second Symp. on Building a Weather-Ready Nation: Enhancing Our Nation's Readiness, Responsiveness, and Resilience to High Impact Weather Events, Atlanta, GA, Amer. Meteor. Soc., 6.3. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Paper241280.html.]

Roebber, P. J., 2015: Adaptive evolutionary programming. Mon. Wea. Rev., 143, 1497–1505, doi:10.1175/MWR-D-14-00095.1.

Rozoff, C. M., and J. P. Kossin, 2011: New probabilistic forecast schemes for the prediction of tropical cyclone rapid intensification. Wea. Forecasting, 26, 677–689, doi:10.1175/WAF-D-10-05059.1.

Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance NEXRAD products to support FAA critical systems. Preprints, 10th Conf. on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 3.6. [Available online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Topic, G., and Coauthors, 2014: Parallel random forest algorithm usage. Google Code Archive, accessed 26 June 2014. [Available online at http://code.google.com/p/parf/wiki/Usage.]

Trier, S. B., C. A. Davis, D. A. Ahijevych, and K. W. Manning, 2014: Use of the parcel buoyancy minimum (Bmin) to diagnose simulated thermodynamic destabilization. Part I: Methodology and case studies of MCS initiation. Mon. Wea. Rev., 142, 945–966, doi:10.1175/MWR-D-13-00272.1.

Trier, S. B., G. S. Romine, D. A. Ahijevych, R. J. Trapp, R. S. Schumacher, M. C. Coniglio, and D. J. Stensrud, 2015: Mesoscale thermodynamic influences on convection initiation near a surface dryline in a convection-permitting ensemble. Mon. Wea. Rev., 143, 3726–3753, doi:10.1175/MWR-D-15-0133.1.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. International Geophysics Series, Vol. 91, Academic Press, 627 pp.

Williams, J. K., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, doi:10.1007/s10994-013-5346-7.

Williams, J. K., J. Craig, A. Cotter, and J. K. Wolff, 2007: A hybrid machine learning and fuzzy logic approach to CIT diagnostic development. Preprints, Fifth Conf. on Artificial Intelligence Applications to Environmental Science, San Antonio, TX, Amer. Meteor. Soc., 1.2. [Available online at https://ams.confex.com/ams/87ANNUAL/webprogram/Paper120119.html.]

Williams, J. K., D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708805, doi:10.1117/12.795737.

Williams, J. K., D. Ahijevych, C. J. Kessinger, T. R. Saxen, M. Steiner, and S. Dettling, 2008b: A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. Preprints, Sixth Conf. on Artificial Intelligence Applications to Environmental Science/13th Conf. on Aviation, Range, and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., J1.4. [Available online at https://ams.confex.com/ams/pdfpapers/135663.pdf.]

Williams, J. K., R. Sharman, J. Craig, and G. Blackburn, 2008c: Remote detection and diagnosis of thunderstorm turbulence. Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, W. Feltz and J. Murray, Eds., International Society for Optical Engineering (SPIE Proceedings, Vol. 7088), 708804, doi:10.1117/12.795570.

Zhang, J., and Coauthors, 2011: National Mosaic and Multi-Sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, doi:10.1175/2011BAMS-D-11-00047.1.
