Analysing NOAA's Storm Events Database to Prepare for Severe Weather Events

Upload: pejotans

Post on 12-Oct-2015

  • 22/6/2014 Analysing NOAA's Storm Events Database to Prepare for Severe Weather Events

    file:///C:/Users/Paulo%20Jean/Documents/GitHub/RepData_PeerAssessment2/repdata_pa2.html 1/16

    Analysing NOAA's Storm Events Database to Prepare for Severe Weather Events

    Synopsis

    This document provides an analysis of the U.S. National Oceanic and Atmospheric Administration's (NOAA) Storm Events Database in order to address the question of which major weather events are most harmful with respect to population health and/or have the greatest economic consequences. Based on it, a government or municipal manager might prepare for severe weather events and prioritize resources for different types of events. It is part of the second assignment of Coursera's Reproducible Research online course.

    Data files

    The main data file used in this analysis is a 47 MB comma-separated-value file, compressed with the bzip2 algorithm, that can be downloaded from the course web site.

    Main data file: repdata data StormData.csv.bz2

    Additionally, we've used two more files that allowed us to 1) map inconsistent event type descriptions to the limited list of 48 event types, as defined in NOAA's National Weather Service (NWS) Directive 10-1605, and 2) adjust economic damages for inflation. These files are available at my public GitHub repository:

    evtypes-directive-10-1605.csv: a manually built mapping from the 985 different event types in the original data file to the NWS's list of 48 event types, according to (my best judgment of) Directive 10-1605 guidelines.

    bls-cpi-1950-2013.csv: Average Consumer Price Index for all calendar years in the 1950-2013 period, relative to the 1982-84 base (index=100). These indexes were downloaded from the Bureau of Labor Statistics (BLS) website, with the following parameters: Series ID, CUUR0000SA0; Year range: 1950-2013; One Time period: Annual Data; Output format: Text, comma delimited.

    Data processing

    We begin by loading the main data file into R. Because it's a bzip2 compressed file, we have to create a bzip2 file connection and pass it to read.csv() instead of just the filename. We split the reading in two parts in order to select just the columns we need for our analysis.

    # Read only the first line to get columns names (to be able to select just a
    # few of them when reading the full dataset). Notice the bzip2 file
    # connection we create and close just after using it.
    filecon = bzfile("repdata-data-StormData.csv.bz2", "r")
    col.names = colnames(read.csv(filecon, nrows = 1))
    # If we don't close this connection now, next call to read.csv() will resume
    # from where it stopped, but we don't want this.
    close(filecon)

    # define needed columns
    cols.to.read = c("BGN_DATE", "COUNTY", "STATE", "EVTYPE", "FATALITIES",
        "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

    # Read file from the beginning (new connection), keeping just the needed
    # columns.
    filecon = bzfile("repdata-data-StormData.csv.bz2", "r")
    storm = read.csv(filecon, colClasses = c("NULL", NA)[(col.names %in% cols.to.read) + 1])
    close(filecon)
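
    The column selection relies on how read.csv() interprets colClasses: a "NULL" entry skips a column, while NA lets R guess its type. The sketch below shows the same indexing trick on a tiny in-memory CSV; the sample data and the names csv.text, wanted, classes and small are made up for illustration.

```r
# Hypothetical three-column CSV held in memory instead of a bzip2 file.
csv.text = "ID,EVTYPE,FATALITIES\n1,TORNADO,0\n2,HAIL,1\n"

# First pass: read a single row just to learn the column names.
col.names = colnames(read.csv(text = csv.text, nrows = 1))

# (col.names %in% wanted) + 1 is 1 for unwanted columns and 2 for wanted
# ones, so it picks "NULL" (skip) or NA (keep, guess the type).
wanted = c("EVTYPE", "FATALITIES")
classes = c("NULL", NA)[(col.names %in% wanted) + 1]

# Second pass: read the data, keeping only the wanted columns.
small = read.csv(text = csv.text, colClasses = classes)
print(names(small))
```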

    Let's take a look at our data.

    str(storm)

    ## 'data.frame': 902297 obs. of 10 variables:
    ##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
    ##  $ COUNTY    : num 97 3 57 89 43 77 9 123 125 57 ...
    ##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
    ##  $ EVTYPE    : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
    ##  $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
    ##  $ INJURIES  : num 15 0 2 2 2 6 1 0 14 0 ...
    ##  $ PROPDMG   : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
    ##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
    ##  $ CROPDMG   : num 0 0 0 0 0 0 0 0 0 0 ...
    ##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...

    Cleaning and transforming the data

    This data definitely needs some tidying. The most important things we'd want to change are:

    BGN_DATE as a factor: we think that a date type would be more appropriate, because it'll make it easier to aggregate and summarize on specific periods of time.

    storm$BGN_DATE = strptime(storm$BGN_DATE, "%m/%d/%Y %H:%M:%S")
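
    strptime() returns a POSIXlt value whose year component counts years since 1900, which is why the later steps add 1900 to it. A small sketch with two sample timestamps in the BGN_DATE format; the variable names and the explicit tz = "UTC" are assumptions added here for reproducibility.

```r
# Two sample timestamps in the same layout as BGN_DATE.
raw = c("4/18/1950 0:00:00", "1/1/2006 0:00:00")

# Parse month/day/year plus time; tz is fixed so results do not depend
# on the local time zone.
parsed = strptime(raw, "%m/%d/%Y %H:%M:%S", tz = "UTC")

# POSIXlt stores the year as an offset from 1900.
years = parsed$year + 1900
print(years)
```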

    EVTYPE as a 985-level factor: it seems that there is a lot of redundancy (or mistakes) in these event type descriptions. According to NOAA's website, from 1996 to present, only 48 event types are defined by NWS Directive 10-1605. We'd want to translate or map this gigantic list of event types into NWS's latest list.

    evtype

    ##  [1] "Astronomical Low Tide"    "Avalanche"
    ##  [3] "Blizzard"                 "Coastal Flood"
    ##  [5] "Cold/Wind Chill"          "Debris Flow"
    ##  [7] "Dense Fog"                "Dense Smoke"
    ##  [9] "Drought"                  "Dust Devil"
    ## [11] "Dust Storm"               "Excessive Heat"
    ## [13] "Extreme Cold/Wind Chill"  "Flash Flood"
    ## [15] "Flood"                    "Freezing Fog"
    ## [17] "Frost/Freeze"             "Funnel Cloud"
    ## [19] "Hail"                     "Heat"
    ## [21] "Heavy Rain"               "Heavy Snow"
    ## [23] "High Surf"                "High Wind"
    ## [25] "Hurricane (Typhoon)"      "Ice Storm"
    ## [27] "Lake-Effect Snow"         "Lakeshore Flood"
    ## [29] "Lightning"                "Marine Hail"
    ## [31] "Marine High Wind"         "Marine Strong Wind"
    ## [33] "Marine Thunderstorm Wind" "Rip Current"
    ## [35] "Seiche"                   "Sleet"
    ## [37] "Storm Surge/Tide"         "Strong Wind"
    ## [39] "Thunderstorm Wind"        "Tornado"
    ## [41] "Tropical Depression"      "Tropical Storm"
    ## [43] "Tsunami"                  "Volcanic Ash"
    ## [45] "Waterspout"               "Wildfire"
    ## [47] "Winter Storm"             "Winter Weather"
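
    One way a lookup table of this kind could be applied is with match(), which returns NA for descriptions that have no entry. This is only a sketch: the column names raw and official and the sample rows are assumptions, since the layout of the actual mapping file is not shown here.

```r
# Hypothetical mapping table from raw descriptions to official types.
evtype.map = data.frame(
  raw = c("TSTM WIND", "THUNDERSTORM WINDS", "FLASH FLOODING"),
  official = c("Thunderstorm Wind", "Thunderstorm Wind", "Flash Flood"),
  stringsAsFactors = FALSE
)

# Raw event descriptions as they might appear in EVTYPE.
raw.types = c("TSTM WIND", "FLASH FLOODING", "THUNDERSTORM WINDS", "UNKNOWN")

# Look each raw description up; unmapped descriptions become NA.
evtype48 = evtype.map$official[match(raw.types, evtype.map$raw)]
print(evtype48)
```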

    PROPDMG and CROPDMG with multiple scales: damage magnitudes are expressed in a separate variable (PROPDMGEXP and CROPDMGEXP, respectively). We should normalize them (to the same scale), otherwise it would not be straightforward to compare damage values in different events and dates.

    We start by capitalizing all magnitude codes. It'll make the following steps easier.

    # capitalize all 1-letter magnitude codes
    storm$PROPDMGEXP = toupper(storm$PROPDMGEXP)
    storm$CROPDMGEXP = toupper(storm$CROPDMGEXP)

    Before normalizing, we have to check whether there are invalid values in the magnitude variables.

    # check invalid codes in property damages
    valid.mag = c("", "H", "K", "M", "B")
    with(storm[!storm$PROPDMGEXP %in% valid.mag, ], table(factor(PROPDMGEXP), format(BGN_DATE, "%Y")))

    ##    
    ##     1993 1994 1995 2011
    ##   -    0    0    1    0
    ##   ?    2    1    5    0
    ##   +    0    1    4    0
    ##   0    1   28  186    1
    ##   1    0    0   25    0
    ##   2    0    0   13    0
    ##   3    0    0    4    0
    ##   4    0    1    3    0
    ##   5    0    2   26    0
    ##   6    1    0    3    0
    ##   7    0    0    5    0
    ##   8    0    0    1    0

    # check invalid codes in crop damages
    with(storm[!storm$CROPDMGEXP %in% valid.mag, ], table(factor(CROPDMGEXP), format(BGN_DATE, "%Y")))

    ##    
    ##     1993 1994 1995
    ##   ?    2    0    5
    ##   0    2    9    8
    ##   2    0    0    1

    It seems that, apart from one observation in 2011, all invalid magnitude codes are from the 1993-1995 period when, according to NOAA's website, events were extracted from unformatted text files, a task known to be more prone to error. We'll leave them alone and consider them all invalid. That means that their normalized values will be set to NA.

    We're going to create two more variables to store the normalized values.

    magnitude = 10^c(0, 2, 3, 6, 9)
    storm$nPROPDMG = storm$PROPDMG * magnitude[match(storm$PROPDMGEXP, valid.mag)]
    storm$nCROPDMG = storm$CROPDMG * magnitude[match(storm$CROPDMGEXP, valid.mag)]
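
    The normalization works because match() returns the position of each code in valid.mag, which then indexes the multiplier vector; an invalid code yields NA and so does the product. A small sketch with made-up damage values (dmg, dmg.exp and normalized are illustrative names):

```r
# Valid magnitude codes and their multipliers, as in the analysis.
valid.mag = c("", "H", "K", "M", "B")
magnitude = 10^c(0, 2, 3, 6, 9)

# Made-up damage figures with their magnitude codes; "?" is invalid.
dmg = c(25, 2.5, 115, 10)
dmg.exp = c("K", "M", "B", "?")

# Position of each code in valid.mag selects the multiplier; an
# unmatched code gives NA, so the normalized value is NA as well.
normalized = dmg * magnitude[match(dmg.exp, valid.mag)]
print(normalized)
```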

    Now we need to adjust them for inflation, using the BLS Consumer Price Index data mentioned in the Data files section.

    # read CPI data
    cpi = read.csv("bls-cpi-1950-2013.csv", skip = 13, colClasses = c("integer", "numeric", "NULL"))
    # get last index
    last.cpi = tail(cpi, 1)$Annual
    # calculate each row's inflation index
    adj.index = last.cpi/cpi[match(storm$BGN_DATE$year + 1900, cpi$Year), "Annual"]
    # apply them
    storm$nPROPDMG = floor(storm$nPROPDMG * adj.index)
    storm$nCROPDMG = floor(storm$nCROPDMG * adj.index)
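
    The adjustment scales each value by the ratio between the most recent index and the index of the event's year. The sketch below uses an invented CPI table (the numbers are not BLS figures) so that the ratios are easy to verify by hand; event.year, damage and adjusted are illustrative names.

```r
# Hypothetical CPI table; values are invented, not real BLS data.
cpi = data.frame(Year = c(1993, 2005, 2013), Annual = c(50, 100, 200))

# The last row holds the reference (most recent) index.
last.cpi = tail(cpi, 1)$Annual

# Made-up events: the year each occurred and its nominal damage.
event.year = c(1993, 2013, 2005)
damage = c(1000, 1000, 1000)

# Ratio of the reference index to the index of each event's year.
adj.index = last.cpi / cpi[match(event.year, cpi$Year), "Annual"]

# Damage expressed in dollars of the most recent year.
adjusted = floor(damage * adj.index)
print(adjusted)
```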

    One last step would be to generate an aggregate dataset, with annual totals. That will be the dataset used in our analysis.

    storm.agg = aggregate(cbind(FATALITIES, INJURIES, nPROPDMG, nCROPDMG) ~ BGN_DATE$year + EVTYPE48 + STATE, data = storm, sum)
    names(storm.agg)[1] = "YEAR"
    storm.agg$YEAR = storm.agg$YEAR + 1900
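
    In the formula interface of aggregate(), cbind() on the left-hand side sums several measures at once, and each distinct combination of the right-hand-side variables becomes one output row. A minimal sketch on a made-up data frame (events and totals are illustrative names):

```r
# Made-up event records with a year, an event type and two measures.
events = data.frame(
  YEAR = c(1995, 1995, 1996, 1996),
  EVTYPE48 = c("Heat", "Heat", "Heat", "Tornado"),
  FATALITIES = c(3, 4, 1, 2),
  INJURIES = c(10, 0, 5, 7)
)

# Sum both measures for every (YEAR, EVTYPE48) combination.
totals = aggregate(cbind(FATALITIES, INJURIES) ~ YEAR + EVTYPE48,
                   data = events, sum)
print(totals)
```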

    Analysis

    As mentioned on NOAA's website, until 1992 only tornado, thunderstorm wind and hail events found their way into the database. Let's see what we have in our dataset.

    pre1993 = aggregate(cbind(FATALITIES, INJURIES, nPROPDMG, nCROPDMG) ~ EVTYPE48, data = storm.agg[storm.agg$YEAR < 1993, ], sum)
    pre1993

    ##            EVTYPE48 FATALITIES INJURIES  nPROPDMG nCROPDMG
    ## 1              Hail          5      401 0.000e+00        0
    ## 2 Thunderstorm Wind        263     3326 0.000e+00        0
    ## 3           Tornado       4012    68036 1.147e+11        0

    As we can see, until 1992, property damages related to 'Tornado' events amount to $114.7 billion. This is more than 19% of the total property damages of all event types in the whole period. Pre-1993 tornado data also represents 26.5% of all fatalities and 48% of all injuries, as we show below.

    metrics = c("FATALITIES", "INJURIES", "nPROPDMG", "nCROPDMG")
    pre1993[pre1993$EVTYPE48 == "Tornado", metrics]/colSums(storm.agg[metrics])

    ##   FATALITIES INJURIES nPROPDMG nCROPDMG
    ## 3     0.2652   0.4844   0.1909        0

    Any attempt at an unbiased analysis should leave pre-1993 data out, and that's what we're going to do.

    storm.agg.93 = subset(storm.agg, YEAR >= 1993)

    Events harmful to population health

    Now we start to analyze the types of events most harmful to population health, according to storm data from 1993 onward.

    harmful = aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE48, data = storm.agg.93, sum)
    # ordering by FATALITIES and INJURIES; harmful[[n]] extracts the column as a
    # vector, which is what order() expects
    most.harmful.5 = lapply(c("FATALITIES", "INJURIES"), function(n) harmful[order(harmful[[n]], decreasing = TRUE)[1:5], c("EVTYPE48", n)])
    most.harmful.5

    ## [[1]]
    ##          EVTYPE48 FATALITIES
    ## 12 Excessive Heat       2016
    ## 40        Tornado       1643
    ## 20           Heat       1161
    ## 14    Flash Flood       1065
    ## 29      Lightning        817
    ## 
    ## [[2]]
    ##             EVTYPE48 INJURIES
    ## 40           Tornado    23303
    ## 15             Flood     6794
    ## 12    Excessive Heat     6680
    ## 39 Thunderstorm Wind     6231
    ## 29         Lightning     5232
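
    The ranking step is a plain order-and-slice applied once per measure. A sketch on a four-row made-up table, keeping the top two instead of the top five; the data and the name top2 are illustrative only.

```r
# Made-up totals per event type.
harmful = data.frame(
  EVTYPE48 = c("Heat", "Tornado", "Hail", "Flood"),
  FATALITIES = c(30, 50, 1, 20),
  INJURIES = c(5, 90, 40, 10),
  stringsAsFactors = FALSE
)

# For each measure, sort in decreasing order and keep the first two rows
# together with the event type.
top2 = lapply(c("FATALITIES", "INJURIES"), function(n)
  harmful[order(harmful[[n]], decreasing = TRUE)[1:2], c("EVTYPE48", n)])
print(top2)
```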

    We can see that, from 1993 onward, 'Tornado' was the event type that caused the most injuries, as most of us would perhaps expect.

    What came as a surprise is that 'Excessive Heat' caused the most fatalities in the same period (followed by 'Tornado' events). If we add to this number the fatalities caused by 'Heat' (3rd line of the first list item), we'll have more than 3,100 heat-related fatalities, almost twice as many as 'Tornado' fatalities.

    Let's investigate this a little further.

    heat.related = storm.agg.93$EVTYPE48 == "Excessive Heat" | storm.agg.93$EVTYPE48 == "Heat"

    library(ggplot2)
    library(grid)
    ggplot(storm.agg.93[heat.related, ], aes(x = as.factor(YEAR), y = FATALITIES)) +
        geom_bar(stat = "identity", fill = "darkolivegreen3") +
        labs(title = "Heat-related fatalities (1993-2011)", x = "Year", y = expression("Fatalities")) +
        guides(fill = FALSE) +
        theme_minimal() +
        theme(plot.title = element_text(size = 18, face = "bold"),
            panel.grid.major.x = element_blank(),
            panel.grid.minor.x = element_blank())

    Look at the barplot above. Why is the number of 1995 heat-related fatalities so high? Let's check which states had the most fatalities that year.

    states.heat = aggregate(FATALITIES ~ STATE, data = subset(storm.agg.93, heat.related & YEAR == 1995), sum)
    states.heat = states.heat[order(states.heat$FATALITIES, decreasing = TRUE), ]
    head(states.heat, 5)

    ##    STATE FATALITIES
    ## 12    IL        626
    ## 29    PA        198
    ## 38    WI         67
    ## 21    MO         34
    ## 24    NJ         20

    We can see that Illinois was the state with the greatest number of heat-related fatalities in 1995. Searching on the internet, we find that it corresponds to the so-called 1995 Chicago heat wave. According to Wikipedia, this heat wave led to approximately 750 heat-related deaths in Chicago over a period of five days. Most of the victims of the heat wave were elderly poor residents of the inner city, who could not afford air conditioning and did not open windows or sleep outside for fear of crime.

    Getting back to our list, we have flood-like events, thunderstorm winds and lightning, besides heat and tornadoes, as the events most harmful to population health.

    most.harmful.5

    ## [[1]]
    ##          EVTYPE48 FATALITIES
    ## 12 Excessive Heat       2016
    ## 40        Tornado       1643
    ## 20           Heat       1161
    ## 14    Flash Flood       1065
    ## 29      Lightning        817
    ## 
    ## [[2]]
    ##             EVTYPE48 INJURIES
    ## 40           Tornado    23303
    ## 15             Flood     6794
    ## 12    Excessive Heat     6680
    ## 39 Thunderstorm Wind     6231
    ## 29         Lightning     5232

    Events with the greatest economic consequences

    Applying a similar approach to find the events that caused the most damage to property and crops in the 1993-2011 period, we find these results.

    econ.cons = aggregate(cbind(nPROPDMG, nCROPDMG) ~ EVTYPE48, data = storm.agg.93, sum)
    # ordering by Property and Crop Damages (inflation adjusted); econ.cons[[n]]
    # extracts the column as a vector, which is what order() expects
    econ.cons.5 = lapply(c("nPROPDMG", "nCROPDMG"), function(n) econ.cons[order(econ.cons[[n]], decreasing = TRUE)[1:5], c("EVTYPE48", n)])
    econ.cons.5

    ## [[1]]
    ##               EVTYPE48  nPROPDMG
    ## 15               Flood 1.780e+11
    ## 25 Hurricane (Typhoon) 1.058e+11
    ## 37    Storm Surge/Tide 5.678e+10
    ## 40             Tornado 3.417e+10
    ## 14         Flash Flood 2.149e+10
    ## 
    ## [[2]]
    ##               EVTYPE48  nCROPDMG
    ## 9              Drought 1.841e+10
    ## 15               Flood 1.534e+10
    ## 26           Ice Storm 7.891e+09
    ## 25 Hurricane (Typhoon) 7.186e+09
    ## 19                Hail 3.959e+09

    'Flood' and 'Drought' come up as the most damaging weather events to property and crops, respectively, with total damages worth around $178 billion in the former's case and around $18.4 billion in the latter's, inflation adjusted. 'Flood' has its evil share of $15.3 billion in crop damages too.

    Alongside them, concerning property damages, we have 'Hurricane (Typhoon)' ($106 billion), 'Storm Surge/Tide' ($57 billion), 'Tornado' ($34 billion) and 'Flash Flood' ($21 billion); crop damages of around $7.9 billion caused by 'Ice Storm', $7.2 billion caused again by 'Hurricane (Typhoon)' and $4.0 billion by 'Hail' complete this list of the most economically harmful weather events.

    Looking at the plot of property damage caused by the top-5 events, we can't help but notice some prominent points.

    propdmg.top5 = as.character(econ.cons.5[[1]]$EVTYPE48)
    propdmg.agg = aggregate(nPROPDMG ~ EVTYPE48 + YEAR, data = subset(storm.agg.93, EVTYPE48 %in% propdmg.top5), sum)
    ggplot(propdmg.agg, aes(x = as.factor(YEAR), y = nPROPDMG/10^9, group = EVTYPE48, color = EVTYPE48)) +
        geom_line(size = 1) +
        geom_point(size = 4) +
        labs(title = "Property damage - 5 most destructive event types", x = "Year", y = "Property damages ($ billion)", color = "Event type") +
        guides(fill = FALSE) +
        theme_minimal() +
        theme(plot.title = element_text(size = 16, face = "bold"),
            panel.grid.major.x = element_blank(),
            panel.grid.minor.x = element_blank(),
            legend.position = c(0, 0.95),
            legend.justification = c(0, 1),
            legend.direction = "vertical",
            legend.key.height = unit(5, "mm"))

    The spikes in 2005 most likely represent the economic losses of Hurricane Katrina, reportedly the costliest natural disaster in US history. So it seems odd that the spike in 2006 is higher than that. We decided to investigate this.

    Looking at the top states affected by flood in 2006, we see that California has the highest annual total of property damage.

    states.flood = aggregate(nPROPDMG ~ STATE, data = subset(storm.agg.93, EVTYPE48 == "Flood" & YEAR == 2006), sum)
    states.flood = states.flood[order(states.flood$nPROPDMG, decreasing = TRUE), ]
    head(states.flood, 5)

    ##    STATE  nPROPDMG
    ## 5     CA 1.332e+11
    ## 18    LA 7.157e+08
    ## 34    NY 1.840e+08
    ## 38    PA 1.609e+08
    ## 1     AK 7.074e+07

    Looking into the (almost) raw data, we see this suspect event, related to the period of storms and flooding in California between December 2005 and January 2006.

    ca.flood.2006 = subset(storm, STATE == "CA" & EVTYPE48 == "Flood" & BGN_DATE$year + 1900 == 2006)
    ca.flood.2006[which.max(ca.flood.2006$nPROPDMG), ]

    ##          BGN_DATE COUNTY STATE EVTYPE FATALITIES INJURIES PROPDMG
    ## 605953 2006-01-01     55    CA  FLOOD          0        0     115
    ##        PROPDMGEXP CROPDMG CROPDMGEXP EVTYPE48  nPROPDMG nCROPDMG
    ## 605953          B    32.5          M    Flood 1.329e+11 37555071

    We went to the online NOAA Storm Events Database to look at the original record. Yes, property damage is coded as $115 billion, but in the Episode Narrative field, we clearly see that it is likely to be an error.

    This is the Episode Narrative: Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.

    So, we're pretty sure that this astounding value should be around $100 million (not billion). Now, let's fix this error.

    # amount to subtract: keep only 0.1% of the recorded (inflation-adjusted) value
    suspect = ca.flood.2006[which.max(ca.flood.2006$nPROPDMG), ]
    diff.nPROPDMG = 0.999 * suspect$nPROPDMG
    # original data.frame: correct only the suspect row, located by its row name
    # (subset() keeps row names), rather than every CA flood event of 2006
    wrong.row = rownames(suspect)
    storm[wrong.row, "nPROPDMG"] = storm[wrong.row, "nPROPDMG"] - diff.nPROPDMG
    # aggregate data.frame
    wrong.obs = with(storm.agg, STATE == "CA" & EVTYPE48 == "Flood" & YEAR == 2006)
    storm.agg[wrong.obs, ]$nPROPDMG = storm.agg[wrong.obs, ]$nPROPDMG - diff.nPROPDMG
    # subset 1993-forward data.frame
    wrong.obs = with(storm.agg.93, STATE == "CA" & EVTYPE48 == "Flood" & YEAR == 2006)
    storm.agg.93[wrong.obs, ]$nPROPDMG = storm.agg.93[wrong.obs, ]$nPROPDMG - diff.nPROPDMG
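
    When correcting a single record in a row-level table, it helps to address that record by its row name so that other rows matching the same state and event type are left untouched. A sketch on a made-up three-row data frame (storm.demo, its row names and its values are hypothetical):

```r
# Made-up rows: one suspect CA value and two ordinary records.
storm.demo = data.frame(
  STATE = c("CA", "CA", "LA"),
  nPROPDMG = c(1.329e11, 5e6, 7e8),
  stringsAsFactors = FALSE
)
rownames(storm.demo) = c("r1", "r2", "r3")

# Locate the largest CA record; subset() preserves the row names.
ca.rows = subset(storm.demo, STATE == "CA")
wrong.row = rownames(ca.rows)[which.max(ca.rows$nPROPDMG)]

# Subtract 99.9% of that record only, turning billions into millions.
diff.demo = 0.999 * storm.demo[wrong.row, "nPROPDMG"]
storm.demo[wrong.row, "nPROPDMG"] = storm.demo[wrong.row, "nPROPDMG"] - diff.demo
print(storm.demo)
```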

    Time to redo the economic losses analysis.

    econ.cons = aggregate(cbind(nPROPDMG, nCROPDMG) ~ EVTYPE48, data = storm.agg.93, sum)
    # ordering by Property and Crop Damages (inflation adjusted); econ.cons[[n]]
    # extracts the column as a vector, which is what order() expects
    econ.cons.5 = lapply(c("nPROPDMG", "nCROPDMG"), function(n) econ.cons[order(econ.cons[[n]], decreasing = TRUE)[1:5], c("EVTYPE48", n)])
    econ.cons.5

    ## [[1]]
    ##               EVTYPE48  nPROPDMG
    ## 25 Hurricane (Typhoon) 1.058e+11
    ## 37    Storm Surge/Tide 5.678e+10
    ## 15               Flood 4.526e+10
    ## 40             Tornado 3.417e+10
    ## 14         Flash Flood 2.149e+10
    ## 
    ## [[2]]
    ##               EVTYPE48  nCROPDMG
    ## 9              Drought 1.841e+10
    ## 15               Flood 1.534e+10
    ## 26           Ice Storm 7.891e+09
    ## 25 Hurricane (Typhoon) 7.186e+09
    ## 19                Hail 3.959e+09

    This time, the first positions in the property and crop damage lists are occupied by 'Hurricane (Typhoon)' and 'Drought'. Let's see how our plot has changed. Has Hurricane Katrina claimed its dreadful spot?

    propdmg.top5 = as.character(econ.cons.5[[1]]$EVTYPE48)
    propdmg.agg = aggregate(nPROPDMG ~ EVTYPE48 + YEAR, data = subset(storm.agg.93, EVTYPE48 %in% propdmg.top5), sum)
    ggplot(propdmg.agg, aes(x = as.factor(YEAR), y = nPROPDMG/10^9, group = EVTYPE48, color = EVTYPE48)) +
        geom_line(size = 1) +
        geom_point(size = 4) +
        labs(title = expression(atop("Property damage - 5 most destructive event types", scriptstyle("After fixing CA flood error"))), x = "Year", y = "Property damages ($ billion)", color = "Event type") +
        guides(fill = FALSE) +
        theme_minimal() +
        theme(plot.title = element_text(size = 16, face = "bold"),
            panel.grid.major.x = element_blank(),
            panel.grid.minor.x = element_blank(),
            legend.position = c(0, 0.95),
            legend.justification = c(0, 1),
            legend.direction = "vertical",
            legend.key.height = unit(5, "mm"))

    Results

    In our analysis, we found that the most harmful weather events to population health were heat-related events ('Excessive Heat' and 'Heat'), along with tornadoes, flood-like events, thunderstorm winds and lightning events.

    most.harmful.5

    ## [[1]]
    ##          EVTYPE48 FATALITIES
    ## 12 Excessive Heat       2016
    ## 40        Tornado       1643
    ## 20           Heat       1161
    ## 14    Flash Flood       1065
    ## 29      Lightning        817
    ## 
    ## [[2]]
    ##             EVTYPE48 INJURIES
    ## 40           Tornado    23303
    ## 15             Flood     6794
    ## 12    Excessive Heat     6680
    ## 39 Thunderstorm Wind     6231
    ## 29         Lightning     5232

    Concerning property losses, the most destructive events were hurricanes, storm surges/tides, floods (including flash floods) and tornadoes; in the case of crop losses, drought, ice storms and hail, besides hurricanes and floods, are the most important ones.

    econ.cons.5

    ## [[1]]
    ##               EVTYPE48  nPROPDMG
    ## 25 Hurricane (Typhoon) 1.058e+11
    ## 37    Storm Surge/Tide 5.678e+10
    ## 15               Flood 4.526e+10
    ## 40             Tornado 3.417e+10
    ## 14         Flash Flood 2.149e+10
    ## 
    ## [[2]]
    ##               EVTYPE48  nCROPDMG
    ## 9              Drought 1.841e+10
    ## 15               Flood 1.534e+10
    ## 26           Ice Storm 7.891e+09
    ## 25 Hurricane (Typhoon) 7.186e+09
    ## 19                Hail 3.959e+09