Data Analysis Project, December 13, 2004
Wave Height Prediction and the Time of Significant Swell
I have always had a strong affinity for the ocean and its environment. When I was quite young I was introduced to surfing and have never looked back. Surfing, being a lifestyle and not just a hobby, is ever present on my mind. With the ever-growing demands on my time from the other areas of my life, it is difficult to sit around, board under arm, waiting for the next swell to appear. I am looking to identify a reliable and accurate method to determine when swell events are going to occur and how long they will last, so that I can plan ahead for the waves and keep myself and those around me happy while sating my surfing itch.
Waves are the product of wind and fetch (the distance over which the wind blows). Ideally, waves are generated by a strong wind over a long fetch. This produces what is called a ground swell. A ground swell is a long-period swell that tends to have a greater significant wave height (dependent on the wind speeds generating the swell), and by observation it seems to last longer than a shorter-period swell caused by wind over a smaller fetch. Storms out at sea, in combination with local weather, affect the size, frequency, and length of a swell window (the period during which a dominant wave period shows on a nearshore buoy).
By analyzing the conditions at different buoys, I would like to be able to determine the size of a swell and the length of time that it will be adequate for surfing. I will consider localized wave heights sufficient for surfing when the local buoys register 3 ft or higher.
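One way to turn the 3 ft rule into swell event sizes and durations is to group consecutive buoy readings at or above the threshold. The sketch below assumes readings arrive as (timestamp, wave height in feet) pairs spaced 12 hours apart, matching the twice-daily sampling described later in this report; the function name `swell_events` and the sample readings are hypothetical, for illustration only.

```python
from datetime import datetime, timedelta

SURFABLE_FT = 3.0                  # surfable threshold stated in the text
STEP = timedelta(hours=12)         # assumed spacing between readings (6:00 and 18:00)


def swell_events(readings, threshold=SURFABLE_FT, step=STEP):
    """Group consecutive surfable readings into (start, end, peak_ft) events.

    `readings` is a time-sorted list of (datetime, wave_height_ft) pairs.
    A reading below `threshold`, or a gap longer than `step`, ends an event.
    """
    events = []
    start = last = None
    peak = 0.0
    for ts, height in readings:
        surfable = height >= threshold
        continues = start is not None and surfable and ts - last <= step
        if continues:
            last, peak = ts, max(peak, height)
            continue
        if start is not None:              # close the event in progress
            events.append((start, last, peak))
            start = last = None
        if surfable:                       # open a new event
            start = last = ts
            peak = height
    if start is not None:
        events.append((start, last, peak))
    return events


# Hypothetical readings, for illustration only.
readings = [
    (datetime(2004, 9, 1, 6), 2.0),
    (datetime(2004, 9, 1, 18), 3.5),
    (datetime(2004, 9, 2, 6), 4.8),
    (datetime(2004, 9, 2, 18), 3.1),
    (datetime(2004, 9, 3, 6), 2.2),
]
for start, end, peak in swell_events(readings):
    print(start, "to", end, "duration:", end - start, "peak:", peak, "ft")
```

Duration here runs from the first to the last surfable reading, so it is only as fine as the 12-hour sampling.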
In general, I expect the most significant swell patterns to result from hurricanes that arrive from the south in the late summer and early fall months and from Northeaster storms that dominate the winter and early spring weather pattern. In addition, there are rare events throughout the year when local weather patterns are not at all conducive to wave generation, yet a major swell event occurs. These instances may be the most useful in identifying the offshore weather conditions that produce swell, giving me the information I need to predict localized swell events before they physically show up.
There are other factors besides the weather that affect waves. The bottom contours of the ocean
floor, the steepness of the shoreline and the material (sand, reef, rock) that the ocean floor is made of all
affect the local wave patterns. I am choosing to ignore these factors in my study because the New Jersey
shore is mostly a sandy bottom, resulting in what is called a beach break, as opposed to a reef or a point break. Beach breaks have sand bars, shelves that slow the lower (underwater) portion of the wave while the top of the wave continues at speed, resulting in a breaking wave. Beach breaks themselves tend to be highly variable, with waves breaking in different areas along the beach depending on the shifting of the sand bars. I am choosing to ignore this variability because I am not looking at specific breaks along the shore; rather, I am using the data buoys to indicate general conditions that produce waves of a significant height, knowing that these general conditions will play out differently depending on one’s location along the New Jersey shore. Having local knowledge of the areas I frequent most and years of “eyes on data,” I can then translate the general buoy data into more specific information about my favorite spots, but it is nearly impossible to quantify my local knowledge, and I don’t want to give away all of my secrets anyway.
I have chosen to record 10 months’ worth of data at three buoys. These ten months are important because the summer months, when there is traditionally little wave-producing activity, should provide a nice contrast to the rest of the year, when waves are generated by the late summer / early fall hurricane season and the winter / early spring Northeaster storms.¹
¹ The official start to the hurricane season is June 1; however, past experience has demonstrated that tropical activity generating enough power to create a swell in the area we are looking at does not show itself until August through October.
The data I have gathered is from three NOAA buoy stations. I have chosen these stations because two of them will capture the storm activity out at sea that produces swell for the New Jersey shore, and the third will give an indication of local wave height and swell period.
1. Station 44011 – Georges Bank; to capture the northeastern storm patterns in the winter months
2. Station 41001 – Cape Hatteras; to capture the hurricane patterns in the fall
3. Station 44009 – Delaware Bay; to capture the local wave height and weather conditions for the New Jersey shore affecting the surfing conditions
From these buoys I have collected six variables: wind direction, wind speed, wave height, dominant wave period, barometric pressure, and the time at which these readings were recorded. Wind speed, wind direction, and barometric pressure are all contributing factors in the size and duration of swell events. Wave height and dominant wave period are the resulting readings from the weather patterns that generate a swell event. The time variable will allow me to identify the length of time that a swell event impacts the local beaches. Before beginning my multiple regression analysis, I evaluated my set of predictors and decided to focus on the wave height at the Delaware Bay buoy (44009) as the target variable, and on the wave height, dominant wave period, and barometric pressure at the Georges Bank buoy (44011) and the Hatteras buoy (41001) as my predictor variables. I have chosen to dismiss wind direction and wind speed at the offshore buoys because they are largely captured in the barometric pressure readings: wind is likely a result of the same changes in weather patterns that the pressure reflects. Additionally, the wave heights at the offshore buoys are directly affected by the wind speed. Because the wind information is embedded in the wave heights and barometric readings, excluding the wind data results in a simpler and cleaner multiple regression.
The data I have used is from January through October 2004. NOAA historical data is problematic because it is voluminous and not well presented. NOAA presents its historical buoy data as readings for every hour of the day, resulting in a data sheet 17 columns wide and 8,760 rows long. Therefore, it was necessary for me to cull the data from a text file in order to have a manageable data set. I decided to take the buoy data twice a day, at 6:00 and 18:00. A twelve-hour gap between readings should be a good balance between keeping the data manageable and not missing any significant swell train, since in my experience swell patterns tend to last longer than 12 hours, barring rare circumstances (offshore gusts of 30+ mph) that would register as outliers.
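The culling step can be sketched as a filter over the hourly text file that keeps only the 6:00 and 18:00 rows. This is a minimal sketch, not the procedure actually used for this project. It assumes a whitespace-delimited file whose first line is a header containing `YYYY`, `MM`, `DD`, `hh` and measurement columns such as `WVHT`, `DPD` and `BAR` (check these names against the header of the file you download, since NOAA’s column names have changed over the years), and it assumes missing readings are coded with sentinel values such as 99.0 or 9999.0. The function name `cull_twice_daily` and the sample rows are hypothetical.

```python
KEEP_HOURS = {6, 18}          # the two daily readings described in the text
MISSING = {99.0, 9999.0}      # assumed sentinel values for missing readings


def cull_twice_daily(text, fields=("WVHT", "DPD", "BAR")):
    """Return ((year, month, day, hour), {field: value}) for each kept row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split()
    col = {name: i for i, name in enumerate(header)}
    kept = []
    for line in lines[1:]:
        parts = line.split()
        if parts[0].startswith("#"):
            continue                          # skip a units row, if present
        if int(parts[col["hh"]]) not in KEEP_HOURS:
            continue                          # keep only 6:00 and 18:00
        values = {name: float(parts[col[name]]) for name in fields}
        if any(v in MISSING for v in values.values()):
            continue                          # drop readings with missing data
        stamp = tuple(int(parts[col[k]]) for k in ("YYYY", "MM", "DD", "hh"))
        kept.append((stamp, values))
    return kept


# Hypothetical rows, trimmed to a few columns for illustration.
sample = """YYYY MM DD hh WVHT DPD BAR
2004 01 01 05 1.20 8.00 1015.2
2004 01 01 06 1.30 8.00 1015.0
2004 01 01 07 1.30 9.00 1014.8
2004 01 01 18 99.00 99.00 1012.1
2004 01 02 06 2.10 10.00 1008.4
"""
for stamp, values in cull_twice_daily(sample):
    print(stamp, values)
```

In the sample, the 18:00 row is dropped because its wave height and period carry the 99.00 missing-value code. NDBC documents WVHT in meters, so a conversion may be needed if heights are wanted in feet.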
I have chosen 2004 as my year to study because I can compare the actual data with journal entries of swell events that I kept throughout the year. Cross-referencing my observations with the statistical analysis should provide a better learning tool for me in how to read the buoy data. While the journal entries will not be recorded in this project, they will serve as a guide and yardstick to remember swell events and to help identify the buoy characteristics that lead to conditions suited to surfing.
Wave Height Prediction at Delaware Buoy
Can the wave height for onshore buoys be predicted by using data from offshore buoys?
First I want to get an idea of how my target data is distributed, using histograms of the raw and logged wave heights at the Delaware Bay buoy.
Looking at the histograms, the first chart shows that the wave heights are not normally distributed; they are long right tailed. This makes sense because wave heights cannot be less than zero, and while the majority of points are within the one-foot to five-foot range, it is not all that abnormal to have wave heights exceeding this range, and on rare occasions wave heights will exceed ten feet. Taking the log10 of the wave heights makes the data more normally distributed, an indication that we will likely want to continue with the logged values.
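The effect of the log10 transform on a long right tail can be checked numerically with a moment-based skewness statistic, which is zero for a symmetric sample and positive for a right tail. The wave heights below are hypothetical, chosen only to mimic a long right tail; they are not the Delaware Bay data, and the helper name is illustrative.

```python
import math


def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


heights_ft = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 4.0, 5.0, 10.0]  # hypothetical
logged = [math.log10(h) for h in heights_ft]
print(round(skewness(heights_ft), 2), round(skewness(logged), 2))
```

With a long right tail like this one, the logged values show a much smaller skewness than the raw heights.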
Scatter Plots
As the first step of my multiple regression, I want to look at the scatter plots of my target variable versus each of my predictor variables.
The scatter plots of the target variable against the predictor variables confirm that the data is in fact long right tailed, hence the spray (fan-shaped spread) of points. Because of the long right tailed nature of the data, it makes sense to use the logged data in order to get a relationship that is more linear and has better variance properties.
Looking at the scatter plots of the logged data, taking the logged values gives a more normally distributed data set, which should lead to a more accurate regression model, so these values will be used going forward. The scatter plots indicate that the wave heights at the offshore buoys have the strongest relationship with the wave height at the onshore buoy. Barometric pressure seems to have some relationship, but not nearly as strong as the wave height relationships. The dominant period relationship appears to be the weakest in the data. This intuitively makes sense. Wave height to wave height relationships are the most direct, and the effects of very large waves offshore are often visible at the onshore buoys and even at the beach. This is an apples to apples comparison.
A note on the relationship between barometric pressure and wave height: a lower barometric pressure indicates more unstable, or stormy, weather. Waves are most often created by winds from storms, so a negative relationship is expected. The lower the barometric pressure, the more volatile the weather will be, increasing the chance of wind. Lower barometric pressures will therefore indicate, at least to some degree, storm activity and subsequently the possibility of wave-generating conditions. It is particularly interesting to see that the relationship between barometric pressure and onshore wave height is stronger in the southern region (Hatteras) than in the northern region (Georges Bank). A possible explanation is the intense and concentrated low pressure events, and the resulting wave activity, from hurricanes. The Hatteras buoy sits directly in the path of some storms and tends to be at least close to the tracks of most hurricanes that produce swell events on the eastern seaboard. The proximity to such intense low pressure systems could magnify the relationship between barometric pressure and wave height.
The relationship between dominant period and wave height seems to be the weakest and appears to have the most variability. This makes sense because, while longer periods tend to indicate more organized wave events, the fetch we are looking at in the Atlantic tends to be relatively small compared to the Pacific Ocean, where storms are generated much farther offshore, resulting in more organized swell events with a more consistent and longer dominant wave period.
Regression Analysis: LOGD_Bay-WVH versus LOGHatteras-, LOGHatteras-, ...
The regression equation is
LOGD_Bay-WVHT = 13.6 + 0.411 LOGHatteras-WVHT + 0.289 LOGHatteras-DPD - 24.5 LOGHatteras-BARO + 0.189 LOGG_Bank-WVHT - 0.186 LOGG_Bank-DPD + 20.0 LOGG_Bank-BARO
Predictor           Coef      SE Coef  T      P
Constant            13.568    8.027    1.69   0.091
LOGHatteras-WVHT    0.41106   0.04306  9.55   0.000
LOGHatteras-DPD     0.28928   0.06721  4.30   0.000
LOGHatteras-BARO    -24.529   3.515    -6.98  0.000
LOGG_Bank-WVHT      0.18854   0.03559  5.30   0.000
LOGG_Bank-DPD       -0.18627  0.06363  -2.93  0.004
LOGG_Bank-BARO      20.024    2.552    7.85   0.000
S = 0.149992   R-Sq = 47.2%   R-Sq(adj) = 46.7%
Analysis of Variance
Source          DF   SS       MS      F      P
Regression      6    11.6986  1.9498  86.67  0.000
Residual Error  581  13.0711  0.0225
Total           587  24.7698
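The summary line can be reproduced from the Analysis of Variance table: S is the square root of the residual mean square, R-Sq is the regression sum of squares divided by the total, and R-Sq(adj) compares the residual and total mean squares. The function below is a hypothetical helper, written only to check that these numbers hang together; because it starts from the printed (rounded) sums of squares, its results agree with Minitab’s to rounding.

```python
import math


def summary_from_anova(ss_reg, ss_res, df_reg, df_res):
    """Rebuild S, R-Sq, R-Sq(adj) and F from an ANOVA table's sums of squares."""
    ss_total = ss_reg + ss_res
    df_total = df_reg + df_res
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res                      # residual mean square
    return {
        "S": math.sqrt(ms_res),                   # standard error of the estimate
        "R-Sq": ss_reg / ss_total,
        "R-Sq(adj)": 1 - ms_res / (ss_total / df_total),
        "F": ms_reg / ms_res,
    }


full_year = summary_from_anova(ss_reg=11.6986, ss_res=13.0711, df_reg=6, df_res=581)
for name, value in full_year.items():
    print(f"{name}: {value:.4f}")
```

The same helper applies to the seasonal regressions, using their own ANOVA rows.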
The standard error of the estimate, S = 0.15, is measured in log10 units, so roughly 95% of actual logged wave heights should fall within ±2S ≈ ±0.3 of the prediction. Back on the original scale, this is a multiplicative band of $10^{\pm 0.3}$: the actual wave height should lie between the predicted height divided by about 2 and the predicted height multiplied by about 2 ($10^{0.3} \approx 1.995$). Therefore, for a predicted wave of 3 feet, I can expect an actual wave height between about 1.5 and 6 feet roughly 95% of the time. The model as a whole is highly significant (F = 86.67, P = 0.000), and the offshore wave heights, dominant periods, and barometric pressures are doing a relatively good job of predicting the nearshore wave heights at the Delaware Bay buoy.
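The conversion from a log-scale standard error to a multiplicative band in feet can be sketched as follows. It follows the rule of thumb used in the text (±2S on the log10 scale) and ignores the uncertainty in the estimated coefficients, so a formal prediction interval from the regression would be slightly wider. The function name is hypothetical.

```python
def multiplicative_band(predicted_ft, s_log10, k=2.0):
    """Return (low_ft, high_ft, factor) for a +/- k*S band on the log10 scale."""
    factor = 10 ** (k * s_log10)      # back-transform the log-scale half-width
    return predicted_ft / factor, predicted_ft * factor, factor


low, high, factor = multiplicative_band(3.0, s_log10=0.149992)
print(f"factor {factor:.2f}: {low:.2f} ft to {high:.2f} ft")
# factor 2.00: 1.50 ft to 5.99 ft
```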
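Looking at the R-Sq, 47.2% of the variability of the logged wave height at the Delaware Bay buoy is accounted for by the model. Because both sides of the equation are logged, the coefficients act as exponents: for example, an increase of 1 in the logged Hatteras wave height (a tenfold increase in the Hatteras wave height) multiplies the predicted Delaware Bay wave height by approximately $10^{0.411} \approx 2.58$, all else being held constant. On a more everyday scale, doubling the Hatteras wave height multiplies the predicted Delaware Bay wave height by about $2^{0.411} \approx 1.33$.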
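Because the model is log10 on both sides, a coefficient works as an exponent on the ratio by which a predictor changes. A minimal sketch using the Hatteras wave height coefficient from the full-year output; the helper name is hypothetical.

```python
def multiplicative_effect(coef, ratio):
    """Factor applied to the predicted target when a predictor is multiplied by
    `ratio`, all else held constant, in a log10-log10 regression."""
    return ratio ** coef


HATTERAS_WVHT = 0.41106   # LOGHatteras-WVHT coefficient, full-year model

print(round(multiplicative_effect(HATTERAS_WVHT, 10), 2))  # tenfold: 2.58
print(round(multiplicative_effect(HATTERAS_WVHT, 2), 2))   # doubling: 1.33
```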
The only variable that looks suspect is the logged Georges Bank dominant period, with a P value of .004, the largest among the predictors, although still below the conventional .05 cutoff. All of the other P values (.000) indicate that the predictors are strong in telling me what the wave height will be at the Delaware Bay buoy.
An unexpected result is that the two barometric coefficients have opposite signs: -24.5 at Hatteras, consistent with the expected negative relationship, but +20.0 at Georges Bank. This could be explained by the full-year readings, in which high pressure systems dominate the Georges Bank area for a great deal of the year due to the convergence of the jet stream air flow and the Gulf Stream water flow; this combination leads to little storm activity from May through October.
Descriptive Statistics: LOGD_Bay-WVHT
Variable        N    N*  Mean     SE Mean  StDev    Minimum  Q1       Median   Q3       Maximum
LOGD_Bay-WVHT   588  0   0.55414  0.00847  0.20542  0.09577  0.40808  0.52031  0.70068  1.12344
The logged wave height at the Delaware Bay buoy covers a range of 1.028 (0.096 to 1.123), and the middle 50% of the data covers a range of only .293 (0.408 to 0.701). A ±0.3 prediction band spans 0.6 log units, about twice the interquartile range, so predicting within a factor of about 2 is only somewhat useful for predicting wave heights with the defined predictors. The ability to predict waves that are adequate to surf is not refined enough if I define a surfable wave as 3 feet or higher. Any predicted wave of 6 feet or higher indicates that 95% of the time I will have waves that I can surf, and a predicted wave below 1.5 feet indicates that I will most likely not. However, for any predicted wave between 1.5 and 6 feet, I cannot be sure that I will be able to ride the waves.
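The decision rule above can be written as a three-way classification: a prediction counts as surfable only if the low end of its band clears 3 ft, and as too small only if the high end stays under it. A sketch, assuming the full-year S and the same ±2S band; the function name and the labels are hypothetical.

```python
SURFABLE_FT = 3.0          # surfable threshold from the text
S_LOG10 = 0.149992         # standard error of the estimate, full-year model


def surf_call(predicted_ft, s_log10=S_LOG10, threshold=SURFABLE_FT):
    """Classify a predicted wave height using a +/- 2S band on the log10 scale."""
    factor = 10 ** (2 * s_log10)
    low, high = predicted_ft / factor, predicted_ft * factor
    if low >= threshold:
        return "surfable"      # even the low end of the band is 3 ft or more
    if high < threshold:
        return "too small"     # even the high end of the band is under 3 ft
    return "uncertain"


for predicted in (1.0, 2.5, 4.0, 6.5):
    print(predicted, surf_call(predicted))
```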
Residual Plots: Is this the right model?
The residual plots for the model look reasonably good, which suggests that the model form is appropriate. What the residuals do show is evidence of many unusual observations, which contribute to the inability of the model to predict wave heights in a range narrow enough to determine whether the waves are fit for surfing.
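A rough way to count unusual observations is to flag residuals more than two standard errors from zero. This sketch divides each residual by S only; Minitab’s standardized residuals also adjust for each observation’s leverage, so its flags can differ slightly. The residual values and the function name are hypothetical.

```python
def flag_unusual(residuals, s, cutoff=2.0):
    """Return indices of residuals more than `cutoff` standard errors from zero."""
    return [i for i, r in enumerate(residuals) if abs(r) / s > cutoff]


resid = [0.05, -0.12, 0.41, -0.02, -0.33, 0.18]   # hypothetical log10 residuals
print(flag_unusual(resid, 0.149992))   # [2, 4]
```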
Further analysis may be useful if I break out the predictors and perform separate multiple regressions for the Hatteras buoy vs. the Delaware Bay buoy and the Georges Bank buoy vs. the Delaware Bay buoy. The wide net I have cast may be affecting the results of the model: I have taken two offshore buoys and tried to determine their effect on the wave heights at one onshore buoy over most of a year. In order to refine the model, it will be useful to focus my data and regressions. If I focus on the Georges Bank buoy and its effects on the onshore wave heights for a better defined period, say January through April, and on the Hatteras buoy’s effects on the onshore wave heights from August through October, the model may reveal greater predictive ability. This reflects something that intuitively makes sense: it is necessary to take into account the areas and times where each offshore buoy is most affected and determine that relationship to the resulting onshore wave heights.
Separate Models for Each Predictor Buoy
To see if I can get more accurate models, I have broken out the regression for the two buoys, using the same predictors and tailoring the data to reflect only those time periods when the buoys are most affected by consistent storm activity. For the Hatteras buoy, I have focused the data on the months August through October (A-O), and for the Georges Bank buoy I have focused on the months January through April (J-AP).
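The seasonal subsets can be sketched as a month filter over timestamped readings. The window names mirror the A-O and J-AP labels; the record layout (a datetime followed by the readings) and the function name are assumptions for illustration.

```python
from datetime import datetime

# Month windows used for the separate models (inclusive).
SEASONS = {
    "A-O": range(8, 11),    # August through October, Hatteras model
    "J-AP": range(1, 5),    # January through April, Georges Bank model
}


def seasonal_subset(records, season):
    """Keep (timestamp, readings) records whose month falls in the season."""
    months = SEASONS[season]
    return [record for record in records if record[0].month in months]


# Hypothetical records, for illustration only.
records = [
    (datetime(2004, 1, 15, 6), {"WVHT": 1.8}),
    (datetime(2004, 5, 15, 6), {"WVHT": 0.9}),
    (datetime(2004, 9, 15, 18), {"WVHT": 2.6}),
]
print(len(seasonal_subset(records, "A-O")), len(seasonal_subset(records, "J-AP")))
```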
Regression Analysis: A-O:LOGD_Bay versus A-O:LOGHatte, A-O:LOGHatte, ...
The regression equation is
A-O:LOGD_Bay-WVHT = 42.7 + 0.444 A-O:LOGHatteras-WVHT + 0.190 A-O:LOGHatteras-DPD - 14.2 A-O:LOGHatteras-BARO
Predictor              Coef     SE Coef  T      P
Constant               42.69    18.17    2.35   0.020
A-O:LOGHatteras-WVHT   0.44358  0.07037  6.30   0.000
A-O:LOGHatteras-DPD    0.1898   0.1020   1.86   0.064
A-O:LOGHatteras-BARO   -14.171  6.042    -2.35  0.020
S = 0.144582   R-Sq = 41.1%   R-Sq(adj) = 40.1%
Analysis of Variance
Source                 DF   SS       MS       F      P
Regression             3    2.62202  0.87401  41.81  0.000
Residual Error         180  3.76274  0.02090
Total                  183  6.38476
Source                 DF   Seq SS
A-O:LOGHatteras-WVHT   1    2.46739
A-O:LOGHatteras-DPD    1    0.03965
A-O:LOGHatteras-BARO   1    0.11498
Regression Analysis: J-AP:LOGD_Ba versus J-AP:LOGG_Ba, J-AP:LOGG_Ba, ...
The regression equation is
J-AP:LOGD_Bay-WVHT = 10.9 + 0.318 J-AP:LOGG_Bank-WVHT - 0.256 J-AP:LOGG_Bank-DPD - 3.43 J-AP:LOGG_Bank-BARO
Predictor             Coef     SE Coef  T      P
Constant              10.88    10.28    1.06   0.291
J-AP:LOGG_Bank-WVHT   0.31795  0.08509  3.74   0.000
J-AP:LOGG_Bank-DPD    -0.2558  0.1734   -1.48  0.142
J-AP:LOGG_Bank-BARO   -3.433   3.410    -1.01  0.315
S = 0.220169   R-Sq = 9.6%   R-Sq(adj) = 8.4%
Analysis of Variance
Source                DF   SS        MS       F     P
Regression            3    1.10939   0.36980  7.63  0.000
Residual Error        215  10.42201  0.04847
Total                 218  11.53140
Source                DF   Seq SS
J-AP:LOGG_Bank-WVHT   1    0.95365
J-AP:LOGG_Bank-DPD    1    0.10661
J-AP:LOGG_Bank-BARO   1    0.04913
Running the regressions of the individual offshore buoys against the Delaware Bay buoy for the months when storms affect each offshore buoy (January through April for Georges Bank and August through October for Hatteras) reveals something very interesting. The new models have P values indicating that wave height is the best predictor of the onshore wave heights at the Delaware Bay buoy (P = .000 in both models, while every other predictor has a P value of .020 or higher). Additionally, both new models give negative barometric coefficients (-14.2 for Hatteras and -3.43 for Georges Bank), consistent with the expected inverse relationship between barometric readings and onshore wave heights, although the Georges Bank coefficient is not significant (P = .315).
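The P values in both seasonal tables come from the T column, which is simply Coef divided by SE Coef. A rough check is sketched below: with roughly 180 to 215 residual degrees of freedom, |T| above about 1.97 corresponds to P below .05 (two-sided). The critical value is an approximation, and the helper name is hypothetical.

```python
CRITICAL_T = 1.97   # approx. two-sided 5% critical value for ~180-215 df


def t_check(coefs, critical=CRITICAL_T):
    """Map each predictor to (T, significant?) where T = Coef / SE Coef."""
    return {name: (coef / se, abs(coef / se) > critical)
            for name, (coef, se) in coefs.items()}


# (Coef, SE Coef) pairs copied from the two seasonal regressions.
hatteras_ao = {"WVHT": (0.44358, 0.07037), "DPD": (0.1898, 0.1020), "BARO": (-14.171, 6.042)}
georges_jap = {"WVHT": (0.31795, 0.08509), "DPD": (-0.2558, 0.1734), "BARO": (-3.433, 3.410)}

for label, model in (("A-O Hatteras", hatteras_ao), ("J-AP Georges Bank", georges_jap)):
    for name, (t, significant) in t_check(model).items():
        print(f"{label} {name}: T = {t:.2f}, significant: {significant}")
```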
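The predictive power of the two separate models differs from the larger model. The Hatteras model comes close to it (R-Sq = 41.1% and S = 0.145, versus 47.2% and 0.150), while the Georges Bank model is much weaker (R-Sq = 9.6%, S = 0.220). Even so, together they provide me with enough information to make an educated guess as to upcoming surf.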
Thus, in the final analysis, barometric pressure seems to be a mild indicator of wave heights: a change in barometric pressure will tell me if there is a storm near the offshore buoys that I should keep an eye on for wave-generating conditions. The most reliable predictor, however, is the actual waves generated by these storms, and those are the indicators that will most reliably tell me if I can expect to have waves for surfing in the next couple of days. The time element is important, as knowing what time of year storms are generated and where they are generated proves to be essential in forming a predictive model.