wave height prediction and the time of significant...

Data Analysis Project, December 13, 2004

Wave Height Prediction and the Time of Significant Swell

I have always had a strong affinity for the ocean and its environment. When I was quite young I was introduced to surfing and have never looked back. Surfing, being a lifestyle and not just a hobby, is ever present on my mind. With the ever-growing demands on my time from the other areas of my life, it is difficult to sit around, board under arm, waiting for the next swell to appear. I am looking to identify a reliable and accurate method to determine when swell events are going to occur and how long they will last, so that I can plan ahead for the waves and keep myself and those around me happy while sating my surfing itch.

Waves are the product of wind and fetch (the distance over which the wind blows). Ideally, waves are generated by a strong wind over a long fetch. This produces what is called a ground swell. A ground swell is a long-period swell that tends to have a greater significant wave height (dependent on the wind speeds generating the swell), and by observation it seems to last longer than a shorter-period swell caused by wind over a smaller fetch. Storms out at sea, in combination with local weather, affect the size, frequency, and length of a swell window (the time during which a dominant wave period shows on a near-shore buoy).

By analyzing the conditions at different buoys, I would like to be able to determine the size of a swell and the length of time that it will be adequate for surfing. Local wave heights will be considered sufficient for surfing if the local buoys register at least 3 ft.

In general, I expect to find that the most significant swell patterns result from hurricanes arriving from the south in the late summer and early fall and from Northeaster storms that dominate the winter and early spring weather pattern. In addition, there are rare events throughout the year when local weather patterns are not at all conducive to wave generation, but a major swell event occurs anyway. These instances may be the most useful in identifying the offshore weather conditions that produce swell, giving me the information needed to predict local swell events before they physically show themselves.

There are other factors besides the weather that affect waves. The bottom contours of the ocean floor, the steepness of the shoreline, and the material (sand, reef, rock) that the ocean floor is made of all affect local wave patterns. I am choosing to ignore these factors in my study because the New Jersey shore is mostly sandy bottom, resulting in what is called a beach break, as opposed to a reef or point break. Beach breaks have sand bars, shelves that slow the lower (underwater) portion of the wave while the top of the wave continues at speed, resulting in a breaking wave. Beach breaks themselves tend to be highly variable, with waves breaking in different areas along the beach depending on the shifting of the sand bars. I am choosing to ignore this variability because I am not looking at specific breaks along the shore; rather, I am using the data buoys to indicate the general conditions that produce waves of significant height, knowing that these general conditions will result in variable conditions depending on one’s location along the New Jersey shore. With local knowledge of the areas I frequent most and years of “eyes on data,” I can then translate the general buoy data into more specific information about my favorite spots, but it is nearly impossible to quantify my local knowledge, and I don’t want to give away all of my secrets anyway.

I have chosen to record 10 months’ worth of data at three buoys. These ten months are important because the summer months, when there is traditionally little wave-producing activity, should provide a nice contrast to the rest of the year, when waves are generated by the late summer / early fall hurricane season and the winter / early spring Northeaster storms.¹

The data I have gathered is from three NOAA buoy stations. I have chosen these stations because two of them capture the storm activity out at sea that produces swell for the New Jersey shore, and the third gives an indication of local wave height and swell period:

1. Station 44011 – Georges Bank; to capture the northeastern storm patterns in the winter months
2. Station 41001 – Cape Hatteras; to capture the hurricane patterns in the fall
3. Station 44009 – Delaware Bay; to capture the local wave height and weather conditions for the New Jersey shore that affect surfing conditions

From these buoys I have collected six variables: wind direction, wind speed, wave height, dominant wave period, barometric pressure, and the time at which these readings were recorded. Wind speed, wind direction, and barometric pressure all contribute to the size and duration of swell events. Wave height and dominant wave period are the resulting readings from the weather patterns that generate a swell event. The time variable will allow me to identify the length of time that a swell event affects the local beaches. Before beginning my multiple regression analysis, I evaluated my set of predictors and decided to focus on the wave height at the Delaware Bay buoy (44009) as the target variable, with the wave height, dominant wave period, and barometric pressure at the Georges Bank buoy (44011) and the Hatteras buoy (41001) as my predictor variables. I have chosen to dismiss wind direction and wind speed at the offshore buoys because their effects are likely captured in the barometric pressure readings, since both result from changes in weather patterns. Additionally, the wave heights at the offshore buoys are directly affected by the wind speed. Because the wind information is largely embedded in the wave heights and barometric readings, excluding it should result in a simpler and cleaner multiple regression.

¹ The official start to the hurricane season is June 1; however, past experience has demonstrated that tropical activity powerful enough to create a swell in the area I am looking at does not show itself until August through October.

The data I have used is from January through October 2004. NOAA historical data is problematic because it is voluminous and not well presented. NOAA presents its historical buoy data as readings for every hour of the day, resulting in a data sheet 17 columns wide and 8,760 rows long. It was therefore necessary for me to cull the data from a text file in order to have a manageable data set. I decided to take the buoy data twice a day, at 6:00 and 18:00. A twelve-hour gap between readings should be a good balance between keeping the data manageable and not missing any significant swell train, since in my experience swell patterns tend to last longer than 12 hours, barring rare circumstances (offshore gusts of 30+ mph) that would register as outliers.
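The culling step can be sketched in Python. This is a minimal sketch, not the procedure actually used for this project: it assumes a whitespace-delimited NDBC historical standard-meteorological text file whose first line is a header naming the columns (the 2004-era layout begins `YYYY MM DD hh` and names the pressure column `BAR`; newer files use `#YY ... PRES`). The function `cull_twice_daily` and the fill-value table are hypothetical; check the column names and missing-data codes against the actual file before relying on it.

```python
from typing import Dict, Iterable, List, Optional

# Assumed NDBC "no data" fill values, looked up per column so that a
# legitimate reading such as a 99-degree wind direction is not discarded.
MISSING: Dict[str, float] = {"WVHT": 99.0, "DPD": 99.0, "BAR": 9999.0}

KEEP_HOURS = {6, 18}  # one reading every twelve hours
M_TO_FT = 3.28084     # NDBC historical files report WVHT in meters


def cull_twice_daily(lines: Iterable[str],
                     columns=("WVHT", "DPD", "BAR")) -> List[Dict[str, Optional[float]]]:
    """Keep only the 06:00 and 18:00 rows of an hourly buoy file.

    The first non-empty line is treated as the header; a leading '#'
    is stripped so both '#YY' and 'YYYY' style headers work.
    """
    it = iter(lines)
    header: List[str] = []
    for line in it:
        if line.strip():
            header = line.lstrip("#").split()
            break
    idx = {name: i for i, name in enumerate(header)}

    rows = []
    for line in it:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and a possible units line
        hour = int(fields[idx["hh"]])
        if hour not in KEEP_HOURS:
            continue
        row: Dict[str, Optional[float]] = {
            "month": int(fields[idx["MM"]]),
            "day": int(fields[idx["DD"]]),
            "hour": hour,
        }
        for name in columns:
            value = float(fields[idx[name]])
            row[name] = None if value == MISSING.get(name) else value
        if row.get("WVHT") is not None:
            row["WVHT"] = row["WVHT"] * M_TO_FT  # meters -> feet
        rows.append(row)
    return rows
```

Because NDBC's historical files report WVHT in meters, the sketch converts it to feet to match the 3 ft surfing threshold used here; drop the conversion if the data are already in feet.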

I have chosen 2004 as my year of study because I can compare the actual data with journal entries of swell events that I kept throughout the year. Cross-referencing my observations with the statistical analysis should provide a better learning tool for reading the buoy data. While the journal entries are not recorded in this project, they serve as a guide and yardstick for remembering swell events and help identify the buoy characteristics that lead to conditions suited to surfing.


Wave Height Prediction at Delaware Buoy

Can the wave height for onshore buoys be predicted using data from offshore buoys? Having reevaluated my set of predictors, I use the wave height at the Delaware Bay buoy (44009) as the target variable and the wave height, dominant wave period, and barometric pressure at the Georges Bank buoy (44011) and the Hatteras buoy (41001) as predictor variables. Wind direction and wind speed at the offshore buoys are excluded because their effects are largely embedded in the offshore wave heights and barometric readings, which keeps the multiple regression simpler and cleaner.

First, I want to get an idea of how my target data are distributed:

Looking at the histograms of the data, the first chart shows that the wave heights are not normally distributed; they have a long right tail. This makes sense because wave heights cannot be less than zero, and while the majority of points are within the one-foot to five-foot range, it is not all that abnormal to have wave heights exceeding this range, and on rare occasions wave heights will exceed ten feet. Taking the log10 of the wave heights makes the data more normally distributed, an indication that we will likely want to continue with the logged values.
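One way to check that the log10 transform reduces the long right tail is to compare the sample skewness before and after. A minimal Python sketch using only the standard library; `skewness` and `log10_heights` are hypothetical helper names, and skewness is only a rough summary, so the histograms remain the main evidence:

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population moment skewness; positive values indicate a long right tail."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def log10_heights(heights_ft):
    """Take log10 of each wave height; heights must be strictly positive."""
    if any(h <= 0 for h in heights_ft):
        raise ValueError("log10 requires positive wave heights")
    return [math.log10(h) for h in heights_ft]
```

For a list of heights `h`, compare `skewness(h)` with `skewness(log10_heights(h))`: a strongly positive value for the raw heights that shrinks toward zero after the transform is consistent with what the histograms show.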


Scatter Plots

As the first step of my multiple regression, I want to look at the scatter plots of my target variable versus each of my predictor variables:

The scatter plots of the target variable against the predictor variables confirm that the data are in fact long right-tailed, hence the spray effect. Because of this long right tail, it now makes sense to use the logged data to obtain a relationship that is more linear, with better variance properties.


Looking at the scatter plots of the logged data:

By taking the logged values we have a more normally distributed data set that should lead to a better-behaved regression model, and these values should be used going forward. The scatter plots indicate that the strongest relationships are between the wave heights at the offshore buoys and the wave height at the onshore buoy. Barometric pressure seems to have some relationship, but not nearly as strong as the wave height relationships. The dominant period relationship appears to be the weakest in the data. This intuitively makes sense: wave height to wave height is the most direct relationship, and the effects of very large waves offshore are often visible at the onshore buoys and even at the beach. This is an apples-to-apples comparison.


A note on the relationship between barometric pressure and wave height: a lower barometric pressure indicates more unstable, or stormy, weather. Waves are most often created by winds from storms, so a negative relationship is expected. The lower the barometric pressure, the more volatile the weather will be, increasing the chance of wind. Lower barometric pressures will therefore indicate, at least to some degree, storm activity and consequently the possibility of wave-generating conditions. It is particularly interesting to see that the relationship between barometric pressure and onshore wave height is stronger in the southern region (Hatteras) than in the northern region (Georges Bank). A possible explanation is the intense, concentrated low-pressure events and resulting wave activity from hurricanes. The Hatteras buoy sits directly in the path of some storms and tends to be at least in the vicinity of the tracks of most hurricanes that produce swell events on the eastern seaboard. Proximity to such intense low-pressure systems could magnify the relationship between barometric pressure and wave height.

The relationship between dominant period and wave height seems to be the weakest and appears to have the most variability. This makes sense because, while longer periods tend to indicate more organized wave events, the fetch we are looking at in the Atlantic tends to be relatively small compared with the Pacific Ocean, where storms are generated much farther offshore, resulting in more organized swell events with a more consistent and longer dominant wave period.

Regression Analysis: LOGD_Bay-WVH versus LOGHatteras-, LOGHatteras-, ...

The regression equation is
LOGD_Bay-WVHT = 13.6 + 0.411 LOGHatteras-WVHT + 0.289 LOGHatteras-DPD - 24.5 LOGHatteras-BARO + 0.189 LOGG_Bank-WVHT - 0.186 LOGG_Bank-DPD + 20.0 LOGG_Bank-BARO

Predictor              Coef   SE Coef      T      P
Constant             13.568     8.027   1.69  0.091
LOGHatteras-WVHT    0.41106   0.04306   9.55  0.000
LOGHatteras-DPD     0.28928   0.06721   4.30  0.000
LOGHatteras-BARO    -24.529     3.515  -6.98  0.000
LOGG_Bank-WVHT      0.18854   0.03559   5.30  0.000
LOGG_Bank-DPD      -0.18627   0.06363  -2.93  0.004
LOGG_Bank-BARO       20.024     2.552   7.85  0.000

S = 0.149992 R-Sq = 47.2% R-Sq(adj) = 46.7%

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        6  11.6986  1.9498  86.67  0.000
Residual Error  581  13.0711  0.0225
Total           587  24.7698
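As a check on the Minitab output, the summary statistics can be recovered from the ANOVA table alone using the standard formulas R-Sq = SS(Regression)/SS(Total), R-Sq(adj) = 1 - MS(Residual)/(SS(Total)/DF(Total)), S = sqrt(MS(Residual)), and F = MS(Regression)/MS(Residual). A short Python sketch (the function name is hypothetical):

```python
import math


def anova_summary(ss_reg, ss_res, df_reg, df_res):
    """Recover R-Sq, R-Sq(adj), S and F from a regression ANOVA table."""
    ss_tot = ss_reg + ss_res
    df_tot = df_reg + df_res
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        "R-Sq": ss_reg / ss_tot,
        "R-Sq(adj)": 1 - ms_res / (ss_tot / df_tot),
        "S": math.sqrt(ms_res),   # residual standard error, log10 units here
        "F": ms_reg / ms_res,
    }


# Sums of squares and degrees of freedom from the full-year ANOVA table.
stats = anova_summary(ss_reg=11.6986, ss_res=13.0711, df_reg=6, df_res=581)
for name, value in stats.items():
    print(f"{name:10s} {value:.4f}")
```

This reproduces R-Sq ≈ 47.2%, R-Sq(adj) ≈ 46.7%, S ≈ 0.150, and F ≈ 86.67. Since S is in log10 units, it is the quantity that sets the width of the multiplicative prediction range.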


The standard error of the estimate, S ≈ 0.15, is in log10 units, so with these logged predictors I can predict within a multiplicative factor of about 10^(±2S) = 10^(±0.3), or roughly 1.99 times either side of a predicted wave height (an approximate band that ignores uncertainty in the fitted coefficients). For a predicted wave of 3 feet, the actual wave height should therefore fall between about 1.5 feet and 5.97 feet roughly 95% of the time. A standard error of this size suggests a meaningful relationship between the target and its predictors: the offshore wave heights, dominant periods, and barometric pressures do a relatively good job of predicting the near-shore wave heights at the Delaware Bay buoy.
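The back-transformation in the paragraph above can be written out explicitly. A minimal Python sketch, assuming the usual ±2S rule of thumb for an approximate 95% band (the function name is hypothetical):

```python
def multiplicative_band(predicted_ft, s_log10, z=2.0):
    """Approximate ~95% range for a prediction from a log10-scale regression.

    On the log10 scale the band is prediction +/- z*S; back-transformed, that
    becomes division/multiplication by 10**(z*S). It ignores uncertainty in
    the fitted coefficients, so the true prediction interval is slightly wider.
    """
    factor = 10 ** (z * s_log10)
    return predicted_ft / factor, predicted_ft * factor, factor


low, high, factor = multiplicative_band(3.0, s_log10=0.149992)
print(f"factor {factor:.3f}: {low:.2f} ft to {high:.2f} ft")
```

With the unrounded factor 10^0.3 ≈ 1.995, a 3 ft prediction gives roughly 1.50 ft to 5.99 ft; rounding the factor to 1.99 gives the 5.97 ft figure.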

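Looking at R-Sq, the model accounts for 47.2% of the variability in the logged wave height at the Delaware Bay buoy. Because both sides of the equation are logged, the coefficients act as exponents: for example, an increase of 1 in the logged Hatteras wave height (that is, a tenfold increase in the Hatteras wave height) multiplies the predicted Delaware Bay wave height by approximately 10^0.41 ≈ 2.57, all else being held constant. Equivalently, doubling the Hatteras wave height multiplies the predicted Delaware Bay wave height by about 2^0.41 ≈ 1.33.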
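Because the model is linear in log10 values, each coefficient acts as an exponent on its predictor. The following Python sketch (function names hypothetical) turns the printed full-year equation into a point prediction in feet and computes the multiplier for a given change in one predictor, all else held constant. The units of the inputs are an assumption: they must match whatever units the buoy data were in when the model was fitted.

```python
import math

# Full-year coefficients as printed in the Minitab output (log10 scale).
COEF = {
    "const": 13.568,
    "hat_wvht": 0.41106, "hat_dpd": 0.28928, "hat_baro": -24.529,
    "gb_wvht": 0.18854, "gb_dpd": -0.18627, "gb_baro": 20.024,
}


def predict_delaware_ft(hat_wvht, hat_dpd, hat_baro, gb_wvht, gb_dpd, gb_baro):
    """Back-transformed point prediction of the Delaware Bay wave height.

    Inputs are raw readings in the units the model was fitted with
    (assumed: feet, seconds, and the buoys' pressure units); each must be
    positive because the model works on log10 values.
    """
    log_pred = (COEF["const"]
                + COEF["hat_wvht"] * math.log10(hat_wvht)
                + COEF["hat_dpd"] * math.log10(hat_dpd)
                + COEF["hat_baro"] * math.log10(hat_baro)
                + COEF["gb_wvht"] * math.log10(gb_wvht)
                + COEF["gb_dpd"] * math.log10(gb_dpd)
                + COEF["gb_baro"] * math.log10(gb_baro))
    return 10 ** log_pred


def height_multiplier(coef, ratio):
    """Scaling one predictor by `ratio` scales the prediction by ratio**coef."""
    return ratio ** coef


print(round(height_multiplier(COEF["hat_wvht"], 10), 2))  # tenfold Hatteras height
print(round(height_multiplier(COEF["hat_wvht"], 2), 2))   # doubled Hatteras height
```

With the unrounded coefficient 0.41106, a tenfold increase in the Hatteras wave height gives a multiplier of about 2.58 and a doubling gives about 1.33. The point prediction is the back-transformed mean of the logged heights, so on the feet scale it behaves roughly like a median rather than a mean.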

The only predictor that looks somewhat suspect is the logged Georges Bank dominant period, with a P value of .004 (still significant at the .05 level) and a negative coefficient, opposite in sign to the Hatteras dominant period. All of the other predictors have P values below .001, indicating that they are strong contributors to predicting the wave height at the Delaware Bay buoy.

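An unexpected result is that the two barometric coefficients have opposite signs (negative for Hatteras, positive for Georges Bank). This could be explained by the full-year readings: high-pressure systems dominate the Georges Bank area for a great deal of the year due to the convergence of the jet stream air flow and the Gulf Stream water flow, and this combination leads to little storm activity from May through October. Another possible contributor is correlation between the two pressure series, which tends to produce large coefficients of opposite sign in a multiple regression.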

Descriptive Statistics: LOGD_Bay-WVHT

Variable         N  N*     Mean  SE Mean    StDev  Minimum       Q1   Median       Q3  Maximum
LOGD_Bay-WVHT  588   0  0.55414  0.00847  0.20542  0.09577  0.40808  0.52031  0.70068  1.12344

The logged wave height at the Delaware Bay buoy covers a range of 1.028, and the middle 50% of the data covers a range of .293. Predicting within a factor of about 2 is therefore somewhat useful for estimating wave heights with the defined predictors. However, if I define a surfable wave as 3 feet or higher, the model is not refined enough to tell me reliably whether the waves will be surfable. Any predicted wave of 6 feet or higher indicates that 95% of the time I will have waves that I can surf. However, for any predicted wave between 1.5 and 6 feet, the range spans the 3-foot threshold, so I cannot be sure that I will be able to ride the waves.
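The surfability rule above can be sketched as a small classifier, together with a back-transform of the descriptive statistics to feet. A minimal Python sketch assuming the approximate ±2S band and the 3 ft threshold (names hypothetical):

```python
# Logged Delaware Bay heights, from the descriptive statistics table.
LOG_STATS = {"Minimum": 0.09577, "Q1": 0.40808, "Median": 0.52031,
             "Q3": 0.70068, "Maximum": 1.12344}


def to_feet(log10_value):
    """Undo the log10 transform."""
    return 10 ** log10_value


def surf_call(predicted_ft, s_log10=0.149992, threshold_ft=3.0, z=2.0):
    """Classify a prediction against the surfing threshold.

    The approximate ~95% band is predicted_ft divided or multiplied by
    10**(z*S).
    """
    factor = 10 ** (z * s_log10)
    if predicted_ft / factor >= threshold_ft:
        return "surfable"   # even the low end of the band clears the threshold
    if predicted_ft * factor < threshold_ft:
        return "flat"       # even the high end stays below the threshold
    return "uncertain"      # the band straddles the threshold


for name, value in LOG_STATS.items():
    print(f"{name:8s} {to_feet(value):5.2f} ft")
for guess in (1.4, 3.0, 5.9, 6.0):
    print(guess, surf_call(guess))
```

Back-transformed, the middle 50% of observed heights runs from about 2.6 ft to 5.0 ft, entirely inside the uncertain range of roughly 1.5 ft to 6 ft, which is another way of seeing why the full-year model cannot separate surfable from unsurfable days.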


Residual Plots: Is this the right model?


The residual plots for the model look reasonably good, which suggests that the model form is appropriate. What the residuals do reveal is evidence of many unusual observations that contribute to the model’s inability to predict wave heights within a range narrow enough to determine whether the waves are fit for surfing.

Further analysis may be useful if I break out the predictors and run separate multiple regressions for the Hatteras buoy vs. the Delaware buoy and the Georges Bank buoy vs. the Delaware buoy. The wide net I have cast may be affecting the results of the model: I have taken two offshore buoys and tried to determine their effect on one onshore buoy’s wave heights over a whole year. To refine the model, it will be useful to focus my data and regressions. If I focus on the Georges Bank buoy and its effects on the onshore wave heights for a better-defined period, say January through April, and on the Hatteras buoy’s effects on the onshore wave heights from August through October, the model may reveal greater predictive ability. This reflects something that intuitively makes sense: it is necessary to take into account the areas and times in which each offshore buoy is most affected and to determine that relationship to the resulting onshore wave heights.


Separate Models for Each Predictor Buoy

To see if I can get more accurate models, I have broken out the regression for the two buoys, using the same predictors and tailoring the data to reflect only those time periods when the buoys are most affected by consistent storm activity. For the Hatteras buoy, I have focused the data on the months August through October (A-O), and for the Georges Bank buoy I have focused on the months January through April (J-AP).
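The month filter and refit can be sketched with NumPy. This is a hypothetical re-creation for readers without Minitab, not the procedure used here: it assumes the buoy readings have been merged into one record per 12-hour reading (a dict with a `month` key and one key per variable), and `fit_log10_ols`, `A_O`, and `J_AP` are illustrative names. It fits ordinary least squares on log10 values with `numpy.linalg.lstsq` and reports S, the residual standard error in log10 units.

```python
import numpy as np

A_O = {8, 9, 10}      # August-October window (Hatteras model)
J_AP = {1, 2, 3, 4}   # January-April window (Georges Bank model)


def fit_log10_ols(rows, target, predictors, months):
    """Fit log10(target) ~ const + sum_k b_k * log10(predictor_k).

    Only rows whose month is in `months` and whose variables are all
    present (not None) and positive are used. Returns (coefficients, S),
    with coefficients ordered [const, *predictors].
    """
    names = [target, *predictors]
    usable = [r for r in rows
              if r["month"] in months
              and all(r.get(n) is not None and r[n] > 0 for n in names)]
    y = np.log10([r[target] for r in usable])
    X = np.column_stack([np.ones(len(usable))] +
                        [np.log10([r[p] for r in usable]) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]          # n minus number of fitted terms
    s = float(np.sqrt(resid @ resid / dof))
    return coef, s
```

A call such as `fit_log10_ols(rows, "bay", ["hat_wvht", "hat_dpd", "hat_baro"], A_O)` would mirror the A-O model, given records keyed that way (the key names are again hypothetical).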

Regression Analysis: A-O:LOGD_Bay versus A-O:LOGHatte, A-O:LOGHatte, ...

The regression equation is
A-O:LOGD_Bay-WVHT = 42.7 + 0.444 A-O:LOGHatteras-WVHT + 0.190 A-O:LOGHatteras-DPD - 14.2 A-O:LOGHatteras-BARO

Predictor                  Coef  SE Coef      T      P
Constant                  42.69    18.17   2.35  0.020
A-O:LOGHatteras-WVHT    0.44358  0.07037   6.30  0.000
A-O:LOGHatteras-DPD      0.1898   0.1020   1.86  0.064
A-O:LOGHatteras-BARO    -14.171    6.042  -2.35  0.020

S = 0.144582 R-Sq = 41.1% R-Sq(adj) = 40.1%



Source                 DF   Seq SS
A-O:LOGHatteras-WVHT    1  2.46739
A-O:LOGHatteras-DPD     1  0.03965
A-O:LOGHatteras-BARO    1  0.11498


Regression Analysis: J-AP:LOGD_Ba versus J-AP:LOGG_Ba, J-AP:LOGG_Ba, ...

The regression equation is
J-AP:LOGD_Bay-WVHT = 10.9 + 0.318 J-AP:LOGG_Bank-WVHT - 0.256 J-AP:LOGG_Bank-DPD - 3.43 J-AP:LOGG_Bank-BARO

Predictor                 Coef  SE Coef      T      P
Constant                 10.88    10.28   1.06  0.291
J-AP:LOGG_Bank-WVHT    0.31795  0.08509   3.74  0.000
J-AP:LOGG_Bank-DPD     -0.2558   0.1734  -1.48  0.142
J-AP:LOGG_Bank-BARO     -3.433    3.410  -1.01  0.315

S = 0.220169 R-Sq = 9.6% R-Sq(adj) = 8.4%



Source                DF   Seq SS
J-AP:LOGG_Bank-WVHT    1  0.95365
J-AP:LOGG_Bank-DPD     1  0.10661
J-AP:LOGG_Bank-BARO    1  0.04913
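The sequential sums of squares show how much of each model's explained variation is picked up by each term as it enters. A small Python sketch (helper name hypothetical) converts them to shares; note that sequential SS depend on the order in which the terms were entered, here wave height first:

```python
def seq_ss_shares(seq_ss):
    """Fraction of the regression sum of squares attributed to each term.

    Sequential (Type I) SS depend on entry order, so these shares describe
    this particular ordering only.
    """
    total = sum(seq_ss.values())
    return {name: ss / total for name, ss in seq_ss.items()}


# Sequential SS as printed for the two seasonal models.
hatteras = seq_ss_shares({"WVHT": 2.46739, "DPD": 0.03965, "BARO": 0.11498})
georges = seq_ss_shares({"WVHT": 0.95365, "DPD": 0.10661, "BARO": 0.04913})

for label, shares in (("Hatteras A-O", hatteras), ("Georges Bank J-AP", georges)):
    print(label, {k: round(v, 3) for k, v in shares.items()})
```

Entered first, wave height accounts for about 94% of the explained sum of squares in the Hatteras model and about 86% in the Georges Bank model.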

Looking at the regressions of the individual offshore buoys versus the Delaware Bay buoy for the months when storms affect each offshore buoy (January through April for Georges Bank and August through October for Hatteras) reveals something very interesting. In both new models the P values indicate that wave height is the best predictor of the onshore wave heights at the Delaware buoy (P = 0.000), while the dominant period is not significant at the .05 level in either model. Additionally, both new models show the expected inverse relationship between barometric readings and onshore wave heights, although for Georges Bank the barometric coefficient is not statistically significant (P = .315).

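The predictive power of the separate models differs from that of the larger model: the Hatteras model is close to it (S = 0.145 and R-Sq = 41.1%, versus S = 0.150 and R-Sq = 47.2%), while the Georges Bank model is much weaker (S = 0.220 and R-Sq = 9.6%). Still, they provide me with enough information to make an educated guess about upcoming surf.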
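To compare the three models on the same footing, each S can be turned into the approximate ~95% multiplicative band used for the full-year model, 10^(2S). A short Python sketch using the S values printed in the output (the `band_factor` name is hypothetical; the band ignores coefficient uncertainty):

```python
def band_factor(s_log10, z=2.0):
    """Factor by which the approximate band extends above (and below) a prediction."""
    return 10 ** (z * s_log10)


# Residual standard errors (log10 units) as printed for each model.
MODELS = {
    "Full year, both buoys": 0.149992,
    "Hatteras, Aug-Oct": 0.144582,
    "Georges Bank, Jan-Apr": 0.220169,
}

for name, s in MODELS.items():
    print(f"{name:22s} factor {band_factor(s):.2f}")
```

This gives factors of roughly 2.00, 1.95, and 2.76: the Hatteras band is slightly narrower than the full-year band, while the Georges Bank band is considerably wider.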

Thus, in the final analysis, barometric pressure seems to be a mild indicator of wave heights: a change in barometric pressure tells me whether there is a storm near the offshore buoys that I should keep an eye on for wave-generating conditions. The most reliable predictor, however, is the actual wave height generated by these storms, and those are the indicators that will most reliably tell me whether I can expect waves for surfing in the next couple of days. The time element is also important, since knowing what time of year storms are generated, and where, proves essential in forming a predictive model.
