appendix f - economic analysis methodology

Upload: morris-county-nj

Post on 21-Jul-2016

DESCRIPTION

Final Draft - Preservation Trust Fund Analysis & Strategy Report

Appendix F Economic Analysis Methodology

Morris County Preservation Trust Fund Analysis and Strategy Report

DESCRIPTION OF DATA SET (Chapter V - Part B of the Main Report)

Land Square (Sq.) Miles – is sourced from USA.com, which uses 2010 US Census data, and is defined as the total land area in square miles per zip code. This data has been converted into land acres per zip code through multiplication. Since 1 square mile equals 640 acres, land square miles is multiplied by 640 to arrive at land acres per zip code.

Water Sq. Miles – is sourced from USA.com, which uses 2010 US Census data, and is defined as the total water area in square miles per zip code. This data has been converted using the same multiplication methodology as stated for Land above. Water square miles is multiplied by 640 to arrive at water acres per zip code.

Total Acres – is calculated by adding land and water acres per zip code, and is defined as the total area, in acres, of a given zip code. Acres are used rather than square miles because the GIS data provided by the University of Kentucky was presented in acres.
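The conversion described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the sample figures are the land and water square miles listed for zip code 07946 in the final data set.

```python
ACRES_PER_SQ_MILE = 640  # 1 square mile equals 640 acres

def sq_miles_to_acres(sq_miles):
    """Convert an area in square miles to acres."""
    return sq_miles * ACRES_PER_SQ_MILE

def total_acres(land_sq_miles, water_sq_miles):
    """Total acres for a zip code: land acres plus water acres."""
    return sq_miles_to_acres(land_sq_miles) + sq_miles_to_acres(water_sq_miles)

# Zip code 07946: 4.16 land sq. miles and 0.09 water sq. miles
land_acres_07946 = sq_miles_to_acres(4.16)
water_acres_07946 = sq_miles_to_acres(0.09)
total_acres_07946 = total_acres(4.16, 0.09)
```

Rounded to whole acres, these values agree with the 2,662 land acres, 58 water acres, and 2,720 total acres shown for that zip code.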

Preserved Acres – is taken from University of Kentucky GIS Data and is defined as the total number of preserved acres for a given zip code. Preserved acres include any historic, farmland, open space, and flood mitigation properties funded through the Preservation Trust Fund, Green Acres, State, Government, or any other type of preservation program.

Population – is sourced from USA.com and is defined as the number of people who inhabit a given zip code.

Driving Distance to NYC – is sourced from zip-codes.com, which uses 2010 US Census data, and is defined as the driving distance from a given Morris County zip code to Midtown Manhattan, New York. The zip code used for Manhattan is 10018, which is the closest access to Midtown Manhattan in driving proximity and accessibility to Morris County.

Median Household Income – is sourced from USA.com, which uses 2010 US Census data, and is defined as the household income (often a combination of multiple earners) that divides the income distribution of a given zip code into two equal groups: half of households earn more and half earn less.

Current P/SQFT – is sourced from Trulia.com, a home sales web database, and is defined as the average sales price per square foot of recently sold homes for a given zip code. Recently sold homes are those sold from January through March 2014.

DATA STEPS FOR REGRESSION ANALYSIS (Chapter V - Part B of the Main Report)

Identifying a dependent variable

The dependent variable is the quantity that a regression measures as a function of the independent variables; it is what the independent variables explain. The regression in this analysis identifies price per square foot (P/SQFT) as the dependent variable. By creating a proper function to define P/SQFT, we can begin to identify the likelihood and magnitude of the effect that each independent variable has on property values in Morris County.

Identifying and hypothesizing for independent variables

The study “What Drives Housing Prices” by James A. Kahn (2008) controls for several factors, including land, housing services, demographic factors, and other services, to explain what drives housing prices. In accordance with this methodology, it is important to control for these aspects as they relate to Morris County.

Creating a hypothesis before performing the regression allows for a more honest interpretation of regression results. A null hypothesis (H0) and an alternative hypothesis (H1) are specified for each independent variable before running the regression. The null hypothesis is assumed to be true until the regression is performed. Once the regression is performed, results are used to determine if the null hypothesis can be rejected. The hypotheses for each variable are stated below:

Preserved Ratio:

H0: A greater amount of preserved land in an area does not increase P/SQFT.

H1: A greater amount of preserved land in an area increases P/SQFT.

Population Density:

H0: A greater number of people inhabiting an area does not increase P/SQFT.

H1: A greater number of people inhabiting an area increases P/SQFT.

Driving Distance to NYC:

H0: Being closer to NYC does not increase a home's P/SQFT.

H1: Being closer to NYC increases a home's P/SQFT.

Median Household Income:

H0: A greater median household income does not increase P/SQFT.

H1: A greater median household income increases P/SQFT.

The variables, their descriptions, their hypothesized impacts on P/SQFT, and the rationale for each hypothesis are summarized in the table below.

Description of Variables and Hypothetical Impacts (Chapter V - Part B of the Main Report)

Note: * signifies dependent variable

Determine sample grouping

While this report identifies 56 zip codes in Morris County, data from only 32 of the zip codes are in the final sampling used for the regression. Zip codes that were not used were omitted for the following reasons:

Overlap into areas outside of Morris County – the following 7 zip codes were discarded because of their overlap into areas outside of Morris County: 07435, 07920, 07438, 07931, 07830, 07840, and 07865. Overlap into other areas outside of Morris County could lead to inconsistencies.

“Preserved Ratio” was less than 10 or greater than 60 – the following 6 zip codes were discarded because their preserved ratio fell outside the range stated above: 07933, 07440, 07058, 07927, 07034, and 07935. A preserved ratio that is too small or too large may overstate or understate the true impact of preserved land on P/SQFT in a given zip code.

“Driving Distance to NYC” exceeded 50 miles – one zip code, 07853, was 56.4 miles away from NYC and as such was considered an outlier.

The remaining 10 zip codes omitted had incomplete data.
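The screening rules above can be written out as a short check. This is a sketch under stated assumptions: the function name and flags are hypothetical, and treating the thresholds as inclusive is an assumption, since the appendix does not say how a value exactly at 10, 60, or 50 would be handled.

```python
TOTAL_ZIP_CODES = 56

# Number of zip codes dropped for each reason given in the text
EXCLUSIONS = {
    "overlaps areas outside Morris County": 7,
    "preserved ratio below 10 or above 60": 6,
    "driving distance to NYC above 50 miles": 1,
    "incomplete data": 10,
}

final_sample_size = TOTAL_ZIP_CODES - sum(EXCLUSIONS.values())

def passes_screens(preserved_ratio, driving_miles,
                   within_county=True, complete_data=True):
    """Return True if a zip code meets all four screening rules.

    Boundary values are kept (an assumption, not stated in the report).
    """
    return (within_county
            and complete_data
            and 10 <= preserved_ratio <= 60
            and driving_miles <= 50)

# 07946 is in the final sample (preserved ratio 57.87, 34.67 miles)
kept = passes_screens(57.87, 34.67)
# 07853 was dropped at 56.4 miles; the ratio of 25.0 here is a placeholder
dropped = passes_screens(25.0, 56.4)
```

The exclusion counts reconcile with the text: 56 zip codes less 24 exclusions leaves the 32 used in the regression.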

The below table provides summary statistics of the final data set.

Summary Statistics of Final Data set (n=32)

| Variable | Mean | Median | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| P/SQFT | $221.22 | $208.50 | $68.74 | $104.00 | $384.00 |
| Preserved Ratio | 26.02 | 23.31 | 12.52 | 10.31 | 57.87 |
| Population Density | 2.05 | 1.94 | 1.14 | 0.16 | 5.78 |
| Driving Distance NYC (miles) | 35.44 | 34.75 | 7.71 | 22.51 | 48.86 |
| Median Household Income | $109,223.03 | $104,773.00 | $32,088.95 | $55,212.00 | $226,111.00 |
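As a check on the P/SQFT row, the summary statistics can be recomputed from the 32 Current P/SQFT values listed in the final data set. The sketch below uses Python's standard `statistics` module and assumes the reported standard deviation is the sample (n - 1) version.

```python
from statistics import mean, median, stdev

# Current P/SQFT (2014) for the 32 zip codes in the final data set
psqft = [
    292, 176, 222, 165, 235, 129, 183, 384,
    161, 178, 183, 261, 186, 237, 258, 165,
    195, 155, 148, 168, 302, 147, 320, 327,
    104, 243, 289, 186, 224, 350, 244, 262,
]

summary = {
    "n": len(psqft),
    "mean": mean(psqft),
    "median": median(psqft),
    "std_dev": stdev(psqft),  # sample standard deviation (n - 1 denominator)
    "min": min(psqft),
    "max": max(psqft),
}
```

The recomputed values round to the figures reported in the P/SQFT row.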

Final Data set (Chapter V - Part B of the Main Report)

| Zip Code | Land Acres | Water Acres | Total Acres | Preserved Acres | Current P/SQFT (2014) | Preserved Ratio | Population Density | Driving Distance NYC | Median Household Income | Water Sq. Miles | Population | Land Sq. Miles |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 07946 | 2,662 | 58 | 2,720 | 1,541 | $292 | 57.87 | 1.2 | 34.67 | $128,875 | 0.09 | 3,144 | 4.16 |
| 07866 | 14,189 | 1,478 | 15,667 | 7,823 | $176 | 55.13 | 1.4 | 38.64 | $98,989 | 2.31 | 22,098 | 22.17 |
| 07930 | 16,371 | 45 | 16,416 | 7,252 | $222 | 44.30 | 0.5 | 45.68 | $153,704 | 0.07 | 8,559 | 25.58 |
| 07885 | 9,626 | 461 | 10,086 | 4,196 | $165 | 43.60 | 1.0 | 40.87 | $76,582 | 0.72 | 10,078 | 15.04 |
| 07980 | 1,741 | 19 | 1,760 | 728 | $235 | 41.85 | 1.3 | 34.82 | $99,155 | 0.03 | 2,307 | 2.72 |
| 07828 | 9,088 | 954 | 10,042 | 3,405 | $129 | 37.46 | 1.4 | 48.86 | $70,924 | 1.49 | 14,150 | 14.2 |
| 07005 | 12,045 | 1,069 | 13,114 | 4,339 | $183 | 36.02 | 1.2 | 34 | $92,369 | 1.67 | 15,269 | 18.82 |
| 07928 | 5,530 | 205 | 5,734 | 1,917 | $384 | 34.67 | 3.3 | 27.45 | $140,231 | 0.32 | 19,144 | 8.64 |
| 07803 | 1,882 | 58 | 1,939 | 599 | $161 | 31.81 | 1.9 | 40.86 | $105,459 | 0.09 | 3,651 | 2.94 |
| 07035 | 4,301 | 339 | 4,640 | 1,335 | $178 | 31.05 | 2.3 | 22.51 | $88,322 | 0.53 | 10,607 | 6.72 |
| 07405 | 12,602 | 800 | 13,402 | 3,423 | $183 | 27.16 | 1.3 | 30.62 | $105,699 | 1.25 | 17,701 | 19.69 |
| 07054 | 8,781 | 422 | 9,203 | 2,361 | $261 | 26.89 | 3.2 | 29.03 | $80,515 | 0.66 | 29,305 | 13.72 |
| 07045 | 4,512 | 205 | 4,717 | 1,190 | $186 | 26.37 | 2.1 | 25.49 | $109,308 | 0.32 | 10,127 | 7.05 |
| 07960 | 22,406 | 237 | 22,643 | 5,776 | $237 | 25.78 | 1.9 | 35.18 | $99,319 | 0.37 | 43,747 | 35.01 |
| 07945 | 11,987 | 192 | 12,179 | 3,016 | $258 | 25.16 | 0.8 | 40.69 | $145,197 | 0.3 | 9,539 | 18.73 |
| 07856 | 2,246 | 525 | 2,771 | 528 | $165 | 23.50 | 1.4 | 43.14 | $69,803 | 0.82 | 3,944 | 3.51 |
| 07869 | 12,851 | 141 | 12,992 | 2,971 | $195 | 23.12 | 1.9 | 40.99 | $122,786 | 0.22 | 25,291 | 20.08 |
| 07852 | 1,920 | 38 | 1,958 | 436 | $155 | 22.72 | 1.8 | 43.8 | $110,759 | 0.06 | 3,609 | 3 |
| 07836 | 10,221 | 77 | 10,298 | 2,248 | $148 | 21.99 | 1.2 | 47.46 | $101,230 | 0.12 | 12,568 | 15.97 |
| 07082 | 3,866 | 45 | 3,910 | 811 | $168 | 20.98 | 1.4 | 24.72 | $137,941 | 0.07 | 5,384 | 6.04 |
| 07444 | 3,328 | 134 | 3,462 | 675 | $302 | 20.28 | 3.2 | 24.92 | $84,592 | 0.21 | 11,046 | 5.2 |
| 07850 | 2,061 | 294 | 2,355 | 415 | $147 | 20.14 | 2.7 | 44.31 | $90,139 | 0.46 | 6,436 | 3.22 |
| 07046 | 1,683 | 173 | 1,856 | 261 | $320 | 15.53 | 2.3 | 32.8 | $159,773 | 0.27 | 4,194 | 2.63 |
| 07976 | 4,474 | 256 | 4,730 | 683 | $327 | 15.26 | 0.2 | 31.59 | $226,111 | 0.4 | 754 | 6.99 |
| 07857 | 698 | 51 | 749 | 106 | $104 | 15.18 | 4.3 | 45.56 | $55,212 | 0.08 | 3,244 | 1.09 |
| 07981 | 4,314 | 96 | 4,410 | 635 | $243 | 14.72 | 2.0 | 28.57 | $102,125 | 0.15 | 8,865 | 6.74 |
| 07936 | 5,050 | 154 | 5,203 | 735 | $289 | 14.55 | 2.1 | 25.43 | $108,719 | 0.24 | 11,157 | 7.89 |
| 07876 | 3,654 | 102 | 3,757 | 507 | $186 | 13.89 | 2.8 | 44.93 | $111,599 | 0.16 | 10,619 | 5.71 |
| 07950 | 5,805 | 70 | 5,875 | 731 | $224 | 12.59 | 3.3 | 35.67 | $101,034 | 0.11 | 19,564 | 9.07 |
| 07940 | 2,976 | 13 | 2,989 | 363 | $350 | 12.21 | 5.8 | 28.1 | $107,208 | 0.02 | 17,278 | 4.65 |
| 07834 | 7,846 | 544 | 8,390 | 838 | $244 | 10.68 | 2.1 | 33.38 | $107,371 | 0.85 | 17,722 | 12.26 |
| 07932 | 4,333 | 160 | 4,493 | 447 | $262 | 10.31 | 2.2 | 29.34 | $104,087 | 0.25 | 9,868 | 6.77 |
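The appendix does not state how the two derived columns are calculated. The tabulated values appear consistent with Preserved Ratio being preserved acres as a percentage of land acres, and Population Density being residents per total acre. The sketch below checks that reading against three rows of the table. Both formulas are inferences from the data, not definitions taken from the report, and small differences are expected because the table shows rounded acreages.

```python
def preserved_ratio(preserved_acres, land_acres):
    """Preserved acres as a percentage of land acres (inferred formula)."""
    return 100.0 * preserved_acres / land_acres

def population_density(population, total_acres):
    """Residents per total acre, land plus water (inferred formula)."""
    return population / total_acres

# zip: (land acres, total acres, preserved acres, population,
#       tabulated preserved ratio, tabulated population density)
ROWS = {
    "07866": (14189, 15667, 7823, 22098, 55.13, 1.4),
    "07930": (16371, 16416, 7252, 8559, 44.30, 0.5),
    "07940": (2976, 2989, 363, 17278, 12.21, 5.8),
}

# Absolute gap between the recomputed and tabulated values for each row
differences = {
    zip_code: (
        abs(preserved_ratio(preserved, land) - ratio),
        abs(population_density(pop, total) - density),
    )
    for zip_code, (land, total, preserved, pop, ratio, density) in ROWS.items()
}
```

For these rows the recomputed values differ from the tabulated ones only by rounding.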

REGRESSION EQUATION AND RESULTS DETAIL (Chapter V - Part B of the Main Report)

Specify regression methodology and equation

The model in this analysis employs a linear regression and uses ordinary least squares (OLS) as the computational method. While this report considered logarithmic equations and converting certain key variables into logarithmic equivalents, the data and observations at hand are best described by a linear regression using OLS. The structure of the equation is as follows:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \cdots + \varepsilon_i$$

Applying this structure to the variables previously identified, the equation used for the regression is:

$$\text{P/SQFT}_i = \beta_0 + \beta_1\,\text{Preserved Ratio}_i + \beta_2\,\text{Population Density}_i + \beta_3\,\text{Driving Distance NYC}_i + \beta_4\,\text{Median Income}_i + \varepsilon_i$$

Where P/SQFT is the dependent variable, $\beta_0$ is the intercept term, Preserved Ratio, Population Density, Driving Distance NYC, and Median Income are independent variables with corresponding coefficients ($\beta$), and $\varepsilon_i$ is the error term.
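For readers unfamiliar with the mechanics, ordinary least squares chooses the coefficients that minimize the sum of squared errors, which amounts to solving the normal equations (X'X)b = X'y. The sketch below does this in plain Python on a tiny made-up data set. It is not the software or data used in the report, and the function name is hypothetical. An intercept can be estimated by adding a column of ones to X.

```python
def ols_coefficients(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination.

    X is a list of rows (one row per observation); y is a list of outcomes.
    No intercept is added automatically: include a column of 1.0s in X
    if an intercept term is wanted.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y]
    aug = []
    for i in range(k):
        row = [sum(obs[i] * obs[j] for obs in X) for j in range(k)]
        row.append(sum(obs[i] * yi for obs, yi in zip(X, y)))
        aug.append(row)
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (aug[i][k] - tail) / aug[i][i]
    return b

# Made-up data generated exactly from y = 2*x1 + 3*x2
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y_demo = [8.0, 7.0, 21.0, 17.0]
b_demo = ols_coefficients(X_demo, y_demo)
```

Because the demonstration data contains no noise, the fitted coefficients recover the 2 and 3 used to generate it.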

Results: High Levels of Significance

The regression reports a high R2 (0.9687), indicating that the independent variables (Preserved Ratio, Population Density, Driving Distance NYC, and Median Household Income) do a satisfactory job of explaining P/SQFT. To be more explicit, the independent variables explain approximately 97% of the variation in P/SQFT across the sample. Each of the independent variables was also statistically significant, with every p-value below 0.05. Together these results suggest that the model fits the sample data well.
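The reported R2 can be cross-checked against the summary statistics. The sketch below rests on two assumptions that the appendix does not state outright: that the Standard Error in the regression output is the residual standard error with 28 degrees of freedom (32 observations less 4 estimated coefficients), and that the Intercept row showing 0 with no standard error means the model was fit with the intercept fixed at zero. Under that reading R2 is measured about zero rather than about the sample mean, so both versions are computed for comparison.

```python
n = 32                  # observations
k = 4                   # estimated coefficients (intercept assumed fixed at 0)
residual_se = 43.7477   # "Standard Error" from the regression statistics
mean_y = 221.22         # mean P/SQFT from the summary statistics
sd_y = 68.74            # sample standard deviation of P/SQFT

# Sum of squared residuals implied by the residual standard error
sse = residual_se ** 2 * (n - k)

# Total sum of squares about the mean, and about zero
sst_centered = sd_y ** 2 * (n - 1)
sst_uncentered = sst_centered + n * mean_y ** 2

r2_uncentered = 1 - sse / sst_uncentered  # variation measured about zero
r2_centered = 1 - sse / sst_centered      # variation measured about the mean
```

Under these assumptions the uncentered figure reproduces the reported 0.9687, while the share of variation about the mean comes out near 0.63. If the assumptions hold, the 97% figure should be read as variation about zero and is not directly comparable to R2 from a model that estimates an intercept.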

Linear Regression Outputs (Chapter V - Part B of the Main Report)

The chart below provides visual representation of the degree of accuracy with which the model predicts the actual P/SQFT of a given zip code within the sample set.

Regression Statistics

| Statistic | Value |
| --- | --- |
| Multiple R | 0.9842 |
| R Square | 0.9687 |
| Adjusted R Square | 0.9296 |
| Standard Error | 43.7477 |
| Observations | 32 |

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A |
| Preserved Ratio | 1.5039 | 0.6430 | 2.3389 | 0.0267 | 0.1868 | 2.8210 |
| Population Density | 37.3624 | 6.2192 | 6.0076 | 1.79295E-06 | 24.6230 | 50.1018 |
| Driving Distance NYC | -2.3475 | 0.7784 | -3.0160 | 0.0054 | -3.9420 | -0.7531 |
| Median Household Income | 0.0017 | 0.0002 | 9.1737 | 6.23393E-10 | 0.0013 | 0.0021 |
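The t statistics and 95% intervals in the output follow directly from the coefficients and standard errors: t = coefficient / standard error, and the interval is the coefficient plus or minus a critical value times the standard error. The sketch below assumes 28 degrees of freedom, for which the two-sided 95% critical value of Student's t is about 2.0484. Median Household Income is left out because its coefficient and standard error are shown rounded to four decimals, which is too coarse to reproduce its t statistic.

```python
T_CRIT_DF28 = 2.0484  # two-sided 95% critical value, Student's t, 28 df (assumed df)

# variable: (coefficient, standard error) as reported in the output
ESTIMATES = {
    "Preserved Ratio": (1.5039, 0.6430),
    "Population Density": (37.3624, 6.2192),
    "Driving Distance NYC": (-2.3475, 0.7784),
}

def t_statistic(coef, std_err):
    """Ratio of the coefficient to its standard error."""
    return coef / std_err

def confidence_interval(coef, std_err, t_crit=T_CRIT_DF28):
    """Lower and upper bounds of the 95% confidence interval."""
    margin = t_crit * std_err
    return coef - margin, coef + margin

results = {
    name: {
        "t": t_statistic(coef, se),
        "ci": confidence_interval(coef, se),
    }
    for name, (coef, se) in ESTIMATES.items()
}
```

The recomputed values match the t Stat, Lower 95%, and Upper 95% columns to about three decimal places, which supports the 28 degrees of freedom assumed here.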

Model Predicted P/SQFT vs. Actual P/SQFT by Zip Code (Chapter V - Part B of the Main Report)
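A predicted P/SQFT of the kind plotted in this chart can be approximated by applying the reported coefficients to a zip code's values. The sketch below does so for zip code 07946 using the figures in the final data set. The result is approximate, because the published coefficients and inputs are rounded (the income coefficient in particular is shown to only two significant digits), and the function name is hypothetical.

```python
# Coefficients as reported in the regression output (intercept reported as 0)
COEFFICIENTS = {
    "preserved_ratio": 1.5039,
    "population_density": 37.3624,
    "driving_distance_nyc": -2.3475,
    "median_household_income": 0.0017,
}

def predict_psqft(values, coefficients=COEFFICIENTS, intercept=0.0):
    """Linear prediction: intercept plus the sum of coefficient * value."""
    return intercept + sum(coefficients[name] * values[name]
                           for name in coefficients)

# Zip code 07946, values taken from the final data set
zip_07946 = {
    "preserved_ratio": 57.87,
    "population_density": 1.2,
    "driving_distance_nyc": 34.67,
    "median_household_income": 128875,
}

predicted_07946 = predict_psqft(zip_07946)
actual_07946 = 292
residual_07946 = actual_07946 - predicted_07946
```

With these rounded inputs the prediction comes to roughly $270 per square foot against an actual value of $292, a gap well within the model's reported standard error of about $44.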

Testing Robustness of Results

The model presented in this report is a linear regression estimated by ordinary least squares. Under the standard assumptions, OLS is the best linear unbiased estimator. One of those assumptions is that the variance of the error terms is constant and does not depend on the values of the explanatory variables (the x-values). If the variance changed with the x-values, the errors would be heteroscedastic (see footnote 1), which would undermine the model's claims of statistical significance. To check for heteroscedasticity, a residual plot was created for each independent variable. The plots support the claim that the data is not heteroscedastic: the spread of the residuals shows no obvious trend as each x-variable increases or decreases. Finding no visible heteroscedasticity lends further support to the results of this regression.
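The visual check described above can be complemented with a rough numeric one. The sketch below is not the method used in this report: it is a simple split-sample comparison that sorts observations by an explanatory variable and compares the variance of the residuals in the lower and upper halves. The data shown is made up for illustration, and the function name is hypothetical.

```python
from statistics import variance

def residual_variance_ratio(x_values, residuals):
    """Variance of residuals in the upper half of x divided by the lower half.

    A ratio far from 1 hints that residual spread changes with x.
    """
    ordered = [r for _, r in sorted(zip(x_values, residuals))]
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return variance(upper) / variance(lower)

x_demo = [1, 2, 3, 4, 5, 6, 7, 8]

# Residuals with a constant spread across x (made-up values)
steady = [1, -1, 1, -1, 1, -1, 1, -1]
# Residuals whose spread grows with x (made-up values)
widening = [1, -1, 1, -1, 3, -3, 3, -3]

ratio_steady = residual_variance_ratio(x_demo, steady)
ratio_widening = residual_variance_ratio(x_demo, widening)
```

In this toy example the steady series gives a ratio of 1 and the widening series gives a ratio of 9. With real residuals a formal test would be needed to judge whether a given ratio is large enough to matter.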

This analysis also tests for multicollinearity (see footnote 2), a problem that arises when the independent variables can predict each other with a high degree of accuracy. Multicollinearity makes it difficult to estimate the magnitude of an individual variable's effect. Variance Inflation Factors (VIFs) were used to test for multicollinearity and showed that it is present in the model. The magnitudes of the coefficients should therefore be interpreted with caution.

Measures that should be taken in similar future models to improve robustness include expanding the number of observations in the model, as well as identifying data for independent variables that could be substituted in efforts to decrease multicollinearity.

1 Heteroscedasticity - an irregular scattering of values in a series of distributions, accompanied by a comparable scatter of variances.

2 Multicollinearity - in statistics, the occurrence of several independent variables in a multiple regression model being closely correlated with one another.

[Chart: Model Predicted P/SQFT vs. Actual P/SQFT by Zip Code. Two series, Predicted P/SQFT and Actual P/SQFT, are plotted for each of the 32 sample zip codes on a vertical axis running from $0 to $450.]

Testing for Heteroscedasticity with Residual Line Plots (Chapter V - Part B of the Main Report)

[Figure: four residual plots, each showing regression residuals on the vertical axis against one independent variable on the horizontal axis: Preserved Ratio Residual Plot, Population Density Residual Plot, Driving Distance NYC Residual Plot, and Median Household Income Residual Plot.]

Pairwise Correlation of Variables (Chapter V - Part B of the Main Report)

| Variable | P/SQFT | Median Household Income | Preserved Ratio | Population Density | Driving Distance NYC |
| --- | --- | --- | --- | --- | --- |
| P/SQFT | 1.000 | | | | |
| Median Household Income | 0.544 | 1.000 | | | |
| Preserved Ratio | -0.143 | -0.066 | 1.000 | | |
| Population Density | 0.227 | -0.370 | -0.451 | 1.000 | |
| Driving Distance NYC | -0.578 | -0.182 | 0.191 | -0.231 | 1.000 |

Testing for Multicollinearity with VIFs (Chapter V - Part B of the Main Report)

| Variable | R-squared as defined by other variables | Variance Inflation Factor: VIF = 1/(1 - R-squared) |
| --- | --- | --- |
| Driving Distance NYC | 0.925 | 13.31 |
| Preserved Ratio | 0.826 | 5.73 |
| Population Density | 0.717 | 3.54 |
| Median Household Income | 0.868 | 7.56 |
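The VIF formula can be applied directly to the R-squared values reported for each variable. The sketch below recomputes the four factors. Because the published R-squared values are rounded to three decimals, the recomputed VIFs differ slightly from the tabulated ones.

```python
def variance_inflation_factor(r_squared):
    """VIF = 1 / (1 - R-squared), where R-squared comes from regressing
    one independent variable on the other independent variables."""
    return 1.0 / (1.0 - r_squared)

# variable: (reported R-squared, reported VIF)
REPORTED = {
    "Driving Distance NYC": (0.925, 13.31),
    "Preserved Ratio": (0.826, 5.73),
    "Population Density": (0.717, 3.54),
    "Median Household Income": (0.868, 7.56),
}

recomputed = {
    name: variance_inflation_factor(r2)
    for name, (r2, _) in REPORTED.items()
}
```

The recomputed factors agree with the table to within about 0.03, and Driving Distance NYC remains the variable most strongly explained by the others.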