appendix f - economic analysis methodology
Post on 21-Jul-2016
Appendix F – Economic Analysis Methodology
Morris County Preservation Trust Fund Analysis and Strategy Report
DESCRIPTION OF DATA SET (Chapter V - Part B of the Main Report)
Land Square (Sq.) Miles – is sourced from USA.com, which uses 2010 US Census data, and is defined as the total land area in square miles per zip code. This data has been converted into land acres per zip code through multiplication. Since 1 square mile equals 640 acres, land square miles is multiplied by 640 to arrive at land acres per zip code.
Water Sq. Miles – is sourced from USA.com, which uses 2010 US Census data, and is defined as the total water area in square miles per zip code. This data has been converted using the same multiplication methodology as stated for Land above. Water square miles is multiplied by 640 to arrive at water acres per zip code.
Total Acres – is calculated by adding land and water acres per zip code. Total acres are defined as the total area, in acres, for a given zip code. Acres are used rather than square miles because the GIS data provided by the University of Kentucky was presented in acres.
Preserved Acres – is taken from University of Kentucky GIS Data and is defined as the total number of preserved acres for a given zip code. Preserved acres include any historic, farmland, open space, and flood mitigation properties funded through the Preservation Trust Fund, Green Acres, State, Government, or any other type of preservation program.
Population – is sourced from USA.com and is defined as the number of people who inhabit a given zip code.
Driving Distance to NYC – is sourced from zip-codes.com, which uses 2010 US Census data, and is defined as the driving distance from a given Morris County zip code to Midtown Manhattan, New York. The zip code used for Manhattan is 10018, chosen as the point of access to Midtown Manhattan closest to Morris County in driving proximity and accessibility.
Median Household Income – is sourced from USA.com, which uses 2010 US Census data, and is defined as the household income (often a combination of multiple earners) that divides the income distribution for a given zip code into two equal groups, half of households earning more and half earning less.
Current P/SQFT – is sourced from Trulia.com, a home sales web database, and is defined as the average sales price per square foot of recently sold homes for a given zip code. Recently sold homes are those sold from January through March 2014.
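The unit conversions described above reduce to a few lines of arithmetic. The following Python sketch applies them to zip code 07946, using the square-mile, preserved-acre, and population values reported in the Final Data set table. The function and key names are illustrative. The two ratio formulas (preserved acres over land acres, and population over total acres) are an assumption inferred from the tabulated values, since this appendix does not state them explicitly; they agree with the table to within rounding.

```python
ACRES_PER_SQ_MILE = 640  # 1 square mile = 640 acres


def derive_variables(land_sq_mi, water_sq_mi, preserved_acres, population):
    """Convert areas to acres and derive the two ratio variables.

    The ratio definitions are inferred from the published table,
    not taken from a stated formula (assumption).
    """
    land_acres = land_sq_mi * ACRES_PER_SQ_MILE
    water_acres = water_sq_mi * ACRES_PER_SQ_MILE
    total_acres = land_acres + water_acres
    return {
        "land_acres": land_acres,
        "water_acres": water_acres,
        "total_acres": total_acres,
        # Preserved Ratio as a percentage of land acres (assumed definition)
        "preserved_ratio": 100 * preserved_acres / land_acres,
        # Population Density as people per total acre (assumed definition)
        "population_density": population / total_acres,
    }


# Zip code 07946, inputs taken from the Final Data set table
row = derive_variables(land_sq_mi=4.16, water_sq_mi=0.09,
                       preserved_acres=1541, population=3144)
print({name: round(value, 2) for name, value in row.items()})
```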
DATA STEPS FOR REGRESSION ANALYSIS (Chapter V - Part B of the Main Report)
Identifying a dependent variable
The dependent variable is the factor that a regression measures as a function of the independent variables; it is what the independent variables explain. The regression in this analysis identifies price per square foot (P/SQFT) as the dependent variable. By creating a proper function to define P/SQFT, we can begin to identify the likelihood and magnitude of the effect that each independent variable has on property values in Morris County.
Identifying and hypothesizing for independent variables
In the study “What Drives Housing Prices” (2008), James A. Kahn controls for several factors, including land, housing services, demographic factors, and other services, to explain the driving factors of housing prices. In accordance with this methodology, this analysis controls for these aspects as they relate to Morris County.
Creating a hypothesis before performing the regression allows for a more honest interpretation of regression results. A null hypothesis (H0) and an alternative hypothesis (H1) are specified for each independent variable before running the regression. The null hypothesis is assumed to be true until the regression is performed. Once the regression is performed, the results are used to determine whether the null hypothesis can be rejected. The hypotheses for each variable are stated below:
Preserved Ratio:
H0: A greater amount of preserved land in an area does not increase P/SQFT.
H1: A greater amount of preserved land in an area increases P/SQFT.
Population Density:
H0: A greater number of people inhabiting an area does not increase P/SQFT.
H1: A greater number of people inhabiting an area increases P/SQFT.
Driving Distance to NYC:
H0: Closer proximity of a home to NYC does not increase P/SQFT.
H1: Closer proximity of a home to NYC increases P/SQFT.
Median Household Income:
H0: A higher median household income does not increase P/SQFT.
H1: A higher median household income increases P/SQFT.
The variables, their descriptions, their hypothesized impacts on P/SQFT, and the rationale for each hypothesis are summarized in the table below.
Description of Variables and Hypothetical Impacts (Chapter V - Part B of the Main Report)
Note: * signifies dependent variable
Determine sample grouping
While this report identifies 56 zip codes in Morris County, only 32 of them are included in the final sample used for the regression. The other 24 zip codes were omitted for the following reasons:
Overlap into areas outside of Morris County – the following 7 zip codes were discarded because they extend into areas outside of Morris County: 07435, 07920, 07438, 07931, 07830, 07840, and 07865. Including areas outside of Morris County could lead to inconsistencies.
“Preserved Ratio” was less than 10 or greater than 60 – the following 6 zip codes were discarded because their preserved ratio did not fit the parameters stated above: 07933, 07440, 07058, 07927, 07034, and 07935. A preserved ratio that is too small or too large may understate or overstate the true impact of preserved land on P/SQFT in a given zip code.
“Driving Distance to NYC” exceeded 50 miles – one zip code, 07853, was 56.4 miles away from NYC and as such was considered an outlier.
The remaining 10 zip codes omitted had incomplete data.
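The sample selection above can be checked with simple bookkeeping. The following Python sketch records the exclusion lists given in the text, encodes the two numeric screens, and confirms that 56 zip codes minus the exclusions leaves 32. The constant and function names are illustrative, and the zip codes with incomplete data are represented only by their count, because the text does not list them.

```python
TOTAL_ZIP_CODES = 56

# Zip codes listed in the text for each exclusion rule
OVERLAP_OUTSIDE_COUNTY = {"07435", "07920", "07438", "07931",
                          "07830", "07840", "07865"}
PRESERVED_RATIO_OUT_OF_RANGE = {"07933", "07440", "07058",
                                "07927", "07034", "07935"}
DISTANCE_OVER_50_MILES = {"07853"}
INCOMPLETE_DATA_COUNT = 10  # the text gives a count only, not the zip codes


def passes_numeric_screens(preserved_ratio, driving_distance_miles):
    """Return True if a zip code survives the two numeric screens."""
    ratio_ok = 10 <= preserved_ratio <= 60
    distance_ok = driving_distance_miles <= 50
    return ratio_ok and distance_ok


excluded = (len(OVERLAP_OUTSIDE_COUNTY)
            + len(PRESERVED_RATIO_OUT_OF_RANGE)
            + len(DISTANCE_OVER_50_MILES)
            + INCOMPLETE_DATA_COUNT)
final_sample_size = TOTAL_ZIP_CODES - excluded
print(excluded, final_sample_size)
```

For example, `passes_numeric_screens(57.87, 34.67)` is true for zip code 07946, while any zip code at 56.4 miles fails the distance screen regardless of its preserved ratio.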
The below table provides summary statistics of the final data set.
Summary Statistics of Final Data set (n=32)
Variable | Mean | Median | Std. Dev. | Min | Max
P/SQFT | $221.22 | $208.50 | $68.74 | $104.00 | $384.00
Preserved Ratio | 26.02 | 23.31 | 12.52 | 10.31 | 57.87
Population Density | 2.05 | 1.94 | 1.14 | 0.16 | 5.78
Driving Distance NYC | 35.44 | 34.75 | 7.71 | 22.51 | 48.86
Median Household Income | $109,223.03 | $104,773.00 | $32,088.95 | $55,212.00 | $226,111.00
Final Data set (Chapter V - Part B of the Main Report)
Zip Code | Land Acres | Water Acres | Total Acres | Preserved Acres | Current P/SQFT (2014) | Preserved Ratio | Population Density | Driving Distance NYC | Median Household Income | Water Sq. Miles | Population | Land Sq. Miles
07946 | 2,662 | 58 | 2,720 | 1541 | $292 | 57.87 | 1.2 | 34.67 | $128,875 | 0.09 | 3,144 | 4.16
07866 | 14,189 | 1,478 | 15,667 | 7823 | $176 | 55.13 | 1.4 | 38.64 | $98,989 | 2.31 | 22,098 | 22.17
07930 | 16,371 | 45 | 16,416 | 7252 | $222 | 44.30 | 0.5 | 45.68 | $153,704 | 0.07 | 8,559 | 25.58
07885 | 9,626 | 461 | 10,086 | 4196 | $165 | 43.60 | 1.0 | 40.87 | $76,582 | 0.72 | 10,078 | 15.04
07980 | 1,741 | 19 | 1,760 | 728 | $235 | 41.85 | 1.3 | 34.82 | $99,155 | 0.03 | 2,307 | 2.72
07828 | 9,088 | 954 | 10,042 | 3405 | $129 | 37.46 | 1.4 | 48.86 | $70,924 | 1.49 | 14,150 | 14.2
07005 | 12,045 | 1,069 | 13,114 | 4339 | $183 | 36.02 | 1.2 | 34 | $92,369 | 1.67 | 15,269 | 18.82
07928 | 5,530 | 205 | 5,734 | 1917 | $384 | 34.67 | 3.3 | 27.45 | $140,231 | 0.32 | 19,144 | 8.64
07803 | 1,882 | 58 | 1,939 | 599 | $161 | 31.81 | 1.9 | 40.86 | $105,459 | 0.09 | 3,651 | 2.94
07035 | 4,301 | 339 | 4,640 | 1335 | $178 | 31.05 | 2.3 | 22.51 | $88,322 | 0.53 | 10,607 | 6.72
07405 | 12,602 | 800 | 13,402 | 3423 | $183 | 27.16 | 1.3 | 30.62 | $105,699 | 1.25 | 17,701 | 19.69
07054 | 8,781 | 422 | 9,203 | 2361 | $261 | 26.89 | 3.2 | 29.03 | $80,515 | 0.66 | 29,305 | 13.72
07045 | 4,512 | 205 | 4,717 | 1190 | $186 | 26.37 | 2.1 | 25.49 | $109,308 | 0.32 | 10,127 | 7.05
07960 | 22,406 | 237 | 22,643 | 5776 | $237 | 25.78 | 1.9 | 35.18 | $99,319 | 0.37 | 43,747 | 35.01
07945 | 11,987 | 192 | 12,179 | 3016 | $258 | 25.16 | 0.8 | 40.69 | $145,197 | 0.3 | 9,539 | 18.73
07856 | 2,246 | 525 | 2,771 | 528 | $165 | 23.50 | 1.4 | 43.14 | $69,803 | 0.82 | 3,944 | 3.51
07869 | 12,851 | 141 | 12,992 | 2971 | $195 | 23.12 | 1.9 | 40.99 | $122,786 | 0.22 | 25,291 | 20.08
07852 | 1,920 | 38 | 1,958 | 436 | $155 | 22.72 | 1.8 | 43.8 | $110,759 | 0.06 | 3,609 | 3
07836 | 10,221 | 77 | 10,298 | 2248 | $148 | 21.99 | 1.2 | 47.46 | $101,230 | 0.12 | 12,568 | 15.97
07082 | 3,866 | 45 | 3,910 | 811 | $168 | 20.98 | 1.4 | 24.72 | $137,941 | 0.07 | 5,384 | 6.04
07444 | 3,328 | 134 | 3,462 | 675 | $302 | 20.28 | 3.2 | 24.92 | $84,592 | 0.21 | 11,046 | 5.2
07850 | 2,061 | 294 | 2,355 | 415 | $147 | 20.14 | 2.7 | 44.31 | $90,139 | 0.46 | 6,436 | 3.22
07046 | 1,683 | 173 | 1,856 | 261 | $320 | 15.53 | 2.3 | 32.8 | $159,773 | 0.27 | 4,194 | 2.63
07976 | 4,474 | 256 | 4,730 | 683 | $327 | 15.26 | 0.2 | 31.59 | $226,111 | 0.4 | 754 | 6.99
07857 | 698 | 51 | 749 | 106 | $104 | 15.18 | 4.3 | 45.56 | $55,212 | 0.08 | 3,244 | 1.09
07981 | 4,314 | 96 | 4,410 | 635 | $243 | 14.72 | 2.0 | 28.57 | $102,125 | 0.15 | 8,865 | 6.74
07936 | 5,050 | 154 | 5,203 | 735 | $289 | 14.55 | 2.1 | 25.43 | $108,719 | 0.24 | 11,157 | 7.89
07876 | 3,654 | 102 | 3,757 | 507 | $186 | 13.89 | 2.8 | 44.93 | $111,599 | 0.16 | 10,619 | 5.71
07950 | 5,805 | 70 | 5,875 | 731 | $224 | 12.59 | 3.3 | 35.67 | $101,034 | 0.11 | 19,564 | 9.07
07940 | 2,976 | 13 | 2,989 | 363 | $350 | 12.21 | 5.8 | 28.1 | $107,208 | 0.02 | 17,278 | 4.65
07834 | 7,846 | 544 | 8,390 | 838 | $244 | 10.68 | 2.1 | 33.38 | $107,371 | 0.85 | 17,722 | 12.26
07932 | 4,333 | 160 | 4,493 | 447 | $262 | 10.31 | 2.2 | 29.34 | $104,087 | 0.25 | 9,868 | 6.77
REGRESSION EQUATION AND RESULTS DETAIL (Chapter V - Part B of the Main Report)
Specify regression methodology and equation
The model in this analysis employs a linear regression and uses ordinary least squares (OLS) as the computational method. While this report considered logarithmic equations and converting certain key variables into logarithmic equivalents, the data and observations at hand are best described by a linear regression using OLS. The structure of the equation is as follows:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \cdots + \varepsilon_i$$
Applying this structure to the variables previously identified, the equation used for the regression is:
$$\text{P/SQFT}_i = \beta_0 + \beta_1\,\text{Preserved Ratio}_i + \beta_2\,\text{Population Density}_i + \beta_3\,\text{Driving Distance NYC}_i + \beta_4\,\text{Median Income}_i + \varepsilon_i$$
Where P/SQFT is the dependent variable, $\beta_0$ is the intercept term, Preserved Ratio, Population Density, Driving Distance NYC, and Median Income are independent variables with corresponding coefficients ($\beta_1$ through $\beta_4$), and $\varepsilon_i$ is the error term.
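For readers less familiar with OLS, the estimation step can be sketched in a few lines of plain Python. The function below solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination. It is a minimal illustration, not the tool used for this report: `ols_fit` is a hypothetical helper, and the data are made-up toy numbers constructed so that y = 1 + 2·x1 + 3·x2 exactly, not Morris County observations. Passing `intercept=False` drops the constant term from the fitted equation.

```python
def ols_fit(rows, y, intercept=True):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    # Optionally prepend a column of ones for the intercept term
    X = [([1.0] if intercept else []) + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data (hypothetical): y is built as 1 + 2*x1 + 3*x2 with no noise
toy_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
toy_y = [1 + 2 * x1 + 3 * x2 for x1, x2 in toy_rows]
coefficients = ols_fit(toy_rows, toy_y)
print([round(b, 6) for b in coefficients])
```

Because the toy data contain no noise, the fit recovers the intercept and both slopes up to floating-point error. With real data the same procedure returns the coefficients that minimize the sum of squared residuals.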
Results: High Levels of Significance
The regression exhibits a high R² (0.9687), indicating that the independent variables (Preserved Ratio, Population Density, Driving Distance NYC, and Median Household Income) do a satisfactory job of explaining P/SQFT. To be more explicit, the independent variables explain approximately 97% of the variation in P/SQFT values. It should also be noted that each of the independent variables is statistically significant at the 5% level (every p-value in the output below is less than 0.05). Together, these results suggest that the model fits the sample data closely.
Linear Regression Outputs (Chapter V - Part B of the Main Report)
The chart below provides a visual representation of how accurately the model predicts the actual P/SQFT of each zip code within the sample set.
Regression Statistics
Multiple R 0.9842
R Square 0.9687
Adjusted R Square 0.9296
Standard Error 43.7477
Observations 32
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A
Preserved Ratio | 1.5039 | 0.6430 | 2.3389 | 0.0267 | 0.1868 | 2.8210
Population Density | 37.3624 | 6.2192 | 6.0076 | 1.79295E-06 | 24.6230 | 50.1018
Driving Distance NYC | -2.3475 | 0.7784 | -3.0160 | 0.0054 | -3.9420 | -0.7531
Median Household Income | 0.0017 | 0.0002 | 9.1737 | 6.23393E-10 | 0.0013 | 0.0021
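The columns of the coefficient output are related by simple formulas: the t statistic is the coefficient divided by its standard error, and the 95% bounds are the coefficient plus or minus a critical t value times the standard error. The Python sketch below reproduces those columns and computes a fitted P/SQFT for one zip code. Several points are assumptions: the names are illustrative; the output reports the intercept as 0 with no standard error, so the sketch uses no constant term and 32 − 4 = 28 degrees of freedom; and 2.0484 is the two-sided 95% critical value of Student's t with 28 degrees of freedom. Because the income coefficient and its standard error are rounded to four decimals, its t statistic and any prediction are only approximate.

```python
# Coefficient and standard error as reported in the regression output
COEF = {
    "preserved_ratio": (1.5039, 0.6430),
    "population_density": (37.3624, 6.2192),
    "driving_distance_nyc": (-2.3475, 0.7784),
    "median_household_income": (0.0017, 0.0002),
}
N_OBS = 32
DF = N_OBS - len(COEF)  # 28, assuming no intercept was estimated
T_CRIT_95 = 2.0484      # two-sided 95% critical t value for 28 df


def t_stat_and_ci(coef, se, t_crit=T_CRIT_95):
    """Return (t statistic, lower 95% bound, upper 95% bound)."""
    return coef / se, coef - t_crit * se, coef + t_crit * se


def predict_psqft(values):
    """Fitted P/SQFT: sum of coefficient times value, with no constant term."""
    return sum(COEF[name][0] * values[name] for name in COEF)


for name, (coef, se) in COEF.items():
    t, lower, upper = t_stat_and_ci(coef, se)
    print(f"{name}: t={t:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")

# Zip code 07946, inputs taken from the Final Data set table
zip_07946 = {
    "preserved_ratio": 57.87,
    "population_density": 1.2,
    "driving_distance_nyc": 34.67,
    "median_household_income": 128875,
}
predicted = predict_psqft(zip_07946)
print(f"Predicted P/SQFT for 07946: {predicted:.2f} (actual: 292)")
```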
Model Predicted P/SQFT vs. Actual P/SQFT by Zip Code (Chapter V - Part B of the Main Report)
Testing Robustness of Results
The model presented in this report is a linear regression which uses ordinary least squares methodology. The OLS method is considered by this report to be the best linear unbiased estimator, given the parameters. An implicit assumption of employing this methodology is that the variances of the error terms are constant and do not depend on the x-values (the explanatory variables). Were the variances to change with the x-values, this would imply heteroscedasticity¹ and undermine the ability of the model to validate claims of statistical significance. In order to evaluate the presence of heteroscedasticity, residual line plots were created for each variable. The plots support the claim that the data are not heteroscedastic: the variances of the residuals for each variable do not show obvious trends as the x-variables increase or decrease. The absence of visible heteroscedasticity further supports the results of this regression.
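A rough numerical companion to visual inspection is to compare the spread of the residuals across low and high values of an explanatory variable. The Python sketch below sorts observations by x, splits them at the midpoint, and returns the standard deviation of the residuals in each half; roughly equal values are consistent with constant variance. This is an illustrative check rather than the method used in this report, which relied on plots. The function name and the toy numbers are hypothetical, and the toy residuals are chosen so that the spread clearly grows with x.

```python
from statistics import pstdev


def residual_spread_by_half(x_values, residuals):
    """Sort by x, split at the midpoint, and return the standard
    deviation of the residuals in the lower and upper halves."""
    ordered = [res for _, res in sorted(zip(x_values, residuals))]
    mid = len(ordered) // 2
    return pstdev(ordered[:mid]), pstdev(ordered[mid:])


# Toy values (hypothetical, not from the report): spread grows with x
toy_x = [30, 10, 40, 20]
toy_residuals = [20, 5, -20, -5]
lower_spread, upper_spread = residual_spread_by_half(toy_x, toy_residuals)
print(lower_spread, upper_spread)
```

With only a handful of points such a comparison is crude, so it is best read alongside the residual plots rather than in place of them.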
This analysis also tests for multicollinearity², a problem arising when the independent variables can predict each other with a high degree of accuracy. The presence of multicollinearity can cause problems when estimating the magnitude of the effect of a given independent variable. Variance Inflation Factors (VIFs) were used to test levels of multicollinearity, and showed that multicollinearity is present in the model. This suggests that we should be cautious when interpreting the magnitude of the coefficients.
Measures that should be taken in similar future models to improve robustness include expanding the number of observations in the model, as well as identifying data for independent variables that could be substituted in efforts to decrease multicollinearity.
¹ Heteroscedasticity: an irregular scattering of values in a series of distributions, accompanied by a comparable scatter of variances.
² Multicollinearity: in statistics, the situation in which several independent variables in a multiple regression model are closely correlated with one another.
[Figure: Model Predicted P/SQFT vs. Actual P/SQFT by Zip Code. Two series, Predicted P/SQFT and Actual P/SQFT, plotted for each of the 32 zip codes in the final data set on a vertical axis from $0 to $450.]
Testing for Heteroscedasticity with Residual Line Plots (Chapter V - Part B of the Main Report)
[Figure: four residual plots, each showing regression residuals on the vertical axis against one independent variable on the horizontal axis: Preserved Ratio Residual Plot (Acre Ratio, 0 to 70), Population Density Residual Plot (0 to 7), Driving Distance NYC Residual Plot (0 to 60), and Median Household Income Residual Plot ($0 to $250,000).]
Pairwise Correlation of Variables (Chapter V - Part B of the Main Report)
Variable | P/SQFT | Median Household Income | Preserved Ratio (Acre Ratio) | Population Density | Driving Distance NYC
P/SQFT | 1.000
Median Household Income | 0.544 | 1.000
Preserved Ratio | -0.143 | -0.066 | 1.000
Population Density | 0.227 | -0.370 | -0.451 | 1.000
Driving Distance NYC | -0.578 | -0.182 | 0.191 | -0.231 | 1.000
Testing for Multicollinearity with VIFs (Chapter V - Part B of the Main Report)
Variable | R-squared as defined by other variables | Variance Inflation Factor: VIF = 1/(1 - R-squared)
Driving Distance NYC | 0.925 | 13.31
Preserved Ratio | 0.826 | 5.73
Population Density | 0.717 | 3.54
Median Household Income | 0.868 | 7.56
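The VIF column follows directly from the R-squared column through VIF = 1/(1 − R²). The short Python sketch below recomputes it from the tabulated R-squared values; the function and dictionary names are illustrative. Because the R-squared values are rounded to three decimals in the table, the recomputed VIFs differ from the reported ones in the second decimal place.

```python
def variance_inflation_factor(r_squared):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing one
    independent variable on the other independent variables."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in the interval [0, 1)")
    return 1.0 / (1.0 - r_squared)


# R-squared values as reported in the VIF table
AUXILIARY_R_SQUARED = {
    "Driving Distance NYC": 0.925,
    "Preserved Ratio": 0.826,
    "Population Density": 0.717,
    "Median Household Income": 0.868,
}

vifs = {name: variance_inflation_factor(r2)
        for name, r2 in AUXILIARY_R_SQUARED.items()}
for name, vif in vifs.items():
    print(f"{name}: VIF = {vif:.2f}")
```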