Source: homepages.cae.wisc.edu/~ece539/project/f17/yang_rpt.pdf
Diamond Price Model
Summary
To compare neural networks with a traditional statistical model, I first visualize the
diamonds data set and construct an econometric model, log(price) = 7.56104309911 +
0.0320045461853 × cut + 0.0778686923705 × color + 0.123285127953 × clarity +
1.87768943035 × log(carat), analyzing the importance of each variable. I then
construct a one-layer neural network and a deep neural network, and tune the
parameters of the one-layer network carefully by trying many combinations. The
results show that a one-layer network with 17 neurons, a learning rate of 0.04 and a
momentum of 0.86 works well and beats the statistical model in predictive ability.
Since neural networks alone cannot explain the data, the conclusion is that both the
traditional statistical model and neural networks play an important role.
Keywords: log transform, econometrics, deep neural network, back-propagation algorithm, TensorFlow
1. Introduction and related work
There is a very clean data set on Kaggle describing 53,940 diamonds' carat, cut, color,
clarity, depth, table, price, length, width and depth. Fortunately, the data set has no
missing values.
The most popular report on this data set builds a price model in the traditional
statistical way. After visualizing the data in several graphs, the author simply applies a
linear model with a log transform and gets a fairly good model whose adjusted
R-squared is 0.9.
The second most popular report is Diamond Cut's Prediction with XGBoost. Its confusion
rate is up to 32%, and it uses price as the input and cut as the output, which is entirely
different from a price model.
2. Background Knowledge of Diamonds
Diamond is a metastable allotrope of carbon in which the carbon atoms are arranged in a
variation of the face-centered cubic crystal structure called the diamond lattice. Since it
is the hardest natural mineral on Earth, with a Mohs hardness of 10, it was the material
of glass cutters before the invention of artificial diamonds. It is one of the most
important, famous and brilliant gems in the world. Diamonds are so beautiful and
expensive that they are often set in engagement rings to show a groom-to-be's
sincerity and love.
In fact, diamonds are not as rare as most people think. In September 2012, Russia held a
press conference claiming that a massive diamond deposit with trillions of carats of
reserves had been discovered in eastern Siberia, tens of times the size of the world's
known diamond reserves and enough to satisfy humankind's demand for 3,000 years.
However, De Beers Consolidated Mines, Ltd. controls more than 90% of diamond mines
worldwide and has established a trust with the owners of other mines. As a monopolist,
De Beers intentionally extracts only a few diamonds every year to keep prices high and
maximize its profits. If any other miner extracts plenty of diamonds and sells them on
the market, De Beers temporarily dumps its large inventories to depress the market
until the disobedient miner goes bankrupt. As a result, nobody dares to decrease
diamond prices.
Since diamonds are light, easy to carry, precious and easy to sell on a market, arms
dealers accept them in trades, as shown in the film Lord of War, and Jews carried
diamonds with them during the Holocaust in the film Schindler's List. Now that
diamonds can serve as a medium of exchange, figuring out their market value is quite
essential. Besides, as ordinary consumers, it is important to know whether we are
purchasing diamonds at a reasonable price, to avoid fraud.
Usually, prices are affected by a diamond's weight and physical appearance, so we can
judge its price from its characteristics in all aspects. There is a generally accepted
international standard, the 4Cs of diamond quality, for classifying diamonds and
judging their quality. Although it was first created by the Gemological Institute of
America (GIA), you can get a diamond grading report based on the 4Cs from other
institutes such as the International Gemological Institute. In brief, the 4Cs are cut,
color, clarity and carat.
The most important of the 4Cs is cut [1] because it has the greatest influence on a
diamond's sparkle. In determining the quality of the cut, the diamond grader evaluates the
cutter's skill in the fashioning of the diamond. The more precise the cut, the more
captivating the diamond is to the eye.
The second most important of the 4Cs is color, which refers to a diamond's lack of color.
Gem-quality diamonds occur in many hues, in a range from colorless to light yellow or
light brown. Colorless diamonds are the rarest, so the less color, the higher the grade. It
is noteworthy that the color grading system starts at D. Before GIA universalized the
D-to-Z color grading scale, a variety of other systems were used loosely: A, B and C
(used without clear definition), Arabic (0, 1, 2, 3) and Roman (I, II, III) numerals, and
descriptive terms like "gem blue" or "blue white," which are notorious for
misinterpretation. The inventors of the 4Cs standard wanted to start fresh, without any
association with earlier systems, so the scale starts at the letter D. Other natural colors
(blue, red, pink, for example) are known as "fancy," and their color grading differs from
that of colorless diamonds.
Clarity is often the least important of the 4Cs. Diamonds can have internal characteristics
known as inclusions and external characteristics known as blemishes [2]. Diamonds
without inclusions or blemishes are rare; however, most such characteristics are tiny
imperfections visible only under the magnification of a microscope, so clarity is not as
important as cut and color.
The last C of the 4Cs is carat, the diamond's physical weight measured in metric carats.
One carat equals 1/5 gram and is subdivided into 100 points. Carat weight is the most
objective grade of the 4Cs. Diamond prices jump at the full- and half-carat weights;
diamonds just below these weights cost significantly less, and, because carat weight is
distributed across the entirety of the diamond, small size differences are almost
impossible to detect. Visually, there is little difference between a 0.99-carat diamond
and one that weighs a full carat, but the price difference between the two can be
significant. This feature matters because it strongly affects the accuracy of a prediction
model.
[1] www.bluenile.com/education/diamonds
[2] www.gia.edu/gia-about/4cs-clarity
3. Visualization of the data
As our professor said in class, before we apply a neural network model, it is always
helpful to look at the graphs generated from the data.
The data set has the following fields:

field    description
price    price in US dollars ($326--$18,823)
carat    weight of the diamond (0.2--5.01)
cut      quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color    diamond color, from J (worst) to D (best)
clarity  a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x        length in mm (0--10.74)
y        width in mm (0--58.9)
z        depth in mm (0--31.8)
depth    total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)
table    width of top of diamond relative to widest point (43--95)

Table 1 attributes
Carat, cut, color and clarity have been introduced above, while x, y, z, depth and table
are attributes of cut, the most important element, so cut may deserve a greater weight in
the model and help it predict more precisely.
Before we use these fields in our model, we must convert the verbal grades into
numbers. So that the 4Cs have a positive relation with price, the better the diamond's
quality, the greater its numerical attribute. For cut, Fair is replaced with 1, Good with 2,
Very Good with 3, Premium with 4 and Ideal with 5. For color, grade J gets 1 point,
grade I gets 2 points … and grade D gets 7 points. For clarity, 1 stands for I1, 2 for
SI2 … and 8 for IF.
I also add dummy variables such as cutFair, cutGood, cutPremium, colorJ, colorI,
clarityI1, claritySI2 and so on; their value is 1 when the condition is true and 0
otherwise.
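A minimal sketch of this encoding in Python (the field names follow Table 1 and the dummy-variable naming in the text; the helper name `encode` is illustrative):

```python
# Ordinal encodings: a larger number means a better grade, as described above.
CUT = {"Fair": 1, "Good": 2, "Very Good": 3, "Premium": 4, "Ideal": 5}
COLOR = {g: i + 1 for i, g in enumerate("JIHGFED")}             # J=1 ... D=7
CLARITY = {g: i + 1 for i, g in enumerate(
    ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"])}  # I1=1 ... IF=8

def encode(diamond):
    """Map a raw record to numeric grades plus 0/1 dummy variables."""
    row = {
        "numericalCut": CUT[diamond["cut"]],
        "numericalColor": COLOR[diamond["color"]],
        "numericalClarity": CLARITY[diamond["clarity"]],
    }
    for name in CUT:                     # dummies like cutFair ... cutIdeal
        row["cut" + name.replace(" ", "")] = int(diamond["cut"] == name)
    return row

print(encode({"cut": "Ideal", "color": "E", "clarity": "SI2"}))
```

Color and clarity dummies would be generated the same way from their own mappings.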
Figure 1 carat distribution
The minimum weight is 0.2 carat, the maximum is 5.01 carats, the average in the data
set is 0.798 carat and the sample standard deviation is 0.474 carat. As we can see in
Figure 1, which is in essence a bar graph, most diamonds are lighter than 2.5 carats.
More importantly, the distribution of carat is not normal: many diamonds weigh 0.3,
0.4, 0.5, 0.7, 0.9, 1.2 or 1.5 carats, while only a few weigh 0.29, 0.39, 0.49, 0.69, 0.89,
1.19 or 1.49 carats. Since a diamond's price jumps at 0.5 carat or 1 carat, as mentioned
above, jewelers pick these weights when cutting a raw diamond to make a fortune.
Because we have only a little data on diamonds slightly lighter than 0.5 or 1 carat, it is
difficult for a model to capture the price jump between 0.49 and 0.5 carat or between
0.99 and 1 carat. We had better handle these price jumps carefully and deliberately.
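One simple way to let a model see these jumps (the report only flags the issue; this feature set and the function name are hypothetical) is to add threshold indicators at the half- and full-carat marks:

```python
def jump_features(carat):
    """Hypothetical indicator features marking the half- and full-carat
    thresholds where diamond prices jump."""
    return {
        "carat": carat,
        "atLeastHalf": int(carat >= 0.5),
        "atLeastOne": int(carat >= 1.0),
    }

print(jump_features(0.99))  # just below the full-carat jump
print(jump_features(1.00))
```

A linear model with these dummies can then assign a discrete premium at each threshold instead of forcing the price curve to be smooth there.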
Figure 2 cuts distribution
Cut is the most important element, and cutting technique is under human control, so
most diamond dealers sell diamonds with ideal cuts; the better the cut, the more
diamonds there are in our data set.
Figure 3 color distribution
Figure 4 clarity distribution
It seems that the color distribution and the clarity distribution are roughly normal. It is
hard for the naked eye to judge a diamond's quality once its clarity is better than grade
VS2 and its color is better than grade G, so gem dealers do not have to throw away
diamonds with imperfect clarity or color.
Since depth is defined as 2z / (x + y), there is no need to visualize the relation between
depth and x, y or z.
From Figure 5, which shows three almost straight lines, we can conclude that y is
proportional to x, z is proportional to x and z is proportional to y; they are highly
correlated, so we may pick just one variable to represent x, y and z.
Figure 5 x-y, x-z, y-z scatters
Figure 6 x – table scatter
From Figure 6, we find there may be no clear relation between x and table. Since x, y
and z are highly correlated, there is probably no clear relation between y and table or z
and table, either.
Since the distribution of x within each cut grade is quite even in Figure 7, x itself has
nothing to do with cut: whether a diamond is big or small, it can have the worst cut or
the best.
Figure 7 x-cut scatter
However, depth influences cut greatly. As shown earlier, the number of diamonds with
an ideal cut is far greater than the number with a fair cut. To eliminate the bias caused
by different sample sizes between groups, I pick about 1,610 diamonds randomly from
each cut grade. Figure 8 shows that the closer the depth is to the range from 55 to 65,
the more likely the cut is ideal, i.e., the better the cut is.
Figure 8 depth – cut scatter
Now we can use depth to represent cut, the most important factor of a price model. In
Figure 9, both cut and carat influence price: there is a strong positive correlation
between carat and price, and the better a diamond's cut, the higher its price. More
importantly, when carat is fixed the price is roughly normally distributed, and whatever
the carat, depth has the same mean and standard deviation, which suggests carat and
cut are unrelated.
Figure 9 carat-depth-price
Both Figure 10 and Figure 11 are carat-price scatter plots; the dots in Figure 10 are
colored by the diamonds' color, and those in Figure 11 by their clarity. They suggest a
power-function relation between carat and price when color and clarity are fixed.
Consistent with the description above, a more transparent color or purer clarity usually
leads to a higher price in Figures 10 and 11, so it is reasonable and justifiable to set
color J to 1, color I to 2, color H to 3, and so on, and to convert I1 to 1, SI2 to 2, SI1 to
3 and so forth.
Figure 10 carat-color-price
Figure 11 carat-clarity-price
4. Statistical Analysis
Since we will later compare the performance of traditional statistical analysis with that
of neural networks, I randomly pick 90% of the data (48,546 diamonds) as the training
set and let the rest (5,394 diamonds) be the testing set, to make the comparison fair.
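The 90/10 split can be sketched in Python (the seed and the helper name `split` are illustrative, not from the report):

```python
import random

def split(records, test_fraction=0.1, seed=539):
    """Shuffle and split records into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# 53,940 diamonds -> 48,546 training rows and 5,394 testing rows
train, test = split(range(53940))
print(len(train), len(test))
```

Fixing the seed makes the statistical model and the neural networks see identical train/test partitions.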
According to the description of the 4Cs standard, we first use only cut, color, clarity
and carat as independent variables and keep depth as an instrumental variable to be
used if necessary.
Since figures 10 and 11 suggest a power-function relation between carat and price
when cut, color and clarity are fixed, we can write:

price = α × carat^β  (α and β are unknown parameters)
log(price) = log(α) + β × log(carat)
So we use log(price) and log(carat) instead of price and carat themselves. Another
advantage of using logs is that we can later analyze the percentage change of price
caused by variations in cut, color, clarity or carat. Now we can establish the first model:

log(price) = w0 + w1 × numericalCut + w2 × numericalColor + w3 × numericalClarity + w4 × log(carat)
Their coefficients are shown in the regression output below:

Dependent Variable: LOG(PRICE)
Method: Least Squares
Sample: 1 48546; Included observations: 48546

Variable           Coefficient   Std. Error   t-Statistic   Prob.
C                  7.561043      0.003316     2280.377      0.0000
NUMERICALCUT       0.032005      0.000606     52.79007      0.0000
NUMERICALCOLOR     0.077869      0.000407     191.2331      0.0000
NUMERICALCLARITY   0.123285      0.000444     277.7187      0.0000
LOG(CARAT)         1.877689      0.001287     1459.116      0.0000

R-squared 0.979329    Adjusted R-squared 0.979328    S.E. of regression 0.145966
Sum squared resid 1034.224    Log likelihood 24539.60    F-statistic 574938.5
Prob(F-statistic) 0.000000    Durbin-Watson stat 1.987218
Mean dependent var 7.786411    S.D. dependent var 1.015213
Akaike info criterion -1.010777    Schwarz criterion -1.009872    Hannan-Quinn criter. -1.010493

Table 2 regression model

The p-value of the F-statistic is near 0, so the equation itself is statistically significant;
every coefficient's t-statistic p-value is near 0, so all variables are statistically
significant as well. The adjusted R-squared is 0.979328, so the model explains about
98% of the variation in log(price).
Firstly, we use the White heteroskedasticity test to check the model.
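The report fits this regression in Eviews; the same least-squares fit can be sketched with NumPy. The data below are synthetic, generated from the fitted coefficients plus noise matching the regression's standard error, so the recovered values only illustrate the mechanics, not the real fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
cut = rng.integers(1, 6, n).astype(float)        # numericalCut in 1..5
color = rng.integers(1, 8, n).astype(float)      # numericalColor in 1..7
clarity = rng.integers(1, 9, n).astype(float)    # numericalClarity in 1..8
log_carat = rng.normal(-0.4, 0.6, n)

# Synthetic log(price) built from the fitted coefficients, with noise
# matching the regression's S.E. of about 0.146.
log_price = (7.561043 + 0.032005 * cut + 0.077869 * color
             + 0.123285 * clarity + 1.877689 * log_carat
             + rng.normal(0, 0.146, n))

X = np.column_stack([np.ones(n), cut, color, clarity, log_carat])
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
print(np.round(beta, 3))  # should land near the generating coefficients
```

Running the same five-column design on the real Kaggle data would reproduce the Eviews coefficients above.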
Table 3 White heteroskedasticity test
The p-value of the F-statistic is still near 0, smaller than 0.05, so the test rejects
homoskedasticity and some heteroskedasticity is present. Since heteroskedasticity
affects only the standard errors, not the unbiasedness of the coefficient estimates, we
proceed with ordinary least squares rather than weighted least squares.
Secondly, Figure 12 shows that the residuals of the model are evenly distributed, so
there may be no autocorrelation problem. The Durbin-Watson statistic in the regression
output is 1.987218, quite close to 2, so first-order autocorrelation does not exist; and
with no first-order autocorrelation, higher-order autocorrelation is unlikely as well.
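The Durbin-Watson statistic quoted above can be computed directly from residuals; a small sketch (values near 2 indicate no first-order autocorrelation, values near 0 indicate strong positive autocorrelation):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared successive differences / sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
white_noise = rng.normal(size=10_000)   # uncorrelated residuals -> DW near 2
trend = np.cumsum(white_noise)          # strongly autocorrelated -> DW near 0
print(round(durbin_watson(white_noise), 2))
print(round(durbin_watson(trend), 2))
```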
Heteroskedasticity Test: White

F-statistic 271.9102    Prob. F(14,48531) 0.0000
Obs*R-squared 3530.954    Prob. Chi-Square(14) 0.0000
Scaled explained SS 7743.107    Prob. Chi-Square(14) 0.0000

Test Equation: Dependent Variable: RESID^2
Method: Least Squares
Sample: 1 48546; Included observations: 48546

Variable                            Coefficient   Std. Error   t-Statistic   Prob.
C                                   0.100515      0.003449     29.14721      0.0000
NUMERICALCUT^2                      0.002080      0.000158     13.17050      0.0000
NUMERICALCUT*NUMERICALCOLOR         0.000122      0.000109     1.124595      0.2608
NUMERICALCUT*NUMERICALCLARITY       0.001098      0.000123     8.954261      0.0000
NUMERICALCUT*LOG(CARAT)            -0.002102      0.000361    -5.826661      0.0000
NUMERICALCUT                       -0.021523      0.001205    -17.86120      0.0000
NUMERICALCOLOR^2                    0.000453      6.80E-05     6.654784      0.0000
NUMERICALCOLOR*NUMERICALCLARITY    -0.000308      8.40E-05    -3.660054      0.0003
NUMERICALCOLOR*LOG(CARAT)           0.001876      0.000237     7.929443      0.0000
NUMERICALCOLOR                     -0.001799      0.000768    -2.342169      0.0192
NUMERICALCLARITY^2                  0.002286      7.35E-05     31.11894      0.0000
NUMERICALCLARITY*LOG(CARAT)         0.003184      0.000266     11.97124      0.0000
NUMERICALCLARITY                   -0.023094      0.000829    -27.87292      0.0000
LOG(CARAT)^2                        0.028311      0.000707     40.05595      0.0000
LOG(CARAT)                          0.009504      0.001965     4.836121      0.0000

R-squared 0.072734    Adjusted R-squared 0.072467    S.E. of regression 0.042974
Sum squared resid 89.62367    Log likelihood 83906.19    F-statistic 271.9102
Prob(F-statistic) 0.000000    Durbin-Watson stat 1.997331
Mean dependent var 0.021304    S.D. dependent var 0.044621
Akaike info criterion -3.456153    Schwarz criterion -3.453437    Hannan-Quinn criter. -3.455301
Figure 12 residuals
Thirdly, the t-statistics of all variables in the regression output are large enough, so the
model does not have a multicollinearity problem.
Fourthly, the covariance between cut and the residuals is -1.06 * 10^-14, between color
and the residuals -1.13 * 10^-14, between clarity and the residuals -1.67 * 10^-14, and
between log(carat) and the residuals -1.58 * 10^-14. These are so small that there is no
stochastic-explanatory-variable problem, and we do not have to bring variables like x,
y, z, depth and table into the model as instruments.
In the end, we get the model by the traditional statistical approach:
log(price) = 7.56104309911 + 0.0320045461853 × numericalCut
+ 0.0778686923705 × numericalColor
+ 0.123285127953 × numericalClarity
+ 1.87768943035 × log(carat)
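This fitted equation can be turned into a price predictor directly; a sketch (the example grades at the end are hypothetical, not taken from the report):

```python
import math

# Coefficients of the fitted econometric model, copied from the text above.
B0, B_CUT, B_COLOR, B_CLARITY, B_LOGCARAT = (
    7.56104309911, 0.0320045461853, 0.0778686923705,
    0.123285127953, 1.87768943035)

def predict_price(cut, color, clarity, carat):
    """Predicted price in dollars from the numeric grades and carat weight."""
    log_price = (B0 + B_CUT * cut + B_COLOR * color
                 + B_CLARITY * clarity + B_LOGCARAT * math.log(carat))
    return math.exp(log_price)

# e.g. an Ideal cut (5), color G (4), clarity VS2 (4), 0.5-carat diamond
print(round(predict_price(5, 4, 4, 0.5)))
```

Note the exponentiation at the end: the regression models log(price), so predictions must be mapped back to dollars.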
Assume we have a model log(y) = β0 + β1 × x. Then Δlog(y) = β1 × Δx, and when
Δx → 0,

Δy / y = β1 × Δx = (100 × β1) × Δx %
So when cut is upgraded to the adjacent higher grade, price increases by 3.20%;
when color is upgraded one grade, price increases by 7.79%;
when clarity is upgraded one grade, price increases by 12.33%;
and when carat increases by 1%, price increases by 1.88%.
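These percentages read the coefficients linearly (100 × β); the exact effect of a full one-grade step is exp(β) − 1, and a quick check shows the two agree closely at these magnitudes (coefficients from the fitted model above):

```python
import math

coefs = {"cut": 0.032005, "color": 0.077869, "clarity": 0.123285}
for name, b in coefs.items():
    approx = 100 * b                  # the linearized reading used in the text
    exact = 100 * math.expm1(b)       # exact % change for a one-grade step
    print(f"{name}: approx {approx:.2f}%, exact {exact:.2f}%")
```

The gap grows with the coefficient size, which is why the clarity effect is understated slightly more than the cut effect.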
It seems that clarity is more important than color and color more important than cut,
which contradicts the background knowledge of diamonds. One reason is that the data
set's distribution is not normal: we have only a few diamonds with a fair cut but many
with an ideal cut, as shown in Figure 2. Another reason is that cutting is always under
human control while color and clarity are determined by nature, so it is wise for
diamond dealers to exaggerate and emphasize the impact of cut to make more money.
5. Prediction by neural networks
At first, I assume the price function is continuous, so I use Matlab's neural network
toolbox to construct a one-layer network.
Since the only critical parameter we can change in the toolbox is the number of neurons
in the hidden layer, I divide the training set into a real training set of 43,692 diamonds
and a validation set of 4,854 diamonds. I vary the number of neurons from 1 to 20 and,
for each count, run 100 trials to get the best performance.
Figure 13 performance – number of neurons
As we can see in Figure 13, once the number of neurons is greater than or equal to 10,
the performance is quite stable, and with 17 neurons in the hidden layer we reach the
best result. The mean square error on the testing set (not the validation set) is
1752824777.787574.
Figure 14 the structure of the best neural network
Figure 15 Gradient, Mu and Validation checks
Figure 16 Error Histograms
Figure 17 performance during the best trail
Figure 18 regression plot
A one-hidden-layer neural network can approximate continuous functions, while a
multi-layer perceptron can also fit discontinuous functions. As mentioned in the
background section, diamond prices jump at the full- and half-carat weights, so the
price function may be discontinuous. Although figures 10 and 11 show a price function
that looks quite continuous, and Figure 1 shows that we have few diamonds weighing
0.99 or 0.49 carat, so the discontinuity may not be a problem here, it is still worth a try.
I use TensorFlow to construct a deep neural network as shown below.
Figure 18 panorama of deep neural network
Figure 19 details inside DNN (the red box in figure 18)
Since the one-layer network uses 17 neurons, it seems reasonable to use 20 neurons in
total, 10 in each of two hidden layers. After 2 hours of training, the loss on the training
set eventually becomes stable, as shown in Figure 20.
Figure 20 loss function of training set
However, its MSE on the testing set is 2693111703.808056, worse than that of the
one-layer neural network, which can be trained in half a minute. So we stop here and
conclude that one layer with 17 neurons is the better architecture.
The built-in Matlab neural network toolbox is a black box: we cannot set the learning
rate, the momentum, the number of epochs between convergence checks, or many other
crucial parameters. To get a better neural network, I modify our professor's Matlab
code for the back-propagation multi-layer perceptron and set up a model with 1 hidden
layer of 17 neurons. To keep the later comparison fair, I use 3-way cross-validation on
the training set only and map the mean square error against combinations of learning
rate and momentum.
Figure 21 learning rate – momentum – MSE
From Figure 21, we can see that the best learning rate is 0.04 and the best momentum is
0.86. With these parameters we get a quite small mean square error
(691834782.0560297) on the testing set, only about one-fifth of the statistical model's
mean square error.
6. Prediction Comparison
The author of the report Shine Bright Like a Diamond proposed 5 linear models with a
log transform:

Formula 1: log(price) = w0 + w1 × carat^(1/3)
Formula 2: log(price) = w0 + w1 × carat^(1/3) + w2 × carat
Formula 3: log(price) = w0 + w1 × carat^(1/3) + w2 × carat + w3 × cut
Formula 4: log(price) = w0 + w1 × carat^(1/3) + w2 × carat + w3 × cut + w4 × color
Formula 5: log(price) = w0 + w1 × carat^(1/3) + w2 × carat + w3 × cut + w4 × color + w5 × clarity
By using Eviews or Excel, we can get the coefficients:

Formula 1: log(price) = 2.82034553075 + 5.55851129815 × carat^(1/3)
Formula 2: log(price) = 1.03902034605 + 8.56729813795 × carat^(1/3) − 1.13661881495 × carat
Formula 3: log(price) = 0.730922296912 + 8.69718424697 × carat^(1/3) − 1.16623519277 × carat + 0.0552672749281 × cut
Formula 4: log(price) = 0.513286096807 + 8.50650043631 × carat^(1/3) − 1.03155001669 × carat + 0.0565930262223 × cut + 0.0624816274995 × color
Formula 5: log(price) = −0.572161029767 + 9.31558417329 × carat^(1/3) − 1.16644626416 × carat + 0.0329350331621 × cut + 0.0776484129798 × color + 0.122328957467 × clarity
Now we can compare his models' mean squared errors with my results on the testing set:

model                               testing set's MSE
the other report's formula 1        26875409211.834496
the other report's formula 2        11405278224.792236
the other report's formula 3        10992406214.85137
the other report's formula 4        9309886097.407425
the other report's formula 5        3445772154.118181
econometric model                   4085100041.590896
most predictive statistical model   3359615842.0380077
one-layer network                   1752824777.787574
deep neural network                 2693111703.808056
refined network                     691834782.0560297

Table 4 MSE on the testing data set
As the table above shows, the most predictive of the five formulas is formula 5, whose
MSE is 3445772154.118181, a little better than my econometric model at prediction.
However, the most significant work for statisticians is explaining what has happened
and what is going on; in other words, the advantage of a statistical model is its power
to explain the influence of the independent variables. I can conclude that when carat
increases by 1%, price increases by 1.88%, while no similar conclusion follows from
formula 5, since both carat and carat^(1/3) appear on the right-hand side.
Besides, the author of Shine Bright cannot even explain how he derived the formula or
why he set the power to 1/3. In fact, I can get a model with an even higher adjusted
R-squared and a lower mean square error on the testing set by setting the power to 0.62:

log(price) = 2.08191951735 + 9.20137897266 × carat^0.62
− 3.69894098461 × carat + 0.0333092842757 × cut
+ 0.077985772295 × color + 0.121101322659 × clarity
Similarly, this sacrifices the ability to explain the influence of a variation in carat, so I
prefer my original econometric model. Although the adjusted R-squared of the
econometric model is 0.98, which is quite impressive, its mean square error is still
about 5.9 times that of the best neural network, as shown in the table above.
7. Conclusion
Both statistical model and neural network play a big part in data mining. Statistical model
combined with data visualization graphs can give us insights of data’s relations and the
rank of importance of every variable, while neural network is good at fitting data and
predicting more precisely. Since none of them can replace the other one, we had better
use statistics to analyze the past and predict the future by neural network.
Regression output for the carat^0.62 model:

Dependent Variable: LOG(PRICE)
Method: Least Squares
Sample: 1 48546; Included observations: 48546

Variable           Coefficient   Std. Error   t-Statistic   Prob.
C                  2.081920      0.007306     284.9576      0.0000
CARAT^(0.62)       9.201379      0.017928     513.2458      0.0000
CARAT             -3.698941      0.011542    -320.4752      0.0000
NUMERICALCUT       0.033309      0.000577     57.71657      0.0000
NUMERICALCOLOR     0.077986      0.000392     199.1133      0.0000
NUMERICALCLARITY   0.121101      0.000422     287.0655      0.0000

R-squared 0.981286    Adjusted R-squared 0.981284    S.E. of regression 0.138889
Sum squared resid 936.3427    Log likelihood 26952.95    F-statistic 509036.5
Prob(F-statistic) 0.000000    Durbin-Watson stat 1.995656
Mean dependent var 7.786411    S.D. dependent var 1.015213
Akaike info criterion -1.110161    Schwarz criterion -1.109075    Hannan-Quinn criter. -1.109821

Reference
[1] Vivek Mangipudi, Diamonds are Forever
[2] Benjamin Lott, Diamond Cut's Prediction with XGBoost
[3] Jeffrey M. Wooldridge, Econometric Analysis of Cross Section and Panel Data (MIT Press)
[4] Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning
[5] Michael Nielsen, A visual proof that neural nets can compute any function