
Louis Roussos Sports Data

Rank the sports you most like to participate in, 1 = favorite, 7 = least favorite. There are n = 130 rank vectors.

> sportsranks
Baseball Football Basketball Tennis Cycling Swimming Jogging
       1        3          7      2       4        5       6
       1        3          2      5       4        7       6
       4        7          3      1       5        6       2
       3        2          1      4       7        5       6
       3        2          1      4       5        6       7
       5        7          6      4       1        3       2
       2        1          6      7       3        5       4

K-means in R

The argument centers sets the number of clusters, K. The argument nstart is the number of times the algorithm is run, each time from a different random starting set of means; the run with the smallest total within-cluster sum of squares is returned.

> kmeans(sportsranks,centers=2,nstart=10)
K-means clustering with 2 clusters of sizes 62, 68

Cluster means:
  Baseball Football Basketball   Tennis  Cycling Swimming  Jogging
1 2.451613 2.596774   3.064516 4.112903 4.709677 5.209677 5.854839
2 5.014706 5.838235   4.352941 3.632353 2.573529 2.470588 4.117647

Clustering vector:

1 1 1 2 1 2 2 2 2 2 2 1 2 1 1 2 2 1 1 1 2 1 1 2 2 1 1 2 1 2 2 2 1 1 1 1 2 1 1 2 2 2 1 2 1 2 1 1 1 1

2 1 1 2 2 1 1 1 2 1 1 1 2 2 1 1 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 1 2 1 1 1 2 2 2 2 1 2 2 2 2 2 1 1 1 1

2 2 1 1 1 1 2 2 2 1 2 2 1 2 2 2 1 2 1 2 2 2 2 1 2 1 1 1 2 1

Within cluster sum of squares by cluster:
[1] 1074.968 1288.176

Available components:
[1] "cluster" "centers" "withinss" "size"
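To try these commands without the original data, the sketch below builds a stand-in matrix of random rank vectors and inspects the components returned by kmeans. The name ranks.sim and the simulated values are assumptions for illustration only; they are not the survey data.

```r
# Hypothetical stand-in for sportsranks: 130 random permutations of 1..7.
set.seed(1)
sports <- c('Baseball','Football','Basketball','Tennis',
            'Cycling','Swimming','Jogging')
ranks.sim <- t(replicate(130, sample(7)))   # 130 x 7 matrix of ranks
colnames(ranks.sim) <- sports

km2 <- kmeans(ranks.sim, centers=2, nstart=10)

km2$size            # number of observations in each cluster
km2$centers         # 2 x 7 matrix of cluster means
km2$withinss        # within-cluster sum of squares, one per cluster
table(km2$cluster)  # same counts as km2$size
```

Because the simulated ranks are pure noise, the two cluster means will not show the team-sports pattern seen in the real data.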

Getting clusters of size K = 2, ..., 10

kms <- vector('list',10)
for(K in 2:10) {
  kms[[K]] <- kmeans(sportsranks,centers=K,nstart=10)
}

K = 1     BaseB  FootB  BsktB  Ten   Cyc   Swim  Jog
Group 1   3.79   4.29   3.74   3.86  3.59  3.78  4.95

K = 2     BaseB  FootB  BsktB  Ten   Cyc   Swim  Jog
Group 1   5.01   5.84   4.35   3.63  2.57  2.47  4.12
Group 2   2.45   2.60   3.06   4.11  4.71  5.21  5.85

K = 3     BaseB  FootB  BsktB  Ten   Cyc   Swim  Jog
Group 1   2.33   2.53   3.05   4.14  4.76  5.33  5.86
Group 2   4.94   5.97   5.00   3.71  2.90  3.35  2.13
Group 3   5.00   5.51   3.76   3.59  2.46  1.90  5.78

K = 4     BaseB  FootB  BsktB  Ten   Cyc   Swim  Jog
Group 1   5.10   5.47   3.75   3.60  2.40  1.90  5.78
Group 2   2.30   2.10   2.65   5.17  4.75  5.35  5.67
Group 3   2.40   3.75   3.90   1.85  4.85  5.20  6.05
Group 4   4.97   6.00   5.07   3.80  2.80  3.23  2.13

K = 2: Group 1 likes swimming and cycling, while group 2 likes the team sports, baseball, football, and basketball.

K = 3: Group 1 appears to be about the same as the team-sports group from K = 2, while groups 2 and 3 both like swimming and cycling. The difference is that group 3 does not like jogging, while group 2 does.

K = 4: The team-sports group has split into one that likes tennis (group 3), and one that doesn't (group 2).

Plotting two clusters

The idea is to project the observations onto the subspace (which is just a line) that goes through the two clusters' mean vectors. The vector

$$z = \frac{\hat\mu_1 - \hat\mu_2}{\|\hat\mu_1 - \hat\mu_2\|}$$

is the unit vector pointing from $\hat\mu_2$ to $\hat\mu_1$. Then using $z$ as an axis, the projections of the observations onto $z$ have coordinates

$$w_i = x_i z', \quad i = 1, \ldots, N.$$
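A minimal sketch of this projection in R, assuming a data matrix with observations in rows and a two-cluster kmeans fit. The simulated data and the names x, km, d, z and w are illustrative assumptions, not part of the original analysis.

```r
set.seed(2)
# Hypothetical data: two groups of 7-dimensional observations.
x <- rbind(matrix(rnorm(50*7, mean=0), 50, 7),
           matrix(rnorm(50*7, mean=3), 50, 7))
km <- kmeans(x, centers=2, nstart=10)

d <- km$centers[1,] - km$centers[2,]   # difference of the two mean vectors
z <- d / sqrt(sum(d^2))                # unit vector from mean 2 to mean 1
w <- as.vector(x %*% z)                # w_i = x_i z'

hist(w, xlab='W', main='Projections onto z')
```

By construction the cluster 1 observations sit to the right of the cluster 2 observations on average, since the two projected means differ by the distance between the mean vectors.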

The histogram

[Figure: two panels with frequency on the vertical axis and a horizontal axis running from −6 to 6; the seven sports (Baseball, Football, Basketball, Tennis, Cycling, Swimming, Jogging) appear as labels.]

Plot for K=3

If K = 3, then the three means lie in a plane, hence we would like to project the observations onto that plane. One approach is to use principal components on the means. With

$$Z = \begin{pmatrix} \hat\mu_1 \\ \hat\mu_2 \\ \hat\mu_3 \end{pmatrix},$$

we apply the spectral decomposition to the sample covariance matrix of $Z$:

$$Z' H_3 Z = G L G', \tag{1}$$

where $G$ is orthogonal and $L$ is diagonal. The diagonals of $L$ here are 11.77, 4.07, and five zeros. We then rotate the data and the means using $G$,

$$W = XG \quad \text{and} \quad W^{(\text{means})} = ZG.$$

Only the first two columns in each matrix are relevant.
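The rotation can be sketched in R as follows. This assumes H_3 is the 3 x 3 centering matrix and uses simulated data; the names x, km, z, h3, eg, g, w and w.means are hypothetical.

```r
set.seed(3)
# Hypothetical data: three groups in 7 dimensions.
x <- rbind(matrix(rnorm(40*7), 40, 7),
           matrix(rnorm(40*7, mean=3), 40, 7),
           cbind(matrix(rnorm(40*3, mean=4), 40, 3),
                 matrix(rnorm(40*4), 40, 4)))
km <- kmeans(x, centers=3, nstart=10)

z  <- km$centers                       # 3 x 7 matrix of cluster means
h3 <- diag(3) - matrix(1/3, 3, 3)      # centering matrix (assumed meaning of H_3)
eg <- eigen(t(z) %*% h3 %*% z, symmetric=TRUE)
g  <- eg$vectors                       # orthogonal matrix G
round(eg$values, 2)                    # at most two values are nonzero

w       <- x %*% g                     # rotated data
w.means <- z %*% g                     # rotated means

# Plot the first two rotated coordinates, with the means as large dots.
plot(w[,1:2], col=km$cluster, xlab='Var 1', ylab='Var 2')
points(w.means[,1:2], pch=19, cex=2)
```

Three centered means span at most a two-dimensional space, which is why only two eigenvalues can be nonzero.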

The Plot

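[Figure: the observations and the three cluster means plotted in the first two rotated coordinates, axis running from −4 to 4; the seven sports (Baseball, Football, Basketball, Tennis, Cycling, Swimming, Jogging) appear as labels.]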

The sums of squares

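[Figure: $SS_K$ plotted against K, horizontal axis from 2 to 10.]

$$SS_K = \mathrm{obj}(\hat\mu_1, \ldots, \hat\mu_K) = \sum_{k=1}^{K} \sum_{\{i \,|\, y_i = k\}} \|x_i - \hat\mu_k\|^2.$$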
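One way to collect these sums of squares in R is sketched below, using the withinss component of each kmeans fit. The matrix x is a simulated stand-in for sportsranks, and the names ss, km and ss2 are assumptions for illustration.

```r
set.seed(4)
x <- t(replicate(130, sample(7)))      # hypothetical stand-in for sportsranks

kms <- vector('list', 10)
for(K in 2:10) kms[[K]] <- kmeans(x, centers=K, nstart=10)

# SS for K = 1: a single cluster whose mean is the overall mean vector.
ss <- sum(sweep(x, 2, colMeans(x))^2)
for(K in 2:10) ss <- c(ss, sum(kms[[K]]$withinss))

# Direct check of the formula for K = 2: squared distances to assigned means.
km  <- kms[[2]]
ss2 <- sum((x - km$centers[km$cluster,])^2)

plot(1:10, ss, type='b', xlab='K', ylab='SS')
```

The value ss2 agrees with ss[2], since withinss is the inner sum of the formula evaluated cluster by cluster.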

The reduction of sums of squares

[Figure: 1-SS[k]/SS[k-1] plotted against K, horizontal axis from 2 to 10.]

$$1 - \frac{SS_K}{SS_{K-1}}$$
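As a small worked example of this ratio, the sketch below applies it to a made-up vector of sums of squares. The numbers in ss are invented for illustration and do not come from the sports data.

```r
# Hypothetical SS values for K = 1, ..., 5 (made up for illustration).
ss <- c(100, 60, 45, 40, 38)

K <- 2:length(ss)
reduction <- 1 - ss[K]/ss[K-1]   # proportional drop when going from K-1 to K
round(reduction, 3)              # 0.400 0.250 0.111 0.050
```

Here the largest proportional drop comes from moving to two clusters, and the gains shrink quickly after that.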

Silhouettes in R

The function silhouette.km finds the silhouettes for a given clustering, then sort.silhouette orders them, first by cluster number, then by value. To plot the silhouettes for K = 2, ..., 10:

sil.ave <- NULL # To collect silhouette's means for each K
par(mfrow=c(3,3))
for(K in 2:10) {
  sil <- silhouette.km(sportsranks,kms[[K]]$centers)
  sil.ave <- c(sil.ave,mean(sil))
  ssil <- sort.silhouette(sil,kms[[K]]$cluster)
  plot(ssil,type='h',xlab='Observations',ylab='Silhouettes')
  title(paste('K =',K))
}

The sil.ave calculated above can then be used to obtain the plot of averages:

plot(2:10,sil.ave,type='l',xlab='K',ylab='Average silhouette width')
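The functions silhouette.km and sort.silhouette are not part of base R and are not reproduced here. As a substitute, the sketch below uses silhouette from the cluster package, which works from pairwise distances, so its values need not match those of silhouette.km. The data and the names x, dx, km and sil are hypothetical.

```r
library(cluster)   # provides silhouette()

set.seed(5)
# Hypothetical data: two groups in 7 dimensions.
x <- rbind(matrix(rnorm(60*7), 60, 7),
           matrix(rnorm(70*7, mean=2), 70, 7))
dx <- dist(x)      # Euclidean distances between observations

sil.ave <- NULL    # average silhouette width for each K
for(K in 2:6) {
  km  <- kmeans(x, centers=K, nstart=10)
  sil <- silhouette(km$cluster, dx)
  sil.ave <- c(sil.ave, mean(sil[,'sil_width']))
}

plot(2:6, sil.ave, type='l', xlab='K', ylab='Average silhouette width')
```

As with the course functions, a larger average silhouette width suggests a better-separated clustering.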

Plotting the silhouettes

[Figure: silhouette plots with the observations on the horizontal axis (ticks from 0 to 120). The four panels recovered here report average silhouettes of 0.625, 0.555, 0.508, and 0.534.]

Plotting the silhouettes’ averages

[Figure: average silhouette width plotted against K, horizontal axis from 2 to 10.]

K = 2 seems like a good choice.

Model-based clustering – Car data

The data consist of size measurements on 111 automobiles. The variables include length, wheelbase, width, height, front and rear head room, front leg room, rear seating, front and rear shoulder room, and luggage area. The data are in the file cars. The variables have been normalized to have medians of 0 and median absolute deviations (MAD) of 1.4826 (the MAD for a N(0, 1)).

R for model-based clustering

The R function we use is Mclust, in the package mclust. The basic command is simple:

mcars <- Mclust(cars)

There are many options for plotting in the package. To see a plot of the BIC's, use

plot(mcars,cars,what='BIC')

You have to click on the graphics window, or hit enter, to reveal the plot. Note that the BIC's in this function are actually the −BIC's, so we want to maximize them.

Plotting the BIC’s

[Figure: BIC plotted against the number of components, horizontal axis from 2 to 8.]

K = 2, VVV is best.

What is VVV?

To find the name of the best model:

> mcars
best model: ellipsoidal, unconstrained with 2 components

That K = 2 is easy to see. The assumptions on the covariance matrices are "ellipsoidal," which means they have no special structure, and "unconstrained," which means they are not assumed equal for the two groups, $\Sigma_1 \neq \Sigma_2$.

To plot variable 1 (length) versus variable 4 (height), use

plot(mcars,cars,what='classification',dimens=c(1,4))
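For readers without the cars file, the following sketch runs Mclust on simulated data and reads off the selected model from the fitted object. The data y and the name my are hypothetical, and the code is skipped if the mclust package is not installed.

```r
# Skip quietly if mclust is not installed.
if(requireNamespace('mclust', quietly=TRUE)) {
  library(mclust)
  set.seed(6)
  # Hypothetical data: two well-separated groups in 3 dimensions.
  y <- rbind(matrix(rnorm(100*3), 100, 3),
             matrix(rnorm(100*3, mean=6), 100, 3))
  colnames(y) <- c('V1','V2','V3')

  my <- Mclust(y)
  my$modelName               # code of the selected covariance model
  my$G                       # selected number of components
  table(my$classification)   # number of observations in each cluster
}
```

The components modelName and G hold the same information that the printed summary reports in words.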

Plotting the clusters

[Figure: classification scatter plots of the two clusters; the surviving axis labels are Length and RearHd.]

The cars in group 2

                       Rear Head   Rear Seating   Rear Shoulder   Luggage
Chevrolet Corvette          −4.0         −19.67          −28.00      −8.0
Honda Civic CRX             −4.0         −19.67          −28.00      −8.0
Mazda MX5 Miata             −4.0         −19.67          −28.00      −8.0
Mazda RX7                   −4.0         −19.67          −28.00      −8.0
Nissan 300ZX                −4.0         −19.67          −28.00      −8.0
Chevrolet Astro              2.5           0.33           −1.75      −8.0
Chevrolet Lumina APV         2.0           3.33            4.00      −8.0
Dodge Caravan                2.5          −0.33           −6.25      −8.0
Dodge Grand Caravan          2.0           2.33            3.25      −8.0
Ford Aerostar                1.5           1.67            4.25      −8.0
Mazda MPV                    3.5           0.00           −5.50      −8.0
Mitsubishi Wagon             2.5         −19.00            2.50      −8.0
Nissan Axxess                2.5           0.67            1.25      −8.5
Nissan Van                   3.0         −19.00            2.25      −8.0
Volkswagen Vanagon           7.0           6.33           −7.25      −8.0

Just group 1

Redo on just the group 1 automobiles:

cars1 <- cars[mcars$classification==1,]
mcars1 <- Mclust(cars1)
mcars1
best model: ellipsoidal multivariate normal with 1 components

The best is one big cluster.

The models in mclust

Code   Description                                           $\Sigma_k$
EII    spherical, equal volume                               $\sigma^2 I_p$
VII    spherical, unequal volume                             $\sigma_k^2 I_p$
EEI    diagonal, equal volume and shape                      $\Lambda$
VEI    diagonal, varying volume, equal shape                 $c_k \Delta$
EVI    diagonal, equal volume, varying shape                 $c \Delta_k$
VVI    diagonal, varying volume and shape                    $\Lambda_k$
EEE*   ellipsoidal, equal volume, shape, and orientation     $\Sigma$
EEV    ellipsoidal, equal volume and equal shape             $\Gamma_k \Lambda \Gamma_k'$
VEV    ellipsoidal, equal shape                              $c_k \Gamma_k \Delta \Gamma_k'$
VVV*   ellipsoidal, varying volume, shape, and orientation   arbitrary

Here, $\Lambda$'s are diagonal matrices with positive diagonals, $\Delta$'s are diagonal matrices with positive diagonals whose product is 1, $\Gamma$'s are orthogonal matrices, $\Sigma$'s are arbitrary nonnegative definite symmetric matrices, and $c$'s are positive scalars. A subscript $k$ on an element means the groups can have different values for that element. No subscript means that element is the same for each group.
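To make the decomposition concrete, the sketch below builds two covariance matrices of the VEV form in two dimensions: a shared shape matrix, with a different scalar and a different orthogonal matrix for each group. The helper rot and all numerical values are hypothetical choices for illustration.

```r
# Hypothetical helper: 2 x 2 rotation (orthogonal) matrix for angle theta.
rot <- function(theta) {
  matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
}

delta  <- diag(c(4, 1/4))   # shared shape: positive diagonals with product 1
sigma1 <- 1 * rot(0)    %*% delta %*% t(rot(0))      # scalar 1, no rotation
sigma2 <- 3 * rot(pi/4) %*% delta %*% t(rot(pi/4))   # scalar 3, rotated 45 degrees

# With the shape's diagonals multiplying to 1, the determinant is the
# scalar raised to the dimension (here p = 2).
det(sigma1)   # 1, up to rounding
det(sigma2)   # 9, up to rounding

# Dividing the eigenvalues by the scalar recovers the shared shape.
eigen(sigma2, symmetric=TRUE)$values / 3   # 4 and 0.25, up to rounding
```

The two matrices describe ellipses with the same proportions but different sizes and orientations, which is what "equal shape" with varying volume and orientation means in the table.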

Hierarchical clustering of the sports

plclust(hclust(dist(t(sportsranks))))

[Figure: dendrogram of the seven sports, complete linkage.]

Hierarchical clustering of the individuals

par(mfrow=c(2,1))
dxs <- dist(sportsranks) # Gets Euclidean distances
lbl <- rep(' ',130) # Prefer no labels for the individuals
plclust(hclust(dxs),xlab='Complete linkage',sub=' ',labels=lbl)
plclust(hclust(dxs,method='single'),xlab='Single linkage',sub=' ',labels=lbl)

[Figure: two dendrograms of the 130 individuals, complete linkage (top) and single linkage (bottom); vertical axis: Height.]
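In recent versions of R, plclust appears to have been retired in favor of calling plot on an hclust object. The sketch below uses plot instead and adds cutree to turn a tree into group labels. The matrix x is a simulated stand-in for sportsranks, and the names hc.sports, hc.indiv and groups are hypothetical.

```r
set.seed(7)
x <- t(replicate(130, sample(7)))   # hypothetical stand-in for sportsranks
colnames(x) <- c('Baseball','Football','Basketball','Tennis',
                 'Cycling','Swimming','Jogging')

hc.sports <- hclust(dist(t(x)))     # cluster the 7 sports (columns)
hc.indiv  <- hclust(dist(x))        # cluster the 130 individuals (rows)

par(mfrow=c(2,1))
plot(hc.sports, xlab='Complete linkage', sub=' ')
plot(hc.indiv, labels=FALSE, xlab='Complete linkage', sub=' ')

groups <- cutree(hc.indiv, k=2)     # cut the tree into two groups
table(groups)                       # group sizes
```

The default method for hclust is complete linkage, which matches the first dendrogram in each of the slides above.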
