the mclust package - cmu statisticsbrian/724/week14/mclust.pdf · 2005-03-13 · the mclust package...

The mclust Package

January 18, 2005

Version 2.1-8

Author C. Fraley and A.E. Raftery, Dept. of Statistics, University of Washington.

Title Model-based cluster analysis

Description Model-based cluster analysis: the 2002 version of MCLUST

Depends R (>= 1.7.0)

License See http://www.stat.washington.edu/mclust/license.txt

Maintainer Ron Wehrens <[email protected]>

URL http://www.stat.washington.edu/mclust

R topics documented:

Defaults.Mclust
EMclust
EMclustN
Mclust
bic
bicE
bicEMtrain
cdens
cdensE
chevron
clPairs
classError
compareClass
coordProj
cv1EMtrain
decomp2sigma
dens
density
diabetes
em
emE
estep
estepE
grid1
hc
hcE
hclass
hypvol
lansing
map
mapClass
mclust-internal
mclust1Dplot
mclust2Dplot
mclustDA
mclustDAtest
mclustDAtrain
mclustOptions
me
meE
mstep
mstepE
mvn
mvnX
partconv
partuniq
plot.Mclust
plot.mclustDA
randProj
sigma2decomp
sim
simE
spinProj
summary.EMclust
summary.EMclustN
summary.Mclust
summary.mclustDAtest
summary.mclustDAtrain
surfacePlot
uncerPlot
unmap
Index

Defaults.Mclust List of values controlling defaults for some MCLUST functions.

Description

A named list of values including tolerances for singularity and convergence assessment, and an enumeration of models used as defaults in MCLUST functions.

Details

A function mclustOptions is supplied for assigning values to the .Mclust list.

Value

A list with the following components:

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is the relative machine precision .Machine$double.eps, which is approximately 2e-16 on IEEE-compliant machines.

tol A vector of length two giving relative convergence tolerances for the loglikelihood and for parameter convergence in the inner loop for models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), respectively. The default is c(1.e-5,1.e-5).

itmax A vector of length two giving integer limits on the number of EM iterations and on the number of iterations in the inner loop for models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), respectively. The default is c(Inf,Inf), allowing termination to be completely governed by tol.

equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. Default: equalPro = FALSE.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. Default: warnSingular = TRUE.

emModelNames A vector of character strings indicating the models to be used for multivariate data in functions such as EMclust and mclustDAtrain that involve multiple models. The default is all of the multivariate models available in MCLUST:

"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

hcModelName A vector of two character strings giving the name of the model to be used in the hierarchical clustering phase for univariate and multivariate data, respectively, in EMclust and EMclustN. The default is c("V","VVV"), giving the unconstrained model in each case.

symbols A vector whose entries are either integers corresponding to graphics symbols or single characters for plotting for classifications. Classes are assigned symbols in the given order.
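
The interplay of tol and itmax can be illustrated with a small base R sketch. It is not MCLUST code: the helper iterate_until, the toy newton step, and the particular form of the relative-change test (including the 1 + |new| guard) are assumptions made for illustration, since this manual does not spell out the exact test the package applies.

```r
# Hypothetical helper (not part of MCLUST): repeat `step` until the relative
# change in the objective drops below `tol` or `itmax` iterations are reached.
iterate_until <- function(step, state, tol = 1e-5, itmax = Inf) {
  old <- -Inf
  iter <- 0
  repeat {
    iter <- iter + 1
    state <- step(state)
    new <- state$objective
    # Relative change; the 1 + |new| denominator is an assumed guard
    # against division by zero.
    converged <- is.finite(old) && abs(new - old) / (1 + abs(new)) < tol
    if (converged || iter >= itmax) break
    old <- new
  }
  list(state = state, iterations = iter, converged = converged)
}

# Toy step: Newton iteration for sqrt(2), using the iterate as the objective.
newton <- function(s) {
  x <- (s$x + 2 / s$x) / 2
  list(x = x, objective = x)
}

fit    <- iterate_until(newton, list(x = 1, objective = 1))
capped <- iterate_until(newton, list(x = 1, objective = 1), itmax = 2)
```

With itmax = Inf the loop stops only when the tolerance is met, which mirrors the default described above; a finite itmax can end the loop before convergence, as in capped.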

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association. See http://www.stat.washington.edu/tech.reports (No. 380, 2000).

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/tech.reports.

See Also

mclustOptions, EMclust, mclustDAtrain, em, me, estep, mstep

Examples

n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
even <- odd + 1
test <- mclustDAtest(x[even, ], train) ## compute model densities

data(iris)
irisMatrix <- iris[,1:4]
irisClass <- iris[,5]

.Mclust

.Mclust <- mclustOptions(tol = 1.e-6, emModelNames = c("VII", "VVI", "VVV"))

.Mclust
irisBic <- EMclust(irisMatrix)
summary(irisBic, irisMatrix)
.Mclust <- mclustOptions() # restore defaults
.Mclust

EMclust BIC for Model-Based Clustering

Description

BIC for EM initialized by hierarchical clustering for parameterized Gaussian mixture models.

Usage

EMclust(data, G, emModelNames, hcPairs, subset, eps, tol, itmax, equalPro,
        warnSingular, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

G An integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated. The default is 1:9.

emModelNames A vector of character strings indicating the models to be fitted in the EM phase of clustering. Possible models:

"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

The default is .Mclust$emModelNames.

hcPairs A matrix of merge pairs for hierarchical clustering such as produced by function hc. The default is to compute a hierarchical clustering tree by applying function hc with modelName = .Mclust$hcModelName[1] to univariate data and modelName = .Mclust$hcModelName[2] to multivariate data or a subset as indicated by the subset argument. The hierarchical clustering results are used as starting values for EM.

subset A logical or numeric vector specifying the indices of a subset of the data to be used in the initial hierarchical clustering phase.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol.

itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.

equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular=FALSE.

... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.
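
The purpose of the ... argument can be seen in a small base R sketch. The functions scaled_sum and strict_sum are hypothetical stand-ins, not MCLUST functions; they only show how a list with surplus elements behaves under do.call with and without ... in the signature.

```r
# Hypothetical function shaped like the MCLUST interfaces:
# named arguments plus `...` to absorb anything else in the list.
scaled_sum <- function(x, scale = 1, ...) scale * sum(x)

# A list carrying an extra element ("label") that scaled_sum does not use.
args <- list(x = 1:4, scale = 2, label = "ignored")
result <- do.call("scaled_sum", args)   # the extra element is swallowed by `...`

# Without `...` the same list call stops with an "unused argument" error.
strict_sum <- function(x, scale = 1) scale * sum(x)
failed <- inherits(try(do.call("strict_sum", args), silent = TRUE), "try-error")
```

This is why the output list of one fitting function can be handed directly to another function even when it holds components the second function does not need.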

Value

Bayesian Information Criterion for the specified mixture models and numbers of clusters. Auxiliary information is returned as attributes.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

summary.EMclust, EMclustN, hc, me, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])

irisBic <- EMclust(irisMatrix)
irisBic
plot(irisBic)

irisBic <- EMclust(irisMatrix, subset = sample(1:nrow(irisMatrix), 100))
irisBic
plot(irisBic)

EMclustN BIC for Model-Based Clustering with Poisson Noise

Description

BIC for EM initialized by hierarchical clustering for parameterized Gaussian mixture models with Poisson noise.

Usage

EMclustN(data, G, emModelNames, noise, hcPairs, eps, tol, itmax,
         equalPro, warnSingular=FALSE, Vinv, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

G An integer vector specifying the numbers of MVN (Gaussian) mixture components (clusters) for which the BIC is to be calculated. The default is 0:9, where 0 indicates only a noise component.

emModelNames A vector of character strings indicating the models to be fitted in the EM phase of clustering. Possible models:

"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

The default is .Mclust$emModelNames.

noise A logical or numeric vector indicating whether or not observations are initially estimated to be noise in the data. If there is no noise, EMclust should be used rather than EMclustN.

hcPairs A matrix of merge pairs for hierarchical clustering such as produced by function hc. The default is to compute a hierarchical clustering tree by applying function hc with modelName = .Mclust$hcModelName[1] to univariate data and modelName = .Mclust$hcModelName[2] to multivariate data or a subset as indicated by the subset argument. The hierarchical clustering results are used as starting values for EM.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol.

itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.

equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.

Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular=FALSE.

... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.
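
To make the Vinv argument concrete, here is a rough base R sketch of one simple way to estimate a reciprocal hypervolume: the reciprocal of the volume of the axis-aligned bounding box of the data. The helper bbox_vinv is hypothetical and is not the package's hypvol, which may use a different rule.

```r
# Hypothetical helper (not the package's hypvol): reciprocal volume of the
# axis-aligned bounding box of the data.
bbox_vinv <- function(data) {
  data <- as.matrix(data)                  # a vector becomes an n x 1 matrix
  ranges <- apply(data, 2, range)          # 2 x d matrix: column minima and maxima
  1 / prod(ranges[2, ] - ranges[1, ])      # 1 / (product of the side lengths)
}

x <- cbind(c(0, 2, 1), c(0, 1, 4))         # bounding box is 2 wide and 4 tall
vinv <- bbox_vinv(x)                       # reciprocal of the box area
vinv1 <- bbox_vinv(c(1, 3))                # univariate data: an interval of length 2
```

A value computed this way could be supplied as Vinv in place of the default, if a bounding box is judged a reasonable description of the data region.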

Value

Bayesian Information Criterion for the specified mixture models and numbers of clusters. Auxiliary information is returned as attributes.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

summary.EMclustN, EMclust, hc, me, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

b <- apply( irisMatrix, 2, range)
n <- 450
set.seed(0)
poissonNoise <- apply(b, 2, function(x, n=n)
                      runif(n, min = x[1]-0.1, max = x[2]+.1), n = n)

set.seed(0)
noiseInit <- sample(c(TRUE,FALSE),size=150+450,replace=TRUE,prob=c(3,1))
Bic <- EMclustN(data=rbind(irisMatrix, poissonNoise), noise = noiseInit)
Bic
plot(Bic)

Mclust Model-Based Clustering

Description

Clustering via EM initialized by hierarchical clustering for parameterized Gaussian mixture models. The number of clusters and the clustering model are chosen to maximize the BIC.

Usage

Mclust(data, minG, maxG)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

minG An integer vector specifying the minimum number of mixture components (clusters) to be considered. The default is 1 component.

maxG An integer vector specifying the maximum number of mixture components (clusters) to be considered. The default is 9 components.

Value

A list representing the best model (according to BIC) for the given range of numbers of clusters. The following components are included:

BIC A matrix giving the BIC value for each model (rows) and number of clusters (columns).

bic A scalar giving the optimal BIC value.

modelName The MCLUST name for the best model according to BIC.

classification The classification corresponding to the optimal BIC value.

uncertainty The uncertainty in the classification corresponding to the optimal BIC value.

mu For multidimensional models, a matrix whose columns are the means of each group in the best model. For one-dimensional models, a vector whose entries are the means for each group in the best model.

sigma For multidimensional models, a three dimensional array in which sigma[,,k] gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro The mixing probabilities for each component in the best model.

z A matrix whose [i,k]th entry is the probability that observation i belongs to the kth component in the model. The optimal classification is derived from this, choosing the class to be the one giving the maximum probability.

loglik The log likelihood for the data under the best model.
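
The relationship between z and the classification can be sketched in a few lines of base R. The z values below are made up for illustration, and the uncertainty formula (one minus the largest membership probability) is an assumption about how that component is defined, since the manual does not state it here.

```r
# z[i, k]: probability that observation i belongs to component k
# (made-up values, each row sums to one).
z <- rbind(c(0.90, 0.05, 0.05),
           c(0.20, 0.70, 0.10),
           c(0.30, 0.30, 0.40))

# Classification: the component with the largest probability in each row.
classification <- apply(z, 1, which.max)

# Assumed definition of uncertainty: one minus that largest probability.
uncertainty <- 1 - apply(z, 1, max)
```

Rows whose largest probability is close to one are classified with little uncertainty, while a row such as the third, with nearly equal probabilities, is classified with high uncertainty.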

Details

The following models are compared in Mclust:

"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)

"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation

Mclust is intended to combine EMclust and its summary in a simplified one-step model-based clustering function. EMclust and its summary provide more flexibility, including the choice of models.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

plot.Mclust, EMclust

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
irisMclust <- Mclust(irisMatrix)

## Not run: plot(irisMclust,irisMatrix)

bic BIC for Parameterized MVN Mixture Models

Description

Compute the BIC (Bayesian Information Criterion) for parameterized mixture models given the loglikelihood, the dimension of the data, and number of mixture components in the model.

Usage

bic(modelName, loglik, n, d, G, ...)

Arguments

modelName A character string indicating the model. Possible models:

"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

loglik The loglikelihood for a data set with respect to the MVN mixture model specified in the modelName argument.

n The number of observations in the data used to compute loglik.

d The dimension of the data used to compute loglik.

G The number of components in the MVN mixture model used to compute loglik.

... Arguments for diagonal-specific methods, in particular

equalPro A logical variable indicating whether or not the components in the model are assumed to be present in equal proportion. The default is .Mclust$equalPro.

noise A logical variable indicating whether or not the model includes an optional Poisson noise component. The default is to assume that the model does not include a noise component.

Value

The BIC or Bayesian Information Criterion for the given input arguments.
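
As a rough illustration of how the inputs combine, the following base R sketch computes a BIC for the "VVI" model. It assumes the convention BIC = 2 * loglik - (number of parameters) * log(n), under which larger values are better (consistent with Mclust choosing the model that maximizes the BIC). The helper bic_vvi_sketch and its parameter count are written for this illustration and are not the package's bicVVI.

```r
# Hypothetical helper (not the package's bicVVI), assuming
# BIC = 2 * loglik - npar * log(n).
bic_vvi_sketch <- function(loglik, n, d, G, equalPro = FALSE) {
  n_mean <- G * d                       # one mean vector per component
  n_var  <- G * d                       # one diagonal covariance per component
  n_pro  <- if (equalPro) 0 else G - 1  # mixing proportions sum to one
  2 * loglik - (n_mean + n_var + n_pro) * log(n)
}

# Made-up loglikelihood for a data set the size of iris (n = 150, d = 4).
value       <- bic_vvi_sketch(loglik = -300, n = 150, d = 4, G = 3)
value_equal <- bic_vvi_sketch(loglik = -300, n = 150, d = 4, G = 3, equalPro = TRUE)
```

For the same loglikelihood, assuming equal proportions removes G - 1 parameters and therefore yields a larger BIC under this convention.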

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

bicE, ..., bicVVV, EMclust, estep, mclustOptions, do.call

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

n <- nrow(irisMatrix)
d <- ncol(irisMatrix)
G <- 3

emEst <- me(modelName="VVI", data=irisMatrix, unmap(irisClass))
names(emEst)

args(bic)
bic(modelName="VVI",loglik=emEst$loglik,n=n,d=d,G=G)
## Not run: do.call("bic", emEst) ## alternative call

bicE BIC for a Parameterized MVN Mixture Model

Description

Compute the BIC (Bayesian Information Criterion) for a parameterized mixture model given the loglikelihood, the dimension of the data, and number of mixture components in the model.

Usage

bicE(loglik, n, G, equalPro, noise = FALSE, ...)
bicV(loglik, n, G, equalPro, noise = FALSE, ...)
bicEII(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVII(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicEEI(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVEI(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicEVI(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVVI(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicEEE(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicEEV(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVEV(loglik, n, d, G, equalPro, noise = FALSE, ...)
bicVVV(loglik, n, d, G, equalPro, noise = FALSE, ...)

Arguments

loglik The loglikelihood for a data set with respect to the MVN mixture model.

n The number of observations in the data used to compute loglik.

d The dimension of the data used to compute loglik.

G The number of components in the MVN mixture model used to compute loglik.

equalPro A logical variable indicating whether or not the components in the model are assumed to be present in equal proportion. The default is .Mclust$equalPro.

noise A logical variable indicating whether or not the model includes an optional Poisson noise component. The default is to assume that the model does not include a noise component.

... Catch unused arguments from a do.call call.

Value

The BIC or Bayesian Information Criterion for the MVN mixture model and data set corresponding to the input arguments.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

bic, EMclust, estepE, mclustOptions, do.call

Examples

## To run an example, see man page for bic
## Not run:
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

n <- nrow(irisMatrix)
d <- ncol(irisMatrix)
G <- 3

emEst <- meVVI(data=irisMatrix, unmap(irisClass))
names(emEst)

bicVVI(loglik=emEst$loglik, n=n, d=d, G=G)
do.call("bicVVI", emEst) ## alternative call
## End(Not run)

bicEMtrain Select models in discriminant analysis using BIC

Description

For the ten available discriminant models the BIC is calculated. The models for one-dimensional data are "E" and "V"; for higher dimensions they are "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "EEV", "VEV" and "VVV". This function is much faster than cv1EMtrain.

Usage

bicEMtrain(data, labels, modelNames)

Arguments

data A data matrix

labels Labels for each row in the data matrix

modelNames Vector of model names that should be tested.

Value

Returns a vector where each element is the BIC for the corresponding model.

Author(s)

C. Fraley

See Also

cv1EMtrain

Examples

data(lansing)
odd <- seq(from=1, to=nrow(lansing), by=2)
round(bicEMtrain(lansing[odd,-3], labels=lansing[odd, 3]), 1)

cdens Component Density for Parameterized MVN Mixture Models

Description

Computes component densities for observations in parameterized MVN mixture models.

Usage

cdens(modelName, data, mu, ...)

Arguments

modelName A character string indicating the model. Possible models:

"E" for spherical, equal variance (one-dimensional)"V" for spherical, variable variance (one-dimensional)

"EII": spherical, equal volume"VII": spherical, unequal volume"EEI": diagonal, equal volume, equal shape"VEI": diagonal, varying volume, equal shape"EVI": diagonal, equal volume, varying shape"VVI": diagonal, varying volume, varying shape"EEE": ellipsoidal, equal volume, shape, and orientation"EEV": ellipsoidal, equal volume and equal shape"VEV": ellipsoidal, equal shape"VVV": ellipsoidal, varying volume, shape, and orientation

For fitting a single Gaussian:

"X": one-dimensional
"XII": spherical
"XXI": diagonal
"XXX": ellipsoidal

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

... Arguments for model-specific functions. Specifically:

• logarithm: A logical value indicating whether or not the logarithm of the component densities should be returned. The default is to return the component densities, obtained from the log component densities by exponentiation.

• An argument describing the variance (depends on the model):

sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list with the following components:

d The dimension of the data.

G The number of components in the mixture model.

scale Either a G-vector giving the scale of the covariance (the dth root of its determinant) for each component in the mixture model, or a single numeric value if the scale is the same for each component.

shape Either a G by d matrix in which the kth column is the shape of the covariance matrix (normalized to have determinant 1) for the kth component, or a d-vector giving a common shape for all components.

orientation Either a d by d by G array whose [,,k]th entry is the orthonormal matrix of eigenvectors of the covariance matrix of the kth component, or a d by d orthonormal matrix if the mixture components have a common orientation. The orientation component of decomp can be omitted in spherical and diagonal models, for which the principal components are parallel to the coordinate axes so that the orientation matrix is the identity.

Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

sigma for the unconstrained variance model "VVV". A d by d by G matrix array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.

The form of the variance specification is the same as for the output for the em, me, or mstep methods for the specified mixture model.

• eps: A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for eps, in which case the second value is used for determining singularity in the M-step.

• warnSingular: A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

Value

A numeric matrix whose [i,j]th entry is the density of observation i in component j. The densities are not scaled by mixing proportions.
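The layout of the returned matrix can be illustrated with a minimal base-R sketch for the spherical case. The function sphericalCdens below is hypothetical and is not part of the package; it only assumes the conventions stated above (columns of mu are component means, sigmasq is a scalar or a vector with one variance per component).

```r
## Hypothetical helper (not part of mclust): component densities for a
## spherical Gaussian mixture, one row per observation, one column per
## component, not scaled by mixing proportions.
sphericalCdens <- function(data, mu, sigmasq, logarithm = FALSE) {
  data <- as.matrix(data)
  mu <- as.matrix(mu)                       # d x G: columns are means
  n <- nrow(data); d <- ncol(data); G <- ncol(mu)
  sigmasq <- rep(sigmasq, length.out = G)   # a scalar means common variance
  logDens <- matrix(NA_real_, n, G)
  for (k in seq_len(G)) {
    ## squared Euclidean distance of each observation from the kth mean
    sqDist <- rowSums(sweep(data, 2, mu[, k])^2)
    logDens[, k] <- -0.5 * (d * log(2 * pi * sigmasq[k]) + sqDist / sigmasq[k])
  }
  if (logarithm) logDens else exp(logDens)
}

x <- rbind(c(0, 0), c(3, 3))
mu <- cbind(c(0, 0), c(3, 3))
cd <- sphericalCdens(x, mu, sigmasq = c(1, 4))
```

Here cd[i, j] is the density of row i of x under component j; multiplying column j by the jth mixing proportion would give the unnormalized membership weights.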

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

cdensE, ..., cdensVVV, dens, EMclust, mstep, mclustDAtrain, mclustDAtest, mclustOptions, do.call

Examples

n <- 100 ## create artificial data

set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
clPairs(x, cl = xclass, sym = c("1","2")) ## display the data

set.seed(0)
I <- sample(1:(2*n)) ## random ordering of the data
x <- x[I, ]
xclass <- xclass[I]

odd <- seq(1, 2*n, by = 2)
oddBic <- EMclust(x[odd, ])
oddSumry <- summary(oddBic, x[odd, ]) ## best parameter estimates
names(oddSumry)

even <- odd + 1
temp <- cdens(modelName = oddSumry$modelName, data = x[even, ],
              mu = oddSumry$mu, decomp = oddSumry$decomp)
cbind(class = xclass[even], temp)

## alternative call

## Not run:
temp <- do.call( "cdens", c(list(data = x[even, ]), oddSumry))
cbind(class = xclass[even], temp)
## End(Not run)

cdensE Component Density for a Parameterized MVN Mixture Model

Description

Computes component densities for points in a parameterized MVN mixture model.

Usage

cdensE(data, mu, sigmasq, eps, warnSingular, logarithm = FALSE, ...)
cdensV(data, mu, sigmasq, eps, warnSingular, logarithm = FALSE, ...)
cdensEII(data, mu, sigmasq, eps, warnSingular, logarithm = FALSE, ...)
cdensVII(data, mu, sigmasq, eps, warnSingular, logarithm = FALSE, ...)
cdensEEI(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensVEI(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensEVI(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensVVI(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensEEE(data, mu, eps, warnSingular, logarithm = FALSE, ...)
cdensEEV(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensVEV(data, mu, decomp, eps, warnSingular, logarithm = FALSE, ...)
cdensVVV(data, mu, eps, warnSingular, logarithm = FALSE, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in more detail in cdens.

logarithm A logical value indicating whether or not the logarithm of the component densities should be returned. The default is to return the component densities, obtained from the log component densities by exponentiation.

... An argument giving the variance that takes one of the following forms:

decomp for models "EII" and "VII"; see above.

cholSigma see Sigma, for "EEE".

Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

cholsigma see sigma, for "VVV".

sigma for the unconstrained variance model "VVV". A d by d by G matrix array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.

The form of the variance specification is the same as for the output for the em, me, or mstep methods for the specified mixture model. Also used to catch unused arguments from a do.call call.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

Value

A numeric matrix whose [i,j]th entry is the density of observation i in component j. The densities are not scaled by mixing proportions.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

cdens, dens, EMclust, mstep, mclustOptions, do.call

Examples

n <- 100 ## create artificial data

set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
clPairs(x, cl = xclass, sym = c("1","2")) ## display the data

modelVII <- meVII(x, z = unmap(xclass))
modelVVI <- meVVI(x, z = unmap(xclass))
modelVVV <- meVVV(x, z = unmap(xclass))

names(modelVII)
args(cdensVII)
cdenVII <- cdensVII(data = x, mu = modelVII$mu, pro = modelVII$pro,
                    decomp = modelVII$decomp)

names(modelVVI)
args(cdensVVI)
cdenVVI <- cdensVVI(data = x, mu = modelVVI$mu, pro = modelVVI$pro,
                    decomp = modelVVI$decomp)

names(modelVVV)

args(cdensVVV)
cdenVVV <- cdensVVV(data = x, mu = modelVVV$mu, pro = modelVVV$pro,
                    cholsigma = modelVVV$cholsigma)

cbind(class=xclass,VII=map(cdenVII),VVI=map(cdenVVI),VVV=map(cdenVVV))

## alternative call

## Not run:
cdenVII <- do.call("cdensVII", c(list(data = x), modelVII))
cdenVVI <- do.call("cdensVVI", c(list(data = x), modelVVI))
cdenVVV <- do.call("cdensVVV", c(list(data = x), modelVVV))

cbind(class=xclass,VII=map(cdenVII),VVI=map(cdenVVI),VVV=map(cdenVVV))
## End(Not run)

chevron Simulated minefield data

Description

A two-dimensional data set of simulated minefield data (1104 observations).

Usage

data(chevron)

References

C. Fraley and A.E. Raftery, Computer J., 41:578-588 (1998)

clPairs Pairwise Scatter Plots showing Classification

Description

Creates a scatter plot for each pair of variables in given data. Observations in different classes are represented by different symbols.

Usage

clPairs(data, classification, symbols, labels=dimnames(data)[[2]],
        CEX=1, col, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

classification A numeric or character vector representing a classification of observations (rows) of data.

symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in order of appearance in the sequence of observations (the order used by the function unique). Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet. If no classification argument is given the default symbol is ".".

labels A vector of character strings for labeling the variables. The default is to use the column dimension names of data.

CEX An argument specifying the size of the plotting symbols. The default value is 1.

col Color vector to use. Default is one color per class. Splus default: all black.

... Additional arguments to be passed to the graphics device.

Side Effects

Scatter plots for each combination of variables in data are created on the current graphics device. Observations of different classifications are labeled with different symbols.

References

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

pairs, coordProj, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

clPairs(irisMatrix, cl=irisClass, symbols=as.character(1:3))

classError Classification error.

Description

Error for a given classification relative to a known truth. Location of errors in a given classification relative to a known truth.

Usage

classError(classification, truth)

Arguments

classification A numeric or character vector of class labels.

truth A numeric or character vector of class labels. Must have the same length as classification.

Details

classErrors will only return one possibility if more than one mapping between classification and truth results in the minimum error.

Value

classError gives the fraction of elements misclassified for classification relative to truth. classErrors is a logical vector of the same length as classification and truth which gives the location of misclassified elements in classification relative to truth.
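The idea of a minimum error over label mappings can be sketched in base R by brute force. The functions permutations and minClassError below are hypothetical and are not the package implementation; the sketch assumes one-to-one mappings from classification labels to truth labels and is only practical for a handful of classes.

```r
## Hypothetical sketch (not the package code): try every one-to-one
## relabeling of `classification` and keep the first one with fewest errors.
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

minClassError <- function(classification, truth) {
  stopifnot(length(classification) == length(truth))
  cl <- as.character(classification)
  tr <- as.character(truth)
  clLabels <- unique(cl)
  trLabels <- unique(tr)
  ## pad with NA so surplus classification labels match nothing
  pad <- max(0, length(clLabels) - length(trLabels))
  trLabels <- c(trLabels, rep(NA_character_, pad))
  best <- rep(TRUE, length(cl))
  for (p in permutations(trLabels)) {
    mapping <- p[seq_along(clLabels)]
    names(mapping) <- clLabels
    mapped <- mapping[cl]
    wrong <- is.na(mapped) | mapped != tr
    if (sum(wrong) < sum(best)) best <- wrong   # ties: first minimizer kept
  }
  list(errorRate = mean(best), misclassified = unname(best))
}

res <- minClassError(c(1, 1, 2, 2), c("x", "x", "x", "y"))
```

In this sketch ties between mappings are resolved by keeping the first minimizer found, which mirrors the remark in Details that only one possibility is reported.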

See Also

compareClass , mapClass , table

Examples

a <- rep(1:3, 3)
a
b <- rep(c("A", "B", "C"), 3)
b
classError(a, b)
classErrors(a, b)

a <- sample(1:3, 9, replace = TRUE)
a
b <- sample(c("A", "B", "C"), 9, replace = TRUE)
b
classError(a, b)

compareClass Compare classifications.

Description

Compare classifications via the normalized variation of information criterion.

Usage

compareClass(a, b)

Arguments

a A numeric or character vector of class labels.

b A numeric or character vector of class labels. Must have the same length as a.

Value

The variation of information criterion (Meila 2002) for a and b divided by the log of the length of the sequences so that it falls in [0,1].
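A minimal base-R sketch of this normalized criterion follows. The function normalizedVI is hypothetical, not the package code; it assumes the usual definition of the variation of information as the sum of the two conditional entropies, computed with natural logarithms, and sequences of length greater than one.

```r
## Hypothetical sketch (not the package code): variation of information
## between two labelings, divided by log(n).
normalizedVI <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) > 1)
  n <- length(a)
  joint <- table(a, b) / n                  # joint distribution of labels
  entropy <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
  ## VI = H(a|b) + H(b|a) = 2 H(a,b) - H(a) - H(b)
  vi <- 2 * entropy(joint) - entropy(rowSums(joint)) - entropy(colSums(joint))
  vi / log(n)
}

same <- normalizedVI(rep(1:3, 3), rep(c("A", "B", "C"), 3))
indep <- normalizedVI(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Two labelings that agree up to renaming give a value of 0 (up to rounding), while the second pair, which splits four points into two unrelated halves, reaches the upper bound of 1.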

References

Marina Meila (2002). Comparing clusterings. Technical Report no. 418, Department of Statistics, University of Washington.

See http://www.stat.washington.edu/www/research/reports.

See Also

mapClass, classError, table

Examples

a <- rep(1:3, 3)
a
b <- rep(c("A", "B", "C"), 3)
b
compareClass(a, b)
a <- sample(1:3, 9, replace = TRUE)
a
b <- sample(c("A", "B", "C"), 9, replace = TRUE)
b
compareClass(a, b)

coordProj Coordinate projections of data in more than two dimensions modelled by an MVN mixture.

Description

Plots coordinate projections given data in more than two dimensions and parameters of an MVN mixture model for the data.

Usage

coordProj(data, ..., dimens = c(1, 2),
          type = c("classification","uncertainty","errors"), ask = TRUE,
          quantiles = c(0.75, 0.95), symbols, scale = FALSE,
          identify = FALSE, CEX = 1, PCH = ".", xlim, ylim)

Arguments

data A numeric matrix or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

dimens A vector of length 2 giving the integer dimensions of the desired coordinate projections. The default is c(1,2), in which the first dimension is plotted against the second.

... One or more of the following:

classification A numeric or character vector representing a classification of observations (rows) of data.

uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.

z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments aren't available.

truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.

mu A matrix whose columns are the means of each group.

sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.

decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.

type Any subset of c("classification","uncertainty","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.

ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.

quantiles A vector of length 2 giving quantiles used in plotting uncertainty. The smallest symbols correspond to the smallest quantile (lowest uncertainty), medium-sized (open) symbols to points falling between the given quantiles, and large (filled) symbols to those in the largest quantile (highest uncertainty). The default is (0.75, 0.95).

symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in sorted order. Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet.

scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE

identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.

CEX An argument specifying the size of the plotting symbols. The default value is 1.

PCH An argument specifying the symbol to be used when a classification has not been specified for the data. The default value is a small dot ".".

xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot. This may be useful when comparing plots.
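How classification and uncertainty can be derived from z may be sketched in base R as follows. The helper zToClassification is hypothetical and not part of the package; it assumes the common convention that an observation is assigned to its most probable class and that its uncertainty is one minus that largest probability.

```r
## Hypothetical helper (not part of mclust): derive a classification and an
## uncertainty from a matrix of conditional probabilities.
zToClassification <- function(z) {
  z <- as.matrix(z)
  list(classification = max.col(z, ties.method = "first"),  # most probable class
       uncertainty = 1 - apply(z, 1, max))                  # 1 - largest probability
}

z <- rbind(c(0.9, 0.1),
           c(0.3, 0.7))
out <- zToClassification(z)
```

With the two rows above, the first observation is assigned to class 1 with uncertainty 0.1 and the second to class 2 with uncertainty 0.3.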

Side Effects

Coordinate projections of the data, possibly showing location of the mixture components, classification, uncertainty, and/or classification errors.

References

C. Fraley and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

clPairs, randProj, mclust2Dplot, mclustOptions, do.call

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstepVVV(irisMatrix, unmap(irisClass))

par(pty = "s", mfrow = c(1,2))
coordProj(irisMatrix,dimens=c(2,3), truth = irisClass,
          mu = msEst$mu, sigma = msEst$sigma, z = msEst$z)
do.call("coordProj", c(list(data=irisMatrix, dimens=c(2,3), truth=irisClass),
                       msEst))

cv1EMtrain Select discriminant models using cross validation

Description

For the ten available discriminant models the leave-one-out cross validation error is calculated. The models for one-dimensional data are "E" and "V"; for higher dimensions they are "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "EEV", "VEV" and "VVV".

Usage

cv1EMtrain(data, labels, modelNames)

Arguments

data A data matrix

labels Labels for each row in the data matrix

modelNames Vector of model names that should be tested.

Value

Returns a vector where each element is the error rate for the corresponding model.

Author(s)

C. Fraley

See Also

bicEMtrain

Examples

data(lansing)
odd <- seq(from=1, to=nrow(lansing), by=2)
round(cv1EMtrain(data=lansing[odd,-3], labels=lansing[odd,3]), 3)

cv1Modd <- mstepEEV(data=lansing[odd,-3], z=unmap(lansing[odd,3]))
cv1Zodd <- do.call("estepEEV", c(cv1Modd, list(data=lansing[odd,-3])))$z
compareClass(map(cv1Zodd), lansing[odd,3])

even <- (1:nrow(lansing))[-odd]
cv1Zeven <- do.call("estepEEV", c(cv1Modd, list(data=lansing[even,-3])))$z
compareClass(map(cv1Zeven), lansing[even,3])

decomp2sigma Convert mixture component covariances to matrix form.

Description

Converts a set of covariances from a parameterization by eigenvalue decomposition to representation as a 3-D array.

Usage

decomp2sigma(d, G, scale, shape, orientation, ...)

Arguments

d The dimension of the data.

G The number of components in the mixture model.

scale Either a G-vector giving the scale of the covariance (the dth root of its determinant) for each component in the mixture model, or a single numeric value if the scale is the same for each component.

shape Either a G by d matrix in which the kth column is the shape of the covariance matrix (normalized to have determinant 1) for the kth component, or a d-vector giving a common shape for all components.

orientation Either a d by d by G array whose [,,k]th entry is the orthonormal matrix of eigenvectors of the covariance matrix of the kth component, or a d by d orthonormal matrix if the mixture components have a common orientation. The orientation component of decomp can be omitted in spherical and diagonal models, for which the principal components are parallel to the coordinate axes so that the orientation matrix is the identity.

... Catch unused arguments from a do.call call.

Value

A 3-D array whose [,,k]th component is the covariance matrix of the kth component in an MVN mixture model.
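The conversion can be illustrated with a minimal base-R sketch in which each covariance is rebuilt as scale times orientation times diag(shape) times the transposed orientation. The function decompToSigma is hypothetical and is not the package code; it assumes that shape is stored with one column per component (d rows, G columns) or as a single d-vector.

```r
## Hypothetical re-implementation for illustration (not the package code).
## Assumes shape is d x G (column k = shape of component k) or a d-vector.
decompToSigma <- function(d, G, scale, shape, orientation = diag(d)) {
  scale <- rep(scale, length.out = G)                    # common scale allowed
  if (!is.matrix(shape)) shape <- matrix(shape, d, G)    # common shape
  if (length(dim(orientation)) == 2)                     # common orientation
    orientation <- array(orientation, c(d, d, G))
  sigma <- array(0, c(d, d, G))
  for (k in seq_len(G)) {
    O <- matrix(orientation[, , k], d, d)
    sigma[, , k] <- scale[k] * O %*% diag(shape[, k], d) %*% t(O)
  }
  sigma
}

theta <- pi / 4
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
sigma <- decompToSigma(d = 2, G = 2, scale = c(1, 4), shape = c(2, 0.5),
                       orientation = rot)
```

Because the shape has determinant 1 and the orientation is orthonormal, the dth root of the determinant of each sigma[,,k] recovers the corresponding scale.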

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation, and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

sigma2decomp

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

meEst <- meVEV(irisMatrix, unmap(irisClass))
names(meEst)
meEst$decomp
meEst$sigma

dec <- meEst$decomp
decomp2sigma(d=dec$d, G=dec$G, shape=dec$shape, scale=dec$scale,
             orientation = dec$orientation)
## Not run:
do.call("decomp2sigma", meEst$decomp) ## alternative call
## End(Not run)

dens Density for Parameterized MVN Mixtures

Description

Computes densities of observations in parameterized MVN mixtures.

Usage

dens(modelName, data, mu, logarithm, ...)

Arguments

modelName A character string indicating the model. Possible models:

"E" for spherical, equal variance (one-dimensional)
"V" for spherical, variable variance (one-dimensional)

"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

For fitting a single Gaussian:

"X": one-dimensional
"XII": spherical
"XXI": diagonal
"XXX": ellipsoidal

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

logarithm Return logarithm of the density, rather than the density itself. Default: FALSE

... Other arguments, such as an argument describing the variance. See cdens.

Value

A numeric vector whose ith component is the density of observation i in the MVN mixture specified by mu and ... .
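The relation between the mixture density and the component densities can be sketched in base R for the one-dimensional case: the mixture density is the sum of the component densities weighted by the mixing proportions. The function mixtureDens1d is hypothetical and is not the package implementation; its argument names merely follow the conventions used in this manual.

```r
## Hypothetical one-dimensional sketch (not the package implementation).
mixtureDens1d <- function(x, mu, sigmasq, pro, logarithm = FALSE) {
  G <- length(mu)
  sigmasq <- rep(sigmasq, length.out = G)   # a scalar means common variance
  ## n x G matrix of component densities, unscaled by mixing proportions
  comp <- sapply(seq_len(G), function(k) dnorm(x, mu[k], sqrt(sigmasq[k])))
  comp <- matrix(comp, nrow = length(x), ncol = G)
  mix <- as.vector(comp %*% pro)            # weight by mixing proportions
  if (logarithm) log(mix) else mix
}

dv <- mixtureDens1d(c(0, 3), mu = c(0, 3), sigmasq = 1, pro = c(0.5, 0.5))
```

Each element of dv is the density of one observation under the whole mixture, in contrast to the per-component matrix returned by cdens.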

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

grid1, cdens, mclustOptions, do.call

Examples

n <- 100 ## create artificial data

set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
clPairs(x, cl = xclass, sym = c("1","2")) ## display the data

set.seed(0)
I <- sample(1:(2*n))
x <- x[I, ]
xclass <- xclass[I]

odd <- seq(1, 2*n, by = 2)
oddBic <- EMclust(x[odd, ])
oddSumry <- summary(oddBic, x[odd, ]) ## best parameter estimates
names(oddSumry)

oddDens <- dens(modelName = oddSumry$modelName, data = x,
                mu = oddSumry$mu, decomp = oddSumry$decomp, pro = oddSumry$pro)

## Not run:
oddDens <- do.call("dens", c(list(data = x), oddSumry)) ## alternative call
## End(Not run)

even <- odd + 1

evenBic <- EMclust(x[even, ])
evenSumry <- summary(evenBic, x[even, ]) ## best parameter estimates
evenDens <- do.call( "dens", c(list(data = x), evenSumry))

cbind(class = xclass, odd = oddDens, even = evenDens)

density Kernel Density Estimation

Description

This is exactly the same function as in the base package except for the method argument: if it is given and equals "mclust", the mclust density estimation is used. Optionally, the number of Gaussians to be considered can be given as well (G).

Usage

density(..., method, G)

Arguments

... Arguments to the density function in the base package.

method If equal to "mclust", EMclust is used to estimate the density.

G The number of Gaussians to consider in the model-based density estimation. Default: 1:9. Ignored if method is not equal to "mclust".

Value

If give.Rkern is true, the number R(K), otherwise an object with class "density" whose underlying structure is a list containing the following components.

x the n coordinates of the points where the density is estimated.

y the estimated density values.

bw the bandwidth used.

N the sample size after elimination of missing values.

call the call which produced the result.

data.name the deparsed name of the x argument.

has.na logical, for compatibility (always FALSE).

References

Fraley, C. and Raftery, A.E. (2002) MCLUST: software for model-based clustering, density estimation and discriminant analysis. Technical Report No. 415, Dept. of Statistics, University of Washington.

Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.

Sheather, S. J. and Jones M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 683–690.

Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics with S-PLUS. New York: Springer.

See Also

density (base package), bw.nrd, plot.density, hist.

Examples

plot(density(c(-20,rep(0,98),20)), xlim = c(-4,4)) # IQR = 0

# The Old Faithful geyser data
data(faithful)
d <- density(faithful$eruptions, bw = "sj")
d
plot(d)
dmc <- density(faithful$eruptions, method="mclust")
plot(dmc, type = "n")
polygon(dmc, col = "wheat")
lines(d, col="red")

## Missing values:
x <- xx <- faithful$eruptions
x[i.out <- sample(length(x), 10)] <- NA
doRmc <- density(x=x, method="mclust", na.rm = TRUE)
lines(doRmc, col="blue")
doR <- density(x, bw = 0.15, na.rm = TRUE)
lines(doR, col = "green")
rug(x)
points(xx[i.out], rep(0.01, 10))

## function formals returns something different now the original
## density function is masked...
base.density <- if(exists("density", envir = NULL)) {
  get("density", envir = NULL)
} else
  stats::density
(kernels <- eval(formals(base.density)$kernel))

## show the kernels in the R parametrization
plot (density(0, bw = 1), xlab = "",
      main="R's density() kernels with bw = 1")
for(i in 2:length(kernels))
  lines(density(0, bw = 1, kern = kernels[i]), col = i)
legend(1.5,.4, legend = kernels, col = seq(kernels),
       lty = 1, cex = .8, y.int = 1)

data(precip)
bw <- bw.SJ(precip) ## sensible automatic choice
plot(density(precip, bw = bw, n = 2^13))
lines(density(precip, G=2:5, method="mclust"), col="red")
rug(precip)

diabetes Diabetes data

Description

Diabetes data from Reaven and Miller. Number of objects: 145; 3 variables. Three classes.

Usage

data(diabetes)

References

G.M. Reaven and R.G. Miller, Diabetologia 16:17-24 (1979).

em EM algorithm starting with E-step for parameterized MVN mixture models.

Description

Implements the EM algorithm for parameterized MVN mixture models, starting with the expectation step.
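The overall iteration can be sketched in base R for the one-dimensional model with a separate variance per component. The function em1dV is hypothetical and is not the package implementation; the stopping rule on the relative change in the loglikelihood and the names tol and itmax are assumptions made for illustration.

```r
## Hypothetical sketch for a one-dimensional, unequal-variance mixture
## (not the package code). Each pass starts with the E-step.
em1dV <- function(x, mu, sigmasq, pro, tol = 1e-5, itmax = 100) {
  G <- length(mu)
  loglik <- -Inf
  for (iter in seq_len(itmax)) {
    ## E-step: conditional probabilities z[i, k] from current parameters
    w <- sapply(seq_len(G),
                function(k) pro[k] * dnorm(x, mu[k], sqrt(sigmasq[k])))
    newLoglik <- sum(log(rowSums(w)))
    z <- w / rowSums(w)
    ## M-step: weighted updates of proportions, means, and variances
    nk <- colSums(z)
    pro <- nk / length(x)
    mu <- colSums(z * x) / nk
    sigmasq <- colSums(z * outer(x, mu, "-")^2) / nk
    ## stop on a small relative change in the loglikelihood
    converged <- is.finite(loglik) &&
      abs(newLoglik - loglik) <= tol * abs(newLoglik)
    loglik <- newLoglik
    if (converged) break
  }
  list(mu = mu, sigmasq = sigmasq, pro = pro, z = z, loglik = loglik)
}

set.seed(1)
x <- c(rnorm(200, 0, 1), rnorm(200, 6, 1))
fit <- em1dV(x, mu = c(-1, 5), sigmasq = c(1, 1), pro = c(0.5, 0.5))
```

Starting with the E-step means the caller supplies initial parameters (means, variances, mixing proportions) rather than initial conditional probabilities; the reported loglik and z are those evaluated in the last E-step.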

Usage

em(modelName, data, mu, ...)

Arguments

modelName A character string indicating the model:

"E": equal variance (one-dimensional)
"V": variable variance (one-dimensional)

"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

... Arguments for model-specific em functions. Specifically:

• An argument describing the variance (depends on the model):

sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). For a description, see cdens.

Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

sigma for the unconstrained variance model "VVV". A d by d by G matrix array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.

The form of the variance specification is the same as for the output for the em, me, or mstep methods for the specified mixture model.

• pro: Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.

• eps: A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for eps, in which case the second value is used for determining singularity in the M-step.

• tol: A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for tol, in which case the second value governs parameter convergence in the M-step.

• itmax: An integer limit on the number of EM iterations. The default is .Mclust$itmax. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for itmax, in which case the second value is an upper limit on the number of iterations in the M-step.

• equalPro: Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.

• warnSingular: A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

• Vinv: An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.
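To make the roles of tol and itmax concrete, here is a minimal base-R sketch of an EM loop for the univariate equal-variance ("E") model, beginning with the E-step. It is illustrative only and is not the package's implementation: the function name emSketchE, the exact form of the relative convergence test, and the simulated data are all assumptions made for this example.

```r
# Hypothetical sketch of EM for a univariate mixture with common variance.
# Not mclust code; the stopping rule below is one common choice (assumption).
emSketchE <- function(x, mu, sigmasq, pro, tol = 1e-5, itmax = 200) {
  n <- length(x)
  G <- length(mu)
  loglikOld <- -Inf
  for (iter in seq_len(itmax)) {
    # E-step: weighted component densities, one column per component
    dens <- sapply(seq_len(G), function(k)
      pro[k] * dnorm(x, mean = mu[k], sd = sqrt(sigmasq)))
    dens <- matrix(dens, nrow = n)
    rowTot <- rowSums(dens)
    loglik <- sum(log(rowTot))
    z <- dens / rowTot
    # Stop when the relative change in the loglikelihood is below tol
    if (is.finite(loglikOld) &&
        abs(loglik - loglikOld) <= tol * (1 + abs(loglik))) break
    loglikOld <- loglik
    # M-step: update proportions, means, and the shared variance
    nk <- colSums(z)
    pro <- nk / n
    mu <- colSums(z * x) / nk
    sigmasq <- sum(z * outer(x, mu, "-")^2) / n
  }
  list(z = z, loglik = loglik, mu = mu, sigmasq = sigmasq, pro = pro,
       iterations = iter)
}

set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))
fit <- emSketchE(x, mu = c(-1, 6), sigmasq = 1, pro = c(0.5, 0.5))
fit$mu
fit$iterations
```

Lowering tol or raising itmax lets the loop run longer before stopping; the returned list mirrors the z, loglik, mu and pro components described under Value.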

Details

This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.
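The list-call mechanism can be seen in isolation with a small base-R sketch. The function toyStep and the list est are hypothetical stand-ins for a model-specific function and for mstep output; the point is only that do.call matches list elements to arguments by name and that `...` absorbs the elements the function does not use.

```r
# Hypothetical stand-in for a model-specific function: named arguments plus
# `...` to catch list elements that are not needed.
toyStep <- function(data, mu, sigmasq, ...) {
  sum(dnorm(data, mean = mu, sd = sqrt(sigmasq), log = TRUE))
}

# Hypothetical parameter list with extra elements, as an M-step might return
est <- list(mu = 0, sigmasq = 1, modelName = "E", extra = "ignored")
obs <- c(-1, 0, 1)

direct   <- toyStep(data = obs, mu = est$mu, sigmasq = est$sigmasq)
indirect <- do.call("toyStep", c(list(data = obs), est))
all.equal(direct, indirect)
```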

Value

A list including the following components:

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

loglik The loglikelihood for the data in the mixture model.

mu A matrix whose kth column is the mean of the kth component of the mixture model.

sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.

modelName A character string identifying the model (same as the input argument).

Attributes:

• "info": Information on the iteration.

• "warn": An appropriate warning if problems are encountered in the computations.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

emE, ..., emVVV, estep, me, mstep, mclustOptions, do.call

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstep(modelName = "EEE", data = irisMatrix,
               z = unmap(irisClass))
names(msEst)

em(modelName = msEst$modelName, data = irisMatrix,
   mu = msEst$mu, Sigma = msEst$Sigma, pro = msEst$pro)

## Not run:
do.call("em", c(list(data = irisMatrix), msEst)) ## alternative call
## End(Not run)

emE EM algorithm starting with E-step for a parameterized MVN mixture model.

Description

Implements the EM algorithm for a parameterized MVN mixture model, starting with the expectation step.

Usage

emE(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
    Vinv, ...)
emV(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
    Vinv, ...)
emEII(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVII(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVEI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEVI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVVI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEE(data, mu, Sigma, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEV(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVEV(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVVV(data, mu, sigma, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in more detail in cdens.

Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

sigma for the unconstrained variance model "VVV". A d by d by G matrix array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.

... An argument giving the variance that takes one of the following forms:

decomp for models "VVV", "EII" and "VII"; see cdens.

cholSigma see Sigma, for "EEE".

cholsigma see sigma, for "VVV".

sigma see sigma, for "VVV".

Sigma see Sigma, for "EEE".

The form of the variance specification is the same as for the output for the em, me, or mstep methods for the specified mixture model. Also used to catch unused arguments from a do.call call.

pro Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

tol A scalar tolerance for relative convergence of the loglikelihood values. The default is .Mclust$tol.

itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.

equalPro A logical value indicating whether or not the components in the model are present in equal proportions. The default is .Mclust$equalPro.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.

Details

This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.

Value

A list including the following components:

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

loglik The loglikelihood for the data in the mixture model.

mu A matrix whose kth column is the mean of the kth component of the mixture model.

sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.

modelName Character string identifying the model.

Attributes:

• "info": Information on the iteration.

• "warn": An appropriate warning if problems are encountered in the computations.
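The shapes of mu, sigma and Sigma described above can be illustrated with base R alone. The sketch below builds a d by G matrix of means, a d by d by G covariance array, and a single pooled d by d matrix from the labelled iris data. It uses cov(), which divides by n - 1, so the numbers are not the maximum likelihood estimates that an M-step would produce; the variable names are chosen for this example only.

```r
# Illustrative construction of the parameter shapes (not mclust output).
data(iris)
X <- as.matrix(iris[, 1:4])
cls <- iris[, 5]
groups <- levels(cls)
d <- ncol(X)
G <- length(groups)

# mu: one column of means per component (d by G)
mu <- sapply(groups, function(g) colMeans(X[cls == g, , drop = FALSE]))

# sigma: d by d by G array whose [,,k]th slice is the kth covariance
sigma <- array(0, dim = c(d, d, G))
for (k in seq_len(G))
  sigma[, , k] <- cov(X[cls == groups[k], , drop = FALSE])

# Sigma: one common d by d covariance (simple average of the slices,
# reasonable here because the three groups have equal size)
Sigma <- apply(sigma, c(1, 2), mean)

dim(mu)
dim(sigma)
dim(Sigma)
```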

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

em, mstep, mclustOptions, do.call

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstepEEE(data = irisMatrix, z = unmap(irisClass))
names(msEst)

emEEE(data = irisMatrix, mu = msEst$mu, pro = msEst$pro,
      cholSigma = msEst$cholSigma)

## Not run:
do.call("emEEE", c(list(data=irisMatrix), msEst)) ## alternative call
## End(Not run)

estep E-step for parameterized MVN mixture models.

Description

Implements the expectation step of the EM algorithm for parameterized MVN mixture models.

Usage

estep(modelName, data, mu, ...)

Arguments

modelName A character string indicating the model:

"E": equal variance (one-dimensional)
"V": variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

... Arguments for model-specific functions. Specifically:

• An argument describing the variance (depends on the model):

sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in cdens.

Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

sigma for the unconstrained variance model "VVV". A d by d by G matrix array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.

The form of the variance specification is the same as for the output for the em, me, or mstep methods for the specified mixture model.

pro Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.
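The interplay between pro and Vinv when a noise term is present can be sketched in one dimension. The example below assumes that the noise component contributes a constant density equal to Vinv over the data region, which is a reading of the argument descriptions above rather than a statement of the package's internals; the function name estepNoiseSketch and the data are hypothetical.

```r
# Hypothetical one-dimensional E-step with an extra noise component.
# Assumption: noise has constant density Vinv; pro has G + 1 entries.
estepNoiseSketch <- function(x, mu, sigmasq, pro,
                             Vinv = 1 / diff(range(x))) {
  G <- length(mu)
  stopifnot(length(pro) == G + 1)
  # Weighted Gaussian densities, one column per MVN component
  gauss <- sapply(seq_len(G), function(k)
    pro[k] * dnorm(x, mean = mu[k], sd = sqrt(sigmasq[k])))
  gauss <- matrix(gauss, nrow = length(x))
  # Last column: noise proportion times the constant noise density
  dens <- cbind(gauss, pro[G + 1] * Vinv)
  list(z = dens / rowSums(dens), loglik = sum(log(rowSums(dens))))
}

x <- c(-0.2, 0.1, 0.3, 4.9, 5.2, 20)   # the last point is an outlier
res <- estepNoiseSketch(x, mu = c(0, 5), sigmasq = c(1, 1),
                        pro = c(0.45, 0.45, 0.1))
round(res$z, 3)
```

The resulting z has one more column than there are MVN components, and the outlying observation is assigned almost entirely to the noise column.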

Details

This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.

Value

A list including the following components:

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

loglik The loglikelihood for the data in the mixture model.

modelName A character string identifying the model (same as the input argument).

Attribute:

• "warn": An appropriate warning if problems are encountered in the computations.
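As an illustration of how z and loglik arise, the following base-R sketch computes an E-step for the spherical, equal-volume ("EII") model, working on the log scale for numerical stability. It is not the package's code: the function name estepSketchEII and the choice sigmasq = 0.25 are assumptions for this example, and the means are simply the per-species sample means of the iris data.

```r
# Hypothetical E-step for the "EII" model (common variance sigmasq * I).
estepSketchEII <- function(data, mu, sigmasq, pro) {
  data <- as.matrix(data)
  d <- ncol(data)
  G <- ncol(mu)
  # log(pro[k] * density of component k) for every observation
  logdens <- sapply(seq_len(G), function(k) {
    sqdist <- rowSums(sweep(data, 2, mu[, k])^2)
    log(pro[k]) - (d / 2) * log(2 * pi * sigmasq) - sqdist / (2 * sigmasq)
  })
  # log-sum-exp over components gives each observation's log mixture density
  rowMax <- apply(logdens, 1, max)
  logTot <- rowMax + log(rowSums(exp(logdens - rowMax)))
  list(z = exp(logdens - logTot), loglik = sum(logTot), modelName = "EII")
}

X <- as.matrix(iris[, 1:4])
mu <- sapply(levels(iris$Species),
             function(g) colMeans(X[iris$Species == g, ]))
res <- estepSketchEII(X, mu, sigmasq = 0.25, pro = rep(1/3, 3))
head(round(res$z, 3))
res$loglik
```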

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

estepE, ..., estepVVV, em, mstep, do.call, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstep(modelName = "EII", data = irisMatrix,
               z = unmap(irisClass))
names(msEst)

estep(modelName = msEst$modelName, data = irisMatrix,
      mu = msEst$mu, sigmasq = msEst$sigmasq, pro = msEst$pro)

## Not run:
do.call("estep", c(list(data = irisMatrix), msEst)) ## alternative call
## End(Not run)

estepE E-step in the EM algorithm for a parameterized MVN mixture model.

Description

Implements the expectation step in the EM algorithm for a parameterized MVN mixture model.

Usage

estepE(data, mu, sigmasq, pro, eps, warnSingular, Vinv, ...)
estepV(data, mu, sigmasq, pro, eps, warnSingular, Vinv, ...)
estepEII(data, mu, sigmasq, pro, eps, warnSingular, Vinv, ...)
estepVII(data, mu, sigmasq, pro, eps, warnSingular, Vinv, ...)
estepEEI(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepVEI(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepEVI(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepVVI(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepEEE(data, mu, Sigma, pro, eps, warnSingular, Vinv, ...)
estepEEV(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepVEV(data, mu, decomp, pro, eps, warnSingular, Vinv, ...)
estepVVV(data, mu, sigma, pro, eps, warnSingular, Vinv, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in more detail in cdens.

sigma for the unconstrained variance model "VVV" or the equal variance model "EEE". A d by d by G matrix array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.

Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

pro Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.

... Other arguments to describe the variance, in particular decomp, sigma or cholsigma for model "VVV", decomp for models "VII" and "EII", and Sigma or cholSigma for model "EEE". Sigma is a d by d matrix giving the common covariance for all components of the mixture model. Also used to catch unused arguments from a do.call call.

Details

This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.

Value

A list including the following components:

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

loglik The loglikelihood for the data in the mixture model.

modelName Character string identifying the model.

Attribute:

• "warn": An appropriate warning if problems are encountered in the computations.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

estep, em, mstep, do.call, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstepEII(data = irisMatrix, z = unmap(irisClass))
names(msEst)

estepEII(data = irisMatrix, mu = msEst$mu, pro = msEst$pro,
         sigmasq = msEst$sigmasq)

## Not run:
do.call("estepEII", c(list(data=irisMatrix), msEst)) ## alternative call
## End(Not run)

grid1 Generate grid points

Description

Generate grid points in one or two dimensions.

Usage

grid1(n, range = c(0, 1), edge = TRUE)
grid2(x, y)

Arguments

n Number of grid points.

range Range of grid points.

edge Logical: include edges or not?

x, y Vectors.

Value

The value returned is simple: grid1 generates a vector; grid2 generates a matrix.
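A rough base-R equivalent may help to picture the two return shapes. The functions grid1Sketch and grid2Sketch below are hypothetical; in particular, the treatment of edge = FALSE (points placed at cell midpoints instead of at the end points of the range) and the row ordering of the two-column matrix are assumptions, not documented behaviour of grid1 and grid2.

```r
# Hypothetical stand-ins for grid1 and grid2 (behaviour partly assumed).
grid1Sketch <- function(n, range = c(0, 1), edge = TRUE) {
  lo <- min(range)
  hi <- max(range)
  if (edge) {
    # n equally spaced points including both end points
    seq(lo, hi, length.out = n)
  } else {
    # n equally spaced points at the midpoints of n equal cells
    half <- (hi - lo) / (2 * n)
    seq(lo + half, hi - half, length.out = n)
  }
}

# All (x, y) combinations as a two-column matrix, x varying fastest
grid2Sketch <- function(x, y) as.matrix(expand.grid(x = x, y = y))

g <- grid1Sketch(5)
xy <- grid2Sketch(g, g)
g
dim(xy)
```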

Author(s)

C. Fraley

See Also

lansing, dens

Examples

data(lansing)
maples <- lansing[as.character(lansing[,"species"]) == "maple", -3]
maplesBIC <- EMclust(maples)
maplesModel <- summary(maplesBIC, maples)
x <- grid1(100, range=c(0,1))
y <- x
xyDens <- do.call("dens", c(list(data=grid2(x, y)), maplesModel))
xyDens <- matrix(xyDens, ncol=100)
contour(xyDens)
points(maples, cex=.2, col="red")

image(xyDens)
points(maples, cex=.5)

hc Model-based Hierarchical Clustering

Description

Agglomerative hierarchical clustering based on maximum likelihood criteria for MVN mixture models parameterized by eigenvalue decomposition.

Usage

hc(modelName, data, ...)

Arguments

modelName A character string indicating the model. Possible models:

"E": equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

... Arguments for the method-specific hc functions. See hcE.

Details

Most models have memory usage of the order of the square of the number of groups in the initial partition for fast execution. Some models, such as equal variance or "EEE", do not admit a fast algorithm under the usual agglomerative hierarchical clustering paradigm. These use less memory but are much slower to execute.

Value

A numeric two-column matrix in which the ith row gives the minimum index for observations in each of the two clusters merged at the ith stage of agglomerative hierarchical clustering.

References

J. D. Banfield and A. E. Raftery (1993). Model-based Gaussian and non-Gaussian Clustering. Biometrics 49:803-821.

C. Fraley (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

Note

If modelName = "E" (univariate with equal variances) or modelName = "EII" (multivariate with equal spherical covariances), then the method is equivalent to Ward's method for hierarchical clustering.
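For orientation, Ward's method itself is available in base R through hclust. The sketch below runs it on the iris measurements and cuts the tree at two and three clusters; it is offered only as the base-R analogue of the note above and makes no claim that its merge order or labels match the output of hc for any particular input.

```r
# Ward's method via stats::hclust on Euclidean distances (base R only).
X <- as.matrix(iris[, 1:4])
wardTree <- hclust(dist(X), method = "ward.D2")

# Classifications for 2 and 3 clusters; columns are named "2" and "3"
cl <- cutree(wardTree, k = c(2, 3))
table(cl[, "3"], iris$Species)
```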

See Also

hcE, ..., hcVVV, hclass

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])

hcTree <- hc(modelName = "VVV", data = irisMatrix)
cl <- hclass(hcTree,c(2,3))

par(pty = "s", mfrow = c(1,1))
clPairs(irisMatrix,cl=cl[,"2"])
clPairs(irisMatrix,cl=cl[,"3"])

par(mfrow = c(1,2))
dimens <- c(1,2)
coordProj(irisMatrix, classification=cl[,"2"], dimens=dimens)
coordProj(irisMatrix, classification=cl[,"3"], dimens=dimens)

hcE Model-based Hierarchical Clustering

Description

Agglomerative hierarchical clustering based on maximum likelihood for an MVN mixture model parameterized by eigenvalue decomposition.

Usage

hcE(data, partition, minclus = 1, ...)
hcV(data, partition, minclus = 1, alpha = 1, ...)
hcEII(data, partition, minclus = 1, ...)
hcVII(data, partition, minclus = 1, alpha = 1, ...)
hcEEE(data, partition, minclus = 1, ...)
hcVVV(data, partition, minclus = 1, alpha = 1, beta = 1, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

partition A numeric or character vector representing a partition of observations (rows) of data. If provided, group merges will start with this partition. Otherwise, each observation is assumed to be in a cluster by itself at the start of agglomeration.

minclus A number indicating the number of clusters at which to stop the agglomeration. The default is to stop when all observations have been merged into a single cluster.

alpha, beta Additional tuning parameters needed for initialization in some models. For details, see Fraley 1998. The defaults provided are usually adequate.

... Catch unused arguments from a do.call call.

Details

Most models have memory usage of the order of the square of the number of groups in the initial partition for fast execution. Some models, such as equal variance or "EEE", do not admit a fast algorithm under the usual agglomerative hierarchical clustering paradigm. These use less memory but are much slower to execute.

Value

A numeric two-column matrix in which the ith row gives the minimum index for observations in each of the two clusters merged at the ith stage of agglomerative hierarchical clustering.

References

J. D. Banfield and A. E. Raftery (1993). Model-based Gaussian and non-Gaussian Clustering. Biometrics 49:803-821.

C. Fraley (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

hc, hclass

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])

hcTree <- hcEII(data = irisMatrix)
cl <- hclass(hcTree,c(2,3))

par(pty = "s", mfrow = c(1,1))
clPairs(irisMatrix,cl=cl[,"2"])
clPairs(irisMatrix,cl=cl[,"3"])

par(mfrow = c(1,2))
dimens <- c(1,2)
coordProj(irisMatrix, classification=cl[,"2"], dimens=dimens)
coordProj(irisMatrix, classification=cl[,"3"], dimens=dimens)

hclass Classifications from Hierarchical Agglomeration

Description

Determines the classifications corresponding to different numbers of groups given merge pairs from hierarchical agglomeration.

Usage

hclass(hcPairs, G)

Arguments

hcPairs A numeric two-column matrix in which the ith row gives the minimum index for observations in each of the two clusters merged at the ith stage of agglomerative hierarchical clustering.

G An integer or vector of integers giving the number of clusters for which the corresponding classifications are wanted.

Value

A matrix with length(G) columns, each column corresponding to a classification. Columns are indexed by the character representation of the integers in G.
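The way merge pairs determine a classification can be sketched in a few lines of base R. The function hclassSketch below is a hypothetical illustration, not the package's code. It assumes that agglomeration started from singletons and ran to a single cluster, so that hcPairs has n - 1 rows, and it labels each cluster by the minimum index of its members; the package may label clusters differently.

```r
# Hypothetical replay of merge pairs to obtain classifications.
# Assumes singletons at the start and merging down to one cluster.
hclassSketch <- function(hcPairs, G) {
  n <- nrow(hcPairs) + 1
  cols <- lapply(G, function(g) {
    lab <- seq_len(n)                 # every observation starts alone
    for (i in seq_len(n - g)) {       # n - g merges leave g clusters
      keep <- min(hcPairs[i, ])
      gone <- max(hcPairs[i, ])
      lab[lab == gone] <- keep        # absorb one cluster into the other
    }
    lab
  })
  out <- matrix(unlist(cols), nrow = n)
  colnames(out) <- as.character(G)    # columns indexed by "G" as text
  out
}

# Toy merge history for four observations: (1,2), then (3,4), then all
pairs <- rbind(c(1, 2), c(3, 4), c(1, 3))
cl <- hclassSketch(pairs, c(1, 2))
cl[, "2"]   # 1 1 3 3
```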

References

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

hc, hcE

Examples

data(iris)
irisMatrix <- iris[,1:4]

hcTree <- hc(modelName="VVV", data = irisMatrix)
cl <- hclass(hcTree,c(2,3))

par(pty = "s", mfrow = c(1,1))
clPairs(irisMatrix,cl=cl[,"2"])
clPairs(irisMatrix,cl=cl[,"3"])

hypvol Approximate Hypervolume for Multivariate Data

Description

Computes a simple approximation to the hypervolume of a multivariate data set.

Usage

hypvol(data, reciprocal=FALSE)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

reciprocal A logical variable indicating whether or not the reciprocal hypervolume is desired rather than the hypervolume itself. The default is to return the approximate hypervolume.

Value

Computes the hypervolume by two methods: simple variable bounds and principal components, and returns the minimum value.
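The two approximations can be sketched in base R as follows. The function hypvolSketch is a hypothetical illustration of the description above: it takes the product of the variable ranges, the product of the ranges of the principal component scores, and returns the smaller of the two. Working on the log scale and using prcomp are choices made for this example, not statements about the package's implementation.

```r
# Hypothetical sketch: bounding-box volume in the original coordinates
# versus in principal component coordinates, keeping the smaller.
hypvolSketch <- function(data, reciprocal = FALSE) {
  data <- as.matrix(data)
  # log of the product of column ranges
  logBoxVol <- function(m)
    sum(log(apply(m, 2, function(v) diff(range(v)))))
  boundsLog <- logBoxVol(data)              # simple variable bounds
  pcLog <- logBoxVol(prcomp(data)$x)        # principal component scores
  vlog <- min(boundsLog, pcLog)
  if (reciprocal) exp(-vlog) else exp(vlog)
}

X <- as.matrix(iris[, 1:4])
v <- hypvolSketch(X)
vinv <- hypvolSketch(X, reciprocal = TRUE)
c(volume = v, reciprocal = vinv)
```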

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
hypvol(irisMatrix)

lansing Maple trees in Lansing Woods

Description

The lansing data frame has 1217 rows and 3 columns. The first two columns give the location, the third column the tree type.

Usage

data(lansing)

Format

This data frame contains the following columns:

x a numeric vector

y a numeric vector

species a factor with levels hickory and maple

Source

D.J. Gerrard, Research Bulletin No. 20, Agricultural Experimental Station, Michigan State University, 1969.

See Also

grid1, dens

Examples

data(lansing)
plot(lansing[,1:2], pch=as.integer(lansing[,3]),
     col=as.integer(lansing[,3]), main="Lansing Woods tree types")

map Classification given Probabilities

Description

Converts a matrix in which each row sums to 1 into the nearest matrix of (0,1) indicator variables.

Usage

map(z, warn=TRUE, ...)

Arguments

z A matrix (for example a matrix of conditional probabilities in which each row sums to 1 as produced by the E-step of the EM algorithm).

warn A logical variable indicating whether or not a warning should be issued when there are some columns of z for which no row attains a maximum.

... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.

Value

An integer vector with one entry for each row of z, in which the i-th value is the column index at which the i-th row of z attains a maximum.
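The conversion described above amounts to taking the column index of each row maximum. A minimal base R sketch follows; mapSketch is a hypothetical stand-in rather than the package function, and it omits the warn behaviour. Ties are resolved in favour of the first maximal column, which is how which.max behaves.

```r
## Hypothetical stand-in for map(): column index of each row's maximum.
mapSketch <- function(z) {
  z <- as.matrix(z)
  apply(z, 1, which.max)
}

z <- rbind(c(0.7, 0.2, 0.1),
           c(0.1, 0.1, 0.8),
           c(0.3, 0.6, 0.1))
cl <- mapSketch(z)
cl

## The corresponding (0,1) indicator matrix: one 1 per row, in column cl[i].
ind <- diag(ncol(z))[cl, , drop = FALSE]
ind
```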

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington.

See http://www.stat.washington.edu/mclust.

See Also

unmap, estep, em, me

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

emEst <- me(modelName = "VVV", data = irisMatrix, z = unmap(irisClass))

map(emEst$z)

mapClass Correspondence between classifications.

Description

Best correspondence between classes given two vectors viewed as alternative classifications of the same object.

Usage

mapClass(a, b)

Arguments

a A numeric or character vector of class labels.

b A numeric or character vector of class labels. Must have the same length as a.

Value

A list with two named elements, aTOb and bTOa, which are themselves lists. The aTOb list has a component corresponding to each unique element of a, which gives the element or elements of b that result in the closest class correspondence.

The bTOa list has a component corresponding to each unique element of b, which gives the element or elements of a that result in the closest class correspondence.
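One way to picture the returned structure is through a cross-tabulation of the two label vectors. The sketch below is an assumption about what "closest correspondence" means (largest overlap count in table(a, b)); mapClassSketch is a hypothetical name, and it returns labels as character strings, which may differ from the package function.

```r
## Hypothetical sketch: match each class of a to the class(es) of b with
## which it shares the most observations, and vice versa.
mapClassSketch <- function(a, b) {
  tab <- table(a, b)
  aTOb <- lapply(rownames(tab), function(i)
    colnames(tab)[tab[i, ] == max(tab[i, ])])
  names(aTOb) <- rownames(tab)
  bTOa <- lapply(colnames(tab), function(j)
    rownames(tab)[tab[, j] == max(tab[, j])])
  names(bTOa) <- colnames(tab)
  list(aTOb = aTOb, bTOa = bTOa)
}

a <- rep(1:3, 3)
b <- rep(c("A", "B", "C"), 3)
res <- mapClassSketch(a, b)
res$aTOb
res$bTOa
```

A component holds more than one label when several classes tie for the largest overlap.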

See Also

mapClass, classError, table

Examples

a <- rep(1:3, 3)
a
b <- rep(c("A", "B", "C"), 3)
b
mapClass(a, b)
a <- sample(1:3, 9, replace = TRUE)
a
b <- sample(c("A", "B", "C"), 9, replace = TRUE)
b
mapClass(a, b)

mclust-internal Internal MCLUST functions

Description

Internal tools functions.

Details

These are not to be called by the user directly.

mclust1Dplot Plot one-dimensional data modelled by an MVN mixture.

Description

Plot one-dimensional data given parameters of an MVN mixture model for the data.

Usage

mclust1Dplot(data, ...,
             type = c("classification","uncertainty","density","errors"),
             ask = TRUE, symbols, grid = 100, identify = FALSE, CEX = 1, xlim)

Arguments

data A numeric vector of observations. Categorical variables are not allowed.

... One or more of the following:

classification A numeric or character vector representing a classification of observations (rows) of data.

uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.

z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments aren’t available.

truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.

mu A vector whose entries are the means of each group.

sigma Either a vector whose entries are the variances for each group or a scalar giving a common variance for the groups.

pro The vector of mixing proportions.

type Any subset of c("classification","uncertainty","density","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.

ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.

symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in order of appearance in the observations (the order used by the function unique). The default is to use a single plotting symbol |. Classes are delineated by showing them in separate lines above the whole of the data.

grid Number of grid points to use.

identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.

CEX An argument specifying the size of the plotting symbols. The default value is 1.

xlim An argument specifying bounds of the plot. This may be useful when comparing plots.

Side Effects

One or more plots showing location of the mixture components, classification, uncertainty, density and/or classification errors. Points in the different classes are shown in separate lines above the whole of the data.
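When only z is supplied, the classification and uncertainty shown in the plots have to be derived from it. A minimal sketch of that derivation, assuming the usual convention that the uncertainty of an observation is one minus its largest membership probability (the matrix z here is made up for illustration):

```r
## Illustrative membership probabilities for three observations, three classes.
z <- rbind(c(0.90, 0.05, 0.05),
           c(0.40, 0.35, 0.25),
           c(0.10, 0.10, 0.80))

## Class with the largest probability in each row.
classification <- apply(z, 1, which.max)

## Assumed definition: 1 minus the largest probability in each row.
uncertainty <- 1 - apply(z, 1, max)

classification
uncertainty
```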

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mclust2Dplot, clPairs, coordProj, do.call

Examples

n <- 250 ## create artificial data
set.seed(0)
y <- c(rnorm(n,-5), rnorm(n,0), rnorm(n,5))
yclass <- c(rep(1,n), rep(2,n), rep(3,n))

yEMclust <- summary(EMclust(y),y)

mclust1Dplot(y, identify = TRUE, truth = yclass, z = yEMclust$z, ask=FALSE,
             mu = yEMclust$mu, sigma = yEMclust$sigma, pro = yEMclust$pro)

do.call("mclust1Dplot",
        c(list(data = y, identify = TRUE, truth = yclass, ask=FALSE), yEMclust))

mclust2Dplot Plot two-dimensional data modelled by an MVN mixture.

Description

Plot two-dimensional data given parameters of an MVN mixture model for the data.

Usage

mclust2Dplot(data, ...,
             type = c("classification","uncertainty","errors"), ask = TRUE,
             quantiles = c(0.75, 0.95), symbols, scale = FALSE,
             identify = FALSE, CEX = 1, PCH = ".", xlim, ylim,
             swapAxes = FALSE)

Arguments

data A numeric matrix or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables. In this case the data are two dimensional, so there are two columns.

... One or more of the following:

classification A numeric or character vector representing a classification of observations (rows) of data.

uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.

z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments aren’t available.

truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.

mu A matrix whose columns are the means of each group.

sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.

decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.

type Any subset of c("classification","uncertainty","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.

ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.

quantiles A vector of length 2 giving quantiles used in plotting uncertainty. The smallest symbols correspond to the smallest quantile (lowest uncertainty), medium-sized (open) symbols to points falling between the given quantiles, and large (filled) symbols to those in the largest quantile (highest uncertainty). The default is (0.75,0.95).

symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in order of appearance in the observations (the order used by the S-PLUS function unique). Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet.

scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE

identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.

CEX An argument specifying the size of the plotting symbols. The default value is 1.

PCH An argument specifying the symbol to be used when a classification has not been specified for the data. The default value is a small dot ".".

xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot, respectively. This may be useful when comparing plots.

swapAxes A logical variable indicating whether or not the axes should be swapped for the plot.

Side Effects

One or more plots showing location of the mixture components, classification, uncertainty, and/or classification errors.
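The three symbol sizes used in the uncertainty plot follow from the quantiles argument. As a rough illustration, the grouping can be reproduced in base R with quantile and cut; the uncertainty values are simulated and the band labels are hypothetical, so this shows the idea rather than the plotting code itself.

```r
set.seed(1)
## Simulated uncertainties standing in for values derived from a fitted model.
uncertainty <- runif(200, 0, 0.5)

## Break points at the default quantiles (0.75, 0.95).
q <- quantile(uncertainty, probs = c(0.75, 0.95))

## Three bands: smallest symbols, medium (open) symbols, large (filled) symbols.
band <- cut(uncertainty, breaks = c(-Inf, q, Inf),
            labels = c("low", "medium", "high"))
table(band)
```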

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

surfacePlot, clPairs, coordProj, randProj, spinProj, mclustOptions, do.call

Examples

n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))

xEMclust <- summary(EMclust(x),x)

mclust2Dplot(x, truth = xclass, z = xEMclust$z, ask=FALSE,
             mu = xEMclust$mu, sigma = xEMclust$sigma)

do.call("mclust2Dplot", c(list(data = x, truth = xclass, ask=FALSE), xEMclust))

mclustDA MclustDA discriminant analysis.

Description

MclustDA training and testing.

Usage

mclustDA(trainingData, labels, testData, G=1:6, verbose = FALSE)

Arguments

trainingData A numeric vector, matrix, or data frame of training observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

labels A numeric or character vector assigning a class label to each training observation.

testData A numeric vector, matrix, or data frame of test observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

G An integer vector specifying the numbers of mixture components (clusters) to be considered for each class. Default: 1:6.

verbose A logical variable telling whether or not to print an indication that the function is in the training phase, which may take some time to complete.

Value

A list with the following components:

testClassification mclustDA classification of the test data.

trainingClassification mclustDA classification of the training data.

VofIindex Meila’s Variation of Information index, to compare classification of the training data to the known labels.

summary Gives the best model and number of clusters for each training class.

models The mixture models used to fit the known classes.

postProb A matrix whose [i,k]th entry is the probability that observation i in the test data belongs to the kth class.

Details

The following models are compared in Mclust:

"E": spherical, equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation

mclustDA is a simplified function combining mclustDAtrain and mclustDAtest and their summaries.
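The train-then-test structure can be illustrated with a deliberately simplified base R sketch. It fits a single univariate Gaussian to each class instead of selecting a mixture model by BIC, so it is an analogy for the workflow and not what mclustDA computes; all object names below are hypothetical.

```r
set.seed(0)
n <- 250
x <- c(rnorm(n, -2), rnorm(n, 2))       # artificial one-dimensional data
labels <- rep(c(1, 2), each = n)
odd <- seq(1, 2 * n, by = 2)            # training observations
even <- odd + 1                         # test observations

## Training: one Gaussian per class (simplification: no mixtures, no BIC).
train <- lapply(split(x[odd], labels[odd]),
                function(v) list(mean = mean(v), sd = sd(v)))

## Testing: density of every test observation under each class model.
dens <- sapply(train, function(m) dnorm(x[even], m$mean, m$sd))

## Classify each test observation to the class with the highest density.
testClass <- as.numeric(colnames(dens))[apply(dens, 1, which.max)]
errorRate <- mean(testClass != labels[even])
errorRate
```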

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

M. Meila (2002). Comparing clusterings. Technical Report 418, Department of Statistics, University of Washington. See http://www.stat.washington.edu/www/research/reports.

See Also

plot.mclustDA, mclustDAtrain, mclustDAtest, compareClass, classError

Examples

n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))

## Not run:
par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(from = 1, to = 2*n, by = 2)
even <- odd + 1
testMclustDA <- mclustDA(trainingData = x[odd, ], labels = xclass[odd],
                         testData = x[even,])

clEven <- testMclustDA$testClassification ## classification of the test set
compareClass(clEven,xclass[even])
## Not run:
plot(testMclustDA, trainingData = x[odd, ], labels = xclass[odd],
     testData = x[even,])
## End(Not run)

mclustDAtest MclustDA Testing

Description

Testing phase for MclustDA discriminant analysis.

Usage

mclustDAtest(data, models)

Arguments

data A numeric vector, matrix, or data frame of observations to be classified.

models A list of MCLUST-style models including parameters, usually the result of applying mclustDAtrain to some training data.

Value

A matrix in which the [i,j]th entry is the density for test observation i in the model for class j.
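A density matrix of this shape can be turned into class probabilities and a classification in a few lines. The sketch below uses a made-up density matrix and assumed class proportions (prior); whether summary.mclustDAtest weights the densities in exactly this way is an assumption, not something stated here.

```r
## Made-up densities: 3 test observations (rows) under 2 class models (columns).
dens <- rbind(c(0.20, 0.01),
              c(0.02, 0.10),
              c(0.05, 0.05))
colnames(dens) <- c("classA", "classB")

## Assumed class proportions, for illustration only.
prior <- c(classA = 0.5, classB = 0.5)

## Weight each column by its class proportion, then normalise each row.
post <- sweep(dens, 2, prior, "*")
post <- post / rowSums(post)

## Assign each observation to the most probable class (first class on ties).
cl <- colnames(dens)[apply(post, 1, which.max)]
post
cl
```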

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

summary.mclustDAtest, mclustDAtrain

Examples

n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
## Not run:
par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
summary(train)

even <- odd + 1
test <- mclustDAtest(x[even, ], train) ## compute model densities
summary(test)$class ## classify test set

mclustDAtrain MclustDA Training

Description

Training phase for MclustDA discriminant analysis.

Usage

mclustDAtrain(data, labels, G, emModelNames, eps, tol, itmax,
              equalPro, warnSingular, verbose)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

labels A numeric or character vector assigning a class label to each observation.

G An integer vector specifying the numbers of Gaussian mixture components (clusters) for which the BIC is to be calculated (the same specification is used for all classes). Default: 1:9.

emModelNames A vector of character strings indicating the models to be fitted in the EM phase of clustering. Possible models:
"E": spherical, equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

The default is .Mclust$emModelNames.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol.

itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.

equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular=FALSE.

verbose A logical value indicating whether or not to print the models and numbers of components for each class. Default: verbose=TRUE.

Value

A list in which each element gives the optimal parameters for the model best fitting each class according to BIC.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

summary.mclustDAtrain, mclustDAtest, EMclust, hc, mclustOptions

Examples

n <- 250 ## create artificial data
set.seed(0)
par(pty = "s")
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
## Not run:
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
summary(train)

even <- odd + 1
test <- mclustDAtest(x[even, ], train) ## compute model densities
clEven <- summary(test)$class ## classify test set
compareClass(clEven,xclass[even])

mclustOptions Set control values for use with MCLUST.

Description

Supplies a list of values including tolerances for singularity and convergence assessment, and an enumeration of models for use with MCLUST.

Usage

mclustOptions(eps, tol, itmax, equalPro, warnSingular, emModelNames,
              hcModelName, symbols)

Arguments

eps A scalar tolerance associated with deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is the relative machine precision .Machine$double.eps, which is approximately 2e-16 on IEEE-compliant machines.

tol A vector of length two giving relative convergence tolerances for the loglikelihood and for parameter convergence in the inner loop for models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), respectively. The default is c(1.e-5,1.e-5).

itmax A vector of length two giving integer limits on the number of EM iterations and on the number of iterations in the inner loop for models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), respectively. The default is c(Inf,Inf), allowing termination to be completely governed by tol.

equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. Default: equalPro = FALSE.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular = TRUE.

emModelNames A vector of character strings associated with multivariate models in MCLUST. The default includes strings encoding all of the multivariate models available:

"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

hcModelName A vector of two character strings giving the name of the model to be used in the hierarchical clustering phase for univariate and multivariate data, respectively, in EMclust and EMclustN. The default is c("V","VVV"), giving the unconstrained model in each case.

symbols A vector whose entries are either integers corresponding to graphics symbols or single characters for plotting for classifications. Classes are assigned symbols in the given order. The default is c(17,0,10,4,11,18,6,7,3,16,2,12,8,15,1,9,14,13,5).

Details

mclustOptions is provided for assigning values to the .Mclust list, which is used to supply default values to various functions in MCLUST.

Calls to mclustOptions do not in themselves affect the outcome of computations.

Value

A named list in which the names are the names of the arguments and the values are the values supplied to the arguments.
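The point that a call has no effect until its result is assigned can be seen with a small stand-in. optionsSketch and .Sketch below are hypothetical names that mimic the pattern with three of the arguments; they are not part of MCLUST.

```r
## Hypothetical stand-in: returns a named list of control values.
optionsSketch <- function(tol = 1e-5, itmax = Inf, equalPro = FALSE) {
  list(tol = tol, itmax = itmax, equalPro = equalPro)
}

.Sketch <- optionsSketch()             # defaults stored in a list
optionsSketch(tol = 1e-6)              # result not assigned: .Sketch is unchanged
tolBefore <- .Sketch$tol

.Sketch <- optionsSketch(tol = 1e-6)   # assignment is what changes the defaults
tolAfter <- .Sketch$tol
c(tolBefore, tolAfter)
```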

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

.Mclust

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

.Mclust

.Mclust <- mclustOptions(tol = 1.e-6, emModelNames = c("VII", "VVI", "VVV"))

.Mclust
irisBic <- EMclust(irisMatrix)
summary(irisBic, irisMatrix)
.Mclust <- mclustOptions() # restore default values
.Mclust

me EM algorithm starting with M-step for parameterized MVN mixture models.

Description

Implements the EM algorithm for parameterized MVN mixture models, starting with the maximization step.
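To make "starting with the maximization step" concrete, here is a minimal base R sketch for the univariate unequal-variance case. meSketch is a hypothetical name; the sketch ignores eps, equalPro, noise and the warning attributes, so it illustrates the iteration order rather than reproducing the package routine.

```r
## Hypothetical sketch: EM for a univariate Gaussian mixture with unequal
## variances, beginning with the M-step from an initial membership matrix z.
meSketch <- function(x, z, tol = 1e-5, itmax = 200) {
  loglik <- -Inf
  for (iter in seq_len(itmax)) {
    ## M-step: proportions, means and variances weighted by z.
    nk <- colSums(z)
    pro <- nk / length(x)
    mu <- colSums(z * x) / nk
    sigmasq <- colSums(z * outer(x, mu, "-")^2) / nk
    ## E-step: conditional probabilities under the updated parameters.
    dens <- sapply(seq_along(mu), function(k)
      pro[k] * dnorm(x, mu[k], sqrt(sigmasq[k])))
    newLoglik <- sum(log(rowSums(dens)))
    z <- dens / rowSums(dens)
    ## Stop on relative convergence of the loglikelihood.
    converged <- abs(newLoglik - loglik) <= tol * abs(newLoglik)
    loglik <- newLoglik
    if (converged) break
  }
  list(mu = mu, sigmasq = sigmasq, pro = pro, z = z, loglik = loglik)
}

set.seed(0)
x <- c(rnorm(100, -3), rnorm(100, 3))
z0 <- cbind(as.numeric(x < 0), as.numeric(x >= 0))  # crude starting indicators
fit <- meSketch(x, z0)
fit$mu
fit$pro
```

Because the loop opens with the M-step, the starting point is a membership matrix such as the output of unmap, not a set of initial parameters.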

Usage

me(modelName, data, z, ...)

Arguments

modelName A character string indicating the model:
"E": equal variance (one-dimensional)
"V": variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

... Any number of the following:

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

For those models with iterative M-step ("VEI", "VEV"), two values can be entered for eps, in which case the second value is used for determining singularity in the M-step.

tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for tol, in which case the second value governs parameter convergence in the M-step.

itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax. For those models with iterative M-step ("VEI", "VEV"), two values can be entered for itmax, in which case the second value is an upper limit on the number of iterations in the M-step.

equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

noise A logical value indicating whether or not the model includes a Poisson noise component. The default assumes there is no noise component.

Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when noise = TRUE.

Value

A list including the following components:

mu A matrix whose kth column is the mean of the kth component of the mixture model.

sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

loglik The loglikelihood for the data in the mixture model.

modelName A character string identifying the model (same as the input argument).

Attributes:

"info" Information on the iteration.

"warn" An appropriate warning if problems are encountered in the computations.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

meE, ..., meVVV, em, mstep, estep, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

me(modelName = "VVV", data = irisMatrix, z = unmap(irisClass))

meE EM algorithm starting with M-step for a parameterized MVN mixture model.

Description

Implements the EM algorithm for a parameterized MVN mixture model, starting with the maximization step.

Usage

meE(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meV(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meEII(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meVII(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meEEI(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meVEI(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meEVI(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meVVI(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meEEE(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meEEV(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meVEV(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

meVVV(data, z, eps, tol, itmax, equalPro, warnSingular, noise = FALSE, Vinv)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

tol A scalar tolerance for relative convergence of the loglikelihood values. The default is .Mclust$tol.

itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.

equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

noise A logical value indicating whether or not the model includes a Poisson noise component. The default assumes there is no noise component.

Vinv An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when noise = TRUE.

Value

A list including the following components:

mu A matrix whose kth column is the mean of the kth component of the mixture model.

sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

modelName Character string identifying the model.

loglik The loglikelihood for the data in the mixture model.

Attributes: The return value also has the following attributes:

"info": Information on the iteration.

"warn": An appropriate warning if problems are encountered in the computations.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

em, me, estep, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

meVVV(data = irisMatrix, z = unmap(irisClass))
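The roles of z, tol, and itmax can be illustrated with a small base R sketch of EM started from the M-step, for a one-dimensional mixture with a common variance. This is only an illustration under stated assumptions: emSketchE is a hypothetical helper, not an mclust function, and the relative-change stopping rule shown here is an assumed form of the convergence test rather than the package's exact criterion.

```r
# Hypothetical helper (not part of mclust): EM for a 1-D Gaussian mixture
# with a common variance, started from conditional probabilities z.
emSketchE <- function(x, z, tol = 1e-5, itmax = 200) {
  n <- length(x)
  G <- ncol(z)
  loglik <- -Inf
  for (iter in seq_len(itmax)) {
    # M-step: update proportions, means, and the common variance from z
    nk <- colSums(z)
    pro <- nk / n
    mu <- as.vector(crossprod(z, x)) / nk
    sigmasq <- sum(z * outer(x, mu, "-")^2) / n
    # E-step: recompute conditional probabilities from the new parameters
    dens <- sapply(seq_len(G),
                   function(k) pro[k] * dnorm(x, mu[k], sqrt(sigmasq)))
    mix <- rowSums(dens)
    z <- dens / mix
    newLoglik <- sum(log(mix))
    # Assumed stopping rule: relative change in loglikelihood below tol
    converged <- is.finite(loglik) &&
      abs(newLoglik - loglik) <= tol * (1 + abs(newLoglik))
    loglik <- newLoglik
    if (converged) break
  }
  list(mu = mu, sigmasq = sigmasq, pro = pro, z = z,
       loglik = loglik, iterations = iter)
}

set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 6))
# Hard 0/1 starting memberships: split the data at the median
z0 <- cbind(x < median(x), x >= median(x)) + 0
fit <- emSketchE(x, z0)
fit$mu
fit$iterations
```

Starting from an indicator matrix, as in the meVVV example above, means the first M-step simply computes per-class estimates.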

mstep M-step in the EM algorithm for parameterized MVN mixture models.

Description

Maximization step in the EM algorithm for parameterized MVN mixture models.

Usage

mstep(modelName, data, z, ...)

Arguments

modelName A character string indicating the model:

"E": equal variance (one-dimensional)
"V": variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

... Any number of the following:

equalPro A logical value indicating whether or not the components in the model are present in equal proportions. The default is .Mclust$equalPro.

noise A logical value indicating whether or not the model includes a Poisson noise component. The default assumes there is no noise component.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps. Not used for models "EII", "VII", "EEE", "VVV".

tol For models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), a scalar tolerance for relative convergence of the parameters. The default is .Mclust$tol.

itmax For models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), an integer limit on the number of EM iterations. The default is .Mclust$itmax.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular. Not used for models "EII", "VII", "EEE", "VVV".

Value

A list including the following components:

mu A matrix whose kth column is the mean of the kth component of the mixture model.

sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

modelName A character string identifying the model (same as the input argument).

Attributes:

"info": Information on the iteration.

"warn": An appropriate warning if problems are encountered in the computations.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mstepE, ..., mstepVVV, me, estep, mclustOptions.

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

mstep(modelName = "VII", data = irisMatrix, z = unmap(irisClass))
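To make the returned mu, sigma, and pro concrete, the following base R sketch computes the standard weighted maximum-likelihood updates for the unconstrained covariance case. It is a sketch under stated assumptions: mstepSketch is a hypothetical helper, it is not the package's implementation, and it ignores the noise, equalPro, and singularity options described above.

```r
# Hypothetical helper (not part of mclust): M-step for a Gaussian mixture
# with unconstrained component covariances.
mstepSketch <- function(data, z) {
  data <- as.matrix(data)
  n <- nrow(data)
  d <- ncol(data)
  G <- ncol(z)
  nk <- colSums(z)                         # effective count per component
  pro <- nk / n                            # mixing proportions
  mu <- sweep(t(data) %*% z, 2, nk, "/")   # kth column: mean of component k
  sigma <- array(0, c(d, d, G))
  for (k in seq_len(G)) {
    centered <- sweep(data, 2, mu[, k])    # subtract the kth mean from each row
    # Weighted scatter: each row i is weighted by z[i, k]
    sigma[, , k] <- crossprod(centered * sqrt(z[, k])) / nk[k]
  }
  list(mu = mu, sigma = sigma, pro = pro)
}

irisMatrix <- as.matrix(iris[, 1:4])
# Hard 0/1 memberships built from the known species labels
z <- outer(as.integer(iris[, 5]), 1:3, "==") + 0
est <- mstepSketch(irisMatrix, z)
est$pro
est$mu
```

With 0/1 memberships the estimates reduce to per-class means and per-class covariances with divisor equal to the class size.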

mstepE M-step in the EM algorithm for a parameterized MVN mixture model.

Description

Maximization step in the EM algorithm for a parameterized MVN mixture model.

Usage

mstepE(data, z, equalPro, noise = FALSE, ...)
mstepV(data, z, equalPro, noise = FALSE, ...)
mstepEII(data, z, equalPro, noise = FALSE, ...)
mstepVII(data, z, equalPro, noise = FALSE, ...)
mstepEEI(data, z, equalPro, noise = FALSE, eps, warnSingular, ...)
mstepVEI(data, z, equalPro, noise = FALSE, eps, tol, itmax, warnSingular, ...)
mstepEVI(data, z, equalPro, noise = FALSE, eps, warnSingular, ...)
mstepVVI(data, z, equalPro, noise = FALSE, eps, warnSingular, ...)
mstepEEE(data, z, equalPro, noise = FALSE, ...)
mstepEEV(data, z, equalPro, noise = FALSE, eps, warnSingular, ...)
mstepVVV(data, z, equalPro, noise = FALSE, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

equalPro A logical value indicating whether or not the components in the model are present in equal proportions. The default is .Mclust$equalPro.

noise A logical value indicating whether or not the model includes a Poisson noise component. The default assumes there is no noise component.

eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

Not used for models "EII", "VII", "EEE", "VVV".

tol For models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), a scalar tolerance for relative convergence of the parameters. The default is .Mclust$tol.

itmax For models with iterative M-step ("VEI", "VEE", "VVE", "VEV"), an integer limit on the number of EM iterations. The default is .Mclust$itmax.

warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

Not used for models "EII", "VII", "EEE", "VVV".

... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.

Value

A list including the following components:

mu A matrix whose kth column is the mean of the kth component of the mixture model.

sigma For multidimensional models, a three dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro A vector whose kth component is the mixing proportion for the kth component of the mixture model.

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

modelName A character string identifying the model (same as the input argument).

Attributes:

"info": Information on the iteration.

"warn": An appropriate warning if problems are encountered in the computations.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mstep, me, estep, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

mstepVII(data = irisMatrix, z = unmap(irisClass))
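The purpose of the ... argument in these functions is easiest to see with a generic base R sketch of a list call. The function rescale and the list pars are hypothetical and only stand in for an M-step function and a list of model parameters; the point is that do.call would fail on the extra list elements if the function did not accept ... .

```r
# Hypothetical function: uses 'x' and 'scale', silently ignores anything else
rescale <- function(x, scale = 1, ...) x * scale

# A list carrying more elements than the function needs, in the spirit of
# passing a whole fitted-model list to a function that uses only part of it
pars <- list(x = 1:3, scale = 2, modelName = "VII", loglik = -10)

viaDots <- do.call("rescale", pars)
viaDots
```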

mvn Multivariate Normal Fit

Description

Computes the mean, covariance, and loglikelihood from fitting a single MVN or Gaussian to given data.

Usage

mvn(modelName, data)

Arguments

modelName A character string representing a model name. This can be either "Spherical", "Diagonal", or "Ellipsoidal" or an MCLUST-style model name:

"E", "V", "X" (one-dimensional)
"EII", "VII", "XII" (spherical)
"EEI", "VEI", "EVI", "VVI", "XXI" (diagonal)
"EEE", "EEV", "VEV", "VVV", "XXX" (ellipsoidal)

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

Value

A list including the parameters of the Gaussian model best fitting the data, and the corresponding loglikelihood for the data under the model.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mvnX, mvnXII, mvnXXI, mvnXXX, mstep

Examples

n <- 1000

set.seed(0)
x <- rnorm(n, mean = -1, sd = 2)
mvn(modelName = "X", x)

mu <- c(-1, 0, 1)

set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% (2*diag(3)),
           MARGIN = 2, STATS = mu, FUN = "+")
mvn(modelName = "XII", x)
mvn(modelName = "Spherical", x)

set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% diag(1:3),
           MARGIN = 2, STATS = mu, FUN = "+")
mvn(modelName = "XXI", x)
mvn(modelName = "Diagonal", x)

Sigma <- matrix(c(9,-4,1,-4,9,4,1,4,9), 3, 3)
set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% chol(Sigma),
           MARGIN = 2, STATS = mu, FUN = "+")
mvn(modelName = "XXX", x)
mvn(modelName = "Ellipsoidal", x)
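For orientation, the quantities reported for the ellipsoidal case can be reproduced with a short base R sketch: the maximum-likelihood mean and covariance of a single Gaussian and the loglikelihood evaluated at those estimates. gaussFitSketch is a hypothetical helper, and the use of divisor n for the covariance is an assumption made here because it is the maximum-likelihood estimate; the package may differ in detail.

```r
# Hypothetical helper (not part of mclust): ML fit of a single Gaussian
gaussFitSketch <- function(data) {
  x <- as.matrix(data)
  n <- nrow(x)
  d <- ncol(x)
  mu <- colMeans(x)
  centered <- sweep(x, 2, mu)
  Sigma <- crossprod(centered) / n          # ML estimate: divisor n, not n - 1
  # Squared Mahalanobis distance of every observation from the mean
  maha <- rowSums((centered %*% solve(Sigma)) * centered)
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  loglik <- -0.5 * sum(d * log(2 * pi) + logdet + maha)
  list(mu = mu, Sigma = Sigma, loglik = loglik)
}

set.seed(0)
xSketch <- matrix(rnorm(3000), 1000, 3)
fitSketch <- gaussFitSketch(xSketch)
fitSketch$loglik
```

At the maximum-likelihood estimates the Mahalanobis terms sum to n times d, which gives a convenient closed-form check on the loglikelihood.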

mvnX Multivariate Normal Fit

Description

Computes the mean, covariance, and loglikelihood from fitting a single MVN or Gaussian.

Usage

mvnX(data)
mvnXII(data)
mvnXXI(data)
mvnXXX(data)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

Details

mvnXII computes the best fitting Gaussian with the covariance restricted to be a multiple of the identity. mvnXXI computes the best fitting Gaussian with the covariance restricted to be diagonal. mvnXXX computes the best fitting Gaussian with ellipsoidal (unrestricted) covariance.
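The three covariance restrictions can be sketched in base R as follows. This is an illustration of the standard maximum-likelihood estimates under each restriction, not the package's code; the variable names are hypothetical.

```r
set.seed(0)
xCov <- matrix(rnorm(300), 100, 3) %*% diag(1:3)

# Unrestricted ML covariance (divisor n)
S <- crossprod(sweep(xCov, 2, colMeans(xCov))) / nrow(xCov)
d <- ncol(xCov)

covXII <- (sum(diag(S)) / d) * diag(d)  # multiple of the identity: average variance
covXXI <- diag(diag(S))                 # diagonal: keep the variances, drop covariances
covXXX <- S                             # ellipsoidal: no restriction

round(covXII, 3)
round(covXXI, 3)
round(covXXX, 3)
```

All three estimates share the same trace, so the restrictions redistribute the total variance rather than change it.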

Value

A list including the parameters of the Gaussian model best fitting the data, and the corresponding loglikelihood for the data under the model.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mvn, mstepE

Examples

n <- 1000

set.seed(0)
x <- rnorm(n, mean = -1, sd = 2)
mvnX(x)

mu <- c(-1, 0, 1)

set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% (2*diag(3)),
           MARGIN = 2, STATS = mu, FUN = "+")
mvnXII(x)

set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% diag(1:3),
           MARGIN = 2, STATS = mu, FUN = "+")
mvnXXI(x)

Sigma <- matrix(c(9,-4,1,-4,9,4,1,4,9), 3, 3)
set.seed(0)
x <- sweep(matrix(rnorm(n*3), n, 3) %*% chol(Sigma),
           MARGIN = 2, STATS = mu, FUN = "+")
mvnXXX(x)

partconv Convert partitioning into numerical vector.

Description

partconv converts a partitioning into a numerical vector. The second argument is used to force consecutive numbers (default) or not.

Usage

partconv(x, consec=TRUE)

Arguments

x Partitioning. May be numerical or not.

consec Logical flag, whether or not to use consecutive class numbers.

Value

Vector of class numbers.

Examples

data(iris)
partconv(iris[,5])

cl <- sample(1:10, 25, replace=TRUE)
partconv(cl, consec=FALSE)
partconv(cl, consec=TRUE)
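One way to obtain consecutive class numbers in base R is sketched below. consecutiveLabels is a hypothetical helper, and it numbers classes in order of first appearance; whether partconv uses that ordering or a sorted one is not stated above, so treat the ordering as an assumption.

```r
# Hypothetical helper (not part of mclust): map arbitrary labels to 1, 2, ...
# in order of first appearance
consecutiveLabels <- function(x) match(x, unique(x))

labelsSketch <- c("b", "a", "b", "c")
consecutiveLabels(labelsSketch)

# Also works for numeric labels with gaps
consecutiveLabels(c(7, 3, 7, 10))
```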

partuniq Classifies Data According to Unique Observations

Description

Gives a one-to-one mapping from unique observations to rows of a data matrix.

Usage

partuniq(x)

Arguments

x Matrix of observations.

Value

A vector of length nrow(x) with integer entries. An observation k is assigned an integer i whenever observation i is the first row of x that is identical to observation k (note that i <= k).

Examples

data(iris)
partuniq(as.matrix(iris[,1:4]))
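The mapping described under Value can be sketched in base R by turning each row into a string key and looking up its first occurrence. firstIdenticalRow is a hypothetical helper, not the package's implementation, and comparing rows through their printed representation is an assumption that may merge rows differing only beyond about 15 significant digits.

```r
# Hypothetical helper (not part of mclust): for each row k, the index i of the
# first row identical to it (so i <= k)
firstIdenticalRow <- function(x) {
  x <- as.matrix(x)
  key <- apply(x, 1, paste, collapse = "\r")  # one string per row
  unname(match(key, key))                     # match() returns the first hit
}

m <- rbind(c(1, 2), c(3, 4), c(1, 2))
idx <- firstIdenticalRow(m)
idx
```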

plot.Mclust Plot Model-Based Clustering Results

Description

Plot model-based clustering results: BIC, classification, uncertainty and (for one- and two-dimensional data) density.

Usage

plot.Mclust(x, data, dimens = c(1, 2), scale = FALSE, ...)

Arguments

x Output from Mclust.

data The data used to produce x.

dimens An integer vector of length two specifying the dimensions for coordinate projections if the data is more than two-dimensional. The default is c(1,2) (the first two dimensions).

scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE

... Further arguments to the lower level plotting functions.

Value

Plots selected via a menu including the following options: BIC values used for choosing the number of clusters. For data in more than two dimensions, a pairs plot showing the classification, and coordinate projections of the data showing location of the mixture components, classification, and/or uncertainty. For one- and two-dimensional data, plots showing location of the mixture components, classification, uncertainty, and/or density.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

Mclust

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisMclust <- Mclust(irisMatrix)

## Not run: plot(irisMclust,irisMatrix)

plot.mclustDA Plotting method for MclustDA discriminant analysis.

Description

Plots training and test data, known training data classification, mclustDA test data classification, and/or training errors.

Usage

plot.mclustDA(x, trainingData, labels, testData, dimens=c(1,2),
              scale = FALSE, identify=FALSE, ...)

Arguments

x The object produced by applying mclustDA with trainingData and classification labels to testData.

trainingData The numeric vector, matrix, or data frame of training observations used to obtain x.

labels The numeric or character vector assigning a class label to each training observation.

testData A numeric vector, matrix, or data frame of test observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

dimens An integer vector of length two specifying the dimensions for coordinate projections if the data is more than two-dimensional. The default is c(1,2) (the first two dimensions).

scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE

identify A logical variable indicating whether or not to print a title identifying the plot. Default: identify=FALSE

... Further arguments to the lower level plotting functions.

Value

Plots selected via a menu including the following options: training and test data, known training data classification, mclustDA test data classification, training errors.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mclustDA

Examples

n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))
## Not run:
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)
odd <- seq(from = 1, to = 2*n, by = 2)
even <- odd + 1
testMclustDA <- mclustDA(trainingData = x[odd, ], labels = xclass[odd],
                         testData = x[even,])

clEven <- testMclustDA$testClassification ## classification of the test set
compareClass(clEven,xclass[even])

## Not run:
plot(testMclustDA, trainingData = x[odd, ], labels = xclass[odd],
     testData = x[even,])
## End(Not run)

randProj Random projections for data in more than two dimensions modelled by an MVN mixture.

Description

Plots random projections given data in more than two dimensions and parameters of an MVN mixture model for the data.
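The idea of a seeded random projection can be sketched in base R as follows. randomProjectionSketch is a hypothetical helper; building the projection from the QR decomposition of a Gaussian matrix is an assumption made for illustration and is not claimed to be how randProj constructs its projections.

```r
# Hypothetical helper (not part of mclust): project data onto a random
# two-dimensional subspace determined by a seed
randomProjectionSketch <- function(data, seed = 0) {
  data <- as.matrix(data)
  d <- ncol(data)
  set.seed(seed)
  # Orthonormal basis (d x 2) for a random plane
  Q <- qr.Q(qr(matrix(rnorm(d * 2), d, 2)))
  data %*% Q
}

proj <- randomProjectionSketch(iris[, 1:4], seed = 1)
dim(proj)
```

Because the seed fixes the basis, the same seed always yields the same two-dimensional view, which is what makes projections comparable across calls.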

Usage

randProj(data, seeds = 0, ...,
         type = c("classification", "uncertainty", "errors"), ask = TRUE,
         quantiles = c(0.75,0.95), symbols, scale = FALSE, identify = FALSE,
         CEX = 1, PCH = ".", xlim, ylim)

Arguments

data A numeric matrix or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

seeds A vector of integers between 0 and 1000, specifying seeds for the random projections. The default value is the single seed 0.

... Any number of the following:

classification A numeric or character vector representing a classification of observations (rows) of data.

uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.

z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments aren't available.

truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.

mu A matrix whose columns are the means of each group.

sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.

decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.

type Any subset of c("classification","uncertainty","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.

ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.

quantiles A vector of length 2 giving quantiles used in plotting uncertainty. The smallest symbols correspond to the smallest quantile (lowest uncertainty), medium-sized (open) symbols to points falling between the given quantiles, and large (filled) symbols to those in the largest quantile (highest uncertainty). The default is (0.75,0.95).

symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in order of appearance in classification (the order used by the S-PLUS function unique). Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet.

scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE

identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.

CEX An argument specifying the size of the plotting symbols. The default value is 1.

PCH An argument specifying the symbol to be used when a classification has not been specified for the data. The default value is a small dot ".".

xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot, respectively. This may be useful when comparing plots.

Value

Random projections of the data, possibly showing location of the mixture components, classification, uncertainty, and classification errors.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

coordProj, spinProj, mclust2Dplot, mclustOptions, do.call

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstepVVV(irisMatrix, unmap(irisClass))

par(pty = "s", mfrow = c(2,3))
randProj(irisMatrix, seeds = 0:5, truth=irisClass,
         mu = msEst$mu, sigma = msEst$sigma, z = msEst$z)
do.call("randProj", c(list(data = irisMatrix, seeds = 0:5, truth=irisClass),
                      msEst))
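When only z is supplied, a classification and an uncertainty can be derived from it. The base R sketch below shows the usual convention, assumed here rather than taken from the package source: each observation is assigned to its most probable class, and its uncertainty is one minus that largest probability. The matrix zSketch is made-up illustrative data.

```r
# Made-up conditional probabilities for three observations and three classes
zSketch <- rbind(c(0.90, 0.05, 0.05),
                 c(0.40, 0.35, 0.25),
                 c(0.10, 0.10, 0.80))

classification <- apply(zSketch, 1, which.max)  # most probable class per row
uncertainty <- 1 - apply(zSketch, 1, max)       # 1 minus the largest probability

classification
uncertainty
```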

sigma2decomp Convert mixture component covariances to decomposition form.

Description

Converts a set of covariance matrices from representation as a 3-D array to a parameterization by eigenvalue decomposition.

Usage

sigma2decomp(sigma, G, tol, ...)

Arguments

sigma Either a 3-D array whose [,,k]th component is the covariance matrix for the kth component in an MVN mixture model, or a single covariance matrix in the case that all components have the same covariance.

G The number of components in the mixture. When sigma is a 3-D array, the number of components can be inferred from its dimensions.

tol Tolerance for determining whether or not the covariances have equal volume, shape, and/or orientation. The default is the square root of the relative machine precision, sqrt(.Machine$double.eps), which is about 1.e-8.

... Catch unused arguments from a do.call call.

Value

The covariance matrices for the mixture components in decomposition form, including the following components:

d The dimension of the data.

G The number of components in the mixture model.

scale Either a G-vector giving the scale of the covariance (the dth root of its determinant) for each component in the mixture model, or a single numeric value if the scale is the same for each component.

shape Either a G by d matrix in which the kth column is the shape of the covariance matrix (normalized to have determinant 1) for the kth component, or a d-vector giving a common shape for all components.

orientation Either a d by d by G array whose [,,k]th entry is the orthonormal matrix of eigenvectors of the covariance matrix of the kth component, or a d by d orthonormal matrix if the mixture components have a common orientation. The orientation component of decomp can be omitted in spherical and diagonal models, for which the principal components are parallel to the coordinate axes so that the orientation matrix is the identity.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation, and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

decomp2sigma

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

meEst <- meEEE(irisMatrix, unmap(irisClass))
names(meEst)
meEst$sigma

sigma2decomp(meEst$sigma)
## Not run:
do.call("sigma2decomp", meEst) ## alternative call
## End(Not run)
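For a single covariance matrix, the scale, shape, and orientation described under Value can be sketched with base R's eigen. This follows the definitions given above (scale as the dth root of the determinant, shape normalized to determinant 1, orientation as the eigenvector matrix); the variable names are hypothetical and the code is not the package's implementation.

```r
SigmaSketch <- matrix(c(9, -4, 1, -4, 9, 4, 1, 4, 9), 3, 3)
d <- nrow(SigmaSketch)

ev <- eigen(SigmaSketch, symmetric = TRUE)
scaleSketch <- prod(ev$values)^(1 / d)    # dth root of the determinant
shapeSketch <- ev$values / scaleSketch    # eigenvalues rescaled so their product is 1
orientationSketch <- ev$vectors           # orthonormal eigenvectors (columns)

# Recombine the three pieces to recover the covariance matrix
rebuilt <- scaleSketch *
  orientationSketch %*% diag(shapeSketch) %*% t(orientationSketch)
all.equal(rebuilt, SigmaSketch)
```

The recombination step is the same relationship that decomp2sigma would be expected to apply in the opposite direction.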

sim Simulate from Parameterized MVN Mixture Models

Description

Simulate data from parameterized MVN mixture models.

Usage

sim(modelName, mu, ..., seed = 0)

Arguments

modelName A character string indicating the model. Possible models:

"E": equal variance (one-dimensional)
"V": variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"VVV": ellipsoidal, varying volume, shape, and orientation

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

... Arguments for model-specific functions. Specifically:

• An argument describing the variance (depends on the model):

sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in cdens.

Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

sigma for the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.

The form of the variance specification is the same as for the output for the em, me, or mstep methods for the specified mixture model.

pro Component mixing proportions. If missing, equal proportions are assumed.

n An integer specifying the number of data points to be simulated.

seed An integer between 0 and 1000, inclusive, for specifying a seed for random class assignment. The default value is 0.

Details

This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep, em, me, or EMclust to be passed directly without the need to specify individual parameters as arguments.

Value

A data set consisting of n points simulated from the specified MVN mixture model.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

simE, ..., simVVV, EMclust, mstep, do.call


Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])

irisBic <- EMclust(irisMatrix)
irisSumry <- summary(irisBic,irisMatrix)
names(irisSumry)
irisSim <- sim(modelName = irisSumry$modelName, n = dim(irisMatrix)[1],
               mu = irisSumry$mu, decomp = irisSumry$decomp, pro = irisSumry$pro)
## Not run:
irisSim <- do.call("sim", irisSumry) ## alternative call
## End(Not run)

par(pty = "s", mfrow = c(1,2))
dimens <- c(1,2)
xlim <- range(rbind(irisMatrix,irisSim)[,dimens][,1])
ylim <- range(rbind(irisMatrix,irisSim)[,dimens][,2])

cl <- irisSumry$classification
coordProj(irisMatrix, par=irisSumry, classification=cl, dimens=dimens,
          xlim=xlim, ylim=ylim)
cl <- attr(irisSim,"classification")
coordProj(irisSim, par=irisSumry, classification=cl, dimens=dimens,
          xlim=xlim, ylim=ylim)

irisSumry3 <- summary(irisBic,irisMatrix, G=3)
irisSim3 <- do.call("sim", c(list(n = 500, seed = 1), irisSumry3))
clPairs(irisSim3, cl = attr(irisSim3,"classification"))
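For readers without the package at hand, the idea behind simulating from an MVN mixture can be sketched in base R: draw a component label for each point according to the mixing proportions, then draw the point from that component's normal distribution. The function simMixSketch below is a hypothetical helper written for this illustration; it is not part of mclust and is not claimed to reproduce how sim works internally. It only mirrors the documented inputs (mu, sigma, pro, n), the "classification" attribute used in the examples above, and the list-call style with do.call.

```r
## Hypothetical helper (not part of mclust): draw n points from a
## G-component MVN mixture with means mu (d x G), covariances
## sigma (d x d x G) and mixing proportions pro.
simMixSketch <- function(n, mu, sigma, pro) {
  d <- nrow(mu)
  G <- ncol(mu)
  if (missing(pro)) pro <- rep(1 / G, G)    # equal proportions if missing
  cl <- sample(seq_len(G), size = n, replace = TRUE, prob = pro)
  x <- matrix(0, n, d)
  for (k in seq_len(G)) {
    idx <- which(cl == k)
    if (length(idx) == 0) next
    R <- chol(sigma[, , k])                 # sigma[,,k] = t(R) %*% R
    z <- matrix(rnorm(length(idx) * d), length(idx), d)
    x[idx, ] <- sweep(z %*% R, 2, mu[, k], "+")
  }
  structure(x, classification = cl)
}

set.seed(1)
params <- list(n = 200,
               mu = cbind(c(0, 0), c(5, 5)),
               sigma = array(c(diag(2), diag(c(1, 9))), c(2, 2, 2)),
               pro = c(0.5, 0.5))
simdat <- do.call("simMixSketch", params)   # list call, in the style used with sim
dim(simdat)
table(attr(simdat, "classification"))
```

The Cholesky factor is one common way to impose a covariance on standard normal draws; other factorizations would serve equally well.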

simE Simulate from a Parameterized MVN Mixture Model

Description

Simulate data from a parameterized MVN mixture model.

Usage

simE(mu, sigmasq, pro, ..., seed = 0)
simV(mu, sigmasq, pro, ..., seed = 0)
simEII(mu, sigmasq, pro, ..., seed = 0)
simVII(mu, sigmasq, pro, ..., seed = 0)
simEEI(mu, decomp, pro, ..., seed = 0)
simVEI(mu, decomp, pro, ..., seed = 0)
simEVI(mu, decomp, pro, ..., seed = 0)
simVVI(mu, decomp, pro, ..., seed = 0)
simEEE(mu, pro, ..., seed = 0)
simEEV(mu, decomp, pro, ..., seed = 0)
simVEV(mu, decomp, pro, ..., seed = 0)
simVVV(mu, pro, ..., seed = 0)


Arguments

mu The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.

sigmasq for the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp for the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in cdens.

pro Component mixing proportions. If missing, equal proportions are assumed.

...

Other terms describing variance:

Sigma for the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

sigma for the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.

The form of the variance specification is the same as for the output of the em, me, or mstep methods for the specified mixture model.

n An integer specifying the number of data points to be simulated.

seed An integer between 0 and 1000, inclusive, for specifying a seed for random class assignment. The default value is 0.

Details

This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep, em, me, or EMclust to be passed directly without the need to specify individual parameters as arguments.

Value

A data set consisting of n points simulated from the specified MVN mixture model.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

sim, EMclust, mstepE, do.call


Examples

d <- 2
G <- 2
scale <- 1
shape <- c(1, 9)

O1 <- diag(2)
O2 <- diag(2)[,c(2,1)]
O <- array(cbind(O1,O2), c(2, 2, 2))
O

decomp <- list(d= d, G = G, scale = scale, shape = shape, orientation = O)
mu <- matrix(0, d, G) ## center at the origin
simdat <- simEEV(n=200, mu=mu, decomp=decomp, pro = c(1,1))

cl <- attr(simdat, "classification")
sigma <- array(apply(O, 3, function(x,y) crossprod(x*y),
                     y = sqrt(scale*shape)), c(2,2,2))
paramList <- list(mu = mu, sigma = sigma)
coordProj( simdat, paramList = paramList, classification = cl)
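The one-line construction of sigma in the example above is compact, so it may help to spell it out. The base-R sketch below restates that computation with an explicit loop: for each component k, the expression crossprod(x*y) evaluates to t(O[,,k]) %*% diag(scale*shape) %*% O[,,k]. This is only a restatement of the arithmetic in the example; it makes no claim about how decomp is handled elsewhere in the package.

```r
## Base-R restatement of the covariance construction in the example above.
scale <- 1
shape <- c(1, 9)
O <- array(cbind(diag(2), diag(2)[, c(2, 1)]), c(2, 2, 2))

sigma <- array(0, c(2, 2, 2))
for (k in 1:2) {
  Ok <- O[, , k]
  sigma[, , k] <- t(Ok) %*% diag(scale * shape) %*% Ok
}
sigma[, , 1]   # diagonal with entries 1 and 9
sigma[, , 2]   # diagonal with entries 9 and 1

## The one-line form used in the example gives the same array.
sigma2 <- array(apply(O, 3, function(x, y) crossprod(x * y),
                      y = sqrt(scale * shape)), c(2, 2, 2))
all.equal(sigma, sigma2)
```

The two components therefore share the same scale and shape and differ only in orientation, which is what the "EEV" model name expresses.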

spinProj Planar spin for random projections of data in more than two dimensions modelled by an MVN mixture.

Description

Plots random 2-D projections with successive rotations through the specified angles, given data in more than two dimensions and parameters of an MVN mixture model.

Usage

spinProj(data, ..., angles, seed = 0, reflection = FALSE,
         type = c("classification", "uncertainty", "errors"),
         ask = TRUE, quantiles = c(0.75,0.95), symbols, scale = FALSE,
         identify = FALSE, CEX = 1, PCH = ".", xlim, ylim)

Arguments

data A numeric matrix or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

... Any number of the following:

classification A numeric or character vector representing a classification of observations (rows) of data.

uncertainty A numeric vector of values in (0,1) giving the uncertainty of each data point.

z A matrix in which the [i,k]th entry gives the probability of observation i belonging to the kth class. Used to compute classification and uncertainty if those arguments aren't available.


truth A numeric or character vector giving a known classification of each data point. If classification or z is also present, this is used for displaying classification errors.

mu A matrix whose columns are the means of each group.

sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.

decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.

angles The angles (in radians) through which successive projections should be rotated or reflected.

seed An integer between 0 and 1000, inclusive, for specifying a seed for generating the initial random projection. The default value is 0. The seed/projection correspondence is the same as in randProj.

reflection A logical variable telling whether the data should be reflected (TRUE) or rotated (FALSE) through the given angles. The default is rotation.

type Any subset of c("classification","uncertainty","errors"). The function will produce the corresponding plot if it has been supplied sufficient information to do so. If more than one plot is possible then users will be asked to choose from a menu if ask=TRUE.

ask A logical variable indicating whether or not a menu should be produced when more than one plot is possible. The default is ask=TRUE.

quantiles A vector of length 2 giving quantiles used in plotting uncertainty. The smallest symbols correspond to the smallest quantile (lowest uncertainty), medium-sized (open) symbols to points falling between the given quantiles, and large (filled) symbols to those in the largest quantile (highest uncertainty). The default is (0.75,0.95).

symbols Either an integer or character vector assigning a plotting symbol to each unique class in classification. Elements in symbols correspond to classes in classification in order of appearance in classification (the order used by the S-PLUS function unique). Default: If G is the number of groups in the classification, the first G symbols in .Mclust$symbols, otherwise if G is less than 27 then the first G capital letters in the Roman alphabet.

scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE

identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.

CEX An argument specifying the size of the plotting symbols. The default value is 1.

PCH An argument specifying the symbol to be used when a classification has not been specified for the data. The default value is a small dot ".".

xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot, respectively. This may be useful when comparing plots.

Value

Rotations or reflections of a random projection of the data, possibly showing location of the mixture components, classification, uncertainty and/or classification errors.


References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

coordProj, randProj, mclust2Dplot, mclustOptions, do.call

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

msEst <- mstepVVV(irisMatrix, unmap(irisClass))

par(pty = "s", mfrow = c(2,2))
spinProj(irisMatrix, seed = 1, truth=irisClass,
         mu = msEst$mu, sigma = msEst$sigma, z = msEst$z)
do.call("spinProj", c(list(data = irisMatrix, seed = 2, truth=irisClass),
                      msEst))

summary.EMclust Summary function for EMclust

Description

Optimal model characteristics and classification for EMclust results.

Usage

summary.EMclust(object, data, G, modelNames, ...)

Arguments

object An "EMclust" object, which is the result of applying EMclust to data.

data The matrix or vector of observations used to generate 'object'.

G A vector of integers giving the numbers of mixture components (clusters) over which the summary is to take place (as.character(G) must be a subset of the column names of object). The default is to summarize over all of the numbers of mixture components used in the original analysis.

modelNames A vector of character strings denoting the models over which the summary is to take place (must be a subset of the row names of 'object'). The default is to summarize over all models used in the original analysis.

... Not used. For generic/method consistency.


Value

A list giving the optimal (according to BIC) parameters, conditional probabilities z, and loglikelihood, together with the associated classification and its uncertainty.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

EMclust

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])

irisBic <- EMclust(irisMatrix)
summary(irisBic, irisMatrix)
summary(irisBic, irisMatrix, G = 1:6, modelName = c("VII", "VVI", "VVV"))
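Restricting the summary to a subset of G and modelNames amounts to searching a sub-table of BIC values for its best entry. The base-R sketch below illustrates that selection on a small made-up table. The numbers are invented for illustration and are not EMclust output; the layout (models as row names, numbers of components as column names) follows the argument descriptions above, and the rule that a larger BIC is preferred is stated here as an assumption about the package's sign convention.

```r
## Illustrative numbers only (not real EMclust output).
bicToy <- matrix(c(-820, -790, -775,
                   -810, -760, -770,
                   -805,   NA, -765),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("VII", "VVI", "VVV"), c("1", "2", "3")))

## Restrict to a subset of models and numbers of components.
sub <- bicToy[c("VII", "VVV"), as.character(2:3), drop = FALSE]

## Assumption: larger BIC is preferred; NA marks a fit that is unavailable.
best <- which(sub == max(sub, na.rm = TRUE), arr.ind = TRUE)
bestModel <- rownames(sub)[best[1, "row"]]
bestG <- as.integer(colnames(sub)[best[1, "col"]])
c(model = bestModel, G = bestG)
```

Note that the unrestricted table would select "VVI" with 2 components, so narrowing the search can change the chosen model.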

summary.EMclustN Summary function for EMclustN

Description

Optimal model characteristics and classification for EMclustN results.

Usage

summary.EMclustN(object, data, G, modelNames, ...)

Arguments

object An "EMclustN" object, which is the result of applying EMclustN to data with an initial noise estimate.

data The matrix or vector of observations used to generate 'object'.

G A vector of integers giving the numbers of mixture components (clusters) over which the summary is to take place (as.character(G) must be a subset of the column names of 'object'). The default is to summarize over all of the numbers of mixture components used in the original analysis.

modelNames A vector of character strings denoting the models over which the summary is to take place (must be a subset of the row names of 'object'). The default is to summarize over all models used in the original analysis.

... Not used. For generic/method consistency.


Value

A list giving the optimal (according to BIC) parameters, conditional probabilities z, and loglikelihood, together with the associated classification and its uncertainty.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

EMclustN

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])

b <- apply( irisMatrix, 2, range)
n <- 450
set.seed(0)
poissonNoise <- apply(b, 2, function(x, n=n)
                      runif(n, min = x[1]-0.1, max = x[2]+.1), n = n)
set.seed(0)
noiseInit <- sample(c(TRUE,FALSE),size=150+450,replace=TRUE,prob=c(3,1))
irisNoise <- rbind(irisMatrix, poissonNoise)

Bic <- EMclustN(data=irisNoise, noise = noiseInit)
summary(Bic, irisNoise)
summary(Bic, irisNoise, G = 0:6, modelName = c("VII", "VVI", "VVV"))

summary.Mclust Very brief summary of an Mclust object.

Description

Function gives a brief summary of an Mclust object: the type of model that is picked and the number of clusters.

Usage

summary.Mclust(object, ...)

Arguments

object The result of a call to function Mclust.

... Not used.


summary.mclustDAtest Classification and posterior probability from mclustDAtest.

Description

Classifications from mclustDAtest and the corresponding posterior probabilities.

Usage

summary.mclustDAtest(object, pro, ...)

Arguments

object The output of mclustDAtest.

pro Prior probabilities for each class in the training data.

... Not used. For generic/method consistency.

Value

A list with the following two components:

classification The classification from mclustDAtest.

z Matrix of posterior probabilities in which the [i,j]th entry is the probability of observation i belonging to class j.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mclustDAtest

Examples

set.seed(0)
n <- 100 ## create artificial data

x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])

xclass <- c(rep(1,n),rep(2,n))
## Not run:
par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)


odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
summary(train)

even <- seq(2, 2*n, 2)
test <- mclustDAtest(x[even, ], train) ## compute model densities
testSummary <- summary(test) ## classify test set

names(testSummary)
testSummary$class
testSummary$z
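The relationship between the prior probabilities pro, the posterior matrix z and the classification can be sketched in base R with Bayes' rule: each posterior is proportional to the prior times the class-conditional density, and each observation is assigned to the class with the largest posterior. The density values below are made up for illustration and are not mclustDAtest output; the sketch is a generic statement of the rule rather than the package's own code.

```r
## Illustrative class-conditional densities for 4 test observations
## under 2 classes (made-up numbers, not mclustDAtest output).
dens <- matrix(c(0.20, 0.01,
                 0.05, 0.05,
                 0.01, 0.30,
                 0.10, 0.02),
               ncol = 2, byrow = TRUE)
pro <- c(0.5, 0.5)                        # prior class probabilities

num <- sweep(dens, 2, pro, "*")           # pro[j] * density[i, j]
z <- num / rowSums(num)                   # posterior probabilities
classification <- max.col(z, ties.method = "first")
rowSums(z)
classification
```

The second observation has equal posteriors for both classes; the sketch breaks that tie in favour of the first class, which is a choice made here for determinism only.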

summary.mclustDAtrain Models and classifications from mclustDAtrain

Description

The models selected in mclustDAtrain and the corresponding classifications.

Usage

summary.mclustDAtrain(object, ...)

Arguments

object The output of mclustDAtrain.

... Not used. For generic/method consistency.

Value

A list identifying the model selected by mclustDAtrain for each class of training data and the corresponding classification.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mclustDAtrain


Examples

set.seed(0)
n <- 100 ## create artificial data

x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])

xclass <- c(rep(1,n),rep(2,n))
## Not run:
par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)
## End(Not run)

odd <- seq(1, 2*n, 2)
train <- mclustDAtrain(x[odd, ], labels = xclass[odd]) ## training step
summary(train)

surfacePlot Density or uncertainty surface for two dimensional mixtures.

Description

Plots a density or uncertainty surface given data in more than two dimensions and parameters of an MVN mixture model for the data.

Usage

surfacePlot(data, mu, pro, ..., type = c("contour", "image", "persp"),
            what = c("density", "uncertainty", "skip"),
            transformation = c("none", "log", "sqrt"),
            grid = 50, nlevels = 20, scale = FALSE, identify = FALSE,
            verbose = FALSE, xlim, ylim, swapAxes = FALSE)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu A matrix whose columns are the means of each group.

pro Mixing proportions for the components of the mixture.

... An argument specifying the covariance structure of the model. If an indirect function call via do.call is used (see example below), it is usually not necessary to know the precise form for this argument. This argument usually takes one of the following forms:

sigma A three dimensional array in which sigma[,,k] gives the covariance for the kth group.

decomp A list with scale, shape and orientation components giving an alternative form for the covariance structure of the mixture model.

type Any subset of c("contour","image","persp") indicating the plot type. For more than one selection, users will be asked to choose from a menu.


what Any subset of c("density","uncertainty","skip") indicating what to plot. For more than one selection, users will be asked to choose from a menu. The "skip" option produces an empty plot, which may be useful if multiple plots are displayed simultaneously.

transformation Any subset of c("none","log","sqrt") indicating a transformation to be applied to the surface values before plotting. For more than one selection, users will be asked to choose from a menu.

grid The number of grid points (evenly spaced on each axis). The mixture density and uncertainty are computed at grid x grid points to produce the surface plot. Default: 50.

nlevels The number of levels to use for a contour plot. Default: 20.

scale A logical variable indicating whether or not the two chosen dimensions should be plotted on the same scale, and thus preserve the shape of the distribution. Default: scale=FALSE

identify A logical variable indicating whether or not to add a title to the plot identifying the dimensions used.

verbose A logical variable telling whether or not to print an indication that the function is in the process of computing values at the grid points, which typically takes some time to complete.

xlim, ylim Arguments specifying bounds for the abscissa and ordinate of the plot, respectively. This may be useful when comparing plots.

swapAxes A logical variable indicating whether or not the axes should be swapped for the plot.

Value

An invisible list with components x, y, and z in which x and y are the values used to define the grid and z is the transformed density or uncertainty at the grid points.

Side Effects

One or more plots showing location of the mixture components, classification, uncertainty, and/or classification errors.

Details

For an image plot, a color scheme may need to be selected on the display device in order to view the plot.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

mclust2Dplot, do.call


Examples

n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))

xEMclust <- summary(EMclust(x),x)
surfacePlot(x, mu = xEMclust$mu, sigma = xEMclust$sigma, pro=xEMclust$pro,
            type = "contour", what = "density", transformation = "none")

## Not run: do.call("surfacePlot", c(list(data = x), xEMclust))
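The returned list of x, y and z can be pictured with a base-R sketch that evaluates a two-dimensional MVN mixture density on a grid x grid lattice. The function mixDensityGrid is a hypothetical helper written for this illustration; it is not part of mclust and does not claim to match surfacePlot's internals. It only shows the kind of computation the grid argument controls, using the standard bivariate normal density formula.

```r
## Hypothetical helper (not part of mclust): evaluate a 2-D MVN mixture
## density on a grid x grid lattice.
mixDensityGrid <- function(xlim, ylim, mu, sigma, pro, grid = 50) {
  x <- seq(xlim[1], xlim[2], length.out = grid)
  y <- seq(ylim[1], ylim[2], length.out = grid)
  pts <- as.matrix(expand.grid(x = x, y = y))   # x varies fastest
  dens <- numeric(nrow(pts))
  for (k in seq_along(pro)) {
    S <- sigma[, , k]
    cen <- sweep(pts, 2, mu[, k], "-")
    q <- rowSums((cen %*% solve(S)) * cen)      # squared Mahalanobis distance
    dens <- dens + pro[k] * exp(-q / 2) / (2 * pi * sqrt(det(S)))
  }
  list(x = x, y = y, z = matrix(dens, grid, grid))
}

mu <- cbind(c(0, 0), c(0, 0))
sigma <- array(c(diag(c(1, 9)), diag(c(9, 1))), c(2, 2, 2))
surf <- mixDensityGrid(c(-6, 6), c(-6, 6), mu, sigma, pro = c(0.5, 0.5))
dim(surf$z)
## To draw the surface with base graphics:
## contour(surf$x, surf$y, surf$z, nlevels = 20)
```

Because the work grows with the square of grid, doubling grid roughly quadruples the number of density evaluations, which is consistent with the note under verbose that the computation can take some time.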

uncerPlot Uncertainty Plot for Model-Based Clustering

Description

Plots the uncertainty in converting a conditional probability from EM to a classification in model-based clustering.

Usage

uncerPlot(z, truth, ...)

Arguments

z A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

truth A numeric or character vector giving the true classification of the data.

... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.

Details

When truth is provided and the number of classes is compatible with z, the function compareClass is used to find the best correspondence between classes in truth and z.

Value

A plot of the uncertainty profile of the data, with uncertainties in increasing order of magnitude. If truth is supplied and the number of classes is the same as the number of columns of z, the uncertainty of the misclassified data is marked by vertical lines on the plot.

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.


See Also

EMclust, em, me, mapClass

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])

irisBic <- EMclust(irisMatrix)
irisSumry3 <- summary(irisBic, irisMatrix, G = 3)

uncerPlot(z = irisSumry3$z)

uncerPlot(z = irisSumry3$z, truth = rep(1:3, rep(50,3)))

do.call("uncerPlot", c(irisSumry3, list(truth = rep(1:3, rep(50,3)))))
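The quantity being plotted can be sketched in base R. A common definition, assumed here rather than taken from the text above, is that the uncertainty of observation i is one minus its largest conditional probability; the profile is those values sorted in increasing order. The matrix z and the vector truth below are made up for illustration, and the sketch skips the class-matching step that compareClass performs.

```r
## Illustrative conditional probabilities (rows sum to 1).
z <- matrix(c(0.98, 0.01, 0.01,
              0.40, 0.55, 0.05,
              0.10, 0.10, 0.80,
              0.34, 0.33, 0.33),
            ncol = 3, byrow = TRUE)

classification <- max.col(z, ties.method = "first")
uncertainty <- 1 - apply(z, 1, max)       # assumed definition of uncertainty
profile <- sort(uncertainty)              # increasing order, as plotted

truth <- c(1, 1, 3, 2)
## Assumption: class labels in truth already match the columns of z.
misclassified <- which(classification != truth)
uncertainty[misclassified]
```

In this toy case the two misclassified observations are also the two most uncertain ones, which is the pattern the vertical lines in the plot are meant to reveal.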

unmap Indicator Variables given Classification

Description

Converts a classification into a matrix of indicator variables.

Usage

unmap(classification, noise, ...)

Arguments

classification A numeric or character vector. Typically the distinct entries of this vector would represent a classification of observations in a data set.

noise A single numeric or character value used to indicate observations corresponding to noise.

... Provided so that lists with elements other than the arguments can be passed in indirect or list calls with do.call.

Value

An n by m matrix of (0,1) indicator variables, where n is the length of classification and m is the number of unique values or symbols in classification. Columns are labeled by the unique values in classification, and the [i,j]th entry is 1 if classification[i] is the jth unique value or symbol in order of appearance in the classification. If a noise value or symbol is designated, the corresponding indicator variables are located in the last column of the matrix.


References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

map, estep, me

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

z <- unmap(irisClass)
z

emEst <- me(modelName = "VVV", data = irisMatrix, z = z)
emEst$z

map(emEst$z)
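The indicator matrix described under Value can be reproduced in a few lines of base R, which may help when checking results by hand. The function unmapSketch below is a hypothetical stand-in written for this illustration, not the package's unmap; it orders columns by first appearance as described above and leaves out the noise argument.

```r
## Hypothetical helper (not part of mclust): indicator matrix with one
## column per unique value, in order of appearance.
unmapSketch <- function(classification) {
  u <- unique(classification)
  z <- outer(as.character(classification), as.character(u), "==") * 1
  dimnames(z) <- list(NULL, as.character(u))
  z
}

cl <- c("b", "a", "b", "c")
z <- unmapSketch(cl)
z
colnames(z)[max.col(z)]     # picks the column holding the 1 in each row
```

Taking the column with the largest entry in each row is the inverse operation, which is the role map plays in the example above.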


Index

Topic cluster
bic, bicE, bicEMtrain, cdens, cdensE, classError, clPairs, compareClass, coordProj, cv1EMtrain, decomp2sigma, Defaults.Mclust, dens, density, em, EMclust, EMclustN, emE, estep, estepE, grid1, hc, hcE, hclass, hypvol, map, mapClass, Mclust, mclust1Dplot, mclust2Dplot, mclustDA, mclustDAtest, mclustDAtrain, mclustOptions, me, meE, mstep, mstepE, mvn, mvnX, partconv, partuniq, plot.Mclust, plot.mclustDA, randProj, sigma2decomp, sim, simE, spinProj, summary.EMclust, summary.EMclustN, summary.Mclust, summary.mclustDAtest, summary.mclustDAtrain, surfacePlot, uncerPlot, unmap

Topic datasets
chevron, diabetes, lansing

Topic distribution
density

Topic internal
mclust-internal

Topic smooth
density

Aliases (documented under the topic given in parentheses)
.Mclust (Defaults.Mclust)
[.EMclust, [.EMclustN, [.mclustDAtest (mclust-internal)
bicEEE, bicEEI, bicEEV, bicEII, bicEVI, bicV, bicVEI, bicVEV, bicVII, bicVVI, bicVVV (bicE)
cdensEEE, cdensEEI, cdensEEV, cdensEII, cdensEVI, cdensV, cdensVEI, cdensVEV, cdensVII, cdensVVI, cdensVVV (cdensE)
charconv (mclust-internal)
classErrors (classError)
emEEE, emEEI, emEEV, emEII, emEVI, emV, emVEI, emVEV, emVII, emVVI, emVVV (emE)
estep2 (mclust-internal)
estepEEE, estepEEI, estepEEV, estepEII, estepEVI, estepV, estepVEI, estepVEV, estepVII, estepVVI, estepVVV (estepE)
grid2 (grid1)
hcEEE, hcEII, hcV, hcVII, hcVVV (hcE)
mclust2DplotControl, mclustProjControl (mclust-internal)
meEEE, meEEI, meEEV, meEII, meEVI, meV, meVEI, meVEV, meVII, meVVI, meVVV (meE)
mstepEEE, mstepEEI, mstepEEV, mstepEII, mstepEVI, mstepV, mstepVEI, mstepVEV, mstepVII, mstepVVI, mstepVVV (mstepE)
mvn2plot (mclust-internal)
mvnXII, mvnXXI, mvnXXX (mvnX)
nextPerm, orth2 (mclust-internal)
plot.EMclust, print.EMclust (EMclust)
plot.EMclustN, print.EMclustN (EMclustN)
print.density (density)
print.Mclust (Mclust)
print.mclustDA (mclustDA)
print.summary.EMclust (summary.EMclust)
print.summary.EMclustN (summary.EMclustN)
shapeO (mclust-internal)
simEEE, simEEI, simEEV, simEII, simEVI, simV, simVEI, simVEV, simVII, simVVI, simVVV (simE)
traceW, unchol, vecnorm (mclust-internal)