
Upload: rodney-edwards

Post on 17-Dec-2015

Introduction to Contributed Packages in R

Department of Statistical Sciences and Operations Research
Computation Seminar Series
Speaker: Edward Boone
Email: [email protected]

What is R?

The R statistical programming language is free, open-source software based on the S language, which was developed at Bell Labs.

The language is very powerful for writing programs.

Many statistical functions are already built in.

Contributed packages expand the functionality to cutting-edge research.

Since it is a programming language, writing code to complete tasks is required.

Getting Started

Where to get R?

Go to www.r-project.org

Downloads: CRAN

Set your mirror: any mirror in the USA is fine.

Select Windows 95 or later.

Select base.

Select R-2.4.1-win32.exe

The other downloads are for developers who wish to change the source code.

Getting Started

The R GUI?

Getting Started

Opening a script. This gives you a script window.

Getting Started

Submitting a program:

Use button

Right mouse click and run selection.

Submit Selection

Getting Started

Basic assignment and operations.

Arithmetic operations: +, -, *, /, ^ are the standard arithmetic operators.

Matrix arithmetic: * is element-wise multiplication; %*% is matrix multiplication.

Assignment: to assign a value to a variable use "<-".
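A minimal sketch of the operators on this slide. The variable names and the values in the two small matrices are illustrative assumptions, not taken from the talk.

```r
# Assign values with "<-"
x <- 3
y <- x^2 + 2 * x - 1            # standard arithmetic operators

# Two small 2 x 2 matrices (illustrative values), filled column by column
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(0, 1, 1, 0), nrow = 2)

elementwise <- A * B            # multiplies matching entries
matprod     <- A %*% B          # true matrix product (rows times columns)

print(y)
print(elementwise)
print(matprod)
```

The two products differ because * pairs up entries in the same position, while %*% combines rows of A with columns of B.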

Getting Started

How to use help in R?

R has a very good help system built in.

If you know which function you want help with, simply use ?_______ with the function name in the blank. Ex: ?hist

If you don’t know which function to use, then use help.search("_______"). Ex: help.search("histogram")
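The help calls above can also be written out in a script. The sketch below adds apropos(), a base R function not mentioned on the slide, as one more way to find a function when only part of its name is known. The help pages are opened only in an interactive session so the script also runs in batch mode.

```r
# Open help pages only when a person is at the console
if (interactive()) {
  help("hist")                  # same as ?hist
  help.search("histogram")      # keyword search of installed documentation
}

# List objects on the search path whose names contain "hist"
matches <- apropos("hist")
print(matches)
```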

Importing Data

How do we get data into R? Remember we have no point and click…

First make sure your data is in an easy-to-read format such as CSV (Comma Separated Values).

Use code: D <- read.table("path", sep=",", header=TRUE)
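A self-contained sketch of the import step. Because the real path is not given, the example writes a tiny made-up CSV to a temporary file first; the column names and values are assumptions for illustration only.

```r
# Write a small CSV to a temporary file so the example runs anywhere
path <- tempfile(fileext = ".csv")
writeLines(c("Name,Gender,Score",
             "Ann,F,91",
             "Bob,M,78",
             "Cal,M,85"), path)

# Same call as on the slide, with plain straight quotes
D <- read.table(path, sep = ",", header = TRUE)

# read.csv() is a base R shortcut with sep = "," and header = TRUE as defaults
D2 <- read.csv(path)

print(D)
str(D)                          # column names and types
```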

Working with data.

Accessing columns.

D has our data in it, but you can’t see it directly.

To select a column use D$column.
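A short sketch of looking at a data frame and pulling out one column. The data frame here is a made-up stand-in for D; head() and names() are base R functions not mentioned on the slide.

```r
# Illustrative stand-in for the imported data
D <- data.frame(Gender = c("F", "M", "M"), Score = c(91, 78, 85))

print(head(D, 2))               # first two rows
print(names(D))                 # column names

scores <- D$Score               # one column, as a plain vector
print(mean(scores))
```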

Working with data.

Subsetting data. Use a logical operator to do this.

==, !=, >, <, <=, >= are all logical operators. Note that the “equals” logical operator is two = signs, and “not equal” is != (R has no <> operator).

Example: D[D$Gender == "M",]

This will return the rows of D where Gender is "M". Remember R is case sensitive!

This code does nothing to the original dataset.

D.M <- D[D$Gender == "M",] gives a dataset with the appropriate rows.
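A sketch of the subsetting rules above on a made-up data frame (the values are assumptions). The lowercase "m" row shows the effect of case sensitivity.

```r
# Illustrative data; note the lowercase "m" in row 3
D <- data.frame(Gender = c("F", "M", "m", "M"),
                Score  = c(91, 78, 64, 85))

D.M    <- D[D$Gender == "M", ]                  # rows 2 and 4 only
D.notM <- D[D$Gender != "M", ]                  # everything else
D.high <- D[D$Gender == "M" & D$Score >= 80, ]  # combine conditions with &

print(D.M)
print(nrow(D))                  # the original data frame is unchanged
```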

Source Files

Source files allow you to store all of your created functions in a single file and have all those functions available to you.

To load a self-created library use:

source(Path)

Don’t forget that each \ in the path needs to be replaced with \\ (or with /).
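A sketch of the source() workflow. The function cv() and the Windows paths in the comments are hypothetical; a temporary file stands in for your own source file.

```r
# Write a one-function "library" to a temporary file
src <- tempfile(fileext = ".R")
writeLines("cv <- function(x) sd(x) / mean(x)", src)

# Load every function defined in that file
source(src)

result <- cv(c(2, 4, 6))        # coefficient of variation
print(result)

# On Windows, either spelling of a path works (hypothetical path):
# source("C:\\Users\\me\\myfunctions.R")
# source("C:/Users/me/myfunctions.R")
```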

Libraries

In order to keep R’s memory footprint small, additional functionality is stored in libraries.

These libraries can be called through the GUI or scripts.

Beware that some contributed packages may conflict with some libraries.
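One way to check for and work around name conflicts, sketched with base R tools that the slide does not mention: conflicts() lists names defined in more than one attached place, and the :: operator calls a function from a named package regardless of what else is loaded.

```r
# Names that are defined in more than one attached package or workspace
masked <- conflicts(detail = FALSE)
print(head(masked))

# Ask explicitly for the stats version of filter(): a 3-point moving average
x <- c(1, 2, 3, 4, 5)
smoothed <- stats::filter(x, rep(1 / 3, 3))
print(smoothed)
```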

Contributed Packages

Since R is open source and the developers are well organized, developing and finding contributed packages is easy.

Currently there are 964 contributed packages.

These range from wavelets and financial mathematics to spatial data analysis.

Contributed Packages

One popular library is lattice.

Contributed Packages

You can install contributed packages using the GUI.

Contributed Packages

You can install the package by selecting it from the list.

Note: Installing a package does not make it immediately available for use.

You still need to use the library() statement to make the functionality available to you.

library(lattice)
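The install-then-load sequence can also be scripted. This is a sketch rather than the speaker's method: it uses requireNamespace() to test whether the package is installed and only reminds you of the install.packages() call, since installing needs a network connection and a chosen mirror.

```r
pkg <- "lattice"

# TRUE if the package is installed (this alone does not attach it)
available <- requireNamespace(pkg, quietly = TRUE)

if (!available) {
  message("Not installed; run install.packages(\"", pkg, "\") first.")
} else {
  library(pkg, character.only = TRUE)   # now its functions can be used
}

# Check whether the package is on the search path
loaded <- paste0("package:", pkg) %in% search()
print(loaded)
```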

Help on contributed packages

Once a contributed package is loaded you can access the help for the package and a list of functions available in the package by:

library(help="lattice")
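If the list of functions is needed as an R object rather than a help window, ls() on an attached package is one option. The sketch uses the stats package, which is always attached, so that it runs without any contributed package installed.

```r
# Names of all objects exported by an attached package
fns <- ls("package:stats")

print(length(fns))
print(head(fns))

# The package index page is also available in an interactive session
if (interactive()) help(package = "stats")
```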

The CircStats Package

Many times data may come in a circular format. For example, the direction of migration or flight of birds from their nest.

The data is an angle, not a “linear” measurement.

The data can only go between 0 and 2π.
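A small worked example of why angles need their own statistics, using only base R. The two angles are made up. The mean direction is computed with the usual definition (the angle of the average of the unit vectors); this is a sketch of the idea and is not taken from the CircStats source.

```r
# Two directions just either side of north, in degrees (illustrative values)
deg   <- c(350, 10)
theta <- deg * pi / 180         # convert to radians

# The ordinary mean points due south, which is clearly wrong
linear.mean <- mean(deg)

# Mean direction: average the sines and cosines, then take the angle
circular.mean <- atan2(mean(sin(theta)), mean(cos(theta)))

print(linear.mean)              # in degrees
print(circular.mean)            # in radians, essentially zero (north)
```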

The CircStats Package

Use the CircStats package.

library(CircStats)

Consider the following:

data <- runif(50, 0, pi)

mean.dir <- circ.mean(data)

mean.dir

[1] 1.446502

The CircStats Package

Randomly generate data from a von Mises distribution

data.vm <- rvm(100, 0, 3)

Create a plot of it using circ.plot:

circ.plot(data.vm, stack=TRUE, bins=150, shrink=1.5)

The CircStats Package

Regression with circular data: Create some data

data1 <- runif(50, 0, 2*pi)

data2 <- atan2(0.15*cos(data1) + 0.25*sin(data1), 0.35*sin(data1)) + rvm(50, 0, 5)

Run the regression using circ.reg:

circ.lm <- circ.reg(data1, data2, order=1)

circ.lm

(Intercept) -0.01365604 -0.02939188
cos.alpha   -0.29872673  0.41344126
sin.alpha    0.78894271  0.72908521

The CircStats Package

Plot the data

plot(data1, data2)

Plot the predicted line

circ.lm$fitted[circ.lm$fitted>pi] <- circ.lm$fitted[circ.lm$fitted>pi] - 2*pi

points(data1[order(data1)], circ.lm$fitted[order(data1)], type='l')

The norm Contributed Package

While the norm package sounds as if it would have something to do with the normal distribution, it is in fact a package for dealing with missing data.

It implements the Data Augmentation and Multiple Imputation scheme of Schafer (1997).

Similar to SAS PROC MI.

The norm Contributed Package

Load the library.

library(norm)

The norm Contributed Package

Generate some data.

X1 <- rnorm(100,6,1)

X2 <- rnorm(100,10,3)

X3 <- rnorm(100,3,.2)

X4 <- rnorm(100,31,2)

Y <- 5 +.4*X1-.3*X2+rnorm(100,0,1)

The norm Contributed Package

Generate some missing data.

X1a <- ifelse(runif(100,0,1)<.1,NA,X1)

X2a <- ifelse(runif(100,0,1)<.1,NA,X2)

Put the data together.

YX <- cbind(Y,X1a,X2a,X3,X4)

The norm Contributed Package

Prep the data and parameters for multiple imputation.

#do preliminary manipulations

s <- prelim.norm(YX)

#find the mle

thetahat <- em.norm(s)

#set random number generator seed

rngseed(1234567)

The norm Contributed Package

Create a list to store the individual results in.

betaout <- vector("list",10)

betasterrout <- vector("list",10)

The norm Contributed Package

Run a multiple imputation loop

for(i in 1:10){
  # impute the missing values
  ximp <- imp.norm(s, thetahat, YX)
  # fit the regression once and reuse the fit
  fit <- lm(ximp[,1] ~ ximp[,2] + ximp[,3] + ximp[,4] + ximp[,5])
  # store the coefficients and their standard errors
  betaout[[i]] <- fit$coefficients
  betasterrout[[i]] <- summary(fit)$coefficients[,2]
}

The norm Contributed Package

Analyze the results

mi.inference(betaout,betasterrout,confidence=0.95)

The norm Contributed Package

Look at the output

$est

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

6.75624286 0.30502706 -0.32846960 0.05157696 -0.04154060

$std.err

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

2.70312542 0.13431178 0.04240159 0.65908509 0.05596610

$df

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

1318.8371 222.2528 13269.2373 1770.6680 27689.4900

$signif

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

1.256048e-02 2.410251e-02 1.021405e-14 9.376337e-01 4.579447e-01

$r

(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]

0.09004737 0.25192843 0.02673983 0.07676697 0.01835967
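To make the $est, $std.err, $df and $r entries easier to read, here is a base R sketch of the standard combining rules for multiple imputation (Rubin's rules). It is assumed, not confirmed from the slides, that these are the quantities mi.inference reports. The function name pool.rubin and the toy inputs are hypothetical.

```r
# Combine m sets of estimates and standard errors (lists of named vectors)
pool.rubin <- function(est, se) {
  Q <- do.call(rbind, est)            # m x p matrix of estimates
  U <- do.call(rbind, se)^2           # m x p matrix of squared std. errors
  m <- nrow(Q)
  qbar  <- colMeans(Q)                # pooled estimate
  ubar  <- colMeans(U)                # within-imputation variance
  b     <- apply(Q, 2, var)           # between-imputation variance
  total <- ubar + (1 + 1 / m) * b     # total variance
  r     <- (1 + 1 / m) * b / ubar     # relative increase in variance
  df    <- (m - 1) * (1 + 1 / r)^2    # degrees of freedom
  list(est = qbar, std.err = sqrt(total), df = df, r = r)
}

# Toy input: one coefficient, three imputations (made-up numbers)
est <- list(c(b0 = 1), c(b0 = 2), c(b0 = 3))
se  <- list(c(b0 = 1), c(b0 = 1), c(b0 = 1))

pooled <- pool.rubin(est, se)
print(pooled)
```

A small r means the imputations agree closely, so little extra uncertainty comes from the missing values; that is why the degrees of freedom are large for most coefficients in the output above.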

The lpSolve Package

The lpSolve package allows for the solving of linear and integer programs.

library(lpSolve)

The lpSolve Package

Consider the following linear program:

$$
\begin{aligned}
\max:\quad & x_1 + 9x_2 + 3x_3 \\
\text{ST:}\quad & x_1 + 2x_2 + 3x_3 \le 9 \\
& 3x_1 + 2x_2 + 2x_3 \le 15
\end{aligned}
$$

The lpSolve Package

Set up the vectors and matrices

f.obj <- c(1, 9, 3)

f.con <- matrix (c(1, 2, 3, 3, 2, 2), nrow=2, byrow=TRUE)

f.dir <- c("<=", "<=")

f.rhs <- c(9, 15)

The lpSolve Package

The lp() function will attempt to solve the linear program.

lp ("max", f.obj, f.con, f.dir, f.rhs)

Success: the objective function is 40.5

The lpSolve Package

To obtain the solution, extract the solution component from the returned object.

lp("max", f.obj, f.con, f.dir, f.rhs)$solution

[1] 0.0 4.5 0.0
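The reported solution can be checked by hand in base R, without lpSolve: plug it into the constraints and the objective. The vectors are the ones set up earlier; the names x, lhs, feasible and objval are introduced here for illustration.

```r
# Problem data as defined on the earlier slide
f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.rhs <- c(9, 15)

# Solution reported by lp()
x <- c(0, 4.5, 0)

lhs      <- as.vector(f.con %*% x)            # left-hand sides of constraints
feasible <- all(lhs <= f.rhs) && all(x >= 0)  # every constraint satisfied?
objval   <- sum(f.obj * x)                    # objective value at x

print(lhs)
print(feasible)
print(objval)
```

The first constraint holds with equality (it is binding), while the second has slack.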

The lpSolve Package

Sensitivity analyses can be obtained from the lp() object.

The following are the components attached to an lp() object.

 [1] "direction"      "x.count"        "objective"      "const.count"
 [5] "constraints"    "int.count"      "int.vec"        "objval"
 [9] "solution"       "presolve"       "compute.sens"   "sens.coef.from"
[13] "sens.coef.to"   "duals"          "duals.from"     "duals.to"
[17] "status"

The lpSolve Package

To solve an integer program, use int.vec to specify which variables need to be integers:

lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)

Success: the objective function is 37

The lpSolve Package

To obtain the solution to the integer program, extract the solution component as before:

lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)$solution

[1] 1 4 0
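Because this integer program is tiny, the answer can be confirmed by brute force in base R: list every non-negative integer point on a small grid, keep the feasible ones, and pick the best. This is only a check for small problems, not how lpSolve works; the grid bound of 9 is an assumption that is safe here because the first constraint caps every variable at 9.

```r
f.obj <- c(1, 9, 3)

# Every integer point with each variable between 0 and 9
grid <- expand.grid(x1 = 0:9, x2 = 0:9, x3 = 0:9)
X <- as.matrix(grid)

# Keep only points that satisfy both constraints
ok <- as.vector(X %*% c(1, 2, 3) <= 9 & X %*% c(3, 2, 2) <= 15)

# Objective value at each point; infeasible points are ruled out
vals <- as.vector(X %*% f.obj)
vals[!ok] <- -Inf

best <- X[which.max(vals), ]
print(best)
print(max(vals))
```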

Summary

R is a programming environment with many standard programming structures already included.

A large number of contributed packages.

Many packages allow for use of modern statistical procedures without having to code them yourself.

Requires familiarity with R to actually implement the packages.

No support.

Allows users to create new packages.

Summary

All of the R code and files can be found at:

www.people.vcu.edu/~elboone2/CSS.htm