Jack Chen
Crash Course in R · October 16, 2009
Presentation Flow
Major Topics
o Background/Environment
o Object-oriented Concept
o Common Data Structures and Operations
o Control Blocks
o Read/Write Data
o Graphics Samples
R Session:
o Function writing
o Plots customization
o Simulation tips
Background/Environment
History Background
Mid 1970s, Bell Laboratories: John Chambers, Rick Becker
o Engine: statistical computing subroutines (Fortran)
o Interactive environment
o “Interactive Statistical Computing System”, “Statistical Analysis System”
o “S”
Early 1990s, University of Auckland: Ross Ihaka, Robert Gentleman
o “R”
o Engine: statistical computing subroutines
o Interactive environment
o C, Pascal, Java, C++, Fortran, Perl, … R functions
Major differences between S and R
o Syntax
o Memory management
o Variable scoping
o S has developed into S-PLUS, commercial software available from TIBCO
o R is free, open-source software, with contributed packages from researchers worldwide
o Recently, XLSolutions has been developing R-plus, a commercial version of R
Environment: Windows
Starting R in Windows
o Command line to interact with R
o Mouse-click menus
o Mouse-click shortcuts
Some keyboard shortcuts for the Windows platform:
o Esc: cancels the current line of execution (useful when running into trouble)
o Ctrl-p or arrow up: previous command
o Ctrl-n or arrow down: next command
o Ctrl-u: erase line
o Ctrl-a or Home: beginning of line
o Ctrl-e or End: end of line
o Ctrl-c: copy highlighted text
o Ctrl-v: paste
o Ctrl-x: copy and paste highlighted text
o Ctrl-l: clear command line window
o Ctrl-z or q(): quit
Environment: Unix
Starting R in Unix
o Command line to interact with R
Some keyboard shortcuts for the Unix platform:
o Esc or Ctrl-c: cancels the current line of execution (useful when running into trouble)
o Ctrl-p or arrow up: previous command
o Ctrl-n or arrow down: next command
o Ctrl-u: erase line
o Ctrl-a: beginning of line
o Ctrl-e: end of line
o Ctrl-z: send to background (type fg to bring R back)
o Ctrl-l: clear command line window
o Ctrl-r: reverse search of the command history
o q(): quit session
Environment: R interpreter
R has an interpretive environment
Everything you type on the command line, followed by Enter, is sent to R's internal engine. R performs the following steps:
o Interprets what you have typed
o Evaluates it
o Returns a result (possibly an error message)
The only exception is a comment: R does not interpret anything after the pound sign #.
Object-oriented Concept
Object-oriented Concept: Intuition
Object-oriented programming is a natural way to classify and modularize “things” of interest in order to interact with them during program execution.
For example, suppose our program has 3 shapes:
o Circle
o Square
o Triangle
Initialization
o We want to be able to create different shapes of different sizes
Interaction
o We want each shape to be able to report its area to us
o We want each shape to be able to display itself
Internally in a program:
o Class: Shape, Type: Circle
    Functions: Report area, Draw
    Attributes: Name ID, Radius r
o Class: Shape, Type: Square
    Functions: Report area, Draw
    Attributes: Name ID, Width w
o Class: Shape, Type: Triangle (Isosceles)
    Functions: Report area, Draw
    Attributes: Name ID, Base b, Height h
Typical programming steps (Circle):
o Initialize: Name: ID = circle1, Radius: r = 1
o Interact: Tell me the area: 1²π = 3.14159…
o Interact: Draw
Typical programming steps (Circle):
o Initialize: Name: ID = circle2, Radius: r = 2
o Interact: Tell me the area: 2²π = 12.566…
o Interact: Draw
Typical programming steps (Square):
o Initialize: Name: ID = square1, Width: w = 1
o Interact: Tell me the area: 1² = 1
o Interact: Draw
Typical programming steps (Isosceles Triangle):
o Initialize: Name: ID = tri1, Base: b = 1, Height: h = 0.866
o Interact: Tell me the area: 1(0.866)/2 = 0.433
o Interact: Draw
Translating to sensible commands (Circle):
o Initialize (Name: ID = circle1, Radius: r = 1):
    circle1 = Circle(r=1)
o Interact (Tell me the area: 1²π = 3.14159…):
    area(circle1)
o Interact (Draw):
    draw(circle1)
Programming commands
o circle2 = Circle(r=2)
o area(circle2)
o draw(circle2)
o square1 = Square(w=1)
o area(square1)
o draw(square1)
o tri1 = Triangle(b=1, h=0.866)
o area(tri1)
o draw(tri1)
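The shape commands above are pseudocode. One possible way to make them run in R is with S3 classes, sketched below. The constructors Circle, Square, Triangle and the generics area and draw are illustrative names taken from the slides, not functions that ship with R, and draw only prints a description here instead of plotting.

```r
# Constructors: each returns a list tagged with a class (illustrative names).
Circle   <- function(r)    structure(list(r = r),        class = c("Circle", "Shape"))
Square   <- function(w)    structure(list(w = w),        class = c("Square", "Shape"))
Triangle <- function(b, h) structure(list(b = b, h = h), class = c("Triangle", "Shape"))

# Generic function: dispatches on the class of its argument.
area <- function(shape) UseMethod("area")
area.Circle   <- function(shape) pi * shape$r^2
area.Square   <- function(shape) shape$w^2
area.Triangle <- function(shape) shape$b * shape$h / 2

# Stand-in for draw(): describes the shape instead of plotting it.
draw <- function(shape) UseMethod("draw")
draw.Shape <- function(shape) {
  cat("A", class(shape)[1], "with area", round(area(shape), 3), "\n")
  invisible(shape)
}

circle1 <- Circle(r = 1)
square1 <- Square(w = 1)
tri1    <- Triangle(b = 1, h = 0.866)

area(circle1)   # pi, about 3.14159
area(square1)   # 1
area(tri1)      # 0.433
draw(circle1)
```

The same call, area(x), gives a different computation depending on the class of x, which is the behaviour the shape example is meant to convey.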
Object-oriented Concept: In relation to R
What does this have to do with R?
o R is inherently object-oriented.
o R has a set of pre-defined objects that we can interact with.
o There are tons of objects inside the various packages in R's online repository that perform various tasks for us.
o We can also write our own R objects that perform analyses suited to our needs.
o The way we interact with R is very similar to the way we interacted with the program with 3 shapes.
Common Data Structures and Operations in R
Common Data Structures: Primitive data objects
Primitive data objects
o Come with all R installations
o Integers: -3L, -2L, 1L, 2L, 3L, … (written without the L suffix, whole numbers such as 3 or 1e+10 are stored as doubles)
o Doubles: 0.789, 3.14, 1.68, 2.9e-6, …
o Complex numbers: 3i+7, 2i+3, …
o Characters: "a", "zZ", "I hope you are still awake", …
o Constants: pi
o Logical values: TRUE, FALSE
o The empty object: NULL
o Missing value: NA
o Infinity: Inf
o Some others
Common Data Structures: Primitive operators
Primitive operators
o arithmetic: +, -, *, /
o modulo: %%
o matrix multiply: %*%
o power: ^
o logical and/or: &, |
o relation: <, <=, >, >=, ==, !=
o assignment: =, <-
Common Data Structures: Primitive functions
R function calls have the form: functionName(arg1, arg2, …)
Primitive functions
o square root: sqrt(arg)
o exponential: exp(arg)
o natural log: log(arg)
o length of object: length(arg)
o sum of elements in object: sum(obj)
o concatenate objects: c(arg1, arg2, …)
o round down to nearest integer: floor(arg)
o round up to nearest integer: ceiling(arg)
o many, many others
Common Data Structures: Simple valid expressions
Examples of valid expressions
    1
    "a"
    'a'
    1 & TRUE
    TRUE == FALSE
    TRUE != FALSE
    2 > 3
    1 + 2 + 3 + 4
    2^3
    a = 4; b = 2^a
    log(37)
Common Data Structures: Simple invalid expressions
Examples of invalid expressions
    lala          # variable not assigned
    sqrt(25, 4)   # too many arguments
    log(1 2)      # syntax error: arguments must be separated by a comma
    1 = "a"       # cannot assign a value to a numeric constant
    TRUE = 3      # cannot assign a value to a logical constant
Common Data Structures and Constructs: vectors
Vectors
o R vectors behave like column vectors when converted to matrices (for example, as.matrix(v) has one column), even though they are displayed horizontally in R
o c(object1, object2, …, objectN)
o c stands for: concatenate object1, object2, …, objectN
Examples of vectors:
o c(1, 2, 3, 4)          # numeric vector, (1, 2, 3, 4)
o c(1:4)                # same as above
o c(1, "a")             # mixed types are coerced to a common type: ("1", "a")
o c(c(1:3), c(7:10))    # (1, 2, 3, 7, 8, 9, 10)
o c(TRUE, FALSE)        # logical vector
Other ways to form vectors:
o seq(start, end, increment)
    seq(1, 10, 1)      # equivalent to c(1:10)
    seq(10, 1, -1)     # equivalent to c(10:1)
o rep(object, repeat)
    rep(1, 10)         # a vector of 10 1's
    rep(c(1, 2), 10)   # a vector of 1 2 1 2 … (length 20)
Accessing vector elements
o vector[start index:end index]
    v = c(1, 2, 3, 4)          # assigns the vector to v
    c(1, 2, 3, 4)[1]           # returns 1
    c(1, 2, 3, 4)[2:4]         # returns (2, 3, 4)
    c(1, 2, 3, 4)[-1]          # removes 1st element, returns (2, 3, 4)
    c(1, 2, 3, 4)[c(1, 3)]     # returns (1, 3)
Common Data Structures and Constructs: matrices
Matrices
o R matrices are objects internally represented as vectors, with 2 additional pieces of information (stored together in the dim attribute):
    number of rows
    number of columns
o matrix(c(object1, object2, …, objectN), nrow = I, ncol = J)
Examples of matrices:
o matrix(c(1:12), nrow=4, ncol=3)
o matrix(c(1:12), 4, 3)      # same as above
o matrix(c(1:12), nrow=4)    # same as above
o matrix(c(1:12), ncol=3)    # same as above
o matrix(c(1:12), 4, 2)      # mismatch: a 4x2 matrix holds only 8 of the 12 values, so the data are truncated (newer R versions warn)
Other ways to form matrices:
o diag(1, 10)          # 10x10 identity matrix
o diag("a", 10)        # 10x10 matrix with diagonal of "a"
o diag(c(1:10), 10)    # 10x10 matrix with diagonal entries 1, 2, …, 10
Accessing matrix elements
o matrix[(accessing row vectors), (accessing column vectors)]
    A = matrix(c(1:9), 3, 3)   # assign matrix to variable name A
    A[1, 1]                    # returns the element in row 1, column 1
    A[1, ]                     # returns row 1
    A[, 1]                     # returns column 1
    A[, 1:2]                   # returns columns 1 and 2
    A[1:5]                     # returns (1, 2, 3, 4, 5)
Matrix manipulation
o Adding a row
    rbind(matrix object, vector object)
o Adding a column
    cbind(matrix object, vector object)
o Examples:
    A = matrix(c(1:9), 3, 3)
    cbind(A, c(10:12))                  # add (10, 11, 12) as last column
    cbind(A[,1], c(10:12), A[,2:3])     # add (10, 11, 12) as 2nd column
Matrix operations
o Matrix operations on matrices A, B of conforming dimensions
    Addition: A + B
    Subtraction: A - B
    Multiplication: A %*% B
    Inverse: solve(A)
    Transpose: t(A)
    Determinant: det(A)
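As a quick check of the matrix operations listed above, the sketch below applies them to a small invertible matrix. The matrices A and B are made-up values chosen for illustration.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2)   # filled column by column
B <- diag(1, 2)                                   # 2x2 identity matrix

A + B              # element-wise addition
A - B              # element-wise subtraction
A %*% B            # matrix product; equals A because B is the identity
t(A)               # transpose
det(A)             # 2*3 - 1*1 = 5
A_inv <- solve(A)  # inverse of A
A %*% A_inv        # identity matrix, up to floating-point rounding
```

Note that A * B (without the percent signs) is element-wise multiplication, not the matrix product.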
Common Data Structures and Constructs: lists
Lists
o Traditionally, vectors and matrices contain simple data objects, mostly primitive data objects. More complex data structures are stored in lists.
o Lists contain objects and their assigned names:
o list(name1=object1, name2=object2, …)
Example of a list:
o list(foo="hello", bar="world")
Accessing elements in a list:
o We can reference objects in lists by their names with the dollar “$” operator:
    alist = list(Friday="happy", Monday="urrr")
    alist$Friday     # returns "happy"
    alist$Monday     # returns "urrr"
o If no object in the list has the name following $, then NULL is returned:
    alist$Tuesday    # returns NULL
o We can also access objects in lists by their index with double brackets [[index]]:
    alist[[1]]       # returns "happy"
    alist[[2]]       # returns "urrr"
Operations: operating on R objects
Operating on R objects
o R operations are vector-based
o When the left-hand side (LHS) and right-hand side (RHS) of an operator conform, each element on the LHS interacts with the corresponding element on the RHS; a shorter operand is recycled to match the longer one
o Examples
    c(1, 2) + c(3, 4)           # returns (4, 6)
    c(1, 2) + c(3, 4, 5, 6)     # returns (4, 6, 6, 8): (1, 2) is added to (3, 4) and to (5, 6)
    2^c(1, 2, 3, 4)             # returns (2, 4, 8, 16)
    c(1, 2)^c(1, 2, 3, 4)       # returns (1, 4, 1, 16)
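The examples above rely on R reusing the shorter operand. A minimal sketch of that behaviour follows; the variable names short and long are chosen here for illustration.

```r
short <- c(1, 2)
long  <- c(3, 4, 5, 6)

short + long           # short is reused: (1+3, 2+4, 1+5, 2+6) = 4 6 6 8
rep(short, 2) + long   # the same computation with the repetition written out

# If the longer length is not a multiple of the shorter one, R still
# reuses the shorter vector but issues a warning.
res <- suppressWarnings(c(1, 2) + c(3, 4, 5))   # (1+3, 2+4, 1+5) = 4 6 6
res
```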
Operating on R objects
o Most of the built-in R objects can report their dimensions.
o Examples:
    length(c(1:4))                   # returns 4
    length(list(a=1, b=2))           # returns 2
    length(matrix(c(1:12), 4, 3))    # returns 12
    nrow(matrix(c(1:12), 4, 3))      # returns 4
    ncol(matrix(c(1:12), 4, 3))      # returns 3
Control Blocks
Control Blocks: Logical expressions
Logical Expressions
o A logical expression is an expression that evaluates to TRUE or FALSE
o Logical expressions can be formed with the relational operators
    equal: ==
    not equal: !=
    less than: <
    greater than: >
    less than or equal to: <=
    greater than or equal to: >=
o Examples:
    0 < 1         # evaluates to TRUE
    0 > 1         # evaluates to FALSE
    "A" == "a"    # evaluates to FALSE
Control Blocks: if-else statement
if-else statement
o if (logical expression) { … } else { … }
    { … } can be a single expression, or a group of expressions and statements, including another if-else statement.
    The else part of the statement is optional.
o Examples:
    if (0 < 1) "true"
    if (0 > 1) "should not see anything"
    if ("a" == "A") { "equal" } else { "not equal" }
    if (FALSE) { "nothing" } else if (TRUE) { "something" }
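A chained if-else is easier to read when written over several lines inside a function. The helper sign_label below is a hypothetical name used only for this sketch.

```r
# Hypothetical helper: classify a single number by its sign.
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

sign_label(3)    # "positive"
sign_label(-2)   # "negative"
sign_label(0)    # "zero"
```

When typing at the command line, keep else on the same line as the closing brace of the if block; otherwise R treats the if statement as complete before it sees the else.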
Control Blocks: while loop
While loop
o while (logical expression) { … }
    { … } (the “body” of the statement) can be a single expression, or a group of expressions.
    The while statement repeats the body { … } until the logical expression evaluates to FALSE.
o Examples:
    while (TRUE) { "never ends!!" }
    while (FALSE) { "never executed!!" }
    x=1; while (x==1) { print(x); x=2 }    # prints 1, then assigns 2 to x, which ends the loop
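A slightly longer while loop, sketched with made-up numbers, shows the usual pattern of updating the tested variable inside the body so that the loop terminates.

```r
# Count how many times 100 must be halved before it drops below 1.
x <- 100
steps <- 0
while (x >= 1) {
  x <- x / 2
  steps <- steps + 1
}
steps   # 7
x       # 0.78125
```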
Control Blocks: for loop
For loop
o for (index in start:end) { … }
    { … } (the “body” of the statement) can be a single expression, or a group of expressions or statements.
    The for statement repeats the body { … } until index exceeds end.
o Example:
    for (i in 1:10) { print(i) }
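The sketch below uses a for loop to accumulate a sum and compares it with the vectorized equivalent. The variable names are illustrative.

```r
total <- 0
for (i in 1:10) {
  total <- total + i
}
total       # 55
sum(1:10)   # same result without an explicit loop

# for can also step through the elements of any vector
for (name in c("circle", "square", "triangle")) {
  print(nchar(name))   # number of characters in each name
}
```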
Read/Write Data
o Importing and exporting data in R is relatively painless.
o We can easily import/export files where:
    data points are separated by commas
    data points are separated by tabs or spaces
    data points are separated by some other delimiter
Read SAS/SPSS/Stata data
o The package “foreign” contains functions that allow you to read, among others, SAS/SPSS/Stata data.
    type: install.packages("foreign"), select a location to download the package from; the rest is automatic
    type: library(foreign) to load the package
    type: help(package = foreign) to see a list of functions
Read/Write Data: Reading from a file
Example of reading a file
    # reads a file, data points separated by spaces or tabs
    # assign first column to y, second column to x1, third column to x2
    file = "http://www-personal.umich.edu/~jktc/R/samples/simple.dat"
    read.table(file, col.names=c("y", "x1", "x2"))
    # specify missing data in file
    read.table(file, na.strings=".")
    # if first row of data file has header (names for each column)
    file2 = "http://www-personal.umich.edu/~jktc/R/samples/simple.header.dat"
    read.table(file2, header=TRUE)
    # to see more details of read.table function
    help(read.table)
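Since the sample URLs may no longer be reachable, here is a self-contained sketch of the same read.table options. It first writes a small made-up file to a temporary location; the file contents and the name tmp are assumptions for illustration only.

```r
# Create a small space-separated file with a header row.
# "." marks a missing value in this made-up data.
tmp <- tempfile(fileext = ".dat")
writeLines(c("y x1 x2",
             "1.0 2 3",
             ". 5 6",
             "7.5 8 ."), tmp)

# header = TRUE takes the column names from the first row;
# na.strings = "." turns the "." entries into NA.
dat <- read.table(tmp, header = TRUE, na.strings = ".")
dat
str(dat)                        # column types chosen by read.table
colMeans(dat, na.rm = TRUE)     # column means, ignoring missing values
unlink(tmp)                     # remove the temporary file
```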
Read/Write Data: Writing to a file
Example of writing to a file
    data = matrix(c(1:9), 3, 3)
    # write a space-separated file
    # assign first column to y, second column to x1, third column to x2
    write.table(data, file="c:/temp/simple.dat", row.names=FALSE, col.names=c("y", "x1", "x2"), sep=" ")
    # to see more details on write.table function
    help(write.table)
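To confirm what write.table produces, the sketch below writes the same matrix to a temporary file and reads it back. The temporary path stored in out is an assumption that replaces c:/temp so the code runs on any platform.

```r
data <- matrix(c(1:9), 3, 3)
out <- file.path(tempdir(), "simple.dat")   # temporary location (assumption)

write.table(data, file = out, row.names = FALSE,
            col.names = c("y", "x1", "x2"), sep = " ")

readLines(out)                           # first line holds the column names
back <- read.table(out, header = TRUE)   # read the file back in
back
```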
Graphics Samples
Graphics Samples: Basic graphics
R has a sophisticated and powerful graphics engine.
We can think of the graphics engine as one large object with many attributes representing the different pieces to be displayed. The par function allows you to change different attributes of a graph.
Take a look at the different graphical parameters that are available in R:
o help(par)
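A minimal sketch of changing a few graphical parameters with par is given below. The plotted curves and the temporary PDF file are assumptions made so the example runs without a display; mfrow, mar and cex.axis are standard par settings.

```r
# Draw into a temporary PDF so the sketch also runs without a display.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

old_par <- par(mfrow = c(1, 2),        # 1 row, 2 columns of plots
               mar = c(4, 4, 2, 1),    # margins: bottom, left, top, right
               cex.axis = 0.8)         # smaller axis annotation

x <- seq(0, 2 * pi, length.out = 100)
plot(x, sin(x), type = "l", main = "sin(x)")
plot(x, cos(x), type = "l", lty = 2, main = "cos(x)")

par(old_par)   # restore the previous settings
dev.off()      # close the PDF device
```

par returns the previous values of the parameters it changes, which is why saving them in old_par makes it easy to restore the defaults afterwards.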
Graphics Samples: Sample plot
[Figure: sample plot]
[Figure: sample plot]
Graphics Samples: Sample graphics
Image plot
[Figure: image plot titled “Maunga Whau Volcano”, with axes x and y]
3-D figure
[Figure: 3-D figure]
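The volcano figure appears to be based on the volcano dataset that ships with R, though the transcript does not include the commands used. Under that assumption, the sketch below draws a comparable image plot and a 3-D view; the colours, viewing angles and temporary PDF file are illustrative choices.

```r
plot_file <- tempfile(fileext = ".pdf")   # assumption: draw to a temporary PDF
pdf(plot_file)

# volcano is an 87 x 61 matrix of heights on a 10 m by 10 m grid
x <- 10 * (1:nrow(volcano))
y <- 10 * (1:ncol(volcano))

# Image plot with contour lines overlaid
image(x, y, volcano, col = terrain.colors(100), main = "Maunga Whau Volcano")
contour(x, y, volcano, add = TRUE)

# 3-D perspective view of the same surface
persp(x, y, volcano, theta = 135, phi = 30, expand = 0.5,
      col = "lightgreen", main = "Maunga Whau Volcano")

dev.off()
```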
R Session:
Function writing, Plots customization, Simulation tips
Functions Writing/Debugging: Function syntax
Writing and Debugging Functions
o One of the advantages of R is the ease of creating our own functions. Here's a very simple function:
    foo = function() { print("hello world") }
o Functions are objects themselves.
o We are assigning to the variable foo a function with no arguments.
o When executed with foo(), the message "hello world" is printed to the screen.
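Building on foo, the sketch below shows a function with arguments, a default value and a return value. The function greet is a hypothetical example, not part of the original session.

```r
# Hypothetical example: one required argument and one with a default value.
greet <- function(name, greeting = "hello") {
  paste(greeting, name)   # the last evaluated expression is returned
}

greet("world")                     # "hello world"
greet("world", greeting = "bye")   # "bye world"

# Functions are objects: they can be inspected and passed to other functions.
class(greet)                       # "function"
sapply(c("R", "S"), greet)         # applies greet to each element
```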
Functions Writing: Function writing session
Sample function writing session:
1. Generate a population of size 1000 based on the model:
2. Take a random sample of size 100 from the population
3. Perform two simple linear regressions of y on x:
    fit one with intercept
    fit one without intercept
4. Repeat steps 2 and 3 500 times, store the regression coefficients from each repetition, and plot a histogram of each coefficient over the 500 values (i.e., the distributions of the estimated coefficients based on samples).
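The four steps above can be sketched as follows. The model formula did not survive in the transcript, so the linear model used here (y = 1 + 2x + e with uniform x and standard normal errors) is purely an assumption, as are the function name one_replicate and the seed.

```r
set.seed(2009)   # fixed seed so the run is reproducible

# Step 1: population of size 1000.
# ASSUMED model (the slide's formula is not in the transcript):
#   y = 1 + 2 * x + e,  x ~ Uniform(0, 10),  e ~ Normal(0, 1)
N <- 1000
x <- runif(N, 0, 10)
population <- data.frame(x = x, y = 1 + 2 * x + rnorm(N))

# Steps 2 and 3: draw one sample of size n and fit both regressions.
one_replicate <- function(pop, n = 100) {
  s <- pop[sample(nrow(pop), n), ]
  with_int    <- coef(lm(y ~ x, data = s))       # intercept and slope
  without_int <- coef(lm(y ~ x - 1, data = s))   # slope only, no intercept
  c(intercept          = unname(with_int[1]),
    slope              = unname(with_int[2]),
    slope_no_intercept = unname(without_int[1]))
}

# Step 4: repeat 500 times; each column of coefs is one repetition.
coefs <- replicate(500, one_replicate(population))

# Histograms of the estimated coefficients, drawn to a temporary PDF.
pdf(tempfile(fileext = ".pdf"))
par(mfrow = c(1, 3))
for (nm in rownames(coefs)) hist(coefs[nm, ], main = nm, xlab = "estimate")
dev.off()

rowMeans(coefs)   # average estimate of each coefficient
```

Wrapping steps 2 and 3 in a function keeps the repetition in step 4 to a single replicate call, and makes it easy to change the sample size or the number of repetitions later.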
More on graphics:
o To output a plot/graph to a file:
    pdf(file=filename)     # generates a pdf file
    jpeg(file=filename)    # generates a jpeg file
    png(file=filename)     # generates a png file
    and some others
o When you are done graphing/plotting, run dev.off() to have the image saved to the file.
o Without calling the above functions, R generates graphics in a separate window.
o The package “xtable” allows you to output tables in various formats, including HTML and LaTeX.
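A minimal sketch of the open, plot, close sequence for a file device follows. The output file name and the plotted data are assumptions; jpeg and png follow the same pattern where those devices are available.

```r
out_file <- file.path(tempdir(), "histogram.pdf")   # hypothetical file name

pdf(file = out_file, width = 6, height = 4)   # open the file device (size in inches)
hist(rnorm(1000), main = "1000 standard normal draws", xlab = "value")
dev.off()                                      # close the device to finish the file

file.exists(out_file)
```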
Functions Writing/Debugging: Common functions
Help and administrative functions
o help.search("key word")            # e.g. help.search("random forest")
o help(functionName)                 # e.g. help(glm)
o install.packages("packageName")    # note the quotes
o require(packageName)
o save(file= , list= )
o save.image(file= )
Other common functions
o Model fitting
    lm, glm, lsfit, anova
    summary, coef, residuals
o Model adequacy checking
    av.plot (in the car package; named avPlot in newer versions), influence.measures, colldiag
Distribution functions
o For the normal distribution, R has 4 associated functions:
    dnorm: probability density function
    pnorm: cumulative distribution function
    qnorm: inverse of the cumulative distribution function (quantile function)
    rnorm: random number generator
o Others
    dpois, ppois, qpois, rpois (Poisson)
    dgeom, pgeom, qgeom, rgeom (geometric)
    dbinom, pbinom, qbinom, rbinom (binomial)
    dnbinom, pnbinom, qnbinom, rnbinom (negative binomial)
    dunif, punif, qunif, runif (uniform)
    dexp, pexp, qexp, rexp (exponential)
    dgamma, pgamma, qgamma, rgamma (gamma)
    dbeta, pbeta, qbeta, rbeta (beta)
    dchisq, pchisq, qchisq, rchisq (chi-square)
    df, pf, qf, rf (F distribution)
    dt, pt, qt, rt (t distribution)
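The four prefixes can be tried directly at the command line. The sketch below uses the normal distribution plus one binomial call; the seed and the chosen parameter values are arbitrary.

```r
dnorm(0)            # density at 0: 1/sqrt(2*pi), about 0.3989
pnorm(1.96)         # P(Z <= 1.96), about 0.975
qnorm(0.975)        # the 97.5% quantile, about 1.96
qnorm(pnorm(1.3))   # qnorm inverts pnorm, so this gives back 1.3

set.seed(1)
z <- rnorm(10000, mean = 5, sd = 2)   # 10000 random draws with mean 5, sd 2
mean(z); sd(z)                        # close to 5 and 2

# The same prefixes work for the other families, e.g. the binomial:
dbinom(3, size = 10, prob = 0.5)      # P(X = 3) = choose(10, 3) / 2^10
```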
Running R commands in batch mode in a Unix environment
o Suppose the R commands are in the file cmds.R
o At a command line prompt, type:
    R --no-save < cmds.R > output.log 2>&1
o To see the details of the command line options: man R
Wrapping up: References
References
Official R-project website:
o http://www.r-project.org
o On the left-hand side, there's a link “Manuals” under Documentation. There is quite a bit of good documentation there.
o The link “packages” gives a listing of available R packages and their documentation.
An excellent link with R examples (including linking R with C/C++ programs):
o http://www.math.ncu.edu.tw/~chenwc/R_note/
R for Windows FAQ:
o http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/rw-FAQ.html
Google:
o Since R is a single letter, searching for “R” might give you unrelated results. I've used: R+project, R+cran, R+stat, etc.
o CRAN stands for “Comprehensive R Archive Network”
This is it! Q&A
Thank you!
The slides are posted at: http://www-personal.umich.edu/~jktc/R/presentation2009.pptx
The sample R commands in the slides are posted at: http://www-personal.umich.edu/~jktc/R/samples/sample.cmds.2009.R