
An Easy Introduction To R

Ismail Basoglu

October 2, 2009

1 Introduction

This document contains an easy introduction to the programming language R. With the help of each example given in this document, you should be able to gather a basic knowledge of R, which will help you to comprehend and create financial applications in IE 586 and statistical applications in IE 508 courses. In order to comprehend this programming language, it is recommended that you try each and every step of the applications presented in this document.

You can download the latest version of R from “http://cran.r-project.org/”. For Windows users, click the “Windows” link, then the “base” link, and you will see the download link for the “*.exe” file.

Have fun!!!

2 R Works with Vectors

2.1 Creating Vectors

In order to assign a value to a specified variable (e.g. 3 to x), we do the following:

x<-3

or

x=3

We will use the operator <- in our future examples for assigning values.

When we assign a number to a variable, R counts it as a vector with a single element. So, by using [.] next to the specified variable, we can assign another element to any index we want. Finally, if we want to see what is stored in that variable, we simply write its name and press enter.

x[4]<-7.5
x # press enter to display the content of x
# [1] 3.0  NA  NA 7.5

Here, NA stands for “not available”. In fact, we have not assigned any values to the second and the third indices.

You can use # to add comments on a command line. R will ignore the rest of the line after this symbol. However, the next line will be executed by R. (So you do not have to close your comment line with # when it ends.)

We can create a vector of consecutive integers between two integers with a simple command.

x<-1:8 # creates a consecutive integer vector
x
# [1] 1 2 3 4 5 6 7 8
y<-15:11
y
# [1] 15 14 13 12 11


The following operation will add 3 to each element of x and store the result as y.

y<-x+3
y
# [1]  4  5  6  7  8  9 10 11

The previous command actually sums up a vector of length 8 with a single element vector. Here, R repeats the short vector again and again until it reaches the length of the long vector. The following sequence of commands explains this operation clearly.

x<-1:8
y<-1:4
x
# [1] 1 2 3 4 5 6 7 8
y
# [1] 1 2 3 4
x+y # we can see the summation without storing them to any new variable
# [1]  2  4  6  8  6  8 10 12

In this summation y is repeated up to index 8 (since x is a vector of length 8). So, the fifth element of x is summed up with the first element of y, the sixth element of x is summed up with the second element of y and so on. Yet, we might wonder what would happen if the length of x was not a multiple of the length of y. We can try it and see.

x<-1:8
y<-1:3
x+y
# [1]  2  4  6  5  7  9  8 10
# Warning message:
# In x + y : longer object length is not a multiple of shorter object length

R again repeats the short vector until it reaches the length of the long vector. However, the last repetition may not be complete. R returns a warning message about this, yet it executes the operation. You should also know that you can make subtractions, multiplications, divisions, power and modular arithmetic operations in the same way. We will get into these operations in section 2.4.
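As a rough sketch of what this recycling does, the short vector can be expanded by hand with the base R function rep_len() and the result compared with R's own answer. The names y_recycled, manual and automatic are only illustrative.

```r
x <- 1:8
y <- 1:3
# expand y by hand to the length of x: 1 2 3 1 2 3 1 2
y_recycled <- rep_len(y, length(x))
manual <- x + y_recycled
# automatic recycling gives the same values (the warning is silenced here)
automatic <- suppressWarnings(x + y)
identical(manual, automatic)
```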

We can also create vectors with specified values. For instance, let us create a vector of length 6 with values 4, 8, 15, 16, 23, 42 and another vector of length 4 with values 501, 505, 578, 586. We use the function c in order to combine those values in a vector. We can also learn the number of elements in a vector by using the length() command.

x<-c(4,8,15,16,23,42) # "c"ombines a series of values
y<-c(501,505,578,586)
x
# [1]  4  8 15 16 23 42
y
# [1] 501 505 578 586
z<-c(x,y) # we can also combine two vectors
z
# [1]   4   8  15  16  23  42 501 505 578 586

length(z)
# [1] 10

We can also reverse a vector, so that it runs from the last element to the first.

z<-rev(z) # we can use the same object to reassign that object
z
# [1] 586 578 505 501  42  23  16  15   8   4


Suppose we would like to create a vector of length 10, elements of which will all be equal to 5. We do the following.

x<-rep(5,10) # "rep"eat 5 ten times
x
# [1] 5 5 5 5 5 5 5 5 5 5
y<-c(3,5,7)
z<-rep(y,4) # repeat vector y 4 times
z
# [1] 3 5 7 3 5 7 3 5 7 3 5 7

From the previous example, we see that we can also repeat vectors. As a last example for this section, we would like to create a vector of length 21 between values 2 and 3, so that the differences between consecutive elements will all be equal.

x<-seq(2,3,length.out=21) # "seq" stands for sequence
x
# [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
# [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00

If we are not interested in the length of the sequence but in the step size, we can use the by parameter instead of length.out.

x<-seq(2,3,by=0.05)
x
# [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
# [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00

2.2 Logical Expressions

You can use

• < : less

• <=: less or equal

• > : greater

• >=: greater or equal

• ==: equal (do not forget that a single = symbol is used for assigning values)

• !=: not equal

to write logical expressions, so they will return a vector of TRUEs and FALSEs (which behave as ones and zeroes in arithmetic). In the following sequence of examples, we create a vector and use it in different logical expressions. If a vector element satisfies the expression, it returns a TRUE, otherwise a FALSE in the corresponding index. You can use && as “and” and || as “or” between logical expressions that each yield a single value; for elementwise comparisons on longer vectors, use & and | instead.

x<-10:20
x
# [1] 10 11 12 13 14 15 16 17 18 19 20
x<17
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
x<=17
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
x>14
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
x>=14
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
x==16
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
x!=16
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

x<-5
(x<=10) && (x>=8)
# [1] FALSE
(x<=10) || (x>=8)
# [1] TRUE
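The following sketch contrasts the elementwise operators & and | with && and ||. The elementwise forms return one TRUE or FALSE per element, which is what is needed when filtering a vector; the names between and outside are only illustrative.

```r
x <- 10:20
# & and | work element by element, so the result has one entry per element of x
between <- (x >= 12) & (x <= 15)
x[between]
outside <- (x < 12) | (x > 18)
x[outside]
# && and || are meant for single TRUE/FALSE values, e.g. inside if()
a <- 5
(a <= 10) && (a >= 8)
```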

So what can we do with these logical expressions? As a first simple example, we have a vector x of integers from 1 to 20. We want to obtain a vector that is zero wherever the element of x is less than 8 and equal to the element of x otherwise.

x<-1:20
y<-(x>=8)*(x)
y
# [1]  0  0  0  0  0  0  0  8  9 10 11 12 13 14 15 16 17
#[18] 18 19 20

As for the second example, we will evaluate the ordering costs of some goods. We can order at least 30 and at most 50 units of goods from our supplier in a single order. We have a fixed cost of 50$ if we order less than or equal to 45 units and 15$ otherwise. A single unit costs 7$ if we order less than 40 units and 6.5$ otherwise. If we want to evaluate the total ordering cost for each alternative:

units<-30:50
marginalcost<-7*units*(units<40)+6.5*units*(units>=40)
marginalcost
# [1] 210.0 217.0 224.0 231.0 238.0 245.0 252.0 259.0
# [9] 266.0 273.0 260.0 266.5 273.0 279.5 286.0 292.5
#[17] 299.0 305.5 312.0 318.5 325.0

fixedcost<-50*(units<=45)+15*(units>45)
fixedcost
# [1] 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 15
#[18] 15 15 15 15

totalcost<-fixedcost+marginalcost
totalcost
# [1] 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0
# [9] 316.0 323.0 310.0 316.5 323.0 329.5 336.0 342.5
#[17] 314.0 320.5 327.0 333.5 340.0

Following from the previous example, say we are not interested in an order that costs more than 318$. Under these circumstances, we just want to list the order sizes that we can afford and the costs that correspond to those order sizes.

units[totalcost<=318]
# [1] 30 31 32 33 34 35 36 37 38 40 41 46
totalcost[totalcost<=318]
# [1] 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0
# [9] 316.0 310.0 316.5 314.0

The first of the previous two commands yields only those elements of the units vector for which the corresponding element of the totalcost vector is less than or equal to 318. The second command yields only those elements of the totalcost vector which are less than or equal to 318.
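A closely related sketch uses the base R functions which() and which.min(), which return positions rather than TRUE/FALSE values. The cost vectors are rebuilt here from the figures of the ordering example so that the block runs on its own; idx and cheapest are illustrative names.

```r
units <- 30:50
totalcost <- 50*(units <= 45) + 15*(units > 45) +
  7*units*(units < 40) + 6.5*units*(units >= 40)
idx <- which(totalcost <= 318)    # positions of the affordable orders
units[idx]
cheapest <- which.min(totalcost)  # position of the smallest total cost
units[cheapest]
totalcost[cheapest]
```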


As we did in the previous example, we can extract a subvector (a subset of a vector which keeps the same order) from a vector in different ways. Check out the following examples:

x<-seq(5,8,by=0.3) # we will have 11 elements in this vector
x
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0

y1<-x[3:7] # extract a subvector from the indices 3 to 7
y1
# [1] 5.6 5.9 6.2 6.5 6.8

y2<-x[2*(1:5)] # extract a subvector from even indices
y2
# [1] 5.3 5.9 6.5 7.1 7.7

y3<-x[-1] # extract a subvector by eliminating the first index
y3
# [1] 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0

y4<-x[-length(x)] # extract a subvector by eliminating the last index
y4
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7

y5<-x[-seq(1,11,3)] # extract a subvector by eliminating all indices given
y5
# [1] 5.3 5.6 6.2 6.5 7.1 7.4 8.0

y6<-x[seq(1,11,3)] # extract a subvector by choosing all indices given
y6
# [1] 5.0 5.9 6.8 7.7

2.3 Creating Matrices

Every vector we create with the methods given in sections 2.1 and 2.2 is a vertical vector by default. Do not get confused by the way the vector is displayed. We can create a horizontal vector by using the function t(), where t stands for transpose.

x<-1:5
y<-t(x)
y
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    3    4    5

As you can see, R displays a horizontal vector in a completely different way. And if we take the transpose of vector y, we will see the actual display of a vertical vector.

t(y)
#      [,1]
# [1,]    1
# [2,]    2
# [3,]    3
# [4,]    4
# [5,]    5

In order to create an m × n matrix in R, first we need to create a vector (let us name it vec) which contains the columns of the matrix sequentially from the first to the last. Then we simply use the function matrix(vec,nrow=m,ncol=n).


vec<-1:12
x<-matrix(vec,nrow=3,ncol=4)

x
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12

t(x)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9
# [4,]   10   11   12
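If the values are listed row by row instead, matrix() also accepts the argument byrow=TRUE. The sketch below shows the two fill orders side by side; by_col and by_row are illustrative names.

```r
vec <- 1:12
by_col <- matrix(vec, nrow = 3, ncol = 4)                # default: fill column by column
by_row <- matrix(vec, nrow = 3, ncol = 4, byrow = TRUE)  # fill row by row
by_row
# filling by row gives the transpose of the 4 x 3 matrix filled by column
all(by_row == t(matrix(vec, nrow = 4, ncol = 3)))
```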

You can take the inverse of an n × n matrix by using the solve() function.

x<-matrix(c(1,2,-1,1,2,1,2,-2,-1),nrow=3,ncol=3)
x
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    2   -2
# [3,]   -1    1   -1

xinv<-solve(x)
xinv
#           [,1]        [,2] [,3]
# [1,] 0.0000000  0.25000000 -0.5
# [2,] 0.3333333  0.08333333  0.5
# [3,] 0.3333333 -0.16666667  0.0
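A quick way to check an inverse is to multiply it back with the original matrix, which should give the identity matrix up to rounding. The sketch also shows the two-argument form solve(a, b), which solves a linear system directly; the right-hand side b is a made-up example.

```r
x <- matrix(c(1, 2, -1, 1, 2, 1, 2, -2, -1), nrow = 3, ncol = 3)
xinv <- solve(x)
round(x %*% xinv, 10)   # identity matrix up to rounding error
# solve(x, b) solves x %*% z = b without forming the inverse explicitly
b <- c(1, 2, 3)         # hypothetical right-hand side
z <- solve(x, b)
all.equal(as.vector(x %*% z), b)
```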

You can create a matrix that has all its elements equal by writing that specific value into the first parameter position in the function matrix(). You can also assign a vector to the diagonal elements of a square matrix with the function diag().

x<-matrix(0,nrow=4,ncol=4)
x
#      [,1] [,2] [,3] [,4]
# [1,]    0    0    0    0
# [2,]    0    0    0    0
# [3,]    0    0    0    0
# [4,]    0    0    0    0

diag(x)<-1 # assigns 1 to all diagonal elements of x
x
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    0    1    0    0
# [3,]    0    0    1    0
# [4,]    0    0    0    1

You can learn the number of columns, the number of rows and the total number of elements in a matrix with the following functions.

x<-matrix(0,ncol=5,nrow=4)
ncol(x)
# [1] 5
nrow(x)
# [1] 4
length(x)
# [1] 20

2.4 Arithmetic Operations on R

We made a brief introduction to arithmetic operations in section 2.1. We have stated that we can sum and multiply two vectors componentwise, and that we can also make subtraction, division and modular arithmetic operations in the same way.

x<-2*(1:5)
x
# [1]  2  4  6  8 10

y<-1:5
y
# [1] 1 2 3 4 5

x+y
# [1]  3  6  9 12 15

x*y
# [1]  2  8 18 32 50

x/y
# [1] 2 2 2 2 2

x-y
# [1] 1 2 3 4 5

x^2 # makes a power operation
# [1]   4  16  36  64 100
x^y
# [1]      2     16    216   4096 100000

x%%3 # yields mod(3) of every element in x
# [1] 2 1 0 2 1

y<-3:7
y
# [1] 3 4 5 6 7

x%%y # makes a componentwise modular operation
# [1] 2 0 1 2 3

x%/%y # makes an integer division
# [1] 0 1 1 1 1

In the previous example, x and y were vertical vectors. Even if one of them were defined as a horizontal vector, R again would do those operations but this time the results would also be horizontal vectors.
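Integer division and the modulo operation are linked: the quotient times the divisor plus the remainder gives back the original value. The following sketch checks this for the vectors used above and shows how R treats a negative dividend; q and r are illustrative names.

```r
x <- 2*(1:5)
y <- 3:7
q <- x %/% y   # integer quotients
r <- x %% y    # remainders
all(q*y + r == x)
# with a positive divisor the remainder is never negative
(-7) %% 3    # 2
(-7) %/% 3   # -3
```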

You can find the maximum value of a vector with max() and its minimum value with min(). You can sum up all the elements of a vector with sum() and take the product of all the elements of a vector with prod().

x<-c(3,1,6,5,8,10,9,12,3)
min(x)
# [1] 1
max(x)
# [1] 12
sum(x)
# [1] 57
prod(x)
# [1] 2332800

You can compare two vectors componentwise with pmax() and pmin(), so you can obtain either the componentwise maximum or the componentwise minimum of two vectors (this will come in very handy, especially in option pricing simulation). You can also sort a vector with the function sort(), and the function order() yields the sequence of indices that sorts a vector (both functions sort values from minimum to maximum by default, but we can use the additional parameter decreasing=TRUE to obtain an order from maximum to minimum).

x<-1:10
y<-10:1
z<-c(3,2,1,6,5,4,10,9,8,7)

a<-pmax(x,y,z) # you can write as many vectors as you want
a
# [1] 10  9  8  7  6  6 10  9  9 10
sort(a)
# [1]  6  6  7  8  9  9  9 10 10 10
order(a)
# [1]  5  6  4  3  2  8  9  1  7 10

b<-pmin(x,y,z)
b
# [1] 1 2 1 4 5 4 4 3 2 1
sort(b,decreasing=TRUE)
# [1] 5 4 4 4 3 2 2 1 1 1
order(b,decreasing=TRUE)
# [1]  5  4  6  7  8  2  9  1  3 10
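To hint at why pmax() is handy in option pricing, here is a minimal sketch that evaluates option payoffs for a handful of terminal prices. The prices and the strike are made-up numbers, and the payoff formulas are the standard call and put payoffs, not something taken from the course material.

```r
prices <- c(80, 95, 100, 110, 125)  # hypothetical terminal prices
strike <- 100                       # hypothetical strike price
call_payoff <- pmax(prices - strike, 0)  # componentwise max with 0
put_payoff  <- pmax(strike - prices, 0)
call_payoff  # 0 0 0 10 25
put_payoff   # 20 5 0 0 0
# max() would instead collapse everything into a single number
max(prices - strike, 0)  # 25
```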

R can also do matrix multiplications with the operator %*%. This operator should be handled carefully to obtain correct results. Be sure about the dimensions of your matrices. R is also capable of making some corrections if the dimensions of the matrices do not match.

x<-matrix(1:6,ncol=2,nrow=3)
x
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6

y<-matrix(1:4,ncol=2,nrow=2)
y
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4

x%*%y
#      [,1] [,2]
# [1,]    9   19
# [2,]   12   26
# [3,]   15   33

y%*%x
# Error in y %*% x : non-conformable arguments

y%*%t(x) # taking the transpose should help
#      [,1] [,2] [,3]
# [1,]   13   17   21
# [2,]   18   24   30

Consider matrix multiplication of two vertical vectors. R treats the first vector as a horizontal vector and the operation yields a scalar. If we were to make a matrix multiplication of two horizontal vectors, R would not be able to make any correction and would yield an error. To return an outer product, the second vector must strictly be horizontal.

x<-1:3
y<-3:1

x%*%y
#      [,1]
# [1,]   10

t(x)%*%t(y)
# Error in t(x) %*% t(y) : non-conformable arguments

t(x)%*%y # same as the first operation but we have a correct notation now
#      [,1]
# [1,]   10

x%*%t(y) # only this one returns an outer product
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    6    4    2
# [3,]    9    6    3
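As a side note, base R also offers outer() and crossprod(), which state the intent explicitly and avoid having to think about vector orientation. This is a small sketch with the same two vectors; inner and outer_xy are illustrative names.

```r
x <- 1:3
y <- 3:1
inner <- sum(x * y)       # inner product as a plain number: 10
outer_xy <- outer(x, y)   # outer product, same values as x %*% t(y)
all(outer_xy == x %*% t(y))
drop(crossprod(x, y))     # t(x) %*% y, with the 1 x 1 matrix dropped to a number
```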

Given a vector of real values, you can obtain the cumulative sums vector by the function cumsum() and the cumulative products vector by the function cumprod().

x<-c(1,4,5,6,2,12)
y<-cumsum(x)
y
# [1]  1  5 10 16 18 30
# every index has the sum of the elements in x up to that index

z<-cumprod(x)
z
# [1]    1    4   20  120  240 2880
# every index has the product of the elements in x up to that index
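One common use of cumsum() is a running mean, obtained by dividing each cumulative sum by the number of elements summed so far. The sketch below uses the base R function seq_along() to generate the counts; running_mean is an illustrative name.

```r
x <- c(1, 4, 5, 6, 2, 12)
# element k is the mean of x[1], ..., x[k]
running_mean <- cumsum(x) / seq_along(x)
running_mean
# the last element is the ordinary mean of the whole vector
all.equal(running_mean[length(x)], mean(x))
```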

You can evaluate the factorial of a positive real number with factorial() and the absolute value of any real number with abs(). You can take the square root of a positive real number with sqrt(), and the logarithm of a positive real number with log(). You can compute the exponential function of a real number with exp() and the gamma function of a positive real number with gamma(). For integer rounding, floor() yields the largest integer which is less than or equal to the specified value and ceiling() yields the smallest integer which is greater than or equal to the specified value. as.integer() yields only the integer part of the specified value.


x<-c(1,4,5,6,2,12)
factorial(3)
# [1] 6
factorial(1:6)
# [1]   1   2   6  24 120 720

abs(-4)
# [1] 4
abs(c(-3:3))
# [1] 3 2 1 0 1 2 3

sqrt(4)
# [1] 2
sqrt(1:9)
# [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
# [9] 3.000000

log(100) # this is natural logarithm unless any base is defined
# [1] 4.60517
log10(100) # this is logarithm with base 10
# [1] 2
log2(100) # this is logarithm with base 2
# [1] 6.643856
log(100,5) # this is logarithm with base 5, which is the second parameter in log()
# [1] 2.861353
log(c(10,20,30,40))
# [1] 2.302585 2.995732 3.401197 3.688879

exp(4.60517) # must yield 100, maybe with a rounding error
# [1] 99.99998
exp(log(100)) # no rounding errors
# [1] 100
exp(seq(-2,2,0.4))
# [1] 0.1353353 0.2018965 0.3011942 0.4493290 0.6703200 1.0000000 1.4918247
# [8] 2.2255409 3.3201169 4.9530324 7.3890561

gamma(5) # equivalent to factorial(4)
# [1] 24
gamma(5.5) # equivalent to factorial(4.5)
# [1] 52.34278

x<-c(-3,-3.5,4,4.2)
floor(x)
# [1] -3 -4  4  4
ceiling(x)
# [1] -3 -3  4  5
as.integer(x)
# [1] -3 -3  4  4
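Two further base R rounding functions fit in here: trunc() drops the fractional part like as.integer() but keeps the numeric (double) type, and round() rounds to the nearest value with an optional number of decimal digits. A brief sketch:

```r
x <- c(-3, -3.5, 4, 4.2)
trunc(x)                    # -3 -3 4 4, same values as as.integer(x)
is.integer(as.integer(x))   # TRUE: as.integer() changes the storage type
is.double(trunc(x))         # TRUE: trunc() keeps a double vector
round(4.2)                  # nearest integer: 4
round(2.718282, 2)          # two decimal digits
```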


3 Probability and Statistical Basis of R

3.1 Probability Functions in R

There are four functions related to the distributions which are well-known and commonly used in probability theory and statistics. Let us give the definitions of those functions for the normal distribution and then talk about the other probability distributions which are available in R.

• dnorm(x,y,z): returns the pdf (probability density function) value of x in a normal distribution with mean y and standard deviation z.

• pnorm(x,y,z): returns the cdf (cumulative distribution function) value of x in a normal distribution with mean y and standard deviation z.

• qnorm(x,y,z): returns the inverse cdf value of x in a normal distribution with mean y and standard deviation z. Clearly x must be in the closure of the unit interval (x ∈ [0, 1]).

• rnorm(x,y,z): returns a vector of random variates (RVs) which has length x. The variates will follow a normal distribution with mean y and standard deviation z.

Check out the following examples about normal distribution:

dnorm(0.5) # if no parameter is defined, R assumes a std. normal distribution
# [1] 0.3520653
dnorm(0,2,1)
# [1] 0.05399097
dnorm(3,3,5)
# [1] 0.07978846

pnorm(0) # the area below the curve
         # on the left side of "0" in a std. normal distribution
# [1] 0.5
pnorm(2)
# [1] 0.9772499
pnorm(5,3,1)
# [1] 0.9772499

# following are the inverse of the previous "pnorm()" functions
qnorm(0.5)
# [1] 0
qnorm(0.9772499)
# [1] 2.000001
qnorm(0.9772499,3,1)
# [1] 5.000001

rnorm(20,2,1) # will generate 20 RVs which follow normal dist.
              # with mean 2 and std. dev. 1
# [1] 2.31502453 0.37445729 2.04994863 1.89381118 0.63099383 1.50837615
# [7] 0.57363369 2.84601422 2.54003868 3.43652548 0.88941281 3.36373629
# [13] 0.58945290 2.44678124 -0.05360271 2.73920472 2.73643684 1.79465998
# [19] 1.30906099 2.18648566

Here is a list of useful distributions that are available for computation in R. There are also other distributions which are available in R but not in this list. (For each distribution below, you can obtain the cdf by changing the initial d to p, the inverse cdf by changing it to q and the random variate generator by changing it to r.) Apart from the normal distribution, please try to practice and learn about the d, p, q, r functions for the first six distributions in this list.


• dpois(x,y) : returns the pmf (probability mass function) value of x in a Poisson distribution with mean (rate) y.

• dbinom(x,y,z) : returns the pmf value of x in a binomial distribution with a population size y and success probability z.

• dgeom(x,y) : returns the pmf value of x in a geometric distribution with a success probability y.

• dunif(x,y,z) : returns the pdf value of x in a uniform distribution with lower bound y and upper bound z.

• dexp(x,y) : returns the pdf value of x in an exponential distribution with a rate parameter y.

• dgamma(x,y,scale=z) : returns the pdf value of x in a gamma distribution with a shape parameter y and a scale parameter z. (If you do not write scale in the parameter definition, it assumes z as the rate parameter, which is equal to 1/scale.)

• dcauchy(x,y,z) : returns the pdf value of x in a Cauchy distribution with a location parameter y and scale parameter z.

• dchisq(x,y,z) : returns the pdf value of x in a chi-square distribution with degrees of freedom y and the non-centrality parameter z.

• dt(x,y,z) : returns the pdf value of x in a t-distribution with degrees of freedom y and the non-centrality parameter z.

• df(x,y,z,a) : returns the pdf value of x in an F-distribution with degrees of freedom-1 y, degrees of freedom-2 z and the non-centrality parameter a.

• dnbinom(x,y,z) : returns the pmf value of x in a negative binomial distribution with dispersion parameter y and success probability z.

• dhyper(x,y,z,a) : returns the pmf value of x (number of white balls) in a hypergeometric distribution with a white population size y, a black population size z and a drawings made from the whole population.

• dlnorm(x,y,z) : returns the pdf value of x in a log-normal distribution with log-mean y and log-standard deviation z.

• dbeta(x,y,z) : returns the pdf value of x in a beta distribution with shape-1 parameter y and shape-2 parameter z.

• dlogis(x,y,z) : returns the pdf value of x in a logistic distribution with a location parameter y and scale parameter z.

• dweibull(x,y,z) : returns the pdf value of x in a Weibull distribution with a shape parameter y and scale parameter z.
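The d, p, q, r naming pattern can be tried on any of the distributions listed above. The sketch below uses the exponential and binomial distributions with arbitrary parameter values and checks two relations: q inverts p, and the cdf of a discrete distribution is the running sum of its pmf.

```r
rate <- 3                      # arbitrary rate parameter
p <- pexp(0.5, rate)           # cdf: P(X <= 0.5)
qexp(p, rate)                  # inverse cdf takes us back to 0.5
1 - exp(-rate*0.5)             # closed form of the exponential cdf, equals p
# for a discrete distribution the cdf sums the pmf values
sum(dbinom(0:2, 10, 0.3))      # P(X <= 2) for 10 trials, success prob. 0.3
pbinom(2, 10, 0.3)             # same probability
```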

3.2 Statistical Functions in R and Analyzing Simulation Output

You can find the mean of a vector with the function mean(), its standard deviation with sd(), its variance with var() and its median with median(). You can use the function summary() to learn the 25 and 75 per cent quantiles (which, together with the median, are called quartiles).

x<-rnorm(1000000,5,2) # x is a vector of 1000000 RVs
                      # which follow a normal dist. with mean 5 and std. dev. 2
mean(x)
# [1] 4.997776
sd(x)
# [1] 2.000817
var(x)
# [1] 4.003268
median(x)
# [1] 4.997408
summary(x)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#  -4.904   3.650   4.997   4.998   6.346  14.420
summary(x,digits=6)
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# -4.90360  3.65020  4.99741  4.99778  6.34564 14.42310
quantile(x) # this command yields the quartiles also
#        0%       25%       50%       75%      100%
# -4.903599  3.650201  4.997408  6.345639 14.423129

# quartiles can also be obtained by the following way
sort(x)[1000000*0.25]
# [1] 3.650189
sort(x)[1000000*0.5]
# [1] 4.997408
sort(x)[1000000*0.75]
# [1] 6.345639

Of course, when you try this sequence of commands, you will get different results since rnorm() will produce RVs from a different seed.
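If repeatable results are wanted, for example to compare two runs, the seed can be fixed with the base R function set.seed() before generating the variates. The seed value below is arbitrary.

```r
set.seed(586)          # arbitrary seed value
a <- rnorm(5, 5, 2)
set.seed(586)          # resetting the seed restarts the same random stream
b <- rnorm(5, 5, 2)
identical(a, b)        # TRUE
```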

In Monte Carlo simulation, we gather a vector of random variables $x_i$, $i = 1, \ldots, n$ as the output. Since we are interested in the expectation of the output random variable, we estimate this expectation with the mean of the vector, which is $\bar{x} = \sum_{i=1}^{n} x_i / n$. This estimate is itself a random variable, since when we run the simulation again, we will get a different mean. Due to the central limit theorem, as n goes to infinity, the difference between the actual value and this estimate follows a normal distribution with mean 0 and standard deviation:

$$\frac{1}{n}\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{1}{\sqrt{n}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\mathrm{sd}(x)}{\sqrt{n}}$$

where sd(x) is the standard deviation of the output vector, evaluated with sd() in R. (sd() divides by n − 1 rather than n, which makes no practical difference for large n.)

n<-1000000
x<-rexp(n,3) # suppose x is our simulation output vector
             # it actually follows exponential distribution with rate 3
             # but assume we do not know this fact
mean(x) # this is our expectation that we are interested in
        # we also need the standard deviation of this expectation
# [1] 0.33318
sd(x)/sqrt(n) # this is the standard deviation of the mean
# [1] 0.0003337626
sd(x) # DO NOT CONFUSE this with the std. dev. of the mean.
      # this is the std. dev. of the output vector which might not
      # even follow a normal distribution
# [1] 0.3337626

# Here is an elegant way to summarize your simulation output
xest<-mean(x)
# Since xest follows a normal distribution, we can obtain a %95 confidence interval for it
error<-qnorm(0.975)*sd(x)/sqrt(n) # this is the radius for %95 confidence interval
ubound<-xest+error
lbound<-xest-error
res<-c(xest,error,ubound,lbound)
names(res)<-c("result","error estimate","%95ub","%95lb")
res
#       result error estimate          %95ub          %95lb
# 0.3331800165   0.0006541626   0.3338341792   0.3325258539
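The same summary can be packaged into a small reusable function. The sketch below is one possible way to do it; the function name mcsummary and the level argument are hypothetical and not part of R. For level = 0.95 the quantile used is qnorm(0.975), as above.

```r
# hypothetical helper: mean, error radius and confidence bounds of an output vector
mcsummary <- function(x, level = 0.95){
  n <- length(x)
  xest <- mean(x)
  # two-sided interval: (1 - level)/2 of the probability is left in each tail
  error <- qnorm(1 - (1 - level)/2) * sd(x)/sqrt(n)
  res <- c(xest, error, xest - error, xest + error)
  names(res) <- c("result", "error estimate", "lower", "upper")
  res
}

set.seed(1)                          # arbitrary seed, only for repeatability
out <- mcsummary(rexp(100000, 3))    # true expectation is 1/3
out
```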

4 Creating Functions and Defining Loops in R

4.1 Creating Functions in R

We use the following structure in order to create a specific function which is not already defined in R.

# f<-function(p1,p2,....) # define necessary parameters for the function
# {
#   use defined parameters (arguments) and other tools to obtain your result
#   write your result variable into the last line so the function will return it
# }

So the sequence of commands says f is a function with parameters (p1,p2,....) which does the operations in {}.

Check out the following examples of simple functions to comprehend how to create functions in R.

# EXAMPLE 01
# A function that yields the circumference and the area of a circle given the radius
circle<-function(r # radius length
){
  cf<-2*pi*r # evaluates the circumference
  a<-pi*r^2 # evaluates the enclosed area
  res<-c(cf,a)
  names(res)<-c("circumference","area")
  res
}

circle(3)
# circumference          area
#      18.84956      28.27433
circle(1)
# circumference          area
#      6.283185      3.141593

# EXAMPLE 02
# A function that yields the perimeter and the area of a triangle
# given corner coordinates
# Check "www.mathopenref.com/coordtriangleareabox.html" for the explanation
triangle<-function(a, # coordinate of 1st corner (must be a vector of length 2)
                   b, # coordinate of 2nd corner (must be a vector of length 2)
                   c  # coordinate of 3rd corner (must be a vector of length 2)
){
  if(length(a)!=2 || length(b)!=2 || length(c)!=2){
    print("error, coordinates inappropriate")
  }
  # evaluating the perimeter
  ab<-sqrt((a[1]-b[1])^2+(a[2]-b[2])^2)
  bc<-sqrt((c[1]-b[1])^2+(c[2]-b[2])^2)
  ac<-sqrt((a[1]-c[1])^2+(a[2]-c[2])^2)
  pm<-ab+bc+ac
  # evaluating the area
  trab<-abs((a[1]-b[1])*(a[2]-b[2]))/2
  trbc<-abs((c[1]-b[1])*(c[2]-b[2]))/2
  trac<-abs((a[1]-c[1])*(a[2]-c[2]))/2

  maxxy<-pmax(a,b,c)
  minxy<-pmin(a,b,c)

  sqa<-min(max((a[1]-minxy[1])*(a[2]-minxy[2]),0),max((maxxy[1]-a[1])*(maxxy[2]-a[2]),0))
  sqb<-min(max((b[1]-minxy[1])*(b[2]-minxy[2]),0),max((maxxy[1]-b[1])*(maxxy[2]-b[2]),0))
  sqc<-min(max((c[1]-minxy[1])*(c[2]-minxy[2]),0),max((maxxy[1]-c[1])*(maxxy[2]-c[2]),0))
  area<-(maxxy[1]-minxy[1])*(maxxy[2]-minxy[2])-trab-trbc-trac-sqa-sqb-sqc

  pm<-(area!=0)*pm # if area=0, then there is no triangle

  res<-c(pm,area)
  names(res)<-c("perimeter","area")
  res
}

coora<-c(23,18)
coorb<-c(13,34)
coorc<-c(50,5)
triangle(coora,coorb,coorc)
# perimeter      area
#  95.84525 151.00000

coora<-c(10,18)
coorb<-c(13,34)
coorc<-c(50,5)
triangle(coora,coorb,coorc)
# perimeter      area
#  105.3489  339.5000

coora<-c(3,5)
coorb<-c(9,15)
coorc<-c(6,10)
triangle(coora,coorb,coorc)
# perimeter      area
#         0         0

Remember the ordering cost problem in section 2.2. We will create a function that yields the output in case of a change in unit costs and ordering costs. In this function we will also assign default values to input parameters. So, whenever a parameter is undefined in the function call, R will assume the default value for this parameter.

# EXAMPLE 03
orderingcostlist<-function(huc=7, # higher unit cost
                           luc=6.5, # lower unit cost
                           ucc=40, # minimum order amount with the lower unit cost
                           hfc=50, # higher fixed cost
                           lfc=15, # lower fixed cost
                           fcc=45, # maximum order amount with the higher fixed cost
                           tcub=318 # total cost upper bound
){
  units<-30:50
  marginalcost<-huc*units*(units<ucc)+luc*units*(units>=ucc)
  fixedcost<-hfc*(units<=fcc)+lfc*(units>fcc)
  totalcost<-fixedcost+marginalcost
  res<-totalcost[totalcost<=tcub]
  names(res)<-units[totalcost<=tcub]
  res
}

orderingcostlist() # will yield the same results as before
#    30    31    32    33    34    35    36    37    38    40    41    46
# 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0 316.0 310.0 316.5 314.0

orderingcostlist(hfc=55,luc=6.3) # we just change two parameter values
#    30    31    32    33    34    35    36    37    40    41    46    47    48
# 265.0 272.0 279.0 286.0 293.0 300.0 307.0 314.0 307.0 313.3 304.8 311.1 317.4

In order to see the construction of an if-else statement in R, we will implement the following function as a last example.

$$f(x) = \begin{cases} x^2 & x < -2 \\ x + 6 & -2 \le x < 0 \\ -x + 6 & 0 \le x < 4 \\ \sqrt{x} & x \ge 4 \end{cases}$$

# EXAMPLE 04
f<-function(x){
  if(x<(-2)){
    x^2
  }else if(x<0){
    x+6
  }else if(x<4){
    -x+6
  }else{
    sqrt(x)
  }
}

c(f(-4),f(-1),f(3),f(9))
# [1] 16  5  3  3
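The if-else chain above handles one value of x at a time. As a possible alternative, the base R function ifelse() evaluates a condition elementwise, so the whole vector can be passed in one call. The name fvec is hypothetical. Since ifelse() evaluates every branch for all elements, abs() is used inside sqrt() only to avoid warnings for negative inputs; those values are never selected.

```r
# vectorized sketch of the same piecewise function
fvec <- function(x){
  ifelse(x < (-2), x^2,
  ifelse(x < 0,    x + 6,
  ifelse(x < 4,    -x + 6,
                   sqrt(abs(x)))))   # only selected when x >= 4
}

fvec(c(-4, -1, 3, 9))   # 16 5 3 3
```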

Note that you can also use predefined functions (R functions) as parameters. You will see an example of this in section 4.2.

4.2 Defining Loops in R

For a predefined number of iterations, we use the following basic structure:

# for(i in x){ # as i gets sequential values from vector x in each loop
#   do required operations depending on i variable
# }

You can do every vectorized operation with a for-loop. But in R, it takes longer to execute loops. Thus, it is better to use vectorized operations when possible. The following example estimates the expectation of the maximum of two standard uniform random variates, Y = max(U1, U2), which is actually equal to 2/3. We will not use the pmax() function. Instead, we will define a for-loop. This is our first Monte Carlo simulation in this document.

simmax2unif<-function(n){
  y<-0 # in order to record the output of our simulation in "y"
       # we should define it before the for-loop
  for(i in 1:n){ # i will take integer values from 1 to n
    u1<-runif(1)
    u2<-runif(1)
    y[i]<-max(u1,u2) # record the estimate as the "i"th entry
  }
  res<-mean(y)
  res[2]<-qnorm(0.975)*sd(y)/sqrt(n)
  names(res)<-c("expectation","error estimate")
  res
}

simmax2unif(100000)
#    expectation error estimate
#    0.665354266    0.001463458
system.time(x<-simmax2unif(100000)) # execution time in seconds
#    user  system elapsed
#   35.30    0.08   35.43

# Do the same simulation with pmax()
simmax2unif_2<-function(n){
  u1<-runif(n)
  u2<-runif(n)
  y<-pmax(u1,u2)
  res<-mean(y)
  res[2]<-qnorm(0.975)*sd(y)/sqrt(n)
  names(res)<-c("expectation","error estimate")
  res
}

simmax2unif_2(1000000)
#    expectation error estimate
#   0.6665182787   0.0004621282
system.time(x<-simmax2unif_2(100000)) # execution time in seconds
#    user  system elapsed
#    0.03    0.00    0.03

As you can see, vectorized operations work much faster than loops. Still, under some circumstances, loops might be the only option to make a computation.
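When a loop cannot be avoided, one commonly recommended habit is to allocate the output vector at its full length before the loop instead of letting it grow one element at a time. The sketch below is a variant of the loop version under that assumption; the function name simmax2unif_3 is hypothetical, and no timing is claimed here since it depends on the machine and the R version.

```r
simmax2unif_3 <- function(n){
  y <- numeric(n)            # allocate the whole output vector once
  for(i in seq_len(n)){      # seq_len(n) is 1, 2, ..., n
    y[i] <- max(runif(2))    # maximum of two standard uniform variates
  }
  c(expectation = mean(y),
    "error estimate" = qnorm(0.975)*sd(y)/sqrt(n))
}

set.seed(2)                  # arbitrary seed, only for repeatability
res3 <- simmax2unif_3(10000)
res3
```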

While-loops are especially useful for convergence algorithms. When the number of iterations is not known in advance, we use a while-loop, which is defined as follows:

# while(condition){ # as long as the condition is satisfied, run the loop
#   do required operations
# }
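A small example of this structure, with names made up for illustration: starting from an interval of length 1, the loop halves the length until it is no larger than a tolerance and counts the steps it needed.

```r
len <- 1      # current interval length
tol <- 1e-12  # stop once len is not larger than tol
steps <- 0
while(len > tol){ # the number of passes is not fixed in advance
  len <- len/2
  steps <- steps + 1
}
steps
# [1] 40
```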


Here is a basic root finding algorithm that uses a while-loop:

# a root finding algorithm
# finds the unique real root of a continuous function in an interval
# the function should intersect with x-axis and should not be a tangent to x-axis
findroot<-function(
  f,              # continuous function that we will solve for zero
  interval,       # the interval where we have a single solution (a vector of length 2)
  errbound=1e-12, # maximum approximation error
  trace=FALSE     # if trace is true, print the convergent sequence
){
  a<-interval[1]
  b<-interval[2]
  if(f(a)*f(b)>0){
    print("error - no solution or more than one solution")
  }else{
    counter<-0
    res<-0
    err<-abs(a-b)
    while(err>errbound){
      c<-(a+b)/2
      fc<-f(c)
      if(f(a)*fc>0){
        a<-c
      }else{
        b<-c
      }
      err<-abs(a-b)
      counter<-counter+1
      res[counter]<-a
    }
    print(c(a,counter))
    if(trace){
      print(res)
    }
  }
}

func<-function(x){x^2-2}
int<-c(1,2)
findroot(func,int)
# [1] 1.414214 40.000000
findroot(func,int,trace=TRUE)
# [1] 1.414214 40.000000
# [1] 1.000000 1.250000 1.375000 1.375000 1.406250 1.406250 1.414062 1.414062
# [9] 1.414062 1.414062 1.414062 1.414062 1.414185 1.414185 1.414185 1.414200
# [17] 1.414207 1.414211 1.414213 1.414213 1.414213 1.414213 1.414214 1.414214
# [25] 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214
# [33] 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214
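R also ships with a root finder, uniroot(), which takes a function and an interval in much the same way as findroot. The sketch below solves the same equation. The tolerance value is an arbitrary choice for this example, and uniroot() uses a different algorithm than plain interval halving, so its iteration count need not match the 40 steps shown above.

```r
func <- function(x){x^2-2}
sol <- uniroot(func, interval=c(1,2), tol=1e-10)
sol$root # approximate root, close to sqrt(2)
sol$iter # number of iterations used
```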

5 Drawing Plot Diagrams and Histograms in R

We would like to plot the density function of the standard normal distribution on the interval (-4,4). We should create a dense vector of points on the x-axis (it should be dense in order to make a good approximation), and evaluate the function at these points to obtain a second vector.

x<-seq(-4,4,length.out=51) # this is not dense enough
y<-dnorm(x)
plot(x,y) # plots with blank dots (figure 1)

windows() # you can use this command to display your diagram in a new window
plot(x,y,type="l") # connects the same dots (figure 2)

x<-seq(-4,4,length.out=10001) # this is a dense vector
y<-dnorm(x)
windows()
plot(x,y,type="l") # connects more dense dots (figure 3)
# diagrams are in the next page
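The windows() command is only available when R runs on Windows. As a hedged alternative, dev.new() opens a new graphics window on any platform, and pdf() writes the diagram to a file instead of the screen. The sketch below uses pdf(); the output file is a temporary file created only for this example.

```r
x <- seq(-4,4,length.out=1001)
y <- dnorm(x)

outfile <- tempfile(fileext=".pdf") # hypothetical output file
pdf(outfile)        # open a pdf file as the graphics device
plot(x,y,type="l")  # the diagram is drawn into the file
dev.off()           # close the file
file.exists(outfile)
# [1] TRUE
```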

Now, we want to see how to draw a histogram of a vector in R with hist(). Histograms are handy tools for seeing the distribution of given data. You can obtain a better histogram by changing the breaks parameter.

x<-rnorm(1000000,3,1.5)
# a vector of normal RVs with mean 3 and std. dev. 1.5

hist(x)

windows()
hist(x,breaks=50)

windows()
hist(x,breaks=100)
# histograms are in the next page

You can also add new lines and functions to a plot or a histogram that is already displayed. We use the lines() command, which works much like the plot() command, except that there is no need to add a type parameter. You can also add straight lines to existing diagrams with the abline() command. Check out the following examples.

hist(x,breaks=100)
y<-seq(-5,10,length.out=100001)
lines(y,dnorm(y,3,1.5)*200000)

y<-seq(-5,10,length.out=101)
windows()
plot(y,dnorm(y,3,1.5))
lines(y,dnorm(y,3,1.5))

windows()
plot(y,dnorm(y,3,1.5),type="l")
abline(v=4.5) # add a "v"ertical line on x=4.5
abline(v=1.5) # add a "v"ertical line on x=1.5
abline(h=dnorm(1.5,3,1.5)) # add a "h"orizontal line on y=dnorm(1.5,3,1.5)
abline(a=0.10,b=0.01) # add a line with slope=0.01 and intercept=0.10
# diagrams are in the next page
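The factor 200000 in the first example converts the density to the count scale of the histogram: the expected count in a bin is roughly the sample size times the bin width times the density, which gives 1000000 × 0.2 if the bins chosen by hist() are 0.2 wide (an assumption about that particular run). A sketch that avoids the manual factor is to draw the histogram on the density scale with freq=FALSE, so that dnorm() can be added directly. A smaller sample is used here only to keep the example quick.

```r
set.seed(1)
x <- rnorm(10000,3,1.5)
h <- hist(x,breaks=100,freq=FALSE) # y-axis shows density instead of counts
y <- seq(-5,10,length.out=1001)
lines(y,dnorm(y,3,1.5))            # no scaling factor needed

# the bar areas of a density histogram add up to 1
sum(h$density*diff(h$breaks))
```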


Figure 1: Plot diagrams for the density function of standard normal distribution

Figure 2: Histograms of a vector of normal RVs with mean 3 and std. dev. 1.5

Figure 3: Adding lines on existing diagrams with lines() (1-2) and abline() (3) commands


6 Basic User Information

6.1 Scanning and Printing Data

Assume that you have data (containing real numbers) written in a text file in the following format.

3 25 94.9 12
547 32556 56
89 567
435 342.1
76.5 983.2
0 343
# There are 15 real values

You can use the command scan() to store this data in a vector by scanning it from left to right and from top to bottom. Spaces and new lines separate the values, and each value is stored at a new index.

x<-scan()
# press enter after writing this line, it will display "1:" on the command line
# Press CTRL+V to paste the copied data, 15 real values will be stored in x
# it will display "16:" on the command line
# Press enter in order to finish scanning process, 16th index will be ignored

# 1: 3 25 94.9 12
# 5: 547 32556 56
# 8: 89 567
# 10: 435 342.1
# 12: 76.5 983.2
# 14: 0 343
# 16:
# Read 15 items

x
# [1] 3.0 25.0 94.9 12.0 547.0 32556.0 56.0 89.0 567.0
# [10] 435.0 342.1 76.5 983.2 0.0 343.0

You can also scan a column of cells from an Excel sheet, but not rows. Be careful: the decimal separator in R is (.), so by default you can only scan values that use (.) as the decimal separator.
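scan() can also read from a character string through its text argument, which is convenient for trying things out without pasting, and its dec argument sets the decimal separator. The sketch below reads the 15 values from above and then a short made-up string that uses a comma as the decimal separator.

```r
x <- scan(text="3 25 94.9 12
547 32556 56
89 567
435 342.1
76.5 983.2
0 343")
length(x)
# [1] 15

z <- scan(text="3,5 2,25 10", dec=",") # hypothetical data with decimal commas
z
```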

You can also read tables from a text file. Assume you have a text file containing data in a format similar to the following:

length weight age
1.72 72.3 25
1.69 85.3 23
1.80 75.0 26
1.61 66 23
1.73 69 24
# 3 values in each row

Right-click the R shortcut on your desktop. Choose Properties and see your “Start In” directory (you can also change it). Copy your text file and paste it into that directory. Suppose it is named data.txt. Write the following command:

x<-read.table(file="data.txt",header=TRUE)
# if you do not have any headers in your data, choose header as FALSE
x # press enter to display x table
# length weight age
# 1 1.72 72.3 25
# 2 1.69 85.3 23
# 3 1.80 75.0 26
# 4 1.61 66.0 23
# 5 1.73 69.0 24
x$length
# [1] 1.72 1.69 1.80 1.61 1.73
x$weight
# [1] 72.3 85.3 75.0 66.0 69.0
x$age
# [1] 25 23 26 23 24
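If you are not sure which directory R reads from, getwd() prints it and setwd() changes it. For a quick test without any file, read.table() also accepts the table as a character string through its text argument. The sketch below uses that with the data from above (the name tab is made up for this example) and then computes some column summaries.

```r
getwd() # shows the current working directory

tab <- read.table(header=TRUE, text="
length weight age
1.72 72.3 25
1.69 85.3 23
1.80 75.0 26
1.61 66 23
1.73 69 24")
nrow(tab)
# [1] 5
mean(tab$age)
# [1] 24.2
colMeans(tab) # mean of each column
```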

In order to read a table from an Excel sheet, you can copy and paste it into a text file and then read the table from that file.

You can print a message or a vector within a function by using the print() command. To print a message, do not forget to put it in quotation marks.

print("error")
# [1] "error"
x<-1:5
print(x)
# [1] 1 2 3 4 5

6.2 Session Management

You can find detailed information about the functions that come with R. You can learn about the parameters (arguments) of a function and see a few examples of its use. Just write ? followed by the name of the function that you want to learn about. Check out the explanations given in R for the following functions.

?det
?sample
?sin
?cbind

You can use apropos(".") to find a list of all functions whose names contain a specific string. These functions can come from the default libraries or can be defined by you in that work session.

apropos("norm")
# [1] "dlnorm" "dnorm" "normalizePath" "plnorm"
# [5] "pnorm" "qlnorm" "qnorm" "qqnorm"
# [9] "qqnorm.default" "rlnorm" "rnorm"

apropos("exp")
# [1] ".__C__expression" ".expand_R_libs_env_var" ".Export"
# [4] ".mergeExportMethods" ".standard_regexps" "as.expression"
# [7] "as.expression.default" "char.expand" "dexp"
# [10] "exp" "expand.grid" "expand.model.frame"
# [13] "expm1" "expression" "getExportedValue"
# [16] "getNamespaceExports" "gregexpr" "is.expression"
# [19] "namespaceExport" "path.expand" "pexp"
# [22] "qexp" "regexpr" "rexp"
# [25] "SSbiexp" "USPersonalExpenditure"
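The search string in apropos() is treated as a regular expression, so the list can be narrowed. As a small sketch, the pattern below keeps only names that start with r and end with norm. The full output depends on which packages are loaded, so it is not shown; the name hits is made up for this example.

```r
hits <- apropos("^r.*norm$") # names beginning with "r" and ending with "norm"
hits
"rnorm" %in% hits
# [1] TRUE
```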

If you need to see all the objects that you have created in your work session, simply write objects().

objects()
# [1] "a" "b" "circle" "coora"
# [5] "coorb" "coorc" "error" "f"
# [9] "findroot" "fixedcost" "func" "int"
# [13] "lbound" "marginalcost" "n" "orderingcostlist"
# [17] "res" "simmax2unif" "simmax2unif_2" "totalcost"
# [21] "triangle" "ubound" "units" "vec"
# [25] "x" "xest" "xinv" "y"
# [29] "y1" "y2" "y3" "y4"
# [33] "y5" "y6" "z"

7 Exercises

1. Write an R function that takes

• your initial capital K = 100,

• continuously compounding interest rate r = 0.12,

• a vector ty of times (yearly)

as parameters (arguments). The function should yield the state of your capital at the end of the time periods given in the ty vector. The function should also yield a plot which shows the state of the capital on the y-axis and time on the x-axis.

Note: The state of the capital at time t is evaluated by K(t) = K × e^{rt}.

(a) Run the function in order to show your capital state at the end of every year until the end of the 10th year.

(b) Run the function in order to show your capital state at the end of every month until the end of the 3rd year.

2. We are interested in finding π with Monte Carlo simulation.

Hint: Generate two vectors of standard uniform RVs of length n = 10000. Pairing them will yield uniformly distributed points on [0, 1] × [0, 1]. Count the number of points which fall into the unit circle, say it is x. Now, 4 × x/n should yield an estimate for π. Repeat the procedure nout = 100 times.

Note: You can find interesting information about π in the link:

http://www.sixtysymbols.com/videos/035.htm

3. James and Dwight are flipping a coin which has a head probability p. James scores 1 point whenever a head comes up and Dwight scores 1 point whenever a tail comes up. The game ends as soon as somebody gets 10 points ahead. We are interested in the number of coin flips that James and Dwight need to make before a winner is determined. Use Monte Carlo simulation to answer the following questions.

(a) What is the expected number of coin flips if p = 0.4? Draw a histogram of the simulation output.

(b) What is the expected number of coin flips if p = 0.5? Draw a histogram of the simulation output.

(c) What is the expected number of coin flips if p = 0.55? Draw a histogram of the simulation output.

Note: The difference between the scores at the ith coin flip, D(i) = j(i) − d(i), is called a random walk process. The problem can be solved analytically with a Markov chain structure; however, we are interested in a solution with Monte Carlo simulation.

Hint 1 (primitive method): Create a for-loop of length n that stores the number of coin flips they have to make to claim a winner (in a result vector).

In each pass of the for-loop, run a while-loop that generates a standard uniform random variate to identify the winner of the flip and adds one point to his score. Before closing the while-loop, evaluate the absolute difference between the scores (that will decide whether to break out of the while-loop or not). Also put a counter into the while-loop, in order to find the number of coin flips. Store this number in the corresponding index of your result vector just when you break out of the while-loop.

Hint 2 (a faster and more professional way): Generate a vector of standard uniform RVs (of length k). Identify the winner of each round with a single uniform RV. Use a trick with cumsum() (How?). If a sequence of k rounds is not enough to reach an absolute difference of 10 points, append another k uniform RVs to the end of the previous vector of uniform RVs. Go on until you get a 10 point difference. Then, find the first index that yields a 10 point absolute difference.

Repeat the whole procedure n times.
