Introducing

Upload: kenyon-ledford

Post on 31-Mar-2015


Page 1: Introducing. What is and what can I do with it? R is freely available software for Windows, Mac OS and Linux To download R in New Zealand: //cran.stat.auckland.ac.nz

Introducing R

What is R and what can I do with it?

R is freely available software for Windows, Mac OS and Linux
To download R in New Zealand: http://cran.stat.auckland.ac.nz/

What is R?
A very simple programming language
A place for you to input data
A collection of tools for you to perform calculations
A tool for producing graphics
A statistics suite that can be downloaded on to any PC, Mac or Linux system
A software package that can run on high performance computing clusters

What is R and what can I do with it?

With R you can:

Perform simple or advanced statistical tests and analyses
e.g. standard deviation, t-test, principal component analysis

Read and manipulate data from existing files
e.g. tables in Excel files, trees in nexus files, data on websites

Write data or figures to files
e.g. export a figure to .pdf, export a .csv file

Produce simple or advanced figures

What is R and what can I do with it?

http://dx.doi.org/10.1098/rspb.2014.0806

Figure 2. A reconstruction of the evolutionary history of carotenoid pigmentation in feathers. The likelihood that ancestors could display carotenoid feather pigments has been reconstructed using ‘hidden’ transition rates in three rate categories (AIC = 4002.5, 11 transition rates) [33]. The POEs (defined in Material and methods) for carotenoid feather pigmentation are identified by red circles. Branches are coloured according to the proportional likelihood of carotenoid-consistent colours at the preceding node. Solid purple points indicate species for which carotenoid feather pigments were confirmed present from chemical analysis; open black points represent those for which where carotenoids were not detected in feathers after chemical analysis. Supertree phylogeny from [21].

Who is this guide for?

Starting at ground level and shaping you into a confident R user

Are you…
Completely new to R?
An infrequent R user who wants a refresher?

The material in these slides may not be useful for confident R users.

An Introduction to R
W. N. Venables, D. M. Smith and the R Core Team
http://cran.r-project.org/doc/manuals/R-intro.pdf

What does this guide cover?

Part zero: Getting started
Interacting with R

Part one: Objects
Vectors, Matrices, Character arrays

Part two: Data manipulation
Analysing data, T-test

Part three: External data
Reading data into R, ANOVA

Part four: Packages and libraries
Installing new packages into R

Part five: Scripts
Using pre-written code

Part six: Logic (programming)
Other functions in R

Starting

This guide will demonstrate the R Console (command-line input) for R 3.02 running in Windows 7. For Mac OS, R can be executed from terminal. For Unix, seek professional help…

The only point of difference should be the initial starting of R and the visual appearance: Console commands will be the same for all operating systems.

Part zero: Getting started

#Throughout this guide a hashtag (i.e. number sign ‘#’) will identify a comment or instruction

#Start R by finding the R application on your computer
#You will be presented with the R console

Part zero: Getting started

#There are a variety of ways of using R, and we will start out with the most basic
#We are going to enter lines of code into R by typing or pasting them into the R console

#At its most basic, R is just a calculator

> 1+1
[1] 2
> 1*3
[1] 3
> 4-7
[1] -3
> 20/4
[1] 5
>

#The lines above this have come from the R Console. Remember to remove the > symbol if you copy text directly from these slides and paste it into R

Part zero: Getting started

#Some more basic mathematical operations in R

> 12--2
[1] 14
> 2^2
[1] 4
> sqrt(9)
[1] 3
> 4*(1+2)
[1] 12
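#A minimal sketch of the order in which R carries out these operations. The particular sums are illustrative and are not taken from the original slides. Powers are worked out first, then multiplication and division, then addition and subtraction, and parentheses override that order.

```r
1 + 2 * 3      # multiplication happens first, giving 7
(1 + 2) * 3    # parentheses happen first, giving 9
2 * 3^2        # the power happens first, giving 18
12 - -2        # the same as 12--2 above: subtracting a negative, giving 14
```

#Adding spaces around operators does not change the result, it only makes the line easier to read.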

Part zero: Exercise

#Use R to find the length of the hypotenuse in the triangle shown below

#Side a has length 3, Side b has length 4, and the hypotenuse has length h

$h^2 = a^2 + b^2$

$h = \sqrt{a^2 + b^2}$

[Figure: right-angled triangle with sides of length 3 and 4 and hypotenuse h]

Part zero: Exercise

#Use R to find the length of the hypotenuse in the triangle shown below

> sqrt(3^2+4^2)
[1] 5

[Figure: right-angled triangle with sides of length 3 and 4 and hypotenuse h]

Part one: Objects

#R is more than just a basic calculator…
#Most operations in R will use objects, which are values stored in R

#Type x=1 into the R console
#You have now input a number into R by storing that number as an object. For this example, the name of our object is x
#Object names should start with a letter, and can then contain letters, numbers, dots (.) and underscores (_)
#Object names cannot include spaces

> x=1
>

#Congratulations, you have just programmed R to store an object.

#Type x into the R console to recall your object
> x
[1] 1
>
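#A short sketch of object naming, using made-up object names that do not appear elsewhere in this guide. It also uses the arrow <-, which is the assignment operator most often seen in R code and does the same job as = in these examples.

```r
# Hypothetical object names, chosen only to illustrate the naming rules
species_count <- 12        # letters and an underscore
species.count2 <- 15       # letters, a dot and a number
total <- species_count + species.count2
total                      # recalls the stored value, 27

# make.names() shows how R would repair names that break the rules
repaired <- make.names(c("my data", "2x"))
repaired                   # "my.data" "X2x"
```

#R is case sensitive, so total and Total would be two different objects.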

Part one: Objects

#We will now replace the value of x with 10

> x
[1] 1
> x=10
> x
[1] 10
>

#As you can see, the value of an object can be easily replaced by simply making the object equal to a new value

Part one: Objects

#Let’s make y into a vector - a one dimensional array

#There are several ways of making a vector in R. These methods introduce functions.
#A function is an operation performed on numbers and/or objects.

#The two easiest ways of making a vector in R use different functions:

#Use the concatenate function c and place numbers inside parentheses
> y=c(10,11,12,13,14,15,16,17,18,19,20)
> y
 [1] 10 11 12 13 14 15 16 17 18 19 20

#Use the array function and place numbers inside parentheses. Here 10:20 is shorthand for the whole numbers from 10 to 20
> y=array(10:20)
> y
 [1] 10 11 12 13 14 15 16 17 18 19 20
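#A brief sketch comparing a few ways of building the same run of numbers. The object names are illustrative. The seq() function and the colon operator are part of base R; they are shown here as alternatives and are not the method used on the slide.

```r
y_colon <- 10:20                            # the colon operator on its own
y_seq   <- seq(from = 10, to = 20, by = 1)  # the seq() function
y_array <- array(10:20)                     # the array() call from the slide

length(y_colon)          # 11 values
all(y_colon == y_seq)    # TRUE: both hold the same numbers
is.vector(y_colon)       # TRUE
is.array(y_array)        # TRUE: array() attaches a dimension to the values
```

#For the examples in this guide the difference between a plain vector and a one-dimensional array does not matter.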

Part one: Objects

#Just as we replaced x with a single value, we can also replace a single value within our vector

#Let’s replace the fifth number in our vector with 0

> y
 [1] 10 11 12 13 14 15 16 17 18 19 20
> y[5]=0
> y
 [1] 10 11 12 13  0 15 16 17 18 19 20
>

#Square brackets [] placed after a vector will instruct R that we are interested in only a part of the vector. In the example above, we are referring to the fifth position in the vector

Part one: Objects

#Try these vector manipulations as well:

> y[1]=y[2]
> y
 [1] 11 11 12 13  0 15 16 17 18 19 20
>

#The value of the first position was changed to be the same as the value in the second position

> y[c(1,3,5)]=5
> y
 [1]  5 11  5 13  5 15 16 17 18 19 20
>

#The values in the first, third and fifth positions were made equal to 5
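#A few more ways of picking out parts of a vector, sketched with illustrative object names. Negative positions and logical tests inside the square brackets are standard R, but they go slightly beyond what the slides above cover.

```r
y <- c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)

first_three   <- y[1:3]      # positions 1 to 3: 10 11 12
without_first <- y[-1]       # a negative position drops that element
large_values  <- y[y > 17]   # a logical test keeps the matching values: 18 19 20

y[y > 17] <- 0               # replace every value greater than 17 with 0
y
```

#The last two lines combine selection and replacement, in the same way as y[c(1,3,5)]=5.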

Part one: Objects

#Onward! We will make a new object, a two-dimensional matrix, and call it z

#Our matrix will have ten rows and ten columns, and we will start out by filling all the cells with 0

> z=matrix(0,ncol=10,nrow=10)
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
>

Part one: Objects

#We can replace parts of our matrix, like we did with our vector
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
> z[1,3]=33
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0   33    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

#Here, the two numbers inside the square brackets are a coordinate for the matrix: first row, third column

Part one: Objects

#We can replace an entire row by not providing a column coordinate
> z[1,]=33
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   33   33   33   33   33   33   33   33   33    33
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
>

#Likewise, we can replace an entire column
> z[,3]=c(1:10)
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   33   33    1   33   33   33   33   33   33    33
 [2,]    0    0    2    0    0    0    0    0    0     0
 [3,]    0    0    3    0    0    0    0    0    0     0
 [4,]    0    0    4    0    0    0    0    0    0     0
 [5,]    0    0    5    0    0    0    0    0    0     0
 [6,]    0    0    6    0    0    0    0    0    0     0
 [7,]    0    0    7    0    0    0    0    0    0     0
 [8,]    0    0    8    0    0    0    0    0    0     0
 [9,]    0    0    9    0    0    0    0    0    0     0
[10,]    0    0   10    0    0    0    0    0    0     0
>
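#The same coordinate idea extends to blocks of cells. This is a small sketch that rebuilds z from scratch so that it can be run on its own; the block replacement in the second half is an extra illustration and is not part of the original slides.

```r
z <- matrix(0, nrow = 10, ncol = 10)
z[1, ] <- 33           # whole first row
z[, 3] <- 1:10         # whole third column

dim(z)                 # 10 10: number of rows, then number of columns
z[1, 3]                # 1, because the column was replaced after the row

z[2:3, 5:6] <- 7       # rows 2 to 3 of columns 5 to 6
sum(z == 7)            # 4 cells now hold the value 7
```

#Row coordinates always come before the comma and column coordinates after it.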

Part one: Objects

#Lastly, we will make a character array, which is like a vector or a matrix except that it holds text. Letters and numbers can both be placed in it, but the numbers are stored as text

> w=matrix("df",ncol=10,nrow=10)
> w
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
 [2,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
 [3,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
 [4,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
 [5,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
 [6,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
 [7,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
 [8,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
 [9,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
[10,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df" 
>

#So, this covers the basics of creating objects for storing data in R.
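#A small sketch of what happens when numbers and letters share one object. The object names mixed and number_again are illustrative. R stores everything in such an object as text, so a number has to be converted back before it can be used in a calculation.

```r
mixed <- c(1, "a", 2)        # numbers and letters together
class(mixed)                 # "character": everything is stored as text
mixed                        # "1" "a" "2"

number_again <- as.numeric(mixed[1]) + 1   # convert "1" back to a number, then add 1
number_again                 # 2

w <- matrix("df", ncol = 10, nrow = 10)
is.character(w)              # TRUE
```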

Part one: Objects

#Let’s clean out the objects that we made in Part One

> ls()
[1] "w" "x" "y" "z"
>

#The list objects command ls() will show us which objects are stored in R
#We can permanently remove a specific object with the rm() function

> rm(x)
> ls()
[1] "w" "y" "z"
>

#We can also remove all objects

> rm(list = ls())
> ls()
character(0)

Part one: Exercise

#Make a new matrix object with three columns and seven rows, and fill every cell with the number 9. Use your first name as the name of the matrix object.

#Make a new vector object with the numbers 101, 898 and -3. Use your surname as the name of the vector object.

#Replace the fourth row of your matrix with your vector.

Part one: Exercise

#Make a new matrix object with three columns and seven rows, and fill every cell with the number 9. Use your first name as the name of the matrix object.
> daniel=matrix(9,ncol=3,nrow=7)
> daniel
     [,1] [,2] [,3]
[1,]    9    9    9
[2,]    9    9    9
[3,]    9    9    9
[4,]    9    9    9
[5,]    9    9    9
[6,]    9    9    9
[7,]    9    9    9

#Make a new vector object with the numbers 101, 898 and -3. Use your surname as the name of the vector object.
> thomas=c(101,898,-3)
> thomas
[1] 101 898  -3

#Replace the fourth row of your matrix with your vector.
> daniel[4,]=thomas
> daniel
     [,1] [,2] [,3]
[1,]    9    9    9
[2,]    9    9    9
[3,]    9    9    9
[4,]  101  898   -3
[5,]    9    9    9
[6,]    9    9    9
[7,]    9    9    9

HELP!

#You can call on the help function if you become lost or stuck when using R

#Can’t remember how to make a matrix?

> ?matrix
>
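#Some other ways of asking R about a function, sketched here as a supplement to ?matrix. All of the functions used are part of base R. The calls that open a help page are left as comments because they are best run interactively.

```r
args(matrix)                            # prints the arguments that matrix() accepts
matrix_args <- names(formals(matrix))   # the same argument names, stored as a vector
matrix_args                             # "data" "nrow" "ncol" "byrow" "dimnames"

# These open help pages, so try them at the console:
# help("matrix")          the same as ?matrix
# help.search("matrix")   search all help pages; ??matrix is the shortcut
# example(matrix)         run the examples from the matrix help page
```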

Part two: Data manipulation

#This will be a worked example for a Student’s T-test for the means of two samples, showcasing the storage and analysis of data in R

Part two: Data manipulation

#Make x a vector containing 1000 random numbers

> set.seed(1)
> x=rnorm(1000)

#Make y a vector containing 1000 random numbers

> set.seed(100)
> y=rnorm(1000)

#The random numbers in R are not truly random: they are pseudo-random numbers, generated by an algorithm, that have many characteristics of random data. Using the set.seed function, we can fix the starting point of that algorithm, which defines the set of ‘random’ numbers used in our calculations. This will mean that we should all get the same results from our ‘random’ numbers

#We will use Student’s T-test to see if the mean of x and mean of y are significantly different
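#A quick way to convince yourself that set.seed makes the ‘random’ numbers repeatable. This is a minimal sketch with illustrative object names; only five numbers are drawn each time to keep it short.

```r
set.seed(1)
first_draw <- rnorm(5)

set.seed(1)                 # reset to the same starting point
second_draw <- rnorm(5)

identical(first_draw, second_draw)   # TRUE: the same seed gives the same numbers

third_draw <- rnorm(5)      # no reset this time, so the sequence simply continues
identical(first_draw, third_draw)    # FALSE
```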

Part two: Data manipulation

#What are the assumptions for a T-test?

#1) That the two samples (x and y) are each normally distributed
#2) That the two samples have the same variance
#3) That the two samples are independent

#These are calculated data so we will assume that 3) is true.

#We should test 1) and 2) if we want our T-test results to be meaningful!

Part two: Data manipulation

#We will use the Shapiro-Wilk¹ test to see if the data are normally distributed

#The Shapiro-Wilk test calculates a normality statistic (W) and tests the null hypothesis that the data are normal

#We would reject the null hypothesis for our sample if we received a p-value of <0.05

#To perform a Shapiro-Wilk test in R we use the shapiro.test function

> shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.9988, p-value = 0.7256

> shapiro.test(y)

        Shapiro-Wilk normality test

data:  y
W = 0.9993, p-value = 0.9765

¹ Shapiro SS & Wilk MB. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591–611
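#The result of a test can itself be stored as an object, which lets us pull out the p-value and not have to read it off the screen. A minimal sketch, assuming x is generated exactly as in this part of the guide; the name x_normality is illustrative.

```r
set.seed(1)
x <- rnorm(1000)

x_normality <- shapiro.test(x)   # store the whole test result
x_normality$statistic            # the W statistic
x_normality$p.value              # the p-value on its own

# TRUE would mean we reject the null hypothesis that x is normal
reject_normality <- x_normality$p.value < 0.05
reject_normality
```

#The same $p.value approach works for the other tests used in this guide, such as var.test and t.test.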

Part two: Data manipulation

#We will use an F-test¹ to see if x and y have equal variances

#The null hypothesis of this F-test is that the two datasets have equal variances, and this hypothesis is rejected if the p-value is <0.05

#We calculate an F-test for equal variances in R using the var.test function

> var.test(x,y)

        F test to compare two variances

data:  x and y
F = 1.0084, num df = 999, denom df = 999, p-value = 0.8947
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.890733 1.141648
sample estimates:
ratio of variances 
          1.008417 

¹ Box, G.E.P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335.
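#The F statistic reported by var.test is simply the variance of x divided by the variance of y. The sketch below checks this by hand, assuming x and y are generated as earlier in this part; the names variance_ratio and f_result are illustrative.

```r
set.seed(1)
x <- rnorm(1000)
set.seed(100)
y <- rnorm(1000)

variance_ratio <- var(x) / var(y)   # the ratio worked out by hand
f_result <- var.test(x, y)          # the same test as on the slide, stored as an object

# unname() drops the label "F" so that only the numbers are compared
same_value <- isTRUE(all.equal(unname(f_result$statistic), variance_ratio))
same_value                          # TRUE
```

#A ratio close to 1 means that the two variances are similar, which is what the null hypothesis states.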

Part two: Data manipulation

#Are your x and y normally distributed? (hint… mine are)
#Do your x and y have equal variances? (hint… mine do)

Part two: Data manipulation

#Let’s perform the Student’s T-test and see if the mean of x and the mean of y are significantly different

#We will use a simple form of the t.test function. This test requires three pieces of information: x, y, and information about equal variance

> t.test(x,y,var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = -0.6161, df = 1998, p-value = 0.5379
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.11903134  0.06212487
sample estimates:
  mean of x   mean of y 
-0.01164814  0.01680509 

#The null hypothesis for this test is that x and y have the same mean value. The confidence level was set at 0.95, which corresponds to a significance level of 0.05, so the rejection criterion would be a p-value less than 0.05. Did we reject the null hypothesis?
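#To see where the t value comes from, the sketch below works out the equal-variance (pooled) t statistic by hand and compares it with the t.test result. It assumes x and y are generated as earlier in this part; the object names are illustrative.

```r
set.seed(1)
x <- rnorm(1000)
set.seed(100)
y <- rnorm(1000)

n_x <- length(x)
n_y <- length(y)

# Pooled variance: a weighted average of the two sample variances
pooled_var <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# t statistic: difference in means divided by its standard error
t_by_hand <- (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n_x + 1 / n_y))

# Two-sided p-value from the t distribution with n_x + n_y - 2 degrees of freedom
p_by_hand <- 2 * pt(-abs(t_by_hand), df = n_x + n_y - 2)

t_result <- t.test(x, y, var.equal = TRUE)
isTRUE(all.equal(unname(t_result$statistic), t_by_hand))   # TRUE
isTRUE(all.equal(t_result$p.value, p_by_hand))             # TRUE
```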

Part two: Exercise

#Generate vector objects a and b as below

> set.seed(10)
> a=rnorm(1000,sd=2)
> set.seed(50)
> b=rnorm(1000,sd=1)

#Is the mean of a significantly different from the mean of b? Is it appropriate to use a Student’s T-test to address this question?

Part two: Exercise

> shapiro.test(a)

        Shapiro-Wilk normality test

data:  a
W = 0.9979, p-value = 0.2538

> shapiro.test(b)

        Shapiro-Wilk normality test

data:  b
W = 0.9978, p-value = 0.2242

> var.test(a,b)

        F test to compare two variances

data:  a and b
F = 3.7431, num df = 999, denom df = 999, p-value < 2.2e-16
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 3.306307 4.237678
sample estimates:
ratio of variances 
          3.743136 

#The F-test rejects equal variances, so Student’s T-test is not appropriate. Setting var.equal=F makes R use Welch’s T-test, which does not assume equal variances

> t.test(a,b,var.equal=F)

        Welch Two Sample t-test

data:  a and b
t = 0.3949, df = 1497.218, p-value = 0.693
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1106290  0.1663946
sample estimates:
   mean of x    mean of y 
 0.022749483 -0.005133326 

>

Is the mean of a different from the mean of b?

p-value = 0.693

Fail to reject the null hypothesis that the means are equal.

Part three: External data

#Datasets can often be too large to type into R. This section of the guide will show you how to automatically read data into R and then perform an analysis

#For this test we will perform a one-way analysis of variance (ANOVA)

#Right click on the dataset embedded above the arrow, move the mouse to ‘Macro-Enabled Worksheet Object’, click Open, and then save the table as IUCN.csv (a comma separated values file) to a folder on your computer

#The dataset contains a count of endangered species for sixty randomly selected countries in three different regions. These data have been extracted from Table 6a of the IUCN Red List summary statistics: http://www.iucnredlist.org/documents/summarystatistics/2010_3RL_Stats_Table_6a.pdf

Part three: External data

#We are going to use a one-way ANOVA to see if the mean number of endangered species is different in different regions (AFRICA, ASIA and EUROPE).

#First step: we will now tell R where to look for the file, using the setwd() function

> setwd("H:/Projects/Teaching/R")

#Hint: your working directory will be different to mine
#Note: we use forward slashes / and not backslashes \

#Second step: we read the file into R as a new object called IUCN. The term sep="," is used because values in the dataset are separated by commas. The term header=T is used because the first row of the IUCN table contains column names

> IUCN=read.table("IUCN.csv",sep=",",header=T)

#Alternatively, if we know the full file path, then we could read the file into R without using setwd()

> IUCN=read.table("H:/Projects/Teaching/R/IUCN.csv",sep=",",header=T)
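#If you do not have IUCN.csv to hand, the sketch below writes a tiny made-up table to a temporary file and reads it back in, so that the reading step can be tried on its own. The countries and counts are invented for illustration and are not the IUCN data. The column name Country is an assumption; Region and Endangered_species are the names used later in this part. The read.csv function is a base R shortcut for read.table with sep="," and header=TRUE.

```r
# A made-up three-row table (NOT the real IUCN data)
demo_table <- data.frame(
  Country = c("country_a", "country_b", "country_c"),
  Region = c("AFRICA", "ASIA", "EUROPE"),
  Endangered_species = c(12, 7, 3)
)

csv_path <- file.path(tempdir(), "IUCN_demo.csv")   # a temporary file location
write.csv(demo_table, csv_path, row.names = FALSE)  # save the table as a .csv file

getwd()                                             # shows the current working directory

from_read_table <- read.table(csv_path, sep = ",", header = TRUE)
from_read_csv   <- read.csv(csv_path)               # shortcut for comma separated files

names(from_read_table)   # "Country" "Region" "Endangered_species"
nrow(from_read_csv)      # 3 rows of data; the header row is not counted
```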

Part three: External data

#What are the assumptions for a one-way ANOVA?

#1) That the data in each group have been randomly selected from a normal distribution
#2) That each group of data have the same variance
#3) That each group of data is independent

#Assumption 3) may be unlikely but we will assume it is true.

#We should test 1) and 2) if we want our ANOVA results to be meaningful!

Part three: External data

#We will use the Shapiro-Wilk test to see if the data from each region (AFRICA, ASIA and EUROPE) are normally distributed

#First though, we will separate out the data for each region so that we can test for normality separately
> af=IUCN[which(IUCN[,2]=="AFRICA"),3]

#Let’s take a closer look:
#IUCN[,2] calls up the second column of the IUCN object

#The which() function is asking ‘which of the values in column 2 of the IUCN object are equal to the word “AFRICA”?’: which(IUCN[,2]=="AFRICA"). This gives us the Africa row numbers.

#Now we can use the Africa row numbers to find the number of Endangered species for each African country. These species counts are stored in column 3 of the IUCN object: IUCN[which(IUCN[,2]=="AFRICA"),3]

#Now we store the endangered species counts for African countries as the af object: af=IUCN[which(IUCN[,2]=="AFRICA"),3]

Part three: External data

#Repeat for ASIA and EUROPE

> ai=IUCN[which(IUCN[,2]=="ASIA"),3]
> eu=IUCN[which(IUCN[,2]=="EUROPE"),3]
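#With the three regions separated, the Shapiro-Wilk test can be run on each one. The sketch below uses an invented stand-in table called IUCN_demo (random counts, NOT the real IUCN data) so that it can be run without the file. The column name Country is an assumption. It also shows that selecting by column name gives the same values as selecting by column number.

```r
set.seed(42)
# Invented stand-in for the IUCN table: 10 made-up countries per region
IUCN_demo <- data.frame(
  Country = paste0("country_", 1:30),
  Region = rep(c("AFRICA", "ASIA", "EUROPE"), each = 10),
  Endangered_species = rpois(30, lambda = 10)
)

# The approach from the slides: row numbers from which(), then column 3
af_by_position <- IUCN_demo[which(IUCN_demo[, 2] == "AFRICA"), 3]

# The same values selected by column name
af_by_name <- IUCN_demo$Endangered_species[IUCN_demo$Region == "AFRICA"]
identical(af_by_position, af_by_name)   # TRUE

# One Shapiro-Wilk p-value per region, using tapply() to loop over the regions
normality_p <- tapply(
  IUCN_demo$Endangered_species,
  IUCN_demo$Region,
  function(counts) shapiro.test(counts)$p.value
)
normality_p
```

#With the real data, shapiro.test(af), shapiro.test(ai) and shapiro.test(eu) would do the same job one region at a time.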

Part three: External data

#We will use a Bartlett Test of Homogeneity of Variances¹ to test if variance is equal across our three groups (AFRICA, ASIA, EUROPE).

#The function for the Bartlett test is simply bartlett.test(). The terms for this function will be the Endangered species column of the IUCN object, and the Region column of the IUCN object: column 3 and column 2 respectively.

#A Bartlett test operates similarly to an F-test. The null hypothesis for this Bartlett test is that the groups have equal variances.

#We would reject the null hypothesis for our dataset if we received a p-value of <0.05.

> bartlett.test(IUCN[,3]~IUCN[,2])

        Bartlett test of homogeneity of variances

data:  IUCN[, 3] by IUCN[, 2]
Bartlett's K-squared = 11.6261, df = 2, p-value = 0.002988

¹ Box, G.E.P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335.

Part three: External data

#Here we reject the null hypothesis: at least one Region has a variance that is not equal to the variance of another Region in the dataset.
#Our dataset does not satisfy the second assumption of the ANOVA. We can still proceed however.
#The ANOVA test is robust to violations of this second assumption. This means that it can still produce meaningful results even if the groups do not have equal variances. As a rule of thumb, we can proceed if the maximum variance of our groups is less than 4 times the minimum variance of our groups.
> var(af)
[1] 25.07692
> var(ai)
[1] 9.002849
> var(eu)
[1] 7.464387
>

#The variance of the number of endangered species in Africa is substantially greater than the other two variance values. However, the Africa group variance is less than 4 times the variance of the Europe group
> var(af)<4*var(eu)
[1] TRUE

#So, we will proceed, but we need to be aware that with unequal variances it will be tougher for an analysis of variance to find a significant result.
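#The rule of thumb can be checked for any number of groups by dividing the largest variance by the smallest. The sketch below types in the three variances reported above, so it can be run without the data file; the object names are illustrative.

```r
# Group variances as reported on the slide
group_vars <- c(AFRICA = 25.07692, ASIA = 9.002849, EUROPE = 7.464387)

variance_ratio <- max(group_vars) / min(group_vars)
variance_ratio          # about 3.36

variance_ratio < 4      # TRUE: within the rule of thumb
```

#With the data loaded, the same check could be written as max(var(af),var(ai),var(eu))/min(var(af),var(ai),var(eu)).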

Part three: External data

#Perform the one-way ANOVA using the aov() function with the following syntax, and store the results as an object called IUCN_ANOVA

> IUCN_ANOVA=aov(Endangered_species~Region,data=IUCN)

#You can see the ANOVA results by calling up the IUCN_ANOVA object

> IUCN_ANOVA
Call:
   aov(formula = Endangered_species ~ Region, data = IUCN)

Terms:
                  Region Residuals
Sum of Squares   703.284  1080.148
Deg. of Freedom        2        78

Residual standard error: 3.721297
Estimated effects may be unbalanced
>

Part three: External data

#Use the summary() function to find out more about the ANOVA

> summary(IUCN_ANOVA)
            Df Sum Sq Mean Sq F value   Pr(>F)    
Region       2  703.3   351.6   25.39 3.21e-09 ***
Residuals   78 1080.1    13.8                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>

#Interpretation: How do we read this table to find out if the mean number of endangered species is different in different regions?

#The null hypothesis for this test is that the mean number of endangered species is the same in each region. We would reject this null hypothesis if the p-value (i.e. Pr(>F)) is less than the significance level for this test (i.e. <0.05). So, we reject the null hypothesis, and conclude that the mean number of endangered species is significantly different between regions.
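#The numbers in the summary table are linked by simple arithmetic: each Mean Sq is a Sum Sq divided by its Df, and the F value is the Region Mean Sq divided by the Residuals Mean Sq. The sketch below reproduces this from the sums of squares printed by IUCN_ANOVA, so it does not need the data file; the object names are illustrative.

```r
# Values copied from the IUCN_ANOVA output above
ss_region <- 703.284
ss_resid  <- 1080.148
df_region <- 2
df_resid  <- 78

ms_region <- ss_region / df_region   # Mean Sq for Region, about 351.6
ms_resid  <- ss_resid / df_resid     # Mean Sq for Residuals, about 13.8

f_value <- ms_region / ms_resid      # about 25.39, matching the F value column

# Pr(>F): the chance of an F value this large or larger if the null hypothesis is true
p_value <- pf(f_value, df_region, df_resid, lower.tail = FALSE)
p_value
```

#With the data loaded, the p-value can also be pulled straight from the table with summary(IUCN_ANOVA)[[1]][["Pr(>F)"]][1].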

Part three: External data

#Is the number of endangered species different between all regions, or just different for one region? To find out we will use Tukey’s Honest Significant Difference test.

#The function for Tukey’s HSD is simply TukeyHSD(). The test uses the following syntax

> TukeyHSD(IUCN_ANOVA,"Region")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Endangered_species ~ Region, data = IUCN)

$Region
                   diff       lwr        upr     p adj
ASIA-AFRICA   -4.185185 -6.605050 -1.7653208 0.0002620
EUROPE-AFRICA -7.185185 -9.605050 -4.7653208 0.0000000
EUROPE-ASIA   -3.000000 -5.419864 -0.5801356 0.0111684

#Tukey’s HSD provides a pairwise test of each group in the ANOVA. Any Region pair with a p adj value <0.05 had a significantly different mean number of endangered species. Here all three pairs have a p adj value <0.05.

Part three: External data

#Bonus: Let’s plot our IUCN data to better visualise these results

> boxplot(Endangered_species~Region,data=IUCN)

[Figure: boxplot of Endangered_species by Region (AFRICA, ASIA, EUROPE; y-axis ticks from 5 to 25), annotated to show the median, upper quartile, lower quartile, minimum (excl. outliers), maximum (excl. outliers) and an outlier]
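#A figure like this can be written straight to a .pdf file. The sketch below is one way to do it with the base R pdf() and dev.off() functions. It uses an invented stand-in table called plot_demo (random counts, NOT the real IUCN data) and a temporary file location, both of which are assumptions made so that it can be run on its own.

```r
set.seed(2)
# Invented stand-in data: 10 made-up counts per region
plot_demo <- data.frame(
  Region = rep(c("AFRICA", "ASIA", "EUROPE"), each = 10),
  Endangered_species = rpois(30, lambda = 10)
)

pdf_path <- file.path(tempdir(), "IUCN_boxplot.pdf")

pdf(pdf_path)                 # open a pdf file to draw into
boxplot(Endangered_species ~ Region, data = plot_demo,
        ylab = "Endangered species (count)")
dev.off()                     # close the file so that it is saved

file.exists(pdf_path)         # TRUE once the file has been written
```

#Everything plotted between pdf() and dev.off() goes into the file and not on to the screen.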

Part three: Exercise

#Plotting basics

#To quickly generate a plot in R using only default options, simply use the plot() function.

> plot(af)

#There are many arguments that you can change to improve the look of your plots

> plot(af,xlab="Country",main="Africa",col=rainbow(100),pch=16,ylab="Endangered species (number)",cex=2,font=6)

> barplot(af,col="red",names.arg=IUCN[which(IUCN[,2]=="AFRICA"),1],las=2,ylab="Endangered species (count)",main="Africa")

#Use ?plot and ?barplot to learn about the arguments you can change when plotting data

Part four: Packages and libraries

#You have been using some of the basic functions that are packaged with R, and you have been either generating or importing datasets

#Anyone can write a new function in R though, or make a dataset, and these functions and datasets can be bundled together into a package

#R is modular, which means you can download and install new packages to give you access to new functions and/or datasets

#There is an automatic and a manual method for installing packages. This guide will teach you how to manually install packages in R

#Why the manual method you ask? Because R requires internet access to download packages, which can be complicated by a University proxy. I can’t guarantee that the proxy won’t be an issue. That’s why. Well that, and it will be good for you.

Part four: Packages and libraries

#This will be an exercise in downloading the ‘Analyses of Phylogenetics and Evolution’ package, first released by Emmanuel Paradis in 2002

#The abbreviation for this package is ape

Part four: Packages and libraries

#Open a web browser and enter http://cran.r-project.org/web/packages/ape/index.html into the address bar – go to the website. The page should be mostly black text on a white background.

#Find the Downloads section towards the bottom of the website.

#For Mac users: download the Mac OS X binary (ape_3.1-4.tgz)

#For PC users: download the Windows binary (ape_3.1-4.zip)

#For UNIX users: again, seek professional help

#Save the ape_3.1-4.xxx file somewhere on your computer that you can easily find

#Note to future users: the file name may be slightly different if Paradis has updated ape

Part four: Packages and libraries

#Run R

#Use the install.packages function with the following syntax to install the ape package

> install.packages("H:/Teaching/ape_3.1-4.zip")

#Remember to replace my file path “H:/Teaching/” with the file path of the folder where you downloaded the ape package

#You should see text like this appear after you enter the install.packages command

Installing package into ‘C:/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from the file name
package ‘ape’ successfully unpacked and MD5 sums checked

#Congratulations, you have now added functions and datasets written by Emmanuel Paradis to your own copy of R
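
#A slightly more defensive sketch of the same install step. It builds the file path from a folder and a file name, and only tries to install if the file is really there. The folder and file name are the ones used in this guide; replace them with your own. Setting repos = NULL explicitly tells R to install from a local file, which is what R inferred from the file name above.

```r
pkg_dir  <- "H:/Teaching"                      # replace with your own folder
pkg_file <- file.path(pkg_dir, "ape_3.1-4.zip") # joins folder and file name with "/"

if (file.exists(pkg_file)) {
  # Install from the downloaded file rather than from the internet
  install.packages(pkg_file, repos = NULL)
} else {
  # A common mistake is a typo in the path, so report what R looked for
  message("File not found: ", pkg_file)
}
```

#If you see the ‘File not found’ message, check the folder name and the exact file name of the download.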

Part four: Packages and libraries

#You only need to install a package into R once. The package is now available as a ‘library’. If you want to use the ape library in your current R session, then you need to load the library into R

> library(ape)

#So, you install a package once, and load a library many times (every time you run R and want to use the library)

#The ape library is now available for you to use. ape is a library of datasets and tools that have been designed around phylogenetic analyses. We will quickly explore some of the data and functions in ape:

> data(bird.orders)

#The data function loads a dataset into R. Here we have loaded the bird orders dataset that is part of the ape library

Part four: Packages and libraries

> plot(bird.orders)

#The plot function detects that bird.orders is a special type of object: it is a ‘phylo’ class of object. This is a different object class from the vectors, matrices and data frames that we have been working with

#The ape library has a special plot function for plotting ‘phylo’ objects. This special plot function was used in place of the normal plot function when we plotted bird.orders

#Don’t worry! All of this happened automatically because we loaded the ape library
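
#A toy sketch of how R picks a special function based on the class of an object. The class name ‘toytree’ and its contents are invented for this example and have nothing to do with ape. The example uses print rather than plot so that the result appears in the console, but the mechanism is the same.

```r
# Make a small object and label it with a made-up class, "toytree"
x <- structure(list(tips = c("A", "B", "C")), class = "toytree")

# Write a print function for that class. Because its name is
# print.<class>, R uses it whenever print() is given a "toytree" object.
print.toytree <- function(x, ...) {
  cat("A toy tree with", length(x$tips), "tips\n")
  invisible(x)
}

class(x)   # "toytree"
print(x)   # A toy tree with 3 tips
```

#In the same way, ape supplies a function called plot.phylo, and plot() uses it whenever the object has the class ‘phylo’.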

Part four: Packages and libraries

#Test: Use the ? (help) function for plot.phylo to learn how to plot the bird.orders dataset as a fan, as below

> ?plot.phylo

Part four: Exercise

#Download, install and load two packages: ggplot2 and labeling

#Find the packages by searching Google for ‘r ggplot2 cran’ and ‘r labeling cran’, or use the links below

http://cran.r-project.org/web/packages/labeling/index.html
http://cran.r-project.org/web/packages/ggplot2/index.html

#Use the new data and functions provided by these packages to plot the density of diamonds against their weight (carat).

> qplot(carat, data = diamonds, geom = "density", colour = color)

#For more information on ggplot see http://ggplot2.org/book/qplot.pdf

Part five: Scripts

#One of the best features of R is the ability to automatically carry out many commands, one after another. For this type of operation we would first write all of our commands into a script, and then enter the entire script into R in one action

#We are going to use previously scripted code for this section of the guide. Our script will generate, analyse and plot some data.

#Go ahead and open this embedded text file by right clicking on it and clicking ‘Packager Shell Object Object’, then ‘Activate Contents’

Part four script.txt

#Copy the entire contents of this notepad document and paste it all into R

#Now, read through the notepad document to find out what has taken place
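
#If you cannot open the embedded file, here is a short stand-in script to practise with. It is NOT the original script from this guide; the data are simulated and every name in it is made up. Like the original, it generates, analyses and plots some data. Copy all of it and paste it into R in one action.

```r
# Stand-in practice script (not the guide's original script)

# 1. Generate: fifty simulated measurements
set.seed(42)                       # makes the random numbers repeatable
heights <- rnorm(50, mean = 170, sd = 10)

# 2. Analyse: the average and the standard deviation
avg    <- mean(heights)
spread <- sd(heights)

# 3. Plot: a histogram of the simulated measurements
hist(heights, main = "Simulated heights", xlab = "Height (cm)")
```

#You can also save a script like this as a text file and run it with the source() function, giving the file path in quotes.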

Part six: Logic (programming)

#There are many functions in R that do more than just basic mathematical operations

#We have seen one already, the which() function. This function looked through an object to find a particular value that we wanted.

> which(IUCN[,2]=="AFRICA")
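
#A small sketch of what which() gives back, using a made-up vector instead of the IUCN data. which() does not return the values themselves; it returns the positions where the test is TRUE.

```r
# Made-up vector of regions (not the IUCN data)
region <- c("AFRICA", "ASIA", "AFRICA", "EUROPE")

region == "AFRICA"          # TRUE FALSE TRUE FALSE
which(region == "AFRICA")   # 1 3
```

#Those positions can then be used inside square brackets to pull out the matching rows, as we did with IUCN.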

#Here we will focus on loops, which we access using the for() function.

#A loop is written as follows

> for(i in 1:10){ }

# for starts the loop
# i is a value that will be updated as the loop iterates
# 1 is the starting value for i
# 10 is the final value for i
#The curly brackets {} enclose the calculations that are looped
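
#A loop does not have to count from 1 to 10. As a rough sketch, the loop below steps through a vector of region names instead. The name visited is made up for this example.

```r
visited <- character(0)   # an empty vector to collect names in

for (region in c("AFRICA", "ASIA", "EUROPE")) {
  # On each iteration, region holds the next name in the vector
  visited <- c(visited, region)
  print(region)
}

visited
```

#The loop runs once for every element of the vector after the word in, whatever that vector contains.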

Part six: Logic (programming)

#Make j = 1

> j=1

#We will use a loop to increase the value of j by i through ten iterations

> for(i in 1:10){ j+i }

#We don’t get to see what happens inside a loop unless we specifically ask for it

> for(i in 1:10){
+ print(j+i)
+ }
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11

Part six: Logic (programming)

#What is the new value of j? j is still 1, because we did not store the changed value.

> for(i in 1:10){
+ j=j+1
+ }
> j
[1] 11

#j is now equal to 11. How did that happen?

> j=1
> for(i in 1:10){
+ j=j+1
+ print(j)
+ }
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
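
#The loops above added 1 to j on each iteration. As a further sketch, the loop below adds i instead, and stores the result each time, so j grows by 1, then 2, then 3, and so on.

```r
j <- 1
for (i in 1:10) {
  j <- j + i   # store the new value, so the next iteration builds on it
}
j   # 56, which is 1 plus the sum of the numbers 1 to 10
```

#The only difference from the earlier j+i loop is the assignment: without it, the result is calculated and then thrown away.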

Part six: Exercise

#Make a vector of ten random numbers

#Using a loop, add 100 to each number in the vector, in sequence. For example, in the first iteration of your loop you will add 100 to the first value of your vector, in the second iteration of your loop you will add 100 to the second value of your vector, and so on.

Part six: Exercise

> x=rnorm(10)
> x
 [1] -0.81673186  0.35409408  0.69619606 -2.04003445 -1.02832503 -0.31418186
 [7]  0.09717105  0.78778455 -0.15048025  1.86026573
> for(i in 1:length(x)){
+ x[i]=x[i]+100
+ }
> x
 [1]  99.18327 100.35409 100.69620  97.95997  98.97167  99.68582 100.09717
 [8] 100.78778  99.84952 101.86027
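
#The loop is good practice, but R can also add a number to every element of a vector in one step. A quick sketch that checks both approaches give the same answer (the names looped and vectorised are made up for this example, and your random numbers will differ from the ones above):

```r
set.seed(10)      # makes the random numbers repeatable
x <- rnorm(10)

# Approach 1: the loop from the exercise
looped <- x
for (i in seq_along(looped)) {
  looped[i] <- looped[i] + 100
}

# Approach 2: add 100 to the whole vector at once
vectorised <- x + 100

all.equal(looped, vectorised)   # TRUE
```

#seq_along(looped) gives the same sequence as 1:length(looped) here, and is the safer habit because it also behaves sensibly for an empty vector.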

Department of Conservation Reference: 10039929 

Photograph by Chris Smuts-Kennedy

Grid of monitored stations

How far does a Duvaucel's gecko travel after release?

Methods:
• Record the grid coordinates of the station where the gecko is released
• Each day for three subsequent days, measure the grid coordinates of the station where the gecko is found
• Calculate the distance between recorded stations
• 10 m by 10 m grid (each cell is 1 m by 1 m)

#Step one: Set up the monitoring grid data for each day. 0 means that the gecko was not observed in that grid cell, 1 means that the gecko was observed in that grid cell.

#Release day
set.seed(1)
d0=rep(0,100)
d0[round(runif(1,min=0,max=100))]=1
day.zero=matrix(d0,ncol=10,nrow=10)

#Day one check
set.seed(2)
d1=rep(0,100)
d1[round(runif(1,min=0,max=100))]=1
day.one=matrix(d1,ncol=10,nrow=10)

#Day two check
set.seed(3)
d2=rep(0,100)
d2[round(runif(1,min=0,max=100))]=1
day.two=matrix(d2,ncol=10,nrow=10)

#Day three check
set.seed(4)
d3=rep(0,100)
d3[round(runif(1,min=0,max=100))]=1
day.three=matrix(d3,ncol=10,nrow=10)
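
#A note on the random cell, with a sketch of an alternative. round(runif(1,min=0,max=100)) can occasionally give 0, and assigning to position 0 of a vector silently does nothing, which would leave a grid with no gecko in it. If you adapt this code with other seeds, sample() is one way to avoid that. This swaps in a different function from the one used in this guide, so the cells chosen will not match the guide's grids; the names cell, d and grid are made up for this example.

```r
set.seed(1)
cell <- sample(1:100, 1)   # always a whole number from 1 to 100, never 0

d <- rep(0, 100)
d[cell] <- 1
grid <- matrix(d, ncol = 10, nrow = 10)

sum(grid)   # 1, so exactly one cell holds the gecko
```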

#Step two: Combine all of the grid data into one list. This will help us quickly analyse the data as a single batch.

days=list(day.zero,day.one,day.two,day.three)

#Step three: Create a matrix where we will store the grid location of the gecko on each day, along with the distance it moved since the previous day.

movement=matrix(0,ncol=3,nrow=length(days))
colnames(movement)=c("Easting","Northing","Displacement (m)")

#Step four: Find the grid cell for the location of the gecko on each day and store that information in the movement matrix.

for(i in 1:length(days)){
  movement[i,1]=which(days[[i]]==1, arr.ind=TRUE)[1]
  movement[i,2]=which(days[[i]]==1, arr.ind=TRUE)[2]
}
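
#A small sketch of what arr.ind=TRUE does in Step four, using a made-up grid rather than the guide's data. Without it, which() gives a single position counted down the columns; with it, which() gives the row and the column.

```r
# Made-up 10 by 10 grid with the gecko in position 27
grid <- matrix(0, nrow = 10, ncol = 10)
grid[27] <- 1   # matrices fill column by column, so this is row 7, column 3

which(grid == 1)                          # 27
pos <- which(grid == 1, arr.ind = TRUE)
pos[1]                                    # 7, the row
pos[2]                                    # 3, the column
```

#So in the movement matrix, the first column holds the row of the grid cell and the second column holds its column.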

#Step five: Calculate the distance that the gecko travelled each day.

for(j in 2:length(days)){
  movement[j,3]=sqrt(((abs(movement[j,1]-movement[j-1,1]))^2)+((abs(movement[j,2]-movement[j-1,2]))^2))
}

#Step six: Plot the distance between each station where the gecko was found on each subsequent day.

barplot(movement[,3],xlab="Day",ylab="Displacement (m)",main="Gecko distance")
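
#The calculation in Step five is the straight-line (Pythagoras) distance between two grid cells. Here is a sketch of the same calculation without a loop, using made-up coordinates rather than the simulated gecko data. diff() gives the change between each value and the one before it. The abs() in Step five is harmless but not needed, because squaring removes the sign anyway.

```r
# Made-up station coordinates for four days (NOT the guide's data)
easting  <- c(7, 8, 7, 9)
northing <- c(3, 2, 2, 6)

# Change in each direction from one day to the next
d_east  <- diff(easting)    #  1 -1  2
d_north <- diff(northing)   # -1  0  4

# Straight-line distance moved each day, in grid cells (1 m each)
step <- sqrt(d_east^2 + d_north^2)
step
```

#There are three distances for four days, because the release day has no previous position to measure from.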

Conclusion

#By now you should have a good understanding of how to use R

#We have covered all of the basic ways of interacting with R:
- Storing data
- Plotting data
- Analysing data with functions
- Loading new functions for data analysis

#There is so much further you can take this though – your imagination is the limit!

#You should think of this tutorial as a quick reference guide to help get you on your feet

#You can also check out tutorial videos at illuminatingaotearoa.wordpress.com/zoostar