new bootcampr - jason heppler weblog · 2020. 2. 20. · r data modes r allows us to implement...

BootcampR: An Introduction to R
Jason A. Heppler, PhD
University of Nebraska at Omaha
February 18, 2020
@jaheppler



Welcome!


Hi. I'm Jason. I like to gesture at screens.
Digital Engagement Librarian, University of Nebraska at Omaha
Mentor, Mozilla Open Leaders
Researcher, Humanities+Design, Stanford University


Today's plan
• Basics of R
• How the language works
• Symbols and grammar
• Math and statistics
• R data types
• Interactive worksheet!

Open up RStudio. We'll start doing a few things together soon.


Some vocabulary
• Packages are add-on features for R that can include data, new functions and methods, and extended capabilities.
• Scripts are where you store commands to be run by R.
• Functions are commands that do something to an object in R.
• A data frame is the main element for statistical purposes: an object with rows and columns.
• The workspace is the working memory of R, where all objects are stored.
• A vector is the basic unit of data in R.


A note on maintaining R
• Adding packages to R also means keeping them up-to-date. Use update.packages() in the console or the package update interface in RStudio.
• Package updates are at the whim of the package developer. There may not be a regular release cycle.


Help!
• R is good at helping you through self-guidance.
• Try typing ?summary in the console.
• Now try typing ??regression.
• Getting odd warnings or errors? Jump over to Google or Stack Overflow. A number of R Core members hang out there.
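Two other base R helpers are handy when you only half-remember a function name. This is a small sketch that is not from the slides; both functions ship with base R.

```r
# ?summary is shorthand for help("summary");
# ??regression is shorthand for help.search("regression").

# List objects on the search path whose names contain "mean"
matches <- apropos("mean")
matches

# Show the arguments a function accepts
args(sd)
```

apropos() searches names only, while ?? searches the help pages themselves, so the two complement each other.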


(Some more) Help!
• The most important thing you can do when asking for help is to have a reproducible example available: a short simulated dataset and code that replicate the problem. For example:

foo <- c(1, "b", 5, 7, 0)
bar <- c(1, 2, 3, 4, 5)
foo + bar

Error in foo + bar : non-numeric argument to binary operator
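One way to diagnose the error in this example is to check the class of each vector. The sketch below is an illustration rather than part of the original slides; foo_num is a name introduced here.

```r
foo <- c(1, "b", 5, 7, 0)
bar <- c(1, 2, 3, 4, 5)

class(foo)  # "character": the "b" turned every element into a string
class(bar)  # "numeric"

# Convert back to numbers. "b" cannot be converted and becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
foo_num <- suppressWarnings(as.numeric(foo))

foo_num + bar  # works now, with NA in the second position
```

Whether an NA is an acceptable result depends on the data; the point is that class() tells you why the addition failed.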


The Data Frame
• Open up RStudio and type in:

data(mtcars)
mtcars

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
 [ reached 'max' / getOption("max.print") -- omitted 23 rows ]


The Data Frame
• The data frame operates a lot like a spreadsheet, and it's the central feature for doing data analysis in R.
• Data frames are the primary method for storing and manipulating data.
• Unlike a spreadsheet, everything we do to the data frame will either be by the entire row or entire column.
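Working by row or by column looks like this in practice. The sketch uses the built-in mtcars data; four_cyl is a name introduced here for illustration.

```r
data(mtcars)

mtcars$mpg                  # one column, returned as a vector
mtcars[1:3, ]               # the first three rows, all columns
mtcars[, c("mpg", "cyl")]   # two columns, all rows

# Keep only the rows that meet a condition
four_cyl <- mtcars[mtcars$cyl == 4, ]
nrow(four_cyl)
```

Inside the square brackets, the part before the comma selects rows and the part after it selects columns; leaving one side empty means "all of them".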


R as a calculator

2 + 2

[1] 4

2 * pi # multiply by a constant

[1] 6.283

7 + runif(1, min = 0, max = 1) # add a random variable

[1] 7.375

4^4 # powers

[1] 256

sqrt(4^4) # functions

[1] 16


R as a calculator

data(trees)

median(trees$Girth)

var(trees$Girth) # variance

sd(trees$Girth) # standard deviation

max(trees$Girth) # max value

min(trees$Girth) # min value

range(trees$Girth) # range

quantile(trees$Girth) # quantiles (0%, 25%, 50%, 75%, 100%)

fivenum(trees$Girth) # box plot elements

length(trees$Girth) # number of observations for a variable

length(trees) # number of variables (columns) in a dataset

nrow(trees) # number of rows in a data frame
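Counting rows and counting columns are easy to mix up, so it can help to run the size functions side by side. A short sketch using the same built-in trees data:

```r
data(trees)

nrow(trees)          # number of rows (observations)
ncol(trees)          # number of columns (variables)
dim(trees)           # rows and columns together
length(trees)        # for a data frame, the same as ncol()
length(trees$Girth)  # for a single column, the same as nrow()

summary(trees$Girth) # min, quartiles, mean, and max in one call
```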


Arithmetic Operators
R supports the usual arithmetic operators + - * / and ^, plus integer division %/% and the remainder (modulo) operator %%. The == operator tests whether two values are equal.

Try the following in the console:

2 + 2

2/2

2 * 2

2^2

2 == 2

23%/%2

23%%2
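Integer division and the remainder fit together: the quotient times the divisor, plus the remainder, gives back the original number. A small worked check (the variable names are introduced here for illustration):

```r
n <- 23
d <- 2

q <- n %/% d   # integer division: how many whole times d fits into n
r <- n %% d    # remainder: what is left over

q                # 11
r                # 1
d * q + r == n   # TRUE
```

A remainder of 0 from n %% 2 is a common test for even numbers.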


Symbols
<- is the assignment operator. RStudio has a keyboard shortcut for it: Option + - on macOS, and Alt + - on Windows and Linux.

This is different from most programming languages, which often use a single = for assignment.

Try entering into your console:

foo <- 3
foo


Symbols
: is the sequence operator. We can create ranges this way. Try the following in the console:

1:10

You can also store these ranges in a variable. Try:

a <- 100:120
a
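The : operator always steps by 1. For other step sizes, base R's seq() is the usual tool. The sketch below is an addition to the slides, using only base functions.

```r
a <- 100:120
length(a)                  # 21, because both endpoints are included

10:1                       # sequences can count down

seq(1, 10, by = 2)         # 1 3 5 7 9
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
```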


Symbols
# is for writing comments. Anything after the # is ignored by R and not evaluated.

# Something I want to keep from R, but mostly
# notes for myself or someone else so they
# understand what's happening with the following
# code. Below, we just add two numbers.
2 + 2


Advanced Math
We can do plenty of advanced math in R. For example, we can generate distributions of data very easily. Try this:

rnorm(100)

Neat, huh? Now try this:

hist(rnorm(10000))
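Because rnorm() draws random numbers, every run gives different values. Setting a seed first makes the draws repeatable, which matters when sharing a reproducible example. A minimal sketch; the seed value 42 and the mean and sd settings are arbitrary choices made here.

```r
set.seed(42)                           # any integer works
x <- rnorm(10000, mean = 50, sd = 10)  # 10,000 draws centered on 50

mean(x)  # close to 50
sd(x)    # close to 10

set.seed(42)                           # same seed, same draws
y <- rnorm(10000, mean = 50, sd = 10)
identical(x, y)                        # TRUE
```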

On advanced R:
• Hadley Wickham, Advanced R <https://adv-r.hadley.nz>
• Hadley Wickham, R for Data Science <https://r4ds.had.co.nz>


The Workspace
R lets us store vectors, datasets, and functions in memory. All R objects are stored in the memory of the computer, and R makes it easy to organize the workspace. Try the following in the console:

x <- 5 # store the variable
x # print the variable

z <- 3
ls() # list all variables

ls.str() # list and describe variables

rm(x) # delete a variable
ls()
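To check for one specific object rather than listing everything, base R's exists() returns TRUE or FALSE. A small sketch; bootcamp_value is a made-up name used only for this illustration.

```r
bootcamp_value <- 5
exists("bootcamp_value")   # TRUE: the object is in the workspace

rm(bootcamp_value)
exists("bootcamp_value")   # FALSE: it has been removed

# rm(list = ls()) would remove every object in the workspace.
# It is left as a comment here because it cannot be undone.
```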


R as a language
R, like any programming language, has a set of rules to follow. You'll learn more as you go, but let's cover a few quick ones.

1. Case sensitivity matters. A and a are not the same.

a <- 3
A <- 4
print(c(a, A)) # what happens if you type print(a,A)?

Are they the same?

a == A
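The question in the comment has a slightly surprising answer: for a plain number, the second positional argument of print() is the digits setting, not a second value to print. A short sketch of what each call does:

```r
a <- 3
A <- 4

print(c(a, A))  # prints both values
print(a, A)     # prints only a; A is taken as the number of digits

a == A          # FALSE: a and A are different objects with different values
```

This is why combining values with c() first is the safer habit.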


R as a language
2. c() stands for concatenate, and lets you build vectors with multiple elements.

If you need two or more elements in a vector, wrap them in c().

c() can put together any vectors, but typically you want to keep the elements of the vector all of the same type (e.g., don't mix strings and numbers).

G <- c(3, 4)
print(G)
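c() also accepts existing vectors and flattens everything into one longer vector. A brief sketch; H is a name introduced here.

```r
G <- c(3, 4)
H <- c(G, 5, 6)  # combine an existing vector with new values

H                # 3 4 5 6
length(H)        # 4
H[2]             # 4: square brackets pick out an element by position
H * 2            # 6 8 10 12: arithmetic applies to every element
```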


R as a language
3. R is maddeningly inconsistent in naming conventions. Some functions are camelCase, others.are.dot.separated, and others use_underscores. RStudio autocomplete can help.

4. R has multiple packages and functions that do the same thing, even sometimes sharing function names. Sometimes you'll need to tell R explicitly which package you're referring to. This is done with two colons :: (e.g., dplyr::filter())
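The :: syntax works for any installed package, including the ones that ship with R. The sketch below sticks to the built-in stats and utils packages so that it runs without installing anything; the same pattern applies to dplyr::filter() once dplyr is installed.

```r
# Call a function from a specific package, whether or not it is attached
stats::median(c(1, 3, 5))
utils::head(mtcars, 3)

# Ask which package a function comes from
environmentName(environment(stats::sd))  # "stats"
```

When dplyr is attached, a bare filter() refers to dplyr's version, and stats::filter() still reaches the one from the stats package.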


R as a language: Objects
Everything in R is an object, even functions. We can manipulate objects in a variety of ways. For example, we can apply the summary function to a variety of object types. Let's try this.

# summary of columns 1, 2, and 3
summary(mtcars[, 1:3])

# summary of a single column
summary(mtcars$mpg)
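summary() adapts its output to the class of the object it receives. The sketch below goes a little beyond the slide by trying a factor and a linear model; fit and the model formula mpg ~ wt are choices made here for illustration.

```r
class(mtcars)      # "data.frame"
class(mtcars$mpg)  # "numeric"

# A numeric column gets min, quartiles, mean, and max
summary(mtcars$cyl)

# The same values as a factor get a count per level
cyl_counts <- summary(factor(mtcars$cyl))
cyl_counts

# A fitted model gets coefficients and fit statistics
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)         # "lm"
summary(fit)
```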


R as a language: Objects
Since everything is an object in R, we can do all sorts of operations against them.

length(unique(mtcars$mpg))

We can also store the results of function calls.

unique_mpg <- length(unique(mtcars$mpg))
unique_mpg


R as a language: Operators
We can use comparison operators to compare values across vectors. (< > <= >= == !=)

big <- c(9, 12, 15, 25)
small <- c(9, 3, 4, 2)

big > small

big == small # don't do big = small!
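Comparisons return a logical vector, one TRUE or FALSE per element, and that vector can be counted, located, or used to subset. A short sketch with the same two vectors; bigger is a name introduced here.

```r
big <- c(9, 12, 15, 25)
small <- c(9, 3, 4, 2)

bigger <- big > small  # FALSE TRUE TRUE TRUE
sum(bigger)            # 3: TRUE counts as 1
which(bigger)          # 2 3 4: positions where the test is TRUE
big[bigger]            # 12 15 25: keep only those elements

any(big == small)      # TRUE: at least one pair matches
all(big >= small)      # TRUE: every pair passes
```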


R Data Modes
R allows us to implement different data types. Three basic ones are supported: numeric, character, and logical.

Vectors must be of one consistent type of data. If you make a vector that mixes types, R converts every element to the most flexible type present; a single character value turns the whole vector into a character vector.

is.numeric(A)
is.character(A)
is.logical(A)

# If you don't know what the data type is,
# just ask!
class(A)
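The conversion that happens when types are mixed can be seen directly with class(). A minimal sketch using only base R:

```r
class(c(1, 2, 3))          # "numeric"
class(c(TRUE, FALSE))      # "logical"
class(c("a", "b"))         # "character"

# Mixed vectors are converted to a single type
class(c(1, TRUE))          # "numeric": TRUE becomes 1
class(c(1, "a"))           # "character": 1 becomes "1"
class(c(TRUE, "a"))        # "character": TRUE becomes "TRUE"

sum(c(TRUE, FALSE, TRUE))  # 2: logical values act as 1 and 0 in arithmetic
```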


R Data Modes
R supports many more classes beyond numeric, character, and logical. These include linear models, matrices, networks, spatial data frames, and others.

Classes determine what you can and cannot do to objects.
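Dates make a compact example of a class changing what is allowed. The sketch below is an addition to the slides; the Date class is part of base R, and the variable names are introduced here.

```r
first  <- as.Date("2020-02-18")
second <- as.Date("2020-02-25")

class(first)                # "Date"
second - first              # dates can be subtracted
as.numeric(second - first)  # 7

# The same text as plain character strings cannot be subtracted.
# tryCatch() captures the error so the script keeps running.
result <- tryCatch(
  "2020-02-25" - "2020-02-18",
  error = function(e) "error"
)
result                      # "error"
```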


Let's do some more hands-on

Head on over to https://tinyurl.com/unobootcamp


Questions? Troubleshooting?

Next workshop: February 25, 1:30p-3p: Spark Joy with Data (CL 232)