4 r tutorial dplyr apply function


Uploaded by: sakthi-dasans

Posted on: 13-Apr-2017


Page 1: 4 R Tutorial DPLYR Apply Function

R Programming

Sakthi Dasan Sekar

http://shakthydoss.com 1


Apply functions

Apply functions in R

apply, lapply, sapply, tapply, vapply, mapply

These functions usually have "apply" in their name. They are used to apply a specified function to each element, row, or column of an R object. They often replace an explicit for or while loop with a single, more concise call.
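To make the comparison with loops concrete, here is a minimal sketch that squares each element of a vector twice: once with an explicit for loop and once with sapply (covered below). The variable names are illustrative.

```r
# Illustrative data and variable names
x <- c(2, 4, 6)

# Explicit loop: preallocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# The same computation as a single call: sapply() applies the function to each element
squares_apply <- sapply(x, function(v) v^2)

print(squares_apply)  # 4 16 36
```

The apply version avoids managing the index and the result vector by hand; it is not necessarily faster than a well-written loop.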


Apply functions

apply

It applies a function to each row or each column of a matrix (more generally, to a margin of an array).

Returns a vector, an array, or a list, depending on what the function returns.

apply(x, margin, function)

It takes at least three arguments:

1. matrix / array

2. margin

3. function


Apply functions

apply

apply(x, margin, function)

margin - tells whether the function is applied to rows or to columns

margin = 1 applies the function to each row

margin = 2 applies the function to each column

function can be any R function, e.g. mean, sum, max, or one you write yourself.


Apply functions

apply

Example

m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

apply(m, 1, sum)

returns a vector containing the sum of each row of matrix m (here 4 6, since the matrix is filled column by column)

apply(m, 2, sum)

returns a vector containing the sum of each column of matrix m (here 3 7)
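apply() also passes any extra arguments on to the function, and when the function returns several values per row or column the result becomes a matrix. A minimal sketch using the same matrix m (the other variable names are illustrative):

```r
# Same matrix as above; filled column by column, so its rows are (1, 3) and (2, 4)
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

# Arguments after the function are passed on to it: here na.rm = TRUE for max()
m_na <- m
m_na[1, 1] <- NA
col_max <- apply(m_na, 2, max, na.rm = TRUE)
print(col_max)     # 2 4

# An anonymous function returning two values per column gives a 2 x 2 matrix:
# column j holds column j of m divided by its sum
col_shares <- apply(m, 2, function(col) col / sum(col))
print(col_shares)
```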


Apply functions

lapply

lapply takes a list as its argument and applies the function to each element of the list in turn.

Returns a list.

lapply(list, function)

It takes at least two arguments:

1. list

2. function


Apply functions

lapply

Example

lst <- list(a = c(1,1), b = c(2,2), c = c(3,3))

lapply(lst, sum)

Returns a list with the sum of each of a, b, c (2, 4, 6).

lapply(lst, mean)

Returns a list with the mean of each of a, b, c (1, 2, 3).
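A few more lapply patterns, as a minimal sketch (doubled, squared, df and col_types are illustrative names): an anonymous function, extra arguments passed through to the function, and a data frame treated as a list of columns.

```r
# Illustrative variable names
lst <- list(a = c(1, 1), b = c(2, 2), c = c(3, 3))

# An anonymous function works as well as a named one
doubled <- lapply(lst, function(v) v * 2)

# Extra arguments are passed on to the function: each element is squared here
squared <- lapply(lst, `^`, 2)

# A data frame is a list of columns, so lapply() visits each column
df <- data.frame(x = 1:3, y = c(10, 20, 30))
col_types <- lapply(df, class)
str(col_types)
```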


Apply functions

sapply

sapply(list, func)

It takes at least two arguments:

1. list

2. function

sapply does everything lapply does, except that it tries to simplify the returned object:

If every element of the result has length 1, a vector is returned.

If every element of the result has the same length (> 1), a matrix is returned.

Otherwise, the result is returned as a list.


Apply functions

sapply

Example

lst <- list(a = c(1,1), b = c(2,2), c = c(3,3))

sapply(lst, sum)

Returns a named vector with the sum of each of a, b, c (2, 4, 6).

lst <- list(a = c(1,2), b = c(1,2,3), c = c(1,2,3,4))

sapply(lst, range)

Returns a 2 x 3 matrix: row 1 holds the minimum and row 2 the maximum of a, b, c.
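The third rule (falling back to a list) can be seen with the same uneven list. A minimal sketch; lens, tails and same_as_lapply are illustrative names:

```r
# Illustrative variable names
lst <- list(a = c(1, 2), b = c(1, 2, 3), c = c(1, 2, 3, 4))

# One value per element: simplified to a named integer vector (a = 2, b = 3, c = 4)
lens <- sapply(lst, length)
print(lens)

# Results of different lengths (1, 2 and 3 values) cannot be simplified,
# so a list is returned, just as lapply() would
tails <- sapply(lst, function(v) v[-1])
print(is.list(tails))   # TRUE

# simplify = FALSE turns off simplification altogether
same_as_lapply <- identical(sapply(lst, sum, simplify = FALSE), lapply(lst, sum))
```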


Apply functions

tapply

tapply works on a vector: it splits the vector into groups given by a factor and applies the function to each group.

tapply(x, factor, fun)

It takes at least three arguments:

1. vector

2. factor (one group label per element of the vector)

3. function


Apply functions

tapply

Example

age <- c(23,33,28,21,20,19,34)

gender <- c("m","m","m","f","f","f","m")

f <- factor(gender)

tapply(age,f,mean)

Returns the mean age for each gender: 20 for "f" and 29.5 for "m".
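tapply is closely related to the other functions on these slides: grouping a vector with split() and then calling sapply() gives the same numbers. A minimal sketch with the same data (by_tapply, by_split and counts are illustrative names):

```r
# Same data as the example above
age    <- c(23, 33, 28, 21, 20, 19, 34)
gender <- c("m", "m", "m", "f", "f", "f", "m")

# tapply() converts a character grouping vector to a factor itself
by_tapply <- tapply(age, gender, mean)

# Equivalent by hand: split() builds one vector per group, sapply() summarises each
by_split <- sapply(split(age, gender), mean)

# Any function that summarises a vector works, e.g. counting the members of each group
counts <- tapply(age, gender, length)
print(counts)   # f: 3, m: 4
```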


Apply functions

vapply

vapply works like sapply, except that you must specify the type and length of the value each function call returns (for example integer(1), double(1) or character(1)). vapply is generally safer than sapply, because it raises an error when a result does not match that template, and it can be faster, because it does not have to work out how to coerce the results into a single atomic vector.

vapply(x, function, FUN.VALUE)

It takes at least three arguments:

1. list

2. function

3. FUN.VALUE, a template of the return value (e.g. integer(1), double(1), character(1))


Apply functions

vapply

Example

lst <- list(a = c(1,1), b = c(2,2), c = c(3,3))

vapply(lst, sum, FUN.VALUE = double(1))

Returns a named double vector with the sum of each of a, b, c (2, 4, 6).
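The safety benefit shows up when a result does not fit the template, or when the input is empty. A minimal sketch (ranges, res, empty_v and empty_s are illustrative names):

```r
# Illustrative variable names
lst <- list(a = c(1, 1), b = c(2, 2), c = c(3, 3))

# A template of length 2 gives a 2 x 3 matrix, one column per element
ranges <- vapply(lst, range, FUN.VALUE = double(2))

# A result that does not match the template is an error, not a silent change of type
res <- tryCatch(vapply(lst, sum, FUN.VALUE = character(1)),
                error = function(e) "type mismatch caught")
print(res)

# On an empty input, vapply() still returns the declared type; sapply() returns list()
empty_v <- vapply(list(), length, FUN.VALUE = integer(1))
empty_s <- sapply(list(), length)
```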


Apply functions

mapply

mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.

mapply(FUN, ...)


Apply functions

mapply

Example

list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

We see that we are repeatedly calling the same function (rep) where the first argument varies from 1 to 4, and the second argument varies from 4 to 1. Instead, we can use mapply:

mapply(rep, 1:4, 4:1)

which will produce the same result.
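A minimal sketch of mapply beyond rep (reps, same, sums and recycled are illustrative names). Note that 1:4 is an integer vector, so the pieces built by mapply(rep, 1:4, 4:1) are integers, while the hand-written list(rep(1, 4), ...) holds doubles; the values are the same.

```r
# Results have different lengths, so mapply() cannot simplify and returns a list
reps <- mapply(rep, 1:4, 4:1)
same <- identical(reps, list(rep(1L, 4), rep(2L, 3), rep(3L, 2), rep(4L, 1)))

# Element-wise function of two vectors: first with first, second with second, ...
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
print(sums)        # 5 7 9

# Recycling: the single value 2 is reused for every call, and since every result
# has length 2 the output is simplified to a 2 x 4 matrix
recycled <- mapply(rep, 1:4, 2)
print(recycled)
```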


dplyr package

dplyr overview

dplyr is a powerful R-package to transform and summarize tabular data with rows and columns.

By constraining your options, it simplifies how you can think about common data manipulation tasks.

It provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate those thoughts into code.

It uses efficient data storage backends, so you spend less time waiting for the computer.


dplyr package

dplyr is a grammar of data manipulation.

It provides five main verbs, i.e. functions that are applied to a data set:

1. select - selects columns of a table or data.frame

2. filter - keeps the rows (records) that match a condition

3. arrange - reorders the rows of a table or data.frame

4. mutate - adds new columns (variables) computed from existing ones

5. summarize - collapses the data into summary values
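The verbs are designed to be chained. A minimal sketch on the built-in mtcars data, assuming dplyr is installed; it uses the %>% pipe (re-exported by dplyr) and group_by(), which appears again with summarise. The names result, hp_per_mpg, n and mean_ratio are illustrative.

```r
library(dplyr)

# Illustrative pipeline; output column names are made up for this sketch
result <- mtcars %>%
  select(mpg, cyl, hp) %>%                  # keep three columns
  filter(hp > 100) %>%                      # keep cars with more than 100 hp
  mutate(hp_per_mpg = hp / mpg) %>%         # add a derived column
  group_by(cyl) %>%                         # one group per cylinder count
  summarise(n = n(), mean_ratio = mean(hp_per_mpg)) %>%  # one row per group
  arrange(desc(mean_ratio))                 # largest ratio first
print(result)
```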


dplyr package

dplyr installation

dplyr is not one of the packages that come with R by default, so you have to install it separately:

install.packages("dplyr")

Load dplyr into the current R session:

library(dplyr)


dplyr package

dplyr - Select

Often you work with large datasets that have many columns, of which only a few are of interest. The select function lets you quickly keep only the columns you are interested in.

To select columns by name:

select(mtcars, mpg, disp)

To select a range of adjacent columns by name, use the “:” (from:to) operator:

select(mtcars, mpg:hp)
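Putting a minus sign in front of a column or a range drops those columns instead of keeping them. A minimal sketch, assuming dplyr is installed (by_name, by_range and dropped are illustrative names):

```r
library(dplyr)

by_name  <- select(mtcars, mpg, disp)
by_range <- select(mtcars, mpg:hp)      # mpg, cyl, disp, hp: adjacent columns
# A minus sign drops columns instead of keeping them
dropped  <- select(mtcars, -(mpg:hp))   # the remaining 7 columns

names(by_range)
ncol(dropped)
```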


dplyr package

dplyr - Select

To select columns whose names match a string pattern:

select(iris, starts_with("Petal"))

select(iris, ends_with("Width"))

select(iris, contains("etal"))

select(iris, matches(".t."))
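To see which columns each helper picks, a minimal sketch that prints only the column names, assuming dplyr is installed. iris has the columns Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species.

```r
library(dplyr)

names(select(iris, starts_with("Petal")))   # "Petal.Length" "Petal.Width"
names(select(iris, ends_with("Width")))     # "Sepal.Width"  "Petal.Width"
names(select(iris, contains("etal")))       # "Petal.Length" "Petal.Width"
# matches() takes a regular expression: ".t." is any "t" with a character on
# each side, which all four measurement columns contain but "Species" does not
names(select(iris, matches(".t.")))
```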


dplyr package

dplyr - Select

You can rename variables with select() by using named arguments.

Example

select(mtcars, miles_per_gallon = mpg)
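Renaming inside select() also drops every column you did not name. dplyr's rename() changes a name while keeping all columns. A minimal sketch (only_mpg and all_cols are illustrative names):

```r
library(dplyr)

# select() with a new name keeps only the renamed column
only_mpg <- select(mtcars, miles_per_gallon = mpg)

# rename() changes the name but keeps every other column
all_cols <- rename(mtcars, miles_per_gallon = mpg)

ncol(only_mpg)   # 1
ncol(all_cols)   # 11
```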


dplyr package

dplyr - filter

The filter function keeps only the rows that satisfy the given conditions, so you can easily zoom in on the data you are interested in.

filter(data, condition, ...)

Simple filter

filter(mtcars, cyl == 8)

filter(mtcars, cyl < 6)


dplyr package

dplyr - filter

Multiple criteria filter

filter(mtcars, cyl < 6 & vs == 1)

filter(mtcars, cyl < 6 | vs == 1)

Comma-separated conditions are combined with AND, so this is equivalent to the first example:

filter(mtcars, cyl < 6, vs == 1)
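A few more filter patterns, as a minimal sketch assuming dplyr is installed (four_or_six, a, b, df and kept are illustrative names): %in% tests membership in a set of values, and rows where a condition evaluates to NA are dropped.

```r
library(dplyr)

# Rows whose cyl is one of several values: %in% avoids chaining | conditions
four_or_six <- filter(mtcars, cyl %in% c(4, 6))
nrow(four_or_six)   # 18 (11 four-cylinder + 7 six-cylinder cars)

# Separate arguments are combined with &, so these two calls give the same rows
a <- filter(mtcars, cyl < 6, vs == 1)
b <- filter(mtcars, cyl < 6 & vs == 1)
identical(a, b)     # TRUE

# Rows where the condition is NA are dropped
# (base R's df[df$x > 1, ] would instead return a row of NAs)
df <- data.frame(x = c(1, NA, 3))
kept <- filter(df, x > 1)
nrow(kept)          # 1
```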


dplyr package

dplyr - arrange

The arrange function reorders the rows of the data by the values of one or more columns, in ascending order by default.

Wrap a column in desc() to sort it in descending order.

arrange(data, ordering_column )


dplyr package

dplyr - arrange

Example

Sort the rows by cyl, breaking ties by disp

arrange(mtcars, cyl, disp)

Sort the rows by disp in descending order

arrange(mtcars, desc(disp))
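A minimal sketch checking what the two examples do, assuming dplyr is installed (by_cyl_disp and df are illustrative names). It also shows how arrange treats missing values.

```r
library(dplyr)

# Sorted by cyl; disp only decides the order among cars with the same cyl
by_cyl_disp <- arrange(mtcars, cyl, disp)
# Same ordering of disp values as base R's order() with two keys
identical(by_cyl_disp$disp, mtcars$disp[order(mtcars$cyl, mtcars$disp)])  # TRUE

# Largest displacement first
arrange(mtcars, desc(disp))$disp[1:3]

# Missing values go last, even inside desc()
df <- data.frame(x = c(2, NA, 3, 1))
arrange(df, desc(x))$x   # 3 2 1 NA
```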


dplyr package

dplyr - mutate

The mutate function adds new variables (columns) to an existing data set.

Example

mutate(mtcars, my_custom_disp = disp / 1.0237)

This returns a copy of mtcars with the extra column my_custom_disp; mtcars itself is unchanged unless you assign the result back.
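A minimal sketch, assuming dplyr is installed: the result is assigned to a new name (cars2, illustrative), and a column created earlier in the same mutate() call can be used straight away. The cubic-inch-to-litre conversion is only an example of a derived column.

```r
library(dplyr)

# disp is in cubic inches; 1 cubic inch is about 0.0163871 litres
cars2 <- mutate(mtcars,
                disp_litres  = disp * 0.0163871,
                # a column created earlier in the same call can be used right away
                disp_per_cyl = disp_litres / cyl)

ncol(mtcars)   # 11: the original is unchanged
ncol(cars2)    # 13
```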


dplyr package

dplyr - summarise

The summarise function collapses multiple values into a single summary value.

summarise(mtcars, mean(disp))

Combined with group_by, summarise returns one row per group:

summarise(group_by(mtcars, cyl), mean(disp))

summarise(group_by(mtcars, cyl), m = mean(disp), sd = sd(disp))


dplyr package

dplyr - summarise

Summary functions that can be used inside dplyr's summarise:

mean, median, max, min, sum, var, length, IQR (note that base R's mode() returns an object's storage mode, such as "numeric", not the most frequent value)

first(x) - returns the first element of vector x

last(x) - returns the last element of vector x

nth(x, n) - returns the nth element of vector x

n() - the number of rows in the data.frame (or in the current group)

n_distinct(x) - the number of unique values in vector x
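Several of these functions can be combined in one grouped summary. A minimal sketch, assuming dplyr is installed; stats and its column names are illustrative.

```r
library(dplyr)

# One row per cylinder count; output column names are illustrative
stats <- summarise(group_by(mtcars, cyl),
                   cars       = n(),               # rows in each group
                   gears      = n_distinct(gear),  # distinct gear counts
                   median_mpg = median(mpg),
                   iqr_mpg    = IQR(mpg),
                   first_mpg  = first(mpg),        # in the data's current row order
                   last_mpg   = last(mpg))
print(stats)
```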


DPLYR & APPLY FUNCTION

Knowledge Check


DPLYR & APPLY FUNCTION

Apply functions in R are used to apply a specified function to each column or row of an R object.

A. TRUE

B. FALSE

Answer A


DPLYR & APPLY FUNCTION

Which one of the following is true about the function apply(x, margin, function)?

A. When margin = 2, the function is applied to each row.

B. When margin = 1, the function is applied to each row.

C. x must be of type list.

D. Only arithmetic functions can be passed to apply.

Answer B


DPLYR & APPLY FUNCTION

Define lapply.

A. lapply takes a list as its argument and applies the function by looping through each element of the list.

B. lapply takes a list, array or matrix and applies the function by looping through each element of the list.

C. lapply is not standalone; it must be used together with the apply function.

D. lapply is used when latitude and longitude come into the picture.

Answer A


DPLYR & APPLY FUNCTION

dplyr is a powerful R package to transform and summarize tabular data with rows and columns. It is also referred to as a grammar of data manipulation.

A. TRUE

B. FALSE

Answer A


DPLYR & APPLY FUNCTION

How do you reorder the rows of a data set by a column using dplyr functions?

A. order_data(data, ordering_column)

B. sort_data(data,ordering_column)

C. dplyr(data,ordering_column)

D. arrange(data, ordering_column)

Answer D
