lab3: data management(ii) and rmarkdown · 2021. 3. 5. · lab3: data management(ii) and rmarkdown...

92
Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020 Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 1 / 92

Upload: others

Post on 07-Jun-2021

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Lab3: Data Management(II) and RmarkdownIntroduction to Econometrics,Fall 2020

Zhaopeng Qu

Nanjing University

9/27/2020

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 1 / 92

Page 2: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

1 Review Last Lecture

2 Data Management(II): Relating data

3 50 分钟 Rmarkdown 简介 + 操作

4 Important info about Homework

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 2 / 92

Page 3: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Review Last Lecture

Section 1

Review Last Lecture

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 3 / 92

Page 4: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Review Last Lecture Data Management(I)

Subsection 1

Data Management(I)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 4 / 92

Page 5: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Review Last Lecture Data Management(I)

Review: Main Content of last lecture

Variable and Data Types in RData Structure in RTwo Models to clean data in R

Basic RTidyverse

The workflow of data analysis is as follows

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 5 / 92

Page 6: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Review Last Lecture Data Management(I)

Review: tidy

Import DataTransform and Relate Data(Structure)Improve data quality(observations):

Missing valuesDuplicatesOutliers

Wrangling Datagenerate new variablesselect variables and observations

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 6 / 92

Page 7: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data

Section 2

Data Management(II): Relating data

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 7 / 92

Page 8: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

Subsection 1

Introduction

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 8 / 92

Page 9: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

Introduction

Collectively, multiple tables of data are called relational data becauseit is the relations, not just the individual datasets, that are important.eg. a household survey often composites with several subsets: someare reported in household level: such as consumption,assets,etc.others are reported in individual level: like working information.

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 9 / 92

Page 10: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

nycflights13 dataa dataset on flights departing New York City in 2013.

installed.packages(nycflights13)

library(tidyverse)library(nycflights13)

airlines

## # A tibble: 16 x 2## carrier name## <chr> <chr>## 1 9E Endeavor Air Inc.## 2 AA American Airlines Inc.## 3 AS Alaska Airlines Inc.## 4 B6 JetBlue Airways## 5 DL Delta Air Lines Inc.## 6 EV ExpressJet Airlines Inc.## 7 F9 Frontier Airlines Inc.## 8 FL AirTran Airways Corporation## 9 HA Hawaiian Airlines Inc.## 10 MQ Envoy Air## 11 OO SkyWest Airlines Inc.## 12 UA United Air Lines Inc.## 13 US US Airways Inc.## 14 VX Virgin America## 15 WN Southwest Airlines Co.## 16 YV Mesa Airlines Inc.

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 10 / 92

Page 11: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

nycflights13:airports

airports

## # A tibble: 1,458 x 8## faa name lat lon alt tz dst tzone## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>## 1 04G Lansdowne Airport 41.1 -80.6 1044 -5 A America/New_Yo~## 2 06A Moton Field Municipal A~ 32.5 -85.7 264 -6 A America/Chicago## 3 06C Schaumburg Regional 42.0 -88.1 801 -6 A America/Chicago## 4 06N Randall Airport 41.4 -74.4 523 -5 A America/New_Yo~## 5 09J Jekyll Island Airport 31.1 -81.4 11 -5 A America/New_Yo~## 6 0A9 Elizabethton Municipal ~ 36.4 -82.2 1593 -5 A America/New_Yo~## 7 0G6 Williams County Airport 41.5 -84.5 730 -5 A America/New_Yo~## 8 0G7 Finger Lakes Regional A~ 42.9 -76.8 492 -5 A America/New_Yo~## 9 0P2 Shoestring Aviation Air~ 39.8 -76.6 1000 -5 U America/New_Yo~## 10 0S9 Jefferson County Intl 48.1 -123. 108 -8 A America/Los_An~## # ... with 1,448 more rows

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 11 / 92

Page 12: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

nycflights13:planes

planes

## # A tibble: 3,322 x 9## tailnum year type manufacturer model engines seats speed engine## <chr> <int> <chr> <chr> <chr> <int> <int> <int> <chr>## 1 N10156 2004 Fixed wing m~ EMBRAER EMB-1~ 2 55 NA Turbo-~## 2 N102UW 1998 Fixed wing m~ AIRBUS INDUST~ A320-~ 2 182 NA Turbo-~## 3 N103US 1999 Fixed wing m~ AIRBUS INDUST~ A320-~ 2 182 NA Turbo-~## 4 N104UW 1999 Fixed wing m~ AIRBUS INDUST~ A320-~ 2 182 NA Turbo-~## 5 N10575 2002 Fixed wing m~ EMBRAER EMB-1~ 2 55 NA Turbo-~## 6 N105UW 1999 Fixed wing m~ AIRBUS INDUST~ A320-~ 2 182 NA Turbo-~## 7 N107US 1999 Fixed wing m~ AIRBUS INDUST~ A320-~ 2 182 NA Turbo-~## 8 N108UW 1999 Fixed wing m~ AIRBUS INDUST~ A320-~ 2 182 NA Turbo-~## 9 N109UW 1999 Fixed wing m~ AIRBUS INDUST~ A320-~ 2 182 NA Turbo-~## 10 N110UW 1999 Fixed wing m~ AIRBUS INDUST~ A320-~ 2 182 NA Turbo-~## # ... with 3,312 more rows

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 12 / 92

Page 13: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

nycflights13:weather

weather

## # A tibble: 26,115 x 15## origin year month day hour temp dewp humid wind_dir wind_speed## <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 EWR 2013 1 1 1 39.0 26.1 59.4 270 10.4## 2 EWR 2013 1 1 2 39.0 27.0 61.6 250 8.06## 3 EWR 2013 1 1 3 39.0 28.0 64.4 240 11.5## 4 EWR 2013 1 1 4 39.9 28.0 62.2 250 12.7## 5 EWR 2013 1 1 5 39.0 28.0 64.4 260 12.7## 6 EWR 2013 1 1 6 37.9 28.0 67.2 240 11.5## 7 EWR 2013 1 1 7 39.0 28.0 64.4 240 15.0## 8 EWR 2013 1 1 8 39.9 28.0 62.2 250 10.4## 9 EWR 2013 1 1 9 39.9 28.0 62.2 260 15.0## 10 EWR 2013 1 1 10 41 28.0 59.6 260 13.8## # ... with 26,105 more rows, and 5 more variables: wind_gust <dbl>,## # precip <dbl>, pressure <dbl>, visib <dbl>, time_hour <dttm>

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 13 / 92

Page 14: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

Relations in nycflights13 data

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 14 / 92

Page 15: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

Key variables

The variables used to connect each pair of tables are called keys. Akey is a variable (or set of variables) that uniquely identifies anobservation.

Primary Key: uniquely identifies an observation in its own tableForeign Key: uniquely identifies an observation in another tablehead(planes$tailnum) # primary key in `planes` table

## [1] "N10156" "N102UW" "N103US" "N104UW" "N10575" "N105UW"

head(flights$tailnum) # foreign key in `flights` table

## [1] "N14228" "N24211" "N619AA" "N804JB" "N668DN" "N39463"

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 15 / 92

Page 16: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

Key variables

Verify that key variables do indeed uniquely identify each observation.count() the primary keys and look for entries where n is greater thanone

planes %>%count(tailnum) %>%filter(n>1)

## # A tibble: 0 x 2## # ... with 2 variables: tailnum <chr>, n <int>

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 16 / 92

Page 17: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

Key variables

Sometimes you can

weather %>%count(year,month,day,hour,origin) %>%filter(n>1)

## # A tibble: 3 x 6## year month day hour origin n## <int> <int> <int> <int> <chr> <int>## 1 2013 11 3 1 EWR 2## 2 2013 11 3 1 JFK 2## 3 2013 11 3 1 LGA 2

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 17 / 92

Page 18: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Introduction

Key variablesduplicates function

weather %>%distinct(year,month,day,hour,origin)

## # A tibble: 26,112 x 5## origin year month day hour## <chr> <int> <int> <int> <int>## 1 EWR 2013 1 1 1## 2 EWR 2013 1 1 2## 3 EWR 2013 1 1 3## 4 EWR 2013 1 1 4## 5 EWR 2013 1 1 5## 6 EWR 2013 1 1 6## 7 EWR 2013 1 1 7## 8 EWR 2013 1 1 8## 9 EWR 2013 1 1 9## 10 EWR 2013 1 1 10## # ... with 26,102 more rows

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 18 / 92

Page 19: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Join tables

Subsection 2

Join tables

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 19 / 92

Page 20: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Join tables

Relations between two tables

A primary key and the corresponding foreign key in another tableform a relation.

1 1 : many2 1 : 13 many : many

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 20 / 92

Page 21: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Join tables

Join(merge) tables: mutating join

mutating join: It first matches observations by their keys, thencopies across variables from one table to the other.Aim: add the full airline name to the flight2

flights2 <- flights %>%select(year:day,hour,tailnum,carrier,origin)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 21 / 92

Page 22: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Join tables

Join(merge) tables: mutating joinAim: add the full airline name to the flight2

flights2 %>%left_join(airlines, by = "carrier")

## # A tibble: 336,776 x 8## year month day hour tailnum carrier origin name## <int> <int> <int> <dbl> <chr> <chr> <chr> <chr>## 1 2013 1 1 5 N14228 UA EWR United Air Lines Inc.## 2 2013 1 1 5 N24211 UA LGA United Air Lines Inc.## 3 2013 1 1 5 N619AA AA JFK American Airlines Inc.## 4 2013 1 1 5 N804JB B6 JFK JetBlue Airways## 5 2013 1 1 6 N668DN DL LGA Delta Air Lines Inc.## 6 2013 1 1 5 N39463 UA EWR United Air Lines Inc.## 7 2013 1 1 6 N516JB B6 EWR JetBlue Airways## 8 2013 1 1 6 N829AS EV LGA ExpressJet Airlines Inc.## 9 2013 1 1 6 N593JB B6 JFK JetBlue Airways## 10 2013 1 1 6 N3ALAA AA LGA American Airlines Inc.## # ... with 336,766 more rows

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 22 / 92

Page 23: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Join tables

Join(merge) tables: mutating joinAim: add the type of plane to the flights

planes %>%select(tailnum,type) %>%right_join(flights,by = "tailnum")

## # A tibble: 336,776 x 20## tailnum type year month day dep_time sched_dep_time dep_delay arr_time## <chr> <chr> <int> <int> <int> <int> <int> <dbl> <int>## 1 N10156 Fixe~ 2013 1 10 626 630 -4 802## 2 N10156 Fixe~ 2013 1 10 1120 1032 48 1320## 3 N10156 Fixe~ 2013 1 10 1619 1540 39 1831## 4 N10156 Fixe~ 2013 1 11 632 634 -2 810## 5 N10156 Fixe~ 2013 1 11 1116 1120 -4 1328## 6 N10156 Fixe~ 2013 1 11 1845 1819 26 1959## 7 N10156 Fixe~ 2013 1 12 830 838 -8 1024## 8 N10156 Fixe~ 2013 1 12 1410 1359 11 1555## 9 N10156 Fixe~ 2013 1 13 1551 1536 15 1804## 10 N10156 Fixe~ 2013 1 13 2221 2039 102 2346## # ... with 336,766 more rows, and 11 more variables: sched_arr_time <int>,## # arr_delay <dbl>, carrier <chr>, flight <int>, origin <chr>, dest <chr>,## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 23 / 92

Page 24: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Subsection 3

Further Understanding joins

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 24 / 92

Page 25: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

a simple case

x <- tribble(~key, ~val_x,

1, "x1",2, "x2",3, "x3"

)y <- tribble(

~key, ~val_y,1, "y1",2, "y2",4, "y3"

)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 25 / 92

Page 26: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Inner joinsInner join matches pairs of observations whenever their keys are equal.

x %>%inner_join(y, by = "key")

## # A tibble: 2 x 3## key val_x val_y## <dbl> <chr> <chr>## 1 1 x1 y1## 2 2 x2 y2Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 26 / 92

Page 27: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Outer joins:leftA left join keeps all observations in x.

x %>%left_join(y, by = "key")

## # A tibble: 3 x 3## key val_x val_y## <dbl> <chr> <chr>## 1 1 x1 y1## 2 2 x2 y2## 3 3 x3 <NA>Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 27 / 92

Page 28: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Outer joins: rightA right join keeps all observations in y.

x %>%right_join(y, by = "key")

## # A tibble: 3 x 3## key val_x val_y## <dbl> <chr> <chr>## 1 1 x1 y1## 2 2 x2 y2## 3 4 <NA> y3Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 28 / 92

Page 29: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Outer joins: fullA full join keeps all observations in x and y.

x %>%full_join(y, by = "key")

## # A tibble: 4 x 3## key val_x val_y## <dbl> <chr> <chr>## 1 1 x1 y1## 2 2 x2 y2## 3 3 x3 <NA>## 4 4 <NA> y3

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 29 / 92

Page 30: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Filtering joinssemi_join(x,y) keeps all observations in x that have a match in y.

x %>%semi_join(y, by = "key")

## # A tibble: 2 x 2## key val_x## <dbl> <chr>## 1 1 x1## 2 2 x2Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 30 / 92

Page 31: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Filtering joins

anti_join(x,y) drops all observations in x that have a match in y.

x %>%anti_join(y, by = "key")

## # A tibble: 1 x 2## key val_x## <dbl> <chr>## 1 3 x3

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 31 / 92

Page 32: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Top 10 most popular destinations

top_dest <- flights %>%count(dest, sort = TRUE) %>%head(10)

top_dest

## # A tibble: 10 x 2## dest n## <chr> <int>## 1 ORD 17283## 2 ATL 17215## 3 LAX 16174## 4 BOS 15508## 5 MCO 14082## 6 CLT 14064## 7 SFO 13331## 8 FLL 12055## 9 MIA 11728## 10 DCA 9705

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 32 / 92

Page 33: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Top 10 most popular destinationsfind each flight that went to one of those destinations.

flights %>%filter(dest %in% top_dest$dest)

## # A tibble: 141,145 x 19## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time## <int> <int> <int> <int> <int> <dbl> <int> <int>## 1 2013 1 1 542 540 2 923 850## 2 2013 1 1 554 600 -6 812 837## 3 2013 1 1 554 558 -4 740 728## 4 2013 1 1 555 600 -5 913 854## 5 2013 1 1 557 600 -3 838 846## 6 2013 1 1 558 600 -2 753 745## 7 2013 1 1 558 600 -2 924 917## 8 2013 1 1 558 600 -2 923 937## 9 2013 1 1 559 559 0 702 706## 10 2013 1 1 600 600 0 851 858## # ... with 141,135 more rows, and 11 more variables: arr_delay <dbl>,## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 33 / 92

Page 34: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Top 10 most popular destinationsdifficult to extend that approach to multiple variables.

## Joining, by = "dest"

## # A tibble: 141,145 x 19## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time## <int> <int> <int> <int> <int> <dbl> <int> <int>## 1 2013 1 1 542 540 2 923 850## 2 2013 1 1 554 600 -6 812 837## 3 2013 1 1 554 558 -4 740 728## 4 2013 1 1 555 600 -5 913 854## 5 2013 1 1 557 600 -3 838 846## 6 2013 1 1 558 600 -2 753 745## 7 2013 1 1 558 600 -2 924 917## 8 2013 1 1 558 600 -2 923 937## 9 2013 1 1 559 559 0 702 706## 10 2013 1 1 600 600 0 851 858## # ... with 141,135 more rows, and 11 more variables: arr_delay <dbl>,## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 34 / 92

Page 35: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Using anti_join

Whether are there flights that do not have a match in planes

## # A tibble: 722 x 2## tailnum n## <chr> <int>## 1 <NA> 2512## 2 N725MQ 575## 3 N722MQ 513## 4 N723MQ 507## 5 N713MQ 483## 6 N735MQ 396## 7 N0EGMQ 371## 8 N534MQ 364## 9 N542MQ 363## 10 N531MQ 349## # ... with 712 more rows

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 35 / 92

Page 36: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Data Management(II): Relating data Further Understanding joins

Some Notes for join data

1 Start by identifying the variables that form the primary key in eachtable.

2.Check that none of the variables in the primary key are missing andwhether is duplicated(for inner join).

3 Check that your foreign keys match primary keys in another table.

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 36 / 92

Page 37: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作

Section 3

50 分钟 Rmarkdown 简介 + 操作

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 37 / 92

Page 38: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 What we have learned so far

Subsection 1

What we have learned so far

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 38 / 92

Page 39: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 What we have learned so far

What we have practiced so far

Loaded the necessary librariesRead the data into RCleaned and reorganized dataVisualized by drawing a plot

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 39 / 92

Page 40: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 What we have learned so far

What we are going to learn next

Build models to analysis the dataWrite a beautiful report in desirable formats or make a good slides forpresentation.All the results should easily shared with and replicated by others.All steps above constitute a entire workflow of data analysis.

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 40 / 92

Page 41: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Why use Rmarkdown?

Subsection 2

Why use Rmarkdown?

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 41 / 92

Page 42: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Why use Rmarkdown?

Introduction to Markdown

Markdown: 是一种轻量级的标记语言,它允许人们使用易读易写的纯文本格式编写文档,然后转换成有效的 XHTML(或者 HTML)文档。(维基百科)。具有简介的语法、优美的格式以及多种软件的支持,因此是目前互联网上最流行的写作程式。简单讲:容易学 + 兼容性强 + 功能强大

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 42 / 92

Page 43: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Why use Rmarkdown?

Word is far from perfect

先看我们最常用的但也无比糟糕的 Word体积臃肿,功能貌似强大,但很多其实都用不上。每次打开都费很久时间。最让人崩溃的是非常非常容易崩溃。版本和格式之间兼容性非常差。

比如 MacOS 和 Win 之间。从网页上复制文字到 Word 里经常是乱的。

与其他软件相结合能力差,无法单独生成动态文档。团队协作能力差。

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 43 / 92

Page 44: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Why use Rmarkdown?

Markdown is more popular in scientific writing

再看我们介绍的 Markdown体积小,因为是纯文本文件,任意一个文本编辑器都可以打开。即使文字内容很多,也不容易崩溃。文档格式万能转换。(借助其他软件比如 pandoc)可以直接转化为pdf、docx、html。可以与统计软件相结合,生成动态文档。与 Github 应用相结合,可以实现版本控制和程式化协作。(What isGithub?)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 44 / 92

Page 45: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Why use Rmarkdown?

Basic Markdown Synatax

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 45 / 92

Page 46: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Why use Rmarkdown?

Basic Markdown Synatax

very easy to learn: 10 minutes is enough to learn.Markdown online tutorial

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 46 / 92

Page 47: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 What is RMarkdown?

Subsection 3

What is RMarkdown?

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 47 / 92

Page 48: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 What is RMarkdown?

R+Markdown

在 RStudio 中,基于 Markdown 语法,结合 R 的功能的文档处理套件。

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 48 / 92

Page 49: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 What is RMarkdown?

Yihui Xie and knitr packageDeveloped by a Chinese leading figure in R field: Yihui Xie(谢益辉)

COS 访谈第 20 期: 千秋邈矣独留我,百战归来再读书:谢益辉Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 49 / 92

Page 50: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Getting started

Subsection 4

Getting started

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 50 / 92

Page 51: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Getting started

Build a Project

Recall how to creat a project within RStudioWhy?

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 51 / 92

Page 52: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Getting started

Install and Library Packages

(if necessary) install packagesrmarkdownknitr

#install.packages("rmarkdown")#install.packages("knitr")#library("rmarkdown")#library("knitr")

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 52 / 92

Page 53: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Getting started

Generate a document

StaticDocument: Html,Word,LaTeX,Presentation: Html,Beamer in Latex,PowerpointNotebook: html

DynamicShiny

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 53 / 92

Page 54: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Introduction to Rmarkdown

Subsection 5

Introduction to Rmarkdown

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 54 / 92

Page 55: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Introduction to Rmarkdown

Introduction to Rmarkdown

.Rmd formatDocument:

HtmlWord (requires you install MS Word)PDF (requires you install LaTex system)

PresentationHtml: ioslides & SlidifyPDF: beamer of LaTex system (requires you install LaTex system)Powerpoint: (requires you install MS Powerpoint)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 55 / 92

Page 56: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Introduction to Rmarkdown

Generate a html file(default)

small sizeeasy to compilecan transfer to Word or PDF later

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 56 / 92

Page 57: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Basic Synatx in RMarkdown(Markdown)

Subsection 6

Basic Synatx in RMarkdown(Markdown)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 57 / 92

Page 58: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Basic Synatx in RMarkdown(Markdown)

Introduction

Let’s try it by using a default template.Creat(open) a Rmarkdown file

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 58 / 92

Page 59: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Basic Synatx in RMarkdown(Markdown)

Compile a file in RMarkdown

Command(Control)+Shift+KOr knit

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 59 / 92

Page 60: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Basic Synatx in RMarkdown(Markdown)

The Structure of a rmarkdown template

Basic Settings: YAMLTitles and Texts:R codes: chunkInsert imagesTablesMath expressionsFor a quick review, please refer to

RMarkodwn Reference GuideRMarkodwn Cheat Sheet

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 60 / 92

Page 61: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Subsection 7

Other formats

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 61 / 92

Page 62: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Forms for document and presentation

Document:HtmlWord (requires you install word)PDF (requires you install LaTex system)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 62 / 92

Page 63: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Generate a .docx file with Word

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 63 / 92

Page 64: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Generate a .docx file with Word

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 64 / 92

Page 65: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Geneate a PDF report with LateX

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 65 / 92

Page 66: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Geneate a PDF report with LaTeX

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 66 / 92

Page 67: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Geneate a PDF report with LaTeX

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 67 / 92

Page 68: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Introduction to LaTeXLaTex 是一种文字编辑处理排版的软件,是科技写作的主要格式之一。简单理解是相当于另一种 office 软件(集合了 Word 和 Powerpoint的功能)。与 Word 的简单区别在于 LaTex 不是 “所见即所得” 式的软件,即我们不能随意的在窗口里调整整个文档的格式,而是要通过各种函数、命令或者程序包的设定来确定文字版面的格式。LaTex 的好处在于美观,这种美观性来自于格式的规范和准确。尤其是当在文中包含大量公式和图表等内容时,不会出现错位和乱码等问题。这方面使用过 Word 输入过公式和图表的同学都感受过“不能对齐” 的痛苦。想要使用 Latex 必须要学习 LaTex“语法”,这其实是一整套排版编辑的语言。如果想要全部掌握,也没有那么容易。正如虽然我们都使用 Word 很长时间, 但 Word 里的大部分功能我们都很少使用。所以学习 LaTex 也可以先从最简单的方式开始,很多复杂的功能等到以后碰到问题需要时再学习应用。在本课程的内容中,大家只需要安装好即可。

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 68 / 92

Page 69: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Install LaTeX Packages or Full System

Supported by LaTex system you have to install ityou have to install itat first.

install “tinytex”” package within RStudio(recommended)install full LaTex system outside RStudio

Mac: MacTexWin: Texlive or MikeTex

install.packages('tinytex')tinytex::install_tinytex()

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 69 / 92

Page 70: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Other formats

Install LaTeX Packages or Full System

system("pdflatex --version") # 安装之后,检查版本

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 70 / 92

Page 71: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 中文支持

Subsection 8

中文支持

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 71 / 92

Page 72: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 中文支持

Geneate a PDF report with LaTeX

LaTex 的中文支持需要加载特殊的字体包才能使用。使用谢益辉开发的 rticles packages 的模版直接在 YAML 里添加中文包

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 72 / 92

Page 73: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 中文支持

Geneate a PDF report with LaTeX:中文支持

LaTex 的中文支持需要加载特殊的字体包才能使用。使用谢益辉开发的 rticles packages 的模版直接在 YAML 里添加 LaTeX 中文包

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 73 / 92

Page 74: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 中文支持

Geneate a PDF report with LaTeX:中文支持

#installed.packages("rticles")

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 74 / 92

Page 75: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 中文支持

Geneate a PDF report with LaTeX: 中文支持

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 75 / 92

Page 76: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 中文支持

Geneate a PDF report with LaTeX:中文支持

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 76 / 92

Page 77: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 中文支持

Geneate a PDF report with LaTeX:中文支持

在 YAML 里添加 LaTeX 中文包

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 77 / 92

Page 78: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 中文支持

Geneate a PDF report with LaTeX:中文支持

在 YAML 里添加 LaTeX 中文包

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 78 / 92

Page 79: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 RMarkdown for Presentations

Subsection 9

RMarkdown for Presentations

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 79 / 92

Page 80: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 RMarkdown for Presentations

Presentation forms

Html: ioslides & SlidifyPDF: beamer of LaTex system (requires you install LaTex system)Powerpoint: (requires you install Powerpoint)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 80 / 92

Page 81: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 RMarkdown for Presentations

Basic Forms

very similar to documentstry it by yourself

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 81 / 92

Page 82: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Resources and References

Subsection 10

Resources and References

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 82 / 92

Page 83: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Some useful online resources

Subsection 11

Some useful online resources

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 83 / 92

Page 84: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

50 分钟 Rmarkdown 简介 + 操作 Some useful online resources

Some useful online resources

RStudio,R Markdown from RStudioCosma Shalizi,(2016),Using R Markdown for Class Reports

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 84 / 92

Page 85: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Important info about Homework

Section 4

Important info about Homework

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 85 / 92

Page 86: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Important info about Homework Important info about Homework

Subsection 1

Important info about Homework

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 86 / 92

Page 87: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Important info about Homework Important info about Homework

Deadline

Due to Oct.14 24:00.Late sending will lower your scores by a standard “10% per hourdeduction”

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 87 / 92

Page 88: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Important info about Homework Important info about Homework

Required forms(作业格式)

The forms of report can be either MS Word or PDF,other forms willnot accepted.All files have to be generated by either Rmarkdown or StataMarkdown.

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 88 / 92

Page 89: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Important info about Homework Important info about Homework

Email and Mita Platform(交作业途径)

You must upload your homework twice to two platforms.(你必须提交两次作业到两个地方,只提交一个将被判定为未交作业。)

1 Upload your report(only for PDF or docx files) to the mita system.2 Send a email with an attachment including all files(may be compressed

in .zip or .RAR format) of your homework to:[email protected] for backup.

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 89 / 92

Page 90: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Important info about Homework Important info about Homework

Mita platform

注意作业通知上传作业 (只接受 PDF 和 Word,其他格式会失败)

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 90 / 92

Page 91: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Important info about Homework Important info about Homework

Email formats

Email Subject: 学号 _ 姓名 _ 专业 _HW1Rename your file with ID_Name_Major_HW1 (eg. 151090001_ 户纳东 _ 经济 _HW1.zip)Wrong filename or no subject emails may not be archived and scoredit properly.“All files” may include your formal report in Word or PDF formats,Rmarkdown(.Rmd for R users) or stmd(.stmd for Stata users) togenerate the final report, as well as Rscript(.R for R users) ordofile(.do for Stata users) to clean the survey data.

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 91 / 92

Page 92: Lab3: Data Management(II) and Rmarkdown · 2021. 3. 5. · Lab3: Data Management(II) and Rmarkdown Introduction to Econometrics,Fall 2020 Zhaopeng Qu Nanjing University 9/27/2020

Important info about Homework Important info about Homework

Language and Rule

English or Chinese(Writing in English will receive a small bonus)Important Rule: Plagiarism Will Be NOT Tolerant!. If TAs haveenough evidence to make me believe that your homework is very very“similar” with another finished by one of your classmates, then bothof you will score ZERO!

Zhaopeng Qu (Nanjing University) Lab3: Data Management(II) and Rmarkdown 9/27/2020 92 / 92