Reading Data into R


Page 1: Reading Data into R

Reading data into R

2012-09-28 @HSPH
Kazuki Yoshida, M.D., MPH-CLE student

FREEDOM TO KNOW

Page 2: Reading Data into R

Previously in this group

- Group Website: http://rpubs.com/kaz_yos/useR_at_HSPH

- Introduction to R

Page 3: Reading Data into R

Menu

- What statistics is all about

- Data-reading functions in R

- Installing packages

- Reading excel files

- Reading other files

Page 4: Reading Data into R

Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data.

http://en.wikipedia.org/wiki/Statistics

http://mediacrushllc.com/2012/internet-statistics-2012/

Page 5: Reading Data into R

No data, no life

No statistics

Page 6: Reading Data into R

http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html

Loading data is the first step

Page 7: Reading Data into R

Supported

- .RData (native): load()

- .csv: read.csv()

- .xls/.xlsx: library(gdata) or library(XLConnect)

- .sas7bdat: read.sas7bdat() via library(sas7bdat)

- .dta: read.dta() via library(foreign)

- and more... http://cran.r-project.org/doc/manuals/R-data.html
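
As a rough illustration of the native format listed above, the sketch below saves a small data frame to an .RData file and restores it with load(). The data frame `toy` and the temporary file are made up for illustration and are not from the slides.

```r
# Hypothetical toy dataset (not from the slides)
toy <- data.frame(id = 1:3, dose = c(10, 20, 40))

# Save it to a temporary .RData file
rdata.file <- tempfile(fileext = ".RData")
save(toy, file = rdata.file)

# Remove the object, then restore it from the file.
# load() puts the object back under its original name and
# returns the names of the objects it restored.
rm(toy)
restored <- load(rdata.file)
print(restored)
str(toy)
```

Note that load() does not need an assignment: the object reappears under the name it had when it was saved.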

Page 8: Reading Data into R

library() packages

Page 9: Reading Data into R

http://r4stats.com/articles/popularity/

4000+ user-contributed packages

Fast development

Page 10: Reading Data into R

Downside: not much can be done without packages

Page 11: Reading Data into R

CRAN

Page 13: Reading Data into R

Let’s try

Page 14: Reading Data into R

Open RStudio

Page 15: Reading Data into R

http://rstudio.org

Watch the screencast

Page 16: Reading Data into R

[Screenshot: RStudio panes: Source, Console, Plot, Workspace (switched)]

Page 17: Reading Data into R

Menu: RStudio - Preferences

My configuration

[Screenshot: RStudio panes: Source, Console, Plot, Workspace]

Page 18: Reading Data into R

Menu: RStudio - Preferences

My configuration

Configure CRAN mirror

Page 19: Reading Data into R

Use .CSV if possible

http://www.edrugsearch.com/edsblog/cvs-takes-on-wal-marts-generic-drug-prices-with-a-gimmicky-twist/#.UEfft0J8z0d

Comma Separated Values

Page 21: Reading Data into R

read.csv("file.csv")

http://www.wondergraphs.com/img/SFO_Landings.csv

Careful: big file!

Page 22: Reading Data into R

new.dat <- read.csv("file.csv")

new.dat: name of a dataset here

read.csv(): function to read .csv files

"file.csv": file name here
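
A minimal sketch of the call above, assuming a small made-up CSV file written to a temporary location so the example runs without downloading anything. The column names and values are hypothetical.

```r
# Write a small made-up CSV file to a temporary location
csv.file <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age",
             "1,female,34",
             "2,male,51",
             "3,female,47"),
           csv.file)

# Read it into a dataset; the first line becomes the column names by default
new.dat <- read.csv(csv.file)
new.dat
```

With a real file, replace `csv.file` with the path to the file in quotes, as on the slide.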

Page 23: Reading Data into R

Alternatively

new.dat <- read.csv(file.choose())

new.dat: name of a dataset here

read.csv(): function to read .csv files

file.choose(): function to open a file-choose dialogue

Page 24: Reading Data into R

Space separated

http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat

Page 25: Reading Data into R

read.table("file.dat")

or

read.table("file.dat", header = T)

http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
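
To show what header = T changes, here is a small sketch using a made-up space-separated file (the values are hypothetical and are not taken from tlc.dat).

```r
# Made-up space-separated file whose first line holds the column names
dat.file <- tempfile(fileext = ".dat")
writeLines(c("id group week0 week1",
             "1 P 30.8 26.9",
             "2 A 26.5 14.8"),
           dat.file)

# Without header = TRUE the first line is treated as data
no.header <- read.table(dat.file)

# With header = TRUE the first line supplies the column names
with.header <- read.table(dat.file, header = TRUE)

names(no.header)     # default names V1, V2, V3, V4
names(with.header)   # names taken from the first line of the file
nrow(no.header)      # 3 rows: the name line was counted as data
nrow(with.header)    # 2 rows
```

If the file has no name line at all, the plain read.table("file.dat") form is the one to use.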

Page 26: Reading Data into R

tab-separated

Page 28: Reading Data into R

For comma-, tab-, or space-separated text

Let’s try!
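
For the tab-separated case, one possible sketch is below. It uses read.delim(), a base R wrapper around read.table() that assumes tabs and a header line; the file contents are made up for illustration.

```r
# Made-up tab-separated file; note the space inside the drug values
tsv.file <- tempfile(fileext = ".txt")
writeLines(c("id\tdrug\tdose",
             "1\tdrug A\t10",
             "2\tdrug B\t20"),
           tsv.file)

# read.delim() defaults to sep = "\t" and header = TRUE
tab.dat1 <- read.delim(tsv.file)

# The same result with read.table() and explicit arguments
tab.dat2 <- read.table(tsv.file, header = TRUE, sep = "\t")

tab.dat1
```

Because the separator is a tab, values that contain spaces stay in one column.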

Page 31: Reading Data into R

install.packages("gdata", dep = T)

library(gdata)
read.xls("file.xls")

Perl configuration necessary on Windows: http://cran.r-project.org/web/packages/gdata/INSTALL

Page 32: Reading Data into R

install.packages("XLConnect", dep = T)

library(XLConnect)
readWorksheet(loadWorkbook("file.xls"), sheet = 1)

On Mac: install.packages("XLConnect", type = "source")

Define a function for simplicity:

my.read.xls <- function(file) readWorksheet(loadWorkbook(file), sheet = 1)
my.read.xls("file.xls")

Page 33: Reading Data into R

To install a package

install.packages("package", dep = T)

"package": package name here

dep: short for dependencies

T: short for TRUE
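
A possible convenience sketch: install a package only when it is not already available. The helper name `install.if.missing` is hypothetical and not from the slides; it spells out dependencies = TRUE instead of the shorthand dep = T.

```r
# Hypothetical helper (not from the slides): install a package only if missing
install.if.missing <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        install.packages(pkg, dependencies = TRUE)
    }
    # Report whether the package is available now
    requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so nothing is downloaded in this call
available <- install.if.missing("stats")
available
```

Calling it with a package such as "gdata" would download the package from the configured CRAN mirror the first time only.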

Page 34: Reading Data into R
Page 35: Reading Data into R

To load a package

library(package)

package name here

Double quotes ("") can be omitted
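
A short sketch of the point about quotes, using the `tools` package only because it ships with every R installation (the slides do not mention it). The character.only argument shown at the end is an addition for the case where the package name is stored in a variable.

```r
# Both forms load the same package
library(tools)
library("tools")

# When the name is stored in a variable, character.only = TRUE is needed;
# otherwise library() would look for a package literally called "pkg"
pkg <- "tools"
library(pkg, character.only = TRUE)

# Confirm that the package is attached to the search path
attached <- "package:tools" %in% search()
attached
```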

Page 36: Reading Data into R

Just click the box

Page 37: Reading Data into R

Install package

Load package

Read the chosen xls file into nhefs

Page 38: Reading Data into R
Page 39: Reading Data into R

install.packages("sas7bdat", dep = T)

library(sas7bdat)
read.sas7bdat("file.sas7bdat")

http://www.biostat.harvard.edu/~fitzmaur/ala2e/smoking.sas7bdat

Page 41: Reading Data into R
Page 44: Reading Data into R

install.packages("XML", dep = T)
library(XML)

readHTMLTable("http://www.drugs.com/top200_2003.html", which = 2, skip.rows = 1)

http://www.drugs.com/top200_2003.html

Page 45: Reading Data into R

Fixed width

Page 46: Reading Data into R

read.fwf("file.txt", widths = c(3, 5, ...))

Use widths = list(c(3, 5, ...), c(5, 7, ...)) for multiple rows per subject
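
A small sketch of both forms, assuming made-up fixed-width files. The column names, widths, and values are hypothetical. In the multi-row form, each element of the list gives the field widths of one line of the record.

```r
# One line per subject: 3-character id, 1-character group, 4-character score
fwf.file <- tempfile(fileext = ".txt")
writeLines(c("001A34.5",
             "002B51.0"),
           fwf.file)

fwf.dat <- read.fwf(fwf.file, widths = c(3, 1, 4),
                    col.names = c("id", "group", "score"))
fwf.dat

# Two lines per subject: id and group on the first line, score on the second
multi.file <- tempfile(fileext = ".txt")
writeLines(c("001A", "34.5",
             "002B", "51.0"),
           multi.file)

multi.dat <- read.fwf(multi.file, widths = list(c(3, 1), c(4)),
                      col.names = c("id", "group", "score"))
multi.dat
```

Both calls should give one row per subject with the same three columns.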

Page 47: Reading Data into R

Important functions

- install.packages("PackageName", dep = T)

- library(PackageName)

- str(dataset)

- summary(dataset)

- head(dataset)
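
As an illustration of the three inspection functions, the sketch below applies them to `mtcars`, a dataset that ships with R. It stands in for whatever dataset was just read and is not used in the slides.

```r
# Structure: variable names, types, and the first few values
str(mtcars)

# Per-variable summaries (min, quartiles, mean, max for numeric columns)
summary(mtcars)

# First rows; the second argument sets how many rows to show (default is 6)
first.rows <- head(mtcars, 3)
first.rows

# summary() also works on a single column
mpg.summary <- summary(mtcars$mpg)
mpg.summary
```

Running these right after reading a file is a quick way to check that the column types and row counts look as expected.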

Page 48: Reading Data into R
Page 49: Reading Data into R

Appendix: Probability Functions

Page 50: Reading Data into R

        -norm                      -t                        -binom       -pois          what it does
d-      dnorm                      dt                        dbinom       dpois          density (mass), given x-axis
p-      pnorm                      pt                        pbinom       ppois          return probability, given x-axis (quantile)
q-      qnorm                      qt                        qbinom       qpois          return quantile (x-axis), given probability
-test   library(BSDA): z.test,     t.test,                   binom.test   poisson.test   return p-value and confidence interval
        zsum.test                  library(BSDA): tsum.test
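
A brief sketch of how the d-, p-, q- prefixes and the test functions fit together, using base R only. The sample `x` is made up for illustration; the BSDA functions are left out so the example does not depend on an extra package.

```r
# d-: density (mass) at a given x; here P(X = 3) for Binomial(10, 0.5)
mass <- dbinom(3, size = 10, prob = 0.5)

# p-: cumulative probability up to a given quantile; about 0.975 here
prob <- pnorm(1.96)

# q-: quantile for a given probability; about 1.96, the inverse of pnorm
quant <- qnorm(0.975)

# -test: p-value and confidence interval from a made-up sample
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
res <- t.test(x, mu = 5)
res$p.value
res$conf.int
```

The p- and q- functions undo each other, so pnorm(qnorm(0.975)) returns 0.975.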