Reading data into R
2012-09-28 @HSPH
Kazuki Yoshida, M.D., MPH-CLE student
FREEDOM TO KNOW
Previously in this group
- Group Website: http://rpubs.com/kaz_yos/useR_at_HSPH
- Introduction to R
Menu
- What statistics is all about
- Data-reading functions in R
- Installing packages
- Reading Excel files
- Reading other files
Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data.
http://en.wikipedia.org/wiki/Statistics
http://mediacrushllc.com/2012/internet-statistics-2012/
No data, No life
No statistics
http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html
Loading data is the first step
Supported
- .RData (native): load()
- .csv: read.csv()
- .xls/.xlsx: library(gdata) or library(XLConnect)
- .sas7bdat: read.sas7bdat() via library(sas7bdat)
- .dta: read.dta() via library(foreign)
- and more...
http://cran.r-project.org/doc/manuals/R-data.html
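As a minimal sketch of the native format listed above, the round trip below saves a small data frame to an .RData file and loads it back. The data frame `toy.dat` and the temporary file are made up for illustration and are not from the slides.

```r
# Hypothetical toy data frame (not from the slides)
toy.dat <- data.frame(id = 1:3, weight = c(60.5, 72.1, 58.3))

# Save to a temporary .RData file, then remove the object from the workspace
rdata.file <- tempfile(fileext = ".RData")
save(toy.dat, file = rdata.file)
rm(toy.dat)

# load() restores the saved objects under their original names
# and returns those names as a character vector
loaded.names <- load(rdata.file)
print(loaded.names)
str(toy.dat)
```

Note that load() does not need an assignment: the object comes back under the name it was saved with.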
library() packages
http://r4stats.com/articles/popularity/
4000+ user-contributed packages
Fast development
Downside: not much can be done without packages
CRAN: Comprehensive R Archive Network
http://cran.r-project.org/web/packages/available_packages_by_date.html
Let’s try
Open RStudio
Panes: Source, Console, Plot, Workspace
Menu: RStudio - Preferences
My configuration: panes switched
Configure CRAN mirror
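Besides the RStudio Preferences menu, the mirror can also be set from code. A minimal sketch, assuming the cloud mirror URL below (an assumption, not from the slides); any CRAN mirror can be substituted.

```r
# Assumed mirror URL; replace with a mirror near you if preferred
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Check which mirror install.packages() will use
getOption("repos")["CRAN"]
```

The setting lasts for the current session only unless it is placed in a startup file.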
Use .CSV if possible
http://www.edrugsearch.com/edsblog/cvs-takes-on-wal-marts-generic-drug-prices-with-a-gimmicky-twist/#.UEfft0J8z0d
Comma Separated Values
.csv
http://www.wondergraphs.com/img/SFO_Landings.csv
read.csv("file.csv")
Careful: big file!
new.dat <- read.csv("file.csv")
  new.dat: name of a dataset here
  read.csv: function to read .csv files
  "file.csv": file name here
Alternatively:
new.dat <- read.csv(file.choose())
  new.dat: name of a dataset here
  read.csv: function to read .csv files
  file.choose: function to open a file-choose dialogue
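To try read.csv() without downloading anything, the sketch below writes a tiny .csv to a temporary file and reads it back. The column names and values are invented for illustration.

```r
# Write a small, made-up .csv file to a temporary location
csv.file <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age",
             "1,female,34",
             "2,male,51",
             "3,female,47"),
           csv.file)

# Read it into a data frame; the first line is used as column names by default
new.dat <- read.csv(csv.file)

head(new.dat)
str(new.dat)
```

In an interactive session, replacing `csv.file` with `file.choose()` opens the file dialogue instead.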
Space separated
http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
read.table("file.dat")
or
read.table("file.dat", header = TRUE)
Tab separated
read.delim("file.tsv")
http://www.brookscole.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&flag=student&product_isbn_issn=9780495384960&disciplinenumber=1038&template=AUS
For comma-, tab-, or space-separated text
Let’s try!
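A small sketch of the space- and tab-separated readers, using made-up files written to temporary locations (the column names and values are illustrative only). It shows why `header = TRUE` matters for read.table(): its default is `header = FALSE`, while read.delim() defaults to `header = TRUE`.

```r
# Space-separated file with a header line (made-up data)
dat.file <- tempfile(fileext = ".dat")
writeLines(c("id group lead0",
             "1 placebo 30.8",
             "2 active 26.5"),
           dat.file)

# Without header = TRUE the header line is read as a data row
# and the columns are named V1, V2, V3
no.header <- read.table(dat.file)

# With header = TRUE the first line supplies the column names
with.header <- read.table(dat.file, header = TRUE)

# Tab-separated file (made-up data); read.delim() expects a header by default
tsv.file <- tempfile(fileext = ".tsv")
writeLines(c("id\tscore", "1\t90", "2\t85"), tsv.file)
tab.dat <- read.delim(tsv.file)

str(no.header)
str(with.header)
str(tab.dat)
```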
http://www.last.fm/music/Excel/+images/285200
http://www.biography.com/people/bill-gates-9307520
Excel files prevalent
http://www.hsph.harvard.edu/faculty/miguel-hernan/files/nhefs_book.xls
http://www.hsph.harvard.edu/faculty/miguel-hernan/causal-inference-book/
http://www.philipcoppens.com/matrixconstructs.html
We will use publicly available data
install.packages("gdata", dep = T)
library(gdata)
read.xls("file.xls")
Perl configuration is necessary on Windows
http://cran.r-project.org/web/packages/gdata/INSTALL
install.packages("XLConnect", dep = T)
library(XLConnect)
readWorksheet(loadWorkbook("file.xls"), sheet = 1)
On Mac: install.packages("XLConnect", type = "source")
Define a function for simplicity
my.read.xls <- function(file) readWorksheet(loadWorkbook(file), sheet = 1)
my.read.xls("file.xls")
To install a package
install.packages("package", dep = T)
  "package": package name here
  dep: short for dependencies
  T: short for TRUE
To load a package
library(package)
package name here
Double quotes ("") can be omitted
Or just click the box
Install package
Load package
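The install and load steps above can also be combined in code. The helper below is a hypothetical convenience function (its name `load.or.install` is not from the slides): it installs a package only when it is missing, then loads it. It is shown with foreign, which normally ships with R.

```r
# Hypothetical helper (not from the slides): install only if missing, then load
load.or.install <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE is needed because pkg holds the name as a string
  library(pkg, character.only = TRUE)
}

load.or.install("foreign")

# Confirm the package is now attached
"package:foreign" %in% search()
```

Unlike library(package), where quotes can be omitted, a package name stored in a variable must be passed with `character.only = TRUE`.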
Read the chosen .xls file into nhefs
install.packages("sas7bdat", dep = T)
library(sas7bdat)
read.sas7bdat("file.sas7bdat")
http://www.biostat.harvard.edu/~fitzmaur/ala2e/smoking.sas7bdat
library(foreign)
read.xport("file.xpt")
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/DEMO_F.xpt
library(foreign)
read.dta("file.dta")
http://www.biostat.harvard.edu/~fitzmaur/ala2e/headache.dta
http://www.drugs.com/top200_2003.html
HTML table
install.packages("XML", dep = T)
library(XML)
readHTMLTable("http://www.drugs.com/top200_2003.html", which = 2, skip.rows = 1)
Fixed width
read.fwf("file.txt", widths = c(3, 5, ...))
Use widths = list(c(3, 5, ...), c(5, 7, ...)) for multiple rows per subject
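A minimal sketch of both forms of `widths`, using made-up fixed-width files written to temporary locations. The field layout (3-character id, 5-character name, then sex and age on a second line) is invented for illustration.

```r
# One line per subject: 3-character id followed by a 5-character name
fwf.file <- tempfile(fileext = ".txt")
writeLines(c("001Alice", "002Bruno", "003Carol"), fwf.file)

fwf.dat <- read.fwf(fwf.file, widths = c(3, 5),
                    col.names = c("id", "name"))

# Two lines per subject: line 1 = id + name, line 2 = sex (1 char) + age (2 chars)
multi.file <- tempfile(fileext = ".txt")
writeLines(c("001Alice", "F34",
             "002Bruno", "M51"),
           multi.file)

# A list of width vectors, one vector per line of each record
multi.dat <- read.fwf(multi.file, widths = list(c(3, 5), c(1, 2)),
                      col.names = c("id", "name", "sex", "age"))

str(fwf.dat)
str(multi.dat)
```

The leading zeros in the id field are dropped because the column is converted to a number on reading.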
Important functions
- install.packages("PackageName", dep = T)
- library(PackageName)
- str(dataset)
- summary(dataset)
- head(dataset)
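The three inspection functions in the list above can be tried on a dataset that ships with R. The sketch uses the built-in `mtcars` data as a stand-in for a freshly read file; that choice is an assumption for illustration.

```r
# Use a built-in dataset as a stand-in for data you have just read in
dataset <- mtcars

str(dataset)      # structure: number of rows, column names and types
summary(dataset)  # per-column summaries (min, quartiles, mean, max)
head(dataset)     # first six rows
```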
Appendix: Probability Functions

       -norm                             -t                                -binom      -pois         what it does
d-     dnorm                             dt                                dbinom      dpois         density (mass), given x-axis
p-     pnorm                             pt                                pbinom      ppois         return probability, given x-axis (quantile)
q-     qnorm                             qt                                qbinom      qpois         return quantile (x-axis), given probability
-test  library(BSDA): z.test, zsum.test  t.test, library(BSDA): tsum.test  binom.test  poisson.test  return p-value and confidence interval
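A short sketch of the d-, p-, q- prefixes and one -test function, using only base R (the BSDA functions are left out because they need an extra package). The sample vector `x` is made up for illustration.

```r
# d-: density (or probability mass) at a given x-axis value
dens <- dnorm(0)                          # standard normal density at x = 0
mass <- dbinom(3, size = 10, prob = 0.5)  # P(X = 3) for Binomial(10, 0.5)

# p-: cumulative probability up to a given x-axis value
prob <- pnorm(1.96)                       # P(Z <= 1.96), close to 0.975

# q-: the inverse of p-, returning the quantile for a given probability
quan <- qnorm(prob)                       # recovers 1.96

# -test: p-value and confidence interval (made-up sample)
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
res <- t.test(x, mu = 5)
res$p.value
res$conf.int
```

The p- and q- functions are inverses of each other, which is why `qnorm(pnorm(1.96))` returns 1.96.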