itmat pcbi-r-course-1
DESCRIPTION
First part of 3-part course on teaching the R statistical package.
TRANSCRIPT
Intro to using R for Bioinformatics: Part 1 : The Basics
Angel [email protected]
Injecting a bit of reality
Taking it a bit further…
Waxing floors is not fun, and may not seem relevant, but have some faith Daniel-san
Outline
• We will teach you some basic uses of R
– “Do & Tell” method where you will be asked to do an exercise and once done, we will explain what just happened.
– Will cover basics, plotting and microarray analysis
• We will not teach you statistics.
What is R?
R is a language and environment for statistical computing and graphics.
– http://www.r-project.org
You can do stuff like this
Install & Run R
• You should have already installed R, but if you had trouble please see us after class
• Start R
– On Windows, use Tinn-R
– On Mac, use the source R application
– On Linux, use the console
Help is plentiful
Help in three ways
Too much! Get me out!
More Help
help.start()
– Start an HTML help session
help(mean)
– Looks up the mean() function's help page
– ?mean
help.search("mean")
– Displays all help pages that contain text “mean”
– ??mean
Whet your appetite…
The Basics
• Please enter each of the following lines into your R session:
Basic Algebra
You will also see this form:
Variables
• “x” and “y” are variables.
• They are pointers to some value
• They can also be pointers to some function
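As a minimal sketch of the points above (the values and the names z, f and avg are made up for illustration, not taken from the slides), a variable can point to a value or to a function:

```r
# Assign values; "<-" is the usual form, "=" also works at the top level
x <- 2
y = 3
z <- x + y             # z now points to the value 5

# A variable can also point to a function
f <- mean
avg <- f(c(x, y, z))   # same as calling mean(c(2, 3, 5))

print(z)
print(avg)
```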
Vectors
Enter this in your session: Results
Small tangent: What is “c (1,2,3)”?
• Use the help()
Accessing Vector Members
In R, vector indexes start at 1. Most programming languages start indexing at zero.
Also, the value inside the brackets is NOT WHAT YOU THINK IT IS! It is an INDEX VECTOR, meaning that you access the members of a vector with a vector
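A short sketch of index vectors, using a made-up vector v. Whatever goes inside the brackets is itself a vector, even when it holds a single number:

```r
v <- c(10, 20, 30, 40, 50)

first      <- v[1]         # 10 (indexing starts at 1, not 0)
some       <- v[c(2, 4)]   # 20 40: a vector of positions
firstThree <- v[1:3]       # 10 20 30: a sequence is also a vector
dropped    <- v[-1]        # 20 30 40 50: negative positions exclude members
big        <- v[v > 25]    # 30 40 50: a logical vector selects members
```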
Small Tangent 2: Creating Sequences
• Create regular sequences using a colon
• Colon has high operator precedence
• Also see the seq() function
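The precedence point is easiest to see side by side. In this sketch (n and the result names are chosen only for illustration), the colon binds tighter than the minus sign:

```r
n <- 5

a    <- 1:n         # 1 2 3 4 5
b    <- 1:n - 1     # read as (1:n) - 1, giving 0 1 2 3 4
upto <- 1:(n - 1)   # parentheses force the subtraction first: 1 2 3 4

# seq() gives more control over step size and length
s1 <- seq(0, 1, by = 0.25)           # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(10, 1, length.out = 4)     # 10 7 4 1
```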
Vectors
• Are a list of items of the same data type
– “double” is short for “double precision floating point number”
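To check the data type of a vector yourself, typeof() can be used. This is a small sketch with made-up vectors. Note how mixing types forces every item to a single common type:

```r
nums <- c(1, 2, 3)
typeof(nums)     # "double"

ints <- c(1L, 2L, 3L)   # the L suffix asks for integers
typeof(ints)     # "integer"

mixed <- c(1, "a", TRUE)
typeof(mixed)    # "character": everything was converted to text
mixed            # "1" "a" "TRUE"
```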
Doing Stuff with Vectors
• Math operations occur on each element in sequence
• Returns a vector of the same size
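For example (values invented for illustration), each operation below is applied member by member and hands back a vector of the same length:

```r
v <- c(1, 4, 9, 16)

doubled <- v * 2                    # 2 8 18 32
summed  <- v + c(10, 20, 30, 40)    # 11 24 39 56
roots   <- sqrt(v)                  # 1 2 3 4

length(doubled) == length(v)        # TRUE
```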
Factors
• Simply a vector of items that mean something
– Disease classifications, drug dosage, US states, months, HapMap ethnic group
– Can be ordered
– Can have multiple levels
• GO Functions
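A small sketch of a factor built from drug dosage labels (the data is made up). By default the levels are sorted alphabetically; an ordered factor needs the level order spelled out:

```r
dose <- c("low", "high", "medium", "low", "high")

f <- factor(dose)
levels(f)     # "high" "low" "medium"
counts <- table(f)   # how many items fall in each level

# Ordered factor: levels listed from smallest to largest
o <- factor(dose, levels = c("low", "medium", "high"), ordered = TRUE)
o[1] < o[2]   # TRUE, because "low" comes before "high"
```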
Array and Matrix
• Multi-dimensional generalizations of vectors
– k-dimensions where k > 0
– Assigned by the dim attribute
• Can be indexed by two or more indices
– If a single index value (can be a vector) is given, then dim is ignored and underlying vector values are accessed directly
– Unless the given index value is also an array
• Matrix is a two-dimensional array
Example
An INDEX ARRAY
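The original slide's example is not in this transcript, so here is a stand-in sketch with made-up values. It shows the dim attribute, the single-index case, and an index array where each row holds one (row, column) pair:

```r
m <- 1:12
dim(m) <- c(3, 4)    # now a 3 x 4 matrix, filled column by column

m[2, 3]              # 8: row 2, column 3
m[5]                 # 5: single index ignores dim, reads the underlying vector

# Index array: one row per member to fetch, columns are (row, column)
idx <- rbind(c(1, 1),
             c(3, 4))
picked <- m[idx]     # 1 12
```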
List
• An ordered collection of named components
List Access
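A sketch of the usual ways to get at list components, using a made-up list p. The component names are illustrative only:

```r
p <- list(name = "Fred", age = 40, scores = c(90, 85))

p$name          # "Fred": access by name
p[["age"]]      # 40: double brackets return the component itself
p[[3]][2]       # 85: third component, then its second member

sub <- p["name"]   # single brackets return a smaller list
class(sub)         # "list"
names(p)           # "name" "age" "scores"
```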
Data Frame
• Bastard step child of List and Matrix
– Essentially a list of vectors of same length
• Closest representation to an Excel file in R
• Easiest way to make one is to read in a CSV file
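To try this without hunting for a file, the sketch below first writes a tiny made-up CSV to a temporary location and then reads it back. The column names (gene, expr, group) are invented for illustration:

```r
# Create a small CSV file to read (stand-in for your own file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("gene,expr,group",
             "A,1.5,ctrl",
             "B,2.5,treated",
             "C,3.5,ctrl"), csv_path)

df <- read.csv(csv_path)

dim(df)                      # 3 3: rows, then columns
df$expr                      # 1.5 2.5 3.5: a column is a vector
as.character(df$gene[2])     # "B"
mean(df$expr)                # 2.5
```

With your own data, replace csv_path with the path to your file.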
Functions
• We’ve already used them
• Functions take in arguments and perform some action using those arguments.
• Actions do not affect the input arguments
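The last point can be checked with a small sketch. The function add_one is hypothetical, written only for this illustration; it changes its argument internally, yet the caller's vector stays as it was:

```r
# Hypothetical helper: adds 1 to the first member of a vector
add_one <- function(v) {
  v[1] <- v[1] + 1   # changes a local copy only
  v                  # the last value is what the function returns
}

orig <- c(1, 2, 3)
res  <- add_one(orig)

orig   # 1 2 3: unchanged
res    # 2 2 3
```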
Example
Write to CSV file
Extra column of the row indices
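The extra column can be seen, and avoided, as follows. This sketch uses a made-up data frame and temporary files; write.csv() writes row names by default, and row.names = FALSE turns that off:

```r
df <- data.frame(gene = c("A", "B"), expr = c(1.5, 2.5))

with_idx    <- tempfile(fileext = ".csv")
without_idx <- tempfile(fileext = ".csv")

write.csv(df, with_idx)                        # default: row indices included
write.csv(df, without_idx, row.names = FALSE)  # no extra column

names(read.csv(with_idx))      # "X" "gene" "expr"
names(read.csv(without_idx))   # "gene" "expr"
```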
Save your work!
• R keeps track of your data and functions
• You can start from where you left off if you save these to some file
Start from your save point
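A minimal sketch of saving and restoring, assuming a temporary file name chosen for illustration. Here save() stores one named object; save.image() would store the whole workspace in the same way:

```r
x <- c(1, 2, 3)

session_file <- tempfile(fileext = ".RData")
save(x, file = session_file)   # write x to the file

rm(x)                          # pretend we closed R and lost x
load(session_file)             # x is back

x                              # 1 2 3
```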