Stat 565- Lecture 0 Introduction and Map of this Class

Upload: stanley-parker

Post on 04-Jan-2016


Page 1: Stat 565- Lecture 0 Introduction and Map of this Class

Stat 565- Lecture 0

Introduction and Map of this Class

What are we trying to do:

• The main purpose of this class is to give:
– Statisticians a feel for the specific issues in genomic data
– Biologists an idea of how and why we analyze data the way we do
– Computer scientists an idea of the biological (very briefly) and statistical issues with these data.

Bottom line: to allow these three groups of people to talk to each other.

My role versus your role in the class

• My role: to pose the problems and try to teach the logic behind the statistical procedures.

Your role:

– If you are a statistician: talk to your classmates about the statistics. Some of the statistics will be very familiar to you but NOT to your classmates.

– If you are a biologist: explain to me and the class what some of the issues are, in particular specific problems such as hybridization and PCR, and other ideas that are familiar to you but not to all your classmates.

– If you are a computer scientist: explain what some of the computational challenges are and how we address them.

Statistical terms and topics we will use

• Descriptive Statistics

• Hypothesis Testing

• Multivariate Statistics

• Non-parametric Statistics

• Bayesian Statistics

• Design of Experiments: Optimal Design

Descriptive Statistics

• Terms we will see and use:
– Histograms and shapes
– Boxplots
– Scatter plots
– Mean, median
– Standard deviation, quartiles, quantiles
– Coefficient of variation
– Distribution plots: normal QQ plots etc.
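As a quick illustration of the numerical summaries listed above, here is a minimal sketch in Python using only the standard library (the class itself uses R). The intensity values are made up for illustration and do not come from any course data set.

```python
import statistics

# Hypothetical raw intensities for one gene across eight arrays.
values = [120, 135, 110, 180, 150, 128, 240, 140]

mean = statistics.mean(values)
median = statistics.median(values)
sd = statistics.stdev(values)                    # sample standard deviation (n - 1)
q1, q2, q3 = statistics.quantiles(values, n=4)   # quartile cut points
cv = sd / mean                                   # coefficient of variation

print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f}")
print(f"quartiles: Q1={q1:.2f} Q2={q2:.2f} Q3={q3:.2f}")
print(f"CV={cv:.3f}")
```

The mean sits above the median here because one large value (240) pulls it up, which is the kind of skew a histogram or boxplot would also show.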

Hypothesis Testing

• Hypothesis
• Type I and Type II errors
• t-test
• F-test for ANOVA
• Chi-squares
• P-values
• Multiplicity (simultaneous testing/multiple comparison)
– Error control, family-wise error rate (FWER), Bonferroni, FDR
– Single step vs sequential methods
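To make the multiplicity item concrete, the sketch below contrasts a single-step adjustment (Bonferroni, which controls the FWER) with a sequential one (Benjamini-Hochberg, which controls the FDR). It is written in Python with only the standard library, the function names are my own, and the p-values are invented for illustration.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each by m and cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five tests.
pvals = [0.001, 0.008, 0.020, 0.030, 0.20]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

alpha = 0.05
print("Bonferroni rejections:", sum(p < alpha for p in bonf))
print("BH rejections:", sum(p < alpha for p in bh))
```

With these made-up values, Bonferroni keeps fewer discoveries than Benjamini-Hochberg at the same threshold, which is the usual trade-off between controlling the FWER and controlling the FDR.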

Multivariate Statistics

• EDA: Exploratory Data Analysis
• Cluster Analysis (hierarchical, non-hierarchical, distance metrics, types of clustering)
• Principal Components (idea behind this)
• Discriminant Analysis
• Supervised vs Unsupervised Learning
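The choice of distance metric matters a great deal in cluster analysis. As a rough sketch (Python, standard library only, with invented expression profiles and function names of my own), compare Euclidean distance with a correlation-based distance:

```python
from math import sqrt


def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """One minus the Pearson correlation of two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1 - sxy / sqrt(sxx * syy)


# Hypothetical expression profiles over four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [11.0, 12.0, 13.0, 14.0]  # same pattern as gene_a, shifted up by 10
gene_c = [4.0, 3.0, 2.0, 1.0]      # similar level to gene_a, opposite pattern

print(euclidean(gene_a, gene_b), euclidean(gene_a, gene_c))
print(correlation_distance(gene_a, gene_b), correlation_distance(gene_a, gene_c))
```

Euclidean distance puts gene_a closer to gene_c (similar level), while correlation distance puts gene_a closer to gene_b (same pattern), so the two metrics would lead to different clusters.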

Non-parametric Statistics

• How do non-parametric tests work in general
• Sign Test
• Wilcoxon Signed Rank Test
• Wilcoxon Rank Sum Test (Mann-Whitney Test)
• Tukey’s biweight algorithm
• Kruskal-Wallis Test
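The sign test is the simplest example of how a non-parametric test works: it uses only the signs of the paired differences, not their sizes, and refers the count of positive signs to a Binomial(n, 1/2) distribution. Below is a minimal sketch in Python (standard library only); the function name and the paired differences are hypothetical.

```python
from math import comb


def sign_test(differences):
    """Two-sided exact sign test of H0: the median difference is zero."""
    nonzero = [d for d in differences if d != 0]  # zero differences are dropped
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for two sides and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2 * tail)


# Hypothetical paired differences (treated minus control) for ten samples.
diffs = [0.8, 1.2, -0.3, 0.5, 0.9, 0.0, 1.1, 0.4, 0.7, 0.6]
positives, n, p_value = sign_test(diffs)
print(f"{positives} positive out of {n} non-zero differences, p = {p_value:.4f}")
```

Because only the signs are used, one very large difference has no more influence than a small one, which is where the robustness of the test comes from.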

Bayesian Statistics

• How these work
• Empirical Bayes Methods
• Moderated t or F test

Design of Experiments

• Why design?
• Block designs
• Criteria for determining optimality
• Dye-swaps, block designs, loop designs

Structure of the Class

• We will use the basic definition of Statistics to define the structure of the class:

• Statistics comprises methods for collecting, compiling, describing, analyzing and inferring from data.

Our steps

• We talk about the experiment that generates this data

• Specific nuances to the data collection, design issues, systematic effects

• Leads us to Normalization and issues therein
• Type of Data – description and compilation
• Analyze data for overall effects (Clustering etc)
• Inferring from data (hypothesis testing)
• Overall process

We will use R

• R is freeware and readily available. I will mainly use it in class: http://cran.r-project.org/
• The version I have is 3.1.2.
• Choose your computer and operating system, then download and install R. The binary versions are the fastest. Also install as many packages as you can. It will ask for a CRAN site or mirror that is close to you. I always use USA (WA) as my CRAN site.
• I will give you SAS code as well if you are interested.