Basic Statistical Concepts (i122server.vu-wien.ac.at/.../day1/StatsBasics_lecture.pdf)


Carolin Kosiol

Institute of Population Genetics

Vetmeduni Vienna


Spezielle Statistik in der Biomedizin

Winter semester (WS) 2014/15

Basic Statistical Concepts


Aims of the course

An intuitive understanding of the fundamental concepts in probability and statistics

Computational approaches using R

Practical understanding of linear models and related concepts

The statistical model and frameworks that allow us to identify specific genetic differences responsible for differences in organisms that we can measure

You will be able to analyze a large data set for this particular problem, e.g., a Genome-Wide Association Study (GWAS)


Dates

26.11.2014 9:00-12:00 13:00-16:00

27.11.2014 9:00-12:00 13:00-16:00

28.11.2014 9:00-13:00 (homework assignment)

10.12.2014 9:00-12:00 13:00-15:00

11.12.2014 9:00-12:00 13:00-16:00

18.12.2014 13:00-14:00 (exam date)


Grading

Participation during lectures 5%

Homework assignment 35%

Written exam 60%


Website Resource

i122server.vu-wien.ac.at/pop/Kosiol_website/SpezStatistik2014/

login: student

Password: statistics

The information will be updated throughout the course.

We will post slides for the computer labs and code

We will post all homeworks, exams, solutions, etc.


Book recommendations

Statistics: An Introduction using R

- Michael Crawley

- http://www3.imperial.ac.uk/naturalsciences/research/statisticsusingr

- Wiley and Sons

- ISBN-10: 0470022981

- ISBN-13: 978-0470022986

- 1 copy @ library

• ground floor, shelf mark 52079; BL-05.00


Book recommendations

Applied Statistical Genetics with R

- Andrea S. Foulkes

- Springer 2009

- ISBN: 978-0-387-89554-3

(quite technical)

Genome-Wide Disease Association Analysis: A Primer

- Bruce Rannala

- Preview chapter: http://rannala.org/books/CUPChap2.pdf

(similar choice of topics, free)


Aims of Statistics

• Development and application of methods to

– collect

– summarize

– analyze

– interpret

data.


Problem Structure

• Data result from some underlying parameters + random fluctuation

• Statistical methods allow us to find out about the underlying parameters

• Statistical model: specifies how the data are influenced by parameters and randomness
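
The idea that data arise from underlying parameters plus random fluctuation can be made concrete with a small simulation in R. This is only an illustrative sketch: the parameter values (`mu`, `sigma`), the sample size `n`, and the choice of a normal distribution are assumptions made for the example and are not taken from the lecture.

```r
# Illustrative sketch: data = underlying parameter + random fluctuation.
# mu, sigma, n and the normal model are assumptions chosen for this example.
set.seed(1)                              # make the simulation reproducible

mu    <- 10                              # underlying parameter (true mean)
sigma <- 2                               # size of the random fluctuation
n     <- 1000                            # number of observations

noise <- rnorm(n, mean = 0, sd = sigma)  # random fluctuation
y     <- mu + noise                      # observed data

# A statistical method (here: the sample mean) estimates the parameter.
mu_hat <- mean(y)
print(mu_hat)
```

In a real analysis only `y` is observed; `mu` is unknown, and `mu_hat` is the estimate of it.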


Situations where Statistics helps …

• Agriculture: which variety and fertilizer produce the highest yield?

• Medicine: which therapy for a certain disease is the most effective?

• Nutrition: does green tea reduce the risk of certain types of cancer?

• Statistical Genetics: find out about mutations that convey resistance or susceptibility to a disease.


Statistical Genetics

We know that aspects of an organism (measurable attributes and states such as disease) are influenced by the genome (the entire DNA sequence) of an individual

This means that differences in genomes (genotypes) can produce differences in a phenotype:

• Genotype - any quantifiable genomic difference among individuals, e.g. Single Nucleotide Polymorphisms (SNPs).

• Phenotype - any measurable aspect of an organism (that is not the genotype!).

Examples?


An Illustration

Example: People are different...

We know that environment plays a role in these differences ... and for many of them, differences in the genome also play a role

For any two people, there are millions of differences in their DNA, a subset of which are responsible for producing differences in a given measurable aspect.


An Illustration (cont.)

The problem: for any two people, there can be millions of differences between their genomes...

How do we figure out which genomic differences are involved in producing phenotypic differences and which ones are not?

This course is concerned with how we do this.

Note that the problem (and methodology) applies to any measurable difference, for any type of organism!!


Why do we want to know this?

We can target genomic differences responsible for genetic diseases with gene therapy

We can manipulate genomes of agricultural crops to produce disease-resistant strains

We can explain why a disease has a particular frequency in a population, and why we see a particular set of differences

These differences provide a foundation for understanding how pathways, developmental processes, and physiological processes work

The list goes on...


Statistical Genomics

Traditionally, determining the impact of genome differences on phenotypes was the province of the field of “Genetics”

Given this dependence on genomes, it is no surprise that modern genetic fields now incorporate genomics: the study of an organism’s entire genome (Wikipedia definition)

However, one can study genetics without genomics (i.e. without direct information concerning DNA), and the merging of genetics and genomics is quite recent


History of Statistical Genetics

In sum: during the last decade, the greater availability of DNA sequence data has completely changed our ability to make connections between genome differences and phenotypes


Quantitative Genomics

In this course, we will use statistical modeling to say something about biology, specifically the relationships between genotype (DNA) and phenotype

Quantitative genomics is a field concerned with the modeling of the relationship between genomes and phenotypes and using these models to discover and predict

We will use frameworks from the fields of probability and statistics for this purpose


Advances in sequencing


Sources of random fluctuation

• Random sampling

• Measurement error

• Sequencing errors

• Genetic drift

• Other sources (depending on application)


Genomics & Statistics

A non-technical definition of probability: a mathematical framework for modelling under uncertainty

Such a system is particularly useful for modelling systems where we don’t know and/or cannot measure critical information for explaining the patterns we observe

This is exactly the case we have in quantitative genomics when connecting differences in a genome to differences in phenotypes


Genomics & Statistics

We are interested in using a probability model to identify relationships between genomes and phenotypes using DNA sequences and phenotype measurements.

For this purpose, we will use the framework of statistics, which we can (non-technically) define as a system for interpreting data for the purposes of prediction and decision making given uncertainty.


Basic Statistical Concepts

Population and Sample

Random Samples

Controlled experiment versus observational studies

Types of Data

Descriptive Statistics vs. Statistical Inference


Population and Sample

Population: the collection of units from which observations are taken

The population can be real or hypothetical; the choice of population depends on the aim of the investigation

Sample: the subset of the population that is actually investigated

Different samples can lead to different conclusions: statistical methods take uncertainty due to sampling into account
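
The point that different samples give different answers can be seen in a short R simulation. This is a minimal sketch, assuming a hypothetical simulated population; the population size, mean, standard deviation, sample size, and number of repetitions are all arbitrary choices for illustration.

```r
# Illustrative sketch: different samples from the same population give
# different estimates. The population below is simulated (an assumption).
set.seed(42)

population <- rnorm(10000, mean = 50, sd = 10)   # hypothetical population

# Draw 200 samples of size 25 and record the mean of each one.
sample_means <- replicate(200, mean(sample(population, size = 25)))

range(sample_means)   # the estimates differ from sample to sample
sd(sample_means)      # their spread reflects uncertainty due to sampling
mean(population)      # the population value the samples are estimating
```

The spread of `sample_means` is much smaller than the spread of the population itself, and it shrinks further as the sample size grows.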


Random Samples

Simple random sampling: each member of the population has the same chance of being included in the sample.

Other types of random sampling: the chance of being included is well defined, but need not be equal.

Random sampling helps ensure that the sample is representative of the population.

Ideal, but often difficult to achieve.
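
As a rough sketch of how random samples can be drawn in R, the base function `sample()` covers both cases described above. The population of 100 numbered individuals and the selection weights `w` are hypothetical and chosen only for illustration.

```r
# Illustrative sketch of random sampling with base R's sample().
# The population of 100 numbered individuals is hypothetical.
set.seed(7)

ids <- 1:100                        # hypothetical population members

# Simple random sample: every member has the same chance of inclusion.
srs <- sample(ids, size = 10)       # sampling without replacement

# Another random design: selection chances are unequal but well defined.
# Here members 1-50 get three times the selection weight of members 51-100.
w        <- c(rep(3, 50), rep(1, 50))
weighted <- sample(ids, size = 10, prob = w)

srs
weighted
```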


Observational Study versus Experiment

Controlled experiment: the investigator is able to choose (set) the explanatory variables of interest.

Observational study: both explanatory variables and responses are observed.

This distinction has important consequences when investigating cause-effect relationships.


Response vs. Explanatory variables

Examples 1-4 (slide figures not reproduced in this transcript)


Types of Data

• Qualitative variables: gender, species, hair color, …

• Quantitative variables

– Discrete: things you count (number of people successfully treated, number of “A” alleles)

– Continuous: things you measure (temperature, blood pressure)
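
These data types map naturally onto R's basic storage types. The following is a minimal sketch with made-up values; the variable names (`species`, `a_alleles`, `temp_c`) are hypothetical.

```r
# Illustrative sketch: how the data types above are commonly stored in R.
# All values are made up for the example.
species   <- factor(c("cat", "dog", "dog", "horse"))  # qualitative
a_alleles <- c(0L, 1L, 2L, 1L)                        # discrete (a count)
temp_c    <- c(38.6, 38.9, 39.1, 37.8)                # continuous (measured)

d <- data.frame(species, a_alleles, temp_c)
str(d)             # shows the type of each column

table(d$species)   # counts per category suit qualitative data
mean(d$temp_c)     # numerical summaries suit quantitative data
```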


Descriptive Statistics vs. Statistical Inference

Descriptive Statistics: Summarize data.

Statistical Inference: Find out about properties of the population using sample(s), separating random fluctuation from the underlying parameters as well as possible.

Quantities used to summarize data (→ summary statistics) are often also used for statistical inference.


Descriptive Statistics

EDA - Exploratory Data Analysis & Summary Statistics

-> Get a feel for your data!

Always plot your data before using formal tools of analysis.

EDA is the quickest way to see what the data say, often reveals interesting features that were not expected, and helps prevent inappropriate analyses and unfounded conclusions.

Plots also have a central role in checking the assumptions made by formal methods.


Measures of Location (Central Tendency)


Measures of Location (Cont.)


Order statistics


Quantiles


Measures of spread


Numerical EDA with R

arithmetic mean: mean()

median: median()

weighted mean: weighted.mean(x,w)

variance: var()

standard deviation: sd()

minimum: min()

maximum: max()

quantiles: quantile(x, probs=seq(0, 1, 0.25))

range: range(); diff(range())

interquartile range: IQR()
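
The functions listed above can be tried on a small vector. This is a minimal sketch: the data vector `x` and the weights `w` are made up for illustration, and `summary()` is added as a convenience that is not on the slide.

```r
# Illustrative sketch: the summary functions listed above applied to a
# small made-up vector x with made-up weights w.
x <- c(2, 4, 4, 5, 7, 9, 11)
w <- c(1, 1, 1, 1, 1, 1, 2)

mean(x)                               # arithmetic mean
median(x)                             # median
weighted.mean(x, w)                   # weighted mean
var(x)                                # sample variance (divides by n - 1)
sd(x)                                 # standard deviation = sqrt(var(x))
min(x); max(x)                        # minimum and maximum
quantile(x, probs = seq(0, 1, 0.25))  # 0%, 25%, 50%, 75%, 100% quantiles
range(x); diff(range(x))              # (min, max) and its width
IQR(x)                                # 75% quantile minus 25% quantile
summary(x)                            # several of the above at once
```

Note that `range()` returns the minimum and maximum as a pair, so `diff(range(x))` is needed to get the range as a single number.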


Graphical EDA with R

boxplot

bar chart / pie chart

histogram

density plots
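
Each of these plot types has a base-R function. The sketch below uses simulated data; the variables `height` and `group` and all their values are assumptions made for the example.

```r
# Illustrative sketch: base-R versions of the plots listed above,
# using simulated data (an assumption made for the example).
set.seed(3)
height <- rnorm(200, mean = 170, sd = 10)                # continuous variable
group  <- sample(c("A", "B", "C"), 200, replace = TRUE)  # qualitative variable

boxplot(height ~ group, main = "Boxplot by group")       # boxplot
counts <- table(group)
barplot(counts, main = "Bar chart")                      # bar chart
pie(counts, main = "Pie chart")                          # pie chart
h <- hist(height, main = "Histogram")                    # histogram
plot(density(height), main = "Density plot")             # density plot
```

Bar and pie charts summarize counts of a qualitative variable, whereas boxplots, histograms, and density plots show the distribution of a quantitative one.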


We continue with the R practical after the break!