www.biostat.ir: In the name of God, the Most Gracious, the Most Merciful. Biostatistics Academic Preview...


Upload: damon-blair

Post on 27-Dec-2015


TRANSCRIPT

Page 1:

www.biostat.ir

In the name of God, the Most Gracious, the Most Merciful

Page 2:

Biostatistics Academic Preview

Descriptive Statistics

Page 3:

What Is Statistics?

• Statistics is the science of describing or making inferences about the world from a sample of data.
• Descriptive statistics are numerical estimates that organize, sum up, or present the data.
• Inferential statistics is the process of inferring from a sample to the population.

Page 4:

Statistics has two major branches:

• Descriptive statistics
• Inferential statistics

Page 5:

Two types of Statistics

• Descriptive statistics: used to summarize, organize and simplify data
  • What was the average height?
  • What was the highest and lowest score?
  • What is the most common response to a question?
• Inferential statistics: techniques that allow us to study samples and then make generalizations about the populations from which they were selected
  • Are 5th grade boys taller than 5th grade girls?
  • Is a treatment suitable?

Page 6:

Population and Samples

• The population under study is the set of all individuals of interest for the research.
• The part of the population for which we collect measurements is called the sample.
• The number of individuals in a sample is denoted by n.

Page 7:

Variables

Page 8:

Definitions

Variable: a characteristic that changes or varies over time and/or across the different subjects under consideration.

• Changing over time: blood pressure, height, weight
• Changing across a population: gender, race

Page 9:

Types of variables

Data: variables
• Quantitative (numeric)
  • Discrete
  • Continuous
• Qualitative (categorical)
  • Nominal
  • Ordinal

Page 10:

Types of variables: Definitions

• Quantitative variables (numeric): measure a numerical quantity or amount on each experimental unit.
• Qualitative variables (categorical): measure a non-numeric quality or characteristic on each experimental unit by classifying each subject into a category.

Page 11:

Types of variables: Quantitative variables

• Discrete variables: can only take values from a list of possible values
  • Example: number of brushings per day
• Continuous variables: can assume the infinitely many values corresponding to the points on a line interval
  • Examples: weight, height

Page 12:

Types of variables: Categorical variables

• Nominal: unordered categories
  • Race
  • Gender
• Ordinal: ordered categories
  • Likert scales (disagree, neutral, agree)
  • Income categories

Page 13:

Types of Variables

• A discrete variable has gaps between its values. For example, the number of brushings per day is a discrete variable.
• A continuous variable has no gaps between its values. All values or fractions of values have meaning. Age is an example of a continuous variable.

Page 14:

Levels of Measurement

The level of measurement reflects the type of information measured and helps determine which descriptive statistics and which statistical tests can be used.

Page 15:

Four Levels of Measurement

• Nominal: lowest level; categories, no rank
• Ordinal: second lowest; ranked categories
• Interval: next to highest; ranked categories with known units between rankings
• Ratio: highest level; ranked categories with known intervals and an absolute zero

Page 16:

Scales of Measurement

• Temperature: Interval
• Men/Women: Nominal
• Good/Better/Best: Ordinal
• Weight: Ratio
• Republicans/Democrats/Independents: Nominal
• Volume: Ratio
• IQ: Interval
• Not at all/A little/A lot: Ordinal

Page 17:

Descriptive Statistics

• Qualitative
  • Summary measures: frequency, relative frequency, percentage
  • Displays: tables, pie charts, bar graphs
• Quantitative
  • Summary measures: measures of central tendency, measures of spread, five-number summary
  • Displays: tables, histograms, box plots, bar charts, line charts

Page 18:

Descriptive Measures

• Central tendency measures. They are computed in order to give a "center" around which the measurements in the data are distributed.
• Relative standing measures. They describe the relative position of a specific measurement in the data.
• Variation or variability measures. They describe "data spread", or how far away the measurements are from the center.

Page 19:

Measures of Central Tendency

• Mean: the sum of all measurements in the data divided by the number of measurements.
• Median: a number such that at most half of the measurements are below it and at most half of the measurements are above it.
• Mode: the most frequent measurement in the data.

Page 20:

Summary Statistics: Measures of central tendency (location)

• Mean: the mean of a data set is the sum of the observations divided by the number of observations.
  • Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, where $N$ is the population size
  • Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $n$ is the sample size
• Median: the median of a data set is the "middle value".
  • For an odd number of observations, the median is the observation exactly in the middle of the ordered list.
  • For an even number of observations, the median is the mean of the two middle observations in the ordered list.
• Mode: the mode is the single most frequently occurring data value.
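The three definitions can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the brushing counts are hypothetical values invented for illustration, not data from these slides.

```python
from statistics import mean, median, multimode

# Hypothetical sample: number of tooth brushings per day for nine people.
brushings = [2, 1, 3, 2, 2, 0, 1, 2, 5]

sample_mean = mean(brushings)        # sum of observations / number of observations
sample_median = median(brushings)    # middle value of the ordered list
sample_modes = multimode(brushings)  # most frequent value(s); a list, because ties are possible

print("mean:", sample_mean)
print("median:", sample_median)
print("mode(s):", sample_modes)

# With an even number of observations, the median is the mean
# of the two middle values of the ordered list (here 2 and 3).
even_median = median([1, 2, 3, 10])
print("median of an even-sized sample:", even_median)
```

`multimode` is used instead of `mode` so that a data set with several equally frequent values reports all of them.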

Page 21:

Skewness

The skewness of a distribution is measured by comparing the relative positions of the mean, median and mode.

• Distribution is symmetrical
  • Mean = Median = Mode
• Distribution skewed right
  • Median lies between mode and mean, and mode is less than mean
• Distribution skewed left
  • Median lies between mode and mean, and mode is greater than mean

Page 22:

[Figure: Relative positions of the mean and median for (a) right-skewed, (b) symmetric, and (c) left-skewed distributions]

Note: The mean is most representative when the data are roughly symmetric (for example, normally distributed). If this is not the case, it is better to report the median as the measure of location.

Page 23:

Frequency Distributions and Histograms

[Figure: Histograms for symmetric and skewed distributions]

Page 24:

Normal curves: same mean but different standard deviation

[Figure: normal curves with the same mean and different standard deviations]

Page 25:

Further Notes

• When the mean is greater than the median, the data distribution is skewed to the right.
• When the median is greater than the mean, the data distribution is skewed to the left.
• When the mean and median are very close to each other, the data distribution is approximately symmetric.
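The mean-versus-median rule can be sketched as a small helper. This is an illustrative rule of thumb only: the function name `skew_direction`, the `tolerance` parameter, and the three data sets are assumptions introduced here, not part of the slides.

```python
from statistics import mean, median

def skew_direction(data, tolerance=0.0):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    diff = mean(data) - median(data)
    if diff > tolerance:
        return "right"
    if diff < -tolerance:
        return "left"
    return "approximately symmetric"

# Hypothetical data sets.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # one large value pulls the mean up
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]  # one small value pulls the mean down
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right_skewed", right_skewed),
                   ("left_skewed", left_skewed),
                   ("symmetric", symmetric)]:
    print(name, "->", skew_direction(data))
```

In practice a non-zero `tolerance` (chosen relative to the spread of the data) would be needed to treat "very close" values as symmetric.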

Page 26:

Summary statistics: Measures of spread (scale)

• Variance: the average of the squared deviations of each sample value from the sample mean, except that the sum of the squared deviations is divided by n - 1 instead of the sample size n:
  $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
• Standard deviation: the square root of the sample variance:
  $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
• Range: the difference between the maximum and minimum values in the sample.
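The definitions of variance, standard deviation and range can be written out directly. The sketch below uses hypothetical data and compares the hand-written formulas with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [4, 8, 6, 5, 3, 10]  # hypothetical measurements

n = len(data)
x_bar = mean(data)

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Sample standard deviation: square root of the sample variance.
s = sqrt(s2)

# Range: maximum minus minimum.
data_range = max(data) - min(data)

print("variance:", s2, "library:", variance(data))
print("standard deviation:", s, "library:", stdev(data))
print("range:", data_range)
```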

Page 27:

Summary statistics: measures of spread (scale)

• We can describe the spread of a distribution by using percentiles.
• The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
  • Median = 50th percentile
• Quartiles divide data into four equal parts.
  • First quartile (Q1): 25% of observations are below Q1 and 75% above Q1
  • Second quartile (Q2): 50% of observations are below Q2 and 50% above Q2
  • Third quartile (Q3): 75% of observations are below Q3 and 25% above Q3
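One way to turn the "at or below" definition into a computation is the nearest-rank method, sketched below. The function name `percentile_nearest_rank` and the ages are hypothetical. Other percentile conventions interpolate between observations and can give slightly different values, so this is one possible reading of the definition rather than the only one.

```python
from math import ceil

def percentile_nearest_rank(data, p):
    """Smallest observation with at least p percent of the data at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(data)
    rank = ceil(p / 100 * len(ordered))  # 1-based position in the ordered list
    return ordered[rank - 1]

ages = [23, 25, 31, 34, 38, 41, 47, 52, 58, 66, 71, 79]  # hypothetical ages

q1 = percentile_nearest_rank(ages, 25)
q2 = percentile_nearest_rank(ages, 50)
q3 = percentile_nearest_rank(ages, 75)
print("Q1:", q1, "Q2:", q2, "Q3:", q3)
```

With an even number of observations, this nearest-rank Q2 is the lower of the two middle values, whereas the usual median convention averages them.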

Page 28:

Quartiles

[Figure: the ordered data divided by Q1, Q2 and Q3 into four parts, each containing 25% of the observations]

Page 29:

Five-number summary

• Minimum
• Lower quartile Q1 = 25th percentile
• Median = 50th percentile
• Upper quartile Q3 = 75th percentile
• Maximum
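The five numbers can be collected in one small function. This sketch relies on `statistics.quantiles` from the Python standard library with `method="inclusive"`, which is one of several quartile conventions; the function name `five_number_summary` and the scores are hypothetical.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 asks for the three cut points that split the data into quarters.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical scores
summary = five_number_summary(scores)
print(summary)
```

These five values are exactly what a box plot draws: the box spans Q1 to Q3, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.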

Page 30:

Graphical display of numerical variables (histogram)

Class interval    Frequency
20-under 30        6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70        3
70-under 80        1

[Figure: histogram of the table above; horizontal axis: Years (0 to 80), vertical axis: Frequency (0 to 20)]
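The frequency table can be extended with relative and cumulative frequencies, which is also what a histogram's bar heights encode. The sketch below copies the class intervals and counts from the slide; the text-bar rendering is only a stand-in for the plotted histogram.

```python
# (lower bound, upper bound, frequency) for each class interval on the slide.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

total = sum(freq for _, _, freq in classes)

rows = []
cumulative = 0
for lower, upper, freq in classes:
    cumulative += freq
    relative = 100 * freq / total            # percent of observations in this class
    cumulative_pct = 100 * cumulative / total  # percent at or below this class
    rows.append((f"{lower}-under {upper}", freq, relative, cumulative_pct))

for label, freq, relative, cumulative_pct in rows:
    bar = "#" * freq  # crude text histogram: one mark per observation
    print(f"{label:<12} {freq:>3} {relative:>5.1f}% {cumulative_pct:>6.1f}%  {bar}")
```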

Page 31:

Frequency Distributions and Histograms

[Figure: A histogram of the compressive strength data with 17 bins]

Page 32:

Frequency Distributions and Histograms

[Figure: A histogram of the compressive strength data with nine bins]

Page 33:

Frequency Distributions and Histograms

[Figure: Histogram of compressive strength data]

Page 34:

Graphical display of numerical variables (box plot)

[Figure: box plot marking the Minimum, Q1, Q2 (Median), Q3 and Maximum]

Page 35:

Graphical display of numerical variables (box plot)

[Figure: box plots for three distribution shapes: Negatively Skewed (S < 0), Symmetric, Not Skewed (S = 0), and Positively Skewed (S > 0)]

Page 36:

Univariate statistics (categorical variables)

• Summary measures
  • Count = frequency
  • Percent = frequency / total sample
• The distribution of a categorical variable lists the categories and gives either a count or a percent of individuals who fall in each category.

Page 37:

Displaying categorical variables

Rank     Cause of death     Frequency (%)
1        Heart disease      710,760 (43%)
2        Cancer             553,091 (33%)
3        Stroke             167,661 (11%)
4        CLRD               122,009 (7%)
5        Accidents           97,900 (6%)
Total    All five causes    1,651,421

[Figure: two charts of the five causes (heart, cancer, stroke, CLRD, accident); vertical axis 0 to 60]
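The rule "percent = frequency / total sample" can be applied to the counts in the table. The sketch below copies the five counts from the slide and computes each percentage. Plain rounding gives 10% for stroke rather than the 11% shown, so the slide's figure may have been adjusted to make the column sum to 100% (that explanation is a guess).

```python
# Counts copied from the cause-of-death table on the slide.
deaths = {
    "Heart disease": 710_760,
    "Cancer": 553_091,
    "Stroke": 167_661,
    "CLRD": 122_009,
    "Accidents": 97_900,
}

total = sum(deaths.values())

# Percent = frequency / total sample, expressed per hundred.
percent = {cause: 100 * count / total for cause, count in deaths.items()}

for cause, count in deaths.items():
    print(f"{cause:<14} {count:>9,} ({percent[cause]:.1f}%)")
print(f"{'Total':<14} {total:>9,}")
```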

Page 38:

Response and explanatory variables

• Response variable: the variable we intend to model, that is, the variable we intend to explain through statistical modeling.
• Explanatory variable: the variable or variables that may be used to model the response variable; their values may be related to the response variable.

Page 39:

Bivariate relationships

• An extension of univariate descriptive statistics
• Used to detect evidence of association in the sample
  • Two variables are said to be associated if the distribution of one variable differs across groups or values defined by the other variable

Page 40:

Bivariate Relationships

• Two quantitative variables
  • Scatter plot
  • Side-by-side stem-and-leaf plots
• Two qualitative variables
  • Tables
  • Bar charts
• One quantitative and one qualitative variable
  • Side-by-side box plots
  • Bar chart

Page 41:

Two quantitative variables: Correlation

Correlation: a relationship between two variables. What type of relationship exists between the two variables, and is the correlation significant?

x: Explanatory (Independent) Variable    y: Response (Dependent) Variable
Hours of training                        Number of accidents
Shoe size                                Height
Cigarettes smoked per day                Lung capacity
Height                                   IQ

Page 42:

Scatter Plots and Types of Correlation

Negative correlation: as x increases, y decreases

[Figure: scatter plot with x = hours of training, y = number of accidents]

Page 43:

Scatter Plots and Types of Correlation

Positive correlation: as x increases, y increases

[Figure: scatter plot with x = SAT score, y = GPA]

Page 44:

Scatter Plots and Types of Correlation

No linear correlation

[Figure: scatter plot with x = height, y = IQ]

Page 45:

Correlation Coefficient

A measure of the strength and direction of a linear relationship between two variables:

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$

The range of r is from -1 to 1.

• If r is close to 1, there is a strong positive correlation.
• If r is close to -1, there is a strong negative correlation.
• If r is close to 0, there is no linear correlation.
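The correlation coefficient can be computed from the five sums that appear in its computational formula: n Σxy - Σx Σy in the numerator, and the square roots of n Σx² - (Σx)² and n Σy² - (Σy)² in the denominator. The following sketch assumes that form. The function name `pearson_r` and both data sets are hypothetical illustrations, not values from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the computational (sums) formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: more training hours, fewer accidents (perfectly linear).
hours = [1, 2, 3, 4, 5]
accidents = [10, 8, 6, 4, 2]
r_negative = pearson_r(hours, accidents)

# Hypothetical data with a positive but imperfect linear relationship.
r_positive = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])

print("r (training vs accidents):", round(r_negative, 3))
print("r (positive example):", round(r_positive, 3))
```

The function raises `ZeroDivisionError` if either variable is constant, since the correlation is undefined in that case.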

Page 46:

Positive and negative correlation

1. If two variables x and y are positively correlated, this means that:
  • large values of x are associated with large values of y, and
  • small values of x are associated with small values of y
2. If two variables x and y are negatively correlated, this means that:
  • large values of x are associated with small values of y, and
  • small values of x are associated with large values of y

Page 47:

Positive correlation

[Figure: scatter plot showing a positive correlation]

Page 48:

Negative correlation

[Figure: scatter plot showing a negative correlation]

Page 49:

Two qualitative variables (Contingency Tables)

• Categorical data are usually displayed using a contingency table, which shows the frequency of each combination of categories observed in the data.
  • The rows correspond to the categories of the explanatory variable.
  • The columns correspond to the categories of the response variable.

Page 50:

Example

Aspirin and Heart Attacks

• Explanatory variable = drug received
  • placebo
  • aspirin
• Response variable = heart attack status
  • yes
  • no

Page 51:

Contingency table: heart attack example

            Heart Attack    No Heart Attack    Total
Aspirin              104             10,933    11,037
Placebo              189             10,845    11,034
Total                293             21,778    22,071
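A contingency table like this one is usually read by comparing the response distribution across the rows. The sketch below copies the four cell counts from the slide and computes the proportion with a heart attack in each group. The ratio of the two proportions (often called the relative risk) is an added illustration and does not appear on the slide.

```python
# Cell counts copied from the heart attack contingency table.
table = {
    "Aspirin": {"heart_attack": 104, "no_heart_attack": 10_933},
    "Placebo": {"heart_attack": 189, "no_heart_attack": 10_845},
}

def heart_attack_risk(row):
    """Proportion of the group (one table row) that had a heart attack."""
    row_total = row["heart_attack"] + row["no_heart_attack"]
    return row["heart_attack"] / row_total

risk_aspirin = heart_attack_risk(table["Aspirin"])
risk_placebo = heart_attack_risk(table["Placebo"])

# Ratio of the two row proportions (illustrative addition).
relative_risk = risk_aspirin / risk_placebo

print(f"Aspirin: {100 * risk_aspirin:.2f}% had a heart attack")
print(f"Placebo: {100 * risk_placebo:.2f}% had a heart attack")
print(f"Ratio of proportions: {relative_risk:.2f}")
```

This is a purely descriptive comparison; whether the difference is statistically significant is a question for inferential methods.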

Page 52:

Two qualitative variables

Marijuana Use in College: x = parental use, y = student use

Rows: student use. Columns: parental use.

              Both    Neither    One    Total
Never           17        141     68      226
Occasional      11         54     44      109
Regular         19         40     51      110
Total           47        235    163      445

[Figure: bar chart of student use (Never, Occasional, Regular) within each parental-use group (Both, Neither, One); vertical axis 10 to 60]
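Association in a two-way table can be examined by computing the distribution of the response (student use) within each level of the explanatory variable (parental use). The sketch below copies the counts from the slide and computes those column percentages. The variable names are hypothetical.

```python
# counts[student_use][parental_use], copied from the table on the slide.
counts = {
    "Never":      {"Both": 17, "Neither": 141, "One": 68},
    "Occasional": {"Both": 11, "Neither": 54,  "One": 44},
    "Regular":    {"Both": 19, "Neither": 40,  "One": 51},
}
parental_levels = ["Both", "Neither", "One"]

# Total number of students in each parental-use group (column totals).
column_totals = {p: sum(row[p] for row in counts.values()) for p in parental_levels}

# Percent of students at each use level, within each parental-use group.
column_percent = {
    student: {p: 100 * row[p] / column_totals[p] for p in parental_levels}
    for student, row in counts.items()
}

for student, row in column_percent.items():
    cells = "  ".join(f"{p}: {row[p]:5.1f}%" for p in parental_levels)
    print(f"{student:<11} {cells}")
```

If the three columns of percentages differ noticeably, the distribution of student use differs across parental-use groups, which is the definition of association used in these slides.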

Page 53:

One quantitative, One qualitative

[Figure, two panels: "Box plot of age by low birth weight" (age on the vertical axis, 10 to 50, for lbw groups 0 and 1) and "Mean age by low birth weight" (bars labeled 22.31 and 23.66; categories yes, no)]

Page 54:

Trivariate Relationships

• An extension of bivariate descriptive statistics
• We focus on description that helps us decide about the role variables might play in the ultimate statistical analyses
• Identify variables that can increase the precision of the data analysis used to assess associations between two other variables

Page 55:

Confounding and effect modification

• A factor, Z, is said to confound a relationship between a risk factor, X, and an outcome, Y, if it is not an effect modifier and the unadjusted strength of the relationship between X and Y differs from the common strength of the relationship between X and Y for each level of Z.
• A factor, Z, is said to be an effect modifier of a relationship between a risk factor, X, and an outcome measure, Y, if the strength of the relationship between the risk factor, X, and the outcome, Y, varies among the levels of Z.
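The two definitions can be made concrete with a small stratified example. All counts below are hypothetical, chosen so that the X-Y relationship (measured here by a risk ratio, one possible measure of "strength") is the same within each level of Z but different when the levels are pooled, which is the confounding pattern described above.

```python
# Hypothetical counts: strata[level of Z][exposure group] = (cases, group size).
strata = {
    "Z=0": {"exposed": (10, 100), "unexposed": (40, 400)},
    "Z=1": {"exposed": (120, 400), "unexposed": (30, 100)},
}

def risk_ratio(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    (cases_e, n_e), (cases_u, n_u) = exposed, unexposed
    return (cases_e / n_e) / (cases_u / n_u)

# Strength of the X-Y relationship within each level of Z.
stratum_rr = {z: risk_ratio(g["exposed"], g["unexposed"]) for z, g in strata.items()}

def pooled(group):
    """Add cases and group sizes over the levels of Z (ignoring Z)."""
    cases = sum(g[group][0] for g in strata.values())
    size = sum(g[group][1] for g in strata.values())
    return cases, size

# Unadjusted (crude) strength of the X-Y relationship.
crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))

print("stratum-specific risk ratios:", stratum_rr)
print("crude risk ratio:", round(crude_rr, 2))
```

Here the stratum-specific ratios agree with each other and the crude ratio differs from them, so Z acts as a confounder. If the stratum-specific ratios had differed from each other, Z would instead be an effect modifier.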

Page 56:

Example: confounding

• In our low birth weight data, suppose we wish to investigate the association between race and low birth weight.
• Our ability to detect this association might be affected by:
  • Smoking status being associated with low birth weight
  • Smoking status being associated with race

Page 57:

Multiple Models

• Allow one to calculate the association between an explanatory variable and an outcome of interest, after controlling for potential confounders.
• Allow one to assess the association between an outcome and multiple explanatory variables of interest.

Page 58:

Time Sequence Plots

• A time series or time sequence is a data set in which the observations are recorded in the order in which they occur.
• A time series plot is a graph in which the vertical axis denotes the observed value of the variable (say x) and the horizontal axis denotes the time (which could be minutes, days, years, etc.).
• When measurements are plotted as a time series, we often see
  • trends,
  • cycles, or
  • other broad features of the data

Page 59:

Time Sequence Plots

[Figure: Company sales by year (a) and by quarter (b)]

Page 60:

Tests comparing difference between 2 or more groups

Test                             Dependent variable                   Independent variable
Paired (dependent t-test)        Interval/ratio pre and post tests    Nominal
Unpaired (independent t-test)    Interval/ratio                       Nominal (2 groups)
ANOVA F-test                     Interval/ratio                       Nominal (>2 groups)
Chi-Square (nonparametric)       Nominal (dichotomous)                Nominal

Page 61:

Tests demonstrating association between two groups

Test                               Dependent var.    Independent var.
Spearman rho                       Ordinal           Ordinal
Mann-Whitney U (non-parametric)    Ordinal           Nominal
Pearson's r                        Interval/ratio    Interval/ratio

Page 62:

Tests demonstrating association between two groups, controlling for third variable

Test                   Dependent         Independent
Logistic regression    Nominal           Nominal
Linear regression      Interval/ratio    Interval/ratio
Pearson partial r      Interval/ratio    Interval/ratio
Kendall's partial r    Ordinal           Ordinal