ASYRAS 2017 Barcelona 1 Llorenç Badiella & Oliver Valero
STATISTICS SHORT COURSE Servei d’Estadística Aplicada UAB
Llorenç Badiella Oliver Valero Servei d'Estadística Aplicada Universitat Autònoma de Barcelona www.uab.cat/s-estadistica [email protected]
Statistics short course
Contents
1) Evidence:
• Relevance
• Validity
• Reliability
2) Study Design
• Objective and Hypothesis
• Experimental units
• Variables
3) Statistical Methodology – Study Case
• Summary measures
• Baseline comparisons and Bivariate Analyses
• Statistical models
Statistics
1) Evidence
To obtain solid conclusions from an experiment or survey, three conditions are needed:
1) Reliability (Sample Size)
2) Validity (Study Design)
3) Relevance (Conclusions)
Low Reliability: Imprecise conclusions
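Reliability is linked to sample size because the precision of an estimate grows with the square root of n. A minimal Python sketch, assuming a Normal approximation and a known standard deviation (both the SD of 10 and the sample sizes are arbitrary illustration values, not course data), shows how the 95% confidence interval for a mean narrows as n grows:

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd: float, n: int, level: float = 0.95) -> float:
    """Half-width of a Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return z * sd / sqrt(n)


# Illustrative values only: same variability (sd = 10), growing sample size.
for n in (10, 40, 160, 640):
    print(f"n = {n:4d}: mean ± {ci_half_width(sd=10.0, n=n):.2f}")
```

Each fourfold increase in n halves the width of the interval, which is why reliability is planned through the sample size.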
Low validity: bad experimental design, unrealistic conclusions
Doubtful Relevance: Conclusions with no value
2) Study Design Objective
Types of Studies by objective
o Descriptive
o Comparative
o Predictive
o Classification
Objective: The purpose toward which an endeavor is directed.
o Demonstrate Murphy’s law (Anything that can go wrong will go wrong)
o Explore if Scots (or Catalans) are frugal
o Investigate if a pleasant working environment (relaxing music) improves productivity
o Predict whether a patient hospitalized due to a heart attack will have a second heart attack (based on demographic, diet and clinical measurements for that patient)
o Perform a segmentation of customers (based on products purchased)
Hypothesis: A quantifiable and testable statement involving population traits.
Descriptive: Scots are cheap
o The majority of Scots do not leave large tips at restaurants
o Less than 20% of Scots leave tips larger than 15% at restaurants
Comparative: Investigate the effect of relaxing music on productivity (in a clothes shop)
o Relaxing music, compared to a silent environment, improves sales by 10%
Predictive: Predict whether a patient will have a second heart attack
o Are the following variables (gender, age, social status, diet, clinical measurements, complaints, …) related to the occurrence of a second heart attack?
Classification: Perform a segmentation of the customers
o Do the following variables (gender, age, social status, loyalty, profitability, …) help to define a set of customer clusters?
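A descriptive hypothesis such as "less than 20% of Scots leave tips larger than 15%" can be tested against a sample. A minimal sketch of an exact one-sided binomial test using only the Python standard library; the sample (12 large tippers out of 100 respondents) is hypothetical and invented only for illustration:

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binomial_test_less(successes: int, n: int, p0: float) -> float:
    """Exact one-sided p-value for H0: p = p0 against H1: p < p0."""
    return binom_cdf(successes, n, p0)


# Hypothetical survey: 12 of 100 respondents left a tip larger than 15%.
p_value = binomial_test_less(successes=12, n=100, p0=0.20)
print(f"one-sided p-value = {p_value:.4f}")
```

A small p-value supports the statement that the population proportion is below 20%; it says nothing about whether the surveyed people represent the population, which is a validity question.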
2) Study Design Experimental units
Experimental Units: A unit in a statistical analysis refers to one member of a set of entities being independently studied (patients, subjects, animals, emails, images, pixels, etc.).
The extrapolation of results (prediction) using statistical tools, based on the analysis of a sample of units, is only valid for the population those units come from.
Variables (features, measures, inputs, outputs) will be evaluated on each experimental unit.
2) Study Design Variables
Types of variables (based on their nature):
Type          Variable     Values                                Examples
Qualitative   Nominal      A, B, C                               Centre, Region, Treatment, Political party
              Binary       Y, N                                  Sex, Cure, Success
              Ordinal      None, Mild, Moderate, High, Severe    Improvement, Satisfaction, Severity
              Ordinal      I, II, III, IV                        Improvement, Satisfaction, Severity
Quantitative  Discrete     0, 1, 2, 3, …                         Nº Events, Nº Products
              Continuous   R                                     Age, Revenue, Income
              Time         R+                                    Time to some event occurrence
              Rate         R+                                    Success rate
2) Study Design Variables
Types of variables (based on their role):
Primary Response Variable (target variable, primary endpoint): a single variable capable of providing the most relevant and convincing evidence directly related to the primary objective of the trial. This will usually be an efficacy (or safety) variable.
Secondary Response Variables: supportive measurements related to the primary objective, or measurements of effects related to the secondary objectives.
Explanatory Variables: variables, usually individual characteristics, used to explain and predict the response variables.
2) Study Design Examples
We want to measure students' opinion about a course they have just taken. Questionnaires are handed out to the students and filled in voluntarily and anonymously.
The results were: Global average satisfaction (0-10): 4.25 (n=31)
Is it possible to conclude that students are dissatisfied?
2) Study Design Examples
We want to evaluate the efficacy of a course for a group of unemployed people (n = 504). Registration for the course is voluntary. After 6 months, the efficacy of the course is evaluated by comparing the percentage of people still unemployed among those who attended and those who did not. The results were:
Attended the course (n=120): 23% are still unemployed
Did not attend the course (n=384): 64% are still unemployed
(p<0.001)
Is it possible to conclude that the course was effective?
2) Study Design Examples
An experimental study was carried out to evaluate the efficacy of a reinforcement course for students with learning difficulties. After a certain period of time, a substantial improvement was detected in the students who attended all sessions.
However, only 40% of the students completed the course, because it involved a huge workload.
What would be the conclusions of the study?
2) Study Design Design (Comparative Studies)
Randomized Controlled Study with Parallel Arms
[Diagram: Included subjects → Random allocation → Control treatment / Experimental treatment → Response variable measured in each arm]
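The random allocation step can be sketched in Python. This minimal example shuffles a list of subject IDs and splits it into two arms of equal size (one extra subject goes to the experimental arm if n is odd); the IDs, arm labels and seed are hypothetical, and the fixed seed is used only so that the allocation list can be reproduced:

```python
import random


def allocate(subject_ids: list[str], seed: int = 2017) -> dict[str, str]:
    """Randomly split subjects into two arms of (near-)equal size."""
    rng = random.Random(seed)   # fixed seed -> reproducible allocation list
    shuffled = subject_ids[:]   # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    arms = {sid: "Control" for sid in shuffled[:half]}
    arms.update({sid: "Experimental" for sid in shuffled[half:]})
    return arms


subjects = [f"S{i:02d}" for i in range(1, 11)]  # hypothetical subject IDs
allocation = allocate(subjects)
for sid in subjects:
    print(sid, allocation[sid])
```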
Cross-Over 2x2 - Randomized Controlled Study
[Diagram: Included subjects → Random allocation → Sequence 1: Control treatment (Period 1), then Experimental treatment (Period 2); Sequence 2: Experimental treatment (Period 1), then Control treatment (Period 2) → Response variable]
Observational Study - Cohorts
[Diagram: Included subjects → Risk factor: Unexposed cohort / Exposed cohort → Response variable measured in each cohort]
Observational Study: Cases and Controls (Binary Response Variables)
[Diagram: Included subjects → Response variable: Positive (Cases) / Negative (Controls) → Risk factor (Exposed / Unexposed) assessed in each group]
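The two observational designs support different effect measures. In a cohort study the risk of the response can be estimated in each cohort, so a relative risk is meaningful; in a case-control study the proportion of cases is fixed by the design, so the odds ratio is the usual measure. A minimal sketch with hypothetical 2x2 counts (the function and argument names are illustrative):

```python
def relative_risk(exp_pos: int, exp_neg: int, unexp_pos: int, unexp_neg: int) -> float:
    """Risk in the exposed cohort divided by risk in the unexposed cohort."""
    risk_exposed = exp_pos / (exp_pos + exp_neg)
    risk_unexposed = unexp_pos / (unexp_pos + unexp_neg)
    return risk_exposed / risk_unexposed


def odds_ratio(exp_pos: int, exp_neg: int, unexp_pos: int, unexp_neg: int) -> float:
    """Cross-product ratio (a*d)/(b*c) of a 2x2 exposure-by-response table."""
    return (exp_pos * unexp_neg) / (exp_neg * unexp_pos)


# Hypothetical counts: exposed (30 positive, 70 negative), unexposed (10, 90).
print("Relative risk (cohort):", round(relative_risk(30, 70, 10, 90), 3))
print("Odds ratio (case-control):", round(odds_ratio(30, 70, 10, 90), 3))
```

When the response is rare, the odds ratio is close to the relative risk; with a common response, as in these hypothetical counts, the two measures differ noticeably.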
3) Statistical methodology
Statistical techniques
o Descriptive
o Univariate
o Bivariate
o Multivariate
Statistical methods
o Validation
o Summarization and visualization
o Bivariate Baseline Analysis: Group vs Explanatory
o Bivariate Analysis: Response vs Explanatory
o (Multivariate) Primary analysis
o (Multivariate) Secondary analysis
Validation
Ensure the validity of the data
o Missing values
o Data entry errors
o Outliers
o Inconsistencies between variables
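Part of this validation can be automated. A minimal Python sketch, assuming a single numeric variable stored as a list in which None marks a missing value; the 0-10 valid range and the 1.5 x IQR outlier rule are common conventions chosen for illustration, not rules stated in the course:

```python
from statistics import quantiles


def validate(values, valid_range=(0.0, 10.0), k=1.5):
    """Return the indices of missing, out-of-range and outlying values."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]
    lo, hi = valid_range
    # Values outside the admissible range usually point to data entry errors.
    out_of_range = [i for i, v in present if not lo <= v <= hi]
    # Tukey-style fences: below Q1 - k*IQR or above Q3 + k*IQR.
    q1, _, q3 = quantiles([v for _, v in present], n=4)
    iqr = q3 - q1
    outliers = [i for i, v in present if v < q1 - k * iqr or v > q3 + k * iqr]
    return {"missing": missing, "out_of_range": out_of_range, "outliers": outliers}


scores = [4.0, 5.5, None, 6.0, 5.0, 42.0, 4.5, 5.5]  # hypothetical entries
print(validate(scores))
```

Flagged values still need to be checked against the source records; an outlier is not automatically an error.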
Summarization and visualization
o Description of the target population
o Description of experimental groups
o Description of main results
Bivariate Baseline Analysis: Group vs Explanatory
o Ensure the comparability between experimental groups
o Identify potential confounders (collinearity analysis)
Bivariate Analysis: Response vs Explanatory
o Identify explanatory variables related to the response variable
(Multivariate) Primary analysis
o Confirm the primary experimental hypothesis
(Multivariate) Secondary analysis
o Analyse secondary response variables or objectives
o Consider alternative models for the primary analysis if the premises of the technique are not met
o Explore specific subgroups of the target population, evaluate the influence of outliers, analyse interactions between explanatory variables, etc.
3) Statistical methodology: Study Case
Database
3.1) Summary Statistics Summary measures
Quantitative variables:
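The usual summary measures for a quantitative variable can be computed with Python's standard `statistics` module. A minimal sketch; the sample is hypothetical, and the quartiles use the module's default ("exclusive") method, so they may differ slightly from those reported by other software:

```python
from statistics import mean, median, quantiles, stdev


def summarize(values: list[float]) -> dict:
    """Common summary measures for a quantitative variable."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (divisor n - 1)
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


sample = [12.0, 15.5, 9.0, 22.0, 17.5, 14.0, 11.0, 19.5]  # hypothetical scores
for name, value in summarize(sample).items():
    print(f"{name:>6}: {round(value, 2)}")
```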
Results:
Qualitative variables:
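For a qualitative variable the summary is a frequency table with counts and percentages. A minimal sketch with `collections.Counter`; the category labels and data are hypothetical:

```python
from collections import Counter


def frequency_table(values: list[str]) -> list[tuple[str, int, float]]:
    """Absolute and relative (%) frequency of each category, most common first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]


sex = ["Boy", "Girl", "Girl", "Boy", "Boy", "Girl", "Boy"]  # hypothetical data
for category, n, pct in frequency_table(sex):
    print(f"{category:<6} {n:3d} {pct:5.1f}%")
```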
Results:
3.2) Baseline Analysis
Comparison of a Quantitative Variable between Independent Groups
Note: in the presence of repeated measurements on the same individuals, mixed models should be considered.
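The Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test) used in the baseline table compares two independent groups without assuming Normality. A minimal standard-library sketch using the large-sample Normal approximation with a tie correction and no continuity correction; statistical packages may use exact p-values for small samples, so results can differ slightly. The data are hypothetical:

```python
from math import erfc, sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the rank-sum test of x vs y."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])  # rank sum of the first group
    expected = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t**3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (w - expected) / sqrt(variance)
    p = erfc(abs(z) / sqrt(2))  # two-sided Normal p-value
    return z, p


# Hypothetical pre-test scores for two independent groups.
group_a = [15.0, 18.5, 12.0, 20.0, 16.5, 14.0]
group_b = [11.0, 13.5, 12.0, 10.0, 15.0, 9.5]
z, p = wilcoxon_rank_sum(group_a, group_b)
print(f"z = {z:.3f}, p = {p:.3f}")
```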
Results:
Comparison of a Qualitative Variable between Independent Groups
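For a binary variable compared between two independent groups, the chi-square test works on a 2x2 table of counts. A minimal sketch of Pearson's chi-square test without continuity correction; with one degree of freedom the upper-tail probability equals erfc(sqrt(chi2 / 2)), so only the standard library is needed. The counts are hypothetical:

```python
from math import erfc, sqrt


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))  # P(chi-square with 1 df > stat)
    return stat, p_value


# Hypothetical counts: rows = group (A, B); columns = (Boy, Girl).
stat, p = chi_square_2x2([[60, 50], [45, 60]])
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Some packages apply Yates' continuity correction to 2x2 tables by default (for example SciPy's `chi2_contingency`), so their p-values can be somewhat larger.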
Results:
Variable                   Group     Effect       p-value   Test
Type of school (Science)   CLIL      70.1%        0.437     Chi Square
                           Control   65.7%
Sex (Boy)                  CLIL      55.5%        0.036     Chi Square
                           Control   42.9%
Extra (Yes)                CLIL      56.2%        0.840     Chi Square
                           Control   55.0%
Reading pre                CLIL      15.8 (7.4)   0.038     Wilcoxon
                           Control   14.4 (7.8)
Listening pre              CLIL      3.0 (1.5)    0.407     Wilcoxon
                           Control   2.9 (1.6)
Proficiency                CLIL      3.4 (1.4)    0.099     Wilcoxon
                           Control   3.2 (1.5)
3.3) Bivariate analysis
Comparison of two Quantitative Variables:
• Pearson correlation (r): degree of linear association, ranging from -1 to 1
• Slope: Y = a + bX
Note: in the absence of Normality, Spearman correlation should be used.
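These measures can be computed directly. A minimal sketch: Pearson's r from centred sums of products, Spearman's correlation as Pearson's r applied to the ranks (tied values receive their average rank), and the least-squares slope b = Sxy / Sxx with intercept a = mean(y) - b * mean(x). The data in the usage lines are hypothetical:

```python
from collections import Counter
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r (degree of linear association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    first_position: dict[float, int] = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_position.setdefault(v, pos)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def slope_intercept(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept a and slope b of Y = a + bX."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical paired measurements on the same units.
x = [3.0, 5.0, 2.0, 8.0, 6.0, 4.0]
y = [14.0, 11.0, 16.0, 6.0, 9.0, 12.0]
print("Pearson r:", round(pearson(x, y), 3))
print("Spearman rho:", round(spearman(x, y), 3))
print("a, b:", slope_intercept(x, y))
```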
Results:
Variable         Category        Effect       p-value   Test
Group            CLIL            10.2 (7.0)   0.208     Wilcoxon
                 Control         11.5 (7.5)
Type of school   Arts & crafts   10.8 (8.3)   0.757     Wilcoxon
                 Science         10.9 (6.7)
Sex              Boy             11.0 (7.0)   0.941     Wilcoxon
                 Girl            10.7 (7.6)
Extra            Yes             10.5 (7.1)   0.617     Wilcoxon
                 No              11.1 (7.4)
Reading pre                      -0.292       <0.001    Spearman
Listening pre                    -0.11        0.069     Spearman
Proficiency                      -0.23        <0.001    Spearman
Proficiency      High            7.4 (6.0)    <0.001    Kruskal-Wallis
                 Medium          11.7 (6.7)
                 Low             12.0 (7.8)
3.4) Statistical Models (I)
Y = a + b1 X1 + … + bk Xk + e
Model premises:
• Quantitative response variable
• All important explanatory variables are considered
• Independent observations
• Homogeneity of residual variance
• Residuals Normally distributed
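The coefficients of this model are the least-squares solution of the normal equations (X'X)b = X'y. A minimal standard-library sketch that solves them by Gauss-Jordan elimination and returns the residuals, which are what the homogeneity and Normality premises are checked on. It is illustrative only: statistical software (for example R's `lm` or the Python package statsmodels) uses numerically more stable methods and also reports standard errors and p-values. The data are hypothetical:

```python
def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares [a, b1, ..., bk] for Y = a + b1 X1 + ... + bk Xk + e."""
    rows = [[1.0, *xi] for xi in X]  # prepend a column of 1s for the intercept a
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [u - factor * v for u, v in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def residuals(X, y, beta):
    """Observed minus fitted values, used to check the model premises."""
    return [
        yi - (beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
        for xi, yi in zip(X, y)
    ]


# Hypothetical data: two explanatory variables per unit.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
y = [3.1, 6.2, 6.9, 10.1, 11.0, 14.2]
beta = ols(X, y)
print("a, b1, b2 =", [round(b, 3) for b in beta])
print("residuals:", [round(e, 3) for e in residuals(X, y, beta)])
```

Plotting the residuals against the fitted values (homogeneity of variance) and in a Normal Q-Q plot (Normality) is the usual way to check the premises listed above.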
Results:
Check for Multicollinearity:
• Correlations between explanatory variables
• Variance Inflation Factor (VIF) lower than 3
Model validation:
3.4) Statistical Models (II)
Interactions:
3.4) Other Statistical Models
If the model premises are not satisfied, or the response follows a distribution other than the Normal:
o Transformation
o Logistic regression (binary outcome)
o Poisson regression (counts)
If observations are not independent:
o Mixed models / Generalized mixed models
Thank You www.uab.es/s-estadistica