STATISTICAL ANALYSIS TRAINING COURSE
This course was delivered at the Municipality of Dubai by Data Analyst Marwa Abo-Amra in November 2014.
WHY WE USE STATISTICS & STATISTICAL ANALYSIS
Raw data do not make sense on their own; to interpret them, we have to resort to statistics.
IN TODAY’S WORLD…
• Customer Surveys
• Medical News
• Demographics
• Political Polls
• Economic Predictions
• Marketing Information
• Sales Forecasts
• Stock Market Projections
• Consumer Price Index
• Sports Statistics
COURSE CONTENTS
• Introduction to Statistics
• Statistical Concepts
• Data Collection Methods
• Statistical Analysis
• Introduction to Statistical Software: SPSS and Minitab
• Data Preparation and Analysis using MS Excel, SPSS, & Minitab
• Statistical Report writing
• Practical Examples and Applications
COURSE LEARNING OBJECTIVES
Those who work in any field should understand the importance of data and how to handle it, know the scientific methods for extracting the indicators needed for decision-making, and be able to help decision makers identify the available alternatives and evaluate them.
INTRODUCTION TO STATISTICS
DEFINITION OF STATISTICS
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.
STAGES OF STATISTICS
“Statistics is a way to get information from data.”
Collect → Organize → Present → Analyze → Interpret
• Data Preparation
• Descriptive Analysis
• Inferential Analysis
• Make Valid Conclusions & Take Correct Decisions
IMPORTANCE OF STATISTICS
• Simplifies complex data
• Provides a technique of comparison
• Helps in formulating policies
• Helps to test & formulate theories
• Presents the facts in a definite form
• Studies relationships
• Helps in forecasting
STATISTICAL CONCEPTS
SOME BASIC CONCEPTS
• Population
• Sample & Sampling
• Parameter & Statistic
• Elementary Units & Variables
VARIABLES
By role
• Independent (Cause)
• Dependent (Effect)
By type
• Qualitative (Descriptive Information): Sex, Nationality, Type of crime, Occupation, Religion, Marital status, Literacy, etc.
• Quantitative, Discrete (Counted): Number of children in a family, Number of accidents per week, Students' marks, etc.
• Quantitative, Continuous (Measured): Age, Distance, Height, Weight, etc.
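VARIABLES & UNITS
Selected characteristics of all the full-time employees of Mountain Aviation, Inc.: July 1, 1994
Employee | Race | Sex | Job Title | Years of Service | Annual Salary ($)
Abel | Caucasian | Male | Pilot | 2 | 34,000
Cruz | Caucasian | Male | Chief mechanic | 10 | 60,000
Dunn | Western | Male | Chief pilot | 23 | 70,000
Hill | Western | Female | Secretary | 5 | 14,000
King | Caucasian | Male | Janitor | 8 | 17,000
Otis | Caucasian | Male | Groundskeeper | 10 | 20,000
West | Western | Male | Mechanic | 2 | 36,000
Wolf | Caucasian | Female | Pilot | 7 | 36,000
Zorn | Caucasian | Female | Mechanic | 7 | 40,000
Labels on the table:
• Frame: the list of employees
• Elementary unit: a single employee (one row)
• Qualitative variable: e.g., Race, Sex, Job Title
• Quantitative variable: e.g., Years of Service, Annual Salary
• Datum: a single recorded value (one cell)
• Population of employee sexes: all values in the Sex column
• Population of employee salaries: all values in the Annual Salary column
• Sample of employee salaries: a subset of the Annual Salary column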
QUANTITATIVE VARIABLES
Discrete | Continuous
Measurement Level (from weakest to strongest)
• Nominal: attributes are only named; no order
• Ordinal: attributes can be rank-ordered; distance is not meaningful
• Interval: distance is meaningful
• Ratio: absolute zero
POPULATION VS. SAMPLING
• Target Population → (Sampling) → Sample
• Sample → (Inference) → Population
• Parameter: something that describes the population as a whole
• Statistic: something that describes the sample
• Elementary Unit
WHY SAMPLING?
• Reduced cost
• Time factor
• Very large populations
• Partly accessible populations
• Destructive units
• Accuracy
SIMPLE RANDOM SAMPLING
For the sampling plan to be statistically valid, the sample must be randomly selected from the population.
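As a minimal sketch of simple random sampling, the Python snippet below draws a sample in which every unit in the frame has the same chance of selection. The frame reuses the employee names from the Mountain Aviation table; the function name, sample size, and seed are assumptions made for illustration.

```python
import random

# Sampling frame: the employee names from the Mountain Aviation table.
frame = ["Abel", "Cruz", "Dunn", "Hill", "King", "Otis", "West", "Wolf", "Zorn"]

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(units, n)  # sampling without replacement

sample = simple_random_sample(frame, 3, seed=2014)
print(sample)
```

Sampling without replacement is used here so that no employee can appear twice in the sample.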
DATA COLLECTION & PREPARATION
WHAT IS DATA?
Data is a collection of facts, such as values or measurements. Examples of data include prices, exam scores, export and import figures, labor figures, and so on.
DATA VS. INFORMATION VS. STATISTICS
Data | Information | Statistics
20 kg, 25 kg | 5 individuals in the 20-to-25-kg range | Mean weight = 22.5 kg
28 kg, 30 kg, etc. | 15 individuals in the 26-to-30-kg range | Median weight = 28 kg
SOURCES OF DATA
• Internal: organization internal records
• External: published and unpublished external sources
DATA COLLECTION METHODS
• Census
• Sample Survey
• Experiment (Control)
• Observational Study (No Control)
OTHER DATA COLLECTION METHODS
• Interviews (Face-to-face, Telephone)
• Focus Groups
• Ethnographies, Oral History, & Case Studies
• Schedule through enumerators
• Documents & Records
COMPARISON: QUESTIONNAIRE VS. SCHEDULE
Questionnaire | Schedule
It is not confirmed whether the expected respondent has filled in the answers | Respondent identity is known
Very slow | Information collected in time
No personal contact | Direct personal contact
Used only when respondent is educated & cooperative | Information can be collected from illiterate people
Wider distribution of samples possible | Difficult for wider distribution
More incomplete and false information | Relatively more complete and correct information
Information validity depends on quality of questionnaire | Information validity depends on honesty & competence of enumerator
Questionnaire physical appearance should be attractive | Schedule physical appearance not necessary
Observation cannot be used | Observation can be used by enumerator
FRAMING A QUESTIONNAIRE OR A SCHEDULE
Cover Letter
Number of Questions
Nature of Questions
Questions should be simple
Arrangement of Questions
Information collected is usable
Avoid mathematical questions
TYPES OF QUESTIONS
• Closed-ended questions: Yes/No, Multiple Choice, Scaled (e.g., Likert)
• Open-ended questions: no predefined options or categories
• Matrix questions: closed-ended but arranged one under the other
• Contingency questions: answered only when the respondent provides a particular response
LIKERT SCALE
9-point Scale
• 1 (Disagree) to 9 (Agree)
5-point Scale
• 1 = Not at all Satisfied
• 2 = Not Satisfied
• 3 = Neutral
• 4 = Satisfied
• 5 = Very Satisfied
7-point Scale
• 1 = Entirely Disagree
• 2 = Mostly Disagree
• 3 = Somewhat Disagree
• 4 = Neither Agree nor Disagree
• 5 = Somewhat Agree
• 6 = Mostly Agree
• 7 = Entirely Agree
SEMANTIC DIFFERENTIAL SCALE
RANK-ORDER SCALE
STATISTICAL ANALYSIS
STATISTICAL ANALYSIS METHODS & TECHNIQUES
Descriptive Statistics
Tables & Graphs
Summary Measures
Inferential Statistics
Confidence Intervals
Hypothesis Testing
DATA ANALYSIS
Univariate
- Performed to study a single variable
- Descriptive statistics
• Tables & Graphs
• Summary Measures
• One sample tests
• Normality Tests
• Normal Probability Plot
Bivariate
- Performed to study the relationship between two variables
- Statistical Analysis
• Comparisons
• Contingency Tables
• Scatter plot
• Correlation
• Regression
DESCRIPTIVE STATISTICS
Tables & Graphs
• Frequency Table
• Frequency Histogram
• Bar & Column Chart
• Time-series Line Graph
• Pie Chart
• Stem-&-leaf Diagram
• Boxplot
Summary Measures
• Measures of Location
• Measures of Variability
• Measures of Shape
• Proportion
SUMMARY MEASURES
Measures of Central Tendency (Location)
• Mean (Average)
• Median (Middle Value)
• Mode (Most Frequent Value)
Measures of Dispersion (Spread)
• Range (Highest Value - Lowest Value)
• Variance
• Standard Deviation
• Coefficient of Variation
Measures of Shape
• Skewness
• Kurtosis
Proportion
• Frequency of observations in a particular category as a fraction of all observations
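The location, dispersion, and proportion measures listed above can be sketched with Python's standard `statistics` module. The salaries and sexes are taken from the Mountain Aviation table; the function name `summarize` and the choice of the sample (n - 1) formulas for variance and standard deviation are assumptions made for illustration. Shape measures are left out because the module does not provide them.

```python
import statistics

# Annual salaries and sexes from the Mountain Aviation employee table.
salaries = [34000, 60000, 70000, 14000, 17000, 20000, 36000, 36000, 40000]
sexes = ["Male", "Male", "Male", "Female", "Male", "Male", "Male", "Female", "Female"]

def summarize(values):
    """Return common measures of location and dispersion for a numeric variable."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance
        "stdev": stdev,
        "cv_percent": 100 * stdev / mean,  # coefficient of variation
    }

summary = summarize(salaries)
proportion_female = sexes.count("Female") / len(sexes)

for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
print(f"proportion female: {proportion_female:.3f}")
```

The coefficient of variation expresses the standard deviation as a percentage of the mean, which makes the spread of variables measured in different units comparable.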
TABLES & GRAPHS
Frequency Table, Frequency Histogram, Time-series Line Graph, Pie Chart, Stem-&-Leaf Diagram, Boxplot
FREQUENCY TABLE
HISTOGRAM VS. BAR CHART
TIME-SERIES LINE GRAPH
PIE CHART
STEM-AND-LEAF DIAGRAM
BOXPLOT
DATA REPRESENTATION SUMMARY
Variable | Associated Data Type | Graphically Represented By | Summary Measures: Central Tendency | Summary Measures: Dispersion
Qualitative (Categorical) | Nominal | Pie Chart; Column or Bar Chart | Mode | Range
Qualitative (Categorical) | Ordinal | Column or Bar Chart | Mode; Median | Range
Quantitative (Scale) | Interval & Ratio | Frequency Histogram; Boxplot; Stem-&-leaf Diagram; Time-series Line Graph | Mode; Median; Mean | Range; Variance; Standard Deviation; Coefficient of Variation
Effective representation of data depends on choosing the proper graphical tool and summary measures.
NORMAL DISTRIBUTION
CENTRAL LIMIT THEOREM
The central limit theorem, in its shortest form, states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution (provided the population has a finite variance).
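A small simulation can illustrate the theorem. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1), draws many samples, and looks at how the sample means behave; the sample sizes, number of repeats, and seed are arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(sample_size, repeats, rng):
    """Return the means of `repeats` random samples drawn from a skewed population."""
    means = []
    for _ in range(repeats):
        # expovariate(1.0) is right-skewed, with mean 1 and standard deviation 1
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(42)  # fixed seed so the run is reproducible
means_small = sample_means(5, 2000, rng)
means_large = sample_means(50, 2000, rng)

center_large = statistics.fmean(means_large)
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

# For a normal distribution, about 68% of values lie within one standard deviation.
within_one_sd = sum(
    abs(m - center_large) <= spread_large for m in means_large
) / len(means_large)

print(f"n=5:  spread of sample means = {spread_small:.3f}")
print(f"n=50: spread of sample means = {spread_large:.3f}")
print(f"n=50: share of means within one SD = {within_one_sd:.3f}")
```

With the larger sample size, the sample means cluster around the population mean with a spread close to 1/sqrt(50), and the share within one standard deviation is near the value expected for a normal distribution, even though the population itself is far from normal.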
TEST OF NORMALITY
Since a number of the most common statistical tests rely on the normality of a sample or population, it is often useful to test whether the underlying distribution is normal, or at least symmetric. This can be done via the following approaches:
• Review the distribution graphically (histograms & normal probability plots)
• Analyze the skewness & kurtosis
• Employ statistical tests (Chi-square, Kolmogorov-Smirnov, & Shapiro-Wilk)
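The skewness and kurtosis check can be sketched in a few lines of Python. The functions below use the simple moment-based (population) formulas, which is an assumption for illustration; statistical packages may report bias-adjusted versions that differ slightly for small samples. The two data sets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric data set."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 4 for x in values) / len(values) - 3

symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail

print(f"symmetric:    skewness = {skewness(symmetric):.3f}, "
      f"excess kurtosis = {excess_kurtosis(symmetric):.3f}")
print(f"right-skewed: skewness = {skewness(right_skewed):.3f}")
```

Skewness far from 0 signals an asymmetric distribution, and excess kurtosis far from 0 signals tails that are heavier or lighter than those of a normal distribution.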
INFERENTIAL STATISTICS
• Confidence Interval
• Hypothesis Testing
STATISTICAL TESTS
Independent Samples, Parametric Tests
• 1 Sample (t-test)
• 2 Samples (t-test)
• K Samples (ANOVA)
Independent Samples, Nonparametric Tests
• 1 Sample (Binomial, Chi-square, Kolmogorov-Smirnov)
• 2 Samples (Mann-Whitney)
• K Samples (Kruskal-Wallis)
Related Samples, Parametric Tests
• 2 Samples (t-test)
• K Samples (Repeated Measures ANOVA)
Related Samples, Nonparametric Tests
• 2 Samples (Wilcoxon)
• K Samples (Friedman)
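As one worked instance from the list above, the one-sample t-test compares a sample mean with a hypothesized population mean using t = (mean - hypothesized mean) / (s / sqrt(n)). The sketch below computes only the test statistic and degrees of freedom; the measurements and the hypothesized mean of 4 are made up for illustration, and the p-value would still need a t table or a package such as SPSS or Minitab.

```python
import math
import statistics

def one_sample_t(values, hypothesized_mean):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(values)
    # standard error of the mean, using the sample standard deviation
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.fmean(values) - hypothesized_mean) / standard_error
    return t_stat, n - 1

# Hypothetical measurements, tested against an assumed population mean of 4.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat, df = one_sample_t(measurements, 4)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

A t statistic that is large in absolute value, relative to the critical value for the given degrees of freedom, is evidence against the hypothesized mean.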
CONTINGENCY TABLES
A two-way table is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies.
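A two-way table can be built by counting how often each pair of categories occurs. The sketch below cross-tabulates Race by Sex for the nine Mountain Aviation employees; the function names are assumptions made for illustration.

```python
from collections import Counter

# (Race, Sex) for each employee in the Mountain Aviation table.
records = [
    ("Caucasian", "Male"), ("Caucasian", "Male"), ("Western", "Male"),
    ("Western", "Female"), ("Caucasian", "Male"), ("Caucasian", "Male"),
    ("Western", "Male"), ("Caucasian", "Female"), ("Caucasian", "Female"),
]

def two_way_table(pairs):
    """Cross-tabulate frequency counts: rows are first values, columns are second."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}

def relative_frequencies(table):
    """Convert each cell count into a fraction of the grand total."""
    total = sum(sum(row.values()) for row in table.values())
    return {r: {c: n / total for c, n in row.items()} for r, row in table.items()}

table = two_way_table(records)
shares = relative_frequencies(table)

for race, row in table.items():
    print(race, row)
```

The same table expressed as relative frequencies (`shares`) makes groups of different sizes easier to compare.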
SCATTER PLOT
A Scatter (XY) Plot has points that show the relationship between two sets of data.
In this example, each dot shows one person's weight versus their height.
Positive Association | Negative Association | No Association
CORRELATION
When two sets of data are strongly linked together we say they have a High Correlation.
• Correlation is Positive when the values increase together, and
• Correlation is Negative when one value decreases as the other increases
Correlation can have a value:
• 1 is a perfect positive correlation
• 0 is no correlation (the values don't seem linked at all)
• -1 is a perfect negative correlation
The value shows how good the correlation is (not how steep the line is), and if it is positive or negative.
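The correlation value described here is usually the Pearson correlation coefficient. The sketch below computes it from scratch; the height and weight figures are hypothetical, and the last lines illustrate that the coefficient measures how tightly the points follow a line, not how steep that line is.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

# Hypothetical heights (cm) and weights (kg) for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 88]
r_height_weight = pearson_r(heights_cm, weights_kg)
print(f"height vs. weight: r = {r_height_weight:.3f}")

# A gentle and a steep perfect line both give r = 1; a falling line gives r = -1.
r_gentle = pearson_r(heights_cm, [2 * h + 1 for h in heights_cm])
r_steep = pearson_r(heights_cm, [10 * h + 1 for h in heights_cm])
r_falling = pearson_r(heights_cm, [-3 * h for h in heights_cm])
```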
LINEAR REGRESSION
In a cause and effect relationship, the independent variable is the cause, and the dependent variable is the effect.
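Simple linear regression fits a straight line that predicts the dependent variable from the independent variable. The sketch below uses the ordinary least-squares formulas; the data (years of experience as the cause, a test score as the effect) and the function names are hypothetical and serve only as an illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y (effect) from x (cause)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x

# Hypothetical data: independent variable (x) and dependent variable (y).
years_experience = [1, 2, 3, 4, 5]
test_score = [2.9, 5.1, 7.0, 9.2, 10.8]

slope, intercept = fit_line(years_experience, test_score)
print(f"score = {intercept:.2f} + {slope:.2f} * years")
print(f"predicted score at 6 years: {predict(6, slope, intercept):.2f}")
```

The slope estimates how much the dependent variable changes for a one-unit increase in the independent variable.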
SPSS AND MINITAB
MS EXCEL
STATISTICAL PACKAGE FOR SOCIAL SCIENCES (SPSS)
MINITAB
DATA PREPARATION AND ANALYSIS USING MS EXCEL, SPSS, & MINITAB
DATA PREPARATION
• Collect Data
• Enter the Data into Computer
• Check for Data Accuracy
• Develop a Database Structure
WHAT NEXT…?
• Data ready in Excel
• Import Data to SPSS or Minitab
• Explore data using Descriptive Statistics
• Perform Hypothesis Testing if needed
• Write your report
STATISTICAL REPORT WRITING
HOW TO WRITE A RESEARCH REPORT…?
Cover Page
• Title
• Date
• Researcher Name
• Supervisor Name
Contents & Executive Summary
Introduction
• Data Source
• Data Analysis Technique
• Main Research Aim
Data Analysis
• Descriptive
• Research Questions
Conclusion
TAILORED REPORTS
• Sales
• Real Estate
• Marketing
• NPS
• Other
THANK YOU
Have a successful career and a wonderful life full of joy, productivity, & happiness
Marwa Abo-Amra
Email: [email protected]
analystamra.blogspot.com