survey data & sampling

DATA: Muhammad Bilal, Muhammad Fahim, Syed Iqrar Hussain

Upload: syed-iqrar-hussain

Post on 07-Aug-2015


TRANSCRIPT

Page 1: Survey data & sampling

DATA

Muhammad Bilal
Muhammad Fahim
Syed Iqrar Hussain

Page 2: Survey data & sampling

How can we define “data”?
Terminologies
Types of data
Data collection
How to analyze and represent data
What is a sample, and what is sampling?
Terminologies in sampling
Types of sampling
How to calculate sample size

Outline

Page 3: Survey data & sampling

The word "data" is the plural of "datum", a Latin word that literally means "something given" (from the verb dare, "to give").

“Data is a collection of facts, such as values or measurements.”

“Data are measurements or observations that are collected as a source of information.”

It can be numbers, words, measurements, observations or even just descriptions of things.

How Can We Define Data....??

Page 4: Survey data & sampling

Data Unit
A data unit is one entity (such as a person or business) in the population being studied, about which data are collected. A data unit is also referred to as a unit record or record.

Data Item
A data item is a characteristic of a data unit which is measured or counted, such as height, country of birth, or income. A data item is also referred to as a variable because the characteristic may vary between data units, and may vary over time.

Terminologies

Page 5: Survey data & sampling

Observation
An observation is an occurrence of a specific data item that is recorded about a data unit. It may also be referred to as a datum, which is the singular form of data. An observation may be numeric or non-numeric.

Dataset
A dataset is a complete collection of all observations.

Terminologies

Page 6: Survey data & sampling

Example of Dataset
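As a rough illustration of the four terms above, the sketch below builds a tiny dataset in Python. All values and the field names (`unit_id`, `height_cm`, and so on) are hypothetical and are not taken from the slides.

```python
# Hypothetical dataset: each dictionary is one data unit (a person).
dataset = [
    {"unit_id": 1, "height_cm": 172.5, "country_of_birth": "Pakistan", "income": 42000},
    {"unit_id": 2, "height_cm": 160.0, "country_of_birth": "Kenya", "income": 38500},
    {"unit_id": 3, "height_cm": 181.2, "country_of_birth": "Brazil", "income": 51000},
]

# Data items (variables) are the characteristics recorded for every data unit.
data_items = [key for key in dataset[0] if key != "unit_id"]

# An observation is one recorded value of one data item for one data unit.
observation = dataset[1]["country_of_birth"]

# The dataset is the complete collection of observations.
n_observations = sum(len(data_items) for _ in dataset)

print(data_items)       # the three data items
print(observation)      # a non-numeric observation
print(n_observations)   # 3 data units x 3 data items
```

Here the numeric observations (height, income) and the non-numeric one (country of birth) sit side by side in the same data unit.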

Page 7: Survey data & sampling

There are two main types of data with respect to their characteristics:

Qualitative Data
Quantitative Data

Types of Data w.r.t Characteristics

Page 8: Survey data & sampling

“Data that is not given numerically.” It deals with description. It can be observed but not measured. Qualitative → Quality

Example: Favorite Color, Place of Birth, Favorite Food, Type of Car

Qualitative Data

Page 9: Survey data & sampling

It is given in numerical form. It deals with numbers. It can be measured. Quantitative → Quantity

Example: Length, Height, Area, Volume, Weight, Speed, Time, Temperature, Humidity, Sound Levels, Cost, Ages, etc.

Quantitative Data

Page 10: Survey data & sampling

Quantitative data can be divided into:
Discrete Data
Continuous Data

Discrete data is counted; continuous data is measured.

Cont’d

Page 11: Survey data & sampling

Discrete Data
Discrete data can only take certain values (like whole numbers).
Example: The number of students in a class (you can't have half a student).

Continuous Data
Continuous data is data that can take any value (within a range).
Example: A person's height could be any value (within the range of human heights), not just certain fixed heights.

Cont’d

Page 12: Survey data & sampling
Page 13: Survey data & sampling

Example

Page 14: Survey data & sampling

Univariate Data
It means "one variable" (one type of data).
Example: Travel Time (minutes): 15, 29, 8, 42, 35, 21, 18, 42, 26
The variable is Travel Time.
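With a single variable, the usual first step is to describe it. The sketch below summarizes the travel times from the example using Python's standard `statistics` module; the choice of summary measures is an assumption for illustration, not something the slides prescribe.

```python
import statistics

# Travel times in minutes, taken from the example above.
travel_time_minutes = [15, 29, 8, 42, 35, 21, 18, 42, 26]

# A univariate summary describes one variable on its own.
summary = {
    "count": len(travel_time_minutes),
    "min": min(travel_time_minutes),
    "max": max(travel_time_minutes),
    "mean": statistics.mean(travel_time_minutes),
    "median": statistics.median(travel_time_minutes),
    "mode": statistics.mode(travel_time_minutes),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```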

Types of Data w.r.t. Variables

Page 15: Survey data & sampling

Bivariate or Multivariate Data
It means "two or more variables". With bivariate or multivariate data you have two or more sets of related data that you want to compare.
Example: The two variables are Ice Cream Sales and Temperature.

Univariate Data                               Bivariate or Multivariate Data
Involving a single variable                   Involving two or more variables
Does not deal with causes or relationships    Deals with causes or relationships
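One common way to quantify the relationship between two variables is the Pearson correlation coefficient. The sketch below computes it by hand for the ice cream example; the temperature and sales figures are hypothetical, since the slides give no numbers, and a high correlation alone does not establish a cause.

```python
import math

# Hypothetical paired observations (not from the slides).
temperature_c = [14, 16, 19, 22, 25]
ice_cream_sales = [215, 325, 410, 445, 520]


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


r = pearson_correlation(temperature_c, ice_cream_sales)
print(f"correlation: {r:.3f}")
```

A value close to +1 suggests that sales tend to rise with temperature in this made-up sample.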

Page 16: Survey data & sampling

There are two main types of data with respect to data collection techniques:
Primary Data
Secondary Data

Types of Data w.r.t Collection Techniques

Page 17: Survey data & sampling

Primary Data
Primary data means original data that has been collected specially for the purpose in mind. It means someone collected the data first hand from the original source.

Examples: Questionnaires, Surveys, Experiments, Interviews.

Page 18: Survey data & sampling

Secondary data is data that has been collected for another purpose. When we apply statistical methods to primary data that was collected for a different purpose, we refer to it as Secondary Data.

Example: Books, Journals, Magazines, Newspapers, E-journals, General Websites, Web-blogs.

Secondary Data

Page 19: Survey data & sampling

Data
    Primary Data
        Quantitative Data
            Univariate Data
            Bivariate Data
        Qualitative Data
            Univariate Data
            Bivariate Data
    Secondary Data
        Quantitative Data
            Univariate Data
            Bivariate Data
        Qualitative Data
            Univariate Data
            Bivariate Data

Page 20: Survey data & sampling

Data Collection….???

“Data Collection is a process of obtaining useful information for a defined purpose from various sources.”

The issue is not: How do we collect data?
The issue is: How do we collect useful data?

Page 21: Survey data & sampling

Why Do We Collect Data…??

The purpose of data collection is:
To obtain information to keep on record
To make decisions about important issues
To pass information on to others

Page 22: Survey data & sampling

Data Collection Plan..??

“A document that defines all the details concerning data collection, including how much and what type of data is required and when and how it should be collected.”

Why do we want the data?
What purpose will they serve?
Where will we collect the data?
What type of data will we collect?
Who will collect the data?
How do we collect the right data?

Page 23: Survey data & sampling

How can we Collect Data...??

Tools used to collect data are:
Mail, Telephone, In-person and Web-based Surveys
Direct or Participatory Observation
Interviews
Focus Groups
Expert Opinion
Case Studies
Literature Search
Content Analysis of Internal and External Records

The data collection tools must be strong enough to support the findings of the evaluation.

Page 24: Survey data & sampling

Data Analysis
“Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.”

Page 25: Survey data & sampling

Bar Graphs
Pie Charts
Line Graphs
Scatter (x,y) Plots
Pictographs
Histograms
Frequency Distribution
Stem and Leaf Plots
Cumulative Tables and Graphs
Relative Frequency
Check Sheet

How To Analyze & Represent Data….???

Page 26: Survey data & sampling

Bar Graphs
A Bar Graph (also called Bar Chart) is a graphical display of data using bars of different heights.

Page 27: Survey data & sampling

Histograms
A Histogram is a graphical display of data using bars of different heights.

It is similar to a Bar Chart, but a histogram groups numbers into ranges.

Page 28: Survey data & sampling

Pie Chart
A special chart that uses "pie slices" to show relative sizes of data.

Page 29: Survey data & sampling

Line Graphs
A graph that shows information that is connected in some way (such as change over time).

Page 30: Survey data & sampling

Scatter Plots
A graph of plotted points that shows the relationship between two sets of data.

Page 31: Survey data & sampling

Pictographs
A Pictograph is a way of showing data using images.

Page 32: Survey data & sampling

Frequency Distribution

Frequency is how often something occurs.

By counting frequencies we can make a Frequency Distribution table.

Example: Sam's team has scored the following numbers of goals in recent football games:

Page 33: Survey data & sampling

Stem and Leaf Plots
A special table where each data value is split into a "leaf" (usually the last digit) and a "stem" (the other digits).

Like in this example:
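A stem and leaf plot can be sketched in a few lines of Python. The function name `stem_and_leaf` is hypothetical, the code assumes non-negative whole numbers, and the values are the ones used in the frequency example elsewhere in these slides.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its remaining digits (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]
plot = stem_and_leaf(values)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row shows one stem followed by its leaves in ascending order.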

Page 34: Survey data & sampling

Suppose you have the following list of values: 12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41. You could make a frequency distribution table showing how many tens, twenties, thirties, and forties you have:

Frequency

Class      Frequency
10 - 19    2
20 - 29    2
30 - 39    4
40 - 49    3
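The class counts above can be reproduced with a short sketch. The class width of 10 matches the table; the variable names are illustrative only.

```python
from collections import Counter

values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]
class_width = 10

# Map every value to the lower bound of its class, then count per class.
counts = Counter((value // class_width) * class_width for value in values)

# Build (class label, frequency) rows in ascending order of class.
frequency_table = [
    (f"{lower} - {lower + class_width - 1}", counts[lower])
    for lower in sorted(counts)
]

for label, frequency in frequency_table:
    print(f"{label}: {frequency}")
```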

Page 35: Survey data & sampling

Cumulative Tables and Graphs
Cumulative means "how much so far". To have cumulative totals, just add up the values as you go.

Example: Jamie has earned this much in the last 6 months:
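The earnings figures from the slide are not reproduced in this transcript, so the sketch below uses hypothetical monthly amounts purely to show how a running total is built with `itertools.accumulate`.

```python
from itertools import accumulate

# Hypothetical earnings for six months (placeholder values, not from the slides).
months = ["Month 1", "Month 2", "Month 3", "Month 4", "Month 5", "Month 6"]
earned = [120, 50, 110, 100, 50, 20]

# Each cumulative value is the sum of everything earned so far.
cumulative = list(accumulate(earned))

for month, amount, total in zip(months, earned, cumulative):
    print(f"{month}: earned {amount}, cumulative {total}")
```

The last cumulative value always equals the overall total.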

Page 36: Survey data & sampling

Relative Frequency
“How often something happens divided by all outcomes.”
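As a minimal sketch, relative frequencies can be derived from a frequency table by dividing each count by the total. The counts below are the class frequencies from the frequency example in these slides.

```python
# Class frequencies from the frequency distribution example.
frequencies = {"10 - 19": 2, "20 - 29": 2, "30 - 39": 4, "40 - 49": 3}

total = sum(frequencies.values())

# Relative frequency = how often something happens / all outcomes.
relative_frequencies = {label: count / total for label, count in frequencies.items()}

for label, share in relative_frequencies.items():
    print(f"{label}: {share:.2%}")
```

The relative frequencies always add up to 1 (100%).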

Page 37: Survey data & sampling

Check Sheet
“A generic tool that can be adapted for a wide variety of purposes, the check sheet is a structured, prepared form for collecting and analyzing data.”

Page 38: Survey data & sampling

Sample Size & Sampling

Page 39: Survey data & sampling

Census & Sample

Census
A Census is when we collect data for every member of the group (the whole "population").

Sample
“A Sample is when we collect data just for selected members of the group.”

Example: There are 120 people in your local football club. You can ask everyone (all 120) what their age is. That is a census. Or you could just ask the people who are there this afternoon. That is a sample.

Figure: Population and Sample

Page 40: Survey data & sampling

What is Sampling…..???

Sampling is the process of selecting units from a population of interest so that by studying the sample we may fairly generalize our results back to the population from which they were chosen.

Page 41: Survey data & sampling

Sampling reduces expenses and time by allowing researchers to estimate information about a whole population without having to survey each member of the population.

Sampling is like taking out and testing a few grains of rice from the cooking vessel to know if the dish is done or not.

Purpose of Sampling

Page 42: Survey data & sampling

Sampling Universe
The population from which we are sampling.

Sampling Unit
The unit selected during the process of sampling. Example: If we select households from a list of all units in the population, the sampling unit in this case is the household.

Terminologies in Sampling

Page 43: Survey data & sampling

Basic Sampling Unit or Elementary Unit
The sampling unit selected at the last stage of sampling. In a multi-stage survey, if we first select villages and then select households within those selected villages, the basic sampling unit would be the household.

Respondent
The person who responds to our questionnaires in the field.

Terminologies in Sampling

Page 44: Survey data & sampling

Survey Subject
The entity or person from whom we are collecting data.

Sampling Frame
A description of the sampling universe, usually in the form of a list of sampling units. Example: Villages, Households or Individuals.

Terminologies in Sampling

Page 45: Survey data & sampling

Types of Sampling Technique

There are two main types of Sampling Technique:

Probability Sampling
Non-Probability Sampling

Page 46: Survey data & sampling

Probability Sampling

A probability sample is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and that chance can be accurately determined.

Probability Sampling can be further sub-classified into:
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling

Page 47: Survey data & sampling

Simple Random Sampling (SRS)
In a simple random sample (SRS) of a given size, all subsets of the frame of that size are given an equal probability of selection. Each element of the frame thus has an equal probability of selection: the frame is not subdivided or partitioned.

Simple random sampling is always an EPS design (equal probability of selection), but not all EPS designs are simple random sampling.

Probability Sampling

Page 48: Survey data & sampling

SRS may also be cumbersome and tedious when sampling from an unusually large target population.

Example: N college students want a ticket for a basketball game, but there are only X tickets (X < N), so they decide on a fair way to choose who gets to go. Everybody is given a number (1 to N), and random numbers are generated, either electronically or from a table of random numbers, until X different students have been selected.
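The ticket draw can be sketched with Python's standard `random` module. The function name `draw_ticket_winners`, the values N = 50 and X = 5, and the seed are all hypothetical; `random.sample` draws without replacement, so every subset of X students is equally likely.

```python
import random


def draw_ticket_winners(n_students, n_tickets, seed=None):
    """Pick n_tickets distinct student numbers from 1..n_students at random."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for checking
    return sorted(rng.sample(range(1, n_students + 1), n_tickets))


winners = draw_ticket_winners(n_students=50, n_tickets=5, seed=2013)
print(winners)
```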

Page 49: Survey data & sampling

Systematic Sampling
A method of selecting sample members from a larger population according to a random starting point and a fixed, periodic interval called the sampling interval.

The sampling interval (sometimes known as the skip) is calculated as:

k = N / n

where n is the sample size, and N is the population size.

Probability Sampling

Page 50: Survey data & sampling

Example: Suppose you want to sample 8 houses from a street of 120 houses.

Skip = k = 120/8 = 15

So, every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116.
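The house example can be reproduced with a short sketch. The function name `systematic_sample` is hypothetical, and the code assumes the population size is a whole multiple of the sample size (otherwise the integer division truncates the skip).

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    k = population_size // sample_size  # sampling interval (the skip)
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]


# Reproduce the example: 8 houses from 120, starting at house 11.
houses = systematic_sample(population_size=120, sample_size=8, start=11)
print(houses)
```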

Page 51: Survey data & sampling

Stratified Sampling
Where the population embraces a number of distinct categories, the frame can be organized by these categories into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.

Probability Sampling

Page 52: Survey data & sampling

Example: Suppose that a company has the following staff:

Male (Full-time): 90
Male (Part-time): 18
Female (Full-time): 9
Female (Part-time): 63
Total: 180

We are asked to take a sample of 40 staff, stratified according to the above categories. Each stratum contributes to the sample in proportion to its size:

Male (Full-time) = 90 x (40 / 180) = 20
Male (Part-time) = 18 x (40 / 180) = 4
Female (Full-time) = 9 x (40 / 180) = 2
Female (Part-time) = 63 x (40 / 180) = 14
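The proportional allocation above can be checked with a short sketch. Rounding is an assumption added here for cases where the shares are not whole numbers; with other figures the rounded shares may not add up exactly to the requested sample size.

```python
# Stratum sizes from the staff example.
strata = {
    "Male (Full-time)": 90,
    "Male (Part-time)": 18,
    "Female (Full-time)": 9,
    "Female (Part-time)": 63,
}
sample_size = 40
population_size = sum(strata.values())

# Each stratum's share of the sample mirrors its share of the population.
allocation = {
    name: round(size * sample_size / population_size)
    for name, size in strata.items()
}

for name, n in allocation.items():
    print(f"{name}: {n}")
```

Within each stratum, the allocated number of staff would then be chosen at random.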

Page 53: Survey data & sampling

Cluster Sampling
Cluster sampling is exactly what its title implies. You randomly select clusters or groups in a population instead of individuals.

The objective of this method is to choose a limited number of smaller geographic areas in which simple or systematic random sampling can be conducted.

Probability Sampling

Page 54: Survey data & sampling

It’s completed in 2 stages:

1st Stage: Random Selection of Clusters: The entire population of interest is divided into small distinct geographic areas, such as villages, camps, etc. We then need to find an approximate size of the population for each “village”.

2nd Stage: Random Selection of Households within Clusters: Households are chosen randomly within each cluster using simple or systematic random sampling.
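The two stages can be sketched as follows. The village and household identifiers are hypothetical, and for simplicity the first stage picks clusters with equal probability; a real survey might instead weight the selection by the approximate population of each village.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_households, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick households within each."""
    rng = random.Random(seed)
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_households, len(clusters[name])))
        for name in chosen_clusters
    }


# Hypothetical frame: 10 villages with 20 households each.
villages = {
    f"village_{i}": [f"village_{i}_household_{j}" for j in range(1, 21)]
    for i in range(1, 11)
}

sample = two_stage_cluster_sample(villages, n_clusters=3, n_households=5, seed=1)
for village, households in sample.items():
    print(village, households)
```

Only the selected villages need a household list, which is what keeps the cost of preparing the frame down.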

Page 55: Survey data & sampling

Advantages and Disadvantages

Simple Random Sampling (SRS)
Advantages:
Estimates are easy to calculate.
Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.
Disadvantages:
If the sampling frame is large, this method is impracticable.
Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.

Systematic Sampling
Advantages:
Sample is easy to select.
A suitable sampling frame can be identified easily.
Sample is evenly spread over the entire reference population.
Disadvantages:
Sample may be biased if hidden periodicity in the population coincides with that of the selection.
It is difficult to assess the precision of the estimate from one survey.

Stratified Sampling
Advantages:
Low cost.
Greater accuracy.
Better coverage.
Disadvantages:
A sampling frame of the entire population has to be prepared separately for each stratum.
When examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design and potentially reducing the utility of the strata.
In some cases, stratified sampling can potentially require a larger sample than other methods would.

Cluster Sampling
Advantages:
Cuts down on the cost of preparing a sampling frame.
This can reduce travel and other administrative costs.
Disadvantages:
Sampling error is higher than for a simple random sample of the same size.
Note: Often used to evaluate vaccination coverage in EPI.

Page 56: Survey data & sampling

Non-probability sampling is any sampling method where some elements of the population have no chance of selection or where the probability of selection can't be accurately determined.

Non-probability sampling can be further sub-classified into:
Quota Sampling
Accidental Sampling

Non-Probability Sampling

Page 57: Survey data & sampling

Quota Sampling
In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion.

Example: An interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.

Non-Probability Sampling

Page 58: Survey data & sampling

In quota sampling the selection of the sample is non-random.

Interviewers might be tempted to interview those who look most helpful.

The problem is that these samples may be biased because not everyone gets a chance of selection.

Page 59: Survey data & sampling

Accidental Sampling
Accidental sampling (sometimes known as Grab, Convenience or Opportunity sampling) is a type of non-probability sampling which involves the sample being drawn from that part of the population which is close to hand.

Non-Probability Sampling

Page 60: Survey data & sampling

Example: If the interviewer were to conduct such a survey at a shopping center early in the morning on a given day, the people that he/she could interview would be limited to those present there at that time. Their views would not represent those of other members of society in the area, who might be reached if the survey were conducted at different times of day and several times per week.

This type of sampling is most useful for pilot testing.

Page 61: Survey data & sampling

Factors Affecting Sample Size..???

Sample size depends upon:
Population size
Confidence Interval
Confidence Level

Increasing the sample size increases accuracy and decreases the margin of error.

Page 62: Survey data & sampling

Confidence Level

The confidence level tells you how sure you can be.

It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval.

A 95% confidence level means that if the survey were repeated many times, about 95% of the resulting confidence intervals would contain the true population value; a 99% confidence level means about 99% would. Most researchers use the 95% confidence level.

Factors Affecting Sample Size..???

Page 63: Survey data & sampling

Confidence Interval
It expresses the degree of uncertainty associated with a sample statistic. A confidence interval is an interval estimate combined with a probability statement.

Interval Estimate
An interval estimate is defined by two numbers, between which a population parameter is said to lie. For example, a < μ < b is an interval estimate for the population mean μ. It indicates that the population mean is greater than a but less than b.
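As one possible illustration, the sketch below builds a 95% confidence interval for a survey percentage rather than a mean, using the common normal approximation. The survey figures (520 of 1000 respondents), the function name, and the z value of 1.96 for the 95% level are assumptions made for this example.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion (z = 1.96 for about 95%)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)  # margin of error
    return p - margin, p + margin


# Hypothetical survey: 520 of 1000 respondents picked a given answer.
low, high = proportion_confidence_interval(520, 1000)
print(f"{low:.3f} < p < {high:.3f}")
```

The two printed numbers play the role of a and b in the interval estimate described above.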

Factors Affecting Sample Size..???

Page 64: Survey data & sampling

How to Calculate Sample Size…..???
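The calculation slides did not survive in this transcript. One widely used approach, which may differ from the one originally shown, is Cochran's formula with a finite population correction: n0 = z² p (1 − p) / e², then n = n0 / (1 + (n0 − 1) / N). The sketch below assumes the most conservative proportion p = 0.5 and the standard z values for each confidence level; the function and parameter names are illustrative.

```python
import math

# Standard normal z values for common confidence levels.
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def sample_size(population_size, margin_of_error, confidence_level=0.95, proportion=0.5):
    """Cochran's sample size with a finite population correction, rounded up."""
    z = Z_SCORES[confidence_level]
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


print(sample_size(population_size=10_000, margin_of_error=0.05))
print(sample_size(population_size=10_000, margin_of_error=0.05, confidence_level=0.99))
```

A smaller margin of error or a higher confidence level both push the required sample size up, while population size matters less once the population is large.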

Page 65: Survey data & sampling
Page 66: Survey data & sampling

04/15/2023

References

“What is data..??” available from: http://www.mathsisfun.com/data/data.html (20 March 2013)
“Sampling” available from: http://en.wikipedia.org/wiki/Sampling_statistics (21 March 2013)
“Qualitative data analysis” available from: http://www.learnhigher.ac.uk/analysethis/main/qualitative.html (14 March 2013)
“Calculating the Sample Size” available from: http://www.ifad.org/gender/tools/hfs/anthropometry/ant_3.htm (21 March 2013)
“Sampling Strategies” available from: http://www.dissertation-statistics.com/sampling-strategies.html (21 March 2013)
“Univariate vs Bivariate Data” available from: http://regentsprep.org/REgents/math/ALGEBRA/AD1/unidat.htm (21 March 2013)

Page 67: Survey data & sampling

Thanks for Listening…….