unit 3 p 1 analysis of data

ANALYSIS OF DATA, EDITING, AND CODING

Rahul Pratap Singh Kaurav, Ph.D. Asst. Professor, Marketing

Unit II

MBA 203

ANALYSIS OF DATA

Excellence is to do a common thing in an uncommon way.

- Booker T. Washington

Stages of Data Analysis

Raw Data

The unedited responses from a respondent, exactly as indicated by that respondent.

Non-respondent Error

Error that the respondent is not responsible for creating, such as when the interviewer marks a response incorrectly.

Data Integrity

The notion that the data file actually contains the information the researcher is trying to obtain to adequately address the research questions.

Overview of the Stages of Data Analysis

Why editing?

Editing - I

Editing

The process of checking the completeness, consistency, and legibility of data and making the data ready for coding and transfer to storage.

Field Editing

Preliminary editing by a field supervisor on the same day as the interview to catch technical omissions, check legibility of handwriting, and clarify responses that are logically or conceptually inconsistent.

In-House Editing

A rigorous editing job performed by a centralized office staff.

Editing - II

Checking for Consistency

Check that respondents match the defined population.

Check for consistency within the data collection framework.

Taking Action When a Response Is Obviously in Error

Change or correct responses only when there are multiple pieces of evidence for doing so.

Editing Technology

Computer routines can check for consistency automatically.
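As an illustration of such a routine, here is a minimal sketch of two automated consistency checks. The field names, the adult-only population rule (age 18 or older), and the car-ownership skip logic are hypothetical assumptions chosen for the example. In line with the rule above, the routine only flags records for an editor; it does not change any answers.

```python
# Hypothetical survey records: field names and rules are illustrative only.
records = [
    {"id": 1, "age": 34, "owns_car": "yes", "car_brand": "Toyota"},
    {"id": 2, "age": 15, "owns_car": "no", "car_brand": None},
    {"id": 3, "age": 52, "owns_car": "no", "car_brand": "Ford"},
]

def consistency_errors(record, min_age=18):
    """Return a list of consistency problems found in one record."""
    errors = []
    # Population check: the respondent must belong to the defined population
    # (assumed here to be adults aged min_age or older).
    if record["age"] is not None and record["age"] < min_age:
        errors.append("respondent outside defined population")
    # Internal check: a non-owner should not report a car brand.
    if record["owns_car"] == "no" and record["car_brand"] is not None:
        errors.append("car brand given by non-owner")
    return errors

for r in records:
    problems = consistency_errors(r)
    if problems:
        # Flag for an editor to review rather than silently changing the answer.
        print(r["id"], problems)
```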

Editing for Completeness - I

Item Nonresponse

The technical term for an unanswered question on an otherwise complete questionnaire, resulting in missing data.

Plug Value

An answer that an editor “plugs in” to replace blanks or missing values so as to permit data analysis.

Choice of value is based on a predetermined decision rule.

Impute

To fill in a missing data point through the use of a statistical process providing an educated guess for the missing response based on available information.
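The difference between a plug value and an imputed value can be sketched in a few lines. This is an illustrative example only: the decision rule "plug in the scale midpoint (3)" is an assumed rule, and mean imputation is just one simple statistical way of making an educated guess, not the only one.

```python
from statistics import mean

# Hypothetical responses to a 5-point rating item; None marks item nonresponse.
ratings = [4, None, 5, 3, None, 4]

def plug(values, plug_value=3):
    """Replace blanks with a value fixed in advance by a decision rule
    (assumed here to be the scale midpoint, 3)."""
    return [plug_value if v is None else v for v in values]

def impute_mean(values):
    """Replace blanks with the mean of the answered items, a simple
    statistical 'educated guess' (mean imputation)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

print(plug(ratings))         # [4, 3, 5, 3, 3, 4]
print(impute_mean(ratings))  # blanks become the mean of 4, 5, 3, 4
```

The plug value ignores the data entirely, while the imputed value depends on the other answers, so the two can differ (here 3 versus 4).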

Editing for Completeness - II

What about missing data?

List-wise deletion

The entire record for a respondent that has left a response missing is excluded from use in statistical analysis.

Pair-wise deletion

Only the specific variables for which a respondent has no information are excluded from a given statistical analysis; that respondent's other answers are still used.
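The two deletion strategies can give different results on the same data file. The sketch below uses a small hypothetical data set and compares variable means; for single-variable statistics such as a mean, pairwise deletion amounts to using every respondent who answered that particular variable.

```python
from statistics import mean

# Hypothetical data file: each row is a respondent, None = missing answer.
rows = [
    {"q1": 5, "q2": 4, "q3": 3},
    {"q1": 4, "q2": None, "q3": 2},
    {"q1": None, "q2": 3, "q3": 4},
    {"q1": 3, "q2": 5, "q3": None},
]
variables = ["q1", "q2", "q3"]

def listwise_means(rows, variables):
    # List-wise: drop every respondent who left any variable blank.
    complete = [r for r in rows if all(r[v] is not None for v in variables)]
    return {v: mean(r[v] for r in complete) for v in variables}

def pairwise_means(rows, variables):
    # Pair-wise: for each variable, drop only respondents missing that variable.
    return {v: mean(r[v] for r in rows if r[v] is not None) for v in variables}

print(listwise_means(rows, variables))  # based on 1 complete record
print(pairwise_means(rows, variables))  # based on 3 answers per variable
```

List-wise deletion keeps only one of the four respondents here, which shows how quickly it can shrink the usable sample.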

Facilitating the Coding Process

Editing and Tabulating “Don’t Know” Answers

Legitimate don’t know (no opinion)

Reluctant don’t know (refusal to answer)

Confused don’t know (does not understand)
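One way to keep these three kinds of answer apart during coding is to give each its own numeric code so they can be tabulated separately. The codes 7, 8, and 9 below, and the editor judgments, are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical numeric codes: keeping the three kinds of "don't know"
# distinct lets them be tabulated separately instead of pooled.
DK_CODES = {
    "legitimate": 7,  # no opinion
    "reluctant": 8,   # refusal to answer
    "confused": 9,    # does not understand the question
}

# Hypothetical editor judgments for five "don't know" answers.
dk_judgments = ["legitimate", "reluctant", "legitimate", "confused", "legitimate"]

coded = [DK_CODES[kind] for kind in dk_judgments]
tally = Counter(coded)
print(tally)
```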

Editing - III

Pitfalls of Editing

Allowing subjectivity to enter into the editing process.

Data editors should be intelligent, experienced, and objective.

A systematic procedure for assessing the questionnaire should be developed by the research analyst so that the editor has clearly defined decision rules.

Pretesting Edit

Editing during the pretest stage can prove very valuable for improving questionnaire format and for identifying poor instructions or inappropriate question wording.

Data File Terminology - I

Field

A collection of characters that represents a single type of data, usually a variable.

String Characters

Computer terminology for formatting a variable as a series of alphabetic (nonnumeric) characters that may form a word.

Record

A collection of related fields that represents the responses from one sampling unit.

Data File Terminology - II

Data File

The way a data set is stored electronically in spreadsheet-like form, in which the rows represent sampling units and the columns represent variables.

Value Labels

Unique labels assigned to each possible numeric code for a response.
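These terms fit together as follows: a data file is a grid of records (rows) and fields (columns), and value labels translate the stored numeric codes back into readable answers. The variable names, codes, and labels in this sketch are hypothetical.

```python
# Hypothetical data file: each row is one sampling unit (a record) and each
# column is one variable (a field). Responses are stored as numeric codes.
fields = ["resp_id", "purchased", "satisfaction"]
data_file = [
    [101, 1, 4],
    [102, 2, 5],
    [103, 1, 2],
]

# Value labels: a unique label for each possible numeric code of a variable.
value_labels = {
    "purchased": {1: "Yes", 2: "No"},
    "satisfaction": {1: "Very dissatisfied", 2: "Dissatisfied", 3: "Neutral",
                     4: "Satisfied", 5: "Very satisfied"},
}

def labelled(record):
    """Return one record as a dict, replacing codes with their value labels."""
    out = {}
    for name, code in zip(fields, record):
        labels = value_labels.get(name)
        # Variables without value labels (e.g. the ID) keep their raw value.
        out[name] = labels[code] if labels else code
    return out

for record in data_file:
    print(labelled(record))
```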

Code Construction

Two Basic Rules for Coding Categories:

1. They should be exhaustive, meaning that a coding category should exist for all possible responses.

2. They should be mutually exclusive and independent, meaning that there should be no overlap among the categories to ensure that a subject or response can be placed in only one category.
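Both rules can be checked mechanically for numeric ranges: every possible response should match exactly one category. The sketch below assumes hypothetical age categories written as (code, low, high) with inclusive bounds; a "gap" breaks exhaustiveness and an "overlap" breaks mutual exclusivity.

```python
# Hypothetical age-coding schemes: (code, low, high), bounds inclusive.
GOOD_SCHEME = [(1, 18, 24), (2, 25, 34), (3, 35, 49), (4, 50, 120)]
BAD_SCHEME = [(1, 18, 25), (2, 25, 34), (3, 40, 120)]

def matching_codes(value, categories):
    """All category codes whose range contains the value."""
    return [code for code, low, high in categories if low <= value <= high]

def check_scheme(categories, possible_values):
    """Values with no category (not exhaustive) and values with
    more than one category (not mutually exclusive)."""
    gaps = [v for v in possible_values if not matching_codes(v, categories)]
    overlaps = [v for v in possible_values if len(matching_codes(v, categories)) > 1]
    return gaps, overlaps

print(check_scheme(GOOD_SCHEME, range(18, 121)))  # ([], [])
print(check_scheme(BAD_SCHEME, range(18, 121)))   # ([35, 36, 37, 38, 39], [25])
```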

Test Tabulation

Tallying of a small sample of the total number of replies to a particular question in order to construct coding categories.
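A test tabulation can be as simple as normalizing a handful of open-ended replies and counting them to see which categories recur. The replies below are hypothetical, and lower-casing plus trimming spaces is an assumed normalization step.

```python
from collections import Counter

# Hypothetical open-ended replies from a small sample of questionnaires.
sample_replies = ["price", "quality", "Price", "service", "quality ", "price"]

# Normalize case and spacing, then tally to see which categories recur.
tally = Counter(reply.strip().lower() for reply in sample_replies)
for reply, count in tally.most_common():
    print(f"{reply:<10}{count}")
```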

Devising the Coding Scheme

A coding scheme should not be too elaborate. The coder’s task is only to summarize the data.

Categories should be sufficiently unambiguous that coders will not classify items in different ways.

Codebook

Identifies each variable in a study and gives the variable’s description, code name, and position in the data matrix.
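A codebook can also be kept in machine-readable form so that programs find variables by code name rather than by remembering column numbers. The entries and the 1-based column convention below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CodebookEntry:
    code_name: str    # short variable name used in the data file
    description: str  # what the variable measures
    column: int       # position in the data matrix (1-based)

# Hypothetical codebook for a three-variable study.
CODEBOOK = [
    CodebookEntry("RESPID", "Respondent identification number", 1),
    CodebookEntry("Q1", "Overall satisfaction (1-5 scale)", 2),
    CodebookEntry("Q2", "Would recommend (1 = Yes, 2 = No)", 3),
]

def value_of(row, code_name):
    """Look up a variable's value in a data-matrix row via the codebook."""
    entry = next(e for e in CODEBOOK if e.code_name == code_name)
    return row[entry.column - 1]

row = [101, 4, 1]
print(value_of(row, "Q1"))  # 4
```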

The Nature of Descriptive Analysis

Descriptive Analysis

The elementary transformation of raw data in a way that describes basic characteristics such as central tendency, distribution, and variability.

Histogram

A graphical way of showing a frequency distribution in which the height of a bar corresponds to the observed frequency of the category.
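The following sketch computes a few basic descriptive statistics for a hypothetical set of 5-point ratings and prints a text histogram whose bar lengths equal the observed frequencies. It uses only Python's standard `statistics` and `collections` modules; the sample standard deviation is shown as the measure of variability.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical ratings on a 5-point scale.
ratings = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

summary = {
    "mean": mean(ratings),      # central tendency
    "median": median(ratings),
    "mode": mode(ratings),
    "std_dev": stdev(ratings),  # variability (sample standard deviation)
}
print(summary)

# Text histogram: bar length equals the observed frequency of each category.
freq = Counter(ratings)
for value in range(1, 6):
    print(value, "#" * freq[value])
```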

Levels of Scale Measurement and Suggested Descriptive Statistics

Creating and Interpreting Tabulation

Tabulation

The orderly arrangement of data in a table or other summary format showing the number of responses to each response category.

Tallying is the term used when the process is done by hand.

Frequency Table

A table showing the different ways respondents answered a question.

Sometimes called a marginal tabulation.
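A frequency table with counts and percentages can be produced from raw answers in a few lines. The brand answers below are hypothetical.

```python
from collections import Counter

# Hypothetical answers to "Which brand do you buy most often?"
answers = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

counts = Counter(answers)
n = len(answers)
print(f"{'Brand':<8}{'Count':>6}{'Percent':>9}")
for brand in sorted(counts):
    print(f"{brand:<8}{counts[brand]:>6}{100 * counts[brand] / n:>8.1f}%")
print(f"{'Total':<8}{n:>6}{100.0:>8.1f}%")
```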

Frequency Table Example

Cross-Tabulation - I

Cross-Tabulation

Addresses research questions involving relationships among multiple less-than-interval variables.

Results in a combined frequency table displaying one variable in rows and another variable in columns.

Contingency Table

A data matrix that displays the frequency of some combination of responses to multiple variables.

Marginals

Row and column totals in a contingency table, which are shown in its margins.
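A two-way contingency table with its marginals can be built by counting (row, column) pairs and then summing across each row and column. The age groups and yes/no answers below are hypothetical.

```python
from collections import Counter

# Hypothetical (age group, answer) pairs, one per respondent.
pairs = [
    ("Under 35", "Yes"), ("Under 35", "No"), ("35+", "No"),
    ("35+", "No"), ("Under 35", "Yes"), ("35+", "Yes"),
]
rows = sorted({r for r, _ in pairs})
cols = sorted({c for _, c in pairs})
cells = Counter(pairs)  # frequency of each combination of responses

# Marginals: row totals, column totals, and the grand total.
row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}
grand_total = sum(cells.values())

print(f"{'':<10}" + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}")
for r in rows:
    print(f"{r:<10}" + "".join(f"{cells[(r, c)]:>6}" for c in cols) + f"{row_totals[r]:>7}")
print(f"{'Total':<10}" + "".join(f"{col_totals[c]:>6}" for c in cols) + f"{grand_total:>7}")
```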

Cross-Tabulation Tables from a Survey Regarding AIG and Government Bailouts

Cross-Tabulation - II

How Many Cross-Tabulations?

Every possible response becomes a possible explanatory variable.

When hypotheses involve relationships between two categorical variables, cross-tabulations are the right tool for the job.

Quadrant Analysis (also known as IPA, Importance-Performance Analysis)

An extension of cross-tabulation in which responses to two rating-scale questions are plotted in four quadrants of a two-dimensional table.
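A quadrant analysis can be sketched by splitting importance and performance ratings at chosen cut-offs. In this example the attribute ratings are hypothetical, the cut-offs are assumed to be the grand means (scale midpoints are another common choice), and the quadrant names are commonly used IPA labels rather than ones taken from this course.

```python
from statistics import mean

# Hypothetical mean ratings (1-7 scale) for hotel attributes:
# attribute -> (importance, performance).
attributes = {
    "Cleanliness": (6.5, 4.5),
    "Location": (6.0, 6.2),
    "Room service": (3.5, 3.0),
    "Fitness center": (3.0, 5.8),
}

# Assumption: grand means of importance and performance define the quadrants.
imp_cut = mean(i for i, _ in attributes.values())
perf_cut = mean(p for _, p in attributes.values())

def quadrant(importance, performance):
    if importance >= imp_cut:
        return "Concentrate here" if performance < perf_cut else "Keep up the good work"
    return "Low priority" if performance < perf_cut else "Possible overkill"

for name, (imp, perf) in attributes.items():
    print(f"{name:<15}{quadrant(imp, perf)}")
```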

An Importance-Performance or Quadrant Analysis of Hotels

Pie Charts Work Well with Tabulations and Cross-Tabulations

Computer Programs for Analysis

Statistical Packages

Spreadsheets:

Excel

Statistical software:

SAS

SPSS (Statistical Package for Social Sciences)

MINITAB

Computer Graphics and Computer Mapping

Box and Whisker Plots

Graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions.

Interquartile Range

A measure of variability: the spread of the middle 50 percent of the data, computed as the third quartile minus the first quartile.

Outlier

A value that lies outside the normal range of the data.
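The sketch below computes quartiles and the interquartile range for a hypothetical sample and flags outliers. It assumes the widely used 1.5 × IQR rule (Tukey's fences) as the definition of "outside the normal range"; box-and-whisker plots often use the same fences. Note that different software computes quartiles slightly differently. Here `statistics.quantiles` with `method="inclusive"` is used.

```python
from statistics import quantiles

# Hypothetical sample with one unusually large value.
data = [12, 15, 14, 10, 18, 16, 13, 15, 14, 45]

# Quartiles (Q1, median, Q3) and the interquartile range.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: more than 1.5 * IQR below Q1 or above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, outliers)
```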

Common types of charts

Bar Charts

Column Chart

Pie Chart

Line Chart

A 3-D Graph Showing Fast-Food Consumption Patterns around the U.S.

Box and Whisker Plot

Radar (Spider) Chart

Breakdown of travel and tourism’s total contribution to employment 2011

Area Chart

Scatter Plot

Surface Chart

Combo Chart

Gantt Chart

CPM (Critical Path Method)
