ANALYSIS OF DATA, EDITING, AND CODING
Rahul Pratap Singh Kaurav, Ph.D. Asst. Professor, Marketing
Unit II
MBA 203
Stages of Data Analysis
Raw Data The unedited responses from a respondent exactly as indicated by that respondent.
Non-respondent Error Error that the respondent is not responsible for creating, such as when the interviewer marks a response incorrectly.
Data Integrity The notion that the data file actually contains the information that the researcher is trying to obtain to adequately address research questions.
Editing - I
Editing
The process of checking the completeness, consistency, and legibility of data and making the data ready for coding and transfer to storage.
Field Editing
Preliminary editing by a field supervisor on the same day as the interview to catch technical omissions, check legibility of handwriting, and clarify responses that are logically or conceptually inconsistent.
In-House Editing
A rigorous editing job performed by a centralized office staff.
Editing - II
Checking for Consistency Check that respondents match the defined population
Check for consistency within the data collection framework
Taking Action When Response is Obviously in Error Change/correct responses only when there are multiple pieces of evidence for doing so.
Editing Technology Computer routines can check for consistency automatically.
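A minimal sketch of an automated consistency routine of the kind mentioned above. The field names, the 18-99 population rule, and the car-ownership rule are illustrative assumptions, not part of the original slides.

```python
# Hypothetical survey records; field names and rules are assumptions for illustration.
records = [
    {"id": 1, "age": 34, "owns_car": "yes", "cars_owned": 1},
    {"id": 2, "age": 17, "owns_car": "no", "cars_owned": 2},    # under age and self-contradictory
    {"id": 3, "age": 150, "owns_car": "yes", "cars_owned": 1},  # age out of range
]

def consistency_problems(rec):
    """Return a list of rule violations found in one record."""
    problems = []
    # Rule 1: respondent must match the defined population (assumed: adults 18-99).
    if not 18 <= rec["age"] <= 99:
        problems.append("age outside defined population (18-99)")
    # Rule 2: answers must agree with each other.
    if rec["owns_car"] == "no" and rec["cars_owned"] > 0:
        problems.append("says no car but reports cars owned")
    return problems

# Keep only the records that violate at least one rule, keyed by respondent id.
flagged = {}
for rec in records:
    found = consistency_problems(rec)
    if found:
        flagged[rec["id"]] = found

for rec_id, found in flagged.items():
    print(rec_id, found)
```

The routine only flags records. Whether to change a flagged response is still an editor's decision under the rule stated above.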
Editing for Completeness - I
Item Nonresponse The technical term for an unanswered question on an otherwise complete questionnaire resulting in missing data.
Plug Value
An answer that an editor “plugs in” to replace blanks or missing values so as to permit data analysis.
Choice of value is based on a predetermined decision rule.
Impute
To fill in a missing data point through the use of a statistical process providing an educated guess for the missing response based on available information.
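A small sketch contrasting a plug value with a simple imputation. The ratings, the choice of the scale midpoint as the plug value, and the use of the observed mean as the imputation are all assumptions for illustration. Other statistical imputation methods exist.

```python
from statistics import mean

# Hypothetical answers on a 1-5 scale; None marks item nonresponse.
ratings = [4, 5, None, 3, None, 4]

# Plug value: chosen by a predetermined decision rule (assumed here: the scale midpoint).
PLUG_VALUE = 3
plugged = [PLUG_VALUE if r is None else r for r in ratings]

# Imputation: an educated guess from available information
# (assumed here: the mean of the observed answers).
observed = [r for r in ratings if r is not None]
imputed_value = mean(observed)
imputed = [imputed_value if r is None else r for r in ratings]

print("plugged:", plugged)
print("imputed:", imputed)
```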
Editing for Completeness - II
What about missing data? List-wise deletion
The entire record for a respondent that has left a response missing is excluded from use in statistical analysis.
Pair-wise deletion
Only the variables that are missing for a respondent are excluded from use in statistical analysis; the respondent's remaining answers are still used.
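The two deletion approaches can be sketched as follows. The variable names and values are hypothetical, and `None` is assumed to mark a missing response.

```python
# Hypothetical data file: one dict per respondent, None marks a missing response.
rows = [
    {"age": 25, "income": 40, "satisfaction": 4},
    {"age": 31, "income": None, "satisfaction": 5},
    {"age": None, "income": 55, "satisfaction": 3},
    {"age": 40, "income": 60, "satisfaction": None},
]

def listwise(rows):
    """Keep only respondents with no missing responses at all."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, var_a, var_b):
    """Keep every respondent who answered both variables in this one analysis."""
    return [
        (r[var_a], r[var_b])
        for r in rows
        if r[var_a] is not None and r[var_b] is not None
    ]

complete_cases = listwise(rows)
age_income = pairwise(rows, "age", "income")
income_satisfaction = pairwise(rows, "income", "satisfaction")

print(len(complete_cases), "complete record(s) under list-wise deletion")
print(len(age_income), "usable pair(s) for age vs. income")
print(len(income_satisfaction), "usable pair(s) for income vs. satisfaction")
```

List-wise deletion leaves a single record here, while pair-wise deletion keeps two respondents for each pair of variables, at the cost of each analysis resting on a different subset.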
Facilitating the Coding Process
Editing And Tabulating “Don’t Know” Answers Legitimate don’t know (no opinion)
Reluctant don’t know (refusal to answer)
Confused don’t know (does not understand)
Editing - III
Pitfalls of Editing Allowing subjectivity to enter into the editing process.
Data editors should be intelligent, experienced, and objective.
A systematic procedure for assessing the questionnaire should be developed by the research analyst so that the editor has clearly defined decision rules.
Pretesting Edit Editing during the pretest stage can prove very valuable for improving questionnaire format and for identifying poor instructions or inappropriate question wording.
Data File Terminology - I
Field A collection of characters that represents a single type of data—usually a variable.
String Characters Computer terminology to represent formatting a variable using a series of alphabetic characters (nonnumeric characters) that may form a word.
Record A collection of related fields that represents the responses from one sampling unit.
Data File Terminology - II
Data File The way a data set is stored electronically in spreadsheet-like form in which the rows represent sampling units and the columns represent variables.
Value Labels Unique labels assigned to each possible numeric code for a response.
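A brief sketch of these terms in code, assuming a small comma-separated data file. The column names and the gender value labels are hypothetical.

```python
import csv
import io

# Hypothetical data file: each row is a record (one sampling unit),
# each column is a field (one variable).
raw = "id,gender,satisfaction\n1,1,4\n2,2,5\n3,1,2\n"

# Value labels: one unique label per numeric code (assumed coding).
value_labels = {"gender": {"1": "Male", "2": "Female"}}

# csv.DictReader reads every field as a string of characters.
records = list(csv.DictReader(io.StringIO(raw)))

# Replace the numeric gender code with its label for reporting.
labeled = [
    {**rec, "gender": value_labels["gender"][rec["gender"]]}
    for rec in records
]

for rec in labeled:
    print(rec)
```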
Code Construction
Two Basic Rules for Coding Categories:
1. They should be exhaustive, meaning that a coding category should exist for all possible responses.
2. They should be mutually exclusive and independent, meaning that there should be no overlap among the categories to ensure that a subject or response can be placed in only one category.
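The two rules can be checked mechanically for a numeric variable. This is a sketch under assumptions: the age categories, their codes, and the 18-120 range of possible responses are invented for illustration.

```python
# Hypothetical coding categories for age: (code, lowest age, highest age), bounds inclusive.
categories = [(1, 18, 24), (2, 25, 34), (3, 35, 54), (4, 55, 120)]

def codes_for(age):
    """Return every code whose category contains this age."""
    return [code for code, low, high in categories if low <= age <= high]

def check_scheme(possible_values):
    """Find values with no category (not exhaustive) or several (not mutually exclusive)."""
    gaps = [v for v in possible_values if len(codes_for(v)) == 0]
    overlaps = [v for v in possible_values if len(codes_for(v)) > 1]
    return gaps, overlaps

gaps, overlaps = check_scheme(range(18, 121))
print("values with no category:", gaps)
print("values in more than one category:", overlaps)
```

Both lists are empty for this scheme. Changing the second category to `(2, 24, 34)` would put age 24 in two categories and break the second rule.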
Test Tabulation Tallying of a small sample of the total number of replies to a particular question in order to construct coding categories.
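A sketch of a test tabulation on open-ended replies. The replies are invented, the sample is simply the first six replies (a random draw may be preferable), and the rule of keeping answers seen at least twice is an assumption.

```python
from collections import Counter

# Hypothetical open-ended replies to "Why did you choose this store?"
replies = [
    "price", "quality", "price", "service", "price",
    "quality", "location", "price", "service", "quality",
]

# Tally a small sample of the replies rather than the full set.
sample = replies[:6]
tally = Counter(sample)

# Assumed rule: an answer seen at least twice gets its own category;
# an "other" category keeps the scheme exhaustive.
categories = sorted(answer for answer, n in tally.items() if n >= 2) + ["other"]

print(tally)
print(categories)
```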
Devising the Coding Scheme
A coding scheme should not be too elaborate. The coder’s task is only to summarize the data.
Categories should be sufficiently unambiguous that coders will not classify items in different ways.
Code book Identifies each variable in a study and gives the variable’s description, code name, and position in the data matrix.
The Nature of Descriptive Analysis
Descriptive Analysis The elementary transformation of raw data in a way that describes the basic characteristics such as central tendency, distribution, and variability.
Histogram A graphical way of showing a frequency distribution in which the height of a bar corresponds to the observed frequency of the category.
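A short illustration of descriptive analysis using Python's standard library. The scores are hypothetical, and the text histogram stands in for a plotted one.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical responses on a 1-5 rating scale.
scores = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

summary = {
    "mean": mean(scores),              # central tendency
    "median": median(scores),          # central tendency
    "mode": mode(scores),              # central tendency
    "stdev": round(stdev(scores), 2),  # variability (sample standard deviation)
}
print(summary)

# Text histogram: bar length corresponds to the observed frequency of each value.
freq = Counter(scores)
for value in sorted(freq):
    print(value, "#" * freq[value])
```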
Creating and Interpreting Tabulation
Tabulation The orderly arrangement of data in a table or other summary format showing the number of responses to each response category.
Tallying is the term used when the process is done by hand.
Frequency Table A table showing the different ways respondents answered a question.
Sometimes called a marginal tabulation.
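A frequency table can be built as follows. The answers are hypothetical, and the percentage column is an addition that such tables commonly carry.

```python
from collections import Counter

# Hypothetical answers to a single question.
answers = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]

counts = Counter(answers)
total = len(answers)

# One row per response category: (category, frequency, percent of all answers).
table = [
    (category, n, round(100 * n / total, 1))
    for category, n in counts.most_common()
]

for category, n, pct in table:
    print(f"{category:<10} {n:>3} {pct:>6}%")
```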
Cross-Tabulation - I
Cross-Tabulation Addresses research questions involving relationships among multiple less-than-interval (nominal or ordinal) variables.
Results in a combined frequency table displaying one variable in rows and another variable in columns.
Contingency Table A data matrix that displays the frequency of some combination of responses to multiple variables.
Marginals Row and column totals in a contingency table, which are shown in its margins.
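A sketch of a two-variable contingency table with its marginals. The variables (gender and a yes/no answer) and the responses are hypothetical.

```python
from collections import Counter

# Hypothetical paired responses: (gender, would recommend).
responses = [
    ("male", "yes"), ("female", "no"), ("male", "no"),
    ("female", "yes"), ("female", "yes"), ("male", "yes"),
]

cells = Counter(responses)                      # frequency of each combination
row_totals = Counter(g for g, _ in responses)   # row marginals
col_totals = Counter(a for _, a in responses)   # column marginals

rows = sorted(row_totals)
cols = sorted(col_totals)

# Print the table with one variable in rows and the other in columns.
print(f"{'':<8}" + "".join(f"{c:>6}" for c in cols) + f"{'total':>7}")
for r in rows:
    line = "".join(f"{cells[(r, c)]:>6}" for c in cols)
    print(f"{r:<8}{line}{row_totals[r]:>7}")
print(f"{'total':<8}" + "".join(f"{col_totals[c]:>6}" for c in cols) + f"{len(responses):>7}")
```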
Cross-Tabulation - II
How Many Cross-Tabulations? Every possible response becomes a possible explanatory variable.
When hypotheses involve relationships between two categorical variables, cross-tabulations are the right tool for the job.
Quadrant Analysis (also known as IPA, Importance Performance Analysis) An extension of cross-tabulation in which responses to two rating-scale questions are plotted in four quadrants of a two-dimensional table.
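A minimal sketch of quadrant analysis. The attributes and ratings are hypothetical, and splitting the quadrants at the mean of each scale is an assumption (the scale midpoint is another common choice). The quadrant names follow the labels commonly used in importance-performance analysis.

```python
from statistics import mean

# Hypothetical mean ratings per attribute: (importance, performance) on a 1-5 scale.
ratings = {
    "price": (4.6, 3.1),
    "staff": (4.4, 4.5),
    "parking": (2.8, 4.2),
    "decor": (2.5, 2.9),
}

# Assumed cut points: the average importance and average performance.
importance_cut = mean([imp for imp, _ in ratings.values()])
performance_cut = mean([perf for _, perf in ratings.values()])

def quadrant(importance, performance):
    """Place one attribute in one of the four quadrants."""
    high_importance = importance >= importance_cut
    high_performance = performance >= performance_cut
    if high_importance and high_performance:
        return "keep up the good work"
    if high_importance:
        return "concentrate here"
    if high_performance:
        return "possible overkill"
    return "low priority"

placement = {name: quadrant(imp, perf) for name, (imp, perf) in ratings.items()}
for name, label in placement.items():
    print(f"{name:<8} {label}")
```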
Computer Programs for Analysis
Statistical Packages
Spreadsheets: Excel
Statistical software: SAS, SPSS (Statistical Package for the Social Sciences), MINITAB
Computer Graphics and Computer Mapping
Box and Whisker Plots Graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions.
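Interquartile Range A measure of variability: the distance between the 25th and 75th percentiles, shown as the length of the box in a box and whisker plot.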
Outlier A value that lies outside the normal range of the data.
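The numbers behind a box and whisker plot can be computed as sketched below. The data are hypothetical. Treating values more than 1.5 interquartile ranges beyond the quartiles as outliers is a common convention assumed here, not a rule stated in the slides. The "inclusive" quartile method is also an assumption, and other methods can give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical observations, including one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Quartiles: 25th percentile, median, 75th percentile.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the length of the box

# Assumed convention: fences sit 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("outliers:", outliers)
```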