008 revised notes intro to datapresentation
Biological Weapons Proliferation Prevention Program
Biological Threat Reduction Program
Introduction to Data Presentation
TRNEPI-00152
2
Learning objectives
Define different types of variables
Create and interpret one and two variable tables
Create and interpret a line graph
Create and interpret one and two variable bar charts
Describe when to use each type of table, graph, and chart
3
Why organize data?
Many records
Look for trends and relationships
Get familiar with data before analysis
Catch errors
Communicate findings to others
4
How to organize data
Identify what type of data you have
Determine what you need to communicate with the data
Summarize using tables, graphs, and/or charts
5
Variable: definition
A characteristic that is observed or measured and that differs from person to person
Examples:
  age
  height
  hair color
  smoking
6
Types of Variables
Variable
  Quantitative (measurement)
    Continuous (real-valued), e.g. height
    Discrete (count data), e.g. number of admissions
  Qualitative (categorical)
    Ordinal (ordered), e.g. response to treatment
    Nominal (not ordered), e.g. ethnic group
7
Types of Variables

Categorical
Nominal               Ordinal
Sex    Nationality    Status
M      Yemen          Mild
M      Jordan         Moderate
F      Yemen          Severe
M      Jordan         Mild
F      Sudan          Moderate
F      Yemen          Mild
M      Sudan          Moderate
M      Iran           Severe
F      Jordan         Severe
M      Iran           Mild
F      Yemen          Moderate
F      Sudan          Moderate
M      Iran           Mild
M      Yemen          Severe

Quantitative
Discrete    Continuous
Children    Weight
1           56.4
1           47.8
2           59.9
3           13.1
1           25.7
1           23.0
2           30.0
3           13.7
2           15.4
2           52.5
1           26.6
1           38.2
1           59.0
2           57.9
8
Why Does it Matter?
Categorical and quantitative variables are statistically summarized and presented in different ways
Variable Type    Data Presentation
Quantitative     Graphs, Tables
Categorical      Charts, Tables
Tables
10
Tables: Characteristics
Data is arranged in rows and columns
Presentation is simple and self-explanatory
  Title
  Label each row and column
  Show totals for rows and columns
  Include units of measure (yrs, mg/dl)
  Explain codes in footnote
11
Simple Frequency Distribution
Age group (years)    Number of Cases
≤14                      230
15-19                   4378
20-24                  10405
25-29                   9610
30-34                   8648
35-44                   6901
45-54                   2631
≥55                     1278
Total                  44081
Primary and secondary syphilis morbidity by age, United States, 1989
12
Determining Class Intervals
The intervals must be mutually exclusive and encompass all data.
For preliminary analysis a large number of intervals (4-8) is used. These intervals can then be consolidated.
Use standard or frequently applied intervals (for instance, up to the age of 19, 20-24 years, 25-29 years, etc.).
A category must be provided to accommodate unknown values (for instance, “age unknown”).
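The rules above can be sketched in a few lines of Python. This is only an illustration: the ages are made up, whole years are assumed, the interval boundaries are modeled on the age groups of the frequency distribution slide, and the upper bound of 200 for the open-ended last group is an arbitrary assumption.

```python
# Hypothetical ages in whole years; None stands for "age unknown".
ages = [3, 17, 22, 24, 29, 31, 40, 52, 67, None, 19, 20]

# Mutually exclusive intervals that together cover every possible age.
# Each tuple is (label, lowest age, highest age), both ends inclusive.
intervals = [
    ("<=14", 0, 14),
    ("15-19", 15, 19),
    ("20-24", 20, 24),
    ("25-29", 25, 29),
    ("30-34", 30, 34),
    ("35-44", 35, 44),
    ("45-54", 45, 54),
    (">=55", 55, 200),  # assumed upper bound for the open-ended group
]

def frequency_distribution(values):
    """Count how many values fall in each interval, plus an 'Unknown' category."""
    counts = {label: 0 for label, _, _ in intervals}
    counts["Unknown"] = 0
    for value in values:
        if value is None:
            counts["Unknown"] += 1
            continue
        for label, low, high in intervals:
            if low <= value <= high:
                counts[label] += 1
                break
    return counts

table = frequency_distribution(ages)
for label, count in table.items():
    print(f"{label:<8}{count:>4}")
print(f"{'Total':<8}{sum(table.values()):>4}")
```

Because the intervals do not overlap and an "Unknown" row exists, every record is counted exactly once, so the total equals the number of records.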
13
Two Variable Table
14
Format for 2 X 2 Table
             Ill    Well    Total
Exposed      a      b
Unexposed    c      d
Total
15
Format for 2 X 2 Table
                Dead    Alive    Total
Diabetic         100       89      189
Non-diabetic     811     2340     3151
Total            911     2429     3340
Follow-up status among diabetic and nondiabetic white men, NHANES, 1982-1984
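As a rough check of the row and column totals, the four cell counts from this slide can be summed in Python. The variable names are illustrative only.

```python
# Cell counts from the slide: follow-up status of diabetic and non-diabetic men.
counts = {
    "Diabetic": {"Dead": 100, "Alive": 89},
    "Non-diabetic": {"Dead": 811, "Alive": 2340},
}
columns = ["Dead", "Alive"]

# Row totals, column totals, and the grand total of the 2 x 2 table.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
column_totals = {col: sum(counts[row][col] for row in counts) for col in columns}
grand_total = sum(row_totals.values())

print(f"{'':<14}{'Dead':>7}{'Alive':>7}{'Total':>7}")
for row, cells in counts.items():
    print(f"{row:<14}{cells['Dead']:>7}{cells['Alive']:>7}{row_totals[row]:>7}")
print(f"{'Total':<14}{column_totals['Dead']:>7}{column_totals['Alive']:>7}{grand_total:>7}")
```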
Graphs and Charts
17
Charts and Graphs: Advantages
Easier to understand and interpret
Get a good feel for the data before formal analysis
Reveal patterns in data
Used to generate hypotheses
18
Graphs: Types
Arithmetic-scale line graphs
Inset graphs
Histograms
Frequency polygons
Cumulative frequency curve
Scatter diagram
19
Graphs
[Figure: generic graph layout with a title; x-axis: Independent Variable (1 to 7); y-axis: Dependent Variable (0 to 3.5)]
20
Types of Variables
Dependent
  Describe outcome of interest
  Examples: Dead, cancer, ill
Independent
  May cause or contribute to variation of the dependent variable
  Not influenced by dependent variable
  Examples: Time, age, packs of cigarettes, cholesterol levels
21
Arithmetic-Scale Line Graph
[Figure: Incidence of Hepatitis A, United States, 1952-1993. Line graph; x-axis: Year (1950 to 1990); y-axis: Rate /100,000 (0 to 40)]
Source: CDC, National Notifiable Diseases Surveillance System
22
Arithmetic-Scale Line Graph: Characteristics
Method of choice for plotting rates over time
Set distance on graph represents same quantity anywhere on the axis
Horizontal graph x:y ratio is 5:3
Y-axis should start with 0
Determine largest value of Y needed to plot
Round off that number and divide into intervals
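The last three points can be sketched as a small helper. This is one possible reading of "round off and divide into intervals", not a prescribed method; the function names, the largest rate of 33.4, and the interval size of 10 are assumptions chosen for illustration.

```python
import math

def y_axis_ticks(largest_value, step):
    """Tick marks from 0 up to the largest value rounded up to a multiple of step."""
    top = math.ceil(largest_value / step) * step
    n_steps = round(top / step)
    # Equal steps: a set distance on the axis always represents the same quantity.
    return [i * step for i in range(n_steps + 1)]

def figure_width(height):
    """Width that gives the 5:3 x:y ratio for a given height."""
    return height * 5 / 3

ticks = y_axis_ticks(33.4, 10)   # hypothetical largest rate and interval size
width = figure_width(3.0)        # hypothetical height in arbitrary units

print(ticks)
print(width)
```

With these inputs the axis starts at 0, ends at 40, and is divided into four equal intervals of 10.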
23
Arithmetic-Scale Line Graph
[Figure: Registered Death Rates by Age and Year, 1996-2000. Line graph with one line per year (1996, 1997, 1998, 1999, 2000); x-axis: Age Categories (Years), from <1 to 65+; y-axis: Rate per 1000 population (0.0 to 50.0)]
24
Inset Graph
[Figure: Registered Death Rates by Age and Year, 1996-2000. Host line graph with one line per year (1996, 1997, 1998, 1999, 2000); x-axis: Age Categories (Years), from <1 to 65+; y-axis: Rate per 1000 population (0.0 to 50.0). Inset graph: age categories 1-4 to 45-49, y-axis 0.0 to 3.5]
25
Inset Graph: Characteristics
A magnified portion of the larger, or host, graph
Can see data in better detail
Smaller graph is “inset” into the larger graph
Variables remain the same
Independent data points do not change (e.g. age categories will remain in 5-year segments)
26
Histograms
Frequency of measles by week of onset Dec 6, 2000 to May 16, 2001
27
Histograms: characteristics
Graph of the frequency distribution of a continuous variable
Columns are adjoining
Area of each column is proportional to number of observations in that interval
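When intervals have unequal widths, one common way to keep area proportional to the number of observations is to plot the count divided by the interval width. The sketch below assumes that approach; the intervals and counts are hypothetical.

```python
# Hypothetical intervals: (lower bound, upper bound, number of observations).
# The last interval is twice as wide as the other two.
intervals = [(0, 5, 10), (5, 10, 20), (10, 20, 20)]

def column_heights(intervals):
    """Height = count / width, so that width * height equals the count."""
    return [count / (upper - lower) for lower, upper, count in intervals]

heights = column_heights(intervals)

# Recompute each column's area to confirm it matches the count in that interval.
areas = [(upper - lower) * h for (lower, upper, _), h in zip(intervals, heights)]

print(heights)
print(areas)
```

The second and third intervals hold the same number of observations, so the wider column is drawn half as tall and both cover the same area.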
28
[Figure: histogram; x-axis: Onset (3-hour periods), March 13 to March 14]
29
30
Histograms using continuous data
31
Frequency Polygon
[Figure: Example of a Frequency Polygon. x-axis: Week (1 to 16); y-axis: Cases (0 to 80); series: Cases, Cases-FP]
32
Frequency Polygon: Characteristics
Graph of entire frequency distribution of a continuous variable
Number of events in interval plotted at midpoint of interval
Straight line connects points
Useful to compare two or more distributions on the same axis
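The points of a frequency polygon can be derived from a frequency distribution by taking the midpoint of each interval. A minimal sketch, with hypothetical weekly case counts:

```python
# Hypothetical weekly case counts: (interval start, interval end, cases).
intervals = [(0, 1, 4), (1, 2, 10), (2, 3, 7), (3, 4, 2)]

def polygon_points(intervals):
    """Place each interval's count at the midpoint of that interval."""
    return [((start + end) / 2, cases) for start, end, cases in intervals]

points = polygon_points(intervals)

# Consecutive points would be joined by straight lines when plotted.
for x, y in points:
    print(f"x = {x:.1f}, cases = {y}")
```

Running the same function on a second distribution gives a second set of points that can be drawn on the same axes for comparison.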
33
Frequency Polygon
Relative frequency of serum cholesterol level by age
34
Cumulative Frequency
[Figure: Cumulative incidence of hepatitis B virus infection by duration of high-risk behavior. x-axis: Years at Risk (0 to 12); y-axis: Percent infected with HBV (0 to 100); series: IV Drug Users, Homosexual Men, Heterosexuals - multiple partners]
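A cumulative frequency is a running total. The sketch below shows how the plotted percentages could be computed; the yearly counts and the group size are invented for illustration and are not taken from the figure.

```python
from itertools import accumulate

# Hypothetical number of new infections in each successive year at risk,
# in a hypothetical group of 50 people.
new_cases_per_year = [5, 10, 10, 5]
group_size = 50

# Running total of cases, then the same total as a percentage of the group.
cumulative_cases = list(accumulate(new_cases_per_year))
cumulative_percent = [100 * c / group_size for c in cumulative_cases]

for year, pct in enumerate(cumulative_percent, start=1):
    print(f"After {year} year(s) at risk: {pct:.0f}% infected")
```

A cumulative curve never decreases, because each point adds the new cases to everything counted before it.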
35
Scatter Diagram
Relationship between age in years and heavy metal X exposure
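A scatter diagram is usually read by eye, but the strength of a linear relationship can also be summarized with a number. The sketch below uses Pearson's correlation coefficient, which is one common choice and is not named in the slides; the ages and exposure values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: age in years and measured exposure to heavy metal X.
ages = [20, 30, 40, 50, 60]
exposure = [1.0, 2.1, 2.9, 4.2, 5.0]

r = pearson_r(ages, exposure)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 indicate points lying close to a straight line; values near 0 indicate no linear relationship.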
36
Charts: Types
Appropriate for categorical data
Bar charts
  Simple
  Grouped
  Stacked
Pie charts
37
Simple Bar Chart
[Figure: Annual Death Rates by Governorate, 1996-2000. Horizontal bar chart; x-axis: Rate per 100,000 population (0 to 400); y-axis: Governorate (AQABA, ZARQA, AMMAN, JARAS, MAFRQ, MADAB, TAFEL, IRBID, BALQA, KARAK, AJLON, MAANN)]
38
Bar Charts: Characteristics
Display data from one-variable table
Each category of the variable is represented by a bar
Length of each bar is proportional to the number of events
Can be presented vertically or horizontally
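As an illustration, the Status column from the earlier "Types of Variables" table can be turned into a one-variable table and a text bar chart. The text rendering with "#" characters is only a stand-in for a real chart.

```python
from collections import Counter

# Clinical status of the 14 cases listed in the "Types of Variables" table.
status = ["Mild", "Moderate", "Severe", "Mild", "Moderate", "Mild", "Moderate",
          "Severe", "Severe", "Mild", "Moderate", "Moderate", "Mild", "Severe"]

# Status is ordinal, so the categories are listed in their natural order.
order = ["Mild", "Moderate", "Severe"]
counts = Counter(status)

# One bar per category; bar length is proportional to the number of cases.
for category in order:
    print(f"{category:<9}{'#' * counts[category]} {counts[category]}")
```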
39
Vertical Bar Chart: Qualitative Ordinal Variable
[Figure: Distribution of Cases by Clinical Status. x-axis: Clinical Status (Mild, Moderate, Severe); y-axis: Cases (0 to 15)]
40
Grouped Bar Chart
[Figure: Treatment completion and cure of disease X in various racial groups, 1994-2000. x-axis: Race (Race A, Race B, Race C, Race D); y-axis: Frequency (0 to 1600); series: Cases, Completion, Cure]
41
Grouped Bar Chart: Characteristics
Illustrate data from two variable or three variable tables
Bars within groups are usually adjoining
Groups are separated by a space
Limit number of bars within group to fewer than four
42
Stacked Bar Chart
[Figure: Cases of malaria in a region, 1992-1996. x-axis: Time (1992 to 1996); y-axis: Cases (0 to 600); series: Others, Falciparum]
43
Pie Chart
44
[Figure: Geographic Distribution of Hepatitis A Virus Infection. Legend: Anti-HAV Prevalence (High, Intermediate, Low, Very Low)]
45
46
Selecting the Right Presentation Method (1)
Type of Graph or Diagram    Application
Arithmetic Scale Graph      Data or indicator trends over time.
Inset Graph                 View a larger image of a portion of the host graph.
Histogram                   1. Frequency distribution for a continuous variable.
                            2. Number of cases during an epidemic (epidemic curve) or over time.
47
Selecting the Right Presentation Method (2)
Type of Graph or Diagram      Application
Frequency Polygons            Frequency distribution of a continuous variable, for displaying components.
Cumulative Frequency Curve    Display cumulative frequency of a quantitative variable.
Scatter Plot                  Plot the relationship between 2 variables, looking for any correlation.
Simple Bar Charts             Compare the size or frequency of different categories of the same variable.
48
Selecting the Right Presentation Method (3)
Type of Graph or Diagram    Application
Grouped Bar Chart           Compare the sizes or frequencies of different categories across 2-4 data sets.
Stacked Bar Chart           Compare totals and display component parts for several data groups.
Pie Chart                   Display parts of a whole.
49
Selecting the Right Presentation Method (4)
Type of Graph or Diagram    Application
Spot Map                    Display locations of cases or occurrences.
Area Map                    Display occurrences or indicators as they correspond to geographic divisions.
50
Summary
Tables, charts, and graphs are effective tools for organizing, summarizing, and communicating data
In order to effectively communicate data, the correct presentation method must be selected