Providing Help with Statistical Concepts and Terms:
Enhanced Glossary and Ontology
Stephanie W. Haas
Ron Brown
Cristina Pattuelli
Development of Enhanced Glossary
term
content
format
context specificity
presentations
user control
ontology
Terms
• Include terms that users frequently encounter on agency sites, not comprehensive dictionary
• Basic level of statistical literacy, not highly technical resource
• Strategies for term identification
– examination of frequently-visited pages
– anecdotal evidence from agency and non-agency consultants
– metadata user study
– webcrawl of agency sites
Content
• Provide basic level of explanation
• May include:
– definition
– example
– brief tutorial
– demonstration
– interactive simulation
– combination
• May incorporate related terms and concepts
• Give pointers to more complete and/or more technical explanations
Context specificity
• Explanations provided at varying levels of specificity
– General, context-free, “universal”
– Agency- or concept-specific, incorporating entities from agriculture, labor, science R&D, energy, etc.
– Table- or statistic-specific, based on a single row, column, or statistic, e.g., CPI, national death rate, gasoline prices in NY state, etc.
• Provide explanations of term or concept that are as relevant to user’s current context as possible.
• When user invokes help on a term, the most specific explanations available are offered.
• If there is no explanation for that specific statistic or table, more general (e.g., agency-specific) ones are offered. Default is “universal” level.
• Path from specific to general is based on the ontology.
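The fallback from the most specific explanation to the “universal” one can be sketched as a walk up a parent chain. This is a minimal illustration, not the project's implementation: the `PARENT` and `EXPLANATIONS` tables, the context names, and the function `find_explanation` are all hypothetical, and the universal explanation text is a placeholder.

```python
# Hypothetical ontology fragment: each context points to its more general parent.
PARENT = {
    "NY state weekly gasoline prices": "EIA weekly gasoline prices",
    "EIA weekly gasoline prices": "energy",
    "energy": "universal",
}

# Hypothetical explanation store, keyed by (term, context).
EXPLANATIONS = {
    ("sample", "NY state weekly gasoline prices"): "sample of New York retail gasoline outlets",
    ("sample", "EIA weekly gasoline prices"): "900 retail gasoline outlets",
    ("sample", "universal"): "(placeholder: universal explanation of 'sample')",
    ("population", "EIA weekly gasoline prices"): "all retail gasoline outlets",
    ("population", "universal"): "(placeholder: universal explanation of 'population')",
}


def find_explanation(term, context):
    """Return (context, text) for the most specific explanation available.

    Walks from the user's current context toward "universal". A context
    that is not in PARENT falls straight back to "universal" (the default).
    Returns None if the term has no explanation at any level.
    """
    current = context
    while current is not None:
        text = EXPLANATIONS.get((term, current))
        if text is not None:
            return current, text
        if current == "universal":
            current = None  # nothing more general to try
        else:
            current = PARENT.get(current, "universal")
    return None


# A table-specific explanation exists, so it is offered first.
print(find_explanation("sample", "NY state weekly gasoline prices"))
# No NY-specific explanation of "population", so the next level up is used.
print(find_explanation("population", "NY state weekly gasoline prices"))
```

Keeping the parent links separate from the explanation store means a new explanation at any level is picked up without changing the lookup.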
Format
• User can choose desired format of explanation, based on interest, learning style, reading level, hardware/software limitations
– text
– text plus audio (narration)
– graphic
– animation
– interactive
User Control
• Make glossary help attractive and accessible
• Help users understand the statistics they find without interrupting their information-seeking task
• Let users know when help is available
• Let users choose the format and specificity they desire
• Control mechanisms, e.g., means of invocation and termination, pop-up windows, mouse-overs, etc.
Creating the Ontology
• Select ontology editor to meet our needs
• Include terms and concepts to support glossary
– May need “connecting nodes” that aren’t in glossary
• Relationships
– standard: isa, instance, etc.
– domain-specific: predicts, smoothes, etc.
• Visualization tools for end users (future work)
Ontology support for glossary
Relationships support design and display of term explanations
• Specificity of explanations
– inheritance of more general explanations
• Explanation templates
– sample: samples for specific surveys
– index: CPI, Antiknock Index
• Related terms
– incorporation into tutorial
– population, sample
Current Coverage
• adjustment
– universal
– age adjustment: FL death rates
– seasonal adjustment: NY unemployment rate
• index
– universal, CPI, Antiknock Index
• population, parameter, sample, statistic
– universal, weekly gasoline prices, NY state weekly gasoline prices, height & weight of U.S. adult residents
Mock-ups
population & sample (1)
Suppose this picture represents the population of people in the entire country. In this population, a certain percentage (p) of people like dogs. In this example, 10 of the 50 people like dogs. p is the parameter that measures this view of the population. It is the value that you would get if you could survey the entire population. 20% of the people in this population like dogs.
[Figure: Population. Legend: dislikes dogs, likes dogs]
p = 10/50 = .2 = 20%
population & sample (2)
In real life it is difficult to survey the entire population, so we take a sample. We can then count the number of people in the sample who like dogs, and calculate a statistic (P*) that is an estimate of the value of p. In this case, P* overestimates the value of the parameter p.
[Figure: Population with the Sample marked. Legend: dislikes dogs, likes dogs]
p = 10/50 = .2 = 20%
P* = 3/10 = .3 = 30%
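The arithmetic in the dog example can be reproduced in a few lines. This is only a sketch of the slide's numbers; the list encoding and the helper name `proportion_liking` are choices made here for illustration.

```python
# Population from the mock-up: 50 people, 10 of whom like dogs.
population = ["likes"] * 10 + ["dislikes"] * 40

# Sample from the mock-up: 10 people, 3 of whom like dogs.
sample = ["likes"] * 3 + ["dislikes"] * 7


def proportion_liking(group):
    """Fraction of the group that likes dogs."""
    return group.count("likes") / len(group)


p = proportion_liking(population)   # parameter: describes the whole population
p_star = proportion_liking(sample)  # statistic: estimate computed from the sample

print(f"p  = {p:.0%}")       # prints "p  = 20%"
print(f"P* = {p_star:.0%}")  # prints "P* = 30%"
print("P* overestimates p" if p_star > p else "P* does not overestimate p")
```

A different sample of 10 people could give a different value of P*, which is why the statistic is only an estimate of p.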
EIA weekly gasoline prices
Every Monday, retail prices for all three grades of gasoline are collected by telephone from a sample of approximately 900 retail gasoline outlets.
Reported in: Weekly U.S. Retail Gasoline Prices, Regular Grade
Dollars per gallon, including all taxes
http://www.eia.doe.gov/oil_gas/petroleum/data_publications/wrgp/mogas_home_page.html
• text example of population and sample for this table
• graphical example of population and sample for this table
population: all retail gasoline outlets
sample: 900 retail gasoline outlets
regular gasoline, mean price/gallon, 9/30/02 = $1.413
graphical example of population & sample, gasoline prices
• text example of sample for NY
• graphical example of sample for NY
9/30/02
sample of New York retail gasoline outlets
mean cost = $1.529 per gallon
graphical example of sample, NY gasoline prices
• graphical example of population and sample for body measurements
graphical example of population and sample for body measurements
each participant represents approximately 50,000 other U.S. residents
5,000 individuals are surveyed annually
[Ontology diagram: Population is_described_by Parameter (mean, standard_deviation). Sample is_described_by Statistic (sample_mean, sample_standard_deviation). Sample is part of Population. Statistic is a predictor of Parameter.]
[Ontology diagram, second view: the same nodes with additional "is a predictor of" links.]
[Ontology diagram: instances of Population and Sample, connected by "instance of" and "is part of" links. Sample is part of Population.]
Population instances: U.S. residents; U.S. retail gasoline outlets; NY State retail gasoline outlets; U.S. R&D companies
Sample instances: 5,000 U.S. residents/yr; 900 U.S. retail gasoline outlets; n NY State retail gasoline outlets; n U.S. R&D companies
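One common way to hold relationships like those in the ontology diagrams is a list of (subject, relation, object) triples. The sketch below is an assumption about representation, not the ontology editor's format. The relation names are taken from the diagrams, but the pairing of each sample with its population is a reading of the diagram layout, and the helpers `objects` and `subjects` are hypothetical.

```python
# (subject, relation, object) triples read off the diagrams.
TRIPLES = [
    ("Population", "is_described_by", "Parameter"),
    ("Sample", "is_described_by", "Statistic"),
    ("Sample", "is part of", "Population"),
    ("Statistic", "is a predictor of", "Parameter"),
    ("U.S. retail gasoline outlets", "instance of", "Population"),
    ("900 U.S. retail gasoline outlets", "instance of", "Sample"),
    # Assumed pairing of a sample instance with its population instance.
    ("900 U.S. retail gasoline outlets", "is part of", "U.S. retail gasoline outlets"),
]


def objects(subject, relation):
    """All objects linked from `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def subjects(relation, obj):
    """All subjects linked to `obj` by `relation`."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]


# Which population is the 900-outlet sample drawn from?
print(objects("900 U.S. retail gasoline outlets", "is part of"))
# Which concepts describe, or are predicted for, a Parameter?
print(subjects("is a predictor of", "Parameter"))
```

Domain-specific relations such as "is a predictor of" sit alongside standard ones such as "instance of" without any change to the lookup code.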
Index
An index combines numbers measuring different things into a single number. The single number represents all the different measures in a compact, easy-to-use form. Values for an index can be compared to each other, for example, over time.
[Figure: a combiner takes several input numbers (10.1, 103, 24.759, 6, 42) and produces index = 12.3]
[Figure: index values from the Jan., Apr., Jul., and Oct. combiners plotted over the year: 12.3, 13.1, 13.9, 14.3]
The index has increased this year.
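The idea of a combiner, and of comparing index values over time, can be sketched as follows. The slide does not say how its combiner works, so the weighted average in `combine` is an assumption chosen for illustration, and `percent_change` is a hypothetical helper. Only the four quarterly index values come from the slide.

```python
def combine(measures, weights):
    """Weighted average of several measures.

    One possible combiner, assumed here for illustration only.
    """
    if len(measures) != len(weights):
        raise ValueError("measures and weights must have the same length")
    return sum(m * w for m, w in zip(measures, weights)) / sum(weights)


def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100


# Index values shown on the slide for one year.
index_by_month = {"Jan": 12.3, "Apr": 13.1, "Jul": 13.9, "Oct": 14.3}

change = percent_change(index_by_month["Jan"], index_by_month["Oct"])
print(f"Change from Jan to Oct: {change:.1f}%")  # prints "Change from Jan to Oct: 16.3%"
```

Because each value is a single number, comparing two dates reduces to one subtraction and one division, which is the convenience the slide describes.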
Consumer Price Index (CPI)
The Consumer Price Index (CPI) represents changes in prices of all goods and services purchased for consumption by urban households. It combines prices into a single number that can be compared over time.
Items are classified into 8 major groups:
• Food and Beverages
• Housing
• Apparel
• Transportation
• Medical Care
• Recreation
• Education and Communication
• Other
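[Figure: the CPI combiner takes the eight groups as inputs (food & beverage, housing, apparel, transportation, medical care, recreation, education & communication, other) and produces the Consumer Price Index; one item in the figure is labeled Telephone]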
The Consumer Price Index has increased since 1995.
[Figure: CPI values from the 1997, 1998, 1999, 2000, and 2001 CPI combiners plotted by year; vertical axis from 160 to 180]
Antiknock Index, also known as Octane Rating
A number used to indicate gasoline’s antiknock performance in motor vehicle engines. The two recognized laboratory engine test methods for determining the antiknock rating, i.e., octane rating, of gasolines are the Research method and the Motor method. In the United States, to provide a single number as guidance to the consumer, the antiknock index (R+M)/2, which is the average of the Research and Motor octane numbers, was developed.
http://www.eia.doe.gov/glossary/glossary_a.htm
[Figure: octane numbers from the Research method and the Motor method feed the Antiknock combiner, (R + M)/2]
Regular: 85 - 88
Midrange: 88 - 90
Premium: 90 or above
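The antiknock formula is simple enough to show directly. In the sketch below, `antiknock_index` implements (R + M)/2 as given in the definition. The `grade` function is a hypothetical helper: the grade ranges on the slide share the boundary values 88 and 90, so assigning a boundary value to the higher grade is an assumption made here. The input octane numbers are made up for the example.

```python
def antiknock_index(research_octane, motor_octane):
    """Antiknock index (octane rating): average of the Research and Motor octane numbers."""
    return (research_octane + motor_octane) / 2


def grade(aki):
    """Map an antiknock index to a grade name.

    Assumption: a value on a shared boundary (88 or 90) goes to the higher grade.
    """
    if aki >= 90:
        return "Premium"
    if aki >= 88:
        return "Midrange"
    if aki >= 85:
        return "Regular"
    return "below Regular"


# Made-up example inputs: R = 91, M = 83.
aki = antiknock_index(91, 83)
print(aki, grade(aki))  # prints "87.0 Regular"
```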
Next Steps
• expand coverage of core terms
– webcrawl indicates measures of central tendency are next: average, mean, median, mode
• expand coverage of ontology
• expand presentation examples
– animations, simulations
• explore user controls
• user study of effectiveness