Towards Common Standards for Studies of Software Engineering Tools and Tool Features
Timothy C. Lethbridge
University of Ottawa
Premise: It is desirable to guide researchers studying SE tools
Proposal: Create an inventory of practices to guide such studies
Researchers could then create papers that would be more comparable, more easily reviewable and more indexable.
Types of Evaluation Commonly Found in Tools Papers
a) None - just a description
b) Includes rationale
c) Demonstration of adoption
d) Anecdotes and lessons learned
e) Informal studies - includes descriptive stats
f) Formal experiments involving students
g) Formal experiments involving practitioners
Case study papers: some combination of b to e
Experimental papers: f and g, but beware of overconfidence in results
Papers of types e, f and g would benefit from following consistent patterns to facilitate comparability
Inventory of measures
The following are purely examples that might be found in such an inventory:
M1. Time taken to perform a given task.
M2. Amount of a given task completed correctly in a fixed time. The fixed time might depend on the task.
M3. Errors made in a given task.
M4. Subjective answers on a scale to specific questions. (Questions to be listed in the inventory.)
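As a rough illustration of how M1 to M4 might be recorded for one participant and one task, the sketch below uses a small Python record. The class name `Trial`, its field names, the idea of counting task "steps" for M2, and the sample values are all assumptions made for illustration; the inventory itself does not prescribe any of them.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One participant attempting one task (hypothetical record layout)."""
    participant: str
    seconds_taken: float  # M1: time taken to perform the task
    steps_correct: int    # steps completed correctly within the fixed time
    steps_total: int      # steps that make up the whole task
    errors: int           # M3: errors made during the task
    rating: int           # M4: answer to one subjective question on a scale


def m2_fraction_correct(trial: Trial) -> float:
    """M2: amount of the task completed correctly in the fixed time, as a fraction."""
    if trial.steps_total <= 0:
        raise ValueError("steps_total must be positive")
    return trial.steps_correct / trial.steps_total


# Made-up values, for illustration only.
trial = Trial("P01", seconds_taken=312.0, steps_correct=6, steps_total=8,
              errors=2, rating=4)
print(m2_fraction_correct(trial))
```

Expressing M2 as a fraction is only one option; a study could equally report the raw amount completed, provided the choice is stated so that papers remain comparable.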
Inventory of study types
ST1. Usability evaluation of a specific feature or tool implementation.
Helps ensure that results from other study types are not confounded purely by poor usability.
Provides evidence for these research questions:
Q1a To what extent is the feature or tool usable?
Measures: M1, M2 and M3 (compared against a threshold).
Q1b What usability defects are present and which ones should be repaired? (qualitative)
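Comparing M1, M2 and M3 against a threshold, as Q1a suggests, could be sketched as follows. The threshold values, the dictionary keys, the use of the mean as the summary statistic and the sample data are all assumptions; a real inventory would have to define them.

```python
from statistics import mean

# Hypothetical thresholds; not taken from the inventory.
THRESHOLDS = {
    "m1_seconds": 300.0,  # mean time must be at or below this
    "m2_fraction": 0.8,   # mean fraction completed correctly must be at or above this
    "m3_errors": 2.0,     # mean error count must be at or below this
}


def usability_check(m1_times, m2_fractions, m3_errors, thresholds=THRESHOLDS):
    """Return a dict saying which of M1, M2 and M3 meet their threshold."""
    return {
        "M1": mean(m1_times) <= thresholds["m1_seconds"],
        "M2": mean(m2_fractions) >= thresholds["m2_fraction"],
        "M3": mean(m3_errors) <= thresholds["m3_errors"],
    }


# Made-up data for three participants.
result = usability_check(
    m1_times=[240.0, 310.0, 280.0],
    m2_fractions=[1.0, 0.75, 0.5],
    m3_errors=[1, 3, 2],
)
print(result)
```

In this made-up example the time and error thresholds are met but the completion threshold is not, which would point to Q1b: finding the usability defects behind the shortfall.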
ST2. Comparison of a small number of different feature implementations, each providing roughly the same functionality.
Provides evidence for these research questions:
Q2a What is the best user interface for a certain feature?
Measures: M1, M2, M3, M4 (measured separately for each implementation)
Q2b What comments do users have about each implementation? (qualitative)
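Measuring "separately for each implementation" amounts to grouping observations by implementation before summarizing them. A minimal sketch, assuming observations arrive as (implementation, measure, value) rows; the implementation names "menu" and "toolbar" and all values are made up.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_implementation(observations):
    """Group (implementation, measure, value) rows and average each group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for implementation, measure, value in observations:
        grouped[implementation][measure].append(value)
    return {
        impl: {measure: mean(values) for measure, values in measures.items()}
        for impl, measures in grouped.items()
    }


# Made-up observations for two hypothetical implementations of one feature.
observations = [
    ("menu", "M1", 40.0), ("menu", "M1", 50.0),
    ("menu", "M3", 1), ("menu", "M3", 3),
    ("toolbar", "M1", 30.0), ("toolbar", "M1", 34.0),
    ("toolbar", "M3", 0), ("toolbar", "M3", 2),
]
summary = summarize_by_implementation(observations)
print(summary)
```

Means alone do not show whether a difference between implementations is reliable; that caution about overconfidence applies here as much as to formal experiments.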
ST3. Comparison of two alternative feature sets that achieve roughly the same goal, but in different ways.
Provides evidence for these research questions:
Q3 What is the 'best' functionality for a certain task?
Measures: M1, M2, M3, M4 (measured separately for each feature set)
ST4. Comparison of presence and absence of a feature (or of a small feature set) in a tool.
Provides evidence for these research questions:
Q4a Is the feature worth including in a final tool set?
Measures: M1, M2, M3 (measured separately for the tool with the features present and with them absent)
Q4b What benefits are provided by the feature? (qualitative)
ST5. Determination of which specific combinations of features are most useful as the context varies.
Provides evidence for these research questions:
Q5 Which features should be available in a given tool so the tool can be used in a variety of contexts?
Measures: M1, M2, M3, M4a, M4c (measured as the feature sets and contexts are varied in different combinations)
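Varying feature sets and contexts "in different combinations" can be read as enumerating study conditions. The sketch below lists every non-empty feature subset crossed with every context; the feature and context names are hypothetical, and a full crossing is only one possible design.

```python
from itertools import combinations, product

features = ["search", "history", "diagram"]    # hypothetical feature names
contexts = ["maintenance", "new development"]  # hypothetical contexts


def feature_sets(features, min_size=1):
    """All subsets of the features with at least min_size members, smallest first."""
    sets = []
    for size in range(min_size, len(features) + 1):
        sets.extend(combinations(features, size))
    return sets


# Each condition pairs one feature set with one context.
conditions = list(product(feature_sets(features), contexts))
print(len(conditions))
for feature_set, context in conditions[:3]:
    print(feature_set, context)
```

The number of conditions grows exponentially with the number of features, so a practical ST5 study would probably measure only a chosen subset of these combinations.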
ST6. Comparison of entire tools, each incorporating sets of features. Less abstract than ST3.
Provides evidence for these research questions:
Q6 Which of several tools is best used for a given task?
Measures: M1, M2, M3, M4 (measured separately for each tool)