Scientific Software Engineering Methods and Their Validity
This presentation is part of a seminar series on scientific methods for software engineering.
Technische Universität München
Philosophy of Science
Scientific Methods and their Validity
Dr. Daniel Méndez Fernández Dr. Antonio Vetrò
Prof. Dr. Manfred Broy
Technische Universität München Institute for Informatics
Software & Systems Engineering
Goals of the talk
§ Get (back) to a bigger picture
– Start from a general point of view in the philosophy of science
– Drill down to implications for everyday scientific work (projects, publications, PhD thesis, …)
§ Discuss …
– how to place the presented methods in that picture
– the methods in the context of a PhD dissertation
– the notion of validity and how to increase it
Agenda
§ Postulate § Scientific Methods Overview § Scientific Methods “in Action” § In Quest for the Validity
What is science?
Science: Systematically and objectively gaining (and preserving), documenting, and disseminating knowledge
§ In principle, science tries to be objective by aspiring to knowledge based on “facts” (independent of subjective judgment!)
However:
§ Accepting scientific results is a social process (documentation, communication, following rules).
§ Some elements of science (mathematics, logic) seem to be unbiased, but they nevertheless rely on acceptance by peers and on the capability to apply the theories.
§ One could also say: “In the end, it is also a matter of beliefs, capability, and individual and social judgment” (following some basic principles, rules, and codes)
Philosophy and science
§ Ontology (“Seinslehre”): ontological questions (“Außenweltproblem”), i.e. questions on “being” → bound to reality
§ Epistemology (“Erkenntnislehre”): epistemological questions (“Erkenntnisproblem”), i.e. questions on observation / discovery → object-subject relation
§ Ethics (“Verhaltenslehre”): ethical questions (“Verhaltensproblem”), i.e. questions on actions → bound to morality
From: Orkunoglu, 2010
Philosophy and science
§ Ontology (“Seinslehre”): ontological questions (“Außenweltproblem”)
– Is there a world independent of subjectivity?
– Positions: Idealism, Realism, Solipsism
§ Epistemology (“Erkenntnislehre”): epistemological questions (“Erkenntnisproblem”)
– Where do discoveries come from? From experience?
– Positions: Rationalism, Empiricism, Scepticism
§ Ethics (“Verhaltenslehre”): ethical questions (“Verhaltensproblem”)
– Where does ethics come from? Does something like universal ethics exist?
– Positions: Normative Ethics, Descriptive Ethics, Everyday Ethics
From: Orkunoglu, 2010
What is science?
§ Aristotle (384-322 BC)
– Search for truth
– Search for laws and reasoning for phenomena
– Understanding the nature of phenomena
§ Francis Bacon (1561-1626)
– Progress of knowledge of nature (reality)
– Draw benefits from growing knowledge
§ Era of (French) Enlightenment (Voltaire (1694-1778), Diderot (1713-1784))
– Emancipation from god and beliefs
§ Kant (1724-1804)
– System of Epistemology
§ Constructivism (H. von Foerster (1911-2002), N. Luhmann (1927-1998))
– Subjective construction
From: Orkunoglu, 2010
What is science?
Science:
§ Theory: formal theories, deduction, models, predictions, explanations, …
§ Empiricism: observations, experiments, facts, …
§ Communication: intersubjective evaluations, agreement, …
Adapted from: Orkunoglu, 2010
3 objectives of science:
• Analyse and explain
• Predict
• Design
Engineering approach: developing tools and techniques to solve practical problems by means of existing technology and available knowledge. Is this science?
What is the notion of “Truth”?
§ We speak about truth if no subjective interpretation and distortion is possible
§ We could also say: “Whenever I repeat my treatment on a certain population, it will always lead to the same observation”
§ If we have “universal truth”, we can call our results “generalisable” (“externally valid”)
Challenges: Obtaining truth
§ Can we obtain something like “universal truth”?
§ Can we do so in a lifetime? Or even within a PhD?
§ What if my observations/interpretations/analyses depend on human factors?
→ Things can be true for certain contexts only!
Image: Sjøberg, 2011
A major challenge: Human factors
Why are human factors important to our field?
§ Software Engineering is an engineering discipline applied by human beings.
§ The value of solutions to practical problems too often depends on those who apply the solutions.
What implications can we draw from that?
§ The notion of truth is “threatened” by subjectivity.
→ The good: We can make use of that subjectivity (e.g. “expert opinion”)
→ The bad: We need to be aware of the implications (e.g. the threats to external validity)
→ The ugly: When relying on subjects, we will never obtain full external validity
… One could also say: “Outside mathematics, there is no certainty.”
Truth in science is relative!
The different views on science
§ Science is created by humans
– sociology of science
– psychology of science (or scientists)
– economy of science
§ Science as knowledge creation (discovery)
– theory of knowledge
– knowledge and insight
– understanding and explanation
§ Science as a means to change the world
– creative science
– science and power
– science and technology
– design
Agenda
§ Postulate § Scientific Methods Overview § Scientific Methods “in Action” § In Quest for the Validity
Big Picture… 1st layer
Philosophy of science
Principle ways of working
Methods and Tools
Fundamental Theories
Epistemology (“Erkenntnistheorie”)
Empirical methods
Statistics
Hypothesis testing
Case studies
Logic
Examples
Theories
In Software Engineering, we rely on every layer!
Philosophy of science
Principle ways of working
Methods and Tools
Fundamental Theories
Setting of Empirical Software Engineering:
§ Methods and tools
§ Support theory building and evaluation
§ Analogy: Theoretical and Experimental Physics
What do we usually need (e.g. in a PhD)?
Philosophy of science
Principle ways of working
Fundamental Theories
Methods and Tools
You are (usually) here
Big Picture… 2nd layer
Theory/System of theories
(Tentative) Hypotheses
Observations / Evaluations
Study Population
Induction
Pattern Building
Deduction
Falsification / Support
Further reading: Runeson et al. Case Study Research in Software Engineering: Guidelines and Examples
Theory Building
Big Picture… 3rd layer: Methods and Tools
§ Each method I can apply…
– Has a specific purpose
– Relies on a specific data type
Purposes
§ Exploratory
§ Descriptive
§ Explanatory
§ Improving
Data Types
§ Qualitative
§ Quantitative
Example: Grounded Theory
§ Purpose: Exploratory, Descriptive, or Explanatory
§ Data type: Qualitative Data
§ Elements involved: Study Population, (Tentative) Hypotheses
Big Picture… 3rd layer: Methods and Tools
Further reading: Runeson et al. Case Study Research in Software Engineering: Guidelines and Examples
Theory/System of theories
(Tentative) Hypotheses
Observations / Evaluations
Study Population
Pattern Building
Falsification / Support
Theory Building
Methods placed in the picture:
§ Formal / conceptual analysis
§ Grounded theory
§ Confirmatory
• Case & Field Studies
• Experiments
• Simulations
§ Survey and Interview Research
§ Exploratory
• Case & Field Studies
• Data Analysis
§ Field Study Research
§ Ethnographic Studies
§ Folklore Gathering
For now, prototyping is not part of this “method view” (neither are reference models)
How much external validity can I expect from applying the methods we usually apply?
[Figure: methods arranged by environment (from artificial environment to reality) and by level of evidence: (Lab) Experiment, Simulation, Case Study Research, Survey Research, Action Research, ...]
You shall only get a feeling, please don’t sue us
We distinguish different levels of evidence
Further reading: Wohlin. An Evidence Profile for Software Engineering Research and Practice
Evidence is collected both for (+) and against (-) a claim, each on the following levels:
§ Strong evidence
§ Evidence
§ Circumstantial evidence
§ Third-party claim
§ First- or second-party claim
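The levels of evidence can be captured in a small data structure. The following is a minimal sketch, not Wohlin's own notation: the names EvidenceLevel, EvidenceItem and strongest, as well as the example items, are hypothetical. Only the ordering of the five levels and the split into evidence for and against a claim are taken from the slide.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Optional


class EvidenceLevel(IntEnum):
    """Levels of evidence, ordered from weakest (1) to strongest (5)."""
    FIRST_OR_SECOND_PARTY_CLAIM = 1
    THIRD_PARTY_CLAIM = 2
    CIRCUMSTANTIAL_EVIDENCE = 3
    EVIDENCE = 4
    STRONG_EVIDENCE = 5


@dataclass(frozen=True)
class EvidenceItem:
    source: str            # where the item comes from (hypothetical label)
    level: EvidenceLevel   # how strong the item is
    supports: bool         # True: for the claim, False: against it


def strongest(items: Iterable[EvidenceItem], supports: bool) -> Optional[EvidenceLevel]:
    """Return the strongest level found on one side of the profile, or None."""
    levels = [item.level for item in items if item.supports == supports]
    return max(levels) if levels else None


# Hypothetical profile for one claim.
profile = [
    EvidenceItem("tool vendor statement", EvidenceLevel.FIRST_OR_SECOND_PARTY_CLAIM, True),
    EvidenceItem("independent case study", EvidenceLevel.EVIDENCE, True),
    EvidenceItem("anecdote from another team", EvidenceLevel.THIRD_PARTY_CLAIM, False),
]
print("strongest for:", strongest(profile, supports=True))
print("strongest against:", strongest(profile, supports=False))
```

Keeping both sides of the profile in one list makes it easy to report the strongest item for and against a claim side by side, instead of only counting supporting studies.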
Agenda
§ Postulate § Scientific Methods Overview § Scientific Methods “in Action” § In Quest for the Validity
Preliminary remarks: A PhD thesis can have many contributions
Possible contributions
§ Exploration / evaluation of concepts and dependencies
§ Identification of problems and / or deficiencies in existing assumptions
§ Contributions to a precise terminology
§ New views on existing concepts and transfer of those concepts to new fields of application
§ New methods / methodologies
§ New theories
§ …
Important:
§ Identification of scientific contribution
There is no one and only way of writing a “good thesis”
Scientific methods
§ Theories
– Consistent, complete, …
– Validation (of accuracy)
§ Dialectic
§ Empirical methods
– Experiments
– Case/Field Studies
– …
§ Literature analyses
§ …
Important:
§ Scientific evaluation
– Empirical
– Experimental
– Theoretical
– Positioning against state of science
– …
What can be the scope of a thesis?
[Figure: scientific methods connect starting points and outcomes]
Starting points: Practical Problem; Existing Theory
Outcomes: Evidently solve a problem (or parts of it); Refine Theory; Provide guidance for future research
Inspired by: Shneiderman, keynote at ESEM 2013
Problem solving
[Images: “How it should be” vs. “How it often is in reality”]
Source: http://researchinprogress.tumblr.com
Let’s engineer problem discovery & solving
Engineering cycle
§ Implementation Evaluation / Problem Investigation
– Stakeholders, goals?
– Phenomena? Effects?
– (Lack of) contribution to goals?
§ Treatment Design
– Specify requirements!
– Contribution to goals?
– Available treatments?
– Design new ones!
§ Design Validation
– Effects of treatment in this context?
– Effects satisfy requirements?
– Trade-offs?
– Sensitivity?
§ Treatment Implementation
– Transfer to practice!
Further reading: Wieringa, R.J.: Relevance and problem choice in design science. In: Global Perspectives on Design Science Research. Lecture Notes in Computer Science (2010) 61–76
In any case, stick to the code of scientific working!
Principles in scientific work and behaviour
1. Integrity
2. Honesty
3. Transparency and accuracy
4. Rationalism
Principles of working (and writing)
§ Clearly and objectively outline the goals, methods and contribution of your thesis
– motivation
– relevance
– validity
§ Describe related work, gaps left open, and how you intend to close those gaps
§ Choose appropriate methods (and reflect on them)
§ Work in teams!
If working in teams
§ Clarify your own (individual) contributions as soon as possible
– Publish together with clear (predefined) authorship
– Make your work transparent
• Discuss with colleagues from your research group (or from other groups)
• Disseminate your results (and get feedback)
→ In the end, however, be aware: only your individual contribution counts!
§ Dissertations and (funded) research projects
– Dissertation results can (and often should) be part of research projects
– Problems: Potentially different goals, time constraints, …
– Instrument:
• Make clear (and discuss) your own contributions
• Publish your results, also in early stages
Finally: There is a formal code of ethics for researchers
The seven principles of the code, intended to guide scientists' actions, are:
§ Act with skill and care in all scientific work. Maintain up to date skills and assist their development in others.
§ Take steps to prevent corrupt practices and professional misconduct. Declare conflicts of interest.
§ Be alert to the ways in which research derives from and affects the work of other people, and respect the rights and reputations of others.
§ Ensure that your work is lawful and justified.
§ Minimize and justify any adverse effect your work may have on people, animals and the natural environment.
§ Seek to discuss the issues that science raises for society. Listen to the aspirations and concerns of others.
§ Do not knowingly mislead, or allow others to be misled, about scientific matters. Present and review scientific evidence, theory or interpretation honestly and accurately.
Source: David King 2007, the UK government's chief scientific advisor
Professional and ethical responsibility
§ Software engineering involves wider responsibilities than simply the application of technical skills
§ Software engineers must behave in an honest and ethically responsible way if they are to be respected as professionals
§ Ethical behaviour is more than simply upholding the law § Principles:
– Confidentiality – Competence – Intellectual property rights – Refrain from computer misuse – …
Further reading: M. Broy and B. Berenbach, Professional and Ethical Dilemmas in Software Engineering, IEEE Computer 2009
ACM/IEEE Code of Ethics
§ Software engineers shall commit themselves to making the analysis, specification, design, development, testing and maintenance of software a beneficial and respected profession. In accordance with their commitment to the health, safety and welfare of the public, software engineers shall adhere to the following Eight Principles: – PUBLIC INTEREST
– CLIENT AND EMPLOYER INTEREST
– PRODUCT
– JUDGEMENT
– MANAGEMENT
– PROFESSION
– COLLEAGUES
– SELF
Agenda
§ Postulate § Scientific Methods Overview § Scientific Methods “in Action” § In Quest for the Validity
Postulate
§ There are certain rules and principles for doing scientific work
§ Creation of scientific knowledge follows a number of patterns of scientific method
§ There is a scientific community to judge the quality of scientific work
How to judge the quality of scientific contributions?
§ The notion of quality is multi-faceted... (in general).
§ A scientific contribution as well as the methods used can be evaluated w.r.t.:
– Relevance and impact (theoretical and practical)
– Rigorousness
– Novelty
– Appropriateness
– Validity
– Conformance to scientific rules
– …
Validity – what is it?
In science and statistics, validity is the extent to which a concept, theory, conclusion, or measurement
§ is well-founded
– well-formedness
– preciseness
– consistency
– scope
– ...
§ corresponds accurately to the real world.
Source: Adapted from Wikipedia
Understanding the validity: Why and what?
§ Increase awareness of potential threats in my study regarding
– Level of objectivity (“External Validity”)
– Appropriateness of design to answer research questions (“Construct Validity”)
– Appropriateness of measurements (“Internal Validity”)
→ Support yourself in designing a study
→ Support others in understanding and potentially replicating your study
→ Support yourself and others in better understanding:
• The context of a study
• The limitations of a study
→ Increase the trustworthiness of the results
Types of validity
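[Figure: experiment principles. On the theory level, the experiment objective relates a cause construct to an effect construct (cause-effect construct). On the observation level, the experiment operation relates a treatment (independent variable) to an outcome (dependent variable) (treatment-outcome construct). Numbered links 1 to 4 mark where each validity type applies.]
1. Conclusion 2. Internal 3. Construct 4. External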
Source: Wohlin et al. Experimentation in Software Engineering: An Introduction.
Types of validity
§ The following classification scheme has been established for empirical SE:
1. Conclusion Validity: “In this study, is there a relationship between treatment and outcome?”
2. Internal Validity: “Assuming there is a relationship in this study, is the relationship a causal one?”
3. Construct Validity: “Assuming that there is a causal relationship in this study, can we claim that the treatment reflects our cause construct well and that our measure reflects our idea of the construct of the measure well?”
4. External Validity: “Assuming that there is a causal relationship in this study between the cause and the effect, can we generalize this effect to other persons, places or times?”
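The first question (is there a relationship between treatment and outcome?) is usually answered with a statistical test. The following is a minimal sketch using a two-sided permutation test on the difference of group means. The choice of test, the function name permutation_test, and the sample numbers are assumptions made for illustration; they are not taken from the talk.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns (observed_difference, p_value). The p-value estimates how often
    a difference at least as large as the observed one would appear if the
    group labels were assigned at random (i.e. no relationship).
    """
    rng = random.Random(seed)           # fixed seed keeps the run reproducible
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)             # randomly reassign group labels
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 correction avoids reporting a p-value of exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical outcome measurements (e.g. defects found per reviewer).
with_treatment = [9, 11, 12, 10, 13, 12]
without_treatment = [8, 9, 7, 10, 8, 9]
difference, p = permutation_test(with_treatment, without_treatment)
print(f"difference in means: {difference:.2f}, p-value: {p:.4f}")
```

A small p-value only addresses conclusion validity. Whether the relationship is causal, whether the measures reflect the constructs, and whether the effect generalizes are the separate questions 2 to 4.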
The validity questions are cumulative
§ Validity types build on one another
– Is there a dependency between the cause and the effect?
– Is the dependency causal?
– Can we generalize to the constructs?
– Can we generalize to other persons, places, times?
Adapted from William M.K. Trochim, 2008
Validity is not just the last paragraph of a paper!
Validity evaluation is part of research planning!
§ For each threat type, a list of threats is available in [Cook79] and [Campbell63]
– Credibility
– Transferability
– Confirmability
– …
§ Priority among the threats is a matter of optimization
§ Possible rank in theory testing:
– Internal – construct – conclusion – external
§ Possible rank in applied research:
– Internal – external – construct – conclusion
How can I support validity in general?
In general, we have 2 possibilities:
1. Support the validity by construction (often referred to as “validity procedures”)
2. Increase the validity after the fact
Constructively supporting validity
Conclusion Validity
§ Capture and critically discuss statistical assumptions and estimate the probability of making errors
§ Draw baselines to compare representatives of samples (e.g., in surveys)
Internal Validity
§ Minimise side-effects and confounding factors, e.g., wording in questionnaires, effects caused by the interviewer and by action research
§ Be unbiased!
§ Refer to method and subject triangulation
Construct Validity
§ Reproducibly define research questions and methods (e.g. by using GQM)
External Validity
§ Observe and explain objects and subjects → Qualitative studies
§ Refer to data triangulation
§ Refer to independent replication studies!
Further Tips
§ Define and report the study according to available guidelines
§ Be patient, be flexible
§ Recognise the positive value of checking the threats to validity!
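One way to “estimate the probability of making errors” for conclusion validity is to look at how the chance of at least one false positive grows when many hypotheses are tested. The sketch below assumes independent tests and uses the Bonferroni correction as one possible countermeasure; both function names and the numbers are illustrative assumptions, not part of the talk.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at or
    below alpha (Bonferroni correction)."""
    return alpha / n_tests


# Hypothetical study: 20 correlations tested at the usual 5% level.
alpha, n_tests = 0.05, 20
print(f"family-wise error rate: {familywise_error_rate(alpha, n_tests):.3f}")
print(f"per-test threshold after correction: {bonferroni_threshold(alpha, n_tests):.4f}")
```

The correction is conservative; it is shown here only because it makes the trade-off between the number of tests and the risk of a spurious finding easy to see.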
Example
§ Comparing four approaches for technical debt identification, Nico Zazworka, Antonio Vetrò, Clemente Izurieta, Sunny Wong, Yuanfang Cai, Carolyn Seaman & Forrest Shull, Software Quality Journal, 21(2), 2013
§ Large correlational analyses (~100,000 data points) on 13 releases of the Hadoop open source software to discover relationships between structural quality metrics (at code, design and architectural level) and rework indicators (defect proneness and change proneness)
Threat | Type | Control strategy
Choice of statistical significance thresholds | Conclusion | Literature-based choice of thresholds
Data transformation [0,N] → [0,1] | Conclusion | Distribution check
Metrics not normalized by class size | Conclusion | Correlation check
Correlations found are incidental | Internal | Effect measured on two outcomes
Class size measured by number of methods | Construct | Correlation check
Defect proneness measured by number of bug fixes | Construct | Checked with three different computation methods
Findings generalizability | External | Aggregation on 13 different releases
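Two of the control strategies in this table, the [0,N] → [0,1] transformation and the correlation check against class size, can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of the cited study: min-max scaling and Spearman rank correlation are choices made for this sketch, the function names are hypothetical, and the numbers are made up.

```python
from statistics import mean


def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1        # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-class data: a raw metric value and the number of methods.
metric = [4, 9, 15, 30, 2]
n_methods = [5, 10, 14, 33, 3]

# Correlation check: does the metric mostly mirror class size?
print("metric vs. size:", spearman(metric, n_methods))

# Normalizing by class size and re-checking shows what remains afterwards.
per_method = [m / n for m, n in zip(metric, n_methods)]
print("normalized metric vs. size:", spearman(per_method, n_methods))

# Min-max scaling keeps the order of values, so rank correlations are unchanged.
print("scaled metric:", min_max_scale(metric))
```

A rank-based correlation is used here because it is unaffected by monotonic transformations such as the scaling step, which keeps the two checks independent of each other.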
Increasing the validity after the fact
Independent Confirmation
§ Case study / experimental research of theories by researchers not involved in the development of the theory
§ Replication of experiments or case studies until reaching saturation (or getting retired)
Challenges
§ What can we expect from a PhD thesis?
Discuss! :-)
Some final, but important remarks
§ Don’t focus on the “size” of the problem, but on
– The relevance (the practical, but also the theoretical!)
– The accuracy in the investigation (problem and evaluation research)
§ However: Don’t be afraid to
– aim high!
– be hard-headed!
– (but also accept if things don’t work)
§ When conducting empirical investigations:
– Do not make claims you cannot eventually measure
– The scope / locality … is not the most important thing, as long as:
• The study population is accurately chosen and described
• The validity is carefully outlined
• The conclusions are drawn accordingly
§ Finally: Don’t think in black and white only
– Don’t divide the world into basic and applied research
– Don’t be afraid to look also at other disciplines