Exploratory Testing – STEW 2016


Exploratory testing – black or white? PER RUNESON, ELIZABETH BJARNASON, KAI PETERSEN

Exploratory Testing

• …a powerful approach, yet widely misunderstood
• …orders of magnitude more productive than scripted testing
• …simultaneous learning, test design and test execution

James Bach, exploratory testing evangelist

What is ET?

Exploratory software testing (ET) is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.

Cem Kaner

Sounds promising…

…but…
– impossible to automate
– highly dependent on tester skills
– hard to replicate failures (if testing is not traced)

And, do we really know?

Exploring Exploratory Testing – outline

• Variations of exploratory testing
• Empirical evidence on:
  – Efficiency
  – Relation to knowledge and skills
• Recommendations
• Making exploratory testing actionable

Variations of Exploratory Testing

A spectrum from pure scripted testing to freestyle exploratory testing, defined by what is given to the tester:
• Pure scripted: test object, test steps, test data
• In between: test goals and constraints
• Freestyle: test object only

Test charter types
•  Fully scripted: defined test steps and test data –> no room for exploration.
•  Low degree of exploration: defined high-level goals, restrictions and test steps. The tester chooses test data.
•  Medium degree of exploration: defined high-level goals and additional restrictions, e.g. detailed goals, priorities, risks, tools used, the functionality that needs to be covered, or the test method to be used.
•  High degree of exploration: one or more high-level goals. Beyond that, the tester can freely explore the system.
•  Freestyle: only the test object is provided to the tester.
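To make the spectrum concrete, here is a minimal sketch (not from the original slides; the application, steps and charter fields are invented for illustration) contrasting a fully scripted test case with a high-exploration charter:

```python
# Hypothetical illustration of two points on the scripted-to-exploratory spectrum.

class FakeApp:
    """Stand-in for the system under test (invented for this sketch)."""
    def __init__(self):
        self.page = "login"

    def login(self, user, password):
        # Hypothetical rule: any non-empty credentials succeed.
        self.page = "dashboard" if user and password else "login"

def test_login_fully_scripted():
    """Fully scripted: fixed steps, fixed test data, fixed expected result."""
    app = FakeApp()
    app.login(user="alice", password="correct horse")
    assert app.page == "dashboard"

# High degree of exploration: only a high-level goal and constraints are stated;
# the tester chooses inputs, the order of actions, and the oracle.
high_exploration_charter = {
    "goal": "Explore login and session handling for security-related failures",
    "constraints": ["90-minute session", "test environment only"],
    "given_to_tester": ["test object"],
}

if __name__ == "__main__":
    test_login_fully_scripted()
    print("Scripted check passed; exploratory charter:", high_exploration_charter)
```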

Is Exploratory Testing efficient?

Experiments on TCT vs ET
•  46 students and 24 practitioners in 90-minute sessions [Afzal 2015]
  – Test design included in TCT session
  – Faults found (ET) >> Faults found (TCT)
•  79 students in 90-minute sessions [Itkonen 2007]
  – Test design NOT included in TCT session
  – Faults found (ET) ≈ Faults found (TCT)

Afzal W, Ghazi AN, Itkonen J, Torkar R, Andrews A, Bhatti K: An experiment on the effectiveness and efficiency of exploratory testing. Empirical Software Engineering (2015) 20:844–878. DOI 10.1007/s10664-014-9301-4

Abstract: The exploratory testing (ET) approach is commonly applied in industry, but lacks scientific research. The scientific community needs quantitative results on the performance of ET taken from realistic experimental settings. The objective of this paper is to quantify the effectiveness and efficiency of ET vs. testing with documented test cases (test case based testing, TCT). We performed four controlled experiments where a total of 24 practitioners and 46 students performed manual functional testing using ET and TCT. We measured the number of identified defects in the 90-minute testing sessions, the detection difficulty, severity and types of the detected defects, and the number of false defect reports. The results show that ET found a significantly greater number of defects. ET also found significantly more defects of varying levels of difficulty, types and severity levels. However, the two testing approaches did not differ significantly in terms of the number of false defect reports.

Is Exploratory Testing Efficient?

•  Yes, very efficient if you only run the test case once
•  Equally or more efficient, if you only count execution
•  Not efficient, if you want to automate

CC BY-ND 2.0 Hans Splinter @Flickr

Knowledge in Exploratory Testing

Analysis of 12 ET sessions in 4 units of 3 companies, covering 88 failures [Itkonen 2013]

Knowledge types
•  domain knowledge
•  system knowledge
•  generic software engineering knowledge

Knowledge is applied both for test design and as a test oracle.
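As a hedged illustration (not from the slides or the study; the rule and values are hypothetical), domain knowledge acting as a test oracle during an exploratory session might look like this:

```python
# Hypothetical sketch of a tester's knowledge serving as a test oracle:
# instead of a scripted expected value, a domain rule judges the observed output.

def domain_oracle(invoice_total: float, line_items: list[float]) -> bool:
    """Domain knowledge as an oracle: an invoice total should equal the sum
    of its line items and never be negative (invented rule for illustration)."""
    return invoice_total >= 0 and abs(invoice_total - sum(line_items)) < 0.01

# During an exploratory session the tester picks inputs freely (test design)
# and uses the oracle to recognize a failure in the observed behaviour.
observed_total = 119.0          # hypothetical output from the system under test
items = [100.0, 19.0]
assert domain_oracle(observed_total, items), "possible failure: total disagrees with line items"
```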

Itkonen J, Mäntylä MV, Lassenius C: The Role of the Tester's Knowledge in Exploratory Software Testing. IEEE Transactions on Software Engineering (2013) 39(5):707–724. DOI 10.1109/TSE.2012.55

Abstract: We present a field study on how testers use knowledge while performing exploratory software testing (ET) in industrial settings. We video recorded 12 testing sessions in four industrial organizations, having our subjects think aloud while performing their usual functional testing work. Using applied grounded theory, we analyzed how the subjects performed tests and what type of knowledge they utilized. We discuss how testers recognize failures based on their personal knowledge without detailed test case descriptions. The knowledge is classified under the categories of domain knowledge, system knowledge, and general software engineering knowledge. We found that testers applied their knowledge either as a test oracle to determine whether a result was correct or not, or for test design, to guide them in selecting objects for test and designing tests. Interestingly, a large number of failures, windfall failures, were found outside the actual focus areas of testing as a result of exploratory investigation. We conclude that the way exploratory testers apply their knowledge for test design and failure recognition differs clearly from the test-case-based paradigm and is one of the explanatory factors of the effectiveness of the exploratory testing approach.

Findings on Knowledge

1.  ET is efficient since the testers use different types of personal knowledge, rather than restricting their focus

2.  Failures are incidentally found outside the actual target features of the testing activities

3.  A large fraction of the failures do not require complicated test designs to be provoked

4.  Domain knowledge issues are straightforward to provoke, while system or generic knowledge issues are more complicated to provoke in terms of the number of interacting conditions.

Formal Training in Exploratory Testing

•  Experiment with 20 professionals [Micallef 2016]
  – with/without formal test training
  – 20 injected faults in an e-commerce system
  – up to 40-minute sessions with an eye-tracking device

From [Micallef 2016]:

Table III. Participant demographics and background information (G: gender, A: age, E: years of experience in a testing role).

       Carmen (no formal training)       George (formally trained)
ID     Domain       G   A    E           Domain     G   A    E
1      Design       M   36   0           E-comm     F   31   3
2      E-comm       F   51   1           Payments   M   25   3.5
3      Content      M   30   0           E-comm     M   39   13
4      E-comm       M   34   0           Payments   M   27   1.5
5      E-comm       F   37   0           Telco      M   32   3
6      Gaming       M   26   1           Payments   F   23   3
7      E-comm       F   26   0           E-comm     F   27   7
8      E-comm       F   29   0           Payments   M   31   2
9      E-comm       F   24   0           Networks   M   28   6
10     Virtualiz.   M   24   2           Payments   M   21   1
Median              29.5     0                          27.5  3
Mean                31.7     0.4                        28.4  4.3

Participants tested the e-commerce system exploratorily for up to 40 minutes while a researcher observed from a remote monitor. Each terminal was equipped with a Tobii X-120 eye tracker, a face-tracking camera and a microphone (Fig. 3); participants thought aloud, points of interest were followed up in a reflective think-aloud session, and a short debriefing interview closed each session. Over 13 hours of gaze, audio and video data were reviewed with predefined scoring sheets (Fig. 4), noting behavioural patterns, reported bugs and timestamps. Observed behavioural patterns were mapped to seven exploratory strategies, grouped into unguided (ES1, ES3), semi-guided (ES5, ES6) and guided (ES2, ES4, ES7) strategies.

Micallef M, Porter C, Borg A: Do Exploratory Testers Need Formal Training? An Investigation Using HCI Techniques. 2016 IEEE International Conference on Software Testing, Verification and Validation Workshops (TAIC PART), pp 305–314. DOI 10.1109/ICSTW.2016.31

Abstract: Exploratory software testing is an activity which can be carried out by both untrained and formally trained testers. We personify the former as Carmen and the latter as George. In this paper, we outline a joint research exercise between industry and academia that contributes to the body of knowledge by (1) proposing a data gathering and processing methodology which leverages HCI techniques to characterise the differences in strategies utilised by Carmen and George when approaching an exploratory testing task; and (2) presenting the findings of an initial study amongst twenty participants, ten formally trained testers and another ten with no formal training. Our results shed light on the types of strategies used by each type of tester, how they are used, the effectiveness of each type of strategy in terms of finding bugs, and the types of bugs each tester/strategy combination uncovers. We also demonstrate how our methodology can be used to help assemble and manage exploratory testing teams in the real world.

From the paper's introduction: the study is motivated by the debate on whether software testers should be formally trained and certified, and by the observation that exploratory testing can be carried out by both trained and untrained testers. The hypothesis is that testers with a formal qualification in software testing and testers with no such training intuitively use different yet complementary exploratory testing strategies. Three research questions are investigated:

(RQ1) Which types of exploratory testing strategies are utilised by testers in each group?
(RQ2) Which types of bugs are found by testers in each group?
(RQ3) Is there a link between the bugs found and the testing strategies adopted by the tester groups?

Since testers rarely possess explicit, standardised knowledge of the strategies they use, the authors designed a methodology, drawing on Human Computer Interaction (HCI) techniques, that non-intrusively extracts which strategies participants are using at any point in time.

Do Exploratory Testers Need Formal Training?

Fig. 5 ([Micallef 2016]): the scale of test strategies, from unguided to guided.

Evaluation: quantitative analysis of the scoring data compared behavioural patterns between tester categories, rigour in the use of test strategies, and effectiveness of strategy use in terms of bugs discovered per category. A questionnaire sent to around 40 software quality assurance professionals provided a measure of the perceived importance of each bug category, and a group of testing professionals reviewed and commented on the main results. The study concludes with recommendations on the assembly and management of exploratory testing teams.

IV. Results ([Micallef 2016])

The three research questions collectively characterise the differences between how formally trained testers (George) and untrained testers (Carmen) approach an exploratory testing task.

Table IV. Distribution of bugs found according to category and the type of tester that found them.

Category                 Carmen      George      Total
Content Bugs             35 (54%)    30 (46%)     65
Input Validation Bugs     6 (21%)    23 (79%)     29
Logical Bugs              5 (50%)     5 (50%)     10
Functional UI Bugs       10 (48%)    11 (52%)     21
Nonfunctional UI Bugs     1 (11%)     8 (89%)      9
Total                    57          77          134

A. RQ1: Which types of strategies are used by testers in each group?

1) Testers with no formal training (Carmen) overwhelmingly rely on unguided strategies (65%), using guided strategies only 31% of the time and semi-guided strategies 4% of the time.
2) Formally trained testers (George) split their use of guided and unguided strategies evenly (45% and 46% respectively) whilst making double the use of semi-guided strategies compared to untrained testers (9%).
3) Interestingly, whilst Carmen seems to always prefer unguided strategies, George tends to alternate between them, using unguided strategies to find opportunities for more guided testing (see Figures 6 and 7).

Fig. 6. Strategy types collectively used by untrained testers (Carmen) throughout the 40-minute test sessions
Fig. 7. Strategy types collectively used by formally trained testers (George) throughout the 40-minute test sessions

B. RQ2: Which bugs are found by testers of each group?

1) We split bugs into five categories (see Section III-D).


with training vs. w/o training [Micallef 2016]

Recommendations on Exploratory Testing

Freestyle / exploratory: domain issues, little repetition
Pure scripted: system issues, much repetition –> automation
Use both. Train your testers.
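As a sketch of the "much repetition –> automation" recommendation (not from the original slides; the function, inputs and expected values are hypothetical), a repetitive system-level check can be captured as a parameterized automated test:

```python
# Hypothetical sketch only: the function under test and its data are invented.
import pytest

def normalize_order_id(raw: str) -> str:
    """Toy stand-in for repetitive, system-level behaviour worth automating."""
    return raw.strip().upper()

# The same check repeated over many inputs is tedious to re-run manually,
# but cheap to keep as a scripted, automated regression test.
@pytest.mark.parametrize("raw, expected", [
    ("  ab-123 ", "AB-123"),
    ("ab-123", "AB-123"),
    ("AB-123 ", "AB-123"),
])
def test_normalize_order_id(raw, expected):
    assert normalize_order_id(raw) == expected
```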

Actionable Exploratory Testing

Workshop agenda
•  Introduction (10 min): research context, team & participants
•  The principles of exploratory testing (5 min)
•  Alternative types of test charters (20 min)
•  Exercise: Write test cases according to test charter templates (15 + 25 min)
•  Reflect on improvements (10 min)
•  Closing (5 min): Sum up; next steps

Further reading

•  Itkonen J, Mäntylä M, Lassenius C (2007) Defect Detection Efficiency: Test Case Based vs. Exploratory Testing. ESEM'07, pp 61–70

•  Itkonen J, Mäntylä MV, Lassenius C (2013) The Role of the Tester's Knowledge in Exploratory Software Testing. IEEE Transactions on Software Engineering 39(5):707–724

•  Micallef M, Porter C, Borg A (2016) Do Exploratory Testers Need Formal Training? An Investigation Using HCI Techniques. TAIC PART, ICST Workshops 2016, pp 305–314

•  Afzal W, Ghazi AN, Itkonen J, Torkar R, Andrews A, Bhatti K (2015) An Experiment on the Effectiveness and Efficiency of Exploratory Testing. Empirical Software Engineering 20:844–878

Further contacts

Per Runeson [email protected]

Elizabeth Bjarnason [email protected]

Kai Petersen [email protected]

Exploratory testing – black and white! PER RUNESON, ELIZABETH BJARNASON, KAI PETERSEN