ambiguous software requirement specification detection: an … · 2019. 5. 9. · quirement based...

Ambiguous Software Requirement Specification Detection AnAutomated Approach

Mohd Hafeez OsmanTechnical University of Munich Germany

University Putra Malaysia Malaysiahafeezosmantumde

Mohd Firdaus ZaharinMinistry of Education Malaysiafirdauszaharinmoegovmy

ABSTRACTSoftware requirement specification (SRS) document is the mostcrucial document in software development process All subsequentsteps in software development are influenced by the requirementsThis implies that the quality of SRS influences the quality of the soft-ware product However issues in requirement such as ambiguitiesor incomplete specificationmay lead tomisinterpretation of require-ments which consequently higher the risk of time and cost overrunof the project Finding defects in the initial phase is crucial since thedefect that found late is more expensive than if it was found earlyThus the requirement should be tested before moving to otherdevelopment phases This study describes an automated approachfor detecting ambiguous software requirement specification To thisend we propose the combination of text mining and machine learn-ing Since the dataset is derived from Malaysian industrial SRSs andthe respondentsrsquo that evaluated our result are from Malaysia wefocus this study on Malaysian context for a pragmatic reason Weused text mining for feature extraction and for preparing the train-ing set Based on this training set the method rsquolearnsrsquo to classifyambiguous and non-ambiguous requirement specification In thispaper we study a set of nine classification algorithms from the ma-chine learning community and evaluate which algorithms performbest to detect ambiguous software requirement specification Wetake a step forward to develop a working prototype to evaluate thereliability of our approach The validation of the working prototypeshows that the classification result is reasonably acceptable Eventhough this study is an experimental benchmark we optimist thatthe result of this study may contribute to enhance the quality ofSRS and as well as assisting the requirement testing or review

KEYWORDSSoftware Engineering Requirement Engineering Machine Learn-ing Text Mining

ACM Reference FormatMohd Hafeez Osman and Mohd Firdaus Zaharin 2018 Ambiguous Soft-ware Requirement Specification Detection An Automated Approach InProceedings of International Workshop of Requirement Engineering and Test-ing (RET2018) ACM New York NY USA Article 4 8 pages httpsdoiorg10475123_4

Permission to make digital or hard copies of part or all of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page Copyrights for third-party components of this work must be honoredFor all other uses contact the ownerauthor(s)RET2018 June 2018 Gothenburg Swedencopy 2018 Copyright held by the ownerauthor(s)ACM ISBN 123-4567-24-5670806 $1500httpsdoiorg10475123_4

1 INTRODUCTIONSoftware requirement is the foundation of software developmentHence high-quality software requirements specification (SRS) mayincrease the possibility of high software quality It is just like theterm garbage in garbage out that has been used in software pro-gramming which means If there is a logical error in software orincorrect data are entered the result will probably be either a wronganswer or a system crash [10] For companies that outsource theirsoftware development software requirements document is the mostcrucial communication document that specifies the stakeholderrsquos vi-sion and needs of software to be developed Hence a good softwarerequirements document is needed to ensure the software developersunderstand the stakeholder needsResearch Problem Issues in software requirements such as am-biguities or incomplete requirements specification can lead to timeand cost overrun in a project [13] A problem in a software require-ment specification need to be detected as early as possible sincethe problem that found late in the project is more expensive thanif it was found early [8] Some of the issues in requirements speci-fication can be manually detected by the requirements engineersand some are not possible For example the requirements specifica-tion that ambiguous unclear and incomplete can be easily detectedby the requirement engineer but the requirements that related tothe domain knowledge is difficult to be detected Consequently anapproach that offers requirements engineers rapid detection to apossible defect in specification could contribute valuable feedback[12]

Software requirements defects detection based on requirementstemplate (also known as boilerplate) is feasible based on the workby Arora et Al [6]5 As the baseline Arora et al used the require-ment templates from the ISOIECIEEE 29148 [1] and the templateproposed by Pohl and Rupp [24] However since requirements inthe industry are nearly exclusively written in Natural Languageit is hard to detect the issues in requirements because a naturallanguage has no formal semantics In the Malaysia context mostof the Industrial SRSs are written in Malay A lot of research hasbeen conducted to solve this problem are focused on English Thereare little works focused on Malay Furthermore the boilerplatesfor formulating requirement are not commonly used in Malaysiathat make the requirements testing or review much more difficultWhile most of the work used Natural Language Processing (NLP)to detect requirements defect another possible technique for de-tection of requirements detection is by using text mining Textmining generally refers to the process of extracting interesting andnon-trivial patterns or knowledge from unstructured text docu-ments [19] Since the requirements are in the form of unstructured

RET2018 June 2018 Gothenburg Sweden MH Osman et al

text text mining is a suitable technique for detecting ambiguousrequirement specificationsGoal The main objective of this work is to come out with an auto-mated approach on detecting the defect in software requirementspecification In this paper we aim at formulating an automated ap-proach to detect ambiguous software requirement specification Tothis end we propose a combination of text mining and supervisedmachine learning We focus on detecting ambiguous SRS written inMalays since the dataset is derived from Malaysian Industrial soft-ware development and the respondents for validating our proposedsolution are from Malaysia

The dataset consists of occurrence of feature-words This datasetis originated from 180 requirement specifications (gathered from 4Malaysia Industrial SRSs) The 180 requirement specification havebeen tagged or labeled manually (ambiguous or non-ambiguous)Supervised machine learning technique is used learn and classifythe ambiguous requirements We explore nine (9) classificationalgorithms to find the suitable classification algorithms for our pur-pose The classification algorithms are OneR Naive Bayes LogisticRegression Nearest Neighbour (k-NN) Decision Table DecisionStump J48 Random Forest and Random Tree We develop a work-ing prototype to evaluate the reliability of the result and our overallapproach The working tool is validated by ten (10) requirementengineersystem analystContribution The contributions of this paper are the following

bull Formulation of feature-words based on SRS documentbull Exploring predictive power of feature-wordsbull Evaluation of nine classification algorithms for detectingambiguous requirements

bull An automated tool-supported approach for detecting am-biguous software requirements

This paper is structured as follows Section 2 discusses the re-lated work Section 3 describes the research questions and Section3 explains the approach We present the results and findings in Sec-tion 5 Discussion and Future Work in Section 7 This is followedwith conclusion in Section 8

2 RELATEDWORKVarious works have been done on software requirements qualityassurance There are also works focusing on the classification ofquality characteristics (eg [9]) and others develop a comprehensivechecklist [4] [7] and [26]]

Arora et al [5] proposed an automated solution and tool-supported[6] approach for checking conformance to requirements templatesThe work is based on two (2) requirement templates which areRupprsquos templates [8] and Easy Approach to Requirement Templates(EARS) [20] The approaches build on a natural language processingtechnique known as text chunking

Femmer et al [12] proposed an approach for smelling bad re-quirement based on requirement quality criteria listed at the ISO29148 - requirement engineering standards [1] This work aimed atrapidly detecting requirements that violate the Requirement Engi-neering principles They also develop a tool to detect the require-ments principles violation by using part-of-speech (POS) taggingmorphological analysis and dictionaries

Alshazly et al [3] concerned with detecting defects in softwareengineering specification based on defect taxonomies They pro-posed a taxonomy focusing on the defects in the requirementsphase and added correlations between the defects and the causesof their occurrences to guarantee the quality of SRS documentsThe taxonomy intended to help in training the software engineerSRS writers support creating accurate requirements and efficientrequirement defect prevention and requirement defect detection

Haron et al [15] formulated a conceptual model on managinglexical ambiguity to reduce the possibility of misinterpretation er-rors in Malay sentences by identifying potential Malay vague wordsPrior to this work [16] they performed a research to identify a listof potential ambiguous word in Malay language that intended to as-sist SRS writers or requirement engineers to avoid using the vaguewords while documenting SRS This ongoing study has collected120 potential ambiguous Malay words which can potentially beused as the reference in developing SRS

There are also work on improving software requirements speci-fication by using NLP and text mining such as the work by Sateliet al [25] The summary for the recent works are the following

a Most of the works for detecting defect in requirements specifi-cation refers to several requirement structures such as RupprsquosEARS IEEE 830 and ISOIECIEEE 29148 From our observationnot all requirements following the requirement structure butstill the requirement are still understandable or in fair qualityrequirement It is important to have a technique that detectrequirements defect based on truly unstructured text

b It is difficult to validate the requirements defect detection basedon the aforementioned requirement templates since case studiesthat follow the requirement structure are limited

c Requirement specification is highly related to the underlyinglanguage used to express the requirement Most of the worksfocus on English as the basic language The list of ambiguouswords provided by Haron et al [15][16] is beneficial to be usedin our research which focuses on Malaysian Context

3 RESEARCH QUESTIONSIn the context of the research problems specified in section 1 thispaper intended to answer the following research questionsRQ1 What are the influential words in classifying ambigu-ous software requirements We explore the predictive (classi-fication) power of each words that exist in software requirementspecifications which are used in this study We also evaluate thepredictive power of words that are suggested by Haron et al[17]and are mentioned in Berry et al [8]

RQ2 What are suitable classification algorithms in classify-ing ambiguous software requirement The candidate classifi-cation algorithms are evaluated to find the suitable algorithm(s) inclassifying ambiguous software requirement

RQ3 Which set of words produce the best classification per-formance We explore how the performance of the classificationalgorithm is influenced by partitioning the words in different sets

Ambiguous Software Requirement Specification Detection An Automated Approach RET2018 June 2018 Gothenburg Sweden

Figure 1 Overall Approach

4 APPROACHThis section describes the overall approach (as illustrated in figure1) that consist of data collection data preprocessing text processingtext classification prototype development and validation

41 Data CollectionThe main data for this work is the SRS documents As mentionedin section 1 this study is an early work of automated detection onambiguous software requirement for SRS written in Malay Thuswe have selected four (4) SRS documents from four (4) separated(Malaysian) software development projects Out of these four (4)documents we extracted 180 software requirement specificationsthat we believe are suitable to be in the dataset

42 Document PreprocessingIn Document Preprocessing there are two (2) major activities in-volved which are data labeling and data cleansing or filtering

a Data LabelingIn data labeling activity we manually classify each extractedrequirement specification into (i) ambiguous or Y and (ii) non-ambiguous or N This labels are crucial for supervised machinelearning purposes

b Data CleansingData Cleansing refers to the process of detecting and correcting(or removing) corrupt or inaccurate records from a record settable or database and refers to identifying incomplete incorrectinaccurate or irrelevant parts of the data and then replacing mod-ifying or deleting the dirty or coarse data [27] This activity willclean or scrub the unstructured text (software requirement speci-fication) that consists of stop word removal stemming and wordfiltering Since the library for stop word removal and stemmingfor Malay language is not available this activity is conductedmanually We also convert all short forms words into formalwords (eg yg to yang)

43 Text ProcessingThe expected output of this activity is a text dictionary that consistof classification features and labels We consider all words thatare exist in the selected requirement specifications as our features(called feature-words) In total there are 405 feature-words gatheredfrom 180 requirement specifications The text dictionary is createdbased on the following steps

(1) Calculate all number of occurrence for each feature-wordsin each software requirements

(2) Integrate the label and the software requirementsThis text processing activity is supported by RapidMiner [2] Anexample of text dictionary is illustrated in Table 1

44 Text ClassificationThe text classification process aims at (i) performing univariateanalysis to measure the predictive power of each feature-word(ii) selecting the suitable classification algorithms for building aclassification model to classify ambiguous and non-ambiguous re-quirement and (iii) evaluating different sets of feature-word Theexplanation of these tasks are the following

Univariate analysisTo measure predictive power of predictors (feature-words in our

case) we used the information gain with respect to the class [33]Univariate predictive power means measuring how influential a sin-gle feature-word is in prediction or classification performance Theresults of this algorithm are normally used to select the most suit-able predictor for prediction purposes Nevertheless in our studywe did not use it for predictor selection but for an exploratoryanalysis of the usefulness of various feature-words We comparethe feature-words that we found in our study with the ambiguouswords that are suggested by Haron et al [15] and the words men-tioned in Berry et al [8]

Classification algorithm selectionThis process is divided into several task (i) classification algo-

rithms candidate selection (ii) classification model constructionand (iii) classification model evaluation The detail explanations arethe followingi Classification algorithms candidate selectionPrior to making a selection of the classification algorithms weran exploratory experiments on a wider range of supervisedmachine learning algorithms We do not expect that there willbe a single silver bullet algorithm that will outperform all oth-ers across our dataset Also we are not just interested in asingle algorithm that scores a top result on a given problembut are looking for sets of classification algorithms that producea reasonable result for the dataset In this way we will openthe possibility of mixing the classification algorithms for betterclassification performance As mentioned above we desired toexplore a diverse set of classification algorithms representativefor different approaches For example decision trees stumpstables and random trees or forests all divide the input space upin disjoint smaller sub spaces and make a prediction based onoccurrence of positive classes in those sub spaces K-NN are


Table 1 Text Dictionary Example

yang sistem akan maklumat dan untuk telah mohon guna boleh proses (405 words) LabelReq1 0 0 0 0 0 1 0 0 0 1 0 NReq2 1 0 0 1 0 0 0 0 0 1 0 YReq3 2 1 0 0 0 1 0 0 0 0 0 YReq4 1 0 1 0 2 3 0 0 1 0 2 YReq5 2 0 1 0 1 0 0 0 1 0 0 YReq6 1 0 0 1 1 0 0 0 0 2 1 YReq7 0 0 1 0 0 1 0 0 0 0 1 NReq8 4 0 1 0 0 1 0 0 0 1 0 YReq9 1 0 0 0 0 0 0 0 0 0 0 N

similar local approaches but the sub spaces here are overlap-ping In contrast logistic regression and Naive Bayes modelparameters are estimated based on potentially large numberof instances and can thus be seen as more global models [21]On the other hand OneR generates one-level decision tree ex-pressed in the form of a set of rules that all test one particularattribute [14] More explanation about these algorithms can befound at [14] and [22]

ii Classification model constructionThis task is supported by WEKA [14] (tool) We used defaultconfiguration suggested by WEKA for the training activityfor each classification algorithms For every training and testactivity for the classification algorithm we used 10-fold crossvalidation it means that we use only 10 of the data for testingand 90 for training To further improve reliability we ran eachexperiment 10 times using different randomization

iii Classification model evaluationThe classification algorithms are considered to be suitable forour purpose based on its classification performance Since ourdataset is reasonably balance we evaluate the classificationperformance based on three measurements (i) accuracy (per-centage of correct classification) (ii) precision () and (iii) recallAt first we evaluated the classification algorithms performanceagainst accuracy of ZeroR (the baseline) ZeroR predict the ma-jority class (if nominal) or the average value (if numeric) [14]This classifier does not take feature as the predictor and presentthe percentage of probability or random guessing Accuracyvalue below ZeroR shows that the model (created by the classi-fication algorithm) does not make any significant improvementin classifying the ambiguous requirement compared to randomguessing Hence the classification algorithm accuracy valueshould better higher than ZeroR Then we compare the clas-sification algorithm based on precision and recall Precisionand recall are common in information retrieval for evaluat-ing classification performance [11] If needed we evaluate theF-Measure (balance between precision and recall) of the classi-fication algorithms performance

Evaluation of feature-words set performanceThe univariate analysis is intended to show the predictive per-

formance for individual feature-words However this analysis doesnot show the predictive performance for set of features To evaluatethe performance sets of feature-words we created additional three

(3) datasets (i) use only words suggested by Haron et al as setof feature-words (ii) use only words mentioned in Berry et al asfeature-words and (iii) combine both datasets from (i) and (ii) Weevaluated the performance of all datasets against all classificationalgorithm candidates

45 Prototype Development and ValidationIn this study we aim at formulating an automated approach to assistrequirement engineers to detect ambiguous software requirementspecification We build an interactive prototype tool to classifyambiguous and non-ambiguous requirement specification basedon the evaluated classification model Based on this prototype wevalidate our (initial) approach by conducting a survey

5 RESULT AND FINDINGSThis section describes i) evaluation on predictive power of feature-words (ii) evaluation on classification algorithms performance and(iii) analysis of datasets Each of the following subsection answersthe Research Questions that have been mentioned in section 3

51 RQ1 Feature-words evaluationAttribute Evaluator analysis from WEKA (tool) is used to evalu-ate the influence of each feature-words in classifying ambiguousand non-ambiguous requirement specification This evaluation pro-duces a score from 0 to 1 for every feature-words A value closerto 1 means strong influence We consider a predictor as influentialwhen the Information Gain score gt 0 Out of 405 feature-wordsthat are extracted from 180 software requirement specification wefound that 94 feature-words have Information Gain score gt 0 Thesefeature-words have the score ranging from 03010 to 00205

Table 2 shows the list of 30 most influential feature-words Basedon our best knowledge and experience these words are commonwords that can be found in SRS documents Only several words inthe list (Table 2) are domain related words such as dana (fund)majikan (employer) and emel (e-mail) Which means the do-main related words have little influence in classifying the ambigu-ous and non-ambiguous requirements We take a step further tocompare the list of influential feature-words with the list of am-biguous words that were suggested by Haron etal [17] and werementioned in Berry et al [8] We matched the words that weresuggestedmentioned in both study with our feature-words TheInformation Gain value for these words are presented in Table 3


for Haron et al and Table 4 for Berry et al We discovered thatseveral words that have been listed by both study influence theclassification performance From this result we decided to use thelist of words by both study as our permanent feature-words in ourprototype

Table 2 List of 30 influential words based on InfoGainAt-trEval Score

No word InfoGainscore


1 sistem 03010 16 jika 007472 papar 02855 17 buat 006623 dan 02371 18 semak 006624 akan 01320 19 tidak 006625 tersebut 01095 20 pilih 006366 maklumat 01090 21 yang 006207 hendaklah 01036 22 ahli 005978 dalam 01033 23 dari 005869 dana 01023 24 tempoh 0057910 majikan 00861 25 telah 0057911 projek 00841 26 emel 0057912 kepada 00838 27 melalui 0057913 proses 00819 28 baru 0057914 boleh 00809 29 juga 0052415 senarai 00762 30 sekiranya 00524

Table 3 Preditive power of feature-words by Haron et al

Word Similar feature-word InfoGain scoremenghantar hantar 00310mengeluarkan dikeluarkan 00000menerima terima 00416mengemaskini kemaskini 00000maklumat maklumat 01090meluluskan lulus 00000notifikasi notifikasi 00000untuk untuk 00420sekiranya sekiranya 00524khidmat perkhidmatan 00000dengan dengan 00318permohonan mohon 00000operasi operasi 00000pengguna not available not available

52 RQ2 Evaluation on ClassificationAlgorithms Performance

As mentioned in section 4 we evaluate the classification algorithmsperformance based on (i) accuracy (ii) precision and (iii) recallFirst we evaluate the classification algorithms performance bycomparing the classifications algorithms with ZeroR score TheZeroR accuracy score is 53 As illustrated in Table 5 all candidateclassification performed better than ZeroR which means that all

Table 4 Preditive power of feature-words by Berry et al

Word Similar feature-word InfoGain Scoresemua semua 00000setiap setiap 00222sahaja sahaja 00000juga juga 00524walaupun walaubagaimanapun 00000atau ataupun 00257beberapa not available not availablebahawa bahawa 00000yang yang 00620

these words were translated to Malay

classification algorithm present significant improvements compareto random guessing

Then we compare the classification algorithms with OneR OneRis the simplest classification algorithm that takes only one featurefor prediction or classification This evaluation is done becauseaccording to Holte[18] there is a possibility that a simple algo-rithm works well in a dataset If the performance of using complexclassification algorithms are not much different or same with thesimplest algorithms it is better to use simple classification algo-rithms for better performance (in terms of classification speed) Theresult shows that OneR performs better than k-NN and DecisionStump while Decision Table score the same result Hence k-NNDecision Stump and Decision Table are considered not suitable asclassification algorithms for our purpose

We further our investigation by evaluating classification perfor-mance based on precision and recall score Again OneR is used asour benchmark The result (Table 5) shows that several classificationalgorithms perform well in recall but not for precision For examplenaive bayes score high for recall (095) but low for precision (073)To get a better picture on these classification performance we lookat F-Measure ie a score that consider both precision and recall Theresult shows that Naive Bayes Logistic Regression J48 RandomForest and Random Tree are suitable classification algorithms forour purpose Out of these five (5) classification algorithms RandomForest is the most suitable classification algorithms based on all theevaluated scores

Table 5 Classification Algorithms Performance

Accuracy Precision Recall F-MeasureOneR 7806 080 074 076Naive Bayes 8022 073 095 082Logistic Reg 8094 078 086 081k-NN 7189 064 094 076Decision Table 7806 074 084 078Decision Stump 7717 069 095 080J48 8267 083 082 081Random Forest 8967 090 088 089Random Tree 8089 078 085 081


Table 6 Classification Performance (accuracy) based onDataset

AllData DataH DataB DataHBOneR 7806 6578 6078 6578Naive Bayes 8022 6533 5811 6722Logistic Reg 8094 6783 5472 6667k-NN 7189 6789 5822 6983Decision Table 7806 6200 5672 6328Decision Stump 7717 4889 5778 5233J48 8267 7117 5783 7089Random Forest 8967 7044 5906 7656Random Tree 8089 6822 5900 7139 Data_H used feature-words suggested by Haron et al [17]Data_B used feature-words mentioned in Berry et al [8] and

DataHB is the combination of DataH and DataB

53 RQ3 Analysis on DatasetThe accuracy score for all datasets are illustrated in Table 5 Thistable shows that by using all (405) feature-words the classificationalgorithms produce the best results The other datasets score muchlower than using all words as feature-words This result indicatesthat it is not enough to only used the words that are suggested byHaron et al [17] and words that are mentioned in Berry et al [8]to classify ambiguity and non-ambiguity requirements

6 PROTOTYPE AND VALIDATIONThis section describes the development of the prototype and thevalidation of the prototype

61 Prototype DevelopmentThe main objective of this tool is to assist the requirement engi-neer to develop a good quality requirement specification This tooldisplays several useful information on how to formulate a goodsoftware requirement specification based on ISOIECIEEE 29148and the information from International Requirements EngineeringBoard (IREB) As for now this tool offers a functionality to detectambiguous and non-ambiguous requirement specification A briefdescription about this functionality is the following

bull Random Forest is used as the classification algorithmbull 180 requirement specifications are used as the learning datasetbull The feature-words shall be dynamically chosen by the user

Figure 2 shows the main page of the tool that display the informa-tion about a good requirement and a function to detect ambiguousrequirement specification The user need to provide the requirementspecification as the input Then the tool classifies the requirementby using the classification model then presents the classification re-sult (see Figure 3) as well as the information about the words (wordsoccurrence information) This tool is developed using PHP AJAXPython and JQuery We used scikit-learn[23] library for executingthe machine learning task

62 ValidationIn this study we desired to provide a pragmatic solution to thisproblem Although this study is still in an initial stage we need to

Figure 2 Toolrsquos Main Page

Figure 3 Toolrsquos Classification Results

Figure 4 Respondentsrsquo Background


observe the practicality of this tool in helping requirement engineerwriting a good requirement specification Thus we conducted asimple survey to validate the prototype

This survey is conducted to measure certain attributes based onexpertsrsquo opinion and review Those attributes are the InformationQuality and the Usefulness of the Tool The result of this survey isimportant to indicate the acceptance satisfaction and usefulnessof prototype tool from the expertsrsquo view The respondents wererequired to use the prototype and then answer several questionsThe format (Likert scale) of a six-level item is used as answer op-tions in this questionnaire ie strongly disagreemoderately disagreedisagree agree moderately agree and strongly agree

Respondentsrsquo BackgroundOverall we have selected ten (10) respondents that worked as

Requirement Engineer and System Analyst to involve in this ac-tivity Out of 10 respondents three (3) respondents have less than2 years of experience four (4) respondents have 2 to 5 years ofexperience and three (3) respondents have more than 5 years ofexperience (figure 4)

Information QualityTo assess the quality of information presented in the tool the

respondents were required to answer the following questionsa The result of requirement ambiguity detection is accurate

The result shows that none of the respondent disagree that theresult of the tool is inaccurate Seven (7) of the respondentsansweredmoderately agree two (2) of the respondents answeredstrongly agreewhile only one (1) answered agree This shows thatfrom the respondentrsquos perspective the result of the ambiguousrequirement specification detection is moderately accepted

b The tool provides me with sufficient informationIn general we can summarize that the respondents slightly agreethat the tools provide sufficient information From the result four(4) of the respondents answered agree five (5) answered mod-erately agree while only one (1) respondent answered stronglyagree Perhaps this is due to the prototype only give the classifi-cation result and the words that are used but not the informationon what kind of word that are contribute to ambiguous softwarerequirement

Usefulness of the ToolTo assess the usefulness of the tool the respondents were re-

quired to answered two (2) questiona The tool increases productivity in formulating Software Require-

ment Specification (SRS)Most of the respondents moderately agree that the tool increasethe productivity in formulating SRS Three (3) of the respondentsanswered strongly agree while only one respondent answeredagree

b This tool improves the Requirement Engineer System AnalystPerformance in preparing SRSFive (5) respondents answered that they are moderately agreewhile four (4) respondents answered that they are strongly agreeThis shows that the tool can be used as a learning material forRequirement Engineers System Analysts to improve their SRSpreparation skills

7 DISCUSSION AND FUTUREWORKThis section presents our discussion on the dataset classificationmodel construction prototype tool and threats to validity

71 DatasetAs for now we have processed four (4) SRS (written) and selected180 requirement specifications We are aware that we have a littlecoverage on SRS Furthermore using all words as feature-wordsis difficult when the data is much bigger Nevertheless this studywas considered as an early experimental benchmark we intendedto observe the possibility of using text mining and supervised ma-chine learning to solve issues related to ambiguous requirementspecification We see a number of ways to extend this study such asincreasing the dataset improving the ground truth (ambiguousnon-ambiguous label) formulating more feature words (eg number ofwords per specification formulating weight for common ambigu-ous word and expend the ambiguous words that are mentioned byHaron et al [15])

72 Classification Model ConstructionGenerally this study is expected to find the suitable algorithms toclassify ambiguous requirement based on our dataset This studyhas shown that Naive Bayes Logistic Regression J48 RandomForest and Random Tree are suitable to be used as classification al-gorithms for our dataset Based on the classification score RandomForest shows the best performance in terms of precision and recallHence this classification algorithm is used in the prototype toolThere is a possibility to enhance the performance of Random For-est by reconfiguring the parameter Also combination with otherclassification algorithms can be another alternative

73 Prototype ToolThe purpose of developing the prototype is to evaluate the classi-fication model with the input from the requirement engineer orsoftware analyst We validate this tool to observe the reliability ofthe tool in classifying ambiguous and non-ambiguous requirementspecification Moreover we also like to observe on the usefulnessof the prototype in helping requirement engineers and system ana-lysts in formulating requirement specification Although the resultis reasonably acceptable we see a lot of improvement should bedone such as presenting the word that probably contribute to am-biguous requirement increase more dataset improve feature-wordsand validate the tool with more requirement engineer

74 Threats to ValidityThis subsection describes the threats to validity of this study thatconsists of threats to internal validity external validity and con-struct validityThreats to Internal Validity The stemming of the words in therequirements specification during the data preprocessing was donemanually There is possibility that we unable to find several rootwords Another threats to internal validity is the feature-wordsthat are based on English-Malay translation (words by Berry et al[8]) These words are translated literally without comparing otherwords that are also synonym but in different context


Threats to External Validity We use 180 requirement specifica-tion that are gathered from four (4) SRS These SRS only representseveral system domain and limited pattern on requirement specifi-cation formation Hence we could not generalize the result of thisstudy for all systems in MalaysiaThreats to Construct Validity We use the default parameter toconstruct a classification model for each classification algorithmscandidate There is a possibility for a classification algorithm toproduce a better classification score if the classification algorithmsparameter are configured differently Also there is a possibility thatseveral other classification algorithms that would perform betterclassification that we did not covered in this study

8 CONCLUSIONSThis study aimed to come up with an automated approach to clas-sify ambiguous requirement specification For a practical reasonwe focus on SRS that are written in Malay The dataset was cre-ated based on 180 software requirement specifications that wereextracted from four (4) SRS We evaluated the predictive power ofeach words and also compared the highly influential feature-wordsfor classifying ambiguous requirements with existing research Wediscovered that several ambiguous words suggested by existingresearch influenced the classification of ambiguous requirementHowever we found that there are also some other words that arehighly influential for classifying ambiguous requirements from ourdataset

We have conducted an experiment to find the suitable classifica-tion algorithms for our purpose We discovered that Naive BayesLogistic Regression J48 Random Forest and Random Tree is suit-able for our dataset Random Forest performed the best classifi-cation scores compared to all aforementioned algorithms Basedon the experiment we developed a working prototype to evaluatethe result and the reliability of our classification model The toolwas validated by ten (10) requirement engineerssystem analystThe validation showed that the classification result of the tool isreasonably acceptable Also the validation result shows that thetool is useful for improving the quality of SRS Although this studywas considered as an early experimental benchmark we believethat the result of this study provides a significant contribution forimproving SRS quality as well as supporting requirement testingor review task

ACKNOWLEDGMENTSThe authors would like to thank the respondents of this study fortheir support and commitment

REFERENCES[1] 2011 ISOIECIEEE International Standard - Systems and software engineering

ndash Life cycle processes ndashRequirements engineering ISOIECIEEE 291482011(E)(Dec 2011) 1ndash94 httpsdoiorg101109IEEESTD20116146379

[2] 2017 RapidMiner httpsrapidminercom (2017)[3] Amira A Alshazly Ahmed M Elfatatry and Mohamed S Abougabal 2014 De-

tecting defects in software requirements specification Alexandria EngineeringJournal 53 3 (2014) 513ndash527

[4] Bente Anda and Dag IK Sjoslashberg 2002 Towards an inspection technique foruse case models In Proceedings of the 14th international conference on Softwareengineering and knowledge engineering ACM 127ndash134

[5] Chetan Arora Mehrdad Sabetzadeh Lionel Briand and Frank Zimmer 2015Automated checking of conformance to requirements templates using natural

language processing IEEE transactions on Software Engineering 41 10 (2015)944ndash968

[6] Chetan Arora Mehrdad Sabetzadeh Lionel Briand Frank Zimmer and RaulGnaga 2013 RUBRIC A flexible tool for automated checking of conformance torequirement boilerplates In Proceedings of the 2013 9th Joint Meeting on Founda-tions of Software Engineering ACM 599ndash602

[7] Daniel M Berry Antonio Bucchiarone Stefania Gnesi Giuseppe Lami and Gian-luca Trentanni 2006 A new quality model for natural language requirementsspecifications In Proceedings of the international workshop on requirements engi-neering foundation of software quality (REFSQ)

[8] A Bucchiarone S Gnesi G Lami G Trentanni and DM Berry 2006 A NewQuality Model for Natural Language Requirements Specification In Proc of the12th International Working Conference on Requirements Engineering Foundationfor Software Quality (REFSQacircĂŹ06) June 2006 Luxembourg Grand-Duchy ofLuxembourg Essener Informatik Beitrage ISBN 3-922602-26 Vol 6

[9] Alan Davis Scott Overmyer Kathleen Jordan Joseph Caruso Fatma DandashiAnhtuan Dinh Gary Kincaid Glen Ledeboer Patricia Reynolds Pradip Sitaramet al 1993 Identifying and measuring quality in a software requirements spec-ification In Software Metrics Symposium 1993 Proceedings First InternationalIEEE 141ndash152

[10] Dictionarycom 2018 wwwDictionarycom (2018) Retrieved 5th February2018 from httpwwwdictionarycom

[11] Tom Fawcett 2006 An introduction to ROC analysis Pattern recognition letters27 8 (2006) 861ndash874

[12] Henning Femmer Daniel Meacutendez Fernaacutendez Elmar Juergens Michael KloseIlona Zimmer and Joumlrg Zimmer 2014 Rapid Requirements Checks with Require-ments Smells Two Case Studies In Proceedings of the 1st International Workshopon Rapid Continuous Software Engineering (RCoSE 2014) ACM New York NYUSA 10ndash19 httpsdoiorg10114525938122593817

[13] Daniel MAtildeľndez FernAtildeąndez and Stefan Wagner 2015 Naming the pain inrequirements engineering A design for a global family of surveys and firstresults from Germany Information and Software Technology 57 (2015) 616 ndash 643httpsdoiorg101016jinfsof201405008

[14] M Hall E Frank G Holmes B Pfahringer P Reutemann and IH Witten 2009The WEKA Data Mining Software An Update (2009) Issue 1

[15] Hazlina Haron Abdul Azim Abdul Ghani and Hazliza Haron 2015 A conceptualmodel to manage lexical ambiguity in Malay textual requirements ARPN Journalof Engineering and Applied Sciences 10 3 (2015) 1405ndash1412

[16] Hazlina Haron and Abdul AzimAbdul Ghani 2014 Amethod to identify potentialambiguous Malay words through Ambiguity Attributes mapping An exploratoryStudy CoRR abs14026764 (2014) arXiv14026764 httparxivorgabs14026764

[17] Hazlina Haron and Abdul Azim Abdul Ghani 2015 A Survey on AmbiguityAwareness towards Malay System Requirement Specification (SRS) among In-dustrial IT Practitioners Procedia Computer Science 72 (2015) 261ndash268

[18] Robert C Holte 1993 Very simple classification rules perform well on mostcommonly used datasets Machine learning 11 1 (1993) 63ndash90

[19] Ah hwee Tan 1999 Text Mining The state of the art and the challenges In InProceedings of the PAKDD 1999 Workshop on Knowledge Disocovery from AdvancedDatabases 65ndash70

[20] Alistair Mavin Philip Wilkinson Adrian Harwood and Mark Novak 2009 Easyapproach to requirements syntax (EARS) In Requirements Engineering Conference2009 RErsquo09 17th IEEE International IEEE 317ndash322

[21] M H Osman M R V Chaudron and P v d Putten 2013 An Analysis of MachineLearning Algorithms for Condensing Reverse Engineered Class Diagrams In2013 IEEE International Conference on Software Maintenance 140ndash149 httpsdoiorg101109ICSM201325

[22] M H Osman M R V Chaudron P van der Putten and T Ho-Quang 2014Condensing reverse engineered class diagrams through class name based abstrac-tion In 2014 4th World Congress on Information and Communication Technologies(WICT 2014) 158ndash163 httpsdoiorg101109WICT20147077321

[23] Fabian Pedregosa Gaeumll Varoquaux Alexandre Gramfort Vincent MichelBertrand Thirion Olivier Grisel Mathieu Blondel Peter Prettenhofer RonWeiss Vincent Dubourg Jake Vanderplas Alexandre Passos David CournapeauMatthieu Brucher Matthieu Perrot and Eacutedouard Duchesnay 2011 Scikit-learnMachine Learning in Python J Mach Learn Res 12 (Nov 2011) 2825ndash2830httpdlacmorgcitationcfmid=19530482078195

[24] Klaus Pohl and Chris Rupp 2011 Requirements Engineering Fundamentals AStudy Guide for the Certified Professional for Requirements Engineering Exam -Foundation Level - IREB Compliant (1st ed) Rocky Nook

[25] Bahar Sateli Elian Angius Srinivasan Sembakkam Rajivelu and Reneacute Witte2012 Can text mining assistants help to improve requirements specificationsMining Unstructured Data (MUD 2012) Canada (2012)

[26] Axel Van Lamsweerde 2009 Requirements engineering From system goals to UMLmodels to software Vol 10 Chichester UK John Wiley amp Sons

[27] Shaomin Wu 2013 A review on coarse warranty data and analysis ReliabilityEngineering amp System Safety 114 (2013) 1ndash11

Abstract

1 Introduction

2 Related Work

3 Research Questions

4 Approach

41 Data Collection

42 Document Preprocessing

43 Text Processing

44 Text Classification

45 Prototype Development and Validation

5 Result and Findings

51 RQ1 Feature-words evaluation

52 RQ2 Evaluation on Classification Algorithms Performance

53 RQ3 Analysis on Dataset

6 Prototype and Validation

61 Prototype Development

62 Validation

7 Discussion and Future Work

71 Dataset

72 Classification Model Construction

73 Prototype Tool

74 Threats to Validity

8 Conclusions

Acknowledgments

References


text text mining is a suitable technique for detecting ambiguousrequirement specificationsGoal The main objective of this work is to come out with an auto-mated approach on detecting the defect in software requirementspecification In this paper we aim at formulating an automated ap-proach to detect ambiguous software requirement specification Tothis end we propose a combination of text mining and supervisedmachine learning We focus on detecting ambiguous SRS written inMalays since the dataset is derived from Malaysian Industrial soft-ware development and the respondents for validating our proposedsolution are from Malaysia

The dataset consists of occurrence of feature-words This datasetis originated from 180 requirement specifications (gathered from 4Malaysia Industrial SRSs) The 180 requirement specification havebeen tagged or labeled manually (ambiguous or non-ambiguous)Supervised machine learning technique is used learn and classifythe ambiguous requirements We explore nine (9) classificationalgorithms to find the suitable classification algorithms for our pur-pose The classification algorithms are OneR Naive Bayes LogisticRegression Nearest Neighbour (k-NN) Decision Table DecisionStump J48 Random Forest and Random Tree We develop a work-ing prototype to evaluate the reliability of the result and our overallapproach The working tool is validated by ten (10) requirementengineersystem analystContribution The contributions of this paper are the following

bull Formulation of feature-words based on SRS documentbull Exploring predictive power of feature-wordsbull Evaluation of nine classification algorithms for detectingambiguous requirements

bull An automated tool-supported approach for detecting am-biguous software requirements

This paper is structured as follows Section 2 discusses the re-lated work Section 3 describes the research questions and Section3 explains the approach We present the results and findings in Sec-tion 5 Discussion and Future Work in Section 7 This is followedwith conclusion in Section 8

2 RELATEDWORKVarious works have been done on software requirements qualityassurance There are also works focusing on the classification ofquality characteristics (eg [9]) and others develop a comprehensivechecklist [4] [7] and [26]]

Arora et al [5] proposed an automated solution and tool-supported[6] approach for checking conformance to requirements templatesThe work is based on two (2) requirement templates which areRupprsquos templates [8] and Easy Approach to Requirement Templates(EARS) [20] The approaches build on a natural language processingtechnique known as text chunking

Femmer et al [12] proposed an approach for smelling bad re-quirement based on requirement quality criteria listed at the ISO29148 - requirement engineering standards [1] This work aimed atrapidly detecting requirements that violate the Requirement Engi-neering principles They also develop a tool to detect the require-ments principles violation by using part-of-speech (POS) taggingmorphological analysis and dictionaries

Alshazly et al [3] concerned with detecting defects in softwareengineering specification based on defect taxonomies They pro-posed a taxonomy focusing on the defects in the requirementsphase and added correlations between the defects and the causesof their occurrences to guarantee the quality of SRS documentsThe taxonomy intended to help in training the software engineerSRS writers support creating accurate requirements and efficientrequirement defect prevention and requirement defect detection

Haron et al [15] formulated a conceptual model on managinglexical ambiguity to reduce the possibility of misinterpretation er-rors in Malay sentences by identifying potential Malay vague wordsPrior to this work [16] they performed a research to identify a listof potential ambiguous word in Malay language that intended to as-sist SRS writers or requirement engineers to avoid using the vaguewords while documenting SRS This ongoing study has collected120 potential ambiguous Malay words which can potentially beused as the reference in developing SRS

There are also work on improving software requirements speci-fication by using NLP and text mining such as the work by Sateliet al [25] The summary for the recent works are the following

a Most of the works for detecting defect in requirements specifi-cation refers to several requirement structures such as RupprsquosEARS IEEE 830 and ISOIECIEEE 29148 From our observationnot all requirements following the requirement structure butstill the requirement are still understandable or in fair qualityrequirement It is important to have a technique that detectrequirements defect based on truly unstructured text

b It is difficult to validate the requirements defect detection basedon the aforementioned requirement templates since case studiesthat follow the requirement structure are limited

c Requirement specification is highly related to the underlyinglanguage used to express the requirement Most of the worksfocus on English as the basic language The list of ambiguouswords provided by Haron et al [15][16] is beneficial to be usedin our research which focuses on Malaysian Context

3 RESEARCH QUESTIONSIn the context of the research problems specified in section 1 thispaper intended to answer the following research questionsRQ1 What are the influential words in classifying ambigu-ous software requirements We explore the predictive (classi-fication) power of each words that exist in software requirementspecifications which are used in this study We also evaluate thepredictive power of words that are suggested by Haron et al[17]and are mentioned in Berry et al [8]

RQ2 What are suitable classification algorithms in classify-ing ambiguous software requirement The candidate classifi-cation algorithms are evaluated to find the suitable algorithm(s) inclassifying ambiguous software requirement

RQ3 Which set of words produce the best classification per-formance We explore how the performance of the classificationalgorithm is influenced by partitioning the words in different sets
















































































































Abstract

1 Introduction

2 Related Work


4 Approach

41 Data Collection


43 Text Processing









62 Validation


71 Dataset


73 Prototype Tool


8 Conclusions

Acknowledgments

References
















































































































Abstract

1 Introduction

2 Related Work


4 Approach

41 Data Collection


43 Text Processing









62 Validation


71 Dataset


73 Prototype Tool


8 Conclusions

Acknowledgments

References

ambiguous software requirement specification detection: an … · 2019. 5. 9. · quirement based...

Documents