


Submitted 17 November 2015
Accepted 15 June 2016
Published 18 July 2016

Corresponding author: Giuseppe Destefanis, [email protected]

Academic editor: Arie van Deursen

Additional Information and Declarations can be found on page 29

DOI 10.7717/peerj-cs.73

Copyright 2016 Destefanis et al.

Distributed under Creative Commons CC-BY 4.0

OPEN ACCESS

Software development: do good manners matter?
Giuseppe Destefanis1, Marco Ortu2, Steve Counsell1, Stephen Swift1, Michele Marchesi2 and Roberto Tonelli2

1 Department of Computer Science, Brunel University, London, United Kingdom
2 Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy

ABSTRACT
A successful software project is the result of a complex process involving, above all, people. Developers are the key factors for the success of a software development process, not merely as executors of tasks, but as protagonists and core of the whole development process. This paper investigates social aspects among developers working on software projects developed with the support of Agile tools. We studied 22 open-source software projects developed using the Agile board of the JIRA repository. All comments committed by developers involved in the projects were analyzed and we explored whether the politeness of comments affected the number of developers involved and the time required to fix any given issue. Our results showed that the level of politeness in the communication process among developers does have an effect on the time required to fix issues and, in the majority of the analysed projects, it had a positive correlation with attractiveness of the project to both active and potential developers. The more polite developers were, the less time it took to fix an issue.

Subjects Data Mining and Machine Learning, Data Science, Software Engineering
Keywords Social and human aspects, Politeness, Mining software repositories, Issue fixing time, Software development

INTRODUCTION
High-level software development is a complex activity involving a range of people and activities; ignoring human aspects in the software development process, or managing them in an inappropriate way, can potentially have a huge impact on the software production process and team effectiveness. Increasingly, researchers have tried to quantify and measure how social aspects affect software development. Bill Curtis claimed that ‘‘the creation of a large software system must be analyzed as a behavioural process’’ (Curtis, Krasner & Iscoe, 1988). Coordinating and structuring a development team is thus a vital activity for software companies, and team dynamics have a direct influence on group successfulness. Open-source development usually involves developers who voluntarily participate in a project by contributing code. In many senses, the management of such developers is more complex than the management of a team within a company: developers are not in the same place at the same time and coordination therefore becomes more difficult. Additionally, the absence of face-to-face communication mandates the use of alternative technologies such as mailing lists, electronic boards or issue tracking systems. In this context, being rude or aggressive when writing a comment or replying to a contributor can affect the cohesion of the group, its membership and the successfulness of a project.

How to cite this article Destefanis et al. (2016), Software development: do good manners matter? PeerJ Comput. Sci. 2:e73; DOI 10.7717/peerj-cs.73


On the other hand, a respectful environment provides an incentive for new contributors to join the project and could significantly extend the lifetime and usefulness of a project to the community.

According to VersionOne (https://www.versionone.com/pdf/2013-state-of-agile-survey.pdf) (VersionOne, 2013): ‘‘more people are recognising that agile development is beneficial to business, with an 11% increase over the last two years in the number of people who claim that agile helps organisations complete projects faster’’. A main priority reported by users was to accelerate time to market, manage changing priorities more easily and better align IT and business objectives. Agile project management tools and Kanban boards experienced the largest growth in popularity of all agile tool categories, with use or planned use increasing by 6%. One of the top five ranked tools was Atlassian JIRA (https://www.atlassian.com/software/jira), with an 87% recommendation rate. Agile boards represent the central aspect of communication in the Agile philosophy. According to Perry (2008), ‘‘the task board is one of the most important radiators used by an agile team to track their progress.’’ The JIRA board is a good solution for bridging the gap between open-source software development and the Agile world. It is the view of many that agile development requires a physical aspect, i.e., developers working together in the same room or building, or at the same desk; the pair programming paradigm, for example, requires at least two people working simultaneously on the same piece of code. By using tools such as the JIRA board (Fig. 1), it is possible to use an agile board for development of a project by developers in different physical places. Working remotely, in different time zones and with different time schedules, with developers from around the world, requires coordination and communication. The JIRA board displays issues from one or more projects, giving the possibility of viewing, managing and reporting on work in progress. It is possible to use a board that someone else has created, or to create as many boards as needed.

When a new developer joins a development team, the better the communication process works, the faster the new developer becomes productive and the shorter the learning curve. The notion of an agile board therefore places emphasis on the know-how and shared knowledge of a project being easily accessible to the development team throughout the development process. Fast releases, continuous integration and testing activities are directly connected to the knowledge of the system under development. The potential for agile boards to simplify development across geographically disparate areas is in this sense relatively clear. In a similar vein, the social and human aspects of the development process are becoming more and more important. The Google work style has become a model for many software start-ups: a pleasant work environment is important and affects the productivity of employees.

One important contributor to a healthy work environment is that each employee is considerate and polite towards their fellow employees. Collins dictionary (http://www.collinsdictionary.com/dictionary/english/polite) defines politeness as ‘‘showing regard for others, in manners, speech, behaviour, etc.’’ We focus on the politeness of the comment-messages written by developers. The research aims to show how project management tools such as agile boards can directly affect the productivity of a software development team and the health of a software project.

Destefanis et al. (2016), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.73 2/35


Figure 1 Example of JIRA board with issues.

The state-of-the-art tool developed by Danescu-Niculescu-Mizil et al. (2013) was used to evaluate politeness within comment-messages. The authors proposed a machine learning approach for evaluating the politeness of a request posted in two different web applications: Wikipedia (https://en.wikipedia.org/wiki/Main_Page) and Stack Overflow (http://stackoverflow.com). Stack Overflow is well known in the software engineering field and is widely used by software practitioners; hence, the model used in Danescu-Niculescu-Mizil et al. (2013) was suitable for our domain based on JIRA issues, where developers post and discuss technical aspects of issues. The authors provide a Web application (http://www.mpi-sws.org/~cristian/Politeness.html) and a library version of their tool. To prepare the training set for the machine learning approach, over 10,000 utterances were labeled using Amazon Mechanical Turk. The authors decided to restrict the residence of the annotators to the US and conducted a linguistic background questionnaire.

Since ‘‘politeness is a culturally defined phenomenon and what is considered polite in one culture can sometimes be quite rude or simply eccentric in another cultural context’’ (http://en.wikipedia.org/wiki/Politeness), the choice of limiting the residence of the annotators to the US could be interpreted as a weakness of the tool. However, the annotators analysed comments written by authors from around the world and not only from the US. Therefore, the possible bias introduced by annotators with a similar cultural background is reduced, and the different cultures of the developers involved in the analysis are considered. The use of the tool would have been problematic if the annotators were from the US and had analysed only comments written by authors from the US.

We considered 22 open source projects from one of the largest datasets of issue reports openly available (Ortu et al., 2015d). This paper aims to answer the following research questions:
• Does a relationship exist between politeness and issue fixing time?
Issue fixing time for polite issues is shorter than issue fixing time for impolite and mixed issues.
• Does politeness among developers affect the attractiveness of a project?
Magnetism and Stickiness are positively correlated with the percentage of polite comments.
• Does the percentage of polite comments vary over time?
The percentage of polite comments does vary over time, and in some cases it changes from a lower to a higher percentage of polite comments between two consecutive observation intervals. The percentage of polite comments over time is (for the majority of the projects in our corpus) seasonal and not random.
• How does politeness vary with respect to JIRA maintenance types and issue priorities?
Comments related to issues with maintenance type Bug and priority Minor or Trivial tend to have a higher percentage of impolite comments. Issues with maintenance type New Feature and priority Blocker or Critical tend to have a higher percentage of polite comments.

This paper is an extended version of earlier work by the same authors (Ortu et al., 2015b). We added eight new systems to the original corpus analysed in Ortu et al. (2015b) and two new research questions (RQ3 and RQ4); we also revisited RQ2, performing deeper statistical analysis. The remainder of this paper is structured as follows: In the next section, we provide related work. ‘Experimental Setup’ describes the dataset used for this study and our approach/rationale to evaluate the politeness of comments posted by developers. In ‘Results,’ we present the results and elaborate on the research questions we address. In ‘Discussion’ we present a discussion of the obtained results, and ‘Threats to Validity’ discusses the threats to validity. Finally, we summarise the study findings and present plans for future work in ‘Conclusions and Future Work.’

RELATED WORK
A growing body of literature has investigated the importance and the influence of human and social aspects, emotions and mood, both in software engineering and software development. Research has focused on understanding how the human aspects of a technical discipline can affect final results (Brief & Weiss, 2002; Capretz, 2003; Cockburn & Highsmith, 2001; Erez & Isen, 2002; Kaluzniacky, 2004), and the effect of politeness (Novielli, Calefato & Lanubile, 2014; Tan & Howard-Jones, 2014; Winschiers & Paterson, 2004; Tsay, Dabbish & Herbsleb, 2014; Rousinopoulos, Robles & González-Barahona, 2014).



Feldt et al. (2008) focused on personality as a relevant psychometric factor and presented results from an empirical study about correlations between personality and attitudes to software engineering processes and tools. The authors found that higher levels of the personality dimension ‘‘conscientiousness’’ correlated with attitudes towards work style, openness to changes and task preference.

IT companies are also becoming more conscious of social aspects. Ehlers (2015) evaluated the efforts of IT companies in acquiring software engineers by emphasizing socialness in their job advertising. The research analyzed 75,000 job advertisements from the recruiting platform Indeed and about 2,800 job ads from Stack Overflow Careers to investigate correlations between social factors and the employee satisfaction of a workplace. The findings showed that many companies advertise socialness explicitly. The Manifesto for Agile Development indicates that people and communications are more essential than procedures and tools (Beck et al., 2001).

Studies have also investigated the relationship between affect and work-related achievements, including performance (Miner & Glomb, 2010) and problem-solving processes such as creativity (Amabile et al., 2005). Furthermore, strong evidence for emotional contagion in all its possible polarities has been found in a recent very large-scale study (Kramer, Guillory & Hancock, 2014). Affect is therefore an interesting avenue for research in software engineering.

Steinmacher et al. (2015) analyzed social barriers that obstructed the first contributions of newcomers (new developers joining an open-source project). The study indicated how impolite answers were considered a barrier by newcomers. These barriers were identified through a systematic literature review, responses collected from open source project contributors and students contributing to open source projects.

Roberts, Hann & Slaughter (2006) conducted a study which revealed how the different motivations of open-source developers were interrelated, how these motivations influenced participation and how past performance influenced subsequent motivations.

Guzman & Bruegge (2013) and Guzman (2013a) have proposed prototypes and initial descriptive studies towards the visualization of affect over a software development process. In their work, the authors applied sentiment analysis to data coming from mailing lists, web pages, and other text-based documents of software projects. Guzman et al. built a prototype to display a visualization of the affect of a development team, and they interviewed project members to validate the usefulness of their approach. In another study, Guzman, Azócar & Li (2014) performed sentiment analysis of Github's commit comments to investigate how emotions were related to a project's programming language, the commits' day of the week and time, and the approval of the projects. The analysis was performed over 29 top-starred Github repositories implemented in 14 different programming languages. The results showed Java to be the programming language most associated with negative affect. No correlation was found between the number of Github stars and the affect of the commit messages.

Panichella et al. (2015) presented a taxonomy to classify app reviews into categories relevant to software maintenance and evolution, as well as an approach that merges three techniques (Natural Language Processing, Text Analysis, Sentiment Analysis) to automatically classify app reviews into the proposed categories. The authors showed that the combined use of these techniques achieves better results (a precision of 75% and a recall of 74%) than results obtained using each technique individually (a precision of 70% and a recall of 67%).

Pletea, Vasilescu & Serebrenik (2014) studied security-related discussions on GitHub, as mined from discussions around commits and pull requests. The authors found that security-related discussions account for approximately 10% of all discussions on GitHub and that more negative emotions were expressed in security-related discussions than in other discussions. These findings confirmed the importance of properly training developers to address security concerns in their applications, as well as the need to test applications thoroughly for security vulnerabilities in order to reduce frustration and improve the overall project atmosphere.

Garcia, Zanetti & Schweitzer (2013) analyzed the relation between the emotions and the activity of contributors in the Open Source Software project Gentoo. The case study built on extensive data sets from the project's bug tracking platform Bugzilla, to quantify the activity of contributors, and its mail archives, to quantify the emotions of contributors by means of sentiment analysis. The Gentoo project is known for a period of centralization within its bug triaging community. This was followed by considerable changes in community organization and performance after the sudden retirement of the central contributor. The authors analyzed how this event correlated with negative emotions, both in bilateral email discussions with the central contributor and at the level of the whole community of contributors. The authors also extended the study to consider the activity patterns of Gentoo contributors in general. They found that contributors were more likely to become inactive when they expressed strong positive or negative emotions in the bug tracker, or when they deviated from the expected value of emotions in the mailing list. The authors used these insights to develop a Bayesian classifier which detected the risk of contributors leaving the project.

Graziotin, Wang & Abrahamsson (2015) conducted a qualitative interpretive study based on face-to-face open-ended interviews, in-field observations and e-mail exchanges. This enabled the authors to construct a novel explanatory theory of the impact of affects on development performance. The theory was explicated using an established taxonomy framework. The proposed theory built upon the concepts of events, affects, attractors, focus, goals, and performance.

In other work, Graziotin, Wang & Abrahamsson (2014) reported the results of an investigation with 42 participants about the relationship between the affective states, creativity, and analytical problem-solving skills of software developers. The results offered support for the claim that happy developers were better problem solvers in terms of their analytical abilities. The authors provided a better understanding of the impact of affective states on the creativity and analytical problem-solving capacities of developers, introduced and validated psychological measurements, theories, and concepts of affective states, creativity, and analytical problem-solving skills in empirical software engineering, and raised the need for studying the human factors of software engineering from a multi-disciplinary viewpoint.



Rigby & Hassan (2007) analyzed, using a psychometrically-based linguistic analysis tool, the five big personality traits of software developers in the Apache httpd server mailing list. The authors found that two developers responsible for the major Apache releases had similar personalities, and that their personalities were different from those of other developers. Bazelli, Hindle & Stroulia (2013) analyzed questions and answers on stackoverflow.com to determine developer personality traits, using the Linguistic Inquiry and Word Count (Pennebaker, Francis & Booth, 2001). The authors found that the top reputed authors were more extroverted and expressed fewer negative emotions than authors of down-voted posts.

Tourani, Jiang & Adams (2014) evaluated the use of automatic sentiment analysis to identify distress or happiness in a team of developers. They extracted sentiment values from the mailing lists of two mature projects of the Apache Software Foundation, considering both developers and users. The authors found that an automatic sentiment analysis tool obtained low precision on email messages (due to the length of the analyzed text) and that users and developers express both positive and negative sentiment on mailing lists.

Murgia et al. (2014b) analyzed whether issue reports carried any emotional information about software development. The authors found that issue reports contain emotions regarding design choices, maintenance activity or colleagues. Gómez et al. (2012) performed an experiment to evaluate whether the level of extraversion in a team influenced the final quality of the software products obtained and the satisfaction perceived while the work was being carried out. Results indicated that when forming work teams, project managers should carry out a personality test in order to balance the number of extraverted team members with those who are not extraverted. This would permit the team members to feel satisfied with the work carried out by the team without reducing the quality of the software products developed.

Acuña, Gómez & Juristo (2008) performed empirical research examining the work climate within software development teams. The authors attempted to understand whether team climate (defined as the shared perceptions of team work procedures and practices) bore any relation to software product quality. They found that high team vision preferences and high participative safety perceptions of the team were significantly related to better software. In a study conducted by Fagerholm et al. (2014), it was shown that software teams engaged in a constant cycle of interpreting their performance. Thus, enhancing performance experiences requires integration of communication, team spirit and team identity into the development process.

Jongeling, Datta & Serebrenik (2015) studied whether sentiment analysis tools agreed with the sentiment recognized by human evaluators, as well as with each other. Furthermore, the authors evaluated the impact of the choice of a sentiment analysis tool on software engineering studies by conducting a simple study of differences in issue resolution times for positive, negative and neutral texts. The authors repeated the study for seven datasets and different sentiment analysis tools and observed that the disagreement between the tools can lead to contradictory conclusions.



Table 1 Selected Projects Statistics.

Project | # of comments | # of developers
HBase | 91,016 | 951
Hadoop Common | 61,958 | 1,243
Derby | 52,668 | 675
Lucene Core | 50,152 | 1,107
Hadoop HDFS | 42,208 | 757
Cassandra | 41,966 | 1,177
Solr | 41,695 | 1,590
Hive | 39,002 | 850
Hadoop Map/Reduce | 34,793 | 875
Harmony | 28,619 | 316
OFBiz | 25,694 | 578
Infrastructure | 25,439 | 1,362
Camel | 24,109 | 908
ZooKeeper | 16,672 | 495
GeoServer | 17,424 | 705
Geronimo | 18,017 | 499
Groovy | 18,186 | 1,305
Hibernate ORM | 23,575 | 4,037
JBoss | 23,035 | 453
JRuby | 22,233 | 1,523
Pig | 21,662 | 549
Wicket | 17,449 | 1,243
Tot | 737,572 | 18,144

EXPERIMENTAL SETUP
Dataset
We built our dataset by collecting data from the Apache Software Foundation Issue Tracking system, JIRA (https://www.atlassian.com/software/jira). An Issue Tracking System (ITS) is a repository used by software developers to support the software development process. It supports corrective maintenance activity, like a Bug Tracking system, along with other types of maintenance requests. We mined the ITS of the Apache Software Foundation, collecting issues from October 2002 to December 2013. In order to create our dataset, since the focus of our study was the usefulness of Agile boards, we selected projects for which the JIRA Agile board contained a significant amount of activity. Namely, we selected those systems for which the Agile board contained more than 15,000 comments (in order to build a time series with sufficient data) and for which there was recorded monthly activity (i.e., developers were active in every considered month). The JIRA dataset contains roughly 1,000 projects, 150 of which are characterised by more than 1,000 comments. Table 1 shows the corpus of 22 projects selected for our analysis, highlighting the number of comments recorded for each project and the number of developers involved.
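The selection criteria above (more than 15,000 comments on the Agile board and recorded activity in every month) reduce to a simple filter. The sketch below illustrates its shape; the data layout (dictionaries of per-month comment counts) and the project entries are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch of the project-selection filter described above. The data
# layout is an illustrative assumption, not the authors' pipeline.

def select_projects(projects, min_comments=15_000):
    """Keep projects with enough comments and activity in every month."""
    selected = []
    for project in projects:
        monthly = project["comments_per_month"]
        enough_comments = sum(monthly.values()) > min_comments
        always_active = all(count > 0 for count in monthly.values())
        if enough_comments and always_active:
            selected.append(project["name"])
    return selected

# Toy data with a lowered threshold, purely to show the filter's shape
# (the study's real threshold is 15,000 comments).
projects = [
    {"name": "HBase", "comments_per_month": {"2013-01": 900, "2013-02": 1100}},
    {"name": "Tiny",  "comments_per_month": {"2013-01": 40,  "2013-02": 0}},
]
print(select_projects(projects, min_comments=1_000))  # ['HBase']
```

"Tiny" is rejected on both counts: too few comments overall and a month with no activity.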



1 Users' names are reported as <dev_name_a> for the sake of privacy.

Table 2 Examples of polite and impolite comments.

Polite comments (classified YES):
• Hey <dev_name_a>, Would you be interested in contributing a fix and a test case for this as well? Thanks, <dev_name_b>
• <dev_name>, can you open a new JIRA for those suggestions? I'll be happy to review.
• <dev_name>, the latest patch isn't applying cleanly to trunk - could you resubmit it please? Thanks.
• <dev_name>, Since you can reproduce, do you still want the logs? I think I still have them if needed.

Impolite comments (classified NO):
• Why are you cloning tickets? Don't do that.
• shouldnt it check for existence of tarball even before it tries to allocate and error out ???
• <dev_name_a>, why no unit test? <dev_name_b>, why didn't you wait for +1 from Hudson???
• > this isn't the forum to clarify
  Why not? The question is whether this is redundant with Cascading, so comparisons are certainly relevant, no?

Comments politeness
Given some text, the tool developed by Danescu-Niculescu-Mizil et al. (2013) calculates the politeness of its sentences, providing one of two possible labels as a result: polite or impolite. Table 2 shows some examples of polite and impolite comments as classified by the tool.1



Figure 2 Example of timeline for JIRA issue.

We evaluated the percentage of polite comments per month, considering all comments posted in a certain month. For each comment we assigned a value according to the following rules:

• Value of +1 for those comments marked as polite by the tool.
• Value of −1 for those comments marked as impolite.

Finally, we calculated the percentage of polite comments for a certain month. We analyzed the politeness of about 700K comments.
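As a sketch, the per-month aggregation described above (where the +1/−1 coding amounts to counting polite versus impolite labels) might look like the following; the (month, label) comment representation is an illustrative assumption:

```python
from collections import defaultdict

def politeness_percentage_by_month(comments):
    """Percentage of comments labelled polite, per month.

    `comments` is an iterable of (month, label) pairs, where label is
    "polite" or "impolite" as assigned by the classifier.
    """
    polite = defaultdict(int)
    total = defaultdict(int)
    for month, label in comments:
        total[month] += 1
        if label == "polite":
            polite[month] += 1
    return {month: 100.0 * polite[month] / total[month] for month in total}

comments = [
    ("2013-01", "polite"), ("2013-01", "impolite"),
    ("2013-01", "polite"), ("2013-01", "impolite"),
    ("2013-02", "impolite"),
]
print(politeness_percentage_by_month(comments))
# {'2013-01': 50.0, '2013-02': 0.0}
```

This yields one politeness percentage per observation month, the quantity tracked over time in RQ3.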

Issue politeness
The next step was to infer the politeness of issues from the politeness of their comments. We grouped issues together as follows:

• we first divided comments into two sets: polite and impolite;
• we then divided issues into three sets: polite issues (commented only with polite comments), impolite issues (commented only with impolite comments) and mixed issues (commented with both polite and impolite comments).

For each issue we evaluated the politeness expressed in its comments and divided the issues into three groups: polite issues containing only polite comments, impolite issues containing only impolite comments, and mixed issues containing both polite and impolite comments. Our dataset contains 174,871 issues in total; 5.3% (9,269) were classified as polite, 56.9% (99,501) as impolite and 37.8% (66,101) as mixed. For each of these three groups of issues we evaluated the issue fixing time as the difference between resolution and creation time. Figure 2 shows the typical issue timeline in JIRA:
• T_cr represents the time when an issue is created.
• T_cl represents the time when an issue is closed.
• T_a represents the time when an issue is assigned to a developer.
• T_s is the time when a developer subscribes to an issue that has been assigned to them.

To infer the issue fixing time (abbreviated as IFT), we used the approach proposed byMurgia et al. (2014a). We computed the time interval between the last time an issue hadbeen closed and the last time it had been subscribed to by an assignee.
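A minimal sketch of this IFT computation (the timestamps are invented; following the approach above, the relevant events are the last closing of the issue and the last subscription by its assignee):

```python
# IFT = interval between the last time an issue was closed and the last
# time it was subscribed to by an assignee, in hours. Toy timestamps.
from datetime import datetime

def issue_fixing_time(subscribe_times, close_times):
    t_s = max(subscribe_times)   # last subscription by the assignee (T_s)
    t_cl = max(close_times)      # last closing event (T_cl)
    return (t_cl - t_s).total_seconds() / 3600.0

subs = [datetime(2011, 3, 1, 10), datetime(2011, 3, 2, 10)]
closes = [datetime(2011, 3, 4, 10)]
print(issue_fixing_time(subs, closes))  # 48.0
```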

Attractiveness
Our research focuses on the concepts developed by Yamashita et al. (2014) and Yamashita et al. (2016), who introduced the concepts of magnetism and stickiness for


(We consider as active all developers who posted, commented on, resolved or modified an issue during the observed time, from dev_1 to dev_10.)

Figure 3 Example of Magnetism and Stickiness metrics computation in 2011.

a software project. A project is classified as Magnetic if it has the ability to attract new developers over time. Stickiness is the ability of a project to keep its developers over time. We measured these two metrics considering an observation period of one year. Figure 3 shows an example of the evaluation of the Magnetism and Stickiness metrics. In this example, we were interested in calculating the values of Magnetism and Stickiness for 2011. From 2010 to 2012, we had a total of 10 active developers. In 2011, there were seven active developers, two of whom (highlighted with black heads) were new. Only three (highlighted with grey heads) of the seven active developers in 2011 were also active in 2012. We can then calculate the Magnetism and Stickiness as follows:

• Magnetism is the fraction of new active developers during the observed time interval, in our example 2/10 (dev_6 and dev_7 were active in 2011 but not in 2010).
• Stickiness is the fraction of active developers that were also active during the next time interval, in our example 3/7 (dev_1, dev_2 and dev_3 were active in both 2011 and 2012).
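With these definitions, both metrics reduce to simple set operations. The sketch below mirrors the Fig. 3 example (developer names and yearly activity sets are illustrative; function names are ours):

```python
# Sketch of the Magnetism/Stickiness computation on the Fig. 3 example.
def magnetism(prev_active, cur_active, all_active):
    """Fraction of all observed developers that are new in the current period."""
    new_devs = cur_active - prev_active
    return len(new_devs) / len(all_active)

def stickiness(cur_active, next_active):
    """Fraction of current developers still active in the next period."""
    retained = cur_active & next_active
    return len(retained) / len(cur_active)

y2010 = {"dev_1", "dev_2", "dev_3", "dev_4", "dev_5"}
y2011 = {"dev_1", "dev_2", "dev_3", "dev_4", "dev_5", "dev_6", "dev_7"}
y2012 = {"dev_1", "dev_2", "dev_3", "dev_8", "dev_9", "dev_10"}
everyone = y2010 | y2011 | y2012          # 10 active developers in 2010-2012

print(magnetism(y2010, y2011, everyone))  # 0.2 (= 2/10)
print(stickiness(y2011, y2012))           # 0.42857142857142855 (= 3/7)
```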

Data analysis
To perform statistical testing, filter the data and produce the visualisation of the results, we used scikit-learn (http://scikit-learn.org/stable/) for RQ1, and the R project for statistical computing (R Development Core Team, 2014) for the other RQs. To facilitate replication of our study, we have created a replication package (https://bitbucket.org/giuseppedestefanis/peerjcs_replicationpackage) which contains the dataset, the tool used to detect politeness and the R and Python scripts for performing the statistical analysis.

RESULTS
Does a relationship exist between politeness and issue fixing time?
Motivation. Murgia et al. (2014a) demonstrated the influence of maintenance type on the issue fixing time, while Zhang, Gong & Versteeg (2013) developed a prediction model of bug fixing time for commercial software.


In another study, Zhang et al. (2012) analyzed the interval between bug assignment and the time when bug fixing starts (bugs are classified as a type of issue in JIRA). After a bug is reported and assigned, some developers start fixing it immediately, while others start only after a long delay. The authors explored these delays by empirically analyzing three open source software systems. They studied factors affecting bug fixing time along three dimensions (bug reports, source code involved in the fix and code changes required to fix the bug) and compared such factors using logistic regression models. The most influential factors on bug fixing time appeared to be the severity of a bug, its description and comments, and the operating system on which the bug was found. Hence, there are indeed many factors able to influence bug fixing time and issue fixing time; in this case, we were interested in finding out whether the politeness expressed by developers in comments had an influence on issue fixing time.

To detect differences among the fixing times of polite, impolite and mixed issues, we used the Kruskal–Wallis test. This test is non-parametric and unpaired (Siegel, 1956; Kruskal & Wallis, 1952; Weiss et al., 2007). The test can be used with no restrictions or hypotheses on the statistical distribution of the sample populations and is suitable for comparing differences among the medians of two or more populations when their distributions are not Gaussian. We grouped all the issues (by category) of all the projects contained in our corpus and then tested the null hypothesis H0: the three distributions of issue fixing time are equal for the three typologies of considered issues (polite, impolite, mixed). The outcome of the Kruskal–Wallis test is a p-value p < 2.2e−16, indicating that the three distributions are statistically different. Figure 4 shows the box-plot of the issue fixing time for the three groups of issues considered (polite, impolite and mixed) for all the issues analysed. The issue fixing time is expressed in hours on a logarithmic scale. The median of the issue fixing time for polite issues is shorter than that for impolite and mixed issues, while the median for impolite issues is shorter than the one for mixed issues. Figure 4 also shows that the percentage of impolite issues is the highest (56.9%), followed by mixed issues (37.7%) and then polite issues (5.3%).
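For illustration, the Kruskal–Wallis H statistic can be computed directly. The sketch below ignores tie correction and uses invented fixing times, not the paper's data (a real analysis would use R's kruskal.test or an equivalent routine):

```python
# Minimal Kruskal-Wallis H statistic (no tie correction; toy fixing times).
def kruskal_wallis_h(*groups):
    """H = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), ranks over pooled data."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    h = 12.0 / (n_total * (n_total + 1))
    h *= sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    return h - 3 * (n_total + 1)

polite = [2, 3, 5]      # hypothetical fixing times in hours
impolite = [8, 9, 11]
mixed = [20, 25, 30]
h = kruskal_wallis_h(polite, impolite, mixed)
print(round(h, 3))  # 7.2; compare against chi-square with k - 1 = 2 d.o.f.
```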

Figures 5 and 6 show the boxplots of the issue fixing time for single projects. We considered the four projects (Hadoop HDFS, Derby, Lucene-Core, Hadoop Map/Reduce) with the highest number of comments in our corpus as examples. For visualising the boxplots of Figs. 5 and 6, we grouped the issues by project and then by category. The results obtained with all the issues grouped together (without distinguishing by project) are confirmed in these four examples: the median for polite issues is lower than the other two medians, and also in these cases issues with mixed polite and impolite comments have a longer fixing time than issues with impolite comments.

Findings. Issue fixing time for polite issues is shorter than issue fixing time for impolite andmixed issues.

Does politeness among developers affect the attractiveness of a project?
Motivation. Magnetism and Stickiness are two interesting metrics able to describe the general health of a project; namely, if a project is able to attract new developers and to


Figure 4 Box-plot of the fixing-time expressed in hours. The number in parentheses next to each issue group indicates the percentage of issues.

Figure 5 Box-plot of the fixing-time expressed in hours for single projects. The number in parentheses next to polite/impolite indicates the percentage of impolite and polite issues.

keep them over time, we can then conclude that the project is healthy. On the other hand, if a project is neither magnetic nor sticky, we can conclude that it is losing developers and not attracting new ones over time. Although there may be many factors influencing magnetism and stickiness, we were interested in analysing the correlation between the politeness expressed by developers in their comments and these two metrics.

To detect if there was a direct correlation between magnetism and stickiness of a projectand politeness, we considered an observation time of one month. During this time interval


Figure 6 Box-plot of the fixing-time expressed in hours for single projects. The number in parentheses next to polite/impolite indicates the percentage of impolite and polite issues.

we measured magnetism, stickiness and the percentage of polite comments. Since we had no evidence that politeness in the observed time interval affects magnetism and stickiness in the same interval rather than in the following one, we evaluated a cross-correlation coefficient. To study the cross-correlation, we first examined the time series for stationarity. A time series Xt (t = 1, 2, ...) is considered stationary if its statistical properties (autocorrelation, variance, expectation) do not vary with time (Priestley, 1981; Priestley, 1988; Hamilton, 1994). Stationarity is a condition required for calculating the most common correlation coefficients. However, it is still possible to study cross-correlation between time series which are not stationary (Krištoufek, 2014).

We used the R package fpp (https://cran.r-project.org/package=fpp) and we applied:

• Ljung–Box test (Box & Pierce, 1970; Ljung & Box, 1978): the null hypothesis H0 is that the data are non-stationary (the test confirms independence of increments); rejection of H0 indicates stationarity.
• Augmented Dickey–Fuller (ADF) t-statistic test (Said & Dickey, 1984; Diebold & Rudebusch, 1991; Banerjee et al., 1993): the null hypothesis H0 is that the data are non-stationary; small p-values (e.g., less than 0.05) suggest that the time series is stationary.
• Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test (Kwiatkowski et al., 1992): this test reverses the hypotheses, hence the null hypothesis H0 is that the time series is stationary; small p-values (e.g., less than 0.05) suggest that the time series is not stationary.
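As a concrete illustration of the first test, the Ljung–Box Q statistic can be computed directly. The sketch below uses an invented, strongly trending series; a real analysis would use R's Box.test or an equivalent library routine:

```python
# Pure-Python sketch of the Ljung-Box Q statistic (toy series).
# Q is compared with a chi-square critical value with h degrees of
# freedom (11.07 at the 5% level for h = 5).
def ljung_box_q(series, h):
    """Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k), r_k = lag-k autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum((series[t] - mean) * (series[t - k] - mean)
                  for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

trend = [float(t) for t in range(30)]   # strongly autocorrelated toy series
print(ljung_box_q(trend, 5) > 11.07)    # True: H0 of independence rejected
```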

We decided to proceed using the results obtained from the three tests and to consider the worst-case scenario (i.e., even if only one test out of three indicated rejection of the hypothesis of stationarity for a given time series, we considered that time series as non-stationary). The results of the three tests are shown in Table 3. The cells in grey indicate that the p-value for the corresponding test is below 5% (our cutoff for significance), thus we infer in these cases that the test indicates stationarity


Table 3 P-value results for stationarity tests (Ljung–Box, Augmented Dickey-Fuller, KPSS).

Project          | Politeness (LB / ADF / KPSS) | Magnetism (LB / ADF / KPSS) | Stickiness (LB / ADF / KPSS)
HBase            | 0.00023 / 0.028 / 0.01       | 0.9765 / 0.01 / 0.0285      | 0.266 / 0.156 / 0.1
Hadoop C.        | 0.00072 / 0.59 / 0.044       | 0.0677 / 0.01 / 0.1         | 3.062e-08 / 0.01 / 0.01
Derby            | 1.626e-08 / 0.01 / 0.01      | 0.19 / 0.049 / 0.1          | 0.037 / 0.094 / 0.01
Lucene C.        | 0.498 / 0.0173 / 0.017       | 0.38 / 0.08 / 0.01          | 3.105e-13 / 0.48 / 0.01
Hadoop HDFS      | 0.025 / 0.79 / 0.09          | 0.007 / 0.01 / 0.01         | 0.093 / 0.30 / 0.036
Cassandra        | 0.25 / 0.043 / 0.1           | 0.57 / 0.18 / 0.1           | 0.099 / 0.387 / 0.01
Solr             | 0.0075 / 0.31 / 0.01         | 0.0003 / 0.01 / 1.615e-07   | 0.01 / 0.01 / 0.01
Hive             | 8.075e-08 / 0.113 / 0.01     | 0.335 / 0.32 / 0.01         | 0.006 / 0.28 / 0.01
Hadoop MR        | 6.815e-05 / 0.4 / 0.01       | 0.0003 / 0.06 / 0.01        | 8.821e-08 / 0.92 / 0.012
Harmony          | 0.03 / 0.012 / 0.01          | 0.0019 / 0.23 / 0.058       | 3.658e-08 / 0.01 / 0.01
OFBiz            | 0.0006 / 0.01 / 0.01         | 0.91 / 0.09 / 0.1           | 7.986e-05 / 0.01 / 0.01
Infrastructure   | 0.16 / 0.01 / 0.1            | 0.0002 / 0.01 / 0.99        | 0.53 / 0.028 / 0.01
Camel            | 0.00097 / 0.01 / 0.01        | 0.79 / 0.01 / 0.1           | 4.885e-15 / 0.36 / 0.01
ZooKeeper        | 0.14 / 0.45 / 0.046          | 0.26 / 0.01 / 0.1           | 0.398 / 0.17 / 0.01
GeoServer        | 0.064 / 0.01 / 0.01          | 0.0005 / 0.01 / 0.1         | 0.49 / 0.027 / 0.011
Geronimo         | 0.32 / 0.01 / 0.1            | 0.84 / 0.078 / 0.1          | 2.665e-15 / 0.59 / 0.01
Groovy           | 8.514e-05 / 0.01 / 0.047     | 0.14 / 0.01 / 0.1           | 1.225e-07 / 0.01 / 0.01
Hibernate ORM    | 4.393e-05 / 0.063 / 0.016    | 0.15 / 0.01 / 0.06          | 3.074e-12 / 0.01 / 0.033
JBoss            | 0.36 / 0.01 / 0.1            | 0.53 / 0.95 / 0.027         | 2.2e-16 / 0.88 / 0.01
JRuby            | 0.67 / 0.29 / 0.02           | 0.074 / 0.01 / 0.1          | 0.00075 / 0.32 / 0.01
Pig              | 7.096e-05 / 0.12 / 0.02      | 0.59 / 0.01 / 0.078         | 2.2e-16 / 0.45 / 0.01
Wicket           | 0.084 / 0.064 / 0.1          | 0.093 / 0.202 / 0.1         | 1.105e-05 / 0.33 / 0.018

(on the contrary, cells in white indicate that the p-value for the corresponding test is above 5%). For example, for the percentage of polite comments time series of HBase (row 1, first three cells), the first two tests (Ljung–Box and Augmented Dickey–Fuller) suggest stationarity, while the third test (KPSS) rejects the hypothesis of stationarity. For the majority of the cases, the tests provide discordant results. Only for the Magnetism of Infrastructure (row 12) and GeoServer (row 15) is there agreement among the three tests. Thus, we considered all the time series in Table 3, except for the cases mentioned above, as non-stationary. Table 4 shows the results obtained by applying the algorithm illustrated in Krištoufek (2014) (http://stats.stackexchange.com/questions/149799/code-for-detrended-cross-correlation-in-r). For the majority of the cases there is a weak positive correlation between politeness and Magnetism (14 projects out of 22) and between politeness and Stickiness (13 projects out of 22).

We also calculated the cross-correlation (using the ccf function in R) after applying time series differencing to transform the time series that were not stationary in Table 3 into stationary ones. The differencing operator applied to a time series k creates a new series whose value at time t is the difference between k(t+1) and k(t). This method is useful for removing cycles and trends.
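First differencing as just described can be sketched in a few lines (toy data; the R equivalent is diff()):

```python
# Each value of the differenced series is the difference between
# consecutive values of the original one; a linear trend becomes constant.
def difference(series, lag=1):
    return [series[t + lag] - series[t] for t in range(len(series) - lag)]

trend = [3, 5, 7, 9, 11]
print(difference(trend))  # [2, 2, 2, 2]
```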


Table 4 Cross-correlation for non-stationary series.

Project           | Politeness–Magnetism | Politeness–Stickiness
HBase             | 0.131   | −0.096
Hadoop Common     | 0.024   | 0.018
Derby             | −0.0018 | 0.0147
Lucene Core       | 0.32    | 0.137
Hadoop HDFS       | −0.08   | 0.104
Cassandra         | 0.296   | −0.136
Solr              | −0.169  | 0.0018
Hive              | 0.305   | 0.184
Hadoop Map/Reduce | −0.047  | 0.052
Harmony           | 0.21    | 0.026
OFBiz             | 0.11    | 0.097
Infrastructure    | −0.025  | 0.292
Camel             | 0.115   | 0.013
ZooKeeper         | −0.19   | −0.11
GeoServer         | 0.22    | −0.13
Geronimo          | 0.23    | 0.27
Groovy            | 0.013   | −0.22
Hibernate ORM     | 0.02    | −0.1
JBoss             | −0.26   | −0.2
JRuby             | −0.12   | −0.06
Pig               | 0.26    | 0.14
Wicket            | 0.016   | −0.14

Figure 7 Differencing Time-series.

As an example, Fig. 7A shows the Magnetism time series for Lucene Core. From Table 3, row 4, we can see that this time series is not stationary, since all three tests failed to prove stationarity. By applying the differencing operator (first differencing), we obtain the new time series in Fig. 7B.

The new time series in Fig. 7B is stationary; all three tests provide the same indication of stationarity (Ljung–Box: p-value = 1.4e−13, Augmented Dickey–Fuller:


Figure 8 Cross-Correlation—Lucene Core.

p-value = 0.01, KPSS: p-value = 0.1) and the plot appears roughly horizontal (visual inspection is a practical rule of thumb which can help when evaluating a time series for stationarity). If the time series under study are stationary, it is possible to calculate the cross-correlation. The cross-correlation function (ccf) is defined as the set of correlations (the heights of the vertical line segments in Fig. 8) between the two time series x(t+h) and y(t) for lags h = 0, ±1, ±2, .... A negative value of h represents a correlation between the x-series at a time before t and the y-series at time t. For example, if the lag h = −1, then the cross-correlation value would give the correlation between x(t−1) and y(t). Line segments below zero, on the contrary, correspond to anti-correlated events.

The ccf helps to identify lags of x(t) that could be predictors of the y(t) series.
• When h < 0 (left side of the plots in Figs. 8A and 8B), x leads y.
• When h > 0 (right side of the plots in Figs. 8A and 8B), y leads x.

Table 5 shows the maximum value of the cross-correlation coefficient between the percentage of polite comments and Magnetism, and between the percentage of polite comments and Stickiness, together with the lag (or lags) at which the maximum value occurs. The values are calculated using the R function ccf.
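The lag convention can be illustrated with a small sketch of a sample cross-correlation function (both series are invented toy data; the normalisation loosely follows R's ccf, demeaning over the full series):

```python
# Cross-correlation of x shifted by h against y (h < 0 means x leads y),
# plus the lag at which the correlation is maximal. Toy data only.
def ccf_at_lag(x, y, h):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((v - mx) ** 2 for v in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    pairs = [(x[t + h], y[t]) for t in range(n) if 0 <= t + h < n]
    return sum((a - mx) * (b - my) for a, b in pairs) / (sx * sy)

def max_ccf(x, y, max_lag):
    best = max(range(-max_lag, max_lag + 1), key=lambda h: ccf_at_lag(x, y, h))
    return best, ccf_at_lag(x, y, best)

politeness = [0, 0, 1, 0, 0, 2, 0, 0]
magnetism = [0, 0, 0, 1, 0, 0, 2, 0]   # same spikes, one period later
lag, value = max_ccf(politeness, magnetism, 3)
print(lag)  # -1: politeness leads magnetism by one period
```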

A negative lag x means that the current values of Stickiness are likely to be higher if, x months before, the percentage of polite comments was higher. On the other hand, a positive lag z means that a currently higher percentage of polite comments is linked with higher Magnetism z months later. For both Magnetism and Stickiness, we observe that a positive maximum correlation exists.

The differences in lags presented in Table 5 could be explained by looking at the composition of our corpus. We selected the projects with the highest number of comments from JIRA, regardless of domain, history and/or programming language used. Additionally, some systems are younger than others and, as a consequence, the time series may have different lengths.

Findings. Magnetism and Stickiness are positively correlated with the percentage of politecomments.


Table 5 Politeness vs. Magnetism and Stickiness cross-correlation coefficients.

Project           | Pol vs Mag | Lag   | Pol vs Stick | Lag
HBase             | 0.378 | −1, 7 | 0.418 | 6
Hadoop Common     | 0.283 | −5    | 0.203 | 5
Derby             | 0.181 | 15    | 0.185 | −1
Lucene Core       | 0.31  | −6    | 0.243 | −11
Hadoop HDFS       | 0.3   | 0     | 0.32  | −1
Cassandra         | 0.414 | 5     | 0.315 | 2
Solr              | 0.33  | −4    | 0.23  | 12
Hive              | 0.4   | −14   | 0.3   | −13
Hadoop Map/Reduce | 0.35  | 3     | 0.35  | 3
Harmony           | 0.3   | −5    | 0.35  | 1
OFBiz             | 0.25  | 14    | 0.22  | −11
Infrastructure    | 0.17  | 4     | 0.26  | −11
Camel             | 0.28  | −12   | 0.26  | −14, −2, 7
ZooKeeper         | 0.41  | 6     | 0.27  | −2
GeoServer         | 0.3   | 0     | 0.2   | −5, −1
Geronimo          | 0.32  | 0     | 0.24  | 0
Groovy            | 0.2   | 3     | 0.21  | 2
JBoss             | 0.3   | 8     | 0.43  | 5
Hibernate ORM     | 0.23  | 1     | 0.18  | −3
JRuby             | 0.25  | 10    | 0.2   | −10
Pig               | 0.38  | 11    | 0.17  | 7, 9
Wicket            | 0.3   | −13   | 0.27  | 11

Does the percentage of polite comments vary over time?
Motivation. Politeness has an influence on the productivity of a team (Ortu et al., 2015b; Ortu et al., 2015a; Ortu et al., 2015c). Thus, it is interesting to understand if there are periods of time in which the level of politeness decreases (potentially affecting the productivity of a team).

We calculated the level of politeness for any given issue and then plotted the percentage of polite comments per month, grouping issues by project. For each project considered in this study, the percentage of polite comments over time can be seen as a time series, hence we performed tests for randomness and seasonality to understand the nature of the politeness time series. A time series is considered random if it consists of independent values from the same distribution. We used the Bartels test (Bartels, 1982) for studying randomness (Brockwell & Davis, 2006) and the results of the Augmented Dickey–Fuller and KPSS tests from ‘Does politeness among developers affect the attractiveness of a project?’ for studying seasonality. We used the R package randtests (https://cran.r-project.org/web/packages/randtests/randtests.pdf) and we applied:

• Bartels test: in this test, the null hypothesis H0 of randomness is tested against non-randomness.


Table 6 Randomness and seasonality test results for Politeness.

Project           | Bartels-rank p-value (randomness) | ADF p-value (seasonality) | KPSS p-value (seasonality)
HBase             | 0.0006529 | 0.028  | 0.01
Hadoop Common     | 3.362e-06 | 0.59   | 0.044
Derby             | 6.824e-08 | 0.01   | 0.01
Lucene Core       | 0.008213  | 0.0173 | 0.017
Hadoop HDFS       | 1.622e-06 | 0.79   | 0.09
Cassandra         | 0.3747    | 0.043  | 0.1
Solr              | 0.0026    | 0.31   | 0.01
Hive              | 1.044e-09 | 0.113  | 0.01
Hadoop Map/Reduce | 0.0004533 | 0.4    | 0.01
Harmony           | 0.001998  | 0.012  | 0.01
OFBiz             | 0.0003783 | 0.01   | 0.01
Infrastructure    | 0.02888   | 0.01   | 0.1
Camel             | 2.951e-06 | 0.01   | 0.01
ZooKeeper         | 0.07181   | 0.45   | 0.046
GeoServer         | 0.0008848 | 0.01   | 0.01
Geronimo          | 0.0702    | 0.01   | 0.1
Groovy            | 0.00025   | 0.01   | 0.047
Hibernate ORM     | 0.000384  | 0.063  | 0.016
JBoss             | 0.2112    | 0.01   | 0.1
JRuby             | 0.6816    | 0.29   | 0.02
Pig               | 1.957e-05 | 0.12   | 0.02
Wicket            | 0.1448    | 0.064  | 0.1

For studying the seasonality of the percentage of polite comments time series, weconsidered the results (from ‘Does politeness among developers affect the attractiveness ofa project?’) of the following tests:

• Augmented Dickey–Fuller test: the null hypothesis H0 is that the data are non-stationary and non-seasonal;
• Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test: the null hypothesis H0 is that the data are stationary and non-seasonal.

The results of the tests are shown in Table 6. For randomness, the cells in grey indicate that the p-value for the corresponding test is higher than 5% (our cutoff for significance), thus we infer in these cases that the test indicates randomness (the null hypothesis H0 of randomness is not rejected). On the contrary, cells in white indicate that the p-value for the corresponding test is lower than 5%. In the majority of the cases (16 out of 22), the percentage of polite comments time series were not random. For seasonality, the cells in grey indicate that the p-value for the corresponding test is less than 5%, thus we infer in these cases that the test rejects the null hypothesis H0 of non-seasonality.


Table 6 shows that, for the Augmented Dickey–Fuller test, 12 projects (HBase, Derby, Lucene Core, Cassandra, Harmony, OFBiz, Infrastructure, Camel, GeoServer, Geronimo, Groovy, JBoss) have a percentage of polite comments time series which presents seasonality, while 10 time series are not seasonal. For the KPSS test, the null hypothesis of non-seasonality is rejected for 16 time series (out of 22).

It is interesting to note that there are variations in the percentage of polite comments over time. This is by no means a representation of time dynamics, but simply the representation of random variation of the percentage of polite comments over time. Cassandra and OFBiz present seasonality (Table 6) and Fig. 9 shows the seasonal component determined using the stl function (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/stl.html) in R. In Cassandra, Fig. 9A, and OFBiz, Fig. 9B, we see how the percentage of polite comments decreases in some time intervals and increases in others. It is also possible to analyse the trend of the percentage of polite comments: while for Cassandra the trend is increasing starting from the year 2011, for OFBiz the trend is decreasing.

Findings. The percentage of polite comments does vary over time and in some cases it changes from a lower to a higher percentage of polite comments between two consecutive observation intervals. This fact could be related to the composition of our corpus. We considered only open source systems, hence there are no strict deadlines or particularly busy days (such as Fridays, as suggested by Śliwerski, Zimmermann & Zeller (2005)).

How does politeness vary with respect to JIRA maintenance types and issue priorities?
Motivation. Understanding which typology of issue attracts more impolite comments could help both managers and developers better understand the development process and take action to better manage the distribution of issues within development teams. A classification of the types of issues is provided on the JIRA wiki (https://cwiki.apache.org/confluence/display/FLUME/Classification+of+JIRA+Issues). The following list gives a brief introduction:

• Bug: this type of issue indicates a defect in the source code, such as logic errors, out-of-memory errors, memory leaks and run-time errors. Any failure of the product to perform as expected and any other unexpected or unwanted behaviour can be registered as type Bug.
• Sub-task: this type of issue indicates a task that must be completed as an element of a larger and more complex task. Sub-task issues are useful for dividing a parent issue into a number of smaller, more manageable units that can be assigned and tracked separately.
• Task: this type of issue indicates a task that is compulsory to complete.
• Improvement: this type of issue indicates an improvement or enhancement to an existing feature of the system.
• New Feature: this type of issue indicates a new feature of the product yet to be developed.


Figure 9 Time series decomposition.


Table 7 Maintenance statistics.

Type IMPOLITE POLITE

Bug              | 201,359 | 163,489
Sub-task         | 24,333  | 22,909
Task             | 19,379  | 14,442
Improvement      | 90,477  | 90,564
New Feature      | 33,640  | 39,538
Wish             | 2,573   | 2,681
Test             | 3,788   | 3,270
New JIRA Project | 172     | 210
Brainstorming    | 98      | 172
Umbrella         | 87      | 144

Table 8 Priority statistics.

Priority | IMPOLITE | POLITE

Blocker  | 20,657  | 19,049
Critical | 21,517  | 19,410
Major    | 241,012 | 219,841
Minor    | 82,892  | 71,905
Optional | 105     | 52
Trivial  | 12,009  | 8,479

• Wish: this type of issue is used to track general wishlist items, which could be classified as new features or improvements for the system under development.
• Test: this type of issue can be used to track a new unit or integration test.
• New JIRA Project: this type of issue indicates the request for a new JIRA project to be set up.
• Brainstorming: this type of issue is more suitable for items in their early stage of formation, not yet mature enough to be labelled as a Task or New Feature. It provides a bucket where thoughts and ideas from interested parties can be recorded as the discussion and exchange of ideas progresses. Once a resolution is made, a Task can be created with all the details defined during the brainstorming phase.
• Umbrella: this type of issue is an overarching type comprised of one or more sub-tasks.

Tables 7 and 8 provide information about the absolute number of issues (by maintenance type and by priority) in our corpus.

To detect the level of politeness for each category of issue, we grouped the issue comments by type of maintenance and by priority.

To justify claims such as ‘‘issues of type A tend to have more polite comments than issues of type B’’, we used the multiple contrast test procedure (Konietschke, Hothorn & Brunner, 2012) with a 5% error rate. Instead of following a classical two-step approach, in which a global null hypothesis is tested first and, as a second step, multiple


Table 9 Maintenance—significant results.

Pair Lower Upper p-value

Improvement—Bug         | 0.009  | 0.029  | 9.067173e-04
New Feature—Bug         | 0.042  | 0.077  | 9.409836e-06
New Feature—Improvement | 0.023  | 0.059  | 2.411054e-04
Task—Improvement        | −0.085 | −0.048 | 5.512420e-06
Sub-task—New Feature    | −0.071 | −0.028 | 2.764748e-04
Task—New Feature        | −0.132 | −0.083 | 1.791890e-07
Test—New Feature        | −0.083 | −0.007 | 2.128063e-02
Wish—New Feature        | −0.116 | −0.024 | 4.615626e-03
Task—Sub-task           | −0.081 | −0.036 | 1.468676e-04
Test—Task               | 0.025  | 0.101  | 2.563559e-03

comparisons are used to test sub-hypotheses related to each pair of groups, we used Tukey's test (Tukey, 1949), a single-step multiple comparison procedure and statistical test, to answer our research question.

To visualize the results obtained from the multiple contrast test procedure, we used the T∼-graph presented by Vasilescu et al. (2014a). This visual comparison of multiple distributions using T∼-graphs provides an immediate understanding: groups located in higher positions in the graph correspond to issue categories with higher politeness, while groups located in lower positions correspond to issue categories with lower politeness. Other studies related to the application of T-graphs can be found in Vasilescu, Filkov & Serebrenik (2013) and Vasilescu et al. (2014b).

The approach used to build a T∼-graph is the following (cf. Vasilescu et al. (2014a)):

• for each pair of groups, analyse the 95% confidence interval to test whether the corresponding null sub-hypothesis can be rejected;
• if the lower boundary of the interval is greater than zero for groups A and B, we conclude that the metric value is higher in A than in B;
• if the upper boundary of the interval is less than zero for groups A and B, we conclude that the metric value is lower in A than in B;
• if the lower boundary of the interval is less than zero and the upper boundary is greater than zero, we conclude that the data does not provide enough evidence to reject the null hypothesis;
• based on the results of the comparisons, we construct the graph with nodes being groups, containing an edge (A, B) if the metric value is higher in A than in B.
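The edge-construction rules above can be sketched directly from the confidence intervals (the tuples below reproduce a subset of Table 9's rows; the function name is ours):

```python
# Build T~-graph edges from 95% confidence intervals: an edge (A, B) means
# the metric (politeness) is significantly higher in A than in B.
def t_graph_edges(intervals):
    edges = []
    for a, b, lower, upper in intervals:
        if lower > 0:        # interval entirely above zero: A higher than B
            edges.append((a, b))
        elif upper < 0:      # interval entirely below zero: B higher than A
            edges.append((b, a))
        # interval straddles zero: not enough evidence, no edge
    return edges

table9 = [("Improvement", "Bug", 0.009, 0.029),
          ("New Feature", "Bug", 0.042, 0.077),
          ("Task", "Improvement", -0.085, -0.048)]
print(t_graph_edges(table9))
# [('Improvement', 'Bug'), ('New Feature', 'Bug'), ('Improvement', 'Task')]
```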

The results of the multiple contrast test procedure are presented in Tables 9 and 10. We used the mctp function (for Tukey's test) from the nparcomp R package (https://cran.r-project.org/web/packages/nparcomp/nparcomp.pdf) to obtain the tables. Table 9 summarises the significant results which we used to build a T∼-graph. For the category Brainstorming, the lower boundary of the interval is less than zero and the upper boundary is greater than zero. In this situation the data does not provide enough evidence


Figure 10 T∼-graph for issue maintenance.

Table 10 Issue priority classification.

Pair Lower Upper p-value

Critical—Blocker | −0.018 | 0.022  | 0.997
Major—Blocker    | −0.070 | −0.042 | 0
Minor—Blocker    | −0.093 | −0.065 | 0
Trivial—Blocker  | −0.149 | −0.113 | 0
Major—Critical   | −0.073 | −0.043 | 0
Minor—Critical   | −0.097 | −0.066 | 0
Trivial—Critical | −0.152 | −0.115 | 0
Minor—Major      | −0.030 | −0.016 | 0
Trivial—Major    | −0.088 | −0.062 | 0
Trivial—Minor    | −0.066 | −0.039 | 0

and we cannot conclude that, for example, Brainstorming issues are more (or less) polite than Bug issues. The same happens for the categories Umbrella and New JIRA Project.

Figure 10 shows the T∼-graph resulting from the values in Table 9. The category New Feature is more polite than Test, Sub-task, Improvement and Wish; Test, Sub-task and Improvement are more polite than Task; Improvement is more polite than Bug. Issues with maintenance type Bug are related to defects and software failures; this category presents the lowest politeness. Issues with maintenance type New Feature are proposals made by developers, and it is interesting to see that when proposing something new, developers tend to be more polite. Issues on JIRA are also classified by level of priority: Major, Minor, Blocker (e.g., an issue which blocks development and/or testing work), Critical and Trivial.

Table 10 shows the results of the multiple contrast test procedure for the different groups of issue priority. Figure 11 shows the associated T∼-graph.

Blocker and Critical issues are more polite than Major, Minor and Trivial issues. Major issues are more polite than Minor and Trivial, while Minor are more polite than Trivial. Trivial issues are characterised by the lowest politeness.
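As an illustration, the pairwise intervals in Table 10 are enough to reconstruct this ordering mechanically. The sketch below is not the authors' script: the edge rule (keep a T∼-graph edge only when a simultaneous confidence interval excludes zero) is our reading of the multiple contrast output.

```python
# Reconstructing the politeness ordering of issue priorities from the
# simultaneous confidence intervals in Table 10. A pair "A - B" whose
# interval lies entirely below zero means A is less polite than B
# (edge B -> A in the T-graph); entirely above zero means the opposite.

# (lower, upper) bounds copied from Table 10; keys read as "A - B".
intervals = {
    ("Critical", "Blocker"): (-0.018, 0.022),
    ("Major", "Blocker"): (-0.070, -0.042),
    ("Minor", "Blocker"): (-0.093, -0.065),
    ("Trivial", "Blocker"): (-0.149, -0.113),
    ("Major", "Critical"): (-0.073, -0.043),
    ("Minor", "Critical"): (-0.097, -0.066),
    ("Trivial", "Critical"): (-0.152, -0.115),
    ("Minor", "Major"): (-0.030, -0.016),
    ("Trivial", "Major"): (-0.088, -0.062),
    ("Trivial", "Minor"): (-0.066, -0.039),
}

edges = []  # (more_polite, less_polite)
for (a, b), (lower, upper) in intervals.items():
    if upper < 0:        # A significantly less polite than B
        edges.append((b, a))
    elif lower > 0:      # A significantly more polite than B
        edges.append((a, b))
    # an interval containing zero (Critical-Blocker) yields no edge

print(edges)
```

Running this yields exactly the nine edges drawn in Figure 11: Blocker and Critical dominate Major, Minor and Trivial; Major dominates Minor and Trivial; Minor dominates Trivial; and no edge connects Blocker and Critical.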


Figure 11 T∼-graph for issue priority.

Findings. Comments related to issues with maintenance type Bug and priority Minor or Trivial tended to have a lower politeness. Issues with maintenance type New Feature and priority Blocker or Critical tended to have a higher politeness.

DISCUSSION

Software development, as well as other fields, is an activity organised around team-based environments. The implementation of team structures is not simple and does not necessarily result in success, because it is not enough just to put people together in teams and to presume that everybody knows or agrees on what to do (Allen & Hecht, 2004). People working together apply different personal assumptions and interpretations to their work tasks (Keyton & Beck, 2008). Hence, conflicts within teams are possible. Conflicts affect teams' productivity and team leaders are certainly interested in knowing how to prevent,


avoid, or, in the worst case, manage conflicts which might occur. In this paper, we presented an analysis of the links between politeness and productivity of developers involved in a project. Politeness is a factor that certainly helps in diminishing conflict and friction between people. The findings of this study contribute to highlighting the importance and the impact of the psychological state of a developer on the software production process.

We started the paper by proposing that politeness information mined from software repositories such as issue repositories could offer a way to investigate productivity during the software development process. We showed that issue fixing time for polite issues was shorter than issue fixing time for impolite and mixed issues. This result matches common sense; if someone is asked to accomplish a task in a polite way, there is a higher chance of a relaxed collaboration and faster results. On the other hand, impolite requests can easily generate discomfort, stress and burnout, often negatively impacting the actual time taken to complete a given task. Surprisingly, mixed issues (commented with both polite and impolite comments) presented longer fixing times than impolite issues. The mixed interaction 'polite-impolite' between developers could explain this fact. Ortu et al. (2016) showed that, in the presence of impolite or negative comments, the probability of the next comment being impolite or negative was 13% and 25%, respectively. Hence part of the longer time could be spent in trying to shift the exchange of comments toward a (more) polite level. For example, in a small study of a single project with two deadlines (Guzman, 2013b), the authors found that, as deadlines came closer, more and longer emails were exchanged with higher emotional intensity. The lower fixing time for impolite issues (compared to the fixing time of mixed issues) could also be related to the fact that developers (especially newcomers) being addressed with impolite comments may react faster because they feel emotionally pushed and want to show (to the community) that they are able to accomplish a task.

As a second point, we showed that a positive correlation existed between the percentage of polite comments and the Magnetism and Stickiness of a project. For each project in the corpus, we first calculated the percentage of polite comments per month and the values of Magnetism and Stickiness, generating three time series. We tested each time series for stationarity and then performed correlation analysis. We found that the percentage of polite comments is, for the majority of the projects in our corpus, positively correlated with Magnetism and Stickiness. However, we need to point out that the first cross-correlation analysis, performed on the non-stationary series, presented weak correlation values (< 0.5). Higher correlation values were found after the cross-correlation analysis of the stationary series (after differencing). The attractiveness of a project is indeed a complex phenomenon and there are different confounding factors (the fame and importance of a project can also be perceived as a status symbol, e.g., being part of the Linux developers community), of which politeness among developers can be one. Further analysis on a larger corpus of projects is required. Our findings highlight the fact that politeness might be a factor affecting the attractiveness of a project.
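The differencing step described above can be sketched as follows. This is an illustrative Python sketch with toy data (the variable names and series are hypothetical, and the paper's actual analysis was done in R): both series are differenced once to remove trend and then cross-correlated at a few lags.

```python
# Sketch: cross-correlating a monthly politeness series with a project's
# Magnetism series after first-order differencing, as one would do when
# the raw series are non-stationary.
import numpy as np

def cross_corr(x, y, max_lag=3):
    """Pearson cross-correlation of two equal-length series at several lags."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            a, b = x[:lag], y[-lag:]
        elif lag > 0:
            a, b = x[lag:], y[:-lag]
        else:
            a, b = x, y
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

# toy monthly series (hypothetical data, not from the paper)
politeness = [40, 42, 45, 43, 47, 50, 52, 51, 55, 58]
magnetism  = [10, 11, 13, 12, 14, 16, 17, 17, 19, 21]

# difference once to remove the trend, then correlate the stationary series
d_pol = np.diff(politeness)
d_mag = np.diff(magnetism)
print(cross_corr(d_pol, d_mag, max_lag=2))
```

Correlating the differenced series avoids the spurious correlation that two trending series exhibit even when their month-to-month movements are unrelated.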

Third, we found that the percentage of polite comments over time was (for the majority of the projects in our corpus) seasonal and not random. This is an interesting (and somewhat expected) fact that could help managers and developers in better understanding


the development process. Further experiments are required to better analyse the links between seasonality and the deadlines that developers face during the development phases. A higher workload could lead to lower politeness, while normal activities can be linked with higher politeness.
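A simple way to probe a monthly series for seasonality, sketched below, is to inspect its autocorrelation at the candidate seasonal lag. This is an illustration with synthetic data, not the randomness test used in the paper:

```python
# Minimal seasonality probe: a series with a 12-month cycle has a strong
# positive autocorrelation at lag 12 (and a negative one at lag 6).
import numpy as np

def autocorr(series, lag):
    """Sample autocorrelation of a 1-D series at the given lag."""
    x = np.asarray(series, float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# toy series with a period of 12 months plus noise (hypothetical data)
rng = np.random.default_rng(0)
months = np.arange(48)
series = 50 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, 48)

print(autocorr(series, 12))  # strongly positive for a seasonal series
print(autocorr(series, 6))   # negative: half a period out of phase
```

A purely random series would show autocorrelations near zero at every lag, which is the intuition behind the rank-based randomness test (Bartels, 1982) cited in the paper.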

Fourth, we studied how politeness varied with respect to JIRA maintenance types and issue priorities. Again, the results match common sense. Issues with maintenance type Bug were those with lower politeness. When something is broken and needs to be fixed, the situation is less attractive and could generate impolite reactions. On the contrary, New Feature issues were the ones with higher politeness; those kinds of issues indicate a new feature yet to be developed, hence there might be higher enthusiasm among developers when dealing with such issues (a developer can be the first one who starts working on a New Feature; the level of freedom felt for the task can be higher, leading to higher politeness). Regarding issue priority, Critical and Blocker issues were the categories with higher politeness, while Trivial issues were those characterised by lower politeness. Again, this is what one would expect, since critical issues are both important and challenging, while trivial issues might be related to minor programming mistakes and/or poor knowledge of programming practices.

Knowing when lower politeness will occur can help managers in taking actions aimed at keeping the general mood high and relaxed, lowering and preventing conflicts, and obtaining higher productivity as a result. The results presented in this study can also be helpful when defining a team of developers. Knowing the profile (from a politeness point of view) of the developers can provide hints for creating balanced teams. A JIRA plug-in able to present the politeness level of the communication flow in a graphical way (e.g., a cockpit view) could help both developers and managers in constantly monitoring the general mood of the developers working for a company or for a project (in the case of the open source collaboration paradigm).

THREATS TO VALIDITY

Threats to external validity correspond to the generalisation of our results (Campbell & Stanley, 1963). In this study, we analysed comments from issue reports of 22 open source projects. Our results cannot be representative of all environments or programming languages: we considered only open-source systems and this could affect the generality of the study. Commercial software is usually developed using different platforms and technologies, by developers with different knowledge and backgrounds, with strict deadlines and cost limitations. Replications of this work on other open source systems and on commercial projects are needed to confirm our findings. Also, the politeness tool can be subject to bias due to the domain used to train the machine learning classifier.

Threats to internal validity concern confounding factors that can influence the obtained results. Based on empirical evidence, we suppose a relationship between the emotional state of developers and what they write in issue reports (Pang & Lee, 2008). Since the main goal of developer communication is the sharing of information, the consequence of removing or camouflaging emotions may make comments less meaningful and cause misunderstanding.


This work is focused on sentences written by developers for developers. To illustrate the influence of these comments, it is important to understand the language used by developers. We believe that the tool used for measuring politeness (Danescu-Niculescu-Mizil et al., 2013) is valid in the software engineering domain, since its developers also used requests posted on Stack Overflow to train the classifier. The comments used in this study were collected over an extended period from developers unaware of being monitored. For this reason, we are confident that the emotions we analysed were genuine. We do not claim any causality between politeness and issue resolution time, but we built an explanatory model to understand the characteristics of issues with short and long fixing times. Confounds could have affected the validity of the results for RQ1 about lower issue fixing time for polite and mixed issues: the number of developers involved in discussing issues might differ, as might the severity and complexity of the issue under analysis. Another threat to validity is related to the classification of JIRA issue types. As highlighted by Herzig, Just & Zeller (2013), the categorisation of issue reports depends on the perspective of the observer. This fact could affect the results obtained for research question 4.

Threats to construct validity focus on how accurately the observations describe the phenomena of interest. The detection of emotions from issue reports presents difficulties due to vagueness and subjectivity. The politeness measures are approximations and cannot perfectly identify the precise context, given the challenges of natural language and subtle phenomena such as sarcasm.

CONCLUSIONS AND FUTURE WORK

Software engineers have been trying to measure software to gain quantitative insights into its properties and quality since the discipline's inception. In this paper, we presented the results about politeness and attractiveness for 22 open-source software projects developed using the Agile board of the JIRA repository. Our results show that the level of politeness in the communication process among developers does have an effect on both the time required to fix issues and the attractiveness of the project to both active and potential developers. The more polite developers were, the less time it took to fix an issue. In the majority of cases, the more developers wanted to be part of a project, the more they were willing to continue working on the project over time. This work is a starting point and further research on a larger number of projects is needed to validate our findings, especially considering proprietary software developed by companies, different programming languages and different dimensions. The development of proprietary software follows different dynamics (e.g., strict deadlines and a given budget) and this fact could lead to different results. We have started the development of an application which will be able to automatically analyse all the comments on an issue tracking system (as we have done for this paper) and will provide reports and data to managers, team leaders and/or developers interested in understanding the health of a project from a ''mood'' point of view. The takeaway message is that politeness can only have a positive effect on a project and on the development process.


ACKNOWLEDGEMENTS

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

The research presented in this paper was partly funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK under grant ref: EP/M024083/1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures

The following grant information was disclosed by the authors:
Engineering and Physical Sciences Research Council (EPSRC): EP/M024083/1.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Giuseppe Destefanis, Marco Ortu and Steve Counsell conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work, and reviewed drafts of the paper.
• Stephen Swift performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, performed the computation work, and reviewed drafts of the paper.
• Michele Marchesi contributed reagents/materials/analysis tools and reviewed drafts of the paper.
• Roberto Tonelli performed the experiments, contributed reagents/materials/analysis tools, performed the computation work, and reviewed drafts of the paper.

Data Availability

The following information was supplied regarding data availability:

We used a public dataset available at http://openscience.us/repo/social-analysis/social-aspects.html, and all the scripts we prepared (along with the raw data) can be found at BitBucket: https://bitbucket.org/giuseppedestefanis/peerjcs_replicationpackage/

REFERENCES

Acuña ST, Gómez M, Juristo N. 2008. Towards understanding the relationship between team climate and software quality—a quasi-experimental study. Empirical Software Engineering 13(4):401–434 DOI 10.1007/s10664-008-9074-8.


Allen NJ, Hecht TD. 2004. The 'romance of teams': toward an understanding of its psychological underpinnings and implications. Journal of Occupational and Organizational Psychology 77(4):439–461 DOI 10.1348/0963179042596469.

Amabile T, Barsade SG, Mueller JS, Staw BM. 2005. Affect and creativity at work. Administrative Science Quarterly 50(3):367–403 DOI 10.2189/asqu.2005.50.3.367.

Banerjee A, Dolado JJ, Galbraith JW, Hendry D, et al. 1993. Co-integration, error correction, and the econometric analysis of non-stationary data. OUP Catalogue.

Bartels R. 1982. The rank version of von Neumann's ratio test for randomness. Journal of the American Statistical Association 77(377):40–46 DOI 10.1080/01621459.1982.10477764.

Bazelli B, Hindle A, Stroulia E. 2013. On the personality traits of Stack Overflow users. In: Software maintenance (ICSM), 2013 29th IEEE international conference on. Piscataway: IEEE, 460–463.

Beck K, Beedle M, Van Bennekum A, Cockburn A, Cunningham W, Fowler M, Grenning J, Highsmith J, Hunt A, Jeffries R, Kern J. 2001. Manifesto for agile software development. Available at http://www.agilemanifesto.org.

Box GE, Pierce DA. 1970. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association 65(332):1509–1526 DOI 10.1080/01621459.1970.10481180.

Brief AP, Weiss HM. 2002. Organizational behavior: affect in the workplace. Annual Review of Psychology 53(1):279–307 DOI 10.1146/annurev.psych.53.100901.135156.

Brockwell PJ, Davis RA. 2006. Introduction to time series and forecasting. New York: Springer Science & Business Media.

Campbell DT, Stanley JC. 1963. Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Capretz LF. 2003. Personality types in software engineering. International Journal of Human-Computer Studies 58(2):207–214 DOI 10.1016/S1071-5819(02)00137-4.

Cockburn A, Highsmith J. 2001. Agile software development: the people factor. Computer 11:131–133.

Curtis B, Krasner H, Iscoe N. 1988. A field study of the software design process for large systems. Communications of the ACM 31(11):1268–1287 DOI 10.1145/50087.50089.

Danescu-Niculescu-Mizil C, Sudhof M, Jurafsky D, Leskovec J, Potts C. 2013. A computational approach to politeness with application to social factors. In: Proceedings of ACL.

Diebold FX, Rudebusch GD. 1991. On the power of Dickey-Fuller tests against fractional alternatives. Economics Letters 35(2):155–160 DOI 10.1016/0165-1765(91)90163-F.

Ehlers J. 2015. Socialness in the recruiting of software engineers. In: Proceedings of the 12th ACM international conference on computing frontiers. New York: ACM, 33.

Erez A, Isen AM. 2002. The influence of positive affect on the components of expectancy motivation. Journal of Applied Psychology 87(6):1055–1067 DOI 10.1037/0021-9010.87.6.1055.

Fagerholm F, Ikonen M, Kettunen P, Münch J, Roto V, Abrahamsson P. 2014. How do software developers experience team performance in lean and agile environments?


In: Proceedings of the 18th international conference on evaluation and assessment in software engineering. New York: ACM, 7.

Feldt R, Torkar R, Angelis L, Samuelsson M. 2008. Towards individualized software engineering: empirical studies should collect psychometrics. In: Proceedings of the 2008 international workshop on cooperative and human aspects of software engineering. New York: ACM, 49–52.

Garcia D, Zanetti MS, Schweitzer F. 2013. The role of emotions in contributors activity: a case study on the Gentoo community. In: Cloud and green computing (CGC), 2013 third international conference on. Piscataway: IEEE, 410–417.

Gómez MN, Acuña ST, Genero M, Cruz-Lemus JA. 2012. How does the extraversion of software development teams influence team satisfaction and software quality? A controlled experiment. International Journal of Human Capital and Information Technology Professionals (IJHCITP) 3(4):11–24 DOI 10.4018/jhcitp.2012100102.

Graziotin D, Wang X, Abrahamsson P. 2014. Happy software developers solve problems better: psychological measurements in empirical software engineering. PeerJ 2:e289 DOI 10.7717/peerj.289.

Graziotin D, Wang X, Abrahamsson P. 2015. How do you feel, developer? An explanatory theory of the impact of affects on programming performance. PeerJ Computer Science 1:e18 DOI 10.7717/peerj-cs.18.

Guzman E. 2013a. Visualizing emotions in software development projects. In: 2013 1st IEEE working conference on software visualization—proceedings of VISSOFT 2013. Piscataway: IEEE, 1–4.

Guzman E. 2013b. Visualizing emotions in software projects. In: Software visualization (VISSOFT), 2013 first IEEE working conference on. Piscataway: IEEE, 1–4.

Guzman E, Azócar D, Li Y. 2014. Sentiment analysis of commit comments in GitHub: an empirical study. In: Proceedings of the 11th working conference on mining software repositories—MSR 2014. New York: ACM Press, 352–355.

Guzman E, Bruegge B. 2013. Towards emotional awareness in software development teams. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. New York: ACM Press, 671–674.

Hamilton JD. 1994. Time series analysis, vol. 2. Princeton: Princeton University Press.

Herzig K, Just S, Zeller A. 2013. It's not a bug, it's a feature: how misclassification impacts bug prediction. In: Proceedings of the 2013 international conference on software engineering. Piscataway: IEEE Press, 392–401.

Jongeling R, Datta S, Serebrenik A. 2015. Choosing your weapons: on sentiment analysis tools for software engineering research. In: Software maintenance and evolution (ICSME), 2015 IEEE international conference on. Piscataway: IEEE, 531–535.

Kaluzniacky E. 2004. Managing psychological factors in information systems work: an orientation to emotional intelligence. Hershey: IGI Global.

Keyton J, Beck SJ. 2008. Team attributes, processes, and values: a pedagogical framework. Business Communication Quarterly 71(4):488–504 DOI 10.1177/1080569908325863.


Konietschke F, Hothorn LA, Brunner E. 2012. Rank-based multiple test procedures and simultaneous confidence intervals. Electronic Journal of Statistics 6:738–759 DOI 10.1214/12-EJS691.

Kramer ADI, Guillory JE, Hancock JT. 2014. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences of the United States of America 111(24):8788–8790 DOI 10.1073/pnas.1320040111.

Krištoufek L. 2014. Measuring correlations between non-stationary series with DCCA coefficient. Physica A: Statistical Mechanics and its Applications 402:291–298 DOI 10.1016/j.physa.2014.01.058.

Kruskal WH, Wallis WA. 1952. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association 47(260):583–621 DOI 10.1080/01621459.1952.10483441.

Kwiatkowski D, Phillips PC, Schmidt P, Shin Y. 1992. Testing the null hypothesis of stationarity against the alternative of a unit root: how sure are we that economic time series have a unit root? Journal of Econometrics 54(1):159–178 DOI 10.1016/0304-4076(92)90104-Y.

Ljung GM, Box GE. 1978. On a measure of lack of fit in time series models. Biometrika 65(2):297–303 DOI 10.1093/biomet/65.2.297.

Miner AG, Glomb TM. 2010. State mood, task performance, and behavior at work: a within-persons approach. Organizational Behavior and Human Decision Processes 112(1):43–57 DOI 10.1016/j.obhdp.2009.11.009.

Murgia A, Concas G, Tonelli R, Ortu M, Demeyer S, Marchesi M. 2014a. On the influence of maintenance activity types on the issue resolution time. In: Proceedings of the 10th international conference on predictive models in software engineering. New York: ACM, 12–21.

Murgia A, Tourani P, Adams B, Ortu M. 2014b. Do developers feel emotions? An exploratory analysis of emotions in software artifacts. In: Proceedings of the 11th working conference on mining software repositories. New York: ACM, 262–271.

Novielli N, Calefato F, Lanubile F. 2014. Towards discovering the role of emotions in Stack Overflow. In: Proceedings of the 6th international workshop on social software engineering. New York: ACM, 33–36.

Ortu M, Adams B, Destefanis G, Tourani P, Marchesi M, Tonelli R. 2015a. Are bullies more productive? Empirical study of affectiveness vs. issue fixing time. In: Proceedings of the 12th working conference on mining software repositories, MSR 2015.

Ortu M, Destefanis G, Counsell S, Swift S, Tonelli R, Marchesi M. 2016. Arsonists or firefighters? Affectiveness in agile software development. In: Agile processes, in software engineering, and extreme programming. Berlin, Heidelberg: Springer International Publishing, 144–155.

Ortu M, Destefanis G, Kassab M, Counsell S, Marchesi M, Tonelli R. 2015b. Would you mind fixing this issue? An empirical analysis of politeness and attractiveness in software developed using agile boards. In: Agile processes, in software engineering,


and extreme programming. Berlin, Heidelberg: Springer International Publishing, 129–140.

Ortu M, Destefanis G, Kassab M, Marchesi M. 2015c. Measuring and understanding the effectiveness of JIRA developers communities. In: Proceedings of the 6th international workshop on emerging trends in software metrics, WETSoM 2015.

Ortu M, Destefanis G, Murgia A, Marchesi M, Tonelli R, Adams B. 2015d. The JIRA repository dataset: understanding social aspects of software development. In: Proceedings of the 11th international conference on predictive models and data analytics in software engineering. New York: ACM, 1.

Pang B, Lee L. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1–2):1–135 DOI 10.1561/1500000011.

Panichella S, Di Sorbo A, Guzman E, Visaggio CA, Canfora G, Gall HC. 2015. How can I improve my app? Classifying user reviews for software maintenance and evolution. In: Software maintenance and evolution (ICSME), 2015 IEEE international conference on. Piscataway: IEEE, 281–290.

Pennebaker JW, Francis ME, Booth RJ. 2001. Linguistic inquiry and word count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates 71:2001.

Perry T. 2008. Drifting toward invisibility: the transition to the electronic task board. In: Agile, 2008. AGILE'08. Conference. Piscataway: IEEE, 496–500.

Pletea D, Vasilescu B, Serebrenik A. 2014. Security and emotion: sentiment analysis of security discussions on GitHub. In: Proceedings of the 11th working conference on mining software repositories. New York: ACM, 348–351.

Priestley MB. 1981. Spectral analysis and time series. San Diego: Academic Press.

Priestley MB. 1988. Non-linear and non-stationary time series analysis. Amsterdam: Elsevier.

R Development Core Team. 2014. R: a language and environment for statistical computing. Vienna: the R Foundation for Statistical Computing. Available at http://www.R-project.org/.

Rigby PC, Hassan AE. 2007. What can OSS mailing lists tell us? A preliminary psychometric text analysis of the Apache developer mailing list. In: Proceedings of the fourth international workshop on mining software repositories. Piscataway: IEEE Computer Society, 23.

Roberts JA, Hann I-H, Slaughter SA. 2006. Understanding the motivations, participation, and performance of open source software developers: a longitudinal study of the Apache projects. Management Science 52(7):984–999 DOI 10.1287/mnsc.1060.0554.

Rousinopoulos A-I, Robles G, González-Barahona JM. 2014. Sentiment analysis of free/open source developers: preliminary findings from a case study. Revista Eletrônica de Sistemas de Informação 13(2):1677–3071 DOI 10.5329/RESI.

Said SE, Dickey DA. 1984. Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 71(3):599–607 DOI 10.1093/biomet/71.3.599.

Siegel S. 1956. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.


Śliwerski J, Zimmermann T, Zeller A. 2005. Don't program on Fridays! How to locate fix-inducing changes. In: Proceedings of the 7th workshop software reengineering. Available at http://thomas-zimmermann.com/publications/files/sliwerski-wsr-2005.pdf.

Steinmacher I, Conte TU, Gerosa M, Redmiles D. 2015. Social barriers faced by newcomers placing their first contribution in open source software projects. In: Proceedings of the 18th ACM conference on computer supported cooperative work & social computing. New York: ACM, 1–13.

Tan S, Howard-Jones P. 2014. Rude or polite: do personality and emotion in an artificial pedagogical agent affect task performance? In: 2014 global conference on teaching and learning with technology (CTLT 2014). 41.

Tourani P, Jiang Y, Adams B. 2014. Monitoring sentiment in open source mailing lists: exploratory study on the Apache ecosystem. In: Proceedings of 24th annual international conference on computer science and software engineering. Armonk: IBM Corporation, 34–44. Available at http://mcis.soccerlab.polymtl.ca/publications/2014/cascon14.pdf.

Tsay J, Dabbish L, Herbsleb J. 2014. Let's talk about it: evaluating contributions through discussion in GitHub. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering. New York: ACM, 144–154.

Tukey JW. 1949. Comparing individual means in the analysis of variance. Biometrics 5(2):99–114 DOI 10.2307/3001913.

Vasilescu B, Filkov V, Serebrenik A. 2013. Stack Overflow and GitHub: associations between software development and crowdsourced knowledge. In: Social computing (SocialCom), 2013 international conference on. Piscataway: IEEE, 188–195.

Vasilescu B, Serebrenik A, Goeminne M, Mens T. 2014a. On the variation and specialisation of workload—a case study of the Gnome ecosystem community. Empirical Software Engineering 19(4):955–1008 DOI 10.1007/s10664-013-9244-1.

Vasilescu B, Serebrenik A, Mens T, Van den Brand MGJ, Pek E. 2014b. How healthy are software engineering conferences? Science of Computer Programming 89:251–272 DOI 10.1016/j.scico.2014.01.016.

VersionOne. 2013. 8th Annual State of Agile Survey report. San Francisco: VersionOne. Available at https://www.versionone.com/pdf/2013-state-of-agile-survey.pdf.

Weiss C, Premraj R, Zimmermann T, Zeller A. 2007. How long will it take to fix this bug? In: Proceedings of the fourth international workshop on mining software repositories. Piscataway: IEEE Computer Society, 1.

Winschiers H, Paterson B. 2004. Sustainable software development. In: Proceedings of the 2004 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries. South African Institute for Computer Scientists and Information Technologists, 274–278.

Yamashita K, Kamei Y, McIntosh S, Hassan AE, Ubayashi N. 2016. Magnet or sticky? Measuring project characteristics from the perspective of developer attraction and retention. Journal of Information Processing 24(2):339–348 DOI 10.2197/ipsjjip.24.339.


Yamashita K, McIntosh S, Kamei Y, Ubayashi N. 2014. Magnet or sticky? An OSS project-by-project typology. In: Proceedings of the 11th working conference on mining software repositories. New York: ACM, 344–347.

Zhang H, Gong L, Versteeg S. 2013. Predicting bug-fixing time: an empirical study of commercial software projects. In: Proceedings of the 2013 international conference on software engineering. Piscataway: IEEE Press, 1042–1051.

Zhang F, Khomh F, Zou Y, Hassan AE. 2012. An empirical study on factors impacting bug fixing time. In: 2012 19th working conference on reverse engineering (WCRE). Piscataway: IEEE, 225–234.
