sentiment analysis tools for software engineering research cannot be used out of the box


Uploaded by alexander-serebrenik on 12-Feb-2017


TRANSCRIPT

Page 1: Sentiment analysis tools for software engineering research cannot be used out of the box

On sentiment analysis tools for software engineering research

Robbert Jongeling, Eindhoven U of Technology (NL), @jongeling_r

Subhajit Datta, Singapore U of Technology and Design (SG), @datta_subhajit

Alexander Serebrenik, Eindhoven U of Technology (NL), @aserebrenik

Page 2

E. Guzman, D. Azócar, and Y. Li, “Sentiment analysis of commit comments in GitHub: An empirical study,” MSR 2014.

A.-I. Rousinopoulos, G. Robles, and J. M. González-Barahona, “Sentiment analysis of Free/Open Source developers: Preliminary findings from a case study,” Revista Eletrônica de Sistemas de Informação, 2014.

E. Guzman and B. Bruegge, “Towards emotional awareness in software development teams,” Joint Meeting on Foundations of Software Engineering, 2013.

D. Pletea, B. Vasilescu, and A. Serebrenik, “Security and emotion: Sentiment analysis of security discussions on GitHub,” MSR 2014.

M. Ortu, B. Adams, G. Destefanis, P. Tourani, M. Marchesi, and R. Tonelli, “Are bullies more productive? Empirical study of affectiveness vs. issue fixing time,” MSR 2015.

D. Garcia, M. S. Zanetti, and F. Schweitzer, “The role of emotions in contributors activity: A case study on the Gentoo community,” International Conference on Cloud and Green Computing, 2013.

Page 3


Tools used in these studies: NLTK, SentiStrength

Page 4


These tools are trained on movie/product reviews. Threat: they might misidentify (or fail to identify) sentiment in a software engineering artefact.

Page 5

• RQ1: To what extent do different sentiment analysis tools agree with emotions of software developers?

• RQ2: To what extent do different sentiment analysis tools agree with each other?

• RQ3: Do different sentiment analysis tools lead to contradictory results in a software engineering study?

Page 6

Data: Murgia et al., MSR 2014

392 comments, each labelled by 4 evaluators

Emotions: joy, love, surprise, anger, fear, sadness, grouped into positive and negative

Used for RQ1 and RQ2

Page 7


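A comment is labelled consistently when:

• positive: at least 3 evaluators indicate a positive emotion and none a negative one

• negative: at least 3 evaluators indicate a negative emotion and none a positive one

• neutral: at least 3 evaluators indicate no emotion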
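The consistency rule can be sketched as a small function. This is a minimal illustration, not the authors' code. It assumes that each evaluator's annotation has already been reduced to the set of polarities it expresses, which is a subset of {"pos", "neg"}, with an empty set meaning no emotion. It also reads "3 positive" as "at least 3".

```python
from typing import Optional, Sequence, Set


def consistent_label(annotations: Sequence[Set[str]], threshold: int = 3) -> Optional[str]:
    """Return 'positive', 'negative', 'neutral', or None if the labels are inconsistent.

    Assumption: each element is one evaluator's annotation, already reduced to
    the polarities it expresses (a subset of {"pos", "neg"}); set() = no emotion.
    """
    n_pos = sum(1 for a in annotations if "pos" in a)
    n_neg = sum(1 for a in annotations if "neg" in a)
    n_none = sum(1 for a in annotations if not a)
    if n_pos >= threshold and n_neg == 0:
        return "positive"
    if n_neg >= threshold and n_pos == 0:
        return "negative"
    if n_none >= threshold:
        return "neutral"
    return None  # inconsistent comment: excluded from the comparison


# Four evaluators: three see a positive emotion, one sees none.
print(consistent_label([{"pos"}, {"pos"}, {"pos"}, set()]))
```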

Tools: Alchemy, Stanford NLP, NLTK, SentiStrength

RQ1: tool labels (rows: neg, neu, pos) vs. manual labels (columns: neg, neu, pos)

RQ2: Tool B labels (rows: neg, neu, pos) vs. Tool A labels (columns: neg, neu, pos)
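Both comparisons use the same kind of 3×3 table: one labelling goes in the rows and the other in the columns. The sketch below shows one way to build such a table from two label lists, with the label order neg, neu, pos. The function name and the label strings are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence

LABELS = ("neg", "neu", "pos")  # assumed label names and order


def confusion_matrix(row_labels: Sequence[str], col_labels: Sequence[str]) -> List[List[int]]:
    """Cross-tabulate two labellings of the same comments.

    Cell [i][j] counts comments labelled LABELS[i] by the row source
    (e.g. a tool) and LABELS[j] by the column source (e.g. the manual evaluation).
    """
    if len(row_labels) != len(col_labels):
        raise ValueError("both labellings must cover the same comments")
    counts = Counter(zip(row_labels, col_labels))
    return [[counts[(r, c)] for c in LABELS] for r in LABELS]


tool = ["neg", "neu", "neu", "pos"]
manual = ["neg", "neu", "pos", "pos"]
print(confusion_matrix(tool, manual))  # [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
```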

Page 8


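Consistently labelled comments: 24 negative, 217 neutral, 54 positive (295 in total)

Agreement measure: Adjusted Rand Index (ARI) [Santos, Embrechts, ICANN 2009]. ARI = 1 means perfect agreement; values near 0 (or below) mean agreement no better than chance.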
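For reference, here is one standard way to compute the ARI: the Hubert and Arabie form. It is assumed here, because the slide only cites Santos and Embrechts. The computation starts from the contingency table of two labellings. In it, $n_{ij}$ counts the items that get label $i$ in one labelling and label $j$ in the other. The row sums are $a_i$, the column sums are $b_j$, and $n$ is the number of items.

```latex
\mathrm{ARI} =
\frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}
     {\frac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right] - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}
```

The index equals 1 for identical labellings. Its expected value is 0 when one labelling is random (with fixed row and column sums), so it can fall slightly below 0.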



Page 10

RQ1: To what extent do different sentiment analysis tools agree with emotions of software developers?

Rows: NLTK; columns: manual evaluation

NLTK \ Manual   neg   neu   pos
neg              19    51    11
neu               0   138     7
pos               5    28    36

Tool            ARI (vs. manual)
NLTK            0.239
SentiStrength   0.113
Stanford NLP    0.108
Alchemy         0.079

Tools do not agree with manual evaluation
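To make the ARI values concrete, here is a minimal Python sketch that computes the ARI from a contingency table. It uses the Hubert and Arabie formula, which is an assumption: the slides do not say which implementation was used. Applied to the NLTK vs. manual matrix above, it gives about 0.239, the value reported for NLTK.

```python
from math import comb
from typing import Sequence


def adjusted_rand_index(table: Sequence[Sequence[int]]) -> float:
    """Adjusted Rand Index from a contingency table (Hubert and Arabie form)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    index = sum(comb(x, 2) for row in table for x in row)  # agreeing pairs
    sum_rows = sum(comb(a, 2) for a in row_sums)
    sum_cols = sum(comb(b, 2) for b in col_sums)
    expected = sum_rows * sum_cols / comb(n, 2)  # expected index under chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # avoid division by zero in degenerate tables
        return 1.0
    return (index - expected) / (max_index - expected)


# RQ1 matrix from the slide: rows = NLTK, columns = manual (neg, neu, pos)
nltk_vs_manual = [
    [19, 51, 11],
    [0, 138, 7],
    [5, 28, 36],
]
print(round(adjusted_rand_index(nltk_vs_manual), 3))  # 0.239
```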


Page 11

RQ2: To what extent do different sentiment analysis tools agree with each other?

Rows: NLTK; columns: SentiStrength

NLTK \ SentiStrength   neg   neu   pos
neg                     17    39    25
neu                     15    96    34
pos                      6    20    43

Tool A   Tool B          ARI
NLTK     Alchemy         0.104
NLTK     SentiStrength   0.090

Tools do not agree with each other


Page 12

RQ3

Study design: take issues (issue tracker) and questions (Q&A site), classify their text with a sentiment analysis tool (here: NLTK), and compare response times for negative, neutral and positive items.

Page 13

RQ3

The same study (issues and questions, text classified by a sentiment analysis tool, response times compared for neg, neu, pos) is repeated with three classifications:

• NLTK

• SentiStrength

• NLTK ∩ SentiStrength (only the items to which both tools assign the same label)

Are the results the same?
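The three settings differ only in which sentiment labels are used to group the items. The sketch below groups response times by label for one tool, or for the intersection of two tools. All data in it are made up for illustration. A real replication would also run a significance test on each pair of groups. The slides report significance levels but not the test used, so no test is assumed here.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence


def group_response_times(
    times: Sequence[float],
    labels_a: Sequence[str],
    labels_b: Optional[Sequence[str]] = None,
) -> Dict[str, List[float]]:
    """Group response times by sentiment label ("neg", "neu", "pos").

    With one labelling, every item is kept. With two labellings, only items to
    which both tools assign the same label are kept (the intersection setting).
    """
    if labels_b is None:
        labels_b = labels_a  # single tool: every item trivially agrees
    groups: Dict[str, List[float]] = {"neg": [], "neu": [], "pos": []}
    for t, a, b in zip(times, labels_a, labels_b):
        if a == b:
            groups[a].append(t)
    return groups


# Hypothetical data: response times (hours) and labels from two tools.
times = [10.0, 4.0, 7.0, 2.0, 12.0]
nltk = ["neg", "neu", "pos", "neu", "neg"]
sentistrength = ["neg", "pos", "pos", "neu", "neu"]

for name, groups in [
    ("NLTK", group_response_times(times, nltk)),
    ("SentiStrength", group_response_times(times, sentistrength)),
    ("NLTK ∩ SentiStrength", group_response_times(times, nltk, sentistrength)),
]:
    print(name, {label: median(ts) for label, ts in groups.items() if ts})
```

Even with these made-up numbers, the order of the neutral and positive medians flips between the NLTK and SentiStrength settings.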

Page 14

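RQ3: Do different sentiment analysis tools lead to contradictory results in a software engineering study?

The table below lists significant differences in response time between negative (neg), neutral (neu) and positive (pos) items, per data set, across the three settings NLTK, SentiStrength and NLTK ∩ SentiStrength. Asterisks mark significance levels; ø means no significant difference.

• ASF descriptions: neg > neu*** (two settings); pos > neu*** (all three settings); pos > neg*** (two settings)

• ASF titles: neg > neu** (one setting); pos > neu*** and pos > neu** (two settings); pos > neg* and pos > neg** (two settings)

• GNOME descriptions: neg > neu*** (all three settings); pos > neu*** (all three settings); pos > neg*** in one setting but neg > pos*** in another

• SO descriptions: NLTK ø; SentiStrength neg > pos*; NLTK ∩ SentiStrength ø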

The choice of sentiment analysis tool affects the results of a software engineering study.
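A contradiction here means that one setting reports X > Y while another reports Y > X for the same data set, as with pos > neg vs. neg > pos for the GNOME descriptions. The sketch below flags such pairs. Setting names are hypothetical ("setting A", etc.), because the flattened slide does not show clearly which column holds which finding. Significance markers are ignored.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Finding = Tuple[str, str]  # ("pos", "neg") stands for "pos > neg"


def contradictions(results: Dict[str, Set[Finding]]) -> List[Tuple[str, str, Finding]]:
    """Return (setting_1, setting_2, finding) where setting_1 reports X > Y
    and setting_2 reports Y > X for the same data set."""
    found = []
    for s1, s2 in combinations(sorted(results), 2):
        for x, y in results[s1]:
            if (y, x) in results[s2]:
                found.append((s1, s2, (x, y)))
    return found


# Hypothetical assignment of findings to settings, modelled on the GNOME row.
gnome_like = {
    "setting A": {("neg", "neu"), ("pos", "neu"), ("pos", "neg")},
    "setting B": {("neg", "neu"), ("pos", "neu"), ("neg", "pos")},
    "setting C": {("neg", "neu"), ("pos", "neu")},
}
print(contradictions(gnome_like))  # [('setting A', 'setting B', ('pos', 'neg'))]
```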

Page 15

Summary

• Sentiment analysis tools are trained on movie/product reviews. Threat: they might misidentify (or fail to identify) sentiment in a software engineering artefact.

• Tools do not agree with manual evaluation.

• Tools do not agree with each other.

• The choice of sentiment analysis tool affects the results of a software engineering study.

Page 16

Next steps?

• Train sentiment analysis tools on software engineering data

• Data of Murgia et al.: first step

• More and better-suited data is needed