Can Development Work Describe Itself?
Posted on 14-Sep-2014
DESCRIPTION
Work descriptions are informal notes taken by developers to summarize work achieved in a particular session. Existing studies indicate that maintaining them is a distracting task that costs a developer more than 30 minutes a day. The goal of this research is to analyze the purposes of work descriptions and to find out whether automated tools can help developers create them efficiently. To this end, we mine a large dataset of heterogeneous work descriptions from open source and commercial projects. We analyze the semantics of these documents and identify common information entities and granularity levels. Information on performed actions, concerned artifacts, references, and new work shows the work management purpose of work descriptions. Information on problems, rationale, and experience shows their knowledge sharing purpose. We discuss how work description information, in particular information used for work management, can be generated by observing developers' interactions. Our findings have many implications for next-generation software engineering tools.
Paper: Walid Maalej and Hans-Jörg Happel, Can Development Work Describe Itself? In Proceedings of the 7th IEEE Conference on Mining Software Repositories, IEEE CS, 2010.
TRANSCRIPT
Can Development Work Describe Itself?
Walid Maalej, Technische Universität München
Hans-Jörg Happel, FZI Research Center Karlsruhe
MSR 2010, Cape Town, South Africa, May 2010
© W. Maalej, May 2010
Executive Summary
Grounded Theory on Work Descriptions
1. Informal notes that describe developers' work contain well-defined semantics, granularity levels, and information patterns
2. To a large extent, work descriptions can be generated by observing the work context of developers and their interactions
Analyzing Work Description – MSR 2010
Outline
1. Motivation
2. Research Setting
3. Research Results
4. Work Description Automation
What Are Work Descriptions?
Artifacts including work descriptions: timesheet, social media, comments, commit message, personal note
A work description is an informal text written by a knowledge worker to summarize achievements and other notable issues of a particular work session
Previous Studies Showed Interesting Properties of Work Descriptions
Effort and Quality Issues
• 5% of developers' time is spent describing work (30 min. per day)
• 10% of the sessions have pseudo descriptions (either no time or no motivation)
Regularities in Content and Metadata
• The overall vocabulary usage seems to be predictable
• The vocabulary size is rather small
• Different projects have similar rankings of terms
To what extent can developers' work descriptions be automated?
Research Questions
Content of Work Descriptions: the semantics of information included in work descriptions
Information Entities: text fragments with similar semantics
• Occurrences: Which information entities are included and how often?
• Combinations: How are these entities combined?
• Preferences: Do certain developers prefer certain information?
Information Granularity: the levels of detail included (abstraction levels)
• Levels: What are the granularity levels?
• Causes: Which properties affect the granularity?
Data Sets Collected in Different Contexts
Dataset   Summary                                                     Period      Developers   Entries
MyComp    Developers' personal notes at a German software company     2001–2009   25           38,005
Apache    Commit messages and code comments of all Apache projects    2001–2009   1,145        598,418
Unicase   Commit messages and code comments of the unicase project    2008–2009   18           5,097
Eureka    Personal notes in an observational study at 5 companies     2008        21           91
The Data Analysis Process
Information Entities and Their Usage Frequencies
Occurrences (%)
Entity       Average   Apache   MyComp   Unicase   Eureka
Activity     71        69       76       71        67
Artifact     55        60       53       49        58
Problem      47        47       47       49        45
Rationale    28        30       29       25        31
New Work     24        24       20       28        22
Status       19        24       20       17        15
Reference    15        15       19       17        10
Solution     15        19       15       16        11
Experience   10        11       6        9         13
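The occurrence percentages add up to more than 100% per dataset, which suggests that a single work description can contain several information entities. A minimal sketch of how such percentages could be computed, assuming each description has been manually coded with a set of entity labels. The sample data and the function name `occurrence_percentages` are hypothetical and not taken from the study.

```python
from collections import Counter

# Hypothetical coded sample: each work description is tagged with the set of
# information entities a human coder found in it (labels follow the table).
coded_entries = [
    {"Activity", "Artifact"},
    {"Activity", "Problem", "Solution"},
    {"Problem", "Rationale"},
    {"Activity", "Artifact", "New Work"},
]


def occurrence_percentages(entries):
    """Return, per entity, the percentage of entries that contain it."""
    counts = Counter(entity for labels in entries for entity in labels)
    total = len(entries)
    return {entity: round(100 * n / total) for entity, n in counts.items()}


percentages = occurrence_percentages(coded_entries)
for entity, share in sorted(percentages.items(), key=lambda item: -item[1]):
    print(f"{entity:<10} {share:>3}%")
```

Because the labels are counted per entry rather than per label, the shares are independent of each other and do not need to sum to 100.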
Findings on Information Entities
1. The majority of information on performed activities (82%) is combined with concerned artifacts
2. Information on problems is used to describe work done, work that needs to be done, and the context of experiences
3. The combination patterns show that sharing knowledge and managing work are two goals of work descriptions
4. There are two clusters of developers: those who prefer to use artifacts and those who prefer to use problems to describe work
Granularity Levels and Usage Frequencies
Occurrences (%)
Granularity Level    Average   Apache   MyComp   Unicase   Eureka
Domain
  Implementation     54        58       37       62        60
  Project            31        29       34       29        30
  Requirement        12        10       26       6         7
Object
  Method             33        33       49       28        20
  Class              29        29       25       31        32
  Line               17        17       8        17        27
  Component          15        14       16       19        10
Activity
  Edit               53        55       41       57        60
  SE Process         36        34       42       30        39
  Knowledge          12        13       15       11        9
Findings on Information Granularity
1. The majority of work descriptions (62%) include information from a single granularity level
2. Developers think consistently (in a single abstraction level) when taking notes about artifacts
3. The shorter the session, the more fine-grained the described artifacts
4. Levels of activity granularity overlap (edit, process, and knowledge)
Two Main Enablers for Automating Work Descriptions
• Shared semantics of developers' working context, i.e. activities, artifacts, and problems
• Heuristics derived from empirical findings on developers' behavior
Outcome: automating work description
Shared Semantics to Annotate Context: Developers' Interactions
Shared Semantics to Annotate Context: Developers' Artifacts
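The ontology diagrams for these two slides are not part of the transcript. As a rough illustration of the idea only, the sketch below annotates observed interactions with a small shared vocabulary and renders them in the <activity concerns artifacts> pattern mentioned in the talk's summary. The `Activity` values, the `Interaction` type, and the `describe` function are hypothetical stand-ins, not the ontologies proposed by the authors.

```python
from dataclasses import dataclass
from enum import Enum


class Activity(Enum):
    # Hypothetical shared vocabulary of activities; values are the verbs used
    # when rendering a description.
    EDIT = "edited"
    DEBUG = "debugged"
    TEST = "tested"


@dataclass(frozen=True)
class Interaction:
    activity: Activity
    artifact: str  # e.g. a class or method name


def describe(interactions):
    """Group artifacts by activity and render one phrase per activity."""
    grouped = {}
    for event in interactions:
        artifacts = grouped.setdefault(event.activity, [])
        if event.artifact not in artifacts:  # mention each artifact once
            artifacts.append(event.artifact)
    return "; ".join(
        f"{activity.value} {', '.join(artifacts)}"
        for activity, artifacts in grouped.items()
    )


session = [
    Interaction(Activity.EDIT, "Parser.parse()"),
    Interaction(Activity.EDIT, "Lexer"),
    Interaction(Activity.EDIT, "Parser.parse()"),
    Interaction(Activity.TEST, "ParserTest"),
]
summary = describe(session)
print(summary)  # edited Parser.parse(), Lexer; tested ParserTest
```

Using one fixed vocabulary for activities is what makes descriptions from different developers and tools comparable; a real tool would need a much richer model of both interactions and artifacts.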
Heuristics to Generate Work Descriptions
Four factors to generate work description
1. Developers' Preferences
• Learn from previous behavior of developers and which information they describe in which situation
2. Appropriate Granularity
• Guess the appropriate level of detail
3. Relevant vs. Irrelevant Context
• Only a subset of artifacts concerned by the interactions is included in the description
• Useful metrics are accumulated usage duration, usage age, and usage frequency
4. Problem-Solution States
• Detect if a developer is encountering a problem, searching for a solution, or applying a solution
• Indicators are error messages, breakpoint usage, searches, or usage of particular keywords
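The relevance factor above names three metrics: accumulated usage duration, usage age, and usage frequency. One possible way to combine them into a ranking is sketched below. The scoring formula, the half-life decay, and the names `Usage` and `relevant_artifacts` are assumptions made for illustration; the talk does not specify how the metrics are weighted.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    artifact: str
    start: float     # seconds since session start
    duration: float  # seconds the artifact was in focus


def relevant_artifacts(usages, now, top_k=2, half_life=1800.0):
    """Rank artifacts by duration and frequency, discounted by usage age.

    Assumed scoring: every usage contributes (minutes in focus + 1), so long
    and frequent usages both raise the score; the contribution is halved for
    every `half_life` seconds that have passed since the usage ended.
    """
    scores = {}
    for usage in usages:
        age = now - (usage.start + usage.duration)
        decay = 0.5 ** (age / half_life)
        contribution = decay * (usage.duration / 60.0 + 1.0)
        scores[usage.artifact] = scores.get(usage.artifact, 0.0) + contribution
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]  # only a subset enters the description


usages = [
    Usage("Parser.java", start=0, duration=600),
    Usage("README", start=600, duration=30),
    Usage("Parser.java", start=3000, duration=300),
    Usage("ParserTest.java", start=3300, duration=300),
]
relevant = relevant_artifacts(usages, now=3600)
print(relevant)  # ['Parser.java', 'ParserTest.java']
```

In this example the briefly opened README drops out, while the file that was used twice and the one used most recently remain. The cut-off `top_k` and the half-life would have to be tuned against real interaction data.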
Summary of the Talk
• Most information entities can be created automatically by observing the developer's context
• For that we propose a set of ontologies and heuristics to be used
Information Entities
• Information on activities, artifacts, problems, new work, and status is included for work management
• Information on solutions, rationale, and experience is included to capture and share knowledge
Information Granularity
• There are different levels of domain, object, and activity granularity
• These are used consistently and with common patterns
Developers' Preference
• Developers either think problem-centered or artifact-centered when describing their work
• They use well-defined information patterns such as <activity concerns artifacts>
Feedback, Questions, Suggestions, and Collaboration Are Welcome!
Hans-Jörg Happel, FZI
Walid Maalej, TUM