TRANSFER LEARNING AND SE (TRANSCRIPT)
[email protected]
WVU, JULY 2013
SOUND BITES
• Ye olde worlde SE
• “The” model of SE (defects, effort, etc)
• 21st century SE
• Models (plural)
• No generality in models
• But, perhaps, generality in how we find those models
• Transfer learning
WHAT IS TRANSFER LEARNING?
• Source = old = Domain1 = <Eg1, P1>
• Target = new = Domain2 = <Eg2, P2>
• If we move from Domain1 to Domain2, do we have to start afresh?
• Or can we learn faster in the “new”…
• … using lessons learned from the “old”?
• NSF funding (2013..2017):
• Transfer learning in Software Engineering
• Menzies, Layman, Shull, Diep
WHO CARES? (WHAT’S AT STAKE?)
• “Transfer” is a core scientific issue
• Lack of transfer is the scandal of SE
• Replication in empirical SE is rare
• Conclusion instability
• “It all depends.”
• The full stop syndrome
• The result?
• A funding crisis
MANUAL TRANSFER (WAR STORIES)
• Brazil, SEL, 2002: need domain knowledge (but now gone)?
• NSF, SEL, 2006: need better automatic support
• Kitchenham, Mendes et al, TSE 2007: for = against
• Zimmermann et al., FSE 2009: cross works in only 4/600 cases
WAR STORIES (EFFORT ESTIMATION)

Effort = a · loc^x · y

• Learned using Boehm’s methods
• 20 * 66% of NASA93
• COCOMO attributes
• Linear regression (log pre-processor)
• Sort the coefficients found for each member of x, y
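A Boehm-style calibration can be sketched as a log-linear fit: take logs so that Effort = a · loc^x becomes a straight line, then solve by least squares. The (KLOC, effort) pairs below are invented for illustration; they are not NASA93 data.

```python
# Sketch: fit Effort = a * loc**x via linear regression on log-transformed
# data (the "log pre-processor" step), as in Boehm-style calibration.
import math

def fit_cocomo(projects):
    """Least-squares fit of log(effort) = log(a) + x * log(loc)."""
    xs = [math.log(loc) for loc, _ in projects]
    ys = [math.log(eff) for _, eff in projects]
    n = len(projects)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
             / sum((xi - mx) ** 2 for xi in xs))
    a = math.exp(my - slope * mx)   # back-transform the intercept
    return a, slope                 # effort ~ a * loc**slope

# Illustrative (KLOC, person-months) pairs, not real project data:
projects = [(10, 30), (25, 90), (50, 210), (100, 500)]
a, x = fit_cocomo(projects)
```

Sorting such coefficients over repeated samples (the “20 * 66% of NASA93” treatment) shows how unstable the learned a and x can be.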
WAR STORIES (DEFECT ESTIMATION)
BUT THERE IS HOPE
• Maybe we’ve been looking in the wrong direction
• SE project data = surface features of an underlying effect
• Go beneath the surface
We focused too much on what we can see at first glance.
We did not check the nuances of the hidden structure beneath.
BUT THERE IS HOPE
With new data mining technologies, true picture emerges, where we can see what is going on
12/1/2011
BUT THERE IS HOPE
ESEM, 2011: How to Find Relevant Data for Effort Estimation
TIM MENZIES, EKREM KOCAGUNELI
THERE IS HOPE
• Maybe we’ve been looking in the wrong direction
• SE project data = surface features of an underlying effect
• Go beneath the surface
US DOD MILITARY PROJECTS (LAST DECADE)
You must segment to find relevant data.
DOMAIN SEGMENTATIONS

Q: What to do about rare zones?
A: Select the nearest ones from the rest. But how?
IN THE LITERATURE (BEFORE THIS WORK): WITHIN VS. CROSS = ??
Kitchenham et al., TSE 2007:
• Within-company learning (just use local data)
• Cross-company learning (just use data from other companies)
• Results mixed: no clear win for cross or within

Cross vs. within are not rigid boundaries:
• They are soft borders
• We can move a few examples across the border
• And after making those moves, “cross” performs the same as “local”
SOME DATA DOES NOT DIVIDE NEATLY ON EXISTING DIMENSIONS
THE LOCALITY(1) ASSUMPTION

Data divides best on one attribute, e.g.:
1. development centers of developers;
2. project type, e.g. embedded, etc.;
3. development language;
4. application type (MIS, GNC, etc.);
5. targeted hardware platform;
6. in-house vs. outsourced projects;
7. etc.

If Locality(1) holds, it is hard to use data across these boundaries:
• Then it is harder to build effort models
• We need to collect local data (slow)
THE LOCALITY(N) ASSUMPTION

Data divides best on a combination of attributes. If Locality(N) holds:
• It is easier to use data across these boundaries
• Relevant data is spread all around
• Little diamonds floating in the dust
HOW TO FIND RELEVANT TRAINING DATA?

              independent attributes
               w    x    y    z   class
similar 1      0    1    1    1     2
similar 2      0    1    1    1     3
different 1    7    7    6    2     5
different 2    1    9    1    8     8
different 3    5    4    2    6    10
alien 1       74   15   73   56    20
alien 2       77   45   13    6    40
alien 3       35   99   31   21    60
alien 4       49   55   37    4    80

Use similar? Use more variant? Use aliens?
VARIANCE PRUNING

(Same table as above: similar 1-2, different 1-3, alien 1-4.)

1) Sort the clusters by “variance”
2) Prune those high-variance things
3) Estimate on the rest

“Easy path”: cull the examples that hurt the learner.
PRUNE! KEEP!
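The three pruning steps above can be sketched as follows; the cluster contents are invented for illustration (the tight “similar” cluster, a moderate “different” cluster, and a high-spread “alien” cluster):

```python
# Sketch of the variance-pruning "easy path": rank candidate clusters by
# the variance of their class values, drop the high-variance ones, and
# estimate on what is left.
import statistics

def variance_prune(clusters, keep=0.5):
    """Keep the keep-fraction of clusters with the lowest class variance."""
    ranked = sorted(clusters, key=lambda c: statistics.pvariance(c))
    cut = max(1, int(len(ranked) * keep))
    return ranked[:cut]

clusters = [
    [2, 3],            # "similar": tight class values -> kept
    [5, 8, 10],        # "different": moderate spread
    [20, 40, 60, 80],  # "alien": huge spread -> pruned
]
kept = variance_prune(clusters)
```

Here only the lowest-variance cluster survives; estimation then proceeds from those retained examples.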
TEAK: CLUSTERING + VARIANCE PRUNING (TSE, JAN 2011)

• TEAK is a variance-based instance selector
• It is built via GAC trees
• TEAK is a two-pass system:
• The first pass selects low-variance relevant projects
• The second pass retrieves the projects to estimate from
ESSENTIAL POINT
TEAK finds the local regions important to the estimation of particular cases.

TEAK finds those regions via locality(N), not locality(1).
WITHIN AND CROSS DATASETS
Note: all Locality(1) divisions
EXPERIMENT1: PERFORMANCE COMPARISON OF WITHIN AND CROSS-SOURCE DATA
• TEAK run on within and cross data for each dataset group (lines separate groups)
• LOOCV used for the runs
• 20 runs performed for each treatment
• Results evaluated w.r.t. MAR, MMRE, MdMRE and Pred(30); but see http://goo.gl/6q0tw
• If within data outperforms cross, the dataset is highlighted in gray
• Only 2 datasets are highlighted
EXPERIMENT 2: RETRIEVAL TENDENCY OF TEAK FROM WITHIN AND CROSS-SOURCE DATA
Diagonal (WC) vs. Off-Diagonal (CC) selection percentages sorted
Percentiles of diagonals and off-diagonals
HIGHLIGHTS
1. Don’t listen to everyone
• When listening to a crowd, first filter the noise
2. Once the noise clears: bits of me are similar to bits of you
• The probability of selecting cross or within instances is the same
3. Cross-vs-within is not a useful distinction
• Locality(1) is not informative
• This enables “cross-company” learning
SO, THERE IS HOPE

• Maybe we’ve been looking in the wrong direction
• SE project data = surface features of an underlying effect
• Go beneath the surface
• Assume locality(N), not locality(1)
• No cross-, no within-: it’s all data we can learn from
TSE, 2013: LOCAL VS. GLOBAL MODELS FOR EFFORT ESTIMATION AND DEFECT PREDICTION
TIM MENZIES, ANDREW BUTCHER (WVU), ANDRIAN MARCUS (WAYNE STATE), THOMAS ZIMMERMANN (MICROSOFT), DAVID COK (GRAMMATECH)
Do not focus on what we can see at first glance.
Check the nuances of the hidden structure beneath.
THERE IS HOPE
Cluster, then learn (using envy):
• Seek the fence where the grass is greener on the other side
• Learn from “there”
• Test on “here”
• Cluster to find “here” and “there”
ENVY = THE WISDOM OF THE COWS
DATA = MULTI-DIMENSIONAL VECTORS

@attribute recordnumber real
@attribute projectname {de,erb,gal,X,hst,slp,spl,Y}
@attribute cat2 {Avionics, application_ground, avionicsmonitoring, … }
@attribute center {1,2,3,4,5,6}
@attribute year real
@attribute mode {embedded,organic,semidetached}
@attribute rely {vl,l,n,h,vh,xh}
@attribute data {vl,l,n,h,vh,xh}
…
@attribute equivphyskloc real
@attribute act_effort real
@data
1,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,25.9,117.6
2,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,24.6,117.6
3,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,7.7,31.2
4,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,8.2,36
5,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,9.7,25.2
6,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,2.2,8.4
…
CAUTION: DATA MAY NOT DIVIDE NEATLY ON RAW DIMENSIONS

The best description for SE projects may be synthesized dimensions extracted from the raw dimensions.
FASTMAP

Fastmap (Faloutsos [1995]): O(2N) generation of an axis of large variability.

• Pick any point W
• Find X, furthest from W
• Find Y, furthest from X

Let c = dist(X,Y). Every point has distances a, b to (X,Y):
• x = (a² + c² − b²) / 2c
• y = sqrt(a² − x²)

Find median(x), median(y); recurse on the four quadrants.
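A minimal sketch of the Fastmap projection just described, assuming Euclidean distance on numeric points (the function names are our own):

```python
# Sketch of Fastmap's O(2N) axis generation: two distance scans find the
# poles X and Y, then each point is projected onto the X-Y axis with the
# cosine rule.
import math

def dist(p, q):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fastmap_axis(points):
    """Return (x, y) per point: position along the X-Y axis, distance off it."""
    w = points[0]                              # pick any point W
    x_pole = max(points, key=lambda p: dist(p, w))       # X furthest from W
    y_pole = max(points, key=lambda p: dist(p, x_pole))  # Y furthest from X
    c = dist(x_pole, y_pole)
    proj = []
    for p in points:
        a, b = dist(p, x_pole), dist(p, y_pole)
        px = (a * a + c * c - b * b) / (2 * c)      # cosine rule
        py = math.sqrt(max(0.0, a * a - px * px))   # clamp tiny negatives
        proj.append((px, py))
    return proj

pts = [(0, 0), (1, 0), (2, 0), (3, 0)]  # toy collinear data
proj = fastmap_axis(pts)
```

On collinear points, all the off-axis components come out zero, so the one synthesized dimension captures all the variability.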
HIERARCHICAL PARTITIONING

Grow:
• Find two orthogonal dimensions
• Find median(x), median(y)
• Recurse on the four quadrants

Prune:
• Combine quadtree leaves with similar densities
• Score each cluster by the median score of the class variable
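The grow phase can be sketched as a quadtree recursion over the two synthesized dimensions; the stopping size and helper names are our own choices:

```python
# Sketch of the "grow" phase: split 2-D points at median(x), median(y)
# and recurse on the four quadrants until leaves are small.
import statistics

def grow(points, min_size=4):
    """Recursively quarter (x, y) points at the medians; return leaf clusters."""
    if len(points) <= min_size:
        return [points]
    mx = statistics.median(p[0] for p in points)
    my = statistics.median(p[1] for p in points)
    quads = [[], [], [], []]
    for p in points:
        # quadrant index: bit 0 = right of median(x), bit 1 = above median(y)
        quads[(p[0] > mx) + 2 * (p[1] > my)].append(p)
    leaves = []
    for q in quads:
        if q and len(q) < len(points):   # guard against non-splitting ties
            leaves.extend(grow(q, min_size))
        elif q:
            leaves.append(q)
    return leaves

pts = [(i, j) for i in range(4) for j in range(4)]  # toy 4x4 grid
leaves = grow(pts)
```

The prune phase would then merge adjacent leaves of similar density and score each surviving cluster by the median of its class variable.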
Learning via “envy”:
• Seek the fence where the grass is greener on the other side
• Learn from “there”
• Test on “here”
• Cluster to find “here” and “there”
ENVY = THE WISDOM OF THE COWS
HIERARCHICAL PARTITIONING

Grow:
• Find two orthogonal dimensions
• Find median(x), median(y)
• Recurse on the four quadrants

Prune:
• Combine quadtree leaves with similar densities
• Score each cluster by the median score of the class variable

Where is the grass greenest? A cluster envies the neighbor with a better score and the maximum abs(score(this) − score(neighbor)).
Q: HOW TO LEARN RULES FROM NEIGHBORING CLUSTERS?

A: It doesn’t really matter; there are many competent rule learners.

But to evaluate global vs. local rules, use the same rule learner for local and global rule learning.

This study uses WHICH (Menzies [2010]):
• Customizable scoring operator
• Faster termination
• Generates very small rules (good for explanation)
DATA FROM HTTP://PROMISEDATA.ORG/DATA

• Effort reduction = {NasaCoc, China}: COCOMO or function points
• Defect reduction = {lucene, xalan, jedit, synapse, etc.}: CK metrics (OO)

Clusters have an untreated class distribution. Rules select a subset of the examples:
• They generate a treated class distribution
[Figure: class distributions shown as percentiles (25th, 50th, 75th, 100th) for three treatments: untreated; treated with rules learned from all data (global); treated with rules learned from the neighboring cluster (local)]

Local treatment gives:
• Lower median efforts/defects (50th percentile)
• Greater stability (75th − 25th percentile)
• Decreased worst case (100th percentile)
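The percentile comparison can be sketched as below; the sample distributions are made up for illustration, not the paper’s results:

```python
# Sketch of the evaluation: compare the untreated class distribution with
# a rule-treated one via percentiles (median, 75th-25th spread, worst case).
import statistics

def summary(values):
    """Median, inter-quartile spread, and worst case of a distribution."""
    qs = statistics.quantiles(sorted(values), n=4)  # 25th, 50th, 75th
    return {"median": qs[1], "spread": qs[2] - qs[0], "worst": max(values)}

untreated = [5, 9, 12, 20, 33, 40]   # e.g. defects per module, untreated
local     = [3, 4, 6, 8, 9, 11]      # treated by rules from the envied cluster
u, l = summary(untreated), summary(local)
```

“Local better than global” then means the local summary wins on all three numbers: lower median, smaller spread, smaller worst case.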
BY ANY MEASURE, LOCAL BETTER THAN GLOBAL
RULES LEARNED IN EACH CLUSTER

What works best “here” does not work “there”:
• It is misguided to try to tame conclusion instability
• It is inherent in the data

You can’t tame conclusion instability:
• Instead, you can exploit it
• Learn local lessons that do better than overly generalized global theories
Do not focus on what we can see at first glance.
Check the nuances of the structures within our data:
• Cluster, then envy
SO THERE IS HOPE
CONCLUSION: LACK OF TRANSFER = THE GREAT SCANDAL OF SE

• Replication in empirical SE is rare
• Conclusion instability
• “It all depends.” is not good enough
• A funding crisis
BUT THERE IS HOPE

• Maybe we’ve been looking in the wrong direction
• SE project data = surface features of an underlying effect
• Go beneath the surface
• Assume locality(N), not locality(1)
• No cross-, no within-: it’s all data we can learn from
Do not focus on what we can see at first glance.
Check the nuances of the structures within our data:
• Cluster, then envy
BUT THERE IS HOPE
With new data mining technologies, a truer picture emerges, where we can see what is going on.
BUT THERE IS HOPE