GÉLO Seminar, Montréal, April 17, 2002
Migrating Legacy Procedural System towards Object-Oriented Paradigm
Département d’informatique et de recherche opérationnelle
Université de Montréal
Laboratoire de génie logiciel
Content of the Presentation
1. Research Objective
2. Legacy Dilemma
3. Migration Approaches
4. Semi/full-automatic Techniques
5. Progressive Approach & Migration Model
6. Migration Risk Analysis
7. Design Recovery
8. Rule-based Class Recovery
9. Migration Process Workflow Automation
10. Migration Programmer Assessment Model
11. Subject and Target System Quality Comparison
12. Conclusion
Research Objectives
Reveal the pertinent issues in the evolution of legacy procedural systems
Provide a comprehensive solution for migrating legacy systems towards object technology
Provide tools and automatic techniques to support such a solution.
What is a Legacy Application?
Characteristics:
Vital to corporation
Vast former investment & mission critical
Represent substantial business knowledge
Old and large: 10-25 years, 100K-1M LOC
Poorly understood because of personnel turnover
Resistant to change
The Legacy Dilemma
Crucial to the organization, yet inflexible in the face of changing business, market and end-user needs
Old, complex and heavily modified from the original design by years of maintenance
The original developers are no longer available, and system documentation is out of date or nonexistent
Replacement is prohibitively expensive
Result: enormous maintenance costs (the major issue)
Why Migrate to O-O Paradigm?
Major benefits from OO technology
Data and control encapsulation: greatly reduces the complexity of the subject procedural system
Strong modularity: increases evolvability and eases understanding, reuse and maintenance
Why migrate?
Redesigning a large system from scratch as an OO system is prohibitive in both time and cost.
The Migration Difficulties
Human Factor
Software maintainers often opt to rewrite an application completely rather than attempt to adapt it to a new paradigm by hand.
Program Factor
Legacy systems have evolved and been modified over an extended period of time, which makes them much harder to understand.
Migration Approaches
COREM [Gall & Klösch].
Legacy Wrapping [Lucia et al.]
Automatic-technique-aided approach (an area of intensive interest)
Concept lattice
Genetic algorithm
Concept clustering …
(i) COREM approach
[Figure: the COREM process, starting from the procedural system. Labels shown:]
1. Design recovery: extract low-level design documents (structure charts, data-flow diagrams), generate an entity-relationship diagram, transform the ERD into a reverse object-oriented application model (RooAM)
2. OO application modeling: requirement analysis, extensive domain expert participation, FooAM
3. Object mapping: domain knowledge
4. System transformation: source code adaptation by program transformation
Characteristic: uses additional domain knowledge.
Result: requires the involvement of a domain expert.
Advantage: the results are more reliable.
Disadvantage:
Domain expertise is not always available
Cost may be very high.
(ii) Legacy wrapping approach
The identified “legacy objects” are encapsulated in object wrappers, and the new OO system uses the existing resources through the wrappers’ interfaces.
[Figure: legacy wrapping process, from legacy source to OO source. Steps shown:]
Static analysis of legacy code
Decomposing programs
Distilling application domain components
Abstracting an OO model
Re-engineering by OO model
Encapsulating the identified objects in wrappers
Incremental translation of object wrappers
Turning wrappers into real objects
Benefit: reduces the complexity of migration
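The wrapping idea can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the presentation: the `acct_*` routines and the `_balances` global stand in for legacy procedural code, and `AccountWrapper` is one possible shape for an object wrapper that new OO code would call instead of the legacy routines.

```python
# Stand-in for legacy procedural code (hypothetical names, not from the slides).
_balances = {}  # global state shared by the legacy routines


def acct_open(acct_id, amount):
    _balances[acct_id] = amount


def acct_deposit(acct_id, amount):
    _balances[acct_id] += amount


def acct_balance(acct_id):
    return _balances[acct_id]


class AccountWrapper:
    """Object wrapper: new OO code sees only this interface."""

    def __init__(self, acct_id, opening_amount=0):
        self._id = acct_id
        acct_open(acct_id, opening_amount)  # delegate to the legacy routine

    def deposit(self, amount):
        acct_deposit(self._id, amount)      # delegate to the legacy routine

    @property
    def balance(self):
        return acct_balance(self._id)


account = AccountWrapper("A-1", 100)
account.deposit(50)
print(account.balance)
```

Because callers depend only on the wrapper's methods, the delegating bodies could later be replaced one at a time by native OO code, which is roughly what the incremental translation step describes.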
(iii) Automatic-technique-aided approach (an area of intensive interest)
[Figure: from legacy source code to OO source code. Labels shown: legacy system modularization, functional component recovery, object-feature identification, architecture OO modeling, procedural-object mapping, OO paradigm shifting, automatic information mining technique, human intervention.]
Object Identification
Finding objects in legacy systems is a key step in migrating towards the OO paradigm
Object technology is an enabler for componentization: splitting a large application into reusable components
Intensive research has been done to support the automation of object identification
Semi/full-automatic Techniques
Metric-based: similarity/concept clustering, type-based cohesion, delta-IC
Concept-based: Galois lattice analysis
Graph-based: strongly connected components
Genetic algorithm: derive better solutions starting from a set of initial solutions
Program slicing …
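As an illustration of the graph-based family, here is a small Python sketch that finds strongly connected components with Tarjan's algorithm. The slides do not say which algorithm or which graph the authors used. The call graph and routine names below are hypothetical, and treating each multi-routine component as a candidate group is an assumption made for the example.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps node -> iterable of successor nodes."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


# Hypothetical call graph: routine -> routines it calls.
call_graph = {
    "open_account": ["update_ledger"],
    "update_ledger": ["check_limit"],
    "check_limit": ["open_account"],
    "print_report": ["open_account"],
    "format_date": [],
}
candidate_groups = [c for c in strongly_connected_components(call_graph) if len(c) > 1]
print(candidate_groups)
```

The recursive form is fine for small graphs; a real legacy call graph would probably need an iterative version to stay within Python's recursion limit.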
The Issues of Object Identification
Legacy systems vary greatly in source language, application domain, etc.
It is difficult to select the best-suited identification approach for the legacy system at hand.
Automatic techniques tend to be too fine-grained, and their strict recovery criteria leave little flexibility for vague concepts
Incremental Approach & Migration Model
Full-size progressive migration approach
allows the subject system to keep running with no downtime
provides a flexible means of managing large migration projects
Seed growing incremental migration approach
OO target application skeleton
Function-based decomposition
(i) Full-size progressive migration approach
[Figure: the system moves from the initial state through intermediate states to the final state. Layers shown: legacy code, bridge code, OO code.]
Example: the Decomposition of SGA-C System
[Figure: the SGA-C source files arranged on levels 0 to 3 and grouped into units MU1 to MU7. Files shown: man.c, random.c, app.c, utility.c, operater.c, sttictis.c, memory.c, rselect.c, generate.c, initial.c, report.c.]
(ii) Seed growing incremental migration approach
Seed building: based on legacy system design recovery
Creating an OO skeleton architecture as a system framework
Seed growing
Semi/full-automatic techniques supporting OO modelling
Incrementally build the rest of the system components
Migration Model (Reverse Direction)
[Figure: reverse direction of the migration model. Labels shown:]
Environment, legacy source, legacy function
Legacy system understanding
Migration feasibility evaluation
Legacy system decomposition
Legacy knowledge mining
Source code modularization
Componentization
Architecture abstraction, design recovery
Techniques: semi-automatic techniques, reverse engineering techniques
Annotations: business logic, process sequence, algorithm, formula; subsystem, functionality, multi-level granularity
Migration Model (Forward direction)
[Figure: forward direction of the migration model. Labels shown:]
Environment, subject source, subject function
Componentization
Component entity discrimination: platform-related components (OS, GUI, I/O) and paradigm-free components
Domain knowledge injection, expert intervention
Design methodology injection
Paradigm shifting
Program transformation
Platform-dependent module construction
Module assembling
OO model recasting, architecture reorganization
Application platform integration, testing
Techniques: semi-automatic data mining technique; subject paradigm analysis and design, construction
Annotations: business logic, typically the parts independent of the application platform or interface; application-platform-dependent parts: language, interface, I/O, OS
Migration Risk Analysis
Risk is measured as the probability of failure by the end of the migration time frame.
Migration strategy failure: inadequate consideration of migration requirements, environment integration, migration process, legacy knowledge mining
Inadequate reverse engineering and reengineering tool support
Risk factors: resources, migration plan, human, technology, inner and outer environment, disaster events, …
Migration Project Risk Prediction
[Figure: possibility charts for the two inputs and for the resulting failure risk.]
Migration tool support (features/size): Little 0, Medium 0.8, Strong 0.2
Time strictness (month/size): Strict 0, Medium 0.9, Loose 0.1

Time \ Tool support   Little (0)   Medium (0.8)   Strong (0.2)
Strict (0)            XH           H              M
Medium (0.9)          H            M  0.72        L  0.18
Loose (0.1)           M            L  0.08        XL 0.02

Failure risk possibility: M 0.72, L 0.18, XL 0.02
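The numbers on this slide are consistent with a simple fuzzy rule evaluation, sketched below in Python. Two things are assumptions inferred from the figures rather than stated by the authors: each rule's strength is the product of its two input possibilities (0.9 × 0.8 = 0.72), and rules that reach the same risk level are combined by taking the maximum (L shows 0.18, not 0.18 + 0.08). The function and variable names are illustrative only.

```python
# Possibility of each input level, as read from the slide.
tool_support = {"Little": 0.0, "Medium": 0.8, "Strong": 0.2}
time_frame = {"Strict": 0.0, "Medium": 0.9, "Loose": 0.1}

# Rule table from the slide: (time level, tool support level) -> failure risk level.
RULES = {
    ("Strict", "Little"): "XH", ("Strict", "Medium"): "H", ("Strict", "Strong"): "M",
    ("Medium", "Little"): "H", ("Medium", "Medium"): "M", ("Medium", "Strong"): "L",
    ("Loose", "Little"): "M", ("Loose", "Medium"): "L", ("Loose", "Strong"): "XL",
}


def failure_risk(time_poss, tool_poss, rules=RULES):
    """Return the possibility of each failure risk level."""
    risk = {}
    for (time_level, tool_level), level in rules.items():
        # Assumption: rule strength is the product of the input possibilities.
        strength = time_poss[time_level] * tool_poss[tool_level]
        # Assumption: rules reaching the same level are combined with max.
        risk[level] = max(risk.get(level, 0.0), strength)
    return risk


for level, possibility in failure_risk(time_frame, tool_support).items():
    print(level, round(possibility, 2))
```

Under these assumptions the sketch gives 0.72 for M, 0.18 for L and 0.02 for XL, the three values shown on the failure risk chart.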
Design Recovery: levels of abstraction
Application: concepts, business rules, policies
Function: logical and functional specifications, non-functional requirements
Structure:
- Data and control flow, dependency graphs
- Structure and subsystem charts
- Architectures
- ASTs
Data Dependency Design Recovery
Legacy modularization is based on user-defined data structures
A model (a legacy source code file) acts as the container of its data
Dependencies between user-defined data structures reflect dependencies between their containers
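A minimal Python sketch of this idea follows: if a file references a user-defined type that another file defines, the first file depends on the second. The input format, the function name, and the assignment of types to files are hypothetical. A real tool would extract this information from the parsed source.

```python
def model_dependencies(defines, uses):
    """
    defines: model (source file) -> set of user-defined types it defines
    uses:    model (source file) -> set of user-defined types it references
    Returns: model -> set of models it depends on.
    """
    # Each user-defined type has one container: the model that defines it.
    owner = {udt: model for model, udts in defines.items() for udt in udts}
    deps = {model: set() for model in defines}
    for model, udts in uses.items():
        for udt in udts:
            container = owner.get(udt)
            if container is not None and container != model:
                deps.setdefault(model, set()).add(container)
    return deps


# Hypothetical example: which file defines which type is assumed, not taken from the slides.
defines = {
    "account.c": {"Account"},
    "transaction.c": {"Transaction"},
    "portfolio.c": {"Portfolio"},
}
uses = {
    "account.c": {"Account"},
    "transaction.c": {"Account", "Transaction"},
    "portfolio.c": {"Account", "Transaction", "Portfolio"},
}
for model, needed in sorted(model_dependencies(defines, uses).items()):
    print(model, "->", sorted(needed))
```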
[Figure 4. Candidate class composition/aggregation/dependency hierarchical diagram. Panels: the original model reference graph and the data dependency model reference graph. Nodes are source files of the system (for example node.c, configuration.c, portfolio.c, transaction.c, account.c), labeled with their sizes (for example node 1.1k). Subsystems shown: Configuration, Kernel, Utility, Interface.]
Interest: a personal finance system (92 files, 28 kloc)
Behavior Tracing & Component Mining
Limits of source-code-based approaches: the analysis is static and the visualization mode is predefined
A fixed execution scope highlights a legacy function module
Reveals the relation between high-level source code and system behavior, and thus reflects the real components
Shortcoming: the tracing set may be incomplete
Behavior tracing: structure & component mining
[Figure: from legacy source, to legacy source with probe code, to execution and dynamic trace. Steps shown:]
Legacy source code expend: extinct composition functions
Inject probe code: behavior tracing facility code
Design the execution range to highlight certain system behavior
Daemon thread: the test monitor captures and records the output from the probe instrument
Analyze and mine system structure and components
A piece of automatically modified source code
/* Probe actions around the call to cf_init():
 * 1. Write the line "cf_init()" to the file WatchFile.
 * 2. The daemon thread detects the change in length of WatchFile, reads the
 *    line "cf_init()", searches the repository and finds that the
 *    definition is configuration:cf_init(void).
 * 3. The daemon thread writes a new line to the file BehaviorTrace: first
 *    x spaces, where x is the number stored in the file SpaceFile, then
 *    configuration:cf_init(void).
 * 4. The daemon thread increases the number stored in SpaceFile by 3.
 */
cf_init();
/* Decrease the number stored in SpaceFile by 3. */
cf_read_config_file();
cf_show();
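The probe and daemon steps above can be mimicked in a few lines of Python. This is only an in-memory sketch: the class and method names are hypothetical, lists and integers stand in for the WatchFile, SpaceFile and BehaviorTrace files, and the repository is a plain dictionary.

```python
class BehaviorTrace:
    """In-memory stand-in for the WatchFile / SpaceFile / BehaviorTrace files."""

    INDENT_STEP = 3  # the slide changes the space count in steps of 3

    def __init__(self, repository):
        self.repository = repository  # routine name -> full definition
        self.spaces = 0               # plays the role of SpaceFile
        self.lines = []               # plays the role of BehaviorTrace

    def enter(self, routine):
        """Probe fired before a call: record the definition, then indent."""
        definition = self.repository.get(routine, routine)
        self.lines.append(" " * self.spaces + definition)
        self.spaces += self.INDENT_STEP

    def leave(self):
        """Probe fired after the call returns: undo the indent."""
        self.spaces = max(0, self.spaces - self.INDENT_STEP)


repository = {"cf_init()": "configuration:cf_init(void)"}
trace = BehaviorTrace(repository)
trace.enter("main()")
trace.enter("cf_init()")
trace.leave()
trace.enter("cf_show()")
trace.leave()
trace.leave()
print("\n".join(trace.lines))
```

The indentation depth encodes call nesting, so the resulting trace can be read as a call tree.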
Result behavior record:
Main.c: main(int arc=$,;char ** argv=&)
Main.c: parse_argument(int arc=$; char **argv=$;Glist **filelist=$)
Configuration.c: cf_init()
configuration.c : cf_set_rootdir(char *dir($))
configuration.c: UDT inconfig($)
configuration.c: clear_autoload()
Utility.c: ut_quoat_word(char *word=$)
Color.c: show_color(int value=$)
Color.c: ………
……….
Capabilities of Behavior Tracing
Legacy system decomposition: mining functional components and subsystem structure; grouping models, routines and user-defined data structures based on their cooperation
Recovery of common service components and business logic
Dynamic metric collection: e.g., for a given system function request, the number of routine calls and the models, routines and data structures involved
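As a rough illustration of dynamic metric collection, the Python sketch below counts routine calls and collects the models and data structures that appear in a behavior record. The record format is assumed from the sample shown earlier ("model: routine(...)" lines, with "UDT name(...)" marking a data structure). The function name and the case-insensitive treatment of file names are choices made for the example.

```python
from collections import Counter


def trace_metrics(record_lines):
    """Summarize a behavior record for one system function request."""
    calls = Counter()
    models = set()
    data_structures = set()
    for line in record_lines:
        model, sep, entry = line.partition(":")
        entry = entry.strip()
        if not sep or "(" not in entry:
            continue  # skip elided or malformed lines
        # Assumption: file names differing only in case are the same model.
        models.add(model.strip().lower())
        name = entry.split("(", 1)[0].strip()
        if name.startswith("UDT "):
            data_structures.add(name[len("UDT "):].strip())
        else:
            calls[name] += 1
    return {
        "routine_calls": sum(calls.values()),
        "routines": set(calls),
        "models": models,
        "data_structures": data_structures,
    }


record = [
    "Configuration.c: cf_init()",
    "configuration.c : cf_set_rootdir(char *dir($))",
    "configuration.c: UDT inconfig($)",
    "Color.c: show_color(int value=$)",
]
print(trace_metrics(record))
```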
Rule based Class Recovery Model (RCR):
[Figure: the RCR model. Labels shown: source code, program analyzer, program entity base, program structural representation, rule editor, rule base, rule modify, expert, object module recovery (human/machine), candidate classes, class screened, architecture representation, legacy system understanding, object paradigm migration.]
Rules are distilled and defined from expert knowledge
Expressed both in natural language and in a semantic operational format (SOF)
SOF can be “understood” and executed by an automatic class mining machine.
E.g., “Each UDT is a candidate class” is expressed in SOF as: ( UDT | Class)
“Routine that references the 3-variables is candidate method of that class” is expressed in SOF as: ( Routine, 3-variable | Method, Class)
A rule from case study:
For a global U_variable whose data type is a simple UDT, we can treat the variable as an object entity; if its data type is a combination of UDTs, we treat the variable itself as representing another, higher-level class candidate, constituted by its component UDTs.
A good example is an array of a UDT: if “Account” is a UDT and a global U_variable is an array of type “Account”, the variable forms another concept, a new class candidate: AccountList.
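To make the rules concrete, here is a small Python sketch that applies three of them to a toy program model. It is not the authors' SOF engine. The input format and all names are hypothetical, and the method rule is simplified to "a routine that references a global variable tied to a class is a candidate method of that class", which is only one possible reading of the rule quoted earlier.

```python
def recover_candidates(udts, global_vars, routines):
    """
    udts:        set of user-defined type names
    global_vars: variable name -> (type name, is_array)
    routines:    routine name -> set of global variable names it references
    Returns:     candidate class name -> set of candidate method names
    """
    # Rule: each UDT is a candidate class.
    classes = {udt: set() for udt in udts}
    var_class = {}
    for var, (type_name, is_array) in global_vars.items():
        if type_name not in udts:
            continue
        if is_array:
            # Case-study rule: a global array of a UDT forms a new candidate.
            list_class = type_name + "List"
            classes.setdefault(list_class, set())
            var_class[var] = list_class
        else:
            var_class[var] = type_name
    # Simplified method rule (an assumption, see the lead-in).
    for routine, referenced in routines.items():
        for var in referenced:
            if var in var_class:
                classes[var_class[var]].add(routine)
    return classes


candidates = recover_candidates(
    udts={"Account"},
    global_vars={
        "accounts": ("Account", True),
        "current": ("Account", False),
        "verbose": ("int", False),
    },
    routines={
        "add_account": {"accounts"},
        "show_balance": {"current"},
        "log_message": {"verbose"},
    },
)
print(candidates)
```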
Rules Screened by Statistical Study
25 participants used 11 suggested rules (case study)
Record the usage of rules in each “human discovered” class
Record new rules created by participants during the study
Build a pie chart of rule usage shares, refine the most used rules, and create algorithms based on these widely accepted rules to find classes automatically
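The usage shares behind such a pie chart can be computed as in the following Python sketch. The records and rule identifiers (R1, R2, R5) are made up for illustration and are not results from the case study.

```python
from collections import Counter


def rule_usage_shares(class_records):
    """class_records: iterable of (class name, list of rule ids used to find it)."""
    usage = Counter(rule for _, rules in class_records for rule in rules)
    total = sum(usage.values())
    if total == 0:
        return {}
    # most_common() orders the rules from most used to least used.
    return {rule: count / total for rule, count in usage.most_common()}


# Hypothetical records, one per "human discovered" class.
records = [
    ("Account", ["R1", "R2"]),
    ("AccountList", ["R1", "R5"]),
    ("Portfolio", ["R1"]),
]
for rule, share in rule_usage_shares(records).items():
    print(rule, f"{share:.0%}")
```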
Migration Process Workflow Automation
Describes the flow of information and monitors the performance of each task
Automatic arrangement of teamwork in the migration process
Incremental migration project progress control
Migration Workflow Diagram
[Figure: migration workflow. Labels shown: legacy source code; legacy system decomposition by static analysis and behavior tracing; workflow editor; workflow static description; workflow process diagram generator; workflow monitoring center; runtime monitor and controller; workflow status (dynamic description); time schedule for project and personnel.]
Migration Workflow Components
Workflow Definition Tool: supports the capture of the migration process definition
Workflow Engine: performs the management of the workflow processes execution, sequences the various activities
Workflow Monitoring: monitors the status of workflow processes, dynamically configures the runtime controller in a progressive migration project
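The sequencing job of a workflow engine can be sketched with a topological sort, here using Python's standard `graphlib` module (Python 3.9 or later). The activity names and their dependencies are hypothetical, loosely borrowed from terms used in this presentation. The slides do not describe how the actual engine orders activities.

```python
from graphlib import TopologicalSorter


def sequence_activities(depends_on):
    """depends_on: activity -> set of activities that must finish first."""
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical migration process definition.
process = {
    "design recovery": {"legacy system decomposition"},
    "class recovery": {"design recovery"},
    "program transformation": {"class recovery"},
    "integration and testing": {"program transformation"},
}
for step, activity in enumerate(sequence_activities(process), start=1):
    print(step, activity)
```

`TopologicalSorter` raises `graphlib.CycleError` if the definition contains a circular dependency, which is a useful check for a process definition tool.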
Migration Programmer Assessment
Maintenance programmer quality factors
Personal maturity level (easy-going, takes initiative, cooperative, conscientious, disciplined, likes migration tasks, …)
Technical maturity level (skills, learning ability, …)
Calibration: fuzzy concept
Migration Programmer Quality Assessment Model:
[Figure: possibility charts for the two inputs, each scored as a sum of points, and for the resulting quality of the migration programmer.]
Technical maturity (T): Low 0, Medium 0.8, High 0.2
Personal maturity (P): Low 0, Medium 0.9, High 0.1

P \ T            Low (0)   Medium (0.8)   High (0.2)
Low (0)          XH        H              M
Medium (0.9)     H         M  0.72        L  0.18
High (0.1)       M         L  0.08        XL 0.02

Quality of migration programmer possibility: M 0.72, L 0.18, XL 0.02
Subject and Target System Comparison
Function matching
Quality comparison
Mainly interested in understandability and maintainability (complexity)
Procedural system quality
OO system quality
Result comparison
Conclusion
Thank you!