TRANSCRIPT
Prevalence of Mainframe Systems
Over 70% of all business-critical software runs on mainframes
Most common language used is COBOL
Over 100 billion lines of assembler code
Over 350 billion lines of COBOL being maintained
Nine million COBOL developers
(Source: IBM survey)
Assembler Support Costs
Annual cost of maintaining one function point:
Assembler $48.00
PL/1 $39.00
C $21.00
COBOL $17.00
(Source: Capers Jones)
Reasons for Migrating from Assembler
Time to market for new or changed products, e.g. financial products
Newer languages are easier to maintain
Access to a larger skills and resource base
Reduce exposure & risk
Future-proofing the product: sustainability
Improve agility, development and time to market
Reduced cost of ownership
Failed Manual Migration Projects
One company invested over 40 man-years in a migration
project before eventually abandoning it with no result
1987: the California DMV (Department of Motor Vehicles)
launched a five-year, $28 million modernisation project. It was declared
a “hopeless failure” after seven years and $44 million spent
2006: the DMV tried again with a six-year, $208 million project.
This was cancelled in 2013: the only result was a new system
for issuing drivers’ licenses.
How does FermaT change the picture?
FermaT is an automated migration system based on program
transformation theory.
The theory uses infinitary logic and set theory to prove that two
programs are semantically equivalent.
The theory is based on WSL, a Wide Spectrum Language which
includes both abstract specifications and low-level programming constructs.
Software Migration using FermaT
1. Translate from the source language (e.g. assembler) into WSL
2. Apply semantics-preserving WSL-to-WSL transformations to
restructure and simplify the WSL. This includes analysing
subroutines and restructuring them into procedures
3. Apply further transformations to bring the WSL into close
correspondence with the target language
4. Translate the restructured WSL into the target language (e.g.
COBOL), customised according to the required coding
standards
This whole process is automated. Each stage can be customised
and “fine tuned” as needed for a particular project.
Modelling Assembler in WSL
Our approach involves three types of modelling:
1. Complete model: Each assembler instruction is translated
into WSL statements which capture all the effects of the
instruction, including condition codes and registers;
2. Partial model: Branches to register are modelled by
attempting to determine all possible targets of such a branch,
associating a value with each target, and calling a “dispatch”
routine which finds the target for the given value;
3. Self-modifying code: Most cases are detected and handled
(overwriting a NOP/branch, modifying a length field etc.)
but some very rare cases of general self-modifying code may
require human intervention: usually to renovate the assembler
using more standard programming practices!
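The partial model for branches to register can be illustrated with a toy "action system" in Python. This is a minimal sketch, not FermaT's output: each action returns the name of the next action, the target codes 100 and 200, the action names START, BR_R14, POS, NEG and the halting action Z are all hypothetical, and the dispatch table stands for the set of targets the analysis has determined.

```python
from typing import Callable, Dict

State = Dict[str, int]
Action = Callable[[State], str]   # each action returns the name of the next action

def make_dispatch(targets: Dict[int, str]) -> Action:
    """Build a dispatch action from the statically determined branch targets."""
    def dispatch(state: State) -> str:
        value = state["destination"]
        if value not in targets:
            # A target the analysis failed to find: flag it rather than guess.
            raise ValueError(f"unknown branch target {value}")
        return targets[value]
    return dispatch

def run(actions: Dict[str, Action], start: str, state: State) -> State:
    """Execute the action system until the halting action "Z" is reached."""
    current = start
    while current != "Z":
        current = actions[current](state)
    return state

# Toy program: load r14 with one of two target codes, then "BR R14".
def start(state: State) -> str:
    state["r14"] = 100 if state["x"] > 0 else 200
    return "BR_R14"

def br_r14(state: State) -> str:
    state["destination"] = state["r14"]   # value of the register at the branch
    return "dispatch"

def positive(state: State) -> str:
    state["result"] = 1
    return "Z"

def negative(state: State) -> str:
    state["result"] = -1
    return "Z"

actions: Dict[str, Action] = {
    "START": start, "BR_R14": br_r14, "POS": positive, "NEG": negative,
    "dispatch": make_dispatch({100: "POS", 200: "NEG"}),
}
print(run(actions, "START", {"x": 5})["result"])    # 1
print(run(actions, "START", {"x": -5})["result"])   # -1
```

Raising an error for an unknown value mirrors the point of the partial model: the translation is only as complete as the set of targets the analysis can find.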
Modelling Assembler in WSL
Standard opcodes
Standard system macros for file handling etc.
User macros
Structured macros
Condition Code
BAL/BAS (Branch and Link / Branch and Save)
Branch to Register
External Subroutine Calls
Modelling Assembler in WSL
Detected Jump Tables
EX (Execute) instructions
Data Declarations
DSECT Base Register Modification
EQUates: constants and aliases
Self-Modifying Code
Structured and Unstructured CICS calls
SQL, etc.
Migration Technology
Recent improvements to the migration technology:
Improved detection and translation of self-modifying code;
Extensive jump table detection;
Improved dataflow analysis;
Array detection and analysis (including detection of arrays of
structures);
Implementation of program slicing for WSL (our internal
Wide Spectrum Language) and assembler;
Static Single Assignment computation;
Improvements in subroutine restructuring.
Transformation Theory
The mathematical basis of the transformation theory is essential
for an automated transformation system.
If each transformation is “only” 99.9% correct, then after
applying a sequence of 1,000 transformations, the probability that
the result is still correct is 0.999^1000 ≈ 0.368, i.e. less than 37%.
After 10,000 transformations the probability of the program still
working is 0.999^10000 ≈ 0.000045, i.e. below 0.005%!
The only way to guarantee 100% correctness is to prove that every
transformation is valid.
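The figures above follow from assuming that each transformation independently preserves correctness with probability 0.999, so all n of them succeed with probability 0.999^n. A short Python sketch of that calculation (the function name is hypothetical):

```python
def prob_all_correct(p: float, n: int) -> float:
    """Probability that n independent transformations, each correct with
    probability p, all preserve the program's meaning."""
    return p ** n

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} transformations: {prob_all_correct(0.999, n):.4%}")
```

The decay is exponential in n, which is why even a very reliable but unproven transformation set breaks down on large modules.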
Transformation Theory
Initially developed at Oxford University in the late 1980s, then
extended and applied to reverse engineering of assembler code at
Durham University in the 1990s.
Software Migrations Ltd. was founded in 1988 with funding from
IBM to research and develop this technology.
An early major migration project was with a company that had
previously invested over 40 man-years in a (failed) manual
migration: 750,000 lines of assembler were migrated to efficient,
portable, maintainable C code. This took about six man-years of
work over an elapsed time of 18 months.
Transformation Theory: Overview
A transformation consists of:
1. A name
2. A machine-checkable correctness condition
3. The code for applying the transformation to the current
program at the current position
If the correctness condition is verified for the current position
within the current program, then the transformation can be
applied and the mathematical theory guarantees that the result is
semantically equivalent to the original program.
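The three parts of a transformation can be pictured as a small record. This is a minimal Python sketch, not FermaT's actual representation: the tuple-based syntax tree, the Transformation class, try_apply and negate are hypothetical names, and the example rule is the if-statement branch reversal used as this presentation's example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Node = Tuple[Any, ...]   # toy syntax tree, e.g. ("if", cond, then_part, else_part)

@dataclass(frozen=True)
class Transformation:
    name: str                              # 1. a name
    condition: Callable[[Node], bool]      # 2. machine-checkable applicability test
    apply: Callable[[Node], Node]          # 3. rewrite of the item at the current position

def try_apply(t: Transformation, node: Node) -> Node:
    """Rewrite node only if the transformation's condition holds there."""
    return t.apply(node) if t.condition(node) else node

def negate(cond: Any) -> Any:
    """Negate a condition, cancelling a double negation."""
    if isinstance(cond, tuple) and cond and cond[0] == "not":
        return cond[1]
    return ("not", cond)

reverse_if = Transformation(
    name="Reverse If",
    condition=lambda n: len(n) == 4 and n[0] == "if",
    apply=lambda n: ("if", negate(n[1]), n[3], n[2]),
)

prog = ("if", "B", "S1", "S2")
print(try_apply(reverse_if, prog))   # ('if', ('not', 'B'), 'S2', 'S1')
```

If the condition fails (for example, the current item is not an if statement) the program is returned unchanged, so an inapplicable transformation can never alter the program.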
Transformation Theory: Overview
A state is a collection of variables, each of which has a value.
The special state ⊥ (bottom) represents non-termination.
The semantics of a program is a function which maps each initial
state to the set of possible final states.
A transformation is valid if the “before” and “after” programs
have the same semantic function.
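On a small, finite state space the definition can be checked directly: represent each program by a function from an initial state to the set of possible final states, and compare the functions on every initial state. This Python sketch is a hypothetical toy (two variables x and y with small ranges); it treats ⊥ as just one more possible outcome, which simplifies the theory's handling of non-termination.

```python
from itertools import product
from typing import Callable, FrozenSet, Tuple, Union

BOTTOM = "⊥"                      # stands for non-termination
State = Tuple[int, int]           # toy state: values of the variables (x, y)
Outcome = Union[State, str]       # a final state, or BOTTOM
Semantics = Callable[[State], FrozenSet[Outcome]]

# A small finite state space, so equivalence can be checked exhaustively.
STATES = list(product(range(-2, 3), range(0, 3)))

def assign_y(value: int) -> Semantics:
    """y := value"""
    return lambda s: frozenset({(s[0], value)})

def if_then_else(cond: Callable[[State], bool], s1: Semantics, s2: Semantics) -> Semantics:
    return lambda s: s1(s) if cond(s) else s2(s)

def choice(s1: Semantics, s2: Semantics) -> Semantics:
    """Nondeterministic choice: the final state may come from either branch."""
    return lambda s: s1(s) | s2(s)

def loop_forever(s: State) -> FrozenSet[Outcome]:
    return frozenset({BOTTOM})

def equivalent(a: Semantics, b: Semantics) -> bool:
    """Same semantic function: same set of outcomes from every initial state."""
    return all(a(s) == b(s) for s in STATES)

def positive(s: State) -> bool:
    return s[0] > 0

p = if_then_else(positive, assign_y(1), assign_y(2))
q = if_then_else(lambda s: not positive(s), assign_y(2), assign_y(1))
r = if_then_else(positive, assign_y(1), loop_forever)

print(equivalent(p, q))   # True
print(equivalent(p, r))   # False: r does not terminate when x <= 0
print(sorted(choice(assign_y(1), assign_y(2))((0, 0))))   # [(0, 1), (0, 2)]
```

The result is a set rather than a single state because WSL programs may be nondeterministic, as the choice example shows.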
Transformation Theory: Overview
The semantics of a program can be captured in a single infinitary
logic formula, the weakest precondition.
If the weakest preconditions for the “before” and “after”
programs are logically equivalent, then the transformation is valid.
We can use all the techniques of mathematical logic, developed
over the history of mathematics, to prove the equivalence of these
two formulae, and thereby prove the correctness of a
transformation.
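The weakest-precondition view can also be tried out on a finite state space, where a predicate is simply a set of states. In this minimal Python sketch (wp, all_postconditions and wp_equivalent are hypothetical names, and a single variable x ∈ {0, 1, 2} is the assumed state space), WP(S, R) is the set of initial states from which S is guaranteed to terminate in a state satisfying R, and two programs are compared by checking their weakest preconditions for every possible postcondition. Any initial state that may lead to ⊥ is simply excluded from every weakest precondition. Over an infinite state space this enumeration is impossible, which is why the theory relies on logical proof instead.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet

BOTTOM = "⊥"
STATES = frozenset({0, 1, 2})           # toy state space: a single variable x
Program = Callable[[int], FrozenSet]    # initial state -> set of possible final states

def wp(prog: Program, post: FrozenSet[int]) -> FrozenSet[int]:
    """Initial states from which prog surely terminates in a state in post."""
    return frozenset(s for s in STATES
                     if BOTTOM not in prog(s) and prog(s) <= post)

def all_postconditions():
    """Every subset of the finite state space, i.e. every possible predicate."""
    items = sorted(STATES)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))

def wp_equivalent(p: Program, q: Program) -> bool:
    return all(wp(p, r) == wp(q, r) for r in all_postconditions())

# if x = 0 then x := 1 else x := 2 fi  versus  if x != 0 then x := 2 else x := 1 fi
p = lambda x: frozenset({1}) if x == 0 else frozenset({2})
q = lambda x: frozenset({2}) if x != 0 else frozenset({1})
# a variant that may loop forever when x = 2
loopy = lambda x: frozenset({BOTTOM}) if x == 2 else p(x)

print(wp_equivalent(p, q))             # True
print(wp_equivalent(p, loopy))         # False
print(sorted(wp(p, frozenset({2}))))   # [1, 2]
```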
Transformation Theory: Example
A very simple example transformation is reversing the branches in
an if statement. For any condition B and any statements S1 and
S2, the statement:
if B then S1 else S2 fi
is semantically equivalent to the statement:
if ¬B then S2 else S1 fi
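Using the textbook weakest-precondition rule for a deterministic if statement (taken here as an assumption; WSL defines its own version of this rule), the equivalence reduces to two propositional steps, double negation and commutativity of ∧, for an arbitrary postcondition R:

```latex
\begin{aligned}
\mathrm{WP}(\textbf{if } B \textbf{ then } S_1 \textbf{ else } S_2 \textbf{ fi},\ R)
  &\iff (B \Rightarrow \mathrm{WP}(S_1, R)) \wedge (\neg B \Rightarrow \mathrm{WP}(S_2, R)) \\
\mathrm{WP}(\textbf{if } \neg B \textbf{ then } S_2 \textbf{ else } S_1 \textbf{ fi},\ R)
  &\iff (\neg B \Rightarrow \mathrm{WP}(S_2, R)) \wedge (\neg\neg B \Rightarrow \mathrm{WP}(S_1, R)) \\
  &\iff (B \Rightarrow \mathrm{WP}(S_1, R)) \wedge (\neg B \Rightarrow \mathrm{WP}(S_2, R))
\end{aligned}
```

Since the two weakest preconditions are equivalent for every R, the two statements have the same semantics.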
Assembler Restructuring
Assembler code:
CLC NUMAREA+9(2),=CL2'23' IS HH GREATER THAN 23?
BH CONVTIMZ YES - SKIP TO EXIT
CLC NUMAREA+11(2),=CL2'59' IS MM GREATER THAN 59
BH CONVTIMZ YES - SKIP TO EXIT
CLC NUMAREA+13(2),=CL2'59' IS SS GREATER THAN 59?
BH CONVTIMZ YES - SKIP TO EXIT
CONVTIMY EQU * REFORMAT THE TIME
MVC DDTIMEO(2),NUMAREA+9 MOVE IN THE HH
MVI DDTIMEO+2,C':' MOVE IN A SEPERATOR
MVC DDTIMEO+3(2),NUMAREA+11 MOVE IN THE MM
MVI DDTIMEO+5,C':' MOVE IN A SEPERATOR
MVC DDTIMEO+6(2),NUMAREA+13 MOVE IN THE SS
Raw WSL:
A_0000D8 ≡ C : IS HH GREATER THAN 23?;
if a[db(NUMAREA + 9, r12), 1 + 1] = !XF c_lit(1, 2, "23")
then cc := 0
elsif a[db(NUMAREA + 9, r12), 1 + 1] < !XF c_lit(1, 2, "23")
then cc := 1
else cc := 2 fi;
call A_0000DE end
A_0000DE ≡ C : YES - SKIP TO EXIT;
if cc = 2 then call CONVTIMZ fi;
call A_0000E2 end
A_0000E2 ≡ C : IS MM GREATER THAN 59;
if a[db(NUMAREA + 11, r12), 1 + 1] = !XF c_lit(1, 2, "59")
then cc := 0
elsif a[db(NUMAREA + 11, r12), 1 + 1] < !XF c_lit(1, 2, "59")
then cc := 1
else cc := 2 fi;
call A_0000E8 end
A_0000E8 ≡ C : YES - SKIP TO EXIT;
if cc = 2 then call CONVTIMZ fi;
call A_0000EC end
A_0000EC ≡ C : IS SS GREATER THAN 59?;
if a[db(NUMAREA + 13, r12), 1 + 1] = !XF c_lit(1, 2, "59")
then cc := 0
elsif a[db(NUMAREA + 13, r12), 1 + 1] < !XF c_lit(1, 2, "59")
then cc := 1
else cc := 2 fi;
call A_0000F2 end
A_0000F2 ≡ C : YES - SKIP TO EXIT;
if cc = 2 then call CONVTIMZ fi;
call CONVTIMY end
CONVTIMY ≡ C : REFORMAT THE TIME;
call A_0000F6 end
A_0000F6 ≡ C : MOVE IN THE HH;
CDATED := r2;
!P mvc(a[db(NUMAREA + 9, r12), 1 + 1] var a[db(DDTIMEO, CDATED), 1 + 1]);
call A_0000FC end
Restructured WSL:
C : IS HH GREATER THAN 23?;
C : YES - SKIP TO EXIT;
C : IS MM GREATER THAN 59;
C : YES - SKIP TO EXIT;
C : IS SS GREATER THAN 59?;
C : YES - SKIP TO EXIT;
if NUMAREA[10..11] ≤ "23"
∧ NUMAREA[12..13] ≤ "59"
∧ NUMAREA[14..15] ≤ "59"
∧ r15 = 0
then C : REFORMAT THE TIME;
C : MOVE IN THE HH;
a[CDATED].DDTIME.DDTIMEO[1..2] := NUMAREA[10..11];
C : MOVE IN A SEPERATOR;
a[CDATED].DDTIME.DDTIMEO[3] := !XF mvi(":");
C : MOVE IN THE MM;
a[CDATED].DDTIME.DDTIMEO[4..5] := NUMAREA[12..13];
C : MOVE IN A SEPERATOR;
a[CDATED].DDTIME.DDTIMEO[6] := !XF mvi(":");
C : MOVE IN THE SS;
a[CDATED].DDTIME.DDTIMEO[7..8] := NUMAREA[14..15] fi
COBOL
IF (NUMAREA-X-10-2 <= '23'
AND NUMAREA-X-12-2 <= '59'
AND NUMAREA-X-14-2 <= '59') THEN
* REFORMAT THE TIME
* MOVE IN THE HH
MOVE NUMAREA-X-10-2 TO DDTIMEO-X-1-2
* MOVE IN A SEPERATOR
MOVE ':' TO DDTIMEO-X-3-1
* MOVE IN THE MM
MOVE NUMAREA-X-12-2 TO DDTIMEO-X-4-2
* MOVE IN A SEPERATOR
MOVE ':' TO DDTIMEO-X-6-1
* MOVE IN THE SS
MOVE NUMAREA-X-14-2 TO DDTIMEO-X-7-2
END-IF
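The equivalence of the raw and restructured forms can be spot-checked with a small Python model. This sketch assumes the fields hold two-digit strings (digits sort in the same order in EBCDIC and ASCII, so Python string comparison matches CLC for them); clc, raw_version and restructured_version are hypothetical names, and the r15 = 0 conjunct of the restructured WSL is not modelled. Testing on samples is only evidence, not the proof that the transformation theory provides.

```python
from itertools import product
from typing import Optional

def clc(field: str, literal: str) -> int:
    """Condition code after CLC: 0 equal, 1 first operand low, 2 first operand high.
    Valid here because the operands are digit strings (same order in EBCDIC and ASCII)."""
    if field == literal:
        return 0
    return 1 if field < literal else 2

def raw_version(hh: str, mm: str, ss: str) -> Optional[str]:
    """Follows the raw WSL: set cc, then branch to the exit when cc = 2 (BH)."""
    if clc(hh, "23") == 2:
        return None                      # call CONVTIMZ
    if clc(mm, "59") == 2:
        return None
    if clc(ss, "59") == 2:
        return None
    return f"{hh}:{mm}:{ss}"             # CONVTIMY: reformat the time

def restructured_version(hh: str, mm: str, ss: str) -> Optional[str]:
    """Follows the restructured WSL / COBOL: one combined test (r15 = 0 not modelled)."""
    if hh <= "23" and mm <= "59" and ss <= "59":
        return f"{hh}:{mm}:{ss}"
    return None

two_digits = [f"{n:02d}" for n in range(100)]
mismatches = [
    (h, m, s)
    for h, m, s in product(two_digits, two_digits, ("00", "59", "60", "99"))
    if raw_version(h, m, s) != restructured_version(h, m, s)
]
print(len(mismatches))                         # 0
print(restructured_version("12", "34", "56"))  # 12:34:56
```

The three compare-and-branch pairs collapse into one conjunction because each "BH to exit" is equivalent to requiring the corresponding "not greater than" test to hold before the reformatting code runs.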
Simplify and Optimise
Numeric test
TRT FIELD1,NUMTAB
BZ GOODNUM
. . .
NUMTAB DC 240X'FF'
DC 10X'00'
DC 6X'FF'
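FermaT finds and analyses the translate table: its only zero entries
are the ten at offsets X'F0' to X'F9', the EBCDIC codes for the digits
0 to 9. TRT therefore sets condition code 0 (so BZ branches) exactly
when every byte of FIELD1 is a digit, i.e. the TRT is testing for
numerics. So we can generate this COBOL: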
IF FIELD1 IS NUMERIC THEN
. . .
END-IF
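The table analysis can be reproduced in a few lines of Python. This is a simplified sketch, not FermaT's analysis: trt_condition_code models only the condition code of TRT (not the registers 1 and 2 that TRT also sets), and the cp037 codec is assumed as a representative EBCDIC code page for building test data.

```python
# NUMTAB as the three DC statements describe it:
# 240 X'FF' (for X'00'-X'EF'), 10 X'00' (for X'F0'-X'F9'), 6 X'FF' (for X'FA'-X'FF').
NUMTAB = bytes([0xFF] * 240 + [0x00] * 10 + [0xFF] * 6)

def trt_condition_code(field: bytes, table: bytes) -> int:
    """Simplified TRT: scan left to right and stop at the first byte whose
    table entry is nonzero. CC 0: all entries zero; CC 1: stopped before the
    last byte; CC 2: stopped on the last byte."""
    for i, b in enumerate(field):
        if table[b] != 0:
            return 2 if i == len(field) - 1 else 1
    return 0

def is_numeric(field: bytes) -> bool:
    """What 'BZ GOODNUM' tests: condition code 0 after the TRT."""
    return trt_condition_code(field, NUMTAB) == 0

def ebcdic(text: str) -> bytes:
    return text.encode("cp037")   # cp037: one common EBCDIC code page

print(is_numeric(ebcdic("012345")))   # True
print(is_numeric(ebcdic("12A45")))    # False
print(is_numeric(ebcdic("12 45")))    # False
```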
Loop Optimisations
Original assembler:
LA R0,7 MAXIMUM BYTES TO BLANK
LA R1,DOUBLEWD ADDRESS DISPLAY FIELD
*
BLANKC DS 0H
CLI 0(R1),C'0' LEADING DIGIT A ZERO?
BNE BLANKOUT NO, GET OUT
MVI 0(R1),C' ' CHANGE TO A BLANK
LA R1,1(R1) POINT TO NEXT BYTE
BCT R0,BLANKC UNTIL ALL DONE
*
BLANKOUT DS 0H
Loop Optimisations
Optimised COBOL:
PERFORM VARYING DOUBLEWD-INDEX FROM 1 BY 1
UNTIL DOUBLEWD-INDEX > 7
OR DOUBLEWD(DOUBLEWD-INDEX:1) NOT EQUAL TO '0'
*
* LEADING DIGIT A ZERO?
* NO, GET OUT
* CHANGE TO A BLANK
MOVE ' ' TO DOUBLEWD(DOUBLEWD-INDEX:1)
* POINT TO NEXT BYTE
* UNTIL ALL DONE
END-PERFORM
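A Python sketch comparing the two loops byte for byte. It assumes DOUBLEWD is an 8-byte field and uses ASCII bytes rather than EBCDIC for readability; the function names are hypothetical. The COBOL version folds the CLI/BNE exit test and the BCT count into a single UNTIL condition, evaluated before each pass.

```python
def blank_leading_zeros_asm(field: bytearray, max_bytes: int = 7) -> None:
    """Step-by-step mirror of the assembler loop."""
    r0, r1 = max_bytes, 0                 # LA R0,7 / LA R1,DOUBLEWD (offset 0)
    while True:
        if field[r1] != ord("0"):         # CLI 0(R1),C'0' ; BNE BLANKOUT
            return
        field[r1] = ord(" ")              # MVI 0(R1),C' '
        r1 += 1                           # LA R1,1(R1)
        r0 -= 1                           # BCT R0,BLANKC decrements R0 ...
        if r0 == 0:                       # ... and falls through at zero
            return

def blank_leading_zeros_cobol(field: bytearray, max_bytes: int = 7) -> None:
    """Mirror of PERFORM VARYING ... UNTIL, which tests before each iteration."""
    index = 1                             # COBOL reference modification is 1-based
    while not (index > max_bytes or field[index - 1] != ord("0")):
        field[index - 1] = ord(" ")
        index += 1

for sample in (b"00001234", b"00000000", b"12345678", b"00000001"):
    via_asm, via_cobol = bytearray(sample), bytearray(sample)
    blank_leading_zeros_asm(via_asm)
    blank_leading_zeros_cobol(via_cobol)
    print(sample.decode(), "->", repr(via_cobol.decode()), via_asm == via_cobol)
```

Because at most seven of the eight bytes are blanked, an all-zero field keeps its final "0", in both versions.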
Restructuring WSL from Assembler
One of the most difficult tasks in assembler restructuring is
converting assembler subroutines into structured procedures.
A subroutine call is implemented as a BAL (Branch And Link)
or BAS (Branch And Save) instruction. This stores the return
address in the indicated register and then branches to the
indicated label. The subroutine body may save the return address
elsewhere if the register is needed for some other purpose.
A subroutine return is implemented by reloading a register
with the return address (if necessary) and then executing a
BR (Branch to Register) instruction which branches to the
address in the register.
Restructuring WSL from Assembler
Restructuring involves converting assembler style subroutines into
structured procedures.
Subroutine call:
A_0001 ≡ r14 := 1234; call FOO end
A_0002 ≡ . . .
Subroutine return:
FOORET ≡ destination := r14; call dispatch end
. . .
dispatch ≡ if destination = 0 then call Z
elsif . . .
elsif destination = 1234 then call A_0002
elsif . . . fi end
Restructuring WSL from Assembler
The analyser has to do the following:
Determine which actions belong in the body of FOO
Prove (via dataflow analysis) that the value assigned to r14
always ends up in destination before the call to dispatch
Restructured code:
A_0001 ≡ r14 := NOTUSED_1234; FOO(); call A_0002 end
A_0002 ≡ . . .
where
proc FOO() ≡ . . . end
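A minimal Python sketch of both forms, assuming a toy action system in which each action returns the name of the next action; the functions, the trace strings and the code 1234 are illustrative only. It shows that the dispatch-based return and the restructured procedure call produce the same trace. The restructured WSL also keeps the now-unused assignment to r14, which the sketch omits.

```python
def run_raw(trace: list) -> list:
    """Action-system form: BAL loads r14 with a code, BR R14 goes via dispatch."""
    state = {}

    def a_0001():
        state["r14"] = 1234                  # r14 := 1234; call FOO
        return "FOO"

    def foo():
        trace.append("body of FOO")
        return "FOORET"

    def fooret():
        state["destination"] = state["r14"]  # destination := r14; call dispatch
        return "dispatch"

    def a_0002():
        trace.append("after call")
        return "Z"

    def dispatch():
        return {0: "Z", 1234: "A_0002"}[state["destination"]]

    actions = {"A_0001": a_0001, "FOO": foo, "FOORET": fooret,
               "A_0002": a_0002, "dispatch": dispatch}
    current = "A_0001"
    while current != "Z":
        current = actions[current]()
    return trace

def run_structured(trace: list) -> list:
    """Restructured form: FOO is a procedure and the return is implicit."""
    def foo():
        trace.append("body of FOO")

    foo()                                    # FOO()
    trace.append("after call")               # A_0002
    return trace

print(run_raw([]) == run_structured([]))     # True
```

The restructuring is only valid because the value loaded into r14 provably reaches destination unchanged; if FOO could overwrite r14, dispatch might select a different action and the two forms would diverge.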
Subroutine Call
CONVNX3 DS 0H
* LINK TO CONVERSION RTN
BAL R14,MM2MTH
* MOVE MMM TO OUTPUT
MVC DDO2M,WRKMTH
WSL translation:
CONVNX3 ≡ call A_0002F6 end
A_0002F6 ≡ C : LINK TO CONVERSION RTN;
r14 := 762;
call MM2MTH end
A_0002FA ≡ C : MOVE MMM TO OUTPUT;
DDO2M := WRKMTH;
call A_000300 end
. . .
dispatch ≡ if destination = 0 then call Z
. . .
elsif destination = 762 then call A_0002FA
. . .
fi end
Subroutine Body
MM2MTH EQU *
* CONVERT MM TO MMM (ALPHA)
...
MM2MTHX EQU *
* RETURN FROM SUB ROUTINE
BR R14
WSL translation:
MM2MTH ≡ C : CONVERT MM TO MMM (ALPHA);
call A_000374 end
. . .
MM2MTHX ≡ call A_0003D0 end
A_0003D0 ≡ C : RETURN FROM SUB ROUTINE;
destination := r14;
call dispatch end
Migration Performance
Record number of transformations applied to one module: 579,150
(a 6,000-line module with 11,000 lines of listing and 19 CSECTs;
the largest CSECT had 8 concurrently active base registers),
migrated in 1 hour 47 minutes
Capable of migrating 500,000 lines of assembler code per hour
An iterative approach to migration discovers and caters for
any new assembler tricks
The migration technology is extremely flexible, allowing us to tailor
each stage of the migration process
Migration Case Study (extract)
LAAA B LAB
BAL R10,ENDGROUP
LAB MVI LAAA+1,0
MVC WLAST,WRITEM
ZAP WNET,=P'0'
BAL R10,PROCGRP
MVI XSW1,X'FF'
B LAA
LAC BAL R10,PROCGRP
MVI XSW1,X'FF'
B LAA
LAD CLI XSW1,X'FF'
BNE LADA
BAL R10,ENDGROUP
Migration Case Study (extract)
LADA EQU *
MVC WPRT(17),=CL17'NUMBER CHANGED = '
ED WORKB,WCHANGE
LA R4,WORKB
LA R1,9
LADB CLI 0(R4),C' '
BNE LADC
LA R4,1(R4)
BCT R1,LADB
LADC EX R1,WMVC1
*WMVC1 MVC WPRT+17(1),0(R4)
BAL R10,WRITE1
Migrated COBOL Code
MOVE LOW-VALUES TO XSW1
PERFORM S0040-READ-DDIN-P
PERFORM UNTIL END-OF-FILE
IF WLAST NOT = WRITEM THEN
IF F-LAAA NOT = 1 THEN
PERFORM S0050-ENDGROUP-P
END-IF
MOVE 0 TO F-LAAA
MOVE WRITEM TO WLAST
MOVE 0 TO WNET
END-IF
PERFORM S0080-PROCGRP-P
MOVE HIGH-VALUES TO XSW1
PERFORM S0040-READ-DDIN-P
END-PERFORM
IF XSW1 = HIGH-VALUES THEN
PERFORM S0050-ENDGROUP-P
END-IF
MOVE 'NUMBER CHANGED = ' TO WPRT-X-1-17
CALL 'SMLED' USING WORKB BY VALUE 1
BY REFERENCE CC1 WEDIT-ADDR WCHANGE BY VALUE 4
Metrics
Metric               Raw WSL   Structured WSL
Statements               561              106
Expressions            1,589              210
McCabe                   184               17
Control/Data Flow        520              156
Branch–Loop              145               17
Structural             6,685              751
Restructuring Problems
Problems:
Multiple entry points to subroutine
Multiple exit points from subroutine
Branch from the middle of one subroutine into another
Two subroutines branch to common code
Returning directly to the caller’s caller
Increment the return address (to skip over a branch)
Overwrite the return address with a different one
A subroutine “return” which is actually a “call”
. . . and many more . . .
Inline Parameters to Subroutine
BAL R15,CVTBIN
DC A(ERRTNCD)
DC A(MSG023+52)
DC F'4'
. . . code . . .
Register R15 is used both as the return address and as a pointer to
the parameters.
The return point is R15 + 12, just past the three 4-byte inline parameters.
In one organisation, 43% of all modules had subroutines with
inline parameter data!
Inline Parameters to Subroutine
Solution:
1. Ensure that the first inline parameter has a label;
2. Generate the following code for the call:
rn := !XF inline_par(code, ADDRESS_OF(par));
call SUBR
3. Change any branch to “return address plus length of inline
data” into a direct return
4. Dispatch processing looks for subroutine calls of the form:
rn := !XF inline_par(code, ADDRESS_OF(par));
call SUBR
5. When the subroutine is converted to a procedure, generate:
rn := ADDRESS_OF(par); SUBR()
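A Python sketch of why the rewrite is safe, under hypothetical addresses: the 4-byte BAL is assumed to start at X'100' on a fullword boundary, so the three inline DCs sit at X'104', X'108' and X'10C' and memory is modelled as a dictionary from address to value. The raw form uses R15 as both parameter pointer and return address; the restructured form passes only the parameter address and returns normally.

```python
# Hypothetical layout: the 4-byte BAL starts at 0x100 on a fullword boundary,
# so the three inline DCs sit at 0x104, 0x108 and 0x10C.
BAL_ADDR = 0x100
INLINE_BYTES = 12                      # three 4-byte inline parameters
memory = {0x104: "A(ERRTNCD)", 0x108: "A(MSG023+52)", 0x10C: 4}

def cvtbin_raw(r15: int):
    """Assembler style: r15 is both the parameter pointer and the return address."""
    params = [memory[r15 + 4 * i] for i in range(3)]
    return params, r15 + INLINE_BYTES  # return to R15 + 12, skipping the data

def call_raw():
    r15 = BAL_ADDR + 4                 # BAL R15,CVTBIN: address after the BAL
    return cvtbin_raw(r15)

def cvtbin_structured(par: int) -> list:
    """Restructured style: the caller passes the address of the first
    inline parameter and the procedure simply returns."""
    return [memory[par + 4 * i] for i in range(3)]

params, resume_at = call_raw()
print(hex(resume_at))                             # 0x110
print(params == cvtbin_structured(BAL_ADDR + 4))  # True
```

Once the "return address plus 12" branch becomes an ordinary return, the return address no longer needs to be computed at all, which is what lets the call become a plain procedure call.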
Inline Code
A block of inline code has been turned into a subroutine:
LA R14,RETLAB
SUBR . . .
block of code is here
. . .
BR R14
RETLAB . . .
Elsewhere, the block can be called via:
BAL R14,SUBR
This causes no difficulty.
Assembler Restructuring
Currently, over 99% of all hand-written assembler modules can be
fully restructured automatically.
Bugs Discovered
An example of a bug uncovered by the failure to restructure:
BAS R04,S00100
...
S00100 do some processing
...
L R04,S00R04
BR R04
Bugs Discovered
Another example:
LA R15,4 IT CAME FROM VIM
BAL R10,SUBR020 GO DECIDE WHICH ONE
LTR R5,R5 DID WE FIND ANYTHING
BZ ERROR040 NO, SO ERROR CONDITION
...
ERROR040 EQU * ERROR CONDITION IF WE GET HERE
ST R15,ERRPECD2 SAVE ERROR CODE
LA R15,252 MAJOR ERROR CODE
ICM R15,12,ERRMODID INDICATE THE MODULE
BR R10 AND RETURN
Case Studies
US State Government Department:
870,000 LOC
Complexity Improvement: 56%
Pointer Reduction: 69%
Bugs Detected: 550 per MLOC
EX instructions: 24 per MLOC
Self-Modifying Code: 2,347 per MLOC
Complex Subroutine Linkage: 74% of modules
Non-Standard Module Linkage: 60% of modules
Case Studies
Large Insurance Company:
1.8 million lines of code.
Complexity Improvement: 52%
Pointer Reduction: 53%
Bugs Detected: 274 per MLOC
EX instructions: 745 per MLOC
Self-Modifying Code: 590 per MLOC
Complex Subroutine Linkage: 37% of modules
Non-Standard Module Linkage: 14% of modules
Case Studies
Human Resource Company (Payroll Systems):
350,000 lines of code.
Complexity Improvement: 71%
Pointer Reduction: 47%
Bugs Detected: 302 per MLOC
EX instructions: 1,169 per MLOC
Self-Modifying Code: 127 per MLOC
Complex Subroutine Linkage: 53% of modules
Non-Standard Module Linkage: 78% of modules
Case Studies
Organisation     Programs   Modules   Failed   Success rate
Insurance           3,000     8,991       84         99.07%
H.R. & Payroll        360     1,953        7         99.62%
Payroll               256     1,075        3         99.72%
Conclusion
Totally automated migration of assembler to a high-level language
such as C or COBOL is feasible with complete restructuring
achieved for over 99% of assembler modules.