Prevalence of Mainframe Systems

Over 70% of all business-critical software runs on mainframes

Most common language used is COBOL

Over 100 billion lines of assembler code

Over 350 billion lines of COBOL being maintained

Nine million COBOL developers

(Source: IBM survey)

Assembler Support Costs

Annual cost of maintaining one function point:

Assembler $48.00

PL/1 $39.00

C $21.00

COBOL $17.00

(Source: Capers Jones)

Reasons for Migrating from Assembler

Time to market for new or changed products, e.g. financial products

Easier maintenance of newer languages

Access to greater resource base

Reduce exposure & risk

Future proofing product — sustainability

Improve agility, development and time to market

Reduced cost of ownership

Failed Manual Migration Projects

One company invested over 40 man-years in a migration

project: eventually abandoning the project with no result

1987: California DMV (Department of Motor Vehicles)

launched a 5 year $28 million modernisation project. Declared

a “hopeless failure” after seven years and $44 million spent

2006: DMV tried again with a six year $208 million project.

This was cancelled in 2013: the only result was a new system

for issuing drivers’ licenses.

How does FermaT change the picture?

FermaT is an automated migration system based on program

transformation theory.

The theory uses infinitary logic and set theory to prove that two

programs are semantically equivalent.

The theory is based on WSL: a Wide Spectrum Language which

includes specifications and low-level programming constructs.

Software Migration using FermaT

1. Translate from the source language (e.g. assembler) into WSL

2. Apply WSL to WSL semantic-preserving transformations to

restructure and simplify the WSL. This includes analysis of

subroutines into procedures

3. Apply further transformations to bring the WSL into a close

correspondence with the target language

4. Translate the restructured WSL to the target language (e.g.

COBOL), customised according to the required coding

standards

This whole process is automated. Each stage can be customised

and “fine tuned” as needed for a particular project.

Modelling Assembler in WSL

Our approach involves three types of modelling:

1. Complete model: Each assembler instruction is translated

into WSL statements which capture all the effects of the

instruction, including condition codes and registers;

2. Partial model: Branches to register are modelled by

attempting to determine all possible targets of such a branch,

associating a value with each target, and calling a “dispatch”

routine which finds the target for the given value;

3. Self-modifying code: Most cases are detected and handled

(overwriting a NOP/branch, modifying a length field etc.)

but some very rare cases of general self-modifying code may

require human intervention: usually to renovate the assembler

using more standard programming practices!
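The "complete model" idea can be illustrated with a minimal sketch of how a compare instruction and a conditional branch might be modelled through an explicit condition code. The function names `clc` and `bh_taken` are hypothetical, and the use of the `cp037` EBCDIC code page is an assumption made only so the example can run:

```python
def clc(first: bytes, second: bytes) -> int:
    """Model of CLC: return the condition code for an unsigned byte-wise compare."""
    if first == second:
        return 0          # operands equal
    if first < second:
        return 1          # first operand low
    return 2              # first operand high


def bh_taken(cc: int) -> bool:
    """BH (Branch on High) is taken when the condition code is 2."""
    return cc == 2


# EBCDIC digits are X'F0'..X'F9', so byte order matches numeric order.
hours = "24".encode("cp037")   # cp037 is an assumed code page
limit = "23".encode("cp037")
cc = clc(hours, limit)
```

Because every effect of the instruction is recorded in the model (here, the value of `cc`), later transformations can simplify the compare-then-branch pair into a single structured test.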

Modelling Assembler in WSL

Standard opcodes

Standard system macros for file handling etc.

User macros

Structured macros

Condition Code

BAL/BAS (Branch and Save)

Branch to Register

External Subroutine Calls

Modelling Assembler in WSL

Detected Jump Tables

EXecute Statements

Data Declarations

DSECT Base Register Modification

EQUates: constants and aliases

Self-Modifying Code

Structured and Unstructured CICS calls

SQL, etc.

Migration Technology

Recent improvements to the migration technology:

Improved detection and translation of self-modifying code;

Extensive jump table detection;

Improved dataflow analysis;

Array detection and analysis (including detection of arrays of

structures);

Implementation of program slicing for WSL (our internal

Wide Spectrum Language) and assembler;

Static Single Assignment computation;

Improvements in subroutine restructuring.

Transformation Theory

The mathematical basis of the transformation theory is essential

for an automated transformation system.

If each transformation is “only” 99.9% correct, then after

applying a sequence of 1,000 transformations, the probability that

the result is still correct is less than 37%

After 10,000 transformations the probability of the program still

working drops below 0.005%!

The only way to guarantee 100% correctness is to prove that the

transformation is valid.
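The figures above follow from multiplying the per-step probabilities. A minimal check, assuming that each transformation fails independently of the others (the function name is hypothetical):

```python
def survival_probability(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent transformations is correct."""
    return per_step ** steps


p_1000 = survival_probability(0.999, 1_000)
p_10000 = survival_probability(0.999, 10_000)

print(f"after  1,000 transformations: {p_1000:.2%}")
print(f"after 10,000 transformations: {p_10000:.4%}")
```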

Transformation Theory

Initially developed at Oxford University in the late 1980s and

extended and applied to reverse engineering of assembler code at

Durham University in the 1990s.

Software Migrations Ltd. was founded in 1988 with funding from

IBM to research and develop this technology.

An early major migration project was with a company that had

previously invested over 40 man-years in a (failed) manual

migration: 750,000 lines of assembler migrated to efficient,

portable, maintainable C code. This took about six man-years of

work, elapsed time 18 months.

Transformation Theory: Overview

A transformation consists of:

1. A name

2. A machine-checkable correctness condition

3. The code for applying the transformation to the current

program at the current position

If the correctness condition is verified for the current position

within the current program, then the transformation can be

applied and the mathematical theory guarantees that the result is

semantically equivalent to the original program.
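The three parts of a transformation can be sketched as a small record. This is only an illustration: the tuple encoding of programs, the names `Transformation`, `try_apply` and `reverse_if`, and the example condition are all assumptions, not FermaT's actual representation.

```python
from dataclasses import dataclass
from typing import Callable

# A program is represented here as a nested tuple; this encoding is an
# assumption made for the sketch, not FermaT's internal representation.
Program = tuple


@dataclass(frozen=True)
class Transformation:
    name: str
    is_applicable: Callable[[Program], bool]   # machine-checkable condition
    apply: Callable[[Program], Program]        # rewrite at the current position


def try_apply(t: Transformation, program: Program) -> Program:
    """Apply t only when its correctness condition holds; else leave unchanged."""
    return t.apply(program) if t.is_applicable(program) else program


# Hypothetical example: reverse the branches of ("if", cond, then, else).
reverse_if = Transformation(
    name="Reverse_If",
    is_applicable=lambda p: len(p) == 4 and p[0] == "if",
    apply=lambda p: ("if", ("not", p[1]), p[3], p[2]),
)

before = ("if", "B", "S1", "S2")
after = try_apply(reverse_if, before)
```

Keeping the condition separate from the rewriting code means the condition can be tested first, so a transformation is never applied where its proof of equivalence does not hold.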

Transformation Theory: Overview

A state is a collection of variables, each of which has a value.

The special state ⊥ (bottom) represents non-termination.

The semantics of a program is a function which maps each initial

state to the set of possible final states.

A transformation is valid if the “before” and “after” programs

have the same semantic function.

Transformation Theory: Overview

The semantics of a program can be captured in a single infinitary

logic formula, the weakest precondition.

If the weakest preconditions for the “before” and “after”

programs are logically equivalent, then the transformation is valid.

We can use all the techniques of mathematical logic, developed

over the history of mathematics, to prove the equivalence of these

two formulae, and thereby prove the correctness of a

transformation.

Transformation Theory: Example

A very simple example transformation is reversing the branches in

an if statement. For any condition B and any statements S1 and

S2, the statement:

if B then S1 else S2 fi

is semantically equivalent to the statement:

if ¬B then S2 else S1 fi
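As a sketch of how such a proof goes, the standard weakest precondition rule for a conditional can be applied to both statements. Here R stands for an arbitrary postcondition (a symbol introduced for this illustration) and WP is the weakest precondition:

```latex
\begin{align*}
WP(\textbf{if } B \textbf{ then } S_1 \textbf{ else } S_2 \textbf{ fi},\, R)
  &= (B \Rightarrow WP(S_1, R)) \land (\lnot B \Rightarrow WP(S_2, R)) \\
WP(\textbf{if } \lnot B \textbf{ then } S_2 \textbf{ else } S_1 \textbf{ fi},\, R)
  &= (\lnot B \Rightarrow WP(S_2, R)) \land (\lnot\lnot B \Rightarrow WP(S_1, R)) \\
  &= (B \Rightarrow WP(S_1, R)) \land (\lnot B \Rightarrow WP(S_2, R))
\end{align*}
```

The last step uses only the double negation law and the commutativity of conjunction, so the two formulae are logically equivalent for every R.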

Assembler Restructuring

Assembler code:

CLC NUMAREA+9(2),=CL2’23’ IS HH GREATER THAN 23?

BH CONVTIMZ YES - SKIP TO EXIT

CLC NUMAREA+11(2),=CL2’59’ IS MM GREATER THAN 59

BH CONVTIMZ YES - SKIP TO EXIT

CLC NUMAREA+13(2),=CL2’59’ IS SS GREATER THAN 59?

BH CONVTIMZ YES - SKIP TO EXIT

CONVTIMY EQU * REFORMAT THE TIME

MVC DDTIMEO(2),NUMAREA+9 MOVE IN THE HH

MVI DDTIMEO+2,C’:’ MOVE IN A SEPERATOR

MVC DDTIMEO+3(2),NUMAREA+11 MOVE IN THE MM

MVI DDTIMEO+5,C’:’ MOVE IN A SEPERATOR

MVC DDTIMEO+6(2),NUMAREA+13 MOVE IN THE SS

Raw WSL

A_0000D8 ≡ C : IS HH GREATER THAN 23?;
  if a[db(NUMAREA + 9, r12), 1 + 1] = !XF c_lit(1, 2, “23”)
    then cc := 0
  elsif a[db(NUMAREA + 9, r12), 1 + 1] < !XF c_lit(1, 2, “23”)
    then cc := 1
    else cc := 2 fi;
  call A_0000DE end

A_0000DE ≡ C : YES - SKIP TO EXIT;
  if cc = 2 then call CONVTIMZ fi;
  call A_0000E2 end

A_0000E2 ≡ C : IS MM GREATER THAN 59;
  if a[db(NUMAREA + 11, r12), 1 + 1] = !XF c_lit(1, 2, “59”)
    then cc := 0
  elsif a[db(NUMAREA + 11, r12), 1 + 1] < !XF c_lit(1, 2, “59”)
    then cc := 1
    else cc := 2 fi;
  call A_0000E8 end

A_0000E8 ≡ C : YES - SKIP TO EXIT;
  if cc = 2 then call CONVTIMZ fi;
  call A_0000EC end

A_0000EC ≡ C : IS SS GREATER THAN 59?;
  if a[db(NUMAREA + 13, r12), 1 + 1] = !XF c_lit(1, 2, “59”)
    then cc := 0
  elsif a[db(NUMAREA + 13, r12), 1 + 1] < !XF c_lit(1, 2, “59”)
    then cc := 1
    else cc := 2 fi;
  call A_0000F2 end

A_0000F2 ≡ C : YES - SKIP TO EXIT;
  if cc = 2 then call CONVTIMZ fi;
  call CONVTIMY end

CONVTIMY ≡ C : REFORMAT THE TIME;
  call A_0000F6 end

A_0000F6 ≡ C : MOVE IN THE HH;
  CDATED := r2;
  !P mvc(a[db(NUMAREA + 9, r12), 1 + 1] var a[db(DDTIMEO, CDATED), 1 + 1]);
  call A_0000FC end

Restructured WSL

C : IS HH GREATER THAN 23?;

C : YES - SKIP TO EXIT;

C : IS MM GREATER THAN 59;

C : YES - SKIP TO EXIT;

C : IS SS GREATER THAN 59?;

C : YES - SKIP TO EXIT;

if NUMAREA[10..11] ≤ “23”

∧ NUMAREA[12..13] ≤ “59”

∧ NUMAREA[14..15] ≤ “59”

∧ r15 = 0

then C : REFORMAT THE TIME;

C : MOVE IN THE HH;

a[CDATED].DDTIME.DDTIMEO[1..2] := NUMAREA[10..11];

C : MOVE IN A SEPERATOR;

a[CDATED].DDTIME.DDTIMEO[3] :=!XF mvi(“:”);

C : MOVE IN THE MM;

a[CDATED].DDTIME.DDTIMEO[4..5] := NUMAREA[12..13];

C : MOVE IN A SEPERATOR;

a[CDATED].DDTIME.DDTIMEO[6] :=!XF mvi(“:”);

C : MOVE IN THE SS;

a[CDATED].DDTIME.DDTIMEO[7..8] := NUMAREA[14..15] fi

COBOL

IF (NUMAREA-X-10-2 <= ’23’

AND NUMAREA-X-12-2 <= ’59’

AND NUMAREA-X-14-2 <= ’59’) THEN

* REFORMAT THE TIME

* MOVE IN THE HH

MOVE NUMAREA-X-10-2 TO DDTIMEO-X-1-2

* MOVE IN A SEPERATOR

MOVE ’:’ TO DDTIMEO-X-3-1

* MOVE IN THE MM

MOVE NUMAREA-X-12-2 TO DDTIMEO-X-4-2

* MOVE IN A SEPERATOR

MOVE ’:’ TO DDTIMEO-X-6-1

* MOVE IN THE SS

MOVE NUMAREA-X-14-2 TO DDTIMEO-X-7-2

END-IF

Simplify and Optimise

Numeric test

TRT FIELD1,NUMTAB

BZ GOODNUM

. . .

NUMTAB DC 240X’FF’

DC 10X’00’

DC 6X’FF’

FermaT finds and analyses the translation table and determines

that the TRT is testing for numerics. So we can generate this

COBOL:

IF FIELD1 IS NUMERIC THEN

. . .

END-IF
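The table analysis can be sketched as follows: rebuild the 256-byte table from its declaration and list the byte values whose entries are zero. Decoding with the `cp037` EBCDIC code page is an assumption (the slide does not name a code page), and `is_numeric` is a hypothetical helper:

```python
# Rebuild NUMTAB as declared: 240 bytes X'FF', 10 bytes X'00', 6 bytes X'FF'.
numtab = bytes([0xFF] * 240 + [0x00] * 10 + [0xFF] * 6)

# TRT stops at the first argument byte whose table entry is non-zero, so the
# bytes that "pass" are exactly those with a zero entry.
accepted = bytes(i for i, entry in enumerate(numtab) if entry == 0)

# cp037 is one common EBCDIC code page; using it here is an assumption.
accepted_chars = accepted.decode("cp037")


def is_numeric(field: bytes) -> bool:
    """True when TRT would set condition code 0 (BZ taken) for this field."""
    return all(numtab[b] == 0 for b in field)
```

The zero entries sit at X'F0' to X'F9', which are the EBCDIC digits, so the TRT/BZ pair is a test that every byte of the field is a digit.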

Loop Optimisations

Original assembler:

LA R0,7 MAXIMUM BYTES TO BLANK

LA R1,DOUBLEWD ADDRESS DISPLAY FIELD

*

BLANKC DS 0H

CLI 0(R1),C’0’ LEADING DIGIT A ZERO?

BNE BLANKOUT NO, GET OUT

MVI 0(R1),C’ ’ CHANGE TO A BLANK

LA R1,1(R1) POINT TO NEXT BYTE

BCT R0,BLANKC UNTIL ALL DONE

*

BLANKOUT DS 0H

Loop Optimisations

Optimised COBOL:

PERFORM VARYING DOUBLEWD-INDEX FROM 1 BY 1

UNTIL DOUBLEWD-INDEX > 7

OR DOUBLEWD(DOUBLEWD-INDEX:1) NOT EQUAL TO ’0’

*

* LEADING DIGIT A ZERO?

* NO, GET OUT

* CHANGE TO A BLANK

MOVE ’ ’ TO DOUBLEWD(DOUBLEWD-INDEX:1)

* POINT TO NEXT BYTE

* UNTIL ALL DONE

END-PERFORM
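The behaviour that both the assembler loop and the COBOL loop implement can be stated as a short reference function. This is a sketch for comparison only; the function name is hypothetical, and the extra length check is a guard added here that the original code does not need:

```python
def blank_leading_zeros(field: str, max_bytes: int = 7) -> str:
    """Replace leading '0' characters with blanks, examining at most max_bytes."""
    chars = list(field)
    index = 0
    # Stop at the first non-zero character or after max_bytes positions.
    while index < max_bytes and index < len(chars) and chars[index] == "0":
        chars[index] = " "
        index += 1
    return "".join(chars)


result = blank_leading_zeros("00001234")
```

With an eight-byte field and a limit of seven, the last byte is never blanked, so a field of all zeros still displays a single "0".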

Restructuring WSL from Assembler

One of the most difficult tasks in assembler restructuring is

converting assembler subroutines into structured procedures.

A subroutine call is implemented as a BAL (Branch And Link)

or BAS (Branch And Save) instruction. This stores the return

address in the indicated register and then branches to the

indicated label. The subroutine body may store the return address

elsewhere, if the register is needed for some other purpose.

A subroutine return is implemented by reloading a register

with the return address (if necessary) and then executing a

BR (Branch to Register) instruction which branches to the

address in the register.

Restructuring WSL from Assembler

Restructuring involves converting assembler style subroutines into

structured procedures.

Subroutine call:

A_0001 ≡ r14 := 1234; call FOO end

A_0002 ≡ . . .

Subroutine return:

FOORET ≡ destination := r14; call dispatch end

. . .

dispatch ≡ if destination = 0 then call Z

elsif . . .

elsif destination = 1234 then call A_0002

elsif . . . fi end

Restructuring WSL from Assembler

The analyser has to do the following:

Determine which actions belong in the body of FOO

Prove (via dataflow analysis) that the value assigned to r14

always ends up in destination before the call to dispatch

Restructured code:

A_0001 ≡ r14 := NOTUSED_1234; FOO(); call A_0002 end

A_0002 ≡ . . .

where

proc FOO() ≡ . . . end
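The rewriting step can be sketched with a toy model in which each call site records the value placed in r14 and the subroutine it calls, and the dispatch action is a lookup table. The dict encoding and the function names are assumptions made for this illustration. The sketch takes for granted the hard part, namely the dataflow proof that r14 reaches destination unchanged:

```python
# Hypothetical raw model: call site -> (value stored in r14, subroutine called).
# The names mirror the example above; the dict encoding is an assumption.
calls = {"A_0001": (1234, "FOO")}
dispatch = {0: "Z", 1234: "A_0002"}


def return_points(calls, dispatch):
    """For each call site, find the action that dispatch resumes at on return."""
    return {site: dispatch[r14] for site, (r14, _target) in calls.items()}


def restructure(calls, dispatch):
    """Rewrite each call site as 'procedure call; continue at the return point'."""
    points = return_points(calls, dispatch)
    return {
        site: f"{target}(); call {points[site]}"
        for site, (_r14, target) in calls.items()
    }


restructured = restructure(calls, dispatch)
```

Once every return through dispatch has been resolved this way, the value assigned to r14 is no longer used for control flow, which is why the restructured code can mark it as unused.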

Subroutine Call

CONVNX3 DS 0H

* LINK TO CONVERSION RTN

BAL R14,MM2MTH

* MOVE MMM TO OUTPUT

MVC DDO2M,WRKMTH

WSL translation:

CONVNX3 ≡ call A_0002F6 end

A_0002F6 ≡ C : LINK TO CONVERSION RTN;

r14 := 762;

call MM2MTH end

A_0002FA ≡ C : MOVE MMM TO OUTPUT;

DDO2M := WRKMTH;

call A_000300 end

. . .

dispatch ≡ if destination = 0 then call Z

. . .

elsif destination = 762 then call A_0002FA

. . .

fi end

Subroutine Body

MM2MTH EQU *

* CONVERT MM TO MMM (ALPHA)

...

MM2MTHX EQU *

* RETURN FROM SUB ROUTINE

BR R14

WSL translation:

MM2MTH ≡ C : CONVERT MM TO MMM (ALPHA);

call A_000374 end

. . .

MM2MTHX ≡ call A_0003D0 end

A_0003D0 ≡ C : RETURN FROM SUB ROUTINE;

destination := r14;

call dispatch end

Migration Performance

Record for the largest number of transformations: 579,150 (6,000-line module, 11,000 lines of listing, 19 CSECTs; the largest CSECT had 8 concurrently active base registers), migrated in 1 hour 47 minutes

Capable of migrating 500,000 lines of assembler code per hour

An iterative approach to migration to discover and cater for

any new assembler tricks

Migration technology is extremely flexible allowing us to tailor

each stage of the process of migration

Migration Case Study (extract)

LAAA B LAB

BAL R10,ENDGROUP

LAB MVI LAAA+1,0

MVC WLAST,WRITEM

ZAP WNET,=P’0’

BAL R10,PROCGRP

MVI XSW1,X’FF’

B LAA

LAC BAL R10,PROCGRP

MVI XSW1,X’FF’

B LAA

LAD CLI XSW1,X’FF’

BNE LADA

BAL R10,ENDGROUP

Migration Case Study (extract)

LADA EQU *

MVC WPRT(17),=CL17’NUMBER CHANGED = ’

ED WORKB,WCHANGE

LA R4,WORKB

LA R1,9

LADB CLI 0(R4),C’ ’

BNE LADC

LA R4,1(R4)

BCT R1,LADB

LADC EX R1,WMVC1

*WMVC1 MVC WPRT+17(1),0(R4)

BAL R10,WRITE1

Migrated COBOL Code

MOVE LOW-VALUES TO XSW1

PERFORM S0040-READ-DDIN-P

PERFORM UNTIL END-OF-FILE

IF WLAST NOT = WRITEM THEN

IF F-LAAA NOT = 1 THEN

PERFORM S0050-ENDGROUP-P

END-IF

MOVE 0 TO F-LAAA

MOVE WRITEM TO WLAST

MOVE 0 TO WNET

END-IF

PERFORM S0080-PROCGRP-P

MOVE HIGH-VALUES TO XSW1

PERFORM S0040-READ-DDIN-P

END-PERFORM

IF XSW1 = HIGH-VALUES THEN

PERFORM S0050-ENDGROUP-P

END-IF

MOVE ’NUMBER CHANGED = ’ TO WPRT-X-1-17

CALL ’SMLED’ USING WORKB BY VALUE 1

BY REFERENCE CC1 WEDIT-ADDR WCHANGE BY VALUE 4

Metrics

Metric              Raw WSL   Structured WSL
Statements              561              106
Expressions           1,589              210
McCabe                  184               17
Control/Data Flow       520              156
Branch–Loop             145               17
Structural            6,685              751

Restructuring Problems

Problems:

Multiple entry points to subroutine

Multiple exit points from subroutine

Branch from the middle of one subroutine into another

Two subroutines branch to common code

Returning directly to the caller’s caller

Increment the return address (to skip over a branch)

Overwrite the return address with a different one

A subroutine “return” which is actually a “call”

. . . and many more . . .

Inline Parameters to Subroutine

BAL R15,CVTBIN

DC A(ERRTNCD)

DC A(MSG023+52)

DC F’4’

. . . code . . .

Register R15 is used as both the return address and a pointer to

the parameters.

Return point is R15 + 12

In one organisation, 43% of all modules had subroutines with

inline parameter data!
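The return offset is simply the total length of the inline constants. A minimal sketch of that arithmetic, with the list encoding and function names chosen here for illustration:

```python
# Lengths in bytes of the inline constants that follow the BAL:
# A(...) address constants and F'...' fullwords are 4 bytes each.
INLINE_PARAMS = [("A(ERRTNCD)", 4), ("A(MSG023+52)", 4), ("F'4'", 4)]


def return_offset(params) -> int:
    """Offset from the address in R15 to the first instruction after the data."""
    return sum(length for _name, length in params)


def return_address(r15: int, params) -> int:
    """Address the subroutine must branch to in order to skip the inline data."""
    return r15 + return_offset(params)


offset = return_offset(INLINE_PARAMS)
```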

Inline Parameters to Subroutine

Solution:

1. Ensure that the first inline parameter has a label;

2. Generate the following code for the call:

rn := !XF inline_par(code, ADDRESS_OF(par));

call SUBR

3. Change any branch to “return address plus length of inline

data” into a direct return

4. Dispatch processing looks for subroutine calls of the form:

rn := !XF inline_par(code, ADDRESS_OF(par));

call SUBR

5. When the subroutine is converted to a procedure call, generate:

rn := ADDRESS_OF(par); SUBR()

Inline Code

A block of inline code has been turned into a subroutine:

LA R14,RETLAB

SUBR . . .

block of code is here

. . .

BR R14

RETLAB . . .

Elsewhere, the block can be called via:

BAL R14,SUBR

This causes no difficulty.

Assembler Restructuring

Currently, over 99% of all hand-written assembler modules can be

fully restructured automatically.

Bugs Discovered

An example of a bug uncovered by the failure to restructure:

BAS R04,S00100

...

S00100 do some processing

...

L R04,S00R04

BR R04

Bugs Discovered

Another example:

LA R15,4 IT CAME FROM VIM

BAL R10,SUBR020 GO DECIDE WHICH ONE

LTR R5,R5 DID WE FIND ANYTHING

BZ ERROR040 NO, SO ERROR CONDITION

...

ERROR040 EQU * ERROR CONDITION IF WE GET HERE

ST R15,ERRPECD2 SAVE ERROR CODE

LA R15,252 MAJOR ERROR CODE

ICM R15,12,ERRMODID INDICATE THE MODULE

BR R10 AND RETURN

Case Studies

US State Government Department:

870,000 LOC

Complexity Improvement: 56%

Pointer Reduction: 69%

Bugs Detected: 550 per MLOC

EX instructions: 24 per MLOC

Self-Modifying Code: 2,347 per MLOC

Complex Subroutine Linkage: 74% of modules

Non-Standard Module Linkage: 60% of modules

Case Studies

Large Insurance Company:

1.8 million lines of code.

Complexity Improvement: 52%

Pointer Reduction: 53%

Bugs Detected: 274 per MLOC

EX instructions: 745 per MLOC

Self-Modifying Code: 590 per MLOC

Complex Subroutine Linkage: 37% of modules

Non-Standard Module Linkage: 14% of modules

Case Studies

Human Resource Company (Payroll Systems):

350,000 lines of code.

Complexity Improvement: 71%

Pointer Reduction: 47%

Bugs Detected: 302 per MLOC

EX instructions: 1,169 per MLOC

Self-Modifying Code: 127 per MLOC

Complex Subroutine Linkage: 53% of modules

Non-Standard Module Linkage: 78% of modules

Case Studies

Organisation      Programs   Modules   Failed   Success
Insurance            3,000     8,991       84    99.07%
H.R. & Payroll         360     1,953        7    99.62%
Payroll                256     1,075        3    99.72%

Conclusion

Totally automated migration of assembler to a high-level language

such as C or COBOL is feasible with complete restructuring

achieved for over 99% of assembler modules.
