How to recover malware assembly codes
How to recover malware assembly codes
Jean-Yves Marion
LORIA
Jean-Yves Marion - Laboratoire de Haute Sécurité (LHS)
Duqu : The precursor to the next Stuxnet
Duqu is a targeted attack
Started in June 2010? Discovered in Sept. 2011 by CrySyS (Budapest)
See the Symantec white paper
Code injection
Duqu is similar to Stuxnet
➡ Same installation mechanisms and similar functionalities
➡ But anti-virus companies only detected it in Sept. 2011!
➡ None of the 43 anti-virus engines of VirusTotal was able to detect Duqu, even though they knew Stuxnet.
0-day exploit
Driver File (.sys)
Installer (.dll)
Decryption
DUQU Main DLL Service.exe
The decryption routine of the payload installer
Unpack a UPX file
The main DLL code is now decrypted and unpacked in memory only
Wave 1
Wave 2
Decryption
Duqu is a self-modifying program
A common protection scheme for malware
[Figure: Wave 1 and its payload; successive Decrypt steps lead to Wave 2]
Self-modifying program schema
Self-modifying codes: A bare semantics
$\mu_0[c]$ : binary $c$ loaded into memory $\mu_0$
$\mu_n$ : memory
$\rho_n$ : registers
$\rho_n(ip)$ returns the address of the next instruction to run
Traces
➡ Traces are obtained by code instrumentation: we use Pin (Intel)
We collect an execution trace of P :
For each run instruction, we gather
– its memory address
– its machine instruction
$(\mu_0[c], \rho_0) \to (\mu_1, \rho_1) \to \dots \to (\mu_n, \rho_n) \to \dots$
Self-modifying codes: Dynamic typing of memories
$\tau(m) = (k_r, k_w, k_x)$ where $m$ is a memory address
$k_w$ is the writing level
$k_r$ is the reading level
$k_x$ is the execution level
$\tau_0(m) = (0, 0, 0)$
$(\mu_0[c], \rho_0, \tau_0) \to (\mu_1, \rho_1, \tau_1) \to \dots \to (\mu_n, \rho_n, \tau_n) \to \dots$
The execution level is the level of $\rho_n(ip)$, given by $\tau_n(\rho_n(ip))$
An instruction written at level k has an execution level of k+1
@a: mov esi, $index
@b: xor [@offset+esi], $key
@c: sub esi, 4
@d: jnz @b
@offset: [encrypted data]
Wave 1 @a,…,@d
Decrypt
Wave 2 @offset
$k_w$ is 1
The execution level of @offset is 2 because it is written by instructions in wave 1
So $k_x$ is 2
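To make the example concrete, the decryption loop of wave 1 can be simulated on a byte buffer. This is a minimal Python sketch, assuming 32-bit little-endian operands (the slide does not state the operand size). The helper name `xor_decrypt` and the toy buffer are hypothetical; `offset`, `index` and `key` stand for @offset, $index and $key.

```python
import struct


def xor_decrypt(memory: bytearray, offset: int, index: int, key: int) -> None:
    """Simulate the loop @b..@d: XOR dwords from offset+index down to offset+4."""
    # Assumption: index is a positive multiple of 4, so esi reaches 0 exactly.
    if index <= 0 or index % 4 != 0:
        raise ValueError("index must be a positive multiple of 4")
    esi = index                                        # @a: mov esi, $index
    while True:
        (word,) = struct.unpack_from("<I", memory, offset + esi)
        struct.pack_into("<I", memory, offset + esi, word ^ key)  # @b: xor
        esi -= 4                                       # @c: sub esi, 4
        if esi == 0:                                   # @d: jnz @b not taken
            break


# Toy buffer: 4 bytes at @offset that the loop never touches, then 8 data bytes.
memory = bytearray(b"\x00\x00\x00\x00ABCDEFGH")
original = bytes(memory)
xor_decrypt(memory, offset=0, index=8, key=0x11223344)
decrypted_once = bytes(memory)
xor_decrypt(memory, offset=0, index=8, key=0x11223344)  # XOR is its own inverse
```

Because the loop exits as soon as esi reaches 0, the dword located exactly at @offset is left unchanged in this reading of the code.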
Self-modifying codes
$k_w$ is the writing level
$k_x$ is the execution level
A self-modifying program $c$ is a program whose execution level is greater than 1 for some input
Writing rule:
$$\tau_{i+1}(m) = \begin{cases} (k_r, k+1, k_x) & \text{if } m \text{ is written and } \tau_i(m) = (k_r, k_w, k_x) \\ \tau_i(m) & \text{otherwise} \end{cases}$$
Here $\rho_i(ip)$ points to an instruction like mov [m],eax, and the execution level of that instruction is $k + 1$
Execution rule:
$$\tau_{i+1}(m) = \begin{cases} (k_r, k, k+1) & \text{if } m = \rho_i(ip) \text{ and } \tau_i(\rho_i(ip)) = (k_r, k, k_x) \\ \tau_i(m) & \text{otherwise} \end{cases}$$
Similar to the phase semantics of Dalla Preda, Giacobazzi, and Debray
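The two typing rules can be replayed over a trace to compute the levels of each address. The following Python sketch is an illustration under simplifying assumptions, not the authors' tool: each instruction occupies a single address, the trace already lists the addresses each step writes, and the reading level is left at 0 because the slides give no rule for it. The names `type_trace` and `max_execution_level` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Level = Tuple[int, int, int]        # (k_r, k_w, k_x)
Step = Tuple[int, List[int]]        # (address executed, addresses it writes)


def type_trace(steps: Iterable[Step]) -> Dict[int, Level]:
    """Apply the execution rule, then the writing rule, for each trace step."""
    tau: Dict[int, Level] = {}
    for ip, written in steps:
        k_r, k_w, _ = tau.get(ip, (0, 0, 0))
        k_x = k_w + 1                      # executed code written at level k runs at k+1
        tau[ip] = (k_r, k_w, k_x)
        for m in written:                  # written cells get the writer's execution level
            m_r, _, m_x = tau.get(m, (0, 0, 0))
            tau[m] = (m_r, k_x, m_x)
    return tau


def max_execution_level(tau: Dict[int, Level]) -> int:
    """Highest execution level seen; a value above 1 means self-modification."""
    return max((k_x for _, _, k_x in tau.values()), default=0)


# The decryption example: @a..@d run in wave 1, @b writes @offset, then @offset runs.
A, B, C, D, OFFSET = 0, 1, 2, 3, 10
steps = [(A, []), (B, [OFFSET]), (C, []), (D, []),
         (B, [OFFSET]), (C, []), (D, []), (OFFSET, [])]
tau = type_trace(steps)
```

On this toy trace the address standing for @offset ends with writing level 1 and execution level 2, which matches the two waves of the example.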
Packer protections: Example (4/5)
• hostname packed with Themida
Different code waves with their relations
Themida packer
Yoda packer
UPX
7 Experimental results
Fig. 7.2: Analysis results
Binary name                     k (classical)  k (monotone)  Blind / Decrypt / Check / Scrambled
hostname.exe (original)         1              1
!EPack_1..exe                   2              2             X X
acprotect-hostname.exe          18             882           X X X X
aspack-hostname.exe             2              3             X X X X
enigma_protector_1.16.exe       5              24            X X X X
exefog_1.1.exe                  3              5             X X X
expressor-hostname.exe          2              3             X X
fsg.exe                         2              2             X X X
mew11.exe                       2              2             X X X
molebox-hostname.exe            3              5             X X X X
morphine_1.9.exe                3              3             X X X
nakedpack.exe                   2              2             X
npack-hostname.exe              2              2             X
nspack.exe                      3              4             X X X
packman_1..exe                  2              2             X X X
pec2-hostname.exe               3              4             X X X X
pelock-hostname.exe             9              16            X X X X
pepack.exe                      1              1             X
pespin-hostname.exe             4              38            X X X X
petite.exe                      2              2             X X
rlpack_1.17_full_version.exe    2              2             X X X
rlpack-hostname.exe             2              2             X
telock_.98.exe                  2              2             X
themida_1.8.5.2.exe             11             164           X X X X
upx-hostname.exe                2              2             X X
vmprotect-hostname.exe          1              1             X
winupack-hostname.exe           3              4             X X X X
Yodas_Crypter_v1.3.exe          4              4             X X X X
yp-1.02-hostname.exe            4              6             X X X X
Legend:
– k (classical) is the maximal execution level under classical typing (chapter 5, definition 50);
– k (monotone) is the maximal execution level under monotone typing.
Where are we ?
$(\mu_0[c], \rho_0, \tau_0) \to (\mu_1, \rho_1, \tau_1) \to \dots \to (\mu_n, \rho_n, \tau_n) \to \dots$
Dynamic typed memory trace
which defines a sequence of waves
[Figure: Wave 1, Wave 2, …, Wave K, each obtained from the previous one by a Decrypt step]
Can we recover the assembly code of wave K? Can we reconstruct the full CFG?
The inputs:
– An execution trace inside wave K
– The snapshot of the memory at wave K
The problem and its inputs
Can we recover the assembly code of this wave ?
The inputs:
– An execution trace inside wave K
– The snapshot of the memory at wave K
Snapshot of the memory at the beginning of wave 5
Dynamic vs Static analysis
A trace obtained by dynamic analysis
Dynamic typed memory trace
Undiscovered code in white boxes
Why is it difficult to recover a CFG in x86 ?
Indirect jumps
100: jmp eax
– Fuzzing
– We need a robust approximation of x86 semantics
– Abstract interpretation
– SMT solver
What is the set of possible values of eax ?
Junk code insertion
Junk code insertion at the expected return address
100 : call @a
junk code
@a : …
How to determine the return address of a call ?
125 : pop esi
Modify the return address (125)
See Debray’s paper
Yet another difficulty: misalignment
01006e7a  fe 04 0b        inc byte [ebx+ecx]
01006e7d  eb ff           jmp +1
01006e7e  ff c9           dec ecx
01006e80  7f e6           jg 01006e68
01006e82  8b c1           mov eax, ecx
Figure 1. Overlapping assembly in tELock.
    010059f0  89 f9           mov ecx, edi
,=< 010059f2  79 07           jns +9
|   010059f4  0f b7 07        movzx eax, word [edi]
|   010059f7  47              inc edi
|   010059f8  50              push eax
|   010059f9  47              inc edi
|   010059fa  b9 57 48 f2 ae  mov ecx, aef24857
`-> 010059fb  57              push edi
    010059fc  48              dec eax
    010059fd  f2 ae           repne scasb
    010059ff  55              push ebp
Figure 2. Overlapping assembly in UPX.
2.2.1 tELock 0.99
tELock 0.99 uses an overlapping technique simply to obfuscate the code. Figure 1 shows a recursive disassembly starting at address 01006e7a. The jmp +1 instruction at address 01006e7d, encoded on the two bytes eb ff, jumps to address 01006e7d+1. That address holds a dec ecx instruction (ff c9), which shares the byte ff at address 01006e7d+1 with the jmp instruction.
2.2.2 UPX
UPX uses overlapping to optimize the size of the final packed binary (figure 2). The unpacker uses a conditional jump to split the control flow into two overlapping blocks, which both realign after a few instructions. (TODO: explain the two branches and, briefly, why they are useful)
2.2.3 Overlapping in state-of-the-art disassemblers
Existing disassemblers, even when doing recursive traversal, assume that code cannot overlap and fail to display the resulting disassembly.
With IDA Pro (v6.3), the tELock example looks as follows:
01006E7A inc byte ptr [ebx+ecx]
01006E7D jmp short near ptr loc_1006E7D+1
01006E7D ; ---------------------------------------
01006E7F db 0C9h ;
01006E80 db 7Fh ;
01006E81 db 0E6h ;
01006E82 db 8Bh ;
01006E83 db 0C1h ;
With Radare (TODO: recursive?), the tELock example is disassembled as follows:
01006e7a fe040b inc byte [ebx+ecx]
01006e7d ebff jmp 6e7e
01006e7f c9 leave
01006e80 7fe6 jg 6e68
01006e82 8bc1 mov eax, ecx
Neither tool is able to follow the jmp: its target lies inside an instruction that has already been disassembled and is thus deemed invalid.
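The target of the tELock jmp can be checked by hand: a short x86 jump stores a signed 8-bit displacement that is relative to the address of the next instruction. The Python sketch below only covers 2-byte short jumps (opcode followed by rel8); the function name `short_jump_target` is hypothetical, and the addresses and bytes are those of Figures 1 and 2.

```python
def short_jump_target(address: int, encoding: bytes) -> int:
    """Target of a 2-byte x86 short jump (opcode byte followed by rel8)."""
    if len(encoding) != 2:
        raise ValueError("expected a 2-byte short jump (opcode + rel8)")
    # rel8 is signed and relative to the address of the following instruction.
    displacement = int.from_bytes(encoding[1:], "little", signed=True)
    return address + len(encoding) + displacement


# tELock: eb ff at 01006e7d jumps back into its own second byte.
jmp_address = 0x01006E7D
target = short_jump_target(jmp_address, bytes.fromhex("ebff"))
lands_inside_jmp = jmp_address < target < jmp_address + 2
```

The same computation gives 01006e68 for the jg (7f e6) at 01006e80 and 010059fb for the UPX jns (79 07) at 010059f2.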
tELock
IDA fails because of jmp +1
BB [0x0 -> 0x2] (0x3)  0x0 inc byte [ebx+ecx]
BB [0x3 -> 0x4] (0x2)  0x3 jmp 0x4
BB [0x4 -> 0x5] (0x2)  0x4 dec ecx
BB [0x6 -> 0x7] (0x2)  0x6 jg 0xffffffee
BB [0x8 -> 0x9] (0x2)  0x8 mov eax, ecx
Figure 4. Control flow graph for the tELock sample
The control flow graph for the UPX overlapping code is given in figure ??.
Another example of misalignment
UPX
mov ecx, edi   jns +9
movzx eax, word [edi]   inc edi   push eax   inc edi   mov ecx, aef24857
push edi   dec eax   repne scasb
Re-synchronized: push ebp
Bytes in common: the two branches share 4 bytes
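The count of shared bytes can be recomputed from the instruction boundaries of the two branches. The Python sketch below assumes each branch is given as a list of (start address, length) pairs taken from the UPX figure; the helper names `byte_owner`, `misaligned_bytes` and `resync_address` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Insn = Tuple[int, int]   # (start address, length in bytes)


def byte_owner(insns: List[Insn]) -> Dict[int, int]:
    """Map every byte address to the start of the instruction that contains it."""
    return {start + i: start for start, length in insns for i in range(length)}


def misaligned_bytes(a: List[Insn], b: List[Insn]) -> List[int]:
    """Bytes decoded by both streams but as parts of different instructions."""
    owner_a, owner_b = byte_owner(a), byte_owner(b)
    return sorted(x for x in owner_a.keys() & owner_b.keys()
                  if owner_a[x] != owner_b[x])


def resync_address(a: List[Insn], b: List[Insn]) -> Optional[int]:
    """First address at which both streams start an instruction again."""
    common = {start for start, _ in a} & {start for start, _ in b}
    return min(common, default=None)


# UPX: fall-through branch and jump-taken branch, lengths read from the byte dump.
fall_through = [(0x010059F4, 3), (0x010059F7, 1), (0x010059F8, 1),
                (0x010059F9, 1), (0x010059FA, 5), (0x010059FF, 1)]
taken = [(0x010059FB, 1), (0x010059FC, 1), (0x010059FD, 2), (0x010059FF, 1)]
shared = misaligned_bytes(fall_through, taken)
resync = resync_address(fall_through, taken)
```

Here the shared bytes are the four immediate bytes 57 48 f2 ae of the mov, and both branches realign on the push ebp at 010059ff.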
Let’s recap the problem
First instruction
Last instruction
TRACER
W1 W2 W3 W4 W5
Snapshot of the memory at the beginning of wave 5
Goal : Reconstruct the full CFG
Problem inputs
Snapshot of the memory at the beginning of wave 5
An execution trace
A path in the woods
Junk code insertion after a call
100 : call @a
junk code
125 : pop esi
@a : pop ebp
Modify the return address
@b : ret
Trace: 100: call @a; @a: pop esi; …; @b: ret; 125: pop esi; …
A trace automatically provides the address 125
It is junk code only if it is not reachable
See the paper of Kruegel et al., USENIX Security 2004, for another approach
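How a trace yields the real return address can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: trace entries are (address, length, mnemonic) triples, calls are not nested, the call is 5 bytes long, and the numeric addresses 200 to 203 used for @a and @b are invented for the example. The names `observed_return` and `junk_candidates` are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int, str]   # (address, length in bytes, mnemonic)


def observed_return(trace: List[Entry], call_index: int) -> Optional[int]:
    """Address executed right after the first ret that follows the call."""
    for i in range(call_index + 1, len(trace) - 1):
        if trace[i][2] == "ret":
            return trace[i + 1][0]
    return None


def junk_candidates(trace: List[Entry], call_index: int) -> List[int]:
    """Bytes between the expected and the observed return that never ran."""
    address, length, _ = trace[call_index]
    expected = address + length
    actual = observed_return(trace, call_index)
    if actual is None or actual <= expected:
        return []
    executed = {a + i for a, n, _ in trace for i in range(n)}
    return [x for x in range(expected, actual) if x not in executed]


# Hypothetical trace for the slide: call at 100, return address moved to 125.
trace = [(100, 5, "call"), (200, 1, "pop"), (201, 2, "add"),
         (203, 1, "ret"), (125, 1, "pop")]
ret_to = observed_return(trace, 0)
candidates = junk_candidates(trace, 0)
```

The result is only a list of candidates: as the slide notes, those bytes are junk only if no other path reaches them.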
Method for misalignment
… 89 F9 79 07 0F B7 07 47 50 47 B9 57 48 F2 AE 55 …
mov ecx, edi   jns +9
push edi   dec eax   repne scasb   push ebp
movzx eax, word [edi]   inc edi   push eax   inc edi   mov ecx, aef24857
push ebp
An obfuscation similar to UPX
1/ The CFG construction follows the trace
2/ Then, we search for missing code
3/ We split blocks
Method for misalignment
… 89 F9 79 07 0F B7 07 47 50 47 B9 57 48 F2 AE 55 …
mov ecx, edi   jns +9
push edi   dec eax   repne scasb
movzx eax, word [edi]   inc edi   push eax   inc edi   mov ecx, aef24857
push ebp
An obfuscation similar to UPX
1/ The CFG construction follows the trace
2/ Then, we search for missing code
3/ We split blocks
Misalignment can come from indirect jumps … traces are then useful!
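The three steps can be mimicked on the UPX bytes with a small Python sketch. It is a simplification under stated assumptions: each path is a list of instruction start addresses, and the second path (the jump-taken branch) is simply given, whereas in the method it would come from searching for missing code. The function name `cfg_blocks` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def cfg_blocks(paths: List[List[int]]) -> Dict[int, List[int]]:
    """Group instruction addresses into basic blocks from observed paths."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            succ[a].add(b)
            pred[b].add(a)
    nodes = {addr for path in paths for addr in path}
    leaders = {path[0] for path in paths if path}
    for n in nodes:
        preds = pred[n]
        # A block starts where control merges or right after a branching instruction.
        if len(preds) != 1 or len(succ[next(iter(preds))]) != 1:
            leaders.add(n)
    blocks: Dict[int, List[int]] = {}
    for leader in sorted(leaders):
        block, cur = [leader], leader
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if nxt in leaders:
                break
            block.append(nxt)
            cur = nxt
        blocks[leader] = block
    return blocks


fall = [0x010059F0, 0x010059F2, 0x010059F4, 0x010059F7,
        0x010059F8, 0x010059F9, 0x010059FA, 0x010059FF]
taken = [0x010059F0, 0x010059F2, 0x010059FB, 0x010059FC,
         0x010059FD, 0x010059FF]
before_split = cfg_blocks([fall])
after_split = cfg_blocks([fall, taken])
```

With the trace alone there is a single block; once the second path is known, the block is split at the jns and at the push ebp where both branches meet.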
The overall method (work in progress)
• A partial CFG is an incomplete CFG
• Two partial CFGs are in conflict if two of their instructions are misaligned.
• Traces define a set of partial CFGs which are in conflict.
mov ecx, edi   jns +9
movzx eax, word [edi]   inc edi   push eax   inc edi   mov ecx, aef24857
push edi   dec eax   repne scasb   push ebp
push ebp   push ebp
Share the same addresses
• Edges between partial CFGs indicate misalignment
• Then we can synchronize partial CFGs
• There are orphan partial CFGs
• They are OK if there is an edge to a valid address
• Statistical recognition is useful at this stage
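The notion of conflict can be illustrated with a short Python sketch. It assumes a partial CFG is reduced to its list of (start address, length) instructions and that two partial CFGs conflict when some byte belongs to instructions with different start addresses. This is one reading of the bullets above, not the authors' definition, and the names `in_conflict` and `conflict_edges` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Insn = Tuple[int, int]   # (start address, length in bytes)


def byte_owner(insns: List[Insn]) -> Dict[int, int]:
    """Map every byte address to the start of the instruction that contains it."""
    return {start + i: start for start, length in insns for i in range(length)}


def in_conflict(a: List[Insn], b: List[Insn]) -> bool:
    """True if some byte is decoded by both CFGs inside different instructions."""
    owner_a, owner_b = byte_owner(a), byte_owner(b)
    return any(owner_a[x] != owner_b[x] for x in owner_a.keys() & owner_b.keys())


def conflict_edges(cfgs: Dict[str, List[Insn]]) -> List[Tuple[str, str]]:
    """Pairs of partial CFGs linked by a misalignment edge."""
    return [(n1, n2) for (n1, c1), (n2, c2) in combinations(cfgs.items(), 2)
            if in_conflict(c1, c2)]


# Partial CFGs for the UPX bytes; the labels are illustrative only.
partial_cfgs = {
    "head": [(0x010059F0, 2), (0x010059F2, 2)],
    "fall_through": [(0x010059F4, 3), (0x010059F7, 1), (0x010059F8, 1),
                     (0x010059F9, 1), (0x010059FA, 5)],
    "taken": [(0x010059FB, 1), (0x010059FC, 1), (0x010059FD, 2)],
    "tail": [(0x010059FF, 1)],
}
edges = conflict_edges(partial_cfgs)
```

Only the two branches are in conflict here; the head and the tail are aligned with both, which is what allows the partial CFGs to be synchronized at push ebp.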
Conclusion and Questions
• We are developing a disassembler for self-modifying code:
BinVizz: visualization of each code wave from a trace