Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization
Steven H. H. Ding
Data Mining and Security Lab
School of Information Studies
McGill University
Montreal, Canada
Benjamin C. M. Fung
Data Mining and Security Lab, School of Information Studies
McGill University, Montreal, Canada
Philippe Charland
Mission Critical Cyber Security Section, Defence R&D Canada – Valcartier
Quebec, Canada
Reverse engineer
Manual analysis
Reverse engineering
Did anyone analyze something similar before? Is it a library function?
f1 f2 f3
LDR R3,[R11,#sct]
LDR R2,[R3,#0xC]
LDR R3,[R11,#applet_no]
CMP R2,R3
BEQ loc_DFD0
LDR R3,[R11,#sct]
LDR R3,[R3]
STR R3,[R11,#sct]
loc_DFC0
LDR R3,[R11,#sct]
CMP R3,#0
BNE loc_DFA0
Disassemble
A binary file
With Kam1n0
LDR R3,[R11,#sct]
LDR R2,[R3,#0xC]
LDR R3,[R11,#applet_no]
CMP R2,R3
BEQ loc_DFD0
LDR R3,[R11,#sct]
LDR R3,[R3]
STR R3,[R11,#sct]
loc_DFC0
LDR R3,[R11,#sct]
CMP R3,#0
BNE loc_DFA0
Commented assembly function
LDR R3,[R11,#sct]
LDR R2,[R3,#0xC]
LDR R3,[R11,#applet_no]
CMP R2,R3
BEQ loc_DFD0
LDR R3,[R11,#sct]
LDR R3,[R3]
STR R3,[R11,#sct]
loc_DFC0
LDR R3,[R11,#sct]
CMP R3,#0
BNE loc_DFA0
Labeled library function
Type I: Exact clone
0x1FE69C0  PUSH ebp
0x1FE69C1  MOV ebp, esp
0x1FE69C3  MOV ecx, [ebp+arg_0]
0x1FE69C6  PUSH ebx
0x1FE69C7  MOV ebx, [ebp+arg_8]
0x1FE69CA  PUSH esi
0x1FE69CB  MOV esi, ecx
0x1FE69CD  AND ecx, 0FFFFh
0x1FE69D3  SHR esi, 10h
0x1FE69D6  CMP ebx, 1
0x1FE69D9  JNZ loc_1FE6A0C

0x1FE69C0  PUSH ebp
0x1FE69C1  MOV ebp, esp
0x1FE69C3  MOV ecx, [ebp+arg_0]
0x1FE69C6  PUSH ebx
0x1FE69C7  MOV ebx, [ebp+arg_8]
0x1FE69CA  PUSH esi
0x1FE69CB  MOV esi, ecx
0x1FE69CD  AND ecx, 0FFFFh
0x1FE69D3  SHR esi, 10h
0x1FE69D6  CMP ebx, 1
0x1FE69D9  JNZ loc_1FE6A0C
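
Because the two Type I listings are identical, an exact clone can in principle be found with a plain hash lookup. The following is a minimal sketch of that idea only. The function name `fingerprint`, the whitespace and case normalization, and the one-entry `index` are hypothetical choices for illustration; they are not taken from Kam1n0 or Asm2Vec.

```python
import hashlib


def fingerprint(instructions):
    """Hash an instruction sequence, ignoring case and spacing differences."""
    canonical = "\n".join(" ".join(ins.lower().split()) for ins in instructions)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


listing_a = ["PUSH ebp", "MOV ebp, esp", "MOV ecx, [ebp+arg_0]", "PUSH ebx"]
listing_b = ["push ebp", "mov  ebp, esp", "mov ecx, [ebp+arg_0]", "push ebx"]
listing_c = ["PUSH ebp", "MOV ebp, esp", "MOV eax, [ebp+msg_0]", "PUSH edx"]

# Hypothetical repository mapping fingerprints to known function names.
index = {fingerprint(listing_a): "known_function"}

print(index.get(fingerprint(listing_b)))  # same content: the lookup hits
print(index.get(fingerprint(listing_c)))  # renamed registers: the lookup misses
```

The miss on `listing_c` shows why exact matching stops working as soon as registers or variable names change.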
Type II: Syntactically equivalent
0x1FE05B0  PUSH ebp
0x1FE05B1  MOV ebp, esp
0x1FE05B3  MOV ecx, [ebp+arg_0]
0x1FE05B6  PUSH ebx
0x1FE05B7  MOV ebx, [ebp+arg_8]
0x1FE05BA  PUSH esi
0x1FE05BB  MOV esi, ecx
0x1FE05BD  AND ecx, 0FFFFh
0x1FE05B3  SHR esi, 10h
0x1FE05B6  CMP ebx, 1
0x1FE05B9  JNZ loc_1FE05BC

0x1FE69C0  PUSH ebp
0x1FE69C1  MOV ebp, esp
0x1FE69C3  MOV eax, [ebp+msg_0]
0x1FE69C6  PUSH edx
0x1FE69C7  MOV edx, [ebp+msg_1]
0x1FE69CA  PUSH esi
0x1FE69CB  MOV esi, eax
0x1FE69CD  AND eax, 0FFFFh
0x1FE69D3  SHR esi, 10h
0x1FE69D6  CMP edx, 1
0x1FE69D9  JNZ loc_1FE6A0C
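
The two Type II listings differ only in register names, stack variable names, and addresses. A common way to make such clones comparable is to rename those tokens in order of first use. The sketch below illustrates that classic normalization step; it is not the Asm2Vec model. The register set, the `arg_`/`msg_`/`var_` naming pattern, and the `REG`/`VAR`/`LOC` placeholders are assumptions made for this example.

```python
import re

# General-purpose registers to abstract away; ebp/esp are kept as frame anchors.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}
# Assumed naming scheme for stack variables in the disassembly (arg_0, msg_1, ...).
VARIABLE = re.compile(r"(arg|msg|var)_\w+")
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def normalize(instructions):
    """Rename registers, variables, and jump targets by order of first use."""
    names = {}     # original token -> placeholder
    counters = {}  # placeholder prefix -> next free index

    def placeholder(token, prefix):
        if token not in names:
            index = counters.get(prefix, 0)
            counters[prefix] = index + 1
            names[token] = f"{prefix}{index}"
        return names[token]

    def rewrite(match):
        token = match.group(0)
        if token in REGISTERS:
            return placeholder(token, "REG")
        if token.startswith("loc_"):
            return placeholder(token, "LOC")
        if VARIABLE.fullmatch(token):
            return placeholder(token, "VAR")
        return token  # mnemonics and ebp/esp stay unchanged

    return [IDENTIFIER.sub(rewrite, ins) for ins in instructions]


first = [
    "PUSH ebp", "MOV ebp, esp", "MOV ecx, [ebp+arg_0]", "PUSH ebx",
    "MOV ebx, [ebp+arg_8]", "PUSH esi", "MOV esi, ecx", "AND ecx, 0FFFFh",
    "SHR esi, 10h", "CMP ebx, 1", "JNZ loc_1FE05BC",
]
second = [
    "PUSH ebp", "MOV ebp, esp", "MOV eax, [ebp+msg_0]", "PUSH edx",
    "MOV edx, [ebp+msg_1]", "PUSH esi", "MOV esi, eax", "AND eax, 0FFFFh",
    "SHR esi, 10h", "CMP edx, 1", "JNZ loc_1FE6A0C",
]

print(normalize(first) == normalize(second))
```

After renaming, both listings reduce to the same token sequence, so an exact comparison succeeds again.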
Type III: Minor modification
0x1FE05B0  PUSH ebp
0x1FE05B1  MOV ebp, esp
0x1FE05B7  MOV ebx, [ebp+arg_8]
0x1FE05BA  PUSH esi
0x1FE05BB  MOV esi, ecx
0x1FE05BD  AND ecx, 0FFFFh
0x1FE05B3  MOV eax, ecx
0x1FE05B6  SHR esi, 10h
0x1FE05B9  CMP ebx, 1
0x1FE05C1  JNZ loc_1FE05BC

0x1FE69C0  PUSH ebp
0x1FE69C1  MOV ebp, esp
0x1FE69C3  MOV eax, [ebp+msg_0]
0x1FE69C6  PUSH edx
0x1FE69C7  MOV edx, [ebp+msg_1]
0x1FE69CA  PUSH esi
0x1FE69CB  MOV esi, eax
0x1FE69CD  AND eax, 0FFFFh
0x1FE69D3  SHR esi, 10h
0x1FE69D6  CMP edx, 1
0x1FE69D9  JNZ loc_1FE6A0C
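
For Type III clones, instructions are added or removed, so an equality test no longer suffices and some similarity score is needed. As a rough illustration only, the sketch below compares the mnemonic sequences of the two listings with Python's standard `difflib.SequenceMatcher`. This generic sequence matcher stands in for a clone-search method and is not what Asm2Vec does; the helper names `mnemonics` and `similarity` are hypothetical.

```python
from difflib import SequenceMatcher


def mnemonics(instructions):
    """Keep only the operation name of each instruction."""
    return [ins.split()[0] for ins in instructions]


def similarity(a, b):
    """Return 2*M/T, where M counts matched elements and T is the total length."""
    matcher = SequenceMatcher(None, mnemonics(a), mnemonics(b), autojunk=False)
    return matcher.ratio()


modified = [
    "PUSH ebp", "MOV ebp, esp", "MOV ebx, [ebp+arg_8]", "PUSH esi",
    "MOV esi, ecx", "AND ecx, 0FFFFh", "MOV eax, ecx", "SHR esi, 10h",
    "CMP ebx, 1", "JNZ loc_1FE05BC",
]
original = [
    "PUSH ebp", "MOV ebp, esp", "MOV eax, [ebp+msg_0]", "PUSH edx",
    "MOV edx, [ebp+msg_1]", "PUSH esi", "MOV esi, eax", "AND eax, 0FFFFh",
    "SHR esi, 10h", "CMP edx, 1", "JNZ loc_1FE6A0C",
]

print(similarity(original, original))
print(similarity(modified, original))
```

A threshold on such a score would separate near clones from unrelated code, but it ignores operands and is easily disturbed by the obfuscation and optimization discussed next.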
original clone
Obfuscation and Optimization - Challenges
Obfuscation and Optimization - Problems
• P1: The relationships among assembly tokens
  • xmm0 (SSE) register vs. SSE operations such as movaps
  • fclose vs. fopen
  • strcpy vs. memcpy
• P2: Token combination weights
  • Reverse engineers look for 'interesting patterns' (higher weight).
  • Regular, random, or repeated patterns are not interesting (lower weight).
• Sounds so familiar in NLP!
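
P2 resembles term weighting in text retrieval. As a hedged illustration of that analogy, the sketch below computes inverse document frequency (IDF) over a toy set of functions, so that tokens appearing everywhere get a low weight and rare tokens get a high one. IDF is a conventional baseline chosen for this example, not the method presented in the talk, and the three token lists are invented.

```python
import math
from collections import Counter


def idf_weights(functions):
    """Weight each token by log(N / number of functions containing it)."""
    n = len(functions)
    doc_freq = Counter(tok for fn in functions for tok in set(fn))
    return {tok: math.log(n / df) for tok, df in doc_freq.items()}


# Hypothetical toy corpus: each function is a list of assembly tokens.
functions = [
    ["push", "ebp", "mov", "call", "fopen"],
    ["push", "ebp", "mov", "movaps", "xmm0"],
    ["push", "ebp", "mov", "call", "strcpy"],
]

weights = idf_weights(functions)
for token in ("push", "call", "fopen"):
    print(token, round(weights[token], 3))
```

The prologue token `push` occurs in every function and receives zero weight, while `fopen` occurs once and receives the highest weight of the three.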
Learning English
1) The cat ____ on the mat.
A: food  B: sat  C: sitting  D: is speaking
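
The fill-in-the-blank exercise can be read as predicting a word from its neighbors. The sketch below is a deliberately simple count-based version of that idea: it records which word appeared between each pair of neighbors in a tiny corpus and picks the candidate seen most often. The corpus is invented for this example, and neural models such as Paragraph Vector learn dense vectors rather than raw counts.

```python
from collections import Counter


def train(sentences):
    """Count which word appears between each (left, right) neighbor pair."""
    counts = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for left, middle, right in zip(words, words[1:], words[2:]):
            counts.setdefault((left, right), Counter())[middle] += 1
    return counts


def fill_blank(counts, left, right, candidates):
    """Pick the candidate seen most often between `left` and `right`."""
    seen = counts.get((left, right), Counter())
    return max(candidates, key=lambda word: seen[word])


# Hypothetical toy corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a chair",
    "the cat ate food on monday",
]

counts = train(corpus)
print(fill_blank(counts, "cat", "on", ["food", "sat", "sitting", "is speaking"]))
```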
Paragraph Vector (p2vec):
king - man + woman = queen
bad - good = maniacal_killer*
* Example collected from Andreas Mueller @amuellerml
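
The analogy arithmetic can be reproduced on hand-made vectors. The sketch below uses invented 2-D vectors (first axis: royalty, second axis: gender) and cosine similarity to find the word nearest to king - man + woman. The vectors and the `analogy` helper are assumptions for illustration; real embeddings are learned from a corpus and have many more dimensions.

```python
import math

# Hand-made toy vectors (axis 0: royalty, axis 1: gender); not learned embeddings.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three inputs."""
    target = tuple(x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("king", "man", "woman"))
```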
Asm2Vec:
T-SNE Visualization
Evaluation (Quantitative)
Evaluation (Case Studies)
Vulnerability retrieval
Asm2Vec (IEEE S&P 2019)
+ Robust against obfuscation and optimization.
+ Even better than the most recent dynamic approach.
+ Static approach: efficient and scalable.
- Binary diffing (interpretability?)
- Static approach: cannot recognize jump tables, etc.
- Assembly code must come from the same processor family.
The Kam1n0 2.x Binary Analysis Platform
Subgraph clone
Sym1n0
Thank you. Questions?