Semantic Trace-based Malware Variants Detection (crest.cs.ucl.ac.uk/cow/12/slides/khalidcow12.pdf)

Khalid Alzarooni, CREST - DCS - UCL, April 6, 2011


Page 1: Semantic Trace-based Malware Variants Detectioncrest.cs.ucl.ac.uk/cow/12/slides/Khalidcow12.pdf · Semantic malware detection. Technical Report TR-10-03, Department of Computer Science,

Overview Trace-based approach Experiments

Semantic Trace-based Malware Variants Detection

Khalid Alzarooni

CREST - DCS - UCL

April 6, 2011

Page 2

Outline

Overview

Trace-based approach

Experiments

Page 3

Overview

Page 4

Malware Variants

• Speed of evolution of malware partly driven by automatic generation of program variants

• Semantic equivalence tables used in malware, e.g. polymorphic and metamorphic malware

• These alter “local behaviour” of programs but larger scale behaviour is unchanged

Page 5

Malware Problem

Anoirel S. Issa, Symantec, UK (EICAR 2009)

“Poly or metamorphic engines have some essential components that help them build highly obfuscated code. A single engine is able to produce unique variants that can reach millions.”

Malware evolution: M0 → M1 → M2 → M3 → . . .
Syntactic view: code0 ≉ code1 ≉ code2 ≉ code3 ≉ . . .

Page 6

Some Code Obfuscation Schemes

[Beaucamps, 2007, Szor, 2005]

Label  Category            Obfuscation
gi     Garbage insertion   {} → {C}
op     Opaque predicate    {} → {P^T/F}
ec     Equivalent command  {op} → {op}
rr     Register renaming   {Rx} → {Ry}
cs     Command split       {C} → {Cx, Cy}
cm     Command merging     {Cx, Cy} → {Cxy}
cr     Command reorder     {(Cx, Cy)} → {(Cy, Cx)}
...    ...                 ...
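To make two rows of the table concrete, here is a minimal Python sketch of register renaming (rr) and garbage insertion (gi). The instruction encoding (a destination register plus a tuple of expression tokens) and the function names are assumptions made for illustration only; they are not taken from the slides or from any real metamorphic engine.

```python
from typing import List, Tuple

# Hypothetical encoding: (destination register, expression tokens).
Instr = Tuple[str, Tuple[str, ...]]

def rename_register(code: List[Instr], old: str, new: str) -> List[Instr]:
    """rr: replace every occurrence of register `old` with `new`."""
    def swap(token: str) -> str:
        return new if token == old else token
    return [(swap(dst), tuple(swap(t) for t in expr)) for dst, expr in code]

def insert_garbage(code: List[Instr], position: int, junk_reg: str) -> List[Instr]:
    """gi: insert a command that only updates a register assumed to be unused."""
    junk: Instr = (junk_reg, (junk_reg, "+", "1"))
    return code[:position] + [junk] + code[position:]

original: List[Instr] = [("R1", ("m",)), ("R3", ("R1", "+", "R0"))]
variant = insert_garbage(rename_register(original, "R1", "R11"), 1, "R22")
for instr in variant:
    print(instr)
```

The variant differs syntactically from the original in every line that mentions R1, yet the value computed into R3 is unchanged, which is the "local behaviour changes, larger scale behaviour does not" point made earlier.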

Page 7

Example: a program P and its semantically equivalent variant P′

P:
a     R0 := n
b     R1 := m
c     R2 := R1
d     R3 := R2 + R0
e     R4 := R1 + k
f     R5 := 1

⟶

P′:
a′    R0 := n
cr1   JMP rr1
gi1   R22 := R22 + 1
op1   P^T JMP cm
rr1   R11 := m
gi2   R22 := R22 + 1
cr2   JMP op1
cm    R3 := R11 + R0
e′1   R4 := k
e′2   R4 := R4 + R11
rr2   R15 := 1
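The equivalence of P and P′ can be checked on a concrete input with a small interpreter. The sketch below is an illustration under several assumptions: the tuple encoding of instructions is invented here, the opaque predicate P^T is modelled as an unconditional jump because it is always true, and unset registers are assumed to read as 0.

```python
def value(operand, regs, inputs):
    """Resolve an operand: integer literal, symbolic input, or register (default 0)."""
    if isinstance(operand, int):
        return operand
    if operand in inputs:
        return inputs[operand]
    return regs.get(operand, 0)

def run(program, inputs):
    """Execute a list of (label, instruction) pairs and return the final registers."""
    labels = {label: i for i, (label, _) in enumerate(program)}
    regs, pc = {}, 0
    while pc < len(program):
        _, ins = program[pc]
        if ins[0] == "jmp":
            pc = labels[ins[1]]
        else:
            _, dst, operands = ins
            # Every expression in the example is a sum of one or two operands.
            regs[dst] = sum(value(op, regs, inputs) for op in operands)
            pc += 1
    return regs

P = [
    ("a", ("set", "R0", ("n",))),
    ("b", ("set", "R1", ("m",))),
    ("c", ("set", "R2", ("R1",))),
    ("d", ("set", "R3", ("R2", "R0"))),
    ("e", ("set", "R4", ("R1", "k"))),
    ("f", ("set", "R5", (1,))),
]

P_VARIANT = [
    ("a'", ("set", "R0", ("n",))),
    ("cr1", ("jmp", "rr1")),
    ("gi1", ("set", "R22", ("R22", 1))),
    ("op1", ("jmp", "cm")),            # opaque predicate, always taken
    ("rr1", ("set", "R11", ("m",))),
    ("gi2", ("set", "R22", ("R22", 1))),
    ("cr2", ("jmp", "op1")),
    ("cm", ("set", "R3", ("R11", "R0"))),
    ("e'1", ("set", "R4", ("k",))),
    ("e'2", ("set", "R4", ("R4", "R11"))),
    ("rr2", ("set", "R15", (1,))),
]

inputs = {"n": 3, "m": 4, "k": 5}
before = run(P, inputs)
after = run(P_VARIANT, inputs)
print(before["R3"], after["R3"], before["R4"], after["R4"])
```

For this input the registers that both programs share (R0, R3, R4) end with the same values, while R1 and R5 reappear as R11 and R15 and R22 holds only garbage.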

Page 8

Malware Problem

• To detect variants of a known malware

• Given two arbitrary programs, is it possible to tell whether they are semantically equivalent?

• This is undecidable: no algorithm can produce a “yes” or “no” detection answer for every pair of programs [Cohen, 1987]

P′ ≈? P

Page 9

Semantic trace-based

Program
↓
Program approximation
↓
Trace collection
↓
Semantic analysis
↓
Detection of semantic signatures

Page 10

Test scenarios

Results:

• Tested samples: Bho, Binom, Mobler, Telf, . . .

• Most malware successfully matched, with k ≥ 60%

• No false positives, similarity ≤ 20% (10 benign executables)

• 100% malware variants classification

• sig-w-slice: accuracy 30% and speed 26% in detection phase

• sig-wo-slice: 5 :7 faster in sig. generation phase

Page 11

Trace-based approach

Page 12

Semantic trace-based

• Design a detector that can tell when two programs are approximately equivalent, which might often be good enough

• Approximate semantic equivalence is decidable

• Approximate a program’s semantics [[P]]
  • CFG abstract traces (program paths) & test inputs
  • concrete & semantic traces

Malware evolution: M0 → M1 → M2 → M3 → . . .
Syntactic view: code0 ≉ code1 ≉ code2 ≉ code3 ≉ . . .
Semantic view: [[M0]] ≈ [[M1]] ≈ [[M2]] ≈ [[M3]] ≈ . . .

Page 13

Semantic trace-based

• M1 is a variant of M0 if every trace in [[M0]] is a sub-sequence of some trace in [[M1]]

[Figure: the states 1 2 3 4 of a malware trace t appear, in the same order, within a longer variant trace t′]

∀t ∈ [[M0]], ∃t′ ∈ [[M1]] : t ≺ t′
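The variant condition can be sketched in a few lines of Python. This assumes that ≺ means "is a (not necessarily contiguous) sub-sequence of" and that trace elements can be compared with `==`; the function names are chosen here for illustration.

```python
def is_subsequence(t, t_prime):
    """True if the elements of t occur in t_prime in the same order."""
    remaining = iter(t_prime)
    # Each `any` call consumes `remaining` up to the first match, so order is respected.
    return all(any(x == y for y in remaining) for x in t)

def is_variant(traces_m0, traces_m1):
    """For every trace t of M0 there is a trace t' of M1 with t a sub-sequence of t'."""
    return all(
        any(is_subsequence(t, t_prime) for t_prime in traces_m1)
        for t in traces_m0
    )

malware_traces = [[1, 2, 3, 4]]
variant_traces = [[1, 9, 2, 3, 9, 4]]   # same states with inserted junk states
print(is_variant(malware_traces, variant_traces))
```

A reordered trace such as `[1, 3, 2, 4]` would not satisfy the condition, because the sub-sequence check preserves order.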

Page 14

Semantic trace-based

Two phases:

1. Signature generation
2. Detection

Page 15

Signature generation phase

executable M
↓ (disassembler & translator)
abstract code (AAPL)
↓ (test data generator)
abstract trace and a test input x
↓ (semantic simulator)
a concrete trace
↓ (trace slicer)
trace slices
↓ (abstracter)
semantic traces τm

semantic signature = (τm, x)

Page 16

Detection phase

executable P
↓ (disassembler & translator)
abstract code (AAPL)
↓ (semantic simulator, sigm = (τm, x))
a concrete trace
↓ (abstracter)
(τp, τm)
↓ (Matcher)
yes/no

Page 17

Experiments

Page 18

Detector prototype

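[Figure: detector prototype. A malicious program M enters the signature generation phase, which produces semantic signatures; a suspicious program P and the semantic signatures enter the detection phase, which answers Yes/No]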

Page 19

Test scenarios

We tested:

• Robustness against real in-the-wild variants

• Effectiveness of trace slicing in the signatures

• Sig. gen. & detection phases: sig-wo-slice vs. sig-w-slice

• False positives

• Classification of malware samples

Page 20

Test scenarios

Results:

• Tested samples: Bho, Binom, Mobler, Telf, . . .

• Most malware successfully matched, with k ≥ 60%

• sig-w-slice: accuracy 30% and speed 26% in detection phase

• sig-wo-slice: 5 :7 faster in sig. generation phase

• No false positives, similarity ≤ 20% (10 benign executables)

• 100% malware variants classification

Page 21

Prototype limitation

Technical shortcomings:

• Limited to viruses and worms

• Does not work for dynamically packed code or code with anti-disassembly techniques

• Relies on tools to manually unpack (encrypted) and disassemble files

Page 22

Thank you very much !


Page 23

References

Alzarouni, K., Clark, D., and Tratt, L. (2010). Semantic malware detection. Technical Report TR-10-03, Department of Computer Science, King’s College London.

Beaucamps, P. (2007). Advanced metamorphic techniques in computer viruses. In Proceedings of the International Conference on Computer, Electrical, and Systems Science, and Engineering - CESSE’07.

Cohen, F. (1987). Computer viruses: theory and experiments. Comput. Secur., 6(1):22–35.

Szor, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley, Reading, Mass.

Page 24

Detector components

Page 25

Trace Semantics

• Trace semantics of a program is the set of all traces T that the program can produce

• A trace t ∈ T is a sequence of pairs of execution context X and program syntax C

• Execution context: memory (locations) and environment (variables) values, X = E × M

• Program syntax: source code (commands)

ρ ∈ E = R → Z⊥ (environments)

m ∈ M = Z → Z⊥ ∪ C (memory)

ξ ∈ X = E × M (execution contexts)

S = C × X (program states)
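The domains above map naturally onto plain data types. The following Python sketch is one possible encoding, with assumptions stated up front: `None` stands for the undefined value ⊥, commands are kept as opaque strings, and all type and class names are invented here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Value = Optional[int]                        # Z⊥: an integer, or None for undefined
Command = str                                # C: commands as opaque strings (assumption)
Env = Dict[str, Value]                       # ρ ∈ E = R → Z⊥
Memory = Dict[int, Union[Value, Command]]    # m ∈ M = Z → Z⊥ ∪ C

@dataclass
class Context:
    """ξ ∈ X = E × M: an environment paired with a memory."""
    env: Env
    mem: Memory

State = Tuple[Command, Context]              # S = C × X
Trace = List[State]                          # a trace is a sequence of program states

trace: Trace = [
    ("R0 := n", Context(env={"R0": None}, mem={})),
    ("R1 := m", Context(env={"R0": 7, "R1": None}, mem={})),
]
print(len(trace), trace[1][1].env["R0"])
```

Each state records the command about to execute together with the context in which it runs, so a trace is simply the list of such pairs along one execution.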

Page 26

Trace Semantics

• Signatures refer to exact program state

• Semantic signatures refer to values at particular memory locations and in registers that are observed to be constant across variants from the same malware family

• Detection: environment-memory traces of M that are contained in (are subtraces of) environment-memory traces of M′

Page 27

Semantic Simulator

Not “live” testing
Evaluate abstract trace and collect concrete traces

Semantics of Actions:

A : A × X → X

A[[R := E]]ξ = (ρ′, m) where ξ = (ρ, m) and ρ′ = ρ(R ↦ E[[E]]ξ)

A[[∗R := E]]ξ = (ρ, m′) where ξ = (ρ, m) and m′ = m(ρ(R) ↦ E[[E]]ξ)

A[[JMP E]]ξ = (ρ′, m) where ξ = (ρ, m) and ρ′ = ρ(PC ↦ E[[E]]ξ)

A[[RTN]]ξ = (ρ′, m) where ξ = (ρ, m) and ρ′ = ρ(PC ↦ m(ρ(SP)), SP ↦ SP + 1)

A[[PUSH E]]ξ = (ρ′, m′) where ξ = (ρ, m) and ρ′ = ρ(SP ↦ SP − 1) and m′ = m(ρ(SP − 1) ↦ E[[E]]ξ)
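The action rules above translate almost directly into code. The Python sketch below is an illustration, not the prototype's simulator: actions are encoded as tuples, expression evaluation is stood in for by a Python callable taking `(env, mem)`, and `SP + 1` and `SP − 1` are read as the current value of SP plus or minus one. All of these are assumptions.

```python
def act(action, ctx):
    """Apply one action to a context (env, mem) and return the new context.

    The input dictionaries are never mutated; updated copies are returned,
    mirroring the functional updates in the rules.
    """
    env, mem = ctx
    kind = action[0]
    if kind == "assign":                       # R := E
        _, reg, expr = action
        return {**env, reg: expr(env, mem)}, mem
    if kind == "store":                        # *R := E
        _, reg, expr = action
        return env, {**mem, env[reg]: expr(env, mem)}
    if kind == "jmp":                          # JMP E
        _, expr = action
        return {**env, "PC": expr(env, mem)}, mem
    if kind == "rtn":                          # RTN: pop the return address into PC
        return {**env, "PC": mem[env["SP"]], "SP": env["SP"] + 1}, mem
    if kind == "push":                         # PUSH E: grow the stack downwards
        _, expr = action
        sp = env["SP"] - 1
        return {**env, "SP": sp}, {**mem, sp: expr(env, mem)}
    raise ValueError(f"unknown action: {kind}")

start = ({"PC": 0, "SP": 10}, {})
pushed = act(("push", lambda env, mem: 42), start)
returned = act(("rtn",), pushed)
print(pushed, returned)
```

Pushing 42 and then returning leaves SP where it started and moves PC to 42, which is how a PUSH/RTN pair can be used as a disguised jump.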

Page 28

Semantic Simulator

Not “live” testing
Evaluate abstract trace and collect concrete traces

Semantics of Commands:

C : S → Σ(S) (determines the transition relation between states)

C[[CA]]ξ = (ξ′, C′) where ξ = (ρ, m), ξ′ = A[[A]]ξ and

  C′ = m(ρ(PC))       if A := JMP ∪ CALL ∪ RTN
  C′ = m(ρ(PC + 1))   otherwise

C[[CB]]ξ = (ξ′, C′) where ξ = (ρ, m), and

  ξ′ = (ρ′, m), ρ′ = ρ(PC ↦ E[[E]]ξ), C′ = m(ρ(E[[E]]ξ))   if B[[B]]ξ = true
  ξ′ = ξ, C′ = m(ρ(PC + 1))                                 otherwise
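The conditional-branch rule can be sketched as a single step function. This is an illustrative reading with two stated deviations: the sketch writes the new PC back into the environment in both cases (the rule leaves the context unchanged on fall-through), and the next command is looked up directly at the new PC. The helper names and the use of callables for the condition and target are assumptions.

```python
def next_command(env, mem, jumped):
    """Pick the next command: at PC after a jump, at PC + 1 otherwise."""
    pc = env["PC"] if jumped else env["PC"] + 1
    return {**env, "PC": pc}, mem.get(pc)

def step_branch(cond, target, env, mem):
    """One step of a conditional branch command with condition B and target E."""
    if cond(env, mem):
        env = {**env, "PC": target(env, mem)}
        return next_command(env, mem, jumped=True)
    return next_command(env, mem, jumped=False)

memory = {0: "BR R0 == 1, 5", 1: "fall-through", 5: "branch target"}
taken = step_branch(lambda e, m: e["R0"] == 1, lambda e, m: 5, {"PC": 0, "R0": 1}, memory)
skipped = step_branch(lambda e, m: e["R0"] == 1, lambda e, m: 5, {"PC": 0, "R0": 0}, memory)
print(taken, skipped)
```

An opaque predicate is the special case where `cond` always returns the same value, so only one of the two successors is ever reached at run time.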

Page 29

TSAlgo – Trace slicing

• P −slice→ P′ (semantically invariant subprogram wrt a criterion)

• t −slice→ t′ (semantically invariant subtrace wrt tsc)

• Trace slicing criterion tsc: recent definition points of variables in t

• A conjecture: useful in the detection step for more accurate and efficient results.

• Effect is to shorten the trace and thus the signature
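The slides do not spell out TSAlgo, so the following Python sketch substitutes a generic backward slice over a trace to show the idea of keeping only the most recent definition points that the criterion depends on. The trace encoding (each entry is a defined register plus the registers it reads) and the function name are assumptions, and the example trace is the executed path of the variant P′ from the earlier example.

```python
def slice_trace(trace, criterion):
    """Keep the trace entries that the criterion registers depend on.

    Walks the trace backwards; an entry is kept when it is the most recent
    definition of a register that is still needed, and the registers it
    reads then become needed in turn.
    """
    needed = set(criterion)
    kept = []
    for index in range(len(trace) - 1, -1, -1):
        defined, used = trace[index]
        if defined in needed:
            kept.append(index)
            needed.discard(defined)
            needed.update(used)
    return [trace[i] for i in reversed(kept)]

# (defined register, registers read); symbolic inputs n, m, k are not registers.
executed = [
    ("R0", ()),               # R0 := n
    ("R11", ()),              # R11 := m
    ("R22", ("R22",)),        # R22 := R22 + 1   (garbage)
    ("R3", ("R11", "R0")),    # R3 := R11 + R0
    ("R4", ()),               # R4 := k
    ("R4", ("R4", "R11")),    # R4 := R4 + R11
    ("R15", ()),              # R15 := 1
]
sliced = slice_trace(executed, {"R3", "R4"})
print(sliced)
```

With the criterion {R3, R4} the garbage update of R22 and the unrelated write to R15 drop out, which shortens the trace without changing the values the criterion observes.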

Page 30

Signature matching

sig = (τm, x) of M and τp of P:

MD(sig, P) = yes   if τm is contained in τp
             no    otherwise

Our assumption: some core semantic values in the two variants will match with a high degree of similarity, indicating that the two are likely to be behaviourally the same.

Page 31

Signature matching

• We look for corresponding semantic traces of τm in τp,

• a fuzzy matching to determine whether τp corresponds semantically to τm

semantic similarity measure = (no. of mappings / |τm|) × 100

• We consider τm to be contained in τp if the similarity measure is at or above a certain similarity threshold k,

k ≤ similarity measure ≤ 100

k: a large percentage of (desired) mappings
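A minimal Python sketch of the similarity measure and threshold test follows. How mappings are found is not specified on the slide, so the greedy in-order mapping used here is an assumption, as are the function names; the default threshold of 60 only echoes the k ≥ 60% figure reported in the results and is not a recommended setting.

```python
def similarity(tau_m, tau_p):
    """Percentage of entries of tau_m that can be mapped, in order, into tau_p."""
    if not tau_m:
        return 0.0
    mapped, start = 0, 0
    for entry in tau_m:
        try:
            # Search only to the right of the previous mapping to preserve order.
            start = tau_p.index(entry, start) + 1
            mapped += 1
        except ValueError:
            continue  # no mapping for this entry; try the next one
    return 100.0 * mapped / len(tau_m)

def matches(tau_m, tau_p, k=60.0):
    """MD: answer yes when the similarity reaches the threshold k (in percent)."""
    return similarity(tau_m, tau_p) >= k

signature_trace = [1, 2, 3, 4, 5]
suspect_trace = [1, 9, 2, 4, 9, 5]
print(similarity(signature_trace, suspect_trace), matches(signature_trace, suspect_trace))
```

Here four of the five signature entries are mapped, giving a similarity of 80, so the suspect trace is reported as a match at k = 60 even though one entry is missing.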