semantic trace-based malware variants detection
crest.cs.ucl.ac.uk/cow/12/slides/khalidcow12.pdf ·...
TRANSCRIPT
Overview Trace-based approach Experiments
Semantic Trace-based Malware Variants Detection
Khalid Alzarooni
CREST - DCS - UCL
April 6, 2011
Outline
Overview
Trace-based approach
Experiments
Overview
Malware Variants
• Speed of evolution of malware is partly driven by automatic generation of program variants
• Semantic equivalence tables are used in malware, e.g. polymorphic and metamorphic malware
• These alter the “local behaviour” of programs, but larger-scale behaviour is unchanged
Malware Problem
Anoirel S. Issa, Symantec, UK (EICAR 2009)
“Poly or metamorphic engines have some essential components that help them build highly obfuscated code. A single engine is able to produce unique variants that can reach millions.”
Malware evolution: M0 → M1 → M2 → M3 → …
Syntactic view: code0 ≉ code1 ≉ code2 ≉ code3 ≉ …
Some Code Obfuscation Schemes
[Beaucamps, 2007, Szor, 2005]
Label  Category            Obfuscation
gi     Garbage insertion   {} → {C}
op     Opaque predicate    {} → {P^{T/F}}
ec     Equivalent command  {op} → {op}
rr     Register renaming   {Rx} → {Ry}
cs     Command split       {C} → {Cx, Cy}
cm     Command merging     {Cx, Cy} → {Cxy}
cr     Command reorder     {(Cx, Cy)} → {(Cy, Cx)}
...    ...                 ...
Example: a program P and its semantically equivalent variant P′
P:
a     R0:=n
b     R1:=m
c     R2:=R1
d     R3:=R2+R0
e     R4:=R1+k
f     R5:=1
−→
P′:
a′    R0:=n
cr1   JMP rr1
gi1   R22:=R22+1
op1   P^T JMP cm
rr1   R11:=m
gi2   R22:=R22+1
cr2   JMP op1
cm    R3:=R11+R0
e′1   R4:=k
e′2   R4:=R4+R11
rr2   R15:=1
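The claim that P and P′ differ locally but agree at a larger scale can be checked with a tiny interpreter. The sketch below is only an illustration: the input values for n, m and k, the tuple encoding of instructions, and the `run` helper are all assumptions, and the opaque predicate at op1 is modelled as a jump that is always taken.

```python
# Illustrative inputs; the slides leave n, m and k symbolic.
n, m, k = 7, 5, 3

def run(program):
    """Execute a list of (label, op, *args) tuples and return the final registers."""
    labels = {label: i for i, (label, *_rest) in enumerate(program)}
    regs, pc = {}, 0
    while pc < len(program):
        _label, op, *args = program[pc]
        if op == "set":                 # register assignment
            dst, fn = args
            regs[dst] = fn(regs)
            pc += 1
        elif op == "jmp":               # jump (also used for the always-true opaque predicate)
            pc = labels[args[0]]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs

P = [
    ("a", "set", "R0", lambda r: n),
    ("b", "set", "R1", lambda r: m),
    ("c", "set", "R2", lambda r: r["R1"]),
    ("d", "set", "R3", lambda r: r["R2"] + r["R0"]),
    ("e", "set", "R4", lambda r: r["R1"] + k),
    ("f", "set", "R5", lambda r: 1),
]

P_VARIANT = [
    ("a'", "set", "R0", lambda r: n),
    ("cr1", "jmp", "rr1"),
    ("gi1", "set", "R22", lambda r: r.get("R22", 0) + 1),   # garbage, never reached
    ("op1", "jmp", "cm"),                                     # opaque predicate, always true
    ("rr1", "set", "R11", lambda r: m),
    ("gi2", "set", "R22", lambda r: r.get("R22", 0) + 1),   # garbage
    ("cr2", "jmp", "op1"),
    ("cm", "set", "R3", lambda r: r["R11"] + r["R0"]),
    ("e'1", "set", "R4", lambda r: k),
    ("e'2", "set", "R4", lambda r: r["R4"] + r["R11"]),
    ("rr2", "set", "R15", lambda r: 1),
]

p, pv = run(P), run(P_VARIANT)
for reg in ("R0", "R3", "R4"):          # registers that keep their name in both programs
    print(reg, p[reg], pv[reg])
```

Registers R0, R3 and R4 end with the same values in both programs, while R1 and R5 reappear under the new names R11 and R15.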
Malware Problem
• To detect variants of a known malware
• Given two arbitrary programs, is it possible to tell whether they are semantically equivalent?
• It is undecidable: it is not possible to devise an algorithm that produces a “yes” or “no” detection answer [Cohen, 1987]
P′ ≈? P
Semantic trace-based
Program
↓
Program approximation
↓
Trace collection
↓
Semantic analysis
↓
Detection of semantic signatures
Test scenarios
Results:
• Tested samples: Bho, Binom, Mobler, Telf, . . .
• Most malware successfully matched, with k ≥ 60%
• No false positives, similarity ≤ 20% (10 benign executables)
• 100% malware variants classification
• sig-w-slice: accuracy 30% and speed 26% in detection phase
• sig-wo-slice: 5 :7 faster in sig. generation phase
Trace-based approach
Semantic trace-based
• Design a detector that can tell when two programs are approximately equivalent, which might often be good enough
• Approximate semantic equivalence is decidable
• Approximate a program’s semantics [[P]]
  • CFG abstract traces (program paths) & test inputs
  • concrete & semantic traces
Malware evolution: M0 → M1 → M2 → M3 → …
Syntactic view: code0 ≉ code1 ≉ code2 ≉ code3 ≉ …
Semantic view: [[M0]] ≈ [[M1]] ≈ [[M2]] ≈ [[M3]] ≈ …
Semantic trace-based
• M1 is a variant of M0 if [[M0]] is a sub-sequence of [[M1]]
[Figure: a malware trace t shown against a variant trace t′, with the numbered states of t appearing in order within t′]
∀t ∈ [[M0]], ∃t′ ∈ [[M1]] : t ≺ t′
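The variant condition above can be sketched directly, assuming traces are finite sequences of comparable states and reading t ≺ t′ as "t is a subsequence of t′" (elements in the same order, gaps allowed). The function names are illustrative, not taken from the prototype.

```python
def is_subsequence(t, t_prime):
    """True if the elements of t occur in t_prime in the same order, gaps allowed."""
    remaining = iter(t_prime)
    # `state in remaining` consumes the iterator up to the first match,
    # so later states can only match later positions of t_prime.
    return all(state in remaining for state in t)

def is_variant(traces_m0, traces_m1):
    """For every t in [[M0]] there is some t' in [[M1]] with t a subsequence of t'."""
    return all(
        any(is_subsequence(t, t_prime) for t_prime in traces_m1)
        for t in traces_m0
    )

malware_traces = [[1, 2, 3, 4]]
variant_traces = [[1, 9, 2, 3, 8, 4]]     # same states with inserted noise
print(is_variant(malware_traces, variant_traces))
```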
Semantic trace-based
Two phases:
1. Signature generation
2. Detection
Signature generation phase
executable M
↓ (disassembler & translator)
abstract code (AAPL)
↓ (test data generator)
abstract trace and a test input x
↓ (semantic simulator)
a concrete trace
↓ (trace slicer)
trace slices
↓ (abstracter)
semantic traces τm
semantic signature = (τm, x)
Detection phase
executable P
↓ (disassembler & translator)
abstract code (AAPL)
↓ (semantic simulator, sig_m = (τm, x))
a concrete trace
↓ (abstracter)
(τp, τm)
↓ (Matcher)
yes/no
Experiments
Detector prototype
[Figure: detector prototype. Malicious program M → Signature generation phase → Semantic signatures; Suspicious program P and the semantic signatures → Detection phase → Yes/No]
Test scenarios
We tested:
• Robustness against real in-the-wild variants
• Effectiveness of trace slicing in the signatures
• Sig. gen. & detection phases: sig-wo-slice vs. sig-w-slice
• False positives
• Classification of malware samples
Test scenarios
Results:
• Tested samples: Bho, Binom, Mobler, Telf, . . .
• Most malware successfully matched, with k ≥ 60%
• sig-w-slice: accuracy 30% and speed 26% in detection phase
• sig-wo-slice: 5 :7 faster in sig. generation phase
• No false positives, similarity ≤ 20% (10 benign executables)
• 100% malware variants classification
Prototype limitation
Technical shortcomings:
• Limited to viruses and worms
• Does not work for dynamically packed code or code with anti-disassembly techniques
• Relies on tools to manually unpack (encrypted) and disassemble files
Thank you very much!
References
Alzarouni, K., Clark, D., and Tratt, L. (2010). Semantic malware detection. Technical Report TR-10-03, Department of Computer Science, King’s College London.
Beaucamps, P. (2007). Advanced metamorphic techniques in computer viruses. In Proceedings of the International Conference on Computer, Electrical, and Systems Science, and Engineering - CESSE’07.
Cohen, F. (1987). Computer viruses: theory and experiments. Comput. Secur., 6(1):22–35.
Szor, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley, Reading, Mass.
Detector components
Trace Semantics
• Trace semantics of a program is the set of all traces T that the program can produce
• A trace t ∈ T is a sequence of pairs of execution context X and program syntax C
• Execution context: memory (locations) and environment (variables) values, X = E × M
• Program syntax: source code (commands)
ρ ∈ E = R → Z⊥ (environments)
m ∈ M = Z → Z⊥ ∪ C (memory)
ξ ∈ X = E × M (execution contexts)
S = C × X (program states)
Trace Semantics
• Signatures refer to exact program state
• Semantic signatures refer to values at particular memory locations and in registers that are observed to be constant across variants from the same malware family
• Detection: environment-memory traces of M that are contained in (are subtraces of) environment-memory traces of M′
Semantic Simulator
Not “live” testing
Evaluate abstract trace and collect concrete traces
Semantics of Actions:
A : A × X → X
A[[R := E]]ξ = (ρ′, m) where ξ = (ρ, m) and ρ′ = ρ(R ↦ E[[E]]ξ)
A[[∗R := E]]ξ = (ρ, m′) where ξ = (ρ, m) and m′ = m(ρ(R) ↦ E[[E]]ξ)
A[[JMP E]]ξ = (ρ′, m) where ξ = (ρ, m) and ρ′ = ρ(PC ↦ E[[E]]ξ)
A[[RTN]]ξ = (ρ′, m) where ξ = (ρ, m) and ρ′ = ρ(PC ↦ m(ρ(SP)), SP ↦ SP + 1)
A[[PUSH E]]ξ = (ρ′, m′) where ξ = (ρ, m) and ρ′ = ρ(SP ↦ SP − 1) and m′ = m(ρ(SP − 1) ↦ E[[E]]ξ)
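The five action rules can be read as a pure function from an execution context to a new one. The sketch below is an interpretation, not the prototype's simulator: environments and memories are modelled as Python dicts, expressions as callables over the context, and the tuple encoding of actions is hypothetical. SP ± 1 is taken to mean ρ(SP) ± 1.

```python
def act(action, xi):
    """Apply one action to the execution context xi = (rho, m); inputs are not mutated."""
    rho, m = xi
    kind, *args = action
    if kind == "assign":                      # R := E
        reg, expr = args
        return {**rho, reg: expr(xi)}, m
    if kind == "store":                       # *R := E, write at the address held in R
        reg, expr = args
        return rho, {**m, rho[reg]: expr(xi)}
    if kind == "jmp":                         # JMP E
        (expr,) = args
        return {**rho, "PC": expr(xi)}, m
    if kind == "rtn":                         # RTN: PC from top of stack, SP + 1
        return {**rho, "PC": m[rho["SP"]], "SP": rho["SP"] + 1}, m
    if kind == "push":                        # PUSH E: SP - 1, then write at the new top
        (expr,) = args
        sp = rho["SP"] - 1
        return {**rho, "SP": sp}, {**m, sp: expr(xi)}
    raise ValueError(f"unknown action {kind!r}")

xi0 = ({"PC": 0, "SP": 100, "R0": 0}, {})
xi1 = act(("push", lambda xi: 42), xi0)       # pushes 42 as a return address
xi2 = act(("rtn",), xi1)                      # pops it into PC
print(xi2[0])
```

Returning fresh dicts keeps every intermediate context available, which is convenient when the contexts themselves are the elements of a collected trace.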
Semantic Simulator
Not “live” testing
Evaluate abstract trace and collect concrete traces
Semantics of Commands:
C : S → Σ(S) (determines transition relation between states)
C[[CA]]ξ = (ξ′, C′) where ξ = (ρ, m), ξ′ = A[[A]]ξ and
C′ = { m(ρ(PC))      if A := JMP ∪ CALL ∪ RTN
     { m(ρ(PC + 1))  otherwise
C[[CB]]ξ = (ξ′, C′) where ξ = (ρ, m), and
(ξ′, C′) = { ξ′ = (ρ′, m), ρ′ = ρ(PC ↦ E[[E]]ξ), C′ = m(ρ(E[[E]]ξ))  if B[[B]]ξ = true
           { ξ′ = ξ, C′ = m(ρ(PC + 1))                                 otherwise
TSAlgo – Trace slicing
• P −slice→ P′ (semantically invariant subprogram w.r.t. a criterion)
• t −slice→ t′ (semantically invariant subtrace w.r.t. tsc)
• Trace slicing criterion tsc: recent definition points of variables in t
• A conjecture: useful in the detection step for more accurate and efficient results.
• Effect is to shorten the trace and thus the signature
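The slides do not give the details of TSAlgo, so the following is a generic backward slice over a straight-line trace, offered only to illustrate how following the most recent definition points shortens a trace. The (label, defined, used) encoding and the choice of criterion are assumptions; the sample trace is the executed path of the variant P′ from the earlier example.

```python
def slice_trace(trace, criterion):
    """Keep the trace entries that the criterion variables depend on.

    trace: list of (label, defined_register, used_registers) in execution order.
    criterion: registers whose final values matter.
    """
    needed = set(criterion)
    kept = []
    for label, defined, used in reversed(trace):
        if defined in needed:
            # This is the most recent definition of `defined`; what it reads
            # becomes needed in turn.
            needed.discard(defined)
            needed.update(used)
            kept.append(label)
    kept.reverse()
    return kept

# Executed path of P': jumps omitted, garbage insertion gi2 included.
variant_trace = [
    ("a'", "R0", ()),
    ("rr1", "R11", ()),
    ("gi2", "R22", ("R22",)),
    ("cm", "R3", ("R11", "R0")),
    ("e'1", "R4", ()),
    ("e'2", "R4", ("R4", "R11")),
    ("rr2", "R15", ()),
]
print(slice_trace(variant_trace, {"R3", "R4"}))
```

With R3 and R4 as the criterion, the garbage assignment gi2 and the unrelated rr2 fall out of the slice.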
Signature matching
sig = (τm, x) of M and τp of P:
MD(sig, P) = { yes  if τm is contained in τp
             { no   otherwise
Our assumption: there are some core semantic values in the two variants that would match with a high degree of similarity, indicating the likelihood of them being behaviourally the same.
Signature matching
• We look for corresponding semantic traces of τm in τp,
• a fuzzy matching to determine whether τp corresponds semantically to τm
semantic similarity measure = no. of mappings / |τm|
• We consider τm to be contained in τp if the similarity measure is above a certain similarity threshold k,
k ≤ similarity measure ≤ 100
k: a large percentage of (desired) mappings
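A minimal sketch of the threshold test, under stated assumptions: the slides do not say how mappings are counted, so here the number of mappings is taken to be the length of the longest common subsequence of τm and τp, semantic trace elements are stood in for by plain integers, τm is assumed non-empty, and the default k = 60 simply echoes the threshold quoted in the results.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(tau_m, tau_p):
    """Percentage of tau_m that can be mapped, in order, onto tau_p."""
    return 100.0 * lcs_length(tau_m, tau_p) / len(tau_m)

def md(tau_m, tau_p, k=60.0):
    """Answer "yes" when the similarity measure reaches the threshold k."""
    return "yes" if similarity(tau_m, tau_p) >= k else "no"

tau_m = [7, 5, 12, 8, 1]                  # signature trace values (illustrative)
tau_variant = [7, 5, 1, 12, 3, 8, 1]      # same values with extra noise
tau_benign = [7, 9, 9, 8]
print(md(tau_m, tau_variant), md(tau_m, tau_benign))
```

Using a percentage rather than strict containment is what lets a variant with a few altered values still match, at the cost of having to choose k.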