Introduction to Malware Analysis
Disclaimer
• This stuff requires the analyst to dive extremely deep into technical details
• This quick talk will attempt to give you a 1,000-foot view of malware analysis
• I draw a careful distinction between Malware Analysis and Reverse Engineering
Malware Analysis Overview
• Static Analysis: analyzing the code without actually running it
– File identification, header information, strings, etc.
– Disassembler: IDA Pro
• Dynamic Analysis: executing the code in a controlled manner and monitoring system changes
– Sysinternals, memory forensics, etc.
– Debuggers: Immunity Debugger, OllyDbg
Coding Terms
• Malware authors typically write code in a high-level programming language such as C/C++
Static Analysis: File Identification
• Linux “file” utility
• python-magic module
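Both `file` and python-magic identify a file by its leading "magic" bytes (using the libmagic database), not by its extension. The sketch below shows the same idea with only the standard library. The signature table is a small illustrative subset rather than libmagic's, and the PE check follows the `e_lfanew` pointer stored at offset 0x3C of the MZ header. With python-magic installed, `magic.from_file(path)` gives a far more complete answer.

```python
import struct

# Small, illustrative subset of leading "magic" byte signatures; the real
# `file` utility and python-magic rely on libmagic's much larger database.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "ZIP archive (also DOCX, JAR, APK)"),
    (b"%PDF", "PDF document"),
    (b"MZ", "MS-DOS executable (MZ header)"),
]


def identify(data: bytes) -> str:
    """Guess a file type from its first bytes, ignoring the file extension."""
    if data.startswith(b"MZ") and len(data) >= 0x40:
        # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE executable (Windows)"
    for magic, description in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the start of the file; e_lfanew is usually well under 4 KB."""
    with open(path, "rb") as handle:
        return identify(handle.read(4096))
```

A renamed sample such as `invoice.pdf.exe` or `update.jpg` is still reported by its real header. That is why this check comes before anything else.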
Static Analysis: MD5 Hash
• Linux “md5sum” utility: md5sum <fileName>
• Python hashlib module:
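Here is a minimal sketch of the hashlib approach. It reads the sample in chunks so large files do not have to fit in memory, and its output should match `md5sum <fileName>`. The function name is illustrative.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, like `md5sum <fileName>`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() keeps calling read() until it returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

MD5 collisions can be constructed on purpose, so many analysts also record a SHA-256. `hashlib.sha256()` drops into the same function unchanged.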
Static Analysis: Strings
• Can be a quick way to gain intelligence from the file:
– Domains, IPs, URLs, function names, hard-coded information
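A rough Python equivalent of `strings` is sketched below. It pulls runs of printable ASCII and UTF-16LE ("wide") characters, with a default minimum length of 4 like GNU strings, and then greps them for IPs and URLs. The regular expressions are deliberately loose assumptions: they will flag things like version numbers that look like IPs. Domains and function names usually still need a human read.

```python
import re

# Loose patterns for triage; they over-match, so validate what they find.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable ASCII and UTF-16LE strings of at least min_len chars."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


def find_indicators(strings: list[str]) -> dict[str, list[str]]:
    """Pull candidate network indicators out of extracted strings."""
    text = "\n".join(strings)
    return {"ips": IPV4_RE.findall(text), "urls": URL_RE.findall(text)}
```

Usage: `find_indicators(extract_strings(open(path, "rb").read()))`. Windows binaries store many API and path strings as wide characters, which is why the second pattern is there.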
Static Analysis: Packers
• Packers are used to obfuscate code, which:
– Changes the file signature (MD5 hash)
– Obfuscates the file’s strings and code
– Compresses the file (sometimes)
• Packed code can be identified by:
– Examining the PE sections and imports: if a PE file imports only LoadLibrary/GetProcAddress, it is usually packed
– Strings: UPX0, UPX1, aspack, adata, NSP0, NSP1, WinRAR SFX, PEC2, PECompact2, Themida, Orean.sys, NTkrnl, Secure Suite
• Tools: PEiD, LordPE, and the Python peutils module
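The identification checks above can be scripted. The sketch below works on plain lists, so it does not depend on any particular parser. With pefile, section names would come from `pe.sections` (each `Name` is null-padded bytes, so decode and strip it) and import names from `pe.DIRECTORY_ENTRY_IMPORT`. peutils also offers `is_probably_packed(pe)`, which uses an entropy test instead. The "standard" section list is an assumption covering common compiler output, not a specification.

```python
# Marker strings from the slide, matched case-insensitively as whole strings.
PACKER_MARKERS = ["UPX0", "UPX1", "aspack", "adata", "NSP0", "NSP1",
                  "WinRAR SFX", "PEC2", "PECompact2", "Themida",
                  "Orean.sys", "NTkrnl", "Secure Suite"]

# Section names mainstream compilers/linkers commonly emit (not exhaustive).
STANDARD_SECTIONS = {".text", ".rdata", ".data", ".rsrc", ".reloc",
                     ".idata", ".edata", ".pdata", ".bss", ".tls"}

# APIs a packer stub needs in order to rebuild the real import table at runtime.
LOADER_APIS = {"LoadLibraryA", "LoadLibraryW", "LoadLibraryExA",
               "LoadLibraryExW", "GetProcAddress"}


def packer_indicators(section_names, imported_functions, strings):
    """Return reasons a sample looks packed (an empty list means none found)."""
    reasons = []
    odd = [name for name in section_names if name not in STANDARD_SECTIONS]
    if odd:
        reasons.append(f"non-standard section names: {odd}")
    imports = set(imported_functions)
    if imports and imports <= LOADER_APIS:
        reasons.append("imports only LoadLibrary/GetProcAddress-style APIs")
    seen = {s.strip().lower() for s in strings}
    hits = [marker for marker in PACKER_MARKERS if marker.lower() in seen]
    if hits:
        reasons.append(f"packer marker strings: {hits}")
    return reasons
```

Treat every reason as a lead, not a verdict. For example, Delphi/Borland binaries use section names such as CODE and DATA and will trip the section check without being packed.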
Static Analysis: Packers Cont.
• Unpacked vs. Packed Strings:
Data Encoding
• Malware uses encoding for a number of reasons: to disguise its internal workings, hide C2 information, and conceal exfiltrated data
• Some simple encoding algorithms are:
– Character substitution
– XOR: uses a static key to XOR with the original value
– Base64: can use the default or a custom character set (default: A-Z, a-z, 0-9, +, /)
• We will examine two common data encoding techniques used in malware: XOR and Base64
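Character substitution swaps each character for another according to a fixed table. As a minimal sketch, the example below uses a Caesar-style letter rotation (ROT13 with the default shift of 13) built with `str.maketrans`. Real samples may use an arbitrary table, but once the table is recovered, decoding works the same way.

```python
import string

PLAIN = string.ascii_lowercase + string.ascii_uppercase


def rotated(shift: int) -> str:
    """Alphabet with each case rotated by `shift` letters."""
    shift %= 26
    lower = string.ascii_lowercase[shift:] + string.ascii_lowercase[:shift]
    upper = string.ascii_uppercase[shift:] + string.ascii_uppercase[:shift]
    return lower + upper


def substitute_encode(text: str, shift: int = 13) -> str:
    """Replace each letter via the substitution table; other chars pass through."""
    return text.translate(str.maketrans(PLAIN, rotated(shift)))


def substitute_decode(text: str, shift: int = 13) -> str:
    """Decoding applies the same table in the opposite direction."""
    return text.translate(str.maketrans(rotated(shift), PLAIN))
```

For example, `substitute_encode("cmd.exe")` yields `"pzq.rkr"`, which no longer stands out in strings output.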
Data Encoding: XOR
• Programs often need to store strings so they can pass them as parameters to functions; XOR-encoding those strings keeps them out of a plain strings dump
• XOR once = encoded
• XOR again with same key = plaintext
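The round trip is easy to demonstrate. This minimal sketch handles both single-byte and repeating multi-byte keys. The brute-force helper assumes a one-byte key and a guessed plaintext fragment such as `http`. That is a common triage trick, not something the slide prescribes.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def brute_force_single_byte(data: bytes, known: bytes) -> list[int]:
    """Try every non-zero one-byte key and keep those that reveal `known`."""
    # Key 0 is skipped because XOR with 0 leaves the data unchanged.
    return [key for key in range(1, 256) if known in xor_bytes(data, bytes([key]))]
```

For example, `xor_bytes(xor_bytes(b"cmd.exe", b"\x5a"), b"\x5a")` gives back `b"cmd.exe"`. The key 0x5A is arbitrary.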
Data Encoding: Base64
• Storing Base64 strings in HTML comments is how the APT group “Comment Crew” got its name. Malware still uses this technique today
• Base64 is a common encoding scheme because it is very easy to decode
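Python's `base64` module decodes standard Base64 in one call. For the custom character sets mentioned earlier, one approach is to translate the custom alphabet back to the standard one and then decode. The custom alphabet in the usage (lowercase before uppercase) is purely hypothetical. Padding `=` passes through untouched, so the sketch assumes the custom alphabet does not reuse `=`.

```python
import base64

STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def _check(alphabet: str) -> None:
    if len(alphabet) != 64 or len(set(alphabet)) != 64 or "=" in alphabet:
        raise ValueError("alphabet must be 64 distinct characters, excluding '='")


def b64decode_custom(encoded: str, alphabet: str = STANDARD_ALPHABET) -> bytes:
    """Map a custom alphabet back onto the standard one, then decode."""
    _check(alphabet)
    table = str.maketrans(alphabet, STANDARD_ALPHABET)
    return base64.b64decode(encoded.translate(table))


def b64encode_custom(data: bytes, alphabet: str = STANDARD_ALPHABET) -> str:
    """Encode with the standard alphabet, then map onto the custom one."""
    _check(alphabet)
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.translate(str.maketrans(STANDARD_ALPHABET, alphabet))
```

With the swapped-case alphabet `string.ascii_lowercase + string.ascii_uppercase + string.digits + "+/"`, `b"hello"` encodes to `"AgvSBg8="` instead of the standard `"aGVsbG8="`. A standard decoder would produce garbage from it.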
Static Analysis: PE File Format
• The PE data structure contains all the information the Windows OS loader needs to manage executable code:
– .text: instructions the CPU executes
– .rdata: imports and exports
– .data: global data
– .rsrc: resources (icons, images, strings, etc.)
• Useful information in the PE header:
– Imports and exports: give an idea of the malware’s functionality
– Compilation time, language settings, and strings
– Section names: packed code can have non-standard section names
• Tools to analyze PE header: pescanner.py, CFF Explorer, python pefile, Resource Hacker, Dependency Walker, LordPE, etc.
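To show where those header fields live, here is a stdlib-only sketch. It reads the COFF file header (machine type, section count, compile timestamp, characteristics) and the section names. It assumes a well-formed file and does minimal bounds checking. pefile and CFF Explorer handle the many edge cases this skips. Keep in mind that malware authors can forge the compile timestamp.

```python
import struct
from datetime import datetime, timezone

IMAGE_FILE_DLL = 0x2000  # Characteristics flag set for DLLs


def parse_pe_header(data: bytes) -> dict:
    """Pull a few triage fields out of a PE file's headers."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header: 20 bytes immediately after the "PE\0\0" signature.
    (machine, num_sections, timestamp, _symtab, _nsyms,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    # The section table follows the optional header; each entry is 40 bytes
    # and starts with an 8-byte, null-padded name.
    table = pe_offset + 4 + 20 + opt_header_size
    sections = []
    for i in range(num_sections):
        raw = data[table + 40 * i : table + 40 * i + 8]
        sections.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return {
        "machine": hex(machine),  # 0x14c = x86, 0x8664 = x64
        "compile_time": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "sections": sections,
    }
```

Usage: `parse_pe_header(open(path, "rb").read())`. Section names like UPX0/UPX1 in the result tie back to the packer checks earlier.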
Windows API Calls
• When performing advanced static or dynamic analysis it’s important to have a good understanding of Windows API calls
• By looking at the imported functions within the PE header you can see which Windows API functions the PE file wants to utilize
• Recognizing API calls gives you a quick idea of the malware’s functionality, both when reading strings output and during advanced static analysis with a disassembler
• MSDN is an excellent resource for Windows API calls: search Google for “API_Function MSDN”
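Part of that recognition can be scripted. The sketch below maps a handful of well-known Windows APIs to the capability they usually suggest. The table is illustrative and far from complete, and a hit is a lead to confirm on MSDN and in the disassembler, not a verdict. The helper strips the A/W suffix that marks ANSI vs. wide-character variants. With pefile, import names come from `pe.DIRECTORY_ENTRY_IMPORT` as bytes (or None for ordinal-only imports), so they need decoding first.

```python
# Illustrative mapping only; many of these APIs also have benign uses.
CAPABILITIES = {
    "VirtualAllocEx": "process injection",
    "WriteProcessMemory": "process injection",
    "CreateRemoteThread": "process injection",
    "RegSetValueEx": "registry modification (possible persistence)",
    "CreateService": "service creation (possible persistence)",
    "InternetOpenUrl": "network communication",
    "URLDownloadToFile": "file download",
    "SetWindowsHookEx": "hooking (possible keylogging)",
    "GetAsyncKeyState": "keystroke polling (possible keylogging)",
}


def base_name(api: str) -> str:
    """Strip the ANSI (A) / wide (W) suffix if the unsuffixed name is known."""
    if api[-1:] in ("A", "W") and api[:-1] in CAPABILITIES:
        return api[:-1]
    return api


def triage_imports(imports):
    """Group imported API names by the capability they hint at."""
    found = {}
    for api in imports:
        capability = CAPABILITIES.get(base_name(api))
        if capability:
            found.setdefault(capability, []).append(api)
    return found
```

A sample importing VirtualAllocEx, WriteProcessMemory and CreateRemoteThread together is a classic injection pattern, which is why those three share a label.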
Windows API: MSDN Example
• The parameters control how the function behaves when it is called
• The return type is the type of value the function returns to its caller
Windows API: Disassembly
• On 32-bit x86, parameters are pushed onto the stack from last to first; because the stack is Last In, First Out (LIFO), the first parameter ends up on top, which is why the pushes appear in reverse order in the disassembly
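A Python list can stand in for the stack to show the reversal. This sketch assumes the 32-bit x86 stdcall convention used by most Win32 APIs, where arguments are pushed right to left. On x64 Windows, the first four integer arguments go in RCX, RDX, R8 and R9 instead. `MessageBoxA(hWnd, lpText, lpCaption, uType)` serves only as an example prototype.

```python
def stdcall_pushes(function: str, args: list[str]) -> list[str]:
    """Emit the push/call sequence a 32-bit compiler generates for a stdcall call."""
    # Arguments are pushed last-to-first (right to left)...
    lines = [f"push {arg}" for arg in reversed(args)]
    lines.append(f"call {function}")
    return lines


def stack_after_pushes(args: list[str]) -> list[str]:
    """Model the stack as a list whose last element is the top."""
    stack = []
    for arg in reversed(args):
        stack.append(arg)
    # ...so the first parameter ends up on top, where the callee expects it.
    return stack
```

`stdcall_pushes("MessageBoxA", ["hWnd", "lpText", "lpCaption", "uType"])` lists `push uType` first and `push hWnd` last, just before `call MessageBoxA`. This is the order you read in the disassembly.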
Wake Up
• Okay, that was likely starting to bore some people – SORRY
• Let’s move on to dynamic analysis, which is flashier
Getting Infected
• Double-clicking the executable doesn’t always work
– Sometimes you need to load the malware as a DLL (rundll32.exe or regsvr32.exe) or install it as a service
– Interact with the system like a normal user: the malware may be waiting for a certain application to open so it can inject code into it (e.g., Internet Explorer)
– It could require a CLI argument: one sample required <filename> /install in order to actually run
– Static analysis is normally required to determine CLI switches
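The launch options above can be captured as command lines in a small helper. This is a sketch meant only for an isolated analysis VM. The paths, the export name and the service name in the usage are hypothetical, and the right choice for a given sample comes from static analysis (exports, strings, CLI switches).

```python
def launch_commands(sample, kind, export=None, service_name=None, args=()):
    """Return the command line(s) that start a sample in an analysis VM."""
    if kind == "exe":
        # e.g. a sample that only runs with a switch such as "/install"
        return [[sample, *args]]
    if kind == "dll_export":
        # rundll32 expects "<dll path>,<exported function>" as one argument
        if not export:
            raise ValueError("rundll32 needs an export name")
        return [["rundll32.exe", f"{sample},{export}", *args]]
    if kind == "dll_register":
        # regsvr32 calls the DLL's DllRegisterServer export; /s suppresses dialogs
        return [["regsvr32.exe", "/s", sample]]
    if kind == "service":
        # sc.exe wants "binPath=" and its value as separate tokens
        if not service_name:
            raise ValueError("a service name is required")
        return [
            ["sc.exe", "create", service_name, "binPath=", sample],
            ["sc.exe", "start", service_name],
        ]
    raise ValueError(f"unknown launch kind: {kind}")
```

Inside the VM, each inner list can be passed to `subprocess.run`. For example, `launch_commands(r"C:\lab\sample.exe", "exe", args=["/install"])` reproduces the `/install` case from the slide.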
SysInternals Tool Suite
• If I could pick just one tool, I’d pick the 50+ in the Sysinternals tool suite
• Tools written by Mark Russinovich, who now works for Microsoft
• Process Explorer, Process Monitor, Autoruns, etc.
Process Explorer
Process Monitor
• Very verbose tool that generates a lot of events
• Filtering is required to make sense of the data
Process Monitor Cont.
• Press Ctrl+L to bring up the filter dialog box
• Quick filters:
– Operation is WriteFile
– Category is Write
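Procmon can also save captured events as CSV, which lets the same WriteFile filter be scripted and repeated across runs. The sketch below assumes the default export columns `Time of Day`, `Process Name`, `PID`, `Operation`, `Path`, `Result` and `Detail`. Adjust the names if your column layout differs. It counts which paths a given process wrote to.

```python
import csv
from collections import Counter


def write_events(lines, process_name=None):
    """Yield WriteFile rows from Procmon CSV lines, optionally for one process."""
    for row in csv.DictReader(lines):
        if row.get("Operation") != "WriteFile":
            continue
        if process_name and row.get("Process Name") != process_name:
            continue
        yield row


def paths_written(lines, process_name=None):
    """Count WriteFile events per path."""
    return Counter(row["Path"] for row in write_events(lines, process_name))
```

With a saved file, something like `open(path, newline="", encoding="utf-8-sig")` passed to `paths_written` should work. The `utf-8-sig` codec simply tolerates a byte-order mark if one is present.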
Malware Persistence - Autoruns
• This really is the key to identifying malware: how does it gain persistence?
• Autoruns can help enumerate persistence mechanisms:
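A common workflow is to snapshot the autostart locations on a clean VM, snapshot them again after infection, and diff the two. The sketch below does that diff. The field names `location`, `entry` and `image_path` are hypothetical. If you export with the command-line autorunsc to CSV, map its columns onto them. Comparisons are case-insensitive because Windows registry and file paths are.

```python
# Example autostart location (the per-machine Run key).
RUN_KEY = r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run"


def _key(entry):
    """Case-insensitive identity of an autostart entry."""
    return tuple(entry[field].lower() for field in ("location", "entry", "image_path"))


def new_autostart_entries(baseline, after):
    """Return entries present after infection but absent from the clean baseline."""
    known = {_key(e) for e in baseline}
    return [e for e in after if _key(e) not in known]
```

Whatever survives the diff is where persistence analysis starts, for example a new Run value pointing into a user's AppData folder.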
Monitoring Network Activity
• Some interesting network indicators of malware are:
– SYNs out to an IP or domain
– UDP traffic to IP or domain
– HTTP GET/POST requests
– DNS Queries
– Connection attempt timing is important: a regular interval (every 1 min, every 30 mins, etc.) can indicate beaconing
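Regular connection timing can be checked mechanically. This minimal sketch assumes you already have (timestamp in seconds, destination) pairs, for example from a packet capture or proxy log. The 10% tolerance is an arbitrary starting point. Some malware adds random jitter specifically to defeat this kind of check.

```python
from collections import defaultdict
from statistics import median


def beacon_interval(timestamps, tolerance=0.1):
    """Return the typical gap in seconds if connections look periodic, else None."""
    if len(timestamps) < 3:
        return None  # too few connections to call it a pattern
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    if typical <= 0:
        return None
    # Periodic if every gap is within `tolerance` (10% by default) of the median.
    if all(abs(gap - typical) <= tolerance * typical for gap in gaps):
        return typical
    return None


def find_beacons(events, tolerance=0.1):
    """Group (timestamp, destination) pairs and report periodic destinations."""
    by_destination = defaultdict(list)
    for timestamp, destination in events:
        by_destination[destination].append(timestamp)
    results = {}
    for destination, times in by_destination.items():
        interval = beacon_interval(times, tolerance)
        if interval is not None:
            results[destination] = interval
    return results
```

Connections at 0, 1800, 3600 and 5400 seconds report an interval of 1800, matching the "every 30 mins" case above.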
Automation? Sandboxes
• The basic dynamic analysis we have covered so far can be automated
• Sandboxes are a good tool in any malware analyst’s toolbox, with pros and cons:
– Pros: fast, speeds up analysis, saves time
– Cons: misses details, can be fooled
• Sandboxes can be open source or commercial:
– A really good free option is Cuckoo Sandbox
• Install Tutorial: http://www.primalsecurity.net/im-cuckoo-for-malware-with-a-spice-of-reverse-engineering/
Summary
• Malware analysis requires both static and dynamic analysis techniques to accurately enumerate indicators of compromise
• As with any automated tool, an analyst will need to be able to validate findings manually