基於機器學習的惡 意軟體分類實作 microsoft malware … › 2016 › pacific ›...

81
Machine Learning Protect against tomorrow’s threats 基於機器學習的惡 意軟體分類實作: Microsoft Malware Classification Challenge 經驗談 Trend Micro ch0upi miaoski Kyle Chung 2 Dec 2016

Upload: others

Post on 03-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

基於機器學習的惡意軟體分類實作:MicrosoftMalwareClassificationChallenge經驗談Trend Microch0upimiaoskiKyle Chung2 Dec 2016

Page 2: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threatsch0upi

• StaffengineerinTrendMicro• MachineLearning+DataAnalysis• Threatintelligenceservices• KDDCup 2014+KDDCup 2016:Top10• GoTrend:6th inUECCup2015

Page 3: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threatsmiaoski

• SeniorthreatresearcherinTrendMicro• Threatintelligence• SmartCity• SDR• Arduino+RPi makers• Lovescats

Page 4: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

4

Outline

• WhyMalwareClassification?• MachineLearning• MicrosoftChallenge• HowtoSolveit?• Conclusion

Page 5: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MALWARECLASSIFICATION

Page 6: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threatsWhat’sMalwareClassification?

• Identifymalwarefamily

Page 7: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threatsWhyMalwareClassification?

• Knowhowtoclean• Possibleattribution• Setproperpriority

Page 8: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threatsCurrentMalwareClassification

• Manuallygeneratedbyresearchers• Usesignaturetofingerprintmalware• YARArules

Page 9: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threatsChallenge

• Manualprocessè wrongfamily• moreandmoremalwarefamilies

• Verylargevolume• daily1M+samples

• Increasingsignatures• Slowinscanning+needmorestorage

Page 10: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threatsBSidesLV 2016:VirusShare

• JohnSeymour,LabelingtheVirusShare Corpus:LessonsLearned,BSidesLV 2016

• VirusShare Corpus:~20Mfiles

Page 11: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threatsMachineLearningfor

• Automationofmalwarefamilyidentification• Saveresearcher’seffort

Page 12: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MACHINE LEARNING

Page 13: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

13

MachineLearningSteps

• PrepareData• GenerateFeature• TrainModel• MakePrediction• Evaluate

Page 14: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

14

FruitClassification

• Apple

• Banana

Page 15: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

15

FruitFeatures

• Color• Shape• Size• Weight

Page 16: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

16

Learning

• Apple

• Banana

Page 17: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

17

LearnedModelofFruit

• Apple• Color:Red• Shape:Round

• Banana• Color:Yellow• Shape:Long

Page 18: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

18

NewFruitComing…

• Apple?Banana?

Page 19: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

19

PredictionofFruits

• Fruit1• Color:Red=>Apple• Shape:Round=>Apple

• Fruit2• Color:Yellow=>Banana• Shape:Long=>Banana

Page 20: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

20

EvaluationofFruit

• Accuracy:(9+9)/20=90%Apple Banana

Apple 9 1Banana 1 9Total 10 10

Page 21: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

21

MachineLearningSteps

• PrepareData• GenerateFeature• TrainModel• MakePrediction• Evaluate

Page 22: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

22

MachineLearningis...

• Mathematicalmethodsandalgorithms• Fromhistoricallabelleddata• Findaseparatinghyperplane• Applyitonfuturedata

Page 23: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

23

Featureis...

• Measurablepropertyofaphenomenonbeingobserved• Usetodescribeentries

• Featurevector• inputofmachinelearningalgorithm

• Sourceoffeatures• dataexploring• domainknowledge

Page 24: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

24

Modelis...

• Amathematicaldescriptionofhowtoclassifythedata

• Parameterstunedbycertainalgorithm• training

• Usedtomakeprediction

Page 25: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

25

Prediction

• Identifytheclassofnewentities• Withtrainedmodelfromtrainingdata

Page 26: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

26

Evaluation

• Reviewmodelresultbysomemeasurements• Crossvalidation• Evaluationfunctions

• Accuracy• logloss• AUC• precision,recall,F1

Page 27: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

27

GlueLanguage

• GluethestepsofMachineLearningLearning• Batchrunningforlargeamountofdata• IntegrationwithHadoop,Spark• Richlibraries/Algorithmsupport• Easytodevelop/learn

Page 28: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

28

scikit learn

• Opensourcemachinelearninglibraryforpython• Variousclassification,regressionandclustering

algorithms• InteroperatewithNumPy,SciPy,andunderlying

BLAS

Page 29: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

29

scikit learncheat-sheet

Page 30: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

30

SupportedAlgorithminscikit learn

• ClassificationAlgorithm• LogisticRegression:linear_model.LogisticRegression()• SVM:svm.SVC()• RandomForest:ensemble.RandomForestClassifier()

• Interface• fit(X,Y):trainmodel• Yp=predict(X):makeprediction

Page 31: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

31

EvaluationFunctionsinscikit learn

• Evaluationfunctions• metrics.accuracy_score()• metrics.log_loss()• metrics.auc()• metrics.f1_score()

• metrics.confusion_matrix()• metrics.classification_report()

Page 32: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MICROSOFTMALWARECLASSIFICATIONCHALLENGE

Page 33: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

33

MicrosoftMalwareClassificationChallenge

• HostedbyWWW2015/BIG2015• MicrosoftMalwareProtectionCenter

MicrosoftAzureMachineLearningMicrosoftTalentManagement

• PEHexdump &Disassembled• Training:10,868(compressed:17.5GB)• Testing:10,873(compressed:17.7GB)

Page 34: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

34

9 Classes

Category Count1 Ramnit 15412 Lollipop 24783 Kelihos_ver3 29424 Vundo 4755 Simda 426 Tracur 7517 Kelihos_ver1 3988 Obfuscator.ACY 12289 Gatak 1013

Total 10868

Page 35: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

35

Class:Ramnit

• Stealsensitivepersonalinformation• Infectedthroughremovabledrivers• Copyitselfusingahard-codedname,orwitha

randomfilenametoarandomfolder• Injectcodesintosvchost.exe• InfectsDLL,EXE,HTML

Page 36: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

36

Class:Lollipop

• Anadwareshowsadswhenbrowsingweb• Bundlewiththird-partysoftware• AutorunwhenWindowsstarting

Page 37: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

37

Class:Kelihos

• ATrojanfamilydistributesspamemailwithmalwaredownloadlink

• CommunicatewithC&Cserver• SomevariantsinstallWinPcap tospynetwork

activity

Page 38: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

38

PEHexdump w/oHeader

Page 39: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

39

IDAProDump

Page 40: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

40

IDAProDump

Page 41: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

41

IDAProDump

Page 42: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

42

EvaluationFunction

pij is the submitted probability of sample i is class jyij=1 if sample i is class j,yij=0 for others

𝑙𝑜𝑔𝑙𝑜𝑠𝑠 = −1𝑁))𝑦𝑖𝑗log(𝑝𝑖𝑗)

4

567

8

967

Page 43: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

43

EvaluationExample

• Submission00000000,0.5,0.5,0,0,0,0,0,0,0,000000001,0,0,0.5,0.5,0,0,0,0,0,000000002,0,0,1,0,0,0,0,0,0,0

• logloss =- (log(0.5)+log(0)+log(1))/3• log(0)=>log(1e-15)

Page 44: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

44

LeaderBoard

• Publicvs.Private

Page 45: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

45

LeaderBoard

Public

Private

Page 46: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

HOWTOSOLVEIT?

Page 47: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

47

FirstFeatureSet

• Binarysize• Hexcount• Stringlengthstats• TLSH

Page 48: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

48

BinarySize

Category Avg. Size1 Ramnit 14821702 Lollipop 58295303 Kelihos_ver3 89826304 Vundo 11209505 Simda 45523306 Tracur 18011507 Kelihos_ver1 50519008 Obfuscator.ACY 8271189 Gatak 2555070

0

1000000

2000000

3000000

4000000

5000000

6000000

7000000

8000000

9000000

10000000

1 2 3 4 5 6 7 8 9

Page 49: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

49

HexCount

• CountofHEX• 00,01,02,…,FE,FF,??• 257dimensions

• 1-gram

Page 50: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

50

HexCountDistribution

0009121b242d36 3f 48515a636c757e879099a2abb4bdc6 cf d8e1ea f3 fc

0009121b242d36 3f 48515a636c757e879099a2abb4bdc6 cf d8e1ea f3 fc

0009121b242d36 3f 48515a636c757e879099a2abb4bdc6 cf d8e1ea f3 fc

1. Ramnit

2. Lollipop

3. Kelihos_v3

Page 51: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

51

HexCountConfusionMatrix?

Page 52: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

52

StringStats

• String:printablecharswherelength>4• Stringcount,avg.length,maxlength

0

2000

4000

6000

8000

10000

12000

14000

1 2 3 4 5 6 7 8 902468

101214161820

1 2 3 4 5 6 7 8 9

Avg. Count Avg. length

Page 53: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

53

TLSH

• TrendMicroLocalitySensitiveHash• Fuzzymatchingforsimilaritycomparison• GetthemostsimilarclassbyvotingofTop5similar

filesfromtrainingdata

Page 54: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

54

TLSH

0

500

1000

1500

2000

2500

3000

3500

1 2 3 4 5 6 7 8 9

Page 55: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

55

MoreFeatures

• HEXn-gram• APIcall• Importtable• Instruction• Domainknowledge

Page 56: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

56

2-gram/3-gram

• 2-gram:(256+1)^2=66,049• 3-gram:(256+1)^3=16,974,593

Page 57: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

57

HEX2-gram/3-gram

• Important2-gramExample• Featureselection:reducefeaturesize

BiHEX 1. Ramnit 2. Lollipop 3. Kelihos_ver397 86 1.412 2.047 26.6514b e5 1.718 0.722 13.201f7 99 1.746 12.539 13.60675 08 228.09 288.78 13.1684e 47 146.318 12.159 13.512

Page 58: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

58

APICall

• APIusedinPE

API 1. Ramnit 2. Lollipop 3. Kelihos_ver3IsWindow() 0.164 0.257 0.987DispatchMessageA() 0.159 0.845 0.987GetCommandLineA() 0.355 0.981 0.025DllEntryPoint() 0.656 0 0GetIconInfo() 0.023 0 0.936

Page 59: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

59

ImportTable

• Alookuptableforcallingfunctionsinothermodule

1. Ramnit 2. Lollipop 3. Kelihos_ver3KERNEL32.dll KERNEL32.dll USER32.dllUSER32.dll USER32.dll KERNEL32.dllADVAPI32.dll ADVAPI32.dll MSASN1.dllole32.dll OPENGL32.dll UXTHEME.dllOLEAUT32.dll OLEAUT32.dll CLBCATQ.dllmsvcrt.dll GDI32.dll DPNET.dllAPPHELP.dll WS2_32.dll NTSHRUI.dll

Page 60: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

60

OtherInfoinImportTable

• NumberofdistinctDLL

0

1

2

3

4

5

6

7

8

9

1 2 3 4 5 6 7 8 9

Page 61: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

61

InstructionFrequency

• Verypowerful

instruction 1. Ramnit 2. Lollipop 3. Kelihos_ver3imul 86.768 2257.3 0.002movzx 289.17 118.79 0sbb 68.815 17.375 4.746jnz 1154.8 154.57 7.842mov 12336.6 7059.8 158.94

Page 62: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

62

MoreDomainKnowledge

• Segment• Packer• Othertypeofbinary

Page 63: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

63

Segments

• Commonsegmentname

• Uniquesegmentname1. Ramnit 2. Lollipop 3. Kelihos_ver3_data _text _rdata_text _data _text_rdata _rdata _data_bss _zenc_gnu_deb_tls

Page 64: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

64

OtherInfofromSegments

• NumberofSegments

Page 65: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

65

Packer

• CommonsegmentnameofPacker• UPX0/UPX1onlyinclass8.Obfuscator.ACY

Page 66: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

66

OtherTypeofBinary

• RARfiles

Page 67: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

67

OtherTypeofBinary

• MicrosoftOfficefiles

Page 68: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

68

Ensemble:LinearBlending

• Combinetheresultfromseveralmodels• Voteofmodels

Page 69: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

WORKOFWINNINGTEAM

Page 70: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

70

Features

• Instructionn-gram• ASMpixelmap

http://blog.kaggle.com/2015/05/26/microsoft-malware-winners-interview-1st-place-no-to-overfitting/

Page 71: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

71

Features

• ASMpixelmap(intensityoffirst1000bytes)

Page 72: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

72

xgboost

• Gradientboostingpackage• WidelyusedinKagglecompetition

Page 73: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

CONCLUSION

Page 74: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

74

PhysicalMeaningofFeatures

• Hexn-gram• Opcode + imm/addr

• Instructionn-gram• Opcode

Page 75: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

75

HappyEnding?

• Welcometotherealworld!• Newmalwarefamily• Mis-labelling

• Mechanismtomitigatetheissues.

Page 76: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

76

TrendMicroXGen

Page 77: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

77

TrendMicroMLContest

• MalwareIdentificationChallenge• 134teams,626players,from6+countries• Real-timescoring

Page 78: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

78

HowtoImproveModel

• Usedomainknowledge• Unpack, unzip ...

• Improvefeaturerepresentation• Distinctive features for classes which you don’t do

well• Regulateoverfitting

Page 79: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

79

HowtoImproveModel

• Findwhichitemscannotbecoveredbymodel• Adjustcurrentfeatures• Findnewfeatures• Tuningalgorithmparameters• Usedifferentalgorithm• Ensemble/Blending

Page 80: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

MachineLearning

Protect againsttomorrow’s

threats

80

LocalLibraryvs.CloudPlatform

Cloudplatformisnotnecessarilyeasier• Glue&Integration

• Data(pre-)processing• Modeltraining/prediction• Evaluation

• DiversityofMLalgorithms• Parametertuning

Page 81: 基於機器學習的惡 意軟體分類實作 Microsoft Malware … › 2016 › pacific › 0composition › pdf...Machine Learning Protect against tomorrow’s threats Machine Learning

MachineLearning

Protect againsttomorrow’s

threats

THANKYOU