Centre de Comunicacions Avançades de Banda Ampla (CCABA)
Universitat Politècnica de Catalunya (UPC)
Identification of Network Applications based on Machine Learning Techniques
COST-TMA Meeting, Samos 2008
Valentín Carela-Español
Pere Barlet-Ros
Josep Solé-Pareta
{vcarela, pbarlet, pareta}@ac.upc.edu
Outline
Scenario and objectives
Existing solutions
  Well-known ports
  Payload based (pattern matching)
  Machine Learning
    – Supervised
    – Unsupervised
Proposed method
Results
Conclusions and Future work
Scenario and objectives
Scenario:
  SMARTxAC: Traffic Monitoring and Analysis System for the Anella Científica
  Real-time classification
  Independent of packet contents
  High-speed link
Objectives:
  Develop an ML technique to identify applications in SMARTxAC
  Automate the ML training phase
  Adapt our solution to Netflow
  Study how sampling affects the solution
Existing Solutions
Well-known ports
  + Computationally lightweight
  - Very low accuracy
Payload based (pattern matching)
  + High accuracy
  - Packet contents are required
  - Computationally expensive
  - Content encryption
  - Privacy legislation
Consequence: neither is a feasible solution
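The trade-off of the well-known ports approach can be seen in a minimal sketch. The port table below is a small illustrative subset (the entries follow the standard assignments), and `classify_by_port` is a hypothetical helper, not part of SMARTxAC.

```python
# Illustrative subset of well-known port assignments (not a complete table).
WELL_KNOWN_PORTS = {
    25: "smtp",
    53: "dns",
    80: "http",
    443: "https",
}


def classify_by_port(sport, dport):
    """Return the application registered for either port, or 'unknown'."""
    for port in (dport, sport):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


# A single dictionary lookup per flow: computationally lightweight.
print(classify_by_port(51234, 80))
# A flow on dynamic ports cannot be identified at all.
print(classify_by_port(51234, 40001))
```

The lookup is cheap, but any application that runs over port 80 is labeled as http, and applications on dynamic ports stay unknown, which is where the low accuracy comes from.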
Existing Solutions
Machine Learning Techniques
  - Difficult training phase
  + Packet contents are not required
  + High accuracy
  + Computationally viable
Two main possibilities:
  Supervised methods:
    + Better accuracy for expected classes
    - Need a complete pre-labeled dataset
    - Difficult to detect when retraining is needed
    - No detection of new classes
  Unsupervised methods:
    + Do not need a fully labeled dataset
    + Automatic detection of new classes
    + Better accuracy for new classes
Proposed method
Supervised identification based on the C4.5 algorithm
  Developed by Ross Quinlan as an extension of ID3
  Based on the construction of a classification tree
Training set
  Actual traffic flows
  Pairs <flow features, application>
  Feature vector contains relevant characteristics of traffic flows
  Application is identified using L7-filter
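C4.5 grows the classification tree by picking, at each node, the split with the best gain ratio (information gain divided by the split information). The sketch below computes that criterion for one numeric threshold on one feature. The training pairs and their values are hypothetical, and the code covers only the split criterion, not a full C4.5 implementation (no pruning, no recursive tree construction).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on a numeric feature."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    total = len(labels)
    weights = (len(left) / total, len(right) / total)
    info_gain = entropy(labels) - weights[0] * entropy(left) - weights[1] * entropy(right)
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


# Hypothetical training set: pairs <flow features, application>.
training_set = [
    ({"dport": 80, "bytes_out": 512}, "http"),
    ({"dport": 80, "bytes_out": 640}, "http"),
    ({"dport": 25, "bytes_out": 2048}, "smtp"),
    ({"dport": 25, "bytes_out": 4096}, "smtp"),
]
dports = [features["dport"] for features, _ in training_set]
apps = [app for _, app in training_set]
print(gain_ratio(dports, apps, threshold=25))
```

In this toy set the threshold `dport <= 25` separates the two applications completely, so it would be preferred over any split that leaves the classes mixed.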
Machine Learning process
1) Collection of the training set
   • Representative flows of the environment to be monitored
2) Automatic flow classification → application class
   • Pattern matching using L7-filter
   • It can be simplified if an artificial training set is used in 1)
3) Feature extraction from the training flows
4) Construction of a C4.5 classification tree
   • E.g. using Weka
5) Deployment of the tree obtained in 4) in the monitoring system
6) Retraining of the system
   • Starting from phase 1)
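Step 3) can be sketched as an aggregation of per-packet records into one feature vector per flow. The feature names come from the feature subsets reported in the results, but their exact definitions are assumptions here: `bytes_out` is taken as the bytes sent in the outgoing direction, `avg_out_size` and `avg_in_size` as mean packet sizes per direction, and `push_in` as the number of incoming packets with the PSH flag set. The flow key is simplified to `(sport, dport)`; a real system would use the full 5-tuple.

```python
from collections import defaultdict


def extract_features(packets):
    """Aggregate packets into per-flow features.

    Each packet is a tuple (flow_id, direction, size, psh) where flow_id is
    (sport, dport), direction is "out" or "in", size is in bytes and psh is
    True when the TCP PSH flag is set.
    """
    counters = defaultdict(
        lambda: {"bytes_out": 0, "pkts_out": 0, "bytes_in": 0, "pkts_in": 0, "push_in": 0}
    )
    for flow_id, direction, size, psh in packets:
        c = counters[flow_id]
        c["bytes_" + direction] += size
        c["pkts_" + direction] += 1
        if direction == "in" and psh:
            c["push_in"] += 1

    features = {}
    for (sport, dport), c in counters.items():
        features[(sport, dport)] = {
            "sport": sport,
            "dport": dport,
            "bytes_out": c["bytes_out"],
            "avg_out_size": c["bytes_out"] / c["pkts_out"] if c["pkts_out"] else 0.0,
            "avg_in_size": c["bytes_in"] / c["pkts_in"] if c["pkts_in"] else 0.0,
            "push_in": c["push_in"],
        }
    return features


# Hypothetical packets of a single flow.
packets = [
    ((51234, 80), "out", 100, False),
    ((51234, 80), "out", 300, False),
    ((51234, 80), "in", 1500, True),
]
print(extract_features(packets)[(51234, 80)])
```

Pairing each of these feature vectors with the label produced in step 2) gives the training set used to build the tree in step 4).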
Accuracy
Netflow Accuracy
Accuracy
Features Accuracy
· Best Normal Feature Subset: dport, bytes_out, avg_out_size, sport, avg_in_size, push_in
· Best Netflow Feature Subset: dport, bytes, push
How does sampling affect the identification?
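A rough illustration of why sampling distorts flow features, assuming systematic 1-in-N packet sampling (the slides do not state which sampling scheme was used). The helper names and packet sizes are hypothetical.

```python
def sample_one_in_n(packet_sizes, n, offset=0):
    """Keep every n-th packet of a flow, starting at position `offset`."""
    return packet_sizes[offset::n]


def estimate_bytes(sampled_sizes, n):
    """Scale the sampled byte count by n to estimate the original total."""
    return sum(sampled_sizes) * n


# Hypothetical packet sizes (bytes) of one long flow and one short flow.
long_flow = [100, 1500, 1500, 100, 1500, 1500, 100, 1500]
short_flow = [60, 60, 60]

sampled = sample_one_in_n(long_flow, n=4)
print(sum(long_flow), estimate_bytes(sampled, n=4))  # true vs. estimated bytes
print(sample_one_in_n(short_flow, n=4, offset=3))    # short flow is missed entirely
```

Byte counts can be rescaled but remain estimates, averages are computed from very few packets, and short flows may leave no sampled packet at all, so the classifier sees feature values that differ from the ones it was trained on.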
Conclusions and Future Work
Machine learning techniques are a good solution to identify applications
Identification in sampled scenarios is still an open problem
Future work:
  Find a more accurate automatic system to label the dataset
  Build early decision trees to identify the flow as soon as possible
  Find features that achieve higher accuracy and are more resilient to sampling
  Test with traces from other networks to check the generality of the solution
Thank you for your attention
Questions?