Hype or hope? Machine learning based security analytics for web applications
Lei Ding, Ph.D.
R&D Principal, Accenture Labs
Arlington, VA
Web applications
Expand attack surface of web-facing infrastructure
[Diagram: Organization’s web-facing infrastructure: Web Apps, App Servers, Databases, File Storage, Cloud Services]
Web API: Web Application Programming Interface
Middle layer in the connected digital world
[Diagram: APIs sit between Mobile Apps / Web Apps and App Servers, Databases, File Storage, Cloud Services]
API security
New attack surface
New threat vectors
New access models
Expect API breaches to accelerate
“By 2022 API abuses will be the attack vector most responsible for data breaches within enterprise web applications.” - Gartner Research
[Diagram: Web/Mobile Applications, APIs, Backend Services/Databases]
API use case example:
• Travel app requests flight information by making an API request
• Airline’s database receives the request
• Airline responds by providing flight information
• Travel app receives flight information
[Diagram: Mobile/Web Applications, API, Backend Systems/Services (airlines)]
Securing APIs
Publicly disclosed vulnerabilities
• CVE-2017-8304, Apr. 2017
– OAuth/playground/callback.html allows XSS with a crafted URI
• CVE-2017-6059, Apr. 2017
– Ping Identity OpenID Connect authentication module for Apache before 2.14 allows remote attackers to spoof page content via a malicious URL provided to the user, which triggers an invalid request
• CVE-2017-6413, Mar. 2017
– The "OpenID Connect Relying Party and OAuth 2.0 Resource Server" module before 2.1.6 allows remote attackers to bypass authentication via crafted HTTP traffic
• CVE-2017-4960, Mar. 2017
– Failure to handle exceptional conditions subjects OAuth clients to a denial of service attack
• CVE-2017-6062, Jan. 2017
– The "OpenID Connect Relying Party and OAuth 2.0 Resource Server" module before 2.1.5 allows remote attackers to bypass authentication via crafted HTTP traffic
… …
Standards
OAuth, OpenID Connect, JWT, mTLS
Securing APIs
Best Practices
Set up monitoring and alerting
Enforce security in the gateway
Use gateway traffic management
Restrict or block malicious APIs
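As one illustration of gateway traffic management, a gateway can throttle each client with a token bucket. The sketch below is a minimal example under stated assumptions: the class name TokenBucket, its parameters, and the explicit `now` argument are hypothetical and do not come from the talk or from any specific gateway product.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical per-client limiter: `rate` tokens per second, at most `capacity`."""
    rate: float
    capacity: float
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        # Start with a full bucket so a new client may burst up to `capacity`.
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (in seconds) may pass."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Four requests from one client, arriving at these timestamps (seconds).
bucket = TokenBucket(rate=1.0, capacity=2.0)
decisions = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 1.5)]
print(decisions)
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real deployment would more likely read a monotonic clock and keep one bucket per API key.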
Machine learning based API security analytics
Big data problem
Machine learning / big data analysis:
• Quantitative measures
• Real-time risk assessment
• Automated anomaly detection
• More security insights
Machine learning based risk assessment
API level risk score assessment
[Flow diagram]
• Historical API Data → Generate Profile → Baseline Profiles
• New API Request → Generate Profile → New API Profile
• New API Profile vs. Baseline Profiles → Deviation → Risk Score
• Risk Score → Allow Access / Alert Access / Alert & Block Access
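The profile, deviation, and risk-score flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature names, the z-score deviation measure, the exponential squashing, and the two thresholds are all hypothetical choices, not the model used in the talk.

```python
import math
from statistics import mean, pstdev

# Hypothetical numeric features that make up one API request profile.
FEATURES = ("payload_bytes", "params", "requests_per_min")


def build_baseline(history):
    """Baseline profile: per-feature (mean, std) over historical profiles."""
    baseline = {}
    for f in FEATURES:
        values = [h[f] for h in history]
        # Fall back to 1.0 when a feature never varies, so z-scores stay finite.
        baseline[f] = (mean(values), pstdev(values) or 1.0)
    return baseline


def risk_score(profile, baseline):
    """Squash the mean absolute z-score (the deviation) into the range [0, 1]."""
    deviation = mean(abs(profile[f] - mu) / sd for f, (mu, sd) in baseline.items())
    return 1.0 - math.exp(-deviation / 3.0)


def decide(score, alert_at=0.5, block_at=0.9):
    """Map a score to an action; both thresholds are illustrative assumptions."""
    if score >= block_at:
        return "alert_and_block"
    if score >= alert_at:
        return "alert"
    return "allow"


HISTORY = [
    {"payload_bytes": 200, "params": 3, "requests_per_min": 10},
    {"payload_bytes": 220, "params": 3, "requests_per_min": 12},
    {"payload_bytes": 180, "params": 4, "requests_per_min": 8},
    {"payload_bytes": 210, "params": 3, "requests_per_min": 11},
]
NEW_REQUESTS = {
    "typical": {"payload_bytes": 205, "params": 3, "requests_per_min": 10},
    "unusual": {"payload_bytes": 260, "params": 5, "requests_per_min": 15},
    "extreme": {"payload_bytes": 5000, "params": 40, "requests_per_min": 300},
}

baseline = build_baseline(HISTORY)
for name, profile in NEW_REQUESTS.items():
    print(name, decide(risk_score(profile, baseline)))
```

A continuous score, rather than a yes/no label, is what lets the same model drive three different responses by moving the thresholds.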
Deployment use case
With existing API management tool/gateway
• Offline training uses historical logs
• Real-time risk assessment on new incoming API requests
• Generate reports on security alerts and API blocking/limiting recommendations to API Policy Management/Custom Policy module
[Diagram: Users, API Management Tool/Gateway, Backend System (Server, Database); API request / API response; to_resource() / from_resource(). Components: Offline Training on Logs (.json), Online Detection, Runtime Monitor, Alerts, Custom Policy. Outputs: Risk Assessment, Recommendation, Security Alert Report, Risk Assessment Report.]
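The split between offline training on JSON logs and online detection on incoming requests might look like the sketch below. It is a deliberately simple stand-in for a trained model: the log fields (client, method, path) and the block/limit rules are assumptions made for illustration, not the format or policy of any particular API gateway.

```python
import json
from collections import defaultdict

# Hypothetical gateway log lines, one JSON object per line.
LOG_LINES = [
    '{"client": "app-1", "method": "GET", "path": "/flights"}',
    '{"client": "app-1", "method": "POST", "path": "/bookings"}',
]


def train_offline(log_lines):
    """Offline training: record which (method, path) calls each client has made."""
    seen = defaultdict(set)
    for line in log_lines:
        rec = json.loads(line)
        seen[rec["client"]].add((rec["method"], rec["path"]))
    return seen


def assess_online(request, seen):
    """Online detection: return an alert dict for a new request, or None."""
    known = seen.get(request["client"])
    if known is None:
        return {"client": request["client"], "reason": "unknown client",
                "recommendation": "block"}
    if (request["method"], request["path"]) not in known:
        return {"client": request["client"], "reason": "unseen call",
                "recommendation": "limit"}
    return None


model = train_offline(LOG_LINES)
incoming = [
    {"client": "app-1", "method": "GET", "path": "/flights"},
    {"client": "app-1", "method": "DELETE", "path": "/bookings"},
    {"client": "app-9", "method": "GET", "path": "/flights"},
]
# The report lists only the requests that raised an alert.
report = [alert for alert in (assess_online(r, model) for r in incoming) if alert]
print(json.dumps(report, indent=2))
```

The report entries play the role of the blocking/limiting recommendations that would be handed to a policy module.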
Machine learning based risk assessment
• Quantitatively measure the likelihood of a new request being anomalous (risk score)
• Provide real-time API risk assessment
• Aid decision making in the presence of (potential) attacks
Learn relationships among events produced by API requests
An example: legit admin registration
Learn relationships among events produced by API requests
Privilege escalation attack (11/2018, CVE-2018-19207)
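One simple way to learn relationships among events is to count which event-to-event transitions occur in legitimate traces and flag transitions never seen before. The sketch below assumes each API request yields an ordered list of event names; every event name here is hypothetical and is not taken from the talk or from the CVE record.

```python
from collections import Counter

# Hypothetical event traces produced by legitimate admin registrations.
LEGIT_TRACES = [
    ["admin_login", "open_user_form", "create_user", "assign_role_admin"],
    ["admin_login", "open_user_form", "create_user", "assign_role_editor"],
]


def learn_transitions(traces):
    """Count every adjacent (event, next_event) pair seen in the training traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def unseen_transitions(trace, counts):
    """Return the adjacent pairs in `trace` that never occurred during training."""
    return [pair for pair in zip(trace, trace[1:]) if counts[pair] == 0]


counts = learn_transitions(LEGIT_TRACES)

# An admin account is created without the usual login and form steps.
suspect = ["anonymous_request", "update_settings", "create_user", "assign_role_admin"]
print(unseen_transitions(suspect, counts))
```

The final step of the suspect trace is the same as in a legitimate registration; only the path leading to it is new, which is why the relationship between events matters more than any single event.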
ML based solutions
Challenges
• Incomplete training data
– Lack of training data for various attacks
– Some classes are severely undersampled because collecting examples is costly when the class occurs rarely (e.g., zero-day attacks)
• Web-based exploits vs. rare benign web events
Security analytics
Defense in depth using different sources of data
• Different data sources may show new components of the attack that are not visible within the data from the API gateway
[Diagram: User (Mobile Apps, Web Apps), API Gateway, Backend System (Server, Database); API request / API response; to_resource() / from_resource(). Data sources: API Requests, File Activity Logs, Database Logs.]
Gateway paths shown in the diagram:
/endpoint1/collection1/resource1/resource2
/collection2/resource1
…/endpoint2
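Combining data sources can be illustrated by joining gateway records with database log entries. The sketch assumes both logs carry a shared request ID, which is an assumption a real deployment would have to verify; the field names and the "GET that caused a write" rule are likewise hypothetical.

```python
WRITE_OPS = {"INSERT", "UPDATE", "DELETE"}

# Hypothetical records from two sources, linked by an assumed request_id field.
GATEWAY_LOG = [
    {"request_id": "r1", "method": "GET", "path": "/collection1/resource1"},
    {"request_id": "r2", "method": "GET", "path": "/collection2/resource1"},
]
DATABASE_LOG = [
    {"request_id": "r1", "op": "SELECT"},
    {"request_id": "r2", "op": "SELECT"},
    {"request_id": "r2", "op": "UPDATE"},
    {"request_id": "r7", "op": "DELETE"},
]


def correlate(gateway_log, database_log):
    """Group database events under the gateway request that triggered them."""
    by_request = {g["request_id"]: {"gateway": g, "db": []} for g in gateway_log}
    orphans = []  # database activity with no matching gateway request
    for event in database_log:
        entry = by_request.get(event["request_id"])
        if entry is None:
            orphans.append(event)
        else:
            entry["db"].append(event)
    return by_request, orphans


def hidden_writes(by_request):
    """Requests that look read-only at the gateway (GET) yet wrote to the database."""
    return [
        rid for rid, entry in by_request.items()
        if entry["gateway"]["method"] == "GET"
        and any(d["op"] in WRITE_OPS for d in entry["db"])
    ]


by_request, orphans = correlate(GATEWAY_LOG, DATABASE_LOG)
print("hidden writes:", hidden_writes(by_request))
print("orphan database events:", orphans)
```

Neither finding is visible from the gateway log alone: the write behind r2 and the database activity for r7 only show up once the second source is joined in.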
Discussion
• Training data is not fully representative of the real data
– Variety of data across different web applications, industries, organizations, occasions
– The distribution generating new data differs from the distribution that generated the data used to train the model
– Unclear what the representative distribution of the data is
• Adversary’s new capability
– Data manipulation to bypass ML based anomaly detection
– Poisoning attacks on ML models
– Adversarial attacks on ML models
– … …
Need to build more robust ML models
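The gap between the training distribution and the distribution of new data can be monitored with a two-sample Kolmogorov-Smirnov statistic on one feature at a time. This is one common check, not a method from the talk; the sample values and the 0.3 retraining threshold below are assumptions for illustration.

```python
from bisect import bisect_right


def ks_statistic(train, live):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    a, b = sorted(train), sorted(live)
    gap = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical values of one feature, e.g. requests per minute.
train = [10, 12, 11, 13, 12, 11, 10, 12]
live_similar = [11, 12, 10, 13, 12, 11, 12, 10]
live_shifted = [30, 32, 31, 33, 32, 31, 30, 32]

DRIFT_THRESHOLD = 0.3  # illustrative cut-off, to be tuned per feature

for name, live in (("similar", live_similar), ("shifted", live_shifted)):
    stat = ks_statistic(train, live)
    print(name, stat, "retrain" if stat > DRIFT_THRESHOLD else "ok")
```

A check like this only detects that the data has moved. It does not say whether the cause is benign change or adversarial manipulation, which still needs separate investigation.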
Q&A