Hype or hope? Machine learning based security analytics for web applications

Lei Ding, Ph.D.
R&D Principal, Accenture Labs
Arlington, VA

Web applications
Expand attack surface of web-facing infrastructure

[Diagram: organization’s web-facing infrastructure, showing web apps, app servers, databases, file storage, and cloud services]

Web API: Web Application Programming Interface
Middle layer in the connected digital world

[Diagram: mobile apps and web apps connect through APIs to app servers, databases, file storage, and cloud services]

API security

[Diagram: Web/Mobile Applications, APIs, Backend Services/Databases]

New attack surface
New threat vectors
New access models

Expect API breaches to accelerate

“By 2022 API abuses will be the attack vector most responsible for data breaches within enterprise web applications.” - Gartner Research

API use case example:

• Travel app requests flight information by making an API request
• Airline’s database receives the request
• Airline responds by providing flight information
• Travel app receives flight information

[Diagram: Mobile/Web Applications, API, Backend Systems/Services (airlines)]

Securing APIs

Publicly disclosed vulnerabilities

CVE-2017-8304, Apr. 2017
– Oauth/playground/callback.html allows XSS with a crafted URI

CVE-2017-6059, Apr. 2017
– Ping Identity OpenID Connect authentication module for Apache before 2.14 allows remote attackers to spoof page content via a malicious URL provided to the user, which triggers an invalid request

CVE-2017-6413, Mar. 2017
– The "OpenID Connect Relying Party and OAuth 2.0 Resource Server" module before 2.1.6 allows remote attackers to bypass authentication via crafted HTTP traffic

CVE-2017-4960, Mar. 2017
– Failure to handle exceptional conditions subjects OAuth clients to a denial of service attack

CVE-2017-6062, Jan. 2017
– The "OpenID Connect Relying Party and OAuth 2.0 Resource Server" module before 2.1.5 allows remote attackers to bypass authentication via crafted HTTP traffic

… …

Standards

OAuth, OpenID Connect, JWT, mTLS

Securing APIs

Best Practices

Set up monitoring and alerting

Enforce security in the gateway

Use gateway traffic management

Restrict or block malicious APIs


Machine learning based API security analytics
Big data problem

[Diagram labels: Quantitative measures; Machine learning and big data analysis; Real-time risk assessment; More security insights; Automated anomaly detection]

Machine learning based risk assessment
API level risk score assessment

[Flow diagram]
• Historical API data → generate profile → baseline profiles
• New API request → generate profile → new API profile
• Deviation of the new API profile from the baseline profiles → risk score
• Risk score → allow access, alert access, or alert & block access
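The flow above (profile, deviation, risk score, access decision) can be sketched as follows. This is a minimal illustration and not the method used in the talk: the feature names, the z-score based deviation, the exponential mapping to a score, and the two thresholds are all assumptions made for the example.

```python
import math

# Hypothetical baseline profile: per-feature (mean, standard deviation)
# learned from historical API data. Feature names are illustrative only.
BASELINE = {
    "requests_per_minute": (20.0, 5.0),
    "payload_bytes": (512.0, 128.0),
    "distinct_endpoints": (3.0, 1.0),
}


def deviation(profile, baseline):
    """Root-mean-square z-score of a new profile against the baseline."""
    total = 0.0
    for name, (mean, std) in baseline.items():
        z = (profile[name] - mean) / std
        total += z * z
    return math.sqrt(total / len(baseline))


def risk_score(dev, scale=3.0):
    """Map a deviation in [0, inf) to a risk score in [0, 1). Assumed mapping."""
    return 1.0 - math.exp(-dev / scale)


def decide(score, alert_at=0.5, block_at=0.8):
    """Turn a risk score into one of the three outcomes on the slide."""
    if score >= block_at:
        return "alert_and_block"
    if score >= alert_at:
        return "alert"
    return "allow"


def assess(profile, baseline=BASELINE):
    """Full pipeline for one new API profile."""
    return decide(risk_score(deviation(profile, baseline)))
```

A profile equal to the baseline means has zero deviation and is allowed; as features drift further from the baseline, the score rises toward 1 and the decision moves to alerting and then blocking. In practice the thresholds would be tuned against the alert volume an operations team can handle.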

Deployment use case
With existing API management tool/gateway

• Offline training uses historical logs
• Real-time risk assessment on new incoming API requests
• Generate reports on security alerts and API blocking/limiting recommendations to API Policy Management/Custom Policy module

[Diagram: users exchange API requests and API responses with the API management tool/gateway, which reaches the backend system (server, database) via to_resource() and from_resource(). Logs (.json) feed offline training; online detection connects to the runtime monitor, alerts, and custom policy. Outputs: risk assessment, recommendation, security alert report, risk assessment report.]
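The offline training step can be pictured as reading historical gateway logs and summarizing them per API. The sketch below assumes one JSON object per log line with hypothetical fields `api`, `latency_ms`, and `status`; a real gateway's log schema will differ, and the statistics chosen here are only examples of what a baseline might hold.

```python
import json
import statistics
from collections import defaultdict

# Hypothetical gateway log, one JSON object per line. Field names are assumed.
SAMPLE_LOGS = """\
{"api": "/flights", "latency_ms": 100, "status": 200}
{"api": "/flights", "latency_ms": 120, "status": 200}
{"api": "/flights", "latency_ms": 110, "status": 500}
{"api": "/bookings", "latency_ms": 300, "status": 200}
{"api": "/bookings", "latency_ms": 340, "status": 200}
"""


def train_baselines(log_text):
    """Build a per-API baseline profile from historical JSON log lines."""
    by_api = defaultdict(list)
    for line in log_text.splitlines():
        if line.strip():  # skip blank lines
            record = json.loads(line)
            by_api[record["api"]].append(record)

    baselines = {}
    for api, records in by_api.items():
        latencies = [r["latency_ms"] for r in records]
        errors = sum(1 for r in records if r["status"] >= 400)
        baselines[api] = {
            "count": len(records),
            "mean_latency_ms": statistics.mean(latencies),
            "stdev_latency_ms": statistics.pstdev(latencies),
            "error_rate": errors / len(records),
        }
    return baselines
```

Calling `train_baselines(SAMPLE_LOGS)` returns one profile per API path. An online detector would load these profiles once and compare each incoming request against the profile for its API.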

Machine learning based risk assessment

• Quantitatively measure the likelihood of a new request being anomalous (risk score)

• Provide real-time API risk assessment

• Aid decision making in the presence of (potential) attacks

Learn relationships among events produced by API requests

An example: legit admin registration

Learn relationships among events produced by API requests
Privilege escalation attack (11/2018, CVE-2018-19207)
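One simple way to learn relationships among events is to count which event follows which in known-good sequences, then look for transitions never seen before. The sketch below is an assumption about how such learning could work, not the model from the talk, and the event names are invented for illustration; they are not taken from CVE-2018-19207.

```python
from collections import Counter


def learn_transitions(sequences):
    """Count consecutive event pairs across known-good event sequences."""
    counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    return counts


def unseen_transitions(seq, counts):
    """Return the consecutive pairs in seq that never occurred in training."""
    return [
        (current, following)
        for current, following in zip(seq, seq[1:])
        if counts[(current, following)] == 0
    ]


# Hypothetical event names for a legitimate admin registration.
LEGIT_SEQUENCES = [
    ["login_as_admin", "open_user_form", "create_user", "assign_admin_role"],
    ["login_as_admin", "open_user_form", "create_user", "assign_admin_role"],
]

# Hypothetical sequence in which an admin account appears without an admin login.
SUSPICIOUS = ["anonymous_request", "change_setting", "create_user", "assign_admin_role"]

MODEL = learn_transitions(LEGIT_SEQUENCES)
FLAGGED = unseen_transitions(SUSPICIOUS, MODEL)
```

Here `FLAGGED` holds the two transitions that lead into `create_user` by an unfamiliar route, while the final `create_user` to `assign_admin_role` step looks normal on its own. That is the point of modeling relationships rather than single events: each request may be individually valid while the path is not.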

ML based solutions
Challenges

• Incomplete training data
– Lack of training data for various attacks
– Some classes are severely undersampled because their low frequency of occurrence makes them costly to measure (e.g., zero-day attacks)

• Web-based exploits vs. rare benign web events
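To make the undersampling problem concrete, the sketch below computes inverse-frequency class weights, one common mitigation when attack examples are scarce. The slides do not say how the imbalance is handled, so this is an assumed technique, and the label names and counts are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical training set: 98 benign requests and only 2 attack samples.
LABELS = ["benign"] * 98 + ["attack"] * 2
WEIGHTS = balanced_class_weights(LABELS)
```

With these counts each attack sample weighs 25.0 and each benign sample about 0.51, so a weighted learner pays roughly 49 times more for missing an attack. Reweighting cannot help with attack types that have no samples at all, which is the harder case the slide points to with zero-day attacks.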

Security analytics
Defense in depth using different sources of data

Different data sources may show new components of the attack that are not visible within the data from the API gateway

[Diagram: mobile apps and web apps (user) send API requests to the API gateway and receive API responses. Gateway paths shown: /endpoint1/collection1/resource1/resource2, /collection2/resource1, …/endpoint2. The gateway reaches the backend system (server, database) via to_resource() and from_resource(). Data sources: API requests, file activity logs, database logs.]
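Combining sources can be sketched as joining gateway events with database log entries for the same user within a short time window, then checking whether the database activity matches what the API path should touch. All field names, the window size, and the path-to-table mapping below are hypothetical.

```python
# Hypothetical mapping from API path to the tables that path is expected to touch.
EXPECTED_TABLES = {"/flights": {"flights"}}


def correlate(api_events, db_events, window_s=5):
    """Pair each API request with DB entries by the same user within the window."""
    pairs = []
    for req in api_events:
        related = [
            db for db in db_events
            if db["user"] == req["user"] and 0 <= db["ts"] - req["ts"] <= window_s
        ]
        pairs.append((req, related))
    return pairs


def unexpected_tables(api_events, db_events, window_s=5):
    """List (path, table) pairs where a request led to unexpected table access."""
    findings = []
    for req, related in correlate(api_events, db_events, window_s):
        allowed = EXPECTED_TABLES.get(req["path"], set())
        for db in related:
            if db["table"] not in allowed:
                findings.append((req["path"], db["table"]))
    return findings


# Hypothetical events; "ts" is a timestamp in seconds.
API_EVENTS = [{"user": "u1", "ts": 100, "path": "/flights"}]
DB_EVENTS = [
    {"user": "u1", "ts": 101, "table": "flights"},
    {"user": "u1", "ts": 102, "table": "users"},
    {"user": "u2", "ts": 101, "table": "flights"},
    {"user": "u1", "ts": 200, "table": "users"},
]
```

In this sample the gateway sees an ordinary `/flights` request, and only the database log shows the same user touching the `users` table two seconds later. Time-window joins are crude; a shared request identifier propagated from gateway to backend would be more reliable where available.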

Discussion

• Training data is not fully representative of the real data
– Variety of data across different web applications, industries, organizations, occasions
– The distribution generating the new data differs from the distribution that generated the data used to train the model
– Unclear what the representative distribution of the data is

• Adversary’s new capability
– Data manipulation to bypass ML based anomaly detection
– Poisoning attacks on ML models
– Adversarial attacks on ML models
– … …

Need to build more robust ML models
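The gap between training and live distributions can at least be monitored. The sketch below computes a population stability index (PSI) over a categorical feature as one possible drift check; the slides do not name a specific measure, so the choice of PSI, the smoothing constant, and the sample data are assumptions.

```python
import math
from collections import Counter


def psi(train, new, eps=1e-6):
    """Population stability index between two samples of a categorical feature."""
    categories = set(train) | set(new)
    train_counts, new_counts = Counter(train), Counter(new)
    total = 0.0
    for cat in categories:
        # Smoothed proportions so unseen categories do not divide by zero.
        p = (train_counts[cat] + eps) / (len(train) + eps * len(categories))
        q = (new_counts[cat] + eps) / (len(new) + eps * len(categories))
        total += (q - p) * math.log(q / p)
    return total


# Hypothetical HTTP method mix in training data versus live traffic.
TRAIN_METHODS = ["GET"] * 80 + ["POST"] * 20
LIVE_METHODS = ["GET"] * 50 + ["POST"] * 50
```

`psi(TRAIN_METHODS, TRAIN_METHODS)` is zero, and the value grows as the live mix moves away from the training mix. A check like this says when a model may need retraining; it does not detect the adversarial manipulation listed above, which can be crafted to stay within normal-looking distributions.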
