Netflix OSS Meetup Season 5 Episode 1
Netflix Open Source
Netflix Open Source - @NetflixOSS
Season 5, Episode 1
Agenda
6:00-7:00 Registration, Food/Drink, Networking
7:00-8:00 Talks:
• RepoKid - Travis McPeak and Patrick Kelley, Netflix
• BetterTLS - Ian Haken, Netflix
• Authorization at Netflix - Manish Mehta, Netflix
• Open Policy Agent - Torin Sandall, OPA project
• PADME - Kamil Pawlowski, PADME project
8:00-9:00 Demos, Networking
Rightsizing Permissions @Scale
Patrick Kelley
9-27-2017
The Antagonist
Set Builder: (Me)
● Name: Patrick Kelley @monkeysecurity
● ~ 5 years @ Netflix
● Decent trampoline jumper
● OSS Fan
○ SecurityMonkey
○ CloudAux
○ PolicyUniverse
○ Aardvark
○ Repokid
○ SWAG
You are Entitled to Nothing
Permissions granted to new apps:
● Permissions are automatically granted to applications on deploy.
● Apps start with a small base-set of permissions.
● Manual interaction with the security team is limited.
Eventually:
● Default permission set is empty. We peek inside your AMI to build policies.
● Library owners define required permissions.*
Remove Unused Permissions
Repokid gathers data from multiple plugins and determines which permissions may be removed.
After sending notifications, Repokid will “repo” unused permissions. If something goes wrong, Repokid allows for easy rollback.
https://github.com/Netflix/repokid
https://github.com/Netflix-Skunkworks/aardvark
AWS Policy Anatomy
{
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::test-bucket-*",
  "Effect": "Allow"
}
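As a rough illustration of how the three fields of such a statement fit together, the sketch below checks a single request against one Allow statement. It is a simplification, not AWS's evaluation logic: real IAM evaluation also handles explicit Deny, conditions, multiple statements, and case-insensitive action names. The function name `statement_allows` is hypothetical, and `fnmatchcase` is only a stand-in for IAM's wildcard matching.

```python
from fnmatch import fnmatchcase

# The statement from the slide, as a Python dict.
statement = {
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::test-bucket-*",
    "Effect": "Allow",
}

def statement_allows(stmt, action, resource):
    """Simplified check: does this one Allow statement cover the request?

    Assumption: "*" wildcards are approximated with shell-style matching.
    """
    return (
        stmt["Effect"] == "Allow"
        and fnmatchcase(action, stmt["Action"])
        and fnmatchcase(resource, stmt["Resource"])
    )
```

Under these assumptions, a `s3:GetObject` request for an object in `test-bucket-logs` is covered by the statement, while `s3:PutObject` on the same object is not.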
Service: Access Advisor
Event: CloudTrail
Resource: S3 Access Logs
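The idea of removing unused permissions from service-level usage data can be sketched as follows. This is not Repokid's code: the function names, the data shapes, and the 90-day threshold are assumptions chosen for illustration, with `last_used` standing in for the kind of per-service data Access Advisor provides.

```python
from datetime import datetime, timedelta

def service_of(action):
    """Return the service prefix of an IAM action, e.g. "s3" for "s3:GetObject"."""
    return action.split(":", 1)[0]

def unused_services(granted_actions, last_used, now, max_age_days=90):
    """Return service prefixes that were never used or not used recently.

    last_used maps a service prefix to the datetime of its last use.
    The 90-day default is a hypothetical threshold.
    """
    cutoff = now - timedelta(days=max_age_days)
    services = {service_of(a) for a in granted_actions}
    return sorted(
        s for s in services
        if last_used.get(s) is None or last_used[s] < cutoff
    )

def repo(granted_actions, unused):
    """Split actions into (kept, removed); keeping 'removed' allows rollback."""
    kept = [a for a in granted_actions if service_of(a) not in unused]
    removed = [a for a in granted_actions if service_of(a) in unused]
    return kept, removed
```

Returning the removed actions alongside the kept ones reflects the rollback idea from the slides: restoring a role only requires re-adding the `removed` list.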
Thank You !
BetterTLS
A test suite for HTTPS clients implementing verification of
the Name Constraints certificate extension
How Does Web PKI Work?
[Diagram: certificate for google.com issued by Verisign; IP 172.317.5.110; CAs shown: Symantec, Digicert, Verisign]
On Trusting Your Truststore
[Diagram: certificate for nsa.gov issued by WoSign China; IP 23.210.7.329; CAs shown: Verisign, Digicert, WoSign China]
Another Use Case
[Diagram: certificate for passwordreset.acme.internal issued by ACME Root CA; IP 74.304.23.58; CA shown: ACME Root CA]
Responsibility, Risk, and Transparency
[Diagram: certificate for bankofamerica.com issued by ACME Root CA; IP 17.59.228.350; CA shown: ACME Root CA]
We want to apply authorization
rules to CAs.
Is ACME Root CA authorized to
create a certificate for
bankofamerica.com?
The Name Constraints X509 Extension
● RFC 5280 (May 2008)
● Applies only to CA certificates. Specifies:
○ Type of name to which it applies (DNS, IP, etc)
○ Subtree (DNS suffix or IP range)
○ Whitelisted or blacklisted
● Constraints on CA hierarchy can be nested!
Implementations should “intersect” the constraints.
○ The ACME Root CA can be whitelisted for *.internal
○ The ACME Test Environment CA can be blacklisted
for *.prod.internal
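The nesting of whitelists and blacklists described above can be sketched for DNS names only. This is a simplified illustration, not a compliant RFC 5280 implementation: the function names and the dict layout for a CA (`permitted`, `excluded`) are assumptions, and IP constraints, other name types, and certificate parsing are left out.

```python
def in_subtree(hostname, subtree):
    """True if hostname equals subtree or is a subdomain of it (label-wise suffix match)."""
    hostname = hostname.lower().rstrip(".")
    subtree = subtree.lower()
    return hostname == subtree or hostname.endswith("." + subtree)

def name_allowed(hostname, ca_chain):
    """Check a DNS name against the constraints of every CA in the chain.

    ca_chain is a list of dicts with optional "permitted" (whitelist) and
    "excluded" (blacklist) lists. Checking every CA in turn has the effect
    of intersecting the constraints.
    """
    for ca in ca_chain:
        permitted = ca.get("permitted")
        if permitted is not None and not any(in_subtree(hostname, p) for p in permitted):
            return False
        if any(in_subtree(hostname, e) for e in ca.get("excluded", [])):
            return False
    return True

# Hypothetical chain mirroring the slide: the root is whitelisted for
# "internal", the test-environment CA is blacklisted for "prod.internal".
chain = [
    {"name": "ACME Root CA", "permitted": ["internal"]},
    {"name": "ACME Test Environment CA", "excluded": ["prod.internal"]},
]
```

With this chain, `passwordreset.acme.internal` is accepted, while `bankofamerica.com` fails the root's whitelist and `db.prod.internal` fails the test CA's blacklist.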
How Name Constraints Works
ACME Root CA -> ACME Internal CA (NC: *.internal) -> passwordreset.acme.internal ✓
ACME Root CA -> ACME Internal CA (NC: *.internal) -> bankofamerica.com ×
The Name Constraints extension is
only useful if clients implement it.
...correctly.
Let’s Test! Thoroughly!
● Put the server name in both CN and SAN
● Use both DNS names and IP names
○ Use both valid and invalid names
● Use both NC whitelisting and blacklisting
○ Use both passing and non-passing
whitelists/blacklists
● Mix and match all of these
○ Computers are really good at brute forcing all
combinations of things
● Let’s contact vendors about any issues we find
● And let’s make it public!
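The "mix and match" step can be sketched with a Cartesian product over the test dimensions listed above. The dimension names and values below are a paraphrase of the bullets, not the actual BetterTLS test matrix, which may use different or additional dimensions; `build_test_matrix` is a hypothetical name.

```python
from itertools import product

# Test dimensions paraphrased from the slide (illustrative only).
DIMENSIONS = {
    "name_location": ["CN", "SAN"],
    "name_type": ["DNS", "IP"],
    "name_validity": ["valid", "invalid"],
    "constraint_kind": ["whitelist", "blacklist"],
    "constraint_outcome": ["passing", "non-passing"],
}

def build_test_matrix(dimensions=DIMENSIONS):
    """Return one dict per combination of dimension values."""
    keys = list(dimensions)
    return [dict(zip(keys, values)) for values in product(*dimensions.values())]
```

Five two-valued dimensions yield 2^5 = 32 cases; each case would then drive the generation of one certificate chain and one expected accept/reject result.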
Introducing BetterTLS.com
Making TLS Better
● Chrome now has 100% pass on Windows and Linux
○ Chrome on OSX still has some blacklist failures
because of unfixed bugs in Apple’s proprietary TLS
implementation. :(
● Go found a bug in their NC verification
○ They’ve fixed it and included a bettertls certificate in
their own test suite!
● Java has fixed bugs in their NC verification
○ Release including the fix is pending
What Should I Do?
● If you use TLS in your project, consider utilizing the
bettertls.com test suite.
● Contribute!
○ Help us extend BetterTLS with other (e.g. more
specific) Name Constraints tests
○ Submit additional client test results
○ Invent another TLS extension suite (HPKP, HSTS, …)
● If you manage any sort of CA, use name constraints to
reduce risk to your users, to reduce your own liability, and
to increase transparency!
Thank You !
Authorization at Netflix
Netflix’s architecture for implementing
Authorization at scale
Background - Definitions
Me -> My Bank: Transfer $1000 from Account X to Account Y
1. Verify the identity of the requester (Authentication or AuthN)
2. Verify that the requester is authorized to perform the requested operation (Authorization or AuthZ)
These two steps do not need to be tied together!
Background – Netflix Architecture
AuthZ Problem
A way to define and enforce rules that read:
Identity I can/cannot perform Operation O on Resource R
For ALL combinations of I, O, and R in the ecosystem.
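The rule shape above can be written down as a small data structure. The sketch below is an illustration only, not Netflix's implementation: the `Rule` type and `is_authorized` function are hypothetical, and the default-deny and deny-overrides behavior are assumptions the slides do not state.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    identity: str
    operation: str
    resource: str
    allow: bool  # True for "can", False for "cannot"

def is_authorized(rules, identity, operation, resource):
    """Assumed semantics: deny by default, and any matching "cannot" rule wins."""
    matches = [
        r for r in rules
        if (r.identity, r.operation, r.resource) == (identity, operation, resource)
    ]
    return bool(matches) and all(r.allow for r in matches)
```

Exact-match tuples like these do not scale to ALL combinations of I, O, and R, which is one reason a policy language with pattern matching and external data is attractive.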
Design Considerations
● Resource types
● Identity types
● Underlying Protocols
● Implementation Languages
● Latency
● Flexibility of Rules
● Company Culture
● Capture Intent
Result
[Architecture diagram: Policy Portal, Policy DB, Other Data Sources, Aggregator, Distributors; Service A and Service B each shown with App Code and an AuthZ Agent; SSH]
Zooming In
AuthZ Agent
API Stager
Open Policy Agent Engine
Updater
Periodic updates on policies
and associated data
Did it work?
Resource types: REST, SSH, Keys, Kafka Topics
Identity types: VM/Container Services, Batch Jobs, FTEs, Contractors
Underlying Protocols: HTTP, gRPC, Kafka Protocol
Implementation Languages: Java, Node.js, Ruby, Python
Latency: < 0.5 ms for basic policies
Flexibility of Rules: OPA Policy Engine
Company Culture: Policy Portal
Capture Intent: Policy Portal UI hides Policy text for most use cases
Take Away
● AuthZ is a fundamental security problem
● Seek comprehensive solution for better Control and Visibility
● Get there faster with Open Source Tools (e.g. OPA)
● Get involved in communities (e.g. PADME)
Thank You !
Open Policy Agent
An open source, general-purpose policy engine
www.openpolicyagent.org
Policy: Why it’s important
The Policy Problem
Application: landing_page, ratings, details, comments
Platform: master, nodes, nodes
Infrastructure: instance-976, elb-east, bucket-acme, lambda-xyz, keypair-foo
Which cluster should this workload be deployed on?
Which resources are not tagged correctly?
Can user X do operation Y on resource Z?
Application Platform Infrastructure
Writing Policy Is Hard!
Application
http.body: null
http.method: GET
http.path:
- salary
- bob
http.query_params: {}
protocol.scheme: https
service.source:
  ipv4: 10.0.0.128
  namespace: production
  port: 32757
  service: landing_page
service.target:
  ip: 10.0.1.95
  namespace: production
  port: 8080
  service: details
ingress.user: alice
Platform
kind: Pod
metadata:
  labels:
    app: nginx
  name: nginx-1493591563-bvl8q
  namespace: production
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    securityContext:
      privileged: true
  nodeName: minikube
status:
  containerStatuses:
  - name: nginx
    ready: true
    restartCount: 0
  hostIP: 192.168.99.100
  phase: Running
  podIP: 172.17.0.4
  startTime: 2017-08-01T06:34:13Z
Infrastructure
aws_autoscaling_group.lamb:
  availability_zones#: '1'
  availability_zones.3205: us-west-1a
  desired_capacity: '4'
  destroy: false
  health_check_grace_period: '300'
  launch_configuration: kitten
  wait_for_capacity_timeout: 10m
aws_instance.puppy:
  ami: ami-09b4b74c
  instance_type: t2.micro
  source_dest_check: 'true'
aws_launch_configuration.kitten:
  associate_public_ip_addr: 'false'
  destroy: false
  image_id: ami-09b4b74c
  instance_type: t2.micro
  name: kitten
Context Dependent
Complex Data
Search and Aggregation
Application Platform Infrastructure
OPA: Unified, Declarative, Context-aware
Application: “Employees can access their own salary data. Managers can access their subordinates' salary data.”
Platform: “Workloads that require EU jurisdiction must be deployed on clusters in European zones.”
Infrastructure: “Allow plans without deletes unless the number of new resources exceeds 100.”
[Diagram labels: Service, Policy Query, Policy Decision, Policy (Rego), Data (JSON)]
OPA: Unified, Declarative, Context-aware
“Employees can access their own salary data. Managers can access their subordinates' salary data.”
allow {
    input.path = ["salary", employee_id]
    input.user = employee_id
}
allow {
    input.path = ["salary", employee_id]
    input.user = data.manager_of[employee_id]
}
Context
Pattern Matching
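For readers unfamiliar with Rego, the two `allow` rules can be paraphrased in Python roughly as below. This is only a reading aid, not how OPA evaluates policy: `input_doc` and `data` are assumed to be plain dicts shaped like the slide's `input` and `data` documents, and `data["manager_of"]` is assumed to map an employee ID to that employee's manager.

```python
def allow(input_doc, data):
    """Paraphrase of the two Rego rules; the result is True if either rule matches."""
    path = input_doc["path"]
    # Pattern match: input.path = ["salary", employee_id]
    if len(path) != 2 or path[0] != "salary":
        return False
    employee_id = path[1]
    user = input_doc["user"]
    # Rule 1: employees can read their own salary.
    if user == employee_id:
        return True
    # Rule 2 (context): the user is the manager of the employee.
    return data["manager_of"].get(employee_id) == user
```

For example, with `data = {"manager_of": {"bob": "alice"}}`, both `bob` and `alice` may read `["salary", "bob"]`, while any other user may not.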
OPA: Unified, Declarative, Context-aware
“Workloads that require EU jurisdiction must be deployed on clusters in European zones.”
placement[cluster.name] {
    input.metadata.labels["requires-eu-jurisdiction"]
    cluster = data.clusters[_]
    startswith(cluster.status.region, "eu-")
}
References Search
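The `placement` rule can be read as a search over the clusters in `data`. The Python paraphrase below is a sketch under assumptions: `data["clusters"]` is taken to be a list of dicts with `name` and `status.region` fields, and the label check is treated as "present and not false".

```python
def placement(input_doc, data):
    """Return the set of cluster names that satisfy the EU-jurisdiction rule."""
    labels = input_doc.get("metadata", {}).get("labels", {})
    # Reference: the rule only applies when the label is present and not false.
    if labels.get("requires-eu-jurisdiction", False) is False:
        return set()
    # Search: iterate over every cluster, like data.clusters[_] in Rego.
    return {
        cluster["name"]
        for cluster in data["clusters"]
        if cluster["status"]["region"].startswith("eu-")
    }
```

A workload without the label yields an empty set from this rule; a labeled workload yields only clusters whose region begins with `eu-`.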
OPA: Unified, Declarative, Context-aware
“Allow plans without deletes unless the number of new resources exceeds 100.”
deny { score > 100 }
weights = {"create": 1, "modify": 0, "delete": 1000}
score = s {
    sum([weights[op] | input.plan[_] = [op, _]], s)
}
Aggregation
Composition
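The scoring rule is a weighted sum over the operations in a plan. A Python paraphrase, assuming `plan` is a list of `[operation, resource]` pairs as the Rego pattern `[op, _]` suggests (the resource names in any example are hypothetical):

```python
WEIGHTS = {"create": 1, "modify": 0, "delete": 1000}

def score(plan):
    """Sum the weight of each operation; unknown operations are skipped."""
    return sum(WEIGHTS[op] for op, _resource in plan if op in WEIGHTS)

def deny(plan):
    """Deny when the score exceeds 100."""
    return score(plan) > 100
```

Because a single delete weighs 1000, any plan containing a delete is denied, while a plan of only creates is denied once it has more than 100 of them.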
The Open Policy Agent Project
● Declarative Language
● Document-oriented
● Daemon, Library
● Policy, Query, Data APIs
● Tooling (REPL, Tracing, Testing)
● Apache License 2.0
Data
(JSON)
Policy
(Rego)
Thank You !
PADME
Access Control In a Distributed World
www.padme.io
Goals
• Provable, Composable, Security
• Simplicity (ease of use)
• Well Defined Behavior in a Distributed Environment
The Problem
Configuring Access Policies is Hard
• Every component is different (heterogeneity)
• Web servers, networking gear, etc
• Services evolve, and policies need to change with them (temporality)
• Policies don’t understand the CAP Theorem (temporality)
Current State
• Recruited Core Team
• Use cases
• Skeletal Reference Architecture
Thank You !
Demo Stations
Open Policy Agent
Stethoscope
HubCommander
Titus