Private Apps in the Public Cloud - DevconTLV, March 2016


Private Apps in the Public Cloud
Issac Goldstand, Principal Architect

Introducing AppCloud

Out-of-the-Box Experience: users discover applications as they set up a new PC or smartphone

App Discovery Engine: users browse for software in a curated catalog

Introducing AppCloud

Dynamic Notifications: re-engaging users when it makes sense

Analytics: understanding users

Behind the scenes...

Introducing AppCloud

Sponsored Apps
Popular Apps

AppCloud Catalog

* Popular Apps - free apps that users are likely to install on their device
* Sponsored Apps - apps with campaigns that can generate revenue

A mix of popular and sponsored apps

App Personalization Engine

Sponsored App

Developing the MVP

Developing the MVP

Developing the MVP

Design Considerations

2 Major Concerns

“The more possessions one owns, the more worries one needs to deal with”

(״מרבה נכסים, מרבה דאגות״)

Things got...

Design Considerations

Things got… pretty crazy

Design Considerations

What happens if an attacker breaches one of the servers?

Design Considerations

Design Considerations

We need to partition our environments properly

Two types of environments

Single Tenant
Multi-Tenant

Separate sensitive components

In a modern cloud, where most hardware is multi-tenant by definition, how can we accomplish single-tenant partitioning?

Hardware (Logical)
Network (Logical)

Application

Hardware Layer

TL;DR

We use a dedicated compute instance per component/environment (customer)

Networking Layer

Network

How we used to do it

https://commons.wikimedia.org/wiki/File:3_men_working_on_a_portable_phone_switchboard.jpg

Each customer gets their own (set of) VLAN(s)

No interconnectivity between customer VLANs

Additional VLAN(s) for shared components

AWS

VLAN == VPC
(loosely speaking)

Network Layer

Split each customer into their own VPC

AWS security groups

Network Layer

AWS Jump Rules

Target is another (or the same) security group

Network Layer

Each customer/component pair has a security group (at least one)

Allows fine-grained control of which services can access different sets of data

Note the separation of S3 buckets + use of IAM roles to access the S3 data
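The SG-to-SG "jump rules" above can be sketched as data. The hypothetical helper below builds the `IpPermissions` structure that boto3's `authorize_security_group_ingress` expects when the traffic source is another security group rather than a CIDR block; the security group IDs are made up.

```python
def sg_to_sg_ingress(port, source_sg_id, description=""):
    """Build an IpPermissions entry whose source is another security
    group (a "jump rule"), rather than an IP range."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [
            {"GroupId": source_sg_id, "Description": description},
        ],
    }

# Allow only customer A's "publish" workers to reach customer A's
# database security group on MySQL's port (IDs are illustrative):
rule = sg_to_sg_ingress(3306, "sg-0publishA", "publish workers, customer A")
# ec2.authorize_security_group_ingress(GroupId="sg-0databaseA",
#                                      IpPermissions=[rule])
```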


In addition to security, S3 replication allows for cross-region deployments

https://github.com/issacg/s3sync

Application Layer

Application Layer: Example Workflow

1) Back-end sends app to “Publish App” microservice

2) “Publish” microservice stores data in S3 storage

3) “Publish” microservice calls “Parse App Metadata” and “Sign App” microservices

4) “Publish” microservice saves metadata + signature to database

Future slides will use the PoV of the “Publish App” microservice
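The four steps above might look roughly like this. All collaborators are injected stubs, since the real S3, microservice, and database calls are not part of the deck; only the flow itself comes from the slides.

```python
from types import SimpleNamespace

def publish_app(app_blob, s3, parse_svc, sign_svc, db):
    """Sketch of the "Publish App" microservice workflow."""
    key = s3.put(app_blob)                # 2) store data in S3
    metadata = parse_svc.parse(app_blob)  # 3) "Parse App Metadata"
    signature = sign_svc.sign(app_blob)   # 3) "Sign App"
    db.save(key, metadata, signature)     # 4) save metadata + signature
    return key

# Exercise the flow with in-memory stand-ins:
store, rows = {}, []

def fake_put(blob):
    store["apps/demo"] = blob
    return "apps/demo"

s3 = SimpleNamespace(put=fake_put)
parse_svc = SimpleNamespace(parse=lambda blob: {"size": len(blob)})
sign_svc = SimpleNamespace(sign=lambda blob: "sig:%d" % len(blob))
db = SimpleNamespace(save=lambda key, md, sig: rows.append((key, md, sig)))

key = publish_app(b"fake-apk-bytes", s3, parse_svc, sign_svc, db)
```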

“Elastic applications in a public cloud should support zero-configuration” - Me

Zero configuration allows us to support both auto-scaling groups and auto-healing in case of (many, but not all) problems

Zero-Configuration

Networking
Service Discovery

Credentials/Identity Management

Application + Config

Networking

Service Discovery

Options

Option 1

AWS Internal ELB + Route53 Private Hosted Zones

Service Discovery

Route53 Private Zone points to Internal ELB

ELB load balances traffic between Publish workers

If a publish worker fails the ELB health check, it is removed from the pool of healthy workers

Option 2

Standalone service discovery

ZooKeeper, Consul, etc

Consul

Rich feature-set built-in

Service discovery
KV storage
Global mutex/semaphores
Leader election
High availability (active/active)

Encryption (Gossip + HTTP/RPC)

Health checks

Incredibly elastic

Fits the cloud well

Consul

Register current instance as “publish-i1234567890abcdefa.node.customerA” and “publish.service.customerA” + healthchecks

Addresses of well-known “parseapp” and “signapp” services via service discovery

Name of S3 bucket + path via KV storage

S3 access via IAM role

Database (host, user, password) via KV storage
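As a concrete illustration of fetching configuration from the Consul KV store: values come back base64-encoded from the KV HTTP API (`GET /v1/kv/<path>`). The key name and bucket value below are invented.

```python
import base64, json

# Shape of a response from Consul's KV HTTP API; values arrive
# base64-encoded inside a JSON array of entries.
sample_response = json.dumps([{
    "Key": "customerA/publish/s3",
    "Value": base64.b64encode(
        b'{"bucket": "appcloud-customer-a", "path": "apps/"}').decode(),
}])

def kv_value(response_body):
    """Decode the first KV entry's value from a Consul KV response."""
    entry = json.loads(response_body)[0]
    return json.loads(base64.b64decode(entry["Value"]))

cfg = kv_value(sample_response)
```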

Zero-Configuration via Consul

Application Layer

Server comes up via configuration management scripts

Server joins consul cluster

Server fetches application configuration based on well-known locations in the consul KV store

Server fetches application bits and boots

Service registers with consul (including healthcheck)
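Registering the service with a health check (the last step above) amounts to one PUT against the local Consul agent's `/v1/agent/service/register` endpoint. A sketch of the request body, with illustrative names, port, and health URL:

```python
def register_payload(service, instance_id, port, health_url):
    """Build the body for Consul's PUT /v1/agent/service/register,
    including an HTTP health check (names and URLs are illustrative)."""
    return {
        "ID": f"{service}-{instance_id}",
        "Name": service,
        "Port": port,
        "Check": {"HTTP": health_url, "Interval": "10s"},
    }

body = register_payload("publish", "i-1234567890abcdefa", 8080,
                        "http://localhost:8080/health")
# requests.put("http://localhost:8500/v1/agent/service/register", json=body)
```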

Instead of looking up an ELB with a well-known hostname, we can use a well-known service name and connect to any machine inside that service group

Service Discovery

Consul DNS lookup for “publish” service

Consul randomly picks a healthy instance and returns the address of the worker

If a publish worker fails the consul health check, it is removed from the pool of healthy workers

If we have a leader/follower app, we can use consul “tags” to get a specific instance (eg. master.publish.service…..consul)
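The DNS names used in these lookups follow a fixed pattern - `[tag.]<service>.service[.<dc>].<domain>`. A small helper to compose them (the tag and datacenter parts are optional):

```python
def consul_dns(service, datacenter=None, tag=None, domain="consul"):
    """Compose a Consul DNS query name:
    [tag.]<service>.service[.<dc>].<domain>"""
    parts = ([tag] if tag else []) + [service, "service"] \
            + ([datacenter] if datacenter else []) + [domain]
    return ".".join(parts)

# Local healthy instance of "publish":
local = consul_dns("publish")
# Leader-tagged "parse" instance in the "green" datacenter:
remote_leader = consul_dns("parse", datacenter="green", tag="leader")
```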

Application Layer

Service Discovery

redis.service.nyc3.consul / redis.service.consul

Application Layer

https://www.flickr.com/photos/cogdog/566323330

Service Discovery

Consul Architectural Concepts

Application Layer

Consul High-level architecture

https://www.consul.io/docs/internals/architecture.html

Two gossip pools - WAN & LAN

LAN pools encapsulate a single (virtual) datacenter

Divided into server and client agents

In each DC, a single server is elected as “Leader”

Transactions are forwarded and committed to all servers

Leader is responsible for maintaining consistency in its DC

WAN pool spans all datacenters (servers only)

Cross-datacenter requests use RPC-forwarding (between server nodes) to query the remote DC

No DC stores information about other DCs

Rich ACL system: who can access what?

Application Layer

http://imgur.com/gallery/WlgnC

Consul

We already split everything into VPCs

Each VPC becomes a DC in consul

Each environment (customer) automatically gets a private KV store and private service registry

Shared services live in their own well-known dedicated DC with their own “shared” KV store & service registry

Application Layer

https://www.flickr.com/photos/mherzber/500917537

Consul

It’s possible to perform cross-datacenter queries

Controllable via ACLs

Application Layer

Publish asks Consul Blue - who is local signapp? signapp.service.consul

Consul Blue answers with gossipped address of random signapp instance in healthy state

Application Layer

Publish asks Consul Blue - who is tagged as leader node of parse in green DC? leader.parse.service.green.consul

Consul Blue checks the WAN gossipped peers for a server address of Consul Green cluster

Consul Blue forwards the query (via HTTP/S) to Consul Green over the WAN

Consul Green answers with gossipped address of a parse node with the tag “leader” and in healthy state
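The same cross-datacenter lookup can be done over Consul's HTTP API: the `dc=` parameter on `/v1/health/service` makes the local agent forward the query to the remote DC's servers, and `passing` keeps only healthy instances. The agent hostname below is made up.

```python
from urllib.parse import urlencode

def health_query(agent, service, dc=None, tag=None):
    """URL for Consul's /v1/health/service endpoint. dc= triggers
    RPC-forwarding to the remote datacenter's servers."""
    params = {"passing": 1}
    if dc:
        params["dc"] = dc
    if tag:
        params["tag"] = tag
    return f"http://{agent}:8500/v1/health/service/{service}?{urlencode(params)}"

# Ask the local (Blue) agent for the leader-tagged parse node in Green:
url = health_query("consul-blue.internal", "parse", dc="green", tag="leader")
```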

Is that good enough?

Probably

Application Layer

http://onceuponyourprime.com/2014/03/20/must-you-always-cross-your-eyes-and-dot-all-your-teas/

Application Layer

http://monteeggers.com/shiny-object-syndrome-killing-business

Vault

Secure storage and audit control of private data

One-time, short-lived, audited passwords

Growing ecosystem of backends supporting one-time passwords

AWS-STS, MySQL, PostgreSQL, SSH, PKI, Consul

Application Layer

Vault: High-level architecture

https://www.vaultproject.io/docs/internals/architecture.html

Consul + Vault

Consul + Vault access via Vault (via Provisioning Service)

Addresses of well-known “parseapp” and “signapp” services via consul service discovery

Name of S3 bucket + path via KV storage (access via IAM Role*)
* Could also use Vault AWS backend

Database host via consul KV storage

Database user, password via Vault

Register current instance for consul service discovery

Zero-Configuration via Consul & Vault

Networking
Service Discovery
Credentials/Identity Management
Application + Config

Application Layer

Server comes up via configuration management scripts

Server identifies itself to Vault-backed provisioning service and gets consul SSL keypair + Consul access token + Vault access token for future queries

Server joins encrypted Consul cluster

Server fetches application bits and boots

Service fetches application configuration from Vault secret backend and Consul KV store

Service registers with consul (including healthcheck)

The Challenge

Application Layer

Provisioning Service

https://41.media.tumblr.com/eeb9825c9b3bf3a968d8ed63844b11df/tumblr_inline_nvrau6JwQD1rrhq52_540.jpg

How do you bootstrap access for a single image running in multiple instances (e.g., an AMI in an auto-scaling group)?

We want to audit each machine’s access individually - no shared authentication

We don’t want to allow multiple machines (or “anything”s) to authenticate the same token twice

We don’t want to store secrets in a non-secret place

Many suggestions for inclusion as an authentication plugin for Vault

I haven’t seen any I like

Let Vault focus on protecting the data, extend it with external tooling to fit your needs

Provisioning Service

Amazon EC2 Instance Identity Document

http://169.254.169.254/latest/dynamic/instance-identity/

Includes embedded cryptographic signature to authenticate the document

Signed by AWS

Provisioning Service: AWS EC2 Identity Document

Application Layer

Subset of fields in the Identity Document

● AWS Account Number

● Instance ID

● Instance Primary Private IP address

● AMI + Kernel ID

● Launch request time
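A minimal sketch of pulling those fields out of an identity document. The JSON below is a trimmed, made-up example that uses the real document's field names (`accountId`, `instanceId`, `privateIp`, `imageId`, `kernelId`, `pendingTime`); the values are invented.

```python
import json

# Trimmed, fabricated EC2 instance identity document with the fields
# the slides list (the real document has more).
doc_json = json.dumps({
    "accountId": "123456789012",
    "instanceId": "i-1234567890abcdefa",
    "privateIp": "10.0.1.17",
    "imageId": "ami-0abcdef1234567890",
    "kernelId": None,
    "pendingTime": "2016-03-01T12:00:00Z",
})

doc = json.loads(doc_json)
identity = (doc["accountId"], doc["instanceId"], doc["privateIp"])
```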

Missing component role

Missing environment (customer) identifier

We currently store those in the EC2 user-data, to be processed by our configuration management system

After authenticating the instance, the provisioning service queries EC2 to obtain the user-data

This is flexible enough to harden later

Provisioning Service

Instance sends its identity document to the Provisioning Service (PrvSrv)

PrvSrv authenticates the AWS signature

PrvSrv verifies that IP making the request matches the IP in the doc

PrvSrv verifies that the AWS account and AMI are whitelisted

PrvSrv uses the instance id to query the EC2 API to fetch additional metadata

Using this metadata, PrvSrv requests/generates credentials for Vault + Consul and returns this info to the instance

PrvSrv sends additional Vault token to bootstrap Consul
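The non-cryptographic checks in that flow are easy to sketch; signature verification against AWS's public certificate is deliberately omitted here, and all identifiers are fabricated.

```python
def authorize(doc, request_ip, account_whitelist, ami_whitelist):
    """The PrvSrv checks from the slides, minus signature
    verification (which needs the AWS public certificate)."""
    if doc["privateIp"] != request_ip:
        return False, "IP does not match identity document"
    if doc["accountId"] not in account_whitelist:
        return False, "unknown AWS account"
    if doc["imageId"] not in ami_whitelist:
        return False, "AMI not whitelisted"
    return True, "ok"

ok, reason = authorize(
    {"privateIp": "10.0.1.17", "accountId": "123456789012",
     "imageId": "ami-0abc"},
    "10.0.1.17", {"123456789012"}, {"ami-0abc"})
```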

EC2 Instance IDs are globally unique across accounts and are never recycled

Provisioning service will only provide a single token for an instance

Instances are guaranteed to be coming from inside our AWS accounts, and from a verified IP address
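Those guarantees suggest a very simple issuance policy - remember every instance ID ever seen and refuse repeats. A sketch, with a stand-in string for the real Vault token:

```python
class TokenIssuer:
    """One token per EC2 instance ID, ever: instance IDs are never
    recycled, so a repeat request is either a bug or an attacker."""

    def __init__(self):
        self._issued = set()

    def issue(self, instance_id):
        if instance_id in self._issued:
            raise PermissionError(f"token already issued for {instance_id}")
        self._issued.add(instance_id)
        return f"token-for-{instance_id}"  # stand-in for a real Vault token

issuer = TokenIssuer()
token = issuer.issue("i-1234567890abcdefa")
```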

Consul bootstrap information

TLS keys for the node (PKI backend)

https://github.com/issacg/vault-pki-client

Current gossip shared-key (Generic backend)

Token for consul (Consul backend)

Provides vault token for application (no backend)

No built-in Vault backend for Vault

Not an unsolvable problem - the provisioning service can take care of this

Future Plans & Challenges

Separate Vault per environment (e.g., Vault per Consul DC)

How do we manage unsealing with so many Vault clusters?

How do we pass the secrets from the provisioning service client to the application service in a secure manner?

How do we need to change the provisioning service to run with containers?

Summing Things Up

In a modern cloud, where most hardware is multi-tenant by definition, how can we accomplish single-tenant partitioning?

Partitioning
Hardware (Logical) Layer
Networking (Logical) Layer
Application Layer

Hardware (Logical) Layer

Instance per-service per-environment

Networking (Logical) Layer

VPC per environment

Security Group per component/environment compute-instance

Application Layer

Separation of shared / private microservices

Consul + Vault + Provisioning service to provide partitioned zero-configuration

Questions?

Thank you

Issac Goldstand, Principal Architect
issac@ironsrc.com
