streaming movies brings you streamlined applications -- how adopting netflix libraries can improve...
DESCRIPTION
In this presentation, Andrew Spyker and I present our experience with adopting Netflix OSS, both from a deep runtime perspective for various applications and services as well as managing deployed services for scalability and failover.
TRANSCRIPT
1
IBM Confidential
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix Libraries can Improve Your Application or Service!
March, 2014
Andrew Spyker, STSM (@aspyker)
Michael Elder, STSM (@mdelder)
Historical Context
[Timeline figure spanning 2012, 2013, 2014, and Beyond, with these milestones:]
• SPECjEnterprise
• AcmeAir Cloud/Mobile Sample/Benchmark born
• Sample application cloud prize work
• AcmeAir Run On IBM Cloud at “Web Scale”
• Portability cloud prize work
• Scalable Services Fabric internally for IBM Services
• Landscaper & JazzHub adoption of NOSS libraries
• SoftLayer and BlueMix services
NetflixOSS
• “Technical indigestion as a service” – Adrian Cockcroft
• netflix.github.io – 40+ OSS projects – Expanding every day
• Focusing more on interactive mid-tier server technology today …
What is service operational excellence?
• Operating a 24/7 production public cloud service with paying customers means …
• High availability
  - No SPOFs, all components at least triple clustered
• Automatic Recovery
  - Partial failure should be recovered by system
• Continuous Delivery
  - Changes delivered frequently with zero downtime
• Elastic Scalability
  - Solution can scale out easily
• Operational Visibility
  - Operators have live view of contextualized state of system
Micro service Implementation
[Diagram labels: Web App Front End (REST services); Call “Auth Service”; Ribbon REST client with Eureka; Hystrix; Execute auth-service call; Fallback Implementation; App Service (auth-service); Karyon & Archaius; Eureka Server(s) x3]
Implementation Detail: Decompose into micro services
Benefits:
• Key user path always available
• Failure does not propagate across service boundaries
Implementation Detail: Karyon w/ automatic Eureka registration using Archaius for property management
Benefits:
• New instances are quickly found, failing individual instances disappear
• Per deployment environment composite property sources provide dynamic behavior at runtime
Implementation Detail: Ribbon client with Eureka awareness
Benefits:
• Load balances & retries across instances with “smarts”
• Handles temporal instance failure
Implementation Detail: Hystrix as dependency circuit breaker
Benefits:
• Allows for fast failure
• Provides graceful cross service degradation/recovery
Highly Available Service Runtime Recipe
[Diagram labels: Region (Dallas); DAL01; Datacenter (DAL06); DAL05; Eureka; Local LBs; Web App; Auth Service; Booking Service; Cluster Auto Recovery and Scaling Services; Global Load Balancers]
Rule | Why?
Always > 2 of everything | 1 is SPOF, 2 doesn’t web scale and slow DR recovery
Including IaaS and cloud services | You’re only as strong as your weakest dependency
Use auto scaler/recovery monitoring | Clusters guarantee availability and service latency
Use application level health checks | Instance on the network != healthy
IaaS High Availability
Our goals for this talk
• Change your perspective: as we move into hosted offerings, scalability and availability are critical to our business
• We’ll look at Netflix Runtime Libraries that you can adopt in your products today – regardless of whether you’re building a hosted service
• The Libraries promote good development practices and architectural patterns
Pre-req: Inversion of Control Pattern
• Inversion of Control or Dependency Injection is a software design pattern that defines a set of callback interfaces which are provided with their data or references when needed
• The client “inverts control” back to some container to tell it what references to use during execution
• Spring, Java Persistence API, Servlet, etc are all forms of this pattern
• Netflix leverages Google Guice (https://code.google.com/p/google-guice/) and makes heavy use of injection for their lifecycle management library (Karyon)
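As a rough illustration of the pattern, here is a minimal plain-Java sketch using constructor injection. It does not use Guice or Karyon, and every name in it (Notifier, RecordingNotifier, BookingService) is hypothetical. The point is only that the service never builds its own dependency; whoever assembles the application hands one in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; plain Java, not Guice or Karyon.
interface Notifier {
    void send(String message);
}

// A stand-in implementation that records messages instead of sending email.
class RecordingNotifier implements Notifier {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String message) {
        sent.add(message);
    }
}

// The service does not construct its Notifier; the assembler ("container")
// decides which implementation it receives.
class BookingService {
    private final Notifier notifier;

    BookingService(Notifier notifier) {
        this.notifier = notifier;
    }

    void book(String flight) {
        notifier.send("Booked " + flight);
    }
}

class InversionOfControlSketch {
    public static void main(String[] args) {
        RecordingNotifier notifier = new RecordingNotifier();
        BookingService service = new BookingService(notifier);
        service.book("AA100");
        System.out.println(notifier.sent);
    }
}
```

Because the dependency arrives from outside, a test can pass in a recording or mock implementation without touching BookingService.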
Lifecycle Management: Karyon
• In order to support microservices, it’s important to componentize your architecture between microservices
• But within microservices, there are similar logically independent segments – EmailNotification, Database API, Storage API, etc
• If you have a big ServletContextListener today which does a bunch of initialization, pay attention here – this will help you refactor that code into a more manageable approach
[Diagram labels: StorageComponent; DatabaseComponent; EmailNotificationComponent; MyProductApplication; http://…/yourservice; http://…/myservice]
Lifecycle Management: Karyon
• Karyon defines @Application and @Component annotations
• Well defined, loosely coupled classes form the larger application (e.g. Components)
• Allows us to change the monolithic ConfigurationService class (1500 lines originally) into about 65 lines
• Any new parts of the architecture should be defined as Components
• Also defines a “HealthCheck” type to be subclassed for each Application

package com.urbancode.landscape.web.core;
…
@Application
@Singleton
public class LandscaperApplication {
    private static final Logger logger =
        LoggerFactory.getLogger(LandscaperApplication.class);

    @Inject private ServletContext context;
    @Inject private LocalStorage storage;
    @Inject private PersistenceConfig persistenceConfig;
    @Inject private ServiceBusClient serviceClient;
    @Inject private NotificationConfig notificationConfig;

    @PostConstruct
    public void initialize() { .. }
}
Karyon: @Component
• @Components break up logical parts of the larger application (as expected)
• Components can declare other Components in their Injections (e.g. PersistenceConfig -> LocalStorage)
• Components are initialized **BEFORE** the Application (hence cannot depend on the Application)
• Components may choose to be @Singleton
• Components may declare @PreDestroy methods

package com.urbancode.landscape;
…
@Component
@Singleton
public class LocalStorage {
    @Inject private ServletContext context;
    ..
    @PostConstruct
    public void initialize() { .. }
    …
}
Karyon Admin Console
• Available at http://localhost:8077/ (just by launching Tomcat)
• Provides detailed information about the process including classpath, environment variables, properties, and a web-based JMX console
Configuration Management: Archaius
• Archaius defines a library for managing application properties
• Exposed automatically through Karyon admin console
• Allows you to define “composite” sources:
  - pull from properties file,
  - then database,
  - then environment-specific properties file
• Makes overriding properties much easier
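To make the idea of a composite source concrete, here is a small plain-Java sketch. It is not the Archaius API; the CompositeProperties class, the keys, and the values are all hypothetical, and the precedence order (environment file over database over base file) is an assumption chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class; a plain-Java illustration, not the Archaius API.
class CompositeProperties {
    // Sources in precedence order: the first source containing a key wins.
    private final List<Map<String, String>> sources;

    CompositeProperties(List<Map<String, String>> sourcesInPrecedenceOrder) {
        this.sources = sourcesInPrecedenceOrder;
    }

    String get(String key, String defaultValue) {
        for (Map<String, String> source : sources) {
            String value = source.get(key);
            if (value != null) {
                return value;
            }
        }
        return defaultValue;
    }
}

class CompositePropertiesSketch {
    static CompositeProperties example() {
        Map<String, String> baseFile = new HashMap<>();
        baseFile.put("db.url", "jdbc:base");
        baseFile.put("timeout.ms", "5000");

        Map<String, String> database = new HashMap<>();
        database.put("timeout.ms", "8000");

        Map<String, String> envFile = new HashMap<>();
        envFile.put("db.url", "jdbc:prod");

        // Assumed precedence: environment file, then database, then base file.
        return new CompositeProperties(Arrays.asList(envFile, database, baseFile));
    }

    public static void main(String[] args) {
        CompositeProperties props = example();
        System.out.println(props.get("db.url", "none"));
        System.out.println(props.get("timeout.ms", "none"));
        System.out.println(props.get("missing", "none"));
    }
}
```

Overriding a property for one environment then means adding a single entry to the higher-precedence source rather than editing the base file.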
Dynamic Properties with Archaius
• Archaius also defines dynamic properties
• Dynamic property API updates properties for each request
• Allows you to update configuration without forcing a reboot
• Properties can be manipulated by JMX at runtime

// Create the dynamic property
DynamicStringProperty novaEndpoint =
    DynamicPropertyFactory.getInstance()
        .getStringProperty("key", "default_value");

// each new request will get the latest known value
novaEndpoint.get();
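The behavior behind a dynamic property can be sketched in plain Java: the property object holds a key, not a value, and looks the value up on every get(). The DynamicStore and DynamicString classes below are hypothetical stand-ins, not Archaius types, and the endpoint strings are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical classes; a plain-Java illustration, not the Archaius API.
class DynamicStore {
    private final Map<String, String> values = new ConcurrentHashMap<>();

    // Simulates an operator changing a property at runtime.
    void set(String key, String value) {
        values.put(key, value);
    }

    String lookup(String key) {
        return values.get(key);
    }

    DynamicString stringProperty(String key, String defaultValue) {
        return new DynamicString(this, key, defaultValue);
    }
}

class DynamicString {
    private final DynamicStore store;
    private final String key;
    private final String defaultValue;

    DynamicString(DynamicStore store, String key, String defaultValue) {
        this.store = store;
        this.key = key;
        this.defaultValue = defaultValue;
    }

    // Reads the current value each time, so callers never cache a stale one.
    String get() {
        String current = store.lookup(key);
        return current != null ? current : defaultValue;
    }
}

class DynamicPropertySketch {
    public static void main(String[] args) {
        DynamicStore store = new DynamicStore();
        DynamicString endpoint = store.stringProperty("nova.endpoint", "http://default");
        System.out.println(endpoint.get());
        store.set("nova.endpoint", "http://updated");
        System.out.println(endpoint.get());
    }
}
```

The design choice to keep is that code holds on to the property object and calls get() per request, rather than copying the value into a field at startup.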
Latency & Fault Tolerance: Hystrix
• Failure happens. Daily. Hourly. This minute even.
• Failure cannot be prevented, it can only be protected against through isolation
• When one dependency fails, it can cause cascading failures or chain reactions which have a much broader impact
• Release It! describes many patterns and antipatterns around stability and availability motivated by exactly this kind of use case
Bulkhead Pattern
• When a cascading failure or chain reaction occurs, the entire user experience can be destroyed by one bad actor
• In this case, or in similar examples where one or more dependencies all fail due to an upstream service, you want to isolate that failure so that it doesn’t impact the rest of the architecture
• We call this protection the “Bulkhead” pattern
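One simple way to picture a bulkhead is a cap on how many callers may be inside a given dependency at once. The sketch below is plain Java with a hypothetical Bulkhead class (not a Hystrix type): calls beyond the cap are turned away immediately with a fallback instead of piling up behind a slow dependency.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical class; a plain-Java illustration of the pattern, not Hystrix.
class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the action if a permit is free; otherwise returns the fallback
    // right away so the caller is never blocked by a saturated dependency.
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }
}

class BulkheadSketch {
    public static void main(String[] args) {
        Bulkhead bulkhead = new Bulkhead(1);
        // While the outer call holds the only permit, the inner call is rejected.
        String nested = bulkhead.call(
            () -> bulkhead.call(() -> "inner", () -> "rejected"),
            () -> "outer-rejected");
        System.out.println(nested);
        // The permit is released afterwards, so a later call goes through.
        System.out.println(bulkhead.call(() -> "ok", () -> "rejected"));
    }
}
```

Giving each dependency its own Bulkhead instance means a slow dependency can exhaust only its own permits, not the whole application's capacity.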
Circuit Breaker Pattern
• When a failure occurs over and over again, a backend system may be down or experiencing too much load
• In the case of too much load, you can “shed load” to prevent further degradation
• We can “trip the circuit”, meaning that we avoid sending new requests after some failure threshold (default to 50%)
• We call this “Circuit Breaker”
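The tripping logic can be sketched in a few lines of plain Java. SimpleCircuitBreaker below is a hypothetical class, not Hystrix: it counts calls since the circuit last closed (a production breaker would typically track a rolling window), opens once the failure ratio reaches the threshold, and lets a trial request through after an assumed sleep window. The minimum-call count and the injected clock are assumptions added to keep the example testable.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

// Hypothetical class; a simplified illustration, not the Hystrix implementation.
class SimpleCircuitBreaker {
    private final double failureThreshold;   // e.g. 0.5 for 50%
    private final int minimumCalls;          // do not trip on tiny samples
    private final long sleepWindowMillis;    // how long to stay open
    private final LongSupplier clock;        // injected so tests control time

    private int calls;
    private int failures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(double failureThreshold, int minimumCalls,
                         long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.minimumCalls = minimumCalls;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    synchronized boolean allowRequest() {
        if (!open) {
            return true;
        }
        // Once the sleep window has passed, let a trial request through.
        return clock.getAsLong() - openedAt >= sleepWindowMillis;
    }

    synchronized void recordSuccess() {
        if (open) {
            // The trial request succeeded: close the circuit and start over.
            open = false;
            calls = 0;
            failures = 0;
        }
        calls++;
    }

    synchronized void recordFailure() {
        if (open) {
            // The trial request failed: restart the sleep window.
            openedAt = clock.getAsLong();
            return;
        }
        calls++;
        failures++;
        if (calls >= minimumCalls && (double) failures / calls >= failureThreshold) {
            open = true;
            openedAt = clock.getAsLong();
        }
    }

    // Runs the action unless the circuit is open; failures yield the fallback.
    <T> T call(Callable<T> action, T fallback) {
        if (!allowRequest()) {
            return fallback;
        }
        try {
            T result = action.call();
            recordSuccess();
            return result;
        } catch (Exception e) {
            recordFailure();
            return fallback;
        }
    }
}

class CircuitBreakerSketch {
    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
            new SimpleCircuitBreaker(0.5, 4, 1000, System::currentTimeMillis);
        System.out.println(breaker.call(() -> "live", "fallback"));
    }
}
```

While the circuit is open the backend receives no traffic at all, which is what gives an overloaded system room to recover.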
Fail Fast Pattern
• When a circuit is “open” (e.g. the backend service is failing consistently), any new requests are immediately rejected or the fallback mechanism is invoked
• We call this failing fast.
• Enables you to respond to failure in a predictable way by implementing fallbacks in each command
Encapsulating Service Dependencies: HystrixCommand<R>
• Each command extends a common type
• The constructor configures Group ID, Command ID, and other settings
• A run method implements the behavior

public class OSGetAccessTokenCommand extends HystrixCommand<Access> {
    private static final HystrixCommandKey GET_ACCESS_TOKEN_CMD_KEY =
        HystrixCommandKey.Factory.asKey("GetAccessToken");

    public OSGetAccessTokenCommand(…) {
        super(Setter.withGroupKey(
                HystrixCommandGroupKey.Factory.asKey(KEYSTONE_GROUP_ID))
            .andCommandKey(GET_ACCESS_TOKEN_CMD_KEY)
            .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()));
        ...
    }
    ...
    protected Access run() throws Exception { ... }
}
More than your money’s worth
• All commands with the same Group ID share a dedicated resource pool (more on that in a bit)
• Commands have configurable timeouts which are automatically enforced
• Commands can define fallbacks in case of timeout or dependency failure
• Support optimizations such as Request Collapsing and Request Caching

Future<Access> request = new OSGetAccessTokenCommand(
        identityEndpoint.get(),
        identityUsername.get(),
        identityPassword.get())
    .queue();
Access credentialToken = request.get();
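To show what an enforced timeout with a fallback amounts to, here is a plain-Java sketch built on java.util.concurrent. The TimedCall helper is hypothetical and is not how Hystrix is implemented internally; it only mirrors the observable behavior: a slow or failing call yields the fallback value instead of blocking or throwing.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; a plain-Java illustration, not the Hystrix internals.
class TimedCall {
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> action,
                                 long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(action);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);              // interrupt the slow call
            return fallback;
        } catch (ExecutionException e) {
            return fallback;                  // the call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }
}

class TimedCallSketch {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(TimedCall.callWithTimeout(
                pool, () -> "token", 1000, "fallback"));
            System.out.println(TimedCall.callWithTimeout(
                pool, () -> { Thread.sleep(2000); return "late"; }, 50, "fallback"));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The caller's wait is bounded by the timeout regardless of how long the dependency takes, which is the property the bullets above rely on.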
In case of emergency …
• Implement getFallback() to control what is returned when failures occur
• Options might include returning mock data or non-personalized data
• Optionally check for the failure reason with getFailedExecutionException() (timeout, exception, etc)
• When using Jersey for REST API, use javax.ws.rs.core.Response as your response type, and decide proper Response in getFallback()

public class UCDGetComponentsCommand extends UCDAbstractCommand<JSONArray> {
    @Override
    protected JSONArray getFallback() {
        JSONArray componentsArray = new JSONArray();
        // Return mock data in case of failure
        for (String name : new String[] {
                "JKE Web", "JKE Database", "Mortgage App", "JPetStore Web" }) {
            try {
                MockData.getInstance()
                    .createComponentResource(componentsArray, name);
            } catch (JSONException e) { … }
        }
        return componentsArray;
    }
}
Methods of Isolation for Commands
• Hystrix supports two forms of Isolation: Thread Pool and Semaphore
• Thread Pool is the easiest to understand – each Group has its own dedicated Thread Pool, and when requests come in while the thread pool is full they fail fast
• Semaphore is useful, though, if you want Isolation but don’t want to break your existing Threading model – for example, session filters which configure ThreadLocals for Servlets or Jersey Resources

public abstract class UCLAbstractDBCommand<R> extends UCLAbstractCommand<R> {
    public UCLAbstractDBCommand(HystrixCommand.Setter setter) {
        super(setter.andCommandPropertiesDefaults(
            HystrixCommandProperties.Setter()
                .withExecutionIsolationStrategy(
                    ExecutionIsolationStrategy.SEMAPHORE)));
    }

    // Runs in the Jersey Resource's thread
    // Leverages ThreadLocals from Hibernate Session Filter
    @Override
    protected final R run() throws Exception {
        return doRun();
    }

    protected abstract R doRun() throws Exception;
}
When to throw in the towel … (configuring timeouts, etc)
• Virtually all of the HystrixCommand’s options can be configured at runtime through Archaius
• Notable examples include the Thread Pool limits and timeouts
• See the Hystrix wiki for a full accounting of what you can do
• Set these properties dynamically through Karyon’s JMX console and watch command behavior change on the fly

# landscaper.properties or landscaper-[env-name].properties
# configure default timeout milliseconds
hystrix.command.GetAccessToken.execution.isolation.thread.timeoutInMilliseconds=20000
hystrix.command.OSGetImages.execution.isolation.thread.timeoutInMilliseconds=8000
…
# allow no more than 20 commands to queue for the
# KeystoneGroup thread pool
hystrix.threadpool.KeystoneGroup.maxQueueSize=20
# wait 15 seconds if we "trip" the circuit for a command
hystrix.command.GetAccessToken.circuitBreaker.sleepWindowInMilliseconds=15000
Embedding Unit Tests
• An approach promoted by Netflix to reduce friction; it adds few bytes relative to third-party libraries
• Makes it easy to write commands: the UnitTest becomes the test harness and verification
• Always include testSuccess() and testFailure() cases to ensure expected behavior

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;

public class OSDeployEnvironmentCommand
        extends OSAbstractOrchestrationCommand<Response> {

    public static class UnitTest {
        @Test
        public void testSuccess() throws InterruptedException,
                ExecutionException, JSONException {
            …
            Future<Response> request =
                new OSDeployEnvironmentCommand(…).queue();
            Response response = request.get();
            assertNotNull(response);
            assertEquals(HttpStatus.SC_OK, response.getStatus());
        }
    }
}
Bootstrapping Karyon – Advanced Topic
• If your commands leverage Karyon, it’s possible to bootstrap Karyon as part of a UnitTest
• Requires a little more setup, but enables you to have control over the dependency injection from the Google Guice container
• Allows you to create Mock classes for things like ServletContext or other classes, if your Jersey Resource or Hystrix Command needs them

protected static KaryonServer karyonServer;
protected static Injector injector;

protected static void configureKaryon(List<String> packages) throws Exception {
    if (karyonServer == null) {
        ConfigurationManager
            .loadPropertiesFromResources(
                "landscaper-heat-unittests.properties");
        …
        karyonServer = new KaryonServer();
        injector = karyonServer.initialize();
        karyonServer.start();
        …
REST API with Ribbon
• A microservice architecture is generally built around REST-based interfaces
• Ribbon provides an HTTP client for calling service dependencies
• Provides more visibility into behaviors like connection timeouts, auto-retry and number of retries through Archaius properties
• Also supports client-side load balancing through Eureka

ConfigurationManager
    .loadPropertiesFromResources("heat-api-client.properties");
// Keep a reference to the named client so it can execute the request below
RestClient heatAPIClient =
    (RestClient) ClientFactory.getNamedClient("heat-api-client");

URI uri = new URI("/v1/"
    + token.getToken().getTenant().getId() + "/stacks");
HttpRequest request = HttpRequest.newBuilder().uri(uri)
    .verb(Verb.POST)
    .entity(getEntity().toString().getBytes())
    .header("Content-Type", "application/json")
    .header("Accept", "application/json")
    .header("User-Agent", "python-heatclient")
    .build();
HttpResponse response =
    heatAPIClient.executeWithLoadBalancer(request);
Tying it all into a bow …
• Ribbon is generally used with HystrixCommands to execute requests
• Ribbon will automatically discover available servers and load balance between them
• Here the example has an explicit list of servers, but we can change the settings to use Eureka for service discovery

# configuration settings heat-api-client.properties
# Interval to refresh the server list from the source
heat-api-client.ribbon.ServerListRefreshInterval=2000
# Connect timeout used by Apache HttpClient
heat-api-client.ribbon.ConnectTimeout=3000
heat-api-client.ribbon.listOfServers=localhost:8004,10.0.2.15:8004

// No need to reference an explicit server, Ribbon finds one
URI uri = new URI("/v1/"
    + token.getToken().getTenant().getId() + "/stacks");
HttpRequest request = HttpRequest.newBuilder().uri(uri)
    .verb(Verb.POST)
    ...
HttpResponse response =
    heatAPIClient.executeWithLoadBalancer(request);
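For intuition about what client-side load balancing with retry means, here is a plain-Java sketch. RoundRobinClient is a hypothetical class and does not reflect Ribbon's actual rule or retry implementation; round-robin selection and retry-on-next-server are assumptions chosen because they are the simplest policy to show. The server list reuses the two addresses from the properties above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical class; a plain-Java illustration, not the Ribbon implementation.
class RoundRobinClient {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinClient(List<String> servers) {
        this.servers = servers;
    }

    // Picks servers in rotation; floorMod keeps the index valid on overflow.
    String chooseServer() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // Tries the request against successive servers until one succeeds
    // or the attempt budget is used up.
    <T> T execute(Function<String, T> request, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String server = chooseServer();
            try {
                return request.apply(server);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("all attempts failed", last);
    }
}

class RoundRobinSketch {
    public static void main(String[] args) {
        RoundRobinClient client = new RoundRobinClient(
            Arrays.asList("localhost:8004", "10.0.2.15:8004"));
        // Simulate the first server being down; the retry lands on the second.
        String result = client.execute(server -> {
            if (server.startsWith("localhost")) {
                throw new IllegalStateException("down");
            }
            return server + "/ok";
        }, 2);
        System.out.println(result);
    }
}
```

The caller only supplies a relative request; which server handles it is decided inside the client, which is why the URI in the example above carries no host.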
Leveraging Hystrix’s Built In Event Stream
Netflix OSS IBM port/enablement
Netflix “Zen” of Cloud
• Worked with initial services to enable cloud native arch
• Worked with initial services to enable NetflixOSS usages
• Created scorecard and tests for “cloud native” readiness
Highly Available IaaS and Cloud Services
• Deployment across multiple IBM SoftLayer IaaS datacenters and global and local load balancers
• Complete automation via IBM SoftLayer IaaS APIs
• Ensured facilities for automatic failure recovery
Micro-service Runtimes (Karyon, Eureka Client, Ribbon, Hystrix, Archaius)
• Ported to work with IBM SoftLayer IaaS and on the WebSphere Liberty Profile application server
• Created “eureka-sidecar” for non-Java runtimes and ElasticSearch discovery
Netflix OSS Servers (Asgard, Eureka Server, Turbine)
• Ported to work with IBM SoftLayer IaaS + RightScale
• Operationalized HA and secure deployments for multiple service tenants
Adopted Chaos Testing
• Ported Chaos Monkey to IBM SoftLayer IaaS
• Performed manual Chaos Gorilla validation on services
Worked through devops tool chain
• Worked with initial services to enable continuous delivery with devops (and image baking via an Aminator-like tool)
• Working through integration with Urban Code Deploy and other IBM continuous delivery tools
AIM Scalable Services Fabric NetflixOSS Port to SoftLayer
Demo of NetflixOSS and Netflix services on SoftLayer
[Diagram labels: Region (Dallas); DAL01; Datacenter (DAL06); DAL05; Eureka; Local LBs; Web App; Auth Service; Booking Service; Cluster Auto Recovery and Scaling Services; Global Load Balancers; Asgard; SoftLayer GLB/LLB and Datacenters]
Demo of NetflixOSS and Netflix services on SoftLayer
[Figure labels: Turbine; Eureka]
About Your Architecture
• Architecture should support DevOps principles such as staged roll out, operational insights, and scriptability
• Each resource provides some very practical advice for building systems which are focused on reliability and feedback loops
Release It!: Design and Deploy Production-Ready Software
http://netflix.github.io/#repo