Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Jamie Allen (@jamie_allen), Senior Director of Global Services
System Orchestration
Application Monitoring
Application Availability
Partition Healing
Security Notifications
Legacy Integration
Expert Support
Certified Build
OPENCORE
Agenda
• Overview – Reactive Platform
• Deep Dive – Improving Fault Tolerance (Resilience)
• Next Steps – Getting Started
I’m Jamie Allen | @jamie_allen
Becomes fast fish that eats slow fish
Becomes streaming video delivery service
Is fighting for top talent on prime time
“Fundamental shift to digital business requires 50% of software in the next 5 years to be built with a new model.”
Reactive: The new way of building software
Reactive Manifesto penned
Industry aligned
Reactive Streams defined
Included in JDK 9
Developers empowered
“You allowed us to come up with a design that we could only dream of before.”
“It’s hard to put into words how exciting it has been to work on a project like this.”
“You made programming fun again.”
“You saved my career.”
Reactive Platform Overview
Enhance Usability, Identify Bottlenecks, Improve Performance
Boost Resilience, Streamline Rollouts, Increase Predictability
Mitigate Data Loss, Reduce Ops Burden, Improve Cluster Health
Protect Servers, Delight Customers, Block Bad Behavior
Unlock Data, Revitalize Architecture, Maximize Investments
Reduce Risk, Ease Maintenance, Improve Predictability
Eliminate Conflicts, Reduce Guesswork, Speed Development
Boost Productivity, Mitigate Production Risk, Speed Knowledge Transfer
Focusing on Fault Tolerance for Resilient systems
Strengthening Resilience - Network Partition handling
• Network partitions - fundamental problem in distributed systems
• Akka SBR (Split Brain Resolver) helps make decisions
• Pre-built strategies for when to down nodes in the cluster
• Static Quorum (like ZooKeeper)
• Keep Majority
• Keep Oldest
• Keep Referee
Akka Split Brain Resolver
What network partitions look like to Ops
[Diagram: cluster nodes exchanging heartbeats; `A` runs on one node]
“Yikes, everyone is down!”
“Hey team, `n-1` is down! I’ll take over `A`!”
[Diagram: `A` now appears on two nodes]
Good if `n-1` really is down
Bad if `n-1` is just very unresponsive
Fundamentally, it is hard to distinguish the two states in distributed systems
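The ambiguity described above can be made concrete with a minimal sketch of timeout-based failure detection. This is not Akka's failure detector (Akka Cluster uses a phi accrual detector); the function name `suspects` and the 5-second threshold are assumptions chosen only for illustration.

```python
# Hypothetical timeout-based failure detection, for illustration only.
TIMEOUT_SECONDS = 5.0  # assumed threshold, not an Akka default


def suspects(last_heartbeat_at: float, now: float,
             timeout: float = TIMEOUT_SECONDS) -> bool:
    """Return True when no heartbeat has arrived within `timeout` seconds."""
    return (now - last_heartbeat_at) > timeout


# Case 1: the peer crashed right after its heartbeat at t=10.
crashed_peer_last_heartbeat = 10.0
# Case 2: the peer is alive, but a long pause or a partition delays its heartbeats.
slow_peer_last_heartbeat = 10.0

now = 20.0
crashed_verdict = suspects(crashed_peer_last_heartbeat, now)
slow_verdict = suspects(slow_peer_last_heartbeat, now)

# The observer has the same evidence in both cases, so it reaches the same verdict.
print(crashed_verdict, slow_verdict)
```

Because both cases look identical from the outside, some additional rule is needed to decide which side of a partition keeps running.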
Static Quorum (quorum size 3, at least n/2 + 1)
[Diagram: partitioned cluster; `A` runs on one node]
“Hey team, `n-1` is down! I’ll take over `A`!”
“we need to down ourselves”
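The static quorum rule can be sketched as a small decision function. This is a simplified model, not the Akka implementation: the function names and the string results are hypothetical, and the rule encoded here (a side survives only if it still sees at least `quorum_size` members) reflects my reading of the strategy.

```python
def is_valid_quorum(quorum_size: int, max_cluster_size: int) -> bool:
    """A quorum must exceed half the cluster so two sides can never both reach it."""
    return 2 * quorum_size > max_cluster_size


def static_quorum_decision(reachable_members: int, quorum_size: int) -> str:
    """Decide what one side of a partition does under a static-quorum rule.

    `reachable_members` counts the nodes this side can still see, itself included.
    """
    if reachable_members >= quorum_size:
        return "down-unreachable"  # this side survives and removes the other side
    return "down-self"  # this side shuts itself down


# A 5-node cluster with quorum size 3, split into sides of 3 and 2 nodes.
decisions = [static_quorum_decision(3, 3), static_quorum_decision(2, 3)]
print(decisions)
```

With a quorum of 3, the cluster should not grow beyond 5 nodes; otherwise a 3/3 split would leave both sides believing they hold the quorum.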
Keep Majority (a.k.a. dynamic quorum)
[Diagram: partitioned cluster; `A` runs on one node]
“we need to down ourselves”
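Keep Majority compares the two sides of the partition as they are at the moment of the split instead of using a fixed number. A minimal sketch, assuming node addresses are plain strings and that a tie is broken in favour of the side holding the lowest address (my understanding of the documented behaviour; the function name and return values are hypothetical):

```python
def keep_majority_decision(reachable: set, unreachable: set) -> str:
    """Decide what this side does; `reachable` includes the deciding node itself."""
    if len(reachable) > len(unreachable):
        return "down-unreachable"  # this side is the majority and keeps running
    if len(reachable) < len(unreachable):
        return "down-self"  # this side is the minority
    # Equal halves: keep the side that contains the lowest address (assumed tie-break).
    lowest = min(reachable | unreachable)
    return "down-unreachable" if lowest in reachable else "down-self"


majority_side = keep_majority_decision({"n1", "n2", "n3"}, {"n4", "n5"})
minority_side = keep_majority_decision({"n4", "n5"}, {"n1", "n2", "n3"})
tie_with_lowest = keep_majority_decision({"n1", "n2"}, {"n3", "n4"})
tie_without_lowest = keep_majority_decision({"n3", "n4"}, {"n1", "n2"})
print(majority_side, minority_side, tie_with_lowest, tie_without_lowest)
```

Unlike a static quorum, this rule adapts as nodes are added or removed, which is why the slide calls it a dynamic quorum.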
Keep Referee
[Diagram: partitioned cluster with a designated referee node; `A` runs on one node]
“can’t see referee node!”
Options shown: down-all-if-less-than-nodes, down-if-all-alone
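Keep Referee can be sketched the same way. The function below is a simplified model rather than the Akka code: the name, the return values, and the parameter `down_all_if_less_than_nodes` (mirroring the option named on the slide) are illustrative assumptions.

```python
def keep_referee_decision(reachable: set, referee: str,
                          down_all_if_less_than_nodes: int = 1) -> str:
    """Decide what this side does; `reachable` includes the deciding node itself."""
    if referee not in reachable:
        return "down-self"  # cannot see the referee, so this side shuts down
    if len(reachable) < down_all_if_less_than_nodes:
        return "down-all"  # too few nodes left with the referee: stop everything
    return "down-unreachable"  # the referee's side keeps running


with_referee = keep_referee_decision({"referee", "n2", "n3"}, "referee")
without_referee = keep_referee_decision({"n4", "n5"}, "referee")
too_small = keep_referee_decision({"referee"}, "referee",
                                  down_all_if_less_than_nodes=2)
print(with_referee, without_referee, too_small)
```

The trade-off is that the referee is a single fixed node: if it is lost, no side is allowed to survive.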
Keep Oldest
[Diagram: partitioned cluster with the oldest node marked; `A` runs on one node]
“can’t see oldest node!”
Option shown: down-if-all-alone
The oldest node can change if the “up until now oldest node” leaves the cluster.
This is more dynamic than keep-referee.
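A minimal sketch of Keep Oldest, assuming the caller already knows which member is currently the oldest. The boolean `down_if_alone` stands in for the option shown on the slide; the function name, the return values, and the rule that a lone oldest node gives way only when at least two nodes are on the other side are assumptions based on my understanding of the strategy.

```python
def keep_oldest_decision(reachable: set, unreachable: set, oldest: str,
                         down_if_alone: bool = True) -> str:
    """Decide what this side does; `reachable` includes the deciding node itself."""
    if oldest in reachable:
        # The oldest node is on this side; it gives way only if it is all alone.
        if down_if_alone and len(reachable) == 1 and len(unreachable) >= 2:
            return "down-self"
        return "down-unreachable"
    # The oldest node is on the other side.
    if down_if_alone and len(unreachable) == 1 and len(reachable) >= 2:
        return "down-unreachable"  # the oldest is alone over there; this side stays up
    return "down-self"


# Cluster of three where "a" is the oldest member.
oldest_alone = keep_oldest_decision({"a"}, {"b", "c"}, oldest="a")
others_together = keep_oldest_decision({"b", "c"}, {"a"}, oldest="a")
oldest_with_peer = keep_oldest_decision({"a", "b"}, {"c"}, oldest="a")
cut_off_node = keep_oldest_decision({"c"}, {"a", "b"}, oldest="a")
print(oldest_alone, others_together, oldest_with_peer, cut_off_node)
```

Because the oldest member is recomputed from current membership, the "anchor" node moves when the previous oldest node leaves, which a fixed referee address cannot do.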
Akka Split Brain Resolver
• No-brainer – using Akka Cluster, deploying on AWS
• Next Steps – read docs, download Reactive Platform
Strengthening Resilience - System Orchestration
ConductR
• Message-driven apps run on 10s, 100s, 1000s of nodes
• Beyond 3 nodes, challenging for ops
• ConductR eases deployment and management
• Focused on resilience for your system, not infrastructure
ConductR
• Manage microservices-based apps
• Automated cluster startup
• Dynamic service discovery
• Scalable rolling updates
ConductR
• Hardcore resilience for systems
• Load balancing at scale
• Auto recovery of failed apps/nodes
• Advanced partition resolution
ConductR
• Smooth release process
• Sandbox for Dev and Ops
• Immutable, standardized
• Various packaging formats (Docker, JVM)
ConductR
• Keep your existing tools
• Infrastructure agnostic
• Combine with Monitoring
• Consolidated logging
Without ConductR
• Build machines
• OS
• App server
• Apps lifecycle
• Add resilience
• Config load balancer
• Config port
With ConductR
• Build nodes w/ ConductR
• OS
• ConductR
• Deploy apps/services to cluster via ConductR
Resilient from the core, not as an add-on
ConductR
• No-brainer – using Akka Cluster, deploying on AWS, 3+ nodes
• Next Steps – view interactive demo, enjoy sandbox
Looking After System Resilience - Application Monitoring
• Asynchronous apps pose new challenges
• Context is lost
• Traces less useful
• Easy to get flooded with data
Monitoring
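To illustrate the "context is lost" problem in general terms: once work hops between asynchronous tasks, the call stack no longer says which request a step belongs to, so the context has to be carried explicitly. The sketch below uses Python's standard `contextvars` and `asyncio` modules to attach a request ID to each task. It is a generic technique, not a description of how Typesafe Monitoring works; the names `request_id`, `handle`, and `fetch_user` are hypothetical.

```python
import asyncio
import contextvars

# Carries the current request's ID across await points within one task.
request_id = contextvars.ContextVar("request_id", default="unknown")
events = []  # collected (request id, step) pairs, standing in for a metrics sink


async def fetch_user() -> None:
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    # The stack no longer contains handle(); the context variable still does.
    events.append((request_id.get(), "fetch_user"))


async def handle(rid: str) -> None:
    request_id.set(rid)  # visible only inside this task's context
    await fetch_user()


async def main() -> None:
    # Each coroutine runs as its own task with its own copy of the context.
    await asyncio.gather(handle("req-1"), handle("req-2"))


asyncio.run(main())
print(sorted(events))
```

Each recorded event can then be attributed to the request that caused it, even though the two requests were interleaved on the same event loop.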
Monitoring
• Monitor asynchronous apps
• Real-time big picture
• Configurable metrics
• Customizable thresholds
Monitoring
• Enhance user experience
• Design for performance
• Fix bugs, code-level views
• Boost resilience, ConductR
Monitoring
• Vision for full coverage for Reactive systems:
• Akka Streams, Data Flows
• Futures, Scala and Java 8
• Tracing Play, Akka HTTP
Monitoring
• No-brainer – building Akka-based apps
• Next Steps – view interactive demo, download Monitoring
The world is going Reactive
Reactive Platform: Getting Started
Sign up to get license ID
• Get Started on Typesafe.com
• Register for a free account
• Apply ID to existing project, or start a new one
Getting Started with RP
Use with your new RP project
• Developer sandbox with Docker
• Full deployment evaluation also available
Experiment with ConductR
GET IN TOUCH
Help is just a click away. Get in touch with Typesafe about:
• Production licensing and subscriptions
• Additional services and support
• On-site, expert training
CONTACT US