brighttalk high scale low touch and other bedtime stories - final

High Scale Low Touch and Other Bedtime Stories

Mr. White has fifteen years of experience designing and managing the deployment of Systems Monitoring and Event Management software. Prior to joining IBM, Mr. White held various positions including the leader of the Monitoring and Event Management organization of a Fortune 100 company and developing solutions as a consultant for a wide variety of organizations, including the Mexican Secretaría de Hacienda y Crédito Público, Telmex, Wal-Mart of Mexico, JP Morgan Chase, Nationwide Insurance and the US Navy Facilities and Engineering Command.
Andrew White Cloud and Smarter Infrastructure Solution Specialist IBM Corporation

http://weheartit.com/entry/12433848

Ground rules for this session…
• If you can’t tell if I am trying to be funny…
  – GO AHEAD AND LAUGH!
• Feel free to text, tweet, yammer, or whatever during this talk. I appreciate the real-time feedback from you.
• If you have a question, no need to wait until the end. Just interrupt me. Seriously… I don’t mind.

I am here today to share some of what I have learned about Anti-patterns and Orchestration

What Is a System?
It is a set of interconnected actors that change over time when they are influenced by other elements of the system.
[Diagram: eight interconnected actors]

Systems are Volatile
This change makes it difficult to control the behavior of the system. The good news is that systems are perfect. They always deliver the optimum result for a given stimulus.

Rare Events
“one chance in a million” will undoubtedly occur, with no less and no more than its appropriate frequency, however surprised we may be that it should occur to us.
- Sir Ronald A. Fisher
© Aquire Inc. 2012

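The Gaussian Bell Curve
[Chart: normal distribution centered on the mean, with bands marked at ±1σ, ±2σ, and ±3σ]
• Within ±1σ of the mean: about 68%
• Within ±2σ of the mean: about 95%
• Within ±3σ of the mean: about 99.7%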
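The share of a normal distribution that falls within one, two, or three standard deviations of the mean can be computed directly. The following is a minimal sketch using only the Python standard library; the function name `coverage_within` is an illustrative choice and not something from the talk.

```python
import math

def coverage_within(k: float) -> float:
    """Probability that a normally distributed value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = coverage_within(k)
    # The remainder is the "rare event" share that falls outside the band.
    print(f"within ±{k}σ: {inside:.2%}   outside: {1 - inside:.2%}")
```

Even at ±3σ a small share of outcomes remains outside the band, which is the point of the Fisher quote: at high enough volume, rare events still show up on schedule.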

Two Important Properties
• The causal effect between two actors will always impact the entire system
• Correlation != Causation

How Fire Works
[Diagram: over time, the conditions Oxygen, Heat, and Fuel -AND- the action Match Strike produce Fire]
• Actions are momentary and act as a catalyst to bring about change
• Conditions are stable and exist over time
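The distinction between conditions and actions can be sketched in a few lines of Python. This is only an illustration of the -AND- relationship on the slide; the `Effect` class and its field names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Effect:
    name: str
    conditions: FrozenSet[str]  # stable states that must all be present
    action: str                 # momentary trigger that acts as the catalyst

    def occurs(self, present: FrozenSet[str], triggered: Optional[str]) -> bool:
        # Every condition -AND- the action must be present for the effect to occur.
        return self.conditions <= present and triggered == self.action

fire = Effect("Fire", frozenset({"Oxygen", "Heat", "Fuel"}), "Match Strike")

print(fire.occurs(frozenset({"Oxygen", "Heat", "Fuel"}), "Match Strike"))  # all causes present
print(fire.occurs(frozenset({"Oxygen", "Fuel"}), "Match Strike"))          # one condition missing
print(fire.occurs(frozenset({"Oxygen", "Heat", "Fuel"}), None))            # no action
```

Removing any single cause, whether a condition or the action, prevents the effect. That is why a cause-and-effect chart offers more than one place to intervene.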

A Real World Example
[Cause-and-effect chart; the causes feeding each effect are joined by -AND-]
• Customers complaining
• Web server returning 500 errors
• The application server was timing out
• SQL Server was not processing queries
• Transaction log was unable to grow
• T: drive at 0 bytes free
• Logs were not truncated
• DBA on honeymoon vacation in Fiji
• Logs are truncated manually
• Company has only 1 DBA
• “Backup” DBA was not aware the logs require truncation
• Space allocations are fixed (Lack of Control)
• Only one database cluster in use
• DR SQL Cluster
• DR cluster being used for UAT testing (More Information Needed)
• Only one application server exists (More Information Needed)
• Trying to do business on the website (Desired Condition)

Hidden Factors
[Diagram labels: Hidden Factor, Smoking, Lung Cancer]

A failure is not always a mistake, it may simply be the best one can do under the circumstances. -- B.F. Skinner, American Psychologist & Professor, Harvard University

Uh oh. Two independent thought alarms in one day. The students are over-stimulated. Willie! Remove all the colored chalk from the classrooms. --B.F. Skinner, Principal, Springfield Elementary School

You can't judge my choices without understanding my reasons. -Unknown

Feedback Loops
Unfortunately, feedback has taken on both positive and negative connotations. In reality, positive feedback is not “praise” and negative feedback is not “criticism.” Positive feedback reinforces, while negative feedback balances.
[Diagram: Profits, Productivity, and Cost Cutting, with loops labeled Reinforcing and Balancing]
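The difference between a reinforcing loop and a balancing loop is easy to see in a toy simulation. This is a minimal sketch; the update rules, gains, and starting values are assumptions chosen for illustration and do not come from the slide.

```python
from typing import List

def reinforcing(value: float, gain: float, steps: int) -> List[float]:
    """Positive feedback: each step adds a change proportional to the current value."""
    history = [value]
    for _ in range(steps):
        value += gain * value
        history.append(value)
    return history

def balancing(value: float, target: float, correction: float, steps: int) -> List[float]:
    """Negative feedback: each step closes part of the gap between value and target."""
    history = [value]
    for _ in range(steps):
        value += correction * (target - value)
        history.append(value)
    return history

print([round(v, 1) for v in reinforcing(100.0, 0.10, 5)])
print([round(v, 1) for v in balancing(100.0, 50.0, 0.50, 5)])
```

The reinforcing series keeps moving away from its starting point, while the balancing series settles toward its target. Neither is "good" or "bad" by itself.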

The Agile Value Proposition
[Causal loop diagram: Availability, Change Frequency, Change Size, Change Capability, and Change Risk, linked by (+) and (-) influences]
Adapted From: http://www.lean4it.com/2013/05/devops-cld-part-2.html

[Causal loop diagram: Customer Satisfaction, Availability, Change Frequency, Change Size, Change Capability, Change Risk, Business Value, Business Demand, and Change Backlog, linked by (+) and (-) influences]
Adapted From: http://www.lean4it.com/2013/05/devops-cld-part-2.html

Be Careful of Good Intentions
[Causal loop diagram: Availability, Change Frequency, Change Size, Change Capability, Change Risk, Business Value, Business Demand, Change Backlog, Change Process, and Release Process, linked by (+) and (-) influences]
Adapted From: http://www.lean4it.com/2013/05/devops-cld-part-2.html

Be Careful of Good Intentions
[Causal loop diagram: Availability, Change Frequency, Change Size, Change Capability, Change Risk, Business Value, Business Demand, Change Backlog, Change Process, Release Process, and Change Automation, linked by (+) and (-) influences]
Adapted From: http://www.lean4it.com/2013/05/devops-cld-part-2.html

The trick is not to spend our time trying to get better at predicting this world, or making it more predictable, for both of these strategies are bound to fail. - Nassim Nicholas Taleb, Author and Philosopher

http://www.lamomparis.com/2012/08/french-handball-fantasies-whatever.html
Woteva you say, man!

A new set of challenges
Today’s IT infrastructures are too complex, provide poor scalability, and are slow to keep up with today’s rapid rate of change
Increasing Complexity
§ Heterogeneous environments
§ Organizational silos
§ Skill gaps
Massive Scale
§ Users, transactions, data
§ Rapid demand cycles
§ Unpredictable
Rapid Pace
§ Evolving ecosystem
§ Minimize time to value
§ Accelerating business needs
[Diagram: Workload View, with elements labeled V1 … Vn, C, W1 to W4, and R1 to R3, for Traditional (Systems of Record) and Emerging (Systems of Interaction)]

Workloads are volatile
Volatile workload characteristics result from changing business requirements
Traditional
§ Few, stable, and well known workloads
§ Fixed System hardware, manual scaling
§ Hardwired workload, minimal configuration
Current
§ Diverse workload, limited patterns
§ Homogeneous resource pooling
§ Expert configuration and mapping of workload
Future
§ Rapidly changing workloads, dynamic patterns
§ Dynamic automatic composition of heterogeneous system
§ Autonomic and proactive management
[Diagram: elements labeled V1 … Vn, C, W1 to W4, and R1 to R3]

A new approach to infrastructures
Software Defined Environments are optimized to deliver the agility, efficiency and performance needed for today’s workloads
• Simplified: Reduce infrastructure complexity, specialization and support
• Adaptive: Ensure service qualities through policy based, scalable infrastructure automation
• Responsive: Accelerate application lifecycle and workload deployment through rapid change
[Diagram: Software Defined Environment, labeled Resource Smart and Application Aware, with Definitions, Patterns, Analytics, Compute, Network, and Storage]

What are Software Defined Environments
With a Software Defined Environment, business users can describe their requirements of the IT environment in a systematic way that in turn drives automation of the infrastructure
• Software: Abstracted and virtualized IT infrastructure resources managed by software
• Defined: Applications that define infrastructure requirements and configuration
• Environments: IT infrastructure that extends multiple environments to go beyond the data center
[Diagram: Software Defined Environment, labeled Resource Smart and Application Aware, with Definitions, Patterns, Analytics, Compute, Network, and Storage]

The next-gen infrastructure for HSLT
• Fully virtualized, integrated & programmable infrastructure
• Elastically scalable resources available on-demand
• Intelligent resource scheduling
• Infrastructure that captures workload requirements and deployment best practices
• Policy-based automation across infrastructure
• Analytics to optimize the environment in real-time
Application Aware: understands the unique workload requirements
Resource Smart: dynamically allocates infrastructure based on policies
[Diagram: Software Defined Environment, labeled Resource Smart and Application Aware, with Definitions, Patterns, Analytics, Compute, Network, and Storage]

Leverage best practices with patterns of expertise
What the business wants:
• Define business needs
• Identify service opportunities and requirements
• Quickly experiment and test new services
What Software Defined Environments provide:
• Patterns of Expertise to link solution to infrastructure based on business rules
• Automated orchestration of workloads
• Analytics-based optimization of workload to maximize outcomes
[Diagram labels: Solution Definition, Service, Policy, Priority, Continuous Optimization]

What’s the best infrastructure for my cloud?
How do I manage & secure my hybrid environment?
How do I maintain choice and flexibility?
How do I rapidly deploy & operate my cloud?
Are you building the right cloud?

Anti-Pattern: a commonly used process structure or pattern of action that, despite initially appearing to be an appropriate and effective response to a problem, typically has more bad consequences than beneficial results. -Wikipedia

How to recognize anti-patterns
Design Pattern
• Problem
• Context and Forces
• Solution
• Benefits
• Consequences
• Related Patterns
Design Anti-Pattern
• Contextual Causes
• Anti-Pattern Solution
• Symptoms and Consequences
• Refactored Solution
• Benefits
• Consequences
• Related Patterns

Are Today’s Good Practices Tomorrow’s Anti-Patterns?

Architecture by Accident
• The Humble Start…
• Meeting Demand…
• The First Bottleneck…
• The Second Bottleneck…
• Becoming Mission Critical…
• Enabling SOA…
• The Fun Begins…
How Did We Get Here?

Everything fails, all the time. - Werner Vogels, CTO Amazon


Six Types of Socratic Questions
Questions that Clarify
• What does that mean?
• Can you explain that?
Questions that Challenge
• Is that always the case?
• How can you verify or disprove?
Questions that Examine
• What causes that to happen?
• What would be an example?
Questions that Expand
• Why is that view best?
• What is another way to look at it?
Questions that Assess
• What are the consequences?
• What are you implying?
Questioning the Question
• What was the point of that question?
• How does that question apply to us?

Questions we need to ask
• What kind of scenarios do I need to plan for?
• Where are my single-points-of-failure?
• If I use a master/slave architecture, what happens when I lose the master?
• What happens when my load balancer fails?
• How will I replace a node when it fails?
• How will I recognize a failure when it occurs?
• What if the cache grows beyond the memory limit of the instance?
• What if a downstream service times out or returns an exception?
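One way to answer the last question is to give every downstream call a time budget and a fallback. The sketch below uses only the Python standard library; `call_with_fallback`, the sample services, and the "cached" fallback value are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_fallback(func, timeout_s: float, fallback):
    """Run func in a worker thread; return fallback if it times out or raises."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback   # the downstream service took too long
    except Exception:
        return fallback   # the downstream service returned an exception
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the slow worker

def slow_service():
    time.sleep(0.3)       # simulated slow dependency
    return "fresh"

def broken_service():
    raise RuntimeError("simulated downstream failure")

print(call_with_fallback(slow_service, 0.05, "cached"))
print(call_with_fallback(broken_service, 1.0, "cached"))
print(call_with_fallback(lambda: "fresh", 1.0, "cached"))
```

The worker thread is not cancelled when the timeout fires, so a real system would also need a timeout on the underlying network call. The sketch only shows that the caller stays responsive.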

Requirements for cloud scalability
• Build process threads that resume on reboot
• Allow for state to re-sync using message queues
• Prepare pre-configured and pre-optimized virtual images for quick launches
• Avoid in-memory data stores
• Have a good DR strategy and then automate it
• Create applications as loosely coupled systems to enable better fault tolerance and larger scale
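The first requirement, work that resumes after a reboot, usually comes down to persisting progress outside the process. The sketch below shows one simple way to do that with a checkpoint file. The file format, the function name `process_with_checkpoint`, and the sample handler are assumptions for illustration.

```python
import json
import os
import tempfile

def process_with_checkpoint(items, checkpoint_path, handle):
    """Process items in order, saving progress so a restart resumes where it stopped."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for index in range(start, len(items)):
        handle(items[index])
        # Write to a temporary file and rename it, so a crash never leaves
        # a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)

checkpoint = os.path.join(tempfile.mkdtemp(), "progress.json")
handled = []
process_with_checkpoint(["a", "b", "c"], checkpoint, handled.append)
# A second run finds the checkpoint and has nothing left to do.
process_with_checkpoint(["a", "b", "c"], checkpoint, handled.append)
print(handled)
```

If the process dies after handling an item but before saving the checkpoint, that item is handled again on restart, so handlers should be safe to repeat.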

Don’t forget about the cloud-as-a-system view

Where Does Over-Subscription Occur?
[Diagram: Corporate LANs & VPNs, Load Balancers, Firewall, Switch, Web Servers, VM Server Farm, Database, NAS Appliances, and Storage Frame, with several points marked “Here”]
Common Locations
1. Hypervisor
2. CPU Cycles
3. Memory
4. Blade Backplane I/O
5. SAN Fabric
6. Network Interfaces
7. Host Bus Adapters
8. Backup Device
9. WAN Circuits
10. Storage Processors


The scalability of an application is defined by whether it can accommodate changes in traffic without requiring changes in architecture.

Service Orientation
Goals of Service Orientation
• Abstraction
• Loose Coupling
• Autonomy
• Standard Services
• Composability
• Reusability

Or·ches·tra·tion [AWR-kuh-strey-shun]
1. The act of arranging a piece of music
2. The planning or execution of events in order to achieve a desired effect
3. The technique of arranging or manipulating, especially by means of clever or thorough planning or maneuvering
• A central process controls everything and coordinates the execution of the different operations on the services involved
• The services do not “know” that they are involved in a composite process
• Only the central coordinator of the orchestration is aware of the desired outcome
• The orchestration leverages explicit process definitions to operate the services in the correct order of invocation

Orchestration Illustration
[Diagram: an Orchestrator connected to Web Service 1, Web Service 2, Web Service 3, and Web Service 4]
Cho·re·og·ra·phy [kawr-ee-OG-ruh-fee]
1. The art of composing ballets and other dances
2. The method of representing the various movements in dancing by a system of notation
3. The arrangement or manipulation of actions leading up to an event
• Choreography does not rely on a central coordinator
• Each service knows exactly when to execute its operations and with whom to interact
• Focuses on the exchange of messages and information
• All services need to be aware of the business process, operations to execute, messages to exchange, timing, etc.

Choreography Illustration
[Diagram: Web Service 1, Web Service 2, Web Service 3, and Web Service 4 interacting directly through Send, Receive, and Invoke]
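Choreography can be sketched as services that react to each other's messages with no coordinator in the middle. In the illustration below, the `MessageBus` class is only a transport and holds no process logic. The topic names and the three services are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

class MessageBus:
    """Simple transport: delivers messages to subscribers and holds no process logic."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Callable[[Dict], None]]] = defaultdict(list)
        self.delivered: List[str] = []  # record of topics, kept only for inspection

    def subscribe(self, topic: str, handler: Callable[[Dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def send(self, topic: str, message: Dict) -> None:
        self.delivered.append(topic)
        for handler in self.subscribers[topic]:
            handler(message)

# Each service knows which message it reacts to and which message it sends next.
def start_inventory_service(bus: MessageBus) -> None:
    bus.subscribe("order_placed", lambda order: bus.send("stock_reserved", order))

def start_payment_service(bus: MessageBus) -> None:
    bus.subscribe("stock_reserved", lambda order: bus.send("payment_taken", order))

def start_shipping_service(bus: MessageBus) -> None:
    bus.subscribe("payment_taken", lambda order: bus.send("shipment_scheduled", order))

bus = MessageBus()
for start in (start_inventory_service, start_payment_service, start_shipping_service):
    start(bus)

bus.send("order_placed", {"order_id": 42})
print(bus.delivered)
```

The business process is not written down in any one place. It emerges from what each service subscribes to and sends, which is why every participant has to understand its part of the process.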

Choreography vs. Orchestration
• From the perspective of composing services to execute business processes, orchestration is a more flexible paradigm and has the following advantages over choreography:
  – The coordination of component processes is centrally managed by a known coordinator.
  – Web services can be incorporated without their being aware that they are taking part in a larger business process.
  – Alternative scenarios can be put in place in case faults occur.

Orchestration Requirements
• Event-based processing
• Coordinate asynchronously between services
• Correlate messages being exchanged
• Provide for parallel processing
• Allow for transaction roll-back
• Manipulate and transform data between messaging partners
• Be able to manage long running business transactions and activities
• Have a robust mechanism for fault and error handling
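Roll-back across several services is often handled by pairing every step with a compensating action. The sketch below shows that idea under stated assumptions. The function `run_with_rollback` and the provisioning steps are hypothetical and are not taken from any particular orchestration product.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[List[str]], None], Callable[[List[str]], None]]  # (do, undo)

def run_with_rollback(steps: List[Step], log: List[str]) -> None:
    """Run each step; on failure, undo the completed steps in reverse order and re-raise."""
    undo_stack: List[Callable[[List[str]], None]] = []
    try:
        for do, undo in steps:
            do(log)
            undo_stack.append(undo)
    except Exception:
        for undo in reversed(undo_stack):
            undo(log)
        raise

def create_vm(log): log.append("vm created")
def delete_vm(log): log.append("vm deleted")
def attach_storage(log): log.append("storage attached")
def detach_storage(log): log.append("storage detached")
def register_dns(log): raise RuntimeError("simulated DNS failure")
def unregister_dns(log): log.append("dns removed")

log: List[str] = []
try:
    run_with_rollback(
        [(create_vm, delete_vm), (attach_storage, detach_storage), (register_dns, unregister_dns)],
        log,
    )
except RuntimeError as error:
    log.append(f"failed: {error}")

print(log)
```

Only steps that completed are compensated, and they are undone in reverse order, so the environment is left as close as possible to its starting state.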

The Management Eco-System
[Diagram components]
• Capacity Management: Compute, Storage, Network, Facilities
• Event Management (Netcool OMNIbus)
• CMDB (SmartCloud ControlDesk)
• Billing (Cost Mgr)
• Software Tracking (TEM SUA)
• SmartCloud Monitoring
• Storage Productivity Center
• Network Performance Manager
• Data Center Infrastructure Manager
• Capacity Management: Predictive Insights, Capacity Analyzer
• Automated Reporting Engine (Tivoli Common Reporter)
• Cloud Automation (SmartCloud Orchestrator)
• Interface for Capacity Planners
• Interface for Business Users
• Policies (EGO)
• Data Warehouse

One Integrated Environment
[Diagram components]
• Distributed, Database, Mainframe, Network, Middleware, Storage
• Event Pool
• Operational Data Warehouse
• Predictive
• Enrichment & Correlation
• Service Desk
• Paging
• CMDB
• Knowledge
• Asset Mgmt
• Event Catalog
• Event API
• Business Telemetry
• 3rd Party Providers
• Presentation Framework

Processing Streams
[Diagram labels: Situational Awareness Engine, Real-Time Event Streams, Detected and Predicted Situations, Patterns from Historical Data, Causal Relationship from Past RCAs]
Adapted from http://www.slideshare.net/TimBassCEP/getting-started-in-cep-how-to-build-an-event-processing-application-presentation-717795

Complex Event Processing
[Diagram labels: Event Pipeline, Event Queries, Time Window, Data Events, Control Event, Other Events, Event Filter, Scenarios A, B, and C, Feedback Loop, Event Intelligence, Action Events]
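The combination of an event filter and a time window can be sketched in a few lines. The example below flags a situation when a set number of matching events arrive within a sliding window. The `WindowRule` class, the event names, and the thresholds are assumptions for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque

class WindowRule:
    """Detects `threshold` events of one type within a sliding window of `window_s` seconds."""

    def __init__(self, event_type: str, threshold: int, window_s: float) -> None:
        self.event_type = event_type
        self.threshold = threshold
        self.window_s = window_s
        self.times: Deque[float] = deque()

    def observe(self, event_type: str, timestamp: float) -> bool:
        # Event filter: ignore everything except the event type this rule watches.
        if event_type != self.event_type:
            return False
        self.times.append(timestamp)
        # Time window: drop matching events that are older than the window.
        while self.times and timestamp - self.times[0] > self.window_s:
            self.times.popleft()
        return len(self.times) >= self.threshold

rule = WindowRule("http_500", threshold=3, window_s=60)
stream = [("http_500", 0), ("login", 5), ("http_500", 30), ("http_500", 50), ("http_500", 200)]
for event_type, timestamp in stream:
    if rule.observe(event_type, timestamp):
        print(f"action event: {rule.threshold} x {event_type} within {rule.window_s}s at t={timestamp}")
```

A real event processing engine would evaluate many such rules at once and feed the resulting action events back into the pipeline. The sketch covers a single scenario only.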

Event Processing
[Diagram components]
• Processing stages: Enrichment, Correlation and Event Suppression, Root Cause Analysis, Business Impact Analysis, Notification and Escalation, Automated Action
• Meta-Data Integration Bus
• Common Event Format
• Distributed Collectors
• LOB Managed Monitoring System
• Service Provider Monitoring System
• Vendor Managed Monitoring System
• Element Managers
• Other Enterprise Data
• Document Sharing
• Service Desk
• CMDB
• Batch Scheduling
• Knowledge Database
• Online Run Book
• PBX/Call Manager
• Visualization Framework
• Topology and Relationship Database
• Automated Action Tools
• Automated Provisioning System
• Predictive Analysis
• Automated Change Reconciliation
• Security Management
• Archive and Report
• Business Telemetry Data
• Service Center and Enterprise Notification Tool

Automating Responses
• Palette of library assets enables easy workflow composition through drag and drop
• Access to rich libraries (toolkits) of reusable automation assets that speed automation creation
• Rich set of action types, flow control, and data handling primitives that simplify creation of complex automations
• Easy workflow action editing for managing data mapping, error recovery options, implementation details, etc.
• Graphical editor for composing and connecting workflows
• Rich tooling functions to edit, version, debug, and optimize workflows

Extending Workload Deployment with Custom Automation
[Diagram labels: SCO, Core pattern processing, pre-provision event operation (Pre-process event), post-provision event operation (Post-process event), Operation context, Custom Data, BPM Process, BPM Human Service, SCO REST API, steps 1 to 5, branches A and B, “Once per registered event operation”]

Completing the journey
Define
• Review the existing architecture
• Review the business outcomes
• Define the end state
Prioritize
• Consolidations
• Technologies to virtualize
• Business processes to model and workflows to automate
Execute
• Look for early wins
• Evolve incrementally
• Organize the teams effectively

Iterative development
You have to be realistic about how fast you can mature. Iterating helps form a culture of continuous improvement.

Let’s keep the conversation going…
ReverendDrew
SystemsManagementZen.Wordpress.com
systemsmanagementzen.wordpress.com/feed/
@SystemsMgmtZen
ReverendDrew
614-306-3434
