See-through software
DESCRIPTION
Software, by default, is opaque. It doesn't explain what it is doing, or how well it's going. Only through development do we gain visibility into systems, and thus commonly only the needs of developers are met. See-through software seeks to acknowledge the needs of the entire organization through democratization of data access. This talk focuses on logging and metrics as two sources of potential insight and the architecture CommerceHub has adopted to democratize access to this information.
TRANSCRIPT
See-through software
Using logs, metrics and visualization to see your app at
runtime and share your vision with others
The best technical solutions are ones that solve for human relationships
Opaque software suffers from a lack of focus on the operations user experience
Software is opaque by default
• What's it doing?
• Is it going well?
• When's it going to be done?
• Does it need me to do anything?
When you write opaque software
This is the user experience of operations
This is the user experience of support
This is the user experience of your boss
Opaque software leads to
• Misaligned priorities
• Loss of productivity
• A generally "unprofessional" experience for customers
• The "us vs them" attitude that is the antithesis of DevOps culture
If you don’t provide facts, you encourage mythology
See-through software acknowledges the operations user
experience of the entire organization
“A good user experience should make a user feel smart, powerful and safe.”
–Jason Nemec
We feel smart when
• We understand what's going on
• We understand how to change things
• Others share our understanding
We feel powerful when
• We are able to change things
• We see the results of our changes
• We can find an answer to our questions
We feel safe when
• We know there isn't a problem
• We know if there is a problem, we'll be able to understand it
• We can trust others to react on our behalf
These principles help to develop a roadmap to
improving transparency
Transparent software gets your attention
• Dashboarding
• Alerting
Transparent software takes on a shape
• Graphing
• Modeling
Transparent software tells stories
• Logging
• Auditing
• Reporting
Transparent software responds interactively to questions
• Ad Hoc Queries
• Post Hoc Analytics
Transparent software is democratic
• Wikis
• Shared visibility
• Persistent chat rooms
Software does not become transparent as the result of any single project
• Software evolves; its UX needs to evolve with it
• Insight is rarely easy to produce, and easy-to-produce information is rarely insightful
• Insight is frequently driven from the bottom up or from the outside in
Democratization means everybody benefits
And everybody has a role to play
"Clearinghouse" services
• Store data for people who can’t get it themselves
• Collect and persist data from many different sources
• Provide a single engine for serving information
• Reduce pressure on critical infrastructure from interested users
"Visualization" services
• Provide studio-like tools that let "non-technical" users feel safe to experiment
• Allow for rapid, real-time development of new insights on old data
• Allow for sharing and repurposing of insight
The see-through system at runtime
Logging
Good logs tell a story
• Each statement is a sentence: it needs verbs and nouns
• Each statement has a setting -- where, when and who
• It should be simple to reconstruct the story told by independent sentences
Aggregate your logs to create an epic
• Discover systems that are acting aberrantly
• Correlate errors between coordinating systems
• Graph meaningful patterns in your stories
Index your logs to find interesting stories quickly
• Audit individual chains of processing from start to finish
• Slice up your reports so they interest a specific group or team
• Build new reports quickly to solve unpredicted needs
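As a rough illustration of what indexing buys you, the sketch below groups already-parsed log entries by a chosen field. It is an in-memory stand-in for a real index, and the field names (`correlation_id`, `when`, `team`) and the sample entries are made up for the example rather than taken from the talk.

```python
from collections import defaultdict

# Assumed log shape: dicts with "correlation_id", "when", "team" and "message" keys.
# These field names and the sample data are illustrative only.
SAMPLE = [
    {"correlation_id": "a1", "when": "2015-01-01T10:00:02", "team": "billing", "message": "invoice sent"},
    {"correlation_id": "b2", "when": "2015-01-01T10:00:01", "team": "orders", "message": "order received"},
    {"correlation_id": "a1", "when": "2015-01-01T10:00:00", "team": "orders", "message": "order received"},
]


def index_by(entries, field):
    """Group log entries by one field, ordered by timestamp within each group."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.get(field, "unknown")].append(entry)
    for group in index.values():
        # ISO 8601 timestamps sort correctly as plain strings.
        group.sort(key=lambda e: e["when"])
    return dict(index)


def audit(entries, correlation_id):
    """Return the messages of one chain of processing, from start to finish."""
    chain = index_by(entries, "correlation_id").get(correlation_id, [])
    return [entry["message"] for entry in chain]
```

Calling `audit(SAMPLE, "a1")` reconstructs one story end to end, while `index_by(SAMPLE, "team")` slices the same data for a specific team.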
Aggregation architecture
• Components: log to the fastest, most convenient, least failure-prone store available (e.g. local disk)
• Log shippers: asynchronously publish logs to an aggregator
• Aggregator: parse, clean, enrich and store logs
• Clearinghouse: hold data and standardize access
• Visualization: allows data to inform and be manipulated by end users
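The aggregator step (parse, clean, enrich) might look roughly like the sketch below. It assumes the shipper delivers one JSON object per line. The function name `enrich` and the added `host` and `received_at` fields are hypothetical choices for this example; in practice a dedicated tool usually fills this role rather than hand-written code.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def enrich(raw_line: str, source_host: str) -> Optional[dict]:
    """Parse, clean and enrich one shipped log line.

    Returns None when the line is not a JSON object, so the caller can
    count or quarantine malformed input instead of crashing.
    """
    # Parse: assume one JSON object per line (an assumption of this sketch).
    try:
        entry = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None

    # Clean: normalize key names so later queries are consistent.
    entry = {str(key).strip().lower(): value for key, value in entry.items()}

    # Enrich: add context the originating component did not know or did not log.
    entry["host"] = source_host
    entry["received_at"] = datetime.now(timezone.utc).isoformat()
    return entry
```

Returning `None` for unparseable lines keeps a single bad line from stalling the whole pipeline.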
Log aggregation on private networks
The ELK stack (Elasticsearch + Logstash + Kibana)
Log aggregation in the Cloud
Developing apps with log aggregation in mind
• Use Correlation IDs throughout your system
• Don't log secrets
• Build log strategies with shipping and rolling in mind
• Have a way to capture crashes
• Log using techniques that preserve context, such as JSON
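Several of these points can be combined in a few lines using Python's standard `logging` module. The sketch below is one possible approach, not the talk's implementation: `JsonFormatter`, `SECRET_KEYS`, `build_logger` and the `context` extra field are names invented for this example.

```python
import io
import json
import logging
from contextvars import ContextVar

# Hypothetical names for illustration; none of these come from the talk.
correlation_id = ContextVar("correlation_id", default="-")
SECRET_KEYS = {"password", "token", "api_key"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so context survives shipping."""

    def format(self, record):
        context = getattr(record, "context", {})
        # Mask values of known-sensitive keys instead of logging them.
        safe = {k: ("***" if k in SECRET_KEYS else v) for k, v in context.items()}
        payload = {
            "when": self.formatTime(record),
            "where": record.name,
            "level": record.levelname,
            "correlation_id": correlation_id.get(),
            "message": record.getMessage(),
        }
        payload.update(safe)
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("orders")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def demo():
    """Log one statement and return it parsed back from JSON."""
    buffer = io.StringIO()
    log = build_logger(buffer)
    correlation_id.set("abc-123")  # normally set once per incoming request
    log.info("order received", extra={"context": {"order_id": 42, "password": "hunter2"}})
    return json.loads(buffer.getvalue())
```

In a real service the stream would be a rolling file on local disk that a log shipper reads, and the correlation ID would be taken from the incoming request rather than hard-coded.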
The see-through system at runtime
Dashboarding
Focus on UX
Make users feel smart
• Dashboards should inform without a lot of explanation or prior knowledge
• Dashboards should direct the user to the next step
Make users feel powerful
• Dashboards should update frequently (aim for <10s)
• Dashboards should help users perform their job
• Dashboards should respond to the user's needs
Make users feel safe
• Dashboards should not overwhelm
• Dashboard accuracy should be known
• Thresholds should be meaningful
• Using a dashboard should not endanger the running software
How to build a dashboard item
Are you concerned with a technical or a business issue?
• Technical: Machine 123 is slow, West Coast users are slow, we’re moving 80 GB/s
• Business: Client ABC is slow, logins are slow, we’re moving 1000k orders/s
How does a stressed system look?
How can you tell it from an unstressed system?
What kind of comparisons do you want to provide?
• Time series vs flat
• Machine vs Machine
• Current vs Previous
• Current vs Threshold
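Two of these comparisons, current vs previous and current vs threshold, reduce to simple arithmetic. The sketch below shows one way a dashboard item could compute them; the function name and the returned keys are assumptions made for illustration.

```python
from typing import Optional


def compare(current: float, previous: float, threshold: float) -> dict:
    """Summarize one reading against its previous value and a threshold."""
    change: Optional[float] = None
    if previous:  # no baseline (or a zero baseline) means no relative change
        change = (current - previous) / previous
    return {
        "current": current,
        "change_vs_previous": change,  # e.g. 0.2 means 20% higher than before
        "over_threshold": current > threshold,
    }
```

A stressed system would then show up as a large relative change, a crossed threshold, or both.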
Dashboarding architecture
• Metric source: a process within an app that can produce a numeric value
• Metrics collection API: decouples the collection of metrics from their publishing; generally still part of the app
• Stats aggregator: an out-of-process component that creates aggregate data points from a stream of metrics
• Metrics clearinghouse: hold data and standardize access
• Visualization: allows a user to build and correlate graphs
• Dashboarding: allows a user to share a distilled vision of data
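To make the stats aggregator's role concrete, here is a minimal sketch of turning a stream of raw metrics into aggregate data points at each flush. It runs in-process for simplicity, whereas the architecture above places this component outside the app. The class and method names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


class StatsAggregator:
    """Accumulate raw metrics and emit aggregate data points on flush."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timers = defaultdict(list)

    def count(self, name, value=1):
        """Record that an event happened `value` times."""
        self.counters[name] += value

    def timing(self, name, milliseconds):
        """Record how long one operation took."""
        self.timers[name].append(milliseconds)

    def flush(self):
        """Return aggregates for the interval just ended, then reset."""
        points = {}
        for name, total in self.counters.items():
            points[f"{name}.count"] = total
        for name, samples in self.timers.items():
            points[f"{name}.mean"] = mean(samples)
            points[f"{name}.max"] = max(samples)
        self.counters.clear()
        self.timers.clear()
        return points
```

Calling `flush()` on a fixed interval and forwarding the result to the metrics clearinghouse gives the visualization layer a small, regular set of data points instead of every raw event.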
Dashboarding on private networks
StatsD + Graphite
Dashboarding in the cloud
Developing apps with dashboarding in mind
• Collect and report everything that’s “free”
• Collect and report deep, valuable application metrics at runtime
• Understand aggregation and know when to apply it
• Be aware of multiplicative effects of metrics collection on bandwidth, storage and billing
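As a sketch of what reporting a metric can involve, the code below formats values in the plain-text StatsD line protocol (`name:value|type`, with an optional `|@rate` sample rate) and sends them over UDP. The helper names, the metric names and the figures in the sizing function are assumptions for illustration. Port 8125 is the conventional StatsD default, so adjust it for your setup.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=1.0):
    """Build one StatsD line: 'c' counter, 'ms' timer, 'g' gauge."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        # Sampling reduces traffic; the server scales counts back up.
        line += f"|@{sample_rate}"
    return line.encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP so metrics never block the app."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def datapoints_per_day(metrics, hosts, flush_seconds):
    """Estimate stored data points: every metric, on every host, every flush."""
    return metrics * hosts * (86_400 // flush_seconds)
```

The last function illustrates the multiplicative effect: 50 metrics on 20 hosts flushed every 10 seconds already comes to 8,640,000 data points per day, before any per-metric aggregates are added.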
ScoreKeeper
Gather metrics from existing data sources into StatsD/Graphite
See-through software
• Lets the people whose jobs depend on software understand what it's doing and how well it's doing it
• Empowers people to ask their own questions and share their insights
Help teams become more successful
• By understanding when there's a problem
• By focusing energy where it's needed most
• By talking to customers in a competent and informed way
@DataMiller