un-broken logging - technologyug - leeds - matthew skelton
Post on 15-Jan-2017
Un-Broken Logging
the foundation of software operability
TechnologyUG, Leeds #techug
Thursday 22nd October 2015
Matthew Skelton
Skelton Thatcher Consulting
@matthewpskelton
The way we use logging is (often) broken
How to make our logging more awesome
Why we should care
Matthew Skelton
@matthewpskelton
#techug
@Operability
#operability
WhoOwnsMyOperability.com
confession:
I am a big fan of logging
exceptional situations
edge cases
metrics
analytics
‘audits’
…@evanphx
execution trace
BAD STUFF
Logging is often unloved
1. Discontinuous
2. Errors only, or arbitrary
3. ‘Bolted on’
4. No aggregation & search
5. Specify severity up front
GOOD STUFF
How to make logging awesome
1. Continuous event IDs
2. Transaction tracing
3. Log aggregation & search tools
4. Design for logging
5. Decoupled severity
reduce time-to-detect
increase team engagement
increase configurability
enhance DevOps collaboration
#operability
Background
Autonomous weather station
MRI brain scan imaging
Oil well monitoring
Web-scale systems
logging makes things work
(event sourcing)
(structured logging)
(CQRS)
How is logging usually broken?
using logging mainly for errors
inconsistent use of logging
logging slows down the software
logging ‘pollutes’ my precious domain model
logging is just for those weird Ops people
logging assumed to be free ($0) to implement
no budget for aggregating logs across machines
log aggregation happens only in Production
logs not available to Devs
fights over log severity levels
poor time synchronisation
Some history, with pirates
weather, course, sightings, latitude, longitude, …
(even when quiet)
John Harrison
Why log?
verification
traceability
accountability
charting the waters
- June 13th – Pirates!!!!
- Weds – Sharks!!!
- 19th Jun – BIGGER sharks!!!!
How to make logging awesome
1. Continuous event IDs
2. Transaction tracing
3. Log aggregation & search tools
4. Design for logging
5. Decoupled severity
[Diagram: Upload, Queue, Worker Job, Storage I/O]
Continuous event IDs
How many distinct event types (state transitions) in your application?
represent distinct states
enum
Human-readable sets: unique values, sparse, immutable
C#, Java, Python, node (Ruby, PHP, …)
public enum EventID
{
    // Badly-initialised logging data
    NotSet = 0,

    // An unrecognised event has occurred
    UnexpectedError = 10000,

    ApplicationStarted = 20000,
    ApplicationShutdownNoticeReceived = 20001,

    PageGenerationStarted = 30000,
    PageGenerationCompleted = 30001,

    MessageQueued = 40000,
    MessagePeeked = 40001,

    BasketItemAdded = 60001,
    BasketItemRemoved = 60002,

    CreditCardDetailsSubmitted = 70001,
    // ...
}
Technical
Domain
BasketItemAdded = 60001
BasketItemRemoved = 60002
represent distinct states
OrderSvc_BasketItemAdded
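As a rough illustration of the event ID idea in one of the other languages named above, here is a minimal Python sketch. It assumes that a name like OrderSvc_BasketItemAdded is built by prefixing the event name with a service name. The helpers `format_event` and `log_event` and the `key=value` line layout are hypothetical, not something the talk prescribes.

```python
import enum
import logging


@enum.unique  # refuse two names that share one numeric value
class EventID(enum.IntEnum):
    """Partial Python counterpart of the C# EventID enum (sparse values)."""
    NotSet = 0
    UnexpectedError = 10000
    BasketItemAdded = 60001
    BasketItemRemoved = 60002


def format_event(event, service="OrderSvc", **fields):
    """Build one searchable line: '<service>_<EventName> id=<n> key=value ...'."""
    details = " ".join(f"{key}={value}" for key, value in sorted(fields.items()))
    return f"{service}_{event.name} id={int(event)} {details}".rstrip()


def log_event(logger, event, service="OrderSvc", **fields):
    """Emit the event at INFO; the event ID itself carries no severity."""
    logger.info(format_event(event, service, **fields))
```

Calling `log_event(logging.getLogger("orders"), EventID.BasketItemAdded, sku="A1")` would produce a line that can be found by event name or by numeric ID.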
Monolith to microservices: debugger does not have the full view
Even with a remote debugger, it’s boring to attach and detach
[Diagram: Upload, Queue, Worker Job, Storage I/O]
Transaction tracing
‘Unique-ish’ identifier for each request
Passed through downstream layers
Unique-ish ID
What about APM?
APM gives us application insight
BUT
How much do we learn? Is APM available on the Dev box?
It’s not just ‘an Ops problem’!
Helps us to understand how the software really works
Small overhead is worth it
Configurable severity levels
Which log level is right?
DEBUG, INFO, WARNING, ERROR, CRITICAL
Log level should *not* be fixed at compile or build time!
Tune log levels
{
  "eventmappings": {
    "events": {
      "event": [ {
        "id": "CacheServiceStarted",
        "severity": { "level": "Information" }
      }, {
        "id": "PageCachePurged",
        "severity": { "level": "Debug" },
        "state": { "enabled": false }
      }, {
        "id": "DatabaseConnectionTimeOut",
        "severity": { "level": "Error" }
      } ]
    }
  }
}
Tune severity levels of specific event IDs
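A mapping like the one above could be applied at runtime roughly as follows. This is a Python sketch under stated assumptions: the `LEVELS` table that translates the config's level names to Python logging levels, the default of INFO for unmapped events, and the function names are all hypothetical. The embedded config is a trimmed copy of the JSON shown earlier.

```python
import json
import logging

# Trimmed copy of the event mapping; in practice this would be loaded from a
# file or configuration service at runtime rather than embedded (assumption).
CONFIG_JSON = """
{ "eventmappings": { "events": { "event": [
  { "id": "CacheServiceStarted", "severity": { "level": "Information" } },
  { "id": "PageCachePurged", "severity": { "level": "Debug" },
    "state": { "enabled": false } }
] } } }
"""

# Assumed translation from the config's level names to Python's levels.
LEVELS = {
    "Debug": logging.DEBUG,
    "Information": logging.INFO,
    "Warning": logging.WARNING,
    "Error": logging.ERROR,
    "Critical": logging.CRITICAL,
}


def load_event_settings(config_text):
    """Return {event_id: (level, enabled)} parsed from the JSON mapping."""
    events = json.loads(config_text)["eventmappings"]["events"]["event"]
    settings = {}
    for entry in events:
        level = LEVELS[entry["severity"]["level"]]
        enabled = entry.get("state", {}).get("enabled", True)
        settings[entry["id"]] = (level, enabled)
    return settings


def log_event(logger, settings, event_id):
    """Log event_id at its configured level; unmapped events default to INFO.

    Returns the level used, or None when the event is disabled.
    """
    level, enabled = settings.get(event_id, (logging.INFO, True))
    if not enabled:
        return None
    logger.log(level, "%s", event_id)
    return level
```

The calling code only names the event; changing its severity, or switching it off, is an edit to the mapping with no rebuild.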
Event tracing
Use enumerations (or closest thing)
Technical and Domain event types
Distributed systems: debuggers less useful
Trace calls with ‘unique-enough’ handles
Tune log levels via config
Log aggregation & search tools
Design for log aggregation
develop the software using log aggregation as a first-class thing
stories for testing logging
BasketItemAdded
grep BasketItem
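A story for testing logging might be automated along these lines. The Python sketch below uses an in-memory handler as a stand-in for a real log aggregation tool. `ListHandler`, `add_item_to_basket` and the `grep` helper are hypothetical names chosen for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log lines in memory so a test can search them."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def add_item_to_basket(logger, basket, sku):
    """Hypothetical domain operation whose log event is part of its contract."""
    basket.append(sku)
    logger.info("BasketItemAdded sku=%s", sku)


def grep(lines, needle):
    """In-memory stand-in for running `grep BasketItem` over aggregated logs."""
    return [line for line in lines if needle in line]
```

A test can then attach a `ListHandler` to the logger, call `add_item_to_basket`, and assert that `grep(handler.lines, "BasketItem")` returns the expected event line.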
logging is (‘just’) another system component
NTP
Dev and Ops collaboration*
* and testers too!
Where?
auditing
compliance
pre-emptive fault diagnosis
performance
metrics
…
Recap
Logging is often unloved
1. Discontinuous
2. Errors only, or arbitrary
3. ‘Bolted on’
4. No aggregation & search
5. Specify severity up front
How to make logging awesome
1. Continuous event IDs
2. Transaction tracing
3. Log aggregation & search tools
4. Design for logging
5. Decoupled severity
logging makes things work
“There is no thought behind aspect-oriented programming”
MINDFUL LOGGING (?!)
database transaction logs
‘Structured Logging’
ThoughtWorks Technology Radar: “Adopt” (May 2015)
https://www.thoughtworks.com/radar/techniques/structured-logging
http://gregoryszorc.com/
.NET: http://serilog.net/
Java: https://github.com/fluent/fluent-logger-java
sanity
More
Ditch the Debugger and Use Log Analysis Instead
Matthew Skelton
https://blog.logentries.com/2015/07/ditch-the-debugger-and-use-log-analysis-instead/
More
Using Log Aggregation Across Dev & Ops: The Pricing Advantage
Rob Thatcher
https://blog.logentries.com/2015/08/using-log-aggregation-across-dev-ops-the-pricing-advantage/
Books
operabilitybook.com
operationalfeatures.com
Thank you
http://skeltonthatcher.com/
enquiries@skeltonthatcher.com
@SkeltonThatcher
+44 (0)20 8242 4103
@matthewpskelton