nanolog: a nanosecond scale logging system · overview • implemented a fast c++ logging system...
TRANSCRIPT
NanoLog:ANanosecondScaleLoggingSystem
StephenYangJohnOusterhout
February9th,2017PlatformLab Review2017
Overview• ImplementedafastC++LoggingSystem• 12.5nsmedianlatencyat60M logmsgs/sec• 10-100xfasterthanexistingsystemssuchasLog4j2andspdlog• Maintainsprintf-likesemantics
• Shiftsworkoutoftheruntimehot-path• Extractionofstaticinformationatcompile-time• Compactedbinaryoutputatruntime• Defersformattingtoanofflineprocess
• BenefitsandCosts• Allowsdetailedlogsinlowlatencysystems• Comesatthecostof1MBofRAMperthread,onecore,anddiskbandwidth
WhyFast Logging?• Cornerstoneofdebugging• Affordsvisibilityapplicationstate• Helpsinrootcauseanalysisafterexecution
• Problem:Loggingisslow• Applicationresponsetimesaregettingfaster(microseconds)• Loggingisnot(100-1000’sofnanoseconds)• Example:RAMCloudresponsetime=5µs,butlogtime=1µs
Whatmakesloggingslow?
• Compute:ComplexFormatting• Loggersneedtoprovidecontext(i.e.filelocation,time,severity,etc)• Themessageabovehas7argumentsandtakes850nstocompute
• OutputBandwidth:DiskIO• Ona250MB/sdisk,the129bytemessageabovetakes500ns tooutput!
1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancerNOTICE: Using tombstone ratio balancer with ratio = 0.400000
Solutions
• Compute:RawDataOutput• Mostlogsinproductionarenotconsumedbyhumans• Savecomputationbydeferringformattingtoanofflineprocess• Sidebenefit:moreefficientforanalysisengines
• IO:ExtractingStaticInformation• StaticInfoinmessage:filelocation,line#,function,severity,formatstring.• Replacewithidentifierandcompactremainingdynamicinformation
1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancerNOTICE: Using tombstone ratio balancer with ratio = 0.400000
NanoLogSystemArchitecture
CompactLog
Runtime
ApplicationExecutable
NanoLogRuntime
BufferBuffer
UserThread
Buffer
DecompressorAggregator
Offline
HumanReadable
Log
NanoLogPreprocessor
GCC
Compilation-TimeUserSourcesUser
SourcesUser
SourcesProcessed
UserSources
LibrarySourcesDecompressorAggregator
ApplicationExecutable
Compile-timeOptimizationsPost-ProcessedUserSource(main.ii)UserSource(main.cc)
NanoLogLibrary(StaticInfo.cc)
(a)Extractstaticloginfo
(b)Injectoptimizedlogcode
ApplicationExecutable
DecompressorExecutable
compilecompile
FastRuntimeArchitecture• IsolatetheThreads• Useper-threadbufferstolowersynchronization• Don’tnotifythebackgroundthread;letitpollfordata
• MinimizeOutputCost• Callerpushesdatauncompressed tosaveoncompute• IOThreadneedstosaveonbothIOandcomputetimes.
• Useonlyrudimentarycompaction(deltas+smallestbyterepresentations)
Runtime
NanoLogBackgroundThread
OutputLogFile[1bytesHeader][1-4byteUniqueId][1-8byteTimediff][0-4bytessize][0-nbytesarguments]....
UserThread BufferUserThread BufferUserThread Buffer
Decompressor/Aggregator• Offlineprocesstodecompresslog• Recombinesthestatic+dynamicdatatoproduceahuman-readablefile
• FutureWork• Query/Aggregateincompactedformat
CompactLogFile[1bytesHeader][1-4byteUniqueId][1-8byteTimediff][0-4bytessize][0-nbytesarguments]....
HumanReadableLogFile
2/9/1712:45:24[main]:HelloWorld21
Decompressor/Aggregator
Benchmarks• SystemSetup• Processor:[email protected]• Memory:24GBDDR3@1333Mhz• Disk:120GBCrucialM4overSATAII(~250MB/s)
• TestSetup• 100Miterationsoflogmessages,backtoback• LogMessage:“{time}{severity}:{56-bytemessage}”
• OverallResultsZeroArguments Boostv1.55 Log4j2 Spdlog NanoLog
Throughput(Log/s) 0.82M 1.43M 1.50M 60.1M
AverageLatency(ns) 1110ns 697ns 668ns 16.5ns
0.82 1.43 1.5
60.1
0
20
40
60
Throug
hput
(MillionsLog
s/sec)
Throughputvs.System
BoostLog Log4j2 spdlog NanoLog
TailLatencies
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
100 101 102 103 104 105 106 107 108 109
Frac
tion
of L
ogs
Latency (ns)
Kernel InterferenceBoost
Log4j2spdlog
NanoLog
TailLatency(+NanoLogCompute)
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
100 101 102 103 104 105 106 107 108 109
Frac
tion
of L
ogs
Latency (ns)
Kernel InterferenceBoost
Log4j2spdlog
NanoLogNanoLog with 10ns Compute
IncreasingParameters
60.1 60
37
2125
13.7
21.6
9.7
17.6
8
14.5
6.41
0
10
20
30
40
50
60
70
SmallIntegers(~1Byte) LargeIntegers(~4bytes)
MillionsofLogM
essages/second
Throughputwithincreasing“%d”parameters
0Params 1Param 2Params 3Params 4Params 5Params
Limitations/FutureWork• BetterCompression?• Isthereabetterwaytocompacttheoutput,butinaperformantway?
• Fullyfeatureddecompressor/aggregator• Operatingonthecompactrepresentationismoreefficient.• Iteratingoveracompactlogmessagetakesabout100nsvs.1.3µstooutput
• ResourceUtilization• Currentlythesystemrequires1MBperuserthread,afullcoretocompact,andthefullbandwidthofaSATASSDtomainlowlatency.Howdoesthischangewithnewhardware?
NanoLogSystemSummary• Compile-TimePreprocessor
• Extractstaticinformationfromlogmessagesatcompiletime• Filename,line#,functionname,etc
• CatalogsstaticinfoandassignsauniqueIDtoeachlogstatement• CodeInjectiontorecordonlyanidentifier+parameterarguments
• RuntimeLibrary• Producer/ConsumerLogoutput• Simplecompaction(takingdeltas/compactingintegers)
• OfflineDecompressor/Aggregator• Recombinestaticinformationforhumanconsumption(ifnecessary)• OfflineSearch/Grep/Aggregateincompressedformat
Questions