nanolog: a nanosecond scale logging system · overview • implemented a fast c++ logging system...

16
NanoLog: A Nanosecond Scale Logging System Stephen Yang John Ousterhout February 9 th , 2017 PlatformLab Review 2017

Upload: others

Post on 09-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

NanoLog:ANanosecondScaleLoggingSystem

StephenYangJohnOusterhout

February9th,2017PlatformLab Review2017

Page 2: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

Overview• ImplementedafastC++LoggingSystem• 12.5nsmedianlatencyat60M logmsgs/sec• 10-100xfasterthanexistingsystemssuchasLog4j2andspdlog• Maintainsprintf-likesemantics

• Shiftsworkoutoftheruntimehot-path• Extractionofstaticinformationatcompile-time• Compactedbinaryoutputatruntime• Defersformattingtoanofflineprocess

• BenefitsandCosts• Allowsdetailedlogsinlowlatencysystems• Comesatthecostof1MBofRAMperthread,onecore,anddiskbandwidth

Page 3: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

WhyFast Logging?• Cornerstoneofdebugging• Affordsvisibilityapplicationstate• Helpsinrootcauseanalysisafterexecution

• Problem:Loggingisslow• Applicationresponsetimesaregettingfaster(microseconds)• Loggingisnot(100-1000’sofnanoseconds)• Example:RAMCloudresponsetime=5µs,butlogtime=1µs

Page 4: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

Whatmakesloggingslow?

• Compute:ComplexFormatting• Loggersneedtoprovidecontext(i.e.filelocation,time,severity,etc)• Themessageabovehas7argumentsandtakes850nstocompute

• OutputBandwidth:DiskIO• Ona250MB/sdisk,the129bytemessageabovetakes500ns tooutput!

1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancerNOTICE: Using tombstone ratio balancer with ratio = 0.400000

Page 5: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

Solutions

• Compute:RawDataOutput• Mostlogsinproductionarenotconsumedbyhumans• Savecomputationbydeferringformattingtoanofflineprocess• Sidebenefit:moreefficientforanalysisengines

• IO:ExtractingStaticInformation• StaticInfoinmessage:filelocation,line#,function,severity,formatstring.• Replacewithidentifierandcompactremainingdynamicinformation

1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancerNOTICE: Using tombstone ratio balancer with ratio = 0.400000

Page 6: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

NanoLogSystemArchitecture

CompactLog

Runtime

ApplicationExecutable

NanoLogRuntime

BufferBuffer

UserThread

Buffer

DecompressorAggregator

Offline

HumanReadable

Log

NanoLogPreprocessor

GCC

Compilation-TimeUserSourcesUser

SourcesUser

SourcesProcessed

UserSources

LibrarySourcesDecompressorAggregator

ApplicationExecutable

Page 7: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

Compile-timeOptimizationsPost-ProcessedUserSource(main.ii)UserSource(main.cc)

NanoLogLibrary(StaticInfo.cc)

(a)Extractstaticloginfo

(b)Injectoptimizedlogcode

ApplicationExecutable

DecompressorExecutable

compilecompile

Page 8: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

FastRuntimeArchitecture• IsolatetheThreads• Useper-threadbufferstolowersynchronization• Don’tnotifythebackgroundthread;letitpollfordata

• MinimizeOutputCost• Callerpushesdatauncompressed tosaveoncompute• IOThreadneedstosaveonbothIOandcomputetimes.

• Useonlyrudimentarycompaction(deltas+smallestbyterepresentations)

Runtime

NanoLogBackgroundThread

OutputLogFile[1bytesHeader][1-4byteUniqueId][1-8byteTimediff][0-4bytessize][0-nbytesarguments]....

UserThread BufferUserThread BufferUserThread Buffer

Page 9: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

Decompressor/Aggregator• Offlineprocesstodecompresslog• Recombinesthestatic+dynamicdatatoproduceahuman-readablefile

• FutureWork• Query/Aggregateincompactedformat

CompactLogFile[1bytesHeader][1-4byteUniqueId][1-8byteTimediff][0-4bytessize][0-nbytesarguments]....

HumanReadableLogFile

2/9/1712:45:24[main]:HelloWorld21

Decompressor/Aggregator

Page 10: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

Benchmarks• SystemSetup• Processor:[email protected]• Memory:24GBDDR3@1333Mhz• Disk:120GBCrucialM4overSATAII(~250MB/s)

• TestSetup• 100Miterationsoflogmessages,backtoback• LogMessage:“{time}{severity}:{56-bytemessage}”

• OverallResultsZeroArguments Boostv1.55 Log4j2 Spdlog NanoLog

Throughput(Log/s) 0.82M 1.43M 1.50M 60.1M

AverageLatency(ns) 1110ns 697ns 668ns 16.5ns

0.82 1.43 1.5

60.1

0

20

40

60

Throug

hput

(MillionsLog

s/sec)

Throughputvs.System

BoostLog Log4j2 spdlog NanoLog

Page 11: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

TailLatencies

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

100 101 102 103 104 105 106 107 108 109

Frac

tion

of L

ogs

Latency (ns)

Kernel InterferenceBoost

Log4j2spdlog

NanoLog

Page 12: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

TailLatency(+NanoLogCompute)

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

100 101 102 103 104 105 106 107 108 109

Frac

tion

of L

ogs

Latency (ns)

Kernel InterferenceBoost

Log4j2spdlog

NanoLogNanoLog with 10ns Compute

Page 13: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

IncreasingParameters

60.1 60

37

2125

13.7

21.6

9.7

17.6

8

14.5

6.41

0

10

20

30

40

50

60

70

SmallIntegers(~1Byte) LargeIntegers(~4bytes)

MillionsofLogM

essages/second

Throughputwithincreasing“%d”parameters

0Params 1Param 2Params 3Params 4Params 5Params

Page 14: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

Limitations/FutureWork• BetterCompression?• Isthereabetterwaytocompacttheoutput,butinaperformantway?

• Fullyfeatureddecompressor/aggregator• Operatingonthecompactrepresentationismoreefficient.• Iteratingoveracompactlogmessagetakesabout100nsvs.1.3µstooutput

• ResourceUtilization• Currentlythesystemrequires1MBperuserthread,afullcoretocompact,andthefullbandwidthofaSATASSDtomainlowlatency.Howdoesthischangewithnewhardware?

Page 15: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

NanoLogSystemSummary• Compile-TimePreprocessor

• Extractstaticinformationfromlogmessagesatcompiletime• Filename,line#,functionname,etc

• CatalogsstaticinfoandassignsauniqueIDtoeachlogstatement• CodeInjectiontorecordonlyanidentifier+parameterarguments

• RuntimeLibrary• Producer/ConsumerLogoutput• Simplecompaction(takingdeltas/compactingintegers)

• OfflineDecompressor/Aggregator• Recombinestaticinformationforhumanconsumption(ifnecessary)• OfflineSearch/Grep/Aggregateincompressedformat

Page 16: NanoLog: A Nanosecond Scale Logging System · Overview • Implemented a fast C++ Logging System • 12.5ns median latency at 60M log msgs/sec • 10-100x faster than existing systems

Questions