![Page 1: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/1.jpg)
Debugging Node.js in Prod
Yunong Xiao (@yunongx)
Software Engineer, Node Platform
November 2015
![Page 2: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/2.jpg)
Node.js @ Netflix
❖ 65+ Million Subscribers
❖ Website (netflix.com)
❖ Dynamic asset packager
❖ PaaS on Node
❖ Internal Services
![Page 3: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/3.jpg)
![Page 4: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/4.jpg)
“Let's work the problem, people. Let's not make things any worse by guessing.”
–Gene Kranz, Flight Director, Apollo 13
![Page 5: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/5.jpg)
Apply the Scientific Method
1. Construct a Hypothesis
2. Collect data
3. Analyze data and draw a conclusion
4. Repeat
![Page 6: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/6.jpg)
Production Crisis
❖ Runtime Performance
❖ Runtime Crashes
❖ Memory Leaks
![Page 7: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/7.jpg)
Netflix is “Slow”
![Page 8: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/8.jpg)
Gather Request Data
http://restify.com
http://github.com/restify/node-restify
Observable REST Framework
![Page 9: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/9.jpg)
restify to the Rescue
[2014-12-09T14:07:26.293Z] INFO: shakti/restify-audit/20067: handled: 200, latency=1402 (req_id=b3fa3820-7fac-11e4-8908-a5c7b70d676f, latency=1435)
GET / HTTP/1.1
host: www.netflix.com
--
HTTP/1.1 200 OK
x-netflix.client.instance: i-057e47ef
x-frame-options: DENY
content-type: text/html
--
req.timers: { "parseBody": 700123, "apiRPC": 301911, "render": 400031 }
![Page 10: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/10.jpg)
req.timers: { "parseBody": 700123, "apiRPC": 301911, "render": 400031 }
On CPU: parseBody and render (apiRPC is mostly spent waiting on a remote call)
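The talk does not show how `req.timers` is filled in. As an illustration only, here is a minimal sketch of recording per-handler durations in microseconds, the unit the numbers above imply (700123 + 301911 + 400031 µs is about the logged latency of 1402 ms). The helpers `timed` and `runChain` are hypothetical, not restify's API, and handlers are assumed to be synchronous.

```javascript
// Hypothetical helper: wraps a synchronous handler so its wall-clock
// duration is stored in req.timers[name], in microseconds.
function timed(name, handler) {
  return function (req, res) {
    const start = process.hrtime();
    try {
      return handler(req, res);
    } finally {
      const [sec, nsec] = process.hrtime(start); // elapsed since start
      req.timers = req.timers || {};
      req.timers[name] = sec * 1e6 + Math.round(nsec / 1e3);
    }
  };
}

// Hypothetical chain runner: calls each wrapped handler in order.
function runChain(handlers, req, res) {
  for (const h of handlers) h(req, res);
  return req.timers;
}

// Usage: a cheap "parseBody" step and a CPU-bound "render" step.
const req = {};
const timers = runChain(
  [
    timed("parseBody", () => JSON.parse('{"a":1}')),
    timed("render", () => {
      let s = "";
      for (let i = 0; i < 1e5; i++) s += "x";
      return s;
    }),
  ],
  req,
  {}
);
console.log(timers); // e.g. { parseBody: 12, render: 3456 } (values vary)
```

Because every handler runs on the one JavaScript thread, any large on-CPU entry in such a breakdown is time during which no other request can make progress.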
![Page 11: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/11.jpg)
CPU is Critical
❖ Node is essentially “single threaded”
❖ Cascading effect on ALL requests in process
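To see the cascading effect in isolation, the sketch below (hypothetical helper names) schedules a 0 ms timer and then keeps the thread busy. The timer cannot fire until the busy work ends, just as no other request can be served while one request holds the CPU.

```javascript
// Keeps the single JavaScript thread busy for at least `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: timers and I/O callbacks cannot run meanwhile
  }
}

// Schedules a 0 ms timer, blocks for `blockMs`, then reports through
// `done` how long the timer actually waited.
function measureTimerDelay(blockMs, done) {
  const scheduled = Date.now();
  setTimeout(() => done(Date.now() - scheduled), 0);
  busyWait(blockMs);
}

measureTimerDelay(200, (delay) => {
  console.log(`0 ms timer fired after ${delay} ms`); // at least 200
});
```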
![Page 12: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/12.jpg)
req.timers: { "parseBody": 700123, "apiRPC": 301911, "render": 400031 }
Can’t process ANY other request for 1.1 seconds (parseBody + render, both on CPU)
![Page 13: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/13.jpg)
How Much Code?
$ find . -name "*.js*" | xargs cat | wc -l
6 042 301
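The same count can be sketched in JavaScript. Note that the glob `"*.js*"` matches any name containing `.js`, so `.json`, `.jsx` and `.js.map` files are counted too; the sketch keeps that behavior. `countLines` is a hypothetical helper name.

```javascript
const fs = require("fs");
const path = require("path");

// Rough equivalent of: find . -name "*.js*" | xargs cat | wc -l
// Counts newline characters, as wc -l does. Symlinks are not followed.
function countLines(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += countLines(full); // recurse, including node_modules
    } else if (entry.isFile() && entry.name.includes(".js")) {
      const text = fs.readFileSync(full, "utf8");
      total += (text.match(/\n/g) || []).length;
    }
  }
  return total;
}

// Usage: console.log(countLines("."));
```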
![Page 14: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/14.jpg)
Statistically Sample Stack Traces
![Page 15: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/15.jpg)
Snapshot What’s Currently Executing
Stacktrace: A stack trace is a report of the active stack frames at a certain point in time during the execution of a program.
> console.log(ex, ex.stack.split("\n"))
ReferenceError: ex is not defined
at repl:1:13
at REPLServer.defaultEval (repl.js:132:27)
at bound (domain.js:254:14)
at REPLServer.runBound [as eval] (domain.js:267:12)
at REPLServer.<anonymous> (repl.js:279:12)
at REPLServer.emit (events.js:107:17)
at REPLServer.Interface._onLine (readline.js:214:10)
at REPLServer.Interface._line (readline.js:553:8)
at REPLServer.Interface._ttyWrite (readline.js:830:14)
at ReadStream.onkeypress (readline.js:109:10)
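Inside a process, V8 exposes the current stack through an `Error` object's `stack` property, as the REPL session shows. A minimal sketch follows (the helper name `captureStack` is hypothetical). It only sees the calling thread's own stack at the moment it is called, which is why sampling a running process from the outside needs a different tool.

```javascript
// Returns the current call stack as an array of "at ..." frame strings,
// innermost first, excluding captureStack itself.
// V8 keeps at most Error.stackTraceLimit frames (10 by default).
function captureStack() {
  const lines = new Error().stack.split("\n");
  // lines[0] is the "Error" message line, lines[1] is this function's frame
  return lines.slice(2).map((line) => line.trim());
}

function handleRequest() {
  return captureStack();
}

console.log(handleRequest()[0]); // e.g. "at handleRequest (/path/to/file.js:10:10)"
```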
![Page 16: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/16.jpg)
Two Problems
1) How to sample stack traces from a running process?
2) How to do 1) without affecting the process?
![Page 17: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/17.jpg)
Linux Perf Events
PERF(1) perf Manual PERF(1)
NAME perf - Performance analysis tools for Linux
SYNOPSIS perf [--version] [--help] COMMAND [ARGS]
DESCRIPTION Performance counters for Linux are a new kernel-based subsystem that provide a framework for all things performance analysis. It covers hardware level (CPU/PMU, Performance Monitoring Unit) features and software features (software counters, tracepoints) as well.
![Page 18: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/18.jpg)
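Sample Stack Traces w/ perf(1)
# perf record -F 99 -p `pgrep -n node` -g -- sleep 30
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.524 MB perf.data (~22912 samples) ]
This samples the newest node process (pgrep -n) 99 times per second (-F 99), recording call graphs (-g), for 30 seconds (-- sleep 30).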
![Page 19: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/19.jpg)
Sample Stack Trace
ab2fee v8::internal::Heap::DeoptMarkedAllocationSites() (/apps/node/bin/
a69754 v8::internal::StackGuard::HandleInterrupts() (/apps/node/bin/node)
c9f13b v8::internal::Runtime_StackGuard(int, v8::internal::Object**
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map)
(repeated 30 more lines)
8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>,...
df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node)
fb1597 uv__run_check (/apps/node/bin/node)
fabcee uv_run (/apps/node/bin/node)
dfaa50 node::Start(int, char**) (/apps/node/bin/node)
7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so)
Missing JS Frames: the 3c793e3060bb entries have no symbol names
![Page 20: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/20.jpg)
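Why? V8 compiles JavaScript just in time (JIT), so the symbols for JS functions exist only in memory and perf cannot find them in the node binary.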
![Page 21: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/21.jpg)
node --perf_basic_prof_only_functions
“outputs the files in a format that the existing perf tool can consume.”
![Page 22: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/22.jpg)
node --perf_basic_prof_only_functions
Available right now in Node v5.x
Coming soon to Node v4.x: https://github.com/nodejs/node/pull/3609
![Page 23: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/23.jpg)
3c793e446880 22c LazyCompile:~baseCallback /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:1654
3c793e446b20 c4 LazyCompile:~baseReduce /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:2519
3c793e446c60 330 LazyCompile:~ /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:3040
3c793e447000 12c LazyCompile:~ /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:2520
3c793e4471a0 2a4 LazyCompile:~ /apps/node/webapp/node_modules/restify-errors/lib/httpErrors.js:54
V8-generated perf map (/tmp/perf-<pid>.map)
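Each map line has the form `<start address> <size> <symbol>`, with the address and size in hex; perf uses these ranges to turn JIT addresses into names. A minimal sketch of that lookup, using two entries from the map above (`parsePerfMap` and `symbolize` are hypothetical names; it scans linearly for clarity, where a real tool would sort the entries and binary-search):

```javascript
// Parses perf map text: one "<start hex> <size hex> <symbol>" entry per line.
function parsePerfMap(text) {
  const entries = [];
  for (const line of text.split("\n")) {
    const m = line.trim().match(/^([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+(.*)$/);
    if (!m) continue; // skip blank or malformed lines
    entries.push({ start: parseInt(m[1], 16), size: parseInt(m[2], 16), name: m[3] });
  }
  return entries;
}

// Returns the symbol whose [start, start + size) range contains addr,
// or null if no entry covers it.
function symbolize(entries, addr) {
  for (const e of entries) {
    if (addr >= e.start && addr < e.start + e.size) return e.name;
  }
  return null;
}

const map = parsePerfMap(
  "3c793e446880 22c LazyCompile:~baseCallback /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:1654\n" +
    "3c793e446b20 c4 LazyCompile:~baseReduce /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:2519"
);
console.log(symbolize(map, 0x3c793e446b30)); // prints the LazyCompile:~baseReduce entry
```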
![Page 24: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/24.jpg)
Results
node 5382 cpu-clock:
3c793e38b0c1 LazyCompile:DELETE native runtime.js:349 (/tmp/perf-5382.map)
3c793e31981d Builtin:JSConstructStubGeneric (/tmp/perf-5382.map)
3c793ff2ca94 (/tmp/perf-5382.map)
3c793e98a10f LazyCompile:~AtlasClient._run /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:85 (/tmp/perf-5382.map)
3c793f47de29 LazyCompile:*AtlasClient.timer /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:70 (/tmp/perf-5382.map)
3c793e9eee38 LazyCompile:~fetchSingleGetCallback /apps/node/webapp/singletons/ShaktiFetcher.js:120 (/tmp/perf-5382.map)
3c793f6cffee LazyCompile:*Model.get /apps/node/webapp/node_modules/nf-models/lib/Model.js:90 (/tmp/perf-5382.map)
3c793ed3e2ad (/tmp/perf-5382.map)
3c7940e4357b Handler:ca (/tmp/perf-5382.map)
3c793f060e3c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:134 (/tmp/perf-5382.map)
3c79404edbfa (/tmp/perf-5382.map)
3c79401fd3f7 (/tmp/perf-5382.map)
3c79400e307b LazyCompile:*fetchMulti /apps/node/webapp/singletons/ShaktiFetcher.js:50 (/tmp/perf-5382.map)
3c793fb9a59f LazyCompile:*fetch /apps/node/webapp/singletons/ShaktiFetcher.js:32 (/tmp/perf-5382.map)
3c793e896697 (/tmp/perf-5382.map)
3c7943aaabbe (/tmp/perf-5382.map)
3c793ef4c53c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:245 (/tmp/perf-5382.map)
3c793eaf4f01 LazyCompile:* /apps/node/webapp/node_modules/nf-packager/lib/index.js:194 (/tmp/perf-5382.map)
3c793eab130a LazyCompile:processImmediate timers.js:352 (/tmp/perf-5382.map)
3c793e319f7d Builtin:JSEntryTrampoline (/tmp/perf-5382.map)
3c793e3189e2 Stub:JSEntryStub (/tmp/perf-5382.map)
a65baf v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/apps/node/bin/node)
8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node)
fb1597 uv__run_check (/apps/node/bin/node)
fabcee uv_run (/apps/node/bin/node)
dfaa50 node::Start(int, char**) (/apps/node/bin/node)
7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so)
JS frames (resolved through /tmp/perf-5382.map) are at the top; native frames (from /apps/node/bin/node and libc) are at the bottom.
![Page 25: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/25.jpg)
Problem: Too Many Traces
$ cat out.nodestacks01 | grep cpu-clock | wc -l
744
$ wc -l out.nodestacks01
58116
That is 744 sampled stacks spread over 58,116 lines.
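The usual remedy is to "fold" identical stacks into one line each: root frame first, frames joined by `;`, followed by a sample count. This is the input format of `flamegraph.pl` in Brendan Gregg's FlameGraph repository, whose `stackcollapse-perf.pl` script does the folding for `perf script` output. A simplified sketch of the folding step, assuming the layout described in the comments (real `perf script` output varies with options and perf versions); `foldStacks` is a hypothetical name:

```javascript
// Folds perf-script-style text into "root;...;leaf count" lines.
// Assumed layout: each sample starts with an unindented header line
// (e.g. "node 5382 cpu-clock:"), followed by indented frame lines
// "<hex addr> <symbol> (<module>)", leaf first; blank lines separate samples.
function foldStacks(text) {
  const counts = new Map();
  let frames = null; // frames of the current sample, leaf first

  const flush = () => {
    if (frames && frames.length > 0) {
      const key = frames.slice().reverse().join(";"); // root first
      counts.set(key, (counts.get(key) || 0) + 1);
    }
    frames = null;
  };

  for (const line of text.split("\n")) {
    if (line.trim() === "") {
      flush(); // end of a sample
    } else if (!/^\s/.test(line)) {
      flush(); // a header line starts a new sample
      frames = [];
    } else if (frames !== null) {
      const m = line.trim().match(/^[0-9a-fA-F]+\s+(.*?)\s*\(([^()]*)\)$/);
      // Frames with no resolved symbol are recorded as [unknown].
      frames.push(m ? (m[1] || "[unknown]") : line.trim());
    }
  }
  flush();
  return [...counts].map(([stack, n]) => `${stack} ${n}`);
}
```

Folding turns thousands of lines into one line per distinct code path, which is what makes the visualization that follows possible.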
![Page 26: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/26.jpg)
Too Many Traces
![Page 27: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/27.jpg)
Solution: Flame Graphs
![Page 28: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/28.jpg)
Flamegraph
❖ Each box represents a function in the stack (a stack frame)
❖ x-axis: percent of time on CPU (share of samples; left-to-right order is not time)
❖ y-axis: stack depth
❖ colors: random, or used to encode another dimension
❖ https://github.com/brendangregg/FlameGraph
(Figure: example Node.js CPU flame graph with regions labeled v8, libc, JS, and built-ins.)
![Page 29: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/29.jpg)
Flame Graph Interpretation
(Figure: example flame graph made of stack frames a() through i().)
![Page 30: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/30.jpg)
Flame Graph Interpretation
Top edge shows who is running on-CPU, and how much (width)
(Figure: the same example flame graph, frames a() through i().)
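With stacks folded into `root;...;leaf count` lines, the top edge corresponds to the last (leaf) frame of each stack. A small sketch that totals those leaf samples per function; `selfSamples` is a hypothetical name and the folded lines are hypothetical, not the data behind the figure:

```javascript
// Given folded stacks ("root;...;leaf count"), returns how many samples
// each function was on top of the stack, i.e. running on CPU itself.
function selfSamples(foldedLines) {
  const self = new Map();
  for (const line of foldedLines) {
    const cut = line.lastIndexOf(" "); // the count follows the last space
    const frames = line.slice(0, cut).split(";");
    const count = Number(line.slice(cut + 1));
    const leaf = frames[frames.length - 1];
    self.set(leaf, (self.get(leaf) || 0) + count);
  }
  return self;
}

const samples = ["a;b;c 3", "a;b 1", "a;h;i 2"]; // hypothetical
console.log(selfSamples(samples)); // Map(3) { 'c' => 3, 'b' => 1, 'i' => 2 }
```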
![Page 31: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/31.jpg)
Flame Graph Interpretation
Top-down shows ancestry, e.g., from g():
(Figure: the same example flame graph, tracing g() down through its callers to the root a().)
![Page 32: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/32.jpg)
Flame Graph Interpretation
(Figure: the same example flame graph, frames a() through i().)
Widths are proportional to presence in samples
e.g., comparing b() to h() (incl. children)
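A box's width is its inclusive sample count: the samples in which it is on CPU plus those of everything stacked above it. The sketch below computes that share by function name over hypothetical folded stacks (`inclusivePercent` is a hypothetical name). It totals every box for a function, whereas a flame graph draws a separate box each time the function appears under a different parent.

```javascript
// Percentage of samples whose stack contains `fn`, counted once per stack
// even if `fn` appears several times (recursion).
function inclusivePercent(foldedLines, fn) {
  let total = 0;
  let hits = 0;
  for (const line of foldedLines) {
    const cut = line.lastIndexOf(" ");
    const count = Number(line.slice(cut + 1));
    total += count;
    if (line.slice(0, cut).split(";").includes(fn)) hits += count;
  }
  return total === 0 ? 0 : (100 * hits) / total;
}

const stacks = ["a;b;c 3", "a;b 1", "a;h;i 2"]; // hypothetical samples
console.log(inclusivePercent(stacks, "b").toFixed(1)); // 66.7
console.log(inclusivePercent(stacks, "h").toFixed(1)); // 33.3
```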
![Page 33: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/33.jpg)
![Page 34: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/34.jpg)
![Page 35: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/35.jpg)
More than 50% of time on CPU
![Page 36: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/36.jpg)
lodash!
![Page 37: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/37.jpg)
function merge(object) { var args = arguments, length = 2;...
![Page 38: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/38.jpg)
Use _.assign() Instead
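Why the swap helps, and what to watch for: `_.merge` walks nested objects recursively, while `_.assign` copies only top-level properties. The sketch uses the standard `Object.assign` (same shallow semantics) and a hypothetical `deepMerge` standing in for `_.merge`; it is not lodash's implementation. The two are not interchangeable when nested values must be combined:

```javascript
// Hypothetical recursive merge, similar in spirit to _.merge: it visits
// every nested plain object, so its cost grows with the whole tree.
function deepMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value && typeof value === "object" && !Array.isArray(value)) {
      if (!target[key] || typeof target[key] !== "object") target[key] = {};
      deepMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

const defaults = { headers: { "x-frame-options": "DENY" }, status: 200 };
const overrides = { headers: { "content-type": "text/html" } };

// Shallow: top-level keys only, so overrides.headers replaces defaults.headers.
const shallow = Object.assign({}, defaults, overrides);
// Recursive: nested objects are combined key by key.
const deep = deepMerge(deepMerge({}, defaults), overrides);

console.log(shallow.headers); // { 'content-type': 'text/html' }
console.log(deep.headers); // { 'x-frame-options': 'DENY', 'content-type': 'text/html' }
```

If a call site relied on nested values being combined, switching it to `_.assign` changes behavior, so each call site needs checking.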
![Page 39: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/39.jpg)
Before
![Page 40: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/40.jpg)
After
![Page 41: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/41.jpg)
Flame Graphs
Help you find 1 LoC out of 6 million
![Page 42: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/42.jpg)
Results
❖ Dramatically reduced request latency
❖ Reduced CPU utilization
❖ Increased throughput
![Page 43: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/43.jpg)
Runtime Performance Technique
❖ Sample stack traces via perf(1)
❖ Visualize code distribution with CPU flame graphs
❖ Identify candidate code paths for performance improvement
❖ Repeat
![Page 44: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/44.jpg)
Runtime Crashes
![Page 45: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/45.jpg)
“The method described in this article was designed to provide a core dump… with a minimal impact on the spacecraft… as the resumption of data acquisition from the spacecraft is the highest priority.”
–Chafin, R. "Pioneer F & G Telemetry and Command Processor Core Dump Program." JPL Technical Report XVI, no. 32-1526 (1971): 174.
![Page 46: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/46.jpg)
Core Dumps
![Page 47: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/47.jpg)
Core Dumps
![Page 48: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/48.jpg)
Core Dumps — A Brief History
❖ Magnetic core memory
❖ Dump out the contents of “core” memory for debugging
❖ “Core dump” was born
❖ Initially printed on paper!
❖ Postmortem debugging was born!
![Page 49: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/49.jpg)
![Page 50: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/50.jpg)
Production Constraints
❖ Uptime is critical
❖ Not easily reproducible
❖ Can’t simulate environment
❖ Resume normal operations ASAP
![Page 51: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/51.jpg)
Postmortem Debugging
Take core dump → Restart app → Continue serving traffic
Load core dump elsewhere → Debug → Engineer fix
![Page 52: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/52.jpg)
Configure Node to Dump Core on Error
$ node --abort_on_uncaught_exception throw.js
Uncaught Error
FROM
Object.<anonymous> (/Users/yunong/throw.js:1:63)
Module._compile (module.js:435:26)
Object.Module._extensions..js (module.js:442:10)
Module.load (module.js:356:32)
Function.Module._load (module.js:311:12)
Function.Module.runMain (module.js:467:10)
startup (node.js:134:18)
node.js:961:3
[1] 4131 illegal hardware instruction (core dumped) node --abort_on_uncaught_exception throw.js
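Without the flag, an uncaught exception prints the error and exits with status 1, leaving no core file. With it, Node aborts, and the kernel can write a core, provided the core size limit allows it (for example after `ulimit -c unlimited`). The sketch below (`runThrowing` is a hypothetical helper) runs both variants in child processes. The exact signal varies by platform and Node version (the output above shows an illegal-instruction trap), so it only reports whether a signal ended the process. Running it may leave core files in the working directory.

```javascript
const { spawnSync } = require("child_process");

// Runs a one-line script that throws, with optional extra node flags,
// and reports how the child process terminated.
function runThrowing(flags) {
  const result = spawnSync(
    process.execPath,
    [...flags, "-e", "throw new Error('boom')"],
    { encoding: "utf8" }
  );
  return { status: result.status, signal: result.signal };
}

// Default: the error is printed and the process exits with status 1.
console.log(runThrowing([]));
// With the flag: the process is killed by a signal (platform-dependent),
// which is what lets the kernel write a core file.
console.log(runThrowing(["--abort_on_uncaught_exception"]));
```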
![Page 53: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/53.jpg)
Node Post Mortem Tooling
❖ Netflix uses Linux in Prod
❖ Linux — Work in progress
❖ https://github.com/tjfontaine/lldb-v8
❖ https://github.com/indutny/llnode
❖ Solaris — Full featured, compatible with Linux cores
❖ https://github.com/joyent/mdb_v8
![Page 54: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/54.jpg)
![Page 55: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/55.jpg)
Socks & Duct Tape: Set Up a Debug Solaris Instance
EC2: http://omnios.omniti.com/wiki.php/Installation#IntheCloud
VM: http://omnios.omniti.com/wiki.php/Installation#Quickstart
![Page 56: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/56.jpg)
Post Mortem Methodology
❖ Where: Inspect stack trace
❖ Why: Inspect heap and stack variable state
![Page 57: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/57.jpg)
mdb(1) JS commands
❖ ::help <cmd>
❖ ::jsstack
❖ ::jsprint
❖ ::jssource
❖ ::jsconstructor
❖ ::findjsobjects
❖ ::jsfunctions
![Page 58: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/58.jpg)
Load the Core Dump
# mdb ./node-v4.2.2-linux/node-v4.2.2-linux-x64/bin/node ./core.7186
> ::load ./mdb_v8_amd64.so
mdb_v8 version: 1.1.1 (release, from 28cedf2)
V8 version: 143.156.132.195
Autoconfigured V8 support from target
C++ symbol demangling enabled
(The mdb arguments are the Linux node binary and the core dump; ::load loads the mdb_v8 module.)
![Page 59: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/59.jpg)
::jsstack
> ::jsstack
js: test
js: storeHeader
js: <anonymous> (as OutgoingMessage._storeHeader)
js: <anonymous> (as ServerResponse.writeHead)
js: restifyWriteHead
js: _cb
js: send
js: <anonymous> (as <anon>)
js: <anonymous> (as ReactRenderer._renderLayout)
js: <anonymous> (as <anon>)
js: <anonymous> (as <anon>)
js: <anonymous> (as dispatchHandler)
js: <anonymous> (as <anon>)
js: runHooks
js: runTransitionToHooks
js: <anonymous> (as assign.to)
js: <anonymous> (as <anon>)
js: runHooks
js: runTransitionFromHooks
js: <anonymous> (as assign.from)
js: <anonymous> (as React.createClass.statics.dispatch)
native: _ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0...
native: v8::internal::Execution::Call+0xc8
native: v8::internal::Runtime_Apply+0x1ce
js: <anonymous> (as b)
(Each line shows the frame type, js or native, followed by the function name.)
![Page 60: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/60.jpg)
Always name your functions!
var foo = function foo() {};
Foo.prototype.bar = function bar() {};
foo(function bar() {});
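The reason is visible in any stack trace: anonymous functions show up without a name. In the sketch below (`frameOfCaller`, `invoke` and `renderLayout` are hypothetical names), the named callback appears by name while the anonymous one shows only a file location. Newer V8 versions also infer names for functions assigned to variables, but not for callbacks passed inline.

```javascript
// Returns the stack frame of whichever function called frameOfCaller().
function frameOfCaller() {
  // stack line 0: "Error", line 1: frameOfCaller itself, line 2: its caller
  return new Error().stack.split("\n")[2].trim();
}

// Invokes a callback the way a framework would invoke a handler.
function invoke(cb) {
  return cb();
}

const anonymous = invoke(function () {
  return frameOfCaller();
});
const named = invoke(function renderLayout() {
  return frameOfCaller();
});

console.log(anonymous); // only a location, no function name
console.log(named); // "at renderLayout (<file>:<line>:<col>)"
```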
![Page 61: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/61.jpg)
::jsstack -vn0: Frame and Function Args
> ::jsstack -vn0
js: test
file: native regexp.js
posn: position 2677
this: 2421205bd4d9 (JSRegExp)
arg1: 34d5391d8859 (SeqAsciiString)
js: storeHeader
file: http.js
posn: position 18774
this: 2ad561306c91 (<unknown>)
arg1: 3bd67e0669b9 (JSObject: ServerResponse)
arg2: 3dfe966ae299 (JSObject: Object)
arg3: 34d5391d8859 (SeqAsciiString)
arg4: 34d5391d8881 (SeqAsciiString)
js: <anonymous> (as OutgoingMessage._storeHeader)
file: http.js
posn: position 15652
this: 3bd67e0669b9 (JSObject: ServerResponse)
arg1: 3dfe966ae271 (ConsString)
arg2: 3dfe966add99 (JSObject: Object)
js: restifyWriteHead
file: /apps/node/webapp/node_modules/restify/lib/response.js
posn: position 6964
this: 3bd67e0669b9 (JSObject: ServerResponse)
(1 internal frame elided)
js: _cb
file: /apps/node/webapp/node_modules/restify/lib/response.js
(Each frame shows the function name, its JS file, its position in that file, and its arguments.)
![Page 62: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/62.jpg)
::jsstack -v: Frame Source
> ::jsstack -v
js: storeHeader
file: http.js
posn: position 18774
this: 2ad561306c91 (<unknown>)
arg1: 3bd67e0669b9 (JSObject: ServerResponse)
arg2: 3dfe966ae299 (JSObject: Object)
arg3: 34d5391d8859 (SeqAsciiString)
arg4: 34d5391d8881 (SeqAsciiString)
652
653 function storeHeader(self, state, field, value) {
654   // Protect against response splitting. The if statement is there to
655   // minimize the performance impact in the common case.
656   if (/[\r\n]/.test(value))
657     value = value.replace(/[\r\n]+[ \t]*/g, '');
658
659   state.messageHeader += field + ': ' + value + CRLF;
660
661   if (connectionExpression.test(field)) {
662     state.sentConnectionHeader = true;
663     if (closeExpression.test(value)) {
664       self._last = true;
665     } else {
666       self.shouldKeepAlive = true;
667     }
668
669   } else if (transferEncodingExpression.test(field)) {
![Page 63: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/63.jpg)
::jsstack -vn0: Function Args
In the same ::jsstack -vn0 output, each this/argN line gives the variable's memory address followed by its type, e.g. arg1: 3bd67e0669b9 (JSObject: ServerResponse).
![Page 64: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/64.jpg)
::jsprint: Print JS Objects
> 3bd67e0669b9::jsprint
{
  "_time": 1437690472539,
  "_headers": {
    "content-type": "text/html",
    "req_id": "5b7f18f2-7f12-4c68-b07f-3cd75698ba65",
    "set-cookie": "CENSORED; Domain=.netflix.com; Expires=Fri, 24 Jul 2015 10:27:52 GMT",
    "x-frame-options": "DENY",
    "x-ua-compatible": "IE=edge",
    "x-netflix.client.instance": "i-c420596c",
  },
  "output": [],
  "_last": false,
  "_hangupClose": false,
  "_hasBody": true,
  "socket": {
    "_connecting": false,
    "_handle": [...],
    "_readableState": [...],
    "readable": true,
    "domain": null,
    "_events": [...],
    "_maxListeners": 10,
    "_writableState": [...],
    "writable": true,
    "allowHalfOpen": true,
    "onend": function <anonymous> (as socket.onend),
(This is the actual JS object instance: the ServerResponse passed as arg1 to storeHeader.)
![Page 65: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/65.jpg)
::jsconstructor Show Object Constructor
> 3bd67e0669b9::jsconstructor -v
ServerResponse (JSFunction: 2421205bced9)
![Page 66: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/66.jpg)
::jssource Print f() Source
> 2421205bced9::jssource
file: http.js
1066 function ServerResponse(req) {
1067   OutgoingMessage.call(this);
1068
1069   if (req.method === 'HEAD') this._hasBody = false;
1070
1071   this.sendDate = true;
1072
1073   if (req.httpVersionMajor < 1 || req.httpVersionMinor < 1) {
1074     this.useChunkedEncodingByDefault = chunkExpression.test(req.headers.te);
1075     this.shouldKeepAlive = false;
1076   }
1077 }
1078 util.inherits(ServerResponse, OutgoingMessage);
![Page 67: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/67.jpg)
Core Dump === Complete Process State
![Page 68: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/68.jpg)
Memory Leaks
![Page 69: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/69.jpg)
Memory Leaks
![Page 70: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/70.jpg)
![Page 71: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/71.jpg)
![Page 72: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/72.jpg)
Generate Core Dump Ad-hoc
![Page 73: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/73.jpg)
gcore(1) GNU Tools gcore(1)
NAME gcore - Generate a core file for a running process
SYNOPSIS gcore [-o filename] pid
![Page 74: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/74.jpg)
Take a Core Dump!
root@demo:~# gcore `pgrep node`
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7facaeffd700 (LWP 5650)]
[New Thread 0x7facaf7fe700 (LWP 5649)]
[New Thread 0x7facaffff700 (LWP 5648)]
[New Thread 0x7facbc967700 (LWP 5647)]
[New Thread 0x7facbd168700 (LWP 5617)]
[New Thread 0x7facbd969700 (LWP 5616)]
[New Thread 0x7facbe16a700 (LWP 5615)]
[New Thread 0x7facbe96b700 (LWP 5614)]
0x00007facbea5b5a9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
Saved corefile core.5602
![Page 75: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/75.jpg)
Problem: Find Leaking Objects
![Page 76: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/76.jpg)
::findjsobjects
NAME findjsobjects - find JavaScript objects
SYNOPSIS [ addr ] ::findjsobjects [-vb] [-r | -c cons | -p prop]
![Page 77: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/77.jpg)
::findjsobjects Find ALL JS Objects on Heap
> ::findjsobjects
OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
...
3dfe97453121 18 6721 Array
157a020e01 1304 101 <anonymous> (as Constructor): ...
8f1a53211 13879 12 ReactDOMComponent: _tag, tagName, props, ...
8f1a05691 85776 2 Array
3dfe97451a99 36 5589 Array
23e5d7d44351 1 218020 Object: .2f5hpw2hgjk.1.0.3, ...
8f1a05f31 40533 6 <anonymous> (as ReactElement): type, ...
8f1a04da1 252133 1 Array
8f1a04dc1 125869 7 Array
8f1a04f01 114914 8 Array
8f1a04d39 230924 7 Module: id, exports, parent, filename, ...
![Page 78: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/78.jpg)
Memory Leak Strategy
❖ Look at objects on heap for suspicious objects
❖ Take successive core dumps and compare object counts
❖ Growing object counts are likely leaking
❖ Inspect object for more context
❖ Walk reverse references to find root object
![Page 79: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/79.jpg)
Look at Object Delta Between Successive Core Dumps
![Page 80: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/80.jpg)
Uptime = 45 mins
> ::findjsobjects
OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
...
8f1a04d39 230924 7 Module: id, exports, parent, filename, ...
![Page 81: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/81.jpg)
Uptime = 90 mins
> ::findjsobjects
OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
...
8f1a04d39 323454 7 Module: id, exports, parent, filename, ...
That is 92,530 more Module objects than 45 minutes earlier.
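Comparing two listings by hand works for one row; for a full listing, a small script helps. The sketch below (`parseFindjsobjects` and `growth` are hypothetical helpers) parses `::findjsobjects` text and ranks object groups by growth. It keys rows on `#PROPS` plus the constructor/properties column rather than on the representative address, on the assumption that addresses may differ between dumps.

```javascript
// Parses ::findjsobjects output into Map<label, count>, where label is
// "#PROPS CONSTRUCTOR: PROPS". Header, "..." and prompt lines are skipped.
function parseFindjsobjects(text) {
  const counts = new Map();
  for (const line of text.split("\n")) {
    // OBJECT (hex)  #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
    const m = line.trim().match(/^[0-9a-f]+\s+(\d+)\s+(\d+)\s+(.+)$/);
    if (!m) continue;
    const label = `${m[2]} ${m[3]}`;
    counts.set(label, (counts.get(label) || 0) + Number(m[1]));
  }
  return counts;
}

// Returns object groups that grew between two dumps, largest growth first.
function growth(beforeText, afterText) {
  const before = parseFindjsobjects(beforeText);
  const grown = [];
  for (const [label, count] of parseFindjsobjects(afterText)) {
    const delta = count - (before.get(label) || 0);
    if (delta > 0) grown.push({ label, delta });
  }
  return grown.sort((a, b) => b.delta - a.delta);
}

const header = "OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS\n...\n";
const at45 = header + "8f1a04d39 230924 7 Module: id, exports, parent, filename, ...";
const at90 = header + "8f1a04d39 323454 7 Module: id, exports, parent, filename, ...";
console.log(growth(at45, at90)); // one entry: the Module group, delta 92530
```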
![Page 82: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/82.jpg)
Analyze Leaked Objects
![Page 83: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/83.jpg)
Representative Object
> ::findjsobjects
OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
...
8f1a04d39 323454 7 Module: id, exports, parent, filename, ...
Representative Object, 1 of 323454
![Page 84: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/84.jpg)
Look Closer
> 8f1a04d39::jsprint
{
  "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
  "exports": {},
  "parent": {
    "id": "/apps/node/webapp/middleware/autoClientStrings.js",
    "exports": function autoExposeClientStrings,
    "parent": [...],
    "filename": "/apps/node/webapp/middleware/autoClientStrings.js",
    "loaded": true,
    "children": [...],
    "paths": [...],
  },
  "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
![Page 85: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/85.jpg)
Use ::findjsobjects to Find All “Module” Objects
> 8f1a04d39::findjsobjects
8f1a04d39
3fd996bffb39
3fd996bfcff1
3fd996bfbac1
3fd996bf8a19
3fd996bf7949
3fd996bf3ce9
3fd996bf0f19
3fd996bead71
3fd996bea821
3fd996bea001
3fd996be92b1
3fd996be73d1
3fd996be58d1
3fd996bd88b1
3fd996bcb459
3fd996bcaa41
3fd996bc7009
3fd996bc3321
![Page 86: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/86.jpg)
Analyze All 320K+ Objects?
![Page 87: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/87.jpg)
Custom Querying With Pipes and Unix Tools
8f1a04d39::findjsobjects | ::jsprint ! grep filename | sort | uniq -c
![Page 88: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/88.jpg)
Results
...
1 "filename": "/apps/node/webapp/ui/js/akira/components/messaging/paymentHold.js",
2 "filename": "/apps/node/webapp/ui/js/common/commonCore.js",
1 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js",
3 "filename": "/apps/node/webapp/ui/js/common/presentationTracking/presentationTracking.js",
111061 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js",
7103 "filename": "/apps/node/webapp/ui/js/pages/reactClientRender.js",
111061 "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
118257 "filename": "/apps/node/webapp/middleware/autoClientStrings.js",
...
Client Side Modules
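The mdb pipeline can also be mirrored offline on saved `::jsprint` output. A sketch (`countFilenameLines` is a hypothetical name) that keeps lines mentioning `filename`, counts identical lines as `sort | uniq -c` would (after trimming whitespace, a small difference from `uniq`), and lists the most frequent first:

```javascript
// Mirrors `::jsprint ! grep filename | sort | uniq -c` on captured text and
// returns [line, count] pairs, largest counts first.
function countFilenameLines(jsprintOutput) {
  const counts = new Map();
  for (const raw of jsprintOutput.split("\n")) {
    if (!raw.includes("filename")) continue; // grep filename
    const line = raw.trim();
    counts.set(line, (counts.get(line) || 0) + 1); // sort | uniq -c
  }
  return [...counts].sort((a, b) => b[1] - a[1]);
}

const saved = [
  '{ "id": "/a.js",', '  "filename": "/a.js",', "}",
  '{ "id": "/a.js",', '  "filename": "/a.js",', "}",
  '{ "id": "/b.js",', '  "filename": "/b.js",', "}",
].join("\n"); // hypothetical saved output
console.log(countFilenameLines(saved));
```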
![Page 89: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/89.jpg)
What’s holding on to these modules?
![Page 90: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/90.jpg)
Aim: Find Root Object
![Page 91: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/91.jpg)
Walk Reverse Refs with ::findjsobjects -r
> 8f1a04d39::findjsobjects -r
8f1a04d39 referred to by 14fd6c5b13c1.parent
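`::findjsobjects -r` answers the question "what refers to this object?" Asking that question repeatedly, until nothing else refers to the result, leads to the root object that keeps the leak alive. The sketch below runs the walk over a hypothetical in-memory graph rather than a core file. Objects are keyed by made-up address strings, and edges are property names. At each step it follows only the first not-yet-visited referrer. A real heap can branch, and in that case you may need to explore several referrers.

```javascript
// Sketch: walk reverse references from a leaked object up to a root.
// `graph` maps an address to { propertyName: targetAddress }.
function buildReverseIndex(graph) {
  const referrers = new Map(); // target address -> [{ from, prop }]
  for (const [from, props] of Object.entries(graph)) {
    for (const [prop, to] of Object.entries(props)) {
      if (!referrers.has(to)) referrers.set(to, []);
      referrers.get(to).push({ from, prop });
    }
  }
  return referrers;
}

function walkToRoot(graph, start) {
  const referrers = buildReverseIndex(graph);
  const steps = [];
  const seen = new Set([start]);
  let current = start;
  for (;;) {
    const next = (referrers.get(current) || []).find((r) => !seen.has(r.from));
    if (!next) break; // nothing new refers to `current`: treat it as the root
    steps.push(`${current} referred to by ${next.from}.${next.prop}`);
    seen.add(next.from);
    current = next.from;
  }
  return { root: current, steps };
}

// Hypothetical graph: a failed module record held in its parent's children
// array; the parent is held by a cache. The cycle via `parent` is skipped.
const graph = {
  moduleCache: { parentModule: "mod1" },
  mod1: { children: "arr1" },
  arr1: { 0: "leaked" },
  leaked: { parent: "mod1" },
};

console.log(walkToRoot(graph, "leaked"));
```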
![Page 92: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/92.jpg)
Root Object
> 1f313791bb41::jsprint
[
    {
        "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "exports": [...],
        "parent": [...],
        "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "loaded": false,
        "children": [...],
        "paths": [...],
    },
    {
        "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "exports": [...],
        "parent": [...],
        "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "loaded": false,
        "children": [...],
        "paths": [...],
    },
    {
        "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "exports": [...],
        "parent": [...],
        "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
![Page 93: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/93.jpg)
Spot the Leak
var cache = {};

function checkCache(someModule) {
    var mod = cache[someModule];
    if (!mod) {
        try {
            mod = require(someModule);
            cache[someModule] = mod;
            return mod;
        } catch (e) {
            // Module could be client only, must catch
            // Should cache the fact we caught an exception here
            return {};
        }
    }
    return mod;
}
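One way to act on the second callout is to cache the failed attempt as well as a successful one, so a client-only module is required at most once per process. The following is a sketch, not the team's actual fix. The injectable `load` parameter is a hypothetical addition that lets the behavior be tested without real modules. The `{}` fallback is kept from the original.

```javascript
// Sketch of a fixed checkCache: remember failures too ("negative caching"),
// so require() is attempted at most once per module name.
const cache = new Map();

function checkCache(someModule, load = require) {
  if (cache.has(someModule)) {
    return cache.get(someModule); // either the module or the {} fallback
  }
  let mod;
  try {
    mod = load(someModule);
  } catch (e) {
    // Module could be client only: record the failure instead of retrying
    // (and re-running the failing require) on every request.
    mod = {};
  }
  cache.set(someModule, mod);
  return mod;
}
```

Catching every error, as the original does, also hides genuine server-side failures behind `{}`. Narrowing the `catch` is a trade-off worth weighing.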
![Page 94: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/94.jpg)
Root Cause
❖ Node caches metadata for each module
❖ If the require process throws an exception, the module metadata is leaked (bug?)
❖ Because the module was client-side only, require() threw on every request, and we never cached the fact that we had already tried to require it
❖ Each request leaks 3+ module metadata objects
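A toy loader can illustrate the mechanism. This is a simplified model, not Node's source. It assumes two things, consistent with the root object above being an array of `"loaded": false` module records. First, each new module record is attached to its parent's `children` array before its code runs. Second, a failed load evicts the record from the cache but not from `children`. `ToyModule`, `toyRequire` and the error message are hypothetical.

```javascript
// Toy model of the leak (hypothetical loader, not Node's implementation).
class ToyModule {
  constructor(id, parent) {
    this.id = id;
    this.parent = parent;
    this.loaded = false;
    this.children = [];
    // Linked into the parent before we know whether loading will succeed.
    if (parent) parent.children.push(this);
  }
}

const toyCache = new Map();

function toyRequire(id, parent, body) {
  if (toyCache.has(id)) return toyCache.get(id);
  const mod = new ToyModule(id, parent);
  toyCache.set(id, mod);
  try {
    body(mod); // run the module's code; may throw (e.g. client-only code)
    mod.loaded = true;
    return mod;
  } catch (e) {
    toyCache.delete(id); // evicted from the cache...
    throw e; // ...but still reachable through parent.children
  }
}

// Simulate three requests that each try to load a client-only module.
const server = new ToyModule("server.js", null);
for (let i = 0; i < 3; i++) {
  try {
    toyRequire("akiraClient.js", server, () => {
      throw new ReferenceError("window is not defined");
    });
  } catch (e) {
    // swallowed, as in the original checkCache
  }
}
console.log(toyCache.size, server.children.length); // 0 3
```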
![Page 95: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/95.jpg)
Memory Leaks
❖ Take successive core dumps (gcore(1))
❖ Compare object counts (::findjsobjects)
❖ Growing objects are likely leaking
❖ Inspect object for more context (::jsprint)
❖ Walk reverse references to find root obj (::findjsobjects -r)
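The "compare object counts" step is essentially a diff between two tallies. The sketch below assumes that the per-shape counts from each dump's `::findjsobjects` output have already been parsed into plain objects keyed by a shape label. The parsing step, the labels and the numbers below are hypothetical.

```javascript
// Sketch: report object shapes whose count grew between two core dumps,
// largest growth first. `before`/`after` map a shape label to a count.
function growingShapes(before, after, minGrowth = 1) {
  const growth = [];
  for (const [shape, count] of Object.entries(after)) {
    const prev = before[shape] || 0;
    const delta = count - prev;
    if (delta >= minGrowth) growth.push({ shape, before: prev, after: count, delta });
  }
  return growth.sort((a, b) => b.delta - a.delta);
}

// Hypothetical counts from two successive dumps.
const dump1 = { "Foo: a, b": 1200, "Bar: name": 50 };
const dump2 = { "Foo: a, b": 4800, "Bar: name": 48, "Baz": 10 };
console.log(growingShapes(dump1, dump2));
```

Shapes whose counts keep rising across several dumps, not just two, are the stronger leak candidates to inspect with `::jsprint`.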
![Page 96: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/96.jpg)
Post-Mortem Debugging Is Critical to Large-Scale Node Deployments
![Page 97: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/97.jpg)
More State than Just Logs
❖ Detailed stack trace (::jsstack)
❖ Function args for each frame (::jsstack -vn0)
❖ Get state of any object and its provenance (::jsprint, ::jsconstructor)
❖ Get source code of any function (::jssource)
❖ Find arbitrary JS objects (::findjsobjects)
❖ Unmodified Node binary!
![Page 98: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/98.jpg)
Production Failures are Inevitable
![Page 99: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/99.jpg)
But We Can Learn from Them
![Page 100: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/100.jpg)
Production Debugging
❖ Runtime Performance
  ❖ CPU profiling/flame graphs
❖ Runtime Crashes
  ❖ Inspect program state with core dumps and mdb
❖ Memory Leaks
  ❖ Analyze objects and references with core dumps and mdb
![Page 101: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/101.jpg)
Use the Scientific Method
![Page 102: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/102.jpg)
Epilogue — State of Tooling
❖ Join the post-mortem working group: https://github.com/nodejs/post-mortem
❖ Help make mdb_v8 cross-platform: https://github.com/joyent/mdb_v8
❖ Contribute to https://github.com/tjfontaine/lldb-v8 and https://github.com/indutny/llnode
![Page 103: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/103.jpg)
Acknowledgements
❖ mdb_v8
  ❖ Dave Pacheco, TJ Fontaine, Julien Gilli, Bryan Cantrill
❖ CPU Profiling/Flame Graphs
  ❖ Brendan Gregg, Google v8 team, Ali Ijaz Sheikh
❖ Linux Perf
  ❖ Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Peter Zijlstra
❖ lldb-v8
  ❖ TJ Fontaine
❖ llnode
  ❖ Fedor Indutny
![Page 104: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/104.jpg)
From Netflix: Node.js Page Fault Flame Graphs
![Page 105: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/105.jpg)
Get Involved!
![Page 107: Debugging node in prod](https://reader030.vdocuments.net/reader030/viewer/2022012813/586e73671a28ab99598b5477/html5/thumbnails/107.jpg)
Citations
❖ Slides 29-32 used with permission from “Java Mixed-Mode Flame Graphs”, Brendan Gregg, Oct 2015
❖ Slide 26 used with permission from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html