Node.js CPU Profiling and Memory Leak Detection with StrongLoop Arc

Post on 13-Aug-2015

Shubhra Kar | Products & Education | twitter: @shubhrakar

Profiling & Memory Leak Diagnosis

nodejs @ hyper-scale

About me

J2EE and SOA architect

Performance architect

Node, mBaaS & APIs

These guys sent me !

Bert Belder

Ben Noordhuis

Node/io Core

Raymond Feng

Ritchie Martori

LoopBack & Express Core

Sam Roberts

Miroslav Bajtos

Ryan Graham

Latency demands are uncompromising

[Chart: The new curve. Series: Concurrent Users, Latency, Adoption. Categories: Web, SaaS, Mobile, IoT.]

That’s not your API, is it ?

So how do we tune performance ?

How does GC work in V8 – Almost the same !

Concept of reachability.

• Roots: objects that are reachable or in live scope. They include objects referenced from anywhere in the call stack (all local variables and parameters in the functions currently being invoked) and any global variables.
• Objects are kept in memory while they are accessible from roots through a reference or a chain of references.
• Root objects are pointed to directly by V8 or by the web browser, for example DOM elements.
• The garbage collector identifies dead memory regions, that is, objects that cannot be reached through any chain of pointers from a live object, and reallocates them or releases them to the OS.
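As a rough illustration of reachability, the sketch below walks a toy object graph from a set of roots, much as a mark phase would. The `refs` field and the `markReachable` helper are invented for this example and are not V8 internals.

```javascript
// Toy model: each "heap object" lists the objects it references in `refs`.
// `markReachable` and `refs` are hypothetical names, not V8 APIs.
function markReachable(roots) {
  const live = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const obj = stack.pop();
    if (live.has(obj)) continue; // already marked
    live.add(obj);
    for (const child of obj.refs) stack.push(child);
  }
  return live;
}

const session = { name: 'session', refs: [] };
const cache = { name: 'cache', refs: [session] }; // plays the role of a global (a root)
const request = { name: 'request', refs: [] };    // nothing references it any more
const orphanA = { name: 'orphanA', refs: [] };
const orphanB = { name: 'orphanB', refs: [orphanA] };
orphanA.refs.push(orphanB); // a cycle that no root can reach

const heap = [session, cache, request, orphanA, orphanB];
const live = markReachable([cache]);
const garbage = heap.filter((o) => !live.has(o)).map((o) => o.name);
console.log(garbage); // [ 'request', 'orphanA', 'orphanB' ]
```

Note that `session` survives only because `cache` still points at it, which is the typical shape of a leak: a long-lived root keeps holding references that the program no longer needs.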

Easy right ? Hell No !!!

Pause and then Stop the World

V8 essentially:

• Stops program execution when performing a garbage collection cycle.
• Processes only part of the object heap in most garbage collection cycles. This minimizes the impact of stopping the application.
• Always knows exactly where all objects and pointers are in memory. This avoids falsely identifying objects as pointers, which can result in memory leaks.
• Segments the object heap into many parts. If an object is moved in a garbage collection cycle, V8 updates all pointers to the object.

Short, Full GC and some algorithms to know

V8 divides the heap into two generations:

Short GC/scavenging

• Objects are allocated in “new-space” (between 1 and 8 MB). Allocation in new-space is very cheap: V8 increments an allocation pointer whenever it reserves space for a new object.
• When the pointer reaches the end of new-space, a scavenge (minor garbage collection cycle) is triggered, quickly removing dead objects from new-space.
• Scavenging has a large space overhead, since physical memory must back both to-space and from-space.
• It is fast by design, hence its use for short GC cycles. The overhead is acceptable as long as new-space is small, a few MB.

Full GC/mark-sweep & mark-compact

• Objects which have survived two minor garbage collections are promoted to “old-space.”
• Old-space is garbage collected in a full GC (major garbage collection cycle), which is much less frequent. A full GC cycle is triggered based on a memory threshold.
• To collect old-space, which may contain several hundred megabytes of data, V8 uses two closely related algorithms: mark-sweep and mark-compact.
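To see the two generations in a running process, one option is the built-in `v8.getHeapSpaceStatistics()` call, which is available in current Node.js releases (it postdates the Node versions these slides were written for). The sketch below is only an illustration; the `heapSpaces` helper is a made-up name.

```javascript
const v8 = require('v8');

// Return { spaceName: { usedMB, sizeMB } } for the heap spaces listed in `names`.
// `heapSpaces` is a hypothetical helper, not part of Node.js.
function heapSpaces(names) {
  const result = {};
  for (const space of v8.getHeapSpaceStatistics()) {
    if (names.includes(space.space_name)) {
      result[space.space_name] = {
        usedMB: space.space_used_size / 1048576,
        sizeMB: space.space_size / 1048576,
      };
    }
  }
  return result;
}

const spaces = heapSpaces(['new_space', 'old_space']);
console.log(spaces);
```

Sampling these numbers over time gives a cheap first signal: an old-space that keeps growing across full GC cycles usually deserves a heap snapshot.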

New Algorithm implementation

Incremental marking & lazy sweeping

In mid-2012, Google introduced two improvements that reduced garbage collection pauses significantly: incremental marking and lazy sweeping.

Incremental marking means doing a bit of marking work, letting the mutator (the JavaScript program) run a bit, then doing another bit of marking work. Each marking step is a short pause, on the order of 5-10 ms for example. Marking is threshold based: after it starts, execution is paused at allocation time to perform an incremental marking step.

Lazy sweeping cleans up a set of objects at a time, eventually cleaning all pages.
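One way to observe these pauses in practice is the `perf_hooks` module in current Node.js releases, which reports garbage collection events with their duration and kind. This is a minimal sketch under that assumption; `gcKindName` and `watchGcPauses` are names invented here.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Map a GC kind constant to a readable label.
function gcKindName(kind) {
  switch (kind) {
    case constants.NODE_PERFORMANCE_GC_MINOR: return 'scavenge (minor)';
    case constants.NODE_PERFORMANCE_GC_MAJOR: return 'mark-sweep/compact (major)';
    case constants.NODE_PERFORMANCE_GC_INCREMENTAL: return 'incremental marking';
    case constants.NODE_PERFORMANCE_GC_WEAKCB: return 'weak callbacks';
    default: return 'unknown';
  }
}

// Start reporting GC pauses through `log`.
// Returns the observer so the caller can disconnect() it later.
function watchGcPauses(log) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      // Newer Node releases expose the kind on `detail`, older ones on the entry.
      const kind = entry.detail ? entry.detail.kind : entry.kind;
      log(`${gcKindName(kind)}: ${entry.duration.toFixed(2)} ms`);
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return observer;
}

// Usage: const observer = watchGcPauses(console.log);
```

Frequent short scavenges are normal; long or frequent major collections are the ones worth correlating with latency spikes.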

OK…so how can I do heap analysis

StrongLoop CLI

$ slc start

$ slc ctl

Service ID: 1
Service Name: express-example-app
Environment variables:
  No environment variables defined
Instances:
    Version  Agent version  Cluster size
     4.1.0       1.5.1            4
Processes:
        ID      PID   WID  Listening Ports  Tracking objects?  CPU profiling?
    1.1.50320  50320   0
    1.1.50321  50321   1     0.0.0.0:3001
    1.1.50322  50322   2     0.0.0.0:3001
    1.1.50323  50323   3     0.0.0.0:3001
    1.1.50324  50324   4     0.0.0.0:3001

$ slc ctl heap-snapshot 1.1.1

StrongLoop API

Programmatic heap snapshots (timer based)

var heapdump = require('heapdump')
// ...
// 6000 * 30 ms = 180000 ms, so a snapshot is written every 3 minutes
setInterval(function () {
  heapdump.writeSnapshot()
}, 6000 * 30)

Programmatic heap snapshots (threshold based)

var heapdump = require('heapdump')
var nextMBThreshold = 0 // the first check always writes a baseline snapshot

setInterval(function () {
  var memMB = process.memoryUsage().rss / 1048576 // resident set size in MB
  if (memMB > nextMBThreshold) { // memory grew past the current threshold
    heapdump.writeSnapshot()
    nextMBThreshold += 100 // raise the threshold by 100 MB
  }
}, 6000 * 2) // check every 12000 ms (12 seconds)
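On current Node.js releases the same threshold idea can be sketched without the `heapdump` module, using the built-in `v8.writeHeapSnapshot()`. The variant below is an assumption-laden sketch, not the slide's code: `createSnapshotTrigger` and its options are invented names, and unlike the snippet above it moves the threshold past the current RSS so that one large jump produces a single snapshot.

```javascript
const v8 = require('v8');

// Build a checker that writes a snapshot each time RSS crosses the next threshold.
// `readRssMB` and `write` are injectable so the logic can be exercised
// without producing real snapshot files.
function createSnapshotTrigger(options = {}) {
  const stepMB = options.stepMB || 100;
  const readRssMB = options.readRssMB ||
    (() => process.memoryUsage().rss / 1048576);
  const write = options.write || (() => v8.writeHeapSnapshot());
  let nextMB = options.startMB || 0;

  return function check() {
    const rssMB = readRssMB();
    if (rssMB <= nextMB) return null; // still under the threshold
    // Move the threshold past the current RSS so one spike yields one snapshot.
    while (nextMB < rssMB) nextMB += stepMB;
    return write();
  };
}

// Usage (hypothetical values): check every 2 minutes without keeping the process alive.
// setInterval(createSnapshotTrigger({ stepMB: 100 }), 2 * 60 * 1000).unref();
```

Writing a snapshot is itself expensive and blocks the process while it runs, so a coarse step and a long interval are the safer defaults in production.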

StrongLoop Arc

StrongLoop Arc – Memory Analysis

CPU hotspots & event loop blocks ?

Don’t Block the EventLoop

Node.js Server

StrongLoop CLI

$ slc start

$ slc ctl

Service ID: 1
Service Name: express-example-app
Environment variables:
  No environment variables defined
Instances:
    Version  Agent version  Cluster size
     4.1.0       1.5.1            4
Processes:
        ID      PID   WID  Listening Ports  Tracking objects?  CPU profiling?
    1.1.50320  50320   0
    1.1.50321  50321   1     0.0.0.0:3001
    1.1.50322  50322   2     0.0.0.0:3001
    1.1.50323  50323   3     0.0.0.0:3001
    1.1.50324  50324   4     0.0.0.0:3001

$ slc ctl cpu-start 50320

$ slc ctl cpu-stop 50320

CPU profiling in StrongLoop Arc

The Upside down wedding cake

Blocked event loop in Meteor atmosphere

node-fibers implements co-routines. Meteor uses this to hack thread-local storage, allowing V8 to run multiple execution contexts, each mapped to a co-routine.

FindOrAllocatePerThreadDataForThisThread() is used when switching context between co-routines.

Co-routines are cooperative: the current co-routine has to yield control before another one can run, and that is what Meteor does in its process.nextTick() callback. It essentially builds concurrent (but not parallel) green threads on a round-robin scheduler.

Too many tiny tasks, rather than one long-running task, were blocking the event loop.

process.nextTick() has a failsafe mechanism where it will process “x” tick callbacks before deferring the remaining ones to the next event loop tick. 

The native MongoDB driver disabled the failsafe to silence a warning message in node v0.10 about maxTickDepth being reached.

ticks  parent  name
 2274    7.3%  v8::internal::Isolate::FindOrAllocatePerThreadDataForThisThread()
 1325   58.3%    LazyCompile: ~<anonymous> packages/meteor.js:683
 1325  100.0%      LazyCompile: _tickCallback node.js:399

The fix

The workaround: switch from process.nextTick() to setImmediate()
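The difference between the two schedulers can be seen with a small experiment. The sketch below is not Meteor's code; `runChain` and `demo` are names made up for illustration. It runs a chain of 100 tiny tasks first with process.nextTick() and then with setImmediate(), while another callback waits for its turn on the event loop.

```javascript
// Run `steps` tiny tasks, each one scheduling the next through `schedule`.
function runChain(schedule, steps, onDone) {
  let remaining = steps;
  function step() {
    remaining -= 1;
    if (remaining > 0) schedule(step);
    else onDone();
  }
  schedule(step);
}

function demo(onFinish) {
  const log = [];
  // Stand-in for an I/O callback that is waiting for the event loop.
  setImmediate(() => log.push('waiting callback A'));
  runChain((fn) => process.nextTick(fn), 100, () => {
    log.push('nextTick chain done');
    setImmediate(() => log.push('waiting callback B'));
    runChain((fn) => setImmediate(fn), 100, () => {
      log.push('setImmediate chain done');
      onFinish(log);
    });
  });
}

demo((log) => console.log(log));
// Order logged:
//   nextTick chain done, waiting callback A,
//   waiting callback B, setImmediate chain done
```

Callback A has to wait until the whole nextTick chain has finished, because the tick queue is drained before the event loop moves on. Callback B runs before the setImmediate chain completes, because each setImmediate step hands control back to the event loop.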

StrongLoop Smart Profiling….Arc support coming soon

Local Linux host

$ slc ctl cpu-start 1.1.76901 12

Remote Linux host

$ slc ctl -C http://my.remote.host cpu-start 1.1.76901 12

• Sniff for event loop block
• Trigger deep profile when blocking encountered
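The general idea of sniffing for a blocked event loop can be sketched with a periodic timer: if the timer fires much later than scheduled, something held the loop. This is only an illustration of the technique, not how StrongLoop implements it; `createBlockDetector` and `onBlock` are hypothetical names, and the `onBlock` callback is where a deeper profile could be triggered.

```javascript
// Detect event loop stalls by measuring how late a periodic timer fires.
// `createBlockDetector` and `onBlock` are hypothetical names for this sketch.
function createBlockDetector({ intervalMs, thresholdMs, onBlock }) {
  let expectedAt = null;
  return {
    // Call once per timer firing with the current time in milliseconds.
    tick(now) {
      if (expectedAt !== null) {
        const lagMs = now - expectedAt;
        if (lagMs > thresholdMs) onBlock(lagMs);
      }
      expectedAt = now + intervalMs;
    },
  };
}

// Simulated timeline: the third firing arrives 200 ms late.
const stalls = [];
const detector = createBlockDetector({
  intervalMs: 100,
  thresholdMs: 50,
  onBlock: (lagMs) => stalls.push(lagMs),
});
detector.tick(0);
detector.tick(100);
detector.tick(400);
console.log(stalls); // [ 200 ]

// Live usage: setInterval(() => detector.tick(Date.now()), 100).unref();
```

Passing the clock in as an argument keeps the detector easy to test; in a live process the timer lag is only an approximation, since timers themselves have some jitter.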

And finally the winner is…

Deep Transaction Tracing

StrongLoop – node.js Development to Production

Build and Deploy

Automate Lifecycle

Performance Metrics: Real-time production monitoring

Profiler: Root cause

CPU & Memory

API Composer: Visual modeling

StrongLoop Arc

Process Manager

Scale applications

Q2 2015

Mesh

Deploy containerized

ORM, mBaaS, Realtime
