Swift Profiling Middleware and Tools


Upload: zhang-hua

Post on 28-May-2015


Page 1: Swift profiling middleware and tools

Swift Profiling Middleware and Tools

Page 2: Swift profiling middleware and tools

Agenda

• Background
• Profiling Proposal
• Profiling Architecture
• Profiling Data Model
• Profiling Tools
• Profiling Analysis

Page 3: Swift profiling middleware and tools

Background

• Profiling: a form of dynamic program analysis that measures
– the space (memory) or time complexity of a program
– the usage of particular instructions
– the frequency and duration of function calls
• A program is instrumented in either source code or binary executable form using a tool called a profiler.
• What current profiling methods lack is code-level detail that answers questions such as:
– How often is a significant part of the code executed or called?
– How long does it take to execute these calls?
– Where is most of the time consumed: on I/O operations, waiting for a DB lock, or wasting cycles in a loop?
– Why does the response time of the container PUT operation increase?
– Where do memory leaks happen, and how much memory does a specific code snippet consume?
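The memory question in the list above can be explored with the standard library alone. The following is a minimal sketch, assuming Python 3 and its `tracemalloc` module, which the slides themselves do not use; `build_headers` and `measure_memory` are hypothetical names chosen for illustration.

```python
import tracemalloc


def build_headers(n):
    # Hypothetical workload standing in for "a specific code snippet".
    return {'X-Object-Meta-%d' % i: 'v' * 64 for i in range(n)}


def measure_memory(func, *args, **kwargs):
    """Return (result, net_bytes, peak_bytes) allocated while func runs."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # net: memory still held after the call; peak: high-water mark during it.
    return result, after - before, peak - before


headers, net_bytes, peak_bytes = measure_memory(build_headers, 10000)
print('net: %d bytes, peak: %d bytes' % (net_bytes, peak_bytes))
```

The net figure only counts memory that is still referenced when the call returns, so it hints at retention rather than proving a leak.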

Page 4: Swift profiling middleware and tools

Profiling Proposal

• The Goal
– Targeted at researchers, developers and admins: provide a method of profiling Swift code so that the current implementation and architecture can be improved based on the generated data and its analysis.
• Scope
– A WSGI middleware injected into Swift servers to collect profiling data
– The middleware can be configured with parameters in the paste file
– The profiling data is dumped periodically to local disk
– A multi-dimensional data model
– Profiling analysis, including the dimensions of workload, system, code and time, and the metrics of frequency, duration, memory consumed, object counts, call graph, etc.
– Analysis tools for reporting and visualization
– Can leverage open source tools
– Can be integrated into the Horizon admin dashboard
• Blueprint and POC are submitted for discussion.
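The scope items on the middleware can be sketched in a few lines. This is only an illustration under stated assumptions, not the submitted POC: it uses the standard `cProfile` module, and `ProfileMiddleware`, `demo_app` and the per-process file naming are hypothetical.

```python
import cProfile
import os
import tempfile
import time


class ProfileMiddleware(object):
    """Hypothetical WSGI filter that profiles every request it wraps."""

    def __init__(self, app, log_filename_prefix, dump_interval=5.0):
        self.app = app
        self.log_filename_prefix = log_filename_prefix
        self.dump_interval = float(dump_interval)
        self.profiler = cProfile.Profile()
        self.last_dump = time.time()

    def __call__(self, environ, start_response):
        # Materialize the body inside runcall so that lazily generated
        # response chunks are measured as well.
        def _run():
            return list(self.app(environ, start_response))

        body = self.profiler.runcall(_run)
        if time.time() - self.last_dump >= self.dump_interval:
            self.dump()
        return body

    def dump(self):
        # One file per process, overwritten with the accumulated stats.
        path = '%s-%d' % (self.log_filename_prefix, os.getpid())
        self.profiler.dump_stats(path)
        self.last_dump = time.time()
        return path


def demo_app(environ, start_response):
    # Stand-in for a Swift server application.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


prefix = os.path.join(tempfile.mkdtemp(), 'demo.profile')
wrapped = ProfileMiddleware(demo_app, prefix, dump_interval=0)
statuses = []
body = wrapped({}, lambda status, headers: statuses.append(status))
dump_path = '%s-%d' % (prefix, os.getpid())
print(body, statuses, os.path.exists(dump_path))
```

Buffering the whole body keeps the sketch short but would be a poor fit for large objects, and a single shared profiler does not separate concurrent green threads; both are simplifications of this example.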

Page 5: Swift profiling middleware and tools

Swift Profiling Architecture

Page 6: Swift profiling middleware and tools

Profiling Granularity

• System Level
– Region: higher-latency off-site locations
– Zone: availability zone
– Node, e.g. storage node, proxy node
– Process
   • Daemons such as replicator, auditor, updater
   • WSGI applications such as the proxy server and the account/container/object (a/c/o) servers

 

Page 7: Swift profiling middleware and tools

Profiling Granularity

• Code Level (Python Runtime)
– Package, e.g. eventlet, xattr, swift.common
– Module, e.g. db.py, swob.py, wsgi.py, http.py
– Function, e.g. __init__, __call__, HEAD, GET, PUT, POST, DELETE
– Code Line: a specific line of code

Page 8: Swift profiling middleware and tools

Profiling Deployment and Data Model

[Diagram: nodes in regions and zones run WSGI servers and daemons; CPU, memory and I/O profilers attached to them feed the profile data model. The label "Guests" also appears in the diagram.]

Multi-Dimensional Profiling Data Model

• Dimensions
– Workload: read/write, object size
– System: region, zone, node, process
– Code: package, module, function, line number
– Time
• Metrics
– Time: frequency, duration
– Space: memory consumed, object counts, memory leaks
– Logic: call graph
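One way to picture the dimensions and metrics named on this slide is a flat record plus a roll-up helper. This is a sketch under assumptions only: the field names, the `roll_up` helper and all sample values are made up for illustration and do not come from the presentation.

```python
from collections import defaultdict, namedtuple

# Dimensions first, then metrics; every field name is an assumption.
ProfileRecord = namedtuple('ProfileRecord', [
    'region', 'zone', 'node', 'process',      # system dimension
    'package', 'module', 'function', 'line',  # code dimension
    'workload', 'timestamp',                  # workload and time dimensions
    'ncalls', 'duration',                     # metrics
])


def roll_up(records, *dimensions):
    """Sum ncalls and duration over records sharing the given dimensions."""
    totals = defaultdict(lambda: [0, 0.0])
    for rec in records:
        key = tuple(getattr(rec, dim) for dim in dimensions)
        totals[key][0] += rec.ncalls
        totals[key][1] += rec.duration
    return {key: tuple(val) for key, val in totals.items()}


# Made-up sample values, for illustration only.
records = [
    ProfileRecord('r1', 'z1', 'pn1', 'proxy-server', 'swift.common',
                  'swob.py', '_normalize', 211, 'R80/W20', 0, 10, 0.5),
    ProfileRecord('r1', 'z1', 'pn1', 'proxy-server', 'swift.common',
                  'swob.py', '__getitem__', 219, 'R80/W20', 0, 20, 0.25),
    ProfileRecord('r1', 'z2', 'cn1', 'container-server', 'swift.common',
                  'db.py', 'put_object', 887, 'R80/W20', 0, 5, 1.0),
]

by_module = roll_up(records, 'module')
by_zone_node = roll_up(records, 'zone', 'node')
print(by_module)
print(by_zone_node)
```

Rolling up by a coarser dimension (module instead of function, zone instead of node) and slicing by filtering the record list first are the two operations the later analysis slides rely on.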

Page 9: Swift profiling middleware and tools

Profiling Tools Available or Needed

Profiling open source hooks

Granularity              CPU Time/Call Graph                                 Memory           Disk/Network I/O
Process                  repoze.profile                                      objgraph         -
Package/Module/Function  profile, cProfile, hotshot, eventlet.green.profile  -                -
Code Line                -                                                   memory_profiler  eventlet_io_profiler?

Profiling report and visualization open source tools (aggregate/slice/drill-down)

Granularity  CPU Time/Call Graph            Memory    Disk/Network I/O
All layers   pstats, runsnake, kcachegrind  memstat?  iostat?

Page 10: Swift profiling middleware and tools

Profiling Middleware

[pipeline:main]
pipeline = profile … proxy-server

[filter:profile]
use = egg:swift#profile
log_filename_prefix = /opt/stack/data/swift/profile/pn1/proxy.profile
dump_interval = 5
dump_timestamp = false
discard_first_request = true
path = /__profile__
flush_at_shutdown = false
unwind = false
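Paste hands the options of a filter section to the filter factory as strings, so the middleware has to convert them before use. The sketch below shows one way to do that conversion with the standard `configparser` module (Python 3). `load_profile_options` is a hypothetical helper, and the types chosen for each option are assumptions based on the values shown in the filter section above.

```python
import configparser

PASTE_SNIPPET = """
[filter:profile]
use = egg:swift#profile
log_filename_prefix = /opt/stack/data/swift/profile/pn1/proxy.profile
dump_interval = 5
dump_timestamp = false
discard_first_request = true
path = /__profile__
flush_at_shutdown = false
unwind = false
"""


def load_profile_options(text, section='filter:profile'):
    """Parse the filter section and return typed option values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    raw = parser[section]
    return {
        'log_filename_prefix': raw.get('log_filename_prefix'),
        'dump_interval': raw.getfloat('dump_interval'),
        'dump_timestamp': raw.getboolean('dump_timestamp'),
        'discard_first_request': raw.getboolean('discard_first_request'),
        'path': raw.get('path'),
        'flush_at_shutdown': raw.getboolean('flush_at_shutdown'),
        'unwind': raw.getboolean('unwind'),
    }


options = load_profile_options(PASTE_SNIPPET)
print(options)
```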

Page 11: Swift profiling middleware and tools

Performance Overhead for Profiling Middleware

Node                 Memory  Worker  Replicas
CosBench Controller  3GB             -
CosBench Driver1     3GB     120     -
CosBench Driver2     3GB     120     -
Proxy                31GB    24      3
Account              35GB    24      3
Container            35GB    24      3
Object1              31GB    24      3
Object2              31GB    24      3

Page 12: Swift profiling middleware and tools

Profile Hook for Swift

swift/common/profile.py:

    import io
    import time

    from eventlet.green import profile
    from memory_profiler import LineProfiler, show_results


    def cpu_profiler(log_file, with_timestamp=False):
        """Decorator that dumps CPU profile data for each call to log_file."""
        def _outer_fn(func):
            def _inner_fn(*args, **kwargs):
                ts = time.time()
                # Suffix the file name with a timestamp only when requested.
                fpath = ''.join([log_file, '-', str(ts)]) if with_timestamp else log_file
                prof = profile.Profile()
                pcall = prof.runcall(func, *args, **kwargs)
                prof.dump_stats(fpath)
                return pcall
            return _inner_fn
        return _outer_fn


    def mem_profiler(log_file, with_timestamp=False):
        """Decorator that writes a line-by-line memory report to log_file."""
        def _outer_fn(func):
            def _inner_fn(*args, **kwargs):
                ts = time.time()
                prof = LineProfiler()
                val = prof(func)(*args, **kwargs)
                fpath = ''.join([log_file, '-', str(ts)]) if with_timestamp else log_file
                # The with block flushes and closes the report file.
                with io.open(fpath, 'w') as astream:
                    show_results(prof, astream, precision=3)
                return val
            return _inner_fn
        return _outer_fn

Import the CPU and memory profilers and decorate the entry points.

swift/swift/proxy/server.py:

    from swift.common.profile import cpu_profiler, mem_profiler

    @cpu_profiler('/opt/stack/data/swift/profile/proxy.cprofile')
    def __call__(self, env, start_response):
        ...

    @mem_profiler('/opt/stack/data/swift/profile/proxy.mprofile')
    def handle_request(self, req):
        …

swift/swift/container/server.py:

    from swift.common.profile import cpu_profiler, mem_profiler

    @cpu_profiler('/opt/stack/data/swift/profile/container.cprofile')
    def __call__(self, env, start_response):
        ...

Dumped profile data:

    openstack@openstackvm:/opt/stack/data/swift/profile$ ll
    total 188
    drwxrwxr-x 2 openstack openstack   4096 Jul 18 16:35 ./
    drwxr-xr-x 7 openstack openstack   4096 Jul 18 15:17 ../
    -rw-r--r-- 1 openstack openstack 105502 Jul 18 16:35 proxy.cprofile
    -rw-r--r-- 1 openstack openstack   1391 Jul 18 16:35 proxy.mprofile
    -rw-r--r-- 1 openstack openstack   7195 Jul 18 16:35 container.cprofile

Page 13: Swift profiling middleware and tools

Eventlet-aware profiling

    import sys
    import eventlet
    from eventlet.green import urllib2
    import time

    sys.path.append('./')
    from decorators import profile_eventlet


    def some_long_calculation(id):
        x = 0
        for i in xrange(1, 100000000):
            x = i + x / i
        print x


    def some_work(id):
        print('start')
        eventlet.sleep(0)
        print('end')


    @profile_eventlet('./ep1.profile')
    def main():
        pile = eventlet.GreenPool(1000)
        pile.spawn(some_work, 1)
        #pile.spawn(some_long_calculation, 2)
        pile.waitall()


    if __name__ == '__main__':
        main()

Output of standard profile: ✖

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
         1    0.000    0.000    7.071    7.071  test_regular_profile2.py:10(some_work)
         1    7.070    7.070    7.070    7.070  test_regular_profile2.py:5(some_long_calculation)

Output of eventlet-aware profile: ✔

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
         1    0.000    0.000    0.000    0.000  test_eventlet_builtin_profile2.py:14(some_work)
         1    7.380    7.380    7.380    7.380  test_eventlet_builtin_profile2.py:9(some_long_calculation)

With the standard profiler, the roughly 7 seconds spent in some_long_calculation are also charged to the cumulative time of some_work, which only yields; the eventlet-aware profiler attributes them to the right function.

Some prior art:

•  https://github.com/colinhowe/eventlet_profiler
•  https://lists.secondlife.com/pipermail/eventletdev/2012-September/001094.html

Page 14: Swift profiling middleware and tools

Profiling Analysis

• Top-K statistical analysis through drill-down, roll-up and slicing to identify hot code snippets or potential bottlenecks to be optimized
– e.g. function call frequency and duration per node (sortable, filterable, aggregatable)
– e.g. module call frequency and duration per node (sortable, filterable, aggregatable)
• Linear or non-linear algorithm analysis to identify scalability problems
– e.g. object read/write throughput at different workloads
• Evolution analysis
– e.g. capture profile data at time intervals and compare
• Code association analysis
– e.g. call graph
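The roll-up from function level to module level can be approximated directly on `pstats` data. The sketch below is an assumption-laden illustration rather than the presenters' tooling: `top_modules`, `square` and `workload` are hypothetical, and the grouping key is simply the base name of the source file.

```python
import cProfile
import os
import pstats
from collections import defaultdict


def top_modules(stats, k=5):
    """Roll function-level entries up to files; return the k with most time."""
    per_file = defaultdict(lambda: [0, 0.0])
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (filename, _lineno, _func), entry in stats.stats.items():
        _cc, ncalls, tottime, _cumtime, _callers = entry
        totals = per_file[os.path.basename(filename)]
        totals[0] += ncalls
        totals[1] += tottime
    ranked = sorted(per_file.items(), key=lambda item: item[1][1], reverse=True)
    return [(name, calls, secs) for name, (calls, secs) in ranked[:k]]


def square(i):
    return i * i


def workload():
    # Hypothetical workload that calls square() 1000 times.
    return sum([square(i) for i in range(1000)])


profiler = cProfile.Profile()
profiler.runcall(workload)
report = top_modules(pstats.Stats(profiler), k=3)
for name, calls, secs in report:
    print('%-30s %8d calls %10.6f s' % (name, calls, secs))
```

Summing `tottime` rather than `cumtime` avoids counting the same time twice when functions in one module call each other.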

 

 

Page 15: Swift profiling middleware and tools

Profiling Report Tool – pstats2

    # python pstats2.py '../data/hybrid/object.*'
    % ?

    Documented commands (type help <topic>):
    ========================================
    EOF  callees  dump  kcachegrind  quit     read     runsnake  stats  tojson
    add  callers  help  list         rawdata  reverse  sort      strip

    % sort calls
    % stats swift 5
             3909969520 function calls (3495132609 primitive calls) in 77381.834 seconds

       Ordered by: call count
       List reduced from 526 to 110 due to restriction <'swift'>
       List reduced from 110 to 5 due to restriction <5>

         ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
       54546321  130.314    0.000  220.887    0.000  /usr/local/lib/python2.7/dist-packages/swift-1.9.1-py2.7.egg/swift/common/swob.py:211(_normalize)
       44597503   80.804    0.000  258.501    0.000  /usr/local/lib/python2.7/dist-packages/swift-1.9.1-py2.7.egg/swift/common/swob.py:219(__getitem__)
       17635615   25.768    0.000   34.190    0.000  /usr/local/lib/python2.7/dist-packages/swift-1.9.1-py2.7.egg/swift/common/swob.py:659(getter)
       16130776   61.326    0.000   85.730    0.000  /usr/local/lib/python2.7/dist-packages/swift-1.9.1-py2.7.egg/swift/common/swob.py:267(__setitem__)
        9948818   19.429    0.000   62.618    0.000  /usr/local/lib/python2.7/dist-packages/swift-1.9.1-py2.7.egg/swift/common/swob.py:230(__contains__)

    % kcachegrind
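The `sort calls` and `stats swift 5` commands in this session correspond closely to what the standard `pstats` module offers. The following sketch is not the pstats2.py script itself: it merges every dump file matching a glob pattern and prints the most frequently called functions that match a restriction. `report_top_calls`, `normalize` and the temporary dump files are hypothetical.

```python
import cProfile
import glob
import io
import os
import pstats
import tempfile


def report_top_calls(pattern, restriction, limit=5):
    """Merge all dumps matching pattern and list the most-called functions."""
    files = sorted(glob.glob(pattern))
    out = io.StringIO()
    stats = pstats.Stats(*files, stream=out)
    # The restriction is a regular expression matched against
    # "filename:lineno(function)"; limit caps the number of rows.
    stats.sort_stats('calls').print_stats(restriction, limit)
    return out.getvalue()


def normalize(name):
    # Hypothetical stand-in for a frequently called function.
    return name.title()


# Create two small dumps, as if they came from two nodes.
dump_dir = tempfile.mkdtemp()
for node in ('object1', 'object2'):
    profiler = cProfile.Profile()
    profiler.runcall(lambda: [normalize('content-type') for _ in range(100)])
    profiler.dump_stats(os.path.join(dump_dir, node + '.profile'))

text = report_top_calls(os.path.join(dump_dir, '*.profile'), 'normalize')
print(text)
```

Because `pstats.Stats` adds up the entries of all files it is given, the report reflects the combined call counts of both dumps.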

Page 16: Swift profiling middleware and tools

Profiling Visualization Tool - kcachegrind

Page 17: Swift profiling middleware and tools

Profiling Visualization Tool - kcachegrind

Call graph of PUT function for object server

Page 18: Swift profiling middleware and tools

Example 1 - Profiling Analysis of File System Calls

•  POSIX call time consumption on the object server (1MB, R80/W20)

[Pie chart: Time of POSIX Call of Object Server (1M)]

Slice values: 71.866 (44%), 21.977 (14%), 18.46 (12%), 18.216 (11%), 13.44 (8%), 11.547 (7%), 6.898 (4%), 0.114 (0%), 0.001 (0%)

Legend: {posix.write}, {posix.stat}, {posix.unlink}, {posix.open}, {posix.close}, {posix.read}, {posix.listdir}, {posix.getpid}, {posix.urandom}

Page 19: Swift profiling middleware and tools

Example 2 - Profiling Analysis of sqlite DB Calls

[Pie chart: Time of DB Call of A/C Server]

Function                      Time  Share
db.py:103(<lambda>)        711.755    64%
db.py:887(put_object)      157.067    14%
db.py:119(chexor)           73.197     7%
db.py:92(_timeout)          57.488     5%
db.py:1162(merge_items)     40.968     4%
db.py:107(<lambda>)         27.244     2%
db.py:173(__init__)         22.603     2%
db.py:102(execute)          14.711     1%
db.py:809(_commit_puts)     10.252     1%
db.py:751(get_db_version)    3.932     0%
db.py:86(__init__)           2.353     0%

Page 20: Swift profiling middleware and tools