High Performance Django
![Page 1: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/1.jpg)
David Cramer http://www.davidcramer.net/ http://www.ibegin.com/
High Performance Django
![Page 2: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/2.jpg)
Curse
• Peak daily traffic of approx. 15m pages, 150m hits.
• Average monthly traffic 120m pages, 6m uniques.
• Python, MySQL, Squid, memcached, mod_python, lighty.
• Most developers came strictly from PHP (myself included).
• 12 web servers, 4 database servers, 2 squid caches.
![Page 3: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/3.jpg)
iBegin
• Massive amounts of data, 100m+ rows.
• Python, PHP, MySQL, mod_wsgi.
• Small team of developers.
• Complex database partitioning/synchronization tasks.
• Attempting to not branch off of Django.
![Page 4: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/4.jpg)
Areas of Concern
• Database (ORM)
• Webserver (Resources, Handling Millions of Reqs)
• Caching (Invalidation, Cache Dump)
• Template Rendering (Logic Separation)
• Profiling
![Page 5: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/5.jpg)
Tools of the Trade
• Webserver (Apache, Nginx, Lighttpd)
• Object Cache (memcached)
• Database (MySQL, PostgreSQL, …)
• Page Cache (Squid, Nginx, Varnish)
• Load Balancing (Nginx, Perlbal)
![Page 6: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/6.jpg)
How We Did It
• “Primary” web servers serving Django using mod_python.
• Media servers using Django on lighttpd.
• Static served using additional instances of lighttpd.
• Load balancers passing requests to multiple Squids.
• Squids passing requests to multiple web servers.
![Page 7: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/7.jpg)
Lessons Learned
• Don’t be afraid to experiment. You’re not limited to just one.
• mod_wsgi is a huge step forward from mod_python.
• Serving static files using different software can help.
• Send proper HTTP headers where they are needed.
• Use services like S3, Akamai, Limelight, etc.
![Page 8: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/8.jpg)
Webserver Software
• Python scripts: Apache (wsgi, mod_py, fastcgi), Lighttpd (fastcgi), Nginx (fastcgi)
• Reverse proxies: Nginx, Squid, Varnish
• Static content: Apache, Lighttpd, Tinyhttpd, Nginx
• Software load balancers: Nginx, Perlbal
![Page 9: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/9.jpg)
Database (ORM)
• The ORM won’t make your queries efficient. Make your own indexes.
• select_related() can be good, as well as bad.
• Inherited ordering (Meta: ordering) will get you.
• Hundreds of queries on a page is never a good thing.
• Know when to not use the ORM.
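As a rough illustration of the indexing point, the sketch below uses Python's built-in `sqlite3` module instead of Django and MySQL, purely so that it runs anywhere. The `poll` table, its `slug` column, and the index name `idx_poll_slug` are hypothetical. It asks the database for its query plan before and after an index is added by hand.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query, as text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable description.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll (id INTEGER PRIMARY KEY, name TEXT, slug TEXT)")
conn.executemany(
    "INSERT INTO poll (name, slug) VALUES (?, ?)",
    [("poll %d" % i, "poll-%d" % i) for i in range(5000)],
)

sql = "SELECT name FROM poll WHERE slug = ?"

# Without an index the database has to scan the whole table.
plan_before = query_plan(conn, sql, ("poll-7",))

# The index is something we add ourselves; nothing created it for us.
conn.execute("CREATE INDEX idx_poll_slug ON poll (slug)")
plan_after = query_plan(conn, sql, ("poll-7",))

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, and MySQL's `EXPLAIN` output looks different again, but the habit is the same: check the plan for the queries your pages actually run.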
![Page 10: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/10.jpg)
Handling JOINs

    from django.contrib.auth.models import User
    from django.db import models
    from django.shortcuts import render_to_response


    class Category(models.Model):
        name = models.CharField(max_length=100)  # max_length is required; 100 is arbitrary
        created_by = models.ForeignKey(User)


    class Poll(models.Model):
        name = models.CharField(max_length=100)
        category = models.ForeignKey(Category)
        created_by = models.ForeignKey(User)


    # We need to output a page listing all Polls with
    # their name and their category's name.

    def a_bad_example(request):
        # We have just caused Poll to JOIN with User and Category,
        # which will also JOIN with User a second time.
        my_polls = Poll.objects.all().select_related()
        return render_to_response('polls.html', locals(), request)


    def a_good_example(request):
        # Use select_related explicitly in each case.
        my_polls = Poll.objects.all().select_related('category')
        return render_to_response('polls.html', locals(), request)
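To make the cost difference concrete, here is a minimal sketch in plain `sqlite3` (not the ORM, and not the SQL Django actually generates). The `CountingConnection` wrapper and the sample data are hypothetical. It compares fetching each poll's category one query at a time with a single JOIN on the one relation the page needs.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many SELECTs we send to the database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.query_count = 0

    def query(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params).fetchall()


db = CountingConnection()
db.conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE poll (id INTEGER PRIMARY KEY, name TEXT,
                       category_id INTEGER REFERENCES category(id));
""")
db.conn.executemany("INSERT INTO category (id, name) VALUES (?, ?)",
                    [(i, "category %d" % i) for i in range(1, 6)])
db.conn.executemany("INSERT INTO poll (name, category_id) VALUES (?, ?)",
                    [("poll %d" % i, i % 5 + 1) for i in range(100)])

# Lazy access: one query for the polls, then one more per poll for its category.
lazy_rows = []
for poll_name, category_id in db.query("SELECT name, category_id FROM poll ORDER BY id"):
    (category_name,) = db.query(
        "SELECT name FROM category WHERE id = ?", (category_id,))[0]
    lazy_rows.append((poll_name, category_name))
lazy_queries = db.query_count

# Explicit JOIN on the single relation the page needs: one query in total.
db.query_count = 0
joined_rows = db.query(
    "SELECT poll.name, category.name FROM poll "
    "JOIN category ON category.id = poll.category_id ORDER BY poll.id"
)
joined_queries = db.query_count

print(lazy_queries, joined_queries)  # 101 1
```

Both approaches return the same rows; only the number of round trips changes.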
![Page 11: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/11.jpg)
Template Rendering
• Sandboxed engines are typically slower by nature.
• Keep logic in views and template tags.
• Be aware of performance in loops, and groupby (regroup).
• Loaded templates can be cached to avoid disk reads.
• Switching template engines is easy, but may not give you any worthwhile performance gain.
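The template caching idea can be sketched without Django. The example below is an assumption-laden stand-in: it uses `string.Template` from the standard library rather than Django's template engine, and `load_template`, `TEMPLATE_DIR`, and `poll.txt` are hypothetical names. It shows a loader that reads and parses each file once and reuses the result afterwards.

```python
import os
import tempfile
from functools import lru_cache
from string import Template

TEMPLATE_DIR = tempfile.mkdtemp()  # hypothetical template directory
disk_reads = 0


@lru_cache(maxsize=None)
def load_template(name):
    """Read and parse a template once; later calls reuse the parsed object."""
    global disk_reads
    disk_reads += 1
    with open(os.path.join(TEMPLATE_DIR, name)) as fh:
        return Template(fh.read())


# Create a sample template file for the demonstration.
with open(os.path.join(TEMPLATE_DIR, "poll.txt"), "w") as fh:
    fh.write("Poll: $name ($category)")

rendered = [
    load_template("poll.txt").substitute(name="poll %d" % i, category="general")
    for i in range(3)
]

print(rendered[0])  # Poll: poll 0 (general)
print(disk_reads)   # 1
```

The trade-off is that edits to a template file are not picked up until the cache is cleared or the process restarts, which is usually acceptable in production but annoying in development.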
![Page 12: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/12.jpg)
Template Engines
![Page 13: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/13.jpg)
Caching
• Two flavors of caching: object cache and browser cache.
• Django provides built-in support for both.
• Invalidation is a headache without a well-thought-out plan.
• Caching isn’t a solution for slow-loading pages or improper indexes.
• Use a reverse proxy in between the browser and your web servers: Squid, Varnish, Nginx, etc.
![Page 14: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/14.jpg)
Cache With a Plan
• Build your pages to use proper cache headers.
• Create a plan for object cache expiration, and invalidation.
• For typical web apps you can serve the same cached page for both anonymous and authenticated users.
• Contain commonly used querysets in managers for transparent caching and invalidation.
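As one possible reading of "proper cache headers", the sketch below builds a `Cache-Control` header and an `ETag` for a page, and checks a conditional request against it. The function names `cache_headers` and `not_modified` are hypothetical, and the `If-None-Match` comparison is deliberately simplified (a real implementation would also handle weak validators and lists of tags).

```python
import hashlib


def cache_headers(body, max_age, shared=True):
    """Build headers telling browsers and reverse proxies how to cache a page.

    shared=True marks the page as safe to serve to every visitor, which is
    what lets a proxy such as Squid hand the same copy to everyone.
    """
    scope = "public" if shared else "private"
    return {
        "Cache-Control": "%s, max-age=%d" % (scope, max_age),
        # A validator derived from the content itself.
        "ETag": '"%s"' % hashlib.sha256(body).hexdigest()[:16],
    }


def not_modified(request_headers, response_headers):
    """True when the client's cached copy is current, so a 304 can be sent."""
    return request_headers.get("If-None-Match") == response_headers["ETag"]


headers = cache_headers(b"<html>polls</html>", max_age=300)
print(headers["Cache-Control"])  # public, max-age=300
print(not_modified({"If-None-Match": headers["ETag"]}, headers))  # True
print(not_modified({}, headers))  # False
```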
![Page 15: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/15.jpg)
Cache Commonly Used Items

    from django.core.cache import cache
    from django.db import models


    def my_context_processor(request):
        # We access object_list every time we use our context processors, so
        # it makes sense to cache this, no?
        cache_key = 'mymodel:all'
        object_list = cache.get(cache_key)
        if object_list is None:
            object_list = MyModel.objects.all()
            cache.set(cache_key, object_list)
        return {'object_list': object_list}


    # Now that we are caching the object list, we are going to want to invalidate it.
    class MyModel(models.Model):
        name = models.CharField(max_length=100)  # max_length is required; 100 is arbitrary

        def save(self, *args, **kwargs):
            # Save it before you update the cache.
            super(MyModel, self).save(*args, **kwargs)
            cache.set('mymodel:all', MyModel.objects.all())
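The same pattern can be kept in one place so that callers never touch cache keys directly. The sketch below is framework-free and entirely hypothetical: `DictCache` stands in for memcached, and `CachedItemStore` stands in for a model manager backed by a database table. Unlike the slide, it deletes the key on write instead of re-setting it, and it also invalidates on delete.

```python
class DictCache:
    """In-memory stand-in for memcached: get() returns None on a miss."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class CachedItemStore:
    """Keeps the cache key and its invalidation rules in one place."""

    cache_key = "mymodel:all"

    def __init__(self, cache):
        self.cache = cache
        self._rows = []    # stand-in for the database table
        self.db_reads = 0  # counts how often we hit the "database"

    def _load_all(self):
        self.db_reads += 1
        return list(self._rows)

    def all(self):
        items = self.cache.get(self.cache_key)
        if items is None:
            items = self._load_all()
            self.cache.set(self.cache_key, items)
        return items

    def save(self, name):
        self._rows.append(name)
        self.cache.delete(self.cache_key)  # next read repopulates the cache

    def delete(self, name):
        self._rows.remove(name)
        self.cache.delete(self.cache_key)


store = CachedItemStore(DictCache())
store.save("first")
store.all()
store.all()
print(store.db_reads)  # 1: the second read came from the cache
store.delete("first")
print(store.all())     # []
print(store.db_reads)  # 2
```

Deleting the key avoids running the query on every write, at the price of one slower read afterwards; re-setting it, as the slide does, keeps reads warm but does the work even when nobody asks for the list.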
![Page 16: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/16.jpg)
Profiling Code
• Finding the bottleneck can be time consuming.
• Tools exist to help identify common problematic areas.
– cProfile/Profile Python modules.
– PDB (Python Debugger)
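For a first look at cProfile outside of any web framework, a minimal sketch follows. `slow_sum` and `profile_call` are hypothetical names; the profiler calls themselves (`cProfile.Profile`, `runcall`, `pstats.Stats`) are from the standard library.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately wasteful function to give the profiler something to find."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and show only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


result, report = profile_call(slow_sum, 20000)
print(report)
```

Sorting by cumulative time tends to surface the high-level function responsible for the slowdown; sorting by `"tottime"` instead points at where the time is actually being burned.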
![Page 17: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/17.jpg)
Profiling Code With cProfile

    import sys

    try:
        import cProfile as profile
    except ImportError:
        import profile

    try:
        from cStringIO import StringIO
    except ImportError:
        from io import StringIO

    from django.conf import settings


    class ProfilerMiddleware(object):
        def can(self, request):
            # Only profile in DEBUG mode, when ?prof is present, and from an internal IP.
            return settings.DEBUG and 'prof' in request.GET and (
                not settings.INTERNAL_IPS or
                request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS)

        def process_view(self, request, callback, callback_args, callback_kwargs):
            if self.can(request):
                self.profiler = profile.Profile()
                args = (request,) + callback_args
                return self.profiler.runcall(callback, *args, **callback_kwargs)

        def process_response(self, request, response):
            if self.can(request):
                self.profiler.create_stats()
                out = StringIO()
                # print_stats writes to stdout, so capture it temporarily.
                old_stdout, sys.stdout = sys.stdout, out
                self.profiler.print_stats(1)
                sys.stdout = old_stdout
                response.content = '<pre>%s</pre>' % out.getvalue()
            return response
![Page 18: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/18.jpg)
http://localhost:8000/?prof
![Page 19: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/19.jpg)
Profiling Database Queries

    try:
        from cStringIO import StringIO
    except ImportError:
        from io import StringIO

    from django.conf import settings
    from django.db import connection


    class DatabaseProfilerMiddleware(object):
        def can(self, request):
            return settings.DEBUG and 'dbprof' in request.GET and (
                not settings.INTERNAL_IPS or
                request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS)

        def process_response(self, request, response):
            if self.can(request):
                out = StringIO()
                out.write('time\tsql\n')
                total_time = 0.0
                # query['time'] is a string, so convert it before sorting (slowest first).
                for query in sorted(connection.queries,
                                    key=lambda q: float(q['time']), reverse=True):
                    total_time += float(query['time'])
                    out.write('%s\t%s\n' % (query['time'], query['sql']))
                response.content = (
                    '<pre style="white-space:pre-wrap">%d queries executed in %.3f seconds\n\n%s</pre>'
                    % (len(connection.queries), total_time, out.getvalue()))
            return response
![Page 20: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/20.jpg)
http://localhost:8000/?dbprof
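A query log sorted by time shows the slowest statements, but not the ones that are cheap individually and repeated many times. The sketch below is one hypothetical way to surface those: `summarize_queries` groups identical SQL and ranks it by total time. The sample data is made up, but it has the same shape as the entries in Django's `connection.queries` (dicts with `sql` and `time` keys, the time being a string in seconds).

```python
from collections import defaultdict


def summarize_queries(queries):
    """Group identical SQL statements and rank them by total time spent."""
    totals = defaultdict(lambda: {"count": 0, "time": 0.0})
    for query in queries:
        entry = totals[query["sql"]]
        entry["count"] += 1
        entry["time"] += float(query["time"])  # times arrive as strings
    return sorted(
        ((sql, e["count"], e["time"]) for sql, e in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


# Hypothetical sample in the same shape as connection.queries.
sample = [
    {"sql": "SELECT * FROM poll", "time": "0.010"},
    {"sql": "SELECT * FROM category WHERE id = 1", "time": "0.002"},
    {"sql": "SELECT * FROM category WHERE id = 1", "time": "0.002"},
    {"sql": "SELECT * FROM category WHERE id = 1", "time": "0.002"},
]

for sql, count, seconds in summarize_queries(sample):
    print("%dx  %.3fs  %s" % (count, seconds, sql))
```

A statement that shows up dozens of times with different IDs is usually a sign of a missing `select_related()` or a lookup inside a template loop.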
![Page 21: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/21.jpg)
Summary
• Database efficiency is the typical problem in web apps.
• Develop and deploy a caching plan early on.
• Use profiling tools to find your problematic areas. Don’t pre-optimize unless there is good reason.
• Find someone who knows more than me to configure your server software.
![Page 22: High Performance Django](https://reader033.vdocuments.net/reader033/viewer/2022052906/558b4187d8b42a49668b466f/html5/thumbnails/22.jpg)
Slides and code available online at: http://www.davidcramer.net/djangocon
Thanks!