advanced django orm techniques

51
Advanced Django ORM techniques Daniel Roseman http://blog.roseman.org.uk

Upload: daniel-roseman

Post on 29-Jan-2018

33.828 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Advanced Django ORM techniques

Advanced Django ORM techniques

Daniel Roseman http://blog.roseman.org.uk

Page 2: Advanced Django ORM techniques

About Me

• Python user for five years

• Discovered Django four years ago

• Worked full-time with Python/Django since 2008.

• Top Django answerer on StackOverflow!

• Occasionally blog on Django, concentrating on efficient use of the ORM.

Page 3: Advanced Django ORM techniques

Contents

• Behind the scenes: models and fields

• How model relationships work

• More efficient relationships

• Other optimising techniques

Page 4: Advanced Django ORM techniques

Django ORM efficiency: a story

Page 5: Advanced Django ORM techniques
Page 6: Advanced Django ORM techniques

414 queries!

Page 7: Advanced Django ORM techniques

How can you stop this happening to you?

http://www.flickr.com/photos/m0n0/4479450696

Page 8: Advanced Django ORM techniques

Behind the scenes: models and fields

http://www.flickr.com/photos/spacesuitcatalyst/847530840

Page 9: Advanced Django ORM techniques

Defining a model

• Model structure initialised via metaclass

• Called when model is first defined

• Resulting model class stored in cache to use when instantiated

Page 10: Advanced Django ORM techniques

Fields

• Fields have contribute_to_class

• Adds methods, eg get_FOO_display()

• Enables use of descriptors for field access

Page 11: Advanced Django ORM techniques

Model metadata

• Model._meta

• .fields

• .get_field(fieldname)

• .get_all_related_objects()

Page 12: Advanced Django ORM techniques

Model instantiation

• Instance is populated from database initially

• Has no subsequent relationship with db until save

• No identity between models

Page 13: Advanced Django ORM techniques

Querysets

• Model manager returns a queryset: foos = Foo.objects.all()

• Queryset is an ordered list of instances of a single model

• No database access yet

• Slice: foos[0]

• Iterate: {% for foo in foos %}

Page 14: Advanced Django ORM techniques

Where do all those queries come from?

• Repeated queries

• Lack of caching

• Relational lookup

• Templates as well as views

Page 15: Advanced Django ORM techniques

Repeated queries

def get_absolute_url(self): return "%s/%s" % ( self.category.slug, self.slug )

Same category, but query is repeated for each article

Page 16: Advanced Django ORM techniques

• Same link on every page

• Dynamic, so can't go in urlconf

• Could be cached or memoized

Repeated queries

Page 17: Advanced Django ORM techniques

Relationships

http://www.flickr.com/photos/katietegtmeyer/124315322

Page 18: Advanced Django ORM techniques

Relational lookups

• Forwards:

foo.bar.field

• Backwards:

bar.foo_set.all()

Page 19: Advanced Django ORM techniques

Example modelsclass Foo(models.Model):

name = models.CharField(max_length=10)

class Bar(models.Model):

name = models.CharField(max_length=10)

foo = models.ForeignKey(Foo)

Page 20: Advanced Django ORM techniques

Forwards relationship

>>> bar = Bar.objects.all()[0]

>>> bar.__dict__

{'id': 1, 'foo_id': 1, 'name': u'item1'}

Page 21: Advanced Django ORM techniques

Forwards relationship

>>> bar.foo.name

u'item1'

>>> bar.__dict__

{'_foo_cache': <Foo: Foo object>, 'id': 1, 'foo_id': 1, 'name': u'item1'}

Page 22: Advanced Django ORM techniques

Fowards relationships

• Relational access implemented via a descriptor:django.db.models.fields.related.SingleRelatedObjectDescriptor

• __get__ tries to access _foo_cache

• If doesn't exist, does lookup and creates cache

Page 23: Advanced Django ORM techniques

select_related

• Automatically follows foreign keys in SQL query

• Prepopulates _foo_cache

• Doesn't follow null=True relationships by default

• Makes query more expensive, so be sure you need it

Page 24: Advanced Django ORM techniques

Backwards relationships{% for foo in my_foos %}

{% for bar in foo.bar_set.all %}

{{ bar.name }}

{% endfor %}

{% endfor %}

Page 25: Advanced Django ORM techniques

Backwards relationships

• One query per foo

• If you iterate over foo_set again, you generate a new set of db hits

• No _foo_cache

• select_related does not work here

Page 26: Advanced Django ORM techniques

Optimising backwards relationships

• Get all related objects at once

• Sort by ID of parent object

• Then cache in hidden attribute as with select_related

Page 27: Advanced Django ORM techniques

qs = Foo.objects.filter(criteria=whatever)obj_dict = dict([(obj.id, obj) for obj in qs])objects = Bar.objects.filter(foo__in=qs)relation_dict = {}for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj)for id, related in relation_dict.items(): obj_dict[id]._related = related

Page 28: Advanced Django ORM techniques

qs = Foo.objects.filter(criteria=whatever)obj_dict = dict([(obj.id, obj) for obj in qs])objects = Bar.objects.filter(foo__in=qs)relation_dict = {}for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj)for id, related in relation_dict.items(): obj_dict[id]._related = related

Page 29: Advanced Django ORM techniques

qs = Foo.objects.filter(criteria=whatever)obj_dict = dict([(obj.id, obj) for obj in qs])objects = Bar.objects.filter(foo__in=qs)relation_dict = {}for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj)for id, related in relation_dict.items(): obj_dict[id]._related = related

Page 30: Advanced Django ORM techniques

qs = Foo.objects.filter(criteria=whatever)obj_dict = dict([(obj.id, obj) for obj in qs])objects = Bar.objects.filter(foo__in=qs)relation_dict = {}for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj)for id, related in relation_dict.items(): obj_dict[id]._related = related

Page 31: Advanced Django ORM techniques

qs = Foo.objects.filter(criteria=whatever)obj_dict = dict([(obj.id, obj) for obj in qs])objects = Bar.objects.filter(foo__in=qs)relation_dict = {}for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj)for id, related in relation_dict.items(): obj_dict[id]._related = related

Page 32: Advanced Django ORM techniques

qs = Foo.objects.filter(criteria=whatever)obj_dict = dict([(obj.id, obj) for obj in qs])objects = Bar.objects.filter(foo__in=qs)relation_dict = {}for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj)for id, related in relation_dict.items(): obj_dict[id]._related = related

Page 33: Advanced Django ORM techniques

Optimising backwards[{'time': '0.000', 'sql': u'SELECT "foobar_foo"."id", "foobar_foo"."name" FROM "foobar_foo"'},

{'time': '0.000', 'sql': u'SELECT "foobar_bar"."id", "foobar_bar"."name", "foobar_bar"."foo_id" FROM "foobar_bar" WHERE "foobar_bar"."foo_id" IN (SELECT U0."id" FROM "foobar_foo" U0)'}]

Page 34: Advanced Django ORM techniques

Optimising backwards

• Still quite expensive, as can mean large dependent subquery – MySQL in particular very bad at these

• But now just two queries instead of n

• Not automatic – need to remember to use _related_items attribute

Page 35: Advanced Django ORM techniques

Generic relations• Foreign key to ContentType, object_id

• Descriptor to enable direct access

• iterating through creates n+m queries(n=number of source objects,m=number of different content types)

• ContentType objects automatically cached

• Forwards relationship creates _foo_cache

• but select_related doesn't work

Page 36: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 37: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 38: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 39: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 40: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 41: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 42: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 43: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 44: Advanced Django ORM techniques

generics = {}for item in queryset: generics.setdefault(item.content_type_id, set()).add(item.object_id)content_types = ContentType.objects.in_bulk( generics.keys())relations = {}for ct, fk_list in generics.items(): ct_model = content_types[ct].model_class() relations[ct] = ct_model.objects.\ in_bulk(list(fk_list))for item in queryset: setattr(item, '_content_object_cache', relations[content_type_id][item.object_id] )

Page 45: Advanced Django ORM techniques

Other optimising techniques

Page 46: Advanced Django ORM techniques

Memoizing

• Cache property on first access

• Can cache within instance, if multiple accesses within same request

def get_expensive_items(self): if not hasattr(self, '_cache'): self._cache = self.expensive_op() return self._cache

Page 47: Advanced Django ORM techniques

DB Indexes

• Pay attention to slow query log and debug toolbar output

• Add extra indexes where necessary - especially for multiple-column lookup

• Use EXPLAIN

Page 48: Advanced Django ORM techniques

Outsourcing

• Does all the logic need to go in the web app?

• Services - via eg Piston

• Message queues

• Distributed tasks, eg Celery

Page 49: Advanced Django ORM techniques

Summary

• Understand where queries are coming from

• Optimise where necessary, within Django or in the database

• and...

Page 50: Advanced Django ORM techniques

PROFILE

Page 51: Advanced Django ORM techniques

Daniel Roseman

http://blog.roseman.org.uk