A17: Indexing and Query Optimization, by Paul Pedersen

Indexing and Query Optimization Paul Pedersen Monday, October 15, 12


What’s in store

• What are indexes?

• Picking the right indexes.

• Creating indexes in MongoDB

• Troubleshooting


Indexes are the single biggest tunable performance factor in MongoDB.


Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.


So what problem do indexes solve?


How do you find a chicken recipe?

• An unindexed cookbook might be quite a page turner.

• Probably not what you want, though.


I know, I’ll use an index!


Let’s imagine a simple index

ingredient    page
aardvark      790
...           ...
beef          190, 191, 205, ...
...           ...
chicken       182, 199, 200, ...
chorizo       497, ...
...           ...
zucchini      673, 986, ...
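As a rough illustration of the cookbook analogy (not MongoDB's actual implementation), the sketch below builds a lookup table from ingredient to page numbers. The recipe data and the names `recipes` and `buildIndex` are made up for the example.

```javascript
// Hypothetical cookbook: each recipe has a page and a main ingredient.
const recipes = [
  { page: 182, ingredient: "chicken" },
  { page: 190, ingredient: "beef" },
  { page: 199, ingredient: "chicken" },
  { page: 497, ingredient: "chorizo" },
  { page: 790, ingredient: "aardvark" },
];

// Build a simple index: ingredient -> list of pages.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc.page);
  }
  return index;
}

// Without an index, every recipe has to be inspected.
const scanned = recipes
  .filter((r) => r.ingredient === "chicken")
  .map((r) => r.page);

// With an index, the pages are found with a single lookup.
const ingredientIndex = buildIndex(recipes, "ingredient");
const viaIndex = ingredientIndex.get("chicken");

console.log(scanned, viaIndex);
```

Both approaches return the same pages; the difference is how many recipes have to be looked at to get there.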


How do you find a quick chicken recipe?


Let’s imagine a compound index

ingredient    cooking time    page
...           ...             ...
chicken       15 min          182, 200
chicken       25 min          199
chicken       30 min          289, 316, 320
chicken       45 min          290, 291, 354
...           ...             ...


Consider the ordering of index keys

Keys sort by ingredient first, then by cooking time:

Aardvark, 20 min
Chicken, 15 min
Chicken, 25 min
Chicken, 30 min
Chicken, 45 min
Zucchini, 45 min
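The key ordering can be sketched in a few lines. This is only an illustration under assumed data: the entries, `compareKeys`, and `quickRecipes` are hypothetical names, and a sorted array stands in for the real index structure.

```javascript
// Hypothetical index entries: [ingredient, cooking time in minutes, page].
const entries = [
  ["chicken", 45, 290],
  ["aardvark", 20, 790],
  ["chicken", 15, 182],
  ["zucchini", 45, 673],
  ["chicken", 30, 289],
  ["chicken", 25, 199],
];

// A compound index keeps keys sorted by the first field, then the second.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}
const compoundIndex = [...entries].sort(compareKeys);

// "Quick chicken recipe": ingredient is constant, cooking time is a range.
// The matching keys sit next to each other in the sorted order. filter() is
// used here for brevity; a real index would seek to the first matching key.
function quickRecipes(index, ingredient, maxMinutes) {
  return index
    .filter(([ing, minutes]) => ing === ingredient && minutes <= maxMinutes)
    .map(([, , page]) => page);
}

console.log(compoundIndex.map(([ing, min]) => `${ing}, ${min} min`));
console.log(quickRecipes(compoundIndex, "chicken", 25));
```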


How about a low-calorie chicken recipe?


Let’s imagine a 2nd compound index

ingredient    calories    page
...           ...         ...
chicken       250         199, 316
chicken       300         289, 291
chicken       425         320
...           ...         ...


How about a quick, low-calorie recipe?


Let’s imagine a last compound index

calories    cooking time    page
...         ...             ...
250         25 min          199
250         30 min          316
300         25 min          289
300         45 min          291
425         30 min          320
...         ...             ...

How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes?


Consider the ordering of index keys

250 cal, 25 min
250 cal, 30 min
300 cal, 25 min
300 cal, 45 min
425 cal, 30 min

How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes?

4 index entries will be scanned, but only 1 will match!
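The scan count can be reproduced with a small simulation. This is a sketch that walks a sorted array in key order, not how MongoDB traverses its B-tree; `rangeQuery` is a hypothetical helper and the entries are the five keys listed above.

```javascript
// Index entries from the slide: [calories, cooking time in minutes, page].
const index = [
  [250, 25, 199],
  [250, 30, 316],
  [300, 25, 289],
  [300, 45, 291],
  [425, 30, 320],
];

// Walk the index in key order. The leading field (calories) bounds the scan;
// the second field (minutes) can only be checked entry by entry.
function rangeQuery(entries, calMin, calMax, minMin, minMax) {
  let scanned = 0;
  const pages = [];
  for (const [calories, minutes, page] of entries) {
    if (calories < calMin) continue; // before the start of the range
    if (calories > calMax) break;    // past the end of the range: stop
    scanned += 1;
    if (minutes >= minMin && minutes <= minMax) pages.push(page);
  }
  return { scanned, pages };
}

const result = rangeQuery(index, 250, 300, 30, 40);
console.log(result); // { scanned: 4, pages: [ 316 ] }
```

Because calories is the leading key and is itself a range, the cooking-time condition cannot narrow the scan; it only filters entries that were already visited.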


Range queries using an index on A, B

• A is a range

• A is constant, B is a range

• A is constant, order by B

• A is range, B is constant/range

• B is constant/range, A unspecified


It’s really that straightforward.


B-Trees (Bayer & McCreight ’72)


Queries, Inserts, Deletes: O(log n)
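To get a feel for logarithmic cost, the sketch below counts the steps of a binary search over sorted keys. Binary search is used as a simple stand-in here, not as a B-tree implementation; `searchSteps` is a hypothetical helper.

```javascript
// Count the probes a binary search needs over n sorted keys.
function searchSteps(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let steps = 0;
  while (lo <= hi) {
    steps += 1;
    const mid = Math.floor((lo + hi) / 2);
    if (sortedKeys[mid] === target) return steps;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return steps;
}

for (const n of [1000, 1000000]) {
  const keys = Array.from({ length: n }, (_, i) => i);
  // Searching for the last key is close to the worst case.
  console.log(n, "keys:", searchSteps(keys, n - 1), "steps");
}
```

Growing the key count a thousandfold only roughly doubles the number of steps, which is the behavior O(log n) describes.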


All this is relevant to MongoDB.

• MongoDB’s indexes are B-Trees, which are designed for range queries.

• Generally, the best index for your queries is going to be a compound index.

• Every additional index slows down inserts & removes, and may slow updates.


On to MongoDB!


Declaring Indexes

• db.foo.ensureIndex( { username : 1 } )


• db.foo.ensureIndex( { username : 1, created_at : -1 } )
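Unless a name is supplied, an index is named after its keys and directions. The helper below is a hypothetical sketch of that naming pattern in plain JavaScript; `defaultIndexName` is not a MongoDB function.

```javascript
// Derive a default-style name from an index key specification by joining
// each "<field>_<direction>" pair with underscores.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ username: 1 }));                 // username_1
console.log(defaultIndexName({ username: 1, created_at: -1 })); // username_1_created_at_-1
```

Key order matters in the specification: `{ username : 1, created_at : -1 }` and `{ created_at : -1, username : 1 }` describe different indexes.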


And managing them....

> db.system.indexes.find() // db.foo.getIndexes()

{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" }
{ "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" }


> db.foo.dropIndex( { username : 1} )

{ "nIndexesWas" : 2 , "ok" : 1 }


Key info about MongoDB’s indexes

• A collection may have at most 64 indexes.

• “_id” index is automatic (except capped collections before 2.2)

• A query can use only one index (except $or queries, where each clause can use its own).

• The maximum index key size is 1024 bytes.


Indexes get used where you’d expect

• db.foo.find({x : 42})

• db.foo.find({x : {$in : [42,52]}})

• db.foo.find({x : {$lt : 42}})

• update, findAndModify that select on x

• count, distinct

• $match in aggregation

• left-anchored regexp, e.g. /^Kev/
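One way to see why a left-anchored regular expression can use an index is that a prefix corresponds to a contiguous key range. The sketch below shows the idea on a sorted array of made-up names; `prefixRange` is a hypothetical helper and this is not MongoDB's internal logic.

```javascript
const names = ["Karl", "Kevan", "Kevin", "Kevyn", "Kim", "Mikev"].sort();

// A left-anchored pattern such as /^Kev/ is equivalent to the key range
// ["Kev", "Kew"): bump the last character of the prefix to get the upper bound.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  return [prefix, prefix.slice(0, -1) + String.fromCharCode(last + 1)];
}

const [low, high] = prefixRange("Kev");
const byRange = names.filter((name) => name >= low && name < high);
const byRegex = names.filter((name) => /^Kev/.test(name));

console.log(low, high, byRange);
```

An unanchored pattern such as /ev/ has no such range: a match like "Mikev" can appear anywhere in the key order, so every key has to be examined.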


But indexes aren’t always helpful

• Most negations: $not, $nin, $ne

• Some corner cases: $mod, $where

• Matching most regular expressions, e.g. /a/ or /foo/i


Advanced Options


Arrays: the powerful “multiKey” index

{ title : "Chicken Noodle Soup", ingredients : ["chicken", "noodles"] }

ingredients    page
chicken        42
...            ...
noodles        42
...            ...

> db.foo.ensureIndex( { ingredients : 1 } )
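The idea behind a multikey index is one entry per array element. The following sketch illustrates that with plain arrays; the second document, the `page` field, and `buildMultikeyEntries` are assumptions made for the example.

```javascript
// Hypothetical documents; "page" stands in for the document's location.
const docs = [
  { page: 42, title: "Chicken Noodle Soup", ingredients: ["chicken", "noodles"] },
  { page: 57, title: "Chicken Salad", ingredients: ["chicken", "lettuce"] },
];

// A multikey index stores one entry per array element.
function buildMultikeyEntries(documents, field) {
  const entries = [];
  for (const doc of documents) {
    for (const value of doc[field]) entries.push([value, doc.page]);
  }
  // Keep entries ordered by key, then by page.
  return entries.sort((a, b) =>
    a[0] === b[0] ? a[1] - b[1] : a[0] < b[0] ? -1 : 1
  );
}

const multikey = buildMultikeyEntries(docs, "ingredients");
console.log(multikey);
```

Two documents produce four entries here, which is worth keeping in mind when indexing large arrays.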


Unique Indexes

• db.foo.ensureIndex( { email : 1 } , {unique : true} )

> db.foo.insert({email : "[email protected]"})
> db.foo.insert({email : "[email protected]"})
E11000 duplicate key error ...
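The behavior of a unique index can be mimicked in memory. This is a minimal stand-in, assuming a hypothetical `UniqueIndexedCollection` class and a placeholder address; it only borrows the E11000 code from the error shown above.

```javascript
// Minimal stand-in for a collection with a unique index on one field.
class UniqueIndexedCollection {
  constructor(field) {
    this.field = field;
    this.keys = new Set();
    this.docs = [];
  }

  insert(doc) {
    const key = doc[this.field];
    if (this.keys.has(key)) {
      // Mirrors the E11000 error code shown on the slide.
      throw new Error(`E11000 duplicate key error: ${this.field} = ${key}`);
    }
    this.keys.add(key);
    this.docs.push(doc);
  }
}

const users = new UniqueIndexedCollection("email");
users.insert({ email: "a@example.com" });

let rejected = false;
try {
  users.insert({ email: "a@example.com" }); // same key again
} catch (err) {
  rejected = true;
  console.log(err.message);
}
```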


Sparse Indexes

• db.foo.ensureIndex( { email : 1 } , {sparse : true} )

No index entries for docs without “email” field
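A small sketch of the same rule in plain JavaScript, with made-up documents and a hypothetical `buildSparseEntries` helper: documents lacking the field contribute nothing to the index.

```javascript
const docs = [
  { _id: 1, email: "a@example.com" },
  { _id: 2 }, // no "email" field
  { _id: 3, email: "b@example.com" },
];

// A sparse index only gets entries for documents that contain the field.
function buildSparseEntries(documents, field) {
  return documents
    .filter((doc) => Object.prototype.hasOwnProperty.call(doc, field))
    .map((doc) => [doc[field], doc._id])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

const sparseEntries = buildSparseEntries(docs, "email");
console.log(sparseEntries.length, "entries for", docs.length, "documents");
```

The flip side is that a query answered from such an index cannot see the documents that were left out.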


Geospatial Indexes

{ name: "10gen Office", lat_long: [ 52.5184, 13.387 ] }

> db.locations.ensureIndex( { lat_long : "2d" } )

> db.locations.find( { lat_long: {$near: [52.53, 13.4] } } )


Troubleshooting


The Query Optimizer

• For each “type” of query, MongoDB periodically tries all useful indexes.

• Aborts as soon as one plan wins.

• Winning plan is temporarily cached.


Which plan wins? Explain!

> db.foo.find( { t: { $lt : 40 } } ).explain( )
{
    "cursor" : "BtreeCursor t_1",
    "n" : 42,
    "nscannedObjects" : 42,
    "nscanned" : 42,
    ...
    "millis" : 0,
    ...
}

Pay attention to the ratio n/nscanned!
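The ratio is easy to compute from the explain output. The helper below is a hypothetical convenience function, not part of the shell; it assumes only the `n` and `nscanned` fields shown above.

```javascript
// Abbreviated explain() output, using the field names shown above.
const explainOutput = {
  cursor: "BtreeCursor t_1",
  n: 42,
  nscannedObjects: 42,
  nscanned: 42,
  millis: 0,
};

// Ratio of returned documents to scanned index entries; 1 is the ideal.
function scanEfficiency(explain) {
  if (explain.nscanned === 0) return 1; // nothing scanned, nothing wasted
  return explain.n / explain.nscanned;
}

console.log(scanEfficiency(explainOutput));         // 1
console.log(scanEfficiency({ n: 1, nscanned: 4 })); // 0.25
```

A ratio well below 1 means the query is visiting many entries it then discards, which usually points to a missing or poorly ordered index.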


Think you know better? Give us a hint

> db.foo.find( { t: { $lt : 40 } } ).hint( { _id : 1} )


Recording slow queries

> db.setProfilingLevel( n , slowms )   // slowms defaults to 100 ms

n=0 profiler off
n=1 record queries longer than slowms
n=2 record all queries

> db.system.profile.find()


Operational Tips


Background index builds

db.foo.ensureIndex( { user : 1 } , { background : true } )

Caveats:

• still resource-intensive

• will build in foreground on secondaries


Minimizing impact on Replica Sets

for (s in secondaries)
    s.restartAsStandalone()
    s.buildIndex()
    s.restartAsReplSetMember()
    s.waitForCatchup()

p.stepDown()
p.restartAsStandalone()
p.buildIndex()
p.restartAsReplSetMember()
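The ordering in the pseudocode can be written out as a checklist generator. This sketch only produces the sequence of steps as text; it does not talk to a replica set, and the member names and `rollingIndexBuildPlan` are hypothetical.

```javascript
// Produce the ordered list of steps for a rolling index build.
// Step labels follow the pseudocode above; member names are placeholders.
function rollingIndexBuildPlan(primary, secondaries) {
  const steps = [];
  for (const s of secondaries) {
    steps.push(`${s}: restart as standalone`);
    steps.push(`${s}: build index`);
    steps.push(`${s}: restart as replica set member`);
    steps.push(`${s}: wait for catch-up`);
  }
  // The primary goes last, after stepping down so a secondary takes over.
  steps.push(`${primary}: step down`);
  steps.push(`${primary}: restart as standalone`);
  steps.push(`${primary}: build index`);
  steps.push(`${primary}: restart as replica set member`);
  return steps;
}

const plan = rollingIndexBuildPlan("node-a", ["node-b", "node-c"]);
plan.forEach((step, i) => console.log(`${i + 1}. ${step}`));
```

Handling one member at a time keeps the rest of the set serving traffic while each index build runs.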


Absent or suboptimal indexes are the most common avoidable MongoDB performance problem...

...so take some time and get your indexes right!


Thanks!
