MongoDB cookbook: Indexes

Indexes are crucial for the efficient execution of queries and aggregations in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e., scan every document in a collection.

If a write operation modifies an indexed field, MongoDB updates all indexes that have the modified field as a key. So, be careful while choosing indexes.

Types Of Indexes

ref:
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/applications/indexes/

Single Field Index

For a single field index and sort operations, the sort order (i.e. ascending or descending) of the index key doesn't matter. With index intersetion, single field indexs could be powerful.

ref:
https://docs.mongodb.com/manual/core/index-single/

Compound Index

The order of the fields listed in a compound index is very important.

ref:
https://docs.mongodb.com/manual/core/index-compound/
https://docs.mongodb.com/manual/tutorial/create-indexes-to-support-queries/

Index Intersection

MongoDB can use multiple single field indexes to fulfill queries.

db.orders.createIndex({item: 1})
db.orders.createIndex({qty: 1})

db.orders.find({item: 'abc123', qty: {$gt: 15}})

ref:
https://docs.mongodb.com/manual/core/index-intersection/

Covered Queries

ref:
https://docs.mongodb.com/manual/core/query-optimization/#read-operations-covered-query

List Indexes

db.message.getIndexes()

// show collection statistics
db.message.stats()
db.message.stats().indexSizes

ref:
https://docs.mongodb.com/manual/tutorial/manage-indexes/

Add Indexes

An index which contains array fields might consume a lot of disk space.

db.message.createIndex({
    '_cls': 1,
    'sender': 1,
    'posted_at': 1
}, {'background': true, 'sparse': true})

db.message.createIndex({
    '_cls': 1,
    'includes': 1,
    'posted_at': 1
}, {'background': true, 'sparse': true})

db.getCollection('message').find({
    '$or': [
        // sent by cp
        {
            '_cls': 'Message.ChatMessage',
            'sender': ObjectId('582ee32a5b9c861c87dc297e'),
            'posted_at': {
                '$gte': ISODate('2018-01-08T00:00:00.000Z'),
                '$lt': ISODate('2018-01-14T00:00:00.000Z')
            }
        },
        // sent by payer
        {
            '_cls': 'Message.GiftMessage',
            'includes': ObjectId('582ee32a5b9c861c87dc297e'),
            'posted_at': {
                '$gte': ISODate('2018-01-08T00:00:00.000Z'),
                '$lt': ISODate('2018-01-14T00:00:00.000Z')
            }
        }
    ]
})

You can't index two arrays together, in this example: includes and unlocks.

// it doesn't work
db.message.createIndex({
    '_cls': 1,
    'sender': 1,
    'includes': 1,
    'unlocks': 1
}, {'background': true, 'sparse': true})

The Order Of Fields of Compound Indexes

The order of fields in an index matters, you must consider Index Cardinality and Selectivity. Instead, the order of fields in a find() query or $match in an aggregation doesn't affect whether it can use an index or not.

The order of fields in a compound index should be:

  • First, fields on which you will query for exact values.
  • Second, fields on which you will sort.
  • Finally, fields on which you will query for a range of values.

ref:
https://blog.mlab.com/2012/06/cardinal-ins/
https://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
https://stackoverflow.com/questions/5245737/mongodb-indexes-order-and-query-order-must-match
https://stackoverflow.com/questions/33545339/how-does-the-order-of-compound-indexes-matter-in-mongodb-performance-wise

Partial Indexes v.s. Sparse Indexes

Partial indexes should be preferred over sparse indexes. However, partial indexes only support a very small set of filter operators:

  • $exists
  • $eq or field: value
  • $gt, $gte, $lt, $lte
  • $type
  • $and

If you use 'partialFilterExpression': {'includes': {'$exists': true}}, MongoDB also indexes documents whose includes field has null value.

db.collection('message').createIndex(
    {'_cls': 1, 'includes': 1, 'posted_at': 1},
    {'background': true, 'partialFilterExpression': {'includes': {'$exists': true}}}
)

db.collection('message').createIndex(
  {'created_at': -1},
  {'background': true, 'partialFilterExpression': {'created_at': {'$gte': new Date("2018-01-01T16:00:00Z")}}}
)

ref:
https://docs.mongodb.com/manual/core/index-partial/
https://docs.mongodb.com/manual/core/index-sparse/

Create An Index On An Array Field

Querying will certainly be a lot easier in an array field index than a object field.

ref:
https://stackoverflow.com/questions/9589856/mongo-indexing-on-object-arrays-vs-objects

Create An Unique Index On An Array Field

Create an unique index on an array field.

The unique constraint applies to separate documents in the collection. That is, the unique index prevents separate documents from having the same value for the indexed key. It prevents different documents have the same transaction ID but allows one document has multiple identical transaction IDs.

db.getCollection('test1').createIndex({purchases.transaction_id: 1}, {unique: true})

db.getCollection('test1').insert({ _id: 1, purchases: [
    {transaction_id: 'A'}
]})

db.getCollection('test1').insert({ _id: 5, purchases: [
    {transaction_id: 'A'}
]})

db.getCollection('test1').update({ _id: 1}, {$push: {purchases: {transaction_id: 'A'}}})

To prevent one document has multiple identical transaction IDs, We would have atomic updates on single documents.

user = User(id=bson.ObjectId(user_id))
purchase = DirectPurchase(
    user=user,
    timestamp=timestamp,
    transaction_id=transaction_id,
)
MessagePackProduct.objects \
    .filter(id=message_pack_id, __raw__={
        'purchases': {'$not': {'$elemMatch': {
            '_cls': purchase._cls,
            'user': purchase.user.id,
        }}},
    }) \
    .update_one(push__purchases=purchase)

ref:
https://docs.mongodb.com/manual/core/index-unique/#unique-constraint-across-separate-documents

Sort With Indexes

ref:
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/

Drop Indexes

db.message.dropIndex({
    'includes': 1
})

db.message.dropIndex({
    '_cls': 1,
    'posted_at': 1,
    'includes': 1
})

Remove Unused Indexes

You can use db.getCollection('message').aggregate({$indexStats: {}}) to find unused indexes, there is a accesses.ops which means the number of operations that have used the index. Also, you should remove indexes which have the same prefix.

db.getCollection('message').aggregate(
    {
        '$indexStats': {}
    },
    {
        '$match': {
            'accesses.ops': {'$gt': 0}
        }
    }
)

Result:

{
    "name" : "_cls_1_sender_1_posted_at_1",
    "key" : {
        "_cls" : 1,
        "sender" : 1,
        "posted_at" : 1
    },
    "host" : "a6ea11893605:27017",
    "accesses" : {
        "ops" : 3,
        "since" : "2018-01-26T07:04:51.137Z"
    }
}

這部漫畫真的不得了,有點像是重口味的 Zootopia,但是無論是劇情深度或是畫面的衝擊性都太 hardcore 了,我一個老實人是有點頂不住。

同場推薦 Designs 和約定的夢幻島,這陣子看得漫畫都太獵奇了。

ref:
https://blog.mlab.com/2017/01/using-mongodb-indexstats-to-identify-and-remove-unused-indexes/
https://scalegrid.io/blog/how-to-find-unused-indexes-in-mongodb/

Profiling

// enable
db.setProfilingLevel(2)

// disable
db.setProfilingLevel(0)

// see profiling data after you issues some queries
db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()

// delete profiling data
db.system.profile.drop()

Query Explain

There are both collection.find().explain() and collection.explain().find(). It's recommended to use collection.find().explain('executionStats') for getting more information, like total documents examined.

db.getCollection('message').find({
    '$or': [
        // sent by cp
        {
            '_cls': 'Message.ChatMessage',
            'sender': ObjectId('582ee32a5b9c861c87dc297e'),
            'posted_at': {
                '$gte': ISODate('2018-01-08T00:00:00.000Z'),
                '$lt': ISODate('2018-01-14T00:00:00.000Z')
            }
        },
        {
            '_cls': 'Message',
            'sender': ObjectId('582ee32a5b9c861c87dc297e'),
            'posted_at': {
                '$gte': ISODate('2018-01-08T00:00:00.000Z'),
                '$lt': ISODate('2018-01-14T00:00:00.000Z')
            }
        },
        // sent by payer
        {
            '_cls': 'Message.ChatMessage',
            'includes': ObjectId('582ee32a5b9c861c87dc297e'),
            'posted_at': {
                '$gte': ISODate('2018-01-08T00:00:00.000Z'),
                '$lt': ISODate('2018-01-14T00:00:00.000Z')
            }
        },
        {
            '_cls': 'Message.ReplyMessage',
            'includes': ObjectId('582ee32a5b9c861c87dc297e'),
            'posted_at': {
                '$gte': ISODate('2018-01-08T00:00:00.000Z'),
                '$lt': ISODate('2018-01-14T00:00:00.000Z')
            }
        },
        {
            '_cls': 'Message.GiftMessage',
            'includes': ObjectId('582ee32a5b9c861c87dc297e'),
            'posted_at': {
                '$gte': ISODate('2018-01-08T00:00:00.000Z'),
                '$lt': ISODate('2018-01-14T00:00:00.000Z')
            }
        }
    ]
})
// .explain()
// .explain('allPlansExecution')
.explain('executionStats')

ref:
https://docs.mongodb.com/manual/reference/method/cursor.explain/
https://docs.mongodb.com/manual/reference/method/db.collection.explain/#db.collection.explain

You could also explain a .update() query. However, .updateMany() and .updateOne() don't support .explain().

db.getCollection('user').explain().update(
    {'follows.user': ObjectId("57985b784af4124063f090d3")},
    {'$set': {'created_at': ISODate('2018-01-01 00:00:00.000Z')}},
    {'multi': true}
)

Some important fields to look at in the result of explain():

  • executionStats.totalKeysExamined
  • executionStats.totalDocsExamined
  • queryPlanner.winningPlan.stage
  • queryPlanner.winningPlan.inputStage.stage
  • queryPlanner.winningPlan.inputStage.indexName
  • queryPlanner.winningPlan.inputStage.direction

Possible values of stage:

  • COLLSCAN: scanning the entire collection
  • IXSCAN: scanning index keys
  • FETCH: retrieving documents
  • SHARD_MERGE: merging results from shards

ref:
https://docs.mongodb.com/manual/reference/explain-results/

Aggregation Explain

db.getCollection('message').explain().aggregate()

ref:
https://stackoverflow.com/questions/12702080/mongodb-explain-for-aggregation-framework
https://docs.mongodb.com/manual/reference/method/db.collection.explain/

If $project, $unwind, or $group occur prior to the $sort operation, $sort cannot use any indexes. Additionally, $sort can only use fields defined in previous $project stage.

Basically, you could just consider the $match part when you want to create new indexes.

ref:
https://docs.mongodb.com/manual/reference/operator/aggregation/sort/#sort-operator-and-performance

MongoEngine

_cls creation on indexes is automatically included if allow_inheritance is on. If you want to disable, set kwarg cls: False.

ref:
http://docs.mongoengine.org/guide/defining-documents.html#indexes