Indexes are crucial for the efficient execution of queries and aggregations in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e., scan every document in a collection.
If a write operation modifies an indexed field, MongoDB updates all indexes that have the modified field as a key. So, be careful while choosing indexes.
Types Of Indexes
ref:
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/applications/indexes/
Single Field Index
For a single field index and sort operations, the sort order (i.e. ascending or descending) of the index key doesn't matter. With index intersetion, single field indexs could be powerful.
ref:
https://docs.mongodb.com/manual/core/index-single/
Compound Index
The order of the fields listed in a compound index is very important.
ref:
https://docs.mongodb.com/manual/core/index-compound/
https://docs.mongodb.com/manual/tutorial/create-indexes-to-support-queries/
TTL Index
When the TTL thread is active, a background thread in mongod reads the values in the index and removes expired documents from the collection. You will see delete operations in the output of db.currentOp()
.
TTL indexes are a single-field indexes. Compound indexes do not support TTL and ignore the expireAfterSeconds
option.
import datetime
class JournalEntry(db.Document):
users = db.ListField(db.ReferenceField('User'))
event = db.StringField()
context = db.DynamicField()
timestamp = db.DateTimeField(default=datetime.datetime.utcnow)
meta = {
'index_background': True,
'indexes': [
{
'fields': ['timestamp'],
'cls': False,
'expireAfterSeconds': int(datetime.timedelta(days=90).total_seconds()),
},
],
}
ref:
https://docs.mongodb.com/manual/core/index-ttl/
Index Intersection
MongoDB can use multiple single field indexes to fulfill queries.
db.orders.createIndex({tags: 1});
db.orders.createIndex({key: { created_at: -1 }, background: true});
db.orders.find({item: 'abc123', qty: {$gt: 15}});
ref:
https://docs.mongodb.com/manual/core/index-intersection/
Covered Queries
ref:
https://docs.mongodb.com/manual/core/query-optimization/#read-operations-covered-query
Index Limits
The size of an index entry for an indexed field must be less than 1024 bytes. For instance, an arbitrary URL field can easily exceed 1024 bytes.
MongoDB will not insert into an indexed collection any document with an indexed field whose corresponding index entry would exceed the index key limit, and instead, will return an error; Updates to the indexed field will error if the updated value causes the index entry to exceed the indexkey limit.
ref:
https://docs.mongodb.com/manual/reference/limits/#indexes
List Indexes
db.message.getIndexes()
// show collection statistics
db.message.stats()
db.message.stats().indexSizes
// scale defaults to 1 to return size data in bytes
// 1024 * 1024 means MB
db.getCollection('message').stats({'scale': 1024 * 1024}).indexSizes
ref:
https://docs.mongodb.com/manual/tutorial/manage-indexes/
Add Indexes
TODO:
It seems like creating indexes on empty collection, even with background will cause DB latency.
An index which contains array fields might consume a lot of disk space.
db.message.createIndex({
'_cls': 1,
'sender': 1,
'posted_at': 1
}, {'background': true, 'sparse': true})
db.message.createIndex({
'_cls': 1,
'includes': 1,
'posted_at': 1
}, {'background': true, 'sparse': true})
db.getCollection('message').find({
'$or': [
// sent by cp
{
'_cls': 'Message.ChatMessage',
'sender': ObjectId('582ee32a5b9c861c87dc297e'),
'posted_at': {
'$gte': ISODate('2018-01-08T00:00:00.000Z'),
'$lt': ISODate('2018-01-14T00:00:00.000Z')
}
},
// sent by payer
{
'_cls': 'Message.GiftMessage',
'includes': ObjectId('582ee32a5b9c861c87dc297e'),
'posted_at': {
'$gte': ISODate('2018-01-08T00:00:00.000Z'),
'$lt': ISODate('2018-01-14T00:00:00.000Z')
}
}
]
})
import pymongo
from your_app.models import YourModel
YourModel._get_collection().create_index(
[
('users', pymongo.ASCENDING),
('timestamp', pymongo.DESCENDING),
],
background=True,
partialFilterExpression={'timestamp': {'$exists': True}},
)
You can't index two arrays together, in this example: includes
and unlocks
.
// it doesn't work
db.message.createIndex({
'_cls': 1,
'sender': 1,
'includes': 1,
'unlocks': 1
}, {'background': true, 'sparse': true})
The Order Of Fields of Compound Indexes
The order of fields in an index matters, you must consider Index Cardinality and Selectivity. Instead, the order of fields in a find()
query or $match
in an aggregation doesn't affect whether it can use an index or not.
The order of fields in a compound index should be:
- First, fields on which you will query for exact values.
- Second, fields on which you will sort.
- Finally, fields on which you will query for a range of values.
ref:
https://docs.mongodb.com/manual/core/index-compound/#prefixes
https://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
https://blog.mlab.com/2012/06/cardinal-ins/
https://stackoverflow.com/questions/33545339/how-does-the-order-of-compound-indexes-matter-in-mongodb-performance-wise
https://stackoverflow.com/questions/5245737/mongodb-indexes-order-and-query-order-must-match
Partial Indexes v.s. Sparse Indexes
Partial indexes should be preferred over sparse indexes. However, partial indexes only support a very small set of filter operators:
$exists
$eq
orfield: value
$gt
,$gte
,$lt
,$lte
$type
$and
If you use 'partialFilterExpression': {'includes': {'$exists': true}}
, MongoDB also indexes documents whose includes
field has null
value.
db.collection('message').createIndex(
{'_cls': 1, 'includes': 1, 'posted_at': 1},
{'background': true, 'partialFilterExpression': {'includes': {'$exists': true}}}
)
db.collection('message').createIndex(
{'created_at': -1},
{'background': true, 'partialFilterExpression': {'created_at': {'$gte': new Date("2018-01-01T16:00:00Z")}}}
)
ref:
https://docs.mongodb.com/manual/core/index-partial/
https://docs.mongodb.com/manual/core/index-sparse/
Create An Index On An Array Field
Querying will certainly be a lot easier in an array field index than a object field.
ref:
https://stackoverflow.com/questions/9589856/mongo-indexing-on-object-arrays-vs-objects
Create An Unique Index On An Array Field
Create an unique index on an array field.
The unique constraint applies to separate documents in the collection. That is, the unique index prevents separate documents from having the same value for the indexed key. It prevents different documents have the same transaction ID but allows one document has multiple identical transaction IDs.
db.getCollection('test1').createIndex({purchases.transaction_id: 1}, {unique: true})
db.getCollection('test1').insert({ _id: 1, purchases: [
{transaction_id: 'A'}
]})
db.getCollection('test1').insert({ _id: 5, purchases: [
{transaction_id: 'A'}
]})
db.getCollection('test1').update({ _id: 1}, {$push: {purchases: {transaction_id: 'A'}}})
To prevent one document has multiple identical transaction IDs, We would have atomic updates on single documents.
user = User(id=bson.ObjectId(user_id))
purchase = DirectPurchase(
user=user,
timestamp=timestamp,
transaction_id=transaction_id,
)
MessagePackProduct.objects \
.filter(id=message_pack_id, __raw__={
'purchases': {'$not': {'$elemMatch': {
'_cls': purchase._cls,
'user': purchase.user.id,
}}},
}) \
.update_one(push__purchases=purchase)
ref:
https://docs.mongodb.com/manual/core/index-unique/#unique-constraint-across-separate-documents
Sort With Indexes
ref:
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/
Drop Indexes
db.message.dropIndex({
'includes': 1
})
db.message.dropIndex({
'_cls': 1,
'posted_at': 1,
'includes': 1
})
Remove Unused Indexes
You can use db.getCollection('COLLECTION_NAME').aggregate({$indexStats: {}})
to find unused indexes, there is a accesses.ops
field which indicates the number of operations that have used the index. Also, you might want to remove indexes which have the same prefix.
db.getCollection('message').aggregate(
{
'$indexStats': {}
},
{
'$match': {
'accesses.ops': {'$gt': 0}
}
}
);
Result:
{
"name" : "_cls_1_sender_1_posted_at_1",
"key" : {
"_cls" : 1,
"sender" : 1,
"posted_at" : 1
},
"host" : "a6ea11893605:27017",
"accesses" : {
"ops" : 3,
"since" : "2018-01-26T07:04:51.137Z"
}
}
ref:
https://blog.mlab.com/2017/01/using-mongodb-indexstats-to-identify-and-remove-unused-indexes/
https://scalegrid.io/blog/how-to-find-unused-indexes-in-mongodb/
Profiling
// enable
db.setProfilingLevel(2)
// disable
db.setProfilingLevel(0)
// see profiling data after you issues some queries
db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()
// delete profiling data
db.system.profile.drop()
Query Explain
There are both collection.find().explain()
and collection.explain().find()
. It's recommended to use collection.find().explain('executionStats')
for getting more information, like total documents examined.
db.getCollection('message').find({
'$or': [
// sent by cp
{
'_cls': 'Message.ChatMessage',
'sender': ObjectId('582ee32a5b9c861c87dc297e'),
'posted_at': {
'$gte': ISODate('2018-01-08T00:00:00.000Z'),
'$lt': ISODate('2018-01-14T00:00:00.000Z')
}
},
{
'_cls': 'Message',
'sender': ObjectId('582ee32a5b9c861c87dc297e'),
'posted_at': {
'$gte': ISODate('2018-01-08T00:00:00.000Z'),
'$lt': ISODate('2018-01-14T00:00:00.000Z')
}
},
// sent by payer
{
'_cls': 'Message.ChatMessage',
'includes': ObjectId('582ee32a5b9c861c87dc297e'),
'posted_at': {
'$gte': ISODate('2018-01-08T00:00:00.000Z'),
'$lt': ISODate('2018-01-14T00:00:00.000Z')
}
},
{
'_cls': 'Message.ReplyMessage',
'includes': ObjectId('582ee32a5b9c861c87dc297e'),
'posted_at': {
'$gte': ISODate('2018-01-08T00:00:00.000Z'),
'$lt': ISODate('2018-01-14T00:00:00.000Z')
}
},
{
'_cls': 'Message.GiftMessage',
'includes': ObjectId('582ee32a5b9c861c87dc297e'),
'posted_at': {
'$gte': ISODate('2018-01-08T00:00:00.000Z'),
'$lt': ISODate('2018-01-14T00:00:00.000Z')
}
}
]
})
// .explain()
// .explain('allPlansExecution')
.explain('executionStats')
ref:
https://docs.mongodb.com/manual/reference/method/cursor.explain/
https://docs.mongodb.com/manual/reference/method/db.collection.explain/#db.collection.explain
You could also explain a .update()
query. However, .updateMany()
and .updateOne()
don't support .explain()
.
db.getCollection('user').explain().update(
{'follows.user': ObjectId("57985b784af4124063f090d3")},
{'$set': {'created_at': ISODate('2018-01-01 00:00:00.000Z')}},
{'multi': true}
)
Some important fields to look at in the result of explain()
:
executionStats.totalKeysExamined
executionStats.totalDocsExamined
queryPlanner.winningPlan.stage
queryPlanner.winningPlan.inputStage.stage
queryPlanner.winningPlan.inputStage.indexName
queryPlanner.winningPlan.inputStage.direction
Possible values of stage
:
COLLSCAN
: scanning the entire collectionIXSCAN
: scanning index keysFETCH
: retrieving documentsSHARD_MERGE
: merging results from shards
ref:
https://docs.mongodb.com/manual/reference/explain-results/
Aggregation Explain
db.getCollection('message').explain().aggregate()
ref:
https://stackoverflow.com/questions/12702080/mongodb-explain-for-aggregation-framework
https://docs.mongodb.com/manual/reference/method/db.collection.explain/
If $project
, $unwind
, or $group
occur prior to the $sort
operation, $sort
cannot use any indexes. Additionally, $sort
can only use fields defined in previous $project
stage.
Basically, you could just consider the $match
part when you want to create new indexes.
ref:
https://docs.mongodb.com/manual/reference/operator/aggregation/sort/#sort-operator-and-performance
MongoEngine
_cls
creation on indexes is automatically included if allow_inheritance
is on. If you want to disable, set kwarg cls: False
.
ref:
http://docs.mongoengine.org/guide/defining-documents.html#indexes