Change index mappings with zero downtime using elasticsearch-py

You can't change existing field mappings (the so-called "schema") in Elasticsearch in place. You can freely add new fields, but changing the definition of an existing field (its type or analyzer) is not possible. To change it, you need to create a new index with the new mappings and move the documents over.

Steps:

  • Create an alias my_index which points to the old index my_index_v1
  • Use my_index instead of my_index_v1 in your application
  • Create a new index my_index_v2 with new mappings
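  • Transfer documents from the old index to the new index (a.k.a. reindex)
  • Associate the alias my_index with index my_index_v2
  • Delete the old index my_index_v1

The following script creates the new index, reindexes into it and switches the alias with elasticsearch-py:
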
from datetime import datetime

from elasticsearch import Elasticsearch, NotFoundError
from elasticsearch.helpers import reindex

# `settings` is your application's configuration object; ES_HOSTS is its list of hosts
es_client = Elasticsearch(hosts=settings.ES_HOSTS)

# make sure that this alias doesn't conflict with any existing index name
alias = 'packer'

# CAUTION: if you have an index already, you should create an alias for it first
# es_client.indices.put_alias(index='your_current_index', name=alias)

try:
    old_indexes = list(es_client.indices.get_alias(name=alias).keys())
except NotFoundError:
    # the alias doesn't exist yet (HTTP 404), so there is nothing to migrate
    old_indexes = []

if len(old_indexes) > 1:
    raise RuntimeError(
        'Alias `{0}` points to {1} indexes that may cause error when writing data to `{0}`'.format(
            alias, len(old_indexes)))

# at this point the alias points to one index or to none
old_index = old_indexes[0] if old_indexes else None

new_index = '{}_{}'.format(alias, datetime.now().strftime('%Y%m%d%H%M%S%f'))

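# TrackDoc and AlbumDoc are assumed to be document classes defined in your
# application (e.g. elasticsearch-dsl DocType subclasses)
available_types = [TrackDoc, AlbumDoc]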
for my_doc_type in available_types:
    # create a new index with new mappings
    my_doc_type.init(index=new_index)

if old_index:
    # transfer documents from old index to new index
    reindex(es_client, source_index=old_index, target_index=new_index)

    es_client.indices.update_aliases({
        'actions': [
            {'remove': {'index': old_index, 'alias': alias}},
            {'add': {'index': new_index, 'alias': alias}},
        ],
    })
else:
    es_client.indices.update_aliases({
        'actions': [
            {'add': {'index': new_index, 'alias': alias}},
        ],
    })
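
The script above stops after the alias switch; the last step of the list (deleting the old index) is not covered. Below is a minimal sketch of that step, assuming the same elasticsearch-py client as above and that nothing else still reads from the old index by its concrete name. The helper names `build_swap_actions` and `drop_old_index` are made up for illustration and are not part of elasticsearch-py.

```python
def build_swap_actions(alias, old_index, new_index):
    """Build the request body for es_client.indices.update_aliases().

    The remove and add actions travel in one request, so the alias is
    switched from the old index to the new one in a single step.
    """
    actions = []
    if old_index:
        actions.append({'remove': {'index': old_index, 'alias': alias}})
    actions.append({'add': {'index': new_index, 'alias': alias}})
    return {'actions': actions}


def drop_old_index(es_client, alias, old_index):
    """Delete old_index, but only once the alias no longer points at it."""
    if es_client.indices.exists_alias(index=old_index, name=alias):
        raise RuntimeError(
            'Alias `{}` still points to `{}`, refusing to delete it'.format(alias, old_index))
    es_client.indices.delete(index=old_index)
```

Call `drop_old_index(es_client, alias, old_index)` only after `update_aliases` has succeeded. Keeping the old index around for a while is a cheap way to roll back: point the alias at it again.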

ref:
https://www.elastic.co/blog/changing-mapping-with-zero-downtime
https://blog.codecentric.de/en/2014/09/elasticsearch-zero-downtime-reindexing-problems-solutions/
http://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.reindex

An alias can point to multiple indexes. In that case, reading (searching) through the alias works fine, but writing (indexing) to the alias raises an exception:

  Alias [my_index] has more than one indices associated with it [[my_index_v1, my_index_v2]], can't execute a single index op

Pointing the same alias at multiple indexes is therefore not recommended, unless you write to one specific index by name instead of writing through the alias.
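
If an alias does span several indexes, the write target has to be chosen explicitly. Here is a minimal sketch of one way to do that, working on the dict returned by `es_client.indices.get_alias(name=alias)`. The function name `pick_write_index` is hypothetical, and the sketch assumes index names end in a fixed-width timestamp (as in the `new_index` naming used in the script above), so the newest index sorts last.

```python
def pick_write_index(alias_response):
    """Return the one concrete index that writes should go to.

    alias_response maps index names to their aliases, e.g.
    {'my_index_v1': {'aliases': {'my_index': {}}}}.
    """
    indexes = sorted(alias_response)
    if not indexes:
        raise LookupError('the alias does not point to any index')
    # Assumption: names carry a fixed-width timestamp suffix, so plain
    # string sorting puts the most recently created index last.
    return indexes[-1]
```

Searches can keep using the alias, while indexing calls pass `index=pick_write_index(...)` so they never hit the "more than one indices" error.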

ref:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multiple-indices.html

Create aliases

# list all indexes and their aliases
$ curl 'http://127.0.0.1:9200/_aliases'

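# create an alias (Elasticsearch 6.0+ requires the Content-Type header; older versions ignore it)
$ curl -XPOST 'http://127.0.0.1:9200/_aliases' -H 'Content-Type: application/json' -d '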
{
    "actions" : [
        { "add" : { "index" : "dps", "alias" : "packer" } }
    ]
}
'

# CAUTION: this deletes ALL indexes, and their aliases with them
$ curl -XDELETE 'http://127.0.0.1:9200/*/'
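
The first two curl commands can also be issued from Python. This is a rough sketch that assumes an elasticsearch-py client from the 1.x to 7.x range, where `update_aliases` accepts a `body` argument. The wrapper names `list_aliases` and `add_alias` are made up for illustration.

```python
def list_aliases(es_client):
    """Map every index name to the sorted list of its alias names."""
    response = es_client.indices.get_alias()
    return {
        index: sorted(info.get('aliases', {}))
        for index, info in response.items()
    }


def add_alias(es_client, index, alias):
    """Point `alias` at `index`, like the POST /_aliases call above."""
    es_client.indices.update_aliases(body={
        'actions': [
            {'add': {'index': index, 'alias': alias}},
        ],
    })
```

For example, `add_alias(es_client, 'dps', 'packer')` mirrors the curl request, and `list_aliases(es_client)` would then report `packer` among the aliases of `dps`.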

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-aliases.html

Update index settings

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-put-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-update-settings.html
https://gist.github.com/nicolashery/6317643