
Change index mappings with zero downtime using elasticsearch-py

Basically, you can't change mappings (the so-called "schema") in Elasticsearch. You can freely add new fields, but you can't change the definition of an existing field (its type or analyzer). One way or another, you need to create a new index. To do that with zero downtime, put an index alias in front of it:


  • Create an alias my_index which points to the old index my_index_v1
  • Use my_index instead of my_index_v1 in your application
  • Create a new index my_index_v2 with new mappings
  • Transfer documents from the old index to the new index, a.k.a. reindexing
  • Associate the alias my_index with index my_index_v2
  • Delete the old index my_index_v1
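from datetime import datetime

from django.conf import settings
from elasticsearch import Elasticsearch, NotFoundError
from elasticsearch.helpers import reindex

# TrackDoc and AlbumDoc are your elasticsearch-dsl DocType classes (AlbumDoc is shown below)

es_client = Elasticsearch(hosts=settings.ES_HOSTS)

# make sure that this alias doesn't conflict with any existing index name
alias = 'packer'

# CAUTION: if you have an index already, you should create an alias for it first
# es_client.indices.put_alias(index='your_current_index', name=alias)

# find the index (or indexes) the alias currently points to
try:
    old_indexes = list(es_client.indices.get_alias(name=alias).keys())
except NotFoundError:
    old_indexes = []

if len(old_indexes) > 1:
    raise RuntimeError('Alias `{0}` points to {1} indexes that may cause error when writing data to `{0}`'.format(alias, len(old_indexes)))
old_index = old_indexes[0] if old_indexes else None

# timestamped index name, e.g. packer_20160101123045123456
new_index = '{}_{}'.format(alias, datetime.now().strftime('%Y%m%d%H%M%S%f'))

available_types = [TrackDoc, AlbumDoc]
for my_doc_type in available_types:
    # create a new index with new mappings
    my_doc_type.init(index=new_index)

if old_index:
    # transfer documents from old index to new index
    reindex(es_client, source_index=old_index, target_index=new_index)
    # move the alias in one atomic request, then drop the old index
    es_client.indices.update_aliases(body={
        'actions': [
            {'remove': {'index': old_index, 'alias': alias}},
            {'add': {'index': new_index, 'alias': alias}},
        ]
    })
    es_client.indices.delete(index=old_index)
else:
    # first run: there is no old index, just point the alias at the new one
    es_client.indices.update_aliases(body={
        'actions': [
            {'add': {'index': new_index, 'alias': alias}},
        ]
    })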


An alias can point to multiple indexes. In that case, reading (searching) from the alias works perfectly, but writing (indexing) to the alias raises an exception: Alias [my_index] has more than one indices associated with it [[my_index_v1, my_index_v2]], can't execute a single index op.

So it's not recommended to point the same alias at multiple indexes unless you always write to a specific index explicitly.
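You can check this rule before writing. Here is a minimal sketch. It assumes the response shape of `GET /_alias` (what elasticsearch-py's `es_client.indices.get_alias()` returns), i.e. `{index_name: {'aliases': {alias_name: {...}}}}`. `resolve_write_index` is a hypothetical helper, not part of any library:

```python
def resolve_write_index(alias_response, alias):
    """Return the single index behind `alias`, or raise if writes would be ambiguous.

    `alias_response` is assumed to look like the body of GET /_alias:
    {"index_name": {"aliases": {"alias_name": {}}}}
    """
    indexes = sorted(
        index for index, info in alias_response.items()
        if alias in info.get('aliases', {})
    )
    if not indexes:
        raise LookupError('alias {!r} does not exist'.format(alias))
    if len(indexes) > 1:
        # Elasticsearch would reject a single index op on this alias anyway
        raise RuntimeError('alias {!r} points to {} indexes: {}'.format(
            alias, len(indexes), ', '.join(indexes)))
    return indexes[0]
```

Call it right before indexing. Elasticsearch's runtime error then becomes an earlier, explicit failure in your own code.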


Create aliases

# list all indexes and their aliases
$ curl ''

# create an alias
$ curl -XPOST '' -d '
{
    "actions" : [
        { "add" : { "index" : "dps", "alias" : "packer" } }
    ]
}'

# delete all indexes and aliases
$ curl -XDELETE '*/'
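The "list all indexes and their aliases" response is keyed by index. To see where each alias points, and to spot an alias attached to more than one index, you can invert it. This sketch assumes the same `{index: {'aliases': {alias: {...}}}}` response shape; the helper name is hypothetical:

```python
def aliases_to_indexes(alias_response):
    """Invert {index: {"aliases": {alias: {...}}}} into {alias: [index, ...]}."""
    result = {}
    for index, info in alias_response.items():
        for alias in info.get('aliases', {}):
            result.setdefault(alias, []).append(index)
    # sort index names so the output is stable and easy to read
    return {alias: sorted(indexes) for alias, indexes in result.items()}
```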


Update index settings
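You can change settings such as refresh_interval and number_of_replicas on a live index with `PUT /<index>/_settings`. In elasticsearch-py this is `es_client.indices.put_settings(index=..., body=...)`. A common use is speeding up a bulk reindex: turn off periodic refresh and replicas while loading, then restore them. The helper name and default values in this minimal sketch are hypothetical:

```python
def bulk_load_settings(loading, replicas=1, refresh_interval='1s'):
    """Index settings body to send before (loading=True) and after a bulk load."""
    if loading:
        # no periodic refresh and no replica copies while documents pour in
        return {'index': {'refresh_interval': '-1', 'number_of_replicas': 0}}
    # restore normal behaviour once loading is done
    return {'index': {'refresh_interval': refresh_interval,
                      'number_of_replicas': replicas}}
```

For example, send `bulk_load_settings(True)` to the new index before `reindex(...)`, then send `bulk_load_settings(False)` afterwards.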


Use elasticsearch-dsl with Python

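Query DSL is Elasticsearch's domain-specific language for queries. You can think of it as Elasticsearch's SQL, except that it is really just a bunch of JSON. elasticsearch-dsl is the official Python package for working with Query DSL, and using it feels a bit like Django's ORM.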
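To make "it's just JSON" concrete: a full-text search on the title field is simply a JSON body POSTed to `/_search`. This minimal sketch uses only the standard library, and `match_title_body` is a hypothetical helper. elasticsearch-dsl builds dicts like this for you (`Search.to_dict()` returns the body it will send):

```python
import json


def match_title_body(text, size=10):
    """Raw Query DSL for a full-text match on the title field."""
    return {
        'query': {'match': {'title': text}},
        'size': size,
    }


# this string is what actually goes over the wire
payload = json.dumps(match_title_body('abc'), sort_keys=True)
```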



$ pip install elasticsearch-dsl


Indices and Types

in app/

from elasticsearch_dsl import DocType, String, Boolean
from elasticsearch_dsl.connections import connections
connections.create_connection(hosts=['', ])

class AlbumDoc(DocType):
    upc = String(index='not_analyzed')
    title = String(analyzer='ik', fields={'raw': String(index='not_analyzed')})
    artist = String(analyzer='ik')
    is_ready = Boolean()

    class Meta:
        index = 'dps'
        doc_type = 'album'

    @classmethod
    def sync(cls, album):
        album_doc = AlbumDoc(meta={'id':})
        album_doc.upc = album.get_upcs(output_str=False)
        album_doc.title =
        album_doc.artist =
        album_doc.is_ready = album.is_ready

    def save(self, *args, **kwargs):
        return super(AlbumDoc, self).save(*args, **kwargs)

    def get_model_obj(self):
        from svapps.dps.models import Album
        return Album.objects.get(

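# to create mappings
AlbumDoc.init()

You must run YourDocType.init() once so that Elasticsearch creates the mapping defined by your DocType. Otherwise, Elasticsearch creates the mapping on the fly the first time you index data, based on the data types it sees. Settings such as the analyzer then fall back to the default (standard). You can check the result with the _mapping API.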

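Fields that need full-text search should be analyzed (string fields are analyzed by default). Fields that don't need full-text search, i.e. fields that require exact matching such as usernames, email addresses or zip codes, can be set to not_analyzed.
The catch is that term queries don't behave as you might expect on analyzed fields. A term query is not analyzed, so it is compared against the tokens the analyzer produced rather than the original whole value. If you also need exact matching on such a field, add an extra raw sub-field for it (like title.raw above).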
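For the AlbumDoc fields above, the mapping that init() creates should look roughly like the dict below. This is a hand-written sketch in Elasticsearch 2.x syntax (string fields with index: not_analyzed; 5.x and later use text and keyword instead), not output captured from a real cluster. The hypothetical helper lists which string fields can take an exact term query:

```python
# Hand-written approximation of the mapping AlbumDoc.init() creates (ES 2.x syntax)
ALBUM_MAPPING = {
    'album': {
        'properties': {
            'upc': {'type': 'string', 'index': 'not_analyzed'},
            'title': {
                'type': 'string',
                'analyzer': 'ik',
                # multi-field: title is analyzed, title.raw keeps the whole value
                'fields': {'raw': {'type': 'string', 'index': 'not_analyzed'}},
            },
            'artist': {'type': 'string', 'analyzer': 'ik'},
            'is_ready': {'type': 'boolean'},
        }
    }
}


def exact_match_fields(properties):
    """Return string fields and sub-fields stored as exact (not_analyzed) values."""
    exact = []
    for name, spec in properties.items():
        if spec.get('type') == 'string' and spec.get('index') == 'not_analyzed':
            exact.append(name)
        for sub_name, sub_spec in spec.get('fields', {}).items():
            if sub_spec.get('index') == 'not_analyzed':
                exact.append('{}.{}'.format(name, sub_name))
    return sorted(exact)
```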


Store Data

album_doc = AlbumDoc(meta={'id': 42})
album_doc.upc = ['887375000619', '887375502069']
album_doc.title = 'abc'
album_doc.artist = 'xyz'
album_doc.is_ready = True
album_doc.save()

# you can query the field as usual, whether or not it holds a list
search = AlbumDoc.search().filter('term', upc='887375000619')
response = search.execute()

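Elasticsearch is schemaless and has no dedicated array type: any field can hold one or more values. So even though upc is defined as a String field, you can still store a list in it.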
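Conceptually, a term query on a multi-valued not_analyzed field matches when any one of the stored values equals the term. That is why the upc query above works whether upc holds one string or a list. Here is a toy model of that rule in plain Python. It does not talk to Elasticsearch, and `term_matches` is a hypothetical name:

```python
def term_matches(field_value, term):
    """Toy model: does a term query for `term` hit a not_analyzed field value?"""
    # a single value and a list of values are treated the same way
    values = field_value if isinstance(field_value, list) else [field_value]
    return term in values
```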

Search Data

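from elasticsearch_dsl import Q, Search

# exact-value filter plus a full-text query
search = Search(index='dps') \
    .filter('term', is_ready=True) \
    .query('match', title=u'沒有的啊')

# & combines queries with AND (a bool query with must clauses)
search = Search(index='dps') \
    .filter('term', is_ready=True) \
    .query(
        Q('match', title='沒有的啊') &
        Q('match', artist='那我懂你意思了') &
        Q('match', album='沒有的, 啊!?')
    )

# track_name, album_name and artist_name are the strings you want to match
q = Q(
    'bool',
    should=[
        Q('match', title={'query': track_name, 'fuzziness': 'AUTO'}),
        Q('match', album={'query': album_name, 'minimum_should_match': '60%'}),
        Q('match', artist={'query': artist_name, 'minimum_should_match': '80%'}),
    ],
)
search = Search(index='dps').filter('term', is_ready=True).query(q)

# keyword is the user's search string: try it as an ISRC, a UPC,
# an exact title, or full text across several fields
q = Q(
    'bool',
    should=[
        Q('term', isrc=keyword),
        Q('term', upc=keyword),
        Q('match', **{'title.raw': {'query': keyword}}),
        Q('multi_match', query=keyword, fields=['title', 'artist', 'album']),
    ],
)
search = Search(index='dps', doc_type=['track', 'album']).query(q)
search = search[:20]

# print the raw Query DSL
import uniout  # makes print/pprint show non-ASCII text readably on Python 2
from pprint import pprint
pprint(search.to_dict())

response = search.execute()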



Elasticsearch notes

Elasticsearch is a schemaless, document-oriented search engine with a rich set of powerful querying APIs. It also makes a pretty good NoSQL database.



  • cluster: a cluster contains one or more nodes
  • node: each server is a node
  • index: roughly like a database in an RDBMS; strictly speaking, an index is just a namespace
  • shard: each index is split into multiple shards that are spread across nodes. A shard is either a primary or a replica
  • type: similar to a table in an RDBMS
  • document: similar to a row in an RDBMS; documents are actually stored in shards
  • field: similar to a column in an RDBMS
  • mapping: how the data in each field is interpreted; similar to a table schema in an RDBMS
  • analysis: how full text is processed to make it searchable


Mapping (Schema)

Data in Elasticsearch can be broadly divided into two types: exact values and full text.

  • Exact values are exactly what they sound like. Examples are a date or a user ID, but can also include exact strings such as a username or an email address. The exact value Foo is not the same as the exact value foo. The exact value 2014 is not the same as the exact value 2014-09-15.
  • Full text, on the other hand, refers to textual data, usually written in some human language, like the text of a tweet or the body of an email.
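Here is a toy model of the difference in plain Python. Exact values are compared as-is, while full text goes through an analyzer first. `toy_analyze` is only a rough stand-in (a regex word split plus lowercasing), not Elasticsearch's standard analyzer, and all names are hypothetical:

```python
import re


def toy_analyze(text):
    """Crude stand-in for an analyzer: split into word tokens and lowercase them."""
    return [token.lower() for token in re.findall(r'\w+', text)]


def exact_match(stored, query):
    # exact values are compared as-is: 'Foo' != 'foo', '2014' != '2014-09-15'
    return stored == query


def full_text_match(stored, query):
    # full text: analyze both sides, match if they share at least one token
    return bool(set(toy_analyze(stored)) & set(toy_analyze(query)))
```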

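Elasticsearch creates the mapping automatically based on the data types of your data. However, some field properties (for example, the analyzer) may not be what you expect, so it's better to create the mapping yourself.

Sometimes you want a field to be full text when searching but an exact value when aggregating. In that case, add a raw sub-field for it.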
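With the title / title.raw pair defined in AlbumDoc, the search can run on the analyzed field while the aggregation buckets whole titles from the raw field. This is a sketch of such a request body; the aggregation name `titles` and the helper name are hypothetical:

```python
def title_search_with_counts(text, size=10):
    """Full-text match on title, plus a terms aggregation over whole titles."""
    return {
        # analyzed field: good for relevance-ranked search
        'query': {'match': {'title': text}},
        'aggs': {
            # not_analyzed sub-field: each bucket is one complete title
            'titles': {'terms': {'field': 'title.raw', 'size': size}},
        },
    }
```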


show useful info for humans

list all indices and their aliases

list types under an index

list all documents under a type

get mapping for an index or type


Query DSL

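Query DSL can be split into queries and filters. The query is the main subject of your search. A filter is a precondition that narrows down which documents are considered (filters only answer yes or no and don't affect relevance scores).

  • For exact-value queries, use term.
  • For full-text queries, use match.
  • To query multiple fields at once, use multi_match.
  • For AND (must), OR (should) and NOT (must_not) conditions, use bool.
  • To combine a filter with a query, use filtered (you'll usually use this one). Note that filtered was deprecated in Elasticsearch 2.0 and removed in 5.0; newer versions use a bool query with a filter clause instead.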
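Here is a sketch of how these pieces fit together in one request body. It uses a bool query whose filter clause plays the role of filtered. Field names follow AlbumDoc; the helper and its parameters are hypothetical:

```python
def build_album_query(title, artist=None, exclude_upc=None, ready_only=True):
    """Combine match/term clauses in a bool query; filters go in the filter clause."""
    bool_query = {'must': [{'match': {'title': title}}]}  # AND, full text
    if artist:
        # OR: optional when a must clause exists, matching adds to the score
        bool_query['should'] = [{'match': {'artist': artist}}]
    if exclude_upc:
        # NOT, exact value
        bool_query['must_not'] = [{'term': {'upc': exclude_upc}}]
    if ready_only:
        # yes/no condition, does not affect scoring
        bool_query['filter'] = [{'term': {'is_ready': True}}]
    return {'query': {'bool': bool_query}}
```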


Multi-index, Multi-type

Besides searching a single type, you can also search across indices and types:

  • /_search Search all types in all indices
  • /gb/_search Search all types in the gb index
  • /gb,us/_search Search all types in the gb and us indices
  • /g*,u*/_search Search all types in any indices beginning with g or beginning with u
  • /gb/user/_search Search type user in the gb index
  • /gb,us/user,tweet/_search Search types user and tweet in the gb and us indices
  • /_all/user,tweet/_search Search types user and tweet in all indices
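The URL patterns above follow one rule. First come the comma-separated indices, then optionally the comma-separated types, then _search. When only types are given, _all stands in for "every index". The small hypothetical helper below reproduces them. In practice, elasticsearch-py's `es_client.search()` takes `index` and `doc_type` arguments and builds the path for you.

```python
def search_path(indices=None, types=None):
    """Build the URL path for a multi-index / multi-type search."""
    if types and not indices:
        # types without indices means "these types in every index"
        indices = ['_all']
    parts = []
    if indices:
        parts.append(','.join(indices))
    if types:
        parts.append(','.join(types))
    parts.append('_search')
    return '/' + '/'.join(parts)
```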





Elasticsearch Chinese word segmentation analysis plugin: ik

$ pip install httpie

# test.txt contains the text you want to segment
$ http < test.txt