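Query DSL is Elasticsearch's domain-specific language (DSL) for queries, and in practice it is just a pile of JSON. elasticsearch-dsl is an officially released Python package for working with Query DSL; you can think of it as an ORM for Elasticsearch.
I hope Elasticsearch will eventually support querying with SQL directly, because Query DSL is really painful to write.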
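To make "it's just JSON" concrete, here is a minimal request body for a full-text `match` query, built as a plain Python dict and serialized with the standard library. The `title` field and the query text are illustrative only; this is the kind of body elasticsearch-dsl generates for you.

```python
import json

# A minimal Query DSL body: a full-text match on an illustrative "title" field.
body = {
    "query": {
        "match": {"title": "hello world"}
    },
    "size": 10,
}

# sort_keys makes the output deterministic; ensure_ascii=False keeps CJK text readable.
payload = json.dumps(body, ensure_ascii=False, sort_keys=True)
print(payload)  # {"query": {"match": {"title": "hello world"}}, "size": 10}
```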
ref:
https://github.com/elastic/elasticsearch-dsl-py
Installation
```bash
$ pip install "elasticsearch-dsl>=5.0.0,<6.0.0"
```
Quote the version specifier; otherwise the shell treats `>` and `<` as redirections.
ref:
https://elasticsearch-dsl.readthedocs.org/en/latest/index.html
Schema
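In `app/mappings.py`:

```python
from elasticsearch_dsl import DocType, String, Boolean
from elasticsearch_dsl.connections import connections

# Register the default connection used by every DocType.
connections.create_connection(hosts=['127.0.0.1'])


class AlbumDoc(DocType):
    # Exact-match field: UPC codes must not be tokenized.
    upc = String(index='not_analyzed')
    # Full-text search with the ik (Chinese) analyzer, plus an exact "title.raw" sub-field.
    title = String(analyzer='ik', fields={'raw': String(index='not_analyzed')})
    artist = String(analyzer='ik')
    is_ready = Boolean()

    class Meta:
        index = 'dps'
        doc_type = 'album'

    @classmethod
    def sync(cls, album):
        """Copy an Album model instance into Elasticsearch."""
        album_doc = cls(meta={'id': album.id})
        album_doc.upc = album.get_upcs(output_str=False)
        album_doc.title = album.name
        album_doc.artist = album.artist.name
        album_doc.is_ready = album.is_ready
        album_doc.save()

    def save(self, *args, **kwargs):
        # Hook for custom save logic; currently it just delegates to DocType.save().
        return super(AlbumDoc, self).save(*args, **kwargs)

    def get_model_obj(self):
        # Imported inside the method, presumably to avoid a circular import.
        from svapps.dps.models import Album
        return Album.objects.get(id=self.meta.id)


# Create the mapping in Elasticsearch from the DocType definition.
AlbumDoc.init()
```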
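You must run `YourDocType.init()` once so that Elasticsearch creates the mapping defined by your DocType. Otherwise, Elasticsearch creates a mapping dynamically when you index your first document, guessing each field's type from the data, so settings such as the analyzer fall back to the default `standard` analyzer. You can verify the result with the `_mapping` API.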
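As a rough sketch of what `AlbumDoc.init()` asks Elasticsearch to create, the dict below writes out the Elasticsearch 2.x-style mapping implied by the `AlbumDoc` fields; treat it as an approximation, not the exact request elasticsearch-dsl sends. The `mapping_url` helper is hypothetical and assumes the default HTTP port 9200 on the host passed to `create_connection`.

```python
import json
from urllib.parse import quote

# Approximate ES 2.x mapping implied by AlbumDoc (string / not_analyzed syntax).
ALBUM_MAPPING = {
    "album": {
        "properties": {
            "upc": {"type": "string", "index": "not_analyzed"},
            "title": {
                "type": "string",
                "analyzer": "ik",
                "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
            },
            "artist": {"type": "string", "analyzer": "ik"},
            "is_ready": {"type": "boolean"},
        }
    }
}


def mapping_url(host, index, doc_type, port=9200):
    """Build the GET URL of the _mapping API for one index and type."""
    return "http://%s:%d/%s/_mapping/%s" % (host, port, quote(index), quote(doc_type))


print(mapping_url("127.0.0.1", "dps", "album"))  # http://127.0.0.1:9200/dps/_mapping/album
print(json.dumps(ALBUM_MAPPING, indent=2))
```

Comparing the real `_mapping` response with this dict is a quick way to spot a field that is missing its `analyzer` setting because it was mapped dynamically.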
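Fields that need full-text search should be analyzed (string fields are analyzed by default). Fields that need exact matching, such as `username`, `email`, or zip code, can be set to `not_analyzed`. Keep in mind that a `term` query is not analyzed, so running it against an analyzed field usually fails to match the original value; if a field needs both full-text search and exact matching, add a `not_analyzed` sub-field such as `raw` (as with `title.raw` above) and run `term` against that. (`analyzed`/`not_analyzed` is Elasticsearch 2.x terminology; from 5.0 the `string` type is replaced by `text`, which is analyzed, and `keyword`, which is not.)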
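Why `term` misbehaves on an analyzed field can be shown with a toy inverted index. `toy_analyze` is a deliberately simplified stand-in (lowercase plus whitespace split) for a real analyzer such as `standard` or `ik`, and none of these helpers are Elasticsearch APIs. The point: `term` looks up its input verbatim, while `match` runs the input through the analyzer first.

```python
def toy_analyze(text):
    """Simplified stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def index_value(value, analyzed):
    """Return the set of terms stored in the inverted index for one field value."""
    return set(toy_analyze(value)) if analyzed else {value}


def term_query(indexed_terms, term):
    # term: the input is NOT analyzed and must equal a stored term exactly.
    return term in indexed_terms


def match_query(indexed_terms, text):
    # match: the input is analyzed first; any resulting token may match (OR semantics).
    return any(token in indexed_terms for token in toy_analyze(text))


title = "Hello World"
analyzed_title = index_value(title, analyzed=True)   # {'hello', 'world'}
raw_title = index_value(title, analyzed=False)       # {'Hello World'}

print(term_query(analyzed_title, "Hello World"))  # False: no such stored term
print(match_query(analyzed_title, "hello"))       # True
print(term_query(raw_title, "Hello World"))       # True: exact match on the raw sub-field
```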
ref:
https://elasticsearch-dsl.readthedocs.org/en/latest/persistence.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#CO59-2
Store Data
```python
from app.mappings import AlbumDoc

album_doc = AlbumDoc(meta={'id': 42})
album_doc.upc = ['887375000619', '887375502069']
album_doc.title = 'abc'
album_doc.artist = 'xyz'
album_doc.is_ready = True
album_doc.save()

# Query as usual; it doesn't matter whether the field holds a list.
search = AlbumDoc.search().filter('term', upc='887375000619')
response = search.execute()
```
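Elasticsearch has no dedicated array type: any field can hold zero or more values of its type. So even though `upc` is defined as a `String` field, you can store a list in it, and a `term` filter matches the document if any element equals the term.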
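A toy model of that multi-value behavior (plain Python, not Elasticsearch code): a `term` filter on a multi-valued field matches when any stored value equals the term, so the same query works whether `upc` holds one string or a list. The helper names are hypothetical.

```python
def field_values(value):
    """Normalize a field to a list, since any field may hold zero or more values."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def term_filter(doc, field, term):
    # Matches if ANY value of the field equals the term exactly.
    return term in field_values(doc.get(field))


docs = [
    {"id": 42, "upc": ["887375000619", "887375502069"]},
    {"id": 43, "upc": "887375000619"},
    {"id": 44, "upc": "000000000000"},
]
hits = [d["id"] for d in docs if term_filter(d, "upc", "887375000619")]
print(hits)  # [42, 43]
```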
Search Data
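- `must`: every clause must match.
- `should`: on its own, at least one clause must match. If the `bool` query also has `must` clauses, `should` clauses become optional and only raise the score, unless you set `minimum_should_match`.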
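The clause semantics can be sketched with plain predicates. This is a simplified model that only decides whether a document matches (it ignores scoring and filter context), and `bool_matches` plus the sample predicates are hypothetical helpers, not elasticsearch-dsl APIs.

```python
def bool_matches(doc, must=(), should=(), minimum_should_match=None):
    """Toy model of a bool query's match decision (no scoring)."""
    # Every must clause has to match.
    if not all(clause(doc) for clause in must):
        return False
    if minimum_should_match is None:
        # Default: should is optional next to must; on its own, one should clause is required.
        minimum_should_match = 0 if must else (1 if should else 0)
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


def title_is_abc(doc):
    return doc.get("title") == "abc"


def artist_is_xyz(doc):
    return doc.get("artist") == "xyz"


def album_is_bar(doc):
    return doc.get("album") == "bar"


doc = {"title": "abc", "artist": "xyz", "album": "foo"}
print(bool_matches(doc, must=[title_is_abc, artist_is_xyz]))          # True
print(bool_matches(doc, should=[album_is_bar]))                       # False
print(bool_matches(doc, must=[title_is_abc], should=[album_is_bar]))  # True
print(bool_matches(doc, must=[title_is_abc], should=[album_is_bar],
                   minimum_should_match=1))                           # False
```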
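```python
from elasticsearch_dsl import Q, Search

# TrackDoc is assumed to be a DocType like AlbumDoc, with title/artist/album/is_ready fields.
from app.mappings import TrackDoc

search = (
    TrackDoc.search()
    .filter('term', is_ready=True)
    .query('match', title=u'沒有的啊')
)
```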
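For reference, the query above should serialize to roughly the following Query DSL (compare with `search.to_dict()`; key order may differ). Clauses added with `.filter()` go into the `bool` query's `filter` list, which runs in filter context: it does not affect scoring and can be cached, which suits yes/no conditions like `is_ready`.

```python
import json

# Roughly what TrackDoc.search().filter('term', is_ready=True).query('match', title=...) produces.
expected = {
    "query": {
        "bool": {
            "filter": [{"term": {"is_ready": True}}],
            "must": [{"match": {"title": u"沒有的啊"}}],
        }
    }
}
print(json.dumps(expected, ensure_ascii=False, indent=2))
```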
```python
search = (
    TrackDoc.search()
    .filter('term', is_ready=True)
    .query(
        Q('match', title=u'沒有的啊')
        & Q('match', artist=u'那我懂你意思了')
        & Q('match', album=u'沒有的, 啊!?')
    )
)
```
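Chaining `Q` objects with `&` builds a single `bool` query whose `must` list holds each clause (`|` puts them under `should`, and `~` under `must_not`). The sketch below mimics that with a hypothetical `q_and` helper and shows the approximate body the query above should flatten to together with the filter; verify against `search.to_dict()` rather than relying on this exact shape.

```python
import json


def q_and(*clauses):
    """Mimic chaining Q(...) & Q(...): one bool query with every clause under must."""
    return {"bool": {"must": list(clauses)}}


must = q_and(
    {"match": {"title": u"沒有的啊"}},
    {"match": {"artist": u"那我懂你意思了"}},
    {"match": {"album": u"沒有的, 啊!?"}},
)["bool"]["must"]

# Combined with .filter('term', is_ready=True), this should flatten to roughly:
expected = {
    "query": {
        "bool": {
            "filter": [{"term": {"is_ready": True}}],
            "must": must,
        }
    }
}
print(json.dumps(expected, ensure_ascii=False, indent=2))
```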
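```python
# track_name, album_name and artist_name are the user's input strings.
q = Q(
    'bool',
    must=[
        # The title is required; fuzziness tolerates typos.
        Q('match', title={'query': track_name, 'fuzziness': 'AUTO'}),
    ],
    should=[
        Q('match', album={'query': album_name, 'minimum_should_match': '60%'}),
        Q('match', artist={'query': artist_name, 'minimum_should_match': '80%'}),
    ],
    # Because must is present, should clauses would otherwise only affect the score.
    minimum_should_match=1,
)
search = TrackDoc.search().filter('term', is_ready=True).query(q)
```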
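Two parameters in this query hide some arithmetic. For a percentage `minimum_should_match` on a `match` query, Elasticsearch computes the required number of matching terms from the number of analyzed query terms and rounds down. `fuzziness: 'AUTO'` picks the allowed edit distance from the term length: 0 edits for 1-2 characters, 1 for 3-5, and 2 for longer terms (the default `AUTO:3,6`). The helpers below reproduce only that arithmetic; they are hypothetical and not part of elasticsearch-dsl.

```python
def min_should_match_from_percent(num_terms, percent):
    """Required number of matching terms for a positive percentage, rounded down."""
    return num_terms * percent // 100  # integer math avoids float rounding surprises


def auto_fuzziness(term, low=3, high=6):
    """Max edit distance allowed by fuzziness AUTO:low,high (default AUTO:3,6)."""
    n = len(term)
    if n < low:
        return 0
    if n < high:
        return 1
    return 2


print(min_should_match_from_percent(3, 60))  # 1  (3 * 0.6 = 1.8, rounded down)
print(min_should_match_from_percent(5, 80))  # 4
print(auto_fuzziness("ab"), auto_fuzziness("abcde"), auto_fuzziness("abcdef"))  # 0 1 2
```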
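```python
# keyword is the raw search string (renamed from q so it isn't confused with the query object).
q = Q(
    'bool',
    should=[
        Q('term', isrc=keyword),
        Q('term', upc=keyword),
        # Dotted field names aren't valid keyword arguments, hence the **{...} unpacking.
        Q('match', **{'title.raw': {'query': keyword}}),
        Q('multi_match', query=keyword, fields=['title', 'artist', 'album']),
    ],
)
# Search tracks and albums in the dps index at once, keeping the first 20 hits.
search = Search(index='dps', doc_type=['track', 'album']).query(q)
search = search[:20]
```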
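Slicing a `Search` does not fetch anything; it only sets the request's `from` and `size` parameters, so `search[:20]` asks for the first 20 hits (without it, Elasticsearch returns 10 hits by default). A small sketch of that translation; the helper is hypothetical and only handles non-negative slices without a step.

```python
def slice_to_from_size(start, stop):
    """Translate search[start:stop] into Elasticsearch's from/size parameters."""
    start = start or 0
    return {"from": start, "size": stop - start}


print(slice_to_from_size(None, 20))  # {'from': 0, 'size': 20}
print(slice_to_from_size(20, 40))    # {'from': 20, 'size': 20}
```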
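```python
# Print the raw Query DSL for debugging.
import uniout  # only needed on Python 2, so pprint shows CJK text instead of \u escapes
from pprint import pprint
pprint(search.to_dict())

response = search.execute()
print(response.hits.total)
for hit in response:
    # Album documents have no album field, so fall back to None instead of raising.
    print(hit.title, hit.artist, getattr(hit, 'album', None), hit.is_ready)
```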
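Under the hood, `execute()` wraps the raw JSON response. The sketch below reads the same values from a hand-written dict shaped like a search response from Elasticsearch before 7.0, where `hits.total` is a plain integer; the document values are made up for illustration, and `summarize` is a hypothetical helper. Each hit's fields live under `_source`, and `_type` tells track and album documents apart.

```python
# A hand-written example response (values are illustrative, not real data).
raw_response = {
    "took": 3,
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "dps", "_type": "track", "_id": "1",
             "_source": {"title": "abc", "artist": "xyz", "album": "foo", "is_ready": True}},
            {"_index": "dps", "_type": "album", "_id": "42",
             "_source": {"title": "abc", "artist": "xyz", "is_ready": True}},
        ],
    },
}


def summarize(response):
    """Yield (doc_type, title, album) per hit; album documents have no album field."""
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        yield hit["_type"], source.get("title"), source.get("album")


print(raw_response["hits"]["total"])  # 2
for row in summarize(raw_response):
    print(row)
```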
ref:
https://elasticsearch-dsl.readthedocs.org/en/latest/search_dsl.html