Transaction isolation levels in MySQL

Transaction isolation levels in MySQL

Dirty reads 髒讀

Transaction 1 讀到了 Transaction 2 的未提交的值
READ COMMITED 可以避免這個問題
READ UNCOMMITTED 會有這個問題

A dirty read occurs when a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed.

Non-repeatable reads 不可重複讀

在同一個 Transaction 裡 不同的時間點 讀取了同一個 row 卻得到了不同的值(但是可能使用了不同的 SELECT 語句去讀到同一個 row)
REPEATABLE READ 可以避免這個問題
READ COMMITED 會有這個問題

A non-repeatable read occurs, when during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.

When Transaction 2 commits successfully, which means that its changes to the row with id 1 should become visible. However, Transaction 1 has already seen a different value for age in that row. At the SERIALIZABLE and REPEATABLE READ isolation levels, the DBMS must return the old value for the second SELECT. At READ COMMITTED and READ UNCOMMITTED, the DBMS may return the updated value; this is a non-repeatable read.

Phantom reads 幻讀

有點類似 Non-repeatable reads
在同一個 Transaction 裡 不同的時間點 兩個相同但是先後執行的 SQL 卻得到了不同的結果
差別在於 Phantom reads 專門指沒有用 range locks 時
先後兩個同樣的 SELECT 卻返回了不同數量的結果
SERIALIZABLE 可以避免以上的三種問題

也可以說 non-repeatable reads 的原因是 UPDATE
而 phantom reads 的原因則是 INSERT 和 DELETE

A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first.

READ UNCOMMITTED

允許 dirty reads
可能會讀到其他 transaction 尚未提交的數據

READ COMMITED

不會有 dirty reads
只會讀到其他 transaction 已經提交的數據
但是允許 non-repeatable reads

大部份資料庫默認的 isolation level 都是 READ COMMITED

REPEATABLE READ

不會有 dirty reads 和 non-repeatable reads
但是允許 phantom reads

MySQL 默認的 isolation level 是 REPEATABLE READ
但是 InnoDB 透過 next-key lock 也防止了 phantom reads

SERIALIZABLE

不會有 dirty reads、non-repeatable reads 和 phantom reads
讀寫互斥
所以讀和寫都只能一個一個依序執行

ref:
https://www.percona.com/blog/2012/08/28/differences-between-read-committed-and-repeatable-read-transaction-isolation-levels/

AWS DynamoDB Notes

AWS DynamoDB Notes

AWS DynamoDB is a fully managed key-value store (also document store) NoSQL database as a service provided by Amazon Web Services. Its pricing model is that you only pay for the throughput (read and write) you use instead of the storage usage and the running hours of database instances.

ref:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html
http://www.slideshare.net/AmazonWebServices/design-patterns-using-amazon-dynamodb

Glossary

DynamoDB is schema-less.

  • table: a table is a collection of items.
  • item: an item is a collection of attributes (key-value pairs).
  • attribute: attribute is similar to fields or columns in other databases.
  • primary key: one or two attributes that can uniquely identify every item in a table.
    • partition key (aka hash key): a simple primary key, composed of one attribute.
    • partition key and sort key (aka range key): a composite primary key, composed of two attributes.

ref:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html

Global Secondary Index (GSI)

secondary index 指的是除了 primary key 之外的第二組 key
可以有很多組 secondary index
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

GSI 可以用在是 partition key 或 partition + sort key 的 table
GSI 跟 primary key 一樣可以 simple 或是 composite 的
GSI 可以隨時增減

如果你不需要 strong consistency 或個別 partition 的資料量大於 10GB
那就用 GSI

ref:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
http://iamgarlic.blogspot.tw/2015/01/amazon-dynamodb-global-secondary-index.html

Local Secondary Index (LSI)

LSI 只能用在是 partition + sort key 的 table
LSI 必須用原本的 partition key 搭配其他 attribute 做為新的 partition + sort key(LSI 只會是 composite 的)
LSI 只能在建立 table 的時候定義

ref:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html
http://iamgarlic.blogspot.tw/2015/01/amazon-dynamodb-local-secondary-index.html

Query and Scan

能不用 scan 就不用
畢竟這個操作就是去掃 table 裡的所有 item

primary key 和 local secondary index 只能在建立 table 時指定
一旦建立就不能改了
但是 global secondary index 就沒有這個限制

如果是用 partition + sork key 當 primary key
get 的時候要同時給 partition key 和 sort key
query 的時候可以只給 partition key 而 sort key 可給可不給(但是 partition key 一定要給)

無論是當 primary key、GSI 或 LSI
只要是 partition key 的 attribute 一律只能使用 = 來 query
該 attribute 沒有 rich query 的能力(就是 >, <, between, contains 那些條件)
sort key 才會有 rich query

Best Practices
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html

Choosing a Partition Key
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html

Querying DynamoDB by date
http://stackoverflow.com/questions/14836600/querying-dynamodb-by-date

Pick an item randomly
http://stackoverflow.com/questions/10666364/aws-dynamodb-pick-a-record-item-randomly

ref:
https://www.uplift.agency/blog/posts/2016/03/clearcare-dynamodb
https://medium.com/building-timehop/one-year-of-dynamodb-at-timehop-f761d9fe5fa1#.3g97b3lqy

Commands

DynamoDB is schema-less, so that you can only define keys you need for specifying primary key or local secondary index when creating table.

# 可以用 project name 作為 table name 的 prefix
# 之後可以隨時修改 read / write capacity units
$ aws dynamodb create-table \
--table-name CodeTengu_Preference \
--attribute-definitions AttributeName=name,AttributeType=S \
--key-schema AttributeName=name,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

$ aws dynamodb create-table \
--table-name CodeTengu_WeeklyIssue \
--attribute-definitions AttributeName=number,AttributeType=N AttributeName=publication,AttributeType=S AttributeName=publishedAt,AttributeType=N \
--key-schema AttributeName=number,KeyType=HASH \
--global-secondary-indexes IndexName=publication_published_at,KeySchema='[{AttributeName=publication,KeyType=HASH},{AttributeName=publishedAt,KeyType=RANGE}]',Projection='{ProjectionType=ALL}',ProvisionedThroughput='{ReadCapacityUnits=5,WriteCapacityUnits=5}' \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

$ aws dynamodb create-table \
--table-name CodeTengu_WeeklyPost \
--attribute-definitions AttributeName=issueNumber,AttributeType=N AttributeName=id,AttributeType=N  AttributeName=categoryCode,AttributeType=S \
--key-schema AttributeName=issueNumber,KeyType=HASH AttributeName=id,KeyType=RANGE \
--global-secondary-indexes IndexName=categoryCode_id,KeySchema='[{AttributeName=categoryCode,KeyType=HASH},{AttributeName=id,KeyType=RANGE}]',Projection='{ProjectionType=ALL}',ProvisionedThroughput='{ReadCapacityUnits=5,WriteCapacityUnits=5}' \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

ref:
http://docs.aws.amazon.com/cli/latest/reference/dynamodb/create-table.html
http://docs.aws.amazon.com/cli/latest/reference/dynamodb/update-table.html

$ aws dynamodb put-item \
--table-name CodeTengu_Preference \
--item file://fixtures/curated_api_config.json \
--return-consumed-capacity TOTAL

# fixtures/curated_api_config.json
{
  "name": { "S": "curated_api_config" },
  "apiKey": { "S": "xxx" }
}

ref:
http://docs.aws.amazon.com/cli/latest/reference/dynamodb/put-item.html

$ aws dynamodb get-item \
--table-name CodeTengu_WeeklyIssue \
--key '{"number": {"N": "42"}}'

ref:
http://docs.aws.amazon.com/cli/latest/reference/dynamodb/get-item.html

Usage

你應該用 AWS.DynamoDB.DocumentClient
而不是直接用 AWS.DynamoDB

const AWS = require('aws-sdk');

const dynamodb = new AWS.DynamoDB({ apiVersion: '2012-08-10', region: 'ap-northeast-1' });
const dynamodbClient = new AWS.DynamoDB.DocumentClient({ service: dynamodb });

const params = {
  RequestItems: {
    CodeTengu_Preference: {
      Keys: [
        { name: 'xxx' },
      ],
    },
  },
};

dynamodbClient.batchGet(params, (err, data) => {
  if (err) {
    console.log('fail');
    console.log(err);
  } else {
    console.log('success');
    console.log(data);
  }
});

ref:
http://aws.amazon.com/sdk-for-node-js/
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB.html
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/DocumentClient.html

完整的程式碼放在 GitHub 上
https://github.com/CodeTengu/lambdabaku

Database Routers in Django

Database Routers in Django

把部分 models / tables 獨立到一台資料庫

Database Router

沒辦法在 Model class 裡指定這個 model 只能用某個 database
而是要用 database router
就是判斷 model._meta.app_label == 'xxx' 的時候指定使用某一個 database
database 是指定義在 settings.DATABASES 的那些

不過 django 不支援跨 database 的 model relation
你不能用 foreign key 或 m2m 指向另一個 database 裡的 model
但是其實你直接用 user_id, song_id 之類的 int 欄位來記錄就好了

in settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'default',
        'USER': 'whatever',
        'PASSWORD': 'whatever',
        'HOST': '',
        'PORT': '',
    },
    'warehouse': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'warehouse',
        'USER': 'whatever',
        'PASSWORD': 'whatever',
        'HOST': '',
        'PORT': '',
    },
}

DATABASE_ROUTERS = [
    'app.db_routers.SVDatabaseRouter',
]

in app/db_routers.py

from oauth2_provider.models import AccessToken as OauthAccessToken
from oauth2_provider.oauth2_backends import get_oauthlib_core
from rest_framework2.authentication import BaseAuthentication

class SVOauthAuthentication(BaseAuthentication):
    www_authenticate_realm = 'api'

    def authenticate(self, request):
        oauthlib_core = get_oauthlib_core()
        valid, result = oauthlib_core.verify_request(request, scopes=[])

        if valid:
            return (result.user, result.access_token)
        else:
            access_token = request.GET.get('access_token', None)
            if access_token:
                try:
                    access_token_obj = OauthAccessToken.objects.get(token=access_token)
                except OauthAccessToken.DoesNotExist:
                    pass
                else:
                    return (access_token_obj.user, access_token_obj)

        return None

    def authenticate_header(self, request):
        return 'Bearer realm="{0}"'.format(self.www_authenticate_realm)

ref:
https://docs.djangoproject.com/en/dev/topics/db/multi-db/

Models

把所有需要放到另一台 db 的 models 都放在同一個 app 下
方便管理

in warehouse/models.py

class PlayRecord(SVWarehouseDBMixin, models.Model):
    song_id = models.IntegerField()
    user_id = models.IntegerField(null=True, blank=True)
    is_full = models.BooleanField(default=False)
    ip_address = models.IPAddressField()
    location = models.CharField(max_length=2)
    created_at = models.DateTimeField()

繼承 SVWarehouseDBMixin 這個 minxin 的 class 會被放到 warehouse 這台資料庫!
所以 migrate 的時候要注意,記得在 migration 檔案裡加上:

from south.db import dbs
warehouse_db = dbs['warehouse']

這樣 south 才會在 warehouse 這台資料庫上建立 table
不然就是你自己手動去 CREATE TABLE

Migration

如果你要把舊有的 app 的 models 搬到另一台資料庫
但是 models 不動(還是放在本來的 app 底下)
你可能會需要 reset 整個 migration 紀錄
從頭開始建立一個新的 migration
因為 schema 會錯亂
所以還是建議新開一個 app 來放那些要搬到另一台資料庫的 models
這樣 database router 和 migration 都會比較單純

in app/migrations/0001_initial.py

$ pip install south==1.0.2

# syncdb 默認只會作用到 default 資料庫,你要明確指定要用哪個 database 才行
$ ./manage.py syncdb --noinput
$ ./manage.py syncdb --noinput --database=warehouse

# migrate 卻可以作用到其他資料庫
# 因為 migrate 哪個資料庫是 migration file 裡的 `db` 參數在決定的
$ ./manage.py migrate

ref:
http://stackoverflow.com/questions/7029228/is-using-multiple-databases-and-south-together-possible

Unit Tests

Django only flushes the default database at the start of each test run. If your setup contains multiple databases, and you have a test that requires every database to be clean, you can use the multi_db attribute on the test suite to request a full flush.

在使用 --keepdb 的情況下,如果你的測試執行到一半就因為錯誤而中斷了,可能會發生資料庫裡有資料還沒被 flush 的問題,導致下次執行測試時失敗。不過如果你沒有用 --keepdb 的話,因為每次都會重建資料庫,所以不會有這個問題。

from django.test import TestCase

class YourBTestCase(TestCase):
    multi_db = True

    def setUp(self):
        do_shit()

ref:
https://docs.djangoproject.com/en/dev/topics/testing/tools/
https://docs.djangoproject.com/en/dev/topics/testing/advanced/#topics-testing-advanced-multidb

elasticsearch-dsl-py: The Official Elasticsearch ORM in Python

elasticsearch-dsl-py: The Official Elasticsearch ORM in Python

Query DSL 是 Elasticsearch 的查詢用 Domain-specific Language (DSL),實際上就是一堆 JSON。elasticsearch-dsl 是官方發佈的一套用來操作 Query DSL 的 Python package,可以當成是 Elasticsearch 的 ORM。

希望之後可以直接支援用 SQL 來查詢,不然 Query DSL 真的有夠難寫。

ref:
https://github.com/elastic/elasticsearch-dsl-py

Installation

$ pip install elasticsearch-dsl>=5.0.0,<6.0.0

ref:
https://elasticsearch-dsl.readthedocs.org/en/latest/index.html

Schema

in app/mappings.py

from elasticsearch_dsl import DocType, String, Boolean
from elasticsearch_dsl.connections import connections
connections.create_connection(hosts=['127.0.0.1', ])

class AlbumDoc(DocType):
    upc = String(index='not_analyzed')
    title = String(analyzer='ik', fields={'raw': String(index='not_analyzed')})
    artist = String(analyzer='ik')
    is_ready = Boolean()

    class Meta:
        index = 'dps'
        doc_type = 'album'

    @classmethod
    def sync(cls, album):
        album_doc = AlbumDoc(meta={'id': album.id})
        album_doc.upc = album.get_upcs(output_str=False)
        album_doc.title = album.name
        album_doc.artist = album.artist.name
        album_doc.is_ready = album.is_ready
        album_doc.save()

    def save(self, *args, **kwargs):
        return super(AlbumDoc, self).save(*args, **kwargs)

    def get_model_obj(self):
        from svapps.dps.models import Album
        return Album.objects.get(id=self.meta.id)

# to create mappings
AlbumDoc.init()

一定要執行一次 YourDocType.init(),這樣 Elasticsearch 才會根據你的 DocType 產生對應的 mapping。否則 Elasticsearch 就會在你第一次倒資料進去的時候根據你的資料的 data type 建立對應的 mapping,所以 analyzer 之類的設定就會是預設的 standard,你可以透過 _mapping API 來檢查。

需要全文搜尋的欄位要設為 analyzed(string 欄位默認都是 analyzed),不需要全文搜尋的欄位,也就是要求精確的欄位,例如:usernameemailzip code,就可以設成 not_analyzed,但是你就不能對 analyzed 的欄位使用 term 了,除非你對該欄位額外再建立一個 raw 欄位。

ref:
https://elasticsearch-dsl.readthedocs.org/en/latest/persistence.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#CO59-2

Store Data

album_doc = AlbumDoc(meta={'id': 42})
album_doc.upc = ['887375000619', '887375502069']
album_doc.title = 'abc'
album_doc.artist = 'xyz'
album_doc.is_ready = True
album_doc.save()

# 可以如常地 query,不用管它是不是 list
search = AlbumDoc.search().filter('term', upc='887375000619')
response = search.execute()

因為 Elasticsearch 是 schemaless,所以即使你定義了 String 欄位,還是可以存一個 list 進去。

Search Data

  • must:必須符合所有條件
  • should:符合其中一個條件即可
search = TrackDoc.search() \
    .filter('term', is_ready=True) \
    .query('match', title=u'沒有的啊')

search = TrackDoc.search() \
    .filter('term', is_ready=True) \
    .query(
        Q('match', title='沒有的啊') & \
        Q('match', artist='那我懂你意思了') & \
        Q('match', album='沒有的, 啊!?')
    )

q = Q(
    'bool',
    must=[
        Q('match', title={'query': track_name, 'fuzziness': 'AUTO'}),
    ],
    should=[
        Q('match', album={'query': album_name, 'minimum_should_match': '60%'}),
        Q('match', artist={'query': artist_name, 'minimum_should_match': '80%'}),
    ],
    minimum_should_match=1
)
search = TrackDoc.search().filter('term', is_ready=True).query(q)

q = Q(
    'bool',
    should=[
        Q('term', isrc=q),
        Q('term', upc=q),
        Q('match', **{'title.raw': {'query': q}}),
        Q('multi_match', query=q, fields=['title', 'artist', 'album']),
    ],
)
search = Search(index='dps', doc_type=['track', 'album']).query(q)
search = search[:20]

# print the raw Query DSL
import uniout
from pprint import pprint
pprint(search.to_dict())

response = search.execute()

print(response.hits.total)
print(response[0].title)
print(response[0].artist)
print(response[0].album)
print(response[0].is_ready)

ref:
https://elasticsearch-dsl.readthedocs.org/en/latest/search_dsl.html

Elasticsearch: More than a Search Engine

Elasticsearch is a schemaless, document-oriented search engine, has a bunch of powerful quering APIs. It's also a great NoSQL database.

ref:
https://www.elastic.co/products/elasticsearch

Glossary

以下的定義以 Elasticsearch 5.6 為準,可能跟舊版的定義不同。在新版的 Elasticsearch 中,每個 index 只能有一個 mapping type,之前的版本則可以有多個。

  • cluster:一個 cluster 包含一個或多個 nodes,會自動選出一個 master node
  • node:一台跑著 Elasticsearch 的機器就是一個 node
  • index:類似關聯式資料庫裡的 table
  • mapping:類似關聯式資料庫裡的 table schema
  • field:類似關聯式資料庫裡的 column
  • document:類似關聯式資料庫裡的 row
  • text:任意的非結構化文字,text 會被 analyze 變成 term,然後才能被搜尋
  • term:實際上存在 Elasticsearch 裡的東西
  • analysis: 把 text 變成 term 的過程,例如 normalize、tokenize 和 stopword remove

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/current/glossary.html

Mapping

Mapping 可以顯式地指定,但是如果沒有指定,Elasticsearch 就會在第一次有資料進去的時候,自動根據資料建立對應的 mapping,所以一些欄位的屬性(例如 analyzer)可能不會符合你的預期,所以最好還是手動指定 mapping。

常用的 field data types:

  • text:表示 string 類型,用於 full-text search
  • keyword:表示 string 類型,用於 exact value 的 filter、sort 或 aggregate(就是舊版的 not_analyzed

在 Elasticsearch 中,同一個欄位可以被 index 成不同的 data types,例如 location 欄位,可以透過 fields 屬性,同時 index 成 textkeyword,用來全文搜索和 exact value 過濾。也可以分別指定不同的 analyzer。

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

Analysis

一個 analyzer 由三個部分組成:

  • Character filters
  • Tokenizers
  • Token filters

你可以自己組合出你的 analyzer,以 elasticsearch-dsl-py 為例:

from elasticsearch_dsl import DocType, Date, Integer, Keyword, Text, Boolean
from elasticsearch_dsl import analyzer, tokenizer

text_analyzer = analyzer('text_analyzer',
    char_filter=["html_strip"],
    tokenizer="standard",
    filter=["asciifolding", "lowercase", "snowball", "stop"]
)

cjk_analyzer = analyzer('text_analyzer',
    char_filter=["html_strip"],
    tokenizer=tokenizer('trigram', 'nGram', min_gram=2, max_gram=3),
    filter=["asciifolding", "lowercase", "snowball", "stop"]
)

ref:
http://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html

Testing Analyzer

測試某段文字在某個 analyzer 下的效果:

POST http://127.0.0.1:9200/_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "asciifolding"],
  "text": "Is this déja vu?"
}

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html

Chinese Words Segmentation

ik 之類的分詞 plugin 的效果都不是很好,內建的 cjk 加上 NGram 可能會是比較好的選擇(可以用 multi-field index)。另外一個作法是,把資料餵進去 Elasticsearch 之前就先分好詞,可以用 Jieba,分詞完的文本以空格分隔,然後用 Elasticsearch 的 whitespace tokenizer。

中文搜尋經驗分享
https://blog.liang2.tw/2015Talk-Chinese-Search/

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#cjk-analyzer
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html << 不推薦
https://github.com/medcl/elasticsearch-analysis-ik << 堪用

RESTful APIs

Show useful information for humans
http://127.0.0.1:9200/_cat
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html

List all indices and aliases
http://127.0.0.1:9200/_aliases

List mappings under am index
http://127.0.0.1:9200/repo/_mapping

List documents under an index
http://127.0.0.1:9200/repo/_search

Query DSL

query 就是你要搜索的主體
filter 則是這個搜索的前置條件
https://www.elastic.co/guide/en/elasticsearch/guide/master/search-in-depth.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filters.html

要做 exact value 的 query
請用 term
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html

要做 full text 的 query
請用 match
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html

要一次 query 多個欄位
請用 multi_match
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html

要用 AND (must), OR (should), NOT (must_not) 的條件搜索
請用 bool
https://www.elastic.co/guide/en/elasticsearch/guide/master/bool-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html

要結合 filter 和 query
請用 filtered
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html

More Like This query
除了可以輸入文字之外,還可以直接指定 document id 找出相似的結果
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

對特定欄位加權
https://www.elastic.co/guide/en/elasticsearch/guide/current/query-time-boosting.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html

Multi-index, Multi-type

除了可以搜尋單一 type,也可以跨 index、跨 type

  • /_search: Search all types in all indices
  • /gb/_search: Search all types in the gb index
  • /gb,us/_search: Search all types in the gb and us indices
  • /g*,u*/_search: Search all types in any indices beginning with g or beginning with u
  • /gb/user/_search: Search type user in the gb index
  • /gb,us/user,tweet/_search: Search types user and tweet in the gb and us indices
  • /_all/user,tweet/_search: Search types user and tweet in all indices

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-multi-index-type