碼天狗週刊 第 100 期 @vinta - Apache Spark, Scala, Machine Learning, Feature Engineering, MySQL

碼天狗週刊 第 100 期 @vinta - Apache Spark, Scala, Machine Learning, Feature Engineering, MySQL

本文同步發表於 CodeTengu Weekly - Issue 100

Big Data Analysis with Scala and Spark

因為前陣子辭職了(想放個長假吶~),突然多了不少時間,所以決定在打電動之餘,花點時間上幾門 Coursera 的課。然後花了一個禮拜終於完成這門課啦!主題是 Scala 和 Spark,是 Functional Programming in Scala 系列課程的最後一門課(這個系列有一門課的老師就是 Scala 的發明者 Martin Odersky)。之前學東西都習慣看書,這次第一次在 Coursera 完整地上完一門課,老實說是個很棒的體驗啊。尤其是在寫第一個程式作業的時候,上面標註說大概要花 3 小時,結果我寫了一個下午哈哈哈。雖然上手之後,後來的作業其實很快就做完了。但是還是忍不住想抱怨一下:Spark 的 Dataset typed API 寫起來也太麻煩,而且效能還沒有比較好。

題外話,雖然還沒那麼快要開始找工作,不過因為很閒,這陣子新認識了不少工程師同業,交流了很多技術經驗(和業界八卦 XD),感覺挺不錯的啊,所以想說如果各位朋友或公司有興趣,歡迎聯絡我,咱們可以約個時間吃個飯 ?

題外話之二,說到打電動,跟大家分享一下,12 月的時候 PC/PS4/Xbox One 會出「大神 絕景版」,這款遊戲可是 PS2 時代不朽的名作之一啊,如果你還沒有玩過,拜託玩一下。然!後!登登!小島秀夫的傑作之一 Anubis: Z.O.E. 也要重製啦!

Rules of Machine Learning: Best Practices for ML Engineering

不得了啊,這份文件,有在搞機器學習的人,這禮拜讀這篇就夠啦。這份文件的作者是 Google 的 Research Scientist,歸納了 43 條搭建一個 Machine Learning 系統的最佳實踐,而且很多都是從實務和軟體工程角度的經驗總結,這種知識特別寶貴啊。老實說我覺得這也是軟體工程師在這一波 AI 浪潮中可以施力的點,因為任何的機器學習系統或產品,宏觀一點來看,它就是一個軟體工程問題。

Most of the problems you will face are, in fact, engineering problems.

延伸閱讀(RecsysChina 的前輩對這篇文章的評註):

Feature Engineering - Getting most out of data for predictive models

這份簡報很豐富,除了幾乎把我前陣子才讀完的 Mastering Feature Engineering 書裡的內容都講完了之外,也提到很多特別的作法。而且看到最後一頁才發現,原來簡報的作者也讀過這本書啊。

是說我這一陣子實際搗鼓了一番 Feature Engineering 之後的感想,特徵工程根本就是手藝活,講求的其實是創造力啊。

延伸閱讀:

What happens when your application cannot open yet another connection to MySQL

之前工作的時候,要把 Spark 運算完的推薦系統的結果寫回 MySQL 資料庫,結果卻遇到了 ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.x.x' (99) 的錯誤,很多人應該都見過類似的錯誤訊息,不過括號裡的數字可能不一樣。在這個例子中,單看文字的描述很容易讓人誤會是 MySQL server 出了什麼差錯,但是其實關鍵在於最後那個括號裡的 OS error code。這篇文章把造成這個錯誤的前因後果說得非常清楚,值得一讀。

簡單說,用 perror 99 可以查到這個 error code 是 Cannot assign requested address 的意思,原來是因為我在寫 Spark 的時候太放肆,不小心在短時間內開了太多的 connection,把 local 的 port 用完了(需要 TIME_WAIT 冷卻時間),所以沒有辦法分配 port 給新的 MySQL connection。解決的辦法:在程式裡重用 MySQL connection 或是限制一下 concurrent 數,再不然就是修改 net.ipv4.tcp_tw_reuse = 1 系統設定。

忍不住提一下,其實 master/slave 架構的 MySQL 還是很罩啊,當初ㄙㄨㄚˋ地一下在幾分鐘內寫了四十幾億筆資料進去,MySQL 跟沒事一樣。原本還ㄏㄧㄠˊ咖稱想用 Cassandra,結果發現在現階段用 MySQL 就頂得住啦。

延伸閱讀:

MySQL system error codes

MySQL system error codes

Print all OS error codes and MySQL error codes using the perror command.

$ for i in {1..190..1}; do perror "$i"; done

OS error code   1:  Operation not permitted
OS error code   2:  No such file or directory
OS error code   3:  No such process
OS error code   4:  Interrupted system call
OS error code   5:  Input/output error
OS error code   6:  No such device or address
OS error code   7:  Argument list too long
OS error code   8:  Exec format error
OS error code   9:  Bad file descriptor
OS error code  10:  No child processes
OS error code  11:  Resource temporarily unavailable
OS error code  12:  Cannot allocate memory
OS error code  13:  Permission denied
OS error code  14:  Bad address
OS error code  15:  Block device required
OS error code  16:  Device or resource busy
OS error code  17:  File exists
OS error code  18:  Invalid cross-device link
OS error code  19:  No such device
OS error code  20:  Not a directory
OS error code  21:  Is a directory
OS error code  22:  Invalid argument
OS error code  23:  Too many open files in system
OS error code  24:  Too many open files
OS error code  25:  Inappropriate ioctl for device
OS error code  26:  Text file busy
OS error code  27:  File too large
OS error code  28:  No space left on device
OS error code  30:  Read-only file system
OS error code  31:  Too many links
OS error code  32:  Broken pipe
OS error code  33:  Numerical argument out of domain
OS error code  34:  Numerical result out of range
OS error code  35:  Resource deadlock avoided
OS error code  36:  File name too long
OS error code  37:  No locks available
OS error code  38:  Function not implemented
OS error code  39:  Directory not empty
OS error code  40:  Too many levels of symbolic links
OS error code  42:  No message of desired type
OS error code  43:  Identifier removed
OS error code  44:  Channel number out of range
OS error code  45:  Level 2 not synchronized
OS error code  46:  Level 3 halted
OS error code  47:  Level 3 reset
OS error code  48:  Link number out of range
OS error code  49:  Protocol driver not attached
OS error code  50:  No CSI structure available
OS error code  51:  Level 2 halted
OS error code  52:  Invalid exchange
OS error code  53:  Invalid request descriptor
OS error code  54:  Exchange full
OS error code  55:  No anode
OS error code  56:  Invalid request code
OS error code  57:  Invalid slot
OS error code  59:  Bad font file format
OS error code  60:  Device not a stream
OS error code  61:  No data available
OS error code  62:  Timer expired
OS error code  63:  Out of streams resources
OS error code  64:  Machine is not on the network
OS error code  65:  Package not installed
OS error code  66:  Object is remote
OS error code  67:  Link has been severed
OS error code  68:  Advertise error
OS error code  69:  Srmount error
OS error code  70:  Communication error on send
OS error code  71:  Protocol error
OS error code  72:  Multihop attempted
OS error code  73:  RFS specific error
OS error code  74:  Bad message
OS error code  75:  Value too large for defined data type
OS error code  76:  Name not unique on network
OS error code  77:  File descriptor in bad state
OS error code  78:  Remote address changed
OS error code  79:  Can not access a needed shared library
OS error code  80:  Accessing a corrupted shared library
OS error code  81:  .lib section in a.out corrupted
OS error code  82:  Attempting to link in too many shared libraries
OS error code  83:  Cannot exec a shared library directly
OS error code  84:  Invalid or incomplete multibyte or wide character
OS error code  85:  Interrupted system call should be restarted
OS error code  86:  Streams pipe error
OS error code  87:  Too many users
OS error code  88:  Socket operation on non-socket
OS error code  89:  Destination address required
OS error code  90:  Message too long
OS error code  91:  Protocol wrong type for socket
OS error code  92:  Protocol not available
OS error code  93:  Protocol not supported
OS error code  94:  Socket type not supported
OS error code  95:  Operation not supported
OS error code  96:  Protocol family not supported
OS error code  97:  Address family not supported by protocol
OS error code  98:  Address already in use
OS error code  99:  Cannot assign requested address
OS error code 100:  Network is down
OS error code 101:  Network is unreachable
OS error code 102:  Network dropped connection on reset
OS error code 103:  Software caused connection abort
OS error code 104:  Connection reset by peer
OS error code 105:  No buffer space available
OS error code 106:  Transport endpoint is already connected
OS error code 107:  Transport endpoint is not connected
OS error code 108:  Cannot send after transport endpoint shutdown
OS error code 109:  Too many references: cannot splice
OS error code 110:  Connection timed out
OS error code 111:  Connection refused
OS error code 112:  Host is down
OS error code 113:  No route to host
OS error code 114:  Operation already in progress
OS error code 115:  Operation now in progress
OS error code 116:  Stale NFS file handle
OS error code 117:  Structure needs cleaning
OS error code 118:  Not a XENIX named type file
OS error code 119:  No XENIX semaphores available
OS error code 120:  Is a named type file
OS error code 121:  Remote I/O error
OS error code 122:  Disk quota exceeded
OS error code 123:  No medium found
OS error code 124:  Wrong medium type
OS error code 125:  Operation canceled
OS error code 126:  Required key not available
OS error code 127:  Key has expired
OS error code 128:  Key has been revoked
OS error code 129:  Key was rejected by service
OS error code 130:  Owner died
OS error code 131:  State not recoverable
OS error code 132:  Operation not possible due to RF-kill
OS error code 133:  Memory page has hardware error
MySQL error code 120: Did not find key on read or update
MySQL error code 121: Duplicate key on write or update
MySQL error code 122: Internal (unspecified) error in handler
MySQL error code 123: Someone has changed the row since it was read (while the table was locked to prevent it)
MySQL error code 124: Wrong index given to function
MySQL error code 125: Undefined handler error 125
MySQL error code 126: Index file is crashed
MySQL error code 127: Record file is crashed
MySQL error code 128: Out of memory in engine
MySQL error code 129: Undefined handler error 129
MySQL error code 130: Incorrect file format
MySQL error code 131: Command not supported by database
MySQL error code 132: Old database file
MySQL error code 126: Index file is crashed
MySQL error code 127: Record-file is crashed
MySQL error code 128: Out of memory
MySQL error code 130: Incorrect file format
MySQL error code 131: Command not supported by database
MySQL error code 132: Old database file
MySQL error code 133: No record read before update
MySQL error code 134: Record was already deleted (or record file crashed)
MySQL error code 135: No more room in record file
MySQL error code 136: No more room in index file
MySQL error code 137: No more records (read after end of file)
MySQL error code 138: Unsupported extension used for table
MySQL error code 139: Too big row
MySQL error code 140: Wrong create options
MySQL error code 141: Duplicate unique key or constraint on write or update
MySQL error code 142: Unknown character set used in table
MySQL error code 143: Conflicting table definitions in sub-tables of MERGE table
MySQL error code 144: Table is crashed and last repair failed
MySQL error code 145: Table was marked as crashed and should be repaired
MySQL error code 146: Lock timed out; Retry transaction
MySQL error code 147: Lock table is full;  Restart program with a larger locktable
MySQL error code 148: Updates are not allowed under a read only transactions
MySQL error code 149: Lock deadlock; Retry transaction
MySQL error code 150: Foreign key constraint is incorrectly formed
MySQL error code 151: Cannot add a child row
MySQL error code 152: Cannot delete a parent row
MySQL error code 153: No savepoint with that name
MySQL error code 154: Non unique key block size
MySQL error code 155: The table does not exist in engine
MySQL error code 156: The table already existed in storage engine
MySQL error code 157: Could not connect to storage engine
MySQL error code 158: Unexpected null pointer found when using spatial index
MySQL error code 159: The table changed in storage engine
MySQL error code 160: There is no partition in table for the given value
MySQL error code 161: Row-based binlogging of row failed
MySQL error code 162: Index needed in foreign key constraint
MySQL error code 163: Upholding foreign key constraints would lead to a duplicate key error in some other table
MySQL error code 164: Table needs to be upgraded before it can be used
MySQL error code 165: Table is read only
MySQL error code 166: Failed to get next auto increment value
MySQL error code 167: Failed to set row auto increment value
MySQL error code 168: Unknown (generic) error from engine
MySQL error code 169: Record is the same
MySQL error code 170: It is not possible to log this statement
MySQL error code 171: The event was corrupt, leading to illegal data being read
MySQL error code 172: The table is of a new format not supported by this version
MySQL error code 173: The event could not be processed no other hanlder error happened
MySQL error code 174: Got a fatal error during initialzaction of handler
MySQL error code 175: File to short; Expected more data in file
MySQL error code 176: Read page with wrong checksum
MySQL error code 177: Too many active concurrent transactions
MySQL error code 178: Record not matching the given partition set
MySQL error code 179: Index column length exceeds limit
MySQL error code 180: Index corrupted
MySQL error code 181: Undo record too big
MySQL error code 182: Invalid InnoDB FTS Doc ID
MySQL error code 183: Table is being used in foreign key check
MySQL error code 184: Tablespace already exists
MySQL error code 185: Too many columns
MySQL error code 186: Row in wrong partition
MySQL error code 187: InnoDB is in read only mode
MySQL error code 188: FTS query exceeds result cache memory limit
MySQL error code 189: Temporary file write failure
MySQL error code 190: Operation not allowed when innodb_forced_recovery > 0
MySQL error code 191: Too many words in a FTS phrase or proximity search
MySQL error code 192: Foreign key cascade delete/update exceeds max depth
MySQL error code 193: Required Create option missing
MySQL error code 194: Out of memory in storage engine
MySQL error code 195: Table corrupted
MySQL error code 196: Query interrupted
MySQL error code 197: Tablespace cannot be accessed
MySQL error code 198: Tablespace is not empty
MySQL error code 199: Incorrect file name
MySQL error code 200: Operation is not allowed
MySQL error code 201: Compute generate value failed

ref:
http://man7.org/linux/man-pages/man3/perror.3.html

Convert utf8 tables to utf8mb4 in MySQL

Convert utf8 tables to utf8mb4 in MySQL

Change Schema

# For each database:
ALTER DATABASE
    database_name
    CHARACTER SET = utf8mb4
    COLLATE = utf8mb4_unicode_ci;

# For each table:
ALTER TABLE
    table_name
    CONVERT TO CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

# For each column:
ALTER TABLE
    table_name
    CHANGE column_name column_name
    VARCHAR(191)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

ALTER TABLE
    svcomments_svcomment
    CHANGE comment comment 
    longtext
    CHARACTER SET utf8mb4
    COLLATE utf8_general_ci;

SHOW VARIABLES WHERE Variable_name LIKE 'character%' OR Variable_name LIKE 'collation%';

# Don’t blindly copy-paste this!
# The exact statement depends on the column type, maximum length, and other properties.
# The above line is just an example for a `VARCHAR` column.

ref:
https://coderwall.com/p/pns4pa/setting-up-unicode-defaults-for-mariadb-or-mysql
https://mathiasbynens.be/notes/mysql-utf8mb4
http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
http://dba.stackexchange.com/questions/8239/how-to-easily-convert-utf8-tables-to-utf8mb4-in-mysql-5-5

Application Settings

Take a Django project as an example.

in settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        ...
        'OPTIONS': {
            'charset': 'utf8mb4',
        },
        'TEST': {
            'CHARSET': 'utf8mb4',
            'COLLATION': 'utf8mb4_general_ci',
        },
    },
}

ref:
https://tzangms.com/use-emoji-in-mysql-with-django/

Create a custom migration in Django

Create a custom migration in Django

# create an empty migration file
$ ./manage.py makemigrations --empty --name convert_to_utf8mb4 your_app

in your_app/migrations/0002_convert_to_utf8mb4.py

from __future__ import unicode_literals

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('your_app', '0001_initial'),
    ]

    operations = [
        migrations.RunSQL(
            'ALTER TABLE app_repostarring CHANGE repo_description repo_description VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
        ),
    ]

ref:
https://docs.djangoproject.com/en/dev/ref/migration-operations/#runsql

Locking and MVCC in MySQL InnoDB 鎖機制與多版本並發控制

Locking and MVCC in MySQL InnoDB 鎖機制與多版本並發控制

最近在研究 MySQL 5.6 / 5.7 InnoDB storage engine 的 Transaction、Locking 和 Multi-Version Concurrency Control (MVCC) 機制,基本上只討論 Isolation level 是 REPEATABLE READREAD COMMITTED 的情況,這一篇是過程中記錄下來的筆記。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-storage-engine.html

Transaction

Transaction Isolation Levels
https://vinta.ws/code/transaction-isolation-levels-in-databases.html

autocommit 啟用的情況下(默認),每一個 SQL 語句就是一個 transaction,但是你可以用 START TRANSACTIONBEGIN 來手動開啟一個 transaction,然後用 COMMITROLLBACK 來結束。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-autocommit-commit-rollback.html

Pessimistic Locking 悲觀鎖 / Optimistic Locking 樂觀鎖

更準確一點的說法是 pessimistic concurrency control 和 optimistic concurrency control。

基本上悲觀鎖、樂觀鎖是一種方法論(或者說是策略),在不同的系統可能有不同的作法,不過這裡就只討論 MySQL。

悲觀鎖假設你在存取某個資料時,其他人同時修改同一份資料的機率很高,也就是對「資料會被修改」這件事持悲觀的態度,所以要在存取(無論讀寫)時就把該資料鎖住,不讓其他人有機會修改。悲觀鎖通常指的是資料庫本身提供的 locking 機制(在 application 層的鎖沒辦法保證不會被修改)。SELECT ... FOR UPDATE; 就是一種悲觀鎖。

樂觀鎖則是對「資料會被修改」這件事持樂觀的態度,通常會使用 version 版本號的方式來實作,在 InnoDB 裡就是指 Multi-Version Concurrency Control (MVCC)。

You can remember which is which by picturing it this way:

  • The pessimistic approach is a user thinking "I’m sure someone will try to update this row at the same time as I will, so I better ask the database server to not let anyone touch it until I’m done with it."
  • The optimistic approach is a user thinking "I’m sure it’s all good and there will be no conflicts, so I’m just going to remember this row to double-check later, but I’ll let the database server do whatever it wants with the row."

ref:
http://stackoverflow.com/questions/129329/optimistic-vs-pessimistic-locking
https://blogs.msdn.microsoft.com/marcelolr/2010/07/16/optimistic-and-pessimistic-concurrency-a-simple-explanation/

Multi-Version Concurrency Control (MVCC)

在 InnoDB 裡是基本上是讀不加鎖,寫才加鎖,而且讀寫不互斥。

InnoDB 會把每個 row 的歷史版本儲存在 rollback segment 裡,rollback segment 的資料除了用在 transaction rollback 的 undo 操作外,會用被用在 consistent read,也就是 snapshot data。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-multi-versioning.html

在所有 InnoDB 的 tables 裡,每個 row 都會有三個隱藏的欄位 DB_ROW_IDDB_TRX_IDDB_ROLL_PTRDB_TRX_ID 是建立這個版本的數據的那個 transaction id、DB_ROLL_PTR 就是 rollback data pointer 指向在 undo logs 裡的前一個版本的數據。

TODO:
多說一點細節
例如 SELECT、INSERT、UPDATE 和 DELETE 時的行為

Consistent (Nonlocking) Reads

所謂的 consistent read 是指 read 操作實際上是去讀 snapshot data(該 row 在不同時間點的數據、版本),所以 read 操作不會因為該 row 正在被其他 transaction 修改而被 block 住,snapshot data 就是當前數據的歷史版本。

如果 isolation level 是 REPEATABLE READREAD COMMITTED 的話,SELECT 操作默認就是 consistent read,所以不會加上任何 lock。除非你顯式地加上 SELECT ... LOCK IN SHARE MODESELECT ... FOR UPDATE(稱為 locking reads)。

ref:
https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_consistent_read
https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_snapshot
https://blogs.oracle.com/mysqlinnodb/entry/repeatable_read_isolation_level_in

至於 INSERTUPDATEDELETE 操作,MySQL 則會自動對牽涉到的數據加上 lock。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html

不同的 isolation level 在讀 snapshot data 的行為上是不同的。

REPEATABLE READ 在同一個 transaction 內,對同一個 row 的所有 consistent read 讀到的內容都會是第一個 consistent read 時建立的那個 snapshot data,即使其他 transaction 已經 commit 了新的資料,所以才叫 repeatable read。不過如果是在同一個 transaction 內所做的修改則還是會去讀那個修改過的內容,因為是去讀 snapshot。而且 MySQL 的 REPEATABLE READ 還額外地避免了 phantom read。

READ COMMITTED 在同一個 transaction 內,每次去讀同一個 row 都會建立並且讀最新的一份 snapshot data(當然只會讀已經 commit 的資料),先後的 read 可能會讀到不同的內容,所以才會有 non-repeatable read 的問題。

SERIALIZABLE 是讀加讀鎖,寫加寫鎖,而且讀寫互斥。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html#isolevel_repeatable-read
https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html#isolevel_read-committed
https://blog.gslin.org/archives/2015/09/27/6007/%E7%B9%BC%E7%BA%8C%E5%AD%B8-isolation/

雖然 MySQL 默認的 isolation level 是 REPEATABLE READ,但是如果你的專案其實不太需要 repeatable reads 的話,用 READ COMMITTED 可能更合適,至少 lock 會比較少。不過 REPEATABLE READREAD COMMITTED 在效能上的差異似乎也沒有想像中那麼大就是了。

ref:
https://www.percona.com/blog/2015/01/14/mysql-performance-implications-of-innodb-isolation-modes/
https://smalldatum.blogspot.tw/2016/11/sysbench-innodb-transaction-isolation.html

Snapshot Read / Current Read

也有一種說法是把「讀」分成 snapshot read 和 current read,不過這種說法似乎只有在中國程序員寫的文章裡流傳,至少我沒有在官方的文件裡找到這兩個名詞。

snapshot read,就是去讀 snapshot data:

  • SELECT

current read,就是去讀最新版本的數據:

  • SELECT ... LOCK IN SHARE MODE
  • SELECT ... FOR UPDATE
  • INSERT
  • UPDATE
  • DELETE

ref:
http://hedengcheng.com/?p=771
http://tech.meituan.com/innodb-lock.html

Lock Modes

Shared Locks 共享鎖 / Exclusive Locks 排他鎖

shared lock (S lock),也稱為「共享鎖」、「讀鎖」。被加上讀鎖的數據,其他 transaction 可以讀,也可以再加上讀鎖,但是不能寫(因為不能加上寫鎖)。

可以用 SELECT ... LOCK IN SHARE MODE; 顯式地加上 shared lock。MySQL 會對被 SELECT 到的每個 row 加上 shared lock,當然前提是這些 row 沒有被加上 exclusive lock。

ref:
https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_shared_lock

exclusive lock (X lock),也稱為「排他鎖」、「寫鎖」。被加上寫鎖的數據,其他 transaction 可以讀(因為讀是讀 snapshot data),但是不能再加上任何鎖,所以不能寫(因為不能加上寫鎖),只有加上寫鎖的那個 transaction 才能寫。

可以用 SELECT ... FOR UPDATE; 顯式地加上 exclusive lock,MySQL 會對被 SELECT 到的每個 row 加上 exclusive lock。

ref:
https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_exclusive_lock

Intention Locks 意向鎖

分成 intention shared locks (IS) 和 intention exclusive locks (IX),用來表示某個 transaction 打算對某個 table 裡的某個 row 加上 shared lock 或 exclusive lock。更明確地說,某個 transaction 必須先取得 table 的 intention lock,它才能對該 table 裡的 row 加上 shared lock 或 exclusive lock。

intention locks 是 MySQL 自動加的,你不需要也沒辦法自己加上這種鎖。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html

另外還有一種 table-level locks,可以透過 LOCK TABLE 顯式地加上:

LOCK TABLES t1 READ;
LOCK TABLES t1 WRITE;

ref:
https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

Online DDL (Data Definition Language)

ALTER TABLE 操作可能也會有 table-level locks,在 ALTER TABLE 的過程中只能讀,寫則會被 block 直到 ALTER TABLE 完成。不過這點在 MySQL 5.6 / 5.7 以後改善了不少,例如可以用 ALTER TABLE ... ALGORITHM=INPLACE LOCK=NONE; 來指定不同的行為。

ref:
https://dev.mysql.com/doc/refman/5.7/en/alter-table.html
https://dev.mysql.com/doc/refman/5.7/en/innodb-online-ddl.html

Summary of Online Status for DDL Operations
https://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-overview.html

Lock Types

Locks Set by Different SQL Statements in InnoDB
https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html

Record Locks

record lock 就是加在 index 上的 lock,InnoDB 裡的 row-level lock 實際上是 index 的 record lock。使用 READ COMMITTED 的時候,locking read(即 LOCK IN SHARE MODEFOR UPDATE)、UPDATEDELETE 的 lock 都是 record lock。

TODO:
如果該 table 沒有任何 index

READ COMMITTED 只有 record lock,沒有 gap lock 和 next-key lock,不過 foreign-key constraint 和 duplicate-key 檢查還是會用到 gap lock。

REPEATABLE READ 有 record lock、gap lock 和 next-key lock。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-record-locks
https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html#isolevel_read-committed

Gap Locks

gap lock 是加在 index records 之間的 gap 上的 lock。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-gap-locks

Next-Key Locks

next-key lock 是 record lock + gap lock,next-key lock 可以用來避免 phantom read。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-next-key-locks
https://dev.mysql.com/doc/refman/5.7/en/innodb-next-key-locking.html

Insert Intention Locks

insert intention lock 是 gap lock 的一種,INSERT 之前會先加一個 insert intention lock,但是只要不是要 INSERT 到同一個位置上就不會 block 其他 transaction。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-insert-intention-locks

AUTO-INC Locks

auto-inc lock 是一種 table-level lock,不同種類的 insert-like 語句會有不同的行為。可以透過 innodb_autoinc_lock_mode 設定來改變這些行為。

innodb_autoinc_lock_mode = 1(預設值)的情況下,simple inserts(就是那種在執行之前就知道最後會產生幾筆新資料的 inserts)不會有 table-level 的 AUTO-INC lock,只有在取得 auto-increment 值的那瞬間會有 lock。

ref:
https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-auto-inc-locks
https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html
https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_autoinc_lock_mode