Convert utf8 tables to utf8mb4 in MySQL

Convert utf8 tables to utf8mb4 in MySQL

# For each database:
ALTER DATABASE
    database_name
    CHARACTER SET = utf8mb4
    COLLATE = utf8mb4_unicode_ci;

# For each table:
ALTER TABLE
    table_name
    CONVERT TO CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

# For each column:
ALTER TABLE
    table_name
    CHANGE column_name column_name
    VARCHAR(191)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

SHOW VARIABLES LIKE '%character%';
SHOW VARIABLES LIKE '%collation%';

# Don’t blindly copy-paste this!
# The exact statement depends on the column type, maximum length, and other properties.
# The above line is just an example for a `VARCHAR` column.

ref:
http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
http://dba.stackexchange.com/questions/8239/how-to-easily-convert-utf8-tables-to-utf8mb4-in-mysql-5-5
https://mathiasbynens.be/notes/mysql-utf8mb4

Create a custom migration in Django

Create a custom migration in Django

# create an empty migration file
$ ./manage.py makemigrations --empty --name convert_to_utf8mb4 your_app

in your_app/migrations/0002_convert_to_utf8mb4.py

from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('your_app', '0001_initial'),
    ]

    operations = [
        migrations.RunSQL(
            'ALTER TABLE app_repostarring CHANGE repo_description repo_description VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
        ),
    ]

ref:
https://docs.djangoproject.com/en/dev/ref/migration-operations/#runsql

South: Database migration for Django 1.6

South is a tool to provide consistent, easy-to-use and database-agnostic migrations for Django applications. You should use the built-in migration tool in Django 1.7+ instead of South.

ref:
https://south.readthedocs.org/en/latest/

https://docs.djangoproject.com/en/dev/topics/migrations/

Configuration

south 加入 INSTALLED_APPS 之後:

# 你可以在 settings.py 加上這個設定,這樣跑測試的時候就會直接 syncdb
SOUTH_TESTS_MIGRATE = False

Usage

$ python manage.py syncdb --all

第一次使用,尚未修改 models.py 時

$ python manage.py schemamigration your_app --initial

會在該 app 底下產生一個 migrations 目錄
裡頭會有一個 0001_initial.py

修改 models.py 後

$ python manage.py schemamigration your_app --auto

會產生 0002_auto__xxx.py

開始 migrate:

# 你可以先看一下要執行的 SQL
$ python manage.py migrate your_app --db-dry-run --verbosity=2

$ python manage.py migrate your_app

# 一口氣 migrate 所有需要 migrate 的 models
$ python manage.py migrate

Issues

DatabaseError: (1050, "Table 'YOUR_TABLE' already exists") 之一

表示資料庫中沒有 south 的 migrate 資料
但是該 app 的 table 已經被建立了
可能是因為之前已經 syncdb 過
之後才把 south 加入到 INSTALLED_APPS

則:

$ python manage.py migrate your_app --fake
$ python manage.py migrate

DatabaseError: (1050, "Table 'YOUR_TABLE' already exists") 之二

假設你有三個 migration 資料,分別是 0001、0002 和 0003
0003 是新產生的 migration,還沒被 migrate 過
就先 fake 到前一次的 migration
再真的 migrate 到最新的那一次 migration

則:

$ python manage.py migrate your_app 0002 --fake
$ python manage.py migrate your_app

GhostMigrations: These migrations are in the database but not on disk

$ python manage.py migrate --delete-ghost-migrations

NoMigrations: Application '' has no migrations

$ python manage.py reset south

Show migration SQL

# 先回到上一個 migration
$ ./manage.py migrate music --fake 0010

$ ./manage.py migrate music --db-dry-run --verbosity=2

$ ./manage.py migrate music 0011
# or
$ ./manage.py migrate music --fake 0011
Database Routers in Django

Database Routers in Django

把部分 models / tables 獨立到一台資料庫

Database Router

沒辦法在 Model class 裡指定這個 model 只能用某個 database
而是要用 database router
就是判斷 model._meta.app_label == 'xxx' 的時候指定使用某一個 database
database 是指定義在 settings.DATABASES 的那些

不過 django 不支援跨 database 的 model relation
你不能用 foreign key 或 m2m 指向另一個 database 裡的 model
但是其實你直接用 user_id, song_id 之類的 int 欄位來記錄就好了

in settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'main',
        'USER': 'whatever',
        'PASSWORD': 'whatever',
        'HOST': '',
        'PORT': '',
    },
    'warehouse': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'warehouse',
        'USER': 'whatever',
        'PASSWORD': 'whatever',
        'HOST': '',
        'PORT': '',
    },
}

DATABASE_ROUTERS = [
    'app.db_routers.SVDatabaseRouter',
]

in app/db_routers.py

class SVWarehouseDBMixin(object):
    """
    CAUTION! 有這個 minxin 的 class 會被放到 warehouse 這台資料庫!

    原本是想繼承 models.Model 用 abstract = True
    但是載入 db router 的時候會衝突(待研究)
    只好改成 mixin
    """

    pass


class SVDatabaseRouter(object):

    def db_for_read(self, model, **hints):
        if issubclass(model, SVWarehouseDBMixin):
            return 'warehouse'

        return 'default'

    def db_for_write(self, model, **hints):
        if issubclass(model, SVWarehouseDBMixin):
            return 'warehouse'

        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        if isinstance(obj1, SVWarehouseDBMixin) and isinstance(obj2, SVWarehouseDBMixin):
            return True
        elif (not isinstance(obj1, SVWarehouseDBMixin)) and (not isinstance(obj2, SVWarehouseDBMixin)):
            return True

        return False

    def allow_syncdb(self, db, model):
        if db == 'default':
            if issubclass(model, SVWarehouseDBMixin):
                return False
            else:
                return True
        elif db == 'warehouse':
            if issubclass(model, SVWarehouseDBMixin):
                return True
            else:
                return False

        return False

ref:
https://docs.djangoproject.com/en/dev/topics/db/multi-db/

Models

繼承 SVWarehouseDBMixin 這個 minxin 的 class 會被放到 warehouse 這台資料庫!
所以 migrate 的時候要注意,記得在 migration 檔案裡加上:

from south.db import dbs
warehouse_db = dbs['warehouse']

這樣 south 才會在 warehouse 這台資料庫上建立 table
不然就是你自己手動去 CREATE TABLE

from app.db_routers import SVWarehouseDBMixin

class PlayRecord(SVWarehouseDBMixin, models.Model):
    song_id = models.IntegerField()
    user_id = models.IntegerField(null=True, blank=True)
    is_full = models.BooleanField(default=False)
    ip_address = models.IPAddressField()
    location = models.CharField(max_length=2)
    created_at = models.DateTimeField()

Migration

如果你要把舊有的 app 的 models 搬到另一台資料庫
但是 models 不動(還是放在本來的 app 底下)
你可能會需要 reset 整個 migration 紀錄
從頭開始建立一個新的 migration
因為 schema 會錯亂
所以還是建議新開一個 app 來放那些要搬到另一台資料庫的 models
這樣 database router 和 migration 都會比較單純

in app/migrations/0001_initial.py

$ pip install south==1.0.2

# syncdb 默認只會作用到 default 資料庫,你要明確指定要用哪個 database 才行
$ ./manage.py syncdb --noinput
$ ./manage.py syncdb --noinput --database=warehouse

# migrate 卻可以作用到其他資料庫
# 因為 migrate 哪個資料庫是 migration file 裡的 `db` 參數在決定的
$ ./manage.py migrate

ref:
http://stackoverflow.com/questions/7029228/is-using-multiple-databases-and-south-together-possible

Unit Tests

Django only flushes the default database at the start of each test run. If your setup contains multiple databases, and you have a test that requires every database to be clean, you can use the multi_db attribute on the test suite to request a full flush.

from django.test import TestCase

class YourBTestCase(TestCase):
    multi_db = True

    def setUp(self):
        do_shit()

ref:
https://docs.djangoproject.com/en/dev/topics/testing/tools/
https://docs.djangoproject.com/en/dev/topics/testing/advanced/#topics-testing-advanced-multidb

pt-online-schema-change

pt-online-schema-change can alter a table's structure without blocking reads or writes.

pt-online-schema-change works by creating an empty copy of the table to alter, modifying it as desired, and then copying rows from the original table into the new table. When the copy is complete, it moves away the original table and replaces it with the new one. Any modifications to data in the original tables during the copy will be reflected in the new table, because the tool creates triggers on the original table to update the corresponding rows in the new table.

Install

$ sudo apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A

# add this to `/etc/apt/sources.list`
# remember to replace `precise` with the name of your distribution
deb http://repo.percona.com/apt precise main
deb-src http://repo.percona.com/apt precise main

$ sudo apt-get update
$ sudo aptitude -V install libio-socket-ssl-perl libnet-ssleay-perl percona-toolkit

ref:
http://www.percona.com/doc/percona-server/5.6/installation/apt_repo.html

Usage

$ time pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--dry-run \
--alter "ADD COLUMN extra_data TEXT NULL" \
D=streetvoice,t=member_userprofile

# you must specify `execute` option explicitly to alter tables
$ time pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--execute \
--alter "ADD COLUMN extra_data TEXT NULL" \
D=streetvoice,t=member_userprofile

$ pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--execute \
--alter " \
DROP COLUMN get_sms, \
DROP COLUMN job_title, \
DROP COLUMN exp, \
DROP COLUMN non_member_message, \
DROP COLUMN fans_amount, \
DROP COLUMN new_fans, \
DROP COLUMN last_time_in_giftbox \
" \
D=streetvoice,t=member_userprofile

$ pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--execute \
--alter " \
ADD COLUMN playlist_id integer NULL AFTER user_id, \
ADD CONSTRAINT playlist_id_refs_id_1d572331 FOREIGN KEY (playlist_id) REFERENCES music_playlist (id), \
ADD INDEX music_playrecord_97bd5154 (playlist_id) \
" \
D=streetvoice,t=music_playrecord

# 2 cores, 2GB RAM
real 105m27.484s
user 0m8.541s
sys 0m1.414s

$ pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--execute \
--alter-foreign-keys-method=rebuild_constraints \
--alter " \
ADD COLUMN play_count integer UNSIGNED NOT NULL DEFAULT 0 AFTER buy_link \
" \
D=streetvoice,t=music_musicalbum

$ pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--execute \
--alter " \
DROP COLUMN starsigns \
" \
D=streetvoice,t=member_usertemp

$ pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--dry-run \
--alter " \
DROP COLUMN is_cc, \
DROP COLUMN cc_license \
" \
D=streetvoice,t=music_song

ref:
http://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html
http://www.percona.com/blog/2014/11/18/avoiding-mysql-alter-table-downtime/
http://www.cnblogs.com/zhoujinyi/p/3491059.html
http://huanghualiang.blog.51cto.com/6782683/1360873

alter-foreign-keys-method

如果你要修改的 table 有欄位是其他 table 的 foreign key
你必須明確地指定 alter-foreign-keys-method 參數
否則 pt-online-schema-change 會不讓你執行
因為會有 constraint 的問題()

官方文件說 rebuild_constraints 是比較建議的作法

$ time pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--execute \
--alter-foreign-keys-method=rebuild_constraints \
--alter "ADD COLUMN length int(10) UNSIGNED NOT NULL DEFAULT 0 AFTER file_size" \
D=streetvoice,t=music_song

# 2 cores, 2G RAM
real 2m57.298s
user 0m0.338s
sys 0m0.043s

ref:
http://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html#cmdoption-pt-online-schema-change--alter-foreign-keys-method

MariaDB Galera Cluster

This tool works fine with MariaDB Galera Cluster, only makes sure wsrep_OSU_method value is TOI.

ref:
http://www.severalnines.com/blog/online-schema-upgrade-mysql-galera-cluster-using-toi-method
http://galeracluster.com/documentation-webpages/schemaupgrades.html