Create a custom migration in Django

Create a custom migration in Django

# create an empty migration file
$ ./manage.py makemigrations --empty --name convert_to_utf8mb4 your_app

in your_app/migrations/0002_convert_to_utf8mb4.py

from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('your_app', '0001_initial'),
    ]

    operations = [
        migrations.RunSQL(
            'ALTER TABLE app_repostarring CHANGE repo_description repo_description VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
        ),
    ]

ref:
https://docs.djangoproject.com/en/dev/ref/migration-operations/#runsql

South: Database migration for Django 1.6

South is a tool to provide consistent, easy-to-use and database-agnostic migrations for Django applications. You should use the built-in migration tool in Django 1.7+ instead of South.

ref:
https://south.readthedocs.org/en/latest/

https://docs.djangoproject.com/en/dev/topics/migrations/

Configuration

south 加入 INSTALLED_APPS 之後:

# 你可以在 settings.py 加上這個設定,這樣跑測試的時候就會直接 syncdb
SOUTH_TESTS_MIGRATE = False

Usage

$ python manage.py syncdb --all

第一次使用,尚未修改 models.py 時

$ python manage.py schemamigration your_app --initial

會在該 app 底下產生一個 migrations 目錄
裡頭會有一個 0001_initial.py

修改 models.py 後

$ python manage.py schemamigration your_app --auto

會產生 0002_auto__xxx.py

開始 migrate:

# 你可以先看一下要執行的 SQL
$ python manage.py migrate your_app --db-dry-run --verbosity=2

$ python manage.py migrate your_app

# 一口氣 migrate 所有需要 migrate 的 models
$ python manage.py migrate

Issues

DatabaseError: (1050, "Table 'YOUR_TABLE' already exists") 之一

表示資料庫中沒有 south 的 migrate 資料
但是該 app 的 table 已經被建立了
可能是因為之前已經 syncdb 過
之後才把 south 加入到 INSTALLED_APPS

則:

$ python manage.py migrate your_app --fake
$ python manage.py migrate

DatabaseError: (1050, "Table 'YOUR_TABLE' already exists") 之二

假設你有三個 migration 資料,分別是 0001、0002 和 0003
0003 是新產生的 migration,還沒被 migrate 過
就先 fake 到前一次的 migration
再真的 migrate 到最新的那一次 migration

則:

$ python manage.py migrate your_app 0002 --fake
$ python manage.py migrate your_app

GhostMigrations: These migrations are in the database but not on disk

$ python manage.py migrate --delete-ghost-migrations

NoMigrations: Application '' has no migrations

$ python manage.py reset south

Show migration SQL

# 先回到上一個 migration
$ ./manage.py migrate music --fake 0010

$ ./manage.py migrate music --db-dry-run --verbosity=2

$ ./manage.py migrate music 0011
# or
$ ./manage.py migrate music --fake 0011
Database Routers in Django

Database Routers in Django

把部分 models / tables 獨立到一台資料庫

Database Router

沒辦法在 Model class 裡指定這個 model 只能用某個 database
而是要用 database router
就是判斷 model._meta.app_label == 'xxx' 的時候指定使用某一個 database
database 是指定義在 settings.DATABASES 的那些

不過 django 不支援跨 database 的 model relation
你不能用 foreign key 或 m2m 指向另一個 database 裡的 model
但是其實你直接用 user_id, song_id 之類的 int 欄位來記錄就好了

in settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'main',
        'USER': 'whatever',
        'PASSWORD': 'whatever',
        'HOST': '',
        'PORT': '',
    },
    'warehouse': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'warehouse',
        'USER': 'whatever',
        'PASSWORD': 'whatever',
        'HOST': '',
        'PORT': '',
    },
}

DATABASE_ROUTERS = [
    'app.db_routers.SVDatabaseRouter',
]

in app/db_routers.py

class SVWarehouseDBMixin(object):
    """
    CAUTION! 有這個 minxin 的 class 會被放到 warehouse 這台資料庫!

    原本是想繼承 models.Model 用 abstract = True
    但是載入 db router 的時候會衝突(待研究)
    只好改成 mixin
    """

    pass


class SVDatabaseRouter(object):

    def db_for_read(self, model, **hints):
        if issubclass(model, SVWarehouseDBMixin):
            return 'warehouse'

        return 'default'

    def db_for_write(self, model, **hints):
        if issubclass(model, SVWarehouseDBMixin):
            return 'warehouse'

        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        if isinstance(obj1, SVWarehouseDBMixin) and isinstance(obj2, SVWarehouseDBMixin):
            return True
        elif (not isinstance(obj1, SVWarehouseDBMixin)) and (not isinstance(obj2, SVWarehouseDBMixin)):
            return True

        return False

    def allow_syncdb(self, db, model):
        if db == 'default':
            if issubclass(model, SVWarehouseDBMixin):
                return False
            else:
                return True
        elif db == 'warehouse':
            if issubclass(model, SVWarehouseDBMixin):
                return True
            else:
                return False

        return False

ref:
https://docs.djangoproject.com/en/dev/topics/db/multi-db/

Models

繼承 SVWarehouseDBMixin 這個 minxin 的 class 會被放到 warehouse 這台資料庫!
所以 migrate 的時候要注意,記得在 migration 檔案裡加上:

from south.db import dbs
warehouse_db = dbs['warehouse']

這樣 south 才會在 warehouse 這台資料庫上建立 table
不然就是你自己手動去 CREATE TABLE

from app.db_routers import SVWarehouseDBMixin

class PlayRecord(SVWarehouseDBMixin, models.Model):
    song_id = models.IntegerField()
    user_id = models.IntegerField(null=True, blank=True)
    is_full = models.BooleanField(default=False)
    ip_address = models.IPAddressField()
    location = models.CharField(max_length=2)
    created_at = models.DateTimeField()

Migration

如果你要把舊有的 app 的 models 搬到另一台資料庫
但是 models 不動(還是放在本來的 app 底下)
你可能會需要 reset 整個 migration 紀錄
從頭開始建立一個新的 migration
因為 schema 會錯亂
所以還是建議新開一個 app 來放那些要搬到另一台資料庫的 models
這樣 database router 和 migration 都會比較單純

in app/migrations/0001_initial.py

$ pip install south==1.0.2

# syncdb 默認只會作用到 default 資料庫,你要明確指定要用哪個 database 才行
$ ./manage.py syncdb --noinput
$ ./manage.py syncdb --noinput --database=warehouse

# migrate 卻可以作用到其他資料庫
# 因為 migrate 哪個資料庫是 migration file 裡的 `db` 參數在決定的
$ ./manage.py migrate

ref:
http://stackoverflow.com/questions/7029228/is-using-multiple-databases-and-south-together-possible

Unit Tests

Django only flushes the default database at the start of each test run. If your setup contains multiple databases, and you have a test that requires every database to be clean, you can use the multi_db attribute on the test suite to request a full flush.

from django.test import TestCase

class YourBTestCase(TestCase):
    multi_db = True

    def setUp(self):
        do_shit()

ref:
https://docs.djangoproject.com/en/dev/topics/testing/tools/
https://docs.djangoproject.com/en/dev/topics/testing/advanced/#topics-testing-advanced-multidb

Django ORM

defer / only / values / values_list

# retrieve everything but the `body` field
posts = Post.objects.all().defer('body')

# retrieve only the `title` field
posts = Post.objects.all().only('title')

# retrieve a list of {'id': id} dictionaries
posts = Post.objects.all().values('id')

# retrieve a list of (id,) tuples
posts = Post.objects.all().values_list('id')

# retrieve a list of ids
posts = Post.objects.all().values_list('id', flat=True)

使用 defer()only()
你得到的還是一個 model object

使用 values()values_list()
你得到的會是 list 或 dictionaries

如果你只會存取特定的幾個欄位而不會用到 methods
建議你用 values()values_list()
這樣可以省下把資料初始化成 Django model instance 的時間

distinct()

因為 MySQL 不支援 distinct('some_field')
會報錯 NotImplementedError: DISTINCT ON fields is not supported by this database backend
但是你其實可以結合 values()distinct() 來達成差不多的結果

AlbumDistribute.objects.filter(status=AlbumPublishingStatus.PUBLISHED).distinct('album').count()
# NotImplementedError: DISTINCT ON fields is not supported by this database backend

AlbumDistribute.objects.filter(status=AlbumPublishingStatus.PUBLISHED).values('album').count()
# 107

AlbumDistribute.objects.filter(status=AlbumPublishingStatus.PUBLISHED).values('album').distinct().count()
# 30

cast a queryset to a list

queryset = User.objects.filter(is_staff=True)
for user in queryset:
    setattr(user, 'extra_field', 'whatever')

object_list = list(queryset)

當你 assign 了值之後,最好把 queryset 轉成 list
這樣可以避免在其他地方不小心執行了 object_list.all() 之類的操作
因為 callable attributes cause DB lookups every time
每次 object_list.all() 又會產生一個新的 queryset

ref:
https://docs.djangoproject.com/en/dev/topics/db/optimization/

order_by()

.order_by() 不加參數表示不要任何排序

SQL 的 GROUP BY

如果你只是希望 query 的結果能夠隱式地 group 在一起
其實用 .order_by() 就可以了
例如 .order_by('album', '-created_at')
http://stackoverflow.com/questions/629551/how-to-query-as-group-by-in-django

如果你需要用到 aggregation
就要用 .values()

from django.db.models import Sum

"""
每一首歌在每一週的播放次數之和
group by song
[
    {'sum_count': 123553, 'song': 261009L},
    {'sum_count': 81889, 'song': 295336L},
    {'sum_count': 78596, 'song': 206349L},
    ...
]
"""
PlayRecordWeekly.objects \
.filter(song__created_at__range=(start_date, end_date)) \
.values('song') \
.annotate(sum_count=Sum('count')) \
.order_by('-sum_count')[:10]


"""
每一首歌的總播放次數
group by song
[
    {'count_song': 26054, 'song': 194512L},
    {'count_song': 18281, 'song': 201436L},
    {'count_song': 17802, 'song': 295336L},
    ...
]
"""
PlayRecord.objects \
.filter(created_at__year=2014, created_at__month=11) \
.values('song') \
.annotate(count_song=Count('song')) \
.order_by('-count_song')[:10]


"""
每一個用戶的播放次數之和
group by user
"""
PlayRecordArchive.objects \
.filter(last_modified__year=2014) \
.values('user_id') \
.annotate(sum_count=Sum('count')) \
.filter(sum_count__gte=10000) \
.order_by('-sum_count')


"""
每一種曲風的播放次數之和
group by song__genre
"""
PlayRecord.objects \
.filter(user=user) \
.exclude(song__genre=0) \
.values('song__genre') \
.annotate(count_song_genre=Count('song__genre')) \
.order_by('-count_song_genre')

ref:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#values
http://fcamel-life.blogspot.tw/2010/04/django-group-by.html
http://blog.darkchoco.com/2011/12/20/group-by-in-django/
http://stackoverflow.com/questions/19101665/django-how-to-do-select-count-group-by-and-order-by

Q

所有的 .filter() 查詢都是 AND 操作
如果想要 OR 就得用 Q()

from django.db.models import Q
Q(question__startswith='What')

ref:
https://docs.djangoproject.com/en/dev/topics/db/queries/#complex-lookups-with-q-objects

F

你可以在 .filter() 裡用 F() 來表示同一個 model 的其他欄位
就是可以在資料庫層就直接運算
而不會把資料拿到 Python 層處理

不過跟 transaction.atomic() 一起用的時候有點問題
因為有可能資料還沒有真的被存進資料庫裡

from django.db.models import F

Entry.objects.filter(n_comments__gt=F('n_pingbacks'))

song.play_count = F('play_count') + 1
song.save(update_fields=['play_count', ])

ref:
https://docs.djangoproject.com/en/dev/topics/db/queries/#using-f-expressions-in-filters

aggregate, annotate

https://docs.djangoproject.com/en/dev/topics/db/aggregation/
aggregate 是「聚合」
a sum total of many heterogenous things taken together

annotate 是「註釋」
add explanatory notes to or supply with critical comments
https://docs.djangoproject.com/en/dev/ref/models/querysets/#annotate

annotate 作用在 queryset 裡的每一個 object
所以返回的還是一個一般的 queryset
只是每個 model instance 會有額外的 attribute 儲存 annotate 的結果

aggregate 作用在整個 queryset
通常會返回一個 dict
只包含你 aggregate 的結果
例如 model 裡某個欄位(例如 price)的總和

要訣就是 Avg, Sum, Count, Max, Min
能夠在 SQL 做的計算就在 SQL 做
不要拿到 Python 來做
https://docs.djangoproject.com/en/dev/ref/models/querysets/#aggregation-functions

ref:
https://docs.djangoproject.com/en/dev/topics/db/aggregation/
http://stackoverflow.com/questions/327807/django-equivalent-for-count-and-group-by/1317837
http://www.shellbye.com/blog/%E6%8A%80%E6%9C%AF%E4%B8%96%E7%95%8C/django-aggregate-annotate-%E8%AF%A6%E8%A7%A3/

Object Permission in Django

django

django 自己提供的 permission 系統是 "model permission"
那些 add, change, delete 的權限是認 model(也就是該 model 下的所有 objects)
因為這個 permission 系統是用在 admin 裡面的

如果要用 "object permission"
可以用 django-guardian
針對每個 object 設定不同的權限

ref:
https://docs.djangoproject.com/en/dev/topics/auth/default/#permissions-and-authorization
http://www.cnblogs.com/esperyong/archive/2012/12/20/2826690.html

django-guardian

如果是 superuser
對所有 permission 的 has_perm() 都會是 True

class Label(models.Model):
    """
    廠牌
    """

    user = models.ForeignKey(User, related_name='labels')
    name = models.CharField(_(u'name'), max_length=100)

    class Meta:
        verbose_name = pgettext_lazy(u'DPS model name', u'label')
        verbose_name_plural = pgettext_lazy(u'DPS model name', u'labels')
        permissions = (
            ('view_label', 'View label'),
        )

django 默認會幫每個 model 建立三種 permission(所謂的 codename)

  • add_modelname
  • change_modelname
  • delete_modelname
# permission format: app_label.codename
user.has_perm('sites.change_site')  # by django
user.has_perm('sites.change_site', site)  # by django-guadian

每個 model object 被建立之後是沒有 object permission 的

要手動建立
通常會透過 signal 的方式去做

from django.db.models.signals import post_save
from django.dispatch import receiver

from guardian.shortcuts import assign_perm

@receiver(post_save, sender=Label)
def create_label_permission(instance, **kwargs):
    if not kwargs['created']:
        return

    label = instance

    assign_perm('view_label', label.user, label)
    assign_perm('change_label', label.user, label)

get objects by permission

from guardian.shortcuts import get_objects_for_user

labels = get_objects_for_user(request.user, 'dps.view_label')

ref:
https://github.com/lukaszb/django-guardian
http://django-guardian.readthedocs.org/