Django's get_or_create() may raise IntegrityError but subsequent get() raises DoesNotExist

Django's get_or_create() may raise IntegrityError but subsequent get() raises DoesNotExist

Django 的 SomeModel.objects.get_or_create() 實際上的步驟是:

  1. get
  2. if problem: save + some_trickery
  3. if still problem: get again
  4. if still problem: surrender and raise

get_or_create() 拋出 IntegrityError 但是接著 get() 一次卻會是 DoesNotExist,這個錯誤常常會發生在 MySQL 的 AUTOCOMMIT=OFF 且 isolation level 是 REPEATABLE READ 的情況下。MySQL 默認採用 REPEATABLE READ(而 PostgreSQL 預設是 READ COMMITTED),而 REPEATABLE READ 的定義是在同一個 transaction 之內 執行相同的 SELECT 一定會拿到相同的結果。

with transaction.commit_on_success():
    try:
        user, created = User.objects.get_or_create(id=self.id, username=self.username)
    except IntegrityError:
        # 這個 get() 可能會發生 DoesNotExist
        user = User.objects.get(id=self.id)

會發生那個 DoesNotExist 通常是因為:

  1. thread a 執行 get(),但是 DoesNotExist
  2. 所以 thread a 接著 create(),但是在資料建立之前
  3. thread b 幾乎在同一個時間點也執行 get(),也是 DoesNotExist
  4. thread a 成功地 create()
  5. 因為在 3 拿不到資料,所以 thread b 也執行 create(),但是因為 UNIQUE CONSTRAINT,導致失敗了
  6. 所以 thread b 又會 get() 一次,但是因為 REPEATABLE READ,thread b 不會拿到 thread a 建立的資料,所以 DoesNotExist
  7. 所以那個 User.DoesNotExist 就是 thread b 拋出來的錯誤
  8. thread a 是成功的,你事後去資料庫裡查詢,會看到那個 User 的資料好端端地在那裡

ref:
http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django
https://blog.ionelmc.ro/2014/12/28/terrible-choices-mysql/

解決的方法,除了就乾脆不要在同一個 transaction 裡之外,基本上就只能把 isolation level 改成 READ COMMITTED 了。似乎沒有其他辦法了,因為這就是 isolation level 的定義。

As of MariaDB/MySQL 5.1, if you use READ COMMITTED or enable innodb_locks_unsafe_for_binlog, you must use row-based binary logging.

SHOW VARIABLES LIKE 'binlog_format';
SET SESSION binlog_format = 'ROW';
SET GLOBAL binlog_format = 'ROW';

ref:
http://dba.stackexchange.com/questions/2678/how-do-i-show-the-binlog-format-on-a-mysql-server

Django 和 Celery 都建議使用 READ COMMITTED
https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
https://code.djangoproject.com/ticket/13906
https://code.djangoproject.com/ticket/6641
http://docs.celeryproject.org/en/latest/faq.html#mysql-is-throwing-deadlock-errors-what-can-i-do

方法一

SELECT @@GLOBAL.tx_isolation, @@tx_isolation;

# MySQL
SET SESSION tx_isolation='READ-COMMITTED';
SET GLOBAL tx_isolation='READ-COMMITTED';

# MariaDB
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;

ref:
http://dev.mysql.com/doc/refman/5.5/en/set-transaction.html

方法二

/etc/mysql/my.cnf

[mysqld]
transaction-isolation = READ-COMMITTED

方法三

in settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        ...
        'OPTIONS': {
            'init_command': 'SET SESSION tx_isolation="READ-COMMITTED"',
        },
    }
}

方法四

自己寫一個 get_or_create()

from django.db import transaction

@transaction.atomic
# or
@transaction.commit_on_success
def my_get_or_create(...):
    try:
        obj = MyObj.objects.create(...)
    except IntegrityError:
        transaction.commit()
        obj = MyObj.objects.get(...)
    return obj

ref:
http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django

MySQL 默認是 AUTOCOMMIT
http://dev.mysql.com/doc/refman/5.5/en/commit.html

Django 默認也是 AUTOCOMMIT,除非你使用了 TransactionMiddleware(Django 1.5 以前)或是設置 ATOMIC_REQUESTS = False(Django 1.6 以後)。TransactionMiddleware 的效果類似 @transaction.commit_on_success@transaction.atomic,都是把 AUTOCOMMIT 關掉,差別在於範圍不同。

ref:
http://django.readthedocs.io/en/1.5.x/topics/db/transactions.html#django-s-default-transaction-behavior
http://django.readthedocs.io/en/1.6.x/topics/db/transactions.html#managing-autocommit
http://django.readthedocs.io/en/1.6.x/ref/settings.html#std:setting-DATABASE-ATOMIC_REQUESTS

Configure i18n translation in Django

Configure i18n translation in Django

Internationalization and localization for your Django project.

Configuration

in settings.py

USE_I18N = True

# Django 1.6 要用 'zh-tw' 或 'zh-cn'
# Django 1.7+ 要用 'zh-hant' 或 'zh-hans'
LANGUAGE_CODE = 'zh-hant'

LANGUAGES = (
    ('en', 'English'),
    ('zh-hans', 'Simplified Chinese'),
    ('zh-hant', 'Traditional Chinese'),
)

LOCALE_PATHS = (
    os.path.join(BASE_DIR, 'locale/'),
)

# 要記得加上 LocaleMiddleware,才會針對 request headers 使用對應的語言
MIDDLEWARE_CLASSES = (
    ...
    'django.middleware.locale.LocaleMiddleware',
    ...
)

ref:
https://docs.djangoproject.com/en/dev/ref/settings/#languages
https://docs.djangoproject.com/en/dev/topics/i18n/translation/#how-django-discovers-language-preference

in urls.py

urlpatterns = patterns('',
    ...
    url(r'^i18n/', include('django.conf.urls.i18n')),
    ...
)

ref:
https://docs.djangoproject.com/en/dev/topics/i18n/translation/#the-set-language-redirect-view

Usage

在 models.py 中通常都會使用 ugettext_lazy() 而不是 ugettext()。因為 gettext_lazy() 其中的值是在被訪問的時候才翻譯,而不是在呼叫 gettext_lazy() 的時候就翻譯。另外,ugettext()ugettext_lazy() 出來的字串都是 unicode。pgettext() 的作用跟 ugettext() 一樣,只是多了一個參數可以傳 context 進去。

ref:
https://docs.djangoproject.com/en/dev/topics/i18n/translation/#lazy-translation
https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.translation.pgettext

要在 ugettext() 中使用 string format,必須要用以下的形式:

msg = _(u'您有 %(not_ready_count)s 項行程的尚未填寫「司機備忘錄」。') % {'not_ready_count': not_ready_count}

ref:
https://docs.djangoproject.com/en/dev/topics/i18n/translation/#internationalization-in-python-code

in base.html

{% load i18n %}

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Heelstagram</title>
  </head>
  <body>
    {% block content %}{% endblock %}

    <form action="/i18n/setlang/" method="POST">
      {% csrf_token %}
      <input name="next" type="hidden" value="{{ redirect_to }}" />
      <select name="language">
      {% get_language_info_list for LANGUAGES as LANGUAGES %}
      {% for language in LANGUAGES %}
        <option value="{{ language.code }}">{{ language.name_local }} ({{ language.code }})</option>
      {% endfor %}
      </select>
      <input type="submit" value="{% trans 'Change Language' %}" />
    </form>
  </body>
</html>

in home.html

{% extends 'base.html' %}

{# 即使 base.html 已經 load i18n 了,在每個 template 還是得在 load 一次 #}
{% load i18n %}

{% block content %}
<p>current language: "{{ LANGUAGE_CODE }}"</p>

{# 在 .po 中會表示為 msgid "string 1" #}
<p>1: {{ VAR_1 }} 會被翻譯</p>

{# 這兩行是等價的,在 .po 中會表示為 msgid "string 3" #}
<p>2: {% trans VAR_3 %} 會被翻譯</p>
<p>3: {% trans "string 3" %} 會被翻譯</p>

{# 在 .po 中會表示為: #}
{# msgctxt "說明的文字" #}
{# msgid "User" #}
<p>4: {% trans 'User' context '說明的文字' %} 會被翻譯</p>

<p>5: {{ VAR_3 }} 不會被翻譯</p>

{# 在 .po 中會表示為 msgid "%(VAR_2)s 這整句會被翻譯,包含 VAR_2 的值" #}
{# VAR_2 也會被翻譯是因為它等於 _('string 2') #}
{# blocktrans 中的變數不能是 artist.name 這種形式,必須用 artist_name 或是 with #}
<p>6: {% blocktrans %}{{ VAR_2 }} 這整句會被翻譯,包含 VAR_2 的值{% endblocktrans %}</p>

{# VAR_4 不會被翻譯是因為它等於 'string 4',只是一個單純的字串 #}
<p>7: {% blocktrans %}{{ VAR_4 }} 這整句會被翻譯,但是不包含 VAR_4 的值{% endblocktrans %}</p>
{% endblock %}

blocktrans 的作用是讓你在翻譯字串中插入 template contexts 變數。

ref:
https://docs.djangoproject.com/en/dev/topics/i18n/translation/#trans-template-tag
https://docs.djangoproject.com/en/dev/topics/i18n/translation/#internationalization-in-template-code

Create Language Files

要先安裝 gettext,否則會出現 /bin/sh: xgettext: command not found 的錯誤

# 必須手動建立 locale 目錄(放在 project 或 app 的根目錄)
$ mkdir -p locale

# 第一次執行,必須指定 locale name,不能直接用 `-a` 參數,這樣才會在 locale 底下產生相對應的目錄
# 注意!是 zh_TW 而不是 zh-tw
# 在專案根目錄執行 makemessages 就是對整個專案的所有 apps 產生翻譯檔案
$ ./manage.py makemessages -l en
$ ./manage.py makemessages -l ja
$ ./manage.py makemessages -l zh_CN
$ ./manage.py makemessages -l zh_TW
$ ./manage.py makemessages -l zh_Hans
$ ./manage.py makemessages -l zh_Hant
$ ./manage.py makemessages --all

# .js 的翻譯要額外指定 `-d djangojs` 才會產生 djangojs.po
$ ./manage.py makemessages -l zh_Hant -d djangojs --ignore=node_modules

# 如果你的 django.po 裡有些字串被標記為 `#, fuzzy` 的話,要記得刪掉,否則該字串不會被翻譯
$ ./manage.py compilemessages

# 只對特定 app 產生翻譯檔案
$ cd your_app
$ django-admin.py makemessages -l zh_Hant
$ django-admin.py compilemessages

ref:
https://docs.djangoproject.com/en/dev/topics/i18n/translation/#localization-how-to-create-language-files
https://docs.djangoproject.com/en/dev/ref/django-admin/#makemessages

每次 compilemessages 完要記得重啟 server

How Django discovers translations

django-admin.py makemessages -a 這個指令會去收集整個 project 裡的所有 apps 的 locale 字串。
優先權最高的是 LOCALE_PATHS 定義的那個目錄,找不到的話才會去找個別 app 之下的 locale 目錄。

ref:
https://docs.djangoproject.com/en/dev/topics/i18n/translation/#how-django-discovers-translations

強制使用某種語言

除了可以用上面那個 /i18n/setlang/ 的 form 表單之外,也可以在 views 裡面這樣寫:

from django.shortcuts import render_to_response
from django.template import RequestContext
from django.utils.translation import activate
from django.utils.translation import ugettext as _

def common_processor(request):
    contexts = {
        'T1': _('string 1'),
    }

    return contexts

def home(request):
    activate('zh-tw')  # 強制使用正體中文,覆蓋掉 user 和 browser 的設定

    contexts = {
        'T2': _('string 2'),
        'T3': 'string 3',
    }

    return render_to_response('home.html', contexts, RequestContext(request, processors=[common_processor]))

ref:
https://docs.djangoproject.com/en/dev/ref/utils/#module-django.utils.translation

JavaScript

in urls.py

urlpatterns = patterns(
    '',
    ...
    url(r'^jsi18n/$', 'django.views.i18n.javascript_catalog'),
    ...
)

in your_shit.html

<script src="{% url 'django.views.i18n.javascript_catalog' %}"></script>
<script>
    var text = gettext('要被翻譯的字串');
</script>

gettext() 而不是 ugettext()

# 要加上 -d djangojs 才會去 parse .js 裡的字串
$ ./manage.py makemessages --all -d djangojs
Upload files to Amazon S3 when Travis CI builds pass

Upload files to Amazon S3 when Travis CI builds pass

Assume that you want to upload a xxx.whl file generated by pip wheel to Amazon S3 so that you will be able to run pip install https://url/to/s3/bucket/xxx.whl.

CAUTION! By default, only master branch's builds could trigger deployments in Travis CI.

Configuration

before_install:
  - pip install -U pip
  - pip install wheel

script:
  - python setup.py test

before_deploy:
  - pip wheel --wheel-dir=wheelhouse .

deploy:
  provider: s3
  access_key_id: "YOUR_KEY"
  secret_access_key: "YOUR_SECRET"
  bucket: YOUR_BUCKET
  acl: public_read
  local_dir: wheelhouse
  upload_dir: wheels
  skip_cleanup: true
# install from an URL directly
$ pip install https://url/to/s3/bucket/wheels/xxx.whl

ref:
https://docs.travis-ci.com/user/deployment/s3

Setup a static website on Amazon S3

Setup a static website on Amazon S3

Say that you would like to host your static site on Amazon S3 with a custom domain and, of course, HTTPS.

Create two S3 buckets

To serve requests from both root domain such as codetengu.com and subdomain such as www.codetengu.com, you must create two buckets named exactly codetengu.com and www.codetengu.com.

In this post, I assume that you want to redirect www.codetengu.com to codetengu.com.

ref:
https://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-custom-domain-walkthrough.html

Upload your static files

$ cd /path/to/your_project_root/

$ aws s3 sync . s3://codetengu.com \
--acl "public-read" \
--exclude "*.DS_Store" \
--exclude "*.gitignore" \
--exclude ".git/*" \
--dryrun

$ aws s3 website s3://codetengu.com --index-document index.html --error-document error.html

ref:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

Setup bucket policy for public accessing

In your S3 Management Console, click codetengu.com bucket > Properties > Edit bucket policy, enter:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::codetengu.com/*"
        }
    ]
}

Setup www redirecting

In your S3 Management Console, click www.codetengu.com bucket > Properties > Static Website Hosting, choose Redirect all requests to another host name, type codetengu.com.

Now you're able to access your website via:

Configure a custom domain

In the "Setting Up a Static Website Using a Custom Domain" guide I mentioned above, it uses Amazon Route 53 to manage DNS records; In this post, I use CloudFlare as my website's DNS provider instead.

  • Create a CNAME for codetengu.com to point to codetengu.com.s3-website-ap-northeast-1.amazonaws.com
  • Create a CNAME for www.codetengu.com to point to codetengu.com.s3-website-ap-northeast-1.amazonaws.com

Yep, you CAN create a CNAME record for root domain on CloudFlare, just like your can add an "Alias" on Route 53.

Wait for the DNS records to propagate then visit https://codetengu.com/.

AWS Lambda Cookbook

AWS Lambda Cookbook

AWS Lambda is an event-driven service that you can upload your code to it and run those code on-demand without having your own servers.

ref:
http://aws.amazon.com/lambda/
http://docs.aws.amazon.com/lambda/latest/dg/limits.html

API Gateway 就是 URL routing
Lambda 則是那些 route (endpoint) 對應的 handler
如果你是用 event 或 schedule 的方式呼叫 Lambda function 的話
可以不用 API Gateway

AWS Lambda 有兩種 invocation type
一是 RequestResponse,同步(例如綁定 API Gateway 和你在 Lambda Management Console 操作的時候)
二是 Event,非同步

Runtimes

AWS Lambda supports the following runtime versions:

  • nodejs (Node v0.10)
  • nodejs4.3
  • java
  • python

ref:
http://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html

Node.js

const aws = require('aws-sdk');

exports.handle = (event, context, callback) => {
  doYourShit();
  callback(null, 'DONE');
};

每個 Lambda function 會接收三個參數 eventcontextcallback

event 是從外部的 input
可能是來自 S3 object event、DynamoDB stream 或是由 API Gateway POST 進來的 JSON payload

context 則會包含當前這個 Lambda fuction 的一些 metadata
例如 context.getRemainingTimeInMillis()

callback 參數只有 Node.js runtime v4.3 才支援
v0.10 的話得用 context.succeed()context.fail()context.done()
不過誰他媽還在用 Node.js v0.10

ref:
http://docs.aws.amazon.com/lambda/latest/dg/programming-model.html
http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-handler.html
http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html
http://docs.aws.amazon.com/lambda/latest/dg/best-practices.html

Calling another Lambda function in a Lambda function.

要注意的是
你的 Lambda function 的 role 得要有 invoke 其他 Lambda function 的權限才行

const util = require('util');

const aws = require('aws-sdk');

const params = {
  FunctionName: 'LambdaBaku_syncIssue',
  InvocationType: 'Event', // means asynchronous execution
  Payload: JSON.stringify({ issue_number: curatedIssue.number }),
};

lambda.invoke(params, (err, data) => {
  if (err) {
    console.log('FAIL', params);
    console.log(util.inspect(err));
  } else {
    console.log(data);
  }
});

ref:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lambda.html
http://stackoverflow.com/questions/31714788/can-an-aws-lambda-function-call-another

完整的程式碼放在 GitHub 上
https://github.com/CodeTengu/lambdabaku

Users and Roles

如果你是用 apex 來管理 Lambda functions 的話
確保你用的 AWS credential (User) 擁有 AWSLambdaFullAccessAWSLambdaRole 這兩個 permissions

以 project 為單位建立 Role 即可
例如 lambdabaku_role
你可以在 IAM Management Console 找到那些你建立的 roles
基本上用 Basic execution role 就夠了
反正之後可以隨時修改 Role 的 permission / policy
Lambda function 屬於哪個 VPC 是額外指定的
跟 Role 沒有關係
也就是說你用 Basic execution role 還是可以支援 VPC

如果想在 Lambda function 裡存取 DynamoDB
要記得在 Role 裡新增對應的設定

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Stmt1428341300017",
            "Effect": "Allow",
            "Action": [
                "dynamodb:*"
            ],
            "Resource": [
                "arn:aws:dynamodb:ap-northeast-1:004615714446:table/CodeTengu_Preference",
                "arn:aws:dynamodb:ap-northeast-1:004615714446:table/CodeTengu_WeeklyIssue",
                "arn:aws:dynamodb:ap-northeast-1:004615714446:table/CodeTengu_WeeklyPost"
            ]
        }
    ]
}

Scheduled Events

ref:
http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html

API Gateway

單純一點的話
Security 可以選 Open with access key
然後到 API Gateway 介面的 API Keys 底下新增一組 access key
然後分配一個 API stage 給它

使用的時候在 HTTP header 加上 x-api-key: YOUR_API_KEY 即可

ref:
http://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-api-keys.html

Related Projects

ref:
https://github.com/serverless/serverless
https://github.com/apex/apex
https://github.com/claudiajs/claudia
https://github.com/garnaat/kappa
https://github.com/Miserlou/Zappa
https://github.com/nficano/python-lambda

淺析 serverless 架構與實作
http://abalone0204.github.io/2016/05/22/serverless-simple-crud/

Deploy Lambda Functions via apex

$ curl https://raw.githubusercontent.com/apex/apex/master/install.sh | sh

$ apex deploy
$ apex invoke syncPublishedIssues --logs
$ echo -n '{"issue_number": 43}' | apex invoke syncIssue --logs

ref:
https://github.com/apex/apex
http://apex.run/