Lazy evaluation in Django middlewares

Lazy evaluation in Django middlewares

Attach a lazily evaluated function as a property of request in a middleware.

from django.contrib.gis.geoip import GeoIP
from django.utils.functional import SimpleLazyObject

def get_country_code(request):
    g = GeoIP()
    location = g.country(request.META['REMOTE_ADDR'])
    country_code = location.get('country_code', 'TW')

    return country_code

class CountryAndSiteMiddleware(object):

    def process_request(self, request):
        request.COUNTRY_CODE = SimpleLazyObject(lambda: get_country_code(request))

Then you could use request.COUNTRY_CODE whenever you want.

Setup Jupyter and other Machine Learning tools on macOS

Setup Jupyter and other Machine Learning tools on macOS

Jupyter Notebook is an interactive environment for running code in the browser. It allows you to create interactive documents that contain live code, rich text elements and visualizations. It's also a widely used tool for Data Scientists to make prototypes or demonstrations.

ref:
http://jupyter.org/

Install

$ brew install freetype gcc libffi libpng openssl pkg-config
$ pip install -U 
  cython 
  numpy 
  scipy 
  matplotlib 
  bokeh 
  seaborn 
  scikit-learn 
  surprise 
  gensim 
  nltk 
  pandas 
  jupyter

$ pip install jupyter_contrib_nbextensions && 
  jupyter contrib nbextension install --user

# install kernels for Python 2 and 3
$ python2 -m pip install ipykernel && 
  python2 -m ipykernel install --user

# list kernels
$ jupyter kernelspec list

# remove kernel
$ jupyter kernelspec uninstall apache_toree_scala

# start your notebook server
$ jupyter notebook
$ jupyter notebook --ip 0.0.0.0 --allow-root --no-browser

ref:
https://ipython.readthedocs.io/en/latest/install/kernel_install.html
https://jupyter.readthedocs.io/en/latest/running.html#running
https://github.com/ipython-contrib/jupyter_contrib_nbextensions

Or you could just download Anaconda and install it.
https://www.continuum.io/downloads#osx

Configuration

# ~/.ipython/profile_default/ipython_config.py    
c = get_config()

c.InteractiveShell.ast_node_interactivity = 'all'

# c.InteractiveShellApp.matplotlib = 'notebook'
c.InteractiveShellApp.matplotlib = 'inline'

Usage

Automatic module reload

%load_ext autoreload
%autoreload 2
import your_module

ref:
https://blog.3blades.io/jupyter-notebook-little-known-tricks-b0866a558017

Show media in notebook

# show image
from IPython.display import Image
Image('iris.png')

# show pdf
from IPython.display import IFrame
IFrame('iris.pdf', width='100%', height=700)

Django integration with django-extension

$ python manage.py shell_plus --notebook

ref:
https://stackoverflow.com/questions/35483328/how-do-i-set-up-jupyter-ipython-notebook-for-django

IPython: the Python REPL interpreter

IPython: the Python REPL interpreter

IPython is a neat alternative to Python's built-in REPL (Read-Eval-Print Loop) interpreter, also a kernel of Jupyter.

ref:
https://ipython.org/
https://jupyter.org/

Useful Commands

# cheatsheet
%quickref

# show details of any objects (including modules, classes, functions and variables)
random?
os.path.join?
some_variable?

# show source code of any objects
os.path.join??

# show nothing for a function that is not implemented in Python
len??

# run some shell commands directly in IPython
pwd
ll
cd
cp
rm
mv
mkdir new_folder

# run any shell command with ! prefix
!ls
!ping www.google.com
!youtube-dl

# assign command output to a variable
contents = !ls
print(contents)

ref:
http://ipython.readthedocs.io/en/stable/interactive/tutorial.html

Magic Functions

# list all magic functions
%lsmagic

# run a Python script and load objects into current session
%run my_script.py

# run a profiling for multi-line code
%%timeit
array = []
for i in xrange(100):
    array.append(i)

# paste multi-line code
# you might not need them in IPython 5.0+, just paste your code
%cpaste

# explore objects
%pdoc some_object
%pdef some_object
%psource some_object
%pfile some_object

# if you call it after hitting an exception, it will automatically open ipdb at the point of the exception
%debug

# the In object is a list which keeps track of the commands in order
print(In)

# the Out object is a dictionary mapping input numbers to their outputs
print(Out)
print(Out[2], _2)

ref:
https://ipython.readthedocs.io/en/stable/interactive/magics.html
https://www.safaribooksonline.com/library/view/python-data-science/9781491912126/ch02.html

Issues

Reload Any Python Module

from your.project import your_module
your_module.run_shit(123)

# after you made some changes on your_module
from importlib import reload
reload(your_module)

your_module.run_shit(123)

ref:
https://stackoverflow.com/questions/5364050/reloading-submodules-in-ipython

Speed up Python and Node.js builds on Travis CI

Speed up Python and Node.js builds on Travis CI

Travis CI's caching archives all directories listed in the configuration and uploads them to Amazon S3. Cached contents are available to any build on the repository, including Pull Requests. For Python and Node.js projects, you could cache both site-packages and node_modules directories in every Travis CI build.

Here is an example of .travis.yml:

sudo: false

language: python

python:
  - "2.7"

node_js: 4

cache:
  directories:
    - $HOME/.cache/pip
    - $HOME/virtualenv/python2.7.9/lib/python2.7/site-packages
    - node_modules

before_install:
  - pip install -U pip

install:
  - pip install -r requirements.txt
  - pip install coverage --ignore-installed
  - npm install

script:
  - coverage run manage.py test

In my case, after applying these changes, the installation time of pip and npm reduces from 180 seconds to 5 seconds.

One thing should be mentioned here: Since we didn't specify any bin folder in the configuration (and I don't think that's necessary), any executable file that is installed by pip such as coverage or django-admin.py will not exist in subsequent builds. If you need those commands, you could just force install them by adding pip install some_package --ignore-installed.

ref:
https://docs.travis-ci.com/user/caching/
https://stackoverflow.com/questions/19422229/how-to-cache-requirements-for-a-django-project-on-travis-ci
https://tzangms.com/how-to-speed-up-python-unit-test-on-travis-ci/

Django's get_or_create() may raise IntegrityError but subsequent get() raises DoesNotExist

Django's get_or_create() may raise IntegrityError but subsequent get() raises DoesNotExist

Django 的 SomeModel.objects.get_or_create() 實際上的步驟是:

  1. get
  2. if problem: save + some_trickery
  3. if still problem: get again
  4. if still problem: surrender and raise

get_or_create() 拋出 IntegrityError 但是接著 get() 一次卻會是 DoesNotExist,這個錯誤常常會發生在 MySQL 的 AUTOCOMMIT=OFF 且 isolation level 是 REPEATABLE READ 的情況下。MySQL 默認採用 REPEATABLE READ(而 PostgreSQL 預設是 READ COMMITTED),而 REPEATABLE READ 的定義是在同一個 transaction 之內 執行相同的 SELECT 一定會拿到相同的結果。

with transaction.commit_on_success():
    try:
        user, created = User.objects.get_or_create(id=self.id, username=self.username)
    except IntegrityError:
        # 這個 get() 可能會發生 DoesNotExist
        user = User.objects.get(id=self.id)

會發生那個 DoesNotExist 通常是因為:

  1. thread a 執行 get(),但是 DoesNotExist
  2. 所以 thread a 接著 create(),但是在資料建立之前
  3. thread b 幾乎在同一個時間點也執行 get(),也是 DoesNotExist
  4. thread a 成功地 create()
  5. 因為在 3 拿不到資料,所以 thread b 也執行 create(),但是因為 UNIQUE CONSTRAINT,導致失敗了
  6. 所以 thread b 又會 get() 一次,但是因為 REPEATABLE READ,thread b 不會拿到 thread a 建立的資料,所以 DoesNotExist
  7. 所以那個 User.DoesNotExist 就是 thread b 拋出來的錯誤
  8. thread a 是成功的,你事後去資料庫裡查詢,會看到那個 User 的資料好端端地在那裡

ref:
http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django
https://blog.ionelmc.ro/2014/12/28/terrible-choices-mysql/

解決的方法,除了就乾脆不要在同一個 transaction 裡之外,基本上就只能把 isolation level 改成 READ COMMITTED 了。似乎沒有其他辦法了,因為這就是 isolation level 的定義。

As of MariaDB/MySQL 5.1, if you use READ COMMITTED or enable innodb_locks_unsafe_for_binlog, you must use row-based binary logging.

SHOW VARIABLES LIKE 'binlog_format';
SET SESSION binlog_format = 'ROW';
SET GLOBAL binlog_format = 'ROW';

ref:
http://dba.stackexchange.com/questions/2678/how-do-i-show-the-binlog-format-on-a-mysql-server

Django 和 Celery 都建議使用 READ COMMITTED
https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
https://code.djangoproject.com/ticket/13906
https://code.djangoproject.com/ticket/6641
http://docs.celeryproject.org/en/latest/faq.html#mysql-is-throwing-deadlock-errors-what-can-i-do

方法一

SELECT @@GLOBAL.tx_isolation, @@tx_isolation;

# MySQL
SET SESSION tx_isolation='READ-COMMITTED';
SET GLOBAL tx_isolation='READ-COMMITTED';

# MariaDB
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;

ref:
http://dev.mysql.com/doc/refman/5.5/en/set-transaction.html

方法二

/etc/mysql/my.cnf

[mysqld]
transaction-isolation = READ-COMMITTED

方法三

in settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        ...
        'OPTIONS': {
            'init_command': 'SET SESSION tx_isolation="READ-COMMITTED"',
        },
    }
}

方法四

自己寫一個 get_or_create()

from django.db import transaction

@transaction.atomic
# or
@transaction.commit_on_success
def my_get_or_create(...):
    try:
        obj = MyObj.objects.create(...)
    except IntegrityError:
        transaction.commit()
        obj = MyObj.objects.get(...)
    return obj

ref:
http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django

MySQL 默認是 AUTOCOMMIT
http://dev.mysql.com/doc/refman/5.5/en/commit.html

Django 默認也是 AUTOCOMMIT,除非你使用了 TransactionMiddleware(Django 1.5 以前)或是設置 ATOMIC_REQUESTS = False(Django 1.6 以後)。TransactionMiddleware 的效果類似 @transaction.commit_on_success@transaction.atomic,都是把 AUTOCOMMIT 關掉,差別在於範圍不同。

ref:
http://django.readthedocs.io/en/1.5.x/topics/db/transactions.html#django-s-default-transaction-behavior
http://django.readthedocs.io/en/1.6.x/topics/db/transactions.html#managing-autocommit
http://django.readthedocs.io/en/1.6.x/ref/settings.html#std:setting-DATABASE-ATOMIC_REQUESTS