Tools for profiling your Python projects

Tools for profiling your Python projects

The first aim of profiling is to test a representative system to identify what's slow, using too much RAM, causing too much disk I/O or network I/O. You should keep in mind that profiling typically adds an overhead to your code.

In this post, I will introduce tools you could use to profile your Python or Django projects, including: timer, pycallgraph, cProfile, line-profiler, memory-profiler.

ref:
https://stackoverflow.com/questions/582336/how-can-you-profile-a-script
https://www.airpair.com/python/posts/optimizing-python-code

timer

The simplest way to profile a piece of code.

ref:
https://docs.python.org/3/library/timeit.html

pycallgraph

pycallgraph is a Python module that creates call graph visualizations for Python applications.

ref:
https://pycallgraph.readthedocs.org/en/latest/

$ sudo apt-get install graphviz
$ pip install pycallgraph
# in your_app/middlewares.py
from pycallgraph import Config
from pycallgraph import PyCallGraph
from pycallgraph.globbing_filter import GlobbingFilter
from pycallgraph.output import GraphvizOutput
import time

class PyCallGraphMiddleware(object):

    def process_view(self, request, callback, callback_args, callback_kwargs):
        if 'graph' in request.GET:
            config = Config()
            config.trace_filter = GlobbingFilter(include=['rest_framework.*', 'api.*', 'music.*'])
            graphviz = GraphvizOutput(output_file='pycallgraph-{}.png'.format(time.time()))
            pycallgraph = PyCallGraph(output=graphviz, config=config)
            pycallgraph.start()

            self.pycallgraph = pycallgraph

    def process_response(self, request, response):
        if 'graph' in request.GET:
            self.pycallgraph.done()

        return response
# in settings.py
MIDDLEWARE_CLASSES = (
    'your_app.middlewares.PyCallGraphMiddleware',
    ...
)
$ python manage.py runserver 0.0.0.0:8000
$ open http://127.0.0.1:8000/your_endpoint/?graph=true

cProfile

cProfile is a tool in Python's standard library to understand which functions in your code take the longest to run. It will give you a high-level view of the performance problem so you can direct your attention to the critical functions.

ref:
http://igor.kupczynski.info/2015/01/16/profiling-python-scripts.html
https://ymichael.com/2014/03/08/profiling-python-with-cprofile.html

$ python -m cProfile manage.py test member
$ python -m cProfile -o my-profile-data.out manage.py test --failtest
$ python -m cProfile -o my-profile-data.out manage.py runserver 0.0.0.0:8000

$ pip install cprofilev
$ cprofilev -f my-profile-data.out -a 0.0.0.0 -p 4000
$ open http://127.0.0.1:4000

cProfile with django-cprofile-middleware

$ pip install django-cprofile-middleware
# in settings.py
MIDDLEWARE_CLASSES = (
    ...
    'django_cprofile_middleware.middleware.ProfilerMiddleware',
)

Open any url with a ?prof suffix to do the profiling, for instance, http://localhost:8000/foo/?prof

ref:
https://github.com/omarish/django-cprofile-middleware

cProfile with django-extension and kcachegrind

kcachegrind is a profiling data visualization tool, used to determine the most time consuming execution parts of a program.

ref:
http://django-extensions.readthedocs.org/en/latest/runprofileserver.html

$ pip install django-extensions
# in settings.py
INSTALLED_APPS += (
    'django_extensions',
)
$ mkdir -p my-profile-data

$ python manage.py runprofileserver \
--noreload \
--nomedia \
--nostatic \
--kcachegrind \
--prof-path=my-profile-data \
0.0.0.0:8000

$ brew install qcachegrind --with-graphviz
$ qcachegrind my-profile-data/root.003563ms.1441992439.prof
# or
$ sudo apt-get install kcachegrind
$ kcachegrind my-profile-data/root.003563ms.1441992439.prof

cProfile with django-debug-toolbar

You're only able to use django-debug-toolbar if your view returns HTML, it needs a place to inject the debug panels into your DOM on the webpage.

ref:
https://github.com/django-debug-toolbar/django-debug-toolbar

$ pip install django-debug-toolbar
# in settiangs.py
INSTALLED_APPS += (
    'debug_toolbar',
)

DEBUG_TOOLBAR_PANELS = [
    ...
    'debug_toolbar.panels.profiling.ProfilingPanel',
    ...
]

line-profiler

line-profiler is a module for doing line-by-line profiling of functions. One of my favorite tools.

ref:
https://github.com/rkern/line_profiler

$ pip install line-profiler
# in your_app/views.py
def do_line_profiler(view=None, extra_view=None):
    import line_profiler

    def wrapper(view):
        def wrapped(*args, **kwargs):
            prof = line_profiler.LineProfiler()
            prof.add_function(view)
            if extra_view:
                [prof.add_function(v) for v in extra_view]
            with prof:
                resp = view(*args, **kwargs)
            prof.print_stats()
            return resp

        return wrapped

    if view:
        return wrapper(view)

    return wrapper

@do_line_profiler
def your_view(request):
    pass

ref:
https://djangosnippets.org/snippets/10483/

There is a pure Python alternative: pprofile.
https://github.com/vpelletier/pprofile

line-profiler with django-devserver

ref:
https://github.com/dcramer/django-devserver

$ pip install git+git://github.com/dcramer/django-devserver#egg=django-devserver

in settings.py

INSTALLED_APPS += (
    'devserver',
)

DEVSERVER_MODULES = (
    ...
    'devserver.modules.profile.LineProfilerModule',
    ...
)

DEVSERVER_AUTO_PROFILE = False

in your_app/views.py

from devserver.modules.profile import devserver_profile

@devserver_profile()
def your_view(request):
    pass

line-profiler with django-debug-toolbar-line-profiler

ref:
http://django-debug-toolbar.readthedocs.org/en/latest/
https://github.com/dmclain/django-debug-toolbar-line-profiler

$ pip install django-debug-toolbar django-debug-toolbar-line-profiler
# in settings.py
INSTALLED_APPS += (
    'debug_toolbar',
    'debug_toolbar_line_profiler',
)

DEBUG_TOOLBAR_PANELS = [
    ...
    'debug_toolbar_line_profiler.panel.ProfilingPanel',
    ...
]

memory-profiler

This is a Python module for monitoring memory consumption of a process as well as line-by-line analysis of memory consumption for Python programs.

ref:
https://pypi.python.org/pypi/memory_profiler

$ pip install memory-profiler psutil
# in your_app/views.py
from memory_profiler import profile

@profile(precision=4)
def your_view(request):
    pass

There are other options:
http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended

dogslow

ref:
https://bitbucket.org/evzijst/dogslow

django-slow-tests

ref:
https://github.com/realpython/django-slow-tests

django-debug-toolbar

django-debug-toolbar is a tool sets to display various debug information about the current request and response in Django, works perfectly in Python 2.

ref:
https://github.com/django-debug-toolbar/django-debug-toolbar

Install

$ pip install \
  django-debug-toolbar \
  django-debug-toolbar-line-profiler \
  django-debug-toolbar-template-profiler \
  django-debug-toolbar-template-timings \
  django-debug-panel \
  memcache-toolbar \
  pympler \
  git+https://github.com/scuml/debug-toolbar-mail

ref:
https://github.com/dmclain/django-debug-toolbar-line-profiler << 推薦
https://github.com/node13h/django-debug-toolbar-template-profiler
https://github.com/orf/django-debug-toolbar-template-timings << 推薦
https://github.com/recamshak/django-debug-panel
https://github.com/ross/memcache-debug-panel
https://pythonhosted.org/Pympler/django.html
https://github.com/scuml/debug-toolbar-mail

Python 3
https://github.com/lerela/django-debug-toolbar-line-profile

Configuration

in settings.py

INSTALLED_APPS += (
    'debug_toolbar',
    # 'debug_toolbar_line_profiler',
    # 'memcache_toolbar',
    # 'pympler',
    # 'template_profiler_panel',
    # 'template_timings_panel',
)

DEBUG_TOOLBAR_PANELS = [
    # 'debug_toolbar.panels.versions.VersionsPanel',
    # 'debug_toolbar.panels.timer.TimerPanel',
    # 'debug_toolbar.panels.settings.SettingsPanel',
    # 'debug_toolbar.panels.headers.HeadersPanel',
    # 'debug_toolbar.panels.request.RequestPanel',
    'debug_toolbar.panels.sql.SQLPanel',
    # 'debug_toolbar.panels.staticfiles.StaticFilesPanel',
    # 'debug_toolbar.panels.templates.TemplatesPanel',
    # 'template_timings_panel.panels.TemplateTimings.TemplateTimings',
    # 'template_profiler_panel.panels.template.TemplateProfilerPanel'
    # 'debug_toolbar.panels.cache.CachePanel',
    # 'memcache_toolbar.panels.memcache.MemcachePanel',
    # 'debug_toolbar.panels.profiling.ProfilingPanel',
    # 'debug_toolbar_line_profiler.panel.ProfilingPanel',
    # 'pympler.panels.MemoryPanel',
    # 'debug_toolbar.panels.signals.SignalsPanel',
    # 'debug_toolbar.panels.logging.LoggingPanel',
    # 'debug_toolbar.panels.redirects.RedirectsPanel',
]

if 'debug_toolbar' in INSTALLED_APPS:
    MIDDLEWARE_CLASSES = (
        'debug_toolbar.middleware.DebugToolbarMiddleware',
        ...
    )

INTERNAL_IPS = [
    '127.0.0.1',
]

ref:
http://django-debug-toolbar.readthedocs.org/en/latest/configuration.html
http://django-debug-toolbar.readthedocs.org/en/latest/panels.html

The templatetag {% include %} in Django

Loads a template and renders it with the current context.

# 會把 template context 也帶進去
{% include 'new_style/_bottom_action_btns' %}

{% for repost in recommend_creatives %}
    # 會把 repost 這個參數也帶進去!
    {% include 'new_style/creative_small.html' %}
{% endfor %}

# 但是你也可以額外指定要傳進去的 context
{% include 'new_style/_bottom_action_btns' with item=item repost=None %}

# 或是用 only 來限制只傳「顯式地指定的參數」進去
{% include 'new_style/_bottom_action_btns' with item=item only %}

ref:
https://docs.djangoproject.com/en/dev/ref/templates/builtins/#std:templatetag-include

Database Routers in Django

Database Routers in Django

把部分 models / tables 獨立到一台資料庫

Database Router

沒辦法在 Model class 裡指定這個 model 只能用某個 database
而是要用 database router
就是判斷 model._meta.app_label == 'xxx' 的時候指定使用某一個 database
database 是指定義在 settings.DATABASES 的那些

不過 django 不支援跨 database 的 model relation
你不能用 foreign key 或 m2m 指向另一個 database 裡的 model
但是其實你直接用 user_id, song_id 之類的 int 欄位來記錄就好了

in settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'main',
        'USER': 'whatever',
        'PASSWORD': 'whatever',
        'HOST': '',
        'PORT': '',
    },
    'warehouse': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'warehouse',
        'USER': 'whatever',
        'PASSWORD': 'whatever',
        'HOST': '',
        'PORT': '',
    },
}

DATABASE_ROUTERS = [
    'app.db_routers.SVDatabaseRouter',
]

in app/db_routers.py

class SVWarehouseDBMixin(object):
    """
    CAUTION! 有這個 minxin 的 class 會被放到 warehouse 這台資料庫!

    原本是想繼承 models.Model 用 abstract = True
    但是載入 db router 的時候會衝突(待研究)
    只好改成 mixin
    """

    pass


class SVDatabaseRouter(object):

    def db_for_read(self, model, **hints):
        if issubclass(model, SVWarehouseDBMixin):
            return 'warehouse'

        return 'default'

    def db_for_write(self, model, **hints):
        if issubclass(model, SVWarehouseDBMixin):
            return 'warehouse'

        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        if isinstance(obj1, SVWarehouseDBMixin) and isinstance(obj2, SVWarehouseDBMixin):
            return True
        elif (not isinstance(obj1, SVWarehouseDBMixin)) and (not isinstance(obj2, SVWarehouseDBMixin)):
            return True

        return False

    def allow_syncdb(self, db, model):
        if db == 'default':
            if issubclass(model, SVWarehouseDBMixin):
                return False
            else:
                return True
        elif db == 'warehouse':
            if issubclass(model, SVWarehouseDBMixin):
                return True
            else:
                return False

        return False

ref:
https://docs.djangoproject.com/en/dev/topics/db/multi-db/

Models

繼承 SVWarehouseDBMixin 這個 minxin 的 class 會被放到 warehouse 這台資料庫!
所以 migrate 的時候要注意,記得在 migration 檔案裡加上:

from south.db import dbs
warehouse_db = dbs['warehouse']

這樣 south 才會在 warehouse 這台資料庫上建立 table
不然就是你自己手動去 CREATE TABLE

from app.db_routers import SVWarehouseDBMixin

class PlayRecord(SVWarehouseDBMixin, models.Model):
    song_id = models.IntegerField()
    user_id = models.IntegerField(null=True, blank=True)
    is_full = models.BooleanField(default=False)
    ip_address = models.IPAddressField()
    location = models.CharField(max_length=2)
    created_at = models.DateTimeField()

Migration

如果你要把舊有的 app 的 models 搬到另一台資料庫
但是 models 不動(還是放在本來的 app 底下)
你可能會需要 reset 整個 migration 紀錄
從頭開始建立一個新的 migration
因為 schema 會錯亂
所以還是建議新開一個 app 來放那些要搬到另一台資料庫的 models
這樣 database router 和 migration 都會比較單純

in app/migrations/0001_initial.py

$ pip install south==1.0.2

# syncdb 默認只會作用到 default 資料庫,你要明確指定要用哪個 database 才行
$ ./manage.py syncdb --noinput
$ ./manage.py syncdb --noinput --database=warehouse

# migrate 卻可以作用到其他資料庫
# 因為 migrate 哪個資料庫是 migration file 裡的 `db` 參數在決定的
$ ./manage.py migrate

ref:
http://stackoverflow.com/questions/7029228/is-using-multiple-databases-and-south-together-possible

Unit Tests

Django only flushes the default database at the start of each test run. If your setup contains multiple databases, and you have a test that requires every database to be clean, you can use the multi_db attribute on the test suite to request a full flush.

from django.test import TestCase

class YourBTestCase(TestCase):
    multi_db = True

    def setUp(self):
        do_shit()

ref:
https://docs.djangoproject.com/en/dev/topics/testing/tools/
https://docs.djangoproject.com/en/dev/topics/testing/advanced/#topics-testing-advanced-multidb

Read and save file in Django / Python

File 和 ImageFile 接受 Python 的 file 或 StringIO 物件
而 ContentFile 接受 string

ref:
https://docs.djangoproject.com/en/dev/ref/files/file/#the-file-object

Django Form

image_file = request.FILES['file']

# 方法一
profile.mugshot.save(image_file.name, image_file)

# 方法二
profile.mugshot = image_file

profile.save()

ref:
File Upload with Form in Django

open('/path/to/file.png')

from django.core.files import File

with open('/home/vinta/image.png', 'rb') as f:
    profile.mugshot = File(f)
    profile.save()

Django ContentFile

import os
import uuid

from django.core.files.base import ContentFile

import requests

url = 'http://vinta.ws/static/photo.jpg'
r = requests.get(url)
file_url, file_ext = os.path.splitext(r.url)
file_name = '%s%s' % (str(uuid.uuid4()).replace('-', ''), file_ext)

profile.mugshot.save('123.png', ContentFile(r.content), save=False)

# 如果 profile.mugshot 是 ImageField 欄位的話
# 可以用以下的方式來判斷它是不是合法的圖檔
try:
    profile.mugshot.width
except TypeError:
    raise RuntimeError('圖檔格式不正確')

profile.save()

Data URI, Base64

from binascii import a2b_base64

from django.core.files.base import ContentFile

data_uri = 'data:image/jpeg;base64,/9j/4AAQSkZJRg....'
head, data = data_uri.split(',')
binary_data = a2b_base64(data)

# 方法一
profile.mugshot.save('whatever.jpg', ContentFile(binary_data), save=False)
profile.save()

# 不能用這種方式,因為少了 file name
profile.mugshot = ContentFile(binary_data)
profile.save()

# 方法二
f = open('image.png', 'wb')
f.write(binary_data)
f.close()

# 方法三
from StringIO import StringIO
from PIL import Image
img = Image.open(StringIO(binary_data))
print img.size

ref:
http://stackoverflow.com/questions/19395649/python-pil-create-and-save-image-from-data-uri

StringIO, PIL image

你就把 StringIO 想成是 open('/home/vinta/some_file.txt', 'rb') 的 file 物件

from StringIO import StringIO

from PIL import Image
import requests

r = requests.get('http://vinta.ws/static/photo.jpg')
img = Image.open(StringIO(r.content))
print pil_image.size

StringIO, PIL image, Django

from StringIO import StringIO

from django.core.files.base import ContentFile

from PIL import Image

raw_img_io = StringIO(binary_data)
img = Image.open(raw_img_io)
img = img.resize((524, 328), Image.ANTIALIAS)
img_io = StringIO()
img.save(img_io, 'PNG', quality=100)

profile.image.save('whatever.png', ContentFile(img_io.getvalue()), save=False)
profile.save()

ref:
http://stackoverflow.com/questions/3723220/how-do-you-convert-a-pil-image-to-a-django-file

Download file from URL, tempfile

import os
import tempfile
import requests
import xlrd

try:
    file_path = report.file.path
    temp = None
except NotImplementedError:
    url = report.file.url
    r = requests.get(url, stream=True)
    file_url, file_ext = os.path.splitext(r.url)

    # delete=True 會在 temp.close() 之後自己刪掉
    temp = tempfile.NamedTemporaryFile(prefix='report_file_', suffix=file_ext, dir='/tmp', delete=False)
    file_path = temp.name

    with open(file_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                f.flush()

wb = xlrd.open_workbook(file_path)

...

# 因為是 tempfile.NamedTemporaryFile(delete=False)
# 所以你要自己刪掉
try:
    os.remove(temp.name)
except AttributeError:
    pass

ref:
http://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
http://pymotw.com/2/tempfile/