
argparse: Create a Command-line App with Python

2015-12-09 | 2019-10-22 | Vinta | Python

argparse is a module in the Python standard library that makes it easy to write a CLI application. You should use it instead of optparse, which has been deprecated since Python 2.7.

ref:
https://docs.python.org/3/library/argparse.html

Basic example

#!/usr/bin/env python

from __future__ import print_function

import argparse
import sys

import jokekappa

class JokeKappaCLI(object):

    def __init__(self):
        parser = argparse.ArgumentParser(
            prog='jokekappa',
            description='humor is a serious thing, you should take it seriously',
        )
        self.parser = parser
        self.parser.add_argument('-v', '--version', action='version', version=jokekappa.__version__)

        self.subparsers = parser.add_subparsers(title='sub commands')
        self.subparsers \
            .add_parser('one', help='print one joke randomly') \
            .set_defaults(func=self.tell_joke)
        self.subparsers \
            .add_parser('all', help='print all jokes') \
            .set_defaults(func=self.tell_jokes)
        self.subparsers \
            .add_parser('update', help='update jokes from sources') \
            .set_defaults(func=self.update_jokes)

    def parse_args(self):
        if len(sys.argv) == 1:
            namespace = self.parser.parse_args(['one', ])
        else:
            namespace = self.parser.parse_args()
        namespace.func()

    def tell_joke(self):
        joke = jokekappa.get_joke()
        print(joke['content'])

    def tell_jokes(self):
        for joke in jokekappa.get_jokes():
            print(joke['content'])

    def update_jokes(self):
        jokekappa.update_jokes()
        print('Done')

def main():
    JokeKappaCLI().parse_args()

if __name__ == '__main__':
    main()

ref:
https://github.com/CodeTengu/JokeKappa
https://stackoverflow.com/questions/5176691/argparse-how-to-specify-a-default-subcommand
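
If you don't have jokekappa installed, here is a minimal, self-contained sketch of the same pattern: each subcommand binds a handler with set_defaults, and an empty argument list falls back to a default subcommand. The greet program, its hello and bye subcommands, and the build_parser and run helpers are all made up for illustration.

```python
import argparse


def build_parser():
    # hypothetical program name and subcommands, for illustration only
    parser = argparse.ArgumentParser(prog='greet')
    subparsers = parser.add_subparsers(title='sub commands')

    hello = subparsers.add_parser('hello', help='say hello')
    hello.set_defaults(func=lambda: 'hello')

    bye = subparsers.add_parser('bye', help='say goodbye')
    bye.set_defaults(func=lambda: 'bye')

    return parser


def run(argv):
    parser = build_parser()
    # fall back to the `hello` subcommand when no arguments are given
    namespace = parser.parse_args(argv or ['hello'])
    return namespace.func()


if __name__ == '__main__':
    import sys
    print(run(sys.argv[1:]))
```

Passing argv explicitly, instead of reading sys.argv inside the class, makes the dispatch easy to call from a test or a REPL.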

Advanced example

The following code creates a Python module called pangu and a command-line tool also called pangu; both share the same codebase.

in pangu.py

from __future__ import print_function

import argparse
import sys

__version__ = '3.3.0'
__all__ = ['spacing_text', 'PanguCLI']

def spacing_text(text):
    """
    You could find real code in https://github.com/vinta/pangu.py
    """
    return text.upper().strip()

class PanguCLI(object):

    def __init__(self):
        parser = argparse.ArgumentParser(
            prog='pangu',
            description='paranoid text spacing',
        )
        self.parser = parser
        self.parser.add_argument('-v', '--version', action='version', version=__version__)
        self.parser.add_argument('text', action='store', type=str)

    def parse(self):
        if not sys.stdin.isatty():
            print(spacing_text(sys.stdin.read()))
        elif len(sys.argv) > 1:
            namespace = self.parser.parse_args()
            print(spacing_text(namespace.text))
        else:
            self.parser.print_help()
        sys.exit(0)

if __name__ == '__main__':
    PanguCLI().parse()

in bin/pangu

#!/usr/bin/env python

from pangu import PanguCLI

if __name__ == '__main__':
    PanguCLI().parse()

in setup.py

from setuptools import setup

setup(
    name='pangu',
    py_modules=['pangu', ],
    scripts=['bin/pangu', ],
    ...
)

As a result, there are several ways to use it:

$ pangu "abc"
$ python -m pangu "abc"
$ echo "abc" | pangu
$ echo "abc" | python -m pangu
ABC

ref:
https://github.com/vinta/pangu.py

Accept a list as option

parser.add_argument('-u', '--usernames', type=lambda x: x.split(','), dest='usernames', required=True)
# your_command -u vinta
# your_command -u vinta,saiday
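
You can check this option without a shell by passing an explicit list to parse_args. The sketch below swaps the lambda for a named function; comma_separated and parse_usernames are hypothetical names, and dropping empty items is an extra assumption that the original lambda does not make.

```python
import argparse


def comma_separated(value):
    # split "a,b,c" into ['a', 'b', 'c'] and drop empty items (assumption)
    return [item for item in value.split(',') if item]


parser = argparse.ArgumentParser(prog='your_command')
parser.add_argument('-u', '--usernames', type=comma_separated, dest='usernames', required=True)


def parse_usernames(argv):
    # argv is a list such as ['-u', 'vinta,saiday']
    return parser.parse_args(argv).usernames


if __name__ == '__main__':
    print(parse_usernames(['-u', 'vinta']))
    print(parse_usernames(['-u', 'vinta,saiday']))
```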

Accept conditional argument

ref:
https://stackoverflow.com/questions/15459997/passing-integer-lists-to-python

Tools for Profiling your Python Projects

2015-08-31 | 2019-10-22 | Vinta | Python, Web Development

The first aim of profiling is to test a representative system and identify what's slow, what's using too much RAM, and what's causing too much disk or network I/O. Keep in mind that profiling typically adds overhead to your code.

In this post, I will introduce tools you can use to profile your Python or Django projects, including timer, pycallgraph, cProfile, line-profiler, and memory-profiler.

ref:
https://stackoverflow.com/questions/582336/how-can-you-profile-a-script
https://www.airpair.com/python/posts/optimizing-python-code

timer

The simplest way to profile a piece of code.

ref:
https://docs.python.org/3/library/timeit.html
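
A minimal sketch of timing two implementations with the standard timeit module. The two functions are hypothetical examples, and the absolute numbers depend on your machine, so treat them only as a relative comparison.

```python
import timeit


def join_with_loop(n=1000):
    # build the string by repeated concatenation
    result = ''
    for i in range(n):
        result += str(i)
    return result


def join_with_str_join(n=1000):
    # build the string with a single str.join call
    return ''.join(str(i) for i in range(n))


# run each function 200 times and measure the total elapsed seconds
loop_seconds = timeit.timeit(join_with_loop, number=200)
join_seconds = timeit.timeit(join_with_str_join, number=200)

print('loop: {:.4f}s'.format(loop_seconds))
print('join: {:.4f}s'.format(join_seconds))
```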

pycallgraph

pycallgraph is a Python module that creates call graph visualizations for Python applications.

ref:
https://pycallgraph.readthedocs.org/en/latest/

$ sudo apt-get install graphviz
$ pip install pycallgraph

# in your_app/middlewares.py
from pycallgraph import Config
from pycallgraph import PyCallGraph
from pycallgraph.globbing_filter import GlobbingFilter
from pycallgraph.output import GraphvizOutput
import time

class PyCallGraphMiddleware(object):

    def process_view(self, request, callback, callback_args, callback_kwargs):
        if 'graph' in request.GET:
            config = Config()
            config.trace_filter = GlobbingFilter(include=['rest_framework.*', 'api.*', 'music.*'])
            graphviz = GraphvizOutput(output_file='pycallgraph-{}.png'.format(time.time()))
            pycallgraph = PyCallGraph(output=graphviz, config=config)
            pycallgraph.start()

            self.pycallgraph = pycallgraph

    def process_response(self, request, response):
        if 'graph' in request.GET:
            self.pycallgraph.done()

        return response

# in settings.py
MIDDLEWARE_CLASSES = (
    'your_app.middlewares.PyCallGraphMiddleware',
    ...
)

$ python manage.py runserver 0.0.0.0:8000
$ open http://127.0.0.1:8000/your_endpoint/?graph=true

cProfile

cProfile is a tool in Python's standard library to understand which functions in your code take the longest to run. It will give you a high-level view of the performance problem so you can direct your attention to the critical functions.

ref:
http://igor.kupczynski.info/2015/01/16/profiling-python-scripts.html
https://ymichael.com/2014/03/08/profiling-python-with-cprofile.html

$ python -m cProfile manage.py test member
$ python -m cProfile -o my-profile-data.out manage.py test --failtest
$ python -m cProfile -o my-profile-data.out manage.py runserver 0.0.0.0:8000

$ pip install cprofilev
$ cprofilev -f my-profile-data.out -a 0.0.0.0 -p 4000
$ open http://127.0.0.1:4000
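
You can also collect and read a profile from inside Python with cProfile and the standard pstats module. This is only a sketch: slow_sum and profile_call are hypothetical names, and sorting by cumulative time is just one reasonable choice.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # hypothetical workload to profile
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    # profile a single call and return its result plus a text report
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # show the 5 entries with the largest cumulative time
    stats.sort_stats('cumulative').print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 10000)
print(report)
```

A file written with the -o option can be loaded the same way by passing its path to pstats.Stats, for example pstats.Stats('my-profile-data.out').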

cProfile with django-cprofile-middleware

$ pip install django-cprofile-middleware

# in settings.py
MIDDLEWARE_CLASSES = (
    ...
    'django_cprofile_middleware.middleware.ProfilerMiddleware',
)

Open any url with a ?prof suffix to do the profiling, for instance, http://localhost:8000/foo/?prof

ref:
https://github.com/omarish/django-cprofile-middleware

cProfile with django-extensions and kcachegrind

kcachegrind is a profiling data visualization tool, used to determine which parts of a program take the most time to execute.

ref:
http://django-extensions.readthedocs.org/en/latest/runprofileserver.html

$ pip install django-extensions

# in settings.py
INSTALLED_APPS += (
    'django_extensions',
)

$ mkdir -p my-profile-data

$ python manage.py runprofileserver \
--noreload \
--nomedia \
--nostatic \
--kcachegrind \
--prof-path=my-profile-data \
0.0.0.0:8000

$ brew install qcachegrind --with-graphviz
$ qcachegrind my-profile-data/root.003563ms.1441992439.prof
# or
$ sudo apt-get install kcachegrind
$ kcachegrind my-profile-data/root.003563ms.1441992439.prof

cProfile with django-debug-toolbar

You can only use django-debug-toolbar if your view returns HTML, because it needs a place to inject the debug panels into the DOM of the webpage.

ref:
https://github.com/django-debug-toolbar/django-debug-toolbar

$ pip install django-debug-toolbar

# in settings.py
INSTALLED_APPS += (
    'debug_toolbar',
)

DEBUG_TOOLBAR_PANELS = [
    ...
    'debug_toolbar.panels.profiling.ProfilingPanel',
    ...
]

line-profiler

line-profiler is a module for doing line-by-line profiling of functions. One of my favorite tools.

ref:
https://github.com/rkern/line_profiler

$ pip install line-profiler

# in your_app/views.py
def do_line_profiler(view=None, extra_view=None):
    import line_profiler

    def wrapper(view):
        def wrapped(*args, **kwargs):
            prof = line_profiler.LineProfiler()
            prof.add_function(view)
            # also profile any extra functions passed in
            for extra in extra_view or ():
                prof.add_function(extra)
            with prof:
                resp = view(*args, **kwargs)
            prof.print_stats()
            return resp

        return wrapped

    if view:
        return wrapper(view)

    return wrapper

@do_line_profiler
def your_view(request):
    pass

ref:
https://djangosnippets.org/snippets/10483/

There is a pure Python alternative: pprofile.
https://github.com/vpelletier/pprofile

line-profiler with django-devserver

ref:
https://github.com/dcramer/django-devserver

$ pip install git+https://github.com/dcramer/django-devserver#egg=django-devserver

in settings.py

INSTALLED_APPS += (
    'devserver',
)

DEVSERVER_MODULES = (
    ...
    'devserver.modules.profile.LineProfilerModule',
    ...
)

DEVSERVER_AUTO_PROFILE = False

in your_app/views.py

from devserver.modules.profile import devserver_profile

@devserver_profile()
def your_view(request):
    pass

line-profiler with django-debug-toolbar-line-profiler

ref:
http://django-debug-toolbar.readthedocs.org/en/latest/
https://github.com/dmclain/django-debug-toolbar-line-profiler

$ pip install django-debug-toolbar django-debug-toolbar-line-profiler

# in settings.py
INSTALLED_APPS += (
    'debug_toolbar',
    'debug_toolbar_line_profiler',
)

DEBUG_TOOLBAR_PANELS = [
    ...
    'debug_toolbar_line_profiler.panel.ProfilingPanel',
    ...
]

memory-profiler

This is a Python module for monitoring memory consumption of a process as well as line-by-line analysis of memory consumption for Python programs.

ref:
https://pypi.python.org/pypi/memory_profiler

$ pip install memory-profiler psutil

# in your_app/views.py
from memory_profiler import profile

@profile(precision=4)
def your_view(request):
    pass

There are other options:
http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended

dogslow

ref:
https://bitbucket.org/evzijst/dogslow

django-slow-tests

ref:
https://github.com/realpython/django-slow-tests

django-debug-toolbar: The Debugging Toolkit for Django

2015-08-30 | 2019-10-22 | Vinta | Python, Web Development

django-debug-toolbar is a set of panels that display various debug information about the current request and response in Django.

ref:
https://github.com/django-debug-toolbar/django-debug-toolbar

Install

$ pip install \
  django-debug-toolbar \
  django-debug-toolbar-line-profiler \
  django-debug-toolbar-template-profiler \
  django-debug-toolbar-template-timings \
  django-debug-panel \
  memcache-toolbar \
  pympler \
  git+https://github.com/scuml/debug-toolbar-mail

ref:
https://github.com/dmclain/django-debug-toolbar-line-profiler
https://github.com/node13h/django-debug-toolbar-template-profiler
https://github.com/orf/django-debug-toolbar-template-timings
https://github.com/recamshak/django-debug-panel
https://github.com/ross/memcache-debug-panel
https://pythonhosted.org/Pympler/django.html
https://github.com/scuml/debug-toolbar-mail

Python 3
https://github.com/lerela/django-debug-toolbar-line-profile

Configuration

in urls.py

from django.conf import settings
from django.conf.urls import include, url

if settings.DEBUG:
    import debug_toolbar
    urlpatterns = [
        url(r'^__debug__/', include(debug_toolbar.urls)),
    ] + urlpatterns

in settings.py

INSTALLED_APPS += (
    'debug_toolbar',
    # 'debug_toolbar_line_profiler',
    # 'memcache_toolbar',
    # 'pympler',
    # 'template_profiler_panel',
    # 'template_timings_panel',
)
DEBUG_TOOLBAR_PANELS = [
    # 'debug_toolbar.panels.versions.VersionsPanel',
    # 'debug_toolbar.panels.timer.TimerPanel',
    # 'debug_toolbar.panels.settings.SettingsPanel',
    # 'debug_toolbar.panels.headers.HeadersPanel',
    # 'debug_toolbar.panels.request.RequestPanel',
    'debug_toolbar.panels.sql.SQLPanel',
    # 'debug_toolbar.panels.staticfiles.StaticFilesPanel',
    # 'debug_toolbar.panels.templates.TemplatesPanel',
    # 'template_timings_panel.panels.TemplateTimings.TemplateTimings',
    # 'template_profiler_panel.panels.template.TemplateProfilerPanel',
    # 'debug_toolbar.panels.cache.CachePanel',
    # 'memcache_toolbar.panels.memcache.MemcachePanel',
    # 'debug_toolbar.panels.profiling.ProfilingPanel',
    # 'debug_toolbar_line_profiler.panel.ProfilingPanel',
    # 'pympler.panels.MemoryPanel',
    # 'debug_toolbar.panels.signals.SignalsPanel',
    # 'debug_toolbar.panels.logging.LoggingPanel',
    # 'debug_toolbar.panels.redirects.RedirectsPanel',
]

if 'debug_toolbar' in INSTALLED_APPS:
    MIDDLEWARE_CLASSES = list(MIDDLEWARE_CLASSES)
    MIDDLEWARE_CLASSES += [
        'debug_toolbar.middleware.DebugToolbarMiddleware',
    ]

def show_toolbar(request):
    return True

DEBUG_TOOLBAR_CONFIG = {
    'SHOW_TOOLBAR_CALLBACK': show_toolbar,
}

INTERNAL_IPS = (
    '127.0.0.1',
)

ref:
http://django-debug-toolbar.readthedocs.org/en/latest/configuration.html
http://django-debug-toolbar.readthedocs.org/en/latest/panels.html

Make sure the following middlewares are not enabled in MIDDLEWARE_CLASSES:

'django.middleware.gzip.GZipMiddleware'
'django.middleware.http.ConditionalGetMiddleware'

ref:
http://django-debug-toolbar.readthedocs.io/en/stable/installation.html#automatic-setup

Python with Excel: xlrd, xlsxwriter, and xlutils

2015-06-14 | 2019-10-22 | Vinta | Python

Libraries

xlsxwriter has the better documentation.

ref:
http://www.python-excel.org/
https://xlsxwriter.readthedocs.org/en/latest/
http://openpyxl.readthedocs.org/en/latest/

Usage

A row runs horizontally.
A column runs vertically.

Default format

import xlsxwriter

workbook = xlsxwriter.Workbook('label_copy.xlsx')

# default cell format
workbook.formats[0].set_font_size(12)
workbook.formats[0].set_text_wrap() # required for multi-line text to display correctly
workbook.formats[0].set_align('vcenter')

ref:
https://xlsxwriter.readthedocs.org/en/latest/format.html

Multiple lines

lines_format = workbook.add_format({
    'align': 'left',
    'font_size': 12,
    'text_wrap': True,
    'valign': 'vcenter',
})

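# assumes a worksheet added to the workbook created above
worksheet = workbook.add_worksheet()

# or use a triple-quoted """multi-line""" string
content = 'first line\nsecond line'
worksheet.write(0, 0, content, lines_format)
workbook.close() # the file is only written when the workbook is closed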

The key is to set text_wrap.

ref:
http://stackoverflow.com/questions/15370432/writing-multi-line-strings-into-cells-using-openpyxl

Write to existing excel files

from xlutils.copy import copy as xlutils_copy
import xlrd

rb = xlrd.open_workbook('your_file.xls', formatting_info=True)
wb = xlutils_copy(rb)
ws = wb.get_sheet(0)
ws.write(0, 0, 'Hello World')
wb.save('your_file.xls') # nothing is written to disk until you save

ref:
https://stackoverflow.com/questions/2725852/writing-to-existing-workbook-using-xlwt

Examples

ref:
https://xlsxwriter.readthedocs.org/en/latest/examples.html

ipdb: The interactive Python debugger with IPython

2015-06-11 | 2019-10-22 | Vinta | Python

ipdb is an interactive Python Debugger with IPython integration, which features tab completion and syntax highlighting, etc. In layman's terms, ipdb is a better pdb.

ref:
https://github.com/gotcha/ipdb

Usage

$ pip install -U ipdb

ref:
https://pypi.python.org/pypi/ipdb

Add a breakpoint to any place you want to inspect, then run your code.

import ipdb; ipdb.set_trace()

If you use Sublime Text 3, try Python Breakpoints.
https://github.com/obormot/PythonBreakpoints

Useful Commands

Oldest frame is the frame in the stack where your program started; it is the oldest in time; the Newest frame, the other end of the stack, is where Python is executing code and is the current frame of execution.

# help: Print the list of all commands
h

# help: Print help for a specific command
h break

# print: Print the value of the expression
p some_obj
pp some_obj

# Print detailed information about the object
pinfo some_obj
pinfo2 some_obj

# args: Print the arguments of the current function with their values
a

# list: List 11 lines of source code around the current line
l

# list: List 11 lines of source code around line 123
l 123

# longlist: List all source code for the current function or frame
ll

# jump: Jump to line 123, skip the execution of anything between
j 123

# step: Execute code line by line, it may jump to another frame when a function call is encountered
s

# next: Execute code line by line, it doesn't enter functions called from the statement being executed
n

# return: Continue execution until the current function returns.
r

# continue: Continue execution, only stop when a breakpoint is encountered
c

# break: List all breakpoints
b

# break: Set a breakpoint at line 123
b 123

# break: Set a breakpoint at line 123 of file.py
b path/to/file.py:123

# break: Set a breakpoint on some_func that will be triggered if some_arg == 0
b some_func, some_arg == 0

# clear: Clear all breakpoints
clear

# where: Print a stack trace
w

# up: Move the current frame one level up in the stack trace
u

# down: Move the current frame one level down in the stack trace
d

# quit: Quit debugging
q

# use ! to run Python code that may conflict with pdb's built-in commands
!r = 123
!r = 123; c = 455

ref:
https://docs.python.org/2/library/pdb.html#debugger-commands
https://docs.python.org/3/library/pdb.html#debugger-commands
https://pymotw.com/2/pdb/
https://pymotw.com/3/pdb/
https://medium.com/instamojo-matters/become-a-pdb-power-user-e3fc4e2774b2

post_mortem

Debugging a failure after a program terminates is called post-mortem debugging.

>>> do_shit(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pdb_post_mortem.py", line 13, in go
    for i in range(self.num_loops):
AttributeError: 'MyObj' object has no attribute 'num_loops'
>>> import ipdb; ipdb.pm()
ipdb> w

trace

Tracing a program as it runs. In this case, it will enter ipdb when sys.path changes.

import sys

import ipdb

# this function will execute on every line!!!
def trace_sys_path(frame, event, arg):
    if sys.path[0].endswith('/lib'):
        ipdb.set_trace()
    return trace_sys_path

sys.settrace(trace_sys_path)

ref:
https://youtu.be/5XvAVgcbmdY?t=22m51s
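
To see why the trace function really does run on every line, here is a small sketch that records line events instead of opening ipdb. run_with_line_trace and sample are hypothetical names; only sys.settrace and sys.gettrace come from the standard library.

```python
import sys


def run_with_line_trace(func):
    # call func() and record which of its lines ran, relative to its def line
    executed = []
    code = func.__code__

    def tracer(frame, event, arg):
        # only record 'line' events that belong to func itself
        if frame.f_code is code and event == 'line':
            executed.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        # always restore the previous trace function
        sys.settrace(previous)
    return executed


def sample():
    total = 0
    for i in range(3):
        total += i
    return total


lines = run_with_line_trace(sample)
# relative line 3 (total += i) appears once per loop iteration
print(lines)
```

Because the tracer is called for every executed line, keep the check inside it cheap and remove the sys.settrace call as soon as you are done debugging.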

Use IPython magic functions in ipdb

ipdb is not a full IPython shell: it provides the same debugger interface as pdb, so it lacks many IPython features such as magic functions. To debug in a real IPython environment, use the following code.

from IPython import embed; embed()

Use this instead of import ipdb; ipdb.set_trace().

ref:
http://stackoverflow.com/questions/16184487/use-ipython-magic-functions-in-ipdb-shell
https://github.com/gotcha/ipdb/issues/33