I Failed the Turing Test

HTTP Cache Headers in Django

2015-01-292019-10-22VintaPython, Web Development

Conditional View Processing

def latest_entry(request, blog_id):
    return Entry.objects.filter(blog=blog_id).latest("published").published

@condition(last_modified_func=latest_entry)
def front_page(request, blog_id):
    ...

ref:
https://docs.djangoproject.com/en/dev/topics/conditional-view-processing/

Cache Middlewares

如果有啟用 Django 的 cache middleware（就是 UpdateCacheMiddleware 和 FetchFromCacheMiddleware）
每一個 request 都會被標上 Cache-Control: max-age=600
那個 600 是根據 CACHE_MIDDLEWARE_SECONDS 設定的值

只要設置了 max-age > 0
response header 中就會被自動加入 Cache-Control、Expires 和 Last-Modified 兩個欄位

ref:
https://docs.djangoproject.com/en/dev/ref/settings/#std:setting-CACHE_MIDDLEWARE_SECONDS

Never Cache Decorator

from django.views.decorators.cache import never_cache

@never_cache
def myview(request):
    pass

如果你單純的就是不希望被 cache
就使用這種方式
@never_cache 只會設置 Cache-Control: max-age=0（max-age=0 是馬上過期）

Cache Control Decorator

from django.views.decorators.cache import cache_control

class SongDetail(SVAPIDetailView):
    serializer_class = api_serializers.SongDetailSerializer

    @cache_control(no_store=True, no_cache=True, max_age=0)
    def get(self, request, song_id):
        do_something()

        return Response(data)

不知道為什麼，只設置 no-store 和 no-cache 的話
iOS 的 AFNetworking 還是會 cache
照道理說 no-store 的優先權應該是最高的
目前的解法是使用 Cache-Control: max-age=0

ref:
https://docs.djangoproject.com/en/dev/topics/cache/#controlling-cache-using-other-headers

Scrapy: The Web Scraping Framework for Python

2015-01-112019-10-22VintaPython, Web Development

Scrapy is a fast high-level web crawling and web scraping framework.

ref:
https://doc.scrapy.org/en/latest/

Install

# on Ubuntu
$ sudo apt-get install libxml2-dev libxslt1-dev libffi-dev

# on Mac
$ brew install libffi

$ pip install scrapy service_identity

Usage

# interative shell
# http://doc.scrapy.org/en/latest/intro/tutorial.html#trying-selectors-in-the-shell
$ scrapy shell "http://www.wendyslookbook.com/2013/09/the-frame-a-digital-glossy/"
# or
$ scrapy shell --spider=seemodel
>>> view(response)
>>> fetch(req_or_url)

# create a project
$ scrapy startproject blackwindow

# create a spider
$ scrapy genspider fancy www.fancy.com

# run spider
$ scrapy crawl fancy
$ scrapy crawl pinterest -L ERROR

Spider
去爬資料的程式，用 parse() 定義你要 parse 哪些資料

Item
定義抓回來的資料欄位，可以想成是 django 的 model

Pipeline
對抓回來的資料進行加工，可能是清除 html 或是檢查重複之類的

scrapy 底層是用 lxml 和 Twisted

ref:
https://github.com/vinta/BlackWidow

Tips

Debugging

from scrapy.shell import inspect_response
inspect_response(response, self)

These 2 lines will invoke the interative shell.

相對路徑 XPath

divs = response.xpath('//div')
for p in divs.xpath('.//p'):  # extracts all <p> inside
    print p.extract()

Access Django Model in Scrapy

def setup_django_env(django_settings_dir):
    import imp
    import sys

    from django.core.management import setup_environ

    django_project_path = os.path.abspath(os.path.join(django_settings_dir, '..'))
    sys.path.append(django_project_path)
    sys.path.append(django_settings_dir)

    f, filename, desc = imp.find_module('settings', [django_settings_dir, ])
    project = imp.load_module('settings', f, filename, desc)

    setup_environ(project)

# where Django settings.py placed
DJANGO_SETTINGS_DIR = '/all_projects/heelsfetishism/heelsfetishism'
setup_django_env(DJANGO_SETTINGS_DIR)

then you can import Django's modules in scrapy, like this:

from django.contrib.auth.models import User

from app.models import SomeModel

State

http://doc.scrapy.org/en/latest/topics/jobs.html#keeping-persistent-state-between-batches

def parse_item(self, response):
    # parse item here
    self.state['items_count'] = self.state.get('items_count', 0) + 1

Close Spider

from scrapy.exceptions import CloseSpider

# 只能在 spider 裡頭呼叫，不能用在 pipeline 裡
raise CloseSpider('Stop')

Login in Spider

from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.http import Request, FormRequest

from blackwidow.items import HeelsItem

class SeeModelSpider(CrawlSpider):
    name = 'seemodel'
    allowed_domains = ['www.seemodel.com', ]
    login_page = 'http://www.seemodel.com/member.php?mod=logging&action=login'
    start_urls = [
        'http://www.seemodel.com/forum.php?mod=forumdisplay&fid=41&filter=heat&orderby=heats',
        'http://www.seemodel.com/forum.php?mod=forumdisplay&fid=42&filter=heat&orderby=heats',
    ]

    rules = (
        Rule(
            SgmlLinkExtractor(allow=r'forum\.php\?mod=viewthread&tid=\d+'),
            callback='parse_item',
            follow=False,
        ),
    )

    def start_requests(self):
        self.username = self.settings['SEEMODEL_USERNAME']
        self.password = self.settings['SEEMODEL_PASSWORD']

        yield Request(
            url=self.login_page,
            callback=self.login,
            dont_filter=True,
        )

    def login(self, response):
        return FormRequest.from_response(
            response,
            formname='login',
            formdata={
                'username': self.username,
                'password': self.password,
                'cookietime': 'on',
            },
            callback=self.check_login_response,
        )

    def check_login_response(self, response):
        if self.username not in response.body:
            self.log("Login failed")
            return

        self.log("Successfully logged in")

        return [Request(url=url, dont_filter=True) for url in self.start_urls]

    def parse_item(self, response):
        item = HeelsItem()
        item['comment'] = response.xpath('//*[@id="thread_subject"]/text()').extract()
        item['image_urls'] = response.xpath('//ignore_js_op//img/@zoomfile').extract()
        item['source_url'] = response.url

        return item

ref:
https://doc.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-userlogin

Others

XPath 的選擇節點語法
http://mi.hosp.ncku.edu.tw/km/index.php/dotnet/48-netdisk/57-xml-xpath

Avoiding getting banned
http://doc.scrapy.org/en/latest/topics/practices.html#avoiding-getting-banned

Python 抓取框架：Scrapy 的架构
http://biaodianfu.com/scrapy-architecture.html

Download images
https://scrapy.readthedocs.org/en/latest/topics/images.html

Parse datetime in Python and JavaScript

2014-12-302019-10-22VintaJavaScript, Python, Web Development

Python

I recommend dateutil.

ref:
https://dateutil.readthedocs.org/en/latest/

import datetime
from dateutil import parser as dateutil_parser

>>> dateutil_parser.parse('2014-12-24T16:15:16')
datetime.datetime(2014, 12, 24, 16, 15, 16)

>>> datetime_obj = datetime.datetime.strptime('2014-12-24T16:15:16', '%Y-%m-%dT%H:%M:%S')
datetime.datetime(2014, 12, 24, 16, 15, 16)

>>> datetime_obj = datetime.datetime.strptime('201408282300', '%Y%m%d%H%M')
datetime.datetime(2014, 8, 28, 23, 0)

>>> datetime_obj.strftime('%Y-%m-%d %H:%M')

strftime >> datetime -> str
strptime >> str --> datetime

ref:
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior

Django Template

class DriverInfoForm(forms.ModelForm):
    service_time_start = forms.TimeField(
        widget=forms.TimeInput(format='%H:%M'),
        input_formats=['%H:%M', ]
    )

@register.filter
def str_to_time(time_str, output_format):
    """
    把字串轉成 datetime obj
    再依據 output_format 輸出

    {{ news.modified_at|str_to_time:"%Y/%m/%d %H:%M" }}
    """

    from dateutil import parser

    datetime_obj = parser.parse(time_str, fuzzy=True)

    return datetime_obj.strftime(output_format)

日期：{{ withdraw.presented_at|date:"%Y 年 %n 月" }}
聯絡時間：{{ driver.service_time_start|date:"H:i" }} - {{ driver.service_time_end|date:"H:i" }}

要注意的是，Django 似乎不能 parse AM / PM，所以儘量用 24 小時制。

ref:
https://docs.djangoproject.com/en/dev/ref/templates/builtins/#date

JavaScript

I recommend moment.js.

ref:
https://momentjs.com/

var today = new Date().toISOString().slice(0, 10);
// 2016-05-11

var t1 = new Date('2016-05-02T03:00:00.000+01:00');
// Mon May 02 2016 10:00:00 GMT+0800 (CST)

var t1_timestamp_ms = t1.getTime();
// 要注意的是 JavaScript 的 getTime() 的單位是 ms
// 1462154400000

var t1_timestamp = t1.getTime() / 1000;
// 1462154400

var t2 = new Date(1485596172 * 1000);
// Sat Jan 28 2017 17:36:12 GMT+0800 (CST)

var t3 = moment('201408292300', 'YYYYMMDDHHmm');

var t3 = moment('2018-02-02')
var timestamp = time.unix()
// 單位是 second
// 1518192000

ref:
https://stackoverflow.com/questions/3552461/how-to-format-a-javascript-date

Send Emails in Django

2014-12-302019-10-22VintaPython, Web Development

Sending emails with Amazon SES, Mailgun, Zoho, or Gmail in Django.

Configuration

in settings.py

SERVER_EMAIL = '[email protected]'
DEFAULT_FROM_EMAIL = 'Hourmasters <{0}>'.format(SERVER_EMAIL)
REPLY_TO_EMAIL = '[email protected]'

Amazon SES (Simple Email Service)

在 Amazon SES 上驗證你的 domain
在你的 email 服務商（例如 Google Apps）上建立一個 email 帳號，例如 [email protected]
在 Amazon SES 上驗證這個 email 帳號
收信，點一下確認信裡的超連結
在 Amazon SES 上 Request a Sending Limit Increase

如果你沒有 Request a Sending Limit Increase
預設會是在一個 sandbox 裡面
你只能寄信給你有在 Amazon SES 上驗證過的 email 帳號

ref:
https://console.aws.amazon.com/ses/home

$ pip install django-ses

EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'email-smtp.us-east-1.amazonaws.com'
EMAIL_HOST_USER = 'YOUR_AWS_ACCESS_KEY_ID'
EMAIL_HOST_PASSWORD = 'YOUR_AWS_SECRET_ACCESS_KEY'
EMAIL_PORT = 587
EMAIL_USE_TLS = True

$ python manage.py ses_email_address -l

ref:
https://github.com/django-ses/django-ses

Mailgun

EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'smtp.mailgun.org'
EMAIL_HOST_USER = '[email protected]'
EMAIL_HOST_PASSWORD = 'XXX'
EMAIL_PORT = 587
EMAIL_USE_TLS = True

如果原本就是用 EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
可以無縫改用 https://github.com/pmclanahan/django-celery-email

ref:
http://www.mailgun.com/pricing
http://thingsilearned.com/2011/06/07/mailgun-as-an-smtp-server-for-django-apps/

Zoho

Django 1.7 之前沒有 EMAIL_USE_SSL 這個設定
所以連 zoho 的 mail server 都會 timeout
你可以安裝 django-smtp-ssl

EMAIL_BACKEND = 'django_smtp_ssl.SSLEmailBackend'
EMAIL_HOST = 'smtp.zoho.com'
EMAIL_HOST_USER = '[email protected]'
EMAIL_HOST_PASSWORD = 'XXX'
EMAIL_PORT = 465

ref:
https://github.com/bancek/django-smtp-ssl
https://stackoverflow.com/questions/18335697/send-email-through-zoho-smtp

Gmail

EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'smtp.gmail.com'
EMAIL_HOST_USER = '[email protected]'  # 也可以是 Google App
EMAIL_HOST_PASSWORD = 'XXX'
EMAIL_PORT = 587
EMAIL_USE_TLS = True

Usage

in views.py

from django.core.mail import EmailMessage
from django.core.mail import send_mail
from django.template.loader import render_to_string

mail_context = {
    'name': 'Vinta',
    'email': '[email protected]',
    'content': 'YOU SUCK',
}
msg = EmailMessage(
    subject='Subject',
    body=render_to_string('email/contact_email.html', mail_context),
    from_email='[email protected]',
    to=['[email protected]', '[email protected]'],
    headers={'Reply-To': settings.REPLY_TO_EMAIL},
)
msg.content_subtype = 'html'  # or 'plain'
msg.send()

# or

send_mail(
    'Subject',
    'Message',
    'YOUR NAME <[email protected]>',
    ['[email protected]', '[email protected]']
)

ref:
https://docs.djangoproject.com/en/dev/topics/email/

Send Email Attachments with Unicode Filenames in Django

2014-11-272019-10-22VintaPython, Web Development

使用 MIMEApplication 並設置 attachment 的 header
不然如果檔名有中文的話
用 Gmail 收到的附件檔名會是 noname

def withdraw_send_email(request, pk):
    withdraw = get_object_or_404(Withdraw, pk=pk)

    email = '[email protected]'
    subject = _(u'Packer 數位發行服務')
    body = render_to_string('dps/email/withdraw_notice.html', {})

    message = EmailMessage(
        subject=subject,
        body=body,
        to=[email, ],
    )
    message.content_subtype = 'html'

    from email.mime.application import MIMEApplication

    # 檔名包含中文的話，要用這種方式 Gmail 收到的附件檔名才不會是 noname
    filename = '%s 結算表.xlsx' % (withdraw.serial_number[:6].encode('utf-8'))
    file_data = withdraw.file.read()
    attach_file = MIMEApplication(file_data, _subtype='vnd.ms-excel')
    attach_file.add_header('Content-Disposition', 'attachment; filename="%s"' % (filename))

    message.attach(attach_file)

    message.send()

    return HttpResponse('OK')

ref:
https://docs.python.org/2/library/email.mime.html