Install Go (Golang)

Install Go (Golang)

Go is a programming language created by Google.

ref:
https://golang.org/

Installation

on Ubuntu:

$ sudo apt-get install -V golang

# or

$ wget https://dl.google.com/go/go1.10.1.linux-amd64.tar.gz
$ tar -C /usr/local -xzf go1.10.1.linux-amd64.tar.gz
$ vim /etc/profile
export PATH=$PATH:/usr/local/go/bin

$ which go

on macOS:

$ brew install go

$ go env
GOCACHE="/Users/vinta/Library/Caches/go-build"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/vinta/go"
GOROOT="/usr/local/Cellar/go/1.10/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.10/libexec/pkg/tool/darwin_amd64"
...

ref:
https://golang.org/dl/

Environment Variables

  • GOPATH is your workspace.
    • go get fetches source code and puts them into $GOPATH/src.

ref:
https://golang.org/doc/code.html

Read and save file in Django / Python

File 和 ImageFile 接受 Python 的 file 或 StringIO 物件
而 ContentFile 接受 string

ref:
https://docs.djangoproject.com/en/dev/ref/files/file/#the-file-object

Django Form

image_file = request.FILES['file']

# 方法一
profile.mugshot.save(image_file.name, image_file)

# 方法二
profile.mugshot = image_file

profile.save()

ref:
File Upload with Form in Django

open('/path/to/file.png')

from django.core.files import File

with open('/home/vinta/image.png', 'rb') as f:
    profile.mugshot = File(f)
    profile.save()

Django ContentFile

import os
import uuid

from django.core.files.base import ContentFile

import requests

url = 'http://vinta.ws/static/photo.jpg'
r = requests.get(url)
file_url, file_ext = os.path.splitext(r.url)
file_name = '%s%s' % (str(uuid.uuid4()).replace('-', ''), file_ext)

profile.mugshot.save('123.png', ContentFile(r.content), save=False)

# 如果 profile.mugshot 是 ImageField 欄位的話
# 可以用以下的方式來判斷它是不是合法的圖檔
try:
    profile.mugshot.width
except TypeError:
    raise RuntimeError('圖檔格式不正確')

profile.save()

Data URI, Base64

from binascii import a2b_base64

from django.core.files.base import ContentFile

data_uri = 'data:image/jpeg;base64,/9j/4AAQSkZJRg....'
head, data = data_uri.split(',')
binary_data = a2b_base64(data)

# 方法一
profile.mugshot.save('whatever.jpg', ContentFile(binary_data), save=False)
profile.save()

# 不能用這種方式,因為少了 file name
profile.mugshot = ContentFile(binary_data)
profile.save()

# 方法二
f = open('image.png', 'wb')
f.write(binary_data)
f.close()

# 方法三
from StringIO import StringIO
from PIL import Image
img = Image.open(StringIO(binary_data))
print img.size

ref:
http://stackoverflow.com/questions/19395649/python-pil-create-and-save-image-from-data-uri

StringIO, PIL image

你就把 StringIO 想成是 open('/home/vinta/some_file.txt', 'rb') 的 file 物件

from StringIO import StringIO

from PIL import Image
import requests

r = requests.get('http://vinta.ws/static/photo.jpg')
img = Image.open(StringIO(r.content))
print pil_image.size

StringIO, PIL image, Django

from StringIO import StringIO

from django.core.files.base import ContentFile

from PIL import Image

raw_img_io = StringIO(binary_data)
img = Image.open(raw_img_io)
img = img.resize((524, 328), Image.ANTIALIAS)
img_io = StringIO()
img.save(img_io, 'PNG', quality=100)

profile.image.save('whatever.png', ContentFile(img_io.getvalue()), save=False)
profile.save()

ref:
http://stackoverflow.com/questions/3723220/how-do-you-convert-a-pil-image-to-a-django-file

Download file from URL, tempfile

import os
import tempfile
import requests
import xlrd

try:
    file_path = report.file.path
    temp = None
except NotImplementedError:
    url = report.file.url
    r = requests.get(url, stream=True)
    file_url, file_ext = os.path.splitext(r.url)

    # delete=True 會在 temp.close() 之後自己刪掉
    temp = tempfile.NamedTemporaryFile(prefix='report_file_', suffix=file_ext, dir='/tmp', delete=False)
    file_path = temp.name

    with open(file_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                f.flush()

wb = xlrd.open_workbook(file_path)

...

# 因為是 tempfile.NamedTemporaryFile(delete=False)
# 所以你要自己刪掉
try:
    os.remove(temp.name)
except AttributeError:
    pass

ref:
http://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
http://pymotw.com/2/tempfile/

Service Discovery with Consul and consul-template

Service Discovery with Consul and consul-template

Consul agent 必須裝在 cluster 裡的每一台機器上,agent 可以是以 server mode 或 client mode 運行。官方的建議是每個 datacenter 裡至少有 3-5 個 agent server。

  • agent server 負責串連整個 clusters 和紀錄 nodes 資訊
  • agent client 負責 register service, run health check, forward queris to servers

ref:
https://consul.io/intro/getting-started/agent.html

Installation

$ sudo apt-get install unzip
$ wget https://dl.bintray.com/mitchellh/consul/0.5.0_linux_amd64.zip
$ unzip 0.5.0_linux_amd64.zip
$ sudo mv consul /usr/local/bin/

ref:
https://www.consul.io/intro/getting-started/install.html

Usage

You can get your Atlas token from https://atlas.hashicorp.com/settings/tokens.

# the first node
$ consul agent -server -bootstrap-expect 2 -data-dir /tmp/consul -config-dir /etc/consul.d \
-atlas=vinta/consul-test \
-atlas-token="YOUR_TOKEN" \
-atlas-join

# other nodes: server mode
$ consul agent -server -data-dir /tmp/consul -config-dir /etc/consul.d \
-atlas=vinta/consul-test \
-atlas-token="YOUR_TOKEN" \
-atlas-join

# other nodes: client model
$ consul agent -data-dir /tmp/consul -config-dir /etc/consul.d \
-atlas=vinta/consul-test \
-atlas-token="YOUR_TOKEN" \
-atlas-join

# if we want to access the web UI remotely
# we'll have to specify the public IP of the client (104.236.148.236)
$ consul agent -data-dir /tmp/consul -config-dir /etc/consul.d \
-client 104.236.148.236 \
-ui-dir /root/consul_web_ui/dist \

# when you don't specify `-join`
# run `consul join NEW_NODE_IP` command on any node (servers or clients) of this cluster
$ consul join 10.134.249.144

$ consul members

# you can query services on any node of this cluster
$ http http://localhost:8500/v1/catalog/nodes
$ http http://localhost:8500/v1/catalog/services
$ http http://localhost:8500/v1/catalog/service/salt-master
$ http http://localhost:8500/v1/catalog/service/nginx
$ http http://localhost:8500/v1/health/service/nginx?passing

ref:
https://consul.io/docs/agent/options.html
https://consul.io/docs/guides/atlas.html
https://consul.io/docs/agent/http.html

Service Definition

server 和 client 都可以有 service;address 欄位默認就是執行該 consul agent 的機器的 IP,所以可以不填。

in /etc/consul.d/nginx.json

{
  "service": {
    "name": "nginx",
    "port": 80,
    "check": {
      "script": "curl localhost > /dev/null 2>&1",
      "interval": "5s"
    }
  }
}
$ consul reload

$ http http://localhost:8500/v1/catalog/service/nginx

ref:
https://consul.io/docs/agent/services.html
https://consul.io/docs/agent/checks.html

Integration with consul-template

Take HAProxy as an example.

$ wget https://github.com/hashicorp/consul-template/releases/download/v0.8.0/consul-template_0.8.0_linux_amd64.tar.gz
$ tar -zxvf consul-template_0.8.0_linux_amd64.tar.gz
$ sudo mv consul-template_0.8.0_linux_amd64/consul-template /usr/local/bin/

in /etc/consul-template/haproxy.conf

consul = "127.0.0.1:8500"

syslog {
  enabled = true
}

template {
  source = "/etc/consul-template/haproxy.cfg.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "sudo service haproxy reload"
}

in /etc/consul-template/haproxy.cfg.ctmpl

backend web_nodes
    mode http
    balance roundrobin
    option forwardfor
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    cookie haproxyserverid insert nocache maxidle 1h
    option httpchk HEAD /
    {{ range service "nginx" }}
    server {{ .Node }} {{ .Address }}:{{ .Port }} check cookie {{ .Node }}
    {{ end }}
# -dry: Dump generated templates to stdout
$ consul-template -config /etc/consul-template/haproxy.conf -dry

ref:
https://github.com/hashicorp/consul-template

MySQL schema design for large tables

MySQL schema design for large tables

把大字段拆出來

主要是指那些 text 類型的欄位
例如 Article 的 content 內文欄位
可以拆出來變成 ArticleContent 表
因為在大部分的 query 裡你其實都不會需要 content 欄位
通常只有顯示單一 article 的時候才需要

例如 Profile 表
phone_number, address, occupation 之類的很少被用來 query 的欄位
其實也可以拆出來到另外一個表
ExtendedProfile

大表的水平拆分:置頂訊息

假設你有一個存放論壇文章的表 Post
文章分成一般用戶的「普通文章」和管理者發布的「置頂文章」
你可能會在 Post 表新增一個 boolean 欄位
區分是不是置頂文章

但是置頂文章和普通文章的數量可能差很多
而且置頂文章的 query 次數也會多很多(每一頁都要顯示)
所以另一個可行的辦法是幫置頂文章獨立開一個 table

這種方式就是所謂的「水平拆分」

不是所有的 COUNT 都需要實時

很多 count 可以透過 crontab 定時跑
然後存進某個欄位裡
前端顯示時直接去讀那個欄位的值即可

把同一個 counter 的資料存成多筆 record

這樣可以減少同時有多個 request 修改了同一個 row 所引起的 lock 的問題
Deadlock found when trying to get lock; try restarting transaction

在你的 counter table 新增一個 slot 欄位(槽)
每次寫入的時候隨機選擇(透過 slot 欄位)某一個 record 寫入
要讀取的時候就把這幾筆 SUM 起來

song_id count slot
100 2 1
100 5 2
100 3 3
100 8 4
101 1 1
101 19 2
101 3 3
101 7 4

UPDATE song_counter SET count = count + 1 WHERE song_id = 100 AND slot = 1;
UPDATE song_counter SET count = count + 1 WHERE song_id = 100 AND slot = 2;
UPDATE song_counter SET count = count + 1 WHERE song_id = 100 AND slot = 3;

缺點就是 record 會變超多
可以用 crontab 定期整理數據
把 counter 資料寫到 slot 1
然後刪掉其他的 slot

ref:
High Performance MySQL, Chapter 4.4

MySQL index 索引

index 和 key 在 MySQL 裡面都是指索引

Which columns should be indexed?

唯一性太差的欄位不適合 "單獨" 建立 index
應該盡量選那些能夠「最大地」區分資料的欄位(id 就是一個最好的例子)
也就是說用這個欄位當 WHERE 條件去跑
出來的資料筆數越少則「唯一性」越高

ref:
http://talentluke.iteye.com/blog/1843868

ADD INDEX

建議使用 pt-online-schema-change

# 建立索引
mysql> ALTER TABLE writing_article ADD INDEX writing_article_last_modified (last_modified);
# or
mysql> CREATE INDEX writing_article_last_modified ON writing_article (last_modified);

time pt-online-schema-change \
--user=root --ask-pass --host=127.0.0.1 \
--print \
--execute \
--alter "ADD INDEX relationships_relationship_0e7efed3 (from_user_id), ADD INDEX relationships_relationship_bc172800 (to_user_id)" \
D=streetvoice,t=relationships_relationship

# 24 cores, 8G RAM
real 21m24.237s
user 0m2.532s
sys 0m0.392s

# 列出索引,Key_name 就是索引的名字
mysql> SHOW INDEX FROM relationships_relationship;
mysql> SHOW KEYS FROM relationships_relationship;
mysql> SHOW CREATE TABLE relationships_relationship;

# 刪除索引
mysql> DROP INDEX relationships_relationship_0e7efed3 ON relationships_relationship;
mysql> DROP INDEX relationships_relationship_bc172800 ON relationships_relationship;
mysql> DROP INDEX writing_article_953cd7bc ON writing_article;
mysql> DROP INDEX writing_article_last_modified ON writing_article;

ref:
https://dev.mysql.com/doc/refman/5.5/en/show-index.html
http://blog.xuite.net/hsiung03/blog/64217097
http://stackoverflow.com/questions/5213339/how-to-see-indexes-for-a-database-or-table

ForeignKey creates an index automatically in MySQL

所以你其實不需要特地幫 models.ForeignKey(User) 欄位加上 db_index=True

$ ./manage.py sqlindexes relationships

ref:
http://stackoverflow.com/questions/304317/does-mysql-index-foreign-key-columns-automatically

EXPLAIN

注意結尾的 \Q 是指 output 的格式使用條列式,而不是表格

mysql> EXPLAIN SELECT COUNT(*) FROM `relationships_relationship` WHERE `relationships_relationship`.`to_user_id` = 910188;
mysql> EXPLAIN SELECT COUNT(*) FROM `relationships_relationship` WHERE `relationships_relationship`.`to_user_id` = 841766;
mysql> EXPLAIN SELECT COUNT(*) FROM `relationships_relationship` WHERE `relationships_relationship`.`from_user_id` = 841766;

mysql> EXPLAIN SELECT COUNT(*) FROM `relationships_relationship` WHERE `relationships_relationship`.`to_user_id` = 841766\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: relationships_relationship
         type: index
possible_keys: relationships_relationship_to_user_id
          key: relationships_relationship_to_user_id
      key_len: 4
          ref: const
         rows: 20
        Extra: Using index
1 row in set (0.00 sec)

The possible_keys column indicates which indexes MySQL can choose from use to find the rows in this table.
The key column indicates the key (index) that MySQL actually decided to use.
The rows column indicates the number of rows MySQL believes it must examine to execute the query.

type: 優 > 劣
system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

rows 的值越小越好,表示 MySQL 在越小的資料集中查找資料

Extra: Using index 表示使用了索引,但是仍舊有可能是使用了錯誤的索引

ref:
https://blog.hinablue.me/entry/mysql-note-mysql-query-explain-optimization/ << 值得一讀
http://baike.baidu.com/view/9416916.htm

Query Profiler

mysql> SET profiling=1;
mysql> DO YOUR QUERY;
mysql> SHOW profiles;
mysql> SHOW profile cpu, block io for query 6;

The Slow Query Log

[mysqld]
...
slow_query_log
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 10
log_slow_verbosity = query_plan
log_queries_not_using_indexes
...
$ sudo su root
$ pt-query-digest /var/log/mysql/mariadb-slow.log

ref:
http://dev.mysql.com/doc/refman/5.5/en/slow-query-log.html
http://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html