CodeTengu Weekly Issue 125 @vinta - Amazon Web Services, Google Cloud Platform, Kubernetes, DevOps, MySQL, Redis

This post is also published in CodeTengu Weekly - Issue 125.

Apex and Terraform: The easiest way to manage AWS Lambda functions

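I have always subscribed to RSS feeds, but whenever work gets busy the unread articles pile up and I forget about them. Yet I noticed that no matter how busy I am at work, I still scroll through Twitter every now and then ~~and grumble a bit while I'm at it~~. So I built @vinta_rss_bot, which uses Zapier to sync articles from Feedly to Twitter, so that I stumble on them while scrolling my timeline. I have been testing it for a little over a week and it works well; you might want to build an RSS bot of your own.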

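Setting up this RSS bot with Zapier took only five minutes and not a single line of code. But not everyone is a follower of the "God of Spacing", and once @vinta_rss_bot had tweeted a few articles whose titles had no spaces between Chinese and English text, I started to feel uncomfortable all over. In the end I couldn't stand it and used AWS Lambda to write a web API that adds the spaces, api.pangu.space, which Zapier calls before posting to Twitter.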
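The spacing rule itself is easy to sketch. The following Python snippet is only an illustration of the idea, not the code behind api.pangu.space (which is written in Go); the function name `add_spacing` and the simplified set of CJK ranges are assumptions made for this example.

```python
import re

# Simplified CJK ranges (kana, CJK Extension A, CJK Unified, compatibility
# ideographs). This is an assumption for the sketch, not an exhaustive list.
CJK = "\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff"

CJK_THEN_LATIN = re.compile(f"([{CJK}])([A-Za-z0-9])")
LATIN_THEN_CJK = re.compile(f"([A-Za-z0-9])([{CJK}])")


def add_spacing(text):
    """Insert a space wherever a CJK character touches an ASCII letter or digit."""
    text = CJK_THEN_LATIN.sub(r"\1 \2", text)
    return LATIN_THEN_CJK.sub(r"\1 \2", text)


print(add_spacing("用AWS Lambda寫API"))  # 用 AWS Lambda 寫 API
```

Text that is already spaced passes through unchanged, so the function can safely be applied to every title.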

(That was a rather long preamble.)

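This article documents how I deployed AWS Lambda functions with Apex and Terraform. The main logic is simple and written in Go; the more troublesome part was configuring Amazon API Gateway and HTTPS for the custom domain. Since it is only a side project, I didn't use the more heavyweight Serverless.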

cert-manager: Automatically provision TLS certificates in Kubernetes

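Our company's Kubernetes cluster currently uses kube-lego to obtain TLS/SSL certificates from Let's Encrypt automatically. However, kube-lego has announced that it will only support Kubernetes up to v1.8, and its maintainers ask everyone to switch to cert-manager, another tool that does the same job and is developed by the same people.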

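This article documents how I deployed cert-manager in preparation for migrating from kube-lego. During testing, however, I found that some cert-manager features were not yet mature, ingress-shim for example, and we had not actually run into any problems using kube-lego on Kubernetes v1.9.6, so we decided not to migrate for now. Still, since the article was already written, I'm sharing it in the hope that it helps someone.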

GCP products described in 4 words or less

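I used to work mostly with AWS, but my current company uses Google Cloud Platform. This article gives you a quick overview of what is available on GCP.

I can't help complaining: when will Google Cloud Memorystore finally launch?

Although GCP still lags behind AWS in most respects (Google Kubernetes Engine excepted), Google Cloud's Stackdriver suite is really nice to use. Logging, for example, offers full-text search over the stdout of all containers with no configuration at all (glancing over at ELK). Speaking of reading logs, kubetail is also quite good; it is basically an enhanced kubectl logs -f. There is also Debugger, which lets you run a debugger directly against production code, which is seriously flashy.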

One Giant Leap For SQL: MySQL 8.0 Released

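MySQL 8.0 was released a while ago. This version makes great strides in supporting the SQL standard and finally breaks free of the curse of SQL-92. It may at last shake off the bad name of "Friends don't let friends use MySQL" (for now it looks like MongoDB will inherit that stigma).

Because I had always used MySQL, I had no idea what window functions were; the first time I used OVER (PARTITION BY ... ORDER BY ...) was actually in Apache Spark (what a SQL bumpkin).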

Redis in Action

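Last week I spent some time studying how Redis RDB/AOF persistence and master/slave replication work. Besides the official documentation, I found that the book Redis in Action covers them in great detail. Some of the content may be a little dated, but it is endorsed by the author of Redis himself and is worth reading.

I can't help sharing this: after reading the Redis 4.0 redis.conf closely last week, I discovered a new aof-use-rdb-preamble setting. In my tests, enabling it reduced the size of appendonly.aof by 50%. Give it a try when you have time.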
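One way to check whether the setting has taken effect is to look at the first bytes of appendonly.aof. The sketch below assumes that the preamble is only written when the AOF is rewritten (for example after BGREWRITEAOF) and that an RDB payload starts with the magic bytes REDIS, whereas a plain AOF starts with a RESP command; the helper name `aof_has_rdb_preamble` is made up for this example.

```python
def aof_has_rdb_preamble(path):
    """Return True if the file at `path` starts with the RDB magic bytes."""
    with open(path, "rb") as aof:
        # Assumption: an RDB payload begins with b"REDIS" followed by a version.
        return aof.read(5) == b"REDIS"
```

Calling `aof_has_rdb_preamble("appendonly.aof")` before and after a rewrite should then show whether the hybrid format is in use.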

金丝雀发布、滚动发布、蓝绿发布到底有什么差别?关键点是什么?

Only after reading this article did I finally understand what canary releases, blue-green deployment, and rolling updates mean (embarrassing).

HTTP codes as Valentine’s Day comics

This article introduces various HTTP status codes as comics. It's almost too cute.

Shared by @vinta.

Monty Python's Flying Circus on Netflix

Ladies and gentlemen, Monty Python's Flying Circus is now on Netflix! If you don't know who Monty Python are, we introduced them in Issue 6!

Shared by @vinta!

kube-lego: Automatically provision TLS certificates in Kubernetes

kube-lego automatically requests certificates for Kubernetes Ingress resources from Let's Encrypt.

ref:
https://github.com/jetstack/kube-lego
https://letsencrypt.org/

I run kube-lego v0.1.5 on Kubernetes v1.9.4, and everything works fine.

Deploy kube-lego

It is strongly recommended to try the Let's Encrypt staging API first.

# kube-lego/deployment.yaml
kind: Namespace
apiVersion: v1
metadata:
  name: kube-lego
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-lego
  namespace: kube-lego
data:
  LEGO.EMAIL: "[email protected]"
  # LEGO.URL: "https://acme-v01.api.letsencrypt.org/directory"
  LEGO.URL: "https://acme-staging.api.letsencrypt.org/directory"
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kube-lego
  namespace: kube-lego
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-lego
  template:
    metadata:
      labels:
        app: kube-lego
    spec:
      containers:
      - name: kube-lego
        image: jetstack/kube-lego:0.1.5
        ports:
        - containerPort: 8080
        env:
        - name: LEGO_LOG_LEVEL
          value: debug
        - name: LEGO_EMAIL
          valueFrom:
            configMapKeyRef:
              name: kube-lego
              key: LEGO.EMAIL
        - name: LEGO_URL
          valueFrom:
            configMapKeyRef:
              name: kube-lego
              key: LEGO.URL
        - name: LEGO_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: LEGO_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 1

ref:
https://github.com/jetstack/kube-lego/tree/master/examples

$ kubectl apply -f kube-lego/ -R

Configure the Ingress

  • Add the annotation kubernetes.io/tls-acme: "true" to metadata.annotations.
  • Add your domains to spec.tls.hosts.

spec.tls.secretName is the Secret used to store the certificate received from Let's Encrypt, i.e., tls.key and tls.crt. If no Secret exists with that name, it will be created by kube-lego.
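The checklist above can be verified mechanically before a manifest is applied. This is a minimal Python sketch, assuming the Ingress manifest has already been parsed into a dict (for example with a YAML library); `ingress_checklist_problems` is a hypothetical helper and not part of kube-lego.

```python
def ingress_checklist_problems(ingress):
    """Return a list of deviations from the kube-lego checklist for an Ingress dict."""
    problems = []
    annotations = (ingress.get("metadata") or {}).get("annotations") or {}
    if annotations.get("kubernetes.io/tls-acme") != "true":
        problems.append('missing annotation kubernetes.io/tls-acme: "true"')

    spec = ingress.get("spec") or {}
    tls_hosts = set()
    for entry in spec.get("tls") or []:
        if not entry.get("secretName"):
            problems.append("a spec.tls entry has no secretName")
        tls_hosts.update(entry.get("hosts") or [])
    if not tls_hosts:
        problems.append("spec.tls.hosts is empty")

    # Hosts that are routed but not listed under spec.tls are easy to overlook.
    for rule in spec.get("rules") or []:
        host = rule.get("host")
        if host and host not in tls_hosts:
            problems.append(f"{host} is routed but not listed in spec.tls.hosts")
    return problems
```

An empty list means the manifest satisfies both bullet points and names a Secret for the certificate.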

# ingress.yaml
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: simple-project
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/tls-acme: "true"
spec:
  tls:
  - secretName: kittenphile-com-tls
    hosts:
    - kittenphile.com
    - www.kittenphile.com
    - api.kittenphile.com
  rules:
  - host: kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-frontend
          servicePort: http
  - host: www.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-frontend
          servicePort: http
  - host: api.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-api
          servicePort: http

ref:
https://kubernetes.io/docs/concepts/services-networking/ingress/#tls

$ kubectl apply -f ingress.yaml

You can find the exact ACME challenge paths by inspecting your Ingress resource.

$ kubectl describe ing simple-project
...
TLS:
  kittenphile-com-tls terminates kittenphile.com,www.kittenphile.com,api.kittenphile.com
Rules:
  Host                 Path  Backends
  ----                 ----  --------
kittenphile.com
                       /.well-known/acme-challenge/*   kube-lego-gce:8080 (<none>)
                       /*                              simple-frontend:http (<none>)
www.kittenphile.com
                       /.well-known/acme-challenge/*   kube-lego-gce:8080 (<none>)
                       /*                              simple-frontend:http (<none>)
api.kittenphile.com
                       /.well-known/acme-challenge/*   kube-lego-gce:8080 (<none>)
                       /*                              simple-api:http (<none>)
...

To observe the progress, follow the logs of the kube-lego Pods.

$ kubectl logs -f deploy/kube-lego --namespace kube-lego

Create a Production Certificate

Once you have confirmed that everything works, you can request production certificates for your domains.

Follow these instructions:

  • Change LEGO.URL in the ConfigMap (exposed to the container as LEGO_URL) to https://acme-v01.api.letsencrypt.org/directory
  • Delete account secret kube-lego-account
  • Delete certificate secret kittenphile-com-tls
  • Restart kube-lego

$ kubectl get secrets --all-namespaces
$ kubectl delete secret kube-lego-account --namespace kube-lego && \
  kubectl delete secret kittenphile-com-tls

$ kubectl replace --force -f kube-lego/ -R
$ kubectl logs -f deploy/kube-lego --namespace kube-lego

ref:
https://github.com/jetstack/kube-lego#switching-from-staging-to-production

cert-manager: Automatically provision TLS certificates in Kubernetes

cert-manager is a Kubernetes add-on that automatically obtains TLS certificates from Let's Encrypt for your cluster. It is also the official successor of kube-lego.

ref:
https://github.com/jetstack/cert-manager
https://letsencrypt.org/

If you are interested in kube-lego, see the following post:

kube-lego: Automatically provision TLS certificates in Kubernetes
https://vinta.ws/code/kube-lego-automatically-provision-tls-certificates-in-kubernetes.html

Install

This assumes you already have Helm set up. If not, see the following post:

Helm: the package manager for Kubernetes
https://vinta.ws/code/helm-the-package-manager-for-kubernetes.html

$ helm install \
--name cert-manager \
--set rbac.create=false \
stable/cert-manager

$ helm ls --all cert-manager

$ kubectl logs deploy/cert-manager-cert-manager cert-manager -f
$ kubectl logs deploy/cert-manager-cert-manager ingress-shim -f

ref:
https://github.com/jetstack/cert-manager/blob/master/docs/user-guides/deploying.md
https://docs.helm.sh/helm/#helm-install

Create Cluster Issuers

An Issuer represents a certificate authority, for instance Let's Encrypt, that issues TLS certificates for your domains.

spec.acme.privateKeySecretRef is the Secret used to store the ACME account private key; cert-manager creates it for you.

# cert-manager/issuer.yaml
kind: ClusterIssuer
apiVersion: certmanager.k8s.io/v1alpha1
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v01.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod-private-key
    http01: {}
---
kind: ClusterIssuer
apiVersion: certmanager.k8s.io/v1alpha1
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-staging-private-key
    http01: {}

$ kubectl apply -f cert-manager/issuer.yaml

$ kubectl get clusterissuers
$ kubectl describe clusterissuer letsencrypt-staging

$ kubectl get secrets --all-namespaces
NAMESPACE     NAME                                    TYPE                                  DATA      AGE
default       cert-manager-cert-manager-token-5j4gw   kubernetes.io/service-account-token   3         6m
kube-system   letsencrypt-prod-private-key            Opaque                                1         40s
kube-system   letsencrypt-staging-private-key         Opaque                                1         40s
...

ref:
https://github.com/jetstack/cert-manager/blob/master/docs/user-guides/cluster-issuers.md
https://github.com/jetstack/cert-manager/tree/master/docs/api-types/issuer

Create the Ingress

Assuming you already have an Ingress like this:

# ingress.yaml
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: simple-project
  annotations:
    kubernetes.io/ingress.class: "gce"
spec:
  rules:
  - host: kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-frontend
          servicePort: http
  - host: api.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-api
          servicePort: http

Before you test certificate provisioning, add DNS A records that point all of your domains to the "Address" of the Ingress.

$ kubectl apply -f ingress.yaml

$ kubectl describe ing simple-project
Name:             simple-project
Namespace:        default
Address:          12.34.56.78
Default backend:  default-http-backend:80 (10.44.2.5:8080)

$ dig kittenphile.com
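The same DNS check can be scripted for all domains at once. This is a minimal sketch using only the Python standard library; `misdirected_domains` is a hypothetical helper, and note that socket.gethostbyname returns a single IPv4 address, so it only approximates what dig shows.

```python
import socket


def misdirected_domains(domains, ingress_address, resolve=socket.gethostbyname):
    """Map each domain that does not resolve to the Ingress address to what it resolves to.

    Domains that cannot be resolved at all are mapped to None.
    """
    wrong = {}
    for domain in domains:
        try:
            resolved = resolve(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved = None
        if resolved != ingress_address:
            wrong[domain] = resolved
    return wrong
```

For the example above, `misdirected_domains(["kittenphile.com", "api.kittenphile.com"], "12.34.56.78")` should return an empty dict once the A records have propagated.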

Create a Staging Certificate

The Let's Encrypt production API is rate limited (20 certificates per registered domain per week at the time of writing), so it is strongly recommended to test your configuration against the staging API first.

A Certificate contains the information required to make a certificate signing request for a given Issuer.

# cert-manager/certificate.yaml
kind: Certificate
apiVersion: certmanager.k8s.io/v1alpha1
metadata:
  name: kittenphile-com
spec:
  secretName: kittenphile-com-tls
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  commonName: kittenphile.com
  dnsNames:
  - kittenphile.com
  - api.kittenphile.com
  acme:
    config:
    - http01:
        ingress: simple-project
      domains:
      - kittenphile.com
      - api.kittenphile.com

ref:
https://github.com/jetstack/cert-manager/blob/master/docs/user-guides/acme-http-validation.md
https://blog.n1analytics.com/free-automated-tls-certificates-on-k8s/

Configure the Ingress

Add the domains that need TLS certificates to spec.tls.hosts.

spec.tls.secretName is the Secret used to store the certificate received from Let's Encrypt, i.e., tls.key and tls.crt.

# ingress.yaml
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: simple-project
  annotations:
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
  - secretName: kittenphile-com-tls
    hosts:
    - kittenphile.com
    - api.kittenphile.com
  rules:
  - host: kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-frontend
          servicePort: http
  - host: api.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-api
          servicePort: http

ref:
https://kubernetes.io/docs/concepts/services-networking/ingress/#tls

cert-manager watches for new domain entries in any Certificate resource, requests certificates from Let's Encrypt for the new domains, and creates ACME HTTP-01 challenge endpoints that are attached to the Ingress automatically.

You can follow the issuing progress in the "Events" section of the kittenphile-com Certificate.

$ kubectl logs deploy/cert-manager-cert-manager cert-manager -f

$ kubectl apply -f ingress.yaml
$ kubectl apply -f cert-manager/certificate.yaml

$ kubectl describe certificate kittenphile-com
...
Events:
  Type    Reason               Age                From                     Message
  ----    ------               ----               ----                     -------
  Normal  PresentChallenge     5m                 cert-manager-controller  Presenting http-01 challenge for domain kittenphile.com
  Normal  PresentChallenge     5m                 cert-manager-controller  Presenting http-01 challenge for domain api.kittenphile.com
  Normal  SelfCheck            5m                 cert-manager-controller  Performing self-check for domain kittenphile.com
  Normal  SelfCheck            5m                 cert-manager-controller  Performing self-check for domain api.kittenphile.com
  Normal  ObtainAuthorization  25s                cert-manager-controller  Obtained authorization for domain kittenphile.com
  Normal  ObtainAuthorization  36s                cert-manager-controller  Obtained authorization for domain api.kittenphile.com
  Normal  RenewalScheduled     19s (x3 over 23s)  cert-manager-controller  Certificate scheduled for renewal in 1438 hours
  Normal  CeritifcateIssued    19s (x3 over 24s)  cert-manager-controller  Certificated issued successfully
...
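The "1438 hours" in the RenewalScheduled event is consistent with a Let's Encrypt certificate (valid for 90 days) being renewed 30 days before it expires. The 30-day window is an assumption about cert-manager's behaviour, and the dates below are made up; the sketch only shows the arithmetic.

```python
from datetime import datetime, timedelta

# Assumption: certificates are renewed 30 days before they expire.
RENEW_BEFORE = timedelta(days=30)


def hours_until_renewal(not_after, now, renew_before=RENEW_BEFORE):
    """Whole hours from `now` until the renewal time (`not_after` minus `renew_before`)."""
    renew_at = not_after - renew_before
    return (renew_at - now) // timedelta(hours=1)


now = datetime(2018, 4, 1, 12, 0)  # hypothetical "current" time
print(hours_until_renewal(now + timedelta(days=90), now))  # 1440
```

With exactly 90 days left, renewal is 60 days (1440 hours) away; a certificate with slightly less than 90 days remaining gives a value just below that, such as the 1438 hours shown in the events.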

You can also find the exact ACME challenge paths by inspecting your Ingress resource.

$ kubectl describe ing simple-project
...
TLS:
  kittenphile-com-tls terminates kittenphile.com,api.kittenphile.com
Rules:
  Host                Path  Backends
  ----                ----  --------
kittenphile.com
                      /*                                                  simple-frontend:http (<none>)
                      /.well-known/acme-challenge/ltvlVWEXTup5BqEsztirs   cm-kittenphile-com-gikjk:8089 (<none>)
api.kittenphile.com
                      /*                                                  simple-api:http (<none>)
                      /.well-known/acme-challenge/kd08LK93Fkdf653h9dfjj   cm-kittenphile-com-hgdkd:8090 (<none>)
...

Note that when you use Google Cloud's Ingress controller (kubernetes.io/ingress.class: "gce"), changes to load balancers can take up to 10 minutes to propagate. cert-manager sets a 15-minute timeout on HTTP validations to allow for this.

ref:
https://github.com/jetstack/cert-manager/issues/285

Create a Production Certificate

Once you have confirmed that the configuration is correct, change spec.issuerRef.name in the Certificate manifest to letsencrypt-prod. Then delete the staging Certificate and its TLS Secret before applying the manifest again.

$ kubectl delete certificate kittenphile-com && \
  kubectl delete secret kittenphile-com-tls

$ kubectl apply -f cert-manager/certificate.yaml

$ kubectl describe certificate kittenphile-com
$ kubectl describe ing simple-project

cert-manager attaches temporary Services to the Ingress to present the ACME HTTP-01 challenge for each domain, which changes the configuration of the Ingress. Keep in mind that Google Cloud's Ingress controller might take a long time to propagate these changes.

Provision automatically with ingress-shim

As of cert-manager v0.2.4, ingress-shim seems to have some issues; for instance, it cannot detect new domains that are added after the first issuance. The workaround is to create Certificate manifests manually, in other words, not to use ingress-shim. For reference, this is how a default issuer is configured for ingress-shim:

$ helm upgrade \
cert-manager \
stable/cert-manager \
--set ingressShim.extraArgs='{--default-issuer-name=letsencrypt-prod,--default-issuer-kind=ClusterIssuer}'

ref:
https://github.com/jetstack/cert-manager/blob/master/docs/user-guides/ingress-shim.md

Migrate from kube-lego

Scale down and make sure kube-lego Pods are no longer running.

$ kubectl scale \
--namespace kube-lego \
--replicas=0 \
deployment kube-lego

$ kubectl get pods --namespace kube-lego

Download a copy of the ACME account private key that was created by kube-lego.

$ kubectl get secret \
--namespace kube-lego \
-o yaml \
--export kube-lego-account > cert-manager/secret.yaml

Change metadata.name to something more relevant to cert-manager.

# cert-manager/secret.yaml
kind: Secret
apiVersion: v1
metadata:
  name: letsencrypt-prod-private-key
type: Opaque
data:
  acme-registration-url: XXX
  tls.key: XXX
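If you prefer to script the rename rather than edit the exported file by hand, the step can be sketched as follows. The sketch assumes the exported Secret has been parsed into a dict; `rename_secret` is a hypothetical helper. The new name has to match spec.acme.privateKeySecretRef.name of the ClusterIssuer, which is letsencrypt-prod-private-key in the manifests above.

```python
import copy


def rename_secret(exported_secret, new_name):
    """Return a copy of an exported Secret manifest that keeps only the new name as metadata."""
    renamed = copy.deepcopy(exported_secret)
    # Replace the whole metadata block so that only the name remains,
    # which mirrors the manifest shown above.
    renamed["metadata"] = {"name": new_name}
    return renamed
```

The original dict is left untouched, so the kube-lego copy can be kept as a backup.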

Deploy cert-manager's Issuers and Certificates. Make sure your Certificate matches the domains specified in the Ingress.

$ kubectl apply -f cert-manager/secret.yaml && \
  kubectl apply -f cert-manager/issuer.yaml && \
  kubectl apply -f cert-manager/certificate.yaml
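Whether the Certificate really matches the Ingress can be checked before applying anything. This is a minimal sketch, assuming both manifests have been parsed into dicts; `uncovered_hosts` is a hypothetical helper, and it only compares names literally (no wildcard handling).

```python
def uncovered_hosts(ingress, certificate):
    """Return the Ingress TLS hosts that the Certificate does not list, sorted."""
    tls_hosts = set()
    for entry in (ingress.get("spec") or {}).get("tls") or []:
        tls_hosts.update(entry.get("hosts") or [])

    cert_spec = certificate.get("spec") or {}
    covered = set(cert_spec.get("dnsNames") or [])
    if cert_spec.get("commonName"):
        covered.add(cert_spec["commonName"])
    return sorted(tls_hosts - covered)
```

An empty list means every host under spec.tls.hosts appears in the Certificate's commonName or dnsNames.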

ref:
https://github.com/jetstack/cert-manager/blob/master/docs/user-guides/migrating-from-kube-lego.md

Play with GitHub Archive dataset on BigQuery

Google BigQuery is a web service for interactive analysis of massive datasets; it can analyze billions of rows in seconds.

ref:
https://www.githubarchive.org/#bigquery
https://bigquery.cloud.google.com/table/githubarchive:month.201612

See also:
http://ghtorrent.org/

Show repository information (1)

WITH repo_info AS (
  SELECT repo.id AS id, repo.name AS name, JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.description') AS description
  FROM `githubarchive.month.2017*`
  -- FROM `githubarchive.year.2016`
  -- FROM `githubarchive.year.*`
  WHERE type = "PullRequestEvent"
)

SELECT repo_info.name, ANY_VALUE(repo_info.description) AS description
FROM repo_info
WHERE
  repo_info.description IS NOT NULL AND
  repo_info.description != ""
GROUP BY repo_info.name
ORDER BY repo_info.name

ref:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#json-functions
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#any_value

Show repository information (2)

WITH repo_info AS (
  SELECT repo.id AS id, repo.name AS name, JSON_EXTRACT_SCALAR(payload, '$.description') AS description
  FROM `githubarchive.month.201708`
  WHERE type = "CreateEvent"
)

SELECT repo_info.name, ANY_VALUE(repo_info.description) AS description
FROM repo_info
WHERE
  repo_info.description IS NOT NULL AND
  repo_info.description != ""
GROUP BY repo_info.name
ORDER BY repo_info.name

Show repository information (3)

SELECT name, description
FROM `ghtorrent-bq.ght_2017_04_01.projects`
WHERE
  forked_from IS NULL AND
  description IS NOT NULL AND
  description != ""

Show starred repositories by a specific user

This query uses legacy SQL (TABLE_QUERY), unlike the standard SQL queries above. Starring a repository is recorded as a WatchEvent:
https://developer.github.com/v3/activity/events/types/#watchevent

SELECT repo.name, created_at
FROM TABLE_QUERY([githubarchive:month], 'LEFT(table_ID,4) IN ("2017","2016","2015")') 
WHERE type = "WatchEvent" AND actor.login = 'vinta'
GROUP BY repo.name, created_at
ORDER BY created_at DESC

Show starred repositories for users who starred 10+ of the top 1,000 repositories

WITH stars AS (
  SELECT DISTINCT actor.login AS user, repo.name AS repo
  FROM `githubarchive.month.2017*`
  WHERE type = "WatchEvent"
),
repositories_stars AS (
  SELECT repo, COUNT(*) AS c
  FROM stars
  GROUP BY repo
  ORDER BY c DESC
  LIMIT 1000
),
users_stars AS (
  SELECT user, COUNT(*) AS c
  FROM stars
  WHERE repo IN (SELECT repo FROM repositories_stars)
  GROUP BY user
  HAVING c >= 10
  LIMIT 10000
)
SELECT user, repo
FROM stars
WHERE
  repo IN (SELECT repo FROM repositories_stars) AND
  user IN (SELECT user FROM users_stars)

ref:
https://gist.github.com/jbochi/2e8ddcc5939e70e5368326aa034a144e