CodeTengu Weekly Issue 135 @vinta - Kubernetes, Python, MongoDB

This article was also published in CodeTengu Weekly - Issue 135.

The incomplete guide to Google Kubernetes Engine

I wrote up an article based on my recent experience tinkering with Kubernetes and would like to share it; I hope it helps. It covers an introduction to the concepts, creating a cluster, adding node pools, deploying ConfigMaps, Deployments with LivenessProbe/ReadinessProbe, Horizontal Pod Autoscaler, Pod Disruption Budget, StatefulSet, and DaemonSet, as well as the relationship between Service and Ingress and how to apply Node Affinity and Pod Affinity.

By the way, even if you are just playing around, I recommend spinning up a preemptible (similar to AWS Spot Instances) k8s cluster directly on Google Kubernetes Engine. It is dirt cheap, so there is no need to keep using minikube. That said, Amazon now has its own managed Kubernetes too; although my current company uses GCP, I still miss AWS.

Fluent Python

Even though I have been writing Python for a while, I still learn a lot every time I read this book. Highly recommended.

When I first learned Python, I read another book, Learning Python; I just checked and wow, it is already in its fifth edition.

Further reading:

A deep dive into the PyMongo MongoDB driver

A Replica Set is usually the standard MongoDB setup (the next step up would be sharding). This talk explains in detail how a Replica Set handles service discovery and how PyMongo communicates with the Replica Set.

Further reading:

Let's talk about usernames

Just like the Falsehoods series we have mentioned many times before, this article patiently reminds us that usernames, something almost every system and every website has, are not as simple as you might think. Take a moment to let that sink in.

The author also brings up an important idea, the Tripartite Identity Pattern, which splits the so-called ID into three kinds:

  1. System-level identifier, suitable for use as a target of foreign keys in our database
  2. Login identifier, suitable for use in performing a credential check
  3. Public identity, suitable for displaying to other users

The point being that you should not try to cover all of these purposes with a single identifier.

Web Architecture 101

This article gives an easy-to-follow explanation of the components a modern web service usually has. But honestly, if you are a junior backend engineer today, how much time and effort would it take you to get a handle on everything mentioned in this article? Not to mention the lower-level knowledge, and oh, the article doesn't even touch on DevOps. Just like Will Kubernetes Collapse Under the Weight of Its Complexity?, which I read a while ago, I can't help feeling that the way things have evolved isn't very friendly to newcomers (or even to ordinary 1x engineers like us).

Further reading:

The incomplete guide to Google Kubernetes Engine

Kubernetes is the de facto standard of container orchestration. Google Kubernetes Engine (GKE) is the managed Kubernetes as a Service provided by Google Cloud Platform.

ref:
https://kubernetes.io/
https://cloud.google.com/kubernetes-engine/

You could find the sample project on GitHub.
https://github.com/vinta/simple-project-on-k8s

Installation

Install gcloud to create Kubernetes clusters on Google Cloud Platform.

ref:
https://cloud.google.com/sdk/docs/

Install kubectl to interact with any Kubernetes cluster.

$ brew install kubernetes-cli
# or
$ gcloud components install kubectl

Enable zsh shell autocompletion.

if [ $commands[kubectl] ]; then
  source <(kubectl completion zsh)
fi

ref:
https://kubernetes.io/docs/tasks/tools/install-kubectl/

Some useful tools mentioned throughout this guide: kubectx, fubectl, and kubetail.

Concepts

Nodes

  • Cluster: A set of machines, called nodes, that run containerized applications.
  • Node: A single virtual or physical machine that provides hardware resources.
  • Edge Node: The node which is exposed to the Internet.
  • Master Node: The node which is responsible for managing the whole cluster.

Objects

  • Pod: A group of one or more tightly related containers. Each Pod is like a logical host which has its own IP and hostname.
  • PodPreset: A set of pre-defined configurations that can be injected into Pods automatically.
  • Service: A load balancer for a set of Pods which are selected by labels, which is also how Service Discovery is done.
  • Ingress: A reverse proxy that acts as an entry point to the cluster, which allows domain-based and path-based routing to different Services.
  • ConfigMap: Key-value configuration data that can be mounted into containers or consumed as environment variables.
  • Secret: Similar to ConfigMap but for storing sensitive data only.
  • Volume: An ephemeral file system whose lifetime is the same as the Pod's.
  • PersistentVolume: A persistent file system that can be mounted to the cluster, without being associated with any particular node.
  • PersistentVolumeClaim: A binding between a Pod and a PersistentVolume.
  • StorageClass: A storage provisioner which allows users to request storage dynamically.
  • Namespace: A way to partition a single cluster into multiple virtual groups.

Controllers

  • ReplicationController: Ensures that a specified number of Pods are always running.
  • ReplicaSet: The next-generation ReplicationController.
  • Deployment: The recommended way to deploy stateless Pods.
  • StatefulSet: Similar to Deployment but provides guarantees about the ordering and unique names of Pods.
  • DaemonSet: Ensures a copy of a Pod is running on every node.
  • Job: Creates Pods that run to completion (exit with 0).
  • CronJob: A Job which can run at a specific time or run regularly.
  • HorizontalPodAutoscaler: Automatically scales the number of Pods based on CPU and memory utilization or custom metric targets.

ref:
https://kubernetes.io/docs/concepts/
https://kubernetes.io/docs/reference/glossary/?all=true

Setup Google Cloud Accounts

Make sure you use the right Google Cloud Platform account.

$ gcloud init
# or
$ gcloud config configurations activate default

$ gcloud config configurations list

$ gcloud config set compute/region asia-east1
$ gcloud config set compute/zone asia-east1-a
$ gcloud config list

Create Clusters

Create a regional cluster in the asia-east1 region which has 1 node in each of the asia-east1 zones, using --region=asia-east1 --num-nodes=1. By default, a cluster only creates its cluster master and nodes in a single compute zone.

# show available OSs and versions of Kubernetes
$ gcloud container get-server-config

$ gcloud container clusters create demo \
--cluster-version=1.10.5-gke.3 \
--node-version=1.10.5-gke.3 \
--scopes=gke-default,storage-full,compute-ro,pubsub,https://www.googleapis.com/auth/cloud_debugger \
--region=asia-east1 \
--num-nodes=1 \
--enable-autoscaling --min-nodes=1 --max-nodes=10 \
--maintenance-window=20:00 \
--machine-type=n1-standard-4 \
--image-type=UBUNTU \
--node-labels=custom.kubernetes.io/fs-type=xfs
# or
$ gcloud container clusters create demo \
--cluster-version=1.10.5-gke.3 \
--node-version=1.10.5-gke.3 \
--scopes=gke-default,storage-full,compute-ro,pubsub,https://www.googleapis.com/auth/cloud_debugger \
--region=asia-east1 \
--num-nodes=1 \
--enable-autoscaling --min-nodes=1 --max-nodes=10 \
--maintenance-window=20:00 \
--machine-type=n1-standard-4 \
--enable-autorepair \
--preemptible

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-18T11:36:43Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.5-gke.3", GitCommit:"6265b9797fc8680c8395abeab12c1e3bad14069a", GitTreeState:"clean", BuildDate:"2018-07-19T23:02:51Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl get nodes -o wide

ref:
https://cloud.google.com/sdk/gcloud/reference/container/clusters/create
https://cloud.google.com/kubernetes-engine/docs/concepts/multi-zone-and-regional-clusters
https://cloud.google.com/compute/docs/machine-types

Google Kubernetes Engine clusters running Kubernetes version 1.8+ enable Role-Based Access Control (RBAC) by default. Therefore, you must explicitly provide the --enable-legacy-authorization option to disable RBAC.

ref:
https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control

Delete the cluster. After you delete the cluster, you might also need to manually delete persistent disks (under Compute Engine), load balancers (under Network services) and static IPs (under VPC network) which belong to the cluster on Google Cloud Platform Console.

$ gcloud container clusters delete demo

Create Node Pools

Create a node pool with preemptible VMs, which are much cheaper than regular instances, using --preemptible.

You might receive a "The connection to the server x.x.x.x was refused - did you specify the right host or port?" error while the cluster is being upgraded, which includes when new node pools are being added.

$ gcloud container node-pools create n1-standard-4-pre \
--cluster=demo \
--node-version=1.10.5-gke.3 \
--scopes=gke-default,storage-full,compute-ro,pubsub,https://www.googleapis.com/auth/cloud_debugger \
--region=asia-east1 \
--num-nodes=1 \
--enable-autoscaling --min-nodes=1 --max-nodes=10 \
--machine-type=n1-standard-4 \
--enable-autorepair \
--preemptible

$ gcloud container node-pools list --cluster=demo --region=asia-east1

$ gcloud container operations list

ref:
https://cloud.google.com/sdk/gcloud/reference/container/node-pools/create
https://cloud.google.com/kubernetes-engine/docs/concepts/preemptible-vm
https://cloud.google.com/compute/docs/regions-zones/

Switch Contexts

Get authentication credentials, which generates a Context locally so you can interact with the cluster.

$ gcloud container clusters get-credentials demo --project simple-project-198818

ref:
https://cloud.google.com/sdk/gcloud/reference/container/clusters/get-credentials
https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/

A Context is roughly a configuration profile which indicates the cluster, the namespace, and the user you use. Contexts are stored in ~/.kube/config.

$ kubectl config get-contexts
$ kubectl config use-context gke_simple-project-198818_asia-east1_demo
$ kubectl config view

ref:
https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/

The recommended way to switch contexts is using kubectx or fubectl.

$ kubectx
# or
$ kcs

ref:
https://github.com/ahmetb/kubectx
https://github.com/kubermatic/fubectl

Build Docker Images

You could use Google Cloud Container Builder or any Continuous Integration (CI) service to automatically build Docker images and push them to Google Container Registry.

Furthermore, you need to tag your Docker images appropriately with the registry name format: region_name.gcr.io/your_project_id/your_image_name:version.

ref:
https://cloud.google.com/container-builder/
https://cloud.google.com/container-registry/

An example of cloudbuild.yaml:

substitutions:
  _REPO_NAME: simple
steps:
- id: pull-image
  name: gcr.io/cloud-builders/docker
  entrypoint: "/bin/sh"
  args: ["-c", "docker pull asia.gcr.io/$PROJECT_ID/$_REPO_NAME:$BRANCH_NAME || true"]
  waitFor: ["-"]
- id: build-image
  name: gcr.io/cloud-builders/docker
  args: [
    "build",
    "--cache-from", "asia.gcr.io/$PROJECT_ID/$_REPO_NAME:$BRANCH_NAME",
    "--label", "git.commit=$SHORT_SHA",
    "--label", "git.branch=$BRANCH_NAME",
    "--label", "ci.build-id=$BUILD_ID",
    "-t", "asia.gcr.io/$PROJECT_ID/$_REPO_NAME:$SHORT_SHA",
    "simple-api/"
  ]
  waitFor: [
    "pull-image",
  ]
images:
  - asia.gcr.io/$PROJECT_ID/$_REPO_NAME:$SHORT_SHA

ref:
https://cloud.google.com/container-builder/docs/build-config
https://cloud.google.com/container-builder/docs/create-custom-build-steps

Of course, you could also manually push Docker images to Google Container Registry.

$ gcloud auth configure-docker && \
gcloud config set project simple-project-198818 && \
export PROJECT_ID="$(gcloud config get-value project -q)"

$ docker build --rm -t asia.gcr.io/${PROJECT_ID}/simple-api:v1 simple-api/

$ gcloud docker -- push asia.gcr.io/${PROJECT_ID}/simple-api:v1

$ gcloud container images list --repository=asia.gcr.io/${PROJECT_ID}

ref:
https://cloud.google.com/container-registry/docs/pushing-and-pulling

Create Pods

No, you should never create Pods directly; these are so-called naked Pods. Use a Deployment instead.

ref:
https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/

Pods go through the following lifecycle phases (states), which you could inspect as shown below:

  • Pending
  • Running
  • Succeeded
  • Failed
  • Unknown
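
For example, you could check which phase a Pod is currently in:

$ kubectl get pod simple-api-5bbf4dd4f9-8b4c9 -o jsonpath='{.status.phase}'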

ref:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

Inspect Pods

Show information about Pods.

$ kubectl get all

$ kubectl get deploy

$ kubectl get pods
$ kubectl get pods -l app=simple-api

$ kubectl describe pod simple-api-5bbf4dd4f9-8b4c9
$ kubectl get pod simple-api-5bbf4dd4f9-8b4c9 -o yaml

ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#describe
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#get

Execute a command in a container.

$ kubectl exec -i -t simple-api-5bbf4dd4f9-8b4c9 -- sh
# or
$ kex sh

ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#exec

Tail Pod logs. It is also recommended to use kubetail.

$ kubectl logs simple-api-5bbf4dd4f9-8b4c9 -f
$ kubectl logs deploy/simple-api -f
$ kubectl logs statefulset/mongodb-rs0 -f

$ kubetail simple-api
$ kubetail simple-worker
$ kubetail mongodb-rs0 -c db

ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#logs
https://github.com/johanhaleby/kubetail

List all Pods on a certain node.

$ kubectl describe node gke-demo-default-pool-fb33ac26-frkw
...
Non-terminated Pods:         (7 in total)
  Namespace                  Name                                              CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                              ------------  ----------  ---------------  -------------
  default                    mongodb-rs0-1                                     2100m (53%)   4 (102%)    4G (30%)         4G (30%)
  default                    simple-api-84554476df-w5b5g                       500m (25%)    1 (51%)     1G (16%)         1G (16%)
  default                    simple-worker-6495b6b74b-rqplv                    500m (25%)    1 (51%)     1G (16%)         1G (16%)
  kube-system                fluentd-gcp-v3.0.0-848nq                          100m (2%)     0 (0%)      200Mi (1%)       300Mi (2%)
  kube-system                heapster-v1.5.3-6447d67f78-7psb2                  138m (3%)     138m (3%)   301856Ki (2%)    301856Ki (2%)
  kube-system                kube-dns-788979dc8f-5zvfk                         260m (6%)     0 (0%)      110Mi (0%)       170Mi (1%)
  kube-system                kube-proxy-gke-demo-default-pool-3c058fcf-x7cv    100m (2%)     0 (0%)      0 (0%)           0 (0%)
...

$ kubectl get pods --all-namespaces -o wide --sort-by="{.spec.nodeName}"

Check resource usage.

$ kubectl top pods
$ kubectl top nodes

ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#top
https://kubernetes.io/docs/tasks/debug-application-cluster/

Restart Pods.

# you could simply kill Pods which would restart automatically if your Pods are managed by any Deployment
$ kubectl delete pods -l app=simple-worker
$ kubectl delete pods -l app=swag,role=worker,queues=swag.features.chat

# you could replace a resource by providing a manifest
$ kubectl replace --force -f simple-api/

ref:
https://stackoverflow.com/questions/40259178/how-to-restart-kubernetes-pods

Completely delete resources.

$ kubectl delete -f simple-api/ -R
$ kubectl delete deploy simple-api
$ kubectl delete deploy -l app=simple,role=worker
$ kubectl delete sts kafka pzoo zoo

# delete a Pod forcefully
$ kubectl delete pod simple-api-668d465985-886h5 --grace-period=0 --force
$ kubectl delete deploy simple-api --grace-period=0 --force

# delete all resources under a namespace
$ kubectl delete daemonsets,deployments,services,statefulset,pvc,pv --all --namespace tick

ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#delete

Create ConfigMaps

Create an environment-variable-like ConfigMap.

kind: ConfigMap
apiVersion: v1
metadata:
  name: simple-api
data:
  FLASK_ENV: production
  MONGODB_URL: mongodb://mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local,mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local,mongodb-rs0-3.mongodb-rs0.default.svc.cluster.local/demo?readPreference=secondaryPreferred&maxPoolSize=10
  CACHE_URL: redis://redis-cache.default.svc.cluster.local/0
  CELERY_BROKER_URL: redis://redis-broker.default.svc.cluster.local/0
  CELERY_RESULT_BACKEND: redis://redis-broker.default.svc.cluster.local/1

Load environment variables from a ConfigMap:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: simple-api
  labels:
    app: simple-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-api
  template:
    metadata:
      labels:
        app: simple-api
    spec:
      containers:
      - name: simple-api
        image: asia.gcr.io/simple-project-198818/simple-api:4fc4199
        command: ["uwsgi", "--ini", "config/uwsgi.ini", "--single-interpreter", "--enable-threads", "--http", ":8000"]
        envFrom:
        - configMapRef:
            name: simple-api
        ports:
        - containerPort: 8000

Create a file-like ConfigMap.

kind: ConfigMap
apiVersion: v1
metadata:
  name: redis-cache
data:
  redis.conf: |-
    maxmemory-policy allkeys-lfu
    appendonly no
    save ""

Mount files from a ConfigMap:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: redis-cache
  labels:
    app: redis-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-cache
  template:
    metadata:
      labels:
        app: redis-cache
    spec:
      volumes:
      - name: config
        configMap:
          name: redis-cache
      containers:
      - name: redis
        image: redis:4.0.10-alpine
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf", "--loglevel", "verbose", "--maxmemory", "1g"]
        volumeMounts:
        - name: config
          mountPath: /etc/redis
        ports:
        - containerPort: 6379

ref:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/

Only mount a single file with subPath.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: redis-cache
  labels:
    app: redis-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-cache
  template:
    metadata:
      labels:
        app: redis-cache
    spec:
      volumes:
      - name: config
        configMap:
          name: redis-cache
      containers:
      - name: redis
        image: redis:4.0.10-alpine
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf", "--loglevel", "verbose", "--maxmemory", "1g"]
        volumeMounts:
        - name: config
          mountPath: /etc/redis/redis.conf
          subPath: redis.conf
        ports:
        - containerPort: 6379

ref:
https://github.com/kubernetes/kubernetes/issues/44815#issuecomment-297077509

It is worth noting that changing a ConfigMap or Secret won't trigger a re-deployment of the Deployments that consume it. A common workaround is to change the name of the ConfigMap every time you change its content. If you mount the ConfigMap as environment variables, you must trigger a re-deployment explicitly.

ref:
https://github.com/kubernetes/kubernetes/issues/22368
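
A minimal sketch of forcing a re-deployment by patching the Pod template with a throwaway annotation (the annotation key here is arbitrary, not a Kubernetes convention):

$ kubectl patch deploy simple-api -p \
  "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"redeploy-timestamp\":\"$(date +%s)\"}}}}}"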

Create Deployments With Probes

Deployments are designed for stateless (or nearly stateless) services. A Deployment controls a ReplicaSet, and the ReplicaSet controls its Pods.

ref:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/

livenessProbe can be used to determine when an application must be restarted by Kubernetes, while readinessProbe can be used to determine when a container is ready to accept traffic.

ref:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

It is also a best practice to always specify resource requests and limits: resources.requests and resources.limits.

ref:
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

Create a Deployment with probes.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: simple-api
  labels:
    app: simple-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-api
  template:
    metadata:
      labels:
        app: simple-api
    spec:
      containers:
      - name: simple-api
        image: asia.gcr.io/simple-project-198818/simple-api:4fc4199
        command: ["uwsgi", "--ini", "config/uwsgi.ini", "--single-interpreter", "--enable-threads", "--http", ":8000"]
        envFrom:
        - configMapRef:
            name: simple-api
        ports:
        - containerPort: 8000
        livenessProbe:
          exec:
            command: ["curl", "-fsS", "-m", "0.1", "-H", "User-Agent: KubernetesHealthCheck/1.0", "http://127.0.0.1:8000/health"]
          initialDelaySeconds: 5
          periodSeconds: 1
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          exec:
            command: ["curl", "-fsS", "-m", "0.1", "-H", "User-Agent: KubernetesHealthCheck/1.0", "http://127.0.0.1:8000/health"]
          initialDelaySeconds: 3
          periodSeconds: 1
          successThreshold: 1
          failureThreshold: 3
        resources:
          requests:
            cpu: 500m
            memory: 1G
          limits:
            cpu: 1000m
            memory: 1G

Create another Deployment of Celery workers.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: simple-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: simple-worker
  template:
    metadata:
      labels:
        app: simple-worker
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: simple-worker
        image: asia.gcr.io/simple-project-198818/simple-api:4fc4199
        command: ["celery", "-A", "app:celery", "worker", "--without-gossip", "-Ofair", "-l", "info"]
        envFrom:
        - configMapRef:
            name: simple-api
        readinessProbe:
          exec:
            command: ["sh", "-c", "celery inspect -q -A app:celery -d [email protected]$(hostname) --timeout 10 ping"]
          initialDelaySeconds: 15
          periodSeconds: 15
          timeoutSeconds: 10
          successThreshold: 1
          failureThreshold: 3
        resources:
          requests:
            cpu: 500m
            memory: 1G
          limits:
            cpu: 1000m
            memory: 1G

$ kubectl apply -f simple-api/ -R
$ kubectl get pods

The minimum value of timeoutSeconds is 1 second, so you might need to use exec.command to run arbitrary shell commands with custom timeout settings.

ref:
https://cloudplatform.googleblog.com/2018/05/Kubernetes-best-practices-Setting-up-health-checks-with-readiness-and-liveness-probes.html

Create Deployments With Init Containers

apiVersion: v1
kind: Service
metadata:
  name: gcs-proxy-asia-contents-kittenphile-com
spec:
  type: NodePort
  selector:
    app: gcs-proxy-asia-contents-kittenphile-com
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: google-cloud-storage-proxy
data:
  nginx.conf: |-
    worker_processes auto;

    http {
      include mime.types;
      default_type application/octet-stream;

      server {
        listen 80;

        if ( $http_user_agent ~* (GoogleHC|KubernetesHealthCheck) ) {
          return 200;
        }

        root /usr/share/nginx/html;
        open_file_cache max=10000 inactive=10m;
        open_file_cache_valid 1m;
        open_file_cache_min_uses 1;
        open_file_cache_errors on;

        include /etc/nginx/conf.d/*.conf;
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gcs-proxy-asia-contents-kittenphile-com
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gcs-proxy-asia-contents-kittenphile-com
  template:
    metadata:
      labels:
        app: gcs-proxy-asia-contents-kittenphile-com
    spec:
      volumes:
      - name: nginx-config
        configMap:
          name: google-cloud-storage-proxy
      - name: nginx-config-extra
        emptyDir: {}
      initContainers:
      - name: create-robots-txt
        image: busybox
        command: ["sh", "-c"]
        args:
        - |
            set -euo pipefail
            cat << 'EOF' > /etc/nginx/conf.d/robots.txt
            User-agent: *
            Disallow: /
            EOF
        volumeMounts:
        - name: nginx-config-extra
          mountPath: /etc/nginx/conf.d/
      - name: create-nginx-extra-conf
        image: busybox
        command: ["sh", "-c"]
        args:
        - |
            set -euo pipefail
            cat << 'EOF' > /etc/nginx/conf.d/extra.conf
            location /robots.txt {
              alias /etc/nginx/conf.d/robots.txt;
            }
            EOF
        volumeMounts:
        - name: nginx-config-extra
          mountPath: /etc/nginx/conf.d/
      containers:
      - name: http
        image: swaglive/openresty:gcsfuse
        imagePullPolicy: Always
        args: ["nginx", "-c", "/usr/local/openresty/nginx/conf/nginx.conf", "-g", "daemon off;"]
        ports:
        - containerPort: 80
        securityContext:
          privileged: true
          capabilities:
            add: ["CAP_SYS_ADMIN"]
        env:
          - name: GCSFUSE_OPTIONS
            value: "--debug_gcs --implicit-dirs --stat-cache-ttl 1s --type-cache-ttl 24h --limit-bytes-per-sec -1 --limit-ops-per-sec -1 -o ro,allow_other"
          - name: GOOGLE_CLOUD_STORAGE_BUCKET
            value: asia.contents.kittenphile.com
        volumeMounts:
        - name: nginx-config
          mountPath: /usr/local/openresty/nginx/conf/nginx.conf
          subPath: nginx.conf
          readOnly: true
        - name: nginx-config-extra
          mountPath: /etc/nginx/conf.d/
          readOnly: true
        readinessProbe:
          httpGet:
            port: 80
            path: /
            httpHeaders:
            - name: User-Agent
              value: "KubernetesHealthCheck/1.0"
          timeoutSeconds: 1
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 1
          successThreshold: 1
        resources:
          requests:
            cpu: 0m
            memory: 500Mi
          limits:
            cpu: 1000m
            memory: 500Mi

$ kubectl exec -i -t simple-api-5968cfc48d-8g755 -- sh
> curl http://gcs-proxy-asia-contents-kittenphile-com/robots.txt
User-agent: *
Disallow: /

ref:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340

Create Deployments With Canary Deployment

TODO

ref:
https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#canary-deployments
https://medium.com/google-cloud/kubernetes-canary-deployments-for-mere-mortals-13728ce032fe
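
In the meantime, here is a rough sketch of the pattern described in the refs above: run a second Deployment with a track: canary label and fewer replicas, and let the Service select only the shared app label so traffic is spread across both tracks. The stable Deployment should likewise carry track: stable in its selector and Pod labels so the two Deployments manage disjoint Pods (the image tag below is a placeholder):

kind: Deployment
apiVersion: apps/v1
metadata:
  name: simple-api-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-api
      track: canary
  template:
    metadata:
      labels:
        app: simple-api
        track: canary
    spec:
      containers:
      - name: simple-api
        image: asia.gcr.io/simple-project-198818/simple-api:canary-tag
        ports:
        - containerPort: 8000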

Rollback A Deployment

Yes, you could roll back a Deployment with kubectl rollout. However, the simplest way might be to just git checkout the previous commit and deploy again with kubectl apply.

$ git checkout b7ed8d5
$ kubectl apply -f simple-api/ -R
$ kubectl get pods

ref:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment
https://medium.com/@brendanrius/scaling-kubernetes-for-25m-users-a7937e3536a0
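
For reference, the kubectl rollout commands look like this (revision numbers depend on your own rollout history):

$ kubectl rollout history deploy/simple-api
$ kubectl rollout undo deploy/simple-api
$ kubectl rollout undo deploy/simple-api --to-revision=2
$ kubectl rollout status deploy/simple-api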

Scale A Deployment

Simply increase the number of spec.replicas and deploy again.

$ kubectl apply -f simple-api/ -R
# or
$ kubectl scale --replicas=10 deploy/simple-api

$ kubectl get pods

ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#scale
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#scaling-a-deployment

Create HorizontalPodAutoscalers (HPA)

The Horizontal Pod Autoscaler automatically scales the number of pods in a Deployment based on observed CPU utilization, memory usage, or custom metrics. Yes, HPA only applies to Deployments and ReplicationControllers.

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: simple-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 800M
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: simple-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 500M

$ kubectl apply -f simple-api/hpa.yaml

$ kubectl get hpa --watch
NAME            REFERENCE                  TARGETS                   MINPODS   MAXPODS   REPLICAS   AGE
simple-api      Deployment/simple-api      18685952/800M, 4%/80%     2         20        3          10m
simple-worker   Deployment/simple-worker   122834944/500M, 11%/80%   2         10        3          10m

ref:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/

You could run some load testing.

ref:
https://medium.com/@jonbcampos/kubernetes-horizontal-pod-scaling-190e95c258f5
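
A quick-and-dirty way to generate load from inside the cluster (a sketch; point the URL at your own Service):

$ kubectl run -i -t load-generator --image=busybox --restart=Never --rm /bin/sh

> while true; do wget -q -O- http://simple-api.default.svc.cluster.local > /dev/null; done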

There is also Cluster Autoscaler in Google Kubernetes Engine.

$ gcloud container clusters update demo \
--enable-autoscaling --min-nodes=1 --max-nodes=10 \
--node-pool=default-pool

ref:
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler

Create VerticalPodsAutoscalers (VPA)

TODO

ref:
https://medium.com/@Mohamed.ahmed/kubernetes-autoscaling-101-cluster-autoscaler-horizontal-pod-autoscaler-and-vertical-pod-2a441d9ad231
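
In the meantime, a rough sketch of what a VerticalPodAutoscaler object looks like, assuming the VPA custom resource definitions are installed in your cluster (the apiVersion depends on the VPA release you deploy):

kind: VerticalPodAutoscaler
apiVersion: autoscaling.k8s.io/v1beta2
metadata:
  name: simple-worker
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-worker
  updatePolicy:
    updateMode: "Auto"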

Create PodDisruptionBudget (PDB)

  • Voluntary disruptions: actions initiated by application owners or admins.
  • Involuntary disruptions: unavoidable cases like hardware failures or system software error.

PodDisruptionBudgets only apply to voluntary disruptions; something like a hardware failure will not take the PodDisruptionBudget into account. A PDB cannot prevent involuntary disruptions from occurring, but they do count against the budget.

Create a PodDisruptionBudget for a stateless application.

kind: PodDisruptionBudget
apiVersion: policy/v1beta1
metadata:
  name: simple-api
spec:
  minAvailable: 90%
  selector:
    matchLabels:
      app: simple-api

Create a PodDisruptionBudget for a multiple-instance stateful application.

kind: PodDisruptionBudget
apiVersion: policy/v1beta1
metadata:
  name: mongodb-rs0
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mongodb-rs0

$ kubectl apply -f simple-api/pdb.yaml
$ kubectl apply -f mongodb/pdb.yaml

$ kubectl get pdb
NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
mongodb-rs0   2               N/A               1                     48m
simple-api    90%             N/A               0                     48m

ref:
https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
https://kubernetes.io/docs/tasks/run-application/configure-pdb/

Actually, you could also achieve similar functionality using .spec.strategy.rollingUpdate.

  • maxUnavailable: The maximum number of Pods that can be unavailable during the update process.
  • maxSurge: The maximum number of Pods that can be created over the desired number of Pods.

This makes sure that total ready Pods >= total desired Pods - maxUnavailable and total Pods <= total desired Pods + maxSurge.

ref:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#writing-a-deployment-spec
https://cloud.google.com/kubernetes-engine/docs/how-to/updating-apps
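
For instance, the strategy block on the simple-api Deployment might look like this (the numbers are just an example):

kind: Deployment
apiVersion: apps/v1
metadata:
  name: simple-api
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1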

Create Services

A Service is basically a load balancer for a set of Pods which are selected by labels. Since you can't rely on any Pod's IP, which changes every time the Pod is created and destroyed, you should always provide a Service as the entry point for your Pods, i.e., the so-called microservice.

Typically, containers you run in the cluster are not accessible from the Internet, because they do not have external IP addresses. You must explicitly expose your application by creating a Service or an Ingress.

There are the following Service types:

  • ClusterIP: A virtual IP which is only reachable from within the cluster. Also the default Service type.
  • NodePort: Opens a specific port on every node; any traffic sent to that port is forwarded to the Service.
  • LoadBalancer: Builds on NodePort by additionally configuring the cloud provider to create an external load balancer.
  • ExternalName: Maps the Service to an external CNAME record, e.g., your MySQL RDS on AWS (see the sketch below).
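
A minimal sketch of an ExternalName Service (the RDS hostname is hypothetical):

kind: Service
apiVersion: v1
metadata:
  name: mysql
spec:
  type: ExternalName
  externalName: mydb.example.ap-northeast-1.rds.amazonaws.com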

Create a Service.

kind: Service
apiVersion: v1
metadata:
  name: simple-api
  labels:
    app: simple-api
spec:
  type: NodePort
  selector:
    app: simple-api
  ports:
    - port: 80
      targetPort: 8000
      protocol: TCP

type: NodePort is enough in most cases. spec.selector must match the labels defined in the corresponding Deployment, and spec.ports.targetPort and spec.ports.protocol must match the container's settings as well.

$ kubectl apply -f simple-api/ -R

$ kubectl get svc,endpoints

ref:
https://kubernetes.io/docs/concepts/services-networking/service/
https://medium.com/google-cloud/kubernetes-nodeport-vs-loadbalancer-vs-ingress-when-should-i-use-what-922f010849e0

After a Service is created, kube-dns creates a corresponding DNS A record named my-svc.my-namespace.svc.cluster.local which resolves to the cluster IP. In this case: simple-api.default.svc.cluster.local. Headless Services (without a cluster IP) are also assigned a DNS A record of the form my-svc.my-namespace.svc.cluster.local. Unlike normal Services, this A record resolves directly to the set of IPs of the Pods selected by the Service. Clients are expected to consume the set of IPs or use round-robin selection from the set.

ref:
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

For more detail about Kubernetes networking, go to:
https://github.com/hackstoic/kubernetes_practice/blob/master/%E7%BD%91%E7%BB%9C.md
https://containerops.org/2017/01/30/kubernetes-services-and-ingress-under-x-ray/
https://www.safaribooksonline.com/library/view/kubernetes-up-and/9781491935668/ch07.html

Use Port Forwarding

Access a Service or a Pod on your local machine with port forwarding.

# 8080 is the local port and 80 is the remote port
$ kubectl port-forward svc/simple-api 8080:80

$ open http://127.0.0.1:8080/

ref:
https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/

Create An Ingress

Pods in Kubernetes are not reachable from outside the cluster, so you need a way to expose your Pods to the Internet. Even though you could associate Pods with a Service of the right type, i.e., NodePort or LoadBalancer, the recommended way to expose services is using Ingress. You can do a lot of different things with an Ingress, and there are many types of Ingress controllers that have different capabilities.

There are some reasons to choose Ingress over Service:

  • A Service is an internal load balancer, while an Ingress is the gateway for external access to Services
  • A Service is an L4 load balancer, while an Ingress is an L7 load balancer
  • An Ingress allows domain-based and path-based routing to different Services
  • It is not efficient to create a cloud provider's load balancer for each Service you want to expose

Create an Ingress which is implemented using Google Cloud Load Balancing (HTTP load balancer). You should make sure Services exist before creating the Ingress.

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: simple-project
  annotations:
    kubernetes.io/ingress.class: "gce"
    # kubernetes.io/tls-acme: "true"
    # ingress.kubernetes.io/ssl-redirect: "true"
spec:
  # tls:
  # - secretName: kittenphile-com-tls
  #   hosts:
  #   - kittenphile.com
  #   - www.kittenphile.com
  #   - api.kittenphile.com
  rules:
  - host: kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-frontend
          servicePort: 80
  - host: www.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-frontend
          servicePort: 80
  - host: api.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-api
          servicePort: 80
  - host: asia.contents.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: gcs-proxy-asia-contents-kittenphile-com
          servicePort: 80
  backend:
    serviceName: simple-api
    servicePort: 80

It might take several minutes to spin up a Google HTTP load balancer (including acquiring the public IP), and at least 5 minutes before the GCE API starts health checking backends. After getting your public IP, you could go to your domain provider and create new DNS records which point to that IP.

$ kubectl apply -f ingress.yaml

$ kubectl describe ing simple-project

ref:
https://kubernetes.io/docs/concepts/services-networking/ingress/
https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/
https://www.joyfulbikeshedding.com/blog/2018-03-26-studying-the-kubernetes-ingress-system.html

To read more about Google Load balancer, go to:
https://cloud.google.com/kubernetes-engine/docs/tutorials/http-balancer
https://cloud.google.com/compute/docs/load-balancing/http/backend-service

Setup The Ingress With TLS Certificates

To automatically provision HTTPS certificates for your domains, you could use a tool such as cert-manager (or its predecessor kube-lego) to obtain certificates from Let's Encrypt.

Create Ingress Controllers

Kubernetes supports multiple Ingress controllers, for instance, the GCE Ingress controller (the default on GKE), ingress-nginx, and Traefik.

ref:
https://container-solutions.com/production-ready-ingress-kubernetes/

Create StorageClasses

StorageClass provides a way to define different available storage types, for instance, ext4 SSD, XFS SSD, CephFS, NFS. You could specify a StorageClass in a PersistentVolumeClaim's storageClassName or in spec.volumeClaimTemplates.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ssd-xfs
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  fsType: xfs
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ssd-regional
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  zones: asia-east1-a, asia-east1-b, asia-east1-c
  replication-type: regional-pd

$ kubectl apply -f storageclass.yaml
$ kubectl get sc
NAME                 PROVISIONER            AGE
ssd                  kubernetes.io/gce-pd   5s
ssd-regional         kubernetes.io/gce-pd   4s
ssd-xfs              kubernetes.io/gce-pd   3s
standard (default)   kubernetes.io/gce-pd   1h

ref:
https://kubernetes.io/docs/concepts/storage/storage-classes/#gce

Create PersistentVolumeClaims

A Volume is just a directory which you could mount into containers, and it is shared by all containers inside the same Pod. It also has an explicit lifetime: the same as the Pod that encloses it. Volume sources are various; they could be a remote Git repo, a file path on the host machine, a folder from a PersistentVolumeClaim, or data from a ConfigMap or a Secret.

PersistentVolumes are used to manage durable storage in a cluster. Unlike Volumes, PersistentVolumes have a lifecycle independent of any individual Pod. On Google Kubernetes Engine, PersistentVolumes are typically backed by Google Compute Engine Persistent Disks. Normally, you don't have to create PersistentVolumes explicitly. In Kubernetes 1.6 and later versions, you only need to create PersistentVolumeClaim and the corresponding PersistentVolume would be dynamically provisioned with StorageClasses.

Pods use PersistentVolumeClaims as Volumes.
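
A minimal sketch of a standalone PersistentVolumeClaim using the ssd StorageClass defined above (the claim name is made up for illustration):

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: simple-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ssd
  resources:
    requests:
      storage: 10G

In the Pod template, you would then reference it via spec.volumes with persistentVolumeClaim.claimName: simple-data and a matching volumeMounts entry.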

Be careful when creating a Deployment with a PersistentVolumeClaim. In most cases, you don't want multiple replicas of a Deployment writing data to the same PersistentVolumeClaim.

For accessModes, only a few storage providers support ReadOnlyMany and ReadWriteMany.

ref:
https://kubernetes.io/docs/concepts/storage/volumes/
https://kubernetes.io/docs/concepts/storage/persistent-volumes/
https://kubernetes.io/blog/2017/03/dynamic-provisioning-and-storage-classes-kubernetes/
https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes

On Kubernetes v1.10+, it is possible to create local PersistentVolumes for your StatefulSets. Previously, PersistentVolumes only supported remote volume types, for instance, GCP's Persistent Disk and AWS's EBS. However, using local storage ties your applications to that specific node, making your application harder to schedule.

ref:
https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/
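
A rough sketch of a local PersistentVolume and its StorageClass (the device path is hypothetical; the node name reuses one from the examples above):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: local-pv-0
spec:
  capacity:
    storage: 100G
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd
  local:
    path: /mnt/disks/ssd0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - gke-demo-default-pool-fb33ac26-frkw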

Create A StatefulSet

Pods created under a StatefulSet have a few unique attributes: the name of a Pod is not random; instead, each Pod gets an ordinal name. In addition, Pods are created one at a time instead of all at once, which can help when bootstrapping a stateful system. A StatefulSet also deletes/updates one Pod at a time, in reverse order with respect to its ordinal index, and it waits for each to be completely shut down before deleting the next.

Rule of thumb: once you find out that you need a PersistentVolume for a component, consider using a StatefulSet.

ref:
https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
https://akomljen.com/kubernetes-persistent-volumes-with-deployment-and-statefulset/

Create a StatefulSet of a three-node MongoDB replica set.

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: default-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb-rs0
spec:
  clusterIP: None
  selector:
    app: mongodb-rs0
  ports:
    - port: 27017
      targetPort: 27017
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: mongodb-rs0
spec:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  serviceName: mongodb-rs0
  selector:
    matchLabels:
      app: mongodb-rs0
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ssd-xfs
      resources:
        requests:
          storage: 100G
  template:
    metadata:
      labels:
        app: mongodb-rs0
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: custom.kubernetes.io/fs-type
                operator: In
                values:
                - "xfs"
              - key: cloud.google.com/gke-preemptible
                operator: NotIn
                values:
                - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - mongodb-rs0
      terminationGracePeriodSeconds: 10
      containers:
      - name: db
        image: mongo:3.6.5
        command: ["mongod"]
        args: ["--bind_ip_all", "--replSet", "rs0"]
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: data
          mountPath: /data/db
        readinessProbe:
          exec:
            command: ["mongo", --eval, "db.adminCommand('ping')"]
        resources:
          requests:
            cpu: 2
            memory: 4G
          limits:
            cpu: 4
            memory: 4G
      - name: sidecar
        image: cvallance/mongo-k8s-sidecar
        env:
          - name: MONGO_SIDECAR_POD_LABELS
            value: app=mongodb-rs0
          - name: KUBE_NAMESPACE
            value: default
          - name: KUBERNETES_MONGO_SERVICE_NAME
            value: mongodb-rs0

$ kubectl apply -f storageclass.yaml
$ kubectl apply -f mongodb/ -R

$ kubectl get pods

$ kubetail mongodb -c db
$ kubetail mongodb -c sidecar

$ kubectl scale statefulset mongodb-rs0 --replicas=4

The purpose of cvallance/mongo-k8s-sidecar is to automatically add new Pods to the replica set and remove Pods from the replica set while you scale the MongoDB StatefulSet up or down.

ref:
https://github.com/cvallance/mongo-k8s-sidecar
https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets/

Create A Headless Service For A StatefulSet

Headless Services (clusterIP: None) are just like normal Kubernetes Services, except they don't do any load balancing for you. For a typical StatefulSet component, for instance, a database with master-slave replication, you don't want Kubernetes to load balance requests, in order to prevent accidentally writing data to slaves.

When headless Services combine with StatefulSets, they can give you unique DNS addresses which return A records that point directly to Pods themselves. DNS names are in the format of static-pod-name.headless-service-name.namespace.svc.cluster.local.

kind: Service
apiVersion: v1
metadata:
  name: redis-broker
spec:
  clusterIP: None
  selector:
    app: redis-broker
  ports:
  - port: 6379
    targetPort: 6379
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: redis-broker
spec:
  replicas: 1
  serviceName: redis-broker
  selector:
    matchLabels:
      app: redis-broker
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ssd
      resources:
        requests:
          storage: 32Gi
  template:
    metadata:
      labels:
        app: redis-broker
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-preemptible
                operator: NotIn
                values:
                - "true"
      volumes:
      - name: config
        configMap:
          name: redis-broker
      containers:
      - name: redis
        image: redis:4.0.10-alpine
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf", "--loglevel", "verbose", "--maxmemory", "1g"]
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
        readinessProbe:
          exec:
            command: ["sh", "-c", "redis-cli -h $(hostname) ping"]
          initialDelaySeconds: 5
          timeoutSeconds: 1
          periodSeconds: 1
          successThreshold: 1
          failureThreshold: 3
        resources:
          requests:
            cpu: 250m
            memory: 1G
          limits:
            cpu: 1000m
            memory: 1G

If redis-broker has 2 replicas, nslookup redis-broker.default.svc.cluster.local returns multiple A records for a single DNS lookup, which is commonly known as round-robin DNS.

$ kubectl run -i -t --image busybox dns-test --restart=Never --rm /bin/sh

> nslookup redis-broker.default.svc.cluster.local
Server: 10.63.240.10
Address 1: 10.63.240.10 kube-dns.kube-system.svc.cluster.local
Name: redis-broker.default.svc.cluster.local
Address 1: 10.60.6.2 redis-broker-0.redis-broker.default.svc.cluster.local
Address 2: 10.60.6.7 redis-broker-1.redis-broker.default.svc.cluster.local

> nslookup redis-broker-0.redis-broker.default.svc.cluster.local
Server: 10.63.240.10
Address 1: 10.63.240.10 kube-dns.kube-system.svc.cluster.local
Name: redis-broker-0.redis-broker.default
Address 1: 10.60.6.2 redis-broker-0.redis-broker.default.svc.cluster.local

ref:
https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#using-stable-network-identities

Moreover, there is no port re-mapping for a headless Service, since the DNS name resolves directly to the Pods' IPs.

apiVersion: v1
kind: Service
metadata:
  namespace: tick
  name: influxdb
spec:
  clusterIP: None
  selector:
    app: influxdb
  ports:
  - name: api
    port: 4444
    targetPort: 8086
  - name: admin
    port: 8083
    targetPort: 8083

$ kubectl apply -f tick/ -R
$ kubectl get svc --namespace tick
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
influxdb   ClusterIP   None         <none>        4444/TCP,8083/TCP   1h

$ curl http://influxdb.tick.svc.cluster.local:4444/ping
curl: (7) Failed to connect to influxdb.tick.svc.cluster.local port 4444: Connection refused

$ curl -I http://influxdb.tick.svc.cluster.local:8086/ping
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: 7fc09a56-8538-11e8-8d1d-000000000000

Create A DaemonSet

Create a DaemonSet which deploys a Telegraf agent on each node.

kind: DaemonSet
apiVersion: apps/v1
metadata:
  namespace: tick
  name: telegraf-ds
spec:
  selector:
    matchLabels:
      app: telegraf-ds
  template:
    metadata:
      labels:
        app: telegraf-ds
    spec:
      volumes:
      - name: config
        configMap:
          name: telegraf-ds
      - name: sys
        hostPath:
          path: /sys
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: varrunutmp
        hostPath:
          path: /var/run/utmp
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      containers:
      - name: telegraf
        image: telegraf:1.7.1-alpine
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "HOST_PROC"
          value: "/rootfs/proc"
        - name: "HOST_SYS"
          value: "/rootfs/sys"
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf
          readOnly: true
        - name: sys
          mountPath: /rootfs/sys
          readOnly: true
        - name: proc
          mountPath: /rootfs/proc
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        - name: varrunutmp
          mountPath: /var/run/utmp
          readOnly: true
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        resources:
          requests:
            cpu: 50m
            memory: 500Mi
          limits:
            cpu: 200m
            memory: 500Mi

ref:
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
https://github.com/influxdata/telegraf

Create A CronJob

TODO

ref:
https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
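
In the meantime, a minimal sketch of a CronJob reusing the simple-api image (the cleanup script is hypothetical):

kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: simple-cleanup
spec:
  schedule: "0 3 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            # hypothetical cleanup script baked into the simple-api image
            image: asia.gcr.io/simple-project-198818/simple-api:4fc4199
            command: ["python", "scripts/cleanup.py"]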

Define Node Affinity And Pod Affinity

Prevent Pods from being scheduled on preemptible nodes. Also, you should always prefer nodeAffinity over nodeSelector.

kind: StatefulSet
apiVersion: apps/v1
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-preemptible
                operator: NotIn
                values:
                - "true"

ref:
https://medium.com/google-cloud/using-preemptible-vms-to-cut-kubernetes-engine-bills-in-half-de2481b8e814

spec.affinity.podAntiAffinity ensures that Pods of the same Deployment or StatefulSet do not co-locate on a single node.

kind: StatefulSet
apiVersion: apps/v1
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - mongodb-rs0

ref:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

Migrate Pods from Old Nodes to New Nodes

  • Cordon marks old nodes as unschedulable
  • Drain evicts all Pods on old nodes

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=n1-standard-4-pre -o=name); do
  kubectl cordon "$node";
done

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=n1-standard-4-pre -o=name); do
  kubectl drain --ignore-daemonsets --delete-local-data --grace-period=2 "$node";
done

$ kubectl get nodes
NAME                                       STATUS                     ROLES     AGE       VERSION
gke-demo-default-pool-3c058fcf-x7cv        Ready                      <none>    2h        v1.10.5-gke.3
gke-demo-default-pool-58da1098-1h00        Ready                      <none>    2h        v1.10.5-gke.3
gke-demo-default-pool-fc34abbf-9dwr        Ready                      <none>    2h        v1.10.5-gke.3
gke-demo-n1-standard-4-pre-1a54e45a-0m7p   Ready,SchedulingDisabled   <none>    58m       v1.10.5-gke.3
gke-demo-n1-standard-4-pre-1a54e45a-mx3h   Ready,SchedulingDisabled   <none>    58m       v1.10.5-gke.3
gke-demo-n1-standard-4-pre-1a54e45a-qhdz   Ready,SchedulingDisabled   <none>    58m       v1.10.5-gke.3

ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#cordon
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain
https://cloud.google.com/kubernetes-engine/docs/tutorials/migrating-node-pool
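
Once all Pods have been rescheduled onto the remaining nodes, you could delete the drained node pool (using the pool name from the example above):

$ gcloud container node-pools delete n1-standard-4-pre --cluster=demo --region=asia-east1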

Show Objects' Events

$ kubectl get events -w
$ kubectl get events -w --sort-by='{.firstTimestamp}'
$ kubectl get events -w --sort-by='{.lastTimestamp}' | grep simple-worker
$ kubectl get events -w --sort-by=.metadata.creationTimestamp

$ kubectl get events --sort-by='{.lastTimestamp}'

ref:
https://kubernetes.io/docs/tasks/debug-application-cluster/

View Pods' Logs on Stackdriver Logging

You could use the following search filters.

textPayload: "OBJECT_FINALIZE"

logName: "projects/simple-project-198818/logs/worker"
textPayload: "Added media preset"

logName = "projects/simple-project-198818/logs/beat"
textPayload: "backend_cleanup"

resource.labels.pod_id = "simple-api-6744bf74db-529qf"
logName = "projects/simple-project-198818/logs/api"
timestamp >= "2018-04-21T12:00:00Z"
timestamp <= "2018-04-21T16:00:00Z"
textPayload: "5adb2bd460d6487649fe82ea"

resource.type="container"
resource.labels.cluster_name="production-1"
resource.labels.namespace_id="default"
logName: "projects/simple-project-198818/logs/worker"
textPayload: "INFO/ForkPoolWorker-27]"
timestamp>="2018-06-21T04:10:24Z"
timestamp<="2018-06-21T04:15:25Z"

resource.type="container"
resource.labels.cluster_name="production-1"
resource.labels.namespace_id="default"
resource.labels.pod_id:"swag-worker-swag.signals"
textPayload:"ConcurrentObjectUseError"

ref:
https://kubernetes.io/docs/tasks/debug-application-cluster/logging-stackdriver/
https://cloud.google.com/logging/docs/view/advanced-filters

Issues

Pending Pods

One of the most common reasons for Pending Pods is a lack of resources.

$ kubectl describe pod mongodb-rs0-1
...
Events:
Type       Reason              Age                  From                 Message
----       ------              ----                 ----                 -------
Warning    FailedScheduling    3m (x739 over 1d)    default-scheduler    0/3 nodes are available: 1 ExistingPodsAntiAffinityRulesNotMatch, 1 MatchInterPodAffinity, 1 NodeNotReady, 2 NoVolumeZoneConflict, 3 Insufficient cpu, 3 Insufficient memory, 3 MatchNodeSelector.
...

You could resize the node pool to add more nodes to the cluster.

$ gcloud container clusters resize demo --node-pool=n1-standard-4-pre --size=5 --region=asia-east1

ref:
https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/

CrashLoopBackOff Pods

CrashLoopBackOff means the Pod is starting, then crashing, then starting again and crashing again.

$ kubectl describe pod the-pod-name
$ kubectl logs --previous the-pod-name

ref:
https://www.krenger.ch/blog/crashloopbackoff-and-how-to-fix-it/
https://sysdig.com/blog/debug-kubernetes-crashloopbackoff/

CodeTengu Weekly Issue 125 @vinta - Amazon Web Services, Google Cloud Platform, Kubernetes, DevOps, MySQL, Redis

This article was also published in CodeTengu Weekly - Issue 125.

Apex and Terraform: The easiest way to manage AWS Lambda functions

I have always subscribed to RSS feeds, but when work gets busy I end up with a pile of unread articles. Yet I noticed that even on busy days I still check Twitter every so often (and complain a bit), so I simply built @vinta_rss_bot, which uses Zapier to sync articles from Feedly to Twitter so I can stumble upon them while scrolling. After testing it for over a week, it works well; you could build your own RSS bot and give it a try.

Although building this RSS bot with Zapier took only five minutes and zero lines of code, not everyone is a believer in the "god of whitespace". After seeing @vinta_rss_bot tweet a few articles whose titles had no spaces between Chinese and English characters, I started to feel deeply uncomfortable. Eventually I couldn't stand it anymore and wrote a spacing web API with AWS Lambda, api.pangu.space, which Zapier calls before posting to Twitter.

(That was a rather long preamble.)

This article documents how I deployed AWS Lambda functions with Apex and Terraform. The main logic is simple and written in Go; the annoying parts were actually configuring Amazon API Gateway and HTTPS for the custom domain. Since it is just a side project, I didn't bother with the heavier Serverless framework.

Further reading:

cert-manager: Automatically provision TLS certificates in Kubernetes

Our company's Kubernetes cluster currently uses kube-lego to automatically obtain TLS/SSL certificates from Let's Encrypt, but since kube-lego announced that it will only support Kubernetes up to v1.8, its authors want everyone to switch to cert-manager, a tool developed by the same people that does the same job.

This article documents how I deployed cert-manager in preparation for migrating away from kube-lego. However, while testing I found that some cert-manager features were not quite mature yet, such as ingress-shim, and since kube-lego hasn't caused us any problems on Kubernetes v1.9.6, we decided not to migrate for now. Still, since the article was already written, I'm sharing it in the hope that it helps someone else.

Further reading:

GCP products described in 4 words or less

I used to work mostly with AWS, but my current company runs on Google Cloud Platform; this article gives you a quick overview of what is available on GCP.

I can't help complaining: when is Google Cloud Memorystore finally going to launch?

Although GCP still lags a bit behind AWS in most areas (Google Kubernetes Engine excepted), Google Cloud's Stackdriver family is genuinely good. For example, Logging lets you full-text search the stdout of every container with zero configuration (looking at you, ELK). Speaking of reading logs, kubetail is also handy, essentially a beefed-up kubectl logs -f; and Debugger lets you attach a debugger to production code directly, which is pretty slick.

Further reading:

One Giant Leap For SQL: MySQL 8.0 Released

MySQL 8.0 was released a little while ago. This version makes great strides in SQL standard support and finally steps out of the shadow of SQL-92; it may even shed the "Friends don't let friends use MySQL" reputation (which MongoDB now looks set to inherit).

Incidentally, having always used MySQL, I had no idea what window functions were; the first time I used OVER (PARTITION BY ... ORDER BY ...) was actually in Apache Spark (such a SQL noob).

Further reading:

Redis in Action

Last week I spent some time studying how Redis' RDB/AOF persistence and master/slave replication work, and found that besides the official documentation, the book Redis in Action also covers them in great detail (although some of the content may be a bit dated). It is endorsed by the author of Redis himself, so it is worth a read.

I can't resist sharing this: after going through the Redis 4.0 redis.conf carefully last week, I discovered the new aof-use-rdb-preamble setting. In my tests, enabling it cut the size of appendonly.aof by 50%. Give it a try when you have time.

Further reading:

金丝雀发布、滚动发布、蓝绿发布到底有什么差别?关键点是什么? (What exactly is the difference between canary releases, rolling updates, and blue-green deployments, and what are the key points?)

It wasn't until I read this article that I finally understood what Canary Releases, Blue-green Deployment, and Rolling Update actually mean (embarrassing, I know).

HTTP codes as Valentine’s Day comics

This article explains the various HTTP status codes as comics. Almost too cute.

Shared by @vinta.

Monty Python's Flying Circus on Netflix

Ladies and gentlemen, Monty Python's Flying Circus is now on Netflix! If you don't know who Monty Python are, we covered them back in Issue 6!

Shared by @vinta!

Deploy TICK stack on Kubernetes: Telegraf, InfluxDB, Chronograf, and Kapacitor

Deploy TICK stack on Kubernetes: Telegraf, InfluxDB, Chronograf, and Kapacitor

The TICK stack is a set of open source tools for building a monitoring and alerting system. Its components are:

  • Telegraf: collects data
  • InfluxDB: stores data
  • Chronograf: visualizes data
  • Kapacitor: raises alerts

You could consider TICK an alternative to the ELK stack (Elasticsearch, Logstash, and Kibana).

ref:
https://www.influxdata.com/time-series-platform/

InfluxDB

InfluxDB is a time series database optimized for time-stamped or time series data, such as server metrics, application performance monitoring, network data, sensor data, events, clicks, market trades, and many other types of analytics data.

ref:
https://www.influxdata.com/time-series-platform/influxdb/

Key Concepts

  • Retention Policy: Database configuration that indicates how long a database keeps data and how many copies of that data are stored in the cluster.
  • Measurement: Conceptually similar to an RDBMS table.
  • Point: Conceptually similar to an RDBMS row.
  • Timestamp: The primary key of every point.
  • Field: Conceptually similar to an RDBMS column (not indexed).
    • A field set is the collection of field key and field value pairs on a point.
  • Tag: Basically an indexed field.
  • Series: Data that shares the same measurement, retention policy, and tag set.

All timestamps in InfluxDB are stored in UTC.
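
To make those terms concrete, here is what writing and querying a single point looks like over InfluxDB's HTTP API: a measurement, a tag set, a field set, and a nanosecond timestamp. This is only a sketch and assumes an InfluxDB instance reachable at localhost:8086 with a telegraf database; the values are made up:

$ curl -i -XPOST 'http://localhost:8086/write?db=telegraf' \
    --data-binary 'diskio,host=node-1,device=sda write_bytes=123456i 1529556000000000000'
$ curl -G 'http://localhost:8086/query?db=telegraf' \
    --data-urlencode "q=SELECT * FROM diskio WHERE \"host\" = 'node-1' LIMIT 2"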

ref:
https://docs.influxdata.com/influxdb/v1.5/concepts/key_concepts/
https://docs.influxdata.com/influxdb/v1.5/concepts/glossary/
https://docs.influxdata.com/influxdb/v1.5/concepts/crosswalk/
https://www.jianshu.com/p/a1344ca86e9b

Hardware Guideline

InfluxDB should be run on locally attached SSDs. Any other storage configuration will have lower performance characteristics and may not be able to recover from even small interruptions in normal processing.

ref:
https://docs.influxdata.com/influxdb/v1.5/guides/hardware_sizing/

Deployment

When you create a database, InfluxDB automatically creates a retention policy named autogen which has infinite retention. You may rename that retention policy or disable its auto-creation in the configuration file.
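
Once the telegraf database defined in init.iql below exists, you could inspect or adjust its retention policies from the influx shell; a sketch (the new duration is arbitrary):

$ kubectl exec -i -t influxdb-0 --namespace tick -- influx \
    -execute 'SHOW RETENTION POLICIES ON "telegraf"'
$ kubectl exec -i -t influxdb-0 --namespace tick -- influx \
    -execute 'ALTER RETENTION POLICY "rp_90d" ON "telegraf" DURATION 30d'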

# tick/influxdb/service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: tick
  name: influxdb
spec:
  clusterIP: None
  selector:
    app: influxdb
  ports:
  - name: api
    port: 8086
    targetPort: api
  - name: admin
    port: 8083
    targetPort: admin
# tick/influxdb/statefulset.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: tick
  name: influxdb
data:
  influxdb.conf: |+
    [meta]
      dir = "/var/lib/influxdb/meta"
      retention-autocreate = false
    [data]
      dir = "/var/lib/influxdb/data"
      engine = "tsm1"
      wal-dir = "/var/lib/influxdb/wal"
  init.iql: |+
    CREATE DATABASE "telegraf" WITH DURATION 90d REPLICATION 1 SHARD DURATION 1h NAME "rp_90d"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: tick
  name: influxdb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: influxdb
  serviceName: influxdb
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ssd-ext4
      resources:
        requests:
          storage: 250Gi
  template:
    metadata:
      labels:
        app: influxdb
    spec:
      volumes:
      - name: config
        configMap:
          name: influxdb
          items:
            - key: influxdb.conf
              path: influxdb.conf
      - name: init-iql
        configMap:
          name: influxdb
          items:
            - key: init.iql
              path: init.iql
      containers:
      - name: influxdb
        image: influxdb:1.5.2-alpine
        ports:
        - name: api
          containerPort: 8086
        - name: admin
          containerPort: 8083
        volumeMounts:
        - name: data
          mountPath: /var/lib/influxdb
        - name: config
          mountPath: /etc/influxdb  # the image reads its config from /etc/influxdb/influxdb.conf
        - name: init-iql
          mountPath: /docker-entrypoint-initdb.d
        resources:
          requests:
            cpu: 500m
            memory: 10G
          limits:
            cpu: 4000m
            memory: 10G
        readinessProbe:
          httpGet:
            path: /ping
            port: api
          initialDelaySeconds: 5
          timeoutSeconds: 5

ref:
https://hub.docker.com/_/influxdb/
https://docs.influxdata.com/influxdb/v1.5/administration/config/

$ kubectl apply -f tick/influxdb/ -R

Usage

$ kubectl get all --namespace tick
$ kubectl exec -i -t influxdb-0 --namespace tick -- influx
SHOW DATABASES
USE telegraf
SHOW MEASUREMENTS
SELECT * FROM diskio LIMIT 2
DROP MEASUREMENT access_log

ref:
https://docs.influxdata.com/influxdb/v1.5/query_language/data_download/
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/

Telegraf

Telegraf is a plugin-driven server agent for collecting metrics and writing them to various data stores, including InfluxDB, Elasticsearch, CloudWatch, and so on.

ref:
https://www.influxdata.com/time-series-platform/telegraf/

Deployment

Collect system metrics on every node using a DaemonSet:

# tick/telegraf/daemonset.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: tick
  name: telegraf-ds
data:
  telegraf.conf: |+
    [agent]
      interval = "10s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "0s"
      precision = ""
      debug = true
      quiet = false
      logfile = ""
      hostname = "$HOSTNAME"
      omit_hostname = false
    [[outputs.influxdb]]
      urls = ["http://influxdb.tick.svc.cluster.local:8086"]
      database = "telegraf"
      retention_policy = "rp_90d"
      write_consistency = "any"
      timeout = "5s"
      username = ""
      password = ""
      user_agent = "telegraf"
      insecure_skip_verify = false
    [[inputs.cpu]]
      percpu = true
      totalcpu = true
      collect_cpu_time = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs"]
    [[inputs.diskio]]
    [[inputs.docker]]
      endpoint = "unix:///var/run/docker.sock"
      container_names = []
      timeout = "5s"
      perdevice = true
      total = false
    [[inputs.kernel]]
    [[inputs.kubernetes]]
      url = "http://$HOSTNAME:10255"
      bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      insecure_skip_verify = true
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: tick
  name: telegraf-ds
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 3
  selector:
    matchLabels:
      app: telegraf
      type: ds
  template:
    metadata:
      labels:
        app: telegraf
        type: ds
    spec:
      containers:
      - name: telegraf
        image: telegraf:1.5.3-alpine
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "HOST_PROC"
          value: "/rootfs/proc"
        - name: "HOST_SYS"
          value: "/rootfs/sys"
        volumeMounts:
        - name: sys
          mountPath: /rootfs/sys
          readOnly: true
        - name: proc
          mountPath: /rootfs/proc
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        - name: varrunutmp
          mountPath: /var/run/utmp
          readOnly: true
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config
          mountPath: /etc/telegraf
          readOnly: true
        resources:
          requests:
            cpu: 50m
            memory: 500Mi
          limits:
            cpu: 200m
            memory: 500Mi
      volumes:
      - name: sys
        hostPath:
          path: /sys
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: varrunutmp
        hostPath:
          path: /var/run/utmp
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: telegraf-ds

ref:
https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/docker/README.md
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/kubernetes/README.md

Collect metrics from shared infrastructure such as Redis and MongoDB, and accept pushed metrics, using a single Deployment:

# tick/telegraf/deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: tick
  name: telegraf-infra
data:
  telegraf.conf: |+
    [agent]
      interval = "10s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "0s"
      precision = ""
      debug = true
      quiet = false
      logfile = ""
      hostname = "telegraf-infra"
      omit_hostname = false
    [[outputs.influxdb]]
      urls = ["http://influxdb.tick.svc.cluster.local:8086"]
      database = "telegraf"
      retention_policy = "rp_90d"
      write_consistency = "any"
      timeout = "5s"
      username = ""
      password = ""
      user_agent = "telegraf"
      insecure_skip_verify = false
    [[inputs.http_listener]]
      service_address = ":8186"
    [[inputs.socket_listener]]
      service_address = "udp://:8092"
      data_format = "influx"
    [[inputs.redis]]
      servers = ["tcp://redis-cache.default.svc.cluster.local", "tcp://redis-broker.default.svc.cluster.local"]
    [[inputs.mongodb]]
      servers = ["mongodb://mongodb.default.svc.cluster.local"]
      gather_perdb_stats = true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: tick
  name: telegraf-infra
spec:
  replicas: 1
  selector:
    matchLabels:
      app: telegraf
      type: infra
  template:
    metadata:
      labels:
        app: telegraf
        type: infra
    spec:
      containers:
      - name: telegraf
        image: telegraf:1.5.3-alpine
        ports:
        - name: udp
          protocol: UDP
          containerPort: 8092
        - name: http
          containerPort: 8186
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf
        resources:
          requests:
            cpu: 50m
            memory: 500Mi
          limits:
            cpu: 500m
            memory: 500Mi
      volumes:
      - name: config
        configMap:
          name: telegraf-infra

ref:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/exec/README.md
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/mongodb/README.md
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/redis/README.md

$ kubectl apply -f tick/telegraf/ -R
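
To confirm the agents are running and data is actually flowing, you could list the Pods and query InfluxDB directly; the label selector and the cpu measurement follow the manifests and inputs above:

$ kubectl get pods --namespace tick -l app=telegraf -o wide
$ kubectl exec -i -t influxdb-0 --namespace tick -- influx \
    -database telegraf -execute 'SELECT * FROM cpu ORDER BY time DESC LIMIT 3'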

Furthermore, you might also want to parse the stdout of all containers; in that case, you could use the logparser input with a DaemonSet:

# telegraf.conf
[agent]
    interval = "1s"
    round_interval = true
    metric_batch_size = 1000
    metric_buffer_limit = 10000
    collection_jitter = "0s"
    flush_interval = "10s"
    flush_jitter = "0s"
    precision = ""
    debug = true
    quiet = false
    logfile = ""
    hostname = "$HOSTNAME"
    omit_hostname = false
[[outputs.file]]
  files = ["stdout"]
[[outputs.influxdb]]
    urls = ["http://influxdb.tick.svc.cluster.local:8086"]
    database = "your_db"
    retention_policy = "rp_90d"
    write_consistency = "any"
    timeout = "5s"
    username = ""
    password = ""
    user_agent = "telegraf"
    insecure_skip_verify = false
    namepass = ["logparser_*"]
[[inputs.logparser]]
    name_override = "logparser_api"
    files = ["/var/log/containers/api*.log"]
    from_beginning = false
    [inputs.logparser.grok]
    measurement = "api_access_log"
    patterns = ["bytes\\} \\[%{DATA:timestamp:ts-ansic}\\] %{WORD:request_method} %{URIPATH:request_path}%{DATA:request_params:drop} =\\\\u003e generated %{NUMBER:response_bytes:int} bytes in %{NUMBER:response_time_ms:int} msecs \\(HTTP/1.1 %{RESPONSE_CODE}"]
[[inputs.logparser]]
    name_override = "logparser_worker"
    files = ["/var/log/containers/worker*.log"]
    from_beginning = false
    [inputs.logparser.grok]
    measurement = "worker_task_log"
    patterns = ['''\[%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"},%{WORD:value1:drop}: %{LOGLEVEL:loglevel:tag}\/MainProcess\] Task %{PROG:task_name:tag}\[%{UUID:task_id:drop}\] %{WORD:execution_status:tag} in %{DURATION:execution_time:duration}''']

ref:
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/logparser
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/logparser/grok/patterns/influx-patterns
https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns

Debug telegraf.conf

$ docker run \
-v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf \
-v $PWD/api-1.log:/var/log/containers/api-1.log \
-v $PWD/worker-1.log:/var/log/containers/worker-1.log \
telegraf
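
Alternatively, telegraf has a --test flag that runs the configured inputs once and prints the gathered metrics to stdout instead of writing to any output. Note that tail-based inputs such as logparser may emit little or nothing in a single pass, so this is mostly a sanity check that the configuration parses (a sketch with the same mounts as above):

$ docker run --rm \
    -v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf \
    -v $PWD/api-1.log:/var/log/containers/api-1.log \
    -v $PWD/worker-1.log:/var/log/containers/worker-1.log \
    telegraf telegraf --test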

ref:
https://grokdebug.herokuapp.com/

Usage

from telegraf.client import TelegrafClient

client = TelegrafClient(host='telegraf.tick.svc.cluster.local', port=8092)
client.metric('some_measurement', {'value_a': 100, 'value_b': 0, 'value_c': True}, tags={'country': 'taiwan'})
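
If you prefer pushing metrics over HTTP instead of UDP, the http_listener input configured above accepts InfluxDB line protocol on its /write endpoint. A sketch, assuming port 8186 is reachable through a Service or port-forward (not shown above):

$ curl -i -XPOST 'http://telegraf.tick.svc.cluster.local:8186/write' \
    --data-binary 'some_measurement,country=taiwan value_a=100i,value_b=0i'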

ref:
https://github.com/paksu/pytelegraf

Kapacitor

Kapacitor is a so-called real-time streaming data processing engine; in practice, you would mostly use it to trigger alerts.

ref:
https://www.influxdata.com/time-series-platform/kapacitor/

Deployment

# tick/kapacitor/service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: tick
  name: kapacitor-ss
spec:
  clusterIP: None
  selector:
    app: kapacitor
  ports:
  - name: api
    port: 9092
    targetPort: api
---
apiVersion: v1
kind: Service
metadata:
  namespace: tick
  name: kapacitor
spec:
  type: ClusterIP
  selector:
    app: kapacitor
  ports:
  - name: api
    port: 9092
    targetPort: api
# tick/kapacitor/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: tick
  name: kapacitor
spec:
  replicas: 1
  serviceName: kapacitor-ss
  selector:
    matchLabels:
      app: kapacitor
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: hdd-ext4
      resources:
        requests:
          storage: 10Gi
  template:
    metadata:
      labels:
        app: kapacitor
    spec:
      containers:
      - name: kapacitor
        image: kapacitor:1.4.1-alpine
        env:
        - name: KAPACITOR_HOSTNAME
          value: kapacitor
        - name: KAPACITOR_INFLUXDB_0_URLS_0
          value: http://influxdb.tick.svc.cluster.local:8086
        ports:
        - name: api
          containerPort: 9092
        volumeMounts:
        - name: data
          mountPath: /var/lib/kapacitor
        resources:
          requests:
            cpu: 50m
            memory: 500Mi
          limits:
            cpu: 500m
            memory: 500Mi

ref:
https://docs.influxdata.com/kapacitor/v1.4/

$ kubectl apply -f tick/kapacitor/ -R
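
As a quick smoke test, you could define and enable a trivial CPU alert from inside the Pod. This is only a hedged sketch: the task name, threshold, and log path are made up, and the database/retention policy follow the Telegraf section above:

$ kubectl exec -i -t kapacitor-0 --namespace tick -- sh
/ # cat > /tmp/cpu_alert.tick <<'EOF'
stream
    |from()
        .database('telegraf')
        .retentionPolicy('rp_90d')
        .measurement('cpu')
    |alert()
        .crit(lambda: "usage_idle" < 10)
        .log('/tmp/cpu_alert.log')
EOF
/ # kapacitor define cpu_alert -type stream -tick /tmp/cpu_alert.tick -dbrp telegraf.rp_90d
/ # kapacitor enable cpu_alert
/ # kapacitor list tasks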

Chronograf

Chronograf is the web UI for the TICK stack.

ref:
https://www.influxdata.com/time-series-platform/chronograf/

Deployment

# tick/chronograf/service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: tick
  name: chronograf-ss
spec:
  clusterIP: None
  selector:
    app: chronograf
---
apiVersion: v1
kind: Service
metadata:
  namespace: tick
  name: chronograf
spec:
  selector:
    app: chronograf
  ports:
  - name: api
    port: 80
    targetPort: api
# tick/chronograf/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: tick
  name: chronograf
spec:
  replicas: 1
  serviceName: chronograf-ss
  selector:
    matchLabels:
      app: chronograf
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: hdd-ext4
      resources:
        requests:
          storage: 10Gi
  template:
    metadata:
      labels:
        app: chronograf
    spec:
      containers:
      - name: chronograf
        image: chronograf:1.4.4.0-alpine
        command: ["chronograf"]
        args: ["--influxdb-url=http://influxdb.tick.svc.cluster.local:8086", "--kapacitor-url=http://kapacitor.tick.svc.cluster.local:9092"]
        ports:
        - name: api
          containerPort: 8888
        livenessProbe:
          httpGet:
            path: /ping
            port: api
        readinessProbe:
          httpGet:
            path: /ping
            port: api
        volumeMounts:
        - name: data
          mountPath: /var/lib/chronograf
        resources:
          requests:
            cpu: 100m
            memory: 1000Mi
          limits:
            cpu: 2000m
            memory: 1000Mi

ref:
https://docs.influxdata.com/chronograf/v1.4/

$ kubectl apply -f tick/chronograf/ -R
$ kubectl port-forward svc/chronograf 8888:80 --namespace tick

ref:
https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/

kube-lego: Automatically provision TLS certificates in Kubernetes

kube-lego: Automatically provision TLS certificates in Kubernetes

kube-lego automatically requests certificates for Kubernetes Ingress resources from Let's Encrypt.

ref:
https://github.com/jetstack/kube-lego
https://letsencrypt.org/

I run kube-lego v0.1.5 with Kubernetes v1.9.4 and everything works fine.

Deploy kube-lego

It is strongly recommended to try the Let's Encrypt staging API first.

# kube-lego/deployment.yaml
kind: Namespace
apiVersion: v1
metadata:
  name: kube-lego
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-lego
  namespace: kube-lego
data:
  LEGO.EMAIL: "[email protected]"
  # LEGO.URL: "https://acme-v01.api.letsencrypt.org/directory"
  LEGO.URL: "https://acme-staging.api.letsencrypt.org/directory"
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kube-lego
  namespace: kube-lego
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-lego
  template:
    metadata:
      labels:
        app: kube-lego
    spec:
      containers:
      - name: kube-lego
        image: jetstack/kube-lego:0.1.5
        ports:
        - containerPort: 8080
        env:
        - name: LEGO_LOG_LEVEL
          value: debug
        - name: LEGO_EMAIL
          valueFrom:
            configMapKeyRef:
              name: kube-lego
              key: LEGO.EMAIL
        - name: LEGO_URL
          valueFrom:
            configMapKeyRef:
              name: kube-lego
              key: LEGO.URL
        - name: LEGO_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: LEGO_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 1

ref:
https://github.com/jetstack/kube-lego/tree/master/examples

$ kubectl apply -f kube-lego/ -R

Configure the Ingress

  • Add an annotation kubernetes.io/tls-acme: "true" to metadata.annotations
  • Add domains to spec.tls.hosts.

spec.tls.secretName is the Secret used to store the certificate received from Let's Encrypt, i.e., tls.key and tls.crt. If no Secret exists with that name, it will be created by kube-lego.

# ingress.yaml
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: simple-project
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/tls-acme: "true"
spec:
  tls:
  - secretName: kittenphile-com-tls
    hosts:
    - kittenphile.com
    - www.kittenphile.com
    - api.kittenphile.com
  rules:
  - host: kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-frontend
          servicePort: http
  - host: www.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-frontend
          servicePort: http
  - host: api.kittenphile.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: simple-api
          servicePort: http

ref:
https://kubernetes.io/docs/concepts/services-networking/ingress/#tls

$ kubectl apply -f ingress.yaml

You could find the exact ACME challenge paths by inspecting your Ingress resource.

$ kubectl describe ing simple-project
...
TLS:
  kittenphile-com-tls terminates kittenphile.com,www.kittenphile.com,api.kittenphile.com
Rules:
  Host                 Path  Backends
  ----                 ----  --------
kittenphile.com
                       /.well-known/acme-challenge/*   kube-lego-gce:8080 (<none>)
                       /*                              simple-frontend:http (<none>)
www.kittenphile.com
                       /.well-known/acme-challenge/*   kube-lego-gce:8080 (<none>)
                       /*                              simple-frontend:http (<none>)
api.kittenphile.com
                       /.well-known/acme-challenge/*   kube-lego-gce:8080 (<none>)
                       /*                              simple-api:http (<none>)
...

You might want to follow the kube-lego Pod's logs to observe the progress.

$ kubectl logs -f deploy/kube-lego --namespace kube-lego
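
Once the challenges succeed, the certificate is stored in the Secret referenced by spec.tls.secretName. You could verify it and check the expiry dates roughly like this (on macOS, base64 uses -D instead of --decode):

$ kubectl describe secret kittenphile-com-tls
$ kubectl get secret kittenphile-com-tls -o jsonpath='{.data.tls\.crt}' | \
    base64 --decode | \
    openssl x509 -noout -subject -dates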

Create a Production Certificate

After you make sure everything works, you can request production certificates for your domains.

Follow these instructions:

  • Change LEGO_URL to https://acme-v01.api.letsencrypt.org/directory
  • Delete account secret kube-lego-account
  • Delete certificate secret kittenphile-com-tls
  • Restart kube-lego

$ kubectl get secrets --all-namespaces
$ kubectl delete secret kube-lego-account --namespace kube-lego && \
  kubectl delete secret kittenphile-com-tls

$ kubectl replace --force -f kube-lego/ -R
$ kubectl logs -f deploy/kube-lego --namespace kube-lego

ref:
https://github.com/jetstack/kube-lego#switching-from-staging-to-production