Kubernetes is the de facto standard of container orchestration (deploying workloads on distributed systems). Google Kubernetes Engine (GKE) is the managed Kubernetes as a Service provided by Google Cloud Platform.
Currently, GKE is still your best choice compared to other managed Kubernetes services, such as Azure Kubernetes Service (AKS) and Amazon Elastic Container Service for Kubernetes (EKS).
ref:
https://kubernetes.io/
https://cloud.google.com/kubernetes-engine/
You could find the sample project on GitHub.
https://github.com/vinta/simple-project-on-k8s
Installation
Install gcloud
to create Kubernetes clusters on Google Cloud Platform.
Install kubectl
to interact with any Kubernetes cluster.
$ brew install kubernetes-cli
# or
$ gcloud components install kubectl
$ gcloud components update
ref:
https://cloud.google.com/sdk/docs/
https://kubernetes.io/docs/tasks/tools/install-kubectl/
Some useful tools:
fubectl: https://github.com/kubermatic/fubectl
k9s: https://github.com/derailed/k9s
stern: https://github.com/wercker/stern
zsh-kubectl-prompt: https://github.com/superbrothers/zsh-kubectl-prompt
Concepts
Nodes
- Cluster: A set of machines, called nodes, that run containerized applications.
- Node: A single virtual or physical machine that provides hardware resources.
- Edge Node: The node which is exposed to the Internet.
- Master Node: The node which is responsible for managing the whole cluster.
Objects
- Pod: A group of tightly related containers. Each Pod is like a logical host with its own IP, hostname, and storage.
- PodPreset: A set of pre-defined configurations that can be injected into Pods automatically.
- Service: A load balancer for a set of Pods which are selected by labels; it also provides service discovery.
- Ingress: A reverse proxy that acts as an entry point to the cluster, which allows domain-based and path-based routing to different Services.
- ConfigMap: Key-value configuration data that can be mounted into containers or consumed as environment variables.
- Secret: Similar to ConfigMap but for storing sensitive data only.
- Volume: An ephemeral file system whose lifetime is the same as the Pod's.
- PersistentVolume: A persistent file system that can be mounted to the cluster, without being associated with any particular node.
- PersistentVolumeClaim: A binding between a Pod and a PersistentVolume.
- StorageClass: A storage provisioner which allows users to request storage dynamically.
- Namespace: The way to partition a single cluster into multiple virtual groups.
Controllers
- ReplicationController: Ensures that a specified number of Pods are always running.
- ReplicaSet: The next-generation ReplicationController.
- Deployment: The recommended way to deploy stateless Pods.
- StatefulSet: Similar to Deployment but provides guarantees about the ordering and unique names of Pods.
- DaemonSet: Ensures a copy of a Pod is running on every node.
- Job: Creates Pods that run to completion (exit with 0).
- CronJob: A Job which can run at a specific time or run regularly.
- HorizontalPodAutoscaler: Automatically scales the number of Pods based on CPU and memory utilization or custom metric targets.
ref:
https://kubernetes.io/docs/concepts/
https://kubernetes.io/docs/reference/glossary/?all=true
Setup Google Cloud Accounts
Make sure you use the right Google Cloud Platform account.
$ gcloud init
# or
$ gcloud config configurations list
$ gcloud config configurations activate default
$ gcloud config set project simple-project-198818
$ gcloud config set compute/region asia-east1
$ gcloud config set compute/zone asia-east1-a
$ gcloud config list
Create Clusters
Create a regional cluster in the asia-east1 region which has 1 node in each of the asia-east1 zones using --region=asia-east1 --num-nodes=1. By default, a cluster only creates its cluster master and nodes in a single compute zone.
# show available OSs and versions of Kubernetes
$ gcloud container get-server-config
# show available CPU platforms in the desired zone
$ gcloud compute zones describe asia-east1-a
availableCpuPlatforms:
- Intel Skylake
- Intel Broadwell
- Intel Haswell
- Intel Ivy Bridge
$ gcloud container clusters create demo \
--cluster-version=1.11.6-gke.6 \
--node-version=1.11.6-gke.6 \
--scopes=gke-default,cloud-platform,storage-full,compute-ro,pubsub,https://www.googleapis.com/auth/cloud_debugger \
--region=asia-east1 \
--num-nodes=1 \
--enable-autoscaling --min-nodes=1 --max-nodes=10 \
--maintenance-window=20:00 \
--machine-type=n1-standard-4 \
--min-cpu-platform="Intel Skylake" \
--enable-ip-alias \
--create-subnetwork="" \
--image-type=UBUNTU \
--node-labels=custom.kubernetes.io/fs-type=xfs
$ gcloud container clusters describe demo --region=asia-east1
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-04T04:48:55Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.5-gke.5", GitCommit:"9aba9c1237d9d2347bef28652b93b1cba3aca6d8", GitTreeState:"clean", BuildDate:"2018-12-11T02:36:50Z", GoVersion:"go1.10.3b4", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl get nodes -o wide
You can only get a regional cluster by creating a whole new cluster; Google currently won't allow you to turn an existing cluster into a regional one.
ref:
https://cloud.google.com/sdk/gcloud/reference/container/clusters/create
https://cloud.google.com/compute/docs/machine-types
https://cloud.google.com/kubernetes-engine/docs/concepts/regional-clusters
https://cloud.google.com/kubernetes-engine/docs/how-to/min-cpu-platform
https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips
Google Kubernetes Engine clusters running Kubernetes version 1.8+ enable Role-Based Access Control (RBAC) by default. Therefore, you must explicitly provide the --enable-legacy-authorization option to disable RBAC.
ref:
https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control
Delete the cluster. After you delete the cluster, you might also need to manually delete persistent disks (under Compute Engine), load balancers (under Network services) and static IPs (under VPC network) which belong to the cluster on Google Cloud Platform Console.
$ gcloud container clusters delete demo --region=asia-east1
Create Node Pools
Create a node pool of preemptible VMs, which are much cheaper than regular instances, using --preemptible.
You might receive a "The connection to the server x.x.x.x was refused - did you specify the right host or port?" error while the cluster is upgrading, which includes adding new node pools.
$ gcloud container node-pools create n1-standard-4-pre \
--cluster=demo \
--node-version=1.11.6-gke.6 \
--scopes=gke-default,storage-full,compute-ro,pubsub,https://www.googleapis.com/auth/cloud_debugger \
--region=asia-east1 \
--num-nodes=1 \
--enable-autoscaling --min-nodes=1 --max-nodes=10 \
--machine-type=n1-standard-4 \
--min-cpu-platform="Intel Skylake" \
--node-labels=custom.kubernetes.io/scopes-storage-full=true \
--enable-autorepair \
--preemptible
$ gcloud container node-pools list --cluster=demo --region=asia-east1
$ gcloud container operations list
ref:
https://cloud.google.com/sdk/gcloud/reference/container/node-pools/create
https://cloud.google.com/kubernetes-engine/docs/concepts/preemptible-vm
https://cloud.google.com/compute/docs/regions-zones/
Build Docker Images
You could use Google Cloud Build or any Continuous Integration (CI) service to automatically build Docker images and push them to Google Container Registry.
Furthermore, you need to tag your Docker images appropriately with the registry name format: region_name.gcr.io/your_project_id/your_image_name:version.
ref:
https://cloud.google.com/container-builder/
https://cloud.google.com/container-registry/
An example of cloudbuild.yaml:
substitutions:
_REPO_NAME: simple-api
steps:
- id: pull-image
name: gcr.io/cloud-builders/docker
entrypoint: "/bin/sh"
args: [
"-c",
"docker pull asia.gcr.io/$PROJECT_ID/$_REPO_NAME:$BRANCH_NAME || true"
]
waitFor: [
"-"
]
- id: build-image
name: gcr.io/cloud-builders/docker
args: [
"build",
"--cache-from", "asia.gcr.io/$PROJECT_ID/$_REPO_NAME:$BRANCH_NAME",
"--label", "git.commit=$SHORT_SHA",
"--label", "git.branch=$BRANCH_NAME",
"--label", "ci.build-id=$BUILD_ID",
"-t", "asia.gcr.io/$PROJECT_ID/$_REPO_NAME:$SHORT_SHA",
"simple-api/"
]
waitFor: [
"pull-image",
]
images:
- asia.gcr.io/$PROJECT_ID/$_REPO_NAME:$SHORT_SHA
ref:
https://cloud.google.com/container-builder/docs/build-config
https://cloud.google.com/container-builder/docs/create-custom-build-steps
Of course, you could also manually push Docker images to Google Container Registry.
$ gcloud auth configure-docker && \
gcloud config set project simple-project-198818 && \
export PROJECT_ID="$(gcloud config get-value project -q)"
$ docker build --rm -t asia.gcr.io/${PROJECT_ID}/simple-api:v1 simple-api/
$ gcloud docker -- push asia.gcr.io/${PROJECT_ID}/simple-api:v1
$ gcloud container images list --repository=asia.gcr.io/${PROJECT_ID}
ref:
https://cloud.google.com/container-registry/docs/pushing-and-pulling
Moreover, you should always adopt Multi-Stage builds for your Dockerfiles.
FROM python:3.6.8-alpine3.7 AS builder
ENV PATH=$PATH:/root/.local/bin
ENV PIP_DISABLE_PIP_VERSION_CHECK=1
WORKDIR /usr/src/app/
RUN apk add --no-cache --virtual .build-deps \
build-base \
linux-headers \
openssl-dev \
zlib-dev
COPY requirements.txt .
RUN pip install --user -r requirements.txt && \
find $(python -m site --user-base) -type f -name "*.pyc" -delete && \
find $(python -m site --user-base) -type f -name "*.pyo" -delete && \
find $(python -m site --user-base) -type d -name "__pycache__" -delete
###
FROM python:3.6.8-alpine3.7
ENV PATH=$PATH:/root/.local/bin
ENV FLASK_APP=app.py
WORKDIR /usr/src/app/
RUN apk add --no-cache --virtual .run-deps \
ca-certificates \
curl \
openssl \
zlib
COPY --from=builder /root/.local/ /root/.local/
COPY . .
EXPOSE 8000
CMD ["uwsgi", "--ini", "config/uwsgi.ini", "--single-interpreter", "--enable-threads", "--http", ":8000"]
ref:
https://medium.com/@tonistiigi/advanced-multi-stage-build-patterns-6f741b852fae
Create Pods
No, you should never create Pods directly; those are so-called naked Pods. Use a Deployment instead.
ref:
https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/
Pods have the following lifecycle phases:
- Pending
- Running
- Succeeded
- Failed
- Unknown
ref:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
Inspect Pods
Show information about Pods.
$ kubectl get all
$ kubectl get deploy
$ kubectl get pods
$ kubectl get pods -l app=simple-api
$ kubectl describe pod simple-api-5bbf4dd4f9-8b4c9
$ kubectl get pod simple-api-5bbf4dd4f9-8b4c9 -o yaml
ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#describe
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#get
Execute a command in a container.
$ kubectl exec -i -t simple-api-5bbf4dd4f9-8b4c9 -- sh
ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#exec
Tail Pod logs. It is also recommended to use kubetail.
$ kubectl logs simple-api-5bbf4dd4f9-8b4c9 -f
$ kubectl logs deploy/simple-api -f
$ kubectl logs statefulset/mongodb-rs0 -f
$ kubetail simple-api
$ kubetail simple-worker
$ kubetail mongodb-rs0 -c db
ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#logs
https://github.com/johanhaleby/kubetail
List all Pods on a certain node.
$ kubectl describe node gke-demo-default-pool-fb33ac26-frkw
...
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default mongodb-rs0-1 2100m (53%) 4 (102%) 4G (30%) 4G (30%)
default simple-api-84554476df-w5b5g 500m (25%) 1 (51%) 1G (16%) 1G (16%)
default simple-worker-6495b6b74b-rqplv 500m (25%) 1 (51%) 1G (16%) 1G (16%)
kube-system fluentd-gcp-v3.0.0-848nq 100m (2%) 0 (0%) 200Mi (1%) 300Mi (2%)
kube-system heapster-v1.5.3-6447d67f78-7psb2 138m (3%) 138m (3%) 301856Ki (2%) 301856Ki (2%)
kube-system kube-dns-788979dc8f-5zvfk 260m (6%) 0 (0%) 110Mi (0%) 170Mi (1%)
kube-system kube-proxy-gke-demo-default-pool-3c058fcf-x7cv 100m (2%) 0 (0%) 0 (0%) 0 (0%)
...
$ kubectl get pods --all-namespaces -o wide --sort-by="{.spec.nodeName}"
Check resource usage.
$ kubectl top pods
$ kubectl top nodes
ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#top
https://kubernetes.io/docs/tasks/debug-application-cluster/
Restart Pods.
# you could simply delete Pods, which would be recreated automatically if they are managed by a Deployment
$ kubectl delete pods -l app=simple-worker
# you could replace a resource by providing a manifest
$ kubectl replace --force -f simple-api/
ref:
https://stackoverflow.com/questions/40259178/how-to-restart-kubernetes-pods
Completely delete resources.
$ kubectl delete -f simple-api/ -R
$ kubectl delete deploy simple-api
$ kubectl delete deploy -l app=simple,role=worker
# delete a Pod forcefully
$ kubectl delete pod simple-api-668d465985-886h5 --grace-period=0 --force
$ kubectl delete deploy simple-api --grace-period=0 --force
# delete all resources under a namespace
$ kubectl delete daemonsets,deployments,services,statefulset,pvc,pv --all --namespace tick
ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#delete
Create ConfigMaps
Create an environment-variable-like ConfigMap.
kind: ConfigMap
apiVersion: v1
metadata:
name: simple-api
data:
FLASK_ENV: production
MONGODB_URL: mongodb://mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local,mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local,mongodb-rs0-3.mongodb-rs0.default.svc.cluster.local/demo?readPreference=secondaryPreferred&maxPoolSize=10
CACHE_URL: redis://redis-cache.default.svc.cluster.local/0
CELERY_BROKER_URL: redis://redis-broker.default.svc.cluster.local/0
CELERY_RESULT_BACKEND: redis://redis-broker.default.svc.cluster.local/1
Load environment variables from a ConfigMap:
kind: Deployment
apiVersion: apps/v1
metadata:
name: simple-api
labels:
app: simple-api
spec:
replicas: 1
selector:
matchLabels:
app: simple-api
template:
metadata:
labels:
app: simple-api
spec:
containers:
- name: simple-api
image: asia.gcr.io/simple-project-198818/simple-api:4fc4199
command: ["uwsgi", "--ini", "config/uwsgi.ini", "--single-interpreter", "--enable-threads", "--http", ":8000"]
envFrom:
- configMapRef:
name: simple-api
ports:
- containerPort: 8000
Create a file-like ConfigMap.
kind: ConfigMap
apiVersion: v1
metadata:
name: redis-cache
data:
redis.conf: |-
maxmemory-policy allkeys-lfu
appendonly no
save ""
Mount files from a ConfigMap:
kind: Deployment
apiVersion: apps/v1
metadata:
name: redis-cache
labels:
app: redis-cache
spec:
replicas: 1
selector:
matchLabels:
app: redis-cache
template:
metadata:
labels:
app: redis-cache
spec:
volumes:
- name: config
configMap:
name: redis-cache
containers:
- name: redis
image: redis:4.0.10-alpine
command: ["redis-server"]
args: ["/etc/redis/redis.conf", "--loglevel", "verbose", "--maxmemory", "1g"]
volumeMounts:
- name: config
mountPath: /etc/redis
ports:
- containerPort: 6379
ref:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/
Only mount a single file with subPath.
kind: Deployment
apiVersion: apps/v1
metadata:
name: redis-cache
labels:
app: redis-cache
spec:
replicas: 1
selector:
matchLabels:
app: redis-cache
template:
metadata:
labels:
app: redis-cache
spec:
volumes:
- name: config
configMap:
name: redis-cache
containers:
- name: redis
image: redis:4.0.10-alpine
command: ["redis-server"]
args: ["/etc/redis/redis.conf", "--loglevel", "verbose", "--maxmemory", "1g"]
volumeMounts:
- name: config
mountPath: /etc/redis/redis.conf
subPath: redis.conf
ports:
- containerPort: 6379
ref:
https://github.com/kubernetes/kubernetes/issues/44815#issuecomment-297077509
It is worth noting that changing a ConfigMap or Secret won't trigger a re-deployment of a Deployment. A workaround is changing the name of the ConfigMap every time you change its content. If you consume a ConfigMap as environment variables, you must trigger a re-deployment explicitly.
ref:
https://github.com/kubernetes/kubernetes/issues/22368
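A common workaround in practice (a sketch; the checksum value below is illustrative and would normally be generated by your deploy tooling from the ConfigMap content) is to add an annotation to the Pod template, so any configuration change also changes the template and triggers a rolling update:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: simple-api
spec:
  template:
    metadata:
      labels:
        app: simple-api
      annotations:
        # hypothetical checksum of the ConfigMap content; regenerate it whenever the config changes
        checksum/config: "d41d8cd98f00b204e9800998ecf8427e"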
Create Secrets
First of all, Secrets are only base64 encoded, not encrypted.
Encode and decode a Secret value.
$ echo -n 'YOUR_SECRET_KEY' | base64
WU9VUl9TRUNSRVRfS0VZ
$ echo 'WU9VUl9TRUNSRVRfS0VZ' | base64 --decode
YOUR_SECRET_KEY
Create an environment-variable-like Secret.
kind: Secret
apiVersion: v1
metadata:
name: simple-api
data:
SECRET_KEY: WU9VUl9TRUNSRVRfS0VZ
Export data (base64-encoded) from a Secret.
$ kubectl get secret simple-project-com --export=true -o yaml
ref:
https://kubernetes.io/docs/concepts/configuration/secret/
Create Deployments With Probes
Deployments are designed for stateless (or nearly stateless) services. A Deployment controls a ReplicaSet, and the ReplicaSet controls Pods.
ref:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
livenessProbe can be used to determine when an application must be restarted by Kubernetes, while readinessProbe can be used to determine when a container is ready to accept traffic.
ref:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
It is also a best practice to always specify resource requests and limits: resources.requests and resources.limits.
ref:
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
Create a Deployment with probes.
kind: Deployment
apiVersion: apps/v1
metadata:
name: simple-api
labels:
app: simple-api
spec:
replicas: 1
selector:
matchLabels:
app: simple-api
template:
metadata:
labels:
app: simple-api
spec:
containers:
- name: simple-api
image: asia.gcr.io/simple-project-198818/simple-api:4fc4199
command: ["uwsgi", "--ini", "config/uwsgi.ini", "--single-interpreter", "--enable-threads", "--http", ":8000"]
envFrom:
- configMapRef:
name: simple-api
ports:
- containerPort: 8000
livenessProbe:
exec:
command: ["curl", "-fsS", "-m", "0.1", "-H", "User-Agent: KubernetesHealthCheck/1.0", "http://127.0.0.1:8000/health"]
initialDelaySeconds: 5
periodSeconds: 1
successThreshold: 1
failureThreshold: 5
readinessProbe:
exec:
command: ["curl", "-fsS", "-m", "0.1", "-H", "User-Agent: KubernetesHealthCheck/1.0", "http://127.0.0.1:8000/health"]
initialDelaySeconds: 3
periodSeconds: 1
successThreshold: 1
failureThreshold: 3
resources:
requests:
cpu: 500m
memory: 1G
limits:
cpu: 1000m
memory: 1G
Create another Deployment of Celery workers.
kind: Deployment
apiVersion: apps/v1
metadata:
name: simple-worker
spec:
replicas: 2
selector:
matchLabels:
app: simple-worker
template:
metadata:
labels:
app: simple-worker
spec:
terminationGracePeriodSeconds: 30
containers:
- name: simple-worker
image: asia.gcr.io/simple-project-198818/simple-api:4fc4199
command: ["celery", "-A", "app:celery", "worker", "--without-gossip", "-Ofair", "-l", "info"]
envFrom:
- configMapRef:
name: simple-api
readinessProbe:
exec:
command: ["sh", "-c", "celery inspect -q -A app:celery -d celery@$(hostname) --timeout 10 ping"]
initialDelaySeconds: 15
periodSeconds: 15
timeoutSeconds: 10
successThreshold: 1
failureThreshold: 3
resources:
requests:
cpu: 500m
memory: 1G
limits:
cpu: 1000m
memory: 1G
$ kubectl apply -f simple-api/ -R
$ kubectl get pods
The minimum value of timeoutSeconds is 1, so you might need to use exec.command to run arbitrary shell commands with custom timeout settings.
Create Deployments With InitContainers
If multiple Init Containers are specified for a Pod, those Containers are run one at a time in sequential order. Each must succeed before the next can run. When all of the Init Containers have run to completion, Kubernetes initializes regular containers as usual.
kind: Service
apiVersion: v1
metadata:
name: gcs-proxy-media-simple-project-com
spec:
type: NodePort
selector:
app: gcs-proxy-media-simple-project-com
ports:
- name: http
port: 80
targetPort: 80
---
kind: ConfigMap
apiVersion: v1
metadata:
name: google-cloud-storage-proxy
data:
nginx.conf: |-
worker_processes auto;
http {
include mime.types;
default_type application/octet-stream;
server {
listen 80;
if ( $http_user_agent ~* (GoogleHC|KubernetesHealthCheck) ) {
return 200;
}
root /usr/share/nginx/html;
open_file_cache max=10000 inactive=10m;
open_file_cache_valid 1m;
open_file_cache_min_uses 1;
open_file_cache_errors on;
include /etc/nginx/conf.d/*.conf;
}
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: gcs-proxy-media-simple-project-com
spec:
replicas: 1
selector:
matchLabels:
app: gcs-proxy-media-simple-project-com
template:
metadata:
labels:
app: gcs-proxy-media-simple-project-com
spec:
volumes:
- name: nginx-config
configMap:
name: google-cloud-storage-proxy
- name: nginx-config-extra
emptyDir: {}
initContainers:
- name: create-robots-txt
image: busybox
command: ["sh", "-c"]
args:
- |
set -euo pipefail
cat << 'EOF' > /etc/nginx/conf.d/robots.txt
User-agent: *
Disallow: /
EOF
volumeMounts:
- name: nginx-config-extra
mountPath: /etc/nginx/conf.d/
- name: create-nginx-extra-conf
image: busybox
command: ["sh", "-c"]
args:
- |
set -euo pipefail
cat << 'EOF' > /etc/nginx/conf.d/extra.conf
location /robots.txt {
alias /etc/nginx/conf.d/robots.txt;
}
EOF
volumeMounts:
- name: nginx-config-extra
mountPath: /etc/nginx/conf.d/
containers:
- name: http
image: swaglive/openresty:gcsfuse
imagePullPolicy: Always
args: ["nginx", "-c", "/usr/local/openresty/nginx/conf/nginx.conf", "-g", "daemon off;"]
ports:
- containerPort: 80
securityContext:
privileged: true
capabilities:
add: ["CAP_SYS_ADMIN"]
env:
- name: GCSFUSE_OPTIONS
value: "--debug_gcs --implicit-dirs --stat-cache-ttl 1s --type-cache-ttl 24h --limit-bytes-per-sec -1 --limit-ops-per-sec -1 -o ro,allow_other"
- name: GOOGLE_CLOUD_STORAGE_BUCKET
value: asia.contents.simple-project.com
volumeMounts:
- name: nginx-config
mountPath: /usr/local/openresty/nginx/conf/nginx.conf
subPath: nginx.conf
readOnly: true
- name: nginx-config-extra
mountPath: /etc/nginx/conf.d/
readOnly: true
readinessProbe:
httpGet:
port: 80
path: /
httpHeaders:
- name: User-Agent
value: "KubernetesHealthCheck/1.0"
timeoutSeconds: 1
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 1
successThreshold: 1
resources:
requests:
cpu: 0m
memory: 500Mi
limits:
cpu: 1000m
memory: 500Mi
$ kubectl exec -i -t simple-api-5968cfc48d-8g755 -- sh (gke_simple-project-198818_asia-east1_demo/default)
> curl http://gcs-proxy-media-simple-project-com/robots.txt
User-agent: *
Disallow: /
ref:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340
Create Deployments With Canary Deployment
TODO
ref:
https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#canary-deployments
https://medium.com/google-cloud/kubernetes-canary-deployments-for-mere-mortals-13728ce032fe
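A minimal sketch of the label-based approach described in the refs above (the track label value and the canary image tag are illustrative): run a second Deployment whose Pods carry the same app label that the Service selects on, plus track: canary, while the stable Deployment would carry track: stable so the two Deployments' selectors don't overlap.
kind: Deployment
apiVersion: apps/v1
metadata:
  name: simple-api-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-api
      track: canary
  template:
    metadata:
      labels:
        app: simple-api
        track: canary
    spec:
      containers:
      - name: simple-api
        image: asia.gcr.io/simple-project-198818/simple-api:canary
        ports:
        - containerPort: 8000
Since the simple-api Service selects only app: simple-api, it load balances across both stable and canary Pods, so the share of canary traffic is roughly the ratio of canary replicas to total replicas.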
Rollback A Deployment
Yes, you could publish a deployment with kubectl apply --record and roll it back with kubectl rollout undo. However, the simplest way might be to just git checkout the previous commit and deploy again with kubectl apply.
The formal way.
$ kubectl apply -f simple-api/ -R --record
$ kubectl rollout history deploy/simple-api
$ kubectl rollout undo deploy/simple-api --to-revision=2
The git way.
$ git checkout b7ed8d5
$ kubectl apply -f simple-api/ -R
$ kubectl get pods
ref:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment
Scale A Deployment
Simply increase the number of spec.replicas and deploy again.
$ kubectl apply -f simple-api/ -R
# or
$ kubectl scale --replicas=10 deploy/simple-api
$ kubectl get pods
ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#scale
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#scaling-a-deployment
Create HorizontalPodAutoscalers (HPA)
The Horizontal Pod Autoscaler automatically scales the number of Pods in a Deployment based on observed CPU utilization, memory usage, or custom metrics. Note that HPA only applies to scalable objects such as Deployments, ReplicaSets, StatefulSets, and ReplicationControllers; it does not work with DaemonSets.
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
name: simple-api
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: simple-api
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: 80
- type: Resource
resource:
name: memory
targetAverageValue: 800M
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
name: simple-worker
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: simple-worker
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: 80
- type: Resource
resource:
name: memory
targetAverageValue: 500M
$ kubectl apply -f simple-api/hpa.yaml
$ kubectl get hpa --watch
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
simple-api Deployment/simple-api 18685952/800M, 4%/80% 2 20 3 10m
simple-worker Deployment/simple-worker 122834944/500M, 11%/80% 2 10 3 10m
ref:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
You could run some load testing.
ref:
https://medium.com/@jonbcampos/kubernetes-horizontal-pod-scaling-190e95c258f5
There is also Cluster Autoscaler in Google Kubernetes Engine.
$ gcloud container clusters update demo \
--enable-autoscaling --min-nodes=1 --max-nodes=10 \
--node-pool=default-pool
ref:
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
Create VerticalPodAutoscalers (VPA)
TODO
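A minimal sketch, assuming the Vertical Pod Autoscaler components are installed in the cluster (the API version may differ depending on the release you install):
kind: VerticalPodAutoscaler
apiVersion: autoscaling.k8s.io/v1beta2
metadata:
  name: simple-api
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-api
  updatePolicy:
    updateMode: "Auto"
ref:
https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler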
Create PodDisruptionBudget (PDB)
- Voluntary disruptions: actions initiated by application owners or admins.
- Involuntary disruptions: unavoidable cases like hardware failures or system software error.
PodDisruptionBudgets are only accounted for with voluntary disruptions; something like a hardware failure will not take a PodDisruptionBudget into account. PDBs cannot prevent involuntary disruptions from occurring, but such disruptions do count against the budget.
Create a PodDisruptionBudget for a stateless application.
kind: PodDisruptionBudget
apiVersion: policy/v1beta1
metadata:
name: simple-api
spec:
minAvailable: 90%
selector:
matchLabels:
app: simple-api
Create a PodDisruptionBudget for a multiple-instance stateful application.
kind: PodDisruptionBudget
apiVersion: policy/v1beta1
metadata:
name: mongodb-rs0
spec:
minAvailable: 2
selector:
matchLabels:
app: mongodb-rs0
$ kubectl apply -f simple-api/pdb.yaml
$ kubectl apply -f mongodb/pdb.yaml
$ kubectl get pdb
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
mongodb-rs0 2 N/A 1 48m
simple-api 90% N/A 0 48m
ref:
https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
https://kubernetes.io/docs/tasks/run-application/configure-pdb/
Actually, you could also achieve similar functionality using .spec.strategy.rollingUpdate.
- maxUnavailable: The maximum number of Pods that can be unavailable during the update process.
- maxSurge: The maximum number of Pods that can be created over the desired number of Pods.
This makes sure that total ready Pods >= total desired Pods - maxUnavailable, and total Pods <= total desired Pods + maxSurge.
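For example, a minimal Deployment fragment (the values are illustrative):
kind: Deployment
apiVersion: apps/v1
metadata:
  name: simple-api
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2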
ref:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#writing-a-deployment-spec
https://cloud.google.com/kubernetes-engine/docs/how-to/updating-apps
Create Services
A Service is basically a load balancer for a set of Pods which are selected by labels. Since you can't rely on any Pod's IP, which changes every time the Pod is created and destroyed, you should always provide a Service as an entry point for your Pods, i.e., your so-called microservice.
Typically, containers you run in the cluster are not accessible from the Internet, because they do not have external IP addresses. You must explicitly expose your application by creating a Service or an Ingress.
There are the following Service types:
- ClusterIP: A virtual IP which is only reachable from within the cluster. Also the default Service type.
- NodePort: Opens a specific port on all nodes; any traffic sent to that port on any node is forwarded to the Service.
- LoadBalancer: Builds on NodePort by additionally configuring the cloud provider to create an external load balancer.
- ExternalName: Maps the Service to an external CNAME record, e.g., your MySQL RDS on AWS.
Create a Service.
kind: Service
apiVersion: v1
metadata:
name: simple-api
spec:
type: NodePort
selector:
app: simple-api
ports:
- name: http
port: 80
targetPort: 8000
type: NodePort is enough in most cases. spec.selector must match the labels defined in the corresponding Deployment, and spec.ports.targetPort and spec.ports.protocol must match the container's port and protocol.
$ kubectl apply -f simple-api/ -R
$ kubectl get svc,endpoints
$ kubespy trace service simple-api
[ADDED v1/Service] default/simple-api
[ADDED v1/Endpoints] default/simple-api
Directs traffic to the following live Pods:
- [Ready] simple-api-6b4b4c4bfb-g5dln @ 10.28.1.42
- [Ready] simple-api-6b4b4c4bfb-h66dg @ 10.28.8.24
ref:
https://kubernetes.io/docs/concepts/services-networking/service/
https://medium.com/google-cloud/kubernetes-nodeport-vs-loadbalancer-vs-ingress-when-should-i-use-what-922f010849e0
After a Service is created, kube-dns creates a corresponding DNS A record named your-service.your-namespace.svc.cluster.local which resolves to an internal IP in the cluster. In this case: simple-api.default.svc.cluster.local. Headless Services (without a cluster IP) are also assigned a DNS A record which has the same form. Unlike normal Services, this A record directly resolves to the set of IPs of the Pods selected by the Service. Clients are expected to consume the set of IPs or use round-robin selection from the set.
You should always prefer the DNS name of a Service over injected environment variables, e.g., FOO_SERVICE_HOST and FOO_SERVICE_PORT.
ref:
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
For more detail about Kubernetes networking, go to:
https://github.com/hackstoic/kubernetes_practice/blob/master/%E7%BD%91%E7%BB%9C.md
https://containerops.org/2017/01/30/kubernetes-services-and-ingress-under-x-ray/
https://www.safaribooksonline.com/library/view/kubernetes-up-and/9781491935668/ch07.html
Configure Services With Google Cloud CDN
kind: BackendConfig
apiVersion: cloud.google.com/v1beta1
metadata:
name: cdn
spec:
cdn:
enabled: true
cachePolicy:
includeHost: false
includeProtocol: false
includeQueryString: false
---
kind: Service
apiVersion: v1
metadata:
name: gcs-proxy-media-simple-project-com
annotations:
beta.cloud.google.com/backend-config: '{"ports": {"http":"cdn"}}'
cloud.google.com/neg: '{"ingress": true}'
spec:
selector:
app: gcs-proxy-media-simple-project-com
ports:
- name: http
port: 80
targetPort: 80
ref:
https://cloud.google.com/kubernetes-engine/docs/concepts/backendconfig
Configure Services With Network Endpoint Groups (NEGs)
To use container-native load balancing, you must create a cluster with the --enable-ip-alias flag and just add an annotation to your Services. However, the load balancer is not created until you create an Ingress for the Service.
kind: Service
apiVersion: v1
metadata:
name: simple-api
annotations:
cloud.google.com/neg: '{"ingress": true}'
spec:
selector:
app: simple-api
ports:
- name: http
port: 80
targetPort: 8000
ref:
https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing
Create An Internal Load Balancer
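On Google Kubernetes Engine, you could expose a Service to clients inside your VPC only, without a public IP, by adding the cloud.google.com/load-balancer-type annotation to a LoadBalancer Service. A minimal sketch (the Service name is illustrative):
kind: Service
apiVersion: v1
metadata:
  name: simple-api-internal
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: simple-api
  ports:
  - name: http
    port: 80
    targetPort: 8000
ref:
https://cloud.google.com/kubernetes-engine/docs/how-to/internal-load-balancing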
Use Port Forwarding
Access a Service or a Pod on your local machine with port forwarding.
# 8080 is the local port and 80 is the remote port
$ kubectl port-forward svc/simple-api 8080:80
# port forward to a Pod directly
$ kubectl port-forward mongo-rs0-0 27017:27017
$ open http://127.0.0.1:8080/
ref:
https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/
Create An Ingress
Pods in Kubernetes are not reachable from outside the cluster, so you need a way to expose your Pods to the Internet. Even though you could associate Pods with a Service of the right type, i.e., NodePort or LoadBalancer, the recommended way to expose services is using an Ingress. You can do a lot of different things with an Ingress, and there are many types of Ingress controllers that have different capabilities.
There are some reasons to choose Ingress over Service:
- A Service is an internal load balancer, while an Ingress is a gateway for external access to Services
- A Service is an L4 load balancer, while an Ingress is an L7 load balancer
- An Ingress allows domain-based and path-based routing to different Services
- It is not efficient to create a cloud provider's load balancer for each Service you want to expose
Create an Ingress which is implemented using Google Cloud Load Balancing (L7 HTTP load balancer). You should make sure Services exist before creating the Ingress.
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
name: simple-project
annotations:
kubernetes.io/ingress.class: "gce"
# kubernetes.io/tls-acme: "true"
# ingress.kubernetes.io/ssl-redirect: "true"
spec:
# tls:
# - secretName: simple-project-com-tls
# hosts:
# - simple-project.com
# - www.simple-project.com
# - api.simple-project.com
rules:
- host: simple-project.com
http:
paths:
- path: /*
backend:
serviceName: simple-frontend
servicePort: 80
- host: www.simple-project.com
http:
paths:
- path: /*
backend:
serviceName: simple-frontend
servicePort: 80
- host: api.simple-project.com
http:
paths:
- path: /*
backend:
serviceName: simple-api
servicePort: 80
- host: asia.contents.simple-project.com
http:
paths:
- path: /*
backend:
serviceName: gcs-proxy-media-simple-project-com
servicePort: 80
backend:
serviceName: simple-api
servicePort: 80
It might take several minutes to spin up a Google HTTP load balancer (including acquiring the public IP), and at least 5 minutes before the GCE API starts health checking backends. After getting your public IP, you could go to your domain provider and create new DNS records which point to that IP.
$ kubectl apply -f ingress.yaml
$ kubectl describe ing simple-project
ref:
https://kubernetes.io/docs/concepts/services-networking/ingress/
https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/
https://www.joyfulbikeshedding.com/blog/2018-03-26-studying-the-kubernetes-ingress-system.html
To read more about Google Load balancer, go to:
https://cloud.google.com/kubernetes-engine/docs/tutorials/http-balancer
https://cloud.google.com/compute/docs/load-balancing/http/backend-service
Setup The Ingress With TLS Certificates
To automatically create HTTPS certificates for your domains:
- https://vinta.ws/code/cert-manager-automatically-provision-tls-certificates-in-kubernetes.html
- https://vinta.ws/code/kube-lego-automatically-provision-tls-certificates-in-kubernetes.html
Create Ingress Controllers
Kubernetes supports multiple Ingress controllers:
- https://github.com/kubernetes/ingress-gce
- https://github.com/kubernetes/ingress-nginx
- https://github.com/nginxinc/kubernetes-ingress
- https://github.com/containous/traefik/
ref:
https://container-solutions.com/production-ready-ingress-kubernetes/
Create StorageClasses
StorageClass provides a way to define different available storage types, for instance, ext4 SSD, XFS SSD, CephFS, NFS. You could specify what you want in PersistentVolumeClaim or StatefulSet.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: ssd-xfs
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
fsType: xfs
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: ssd-regional
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
zones: asia-east1-a, asia-east1-b, asia-east1-c
replication-type: regional-pd
$ kubectl apply -f storageclass.yaml
$ kubectl get sc
NAME PROVISIONER AGE
ssd kubernetes.io/gce-pd 5s
ssd-regional kubernetes.io/gce-pd 4s
ssd-xfs kubernetes.io/gce-pd 3s
standard (default) kubernetes.io/gce-pd 1h
ref:
https://kubernetes.io/docs/concepts/storage/storage-classes/#gce
Create PersistentVolumeClaims
A Volume is just a directory which you could mount into containers, and it is shared by all containers inside the same Pod. Also, it has an explicit lifetime: the same as the Pod that encloses it. Sources of Volumes are various; they could be a remote Git repo, a file path on the host machine, a folder from a PersistentVolumeClaim, or data from a ConfigMap or a Secret.
PersistentVolumes are used to manage durable storage in a cluster. Unlike Volumes, PersistentVolumes have a lifecycle independent of any individual Pod. On Google Kubernetes Engine, PersistentVolumes are typically backed by Google Compute Engine Persistent Disks. Typically, you don't have to create PersistentVolumes explicitly. In Kubernetes 1.6 and later versions, you only need to create PersistentVolumeClaim, and the corresponding PersistentVolume would be dynamically provisioned with StorageClasses. Pods use PersistentVolumeClaims as Volumes.
Be careful when creating a Deployment with a PersistentVolumeClaim. In most cases, you don't want multiple replicas of a Deployment to write data into the same PersistentVolumeClaim.
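A minimal PersistentVolumeClaim sketch which dynamically provisions an SSD disk through the ssd StorageClass defined above (the claim name and size are illustrative):
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: simple-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ssd
  resources:
    requests:
      storage: 10G
In the Pod template, you then reference the claim with a persistentVolumeClaim volume source (claimName: simple-data) and mount it via volumeMounts.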
ref:
https://kubernetes.io/docs/concepts/storage/volumes/
https://kubernetes.io/docs/concepts/storage/persistent-volumes/
https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes
Also, IOPS is based on the disk size and node size. You need to claim a large disk size if you want high IOPS, even if you only use a small portion of the disk.
ref:
https://cloud.google.com/compute/docs/disks/performance
On Kubernetes v1.10+, it is possible to create local PersistentVolumes for your StatefulSets. Previously, PersistentVolumes only supported remote volume types, for instance, GCE's Persistent Disk and AWS's EBS. However, using local storage ties your applications to that specific node, making your application harder to schedule.
ref:
https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/
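A minimal sketch of a local PersistentVolume (the StorageClass name, disk path, and node name are illustrative); since there is no dynamic provisioner for local volumes, you create the PersistentVolume yourself and pin it to a node with nodeAffinity:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: local-pv-0
spec:
  capacity:
    storage: 100G
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd
  local:
    path: /mnt/disks/ssd0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - gke-demo-default-pool-fb33ac26-frkw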
Create A StatefulSet
Pods created under a StatefulSet have a few unique attributes: the name of each Pod is not random; instead, each Pod gets an ordinal name. In addition, Pods are created one at a time instead of all at once, which can help when bootstrapping a stateful system. A StatefulSet also deletes/updates one Pod at a time, in reverse order with respect to its ordinal index, and it waits for each to be completely shut down before deleting the next.
Rule of thumb: once you find out that you need a PersistentVolume for a component, you should consider using a StatefulSet.
ref:
https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
https://akomljen.com/kubernetes-persistent-volumes-with-deployment-and-statefulset/
Create a StatefulSet of a three-node MongoDB replica set.
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: default-view
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: view
subjects:
- kind: ServiceAccount
name: default
namespace: default
---
kind: Service
apiVersion: v1
metadata:
name: mongodb-rs0
spec:
clusterIP: None
selector:
app: mongodb-rs0
ports:
- port: 27017
targetPort: 27017
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
name: mongodb-rs0
spec:
replicas: 3
updateStrategy:
type: RollingUpdate
serviceName: mongodb-rs0
selector:
matchLabels:
app: mongodb-rs0
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: ssd-xfs
resources:
requests:
storage: 100G
template:
metadata:
labels:
app: mongodb-rs0
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: custom.kubernetes.io/fs-type
operator: In
values:
- "xfs"
- key: cloud.google.com/gke-preemptible
operator: NotIn
values:
- "true"
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- topologyKey: "kubernetes.io/hostname"
labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- mongodb-rs0
terminationGracePeriodSeconds: 10
containers:
- name: db
image: mongo:3.6.5
command: ["mongod"]
args: ["--bind_ip_all", "--replSet", "rs0"]
ports:
- containerPort: 27017
volumeMounts:
- name: data
mountPath: /data/db
readinessProbe:
exec:
command: ["mongo", --eval, "db.adminCommand('ping')"]
resources:
requests:
cpu: 2
memory: 4G
limits:
cpu: 4
memory: 4G
- name: sidecar
image: cvallance/mongo-k8s-sidecar
env:
- name: MONGO_SIDECAR_POD_LABELS
value: app=mongodb-rs0
- name: KUBE_NAMESPACE
value: default
- name: KUBERNETES_MONGO_SERVICE_NAME
value: mongodb-rs0
$ kubectl apply -f storageclass.yaml
$ kubectl apply -f mongodb/ -R
$ kubectl get pods
$ kubetail mongodb -c db
$ kubetail mongodb -c sidecar
$ kubectl scale statefulset mongodb-rs0 --replicas=4
The purpose of cvallance/mongo-k8s-sidecar is to automatically add new Pods to the replica set and remove Pods from the replica set while you scale the MongoDB StatefulSet up or down.
ref:
https://github.com/cvallance/mongo-k8s-sidecar
https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets/
https://medium.com/@thakur.vaibhav23/scaling-mongodb-on-kubernetes-32e446c16b82
Create A Headless Service For A StatefulSet
Headless Services (clusterIP: None) are just like normal Kubernetes Services, except they don't do any load balancing for you. For a typical StatefulSet component, for instance, a database with master-slave replication, you don't want Kubernetes load balancing in order to prevent writing data to slaves accidentally.
When headless Services are combined with StatefulSets, they can give you unique DNS addresses which return A records that point directly to the Pods themselves. DNS names are in the format of static-pod-name.headless-service-name.namespace.svc.cluster.local.
kind: Service
apiVersion: v1
metadata:
name: redis-broker
spec:
clusterIP: None
selector:
app: redis-broker
ports:
- port: 6379
targetPort: 6379
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
name: redis-broker
spec:
replicas: 1
serviceName: redis-broker
selector:
matchLabels:
app: redis-broker
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: ssd
resources:
requests:
storage: 32Gi
template:
metadata:
labels:
app: redis-broker
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: cloud.google.com/gke-preemptible
operator: NotIn
values:
- "true"
volumes:
- name: config
configMap:
name: redis-broker
containers:
- name: redis
image: redis:4.0.10-alpine
command: ["redis-server"]
args: ["/etc/redis/redis.conf", "--loglevel", "verbose", "--maxmemory", "1g"]
ports:
- containerPort: 6379
volumeMounts:
- name: data
mountPath: /data
- name: config
mountPath: /etc/redis
readinessProbe:
exec:
command: ["sh", "-c", "redis-cli -h $(hostname) ping"]
initialDelaySeconds: 5
timeoutSeconds: 1
periodSeconds: 1
successThreshold: 1
failureThreshold: 3
resources:
requests:
cpu: 250m
memory: 1G
limits:
cpu: 1000m
memory: 1G
If redis-broker has 2 replicas, nslookup redis-broker.default.svc.cluster.local returns multiple A records; returning multiple A records for a single DNS lookup is commonly known as round-robin DNS.
$ kubectl run -i -t --image busybox dns-test --restart=Never --rm /bin/sh
> nslookup redis-broker.default.svc.cluster.local
Server: 10.63.240.10
Address 1: 10.63.240.10 kube-dns.kube-system.svc.cluster.local
Name: redis-broker.default.svc.cluster.local
Address 1: 10.60.6.2 redis-broker-0.redis-broker.default.svc.cluster.local
Address 2: 10.60.6.7 redis-broker-1.redis-broker.default.svc.cluster.local
> nslookup redis-broker-0.redis-broker.default.svc.cluster.local
Server: 10.63.240.10
Address 1: 10.63.240.10 kube-dns.kube-system.svc.cluster.local
Name: redis-broker-0.redis-broker.default
Address 1: 10.60.6.2 redis-broker-0.redis-broker.default.svc.cluster.local
ref:
https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services
https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#using-stable-network-identities
Moreover, there is no port re-mapping for a headless Service, because the DNS name resolves directly to the Pod's IP.
kind: Service
apiVersion: v1
metadata:
namespace: tick
name: influxdb
spec:
clusterIP: None
selector:
app: influxdb
ports:
- name: api
port: 4444
targetPort: 8086
- name: admin
port: 8083
targetPort: 8083
$ kubectl apply -f tick/ -R
$ kubectl get svc --namespace tick
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
influxdb ClusterIP None <none> 4444/TCP,8083/TCP 1h
$ curl http://influxdb.tick.svc.cluster.local:4444/ping
curl: (7) Failed to connect to influxdb.tick.svc.cluster.local port 4444: Connection refused
$ curl -I http://influxdb.tick.svc.cluster.local:8086/ping
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: 7fc09a56-8538-11e8-8d1d-000000000000
Create A DaemonSet
Create a DaemonSet which changes OS kernel configurations on each node.
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: thp-disabler
spec:
selector:
matchLabels:
app: thp-disabler
template:
metadata:
labels:
app: thp-disabler
spec:
hostPID: true
containers:
- name: configurer
image: gcr.io/google-containers/startup-script:v1
securityContext:
privileged: true
env:
- name: STARTUP_SCRIPT
value: |
#! /bin/bash
set -o errexit
set -o pipefail
set -o nounset
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
ref:
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
Create A CronJob
Back up your MongoDB database every hour.
kind: CronJob
apiVersion: batch/v1beta1
metadata:
name: backup-mongodb-rs0
spec:
suspend: false
schedule: "30 * * * *"
startingDeadlineSeconds: 600
jobTemplate:
spec:
template:
spec:
restartPolicy: OnFailure
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: custom.kubernetes.io/scopes-storage-full
operator: In
values:
- "true"
volumes:
- name: backups-dir
emptyDir: {}
initContainers:
- name: clean
image: busybox
command: ["rm", "-rf", "/backups/*"]
volumeMounts:
- name: backups-dir
mountPath: /backups
- name: backup
image: vinta/mongodb-tools:4.0.1
workingDir: /backups
command: ["sh", "-c"]
args:
- mongodump --host=$MONGODB_URL --readPreference=secondaryPreferred --oplog --gzip --archive=$(date +%Y-%m-%dT%H-%M-%S).tar.gz
env:
- name: MONGODB_URL
value: mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local,mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local,mongodb-rs0-3.mongodb-rs0.default.svc.cluster.local
volumeMounts:
- name: backups-dir
mountPath: /backups
resources:
requests:
cpu: 2
memory: 2G
containers:
- name: upload
image: google/cloud-sdk:alpine
workingDir: /backups
command: ["sh", "-c"]
args:
- gsutil -m cp -r . gs://$(GOOGLE_CLOUD_STORAGE_BUCKET)
env:
- name: GOOGLE_CLOUD_STORAGE_BUCKET
value: simple-project-backups
volumeMounts:
- name: backups-dir
mountPath: /backups
readOnly: true
Note: The environment variable appears in parentheses, $(VAR); this syntax is required for the variable to be expanded by Kubernetes in the command or args field.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: simple-api-send-email
spec:
schedule: "*/30 * * * *"
concurrencyPolicy: Forbid
jobTemplate:
spec:
template:
spec:
restartPolicy: Never
containers:
- name: simple-api-send-email
image: asia.gcr.io/simple-project-198818/simple-api:4fc4199
command: ["flask", "shell", "-c"]
args:
- |
from bar.tasks import send_email
send_email.delay('Hey!', 'Stand up!', to=['[email protected]'])
envFrom:
- configMapRef:
name: simple-api
You could just write a simple Python script as a CronJob since everything is containerized.
ref:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
Define NodeAffinity And PodAffinity
Prevent Pods from being scheduled on preemptible nodes. Also, you should always prefer nodeAffinity over nodeSelector.
kind: StatefulSet
apiVersion: apps/v1
spec:
template:
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: cloud.google.com/gke-preemptible
operator: NotIn
values:
- "true"
spec.affinity.podAntiAffinity ensures that Pods of the same Deployment or StatefulSet do not co-locate on a single node.
kind: StatefulSet
apiVersion: apps/v1
spec:
template:
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- topologyKey: "kubernetes.io/hostname"
labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- mongodb-rs0
ref:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
Migrate Pods from Old Nodes to New Nodes
- Cordon marks old nodes as unschedulable
- Drain evicts all Pods on old nodes
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=n1-standard-4-pre -o=name); do
kubectl cordon "$node";
done
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=n1-standard-4-pre -o=name); do
kubectl drain --ignore-daemonsets --delete-local-data --grace-period=2 "$node";
done
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-demo-default-pool-3c058fcf-x7cv Ready <none> 2h v1.11.6-gke.6
gke-demo-default-pool-58da1098-1h00 Ready <none> 2h v1.11.6-gke.6
gke-demo-default-pool-fc34abbf-9dwr Ready <none> 2h v1.11.6-gke.6
gke-demo-n1-standard-4-pre-1a54e45a-0m7p Ready,SchedulingDisabled <none> 58m v1.11.6-gke.6
gke-demo-n1-standard-4-pre-1a54e45a-mx3h Ready,SchedulingDisabled <none> 58m v1.11.6-gke.6
gke-demo-n1-standard-4-pre-1a54e45a-qhdz Ready,SchedulingDisabled <none> 58m v1.11.6-gke.6
ref:
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#cordon
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain
https://cloud.google.com/kubernetes-engine/docs/tutorials/migrating-node-pool
Show Objects' Events
$ kubectl get events -w
$ kubectl get events -w --sort-by=.metadata.creationTimestamp
$ kubectl get events -w --sort-by=.metadata.creationTimestamp | grep mongo
ref:
https://kubernetes.io/docs/tasks/debug-application-cluster/
You could find more comprehensive logs on Google Cloud Stackdriver Logging if you are using GKE.
View Pods' Logs on Stackdriver Logging
You could use the following search formats.
textPayload:"OBJECT_FINALIZE"
logName="projects/simple-project-198818/logs/worker"
textPayload:"Added media preset"
logName="projects/simple-project-198818/logs/beat"
textPayload:"backend_cleanup"
resource.labels.pod_id="simple-api-6744bf74db-529qf"
textPayload:"5adb2bd460d6487649fe82ea"
timestamp>="2018-04-21T12:00:00Z"
timestamp<="2018-04-21T16:00:00Z"
resource.type="k8s_container"
resource.labels.cluster_name="production"
resource.labels.namespace_id="default"
resource.labels.pod_id:"simple-worker"
textPayload:"ConcurrentObjectUseError"
resource.type="k8s_node"
resource.labels.location="asia-east1"
resource.labels.cluster_name="production"
logName="projects/simple-project-198818/logs/node-problem-detector"
# see a Pod's logs
resource.type="k8s_container"
resource.labels.cluster_name="production"
resource.labels.namespace_id="default"
resource.labels.pod_name="cache-redis-0"
"start"
# see a Node's logs
resource.type="k8s_node"
resource.labels.location="asia-east1"
resource.labels.cluster_name="production"
resource.labels.node_name="gke-production-n1-highmem-32-p0-2bd334ec-v4ng"
"start"
ref:
https://kubernetes.io/docs/tasks/debug-application-cluster/logging-stackdriver/
https://cloud.google.com/logging/docs/view/advanced-filters
Best Practices
ref:
https://cloud.google.com/solutions/best-practices-for-building-containers
https://medium.com/@sachin.arote1/kubernetes-best-practices-9b1435a4cb53
https://medium.com/@brendanrius/scaling-kubernetes-for-25m-users-a7937e3536a0
Common Issues
Switch Contexts
Get authentication credentials to allow your kubectl to interact with the cluster.
$ gcloud container clusters get-credentials demo --project simple-project-198818
ref:
https://cloud.google.com/sdk/gcloud/reference/container/clusters/get-credentials
https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/
A Context is roughly a configuration profile which indicates the cluster, the namespace, and the user you use. Contexts are stored in ~/.kube/config.
$ kubectl config get-contexts
$ kubectl config use-context gke_simple-project-198818_asia-east1_demo
$ kubectl config view
ref:
https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/
The recommended way to switch contexts is using fubectl.
$ kcs
ref:
https://github.com/kubermatic/fubectl
Pending Pods
One of the most common reasons for Pending Pods is a lack of resources.
$ kubectl describe pod mongodb-rs0-1
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m (x739 over 1d) default-scheduler 0/3 nodes are available: 1 ExistingPodsAntiAffinityRulesNotMatch, 1 MatchInterPodAffinity, 1 NodeNotReady, 2 NoVolumeZoneConflict, 3 Insufficient cpu, 3 Insufficient memory, 3 MatchNodeSelector.
...
You could resize the cluster's node pool.
$ gcloud container clusters resize demo --node-pool=n1-standard-4-pre --size=5 --region=asia-east1
ref:
https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/
Init:Error Pods
$ kubectl describe pod mongodump-sh0-1543978800-bdkhl
$ kubectl logs mongodump-sh0-1543978800-bdkhl -c mongodump
CrashLoopBackOff Pods
CrashLoopBackOff means the Pod is starting, then crashing, then starting again and crashing again.
When in doubt, kubectl describe.
$ kubectl describe pod the-pod-name
$ kubectl logs the-pod-name --previous
ref:
https://www.krenger.ch/blog/crashloopbackoff-and-how-to-fix-it/
https://sysdig.com/blog/debug-kubernetes-crashloopbackoff/