ngrok: Share your localhost services with friends

Generate an https://xxx.ngrok.com URL that lets other people access your localhost services.

ref:
https://github.com/inconshreveable/ngrok
https://github.com/localtunnel/localtunnel

Install

Download ngrok from https://ngrok.com/download.

$ unzip ngrok-stable-darwin-amd64.zip && \
sudo mv ngrok /usr/local/bin && \
sudo chown vinta:admin /usr/local/bin/ngrok

$ ngrok --version
ngrok version 2.3.35

Usage

Get your auth token at https://dashboard.ngrok.com/auth.

$ ngrok authtoken YOUR_TOKEN

# open a session to local port 8000
# you can also specify a custom subdomain for the tunnel
$ ngrok http 8000
$ ngrok http -subdomain=vinta-test-server -region=ap 8000
$ open https://vinta-test-server.ap.ngrok.io/

# view ngrok sessions
$ open http://localhost:4040/

ref:
https://ngrok.com/docs

Use Makefile as a task runner for arbitrary projects

Use the GNU make, Luke!

Some notes:

  • The first target is executed by default when you call make without specifying a target.
  • The order of the targets does not matter.
  • Prefix a command with @ to suppress echoing of the command itself.
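
For example, a minimal Makefile illustrating these notes (the targets and commands are placeholders):

.PHONY: all build test

# the first target ("all") is the one that runs when you call make with no target
all: build test

# the order of the remaining targets does not matter
test:
    echo "testing"

# @ suppresses echoing of the command itself
build:
    @echo "building"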

ref:
https://www.gnu.org/software/make/manual/make.html

.PHONY

Let's assume you have an install target, which is very common in Makefiles. If you do not add .PHONY, and a file or a directory named install exists in the same directory as the Makefile, then make install will do nothing.
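
A minimal sketch (requirements.txt is a placeholder):

.PHONY: install

# without .PHONY, `make install` would report "'install' is up to date"
# whenever a file or directory named install exists next to the Makefile
install:
    pip install -r requirements.txt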

ref:
https://stackoverflow.com/questions/2145590/what-is-the-purpose-of-phony-in-a-makefile

Automatic Variables
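
For example, $@ expands to the target name, $< to the first prerequisite, and $^ to all prerequisites (the file names below are placeholders):

output.txt: input1.txt input2.txt
    @echo "target: $@"
    @echo "first prerequisite: $<"
    @echo "all prerequisites: $^"
    cat $^ > $@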

ref:
https://www.gnu.org/software/make/manual/make.html#toc-How-to-Use-Variables
https://www.gnu.org/software/make/manual/make.html#Automatic-Variables

Set Environment Variables From .env

  • export $(grep -v '^#' .env | xargs -0)
  • set -o allexport; source .env; set +o allexport

In a Makefile, you can load the file and export its variables like this:

include .env
export $(shell sed 's/=.*//' .env)

run_web:
    poetry run python -m flask run -h 0.0.0.0 -p 8000 --reload

ref:
https://unix.stackexchange.com/questions/235223/makefile-include-env-file
https://stackoverflow.com/questions/19331497/set-environment-variables-from-file-of-key-pair-values

Also see: https://github.com/Tarrasch/zsh-autoenv

Run Another Target in the Same Makefile

Say that coverage depends on clean.

.PHONY: clean coverage

clean:
     find . -regex "\(.*__pycache__.*\|.*\.py[co]\)" -delete

coverage: clean
     docker exec -i -t streetvoice_django_1 python -m coverage run manage.py test --failfast
     docker exec -i -t streetvoice_django_1 python -m coverage html -i

ref:
https://stackoverflow.com/questions/13337727/how-do-i-make-a-target-in-a-makefile-invoke-another-target-in-the-makefile

Pass Arguments to make command

.PHONY: something

something:
ifeq ($(var),foo)
    @echo $(var) "bar"
else
    @echo "others"
endif

$ make something var=foo
foo bar

$ make something
others

ref:
https://stackoverflow.com/questions/2214575/passing-arguments-to-make-run

Detect OS

.PHONY: update

update:
ifeq ($(shell uname),Darwin)
    brew update
else
    apt-get update
endif

Check Whether a File or Directory Exists

.PHONY: up install

ifneq ($(wildcard /usr/local/HAL-9000/bin/hal),)
    UP_COMMAND = /usr/local/HAL-9000/bin/hal up
else
    UP_COMMAND = docker-compose up
endif

up:
    $(UP_COMMAND)

install:
    pip install -r requirements_dev.txt

ref:
https://stackoverflow.com/questions/20763629/test-whether-a-directory-exists-inside-a-makefile

Run Targets in Parallel

You could add MAKEFLAGS += --jobs=4 in your Makefile.

MAKEFLAGS += --jobs

.PHONY: task1 task2 task3 task4

task1:
    echo "hello $@"

task2:
    echo "hello $@"

task3:
    echo "hello $@"

task4:
    echo "hello $@"

tasks: task1 task2 task3 task4

$ make tasks

Or you could call make with -j 4 explicitly.

# allow 4 jobs at once
$ make -j 4 tasks

# allow infinite jobs with no arg
$ make -j tasks

ref:
https://stackoverflow.com/questions/10567890/parallel-make-set-j8-as-the-default-option

Simple Examples

Example 1:

include .env
export $(shell sed 's/=.*//' .env)

.PHONY: clean run_web run_worker

clean:
    find . \( -name \*.pyc -o -name \*.pyo -o -name __pycache__ \) -prune -exec rm -rf {} +

run_web:
    poetry run python -m flask run -h 0.0.0.0 -p 8000 --reload

run_worker:
    poetry run watchmedo auto-restart -d . -p '*.py' -R -- celery -A app:celery worker --pid= --without-gossip --prefetch-multiplier 1 -Ofair -l debug --purge -P gevent

Example 2:

MAKEFLAGS += --jobs=4

INDICES = music_group music_album music_recording music_composition

.PHONY: prepare up stop get_indices $(INDICES)

prepare:
    mkdir -p ../Muzeum-Node-Data

up: prepare
    docker-compose up

stop:
    docker-compose stop

ipfs:
    docker-compose exec ipfs sh

$(INDICES):
    mkdir -p ../Muzeum-Node-Data/ipfs/export/soundscape/$@
    docker-compose exec ipfs ipfs get /ipns/ipfs.soundscape.net/$@/index.json -o soundscape/$@/index.json

get_indices: $(INDICES)

Spark troubleshooting

Apache Spark 2.x Troubleshooting Guide
https://www.slideshare.net/jcmia1/a-beginners-guide-on-troubleshooting-spark-applications
https://www.slideshare.net/jcmia1/apache-spark-20-tuning-guide

Check your cluster UI to ensure that workers are registered and have sufficient resources

PYSPARK_DRIVER_PYTHON="jupyter" \
PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip 0.0.0.0" \
pyspark \
--packages "org.xerial:sqlite-jdbc:3.16.1,com.github.fommil.netlib:all:1.1.2" \
--driver-memory 4g \
--executor-memory 20g \
--master spark://TechnoCore.local:7077

TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

This probably means the --executor-memory you specified exceeds the memory available on the workers.

You can check the Spark Master UI at http://localhost:8080/ to see how much memory each worker has in total. If the workers have different amounts of usable memory, Spark will only pick the workers whose memory is larger than --executor-memory.
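
For example, if each worker only has 16g of usable memory (a hypothetical number), lower --executor-memory so that it fits:

$ pyspark \
--executor-memory 12g \
--master spark://TechnoCore.local:7077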

ref:
https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application

SparkContext was shut down

ERROR Executor: Exception in task 1.0 in stage 6034.0 (TID 21592)
java.lang.StackOverflowError
...
ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(55,1494185401195,JobFailed(org.apache.spark.SparkException: Job 55 cancelled because SparkContext was shut down))

The executors probably ran out of memory (OOM).
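
A possible mitigation, assuming your cluster has memory to spare (your_app.py is a placeholder): give each executor more memory, or repartition the data into smaller partitions.

$ spark-submit \
--driver-memory 4g \
--executor-memory 24g \
your_app.py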

ref:
http://stackoverflow.com/questions/32822948/sparkcontext-was-shut-down-while-running-spark-on-a-large-dataset

Container exited with a non-zero exit code 56 (or some other number)

WARN org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1504241464590_0001_01_000002 on host: albedo-w-1.c.albedo-157516.internal. Exit status: 56. Diagnostics: Exception from container-launch.
Container id: container_1504241464590_0001_01_000002
Exit code: 56
Stack trace: ExitCodeException exitCode=56:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 56

The executors probably ran out of memory (OOM).
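
On YARN, a possible mitigation is to reserve more off-heap memory per container; on Spark 2.x the property is spark.yarn.executor.memoryOverhead (your_app.py is a placeholder):

$ spark-submit \
--master yarn \
--executor-memory 20g \
--conf spark.yarn.executor.memoryOverhead=4096 \
your_app.py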

ref:
http://stackoverflow.com/questions/39038460/understanding-spark-container-failure

Exception in thread "main" java.lang.StackOverflowError

Exception in thread "main" java.lang.StackOverflowError
    at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    ...

Solution:

import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().getOrCreate()
val sc = spark.sparkContext
sc.setCheckpointDir("./spark-data/checkpoint")

// calling sc.setCheckpointDir() is what enables checkpointing,
// so you don't necessarily need to set checkpointInterval explicitly
val als = new ALS()
  .setCheckpointInterval(2)

ref:
https://stackoverflow.com/questions/31484460/spark-gives-a-stackoverflowerror-when-training-using-als
https://stackoverflow.com/questions/35127720/what-is-the-difference-between-spark-checkpoint-and-persist-to-a-disk

Randomness of hash of string should be disabled via PYTHONHASHSEED

Solution:

$ cd $SPARK_HOME
$ cp conf/spark-env.sh.template conf/spark-env.sh
$ echo "export PYTHONHASHSEED=42" >> conf/spark-env.sh

ref:
https://issues.apache.org/jira/browse/SPARK-13330

It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

spark.sparkContext can only be accessed from the driver program, not from workers (for example, the lambda functions or UDFs you pass to RDD operations run on the workers).
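
A minimal PySpark sketch of the rule (the mapping data and the DataFrame df are hypothetical): broadcast plain data instead of referencing sc inside the closure.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

mapping = {"foo": "1", "bar": "2"}
bc_mapping = sc.broadcast(mapping)  # runs on the driver

# the lambda only references bc_mapping.value, never sc itself
lookup = udf(lambda key: bc_mapping.value.get(key, "0"))
# df.withColumn("mapped", lookup(df["key"]))  # df is a hypothetical DataFrame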

ref:
https://spark.apache.org/docs/latest/rdd-programming-guide.html#passing-functions-to-spark
https://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/

Spark automatically creates closures:

  • for functions that run on RDDs at workers,
  • and for any global variables that are used by those workers.

One closure is sent to each worker for every task. Closures only travel one way: from the driver to the workers.

ref:
https://gerardnico.com/wiki/spark/closure

Unable to find encoder for type stored in a Dataset

Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases. someDF.as[SomeCaseClass]

Solution:

import spark.implicits._

yourDF.as[YourCaseClass]

ref:
https://stackoverflow.com/questions/38664972/why-is-unable-to-find-encoder-for-type-stored-in-a-dataset-when-creating-a-dat

Task not serializable

Caused by: java.io.NotSerializableException: Settings
Serialization stack:
    - object not serializable (class: Settings, value: Settings@2dfe2f00)
    - field (class: Settings$$anonfun$1, name: $outer, type: class Settings)
    - object (class Settings$$anonfun$1, <function1>)
Caused by: org.apache.spark.SparkException:
    Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)

This usually happens when a closure references an object defined in the driver program. Spark automatically serializes the referenced object and ships it to the worker nodes together with the task, so if that object or its class cannot be serialized, you get this error.
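
A common workaround, sketched with a hypothetical Settings class and log path: copy the field you need into a local val on the driver, so the closure only captures a serializable value instead of the whole object.

import org.apache.spark.sql.SparkSession

class Settings { val keyword = "error" }  // hypothetical, not Serializable

val spark = SparkSession.builder().getOrCreate()
val sc = spark.sparkContext
val settings = new Settings

// Task not serializable: this closure would capture the whole Settings instance
// val bad = sc.textFile("app.log").filter(line => line.contains(settings.keyword))

// OK: only a plain String is captured by the closure
val keyword = settings.keyword
val good = sc.textFile("app.log").filter(line => line.contains(keyword))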

ref:
https://www.safaribooksonline.com/library/view/spark-the-definitive/9781491912201/ch04.html#user-defined-functions
http://www.puroguramingu.com/2016/02/26/spark-dos-donts.html
https://stackoverflow.com/questions/36176011/spark-sql-udf-task-not-serialisable
https://stackoverflow.com/questions/22592811/task-not-serializable-java-io-notserializableexception-when-calling-function-ou
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/javaionotserializableexception.html
https://mp.weixin.qq.com/s/BT6sXZlHcufAFLgTONCHsg

If you only hit this error in a Databricks notebook, note that notebooks work slightly differently from a regular Spark application; you can try using a package cell.

ref:
https://docs.databricks.com/user-guide/notebooks/package-cells.html

java.lang.IllegalStateException: Cannot find any build directories.

java.lang.IllegalStateException: Cannot find any build directories.
    at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
    at org.apache.spark.launcher.AbstractCommandBuilder.getScalaVersion(AbstractCommandBuilder.java:240)
    at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:194)
    at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:117)
    at org.apache.spark.launcher.WorkerCommandBuilder.buildCommand(WorkerCommandBuilder.scala:39)
    at org.apache.spark.launcher.WorkerCommandBuilder.buildCommand(WorkerCommandBuilder.scala:45)
    at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:63)
    at org.apache.spark.deploy.worker.CommandUtils$.buildProcessBuilder(CommandUtils.scala:51)
    at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:145)
    at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)

A likely cause is that SPARK_HOME is not set, or your launch script does not pick up that environment variable.
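
A quick check (the installation path below is just a placeholder for wherever Spark actually lives):

$ echo $SPARK_HOME
$ export SPARK_HOME="/usr/local/spark"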

Observe system metrics, status, and logs on Linux

Linux commands that DevOps engineers (or sysadmins) should know.

ref:
https://peteris.rocks/blog/htop/
http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
http://techblog.netflix.com/2015/08/netflix-at-velocity-2015-linux.html

Overview

$ top

$ sudo apt-get install htop
$ htop

# print stats every 1 second
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 1580104 171620 4287208    0    0     0    11    2    2  9  0 90  0  0
 0  0      0 1579832 171620 4287340    0    0     0     0 2871 2414 13  2 85  0  0
 0  0      0 1578688 171620 4287344    0    0     0    40 2311 1700 18  1 82  0  0
 1  0      0 1578640 171620 4287348    0    0     0    48 1302 1020  5  0 95  0  0
...

Check CPU

$ uptime

Load average: 0.03 0.11 0.19
Load average: the average load over the last 1, 5, and 15 minutes
On a single-core machine, a load average of 1 means 100% utilization
On a multi-core machine, the load average is summed across all CPUs, so it can be greater than 1
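
To interpret the numbers, compare the load average with the number of CPU cores:

$ nproc
$ grep -c ^processor /proc/cpuinfo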

$ sudo apt-get install sysstat

# per-CPU usage
$ mpstat -P ALL 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)
07:38:49 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
07:38:50 PM  all  98.47   0.00   0.75    0.00   0.00   0.00    0.00    0.00    0.00   0.78
07:38:50 PM    0  96.04   0.00   2.97    0.00   0.00   0.00    0.00    0.00    0.00   0.99
07:38:50 PM    1  97.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   2.00
07:38:50 PM    2  98.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   1.00
...

# per-process CPU usage
$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015    _x86_64_    (32 CPU)
07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:03 PM     0         9    0.00    0.94    0.00    0.94     1  rcuos/0
07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave
07:41:03 PM     0      4354    0.94    0.94    0.00    1.89     8  java
07:41:03 PM     0      6521 1596.23    1.89    0.00 1598.11    27  java
...

Check Memory

$ free -m
             total       used       free     shared    buffers     cached
Mem:          7983       6443       1540          0        167       4192
-/+ buffers/cache:       2083       5900
Swap:            0          0          0

Check Disk

$ iostat -xz 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21
Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
xvdc        0.01     0.00    1.02    8.86   127.79   595.94   146.50     0.00    0.45    1.82    0.30   0.27   0.26

Check Disk Usage

# show whole disk
$ df -h

# show every folder under the directory
$ du -h /data

# show the top directory only
$ du -hs /var/lib/influxdb/data
77.4G    /var/lib/influxdb/data

# show the 10 largest files and directories
$ du -hsx * | sort -rh | head -10

ref:
https://www.codecoffee.com/tipsforlinux/articles/22.html
https://www.cyberciti.biz/faq/how-do-i-find-the-largest-filesdirectories-on-a-linuxunixbsd-filesystem/

Check IO

$ sudo apt-get install dstat iotop

# show which processes are performing I/O
$ dstat --top-io --top-bio

# use the --only option to see only processes or threads actually doing I/O
$ sudo iotop --only

ref:
https://www.cyberciti.biz/hardware/linux-iotop-simple-top-like-io-monitor/

Check Whether a Process Is CPU Bound or IO Bound

$ iostat -c | head -3 ; iostat -c 1 20

ref:
https://serverfault.com/questions/72209/cpu-or-network-i-o-bound
https://askubuntu.com/questions/1540/how-can-i-find-out-if-a-process-is-cpu-memory-or-disk-bound

iotop does not work inside a container.
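
As a possible alternative inside a container, per-process disk I/O can be sampled with pidstat from the sysstat package:

$ pidstat -d 1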

Check Processes

$ ps aux
$ pstree -a

# attach to a process to find out system calls the process calls
# -t -- absolute timestamp
# -T -- print time spent in each syscall
# -s strsize -- limit length of print strings to STRSIZE chars (default 32)
# -f -- follow forks
# -e -- filtering expression: `option=trace,abbrev,verbose,raw,signal,read,write,fault`
# -u username -- run command as username handling setuid and/or setgid
$ strace -t -T -f -s 2048 -p THE_PID

# find out which files that nginx accesses
# you could try to find something related to the error message first:
# write(1, "Ign http://192.168.212.136 trusty Release\n", 62) = 62
# writev(12, [{"HTTP/1.1 500 Internal Server Error"..., 256}, {...}, {...}, {...}, 4]) = 276
$ strace -f -e trace=file service nginx start

# show the command and arguments that the process with PID 3001 was started with
$ tr '\0' '\n' < /proc/3001/cmdline

# only on macOS
$ top -c a -p 1537

ref:
https://mp.weixin.qq.com/s/Sf79W5dqUFx7rUYRrtx88Q
https://blogs.oracle.com/linux/strace-the-sysadmins-microscope-v2
https://zwischenzugs.com/2011/08/29/my-favourite-secret-weapon-strace/

Check Kernel Logs

# show the last 15 system messages
$ dmesg | tail -n 15

# show logs about killed processes (e.g. by the OOM killer)
$ dmesg | grep -E -i -B50 'killed process'

ref:
https://stackoverflow.com/questions/726690/what-killed-my-process-and-why

Check Network

$ sar -n TCP,ETCP 1

Check DNS

Resolve a domain name using dig:

$ apt-get install curl dnsutils iputils-ping
# or
$ apk add --update bind-tools

$ dig +short october-api.default.svc.cluster.local
10.32.1.79

$ dig +short redis-broker.default.svc.cluster.local
10.60.32.20
10.60.33.15

$ dig +short redis-broker-0.redis-broker.default.svc.cluster.local
10.60.32.20

ref:
https://blog.longwin.com.tw/2013/03/dig-dns-query-debug-2013/

Resolve a domain name using nslookup:

$ apt-get install dnsutils

$ nslookup redis-broker.default.svc.cluster.local
Server:    10.3.240.10
Address 1: 10.3.240.10 kube-dns.kube-system.svc.cluster.local

Name:      redis-broker.default.svc.cluster.local
Address 1: 10.0.69.46 redis-broker-0.redis-broker.default.svc.cluster.local

Find specific types of DNS records:

$ nslookup -q=TXT codetengu.com
Server:     1.1.1.1
Address:    1.1.1.1#53

Non-authoritative answer:
codetengu.com    text = "zoho-verification=xxx.zmverify.zoho.com"

Authoritative answers can be found from:

nslookup may return multiple A records for a domain, which is commonly known as round-robin DNS.

ref:
https://serverfault.com/questions/590277/why-does-nslookup-return-two-or-more-ip-address-for-yahoo-com-or-microsoft-com

Check Nginx Logs

# count requests by status code
$ cat access.log | cut -d '"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort -rn

# show which URLs get the most 404s
$ awk '($9 ~ /404/)' access.log | awk '{print $7}' | sort | uniq -c | sort -rn

# show logs from 16:00 to 18:59 on 2016/10/01
$ grep "01/Oct/2016:1[6-8]" access.log

# show logs from 08:00 to 12:59 on 2016/10/01
$ egrep "01/Oct/2016:(0[8-9]|1[0-2])" access.log

ref:
http://stackoverflow.com/questions/7575267/extract-data-from-log-file-in-specified-range-of-time
http://superuser.com/questions/848971/unix-command-to-grep-a-time-range

If the status code is 502 Bad Gateway, it usually means the upstream server behind the load balancer / nginx is down or unreachable.
If it is a Kubernetes Service, the Service's spec.selector might not match any pods.
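
For a Kubernetes Service, you can check whether the selector actually matches any pods (my-service is a placeholder name):

$ kubectl describe service my-service
$ kubectl get endpoints my-service

If the endpoints list is empty, the selector does not match any running pods.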

Linux commands cookbook

50 Most Frequently Used UNIX / Linux Commands
http://www.thegeekstuff.com/2010/11/50-linux-commands/

Write a Simple Script in Shell

Write a for loop.

$ for pod_name in $(kubectl get pods -l app=swag-worker-test -o jsonpath={..metadata.name}); do kubectl delete pod $pod_name; done
pod "swag-worker-test-67fffcdd5-5hgf3" deleted
pod "swag-worker-test-67fffcdd5-h8jgg" deleted

# you could also write multiple lines
for pod_name in $(kubectl get pods -l app=swag-worker-test -o jsonpath={..metadata.name}); do
    kubectl delete pod $pod_name
done

Write a while true.

# using trap and wait will make your container react immediately to a stop request
$ bash -c "trap : TERM INT; sleep infinity & wait"
# or
$ bash -c "while true; do echo 'I am alive!'; sleep 3600; done"

or

#!/bin/bash
while true; do echo 'I am alive!'; sleep 3600; done

ref:
https://stackoverflow.com/questions/31870222/how-can-i-keep-container-running-on-kubernetes

Set Environment Variables from a File

$ export $(cat .env | grep -v ^# | xargs)

ref:
https://stackoverflow.com/questions/19331497/set-environment-variables-from-file

Switch to Another User

# with the trailing "-", you get an environment as if the target user had just logged in
$ sudo su - ubuntu

Clear the Content of a File

$ echo -n > /var/log/nginx/error.log

ref:
https://unix.stackexchange.com/questions/88808/empty-the-contents-of-a-file

Pipeline stdout with xargs

$ find . -type f -name "*.yaml" | xargs echo
./configmap.yaml ./pvc.yaml ./service.yaml ./statefulset.yaml

$ find . -type f -name "*.yaml" | xargs -n 1 echo
./configmap.yaml
./pvc.yaml
./service.yaml
./statefulset.yaml

$ find . -type f -name "*.yaml" | xargs -n 2 echo
./configmap.yaml ./pvc.yaml
./service.yaml ./statefulset.yaml

$ redis-cli KEYS "*-*-*-*.reply.celery.pidbox" | xargs -n 100 redis-cli DEL

ref:
https://shapeshed.com/unix-xargs/

Set a Timeout for any Command

# note: with GNU coreutils, the syntax is `timeout 15 <command>` (no -t flag)
$ timeout -t 15 celery inspect ping -A app:celery -d celery@$(hostname)

Run a One-time Command at a Specific Time

at executes commands at a specified time. You may need to install the "at" package manually.

# install
$ sudo apt-get install at

# start
$ sudo atd

# list jobs
$ atq

$ at 00:05
at> echo "123" > /tmp/test.txt

$ at 00:00 18.1.2017
at> DPS_ENV=production /home/ubuntu/.virtualenvs/dps/bin/python /home/ubuntu/dps/manage.py send_emails > /tmp/send_emails.log

Press Control + D to exit the at shell.

ref:
https://www.lifewire.com/linux-command-at-4091646
https://tecadmin.net/one-time-task-scheduling-using-at-commad-in-linux/

Pass Arguments to bash when Executing a script Fetched by curl

# bash -s reads the script from stdin; everything after -- is passed to the script as positional arguments
$ curl -L http://bit.ly/open-the-pod-bay-doors | bash -s -- --tags docker

ref:
https://stackoverflow.com/a/25563308/885524
https://github.com/vinta/HAL-9000

Change a File's Modify Time

$ touch -m -d '1 Jan 2006 12:34' tmp
$ touch -m -d '1 Jan 2006 12:34' tmp/old_file.txt

ref:
https://www.unixtutorial.org/2008/11/how-to-update-atime-and-mtime-for-a-file-in-unix/

Delete Old Files under a Directory

$ find /data/storage/tmp/* -mtime +2 | xargs rm -Rf
$ find /data/storage/tmp/* -mtime +2 -exec rm {} \;

ref:
http://stackoverflow.com/questions/14731133/how-to-delete-all-files-older-than-3-days-when-argument-list-too-long

Append String to a File

# append
$ echo "the last line" >> README.md

# replace
$ echo "replace all" > README.md

Rename Sub-folders

# ${f%???????????} strips the trailing "migrations/" (11 characters) from each path
$ for f in */migrations/; do mv -v "$f" "${f%???????????}south_migrations"; done

ref:
http://unix.stackexchange.com/questions/220176/rename-specific-level-of-sub-folders

List History Commands

$ export HISTTIMEFORMAT="%Y%m%d %T  "
$ history

Make a File's Permissions the Same as Another File's

$ chmod --reference=file1 file2

Find Computer's Public IP

$ wget -qO- http://ipecho.net/plain ; echo

ref:
http://askubuntu.com/questions/95910/command-for-determining-my-public-ip

Compress and Uncompress Files

$ tar czf media-20151010.tar.gz media/
$ s3cmd put media-20151010.tar.gz s3://goeasytaxi/

# decompress
$ tar -xzf media.tar.gz

$ sudo apt-get install zip unzip
$ zip -j -r deps.zip spark_app/src/deps/
$ zip -r hourmasters.zip hourmasters/
$ scp -r -i ~/hourmasters.pem [email protected]:/home/ubuntu/hourmasters.zip ~/Desktop/

# decompress
$ unzip stork.1.4.zip
$ gzip -d uwsgi.log.*.gz

$ gzip dps.201701171200.sql
$ gunzip dps.201701171200.sql.gz

Count File Lines

$ wc -l filename.txt

$ wc -l *.py

Find Files by Name or Content

$ find / -name virtualenvwrapper.sh

# search all files under the current directory for a string (subdirectories are searched automatically)
$ find . | xargs grep 'string'

$ find . -iname '*something*'

$ find *.html | xargs grep 'share_server.html'

# find files containing the string print() in the current directory and its subdirectories
$ grep -rnw "." -e "print()"

$ grep -rnw "." -e "print()" --include=\*.py

ref:
https://stackoverflow.com/questions/16956810/how-do-i-find-all-files-containing-specific-text-on-linux

Find Directories by Name

$ find . -type d -name "*native*" -print

ref:
https://askubuntu.com/questions/153144/how-can-i-recursively-search-for-directory-names-with-a-particular-string-where

List Files by Date

$ ls -lrt

List Files Opened by a Process

$ lsof | grep uwsgi

$ lsof -i | grep LISTEN
$ lsof -i -n -P | grep LISTEN

Extract Content from a File

$ cat uwsgi.log | grep error

Display Contents of All Files in the Current Directory

$ grep . *
$ grep . *.html

List Used Ports

$ netstat -a

# TCP
$ netstat -ntlp | grep uwsgi

# UDP
$ netstat -nulp

Ping a Port

$ curl -I "10.148.70.84:9200"
$ curl -I "192.168.100.10:80"

$ sudo apt-get install nmap
$ nmap -p 4505 54.250.5.176
$ nmap -p 8000 10.10.100.70
$ nmap -p 5672 10.10.100.82

$ telnet 54.250.5.176 4505

ref:
http://stackoverflow.com/questions/12792856/what-ports-does-rabbitmq-use

Show Network Traffic and Bandwidth

$ tcpdump -i eth0

$ sudo apt-get install tcptrack
$ tcptrack -i eth0

ref:
http://askubuntu.com/questions/257263/how-to-display-network-traffic-in-terminal

List Running Processes

# show all processes
$ pstree -a

# also show pid
$ pstree -ap

# list the top 10 processes by memory usage
$ ps aux | sort -nk +4 | tail

# list processes matching a keyword
$ ps aux | grep 'worker process'
$ ps aux | grep uwsgi

# show processes as a tree
$ ps auxf

# find processes by command name and show their parent processes as a tree
$ ps f -opid,cmd -C python

Kill Processes

# list all processes currently in memory
$ ps aux

# filter by a keyword
$ ps aux | grep "mongo"

# kill it by PID
$ kill PID

# kill all processes matching a name
$ sudo pkill -f runserver

Store User's Input as a Variable

$ read YOUR_VARIABLE_NAME

$ read name
# you type: vinta

$ echo $name
vinta

ref:
https://canred.blogspot.tw/2013/03/read.html

Show Terminal Size

$ stty size
$ echo $LINES && echo $COLUMNS
59 273 

ref:
https://stackoverflow.com/questions/263890/how-do-i-find-the-width-height-of-a-terminal-window