Linux commands that every DevOps engineer should know

Linux commands that every DevOps engineer should know

Linux commands that DevOps engineers (or SysAdmin) should know.

ref:
https://peteris.rocks/blog/htop/
http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
http://techblog.netflix.com/2015/08/netflix-at-velocity-2015-linux.html

總覽

$ top

$ sudo apt-get install htop
$ htop

# 每 1 秒輸出一次資訊
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 1580104 171620 4287208    0    0     0    11    2    2  9  0 90  0  0
 0  0      0 1579832 171620 4287340    0    0     0     0 2871 2414 13  2 85  0  0
 0  0      0 1578688 171620 4287344    0    0     0    40 2311 1700 18  1 82  0  0
 1  0      0 1578640 171620 4287348    0    0     0    48 1302 1020  5  0 95  0  0
...

查 CPU

$ uptime

Load average: 0.03 0.11 0.19
Load average: 一分鐘 五分鐘 十五分鐘內的平均負載
單核心,如果 Load average 是 1 表示負載 100%
多核心的話,因為 Load average 是所有 CPU 數加起來,所以數值可能會大於 1

$ sudo apt-get install sysstat

# 每個 CPU 的使用率
$ mpstat -P ALL 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)
07:38:49 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
07:38:50 PM  all  98.47   0.00   0.75    0.00   0.00   0.00    0.00    0.00    0.00   0.78
07:38:50 PM    0  96.04   0.00   2.97    0.00   0.00   0.00    0.00    0.00    0.00   0.99
07:38:50 PM    1  97.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   2.00
07:38:50 PM    2  98.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   1.00
...

# 每個 process 的 CPU 使用率
$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015    _x86_64_    (32 CPU)
07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:03 PM     0         9    0.00    0.94    0.00    0.94     1  rcuos/0
07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave
07:41:03 PM     0      4354    0.94    0.94    0.00    1.89     8  java
07:41:03 PM     0      6521 1596.23    1.89    0.00 1598.11    27  java
...

查 Memory

$ free –m
             total       used       free     shared    buffers     cached
Mem:          7983       6443       1540          0        167       4192
-/+ buffers/cache:       2083       5900
Swap:            0          0          0

查 Disk

$ iostat -xz 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21
Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
xvdc        0.01     0.00    1.02    8.86   127.79   595.94   146.50     0.00    0.45    1.82    0.30   0.27   0.26

查 IO

$ sudo apt-get install dstat iotop

# 可以顯示哪些 process 在進行 io 操作
$ dstat --top-io --top-bio

# with –only option to see only processes or threads actually doing I/O
$ sudo iotop --only

ref:
https://www.cyberciti.biz/hardware/linux-iotop-simple-top-like-io-monitor/

查 Network

$ sar -n TCP,ETCP 1

查 Process

$ ps aux
$ pstree -a

# attach 到某個 process,查看 system call
# -t -- absolute timestamp
# -T -- print time spent in each syscall
# -s strsize -- limit length of print strings to STRSIZE chars (default 32)
# -f -- follow forks
# -u username -- run command as username handling setuid and/or setgid
$ strace -t -T -f -p 1234

# 可以看到啟動 nginx 的過程中存取了哪些檔案
$ strace -f -e trace=file service nginx start

# 顯示 PID 3001 的 process 是用什麼指令和參數啟動的
$ tr '\0' '\n' < /proc/3001/cmdline

ref:
http://man7.org/linux/man-pages/man1/strace.1.html
https://blogs.oracle.com/ksplice/strace-the-sysadmins-microscope

查 Logs

# 顯示最近的 15 筆 system messages
$ dmesg | tail -fn 15

查 Nginx

# 顯示各個 status code 的數量
$ cat access.log | cut -d '"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort -rn

# 顯示哪些 URL 的 404 數量最多
$ awk '($9 ~ /404/)' access.log | awk '{print $7}' | sort | uniq -c | sort -rn

# 顯示 2016/10/01 的 16:00 ~ 18:00 的 log
$ grep "01/Oct/2016:1[6-8]" access.log

# 顯示 2016/10/01 的 09:00 ~ 12:00 的 log
$ egrep "01/Oct/2016:(0[8-9]|1[0-2])" access.log

ref:
http://stackoverflow.com/questions/7575267/extract-data-from-log-file-in-specified-range-of-time
http://superuser.com/questions/848971/unix-command-to-grep-a-time-range

Linux commands cookbook

Linux commands cookbook

switch shell to another user

# the latter with "-" gets an environment as if another user just logged in
$ sudo su - ubuntu

change file's modify time

$ touch -m -d '1 Jan 2006 12:34' tmp
$ touch -m -d '1 Jan 2006 12:34' tmp/old_file.txt

ref:
https://www.unixtutorial.org/2008/11/how-to-update-atime-and-mtime-for-a-file-in-unix/

delete old files under a directory

$ find /data/storage/tmp/* -mtime +2 | xargs rm -Rf
$ find /data/storage/tmp/* -mtime +2 -exec rm {} \;

ref:
http://stackoverflow.com/questions/14731133/how-to-delete-all-files-older-than-3-days-when-argument-list-too-long

append string to file in command line

# append
$ echo "the last line" >> README.md

# replace
$ echo "replace all" > README.md

rename sub-folders

$ for f in */migrations/; do mv -v "$f" "${f%???????????}south_migrations"; done

ref:
http://unix.stackexchange.com/questions/220176/rename-specific-level-of-sub-folders

list history commands

$ export HISTTIMEFORMAT="%Y%m%d %T  "
$ history

find public IP

$ wget -qO- http://ipecho.net/plain ; echo

ref:
http://askubuntu.com/questions/95910/command-for-determining-my-public-ip

count file lines

$ wc -l filename.txt

$ wc -l *.py

find files by name or content

$ find / -name virtualenvwrapper.sh

# 在現在的資料夾裡的全部檔案中搜尋字串,會自動搜尋子目錄
$ find . | xargs grep 'string'

$ find . -iname '*something*'

$ find *.html | xargs grep 'share_server.html'

# 搜尋當前目錄及子目錄下的含有 print() 字串的檔案
$ grep -rnw "." -e "print()"

$ grep -rnw "." -e "print()" --include=\*.py

ref:
https://stackoverflow.com/questions/16956810/how-do-i-find-all-files-containing-specific-text-on-linux

list files by date

$ ls -lrt

extract info from a file

$ cat uwsgi.log | grep error

display contents of all files in the current directory

$ grep . *
$ grep . *.html

list used ports

# list open files for a process
$ lsof | grep uwsgi

$ lsof -i | grep LISTEN
$ lsof -i -n -P | grep LISTEN

# TCP
$ sudo netstat -ntlp | grep uwsgi

# UCP
$ sudo netstat -nulp

$ sudo netstat -nxlp

ping port

$ curl -I "10.148.70.84:9200"
$ curl -I "192.168.100.10:80"

$ sudo apt-get install nmap
$ nmap -p 4505 54.250.5.176
$ nmap -p 8000 10.10.100.70
$ nmap -p 5672 10.10.100.82

$ telnet 54.250.5.176 4505

ref:
http://stackoverflow.com/questions/12792856/what-ports-does-rabbitmq-use

show network traffic and bandwidth

$ tcpdump -i eth0

$ sudo apt-get install tcptrack
$ tcptrack -i eth0

ref:
http://askubuntu.com/questions/257263/how-to-display-network-traffic-in-terminal

list running processes

# show all processes
$ pstree -a

# also show pid
$ pstree -ap

# 列出前 10 個最佔記憶體的 processes
$ ps aux | sort -nk +4 | tail

# 列出 mysql 相關的 processes
$ ps aux | grep 'worker process'
$ ps aux | grep uwsgi

# 樹狀顯示
$ ps auxf

# 搜尋 process 並以樹狀結果顯示 parent process
$ ps f -opid,cmd -C python

kill processes

# 列出目前所有的正在記憶體當中的程序
$ ps aux

# 匹配字串
$ ps aux | grep "mongo"

# 幹掉它
$ kill PID

# kill all processes matching a name
$ sudo killall -9 httpd
$ sudo killall salt
$ sudo pkill -f runserver
MySQL error codes

MySQL error codes

Show all MySQL error codes using the perror command.

$ for i in {1..190..1}; do perror "$i"; done
OS error code   1:  Operation not permitted
OS error code   2:  No such file or directory
OS error code   3:  No such process
OS error code   4:  Interrupted system call
OS error code   5:  Input/output error
OS error code   6:  No such device or address
OS error code   7:  Argument list too long
OS error code   8:  Exec format error
OS error code   9:  Bad file descriptor
OS error code  10:  No child processes
OS error code  11:  Resource temporarily unavailable
OS error code  12:  Cannot allocate memory
OS error code  13:  Permission denied
OS error code  14:  Bad address
OS error code  15:  Block device required
OS error code  16:  Device or resource busy
OS error code  17:  File exists
OS error code  18:  Invalid cross-device link
OS error code  19:  No such device
OS error code  20:  Not a directory
OS error code  21:  Is a directory
OS error code  22:  Invalid argument
OS error code  23:  Too many open files in system
OS error code  24:  Too many open files
OS error code  25:  Inappropriate ioctl for device
OS error code  26:  Text file busy
OS error code  27:  File too large
OS error code  28:  No space left on device
OS error code  30:  Read-only file system
OS error code  31:  Too many links
OS error code  32:  Broken pipe
OS error code  33:  Numerical argument out of domain
OS error code  34:  Numerical result out of range
OS error code  35:  Resource deadlock avoided
OS error code  36:  File name too long
OS error code  37:  No locks available
OS error code  38:  Function not implemented
OS error code  39:  Directory not empty
OS error code  40:  Too many levels of symbolic links
OS error code  42:  No message of desired type
OS error code  43:  Identifier removed
OS error code  44:  Channel number out of range
OS error code  45:  Level 2 not synchronized
OS error code  46:  Level 3 halted
OS error code  47:  Level 3 reset
OS error code  48:  Link number out of range
OS error code  49:  Protocol driver not attached
OS error code  50:  No CSI structure available
OS error code  51:  Level 2 halted
OS error code  52:  Invalid exchange
OS error code  53:  Invalid request descriptor
OS error code  54:  Exchange full
OS error code  55:  No anode
OS error code  56:  Invalid request code
OS error code  57:  Invalid slot
OS error code  59:  Bad font file format
OS error code  60:  Device not a stream
OS error code  61:  No data available
OS error code  62:  Timer expired
OS error code  63:  Out of streams resources
OS error code  64:  Machine is not on the network
OS error code  65:  Package not installed
OS error code  66:  Object is remote
OS error code  67:  Link has been severed
OS error code  68:  Advertise error
OS error code  69:  Srmount error
OS error code  70:  Communication error on send
OS error code  71:  Protocol error
OS error code  72:  Multihop attempted
OS error code  73:  RFS specific error
OS error code  74:  Bad message
OS error code  75:  Value too large for defined data type
OS error code  76:  Name not unique on network
OS error code  77:  File descriptor in bad state
OS error code  78:  Remote address changed
OS error code  79:  Can not access a needed shared library
OS error code  80:  Accessing a corrupted shared library
OS error code  81:  .lib section in a.out corrupted
OS error code  82:  Attempting to link in too many shared libraries
OS error code  83:  Cannot exec a shared library directly
OS error code  84:  Invalid or incomplete multibyte or wide character
OS error code  85:  Interrupted system call should be restarted
OS error code  86:  Streams pipe error
OS error code  87:  Too many users
OS error code  88:  Socket operation on non-socket
OS error code  89:  Destination address required
OS error code  90:  Message too long
OS error code  91:  Protocol wrong type for socket
OS error code  92:  Protocol not available
OS error code  93:  Protocol not supported
OS error code  94:  Socket type not supported
OS error code  95:  Operation not supported
OS error code  96:  Protocol family not supported
OS error code  97:  Address family not supported by protocol
OS error code  98:  Address already in use
OS error code  99:  Cannot assign requested address
OS error code 100:  Network is down
OS error code 101:  Network is unreachable
OS error code 102:  Network dropped connection on reset
OS error code 103:  Software caused connection abort
OS error code 104:  Connection reset by peer
OS error code 105:  No buffer space available
OS error code 106:  Transport endpoint is already connected
OS error code 107:  Transport endpoint is not connected
OS error code 108:  Cannot send after transport endpoint shutdown
OS error code 109:  Too many references: cannot splice
OS error code 110:  Connection timed out
OS error code 111:  Connection refused
OS error code 112:  Host is down
OS error code 113:  No route to host
OS error code 114:  Operation already in progress
OS error code 115:  Operation now in progress
OS error code 116:  Stale NFS file handle
OS error code 117:  Structure needs cleaning
OS error code 118:  Not a XENIX named type file
OS error code 119:  No XENIX semaphores available
OS error code 120:  Is a named type file
OS error code 121:  Remote I/O error
OS error code 122:  Disk quota exceeded
OS error code 123:  No medium found
OS error code 124:  Wrong medium type
OS error code 125:  Operation canceled
MySQL error code 126: Index file is crashed
MySQL error code 127: Record-file is crashed
MySQL error code 128: Out of memory
MySQL error code 130: Incorrect file format
MySQL error code 131: Command not supported by database
MySQL error code 132: Old database file
MySQL error code 133: No record read before update
MySQL error code 134: Record was already deleted (or record file crashed)
MySQL error code 135: No more room in record file
MySQL error code 136: No more room in index file
MySQL error code 137: No more records (read after end of file)
MySQL error code 138: Unsupported extension used for table
MySQL error code 139: Too big row
MySQL error code 140: Wrong create options
MySQL error code 141: Duplicate unique key or constraint on write or update
MySQL error code 142: Unknown character set used
MySQL error code 143: Conflicting table definitions in sub-tables of MERGE table
MySQL error code 144: Table is crashed and last repair failed
MySQL error code 145: Table was marked as crashed and should be repaired
MySQL error code 146: Lock timed out; Retry transaction
MySQL error code 147: Lock table is full;  Restart program with a larger locktable
MySQL error code 148: Updates are not allowed under a read only transactions
MySQL error code 149: Lock deadlock; Retry transaction
MySQL error code 150: Foreign key constraint is incorrectly formed
MySQL error code 151: Cannot add a child row
MySQL error code 152: Cannot delete a parent row
MySQL error code 153: No savepoint with that name
MySQL error code 154: Non unique key block size
MySQL error code 155: The table does not exist in engine
MySQL error code 156: The table already existed in storage engine
MySQL error code 157: Could not connect to storage engine
MySQL error code 158: Unexpected null pointer found when using spatial index
MySQL error code 159: The table changed in storage engine
MySQL error code 160: There's no partition in table for the given value
MySQL error code 161: Row-based binary logging of row failed
MySQL error code 162: Index needed in foreign key constraint
MySQL error code 163: Upholding foreign key constraints would lead to a duplicate key error in some other table
MySQL error code 164: Table needs to be upgraded before it can be used
MySQL error code 165: Table is read only
MySQL error code 166: Failed to get next auto increment value
MySQL error code 167: Failed to set row auto increment value
MySQL error code 168: Unknown (generic) error from engine
MySQL error code 169: Record was not update. Original values was same as new values
MySQL error code 170: It is not possible to log this statement
MySQL error code 171: The event was corrupt, leading to illegal data being read
MySQL error code 172: The table is of a new format not supported by this version
MySQL error code 173: The event could not be processed. No other handler error happened
MySQL error code 174: Got a fatal error during initialization of handler
MySQL error code 175: File too short; Expected more data in file
MySQL error code 176: Read page with wrong checksum
MySQL error code 177: Too many active concurrent transactions
MySQL error code 178: Record not matching the given partition set
MySQL error code 179: Index column length exceeds limit
MySQL error code 180: Index corrupted
MySQL error code 181: Undo record too big
MySQL error code 182: Invalid InnoDB FTS Doc ID
MySQL error code 183: Table is being used in foreign key check
MySQL error code 184: Tablespace already exists
MySQL error code 185: Too many columns
MySQL error code 186: Row in wrong partition
MySQL error code 187: Row is not visible by the current transaction
MySQL error code 188: Operation was interrupted by end user (probably kill command?)
MySQL error code 189: Disk full
MySQL error code 190: Incompatible key or row definition between the MariaDB .frm file and the information in the storage engine. You have to dump and restore the table to fix this