{"id":412,"date":"2017-08-05T22:21:26","date_gmt":"2017-08-05T14:21:26","guid":{"rendered":"http:\/\/vinta.ws\/code\/?p=412"},"modified":"2026-02-18T01:20:35","modified_gmt":"2026-02-17T17:20:35","slug":"linux-commands-that-every-devops-engineer-should-know","status":"publish","type":"post","link":"https:\/\/vinta.ws\/code\/linux-commands-that-every-devops-engineer-should-know.html","title":{"rendered":"Observe system metrics, status, and logs on Linux"},"content":{"rendered":"<p>Linux commands that every DevOps engineer or SysAdmin should know.<\/p>\n<p>ref:<br \/>\n<a href=\"https:\/\/peteris.rocks\/blog\/htop\/\">https:\/\/peteris.rocks\/blog\/htop\/<\/a><br \/>\n<a href=\"http:\/\/techblog.netflix.com\/2015\/11\/linux-performance-analysis-in-60s.html\">http:\/\/techblog.netflix.com\/2015\/11\/linux-performance-analysis-in-60s.html<\/a><br \/>\n<a href=\"http:\/\/techblog.netflix.com\/2015\/08\/netflix-at-velocity-2015-linux.html\">http:\/\/techblog.netflix.com\/2015\/08\/netflix-at-velocity-2015-linux.html<\/a><\/p>\n<h2>Overview<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ top\n\n$ sudo apt-get install htop\n$ htop\n\n# print a report every 1 second\n$ vmstat 1\nprocs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----\n r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st\n 1  0      0 1580104 171620 4287208    0    0     0    11    2    2  9  0 90  0  0\n 0  0      0 1579832 171620 4287340    0    0     0     0 2871 2414 13  2 85  0  0\n 0  0      0 1578688 171620 4287344    0    0     0    40 2311 1700 18  1 82  0  0\n 1  0      0 1578640 171620 4287348    0    0     0    48 1302 1020  5  0 95  0  0\n...<\/code><\/pre>\n<h2>Check CPU<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ uptime<\/code><\/pre>\n<p>Load average: 0.03 0.11 0.19<br \/>\nLoad average: the average load over the last 1, 5, and
15 minutes<br \/>\nOn a single-core machine, a load average of 1 means 100% load<br \/>\nOn a multi-core machine, the load average is summed across all CPUs, so the value can exceed 1<\/p>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ sudo apt-get install sysstat\n\n# CPU utilization of each CPU\n$ mpstat -P ALL 1\nLinux 3.13.0-49-generic (titanclusters-xxxxx)  07\/14\/2015  _x86_64_ (32 CPU)\n07:38:49 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle\n07:38:50 PM  all  98.47   0.00   0.75    0.00   0.00   0.00    0.00    0.00    0.00   0.78\n07:38:50 PM    0  96.04   0.00   2.97    0.00   0.00   0.00    0.00    0.00    0.00   0.99\n07:38:50 PM    1  97.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   2.00\n07:38:50 PM    2  98.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   1.00\n...\n\n# CPU utilization of each process\n$ pidstat 1\nLinux 3.13.0-49-generic (titanclusters-xxxxx)  07\/14\/2015    _x86_64_    (32 CPU)\n07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command\n07:41:03 PM     0         9    0.00    0.94    0.00    0.94     1  rcuos\/0\n07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave\n07:41:03 PM     0      4354    0.94    0.94    0.00    1.89     8  java\n07:41:03 PM     0      6521 1596.23    1.89    0.00 1598.11    27  java\n...<\/code><\/pre>\n<h2>Check Memory<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ free -m\n             total       used       free     shared    buffers     cached\nMem:          7983       6443       1540          0        167       4192\n-\/+ buffers\/cache:       2083       5900\nSwap:            0          0          0<\/code><\/pre>\n<h2>Check Disk<\/h2>\n<pre
class=\"line-numbers\"><code class=\"language-console\">$ iostat -xz 1\nLinux 3.13.0-49-generic (titanclusters-xxxxx)  07\/14\/2015  _x86_64_ (32 CPU)\navg-cpu:  %user   %nice %system %iowait  %steal   %idle\n          73.96    0.00    3.73    0.03    0.06   22.21\nDevice:   rrqm\/s   wrqm\/s     r\/s     w\/s    rkB\/s    wkB\/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util\nxvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09\nxvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25\nxvdc        0.01     0.00    1.02    8.86   127.79   595.94   146.50     0.00    0.45    1.82    0.30   0.27   0.26<\/code><\/pre>\n<h2>Check Disk Usage<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\"># show usage of every mounted filesystem\n$ df -h\n\n# show the size of every directory under the given directory\n$ du -h \/data\n\n# show only the total size of the given directory\n$ du -hs \/var\/lib\/influxdb\/data\n77.4G    \/var\/lib\/influxdb\/data\n\n# show the 10 largest files and directories in the current directory\n$ du -hsx * | sort -rh | head -10<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/www.codecoffee.com\/tipsforlinux\/articles\/22.html\">https:\/\/www.codecoffee.com\/tipsforlinux\/articles\/22.html<\/a><br \/>\n<a href=\"https:\/\/www.cyberciti.biz\/faq\/how-do-i-find-the-largest-filesdirectories-on-a-linuxunixbsd-filesystem\/\">https:\/\/www.cyberciti.biz\/faq\/how-do-i-find-the-largest-filesdirectories-on-a-linuxunixbsd-filesystem\/<\/a><\/p>\n<h2>Check IO<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ sudo apt-get install dstat iotop\n\n# show which processes are doing I\/O\n$ dstat --top-io --top-bio\n\n# use the --only option to show only processes or threads actually doing I\/O\n$ sudo iotop --only<\/code><\/pre>\n<p>ref:<br \/>\n<a
href=\"https:\/\/www.cyberciti.biz\/hardware\/linux-iotop-simple-top-like-io-monitor\/\">https:\/\/www.cyberciti.biz\/hardware\/linux-iotop-simple-top-like-io-monitor\/<\/a><\/p>\n<h2>Check CPU bound or IO bound<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ iostat -c | head -3 ; iostat -c 1 20<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/serverfault.com\/questions\/72209\/cpu-or-network-i-o-bound\">https:\/\/serverfault.com\/questions\/72209\/cpu-or-network-i-o-bound<\/a><br \/>\n<a href=\"https:\/\/askubuntu.com\/questions\/1540\/how-can-i-find-out-if-a-process-is-cpu-memory-or-disk-bound\">https:\/\/askubuntu.com\/questions\/1540\/how-can-i-find-out-if-a-process-is-cpu-memory-or-disk-bound<\/a><\/p>\n<p><code>iotop<\/code> does not work inside a container.<\/p>\n<h2>Check Process<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ ps aux\n$ pstree -a\n\n# attach to a process to find out which system calls it makes\n# -t -- absolute timestamp\n# -T -- print time spent in each syscall\n# -s strsize -- limit length of print strings to STRSIZE chars (default 32)\n# -f -- follow forks\n# -e -- filtering expression: option=trace,abbrev,verbose,raw,signal,read,write,fault\n# -u username -- run command as username handling setuid and\/or setgid\n$ strace -t -T -f -s 2048 -p THE_PID\n\n# find out which files nginx accesses\n# you could try to find something related to the error message first:\n# write(1, \"Ign http:\/\/192.168.212.136 trusty Release\\n\", 62) = 62\n# writev(12, [{\"HTTP\/1.1 500 Internal Server Error\"..., 256}, {...}, {...}, {...}, 4]) = 276\n$ strace -f -e trace=file service nginx start\n\n# show the command and arguments that started the process with PID 3001\n$ tr '\\0' '\\n' &lt; \/proc\/3001\/cmdline\n\n# only on macOS\n$ top -c a -p 1537<\/code><\/pre>\n<p>ref:<br \/>\n<a
href=\"https:\/\/mp.weixin.qq.com\/s\/Sf79W5dqUFx7rUYRrtx88Q\">https:\/\/mp.weixin.qq.com\/s\/Sf79W5dqUFx7rUYRrtx88Q<\/a><br \/>\n<a href=\"https:\/\/blogs.oracle.com\/linux\/strace-the-sysadmins-microscope-v2\">https:\/\/blogs.oracle.com\/linux\/strace-the-sysadmins-microscope-v2<\/a><br \/>\n<a href=\"https:\/\/zwischenzugs.com\/2011\/08\/29\/my-favourite-secret-weapon-strace\/\">https:\/\/zwischenzugs.com\/2011\/08\/29\/my-favourite-secret-weapon-strace\/<\/a><\/p>\n<h2>Check Kernel Logs<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\"># show the 15 most recent system messages\n$ dmesg | tail -n 15\n\n# show logs about killed processes, with the 50 lines before each match\n$ dmesg | grep -E -i -B50 'killed process'<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/stackoverflow.com\/questions\/726690\/what-killed-my-process-and-why\">https:\/\/stackoverflow.com\/questions\/726690\/what-killed-my-process-and-why<\/a><\/p>\n<h2>Check Network<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ sar -n TCP,ETCP 1<\/code><\/pre>\n<h2>Check DNS<\/h2>\n<p>Resolve a domain name using <code>dig<\/code>:<\/p>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ apt-get install curl dnsutils iputils-ping\n# or\n$ apk add --update bind-tools\n\n$ dig +short october-api.default.svc.cluster.local\n10.32.1.79\n\n$ dig +short redis-broker.default.svc.cluster.local\n10.60.32.20\n10.60.33.15\n\n$ dig +short redis-broker-0.redis-broker.default.svc.cluster.local\n10.60.32.20<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/blog.longwin.com.tw\/2013\/03\/dig-dns-query-debug-2013\/\">https:\/\/blog.longwin.com.tw\/2013\/03\/dig-dns-query-debug-2013\/<\/a><\/p>\n<p>Resolve a domain name using <code>nslookup<\/code>:<\/p>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ apt-get install dnsutils\n\n$ nslookup redis-broker.default.svc.cluster.local\nServer:    10.3.240.10\nAddress 1: 10.3.240.10
kube-dns.kube-system.svc.cluster.local\n\nName:      redis-broker.default.svc.cluster.local\nAddress 1: 10.0.69.46 redis-broker-0.redis-broker.default.svc.cluster.local<\/code><\/pre>\n<p>Find specific types of DNS records:<\/p>\n<pre class=\"line-numbers\"><code class=\"language-console\">$ nslookup -q=TXT codetengu.com\nServer:     1.1.1.1\nAddress:    1.1.1.1#53\n\nNon-authoritative answer:\ncodetengu.com    text = \"zoho-verification=xxx.zmverify.zoho.com\"\n\nAuthoritative answers can be found from:<\/code><\/pre>\n<p><code>nslookup<\/code> may return multiple A records for a domain, which is commonly known as round-robin DNS.<\/p>\n<p>ref:<br \/>\n<a href=\"https:\/\/serverfault.com\/questions\/590277\/why-does-nslookup-return-two-or-more-ip-address-for-yahoo-com-or-microsoft-com\">https:\/\/serverfault.com\/questions\/590277\/why-does-nslookup-return-two-or-more-ip-address-for-yahoo-com-or-microsoft-com<\/a><\/p>\n<h2>Check Nginx<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-console\"># count requests per status code\n$ cat access.log | cut -d '\"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort -rn\n\n# show which URLs return the most 404s\n$ awk '($9 ~ \/404\/)' access.log | awk '{print $7}' | sort | uniq -c | sort -rn\n\n# show logs from 16:00 to 18:59 on 2016\/10\/01\n$ grep \"01\/Oct\/2016:1[6-8]\" access.log\n\n# show logs from 08:00 to 12:59 on 2016\/10\/01\n$ grep -E \"01\/Oct\/2016:(0[8-9]|1[0-2])\" access.log<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"http:\/\/stackoverflow.com\/questions\/7575267\/extract-data-from-log-file-in-specified-range-of-time\">http:\/\/stackoverflow.com\/questions\/7575267\/extract-data-from-log-file-in-specified-range-of-time<\/a><br \/>\n<a href=\"http:\/\/superuser.com\/questions\/848971\/unix-command-to-grep-a-time-range\">http:\/\/superuser.com\/questions\/848971\/unix-command-to-grep-a-time-range<\/a><\/p>\n<p>If the status
code is 502 Bad Gateway,<br \/>\nit usually means the upstream server behind the load balancer \/ nginx is down or unreachable.<br \/>\nFor a Kubernetes Service,<br \/>\nthe Service's spec.selector may not match any pods.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Linux commands that every DevOps engineer or SysAdmin should know.<\/p>\n","protected":false},"author":1,"featured_media":413,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[101,74],"class_list":["post-412","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-about-devops","tag-cli-tool","tag-linux"],"_links":{"self":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/412","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/comments?post=412"}],"version-history":[{"count":0,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/412\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media\/413"}],"wp:attachment":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media?parent=412"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/categories?post=412"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/tags?post=412"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}