Write next generation JavaScript with Babel 7

Write next generation JavaScript with Babel 7

You write the next generation JavaScript code (ES6 or ES2018!) and using Babel to convert them to ES5. Even more, with the new @babel/preset-env module, it is able to intellectually convert your next generation ECMAScript code to compatible syntax based on browser compatibility statistics. So you don't have to target specific browser versions anymore!

ref:
https://babeljs.io/
https://babeljs.io/docs/en/babel-preset-env

There is a real-world project with proper configurations. The following article is based on this project.
https://github.com/vinta/pangu.js

Babel

$ npm install \
@babel/core \
@babel/cli \
@babel/preset-env \
--save-dev

ref:
https://babeljs.io/setup

// babel.config.js
module.exports = function(api) {
    api.cache(false);
    return {
        presets: [
            "@babel/preset-env"
        ],
        comments: false
    };
};

ref:
https://babeljs.io/docs/en/configuration

It is also recommended to put common commands in the scripts section of the package.json file.

// package.json
{
    ...
    "scripts": {
        "clear:shared": "rm -rf ./dist/shared/",
        "clear:browser": "rm -rf ./dist/browser/",
        "clear:node": "rm -rf ./dist/node/",
        "clear": "npm run clear:shared && npm run clear:browser && npm run clear:node",
        "build:shared": "npm run clear:shared && babel src/shared/ -d dist/shared/",
        "build:browser": "npm run clear:browser && webpack",
        "build:node": "npm run clear:node && babel src/node/ -d dist/node/",
        "build": "npm run build:shared && npm run build:browser && npm run build:node",
    },
    ...
}
$ npm run build:node

ref:
https://babeljs.io/docs/en/babel-cli/

Webpack

$ npm install \
webpack \
webpack-cli \
babel-loader \
terser-webpack-plugin \
--save-dev
// webpack.config.js
var _ = require('underscore');
var fs = require('fs');
var path = require('path');
var TerserPlugin = require('terser-webpack-plugin');
var webpack = require('webpack');

var packageInfo = require('./package.json');

var entryPath = './src/browser/pangu.js';

module.exports = {
  target: 'web',
  // mode: 'development',
  mode: 'production',
  entry: {
    'pangu': entryPath,
    'pangu.min': entryPath
  },
  output: {
    path: path.resolve(__dirname, 'dist/browser/'),
    filename: '[name].js',
    library: 'pangu',
    libraryTarget: 'umd',
    umdNamedDefine: true
  },
  module: {
    rules: [
      {
        test: /\.js$/,
        exclude: /node_modules|node/,
        use: {
          loader: 'babel-loader',
          options: {
            babelrc: false,
            presets: [
              [
                "@babel/preset-env",
                {
                  "modules": "umd"
                }
              ]
            ]
          }
        }
      }
    ]
  },
  devtool: false,
  optimization: {
    minimizer: [
      new TerserPlugin({
        include: /\.min\.js$/
      })
    ],
  },
}

ref:
https://webpack.js.org/configuration/

@babel/preset-env transpiles your files to commonjs by default, which requires the transpiled files to be included by require or import. To make this compatible with your Chrome extension, you need to transpile the files as umd module.

ref:
https://stackoverflow.com/questions/52929562/babel-7-uncaught-referenceerror-after-transpiling-a-module

$ nom run build:browser

Karma

$ npm install \
@babel/register \
karma-babel-preprocessor \
karma-chrome-launcher \
karma-coverage \
karma-mocha \
karma-mocha-reporter \
puppeteer \
chai \
--save-dev
// karma.conf.js
module.exports = function(config) {
  config.set({
    frameworks: [
      'mocha'
    ],
    browsers: [
      'ChromeHeadless'
    ],
    files: [
      'node_modules/chai/chai.js',
      'dist/browser/pangu.js',
      'test/browser/*.js',
    ],
    preprocessors: {
        'dist/browser/pangu.js': ['coverage'],
    },
    reporters: [
      'mocha',
      'coverage'
    ],
    singleRun: true,
    coverageReporter: {
      type: 'lcov',
      subdir: '.'
    },
  });
};

ref:
https://karma-runner.github.io/3.0/config/configuration-file.html

$ nom run test:browser
MongoDB Change Stream: react to real-time data changes

MongoDB Change Stream: react to real-time data changes

What is Change Stream?

Change Stream is a Change Data Capture (CDC) feature provided by MongoDB since v3.6. In layman's terms, it's a high-level API that allows you to subscribe to real-time notifications whenever there is a change in your MongoDB collections, databases, or the entire cluster, in an event-driven fashion.

Change Stream uses information stored in the oplog (operations log) to produce the change event. The oplog.rs is a special capped collection that keeps a rolling record of all insert, update, and remove operations that come into your MongoDB so other members of the Replica Set can copy them. Since Change Stream is built on top of the oplog, it is only available for Replica Sets and Sharded clusters.

The problem with most databases' replication logs is that they have long been considered to be an internal implementation detail of the database, not a public API (Martin Kleppmann, 2017).

Change Stream comes to rescue!

Change Stream in a Sharded cluster

MongoDB has a global logical clock that enables the server to order all changes across a Sharded cluster.

To guarantee total ordering of changes, for each change notification the mongos checks with each shard to see if the shard has seen more recent changes. Sharded clusters with one or more shards that have little or no activity for the collection, or are "cold", can negatively affect the response time of the change stream as the mongos must still check with those cold shards to guarantee total ordering of changes.

References:

What can Change Stream do?

There are some typical use cases of Change Stream:

  • Syncing fields between the source and denormalized collections to mitigate the data consistency issue.
  • Invalidating the cache.
  • Updating the search index.
  • Replicating data to a data warehouse.
  • Hooking up Change Stream to a generic streaming processing pipeline, e.g., Kafka or Spark Streaming.

How to open a Change Stream?

First of all, you must have a Replica Set or a Shared cluster for your MongoDB deployment and make sure you are using WiredTiger storage engine. If you don't, you might use MongoDB all wrong.

All code samples below are written in Node.js.

const { MongoClient, ReadPreference } = require('mongodb');

const MONGO_URL = 'mongodb://127.0.0.1:27017/';

(async () => {
    const mongoClient = await MongoClient.connect(MONGO_URL, {
        appname: 'test',
        readPreference: ReadPreference.PRIMARY,
        useNewUrlParser: true,
    });
    const db = await mongoClient.db('test');
    const changeStream = db.collection('user').watch([], {'fullDocument': 'updateLookup'});

    changeStream.on('change', (event) => {
        console.log(event);
    });
})();

You could also enable 'fullDocument': 'updateLookup' which includes the entire document in each update event, but as the name says, it does a lookup which has an overhead and might exceed the 16MB limitation on BSON documents.

Also, the content of fullDocument may differ from the updateDescription if other majority-committed operations modified the document between the original update operation and the full document lookup. Be cautious when you use it.

References:

  • Change Events
    • Besides regular insert, update, and delete, there is also a replace event which triggered by a update operation.

How to aggregate Change Stream events?

One of the advantages of Change Stream is that you are able to leverage MongoDB's powerful [aggregation]() framework - allowing you to filter and modify the output of Change Stream.

However, there is a tricky part in update events, field names and their contents in the updateDescription.updatedFields might vary if the updated field is an array field. Assuming that we have a tags field which is a list of strings in the user collection. You could try running following code in the mongo shell:

var userId = ObjectId();
db.getCollection('user').insert({
    "_id" : userId,
    "username" : "vinta",
    "tags" : ["tag1"]
});

db.getCollection('user').updateOne({_id: userId}, {
    '$addToSet': {'tags': 'tag2'},
});
// the change event output would look like:
// {
//     ...
//     "operationType": "update",
//     "updateDescription": {
//         "updatedFields": {
//             "tags": ["tag1", "tag2"]
//         }
//     }
//     ...
// }

db.getCollection('user').updateOne({_id: userId}, {
    '$push': {'tags': 'tag3'},
});
// the change event output would look like:
// {
//     ...
//     "operationType": "update",
//     "updateDescription": {
//         "updatedFields": {
//             "tags.2": "tag3"
//         }
//     }
//     ...
// }

Fortunately, to mitigate the tags and tags.2 problem, we could do some aggregation to $project and $match change events if we only want to listen to the change of the tags field:

const pipeline = [
    {'$project': {
        '_id': 1,
        'operationType': 1,
        'documentKey': 1,
        'changedDocument': {
            '$objectToArray': {
                '$mergeObjects': ['$updateDescription.updatedFields', '$fullDocument'],
            },
        },
        'removedFields': '$updateDescription.removedFields',
    }},
    {'$match': {
        '$or': [
            {'changedDocument.k': /^tags$/},
            {'changedDocument.k': /^tags./},
            {'removedFields': {'$in': ['tags']}},
            {'operationType': 'delete'},
        ],
    }},
    {'$addFields': {
        'changedDocument': {'$arrayToObject': '$changedDocument'},
    }},
];
const changeStream = db.collection('user').watch(pipeline, {});

References:

How to resume a Change Stream?

Another critical feature of Change Stream is Resumability. Since any service will inevitably get restarted or crashed, it is essential that we can resume from the point of time that Change Stream was interrupted.

There are two options in watch() we can use:

  • resumeAfter: A resume token from any change event.
  • startAtOperationTime: A starting timestamp for Change Stream.

resumeAfter

Before using resumeAfter token, there is MongoDB configuration you might need to tackle with, FeatureCompatibilityVersion.

db.adminCommand({getParameter: 1, featureCompatibilityVersion: 1});
db.adminCommand({setFeatureCompatibilityVersion: "4.0"});

A resumeAfter token is carried by every Change Stream event: the _id field whose value looks like {'_data': '825C4607870000000129295A1004AF1EE5355B7344D6B25478700E75259D46645F696400645C42176528578222B13ADEAA0004'}. In other words, the {'_data': 'a hex string'} is your resumeAfter token.

In practice, you should store each resumeAfter token somewhere, for instance, Redis, so that you can resume from a blackout or a restart. It is also a good idea to wrap the store function with a debounced functionality.

Another unusual (and not so reliable) way to get a resumeAfter token is composing one from the oplog.rs collection:

const _ = require('lodash');
const { MongoClient, ReadPreference } = require('mongodb');

const MONGO_URL = 'mongodb://127.0.0.1:27017/';

(async () => {
    const mongoClient = await MongoClient.connect(MONGO_URL, {
        appname: 'test',
        replicaSet: 'rs0',
        readPreference: ReadPreference.PRIMARY,
        useNewUrlParser: true,
    });

    // cannot use 'local' database through mongos
    const localDb = await mongoClient.db('local');

    // querying oplog.rs might take seconds
    const doc = await localDb.collection('oplog.rs')
        .findOne(
            {'ns': 'test.user'}, // dbName.collectionName
            {'sort': {'$natural': -1}},
        );

    // https://stackoverflow.com/questions/48665409/how-do-i-resume-a-mongodb-changestream-at-the-first-document-and-not-just-change
    // https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/key_string.cpp
    // https://github.com/mongodb/mongo/blob/master/src/mongo/bson/bsontypes.h
    const resumeAfterData = [
        '82', // unknown
        doc.ts.toString(16), // timestamp
        '29', // unknown
        '29', // unknown
        '5A', // CType::BinData
        '10', // length (16)
        '04', // BinDataType of newUUID
        doc.ui.toString('hex'), // the collection uuid (see `db.getCollectionInfos({name: 'user'})`)
        '46', // CType::Object
        '64', // CType::OID (vary from the type of the collection primary key)
        '5F', // _ (vary from the field name of the collection primary key)
        '69', // i
        '64', // d
        '00', // null
        '64', // CType::OID (vary from the type of document primary key)
        _.get(doc, 'o2._id', _.get(doc, 'o._id')).toString('hex'), // ObjectID, update operations have `o2` field and others have `o` field
        '00', // null
        '04', // unknown
    ].join('').toUpperCase();

    const options = {
        'resumeAfter': {
            '_data': resumeAfterData,
        },
    };
    console.log(options);

    const db = await mongoClient.db('test');
    const changeStream = db.collection('user').watch([], options);

    changeStream.on('change', (event) => {
        console.log(event);
    });
})();

startAtOperationTime

The startAtOperationTime is only available in MongoDB 4.0+. It simply represents a starting point of time for the Change Stream. Also, you must make sure that the specified starting point is in the time range of the oplog if it is in the past.

The tricky part is that this option only accepts a MongoDB Timestamp object. You could also retrieve the latest timestamp directly from db.adminCommand({replSetGetStatus: 1}).

const { MongoClient, ReadPreference, Timestamp } = require('mongodb');

const MONGO_URL = 'mongodb://127.0.0.1:27017/';

(async () => {
    const mongoClient = await MongoClient.connect(MONGO_URL, {
        appname: 'test',
        readPreference: ReadPreference.PRIMARY,
        useNewUrlParser: true,
    });

    const options = {
        'startAtOperationTime': Timestamp(1, Date.now() / 1000),
    };
    console.log(options);

    const db = await mongoClient.db('test');
    const changeStream = db.collection('user').watch([], options);

    changeStream.on('change', (event) => {
        console.log(event);
    });
})();
mitmproxy: proxy any network traffic through your local machine

mitmproxy: proxy any network traffic through your local machine

mitmproxy is your swiss-army knife for interactive HTTP/HTTPS proxy. In fact, it can be used to intercept, inspect, modify and replay web traffic such as HTTP/1, HTTP/2, WebSockets, or any other SSL/TLS-protected protocols.

Moreover, mitproxy has a powerful Python API offers full control over any intercepted request and response.

ref:
https://mitmproxy.org/
https://docs.mitmproxy.org/stable/

Concept

ref:
https://docs.mitmproxy.org/stable/concepts-howmitmproxyworks/

Installation

$ brew install mitmproxy

$ mitmproxy --version
Mitmproxy: 4.0.4
Python:    3.7.0
OpenSSL:   OpenSSL 1.0.2p  14 Aug 2018
Platform:  Darwin-18.0.0-x86_64-i386-64bit

ref:
https://docs.mitmproxy.org/stable/overview-installation/

Configuration

Make your computer become the man of man-in-the-middle attack.

macOS

$ ipconfig getifaddr en0
192.168.0.128

$ mitmproxy -p 8888
# or
$ mitmweb -p 8888
$ open http://127.0.0.1:8081/

Flow List keys:

  • ?: Show help
  • q: Exit the current view
  • f: Set view filter
  • r: Replay this flow
  • i: Set intercept filter
  • hjkl or arrow: Move left/down/up/right
  • enter: Select

Flow Details keys:

  • tab: Select next
  • m: Set flow view mode
  • e: Edit this flow (request or response)
  • a: Accept this intercepted flow

ref:
https://docs.mitmproxy.org/stable/tools-mitmproxy/
https://github.com/mitmproxy/mitmproxy/blob/master/mitmproxy/tools/console/defaultkeys.py

iOS

  • Go to Settings > Wi-Fi > Your Wi-Fi > Configure Proxy
    • Select Manual, enter the following values:
      • Server: 192.168.0.128
      • Port: 8888
      • Authentication: unchecked
  • Open http://mitm.it/ on Safari
    • Install the corresponding certificate for your device
  • Go to Settings > General > About > Certificate Trust Settings
    • Turn on the mitmproxy certificate
  • Open any app you want to watch

ref:
https://docs.mitmproxy.org/stable/concepts-certificates/

Usage

The most exciting feature is you could alter any request and response using a Python script, mitmdump -s!

ref:
https://docs.mitmproxy.org/stable/tools-mitmdump/
https://github.com/mitmproxy/mitmproxy/tree/master/examples

Deal With Certificate Pinning

You can use your own certificate by passing the --certs example.com=/path/to/example.com.pem option to mitmproxy. Mitmproxy then uses the provided certificate for interception of the specified domain.

The certificate file is expected to be in the PEM format which would roughly looks like this:

-----BEGIN PRIVATE KEY-----
<private key>
-----END PRIVATE KEY-----

-----BEGIN CERTIFICATE-----
<cert>
-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----
<intermediary cert (optional)>
-----END CERTIFICATE-----
$ mitmproxy -p 8888 --certs example.com=example.com.pem

ref:
https://docs.mitmproxy.org/stable/concepts-certificates/#using-a-custom-server-certificate

Redirect Requests To Your Local Development Server

# redirect_to_localhost.py
from mitmproxy import ctx
from mitmproxy import http

REMOTE_HOST = 'api.example.com'
DEV_HOST = '192.168.0.128'
DEV_PORT = 8000

def request(flow: http.HTTPFlow) -> None:
    if flow.request.pretty_host in [REMOTE_HOST, DEV_HOST]:
        ctx.log.info('=== request')
        ctx.log.info(str(flow.request.headers))
        ctx.log.info(f'content: {str(flow.request.content)}')

        flow.request.scheme = 'http'
        flow.request.host = DEV_HOST
        flow.request.port = DEV_PORT

def response(flow: http.HTTPFlow) -> None:
    if flow.request.pretty_host == DEV_HOST:
        ctx.log.info('=== response')
        ctx.log.info(str(flow.response.headers))
        if flow.response.headers.get('Content-Type', '').startswith('image/'):
            return
        ctx.log.info(f'body: {str(flow.response.get_content())}')

ref:
https://discourse.mitmproxy.org/t/reverse-mode-change-request-host-according-to-the-sni-https/466

You could use negative regex with --ignore-hosts to only watch specific domains. Of course, you are still able to blacklist any domain you don't want: --ignore-hosts 'apple.com|icloud.com|itunes.com|facebook.com|googleapis.com|crashlytics.com'.

Currently, changing the Host server for HTTP/2 connections is not allowed, but you could just disable HTTP/2 proxy to solve the issue if you don't need HTTP/2 for local development.

$ mitmdump -p 8888 \
--certs example.com=example.com.pem \
-v --flow-detail 3 \
--ignore-hosts '^(?!.*example\.com)' \
--no-http2 \
-s redirect_to_localhost.py

ref:
https://stackoverflow.com/questions/29414158/regex-negative-lookahead-with-wildcard

碼天狗週刊 第 140 期 @vinta - MongoDB, Kubernetes, NGINX, Google Cloud Platform, MySQL

碼天狗週刊 第 140 期 @vinta - MongoDB, Kubernetes, NGINX, Google Cloud Platform, MySQL

本文同步發表於 CodeTengu Weekly - Issue 140

MongoDB cookbook: Queries and Aggregations

Issue 130 有提到,MongoDB 的 Aggregation 其實很強大,尤其搭配 $elemMatch$project$let$unwind$facet 等功能,可以直接完成很多複雜的業務邏輯,不需要多寫一行 code,雖然哪些事應該讓 DB 做、哪些事得在 API server 做,這就見仁見智啦。

不過 MongoDB Aggregation 寫起來的阿雜程度實在也跟 Elasticsearch 的 Query DSL 不遑多讓了(Thanks JSON),因為老是記不起來各種 operators 的用法和限制,所以就遵循之前提過的 Cookbook 模式,幫自己寫了一份筆記,複習、速查、分享各相宜。

Kubernetes Best Practices with Sandeep Dinesh (Google)

這個影片是 Google 的工程師在講使用 Kubernetes 和 containers 時的最佳實踐,影片的後半段則是 Weaveworks 的人在講他們搭建自己的 Kubernetes cluster 時遇到的各種挑戰和解法。

雖然前半段的內容有不少在 Kubernetes 和 GKE 的官方文件裡都有提到,但是有人貼心地幫你整理好還是挺棒的(就像你訂閱的這個 weekly 一樣),畢竟 Kubernetes 的文件真心多到靠北,看完都已經是 YAML 的形狀了。不過我對於越來越多人都推薦 Helm 這點還是不太能領略,總覺得 Helm 對一般使用者的意義好像不大啊(又不是 PaaS),我還不如直接幹一份 Chart 回來自己維護,之後要升級或客製化也比較方便,畢竟也就是一堆 YAML 檔。比較可行的用途似乎是團隊共用一套 Chart 來部署 production、staging 或 dev 環境?

延伸閱讀:

Tuning NGINX behind Google Cloud Platform HTTP(S) Load Balancer

因為 Google Cloud HTTP Load Balancing 的某些特性,如果你在 Google Kubernetes Engine 裡面跑 NGINX(或 OpenResty)的話,會有一些額外的 config 需要設定,尤其是 keepalive_timeout 620s;

題外話,Google Cloud 的 Load Balancer 也是很強啊,除了支援 QUIC 之外,更是默認啟用 TCP BBR

延伸閱讀:

别废话,各种 SQL 到底加了什么锁?

這個系列的文章專門在講 MySQL InnoDB 在各種情況下會使用的各種 lock,作者寫得非常淺顯易懂,最喜歡讀這種技術文章了~

延伸閱讀:

TeePublic

上禮拜發現的一個專門賣 T-shirt 的網站,重點是上面賣的 T-shirt 都!超!宅!它甚至有一個叫做 Programmer 的分類,或是你也可以隨便拿幾個你喜歡的電影、遊戲或動漫畫作品的名字去搜尋看看,保證有驚喜。我看到的第一天就買了八件。推薦各位臭宅去感受一下。

@vinta 分享!

碼天狗週刊 第 135 期 @vinta - Kubernetes, Python, MongoDB

碼天狗週刊 第 135 期 @vinta - Kubernetes, Python, MongoDB

本文同步發表於 CodeTengu Weekly - Issue 135

The incomplete guide to Google Kubernetes Engine

根據前陣子搗鼓 Kubernetes 的心得寫了一篇文章,跟大家分享一下,希望有幫助。內容包含概念介紹、建立 cluster、新增 node pools、部署 ConfigMap、Deployment with LivenessProbe/ReadinessProbe、Horizontal Pod Autoscaler、Pod Disruption Budget、StatefulSet、DaemonSet,到說明 Service 和 Ingress 的關係,以及 Node Affinity 與 Pod Affinity 的應用等。

順帶一提,就算只是架來玩玩,建議大家可以直接在 Google Kubernetes Engine 開一個 preemptible(類似 AWS 的 Spot Instances)的 k8s cluster,價格超便宜,所以就不要再用 minikube 啦。不過現在連 Amazon 也有自己的 managed Kubernetes 了,雖然目前公司是用 GCP,但是還是比較懷念 AWS 啊~

Fluent Python

雖然 Python 也是寫了一陣子了,但是每次讀這本書還是能夠學到不少。真心推薦。

當初學 Python 讀的是另一本 Learning Python,查了一下,哇都出到第五版了。

延伸閱讀:

A deep dive into the PyMongo MongoDB driver

Replica Set 通常是 MongoDB 的標準配置(再來就是 Sharding 了),這個 talk 詳細地說明了 Replica Set 是如何應對 service discovery 以及 PyMongo 和 Replica Set 之間是怎麼溝通的。

延伸閱讀:

Let's talk about usernames

就像我們之前提到過很多次的 Falsehoods 系列,這篇文章也是一直不厭其煩地告訴大家,幾乎每個系統、每個網站都會有的東西:username,其實沒有你以為的那麼簡單。大家感受一下。

作者也提到一個很重要的 The Tripartite Identity Pattern,把所謂的 ID 分成以下三種:

  1. System-level identifier, suitable for use as a target of foreign keys in our database
  2. Login identifier, suitable for use in performing a credential check
  3. Public identity, suitable for displaying to other users

而不要想用同一個 identifier 搞定所有用途。

Web Architecture 101

這篇文章淺顯易懂地解釋了一個現代的 web service 通常會具備的各項元件。不過說真的,如果你今天是一個初入門的後端工程師,你究竟得花多少時間和心力才能摸清楚這篇文章提到的東西?更別提那些更加底層的知識了,喔,這篇文章甚至也還沒提到 DevOps 的事情呢。就像之前讀到的 Will Kubernetes Collapse Under the Weight of Its Complexity?,總覺得整個態勢發展到現在,對新手(甚至是我們這種普通的 1x 工程師)似乎不是很友善啊。

延伸閱讀: