I Failed the Turing Test

Write next generation JavaScript with Babel 7

2019-02-142019-10-21VintaJavaScript, Web Development

You write the next generation JavaScript code (ES6 or ES2018!) and using Babel to convert them to ES5. Even more, with the new @babel/preset-env module, it is able to intellectually convert your next generation ECMAScript code to compatible syntax based on browser compatibility statistics. So you don't have to target specific browser versions anymore!

ref:
https://babeljs.io/
https://babeljs.io/docs/en/babel-preset-env

There is a real-world project with proper configurations. The following article is based on this project.
https://github.com/vinta/pangu.js

Babel

$ npm install \
@babel/core \
@babel/cli \
@babel/preset-env \
--save-dev

ref:
https://babeljs.io/setup

// babel.config.js
module.exports = function(api) {
    api.cache(false);
    return {
        presets: [
            "@babel/preset-env"
        ],
        comments: false
    };
};

ref:
https://babeljs.io/docs/en/configuration

It is also recommended to put common commands in the scripts section of the package.json file.

// package.json
{
    ...
    "scripts": {
        "clear:shared": "rm -rf ./dist/shared/",
        "clear:browser": "rm -rf ./dist/browser/",
        "clear:node": "rm -rf ./dist/node/",
        "clear": "npm run clear:shared && npm run clear:browser && npm run clear:node",
        "build:shared": "npm run clear:shared && babel src/shared/ -d dist/shared/",
        "build:browser": "npm run clear:browser && webpack",
        "build:node": "npm run clear:node && babel src/node/ -d dist/node/",
        "build": "npm run build:shared && npm run build:browser && npm run build:node",
    },
    ...
}

$ npm run build:node

ref:
https://babeljs.io/docs/en/babel-cli/

Webpack

$ npm install \
webpack \
webpack-cli \
babel-loader \
terser-webpack-plugin \
--save-dev

// webpack.config.js
var _ = require('underscore');
var fs = require('fs');
var path = require('path');
var TerserPlugin = require('terser-webpack-plugin');
var webpack = require('webpack');

var packageInfo = require('./package.json');

var entryPath = './src/browser/pangu.js';

module.exports = {
  target: 'web',
  // mode: 'development',
  mode: 'production',
  entry: {
    'pangu': entryPath,
    'pangu.min': entryPath
  },
  output: {
    path: path.resolve(__dirname, 'dist/browser/'),
    filename: '[name].js',
    library: 'pangu',
    libraryTarget: 'umd',
    umdNamedDefine: true
  },
  module: {
    rules: [
      {
        test: /\.js$/,
        exclude: /node_modules|node/,
        use: {
          loader: 'babel-loader',
          options: {
            babelrc: false,
            presets: [
              [
                "@babel/preset-env",
                {
                  "modules": "umd"
                }
              ]
            ]
          }
        }
      }
    ]
  },
  devtool: false,
  optimization: {
    minimizer: [
      new TerserPlugin({
        include: /\.min\.js$/
      })
    ],
  },
}

ref:
https://webpack.js.org/configuration/

@babel/preset-env transpiles your files to commonjs by default, which requires the transpiled files to be included by require or import. To make this compatible with your Chrome extension, you need to transpile the files as umd module.

ref:
https://stackoverflow.com/questions/52929562/babel-7-uncaught-referenceerror-after-transpiling-a-module

$ nom run build:browser

Karma

$ npm install \
@babel/register \
karma-babel-preprocessor \
karma-chrome-launcher \
karma-coverage \
karma-mocha \
karma-mocha-reporter \
puppeteer \
chai \
--save-dev

// karma.conf.js
module.exports = function(config) {
  config.set({
    frameworks: [
      'mocha'
    ],
    browsers: [
      'ChromeHeadless'
    ],
    files: [
      'node_modules/chai/chai.js',
      'dist/browser/pangu.js',
      'test/browser/*.js',
    ],
    preprocessors: {
        'dist/browser/pangu.js': ['coverage'],
    },
    reporters: [
      'mocha',
      'coverage'
    ],
    singleRun: true,
    coverageReporter: {
      type: 'lcov',
      subdir: '.'
    },
  });
};

ref:
https://karma-runner.github.io/3.0/config/configuration-file.html

$ nom run test:browser

MongoDB Change Stream: react to real-time data changes

2019-01-222023-08-19VintaDatabase, JavaScript

What is Change Stream?

Change Stream is a Change Data Capture (CDC) feature provided by MongoDB since v3.6. In layman's terms, it's a high-level API that allows you to subscribe to real-time notifications whenever there is a change in your MongoDB collections, databases, or the entire cluster, in an event-driven fashion.

Change Stream uses information stored in the oplog (operations log) to produce the change event. The oplog.rs is a special capped collection that keeps a rolling record of all insert, update, and remove operations that come into your MongoDB so other members of the Replica Set can copy them. Since Change Stream is built on top of the oplog, it is only available for Replica Sets and Sharded clusters.

The problem with most databases' replication logs is that they have long been considered to be an internal implementation detail of the database, not a public API (Martin Kleppmann, 2017).

Change Stream comes to rescue!

Change Stream in a Sharded cluster

MongoDB has a global logical clock that enables the server to order all changes across a Sharded cluster.

To guarantee total ordering of changes, for each change notification the mongos checks with each shard to see if the shard has seen more recent changes. Sharded clusters with one or more shards that have little or no activity for the collection, or are "cold", can negatively affect the response time of the change stream as the mongos must still check with those cold shards to guarantee total ordering of changes.

References:

Change Streams Production Recommendations

What can Change Stream do?

There are some typical use cases of Change Stream:

Syncing fields between the source and denormalized collections to mitigate the data consistency issue.
Invalidating the cache.
Updating the search index.
Replicating data to a data warehouse.
Hooking up Change Stream to a generic streaming processing pipeline, e.g., Kafka or Spark Streaming.

How to open a Change Stream?

First of all, you must have a Replica Set or a Shared cluster for your MongoDB deployment and make sure you are using WiredTiger storage engine. If you don't, you might use MongoDB all wrong.

All code samples below are written in Node.js.

const { MongoClient, ReadPreference } = require('mongodb');

const MONGO_URL = 'mongodb://127.0.0.1:27017/';

(async () => {
    const mongoClient = await MongoClient.connect(MONGO_URL, {
        appname: 'test',
        readPreference: ReadPreference.PRIMARY,
        useNewUrlParser: true,
    });
    const db = await mongoClient.db('test');
    const changeStream = db.collection('user').watch([], {'fullDocument': 'updateLookup'});

    changeStream.on('change', (event) => {
        console.log(event);
    });
})();

You could also enable 'fullDocument': 'updateLookup' which includes the entire document in each update event, but as the name says, it does a lookup which has an overhead and might exceed the 16MB limitation on BSON documents.

Also, the content of fullDocument may differ from the updateDescription if other majority-committed operations modified the document between the original update operation and the full document lookup. Be cautious when you use it.

References:

Change Events
- Besides regular insert, update, and delete, there is also a replace event which triggered by a update operation.

How to aggregate Change Stream events?

One of the advantages of Change Stream is that you are able to leverage MongoDB's powerful aggregation framework - allowing you to filter and modify the output of Change Stream.

However, there is a tricky part in update events, field names and their contents in the updateDescription.updatedFields might vary if the updated field is an array field. Assuming that we have a tags field which is a list of strings in the user collection. You could try running following code in the mongo shell:

$addToSet produces complete items of the array field
$push produces only the inserted item of the array field
$pull produces complete items of the array field

var userId = ObjectId();
db.getCollection('user').insert({
    "_id" : userId,
    "username" : "vinta",
    "tags" : ["tag1"]
});

db.getCollection('user').updateOne({_id: userId}, {
    '$addToSet': {'tags': 'tag2'},
});
// the change event output would look like:
// {
//     ...
//     "operationType": "update",
//     "updateDescription": {
//         "updatedFields": {
//             "tags": ["tag1", "tag2"]
//         }
//     }
//     ...
// }

db.getCollection('user').updateOne({_id: userId}, {
    '$push': {'tags': 'tag3'},
});
// the change event output would look like:
// {
//     ...
//     "operationType": "update",
//     "updateDescription": {
//         "updatedFields": {
//             "tags.2": "tag3"
//         }
//     }
//     ...
// }

db.getCollection('user').updateOne({_id: userId}, {
    '$pull': {'tags': 'tag1'},
});
// the change event output would look like:
// {
//     ...
//     "operationType": "update",
//     "updateDescription": {
//         "updatedFields": {
//             "tags": ["tag2", "tag3"]
//         }
//     }
//     ...
// }

Fortunately, to mitigate the tags and tags.2 problem, we could do some aggregation to $project and $match change events if we only want to listen to the change of the tags field:

const pipeline = [
    {'$project': {
        '_id': 1,
        'operationType': 1,
        'documentKey': 1,
        'changedDocument': {
            '$objectToArray': {
                '$mergeObjects': ['$updateDescription.updatedFields', '$fullDocument'],
            },
        },
        'removedFields': '$updateDescription.removedFields',
    }},
    {'$match': {
        '$or': [
            {'changedDocument.k': /^tags$/},
            {'changedDocument.k': /^tags./},
            {'removedFields': {'$in': ['tags']}},
            {'operationType': 'delete'},
        ],
    }},
    {'$addFields': {
        'changedDocument': {'$arrayToObject': '$changedDocument'},
    }},
];
const changeStream = db.collection('user').watch(pipeline, {});

References:

How to resume a Change Stream?

Another critical feature of Change Stream is Resumability. Since any service will inevitably get restarted or crashed, it is essential that we can resume from the point of time that Change Stream was interrupted.

There are two options in watch() we can use:

resumeAfter: A resume token from any change event.
startAtOperationTime: A starting timestamp for Change Stream.

`resumeAfter`

Before using resumeAfter token, there is MongoDB configuration you might need to tackle with, FeatureCompatibilityVersion.

db.adminCommand({getParameter: 1, featureCompatibilityVersion: 1});
db.adminCommand({setFeatureCompatibilityVersion: "4.0"});

A resumeAfter token is carried by every Change Stream event: the _id field whose value looks like {'_data': '825C4607870000000129295A1004AF1EE5355B7344D6B25478700E75259D46645F696400645C42176528578222B13ADEAA0004'}. In other words, the {'_data': 'a hex string'} is your resumeAfter token.

In practice, you should store each resumeAfter token somewhere, for instance, Redis, so that you can resume from a blackout or a restart. It is also a good idea to wrap the store function with a debounced functionality.

Another unusual (and not so reliable) way to get a resumeAfter token is composing one from the oplog.rs collection:

const _ = require('lodash');
const { MongoClient, ReadPreference } = require('mongodb');

const MONGO_URL = 'mongodb://127.0.0.1:27017/';

(async () => {
    const mongoClient = await MongoClient.connect(MONGO_URL, {
        appname: 'test',
        replicaSet: 'rs0',
        readPreference: ReadPreference.PRIMARY,
        useNewUrlParser: true,
    });

    // cannot use 'local' database through mongos
    const localDb = await mongoClient.db('local');

    // querying oplog.rs might take seconds
    const doc = await localDb.collection('oplog.rs')
        .findOne(
            {'ns': 'test.user'}, // dbName.collectionName
            {'sort': {'$natural': -1}},
        );

    // https://stackoverflow.com/questions/48665409/how-do-i-resume-a-mongodb-changestream-at-the-first-document-and-not-just-change
    // https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/key_string.cpp
    // https://github.com/mongodb/mongo/blob/master/src/mongo/bson/bsontypes.h
    const resumeAfterData = [
        '82', // unknown
        doc.ts.toString(16), // timestamp
        '29', // unknown
        '29', // unknown
        '5A', // CType::BinData
        '10', // length (16)
        '04', // BinDataType of newUUID
        doc.ui.toString('hex'), // the collection uuid (see `db.getCollectionInfos({name: 'user'})`)
        '46', // CType::Object
        '64', // CType::OID (vary from the type of the collection primary key)
        '5F', // _ (vary from the field name of the collection primary key)
        '69', // i
        '64', // d
        '00', // null
        '64', // CType::OID (vary from the type of document primary key)
        _.get(doc, 'o2._id', _.get(doc, 'o._id')).toString('hex'), // ObjectID, update operations have `o2` field and others have `o` field
        '00', // null
        '04', // unknown
    ].join('').toUpperCase();

    const options = {
        'resumeAfter': {
            '_data': resumeAfterData,
        },
    };
    console.log(options);

    const db = await mongoClient.db('test');
    const changeStream = db.collection('user').watch([], options);

    changeStream.on('change', (event) => {
        console.log(event);
    });
})();

`startAtOperationTime`

The startAtOperationTime is only available in MongoDB 4.0+. It simply represents a starting point of time for the Change Stream. Also, you must make sure that the specified starting point is in the time range of the oplog if it is in the past.

The tricky part is that this option only accepts a MongoDB Timestamp object. You could also retrieve the latest timestamp directly from db.adminCommand({replSetGetStatus: 1}).

const { MongoClient, ReadPreference, Timestamp } = require('mongodb');

const MONGO_URL = 'mongodb://127.0.0.1:27017/';

(async () => {
    const mongoClient = await MongoClient.connect(MONGO_URL, {
        appname: 'test',
        readPreference: ReadPreference.PRIMARY,
        useNewUrlParser: true,
    });

    const options = {
        'startAtOperationTime': Timestamp(1, Date.now() / 1000),
    };
    console.log(options);

    const db = await mongoClient.db('test');
    const changeStream = db.collection('user').watch([], options);

    changeStream.on('change', (event) => {
        console.log(event);
    });
})();

mitmproxy: proxy any network traffic through your local machine

2018-10-212019-10-30VintaDevOps, Python, Web Development

mitmproxy is your swiss-army knife for interactive HTTP/HTTPS proxy. In fact, it can be used to intercept, inspect, modify and replay web traffic such as HTTP/1, HTTP/2, WebSockets, or any other SSL/TLS-protected protocols.

Moreover, mitproxy has a powerful Python API offers full control over any intercepted request and response.

ref:
https://mitmproxy.org/
https://docs.mitmproxy.org/stable/

Concept

ref:
https://docs.mitmproxy.org/stable/concepts-howmitmproxyworks/

Installation

$ brew install mitmproxy

$ mitmproxy --version
Mitmproxy: 4.0.4
Python:    3.7.0
OpenSSL:   OpenSSL 1.0.2p  14 Aug 2018
Platform:  Darwin-18.0.0-x86_64-i386-64bit

ref:
https://docs.mitmproxy.org/stable/overview-installation/

Configuration

Make your computer become the man of man-in-the-middle attack.

macOS

$ ipconfig getifaddr en0
192.168.0.128

$ mitmproxy -p 8888
# or
$ mitmweb -p 8888
$ open http://127.0.0.1:8081/

Flow List keys:

?: Show help
q: Exit the current view
f: Set view filter
r: Replay this flow
i: Set intercept filter
hjkl or arrow: Move left/down/up/right
enter: Select

Flow Details keys:

tab: Select next
m: Set flow view mode
e: Edit this flow (request or response)
a: Accept this intercepted flow

ref:
https://docs.mitmproxy.org/stable/tools-mitmproxy/
https://github.com/mitmproxy/mitmproxy/blob/master/mitmproxy/tools/console/defaultkeys.py

iOS

Go to Settings > Wi-Fi > Your Wi-Fi > Configure Proxy
- Select Manual, enter the following values:
  - Server: 192.168.0.128
  - Port: 8888
  - Authentication: unchecked
Open http://mitm.it/ on Safari
- Install the corresponding certificate for your device
Go to Settings > General > About > Certificate Trust Settings
- Turn on the mitmproxy certificate
Open any app you want to watch

ref:
https://docs.mitmproxy.org/stable/concepts-certificates/

Usage

The most exciting feature is you could alter any request and response using a Python script, mitmdump -s!

ref:
https://docs.mitmproxy.org/stable/tools-mitmdump/
https://github.com/mitmproxy/mitmproxy/tree/master/examples

Deal With Certificate Pinning

You can use your own certificate by passing the --certs example.com=/path/to/example.com.pem option to mitmproxy. Mitmproxy then uses the provided certificate for interception of the specified domain.

The certificate file is expected to be in the PEM format which would roughly looks like this:

$ mitmproxy -p 8888 --certs example.com=example.com.pem

ref:
https://docs.mitmproxy.org/stable/concepts-certificates/#using-a-custom-server-certificate

Redirect Requests To Your Local Development Server

# redirect_to_localhost.py
from mitmproxy import ctx
from mitmproxy import http

REMOTE_HOST = 'api.example.com'
DEV_HOST = '192.168.0.128'
DEV_PORT = 8000

def request(flow: http.HTTPFlow) -> None:
    if flow.request.pretty_host in [REMOTE_HOST, DEV_HOST]:
        ctx.log.info('=== request')
        ctx.log.info(str(flow.request.headers))
        ctx.log.info(f'content: {str(flow.request.content)}')

        flow.request.scheme = 'http'
        flow.request.host = DEV_HOST
        flow.request.port = DEV_PORT

def response(flow: http.HTTPFlow) -> None:
    if flow.request.pretty_host == DEV_HOST:
        ctx.log.info('=== response')
        ctx.log.info(str(flow.response.headers))
        if flow.response.headers.get('Content-Type', '').startswith('image/'):
            return
        ctx.log.info(f'body: {str(flow.response.get_content())}')

ref:
https://discourse.mitmproxy.org/t/reverse-mode-change-request-host-according-to-the-sni-https/466

You could use negative regex with --ignore-hosts to only watch specific domains. Of course, you are still able to blacklist any domain you don't want: --ignore-hosts 'apple.com|icloud.com|itunes.com|facebook.com|googleapis.com|crashlytics.com'.

Currently, changing the Host server for HTTP/2 connections is not allowed, but you could just disable HTTP/2 proxy to solve the issue if you don't need HTTP/2 for local development.

$ mitmdump -p 8888 \
--certs example.com=example.com.pem \
-v --flow-detail 3 \
--ignore-hosts '^(?!.*example\.com)' \
--no-http2 \
-s redirect_to_localhost.py

ref:
https://stackoverflow.com/questions/29414158/regex-negative-lookahead-with-wildcard

Integrate with Google Cloud API in Python

2018-07-242019-10-22VintaPython, Web Development

google-cloud, Python idiomatic clients for Google Cloud Platform services. There is an older Python library also officially supported by Google, google-api-python-client, which is in maintenance mode.

ref:
https://github.com/googleapis/google-cloud-python
https://github.com/googleapis/google-api-python-client

Installation

$ pip install google-cloud

# you could also only install specific components
$ pip install google-cloud-storage

ref:
https://pypi.org/search/?q=google+cloud

Google Cloud Storage

It is worth noting that, initializing storage.Client() is a blocking call.

ref:
https://googleapis.github.io/google-cloud-python/latest/storage/buckets.html
https://cloud.google.com/storage/docs/reference/libraries

Upload From String

from google.cloud.storage.bucket import Bucket
from google.cloud.storage.blob import Blob

def upload_from_string(bucket_id, content, filename, content_type):
    client = storage.Client()
    bucket = Bucket(client, bucket_id)
    blob = Blob(filename, bucket)
    blob.upload_from_string(content, content_type)

Upload From An URL

from google.cloud import storage
import requests

def upload_from_url(bucket_id, filename, url):
    client = storage.Client()
    session = requests.Session()
    with session.get(url, stream=True) as response:
        bucket = client.get_bucket(bucket_id)
        blob = bucket.blob(filename)
        blob.upload_from_file(response.raw, content_type=response.headers.get('Content-Type'))

Update A File's Metadata

from google.cloud import storage

def update_metadata(bucket, filepath, new_metadata):
    bucket = task.storage_client.get_bucket(bucket)
    blob = bucket.get_blob(filepath)
    blob.metadata = {**blob.metadata, **new_metadata} if blob.metadata else new_metadata
    blob.patch()

new_metadata = {
    'Link': '<https://api.example.com/users/57c16f5bb811055b66d8ef46>; rel="user"',
}

ref:
https://github.com/GoogleCloudPlatform/google-cloud-python/issues/1185

Copy A File

from google.cloud import storage

def copy_file(source_bucket, source_name, destination_bucket, destination_name):
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(source_bucket)
    source_file = source_bucket.blob(source_name)
    destination_bucket = storage_client.get_bucket(destination_bucket)
    destination_file = source_bucket.copy_blob(source_file, destination_bucket, destination_name)
    return destination_file

file_ext_mapping = {
    'image/jpeg': 'jpg',
    'video/mp4': 'mp4',
}
file_ext = file_ext_mapping[original_message.media.mimetype]
source_name = f'messages/{original_message.id}.{file_ext}'
destination_name = f'messages/{new_message.id}.{file_ext}'

copy_file(
    source_bucket='asia.uploads.example.com',
    source_name=source_name,
    destination_bucket='asia.uploads.example.com',
    destination_name=destination_name,
)

ref:
https://cloud.google.com/storage/docs/json_api/v1/objects/copy
https://cloud.google.com/storage/docs/renaming-copying-moving-objects#storage-copy-object-python

Copy A Folder With Batch Operations

from google.cloud import storage

def copy_files(source_bucket_name, source_name_prefix, destination_bucket_name, fix_destination_name_func=None):
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(source_bucket_name)
    destination_bucket = storage_client.get_bucket(destination_bucket_name)
    blobs = source_bucket.list_blobs(prefix=source_name_prefix)

    # YOU CANNOT DO THIS
    # blobs is a HTTP iterator
    # blobs.num_results always return 0
    # if not blobs.num_results:
    #     raise ValueError(f'No objects matched: gs://{source_bucket.name}/{source_name_prefix}')

    with storage_client.batch():
        for source_blob in blobs:
            destination_name = fix_destination_name_func(source_blob.name) if callable(fix_destination_name_func) else source_blob.name
            source_bucket.copy_blob(source_blob, destination_bucket, destination_name)
    return True

source_bucket_name = 'asia.uploads.example.com'
destination_bucket_name = 'asia.contents.example.com'
source_name_prefix = 'media/123'

copy_files(
    source_bucket_name=source_bucket_name,
    destination_bucket_name=destination_bucket_name,
    source_name_prefix=source_name_prefix,
    fix_destination_name_func=lambda source_name: source_name.replace(source_name_prefix, 'forum-posts'),
)

equals to

$ gsutil cp -r "gs://asia.uploads.example.com/media/123/*" "gs://asia.contents.example.com/"

ref:
https://cloud.google.com/storage/docs/listing-objects

batch() does not guarantee the order of executions, so do not mix different type of calls in the same batch. For instance, the batch should not be a mixture of "copy a.txt" then delete a.txt.

ref:
https://googlecloudplatform.github.io/google-cloud-python/latest/storage/batch.html

Upload A File Directly To A Bucket

We first need to generate a signed upload URL and we could upload the file to the URL.

import base64
import datetime
import time

from oauth2client.client import GoogleCredentials
import yarl

credentials = GoogleCredentials.get_application_default()

def signurl(method, url, content_type=None, expires_at=None, md5sum=None, meta=None):
    method, is_resumable = method.upper(), False
    if method in ['RESUMABLE']:
        method, is_resumable = 'POST', True
    path = yarl.URL(url).path

    def signature():
        def _signature_parts():
            def _meta():
                for key, value in (meta or {}).items():
                    yield 'x-goog-meta-{key}:{value}'.format(key=key, value=value)
                if is_resumable:
                    yield 'x-goog-resumable:start'

            yield method
            yield md5sum or ''
            # we need to use `curl -H 'content-type:'` to upload if we sign an empty content-type
            yield content_type or 'application/octet-stream'
            yield str(int(time.mktime(expires_at.timetuple()))) if expires_at else ''
            yield from sorted(_meta())
            yield path

        _, signature = credentials.sign_blob('\n'.join(_signature_parts()))
        return base64.b64encode(signature).decode('utf-8')

    def params():
        yield 'GoogleAccessId', credentials.service_account_email
        if expires_at:
            yield 'Expires', int(time.mktime(expires_at.timetuple()))
        yield 'Signature', signature()

    return str(yarl.URL(url).with_query(**dict(params())))

signurl(
    method='RESUMABLE',
    url='https://storage.googleapis.com/asia.uploads.example.com/media/your-filename.ext'
    expires_at=datetime.datetime.utcnow() + datetime.timedelta(hours=24),
)

$ curl -v -X 'POST' \
-H 'content-type: application/octet-stream' \
-H 'x-goog-resumable:start' \
-d '' 'THE_SIGNED_UPLOAD_URL'

$ curl -v -X PUT \
--upload-file whatever.mp4 \
THE_URL_FROM_LOCATION_HEADER_OF_THE_ABOVE_RESPONSE

ref:
https://cloud.google.com/storage/docs/access-control/signed-urls#signing-resumable
https://cloud.google.com/storage/docs/uploading-objects
https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload
https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload
https://cloud.google.com/storage/docs/xml-api/resumable-upload

Enable CORS For A Google Cloud Storage Bucket

$ gsutil cors get gs://your_bucket_name

$ cat cors.json
[
  {
    "origin": ["*"],
    "responseHeader": ["Content-Type", "x-goog-resumable:start"],
    "method": ["GET", "PUT", ""]
  }
]
$ gsutil cors set cors.json gs://your_bucket_name

ref:
https://cloud.google.com/storage/docs/gsutil/commands/cors
https://medium.com/imersotechblog/upload-files-to-google-cloud-storage-gcs-from-the-browser-159810bb11e3
http://andrewvos.com/uploading-files-directly-to-google-cloud-storage-from-client-side-javascript

MongoDB operations: Replica Set

2018-07-112019-10-21VintaDatabase, DevOps

A replica set is a group of servers (mongod actually) that maintain the same data set, with one primary which takes client requests, and multiple secondaries that keep copies of the primary's data. If the primary crashes, secondaries can elect a new primary from amongst themselves.

Replication from primary to secondaries is asynchronous.

ref:
https://docs.mongodb.com/v3.6/replication/
https://www.safaribooksonline.com/library/view/mongodb-the-definitive/9781491954454/ch08.html
https://www.percona.com/blog/2018/10/10/mongodb-replica-set-scenarios-and-internals/

Concepts

Primary: A node that accepts writes and is the leader for voting. There can be only one primary.
Secondary: A node that replicates from the primary or another secondary and can be used for reads. There can be a max of 127.
Arbiter: The node does not hold data and only participates in the voting. Also, it cannot be elected as the primary.
- In the event your node count is an even number, add one of these to break the tie. Never add one where it would make the count even.
Priority 0 node: The node cannot be selected as the primary. You might want to lower priority of some slow nodes.
- Priority allows you to prefer specific nodes are primary.
Vote 0 node: The node does not participate in the voting.
- In some cases, having more than eight nodes means additional nodes must not vote.
Hidden node: The hidden node must be a priority 0 node and is invisible to the driver which unable to take queries from clients.
Delayed node: The delayed node must be a hidden node, and its data lag behind the primary for some time.
Tags: Grants special ability to make queries directly to specific nodes. Useful for BI, geo-locality, and other advanced functions.

ref:
https://docs.mongodb.com/manual/core/replica-set-elections/
https://docs.mongodb.com/manual/core/replica-set-priority-0-member/
https://docs.mongodb.com/manual/core/replica-set-hidden-member/
https://docs.mongodb.com/manual/core/replica-set-delayed-member/

Common Architectures

ref:
https://docs.mongodb.com/v3.6/core/replica-set-architectures/
https://www.percona.com/blog/2018/03/22/the-anatomy-of-a-mongodb-replica-set/

Three-Node Replica Set: Primary, Secondary, Secondary

ref:
https://docs.mongodb.com/v3.6/tutorial/deploy-replica-set/
https://docs.mongodb.com/v3.6/tutorial/expand-replica-set/

If you are running MongoDB cluster on Kubernetes, PLEASE USE THE FULL DNS NAME (FQDN). DO NOT use something like pod-name.service-name.

$ mongo mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local
> rs.initiate({
   _id : "rs0",
   members: [
      {_id: 0, host: "mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local:27017"},
      {_id: 1, host: "mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local:27017"},
      {_id: 2, host: "mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local:27017"}
   ]
})
{
    "ok" : 1,
    "operationTime" : Timestamp(1531223087, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1531223087, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
rs0:PRIMARY> db.isMaster()

ref:
https://docs.mongodb.com/v3.6/reference/method/rs.initiate/

$ mongo mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local
rs0:SECONDARY> rs.slaveOk()
rs0:SECONDARY> show dbs
rs0:SECONDARY> rs.conf()
{
    "_id" : "rs0",
    "version" : 1,
    "protocolVersion" : NumberLong(1),
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : -1,
        "catchUpTakeoverDelayMillis" : 30000,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("5b449c2f9269bb1a807a8cdf")
    }
}
rs0:SECONDARY> rs.status()
{
    "set" : "rs0",
    "date" : ISODate("2018-07-10T11:47:48.474Z"),
    "myState" : 1,
    "term" : NumberLong(1),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1531223260, 1),
            "t" : NumberLong(1)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1531223260, 1),
            "t" : NumberLong(1)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1531223260, 1),
            "t" : NumberLong(1)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1531223260, 1),
            "t" : NumberLong(1)
        }
    },
    "members" : [
        {
            "_id" : 0,
            "name" : "mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 381,
            "optime" : {
                "ts" : Timestamp(1531223260, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2018-07-10T11:47:40Z"),
            "electionTime" : Timestamp(1531223098, 1),
            "electionDate" : ISODate("2018-07-10T11:44:58Z"),
            "configVersion" : 1,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 181,
            "optime" : {
                "ts" : Timestamp(1531223260, 1),
                "t" : NumberLong(1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1531223260, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2018-07-10T11:47:40Z"),
            "optimeDurableDate" : ISODate("2018-07-10T11:47:40Z"),
            "lastHeartbeat" : ISODate("2018-07-10T11:47:46.599Z"),
            "lastHeartbeatRecv" : ISODate("2018-07-10T11:47:47.332Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local:27017",
            "configVersion" : 1
        },
        {
            "_id" : 2,
            "name" : "mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 181,
            "optime" : {
                "ts" : Timestamp(1531223260, 1),
                "t" : NumberLong(1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1531223260, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2018-07-10T11:47:40Z"),
            "optimeDurableDate" : ISODate("2018-07-10T11:47:40Z"),
            "lastHeartbeat" : ISODate("2018-07-10T11:47:46.599Z"),
            "lastHeartbeatRecv" : ISODate("2018-07-10T11:47:47.283Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local:27017",
            "configVersion" : 1
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1531223260, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1531223260, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}

Three-Node Replica Set: Primary, Secondary, Arbiter

If your replica set has an even number of members, add an arbiter to obtain a majority of votes in an election for primary. Arbiters do not require dedicated hardware.

ref:
https://docs.mongodb.com/v3.6/tutorial/add-replica-set-arbiter/

Issues

Change Replica Set Name

Start mongod without --replSet
Run db.system.replset.remove({_id: 'oldReplicaSetName'}) in MongoDB Shell
Start mongod with --replSet "newReplicaSetName"

ref:
https://stackoverflow.com/questions/33400607/how-do-i-rename-a-mongodb-replica-set

InvalidReplicaSetConfig: Our replica set configuration is invalid or does not include us

$ kubectl logs -f mongodb-rs0-0
REPL_HB [replexec-10] Error in heartbeat (requestId: 20048) to mongodb-rs0-2.mongodb-rs0:27017, response status: InvalidReplicaSetConfig: Our replica set configuration is invalid or does not include us

$ mongo mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local
rs0:OTHER> rs.status()
{
    "state" : 10,
    "stateStr" : "REMOVED",
    "uptime" : 631,
    "optime" : {
        "ts" : Timestamp(1531224140, 1),
        "t" : NumberLong(1)
    },
    "optimeDate" : ISODate("2018-07-10T12:02:20Z"),
    "ok" : 0,
    "errmsg" : "Our replica set config is invalid or we are not a member of it",
    "code" : 93,
    "codeName" : "InvalidReplicaSetConfig",
    "operationTime" : Timestamp(1531224140, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1531224790, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}

$ mongo mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local
rs0:PRIMARY> rs.conf() 
{
    "_id" : "rs0",
    "version" : 9,
    "protocolVersion" : NumberLong(1),
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : -1,
        "catchUpTakeoverDelayMillis" : 30000,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("5b449c2f9269bb1a807a8cdf")
    }
}

The faulty member's state is REMOVED (it was once in a replica set but was subsequently removed) and shows Our replica set config is invalid or we are not a member of it. In fact, the real issue is that the removed node is sill in the list of replica set members.

You could just manually remove the broken node from the replica set on the primary, restart the node, and re-add the node.

$ mongo mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local
rs0:PRIMARY> rs.remove("mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local:27017")

# restart the Pod
$ kubectl delete mongodb-rs0-2

$ mongo mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local
rs0:PRIMARY> rs.add("mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local:27017")

ref:
https://stackoverflow.com/questions/47439781/mongodb-replica-set-member-state-is-other
https://docs.mongodb.com/v3.6/tutorial/remove-replica-set-member/
https://docs.mongodb.com/manual/reference/replica-states/

db.isMaster(): Does not have a valid replica set config

rs0:OTHER> db.isMaster()
{
    "hosts" : [
        "mongodb-rs0-0.mongodb-rs0.default.svc.cluster.local:27017",
        "mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local:27017",
        "mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local27017"
    ],
    "setName" : "rs0",
    "ismaster" : false,
    "secondary" : false,
    "info" : "Does not have a valid replica set config",
    "isreplicaset" : true,
    "maxBsonObjectSize" : 16777216,
    "maxMessageSizeBytes" : 48000000,
    "maxWriteBatchSize" : 100000,
    "localTime" : ISODate("2018-07-10T14:34:48.640Z"),
    "logicalSessionTimeoutMinutes" : 30,
    "minWireVersion" : 0,
    "maxWireVersion" : 6,
    "readOnly" : false,
    "ok" : 1,
    "operationTime" : Timestamp(1531232610, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1531232610, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}

You could just re-configure the replica set and only keep reachable members.

rs0:OTHER> oldConf = rs.conf()
rs0:OTHER> oldConf.members = [oldConf.members[0]]
rs0:OTHER> rs.reconfig(oldConf, {force: true})
rs0:PRIMARY> rs.add("mongodb-rs0-1.mongodb-rs0.default.svc.cluster.local:27017")
rs0:PRIMARY> rs.add("mongodb-rs0-2.mongodb-rs0.default.svc.cluster.local:27017")

ref:
https://docs.mongodb.com/v3.6/tutorial/reconfigure-replica-set-with-unavailable-members/

Change Replica Set Name

Stop mongod
Start mongod --bind_ip_all --port 27017 --dbpath /data/db without --replSet
Remove the old Replica Set name

use admin
db.getCollection('system.version').remove({_id: 'shardIdentity'})

use local
db.getCollection('system.replset').remove({_id: 'rs0'})

Start mongod --bind_ip_all --port 27017 --dbpath /data/db --shardsvr --replSet sh0

ref:
https://stackoverflow.com/questions/33400607/how-do-i-rename-a-mongodb-replica-set

Connect To A Replica Set Cluster

ref:
https://api.mongodb.com/python/current/examples/high_availability.html

Use Connection Pools

ref:
https://api.mongodb.com/python/current/faq.html#how-does-connection-pooling-work-in-pymongo