mitmproxy: proxy any network traffic through your local machine

mitmproxy: proxy any network traffic through your local machine

mitmproxy is your swiss-army knife for interactive HTTP/HTTPS proxy. In fact, it can be used to intercept, inspect, modify and replay web traffic such as HTTP/1, HTTP/2, WebSockets, or any other SSL/TLS-protected protocols.

Moreover, mitproxy has a powerful Python API offers full control over any intercepted request and response.

ref:
https://mitmproxy.org/
https://docs.mitmproxy.org/stable/

Concept

ref:
https://docs.mitmproxy.org/stable/concepts-howmitmproxyworks/

Installation

$ brew install mitmproxy

$ mitmproxy --version
Mitmproxy: 4.0.4
Python:    3.7.0
OpenSSL:   OpenSSL 1.0.2p  14 Aug 2018
Platform:  Darwin-18.0.0-x86_64-i386-64bit

ref:
https://docs.mitmproxy.org/stable/overview-installation/

Configuration

Make your computer become the man of man-in-the-middle attack.

macOS

$ ipconfig getifaddr en0
192.168.0.128

$ mitmproxy -p 8888
# or
$ mitmweb -p 8888
$ open http://127.0.0.1:8081/

Flow List keys:

  • ?: Show help
  • q: Exit the current view
  • f: Set view filter
  • r: Replay this flow
  • i: Set intercept filter
  • hjkl or arrow: Move left/down/up/right
  • enter: Select

Flow Details keys:

  • tab: Select next
  • m: Set flow view mode
  • e: Edit this flow (request or response)
  • a: Accept this intercepted flow

ref:
https://docs.mitmproxy.org/stable/tools-mitmproxy/
https://github.com/mitmproxy/mitmproxy/blob/master/mitmproxy/tools/console/defaultkeys.py

iOS

  • Go to Settings > Wi-Fi > Your Wi-Fi > Configure Proxy
    • Select Manual, enter the following values:
      • Server: 192.168.0.128
      • Port: 8888
      • Authentication: unchecked
  • Open http://mitm.it/ on Safari
    • Install the corresponding certificate for your device
  • Go to Settings > General > About > Certificate Trust Settings
    • Turn on the mitmproxy certificate
  • Open any app you want to watch

ref:
https://docs.mitmproxy.org/stable/concepts-certificates/

Usage

The most exciting feature is you could alter any request and response using a Python script, mitmdump -s!

ref:
https://docs.mitmproxy.org/stable/tools-mitmdump/
https://github.com/mitmproxy/mitmproxy/tree/master/examples

Deal With Certificate Pinning

You can use your own certificate by passing the --certs example.com=/path/to/example.com.pem option to mitmproxy. Mitmproxy then uses the provided certificate for interception of the specified domain.

The certificate file is expected to be in the PEM format which would roughly looks like this:

-----BEGIN PRIVATE KEY-----
<private key>
-----END PRIVATE KEY-----

-----BEGIN CERTIFICATE-----
<cert>
-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----
<intermediary cert (optional)>
-----END CERTIFICATE-----
$ mitmproxy -p 8888 --certs example.com=example.com.pem

ref:
https://docs.mitmproxy.org/stable/concepts-certificates/#using-a-custom-server-certificate

Redirect Requests To Your Local Development Server

# redirect_to_localhost.py
from mitmproxy import ctx
from mitmproxy import http

REMOTE_HOST = 'api.example.com'
DEV_HOST = '192.168.0.128'
DEV_PORT = 8000

def request(flow: http.HTTPFlow) -> None:
    if flow.request.pretty_host in [REMOTE_HOST, DEV_HOST]:
        ctx.log.info('=== request')
        ctx.log.info(str(flow.request.headers))
        ctx.log.info(f'content: {str(flow.request.content)}')

        flow.request.scheme = 'http'
        flow.request.host = DEV_HOST
        flow.request.port = DEV_PORT

def response(flow: http.HTTPFlow) -> None:
    if flow.request.pretty_host == DEV_HOST:
        ctx.log.info('=== response')
        ctx.log.info(str(flow.response.headers))
        if flow.response.headers.get('Content-Type', '').startswith('image/'):
            return
        ctx.log.info(f'body: {str(flow.response.get_content())}')

ref:
https://discourse.mitmproxy.org/t/reverse-mode-change-request-host-according-to-the-sni-https/466

You could use negative regex with --ignore-hosts to only watch specific domains. Of course, you are still able to blacklist any domain you don't want: --ignore-hosts 'apple.com|icloud.com|itunes.com|facebook.com|googleapis.com|crashlytics.com'.

Currently, changing the Host server for HTTP/2 connections is not allowed, but you could just disable HTTP/2 proxy to solve the issue if you don't need HTTP/2 for local development.

$ mitmdump -p 8888 \
--certs example.com=example.com.pem \
-v --flow-detail 3 \
--ignore-hosts '^(?!.*example\.com)' \
--no-http2 \
-s redirect_to_localhost.py

ref:
https://stackoverflow.com/questions/29414158/regex-negative-lookahead-with-wildcard

碼天狗週刊 第 135 期 @vinta - Kubernetes, Python, MongoDB

碼天狗週刊 第 135 期 @vinta - Kubernetes, Python, MongoDB

本文同步發表於 CodeTengu Weekly - Issue 135

The incomplete guide to Google Kubernetes Engine

根據前陣子搗鼓 Kubernetes 的心得寫了一篇文章,跟大家分享一下,希望有幫助。內容包含概念介紹、建立 cluster、新增 node pools、部署 ConfigMap、Deployment with LivenessProbe/ReadinessProbe、Horizontal Pod Autoscaler、Pod Disruption Budget、StatefulSet、DaemonSet,到說明 Service 和 Ingress 的關係,以及 Node Affinity 與 Pod Affinity 的應用等。

順帶一提,就算只是架來玩玩,建議大家可以直接在 Google Kubernetes Engine 開一個 preemptible(類似 AWS 的 Spot Instances)的 k8s cluster,價格超便宜,所以就不要再用 minikube 啦。不過現在連 Amazon 也有自己的 managed Kubernetes 了,雖然目前公司是用 GCP,但是還是比較懷念 AWS 啊~

Fluent Python

雖然 Python 也是寫了一陣子了,但是每次讀這本書還是能夠學到不少。真心推薦。

當初學 Python 讀的是另一本 Learning Python,查了一下,哇都出到第五版了。

延伸閱讀:

A deep dive into the PyMongo MongoDB driver

Replica Set 通常是 MongoDB 的標準配置(再來就是 Sharding 了),這個 talk 詳細地說明了 Replica Set 是如何應對 service discovery 以及 PyMongo 和 Replica Set 之間是怎麼溝通的。

延伸閱讀:

Let's talk about usernames

就像我們之前提到過很多次的 Falsehoods 系列,這篇文章也是一直不厭其煩地告訴大家,幾乎每個系統、每個網站都會有的東西:username,其實沒有你以為的那麼簡單。大家感受一下。

作者也提到一個很重要的 The Tripartite Identity Pattern,把所謂的 ID 分成以下三種:

  1. System-level identifier, suitable for use as a target of foreign keys in our database
  2. Login identifier, suitable for use in performing a credential check
  3. Public identity, suitable for displaying to other users

而不要想用同一個 identifier 搞定所有用途。

Web Architecture 101

這篇文章淺顯易懂地解釋了一個現代的 web service 通常會具備的各項元件。不過說真的,如果你今天是一個初入門的後端工程師,你究竟得花多少時間和心力才能摸清楚這篇文章提到的東西?更別提那些更加底層的知識了,喔,這篇文章甚至也還沒提到 DevOps 的事情呢。就像之前讀到的 Will Kubernetes Collapse Under the Weight of Its Complexity?,總覺得整個態勢發展到現在,對新手(甚至是我們這種普通的 1x 工程師)似乎不是很友善啊。

延伸閱讀:

Integrate with Google Cloud API in Python

Integrate with Google Cloud API in Python

google-cloud, Python idiomatic clients for Google Cloud Platform services. There is an older Python library also officially supported by Google, google-api-python-client, which is in maintenance mode.

ref:
https://github.com/googleapis/google-cloud-python
https://github.com/googleapis/google-api-python-client

Installation

$ pip install google-cloud

# you could also only install specific components
$ pip install google-cloud-storage

ref:
https://pypi.org/search/?q=google+cloud

Google Cloud Storage

It is worth noting that, initializing storage.Client() is a blocking call.

ref:
https://googleapis.github.io/google-cloud-python/latest/storage/buckets.html
https://cloud.google.com/storage/docs/reference/libraries

Upload From String

from google.cloud.storage.bucket import Bucket
from google.cloud.storage.blob import Blob

def upload_from_string(bucket_id, content, filename, content_type):
    client = storage.Client()
    bucket = Bucket(client, bucket_id)
    blob = Blob(filename, bucket)
    blob.upload_from_string(content, content_type)

Upload From An URL

from google.cloud import storage
import requests

def upload_from_url(bucket_id, filename, url):
    client = storage.Client()
    session = requests.Session()
    with session.get(url, stream=True) as response:
        bucket = client.get_bucket(bucket_id)
        blob = bucket.blob(filename)
        blob.upload_from_file(response.raw, content_type=response.headers.get('Content-Type'))

Update A File's Metadata

from google.cloud import storage

def update_metadata(bucket, filepath, new_metadata):
    bucket = task.storage_client.get_bucket(bucket)
    blob = bucket.get_blob(filepath)
    blob.metadata = {**blob.metadata, **new_metadata} if blob.metadata else new_metadata
    blob.patch()

new_metadata = {
    'Link': '<https://api.example.com/users/57c16f5bb811055b66d8ef46>; rel="user"',
}

ref:
https://github.com/GoogleCloudPlatform/google-cloud-python/issues/1185

Copy A File

from google.cloud import storage

def copy_file(source_bucket, source_name, destination_bucket, destination_name):
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(source_bucket)
    source_file = source_bucket.blob(source_name)
    destination_bucket = storage_client.get_bucket(destination_bucket)
    destination_file = source_bucket.copy_blob(source_file, destination_bucket, destination_name)
    return destination_file

file_ext_mapping = {
    'image/jpeg': 'jpg',
    'video/mp4': 'mp4',
}
file_ext = file_ext_mapping[original_message.media.mimetype]
source_name = f'messages/{original_message.id}.{file_ext}'
destination_name = f'messages/{new_message.id}.{file_ext}'

copy_file(
    source_bucket='asia.uploads.example.com',
    source_name=source_name,
    destination_bucket='asia.uploads.example.com',
    destination_name=destination_name,
)

ref:
https://cloud.google.com/storage/docs/json_api/v1/objects/copy
https://cloud.google.com/storage/docs/renaming-copying-moving-objects#storage-copy-object-python

Copy A Folder With Batch Operations

from google.cloud import storage

def copy_files(source_bucket_name, source_name_prefix, destination_bucket_name, fix_destination_name_func=None):
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(source_bucket_name)
    destination_bucket = storage_client.get_bucket(destination_bucket_name)
    blobs = source_bucket.list_blobs(prefix=source_name_prefix)

    # YOU CANNOT DO THIS
    # blobs is a HTTP iterator
    # blobs.num_results always return 0
    # if not blobs.num_results:
    #     raise ValueError(f'No objects matched: gs://{source_bucket.name}/{source_name_prefix}')

    with storage_client.batch():
        for source_blob in blobs:
            destination_name = fix_destination_name_func(source_blob.name) if callable(fix_destination_name_func) else source_blob.name
            source_bucket.copy_blob(source_blob, destination_bucket, destination_name)
    return True

source_bucket_name = 'asia.uploads.example.com'
destination_bucket_name = 'asia.contents.example.com'
source_name_prefix = 'media/123'

copy_files(
    source_bucket_name=source_bucket_name,
    destination_bucket_name=destination_bucket_name,
    source_name_prefix=source_name_prefix,
    fix_destination_name_func=lambda source_name: source_name.replace(source_name_prefix, 'forum-posts'),
)

equals to

$ gsutil cp -r "gs://asia.uploads.example.com/media/123/*" "gs://asia.contents.example.com/"

ref:
https://cloud.google.com/storage/docs/listing-objects

batch() does not guarantee the order of executions, so do not mix different type of calls in the same batch. For instance, the batch should not be a mixture of "copy a.txt" then delete a.txt.

ref:
https://googlecloudplatform.github.io/google-cloud-python/latest/storage/batch.html

Upload A File Directly To A Bucket

We first need to generate a signed upload URL and we could upload the file to the URL.

import base64
import datetime
import time

from oauth2client.client import GoogleCredentials
import yarl

credentials = GoogleCredentials.get_application_default()

def signurl(method, url, content_type=None, expires_at=None, md5sum=None, meta=None):
    method, is_resumable = method.upper(), False
    if method in ['RESUMABLE']:
        method, is_resumable = 'POST', True
    path = yarl.URL(url).path

    def signature():
        def _signature_parts():
            def _meta():
                for key, value in (meta or {}).items():
                    yield 'x-goog-meta-{key}:{value}'.format(key=key, value=value)
                if is_resumable:
                    yield 'x-goog-resumable:start'

            yield method
            yield md5sum or ''
            # we need to use `curl -H 'content-type:'` to upload if we sign an empty content-type
            yield content_type or 'application/octet-stream'
            yield str(int(time.mktime(expires_at.timetuple()))) if expires_at else ''
            yield from sorted(_meta())
            yield path

        _, signature = credentials.sign_blob('\n'.join(_signature_parts()))
        return base64.b64encode(signature).decode('utf-8')

    def params():
        yield 'GoogleAccessId', credentials.service_account_email
        if expires_at:
            yield 'Expires', int(time.mktime(expires_at.timetuple()))
        yield 'Signature', signature()

    return str(yarl.URL(url).with_query(**dict(params())))

signurl(
    method='RESUMABLE',
    url='https://storage.googleapis.com/asia.uploads.example.com/media/your-filename.ext'
    expires_at=datetime.datetime.utcnow() + datetime.timedelta(hours=24),
)
$ curl -v -X 'POST' \
-H 'content-type: application/octet-stream' \
-H 'x-goog-resumable:start' \
-d '' 'THE_SIGNED_UPLOAD_URL'

$ curl -v -X PUT \
--upload-file whatever.mp4 \
THE_URL_FROM_LOCATION_HEADER_OF_THE_ABOVE_RESPONSE

ref:
https://cloud.google.com/storage/docs/access-control/signed-urls#signing-resumable
https://cloud.google.com/storage/docs/uploading-objects
https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload
https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload
https://cloud.google.com/storage/docs/xml-api/resumable-upload

Enable CORS For A Google Cloud Storage Bucket

$ gsutil cors get gs://your_bucket_name

$ cat cors.json
[
  {
    "origin": ["*"],
    "responseHeader": ["Content-Type", "x-goog-resumable:start"],
    "method": ["GET", "PUT", ""]
  }
]
$ gsutil cors set cors.json gs://your_bucket_name

ref:
https://cloud.google.com/storage/docs/gsutil/commands/cors
https://medium.com/imersotechblog/upload-files-to-google-cloud-storage-gcs-from-the-browser-159810bb11e3
http://andrewvos.com/uploading-files-directly-to-google-cloud-storage-from-client-side-javascript

Pipenv and Pipfile: The officially recommended Python packaging tool

Pipenv and Pipfile: The officially recommended Python packaging tool

You no longer need to use pip and virtualenv separately. Use pipenv instead.

ref:
https://github.com/pypa/pipenv
https://pipenv.kennethreitz.org/en/latest/

Install

$ pip install pipenv

ref:
https://pipenv.kennethreitz.org/en/latest/install/#installing-pipenv

Usage

$ pyenv global 3.7.4

# initialize project virtualenv with a specific Python version
# automatically generate both Pipfile and Pipfile.lock from requirements.txt if it exists
$ pipenv --three

$ cd /path/to/project-contains-Pipfile
$ pipenv install

$ pipenv install pangu
$ pipenv install -r requirements.txt

# install packages to dev-packages
$ pipenv install --dev \
autopep8 \
flake8 \
flake8-bandit \
flake8-blind-except \
flake8-bugbear \
flake8-builtins \
flake8-comprehensions \
flake8-debugger \
flake8-mutable \
flake8-pep3101 \
flake8-print \
flake8-string-format \
ipdb \
jedi \
mypy \
pep8-naming \
ptvsd \
pylint \
pylint-celery \
pylint-common \
pylint-flask \
pytest \
watchdog

# switch your shell environment to project virtualenv
$ pipenv shell
$ exit

# uninstall everything
$ pipenv uninstall --all

# remove project virtualenv
$ pipenv --rm

ref:
https://pipenv.kennethreitz.org/en/latest/install/

Example Pipfile

[[source]]
url = "https://pypi.python.org/simple" 
verify_ssl = true 
name = "pypi" 

[requires] 
python_version = "3.7"

[packages] 
celery = "==4.2.1"
flask = "==1.0.2"
requests = ">=2.0.0" 

[dev-packages] 
flake8 = "*" 
ipdb = "*" 
pylint = "*" 

[scripts]
web = "python -m flask run -h 0.0.0.0"
worker = "celery -A app:celery worker --pid= -l info -E --purge"
scheduler = "celery -A app:celery beat -l info --pid="
shell = "flask shell"

ref:
https://pipenv.kennethreitz.org/en/latest/basics/#example-pipfile-pipfile-lock

Print traceback call stack in Python

Print traceback call stack in Python

Extract Traceback From An Exception

try:
    do_shit()
except Exception as exc:
    print('----- start -----')
    tb = _hx_e.__traceback__
    raise RuntimeError().with_traceback(tb)
    print('----- end -----')

reef:
https://stackoverflow.com/questions/11414894/extract-traceback-info-from-an-exception-object

Print Traceback Without Raising An Exception

print('----- start -----')
import traceback; traceback.print_stack()
print('----- end -----')

# or

import traceback
for line in traceback.format_stack():
    print(line.strip())

The result would be like:

>>> import qingcloud.iaas
>>> conn = qingcloud.iaas.connect_to_zone('pek2', '123', '456')
>>> conn.describe_instances(limit=1)
----- start -----
  File "<stdin>", line 1, in <module>
  File "qingcloud/iaas/connection.py", line 214, in describe_instances
    return self.send_request(action, body)
  File "qingcloud/iaas/connection.py", line 42, in send_request
    resp = self.send(url, request, verb)
  File "qingcloud/conn/connection.py", line 245, in send
    request.authorize(self)
  File "qingcloud/conn/connection.py", line 156, in authorize
    connection._auth_handler.add_auth(self, **kwargs)
  File "qingcloud/conn/auth.py", line 118, in add_auth
    import traceback; traceback.print_stack()
----- end -----

ref:
https://stackoverflow.com/questions/3925248/print-python-stack-trace-without-exception-being-raised