Speed up Python and Node.js builds on Travis CI

Travis CI's caching archives all directories listed in the configuration and uploads them to Amazon S3. Cached contents are available to any build on the repository, including Pull Requests. For Python and Node.js projects, you could cache both site-packages and node_modules directories in every Travis CI build.

Here is an example of .travis.yml:

sudo: false

language: python

python:
  - "2.7"

node_js: 4

cache:
  directories:
    - $HOME/.cache/pip
    - $HOME/virtualenv/python2.7.9/lib/python2.7/site-packages
    - node_modules

before_install:
  - pip install -U pip

install:
  - pip install -r requirements.txt
  - pip install coverage --ignore-installed
  - npm install

script:
  - coverage run manage.py test

In my case, after applying these changes, the time spent on the pip and npm install steps dropped from 180 seconds to 5 seconds.

One caveat: since we didn't cache any bin directory in the configuration (and I don't think that's necessary), executables installed by pip, such as coverage or django-admin.py, will not exist in subsequent builds. If you need those commands, you can force a reinstall by adding pip install some_package --ignore-installed.

References:

Caching Dependencies and Directories
https://docs.travis-ci.com/user/caching/

How to cache requirements for a Django project on Travis-CI?
http://stackoverflow.com/questions/19422229/how-to-cache-requirements-for-a-django-project-on-travis-ci

How to speed up Python unit tests on Travis CI (in Chinese)
https://tzangms.com/how-to-speed-up-python-unit-test-on-travis-ci/

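ES6 Promise notes

new Promise() takes a function as its initialization argument.
That function in turn takes two parameters, resolve and reject,
both of which are functions.
resolve() changes the state of the Promise object from pending to resolved / fulfilled (success).
reject() changes the state from pending to rejected (failure).

Once a Promise object has been created,
you can use its then() to register the callback function for the resolved state
and catch() to register the callback function for the rejected state.

Every then() and catch() returns a new Promise object.
If what you return inside then() is not a Promise object,
it is implicitly converted with Promise.resolve().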

const yourPromiseFunc = function (params) {
  return new Promise((resolve, reject) => {
    doYourAsyncShit(params, (err, data) => {
      if (err) {
        reject(err);
      } else {
        resolve(data);
      }
    });
  });
};

yourPromiseFunc('some parameter')
.then((data) => {
  console.log('success', data);
})
.catch((err) => {
  console.log('fail', err);
});

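Strictly speaking, there are only three things you can do inside then():

  • return another Promise object
  • return a synchronous value (a string, a number, or any other object)
  • throw an Error()

If you do not return explicitly, JavaScript automatically returns undefined for you.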
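
The three cases above, plus the implicit undefined, can be sketched in one chain. This is only an illustration; the variable names (log, done) are made up for this example.

```javascript
// Each then() callback below does one of the three allowed things.
const log = [];

const done = Promise.resolve(1)
  .then((n) => Promise.resolve(n + 1)) // 1. return another Promise: next then() receives 2
  .then((n) => n * 10)                 // 2. return a synchronous value: next then() receives 20
  .then((n) => { log.push(n); })       // no explicit return: next then() receives undefined
  .then((value) => {
    log.push(value);
    throw new Error('boom');           // 3. throw an Error: control jumps to catch()
  })
  .catch((err) => {
    log.push(err.message);
    return log;
  });

done.then((result) => console.log(result)); // [ 20, undefined, 'boom' ]
```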

ref:
http://www.html5rocks.com/zh/tutorials/es6/promises/
http://es6.ruanyifeng.com/#docs/promise#基本用法

Use reject() instead of throw

Inside a Promise,
prefer reject(new Error('your error message')) whenever you can,
rather than throw new Error('your error message');

The idea is to use reject() to signal errors that we raise deliberately.
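
A minimal sketch of that convention, assuming a hypothetical parsePositive() function invented for this example: the expected failure is reported with reject(), so a thrown exception keeps meaning "something unexpected broke".

```javascript
// parsePositive is a hypothetical function used only for illustration.
function parsePositive(input) {
  return new Promise((resolve, reject) => {
    const n = Number(input);
    if (Number.isNaN(n) || n <= 0) {
      // An expected, deliberate failure: reject explicitly instead of throwing.
      reject(new Error(`not a positive number: ${input}`));
      return;
    }
    resolve(n);
  });
}

parsePositive('42').then((n) => console.log('resolved:', n));
parsePositive('abc').catch((err) => console.log('rejected:', err.message));
```

Another practical reason: a throw inside an asynchronous callback (for example a setTimeout callback started in the executor) is not converted into a rejection, while reject() works from anywhere.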

ref:
http://liubin.org/promises-book/#not-throw-use-reject

Converting any object into a Promise object

Promise.resolve(xxx); wraps xxx in a Promise object
and then resolves it.
If xxx is already a native Promise object, it is returned as-is rather than copied.

Promise.resolve(42);

// is equivalent to

new Promise((resolve) => {
  resolve(42);
});

Because then() callbacks are always invoked asynchronously (the spec requires this),
even the callback attached to Promise.resolve(42); does not run immediately.

var promise = new Promise(function (resolve) {
    console.log("inner promise"); // runs 1st
    resolve(42);
});

promise.then(function(value) {
    console.log(value); // runs 3rd
});

console.log("outer promise"); // runs 2nd

ref:
http://liubin.org/promises-book/#chapter2-how-to-write-promise

Promise chains

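Always use the then().catch() form to register the resolved and rejected callback functions separately.
A catch() usually goes at the very end of a promise chain.

Every then() returns a new Promise object when it finishes (note: a new Promise object, not the one you originally created with new),
so you can chain as many then() calls as you like.
You can also return a value inside then() to pass it as the argument to the next then().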

get('story.json')
.then(function(response) {
  return JSON.parse(response);
})
.then(function(data) {
  console.log(data);
});

ref:
http://liubin.org/promises-book/#then-return-new-promise

// Runs in order from top to bottom (except .catch(), which only runs when something above it fails)
Promise.resolve()
  .then(functionA)
  .then(functionB)
  .then(functionC)
  .catch(errorHandler)
  .then(finalFunction);

errorHandler can only catch errors thrown in functionA, functionB, and functionC.
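
To make the order concrete, here is a small sketch where functionB fails. The function bodies are invented for this example; only the shape of the chain comes from the snippet above.

```javascript
const calls = [];

const functionA = () => { calls.push('A'); };
const functionB = () => { calls.push('B'); throw new Error('B failed'); };
const functionC = () => { calls.push('C'); }; // skipped: the chain is already rejected
const errorHandler = (err) => { calls.push(`caught: ${err.message}`); };
const finalFunction = () => { calls.push('final'); return calls; };

const chain = Promise.resolve()
  .then(functionA)
  .then(functionB)
  .then(functionC)
  .catch(errorHandler)
  .then(finalFunction);

chain.then((result) => console.log(result.join(' -> ')));
// A -> B -> caught: B failed -> final
```

An error thrown inside finalFunction would not reach errorHandler, because the catch() sits above it in the chain.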

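You can return a value inside then();
it is wrapped in a Promise object (via Promise.resolve(value)) and passed to the next then().

If you want to use the results of both functionA and functionB inside functionC,
you can write it like this: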

firstThingAsync()
  .then(function(result1) {
    return Promise.all([result1, secondThingAsync(result1)]);
  })
  .then(function(results) {
    // do something with results array: results[0], results[1]
  })
  .catch((err) => {
    doErrorHandling();
  });

Anti-patterns

// Written this way, the caller of badAsyncCall() never receives newVar as the resolved value
function badAsyncCall() {
    var promise = Promise.resolve();
    promise.then(function() {
        return newVar;
    });

    return promise;
}

// It should be written as
function badAsyncCall() {
    var promise = Promise.resolve();
    return promise.then(function() {
        return newVar;
    });
}

ref:
https://pouchdb.com/2015/05/18/we-have-a-problem-with-promises.html
http://www.datchley.name/promise-patterns-anti-patterns/

Run something only after all promises have finished

You can use Promise.all()

Promise.all([promise1, promise2, promise3])
.then((results) => {
  // Runs once promise1, promise2, and promise3 are all fulfilled
  // results is always in the same order as the array passed to .all()
  // results[0] is the value of promise1, results[1] that of promise2, and so on
  // You can also write .then(([data1, data2, data3]) => {}), though Node.js v4.3 does not support that syntax yet
})
.catch((err) => {
  // Runs as soon as any one of the promises is rejected
});
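
A runnable sketch of the ordering guarantee, assuming a hypothetical delay() helper written for this example: the slower promise still comes first in results because it was first in the input array.

```javascript
// delay is a hypothetical helper: resolves with `value` after `ms` milliseconds.
const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

const all = Promise.all([
  delay(30, 'slow'),
  delay(10, 'fast'),
  'plain value', // non-Promise entries are wrapped with Promise.resolve()
]);

all.then((results) => console.log(results)); // [ 'slow', 'fast', 'plain value' ]

// A single rejection rejects the whole Promise.all().
const failing = Promise.all([delay(10, 'ok'), Promise.reject(new Error('nope'))]);
failing.catch((err) => console.log(err.message)); // nope
```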

ref:
http://www.datchley.name/es6-promises/
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Promise/all
http://liubin.org/promises-book/#ch2-promise-all

AWS Lambda cookbook

AWS Lambda is an event-driven service: you upload your code and it runs on demand, without you having to manage your own servers.

ref:
http://aws.amazon.com/lambda/
http://docs.aws.amazon.com/lambda/latest/dg/limits.html

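API Gateway is essentially URL routing,
and Lambda functions are the handlers behind those routes (endpoints).
If you invoke a Lambda function through an event or a schedule,
you do not need API Gateway.

AWS Lambda has two invocation types.
The first is RequestResponse, which is synchronous (for example, when the function is bound to API Gateway or when you run it from the Lambda Management Console).
The second is Event, which is asynchronous.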

Runtimes

At the time of writing, AWS Lambda supports the following runtime versions:

  • nodejs (Node v0.10)
  • nodejs4.3
  • java
  • python

ref:
http://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html

Node.js

const aws = require('aws-sdk');

exports.handle = (event, context, callback) => {
  doYourShit();
  callback(null, 'DONE');
};

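Every Lambda function receives three parameters: event, context, and callback.

event is the input from the outside.
It may come from an S3 object event, a DynamoDB stream, or a JSON payload POSTed through API Gateway.

context contains metadata about the current Lambda function invocation,
for example context.getRemainingTimeInMillis()

The callback parameter is only supported by the Node.js v4.3 runtime.
On v0.10 you have to use context.succeed(), context.fail(), or context.done() instead.
But who the hell is still using Node.js v0.10?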

ref:
http://docs.aws.amazon.com/lambda/latest/dg/programming-model.html
http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-handler.html
http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html
http://docs.aws.amazon.com/lambda/latest/dg/best-practices.html

Calling another Lambda function from a Lambda function

Note that
your Lambda function's role must have permission to invoke other Lambda functions.

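const util = require('util');

const aws = require('aws-sdk');

// Create the Lambda service client used by invoke() below.
const lambda = new aws.Lambda();

// curatedIssue comes from the surrounding code (see the full source on GitHub).
const params = {
  FunctionName: 'LambdaBaku_syncIssue',
  InvocationType: 'Event', // means asynchronous execution
  Payload: JSON.stringify({ issue_number: curatedIssue.number }),
};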

lambda.invoke(params, (err, data) => {
  if (err) {
    console.log('FAIL', params);
    console.log(util.inspect(err));
  } else {
    console.log(data);
  }
});
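
As for the invoke permission mentioned above, the role needs a policy statement that allows lambda:InvokeFunction on the target function. The sketch below just builds such a policy document and prints it as JSON; the account ID in the ARN is a placeholder, and the region is assumed to match the other examples in this post.

```javascript
// The account ID below is a placeholder, not a real value.
const invokePolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Action: ['lambda:InvokeFunction'],
      Resource: ['arn:aws:lambda:ap-northeast-1:123456789012:function:LambdaBaku_syncIssue'],
    },
  ],
};

// Print the policy so it can be pasted into the role in the IAM console.
console.log(JSON.stringify(invokePolicy, null, 2));
```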

ref:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lambda.html
http://stackoverflow.com/questions/31714788/can-an-aws-lambda-function-call-another

The complete code is on GitHub
https://github.com/CodeTengu/lambdabaku

Users and Roles

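If you use apex to manage your Lambda functions,
make sure the AWS credential (User) you use has both the AWSLambdaFullAccess and AWSLambdaRole permissions.

Creating one Role per project is enough,
for example lambdabaku_role.
You can find the roles you created in the IAM Management Console.
The Basic execution role is basically sufficient,
since you can change the Role's permission / policy at any time later.
Which VPC a Lambda function belongs to is specified separately
and has nothing to do with the Role.
In other words, VPC still works with the Basic execution role.

If you want to access DynamoDB from a Lambda function,
remember to add the corresponding settings to the Role: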

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Stmt1428341300017",
            "Effect": "Allow",
            "Action": [
                "dynamodb:*"
            ],
            "Resource": [
                "arn:aws:dynamodb:ap-northeast-1:004615714446:table/CodeTengu_Preference",
                "arn:aws:dynamodb:ap-northeast-1:004615714446:table/CodeTengu_WeeklyIssue",
                "arn:aws:dynamodb:ap-northeast-1:004615714446:table/CodeTengu_WeeklyPost"
            ]
        }
    ]
}

Scheduled Events

ref:
http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html

API Gateway

To keep things simple,
choose Open with access key for Security,
then add an access key under API Keys in the API Gateway console
and assign an API stage to it.

To use it, just add x-api-key: YOUR_API_KEY to the HTTP headers.
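
From Node.js, sending that header could look roughly like this. It is a sketch under assumptions: buildOptions() is a hypothetical helper, and the hostname and path are placeholders for your own API Gateway endpoint.

```javascript
const https = require('https');

// buildOptions is a hypothetical helper; hostname and path are placeholders.
function buildOptions(apiKey) {
  return {
    hostname: 'example.execute-api.ap-northeast-1.amazonaws.com',
    path: '/prod/issues',
    method: 'GET',
    headers: { 'x-api-key': apiKey },
  };
}

// Uncomment to send the request for real:
// https.get(buildOptions('YOUR_API_KEY'), (res) => console.log(res.statusCode));
console.log(buildOptions('YOUR_API_KEY').headers);
```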

ref:
http://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-api-keys.html

Related Projects

ref:
https://github.com/serverless/serverless
https://github.com/apex/apex
https://github.com/claudiajs/claudia
https://github.com/garnaat/kappa
https://github.com/Miserlou/Zappa
https://github.com/nficano/python-lambda

A brief look at serverless architecture and implementation (in Chinese)
http://abalone0204.github.io/2016/05/22/serverless-simple-crud/

Deploy Lambda Functions via apex

$ curl https://raw.githubusercontent.com/apex/apex/master/install.sh | sh

$ apex deploy
$ apex invoke syncPublishedIssues --logs
$ echo -n '{"issue_number": 43}' | apex invoke syncIssue --logs

ref:
https://github.com/apex/apex
http://apex.run/

AWS DynamoDB notes

AWS DynamoDB is a fully managed NoSQL database service from Amazon Web Services that works as both a key-value store and a document store. You pay mainly for the read and write throughput you provision (plus the data you store), rather than for the running hours of database instances.

ref:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html
http://www.slideshare.net/AmazonWebServices/design-patterns-using-amazon-dynamodb

Glossary

DynamoDB is schema-less.

  • table: a table is a collection of items.
  • item: an item is a collection of attributes (key-value pairs).
  • attribute: an attribute is similar to a field or column in other databases.
  • primary key: one or two attributes that can uniquely identify every item in a table.
    • partition key (aka hash key): a simple primary key, composed of one attribute.
    • partition key and sort key (aka range key): a composite primary key, composed of two attributes.

ref:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html

Global Secondary Index (GSI)

A secondary index is an additional set of keys besides the primary key.
A table can have many secondary indexes.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

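A GSI can be used on tables keyed by a partition key alone or by partition + sort key.
Like a primary key, a GSI can be simple or composite.
GSIs can be added or removed at any time.

If you do not need strong consistency, or the data for a single partition exceeds 10GB,
use a GSI.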

ref:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
http://iamgarlic.blogspot.tw/2015/01/amazon-dynamodb-global-secondary-index.html

Local Secondary Index (LSI)

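An LSI can only be used on tables with a partition + sort key.
An LSI must pair the table's original partition key with another attribute as its new partition + sort key (an LSI is always composite).
An LSI can only be defined when the table is created.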

ref:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html
http://iamgarlic.blogspot.tw/2015/01/amazon-dynamodb-local-secondary-index.html

Query and Scan

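Avoid scan whenever you can.
After all, the operation reads every item in the table.

The primary key and local secondary indexes can only be specified when the table is created.
Once created, they cannot be changed.
Global secondary indexes do not have this restriction.

If you use partition + sort key as the primary key,
get requires both the partition key and the sort key.
query requires the partition key, while the sort key is optional.

Whether it belongs to a primary key, a GSI, or an LSI,
an attribute used as a partition key can only be queried with =.
It has no rich query capability (conditions such as >, <, between, begins_with).
Only the sort key supports rich queries.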
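
As a sketch of those rules, here are query parameters for the CodeTengu_WeeklyPost table created in the Commands section (partition key issueNumber, sort key id). buildPostQuery() is a hypothetical helper; it only builds the params object, so it runs without AWS credentials.

```javascript
// Params for AWS.DynamoDB.DocumentClient#query against CodeTengu_WeeklyPost.
// buildPostQuery is a hypothetical helper written for this example.
function buildPostQuery(issueNumber, minId, maxId) {
  return {
    TableName: 'CodeTengu_WeeklyPost',
    // Partition key: equality only. Sort key: range conditions are allowed.
    KeyConditionExpression: '#issue = :issue AND #id BETWEEN :min AND :max',
    ExpressionAttributeNames: { '#issue': 'issueNumber', '#id': 'id' },
    ExpressionAttributeValues: { ':issue': issueNumber, ':min': minId, ':max': maxId },
  };
}

console.log(buildPostQuery(43, 1, 100));
// With the aws-sdk package: dynamodbClient.query(buildPostQuery(43, 1, 100), callback);
```

To query through a GSI instead, add IndexName (for example 'categoryCode_id') and use that index's own partition and sort keys in the expression.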

Best Practices
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html

Choosing a Partition Key
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html

Querying DynamoDB by date
http://stackoverflow.com/questions/14836600/querying-dynamodb-by-date

Pick an item randomly
http://stackoverflow.com/questions/10666364/aws-dynamodb-pick-a-record-item-randomly

ref:
https://www.uplift.agency/blog/posts/2016/03/clearcare-dynamodb
https://medium.com/building-timehop/one-year-of-dynamodb-at-timehop-f761d9fe5fa1#.3g97b3lqy

Commands

DynamoDB is schema-less, so when creating a table you only define the attributes that are used by the primary key or by a secondary index.

# You can use the project name as a prefix for table names
# The read / write capacity units can be changed at any time later
$ aws dynamodb create-table \
--table-name CodeTengu_Preference \
--attribute-definitions AttributeName=name,AttributeType=S \
--key-schema AttributeName=name,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

$ aws dynamodb create-table \
--table-name CodeTengu_WeeklyIssue \
--attribute-definitions AttributeName=number,AttributeType=N AttributeName=publication,AttributeType=S AttributeName=publishedAt,AttributeType=N \
--key-schema AttributeName=number,KeyType=HASH \
--global-secondary-indexes IndexName=publication_published_at,KeySchema='[{AttributeName=publication,KeyType=HASH},{AttributeName=publishedAt,KeyType=RANGE}]',Projection='{ProjectionType=ALL}',ProvisionedThroughput='{ReadCapacityUnits=5,WriteCapacityUnits=5}' \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

$ aws dynamodb create-table \
--table-name CodeTengu_WeeklyPost \
--attribute-definitions AttributeName=issueNumber,AttributeType=N AttributeName=id,AttributeType=N  AttributeName=categoryCode,AttributeType=S \
--key-schema AttributeName=issueNumber,KeyType=HASH AttributeName=id,KeyType=RANGE \
--global-secondary-indexes IndexName=categoryCode_id,KeySchema='[{AttributeName=categoryCode,KeyType=HASH},{AttributeName=id,KeyType=RANGE}]',Projection='{ProjectionType=ALL}',ProvisionedThroughput='{ReadCapacityUnits=5,WriteCapacityUnits=5}' \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

ref:
http://docs.aws.amazon.com/cli/latest/reference/dynamodb/create-table.html
http://docs.aws.amazon.com/cli/latest/reference/dynamodb/update-table.html

$ aws dynamodb put-item \
--table-name CodeTengu_Preference \
--item file://fixtures/curated_api_config.json \
--return-consumed-capacity TOTAL

# fixtures/curated_api_config.json
{
  "name": { "S": "curated_api_config" },
  "apiKey": { "S": "xxx" }
}

ref:
http://docs.aws.amazon.com/cli/latest/reference/dynamodb/put-item.html

$ aws dynamodb get-item \
--table-name CodeTengu_WeeklyIssue \
--key '{"number": {"N": "42"}}'

ref:
http://docs.aws.amazon.com/cli/latest/reference/dynamodb/get-item.html

Usage

You should use AWS.DynamoDB.DocumentClient
rather than AWS.DynamoDB directly.

const AWS = require('aws-sdk');

const dynamodb = new AWS.DynamoDB({ apiVersion: '2012-08-10', region: 'ap-northeast-1' });
const dynamodbClient = new AWS.DynamoDB.DocumentClient({ service: dynamodb });

const params = {
  RequestItems: {
    CodeTengu_Preference: {
      Keys: [
        { name: 'xxx' },
      ],
    },
  },
};

dynamodbClient.batchGet(params, (err, data) => {
  if (err) {
    console.log('fail');
    console.log(err);
  } else {
    console.log('success');
    console.log(data);
  }
});
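
The main reason to prefer DocumentClient is that it lets you pass plain JavaScript values, while the low-level AWS.DynamoDB API expects every value wrapped in a type descriptor (the same format used by the CLI examples above). The toy converter below illustrates the difference; toAttributeValue() and toLowLevelKey() are hypothetical helpers that handle only strings and numbers, whereas the real DocumentClient covers every type.

```javascript
// Hypothetical, deliberately tiny converter: strings and numbers only.
function toAttributeValue(value) {
  if (typeof value === 'string') return { S: value };
  if (typeof value === 'number') return { N: String(value) };
  throw new TypeError(`unsupported type: ${typeof value}`);
}

// Convert a plain key object into the low-level AttributeValue format.
function toLowLevelKey(key) {
  const out = {};
  Object.keys(key).forEach((name) => { out[name] = toAttributeValue(key[name]); });
  return out;
}

console.log(toLowLevelKey({ number: 42 }));  // { number: { N: '42' } }
console.log(toLowLevelKey({ name: 'xxx' })); // { name: { S: 'xxx' } }
```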

ref:
http://aws.amazon.com/sdk-for-node-js/
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB.html
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/DocumentClient.html

The complete code is on GitHub
https://github.com/CodeTengu/lambdabaku

nvm: Node.js Version Manager

Install nvm

$ curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.29.0/install.sh | bash

ref:
https://github.com/creationix/nvm

Install Node.js

$ nvm install --lts

$ nvm install 4.8.1 && \
  nvm use 4.8.1 && \
  nvm alias default 4.8.1

# List all installed versions
$ nvm ls

# List all versions available for installation
$ nvm ls-remote

Install Packages with npm (Node Package Manager)

# Installs into the node_modules directory under the current directory
# Suitable for a Node.js project
$ npm install express

# Installs into ~/.nvm/vx.x.x/lib
$ npm install -g coffee-script

$ npm update -g bower

$ npm uninstall -g grunt-cli

# Show which libs are installed
# Similar to pip list
$ npm ls
$ npm ls -g
$ npm ls --depth=0

$ npm search underscore

ref:
http://book.nodejs.tw/zh-tw/node_npm.html
https://npmjs.org/doc/