awscli: Command-line Interface for Amazon Web Services

awscli is the unified command-line interface for Amazon Web Services (AWS).

ref:
https://github.com/aws/aws-cli

$ pip install awscli

$ aws configure

# sync files between S3 buckets
# http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
$ aws s3 sync s3://your_bucket_1/media s3://your_bucket_2/media \
--acl "public-read" \
--exclude "track_audio/*"

# http://docs.aws.amazon.com/cli/latest/reference/s3/rm.html
$ aws s3 rm s3://your_bucket_1/media/track_audio --recursive

ref:
http://docs.aws.amazon.com/cli/latest/index.html


$ s3cmd setacl s3://your_bucket_1/media --acl-public --recursive

ref:
http://s3tools.org/usage

Change index mappings with zero downtime using elasticsearch-py

Basically, you can't change existing mappings (the so-called "schema") in Elasticsearch. You may add new fields freely, but changing the definition of an existing field (its type or analyzer) is impossible. One way or another, you need to create a new index.

Steps:

  • Create an alias my_index which points to the old index my_index_v1
  • Use my_index instead of my_index_v1 in your application
  • Create a new index my_index_v2 with new mappings
  • Transfer documents from the old index to the new one (a.k.a. reindex)
  • Associate the alias my_index with index my_index_v2
  • Delete the old index my_index_v1

The steps above, using elasticsearch-py:

from datetime import datetime

from django.conf import settings  # or wherever your ES_HOSTS setting is defined
from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

es_client = Elasticsearch(hosts=settings.ES_HOSTS)

# make sure that this alias doesn't conflict with any existing index name
alias = 'packer'

# CAUTION: if you have an index already, you should create an alias for it first
# es_client.indices.put_alias(index='your_current_index', name=alias)

old_indexes = list(es_client.indices.get_alias(name=alias).keys())
try:
    old_index = old_indexes[0]
except IndexError:
    old_index = None
else:
    if len(old_indexes) > 1:
        raise RuntimeError(
            'Alias `{0}` points to {1} indexes, which will cause an error '
            'when writing data to `{0}`'.format(alias, len(old_indexes))
        )

new_index = '{}_{}'.format(alias, datetime.now().strftime('%Y%m%d%H%M%S%f'))

# TrackDoc and AlbumDoc are your elasticsearch-dsl DocType classes
available_types = [TrackDoc, AlbumDoc]
for my_doc_type in available_types:
    # create a new index with new mappings
    my_doc_type.init(index=new_index)

if old_index:
    # transfer documents from old index to new index
    reindex(es_client, source_index=old_index, target_index=new_index)

    es_client.indices.update_aliases({
        'actions': [
            {'remove': {'index': old_index, 'alias': alias}},
            {'add': {'index': new_index, 'alias': alias}},
        ],
    })
else:
    es_client.indices.update_aliases({
        'actions': [
            {'add': {'index': new_index, 'alias': alias}},
        ],
    })

ref:
https://www.elastic.co/blog/changing-mapping-with-zero-downtime
https://blog.codecentric.de/en/2014/09/elasticsearch-zero-downtime-reindexing-problems-solutions/
http://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.reindex

An alias can point to multiple indexes. In that case, reading (searching) through the alias works perfectly, but writing (indexing) through it raises an exception: Alias [my_index] has more than one indices associated with it [[my_index_v1, my_index_v2]], can't execute a single index op.

It's not recommended to point the same alias at multiple indexes unless you explicitly write data to a specific concrete index.
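If writes must go through a name that might be an alias, you can resolve it to a single concrete index first. A minimal sketch — `resolve_write_index` is a hypothetical helper, and `alias_info` mimics the shape of the dict returned by elasticsearch-py's `indices.get_alias()` (keyed by concrete index name):

```python
def resolve_write_index(alias_info):
    """Return the single concrete index behind an alias, or raise if ambiguous."""
    indexes = sorted(alias_info.keys())
    if len(indexes) != 1:
        raise RuntimeError(
            'Alias points to {} indexes; writes need exactly one'.format(len(indexes))
        )
    return indexes[0]

# the response for an alias that points to one index looks like this
alias_info = {'my_index_v2': {'aliases': {'my_index': {}}}}
write_index = resolve_write_index(alias_info)  # 'my_index_v2'
```

Searching can still go through the alias; only indexing needs the resolved concrete name.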

ref:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multiple-indices.html

Create aliases

# list all indexes and their aliases
$ curl 'http://127.0.0.1:9200/_aliases'

# create an alias
$ curl -XPOST 'http://127.0.0.1:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "dps", "alias" : "packer" } }
    ]
}
'

# CAUTION: delete ALL indexes along with their aliases
$ curl -XDELETE 'http://127.0.0.1:9200/*/'

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-aliases.html

Update index settings

ref:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-put-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-update-settings.html
https://gist.github.com/nicolashery/6317643
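
A minimal sketch of what this looks like with elasticsearch-py (the analyzer name and settings body are made up; `es_client` is an `Elasticsearch` client as in the reindex example above). Dynamic settings such as `number_of_replicas` can be changed on a live index, but analysis settings require closing the index first:

```python
# hypothetical analysis settings: a custom lowercase analyzer
analysis_settings = {
    'analysis': {
        'analyzer': {
            'my_lowercase_analyzer': {
                'type': 'custom',
                'tokenizer': 'standard',
                'filter': ['lowercase'],
            },
        },
    },
}

def update_analysis_settings(es_client, index, settings_body):
    # analysis settings can only be changed on a closed index
    es_client.indices.close(index=index)
    es_client.indices.put_settings(index=index, body=settings_body)
    es_client.indices.open(index=index)

# usage:
# update_analysis_settings(es_client, 'my_index_v2', analysis_settings)
```

Note that changing the analyzer only affects documents indexed afterwards; existing documents keep their old tokens until they are reindexed.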

django-elastic-transcoder

Integrate Amazon Elastic Transcoder with Django via django-elastic-transcoder.

ref:
https://github.com/StreetVoice/django-elastic-transcoder
https://aws.amazon.com/elastictranscoder/

Glossary

pipeline
Roughly equivalent to the concept of a project.
You have to specify the input and output S3 buckets up front.

preset
A predefined transcoding format,
e.g. convert to 128k MP3, 720p video, and so on.

job
Each file to be transcoded is one job.
A job can have multiple outputs,
which means you can feed in a single .wav file
and transcode it into both a 192k .mp3 and a 128k .mp3 capped at 90 seconds, at the same time.

ref:
http://docs.aws.amazon.com/elastictranscoder/latest/developerguide/create-job.html

Configuration (Amazon Elastic Transcoder)

The important part: when you create a Pipeline,
remember to set up Notifications manually.
Choose Create a New SNS Topic
and create each of the following SNS Topics:

  • whatever-transcode-on-progress
  • whatever-transcode-on-warning
  • whatever-transcode-on-complete
  • whatever-transcode-on-error

The Topics you just created will then show up in the Amazon SNS Topics console:
https://ap-northeast-1.console.aws.amazon.com/sns/v2/home?region=ap-northeast-1#/topics

Next, create an HTTP Subscription for each Topic:

Topic ARN: arn:aws:sns:ap-northeast-1:123456789:whatever-transcode-on-progress
Protocol: HTTP
Endpoint: http://your_domain.com/dj_elastictranscoder/endpoint/

Amazon SNS will then send a request to the Endpoint URL you configured.
Manually visit the SubscribeURL contained in the request body
to complete the verification.

P.S. You can use http://request.lesschat.com/ to test this.

Amazon Elastic Transcoder will then hit your Endpoint URL with a request at each stage of the transcoding process.
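
The manual SubscribeURL step can also be automated. A sketch based on the SNS message format (`extract_subscribe_url` is a hypothetical helper; django-elastic-transcoder's endpoint already handles the regular notifications):

```python
import json

def extract_subscribe_url(raw_body):
    """Return the SubscribeURL of a SubscriptionConfirmation message, else None."""
    payload = json.loads(raw_body)
    if payload.get('Type') == 'SubscriptionConfirmation':
        return payload['SubscribeURL']
    return None

# SNS sends something like this when you first create the HTTP Subscription
body = json.dumps({
    'Type': 'SubscriptionConfirmation',
    'SubscribeURL': 'https://sns.ap-northeast-1.amazonaws.com/?Action=ConfirmSubscription&Token=xxx',
})

subscribe_url = extract_subscribe_url(body)
# issuing a GET request to subscribe_url (e.g. with urllib or requests)
# completes the verification
```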

Usage

in settings.py

AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
AWS_REGION = 'ap-northeast-1'

in views.py

from dj_elastictranscoder.transcoder import Transcoder

# this path is relative to the S3 bucket specified in the pipeline, not a full URL
key = 'media/{}'.format(track.audio_file.name)

input_name = {
    'Key': key,
}

name, ext = os.path.splitext(key)
full_preview_filename = '{}_preview.mp3'.format(name)
short_preview_filename = '{}_short_preview.mp3'.format(name)

outputs = [
    {
        'Key': full_preview_filename,
        'PresetId': '1351620000001-300040',  # System preset: Audio MP3 - 128k
    },
    {
        'Key': short_preview_filename,
        'PresetId': '1351620000001-300040',  # System preset: Audio MP3 - 128k
        'Composition': [
            {
                'TimeSpan': {
                    'StartTime': '00:00:00.000',
                    'Duration': '00:01:30.000',
                },
            },
        ],
    },
]

transcoder = Transcoder(settings.AWS_TRANSCODER_PIPELINE_ID)
transcoder.encode(input_name, outputs)
transcoder.create_job_for_object(track)

The format of outputs is documented here:
http://docs.aws.amazon.com/elastictranscoder/latest/developerguide/create-job.html

in models.py

from django.dispatch import receiver
from dj_elastictranscoder.signals import transcode_oncomplete

@receiver(transcode_oncomplete)
def encode_complete(sender, job, message, **kwargs):
    full_preview_file_info = message['outputs'][0]
    short_preview_file_info = message['outputs'][1]

    do_your_shit()

The payload your endpoint receives looks roughly like this:

{
  "Type" : "Notification",
  "MessageId" : "xxx",
  "TopicArn" : "xxx",
  "Subject" : "Amazon Elastic Transcoder has finished transcoding job xxx.",
  "Message" : "the content is a JSON string",
  "Timestamp" : "2015-09-22T07:12:56.448Z",
  "SignatureVersion" : "1",
  "Signature" : "xxx",
  "SigningCertURL" : "https://sns.ap-northeast-1.amazonaws.com/SimpleNotificationService-xxx.pem",
  "UnsubscribeURL" : "https://sns.ap-northeast-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=xxx"
}

The Message itself looks roughly like this:

{'input': {'key': 'xxx.wav'},
 'jobId': 'xxx',
 'outputs': [{'duration': 1,
               'id': '1',
               'key': 'xxx_track_short_preview.mp3',
               'presetId': '1351620000001-300040',
               'status': 'Complete',
               'statusDetail': 'The starting time plus the duration of a clip exceeded the length of the input file. Amazon Elastic Transcoder created an output file that is shorter than the specified duration.'}],
 'pipelineId': 'xxx',
 'state': 'COMPLETED',
 'userMetadata': {'output_name': 'track_short_preview'},
 'version': '2012-09-25'}
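
Since Message is itself a JSON string, the handler has to decode it a second time. A small sketch using the fields shown above (values are placeholders):

```python
import json

# what your endpoint receives, trimmed down to the interesting fields
notification = {
    'Type': 'Notification',
    'Subject': 'Amazon Elastic Transcoder has finished transcoding job xxx.',
    'Message': json.dumps({
        'state': 'COMPLETED',
        'jobId': 'xxx',
        'outputs': [
            {'key': 'xxx_track_short_preview.mp3', 'status': 'Complete'},
        ],
    }),
}

# Message is a JSON string, so decode it again to get the job details
message = json.loads(notification['Message'])
state = message['state']                     # 'COMPLETED'
first_output = message['outputs'][0]['key']  # 'xxx_track_short_preview.mp3'
```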