Helm: the package manager for Kubernetes

Helm: the package manager for Kubernetes

Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources. Think of it like apt for Kubernetes.

ref:
https://github.com/kubernetes/helm

Install

Install Helm client, helm.

$ brew install kubernetes-helm

ref:
https://docs.helm.sh/using_helm/#installing-helm

Install Helm server, tiller.

On Kubernetes v1.8+, Role-Based Access Control (RBAC) is enabled by default. So you must create a ClusterRoleBinding which specifies a ClusterRole and a ServiceAccount for tiller.

# helm/rbac.yaml
kind: ServiceAccount
apiVersion: v1
metadata:
  name: tiller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
$ kubectl apply -f helm/rbac.yaml && \
  helm init --service-account tiller

$ kubectl get all --namespace kube-system

# uninstall tiller
$ helm reset

ref:
https://docs.helm.sh/using_helm/#role-based-access-control
https://docs.helm.sh/using_helm/#installing-tiller

Usage

# show both the client and server version to make sure installation is correct
$ helm version
Client: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"}

# install a Chart
$ helm install \
--name cert-manager \
--namespace kube-system \
stable/cert-manager

# install a Chart without RBAC
$ helm install \
--name cert-manager \
--namespace kube-system \
--set rbac.create=false \
stable/cert-manager

# list Releases
$ helm ls
$ helm ls --all cert-manager

# delete a Release
$ helm del --purge cert-manager

ref:
https://docs.helm.sh/helm/#helm

Charts

In my humble opinion, it might be more flexible and realistic just to copy YAML files from a Helm chart and maintain them, instead of running a helm install.

ref:
https://github.com/kubernetes/charts

Play with GitHub Archive dataset on BigQuery

Play with GitHub Archive dataset on BigQuery

Google BigQuery is a web service that lets you do interactive analysis of very massive datasets - analyzing billions of rows in seconds.

ref:
https://www.githubarchive.org/#bigquery
https://bigquery.cloud.google.com/table/githubarchive:month.201612

See also:
http://ghtorrent.org/

Show repository informations (1)

WITH repo_info AS (
  SELECT repo.id AS id, repo.name AS name, JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.description') AS description
  FROM `githubarchive.month.2017*`
  -- FROM `githubarchive.year.2016`
  -- FROM `githubarchive.year.*`
  WHERE type = "PullRequestEvent"
)

SELECT repo_info.name, ANY_VALUE(repo_info.description) AS description
FROM repo_info
WHERE
  repo_info.description IS NOT NULL AND
  repo_info.description != ""
GROUP BY repo_info.name
ORDER BY repo_info.name

ref:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#json-functions
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#any_value

Show repository informations (2)

WITH repo_info AS (
  SELECT repo.id AS id, repo.name AS name, JSON_EXTRACT_SCALAR(payload, '$.description') AS description
  FROM `githubarchive.month.201708`
  WHERE type = "CreateEvent"
)

SELECT repo_info.name, ANY_VALUE(repo_info.description) AS description
FROM repo_info
WHERE
  repo_info.description IS NOT NULL AND
  repo_info.description != ""
GROUP BY repo_info.name
ORDER BY repo_info.name

Show repository informations (3)

SELECT name, description
FROM `ghtorrent-bq.ght_2017_04_01.projects`
WHERE
  forked_from IS NULL AND
  description IS NOT NULL AND
  description != ""

Show starred repositories by a specific user

You must use WatchEvent for starring a repo:
https://developer.github.com/v3/activity/events/types/#watchevent

SELECT repo.name, created_at
FROM TABLE_QUERY([githubarchive:month], 'LEFT(table_ID,4) IN ("2017","2016","2015")') 
WHERE type = "WatchEvent" AND actor.login = 'vinta'
GROUP BY repo.name, created_at
ORDER BY created_at DESC

Show starred repositories per user who has 10+ starred repositories

WITH stars AS (
     SELECT DISTINCT actor.login AS user, repo.name AS repo
     FROM `githubarchive.month.2017*`
     WHERE type="WatchEvent"
),
repositories_stars AS (
     SELECT repo, COUNT(*) as c FROM stars GROUP BY repo
     ORDER BY c DESC
     LIMIT 1000
),
users_stars AS (
    SELECT user, COUNT(*) as c FROM  stars
    WHERE repo IN (SELECT repo FROM repositories_stars)
    GROUP BY user
    HAVING c >= 10
    LIMIT 10000
)
SELECT user, repo FROM stars
WHERE repo IN (SELECT repo FROM repositories_stars)
AND user IN (SELECT user FROM users_stars)

ref:
https://gist.github.com/jbochi/2e8ddcc5939e70e5368326aa034a144e