hardhat-deploy: Upgradeable Contracts with Linked Libraries

Library

Assume ContractA imports LibraryA. If LibraryA contains only internal functions, its code is embedded into ContractA's bytecode at compile time, and nothing extra needs to be deployed.

If ContractA calls at least one external or public function of LibraryA, LibraryA must be deployed first, and its address is then linked into ContractA's bytecode when deploying ContractA.
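Under the hood, linking replaces placeholders in ContractA's unlinked bytecode with LibraryA's deployed address. Here is a minimal sketch of that step. It assumes solc's `linkReferences` output format (source file → library name → byte offsets). `linkBytecode` is a hypothetical helper for illustration, not an API of hardhat or hardhat-deploy:

```typescript
// Shape of solc's `evm.bytecode.linkReferences`:
// { "<source file>": { "<library name>": [{ start, length }] } }, offsets and lengths in bytes
type LinkReferences = Record<string, Record<string, { start: number; length: number }[]>>

// Hypothetical helper: splice library addresses into unlinked bytecode.
// `libraries` may be keyed by "<source file>:<library name>" or by the bare library name.
function linkBytecode(bytecode: string, linkReferences: LinkReferences, libraries: Record<string, string>): string {
    const hasPrefix = bytecode.startsWith("0x")
    let hex = hasPrefix ? bytecode.slice(2) : bytecode

    for (const [sourceName, libs] of Object.entries(linkReferences)) {
        for (const [libName, refs] of Object.entries(libs)) {
            const address = libraries[`${sourceName}:${libName}`] ?? libraries[libName]
            if (address === undefined) {
                throw new Error(`Missing address for library ${sourceName}:${libName}`)
            }
            const addressHex = address.toLowerCase().replace(/^0x/, "")
            if (!/^[0-9a-f]{40}$/.test(addressHex)) {
                throw new Error(`Invalid address for ${libName}: ${address}`)
            }
            for (const { start, length } of refs) {
                if (length !== 20) {
                    throw new Error(`Unexpected link reference length ${length} for ${libName}`)
                }
                // one byte = two hex characters, so the 20-byte placeholder spans 40 characters
                hex = hex.slice(0, start * 2) + addressHex + hex.slice((start + length) * 2)
            }
        }
    }
    return hasPrefix ? `0x${hex}` : hex
}
```

In unlinked bytecode the placeholder looks like `__$<34 hex chars>$__`. Because the library address ends up inside the contract's bytecode, redeploying the library to a new address also changes the contract's bytecode.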

ref:
https://solidity-by-example.org/library/
https://docs.soliditylang.org/en/v0.7.6/using-the-compiler.html

Foundry

In Foundry tests, Foundry will automatically deploy libraries if they have external functions, so you don't need to explicitly link them.

hardhat-deploy

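The linked library address is part of the implementation contract's bytecode. So whenever the library changes and is redeployed to a new address, hardhat-deploy sees different bytecode, deploys a new implementation, and upgrades the proxy: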

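import { DeployFunction } from "hardhat-deploy/dist/types"
import { QuoteVault } from "../typechain-types"

// multisigOwnerAddress, usdcDecimals, USDC and WETH are assumed to be defined elsewhere
// (e.g. per-network constants); they are not provided by hardhat-deploy

const func: DeployFunction = async function (hre) {
    const { deployments, ethers, getNamedAccounts } = hre
    // assumes `namedAccounts: { deployer: 0 }` in hardhat.config.ts
    const { deployer: deployerAddress } = await getNamedAccounts()
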
    // deploy library
    await deployments.deploy("PerpFacade", {
        from: deployerAddress,
        contract: "contracts/lib/PerpFacade.sol:PerpFacade",
    })
    const perpFacadeDeployment = await deployments.get("PerpFacade")

    // deploy upgradeable contract
    await deployments.deploy("QuoteVault", {
        from: deployerAddress,
        contract: "contracts/QuoteVault.sol:QuoteVault",
        proxy: {
            owner: multisigOwnerAddress, // ProxyAdmin.owner
            proxyContract: "OpenZeppelinTransparentProxy",
            viaAdminContract: "DefaultProxyAdmin",
            execute: {
                init: {
                    methodName: "initialize",
                    args: [
                        "Kantaban USDC-ETH QuoteVault",
                        "kUSDC-ETH",
                        usdcDecimals,
                        USDC,
                        WETH,
                    ],
                },
            },
        },
        libraries: {
            PerpFacade: perpFacadeDeployment.address,
        },
    })
    const quoteVaultDeployment = await deployments.get("QuoteVault")

    // getContractFactory() refuses to build a factory for a contract with unlinked libraries,
    // so the library address must be passed here as well:
    const quoteVaultFactory = await ethers.getContractFactory("contracts/QuoteVault.sol:QuoteVault", {
        libraries: {
            PerpFacade: perpFacadeDeployment.address,
        },
    })
    const quoteVault = quoteVaultFactory.attach(quoteVaultDeployment.address) as unknown as QuoteVault
    console.log(await quoteVault.decimals())
}

export default func

ref:
https://github.com/wighawag/hardhat-deploy#handling-contract-using-libraries

Deploy Ethereum RPC Provider Load Balancer with HAProxy in Kubernetes (AWS EKS)

To achieve high availability and better performance, we can put an HAProxy load balancer in front of multiple Ethereum RPC providers and automatically adjust each endpoint's traffic weight based on its latency and latest block timestamp.

ref:
https://www.haproxy.org/

Configurations

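In haproxy.cfg, the http frontend (port 8000) forwards requests to a backend named rpc-backend, which has two upstream servers: quicknode and alchemy. Each of them points to a local frontend (ports 8001 and 8002) whose backend sets the Host header and path, then forwards the request to the actual RPC endpoint over TLS. The weights in rpc-backend are the ones adjusted at runtime through the admin socket on port 9999.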

global
    log stdout format raw local0 info
    stats socket ipv4@*:9999 level admin expose-fd listeners
    stats timeout 5s

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 10s
    timeout client 60s
    timeout server 60s
    timeout http-request 60s

frontend stats
    bind *:8404
    stats enable
    stats uri /
    stats refresh 10s

frontend http
    bind *:8000
    option forwardfor
    default_backend rpc-backend

backend rpc-backend
    balance leastconn
    server quicknode 127.0.0.1:8001 weight 100
    server alchemy 127.0.0.1:8002 weight 100

frontend quicknode-frontend
    bind *:8001
    option dontlog-normal
    default_backend quicknode-backend

backend quicknode-backend
    balance roundrobin
    http-request set-header Host xxx.quiknode.pro
    http-request set-path /xxx
    server quicknode xxx.quiknode.pro:443 sni str(xxx.quiknode.pro) check-ssl ssl verify none

frontend alchemy-frontend
    bind *:8002
    option dontlog-normal
    default_backend alchemy-backend

backend alchemy-backend
    balance roundrobin
    http-request set-header Host xxx.alchemy.com
    http-request set-path /xxx
    server alchemy xxx.alchemy.com:443 sni str(xxx.alchemy.com) check-ssl ssl verify none

ref:
https://docs.haproxy.org/2.7/configuration.html
https://www.haproxy.com/documentation/hapee/latest/configuration/
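Every provider needs the same frontend/backend pair, derived from its RPC URL. Here is a sketch that renders that pair. It assumes an `https://` URL that carries the API key in its path, and `providerStanzas` is a hypothetical helper. It copies `verify none` from the config above, which disables certificate verification; you may prefer `verify required` plus a `ca-file`.

```typescript
// Hypothetical helper: render the per-provider frontend/backend pair used in haproxy.cfg
// from an RPC URL such as "https://xxx.quiknode.pro/xxx"
function providerStanzas(name: string, rpcUrl: string, localPort: number): string {
    const url = new URL(rpcUrl)
    if (url.protocol !== "https:") {
        throw new Error(`expected an https:// URL, got ${rpcUrl}`)
    }
    const host = url.hostname
    const port = url.port || "443"
    // the path usually carries the API key; query strings are not handled in this sketch
    const path = url.pathname

    return [
        `frontend ${name}-frontend`,
        `    bind *:${localPort}`,
        `    option dontlog-normal`,
        `    default_backend ${name}-backend`,
        ``,
        `backend ${name}-backend`,
        `    balance roundrobin`,
        `    http-request set-header Host ${host}`,
        `    http-request set-path ${path}`,
        `    server ${name} ${host}:${port} sni str(${host}) check-ssl ssl verify none`,
    ].join("\n")
}
```

You can generate these stanzas from the same list as config.json. That keeps the server names in rpc-backend in sync with the `serverName` values that `set weight` targets.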

Test it on local:

docker run --rm -v $PWD:/usr/local/etc/haproxy \
-p 8000:8000 \
-p 8404:8404 \
-p 9999:9999 \
-i -t --name haproxy haproxy:2.7.0

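docker exec -i -t -u 0 haproxy bash

# socat may not be included in the haproxy image, so install it first (as root, hence -u 0)
apt update
apt install socat -y

echo "show stat" | socat stdio TCP:127.0.0.1:9999
echo "set weight rpc-backend/quicknode 0" | socat stdio TCP:127.0.0.1:9999

# if the admin socket is a Unix socket file instead, e.g. `stats socket /var/lib/haproxy/haproxy.sock level admin`
echo "set weight rpc-backend/alchemy 0" | socat stdio /var/lib/haproxy/haproxy.sock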

ref:
https://www.redhat.com/sysadmin/getting-started-socat
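`show stat` prints CSV, and its header line starts with `# `. Here is a sketch that reads the current weights back, for example to check that a `set weight` took effect. The helper names are hypothetical. The column names `pxname`, `svname`, `status`, and `weight` come from HAProxy's stats CSV, and the real output has many more columns.

```typescript
// One row of `show stat` output, keyed by the CSV header names (pxname, svname, weight, ...)
type StatRow = Record<string, string>

// Parse the CSV printed by HAProxy's `show stat`; the first line is the header, prefixed with "# "
function parseShowStat(output: string): StatRow[] {
    const lines = output.split("\n").map(line => line.trim()).filter(line => line.length > 0)
    if (lines.length === 0 || !lines[0].startsWith("# ")) {
        throw new Error("unexpected `show stat` output: missing '# ' header line")
    }
    const header = lines[0].slice(2).split(",")
    return lines.slice(1).map(line => {
        const fields = line.split(",")
        const row: StatRow = {}
        header.forEach((name, i) => {
            // the header ends with a trailing comma, which yields an empty column name
            if (name !== "") row[name] = fields[i] ?? ""
        })
        return row
    })
}

// Current weight of each server in a backend, skipping the FRONTEND/BACKEND summary rows
function serverWeights(rows: StatRow[], backendName: string): Map<string, number> {
    const weights = new Map<string, number>()
    for (const row of rows) {
        if (row.pxname !== backendName || row.svname === "FRONTEND" || row.svname === "BACKEND") continue
        weights.set(row.svname, Number(row.weight))
    }
    return weights
}
```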

Healthcheck

Then comes the important part: a simple but flexible healthcheck script, called the node weighter, runs as a sidecar container. Because containers in the same Pod share a network namespace, the script can reach the HAProxy container's admin socket through 127.0.0.1:9999.

The node weighter can be written in any language. Here is a TypeScript example:

In HAProxyConnector.ts, which sets weights through the HAProxy admin socket:

import net from "net"

export interface ServerWeight {
    backendName: string
    serverName: string
    weight: number
}

export class HAProxyConnector {
    constructor(readonly adminHost = "127.0.0.1", readonly adminPort = 9999) {}

    setWeights(serverWeights: ServerWeight[]) {
        // HAProxy's CLI accepts several commands on one line, separated by semicolons.
        // Weights are sent as given; callers decide whether to scale them first.
        const command = serverWeights
            .map(server => `set weight ${server.backendName}/${server.serverName} ${server.weight}`)
            .join("; ")

        const client = net.createConnection({ host: this.adminHost, port: this.adminPort }, () => {
            console.log("HAProxyAdminSocketConnected")
        })
        client.on("error", err => {
            console.log("HAProxyAdminSocketError", err.message)
        })
        client.on("data", data => {
            console.log("HAProxyAdminSocketData")
            console.log(data.toString().trim())
        })

        // write the command line, then half-close the socket
        client.end(`${command}\n`)
    }

    // HAProxy weights range from 0 to 256, so rescale them proportionally into that range
    scaleWeights<T extends ServerWeight>(serverWeights: T[]): T[] {
        const totalWeight = serverWeights.reduce((acc, server) => acc + server.weight, 0)
        if (totalWeight === 0) {
            return serverWeights
        }

        return serverWeights.map(server => ({
            ...server,
            weight: Math.floor((server.weight / totalWeight) * 256),
        }))
    }
}

In RPCProxyWeighter.ts, which calculates weights based on custom healthcheck logic:

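import { ethers } from "ethers" // ethers v5, which provides providers.StaticJsonRpcProvider
import { HAProxyConnector } from "./connectors/HAProxyConnector"
import config from "./config.json" // requires "resolveJsonModule": true in tsconfig.json

export interface HealthInfo {
    blockNumber: number
    blockTimestamp: number
    blockTimestampDelayMsec: number
    isBlockTooOld: boolean
    latency: number
    normalizedLatency: number
    isLatencyTooHigh: boolean
}

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms))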

export interface Server {
    backendName: string
    serverName: string
    serverUrl: string
}

export interface ServerWithWeight {
    backendName: string
    serverName: string
    weight: number
    [metadata: string]: any
}

export class RPCProxyWeighter {
    protected readonly connector: HAProxyConnector

    protected readonly ADJUST_INTERVAL_SEC = 60 // 60 seconds
    protected readonly MAX_BLOCK_TIMESTAMP_DELAY_MSEC = 150 * 1000 // 150 seconds
    protected readonly MAX_LATENCY_MSEC = 3 * 1000 // 3 seconds
    protected shouldScale = false
    protected totalWeight = 0

    constructor() {
        this.connector = new HAProxyConnector(config.admin.host, config.admin.port)
    }

    async start() {
        while (true) {
            let serverWithWeights = await this.calculateWeights(config.servers)
            if (this.shouldScale) {
                serverWithWeights = this.connector.scaleWeights(serverWithWeights)
            }
            this.connector.setWeights(serverWithWeights)

            await sleep(1000 * this.ADJUST_INTERVAL_SEC)
        }
    }

    async calculateWeights(servers: Server[]) {
        this.totalWeight = 0

        const serverWithWeights = await Promise.all(
            servers.map(async server => {
                try {
                    return await this.calculateWeight(server)
                } catch (err: any) {
                    return {
                        backendName: server.backendName,
                        serverName: server.serverName,
                        weight: 0,
                    }
                }
            }),
        )

        // if all endpoints are unhealthy, overwrite weights to 100
        if (this.totalWeight === 0) {
            for (const server of serverWithWeights) {
                server.weight = 100
            }
        }

        return serverWithWeights
    }

    async calculateWeight(server: Server) {
        const healthInfo = await this.getHealthInfo(server.serverUrl)

        const serverWithWeight: ServerWithWeight = {
            ...{
                backendName: server.backendName,
                serverName: server.serverName,
                weight: 0,
            },
            ...healthInfo,
        }

        if (healthInfo.isBlockTooOld || healthInfo.isLatencyTooHigh) {
            return serverWithWeight
        }

        // normalizedLatency: the lower the better
        // blockTimestampDelayMsec: the lower the better
        // both units are milliseconds at the same scale
        // serverWithWeight.weight = 1 / healthInfo.normalizedLatency + 1 / healthInfo.blockTimestampDelayMsec

        // NOTE: if we're using `balance source` in HAProxy, the weight can only be 100% or 0%,
        // therefore, as long as the RPC endpoint is healthy, we always set the same weight
        serverWithWeight.weight = 100

        this.totalWeight += serverWithWeight.weight

        return serverWithWeight
    }

    protected async getHealthInfo(serverUrl: string): Promise<HealthInfo> {
        const provider = new ethers.providers.StaticJsonRpcProvider(serverUrl)

        // TODO: add timeout
        const start = Date.now()
        const blockNumber = await provider.getBlockNumber()
        const end = Date.now()

        const block = await provider.getBlock(blockNumber)

        const blockTimestamp = block.timestamp
        const blockTimestampDelayMsec = Math.floor(Date.now() / 1000 - blockTimestamp) * 1000
        const isBlockTooOld = blockTimestampDelayMsec >= this.MAX_BLOCK_TIMESTAMP_DELAY_MSEC

        const latency = end - start
        const normalizedLatency = this.normalizeLatency(latency)
        const isLatencyTooHigh = latency >= this.MAX_LATENCY_MSEC

        return {
            blockNumber,
            blockTimestamp,
            blockTimestampDelayMsec,
            isBlockTooOld,
            latency,
            normalizedLatency,
            isLatencyTooHigh,
        }
    }

    protected normalizeLatency(latency: number) {
        if (latency <= 40) {
            return 1
        }

        const digits = Math.floor(latency).toString().length
        const base = Math.pow(10, digits - 1)
        return Math.floor(latency / base) * base
    }
}
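getHealthInfo() still has a `TODO: add timeout`. Because calculateWeights() waits for every server via Promise.all, one slow provider can delay the whole adjustment round. Below is a minimal sketch of a timeout wrapper based on `Promise.race`. `withTimeout` is a hypothetical helper. It stops waiting, but it does not cancel the underlying HTTP request.

```typescript
// Reject if `promise` does not settle within `ms` milliseconds.
// The underlying operation keeps running; its eventual result is ignored.
function withTimeout<T>(promise: Promise<T>, ms: number, label = "operation"): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms)
    })
    // clear the timer on both paths so it does not keep the process alive
    return Promise.race([promise, timeout]).then(
        value => {
            clearTimeout(timer)
            return value
        },
        err => {
            clearTimeout(timer)
            throw err
        },
    )
}
```

For example, `await withTimeout(provider.getBlockNumber(), this.MAX_LATENCY_MSEC, "getBlockNumber")` inside getHealthInfo() would throw after 3 seconds. The existing catch in calculateWeights() would then give that server a weight of 0.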

In config.json:

Technically, we don't need this config file: the node weighter could read the actual URLs from the HAProxy admin socket instead. A JSON file that lists the URLs is simply much easier.

{
    "admin": {
        "host": "127.0.0.1",
        "port": 9999
    },
    "servers": [
        {
            "backendName": "rpc-backend",
            "serverName": "quicknode",
            "serverUrl": "https://xxx.quiknode.pro/xxx"
        },
        {
            "backendName": "rpc-backend",
            "serverName": "alchemy",
            "serverUrl": "https://xxx.alchemy.com/xxx"
        }
    ]
}
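The TypeScript types of an imported JSON file describe the file as it looked at build time. The file mounted from the ConfigMap at runtime could differ. Here is a sketch of a runtime check for config.json that surfaces a typo at startup rather than as a failed healthcheck later. `parseWeighterConfig` and the interfaces are hypothetical names.

```typescript
interface WeighterServer {
    backendName: string
    serverName: string
    serverUrl: string
}

interface WeighterConfig {
    admin: { host: string; port: number }
    servers: WeighterServer[]
}

// Validate the parsed contents of config.json and return a typed copy
function parseWeighterConfig(raw: unknown): WeighterConfig {
    const fail = (msg: string) => new Error(`invalid config.json: ${msg}`)
    const cfg = raw as { admin?: { host?: unknown; port?: unknown }; servers?: unknown } | null
    if (cfg === null || typeof cfg !== "object") throw fail("expected an object")

    const host = cfg.admin?.host
    const port = cfg.admin?.port
    if (typeof host !== "string" || typeof port !== "number" || !Number.isInteger(port)) {
        throw fail("admin.host must be a string and admin.port an integer")
    }
    if (!Array.isArray(cfg.servers) || cfg.servers.length === 0) {
        throw fail("servers must be a non-empty array")
    }

    const servers = cfg.servers.map((entry: unknown, i: number): WeighterServer => {
        const s = entry as { backendName?: unknown; serverName?: unknown; serverUrl?: unknown } | null
        const backendName = s?.backendName
        const serverName = s?.serverName
        const serverUrl = s?.serverUrl
        if (typeof backendName !== "string" || typeof serverName !== "string" || typeof serverUrl !== "string") {
            throw fail(`servers[${i}] needs string backendName, serverName and serverUrl`)
        }
        if (!/^https?:\/\//.test(serverUrl)) {
            throw fail(`servers[${i}].serverUrl must be an http(s) URL`)
        }
        return { backendName, serverName, serverUrl }
    })

    return { admin: { host, port }, servers }
}
```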

ref:
https://www.haproxy.com/documentation/hapee/latest/api/runtime-api/set-weight/
https://sleeplessbeastie.eu/2020/01/29/how-to-use-haproxy-stats-socket/

Deployments

apiVersion: v1
kind: ConfigMap
metadata:
  name: rpc-proxy-config-file
data:
  haproxy.cfg: |
    ...
  config.json: |
    ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rpc-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rpc-proxy
  template:
    metadata:
      labels:
        app: rpc-proxy
    spec:
      volumes:
        - name: rpc-proxy-config-file
          configMap:
            name: rpc-proxy-config-file
      containers:
        - name: haproxy
          image: haproxy:2.7.0
          ports:
            - containerPort: 8000
              protocol: TCP
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 256Mi
          volumeMounts:
            - name: rpc-proxy-config-file
              subPath: haproxy.cfg
              mountPath: /usr/local/etc/haproxy/haproxy.cfg
              readOnly: true
        - name: node-weighter
          image: your-node-weighter
          command: ["node", "./index.js"]
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 256Mi
          volumeMounts:
            - name: rpc-proxy-config-file
              subPath: config.json
              mountPath: /path/to/build/config.json
              readOnly: true
---
apiVersion: v1
kind: Service
metadata:
  name: rpc-proxy
spec:
  clusterIP: None
  selector:
    app: rpc-proxy
  ports:
    - name: http
      port: 8000
      targetPort: 8000

The RPC load balancer can then be accessed through http://rpc-proxy.default.svc.cluster.local:8000 inside the Kubernetes cluster.
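To callers, the load balancer looks like a single JSON-RPC endpoint. Here is a minimal client sketch that sends `eth_blockNumber` through it. It assumes the global `fetch` in Node 18+. `getBlockNumber` and `FetchLike` are hypothetical names, and `fetch` is passed in so tests can stub it.

```typescript
// Minimal structural type for the part of the Fetch API used here
type FetchLike = (
    url: string,
    init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

// Send a single JSON-RPC eth_blockNumber request and return the block number
async function getBlockNumber(rpcUrl: string, fetchFn: FetchLike): Promise<number> {
    const response = await fetchFn(rpcUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_blockNumber", params: [] }),
    })
    if (!response.ok) {
        throw new Error(`HTTP ${response.status} from ${rpcUrl}`)
    }
    const payload = (await response.json()) as { result?: string; error?: { message: string } }
    if (payload.error) {
        throw new Error(`JSON-RPC error: ${payload.error.message}`)
    }
    if (typeof payload.result !== "string") {
        throw new Error("missing result in JSON-RPC response")
    }
    // eth_blockNumber returns a hex quantity such as "0x10d4f"
    return parseInt(payload.result, 16)
}
```

Inside the cluster, a caller would use `await getBlockNumber("http://rpc-proxy.default.svc.cluster.local:8000", fetch)`.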

ref:
https://www.containiq.com/post/kubernetes-sidecar-container
https://hub.docker.com/_/haproxy

Deploy graph-node (The Graph) in Kubernetes (AWS EKS)

graph-node is open-source software that indexes blockchain data, also known as an indexer. Running a self-hosted graph node can be quite expensive, though. In this article, we deploy a self-hosted graph node on Amazon Elastic Kubernetes Service (EKS).

We cover two approaches to deploying a self-hosted graph node:

  1. A single graph node with a single PostgreSQL database
  2. A graph node cluster with a primary-secondary PostgreSQL cluster
    • A graph node cluster consists of one index node and multiple query nodes
    • For the database, we will simply use a Multi-AZ DB cluster on AWS RDS

ref:
https://github.com/graphprotocol/graph-node

Create a PostgreSQL Database on AWS RDS

Hardware requirements for running a graph-node:
https://thegraph.com/docs/en/network/indexing/#what-are-the-hardware-requirements
https://docs.thegraph.academy/official-docs/indexer/testnet/graph-protocol-testnet-baremetal/1_architectureconsiderations

A Single DB Instance

We use the following settings on staging.

  • Version: PostgreSQL 13.7-R1
  • Template: Dev/Test
  • Deployment option: Single DB instance
  • DB instance identifier: graph-node
  • Master username: graph_node
  • Auto generate a password: Yes
  • DB instance class: db.t3.medium (2 vCPU 4G RAM)
  • Storage type: gp2
  • Allocated storage: 500 GB
  • Enable storage autoscaling: No
  • Compute resource: Don’t connect to an EC2 compute resource
  • Network type: IPv4
  • VPC: eksctl-perp-staging-cluster/VPC
  • DB Subnet group: default-vpc
  • Public access: Yes
  • VPC security group: graph-node
  • Availability Zone: ap-northeast-1d
  • Initial database name: graph_node

A Multi-AZ DB Cluster

We use the following settings on production.

  • Version: PostgreSQL 13.7-R1
  • Template: Production
  • Deployment option: Multi-AZ DB Cluster
  • DB instance identifier: graph-node-cluster
  • Master username: graph_node
  • Auto generate a password: Yes
  • DB instance class: db.m6gd.2xlarge (8 vCPU 32G RAM)
  • Storage type: io1
  • Allocated storage: 500 GB
  • Provisioned IOPS: 1000
  • Enable storage autoscaling: No
  • Compute resource: Don’t connect to an EC2 compute resource
  • VPC: eksctl-perp-production-cluster/VPC
  • DB Subnet group: default-vpc
  • Public access: Yes
  • VPC security group: graph-node

Unfortunately, AWS currently does not offer Reserved Instances (RIs) for Multi-AZ DB clusters. If cost is a big concern, use a "Multi-AZ DB instance" or a "Single DB instance" deployment instead.

RDS Remote Access

You can test remote access to the database. Make sure the security group's inbound rules allow port 5432 (PostgreSQL).

brew install postgresql

psql --host=YOUR_RDS_ENDPOINT --port=5432 --username=graph_node --password --dbname=postgres
# or create the graph_node database if it doesn't exist yet
createdb -h YOUR_RDS_ENDPOINT -p 5432 -U graph_node graph_node

Create a Dedicated EKS Node Group

This step is optional.

eksctl --profile=perp create nodegroup \
--cluster perp-production \
--region ap-southeast-1 \
--name "managed-graph-node-m5-xlarge" \
--node-type "m5.xlarge" \
--nodes 3 \
--nodes-min 3 \
--nodes-max 3 \
--managed \
--asg-access \
--alb-ingress-access

Deploy graph-node in Kubernetes

Deployments for a Single DB Instance

apiVersion: v1
kind: Service
metadata:
  name: graph-node
spec:
  clusterIP: None
  selector:
    app: graph-node
  ports:
    - name: jsonrpc
      port: 8000
      targetPort: 8000
    - name: websocket
      port: 8001
      targetPort: 8001
    - name: admin
      port: 8020
      targetPort: 8020
    - name: index-node
      port: 8030
      targetPort: 8030
    - name: metrics
      port: 8040
      targetPort: 8040
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: graph-node-config
data:
  # https://github.com/graphprotocol/graph-node/blob/master/docs/environment-variables.md
  GRAPH_LOG: info
  EXPERIMENTAL_SUBGRAPH_VERSION_SWITCHING_MODE: synced
---
apiVersion: v1
kind: Secret
metadata:
  name: graph-node-secret
type: Opaque
stringData: # plain-text values; `data:` would require base64-encoded values
  postgres_host: xxx
  postgres_user: xxx
  postgres_db: xxx
  postgres_pass: xxx
  ipfs: https://ipfs.network.thegraph.com
  ethereum: optimism:https://YOUR_RPC_ENDPOINT_1 optimism:https://YOUR_RPC_ENDPOINT_2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: graph-node
spec:
  replicas: 1
  selector:
    matchLabels:
      app: graph-node
  serviceName: graph-node
  template:
    metadata:
      labels:
        app: graph-node
      annotations:
        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
    spec:
      containers:
        # https://github.com/graphprotocol/graph-node/releases
        # https://hub.docker.com/r/graphprotocol/graph-node
        - name: graph-node
          image: graphprotocol/graph-node:v0.29.0
          envFrom:
            - secretRef:
                name: graph-node-secret
            - configMapRef:
                name: graph-node-config
          ports:
            - containerPort: 8000
              protocol: TCP
            - containerPort: 8001
              protocol: TCP
            - containerPort: 8020
              protocol: TCP
            - containerPort: 8030
              protocol: TCP
            - containerPort: 8040
              protocol: TCP
          resources:
            requests:
              cpu: 2000m
              memory: 4G
            limits:
              cpu: 4000m
              memory: 4G

By default, Kubernetes Secrets are stored unencrypted in the API server's underlying data store (etcd), and anyone with API access can retrieve or modify them, so they are not really secret. Instead of storing sensitive data in Secrets, you might want to use the Secrets Store CSI Driver.

ref:
https://github.com/graphprotocol/graph-node/blob/master/docs/environment-variables.md
https://hub.docker.com/r/graphprotocol/graph-node

Deployments for a Multi-AZ DB Cluster

There are two types of nodes in a graph node cluster:

  • Index Node: Only indexing data from the blockchain, not serving queries at all
  • Query Node: Only serving GraphQL queries, not indexing data at all

Indexing subgraphs doesn't require much CPU or memory, but serving queries does, especially when GraphQL query caching is enabled.

Index Node

Technically, we could further split an index node into Ingestor and Indexer: the former fetches blockchain data from RPC providers periodically, and the latter indexes entities based on mappings. That's another story though.

apiVersion: v1
kind: ConfigMap
metadata:
  name: graph-node-cluster-index-config-file
data:
  config.toml: |
    [store]
    [store.primary]
    connection = "postgresql://USER:PASSWORD@RDS_WRITER_ENDPOINT/DB"
    weight = 1
    pool_size = 2

    [chains]
    ingestor = "graph-node-cluster-index-0"
    [chains.optimism]
    shard = "primary"
    provider = [
      { label = "optimism-rpc-proxy", url = "http://rpc-proxy-for-graph-node.default.svc.cluster.local:8000", features = ["archive"] }
      # { label = "optimism-quicknode", url = "https://YOUR_RPC_ENDPOINT_1", features = ["archive"] },
      # { label = "optimism-alchemy", url = "https://YOUR_RPC_ENDPOINT_2", features = ["archive"] },
      # { label = "optimism-infura", url = "https://YOUR_RPC_ENDPOINT_3", features = ["archive"] }
    ]

    [deployment]
    [[deployment.rule]]
    indexers = [ "graph-node-cluster-index-0" ]
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: graph-node-cluster-index-config-env
data:
  # https://github.com/graphprotocol/graph-node/blob/master/docs/environment-variables.md
  GRAPH_LOG: "info"
  GRAPH_KILL_IF_UNRESPONSIVE: "true"
  ETHEREUM_POLLING_INTERVAL: "100"
  ETHEREUM_BLOCK_BATCH_SIZE: "10"
  GRAPH_STORE_WRITE_QUEUE: "50"
  EXPERIMENTAL_SUBGRAPH_VERSION_SWITCHING_MODE: "synced"
---
apiVersion: v1
kind: Service
metadata:
  name: graph-node-cluster-index
spec:
  clusterIP: None
  selector:
    app: graph-node-cluster-index
  ports:
    - name: jsonrpc
      port: 8000
      targetPort: 8000
    - name: websocket
      port: 8001
      targetPort: 8001
    - name: admin
      port: 8020
      targetPort: 8020
    - name: index-node
      port: 8030
      targetPort: 8030
    - name: metrics
      port: 8040
      targetPort: 8040
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: graph-node-cluster-index
spec:
  replicas: 1
  selector:
    matchLabels:
      app: graph-node-cluster-index
  serviceName: graph-node-cluster-index
  template:
    metadata:
      labels:
        app: graph-node-cluster-index
      annotations:
        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: alpha.eksctl.io/nodegroup-name
                    operator: In
                    values:
                      - managed-graph-node-m5-xlarge
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      # should schedule index node to the zone that writer instance is located
                      - ap-southeast-1a
      volumes:
        - name: graph-node-cluster-index-config-file
          configMap:
            name: graph-node-cluster-index-config-file
      containers:
        # https://github.com/graphprotocol/graph-node/releases
        # https://hub.docker.com/r/graphprotocol/graph-node
        - name: graph-node-cluster-index
          image: graphprotocol/graph-node:v0.29.0
          command:
            [
              "/bin/sh",
              "-c",
              'graph-node --node-id $HOSTNAME --config "/config.toml" --ipfs "https://ipfs.network.thegraph.com"',
            ]
          envFrom:
            - configMapRef:
                name: graph-node-cluster-index-config-env
          ports:
            - containerPort: 8000
              protocol: TCP
            - containerPort: 8001
              protocol: TCP
            - containerPort: 8020
              protocol: TCP
            - containerPort: 8030
              protocol: TCP
            - containerPort: 8040
              protocol: TCP
          resources:
            requests:
              cpu: 1000m
              memory: 1G
            limits:
              cpu: 2000m
              memory: 1G
          volumeMounts:
            - name: graph-node-cluster-index-config-file
              subPath: config.toml
              mountPath: /config.toml
              readOnly: true

ref:
https://github.com/graphprotocol/graph-node/blob/master/docs/config.md
https://github.com/graphprotocol/graph-node/blob/master/docs/environment-variables.md

The key factors in how fast subgraphs sync/index are:

  1. The latency of the RPC provider
  2. The write throughput of the database

I didn't find any graph-node configs or environment variables that noticeably speed up syncing. If you know of any, please tell me.

ref:
https://github.com/graphprotocol/graph-node/issues/3756

If you're interested in building an RPC proxy that health-checks block numbers and latency, see Deploy Ethereum RPC Provider Load Balancer with HAProxy in Kubernetes (AWS EKS). graph-node itself cannot detect when an RPC provider's blocks are delayed.

Query Nodes

The most important setting is DISABLE_BLOCK_INGESTOR: "true", which stops the node from ingesting blocks and effectively makes it a query node.

apiVersion: v1
kind: ConfigMap
metadata:
  name: graph-node-cluster-query-config-env
data:
  # https://github.com/graphprotocol/graph-node/blob/master/docs/environment-variables.md
  DISABLE_BLOCK_INGESTOR: "true" # this node won't ingest blockchain data
  GRAPH_LOG_QUERY_TIMING: "gql"
  GRAPH_GRAPHQL_QUERY_TIMEOUT: "600"
  GRAPH_QUERY_CACHE_BLOCKS: "60"
  GRAPH_QUERY_CACHE_MAX_MEM: "2000" # the actual used memory could be 3x
  GRAPH_QUERY_CACHE_STALE_PERIOD: "100"
  EXPERIMENTAL_SUBGRAPH_VERSION_SWITCHING_MODE: "synced"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: graph-node-cluster-query-config-file
data:
  config.toml: |
    [store]
    [store.primary]
    connection = "postgresql://USER:PASSWORD@RDS_WRITER_ENDPOINT/DB" # connect to RDS writer instance
    weight = 0
    pool_size = 2
    [store.primary.replicas.repl1]
    connection = "postgresql://USER:PASSWORD@RDS_READER_ENDPOINT/DB" # connect to RDS reader instances (multiple)
    weight = 1
    pool_size = 100

    [chains]
    ingestor = "graph-node-cluster-index-0"

    [deployment]
    [[deployment.rule]]
    indexers = [ "graph-node-cluster-index-0" ]

    [general]
    query = "graph-node-cluster-query*"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: graph-node-cluster-query
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  minReadySeconds: 5
  selector:
    matchLabels:
      app: graph-node-cluster-query
  template:
    metadata:
      labels:
        app: graph-node-cluster-query
    spec:
      terminationGracePeriodSeconds: 40
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: alpha.eksctl.io/nodegroup-name
                    operator: In
                    values:
                      - managed-graph-node-m5-xlarge
      volumes:
        - name: graph-node-cluster-query-config-file
          configMap:
            name: graph-node-cluster-query-config-file
      containers:
        # https://github.com/graphprotocol/graph-node/releases
        # https://hub.docker.com/r/graphprotocol/graph-node
        - name: graph-node-cluster-query
          image: graphprotocol/graph-node:v0.29.0
          command:
            [
              "/bin/sh",
              "-c",
              'graph-node --node-id $HOSTNAME --config "/config.toml" --ipfs "https://ipfs.network.thegraph.com"',
            ]
          envFrom:
            - configMapRef:
                name: graph-node-cluster-query-config-env
          ports:
            - containerPort: 8000
              protocol: TCP
            - containerPort: 8001
              protocol: TCP
            - containerPort: 8030
              protocol: TCP
          resources:
            requests:
              cpu: 1000m
              memory: 8G
            limits:
              cpu: 2000m
              memory: 8G
          volumeMounts:
            - name: graph-node-cluster-query-config-file
              subPath: config.toml
              mountPath: /config.toml
              readOnly: true

ref:
https://github.com/graphprotocol/graph-node/blob/master/docs/config.md
https://github.com/graphprotocol/graph-node/blob/master/docs/environment-variables.md
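Both node types start with `--node-id $HOSTNAME`, so role assignment depends on pod names. A StatefulSet pod is named `<statefulset>-<ordinal>`, which gives `graph-node-cluster-index-0`. A Deployment pod is named `<deployment>-<pod-template-hash>-<random suffix>`, and that name has to match the `query` regex. Below is a sketch of that mapping. `rolesFor` is a hypothetical helper. The matching rules it uses (exact match for `ingestor` and `indexers`, unanchored regex for `[general] query`) are our reading of graph-node's config docs, not its actual code.

```typescript
// Subset of the graph-node config.toml values used by this cluster
interface GraphNodeRoles {
    ingestor: string // [chains] ingestor
    indexers: string[] // [[deployment.rule]] indexers
    queryRegex: string // [general] query (empty if the section is absent)
}

// Hypothetical helper: which roles a node plays, given the node id it was started with
function rolesFor(nodeId: string, config: GraphNodeRoles) {
    return {
        ingestsBlocks: nodeId === config.ingestor,
        indexesSubgraphs: config.indexers.includes(nodeId),
        // assumed: an unanchored regular-expression match against the node id
        queryOnly: config.queryRegex !== "" && new RegExp(config.queryRegex).test(nodeId),
    }
}
```

With the values above, `graph-node-cluster-index-0` ingests and indexes. A pod such as `graph-node-cluster-query-78b467665d-tqw7g` matches `graph-node-cluster-query*` and only serves queries.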

It's also strongly recommended to mark entity types that are never updated after creation as immutable with @entity(immutable: true). Immutable entities are much faster to write and to query, so use them whenever possible. In our case, query time dropped by 80%.

ref:
https://thegraph.com/docs/en/developing/creating-a-subgraph/#defining-entities

Set up an Ingress for graph-node

WebSocket connections are inherently sticky. If the client requests a connection upgrade to WebSockets, the target that returns an HTTP 101 status code to accept the upgrade is the target used for the WebSocket connection. After the upgrade is complete, cookie-based stickiness is not used, so you don't need to enable stickiness on the ALB.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: graph-node-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-FS-1-2-Res-2020-10
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:XXX
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=false
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=3600,deletion_protection.enabled=true
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/actions.subgraph-api: >
      {"type": "forward", "forwardConfig": {"targetGroups": [
        {"serviceName": "graph-node-cluster-query", "servicePort": 8000, "weight": 100}
      ]}}
    alb.ingress.kubernetes.io/actions.subgraph-ws: >
      {"type": "forward", "forwardConfig": {"targetGroups": [
        {"serviceName": "graph-node-cluster-query", "servicePort": 8001, "weight": 100}
      ]}}
    alb.ingress.kubernetes.io/actions.subgraph-hc: >
      {"type": "forward", "forwardConfig": {"targetGroups": [
        {"serviceName": "graph-node-cluster-query", "servicePort": 8030, "weight": 100}
      ]}}
spec:
  rules:
    - host: "subgraph-api.example.com"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: subgraph-api
                port:
                  name: use-annotation
    - host: "subgraph-ws.example.com"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: subgraph-ws
                port:
                  name: use-annotation
    - host: "subgraph-hc.example.com"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: subgraph-hc
                port:
                  name: use-annotation
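The subgraph-hc host forwards to port 8030, where graph-node serves its index-node status API. Below is a sketch of a sync-lag probe built on it. It assumes the status endpoint is GraphQL at `/graphql` on that port and that it exposes `indexingStatuses` with the fields queried here; verify both against your graph-node version. `indexingLag` and `FetchLike` are hypothetical names.

```typescript
// Minimal structural type for the part of the Fetch API used here (global fetch in Node 18+)
type FetchLike = (
    url: string,
    init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

interface ChainStatus {
    network: string
    chainHeadBlock: { number: string | number } | null
    latestBlock: { number: string | number } | null
}

interface IndexingStatus {
    subgraph: string
    synced: boolean
    health: string
    chains: ChainStatus[]
}

const STATUS_QUERY = `{
  indexingStatuses {
    subgraph
    synced
    health
    chains { network chainHeadBlock { number } latestBlock { number } }
  }
}`

// Report, per deployment, how many blocks the indexed head lags behind the chain head
async function indexingLag(statusUrl: string, fetchFn: FetchLike) {
    const response = await fetchFn(statusUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query: STATUS_QUERY }),
    })
    if (!response.ok) throw new Error(`HTTP ${response.status} from ${statusUrl}`)

    const payload = (await response.json()) as { data?: { indexingStatuses: IndexingStatus[] } }
    const statuses = payload.data?.indexingStatuses ?? []
    return statuses.map(status => {
        const chain = status.chains[0]
        // block numbers may be serialized as strings; a missing block yields NaN
        const head = Number(chain?.chainHeadBlock?.number ?? NaN)
        const latest = Number(chain?.latestBlock?.number ?? NaN)
        return { subgraph: status.subgraph, health: status.health, synced: status.synced, lagBlocks: head - latest }
    })
}
```

A monitor could call `indexingLag("https://subgraph-hc.example.com/graphql", fetch)` periodically and alert when `lagBlocks` grows or `health` is not `healthy`.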

Deploy a Subgraph

# keep the port-forward running in a separate terminal
kubectl port-forward service/graph-node-cluster-index 8020:8020

npx graph create your_org/your_subgraph --node http://127.0.0.1:8020
npx graph deploy --node http://127.0.0.1:8020 your_org/your_subgraph

ref:
https://github.com/graphprotocol/graph-cli

Here are also some useful commands for maintenance tasks:

kubectl logs -l app=graph-node-cluster-query -f | grep "query_time_ms"

# show total pool connections according to config file
graphman --config config.toml config pools graph-node-cluster-query-zone-a-78b467665d-tqw7g

# show info
graphman --config config.toml info QmbCm38nL6C7v3FS5snzjnQyDMvV8FkM72C3o5RCfGpM9P

# reassign a deployment to an index node
graphman --config config.toml reassign QmbCm38nL6C7v3FS5snzjnQyDMvV8FkM72C3o5RCfGpM9P graph-node-cluster-index-pending-0
graphman --config config.toml reassign QmbCm38nL6C7v3FS5snzjnQyDMvV8FkM72C3o5RCfGpM9P graph-node-cluster-index-0

# stop indexing a deployment
graphman --config config.toml unassign QmbCm38nL6C7v3FS5snzjnQyDMvV8FkM72C3o5RCfGpM9P

# remove unused deployments
graphman --config config.toml unused record
graphman --config config.toml unused remove -d QmdoXB4zJVD5n38omVezMaAGEwsubhKdivgUmpBCUxXKDh

ref:
https://github.com/graphprotocol/graph-node/blob/master/docs/graphman.md

The Graph: Subgraph and GRT Token

The Graph is a protocol for indexing and querying blockchain data. Currently, The Graph has two versions: the legacy version and the decentralized version. The legacy version is a centralized, managed service hosted by The Graph, and it will eventually be shut down. The decentralized version (aka The Graph Network) consists of 4 major roles: Developers, Indexers, Curators, and Delegators.

ref:
https://thegraph.com/

Video from Finematics
https://finematics.com/the-graph-explained/

Terminologies

Define a Subgraph

A subgraph defines one or more entities of indexed data, that is, what kind of blockchain data you want to index for faster querying. Once deployed, subgraphs can be queried by dApps to fetch blockchain data that powers their frontend interfaces. Basically, a subgraph is like a database, and an entity (a GraphQL type) is like a table in an RDBMS.

A subgraph definition consists of 3 files:

  • subgraph.yaml: a YAML file that defines the subgraph manifest and metadata.
  • schema.graphql: a GraphQL schema that defines what data (entities) are stored, and how to query it via GraphQL.
  • mappings.ts: AssemblyScript code that translates blockchain data (events, blocks, or contract calls) to GraphQL entities.

GraphQL Schema

First, we need to design our GraphQL entity schemas (the data model). The design should mainly depend on how you want to query the data, rather than on how the data is emitted from the blockchain. GraphQL schemas are defined using the GraphQL schema language.

Here're some notes about subgraph's GraphQL schema:

An example of schema.graphql:

type Market @entity {
  id: ID!
  baseToken: Bytes!
  pool: Bytes!
  feeRatio: BigInt!
  tradingFee: BigDecimal!
  tradingVolume: BigDecimal!
  blockNumberAdded: BigInt!
  timestampAdded: BigInt!
}

type Trader @entity {
  id: ID!
  realizedPnl: BigDecimal!
  fundingPayment: BigDecimal!
  tradingFee: BigDecimal!
  badDebt: BigDecimal!
  totalPnl: BigDecimal!
  positions: [Position!]! @derivedFrom(field: "traderRef")
}

type Position @entity {
  id: ID!
  trader: Bytes!
  baseToken: Bytes!
  positionSize: BigDecimal!
  openNotional: BigDecimal!
  openPrice: BigDecimal!
  realizedPnl: BigDecimal!
  tradingFee: BigDecimal!
  badDebt: BigDecimal!
  totalPnl: BigDecimal!
  traderRef: Trader!
}

ref:
https://thegraph.com/docs/developer/create-subgraph-hosted#the-graphql-schema
https://thegraph.com/docs/developer/graphql-api
https://graphql.org/learn/schema/
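In the schema above, Trader.positions carries @derivedFrom(field: "traderRef"), which makes it a virtual field: nothing is stored on the Trader entity itself, and the list is derived from every Position whose traderRef points back to that trader. The TypeScript sketch below models that reverse lookup over plain in-memory arrays. The row types, the derivePositions helper, and the sample ids are hypothetical illustrations, not how graph-node implements it.

```typescript
// Hypothetical, simplified shapes of stored entities (only the fields needed here).
interface PositionRow {
    id: string
    traderRef: string // stores the Trader id, like the `traderRef: Trader!` field
    baseToken: string
}

interface TraderRow {
    id: string // no `positions` field is stored; it is derived on read
}

// Reverse lookup: collect every Position whose traderRef points at this trader.
function derivePositions(trader: TraderRow, rows: PositionRow[]): PositionRow[] {
    return rows.filter((p) => p.traderRef === trader.id)
}

const positions: PositionRow[] = [
    { id: "0xaaa-0x111", traderRef: "0xaaa", baseToken: "0x111" },
    { id: "0xaaa-0x222", traderRef: "0xaaa", baseToken: "0x222" },
    { id: "0xbbb-0x111", traderRef: "0xbbb", baseToken: "0x111" },
]

const alice: TraderRow = { id: "0xaaa" }
// logs the ids of the two positions whose traderRef is 0xaaa
console.log(derivePositions(alice, positions).map((p) => p.id))
```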

It's also worth noting that The Graph supports time-travel queries, so we can query the state of our entities at an arbitrary block in the past:

{
  positions(
    block: {
      number: 1234567
    },
    where: {
      trader: "0x5abfec25f74cd88437631a7731906932776356f9"
    }
  ) {
    id
    trader
    baseToken
    positionSize
    openNotional
    openPrice
    realizedPnl
    tradingFee
    badDebt
    totalPnl
  }
}

ref:
https://thegraph.com/docs/developer/graphql-api#time-travel-queries
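To run a time-travel query like the one above from a script, the GraphQL document is sent as a JSON body of the form {"query": "..."} in an HTTP POST to the subgraph's query endpoint. The sketch below only builds that body for a given trader and block number, so it has no network dependency; buildTimeTravelBody is a hypothetical helper name, and its field list is trimmed to fields defined on Position in the schema example. The inputs are validated so that only hex digits and a plain integer ever get interpolated into the query text.

```typescript
// Builds the JSON body for a time-travel query against the Position entity.
// `buildTimeTravelBody` is a hypothetical helper for illustration.
function buildTimeTravelBody(trader: string, blockNumber: number): string {
    // Reject anything that is not a 20-byte hex address.
    if (!/^0x[0-9a-fA-F]{40}$/.test(trader)) {
        throw new Error(`invalid address: ${trader}`)
    }
    // Reject NaN, Infinity, fractions, and negative numbers.
    if (!isFinite(blockNumber) || Math.floor(blockNumber) !== blockNumber || blockNumber < 0) {
        throw new Error(`invalid block number: ${blockNumber}`)
    }
    const query = `{
  positions(block: { number: ${blockNumber} }, where: { trader: "${trader.toLowerCase()}" }) {
    id
    trader
    baseToken
    positionSize
    openNotional
    realizedPnl
  }
}`
    return JSON.stringify({ query })
}

const body = buildTimeTravelBody("0x5abfec25f74cd88437631a7731906932776356f9", 1234567)
console.log(body)
```

The resulting string can then be POSTed with a Content-Type: application/json header using whatever HTTP client the script already has.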

Subgraph Manifest

Second, we must provide a manifest that tells The Graph which contracts to listen to and which of their events to index. The manifest also points to a mapping file that instructs The Graph on how to transform blockchain data into GraphQL entities.

A template file of subgraph.yaml:

specVersion: 0.0.2
description: Test Subgraph
repository: https://github.com/vinta/my-subgraph
schema:
  file: ./schema.graphql
dataSources:
  - kind: ethereum/contract
    name: ClearingHouse
    network: {{ network }}
    source:
      abi: ClearingHouse
      address: "{{ clearingHouse.address }}"
      startBlock: {{ clearingHouse.startBlock }}
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.4
      language: wasm/assemblyscript
      file: ./src/mappings/clearingHouse.ts
      entities:
        - Protocol
        - Market
        - Trader
        - Position
      abis:
        - name: ClearingHouse
          file: ./abis/ClearingHouse.json
      eventHandlers:
        - event: PoolAdded(indexed address,indexed uint24,indexed address)
          handler: handlePoolAdded
        - event: PositionChanged(indexed address,indexed address,int256,int256,uint256,int256,uint256)
          handler: handlePositionChanged

Since we usually deploy our contracts to multiple chains (at least one for mainnet and one for testnet), we can use a template engine (like mustache.js) to generate a manifest for each network.

$ cat configs/arbitrum-rinkeby.json
{
    "network": "arbitrum-rinkeby",
    "clearingHouse": {
        "address": "0xYourContractAddress",
        "startBlock": 1234567
    }
}

# generate the subgraph manifest for a specific network (each command overwrites subgraph.yaml)
$ mustache configs/arbitrum-rinkeby.json subgraph.template.yaml > subgraph.yaml
$ mustache configs/arbitrum-one.json subgraph.template.yaml > subgraph.yaml
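What mustache does in this step is plain variable substitution: each {{ network }} or {{ clearingHouse.address }} tag is replaced with the matching, possibly dotted, key from the JSON config. The TypeScript sketch below is a simplified stand-in for that step, not mustache.js itself: it supports only dotted variable lookup (no sections, partials, or HTML escaping). As a design choice, it throws on a missing key instead of rendering an empty string the way mustache does, so a typo in a config file cannot silently produce an empty address. The function names and sample values are hypothetical.

```typescript
type Config = { [key: string]: unknown }

// Resolve a dotted path such as "clearingHouse.address" inside the config object.
function lookup(config: Config, path: string): unknown {
    let current: unknown = config
    const parts = path.split(".")
    for (let i = 0; i < parts.length; i++) {
        if (current === null || typeof current !== "object") return undefined
        current = (current as Config)[parts[i]]
    }
    return current
}

// Replace every {{ key }} tag; fail loudly on unknown keys (mustache would emit "").
function renderManifest(template: string, config: Config): string {
    return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_tag: string, path: string) => {
        const value = lookup(config, path)
        if (value === undefined) throw new Error(`missing config value: ${path}`)
        return String(value)
    })
}

const template = [
    "network: {{ network }}",
    "source:",
    '  address: "{{ clearingHouse.address }}"',
    "  startBlock: {{ clearingHouse.startBlock }}",
].join("\n")

const config: Config = {
    network: "arbitrum-rinkeby",
    clearingHouse: { address: "0xYourContractAddress", startBlock: 1234567 },
}

console.log(renderManifest(template, config))
```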

It's worth noting that The Graph Legacy (the Hosted Service) supports most common networks, for instance, mainnet, rinkeby, bsc, matic, arbitrum-one, and optimism. However, at the time of writing, The Graph Network (the decentralized version) only supports Ethereum mainnet and rinkeby.

You could find the full list of supported networks on the document:
https://thegraph.com/docs/developer/create-subgraph-hosted#from-an-existing-contract

Mappings

Mappings are written in AssemblyScript and will be compiled to WebAssembly (WASM) when deploying. AssemblyScript's syntax is similar to TypeScript, but it's actually a completely different language.

For each event handler defined in subgraph.yaml under mapping.eventHandlers, we must create an exported function with the same name as its handler. What we do in an event handler is basically:

  1. Creating new entities or loading existing ones by id.
  2. Updating fields of entities from a blockchain event.
  3. Saving entities to The Graph.
    • It's not necessary to load an entity before updating it. It's fine to simply create the entity, set properties, then save. If the entity already exists, changes will be merged automatically.
import { BigInt } from "@graphprotocol/graph-ts"
import { PoolAdded, PositionChanged } from "../../generated/ClearingHouse/ClearingHouse"
// getOrCreateProtocol/Market/Trader/Position, BI_ONE, BD_ZERO, abs, and Side are project-specific helpers (not shown)

export function handlePoolAdded(event: PoolAdded): void {
    // upsert Protocol
    const protocol = getOrCreateProtocol()
    protocol.publicMarketCount = protocol.publicMarketCount.plus(BI_ONE)

    // upsert Market
    const market = getOrCreateMarket(event.params.baseToken)
    market.baseToken = event.params.baseToken
    market.pool = event.params.pool
    market.feeRatio = BigInt.fromI32(event.params.feeRatio)
    market.blockNumberAdded = event.block.number
    market.timestampAdded = event.block.timestamp

    // commit changes
    protocol.save()
    market.save()
}

export function handlePositionChanged(event: PositionChanged): void {
    // hypothetical project helper (not shown): converts the raw int256/uint256 event params into BigDecimal values
    const swappedEvent = parsePositionChangedEvent(event)

    // upsert Market
    const market = getOrCreateMarket(event.params.baseToken)
    market.tradingFee = market.tradingFee.plus(swappedEvent.fee)
    market.tradingVolume = market.tradingVolume.plus(abs(swappedEvent.exchangedPositionNotional))
    ...

    // upsert Trader
    const trader = getOrCreateTrader(event.params.trader)
    trader.tradingFee = trader.tradingFee.plus(swappedEvent.fee)
    trader.realizedPnl = trader.realizedPnl.plus(swappedEvent.realizedPnl)
    ...

    // upsert Position
    const position = getOrCreatePosition(event.params.trader, event.params.baseToken)
    const side = swappedEvent.exchangedPositionSize.ge(BD_ZERO) ? Side.BUY : Side.SELL
    ...

    // commit changes
    market.save()
    trader.save()
    position.save()
}
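The load-or-create, update, and save cycle used by these handlers, including the rule that saving an entity that already exists merges the changes into it, can be modeled with a small in-memory store. This TypeScript sketch is only a mental model of that behavior; EntityStore, getOrCreate, and the field names are hypothetical, and it is not the graph-ts store API.

```typescript
type Fields = { [field: string]: string | number }

class EntityStore {
    private rows: { [id: string]: Fields } = {}

    // Load a copy of an existing entity by id, or start a fresh one from defaults.
    getOrCreate(id: string, defaults: Fields): Fields {
        const existing = this.rows[id]
        const source = existing !== undefined ? existing : defaults
        const entity: Fields = {}
        Object.keys(source).forEach((k) => { entity[k] = source[k] })
        return entity
    }

    // Saving merges the given fields into whatever is already stored under that id.
    save(id: string, changes: Fields): void {
        const merged: Fields = this.rows[id] !== undefined ? this.rows[id] : {}
        Object.keys(changes).forEach((k) => { merged[k] = changes[k] })
        this.rows[id] = merged
    }

    get(id: string): Fields | undefined {
        return this.rows[id]
    }
}

const store = new EntityStore()

// First event: the market does not exist yet, so the defaults are used.
const market = store.getOrCreate("0xbase", { tradingVolume: 0, feeRatio: 0 })
market.feeRatio = 3000
store.save("0xbase", market)

// Later event: save a partial update without loading first; other fields survive.
store.save("0xbase", { tradingVolume: 150 })

console.log(store.get("0xbase")) // feeRatio from the first save is kept; tradingVolume is now 150
```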

We can also read contract state and call contract functions at the current block (the block being indexed). However, calling contract functions is limited to what @graphprotocol/graph-ts offers, which is not as powerful as libraries like ethers.js. And no, we cannot import ethers.js in mappings, since mappings are written in AssemblyScript. Contract calls are also quite "expensive" in terms of indexing performance: in extreme cases, some indexers might avoid syncing a very slow subgraph, or charge a premium for serving queries.

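// the UniswapV3Pool ABI must also be listed under `abis` in subgraph.yaml so that `graph codegen` generates this class
import { UniswapV3Pool } from "../../generated/ClearingHouse/UniswapV3Pool"

export function handlePoolAdded(event: PoolAdded): void {
    ...
    const pool = UniswapV3Pool.bind(event.params.pool)
    market.poolTickSpacing = pool.tickSpacing()
    ...
}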

ref:
https://thegraph.com/docs/developer/create-subgraph-hosted#writing-mappings
https://thegraph.com/docs/developer/assemblyscript-api

In addition to event handlers, we can also define call handlers and block handlers. A call handler listens to calls of a specific contract function and receives the inputs and outputs of the call as its argument. A block handler, on the other hand, is called for every new block, or, with a call filter, only for blocks that contain a call to the contract defined in the data source.

ref:
https://thegraph.com/docs/developer/create-subgraph-hosted#defining-a-call-handler
https://thegraph.com/docs/developer/create-subgraph-hosted#block-handlers
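The difference between an unfiltered block handler and one with a call filter comes down to which blocks trigger it. The TypeScript sketch below models that selection over a hypothetical list of blocks, each carrying the contract addresses called in it. It only illustrates the triggering rule described above; BlockInfo, triggeredBlocks, and the addresses are made up, and this is not how graph-node scans blocks internally.

```typescript
interface BlockInfo {
    number: number
    calledContracts: string[] // addresses called in this block (hypothetical shape)
}

type BlockFilter = "none" | "call"

// Return the numbers of the blocks that would invoke a block handler for `dataSourceAddress`.
function triggeredBlocks(blocks: BlockInfo[], filter: BlockFilter, dataSourceAddress: string): number[] {
    const target = dataSourceAddress.toLowerCase()
    return blocks
        .filter((b) => {
            if (filter === "none") return true // unfiltered: every block
            // call filter: only blocks containing a call to the data source contract
            return b.calledContracts.some((a) => a.toLowerCase() === target)
        })
        .map((b) => b.number)
}

const blocks: BlockInfo[] = [
    { number: 100, calledContracts: ["0xAAA"] },
    { number: 101, calledContracts: [] },
    { number: 102, calledContracts: ["0xBBB", "0xaaa"] },
]

console.log(triggeredBlocks(blocks, "none", "0xaaa")) // 100, 101, 102
console.log(triggeredBlocks(blocks, "call", "0xaaa")) // 100, 102
```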

Here're references to how other projects organize their subgraphs:
https://github.com/Uniswap/uniswap-v3-subgraph
https://github.com/Synthetixio/synthetix-subgraph
https://github.com/mcdexio/mai3-perpetual-graph

Deploy a Subgraph

Deploy to Legacy Explorer

Before deploying your subgraph to the Legacy Explorer (the centralized and hosted version of The Graph), we need to create it on the Legacy Explorer dashboard.

Then run the following commands to deploy:

$ mustache configs/arbitrum-rinkeby.json subgraph.template.yaml > subgraph.yaml

$ graph auth --product hosted-service <YOUR_THE_GRAPH_ACCESS_TOKEN>
$ graph deploy --product hosted-service <YOUR_GITHUB_USERNAME>/<YOUR_SUBGRAPH_NAME>

ref:
https://thegraph.com/docs/developer/deploy-subgraph-hosted

Deploy to Subgraph Studio

When we deploy a subgraph to Subgraph Studio (the entry point to The Graph Network, the decentralized version of The Graph), we just push it to the Studio, where we can test it. In contrast, when we "publish" a subgraph in Subgraph Studio, we publish it on-chain. Unfortunately, Subgraph Studio currently only supports Ethereum Mainnet and the Rinkeby testnet.

$ graph auth --studio <YOUR_SUBGRAPH_DEPLOY_KEY>
$ graph deploy --studio <YOUR_SUBGRAPH_SLUG>

ref:
https://thegraph.com/docs/developer/deploy-subgraph-studio

Token Economics

Before we talk about the token economics of the GRT token, it's important to know that the following description only applies to The Graph Network, the decentralized version of The Graph. Also, the name "The Graph Network" is a bit misleading: it is not a new network or a new blockchain; instead, it is a web service that charges for HTTP/WebSocket API calls in GRT tokens.

To give the GRT token some value, you need to pay for each query in GRT when you query data (through GraphQL APIs) from The Graph Network. First, you connect your wallet and create an account on Subgraph Studio to obtain an API key; then you deposit some GRT into the account's billing balance on Polygon, where The Graph's billing contract is deployed. At the end of each week, if you used your API keys to query data, you receive an invoice based on the query fees you generated during that period, and the invoice is paid from the GRT in your balance.

ref:
https://thegraph.com/docs/studio/billing

When it comes to token economics:

  • Indexers earn query fees and indexing rewards. Their staked GRT can be slashed if they are malicious or serve incorrect data, though there's no documentation about exactly how slashing works.
  • Delegators earn a portion of query fees and indexing rewards by delegating GRT to existing indexers.
  • Curators earn a portion of query fees for the subgraphs they signal on by depositing GRT into a bonding curve of a specific subgraph.

ref:
https://thegraph.com/blog/the-graph-grt-token-economics
https://thegraph.com/blog/the-graph-network-in-depth-part-1
https://thegraph.com/blog/the-graph-network-in-depth-part-2

Query Fee

The price of queries is set by indexers and varies based on the cost to index the subgraph, the demand for queries, the amount of curation signal, and the market rate for blockchain queries. Querying data from the hosted version of The Graph, however, is currently free.

The Graph has developed a Cost Model (Agora) for pricing queries, and there is also a microtransaction system (Scalar) that uses state channels to aggregate and compress transactions before being finalized on-chain.

ref:
https://github.com/graphprotocol/agora
https://thegraph.com/blog/scalar

IPFS: The (Very Slow) Distributed Permanent Web

IPFS stands for InterPlanetary File System, but you could simply think of it as a distributed, permanent, but ridiculously slow and not properly functioning version of the web. You can upload any static file or static website to IPFS, and the whole swarm would probably distribute your files to the moon; that might be why IPFS is so fucking slow.

ref:
https://ipfs.io/

Installation

Install on macOS.

$ brew install ipfs

Initialize and start your IPFS node.

$ ipfs init
initializing IPFS node at /Users/vinta/.ipfs
generating 2048-bit RSA keypair... done
peer identity: QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6

$ ipfs daemon

ref:
https://ipfs.io/docs/commands/#ipfs-init
https://ipfs.io/docs/commands/#ipfs-daemon

Furthermore, you might want to run your IPFS node in a Docker container.

# docker-compose.yml
version: "3"
services:
    ipfs:
        image: ipfs/go-ipfs:v0.4.15
        working_dir: /export
        ports:
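            - "4001:4001" # swarm (peer-to-peer connections)
            - "127.0.0.1:5001:5001" # API and web UI (/webui); the API controls the node, so keep it off public interfaces
            - "8080:8080" # HTTP gateway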
        volumes:
            - "~/.ipfs:/data/ipfs"
            - "~/.ipfs/export:/export"

ref:
https://hub.docker.com/r/ipfs/go-ipfs/

Usage

Show Node Info

$ ipfs id
{
    "ID": "QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6",
    "PublicKey": "A_LONG_LONG_LONG_KEY",
    "Addresses": [
        "/ip4/127.0.0.1/tcp/4001/ipfs/QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6",
        "/ip4/172.19.0.2/tcp/4001/ipfs/QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6"
    ],
    "AgentVersion": "go-ipfs/0.4.14/5db3846",
    "ProtocolVersion": "ipfs/0.1.0"
}

ref:
https://ipfs.io/docs/getting-started/

Add Other Nodes to Your Bootstrap List

This one is from Muzeum, https://muzeum.pro/.

$ ipfs bootstrap add /ip4/52.221.121.238/tcp/4001/ipfs/QmTKYdZDkqHiY24kPynSmKbmRdk7cJxWsvvfvvvZArQ1N9

# you could also connect to a node directly
$ ipfs swarm connect /ip4/52.221.121.238/tcp/4001/ipfs/QmTKYdZDkqHiY24kPynSmKbmRdk7cJxWsvvfvvvZArQ1N9

ref:
https://ipfs.io/docs/commands/#ipfs-bootstrap
https://ipfs.io/docs/commands/#ipfs-swarm

Add Files to IPFS

By default, each IPFS node's storage limit is 10GB, and a node only stores the data it needs (data you added or requested), which means each node holds only a small fraction of all the data on IPFS. If not enough nodes request your data, it might be stored nowhere except on your own node.

Your content is automatically pinned when you ipfs add it.

$ ipfs add -r mysite
added QmRticJ3P5fnb9GGnUj3U9XMkYvGEnv9AQfk6YmgRhivYA mysite/index.html
added QmY9cxiHqTFoWamkQVkpmmqzBrY3hCBEL2XNu3NtX74Fuu mysite/readme.md
added QmTLhFgeWLacpbiGNYmhchHGQAhfNyDZcLt5akJFFLV89V mysite

If files/folders under the folder change, the hash of the folder changes too.

$ vim mysite/index.html
$ ipfs add -r mysite
added QmQTTe3deLfeULKjPHnQTcyFuCmY5JZiwSTiPT4nSt1KVK mysite/index.html # changed
added QmS85tb3aKQNurFm51FaxtK6NyNei4ej3gDR21baDZXRoU mysite            # changed

ref:
https://ipfs.io/docs/commands/#ipfs-add
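The folder hash changes because a directory node's identifier is computed from its children's identifiers (a Merkle DAG): editing index.html changes its hash, which changes the content of the mysite directory node and therefore its hash, while readme.md keeps its old hash. The TypeScript sketch below shows only that propagation. It uses a toy 32-bit djb2 hash instead of IPFS's real CID and UnixFS encoding, so its values look nothing like the Qm... hashes above; toyHash, fileId, and folderIds are hypothetical helpers.

```typescript
// Toy 32-bit djb2 hash (NOT what IPFS uses; just deterministic and dependency-free).
function toyHash(input: string): string {
    let hash = 5381
    for (let i = 0; i < input.length; i++) {
        // hash * 33 + charCode, kept within unsigned 32 bits
        hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0
    }
    return hash.toString(16)
}

type Folder = { [name: string]: string } // file name -> file content

// A file's id depends only on its content.
function fileId(content: string): string {
    return toyHash("file:" + content)
}

// A folder's id depends on its children's names and ids (sorted for determinism).
function folderIds(folder: Folder): { files: { [name: string]: string }; folder: string } {
    const files: { [name: string]: string } = {}
    const links = Object.keys(folder).sort().map((name) => {
        files[name] = fileId(folder[name])
        return name + ":" + files[name]
    })
    return { files, folder: toyHash("dir:" + links.join(",")) }
}

const before = folderIds({ "index.html": "<h1>hello</h1>", "readme.md": "hi" })
const after = folderIds({ "index.html": "<h1>hello world</h1>", "readme.md": "hi" })

console.log(before.files["readme.md"] === after.files["readme.md"]) // true: unchanged file keeps its id
console.log(before.files["index.html"] === after.files["index.html"]) // edited file gets a new id
console.log(before.folder === after.folder) // so the parent folder gets a new id too
```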

Pin Files from IPFS

Pinning means storing IPFS files on your local node and preventing them from being garbage collected. It also lets you access them much more quickly. To pin content someone else uploaded, you only need to run ipfs pin add.

$ ipfs pin add -r --progress /ipns/ipfs.soundscape.net/

$ ipfs pin add --progress /ipns/ipfs.soundscape.net/music_group/index.json
pinned QmZwTEhdjT4MyvEnWndVEJzBjp8zGGZH1cEBpshBQs75rY recursively

$ ipfs pin add --progress /ipns/ipfs.soundscape.net/music_album/index.json
pinned QmSAuGU5xt5SdR2ca2EDgeHFATSrAQhTfTYpYs9K9qmqED recursively

$ ipfs pin add --progress /ipns/ipfs.soundscape.net/music_recording/index.json
pinned QmcTiadA9jRMXx77tydPa6492QJAtjXkKkA4gERaFksy94 recursively

$ ipfs pin add --progress /ipns/ipfs.soundscape.net/music_composition/index.json
pinned QmTfqVaGVRnaPRQgYypGYXUvTK1UcDfK5VWYvU4rwK3m26 recursively

P.S. Sometimes when I ipfs pin add a file which is not on my node, the command just hangs. I'm not sure why, but once I access the file first (through curl or any browser), ipfs pin add works fine. That does not make much sense, though: if I have already accessed or downloaded the file, I could just ipfs add it and it would be automatically pinned.

ref:
https://ipfs.io/docs/commands/#ipfs-pin

Get Files

You have several ways to get files or folders from IPFS:

  • ipfs get dir-hash -o readable-dir-name
    • ipfs get QmbMQNcg8TTo5dXZPtuxbns1XVq6cZJaa7vNqZzeJpKwfk -o mysite
  • ipfs get file-hash -o readable-file-name.ext
    • ipfs get Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u -o cat.jpg
  • ipfs get /ipfs/dir-hash/path/to/file.txt
    • ipfs get /ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme
  • ipfs get /ipns/example.com/path/to/file.txt
    • ipfs get /ipns/ipfs.soundscape.net/music_group/index.json

You could also access IPFS files through any public gateway:

  • curl https://ipfs.io/ipns/peer-id/path/to/file.txt
    • curl https://ipfs.io/ipns/QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6/index.html
  • curl https://ipfs.io/ipns/example.com/path/to/file.txt
    • curl https://ipfs.io/ipns/ipfs.soundscape.net/music_group/index.json
  • curl http://127.0.0.1:8080/ipns/example.com/path/to/file.txt
    • curl http://127.0.0.1:8080/ipns/ipfs.soundscape.net/music_group/index.json
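All of these forms share one shape: a namespace (/ipfs/ for an immutable content hash, /ipns/ for a peer ID or DNSLink domain), a root, and an optional path, served either by a local gateway on port 8080 or by a public one such as https://ipfs.io. A small TypeScript helper that assembles such URLs might look like the sketch below; gatewayUrl and its parameters are hypothetical, and each path segment is percent-encoded so that unusual file names stay valid in a URL.

```typescript
type Namespace = "ipfs" | "ipns"

// Build a gateway URL like https://ipfs.io/ipns/example.com/path/to/file.txt
function gatewayUrl(gateway: string, ns: Namespace, root: string, path: string = ""): string {
    const base = gateway.replace(/\/+$/, "") // drop trailing slashes
    const segments = path.split("/").filter((s) => s.length > 0).map(encodeURIComponent)
    const suffix = segments.length > 0 ? "/" + segments.join("/") : ""
    return `${base}/${ns}/${root}${suffix}`
}

console.log(gatewayUrl("https://ipfs.io", "ipns", "ipfs.soundscape.net", "music_group/index.json"))
// https://ipfs.io/ipns/ipfs.soundscape.net/music_group/index.json
console.log(gatewayUrl("http://127.0.0.1:8080/", "ipfs", "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG", "readme"))
// http://127.0.0.1:8080/ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme
```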

Download IPFS objects with ipfs get.

$ ipfs ls QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ
Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u 443362 cat.jpg

# you could get a folder
$ ipfs get QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ
$ ls QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ
cat.jpg

# as well as a file
$ ipfs get Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u -o cat.jpg

# get files and rename them
$ mkdir -p soundscape/music_group/ soundscape/music_album/ soundscape/music_recording/ soundscape/music_composition/ && \
  ipfs get /ipns/ipfs.soundscape.net/music_group/index.json -o soundscape/music_group/index.json; \
  ipfs get /ipns/ipfs.soundscape.net/music_album/index.json -o soundscape/music_album/index.json; \
  ipfs get /ipns/ipfs.soundscape.net/music_recording/index.json -o soundscape/music_recording/index.json; \
  ipfs get /ipns/ipfs.soundscape.net/music_composition/index.json -o soundscape/music_composition/index.json

# get whole folders
$ ipfs get /ipns/ipfs.soundscape.net/music_group; \
  ipfs get /ipns/ipfs.soundscape.net/music_album; \
  ipfs get /ipns/ipfs.soundscape.net/music_recording; \
  ipfs get /ipns/ipfs.soundscape.net/music_composition

ref:
https://ipfs.io/docs/commands/#ipfs-get
https://discuss.ipfs.io/t/trying-to-better-understand-the-pinning-concept/754

Display IPFS object data with ipfs cat.

$ ipfs cat Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u > cat.jpg
$ ipfs cat QmS4ustL54uo8FzR9455qaxZwuMiUhyvMcX9Ba8nUH4uVv/readme

Publish a Website to IPNS

IPNS stands for InterPlanetary Naming System.

Every time you change files under a folder, the hash of the folder also changes, so you need a static reference that always points to the latest hash of your folder. You can publish your static website (a folder) to IPNS under such a static reference: your peer ID, which is the hash of your public key.

By default, every IPFS node has only one key pair, so you can only publish one folder under your peer ID. However, you can generate additional key pairs with ipfs key gen and publish each folder under its own key (ipfs name publish --key=<key-name>).

$ ipfs add -r mysite
added QmeqHWZgvgx5C7T6DakX75CJDRgAUoSDZayLYrcnAP8Fma mysite/index.html
added QmUtuRphD9rJgRkfxwj7DcyFEAcSeH3Q1fK8nHxxoDiKK5 mysite

$ ipfs name publish QmUtuRphD9rJgRkfxwj7DcyFEAcSeH3Q1fK8nHxxoDiKK5
published to QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6: /ipfs/QmUtuRphD9rJgRkfxwj7DcyFEAcSeH3Q1fK8nHxxoDiKK5

$ ipfs name resolve QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6
/ipfs/QmUtuRphD9rJgRkfxwj7DcyFEAcSeH3Q1fK8nHxxoDiKK5

After you change something, publish it again with the new hash.

$ vim mysite/index.html
$ ipfs add -r mysite
added QmNjbhdks8RUgDt6QiNFe5QGe2HrbCsq5FKda9D9hLVkkU mysite/index.html # changed
added QmbMQNcg8TTo5dXZPtuxbns1XVq6cZJaa7vNqZzeJpKwfk mysite            # changed

$ ipfs name publish QmbMQNcg8TTo5dXZPtuxbns1XVq6cZJaa7vNqZzeJpKwfk
published to QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6: /ipfs/QmbMQNcg8TTo5dXZPtuxbns1XVq6cZJaa7vNqZzeJpKwfk

ref:
https://ipfs.io/docs/commands/#ipfs-name
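Conceptually, an IPNS name is a mutable pointer: the key whose hash is the peer ID signs a record that maps the name to an /ipfs/... path, and each ipfs name publish replaces that record with a newer one carrying a higher sequence number. The TypeScript sketch below models only this "latest record wins" behavior; signing, record lifetimes, and DHT distribution are left out, and NameRegistry is a hypothetical class, not the go-ipfs API.

```typescript
interface IpnsRecord {
    value: string // e.g. "/ipfs/Qm..."
    sequence: number // incremented on every publish; the higher sequence wins
}

class NameRegistry {
    private records: { [name: string]: IpnsRecord } = {}

    // Publish a new value under a name (peer ID), bumping the sequence number.
    publish(name: string, ipfsPath: string): IpnsRecord {
        const previous = this.records[name]
        const record: IpnsRecord = {
            value: ipfsPath,
            sequence: previous !== undefined ? previous.sequence + 1 : 0,
        }
        this.records[name] = record
        return record
    }

    // Resolve a name to the value of its latest record.
    resolve(name: string): string {
        const record = this.records[name]
        if (record === undefined) throw new Error(`cannot resolve ${name}`)
        return record.value
    }
}

const peerId = "QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6"
const registry = new NameRegistry()
registry.publish(peerId, "/ipfs/QmUtuRphD9rJgRkfxwj7DcyFEAcSeH3Q1fK8nHxxoDiKK5")
registry.publish(peerId, "/ipfs/QmbMQNcg8TTo5dXZPtuxbns1XVq6cZJaa7vNqZzeJpKwfk")
console.log(registry.resolve(peerId)) // /ipfs/QmbMQNcg8TTo5dXZPtuxbns1XVq6cZJaa7vNqZzeJpKwfk
```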

Create a Domain Name Alias for Your Peer ID

The hash is not very friendly for humans. Fortunately, you could and probably should associate a domain name with your peer ID.

First, you need to add a TXT record whose value is dnslink=/ipns/YOUR_PEER_ID to your domain name. In the following example, we assume the domain name you choose is ipfs.kittenphile.com.

$ dig +short TXT ipfs.kittenphile.com
"dnslink=/ipns/QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6"

$ ipfs name resolve -r ipfs.kittenphile.com
/ipfs/QmaE2DcNxGjPGPfzfTQuTBTW9D57abVSv319WqC89Av1y1

ref:
https://ipfs.io/docs/examples/example-viewer/example#../websites/README.md
https://hackernoon.com/ten-terrible-attempts-to-make-the-inter-planetary-file-system-human-friendly-e4e95df0c6fa
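Recursive resolution (ipfs name resolve -r) is a chain: the domain's TXT record yields /ipns/<peer ID>, and that peer ID's IPNS record yields the final /ipfs/<hash>. The TypeScript sketch below follows that chain over in-memory tables standing in for DNS and IPNS. The tables, parseDnslink, resolveRecursive, and the depth limit (which guards against records that point at each other in a loop) are hypothetical illustrations, not the go-ipfs resolver.

```typescript
// Stand-ins for DNS TXT lookups and IPNS records (hypothetical data).
const txtRecords: { [domain: string]: string[] } = {
    "ipfs.kittenphile.com": ["dnslink=/ipns/QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6"],
}
const ipnsRecords: { [peerId: string]: string } = {
    QmfNy1th16zscbpxe8Q2EQdQkNFn7Y3Rp9kGZWL1EQDyw6: "/ipfs/QmaE2DcNxGjPGPfzfTQuTBTW9D57abVSv319WqC89Av1y1",
}

// Extract the path from a "dnslink=..." TXT value, if one is present.
function parseDnslink(txt: string[]): string | undefined {
    const match = txt.map((t) => /^dnslink=(\/\S+)$/.exec(t)).filter((m) => m !== null)[0]
    return match ? match[1] : undefined
}

// Follow /ipns/ hops until an /ipfs/ path is reached.
function resolveRecursive(name: string, maxDepth: number = 8): string {
    let current = "/ipns/" + name
    for (let depth = 0; depth < maxDepth; depth++) {
        if (current.indexOf("/ipfs/") === 0) return current
        const key = current.slice("/ipns/".length)
        // Try DNSLink first (domains), then IPNS records (peer IDs).
        const viaDns = txtRecords[key] !== undefined ? parseDnslink(txtRecords[key]) : undefined
        const next: string | undefined = viaDns !== undefined ? viaDns : ipnsRecords[key]
        if (next === undefined) throw new Error(`cannot resolve ${current}`)
        current = next
    }
    throw new Error("maximum resolution depth exceeded")
}

console.log(resolveRecursive("ipfs.kittenphile.com"))
// /ipfs/QmaE2DcNxGjPGPfzfTQuTBTW9D57abVSv319WqC89Av1y1
```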

Public Gateway

If you run a public gateway and people retrieve files through it, your gateway fetches and stores the data, but it doesn't pin it, so the files get removed on the next garbage collection run.

ref:
https://discuss.ipfs.io/t/public-facing-gateway-and-pinning/449
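This gateway behavior follows from how a node's blockstore treats pinned versus merely cached data: ipfs repo gc removes blocks that are not pinned (or otherwise referenced, for example from MFS). The TypeScript sketch below models that rule with one entry per hash; ToyIpfsNode with its fetch, pin, and gc methods is a hypothetical in-memory model, not the go-ipfs API, and QmMySite and QmVisitorFile are placeholder names.

```typescript
class ToyIpfsNode {
    private blocks: { [hash: string]: string } = {} // hash -> data in the local blockstore
    private pins: { [hash: string]: true } = {}

    // Serving a request (e.g. as a gateway) caches the data locally but does not pin it.
    fetch(hash: string, fromNetwork: (h: string) => string): string {
        if (this.blocks[hash] === undefined) this.blocks[hash] = fromNetwork(hash)
        return this.blocks[hash]
    }

    // Like `ipfs add` or `ipfs pin add`: keep the data and protect it from GC.
    pin(hash: string, data: string): void {
        this.blocks[hash] = data
        this.pins[hash] = true
    }

    // Garbage collection: drop every block that is not pinned; return what was removed.
    gc(): string[] {
        const removed = Object.keys(this.blocks).filter((h) => this.pins[h] !== true)
        removed.forEach((h) => { delete this.blocks[h] })
        return removed
    }

    has(hash: string): boolean {
        return this.blocks[hash] !== undefined
    }
}

const node = new ToyIpfsNode()
node.pin("QmMySite", "<h1>mine</h1>")
node.fetch("QmVisitorFile", (h) => `data for ${h}`) // a visitor requests someone else's file
console.log(node.gc()) // [ 'QmVisitorFile' ]
console.log(node.has("QmMySite"), node.has("QmVisitorFile")) // true false
```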