If you're already on Google Cloud, GKE Autopilot clusters are highly recommended: you only pay for the resources your pods request (system pods and unused node capacity are free in Autopilot clusters). No need to pay for surplus node pools anymore! Plus the entire cluster applies Google's best practices by default.
ref:
https://cloud.google.com/kubernetes-engine/pricing#compute
Create an Autopilot Cluster
DO NOT enable Private Nodes, otherwise you MUST pay for a Cloud NAT Gateway (~$32/month) for them to access the internet (to pull images, etc.).
# create
gcloud container clusters create-auto my-auto-cluster \
--project YOUR_PROJECT_ID \
--region us-west1
# connect
gcloud container clusters get-credentials my-auto-cluster \
--project YOUR_PROJECT_ID \
--region us-west1
You can update some configurations later on Google Cloud Console.
ref:
https://docs.cloud.google.com/sdk/gcloud/reference/container/clusters/create-auto
Autopilot mode works in both Autopilot and Standard clusters. You don't necessarily need to create a new Autopilot cluster; you can simply deploy your pods in Autopilot mode as long as your Standard cluster meets the requirements:
gcloud container clusters check-autopilot-compatibility my-cluster \
--project YOUR_PROJECT_ID \
--region us-west1
ref:
https://cloud.google.com/kubernetes-engine/docs/concepts/about-autopilot-mode-standard-clusters
https://cloud.google.com/kubernetes-engine/docs/how-to/autopilot-classes-standard-clusters
Deploy Workloads in Autopilot Mode
The only thing you need to do is add one magical config: nodeSelector: cloud.google.com/compute-class: "autopilot". That's it. You don't need to create or manage any node pools beforehand, just write some YAMLs and kubectl apply. All workloads with cloud.google.com/compute-class: "autopilot" will run in Autopilot mode.
More importantly, you are only billed for the CPU/memory resources your pods request, not for nodes that may have unused capacity or system pods (those running under the kube-system namespace). Autopilot mode is both cost-efficient and developer-friendly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        cloud.google.com/compute-class: "autopilot"
      containers:
        - name: nginx
          image: nginx:1.29.3
          ports:
            - name: http
              containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
If your workloads are fault-tolerant (stateless), you can use Spot instances to save a significant amount of money. Just change the nodeSelector to autopilot-spot:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/compute-class: "autopilot-spot"
      terminationGracePeriodSeconds: 25  # Spot instances have a 25s warning before preemption
ref:
https://docs.cloud.google.com/kubernetes-engine/docs/how-to/autopilot-classes-standard-clusters
You will see something like this in your Autopilot cluster:
kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
gk3-my-auto-cluster-nap-xxx      Ready    <none>   2d18h   v1.33.5-gke.1201000
gk3-my-auto-cluster-nap-xxx      Ready    <none>   1d13h   v1.33.5-gke.1201000
gk3-my-auto-cluster-pool-1-xxx   Ready    <none>   86m     v1.33.5-gke.1201000
The nap nodes are auto-provisioned by Autopilot for your workloads, while pool-1 is the default node pool created during cluster creation. System pods may run on either, but in an Autopilot cluster you are never billed for the nodes themselves (neither nap nor pool-1), nor for the system pods. You only pay for the resources requested by your application pods.
FYI, the minimum resources for Autopilot workloads are:
- CPU: 50m
- Memory: 52Mi

Additionally, Autopilot applies the following default resource requests if not specified:
- Containers in DaemonSets
  - CPU: 50m
  - Memory: 100Mi
  - Ephemeral storage: 100Mi
- All other containers
  - Ephemeral storage: 1Gi
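To illustrate (a hypothetical pod, not an example from the official docs): if you request less than these minimums, Autopilot raises the requests to the minimum at admission time, and you're billed for the adjusted values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tiny-pod  # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/compute-class: "autopilot"
  containers:
    - name: app
      image: nginx:1.29.3
      resources:
        requests:
          cpu: 10m     # below the 50m minimum; Autopilot raises it to 50m
          memory: 32Mi # below the 52Mi minimum; Autopilot raises it to 52Mi
```

You can verify the adjusted values after admission with `kubectl get pod tiny-pod -o jsonpath='{.spec.containers[0].resources}'`.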
ref:
https://docs.cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests
Exclude Prometheus Metrics
You may see Prometheus Samples Ingested in your billing. If you don't need (or don't care about) Prometheus metrics for observability, you could exclude them:
- Go to Google Cloud Console -> Monitoring -> Metrics Management -> Excluded Metrics -> Metrics Exclusion
- If you want to exclude all:
prometheus.googleapis.com/.*
- If you only want to exclude some:
prometheus.googleapis.com/container_.*
prometheus.googleapis.com/kubelet_.*
It's worth noting that excluding Prometheus metrics won't affect your HorizontalPodAutoscaler (HPA), which uses the Metrics Server instead.
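For instance, a resource-based HPA like the following sketch (names are illustrative, matching the nginx Deployment above) keeps working because it reads CPU usage from the Metrics Server, not from Managed Service for Prometheus:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource       # served by the Metrics Server, unaffected by metric exclusion
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

HPAs driven by custom or external metrics backed by Prometheus would, however, break if you exclude the metrics they depend on.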
ref:
https://console.cloud.google.com/monitoring/metrics-management/excluded-metrics
https://docs.cloud.google.com/stackdriver/docs/managed-prometheus/cost-controls