The Kubernetes Cluster Autoscaler automatically adjusts the number of nodes in your cluster when pods fail to schedule for lack of resources, or when nodes are underutilized and their pods can be rescheduled onto other nodes. Here we're going to deploy the AWS implementation, which carries out the Cluster Autoscaler's decisions by communicating with AWS products and services such as Amazon EC2.
ref:
https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html
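Cluster Autoscaler discovers the Auto Scaling groups it may manage through resource tags on those groups. As a quick sanity check (a sketch, assuming the aws CLI is configured with the perp profile used throughout this guide), you can list the ASGs already tagged for the perp-staging cluster:
aws --profile perp autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[?Tags[?Key=='k8s.io/cluster-autoscaler/perp-staging']].AutoScalingGroupName" \
  --output text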
Configure IAM Permissions
If your existing node groups were created with eksctl create nodegroup --asg-access, then this policy already exists and you can skip this step.
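For reference, that flag is passed when the node group is created. A minimal sketch (the node group name and sizes here are hypothetical):
eksctl --profile=perp create nodegroup \
  --cluster=perp-staging \
  --name=ng-autoscaled \
  --nodes-min=1 \
  --nodes-max=4 \
  --asg-access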
Otherwise, save the following policy as cluster-autoscaler-policy-staging.json. The first statement allows the scaling actions only on Auto Scaling groups tagged as owned by the perp-staging cluster, while the second grants the read-only Describe calls the autoscaler needs:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/k8s.io/cluster-autoscaler/perp-staging": "owned"
        }
      }
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}
aws --profile perp iam create-policy \
--policy-name AmazonEKSClusterAutoscalerPolicyStaging \
--policy-document file://cluster-autoscaler-policy-staging.json
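create-policy prints the new policy's ARN, which the next command needs (the account ID appears as xxx below). If you need to look the ARN up later, one way is:
aws --profile perp iam list-policies \
  --query "Policies[?PolicyName=='AmazonEKSClusterAutoscalerPolicyStaging'].Arn" \
  --output text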
eksctl --profile=perp create iamserviceaccount \
--cluster=perp-staging \
--namespace=kube-system \
--name=cluster-autoscaler \
--attach-policy-arn=arn:aws:iam::xxx:policy/AmazonEKSClusterAutoscalerPolicyStaging \
--override-existing-serviceaccounts \
--approve
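Note that creating an IAM service account requires the cluster to have an IAM OIDC provider associated. If it does not, associate one first, then verify that the service account carries the role annotation:
eksctl --profile=perp utils associate-iam-oidc-provider \
  --cluster=perp-staging \
  --approve
# the annotation eks.amazonaws.com/role-arn should point at the new role
kubectl -n kube-system get serviceaccount cluster-autoscaler -o yaml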
ref:
https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html
Deploy Cluster Autoscaler
Download the deployment YAML of cluster-autoscaler:
curl -o cluster-autoscaler-autodiscover.yaml \
https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
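At the time of writing, the example manifest contains a <YOUR CLUSTER NAME> placeholder in its --node-group-auto-discovery flag; replace it with the actual cluster name before applying (a sed sketch, assuming the placeholder is still spelled this way upstream):
sed -i.bak 's/<YOUR CLUSTER NAME>/perp-staging/g' cluster-autoscaler-autodiscover.yaml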
Before you apply the file, it's recommended to check whether the version of cluster-autoscaler matches the Kubernetes major and minor version of your cluster. Find the version number on the project's GitHub releases page.
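One way to read the cluster's version (again assuming the perp profile):
aws --profile perp eks describe-cluster \
  --name perp-staging \
  --query "cluster.version" \
  --output text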
kubectl apply -f cluster-autoscaler-autodiscover.yaml
# or
kubectl set image deployment cluster-autoscaler \
-n kube-system \
cluster-autoscaler=k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.3
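After applying the manifest or updating the image, confirm the rollout completed:
kubectl -n kube-system rollout status deployment/cluster-autoscaler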
Do some tweaks:
- --balance-similar-node-groups ensures that there is enough available compute across all availability zones.
- --skip-nodes-with-system-pods=false ensures that there are no problems with scaling to zero.
Also, add the safe-to-evict annotation so the autoscaler never evicts its own pod:
kubectl patch deployment cluster-autoscaler \
-n kube-system \
-p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}}}}}'
kubectl -n kube-system edit deployment.apps/cluster-autoscaler
# change the command to the following:
# spec:
#   containers:
#   - command:
#     - ./cluster-autoscaler
#     - --v=4
#     - --stderrthreshold=info
#     - --cloud-provider=aws
#     - --skip-nodes-with-local-storage=false
#     - --expander=least-waste
#     - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/perp-staging
#     - --balance-similar-node-groups
#     - --skip-nodes-with-system-pods=false
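Finally, the autoscaler's logs show whether it discovered the tagged Auto Scaling groups and what scaling decisions it is making:
kubectl -n kube-system logs -f deployment/cluster-autoscaler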
ref:
https://aws.github.io/aws-eks-best-practices/cluster-autoscaling/
https://github.com/kubernetes/autoscaler/tree/master/charts/cluster-autoscaler#additional-configuration