
Kubernetes - Horizontal and Vertical Pod Autoscaling
Autoscaling is a powerful Kubernetes feature that keeps applications responsive and cost-efficient by adjusting the number of pods, or the resources they consume, based on demand. In this chapter, we'll explore two types of autoscaling in Kubernetes: Horizontal Pod Autoscaling and Vertical Pod Autoscaling.
What is Horizontal Pod Autoscaling?
Horizontal Pod Autoscaling (HPA) automatically increases or decreases the number of pod replicas in a Deployment or ReplicaSet based on observed CPU/memory usage or custom metrics.
Imagine we have a web service handling fluctuating traffic. Instead of manually scaling the number of pods, HPA can help us scale it up when traffic spikes and scale down when it's idle.
Prerequisites for HPA
Before we can use HPA, make sure:
- The Metrics Server is deployed and running in the cluster.
- The target pods define CPU (and, if used, memory) resource requests, since HPA computes utilization as a percentage of those requests.
Why the Metrics Server is important
- Metrics Server collects CPU/memory usage from each pod via Kubelets.
- It exposes these metrics through the Kubernetes metrics.k8s.io API.
- HPA queries this API periodically.
- If usage crosses the configured threshold (e.g., 50% CPU), HPA adjusts the number of replicas accordingly.
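The scaling decision itself follows the formula documented for the HPA controller: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured min/max bounds. A small Python sketch (the function name and defaults are ours, for illustration):

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """Sketch of the HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    clamped to the configured min/max replica bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(desired, max_replicas))

# 2 replicas averaging 65% CPU against a 50% target -> scale up to 3
print(desired_replicas(2, 65, 50, min_replicas=2, max_replicas=5))  # prints 3
```

This is why, later in this chapter, 65% observed utilization against a 50% target moves the deployment from 2 to 3 replicas.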
To install the Metrics Server:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Output
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Then verify it's working:
$ kubectl get deployment metrics-server -n kube-system
Output
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            0           42s
Using Horizontal Pod Autoscaler (HPA)
Let's walk through a simple example of setting up HPA for an NGINX deployment.
Create a Deployment
Create a file called nginx-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m
Apply the deployment:
$ kubectl apply -f nginx-deployment.yaml
Output
deployment.apps/nginx-deployment created
Create the HPA
Now we'll create a Horizontal Pod Autoscaler that adjusts between 2 and 5 pods based on CPU usage:
$ kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=2 --max=5
Output
horizontalpodautoscaler.autoscaling/nginx-deployment autoscaled
This command tells Kubernetes to maintain an average CPU utilization of 50% across all pods.
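The same autoscaler can also be written declaratively. A sketch of the equivalent manifest using the autoscaling/v2 API (which is what the kubectl autoscale command creates under the hood):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Keeping the HPA as a manifest makes it easier to version-control alongside the deployment it scales.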
Monitor the Autoscaler
Use the following command to check the current status:
$ kubectl get hpa
Output
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   65%/50%   2         5         3          5m
To test it, we can generate CPU load on the pods using a stress tool like busybox or a custom load generator.
Generate CPU Load to Trigger HPA
We'll run a busybox pod and use it to continuously hit the NGINX service. This will indirectly increase CPU usage and trigger scaling via HPA.
Expose the NGINX Deployment
We'll start by creating a ClusterIP service so we can target it with load:
$ kubectl expose deployment nginx-deployment --port=80 --target-port=80
Output
service/nginx-deployment exposed
Create the Load Generator (busybox)
Now we'll create a pod that repeatedly curls the NGINX service to generate CPU load.
Create a file named busybox-load.yaml with the following lines:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-load
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sh", "-c", "while true; do wget -q -O- http://nginx-deployment; done"]
Apply it:
$ kubectl apply -f busybox-load.yaml
Output
pod/busybox-load created
This will continuously send HTTP requests to the NGINX service, keeping the pods busy.
Monitor HPA Reaction
Watch how the autoscaler responds:
$ kubectl get hpa -w
Output
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   65%/50%   2         5         2          4m22s
nginx-deployment   Deployment/nginx-deployment   65%/50%   2         5         3          5m7s
This output confirms:
- Our application (nginx-deployment) is under high CPU load, averaging 65% against the 50% target.
- Because usage exceeded the target, the autoscaler scaled the number of pods from 2 to 3.
- HPA is working as expected: it dynamically adjusted our pod count to meet CPU demand.
What is Vertical Pod Autoscaling?
While HPA adjusts the number of pods, Vertical Pod Autoscaling (VPA) adjusts the CPU and memory requests and limits of individual pods. This is useful for applications with unpredictable memory or CPU requirements where simply adding pods doesn't help.
VPA is ideal for batch jobs or services that can't scale horizontally.
Enabling Vertical Pod Autoscaling
VPA isn't installed by default in most clusters. Here's how we can install it from the official autoscaler repository (the repository also ships a helper script, ./hack/vpa-up.sh, that applies all of the components in one step):
$ git clone https://github.com/kubernetes/autoscaler.git
$ cd autoscaler/vertical-pod-autoscaler
Output
Cloning into 'autoscaler'...
remote: Enumerating objects: 219519, done.
remote: Counting objects: 100% (1546/1546), done.
remote: Compressing objects: 100% (1095/1095), done.
remote: Total 219519 (delta 972), reused 453 (delta 451), pack-reused 217973 (from 4)
Receiving objects: 100% (219519/219519), 245.21 MiB | 14.42 MiB/s, done.
Resolving deltas: 100% (141772/141772), done.
Updating files: 100% (7924/7924), done.
Apply the Custom Resource Definitions (CRDs)
$ kubectl apply -f deploy/vpa-crd.yaml
Output
customresourcedefinition.apiextensions.k8s.io/verticalpodautoscalers.autoscaling.k8s.io created
customresourcedefinition.apiextensions.k8s.io/verticalpodautoscalercheckpoints.autoscaling.k8s.io created
This defines the VPA objects (VerticalPodAutoscaler) that Kubernetes can understand.
Apply the RBAC Permissions
$ kubectl apply -f deploy/vpa-rbac.yaml
Output
clusterrole.rbac.authorization.k8s.io/vpa-system-role created
clusterrolebinding.rbac.authorization.k8s.io/vpa-system-binding created
This grants the necessary permissions for the VPA components to function securely.
Deploy the VPA Components
Apply the Recommender deployment:
$ kubectl apply -f deploy/recommender-deployment.yaml
Output
deployment.apps/vpa-recommender created
Apply the Admission Controller Deployment
$ kubectl apply -f deploy/admission-controller-deployment.yaml
Output
deployment.apps/vpa-admission-controller created
service/vpa-webhook created
Apply the Updater Deployment
$ kubectl apply -f deploy/updater-deployment.yaml
Output
deployment.apps/vpa-updater created
Verify that the VPA Components are Running
$ kubectl get pods -n kube-system | grep vpa
Output
vpa-admission-controller-6d5b66d7f5-mtx5g   1/1   Running   0   5m
vpa-recommender-6b87f56c7b-q84g5            1/1   Running   0   5m
vpa-updater-56f6f777d8-2knrj                1/1   Running   0   5m
Using Vertical Pod Autoscaler (VPA)
Let's create a sample deployment and attach a VPA configuration.
Create a Deployment
Create a file vpa-demo.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vpa-demo
  template:
    metadata:
      labels:
        app: vpa-demo
    spec:
      containers:
      - name: app
        image: vish/stress
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 200m
            memory: 200Mi
        args:
        - -cpus
        - "2"
Apply the deployment:
$ kubectl apply -f vpa-demo.yaml
Output
deployment.apps/vpa-demo created
Create the VPA Object
Now we'll create the VPA configuration (vpa.yaml):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-demo
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: vpa-demo
  updatePolicy:
    updateMode: "Auto"
Apply it:
$ kubectl apply -f vpa.yaml
Output
verticalpodautoscaler.autoscaling.k8s.io/vpa-demo created
Monitor VPA Recommendations
After applying the VPA configuration, the VPA will start monitoring the deployment and recommending resource adjustments. If updateMode is set to Auto, VPA applies its recommendations automatically by evicting pods and recreating them with the updated requests.
To view the VPA recommendations, run:
$ kubectl describe vpa vpa-demo
Output
Name:           vpa-demo
Namespace:      default
Target Ref:     Deployment/vpa-demo
Update Policy:  Auto
Recommendations:
  Container: app
    Target:
      CPU:    100m
      Memory: 100Mi
    Current:
      CPU:    200m
      Memory: 200Mi
    Utilization:
      CPU:    50%
      Memory: 50%
The recommendations show the target and current resource requests and limits, as well as the utilization of CPU and memory. Based on this, VPA will make suggestions and may modify the resource requests during a pod restart.
When to Use HPA vs VPA
| Use Case | HPA | VPA |
|---|---|---|
| Web services under traffic | Yes | Sometimes useful |
| Long-running jobs | No | Ideal |
| Resource tuning for workloads | No | Yes |
| Fast scaling needs | Best choice | Not optimal |
You can also combine both (with care) by setting HPA for replicas and VPA in "Off" mode to only get recommendations.
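For that recommendation-only setup, the VPA object looks like the one we created earlier, with updateMode switched to "Off" so VPA reports recommendations without evicting any pods. A sketch (the object name here is our own):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-demo-recommend-only
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: vpa-demo
  updatePolicy:
    updateMode: "Off"
```

With this in place, kubectl describe vpa shows the recommendations, while HPA remains free to manage the replica count.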
Conclusion
In this tutorial, we've learned how Kubernetes helps us scale workloads using Horizontal and Vertical Pod Autoscaling.
- HPA adjusts the number of pods based on CPU, memory, or custom metrics.
- VPA automatically adjusts pod resource requests and limits.
Both autoscaling types are crucial tools in building a resilient, cost-effective, and responsive infrastructure. Try them out in your test environments, and start tuning your workloads the smart way.