
Kubernetes - Scaling Applications with Kubectl
Scaling is a fundamental aspect of managing applications in Kubernetes, allowing us to handle varying loads efficiently. In this chapter, we'll explore how to scale applications manually and automatically, ensuring optimal performance and resource utilization.
Understanding Scaling in Kubernetes
In Kubernetes, scaling refers to adjusting the number of pod replicas to meet the current demand. There are two primary types of scaling:
- Horizontal Scaling: Involves increasing or decreasing the number of pod replicas. This is the most common form of scaling in Kubernetes.
- Vertical Scaling: Involves adding more resources (CPU, memory) to a single pod (see the example after this list).
Kubernetes provides tools for both manual and automatic scaling to help maintain application performance and resource efficiency.
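While kubectl scale and the HPA (both covered below) handle horizontal scaling, a pod's resources can be adjusted imperatively with kubectl set resources. A minimal sketch, assuming a deployment named my-app; note that changing resources triggers a rolling restart of its pods:
$ kubectl set resources deployment my-app --requests=cpu=200m,memory=256Mi --limits=cpu=1,memory=512Mi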
Manual Scaling with kubectl
Manual scaling is useful for predictable workloads or during development and testing phases. The kubectl scale command allows us to adjust the number of replicas for deployments, replica sets, or stateful sets.
Scaling a Deployment
To scale a deployment named my-app to 5 replicas:
$ kubectl scale deployment my-app --replicas=5
Output
deployment.apps/my-app scaled
This command instructs Kubernetes to ensure that 5 pods of my-app are running.
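We can then confirm the new replica count; the READY column should report 5/5 once all pods are up:
$ kubectl get deployment my-app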
Scaling a ReplicaSet
To scale a ReplicaSet named my-replicaset to 3 replicas:
$ kubectl scale replicaset my-replicaset --replicas=3
Output
replicaset.apps/my-replicaset scaled
Scaling a StatefulSet
To scale a StatefulSet named my-statefulset to 4 replicas:
$ kubectl scale statefulset my-statefulset --replicas=4
Output
statefulset.apps/my-statefulset scaled
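Note that a StatefulSet scales one pod at a time, in ordinal order, so the scale-up may take longer than for a Deployment. We can follow its progress with:
$ kubectl rollout status statefulset my-statefulset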
Scaling All Deployments in a Namespace
To scale all deployments in the default namespace to 2 replicas:
$ kubectl scale deployments --all --replicas=2 -n default
Output
deployment.apps/my-app scaled
Conditional Scaling
We can also perform conditional scaling by specifying the current number of replicas. This ensures that scaling only occurs if the current state matches the expected state.
$ kubectl scale deployment my-app --replicas=5 --current-replicas=3
Output
deployment.apps/my-app scaled
This command scales my-app to 5 replicas only if it currently has exactly 3 replicas; if the counts don't match, kubectl reports an error and leaves the deployment unchanged.
Automated Scaling with Horizontal Pod Autoscaler (HPA)
For dynamic workloads, manual scaling isn't practical. Kubernetes offers the Horizontal Pod Autoscaler (HPA) to automatically adjust the number of pod replicas based on observed CPU utilization or other select metrics.
Setting Up HPA
First, we'll ensure that the Metrics Server is deployed in our cluster, as HPA relies on it to retrieve metrics:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.1/components.yaml
Output
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Verify that the Metrics Server is running:
$ kubectl get pods -n kube-system
Output
NAME                                    READY   STATUS    RESTARTS   AGE
kube-scheduler-controlplane             1/1     Running   0          83m
kubelet-csr-approver-5cd65b5d5f-2q4p5   1/1     Running   0          83m
kubelet-csr-approver-5cd65b5d5f-ntmk9   1/1     Running   0          83m
metrics-server-7fcfc544f6-dx24d         1/1     Running   0          49s
Now, we will create an HPA for a deployment named my-app that maintains an average CPU utilization of 50% and scales between 2 and 10 replicas:
$ kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
Output
horizontalpodautoscaler.autoscaling/my-app autoscaled
This command creates an HPA resource that monitors the CPU usage of my-app and adjusts the number of replicas accordingly.
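Equivalently, the same autoscaler can be defined declaratively. Here is a minimal sketch of an autoscaling/v2 manifest matching the command above (the filename my-app-hpa.yaml is just an example):
# Scale my-app between 2 and 10 replicas, targeting 50% average CPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Apply it with:
$ kubectl apply -f my-app-hpa.yaml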
Viewing HPA Status
To check the status of the HPA:
$ kubectl get hpa
Output
NAME     REFERENCE           TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
my-app   Deployment/my-app   cpu: 40%/50%   2         10        2          5m
This displays information about the HPA, including:
- The current metric value versus its target (TARGETS column, here 40% observed CPU against a 50% target).
- The configured minimum and maximum replica counts (MINPODS and MAXPODS).
- The current number of replicas (REPLICAS).
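For a more detailed view, including the autoscaler's recent scaling events and the conditions it evaluated, we can describe the HPA:
$ kubectl describe hpa my-app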
Best Practices for Scaling
Effective scaling in Kubernetes is more than just adjusting replica counts. We want to ensure that our system can handle variable loads efficiently while maintaining reliability. Here are some best practices:
Define Resource Requests and Limits
We should always define resource requests and limits for our containers. By doing so, we help Kubernetes make informed decisions about scheduling and scaling. Requests specify the minimum resources required for a container, while limits define the maximum amount of resources a container can use. This ensures our containers don't consume more resources than they need, and Kubernetes can better manage resource distribution.
Here's an example of how we can define resource requests and limits in our Kubernetes Deployment YAML file:
resources:
  requests:
    cpu: "100m"
    memory: "200Mi"
  limits:
    cpu: "500m"
    memory: "500Mi"
Use Readiness and Liveness Probes
Implementing readiness and liveness probes is crucial for maintaining application health. The readiness probe ensures that Kubernetes only routes traffic to healthy pods, while the liveness probe allows Kubernetes to restart pods that become unresponsive.
We can add these probes to our deployment YAML like this:
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
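A liveness probe is declared the same way. A minimal sketch, assuming the same /healthz endpoint serves both checks (in practice, liveness often uses a separate, lighter endpoint):
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20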
Monitor and Adjust HPA Settings
We need to monitor the performance of our application and adjust Horizontal Pod Autoscaler (HPA) thresholds as needed. By keeping an eye on how our pods perform, we can adjust HPA settings to ensure our scaling behavior aligns with actual usage patterns.
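For example, if the upper bound proves too low during traffic spikes, we can raise it in place. A sketch, assuming the my-app HPA created earlier:
$ kubectl patch hpa my-app --patch '{"spec": {"maxReplicas": 15}}'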
Combine HPA with Cluster Autoscaler
While HPA adjusts the number of pods based on resource usage, the Cluster Autoscaler is responsible for adjusting the number of nodes in our cluster. By combining both, we ensure that our cluster can accommodate the scaled workloads effectively.
Conclusion
Scaling applications in Kubernetes is essential for maintaining performance and efficiency. By leveraging both manual scaling with kubectl and automated scaling with HPA, we can ensure our applications adapt to varying workloads seamlessly.
Here are the key takeaways:
- Use kubectl scale for manual, immediate scaling needs.
- Implement HPA for dynamic, metric-based scaling.
- Define resource requests and limits to aid in effective scaling decisions.
- Combine HPA with Cluster Autoscaler for comprehensive scaling solutions.
By following these practices, we can build resilient, scalable applications that meet user demands effectively.