Kubernetes - Scaling Applications with Kubectl

Scaling is a fundamental aspect of managing applications in Kubernetes, allowing us to handle varying loads efficiently. In this chapter, we'll explore how to scale applications manually and automatically, ensuring optimal performance and resource utilization.

Understanding Scaling in Kubernetes

In Kubernetes, scaling refers to adjusting the number of pod replicas to meet the current demand. There are two primary types of scaling:

  • Horizontal Scaling: Involves increasing or decreasing the number of pod replicas. This is the most common form of scaling in Kubernetes.
  • Vertical Scaling: Involves adjusting the resources (CPU, memory) allocated to existing pods, typically by changing their resource requests and limits.

Kubernetes provides tools for both manual and automatic scaling to help maintain application performance and resource efficiency.

Manual Scaling with kubectl

Manual scaling is useful for predictable workloads or during development and testing phases. The kubectl scale command allows us to adjust the number of replicas for Deployments, ReplicaSets, or StatefulSets.

Scaling a Deployment

To scale a deployment named my-app to 5 replicas:

$ kubectl scale deployment my-app --replicas=5

Output

deployment.apps/my-app scaled

This command instructs Kubernetes to ensure that 5 pods of my-app are running.
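
To verify the result, we can list the deployment and confirm the replica count (exact values such as AGE will vary):

$ kubectl get deployment my-app

Output

NAME     READY   UP-TO-DATE   AVAILABLE   AGE
my-app   5/5     5            5           10m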

Scaling a ReplicaSet

To scale a ReplicaSet named my-replicaset to 3 replicas:

$ kubectl scale replicaset my-replicaset --replicas=3

Output

replicaset.apps/my-replicaset scaled

Scaling a StatefulSet

To scale a StatefulSet named my-statefulset to 4 replicas:

$ kubectl scale statefulset my-statefulset --replicas=4

Output

statefulset.apps/my-statefulset scaled

Scaling All Deployments in a Namespace

To scale all deployments in the default namespace to 2 replicas (one confirmation line is printed per deployment affected):

$ kubectl scale deployments --all --replicas=2 -n default

Output

deployment.apps/my-app scaled

Conditional Scaling

We can also perform conditional scaling by specifying the current number of replicas. This ensures that scaling only occurs if the current state matches the expected state.

$ kubectl scale deployment my-app --replicas=5 --current-replicas=3

Output

deployment.apps/my-app scaled

This command scales my-app to 5 replicas only if it currently has exactly 3 replicas; if the precondition doesn't match, the command fails and no scaling occurs.
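
Alternatively, the replica count can be managed declaratively by setting spec.replicas in the Deployment manifest and re-applying it. This is a minimal sketch, assuming the manifest lives in a file named my-app.yaml and the container image is a placeholder:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5              # desired replica count, kept under version control
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest   # placeholder image for illustration
        ports:
        - containerPort: 8080

$ kubectl apply -f my-app.yaml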

Automated Scaling with Horizontal Pod Autoscaler (HPA)

For dynamic workloads, manual scaling isn't practical. Kubernetes offers the Horizontal Pod Autoscaler (HPA) to automatically adjust the number of pod replicas based on observed CPU utilization or other supported metrics.

Setting Up HPA

First, we’ll ensure that the Metrics Server is deployed in our cluster, as HPA relies on it to retrieve metrics:

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.1/components.yaml

Output

serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

Verify that the Metrics Server is running:

$ kubectl get pods -n kube-system

Output

NAME                                    READY   STATUS    RESTARTS   AGE
kube-scheduler-controlplane             1/1     Running   0          83m
kubelet-csr-approver-5cd65b5d5f-2q4p5   1/1     Running   0          83m
kubelet-csr-approver-5cd65b5d5f-ntmk9   1/1     Running   0          83m
metrics-server-7fcfc544f6-dx24d         1/1     Running   0          49s
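
Once the Metrics Server pod is ready, we can confirm that metrics are actually flowing (it can take a minute after startup; the pod name and values below are illustrative):

$ kubectl top pods

Output

NAME                      CPU(cores)   MEMORY(bytes)
my-app-7d4b9c8f6d-x7kqz   12m          45Mi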

Now, we will create an HPA for a deployment named my-app that maintains an average CPU utilization of 50% and scales between 2 and 10 replicas:

$ kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10

Output

horizontalpodautoscaler.autoscaling/my-app autoscaled

This command creates an HPA resource that monitors the CPU usage of my-app and adjusts the number of replicas accordingly.
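
The same autoscaler can also be defined declaratively, which is preferable when configuration is kept in version control. Here is a sketch of the equivalent manifest using the autoscaling/v2 API:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:            # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization    # average CPU utilization across all pods
        averageUtilization: 50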

Viewing HPA Status

To check the status of the HPA:

$ kubectl get hpa

Output

NAME      REFERENCE           TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
my-app    Deployment/my-app   cpu: 40%/50%      2         10        2          5m

This displays information about the HPA, including:

  • Current versus target CPU utilization (TARGETS).
  • The configured minimum and maximum replica counts (MINPODS, MAXPODS).
  • The current number of replicas (REPLICAS).
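
For more detail, including recent scaling events and any warnings (for example, missing metrics), we can describe the HPA or watch it react to load in real time:

$ kubectl describe hpa my-app
$ kubectl get hpa my-app --watch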

Best Practices for Scaling

Effective scaling in Kubernetes is more than just adjusting replica counts. We want to ensure that our system can handle variable loads efficiently while maintaining reliability. Here are some best practices:

Define Resource Requests and Limits

We should always define resource requests and limits for our containers. By doing so, we help Kubernetes make informed decisions about scheduling and scaling. Requests specify the amount of resources guaranteed to a container and are used for scheduling, while limits cap the maximum a container can use. This keeps containers from consuming more than their share, and it matters for autoscaling too: the HPA computes CPU utilization as a percentage of the container's CPU request, so CPU-based autoscaling requires requests to be set.

Here’s an example of how we can define resource requests and limits for a container in our Deployment's pod template (under spec.template.spec.containers):

resources:
  requests:
    cpu: "100m"
    memory: "200Mi"
  limits:
    cpu: "500m"
    memory: "500Mi"

Use Readiness and Liveness Probes

Implementing readiness and liveness probes is crucial for maintaining application health. The readiness probe ensures that Kubernetes only routes traffic to healthy pods, while the liveness probe allows Kubernetes to restart pods that become unresponsive.

We can add these probes to our deployment YAML like this:

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
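
A liveness probe uses the same structure; this sketch assumes the application's /healthz endpoint can also serve as a liveness check:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15   # give the app time to start before the first check
  periodSeconds: 20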

Monitor and Adjust HPA Settings

We need to monitor the performance of our application and adjust Horizontal Pod Autoscaler (HPA) thresholds as needed. By keeping an eye on how our pods perform, we can adjust HPA settings to ensure our scaling behavior aligns with actual usage patterns.
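
For example, if a workload regularly runs at its replica ceiling, the bounds can be raised in place. This sketch assumes the my-app HPA created earlier:

$ kubectl patch hpa my-app --patch '{"spec": {"maxReplicas": 15}}'

Output

horizontalpodautoscaler.autoscaling/my-app patched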

Combine HPA with Cluster Autoscaler

While HPA adjusts the number of pods based on resource usage, the Cluster Autoscaler is responsible for adjusting the number of nodes in our cluster. By combining both, we ensure that our cluster can accommodate the scaled workloads effectively.

Conclusion

Scaling applications in Kubernetes is essential for maintaining performance and efficiency. By leveraging both manual scaling with kubectl and automated scaling with HPA, we can ensure our applications adapt to varying workloads seamlessly.

Here are the Key Takeaways:

  • Use kubectl scale for manual, immediate scaling needs.
  • Implement HPA for dynamic, metric-based scaling.
  • Define resource requests and limits to aid in effective scaling decisions.
  • Combine HPA with Cluster Autoscaler for comprehensive scaling solutions.

By following these practices, we can build resilient, scalable applications that meet user demands effectively.
