
Kubernetes - Debugging Pods and Nodes
Kubernetes is a robust platform for deploying and managing applications, but like any complex system, things can go wrong. Pods may crash unexpectedly, nodes can become unresponsive, and networking issues might disrupt communication between services. Instead of guessing what went wrong, we need a structured approach to troubleshooting.
In this chapter, we'll walk through practical debugging techniques to identify and resolve issues with Kubernetes Pods and Nodes, helping us maintain a stable and reliable cluster.
Understanding Kubernetes Debugging
Before jumping into specific debugging techniques, let's define what debugging in Kubernetes means. Debugging involves identifying and resolving issues in our cluster, such as:
- Application crashes due to misconfigurations or resource constraints.
- Networking problems that prevent communication between services.
- Node failures causing Pods to be unschedulable.
- PersistentVolume issues where data is inaccessible.
- Misconfigured deployments leading to unexpected behavior.
By understanding common failure points, we can systematically approach debugging in a structured way.
Debugging Kubernetes Pods
Pods are the smallest deployable units in Kubernetes, and most issues start at this level. Let's look at various ways to diagnose and fix Pod-related problems.
Checking Pod Status
The first step in debugging a Pod is to check its status. We use:
$ kubectl get pods
Output
NAME            READY   STATUS             RESTARTS      AGE
crashloop-pod   0/1     CrashLoopBackOff   3 (37s ago)   88s
The STATUS column tells us if the Pod is running, pending, or in an error state. If a Pod is in CrashLoopBackOff, it means the application is repeatedly crashing.
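Two other flags are worth knowing at this step: -w watches status changes in real time, and -o wide shows which node each Pod was scheduled on:
$ kubectl get pods -w
$ kubectl get pods -o wide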
Check the Logs of the Pod
Since the container is failing, logs provide valuable insights:
$ kubectl logs crashloop-pod
If there are multiple containers in the Pod, specify the container name:
$ kubectl logs crashloop-pod -c faulty-container
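If the container has already restarted, the current log stream may be empty. The logs of the previous, crashed instance are often more informative:
$ kubectl logs crashloop-pod --previous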
Describe the Pod for More Details
This command will show events and reasons for failure:
$ kubectl describe pod crashloop-pod
Output
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  31m                 default-scheduler  Successfully assigned default/crashloop-pod to node01
  Normal   Pulled     31m                 kubelet            Successfully pulled image "busybox" in 2.602s (2.602s including waiting). Image size: 2156519 bytes.
  Normal   Pulled     31m                 kubelet            Successfully pulled image "busybox" in 823ms (823ms including waiting). Image size: 2156519 bytes.
  Normal   Pulled     30m                 kubelet
  Normal   Created    30m (x4 over 31m)   kubelet            Created container: faulty-container
  Normal   Started    30m (x4 over 31m)   kubelet            Started container faulty-container
  Warning  BackOff    21m (x47 over 31m)  kubelet            Back-off restarting failed container faulty-container...
Look for messages under the Events section; this often gives hints about why the Pod is crashing, such as an image pull error, insufficient CPU/memory, or a missing secret.
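Describe output can be long; to pull out just the reason for the container's last termination, a jsonpath query is a handy shortcut (this assumes a single-container Pod):
$ kubectl get pod crashloop-pod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'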
Possible Fixes
If it's an image issue, ensure the correct image name is used:
$ kubectl get pod crashloop-pod -o yaml | grep image
Output
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"crashloop-pod","namespace":"default"}, "spec":{"containers":[{"command":["sh","-c","exit 1"],"image":"busybox","name":"faulty-container"}]}} image: busybox imagePullPolicy: Always - image: busybox imagePullPolicy: Always image: busybox imagePullPolicy: Always image: docker.io/library/busybox:latest imageID:
From the above output, the image itself is fine; the problem is the container's command (exit 1), which exits immediately and causes the Pod to keep crashing. To fix this, update the Pod's YAML definition and recreate it.
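For reference, here is the Pod's full manifest, reconstructed from the last-applied-configuration annotation in the output above:
apiVersion: v1
kind: Pod
metadata:
  name: crashloop-pod
  namespace: default
spec:
  containers:
  - name: faulty-container
    image: busybox
    command: ["sh", "-c", "exit 1"]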
Fix the CrashLoopBackOff Issue
Most fields of a running Pod's spec, including the container command, are immutable, so kubectl edit pod would reject this change. Instead, open the Pod's manifest file in an editor:
$ vi crashloop-pod.yaml
Then, modify the command section under spec.containers so the container runs a long-lived process:
containers:
- name: faulty-container
  image: busybox
  command: ["sh", "-c", "sleep infinity"]
Save and exit the editor.
Delete and Recreate the Pod
Since the running Pod cannot pick up the new command in place, delete it and recreate it from the updated manifest:
$ kubectl delete pod crashloop-pod
$ kubectl apply -f crashloop-pod.yaml
Now, verify the status:
$ kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
crashloop-pod   0/1     Running   0          25s
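To confirm the Pod settles into a healthy state rather than just catching it mid-restart, kubectl wait can block until its Ready condition is met:
$ kubectl wait --for=condition=Ready pod/crashloop-pod --timeout=60s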
Debugging Kubernetes Nodes
If a Pod issue isn't resolved, the problem might be at the Node level. Kubernetes nodes run workloads, and if they fail, Pods may become unavailable.
Checking Node Status
List the nodes and their conditions:
$ kubectl get nodes
Output
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   94m   v1.31.6
node01         Ready    <none>          94m   v1.31.6
A NotReady node indicates a problem. One common cause is a missing or broken Pod network: without a working CNI plugin, a node never becomes Ready. In such cases, installing a network plugin like Flannel can restore connectivity:
$ kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
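To pinpoint why a node is NotReady, inspect its reported conditions and recent events, which usually name the exact problem (for example, NetworkUnavailable, MemoryPressure, or DiskPressure):
$ kubectl describe node node01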
Investigating Node Issues
Checking Kubelet Logs
Kubelet manages Pods on a node. If a node is failing, check its logs:
$ journalctl -u kubelet -f
Look for errors like failed to start container or out of memory issues.
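If the log stream is silent, first confirm the kubelet service is running at all:
$ sudo systemctl status kubelet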
Checking Disk Space
Nodes may fail if they run out of disk space. Verify with:
$ df -h
If disk usage is high, clean up logs and unused containers:
$ sudo docker system prune -a
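Note that docker system prune only applies to Docker-based nodes. On clusters that use containerd as the runtime (the default in recent Kubernetes versions), the equivalent image cleanup goes through crictl, assuming it is installed on the node:
$ sudo crictl rmi --prune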
Restarting Node Services
If debugging doesn't solve the issue, restarting the node's core services (the kubelet and the container runtime) may help:
$ sudo systemctl restart kubelet containerd
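Afterwards, watch the node list until the affected node reports Ready again:
$ kubectl get nodes -w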
Checking Network Connectivity
If a Pod cannot communicate with another service, test networking using:
$ kubectl exec -it $(kubectl get pod -l app=web-app-xyz -o jsonpath="{.items[0].metadata.name}") -- curl http://my-service:8080
Output
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
The kubectl exec command ran curl inside the web-app-xyz Pod and made a request to my-service:8080. The response confirms that Nginx is reachable through the Service and serving its default welcome page.
If network policies are blocking communication, review them with:
$ kubectl get networkpolicy -A
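To see exactly which ingress and egress rules a given policy enforces, describe it directly (substitute your own policy name and namespace):
$ kubectl describe networkpolicy <policy-name> -n <namespace>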
Debugging Kubernetes Cluster Issues
Checking Kubernetes API Server
If kubectl commands are slow or failing, the API server may be down. Verify with:
$ kubectl cluster-info
Output
Kubernetes control plane is running at https://172.16.32.5:6443
CoreDNS is running at https://172.16.32.5:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use:
$ kubectl cluster-info dump
If the API server is unreachable, check its logs on the control plane node. On clusters where the API server runs as a systemd service, use journalctl:
$ journalctl -u kube-apiserver -f
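On kubeadm-based clusters the API server runs as a static Pod rather than a systemd service, so the journalctl unit above will not exist. In that case, inspect the container directly on the control plane node with crictl (the container ID placeholder is yours to fill in):
$ sudo crictl ps -a | grep kube-apiserver
$ sudo crictl logs <container-id>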
Checking Control Plane Components
Run:
$ kubectl get pods -n kube-system
Output
NAME                       READY   STATUS    RESTARTS   AGE
coredns-7c65d6cfc9-k4f4d   1/1     Running   0          48m
coredns-7c65d6cfc9-pp729   1/1     Running   0          48m
etcd-controlplane          1/1     Running   0          48m
...
If key components like kube-controller-manager or etcd are failing, restart them. On kubeadm clusters these run as static Pods managed by the kubelet, so restarting the kubelet recreates them:
$ sudo systemctl restart kubelet
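The API server also exposes an aggregated health endpoint that reports the status of individual control plane checks, which is a quick way to spot a failing component:
$ kubectl get --raw='/readyz?verbose'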
Checking DNS Issues
If services fail due to DNS problems, test resolution inside a Pod:
$ kubectl run -it --rm --image=busybox dns-test -- nslookup my-service
Output
If you don't see a command prompt, try pressing enter.
Server:    10.96.0.10
Address:   10.96.0.10:53

Name:      my-service.default.svc.cluster.local
Address:   10.103.45.32
If it fails, restart CoreDNS:
$ kubectl rollout restart deployment coredns -n kube-system
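Then wait for the restart to complete and re-run the nslookup test to confirm resolution works:
$ kubectl rollout status deployment coredns -n kube-system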
Conclusion
Debugging Kubernetes requires a systematic approach. By analyzing Pod logs, inspecting events, checking node status, and verifying networking, we can efficiently diagnose and resolve issues across the cluster.