Kubernetes - Managing Taints and Tolerations



In Kubernetes, scheduling is automatic, but sometimes we need more control over where workloads run. For example, we may want certain nodes to run only specific pods, or prevent some pods from being scheduled on critical nodes. This is where taints and tolerations come into play.

Taints allow us to mark a node so that no pod can be scheduled on it unless that pod has a matching toleration. It’s a way of telling the scheduler, "Don’t schedule here unless you’re explicitly allowed."

In this chapter, we’ll explore how to add taints to nodes, how tolerations work, and how we can use both to fine-tune pod scheduling behavior.

What Are Taints and Tolerations?

A taint is a property that we add to a node to repel certain pods. A toleration is a property that we add to a pod to allow it to be scheduled on a tainted node.

Kubernetes won’t schedule a pod on a node if that node has a taint, unless the pod tolerates it.

Taints Format

Taints have the following format:

<key>=<value>:<effect>

Where effect can be:

  • NoSchedule: Pods will not be scheduled on the node unless they tolerate the taint.
  • PreferNoSchedule: Kubernetes will try to avoid scheduling pods on the node.
  • NoExecute: Existing pods that don’t tolerate the taint will be evicted.
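
For example, assuming a worker node named node01, the same key-value pair could be applied with each of the three effects. These commands are illustrative only and are not part of this chapter's exercise:

$ kubectl taint nodes node01 dedicated=web:NoSchedule
$ kubectl taint nodes node01 dedicated=web:PreferNoSchedule
$ kubectl taint nodes node01 dedicated=web:NoExecute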

Viewing Existing Taints

To check the taints on all nodes:

$ kubectl get nodes -o json | jq '.items[].spec.taints'

Output

[
   {
      "effect": "NoSchedule",
      "key": "node-role.kubernetes.io/control-plane"
   }
]
null

Here, the first entry is the control plane node's built-in taint, while null means that the other node currently has no taints.

Or, for a human-readable format:

$ kubectl describe node <node-name>

For instance:

$ kubectl describe node node01

Output

Name:               node01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    dedicated=web
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node01
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"aa:01:1c:5b:c1:3c"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 172.16.36.6
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 05 Apr 2025 11:16:32 +0000
Taints:             dedicated=web:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node01
  AcquireTime:     <unset>
  RenewTime:       Sat, 05 Apr 2025 11:29:59 +0000

Adding a Taint to a Node

Let’s say we want to prevent general workloads from being scheduled on a particular node. We can add a NoSchedule taint like this:

$ kubectl taint nodes <node-name> key=value:NoSchedule

Example

$ kubectl taint nodes node01 dedicated=web:NoSchedule

Output

node/node01 tainted

This means that only pods tolerating the dedicated=web:NoSchedule taint will be scheduled on node01 from now on. Pods already running on the node are unaffected, because NoSchedule, unlike NoExecute, does not evict existing pods.

To confirm that the taint has been successfully applied, we can use the following command:

$ kubectl describe node node01 | grep -i taint

Output

Taints: dedicated=web:NoSchedule
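
To see the taint in action, we could, as a quick hypothetical check (assuming node01 is the only worker node available for scheduling), start a pod without any toleration; the scheduler will leave it in the Pending state instead of placing it on node01. The name plain-nginx below is arbitrary:

$ kubectl run plain-nginx --image=nginx
$ kubectl get pod plain-nginx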

Removing a Taint from a Node

To remove a taint, run the same command with a trailing minus sign (-) after the effect:

$ kubectl taint nodes node01 dedicated:NoSchedule-

Output

node/node01 untainted
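
The fully qualified form kubectl taint nodes node01 dedicated=web:NoSchedule- works as well. Since the sections that follow assume node01 still carries this taint, we re-apply it before moving on:

$ kubectl taint nodes node01 dedicated=web:NoSchedule

Output

node/node01 tainted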

Adding a Toleration to a Pod

To allow a pod to be scheduled on a tainted node, we can add a matching toleration in the pod spec.

Using an editor, create the file nginx-web.yaml and then add the following pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-web
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "web"
    effect: "NoSchedule"

This pod will tolerate the dedicated=web:NoSchedule taint and can run on node01.

We can apply it with:

$ kubectl apply -f nginx-web.yaml

Output

pod/nginx-web created
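
The toleration above uses operator: "Equal", which requires both the key and the value to match. Kubernetes also supports operator: "Exists", which tolerates any value of the given key. A minimal sketch of that variant, shown here only for comparison, would be:

tolerations:
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"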

Confirming Pod Scheduling

Now we can confirm if the pod is scheduled on the tainted node (node01):

$ kubectl get pod nginx-web -o wide

Output

NAME        READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
nginx-web   1/1     Running   0          81s   10.244.1.7   node01   <none>           <none>

Creating Tolerations via kubectl run

We can also add tolerations directly using kubectl run with the --overrides flag. If the nginx-web pod from the previous section still exists, delete it first (kubectl delete pod nginx-web) so that the name is free:

$ kubectl run nginx-web --image=nginx --overrides='{
   "apiVersion": "v1",
   "spec": {
      "tolerations": [
         {
            "key": "dedicated",
            "operator": "Equal",
            "value": "web",
            "effect": "NoSchedule"
         }
      ]
   }
}'

Output

pod/nginx-web created

Evicting Pods

In some scenarios, we may want a node to automatically evict non-critical pods, for instance during maintenance or when the node is under memory pressure. In such cases, we can use a NoExecute taint:

$ kubectl taint nodes node01 critical=true:NoExecute

Output

node/node01 tainted

This tells Kubernetes to evict any pods on node01 that do not have a matching toleration.

Defining a Temporary Critical Pod

Now let’s define a pod that can tolerate this taint for 60 seconds:

Create a file called critical-pod.yaml and add the following content:

apiVersion: v1
kind: Pod
metadata:
  name: temporary-critical
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "critical"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
    tolerationSeconds: 60

Apply the manifest:

$ kubectl apply -f critical-pod.yaml

Output

pod/temporary-critical created

Observing Eviction Behavior

We can observe this behavior using:

$ kubectl get events --field-selector involvedObject.name=temporary-critical

Output

LAST SEEN   TYPE     REASON                 OBJECT                   MESSAGE
107s        Normal   Scheduled              pod/temporary-critical   Successfully assigned default/temporary-critical to node01
105s        Normal   Pulling                pod/temporary-critical   Pulling image "busybox"
103s        Normal   Pulled                 pod/temporary-critical   Successfully pulled image "busybox" in 2.56s (2.56s including waiting). Image size: 2156519 bytes.
103s        Normal   Created                pod/temporary-critical   Created container: busybox
102s        Normal   Started                pod/temporary-critical   Started container busybox
47s         Normal   TaintManagerEviction   pod/temporary-critical   Marking for deletion Pod default/temporary-critical
47s         Normal   Killing                pod/temporary-critical   Stopping container busybox

This confirms the pod tolerated the NoExecute taint for 60 seconds and was then automatically evicted, exactly as defined in the toleration settings.
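
We can also watch the pod directly and see it removed roughly 60 seconds after it is scheduled onto node01 (the -w flag streams updates until interrupted with Ctrl+C):

$ kubectl get pod temporary-critical -w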

Best Practices

  • Use taints to protect critical nodes from general workloads.
  • Label nodes clearly and consistently so that taints can be paired with node selectors or node affinity (see the sketch after this list).
  • Always add matching tolerations to the pods that are meant to run on tainted nodes; otherwise no new pods can be scheduled there.
  • Document taints and tolerations in your cluster policies.
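
Note that a taint only keeps untolerated pods away from a node; it does not attract tolerating pods to it. To truly dedicate a node to a workload, the usual pattern is to combine a taint with a node label and a nodeSelector (or node affinity). A minimal sketch, assuming node01 is tainted with dedicated=web:NoSchedule and labeled dedicated=web as in the describe output shown earlier (the pod name web-dedicated is arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: web-dedicated
spec:
  nodeSelector:
    dedicated: web            # attracts the pod to the labeled node
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "dedicated"          # allows the pod onto the tainted node
    operator: "Equal"
    value: "web"
    effect: "NoSchedule"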

Troubleshooting Tips

Pods Stuck in Pending State

If a pod is stuck in Pending and kubectl describe pod <pod-name> reports an untolerated taint in its scheduling events, check the following (example commands are shown after this list):

  • Node taints using kubectl describe node
  • Pod tolerations in the YAML
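
For example, assuming a pod named my-pod (a hypothetical name), the two sides can be compared like this:

$ kubectl describe node node01 | grep -i taint
$ kubectl get pod my-pod -o jsonpath='{.spec.tolerations}'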

Node Appears Unschedulable

If a node has a taint and no pods can be scheduled:

  • Either remove the taint or
  • Add matching tolerations to pods

View Which Pods Tolerate Which Taints

Use this command to see tolerations on all pods:

$ kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.tolerations}{"\n"}{end}'

Conclusion

Taints and tolerations are powerful tools for advanced scheduling in Kubernetes. They help us control where pods run, protect critical workloads, and enforce logical separation across our infrastructure.

By mastering taints and tolerations, we can gain better control over our Kubernetes clusters, ensuring that workloads are placed where they make the most sense. Let’s use these features to build smarter, more resilient infrastructure.
