What is the need for Node Affinity? #
- Control Where Pods Run: Sometimes, you want Pods to run only on specific nodes (e.g., nodes with GPUs, SSDs, or located in a particular zone)
- Use Node Affinity for Targeted Scheduling: Node Affinity lets you influence which nodes a Pod can be scheduled on, based on node labels
- Ensure Resource Compatibility: Schedule Pods on nodes that meet special hardware or configuration needs, like GPU nodes for ML workloads
- Enhance Performance and Isolation: Co-locate Pods that benefit from a shared cache, or avoid placing high-traffic Pods on the same node, for better performance
- Improve Fault Tolerance: Run workloads on nodes in separate locations to reduce the impact of failures in one area
- Avoid Overloading Critical Nodes: Keep non-critical apps away from nodes reserved for sensitive or resource-heavy workloads
Practical Example: Node Affinity #
apiVersion: v1
kind: Pod
metadata:
  name: memory-intensive-app
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "512Mi"
        cpu: "250m"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type
            operator: In
            values:
            - high-mem
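For this rule to match, at least one node must carry the node-type=high-mem label. Assuming a node name of your choosing, the label can be added and verified with standard kubectl commands:

kubectl label nodes <node-name> node-type=high-mem
kubectl get nodes --show-labels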
What is the difference between Node Affinity and Node Selector? #
# nodeSelector: exact key-value match only
nodeSelector:
  disktype: ssd

# Node Affinity equivalent, extended with an operator and multiple values
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
          - another-value
- Simpler and Older Mechanism: Node Selector (nodeSelector) is the simpler, older predecessor of Node Affinity
- Straightforward Syntax: Easy to understand and implement for basic use cases
- Limited Flexibility: Only supports exact matches with one or more key-value pairs; it cannot express conditions with operators such as In, NotIn, Exists, Gt, or Lt
- Node Affinity is Recommended for Modern Use: Node Affinity covers everything nodeSelector can express (fully backward-compatible with Node Selector logic) and adds flexibility such as soft preferences (see the sketch below)
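As a sketch of that extra flexibility, the soft variant below tells the scheduler to prefer ssd nodes but still allows scheduling elsewhere when none match; the weight (1-100) is illustrative:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd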
Why do we need Pod Affinity? #
- Control Pod Co-Location: Sometimes you want certain Pods to be scheduled on the same node as others — for better performance or tighter integration
- Use Pod Affinity to Group Related Pods: Ensures Pods with similar roles (e.g., app and sidecar, or tightly coupled microservices) are placed together
- Reduce Latency Between Services: Co-locating Pods improves communication speed when services need to frequently talk to each other
- Support Shared Resource Usage: Helps when Pods share a persistent volume (e.g., via ReadWriteMany) or access the same local cache or hardware
- Improve Data Locality and Throughput: Useful in big data or AI workloads where data needs to stay close to compute for fast access
- Apply Topology-Aware Scheduling: Pod affinity can work at the node or zone level by specifying the topologyKey
Practical Example: Pod Affinity #
apiVersion: v1
kind: Pod
metadata:
  name: frontend-app
spec:
  containers:
  - name: app
    image: nginx
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - backend
        topologyKey: "kubernetes.io/hostname"
What is Anti-Affinity? #
- Avoid Placing Similar Pods Together: Anti-affinity ensures that certain Pods are not scheduled on the same node (or zone) as other matching Pods
- Define Rules Using Labels and Topology: Anti-affinity is label-based and supports topologyKey (like kubernetes.io/hostname or topology.kubernetes.io/zone) to control the scope of separation
- Improve High Availability: By spreading replicas across nodes or zones, anti-affinity helps prevent all replicas from going down due to a single node or zone failure
- Prevent Single Point of Failure: Especially useful for replicated workloads (like StatefulSets) where all Pods shouldn’t fail together
- Minimize Resource Contention: Keeps resource-hungry Pods apart so they don’t compete for CPU, memory, or disk on the same node
Practical Example: Anti-Affinity #
apiVersion: v1
kind: Pod
metadata:
  name: analytics-app
spec:
  containers:
  - name: analytics
    image: busybox
    command: ["sleep", "3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - backend
        topologyKey: "kubernetes.io/hostname"
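A common production pattern is a Deployment whose replicas repel each other, so no two replicas share a node; a minimal sketch with illustrative names and labels:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web
            topologyKey: "kubernetes.io/hostname"

Note that with the required variant, a fourth replica would stay Pending on a three-node cluster; the preferred variant avoids that at the cost of weaker guarantees.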
Node Affinity vs Pod Affinity vs Pod Anti-Affinity #
| Feature | Description | Use Case |
| --- | --- | --- |
| Node Affinity | Schedules Pods on nodes based on specific node labels and conditions | Targeting nodes with specific hardware or location (e.g., ML workloads needing GPUs) |
| Pod Affinity | Ensures Pods are scheduled close to other related Pods (same node, zone, or region) | Services that work closely together (e.g., web and cache Pods) |
| Pod Anti-Affinity | Prevents Pods from being scheduled near other specific Pods | High availability: spreading replicas so they don't fail together (e.g., databases and queues that need high availability and durability) |
What are Taints and Tolerations? #
- Prevent Pods from Running on Unsuitable Nodes: Taints let you mark a node as unsuitable for general Pods unless they explicitly tolerate it
- Use Taints to Repel Pods: A tainted node repels all Pods that don’t carry a matching toleration
- Allow Specific Pods Using Tolerations: Pods add tolerations that match taints, signaling they’re allowed to run on those nodes
- Taint = Node with “Only Certified Workloads Allowed” Sign
- Toleration = Pod with “Certified Workloads” Badge
- Isolate Workloads Based on Node Roles: Run sensitive or special workloads (like GPU jobs or system daemons) only on tainted nodes, avoiding interference
- Control Scheduling Behavior Precisely: Combine taints and tolerations to enforce rules such as:
- Don’t schedule general Pods on GPU nodes (NoSchedule)
- Prefer not to run Pods on preemptible nodes (PreferNoSchedule)
- Evict Pods if a taint is added at runtime (NoExecute)
- DO YOU KNOW?: Kubernetes taints control-plane (master) nodes to prevent regular workloads (Pods) from running on them
kubectl describe node <control-plane-node-name>
-> Taints: node-role.kubernetes.io/control-plane:NoSchedule
(On older clusters the taint appears as node-role.kubernetes.io/master:NoSchedule; either way it means "Don’t schedule Pods here unless they tolerate this taint.")
Practical Example: Taints and Tolerations #
apiVersion: v1
kind: Pod
metadata:
  name: team-a-app
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "teamA"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
What is a Pod Disruption Budget? #
- Prevent Too Many Pods from Going Down at Once: Pod Disruption Budgets (PDBs) ensure a minimum number of replicas stay available during voluntary disruptions
- Handle Planned Maintenance Safely: During node upgrades, autoscaling, or draining, PDBs prevent all replicas from being evicted at the same time
- Maintain Application Availability: Keep enough Pods running to serve traffic
- Protect Stateful and Critical Apps: Use PDBs for databases, APIs, or services that require high availability — even during cluster changes
- Avoid Downtime in Rolling Updates: Kubernetes respects PDBs during deployments and avoids deleting too many Pods at once
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app
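Instead of minAvailable, a PDB can cap disruptions with maxUnavailable (use one of the two, not both). A sketch of the alternative, plus the command to check how many disruptions are currently allowed:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app

kubectl get pdb my-app-pdb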
Cordon vs Drain #
| | Cordon | Drain |
| --- | --- | --- |
| Function | Marks a node as unschedulable; prevents new Pods from being scheduled | Evicts all Pods from the node; prepares it for maintenance |
| Pod Removal | Does not remove existing Pods; they continue running | Evicts and gracefully terminates Pods, which their controllers then reschedule onto other nodes |
| Use Case | Stop new workloads without disrupting running ones | Remove all workloads to safely reboot, upgrade, or decommission the node |
| Effect on Node | Node remains active and continues serving current Pods | Node becomes empty and can be safely maintained or removed |
| Command | kubectl cordon <node-name> | kubectl drain <node-name> |
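A typical maintenance sequence combines both commands (drain cordons the node first, then evicts Pods while respecting Pod Disruption Budgets); the flags shown are the commonly needed ones:

kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl uncordon <node-name>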
Debugging Problems with Starting Up Pods #
- No Pod is Running: Pod is not created or crashes immediately due to issues in its definition or during startup
- Focus on Pod Configuration First!
- (COMMON ISSUE) Something is wrong in Pod definition or startup: Check for YAML errors, invalid image names, missing configs, volume mount failures, or init container errors
- Use Diagnostic Commands: Use kubectl get pods, kubectl describe pod, and kubectl logs to identify startup issues (see the command sketch after this list)
- Some Pods Running, Others Not: Some Pods are scheduled and running, but others are stuck in Pending state
- (COMMON ISSUE) Pod scheduling is blocked due to cluster/node constraints: Issues with resources, taints, affinity, node selectors, or Anti-Affinity can prevent scheduling
- Use Scheduling Diagnostics: Use kubectl describe pod, kubectl describe node, and kubectl get nodes --show-labels to debug scheduling issues (see the command sketch after this list)
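A sketch of both diagnostic flows with placeholder names; --previous shows logs from the last crashed container run, and the Events section of kubectl describe pod surfaces FailedScheduling reasons:

# Pod definition/startup issues
kubectl get pods
kubectl describe pod <pod-name>
kubectl logs <pod-name> --previous

# Scheduling issues
kubectl describe pod <pod-name>
kubectl describe node <node-name>
kubectl get nodes --show-labels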