Why is persistence important for stateful applications in Kubernetes? #
- Pod Storage is Ephemeral: Data written to a container's filesystem is lost when the Pod is restarted, deleted, or rescheduled to another node
- Retain Important Data Outside the Pod: Store databases, logs, and user files in persistent storage to avoid data loss
- Needed for Stateful Apps: Required for apps like MySQL, PostgreSQL, and Kafka that depend on consistent storage
- Enables Backups and Disaster Recovery: Persistent storage supports snapshots and recovery strategies to restore lost or corrupted data (see the snapshot sketch after this list)
- Ensures High Availability and Reliability: Keeps data available through node failures and Pod rescheduling
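As a minimal sketch of the snapshot capability mentioned above: the VolumeSnapshot API is standard, but this assumes the cluster has the external snapshot controller and CRDs installed and a CSI driver that supports snapshots; the snapshot class and PVC names are placeholders.
# Snapshot an existing PVC (requires snapshot controller + CSI snapshot support)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-pvc-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass  # placeholder snapshot class name
  source:
    persistentVolumeClaimName: data-pvc   # the PVC to snapshot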
What is the need for Volumes in Kubernetes? #
- Why Use Kubernetes Volumes?: They let containers in a Pod access and share data through a filesystem that outlives individual container restarts
- Persist Container Data: Avoids data loss when containers crash or restart by keeping data outside the container's writable layer
- Support Multi-Container Pods: Lets containers within the same Pod share files easily using a common volume (see the sketch after this list)
- Enable Shared Storage Across Pods: Volumes like NFS help share data between Pods, even on different nodes
- Use Configs and Secrets as Files: Automatically load configuration files or credentials using ConfigMaps and Secrets
- Provide Temporary Working Space: Use emptyDir volumes for scratch space that lives as long as the Pod does
- Mount Read-Only Data: Share static content from a container image without modifying it
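A minimal sketch of the multi-container case: two containers in one Pod share an emptyDir volume, one writing and one reading. The image, names, and paths are illustrative, not a production setup.
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch-pod
spec:
  containers:
  - name: writer
    image: busybox
    # Append a timestamp to the shared file every 5 seconds
    command: [ "sh", "-c", "while true; do date >> /shared/log.txt; sleep 5; done" ]
    volumeMounts:
    - name: shared
      mountPath: /shared
  - name: reader
    image: busybox
    # Follow the file written by the other container
    command: [ "sh", "-c", "touch /shared/log.txt; tail -f /shared/log.txt" ]
    volumeMounts:
    - name: shared
      mountPath: /shared
  volumes:
  - name: shared
    emptyDir: {}  # Deleted when the Pod is removed from the node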
Give examples of different types of volumes in Kubernetes #
- emptyDir: Temporary volume that provides scratch space for a Pod; data is deleted when the Pod is removed from the node
- hostPath: Mounts a file or directory from the host node’s filesystem into the Pod
- nfs: Mounts a shared NFS (Network File System) volume; allows multiple Pods across nodes to access the same files
- configMap: Mounts configuration key-value pairs into the container as files
- persistentVolumeClaim: Mounts persistent storage requested via a PersistentVolumeClaim; Kubernetes binds the claim to a suitable PersistentVolume
# Sample Kubernetes Pod using different volume types
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo-pod
spec:
  containers:
  - name: app-container
    image: busybox
    command: [ "sleep", "3600" ]
    volumeMounts:
    # Mount ConfigMap as files
    - name: config-vol
      mountPath: /etc/config
    # Mount temporary scratch space
    - name: temp-vol
      mountPath: /tmp/scratch
    # Mount host path directory
    - name: host-vol
      mountPath: /mnt/host
    # Mount NFS volume
    - name: nfs-vol
      mountPath: /mnt/nfs
    # Mount persistent volume claim
    - name: pvc-vol
      mountPath: /mnt/data
  volumes:
  # 1. configMap volume
  - name: config-vol
    configMap:
      name: app-config
      # ConfigMap must exist in the same namespace
  # 2. emptyDir volume (lives as long as the Pod)
  - name: temp-vol
    emptyDir: {}  # Default: node's disk is used
    # Optionally use memory:
    # emptyDir:
    #   medium: Memory
  # 3. hostPath volume
  - name: host-vol
    hostPath:
      path: /var/log
      type: Directory
  # 4. nfs volume
  - name: nfs-vol
    nfs:
      server: 10.0.0.12
      path: /shared-data
      # NFS server must allow client access
  # 5. persistentVolumeClaim
  - name: pvc-vol
    persistentVolumeClaim:
      claimName: data-pvc
      # The PVC must exist before Pod creation
---
# Sample ConfigMap (used above)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    mode=production
    max_connections=50
---
# Sample PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
What is Static Volume Provisioning? #
- Define PersistentVolumes Manually: In static provisioning, cluster admins pre-create PersistentVolumes (PVs) with fixed size, storage type, and access modes
- Claim Storage with PVCs: Users create PersistentVolumeClaims (PVCs) to request storage; Kubernetes matches the PVC to a suitable pre-created PV
- Use Matching Rules for Binding: Kubernetes binds PVC to PV based on criteria like storage size, access mode, etc.
- No Automation for Volume Creation: Unlike dynamic provisioning, the PV must already exist—Kubernetes does not create the volume automatically
- Suitable for Controlled Environments: Best used when storage is managed externally (e.g., existing NFS shares) and admins want full control over provisioning
# Static Volume Provisioning Example
# ----------------------------------
# Admin creates the PersistentVolume (PV) manually
# User creates a PersistentVolumeClaim (PVC)
# Kubernetes binds PVC to a matching PV

# Step 1: PersistentVolume (PV)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-pv
spec:
  capacity:
    storage: 1Gi              # Size of volume
  accessModes:
  - ReadWriteOnce             # One node read/write
  persistentVolumeReclaimPolicy: Retain
  # ReclaimPolicy options:
  # - Retain: Keep data after PVC deleted
  # - Delete: Remove volume with PVC
  storageClassName: manual
  hostPath:
    path: "/mnt/data"
    type: DirectoryOrCreate
  # For real use, replace hostPath with e.g.:
  # nfs:
  #   server: 10.0.0.10
  #   path: /shared-nfs-path
---
# Step 2: PersistentVolumeClaim (PVC)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: manual
  # Must match storageClassName in PV
---
# Step 3: Pod using the PVC
apiVersion: v1
kind: Pod
metadata:
  name: pvc-pod
spec:
  containers:
  - name: app
    image: busybox
    command: [ "sleep", "3600" ]
    volumeMounts:
    - mountPath: "/usr/share/data"
      name: storage-vol
  volumes:
  - name: storage-vol
    persistentVolumeClaim:
      claimName: static-pvc
What is Dynamic Volume Provisioning? #
- Provision Storage Automatically: Dynamic provisioning creates storage volumes on demand when users create PersistentVolumeClaims
- Avoid Manual Volume Creation: Eliminates the need for admins to manually create PersistentVolumes or interact with storage providers
- Use StorageClass for Provisioning Rules: Admins define StorageClass objects that specify which provisioner to use and its parameters
- Offer Multiple Storage Flavors: Allows clusters to support different types of storage (e.g., fast SSD, standard HDD, cloud volumes) using different StorageClass configurations
- Simplify Storage for Users: Users only need to create a PVC; Kubernetes handles the rest using the defined StorageClass configuration
# Dynamic Volume Provisioning Example
# -----------------------------------
# Admin creates StorageClass once
# Users create PVCs referencing the StorageClass
# Kubernetes dynamically provisions a PV

# Step 1: StorageClass (admin-created, using a CSI provisioner)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: ebs.csi.aws.com  # ✅ Recommended driver for AWS EBS
# For other providers:
# - GCP: pd.csi.storage.gke.io
# - Azure: disk.csi.azure.com
parameters:
  type: gp3     # AWS EBS volume type
  fsType: ext4  # Filesystem to use
reclaimPolicy: Delete
# - Retain: keep data after PVC is deleted
# - Delete: auto-remove volume with PVC
# - Recycle: legacy; not supported by most CSI drivers
volumeBindingMode: WaitForFirstConsumer
# - Immediate: bind volume as soon as PVC is created
# - WaitForFirstConsumer: bind only when Pod is scheduled
---
# Step 2: PersistentVolumeClaim (user-created)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: fast-storage
  # Must match StorageClass name
---
# Step 3: Pod using the PVC
apiVersion: v1
kind: Pod
metadata:
  name: dynamic-pod
spec:
  containers:
  - name: app
    image: busybox
    command: [ "sleep", "3600" ]
    volumeMounts:
    - mountPath: "/data"
      name: dynamic-vol
  volumes:
  - name: dynamic-vol
    persistentVolumeClaim:
      claimName: dynamic-pvc
Kubernetes Object - PersistentVolumeClaim (PVC) #
- Request Storage: Allows users to request specific storage resources without needing to know the underlying storage details
- Works with Both Static and Dynamic Provisioning: PVCs are flexible and can bind to pre-created PersistentVolumes (static) or trigger automatic provisioning (dynamic) via a StorageClass
# WHAT: Requests persistent storage from the cluster
# WHY: Allows an app to store logs that survive
#      restarts or rescheduling of Pods
# WHEN: Use when data (like logs) must be retained
# WHERE: Defined in the same namespace as the Pod
# HOW: Matches with a compatible StorageClass or PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: log-pvc
  # PVC name used by the Pod to reference this storage
spec:
  accessModes:
  - ReadWriteOnce
  # WHAT: Access policy for the volume
  # - ReadWriteOnce: only one node can write
  # - ReadOnlyMany: multiple nodes can read
  # - ReadWriteMany: multiple nodes can read/write
  resources:
    requests:
      storage: 1Gi
      # WHAT: Storage size requested (min guaranteed)
      # WHY: App needs this much space for logs
      # HOW: Cluster finds/provisions matching volume
  storageClassName: fast-ssd
  # WHAT: Tells Kubernetes which StorageClass to use
  # WHY: Enables dynamic provisioning of the volume
  # VARIATION:
  # - Omit this field to use the default StorageClass
  # - Use storageClassName: manual to bind to a
  #   pre-created PersistentVolume (static provisioning);
  #   the PV's storageClassName must also be manual
What is the role of a StorageClass in dynamic provisioning? #
- Dynamic Provisioning: Defines how storage is dynamically provisioned, abstracting storage backend details
- Reusability: Enables multiple PVCs to use the same StorageClass for consistent storage provisioning
# WHAT: Defines a class for dynamic volume creation
# WHY: PVCs use it to auto-provision storage
# WHERE: Cluster-wide resource (not namespaced)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
    # Makes this the default StorageClass
    # PVCs without storageClassName will use this one
    # Only one default StorageClass is allowed per cluster
provisioner: ebs.csi.aws.com
# ✅ CSI-based AWS EBS provisioner (recommended)
# For GCP: pd.csi.storage.gke.io
# For Azure: disk.csi.azure.com
parameters:
  type: gp3
  # AWS EBS volume type
  # GCP: pd-standard, pd-ssd
  # Azure: Standard_LRS, Premium_LRS
  fsType: ext4
  # File system to format the volume with
  # Options: ext4 (default), xfs, btrfs, ntfs (depends on OS image)
reclaimPolicy: Delete
# WHAT: Behavior after PVC is deleted
# WHY: To automatically delete the volume and free up resources
# OPTIONS:
# - Retain: keep volume and data after PVC deletion
# - Delete: delete volume (typical for cloud-managed disks)
# - Recycle: deprecated legacy mode (not supported in CSI)
volumeBindingMode: WaitForFirstConsumer
# WHAT: Controls when the volume is created and bound
# WHY: Ensures zone-aware volume placement (especially in multi-zone clusters)
# OPTIONS:
# - Immediate: create volume as soon as PVC is created
# - WaitForFirstConsumer: create volume only after Pod is scheduled (recommended)
PV, PVC and StorageClass - Troubleshooting #
Scenario: What happens if a PVC requests more storage than any available PV offers?
- PVC Stays Unbound: Kubernetes can't find a matching PV with sufficient capacity, so the claim remains in `Pending` state
- Pod Stuck in Pending: If a Pod depends on the PVC, it will also stay in `Pending` until the claim is resolved
- No Auto-Resize or Retry: Kubernetes doesn't auto-adjust existing PVs
- Fix by Provisioning a Larger PV: Admin must create a compatible PV or enable dynamic provisioning via a `StorageClass`
- Verify with `kubectl describe pvc`: Check for capacity mismatch and related events when troubleshooting
Scenario: You’ve defined a PVC and deployed a Pod using it, but the Pod is stuck in Pending. What could be the issue and how do you debug it?
- No Available PV: There may be no static PV with matching specs, or dynamic provisioning may not be enabled
- Missing StorageClass: If the PVC references a `StorageClass` that doesn't exist, provisioning fails silently
- Check Events and Describe: Use `kubectl describe pod <pod-name>` and `kubectl describe pvc <pvc-name>` to inspect the reasons (see the sketch below)
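A hedged sketch of that debugging flow; the resource names are placeholders:
kubectl get pvc my-pvc        # STATUS should be Bound, not Pending
kubectl describe pvc my-pvc   # Events show why binding/provisioning failed
kubectl get pv                # Is there a PV with matching size/mode/class?
kubectl get storageclass      # Does the referenced StorageClass exist?
kubectl describe pod my-pod   # Scheduling events point back at the PVC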
Why would you prefer to use StatefulSet over a Deployment for running a distributed message platform like Kafka? #
- Kafka is a Distributed Messaging Platform: Apache Kafka is used for building real-time, event-driven systems
- Each Broker Has a Persistent Identity: A Kafka broker is a server that stores data and serves client requests; each broker registers itself in the cluster with a unique ID and hostname and maintains local storage for topics and partitions
- Stable DNS Is Critical for Communication: Kafka uses broker hostnames (like `kafka-0.kafka-headless.default.svc.cluster.local`) for internal communication and client access
- Clients and Brokers Rely on Fixed Hostnames: Without consistent DNS names, clients can't reconnect and brokers can't find each other
- Deployment Fails to Guarantee DNS Stability: Pods created by a Deployment get random names (e.g., `kafka-xyz123`), breaking Kafka's expectations
- StatefulSet Solves These Problems: It gives each Pod a fixed hostname and its own persistent volume (e.g., `kafka-0` always gets the same PVC)
- Ensures Predictable, Reliable Cluster Behavior: Required for stable broker identity, safe storage, and smooth recovery in Kafka deployments
Can you explain a StatefulSet with an example? #
- Stable Identity: Assigns each Pod a unique, persistent identity that remains consistent across rescheduling
- Persistent Storage: Ensures each Pod has its own `PersistentVolume`, maintaining data across restarts
- Ordered Deployment: Deploys and scales Pods in a specific order (pod-0, pod-1, …), crucial for certain stateful applications
- Headless Service: Often used with headless Services to manage the network identities of Pods
# WHAT: Defines a StatefulSet for a Kafka-like app
# WHY: Each Pod needs a stable name and its own volume
# WHEN: Use when app relies on identity and persistent state
# WHERE: Pods get hostnames like kafka-0, kafka-1, etc.
# HOW: Each Pod gets a PVC and starts in order
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        image: bitnami/kafka:latest
        ports:
        - containerPort: 9092
        volumeMounts:
        - name: kafka-storage
          mountPath: /bitnami/kafka
  volumeClaimTemplates:
  - metadata:
      name: kafka-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
      storageClassName: fast-ssd
---
# WHAT: Headless service for stable Pod DNS
# WHY: Allows kafka-0, kafka-1, etc. to resolve by name
# HOW: Used with StatefulSet for stable networking
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
spec:
  # Disables allocation of a cluster IP for the service.
  # Creates DNS records for individual Pods,
  # allowing direct pod-to-pod communication
  # based on their DNS names
  clusterIP: None
  selector:
    app: kafka
  ports:
  - port: 9092
    name: kafka
- Each Pod is Named Predictably: `kafka-0`, `kafka-1`, `kafka-2`
- Each Pod Gets Its Own Volume: PVCs named `kafka-storage-kafka-0`, etc.
- Stable DNS with Headless Service: Required for Kafka's broker-to-broker connections
- Startup and Shutdown Are Ordered: Helps Kafka avoid cluster confusion or split-brain (see the commands below)
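A quick way to verify those behaviors, assuming the manifests above were applied as-is:
kubectl get pods -l app=kafka   # kafka-0, kafka-1, kafka-2
kubectl get pvc                 # kafka-storage-kafka-0, kafka-storage-kafka-1, ...
# Each broker is reachable at a stable DNS name of the form:
#   kafka-0.kafka-headless.<namespace>.svc.cluster.local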
How is a Secret different from a ConfigMap, and how are they mounted in a pod? #
- Separate Sensitive from Non-Sensitive Data: Use Secrets for passwords, tokens, and keys; use ConfigMaps for configs like URLs and log levels
- Encoded vs Plain Storage: Secrets are base64-encoded (encoding, not encryption) and treated as sensitive; ConfigMaps store plain text
- Access-Controlled and Safer: Secrets are stored in etcd with stricter access rules; safer for confidential info
- Same Mounting Options: Both can be mounted as files or exposed as environment variables in Pods
- Example Use Case: Use a Secret for a database password and a ConfigMap for the database hostname in the same Pod
# WHAT: Stores sensitive data securely
# WHY: Used for passwords, tokens, keys
# HOW: Encoded in base64; access-controlled
# WHERE: Created in the Pod's namespace
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  db-password: cGFzc3dvcmQxMjM=
  # "password123" base64-encoded:
  # echo -n "password123" | base64
---
# WHAT: Stores non-sensitive app config
# WHY: Used for URLs, ports, feature flags
# HOW: Stored as plain text key-value pairs
# WHERE: Created in the Pod's namespace
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
data:
  db-host: my-database.svc.cluster.local
  db-port: "5432"
  db-name: myappdb
  db-user: appuser
---
# WHAT: Pod that uses Secret + ConfigMap
# WHY: Mounts password + host into container
# WHEN: App reads them from /app/config/
# WHERE: Values mounted as individual files
apiVersion: v1
kind: Pod
metadata:
  name: db-client
spec:
  containers:
  - name: app
    image: busybox
    command:
    - /bin/sh
    - -c
    - >
      echo DB_HOST=$(cat /app/config/db-host);
      echo DB_PASSWORD=$(cat /app/config/db-password);
      sleep 3600;
    volumeMounts:
    - name: db-host-volume
      mountPath: /app/config/db-host
      subPath: db-host
    - name: db-pass-volume
      mountPath: /app/config/db-password
      subPath: db-password
  volumes:
  - name: db-host-volume
    configMap:
      name: db-config
  - name: db-pass-volume
    secret:
      secretName: db-secret
# ------------------------------------------
# VARIATION: Use env instead of file mounts
# (goes under the container spec)
# env:
# - name: DB_HOST
#   valueFrom:
#     configMapKeyRef:
#       name: db-config
#       key: db-host
# - name: DB_PASSWORD
#   valueFrom:
#     secretKeyRef:
#       name: db-secret
#       key: db-password
# ------------------------------------------
Why are changes to ConfigMaps and Secrets not visible immediately? #
- Not Live Reloaded by Default: Updates to ConfigMaps and Secrets are not immediately reflected in running Pods
- No Built-in Watch Mechanism: Kubernetes does not trigger Pod updates automatically on changes to ConfigMaps or Secrets
- Restart Needed for Environment Variables: If used as environment variables, changes only apply after Pod restart
- Volume Mounts Sync with Delay: When mounted as files, updates are polled by kubelet (typically every 1–2 minutes) and not delivered instantly
- Use External Tools for Auto-Restart: Tools like Reloader, or checksum/hash annotations used with Helm and Kustomize, can automate Pod restarts on changes (see the sketch below)
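Two common workarounds, as a hedged sketch: a manual rollout restart (built into kubectl), and the Stakater Reloader annotation, which follows that tool's documented convention; the Deployment name is a placeholder.
# Option 1: manually restart Pods so they pick up new values
kubectl rollout restart deployment my-app

# Option 2: annotate the workload so Reloader restarts it automatically
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    reloader.stakater.com/auto: "true"  # Reloader triggers a rolling restart
                                        # when referenced ConfigMaps/Secrets change
# ...rest of the Deployment spec unchanged...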