Why is Data Persistence needed for Containers? #


  • Ephemeral Nature of Containers: Containers are designed to be short-lived. Data written to a container's own filesystem survives a stop or restart, but it is lost when the container is removed or replaced by a new one
  • Stateful Applications Need Persistence: Applications like databases, file storage systems, and message brokers need to retain their state across restarts or crashes
  • Data Retention: Critical data—such as user sessions, configurations, and transactions—must survive beyond the lifecycle of individual containers
  • Backup and Recovery: Persistent storage enables backup strategies and ensures data can be recovered after failure
  • Example Use Cases:
    • Running MySQL, PostgreSQL, or MongoDB in a container
    • Web app that allows users to upload profile pictures or documents
    • Shopping cart in an e-commerce containerized frontend
    • Microservice logging activity data
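
The difference between restarting and removing a container can be checked with a short experiment. The following is a minimal sketch, assuming the alpine image can be pulled; the function name, the default container name scratch_demo, and the file /note.txt are illustrative and not part of any Docker convention.

```shell
# Sketch: data in a container's writable layer survives a restart
# but is discarded when the container is removed.
demo_writable_layer() {
  name="${1:-scratch_demo}"   # illustrative container name

  # Start a long-running container and write a file into its own filesystem
  docker run -d --name "$name" alpine sleep 3600
  docker exec "$name" sh -c 'echo "important data" > /note.txt'

  # Restart (wait at most 1 second before killing); the file is still there
  docker restart -t 1 "$name"
  docker exec "$name" cat /note.txt

  # Removing the container discards its writable layer
  docker rm -f "$name"

  # A fresh container from the same image starts without the file
  docker run --rm alpine ls /note.txt || echo "note.txt is gone"
}
```

Calling demo_writable_layer with no arguments runs the whole sequence. Nothing here uses a volume, which is exactly why the file does not outlive the container.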

What is a Volume in Docker? #


Volumes:

  • Volumes provide persistent storage: Data survives container restarts and removals
  • Volumes enable Data sharing: Multiple containers can access the same volume, enabling shared data

Types of Volumes in Docker:

  • 1) Named Volumes: Have a specific name assigned to them when created

    • Command: Created using command docker volume create <volume-name> or automatically when a container specifies a volume that Docker hasn't created yet
    docker volume create volume_name #OPTIONAL!
    docker run -d -v volume_name:/path/in/container my_image:tag

    Example:

    docker volume create pg_data #OPTIONAL
    
    # Run container
    # Docker-managed named volumes (like pg_data) 
    # are stored on the host filesystem
    # but in a location managed internally by Docker
    docker run -d \
    --name my_postgres \
    -e POSTGRES_USER=myuser \
    -e POSTGRES_PASSWORD=mypassword \
    -e POSTGRES_DB=mydatabase \
    -v pg_data:/var/lib/postgresql/data \
    -p 5432:5432 \
    postgres:16  # pinned tag: 18+ images use a different data directory layout
    
    # To check if the container is running
    docker container ls
    
    # To see the created volume
    docker volume ls
    
    # To inspect the volume
    docker volume inspect pg_data
    # `{"Name": "pg_data",
    #     "Mountpoint": "/var/lib/docker/volumes/pg_data/_data",...}`
  • 2) Anonymous Volumes: Created and managed by Docker and don't have a user-assigned name

    • Usage: Temporary or throwaway data during container runtime.
      • Used when volume re-usability is NOT important.
    • Command: Automatically created when a container specifies a volume destination without explicitly creating a named volume
    docker run -d -v /path/in/container my_image:tag

    Example:

    
    # Run container
    docker run -d \
    --name my_nginx \
    -v /var/log/nginx \
    -p 8080:80 \
    nginx:latest
    
    # To check if the container is running
    docker ps
    
    # To see the created anonymous volume
    docker volume ls
    
    # To inspect the volume
    docker volume inspect VOLUME_NAME # AUTO GENERATED NAME
docker volume ls
DRIVER    VOLUME NAME
local     9f4b1e3e5a9c4f7d88d46bb5efde14a3
local     project_data_volume
local     812dcff9c2a83c3b8f3dd70d7c0a9b0d
local     mysql_db_data
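
Because anonymous volumes only carry a generated ID, they tend to pile up once their containers are gone. The sketch below shows two ways to keep them in check, using the docker rm -v flag and the dangling filter of docker volume ls. The function names are illustrative helpers, not Docker commands.

```shell
# Sketch: cleaning up after containers that used anonymous volumes.

remove_with_anonymous_volumes() {
  # usage: remove_with_anonymous_volumes CONTAINER
  # -f stops the container if it is running;
  # -v also removes the anonymous volumes attached to it.
  # Named volumes (for example pg_data) are left untouched.
  docker rm -f -v "$1"
}

list_dangling_volumes() {
  # Lists volumes that are not referenced by any container
  docker volume ls --filter dangling=true
}
```

For example, remove_with_anonymous_volumes my_nginx would remove the nginx container from the example above together with its log volume. Starting a container with docker run --rm has a similar effect: its anonymous volumes are removed when the container exits.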

Explain Commands used to manage Volumes #


Create:

  • Create a Named Volume
docker volume create project_data_volume

Lists:

  • Lists all Docker volumes
docker volume ls
DRIVER    VOLUME NAME
local     9f4b1e3e5a9c4f7d88d46bb5efde14a3
local     project_data_volume
local     812dcff9c2a83c3b8f3dd70d7c0a9b0d
local     mysql_db_data

Inspection:

  • Returns detailed information about a Docker volume in JSON format
    • Mountpoint: Actual path where the volume data resides on the host
    • Name: The volume's name (a randomly generated hex ID for anonymous volumes)
docker volume inspect <VOLUME_NAME>
[
    {
        "CreatedAt": "2025-12-01T14:23:45Z",
        "Driver": "local",
        "Labels": {
            "project": "sample-app",
            "env": "dev"
        },
        "Mountpoint": "/var/lib/docker/volumes/project_data_volume/_data",
        "Name": "project_data_volume",
        "Options": {},
        "Scope": "local"
    }
]

Remove a Specific Volume:

  • Deletes a specified Docker volume
docker volume rm <VOLUME_NAME>

Remove All Unused Volumes:

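  • Removes volumes that are not currently used by any container. On Docker Engine 23.0 and later, only unused anonymous volumes are removed by default; add --all (-a) to also remove unused named volumes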
docker volume prune
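
The Labels shown in the inspect output can be set at creation time and then used to find volumes again. The following is a small sketch of that workflow; the helper function names are illustrative, and the label project=sample-app simply mirrors the sample output above.

```shell
# Sketch: labelling volumes and querying them afterwards.

create_labeled_volume() {
  # usage: create_labeled_volume NAME KEY=VALUE
  docker volume create --label "$2" "$1"
}

list_volumes_by_label() {
  # usage: list_volumes_by_label KEY=VALUE
  docker volume ls --filter "label=$1"
}

volume_mountpoint() {
  # usage: volume_mountpoint NAME
  # Prints only the Mountpoint field instead of the full JSON document
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

A typical sequence would be create_labeled_volume project_data_volume project=sample-app, followed by list_volumes_by_label project=sample-app and volume_mountpoint project_data_volume.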

Explain Bind Mounts with an example #


  • Bind mounts: Allow us to mount a directory or file on the host machine into a container
  • Path linking: Links to a specific path on the host filesystem
  • Flexibility: Offers direct access to host files, ideal for development and local testing
  • Use case: Real-time data sharing between host and container
  • Command:
docker run -d -v /host/path:/container/path my_image:tag

Example:

  • Web Development: You are developing a website locally with Docker. The website files are stored on your host machine at /tmp/website, and any change made on the host should be instantly visible in the container running your web server
  • Command:
    mkdir -p /tmp/website
    
    # Run the Apache Container
    docker run -d --name my_apache \
    -v /tmp/website:/usr/local/apache2/htdocs \
    -p 8080:80 \
    httpd:latest
    
    # Check if Apache is serving the page
    curl http://localhost:8080
    
    # Edit the index.html file on the host
    echo "<h1>Apache Server Updated - Live Changes</h1>" > \
    /tmp/website/index.html
    
    # Verify Changes with curl
    curl http://localhost:8080
    
    # Verify inside the container by reading the same file there
    docker exec my_apache cat /usr/local/apache2/htdocs/index.html
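
When the container only needs to read the host files, the bind mount can be made read-only so that nothing inside the container can modify them. The sketch below uses the longer --mount syntax for this. The function name, the default container name my_apache_ro, and host port 8081 are illustrative choices made to avoid clashing with the example above.

```shell
# Sketch: serving a host directory through a read-only bind mount.
run_apache_readonly() {
  # usage: run_apache_readonly HOST_DIR [CONTAINER_NAME]
  # HOST_DIR should be an absolute path that already exists;
  # unlike -v, --mount reports an error for a missing host path
  # instead of creating it.
  docker run -d --name "${2:-my_apache_ro}" \
    --mount "type=bind,source=$1,target=/usr/local/apache2/htdocs,readonly" \
    -p 8081:80 \
    httpd:latest
}
```

For example, run_apache_readonly /tmp/website starts a second Apache container on port 8081. Edits on the host still show up immediately, while writes from inside the container fail with a read-only filesystem error. The short form of the same mount is -v /tmp/website:/usr/local/apache2/htdocs:ro.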
    

Docker Volumes vs Bind Mounts #


| Feature / Use Case | Docker Volumes | Bind Mounts |
|---|---|---|
| Definition | Managed by Docker: Stored in Docker's internal storage (usually under /var/lib/docker/volumes/). | Directly map a file or directory from the host filesystem into the container. |
| Management | Managed by Docker. | Not managed by Docker: not listed by docker volume ls and not available to docker volume inspect. |
| Portability & Isolation | Portable and isolated from host filesystem. Easy to back up, inspect, and share across containers. | Tightly coupled with host. Changes on host are reflected in container and vice versa. |
| Use Cases | Databases (e.g., MySQL, Postgres) storing data. | Mounting source code during development; live file editing workflows. |

Is it possible to share data between Multiple Containers? #


  • Yes: Data can be shared between multiple containers in Docker.

1) Data Sharing Using Volumes:

  • Recommended method: Docker volumes are the preferred way to share data between containers
  • Named volumes: Create and mount a named volume into both containers
  • Shared data store: Both containers can read from and write to the same volume
    # Create a named volume
    docker volume create shared_data
    # Container 1 mounts the volume
    docker run -d --name container1 -v shared_data:/data my_image1
    # Container 2 mounts the same volume
    docker run -d --name container2 -v shared_data:/data my_image2
  • Example:
    # Create a named volume
    docker volume create shared_data
    
    # Run the first container (Producer)
    docker run -d --name container1 \
    -v shared_data:/data alpine sh -c \
    "echo 'Hello from container1' > /data/hello.txt && tail -f /dev/null"
    
    # Run the second container (Consumer); it prints the file written by container1
    docker run --rm --name container2 \
    -v shared_data:/data alpine cat /data/hello.txt

2) Data Sharing Using Bind Mount:

  • Bind mounts: Can be used to share data between containers by mounting the same host directory into both containers
    # Container 1 mounts a host directory
    docker run -d --name container1 -v /host/path:/data my_image1
    # Container 2 mounts the same host directory
    docker run -d --name container2 -v /host/path:/data my_image2
  • Example:
    #  Create a Host Directory for Shared Data
    mkdir -p /tmp/shared_logs && cd /tmp/shared_logs
    
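    # Start the First Container (Producer)
    # Single quotes stop the host shell from expanding the command,
    # so date runs inside the container on every iteration
    docker run -d --name container1 \
    -v /tmp/shared_logs:/data alpine sh -c \
    'while true; do date >> /data/log.txt; sleep 1; done'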
    
    # Start the Second Container (Consumer)
    docker run --rm --name container2 \
    -v /tmp/shared_logs:/data alpine sh -c "tail -f /data/log.txt"
    
    

3) Data Sharing Using Volumes from Containers:

  • Volume sharing: One container can access volumes from another container

    • If the container has multiple volume mounts, --volumes-from inherits all of them
  • Data sharing: Useful for sharing data between running containers

    # Start container 1 with a named volume
    docker run -d --name container1 \
    -v shared_data:/data my_image1
    # Start container 2 and use the volume from container 1
    docker run -d --name container2 \
    --volumes-from container1 my_image2
  • Example:

    # Start Container 1 (Uploader/Producer)
    docker run -d --name container1 \
    -v shared_data:/data alpine sh -c \
    "echo 'Hello from container1' > /data/data.txt && tail -f /dev/null"
    
    # Start Container 2 (Processor/Consumer)
    docker run --rm --name container2 \
    --volumes-from container1 alpine cat /data/data.txt
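
When several containers share data, it is often safer to let only one of them write. Both -v and --volumes-from accept a :ro suffix for this. The following is a minimal sketch; the function names are illustrative, and it assumes the shared volume is mounted at /data as in the examples above.

```shell
# Sketch: consumers that can read shared data but not modify it.

run_readonly_consumer() {
  # usage: run_readonly_consumer VOLUME FILE
  # Mounts the named volume read-only and prints one file from it
  docker run --rm -v "$1":/data:ro alpine cat "/data/$2"
}

run_readonly_from() {
  # usage: run_readonly_from CONTAINER FILE
  # Inherits all volumes of CONTAINER, each mounted read-only
  docker run --rm --volumes-from "$1":ro alpine cat "/data/$2"
}
```

For example, run_readonly_consumer shared_data hello.txt reads the file from the named volume, and run_readonly_from container1 data.txt does the same through the producer container. In both cases an attempt to write under /data from the consumer fails.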
    

How can you recover data after accidentally deleting a Container? #


  • Volume Storage: Data stored in a Docker volume remains intact even after container removal, allowing reuse with new containers.
    docker run -v my_volume:/app/data my-image
  • Bind Mounts from Host: Data saved in a host directory (e.g., -v $(pwd)/data:/app/data) persists independently of the container.
    docker run -v /your/local/path/data:/app/data my-image
  • Container Filesystem: Data stored inside the container's filesystem is lost once the container is deleted, and cannot be recovered unless backed up.
  • Recommendations:
    • Always use volumes or bind mounts for important data to prevent data loss.
    • Regularly back up data stored within containers if not using volumes or bind mounts.
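
A common way to back up a volume is to mount it into a temporary container next to a host directory and archive it with tar. The sketch below follows that pattern. The function names and the mount points /source, /target, and /backup are illustrative, and HOST_DIR is assumed to be an absolute path on the host.

```shell
# Sketch: archive a named volume to the host and restore it later.

backup_volume() {
  # usage: backup_volume VOLUME HOST_DIR ARCHIVE_NAME
  # The volume is mounted read-only; the archive is written to HOST_DIR
  docker run --rm \
    -v "$1":/source:ro \
    -v "$2":/backup \
    alpine tar czf "/backup/$3" -C /source .
}

restore_volume() {
  # usage: restore_volume VOLUME HOST_DIR ARCHIVE_NAME
  # Unpacks the archive into the volume (created if it does not exist)
  docker run --rm \
    -v "$1":/target \
    -v "$2":/backup:ro \
    alpine tar xzf "/backup/$3" -C /target
}
```

For example, backup_volume my_volume /tmp/backups my_volume.tar.gz creates the archive, and restore_volume my_volume /tmp/backups my_volume.tar.gz unpacks it again. For a consistent archive, stop any container that is writing to the volume first, especially a database.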

What are Volume Drivers? #


  • Default Storage: Docker uses the local filesystem to store data by default
  • Volume Drivers: Plugins that enable Docker to create and manage volumes with different backends
  • External Storage Management: Support for storing, managing, and accessing data outside the local system
    • Data Sharing Across Hosts: Use of systems like NFS to share data between multiple Docker hosts
    • Cloud Storage Integration: Compatibility with cloud services such as Amazon EBS or Azure File Share
  • Additional Capabilities: Depending on the driver, volumes can gain features beyond plain local storage
    • Enhanced Redundancy: Improve data reliability and availability through specialized storage solutions
    • Performance Optimization: Different drivers can optimize performance based on backend storage capabilities
    • Security Features: Some volume drivers support encryption and access controls for sensitive data
    • Cross-Platform Compatibility: Drivers can facilitate data access across various operating environments
  • Custom Volume Creation: Use a command like docker volume create --driver DRIVER_NAME --opt key=value volume_name to specify the driver and its options
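
As one concrete case, the built-in local driver can mount an NFS export through --opt flags, without installing an extra plugin. The following is a minimal sketch. The function name is illustrative, and the server address and export path in the usage note are placeholders rather than values from this document.

```shell
# Sketch: an NFS-backed volume using the built-in local driver.
create_nfs_volume() {
  # usage: create_nfs_volume NAME NFS_SERVER_ADDR EXPORT_PATH
  docker volume create --driver local \
    --opt type=nfs \
    --opt "o=addr=$2,rw" \
    --opt "device=:$3" \
    "$1"
}
```

For example, create_nfs_volume nfs_data 192.168.1.10 /exports/data defines the volume, which can then be mounted with -v nfs_data:/data like any other named volume. The export is typically mounted only when a container first uses the volume, so a wrong address or path may not surface until then.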