What is a Dockerfile? How do you build a Docker image? #


Dockerfile: A text file containing the ordered series of instructions Docker follows to build an image

Example:

  • Dockerfile
# Use an official Python runtime as the base image
FROM python:3.8-slim

# Sets the working directory inside the container
# Conceptually similar to `cd` command
WORKDIR /app

# Copies files/directories from the build context (on the host) into the image
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Documents that the application listens on port 80
# EXPOSE alone does not publish the port; use -p with docker run to do that
EXPOSE 80

# Define the command to run the application
CMD ["python", "app.py"]
  • Build the image using:
docker build -t my-python-app .
  • Create and start a container from the image (-p 80:80 publishes container port 80 on host port 80):
docker run -d -p 80:80 my-python-app

# Test
curl http://localhost



What is a Base Image? #


  • Starting Point for Builds: Provides an initial starting point to build your container images
  • Add Custom Things: Add your application code, dependencies, and configurations
  • Huge Variety: Base images come in several kinds
    • OS Images: Examples include ubuntu:22.04, debian:12, alpine:3.20
    • Language Runtimes: Examples include openjdk:17-jdk-slim, node:22-alpine, python:3.12-slim; these come with pre-installed runtime environments
    • Minimal & Secure Options: Such as alpine:3.20 or gcr.io/distroless/java21-debian12; they are small and have a smaller attack surface
    • Blank Slate: The 'scratch' image gives a completely empty starting point for maximum control
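
As a rough illustration of the "blank slate" option, the sketch below builds an image from scratch. It assumes a statically linked executable named hello (a hypothetical file, not part of this guide) sits next to the Dockerfile. Because scratch contains no shell and no libraries, a dynamically linked binary or a shell-form command would not run in it.

```dockerfile
# Start from the empty image: no OS files, no shell, no package manager
FROM scratch

# Assumption: "hello" is a statically linked binary in the build context
COPY hello /hello

# Exec form is needed here because scratch has no /bin/sh
ENTRYPOINT ["/hello"]
```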

Why is the Base Image a Key Choice? #


  • Size: Smaller images lead to faster downloads and less storage usage
  • Security: Choosing a minimal and well-maintained image reduces vulnerability risks
  • Performance: Influences startup time and runtime efficiency of your container
  • Community Support: Popular images often have better documentation and troubleshooting resources

What is the difference between ENTRYPOINT and CMD in a Dockerfile? Explain with an example #


  • ENTRYPOINT: Defines the main executable

    • Main executable: Sets the container's primary command
    • Not easily overridden: Arguments passed to docker run are appended to the ENTRYPOINT rather than replacing it; replacing it requires the --entrypoint flag
  • CMD: Default arguments to the ENTRYPOINT (or the default command when no ENTRYPOINT is set)

    • Default execution: Provides the default command or arguments used when docker run is given none
    • Can be easily overridden: Replaced by any arguments passed to docker run after the image name

ENTRYPOINT Example:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container
COPY . /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Set the entrypoint to run the Python interpreter with app.py
ENTRYPOINT ["python", "app.py"]
  • ENTRYPOINT instruction sets the entry point for the container as python app.py
  • When the container starts, it always runs python app.py; any arguments passed to docker run are appended as arguments to app.py instead of replacing the command
  • To override the ENTRYPOINT, you need to use the --entrypoint flag with docker run (-it keeps the interactive shell open):
docker run -it --entrypoint /bin/bash myimage

ENTRYPOINT + CMD:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container
COPY . /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Set the entrypoint to run the Python interpreter
ENTRYPOINT ["python"]

# Define the default argument to run
CMD ["app.py"]
  • This Dockerfile sets ENTRYPOINT to python and CMD to app.py
    • When the container starts, it will run python app.py
    • You can override the CMD argument like this:
      docker run myimage another_script.py

CMD Alone Example:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container
COPY . /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Define the default command to run when the container starts
CMD ["python", "app.py"]
  • The CMD instruction specifies that the container should run python app.py by default
  • You can override this command by specifying a different command when you run the container
docker run myimage python another_script.py
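
To make the interaction concrete, here is a small sketch that fixes the executable with ENTRYPOINT and supplies default flags with CMD. The --port flag is hypothetical: it assumes app.py accepts such an option. The trailing comments list the command each docker run invocation would execute.

```dockerfile
FROM python:3.8-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt

# Fixed part of the command
ENTRYPOINT ["python", "app.py"]

# Default arguments (assumption: app.py understands --port)
CMD ["--port", "80"]

# docker run myimage                                -> python app.py --port 80
# docker run myimage --port 8080                    -> python app.py --port 8080
# docker run --entrypoint python myimage --version  -> python --version
```

Note that using --entrypoint also discards the image's default CMD, which is why the last invocation does not receive --port 80.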

Difference between ADD and COPY in a Dockerfile #


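  • COPY: Simply copies files and directories from the build context (on the host) into the image
  • ADD: Does everything COPY does, and can also download files from URLs and automatically extract local tar archives (files downloaded from a URL are not extracted)
  • Recommended: Prefer COPY unless you specifically need one of ADD's extra features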

Example:

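# Copy a local file into the /app directory in the image
COPY localfile.txt /app/

# Download a file from a URL into the /app directory in the image
# Files fetched from a URL are NOT extracted automatically
ADD http://example.com/file.tar.gz /app/

# Extract a local tar archive (hypothetical local.tar.gz) into /app
ADD local.tar.gz /app/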

How do you tag a Container image? Why is tagging important? #


  • Tagging Command: Use docker build -t imagename:tag . to assign a tag
  • Importance:
    • Version Control: Helps identify image versions
      • microservice-a:v1, microservice-a:v2, microservice-a:v3
      • mysql:5.7, mysql:8.0
      • openjdk:8, openjdk:17
    • Rollbacks: Easily revert to older versions of your image if newer ones fail
    • Flexibility: Gives teams flexibility w.r.t. deploying different versions in different environments (dev, staging, prod).
    • Multiple Tags: You can have multiple tags for an image
      • latest is the conventional tag for the most recent build, and it is the default when no tag is specified. It does not move automatically; you must re-tag it for each release. Example: myapp:latest

Example:

# Build the myapp image with the tag 1.0.0
docker build -t myapp:1.0.0 .

# You can have multiple tags for an image
# Add an additional tag, latest, to the myapp:1.0.0 image
docker tag myapp:1.0.0 myapp:latest

Explain the layered approach to building a Container image #


  • Layered builds: Container images are created in layers
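  • Example: Instructions that change the filesystem, such as RUN, COPY, and ADD, each create a new, read-only layer on top of the previous one. FROM supplies the base image's layers, and instructions such as CMD, ENV, and EXPOSE only add metadata.
  • Why is this important?
    • Layers are cached: Unchanged layers are reused from the cache, making builds faster and more efficient
    • However, Be Cautious: Changing one instruction invalidates its layer and every layer after it
    • Example Below: In the Changed Dockerfile, Instruction 1 is reused from the cache, while Instruction 2 CHANGED and every instruction after it (3 to 6) are rebuilt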

Example Dockerfile

Instruction 1
Instruction 2
Instruction 3
Instruction 4
Instruction 5
Instruction 6

Changed Dockerfile

Instruction 1
Instruction 2 CHANGED
Instruction 3
Instruction 4
Instruction 5
Instruction 6
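
One practical consequence of read-only layers is that files deleted by a later instruction still take up space in the earlier layer that created them. A minimal sketch, assuming a Debian-based image and using curl purely as an example package:

```dockerfile
FROM debian:12

# Less effective: the package lists are stored in the first RUN layer,
# so deleting them in a later layer does not shrink the image.
# RUN apt-get update
# RUN apt-get install -y --no-install-recommends curl
# RUN rm -rf /var/lib/apt/lists/*

# Better: download, install, and clean up within a single layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```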

How can you design Dockerfiles to maximize layer reuse? #


  • Application code changes often: Application code changes frequently, while libraries and dependencies change infrequently
    • Application code evolves to add new features, fix bugs, and adapt to changing requirements
    • Libraries and dependencies, on the other hand, tend to have more stable APIs and are updated less frequently, often for security patches, performance improvements, or new features
  • Recommended: Build Layers with Libraries and Dependencies First and Code Later

Non-Optimized Dockerfile:

# Use an official Node.js runtime as a parent image
FROM node:24

# Set the working directory
WORKDIR /usr/src/app

# Copy application code + dependencies files
COPY . .

# Install dependencies - TAKES TIME!
RUN npm install

# Expose the application port
EXPOSE 3000

# Run the application
CMD ["node", "app.js"]
  • In this Dockerfile, the COPY . . command copies the entire application code (app.js,...) + dependencies files (package.json and package-lock.json) before running npm install
  • This means any change in the application code (even a small one) will cause Docker to invalidate the cache for the COPY . . instruction and all subsequent instructions, leading to frequent rebuilds of the npm install step

Optimized Dockerfile

# Use an official Node.js runtime as a parent image
FROM node:24

# Set the working directory
WORKDIR /usr/src/app

# Copy only package.json and package-lock.json first
COPY package*.json ./

# Install dependencies - TAKES TIME!
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the application port
EXPOSE 3000

CMD ["node", "app.js"]
  • In this optimized Dockerfile, package.json and package-lock.json files are copied first
  • The RUN npm install step runs only if dependencies change (package.json or package-lock.json)
    • RUN npm install will NOT run again if ONLY application code (app.js,...) changes!
  • The COPY . . command copies the rest of the application code after installing dependencies.
  • This ensures changes to the application code do not affect the dependencies layer, allowing Docker to cache the npm install step unless package.json or package-lock.json change
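
The same ordering can be applied to other ecosystems. The following is a sketch for a Python application like the one in the first example, assuming its dependencies are listed in requirements.txt and its entry point is app.py:

```dockerfile
FROM python:3.8-slim
WORKDIR /app

# Copy only the dependency manifest first
COPY requirements.txt .

# This layer stays cached until requirements.txt changes
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code last, since it changes most often
COPY . .

CMD ["python", "app.py"]
```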

What is a multi-stage build, and how can it be used to reduce the size of a container image? #


  • Avoid Unnecessary Files in Container Images
    • Build steps often create temporary or dev-only files and directories (target/, .class files, node_modules/, dist/, test logs, etc.) that should not end up in the final image
    • Including unused files increases image size
    • Larger images mean slower builds and deployments
    • Extra files and tools mean higher security risks
  • Multi-stage Dockerfile: Enables creating Container Images in multiple stages
  • How Does It Work?:
    • Multiple FROM statements: Each FROM starts a new build stage
    • Artifact copying: Transfer only necessary artifacts between stages to keep the final image small
  • Size optimization: Reduces final image size by excluding unnecessary tools and dependencies

Golang Example 1: Larger Image Size

// main.go
package main
import "fmt"
func main() {
    fmt.Println("Hello, Multi-Stage Builds!")
}
  • Dockerfile
FROM golang:1.16

WORKDIR /app

COPY main.go .

RUN go build main.go

CMD ["./main"]
  • Problem: final image includes unnecessary build tools, making it larger than needed
docker build -t large-image-size:latest .

# Test
docker images | grep -i "large-image-size"

Golang Example 2: Smaller Image Size

// main.go
package main
import "fmt"
func main() {
    fmt.Println("Hello, Multi-Stage Builds!")
}
  • Dockerfile
# Stage 1: Build the Go application
FROM golang:1.16 AS builder

WORKDIR /app

COPY main.go .

RUN go build main.go

# Stage 2: Create a minimal image with the built binary
FROM alpine:latest

WORKDIR /root/

COPY --from=builder /app/main .

CMD ["./main"]
  • Solution: Separate build stage, copy only compiled binary to minimal base
docker build -t small-image-size:latest .

# Test
docker images | grep -i "small-image-size"
  • Advantage: Smaller image, no unnecessary build tools or dependencies, and faster pulls and deployments
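
A possible variation, offered as a sketch rather than as part of the original example: the final stage can be made even smaller by using an image with no shell or package manager. This assumes the gcr.io/distroless/static-debian12 image as the runtime base and disables cgo so that the binary is statically linked.

```dockerfile
# Stage 1: build a statically linked binary
FROM golang:1.16 AS builder
WORKDIR /app
COPY main.go .
# CGO_ENABLED=0 avoids linking against the C library of the build image
RUN CGO_ENABLED=0 go build -o main main.go

# Stage 2: copy only the binary into a runtime image without a shell
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/main /main
ENTRYPOINT ["/main"]
```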

Java Multi-Stage Build Example

# Build a JAR File
FROM maven:3.8.2-jdk-8-slim AS stage1
WORKDIR /home/app
COPY . /home/app/
RUN mvn -f /home/app/pom.xml clean package

# Create an Image
FROM openjdk:24-jdk
EXPOSE 5000
COPY --from=stage1 /home/app/target/hello-world-java.jar hello-world-java.jar
# Exec form runs java directly, without a shell wrapper
ENTRYPOINT ["java", "-jar", "/hello-world-java.jar"]

Is it possible to make changes to an existing Container image? #


  • Read-only design: Container images are immutable — you cannot directly modify them.
    • This ensures consistency, reproducibility, and follows the principle of immutable infrastructure.
  • Commit changes: However, modifications can be made to a running container and then committed to create a new image
  • New image creation: The original image is left untouched; the commit produces a separate image that records the container's changes as an additional layer

Example: Commit changes from container & Create new image

  • Step 1: Run a container from the nginx:latest image
docker run --name test1 -d -p 80:80 nginx

# Test
curl http://localhost:80
  • Step 2: Modify the running container
# Escape the inner double quotes so they are kept in the generated HTML
docker exec test1 sh -c \
'echo "<h1 style=\"font-size:400px\">Testing</h1>" > /usr/share/nginx/html/index.html'

# Test
curl http://localhost:80
  • Step 3: Create a new container image from the running container
docker commit test1 newnginx:modified
  • Step 4: Run a container from the newnginx:modified image (newly created)
docker run --name test2 -d -p 81:80 newnginx:modified

# Test
curl http://localhost:81

What is the purpose of the .dockerignore file? #


  • Exclude files: .dockerignore excludes the listed files and directories from the build context, so they are never sent to the builder or copied into the Container image; this keeps builds fast and images lightweight
  • Pattern matching: Define file and directory patterns (similar to .gitignore) that Docker should skip during the build process

Example .dockerignore File:

# Ignore node_modules directory
node_modules

# Ignore log files
*.log

# Ignore development or editor-specific files
.DS_Store
.vscode/
.idea/

What Are Alternatives to Writing Dockerfile? #


  • Manual Dockerfile Challenges: Writing a good Dockerfile is challenging
    • Mistakes can lead to security issues, large images and long build times (inefficient caching)
  • Jib: Java-specific tool that eliminates the need for Dockerfiles
    • Layer Caching: Built-in optimization to speed up image building process
    • Security Features: Reduce risks by automating best practices
  • Buildpacks: Multi-language support (Java, Node.js, Python, etc.), no Dockerfile required
    • Layer Caching: Built-in optimization to speed up image building process
    • Security Features: Reduce risks by automating best practices
    • Language Compatibility: Supports a variety of programming languages
# Build a Java image with Jib
mvn compile com.google.cloud.tools:jib-maven-plugin:build \
  -Dimage=gcr.io/my-project/my-app

# Build a Java image with Paketo Buildpacks
pack build my-spring-app \
  --builder paketobuildpacks/builder:base \
  --env BP_JVM_VERSION=24 \
  --env BP_LOG_LEVEL=debug

# Use the Spring Boot Maven Plugin (uses Paketo Buildpacks)
mvn spring-boot:build-image

What is OCI and Why Is It Important? #


  • Problem Before OCI: Proprietary formats led to vendor lock-in and limited interoperability
  • Open Container Initiative (OCI): Defines open, vendor-neutral specifications for containers
    • Image Spec: Standard format for container images
    • Runtime Spec: Standardization for running containers across different platforms
    • Distribution Spec: Standardized methods for pushing and pulling images through container registries (Supported by Docker Hub, Google Artifact Registry, Amazon Elastic Container Registry, ...)
  • Open Governance: Managed by the Linux Foundation, ensuring open collaboration and transparency
  • OCI is Supported By Almost All Container-Based Tools and Cloud Platforms: Docker, Podman, Buildpacks, Kubernetes, containerd, AWS, Azure, Google Cloud