What is a Dockerfile, and how do you build a Docker image? #


Dockerfile: A text file containing a series of instructions that Docker executes in order to build an image

Example:

  • Dockerfile
# // Dockerfile

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Document that the app listens on port 80 (the port is published with -p at run time)
EXPOSE 80

# Define the command to run the application
CMD ["python", "app.py"]
  • Python App app.py:
# // app.py
from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    return "Hello, Docker World!"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
  • Python App dependency: requirements.txt
flask
  • Build the image using:
docker build -t my-python-app .
  • Create and start a container from the image:
docker run -d -p 80:80 my-python-app

# Test
curl http://localhost


How do you tag a Docker image? Why is tagging important? #


  • Tagging Command: Use docker build -t imagename:tag . to tag an image at build time, or docker tag source:tag target:tag to add another tag to an existing image
  • Importance:
    • Version Control: Helps identify image versions and roll back if necessary
    • Organization: Distinguishes between development, testing, and production images
    • Automation: Facilitates automated deployments by referencing specific tags

Example:

docker build -t myapp:1.0.0 .

docker tag myapp:1.0.0 myapp:latest
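
To make the name:tag convention concrete, here is a minimal Python sketch of how a reference such as myapp:1.0.0 splits into a name and a tag. The function name split_image_reference is hypothetical and not part of any Docker tooling, and the sketch ignores digest references (name@sha256:...). It assumes the usual behavior that a missing tag means latest.

```python
from typing import Tuple


def split_image_reference(reference: str) -> Tuple[str, str]:
    """Split 'name[:tag]' into (name, tag), defaulting the tag to 'latest'.

    Hypothetical helper for illustration only; digests are not handled.
    """
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    # A colon that appears before the last slash belongs to a registry
    # port (for example localhost:5000/myapp), not to a tag.
    if last_colon > last_slash:
        return reference[:last_colon], reference[last_colon + 1:]
    return reference, "latest"


if __name__ == "__main__":
    for ref in ("myapp:1.0.0", "myapp", "localhost:5000/myapp"):
        print(ref, "->", split_image_reference(ref))
```

Because an untagged reference falls back to latest, pinning an explicit version tag is what makes rollbacks and automated deployments predictable.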

List the basic instructions used in a Dockerfile #


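  • FROM: Specifies the base image
  • WORKDIR: Sets the working directory inside the container
  • COPY: Copies files/directories from the build context on the host into the image
  • RUN: Executes a command at build time and saves the result as a new layer
  • EXPOSE: Documents the port the application listens on; the port is actually published with -p or -P at run time
  • CMD: Specifies the default command (or the default arguments to ENTRYPOINT) to run when the container starts
  • ENTRYPOINT: Sets the command and parameters that execute as the container starts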

Difference between ADD and COPY in a Dockerfile #


  • COPY: Simply copies files and directories from the host to the container
  • ADD: Can copy files and directories, and also supports URL downloads and automatic extraction of local tar archives (an archive downloaded from a URL is not extracted)

Example:

# Copy a local file into the /app directory in the Docker image
COPY localfile.txt /app/

# Add a file from a URL and place it into the /app directory in the image
ADD http://example.com/file.tar.gz /app/

What is the difference between ENTRYPOINT and CMD in a Dockerfile? Explain with an example #


  • CMD: Default command or arguments, overridden by anything passed to docker run
    • Default execution: Supplies the default command, or the default arguments to ENTRYPOINT when one is set
    • Single effect: Only the last CMD in the Dockerfile is effective
  • ENTRYPOINT: Primary command, always runs
    • Main executable: Sets the container's primary command; it is replaced only with the --entrypoint flag
    • Always run: Arguments passed to docker run are appended to it rather than replacing it

CMD Example:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container
COPY . /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Define the default command to run when the container starts
CMD ["python", "app.py"]
  • The CMD instruction specifies that the container should run python app.py by default
  • You can override this command by specifying a different command when you run the container
docker run myimage python another_script.py

ENTRYPOINT Example:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container
COPY . /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Set the entrypoint to run the Python interpreter with app.py
ENTRYPOINT ["python", "app.py"]
  • The ENTRYPOINT instruction sets the entry point for the container as python app.py
  • When the container starts, it runs python app.py; any arguments passed to docker run are appended to this command rather than replacing it
  • To override the ENTRYPOINT, use the --entrypoint flag with docker run (add -it to get an interactive shell):
docker run -it --entrypoint /bin/bash myimage

ENTRYPOINT + CMD:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container
COPY . /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Set the entrypoint to run the Python interpreter
ENTRYPOINT ["python"]

# Define the default argument to run
CMD ["app.py"]
  • This Dockerfile sets ENTRYPOINT to python and CMD to app.py
    • The ENTRYPOINT will always be python, but you can change what script is run with it
    • When the container starts, it will run python app.py
    • You can override the CMD argument like this:
docker run myimage another_script.py
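
The three cases above can be summarized in a small Python model of how the final command is assembled. This is a simplified sketch that assumes the exec (JSON array) form of both instructions. The function effective_command and its parameters are hypothetical names for illustration, not a Docker API.

```python
from typing import List, Optional


def effective_command(
    entrypoint: List[str],
    cmd: List[str],
    run_args: List[str],
    entrypoint_override: Optional[str] = None,
) -> List[str]:
    """Model the process a container starts (exec form only)."""
    if entrypoint_override is not None:
        # --entrypoint replaces ENTRYPOINT and discards the image's default CMD.
        entrypoint = [entrypoint_override]
        cmd = []
    # Arguments given after the image name replace CMD; otherwise CMD is used.
    args = run_args if run_args else cmd
    return entrypoint + args


if __name__ == "__main__":
    # CMD only: docker run myimage python another_script.py
    print(effective_command([], ["python", "app.py"], ["python", "another_script.py"]))
    # ENTRYPOINT + CMD: docker run myimage another_script.py
    print(effective_command(["python"], ["app.py"], ["another_script.py"]))
    # docker run --entrypoint /bin/bash myimage
    print(effective_command(["python"], ["app.py"], [], "/bin/bash"))
```

The model shows why ENTRYPOINT plus CMD is a common pairing: the executable stays fixed while its default arguments remain easy to replace.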

Explain the layered approach to building a Docker image. How can we write an optimized Dockerfile? #


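  • Layered builds: Docker images are built in layers; instructions that change the filesystem (RUN, COPY, ADD) each add a layer, while the others only record metadata
  • Cached layers: Docker reuses a cached layer when the instruction and its inputs are unchanged, which makes builds faster
  • Immutable layers: A layer never changes once built, so unchanged layers can be shared across builds and images
  • Rebuild process: Changing one layer invalidates the cache for all subsequent layers, which are then rebuilt
  • Optimization practice: Order instructions from least to most frequently changing to maximize layer reuse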

Non-Optimized Dockerfile:

# Use an official Node.js runtime as a parent image
FROM node:14

# Set the working directory
WORKDIR /usr/src/app

# Copy the entire application code to the working directory
COPY . .

# Install dependencies
RUN npm install

# Expose the application port
EXPOSE 3000

# Run the application
CMD ["node", "app.js"]
  • In this Dockerfile, the COPY . . command copies the entire application code before running npm install
  • This means any change in the application code (even a small one) will cause Docker to invalidate the cache for the COPY . . instruction and all subsequent instructions, leading to frequent rebuilds of the npm install step

Optimized Dockerfile:

# Use an official Node.js runtime as a parent image
FROM node:14

# Set the working directory
WORKDIR /usr/src/app

# Copy only package.json and package-lock.json first
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the application port
EXPOSE 3000

CMD ["node", "app.js"]
  • In this optimized Dockerfile, package*.json files are copied first
  • The RUN npm install step runs only if dependencies change
  • The COPY . . command copies the rest of the application code after installing dependencies
  • This ensures changes to the application code do not affect the dependencies layer, allowing Docker to cache the npm install step unless package.json or package-lock.json change
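
The effect of instruction order on the cache can be illustrated with a toy Python model of the two Node.js Dockerfiles above. It is a deliberately simplified sketch: each step lists the files it reads, and the first step whose inputs changed is rebuilt together with every step after it. The function rebuilt_steps and the file names are assumptions for illustration, and real Docker compares instruction text and file checksums rather than names.

```python
from typing import List, Set, Tuple

Step = Tuple[str, Set[str]]  # (instruction, files the instruction reads)


def rebuilt_steps(steps: List[Step], changed_files: Set[str]) -> List[str]:
    """Return the instructions that must re-run after the given files change."""
    for index, (_, inputs) in enumerate(steps):
        if inputs & changed_files:
            # Cache is invalidated here and for every later instruction.
            return [instruction for instruction, _ in steps[index:]]
    return []  # Everything is served from cache.


non_optimized: List[Step] = [
    ("COPY . .", {"package.json", "app.js"}),
    ("RUN npm install", set()),
]

optimized: List[Step] = [
    ("COPY package*.json ./", {"package.json"}),
    ("RUN npm install", set()),
    ("COPY . .", {"package.json", "app.js"}),
]

if __name__ == "__main__":
    # Editing only application code:
    print(rebuilt_steps(non_optimized, {"app.js"}))  # npm install re-runs
    print(rebuilt_steps(optimized, {"app.js"}))      # npm install stays cached
```

In this model, a change to app.js forces npm install to re-run only in the non-optimized ordering, which matches the behavior described above.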

What is immutable infrastructure in Docker? #


  • Immutable infrastructure: A paradigm in which servers (or containers) are never modified after they are deployed
  • Rebuild strategy: If a change is needed, a new server (or container) is built from an updated image and deployed, and the old one is decommissioned

Is it possible to make changes to an existing Docker image? #


  • Read-only design: Docker images cannot be changed directly, as their layers are read-only
  • Commit changes: However, modifications can be made to a running container and then committed to create a new image
  • New image creation: Instead of updating the existing image, a new one is created, following immutable infrastructure principles

Example: Commit changes from container & Create new image

  • Step 1: Run a container from the nginx:latest image
docker run --name test1 -d -p 80:80 nginx

# Test
curl http://localhost:80
  • Step 2: Modify the running Docker container
# The inner double quotes are escaped so they reach the HTML file intact
docker exec test1 sh -c \
'echo "<h1 style=\"font-size:400px\">Testing</h1>" > /usr/share/nginx/html/index.html'

# Test
curl http://localhost:80
  • Step 3: Create a new Docker image from the running container
docker commit test1 newnginx:modified
  • Step 4: Run a container from the newnginx:modified image (newly created)
docker run --name test2 -d -p 81:80 newnginx:modified

# Test
curl http://localhost:81

What is a multi-stage build, and how can it be used to reduce the size of a container image? #


  • Multi-stage Dockerfile: A single Dockerfile that defines several build stages, each with its own base image
  • Multiple FROM statements: Each FROM starts a new build stage
  • Artifact copying: Transfer artifacts between stages to keep the final image small
  • Size optimization: Reduces final image size by excluding unnecessary tools and dependencies

Golang Example 1: Larger Image Size

// main.go
package main
import "fmt"
func main() {
    fmt.Println("Hello, Multi-Stage Builds!")
}
  • Dockerfile
FROM golang:1.16

WORKDIR /app

COPY main.go .

RUN go build main.go

CMD ["./main"]
  • Problem: final image includes unnecessary build tools, making it larger than needed
docker build -t large-image-size:latest .

# Test
docker images | grep -i "large-image-size"

Golang Example 2: Smaller Image Size

// main.go
package main
import "fmt"
func main() {
    fmt.Println("Hello, Multi-Stage Builds!")
}
  • Dockerfile
# Stage 1: Build the Go application
FROM golang:1.16 AS builder

WORKDIR /app

COPY main.go .

RUN go build main.go

# Stage 2: Create a minimal image with the built binary
FROM alpine:latest

WORKDIR /root/

COPY --from=builder /app/main .

CMD ["./main"]
  • Solution: Separate build stage, copy only compiled binary to minimal base
docker build -t small-image-size:latest .

# Test
docker images | grep -i "small-image-size"
  • Advantages: Smaller image, no build tools or unnecessary dependencies in the final image, and faster pulls and deployments

What is the purpose of the .dockerignore file? #


  • Exclude files: .dockerignore keeps the listed files and directories out of the build context, so they are never sent to the Docker daemon or copied into the image; this speeds up builds and keeps the image lightweight
  • Pattern matching: Define file and directory patterns, similar to .gitignore, that Docker should skip during the build process

Example .dockerignore File:

# Ignore node_modules directory
node_modules

# Ignore log files in the context root (use **/*.log to match every directory)
*.log

# Ignore development or editor-specific files
.DS_Store
.vscode/
.idea/
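
To show how such patterns select paths, here is a rough Python approximation of the matching rules. It is a sketch under stated assumptions, not Docker's implementation: it matches patterns segment by segment from the context root using Python's fnmatch, treats a matched directory as excluding everything beneath it, and does not support the ** wildcard or ! exception lines. The function is_ignored is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import List


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Approximate whether a context-relative path is excluded."""
    parts = path.strip("/").split("/")
    for pattern in patterns:
        pattern_parts = pattern.strip("/").split("/")
        # The pattern must match the leading segments of the path, so a
        # matched directory also excludes all files inside it.
        if len(pattern_parts) <= len(parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False


if __name__ == "__main__":
    patterns = ["node_modules", "*.log", ".DS_Store", ".vscode/", ".idea/"]
    for candidate in ("node_modules/express/index.js", "debug.log",
                      "src/debug.log", "app.js"):
        print(candidate, "->", is_ignored(candidate, patterns))
```

Under these assumptions, *.log excludes debug.log in the context root but not src/debug.log, which is one place where the rules differ from .gitignore.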