Why is Version Control Essential? #
$ git log --pretty=format:"%h (%an) %s" app.py
9e7c2f5 (Ethan) Add logging to greet()
a3d1cbe (Deepa) Fix typo in greeting
6b2e9a1 (Charlie) Refactor: move greeting to main()
f1a72bb (Bob) Add greeting function
d0a1b22 (Alice) Initial version of app.py
What is Version Control?
- Version Control Systems (VCS): Tools that track changes to files over time, so developers can manage code history and collaborate on the same codebase
- Purpose: Ensures that code history is maintained and that developers can work together without overwriting each other's work
Why is Version Control Essential?
- Collaboration: Multiple developers can work on the same project without overwriting each other's work
- History Tracking: Easily view, compare, and restore previous versions of code
- Branching: Developers can experiment with new features in separate branches without affecting the main code base
How Does Git's Distributed Version Control Improve Collaboration and Efficiency? #
Types of VCS:
- Centralized Version Control Systems (CVCS): A single central server stores all versions of the code (e.g., SVN, CVS)
- Distributed Version Control Systems (DVCS): Every developer has a complete copy of the repository (e.g., Git, Mercurial)
Comparison
Aspect | Centralized VCS (CVCS) | Distributed VCS (DVCS) |
---|---|---|
Repository Location | Single central server | Each developer has a complete copy |
Offline Work (e.g., checking history) | Limited | Fully supported |
Single Point of Failure | Yes (central server outage) | No (any copy can restore the project) |
Backup | Central server must be backed up | Each clone serves as a backup |
Performance | Network-dependent | Local operations are faster |
Step by Step: Using a DVCS while Offline
- (Step 1) Cloning the Repository: `git clone https://github.com/in28minutes/devops-master-class.git`. Automatically creates and checks out the default branch (`main` here) locally
- (Step 2) Go Offline Immediately After Cloning: You can disconnect from the internet and still do a LOT offline
- (Step 3) Make a Change to a File: Edit a file like `README.md` or `kubernetes/02-kubernetes-for-beginners.md`
- (Step 4) Stage and Commit the Change Locally: `git add README.md`, then `git commit -m "Updated README with offline changes"`
- (Step 5) Compare With Remote: `git diff origin/main` (compares your local files with the last fetched state of the remote branch)
- (Step 6) View History of a File: `git log README.md`, `git blame README.md` (helps understand who changed what and when, all offline)
- (Step 7) Switch to Another Local Branch: `git checkout feature-branch-a` (and play with it)
- (Advantage) Supports A Lot of Features Offline: You can commit (locally, step by step), compare, inspect, and undo changes entirely without internet
- (Advantage) Sync Later When Online: Use `git push origin main` when you're back online to upload all your local commits
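
The offline steps above can be tried without any network at all. The following is a minimal sketch, assuming Git 2.7 or newer is installed; the `upstream` folder is a hypothetical local stand-in for the hosted repository, and the identity values are placeholders.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the hosted repository (replaces the GitHub URL)
git init -q upstream
cd upstream
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Initial version of README"
git branch -M main                        # make sure the branch is named 'main'
cd ..

# Step 1: clone once; everything after this line touches only local files
git clone -q upstream dev
cd dev
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Steps 3-4: edit, stage, and commit locally
echo "Edited while offline" >> README.md
git add README.md
git commit -q -m "Updated README with offline changes"

# Step 5: compare with the last fetched state of the remote branch
git diff --stat origin/main

# Step 6: inspect the file history
git log --oneline README.md

# Count the local commits that are waiting for 'git push origin main'
ahead=$(git rev-list --count origin/main..HEAD)
echo "Commits waiting to be pushed: $ahead"
```

The `origin/main` reference is a local bookmark of the remote branch as of the last clone or fetch, which is why the comparison works without a connection.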
How Does Git's Distributed Version Control Improve Collaboration and Efficiency?
- Complete History for Everyone: Every developer's local copy includes the full project history, making offline work possible
- Includes All Branches: Local clones contain all remote branches that were fetched, allowing developers to switch, create, or merge branches offline
- Faster Operations: Most Git commands (like commit, diff) are local, providing fast performance
- No Single Point of Failure: If the main server goes down, any local copy can be used to restore the project
Why is Git's Snapshot System a Game-Changer? #
What is a Snapshot-Based System?
- Snapshot Storage: Git captures the entire state of your project at each commit, like a photograph
- Complete Versions: Instead of storing just the changes (deltas), Git stores a complete version of each changed file
- Efficient Storage: Unchanged files are linked to previous versions rather than duplicated, saving space
How is it different from Delta-Based Systems (SVN)?
- Delta-Based Storage: Only stores the differences between file versions
- More Processing: Requires more computing to reconstruct previous versions
- Complex Branching: Changes are applied sequentially, making branching more complex
An Example of a Snapshot-Based System
- COMMIT 1
  - File Content (two lines): `Welcome` / `Version 1`
  - Git Storage: Stores a full snapshot of the file as-is
- COMMIT 2
  - File Content (two lines): `Welcome` / `Version 2`
  - Git Storage: Stores a new full snapshot of the modified file
- COMMIT 3
  - File Content (two lines): `Hello` / `Version 3`
  - Git Storage: Stores yet another full snapshot of the file (Git optimizes behind the scenes using compression and shared objects)
An Example of a Delta-Based System
- COMMIT 1
  - File Content (two lines): `Welcome` / `Version 1`
  - SVN Storage: Stores the complete file for the initial version
- COMMIT 2
  - Delta Stored: Change line 2 → `Version 2`
  - SVN Storage: Only the difference from version 1 is stored
- COMMIT 3
  - Delta Stored: Change line 1 → `Hello`; change line 2 → `Version 3`
  - SVN Storage: Stores only the differences (deltas) from the previous version
Why is Git's Snapshot System a Game-Changer?
- Speed: Quickly access any commit without calculating differences
- Reliability: Full file versions are always available
- Simplified Merging: Merging is faster because complete versions are available
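
To see how unchanged files are shared between snapshots rather than duplicated, the sketch below makes two commits and compares the object IDs that each commit records per file. It assumes Git is installed; the file names and identity values are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

# Commit 1: two files
printf 'Welcome\nVersion 1\n' > greeting.txt
printf 'Example license text\n' > LICENSE.txt
git add .
git commit -q -m "Commit 1"

# Commit 2: only greeting.txt changes
printf 'Welcome\nVersion 2\n' > greeting.txt
git add greeting.txt
git commit -q -m "Commit 2"

# Object ID of each file as recorded in each commit's snapshot
license_old=$(git rev-parse HEAD~1:LICENSE.txt)
license_new=$(git rev-parse HEAD:LICENSE.txt)
greeting_old=$(git rev-parse HEAD~1:greeting.txt)
greeting_new=$(git rev-parse HEAD:greeting.txt)

echo "LICENSE.txt : $license_old -> $license_new"
echo "greeting.txt: $greeting_old -> $greeting_new"
```

Both commits point at the same stored object for `LICENSE.txt`, while `greeting.txt` gets a new object in the second commit. Each snapshot is complete, yet identical content is stored only once.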
Give a Short History of Git #
History of Git
- The Need: In 2005, the Linux kernel development team required a powerful, distributed version control system
- The Problem: They had been using BitKeeper, a proprietary tool, and its free-of-charge license for kernel developers was withdrawn that year
- The Solution: Linus Torvalds, the creator of Linux, developed Git to solve these problems
- Git is Popular Because of its Key Design Principles:
  - Speed: Git was designed to be fast, even for large projects
  - Security: Changes are securely tracked and verifiable
  - Decentralization: Developers can work independently, without relying on a central server
- 2005: Git was released as an open-source project
- 2008: GitHub launched, making Git widely accessible for collaborative projects (Git is a version control tool; GitHub is a cloud-based platform for hosting and managing Git repositories)
- 2010s: Git adoption surged as open-source communities and companies shifted away from Subversion (SVN) and other centralized tools
- 2018: Microsoft acquired GitHub, further boosting Git's presence in the enterprise market
- Today: Git is the standard for version control in software development
Key Commercial Products Built Around Git
- GitHub: Hosted Git repositories with collaboration, issue tracking, CI/CD, and security tools
- GitLab: Git repository hosting with integrated DevOps lifecycle tools (CI/CD, container registry, monitoring)
- Bitbucket (by Atlassian): Git hosting with Jira and Trello integration
- Azure Repos: Microsoft's Git repository service integrated with Azure DevOps
- AWS CodeCommit: Git-based repository service hosted by Amazon Web Services
- Cloud Source Repositories (CSR): Google Cloud's Git-compatible source control service for storing and managing code in private Git repositories
What are some of the popular Version Control Systems before Git? #
Important Version Control Systems Before Git
Tool | Type | Strengths | Weaknesses | Typical Use Case |
---|---|---|---|---|
CVS (Concurrent Versions System) | Centralized (CVCS) | Simple and lightweight | Limited merge support, prone to errors in large projects | Used by early open-source projects |
Subversion (SVN) | Centralized (CVCS) | User-friendly with basic features | Network-dependent for most operations, performance degrades with large repos or many branches | Popular for corporate projects earlier |
Perforce | Centralized (CVCS) | Excellent performance with large codebases and binary files (like game art and assets) | Complex setup, resource-intensive | Common in game development |
Comparing Git vs SVN
Feature | Git | Subversion (SVN) |
---|---|---|
Type | Distributed (DVCS) | Centralized (CVCS) |
Commit Model | Snapshots | Deltas |
Offline Work | Fully supported | Limited |
Merging | Fast and efficient | Can be slow |
Branching | Lightweight and fast | Branching is slower |
Used By | Most enterprise and open-source projects | Some legacy enterprise projects |
What is GitHub? #
- Platform on Top of Git: GitHub is a web-based platform built on top of Git that enhances what Git can do by adding collaboration and project management features
- Collaboration & Hosting Made Easy: Designed to host Git repositories and provide powerful tools for team collaboration, code review, and project tracking (think of it as Git++)
- Open Source Friendly: Widely used for hosting open-source projects and connecting with the global developer community
- Public and Private Repositories: Choose between public repos for sharing with the world or private repos for secure, team-only collaboration
- User-Friendly Interface for Beginners: Offers a simple and intuitive UI for those who are new to Git, making common tasks easier to perform
- Pull Requests and Code Reviews: Enables developers to propose changes and collaborate through peer reviews before merging to main branches
- Built-In Issue Tracking: Manage bugs, feature requests, and team tasks using GitHub's issue system
- Team Discussions and Knowledge Sharing: Use the Discussions tab as a dedicated space for questions, ideas, and decisions, all in one place
- GitHub Actions for Automation: Set up workflows to automatically build, test, and deploy your code whenever changes are pushed
- GitHub Pages for Web Hosting: Host static websites straight from your repository, perfect for portfolios, documentation, or demos
Git vs GitHub
Feature | Git | GitHub |
---|---|---|
Version Control | Distributed version control for local version management | Cloud-based platform for hosting Git repositories that you can share with all team members |
Project Management | No built-in project management tools | Issue tracking, milestones, and project boards |
CI/CD Integration | Requires manual setup of external CI/CD tools | Integrated support for GitHub Actions to automate workflows |
Community | No direct community interaction | Community engagement through stars, forks, and discussions |
What is a Git Repository? #
- Scenario: Imagine building a project where multiple developers work on the same code, often at the same time. Without a Git repository, it's hard to track changes, avoid overwriting each other's work, or recover from mistakes.
- Git repository: A Git repository stores all your code and its history, like a time machine for your project
- Typically, each project gets its own Git repository
- Tracks Every Change with Context: Git records who made each change, when they made it, and why (through the commit message), enabling complete change management
- Enables Safe Experimentation: Try new ideas locally without fear. If anything breaks, you can always go back to a previous version
- Local Repository for Private Development: You work on your own machine in a fully functional Git environment, even without internet (`git init`)
- Remote Repository for Team Collaboration: A shared repository helps your team work together on the same codebase (create a repository on GitHub and `git clone` it to your local machine)
- Simple Process for Seamless Collaboration: Work locally, commit changes, and push to the remote repo. Teammates pull updates to stay in sync
Practical Example: Create a New Git Repo and Push to GitHub
The goal of this workflow is to set up version control for your project using Git and connect it to GitHub
# STEP 1: Create a new local Git repository
git init
# WHAT: Initializes a new Git repo in current folder
# WHEN: You are starting a project from scratch
# STEP 2: Create a new file or modify files
echo "Hello Git" > hello.txt
# WHAT: Creates a file with sample content
# STEP 3: Check the status of the repo
git status
# WHAT: Shows untracked/modified/staged files
# WHY: Helps verify what will be committed
# STEP 4: Add file(s) to the staging area
git add hello.txt
# VARIATION:
# git add . → stages all changes under the current folder
# git add -A → stages all changes in the whole repo
# STEP 5: Commit staged files
git commit -m "Initial commit"
# WHAT: Creates a snapshot of your staged changes
# BEST PRACTICE: Use short, clear commit messages
# STEP 6: Create a new GitHub repository
# Visit https://github.com/new
# NAME: my-first-git-repo (example)
# Keep the repo EMPTY (no README/.gitignore)
# STEP 7: Connect local repo to GitHub
git remote add origin \
https://github.com/youruser/my-first-git-repo.git
# WHAT: Adds a remote named 'origin' pointing to GitHub
# WHY: So you can push/pull code to/from GitHub
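# STEP 8: Push code to GitHub
git branch -M main
# WHAT: Renames the current branch to 'main'
# WHY: Older Git setups name the first branch 'master'
git push -u origin main
# WHAT: Pushes 'main' branch to 'origin' remote
# WHY: Publishes your local code to GitHub
# -u: Sets 'origin/main' as the upstream for this branch
# Future pushes → just run: git push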
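
Before the first push, it can help to confirm that the branch name, remote URL, and commit are what you expect. The sketch below repeats the local steps in a throwaway folder and prints those three values; it never contacts GitHub. It assumes Git 2.7 or newer, and the `youruser` URL and identity values are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

echo "Hello Git" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"
git branch -M main
git remote add origin https://github.com/youruser/my-first-git-repo.git

# Values worth checking before running 'git push -u origin main'
current_branch=$(git symbolic-ref --short HEAD)
origin_url=$(git remote get-url origin)
commit_count=$(git rev-list --count HEAD)

echo "branch=$current_branch remote=$origin_url commits=$commit_count"
```

If the branch is not `main` or there are no commits yet, `git push -u origin main` has nothing matching to send, so checking first avoids a confusing error.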
How Git Organizes and Stores Your Code #
- Working Directory - Where Files Live and Evolve: This is your active project folder where you create and edit files. It reflects your latest code and uncommitted changes
- Index (Staging Area) - Prepare Files for Commit: When you run `git add`, Git writes the file's content to the object database as a blob and records it in the index, ready to be committed
- Local Object Database - Stores Full Commit History: Once you run `git commit`, Git creates tree and commit objects that reference the staged blobs and stores them inside the hidden `.git/objects` folder. This is your full version history.
- Remote Object Database - Shared Copy in the Cloud: When you run `git push`, Git sends your commits to a remote repository like GitHub, GitLab, or Bitbucket so others can collaborate
- Each Store Has a Specific Role: The working directory holds current files, the index stages selected changes, the local object database saves commit history, and the remote database enables sharing
- Everything Happens Inside the `.git` Folder: Your entire Git project metadata, commit history, references, and configuration are stored here, with no external database needed
- Remote Store Mirrors Local State for Collaboration: Remote repositories hold the same Git objects, enabling team members to pull and push changes seamlessly
- Best Practice: Understanding where objects are stored helps troubleshoot issues with commits, staging, or syncing with remote
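
The split between working directory, index, and object database can be observed directly. This is a small sketch, assuming Git is installed and that new objects are stored as loose files, which is the usual case for a fresh repository. It checks that the object database is empty before staging and that `git add` creates a blob under `.git/objects`.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "Hello Git" > hello.txt

# Working directory only: the object database has no objects yet
objects_before=$(find .git/objects -type f | wc -l)

# Staging writes the content as a blob and records it in the index
git add hello.txt

# The index entry lists mode, object ID, stage number, and path
blob_id=$(git ls-files --stage hello.txt | awk '{print $2}')
object_type=$(git cat-file -t "$blob_id")

# Loose objects live at .git/objects/<first 2 hex chars>/<remaining chars>
object_path=".git/objects/$(echo "$blob_id" | cut -c1-2)/$(echo "$blob_id" | cut -c3-)"

echo "objects before add: $objects_before"
echo "staged object: $blob_id ($object_type) at $object_path"
git cat-file -p "$blob_id"    # prints the stored file content
```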
How a File Moves Through Git: From Untracked to Pushed #
- Flow for a new file
  - Start as Untracked - File Not Known to Git Yet: When you create a new file, Git does not track it until you explicitly add it
  - Staged - Ready to Be Committed: Once added using `git add`, the file becomes tracked and is placed in the staging area, marking it as ready for commit
  - Committed - Saved in Local Clone: Use `git commit` to take a snapshot of staged files; they are now part of your local Git history inside the `.git` folder
  - Pushed - Synced to Remote Repository: Use `git push` to send your committed changes to a remote like GitHub or GitLab for backup and collaboration
  - Easy to Visualize Flow: File states flow like this: Untracked → Tracked and Staged → Committed → Pushed
- Flow for an already committed file
  - Unmodified - No Changes Since Last Commit: The file remains in a stable state; nothing has changed since the last commit
  - Modified - You Made Changes, But Didn't Stage Yet: When you edit a file, Git marks it as modified but does not include it in the next commit unless you stage it
  - Staged - Ready to Be Committed: Run `git add` to include the modified file in the next commit
  - Committed - Change Saved in Local Clone: Use `git commit` to take a snapshot of staged files; they are now part of your local Git history inside the `.git` folder
  - Pushed - Synced to Remote Repository: Use `git push` to send your committed changes to a remote like GitHub or GitLab for backup and collaboration
  - Easy to Visualize Flow: File states flow like this: Modified → Staged → Committed → Pushed
- Foundation: This lifecycle is the foundation of Git's version control. It allows developers to track, manage, and share changes confidently and consistently across teams and time
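
The local part of this lifecycle can be watched with `git status --porcelain`, which prints a two-character state code per file. The sketch below walks one hypothetical file, `notes.txt`, through each state and captures the status after every step. It assumes Git is installed and stops before the push, since that needs a remote.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

echo "v1" > notes.txt
untracked=$(git status --porcelain)       # "?? notes.txt"

git add notes.txt
staged_new=$(git status --porcelain)      # "A  notes.txt"

git commit -q -m "Add notes"
unmodified=$(git status --porcelain)      # empty: nothing to report

echo "v2" >> notes.txt
modified=$(git status --porcelain)        # " M notes.txt"

git add notes.txt
staged_change=$(git status --porcelain)   # "M  notes.txt"

printf '[%s]\n' "$untracked" "$staged_new" "$unmodified" "$modified" "$staged_change"
```

In the two-character code, the first column describes the staging area and the second describes the working directory, so ` M` means modified but not staged, and `M ` means the modification is staged.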
What happens in the background when you commit code? (`git commit`) #
- Copy Staged Changes into the Local Object Database: Git stores the content from the staging area in the `.git/objects` folder, which holds your full version history
- Create a Tree Object for the Snapshot: Git builds a tree structure representing the exact state of your files at the time of the commit
- Generate a Commit Object with Metadata: Git adds information like author name, timestamp, and commit message to describe what the change was about
- Link Commit to the Snapshot (Tree): The commit object points to the tree object, creating a traceable connection between the message and the actual files
- Assign a Unique Commit ID: Every commit is identified by a hash of its content (SHA-1 by default) that can be used to identify, share, or roll back to that point
- Chain Commits Together with a History Pointer: Each commit includes a reference to its parent commit, forming a chain that is the full history of your project
- Why It Matters: Commits help you track progress, roll back mistakes, and understand the evolution of your code over time
  - Git's power comes from using commits as the foundation for structure, traceability, and control
  - Branches and tags point to commits: no commits, no tracking or navigation
  - Use commits to checkout, tag, revert, or explore any version
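
The links described above are visible in the raw commit object. As a minimal sketch, assuming Git is installed and using placeholder identity values, the following creates two commits and prints the second one, then extracts its tree and parent references.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"

echo "Hello Git" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"

echo "Second line" >> hello.txt
git commit -q -a -m "Second commit"

# Raw commit object: tree, parent, author, committer, then the message
git cat-file -p HEAD

commit_type=$(git cat-file -t HEAD)
tree_id=$(git cat-file -p HEAD | awk '$1 == "tree" {print $2; exit}')
parent_id=$(git cat-file -p HEAD | awk '$1 == "parent" {print $2; exit}')

echo "type=$commit_type tree=$tree_id parent=$parent_id"
```

The `tree` line is the snapshot the commit points to, and the `parent` line is the history pointer to the previous commit. The very first commit in a repository has no `parent` line.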