What is DevOps?
- Scenario: Developers build features while Operations keeps systems stable. When the two teams work in silos, the result is delays and finger-pointing. Can we build and run software together, better?
- DevOps: A culture and set of practices that bring Development and Operations teams together to deliver software faster, safer, and continuously.
No Strict Definition
- Amazon Web Services (AWS): "DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity"
- A Survey of DevOps Concepts and Challenges - L. Leite et al.: "DevOps is a collaborative and multidisciplinary effort within an organization to automate continuous delivery of new software versions, while guaranteeing their correctness and reliability"
- The DevOps Handbook - Gene Kim, Patrick Debois, et al.: "DevOps is the outcome of applying the most trusted principles from the domain of physical manufacturing and leadership to the IT value stream. DevOps relies on bodies of knowledge from Lean, Theory of Constraints, the Toyota Production System, resilience engineering, learning organizations, safety culture, human factors, and many others. The result is world-class reliability, stability, and security at ever lower cost and effort; and accelerated flow and reliability through the technology value stream, including Product Management, Development, QA, IT Operations, and Infosec."
DevOps is about Collaboration
- Dev + Ops = DevOps: Teams work together across the software lifecycle
- Shared Goals: Build fast, release often, stay stable
- Cultural Shift: Break down walls between teams
DevOps is about Quick Feedback
- Business can’t wait: Waiting weeks or months for feedback delays product improvements
- Defects shouldn't wait: Developers should be notified about code quality issues or unit test failures immediately
- Continuous Improvement: Quick feedback loops help refine features and fix things faster
DevOps is about Automation
- Manual = Slow + Error-Prone: Repeating steps by hand increases chances of mistakes
- Not Scalable: What works for 1 server won’t work for 100
- Automation Brings:
  - Speed – Do things quickly
  - Reliability – Same steps, every time (less chance of errors)
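To make the "1 server vs 100 servers" point concrete, here is a minimal Python sketch, assuming a hypothetical deployment procedure made of three steps (`stop_service`, `copy_release`, `start_service`) that only return log lines instead of touching real machines. The procedure is written down once and then applied to every host in parallel with Python's standard `concurrent.futures` module.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical deployment steps. Each one only returns a log line here;
# a real script would call SSH, a cloud API, or a configuration management tool.
def stop_service(host: str) -> str:
    return f"{host}: service stopped"


def copy_release(host: str) -> str:
    return f"{host}: release copied"


def start_service(host: str) -> str:
    return f"{host}: service started"


# The procedure is defined once, so every host gets the same steps in the same order.
DEPLOY_STEPS = [stop_service, copy_release, start_service]


def deploy(host: str) -> list[str]:
    """Run every step against one host, in order."""
    return [step(host) for step in DEPLOY_STEPS]


def deploy_all(hosts: list[str], max_workers: int = 10) -> dict[str, list[str]]:
    """Apply the identical procedure to many hosts in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(deploy, hosts))  # map preserves input order
    return dict(zip(hosts, results))


if __name__ == "__main__":
    hosts = [f"server-{i:03d}" for i in range(1, 101)]  # 100 hypothetical servers
    report = deploy_all(hosts)
    print(len(report), "servers deployed")
    print(report["server-042"])
```

Going from 1 to 100 servers only changes the `hosts` list; the steps themselves stay the same, which is where the speed and reliability benefits come from.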
Key DevOps Practices
- Version Control: All code (infra + app) tracked in Git
- CI (Continuous Integration): Check code quality, run unit tests, and more immediately after code is committed
- CD (Continuous Delivery/Deployment): Automatically deploy code to your environments
- Infrastructure as Code (IaC): Use code to create and manage your infrastructure resources
- Observability: Understand what's happening inside a system (Metrics + Logs + Traces)
What is Continuous Integration (CI) and Continuous Deployment (CD)?
- Continuous Integration (CI):
  - Developers merge code changes often
  - Automated systems build and test the code on every change
  - Bugs are caught early
- Continuous Deployment (CD):
  - Code that passes all tests is deployed automatically
  - No need to wait for a big release day
- Continuous Delivery vs Continuous Deployment:
  - Continuous Delivery: Code is always ready to go live, but someone approves the release
  - Continuous Deployment: Code goes live automatically after passing tests
- General Tools: GitHub Actions, Jenkins, Argo CD
- Cloud: AWS CodePipeline, AWS CodeBuild, AWS CodeDeploy, Azure DevOps, Google Cloud Build, Google Cloud Deploy
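Real pipelines are declared in each tool's own format (for example a workflow file in GitHub Actions or a `Jenkinsfile` in Jenkins). The tool-agnostic Python sketch below only models the control flow described above: CI stages run in order and stop at the first failure, and the difference between Continuous Delivery and Continuous Deployment is whether a manual approval gate sits before the release. The stage names, `run_pipeline`, and the `approve` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage passes


def run_pipeline(
    stages: list[Stage],
    deploy: Callable[[], None],
    continuous_deployment: bool,
    approve: Optional[Callable[[], bool]] = None,
) -> str:
    # CI: run every check after each commit and stop at the first failure,
    # so the developer gets feedback as early as possible.
    for stage in stages:
        if not stage.run():
            return f"failed at {stage.name}"

    # Continuous Delivery: the build is releasable, but a person must approve it.
    if not continuous_deployment and (approve is None or not approve()):
        return "ready to release, waiting for approval"

    # Continuous Deployment (or an approved delivery): release automatically.
    deploy()
    return "deployed"


if __name__ == "__main__":
    released = []
    stages = [
        Stage("lint", lambda: True),
        Stage("unit-tests", lambda: True),
        Stage("build", lambda: True),
    ]
    print(run_pipeline(stages, lambda: released.append("v1"), continuous_deployment=True))
    print(run_pipeline(stages, lambda: released.append("v2"), continuous_deployment=False))
    print(released)  # ['v1']: v2 passed all checks but is waiting for approval
```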
What is Infrastructure as Code (IaC)?
Why Infrastructure as Code (IaC)?
- Scenario: Imagine setting up infrastructure (servers, databases, and networks) by hand each time you create a new environment: slow, inconsistent, and error-prone. Can we automate this and repeat it reliably?
- IaC: Write code to provision and configure infrastructure – just like you write code for applications.
What is Infrastructure as Code?
- Definition: A practice of creating and configuring infrastructure (like servers, networks, databases) using machine-readable definition files.
- Core Idea: Treat infrastructure setup just like application code – store it in Git, version it, review it, and automate its deployment.
- Goal:
  - Automate environment setup
  - Eliminate manual steps
  - Ensure identical environments in Dev, QA, and Prod
- Types:
  - Infrastructure Provisioning – to create infra
  - Configuration Management – to install and manage software on that infra
1: Infrastructure Provisioning
- What It Does: Creates cloud resources such as networks, virtual machines, storage buckets, load balancers, and databases using code
- Tools: Terraform (multi-cloud using HCL), Pulumi (multi-cloud using general-purpose programming languages), AWS CloudFormation, Azure Bicep, Google Cloud Deployment Manager
- Benefits: Create a complete environment in minutes with a single command
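Provisioning tools generally work from a declared desired state: you describe what should exist, and the tool works out what to create, change, or delete (Terraform, for instance, shows this as a plan before applying it). The Python sketch below imitates that idea with plain dictionaries standing in for a cloud account; the resource names, attributes, and the `plan`/`apply` functions are hypothetical and are not any tool's real API.

```python
# Desired state, kept in Git like application code.
# Resource names and attributes are hypothetical.
DESIRED = {
    "vpc-main": {"type": "network", "cidr": "10.0.0.0/16"},
    "vm-web-1": {"type": "vm", "size": "small"},
    "db-orders": {"type": "database", "engine": "postgres"},
}


def plan(desired: dict, actual: dict) -> list[tuple[str, str]]:
    """Compare desired and actual resources and list the actions needed."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: dict, actual: dict) -> dict:
    """Execute the plan against a simulated cloud and return the new actual state."""
    new_state = {name: dict(spec) for name, spec in actual.items()}
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:  # create or update
            new_state[name] = dict(desired[name])
    return new_state


if __name__ == "__main__":
    # An environment that drifted: wrong VM size, one leftover VM, missing resources.
    actual = {
        "vm-web-1": {"type": "vm", "size": "micro"},
        "vm-old": {"type": "vm", "size": "small"},
    }
    print(plan(DESIRED, actual))
    # [('create', 'vpc-main'), ('update', 'vm-web-1'), ('create', 'db-orders'), ('delete', 'vm-old')]
    actual = apply(DESIRED, actual)
    print(plan(DESIRED, actual))  # [] -> the environment now matches the code

    # The same definition gives identical Dev, QA, and Prod environments.
    dev, qa, prod = (apply(DESIRED, {}) for _ in range(3))
    print(dev == qa == prod)  # True
```

Because the definition lives in code, re-running it against an environment that already matches produces an empty plan, and pointing it at an empty account recreates the whole environment.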
2: Configuration Management
- What It Does: Automates installation and configuration of software on servers (Install software, set timezone, update OS, configure app settings)
- Tools: Ansible, Chef, Puppet
- Benefits: Apply consistent configuration across 10s, 100s, or 1000s of servers
- IaC = Provisioning + Configuration
- Automate everything – from servers to software
- Repeatable and scalable infrastructure – just like code
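A key property of configuration management is idempotency: each task checks the current state first and only changes what differs, so applying the same configuration again is safe. Tools such as Ansible report for each task whether it changed anything. The Python sketch below simulates that behavior; the `Server` class, the package names, and the `ensure_*` helpers are hypothetical and do not touch a real machine.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A simulated server; a real tool would inspect the machine over SSH or via an agent."""
    name: str
    packages: set[str] = field(default_factory=set)
    timezone: str = "America/New_York"


def ensure_package(server: Server, package: str) -> bool:
    """Install the package only if it is missing. Returns True if something changed."""
    if package in server.packages:
        return False
    server.packages.add(package)
    return True


def ensure_timezone(server: Server, tz: str) -> bool:
    """Set the timezone only if it differs. Returns True if something changed."""
    if server.timezone == tz:
        return False
    server.timezone = tz
    return True


def configure(server: Server) -> int:
    """Apply the desired configuration and return the number of changes made."""
    changed = [
        ensure_package(server, "nginx"),
        ensure_package(server, "openjdk-17"),
        ensure_timezone(server, "UTC"),
    ]
    return sum(changed)


if __name__ == "__main__":
    # Three servers that start in different states.
    fleet = [Server("app-1"), Server("app-2", packages={"nginx"}), Server("app-3", timezone="UTC")]
    print([configure(s) for s in fleet])  # [3, 2, 2] -> first run fixes the differences
    print([configure(s) for s in fleet])  # [0, 0, 0] -> second run changes nothing
```

The servers start out different but end up identical, and the same `configure` call works whether the fleet has 3 servers or 3,000.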
What is Standardization? How do containers and container orchestration enable Standardization?
What is Standardization?
- Scenario: Imagine using a different build and deployment process for every application, just because each one is written in a different programming language. Can we create a consistent process everywhere?
- Standardization: Creating uniform processes, tools, and environments so that apps are deployed and run the same way across all stages – development, testing, and production.
How Containers Enable Standardization
- Self-Contained Units: Containers package the app + dependencies + config together (they contain everything the application needs to run)
- Same Image Everywhere: Dev, QA, Prod – run the same container image
- Portable: Runs on any system with a compatible container runtime
Example
- You build a Java app with specific versions of Java + libraries
- Package it in a Docker container
- Run the exact same container on your laptop, test server, and cloud
- Result: Works the same everywhere → standardized deployment
- What's more: You can use the same process for Python or Node.js applications as well. Build a container image and deploy it wherever you want.
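The "build once, run the same image everywhere" idea can be illustrated with content digests: container registries identify an image by a `sha256:` digest, so deploying by digest guarantees that Dev, QA, and Prod run exactly the same artifact. The Python sketch below is only a conceptual stand-in: `build_image` hashes a dictionary of hypothetical files instead of running `docker build`, and real image digests are computed by the container tooling over the image manifest, not like this.

```python
import hashlib


def build_image(app_files: dict[str, bytes]) -> str:
    """Conceptual stand-in for `docker build`: derive a digest from the app,
    its dependencies, and its config. The same inputs always give the same digest."""
    h = hashlib.sha256()
    for path in sorted(app_files):  # sort so file order does not matter
        content = app_files[path]
        h.update(path.encode() + b"\0")
        h.update(len(content).to_bytes(8, "big") + content)
    return "sha256:" + h.hexdigest()


def promote(digest: str, envs: tuple[str, ...] = ("dev", "qa", "prod")) -> dict[str, str]:
    """Deploy one image, referenced by its immutable digest, to every environment."""
    return {env: digest for env in envs}


if __name__ == "__main__":
    # Hypothetical inputs: a Java app and a Python app go through the same process.
    java_image = build_image({"app.jar": b"<compiled classes>", "runtime": b"openjdk-17"})
    python_image = build_image({"main.py": b"print('hello')", "requirements.txt": b"flask"})

    for image in (java_image, python_image):
        deployments = promote(image)
        # Every environment points at exactly the same image.
        print(len(set(deployments.values())) == 1)  # True
```

The language only matters inside the build step; everything after it (promote, deploy) is identical for every application.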
What is Container Orchestration?
- Scenario: You have dozens of containers running your app – across multiple servers. How do you scale them automatically? How do you find out if one of the containers is failing?
- Kubernetes: An open-source platform that orchestrates (manages) containers – it automates deployment, scaling, and healing of containerized applications.
Why Container Orchestration/Kubernetes?
- Manual Scaling is Hard: Kubernetes can scale apps automatically based on load
- Resilience Needed: Kubernetes restarts crashed containers automatically
- Run Anywhere: Works locally, on-prem, and on all major clouds
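Kubernetes is built around control loops: controllers keep comparing the desired state (for example, "3 replicas") with what is actually running and act on the difference. The Python sketch below is a heavily simplified model of that loop, not Kubernetes code; the pod names, statuses, and the `reconcile` function are hypothetical. `desired_replicas` follows the basic formula documented for the Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, but leaves out its tolerance and stabilization behavior, and the min/max bounds are assumed values.

```python
import itertools
import math

_pod_ids = itertools.count(1)  # source of unique names for new pods


def reconcile(desired: int, pods: dict[str, str]) -> dict[str, str]:
    """One pass of a simplified control loop.

    `pods` maps pod name -> status ("Running" or "Crashed").
    Returns the new set of pods; the input is not modified.
    """
    # Self-healing: crashed pods are dropped so they get replaced below.
    running = {name: status for name, status in pods.items() if status == "Running"}

    # Scale down: remove extra pods.
    while len(running) > desired:
        running.pop(sorted(running)[-1])

    # Scale up / replace: start new pods until the count matches.
    while len(running) < desired:
        running[f"web-{next(_pod_ids)}"] = "Running"
    return running


def desired_replicas(current: int, current_cpu: float, target_cpu: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Basic autoscaling formula, clamped to assumed min/max bounds."""
    wanted = math.ceil(current * current_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    pods = {"web-a": "Running", "web-b": "Crashed", "web-c": "Running"}
    pods = reconcile(3, pods)  # the crashed pod is replaced
    print(len(pods), sorted(pods.values()) == ["Running"] * 3)  # 3 True

    # CPU at 90% against a 60% target: 3 * 90 / 60 = 4.5 -> 5 replicas.
    target = desired_replicas(len(pods), current_cpu=90.0, target_cpu=60.0)
    pods = reconcile(target, pods)
    print(target, len(pods))  # 5 5
```

In a real cluster this loop never stops: it runs continuously, so a crash or a load spike is handled without anyone intervening.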
Containers + Container Orchestration
- Containers = Standard App Format
- Orchestration = Standard Runtime & Operations
- Together, they ensure consistency, reliability, and efficiency across the software delivery lifecycle.
What is Observability?
Why Observability?
- Scenario: Your app is running slowly or returning errors. But you don’t know where or why. You need visibility across all systems.
- Observability: The ability to understand what’s happening inside your system just by looking at external outputs like logs, metrics, and traces.
3 Pillars of Observability
1. Logs
- What They Are: Text records of events (e.g., errors, warnings)
- Use: Helps answer what happened
- Example: “PaymentService failed: connection timeout”
2. Metrics
- What They Are: Numeric values tracked over time
- Use: Helps answer how the system is performing
- Example: “CPU usage = 80%”, “Latency = 220ms”
3. Traces
- What They Are: The journey of a request through multiple services
- Use: Helps answer where the slowdown or failure is
- Example: Trace shows a 2.5s delay in OrderService during checkout
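In practice the three pillars are usually collected with a standard such as OpenTelemetry. To show how they relate without any external library, here is a stdlib-only Python sketch, assuming a hypothetical checkout request that calls `PaymentService` and `OrderService` (simulated with `time.sleep`). Each call records a span (trace), the whole request records a latency value (metric), and a structured log line carries the same `trace_id` so the three can be correlated. The `span` and `slowest_step` helpers are illustrative, not an OpenTelemetry API.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")

SPANS: list[dict] = []          # traces: one record per timed step
LATENCIES_MS: list[float] = []  # metrics: one number per request


@contextmanager
def span(name: str, trace_id: str, parent: Optional[str] = None):
    """Time a block of work and record it as a span of the given trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "parent": parent,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def checkout() -> str:
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    with span("checkout", trace_id):
        with span("PaymentService", trace_id, parent="checkout"):
            time.sleep(0.01)
        with span("OrderService", trace_id, parent="checkout"):
            time.sleep(0.05)  # the simulated slow step
    LATENCIES_MS.append((time.perf_counter() - start) * 1000)
    # Logs: a structured event that carries the trace_id for correlation.
    log.info(json.dumps({"event": "checkout_completed", "trace_id": trace_id}))
    return trace_id


def slowest_step(trace_id: str) -> str:
    """Answer "where is the slowdown?" by inspecting the spans of one trace."""
    steps = [s for s in SPANS if s["trace_id"] == trace_id and s["parent"] == "checkout"]
    return max(steps, key=lambda s: s["duration_ms"])["name"]


if __name__ == "__main__":
    tid = checkout()
    print(f"latency={LATENCIES_MS[-1]:.0f}ms slowest={slowest_step(tid)}")
```

The metric says the request was slow, the trace says which service was responsible, and the log line with the same `trace_id` is where you would look next for the reason.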
Observability vs Monitoring
- Monitoring = Alerts for known issues
- Observability = Deep visibility to explore unknown issues
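The difference can be sketched in a few lines of Python using hypothetical data. `cpu_alert` is a monitoring rule: someone decided in advance what "bad" looks like (CPU above 80% for 3 samples in a row). `explore` is the observability side: it slices the same request events by any attribute chosen at query time, which is how a question nobody planned an alert for ("is only the new version slow?") gets answered. The field names, thresholds, and events are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request events, each carrying several attributes.
EVENTS = [
    {"service": "OrderService", "region": "eu", "version": "v1", "latency_ms": 120},
    {"service": "OrderService", "region": "eu", "version": "v2", "latency_ms": 900},
    {"service": "OrderService", "region": "us", "version": "v1", "latency_ms": 110},
    {"service": "OrderService", "region": "us", "version": "v2", "latency_ms": 950},
]


def cpu_alert(samples: list[float], threshold: float = 80.0, consecutive: int = 3) -> bool:
    """Monitoring: a predefined rule for a known failure mode."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False


def explore(events: list[dict], group_by: str) -> dict[str, float]:
    """Observability: average latency sliced by any attribute, chosen at query time."""
    groups: dict[str, list[float]] = defaultdict(list)
    for event in events:
        groups[event[group_by]].append(event["latency_ms"])
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    print(cpu_alert([70, 85, 90, 95]))  # True: three samples in a row above 80
    print(explore(EVENTS, "region"))    # both regions look similar
    print(explore(EVENTS, "version"))   # v2 stands out as the slow version
```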
Remember
- Observability = Deep Insight into System Behavior
- Helps teams build faster, detect earlier, and fix smarter
- Tools/Services:
  - OpenTelemetry: Standard way to collect all three pillars
  - Metrics: Prometheus (open-source, pull-based), Grafana (visualization), CloudWatch (AWS), Azure Monitor, Google Cloud Monitoring
  - Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Loki (Grafana), AWS CloudWatch Logs, Azure Log Analytics, Google Cloud Logging
  - Tracing: Jaeger (open-source), Zipkin (lightweight), Grafana Tempo, AWS X-Ray, Azure Application Insights, Google Cloud Trace
Can DevOps Be Done Without Cloud?
- Yes – DevOps is Cloud-Friendly, Not Cloud-Dependent: DevOps is a culture + process, not tied to where your infrastructure runs. You can do DevOps with or without cloud.
DevOps Without Cloud is Possible
- On-Premises Servers: Use your own data centers
- Same Practices: CI/CD, monitoring, IaC – still apply
- Same Tools: Jenkins, Git, Docker, Terraform work on-prem too
- DevOps ≠ Cloud: You can adopt DevOps on-prem or hybrid
- Cloud Supercharges DevOps: Easier infra + more automation
Cloud Makes DevOps Easier
- On-Demand Infrastructure: Dev teams get servers instantly
- APIs & Automation: Everything can be automated easily
- CI/CD Pipelines: Implement CI/CD using cloud services
- Monitoring & Feedback: Monitor through cloud services
- IaC (Infrastructure as Code): Create/manage cloud resources with code