DevOps & SRE
Key Questions
What is the difference between DevOps, SRE (Site Reliability Engineering), and Platform Engineering?
Why do we need CI/CD pipelines instead of dragging files to a server?
What is Infrastructure as Code (IaC), and why are Terraform and OpenTofu the de facto standard tools for it?
What happens when a server crashes at 3 AM? (Incident Response)
What are SLIs, SLOs, and SLAs?
How do we know if the system is healthy? (Monitoring vs Observability)
Why is manual testing not enough for modern software?
What is the difference between Unit, Integration, and End-to-End (E2E) testing?
What is TDD (Test-Driven Development)?
What is a 'Blameless Post-Mortem' and why do we need one?
Why is 'It works on my machine' not a valid excuse?
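To make the SLI/SLO question above concrete, here is a minimal sketch, assuming an availability-style SLI defined as the fraction of successful requests and a hypothetical SLO target of 99.9%. The function names, request counts, and target are illustrative assumptions, not values from this course.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the SLI: the fraction of requests that succeeded."""
    if total_requests == 0:
        # Assumption: a window with no traffic counts as fully available.
        return 1.0
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """Return True when the measured SLI is at or above the SLO target."""
    return sli >= slo_target


if __name__ == "__main__":
    # Hypothetical measurement window: 1,000,000 requests, 500 failures.
    sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
    print(f"SLI: {sli:.4%}")
    print("SLO met:", meets_slo(sli, slo_target=0.999))
```

In this framing the SLI is what you measure, the SLO is the internal target you compare it against, and an SLA is the contractual promise to customers, usually set looser than the SLO.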
Learning Objectives
Hard Truths
Developers often build things that are impossible to operate or monitor.
Manual deployments are a leading cause of avoidable outages.
Uptime is a feature, not luck.
Alert fatigue is real: if everything is urgent, nothing is urgent.
The most permanent solution is a temporary workaround.