Reliability First
Session 2: Outage Avoidance: The First 72 Hours Before Failure
Section outline
Anatomy of preventable outages: configuration drift, missed patches, capacity blind spots
Proactive monitoring vs. reactive firefighting: defining early-warning thresholds
Pre-mortems: simulating failures before they happen (Chaos Engineering Lite for SMBs)
Communication protocols: who to alert—and when—before a system degrades
Cost of downtime vs. cost of prevention: making the case for reliability investments
Key roles during near-misses: sysadmin, DevOps, security, and business continuity leads