SRE Maturity Assessment

Baseline Your Reliability Maturity

40 questions across five dimensions. Get a detailed assessment of your current practices and a personalized roadmap for improvement.

SLOs (8 questions)
Answer each question using the scale provided.
1. Customer‑centric SLIs (latency, errors, availability) are clearly defined for key journeys.
2. SLO targets are agreed with stakeholders and reviewed regularly.
3. Error budgets are tracked and influence roadmap/prioritization.
4. SLOs exist per service/component (not just global) and map to ownership.
5. SLOs are visible in dashboards with alerting tied to budget burn.
6. Release decisions incorporate current error budget and risk.
7. Success criteria for features include reliability impact (SLOs/SLIs).
8. SLO performance informs leadership updates and investment.
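For readers unsure what "error budget" and "budget burn" mean in statements 3 and 5, the arithmetic can be sketched as follows. This is a minimal illustration, not part of the assessment itself; the function names and the sample numbers are assumptions chosen for the example.

```python
# Hypothetical helpers illustrating error budget and burn rate arithmetic.

def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the window."""
    return (1.0 - slo_target) * total_events


def budget_remaining_fraction(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    return 1.0 - bad_events / error_budget(slo_target, total_events)


def burn_rate(slo_target: float, window_total: int, window_bad: int) -> float:
    """Observed error rate divided by the allowed error rate.

    A value of 1.0 means the budget is being spent exactly as fast as the
    SLO allows; higher values mean it will run out early.
    """
    observed_error_rate = window_bad / window_total
    return observed_error_rate / (1.0 - slo_target)


if __name__ == "__main__":
    # Assumed example: a 99.9% availability SLO over 1,000,000 requests.
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining_fraction(0.999, 1_000_000, 250))
    print(burn_rate(0.999, 10_000, 50))
```

An alert "tied to budget burn" would typically fire when `burn_rate` stays above a chosen threshold for some window; the threshold and window are team decisions and are not specified here.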
Observability (8 questions)
1. Logs are centralized, structured, and carry context such as trace/span IDs.
2. Golden signals (latency, traffic, errors, saturation) are monitored per service.
3. Distributed tracing provides useful spans for key requests.
4. Alerting is tied to user impact and SLOs, not just infrastructure thresholds.
5. Dashboards are curated and actionable (no dashboard sprawl).
6. Ownership is assigned for alerts; on‑call knows what to do.
7. Telemetry pipelines are reliable and cost‑controlled.
8. Runbooks link from alerts; resolution steps are continuously improved.
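As one illustration of statement 1 ("structured, contextual logs"), the sketch below emits JSON log lines that carry trace and span IDs using only the Python standard library. The `JsonFormatter` class, the `checkout` logger name, and the ID values are hypothetical; a real service would usually take the IDs from its tracing library rather than pass them by hand.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via the `extra` argument; None when the caller omits them.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "payment authorized",
        extra={"trace_id": "abc123", "span_id": "def456"},
    )
```

Because every line is a self-describing JSON object, a central log store can index `trace_id` and join log lines with the matching distributed trace.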
Incidents (8 questions)
1. Clear incident severities, roles (IC, comms), and escalation paths exist.
2. On‑call is staffed, rested, and supported with rotations and handoffs.
3. Incident tooling (status pages, comms, timelines) is standardized.
4. MTTR and MTTD are measured with trends and goals.
5. Blameless postmortems identify actions with owners and due dates.
6. We run incident drills/chaos days to practice and improve response.
7. We track recurrence and systemic issues across incidents.
8. Learnings are fed back into runbooks, tests, and guardrails.
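Statement 4 refers to MTTD and MTTR. A rough sketch of how they might be computed from incident records follows. It assumes MTTD is the mean time from start of impact to detection and MTTR is the mean time from start of impact to resolution; some teams measure MTTR from detection instead. The `Incident` record and the sample timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started: datetime   # when user impact began
    detected: datetime  # when an alert fired or a human noticed
    resolved: datetime  # when impact ended


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detect: average of (detected - started)."""
    seconds = [(i.detected - i.started).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolve: average of (resolved - started)."""
    seconds = [(i.resolved - i.started).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


# Made-up sample data.
sample = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 5), datetime(2024, 1, 5, 10, 45)),
    Incident(datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 14, 15), datetime(2024, 1, 9, 15, 15)),
]

if __name__ == "__main__":
    print("MTTD:", mttd(sample))
    print("MTTR:", mttr(sample))
```

Tracking these values per month or per quarter is what gives the "trends and goals" the statement asks about.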
Deploys (8 questions)
1. CI pipelines are reliable, fast, and visible (flake rate, duration known).
2. We support safe deploys: canaries, blue‑green, or feature flags.
3. Rollbacks or roll‑forwards are quick and documented.
4. Automated tests cover critical paths (including smoke checks post‑deploy).
5. Change failure rate and deployment frequency are tracked.
6. Infrastructure is managed as code, with reviews, policies, and drift detection.
7. Zero‑downtime migrations and database safety patterns are used.
8. Release health is monitored; deploys auto‑pause on issues.
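The two metrics named in statement 5 can be derived from a simple deploy log. The sketch below assumes each deploy is recorded with a date and a flag saying whether it led to a failure in production; the `Deploy` record and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deploy:
    day: date
    caused_failure: bool  # True if the deploy needed a rollback, hotfix, or caused an incident


def change_failure_rate(deploys: list[Deploy]) -> float:
    """Share of deploys that caused a failure (0.0 to 1.0)."""
    if not deploys:
        raise ValueError("no deploys recorded")
    failures = sum(1 for d in deploys if d.caused_failure)
    return failures / len(deploys)


def deploys_per_week(deploys: list[Deploy], period_days: int) -> float:
    """Average number of deploys per 7 days over the observed period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return len(deploys) * 7 / period_days


# Made-up sample: four deploys over a 28-day period, one of which failed.
deploy_log = [
    Deploy(date(2024, 3, 1), False),
    Deploy(date(2024, 3, 8), True),
    Deploy(date(2024, 3, 15), False),
    Deploy(date(2024, 3, 22), False),
]

if __name__ == "__main__":
    print(change_failure_rate(deploy_log))
    print(deploys_per_week(deploy_log, period_days=28))
```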
Culture (8 questions)
1. Service ownership is clear, including out‑of‑hours responsibilities.
2. Reliability work has time budget and visibility alongside feature work.
3. Leaders reinforce blameless culture and learning.
4. SRE practices are part of onboarding and continuous training.
5. Shared definitions for severity, SLIs/SLOs, and runbooks exist.
6. Cross‑team reliability forums or reviews occur regularly.
7. Reliability OKRs/KPIs exist and tie to business outcomes.
8. Teams proactively propose reliability improvements.

Answer all 40 questions to generate your roadmap.
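The page does not state how answers are turned into a score. One plausible scheme, offered purely as an assumption, rates each statement from 1 to 5, averages the eight answers within a dimension, and averages the five dimension scores for the overall result. The function names and sample answers below are hypothetical.

```python
from statistics import mean


def dimension_score(answers: list[int]) -> float:
    """Average of the 1-5 ratings within one dimension (assumed scale)."""
    if not answers:
        raise ValueError("dimension has no answers")
    for a in answers:
        if not 1 <= a <= 5:
            raise ValueError(f"rating out of range: {a}")
    return mean(answers)


def overall_score(by_dimension: dict[str, list[int]]) -> float:
    """Unweighted mean of the per-dimension scores (an assumption)."""
    return mean(dimension_score(a) for a in by_dimension.values())


# Made-up answers: eight ratings per dimension.
answers = {
    "slos": [3] * 8,
    "observability": [2] * 8,
    "incidents": [4] * 8,
    "deploys": [3] * 8,
    "culture": [3] * 8,
}

if __name__ == "__main__":
    for name, ratings in answers.items():
        print(name, dimension_score(ratings))
    print("overall", overall_score(answers))
```

Averaging per dimension first keeps one weak area visible instead of letting it disappear into a single overall number.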