SRE Maturity Assessment
A detailed questionnaire to baseline your SRE maturity
40 questions across five dimensions. Choose the option that best describes your current practice.
Step 1: Company details
Optionally include company details to personalize your roadmap and report.

Step 2: Questionnaire
Rate each statement on the scale provided; a live preview tracks your answers.

Step 3: Roadmap
A roadmap generated from your results, including your current maturity level.
SLOs
1. Customer‑centric SLIs (latency, errors, availability) are clearly defined for key journeys.
2. SLO targets are agreed with stakeholders and reviewed regularly.
3. Error budgets are tracked and influence roadmap/prioritization (see the sketch after this list).
4. SLOs exist per service/component (not just global) and map to ownership.
5. SLOs are visible in dashboards with alerting tied to budget burn.
6. Release decisions incorporate current error budget and risk.
7. Success criteria for features include reliability impact (SLOs/SLIs).
8. SLO performance informs leadership updates and investment.
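
To make item 3 concrete, here is a minimal sketch of error-budget accounting for a simple availability SLO; the numbers, names, and threshold logic are illustrative assumptions, not part of the assessment.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent for this window.

    slo_target: e.g. 0.999 for a 99.9% availability SLO.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        return 0.0  # a 100% target leaves no budget at all
    spent = failed_requests / allowed_failures
    return max(0.0, 1.0 - spent)

# 99.9% availability SLO, 1,000,000 requests, 400 failures this window:
# the budget allows 1,000 failures, so 60% of it remains.
print(f"{error_budget_remaining(0.999, 1_000_000, 400):.0%} remaining")
```

A release gate (item 6) can then be as simple as blocking deploys while the remaining budget sits below an agreed threshold.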
Observability
1. Centralized logging with structured, contextual logs carrying trace/span IDs (see the sketch after this list).
2. Golden signals (latency, traffic, errors, saturation) are monitored per service.
3. Distributed tracing provides useful spans for key requests.
4. Alerting is tied to user impact and SLOs, not just infrastructure thresholds.
5. Dashboards are curated and actionable (no dashboard sprawl).
6. Ownership is assigned for alerts; on‑call knows what to do.
7. Telemetry pipelines are reliable and cost‑controlled.
8. Runbooks link from alerts; resolution steps are continuously improved.
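
As a minimal illustration of item 1, a structured-logging sketch using only the Python standard library; the field names and the way trace/span IDs are attached are assumptions, not a prescribed schema.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, carrying trace context."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            # Trace/span IDs are attached via the `extra` argument below.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("payment authorized",
         extra={"trace_id": "4bf92f35", "span_id": "00f067aa"})
```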
Incidents
1. Clear incident severities, roles (IC, comms), and escalation paths exist.
2. On‑call is staffed, rested, and supported with rotations and handoffs.
3. Incident tooling (status pages, comms, timelines) is standardized.
4. MTTR (mean time to recovery) and MTTD (mean time to detect) are measured, with trends and goals (see the sketch after this list).
5. Blameless postmortems identify actions with owners and due dates.
6. We run incident drills/chaos days to practice and improve response.
7. We track recurrence and systemic issues across incidents.
8. Learnings are fed back into runbooks, tests, and guardrails.
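
A minimal sketch of the measurement in item 4, assuming each incident record carries started/detected/resolved timestamps; the schema is illustrative, not a required one.

```python
from datetime import datetime
from statistics import mean

# Illustrative incident records with the three timestamps we need.
incidents = [
    {"started": datetime(2024, 5, 1, 10, 0),
     "detected": datetime(2024, 5, 1, 10, 7),
     "resolved": datetime(2024, 5, 1, 11, 30)},
    {"started": datetime(2024, 5, 9, 2, 15),
     "detected": datetime(2024, 5, 9, 2, 16),
     "resolved": datetime(2024, 5, 9, 2, 58)},
]

# MTTD: how long until we noticed; MTTR: how long until we recovered.
mttd_min = mean((i["detected"] - i["started"]).total_seconds() / 60
                for i in incidents)
mttr_min = mean((i["resolved"] - i["started"]).total_seconds() / 60
                for i in incidents)

print(f"MTTD: {mttd_min:.0f} min, MTTR: {mttr_min:.0f} min")
```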
Deploys
1. CI pipelines are reliable, fast, and visible (flake rate, duration known).
2. We support safe deploys: canaries, blue‑green, or feature flags.
3. Rollbacks or roll‑forwards are quick and documented.
4. Automated tests cover critical paths (including smoke checks post‑deploy).
5. Change failure rate and deployment frequency are tracked.
6. Infra as code with reviews, policies, and drift detection.
7. Zero‑downtime migrations and database safety patterns are used.
8. Release health is monitored; deploys auto‑pause on issues (see the sketch after this list).
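
A minimal sketch of the auto-pause behavior in item 8; every function, name, and threshold here is a hypothetical stand-in for your metrics backend and deploy controller.

```python
import random
import time

# Hypothetical stubs; a real implementation would query a metrics
# backend and drive the deploy controller.
def canary_error_rate(window_s: int) -> float:
    return random.random() * 0.02   # stub: pretend to read metrics

def promote_canary() -> None:
    print("canary promoted to full rollout")

def roll_back_canary() -> None:
    print("regression detected: rolling canary back")

MAX_ERROR_RATE = 0.01   # illustrative threshold: 1% errors
HEALTHY_CHECKS = 5      # consecutive healthy checks before promotion

def watch_canary(check_interval_s: float = 1.0) -> None:
    """Promote the canary only after sustained health; abort on regression."""
    for _ in range(HEALTHY_CHECKS):
        if canary_error_rate(window_s=60) > MAX_ERROR_RATE:
            roll_back_canary()
            return
        time.sleep(check_interval_s)
    promote_canary()

watch_canary()
```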
Culture
1. Service ownership is clear, including out‑of‑hours responsibilities.
2. Reliability work has time budget and visibility alongside feature work.
3. Leaders reinforce blameless culture and learning.
4. SRE practices are part of onboarding and continuous training.
5. Shared definitions for severity, SLIs/SLOs, and runbooks exist.
6. Cross‑team reliability forums or reviews occur regularly.
7. Reliability OKRs/KPIs exist and tie to business outcomes.
8. Teams proactively propose reliability improvements.
Preview and scoring
As you answer, a live preview shows your overall average on the 1-5 scale, the corresponding maturity level, and an average for each of the five dimensions (SLOs, observability, incidents, deploys, culture). An overall average of 2.0, for example, corresponds to the "Emerging" level. Answer all 40 questions to generate your roadmap; a minimal sketch of the score roll-up follows.
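
For anyone reimplementing the roll-up, a minimal scoring sketch; only the mapping of a 2.0 average to "Emerging" comes from the assessment itself, while the other level names and all thresholds are illustrative assumptions.

```python
from statistics import mean

# Level names other than "Emerging" and every threshold below are
# illustrative assumptions, not the assessment's actual scale.
LEVELS = [
    (1.5, "Initial"),
    (2.5, "Emerging"),
    (3.5, "Established"),
    (4.5, "Advanced"),
    (5.1, "Optimizing"),
]

def maturity_level(answers: list[int]) -> tuple[float, str]:
    """Average 1-5 answers and map the result to a level name."""
    avg = mean(answers)
    for upper_bound, name in LEVELS:
        if avg < upper_bound:
            return avg, name
    return avg, LEVELS[-1][1]

avg, level = maturity_level([2] * 40)   # every answer scored 2
print(f"{avg:.1f} / 5 -> {level}")      # 2.0 / 5 -> Emerging
```

The same roll-up can be run per dimension (eight answers each) to produce the per-dimension averages shown in the preview.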