Frequently asked questions
What's the difference between SaaS SRE and traditional SRE?
SaaS SRE optimises for renewal-driving customer SLOs, multi-tenant blast-radius control, and per-customer pager budgets. Traditional SRE focuses on internal-facing services and uniform infrastructure. The difference shows up in how you write SLOs, how you scope incidents, and how you frame service credits when you breach.
How do customer-facing SLOs differ from internal SLOs?
A customer-facing SLO is a contract: you publish a target (e.g., 99.95% monthly uptime), measure against customer-impacting events only, and pay service credits when you miss. Internal SLOs are debugging tools: looser, more granular, and never visible to the customer. A common mistake is publishing internal SLOs as the customer-facing commitment, which teams tend to regret.
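As a rough illustration of the "measure customer-impacting events only, pay credits on a miss" idea, here is a minimal Python sketch. The `Outage` type, the credit schedule in `CREDIT_TIERS`, and the 30-day month are all assumptions made up for the example; real credit terms come from your own contract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outage:
    minutes: float
    customer_impacting: bool  # internal-only events do not count against the SLO

# Hypothetical credit schedule: (minimum availability %, credit as % of monthly fee).
# Checked top to bottom; the first tier the month satisfies wins.
CREDIT_TIERS = [(99.95, 0), (99.0, 10), (95.0, 25)]
FLOOR_CREDIT = 50  # assumed credit when availability falls below every tier

def monthly_availability(outages, minutes_in_month=30 * 24 * 60):
    """Availability in percent, counting customer-impacting downtime only."""
    down = sum(o.minutes for o in outages if o.customer_impacting)
    return 100.0 * (1 - down / minutes_in_month)

def service_credit_percent(availability):
    """Map a month's availability to the assumed credit schedule."""
    for floor, credit in CREDIT_TIERS:
        if availability >= floor:
            return credit
    return FLOOR_CREDIT

outages = [Outage(30, True), Outage(120, False)]
availability = monthly_availability(outages)
print(f"{availability:.3f}% -> {service_credit_percent(availability)}% credit")
```

In this example the 120-minute internal-only event is ignored, so only the 30 customer-impacting minutes count against the 99.95% target.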
What's a reasonable uptime SLA to publish for B2B SaaS?
99.9% monthly is table stakes for B2B SaaS in 2026. 99.95% is competitive. 99.99% is enterprise-tier and requires real multi-region architecture, automated failover testing, and a 24/7 SRE rotation. Don't publish what you can't measure end-to-end with synthetic probes from your customers' regions.
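To make these targets concrete, it helps to convert each one into an allowed-downtime budget. The sketch below assumes a 30-day month; `downtime_budget_minutes` is an illustrative helper name, not part of any library.

```python
def downtime_budget_minutes(slo_percent, days=30):
    """Minutes of downtime a given uptime target allows over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)

for target in (99.9, 99.95, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} min per 30-day month")
# 99.9%  -> 43.2 min
# 99.95% -> 21.6 min
# 99.99% -> 4.3 min
```

At 99.99% the whole month's budget is about four minutes, which is why that tier depends on automated failover rather than a human responding to a page.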
How does multi-tenancy change incident response?
Three things change: (1) blast-radius detection has to map customer-by-customer, not just service-by-service; (2) noisy-neighbour incidents need rate-limit and quota tooling, not pod restarts; (3) per-customer SLO breach detection runs in parallel with platform-wide alerting. Most off-the-shelf monitoring assumes a single tenant, so multi-tenant SaaS usually needs custom dashboards.
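The customer-by-customer mapping can be sketched as a per-tenant error-rate rollup. This is a simplified illustration, assuming request records arrive as `(tenant_id, ok)` pairs and using a made-up 5% error threshold; the function names and tenant names are hypothetical.

```python
from collections import defaultdict

def per_tenant_error_rates(requests):
    """requests: iterable of (tenant_id, ok) pairs. Returns {tenant_id: error rate}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for tenant, ok in requests:
        totals[tenant] += 1
        if not ok:
            errors[tenant] += 1
    return {tenant: errors[tenant] / totals[tenant] for tenant in totals}

def blast_radius(requests, threshold=0.05):
    """Tenants whose own error rate exceeds the (assumed) threshold, sorted by name."""
    rates = per_tenant_error_rates(requests)
    return sorted(t for t, rate in rates.items() if rate > threshold)

requests = (
    [("acme", True)] * 100
    + [("globex", False)] * 20 + [("globex", True)] * 80
    + [("initech", True)] * 800
)
platform_rate = sum(1 for _, ok in requests if not ok) / len(requests)
print(f"platform-wide error rate: {platform_rate:.1%}")  # 2.0%
print("affected tenants:", blast_radius(requests))       # ['globex']
```

Here the platform-wide error rate is only 2%, below the threshold, while one tenant is seeing 20% failures. A service-level alert alone would miss that.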
What does a SaaS SRE engagement deliver in 90 days?
Four deliverables: defined customer-facing SLOs with burn-rate alerts; an incident response runbook with severity gates and customer-comms templates; a multi-tenant dashboard for top-N customer health; and a chaos test schedule. The outcome: your renewal team can answer 'how reliable were we for Customer X last month?' with data, in under five minutes.
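For readers unfamiliar with burn-rate alerts: burn rate is the observed error ratio divided by the error budget the SLO allows, so a burn rate of 1 spends the budget exactly over the SLO period. The sketch below assumes a two-window check and a paging threshold of 14.4, a commonly cited value that corresponds to spending 2% of a 30-day budget in one hour. The function names and window inputs are illustrative, not a specific monitoring product's API.

```python
def burn_rate(error_ratio, slo_target):
    """How many times faster than 'exactly on budget' errors are being spent.

    error_ratio: fraction of failed requests in the window (0.02 means 2%)
    slo_target:  SLO as a fraction (0.999 means 99.9%)
    """
    error_budget = 1 - slo_target
    return error_ratio / error_budget

def should_page(long_window_ratio, short_window_ratio, slo_target, threshold=14.4):
    """Page only if both windows burn fast.

    The long window shows the burn is significant; the short window shows it
    is still happening. 14.4 = 0.02 * 720 hours, i.e. 2% of a 30-day budget
    consumed in one hour (an assumed default, tune it for your own SLO).
    """
    return (
        burn_rate(long_window_ratio, slo_target) >= threshold
        and burn_rate(short_window_ratio, slo_target) >= threshold
    )

# 2% errors over the last hour and 3% over the last five minutes, 99.9% SLO:
print(should_page(0.02, 0.03, 0.999))   # True
# The incident has already recovered in the short window:
print(should_page(0.02, 0.001, 0.999))  # False
```

Requiring both windows keeps the alert from paging on an incident that has already stopped burning budget.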