I’m in a position with my SaaS where I basically host my customers’ content. As you can imagine, this can be a dangerous place to be.
Just for a moment, imagine if my SaaS went offline for 10 minutes and 5 seconds every week. That would be a problem, right? People who rely on their website to make money could see real, material losses.
You may be surprised, but that level of downtime actually matches what one of Amazon Web Services’ biggest products promises: CloudFront. CloudFront’s Service Level Agreement (SLA) commits to 99.9% uptime.
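To make the 99.9% figure concrete, here is a minimal sketch of the arithmetic. The function name `downtime_budget` is my own, and the 7-day and 30-day windows are assumptions chosen for illustration (the SLA itself is measured per monthly billing cycle).

```python
from datetime import timedelta


def downtime_budget(sla_percent: float, window: timedelta) -> timedelta:
    """Return the maximum downtime an uptime percentage allows within a window."""
    allowed_fraction = 1 - sla_percent / 100
    return timedelta(seconds=window.total_seconds() * allowed_fraction)


# 99.9% over one week works out to roughly 10 minutes 5 seconds.
print("per week: ", downtime_budget(99.9, timedelta(days=7)))
# The same percentage over a 30-day month is roughly 43 minutes.
print("per month:", downtime_budget(99.9, timedelta(days=30)))
```

The budget grows with the window, so always check which period an SLA is measured over before comparing numbers.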
It would seem reasonable that if I could match CloudFront, my service should be considered “reliable”.
But Here’s The Problem
It only takes one outage to get an online review that questions the reliability of a service. As my service involves hosting content for online businesses, reliability is a prerequisite.
What can be done to mitigate something like this?
Where Route 53 Fits In
Route 53 is Amazon Web Services’ DNS resolution service. If you don’t know what that means: when you type in a domain name like austinpena.com, DNS tells the browser where to go to find the content. Just as someone’s name is a “descriptor” and their Social Security Number is an “identifier”, a domain name like austinpena.com is just a name, and Route 53 looks up the identifier (the IP address) of the server where the content is actually stored.
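You can watch the name-to-identifier lookup happen with a few lines of Python. This is only a sketch: it asks whatever resolver your machine is configured with (not Route 53 specifically), and the helper name `resolve_ipv4` is my own.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses the system resolver reports for a hostname."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) tuple.
    return sorted({sockaddr[0] for *_, sockaddr in results})


if __name__ == "__main__":
    # "localhost" works offline; swap in a real domain such as austinpena.com.
    print(resolve_ipv4("localhost"))
```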
What’s very compelling is that Route 53 can route your website’s traffic to a “healthy” endpoint, and it comes with a competitive 100% SLA (AWS credits your bill if the service goes down).
If my server goes down, Route 53 can just reroute the traffic to the customer’s origin server, meaning there’s no “real” downtime.
This works particularly well for a personalization service like Easy Landing, because I need to get the origin content from somewhere anyway. For a SaaS that’s geared more toward creating and hosting the content itself, a failover can be a lot less trivial.
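Here is a rough sketch of what that failover setup could look like as a Route 53 change batch: one PRIMARY record tied to a health check, and one SECONDARY record pointing at the customer’s origin. The domain, IP addresses, and health check ID below are hypothetical placeholders, and this is not necessarily how my own setup is configured.

```python
def failover_change_batch(domain, primary_ip, secondary_ip, health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch with PRIMARY and SECONDARY failover A records."""

    def record(role, ip, extra):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "A",
                "SetIdentifier": f"{domain}-{role.lower()}",
                "Failover": role,  # "PRIMARY" or "SECONDARY"
                "TTL": ttl,        # a short TTL lets clients pick up a failover sooner
                "ResourceRecords": [{"Value": ip}],
                **extra,
            },
        }

    return {
        "Comment": "Fail over to the customer's origin when the primary is unhealthy",
        "Changes": [
            # Route 53 answers with the primary only while its health check passes.
            record("PRIMARY", primary_ip, {"HealthCheckId": health_check_id}),
            record("SECONDARY", secondary_ip, {}),
        ],
    }


# Hypothetical values: documentation-range IPs and a made-up health check ID.
batch = failover_change_batch(
    "www.example.com.", "203.0.113.10", "198.51.100.20", "hypothetical-health-check-id"
)

# Applying it needs boto3 and AWS credentials (the zone ID is also a placeholder):
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z_HYPOTHETICAL", ChangeBatch=batch
# )
```

The health check itself has to be created separately and pointed at an endpoint on the primary server.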
How To Know If You Should Use Route 53
“Premature optimization is the root of all evil.” If you’re going to make the jump into something like automated failover, be sure you have a good reason.
My personal choice for monitoring is pingr.io. The awesome thing is that the founder is very available over live chat (as of January 2021).
If you start to see revenue-impacting events related to downtime, I can’t recommend pingr enough.
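Whatever monitor you use, the raw data boils down to a series of pass/fail checks. As a minimal sketch, assuming a hypothetical list of one-minute check results (this is not pingr’s API or export format), you can turn that into an uptime percentage and compare it against a 99.9% target before deciding whether failover is worth the effort.

```python
def uptime_percent(check_results: list[bool]) -> float:
    """Return the share of passing checks as a percentage."""
    if not check_results:
        raise ValueError("no checks recorded")
    return 100 * sum(check_results) / len(check_results)


def meets_target(check_results: list[bool], target_percent: float = 99.9) -> bool:
    """Return True when measured uptime is at or above the target."""
    return uptime_percent(check_results) >= target_percent


# Hypothetical week of one-minute checks in which 15 checks failed.
minutes_in_week = 7 * 24 * 60
checks = [True] * (minutes_in_week - 15) + [False] * 15

print(f"uptime: {uptime_percent(checks):.3f}%")
print("meets 99.9% target:", meets_target(checks))
```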