Something smells fishy inside AWS
Every great startup has an exciting and superb "founding story": eBay and Pez dispensers, Apple and the garage tinkering of the two Steves, Microsoft and its BASIC, and so on. In the case of Amazon and its web services (AWS, albeit not quite a startup but close enough), the founding story goes something like this: aware that they had built up incredibly robust excess capacity for handling the peaks of e-commerce traffic on Amazon.com, the bright minds from Seattle decided to offer the same capacity to the rest of the web, kicking off the era of cloud computing for the thousands of customers that signed up for their triple threat of services: S3 (storage), EC2 (compute cycles), and SQS (messaging queues).
And yet, if AWS is using Amazon.com's excess capacity, why has S3 been down for most of the day, rendering most of the profile images and other assets of Web 2.0 tapestry completely inaccessible while at the same time I can't manage to find even a single 404 on Amazon.com? Wouldn't they be using the same infrastructure for their store that they sell to the rest of us?
Outages like this one will surely add fuel to the fire for all of the startups trying to sell "cloud redundancy" to people who want to fail over seamlessly between providers.
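For the curious, here is roughly what failing over between providers could look like on the client side. This is only a sketch under my own assumptions, not how any of those startups (or Amazon) actually do it: the function names and the example hostnames are made up, and it assumes you already keep a copy of each asset with a second provider.

```python
import urllib.request
from typing import Callable, Iterable


def http_fetch(url: str, timeout: float = 5.0) -> bytes:
    """Fetch a URL. Raises on HTTP errors (404, 503, ...) and on network failures."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_failover(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> bytes:
    """Try each mirror in order and return the first successful response body.

    `urls` is a hypothetical list of locations holding the same asset,
    e.g. one on S3 and one on a second provider.
    """
    last_error = None
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:
            # urllib's URLError and HTTPError are both subclasses of OSError,
            # so this covers "server not found", timeouts, and 4xx/5xx replies.
            last_error = exc
    raise RuntimeError("all mirrors failed") from last_error
```

You would call it with something like `fetch_with_failover(["https://primary.example/avatar.png", "https://backup.example/avatar.png"])`. The hard part, of course, is not this loop but keeping the second copy in sync in the first place.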
Comments
anonymous commented, on July 20, 2008 at 10:49 p.m.:
It all depends what was broken today, I'm sure that many components are not shared.
MonoMelodies commented, on July 21, 2008 at 4:58 p.m.:
I'm familiar with neither the exact workings of Amazon's service nor the pricing, but I'm guessing it goes something like this:
- Amazon hosts your images on their servers so you don't have to bother with storage, bandwidth &c.
- Amazon offers this service free of charge (or for at least a fee lower than what it would cost you to host it yourself)
Ergo: if their service goes down you won't get a 404, you'll get a "server not found"-error (which is something entirely different). If it /was/ indeed a 404 there's some redirection-voodoo going on, but in any case http://amazon.com would run on the same technology, not necessarily the same infrastructure. In fact, they'd be pretty stupid to /have/ done that.
Ergo(2): if your business depends wholly on a free service, you're in trouble (note that "free software" != "free service"). Call me paranoid, but anything your (technology) business depends on should have a fallback. And I don't mean the kind you trust Amazon to deliver. If you can't manage e.g. redundancy in hosting, don't let your business depend on 24/7 uptime. It's that simple.
If their service indeed isn't free I'd be interested in their terms. If they aren't any good, why stick with Amazon?
So, the key questions are:
1) what went wrong, what was the error? If it was really a 404 a more sensible error could have been expected from Amazon;
2) is the service in question commercial or "as is";
3) was the fault with the service architecture itself (i.e., it could have happened to Amazon too, only it didn't since the store was on a different cluster or whatever), or are there different versions of the system in the field (e.g., one "good one" for Amazon and one "crappy one" for the rest of us).
You're implying the second option in point 3, and that's a serious allegation. I'd say more info is required for anyone to make a call on that.
Jessynne commented, on July 25, 2008 at 4:07 p.m.:
"...if AWS is using Amazon.com's excess capacity, why has S3 been down for most of the day [...] while at the same time I can't manage to find even a single 404 on Amazon.com? Wouldn't they be using the same infrastructure for their store that they sell to the rest of us?" My guess: No, they probably wouldn't. Most likely, each environment has its own history, limitations, legacy code, optimizations and quirks. Some things are likely shared, but they are two different products.

Hi, I'm Antonio, living in Boston and working this whole net thing out...

russ commented, on July 20, 2008 at 10:28 p.m.:
Isn't "cloud redundancy" kind of redund..err, repetitive? But I agree. And it was odd that it was down for most of the day.