Xbox Live Keels Over … OOPS!

It is really hard to go more than a week or two without hearing news of yet another invincible, built-to-scale-and-be-reliable Web 2.0 service not scaling or being reliable.

It's easy to dismiss these failures as someone else's problem, but the plain reality is that these systems were built and deployed by folks who really thought they had the scale and reliability problems covered ... except that they didn't.

Oops.

The latest reluctant contestant for the biggest-flop-on-a-public-stage award is ... XBOX LIVE!

Halo 3 was released this week, the latest title in the lucrative series that has been the franchise title of the Xbox universe. With pre-orders topping 4 million, excited gamers spent last night unbelievably frustrated because they were unable to log on to Xbox Live. Turns out that it was limping along, allowing only about 1,000 Halo 3 users to log on!

Umm, I think I see the problem here ... 4,000,000 initial customers, capacity for 1,000. I'm sure the other 3,999,000 initial customers won't mind waiting for their turn in line? Especially when Xbox Live has been billed as "the ultimate broadband gaming experience", and by design (and competitive necessity) is crucial to the Halo 3 experience.

Making matters worse, I'm sure that Microsoft's business case counts on 1) more sales of Halo 3 than the initial 4,000,000 pre-orders, and 2) some serious monthly revenue from Xbox Live while all those Halo 3 customers forget they actually live in a flesh-and-blood world! Not to mention the need for Halo 3 to be a big hit in order to bolster Xbox 360 in the gaming platform wars.

Bottom Line
This service really matters to Microsoft and a key customer constituency, yet it isn't scaling well precisely when it matters most ... What happens next?

In the short term I'm sure that Microsoft will deploy a bevy of smart people to fix this particular problem. Then it will happen again, perhaps in another portion of Xbox Live, or perhaps in another SaaS (software as a service) offering. Maybe next time it'll be Office Live ...

In any case, each of these examples had the economics and human capital to justify a hand-crafted scaling and reliability implementation, yet still fell short.

That is precisely why it is so important to deal with these issues up front, ensuring scaling and reliability for all applications at the architectural level. Enable your developers to use the languages they love (all the .NET languages, C/C++, Java, and more). Provide ways to reliably manage state without crushing your database layer. Assume that your commodity infrastructure will break, networks will break, operating systems will break. Fix those problems ahead of time, automatically, for the developer ...

Welcome to our world!

One More Thing
I'm going to begin highlighting some of these notable faux pas as they occur ... mostly as a reminder to each of us that the applications we build really should work as intended, with scale and reliability simply delivered. Our industry must make this transition, and it will ... after all, that is why we built the application fabric.

