CloudIQ – The Evolution of a Platform, Part 1

by bob on March 25, 2009

This is Part 1 of a planned three-part series that traces the evolution of the CloudIQ Platform from first idea to what it is today, then considers what it is likely to become.

[Image: View from the La Luz Trail]

Right after graduate school we spent a few years living in the foothills of a desert mountain range. I remember the first time I hiked a narrow trail to the top of the nearest peak. Standing at the bottom was rather intimidating, and the circuitous ascent was such a tangled mixture of switchbacks, short climbs and descents, and covered and open trail that it could only be described as non-obvious (at best). It was only upon reaching the top that we gained any perspective on how, in fact, we had gotten there.

In this post I will reflect on the evolution of CloudIQ – the truly exciting (if I must say so myself!) cloud application platform that we announced a couple of weeks ago.

Roots
As some pondered the impending Y2K “crisis” and others looked for the best millennium parties, most of our founding team was deeply enmeshed in building, selling, and supporting an enterprise-grade (scalable, reliable, etc.) payment server.

Upon leaving that company I had time to reflect on my most enduring frustration:

Why was it so hard to build software that we (and our customers) could rely upon?

It seemed that we were spending 60 to 70% of our engineering effort not on core functionality, but on doing our best to ensure that the resulting application could be relied upon.

Later, an early supporter coined the term “reliability tax” to refer to this overhead.

As I asked friends at other companies and enterprise shops, most recognized the same problem. A few argued the overhead was actually higher, and most thought the cruel irony was that, even after paying it, true reliability for enterprise apps remained very, very difficult to ensure. But all agreed that this just didn’t seem right, not when nearly 50 years had passed since Grace Hopper did her most famous work.

With this question still bugging me, I had an opportunity to build the beginnings of a digital recording studio. Using completely commodity gear – no-name, cheap – I was genuinely shocked at the results. Serious performance, cheap.

So that led to the second question:

Why weren’t we using commodity stuff like this for problems that we really cared about?

The answer to this seemed easy enough – who could trust this cheap stuff? What if it broke? (And it would break.)

In pondering the first question, it seemed to us that the core problem for software development was one of complexity – mainstream application architectures were simply too complex, and inexorably becoming more so.

The First Idea
Then it became fairly clear – we could solve both of these problems at the same time by enabling groups of commodity boxes to work together to ensure a stable platform for applications.

But what exactly did that mean? Or more to the point, could we build it? Even MORE to the point, once we built it, how could people use it, and for what applications would this new thing be useful?

Over the course of a few months the founding team hammered out the first answers to those questions. Throughout this process we were driven by use cases – we wanted the resulting platform to be equally adept at running anything from fine-grained, transactional applications to more computationally intense enterprise applications.

This led to much refinement of the basic idea, which evolved into a self-organizing group of commodity machines that could act like one thing, reliably execute all sorts of applications, grow (and shrink) as needed without affecting any running applications, and be very simple both to write applications for and to operate.

The Hive is Alive
We decided to call this hive computing, and on June 12, 2002 we had the first successful demonstration of a running hive. We assembled a few commodity boxes on a re-purposed kitchen rack, loaded the prototype hive software, and … it worked!

We were able to (carefully) pull a few plugs and the application kept running without missing a beat – in fact, without even losing a bit of data.

Within two years we had our first paying customer (Sprint), a couple of patents filed, and a demonstration system on which we ran an eye-opening benchmark – a wall of 100 commodity computers that could legitimately handle double Visa’s then-current peak transaction load, for a total bill that was well under 10% of the cost of the conventional alternative.

The best part? It was arguably far more reliable as well. We were constantly amazed by the resilience and ease of use of this new type of application platform … though truth be told, we were not yet ready to use the “P” word.

Roadblocks
In our pursuit of the possible we (the founding team) sometimes thought that economics would do all the persuading for us. Well, the answer turned out to be sometimes yes, but mostly no.

In fact, sometimes economics actually worked against us – the combination of 90% lower costs, simpler development, easier operations, and simultaneously increased reliability and scalability simply seemed too good to be true for many people.

The fact that we required some modification of the application also made adoption more complicated. While we supported several languages and multiple operating systems (and could easily support more of each), the plain, simple truth was that you did need to modify – albeit lightly – many components in each and every application.

[Image: Borg cube]

This raised the adoption barrier a bit higher.

Then there was the little matter of language. Without a native category to call our own, we were put into all sorts of categories – everything from grid computing to autonomic computing, with several others in between.

Early on I even told some folks that we were basically building the “Borg for applications”. While hard-core geeks loved that (and usually laughed), it didn’t exactly help us build trust with the typically non-technical executives responsible for making the final purchase decisions.

Yet, It Worked … Well
Despite these go-to-market difficulties the product itself worked well – really, really well. In fact, by mid-2002 several of us had become convinced beyond a shadow of a doubt that there would be a time in the future – say 10 or 15 years out – when most mainstream computing would be done this way.

The economics and functional advantages were simply too compelling for any other outcome.

The only real question in our minds was when and who – when this transition would begin to occur and who would help make that transition happen.

So as 2004 came to a close we pondered solutions to these issues and continued to press rapidly forward.

In Part 2 we will talk about why this worked so well, and the transition to the application fabric.
