Is your code cloud-ready and multi-core friendly? (part 3): Statelessness

As part of an ongoing series, we are discussing design principles that influence how ready our code is for distributed computing in the cloud, as well as for multi-core utilization.

A friend of mine used to say, "All generalizations are false, including this one." Generally, the rule of thumb in distributed computing is that stateful objects are bad, and stateless is good. Today we'll explore the advantages of being stateless on the cloud and in concurrent programming.

Last time, when we discussed Atomicity, I said this about distributed computing in the cloud:

"For example, a general rule of thumb is that your code may execute anywhere in a distributed computing environment at a given time. If method A is called on an instance of object Foo on one server, and then if method B may be called on a different instance of object Foo on another server, then obviously any interdependency cannot be easily satisfied without additional overhead and effort. An example of such overhead falls under the topic of statefulness versus statelessness, which we'll discuss in part three of this series. Such overhead, if not handled correctly, can inhibit scalability and availability."

Today we'll examine scalability and availability in the context of statelessness versus statefulness, and add load balancing, reliability, and concurrency safety to the discussion.

Statelessness

First, some definitions are in order. A stateless object does not hold information or context between calls on that object. In other words, each call on that object stands alone and does not rely on prior calls as part of an ongoing "conversation." A caller could call a method on one instance of a stateless object, then make a call on a different instance of the same object, and not be able to tell the difference. This does not mean that a stateless object does not deal with data or context; rather, it does not hold that data across multiple method calls, nor does it access it concurrently with other processes, be they distributed processes or local threads or processes.
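To make that definition concrete, here is a minimal sketch. Python is used only for illustration, and `TaxCalculator` and its method are hypothetical names I made up for this example. The object receives everything it needs as arguments and keeps nothing between calls, so a caller cannot tell one instance from another:

```python
class TaxCalculator:
    """Hypothetical stateless object: no fields, no memory of prior calls."""

    def total_with_tax(self, subtotal_cents: int, rate_percent: int) -> int:
        # Everything the call needs arrives as arguments; nothing is stored on self.
        return subtotal_cents + (subtotal_cents * rate_percent) // 100


# Two different instances, standing in for objects on two different servers.
instance_a = TaxCalculator()
instance_b = TaxCalculator()

first = instance_a.total_with_tax(1000, 8)
second = instance_b.total_with_tax(1000, 8)
```

Both calls return the same total, no matter which instance handled them or in what order.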

Statefulness

Statefulness implies certain conditions. Take, for example, a stateful shopping cart object. Each instance of a stateful shopping cart object might represent an individual customer. If we had a thousand customers shopping on a site, there would be, in theory, one thousand instances of the stateful shopping cart object. To update the state of a given customer's shopping cart, we find the correct object for that customer and make calls on it. This is the Neo the One version of statefulness. Only one object can do the job, and Morpheus must find it…somewhere in the Matrix.
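A rough sketch of that shopping cart might look like the following. `StatefulCart` is a hypothetical class written for this post, not code from any real commerce library:

```python
class StatefulCart:
    """Hypothetical stateful cart: one instance per customer holds the items."""

    def __init__(self):
        self.items = []

    def add(self, item):
        # State accumulates on this one instance across calls.
        self.items.append(item)


cart_for_alice = StatefulCart()
cart_for_alice.add("book")
cart_for_alice.add("pen")

# Any other instance knows nothing about the conversation so far.
some_other_instance = StatefulCart()
```

Only `cart_for_alice` can answer questions about Alice's cart. A call routed to `some_other_instance` sees an empty cart, which is why the caller has to find the one right object.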

Statefulness may also imply shared state. Shared state may be as simple as an object that is generally stateless in terms of member data (per instance values) and yet holds onto static data (per class values) that changes from method call to method call, and affects the outcome of subsequent method calls. The static data injects its importance into the conversation between callers and the object. Shared state gives rise to increased complexity whether shared across a cloud of computing resources, or across local threads or processes.
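Here is a small sketch of that kind of shared state, under the assumption of a made-up `DiscountService` class with a class-level counter (Python's closest equivalent to static data). The instances hold no data of their own, yet the outcome of a call depends on calls that came before it:

```python
class DiscountService:
    """Hypothetical class: no per-instance data, but shared class-level state."""

    calls_so_far = 0  # class attribute, shared by every instance in this process

    def price(self, base_cents: int) -> int:
        # The result depends on how many calls ANY instance handled earlier.
        DiscountService.calls_so_far += 1
        if DiscountService.calls_so_far % 3 == 0:
            return base_cents // 2  # every third call gets half price
        return base_cents


a = DiscountService()
b = DiscountService()
results = [a.price(100), b.price(100), a.price(100)]
```

The third call is half price only because of the two calls before it, even though one of them went to a different instance. Note also that the counter is not protected against concurrent updates, which is exactly the sort of complexity shared state brings along.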

Therefore, a general definition of statefulness is in order: a stateful object holds onto data, information or context that is important across the conversation of multiple calls on a given instance of that object, and multiple callers may need concurrent access to that state.

Neo the One

Stateless objects mean that the cloud or the multi-core environment is free to use any instance of an object to get the work done. Therefore, any instance of the called object can be Neo and we're good. From a design standpoint, this affects scalability, availability, load balancing, and reliability. Let's examine each.

Statelessness and Scalability

In terms of scale in a distributed computing environment like the cloud, stateless objects mean that work can be load balanced onto the machine most suited to do it. With stateless objects, I do not have to worry about contention for a single instance of that object, a situation potentially leading to race conditions, deadlock, and starvation. I can just add more machines and let the work scale out horizontally onto them. If I am dependent on a single, golden instance of a stateful object, and it is getting all the attention from calling clients, then depending on the stateful model I'm using, I may or may not be creating a scalability bottleneck. But, no matter what stateful model I'm using, I do have more overhead than I likely would if I could have accomplished the same work without state.

Statelessness and Availability

If my cloud environment is free to use any machine to get the work done, then I have very widespread availability. In case of failure, the loss of any machine does not affect me, provided I have enough machines to continue. Again, if I am dependent on a single, golden instance of a stateful object, and the resource holding it goes down, I have lost availability for that instance of that object, and have to reconstitute it, if possible, elsewhere, hopefully without loss of state. At a minimum, I have introduced complexity into my fabric application.

Statelessness and Load Balancing

In terms of load balancing, if the cloud environment is free to use any machine to get the work done, then whatever machine has the least overhead can be given the work. If there is a single, golden instance of an object on a single machine, and it is getting all the attention from calling clients, then I may have both a load balancing problem and a resource contention problem. The model of statefulness I am using will determine the severity and complexity of the problem, and whether I can spread the work to multiple machines sharing a cache of the same stateful instance of the object.

Statelessness and Reliability

The stateless model also bolsters a cousin of availability: reliability. Statelessness, coupled with a self-healing cloud environment, affords us a lot of flexibility. In such an environment, if a machine goes down due to hardware failure, then the work can continue with another instance of the object on another machine. I don't have to worry about loss of consistency or other complexity like I would if I depended on a now-lost, stateful object.

Statelessness and Concurrency Safety

Stateful code in any form in a distributed or parallel environment requires being mindful of concurrency issues. Stateless code, on the other hand, is more likely (no guarantees of course) to be concurrency safe as you execute it across distributed nodes, multiple threads and cores. Stateless code is also likely to be easier to execute across multiple processes instead of multiple threads, a model seen in Google's Chrome and Microsoft's IE 8 browsers, where each tab is a process and not a thread.
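As a small illustration (a sketch only, with a hypothetical `word_count` function), a stateless function can be handed to a thread pool without any locking, because each call reads only its argument and writes only its return value:

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text: str) -> int:
    # Stateless: reads only its argument, writes only its return value.
    return len(text.split())


documents = ["a b c", "hello world", "", "one"]

# Plain sequential execution.
sequential = [word_count(d) for d in documents]

# The same function spread over several threads, with no locks needed.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(word_count, documents))
```

The threaded run produces the same answers as the sequential one. That holds here because `word_count` touches no shared data; it is not a guarantee for every function that merely looks stateless.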

Concurrency safety is a complex topic, and I'm already running long on this post. So, rather than rambling on myself, I'll point you to a great article titled "Java theory and practice: Are all stateful Web applications broken? HttpSession and friends are trickier than they look," by Brian Goetz. In his thesis, Goetz states "many web applications that use HttpSession for mutable data (such as JavaBeans classes) do so with insufficient coordination, exposing themselves to a host of potential concurrency hazards."

There's lots of that kind of code already out there, and it just gets more complex in distributed and multi-core parallel applications.

See you next time for part four.

Is your code cloud-ready and multi-core friendly? (part 2): Atomicity

As part of an ongoing series, we are discussing design principles that influence how ready our code is for distributed computing in the cloud, as well as for multi-core utilization. Today, we talk about Atomicity.

Atomicity

An atomic piece of code is code with a specific, clearly defined purpose. In object-oriented terminology, it has "cohesion." It's the pepperoni pizza and not the garbage pizza with everything. Likewise, atomic code can stand alone from the order of execution of other code. We’ll explore different aspects of atomic code below.

Don't Lean on Me

First, though atomic code may require other libraries, its execution is self-contained. In other words, there are no call-level interdependencies. For example, if a call on method A must precede a call on method B and method B can’t be called unless method A is called first, then A and B are not atomic separately. If there is interdependency across individual method calls, then your individual methods are not atomic, though they may form an atomic operation all together.
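A quick sketch of the difference, using made-up names (`ReportBuilder` and `build_report` are hypothetical and exist only for this example): the first version has a call-level interdependency, while the second is one self-contained call.

```python
class ReportBuilder:
    """Hypothetical example: generate() works only if load() ran first
    on the SAME instance."""

    def __init__(self):
        self.rows = None

    def load(self, rows):
        self.rows = list(rows)

    def generate(self):
        if self.rows is None:
            raise RuntimeError("load() must be called before generate()")
        return sum(self.rows)


def build_report(rows):
    # Atomic alternative: one call with no required predecessor.
    return sum(rows)
```

Calling `generate()` on a fresh `ReportBuilder` raises an error, while `build_report([1, 2, 3])` can run anywhere, at any time, with no setup call before it.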

It’s Simple, My Dear Watson

Second, atomic code is concise by its nature, and, as stated above, has a specific, clearly defined purpose. Fat, hairy, do-multiple-things methods are not atomic. If a method serves multiple purposes, then the general rule is to break those purposes up into separate atomic methods.

The Long and Winding Code

The same holds true for long-running methods. Long-running methods are not usually atomic. Perhaps the long-running code has a single purpose, but if the long-running method can be broken down into more atomic steps, then you gain flexibility when running that code in cloud environments or when utilizing multiple cores.

Flexibility and Control

The choice to break our code down into more atomic steps is all about flexibility and control. Whether our code is executing across multiple cores on multiple threads or processes, or across multiple servers in a distributed computing environment, atomic steps give us more say in how that code executes in these environments.

Advantages of Atomic Code in Distributed Computing

For example, a general rule of thumb is that your code may execute anywhere in a distributed computing environment at a given time. If method A is called on an instance of object Foo on one server, and then if method B may be called on a different instance of object Foo on another server, then obviously any interdependency cannot be easily satisfied without additional overhead and effort. An example of such overhead falls under the topic of statefulness versus statelessness, which we’ll discuss in part three of this series. Such overhead, if not handled correctly, can inhibit scalability and availability.

In addition, some distributed computing environments plan for failure; indeed, they expect failure at any time when code is executing. When a power supply takes down a server in the middle of a method that does multiple things, and your distributed environment allows for automatic retries, what are the implications? How far did the method get in its work? Which steps were satisfied, and what was left undone? Atomic steps allow more fine-grained levels of reliability and problem resolution.

Long-running code only makes this situation worse. Perhaps the long-running code has a single purpose, but if it runs for a long time to achieve a single end, and the server running it dies ninety percent of the way through, do you retry it from the beginning? Starting over is fine for some purposes, while in time-critical work, it would be unacceptable. If the long-running method can be broken down into more atomic steps, then your distributed application can perhaps snapshot progress along the way, and retry from failure mid-process more efficiently.
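To picture what "snapshot progress and retry mid-process" could look like, here is a minimal sketch. The `run_steps` helper is hypothetical, and the `checkpoint` dict is only a stand-in for durable storage; a real system would persist progress somewhere that survives the failing machine.

```python
def run_steps(steps, checkpoint):
    """Run each atomic step in order, recording progress after every success."""
    start = checkpoint.get("next_step", 0)
    for index in range(start, len(steps)):
        steps[index]()                       # may raise if the "server" dies
        checkpoint["next_step"] = index + 1  # snapshot progress


log = []
failures = {"remaining": 1}


def step_one():
    log.append("one")


def step_two():
    # Simulate a single failure on the first attempt.
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise RuntimeError("server died")
    log.append("two")


def step_three():
    log.append("three")


steps = [step_one, step_two, step_three]
checkpoint = {}

try:
    run_steps(steps, checkpoint)
except RuntimeError:
    run_steps(steps, checkpoint)  # the retry resumes at step two, not step one
```

Because the work is split into three steps, the retry picks up at the step that failed. Step one runs exactly once instead of being repeated.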

Essential Value of Atomicity in Parallelism and Multi-core

In terms of parallelism and multi-core, Daniel Spiewak wrote an excellent entry on this very topic just this week on his Code Commit blog (our friend Alex Miller twittered about the article). Daniel makes the argument that

“there is actually a deeper question underlying concurrency: what operations do not depend upon each other in a sequential fashion?  As soon as we identify these critical operations, we’re one step closer to being able to effectively optimize a particular algorithm with respect to asynchronous processing.”

Later in the article he states:

“This is truly the defining factor of atomic computations: it may be possible to reorder a series of atomic computations, but such a reordering cannot affect the internals of these computations.  Within the “atom”, the order is fixed.

“So what does reordering have to do with concurrency?  Everything, as it turns out.  In order to implement an asynchronous algorithm, it is necessary to identify the parts of the algorithm which can be executed in parallel.  In order for one computation to be executed concurrently with another, neither must rely upon the other being at any particular stage in its evaluation.”

All of that should sound familiar at this point, and that serves as a good stopping point for now on the topic of atomicity. Here’s the link to Daniel’s whole article, which I highly recommend: http://www.codecommit.com/blog/scala/higher-order-fork-join-operators
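Here is a tiny sketch of that reordering idea. It is my own illustration in Python, not code from Daniel's article, and the two functions are made up for the example. Neither computation depends on the other, so they can run in either order, or at the same time:

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(numbers):
    return sum(n * n for n in numbers)


def sum_of_cubes(numbers):
    return sum(n * n * n for n in numbers)


data = [1, 2, 3, 4]

# Neither computation relies on the other, so the order does not matter.
in_order = sum_of_squares(data) + sum_of_cubes(data)
reversed_order = sum_of_cubes(data) + sum_of_squares(data)

# For the same reason, they can be forked onto threads and joined afterward.
with ThreadPoolExecutor(max_workers=2) as pool:
    squares_future = pool.submit(sum_of_squares, data)
    cubes_future = pool.submit(sum_of_cubes, data)
    forked = squares_future.result() + cubes_future.result()
```

All three approaches give the same total. Inside each function the order of work is fixed, which is the "atom" in the quote above.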

Stack Overflow, web 2.0 done right or How I've become an addicted stackhead

There are plenty of technical Q&A web sites out there. We've all used them at one time or another, if not daily. Most often, I used sites, mailing lists and forums set up specifically for some tool I'm currently working with. In the past year, I've spent time on the Apache mailing lists, the Codehaus support forums for some open source projects, and of course on our own Peer2Peer developer community. Sites specific to a toolset or product like Peer2Peer tend to be focused and valuable. However, generalist Q&A sites tend to devolve under the Usenet effect and experience a high noise-to-signal ratio. Doesn't it drive you crazy to have to wade through piles of junk to find the golden nugget? I've been tooling about on the "Internet" since the early 1980s, and I shudder to think how much time I've spent being the physical embodiment of a regular expression filter, looking for the right answer among the non-matches.

Recently I found out about Stack Overflow, a generalist programming Q&A site that gets it right.

A while back I stumbled on the news that Jeff Atwood (of Coding Horror fame), and Joel Spolsky (yep, Joel on Software), and friends (the gang is further down the "about" page), were teaming up to create a technical Q&A site for developers. They revealed that it would be a free site, and the more I learned about it via their podcast, and other sources, the more excited I got. I began watching for them to go live.

I'm now a week into my beta membership, and I'm not excited anymore. I'm addicted. Why so?

It's free, it's fast, and it doesn't require registration (see more below).

And, I don't have to filter out the noise to find good answers. Why? Because Stack Overflow is a great example of Web 2.0 done right by delivering community driven content and value.

On Stack Overflow, you don't just post questions and reply with answers. If you do choose to register (again free), you get to participate by voting questions and answers up and down in relevance. Also, you get to categorize them into topics, as well as flagging them as too subjective and non-specific. Subjective? Yep. Stack Overflow cares about the relevance of the questions and value of the answers, and polices them via the community. As answers get voted up in relevance, they float to the top of the thread, right next to the question. Huzzah! No more reading long, irrelevant threads.

Gee, that sounds like...work. Not really; they make it fun. You build karma with just about every action, and gain badges of recognition for your efforts. Their meritocracy approach seems to be dead on. You don't get a vote in the community until you have participated enough to gain fifteen karma points. However, that's not hard to achieve. One answer considered relevant by one other person gains you around eleven points, so it's not hard to get involved.

The Stack Overflow creators describe it as a synthesis of Wiki, Blog, Forum, and Digg/Reddit.

Okay, enough! If you want to know more, and I've barely touched on the details, go to the Stack Overflow About page, and the FAQ page to learn more. Several of us techie folks here at Appistry are becoming stackheads, so when you don't find us on Peer2Peer, perhaps we'll see you over there.

Dr. Podcast or How I Learned to Keep Commuting and Enjoy the Time, episode 2

This week's audio picks feature episodes from Futures in Biotech with Marc Pelletier, one of several shows featured on the TWiT.tv Netcast Network with Leo Laporte.

TWiT.tv's history is pretty extensive, so I'll point you over to them to see what it's all about. These guys have a number of podcasts going on a regular basis. Besides Futures in Biotech, a random sampling of their other shows includes MacBreak Weekly, Daily Giz Wiz, Security Now, and Jumping Monkeys. I found TWiT.tv when I was reading up on Drizzle, the current work coming from Brian Aker and other folks from the MySQL project. Brian was interviewed for FLOSS Weekly 35, the weekly open source-oriented podcast at TWiT.tv.

AUDIO DU JOUR

Now on to this week's podcast picks from Futures in Biotech. Marc Pelletier is a biotechnologist, and host of the podcast. The byline for Futures in Biotech is found on the FiB site:

"This netcast explores the rapidly changing world of biotech, with a penchant towards getting a better understanding of who we are and where we are going. The living world will soon be a true substrate for engineering. Our world will change, and so will we.
We bring a first hand account from the scientists that are moving us into this new technological era - the era of biotech."

In prior blog posts here at Appistry, I've talked about my interest in the synthesis and cross-pollination of ideas between disciplines. Besides strictly biotechnological topics, Marc has been interviewing folks from the space industry, another of my interests. He does this with the hope of exploring the cross-pollination between biotechnology and space sciences. For example, in episode 23, he interviews Dr. Buzz Aldrin, one of the first to walk on the moon, and in episode 26, he speaks with Dr. Harrison Schmitt, the last man to walk on the moon. Both men are still very active in mankind's reach into space. I'm a firm believer in the value that comes from space exploration, both in terms of knowledge gained, and the technological advances we all subsequently benefit from. Both interviews are excellent, and a great listen, and there are other space-related interviews that I've not gotten to yet.

See you next week with more. Meanwhile, keep learning!

Is your code cloud-ready and multi-core friendly? (part 1): Introduction

With the advent of cloud computing, the explosion in mobile devices and the growing availability of multi-core CPUs, there is a drive toward new models of how we design and develop code. Well, they’re not new really. Some of the models have been around, and in use, but they are not typically practiced by mainstream developers...yet.

Michael blogged about this trend in his blog post “Microsoft says you need to change how you are building your applications.” You should pop over and read the whole post, but here is part of what he said:

“I was surprised how many speakers [at Microsoft TechEd 2008] were conveying the same message: 

‘CPU speeds are topping out.  If you want your applications to run faster and better you are going to have to build your applications in a new way.  The solution isn't just to learn how to multi-thread your applications.  The solution lies in building your applications into smaller units of code called tasks that can be moved around to the different cores of a multi-core machine.’

Following that up, Michael quoted Bill Gates' keynote speech and added his own comment:

The... ‘need to take programs and break them down into parallel execution units now becomes absolutely necessary.’

This statement was music to my ears.  Why?  It is the same model that we have been using for fabric computing since our beginnings in 2001.  Now we are starting to see this message more and more as cloud computing is gaining acceptance.”

For our code to meet the demands of computing in a cloud-based, multi-core world, we'll need to design our code with the following attributes in mind:

· Atomicity

· Statelessness

· Idempotence

· Parallelizability

That's a mouthful. In a continuing series, we will examine what each means and what it gets us.