As part of an ongoing series, we have been discussing design principles that influence how ready our code is for distributed computing in the cloud, as well as for multi-core utilization. Today, we conclude the series with discussion about…
Today we shift our discussion to how atomicity, statelessness, idempotence, and parallelism in our code help us gain the benefits of cloud application platforms. Cloud application platforms allow our code to "inherit" capabilities like scaling out horizontally, scaling up across multiple cores, availability, reliability, manageability, load balancing, and command and control. Throughout this blog series, we have touched on these benefits, but now in our conclusion, we will discuss how cloud platforms deliver these benefits to our code.
First, we need to define what we mean by a cloud platform, and why we as architects and developers should care in the first place.
What is a Cloud Platform?
First, we need to orient ourselves in the cloud. The diagram below helps to categorize different cloud technologies into simple architectural layers. The breakdown is not perfect, as some products may touch more than one layer, but it will serve fine here.
The Infrastructure-as-a-Service Cloud
Infrastructure-oriented cloud architectures, including Infrastructure-as-a-Service (IaaS) offerings, provide access to virtualized, on-demand computing resources. Amazon EC2 is a well-known example of this approach. The user can request that Linux and Windows virtual machine instances be created on the fly and billed based on actual usage. The cloud infrastructure allows the user to manage their virtual machines and associated resources like IP addresses and configuration. With EC2, clients do not know where the machines are geographically located or what kind of hardware is being used. This is what makes the service cloud-like.
"Cloud Platforms" vs "The Platform-as-a-Service Cloud"
Platform-oriented approaches to cloud, including Platform-as-a-Service (PaaS) and cloud application platforms, run atop an underlying cloud infrastructure. Cloud platforms abstract applications away from the underlying cloud infrastructure, and provide supporting services and functionality to those applications. The distinction between cloud infrastructure and cloud platforms is a critical one for architects and developers to understand as they spoon about in the alphabet soup of cloud hype and reality.
Salesforce.com’s Force.com and Google’s AppEngine both typify the PaaS approach. The AppEngine user is solely concerned about the application they are creating to run on the platform. To deliver an application they simply package it and deploy it to AppEngine. The deployment happens in a single step and the end-user does not know whether the application is being run on one virtual machine or ten at any given point in time. In addition, the application can take advantage of special services provided by the AppEngine platform, such as authentication or data access.
Cloud application platforms, like their PaaS cousins, allow the developer to focus solely on the application deployed on the platform. Likewise, cloud application platforms offer the same or similar benefits described briefly for AppEngine above, such as virtualizing your application across the infrastructure, simplifying deployment, and providing special services. Key differences between some cloud application platforms and their PaaS cousins include portability across cloud infrastructures (only Google provides AppEngine deployment, for example), the ability to run your cloud platform in-house on a private cloud rather than on the public cloud, and flexibility in the choice of implementation languages, IDEs, and tools.
Typically, you should not have to care what the underlying cloud infrastructure is. Likewise, you should not be concerned with writing application code to implement scalability, reliability, and other cloud and distributed computing features that a cloud platform could provide. Your focus should be on the business logic that brings your "value add," while the cloud platform virtualizes your application, manages its lifecycle, and leverages your application over the underlying cloud infrastructure. Cloud platforms take your code (which is ideally atomic, stateless where possible (though some of it is likely stateful too), idempotent, and parallelizable) and do the heavy distributed computing and multi-core lifting, giving you benefits that are otherwise hard to achieve on your own.
What benefits you ask?
Scaling Out, Scaling Up, and Scaling Down Gracefully
Cloud platforms horizontally scale out your application by running it across many servers or "workers." When transaction loads are high or we anticipate the need for more throughput, we can add more workers. When loads drop, workers can be shut down (offering "green" dividends by using less power), or shunted over to another application that needs the workers now.
Why do you as a developer care? If you have provided the cloud platform with a well-designed application, the cloud platform should be able to scale your application for you. Therefore, you don’t have to write the scalability code. In most cloud platforms, your code doesn’t know it’s in the cloud, much less being scaled out.
What about scaling up across multiple cores to utilize all the available processing power? The same principles apply. If your code follows the principles we’ve outlined throughout this blog series, then the cloud platform can automatically scale the execution of your code across whatever cores are available without you having to use any special language primitives or tools. The ability to do this varies by cloud platform, but platforms that do it are out there, and it works well.
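To make the idea concrete, here is a minimal sketch in Python. It is not how any particular cloud platform works; `run_on_platform` and `price_with_tax` are hypothetical names, and a thread pool stands in for the platform's workers. The point is only that an atomic, stateless task gives the same answer no matter how many workers run it.

```python
from concurrent.futures import ThreadPoolExecutor


def price_with_tax(order):
    """Atomic, stateless task: the result depends only on the input."""
    # Integer cents avoid floating-point surprises in the example.
    return {"id": order["id"], "total_cents": order["amount_cents"] * 108 // 100}


def run_on_platform(task, inputs, worker_count):
    """Hypothetical stand-in for the platform: fan the task out over workers."""
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # map() returns results in input order even though work runs concurrently.
        return list(pool.map(task, inputs))


orders = [{"id": i, "amount_cents": 1000 * i} for i in range(1, 6)]

# The task code is identical whether one worker or four are available.
one_worker = run_on_platform(price_with_tax, orders, worker_count=1)
four_workers = run_on_platform(price_with_tax, orders, worker_count=4)
```

Because the task shares no state, the worker count is purely a deployment decision. In Python, swapping in `concurrent.futures.ProcessPoolExecutor` would be one way to spread CPU-bound work across cores; a real platform would make that choice for you.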
By running stateless, atomic code on a cloud platform, and having a cloud platform that allows you to take advantage of availability and reliability, you get resilience and the ability for your cloud application to scale up and down gracefully. If you need more resources, you can add more nodes, and scale out horizontally, and if your cloud platform utilizes multi-core efficiently, you get to scale up across cores. If one or more nodes die, availability assures that new work will get done, and reliability assures that in-flight work has a chance to complete. Either way, you can scale down with a degree of grace, even in the face of hardware failures.
Availability
Cloud platforms distribute your code across the cloud in different ways. Some platforms put all of your code on every worker and execute your code on any of those workers at any given time. Other platforms assign workers to given tasks or roles. Sometimes all of a transaction will occur on one worker until it is done. Other platforms may optionally distribute even the execution of a single transaction. Regardless of the model, cloud platforms make your application code highly available by distributing and managing it across multiple workers.
When your code is atomic and stateless in nature, it can reside anywhere in the cloud that the cloud platform puts it, and in an ideal setup, the code can execute anywhere without you having to think about it. At its root this means you automatically have high availability. If a given compute node dies, who cares? The other nodes have the code and can fulfill transactions.
Reliability
What do I mean by reliability? Say I request code to execute and something bad happens. Reliability means that the requested work still gets done, or at least that the environment does its best to complete it instead of just giving up or, worse, losing the work entirely.
There are a number of models for attaining reliable execution in cloud platform environments. If the cloud platform is designed to provide reliability to executed code, then you’ll likely get this functionality almost for free or through how your application is deployed at runtime. If not, well, then it can be a lot of work to do it yourself.
We’ll focus on one reliability model that directly shows the benefit of the atomic, stateless and idempotent nature of your code. Say I’ve requested code to execute in the cloud, and a failure occurs. Perhaps the worker doing my work suffers a power supply failure. The cloud platform detects the loss of work, and, depending on packaging-time configuration, retries that work on a different worker instead of returning the failure immediately to the requester. The cloud platform then retries that work until success is achieved or some configured threshold is met, and failure is returned.
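The retry model just described can be sketched in a few lines of Python. This is an illustration under assumptions, not any platform's actual API: `execute_reliably`, `make_worker`, and `WorkerFailure` are hypothetical names, and a worker failure is simulated with an exception.

```python
class WorkerFailure(Exception):
    """Raised when a (simulated) worker dies before finishing the task."""


def make_worker(name, healthy):
    """Build a hypothetical worker: a callable that runs a task or fails."""
    def worker(task, payload):
        if not healthy:
            raise WorkerFailure(f"{name} is down")
        return task(payload)
    return worker


def execute_reliably(task, payload, workers, max_attempts=3):
    """Retry on successive workers until success or the threshold is met."""
    errors = []
    for attempt in range(max_attempts):
        # Move to a different worker on each attempt.
        worker = workers[attempt % len(workers)]
        try:
            return worker(task, payload)
        except WorkerFailure as exc:
            errors.append(str(exc))
    # Only now is the failure returned to the requester.
    raise RuntimeError(f"gave up after {max_attempts} attempts: {errors}")


def square(n):
    """Idempotent task: safe to run again after a failed attempt."""
    return n * n


workers = [make_worker("worker-a", healthy=False),
           make_worker("worker-b", healthy=True)]
result = execute_reliably(square, 7, workers)
```

The requester never sees the failure of `worker-a`; it simply gets the answer from `worker-b`. Blind retry like this is only safe because `square` is stateless and idempotent.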
If your code takes advantage of the attributes of atomicity, statelessness, and idempotence, then you can have the flexibility to reach for reliability, especially if the environment leverages this functionality for you. Without these attributes, your options are narrowed. For example, consider atomicity in the reliability model just discussed. If the executed code encapsulates multiple non-atomic steps, then the complexity of retrying that step goes way up. Likewise, if the code is a long running series of steps, rather than stand-alone atomic steps, then a retry must rerun the entire series when failure happens, instead of just picking up at the step that failed.
Of course, not all code is idempotent and repeatable, often because it affects state of some sort. In this case the cloud platform needs to be able to deal with that, preferably in an application configurable manner. We’ll address possible solutions under “Command and Control” below.
Manageability
Even as developers, we are affected by how difficult or easy it is to deploy and manage code in the runtime environment. When the runtime environment, even in development and testing, is distributed across multiple servers, the complexity and time to manage the application goes up dramatically. Cloud platforms take this into account (more often than not because the developers that are creating and maintaining the cloud platform are affected by the same complexities!).
Some cloud platforms that I am aware of allow the developer to code and test their application on one box rather than many, and some Cloud Application Platforms allow you to develop most or all of your applications outside the cloud platform with your normal development and testing tools (this is not true for many Platform-as-a-Service environments).
Beyond this point, there are varying levels of difficulty in deploying and managing your application on the various cloud platforms. I’ll focus on what I consider the easier to deal with feature sets. Your mileage will vary based on what cloud platform you choose to run your code on.
Okay, so you have some code ready to run. Typically, you will package the application in some way, bundling with it configuration information that tells the cloud platform how you want the application managed. Next, you will deploy that application "into" the cloud platform with a single command. Some, but not all, cloud platforms will automatically distribute your application to all of their workers (or some workers, depending on the platform’s model), and bring your application up and running. You are done. Now use your client to access your cloud application.
Subsequent versions of your code are handled the same way. You will usually repackage the code and redeploy it, probably handling the "version" of the package in some way. The cloud platform will take care of updating the code for you.
Why do I even bring this up in a blog series primarily focused on developers? Deployment is not only a production and testing concern. Anything that affects time usage during development needs to get the hairy eyeball. At least, my pragmatic programmer roots think so.
Of course, this level of manageability goes way beyond the developer. We are aware of one company with over three hundred workers (500+ cores) that manages its private production cloud, running multiple cloud applications, with less than one-third of an administrator’s time.
Load Balancing
Cloud platforms utilize various types of load balancing. It may be as simple as using software or hardware-based load balancers between the cloud application and its clients. Or, it could be as sophisticated as the cloud platform utilizing its own built-in software-based load balancing. Load balancing affects both scalability and availability. When your application’s work is distributed across many workers, you want to make sure that each worker in the cloud is being fully utilized.
The situation gets more complex when one introduces the possibility of heterogeneous workers in the cloud. If your cloud is made up of workers that range from older single-core processors up to quad-core boxes or more, then you have workers with very different capability footprints.
Either way, wouldn’t it be great if you as a developer did not have to worry about this? Instead of having to sweat the application infrastructure and/or architecture to make sure the application load balances across the cloud, some cloud platforms take care of this for you.
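As a rough illustration of balancing across heterogeneous workers, here is a small Python sketch. The policy shown (send each task to the worker with the fewest active tasks per core) is just one simple assumption, not a description of how any specific cloud platform balances load; `Worker` and `dispatch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    cores: int
    active_tasks: int = 0

    def load(self):
        """Tasks per core: lower means more spare capacity."""
        return self.active_tasks / self.cores


def dispatch(workers):
    """Assign one task to the worker with the lowest per-core load."""
    chosen = min(workers, key=lambda w: w.load())
    chosen.active_tasks += 1
    return chosen


cluster = [Worker("old-single-core", cores=1), Worker("quad-core", cores=4)]
for _ in range(10):
    dispatch(cluster)

# The quad-core box ends up with four times the work of the single-core box.
counts = {w.name: w.active_tasks for w in cluster}
```

None of this logic lives in the application. The task code stays the same whether the cluster is uniform or a mix of old and new hardware.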
Command and Control
Utilizing atomic, cohesive code opens up the possibility of using declarative state machines. Declarative state machines have been around for a while. They allow us to design flows of steps in a declarative way, often in XML or some other domain specific language (DSL). They are often used in middleware and in defining business logic and work flows. Spring Web Flow is based on this concept, as is Microsoft Windows Workflow, and Appistry’s own Process Flow technology. There are many other examples.
Typically the model runs something like this: a state machine of different steps is defined. Each step or state is tied to a unit of executable logic or task. Which state the state machine branches to next is determined by the task execution results of the prior step. If a step succeeds then the next step takes some happy path. If a step fails then the next step may execute a compensating task to deal with the failure, or request help, or return failure. Success or failure is usually defined by conditional logic, rules, data values, thrown exceptions, and other conditions.
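A toy version of this model, written in Python rather than XML, might look like the following. Everything here is hypothetical and for illustration only: the `FLOW` definition, the task names, and `run_flow` are assumptions, and success or failure is decided solely by whether a task raises an exception.

```python
# The flow is data, not code: each state names a task and where to go next.
FLOW = {
    "start": "reserve",
    "terminal": {"done", "failed"},
    "states": {
        "reserve": {"task": "reserve_stock", "on_success": "charge", "on_failure": "failed"},
        "charge": {"task": "charge_card", "on_success": "done", "on_failure": "release"},
        # Compensating step: undo the reservation when the charge fails.
        "release": {"task": "release_stock", "on_success": "failed", "on_failure": "failed"},
    },
}


def reserve_stock(ctx):
    ctx["reserved"] = True


def charge_card(ctx):
    if not ctx["card_ok"]:
        raise ValueError("card declined")


def release_stock(ctx):
    ctx["reserved"] = False


TASKS = {"reserve_stock": reserve_stock, "charge_card": charge_card,
         "release_stock": release_stock}


def run_flow(flow, tasks, ctx):
    """Walk the state machine, branching on each task's outcome."""
    state, history = flow["start"], []
    while state not in flow["terminal"]:
        step = flow["states"][state]
        try:
            tasks[step["task"]](ctx)
            outcome = "on_success"
        except Exception:
            outcome = "on_failure"
        history.append(state)
        state = step[outcome]
    return state, history


happy = run_flow(FLOW, TASKS, {"card_ok": True})
declined_ctx = {"card_ok": False}
declined = run_flow(FLOW, TASKS, declined_ctx)
```

Notice that none of the tasks knows what comes next or how failures are handled. That knowledge lives entirely in the declarative flow.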
By using declarative state machines to orchestrate atomic, stateless, perhaps idempotent code in the context of distributed environments like cloud platforms, we get surprising levels of robustness, reliability and flexibility. Additionally, the declarative state machine allows us to handle reliability more readily for code that must be stateful or that cannot be safely re-executed because its operations are not idempotent. The declarative nature of the state machine lets us design how to deal with failures in these conditions, without putting the failure handling inside our code. Also, some state machines allow for snapshotting progress through the state machine steps, so that a process interrupted by failure can be resumed and completed. Again, this is something that would not be possible without making sure our code breaks down nicely into atomic steps or tasks.
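Snapshotting is easy to picture with a small Python sketch. This is a simplified assumption of how it could work, not a description of any product: `run_with_snapshots` is a hypothetical helper, a plain dict stands in for durable storage, and the failure is simulated.

```python
import json


def run_with_snapshots(steps, store):
    """Run (name, task) steps in order, saving progress after each success."""
    snapshot = json.loads(store.get("snapshot", '{"next": 0, "data": {}}'))
    data = snapshot["data"]
    executed = []
    for index in range(snapshot["next"], len(steps)):
        name, task = steps[index]
        data = task(data)  # may raise; the last saved snapshot stays intact
        executed.append(name)
        store["snapshot"] = json.dumps({"next": index + 1, "data": data})
    return data, executed


attempts = {"transform": 0}


def validate(data):
    return {**data, "valid": True}


def transform(data):
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated worker failure")
    return {**data, "value": 42}


def save(data):
    return {**data, "stored": True}


steps = [("validate", validate), ("transform", transform), ("store", save)]
store = {}  # stand-in for durable storage shared by all workers

try:
    run_with_snapshots(steps, store)  # fails partway through
except RuntimeError:
    pass

# A second run, possibly on another worker, resumes at the failed step.
final, executed_on_resume = run_with_snapshots(steps, store)
```

The `validate` step is not repeated on the second run. That only works because each step is atomic and passes its state along explicitly instead of hiding it inside the worker.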
When this technology is seen in cloud platforms, it allows the orchestration of our code across many workers in a reliable, scalable, available, and load balanced way without our code knowing about it.
Immediately one must ask, do these cloud platforms exist? Yes, they do. Google AppEngine, our earlier PaaS example, focuses on web service-based applications written in Python. In the first post of this series, Michael quoted Bill Gates regarding a coming need to change how we code to take advantage of these new paradigms. Since that first post, Microsoft has gone on to announce Microsoft Azure for .NET and its suite of related services and tools. Though Azure really touches all three tiers of the pyramid shown above, there is a cloud platform at its heart. Likewise, Appistry EAF is an excellent example of a cloud application platform that is cloud infrastructure agnostic, and allows you to deploy cloud applications written in Java, .NET, C/C++, and even native command line utilities. And, there are other cloud platform choices. There are differences from cloud platform to cloud platform, both in features and focus, but typically each cloud platform hides the cloud infrastructure from your application, virtualizes your application to manage and leverage it in a cloud-like manner, and provides essential services so that you as a developer do not need to re-invent the wheel.
This concludes our exploration of the types of changes in store for how we design code in the face of cloud computing. Like any design principles, they have to be applied with some common sense. There are no magic or silver bullets, and hammers aren’t the right tool for every job. However, considering these principles will help you leverage your code on cloud platforms now and in the on-rushing future.














