MPI Is Dead … and It’s About Time!

by bob on September 24, 2007 · 4 comments

in Editorial

Last week at the HPC on Wall Street conference (it’s a really nice one-day format … hope this is a growing trend!) I helped with the keynote panel entitled What’s Hot in HPC (audio here). Along the way I mentioned in passing that MPI is dead. Dead as in assuming room temperature, dead as in let’s write an epitaph, dead as in maybe it’d be polite to start talking about something much more useful.

But wait a minute, you may ask … what about all of those apps built with MPI, what about OpenMPI, what about all of the true-blue supercomputing elite (taking their cue from Monty Python and the Holy Grail) who say that not only is it not dead, it’s not even sick yet?

The funny thing is that a panel or two later a well-meaning panelist from Intel Research (but clearly not from Intel Application Development!) said just that … that I had clearly not intended to say that MPI was dead, surely I couldn’t possibly have meant to imply that at all. In fact, it’s not dead yet … it’s getting better.

Actually, that’s exactly what I meant to say. In case there’s any doubt, MPI IS DEAD!

How Can I Say This?
Actually, this is relatively straightforward to see. In the bad old days all true High Performance Computing was done with some combination of shared memory, pipelined processors, and lots of threads (in the really bad old days all of this was done with hand-crafted machine code, but let’s keep this blog suitable for reading with your family!).

At some point various forms of HPC clusters and grids arose, and apps were written using messaging primitives. Since that was a little hard to do from some of the languages common in the HPC world, folks developed MPI as a friendly way to use messaging primitives.

Unfortunately, no amount of API magic could change the simple fact that writing code to execute coherently across an arbitrary number of independent processors (with or without shared memory) is enormously difficult.

Done well, the resulting applications can run quickly and achieve results that could never be achieved with a small number of threads on a small number of processors … yet it’s so very hard to “do these well”. Hard as in takes a long time, hard as in easy to make mistakes. In fact, the skills needed to write high-quality messaging-based applications are on par with operating system development or writing your own processor microcode. Possible? Sure. Satisfying when you finally get it to work? Uh-huh. Desirable? Only if there’s no other choice.

The most irritating thing is that this bog of forgotten application architectures is precisely where HPC application development has been stuck for the past 20+ years. The second most irritating part of all this is, for the vast majority of applications, it’s simply totally unnecessary.

Totally Unnecessary? Are You Kidding Me?
Nope, that is the simple truth.

True, there are some algorithms and applications for which developing shared, coherent state and operating on that state is best done with fine-grained messaging, perhaps augmented with some state mechanism. But most of what has been implemented in MPI has been built that way because, well, mostly because that’s the way they were built.

Many of the applications that I see are much easier to do by first looking for the data fissures … that is, looking for places in which the data has inherent independence. Once that has been identified, then just bundle the data into multiple, individually simple service requests, collect all the results and you’re done.
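To make the idea concrete, here is a minimal sketch of that approach, assuming the records are fully independent of one another. The names split_at_fissures and service_request are hypothetical, and a local thread pool stands in for whatever would actually dispatch the requests to workers; it is not any particular fabric's API.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_fissures(records, chunk_size):
    """Bundle independent records into simple, self-contained requests."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def service_request(chunk):
    """Hypothetical stand-in for one remote service call.

    Here it just computes a sum of squares; it needs no data from any
    other chunk, which is what makes the split safe.
    """
    return sum(x * x for x in chunk)


def run(records, chunk_size=4, workers=3):
    chunks = split_at_fissures(records, chunk_size)
    # Scatter: each chunk becomes one individually simple request.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(service_request, chunks))
    # Gather: combine the partial results.
    return sum(partials)


if __name__ == "__main__":
    print(run(list(range(10))))
```

Notice that no request ever talks to another request. All the coordination lives in the split at the start and the collection at the end, which is why there is no messaging code to get wrong.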

Of course you want to put these applications on our application fabric, so that everything is scalable, reliable, easy to use and operate. You can even make use of data-grid facilities (like our process-flow and FAM mechanisms) to keep any state needed (also with full fabric attributes – scalable, reliable, easy to use and operate).

Sure, if you have one of those problems that just can’t be done any other way then go ahead and use MPI (or its more “modern” derivative, OpenMPI), or if you have an existing MPI application at least move it to a fabric to help with operations. But unless your goal is to show just how good your architects and developers are, for most problems you’d be far better off just leaving MPI behind and moving fully into our world.

One More Thing
Application patterns like map-reduce and scatter-gather take this simplification a step further. In the future I’ll post on how our fabric is an ideal substrate for map-reduce and scatter-gather applications, but in the meantime whenever somebody asks you to use MPI … Just Say No!


{ 4 comments }

Andrew Dalke December 10, 2007 at 7:09 pm

“there are some algorithms and applications for which developing shared, coherent state and operating on that state is best done with fine-grained messaging, perhaps augmented with some state mechanism.”

What are some examples of this? The ones I’m thinking of are dynamics codes: fluid dynamics, molecular dynamics, and so on. These are typically spatial with strong short-range interaction and weak/no long-range interaction. Are those valid examples, and what are other classes of solutions where fine-grained communications is more appropriate?

Bob Lozano December 11, 2007 at 4:17 pm

That’s an interesting question. If the clusters of activity can be made orthogonal enough, and IF the cluster can be contained on an individual worker, then you’re back to the case where utilizing simple service requests is better.

Other examples of this include algorithms that require several stages with very large matrix manipulations (for example in various imaging). While this can be helped with sparse matrix manipulation, even there all of the current sparse matrix implementations that I’m aware of (such as pspaces) are built on MPI (whether or not it’s truly necessary, which it may not be for certain classes of problems).

In practice, while these problems abound in certain domains, across the field of all problems (including many, many HPC applications that have been traditionally built with MPI) I think a very high percentage of problems can be done well with simpler fabric service requests.

Bob

Andrew Dalke December 11, 2007 at 6:15 pm

I agree. Quite a few of the cluster programs I know – including some I’ve written – use PVM/MPI for scatter/gather where I would now advocate doing it via a web request. One of my clients did that, but one problem was they didn’t have any in-house experience with scalable web services, while they did with MPI.

The MD codes I know of use spatial decomposition where each cell communicates with up to 26 neighboring cells and where the data exchanged is very uniform and consistent and where communications is a non-trivial part of the overall time, which is why I think fine grained communications is still appropriate there.

Jose Luis Quiroga May 28, 2009 at 4:54 pm

It seems to me that Bob is just ignorant of all the high level functions that have been built on top of the basic messaging of MPI. Two particular ones available in any decent implementation of MPI are, guess what:

scatter and gather operations. (map and reduce operations)

and of course OpenMPI provides several common cases already implemented.

On top of MPI libraries there are other open libraries (do your homework) that implement higher level funcs like all kinds of linear functions (matrix ops, etc) because of the simplicity to do so. On top of those libraries you can code what seems to be a linear code (a series of matrix ops).

OpenMPI is optimized for different kinds of underlying hardware architecture (slow WANs, less slow LANs, high speed LANs or clusters, and shared memory multiprocessor machines). It is all transparent to the user.

The only point in favor of a very few implementations of MapReduce over OpenMPI is fault tolerance over unreliable networks, but OpenMPI is a joint project of several other projects, one of them being LAN-MPI, which of course has to have several fault-tolerant strategies that are going to be merged into OpenMPI.

Greetings.

Comments on this entry are closed.
