Gene Cooperman of the College of Computer and Information Sciences began tinkering with parallel computing over a decade ago, exploring the possibility of using 10 computers to do in 1 hour the job 1 computer can do in 10 hours. In 2007 he used this method to show that any Rubik’s cube can be solved in 26 moves or fewer, a record at the time.
But he kept coming across the same problem: Say 10 computers have spent the last six days combining their power to run a program that sifts through thousands of molecules, searching for that needle-in-a-haystack cancer drug. One of those computers is mine and I suddenly decide that it’s critical for me to play solitaire ASAP, so I pull my computer out of the network. Well…I just screwed everything up. The last six days’ worth of work goes down the drain.
Wouldn’t it be nice if I could just pause the program for a couple of minutes to get my solitaire fix, and then let the program continue molecule mining where it left off?
Programmers can write code that enables “checkpoint-restart” capabilities to do this job, but they need to do it every time a pause is anticipated. Before I play solitaire, I’d need to write some new code.
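To give a feel for what that hand-written code involves, here is a minimal Python sketch of checkpoint-restart built into a program itself. It is not DMTCP and not Cooperman's code; the names `screen_molecules`, `is_promising`, and `save_checkpoint`, the pickle file format, and the toy scoring rule are all assumptions made up for illustration.

```python
import os
import pickle


def is_promising(molecule):
    # Hypothetical stand-in for an expensive drug-screening calculation.
    return molecule.endswith("7")


def save_checkpoint(state, checkpoint_path):
    # Write to a temporary file first, then swap it into place, so a crash
    # mid-write never leaves a half-written checkpoint behind.
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, checkpoint_path)


def screen_molecules(molecules, checkpoint_path, stop_after=None):
    """Screen molecules in order, saving progress after each one.

    Returns (state, finished). If stop_after is given, the function pauses
    after that many molecules; calling it again resumes from the checkpoint.
    """
    # Resume from a saved checkpoint if one exists; otherwise start fresh.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"next_index": 0, "hits": []}

    processed = 0
    while state["next_index"] < len(molecules):
        if stop_after is not None and processed >= stop_after:
            return state, False  # paused; progress is already on disk
        molecule = molecules[state["next_index"]]
        if is_promising(molecule):
            state["hits"].append(molecule)
        state["next_index"] += 1
        processed += 1
        save_checkpoint(state, checkpoint_path)
    return state, True
```

Calling `screen_molecules(molecules, "screen.ckpt", stop_after=5)` and later `screen_molecules(molecules, "screen.ckpt")` picks up at the sixth molecule rather than the first. The catch is that the programmer has to decide what counts as "state" and write this bookkeeping into every program that needs it.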
Enter DMTCP, which stands for Distributed MultiThreaded CheckPointing (DMC would have been an easier acronym to remember, but it was already taken). What the heck does that mean? I asked Cooperman a similar question yesterday.
The program hovers in the background of the molecule miner (or any other program) and stealthily checkpoints, or saves, the state of affairs. When I come in to play solitaire, DMTCP forces the molecule miner to stop, save everything and wait till I’m done. When I decide it’s time to stop procrastinating, the molecule miner starts up again from the checkpoint.
DMTCP is the most widely used checkpoint-restart program of its kind because it can run transparently in the background without interrupting anything. Also, said Cooperman yesterday, “Our main goal is to be completely general purpose.” It is flexible in that it can be used with a broad range of program types.
Five years ago, said Cooperman, there was no program like this, so researchers across a variety of fields simply couldn’t explore certain questions. “Now there’s a new tool and now you show it to people they say ‘oh, I have something else I can use this tool for,’ completely different from what you developed it for.”
So, essentially, DMTCP has the potential to enable completely novel investigations using a variety of computational techniques.