A scientist writes a computer program that calculates the results of an experiment. They publish a paper detailing those results and their methods, including the bespoke computer program.
Now other scientists can replicate those results, right?
Not so fast, says Jonathan Bell, an assistant professor in the Khoury College of Computer Sciences.
“I wrote my experiment, and I put it in a Jupyter notebook” — a program that merges code with plaintext descriptions, often used in classroom settings — “so I can share it with my students, and then they can run it and get the same result. ‘Science, hooray,’” Bell says. However, “that doesn’t actually happen much of the time.”
“There are a variety of challenges that make it difficult to rerun software experiments and get the same results,” he continues. “Regardless of what kind of science you’re doing … you run into exactly this problem.”
Scientists and software engineers call this the “software reproducibility crisis.”
This instability calls into question the results of many, many experiments. If scientists can’t validate one another’s results, on what grounds can anyone argue that an experiment was, in fact, accurate? How could a new experiment ever build on the results of a previous one?
Bell says that the problem can arise because a “program [often] relies on outside stuff, like reading a data file sitting on my desktop, or downloading some third-party library from the internet, or worse.”
For instance, he continues, maybe a user “downloaded some third-party library from the internet two years ago, installed it on my computer and I’m [still] using it now. Try to describe to someone else how to reproduce that.”
The devil is in the idiosyncrasies. “It should just be that you run a program, and you can run it again and it works.”
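One small, concrete step toward that ideal, independent of Bell’s infrastructure, is simply writing down the “outside stuff” an experiment depends on. The sketch below is a hypothetical illustration, not the team’s software: it records the exact version of every installed Python library and a checksum of an input data file, so a second researcher can tell at a glance whether their setup matches the original.

```python
# A minimal sketch (not Bell's tool) of recording an experiment's outside
# dependencies: the exact versions of installed libraries and a checksum of
# the input data, so someone else can recreate or verify the setup.
import hashlib
import json
from importlib.metadata import distributions

def snapshot_environment(data_path: str, out_path: str = "environment.json") -> None:
    # Record every installed package and its exact version.
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()
    )
    # Fingerprint the input data so a silently changed file is detected.
    with open(data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    with open(out_path, "w") as f:
        json.dump({"packages": packages, "data_sha256": data_hash}, f, indent=2)

if __name__ == "__main__":
    snapshot_environment("experiment_data.csv")  # hypothetical input file
```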
To achieve this goal, Bell and a group of researchers from Carnegie Mellon University and University College London have proposed “a community infrastructure that will bring the benefits of continuous integration to scientists developing research software,” they write in their abstract.
Continuous integration is a concept that arises out of software engineering, in which “developers create a fully-automated ‘workflow’ for executing some test suite, leveraging the relatively low cost of cloud computing resources to create a fast feedback loop,” the researchers explain in their paper.
In other words, “continuous integration creates an automated cloud testing environment, easily executed to ensure that an experiment remains reproducible,” Bell says.
This workflow becomes “especially valuable when it is necessary to design, implement and evaluate several prototypes” — as in the design of scientific software, for instance.
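In a typical continuous integration setup, that workflow boils down to a short script a cloud worker runs automatically on every change to the code. The Python sketch below is a generic illustration rather than the researchers’ infrastructure: it installs pinned dependencies, runs the test suite, and fails the build the moment anything breaks.

```python
# A hedged sketch of the automation behind continuous integration: on every
# change, a cloud worker installs pinned dependencies and reruns the test
# suite, failing the build if any step fails. File names are illustrative.
import subprocess
import sys

def run(cmd: list[str]) -> None:
    print("running:", " ".join(cmd))
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit(result.returncode)  # a failing step fails the whole build

if __name__ == "__main__":
    run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"])
    run([sys.executable, "-m", "pytest", "--maxfail=1"])
```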
In practice, what this all means is that Bell and his collaborators are developing “a system for automating software experiments,” Bell says.
“So if you’re a researcher who wants to build a new tool to automatically fix bugs in software,” he says by way of example, “what our infrastructure does is take your tool — that fixes the bugs — and run it on all of those [bugs] and report back all of the results.”
“The reason why the infrastructure is needed,” he says, “is that running your tool on a single bug might take eight to 12 to 24 hours.”
Now multiply that time cost across thousands of bugs, or whichever kind of calculation a software experiment is interested in.
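The shape of such a harness can be pictured roughly as follows. The code is an illustration only, with a hypothetical `fix_bug` command standing in for a researcher’s repair tool: it runs the tool on every bug in a benchmark, in parallel and under a time limit, and gathers the outcomes into a single report.

```python
# A rough illustration (with a hypothetical "fix_bug" command) of the kind of
# harness such infrastructure provides: run a repair tool on every bug in a
# benchmark, in parallel and under a time limit, and collect the results.
import subprocess
from concurrent.futures import ProcessPoolExecutor

BUGS = ["bug-001", "bug-002", "bug-003"]  # stand-ins for a real benchmark
TIME_LIMIT = 24 * 60 * 60  # seconds; a single run can take many hours

def run_tool(bug_id: str) -> dict:
    try:
        proc = subprocess.run(
            ["fix_bug", bug_id],  # hypothetical command for the repair tool
            capture_output=True, text=True, timeout=TIME_LIMIT,
        )
        status = "fixed" if proc.returncode == 0 else "not fixed"
    except subprocess.TimeoutExpired:
        status = "timed out"
    return {"bug": bug_id, "status": status}

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for report in pool.map(run_tool, BUGS):
            print(report)
```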
“A lot of it comes down to creating this shared research infrastructure for running these large and complex experiments,” he says.
Bell’s specialty is software engineering, and more specifically testing. “So, how we broadly check that software is correct,” he says.
“We’re looking at this problem” of software reproducibility, he says, “through the lens of software testing, where ultimately, the experiment itself is a test.”
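Framed that way, an experiment can be written in the same form as any other automated test. The snippet below is a toy example, not the group’s code: the “experiment” is a small computation whose published value is asserted directly, so a continuous integration run that can no longer reproduce it fails visibly.

```python
# A toy example of treating the experiment itself as a test: the analysis is
# rerun and its output compared against the published value, so any drift in
# code, data, or dependencies shows up as a failing test. The numbers and the
# function are illustrative only.
import statistics

def run_experiment() -> float:
    measurements = [4.9, 5.1, 5.0, 5.2, 4.8]  # stand-in for real data
    return statistics.mean(measurements)

def test_experiment_is_reproducible() -> None:
    published_result = 5.0  # the value reported in a (hypothetical) paper
    assert abs(run_experiment() - published_result) < 1e-9

if __name__ == "__main__":
    test_experiment_is_reproducible()
    print("experiment reproduced")
```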
Bell points to Northeastern’s Research Computing center as an incredible resource for performing these computing-intensive tasks.
“This research is really only possible because of the resource that Research Computing provides here, because it lets us actually do these kinds of experiments,” Bell says. “We have our own servers that are maintained by the Khoury Systems Group, who have been phenomenal in terms of providing the hardware and software that we need.”
“I truly am grateful,” he concludes.
Noah Lloyd is a Senior Writer for NGN Research. Email him at n.lloyd@northeastern.edu. Follow him on Twitter at @noahghola.