Run your own R Script model

Most of the time, model code is not designed to be portable. OpenMOLE currently handles Java, Scala and NetLogo via specific Tasks, with R support planned for the near future, but this is still far from covering all the languages used to develop models around the world.

Meanwhile, you have to package your code using CARE, as explained in the Native Packaging section. The following sections show how to handle your packaged model within OpenMOLE.

Embedding R code

Our first example is an R script contained in a file script.R. We want to distribute the execution of this R code to the EGI grid.

First, your script should:
  • Run in headless mode with no input required from the user during the execution;
  • Produce files or write its results to the standard output so that OpenMOLE can retrieve them from the remote execution environment.
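Both criteria can be checked locally before packaging. The following is a minimal sketch, assuming a POSIX shell; the helper name check_headless is hypothetical and is not part of OpenMOLE or CARE. It runs a command with standard input closed, so a script waiting for user input fails fast instead of hanging, then verifies that the expected output file was produced.

```sh
# check_headless EXPECTED_FILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with stdin redirected from /dev/null and
# succeeds only if COMMAND exits with status 0 and EXPECTED_FILE is non-empty.
check_headless() {
  expected_file=$1
  shift
  # Remove any stale copy so that an old result cannot mask a failure
  rm -f "$expected_file"
  # With stdin on /dev/null, a read from the user gets EOF instead of blocking
  "$@" < /dev/null || return 1
  [ -s "$expected_file" ]
}
```

For the script used in this section, the call would be check_headless result.csv R -f script.R --slave --args 4.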

Here is an example R script matching these criteria:
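args <- commandArgs(trailingOnly = TRUE)               # arguments given after --args
data <- read.csv("data.csv", header = TRUE, sep = ",") # input file read from the current directory
result <- as.numeric(args[1]) * data                   # multiply every value by the first argument
write.csv(result, "result.csv", row.names = FALSE)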
With an example data.csv:

This reads a file called data.csv, multiplies its content by a number provided on the command line and writes the result to an output file called result.csv.

To call this script from the command line, type R -f script.R --slave --args 4, provided R is installed on your system.
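To try the command end to end, a small input file is enough. The sketch below writes a hypothetical data.csv (the column names and values are invented for illustration and are not the original example file) and runs the script only when both R and script.R are found.

```sh
# Hypothetical input data: column names and values are purely illustrative
cat > data.csv <<'EOF'
h1,h2,h3
1,2,3
4,5,6
EOF

# Run the script with 4 as the multiplier, but only if R and script.R exist
if command -v R >/dev/null 2>&1 && [ -f script.R ]; then
  R -f script.R --slave --args 4
  cat result.csv
else
  echo "R or script.R not found, skipping the run" >&2
fi
```

Keeping the input file next to the script matters here, because the script reads data.csv from the current directory.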

Once the script is up and running, remember that the first step to run it from OpenMOLE is to package it. This is done using CARE on your system.
care -r ~ -o R.tgz.bin R -f script.R --slave --args 4

Notice how the command line is identical to the original one. The call to the R script remains unchanged, as CARE and its options are inserted at the beginning of the command line.

The result of the previous command line is a file named R.tgz.bin. It is an archive containing a portable version of your execution. It can be extracted and executed on any other Linux platform.

The method described here packages everything including R itself! Therefore there is no need to install R on the target execution machine. All that is needed is for the remote execution host to run Linux, which is the case for the vast majority of (decent) high performance computing environments.

Packaging an application is done once and for all by running the original application against CARE. CARE's re-execution mechanisms allow you to change the original command line when re-running your application. This way you can update the parameters passed on the command line and the re-execution will be impacted accordingly. As long as all the configuration files, libraries, etc. were used during the original execution, there is no need to package the application multiple times with different input parameters.
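Re-running the packaged application with a new parameter can be sketched as follows. This assumes that running the self-extracting archive unpacks a directory named R containing a re-execute.sh launcher that accepts an alternative command line; both the directory name and the launcher behaviour are assumptions to check against your CARE version. The helper reexec_cmd is hypothetical. The loop prints each command and only executes it if the launcher is actually present, so it behaves as a dry run otherwise.

```sh
archive=./R.tgz.bin
archive_dir=./R   # assumption: directory created when the archive extracts itself

# Build the re-execution command line for a given multiplier
reexec_cmd() {
  echo "$archive_dir/re-execute.sh R -f script.R --slave --args $1"
}

# Extract once, if the archive is present and not yet extracted
if [ -x "$archive" ] && [ ! -d "$archive_dir" ]; then
  "$archive"
fi

# Re-execute with multipliers other than the 4 used at packaging time
for factor in 2 8 16; do
  cmd=$(reexec_cmd "$factor")
  echo "$cmd"
  # Only run when the launcher really exists; otherwise just print the command
  if [ -x "$archive_dir/re-execute.sh" ]; then
    $cmd   # intentionally unquoted so the string splits into command and arguments
  fi
done
```

Only the value after --args changes between runs; the archive itself is never rebuilt.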

You can now upload this archive to your OpenMOLE workspace along with a data.csv file to a subfolder named data. Let's now explore with OpenMOLE the complete combination of all the data files and all the values of a second input parameter, a numeric value i ranging from 1 to 10. The input data files are located in data and the result files are written to a folder called result. The corresponding OpenMOLE script looks like this:

// Declare the variables
val i = Val[Double]
val input = Val[File]
val inputName = Val[String]
val output = Val[File]

// R task
// "workDirectory" is automatically set to the location of your .oms script in your OpenMOLE workspace
val rTask = CARETask(workDirectory / "data/R.tgz.bin", "R --slave -f script.R --args ${i}") set (
  (inputs, outputs) += (i, inputName),
  inputFiles += (input, "data.csv"),
  outputFiles += ("result.csv", output)
)

val exploration =
    (i in (1.0 to 10.0 by 1.0)) x
    (input in (workDirectory / "data").files withName inputName)

val copy = CopyFileHook(output, workDirectory / "result" / "${inputName}-${i}.csv")
exploration -< (rTask hook copy hook ToStringHook())

The CARETask performs two actions: it first unarchives the CARE container by running R.tgz.bin. Then the actual execution takes place as a second command. Note that for each execution of the CARETask, any command starting with / is relative to the root of the CARE archive, and any other command is executed in the current directory. The current directory defaults to the original packaging directory.

Several notions from OpenMOLE are reused in this example. If you're not too familiar with Hooks or Samplings, check the relevant sections of the documentation.
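To make the sampling concrete, the sketch below enumerates in plain shell the same combinations that the exploration generates: every file in data crossed with every value of i. It only prints the name each copied result would receive; it illustrates the full-factorial design and not how OpenMOLE implements it. The function name list_runs is hypothetical, and since i is declared as a Double in the script, the number formatting in the real file names may differ (for instance 1.0 rather than 1).

```sh
# list_runs DATA_DIR: print one line per (input file, i) combination
list_runs() {
  for input in "$1"/*; do
    [ -f "$input" ] || continue   # skip sub-directories and unmatched globs
    input_name=$(basename "$input")
    i=1
    while [ "$i" -le 10 ]; do
      # Mirrors the "${inputName}-${i}.csv" pattern used by the CopyFileHook
      echo "result/${input_name}-${i}.csv"
      i=$((i + 1))
    done
  done
}
```

Calling list_runs data on a folder with N files prints 10 × N lines, one per execution of the task.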

Two things should be noted from this example:
  • The procedure to package an application is always the same, regardless of the underlying programming language or framework.
  • The CARETask differs from the SystemExecTask only in the archive given as its first parameter.

These two aspects make it really easy to embed native applications in OpenMOLE. You can also read more about packaging your native models for OpenMOLE in the dedicated section.