The purpose of GridSweeper is to take a simple user-provided description of what parameter settings to run a model with, run the model on a grid, and return results to the user.
The user will be able to manipulate the parameter-sweep description in three ways: (1) using an XML specification file, (2) with command-line arguments, and (3) with a graphical user interface. These three mechanisms can be mixed: command-line arguments can augment or override the XML and can be saved back out to XML, and the GUI tool will edit and save XML files too.
Ultimately, user action will result in running the GridSweeper program, which turns parameter sweep specifications into job specifications for the grid system via DRMAA. Specifically, the program does the following:
- Parses the XML specification and command-line arguments to generate an `Experiment` object.
- Generates a list of `ExperimentCase` objects (parameter value settings) from the `Experiment` class.
- Sets up an output directory for the files generated by this experiment. If a shared filesystem is not present, this can be done via FTP or other file-transfer mechanism supported by a plugin implementing a subclass of `FileTransferSystem`.
- Starts a DRMAA session and submits a job for each experiment case, using an archived `RunSetup` object for each job’s standard input.
- Still unimplemented: monitors the results of jobs and reports status changes to the user.
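The case-generation step above can be illustrated with a minimal Python sketch. It is not GridSweeper's actual code: the function name `expand_cases`, the dictionary layout, and the seed scheme (base seed plus case index) are all assumptions made for illustration. It expands a mapping of parameter names to candidate values into one entry per combination and per repeated run.

```python
from itertools import product


def expand_cases(sweep, runs_per_case=1, base_seed=0):
    """Expand {name: [values]} into one dict per parameter combination and run.

    Hypothetical helper: the layout of each case dict and the seed scheme
    are illustrative assumptions, not GridSweeper's real data model.
    """
    names = sorted(sweep)  # fixed ordering so the expansion is reproducible
    cases = []
    for values in product(*(sweep[name] for name in names)):
        for run in range(runs_per_case):
            cases.append({
                "params": dict(zip(names, values)),
                "run": run,
                # len(cases) is the index this case will have once appended
                "seed": base_seed + len(cases),
            })
    return cases


if __name__ == "__main__":
    for case in expand_cases({"beta": [0.1, 0.2], "gamma": [1, 2, 3]}):
        print(case)
```

Assigning seeds on the submission side, as this sketch does, means everything needed to reproduce the experiment is known before any job is submitted.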
The way things are set up now, GridSweeper requires support on both the submission end and the execution end of the grid. The DRMAA job specification specifies that the execution host run not the model itself, but the GridSweeperRunner program, which takes input data and uses that to actually run the model. Specifically, it does the following:
- Unarchives the `RunSetup` object from standard input.
- If necessary, downloads input files via the file transfer mechanism.
- Actually runs the model using an instance of the `Adapter` class specified by the user (explicitly, or implicitly by using, e.g., `gdrone` for the Drone compatibility adapter). The `Adapter` object knows how to take a set of parameters and send it to a particular type of model executable.
- If necessary, uploads output files via the file transfer mechanism.
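The runner's core steps (read the run setup, hand the parameters to an adapter) might look roughly like the following Python sketch. Everything here is hypothetical: JSON stands in for whatever archive format the real `RunSetup` uses, and `CommandLineAdapter`, `load_run_setup`, and the `--name=value` argument convention are invented for illustration. File transfer is omitted.

```python
import json
import subprocess
import sys


class CommandLineAdapter:
    """Hypothetical adapter: passes each parameter as a --name=value argument."""

    def __init__(self, executable):
        self.executable = executable

    def build_command(self, params):
        # Sort by name so the same parameters always yield the same command line.
        args = [f"--{name}={value}" for name, value in sorted(params.items())]
        return [self.executable, *args]

    def run(self, params):
        # Launch the model and report its exit status back to the caller.
        return subprocess.run(self.build_command(params), check=False).returncode


def load_run_setup(stream):
    """Read the run setup from a text stream (JSON is an assumed stand-in)."""
    return json.load(stream)


def main(stream=sys.stdin):
    """Read a run setup from the stream and run the model it describes."""
    setup = load_run_setup(stream)
    adapter = CommandLineAdapter(setup["executable"])
    return adapter.run(setup["params"])
```

A script built on this sketch would end with `sys.exit(main())`, so that the model's exit status becomes the job's exit status. Keeping command construction in the adapter is what lets a different model type be supported by swapping the adapter class alone.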
One problem with this mechanism is that it submits a separate job for every experiment case, bypassing DRMAA’s notion of batch jobs. DRMAA batch jobs let you submit a whole bunch of jobs at the same time by specifying that each job is the same except for an integer specifier, and that specifier can be used as a variable in command-line arguments. Because some systems may be faster at accepting batch jobs than a pile of individual jobs, it might be worth using the batch job mechanism.
One way to do this would be to defer the calculation of parameter assignments and random seeds to the execution host, but that makes it impossible to generate a file for reproducing the experiment as soon as it is submitted. A better way is to generate a series of input files in the experiment directory, named with the batch run index, and have the GridSweeperRunner tool read those files at runtime rather than reading an object from standard input.
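As a rough illustration of the index-named input files, the Python sketch below writes one file per experiment case on the submission side and reads one back by batch index on the execution side. The file naming (`case.<index>.json`), the JSON format, the helper names, and the choice to start indices at 1 are all assumptions for this sketch, not part of GridSweeper.

```python
import json
from pathlib import Path


def case_path(experiment_dir, index):
    """Hypothetical naming scheme: one JSON file per batch index."""
    return Path(experiment_dir) / f"case.{index}.json"


def write_case_files(experiment_dir, cases, first_index=1):
    """Submission side: write every case up front; return the index range."""
    directory = Path(experiment_dir)
    directory.mkdir(parents=True, exist_ok=True)
    for offset, case in enumerate(cases):
        path = case_path(directory, first_index + offset)
        path.write_text(json.dumps(case, sort_keys=True))
    # The (begin, end) pair is what a batch submission would iterate over.
    return first_index, first_index + len(cases) - 1


def read_case_file(experiment_dir, index):
    """Execution side: load the case that matches this job's batch index."""
    return json.loads(case_path(experiment_dir, index).read_text())
```

Because the files exist before submission, they double as the record needed to reproduce the experiment, which is the advantage this approach has over computing parameters on the execution host.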