High-level GridSweeper execution overview

The purpose of GridSweeper is to take a simple user-provided description of what parameter settings to run a model with, run the model on a grid, and return results to the user.

The user will be able to manipulate the parameter-sweep description in three ways: (1) using an XML specification file, (2) with command-line arguments, and (3) with a graphical user interface. These three mechanisms can be mixed: command-line arguments can augment or override the XML and can be saved back out to XML, and the GUI tool will also edit and save XML files.

Ultimately, user action will result in running the GridSweeper program, which turns parameter sweep specifications into job specifications for the grid system via DRMAA. Specifically, the program does the following:

  1. Parses the XML specification and command-line arguments to generate an Experiment object.
  2. Generates a list of ExperimentCase objects (parameter value settings) from the Experiment object.
  3. Sets up an output directory for the files generated by this experiment. If a shared filesystem is not present, this can be done via FTP or another file-transfer mechanism supported by a plugin that provides a subclass of FileTransferSystem.
  4. Starts a DRMAA session and submits a job for each experiment case, using an archived RunSetup object for each job’s standard input.
  5. Not yet implemented: monitors the submitted jobs and reports status changes to the user.
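
Step 2 above is essentially a Cartesian product over the swept parameter values. Here is a minimal Python sketch of that idea, assuming a sweep is nothing more than a mapping from parameter name to candidate values. It is not GridSweeper's actual code: the function name expand_cases and the parameters beta and gamma are hypothetical, and the real Experiment and ExperimentCase classes may carry more information (run counts, seeds, and so on).

```python
from itertools import product


def expand_cases(sweep):
    """Expand {name: [values]} into one dict per parameter combination.

    Hypothetical stand-in for turning an Experiment into a list of
    ExperimentCase objects.
    """
    names = sorted(sweep)  # fixed order so case numbering is reproducible
    return [dict(zip(names, values))
            for values in product(*(sweep[name] for name in names))]


# Illustrative sweep: two values of beta crossed with three values of gamma.
cases = expand_cases({"beta": [0.1, 0.2], "gamma": [1, 2, 3]})
print(len(cases))  # 2 * 3 combinations
```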

As currently designed, GridSweeper requires support on both the submission end and the execution end of the grid. The DRMAA job specification tells the execution host to run not the model itself but the GridSweeperRunner program, which takes the input data and uses it to run the model. Specifically, it does the following:

  1. Unarchives the RunSetup object from standard input.
  2. If necessary, downloads input files via the file transfer mechanism.
  3. Runs the model using an instance of the Adapter class specified by the user (explicitly, or implicitly by using, e.g., gdrone for the Drone compatibility adapter). The Adapter object knows how to take a set of parameters and send it to a particular type of model executable.
  4. If necessary, uploads output files via the file transfer mechanism.
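
To make the Adapter's role in step 3 concrete, here is a rough Python sketch of an adapter that translates a parameter set into a command line. Everything in it is an assumption for illustration: the class name CommandLineAdapter, the --name=value argument style, and the ./mymodel executable are hypothetical, and the real Adapter class and the Drone adapter may pass parameters quite differently.

```python
class CommandLineAdapter:
    """Hypothetical adapter: passes each parameter as a --name=value flag."""

    def __init__(self, executable):
        self.executable = executable

    def build_command(self, params):
        # Sort by name so the same parameters always give the same command.
        flags = [f"--{name}={value}" for name, value in sorted(params.items())]
        return [self.executable] + flags


adapter = CommandLineAdapter("./mymodel")
command = adapter.build_command({"gamma": 2, "beta": 0.1})
print(command)
```

The resulting list is in the form accepted by subprocess.run, so a runner built this way would only need to hand it over and collect the output files.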

One problem with this mechanism is that it submits a separate job for every experiment case, bypassing DRMAA’s notion of batch jobs (called bulk jobs in the DRMAA specification). A DRMAA batch job submits many jobs at once by declaring that they are identical except for an integer index, and that index can be used as a variable in command-line arguments. Because some systems may accept one batch job faster than a pile of individual jobs, it might be worth using the batch job mechanism.

One way to do this would be to defer the calculation of parameter assignments and random seeds to the execution host, but that makes it impossible to generate a file for reproducing the experiment as soon as it is submitted. A better way is to generate a series of input files in the experiment directory, named with the batch run index, and have the GridSweeperRunner tool read those files at runtime rather than reading an object from standard input.
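
The second approach might look roughly like the Python sketch below. It assumes JSON files named case.<index>.json and a seed derived from the index purely to keep the example deterministic; the file naming, the JSON layout, and the function names write_case_files and load_case are all hypothetical. The point is only that parameters and seeds are fixed at submission time, so the experiment directory alone is enough to reproduce the run.

```python
import json
import tempfile
from pathlib import Path


def write_case_files(experiment_dir, cases, base_seed=0):
    """Submission side: write one input file per case, indexed from 1."""
    experiment_dir = Path(experiment_dir)
    experiment_dir.mkdir(parents=True, exist_ok=True)
    for index, params in enumerate(cases, start=1):
        # The seed is chosen here, not on the execution host (assumed scheme).
        setup = {"params": params, "seed": base_seed + index}
        (experiment_dir / f"case.{index}.json").write_text(json.dumps(setup))
    return len(cases)


def load_case(experiment_dir, index):
    """Execution side: read the setup matching this job's batch index."""
    path = Path(experiment_dir) / f"case.{index}.json"
    return json.loads(path.read_text())


# Round trip in a throwaway directory standing in for the experiment directory.
with tempfile.TemporaryDirectory() as tmp:
    count = write_case_files(tmp, [{"beta": 0.1}, {"beta": 0.2}], base_seed=100)
    second = load_case(tmp, 2)
print(count, second)
```

On the execution host, the index passed to load_case would be the integer specifier that DRMAA substitutes into the job's command-line arguments.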
