GridSweeper’s file transfer mechanism is designed to allow transfer of input and output files in a grid system-independent way. In environments without a real IT infrastructure, such as my own ad-hoc Xgrid setup, a shared filesystem will not necessarily be available, so you need a way to stage and retrieve files.
The GridSweeper code is agnostic about the file transfer system: it provides a simple interface that can be implemented for a particular system (e.g., FTP, which is included as a working example). The interface requires just a few basic methods: connect(), disconnect(), uploadFile(), deleteFile(), makeDirectory(), removeDirectory(), list(), and isDirectory(), which all do pretty much what you’d expect. There is no notion of a working directory, so all paths are relative to the implicit root of the file system. Individual file transfer systems can define custom properties to control setup; e.g., the FTP system provides properties for setting the hostname, username/password, and root directory (so the GridSweeper root need not be the same as the FTP server’s default working directory).
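The interface described above might look something like the following sketch. It is written in Python for illustration rather than taken from GridSweeper’s own code. The snake_case method names map onto the ones listed (list() becomes list_dir() to avoid shadowing Python’s built-in). The parameter lists, the exception types, and the in-memory implementation are all assumptions made for this example.

```python
import posixpath
from abc import ABC, abstractmethod


class FileTransferSystem(ABC):
    """Transfer interface sketch; every path is relative to an implicit root."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def disconnect(self) -> None: ...

    @abstractmethod
    def upload_file(self, local_path: str, remote_path: str) -> None: ...

    @abstractmethod
    def delete_file(self, remote_path: str) -> None: ...

    @abstractmethod
    def make_directory(self, remote_path: str) -> None: ...

    @abstractmethod
    def remove_directory(self, remote_path: str) -> None: ...

    @abstractmethod
    def list_dir(self, remote_path: str) -> list[str]: ...

    @abstractmethod
    def is_directory(self, remote_path: str) -> bool: ...


class InMemoryTransferSystem(FileTransferSystem):
    """Hypothetical toy implementation that keeps 'remote' files in a dict."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.dirs: set[str] = {""}  # "" stands for the implicit root
        self.connected = False

    @staticmethod
    def _norm(path: str) -> str:
        # Normalize to a root-relative POSIX path; "" means the root.
        p = posixpath.normpath(path.strip("/"))
        return "" if p == "." else p

    def _require_connection(self) -> None:
        if not self.connected:
            raise RuntimeError("not connected")

    def _require_parent(self, path: str) -> None:
        if posixpath.dirname(path) not in self.dirs:
            raise FileNotFoundError(f"no parent directory for {path!r}")

    def connect(self) -> None:
        self.connected = True

    def disconnect(self) -> None:
        self.connected = False

    def upload_file(self, local_path: str, remote_path: str) -> None:
        self._require_connection()
        path = self._norm(remote_path)
        self._require_parent(path)
        with open(local_path, "rb") as f:
            self.files[path] = f.read()

    def delete_file(self, remote_path: str) -> None:
        self._require_connection()
        del self.files[self._norm(remote_path)]  # KeyError if absent

    def make_directory(self, remote_path: str) -> None:
        self._require_connection()
        path = self._norm(remote_path)
        self._require_parent(path)
        self.dirs.add(path)

    def remove_directory(self, remote_path: str) -> None:
        self._require_connection()
        path = self._norm(remote_path)
        if self.list_dir(path):
            raise OSError(f"directory not empty: {path!r}")
        self.dirs.discard(path)

    def list_dir(self, remote_path: str) -> list[str]:
        self._require_connection()
        path = self._norm(remote_path)
        # Direct children are entries whose parent is exactly `path`.
        children = [p for p in self.files.keys() | self.dirs
                    if p and posixpath.dirname(p) == path]
        return sorted(posixpath.basename(p) for p in children)

    def is_directory(self, remote_path: str) -> bool:
        self._require_connection()
        return self._norm(remote_path) in self.dirs
```

A toy implementation like this can be handy for exercising staging logic without a real server; an FTP-backed class could implement the same methods and read its hostname, credentials, and root directory from its custom properties.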
Here’s how a GridSweeper run interacts with the file system:
- The experiment setup data includes a list of input files, mapping (absolute) paths on the submit host’s local filesystem to relative paths in the working directory of the running job. When the job is submitted, those files are copied into an input-file directory on the file transfer system, at experimentName/submissionDate/submissionTime/input/.
- The experiment setup data also includes a list of output file paths, relative to the working directory of the running job. This list is part of the input data for a running GridSweeper job.
- The GridSweeperRunner tool, which is the process actually executed by the grid system, begins by transferring any files in the input directory on the file transfer system into the working directory as specified. After the run is complete, it copies the specified output files back to the file transfer system, at experimentName/submissionDate/submissionTime/caseDir/filename, where caseDir is a directory name representing the particular parameter settings for the run (e.g., “b=0.1-g=25”). If the filename includes the wildcard $gs_rn_ph$, it is replaced by the current run number; if it does not, the run number is appended as an extension (filename.runNumber).
- When each run is complete, the submit host, which monitors the activity, retrieves the files back to the local experiments directory. If the submit host stops monitoring, there should be a way to go back and retrieve files not yet retrieved; I haven’t designed this mechanism yet.
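The naming rules for staged and retrieved files can be sketched as a few path-building helpers. This is a hedged Python illustration, not GridSweeper’s implementation: the helper names are hypothetical, the date and time strings are passed in pre-formatted because the text doesn’t specify their format, and sorting parameters by name when building caseDir is an assumption that happens to reproduce the “b=0.1-g=25” example.

```python
import posixpath

# Run-number wildcard as written in the text.
RUN_NUMBER_WILDCARD = "$gs_rn_ph$"


def case_dir_name(params: dict[str, object]) -> str:
    """Build a case directory name such as 'b=0.1-g=25'.

    Assumption: parameters are sorted by name and joined with '-'.
    """
    return "-".join(f"{name}={value}" for name, value in sorted(params.items()))


def input_dir(experiment: str, date: str, time: str) -> str:
    """experimentName/submissionDate/submissionTime/input/"""
    return posixpath.join(experiment, date, time, "input") + "/"


def output_file_name(filename: str, run_number: int) -> str:
    """Substitute the wildcard, or else append the run number as an extension."""
    if RUN_NUMBER_WILDCARD in filename:
        return filename.replace(RUN_NUMBER_WILDCARD, str(run_number))
    return f"{filename}.{run_number}"


def output_path(experiment: str, date: str, time: str,
                params: dict[str, object], filename: str,
                run_number: int) -> str:
    """experimentName/submissionDate/submissionTime/caseDir/filename"""
    return posixpath.join(experiment, date, time, case_dir_name(params),
                          output_file_name(filename, run_number))
```

For example, output_path("exp", "2007-07-05", "16-30-00", {"g": 25, "b": 0.1}, "out.txt", 3) yields exp/2007-07-05/16-30-00/b=0.1-g=25/out.txt.3 (the date and time strings here are made up).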
Note: as of 4:30 PM, July 5, 2007, this is not all implemented correctly.
Thought: it’s possible, though unlikely, that collisions on the file transfer system could occur if multiple people submit identically-named experiments at the same time. I can imagine a lab class in which everyone follows the same tutorial instructions and submits identically-named jobs simultaneously. So maybe it’s better to name these directories with unique hashes. As long as there are no collisions, though, the naming doesn’t matter from the user’s perspective, so this can be changed later; the current naming scheme is nice for debugging.
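One hypothetical way to get unique directory names while keeping the readable prefix for debugging is to append a short hash. In this sketch the random nonce is what actually prevents collisions; hashing it together with the experiment name and timestamp only produces a fixed-length suffix. The function name and the 12-character digest length are arbitrary choices for illustration.

```python
import hashlib
import secrets


def unique_submission_dir(experiment: str, date: str, clock: str) -> str:
    """Hypothetical collision-resistant variant of experiment/date/time.

    A random nonce makes two identical submissions produce different
    names; the readable prefix is kept for debugging.
    """
    nonce = secrets.token_hex(8)
    material = "|".join([experiment, date, clock, nonce])
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()[:12]
    return f"{experiment}/{date}/{clock}-{digest}"
```

Two students submitting "tutorial" at the same second would then get distinct directories such as tutorial/2007-07-05/16-30-00-<digest>, with the digest differing between them.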
If you have a shared filesystem, of course, none of this is necessary!