In the Chem building, about to leave. I discovered it’s easy to blog photos from my phone. Fun!
Photo blogging!
January 16th, 2009
Blog the inaug
January 16th, 2009
This seems as good a time as any to revive a never really functional blog: the historic inauguration of Barack Obama.
I’m heading out to DC later today along with three other grad students willing to sacrifice a few days for this historic occasion. Somehow, I tell myself, I’ll get some work done on the road.
The last time I was a new student at the University of Michigan, we were about to elect Al Gore, er, George W. Bush. I remember my dad telling me at the time, “The republic will survive!” Technically, I guess he was right, but there have definitely been moments over the last few years when I feared the coming of the new American Empire (I mean, officially), the dissolution of the Senate, George Bush’s appointment of his dog to consul, and the inevitable decline and fall.
Things aren’t really so bad as we head into Barack Obama’s first term. After all, many people do have jobs. Some are safely ensconced in graduate school. Especially lucky are the people who didn’t have jobs before the recession, so they didn’t have any to lose! (Those people usually have the additional advantage of not owning any stocks.) Furthermore, we’re only involved in two major conflicts abroad, and everyone knows that two is a very small number.
The best news, of course, is that Barack Obama owns a magic wand.
Seriously, though: I think it’s a time to be hopeful, but also an opportunity to hold leaders accountable to the change they promised. Barack Obama ran on a platform of change—and, contrary to the beliefs of many, he in fact did enumerate a number of important specific changes to make.
One promised change is the elimination of lobbyist money from the political process. Lawrence Lessig’s Change Congress project aims to do just that, and aims to do it via the very effective tool of public shame. The organization did not make much of a splash when it first appeared, but momentum seems to be growing, and I encourage all three of my readers to sign up with their ongoing donor strike (see the website for more info), especially if you donated to any congresspeople last year. (I signed up for the strike, effectively withholding $0 in donations this year!)
Due to the threatened demise of the auto industry, I also think this is the perfect time to implement really ambitious fuel economy standards. More on that next time.
XgridLite on Leopard, continued…
October 12th, 2008
As it turns out, the privilege failure mentioned in the last post was not being reported by the calls to xgridctl that start and stop the controller, but rather by the ones that simply query the current status. In Leopard, you always need to run xgridctl as root, even just to get status.
However, now that I’ve recoded the status check to go through the privileged tool, it reports that the controller is off even when running xgridctl c status at the command line reports that it’s running. One more roadblock, but this should be the last one. Given how my weeks go, I probably won’t get to it until next weekend (a long one, fall break).
XgridLite on Leopard: almost there, hopefully
October 11th, 2008
To those who have been requesting that XgridLite start working on Leopard: first of all, sorry it’s taken so long for me to think about it. I didn’t even upgrade myself to Leopard until the middle of this year, and since then I’ve been pretty busy working, touring with my band, and finally starting a new life as an ecology grad student. The good news is that being a grad student means I’ll be running simulations again soon, so I’m selfishly motivated to have a working XgridLite on my machine.
I finally popped open the code after a long hiatus and tried it out. I was hoping the problem would be something simple, like a change in the controller argument syntax, but I got this message in the console log:
ERROR: you must be running with root privileges
This is the same message you get when you run xgridctl c start as a regular user from the command line. So the mechanism I used to run as root under Tiger is clearly not working under Leopard. Perhaps this is related to the new security mechanisms (which I know nothing about) that Apple added.
I’ll try to tackle this tomorrow.
Google Summer of Code
July 13th, 2007
I was interviewed at Google for a Summer of Code Podcast last week.
File transfer overview
July 5th, 2007
GridSweeper’s file transfer mechanism is designed to allow transfer of input and output files in a grid system-independent way. In environments without a real IT infrastructure, such as my own ad-hoc Xgrid setup, a shared filesystem will not necessarily be available, so you need a way to stage and retrieve files.
The GridSweeper code is file transfer system-agnostic, providing a simple interface that can be implemented for a particular file transfer system (e.g., FTP, which is included as a working example). The interface requires just a few basic methods: connect(), disconnect(), uploadFile(), deleteFile(), makeDirectory(), removeDirectory(), list(), and isDirectory(), which all do pretty much what you’d expect. There is no notion of a working directory, so all paths are relative to the implicit root of the file system. Particular file transfer systems can define custom properties to affect setup; e.g., the FTP system provides properties for setting the hostname, username/password, root directory (so the GridSweeper root need not be the same as the FTP server’s default working directory), etc.
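To make the shape of that interface concrete, here is a minimal sketch. Only the method names come from the description above; the parameter lists, return types, exceptions, and both type names (FileTransferSystemSketch, InMemoryTransferSketch) are guesses made for illustration, and the in-memory implementation is a toy stand-in, not the real FTP one.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical signatures: only the method names appear in the post.
interface FileTransferSystemSketch {
    void connect() throws IOException;
    void disconnect() throws IOException;
    void uploadFile(String localPath, String remotePath) throws IOException;
    void deleteFile(String remotePath) throws IOException;
    void makeDirectory(String remotePath) throws IOException;
    void removeDirectory(String remotePath) throws IOException;
    List<String> list(String remotePath) throws IOException;
    boolean isDirectory(String remotePath) throws IOException;
}

// Toy implementation that tracks paths in memory. All paths are relative
// to an implicit root (the empty string); there is no working directory.
class InMemoryTransferSketch implements FileTransferSystemSketch {
    // Maps each known path to true (directory) or false (file).
    private final Map<String, Boolean> entries = new TreeMap<>();
    private boolean connected = false;

    public void connect() { connected = true; }
    public void disconnect() { connected = false; }

    public void uploadFile(String localPath, String remotePath) {
        requireConnected();
        entries.put(remotePath, false); // file contents are ignored here
    }

    public void deleteFile(String remotePath) {
        requireConnected();
        entries.remove(remotePath);
    }

    public void makeDirectory(String remotePath) {
        requireConnected();
        entries.put(remotePath, true);
    }

    public void removeDirectory(String remotePath) {
        requireConnected();
        entries.remove(remotePath);
    }

    // Returns the names of the direct children of remotePath.
    public List<String> list(String remotePath) {
        requireConnected();
        String prefix = remotePath.isEmpty() ? "" : remotePath + "/";
        List<String> names = new ArrayList<>();
        for (String path : entries.keySet()) {
            if (path.startsWith(prefix)) {
                String rest = path.substring(prefix.length());
                if (!rest.isEmpty() && !rest.contains("/")) {
                    names.add(rest);
                }
            }
        }
        return names;
    }

    public boolean isDirectory(String remotePath) {
        requireConnected();
        return Boolean.TRUE.equals(entries.get(remotePath));
    }

    private void requireConnected() {
        if (!connected) {
            throw new IllegalStateException("not connected");
        }
    }

    public static void main(String[] args) {
        InMemoryTransferSketch fs = new InMemoryTransferSketch();
        fs.connect();
        fs.makeDirectory("exp");
        fs.uploadFile("/tmp/in.txt", "exp/in.txt");
        System.out.println(fs.list("exp"));
        fs.disconnect();
    }
}
```

A real implementation would map each call onto the underlying protocol and use the system-specific properties (hostname, credentials, root directory) during connect().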
Here’s how a GridSweeper run interacts with the file system:
- The experiment setup data includes a list of input files, mapping (absolute) paths on the local filesystem for the submit host to relative paths in the working directory of the running job. When the job is submitted, those files are copied into an input-file directory on the file transfer system, within the location experimentName/submissionDate/submissionTime/input/.
- The experiment setup data also includes a list of output file paths, relative to the working directory of the running job. This list is part of the input data for a running GridSweeper job.
- The GridSweeperRunner tool, which is the process actually executed by the grid system, begins the process by transferring any files in the input directory on the file transfer system into the working directory as specified. After the run is complete, it copies the specified output files back to the file transfer system to the location experimentName/submissionDate/submissionTime/caseDir/filename, where caseDir is a directory name representing the particular parameter settings for the run (“b=0.1-g=25”). If the filename includes the wildcard $gs_rn_ph$, that will be replaced by the current run number. If it does not, the run number will be appended as an extension (filename.runNumber).
- When each run is complete, the submit host, which is monitoring the activity, retrieves files back to the local experiments directory. If the submit host stopped monitoring, there should be a way to go back and retrieve files not yet retrieved; I haven’t designed this mechanism yet.
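The output naming rule in the list above can be sketched in a few lines. The class and method names are hypothetical, and the date and time strings in the demo are made-up formats, since the post does not say how submissionDate and submissionTime are rendered.

```java
// A sketch of the output-path rule; not the actual GridSweeper code.
final class OutputPathSketch {
    // The wildcard named in the post.
    static final String RUN_NUMBER_PLACEHOLDER = "$gs_rn_ph$";

    // Substitutes the run number for the wildcard if present;
    // otherwise appends it as an extension (filename.runNumber).
    static String outputFileName(String filename, int runNumber) {
        if (filename.contains(RUN_NUMBER_PLACEHOLDER)) {
            // String.replace treats its arguments literally, so "$" is safe.
            return filename.replace(RUN_NUMBER_PLACEHOLDER,
                    Integer.toString(runNumber));
        }
        return filename + "." + runNumber;
    }

    // Builds experimentName/submissionDate/submissionTime/caseDir/filename.
    static String outputPath(String experimentName, String submissionDate,
                             String submissionTime, String caseDir,
                             String filename, int runNumber) {
        return String.join("/", experimentName, submissionDate,
                submissionTime, caseDir, outputFileName(filename, runNumber));
    }

    public static void main(String[] args) {
        // Date and time formats below are assumptions for the demo.
        System.out.println(outputPath("sir", "2007-07-05", "16-30-00",
                "b=0.1-g=25", "out.txt", 3));
        System.out.println(outputPath("sir", "2007-07-05", "16-30-00",
                "b=0.1-g=25", "out-$gs_rn_ph$.txt", 3));
    }
}
```

With these inputs the first call yields a name ending in out.txt.3 and the second a name ending in out-3.txt.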
Note: as of 4:30 PM, July 5, 2007, this is not all implemented correctly.
Thought: it’s possible, though unlikely, that file transfer system collisions may occur from multiple people submitting identically-named experiments at the same time. I can imagine a lab class with people following the same tutorial instructions all submitting identically-named jobs at the same time. So maybe it’s better to name these directories with unique hashes. Assuming no collisions, though, it doesn’t matter from the user perspective, so this can be changed later; the current naming scheme is nice for debugging.
If you have a shared filesystem, of course, none of this is necessary!
Preliminary Javadoc completed
June 25th, 2007
Between driving from San Francisco and selling furniture on Craigslist, this weekend I wrote preliminary Javadoc for all of last summer’s GridSweeper work. A very valuable exercise before diving into coding: it made me look through every single method I wrote and say something about it. It also brought a number of design flaws to my attention, duly noted in TODO comments.
High-level GridSweeper execution overview
June 21st, 2007
The purpose of GridSweeper is to take a simple user-provided description of what parameter settings to run a model with, run the model on a grid, and return results to the user.
The user will be able to manipulate the parameter-sweep description in three ways: (1) using an XML specification file, (2) with command-line arguments, and (3) with a graphical user interface. These three mechanisms can be mixed: command-line arguments can augment or override XML as well as be saved back out to XML, and the GUI tool will serve to edit and save XML files as well.
Ultimately, user action will result in running the GridSweeper program, which turns parameter sweep specifications into job specifications for the grid system via DRMAA. Specifically, the program does the following:
- Parses the XML specification and command-line arguments to generate an Experiment object.
- Generates a list of ExperimentCase objects (parameter value settings) from the Experiment class.
- Sets up an output directory for the files generated by this experiment. If a shared filesystem is not present, this can be done via FTP or other file-transfer mechanism supported by a plugin implementing a subclass of FileTransferSystem.
- Starts a DRMAA session and submits a job for each experiment case, using an archived RunSetup object for each job’s standard input.
- Still unimplemented: monitors the results of jobs and reports status changes to the user.
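As an illustration of the case-generation step, here is a sketch that expands a sweep into individual parameter settings. It assumes a full-factorial sweep (every combination of every listed value), which the post does not actually state; the class and method names are hypothetical, and plain maps stand in for ExperimentCase. The directory-name format follows the “b=0.1-g=25” example from the file transfer overview.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// A sketch under the assumption of a full-factorial sweep.
final class CaseSweepSketch {
    // Expands each parameter's value list into every combination,
    // preserving the order in which parameters were declared.
    static List<Map<String, String>> expand(Map<String, List<String>> sweep) {
        List<Map<String, String>> cases = new ArrayList<>();
        cases.add(new LinkedHashMap<>()); // start with one empty case
        for (Map.Entry<String, List<String>> param : sweep.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : cases) {
                for (String value : param.getValue()) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(param.getKey(), value);
                    next.add(extended);
                }
            }
            cases = next;
        }
        return cases;
    }

    // Builds a directory name such as b=0.1-g=25 from one case.
    static String caseDirName(Map<String, String> settings) {
        StringJoiner joiner = new StringJoiner("-");
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> sweep = new LinkedHashMap<>();
        sweep.put("b", Arrays.asList("0.1", "0.2"));
        sweep.put("g", Arrays.asList("25"));
        for (Map<String, String> settings : expand(sweep)) {
            System.out.println(caseDirName(settings));
        }
    }
}
```

Each resulting case would then become one submitted job in the scheme described above.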
The way things are set up now, GridSweeper requires support on both the submission end and the execution end of the grid. The DRMAA job specification specifies that the execution host run not the model itself, but the GridSweeperRunner program, which takes input data and uses that to actually run the model. Specifically, it does the following:
- Unarchives the RunSetup object from standard input.
- If necessary, downloads input files via the file transfer mechanism.
- Actually runs the model using an instance of the Adapter class specified by the user (explicitly, or implicitly by using, e.g., gdrone for the Drone compatibility adapter). The Adapter object knows how to take a set of parameters and send it to a particular type of model executable.
- If necessary, uploads output files via the file transfer mechanism.
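The hand-off through standard input might look something like the following. This sketch assumes that “archived” means standard Java object serialization, which is a guess; RunSetupSketch and its fields are hypothetical stand-ins for the real RunSetup class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;

// Hypothetical stand-in for RunSetup; the fields are guesses.
final class RunSetupSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    final String adapterClassName;
    final LinkedHashMap<String, String> parameters;
    final int runNumber;

    RunSetupSketch(String adapterClassName,
                   LinkedHashMap<String, String> parameters, int runNumber) {
        this.adapterClassName = adapterClassName;
        this.parameters = parameters;
        this.runNumber = runNumber;
    }
}

final class RunSetupArchiveSketch {
    // Submit side: turn the setup into bytes to feed to the job's stdin.
    static byte[] archive(RunSetupSketch setup) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(setup);
        }
        return bytes.toByteArray();
    }

    // Execution side: the runner would pass System.in here.
    static RunSetupSketch unarchive(InputStream in)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream objects = new ObjectInputStream(in)) {
            return (RunSetupSketch) objects.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LinkedHashMap<String, String> params = new LinkedHashMap<>();
        params.put("b", "0.1");
        params.put("g", "25");
        RunSetupSketch original = new RunSetupSketch(
                "com.edbaskerville.gridsweeper.DroneAdapter", params, 1);
        // Simulate stdin with an in-memory stream for the round trip.
        RunSetupSketch copy =
                unarchive(new ByteArrayInputStream(archive(original)));
        System.out.println(copy.adapterClassName + " " + copy.parameters);
    }
}
```

One consequence of this approach is that the submit host and the execution host need compatible versions of the serialized class on their classpaths.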
One problem with this mechanism is that it submits a separate job for every experiment case, bypassing DRMAA’s notion of batch jobs. DRMAA batch jobs let you submit a whole bunch of jobs at the same time by specifying that each job is the same except for an integer specifier, and that specifier can be used as a variable in command-line arguments. Because some systems may be faster at accepting batch jobs than a pile of individual jobs, it might be worth using the batch job mechanism.
One way to do this would be to defer the calculation of parameter assignments and random seeds to the execution host, but that makes it impossible to generate a file for reproducing the experiment as soon as it is submitted. A better way is to generate a series of input files in the experiment directory, named with the batch run index, and have the GridSweeperRunner tool read those files at runtime rather than reading an object from standard input.
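The second approach could be sketched roughly as follows: the submit host writes one settings file per case, named by batch index, and the runner loads the file matching the index it was given. The file naming (case.N.properties), the use of java.util.Properties as the file format, and the choice to start indices at 1 are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// A sketch of index-named input files; not the actual GridSweeper design.
final class BatchInputFilesSketch {
    // Hypothetical naming convention: case.<batchIndex>.properties
    static Path inputFileFor(Path experimentDir, int batchIndex) {
        return experimentDir.resolve("case." + batchIndex + ".properties");
    }

    // Submit side: write one file per case, with indices starting at 1.
    // Because the files exist at submission time, the experiment can be
    // reproduced from them later.
    static void writeCaseFiles(Path experimentDir, List<Properties> cases)
            throws IOException {
        for (int i = 0; i < cases.size(); i++) {
            int batchIndex = i + 1;
            Path file = inputFileFor(experimentDir, batchIndex);
            try (Writer out = Files.newBufferedWriter(file,
                    StandardCharsets.UTF_8)) {
                cases.get(i).store(out, "GridSweeper case " + batchIndex);
            }
        }
    }

    // Execution side: load the settings for the index this job received.
    static Properties readCaseFile(Path experimentDir, int batchIndex)
            throws IOException {
        Properties settings = new Properties();
        Path file = inputFileFor(experimentDir, batchIndex);
        try (Reader in = Files.newBufferedReader(file,
                StandardCharsets.UTF_8)) {
            settings.load(in);
        }
        return settings;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gridsweeper-sketch");
        Properties first = new Properties();
        first.setProperty("b", "0.1");
        first.setProperty("seed", "12345"); // seed fixed at submission time
        Properties second = new Properties();
        second.setProperty("b", "0.2");
        second.setProperty("seed", "67890");
        writeCaseFiles(dir, Arrays.asList(first, second));
        System.out.println(readCaseFile(dir, 2).getProperty("b"));
    }
}
```

In a real batch submission the index would arrive as a command-line argument filled in by the grid system, and the experiment directory would have to be reachable from the execution host, either through a shared filesystem or the file transfer mechanism.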
GridSweeper installation hierarchy
June 21st, 2007
As currently conceived, GridSweeper will consist of a set of Java classes in JAR files, additional Java classes as plugins (plugin format to be determined, but will include a JAR file), and shell scripts to simplify this:
java -cp ${GRIDSWEEPER_ROOT}/classes/GridSweeper.jar \
    com.edbaskerville.gridsweeper.GridSweeper [args]
into this:
gsweep [args]
The top level of the hierarchy will be designated by the environment variable $GRIDSWEEPER_ROOT, within which the following tree will exist:
$GRIDSWEEPER_ROOT/
    bin/
        gsweep (main GridSweeper submission executable)
        gdrone (shortcut to gsweep -a com.edbaskerville.gridsweeper.DroneAdapter)
        grunner (wrapper to actually execute jobs on the agent machine)
        ... (other scripts to shortcut, e.g., the Repast adapter)
    classes/
        classes.jar (all classes except those with main methods)
        GridSweeper.jar (app/tool class)
        GridSweeperRunner.jar (class to actually run simulations on agents)
    plugins/ (contains add-on adapters and file-transfer systems)
Setting up the GridSweeper build environment
June 21st, 2007
First things first: this post covers how to get the GridSweeper build environment set up on your machine. I’ve developed GridSweeper entirely with Eclipse, but the build process uses Ant, so it can be run from the command line as well (or, theoretically, any other Ant-compatible IDE).
To get GridSweeper building on your machine, you’ll need to get four things:
- The code distribution (trunk), checked out into your Eclipse workspace. Soon to be hosted at CSCS.
- An implementation of the Java Distributed Resource Management Application API (DRMAA). For CSCS/Linux, you should use the one provided by the Sun Grid Engine (in /appl/sge/drmaa.jar on CSCS machines). For building on my Mac, I’m using my XgridDRMAA implementation.
- Jakarta Commons Net (download page). This is for FTP file transfer, which won’t actually be relevant for CSCS; maybe I can modify the build system to make this optional.
- Jakarta ORO (download page), also for FTP. You won’t even realize you’re missing this until you get an obscure class not found error at runtime when using any of the FTP directory methods.
If you’re using Eclipse (recommended), open up the project in your workspace. Add the DRMAA and Jakarta Commons Net jar files (Project > Properties > Java Build Path > Add External JARs…), and, in theory, the project should build.
Next: evaluating the code.
