Here goes: I’m trying to install the open-source Grid Engine 6.0u8 on Tiger. It would be nice if there were a Mac OS X installer package. If I have extra time (ha), maybe I’ll put one together.
I can already see that Xgrid is an infinitely simpler system. Apple wins on ease-of-use already—just based on the instructions in the Plan the Installation section of the Grid Engine manual.
SGE, on the other hand, looks way more powerful. Sophisticated scheduling, intelligent matching of available resources to job needs, etc., etc. I like.
For my own personal use, Xgrid looks great. But I’m going to slog through SGE anyway, because I think I’d better get some hands-on experience with the reference implementation of DRMAA before writing my own new implementation.
First, some preliminary notes on how the Grid Engine works…
Definitions
- master host
- Runs master daemon and scheduler daemon—basically, controls the system. Equivalent to the Xgrid controller. By default, also an administration host and submit host.
- shadow master host
- A system that can detect a failure of master and take over. Despite my mission-critical enterprise-grade infrastructure, I won’t bother dealing with these.
- execution host
- Systems that execute jobs. Equivalent to an Xgrid agent.
- administration host
- Systems that carry out any “administrative activity.” I guess this means editing jobs, adjusting controller settings, etc.?
- submit host
- Systems that allow users to submit batch jobs. Like an Xgrid client.
- queue
- Container for jobs that can run on one or more hosts concurrently. Sort of a sub-grid. Can include any subset of hosts on the system.
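To keep the roles straight, here is a minimal Python sketch that treats the host roles above as combinable flags. The names `Role` and `default_roles` are made up for illustration and are not part of Grid Engine. The only fact it encodes is the one stated above: a master host is, by default, also an administration host and a submit host.

```python
from enum import Flag, auto


class Role(Flag):
    """Host roles from the definitions above (names are illustrative only)."""
    MASTER = auto()
    SHADOW_MASTER = auto()
    EXECUTION = auto()
    ADMINISTRATION = auto()
    SUBMIT = auto()


def default_roles(roles: Role) -> Role:
    """Apply the documented default: a master host is also an
    administration host and a submit host."""
    if Role.MASTER in roles:
        roles |= Role.ADMINISTRATION | Role.SUBMIT
    return roles


# A master-only host picks up the submit role by default.
print(Role.SUBMIT in default_roles(Role.MASTER))  # True
```

A flag enum fits because one machine can wear several hats at once, which is exactly what the definitions describe.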
Daemons
- sge_qmaster
- The master daemon. Handles all controller activity except scheduling decisions.
- sge_schedd
- The scheduling daemon: decides where to send jobs and how to order and prioritize them.
- sge_execd
- Execution daemon: actually runs jobs. Runs on execution hosts.
With this background, I can actually start thinking about how the hell to set up my own system! Here are the decisions I made for my giant 2-host grid:
Decisions
- Single cluster My system will be a single cluster, rather than a collection of sub-clusters. My system consists, at last count, of my personal machines: a G5 and a four-year-old PowerBook. I’ll try to convince my roommates to let me use their machines too. At least they’re all connected via InfiniBand! Ha, just kidding.
- Hosts The G5 will be everything: master, administration, submit, and execution. The PowerBook will be everything except a master.
- Users “Ensure that all users of the grid engine system have the same user names on all submit and execution hosts.” This isn’t a decision! It’s an order!
- Software Directories I guess I’ll put a full directory tree on both machines so I don’t have to think about what to install and what not to install.
- Queue Structure One grid, one cluster, one queue; will include all (2) execution hosts. Easy peasy.
- Network Services I have no idea what an NIS file is (Solaris thing?), so I guess that means I’ll set things up as “local to each workstation in /etc/services”.
- Gathering Information Another command: “Use the information in this chapter to gather the information necessary to complete the installation worksheet.” Decisions my ass.
I guess I’ll fill out their silly little worksheet. It looks like it might be useful…
Necessary Information
| Parameter | Value |
| --- | --- |
| sge-root directory | /usr/local/gridengine |
| cell name | George W. Bush! My hero! Er, no, I’ll call it Al Gore. |
| administrative user | ebaskerv (c’est moi) |
| sge_qmaster port number | Uh…we’ll see what they use in the default file. |
| sge_execd port number | Ditto. |
| master host | astor.local., G5 of my heart |
| shadow master hosts | Nada. |
| execution hosts | astor.local., darwin.local. |
| administration hosts | astor.local., darwin.local. |
| submit hosts | astor.local., darwin.local. |
| group ID range for jobs | I have no freaking clue. With one grid, probably doesn’t matter. |
| spooling mechanism | Classic spooling sounds easier than messing with Berkeley DB. |
| Berkeley DB server host | NA |
| Berkeley DB spooling directory | NA |
| scheduler tuning profile | “Normal” sounds good to me. |
| installation method | automated? |
| If you are going to install N1GE 6 on a Windows system, acquire and install Microsoft Services for UNIX. See Appendix A for more information. | What is this Windows you speak of? |
| If you are going to install N1GE 6 on a Windows system, create the required CSP certificates before installing N1GE. See the section called “How to Install a CSP-secured System” in Chapter 4 for information about CSP certificates. | I see, it must be an operating system for people who want things to be even more complicated. |
| Check the Other Installations Appendix for applicability. | Aigoo! |
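For the curious, the filled-in rows of the worksheet can be written down as data and sanity-checked. This is only a sketch: the `Worksheet` class, its field names, and the `problems` helper are hypothetical and have nothing to do with the installer's own file formats. The checks only encode things stated earlier in this post, such as the master host also being an administration and submit host by default.

```python
from dataclasses import dataclass, field


@dataclass
class Worksheet:
    """Hypothetical container for the worksheet rows; field names are invented."""
    sge_root: str
    cell_name: str
    admin_user: str
    master_host: str
    execution_hosts: list
    administration_hosts: list
    submit_hosts: list
    shadow_master_hosts: list = field(default_factory=list)
    spooling: str = "classic"


def problems(ws: Worksheet) -> list:
    """Return human-readable inconsistencies; an empty list means none found."""
    found = []
    if not ws.execution_hosts:
        found.append("no execution hosts: nothing could run jobs")
    # The master is by default also an administration and submit host,
    # so flag a worksheet that leaves it out of either list.
    if ws.master_host not in ws.administration_hosts:
        found.append("master host missing from administration hosts")
    if ws.master_host not in ws.submit_hosts:
        found.append("master host missing from submit hosts")
    if ws.master_host in ws.shadow_master_hosts:
        found.append("master host listed as its own shadow master")
    return found


hosts = ["astor.local.", "darwin.local."]
ws = Worksheet(
    sge_root="/usr/local/gridengine",
    cell_name="Al Gore",  # as written in the worksheet
    admin_user="ebaskerv",
    master_host="astor.local.",
    execution_hosts=list(hosts),
    administration_hosts=list(hosts),
    submit_hosts=list(hosts),
)
print(problems(ws))
```

With the values from the table, the list of problems comes back empty, which is about all the reassurance a two-host grid needs.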
This post is getting very long. Ah well, I press on.
Aw, fuck, I just noticed they have a guide to all of those table entries. Let’s see if that changes anything…well, they use 536 and 537 as ports in their example. Maybe those are free. And perhaps interactive installation will be better.
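Whether 536 and 537 really are free is easy to check against a services file. Below is a rough Python sketch that parses `/etc/services`-style text and reports any wanted port already claimed under a different name. Two assumptions to flag loudly: the mapping of 536 to `sge_qmaster` and 537 to `sge_execd` is my guess at how the example assigns them, and the `SAMPLE` text (including the `exampled` service) is invented purely so the sketch runs anywhere.

```python
def parse_services(text):
    """Map (port, protocol) -> service name from /etc/services-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            # Keep the first name seen for a given port/protocol pair.
            table.setdefault((int(port), proto), fields[0])
    return table


def conflicts(text, wanted):
    """Return {wanted_name: existing_name} for ports claimed by another service."""
    table = parse_services(text)
    clash = {}
    for name, (port, proto) in wanted.items():
        owner = table.get((port, proto))
        if owner is not None and owner != name:
            clash[name] = owner
    return clash


# Assumed assignment of the two example ports to the two daemons.
WANTED = {"sge_qmaster": (536, "tcp"), "sge_execd": (537, "tcp")}

# Hypothetical excerpt, not a real /etc/services.
SAMPLE = """\
# name     port/proto
ssh        22/tcp    # Secure Shell
exampled   536/tcp   # made-up service squatting on 536
"""

print(conflicts(SAMPLE, WANTED))  # {'sge_qmaster': 'exampled'}
```

To check a real machine, pass the contents of the actual `/etc/services` file to `conflicts` instead of `SAMPLE`.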
Well, it looks like it’s time to start installing. I’ll cover that in the next post.