Introduction

This is a quick walk-through for getting Grid Engine going on Linux, for those who would like to use it with something like FSL. This documentation is a little old, having been written when the Grid Engine software was owned by Sun and often referred to as SGE (Sun Grid Engine). However, it still covers the basic requirements. A separate quick start guide is available for Ubuntu/Debian; this page describes the setup in more detail.

Since the demise of the open source (Sun) Grid Engine, various ports have sprung up. Ubuntu/Debian package the last publicly available release (6.2u5). Users of Red Hat variants (CentOS, Scientific Linux), and Debian/Ubuntu users wishing to use a more modern release, should look at installing Son of Grid Engine, which provides RPM and DEB packages and is still actively maintained (last update November 2013).

Grid Engine generally consists of one master (qmaster) and a number of execute (exec) hosts. The qmaster machine can also be an exec host, which is fine for small deployments, but large clusters should keep these functions separate.

This documentation was originally produced by A. Janke (a.janke@gmail.com) and is now maintained by the FSL team.

Conventions

All of the steps below must be run as root, so either use su - or sudo -s (if your account is sudo enabled).

Prerequisites

Where the following sections talk about shared resources, you can ignore the details if you are setting up a single machine that acts as both qmaster and execution host.

NFS

Although Grid Engine can be configured such that all machines are self contained, the instructions here assume that at least some of the Grid Engine folders are shared between the controller (qmaster) and clients (exec hosts). To achieve this you will typically need to set up one or more NFS shares, covering at least the configuration files (see http://arc.liv.ac.uk/SGE/howto/nfsreduce.html). Further, the FSL binaries and the datasets to be operated on should be made available to all exec hosts in the same filesystem location. In the case of the FSL software, you could install it to the same location on all execution hosts, or install it to one location and NFS mount this to the same location on all hosts. In the case of datasets, the instructions here assume you are using NFS mounts, but through prolog and epilog scripts it is possible to set up Grid Engine to copy data to/from exec hosts.
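As a rough way to confirm that a shared folder really is an NFS mount on a given host, the sketch below scans a mount table in /proc/mounts format. The function name is_nfs_mount and its optional second argument are illustrative and are not part of Grid Engine; the check only matches an exact mount point, not a folder beneath one.

```shell
# Return 0 if the given mount point is listed as an NFS filesystem.
# Usage: is_nfs_mount /opt/sge [mount-table]
# The mount table defaults to /proc/mounts (Linux), whose fields are:
#   device  mount-point  filesystem-type  options ...
is_nfs_mount() {
    mount_point=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mount_point" '
        $2 == mp && ($3 == "nfs" || $3 == "nfs4") { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$table"
}
```

For example, is_nfs_mount /opt/sge || echo "/opt/sge is not an NFS mount" could be run on each exec host, assuming /opt/sge itself is the mount point.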

Setting up NFS shares is beyond the scope of this document.

Name services

Grid Engine needs to be able to locate exec hosts and qmasters by host name. If all of your hosts are known to your DNS service then no further work is needed. If you don't have a DNS zone then you may need to configure the local /etc/hosts file to resolve host names, or look into the host aliases configuration (man host_aliases).
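If you rely on /etc/hosts, a small check like the following can list which cluster hosts still lack an entry. This is only a sketch: the function name hosts_missing_from_file is made up for illustration, and it inspects a hosts-format file rather than querying DNS.

```shell
# Print every host name that has no entry in a hosts-format file.
# Usage: hosts_missing_from_file /etc/hosts host1 host2 ...
# Lines in the file look like:  192.168.0.11  node01.example.com node01
hosts_missing_from_file() {
    hosts_file=$1
    shift
    for host in "$@"; do
        awk -v h="$host" '
            { sub(/#.*/, "") }    # ignore comments
            { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit(found ? 0 : 1) }
        ' "$hosts_file" || echo "$host"
    done
}
```

Run on each machine with the names of the qmaster and all exec hosts; any name printed needs adding to that machine's hosts file.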

User accounts

Grid Engine runs the scheduled job as the user who submitted it, using the textual name form (not numeric ID). Consequently, all exec hosts need to know about all users who are going to submit jobs. In a very small scale setup you may wish to add the required users directly to each exec host, but this quickly becomes unmanageable, so we would recommend setting up some kind of centralised user database, e.g. LDAP, Active Directory.
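To check that an exec host knows about the people who will submit jobs, something along these lines can be run on each host. The function name users_unknown_here is illustrative; it relies only on the standard id command, which looks accounts up through whatever account sources the host is configured to use.

```shell
# Print each user name that this host cannot resolve.
# Usage: users_unknown_here jbloggs asmith ...
users_unknown_here() {
    for user in "$@"; do
        id -u "$user" >/dev/null 2>&1 || echo "$user"
    done
}
```

No output means every listed user is known on this host.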

Setting up shared user accounts is beyond the scope of this document.

Admin account

The Grid Engine software has to run as a privileged user in order to be able to run jobs as the submitting user. However, as this is a potential security issue, the grid software that communicates with the network can be run under an admin account that doesn't have root access. This account needs to be available on all cluster hosts, so either set this up locally, or add it to your central LDAP/user account system.

If you decide to have a locally defined daemon account, set it up as follows, running as the root user. This is the Red Hat dialect; on Ubuntu/Debian use the interactive adduser command.

useradd --home /opt/sge --system sgeadmin

This adds a system account (for example, no home folder is created and the account does not age). Run it on the qmaster and on all exec hosts.

Service ports

Grid Engine communicates over two statically configured ports. These ports have to be the same on all computers, and can be configured either in the file /etc/services or by changing the Grid Engine settings files that all users need to source to be able to use the software. The latter option is best where you need more than one cluster in a location, since each cluster's qmaster and exec hosts then have to communicate on different ports. Modern Linux distributions are already set up with entries for Grid Engine (use grep sge_qmaster /etc/services to confirm). If your distribution does not include these entries, add the following to this file:

sge_qmaster     6444/tcp                # Grid Engine Qmaster Service
sge_qmaster     6444/udp                # Grid Engine Qmaster Service
sge_execd       6445/tcp                # Grid Engine Execution Service
sge_execd       6445/udp                # Grid Engine Execution Service

commenting out any prior definitions for the ports 6444 and 6445.
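The check for existing entries can be sketched as below. The function name missing_sge_services is hypothetical; it prints only the entries from the list above that are absent from a services-format file, and makes no changes itself.

```shell
# Print the Grid Engine service entries missing from a services file.
# Usage: missing_sge_services [services-file]   (default /etc/services)
missing_sge_services() {
    services_file=${1:-/etc/services}
    for entry in sge_qmaster:6444/tcp sge_qmaster:6444/udp \
                 sge_execd:6445/tcp sge_execd:6445/udp; do
        name=${entry%%:*}
        port=${entry#*:}
        # Present only if both the name and the port/protocol match;
        # commented-out lines do not count.
        awk -v n="$name" -v p="$port" '
            $1 == n && $2 == p { found = 1 }
            END { exit(found ? 0 : 1) }
        ' "$services_file" || printf '%s\t%s\n' "$name" "$port"
    done
}
```

Review the output and add any lines it prints to /etc/services by hand on the qmaster and every exec host.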

Obtaining Son of Grid Engine

The Son of Grid Engine distribution is available from the University of Liverpool at http://arc.liv.ac.uk/downloads/SGE/releases/. These instructions are based upon version 8.1.6 (the latest release as of April 2014).

For Red Hat Enterprise 6 variants, download the following packages:

(replace 8.1.6-1 with the appropriate version number should a new release be available.)

baseurl=http://arc.liv.ac.uk/downloads/SGE/releases
version=8.1.6
subversion=1
# core, qmaster, qmon and execd packages (architecture specific)
for i in - -qmaster- -qmon- -execd-; do
    wget "${baseurl}/${version}/gridengine${i}${version}-${subversion}.el6.x86_64.rpm"
done
# graphical installer package (architecture independent)
wget "${baseurl}/${version}/gridengine-guiinst-${version}-${subversion}.el6.noarch.rpm"

For the 8.1.6 release only also download the following:

wget http://arc.liv.ac.uk/downloads/SGE/releases/8.1.6/installer.jar

Debian platforms should get these files:

Installation

Where we refer to $SGE_ROOT, when using the Son Of Grid Engine packages, this will be /opt/sge.

QMaster

Red Hat Enterprise etc

Installation of the RPMs should be carried out using YUM, as any additional software dependencies will then be resolved automatically. A Grid master can be installed using:

yum install gridengine-8.1.6-1.el6.x86_64.rpm gridengine-qmaster-8.1.6-1.el6.x86_64.rpm gridengine-execd-8.1.6-1.el6.x86_64.rpm gridengine-qmon-8.1.6-1.el6.x86_64.rpm gridengine-guiinst-8.1.6-1.el6.noarch.rpm

8.1.6 release only: the installer shipped in the RPM has a mistake in it which prevents it from running. This can be repaired by copying the installer.jar file available on the web pages into $SGE_ROOT/util/gui-installer/, e.g. cp installer.jar $SGE_ROOT/util/gui-installer/

Debian/Ubuntu

dpkg --install sge-common_8.1.6_all.deb sge-doc_8.1.6_all.deb sge_8.1.6_amd64.deb 

Execution Host

Red Hat Enterprise etc

Installation of the RPMs should be carried out using YUM, as any additional software dependencies will then be resolved automatically. A Grid exec host can be installed using:

yum install gridengine-8.1.6-1.el6.x86_64.rpm gridengine-execd-8.1.6-1.el6.x86_64.rpm gridengine-guiinst-8.1.6-1.el6.noarch.rpm

8.1.6 release only: the installer shipped in the RPM has a mistake in it which prevents it from running. This can be repaired by copying the installer.jar file available on the web pages into $SGE_ROOT/util/gui-installer/, e.g. cp installer.jar $SGE_ROOT/util/gui-installer/

8.1.7 release: the text-based installer script will refuse to install unless you have all the RPMs installed. To work around this, edit the install_execd script so that the final line reads: SGE_CHECK_BINARIES=false exec ./inst_sge -x "$@"

Debian/Ubuntu

The Debian packages aren't as fine grained as the Red Hat versions, so an exec host requires the same packages as the master.

dpkg --install sge-common_8.1.6_all.deb sge-doc_8.1.6_all.deb sge_8.1.6_amd64.deb 

Configuration

The packages ship with a graphical installer and text-based installers. If you wish to use the graphical installer, make sure you are able to run X11 programs, e.g. you connected with ssh -Y and are running an X11 server locally. The instructions here detail how to run the text-based installers.

QMaster

Set the SGE_ROOT environment variable and then run the qmaster installer as follows:

export SGE_ROOT=/opt/sge
cd $SGE_ROOT
./install_qmaster

Now go through the interactive install process:

Now that we are back at a shell (finally), we need to add a few things to root's .bashrc so that we can access the SGE binaries. Add the following lines to /root/.bashrc:

   # SGE settings
   export SGE_ROOT=/opt/sge
   export SGE_CELL=default
   if [ -e $SGE_ROOT/$SGE_CELL ]
   then
      . $SGE_ROOT/$SGE_CELL/common/settings.sh
   fi

Then be sure to re-source your .bashrc:

. /root/.bashrc
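Because the same settings are needed in root's .bashrc on every exec host as well, it can help to add them in a way that is safe to repeat. The following is a minimal sketch, assuming the /opt/sge root used by the Son of Grid Engine packages; the function name add_sge_settings is made up for illustration.

```shell
# Append the Grid Engine settings block to a shell start-up file,
# unless the "# SGE settings" marker line is already present.
# Usage: add_sge_settings /root/.bashrc [sge_root]
add_sge_settings() {
    rc_file=$1
    sge_root=${2:-/opt/sge}    # assumed default, see SGE_ROOT above
    marker='# SGE settings'
    if grep -qF "$marker" "$rc_file" 2>/dev/null; then
        return 0               # already present, nothing to do
    fi
    cat >> "$rc_file" <<EOF
$marker
export SGE_ROOT=$sge_root
export SGE_CELL=default
if [ -e "\$SGE_ROOT/\$SGE_CELL" ]
then
   . "\$SGE_ROOT/\$SGE_CELL/common/settings.sh"
fi
EOF
}
```

Calling add_sge_settings /root/.bashrc a second time leaves the file unchanged.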

Now we can add our own username as an admin so that we can manage the system without becoming root.

qconf -am <myusername>

e.g. qconf -am jbloggs if your username is jbloggs.

Exec Host

The process for installing exec hosts is as follows:

  1. Add the exec host to the master host as an admin host. If your exec host is called client.foo.com then run this on your master host:
    • qconf -ah client.foo.com
  2. On the client (client.foo.com)
    1. Add the sgeadmin username as per above
    2. Add the lines to /etc/services if required

    3. Add the SGE bits to /root/.bashrc and re-source it (. /root/.bashrc)
    4. Ensure the binaries have been installed

  3. Set an environment variable and then install the exec host (this might be the same machine as the queue master, for example if you only have one computer)
    • export SGE_ROOT=/opt/sge
      cd $SGE_ROOT
      ./install_execd
  4. Now go through the interactive install process:
  5. The installer will ask that you check that this host has been added as an administrative host with the qconf -ah <hostname> command. Ensure this is the case (you can remove it as an admin host after the install if you wish), then press enter to continue

  6. Make sure the Grid Engine root matches that configured on the Qmaster (/opt/sge)

  7. Ensure the cell name matches that configured on the master (the default, "default", is usually fine)
  8. Accept the sge_execd port setting
  9. Accept the message about the host being known as an admin host
  10. Make a decision about the spool directory. For medium to large clusters, local spool directories are the best option; for small clusters (where the default location should be on an NFS mount) or stand-alone installs the default is fine. An appropriate local spool folder name might be /var/spool/sge. If you choose a local spool folder, you will receive a warning that the change to 'execd_spool_dir' will not take effect until execd has been restarted, so you will have to stop and start execd after completing the install.

  11. Press "y" to install the startup scripts
  12. Confirm you have read the following messages
  13. When asked about adding a default queue instance for this host answer "n" - FSL requires specific queues, so it is better to define these rather than the default queue.
  14. Press enter to accept the next message and "n" to refuse to see the previous screen again and then finally enter to exit the installer

Repeat this installation procedure on all of the execution hosts...
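With many execution hosts, the first step (registering each one as an admin host on the qmaster) can be generated from a list. The sketch below only prints the qconf -ah commands so they can be reviewed first; the function name admin_host_commands and the idea of a host list file are illustrative, not part of Grid Engine.

```shell
# Print one "qconf -ah" command per exec host listed in a file.
# The file holds one host name per line; blank lines and anything
# after a # are ignored. Nothing is executed by this function.
# Usage: admin_host_commands exec_hosts.txt
admin_host_commands() {
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" |
    while read -r host; do
        printf 'qconf -ah %s\n' "$host"
    done
}
```

Once the output looks right, it can be piped to sh on the qmaster, for example admin_host_commands exec_hosts.txt | sh, where exec_hosts.txt is a file you create yourself.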

Queue Configuration

Now we can configure our compute queues. By default Grid Engine has one queue defined, all.q. Let's make some changes to this to show how you would configure a queue. Assuming you have added your own username as an admin you can do this from your account; otherwise you will have to become the sgeadmin user.

Queues are configured with the qconf command, which opens the settings in whatever editor is named in your EDITOR environment variable. If EDITOR is unset, vi is used. If you are not sure how to use vi, set the EDITOR variable before running qconf, for example to gedit:

export EDITOR=gedit

Invoke the queue configuration with

qconf -mq all.q

Change the following settings:

Now check that your install has worked with qstat -f or with qhost -q. You should get some output regarding all.q.

Alternatively, if you have installed the qmon package you can configure this with the X11 based graphical tool qmon.

FSL Queues

FSL uses four queues by default (see $FSLDIR/bin/fsl_sub): verylong.q, long.q, short.q and veryshort.q. The idea is that you submit each job to the queue that matches how long you expect it to run. Thus if you submit a long running job to verylong.q and then want to run 10 short jobs, you can submit them to short.q and they will be run before the verylong.q jobs (or perhaps even in parallel with them). One way to configure a fair share of resources is via nice values. For example, in the previous section we set the nice value of all.q to 20. This is the lowest priority, so the four FSL queues use the following scheme, matching the script at the end of this section: verylong.q 20, long.q 15, short.q 10 and veryshort.q 5. The queue with the lowest nice value (veryshort.q) therefore gets priority for its jobs.

But before we do this we must first make our four new queues. The easiest way to do this is to copy the configuration from our default queue (all.q). For the brave, there is a shell script below that will do this all in one fell swoop. The manual method, for those who want to know what is going on, is as follows. First we dump the values from all.q:

qconf -sq all.q > /tmp/q.tmp

Then, using any editor you choose, modify the qname entry in the file /tmp/q.tmp; for example, change it to verylong.q. You can now use this new config to create the verylong.q queue:

qconf -Aq /tmp/q.tmp

Change the file again for the long.q:

and make this queue

qconf -Aq /tmp/q.tmp

Then repeat these steps for short.q and veryshort.q, varying the priority value appropriately. When you are all done you can check the new queues with:

qhost -q

Alternatively, this can be scripted as follows:

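# change defaults for all.q: start jobs with /bin/sh rather than csh,
# use unix_behavior as the shell start mode, and set the nice value to 20
qconf -sq all.q |\
    sed -e 's/bin\/csh/bin\/sh/' \
        -e 's/posix_compliant/unix_behavior/' \
        -e 's/^priority  *0/priority 20/' >\
    /tmp/q.tmp
# load the modified settings back into all.q
qconf -Mq /tmp/q.tmp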

# add other queues
sed -e 's/all.q/verylong.q/' /tmp/q.tmp >\
   /tmp/verylong.q
qconf -Aq /tmp/verylong.q

sed -e 's/all.q/long.q/' /tmp/q.tmp |\
   sed -e 's/priority *20/priority 15/' >\
   /tmp/long.q
qconf -Aq /tmp/long.q

sed -e 's/all.q/short.q/' /tmp/q.tmp |\
   sed -e 's/priority *20/priority 10/' >\
   /tmp/short.q
qconf -Aq /tmp/short.q

sed -e 's/all.q/veryshort.q/' /tmp/q.tmp |\
   sed -e 's/priority *20/priority 5/' >\
   /tmp/veryshort.q
qconf -Aq /tmp/veryshort.q
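The repeated sed steps above could also be folded into one helper. This is a sketch only, assuming each setting in the dumped file starts its line with its keyword (as in qconf -sq output); the function name make_queue_conf is hypothetical, and the nice values are the ones used in the script above.

```shell
# Write a queue definition derived from a dumped queue configuration,
# replacing the qname and priority settings.
# Usage: make_queue_conf template qname priority > new-queue-file
make_queue_conf() {
    sed -e "s/^qname .*/qname                 $2/" \
        -e "s/^priority .*/priority              $3/" "$1"
}

# Queue names and nice values, as name:priority pairs.
fsl_queue_scheme='verylong.q:20 long.q:15 short.q:10 veryshort.q:5'

# Example use on the qmaster, once /tmp/q.tmp has been created:
#   for q in $fsl_queue_scheme; do
#       make_queue_conf /tmp/q.tmp "${q%%:*}" "${q#*:}" > "/tmp/${q%%:*}"
#       qconf -Aq "/tmp/${q%%:*}"
#   done
```

Matching whole lines by keyword avoids depending on the exact spacing in the dumped file.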
 

FslSge (last edited 15:42:02 23-06-2014 by DuncanMortimer)