Globus Toolkit Tutorial The Globus Consortium
 
 

Chapter 3: Deploying Sun Grid Engine (SGE)

Installing and Configuring SGE

As the root user, change into the $SGE_ROOT directory (/opt/sge-root in this tutorial) and run the following command:

[root@nodeC sge-root]# ./util/setfileperm.sh $SGE_ROOT

You will see output similar to the following:

WARNING WARNING WARNING
-----------------------------
We will set the the file ownership and permission to

UserID: 0
GroupID: 0
In directory:     /opt/sge-root

We will also install the following binaries as SUID-root:

$SGE_ROOT/utilbin/<arch>/rlogin
$SGE_ROOT/utilbin/<arch>/rsh
$SGE_ROOT/utilbin/<arch>/testsuidroot
$SGE_ROOT/bin/<arch>/sgepasswd

Do you want to set the file permissions (yes/no) [NO] >>

Enter 'yes' to set the file permissions and the command will complete.
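
Before running setfileperm.sh it can help to confirm the two preconditions stated above: that you are root and that you are in the right directory. The following is a minimal sketch, not part of the SGE distribution. The function name preflight is hypothetical, and /opt/sge-root is only the location used in this tutorial.

```shell
#!/bin/sh
# preflight UID DIR: succeed only if UID is 0 and DIR contains an
# executable util/setfileperm.sh. (Hypothetical helper, not shipped with SGE.)
preflight() {
    uid=$1
    root_dir=$2
    if [ "$uid" -ne 0 ]; then
        echo "must run as root (uid 0), current uid is $uid"
        return 1
    fi
    if [ ! -x "$root_dir/util/setfileperm.sh" ]; then
        echo "no executable util/setfileperm.sh under '$root_dir'"
        return 1
    fi
    echo "ok: ready to run ./util/setfileperm.sh in $root_dir"
}

# Falls back to the tutorial's path when SGE_ROOT is not set.
preflight "$(id -u)" "${SGE_ROOT:-/opt/sge-root}" || echo "resolve the problem above first"
```

The check only reads state; it never changes ownership or permissions itself.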

Next, begin the actual installation of SGE by running the command './install_qmaster'. This command leads you through a series of command-line menus and prompts. Each necessary step is shown in detail below, along with the output you should see.

Any entries you should type will be in red. Any action you should take will be in black.

[root@nodeC sge-root]# ./install_qmaster


Welcome to the Grid Engine installation

Grid Engine qmaster host installation

Before you continue with the installation please read these hints:
  • Your terminal window should have a size of at least 80x24 characters
     
  • The INTR character is often bound to the key Ctrl-C. The term >Ctrl-C< is used during the installation if you have the possibility to abort the installation

The qmaster installation procedure will take approximately 5-10 minutes.

Hit <RETURN>

Choosing Grid Engine admin user account
You may install Grid Engine that all files are created with the user id of an unprivileged user.

This will make it possible to install and run Grid Engine in directories where user >root< has no permissions to create and write files and directories.
  • Grid Engine still has to be started by user >root<
  • this directory should be owned by the Grid Engine administrator

Do you want to install Grid Engine under an user id other than >root< (y/n) [y] >>

n

Checking $SGE_ROOT directory

The Grid Engine root directory is:

$SGE_ROOT = /opt/sge-root

If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/sge-root] >>

Hit <RETURN>

ypcat: can't get local yp domain: Local domain name not set

Grid Engine TCP/IP service >sge_qmaster<

There is no service >sge_qmaster< available in your >/etc/services< file or in your NIS/NIS+ database.

You may add this service now to your services database or choose a port number. It is recommended to add the service now. If you are using NIS/NIS+ you should add the service at your NIS/NIS+ server and not to the local >/etc/services< file.

Please add an entry in the form

sge_qmaster <port_number>/tcp

to your services database and make sure to use an unused port number.

Please add the service now or press <RETURN> to go to entering a port number >>

In another terminal, edit /etc/services and add the line:
sge_qmaster 30000/tcp

When completed enter <RETURN>

Grid Engine TCP/IP service >sge_execd<

There is no service >sge_execd< available in your >/etc/services< file or in your NIS/NIS+ database.

You may add this service now to your services database or choose a port number. It is recommended to add the service now. If you are using NIS/NIS+ you should add the service at your NIS/NIS+ server and not to the local >/etc/services< file.

Please add an entry in the form

sge_execd <port_number>/tcp

to your services database and make sure to use an unused port number.

Make sure to use a different port number for the Executionhost as on the qmaster machine

infotext: too few arguments
Please add the service now or press <RETURN> to go to entering a port number >>

In another terminal, edit /etc/services and add the line:
sge_execd 30001/tcp

When completed enter <RETURN>
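
The two /etc/services edits above can also be scripted. The sketch below appends an entry only when the service name is not already defined, so running it twice does not create duplicates. The function name add_service is hypothetical, and the demonstration writes to a scratch file rather than the real /etc/services. To apply it for real you would pass /etc/services as root, and only if you are not using NIS/NIS+.

```shell
#!/bin/sh
# add_service FILE NAME PORT: append "NAME PORT/tcp" to FILE unless a line
# starting with NAME already exists. (Hypothetical helper for illustration.)
add_service() {
    file=$1
    name=$2
    port=$3
    if grep -q "^${name}[[:space:]]" "$file" 2>/dev/null; then
        echo "$name already present in $file"
    else
        printf '%s\t%s/tcp\n' "$name" "$port" >> "$file"
        echo "added $name $port/tcp to $file"
    fi
}

# Demonstrate against a scratch file, using the ports chosen in this tutorial.
SERVICES_FILE=$(mktemp)
add_service "$SERVICES_FILE" sge_qmaster 30000
add_service "$SERVICES_FILE" sge_execd 30001
add_service "$SERVICES_FILE" sge_qmaster 30000   # second call is a no-op
```

The sketch does not check whether the port number itself is already taken by another service; the installer's advice to pick an unused port still applies.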

Grid Engine cells

Grid Engine supports multiple cells.

If you are not planning to run multiple Grid Engine clusters or if you don't know yet what is a Grid Engine cell it is safe to keep the default cell name default

If you want to install multiple cells you can enter a cell name now.

The environment variable

$SGE_CELL=<your_cell_name>

will be set for all further Grid Engine commands.

Enter cell name [default] >>

Hit <RETURN> to accept default

Grid Engine qmaster spool directory

The qmaster spool directory is the place where the qmaster daemon stores the configuration and the state of the queuing system.

User >root< on this host must have read/write access to the qmaster spool directory.

If you will install shadow master hosts or if you want to be able to start the qmaster daemon on other hosts (see the corresponding section in the Grid Engine Installation and Administration Manual for details) the account on the shadow master hosts also needs read/write access to this directory.

The following directory

[/opt/sge-root/default/spool/qmaster]

will be used as qmaster spool directory by default!

Do you want to select another qmaster spool directory (y/n) [n] >>

n

Windows Execution Host Support

Are you going to install Windows Execution Hosts? (y/n) [n] >>

n

Verifying and setting file permissions

Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (y/n) [y] >>

y

Select default Grid Engine hostname resolving method

Are all hosts of your cluster in one DNS domain? If this is the case the hostnames

>hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com< is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >>

y
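
The comparison rule described on this screen can be illustrated with a few lines of shell. This is only a sketch of the described behaviour, not Grid Engine's actual resolver code. The function names are hypothetical, and lower-casing the names before comparing is an added assumption.

```shell
#!/bin/sh
# short_name HOST: strip everything from the first dot onward and lower-case it.
short_name() {
    printf '%s\n' "${1%%.*}" | tr '[:upper:]' '[:lower:]'
}

# same_host A B: succeed if A and B are equal once the DNS domain is ignored.
same_host() {
    [ "$(short_name "$1")" = "$(short_name "$2")" ]
}

if same_host hostA hostA.foo.com; then
    echo "hostA and hostA.foo.com are treated as equal"
fi
if ! same_host hostA hostB.foo.com; then
    echo "hostA and hostB.foo.com are treated as different"
fi
```

Answering 'y' is appropriate only when no two hosts in the cluster share a short name across different domains.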

Making directories

creating directory: default
creating directory: default/common
creating directory: /opt/sge-root/default/spool/qmaster
creating directory: /opt/sge-root/default/spool/qmaster/job_scripts
Hit <RETURN> to continue >>

Hit <RETURN>

Setup spooling

Your SGE binaries are compiled to link the spooling libraries during runtime (dynamically). So you can choose between Berkeley DB spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>

Hit <RETURN> to accept the default

Hit <RETURN>

The Berkeley DB spooling method provides two configurations!

Local spooling:
The Berkeley DB spools into a local directory on this host (qmaster host)
This setup is faster, but you can't setup a shadow master host

Berkeley DB Spooling Server:

If you want to setup a shadow master host, you need to use Berkeley DB Spooling Server!

In this case you have to choose a host with a configured RPC service. The qmaster host connects via RPC to the Berkeley DB. This setup is more failsafe, but results in a clear potential security hole. RPC communication (as used by Berkeley DB) can be easily compromised.

Please only use this alternative if your site is secure or if you are not concerned about security.

Check the installation guide for further advice on how to achieve failsafety without compromising security.

Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >>

n

Berkeley Database spooling parameters

Please enter the Database Directory now, even if you want to spool locally, it is necessary to enter this Database Directory.

Default: [/opt/sge-root/default/spool/spooldb] >>

Hit <RETURN> to accept the default

Grid Engine group id range

When jobs are started under the control of Grid Engine an additional group id is set on platforms which do not support jobs. This is done to provide maximum control for Grid Engine jobs.

This additional UNIX group id range must be unused group id's in your system. Each job will be assigned a unique id during the time it is running. Therefore you need to provide a range of id's which will be assigned dynamically for jobs.

The range must be big enough to provide enough numbers for the maximum number of Grid Engine jobs running at a single moment on a single host. E.g. a range like >20000-20100< means, that Grid Engine will use the group ids from 20000-20100 and provides a range for 100 Grid Engine jobs at the same time on a single host.

You can change at any time the group id range in your cluster configuration.

Please enter a range >>

20000-20500
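
Because the range must consist of unused group ids, it is worth checking the chosen range against the group database first. The sketch below scans a file in /etc/group format for any GID inside the range and reports how many ids the range contains. The function names are hypothetical. The demonstration uses a scratch file with made-up entries, and the sketch does not consult NIS or LDAP groups.

```shell
#!/bin/sh
# gids_in_use FILE LOW HIGH: print "name:gid" for every group in FILE whose
# GID lies in [LOW, HIGH]. FILE uses the /etc/group layout name:pw:gid:members.
gids_in_use() {
    awk -F: -v lo="$2" -v hi="$3" '$3 >= lo && $3 <= hi { print $1 ":" $3 }' "$1"
}

# range_size LOW HIGH: number of ids in the range, counting both endpoints.
range_size() {
    echo $(( $2 - $1 + 1 ))
}

# Scratch group file with invented entries, one of which collides.
GROUP_FILE=$(mktemp)
printf 'root:x:0:\nusers:x:100:\nrender:x:20042:\n' > "$GROUP_FILE"

conflicts=$(gids_in_use "$GROUP_FILE" 20000 20500)
if [ -n "$conflicts" ]; then
    echo "range 20000-20500 overlaps existing groups: $conflicts"
else
    echo "range 20000-20500 is free"
fi
echo "ids available in range: $(range_size 20000 20500)"
```

On a real host you would pass /etc/group as the first argument; an empty result means no local group uses an id in the range.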

Grid Engine cluster configuration

Please give the basic configuration parameters of your Grid Engine installation:

<execd_spool_dir>

The pathname of the spool directory of the execution hosts. User >root< must have the right to create this directory and to write into it.

Default: [/opt/sge-root/default/spool] >>

Hit <RETURN> to accept the default

Grid Engine cluster configuration (continued)

<administrator_mail>

The email address of the administrator to whom problem reports are sent.

It's is recommended to configure this parameter. You may use >none< if you do not wish to receive administrator mail.

Please enter an email address in the form >user@foo.com<.

Default: [none] >>

Hit <RETURN> to accept the default

The following parameters for the cluster configuration were configured:

execd_spool_dir   /opt/sge-root/default/spool
administrator_mail   none

Do you want to change the configuration parameters (y/n) [n] >>

n

Creating local configuration

Creating >act_qmaster< file
Adding default complex attributes
Reading in complex attributes.
Adding default parallel environments (PE)
Reading in parallel environments:
PE "make.sge_pqs_api".
Adding SGE default usersets
Reading in usersets:
Userset "defaultdepartment".
Userset "deadlineusers".
Adding >sge_aliases< path aliases file
Adding >qtask< qtcsh sample default request file
Adding >sge_request< default submit options file
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<

Hit <RETURN> to continue >>

Hit <RETURN>

qmaster/scheduler startup script

We can install the startup script that will
start qmaster/scheduler at machine boot (y/n) [y] >>

n

Grid Engine qmaster and scheduler startup

Starting qmaster and scheduler daemon. Please wait ...
starting sge_qmaster
starting sge_schedd
Hit <RETURN> to continue >>

Hit <RETURN>

Adding Grid Engine hosts

Please now add the list of hosts, where you will later install your execution daemons. These hosts will be also added as valid submit hosts.

Please enter a blank separated list of your execution hosts. You may press <RETURN> if the line is getting too long. Once you are finished simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan to install Grid Engine. This may be convenient if you are installing Grid Engine on many hosts.

Do you want to use a file which contains the list of hosts (y/n) [n] >>

n

Adding admin and submit hosts

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are entering an empty list. You will see messages from Grid Engine when the hosts are added.

Host(s):

Hit <RETURN> twice

If you want to use a shadow host, it is recommended to add this host to the list of administrative hosts.

If you are not sure, it is also possible to add or remove hosts after the installation with <qconf -ah hostname> for adding and <qconf -dh hostname> for removing this host

Attention: This is not the shadow host installation procedure. You still have to install the shadow host separately

Do you want to add your shadow host(s) now? (y/n) [y] >>

n

Creating the default <all.q> queue and <allhosts> hostgroup

root@nodeC.ps.univa.com added "@allhosts" to host group list
root@nodeC.ps.univa.com added "all.q" to cluster queue list

Hit <RETURN> to continue >>

Hit <RETURN>

Scheduler Tuning

The details on the different options are described in the manual.

Configurations

  1. Normal
    Fixed interval scheduling, report scheduling information, actual + assumed load
     
  2. High
    Fixed interval scheduling, report limited scheduling information, actual load
     
  3. Max
    Immediate Scheduling, report no scheduling information, actual load

Enter the number of your prefered configuration and hit <RETURN>!
Default configuration is [1] >>

1

Using Grid Engine

You should now enter the command:

source /opt/sge-root/default/common/settings.csh

if you are a csh/tcsh user or

# . /opt/sge-root/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

  • $SGE_ROOT (always necessary)
  • $SGE_CELL (if you are using a cell other than >default<)
  • $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
  • $SGE_EXECD_PORT (if you haven't added the service >sge_execd<)
  • $PATH/$path (to find the Grid Engine binaries)
  • $MANPATH (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>

Hit <RETURN>

Grid Engine messages

Grid Engine messages can be found at:

/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

Qmaster: /opt/sge-root/default/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages

Grid Engine startup scripts

Grid Engine startup scripts can be found at:

/opt/sge-root/default/common/sgemaster (qmaster and scheduler)
/opt/sge-root/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

n

Your Grid Engine qmaster installation is now completed

Please now login to all hosts where you want to run an execution daemon and start the execution host installation procedure.

If you want to run an execution daemon on this host, please do not forget to make the execution host installation in this host as well.

All execution hosts must be administrative hosts during the installation. All hosts which you added to the list of administrative hosts during this installation procedure can now be installed.

You may verify your administrative hosts with the command

# qconf -sh

and you may add new administrative hosts with the command

# qconf -ah <hostname>

Please hit <RETURN> >>

Hit <RETURN>


This completes the first part of the SGE installation and configuration. Before continuing you need to set up your environment by doing the following:

[root@nodeC sge-root]# source /opt/sge-root/default/common/settings.sh
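
A quick way to confirm that the settings file took effect is to check for the variables and commands it is supposed to provide. The following is a minimal sketch; check_sge_env is a hypothetical helper name, and it tests only SGE_ROOT and the presence of qconf in PATH.

```shell
#!/bin/sh
# check_sge_env: report anything missing from the environment that
# settings.sh is expected to set up. Returns non-zero if a check fails.
check_sge_env() {
    rc=0
    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set"
        rc=1
    elif [ ! -d "$SGE_ROOT" ]; then
        echo "SGE_ROOT=$SGE_ROOT is not a directory"
        rc=1
    fi
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf not found in PATH"
        rc=1
    fi
    return $rc
}

if check_sge_env; then
    echo "environment looks ready"
else
    echo "source the settings file first"
fi
```

Remember that the settings only apply to the shell in which they were sourced; a new terminal needs them sourced again.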

You can verify that nodeC is configured properly as an SGE administrative host by running:

[root@nodeC sge-root]# qconf -sh
nodeC.ps.univa.com
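
If more machines are added later, each one must be registered as an administrative host with 'qconf -ah' before its execution daemon can be installed. The loop below is a sketch of doing that for several hosts. The function name and the host names nodeA.example.org and nodeB.example.org are placeholders, and the first argument selects a dry run that only prints the commands.

```shell
#!/bin/sh
# add_admin_hosts RUNNER HOST...: run "RUNNER qconf -ah HOST" for each host.
# With RUNNER=echo the commands are printed instead of executed; with an
# empty RUNNER ("") qconf is invoked directly.
add_admin_hosts() {
    runner=$1
    shift
    for host in "$@"; do
        $runner qconf -ah "$host"
    done
}

# Dry run with placeholder host names.
add_admin_hosts echo nodeA.example.org nodeB.example.org
```

After a real run, 'qconf -sh' should list the new hosts alongside nodeC.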

Next, nodeC needs to be configured as an execution host. Run the following command and again enter the indicated values for each menu choice:

[root@nodeC sge-root]# /opt/sge-root/install_execd


Welcome to the Grid Engine execution host installation

If you haven't installed the Grid Engine qmaster host yet, you must execute this step (with >install_qmaster<) prior the execution host installation.

For a sucessfull installation you need a running Grid Engine qmaster. It is also neccesary that this host is an administrative host.

You can verify your current list of administrative hosts with the command:

# qconf -sh

You can add an administrative host with the command:

# qconf -ah <hostname>

The execution host installation will take approximately 5 minutes.

Hit <RETURN> to continue >>

Hit <RETURN>

Checking $SGE_ROOT directory

The Grid Engine root directory is:

$SGE_ROOT = /opt/sge-root

If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/sge-root] >>

Hit <RETURN>

Grid Engine cells

Please enter cell name which you used for the qmaster installation or press <RETURN> to use [default] >>

Hit <RETURN> for default

Checking hostname resolving

This hostname is known at qmaster as an administrative host.

Hit <RETURN> to continue >>

Hit <RETURN>

Local execd spool directory configuration

During the qmaster installation you've already entered a global execd spool directory. This is used, if no local spool directory is configured.

Now you can enter a local spool directory for this host.

Do you want to configure a local spool directory for this host (y/n) [n] >>

n

Creating local configuration

root@nodeC.ps.univa.com modified "nodeC.ps.univa.com" in configuration list
Local configuration for host >nodeC.ps.univa.com< created.

Hit <RETURN> to continue >>

Hit <RETURN>

execd startup script

We can install the startup script that will start execd at machine boot (y/n) [y] >>

n

Grid Engine execution daemon startup

Starting execution daemon. Please wait ...
starting sge_execd

Hit <RETURN> to continue >>

Hit <RETURN>

Adding a queue for this host

We can now add a queue instance for this host:

  • it is added to the >allhosts< hostgroup
  • the queue provides 2 slot(s) for jobs in all queues referencing the >allhosts< hostgroup

You do not need to add this host now, but before running jobs on this host it must be added to at least one queue.

Do you want to add a default queue instance for this host (y/n) [y] >>

y

root@nodeC.ps.univa.com modified "@allhosts" in host group list
root@nodeC.ps.univa.com modified "all.q" in cluster queue list


Using Grid Engine

You should now enter the command:

source /opt/sge-root/default/common/settings.csh

if you are a csh/tcsh user or

# . /opt/sge-root/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

  • $SGE_ROOT (always necessary)
  • $SGE_CELL (if you are using a cell other than >default<)
  • $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
  • $SGE_EXECD_PORT (if you haven't added the service >sge_execd<)
  • $PATH/$path (to find the Grid Engine binaries)
  • $MANPATH (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>

Hit <RETURN>

Grid Engine messages

Grid Engine messages can be found at:

/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

Qmaster: /opt/sge-root/default/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages


Grid Engine startup scripts

Grid Engine startup scripts can be found at:

/opt/sge-root/default/common/sgemaster (qmaster and scheduler)
/opt/sge-root/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

n


This completes the installation and configuration of SGE.
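
If either daemon does not appear to be running, the message files listed by the installer are the first place to look. The sketch below prints the last few lines of each file that exists and notes the ones that do not. The function name show_sge_logs is hypothetical, and the paths are the ones reported by the installer for this tutorial's /opt/sge-root layout.

```shell
#!/bin/sh
# show_sge_logs FILE...: print the last five lines of each readable file.
show_sge_logs() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "== $f"
            tail -n 5 "$f"
        else
            echo "== $f (not readable or not present)"
        fi
    done
}

show_sge_logs /tmp/qmaster_messages \
              /tmp/execd_messages \
              /opt/sge-root/default/spool/qmaster/messages
```

The execution daemon's own log lives under <execd_spool_dir>/<hostname>/messages and can be passed as an additional argument.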

 
 
 