Chapter 3: Deploying Sun Grid Engine (SGE)
Installing and Configuring SGE
As the root user, change into the $SGE_ROOT directory and run the following command:
[root@nodeC sge-root]# ./util/setfileperm.sh $SGE_ROOT
You will see output similar to the following:
WARNING WARNING WARNING
-----------------------------
We will set the the file ownership and permission to
   UserID:         0
   GroupID:        0
   In directory:   /opt/sge-root
We will also install the following binaries as SUID-root:
$SGE_ROOT/utilbin/<arch>/rlogin
$SGE_ROOT/utilbin/<arch>/rsh
$SGE_ROOT/utilbin/<arch>/testsuidroot
$SGE_ROOT/bin/<arch>/sgepasswd
Do you want to set the file permissions (yes/no) [NO] >>
Enter 'yes' to set the file permissions and the command will complete.
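Once the script finishes, you can confirm that the four binaries listed above now carry the setuid bit. The following is a minimal sketch. 'list_suid_files' is a hypothetical helper name introduced here, and it relies only on the standard 'find' and 'ls' commands.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): list regular files that have the
# setuid bit set below the given directories, showing owner and mode via ls -l.
list_suid_files() {
    find "$@" -type f -perm -4000 -exec ls -l {} +
}
```

For example, 'list_suid_files "$SGE_ROOT/utilbin" "$SGE_ROOT/bin"' should list rlogin, rsh, testsuidroot and sgepasswd for your architecture. Each should be owned by root and show an 's' in the owner execute position of the mode.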
Next, begin the actual installation of SGE by running the command './install_qmaster'. This command leads you through a series of command-line menus and prompts. Each required step is shown below in detail, along with the output you should see.
In the transcript below, the line following each prompt shows either what to type (for example 'n' or '20000-20500') or what to do (for example 'Hit <RETURN>').
[root@nodeC sge-root]# ./install_qmaster
Welcome to the Grid Engine installation
Grid Engine qmaster host installation
Before you continue with the installation please read these hints:
- Your terminal window should have a size of at least 80x24 characters
- The INTR character is often bound to the key Ctrl-C. The term >Ctrl-C< is used during the installation if you have the possibility to abort the installation
The qmaster installation procedure will take approximately 5-10 minutes.
Hit <RETURN>
Choosing Grid Engine admin user account
You may install Grid Engine that all files are created with the user id of an unprivileged user.
This will make it possible to install and run Grid Engine in directories where user >root< has no permissions to create and write files and directories.
- Grid Engine still has to be started by user >root<
- this directory should be owned by the Grid Engine administrator
Do you want to install Grid Engine under an user id other than >root< (y/n) [y] >>
n
Checking $SGE_ROOT directory
The Grid Engine root directory is:
$SGE_ROOT = /opt/sge-root
If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/sge-root] >>
Hit <RETURN>
ypcat: can't get local yp domain: Local domain name not set
Grid Engine TCP/IP service >sge_qmaster<
There is no service >sge_qmaster< available in your >/etc/services< file or in your NIS/NIS+ database.
You may add this service now to your services database or choose a port number. It is recommended to add the service now. If you are using NIS/NIS+ you should add the service at your NIS/NIS+ server and not to the local >/etc/services< file.
Please add an entry in the form
sge_qmaster <port_number>/tcp
to your services database and make sure to use an unused port number.
Please add the service now or press <RETURN> to go to entering a port number >>
In another terminal, edit /etc/services and add the following line:
sge_qmaster 30000/tcp
When finished, press <RETURN>.
Grid Engine TCP/IP service >sge_execd<
There is no service >sge_execd< available in your >/etc/services< file or in your NIS/NIS+ database.
You may add this service now to your services database or choose a port number. It is recommended to add the service now. If you are using NIS/NIS+ you should add the service at your NIS/NIS+ server and not to the local >/etc/services< file.
Please add an entry in the form
sge_execd <port_number>/tcp
to your services database and make sure to use an unused port number.
Make sure to use a different port number for the Executionhost as on the qmaster machine
infotext: too few arguments
Please add the service now or press <RETURN> to go to entering a port number >>
In another terminal, edit /etc/services and add the following line:
sge_execd 30001/tcp
When finished, press <RETURN>.
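If you prefer not to edit /etc/services by hand, you can append both entries from the shell. This is a sketch under stated assumptions. 'add_sge_service' is a hypothetical helper, and the ports 30000 and 30001 are the ones chosen in this walkthrough. The duplicate check is a simple text match rather than a full parse of the services format.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): append "NAME<TAB>PORT/PROTO" to a
# services file unless the service name or the port is already present.
#   $1 = services file (normally /etc/services)
#   $2 = service name, e.g. sge_qmaster
#   $3 = port/protocol, e.g. 30000/tcp
add_sge_service() {
    file="$1"; name="$2"; port="$3"
    if grep -q "^${name}[[:space:]]" "$file"; then
        # Already defined: do nothing, so the helper is safe to re-run.
        echo "$name is already defined in $file"
    elif grep -Eq "[[:space:]]${port}([[:space:]]|\$)" "$file"; then
        # Another service already uses this port/protocol pair.
        echo "port $port is already used in $file; pick another port" >&2
        return 1
    else
        printf '%s\t%s\n' "$name" "$port" >> "$file"
        echo "added $name $port to $file"
    fi
}
```

To add the entries, run 'add_sge_service /etc/services sge_qmaster 30000/tcp' and then 'add_sge_service /etc/services sge_execd 30001/tcp' as root. If your site uses NIS/NIS+, the installer advises adding the entries on the NIS/NIS+ server instead.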
Grid Engine cells
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't know yet what is a Grid Engine cell it is safe to keep the default cell name default
If you want to install multiple cells you can enter a cell name now.
The environment variable
$SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >>
Hit <RETURN> to accept the default
Grid Engine qmaster spool directory
The qmaster spool directory is the place where the qmaster daemon stores the configuration and the state of the queuing system.
User >root< on this host must have read/write access to the qmaster spool directory.
If you will install shadow master hosts or if you want to be able to start the qmaster daemon on other hosts (see the corresponding section in the Grid Engine Installation and Administration Manual for details) the account on the shadow master hosts also needs read/write access to this directory.
The following directory
[/opt/sge-root/default/spool/qmaster]
will be used as qmaster spool directory by default!
Do you want to select another qmaster spool directory (y/n) [n] >>
n
Windows Execution Host Support
Are you going to install Windows Execution Hosts? (y/n) [n] >>
n
Verifying and setting file permissions
Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (y/n) [y] >>
y
Select default Grid Engine hostname resolving method
Are all hosts of your cluster in one DNS domain? If this is the case the hostnames
>hostA< and >hostA.foo.com<
would be treated as equal, because the DNS domain name >foo.com< is ignored when comparing hostnames.
Are all hosts of your cluster in a single DNS domain (y/n) [y] >>
y
Making directories
creating directory: default
creating directory: default/common
creating directory: /opt/sge-root/default/spool/qmaster
creating directory: /opt/sge-root/default/spool/qmaster/job_scripts
Hit <RETURN> to continue >>
Hit <RETURN>
Setup spooling
Your SGE binaries are compiled to link the spooling libraries during runtime (dynamically). So you can choose between Berkeley DB spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
Hit <RETURN> to accept the default (berkeleydb)
Hit <RETURN>
The Berkeley DB spooling method provides two configurations!
Local spooling:
The Berkeley DB spools into a local directory on this host (qmaster host)
This setup is faster, but you can't setup a shadow master host
Berkeley DB Spooling Server:
If you want to setup a shadow master host, you need to use Berkeley DB Spooling Server!
In this case you have to choose a host with a configured RPC service. The qmaster host connects via RPC to the Berkeley DB. This setup is more failsafe, but results in a clear potential security hole. RPC communication (as used by Berkeley DB) can be easily compromised.
Please only use this alternative if your site is secure or if you are not concerned about security.
Check the installation guide for further advice on how to achieve failsafety without compromising security.
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >>
n
Berkeley Database spooling parameters
Please enter the Database Directory now, even if you want to spool locally, it is necessary to enter this Database Directory.
Default: [/opt/sge-root/default/spool/spooldb] >>
Hit <RETURN> to accept the default
Grid Engine group id range
When jobs are started under the control of Grid Engine an additional group id is set on platforms which do not support jobs. This is done to provide maximum control for Grid Engine jobs.
This additional UNIX group id range must be unused group id's in your system. Each job will be assigned a unique id during the time it is running. Therefore you need to provide a range of id's which will be assigned dynamically for jobs.
The range must be big enough to provide enough numbers for the maximum number of Grid Engine jobs running at a single moment on a single host. E.g. a range like >20000-20100< means, that Grid Engine will use the group ids from 20000-20100 and provides a range for 100 Grid Engine jobs at the same time on a single host.
You can change at any time the group id range in your cluster configuration.
Please enter a range >>
20000-20500
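The range must not collide with existing groups. You can run a rough check from another terminal before answering the prompt. It assumes that the group database is reachable through 'getent group', which covers local files and, where configured, NIS. 'gid_range_conflicts' is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): report groups whose GID falls inside
# the range that will be handed to Grid Engine. Input is in /etc/group format
# (name:password:GID:members); pass "-" to read standard input.
# Returns 1 if any GID in the range is already taken, 0 otherwise.
gid_range_conflicts() {
    src="$1"; low="$2"; high="$3"
    awk -F: -v lo="$low" -v hi="$high" '
        $3 + 0 >= lo + 0 && $3 + 0 <= hi + 0 { print $1 " already uses GID " $3; found = 1 }
        END { exit (found ? 1 : 0) }
    ' "$src"
}
```

When the range used here is free, 'getent group | gid_range_conflicts - 20000 20500' prints nothing and returns 0. The check only sees the group database of the host it runs on, so repeat it on each execution host.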
Grid Engine cluster configuration
Please give the basic configuration parameters of your Grid Engine installation:
<execd_spool_dir>
The pathname of the spool directory of the execution hosts. User >root< must have the right to create this directory and to write into it.
Default: [/opt/sge-root/default/spool] >>
Hit <RETURN> to accept the default
Grid Engine cluster configuration (continued)
<administrator_mail>
The email address of the administrator to whom problem reports are sent.
It's is recommended to configure this parameter. You may use >none< if you do not wish to receive administrator mail.
Please enter an email address in the form >user@foo.com<.
Default: [none] >>
Hit <RETURN> to accept the default
The following parameters for the cluster configuration were configured:
execd_spool_dir /opt/sge-root/default/spool
administrator_mail none
Do you want to change the configuration parameters (y/n) [n] >>
n
Creating local configuration
Creating >act_qmaster< file
Adding default complex attributes
Reading in complex attributes.
Adding default parallel environments (PE)
Reading in parallel environments:
PE "make.sge_pqs_api".
Adding SGE default usersets
Reading in usersets:
Userset "defaultdepartment".
Userset "deadlineusers".
Adding >sge_aliases< path aliases file
Adding >qtask< qtcsh sample default request file
Adding >sge_request< default submit options file
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<
Hit <RETURN> to continue >>
Hit <RETURN>
qmaster/scheduler startup script
We can install the startup script that will
start qmaster/scheduler at machine boot (y/n) [y] >>
n
Grid Engine qmaster and scheduler startup
Starting qmaster and scheduler daemon. Please wait ...
starting sge_qmaster
starting sge_schedd
Hit <RETURN> to continue >>
Hit <RETURN>
Adding Grid Engine hosts
Please now add the list of hosts, where you will later install your execution daemons. These hosts will be also added as valid submit hosts.
Please enter a blank separated list of your execution hosts. You may press <RETURN> if the line is getting too long. Once you are finished simply press <RETURN> without entering a name.
You also may prepare a file with the hostnames of the machines where you plan to install Grid Engine. This may be convenient if you are installing Grid Engine on many hosts.
Do you want to use a file which contains the list of hosts (y/n) [n] >>
n
Adding admin and submit hosts
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are entering an empty list. You will see messages from Grid Engine when the hosts are added.
Host(s):
Hit <RETURN> twice
If you want to use a shadow host, it is recommended to add this host to the list of administrative hosts.
If you are not sure, it is also possible to add or remove hosts after the installation with <qconf -ah hostname> for adding and <qconf -dh hostname> for removing this host
Attention: This is not the shadow host installation procedure. You still have to install the shadow host separately
Do you want to add your shadow host(s) now? (y/n) [y] >>
n
Creating the default <all.q> queue and <allhosts> hostgroup
root@nodeC.ps.univa.com added "@allhosts" to host group list
root@nodeC.ps.univa.com added "all.q" to cluster queue list
Hit <RETURN> to continue >>
Hit <RETURN>
Scheduler Tuning
The details on the different options are described in the manual.
Configurations
- Normal
Fixed interval scheduling, report scheduling information, actual + assumed load
- High
Fixed interval scheduling, report limited scheduling information, actual load
- Max
Immediate Scheduling, report no scheduling information, actual load
Enter the number of your prefered configuration and hit <RETURN>!
Default configuration is [1] >>
1
Using Grid Engine
You should now enter the command:
source /opt/sge-root/default/common/settings.csh
if you are a csh/tcsh user or
# . /opt/sge-root/default/common/settings.sh
if you are a sh/ksh user.
This will set or expand the following environment variables:
- $SGE_ROOT (always necessary)
- $SGE_CELL (if you are using a cell other than >default<)
- $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
- $SGE_EXECD_PORT (if you haven't added the service >sge_execd<)
- $PATH/$path (to find the Grid Engine binaries)
- $MANPATH (to access the manual pages)
Hit <RETURN> to see where Grid Engine logs messages >>
Hit <RETURN>
Grid Engine messages
Grid Engine messages can be found at:
/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)
After startup the daemons log their messages in their spool directories.
Qmaster: /opt/sge-root/default/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages
Grid Engine startup scripts
Grid Engine startup scripts can be found at:
/opt/sge-root/default/common/sgemaster (qmaster and scheduler)
/opt/sge-root/default/common/sgeexecd (execd)
Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
n
Your Grid Engine qmaster installation is now completed
Please now login to all hosts where you want to run an execution daemon and start the execution host installation procedure.
If you want to run an execution daemon on this host, please do not forget to make the execution host installation in this host as well.
All execution hosts must be administrative hosts during the installation. All hosts which you added to the list of administrative hosts during this installation procedure can now be installed.
You may verify your administrative hosts with the command
# qconf -sh
and you may add new administrative hosts with the command
# qconf -ah <hostname>
Please hit <RETURN> >>
Hit <RETURN>
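When something goes wrong, start with the messages files that the installer listed. The filter below is a sketch. It assumes the pipe-separated line layout written by Grid Engine 6 daemons: date and time, daemon, host, severity letter, message text. 'show_sge_errors' is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): print only error (E) and critical (C)
# entries from Grid Engine messages files. Assumes each line has the layout
#   date time|daemon|host|level|message
show_sge_errors() {
    awk -F'|' '$4 == "E" || $4 == "C"' "$@"
}
```

For example, run 'show_sge_errors /opt/sge-root/default/spool/qmaster/messages'. To follow a log while a daemon starts, you can also run 'tail -f' on the same file.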
This completes the first part of the SGE installation and configuration. Before continuing, set up your environment by sourcing the settings file:
[root@nodeC sge-root]# source /opt/sge-root/default/common/settings.sh
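To confirm that the settings file took effect, check that $SGE_ROOT is set and that the Grid Engine client commands are on the PATH. 'check_sge_env' is a hypothetical helper, and it checks only these two things.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): sanity-check the environment that
# settings.sh is expected to set up.
check_sge_env() {
    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set; source /opt/sge-root/default/common/settings.sh first" >&2
        return 1
    fi
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf is not in PATH" >&2
        return 1
    fi
    # SGE_CELL is only set when a non-default cell is used.
    echo "SGE_ROOT=$SGE_ROOT SGE_CELL=${SGE_CELL:-default}"
}
```

For example, 'check_sge_env && qconf -sh' prints the root directory and cell, then the list of administrative hosts.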
You can verify that nodeC is configured as an SGE administrative host by running:
[root@nodeC sge-root]# qconf -sh
nodeC.ps.univa.com
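To register more hosts later, you can combine the installer's 'qconf -ah' command with 'qconf -as' (add submit host) in a loop over a host file. This is a sketch. 'add_sge_hosts' and the host file layout (one hostname per line) are assumptions of this example, not installer features.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): register each host listed in a file
# (one hostname per line; blank lines and lines starting with '#' are skipped)
# as an administrative host (qconf -ah) and a submit host (qconf -as).
# Requires the Grid Engine environment (settings.sh) to be loaded.
add_sge_hosts() {
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        qconf -ah "$host" && qconf -as "$host"
    done < "$1"
}
```

For example, list the hypothetical hosts nodeA and nodeB in /root/sge_hosts and run 'add_sge_hosts /root/sge_hosts'. Then check the result with 'qconf -sh' (administrative hosts) and 'qconf -ss' (submit hosts).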
Next, nodeC needs to be configured as an execution host. Run the following command and, as before, enter the indicated value at each prompt:
[root@nodeC sge-root]# /opt/sge-root/install_execd
Welcome to the Grid Engine execution host installation
If you haven't installed the Grid Engine qmaster host yet, you must execute this step (with >install_qmaster<) prior the execution host installation.
For a sucessfull installation you need a running Grid Engine qmaster. It is also neccesary that this host is an administrative host.
You can verify your current list of administrative hosts with the command:
# qconf -sh
You can add an administrative host with the command:
# qconf -ah <hostname>
The execution host installation will take approximately 5 minutes.
Hit <RETURN> to continue >>
Hit <RETURN>
Checking $SGE_ROOT directory
The Grid Engine root directory is:
$SGE_ROOT = /opt/sge-root
If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/sge-root] >>
Hit <RETURN>
Grid Engine cells
Please enter cell name which you used for the qmaster installation or press <RETURN> to use [default] >>
Hit <RETURN> to accept the default
Checking hostname resolving
This hostname is known at qmaster as an administrative host.
Hit <RETURN> to continue >>
Hit <RETURN>
Local execd spool directory configuration
During the qmaster installation you've already entered a global execd spool directory. This is used, if no local spool directory is configured.
Now you can enter a local spool directory for this host.
Do you want to configure a local spool directory for this host (y/n) [n] >>
n
Creating local configuration
root@nodeC.ps.univa.com modified "nodeC.ps.univa.com" in configuration list
Local configuration for host >nodeC.ps.univa.com< created.
Hit <RETURN> to continue >>
Hit <RETURN>
execd startup script
We can install the startup script that will start execd at machine boot (y/n) [y] >>
n
Grid Engine execution daemon startup
Starting execution daemon. Please wait ...
starting sge_execd
Hit <RETURN> to continue >>
Hit <RETURN>
Adding a queue for this host
We can now add a queue instance for this host:
- it is added to the >allhosts< hostgroup
- the queue provides 2 slot(s) for jobs in all queues referencing the >allhosts< hostgroup
You do not need to add this host now, but before running jobs on this host it must be added to at least one queue.
Do you want to add a default queue instance for this host (y/n) [y] >>
y
root@nodeC.ps.univa.com modified "@allhosts" in host group list
root@nodeC.ps.univa.com modified "all.q" in cluster queue list
Using Grid Engine
You should now enter the command:
source /opt/sge-root/default/common/settings.csh
if you are a csh/tcsh user or
# . /opt/sge-root/default/common/settings.sh
if you are a sh/ksh user.
This will set or expand the following environment variables:
- $SGE_ROOT (always necessary)
- $SGE_CELL (if you are using a cell other than >default<)
- $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
- $SGE_EXECD_PORT (if you haven't added the service >sge_execd<)
- $PATH/$path (to find the Grid Engine binaries)
- $MANPATH (to access the manual pages)
Hit <RETURN> to see where Grid Engine logs messages >>
Hit <RETURN>
Grid Engine messages
Grid Engine messages can be found at:
/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)
After startup the daemons log their messages in their spool directories.
Qmaster: /opt/sge-root/default/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages
Grid Engine startup scripts
Grid Engine startup scripts can be found at:
/opt/sge-root/default/common/sgemaster (qmaster and scheduler)
/opt/sge-root/default/common/sgeexecd (execd)
Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
n
This completes the installation and configuration of SGE.
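With the qmaster and execution daemon running on nodeC, a short test job confirms that the cluster accepts and schedules work. The sketch assumes that qsub prints its usual confirmation line ('Your job <n> ("<name>") has been submitted') and that settings.sh has been sourced. 'job_id_from_qsub' and 'sge_smoke_test' are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): extract the job number from qsub's
# confirmation line, which has the form:
#   Your job 42 ("smoketest") has been submitted
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Hypothetical helper: show the hosts, submit a short job read from standard
# input (qsub reads the job script from stdin when no file is given), then
# show the full queue listing.
sge_smoke_test() {
    qhost || return 1
    id=$(echo "sleep 30" | qsub -N smoketest | job_id_from_qsub)
    if [ -z "$id" ]; then
        echo "qsub did not report a job id" >&2
        return 1
    fi
    echo "submitted job $id"
    qstat -f
}
```

Run 'sge_smoke_test' on nodeC. The job should appear in the 'qstat -f' listing under all.q and drop out of the listing once it has finished.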