Submitting Jobs
The "Submit Job" screen allows you to start an SnB job
on the local computer or submit it to a batch processing system, such
as PBS or LoadLeveler, if one is available. It also supports submission
to Condor, a system that scavenges unused computing time on a network
of workstations (for more information on Condor, see http://www.cs.wisc.edu/condor/).
These options provide you with convenient ways to take maximum
advantage of the inherently parallel nature of the Shake-and-Bake
algorithm by dividing the trial structures among as many processors as
possible. Thus, jobs can be run in several parts with each subjob
creating its own set of output files. The results, however, are
combined for inspection using the tools provided by the Evaluate
Trials screen.
We (the SnB developers) have a limited number of platforms
available for development and testing. Your system configuration may
differ from ours, and the batch submission options may not work as
expected. In that case, please contact us at snbhelp@hwi.buffalo.edu so
that we can work with you to support your configuration.
There are three sections on this screen: Required Information, Local
Options, and Batch Options. The Required Information section must always be
completed; whether the other two sections are needed depends on the choices
you make there.
-
Required Information
-
Queueing System: Select the queueing system you would
like to use.
None (local machine) runs the job on the machine where
the GUI is running. If you are using X-Windows, note that this is not
necessarily the same as the machine where the GUI is being displayed.
PBS will submit the job to a PBS queue. The 'qsub'
program must be installed and configured on your local machine, even if
PBS is actually submitting jobs to a remote machine.
LoadLeveler submits the job to a LoadLeveler queue on an
IBM SP system.
Condor allows submission to a Condor flock.
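If you are unsure which of these systems is available on your machine, one
quick check is to look for their submission commands on your PATH. The short
sketch below assumes the conventional command names (qsub for PBS, llsubmit
for LoadLeveler, condor_submit for Condor); your site may use different names
or wrapper scripts.

    # Check which batch submission commands are on this machine's PATH.
    # The command names are the conventional ones and may differ at your site.
    import shutil

    for system, command in (("PBS", "qsub"),
                            ("LoadLeveler", "llsubmit"),
                            ("Condor", "condor_submit")):
        status = "found" if shutil.which(command) else "not found"
        print(f"{system:12s} {command:15s} {status}")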
-
Don't Run SnB: Clicking "Yes" generates
the dat files required to run SnB without actually
starting the job. This is useful if you want to run SnB
via a batch queueing system that is not supported directly by SnB.
Given the dat files, you can write a script that submits the job to the
batch queueing system used at your site.
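A minimal sketch of such a wrapper is shown below. It uses Slurm's sbatch
purely as an example of a queueing system that SnB does not support directly,
and it assumes that each subjob's input is a file named <prefix>_<n>.dat and
that your installation provides a command-line executable (called snb here)
that takes that file as its argument; check your own installation for the
actual file names and invocation.

    #!/usr/bin/env python3
    # Sketch of a wrapper that submits SnB subjobs to a queueing system the
    # GUI does not support directly (Slurm is used only as an example).
    # Assumes "Don't Run SnB" was set to Yes, that each subjob's input is
    # <prefix>_<n>.dat, and that "snb <datfile>" runs a job at your site.
    import subprocess

    PREFIX = "mydata"      # "File name prefix for results" (example value)
    N_PROCESSES = 8        # "Number of SnB processes to run"

    for n in range(N_PROCESSES):
        datfile = f"{PREFIX}_{n}.dat"
        script = f"#!/bin/sh\nsnb {datfile}\n"       # one-line batch script
        subprocess.run(["sbatch", f"--job-name={PREFIX}_{n}"],
                       input=script, text=True, check=True)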
-
File name prefix for results: All
files that are generated by the SnB run will start with
the prefix entered here. Appended to this prefix will be an underscore
and a number ranging from zero to one less than the number of SnB
processes you request (see the next variable). Do NOT use an
underscore in the prefix name itself (hyphens are OK).
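For instance (the prefix and count here are only example values), requesting
four SnB processes with the prefix mydata yields files whose names begin with
mydata_0 through mydata_3:

    # Illustration of the naming convention described above:
    # <prefix>_<n> for n = 0 .. (number of SnB processes - 1).
    prefix = "mydata"      # example prefix; no underscores allowed in it
    n_processes = 4        # "Number of SnB processes to run"

    for n in range(n_processes):
        print(f"{prefix}_{n}")   # mydata_0, mydata_1, mydata_2, mydata_3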
-
Number of SnB processes to run: If the local
run method is selected, the GUI will initiate this many processes on
the local machine. If you select one of the batch methods (PBS, LoadLeveler, Condor),
this variable indicates the number of nodes to be requested from the
batch queueing system.
-
Local Options
-
Priority: Used to choose the "nice" value at
the time of job submission. If you are sharing a machine and wish to
run a background job, choose "low" priority.
-
Process jobs: When you have
finished filling in all the required fields, click this button to begin
processing the job.
-
Batch Options
-
Queue: Select the queue for PBS and LoadLeveler jobs.
Condor does not support different queues.
-
Copy input files to remote machine(s): Select
"yes" if you want to copy all input files to the machine
where the job will be run. When SnB is finished, it will
copy the output files back to the working directory on the local
machine. Copying the files does not significantly improve overall performance,
because the only substantial I/O occurs at the start of the job. It is
nevertheless recommended for remote cluster machines, which typically have
limited disk and network I/O capacity; their network and disk subsystems
could become overloaded when a job starts.
-
Remote directory: The directory for staging files. You
need to supply this information only if you selected "yes"
for "copy input files to remote machine." If your batch
environment provides a temporary directory name in an environment
variable, you can enter that here.
-
Queue type: Your choices are serial, parallel (shared
memory), and parallel (cluster). For example, suppose you entered
"8" for the number of SnB processes to run (in
the required information section). Choosing serial causes
eight single-processor jobs to be submitted to the queue that you selected,
while either parallel selection submits a single eight-processor job. The
difference between the two is that the parallel
shared memory option uses cp to stage files,
whereas the parallel cluster option uses rcp (a
shared file system is not assumed). When running LoadLeveler jobs, you
are not prompted for this item.
Shared memory machines include the SGI Origin2000, Sun
Enterprise 10000, and any other machine that has multiple processors in
the same physical unit. On these machines you should select parallel
shared memory as the queue type.
Cluster machines include the IBM SP and Beowulf-style
clusters. Clusters consist of two or more distinct computers that are
coupled together via software. For these machines you should select parallel
cluster as the queue type.
Serial can be chosen for either shared memory or
cluster computers. Whether you choose serial or one of the parallel
options is a matter of preference. A serial job starts as soon as a single
processor is free, whereas a parallel job that requires n processors must
wait until n processors are free. Your computing site may also limit how
many jobs you can have running and how many processors you can allocate to a
single parallel job; these limits will influence which option you should
choose. If you are unsure, contact the administrator of the machine you are
using.
-
Tasks per node (LoadLeveler only): The number of tasks
to start on each SP node. If you are utilizing SMP nodes, you can set
this number to the number of processors in each node. Then, the total
number of processors that your job will use is equal to (tasks per
node)*(number of nodes).
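For example (using illustrative values), running 4 tasks per node on 2 nodes
uses 8 processors in total:

    # Total processors, per the formula above:
    # (tasks per node) * (number of nodes).
    tasks_per_node = 4     # example: 4-way SMP nodes
    number_of_nodes = 2    # "Number of nodes (LoadLeveler only)"
    print(tasks_per_node * number_of_nodes)   # 8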
-
Number of nodes (LoadLeveler only): The number of nodes
to allocate for the job.
-
Process jobs: When you have finished filling in all the
required fields, click this button to submit your job to the batch
system that you have selected.