| Queue Name | CPU Limit |
|---|---|
| short | 180 min. |
| medium | 300 min. |
| large | infinite |

Options

| Specification | bsub option |
|---|---|
| queue specification | `-q queue-name` |
| host specification | `-m host-name` |
| input specification | `-i input-file` |
| output specification | `-o output-file` |
| error specification | `-e error-log` |

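
To show how the queue limits and the options fit together, here is a minimal sketch that picks the smallest queue whose CPU limit covers a job's expected CPU time, then prints the corresponding bsub command instead of running it. The function names `pick_queue` and `bsub_command` and the file names are hypothetical placeholders. The limits are copied from the table above and, like all d0mino limits, may change.

```sh
# Hypothetical helper: choose the smallest queue whose CPU limit
# (180 min. for short, 300 min. for medium, none for large) covers
# the expected CPU time, given in whole minutes.
pick_queue() {
    minutes=$1
    if [ "$minutes" -le 180 ]; then
        echo short
    elif [ "$minutes" -le 300 ]; then
        echo medium
    else
        echo large    # no CPU limit
    fi
}

# Hypothetical helper: print (not run) a bsub command using the
# -q, -i, -o and -e options from the table.
# Usage: bsub_command MINUTES INPUT OUTPUT ERRLOG COMMAND
bsub_command() {
    queue=$(pick_queue "$1")
    echo "bsub -q $queue -i $2 -o $3 -e $4 $5"
}

bsub_command 240 myjob.in myjob.out myjob.err ./myjob.sh
# prints: bsub -q medium -i myjob.in -o myjob.out -e myjob.err ./myjob.sh
```

Printing the command rather than executing it lets you check the queue choice before you actually submit anything.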
Depending on your personal environment, these might not be the best steps or the only steps, but they should give you an idea of what you need to do.
To display an X window from a batch job, you need to do two things: set your DISPLAY environment variable so the window has somewhere to go, and give the job the authority to display on the machine you are sitting at (which, seen from d0mino, is the remote machine).
1) On the machine where you want the X window to display (the one you are sitting at), I would recommend running a command like the following, replacing localmachinename with the name of that machine:
```sh
xauth extract - localmachinename.fnal.gov:0 | rsh d0mino xauth merge -
```
There are other commands you could run, like "xhost + d0mino", but that is a bad idea because anyone on d0mino could then spy on you. Either command gives X applications run from your account on d0mino the ability to display on localmachinename.fnal.gov; the xhost version gives that ability to every other user on d0mino as well.
2) Now that you have the X permissions set up, you just need to run your job. For ROOT, you could use a script containing this:
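```sh
#!/bin/sh
# Fermilab login environment (needed for the "setup" commands below)
. /usr/local/etc/fermi.shrc
# Set up the products the job needs
setup n32
setup D0RunII
# Send X windows to your own machine (replace localmachinename)
export DISPLAY=localmachinename.fnal.gov:0.0
# Open an xterm on your display and start ROOT inside it
xterm -e root
```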
Make sure you do a "chmod +x" on the script so that it is executable, and then submit it with bsub.
After the bsub, your job is put in the queue; when it actually runs, you should get an xterm on your display, followed by the ROOT splash screen.
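
If you want the batch script to stop with a clear message when DISPLAY was never set, rather than letting xterm fail later, a guard like the following could go before the xterm line. This is a hypothetical addition, not part of the recipe above; the function name `check_display` is made up for illustration.

```sh
# Hypothetical guard: fail early if DISPLAY is unset or empty,
# since the X window would then have nowhere to go.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; the X window has nowhere to go" >&2
        return 1
    fi
    echo "X windows will be sent to $DISPLAY"
}

# In the batch script, just before "xterm -e root":
#   check_display || exit 1
```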
The helpdesk has been getting a number of calls about batch jobs on d0mino, mostly of the sort "why is my job not running?". The batch queues have gotten a lot of use recently. Here is a short summary of how the LSF queuing system works. Note that the limits and parameters change as we tune things.
The command bqueues shows the queues and a summary of their status (see the sample d0mino output under "LSF Batch Scheduling" below).
bqueues lists the queues in priority order (the PRIO column): "short" jobs are run before "medium" jobs, and so on. MAX is the maximum number of jobs that will run in the queue, JL/U is the maximum number of jobs a single user can run in it at any time, NJOBS is the total number of jobs in the queue (PEND + RUN + SUSP), PEND is the number waiting, RUN is the number running, and SUSP is the number suspended.
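
As an illustration of reading these columns, here is a small sketch that filters bqueues output down to the queues that have pending jobs. It assumes the column order in the d0mino sample (QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP) and a single header line; the function name `pending_summary` is hypothetical.

```sh
# Hypothetical filter: read bqueues output on stdin and report every
# queue with pending jobs. Field numbers assume the column order above:
# $8 = NJOBS, $9 = PEND, $10 = RUN, $11 = SUSP.
pending_summary() {
    awk 'NR > 1 && $9 > 0 {
        printf "%s: %d pending, %d running, %d suspended (NJOBS=%d)\n", $1, $9, $10, $11, $8
    }'
}

# Typical use on d0mino:
#   bqueues | pending_summary
```

On the sample output this would report only the medium and large queues.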
Scheduling is FAIRSHARE. That means jobs are not run strictly in queue priority order; instead, each user is given a fair share. LSF calculates a priority for each user from the number of jobs they have running, the CPU time they have used, and the wall-clock time they have used, and starts jobs for the highest-priority users first (see the bhpart output below).
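
The priority calculation can be sketched with a simplified form of LSF's documented fairshare formula, priority = shares / (cpu_hours × CPU_TIME_FACTOR + run_hours × RUN_TIME_FACTOR + (1 + started_jobs) × RUN_JOB_FACTOR). The factor values (0.7, 0.7, 3.0) are an assumption: they are the usual LSF defaults, and with CPU and run times converted from seconds to hours they reproduce the PRIORITY column of the d0mino bhpart sample. Newer LSF releases add further terms. The function name `fairshare_priority` is hypothetical.

```sh
# Simplified LSF fairshare priority. ASSUMED factors: CPU_TIME_FACTOR=0.7,
# RUN_TIME_FACTOR=0.7, RUN_JOB_FACTOR=3.0; times are given in seconds
# (as bhpart shows them) and converted to hours.
# Usage: fairshare_priority SHARES STARTED CPU_TIME_SEC RUN_TIME_SEC
fairshare_priority() {
    awk -v shares="$1" -v started="$2" -v cpu="$3" -v run="$4" 'BEGIN {
        cpu_factor = 0.7; run_factor = 0.7; job_factor = 3.0
        denom = (cpu / 3600) * cpu_factor + (run / 3600) * run_factor + (1 + started) * job_factor
        printf "%.3f\n", shares / denom
    }'
}

fairshare_priority 1 0 3663.8 0          # evansde in the sample: 0.269
fairshare_priority 1 70 101263.4 64301   # kaefer in the sample: 0.004
fairshare_priority 1 0 0 0               # a brand-new user: 0.333
```

The last line shows why a user with one job and nothing running moves to the front: with no history the denominator is only the RUN_JOB_FACTOR term, so their priority (0.333) beats every user in the sample.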
In the bhosts -l example below, the CPU utilization on the system (ut, 85%) exceeds the scheduling threshold (loadSched, 0.8), so no more jobs are scheduled.
If the utilization exceeds 0.9 (loadStop), running jobs are suspended.
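
The two thresholds can be summarized in a short sketch. It is a simplification (real LSF checks many load indices and suspends jobs gradually), and the function name `host_action` is hypothetical; the 0.8 and 0.9 values are the loadSched and loadStop settings from the d0mino sample.

```sh
# Simplified view of the ut thresholds from the bhosts -l sample:
# above loadSched (0.8) no new jobs start; above loadStop (0.9)
# running jobs are suspended.
# Usage: host_action UT   (UT as a fraction, e.g. 0.85 for 85%)
host_action() {
    awk -v ut="$1" 'BEGIN {
        sched = 0.8; stop = 0.9
        if (ut > stop)       print "suspend running jobs"
        else if (ut > sched) print "schedule no new jobs"
        else                 print "schedule normally"
    }'
}

host_action 0.85   # the d0mino sample: schedule no new jobs
```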
Note that there is a limit of 200 jobs on the host (there are 176 processors) and 70 jobs per user. Some users submit a lot of jobs, but again, the scheduling is FAIRSHARE: if a user submits one job and has no jobs running, that user moves ahead of users who are currently running a lot of jobs. The new user still has to wait for an available job slot. This is fair: users with a lot of jobs keep the host busy when there is nothing else to do.
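
The slot limits work independently of priority: even the highest-priority user waits until a slot frees up. Here is a minimal sketch of that check, using the MAX (200) and JL/U (70) values from the bhosts -l sample. In that sample the host's 200 slots are held by 172 running plus 28 suspended jobs, so the first argument counts both. The function name `slot_available` is hypothetical.

```sh
# Hypothetical slot check using the d0mino sample limits.
# Usage: slot_available HOST_JOBS USER_JOBS
#   HOST_JOBS: jobs holding slots on the host (running + suspended)
#   USER_JOBS: jobs the submitting user already has running
slot_available() {
    host_max=200   # MAX in bhosts -l
    user_max=70    # JL/U in bhosts -l
    if [ "$1" -ge "$host_max" ]; then
        echo "wait: host full"
    elif [ "$2" -ge "$user_max" ]; then
        echo "wait: user limit reached"
    else
        echo "slot available"
    fi
}

slot_available 200 0   # new user, host closed_Full as in the sample: wait: host full
```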
If you have questions, send mail to helpdesk@fnal.gov.
This page is maintained by D0WebSupport.
LSF Batch Scheduling
```
d0mino: bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
short       35    Open:Active  72   40    -     -     8      0     8    0
medium      30    Open:Active  96   40    -     -     438    390   48   0
sam_lo      25    Open:Active  100  25    -     -     17     0     17   0
mc99        24    Open:Active  9    9     -     -     0      0     0    0
gtr         24    Open:Active  4    4     -     -     0      0     0    0
jes         24    Open:Active  4    2     -     -     0      0     0    0
large       24    Open:Active  100  30    -     -     134    42    88   4
smt         24    Open:Active  3    3     -     -     0      0     0    0
trigsim     24    Open:Active  20   20    -     -     0      0     0    0
bigmem      1     Open:Active  5    2     -     -     0      0     0    0
ttk1        1     Open:Active  20   20    -     -     0      0     0    0
```
```
d0mino: bhpart
HOST_PARTITION_NAME: eq_share
HOSTS: all
SHARE_INFO_FOR: eq_share/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME   RUN_TIME
evansde     1       0.269     0        0         3663.8     0
opeters     1       0.255     0        0         4739.9     0
avto        1       0.184     0        0         12453.5    0
dhan        1       0.155     1        0         1698.9     570
askew       1       0.128     1        0         3651.9     5684
serban      1       0.123     1        0         4240.0     6703
gardnerj    1       0.116     1        0         4893.5     8775
kinyip      1       0.105     1        0         6412.2     11736
magnan      1       0.096     2        0         4313.5     2801
pverdier    1       0.086     0        0         44221.9    0
mverzocc    1       0.032     2        0         36826.9    77916
kuznets     1       0.011     13       0         199976.9   67922
flera       1       0.009     7        0         142827.6   282426
greenlee    1       0.008     19       0         182270.9   134148
yen         1       0.007     1        0         7.5        696355
klchan      1       0.005     8        0         301522.0   658224
kaefer      1       0.004     70       0         101263.4   64301
estrada     1       0.002     29       0         702728.5   991747
molina      1       0.000     30       0         3291425.0  7311884
```
Jobs may remain pending for a long time, but they are usually started within a few hours. "bjobs -p" shows the reason jobs are pending. Here are some of the reasons:
The status of host scheduling is shown with "bhosts -l".
```
d0mino: bhosts -l
HOST  d0mino.fnal.gov
STATUS       CPUF  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV  DISPATCH_WINDOW
closed_Full  1.00  70    200  200    172  28     0      0    -
CURRENT LOAD USED FOR SCHEDULING:
          r15s  r1m  r15m  ut    pg    io     ls   it  tmp    swp  mem
Total     1.0   0.3  0.1   *85%  17.2  4e+04  121  0   4222M  35G  59G
Reserved  0.0   0.0  0.0   0%    0.0   0      0    0   0M     0M   0M
LOAD THRESHOLD USED FOR SCHEDULING:
          r15s  r1m  r15m  ut    pg    io     ls   it  tmp    swp  mem
loadSched -     -    -     *0.8  -     -      -    -   -      -    5000M
loadStop  -     -    -     0.9   -     -      -    -   -      -    -
```