How to submit McFarm jobs.
Tomasz Wlodek
University of the Great State of Texas
Abstract:
I give instructions on how to submit McFarm jobs.
Introduction
MC events are generated in two steps. Upon receiving a request from users,
the operator has to prepare a script to submit the generator (parent) job.
This typically involves generating between 1000 and 50000 events using the
pythia or isajet generators.
Once the generator job is completed and its output file is stored in the file
server disk cache, the operator submits the children jobs. Usually each of them
simulates 500 events, read sequentially from the generator output file.
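As a rough illustration of the arithmetic behind this split, the sketch below cuts a generator file into consecutive chunks, one per child job. The function name child_ranges and the (first_event, n_events) representation are assumptions made for this example; they are not taken from McFarm itself.

```python
def child_ranges(total_events, events_per_child=500):
    """Return (first_event, n_events) pairs covering the generator file in order."""
    if total_events <= 0 or events_per_child <= 0:
        raise ValueError("event counts must be positive")
    ranges = []
    first = 0
    while first < total_events:
        # The last child may get fewer events if the total is not a multiple.
        n = min(events_per_child, total_events - first)
        ranges.append((first, n))
        first += n
    return ranges


if __name__ == "__main__":
    jobs = child_ranges(50000)
    print(len(jobs), "children jobs; the last one starts at event", jobs[-1][0])
```

With the typical numbers quoted above, a 50000-event generator job feeds 100 children jobs of 500 events each.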
Once all children jobs have either completed and delivered their output files
to SAM or have errored out, the operator has to manually delete the generator
input file from the disk cache. (This step will be made automatic one day.)
In the following sections I explain how to submit generator and children
jobs.
Where to find example scripts: look in the /home/mcfarm/mcfarm_export/templates
directory. You will find two scripts there: make_gen for generator job submission
and start_run for children jobs submission.
How to submit generator job.
Warning: Submitting MC jobs is not an exact science. I assume that you
have some basic knowledge of HEP generators and mc_runjob data cards. There
is no general script that shows how to submit every type of job.
The make_gen script is an example you can start with, but it will
need to be modified to suit your particular process. You will have to
use some of your brain cells to do this, so please try to understand what
this script does; do not treat it as a "black box". If you have an idea how
to make it better or simpler, let me know.
When a user requests a particular process to be generated, you have to modify
the template script and in some cases add new or replace old steering cards. I
assume that you have some (very basic) knowledge of python and shell scripting.
How to modify the make_gen script:
First of all, look at the script carefully. It consists of several parts.
At the top, it defines several environment variables. You should fill in
the correct values.
- FARM - fill in the IP address of your farm
- GENVERSION D0GSTARVERSION D0SIMVERSION RECOVERSION RECOAVERSION
CARDSVERSION - fill in the versions of executables and datacards to be used.
- PARENT and DECAY - what are the parent and decay particles?
- PTMIN and PTMAX - min and max value of Pt
- NBGND - number of bgnd events to be used
- NEVT - number of events to be generated
- GROUP - which group requests the production? (higgs, top, np)
- PART - this is a parameter to distinguish two identical requests. Usually
set it to "a". When some time later you are requested to repeat exactly the
same production, set it to "b", then "c" and so on.
- REQUEST - what is the request number?
- MAXOPT - compiler options, do not touch it unless you know what you
are doing.
- GEOMETRY which geometry should be used? Plain or mixed.
- If needed, add some additional environment variables to describe
Higgs mass, top mass, masses of other SUSY particles or other parameters.
If you do so, you will have to modify the mc_runjob cards below.
- PROCESS - give a human-readable description of the process used, enclosed
in "" signs, with no spaces. For example "A+B-->C+D;Pt>$PTMIN"
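Before running the script it can help to check that none of the settings above was forgotten. The following is a minimal sketch of such a check, not part of make_gen. It assumes the variables have been exported to the environment (or copied into a dictionary), and the function name check_settings and its rules are hypothetical.

```python
import os

# Variable names are taken from the list above; the check itself is hypothetical.
REQUIRED = [
    "FARM", "GENVERSION", "D0GSTARVERSION", "D0SIMVERSION", "RECOVERSION",
    "RECOAVERSION", "CARDSVERSION", "PARENT", "DECAY", "PTMIN", "PTMAX",
    "NBGND", "NEVT", "GROUP", "PART", "REQUEST", "MAXOPT", "GEOMETRY",
    "PROCESS",
]


def check_settings(env):
    """Return a list of human-readable problems found in the settings mapping."""
    problems = ["%s is not set" % name for name in REQUIRED if not env.get(name)]
    # PROCESS must be a description without spaces.
    if " " in env.get("PROCESS", ""):
        problems.append("PROCESS must not contain spaces")
    # NEVT is an event count, so it should be a whole number.
    nevt = env.get("NEVT", "")
    if nevt and not nevt.isdigit():
        problems.append("NEVT must be a whole number")
    return problems


if __name__ == "__main__":
    for problem in check_settings(os.environ):
        print("WARNING:", problem)
```

An empty list means that every variable has a value; it does not mean that the values are sensible for your process.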
OK, we have filled in the environment variables. Now read the script carefully.
In its second part the script prepares a "logfile" which will store
information about the MC run. This file will be used for further job submission
and for storing information for the bookkeeper.
Further down, the script prepares a temporary file which contains mc_runjob
cards. It fills it with values extracted from the environment variables in
the head part of the script. This mc_runjob cards file will be needed to submit
the production.
Yet further below (after the ##### define the d0gstar+d0sim+reco+recoA cards
### line) the script prepares a second mc_runjob cards file which will be used
to submit the children jobs when the generator job is done.
Then, below the line ### prepare the script which will extract jobname and
generator ####, we prepare a temporary python script which will read the output
of the job submission command, extract from it the generator job name and
generator output file, and append them to the run logfile.
Finally comes the part where we actually register the job (after the # register
the job line).
We copy the mc_runjob script to the configuration scripts directory and execute
the command
reg_job $SCRIPT --final_disposition=cache --num_events=$NEVT>$TEMPORARYFILE
which submits the job. The output of the job submission is stored in a temporary
file, which is analyzed by the python script; it extracts the generator
job name and generator file name in order to append them to the logfile.
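To make the idea of that helper concrete, here is a minimal sketch of an extract-and-append step. It is not the script that make_gen generates. The format of the reg_job output is not described in this document, so the two patterns below, as well as the lines written to the logfile, are placeholders invented for illustration only.

```python
import re

# ASSUMPTION: the real reg_job output format is not documented here.
# These two patterns are placeholders for illustration only.
JOB_PATTERN = re.compile(r"^jobname:\s*(\S+)", re.MULTILINE)
FILE_PATTERN = re.compile(r"^generator file:\s*(\S+)", re.MULTILINE)


def extract_job_info(text):
    """Return (job name, generator file name) found in the submission output."""
    job = JOB_PATTERN.search(text)
    gen = FILE_PATTERN.search(text)
    if job is None or gen is None:
        raise ValueError("could not find job name and generator file in output")
    return job.group(1), gen.group(1)


def append_to_logfile(logfile, jobname, genfile):
    """Append both values to the run logfile (hypothetical line format)."""
    with open(logfile, "a") as handle:
        handle.write("generator job: %s\n" % jobname)
        handle.write("generator file: %s\n" % genfile)
```

Failing loudly when nothing matches is a deliberate choice in this sketch: a run logfile without a generator job name would be of no use when the children jobs are submitted later.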
End of generator script description. Now let us use it.
Store the script in some temporary directory. Edit it to describe the
process you would like to generate. Then execute the command:
./make_gen
The script will run and, when done, it will create a file with the extension
"run". It is a plain text file with run information. Have a look at it. It will
contain the generator job name.
Check whether the generator job has finished; when it is done you can submit
the children jobs.
You do not need to wait for the generator job to complete before submitting
children jobs, but it is good practice to wait. The generator job may crash,
and should this happen you will have to kill all children jobs.
How to submit children jobs.
Once the generator job is done and archived you can submit its children
jobs. In the directory where you submit jobs you should have a file with the
extension .run containing the run information and a corresponding mc_runjob
script file. The mc_runjob script file will have the same name as the run
information file but with the extension ".script" appended to it. You do not
need to edit any of those files. In fact, do not even think about editing them!
For example, your run information file could be qcd-incl-0.5-PtGt5.0-50000-xxx.run
and the mc_runjob file qcd-incl-0.5-PtGt5.0-50000-xxx.run.script .
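The naming rule is simple enough to check mechanically before a run is started. The sketch below is a hypothetical helper, not part of McFarm; it derives the script name from the run information file name and reports run files whose script is missing.

```python
import os


def script_for(run_file):
    """Name of the mc_runjob script that belongs to a run information file."""
    if not run_file.endswith(".run"):
        raise ValueError("not a run information file: %s" % run_file)
    # The extension is appended, it does not replace ".run".
    return run_file + ".script"


def missing_scripts(run_files):
    """Return the run information files that have no matching script file."""
    return [name for name in run_files if not os.path.isfile(script_for(name))]
```

For the example above, script_for("qcd-incl-0.5-PtGt5.0-50000-xxx.run") gives "qcd-incl-0.5-PtGt5.0-50000-xxx.run.script".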
To start a run, execute the start_run script from directory ....
python start_run [run_type] logfile
The run_type parameter must be one of the following: D, DSR or DSRA, depending
on whether you would like to submit d0gstar only, d0gstar+sim+reco or the full
d0gstar+sim+reco+recoanalyze production. logfile is the name of the run
information logfile (for example qcd-incl-0.5-PtGt5.0-50000-xxx.run). You can
submit more than one run in one command and you can use wildcards in the
logfile names.
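The command-line rules just described can be sketched as follows. This is not the code of start_run; the function name parse_arguments is hypothetical, and the sketch assumes that run_type always comes first and that wildcards are expanded by the script itself.

```python
import glob
import sys

RUN_TYPES = ("D", "DSR", "DSRA")


def parse_arguments(argv):
    """Split the arguments into (run_type, list of run information files)."""
    if len(argv) < 2:
        raise ValueError("usage: start_run [run_type] logfile ...")
    run_type, patterns = argv[0], argv[1:]
    if run_type not in RUN_TYPES:
        raise ValueError("run_type must be one of: " + ", ".join(RUN_TYPES))
    logfiles = []
    for pattern in patterns:
        # A plain file name is a pattern that matches only itself.
        matches = sorted(glob.glob(pattern))
        if not matches:
            raise ValueError("no run information file matches " + pattern)
        logfiles.extend(matches)
    return run_type, logfiles


if __name__ == "__main__":
    try:
        run_type, logfiles = parse_arguments(sys.argv[1:])
    except ValueError as error:
        sys.exit(str(error))
    for name in logfiles:
        print("would submit", run_type, "children jobs for", name)
```

Rejecting an unknown run_type or an unmatched file name before anything is submitted is cheap compared with cleaning up a half-submitted run.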
The start_run script will read the run information files, decode from them
the generator file name and the number of events to be generated, and then
start submitting the children jobs. After each child job is submitted,
its name will be appended to the run information file. This is a rather slow
script, so be prepared for it to run for a while.
Once you are done, the run information file will contain the names of all jobs
used in this run. At this stage you must manually move the run information
file to the directory where the bookkeeper expects to find run information
files. Once this is done, your run information will appear on the bookkeeper
WWW page after the next update.
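The manual hand-over can be done with a plain mv, or with a few lines of python such as the sketch below. The function name hand_over is hypothetical, and the bookkeeper directory is a parameter because its location is not given in this document.

```python
import os
import shutil


def hand_over(run_file, bookkeeper_dir):
    """Move a run information file into the bookkeeper directory."""
    if not os.path.isdir(bookkeeper_dir):
        raise ValueError("bookkeeper directory does not exist: " + bookkeeper_dir)
    target = os.path.join(bookkeeper_dir, os.path.basename(run_file))
    # Never silently replace the record of an earlier run.
    if os.path.exists(target):
        raise ValueError("refusing to overwrite " + target)
    shutil.move(run_file, target)
    return target
```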