How to submit McFarm jobs.

Tomasz Wlodek

University of the Great State of Texas

Abstract:

I give instructions on how to submit McFarm jobs.

Introduction

MC events are generated in two steps. Upon receiving a request from users, the operator has to prepare a script for submitting the generator (parent) job. This job typically generates between 1000 and 50000 events using the pythia or isajet generators.

Once the generator job is completed and its output file is stored in the file server disk cache, the operator submits the children jobs. Usually each child simulates 500 events, read sequentially from the generator output file.
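To illustrate the arithmetic, here is a minimal sketch of how a generator file could be divided among children jobs. The function name and the (offset, count) representation are assumptions made for this example only; they are not taken from McFarm itself.

```python
def split_into_children(total_events, events_per_child=500):
    """Return a list of (first_event_offset, n_events) pairs, one per child job.

    Hypothetical helper: it only illustrates how a generator file with
    total_events events is read sequentially in chunks of events_per_child.
    """
    if total_events < 0 or events_per_child <= 0:
        raise ValueError("total_events must be >= 0 and events_per_child > 0")
    chunks = []
    offset = 0
    while offset < total_events:
        # The last child may get fewer events than the others.
        n_events = min(events_per_child, total_events - offset)
        chunks.append((offset, n_events))
        offset += n_events
    return chunks
```

With the numbers quoted above, a 50000 event generator job would be followed by 100 children jobs of 500 events each.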

Once all children jobs have either completed and delivered their output files to SAM or have errored out, the operator has to manually delete the generator file (the input file of the children jobs) from the disk cache. (This step will be made automatic one day.)

In the following sections I explain how to submit generator and children jobs.

Where to find example scripts: look in the /home/mcfarm/mcfarm_export/templates directory. You will find two scripts there: make_gen for generator job submission and start_run for children job submission.

How to submit generator job.

Warning: Submitting MC jobs is not an exact science. I assume that you have some basic knowledge of HEP generators and mc_runjob data cards. There is no general script that shows how to submit every type of job. The make_gen script is an example you can start with, but it will need to be modified to suit your particular process. You will have to use some of your brain cells to do this, so please try to understand what this script does; do not treat it as a "black box". If you have an idea how to make it better or simpler, let me know.

When a user requests a particular process to be generated, you have to modify the template script and in some cases add new steering cards or replace old ones. I assume that you have some (very basic) knowledge of python and shell scripting.

How to modify the make_gen script:

First of all, look at the script carefully. It consists of several parts.

At the top, the script defines several environment variables. You should fill in the correct values.

OK, we have filled in the environment variables. Now read the rest of the script carefully.

In the second part, the script prepares a "logfile" which will store information about the MC run. This file will be used for further job submission and for storing information for the bookkeeper.

Further down, the script prepares a temporary file which contains mc_runjob cards. It fills this file with values taken from the environment variables defined at the top of the script. This mc_runjob cards file is needed to submit the production.

Further down still (after the ##### define the d0gstar+d0sim+reco+recoA cards ### line) the script prepares a second mc_runjob cards file, which will be used to submit the children jobs when the generator job is done.

Then, below the line ### prepare the script which will extract jobname and generator ####, the script prepares a temporary python script. That python script reads the output of the job submission command, extracts the generator job name and generator output file from it, and appends them to the run logfile.

Finally comes the part where we actually register the job (after the # register the job line).

We copy the mc_runjob script to the configuration scripts directory and execute the command

reg_job $SCRIPT --final_disposition=cache --num_events=$NEVT>$TEMPORARYFILE

which submits the job. The output of the job submission is stored in a temporary file. The python script then analyzes this file, extracts the generator job name and generator file name, and appends them to the logfile.
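The extraction step can be pictured roughly as follows. This is only a sketch: the format of the reg_job output and the format of the run logfile are not shown in this note, so the patterns, the "generator_job" / "generator_file" keywords and the function names below are all invented for illustration. Look at the python script generated by make_gen to see what is really done.

```python
import re

# HYPOTHETICAL patterns: the real reg_job output format is not documented here.
# Adjust them to whatever the submission command actually prints.
JOBNAME_RE = re.compile(r"^jobname\s*[:=]\s*(\S+)", re.MULTILINE)
GENFILE_RE = re.compile(r"^generator_file\s*[:=]\s*(\S+)", re.MULTILINE)


def extract_job_info(output_text):
    """Return (job_name, generator_file) found in the submission output."""
    job = JOBNAME_RE.search(output_text)
    gen = GENFILE_RE.search(output_text)
    if job is None or gen is None:
        raise ValueError("job name or generator file not found in output")
    return job.group(1), gen.group(1)


def append_to_logfile(logfile_path, job_name, generator_file):
    """Append the two values to the run logfile (assumed 'key value' lines)."""
    with open(logfile_path, "a") as log:
        log.write("generator_job %s\n" % job_name)
        log.write("generator_file %s\n" % generator_file)
```

Raising an error when nothing matches is a deliberate choice in this sketch: a submission that produced no job name should not silently leave an incomplete logfile behind.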

End of generator script description. Now let us use it.

Store the script in some temporary directory. Edit it to describe the process you would like to generate. Then execute the command:

./make_gen

The script will run and, when done, it will create a file with the extension "run". This is a plain text file with the run information. Have a look at it. It will contain the generator job name.

Check whether the generator job has finished; once it is done you can submit the children jobs.

You do not have to wait for the generator job to complete before submitting the children jobs, but it is good practice to wait. The generator job may crash, and should this happen you will have to kill all the children jobs.

How to submit children jobs.


Once the generator job is done and archived you can submit its children jobs. In the directory where you submit jobs you should have a file with the extension ".run" containing the run information, and a corresponding mc_runjob script file. The mc_runjob script file has the same name as the run information file, with the extension ".script" appended to it. You do not need to edit any of those files. In fact, do not even think about editing them!

For example, your run information file could be qcd-incl-0.5-PtGt5.0-50000-xxx.run and the mc_runjob file qcd-incl-0.5-PtGt5.0-50000-xxx.run.script.
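Before starting a run it may be worth checking that both files are really there. The following is a minimal sketch of such a check, based only on the naming rule described above; the function names are mine and are not part of McFarm.

```python
import os


def script_for_run(run_file):
    """Return the mc_runjob script name that belongs to a run information file."""
    if not run_file.endswith(".run"):
        raise ValueError("not a run information file: %r" % run_file)
    # The script name is the run file name with ".script" appended.
    return run_file + ".script"


def run_is_complete(run_file):
    """True if both the run information file and its script file exist."""
    return os.path.isfile(run_file) and os.path.isfile(script_for_run(run_file))
```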

To start a run you should execute the start_run script from directory ....

python start_run [run_type] logfile

The run_type parameter must be one of the following: D, DSR or DSRA, depending on whether you would like to submit d0gstar only, d0gstar+sim+reco, or the full d0gstar+sim+reco+recoanalyze production. logfile is the name of the run information logfile (for example qcd-incl-0.5-PtGt5.0-50000-xxx.run). You can submit more than one run in one command, and you can use wildcards in the logfile names.
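If you wrap the submission in your own script, the command line can be assembled as in the sketch below. It only builds the argument list and does not run anything; the helper name is hypothetical, and expanding the wildcards yourself with glob is my assumption (on the command line the shell normally does this for you).

```python
import glob

VALID_RUN_TYPES = ("D", "DSR", "DSRA")


def build_start_run_command(run_type, patterns):
    """Return the argument list for 'python start_run run_type logfile ...'."""
    if run_type not in VALID_RUN_TYPES:
        raise ValueError("run_type must be one of %s" % ", ".join(VALID_RUN_TYPES))
    logfiles = []
    for pattern in patterns:
        # Expand wildcards; sort so the order of runs is reproducible.
        matches = sorted(glob.glob(pattern))
        if not matches:
            raise ValueError("no run information file matches %r" % pattern)
        logfiles.extend(matches)
    return ["python", "start_run", run_type] + logfiles
```

The resulting list could then be handed to subprocess.call from the directory that holds start_run.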

The start_run script reads the run information files, decodes from them the generator file name and the number of events to be generated, and then starts submitting the children jobs. After each child job is submitted, its name is appended to the run information file. This is a rather slow script, so be prepared for it to run for a while.

Once you are done, the run information file will contain the names of all jobs used in this run. At this stage you must manually move the run information file to the directory where the bookkeeper expects to find run information files. Once this is done, your run information will appear on the bookkeeper WWW page after the next update.
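The manual move can be done by hand with mv, or with a few lines of python as sketched here. The bookkeeper directory is passed in as a parameter because its location is not given in this note; the function name is hypothetical.

```python
import os
import shutil


def deliver_run_file(run_file, bookkeeper_dir):
    """Move a finished run information file into the bookkeeper directory."""
    if not os.path.isfile(run_file):
        raise ValueError("run information file not found: %r" % run_file)
    if not os.path.isdir(bookkeeper_dir):
        raise ValueError("bookkeeper directory not found: %r" % bookkeeper_dir)
    # shutil.move places the file inside bookkeeper_dir and returns its new path.
    return shutil.move(run_file, bookkeeper_dir)
```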