How to run jobs on the D0 production farm
Version 1.0
- Log onto one of the build machines (d0lxbld1-3)
- Go to the scratch area
cd /scratch/7/yourname
- set up the version you wish to run
setup D0RunII <version>
newrel -t <version> <version>
- go into the <version> directory you just made
- check out the latest version of the d0reco scripts from that branch
addpkg d0reco <branch> // for production
addpkg -h d0reco // for test
- type d0setwa to set your working area
- type
cd d0reco/scripts
buildfarmreco
wait a while ....
The default is maxopt from the build.
In special cases where you need to run the debug version or
use modified scripts (runboth...), you may wish to do the following,
but this is generally only for testing. Such code should not
be used for production.
buildfarmreco [maxopt/debug] [/LOCAL]
The LOCAL option uses the latest version of the run scripts you just checked out.
If they are stable, you can omit LOCAL and the scripts will come
from the release <version>.
- buildfarmreco will make 2 things:
- A local standalone reco in <version>
- A tarfile, which will be copied to the d0farm:d0reco area on the farms
- If you need to test the release
- go into the <version> directory
- Do the following
unsetup D0RunII
setup TestData
runboth $recotestdatap07 10
This should process 10 events through reco and reco_analyze.
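The build and test steps above can be collected into one place. The following is a minimal dry-run sketch: it only prints the commands in order and executes none of them, since setup, newrel, addpkg, d0setwa, buildfarmreco and runboth exist only in the D0 environment. The helper names print_build_steps and print_test_steps are hypothetical, the values in the last two lines are placeholders, and the sketch assumes that newrel creates the <version> directory directly under your scratch area.

```shell
# Dry-run sketch: prints the steps in order; nothing is executed.
# print_build_steps and print_test_steps are hypothetical helpers.
print_build_steps() {
    version=$1   # release version (<version> in the text)
    branch=$2    # d0reco branch to check out
    user=$3      # name of your directory under /scratch/7
    cat <<EOF
cd /scratch/7/$user
setup D0RunII $version
newrel -t $version $version
cd $version
addpkg d0reco $branch
d0setwa
cd d0reco/scripts
buildfarmreco
EOF
}

print_test_steps() {
    version=$1
    user=$2
    nevents=${3:-10}   # the text uses 10 events
    # Assumption: <version> sits directly under the scratch area.
    cat <<EOF
cd /scratch/7/$user/$version
unsetup D0RunII
setup TestData
runboth \$recotestdatap07 $nevents
EOF
}

# Placeholder values, for illustration only
print_build_steps preco04.00.04 some-branch yourname
print_test_steps preco04.00.04 yourname
```

Printing the steps first makes it easy to review them before pasting them into a session on a build machine.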
You will need sam to know about your code
- check the applications to see whether your code is in the DB yet.
- If you have db access follow the instructions at:
http://d0db-prd.fnal.gov/sam_admin/updatedb.html
- If not, send an email to sam-admin@fnal.gov requesting that:
family          name          version
reconstruction  d0reco        <version>
analysis        reco_analyze  <version>
be put into the prd and int databases.
You can find the buildfarmreco source at: d0reco/scripts/buildfarmreco
What it does:
- Gets the executables from the release
- Copies the rcp databases from the release
- Gets the files in package rundata/d0reco
- Gets some files like ptable and hmatrix which are accessed via environmentals
- Gets the run scripts from d0reco/scripts
- Puts them all into a subdirectory called <$1>-<version>
- Tars it all up into <version>.tar.
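The packaging steps can be illustrated with a small stand-in. The sketch below is not the buildfarmreco source. It assumes a hypothetical release layout with bin, rcp and scripts subdirectories, and it follows the naming described above: files are collected in <$1>-<version> and archived as <version>.tar. The function name package_release and all paths are made up for illustration.

```shell
# Hypothetical stand-in for the packaging steps; not the real buildfarmreco.
# Assumed layout: $release/bin, $release/rcp, $release/scripts (illustrative).
package_release() (
    build=$1      # maxopt or debug
    version=$2
    release=$3    # directory holding the release files
    outdir=$4     # where the staging directory and tarfile are written
    stage="$outdir/$build-$version"
    mkdir -p "$stage" || exit 1
    for part in bin rcp scripts; do
        # copy each piece of the release into the staging directory
        cp -R "$release/$part" "$stage/" || exit 1
    done
    # tar the staging directory into <version>.tar
    tar -cf "$outdir/$version.tar" -C "$outdir" "$build-$version"
)

# Usage with throwaway directories
demo=$(mktemp -d)
mkdir -p "$demo/release/bin" "$demo/release/rcp" "$demo/release/scripts"
echo "placeholder" > "$demo/release/scripts/runboth"
package_release maxopt preco04.00.04 "$demo/release" "$demo"
tar -tf "$demo/preco04.00.04.tar"
```

The function body runs in a subshell, so a failed copy stops the packaging without ending the calling session.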
runboth takes 2 arguments: the input data file and the number of events.
You can find runboth at:
d0reco/scripts/runboth
It can run D0 code standalone. What it does:
- Hacks environmentals for RCP access
- Hacks environmentals for hmatrix
- Parses the input file name and chooses the correct rcp file to run
- Runs d0reco and reco_analyze and returns an error code that depends on
which one failed.
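The last step, running two programs and returning a code that identifies which one failed, can be sketched as follows. This is not the runboth source. The exit values (1 for the first stage, 2 for the second), the choice to skip the second stage when the first fails, and the function name run_two_stages are all assumptions, since the actual codes are not given here.

```shell
# Sketch of a two-stage run whose exit code identifies the failing stage.
# The values 1 and 2 are assumptions; runboth's real codes are not given here.
run_two_stages() {
    reco_cmd=$1      # command standing in for d0reco
    analyze_cmd=$2   # command standing in for reco_analyze
    $reco_cmd || return 1       # first stage failed; second stage is skipped
    $analyze_cmd || return 2    # second stage failed
    return 0
}

# Usage with the shell built-ins true and false as stand-in programs
run_two_stages true true && echo "both stages ok"
run_two_stages false true || echo "first stage failed, code $?"
run_two_stages true false || echo "second stage failed, code $?"
```

A caller such as a farm job script can then branch on the code to decide whether the reconstruction output is usable.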
- log in to host 'd0bbin' as d0farm
- type
source $HOME/FARM_SETUP
This will set up a pointer to the official version of the farm scripts
and put you into the $HOME/run subdirectory, where you should be running jobs.
- To make a new job for MC, type
make_cert <physics-key> <simversion> <recoversion>
This makes a dataset definition that checks whether a file
has already been processed by <recoversion> and processes only
the files that have not.
- To make a new job for data, type
make_raw <minrun> <maxrun> <recoversion>
where <minrun> and <maxrun> give the range of runs and <recoversion> is the
code version you will be running.
This does not process daq_test runs.
- To run a job, type
runrecocert <dataset definition> <nodes> <recoversion> <queue>
where the run script runrecocert
has parameters: nameOfDataSet noOfNodes recoversion queue db mode restart analysisname events email
Where options are listed in parentheses, the default is the first one.
- nameOfDataSet - name of an existing dataset in the SAM database
(you can run against the production or integration database; see the db parameter)
- noOfNodes - number of PC farm nodes you want to run on
(this number depends on how many nodes are available, how many datasets
are to be run, and how many files are included in the datasets)
- recoversion - version of the reconstruction code, e.g. preco04.00.04
- queue = (TitaniumQ/BlueQ) - the batch queue
- db = (prd/int/dev) - 'prd' to run against the production database,
'int' to run against the integration database
- mode = (recon_root/recon/root) - what you want to run
- restart = (no/yes) - whether to restart a crashed project
- analysisname = (no) - name of the analysis of the crashed project
- events = (0) - number of events to reconstruct ('0' means all)
- email = (d0farm@d0mino.fnal.gov) - mail from the job is sent to this address
- To see running jobs
fbs lj
or listjobs.py
or fbs monitor &
You need to look at the farm_debug documentation to see how jobs are running.
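The positional parameters of runrecocert and their defaults, as listed above, can be summarized in a small wrapper. This is a dry-run sketch: build_runrecocert_cmd is a hypothetical helper that only prints a command line and does not submit anything. It takes the first listed option as the default for each optional parameter and leaves the email argument off unless one is supplied. The dataset name in the usage line is made up.

```shell
# Dry-run sketch: prints a runrecocert command line with defaults filled in.
# build_runrecocert_cmd is hypothetical; defaults are the first listed options.
build_runrecocert_cmd() {
    if [ $# -lt 3 ]; then
        echo "usage: build_runrecocert_cmd dataset nodes recoversion" \
             "[queue db mode restart analysisname events email]" >&2
        return 1
    fi
    dataset=$1
    nodes=$2
    recoversion=$3
    queue=${4:-TitaniumQ}
    db=${5:-prd}
    mode=${6:-recon_root}
    restart=${7:-no}
    analysisname=${8:-no}
    events=${9:-0}          # 0 means all events
    email=${10:-}           # left off the command line when empty
    echo "runrecocert $dataset $nodes $recoversion $queue $db $mode" \
         "$restart $analysisname $events${email:+ $email}"
}

# Made-up dataset name, for illustration only
build_runrecocert_cmd my-dataset 10 preco04.00.04
# prints: runrecocert my-dataset 10 preco04.00.04 TitaniumQ prd recon_root no no 0
```

Spelling out every positional argument this way makes it harder to shift a value into the wrong slot when only some of the later parameters need to change.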
The translation was initiated by Heidi Schellman on 2001-06-06
Heidi Schellman
2001-06-06