Development

De-coupling DØRunjob and SAMGrid

DØRunjob and SAMGrid have been decoupled: the jim_job_manager package no longer needs to supply a DZero workflow description. Under the new interface, jim_job_manager constructs an input file containing minimal information about the site and job, essentially just passing the JDL data to DØRunjob.

DØRunjob parses the input file using the standard Python ConfigParser module. An example file is shown below. It contains enough information for a 'workflow factory' to construct production jobs.

# Example input file used in SAMGrid MC production
# collects JIM-specific and site-specific info
[jdl]
requestid = 60592

numevents=2

originname = clued0
producedforname = Peter Love

jobname = d0rj-123
optionalmonitors = jim

jimglobaljobid = 123
jimclusterjobid = 567
jimlocaljobid = 890

jobcachearea = /work/blackpool-clued0/plove/jobs

loglevel = INFO

distro = /work/blackpool-clued0/plove/mcc-dist
cardfiles = /work/blackpool-clued0/plove/cardfiles
minbidir = /work/blackpool-clued0/plove/minbias

facilityname = clued0
producedbyname = plove
destinationpath = samgfarm2.fnal.gov:/data/sam/disk
samtemplocation = sam@samgfarm2.fnal.gov:/data/sam/tmp_buffer
stationname = samgfarm
samusername = sam
samgroup = test 

foo = bar
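
As a rough illustration of how a file like the one above can be read, the following sketch parses an abbreviated copy of the example with the standard library parser. It is not DØRunjob's own code: the helper name load_jdl is hypothetical, and the module is spelled configparser here (its Python 3 name) rather than ConfigParser.

```python
import configparser  # named ConfigParser in the Python 2 releases DØRunjob targeted

# Abbreviated copy of the MC production example above.
EXAMPLE = """\
# Example input file used in SAMGrid MC production
[jdl]
requestid = 60592
numevents=2
jobname = d0rj-123
loglevel = INFO
"""


def load_jdl(text):
    """Parse input-file text and return the [jdl] section as a plain dict.

    load_jdl is a hypothetical helper name, not part of DØRunjob.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["jdl"])


jdl = load_jdl(EXAMPLE)
request_id = int(jdl["requestid"])  # values arrive as strings; convert explicitly
num_events = int(jdl["numevents"])
print(jdl["jobname"], request_id, num_events)
```

All values come back as strings, so numeric fields such as requestid and numevents have to be converted by whatever code consumes them.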

Example used for reprocessing; requestid is omitted and inputfiles is added:

[jdl]
numrecords = 2

originname = clued0
producedforname = Peter Love

jobname = d0rj-test-123
optionalmonitors = jim

inputfiles = /work/blackpool-clued0/plove/dump/all_0000195142_001.raw
processid = 9497248
jimglobaljobid = 3212
jimclusterjobid = 567
jimlocaljobid = 890
appversion = p17.03.03-test
appfamily = reconstruction
appname = d0reco-test

jobcachearea = /work/blackpool-clued0/plove/jobs

loglevel = DEBUG
distro = /work/blackpool-clued0/plove/RTE_runjob

facilityname = clued0
producedbyname = plove
destinationpath = samgfarm2.fnal.gov:/data/sam/disk
samtemplocation = sam@samgfarm2.fnal.gov:/data/sam/tmp_buffer
stationname = samgfarm
samusername = sam
samgroup = test 

job_type = dzero_reconstruction
sam_experiment = d0
sam_universe = prd
group = d0production
station_name = osg-ouhep
check_consistency = true
test_run = true
instances = 1
d0_release_version = p17.09.06
jobfiles_dataset = d0repro_jobfiles_p20.07.00_samgridV7-2-5
input_dataset =  
grid_resource_requirement_string = cmsosgce.fnal.gov:2119/jobmanager-condor
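
The MC production example carries requestid, while the reprocessing example carries inputfiles instead. One way a factory could branch on that difference is sketched below. This is only an assumption about how the choice might be made: the function job_kind and the labels it returns are hypothetical and do not come from DØRunjob.

```python
import configparser


def job_kind(jdl):
    """Guess the workflow kind from the keys present in a [jdl] mapping.

    Assumption: 'requestid' marks MC production and 'inputfiles' marks
    reprocessing, as in the two examples. The returned labels are hypothetical.
    """
    has_request = "requestid" in jdl
    has_inputs = "inputfiles" in jdl
    if has_request and not has_inputs:
        return "mc_production"
    if has_inputs and not has_request:
        return "reprocessing"
    raise ValueError("cannot tell job kind: need exactly one of requestid, inputfiles")


def jdl_section(text):
    """Parse input-file text and return its [jdl] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["jdl"]


mc = jdl_section("[jdl]\nrequestid = 60592\nnumevents = 2\n")
repro = jdl_section(
    "[jdl]\nnumrecords = 2\n"
    "inputfiles = /work/blackpool-clued0/plove/dump/all_0000195142_001.raw\n"
)
print(job_kind(mc), job_kind(repro))
```

Raising an error when neither key, or both keys, are present keeps an ambiguous input file from silently selecting the wrong workflow.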

DØ CVS access

DØRunjob uses packages from Shahkar and MCPS; these must be checked out along with DØRunjob using the commands below. A version of the frozen SAM client compiled for Python 2.4 must also be downloaded and unpacked.

ftp://fnkits.fnal.gov/ftp/products/sam/v7_4_0a_py24/Linux+2/sam_v7_4_0a_py24_Linux+2.tar.gz
mkdir sam-py24; cd sam-py24
tar xfz sam_v7_4_0a_py24_Linux+2.tar.gz

export CVSROOT=cvsuser@d0cvs.fnal.gov:/cvsroot/d0cvs
cvs co D0Runjob

export CVSROOT=:pserver:anonymous@cdcvs.fnal.gov:/cvs/cd_read_only
cvs co Shahkar

export CVSROOT=:pserver:anonymous@cdcvs.fnal.gov:/cvs/uscms
cvs co MCPS

The following environment variables need to be set when using DØRunjob:

export PROJECTS=/home/plove/projects

export PYTHONPATH=$PROJECTS/D0Runjob/d0runjob:$PYTHONPATH
export PYTHONPATH=$PROJECTS/MCPS/MCPSRunjob/Python:$PYTHONPATH
export PYTHONPATH=$PROJECTS/Shahkar/Python:$PYTHONPATH

export PYTHONPATH=$PROJECTS/sam-py24/lib/sam_pyclib.zip:$PYTHONPATH
export PYTHONPATH=$PROJECTS/sam-py24/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$PROJECTS/sam-py24/lib
export SAM_DB_SERVER_NAME="SAMDbServer.clued0_prd2:SAMDbServer"
export SAM_LOG_SERVER_ADDR="d0ora2.fnal.gov:40583"
export SAM_NAMING_SERVICE_IOR="IOR:000000000000002a49444c3a6f6f632e636f6d2f436f734e616d696e672f4f424e616d696e67436f6e746578743a312e3000000000000001000000000000002c000100000000001064306f7261322e666e616c2e676f7600233200000000000c4e616d655365727669636500"

export SHAHKAR_FACTORY_EXT=$PROJECTS/D0Runjob/xml/d0factory.xml
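
Because a missing variable tends to surface only as an import or connection failure later on, a small preflight check can help. The sketch below assumes the variable names from the export lines above. The helper names missing_variables and missing_path_entries are hypothetical and are not part of DØRunjob.

```python
import os

# Variable names taken from the export lines above.
REQUIRED = (
    "PROJECTS",
    "PYTHONPATH",
    "LD_LIBRARY_PATH",
    "SAM_DB_SERVER_NAME",
    "SAM_LOG_SERVER_ADDR",
    "SAM_NAMING_SERVICE_IOR",
    "SHAHKAR_FACTORY_EXT",
)


def missing_variables(environ, required=REQUIRED):
    """Return the required variable names that are unset or empty in environ."""
    return [name for name in required if not environ.get(name)]


def missing_path_entries(environ, variable="PYTHONPATH"):
    """Return entries of an os.pathsep-separated variable that do not exist on disk."""
    entries = [e for e in environ.get(variable, "").split(os.pathsep) if e]
    return [e for e in entries if not os.path.exists(e)]


if __name__ == "__main__":
    for name in missing_variables(os.environ):
        print("not set:", name)
    for entry in missing_path_entries(os.environ):
        print("PYTHONPATH entry not found:", entry)
```

Both helpers take the environment as an argument rather than reading os.environ directly, so they can be exercised with a plain dict.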

Bug reporting

All problems should be emailed to either of the mailing lists. Please include the steps leading to the problem, with pointers to relevant files.

SAM interface

The dØrjsam package provides support for calling SAM methods via the SAM user API, which must be installed if SAM functions are used within DØRunjob. An interface using SAM web services is also available, and is much cleaner, but SAM web services are not currently supported for production. DØRunjob uses the following SAM methods:

sam.getMetadata()

sam.reserveProject()
sam.startProject()
sam.stopProject()
sam.getProjectInfo()

RCP handling

RCPs are required on the submission node filesystem. DØRunjob edits RCP entries based on macro and default settings in the application schema. A framework RCP is required for most DØ executables. At submission time all RCPTree items are written to the task directory, so no RCPs need to be copied from the DistroDir at runtime.

DØ software setup

DØ executables are packaged in RTE tarballs. These are usually staged by the job from SAM onto the execution node. Once staged, they are unpacked into the task directory and a setup script is sourced to prepare the runtime environment. DØRunjob expects a well-defined interface for setting up execution packages. Executable package setup scripts are expected to do the following:

Setup scripts will be given DistroDir and TaskDir.