De-coupling DØRunjob and SAMGrid
DØRunjob and SAMGrid interactions have changed: the jim_job_manager package no longer needs to supply a DZero workflow description. Under the new interface, jim_job_manager constructs an input file containing minimal information about the site and the job, essentially just passing the JDL data to DØRunjob.
DØRunjob parses the input file with the standard Python ConfigParser module. The example below, used in SAMGrid MC production, contains enough information for a 'workflow factory' to construct production jobs:
# Example input file used in SAMGrid MC production
# collects JIM-specific and site-specific info
[jdl]
requestid = 60592
numevents = 2
originname = clued0
producedforname = Peter Love
jobname = d0rj-123
optionalmonitors = jim
jimglobaljobid = 123
jimclusterjobid = 567
jimlocaljobid = 890
jobcachearea = /work/blackpool-clued0/plove/jobs
loglevel = INFO
distro = /work/blackpool-clued0/plove/mcc-dist
cardfiles = /work/blackpool-clued0/plove/cardfiles
minbidir = /work/blackpool-clued0/plove/minbias
facilityname = clued0
producedbyname = plove
destinationpath = samgfarm2.fnal.gov:/data/sam/disk
samtemplocation = email@example.com:/data/sam/tmp_buffer
stationname = samgfarm
samusername = sam
samgroup = test
foo = bar
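A minimal sketch of how such a file can be read, assuming Python 3, where the module is named `configparser` (DØRunjob itself targets the Python 2 `ConfigParser` module). The helper `read_jdl` is hypothetical and not part of DØRunjob; the sample text is an abbreviated copy of the file above.

```python
import configparser

# Abbreviated copy of the MC production example above.
EXAMPLE = """\
# Example input file used in SAMGrid MC production
# collects JIM-specific and site-specific info
[jdl]
requestid = 60592
numevents = 2
jobname = d0rj-123
loglevel = INFO
stationname = samgfarm
foo = bar
"""


def read_jdl(text):
    """Parse the [jdl] section into a plain dict (hypothetical helper)."""
    # RawConfigParser performs no '%' interpolation: values come back verbatim.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    return dict(parser.items("jdl"))


jdl = read_jdl(EXAMPLE)
# Every value is a string, so numeric fields are converted explicitly.
request_id = int(jdl["requestid"])
num_events = int(jdl["numevents"])
print(request_id, num_events, jdl["jobname"])
```

Because all values are returned as strings, any field used as a number (such as `requestid` or `numevents`) has to be converted by the code that consumes it.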
Example used for reprocessing, which has no requestid and adds inputfiles:
[jdl]
numrecords = 2
originname = clued0
producedforname = Peter Love
jobname = d0rj-test-123
optionalmonitors = jim
inputfiles = /work/blackpool-clued0/plove/dump/all_0000195142_001.raw
processid = 9497248
jimglobaljobid = 3212
jimclusterjobid = 567
jimlocaljobid = 890
appversion = p17.03.03-test
appfamily = reconstruction
appname = d0reco-test
jobcachearea = /work/blackpool-clued0/plove/jobs
loglevel = DEBUG
distro = /work/blackpool-clued0/plove/RTE_runjob
facilityname = clued0
producedbyname = plove
destinationpath = samgfarm2.fnal.gov:/data/sam/disk
samtemplocation = firstname.lastname@example.org:/data/sam/tmp_buffer
stationname = samgfarm
samusername = sam
samgroup = test
job_type = dzero_reconstruction
sam_experiment = d0
sam_universe = prd
group = d0production
station_name = osg-ouhep
check_consistency = true
test_run = true
instances = 1
d0_release_version = p17.09.06
jobfiles_dataset = d0repro_jobfiles_p20.07.00_samgridV7-2-5
input_dataset =
grid_resource_requirement_string = cmsosgce.fnal.gov:2119/jobmanager-condor
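The two examples differ mainly in that the MC file carries a `requestid`, while the reprocessing file has none and lists `inputfiles` instead. A hypothetical sketch of how a workflow factory might use that difference to pick a job type; the rule in `job_kind` is an assumption drawn only from these two examples, not DØRunjob's actual dispatch logic.

```python
import configparser

MC = """\
[jdl]
requestid = 60592
numevents = 2
"""

REPRO = """\
[jdl]
numrecords = 2
jobname = d0rj-test-123
inputfiles = /work/blackpool-clued0/plove/dump/all_0000195142_001.raw
job_type = dzero_reconstruction
input_dataset =
"""


def parse(text):
    """Return the [jdl] section of an input file as a dict of strings."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    return dict(parser.items("jdl"))


def job_kind(jdl):
    """Hypothetical rule: requestid marks MC, inputfiles marks reprocessing."""
    if "requestid" in jdl:
        return "mc"
    if "inputfiles" in jdl:
        return "reprocessing"
    raise ValueError("input file has neither requestid nor inputfiles")


for text in (MC, REPRO):
    jdl = parse(text)
    # An empty value such as 'input_dataset =' is read as ''.
    print(job_kind(jdl), repr(jdl.get("input_dataset")))
```

Keys with an empty value, like `input_dataset` above, are still present in the parsed section with an empty string, so code that consumes them should test for emptiness rather than only for presence.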
DØ CVS access
DØRunjob uses packages from Shahkar and MCPS; these must be checked out along with DØRunjob using the following commands. A version of the frozen SAM client compiled for Python 2.4 must also be downloaded.
ftp://fnkits.fnal.gov/ftp/products/sam/v7_4_0a_py24/Linux+2/sam_v7_4_0a_py24_Linux+2.tar.gz
mkdir sam-py24; cd sam-py24
tar xfz sam_v7_4_0a_py24_Linux+2.tar.gz
export CVSROOT=email@example.com:/cvsroot/d0cvs
cvs co D0Runjob
export CVSROOT=:pserver:firstname.lastname@example.org:/cvs/cd_read_only
cvs co Shahkar
export CVSROOT=:pserver:email@example.com:/cvs/uscms
cvs co MCPS
The following environment variables need to be set when using DØRunjob:
export PROJECTS=/home/plove/projects
export PYTHONPATH=$PROJECTS/D0Runjob/d0runjob:$PYTHONPATH
export PYTHONPATH=$PROJECTS/MCPS/MCPSRunjob/Python:$PYTHONPATH
export PYTHONPATH=$PROJECTS/Shahkar/Python:$PYTHONPATH
export PYTHONPATH=$PROJECTS/sam-py24/lib/sam_pyclib.zip:$PYTHONPATH
export PYTHONPATH=$PROJECTS/sam-py24/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$PROJECTS/sam-py24/lib
export SAM_DB_SERVER_NAME="SAMDbServer.clued0_prd2:SAMDbServer"
export SAM_LOG_SERVER_ADDR="d0ora2.fnal.gov:40583"
export SAM_NAMING_SERVICE_IOR="IOR:000000000000002a49444c3a6f6f632e636f6d2f436f734e616d696e672f4f424e616d696e67436f6e746578743a312e3000000000000001000000000000002c000100000000001064306f7261322e666e616c2e676f7600233200000000000c4e616d655365727669636500"
export SHAHKAR_FACTORY_EXT=$PROJECTS/D0Runjob/xml/d0factory.xml
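Each `export PYTHONPATH=...:$PYTHONPATH` line prepends an entry, so the directory exported last is searched first: here `sam-py24/lib` takes precedence over the D0Runjob package directory. A small Python sketch that replays the exports to show the resulting search order; `replay_exports` is a hypothetical helper, and `/home/plove/projects` is the `PROJECTS` value from the listing.

```python
import os

PROJECTS = "/home/plove/projects"

# Entries in the order they are exported in the listing above.
EXPORTS = [
    "D0Runjob/d0runjob",
    "MCPS/MCPSRunjob/Python",
    "Shahkar/Python",
    "sam-py24/lib/sam_pyclib.zip",
    "sam-py24/lib",
]


def replay_exports(projects, entries, initial=""):
    """Replay successive `export PYTHONPATH=<entry>:$PYTHONPATH` lines."""
    path = initial
    for entry in entries:
        # Same string the shell builds: new entry, a colon, then the old value.
        path = os.path.join(projects, entry) + ":" + path
    return path


result = replay_exports(PROJECTS, EXPORTS)
# Starting from an empty PYTHONPATH leaves a trailing ':', as in the shell.
search_order = [p for p in result.split(":") if p]
for position, entry in enumerate(search_order, 1):
    print(position, entry)
```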
Report all problems by email to either of the mailing lists. Please include the steps that led to the problem and pointers to the relevant files.
The dØrjsam package provides support for calling SAM methods via the SAM user API, which must be installed if SAM functions are used within DØRunjob. An interface using SAM web services is also available (and much cleaner), but SAM web services are not currently supported for production. DØRunjob uses the following SAM methods:
sam.getMetadata()
sam.reserveProject()
sam.startProject()
sam.stopProject()
sam.getProjectInfo()
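The arguments and return values of these methods are not given here, so the sketch below does not call the real SAM API. It uses a stand-in object, `FakeSam` (hypothetical), that only records which method names are called. It illustrates one plausible project lifecycle, assuming a project is reserved and started, queried while it runs, and always stopped, even if the payload fails.

```python
class FakeSam:
    """Stand-in for the sam module that only records method names.

    It accepts and ignores any arguments; the real sam user API
    signatures are not reproduced here.
    """

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only invoked for attributes not found normally, e.g. startProject.
        def method(*args, **kwargs):
            self.calls.append(name)
            return {}
        return method


def run_project(sam, payload):
    """Hypothetical lifecycle: reserve, start, run payload, always stop."""
    sam.reserveProject()
    sam.startProject()
    try:
        payload(sam)
        sam.getProjectInfo()
    finally:
        # Release the project even if the payload raised.
        sam.stopProject()


sam = FakeSam()
run_project(sam, lambda s: s.getMetadata())
print(sam.calls)
```

The `try`/`finally` is the design point: a job that crashes mid-run should still stop its project rather than leave it open on the station.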
RCPs are required on the submission node filesystem. DØRunjob edits RCP entries based on the macro and default settings in the application schema. A framework RCP is required for most DØ executables. At submission time all RCPTree items are written to the task directory, so no RCPs need to be copied from the DistroDir at runtime.
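A hypothetical sketch of the submission-time step, assuming each RCP can be modelled as a flat mapping of parameter names to values and that a macro value overrides the schema default for an existing entry. The helper names, the parameter names, and the `name = value` output format are all placeholders, not real RCP syntax or DØRunjob code.

```python
import os
import tempfile


def resolve_rcp(defaults, macros):
    """Apply macro values on top of schema defaults (assumed precedence).

    Only entries already present in the defaults are edited.
    """
    resolved = dict(defaults)
    for key, value in macros.items():
        if key in resolved:
            resolved[key] = value
    return resolved


def write_rcp_tree(tree, task_dir):
    """Write every RCP in the tree into task_dir and return the file paths.

    Uses a placeholder 'name = value' format, not real RCP syntax.
    """
    paths = []
    for rcp_name, entries in sorted(tree.items()):
        path = os.path.join(task_dir, rcp_name + ".rcp")
        with open(path, "w") as handle:
            for key, value in sorted(entries.items()):
                handle.write(f"{key} = {value}\n")
        paths.append(path)
    return paths


# Hypothetical schema defaults and macro values.
defaults = {"num_events": "100", "output_dir": "."}
macros = {"num_events": "2", "unused_macro": "x"}
tree = {"framework": resolve_rcp(defaults, macros)}

with tempfile.TemporaryDirectory() as task_dir:
    for path in write_rcp_tree(tree, task_dir):
        with open(path) as handle:
            print(os.path.basename(path))
            print(handle.read(), end="")
```

Writing the fully resolved tree at submission time is what removes the need to read anything from the DistroDir at runtime.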
DØ software setup
DØ executables are packaged in RTE tarballs, which the job usually stages from SAM onto the execution node. Once staged, they are unpacked into the task directory and a setup script is sourced to prepare the runtime environment. DØRunjob expects a well-defined interface for setting up execution packages. Executable package setup scripts are expected to do the following:
- define runtime environment variables
- copy rundata files from DistroDir to TaskDir
Setup scripts will be given DistroDir and TaskDir.
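The two duties above can be modelled in Python as a rough sketch. Real setup scripts are shell scripts that are sourced; here `setup_package` is a hypothetical helper, and both the `rundata` subdirectory of DistroDir and the environment variable names are assumptions for illustration only.

```python
import os
import shutil
import tempfile


def setup_package(distro_dir, task_dir, rundata_subdir="rundata"):
    """Model of the two setup duties listed above (hypothetical helper).

    Returns the environment variables instead of exporting them, plus the
    names of the rundata files copied into task_dir.
    """
    copied = []
    source = os.path.join(distro_dir, rundata_subdir)  # assumed layout
    if os.path.isdir(source):
        for name in sorted(os.listdir(source)):
            path = os.path.join(source, name)
            if os.path.isfile(path):
                # Copy rundata from DistroDir to TaskDir.
                shutil.copy2(path, os.path.join(task_dir, name))
                copied.append(name)
    # Define runtime environment variables (names are hypothetical).
    env = {"DISTRO_DIR": distro_dir, "TASK_DIR": task_dir}
    return env, copied


with tempfile.TemporaryDirectory() as distro, tempfile.TemporaryDirectory() as task:
    os.makedirs(os.path.join(distro, "rundata"))
    with open(os.path.join(distro, "rundata", "calib.dat"), "w") as handle:
        handle.write("example\n")
    env, copied = setup_package(distro, task)
    print(copied, os.listdir(task))
```

Returning the variables keeps the sketch free of side effects; a sourced shell script would instead export them into the job's environment.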