DAJ Daemon HOWTO

Quick start instructions for getting DAJ Daemon operational.

Contents

  1. Unpack tarball
  2. Configure dajdrc
  3. Configure dajd_quotas
  4. Set up products and get grid credentials
  5. Start DAJ Daemon
  6. Monitoring
  7. More info

  1. Unpack daj.tgz tarball in a directory (DAJdir).

    $ tar xzf daj.tgz

  2. Configure dajdrc in DAJdir.

  3. Configure dajd_quotas in DAJdir.

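    Needed only for automatically starting new requests. See dajd_quotas.template in DAJdir. Each line that is neither blank nor a comment describes one site to be managed. A site line has two to five whitespace-separated fields, numbered from 0. Field 0 is the station name. As the example below illustrates, fields 1-4 are, in order: the minimum number of running requests, the minimum number of events in running grid jobs of running requests, the minimum number of events in a request, and the maximum number of events in a request.

    For OSG and LCG sites the station name has the form: station;grid_requirement_string
    e.g. osg-ouhep;red.unl.edu:2119/jobmanager-pbs
    Fields 2, 3, and 4 may be omitted and default to zero. Fields 1-4 must be non-negative integers. A value of zero for field 4 means no upper limit on the number of events in a request.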

    For example, consider a dajd_quotas file consisting of:

    luhep 1 50000 0 200000
    osg-ouhep;red.unl.edu:2119/jobmanager-pbs 3 150000 50000
    ccin2p3-grid1;marseillece01.mrs.grid.cnrs.fr:2119/jobmanager-pbs-dzero 4 200000 100000
    ress1@resspool;grid1.oscer.ou.edu:2119/jobmanager-lsf,\
    osg-gw-2.t2.ucsd.edu:2119/jobmanager-condor 2 100000
    # comment line in example dajd_quotas file
    > Line 1 means: at the site with station name luhep, maintain at least 1 running request or at least 50,000 events in running grid jobs of running requests, and obtain requests with at least 0 but no more than 200,000 events.
    > Line 2 means: at the OSG site with station name osg-ouhep and grid requirement string "red.unl.edu:2119/jobmanager-pbs", maintain at least 3 running requests or at least 150,000 events in running grid jobs of running requests, and obtain only requests with at least 50,000 events.
    > Line 3 means: at the LCG site with station name ccin2p3-grid1 and grid requirement string "marseillece01.mrs.grid.cnrs.fr:2119/jobmanager-pbs-dzero", maintain at least 4 running requests or at least 200,000 events in running grid jobs of running requests, and obtain only requests with at least 100,000 events.
    > Lines 4 and 5 define a resource pool for ReSS OSG sites. The pool is named ress1, is of type resspool, and consists of two OSG computing elements. The pool maintains at least 2 running requests or at least 100,000 events in running grid jobs of running requests. Line 5 is a continuation of line 4.
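
    The file format described above can be sketched as a small Python parser. This is only an illustration, not the daemon's own code: the names SiteQuota, parse_quotas, and accepts are hypothetical, and the handling of comments and backslash continuations is an assumption based on the example file.

```python
from dataclasses import dataclass
from typing import List, Optional

# Example dajd_quotas content, copied from the text above.
EXAMPLE = r"""
luhep 1 50000 0 200000
osg-ouhep;red.unl.edu:2119/jobmanager-pbs 3 150000 50000
ccin2p3-grid1;marseillece01.mrs.grid.cnrs.fr:2119/jobmanager-pbs-dzero 4 200000 100000
ress1@resspool;grid1.oscer.ou.edu:2119/jobmanager-lsf,\
osg-gw-2.t2.ucsd.edu:2119/jobmanager-condor 2 100000
# comment line in example dajd_quotas file
"""


@dataclass
class SiteQuota:
    """One site line of dajd_quotas (hypothetical container, not DAJ code)."""
    station: str                 # field 0, part before ';'
    requirement: Optional[str]   # field 0, part after ';' (None if absent)
    min_requests: int            # field 1
    min_running_events: int = 0  # field 2, defaults to zero
    min_request_events: int = 0  # field 3, defaults to zero
    max_request_events: int = 0  # field 4, zero means no upper limit

    def accepts(self, request_events: int) -> bool:
        """True if a request of this size is within the per-request limits."""
        if request_events < self.min_request_events:
            return False
        return (self.max_request_events == 0
                or request_events <= self.max_request_events)


def parse_quotas(text: str) -> List[SiteQuota]:
    """Parse dajd_quotas text into a list of SiteQuota records."""
    quotas = []
    pending = ""  # accumulated text of a line continued with a backslash
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank and comment lines unless we are inside a continuation.
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        line, pending = pending + line, ""
        fields = line.split()
        if not 2 <= len(fields) <= 5:
            raise ValueError(f"expected 2 to 5 fields: {line!r}")
        numbers = [int(f) for f in fields[1:]]
        if any(n < 0 for n in numbers):
            raise ValueError(f"fields 1-4 must be non-negative: {line!r}")
        station, sep, requirement = fields[0].partition(";")
        quotas.append(SiteQuota(station, requirement if sep else None, *numbers))
    return quotas
```

    For instance, parse_quotas(EXAMPLE) yields four records; the first one (luhep) accepts a request of 200,000 events but not one of 200,001, while the second (osg-ouhep) has no upper limit because field 4 is omitted.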

  4. Set up the necessary products and obtain appropriate grid credentials for the sites to which jobs will be submitted. For OSG and LCG sites the credentials must be stored in the Fermi myproxy server; Fermi KCA credentials are acceptable. The grid subject of the credentials must be registered in the DZero VOMS. See http://www-d0.fnal.gov/VO/DZero_VO_Instructions.html#UserInstructions for instructions on how to register in the DZero VOMS. For example, for native Samgrid installations:

    $ setup sam
    $ setup jim_client
    $ grid-proxy-init -valid 350:0

  5. Start DAJ Daemon.

    $ daj_daemon.py

  6. Monitor the run log, the error log, and email (if configured). The names of the logs are displayed when the daemon starts. The operator must renew the proxy before it expires so that job submission can continue. Notifications are sent when the remaining proxy time falls below a configurable threshold.
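
    An operator who wants an independent check of the remaining proxy lifetime could use something like the sketch below. It is not part of DAJ Daemon: the function names and the 24-hour default threshold are hypothetical, and it assumes the Globus grid-proxy-info command is on the PATH and that its -timeleft option prints the remaining lifetime in seconds.

```python
import subprocess


def proxy_seconds_left() -> int:
    """Return the remaining proxy lifetime in seconds.

    Assumes `grid-proxy-info -timeleft` is available; raises
    subprocess.CalledProcessError if the command fails (e.g. no proxy).
    """
    result = subprocess.run(
        ["grid-proxy-info", "-timeleft"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def needs_renewal(seconds_left: int, threshold_hours: float = 24.0) -> bool:
    """True if the remaining lifetime is below the (hypothetical) threshold."""
    return seconds_left < threshold_hours * 3600
```

    Calling needs_renewal(proxy_seconds_left()) from a cron job or an interactive session then indicates whether it is time to run grid-proxy-init again.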

  7. More information:


$Revision: 1.3 $
Joel Snow
Created September 8, 2006
Revised October 25, 2007