jim_client software description for developers

Overview

This package provides an interface for managing jobs submitted to the SAM-Grid computing infrastructure. The user interface is a thin layer that interacts mainly with the jim submission site service (jim_broker_client package). The jim submission site service is responsible for accepting grid jobs from the client, maintaining a persistent queue of grid jobs, interacting with the execution site on behalf of the user, and mediating the user's management actions on the jobs at the execution site. The jim_client software provides interfaces to the submission sites for submitting, listing and deleting jobs. These interfaces are implemented as python wrappers around the underlying Condor-G commands.

The entry point to the various interfaces is the "samg" command. This is a python script with 2 responsibilities:

  1. check the user credentials: verify that the user has a valid proxy; if not, ask the user and convert the user's Fermilab ticket (if present) into a KCA proxy.
  2. dispatch the command to the module responsible for handling the request (samg_submit.py, samg_list.py, samg_stop.py).

Type "samg" on the command line to get online help.
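
The two responsibilities of "samg" can be sketched as below. This is a minimal illustration, not the actual script: the subcommand names ("submit", "list", "stop") are assumptions, and has_valid_proxy() and convert_ticket_to_kca_proxy() are hypothetical placeholders for the real credential handling.

```python
import sys

# Hypothetical subcommand names; the module names come from this document.
DISPATCH_TABLE = {
    "submit": "samg_submit",
    "list": "samg_list",
    "stop": "samg_stop",
}


def has_valid_proxy():
    """Placeholder for the real check of the user's grid proxy."""
    return True


def convert_ticket_to_kca_proxy():
    """Placeholder: ask the user, then convert a Fermilab ticket into a KCA proxy."""
    return False


def resolve_module(argv):
    """Return the module that should handle argv, or None if help should be shown."""
    if len(argv) < 2 or argv[1] not in DISPATCH_TABLE:
        return None
    return DISPATCH_TABLE[argv[1]]


def main(argv):
    # Responsibility 1: make sure the user holds a usable proxy.
    if not has_valid_proxy() and not convert_ticket_to_kca_proxy():
        print("samg: no valid proxy and no ticket to convert", file=sys.stderr)
        return 1
    # Responsibility 2: hand the request to the responsible module.
    module_name = resolve_module(argv)
    if module_name is None:
        print("usage: samg {" + "|".join(DISPATCH_TABLE) + "} [options]")
        return 0
    # A real dispatcher would import module_name and call into it here.
    print("samg: dispatching to", module_name)
    return 0
```

Keeping the subcommand-to-module mapping in one table means that adding a new interface touches a single line of the dispatcher.
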

Job Submission

Overview

In order to submit a job, the jim_client software accepts as input a job description file (JDF), in which the user specifies the details of the job. This description consists of directives to the jim_client software (e.g. a directive that triggers jim_client to package the user input) and to the SAM-Grid infrastructure (e.g. a directive that specifies requirements on the resources that should run the job). The directives are executed and the JDF is translated into a Condor JDF. Finally, the job is submitted by invoking an underlying Condor command. The translation depends on the type of job the user is submitting. Refer to the appendices of the SAM-Grid manual for the list of possible job types and their description languages.

General details

The module responsible for job submission is samg_submit.py. The module is divided into two parts: first, it interprets the SAM-Grid JDF, acting on its directives and translating it into a Condor JDF; second, it uses an underlying Condor command ("condor_submit -s" or condor_submit_dag) to submit the job to a submission site. The specific behavior of the two phases depends on the type of job the user is submitting.
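
The choice of submission command in the second phase can be sketched as follows. This is an illustration under assumptions: the file names are hypothetical, placing the file name last on the command line is assumed, and how the target submission site is selected is not described here, so the sketch leaves it out.

```python
import subprocess


def submit_argv(condor_file, structured):
    """Build the argv for the underlying Condor submission command.

    Structured jobs are described by a DAGMan file and go through
    condor_submit_dag; unstructured jobs go through "condor_submit -s".
    Placing the file name last is an assumption of this sketch.
    """
    if structured:
        return ["condor_submit_dag", condor_file]
    return ["condor_submit", "-s", condor_file]


def submit(condor_file, structured, dry_run=True):
    """Run (or, with dry_run, only return) the submission command."""
    argv = submit_argv(condor_file, structured)
    if not dry_run:
        # check=True raises CalledProcessError if the Condor command fails.
        subprocess.run(argv, check=True)
    return argv
```

The dry_run default lets the translation be inspected without a Condor installation.
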

There are two major categories of jobs: structured and unstructured. Unstructured jobs are independent units of work: they do not expose their internal structure to the grid services (e.g. the broker). Examples of unstructured jobs are producing events for a Monte Carlo request, merging a dataset (e.g. of Monte Carlo events), and analyzing a dataset.
Structured jobs, on the other hand, expose their internal details to the grid services. A structured job is logically composed of more than one unstructured job. The JDF of a structured job describes the unstructured jobs and their relationships. An example of a structured job is the production of Monte Carlo events followed by the merging of the output. As of today (Apr 15, 2004), structured jobs are prototypes and their underlying implementation uses condor_dagman.
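
Because structured jobs are implemented with condor_dagman, the Monte Carlo production plus merge example corresponds to a two-node DAG. The sketch below renders such a DAGMan input file using its JOB and PARENT/CHILD statements; the node names and submit-file names are hypothetical, and it is not the actual structuredjob.py output.

```python
def render_dag(nodes, edges):
    """Render a minimal DAGMan input file.

    nodes: list of (node_name, condor_submit_file) pairs, one per unstructured job
    edges: list of (parent_name, child_name) pairs; the child runs after the parent
    """
    lines = [f"JOB {name} {submit_file}" for name, submit_file in nodes]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


# Hypothetical structured job: Monte Carlo production, then merging of its output.
dag = render_dag(
    nodes=[("mc_production", "mc.sub"), ("merge", "merge.sub")],
    edges=[("mc_production", "merge")],
)
print(dag, end="")
```

The PARENT/CHILD line is what exposes the internal structure of the job: DAGMan starts the merge node only after the production node has completed.
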

The SAM-Grid job description language (JDL) consists of a newline-separated list of instructions. Each instruction is either a "command" (a single word with no spaces) or an "attribute = value" pair. (Note: the parser supports "command"s by design, but none of the current language implementations uses them.)
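
The tokenization and basic syntactic check can be sketched as below. This is an illustration of the grammar just described, not the actual samg_submit tokenizer: skipping blank lines is an assumption, and attribute names other than job_type in the usage are hypothetical.

```python
def tokenize_jdl(text):
    """Split a SAM-Grid JDF into instruction tuples.

    Each non-blank line becomes ("attr", name, value) for "attribute = value"
    or ("command", word) for a single word; anything else is a syntax error.
    """
    instructions = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skipping blank lines is an assumption of this sketch
        if "=" in line:
            # Split on the first "=" only, so values may contain "=".
            name, value = (part.strip() for part in line.split("=", 1))
            if not name or len(name.split()) != 1:
                raise ValueError(f"line {lineno}: bad attribute name {name!r}")
            instructions.append(("attr", name, value))
        elif len(line.split()) == 1:
            instructions.append(("command", line))
        else:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
    return instructions
```

For example, the input "job_type = mc" followed by "arguments = -n 10" yields two ("attr", ...) tuples, and a line such as "two words" is rejected.
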
During the translation phase (CJDFTranslator.translate() method), samg_submit tokenizes the JDF, applies a very basic syntactic check, then looks at the attribute "job_type". Depending on the job type, a different python module is loaded; a specific document context (Context class) is created and passed to the constructor of the job_type-specific document (Doc class of the given module), which represents the user-provided job description.
At this point the semantic check is applied (Doc.checkSemantic()). The semantic check is responsible for instantiating the Attribute objects and saving them in the context; these objects encapsulate the information provided by the user AND the information that must be present in the final Condor JDF.
The translation phase of samg_submit ends with doc.writeCJDF, which streams the Attributes in the document context to a file, according to a document-specific template.
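
The translation flow above (select a Doc by job_type, build a Context, run the semantic check, write the Condor JDF) can be sketched as follows. Everything here is a hypothetical stand-in: the real translator loads a python module per job_type, whereas this sketch uses a dict of classes keyed by an invented "example" job type; the input is a list of ("attr", name, value) tuples, and the template simply writes "name = value" lines followed by Condor's queue statement.

```python
import io


class Context:
    """Per-document state: the user's attributes and the Attributes built from them."""

    def __init__(self, user_attrs):
        self.user_attrs = dict(user_attrs)
        self.attributes = {}


class ExampleDoc:
    """Hypothetical stand-in for the Doc class of one job_type module."""

    def __init__(self, context):
        self.context = context

    def checkSemantic(self):
        # The real check instantiates Attribute objects via the factory; here
        # every user attribute except job_type is copied through unchanged.
        for name, value in self.context.user_attrs.items():
            if name != "job_type":
                self.context.attributes[name] = value

    def writeCJDF(self, stream):
        # Stand-in template: "name = value" lines, then Condor's queue statement.
        for name in sorted(self.context.attributes):
            stream.write(f"{name} = {self.context.attributes[name]}\n")
        stream.write("queue\n")


# Hypothetical substitute for loading a different module per job_type.
DOC_CLASSES = {"example": ExampleDoc}


def translate(instructions, stream):
    """Turn ("attr", name, value) instructions into a Condor JDF written to stream."""
    user_attrs = {inst[1]: inst[2] for inst in instructions if inst[0] == "attr"}
    job_type = user_attrs.get("job_type")
    if job_type not in DOC_CLASSES:
        raise ValueError(f"unknown or missing job_type: {job_type!r}")
    doc = DOC_CLASSES[job_type](Context(user_attrs))
    doc.checkSemantic()
    doc.writeCJDF(stream)


out = io.StringIO()
translate([("attr", "job_type", "example"), ("attr", "arguments", "-n 10")], out)
print(out.getvalue(), end="")
```

Passing a stream rather than a file name keeps the template independent of where the Condor JDF ends up.
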

Specific JDL modules

The classes used in the semantic check (Context, Doc, Attribute) are specializations of parent classes implemented in the samgjob module. This module specifies the attributes of the SAM-Grid JDL and of the resulting Condor JDF that are common to all types of jobs. The names of attributes provided by the user are constants that begin with SAMGJDF_ (e.g. SAMGJDF_JOB_TYPE_, SAMGJDF_USER_ARGUMENTS_). The names of attributes that must be present in the Condor JDF and, eventually, in the classad representation of the job are constants that begin with CLASSAD_ (e.g. CLASSAD_ENVIRONMENT_, CLASSAD_RANK_).
The Doc class of each module optionally defines four lists:
  1. SAMGJDF_REQUIRED_ATTR_: the list of attributes that must be passed by the user in the given JDF.
    Syntax: [ [attr1], [attr2], [attr3, attr4] ]
    Semantic: attr1 and attr2 and (attr3 or attr4) must be provided
  2. CLASSAD_REQUIRED_ATTR_: the list of attributes that must be present in the Condor JDF. These attributes are automatically instantiated as Attribute objects by the samgjob.checkSemantic() method (invoked by the overriding methods).
  3. SAMGJDF_INVALID_ATTR_: attributes that must not be specified in the SAM-Grid JDF by the user
  4. SAMGJDF_MUTUALLY_EXCLUSIVE_ATTR_: attributes specified by the user in the SAM-Grid JDF that are mutually exclusive
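
The checks implied by the three SAMGJDF_ lists can be sketched as below. The required-list semantics follow the syntax given above (an AND of OR-groups); representing SAMGJDF_MUTUALLY_EXCLUSIVE_ATTR_ as a list of groups is an assumption, and the function name and attribute names in the usage are hypothetical.

```python
def check_attribute_lists(user_attrs, required, invalid, mutually_exclusive):
    """Return a list of error messages; an empty list means the JDF passes.

    required:           [[a], [b], [c, d]] means a AND b AND (c OR d)
    invalid:            attributes the user must not specify
    mutually_exclusive: groups of attributes of which at most one may be given
                        (this list-of-groups shape is an assumption)
    """
    errors = []
    given = set(user_attrs)
    for alternatives in required:
        # Each inner list is satisfied if at least one of its attributes is given.
        if not given.intersection(alternatives):
            errors.append("missing one of: " + ", ".join(alternatives))
    for name in invalid:
        if name in given:
            errors.append("attribute not allowed: " + name)
    for group in mutually_exclusive:
        present = sorted(given.intersection(group))
        if len(present) > 1:
            errors.append("mutually exclusive: " + ", ".join(present))
    return errors
```

With required = [["job_type"], ["a"], ["c", "d"]], a JDF giving job_type, a and c passes, while one giving only job_type reports both missing groups.
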
Attribute objects are instantiated using a static, parametrized factory method of the Attribute class. The parameter to the factory is the name of the attribute to be instantiated (one of the SAMGJDF_* or CLASSAD_* constants). The factory is invoked automatically (in samgjob.checkSemantic()) for all the user-specified attributes AND for all the CLASSAD_REQUIRED_ATTR_ attributes listed in the job_type-specific Doc class.
Dependencies among attributes (e.g. an attribute can be created only after another one has been created) are handled in the factory for each Attribute (grep for dependentAttrs and the code that follows it).
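
A static factory that resolves dependencies can be sketched as below. This is a simplified, hypothetical stand-in for the samgjob Attribute class: the constant names come from this document, but their values, the classad_name field, and the SAMG_JOB_TYPE environment variable are assumptions made for illustration.

```python
# Constant names follow this document; their values are assumptions.
SAMGJDF_JOB_TYPE_ = "job_type"
CLASSAD_ENVIRONMENT_ = "environment"


class Context:
    def __init__(self, user_attrs):
        self.user_attrs = dict(user_attrs)  # what the user wrote in the JDF
        self.attributes = {}                # name -> Attribute, filled by the factory


class Attribute:
    """Simplified stand-in for the samgjob Attribute class."""

    def __init__(self, name, value, classad_name=None):
        self.name = name
        self.value = value
        # None means the attribute only feeds other attributes and is not
        # written to the classad (this field name is hypothetical).
        self.classad_name = classad_name

    @staticmethod
    def factory(name, context):
        """Create (once) the Attribute called `name`, building its dependencies first."""
        if name in context.attributes:
            return context.attributes[name]
        if name == SAMGJDF_JOB_TYPE_:
            attr = Attribute(name, context.user_attrs[name])
        elif name == CLASSAD_ENVIRONMENT_:
            dependentAttrs = [SAMGJDF_JOB_TYPE_]
            for dep in dependentAttrs:
                Attribute.factory(dep, context)
            job_type = context.attributes[SAMGJDF_JOB_TYPE_].value
            # SAMG_JOB_TYPE is a hypothetical environment variable name.
            attr = Attribute(name, f"SAMG_JOB_TYPE={job_type}", classad_name=name)
        else:
            raise ValueError(f"no factory branch for attribute {name!r}")
        context.attributes[name] = attr
        return attr
```

Asking for the environment attribute first creates the job type attribute it depends on; because created attributes are cached in the context, a later request for either returns the same object.
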

In order to add a new attribute to a JDL

  1. edit the corresponding job_type module. Available modules as of Apr 15, 2004 are: samgjob.py (parent classes), mcjob.py, mergejob.py, samanalysisjob.py, structuredjob.py, cafjob.py, vanillajob.py
  2. add the new SAMGJDF_ATTRIBUTE_NAME_ constant: the value of the constant corresponds to the attribute name that the user specifies
  3. optionally add SAMGJDF_ATTRIBUTE_NAME_ to the 4 lists of the specific Doc class, to make it a required attribute, a mutually exclusive attribute, etc. (see above)
  4. add an "elif" statement to the static Attribute factory and implement the instantiation of the Attribute object. Depending on the type of attribute, you may need to initialize different fields of the constructor. For example: an Attribute may appear in the job classad with a name different from the one the user specified in the SAM-Grid JDF; or it may be passed as an option to the execution site infrastructure; or it may be used only to build other attributes, without appearing in the final classad. Look at the comments in the code to understand which fields to initialize. Also, if the new attribute can be instantiated only after other attributes have been created, specify this in the factory implementation (grep for dependentAttrs and the code that follows it for examples)
  5. change the factory of other attributes that may need to use the new attribute
  6. look at the Doc.template() method: you may need to add a print method if the new attribute needs special treatment or if your module does NOT automatically print all the user-provided attributes (see the use of samgjob.Doc.printAllTheOtherAttributes())
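
Steps 2, 3 and 6 can be illustrated with the sketch below. It is hypothetical throughout: SAMGJDF_MY_NEW_ATTR_ is an invented attribute, the Doc class is a stand-in, and printAllTheOtherAttributes() is named after the samgjob.Doc method, but its signature here (a stream plus a set of names to skip) is an assumption. The "+MyNewAttr = ..." line uses the Condor submit-file syntax for adding a custom attribute to the job classad.

```python
import io

SAMGJDF_JOB_TYPE_ = "job_type"
SAMGJDF_MY_NEW_ATTR_ = "my_new_attr"  # step 2: hypothetical new attribute


class Doc:
    # Step 3 (optional): make the new attribute required.
    SAMGJDF_REQUIRED_ATTR_ = [[SAMGJDF_JOB_TYPE_], [SAMGJDF_MY_NEW_ATTR_]]

    def __init__(self, attributes):
        # attributes: name -> value, as left in the context by the semantic check
        self.attributes = dict(attributes)

    def printAllTheOtherAttributes(self, out, skip):
        """Write every attribute not in `skip` as "name = value" (stand-in helper)."""
        for name in sorted(self.attributes):
            if name not in skip:
                out.write(f"{name} = {self.attributes[name]}\n")

    def template(self, out):
        # Step 6: the new attribute needs special treatment, so print it explicitly...
        out.write(f'+MyNewAttr = "{self.attributes[SAMGJDF_MY_NEW_ATTR_]}"\n')
        # ...then let the helper print the rest, skipping attributes handled elsewhere.
        self.printAllTheOtherAttributes(out, skip={SAMGJDF_JOB_TYPE_, SAMGJDF_MY_NEW_ATTR_})
        out.write("queue\n")


buf = io.StringIO()
Doc({"job_type": "mc", "my_new_attr": "x", "arguments": "-n 10"}).template(buf)
print(buf.getvalue(), end="")
```

Skipping the explicitly printed attribute in the generic helper is what prevents it from appearing twice in the Condor JDF.
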

First written by Gabriele Garzoglio on Apr 15, 2004