jim_client software description for developers
Overview
This package provides an interface to manage jobs submitted to the SAM-Grid
computing infrastructure. This user interface is a thin layer that
interacts mainly with the jim submission site service (jim_broker_client
package). The jim submission site service is responsible for accepting
grid jobs from the client, maintaining a persistent queue of grid jobs,
interacting with the execution site on behalf of the user, and mediating
the user's management actions on the jobs at the execution site. The
jim_client software provides interfaces to interact with submission
sites in order to submit, list and delete jobs. The implementation of
these interfaces consists of Python wrappers around underlying Condor-G
commands.
The entry point to the various interfaces is the "samg" command. This is
a Python script with two responsibilities:
- check the user credentials: verify that the user has a valid proxy;
if not, ask the user and convert the user's Fermilab ticket (if present)
into a KCA proxy.
- dispatch the command to the module responsible for
handling the request (samg_submit.py, samg_list.py, samg_stop.py).
Type "samg" on the command line to get the online help.
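The two responsibilities above can be sketched as follows. This is a
minimal illustration, not the actual samg script: the sub-command names
(submit, list, stop), the HANDLERS table and the has_valid_proxy()
placeholder are assumptions made for the example; only the module names
come from the text.

```python
import sys

# Hypothetical dispatch table: sub-command -> module named in the text.
# The sub-command names are assumptions made for this sketch.
HANDLERS = {
    "submit": "samg_submit",
    "list": "samg_list",
    "stop": "samg_stop",
}


def has_valid_proxy():
    """Placeholder for the credential check; always succeeds here."""
    return True


def resolve_handler(argv):
    """Return (module_name, remaining_args), or (None, []) to show help."""
    if len(argv) < 2 or argv[1] not in HANDLERS:
        return None, []
    return HANDLERS[argv[1]], argv[2:]


def main(argv):
    # Responsibility 1: check the credentials before doing anything else.
    if not has_valid_proxy():
        print("no valid proxy", file=sys.stderr)
        return 1
    # Responsibility 2: pick the module that handles the request.
    module_name, args = resolve_handler(argv)
    if module_name is None:
        print("usage: samg {%s} [options]" % "|".join(sorted(HANDLERS)))
        return 0
    # A real script would import the module and call into it here.
    print("would dispatch to %s with %s" % (module_name, args))
    return 0
```

Calling main(["samg"]) prints the usage line, mirroring the behavior of
typing "samg" with no arguments.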
Job Submission
Overview
In order to submit a job, the jim_client software accepts as input
a job description file (JDF), where the user specifies the details
of the job. This description consists of directives to the jim_client
software (e.g. a directive that triggers jim_client to package the user
input) and to the SAM-Grid infrastructure (e.g. requirements on the
resources that should run the job). The directives are executed and
the JDF is translated into a Condor JDF. Ultimately, the job is
submitted by invoking an underlying Condor command. The translation
depends on the type of job the user is submitting. Refer
to the appendices of the
SAM-Grid manual for the list of possible job types and their
description language.
General details
The module responsible for job submission is samg_submit.py. The
module is divided into two parts: first, it interprets the SAM-Grid JDF,
acting on its directives and translating it into a
Condor JDF; second, it uses an underlying Condor command to submit the
job to a submission site ("condor_submit -s" or condor_submit_dag).
The specific behavior of the two phases
depends on the type of job the user is submitting.
There are two major categories of jobs: structured and unstructured.
Unstructured jobs are independent units of work: they do not expose
their internal structure to the grid services (e.g. the broker).
Examples of unstructured jobs are producing events for a Monte Carlo
request; merging a dataset (e.g. of Monte Carlo events);
analyzing a dataset.
On the other hand, structured jobs expose their internal details to the
grid services. A structured job is logically composed of more than one
unstructured job. The JDF of a structured job describes the
unstructured jobs and their relationships. An example of a structured
job is the production of Monte Carlo events and the subsequent merging
of the output. As of today (Apr 15, 2004), structured jobs are at the
prototype stage and their underlying implementation uses condor_dagman.
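To illustrate how a structured job maps onto condor_dagman, the sketch
below writes a DAGMan input description for the "produce, then merge"
example. It is not how jim_client generates its files: build_dag() and
the node and submit file names are hypothetical. Only the JOB and
PARENT ... CHILD line forms are standard DAGMan syntax.

```python
def build_dag(jobs, dependencies):
    """Return DAGMan input text.

    jobs: list of (node_name, submit_file) pairs.
    dependencies: list of (parent_node, child_node) pairs.
    """
    names = {name for name, _ in jobs}
    # One JOB line per unstructured job.
    lines = ["JOB %s %s" % (name, submit_file) for name, submit_file in jobs]
    # One PARENT/CHILD line per relationship between jobs.
    for parent, child in dependencies:
        if parent not in names or child not in names:
            raise ValueError("unknown node in dependency: %s -> %s"
                             % (parent, child))
        lines.append("PARENT %s CHILD %s" % (parent, child))
    return "\n".join(lines) + "\n"


# Hypothetical node and file names for a production followed by a merge.
dag_text = build_dag(
    [("produce", "produce.cjdf"), ("merge", "merge.cjdf")],
    [("produce", "merge")],
)
print(dag_text)
```

The resulting text is the kind of file that condor_submit_dag takes as
input; the merge node runs only after the produce node completes.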
The SAM-Grid job description language (JDL) consists of a
newline-separated list of instructions. Instructions are in the form of a
"command" (a single word containing no spaces) or an
"attribute = value" pair. (Note: "command"s are supported by the parser
by design, but are not used in any of the current language
implementations).
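The instruction format described above can be tokenized with a few
lines of Python. This is a sketch under stated assumptions, not the
parser used by samg_submit: tokenize_jdf() is a hypothetical name, blank
lines are assumed to be ignorable, and the attribute names and values in
the sample (other than job_type) are invented.

```python
def tokenize_jdf(text):
    """Split JDF text into ("attribute", name, value) and
    ("command", word, None) tuples, with a basic syntactic check."""
    instructions = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if "=" in line:
            # Split on the first "=" only, so values may contain "=".
            name, value = line.split("=", 1)
            name = name.strip()
            if len(name.split()) != 1:
                raise ValueError("line %d: bad attribute name" % lineno)
            instructions.append(("attribute", name, value.strip()))
        elif len(line.split()) == 1:
            instructions.append(("command", line, None))
        else:
            raise ValueError("line %d: not a command or pair" % lineno)
    return instructions


# Hypothetical sample; only the job_type attribute name is from the text.
sample = "job_type = montecarlo\n\nverbose\nuser_arguments = -n 10 --tag=a\n"
for instruction in tokenize_jdf(sample):
    print(instruction)
```

A line such as "two words" is rejected with a ValueError, which is the
kind of error a very basic syntactic check would report.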
During the translation phase (CJDFTranslator.translate() method),
samg_submit tokenizes the JDF, applies a very basic syntactic check,
then looks at the attribute "job_type".
Depending on the job type, a different Python module is loaded,
a specific document context (Context class) is created and passed
to the constructor of the specific job_type document (Doc class of the
given module), which is the representation of the user-provided job
description.
At this point the semantic check is applied (Doc.checkSemantic()).
The semantic check is responsible for instantiating and saving in the
context the Attribute objects, which encapsulate the information
provided by the user AND the information that must be present
in the final Condor JDF.
The translation phase of samg_submit finishes with Doc.writeCJDF(),
which is responsible for streaming the Attributes in the document
context to a file, according to a document-specific template.
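The sequence of steps in the translation phase can be summarized by the
toy model below. The class and method names (Context, Doc,
checkSemantic, writeCJDF) follow the text, but their signatures and
bodies are assumptions; the DOC_CLASSES registry stands in for the
per-job-type module loading, and the attribute names and values are
hypothetical.

```python
import io


class Context:
    """Holds the user input and the attributes built by the semantic check."""

    def __init__(self, user_attrs):
        self.user_attrs = dict(user_attrs)
        self.attributes = {}


class Doc:
    """Toy stand-in for the Doc class of a job_type module."""

    # Hypothetical attribute that must appear in the output regardless
    # of what the user provided.
    REQUIRED_OUTPUT = {"rank": "0"}

    def __init__(self, context):
        self.context = context

    def checkSemantic(self):
        # Save user-provided and required attributes in the context.
        self.context.attributes.update(self.context.user_attrs)
        for name, default in self.REQUIRED_OUTPUT.items():
            self.context.attributes.setdefault(name, default)

    def writeCJDF(self, stream):
        # Stream the attributes using a trivial "name = value" template.
        for name in sorted(self.context.attributes):
            stream.write("%s = %s\n" % (name, self.context.attributes[name]))


# Assumption: a dictionary replaces the loading of a job-specific module.
DOC_CLASSES = {"vanilla": Doc}


def translate(user_attrs, stream):
    job_type = user_attrs.get("job_type")
    if job_type not in DOC_CLASSES:
        raise ValueError("unknown job_type: %r" % (job_type,))
    doc = DOC_CLASSES[job_type](Context(user_attrs))
    doc.checkSemantic()
    doc.writeCJDF(stream)


out = io.StringIO()
translate({"job_type": "vanilla", "executable": "run.sh"}, out)
print(out.getvalue())
```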
Specific JDL modules
The classes used in the semantic check (Context, Doc, Attribute)
are specializations of parent classes implemented in the samgjob module.
This module specifies the attributes of the SAM-Grid JDL and
of the resulting Condor JDF that are common to all types of jobs.
Attributes provided by the user are identified by constants that begin
with SAMGJDF_ (Examples: SAMGJDF_JOB_TYPE_, SAMGJDF_USER_ARGUMENTS_,
etc.). Attributes that must be present in the Condor JDF, and that
eventually appear in the classad representation of the job, are
identified by constants that begin with CLASSAD_ (Examples:
CLASSAD_ENVIRONMENT_, CLASSAD_RANK_, etc.).
The Doc class of each module optionally defines four lists:
- SAMGJDF_REQUIRED_ATTR_: the list of attributes that must be
passed by the user in the given JDF.
Syntax: [ [attr1], [attr2], [attr3, attr4] ]
Semantics: attr1 and attr2 and (attr3 or attr4) must be provided
- CLASSAD_REQUIRED_ATTR_: the list of attributes that must be
present in the Condor JDF. This list of attributes will be
automatically instantiated as Attribute objects by the
samgjob.semantic() method (invoked by the overriding methods).
- SAMGJDF_INVALID_ATTR_: attributes that must not be specified in
the SAM-Grid JDF by the user
- SAMGJDF_MUTUALLY_EXCLUSIVE_ATTR_: attributes specified by the
user in the SAM-Grid JDF that are mutually exclusive
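The semantics of the user-facing lists can be expressed as small
checks. This is a sketch, not the samgjob implementation: the function
names are hypothetical, and the invalid and mutually exclusive lists are
assumed to be flat lists of attribute names, which the text does not
state.

```python
def check_required(required, provided):
    """Return the groups of `required` with no attribute in `provided`.

    `required` is a list of groups: every group must be satisfied (AND),
    and a group is satisfied by any one of its attributes (OR).
    """
    provided = set(provided)
    return [group for group in required if not provided.intersection(group)]


def check_invalid(invalid, provided):
    """Return the provided attributes that are not allowed."""
    return sorted(set(provided) & set(invalid))


def check_mutually_exclusive(exclusive, provided):
    """Return the conflicting attributes if more than one is provided."""
    present = sorted(set(provided) & set(exclusive))
    return present if len(present) > 1 else []


# The example from the text: attr1 and attr2 and (attr3 or attr4).
required = [["attr1"], ["attr2"], ["attr3", "attr4"]]
print(check_required(required, ["attr1", "attr2", "attr4"]))
print(check_required(required, ["attr1", "attr3"]))
```

An empty result from each function means the corresponding check
passed; a non-empty result lists what to report to the user.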
Attribute objects are instantiated using a static parameterized
factory method of the Attribute class. The parameter to the factory
is the name of the attribute to be instantiated (one of the
SAMGJDF_* or CLASSAD_* constants). The factory is automatically
invoked (in samgjob.checkSemantic()) for all the user-specified
attributes AND for all the job-specific CLASSAD_REQUIRED_ATTR_
attributes (specified in the specific job_type Doc class).
Dependencies among attributes (e.g. an attribute can be created only
after another is created) are handled in the factory for each
Attribute (grep for dependentAttrs and subsequent code).
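A parameterized factory that creates dependencies first might look like
the sketch below. It borrows the dependentAttrs name from the code, but
everything else is an assumption: the DEPENDENCIES table, the layout of
the context dictionary, the attribute names and the way the derived
value is built are all hypothetical.

```python
# Hypothetical table: attribute name -> attributes it is built from.
DEPENDENCIES = {"arguments": ["job_type", "user_arguments"]}


class Attribute:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    @staticmethod
    def factory(name, context):
        """Create the attribute `name` once, after its dependencies."""
        created = context["attributes"]
        if name in created:
            return created[name]
        # Attributes that must exist before this one can be built.
        dependentAttrs = DEPENDENCIES.get(name, [])
        for dep in dependentAttrs:
            Attribute.factory(dep, context)
        if name in ("job_type", "user_arguments"):
            # Taken directly from the user-provided description.
            attr = Attribute(name, context["user"][name])
        elif name == "arguments":
            # Built only from other attributes.
            attr = Attribute(name, "--type=%s %s" % (
                created["job_type"].value,
                created["user_arguments"].value))
        else:
            raise ValueError("unknown attribute: %s" % name)
        created[name] = attr
        return attr


context = {
    "user": {"job_type": "vanilla", "user_arguments": "-n 10"},
    "attributes": {},
}
print(Attribute.factory("arguments", context).value)
```

The sketch does not guard against circular dependencies; a production
factory would need to detect them.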
In order to add a new attribute to a JDL:
- edit the corresponding job_type module. Available modules as of
Apr 15, 2004 are: samgjob.py (parent classes), mcjob.py, mergejob.py,
samanalysisjob.py, structuredjob.py, cafjob.py, vanillajob.py
- add the new SAMGJDF_ATTRIBUTE_NAME_ constant: the value of the
constant corresponds to the attribute name that the user specifies
- optionally add SAMGJDF_ATTRIBUTE_NAME_ to one or more of the four
lists of the specific Doc class, to make it a required attribute, a
mutually exclusive attribute, etc. (see above)
- add an "elif" statement to the static Attribute factory and
implement the instantiation of the Attribute object. Depending on
the type of attribute you may need to initialize different fields of
the constructor. For example: an Attribute may appear in the job
classad with a name different from what the user specified in the
SAM-Grid JDF; or it may be passed as an option to the execution site
infrastructure; or it may be used only to build other attributes,
without appearing in the final classad. Look at the comments in
the code to understand which fields to initialize. Also, if this new
attribute can be instantiated only after other attributes have been
created, specify it in the factory implementation (grep for
dependentAttrs and subsequent code for examples)
- change the factory of other attributes that may need to use the
new attribute
- look at the Doc.template() method: you may need to add a print
method if the new attribute needs special treatment or if your
module does NOT automatically print all the user-provided attributes
(see use of samgjob.Doc.printAllTheOtherAttributes() )
First written by Gabriele Garzoglio on Apr 15, 2004