JIM Sandbox

Design manifesto and maintainer’s guide

1           Introduction

 

JIM Sandbox is a part of SAMGrid job management. It was originally intended to be used on the Grid-Fabric boundary by the JIM job managers, which instantiate a SAMGrid job at the execution site as a collection of local jobs (see the document on the Grid to Fabric job submission interface). The purpose of the sandbox component is to provide a viable abstraction for the collection of all files required by a user job, except the input/output data. The services provided are packing and unpacking of the files, as well as physical management (movement) of file collections both within the local cluster and through the Grid/Fabric gateway.

 

In what follows, we describe the rationale behind the sandboxing development. We then give some details of the design and implementation, which should be sufficient to start delving into the inline documentation as necessary for maintenance, and then notes on installation and configuration. We conclude with package status and issues.

 

2           Rationale and Principal Features

 

The need for a sandbox abstraction and associated services has emerged as follows. In a traditional (pre-Grid) computing model, users tend to make at least two assumptions about the environment where they execute their jobs. These assumptions, which are violently broken on a Grid, follow:

 

1.     Standard software is installed cluster-wide, by means of, e.g., an NFS-exported UPS products tree. The software installed includes the experiment software as well as numerous “infrastructure” packages such as the Python interpreter.

2.     There is a durable (almost permanent), “no-cost” local storage, called home area, where the jobs and agents (such as batch systems) acting on behalf thereof can safely deposit small files. These files include both those needed to bootstrap the job (input) and any logs produced (output).

 

The first assumption is now often stated explicitly, and the experiments are lifting it by developing tools to envelop their applications and provide an appropriate run-time environment (good examples are D0 RTE and CDF CAF). As we will later explain, user code and the rest of the run-time environment must be augmented with additional infrastructure packages and then deployed physically at the worker nodes (i.e. the computers where actual data processing takes place, usually in batch mode). In SAMGrid, this is accomplished by JIM sandboxing.

 

What is more, the second assumption is almost always implied and cast in stone. The “home area” concept has a long history and is part of the broader concept of an account, whereby computer access is controlled statically. Grid computing in general (not JIM or SAMGrid in particular) strives to provide a fuller and much more dynamic resource control by virtue of sophisticated authorization frameworks. The Run II physics experiments are adopting and driving these services. Thus, it becomes increasingly necessary to be able to move job files in and out of the execution node bypassing home.  JIM sandboxing obviates the home area concept.

                                                                                                                                                    

Incidentally, both of the above assumptions involve usage of a shared file system such as NFS. Many of our collaborators from D0 and CDF, as well as system administrators, have repeatedly expressed dissatisfaction with reliance on a shared file system. It is these people, who have rich experience managing jobs on large clusters, that shaped our cautious attitude towards shared file systems. The most common issue is the performance bottleneck (because of the centralized topology and UDP-based communication); low security (NFS authentication is IP-based) should also be mentioned. JIM sandboxing provides complete independence from the shared file system.

 

In addition to the above assumptions, (SAM)Grid computing faces the well-known issue of dramatic variation of the computing environment of the sites of its deployment. Development of a uniform job submission interface, which could be used by standard Grid machinery such as that of Condor-G/JIM, was severely complicated by this heterogeneity, especially when it came to the mechanisms of job file transfer. Most systems relied on a shared file system; some used batch systems with built-in file transfer mechanisms, etc. To make the task of the Grid-Fabric job submission interfacing manageable, we (SAMGrid developers) decided to develop a standalone component, which would be separate from the actual job management, and which minimized the dependence on the local site configuration. This independence could not be complete because at least one executable and at least one output file per job had to be transferred by locally configured mechanisms (e.g. by the batch system). We reduced, however, the management of hundreds of job-related files to understanding such a local configuration for very few files with subsequent bootstrapping of the sandboxing, whereby the same software is used at all the participating sites, however dissimilar.

 

Last but not least, we observe that some of the files needed by the user job, at least in the case of coordinated activities such as Monte-Carlo (MC) production, are the same for many jobs. For example, the standard D0 and CDF code releases are used multiple times for different MC requests, and efforts began within the experiments (outside of JIM) to package (pieces of) releases as tar-balls. At some point, it was simultaneously proposed by a number of people to use a sophisticated data handling system such as SAM for retrieval of such large (GB size) common files, and thereby leverage some of the powerful data handling features. To name a few of such features:

 

•     central bookkeeping,

•     dynamic (on demand) data replication, as opposed to manual installation of releases from repositories such as KITS,

•     intelligent caching of common files with automatic reclamation of space from old, unused files (that are in the GB range),

•     file transfer throttling and other (global) resource management,

•     robustness through retrials and failover for alternative replicas,

•     ease of interface with a Grid-level scheduler/resource broker.

 

Obviously, these features have a less profound effect for job files than for the actual data, but being able to leverage an existing technology instead of developing new tools was a definite advantage. Note that we can restate the second point above more strongly as independence from the software pre-installed on sites, which in turn is part of the highly desired independence of physics results from the identity of the site.

 

Thus, a software component was conceived to provide the above services for the job management. It was not designed and developed from scratch but grew within the JIM job management suite and was eventually identified and cut out. As for the term “sandboxing” itself, it was originally used in the security context, to denote isolation of user programs from their surroundings, i.e. the hosting execution environment. We (and many others) use the same word in a different, complementary sense. If you would like a comparison to the real-life sandbox, we provide bagging services for bringing the toys in and carrying garbage out of the play area, whereas the security context means understanding the boundaries of the area, and rules such as not throwing sand out. Obviously, both aspects are needed.

 

3           Design and Implementation

 

To provide the services described above, we adopted the following strategies while designing JIM sandboxing:

•     Develop an easy way to gather all the files required by the job in a single logical container, called a sandbox. These files include both those specified by the user explicitly and those implied by the job, such as the X509 user proxy, configuration instances, file transfer clients, etc.

•     Provide an easy mechanism to transfer this entire collection to the worker node of the execution site. “Ease” refers to the ability to insert execution of sandbox management code before and after the actual user job, and thus redefine the job as far as the local batch system is concerned, without dramatically complicating the definition of the wrapped job. As a counter-example, specifying on a submission command line a long list of files that must be pre-staged is unacceptably tedious and error-prone.

•     Control the transfer of the sandbox constituents, whose number and/or size may be large, to multiple destinations within the cluster, i.e. to many worker nodes. Such control is desired for efficiency and reliability, to avoid hundreds of simultaneously starting jobs all retrieving their constituents at once. As another counter-example, implicit file retrieval (access) through a home-like shared disk area is completely unmanageable in the case of NFS.

•     Provide a service to the user job for returning a “small” output back to the aforementioned logical collection. This output is separate from any data designated for the data handling system and includes log files, etc. Symmetrically to the input reading, this service should be transparent to the batch system (for ease of job submission configuration), efficient and controllable.

 

Thus, we start with a sandbox as a logical container. Although we map sandboxes to (initially blank) disk directories, we strive to provide a level of abstraction slightly above a disk directory and other operating system concepts. The sandbox is physically created at the head-node of the cluster. We then allow the user (i.e. the job management software) to populate the sandbox with the necessary constituents (files), typically by creating symbolic links. Of course, we check for errors while dereferencing these links. Once the sandbox is finalized, the user requests a handle that can be used to reconstitute the sandbox on another machine (e.g. a worker node). We accomplish this by packing the sandbox, instantiating a sandbox replication service (to be explained later) and returning, as the handle, a bootstrap executable. Packaging includes the addition of files internal to JIM sandbox and the creation of the first stage of the bootstrapping process, which is presented next.
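
The populate-and-pack cycle described above can be illustrated with a minimal Python sketch. It is not the jim_sandbox implementation: the class name ToySandbox, its method signatures and the choice of a gzipped tar archive are assumptions made for illustration only. It shows constituents added as symbolic links, a check that every link resolves, and packing that stores the link targets rather than the links themselves.

```python
import os
import shutil
import tarfile
import tempfile


class ToySandbox:
    """Illustrative stand-in for a sandbox; NOT the jim_sandbox code."""

    def __init__(self, base_dir):
        # A sandbox maps to an initially blank directory under base_dir.
        self.path = tempfile.mkdtemp(prefix="sandbox_", dir=base_dir)

    def add(self, source, name=None):
        """Add a constituent by symbolic link instead of copying it."""
        source = os.path.abspath(source)
        link = os.path.join(self.path, name or os.path.basename(source))
        os.symlink(source, link)
        return link

    def broken_links(self):
        """Return the names of constituents whose links do not resolve."""
        return sorted(
            entry for entry in os.listdir(self.path)
            # os.path.exists follows symlinks, so it is False for dangling ones.
            if not os.path.exists(os.path.join(self.path, entry))
        )

    def package(self, archive_path):
        """Pack the sandbox, dereferencing links so real content is stored."""
        broken = self.broken_links()
        if broken:
            raise FileNotFoundError(
                "unresolvable constituents: " + ", ".join(broken))
        with tarfile.open(archive_path, "w:gz", dereference=True) as tar:
            tar.add(self.path, arcname=".")
        return archive_path

    def destroy(self):
        """Remove the sandbox directory (link targets are left untouched)."""
        shutil.rmtree(self.path)
```

Linking rather than copying keeps population cheap on the head-node; the cost of reading the real files is paid once, at packing time, which is also where a dangling link is detected.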

3.1         Bootstrapping

 

The term “bootstrap” means a sequence of construction/initialization stages whereby each subsequent stage uses the machinery created in the previous stage. There are three stages in the course of sandbox setup:

 

 

Figure 1. Bootstrapping stages in JIM Sandbox

 

Initially, there is a bare minimum assumed about the machine where the sandbox replication takes place: nothing beyond the standard OS with “sh” and “tar”. The bootstrap executable combines the control scripts with their input and is therefore the one and only file that must be transferred by the batch system in a way that must be configured locally. This binary contains in itself the files required for the second stage of the bootstrap process. [1] These second-stage files are the binaries, libraries and configuration files for the file transfer mechanisms, as well as the script containing instructions to retrieve a list of files from the physical location of the sandbox. When executing, the bootstrap binary unpacks these files and passes control to the “sandbox manager” script, or stage two. As a technical detail, our self-extractor is a compiled binary. We preferred it over a “standard” ASCII UNIX facility such as “shar” because the “uudecode” utility that the latter relies on is not always available, and because the compiled program is smaller and faster. Consequently, we require the presence of a C compiler at packing time, and we assume that packing occurs on the same architecture as unpacking.
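
The idea of a single file that carries its own second-stage payload can be sketched as follows. This is a conceptual illustration in Python only: the actual self-extractor is a compiled C binary whose layout is not described here, and the marker string, function names and gzipped-tar payload format are all hypothetical.

```python
import io
import os
import tarfile

# Hypothetical separator between the control stub and the embedded payload.
MARKER = b"\n__PAYLOAD_BELOW__\n"


def build_bootstrap(stub, files, out_path):
    """Write stub + marker + gzipped tar of `files` into one file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in files:
            tar.add(path, arcname=os.path.basename(path))
    with open(out_path, "wb") as out:
        out.write(stub + MARKER + buffer.getvalue())
    return out_path


def extract_payload(bootstrap_path, dest_dir):
    """Locate the payload after the first marker and unpack it."""
    with open(bootstrap_path, "rb") as f:
        blob = f.read()
    _stub, found, payload = blob.partition(MARKER)
    if not found:
        raise ValueError("no embedded payload found")
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))
```

Because everything travels in one file, the batch system needs to be configured to transfer only that file; the remaining transfers are handled by the stages it unpacks.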

 

Stage two invokes file transfers to fetch the user sandbox, i.e. the file initially supplied by the SAMGrid user at the time of job submission, as well as the “application”-specific files needed for stage three. Our stage-three transfers are done via the SAM data handling system, and therefore the files retrieved in stage two are the data handling (SAM) clients: sam_user_api (or its successors), sam_cp and their dependencies.

 

Stage three is, strictly speaking, outside of JIM sandboxing per se. It retrieves SAM datasets specified by the upper-level layers (JIM job managers and other SAMGrid services) and passes control to the “user script”, i.e. the script supplied by the end user at the time of SAMGrid job submission. In principle, this script may be the base of a new, user-designed bootstrapping sequence involving even more files.

3.2         Sandbox replication service

 

The core of the sandbox replication service is a built-in file transfer mechanism. It was designed to be part of the sandboxing management in order to simplify the configuration of SAMGrid execution sites. We chose a flavor of gridFTP, which for our purposes is nothing but a common FTP client/server suite with specially configured security mechanisms. Our choice of gridFTP was driven by the popularity of both the underlying FTP mechanisms and the GSI standards, as well as by the ease of deriving the security context from that of the associated Grid job.

 

The actual file transfers are authenticated with (a form of) the same X509 credential that was used to authenticate the Grid job at the execution site at hand. This proxy credential is an important part of the job’s Grid context; it is typically accessed through the X509_USER_PROXY environment variable. The authorization file is also derived from that of the cluster’s gatekeeper and restricts access to those users who were authorized to run jobs at this site in the first place.

 

Conceptually, this service is instantiated dynamically for the duration of the Grid job (i.e. for the lifetime of the associated local jobs in the batch system). In practice, we prefer to statically deploy a server on the gateway node, called jim_gridftp, that can be used for multiple jobs of multiple users. Dynamic server starting/stopping is also supported by the jim_gridftp software package, which provides additional isolation of individual Grid jobs from each other.

3.3         Output Management

In the second stage of the bootstrapping process, JIM sandbox sets an environment variable, OUTPUT_FILE, that is propagated through the subsequent stages to all the layers of the user job. The user job may gather files that it deems important (these may include any core files, as long as they are not too big) and create, e.g., a compressed tar file at the location the variable points to. Upon completion of the user job (successful or not), the second-stage sandbox manager uses the same file transfer mechanism as the one in the sandbox replication service to transfer the aggregated output back to the physical location of the sandbox (on the head-node). Afterwards (and outside the scope of JIM sandbox), when the JIM job managers terminate the Grid job and destroy the sandbox, such output files from all the local jobs are aggregated further into what can be considered the output of the Grid job as a whole. This final file is later pulled through the gateway back to the Grid file spooling area and, ultimately, is retrieved by the Grid job owner.
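
From the user job's side, the contract above amounts to writing one archive at the path named by OUTPUT_FILE. The following Python sketch shows one way a job might do so. Only the OUTPUT_FILE variable comes from the text; the function name, the gzipped tar format and the size cap standing in for “not too big” are assumptions.

```python
import os
import tarfile

# Hypothetical cap for "not too big"; the text gives no concrete limit.
MAX_BYTES = 100 * 1024 * 1024


def collect_output(paths, environ=None, max_bytes=MAX_BYTES):
    """Pack existing, reasonably small files into the tar named by OUTPUT_FILE."""
    environ = os.environ if environ is None else environ
    target = environ.get("OUTPUT_FILE")
    if not target:
        raise RuntimeError("OUTPUT_FILE is not set")
    packed = []
    with tarfile.open(target, "w:gz") as tar:
        for path in paths:
            # Skip files that were never produced or that exceed the cap.
            if os.path.isfile(path) and os.path.getsize(path) <= max_bytes:
                name = os.path.basename(path)
                tar.add(path, arcname=name)
                packed.append(name)
    return packed
```

A job would typically call something like this last, also on its failure paths, since the sandbox manager transfers the output whether or not the job succeeded.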

4           The Code, Distribution and Installation

The implementation physically resides in the CDCVS package jim_sandbox. Its src/python subdirectory contains the main sandbox.py file with the Sandbox class definition; additional implementation files are found in the same directory. Both a CLI and a Python API are provided for the functionality described in the design section: create(), enter(), add(), package(), destroy().

The src/shell subdirectory contains the shell scripts for the packing of the bootstrap. Other scripts are used to save and restore the configuration of the (SAMGrid) packages in the form of files which can easily be added into the sandbox and processed by the described mechanisms. The src/c subdirectory contains miscellaneous C routines, some of which probably belong in a more generic “util” package. An important utility is the “sleeper”, which pauses the current process until the start of the X509 proxy validity period. This compensates for discrepancies between local clocks, as the gridFTP client requires. The etc subdirectory of the package contains, most importantly, the template for the second-stage sandbox manager and a “half-packed” tar file with the gridFTP client. More detailed information is contained in the code itself.
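
The logic of the “sleeper” can be sketched in a few lines. The real utility is a C routine; this Python version is an illustration only, and it assumes that the start of the proxy validity period is already known as a Unix timestamp (how the real utility obtains it is not described here). All function names are hypothetical.

```python
import time


def seconds_until_valid(not_before, now):
    """Return how long to wait until `not_before`, never a negative value."""
    return max(0.0, not_before - now)


def sleep_until_valid(not_before, clock=time.time, sleep=time.sleep):
    """Pause until the proxy validity start; return the seconds waited."""
    delay = seconds_until_valid(not_before, clock())
    if delay > 0:
        # The local clock is behind the clock that issued the proxy.
        sleep(delay)
    return delay
```

Passing the clock and the sleep function as parameters keeps the sketch testable without actually waiting.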

The package is presently distributed via FNAL KITS. It uses, as dependencies, the jim_gridftp product and miscellaneous utilities common to many SAMGrid packages. When installing the product, the only essential configuration parameter is the disk location where sandboxes will be physically created as OS directories. The size of this local storage is determined by the product of the number of Grid jobs (O(10); the SAMGrid design intelligently avoids proliferation of Grid jobs by structuring them appropriately) and the “typical” sandbox size (O(1GB)).

5           Status and Issues

 

The product is deemed to be reasonably stable and we do not anticipate additional significant development in the near future. It has been thoroughly refurbished and stripped of most of the unnecessary and legacy code. Perhaps the only item that was planned but postponed, due to the lack of immediate need, was the stage-two throttling of the sandbox constituents’ transfers through a mechanism such as “fcp”. (We use such control widely at stage three for bigger files.)

 

Some of the issues remain at the design level, however.

 

 

 

 

6           Suggestions

 

Please send your suggestions or comments about this document to Igor Terekhov (terekhov@fnal.gov) and Gabriele Garzoglio (garzoglio@fnal.gov).

 

 

Last updated on Friday, August 27, 2004.

 



[1] We often refer to this executable as a self-extractor, inspired by the self-extracting archives used since the days of MS-DOS.