Date: 11/10/98

To: Run II Steering Committee

From: K. Wyatt Merritt

RE: Questions for Data Access Workshop

DØ has the following questions, and comments on previous questions, for the Data Access Workshop:

Overview questions from Marjorie:

  1. Meant to be answered by the first talk.
  2. Meant to be answered by the first talk.

Tape Handling Questions from Marjorie:

As Marjorie stated, many of these questions are more global than SAM / ENSTORE. Some of them are viewed at DØ as the province of the Global Computing Model Working Group, chaired by Paul Grannis. This group has completed the first part of its charge (related to desktop and analysis computing at Fermilab), but the second part (related to analysis at remote institutions) is deferred until next year. We are confident that the SAM / ENSTORE choices do not preclude, for example, export of those systems to reasonably equipped outside centers, but we do not view it as part of the SAM / ENSTORE project to provide for all remote analysis. Our plan for that will not be fully developed until the Grannis committee reconvenes for its second round. We view it as a far higher priority for the current workshop to get the data access projects on a sound, mutually agreed footing than to solve all the issues in remote computing. The Import facility for SAM / ENSTORE is sufficient to take care of the one immediate need for connection with remote institutions, that of importing Monte Carlo data written outside Fermilab, and it has been demonstrated within the prototype. With that in mind, here is our take on these questions:

  1. University import/export: SAM provides import scripts. Universities would provide their own tape handling software to get exported data tapes read onto disk; the file management from that point could be SAM but need not be. We would bounce this to the GCM Working Group.
  2. What does this question refer to? The robot is the data archive; we have not addressed the general problem of backups, which is known to be difficult. We would defer this to a different discussion.
  3. For truly small data sets, this becomes part of the general backup problem – see above.
  4. Didn’t the choice of a flexible robot offer a way around most of that question?
  5. Specifics of this are still being worked out in SAM / ENSTORE. Of course it is a necessary piece of the system.
  6. Good question to ask.

Resource management questions from Dane and Marjorie:

These are all reasonable questions to ask, and we think they are answerable in the SAM / ENSTORE framework. Many of them are condensed into our question (3), below.

File and database questions from Marjorie:

These are all reasonable questions to ask. Given the very small amount of lead time that the SAM / ENSTORE crew will have after returning from SC98, we should not expect detailed answers to all of these and the resource questions by Tuesday/Wednesday. But it is certainly worthwhile to keep track of these questions and make sure the evolving documentation answers them.

DØ questions for data handling:

  1. What are the modes of access to data which the system must provide (e.g., single file access, freight train, find single events, etc.)?
  2. What information must be made available to the general user to aid in constructing an analysis project (e.g., lists of previously specified projects, lists of files available on disk, etc.)? What facilities must be provided for the steering of analysis projects (e.g., resubmission with appropriate bookkeeping after job failure, direction to a particular server as most appropriate, denial of access for unreasonable resource consumption, guarantee of access for high priority activity, etc.)? [The file/event database in SAM has been constructed with the aim of satisfying what is hoped to be the full range of answers to those questions. It evolves when the group comes up with new additions to those lists.]
  3. At which points in the system will "policies" (i.e., code to regulate the use of scarce resources) be implemented? What information is available at each of those points to be used in creating policies?