Minutes of the Persistency/Schema Evolution Meeting

18 May 2000

Present:  Marc Paterno, Amber Boehnlein, Jim Kowalkowski, Wyatt Merritt (notes), Herb Greenlee, Vicky White, Scott Snyder, Qizhong Li, Harry Melanson, John Hobbs (partially)

1)  We had a brief discussion of the header vector concept.  This allows for reading a small sample of information from the event, which can be used to decide whether or not to read in the whole event from disk.  This capability exists in EVPACK, but not in DSPACK.  However, we don't yet have an implementation of the header vector.  The most basic implementation would be just run and event number, and we will definitely want this for PickEvents and similar utilities.  Other possibilities are to include trigger and/or physics information.  ==>  We should not forget to schedule an implementation of this consistent with the SAM schedule for implementing PickEvents.
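
As an illustration only, the most basic header vector (run and event number) and its use by a PickEvents-like utility might be sketched as follows.  The names EventHeader and PickList are hypothetical; this is not the EVPACK interface, and no header vector implementation exists yet.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical minimal header: just run and event number.
// Trigger or physics information could be added later.
struct EventHeader {
    std::uint32_t run;
    std::uint32_t event;
};

// Hypothetical pick list keyed by (run, event).
class PickList {
public:
    void add(std::uint32_t run, std::uint32_t event) {
        wanted_.insert(std::make_pair(run, event));
    }

    // True if the full event should be read in from disk.
    bool wants(const EventHeader& h) const {
        return wanted_.count(std::make_pair(h.run, h.event)) > 0;
    }

private:
    std::set<std::pair<std::uint32_t, std::uint32_t>> wanted_;
};
```

A reader would fetch only the EventHeader for each event, call wants(), and read the full event only when it returns true.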

2)  We discussed how to write data tiers of events which contain subsets of event chunks.  We would like to have the flexibility to write two or more data tiers from within the same program, and not require that the second be a fully contained subset of the first (which is true if the writing package just uses the DropChunk method).  Also, the current implementation of writing out subsets involves making a copy of the Event which "owns" the same chunks as the original Event.  This has the potential to cause problems for users if they inadvertently manipulate the wrong Event.  Two proposals were discussed, each of which provides the required flexibility and avoids the Event copy.
 

Proposal A    Change Event so that it does not inherit from d0_Object;  instead it will contain another class PersistentEvent which inherits from d0_Object and contains a list of d0_refs to the persistent chunks.   The WriteEvent package would change so that it hands off the PersistentEvent to D0OM, rather than the Event.  The ReadEvent package would change so that it would use a public constructor of Event that takes a PersistentEvent as input.
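
A minimal sketch of the shape of Proposal A, under stated assumptions: PersistentBase stands in for d0_Object, ChunkRef (a std::shared_ptr) stands in for d0_ref, and the makePersistent method with its name-based chunk selection is purely hypothetical.  None of this is the real D0OM or EDM interface.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for d0_Object; the real base class is not reproduced here.
struct PersistentBase {
    virtual ~PersistentBase() = default;
};

// Stand-in for a persistent chunk, identified here by a name.
struct Chunk : PersistentBase {
    explicit Chunk(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Stand-in for a d0_ref to a chunk.
using ChunkRef = std::shared_ptr<Chunk>;

// Only this class would be handed to the persistency layer.
class PersistentEvent : public PersistentBase {
public:
    void addRef(const ChunkRef& c) { refs_.push_back(c); }
    const std::vector<ChunkRef>& refs() const { return refs_; }

private:
    std::vector<ChunkRef> refs_;
};

// Event no longer inherits from the persistent base class.
class Event {
public:
    Event() = default;

    // Public constructor of the kind the ReadEvent package would use.
    explicit Event(const PersistentEvent& p) : chunks_(p.refs()) {}

    void insert(const ChunkRef& c) { chunks_.push_back(c); }
    std::size_t size() const { return chunks_.size(); }

    // Hypothetical helper: build the object that the WriteEvent package
    // would hand off, keeping only chunks whose names appear in `keep`.
    PersistentEvent makePersistent(const std::vector<std::string>& keep) const {
        PersistentEvent p;
        for (const ChunkRef& c : chunks_) {
            for (const std::string& k : keep) {
                if (c->name == k) { p.addRef(c); break; }
            }
        }
        return p;
    }

private:
    std::vector<ChunkRef> chunks_;
};
```

In this sketch two tiers can be built from one Event without copying it, and neither tier needs to be a subset of the other; for example, one tier could keep "raw" and "tracks" while another keeps "tracks" and "jets".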

Proposal B    Implement filtering for output streams using the D0_Output_Filter class, with one filter associated with each stream.  The filter for each stream would use a selection function that needs access to the Event in order to extract tag information and, perhaps, the chunks as well.  The EDM interface might need a new method to deliver tag information.
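
A rough sketch of per-stream filtering in the spirit of Proposal B.  The OutputFilter class, its Selector signature, the hasTag helper and the simplified Event are all assumptions made for illustration; the real D0_Output_Filter interface is not reproduced here.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for an Event: tag information plus chunk names.
struct Event {
    std::vector<std::string> tags;
    std::vector<std::string> chunks;
};

// Hypothetical accessor of the kind the EDM interface might need to add.
inline bool hasTag(const Event& ev, const std::string& tag) {
    return std::find(ev.tags.begin(), ev.tags.end(), tag) != ev.tags.end();
}

// Hypothetical per-stream filter; one instance per output stream.
class OutputFilter {
public:
    // Decides, for a given event, whether a named chunk is written.
    using Selector = std::function<bool(const Event&, const std::string&)>;

    explicit OutputFilter(Selector s) : select_(std::move(s)) {}

    // Names of the chunks this stream would write for the event.
    std::vector<std::string> selected(const Event& ev) const {
        std::vector<std::string> out;
        for (const std::string& c : ev.chunks) {
            if (select_(ev, c)) out.push_back(c);
        }
        return out;
    }

private:
    Selector select_;
};
```

Each stream applies its own selection function to the same Event, so two streams can write overlapping but non-nested sets of chunks without any copy of the Event being made.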
 
 
Pros for A
- removes problems with inconsistent copy of Event

- separates Event structure, as passed around in the framework, from persistency; therefore, simplifies maintenance of Event by allowing changes for efficiency, etc., without affecting persistent format and introducing backwards compatibility problems
 

Pros for B
- removes problems with inconsistent copy of Event
 
 
Cons for A
-  changes are required to any code which uses d0stream directly, for example the pileup package.  [Note that the EDM meeting will also propose changes which, if ratified, will require this package to change anyway.]
 
 
Cons for B
-  leaves empty pointers in the subset data tiers, which bloats the data somewhat (not much) and is also inelegant

Result:   The DØOM group sees nothing intrinsically wrong with A, but B was the strategy originally envisioned.  There is agreement that we should move from the current implementation to either A or B.  A is preferred by the EDM maintainers because it lets them make changes to the Event infrastructure more transparently.  Neither choice implies big changes for the ordinary reco developer, so this is not a high-cost infrastructure change.  As noted above, choice A does imply changes to some non-infrastructure code, but that code may have to change anyhow if other proposed EDM modifications are adopted.  We should choose between A and B soon:  let's set a goal of two weeks, so we can get this one out of the way.  We will want to be testing data tiers in MCC Phase 3, so we need to have this capability by then.

3)  User-controlled version control with DØOM

We discussed two much-desired improvements for schema evolution:

-  infrastructure objects need to change without invalidating old data (e.g. when we changed Chunk ID, this was a problem)

-  the auto conversion mechanism in DØOM covers most cases, but we are sure to encounter important exceptions.

Scott is already thinking about how to address these new requirements.  Jim presented a prototype scheme with the basic features.  We will need to have some such mechanism in place before the September reco production release.  A good feature of the plan is that, again, it is low cost for ordinary users.  Classes for which the auto conversion is sufficient do not need to change their implementations at all.  User code only needs to use the new mechanisms from the point at which it introduces a new version that needs user-controlled conversion.
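
The details of the prototype are not recorded in these minutes.  Purely as an illustration of the general idea, and not of Jim's scheme or of any DØOM interface, a user-controlled conversion mechanism could look like the sketch below.  ConverterRegistry, Record and the class and field names in the usage note are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Stand-in for stored object data: field name -> value.
using Record = std::map<std::string, double>;

// Hypothetical registry of user-written version converters.
class ConverterRegistry {
public:
    using Converter = std::function<Record(const Record&)>;

    // Register a conversion from version `from` to version `from + 1`.
    void add(const std::string& cls, int from, Converter c) {
        table_[std::make_pair(cls, from)] = std::move(c);
    }

    // Bring a record from version `from` up to version `to`.
    // Steps with no registered converter leave the record unchanged,
    // which stands in here for the automatic conversion.
    Record upgrade(const std::string& cls, int from, int to, Record r) const {
        for (int v = from; v < to; ++v) {
            auto it = table_.find(std::make_pair(cls, v));
            if (it != table_.end()) r = it->second(r);
        }
        return r;
    }

private:
    std::map<std::pair<std::string, int>, Converter> table_;
};
```

In this sketch a class that changed its stored energy from MeV to GeV between versions 1 and 2 would register one converter for version 1, while classes with no registered converter pass through untouched, mirroring the point that only classes needing user-controlled conversion have to change.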

4)  As an aside in these discussions, it was pointed out that d0cint already needs to preserve its dictionary file along with each event or group of events from a given (raw?) file.  Since the d0cint dictionary is comparable in size to a DST event, this may have serious implications for the size of derived data sets, which have not been taken into account in the estimates of required storage.  A further aside was the observation that it would be nice to have new timing numbers for event I/O, using EVPACK.