1) We briefly discussed the header vector concept. A header vector is
a small sample of information read from the event, which can be used
to decide whether or not to read the whole event in from disk.
This capability exists in EVPACK, but not in DSPACK. However, we
don't yet have an implementation of the header vector. The most basic
implementation would be just run and event number, and we will definitely
want this for PickEvents and similar utilities. Other
possibilities are to include trigger and/or physics information.
==> we should not forget to schedule an implementation of this, consistent
with the SAM schedule for implementing PickEvents.
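As an illustration of the idea only, here is a minimal sketch of header-based selection. It assumes the most basic header discussed above (run and event number). The names EventHeader, StoredEvent and selectByHeader are hypothetical and are not part of EVPACK, DSPACK or D0OM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical header record: run and event number only.
struct EventHeader {
    std::uint32_t run;
    std::uint32_t event;
};

// Hypothetical stand-in for an event on disk: a small header plus a
// (potentially large) body that is expensive to read.
struct StoredEvent {
    EventHeader header;
    std::vector<char> body;
};

// Returns the indices of the events whose header passes the selection.
// Only the header is inspected; the body is never touched, which models
// skipping the full read for rejected events.
std::vector<std::size_t> selectByHeader(
    const std::vector<StoredEvent>& file,
    const std::function<bool(const EventHeader&)>& accept)
{
    assert(accept && "a selection function is required");
    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < file.size(); ++i) {
        if (accept(file[i].header)) {
            picked.push_back(i);
        }
    }
    return picked;
}
```

A PickEvents-style utility would pass a selection that matches a list of (run, event) pairs. Trigger or physics information could be added to the header later without changing the shape of the loop.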
2) We discussed how to write data tiers of events which contain
subsets of event chunks. We would like to have the flexibility to
write two or more data tiers from within the same program, without
requiring that the second be a fully contained subset of the first (as is
necessarily the case if the writing package just uses the DropChunk
method). Also, the current implementation of writing out subsets involves
making a copy of the Event which "owns" the same chunks as the original
Event. This has the potential to cause problems for users if they
inadvertently manipulate the wrong Event. Two proposals were discussed,
each of which provides the required flexibility and avoids the Event copy.
Proposal A: Change Event so that it does not inherit from d0_Object; instead it will contain another class, PersistentEvent, which inherits from d0_Object and contains a list of d0_refs to the persistent chunks. The WriteEvent package would change so that it hands off the PersistentEvent to D0OM, rather than the Event. The ReadEvent package would change so that it would use a public constructor of Event that takes a PersistentEvent as input.
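A rough sketch of the class layout in Proposal A, to make the separation concrete. Everything here is a stand-in: PersistentBase plays the role of d0_Object, std::shared_ptr plays the role of d0_ref, and the Chunk name field and the makeTier helper are hypothetical. In particular, building a tier by selecting chunk references is only one assumed way the write side could use this layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the persistent base class (not the real d0_Object).
struct PersistentBase {
    virtual ~PersistentBase() = default;
};

// Stand-in for a chunk; 'name' is a hypothetical field for illustration.
struct Chunk : PersistentBase {
    explicit Chunk(std::string n) : name(std::move(n)) {}
    std::string name;
};

// std::shared_ptr plays the role of d0_ref in this sketch.
using ChunkRef = std::shared_ptr<Chunk>;

// The only class handed to the I/O layer: a list of chunk references.
class PersistentEvent : public PersistentBase {
public:
    void add(ChunkRef c) {
        assert(c != nullptr);
        chunks_.push_back(std::move(c));
    }
    const std::vector<ChunkRef>& chunks() const { return chunks_; }
private:
    std::vector<ChunkRef> chunks_;
};

// Event does not inherit from the persistent base; it contains a
// PersistentEvent instead.
class Event {
public:
    Event() = default;
    // Read path: public constructor taking what was read back.
    explicit Event(PersistentEvent p) : persistent_(std::move(p)) {}

    void addChunk(ChunkRef c) { persistent_.add(std::move(c)); }
    // Write path: the object that would be handed to the I/O layer.
    const PersistentEvent& persistent() const { return persistent_; }
    std::size_t chunkCount() const { return persistent_.chunks().size(); }
private:
    PersistentEvent persistent_;
};

// Hypothetical helper: build the persistent view for one data tier from
// references to the selected chunks. The Event itself is not copied.
PersistentEvent makeTier(const Event& e, const std::vector<std::string>& keep) {
    PersistentEvent tier;
    for (const ChunkRef& c : e.persistent().chunks()) {
        if (std::find(keep.begin(), keep.end(), c->name) != keep.end()) {
            tier.add(c);
        }
    }
    return tier;
}
```

With this layout, two tiers such as {"tracks", "jets"} and {"raw", "jets"} can be built from the same Event in one program, and neither has to be a subset of the other.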
Proposal B: Implement filtering for output streams using the
D0_Output_Filter class, with a filter associated with each stream. The
filter for each stream would use a selection function that needs access
to the Event, in order to extract tag information, as well as (perhaps?)
the chunks. The EDM interface might have to add a method to deliver
tag information; this is still an open question.
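A rough sketch of per-stream filtering as described in Proposal B. This is not the D0_Output_Filter interface, which is not reproduced in these notes; StreamFilter, the Event tags field and the selection signature are all hypothetical. Rejected chunks are modelled as null slots, on the assumption that each stream keeps one slot per chunk of the original Event.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real classes.
struct Chunk {
    std::string name;
};
using ChunkRef = std::shared_ptr<Chunk>;

struct Event {
    std::vector<std::string> tags;  // hypothetical tag information
    std::vector<ChunkRef> chunks;
};

// Hypothetical per-stream filter: one instance per output stream.
class StreamFilter {
public:
    // The selection function sees the Event (for tags) and each chunk.
    using Selector = std::function<bool(const Event&, const Chunk&)>;

    explicit StreamFilter(Selector s) : select_(std::move(s)) {
        assert(select_ && "a selection function is required");
    }

    // Returns the chunk slots as they would be written to this stream.
    // A rejected chunk leaves a null slot; the Event is never copied
    // or modified.
    std::vector<ChunkRef> apply(const Event& e) const {
        std::vector<ChunkRef> out;
        out.reserve(e.chunks.size());
        for (const ChunkRef& c : e.chunks) {
            out.push_back(select_(e, *c) ? c : ChunkRef{});
        }
        return out;
    }

private:
    Selector select_;
};
```

Because each stream applies its own filter to the same unmodified Event, the streams can carry overlapping but non-nested subsets of chunks.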
Pros for A
- removes problems with inconsistent copy of Event
- separates Event structure, as passed around in the framework, from persistency; therefore, simplifies maintenance of Event by allowing changes for efficiency, etc., without affecting persistent format and introducing backwards compatibility problems
Pros for B
- removes problems with inconsistent copy of Event
Cons for A
- changes are required to any code which uses d0stream directly, for example the pileup package. [Note that the edm meeting will also propose changes which, if ratified, will require this package to change anyway.]
Cons for B
- leaves empty pointers in the subset data tiers; some bloating of data (not much) and also inelegant
Result: The DØOM group sees nothing intrinsically wrong with A, although B was the strategy originally envisioned. There is agreement that we should move from the current implementation to either A or B. The edm maintainers prefer A, because it would let them change the Event infrastructure more transparently. Neither choice implies big changes for the ordinary reco developer, so this is not a high-cost infrastructure change. As noted above, choice A does imply changes to some non-infrastructure code, but that code may have to change anyhow if other proposed edm mods are adopted. We should choose between A and B soon: let's set a goal of two weeks, so we can get this one out of the way. We will want to be testing data tiers in MCC Phase 3, so we need to have this capability.
3) User controlled version control with DØOM
We discussed two much-desired improvements for schema evolution:
- infrastructure objects need to change without invalidating old data (e.g. when we changed Chunk ID, this was a problem)
- the auto conversion mechanism in DØOM covers most cases, but we are sure to encounter important exceptions.
Scott is already thinking about how to address these new requirements. Jim presented a prototype scheme with the basic features. We will need to have some such mechanism in place before the September reco production release. A good feature of the plan is that, again, it is low cost for ordinary users. Classes for which the auto conversion is sufficient do not need to change their implementations at all. User code only needs to use the new mechanisms from the point at which it introduces a new version that needs user-controlled conversion.
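To make the idea of user-controlled conversion concrete, here is a minimal sketch. It is not Jim's prototype and not a DØOM interface; StoredHit, ConversionTable and the unit change between versions are hypothetical. The example assumes a class whose version 1 stored an energy in MeV and whose version 2 stores it in GeV, a change that automatic conversion could not be expected to get right because the field keeps the same name and type.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical persistent record with an explicit version number.
// Assumed meaning of 'energy': MeV in version 1, GeV in version 2.
struct StoredHit {
    int version;
    double energy;
};

// A user-written converter takes a record at version N and returns the
// equivalent record at version N + 1.
using Converter = std::function<StoredHit(const StoredHit&)>;

class ConversionTable {
public:
    // Register the converter that upgrades records at 'fromVersion'.
    void add(int fromVersion, Converter c) {
        steps_[fromVersion] = std::move(c);
    }

    // Bring a record up to 'target' by chaining registered converters.
    // A record already at 'target' is returned unchanged.
    StoredHit upgrade(StoredHit h, int target) const {
        while (h.version < target) {
            auto it = steps_.find(h.version);
            if (it == steps_.end()) {
                throw std::runtime_error("no converter registered");
            }
            const int before = h.version;
            h = it->second(h);
            // Each converter must move the version forward.
            assert(h.version > before);
        }
        return h;
    }

private:
    std::map<int, Converter> steps_;
};
```

Only classes that register a converter pay any cost here; a class whose records are already at the current version passes through untouched, which matches the low-cost goal described above.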
4) As an aside in these discussions, it was pointed out that d0cint
already needs to preserve its dictionary file along with each event or
group of events from a given (raw?) file. Since the d0cint dictionary
is comparable in size to a DST event, this may have serious implications
for the size of derived data sets, which have not been taken into
account in the estimates of required storage. A further
aside was the observation that it would be nice to have new timing numbers
for event i/o, using EVPACK.