David Adams
12 May 1997 1940
[http://www.bonner.rice.edu/adams/event]
The bulk of the data in HEP is "event data", i.e. that which can be uniquely associated with a particular event. This event data includes the raw data read from the detector and the data obtained by processing this raw data. In an OO world it is natural to identify the event as a class whose primary responsibility is to manage this data.
Here we present an OO model for the event. The data in the event is organized into collections of objects of common type (e.g. tracks, jets or electrons). We call these collections "chunks". Each chunk is responsible for managing a clearly defined set of data which is loosely coupled with the data in other chunks. Typically an event is created with raw data and then a series of reconstruction algorithms is applied. Each chunk contains a reference to a generator (aka a reconstructor) which is used to generate its data.
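The relationship between event and chunks can be sketched in a few lines of C++. This is only an illustration of the idea, not the actual event package: the names Track, TrackChunk, name(), insert(), find() and make_example_event() are assumptions made here, and the generator reference is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: a chunk manages one collection of event data.
class Chunk {
public:
    virtual ~Chunk() {}
    virtual std::string name() const = 0;    // label for this collection
    virtual std::size_t size() const = 0;    // number of objects held
};

// Hypothetical data object and the concrete chunk that collects it.
struct Track { double pt; };

class TrackChunk : public Chunk {
public:
    std::string name() const override { return "tracks"; }
    std::size_t size() const override { return tracks_.size(); }
    void add(const Track& t) { tracks_.push_back(t); }
private:
    std::vector<Track> tracks_;
};

// The event owns its chunks and knows nothing about their concrete types.
class Event {
public:
    void insert(std::unique_ptr<Chunk> chunk) { chunks_.push_back(std::move(chunk)); }
    std::size_t chunk_count() const { return chunks_.size(); }
    // Returns the first chunk with the given name, or nullptr if none is stored.
    const Chunk* find(const std::string& name) const {
        for (const auto& c : chunks_)
            if (c->name() == name) return c.get();
        return nullptr;
    }
private:
    std::vector<std::unique_ptr<Chunk>> chunks_;
};

// Builds an event holding a single chunk of two tracks.
inline Event make_example_event() {
    std::unique_ptr<TrackChunk> tracks(new TrackChunk);
    tracks->add(Track{12.5});
    tracks->add(Track{3.1});
    Event event;
    event.insert(std::move(tracks));
    return event;
}
```

Because Event depends only on the abstract Chunk, new kinds of data can be added without touching the event class.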
Some justification of this model can be found in a talk given to the D0 data model group.
Here are some requirements for this or any other model.
The model is illustrated in five figures: a data class diagram, a key class diagram, a generator class diagram, an event trace for creating a chunk and an event trace for retrieving a chunk. Physical dependencies are shown within the model and for RECO packages which make use of the event package.
The association between chunk and generator is indirect. The chunk uses the static generator manager to fetch the concrete generator. Strong typing is lost, but the method get_type() is used at run time to verify consistency. The advantage is that the coupling between the concrete chunk and generator is broken. It is possible to have an executable which fetches and manipulates a chunk and its data without linking in the code for generation of that data.
Different types of keys may be derived from the TypeKey.
These are templates with the same argument.
We immediately identify two or three within the event category.
The concrete NameKey
The template TypeKey also provides a method for promoting chunks,
i.e. converting a persistent pointer into a real pointer.
The details of this conversion depend on the persistency mechanism
but this placement of the conversion allows for minimal coupling.
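A typed key along these lines might look like the following sketch. It is written under several assumptions: the signatures of TypeKey and NameKey shown here, the accepts() hook, the chunks() accessor and the JetChunk and ElectronChunk classes are all invented for illustration. In particular, promote() is reduced to a dynamic_cast as a stand-in, since the real conversion depends on the persistency mechanism.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Chunk {
public:
    explicit Chunk(std::string name) : name_(std::move(name)) {}
    virtual ~Chunk() {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Hypothetical concrete chunks.
class JetChunk : public Chunk {
public:
    JetChunk(std::string name, int njets) : Chunk(std::move(name)), njets_(njets) {}
    int njets() const { return njets_; }
private:
    int njets_;
};

class ElectronChunk : public Chunk {
public:
    explicit ElectronChunk(std::string name) : Chunk(std::move(name)) {}
};

class Event {
public:
    void insert(std::unique_ptr<Chunk> c) { chunks_.push_back(std::move(c)); }
    const std::vector<std::unique_ptr<Chunk>>& chunks() const { return chunks_; }
private:
    std::vector<std::unique_ptr<Chunk>> chunks_;
};

// Key for chunks of type T. The base key accepts any chunk of that type.
template <class T>
class TypeKey {
public:
    virtual ~TypeKey() {}
    // Stand-in for promotion: here just a checked downcast (assumption).
    T* promote(Chunk* stored) const { return dynamic_cast<T*>(stored); }
    // Returns the first chunk of type T accepted by this key, or nullptr.
    T* find(const Event& event) const {
        for (const auto& c : event.chunks()) {
            T* typed = promote(c.get());
            if (typed && accepts(*typed)) return typed;
        }
        return nullptr;
    }
protected:
    virtual bool accepts(const T&) const { return true; }
};

// Derived key with the same template argument: selects by chunk name.
template <class T>
class NameKey : public TypeKey<T> {
public:
    explicit NameKey(std::string name) : name_(std::move(name)) {}
protected:
    bool accepts(const T& chunk) const override { return chunk.name() == name_; }
private:
    std::string name_;
};

// Builds an event with one electron chunk and two jet chunks.
inline Event make_jet_event() {
    Event e;
    e.insert(std::unique_ptr<Chunk>(new ElectronChunk("tight")));
    e.insert(std::unique_ptr<Chunk>(new JetChunk("cone05", 3)));
    e.insert(std::unique_ptr<Chunk>(new JetChunk("cone07", 2)));
    return e;
}
```

Keeping the conversion inside the key means that neither the event nor the client code needs to know how chunks are stored.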
GeneratorManager
The generator manager has the responsibility of
managing the list of known generators.
These are constructed by the user and assigned a name when they
are registered with the manager. The pointers from chunks to
generators are implemented with these names. This eliminates the
link-time coupling between chunks and generators. Thus it
is possible to construct a program which makes use of a chunk without
linking in the code for its generator.
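The name-based lookup might be sketched as follows. Only GeneratorManager and get_type() are names taken from this note. The add(), find() and generator() methods, the TrackFinder class, the use of shared ownership and the placement of get_type() on the generator are assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Abstract generator: reports the type of chunk it is able to fill.
class Generator {
public:
    virtual ~Generator() {}
    virtual std::string get_type() const = 0;
};

// Static registry of user-constructed generators, indexed by name.
class GeneratorManager {
public:
    // Registers a generator; returns false if the name is already in use.
    static bool add(const std::string& name, std::shared_ptr<Generator> g) {
        return registry().insert(std::make_pair(name, std::move(g))).second;
    }
    // Returns the named generator, or nullptr if none was registered.
    static Generator* find(const std::string& name) {
        auto it = registry().find(name);
        return it == registry().end() ? nullptr : it->second.get();
    }
private:
    static std::map<std::string, std::shared_ptr<Generator>>& registry() {
        static std::map<std::string, std::shared_ptr<Generator>> r;
        return r;
    }
};

// The chunk stores only the generator's name, never a typed pointer,
// so it compiles and links without any concrete generator.
class Chunk {
public:
    Chunk(std::string type, std::string generator_name)
        : type_(std::move(type)), generator_name_(std::move(generator_name)) {}
    // Resolves the name at run time and rejects a generator of the wrong type.
    Generator* generator() const {
        Generator* g = GeneratorManager::find(generator_name_);
        if (g && g->get_type() != type_) return nullptr;
        return g;
    }
private:
    std::string type_;
    std::string generator_name_;
};

// Hypothetical concrete generator; would normally live in a RECO package.
class TrackFinder : public Generator {
public:
    std::string get_type() const override { return "TrackChunk"; }
};
```

A program that never registers TrackFinder still builds. Its chunks simply report that no generator is available.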
Typically an event object is created each time the data acquisition system assembles the appropriate collection of detector data. This data is organized into a raw data chunk, an event is created and the chunk is stored in the event. The event is then defined by adding a series of predefined chunks. This list of chunks might depend on the type of trigger used to select the event. An event trace diagram shows how a chunk is created and assigned to an event.
If all chunks are enabled for automatic data generation, then any type of data (as defined by the chunks) may now be fetched. However, there are many chunks for which generation is rather time-consuming and it may be desirable to ensure that a collection of events is reconstructed in a consistent manner. For these reasons, a production computing farm is used to generate data which is then kept in persistent storage.
The data for a particular chunk is generated by constructing a key identifying the chunk (usually the same key is used for many events), using that key to fetch the chunk from the event and then asking the chunk to generate its data. The second event trace diagram shows how a key is used to select a chunk.
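The sequence just described (build a key, fetch the chunk, ask it for its data) can be sketched as below. The key is simplified to a plain string, and the names fetch(), data(), has_data(), run() and CountingGenerator are assumptions made for this example. The chunk generates its data on the first request and reuses it afterwards.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generator interface: produces the data for one chunk.
class Generator {
public:
    virtual ~Generator() {}
    virtual std::vector<double> run() = 0;
};

class Chunk {
public:
    explicit Chunk(std::shared_ptr<Generator> g)
        : generator_(std::move(g)), generated_(false) {}
    bool has_data() const { return generated_; }
    // Generates the data on the first call; later calls return the stored copy.
    const std::vector<double>& data() {
        if (!generated_) {
            data_ = generator_->run();
            generated_ = true;
        }
        return data_;
    }
private:
    std::shared_ptr<Generator> generator_;
    bool generated_;
    std::vector<double> data_;
};

class Event {
public:
    void insert(const std::string& key, std::unique_ptr<Chunk> c) {
        chunks_[key] = std::move(c);
    }
    // Returns the chunk selected by the key, or nullptr if there is none.
    Chunk* fetch(const std::string& key) const {
        auto it = chunks_.find(key);
        return it == chunks_.end() ? nullptr : it->second.get();
    }
private:
    std::map<std::string, std::unique_ptr<Chunk>> chunks_;
};

// Test generator that records how many times it has been run.
class CountingGenerator : public Generator {
public:
    CountingGenerator() : calls(0) {}
    std::vector<double> run() override {
        ++calls;
        return {1.0, 2.0, 3.0};
    }
    int calls;
};
```

The same key value can be reused for every event in a job, and generation happens once per chunk no matter how often the data is requested.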
After the data has been reconstructed, its volume is too large to fit in an affordable system of disks, tape robots or even tape warehouses. Instead, increasingly large fractions of the data are kept in each of these in turn, and a substantial fraction is discarded. This event model provides many features to facilitate this allocation:
Of course, these interactions do not always occur as ordered above. The process is iterative: there will be new types of data and new algorithms to generate existing types. The division of the event into pieces with well-defined interdependencies and history helps to ensure that invalid data is regenerated and valid data is not. There may also be changes in the types of data, i.e. schema evolution. This is also something that could be handled by the chunk.
The above requirements hold for both packages (e.g. the event classes or the software used for one kind of data) and for individual classes. By design the event model has no dependencies on any of the packages that will be used to access or generate data in the event.
The first figure shows the dependencies for classes within the event package. Dependencies introduced by allowing Chunks to construct their own data are shown with dashed lines. Other than these, we see that there are no cyclic dependencies except between the event and key classes. The key must make use of the event to extract chunks. The event uses keys when inserting new chunks. We have chosen to keep this small cycle rather than to complicate the model by splitting the key class to try to break the cycle.
The cycles introduced by allowing chunks to generate their data would likely be difficult or impossible to remove without giving up this capability. Note however that this cyclic behavior is confined to the event package and is not reflected in the derived classes which appear in the reconstruction packages.
The second figure shows the dependencies for a reconstruction package whose data is built from the data in one other package. We see the dependencies are very clean and that no cycles appear. The event classes are not shown in this diagram but there are no dependencies pointing back from that system.
Here are some comments and questions about the model.
Please direct questions or comments to David Adams (adams@physics.rice.edu).