Monte Carlo Data Flow

David Adams

August 12, 1996

I have been trying to understand how GEANT, Pythia, Isajet and other legacy FORTRAN tools will be integrated into an OO (object-oriented) environment. My ideal solution is shown in option 1. Intrinsically OO subsystems are labeled as such. The objects in these systems (geometry, magnetic field, Monte Carlo tracks, hits, digitizations and general reco) serve as the primary data stores. A persistent storage system is shown and each of these subsystems can save its data and rebuild itself from this store. The legacy non-OO systems are included through OO wrappers which enable them to communicate directly with these OO subsystems. Note that in these diagrams, arrows indicate the path and direction of data flow. The only access to persistent store is by objects which are storing or restoring themselves.
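As a rough illustration of option 1, the sketch below shows a subsystem that stores and restores itself, and a wrapper that lets a legacy routine exchange data with such subsystems. Every name here (PersistentStore, Subsystem, LegacyTrackerWrapper, legacy_track_step) is hypothetical and invented for this example. The store is an in-memory dictionary and the "legacy" routine is a plain Python function standing in for compiled FORTRAN. It is a sketch under those assumptions, not a proposed design.

```python
import json


class PersistentStore:
    """Hypothetical stand-in for the persistent storage system (in memory only)."""

    def __init__(self):
        self._records = {}

    def write(self, key, state):
        self._records[key] = json.dumps(state)

    def read(self, key):
        return json.loads(self._records[key])


class Subsystem:
    """An OO subsystem that is the primary holder of its own data."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = state if state is not None else {}

    def save(self, store):
        # Only the object itself writes its data to the store.
        store.write(self.name, self.state)

    @classmethod
    def restore(cls, name, store):
        # Rebuild the subsystem from what it saved earlier.
        return cls(name, store.read(name))


def legacy_track_step(x, step):
    """Assumed stand-in for a legacy FORTRAN routine."""
    return x + step


class LegacyTrackerWrapper:
    """OO wrapper: takes input from one subsystem, fills another."""

    def __init__(self, geometry, hits):
        self.geometry = geometry
        self.hits = hits

    def run(self, start):
        step = self.geometry.state["layer_spacing"]
        x = start
        positions = []
        for _ in range(self.geometry.state["n_layers"]):
            x = legacy_track_step(x, step)
            positions.append(x)
        self.hits.state["positions"] = positions


store = PersistentStore()
geometry = Subsystem("geometry", {"layer_spacing": 2.0, "n_layers": 3})
hits = Subsystem("hits")

LegacyTrackerWrapper(geometry, hits).run(start=0.0)
hits.save(store)
restored = Subsystem.restore("hits", store)
```

The wrapper never touches the store. It only talks to the subsystems, which matches the rule that persistent store is accessed only by objects storing or restoring themselves.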

At the August D0 upgrade meeting, it was proposed that D0 organize its Monte Carlo along the lines CMS has followed in its current simulation framework. The geometry data is stored in Zebra title cards, tracks are input through ntuples and hits are stored in GEANT Zebra banks. Option 2 shows one way of organizing the data flow in this case. The legacy code no longer makes use of OO wrappers and the Monte Carlo generation process can be sensibly implemented in FORTRAN. Both the disadvantages and the benefits of an object-oriented design are eliminated.

I assume that in the reconstruction process we will want to move to an OO framework and that access to the data will be through the subsystems described above. This imposes the requirement that the subsystems must exist (either in memory or in persistent store). This is provided automatically in option 1 but imposes additional demands on option 2.

I have assumed the user creates the geometry and field subsystems independently of their parallel title card store. One could also introduce another store which fills both of these or introduce a mixed system where the title cards or objects are filled from one another. Note that if the OO subsystems are viewed as the primary store, we are moving in the direction of option 1.

I have assumed that the GEANT framework stores the event data (tracks and hits) directly in DSPACK to avoid interfacing with C++. One could introduce another store, such as keeping the information in Zebra. I have also assumed that the digitization takes place in the OO framework, but it is easy to see how it can be moved back into GEANT.
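The decoupling in option 2 can be pictured as follows: the legacy side writes flat event records to an intermediate store, and the OO side turns those records into objects. In this sketch a plain list of tuples stands in for the intermediate store. It does not use or imitate the DSPACK interface, and the record layout (track id, detector name, energy) and the names Hit, legacy_generate_event, hits_from_records and hits_by_track are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """OO view of one hit (hypothetical fields)."""
    track_id: int
    detector: str
    energy_gev: float


def legacy_generate_event():
    """Stand-in for the legacy side: returns flat records as it might
    write them to an intermediate store. The values are invented."""
    return [
        (1, "CAL", 12.5),
        (1, "MUON", 0.8),
        (2, "CAL", 3.25),
    ]


def hits_from_records(records):
    """OO side: convert flat records into Hit objects."""
    return [Hit(track_id=t, detector=d, energy_gev=e) for t, d, e in records]


def hits_by_track(hits):
    """Group hits by the track that produced them."""
    grouped = {}
    for hit in hits:
        grouped.setdefault(hit.track_id, []).append(hit)
    return grouped


event_hits = hits_from_records(legacy_generate_event())
by_track = hits_by_track(event_hits)
```

Neither side needs to know how the other is implemented. The only shared contract is the record layout, which is the price of keeping the two systems apart.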

We do not want to give up all legacy non-OO code and there will be some mixing between OO and non-OO systems. In option 1, the burden was put on the non-OO systems to appear OO. In option 2, the OO and non-OO systems are decoupled by duplicating geometry and field information and using DSPACK to convert non-OO event data to OO. Option 3 shows the possibility of putting the burden on the OO system. It is required to produce the title card file and to be able to read Monte Carlo track and GEANT hit data from Zebra.
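To make the option 3 burden concrete, here is a minimal sketch of an OO geometry that writes itself out as card-like text for the legacy code. The line format (name, shape, half-lengths) is made up for this example and is not the real Zebra title card syntax. Volume, Geometry, to_cards and from_cards are hypothetical names. The reader method is included only to check that nothing is lost in the round trip.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """One geometry volume (hypothetical fields)."""
    name: str
    shape: str
    half_lengths_cm: tuple


class Geometry:
    """OO geometry subsystem that can export itself for legacy code."""

    def __init__(self, volumes):
        self.volumes = list(volumes)

    def to_cards(self):
        # One line per volume, in an invented card-like text format.
        # Dimensions are written with one decimal place for simplicity.
        lines = []
        for v in self.volumes:
            dims = " ".join(f"{d:.1f}" for d in v.half_lengths_cm)
            lines.append(f"{v.name} {v.shape} {dims}")
        return "\n".join(lines)

    @classmethod
    def from_cards(cls, text):
        # Parse the same invented format back into Volume objects.
        volumes = []
        for line in text.splitlines():
            name, shape, *dims = line.split()
            volumes.append(Volume(name, shape, tuple(float(d) for d in dims)))
        return cls(volumes)


geometry = Geometry([
    Volume("CALO", "BOX", (100.0, 100.0, 250.0)),
    Volume("MUON", "BOX", (400.0, 400.0, 600.0)),
])
cards = geometry.to_cards()
round_trip = Geometry.from_cards(cards)
```

Here the objects remain the primary description and the card text is a derived product, so any change to the geometry is made once, on the OO side.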

My personal preference is for option 1 because it maintains a consistent view. The legacy systems appear as any other OO system and the view of the data (persistent objects) is uniform. The other options have the "advantage" of insulating Monte Carlo users from the OO environment and no doubt will have some adherents. In any case, we should make a decision rather than fall into one.

This page is http://www.bonner.rice.edu/adams/mcdataflow/. Please direct any comments or questions to adams@physics.rice.edu.