Report to Fermi Lab

Submitted by: John Lakos and Shawn Edwards

Contents
________

I.   Event Data Model Structure
II.  Events
III. Reconstructors
IV.  Data Chunks
V.   Start-Up Checking
VI.  Miscellaneous
VII. Bibliography

I. Event Data Model Structure
_____________________________

The Event Data Model defines the core relationships in the RunII system. The components (.h, .c pairs) that make up the data model should live in one package [Lakos], hereafter called edm. The edm package is the framework that clients extend by employing subtype polymorphism, compile-time parameterization, and layering.

It is extremely important that the interfaces of the classes in this package be designed carefully enough to obviate the need for later changes. The importance of this task cannot be overstated. Changes to these interfaces will force many clients to change their code and (at the least) cause massive recompilations. Over time, it will become impossible to change these interfaces. This fundamental reality should not be viewed as negative or as a symptom of a fault in the architecture. In keeping with the spirit of Meyer's Open-Closed Principle [Meyer], edm should be closed to changes (relatively early in the project) but open to extension through layering, inheritance, or parameterization.

We have reviewed Brent's and Marc's Reconstructor proposal as well as Dave's Generator proposal and found both to be plausible. The main difference between these proposals, as we see it, is which objects bear the responsibility for producing data chunks. The Reconstructor proposal gives this responsibility to a reconstructor object, whereas in the Generator proposal the chunks are created higher in the system (main?). The chunks then populate themselves when needed by using a Generator object. This approach can be viewed as an instance of the Proxy pattern [Gamma]. We view the relative advantages and disadvantages as follows:

1. The Generator design is more complex than the Reconstructor design.
Does the added functionality, namely "on-demand" chunk regeneration, warrant the added complexity? We feel that it probably does not in this situation. The latter point should be emphasized. Your group is providing a framework for numerous clients with varying backgrounds and skill levels in object-oriented design. The framework must be easy to use, *extremely* well documented, well organized, and concise (it should be as simple and safe as possible while offering the required functionality). Any additional complexity should be closely scrutinized. We believe that adequate functionality can be achieved using the simpler Reconstructor scheme, which minimizes framework complexity.

2. The Reconstructor approach is desirable when the relationships are truly fixed (it is what it is). This hard-wired approach encourages designers to state physical dependencies explicitly.

3. The Generator approach lends itself more naturally to caching, as it relates to the "on-demand" chunk regeneration discussion above. Although the Reconstructor approach may make caching chunks more difficult, this complexity is contained within code that we are assuming the core team would own. Hence we avoid pushing excessive complexity onto clients.

Both designs have merit and have much in common. As can be seen, the arguments for one design over the other are not necessarily compelling. Either design can be made to work as long as sound design principles are followed. Given the requirements, however, we favor the simplicity and explicitness of the Reconstructor approach.

Along with this document you will find a component dependency diagram and header files that describe an event data model framework. This framework is not intended to be the final solution, as there are many details that were not addressed and many issues of which we are not fully aware.
Instead, this framework should be used as a guideline and be viewed as an illustration of the core component relationships with simple but sufficient interfaces. Notice that this framework is built upon aspects of both the Reconstructor and Generator proposals.

As previously stated, one of your goals should be to develop a framework that is easy for clients to use. In the example framework, clients are primarily concerned with three components: edm_recon, edm_chunk, and edm_tkey. To create their own reconstructor types, clients must derive their types from edm_Reconstructor and are required to provide only two member functions. The first, execute, is the entry point for all the domain-specific computation; the second, registerDependencies, simply registers dependencies on chunk types. To create new chunk types, clients simply derive from an existing chunk class (see below for discussion on chunks). The protocol for chunks will most definitely be more substantial than edm_Chunk, but in this design reconstructors work with fully typed chunks and not with the base class.

Reconstructors extract the proper chunks from the event by using the classes in edm_tkey and edm_handle. The edm_tkey component has four parameterizable keys: two that identify chunks by type (see discussion below) and two that identify chunks by type and any custom unary predicate. Using these keys is simple. The emphasis here is to allow clients to focus on writing physics algorithms rather than burdening them with intricate framework details.

II. Events
__________

An event is defined to be a collision between two physics particles. In the example framework, a particular repository holds all data associated with a given event. That is, for each event, there is a corresponding data repository. The framework provides an interface to such repositories called edm_Event (perhaps edm_EventData is a better name?). The need to separate the event interface from any implementation is crucial for several reasons.
The clients of the edm framework are presumably physicists concerned with writing and executing physics algorithms. With respect to edm_Event, clients should be concerned only with adding data to and retrieving data from a repository in a very simple manner and nothing more. They should not be burdened with specific database and platform issues, or even higher-level framework issues, when developing reconstructors. The edm_Event protocol class insulates users from these issues and at the same time limits client actions to those that are safe and precludes those that are undesirable. The example framework prevents users from taking possibly dangerous actions such as creating an event object or removing and deleting a chunk from an event. The decision of what is allowed and disallowed is obviously left to the core team's discretion.

Several different event types will most likely exist in the various Fermi systems. The concrete event type used in the Level III system will most likely have different requirements from the event type used in the analysis system. Supplying a single concrete event type in the framework that satisfies the needs of the various systems would be very difficult and would result in an error-prone implementation that would inevitably be modified on a frequent basis. Furthermore, a concrete event type would cause unnecessarily long link times. The framework would act as an anchor, dragging productivity down and in turn inhibiting testing.

Instead, the core team could provide simple concrete event classes that users can employ to test reconstructor code. These stand-alone test events would have no database or system dependencies; rather, they could have a simple parser that reads sample data (ASCII text) files. In our experience, providing test stubs of this kind is extremely useful and allows for concurrent development.
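
The stand-alone test event idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the interface in the accompanying header files: the member functions put, numChunks, and chunk, the test1_HitChunk type, the "hit <channel> <energy>" sample-file format, and the totalEnergy helper are all hypothetical names chosen for the example. The dynamic_cast merely stands in for the typed keys and handles discussed elsewhere in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the framework's chunk base class.
class edm_Chunk {
  public:
    virtual ~edm_Chunk() {}
};

// Hypothetical protocol class: pure virtual functions only, no data.
// Clients can add and look up chunks but cannot remove or delete them.
class edm_Event {
  public:
    virtual ~edm_Event() {}
    virtual void put(std::unique_ptr<edm_Chunk> newChunk) = 0;
    virtual std::size_t numChunks() const = 0;
    virtual const edm_Chunk *chunk(std::size_t index) const = 0;
};

// Illustrative chunk type holding one detector hit (an assumption).
class test1_HitChunk : public edm_Chunk {
    int    d_channel;
    double d_energy;
  public:
    test1_HitChunk(int channel, double energy)
    : d_channel(channel), d_energy(energy) {}
    int    channel() const { return d_channel; }
    double energy()  const { return d_energy; }
};

// Stand-alone test event with no database or system dependencies.
// It reads lines of the assumed form "hit <channel> <energy>" and
// silently skips anything else (blank lines, comments, other tags).
class test1_Event : public edm_Event {
    std::vector<std::unique_ptr<edm_Chunk> > d_chunks;
  public:
    explicit test1_Event(std::istream& input) {
        std::string line;
        while (std::getline(input, line)) {
            std::istringstream fields(line);
            std::string tag;
            int channel = 0;
            double energy = 0.0;
            if (fields >> tag >> channel >> energy && tag == "hit") {
                d_chunks.push_back(std::unique_ptr<edm_Chunk>(
                    new test1_HitChunk(channel, energy)));
            }
        }
    }
    void put(std::unique_ptr<edm_Chunk> newChunk) override {
        assert(newChunk != nullptr);  // a null chunk is a caller error
        d_chunks.push_back(std::move(newChunk));
    }
    std::size_t numChunks() const override { return d_chunks.size(); }
    const edm_Chunk *chunk(std::size_t index) const override {
        return d_chunks.at(index).get();  // throws if index is out of range
    }
};

// Client code written against the protocol only; it runs unchanged on a
// test event or on any other concrete event type.
double totalEnergy(const edm_Event& event) {
    double sum = 0.0;
    for (std::size_t i = 0; i < event.numChunks(); ++i) {
        const test1_HitChunk *hit =
            dynamic_cast<const test1_HitChunk *>(event.chunk(i));
        if (hit) {
            sum += hit->energy();
        }
    }
    return sum;
}
```

Because totalEnergy depends only on the edm_Event protocol, a client could exercise it against a test1_Event built from a std::istringstream or a small text file long before any production event type exists.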

+-------------+  +----------------+  +----------------+  +-------------+
| test1_event |  | analysis_event |  | realtime_event |  | other_event |
+-------------+  +----------------+  +----------------+  +-------------+
        `                  \                /                  ,
           `                \              /                ,
              `              \            /              ,
                 `            \          /            ,
                    `          \        /          ,
                       `        \      /        ,
                          `      \    /      ,
                             +-----------+
                             | edm_event |
                             +-----------+

III. Reconstructors
___________________

The reconstructor protocol allows clients to tie their particular physics algorithms into the RunII system. A reconstructor's execute function can be the entry point into whole subsystems. Clients are free to extract chunks from and put chunks into the event object and need only return a status.

Clients should not put and get data chunks at a fine-grained level as a way of avoiding building a hierarchical system. That is, the event object should not be used to pass messages or information between low-level objects and utilities. Doing so would be an abuse of the framework. This approach would lead to a system that, although levelizable, would be flat. This type of system would sacrifice type checking and would simply push problems from compile time to run time.

A subsystem should (must!) form an acyclic hierarchy of components. Among the many benefits described in [Lakos], ensuring acyclic component dependencies will enable these subsystems to be (fully or partially) reused independently within other reconstructors.

Every reconstructor type must be able to register (with a central registry) which chunk types it will extract from the event and which chunk types it will deposit into the event. This registration facility enables the system to check whether it has been programmed to execute the proper reconstructors in the proper order. We understand that such run-time self-diagnostics satisfy a critical requirement for this framework.

IV. Data Chunks
_______________

The example framework requires all chunk types in the system to derive from class edm_Chunk.
As shown, edm_Chunk's interface is minimal; in practice it would probably be significantly larger (e.g., it might have a method id() that returns an integer uniquely identifying the chunk within the context of a particular event).

The event uses the edm_Key protocol (isMatch) to compare chunks. The framework also provides several template keys that inherit from edm_Key and provide type-safe matching. These keys are like function objects from the Standard Library except that they also load fully typed chunks into handles. Using chunk handles to retrieve (derived) chunk types avoids imposing error-prone and unnecessary user casts [Stroustrup].

An edm_TypeKey simply returns the address of the chunk in question if the chunk is of the same type as the type with which the key was parameterized. An edm_PredKey also checks the type and can additionally be parameterized with a unary predicate. The predicate can be extremely powerful and can act as a filter to select chunks with certain values. There are corresponding const keys for retrieving const chunks. The framework could (probably should) provide more key types. Of course, clients can define their own template or non-template key types.

Grouping chunks into families may prevent changes from permeating throughout the system.

                     edm_Chunk
                    /         \
                   /           \
                  /             \
      abc_Chunk_v1_0         xyz_Chunk_v1_0
            ^                      ^
            |                      |
            |                      |
      abc_Chunk_v1_5         xyz_Chunk_v2_0
        /        \
       /          \
      /            \
abc_Chunk_v3_1  abc_Chunk_v3_3

Suppose abc_Chunk_v1_0 had two accessor methods returning values. The physicist owning this chunk wants to add a third data member with an accessor member returning it. This new attribute may represent some physical property in which only some people are interested. If the chunk is simply modified, all reconstructors that depend on it must be recompiled. Instead, the physicist could derive a new chunk type, say abc_Chunk_v1_5, that will provide this new feature only to those who want it and explicitly adopt it.
Those who are not interested in the new feature can continue to treat the type as an abc_Chunk_v1_0 and will not have to recompile. Subsystems can evolve at their own pace and stable subsystems can remain stable.

V. Start-Up Checking
____________________

All reconstructors must have the ability to register their chunk type dependencies with a central registry. The registry can cross-check these dependencies and determine whether the system was programmed with the proper reconstructors and in the correct order. This check would happen before executing any reconstructors, so the system could be validated off line before actually being used in the Level III system.

The registration process can be excluded entirely from a program such as the off-line analysis system. This system could have the links available for an interpreted language such as Tcl or Python. A user could select (via a GUI) "canned" sets of reconstructors that he/she wants to run. The system could then link in the proper libraries dynamically and execute the algorithms with no need to rebuild an executable.

VI. Miscellaneous
_________________

Here are some miscellaneous issues:

- In order for the RunII project to be successful, the proper organization must be in place. Some group must own the core framework package group [Lakos]. This group must have the resolve to resist unwarranted requests for changes in the low-level components of the system - and you will have many. This core team must educate clients on the proper usage of the framework as well as on proper design and testing. A significant amount of mentoring must be anticipated: appropriate time and resources for this important task must be allocated. This cost is real, necessary, and must be budgeted.

- Carefully investigated, well-behaved Fortran subroutines can be called from within reconstructors. Third-party scientific computation libraries or code factored out of the legacy system could be used.
This may be difficult due to the typical reliance on global data and on the infamous Common Block, but it may be worthwhile for certain routines that take advantage of basic vector computation or are difficult to rewrite. The core team should be involved in deciding which routines are viable candidates for reuse. Please note that the task of factoring out code from legacy systems into well-defined, self-contained routines is often much more difficult than rewriting it from scratch.

- Improper implementations of the registerDependencies function will subvert the self-checking properties of the framework. Proper testing of these functions will be required of all reconstructors. If possible, the core team should offer utilities to users to make testing of these functions easy.

VII. Bibliography
_________________

Gamma       Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides, 1995, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, Reading, Massachusetts.

Lakos       Lakos, John, 1996, Large-Scale C++ Software Design, Addison-Wesley, Reading, Massachusetts.

Meyer       Meyer, Bertrand, 1988, Object-Oriented Software Construction, Prentice Hall, Englewood Cliffs, NJ.

Stroustrup  Stroustrup, Bjarne, 1994, The Design and Evolution of C++, Addison-Wesley, Reading, Massachusetts.

John Lakos (will forward new email)
Shawn Edwards (will forward new email)