From serban@bnl.gov Thu Apr 29 17:04:47 2004
Date: Thu, 15 Apr 2004 17:28:03 -0400
From: Serban Protopopescu
To: Arnulf Quadt
Cc: Aurelio_Juste, greenlee@fnal.gov, d0dfwg@fnal.gov, quadt@fnal.gov
Subject: Re: Common root-based data format.

Hi Arnulf,

I think the tmb root tree satisfies your requirement of not being D0
specific in the following sense:

1) It depends on only 2 D0 libraries (so far), which are not coupled
   to any other D0 library, namely tmb_tree and kinem_util. So there
   is no edm or d0om code of any kind. This is something we are very
   conscious of and strictly enforce.
2) The TMB classes use or inherit only root specific classes (like
   TObject, TRef, TLorentz).

So no D0 specific code is needed to use the files. Making files with
that format is a very different proposition but we all understand that.

    Serban

Arnulf Quadt wrote:

>Dear Serban,
>
> with `not be D0 specific' we were concerned about two points:
>
>- the new format, like any data format, requires service and
>  maintenance (by you or some D0 offline group?). Maintenance is for
>  us one of the most important points on the data format question. If
>  you keep the D0-specific part (content of databases, calibration,
>  un-/packing of variables ...) of the data format as small as
>  possible, the expected manpower required for maintenance/service is
>  minimal, i.e. stability and maintainability are optimal.
>
>  ROOT is an acceptable format because many people in many experiments
>  use it and debug it ...
>
>- Once we know what a jet, a CAL cluster, a track, a hit ... is, then
>  the (presumably) ROOT-based analysis code, which exploits information
>  on inheritance, pointers ..., does track refitting, vertex refitting,
>  maybe re-does jet-finding, kinematic ttbar fitting ..., i.e. the
>  actual analysis algorithms do not depend on D0 anymore.
>
>  If some code used HEP-wide exists (for example PAX) to do those parts
>  of the analysis, we want to use that. Similarly, if some analysis code
>  exists to select Z->ll or other physics objects or processes, we want
>  to use that (like WZ_reco ...).
>
>  What does CDF do on this part? What do CMS and ATLAS do for that?
>  If there is anything we can share with those experiments, it will
>  minimize the D0 manpower to be invested here. And it will make it
>  easier to port D0 analyses to the LHC, i.e. make it more attractive
>  to do this work.
>
>I'm sure you are well aware of those aspects and have already taken
>them into account.
>
>I hope this helps,
>
>    Arnulf
>
>>Hi,
>>
>>One point raised by Aurelio is that analysis files not be D0 specific.
>>That of course is impossible with edm root. We would be using all the
>>container libraries in D0. The tmb tree on the other hand has everything
>>that is needed (for reading) in two libraries: kinem_util and tmb_tree.
>>
>>The question of documentation needs to be answered in a general way:
>>Is the documentation for chunks adequate? If yes, is having the identical
>>names for methods in the tree returning the same information sufficient?
>>
>>    Serban
>>
>>Aurelio_Juste wrote:
>>
>>> Dear Herb and D0DFWG members,
>>>
>>> please find below our replies to your questions:
>>>
>>>>1. What analysis data formats and analysis tools are members of your
>>>>group currently using?
>>>>
>>> top_analyze is used to produce top_trees, which is the
>>> common root-based data format used in the Top Group.
>>> Some people analyse top_trees in the makeclass style of ROOT,
>>> others use the top_tree reader. Part of the single-top group
>>> puts some additional framework and software tools on top of that.
>>>
>>>>2. What analysis data formats or analysis tools does your group
>>>>recommend to its members?
>>>>
>>> For the sake of uniformity among all top analyses, we require
>>> everybody to use top_trees, either using makeclass or the top_tree
>>> reader to analyze them.
>>>
>>>>3. Do you encourage or discourage people to use tmb_tree? Why or why
>>>>not?
>>>>
>>> We have discouraged the use of tmb_trees within the Top Group.
>>> The reason is largely historical. The official data format at
>>> D0 has been TMBs, not tmb_trees. Top_analyze was developed to
>>> address the issue of centralized root-tuple production ensuring
>>> the use of corrections to objects (well before the
>>> development of d0correct), calculation of topological variables,
>>> kinematic fitting, etc., in a completely uniform way within the
>>> Top Group, as well as to provide fast feedback, immediate
>>> bug-fixes and 24/7 support (by a small group of experts).
>>>
>>>>4. How does your physics group support the efforts of analyzers?
>>>>That is, does your group provide centrally managed data sets,
>>>>tuples/trees, or analysis tools?
>>>>
>>> Yes, top_trees have been centrally produced for the winter
>>> conferences, both for data and MC. All the tools inside
>>> top_analyze are common. Many other tools at the root level
>>> (e.g. a package to compute trigger efficiencies, etc.) are also
>>> made common to the whole group. We try to make the root-level
>>> common tools independent of the analysis framework.
>>> In most cases a small group of developers implements the
>>> code and maintains/supports it.
>>>
>>>>5. Would your group benefit from the availability of common, possibly
>>>>centrally produced root trees? What requirements would a common root
>>>>format have to fulfill for your group to benefit?
>>>>
>>> We would certainly benefit as, based on our experience, this is
>>> something that requires a significant amount of time and effort.
>>> In order to REALLY benefit, we need to be able to do everything
>>> from the centrally produced root trees: e.g. computing any
>>> additional variables needed and adding them to the trees, being
>>> able to do very easy and fast skimming based on objects, etc.
>>> There is also the concern of having enough information: e.g. in
>>> the past having the CalDataChunk, full trigger information, etc.,
>>> available in the top_tree has been crucial. Have the size issues
>>> related to the tmb_trees been resolved? The interface to objects
>>> should be made as close as possible to the TMB, so that framework
>>> algorithms could be run: e.g. jet finding.
>>> As soon as we need to rerun tmb_tree production ourselves, there
>>> is really not much advantage for us regarding the common data
>>> format, except for the important fact that code could then be
>>> shared with the rest of the collaboration.
>>> An additional concern is maintenance: with top_analyze right now we
>>> have 24/7 coverage. If there is a problem and we need to produce a
>>> new tag and start rerunning top_tree production, we can do it
>>> almost immediately.
>>> Proper documentation is non-negotiable for something
>>> that is supposed to be a D0-wide data format.
>>>
>>> What we have learned from Moriond'04, where we provided centrally
>>> produced ROOT tuples for data and MC for the first time, are the
>>> limitations of the system. Storing the files on big disks (/rooms/... on
>>> clued0) means we are disk space limited. Storing them in SAM means we
>>> are IO/SAM station limited during hot conference phases (at least this
>>> time, when the CSG skimming and TMB fixing were partially running in
>>> parallel on CAB). It is worthwhile re-thinking the data handling model
>>> at the same time, as the two issues are strongly coupled.
>>>
>>>>6. If tmb_tree were chosen as the basis for a common format, what
>>>>changes would be required to make it attractive to your group?
>>>>
>>> See reply to 5).
>>>
>>>>7. Does your group develop algorithms in root? Should algorithm
>>>>development in root be encouraged? What is the best way to allow the
>>>>entire collaboration to benefit from algorithms developed in root?
>>>>
>>> Current development of algorithms in root within the Top Group is
>>> not independent of the analysis framework; such independence is
>>> something we really would like to see. It would be highly desirable
>>> to make the code such that it can be imported efficiently into the
>>> framework. The best way is by having the root-based data format
>>> have the same interface as the TMB.
>>>
>>>>8. Is there any other information that you would like to bring to the
>>>>attention of the Data Format Working Group?
>>>>
>>> Please give proper consideration to the long term needs of the
>>> experiment and ask yourselves (and/or the experts): are tmb_trees
>>> really the ultimate data format that D0 needs to do physics on
>>> datasets in the fb^-1 range? We can make this one change in data
>>> format but that's really the last one. We cannot afford to keep
>>> changing every 2 years.
>>>
>>> Since it is crucial to have full maintenance and software support
>>> for a data structure and the corresponding analysis framework, we
>>> wonder
>>> - if the way to go is EDMROOT (a recommendation/implementation by
>>>   the D0 computing experts). If that's not the way to go, why not?
>>>   What is the timescale for this project? Will we have to change
>>>   data format again, and when?
>>> - this analysis structure needs really fast turnaround time when it
>>>   comes to debugging. So software release cycles are too slow.
>>> - please do not recommend/develop something D0-specific.
>>>   Data storage or packing/unpacking is one thing; for data analysis
>>>   a number of packages have been developed, see for example
>>>
>>>   http://pax.home.cern.ch/pax/paxguide/index.html
>>>
>>>   which people are already using in the context of CMS analysis
>>>   preparation (for questions, please contact Martin.Erdmann@cern.ch).
>>> - please make sure the software can be easily used at external
>>>   sites, on small desktops and laptops, without the need to install
>>>   huge pieces of D0 software. A small tar ball might just be
>>>   acceptable.
>>>
>>> We very much hope this helps.
>>> Thanks a lot,
>>>
>>>    Arnulf and Aurelio