From Arnulf.Quadt@cern.ch Thu Apr 29 17:03:39 2004
Date: Thu, 15 Apr 2004 22:48:48 +0200 (CEST)
From: Arnulf Quadt
To: Gordon Watts
Cc: Aurelio_Juste, greenlee@fnal.gov, d0dfwg@fnal.gov, quadt@fnal.gov
Subject: RE: Common root-based data format.

Hi Gordon,

I tried to clarify those points in my email a minute ago.

Our TMB bit-packing is D0-specific. ROOT-IO questions are not.

Kinematic fitting code is not D0-specific; the input resolution
functions are ...

Selection bits (as provided by the CSG) are also needed by other
experiments ...

Muon isolation depends on the available detector information and is
therefore D0-specific. Once I have an iso-mu bit, I can use it to
measure efficiencies and fake rates. That part is not D0-specific.
I want to be able to develop the code that defines an isolated muon
offline and port it to D0reco (ideally without any modifications).
I want to develop the code that measures the efficiencies in one
analysis and be able to apply it to other methods without having to
rewrite the code or even rethink the method (so that we can profit
from the code in WZ and vice versa).

I'm sure you can find exceptions in every single one of the above
examples, but I hope this helps to clarify the points.

Best regards,
Arnulf

On Wed, 14 Apr 2004, Gordon Watts wrote:

> Hi Aurelio,
> Thanks for all the answers!
>
> What do you mean by not being DZERO specific?
>
> For example, things like TMB Tree and other things can be built
> outside the DZERO framework. Does that make them non-dzero specific?
> Do you call the top_tree non-dzero specific?
>
> Cheers,
> Gordon.
>
> -----Original Message-----
> From: Aurelio_Juste [mailto:juste@fnal.gov]
> Sent: Wednesday, April 14, 2004 6:31 AM
> To: greenlee@fnal.gov; d0dfwg@fnal.gov
> Cc: quadt@fnal.gov
> Subject: Re: Common root-based data format.
>
> Dear Herb and D0DFWG members,
>
> Please find below our replies to your questions:
>
> > 1. What analysis data formats and analysis tools are members of your
> > group currently using?
>
> top_analyze is used to produce top_trees, which are the common
> root-based data format used in the Top Group. Some people analyze
> top_trees in the MakeClass style of ROOT; others use the top_tree
> reader. Part of the single-top group has put additional framework
> and software tools on top of that.
>
> > 2. What analysis data formats or analysis tools does your group
> > recommend to its members?
>
> For the sake of uniformity among all top analyses, we require
> everybody to use top_trees, analyzing them either with MakeClass or
> with the top_tree reader.
>
> > 3. Do you encourage or discourage people to use tmb_tree? Why or why
> > not?
>
> We have discouraged the use of tmb_trees within the Top Group. The
> reason is largely historical: the official data format at D0 has
> been TMBs, not tmb_trees. top_analyze was developed to provide
> centralized root-tuple production that applies object corrections
> (well before the development of d0correct), calculates topological
> variables, performs kinematic fitting, etc., in a completely uniform
> way within the Top Group, and to provide fast feedback, immediate
> bug fixes and 24/7 support (by a small group of experts).
>
> > 4. How does your physics group support the efforts of analyzers?
> > That is, does your group provide centrally managed data sets,
> > tuples/trees, or analysis tools?
>
> Yes, top_trees have been centrally produced for the winter
> conferences, both for data and MC. All the tools inside top_analyze
> are common. Many other tools at the root level (e.g. a package to
> compute trigger efficiencies) are also made common to the whole
> group. We try to make these common root-level tools independent of
> the analysis framework. In most cases a small group of developers
> implements the code and maintains/supports it.
>
> > 5. Would your group benefit from the availability of common, possibly
> > centrally produced root trees? What requirements would a common root
> > format have to fulfill for your group to benefit?
>
> We would certainly benefit since, based on our experience, producing
> such trees requires a significant amount of time and effort. In
> order to REALLY benefit, we need to be able to do everything from
> the centrally produced root trees: e.g. compute any additional
> variables needed and add them to the trees, do very easy and fast
> skimming based on objects, etc.
> There is also the concern of having enough information: e.g. in the
> past, having the CalDataChunk, full trigger information, etc.
> available in the top_tree has been crucial. Have the size issues
> related to the tmb_trees been resolved? The interface to objects
> should be made as close as possible to the TMB, so that framework
> algorithms (e.g. jet finding) can be run.
> As soon as we need to rerun tmb_tree production ourselves, a common
> data format offers us little advantage, except for the important
> fact that code could then be shared with the rest of the
> collaboration.
> An additional concern is maintenance: with top_analyze we currently
> have 24/7 coverage. If there is a problem and we need to produce a
> new tag and start rerunning top_tree production, we can do it almost
> immediately.
> Proper documentation is non-negotiable for something that is
> supposed to be a D0-wide data format.
>
> From Moriond'04, where we provided centrally produced ROOT tuples
> for data and MC for the first time, we learned the limitations of
> the system. Storing the files on big disks (/rooms/... on clued0)
> means we are limited by disk space.
> Storing them in SAM means that, during hot conference phases, we are
> limited by I/O and the SAM station (at least this time, when the CSG
> skimming and TMB fixing were partially running in parallel on CAB).
> It is worth rethinking the data handling model at the same time, as
> the two issues are strongly coupled.
>
> > 6. If tmb_tree were chosen as the basis for a common format, what
> > changes would be required to make it attractive to your group?
>
> See reply to 5).
>
> > 7. Does your group develop algorithms in root? Should algorithm
> > development in root be encouraged? What is the best way to allow the
> > entire collaboration to benefit from algorithms developed in root?
>
> Current algorithm development in root within the Top Group is not
> yet independent of the analysis framework; such independence is
> something we would really like to see. It would be highly desirable
> for such code to be importable efficiently into the framework. The
> best way to achieve this is for the root-based data format to have
> the same interface as the TMB.
>
> > 8. Is there any other information that you would like to bring to the
> > attention of the Data Format Working Group?
>
> Please give proper consideration to the long-term needs of the
> experiment and ask yourselves (and/or the experts): are tmb_trees
> really the ultimate data format that D0 needs to do physics on
> datasets in the fb^-1 range? We can make this one change in data
> format, but it really has to be the last one. We cannot afford to
> keep changing every 2 years.
>
> Since it is crucial to have full maintenance and software support
> for a data structure and the corresponding analysis framework, we
> would like to raise the following points:
> - Is EDMROOT (a recommendation/implementation by the D0 computing
>   experts) the way to go? If not, why not? What is the timescale
>   for this project? Will we have to change the data format again,
>   and if so, when?
> - This analysis structure needs a really fast turnaround time when
>   it comes to debugging, so the usual software release cycles are
>   too slow.
> - Please do not recommend/develop something D0-specific. Data
>   storage or packing/unpacking is one thing; for data analysis, a
>   number of packages have been developed, see for example
>
>   http://pax.home.cern.ch/pax/paxguide/index.html
>
>   which people are already using in the context of CMS analysis
>   preparation (for questions, please contact Martin.Erdmann@cern.ch).
> - Please make sure the software can be easily used at external
>   sites, on small desktops and laptops, without the need to install
>   huge pieces of D0 software. A small tarball might just be
>   acceptable.
>
> We very much hope this helps.
> Thanks a lot,
>
> Arnulf and Aurelio
>

--
Best regards,
Arnulf

-------------------------------------------------------------------
Arnulf Quadt                      Email: Quadt@physik.uni-bonn.de
Physikalisches Institut der                    Tel.        Fax
Universitaet Bonn         in Bonn: ++49-228-73 2648        3220
Nussallee 12              at CERN: ++41-22-767 2952/8123   9330
D-53115 Bonn              at FNAL: 001-630-840 5440        8481
--------------------------------------------------------------------