From duflot@lal.in2p3.fr Mon Apr 12 14:52:00 2004
Date: Mon, 12 Apr 2004 20:06:20 +0200 (CEST)
From: Laurent Duflot
To: Herbert Greenlee, D0 Data Format Working Group
Subject: Re: Common root-based data format.

Hi,

I'd like to take this opportunity to comment on the common root-based format, more precisely on the last questions.

First let me mention that even if it were only developed for analysis, that would still be a good step forward and would allow people to share samples and compare results across physics groups.

> 5. Would your group benefit from the availability of common, possibly
> centrally produced root trees? What requirements would a common root
> format have to fulfill for your group to benefit?
>
> 6. If tmb_tree were chosen as the basis for a common format, what
> changes would be required to make it attractive to your group?

I think here that speed is the most important issue if you want tmb_tree to be used directly in analyses. Size is important and probably related to speed. Many people want to put their samples on their institution's machines or even on their laptops.

One should make skimming of tmb_trees easy (selecting events, selecting branches, selecting objects is possible [not obvious with the TRefs?]). It seems that there are examples, but there are often issues with root itself. That's a problem we have to contemplate for any root-based solution: from an outsider's point of view root seems to be rather unstable (not to mention the poor design and the bad habits it promotes ("never delete after a new")).

The interface of the tmb_tree objects should be made the same as that of the physics objects as far as possible. When possible, differences should be hidden in utility methods on both sides. That should help in developing code that can be more easily ported to reco. My personal experience is that it is not too difficult when dealing with each type of object separately (I use template classes/methods) but it becomes difficult when connecting objects (e.g.
tracks associated to jets: TRef vs LinkIndex). A redesign of the interface on both sides or a set of utility functions might help.

As you may see, I have worked with tmb_trees and found them interesting but rather slow (but I don't know all the tricks), the interface disturbing (why invent new method names when the physics object already defined one?) and the documentation ... almost empty.

>
> 7. Does your group develop algorithms in root? Should algorithm
> development in root be encouraged? What is the best way to allow the
> entire collaboration to benefit from algorithms developed in root?
>

I'm not totally convinced by the current situation with algorithm development in root. At least in some cases I know of, it requires learning another framework for analysis (maybe even switching to it, I don't know for sure). Any development should be "analysis framework neutral" (i.e. not specific to tmb_pro / d0root / top_analyze etc.) and be made as close as possible to an algorithm that would be put in reco.

The alternative to tmb_tree that was proposed is edmroot, i.e. chunks in root. This option should be evaluated seriously in terms of:

- time scale: developers need to modify the chunks, when can we set a realistic deadline? Most probably the shortest way is to find someone to spend 2 weeks making the changes for ALL chunks...

- size: storing chunks ("DST" chunks) after TMB unpacking wastes space, as in many chunks a sizeable fraction of the information is not available in TMB. Should we redesign the most offending chunks to be able to store the "TMB part of the chunk"?

- development: how long does it take to port back an algorithm developed on root_trees to reco? An example is the vertexing code, it seems to me that it took a very long time, but Gordon can correct me if I'm wrong. If we have tmb_tree, what is the probability that algorithms will be developed on tmb_tree and never be ported back to reco (or not often enough)?
  We could end up running half of reco on a day-to-day basis to get any analysis done.

- documentation: in tmb_tree there's little documentation on the object interface. edmroot has the same interface as chunks, so we have a basis to work on (idem for tmb_tree if they support the physics object interface), and a good reason to improve the chunk documentation :-)

> 8. Is there any other information that you would like to bring to the
> attention of the Data Format Working Group?
>

Whatever the solution is, there should be good documentation and tutorials (simple analysis, tricks to speed up analysis, skimming) and they should be maintained. Any new version of root should be tested to work with example analyses before being adopted for a release.

regards,

Laurent

> Regards,
>
> The D0 Data Format Working Group
>