From serban@bnl.gov Thu Apr 29 16:46:45 2004
Date: Tue, 13 Apr 2004 10:42:49 -0400
From: Serban Protopopescu
To: duflot@lal.in2p3.fr
Cc: Herbert Greenlee, D0 Data Format Working Group
Subject: Re: Common root-based data format.

Hi,

The points raised by Laurent on tmb-tree are legitimate, and I believe
they can be handled satisfactorily without extraordinary effort (though
some dedicated work will still be required).

Some effort was expended on uniformity: that is why TMB tree objects
inherit from TPhysObj, which is equivalent to D0PhysObj in edm (and has
the same interface). I agree it was not a good move to change the names
of the other access methods in TMBObj compared to edm objects; the long
history behind it is not worth going over. However, it is a relatively
simple and straightforward undertaking to add methods with names
identical to those in edm; one student could do that in less than a
week. (I would not replace the existing methods, at least not right
away, as that is likely to break many existing macros.) If that is
done, do we need additional documentation?

The speed of tmb-tree is intermediate between ntuples and thumbnail.
Judicious use of branch access can speed things up further, but it will
never be as fast as a simple tree (or tuple) with a small number of
entries. That conversion will always be necessary for some analyses.
The point of the tmb-tree is to have a complete set of information for
analysis with much faster turnaround than the thumbnail.

So far, example macros are available for making clones of root tree
files for selected events. Macros can be made available for making
clones with selected branches if that is desired, or for making files
with a totally new tree format starting from a standard format.
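
To make the event and branch selection concrete, here is a minimal
sketch in plain C++ of what such a skim does. It deliberately uses no
root classes: Event, skim and hasHardMuon, and the branch name muon_pt,
are hypothetical names invented for this illustration and are not part
of tmb_tree or of the existing macros.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one tree entry: branch name -> stored values.
using Event = std::map<std::string, std::vector<double>>;

// Copy only the events accepted by keepEvent and, within each kept event,
// only the branches whose names appear in keepBranches.
std::vector<Event> skim(const std::vector<Event>& input,
                        const std::function<bool(const Event&)>& keepEvent,
                        const std::set<std::string>& keepBranches) {
    std::vector<Event> output;
    for (const Event& event : input) {
        if (!keepEvent(event)) continue;            // event selection
        Event slim;
        for (const auto& branch : event) {
            if (keepBranches.count(branch.first))   // branch selection
                slim.insert(branch);
        }
        output.push_back(slim);
    }
    return output;
}

// Example selection (assumed cut value): at least one muon with pt > 15.
bool hasHardMuon(const Event& event) {
    auto it = event.find("muon_pt");
    if (it == event.end()) return false;
    for (double pt : it->second)
        if (pt > 15.0) return true;
    return false;
}
```

A real macro would presumably make the same two selections with root's
own facilities (for example TTree::SetBranchStatus and
TTree::CloneTree) rather than with hand-written containers; the sketch
only shows the logic.
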
If tmb tree is adopted as the standard, then it will make sense to
create utility libraries of macros available to everyone. So far each
analysis has created macros as needed, without much attention paid to
sharing information.

Finally, on TRefs. This is one aspect of root trees that is definitely
inferior to edm: the lack of a template like LinkIndex makes TRefs prone
to misuse. My recommended solution is that users not be encouraged to
use TRefs directly. Rather, objects should have methods that return a
proper pointer, and TRef should be used only internally. That is what is
done in many objects, and wherever it is not yet done, methods should be
added. In general that is also encouraged in edm: many objects have
methods that return the pointer while storing the LinkIndex internally.

I believe the amount of work needed to get tmb_tree into a more
universally acceptable shape is much smaller than trying to get edm root
going. I also have serious doubts that edm root will ever be faster or
smaller than tmb tree. There is no question that additional work is
needed, but I believe it is manageable.

Serban

Laurent Duflot wrote:
> Hi,
>
> I'd like to take this opportunity to comment on the common root-based
> format, more precisely on the last questions.
>
> First let me mention that even if it were developed only for analysis,
> it would still be a good step forward and would allow people to share
> samples and compare results across physics groups.
>
>> 5. Would your group benefit from the availability of common, possibly
>> centrally produced root trees? What requirements would a common root
>> format have to fulfill for your group to benefit?
>>
>> 6. If tmb_tree were chosen as the basis for a common format, what
>> changes would be required to make it attractive to your group?
>
> I think here that speed is the most important issue if you want
> tmb_tree to be used directly in analyses. Size is important and
> probably related to speed.
> Many people want to put their samples on institution machines or even
> their laptops.
>
> One should make skimming of tmb_trees easy (selecting events, selecting
> branches, selecting objects is possible [not obvious with the TRefs?]).
> It seems that there are examples, but there are often issues with root
> itself. That's a problem we have to contemplate for any root-based
> solution: from an outsider's point of view root seems to be rather
> unstable (not to mention the poor design and the bad habits it promotes
> ("never delete after a new")).
>
> The interface of the tmb_tree objects should be made the same as that
> of the physics objects as far as possible. When possible, differences
> should be hidden in utility methods on both sides. That should help in
> developing code that can be more easily ported to reco. My personal
> experience is that it is not too difficult when dealing with each type
> of object separately (I use template classes/methods), but it becomes
> difficult when connecting objects (e.g. tracks associated to jets: TRef
> vs LinkIndex). A redesign of the interface on both sides or a set of
> utility functions might help.
>
> As you may see, I have worked with tmb_trees and found them interesting
> but rather slow (but I don't know all the tricks), the interface
> disturbing (why invent new method names when the physics object already
> defined one?) and the documentation ... almost empty.
>
>> 7. Does your group develop algorithms in root? Should algorithm
>> development in root be encouraged? What is the best way to allow the
>> entire collaboration to benefit from algorithms developed in root?
>
> I'm not totally convinced by the current situation with algorithm
> development in root. At least in some cases I know, it requires learning
> another framework for analysis (maybe even switching to it, I don't
> know for sure). Any development should be "analysis framework neutral"
> (i.e. not specific to tmb_pro / d0root / top_analyze etc...)
> and be made as close as possible to an algorithm that would be put in
> reco.
>
> The alternative to tmb_tree that was proposed is edmroot, i.e. chunks
> in root. This option should be evaluated seriously in terms of:
>
>  - time scale: developers need to modify the chunks; when can we set a
>    realistic deadline? Most probably the shortest way is to find
>    someone to spend 2 weeks making the changes for ALL chunks...
>
>  - size: storing chunks ("DST" chunks) after TMB unpacking wastes
>    space, as in many chunks a sizeable fraction of the information is
>    not available in TMB. Should we redesign the most offending chunks
>    to be able to store the "TMB part of the chunk"?
>
>  - development:
>
>    How long does it take to port back an algorithm developed on
>    root_trees to reco? An example is the vertexing code; it seems to me
>    that it took a very long time, but Gordon can correct me if I'm
>    wrong.
>
>    If we have tmb_tree, what is the probability that algorithms will be
>    developed on tmb_tree and never be ported back to reco (or not often
>    enough)? We could end up running half of reco on a day-to-day basis
>    to get any analysis done.
>
>  - documentation: in tmb_tree there's little documentation on the
>    object interface. edmroot has the same interface as chunks, so we
>    have a basis to work on (idem for tmb_tree if they support the
>    physics object interface), and a good reason to improve the chunk
>    documentation :-)
>
>> 8. Is there any other information that you would like to bring to the
>> attention of the Data Format Working Group?
>
> Whatever the solution is, there should be good documentation and
> tutorials (simple analysis, tricks to speed up analysis, skimming) and
> they should be maintained. Any new version of root should be tested to
> work with example analyses before being adopted for a release.
>
> regards,
>
> Laurent
>
>> Regards,
>>
>> The D0 Data Format Working Group
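
On the TRef recommendation in the reply at the top of this message
(keep the reference internal, hand the user a properly typed pointer),
a minimal sketch of the pattern might look as follows. It is plain C++
with no root dependency: Object, UntypedRef, Track and Jet are
hypothetical stand-ins written for this illustration, not the actual
root or tmb_tree classes, and UntypedRef only imitates the weak typing
that makes a TRef easy to misuse.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical common base class (plays the role a TObject-like base would).
struct Object {
    virtual ~Object() = default;
};

struct Track : Object {
    double pt = 0.0;
};

// Weakly typed reference: it can point at any Object, so code that uses it
// directly has to cast and can easily cast to the wrong type.
class UntypedRef {
public:
    UntypedRef() = default;
    explicit UntypedRef(Object* target) : target_(target) {}
    Object* get() const { return target_; }
private:
    Object* target_ = nullptr;
};

class Jet : public Object {
public:
    void addTrack(Track* track) { trackRefs_.push_back(UntypedRef(track)); }
    std::size_t nTracks() const { return trackRefs_.size(); }

    // The only public access path: a correctly typed pointer, or nullptr if
    // the index is out of range or the referenced object is not a Track.
    const Track* track(std::size_t i) const {
        if (i >= trackRefs_.size()) return nullptr;
        return dynamic_cast<const Track*>(trackRefs_[i].get());
    }
private:
    std::vector<UntypedRef> trackRefs_;  // references never leave the class
};
```

Analysis code then calls jet.track(i) and never touches the reference
itself, which is the same division of labour as an edm object that
stores a LinkIndex internally and returns a pointer.
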