From landsberg@hep.brown.edu Mon Apr 12 14:38:16 2004
Date: Fri, 09 Apr 2004 12:10:21 -0400
From: Greg Landsberg
To: 'Herbert Greenlee'
Cc: 'Jianming'
Subject: RE: Common root-based data format.

Hi Herb,

The effort of converging on a common root-tuple format is very important,
and the future of our physics analysis depends strongly on it. Let me share
some of my thoughts on this, based both on my personal analysis experience
and on the Tower of Babel of formats I am exposed to as a Deputy Physics
Coordinator.

I believe that the TMBTree is a very nice idea, and with proper support I
find it a very useful analysis tool. However, I do not use TMBTree as the
"final" analysis format for a very simple reason: it carries the entire
10,000lb gorilla of the D0 code with it. What I want to have is a straight
column-wise r-tuple that I can use "unplugged" - e.g., on my Windows laptop
at 30,000 feet.

My personal solution to this problem was straightforward: I wrote a simple
r-tuple maker that essentially maps the TMBTree variables into a fixed
r-tuple. This way, I process the data on clued0, apply preselection and
d0correct at the TMBTree level, and then write out my flat root-tuple,
which I can then use anywhere I want.

Whether this is the best possible solution or not is for your committee to
figure out, but I do want to stress the following few features that I find
crucial:

(a) completeness: most of the TMB/TMBTree variables should be mapped in the
r-tuple; some of the "derived" variables should be added on top of this,
perhaps as additional chunks.

(b) scalability: users could choose which chunks to keep and which to drop,
thus effectively reducing the r-tuple size by dropping what's not needed.

(c) portability: the high-level format should be maximally decoupled from
the d0 framework, so that people in remote institutions working on
different platforms could use the r-tuple for analysis, without being tied
to a particular release, etc.
Hopefully all standard methods developed for this r-tuple could be
standalone as well, i.e. completely self-contained in a separate portable
piece of code/library. This is, perhaps, the most important requirement for
the new format - if we fail to decouple it from the framework, we would
lose many potential physics contributors from remote institutions that do
not keep up-to-date software, as well as "professors" who do not know the
framework and do not want to learn it.

Please feel free to share these notes with your committee! Hope it helps.

Greg

> -----Original Message-----
> From: owner-d0-conveners@listserv.fnal.gov
> [mailto:owner-d0-conveners@listserv.fnal.gov] On Behalf Of Herbert Greenlee
> Sent: Wednesday, April 07, 2004 8:37 PM
> To: d0-conveners@fnal.gov
> Cc: D0 Data Format Working Group
> Subject: Common root-based data format.
>
> Hello Conveners,
>
> As you know, the D0 Data Format Working Group has been formed to
> review analysis data formats (tuples/trees) currently being used in D0,
> and to propose and develop a common root-based format. We are now in
> the information gathering stage. We are interested in your opinions.
> We are asking you, the physics group conveners, to assist our group in
> its efforts by answering the following questions.
>
> As we currently envision it, a common root format would consist of a
> centrally maintained set of tools (framework packages and executables) for
> producing root files from thumbnails. We also envision having a centrally
> produced and managed set of root files produced from Common Sample Group
> thumbnails, stored in sam and possibly pinned on disk. Where and by whom
> such common root files would be produced is not yet determined. The
> common root files should incorporate thumbnail fixes and certified object
> corrections (d0correct). Common root files would be offered to the
> physics groups as an alternative format to thumbnails for doing analysis.
>
> Thank you for giving this matter your thoughtful attention. Please send
> replies to d0dfwg@fnal.gov.
>
> Here are the questions.
>
> 1. What analysis data formats and analysis tools are members of your
> group currently using?
>
> 2. What analysis data formats or analysis tools does your group
> recommend to its members?
>
> 3. Do you encourage or discourage people to use tmb_tree? Why or why
> not?
>
> 4. How does your physics group support the efforts of analyzers?
> That is, does your group provide centrally managed data sets,
> tuples/trees, or analysis tools?
>
> 5. Would your group benefit from the availability of common, possibly
> centrally produced root trees? What requirements would a common root
> format have to fulfill for your group to benefit?
>
> 6. If tmb_tree were chosen as the basis for a common format, what
> changes would be required to make it attractive to your group?
>
> 7. Does your group develop algorithms in root? Should algorithm
> development in root be encouraged? What is the best way to allow the
> entire collaboration to benefit from algorithms developed in root?
>
> 8. Is there any other information that you would like to bring to the
> attention of the Data Format Working Group?
>
> Regards,
>
> The D0 Data Format Working Group