From gwatts@phys.washington.edu Fri Apr 30 16:27:19 2004
Date: Tue, 20 Apr 2004 01:35:34 -0700
From: Gordon Watts
To: Alex Melnitchouk , d0dfwg@fnal.gov
Subject: RE: comments on the future data format

Hi Alex,

BTW, I suspect it is best to switch off formatted email when sending to a large group of people; many people use pine and, being stuck in the 1950s, can't read formatted email as the rest of us do. :-)

1. Cool!

2. Elemer, Eric, and Serban have been a huge help in getting tmbtree going in DZERO. Ariel as well, when it comes to implementing bid. Good people and constant support will be needed for whatever this new format is, if it is to succeed.

3. As far as I know, the available samples can be used both ways. I've certainly done that with the various samples (using .L file.C+ or similar to force ROOT to build it). But perhaps there should be more documentation along those lines. I do agree that using the compiler is just better in the end!

--> I suppose there are two issues here. One is choosing a data format; the second is how you support it and what extra samples and features you provide. As long as you stick to ROOT, I suspect these two are, for the most part, independent.

4. I've recently seen a rather cool idea: you can build a version of ROOT with extra classes linked in. One could then make a new version of ROOT that had the data-format objects linked in. Type "d0root" or similar, and you'd have TMBJet (or whatever) available.

5. Depending on how you do your analysis, this is either easy or it isn't. If you are totally object based (as the tmb trees are, for the most part), it isn't too hard. In fact, you can write code that has _no_ branch names, and then ask for particular TClonesArrays or single objects from the tree at runtime.
ROOT is flexible enough to read them in only when you ask for them. I've written code for one of my frameworks that does this already, so I know it can be done. However, while you get a good speedup as long as you don't need tracks, it is some pretty hairy coding.

6. I think you are talking about a FAQ or a better search engine for the d0rug mailing list here. :-)

7. Us too!

Cheers,
Gordon.

________________________________________
From: Alex Melnitchouk [mailto:melnit@fnal.gov]
Sent: Sunday, April 18, 2004 8:50 PM
To: d0dfwg@fnal.gov
Cc: Alex Melnitchouk
Subject: comments on the future data format

Dear data format working group,

I have 7 comments:

1. First of all, JUST ONE ROOT-BASED format is a great idea!

2. I have been doing tmb_tree based analysis and have been very pleased both with this particular format and with the help I received from Elemer, Eric, and Serban whenever I needed it. I wouldn't mind if the new format were similar to this one.

3. If it is indeed going to be something similar to tmb_trees, I would argue that, even though it is ROOT based, its usage (and hence the official code examples and support) does not need to be interpreter-oriented. I prefer to use it with a compiler and would like the support and documentation to be oriented toward compiled mode instead. So far I have been borrowing some *unofficial* examples from other users and then passing those on to still other users. It was not a big deal, but I thought: why not make this official and standard (rather than made and distributed by individual users), since many analyzers, to the best of my knowledge, are working in compiled mode anyway? The benefits will also increase as the datasets grow. Besides, if this approach allowed us to build in language requirements stricter than those ROOT generally imposes, I think this would definitely be a big plus and would eventually pay off in a better understanding of our data.

4.
I am certainly not that familiar with the tmb_tree internals and peripherals, but as a user I would suggest the following simplification for the stage of preparing the common shared library (in case the tmb_tree-like approach is followed for the data format to be decided on):

instead of this:
--------------------------------------------------------------------------------------
1). check which tmb_tree and tmb_analyze package versions were used to build the d0correct executable that produced the tmb_tree file to be analyzed
2). create a release area
3). addpkg tmb_tree and tmb_analyze of those specific versions
4). ln -s tmb_analyze/macros macros
5). build the shared library as instructed in tmb_analyze/macros/README.txt
6). do analysis
7). in case a new d0correct version comes out, repeat steps 1) through 6)
-------------------------------------------------------------------------------------

have this:
----------------------------------------------------------------------------------
1). copy (or just include) an already existing shared library (built centrally, just once, with the proper versions) into my working area, which does not have to be a release area in this case
2). do analysis
3). in case a new d0correct version comes out, repeat steps 1) through 2)
-----------------------------------------------------------------------------------

I admit it is a stretch to say one would not have to create a release area: one would eventually still need the luminosity package and, consequently, a release area. On the other hand, if the luminosity software (I'm talking specifically about the software that all users use in physics analyses to identify good lbns) could also be decoupled from the framework (and even be encouraged to be decoupled by the nature and properties of the future data format), I think it would be quite advantageous.

5.
Turning specific branches on/off for reading: having this option available, user-friendly, and explained in the documentation in a way that is transparent to any novice would be great.

6. It would be more efficient if basic questions that many users may have about the tuples/trees (and about using them) were answered in advance on the web page, and so taken off d0rug and/or private user-author email exchanges. For example: in which coordinate system was a particular variable calculated? What was the pt threshold when counting clusters? If tuples/trees were produced with a certain release version, in which release should one analyze them? I hope that, since it would be a common D0 format, it will be easier to maintain good, up-to-date documentation.

7. I'll be looking forward to seeing this new format and doing analysis with it!

Regards,

Alex