From vj@bnl.gov Thu Apr 15 14:27:36 2004
Date: Thu, 15 Apr 2004 13:21:40 -0400
From: vivek jain
To: Herbert Greenlee, Serban Protopopescu, Jianming Qian, rick van kooten, d0dfwg@fnal.gov
Subject: responses to the common data format questionnaire

hi herb,

here is some feedback. apologies for not adhering to the exact format of your questionnaire.

1. What analysis data formats and analysis tools are members of your group currently using?

2. What analysis data formats or analysis tools does your group recommend to its members?

3. Do you encourage or discourage people to use tmb_tree? Why or why not?

4. How does your physics group support the efforts of analyzers? That is, does your group provide centrally managed data sets, tuples/trees, or analysis tools?

Answers to (1) - (4).

In the past (up to Lepton-Photon'03), almost everyone in the B group used TMBTrees. But in the last six months or so, people have been moving over to another format which is readable by the BANA package developed by Guennadi. We don't encourage or discourage people from using any particular format (let water find its own level). Both approaches have their pros/cons.

At the moment, the B group has been following two approaches:

a) TMB -> TMBTrees.
Here someone in the B group (mainly me) runs over the TMB files in SAM and makes TMBTrees. The latter are disk-resident. We have also hacked meta-data for them and put the root-trees back into SAM. These skims are inclusive in nature, e.g., the bMU skim and the 2MU skim, and are very useful for the B group (since we are still trying to understand our capabilities). For PASS2 skimming, we've asked the common sample group not to make the bMU skim - too big to handle (1/3 of the data). We already have 200 pb^-1 worth on disk. People who use TMBTrees use the tools available in Ariel's d0root_* facilities. At the moment, there are three groups who are using TMBTrees. All others have been moving over to Guennadi's data format/analysis package.

b) TMB -> AADST.
Here Guennadi et al. run on the TMB files in SAM and make their own files. They gave tight selection criteria to the common sample group, which has made a new skim, AA_SKIM. This is very focussed and small. As a result, this skim can only be used for analyses which have been thought of before (this is a problem). People who've used this package find it very easy to use, mainly because the author (Guennadi) is himself active in the B group.

5. Would your group benefit from the availability of common, possibly centrally produced root trees? What requirements would a common root format have to fulfill for your group to benefit?

Yes. It would make life easier for the person(s) who have to run d0correct on tmb-files in SAM. The trees should have certified objects (via d0correct). However, as I mentioned, many people have moved over or are moving over to Guennadi's format/analysis package. At the moment, there are three groups of people still using TMBTrees.

6. If tmb_tree were chosen as the basis for a common format, what changes would be required to make it attractive to your group?

Disk-space to store TMBTrees. Ability of d0tools to submit parallel (TMBTree-reading) jobs on CAB. At the moment, we have to submit jobs on clued0 (where the batch system really sucks). One member has figured out how to submit jobs on CAB, but this is still one job (to read one root-tree) at a time. This is non-optimal.

7. Does your group develop algorithms in root? Should algorithm development in root be encouraged? What is the best way to allow the entire collaboration to benefit from algorithms developed in root?

People develop algorithms in root and in the BANA package.

8. Is there any other information that you would like to bring to the attention of the Data Format Working Group?

will have to think some more.

vivek