From suyong@fnal.gov Mon May 3 16:47:49 2004
Date: Mon, 03 May 2004 12:11:46 -0500
From: Suyong Choi
To: d0dfwg@fnal.gov
Subject: summary of answers to Dfwg survey - Higgs group

Hi,

Here is the summary from the Higgs group.

Regards
Suyong

1. What analysis data formats and analysis tools are members of your group currently using?

> The Higgs group uses various formats: the Athena, higgs_skim, higgs_multijet, tmb_tree and top_tree tuple makers, all with d0correct applied. Except for tmb_tree, these are non-object-format root-tuples.

2. What analysis data formats or analysis tools does your group recommend to its members?

> We don't make recommendations. Subgroup leaders may suggest a format for which they already have analysis code ready. Analyzers are encouraged to check their results against those obtained by others using different formats.

3. Do you encourage or discourage people to use tmb_tree? Why or why not?

> We neither encourage nor discourage the use of tmb_trees. It is mostly a matter of personal preference. Some people don't like to use objects and/or find it cumbersome. Other formats are smaller, faster, easier to modify, and easy to analyze both at the root command line and in standalone programs.

4. How does your physics group support the efforts of analyzers? That is, does your group provide centrally managed data sets, tuples/trees, or analysis tools?

> We use the Common Sample Group's skims. Each subgroup makes its own tuples. Datasets and analysis tools are provided for the Athena and higgs_skim formats.

5. Would your group benefit from the availability of common, possibly centrally produced root trees? What requirements would a common root format have to fulfill for your group to benefit?

> We would certainly benefit from centrally produced tuples, which would eliminate the need for us to support our own format and generate our own samples. The requirements are:

   1. It contains most of the tmb_tree content in a few kilobytes/event.
   2. A standalone program to analyze the format can be linked within a few seconds or less. In other words, it shouldn't depend on a huge amount of code and the d0 environment.
   3. It can be read fast. Quantities that are computationally intensive should be calculated on demand rather than in streamers.
   4. It should be easy to strip events, trim branches, and add user-specific branches geared toward a particular analysis without writing a new class.
   5. It is probably a good idea to keep the common tuple in the SAM system, so that access to the tuple is consistent with other data sets and it is also accessible remotely.

  If it is too big or slow, or takes forever to link, we'll want to continue making the current root tuples and the benefit will be lost.

6. If tmb_tree were chosen as the basis for a common format, what changes would be required to make it attractive to your group?

> At a minimum, clear documentation of all the methods should be available without too much navigating. Also, it should be a lot smaller. The tmb_tree takes about 20kB/event, much of which is redundant. The tmb_tree track object, for example, uses 272 bytes/track while the tmb uses 44 bytes/track. Other roottuple formats fit essentially the same information into 3.5kB/event and could be made still smaller. The small format allows large data and MC samples (including the complete 1EMloose, 1MUloose, and QCD Moriond skims) to be kept on a single workstation. This speeds up the analysis cycle.

7. Does your group develop algorithms in root? Should algorithm development in root be encouraged? What is the best way to allow the entire collaboration to benefit from algorithms developed in root?

> We currently do not develop algorithms in ROOT. That being said, the major improvements to physics in the past couple of years came from algorithms developed and optimized outside of the d0 framework environment, e.g. tracking and b-tagging. Because working in the d0 environment is slow (linking, running, and debugging), algorithm development outside the framework is unavoidable. However, algorithm development using ROOT should be done carefully, especially the design of classes and packages, with assistance from true software experts to make it simple and portable. It can be written so that it is not tied to any specific format.

8. Is there any other information that you would like to bring to the attention of the Data Format Working Group?

> The root-tuple maker should directly use existing D0 packages/code (e.g. d0correct, metreco, ...) rather than re-coding them in the root-tuple maker itself, to reduce the chance of mistakes.
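
For reference, the size figures quoted in the answer to question 6 can be turned into rough ratios with a few lines of Python. This is only a back-of-the-envelope sketch: the per-event and per-track sizes are the ones quoted in that answer, while the 100 GB disk size, the reading of "kB" as 1000 bytes, and the helper name events_that_fit are assumptions made here for illustration and are not part of the survey response.

```python
# Sizes quoted in the answer to question 6 (kB taken as 1000 bytes: an assumption).
TMB_TREE_BYTES_PER_EVENT = 20_000     # "about 20kB/event"
SMALL_TUPLE_BYTES_PER_EVENT = 3_500   # "3.5kB/event"
TMB_TREE_BYTES_PER_TRACK = 272        # tmb_tree track object
TMB_BYTES_PER_TRACK = 44              # tmb track

# Hypothetical workstation disk size, chosen only for illustration.
DISK_BYTES = 100 * 10**9


def events_that_fit(disk_bytes, bytes_per_event):
    """Return how many whole events of the given size fit in disk_bytes."""
    return disk_bytes // bytes_per_event


event_ratio = TMB_TREE_BYTES_PER_EVENT / SMALL_TUPLE_BYTES_PER_EVENT
track_ratio = TMB_TREE_BYTES_PER_TRACK / TMB_BYTES_PER_TRACK

print(f"tmb_tree event size / small tuple event size: {event_ratio:.2f}")
print(f"tmb_tree track size / tmb track size:         {track_ratio:.2f}")
print("events on the assumed disk, tmb_tree:   ",
      events_that_fit(DISK_BYTES, TMB_TREE_BYTES_PER_EVENT))
print("events on the assumed disk, small tuple:",
      events_that_fit(DISK_BYTES, SMALL_TUPLE_BYTES_PER_EVENT))
```

With the quoted numbers, the same disk holds between five and six times as many events in the small format, which is the reason given in that answer for being able to keep complete skims on a single workstation.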