From gwatts@phys.washington.edu Fri Apr 30 16:51:13 2004
Date: Thu, 22 Apr 2004 16:04:29 -0700
From: Gordon Watts
To: mverzocc@fnal.gov
Cc: d0dfwg@fnal.gov, Terry Wyatt
Subject: RE: Comments from the WZ group on data format issues

Hi Marco,

Thanks for the answers.

> Right now in the TMBtree if you want to chop off variables from the Tree
> because you don't need them (and so squeeze the data size) you have
> essentially to rewrite the code. Being able to do that via RCP directives
> would be so much nicer.

I assume you are talking about finer control than just dropping branches here. For example, if you wanted to drop branches you could drop all the jet information at once. You would like to, say, be able to drop the jet eta from the jet information (ok, dumb example, but that is the level you are talking about).

> I don't think that this is acceptable when it means that the porting to the
> framework is done several months after the availability of the algorithm
> inside ROOT.

If the algorithm is developed inside root, it will be "available" by definition first in root. If we have a common root format, and the translation infrastructure has already been generated, then it will be available in both the framework and root at approximately the same time. The author will, however, have to write a chunk to hold the results and (the hardest part in a scheme like this) make connections back to the parent objects.

Your earlier point about how long it takes the "wrapper" to convert from TMB to this root format for use in the framework is well taken. I don't know how long it is, but I know it is tiny compared to the current running of reco. I don't, for example, know how long it is compared to the current running of d0correct. I also know it is tiny compared to the time it takes to run the root based vertexing.

Cheers,
Gordon.

-----Original Message-----
From: Marco Verzocchi [mailto:mverzocc@fnal.gov]
Sent: Thursday, April 22, 2004 6:37 AM
To: Gordon Watts
Cc: d0dfwg@fnal.gov; Terry Wyatt
Subject: Re: Comments from the WZ group on data format issues

Hi Gordon

Just to clarify, I am not in favour or against the existence of a common format. I personally don't care, I am not going to use it, and I believe that a lot of analyses in the WZ group will not use it because we need the full TMB++. And I don't believe that the full TMB++ will ever be made available inside a ROOT format. And if that is the case I would like to see who is going to port the full reconstruction to ROOT...

What I don't accept and what I won't accept are changes to the computing model of D0, where:

* algorithm development is not done inside the D0 framework or at least made available inside the D0 framework
* anything which is alignment/track fitting/calibration/calorimeter reconstruction related has to work in the framework (and honestly it does have to run over O(G) events, which makes repeated usage of wrappers a huge loss of time)
* buffer disks are removed from SAM to be used as NFS servers to store huge ROOT files (we have already started to do this, and this is plain stupid)
* resources are driven away from improving our reconstruction software where it is mostly needed:
  - track reconstruction at |eta|>1.4 (think about the electron fake rate in the 1.5<|eta|<1.7 region, think about the drop in tracking efficiency)
  - usage of the preshower information for energy reconstruction and particle identification
  - identification of hard bremsstrahlung
  - ........................ (I'm sure the list is very long ........)

>I didn't understand what you meant, Marco & Terry, by the statement
>"Possibilities of reducing the branches without recompilation.". Could
>you provide a little more detail?
>

Right now in the TMBtree if you want to chop off variables from the Tree because you don't need them (and so squeeze the data size) you have essentially to rewrite the code. Being able to do that via RCP directives would be so much nicer.

>And at one point you say:
>
>" The answer to this question is NO F4ING WAY. The reasons have
> already been explained at point 5. Any algorithm development
> which is not done inside the framework, or which is not ported
> to the framework is not useful for the collaboration and should
> be discouraged."
>
>I'm guessing that being able to run the root algorithms in the framework
>is not acceptable to you. Could you explain what issues you don't like
>there?
>

I don't think that this is acceptable when it means that the porting to the framework is done several months after the availability of the algorithm inside ROOT. The two should become available at the same time. I am still waiting to see how long it will take before the vertexing algorithm available inside d0root will be used inside D0reco. The vertexing algorithm run in D0reco is not the same one used by some physics groups offline.

I don't think that it is acceptable when it means that for running a framework executable you need to download the full ROOT release because somebody decided to use CINT in the code initialization (I had fun, real fun, porting the TMBfixer at UTA which didn't have the full installation of the D0release).

And I've had a look at how the b-tagging is done.... Well don't tell me that this is a good model..... I would still like to see the b-tagging results reformatted into chunks and the appropriate links to jets created. And all of this wrapped in a package. Instead of having the user cut&paste the code into his analysis code.

> I've picked a few repeated things out of it and summarized below. Let
>me know if I've missed any major themes.
>
>- Transparency.
If there is a variable in the root file can you tell
>what TMB variable(s) it came from easily?
>
>- Docs/Samples not sufficient at the moment for tmb_tree -- had better
>be for the data format chosen.
>
>- Back porting of root code (selection, algorithms) is slow.
>
>- PMCS uses its own root format (!!)
>

Yes and we are working on this, trying to make the transition to writing physics object chunks. Some people are dreaming of having PMCS available for analysis. Without the help of the physics groups this will never happen. The fact that it uses its own ROOT format is one of the biggest limitations to a more widespread usage of PMCS.

>- More people should use cvs for their analysis code, no matter it being
>in framework or other.
>

Yes. I would like to know how many of the physics results D0 has produced are reproducible !!!! I suspect that the answer is NONE......

>- Common root-tuple reduces the chance of errors for fairly complex TMB
>extraction code (or chance is reduced to a single point of failure).
>
>- Yet Another Root Tuple Processing Framework (YARTPF).
>

Aren't the TopTreeReader, the TMBTree shared libraries etc examples of this?

>- Needs ability to customize the final root format WZ will use the final
>format. This is more than just adding or removing branches; sounds like
>different variables for things like electrons, etc., would be added or
>removed.
>
>- Who will support the common format?
>
>- Keep number of Shared Libraries to a minimum