Computing & Software Status Report

September 2002

 

RECO

LEVEL 3

TRIGGER SIMULATION

INFRASTRUCTURE

ANALYSIS TOOLS

ONLINE

SIMULATION

GRAPHICS

DATABASES

 

 

 

 

RECO status 

Harry Melanson

Status of releases

 

Recent progress:

Major outstanding problems / issues:

Ongoing projects:

 

 

 

September 2002 Level3 Monthly Report

Dan Claes, October 1, 2002

 

 

L3 analysis_example and chunk documentation..................Erich Varnes

Erich has prepared "user"-level analysis examples and documentation of physics objects for L3. He has documented the access methods for all quantities in the L3 physics_results objects for the current global trigger list. A simple stripped-down user function that loops over each of the L3 physics_results objects provides an example of using these access methods.

 

Thumbnail (p13)..........................................Peter Tamburello

The infrastructure to store the base class physics_result information is ready for each tool except L3FMuon (physics_results are not filled or at least not persistent), Missing ET (awaiting the description of its physics_results), and the standalone track filter (which needs a wrapper producing physics_results). Quality information on candidates will be stored when tool authors supply a description of the access methods of the quantities needed. Physics_results information not directly stored in the L3Chunk is re-created from more elementary information when required. This will not be feasible when the more elementary information is unavailable on the thumbnail (e.g. electron transverse shower shape variables).

 

Generic trigger list for Monte Carlo production...............Terry Wyatt

In anticipation of integrating trigsim as part of standard MC production on the offsite farms, a generic trigger list exercising all tools at low threshold and producing physics objects at each trigger level is being generated.

 

Update on timing studies...........................................Han Do

Exclusive and inclusive total CPU times have been tabulated for function calls in the code. However, the number of calls made to each tool/filter is still needed. A detailed breakdown is also needed of where time is used in the calorimeter tools.
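The distinction between exclusive and inclusive times, and why call counts are needed, can be sketched as follows. This is only an illustration: the call tree, the tool names and the times are invented, and the actual profiling output format is not described in this report.

```python
def inclusive_time(node):
    """Inclusive CPU time: the node's own (exclusive) time plus all callees."""
    children = node.get("children", [])
    return node["exclusive"] + sum(inclusive_time(c) for c in children)


def time_per_call(node):
    """Average inclusive time per call; this needs the number of calls."""
    if node["calls"] == 0:
        return 0.0
    return inclusive_time(node) / node["calls"]


# Hypothetical call tree; times in ms are invented for illustration only.
tool = {
    "name": "hypothetical_cal_tool",
    "exclusive": 2.0,
    "calls": 4,
    "children": [
        {"name": "unpack", "exclusive": 5.0, "calls": 4},
        {"name": "cluster", "exclusive": 3.0, "calls": 8},
    ],
}
```

Without the "calls" entry, a tool that is slow per call cannot be told apart from one that is simply called often.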

 

calorimeter unpacker..........................................Marumi Kado

Zero suppression has been implemented with an rcp-selectable threshold. The pedestal/sigma file needed consists of two arrays of 55k integers. Calibration runs update this every few days, though Offline uses a single file for long periods. How do we determine when it must be updated for L3? Zero suppression has little impact on unpacking time because of the overhead in reading the pedestal files and applying the threshold. Time should be saved in clustering, but this has not yet been shown. Work on reorganized/simplified code, which may provide a speedup, has proceeded in a separate development version (independent of the zero suppression mods) and will be released in p13.
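A minimal sketch of what a threshold-based zero suppression looks like, assuming the cut is one-sided and expressed in units of the per-channel pedestal sigma. The function name, the n_sigma parameter and its default are hypothetical; the actual unpacker code and rcp parameter names are not given in this report.

```python
def zero_suppress(adc, pedestal, sigma, n_sigma=2.5):
    """Return {channel: pedestal-subtracted value} for channels above threshold.

    adc, pedestal and sigma are equal-length sequences indexed by channel.
    n_sigma stands in for the rcp-selectable threshold (assumed form).
    """
    kept = {}
    for channel, raw in enumerate(adc):
        signal = raw - pedestal[channel]
        # Assumed one-sided cut: keep only channels significantly above pedestal.
        if signal > n_sigma * sigma[channel]:
            kept[channel] = signal
    return kept
```

Every channel still costs a pedestal lookup and a comparison, which is consistent with the small effect on unpacking time noted above; the saving would come later, when clustering runs over fewer cells.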

 

L3 tracking efficiency studies using Z->ee events.........Daniel Whiteson

Within the geometrical acceptance, the efficiency is ~80% for axial and 70% for axial+stereo tracking. If the CFT has at least 7 of 8 axial hits within a road around the expected electron track path, the efficiency is 94%.

 

The road algorithm for electrons in L3..................Florian Beaudette

Offline, for electrons within jets, the road method has demonstrated significantly better efficiency/background rejection than the simple cone-based electron finder. To simplify implementation and maintenance, a new stand-alone package, "calroad", identifies those cells within an extrapolated path. Most of the L3-specific code (finding the Level-3 energies of these cells) has been implemented by Volker in p12 and tested with J/Psi -> electron candidates selected from data. Timing and memory consumption tests still need to be run.

 

L3 Missing ET....................................Lee Sawyer, Markus Klute

Example filters and triggers have been put into the trigger database. Released into t02.35 are the implementation of eta_max, a bugfix to apply the E cut to energy (not ET), and a correction to the phi calculation for cell-based missing ET. Negative cells are currently included. Should they be? Algorithm selection and threshold should probably appear directly in the trigger list rather than under rcp control. The application of a muon correction should probably be made by a filter that also requires a muon. L3 vs reco comparisons need to be re-done with a reco version correctly applying zero suppression and the efficiency should be measured on a sample of offline selected W->enu events. The effectiveness of L3 NADA should be demonstrated on a run with known calorimeter hot towers.
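The items above (eta_max, the cut on E rather than ET, the phi calculation, and the treatment of negative cells) can be put together in a generic sketch of a cell-based missing ET sum. This is not the L3 tool itself: the function name, the cell tuple layout and the default values are assumptions made for illustration.

```python
import math


def missing_et(cells, eta_max=4.0, e_min=0.1, keep_negative=True):
    """Return (missing ET, phi) from an iterable of (energy, eta, phi) cells.

    eta_max, e_min and keep_negative are hypothetical stand-ins for the
    tool's parameters; the default values are invented.
    """
    sum_ex = 0.0
    sum_ey = 0.0
    for energy, eta, phi in cells:
        if abs(eta) > eta_max:
            continue
        if energy < 0.0 and not keep_negative:
            continue
        # The cut is applied to the cell energy E, not to ET.
        if abs(energy) < e_min:
            continue
        et = energy / math.cosh(eta)
        sum_ex += et * math.cos(phi)
        sum_ey += et * math.sin(phi)
    met = math.hypot(sum_ex, sum_ey)
    # Missing ET points opposite to the vector sum of the cell ET.
    met_phi = math.atan2(-sum_ey, -sum_ex)
    return met, met_phi
```

Switching keep_negative changes the result whenever negative-energy cells are present, which is why the question of including them matters for the L3 vs reco comparison.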

 

vertex stability...........................................Michiel Sanders

Beam spot position and angle (from vertex_examine) are sufficiently stable over a fill for impact parameter b-tagging, which does not need to await the availability of a primary vertex tool. The beam spot should also prove a useful constraint event by event in determining the primary vertex. A very short run at the beginning of each fill could provide the current beam spot. About 5000 tracks or 500 events are necessary, which takes 5-10 minutes in vertex_examine (or a few seconds in L3?). The beamspot could be downloaded through COOR along with the trigger list.
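As an illustration of how a beam spot position and angle could be used for impact parameters without a primary vertex, here is a minimal sketch. It assumes the beam line is parametrised as a straight line in z and uses one common sign convention for the transverse impact parameter; the function names and the parametrisation are assumptions, not the vertex_examine or L3 code.

```python
import math


def beam_position(z, x0, y0, dxdz=0.0, dydz=0.0):
    """Beam (x, y) at a given z for a straight, tilted beam line (assumed model)."""
    return x0 + dxdz * z, y0 + dydz * z


def impact_parameter(x, y, z, phi, beam):
    """Signed transverse impact parameter of a track relative to the beam.

    (x, y, z) is a point on the track near the beam and phi is the track
    direction in the transverse plane. beam is a dict holding the
    hypothetical keys x0, y0, dxdz, dydz.
    """
    xb, yb = beam_position(z, **beam)
    return -(x - xb) * math.sin(phi) + (y - yb) * math.cos(phi)
```

For example, with beam = {"x0": 0.1, "y0": 0.0, "dxdz": 0.001, "dydz": 0.0}, a track passing exactly through the tilted beam line at any z gets an impact parameter of zero.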

 

L3 Muon.................................Martin Wegner, Martijn Mulders

Martijn is running the routine L3Muon testing of a number of offline code changes that have gone into p12. Martin is studying the special run (L3Muon with central track-match) taken xxxx and reports that so far the performance looks fine. Full certification continues.

 

CFT tracking..............................................Ray Beuselinck

A significant improvement in cputime is reported for the reorganised stereo code. Work on improving the axial tracking is in progress.

 

Bugzilla........................................................Josh Dyer

The Bugzilla utility documents and tracks the history of bugs reported in d0trigsim. It prompts each user to supply the information necessary for efficient bug tracking. When bugs are generated during a release, it automatically sends email to the relevant tool authors (to Angela Bellavance and Dugan O'Neil when a specific author cannot be identified). Suggested additional documentation included the range of releases in which a particular bug persisted and a list of bug fixes for each release.

 

 

Farm Status

FNAL Farms Status Report for September
Mike Diesburg, October 2, 2002


Farms are currently running p11.12.01. We are currently processing only about one third of the raw data due to the speed of the current version of d0reco. Current processing rate is ~500K events/day.

New farm nodes have arrived and are being installed in FCC. Two minor problems were encountered when the nodes were delivered. The racks did not match the specifications in the bid for physical construction and some of the power supplies in the racks drew higher than expected current. Both problems have been addressed by the vendor and both racks and power supplies are being replaced now.

We have run production code on the first 40 of the new nodes and checked output against that produced on older nodes. All looks well at this time. New nodes are performing as expected in terms of cpu speed.

We expect the full set of new nodes to be turned over to us sometime in mid to late October.

 

 

Trigger simulation

Dugan O'Neil, October 2, 2002

 

We are currently certifying p12.04.00. All plots are available for comment by users and developers. p12.03.00 had a couple of online-only errors in L2 and L3 which prevented it from being run online. p12.04.00 has fixes which should allow it to run online (they have already been tested in the online environment). There is a bug in Heptuple reported by Gordon Watts. Gordon also told us how to fix it. However, patching Heptuple is complicated. In the meantime any d0trigsim run which includes certain L3 analyze packages will crash. L3 unpacker branches in the ntuple must be turned off. p13 is to be based on t02.35.00. Several major improvements are either fully or partially implemented. We expect more than just bug-fixes between p13.00.00 and p13.01.00. However, there are no compilation or linking errors in recent test releases and some testing has been done.

 

Infrastructure/Code Management Status: Sept 2002

Alan Jonckheere

 

sections: Code Management

Releases

gcc

Build System changes

Resources

Major Infrastructure products

RCP

EDM

 

**** Code Management

** Test Releases General:

We are routinely doing only debug builds on IRIX and RH7. We are not doing RH6 at all. We are doing a maxopt build at the end of the week. We are also doing a gcc 3.1 build at the end of the week.

** Production Releases General:

We are routinely doing RH6, RH7 and IRIX, debug and maxopt, (6 builds) per release only for the p11 series. We are not doing p12 on RH6 at all.

** "t" releases:

We are currently building t02.35.00 which is to become the basis of the p13 production releases. The "t" releases are better than they have been for a long time due to the concentrated attention that Harry Melanson has been giving them. But they are still a *long* way from being "clean". There are now (as of this week) only a few packages broken up to the LIB phase, so other packages can build. But there are a lot of failures in BIN and even more in TEST.

** "p" releases:

We continue to do both the p11 series as well as p12 and will be cutting p13 in the next few days. p11 is running on both the L3 and reco farms (p11.10.01 online, p11.12.01 reco). We have a new one, p11.13.00 with corrections for trigsim, ready to freeze. p12 will never have a usable D0Reco. But it *will* be useful for L3 (Filters and DAQ), for the Monte Carlos and for the online examines. For L3 it includes, in part, streaming support. The milestone to put it onto the L3 farms is 7 Oct. The Monte Carlo in p12 includes the latest, corrected detector geometry and (I think) support for FPD and the luminosity monitors. The online examines are being upgraded to use p12 as well. This is a long overdue change. They have been lagging very far behind, being passive consumers rather than pushing for developments. p13 will be cut in the next few days from t02.35.00.

**** Build system changes:

Recently Paul has modified ctbuild and the LIBDEPS system to automatically generate the LIBDEPS information. Basically it automatically decides what packages need to go onto the link line by analysing the actual calls in the libraries and object modules. This has decreased by about a factor of two the memory needed to link D0Reco. This has dropped it down enough that it can be linked on most of our Linux boxes without swapping. This has dramatically decreased the time needed to link on many machines, and has improved the time substantially on all of them.
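The idea of deriving the link line from the actual calls can be sketched as a walk over undefined symbols. This is only an illustration of the general technique: the data layout and function name are hypothetical, and the real ctbuild/LIBDEPS implementation is not described in this report.

```python
from collections import deque


def needed_libraries(undefined_symbols, libraries):
    """Return the libraries needed to resolve the given undefined symbols.

    libraries maps a library name to (defined_symbols, undefined_symbols).
    Libraries are pulled in only when something actually calls into them,
    and their own undefined symbols are followed in turn.
    """
    provider = {}
    for name, (defined, _) in libraries.items():
        for symbol in defined:
            provider.setdefault(symbol, name)

    needed = []
    seen_libs = set()
    seen_symbols = set()
    queue = deque(sorted(undefined_symbols))
    while queue:
        symbol = queue.popleft()
        if symbol in seen_symbols:
            continue
        seen_symbols.add(symbol)
        lib = provider.get(symbol)
        if lib is None:
            continue  # e.g. a system symbol; not covered by this sketch
        if lib not in seen_libs:
            seen_libs.add(lib)
            needed.append(lib)
            queue.extend(sorted(libraries[lib][1]))
    return needed
```

A library that nothing calls into never reaches the link line, which is the kind of pruning that would reduce the memory needed at link time.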

**** Build Resources:

Build Machines

The major resource problem we are having these days is that build times are becoming unacceptable. Builds on d0mino have taken as long as two days (44:20 hours) and almost 36 hours on d0lomite (RH7). This is with at most one other build occurring at the same time. This has gone up from about 8 hours on d0mino about a year ago when we first installed the parallel build system. The growth in time has been fairly slow but steady. It cannot be attributed to an increase in packages or anything like that. But it does pretty much parallel the use of d0mino. However, there should be plenty of resources available on d0mino. So this is not well understood. The build times on the Linux boxes are pretty much what we expect. D0lomite (RH7.1, 750MHz, 8 processor) is only a little faster than d0lxbld4 (RH6.2, 500MHz, 4 processor) because it's using disks nfs mounted from d02ka. This costs about a factor of two. However, we would like to keep this arrangement because in this configuration the builds are instantly available to the entire Linux world at FNAL, ClueD0, in particular. One other problem that we've run into: we simply cannot do two production releases and a t release in one week. d0lomite, the RH7 build machine, just can't handle it.

Disks

On /d0dist/dist/ we have:

 

d0mino    

205GB 3x striped and shadowed main set

137GB 2x striped secondary set

36GB for tarfiles

The main/secondary sets keep the two sorts of builds from interfering with each other, and at least one from interfering with the users.

 

d0lxbld4

100GB RH6.1 served to d0lxbld1 (very few releases there)

 

d0lomite

262GB on d02ka but still have 215GB locally if we need it. Using the nfs disks slows the builds a lot (30% at least) but it makes the builds available immediately. So far, this is judged to be more important than speed.

d02ka

262GB (same disk as above) RH7.1 builds served to clued0 etc

 

**** Major Infrastructure products

Nothing new to report.

 

 

Analysis Tools group status

Wyatt Merritt

Working through the task list for the streaming implementation targeted for Oct. 7

Current holdup: 

Further online testing is waiting on G. Watts' fix for the L3 supervisor. There is some concern whether Michael will finish equipping online and offline luminosity reporting for 2 streams by the requested date. There is also some concern about getting p12 to the L3 farm on time: work is ongoing, but finding a problem in the filters during certification would delay this.

-  Expecting physics group definition of the 2-stream contents.  Sense of  the management at the ADM discussion (9/27) was that physics groups should provide 10, 20, and 30% scenarios.   Recap of considerations:

-  a stream split of some size (could use data from Stu here) is required for online disk performance reasons to permit data logging at full rate. The translation to Hz depends on event size.

-  the amount of reprocessing with p13 which can be done for Moriond is not easily determined with current information about accelerator, detector, and reco performance, but is estimated by the farms group  to be in the 10-30% range, leading to the request for scenarios, above

-  Work continues under the SBIR on improvements in dataset definition capability and planning for database resource management and security tools

-  Carmenita Moore is back at work half time, and can work on analysis tools  projects once outstanding issues with the MC request system are taken care of
 

 

September Online status report:

Stu Fuess, October 3, 2002

 

- Now more common to see logged rates at design capacity of 50 Hz and above

- Integral rate still fairly low due to accelerator performance

- Survived the Lehman review. Bottom line for DAQ/Online is $605K, split ~ equally among L3 nodes, host systems, and VME processor upgrades.

- New control room systems just arrived: 10 dual AMD2000 systems and flat panel monitors. Also have 2 systems for SiDet test stands and 2 server systems for RunIIb configuration testing.

- Software activities still focus on making operations smoother - unifying startup scripts and attempting to "herd the cats" of the diverse monitoring applications which have been created.

 

Simulation Status (10/3/2002)

Qizhong Li, October 3, 2002

 

- Testing p12.03.00.

We have generated some p12.03.00 MC events in the ee and mumu channels for test/verification. Test results show improvements over p10.15. There is only one known problem right now: some of the forward muon hits are in the wrong octant. It is under study and a fix is being worked on. This problem will be in p13 as well if not fixed.

 

- Main efforts are for p13 to get more realistic MC.

Generators:

- added new generator AlpGen,

- updated CompHep interface to Pythia v6_2,

- new package: mctest.

 

D0gstar:

- Verified subdetector geometry: smt geo updated

- still don't have relative alignment,

- all materials are accounted for,

- Magnet field is verified ok.

 

d0raw2sim:

- using zero-bias special run for test/debug.

- L1cal problem is fixed.

- cannot read the cft database. Bypassing the database works.

- requested zero-bias stream for overlapping MC events.

 

D0Sim:

- dead channel handling is moved to unpacking for every subdetector,

- cal noise correction added since p12.

- add cal noise below pedestal.

- study/adjust zero-suppression cut.

- waiting for cal task force's recommendation to correct calorimeter noise simulation.

 

PMCS (from Marco):

During September most of the code changes foreseen for P13 have been implemented and released into CVS. Testing and final debugging will require another two weeks of work and will be done in parallel in the development branch and in the p13-br during the month of October. The P13 version of PMCS will be a first production release for physics involving electrons, muons, jets and missing energy in the final state. Users will still be required to perform some tuning of PMCS, mostly through RCP switches at run time.

 

Graphics group report for September 2002

Laurent Duflot

 

I d0ve

- display for triggers that pass (was on the TO DO list)

- improvements to vertex display (was on the TO DO list)

TO DO:

- investigate using threads to allow for refresh and GUI interaction while events are reconstructed.

- optionally filter events by trigger to show more interesting events

- define default viewpoints for the control display when the layout of  monitors is finalized

- move track code from d0ve_smt to d0ve_reco

 

II d0scan

- Physics Lego view now has L3 objects (was on the TO DO list)

- infrastructure work for iguana v3 and oiv v3

- fix bugs

 

TO DO:

- finish the implementation of the XY and RZ views

- L1 Cal trigger tower display for the (eta,phi) view

- display trigger that passes

- certify the code with oiv v3

 

Databases

Taka Yasuda

 

Calibration Databases
---------------------
 D0 toplevel, SMT, CFT, CPS, FPS, and Muon databases are in production. Calorimeter database is in integration. The c++ client code for all but the calorimeter is in the t02.35 release and will go into p13 release. The servers for these databases have been generated, installed on d0dbsrv2 and have gone through extensive tests with up to 32 farm jobs accessing the servers at around the same time, against the integration databases. The servers will be started against the production databases on 10/9 (Wed).
 
Herb Greenlee updated d0omCORBA, d0stream and d0om packages. Jim Kowalkowski updated memutil package. These updates have reduced the memory footprint of c++ clients by approximately a factor of 10.
 

DAN server project (Steve White)
--------------------------------
 Code for SQL Proxy is completed and passed the first tests.  It still needs a more complete set of tests done. Code for the File Cache has been completed and passed the first tests. It still needs a more complete set of tests done. A cache rebuild application remains to be written.  Conversion from DCOracle1 to DCOracle2 is underway.  At the same time Steve is also working on getting the reconnect feature of the Connection classes working. A new set of products has been released into production.  They contain the above changes.  The servers have all been rebuilt and released with these products.
 

Runs Quality Database (Stefan Soldner-Rembold)
----------------------------------------------
 The Runs Quality Database has been designed to store data quality information on a run-by-run basis. Representatives of the detector and particle identification groups are regularly evaluating the quality of global physics runs, assessing whether they have detector and trigger operating conditions suitable for data analysis.
 
The database consists of two parts. The first part is identical for every Quality Group (currently Muon system, Missing Energy/Jets, SMT, CFT, Calorimeter) and follows a common evaluation system which can be applied by the users (good, reasonable, bad, special runs). The groups cannot change this grading system.
 
In the second part, Run Quality Parameters Database, the groups can store any Name:Value pair. This gives them access to group specific quality information. The Run Quality Database uses the run number as key and is linked to the Runs Database. A special script, QualityGrabber, has been written so that the representatives of the detector and particle identification groups can easily enter data into the database.
 
A web interface has been designed which can be used to access the information either in a HTML format or in the form of ASCII files for easy access in the analysis environment. The quality information is also available through the SAM system, allowing the user to select files with runs of pre-defined quality. The system has been running successfully for several months.
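The two-part structure described above (a fixed grade per run and group, plus free Name:Value pairs, both keyed by run number) can be sketched as a small in-memory model. This is not the actual schema or the QualityGrabber interface: the class and method names are hypothetical, and only the four grades come from the text.

```python
GRADES = ("good", "reasonable", "bad", "special")


class RunQuality:
    """Toy model of run quality information keyed by run number and group."""

    def __init__(self):
        self.grades = {}  # (run, group) -> grade from the common system
        self.params = {}  # (run, group) -> {name: value}, free-form

    def set_grade(self, run, group, grade):
        # The grading system is common to all groups and cannot be extended.
        if grade not in GRADES:
            raise ValueError("unknown grade: %r" % (grade,))
        self.grades[(run, group)] = grade

    def set_param(self, run, group, name, value):
        # Groups may store any Name:Value pair they find useful.
        self.params.setdefault((run, group), {})[name] = value

    def select_runs(self, group, accepted=("good",)):
        """Return sorted run numbers whose grade for this group is accepted."""
        return sorted(
            run
            for (run, grp), grade in self.grades.items()
            if grp == group and grade in accepted
        )
```

Selecting runs by grade, as in select_runs, mirrors the kind of pre-defined quality selection that the text says is available through SAM.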
 

Streaming Database (Jeremy Simmons)
-----------------------------------
Streaming Db is a single table in Oracle with 2 columns: version and stream scheme xml. This table holds the version and xml (as a CLOB) used for streaming.
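A minimal sketch of such a two-column table, using an in-memory SQLite database as a stand-in for Oracle (SQLite TEXT replaces the CLOB). The table name, column names and helper functions are assumptions made for illustration; the real schema is not given in this report.

```python
import sqlite3


def make_streaming_db():
    """Create an in-memory stand-in for the streaming table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE streaming ("
        " version TEXT PRIMARY KEY,"
        " stream_scheme_xml TEXT NOT NULL)"
    )
    return conn


def store_scheme(conn, version, xml):
    """Store the xml describing the stream scheme for one version."""
    conn.execute("INSERT INTO streaming VALUES (?, ?)", (version, xml))
    conn.commit()


def load_scheme(conn, version):
    """Return the xml for a version, or None if that version is unknown."""
    row = conn.execute(
        "SELECT stream_scheme_xml FROM streaming WHERE version = ?",
        (version,),
    ).fetchone()
    return row[0] if row else None
```

Keying the table on version means each stream scheme is stored once and looked up by the version in use.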
 

Luminosity Database (Jeremy Simmons)
------------------------------------
 Luminosity Db has been designed and sized to be 20 GB/year. We are verifying that the schema meets the user requirements. Testing will continue into next week.
 

Trigger Database (Elizabeth Gallas)
-----------------------------------
 Elizabeth will give us a summary report when she gets back from HCP.