o Real data processing
t154 is running on the farm. It uses the correct CFT geometry, and
has improved real data track finding paths. Fixes for memory leaks in CellNN
are included. Also, SMT software deals with abnormal occupancies by truncating
cluster lists. SMT sparse mode readout is not supported until t155. Unfortunately,
t155 is unable to process real data, due to a last minute error in calorimeter
code (fixed in t156).
o Monte Carlo processing p08.12.00 is being installed on the remote farms. It suffers from two large memory leaks in reco_analyze. These have been fixed, and p08.13.00 will be built as soon as disk space / release manager resources are available.
o Upcoming production release p10.00.00 will be based on t01.56.00. The first build of t156 produced a d0reco executable that passed its integrated MC tests. The real data test failed due to an RCP error (under investigation). Tests of t155 (d0mino, maxopt, Z(mm) + 2.5, 1K events) give a CPU time of 85 sec/event and a memory usage of 654 MB. The CPU time is a little higher than expected (this may be acceptable), and the memory usage indicates probable memory leaks. Memory leaks are under study; however, purify crashes (experts informed). Performance on Linux is still to be measured. New reco certification samples using p09.08.00 are being generated for p10 certification (12K events exist so far).
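The conclusion that the t155 memory usage "indicates probable memory leaks" rests on memory growing with the number of events processed. A minimal sketch of how that growth could be quantified, assuming periodic (event number, resident memory in MB) samples are logged during a reco job; the function name and the warm-up cut are hypothetical and not part of d0reco or purify:

```python
def leak_per_event(samples, warmup_events=100):
    """Estimate memory growth per event from (event_number, memory_mb) samples.

    Samples taken before `warmup_events` are ignored, because geometry,
    calibration and other caches legitimately grow memory early in a job.
    Returns the least-squares slope in MB/event over the remaining samples.
    """
    pts = [(n, m) for n, m in samples if n >= warmup_events]
    if len(pts) < 2:
        raise ValueError("need at least two samples after warm-up")
    mean_n = sum(n for n, _ in pts) / len(pts)
    mean_m = sum(m for _, m in pts) / len(pts)
    sxx = sum((n - mean_n) ** 2 for n, _ in pts)
    sxy = sum((n - mean_n) * (m - mean_m) for n, m in pts)
    if sxx == 0:
        raise ValueError("samples must span more than one event number")
    return sxy / sxx
```

A slope near zero after warm-up points to bounded caches; a persistently positive slope is the signature of a leak. For scale, ~100 MB of growth over 100k events corresponds to about 0.001 MB (1 kB) per event.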
Other outstanding issues:
Manpower Changes...................................................
With her new appointment as co-leader of the D0 Computing and Software
Project, Amber Boehnlein steps down from her role in Level3. Moacyr
Souza (LAFEX), the architect of ScriptRunner, steps in as L3 Filtering
co-leader.
Imperial student Ian Blackler begins work on a filter for messages sent
to the Significant Event System, in hopes of controlling the number
forwarded to the alarm server.
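One way such a filter could work is to suppress repeats of the same message within a hold-off period and to cap the total rate forwarded to the alarm server. A minimal sketch; the class, the message key, and the parameter values are hypothetical and not part of the Significant Event System:

```python
from collections import deque


class SesMessageFilter:
    """Hypothetical filter deciding which Significant Event messages to forward.

    A message key (e.g. source + message text) is forwarded at most once per
    `holdoff` seconds, and at most `max_per_window` messages in total are
    forwarded in any sliding window of `window` seconds.
    """

    def __init__(self, holdoff=60.0, window=10.0, max_per_window=20):
        self.holdoff = holdoff
        self.window = window
        self.max_per_window = max_per_window
        self._last_sent = {}    # key -> time it was last forwarded
        self._recent = deque()  # times of recently forwarded messages

    def should_forward(self, key, now):
        # Drop repeats of the same message inside the hold-off period.
        last = self._last_sent.get(key)
        if last is not None and now - last < self.holdoff:
            return False
        # Forget forwarded messages that have left the sliding window.
        while self._recent and now - self._recent[0] >= self.window:
            self._recent.popleft()
        # Enforce the global rate limit.
        if len(self._recent) >= self.max_per_window:
            return False
        self._last_sent[key] = now
        self._recent.append(now)
        return True
```

Passing the timestamp in explicitly keeps the filter deterministic and easy to test.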
Daniel Bloch and Francois Charles (Strasbourg) are studying a 3D vertex
finder based on similar work Francois did for CMS.
Online Testing.............................................Jonathan Hayes
The following are already on ONLINE nodes and have been or will be
tested:
Calorimeter: unpacking, clustering, jet finding
SMT : unpacking, SMT-only tracking
Muon : unpacking, coincidence filter
CFT : unpacking, tracking and global tracking
Timeouts in initializing (and backing up when the SESystem queue fills)
slowed early summer work. We may need to stop blocking ScriptRunner
whenever SEServer is down.
Four basic trigger lists exercise the current online needs of the executable:
SMT tracking, MuoLocal, Muon coincidence, and Jet reconstruction.
Initial tests on the Author node have shaken down the SMT code on data
samples that include p05 QCD and 1x8 store collider data. All 4 tools
must be tested (looking for reasonable output), possibly with some
profiling and RCP tuning.
Transfer to Linux is needed for l3fanalyze runs.
Online profiling (with an optimised build) of muon unpacking, calorimeter
unpacking, muon coincidence filtering, calorimeter clustering, and the
jet tool should follow.
Tool Reports
------------
Jet........................................................Volker Beuscher
Little or no feedback from the physics/ID groups means development is
still directed internally within L3. The tool is running on data but
still needs its large-statistics shakedown. EM functionality is being
added to the Jet tool for short-term data collection plans. The earlier
(p06) studies by Andre and Sailesh should be revisited.
SMT/CFT unpacking/clustering............................Robert Illingworth
A DB-script now extracts the SMT cable map from the database.
Timing results from tsim_l3 runs (on the Z->bb + 2.5 min bias reco
certification sample) give 21 ms (sigma = 7.8 ms) for SMT unpacking and
clustering (d0mino). For comparison, the (non-zero-suppressed) 36x36 data
give 26 ms (sigma = 1.6 ms). The time scales beautifully with the number
of clusters, as expected.
The CFT timing distributions showed a tail (with overflow) and a mean of
23.12 ms. However, the times did not scale cleanly with the number of
clusters, which is to be investigated.
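Whether timing "scales with the number of clusters" can be checked with a straight-line fit of time against cluster count plus a correlation coefficient: clean scaling, as for the SMT, gives r close to 1, while the CFT behaviour would show a weaker correlation. A minimal sketch, assuming per-event (cluster count, time in ms) pairs have been extracted from the tsim_l3 output; the function is hypothetical:

```python
import math


def scaling_check(n_clusters, times_ms):
    """Fit time = offset + cost * n_clusters and return (cost, offset, r).

    cost is in ms per cluster, offset in ms, and r is the Pearson
    correlation coefficient between cluster count and time.
    """
    n = len(n_clusters)
    if n != len(times_ms) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    mx = sum(n_clusters) / n
    my = sum(times_ms) / n
    sxx = sum((x - mx) ** 2 for x in n_clusters)
    syy = sum((y - my) ** 2 for y in times_ms)
    sxy = sum((x - mx) * (y - my) for x, y in zip(n_clusters, times_ms))
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    cost = sxy / sxx
    offset = my - cost * mx
    r = sxy / math.sqrt(sxx * syy)
    return cost, offset, r
```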
CFT Tracking................................................Ray Beuselink
Regional tracking is not yet functional, and will require a fairly
extensive reorganization of the way lists of clusters and links are
built and stored. To improve the reliability of axial tracking, Ray is
studying premature hit assignments (to candidate tracks) made in complex
events. The modifications are still being tested and debugged (and are
not yet in CVS).
Global/SMT Tracking.......................................Daniel Whiteson
L3 global and SMT-only tracking is running (offline) on real data.
Daniel has shown the first reconstructed global track (run 119679,
event 231507). SMT-only histograms (more than ~10000 events) are being
used to study the hits per track (~3900 tracks with 4 hits, ~400 with 5
hits, a few with 6-7 hits), the fit chi-squared, the distance of closest
approach, phi-0, Q/pt, Z0, and tan(lambda).
Tracking Efficiency Standards............................................
While the efficiencies and rejections determined by fully simulated
physics triggers will have the greatest utility to users (and
triggermeisters), we also need to document the individual performance of
complementary tracking tools (to understand the rates and to select the
tool appropriate for specific filters). Daniel and Ray have been
converging on common definitions for reporting purity and efficiency, and
Daniel has written a set of tools to calculate these measurements.
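The common definitions themselves are not given here, so the following is only a sketch of one conventional choice, with hypothetical names: efficiency as the fraction of reconstructable MC tracks matched by at least one reconstructed track, and purity as the fraction of reconstructed tracks matched to a reconstructable MC track. The track-to-MC matching (for example by shared hits) is assumed to have been done already.

```python
def efficiency_and_purity(mc_ids, reco_matches):
    """Hypothetical track-level efficiency and purity.

    mc_ids:       ids of the reconstructable Monte Carlo tracks.
    reco_matches: one entry per reconstructed track, holding the id of the
                  MC track it was matched to, or None for an unmatched track.
    """
    mc = set(mc_ids)
    # Each MC track counts once for efficiency, however many reco tracks match it.
    matched_mc = {m for m in reco_matches if m in mc}
    # Every reco track matched to a reconstructable MC track counts for purity.
    good_reco = sum(1 for m in reco_matches if m in mc)
    eff = len(matched_mc) / len(mc) if mc else 0.0
    pur = good_reco / len(reco_matches) if reco_matches else 0.0
    return eff, pur
```

Here duplicate reconstructed tracks matching the same MC track count toward purity but only once toward efficiency; whether duplicates should count as "pure" is exactly the kind of choice a common definition has to fix.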
Primary Vertex..............................................Ray Beuselink
L3TfastZVertex uses L1 luminosity bits (providing a 6.25 cm binning) and
runs on collider data (but not on MC, where the required bits are
unavailable).
L3fcft_vertex is in CVS (though unreleased) and provides z resolutions on
the order of a centimeter. In principle it could also be run on SMT or
global tracks.
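As an illustration of how a track-based z-vertex finder can work (this is not a description of the L3fcft_vertex algorithm, whose details are not given here), the sketch below histograms track z0 values, seeds on the most populated bin, and averages the tracks near it. The function name, bin width, and window are placeholder choices.

```python
import math
from collections import Counter


def find_z_vertex(track_z0, bin_width=1.0, window=2.0):
    """Estimate the primary-vertex z (cm) from a list of track z0 values (cm).

    Tracks are histogrammed in bins of `bin_width`; the most populated bin
    seeds the vertex, and the result is the mean z0 of all tracks within
    +/- `window` of that bin's centre. Returns None if there are no tracks.
    """
    if not track_z0:
        return None
    counts = Counter(math.floor(z / bin_width) for z in track_z0)
    peak_bin, _ = counts.most_common(1)[0]
    seed = (peak_bin + 0.5) * bin_width
    near = [z for z in track_z0 if abs(z - seed) <= window]
    if not near:  # only possible if window < bin_width / 2
        return seed
    return sum(near) / len(near)
```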
SecondaryVertex..........................................Arnaud Duperrin
The current algorithm finds secondary vertices in 2% of QCD events, but
in 55% of Z->bb or top events (requiring at least one vertex with a decay
length of at least 1 mm, built from a minimum of 3 tracks with at least 4
hits per track), with a timing of ~100 ms. These rough estimates do not
yet include any L1/L2 cuts.
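The tagging criterion quoted above can be written as an explicit selection. A minimal sketch with hypothetical data classes, assuming each vertex carries its decay length and its tracks' hit counts, and reading the requirement as at least 3 tracks that each have at least 4 hits:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    n_hits: int


@dataclass
class SecondaryVertex:
    decay_length_mm: float
    tracks: List[Track] = field(default_factory=list)


def event_is_tagged(vertices, min_decay_mm=1.0, min_tracks=3, min_hits=4):
    """True if at least one vertex has decay length >= min_decay_mm and
    at least min_tracks tracks, each with at least min_hits hits."""
    for v in vertices:
        if v.decay_length_mm < min_decay_mm:
            continue
        good = [t for t in v.tracks if t.n_hits >= min_hits]
        if len(good) >= min_tracks:
            return True
    return False
```

The tag rate for a sample (2% for QCD, 55% for Z->bb or top in the numbers above) is then the fraction of events for which `event_is_tagged` returns True.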
The tool now connects to the CFT/SMT and global tracker reconstruction
by obtaining a pointer to an L3TTracker, so 3D tracking can now be
performed using the official L3 tracking. SMT clustering/hit finding is
also implemented, in order to compute reconstruction efficiencies.
Daniel Bloch and Francois Charles (Strasbourg) are studying fast 3D
vertexing, but this work should really start in September.
CPS unpacking..................................................Chunhui Han
The unpacker runs on 1x8 RawDataChunks. Channel mapping and array
sorting are fine: no differences were observed with respect to the
results of cps_examine (in by-hand comparisons, which Chunhui plans to
automate in order to check all channels).
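Automating the by-hand comparison with cps_examine amounts to a channel-by-channel diff. A minimal sketch, assuming both the unpacker and cps_examine output can be dumped as {channel: ADC value} maps; the dump format and the function name are hypothetical:

```python
def compare_channels(unpacked, reference, tolerance=0):
    """Compare two {channel_id: adc_value} maps channel by channel.

    Returns (missing, extra, mismatched): channels present only in the
    reference, channels present only in the unpacker output, and shared
    channels whose values differ by more than `tolerance`.
    """
    missing = sorted(set(reference) - set(unpacked))
    extra = sorted(set(unpacked) - set(reference))
    mismatched = sorted(ch for ch in set(unpacked) & set(reference)
                        if abs(unpacked[ch] - reference[ch]) > tolerance)
    return missing, extra, mismatched
```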
CPS...........................................................Andre Turcot
The tool has now seen several thousand events without major problems.
The schedule of AFE arrival/implementation suggests that an axial-only
mode (which should provide adequate resolution by itself) be implemented.
FPS...........................................................Andre Turcot
No unpacking is available yet. Tool development awaits the map and the
final hardware specs.
Muon Hit Coincidence (MuoHitCoinc)........................Eduardo Gregores
The tool has been updated to take parameters defining the layers and
octants in which to check for coincidences.
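A sketch of what a parameterised coincidence check might look like, with the layers, the octants, and the required number of layers passed in as parameters. The function, its defaults (layers A, B, C and octants 0 to 7), and the (layer, octant) input format are hypothetical, not the MuoHitCoinc interface:

```python
def find_coincidences(hits, layers=("A", "B", "C"), octants=range(8), min_layers=2):
    """Return the octants in which hits appear in at least `min_layers`
    of the requested layers.

    hits: iterable of (layer, octant) pairs, one per hit.
    """
    wanted_layers = set(layers)
    wanted_octants = set(octants)
    layers_hit = {}  # octant -> set of requested layers with at least one hit
    for layer, octant in hits:
        if layer in wanted_layers and octant in wanted_octants:
            layers_hit.setdefault(octant, set()).add(layer)
    return sorted(o for o, ls in layers_hit.items() if len(ls) >= min_layers)
```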
Muon............................................................Paul Balm
Unpacking runs regularly online; speeding it up has not been a priority.
In L3TmuoLocal, however, possible speedups were targeted in both
segment finding and track reconstruction. Central track matching (timing
in at 4 ms) has been released. L3TmuoCalTrack (a utility for the MTC
package) properly identifies the expected calorimeter cells.
Two new problems relevant to the muon tool have just been uncovered
(both from underlying packages, not L3 code):
1) A memory leak in the track reconstruction code (~100 MB after 100k
events) is being looked into by the Saclay group.
2) Electronic channels are mis-identified and assigned to non-existent
detector elements, e.g., phi-section #7 when only 5 phi partitions are
present. Two occurrences have been found; they are being investigated
by Dave Hedin and Mike Fortner.
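Mapping errors like the phi-section one can be flagged automatically by a simple range check on the channel map. A minimal sketch, assuming the map can be dumped as {electronic channel: phi section}; the function and its arguments are hypothetical and not part of the muon unpacking code:

```python
def invalid_channels(channel_map, n_phi_sections=5, first_section=0):
    """Return electronic channels mapped to a phi section that does not exist.

    channel_map: {electronic_channel: phi_section}. Valid sections are
    first_section .. first_section + n_phi_sections - 1.
    """
    last = first_section + n_phi_sections - 1
    return sorted(ch for ch, phi in channel_map.items()
                  if not first_section <= phi <= last)
```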
L3Tau..................................................Gustaaf Brooijmans
New algorithms introduce a track-based (rather than the previous
calorimeter-based) search. Track clustering is Pt-ordered, and single
isolated tracks above a minimum Pt are selected. The code has been
debugged and released in p10; its timing and memory use are being
investigated.
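The track-based search can be sketched as Pt-ordered seeding and clustering followed by an isolation requirement. The cone size, the Pt thresholds, the (pt, eta, phi) input format, and the function names are placeholders rather than the L3Tau parameters:

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    # Wrap the phi difference into [-pi, pi) before combining with delta eta.
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def isolated_single_tracks(tracks, min_pt=5.0, cone=0.5, iso_pt=1.0):
    """Select isolated single tracks as tau candidates.

    tracks: list of (pt, eta, phi). Tracks are processed in decreasing pt;
    each unused track above min_pt seeds a cluster that absorbs the unused
    tracks inside `cone`. The seed is kept only if no other track with
    pt > iso_pt lies inside the cone.
    """
    ordered = sorted(tracks, key=lambda t: t[0], reverse=True)
    used = [False] * len(ordered)
    candidates = []
    for i, (pt, eta, phi) in enumerate(ordered):
        if used[i] or pt < min_pt:
            continue
        # Cluster: absorb the seed and the unused lower-pt tracks in the cone.
        for j in range(i, len(ordered)):
            if not used[j] and delta_r(eta, phi, ordered[j][1], ordered[j][2]) < cone:
                used[j] = True
        # Isolation: no other track above iso_pt anywhere inside the cone.
        isolated = all(
            k == i or ordered[k][0] <= iso_pt
            or delta_r(eta, phi, ordered[k][1], ordered[k][2]) >= cone
            for k in range(len(ordered)))
        if isolated:
            candidates.append((pt, eta, phi))
    return candidates
```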
L3Propagator..............................................Arnaud Duperrin
The classes L3PropagatorManager (the interface allowing selection of a
constant or non-uniform B field) and L3propagatorSimple (covering the
constant-B case) have been defined. Propagation can begin from any space
point or DCA; given the momentum, the charge, and the radius R to
propagate to, it returns the extrapolated position vector. This code is
in CVS and in use by the jet tool. Timing studies are underway, and will
be followed by L3PropagatorFull (the non-uniform B-field method) as well
as inward propagation.
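For the constant-B case the trajectory is a helix, so propagation to a cylinder of radius R has a closed form: intersect the transverse circle of radius rho = pT/(0.3 |q| B) with the circle r = R, and advance z by (pz/pT) times the transverse arc length. The sketch below illustrates this under stated assumptions (B along +z, positions in cm, momenta in GeV/c, B in tesla, start point inside radius R); the function name and signature are hypothetical and are not the L3propagatorSimple interface.

```python
import math

C_LIGHT = 0.299792458  # GeV/c per (tesla * metre) for unit charge


def propagate_to_radius(x0, y0, z0, px, py, pz, charge, radius, b_tesla=2.0):
    """Extrapolate a helix in a uniform field B along +z to the cylinder r = radius.

    Returns (x, y, z) of the first crossing along the direction of motion,
    or None if the helix never reaches that radius.
    """
    pt = math.hypot(px, py)
    if pt == 0.0 or charge == 0 or b_tesla == 0.0:
        return None  # straight-line and degenerate cases not handled here
    rho = 100.0 * pt / (C_LIGHT * abs(charge) * abs(b_tesla))  # radius in cm
    s = 1.0 if charge * b_tesla > 0 else -1.0  # +1: clockwise seen from +z
    # Centre of the transverse circle, offset from the start along q v x B.
    xc = x0 + s * rho * py / pt
    yc = y0 - s * rho * px / pt
    d = math.hypot(xc, yc)
    if d == 0.0 or d > rho + radius or d < abs(rho - radius):
        return None
    # Intersection points of the circle |p| = radius with the track circle.
    a = (radius ** 2 - rho ** 2 + d ** 2) / (2.0 * d)
    h = math.sqrt(max(radius ** 2 - a ** 2, 0.0))
    ux, uy = xc / d, yc / d
    candidates = [(a * ux - h * uy, a * uy + h * ux),
                  (a * ux + h * uy, a * uy - h * ux)]
    alpha0 = math.atan2(y0 - yc, x0 - xc)
    best = None
    for x, y in candidates:
        alpha = math.atan2(y - yc, x - xc)
        turn = (s * (alpha0 - alpha)) % (2.0 * math.pi)  # turning angle >= 0
        if best is None or turn < best[0]:
            best = (turn, x, y)
    turn, x, y = best
    z = z0 + (pz / pt) * rho * turn  # dz / ds_transverse = pz / pt
    return x, y, z
```

Choosing the crossing with the smallest non-negative turning angle selects the outward intersection reached first; inward propagation would need the opposite sense of rotation.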
- Continuing to fine-tune the Access Control Lists this week. Expect to
have them all in place by the end of the week.
- Jerry Guglielmo is working on getting Kerberos to work for Tru64 V5.
This is a major stumbling block to Kerberizing the Online system. Our
goal is to allow only Kerberized ssh for external access.
- In dire need of both an EXAMINE czar to coordinate efforts and an
EXAMINE expert to work on the READ_EVENT_DAQ low-level routines. In
general the EXAMINEs need more attention, particularly the Global and
Display instances.
- Another round of high-rate tests is pending, to see if recent ITC
improvements have gained us anything.
- Another 8 Linux nodes are due to arrive at the end of the month, for
DAQ (split off from the Distributor), Luminosity, a Linux/Kai build node,
and EXAMINE use.
The system is in heavy use now, and people are both finding and creating
problems, which is good. Inaccessibility of tapes has been a major
problem recently. A new mode of operation, in which someone ran on a
Linux cluster and tried to fetch files from SAM that could not be
fetched because there was no disk cache on the Linux cluster, brought
central-analysis on d0mino to its knees a couple of times before it was
tracked down.
New features added to the system in the last release include the ability
to run the Farms fully as a proper SAM station (which they have not done
for the past year, thereby falling out of date with software versions).
This has been tested but not yet put into production on the Farms, so
there may still be some issues we don't know about. This version of SAM
also supports a cluster of workstations of any sort, including the ClueD0
desktop cluster, where the disk cache is distributed throughout but only
one (or a few) servers are designated to retrieve data from Enstore.
This is now being put to the test on ClueD0.
Datasets can now be defined using several attributes ("dimensions") from
the Run Config database.
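Conceptually, a dataset defined by dimensions is the set of files whose metadata satisfy every constraint. A minimal sketch, assuming file metadata is available as a dictionary; the dimension names used here (run_number, run_type) are hypothetical, and this is not SAM's dimension query syntax:

```python
def select_files(files, **dimensions):
    """Return the names of files whose metadata satisfy every dimension.

    files: {file_name: {dimension: value}}.
    Each keyword argument names a dimension and gives a container of allowed
    values: a set or list, or, for integer run numbers, a Python range.
    """
    return sorted(
        name for name, meta in files.items()
        if all(dim in meta and meta[dim] in allowed
               for dim, allowed in dimensions.items()))
```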
The SAM team has a long list of small jobs to do for the V3.2 release and
is constantly busy with support and with new people trying to use the
system in new ways. This is all good, but it makes for slow progress on
major new features.
SAM stations are running at Nikhef, Prague and Lyon; they produce MC data
and enter it directly into the system. There are also test stations
running at Imperial College, MSU and Columbia.
Issues:
With every new station we must insist that there is a responsible station
administrator who understands the possible tuning parameters of the
system. Some additional tuning parameters are urgently needed and are
scheduled for the V3.2 release in 2 weeks; these will control the total
number of files that can be in transfer simultaneously on a station.
Currently this parameter is buried, and on systems like d0lxac1, where 10
disks have been defined, we see 10 transfers from Enstore starting up for
each disk, and this kills the system. Somehow these test installations
immediately become 'production' facilities for users, without the person
who installed them being the long-term responsible person.
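The problem on d0lxac1 is that the transfer limit is effectively per disk, so the station-wide total grows with the number of disks. A station-wide cap can be sketched with a single shared semaphore. This is a hypothetical illustration, not the SAM V3.2 implementation or its parameter name:

```python
import threading


class TransferLimiter:
    """Hypothetical station-wide cap on concurrent file transfers.

    The transfer threads for every disk share one semaphore, so the total
    number of transfers in progress never exceeds `max_transfers`, however
    many disks are defined on the station.
    """

    def __init__(self, max_transfers):
        self._slots = threading.BoundedSemaphore(max_transfers)
        self._lock = threading.Lock()
        self.active = 0  # transfers currently in progress
        self.peak = 0    # largest number seen in progress at once

    def transfer(self, fetch):
        with self._slots:  # waits while the station is at its cap
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return fetch()
            finally:
                with self._lock:
                    self.active -= 1
```

With 10 disks each starting its own transfers, a cap of, say, 4 keeps at most 4 Enstore transfers in progress; the others wait for a free slot instead of all starting at once.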
We are working to enable more types of data, with more flexible and
extensible metadata, to be stored in SAM. This should be done in the
next few weeks.
A documentation and error message blitz is needed.
The d0 I/O system makes it difficult to inform users of problems with a
particular file. Files are opened, and if there is an error it is
reported, but there is no way to report that certain files should have
been delivered but were not. This needs some help from Jim K. to think
through what should be done. I think it involves other ways to funnel
error messages through the framework.
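The missing piece is a comparison between what a job asked for and what it actually received. A minimal sketch, assuming the list of expected files (from the dataset) and the list of files delivered to the job are both available at the end of the job; the function name is hypothetical and it does not use the framework's actual error-message machinery:

```python
def undelivered_files(expected, delivered, failed_to_open=()):
    """Report files that a job should have seen but never processed.

    expected:       file names in the dataset the job asked for.
    delivered:      file names actually handed to the job.
    failed_to_open: delivered files whose open already reported an error.
    Returns (never_delivered, open_errors), both sorted.
    """
    expected = set(expected)
    never_delivered = sorted(expected - set(delivered))
    open_errors = sorted(expected & set(failed_to_open))
    return never_delivered, open_errors
```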
A task list for the next release, V3.2, is posted on the SAM
development web pages at http://d0db-dev.fnal.gov/sam
Future Features:
Some of the SAM team members are now starting to spend part of their time
on the Particle Physics Data Grid, and hires to help with SAM will be made
imminently. This work will in fact be exactly along the lines of the work
that has to be done for SAM in interfacing to batch systems, dispatching
jobs, and reliably running a whole chain of jobs. This also relates to
Farm production code. It is extremely likely that most remote Farms will
not use the same job management system as Fermilab, but will rather be
inclined to use Condor, as this meshes better with the work and funding
they may also get from European grid initiatives.
There are major pieces of work to be done in order to fully incorporate
Trigger and Stream information and to fully support user analysis jobs.
The MC request tracking and MC metadata must be finished and fully
incorporated in the system.
Of course, a constant stream of small things will also need to be done at
the same time.
Understanding the entire distributed system and its status, and examining
failures, is a difficult task, and log files are hard to read. We are
working on a better monitoring and system information framework that will
put all of this information into a monitoring database and allow us to
analyze it better. However, this does not add user functionality, so it
has to be prioritized accordingly.
Summary:
The Run Configuration database is now in production.
The tests of using real calibration data for SMT calibration can now go
forward, although the performance may still be less than desired and the
size of the reco executable, when bloated with calibration objects, is
still too large. Many performance problems with the database servers and
with d0om client code have been addressed, although more sophisticated
handling of these immense in-memory caches of almost 2 million objects
needs to be looked into. Other calibration databases remain in the same
state as they have been for many months: almost done, but needing a real
push to get them into production and working.
The Trigger database is on a path for first release at the end of August.
XML files are just now starting to be generated from the database. Some
more human interfaces need to be finished to enter parts of the trigger
list, and additional database triggers need to be written to ensure the
integrity of all parts of the database.
The Luminosity database design is stalled, waiting for it to be filled
directly from the luminosity monitor instead of from loading files as it
has been in testing. The file loading leads to small problems that have
to be ironed out, and this can only be done when streaming live data into
the database. It is not clear whether the design is adequate, and it
remains to be decided just how much of the raw online data must be
archived to the offline database and how much can be summarized or
discarded, assuming we keep a copy in the form of raw files archived to
tape via SAM. This latter process has also not been ironed out yet.
The RCP database has been done for some time. Hooking it up to the RCP
package to save/restore sets of RCP parameters is still ongoing.
The SAM database has been quite stable. There are now tables for tracking
Farm jobs, but that part is not fully in production. Work continues on
tables for recording and tracking MC requests and physics information.
Tables for tracking the physics information of MC data have been complete
for some time, but they are not being filled, nor is the data even
produced yet by mc_runjob.
Issues:
Accessing calibration data still needs a lot more work. We need to
improve caching of data in the calibration database servers, possibly
using calibration files written out by the servers as an intermediate
cache. We should also consider putting these files in SAM and treating
them as another type of file, for archive and cache management. I think
we do not understand our access patterns for calibration data well
enough; no-one has really been analyzing or using calibration constants
up to now. It is also not clear that the decision to use d0om objects for
calibration objects and d0om persistency mechanisms was a good one, or is
a viable long-term path. We need to do some tests with non-d0om sets of
calibration objects and study memory size and performance. The database
server and d0om code must also have cache flushing notification added.
If we need to move to non-d0om calibration objects, then the
calibration_manager code for each subsystem will have to be modified to
support an alternative way of reading in the constants.
The Run Config database is now in place, and presumably bits of
information from it will be needed in reco or other analysis code. So the
package that fetches these constants will need to be maintained and
updated, possibly frequently.
MCHerwig is now running properly and can be made available
on the farms in p10.
The programs MCSingle_x and MCCosmic_x have been moved
from mc_runjob to d0_mcpp_gen. MCSingle_x has been
modified slightly to handle single tau generation.
D0GSTAR Status for p10
(S. Kunori)
All planned changes for p10 have been released with the t01.55.00 build.
Major changes from p09 to p10 are as follows.
1) The new 3D magnetic field map is now the default. The default field
orientation is: solenoidal field Bz positive at (0,0,0), and toroidal
field + in phi (i.e. counter-clockwise). The previous field map is still
available as the option 'OLD'.
2) Default ECUTs for the calorimeter plate-level geometry were lowered.
3) A new option to swim very forward p/pbar to the FPD in double precision
was added. This option is OFF by default.
4) SimEventInfoChunk has additional methods which return (a) the version
of the magnetic field map and the field orientation, and (b) the FPD
tracking option.
PMCS Status
(M. Verzocchi)
PMCS is not yet ready for use in a physics analysis.
The general framework of PMCS is in place and the current
work (by several contributors) focuses on providing the
smearing functions for the various physics objects. We
are working on the creation of physics object chunks
inside PMCS (so that for example the RecoAnalyze packages
can be run on the PMCS output) and on the creation of the
Thumbnail. A first version of PMCS for general use
is expected for p11.
We have just added an initial version of the smearing functions for
muons, and during the next two weeks will add code for simulating the
response of the tracking system. The current code for smearing electrons
and photons is being tested. We are lagging somewhat behind on jets and
missing energy, and have no coverage for taus. The number of people
currently working on PMCS is unfortunately decreasing; the group would
benefit from additional help from the physics groups (particularly from
Jet/Met and Tau, and eventually B-tagging and trigger simulation).
For more details, see Sarah Eno's talk at the July 25th MC meeting two
weeks ago (transparencies are available at
http://www-d0.fnal.gov/~sceno/pmcs_doc/temp/simulatio_mtg_24jul01.pdf),
which gives the status of each PMCS package. There is also a draft of a
"developer's guide" for PMCS, containing a detailed list of the work to
be done, available at
http://www-d0.fnal.gov/~mverzocc/notes/pmcs/draft.ps
(some things will be changed, and some things listed as "TO BE DONE" are
already "DONE").
D0SIM Status
(S. Protopopescu)
p09.02 corrected bugs in calorimeter packing/unpacking (negative energies)
p09.07 applied proper weights to calorimeter trigger towers
p09.08 corrected bug in cps packing/unpacking
The pileup package has been updated in t01.55.00 (for p10) to:
1) Treat mixture and plate d0gstar output on an equal footing
2) Apply correct weights to calorimeter trigger towers
3) Ready to merge real minbias data with MC hard scatter
4) Ensure cal. unpacking can use same weights for real and MC data
All digi packages have code added to handle pseudo Sim data
(i.e. SimXXXChunks that include hits derived from real minbias data)
D0RAW2SIM Status
(S. Protopopescu)
The first production version of the D0Raw2Sim program will be available
in p10, but it is not yet ready for use on the farms. The package to
handle L1 calorimeter towers is missing.