Page updated: 2004

Run II Reconstruction Program Status - Version p14

Latest version:
p14.06.01

Running on the farms




Overview

Now...

The current version of p14 is p14.06.01.

In general...

The p14 version of the reconstruction program has the following significant differences from the previous (p13) version:


Status

p14.06.00 is currently installed on the data processing farms (FNAL). p14.06.00 is the final version.


Performance

The following statistics were measured on CAB using run 174244. The CPU time has been converted to 1 GHz-seconds. RSS measures real memory used and VSIZE measures virtual memory (real memory is more important for farm production, and virtual memory is more important for users running in shared batch systems). The numbers represent averages of different files within a run (we have observed significant differences between files in a given run). These numbers should be used with care when extrapolating performance to other runs.
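As a rough illustration of the normalization mentioned above, the sketch below converts a CPU time measured on a node of known clock speed into 1 GHz-seconds. It assumes that CPU time scales inversely with clock frequency; the function name `toGHzSeconds` and this simple scaling are assumptions for illustration, not the procedure actually used for the tables on this page.

```cpp
#include <cassert>

// Convert CPU seconds measured on a node running at clockGHz into
// "1 GHz-seconds", i.e. the time the same work would take on a 1 GHz CPU.
// Assumption (not stated on this page): time scales as 1 / clock frequency.
double toGHzSeconds(double cpuSeconds, double clockGHz) {
    assert(cpuSeconds >= 0.0 && clockGHz > 0.0);
    return cpuSeconds * clockGHz;
}
```

Under this assumption, 10 seconds of CPU time on a 2 GHz node would be quoted as `toGHzSeconds(10.0, 2.0)`, that is, 20 GHz-seconds.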

Version     CPU Time (sec/event)   RSS Memory (MB)   VSIZE Memory (MB)   DST size (KB)   TMB size (KB)   NEVT
p14.03.00   14.8                   413.4             666.4               208.2           22.3            1000
p14.02.00   20.7                   445.0             597.5               211.2           23.8            100
p14.01.00   25.7                   688.3             852.5               220.3           18.9            100
p13.06.01   17.2                   388.9             494.4               171.6           18.9            100

Run 179760 is a 26e30 run.

Version     CPU Time (sec/event)   RSS Memory (MB)   VSIZE Memory (MB)   DST size (KB)   TMB size (KB)   NEVT
p14.05.00   20.6                   273               523                 210             24.7            2500

Run 174491 is a higher luminosity run.

Version                         CPU Time (sec/event)   RSS Memory (MB)   VSIZE Memory (MB)   DST size (KB)   TMB size (KB)   NEVT
p14.03.00                       37.1                   453               883.1               283.6           27.4            1000
p14.02.00 (pre-build testing)   40.2                   434.8             698.2               270.1           28.6            100
p13.06.01                       31.8                   432.1             578.1               224.0           23.0            100

Tracking efficiency is measured by matching "tight muons" to tracks found in the central detector (a technique developed by Erich Varnes and described in a talk to the tracking group).

Version     Tracks/event   Hits/track   Track effic (phi=0)   Tracks/primary vertex
p14.01.00   47.7           18.1         0.89                  23.3
p13.06.01   35.3           16.0         0.71                  12.3
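The efficiency figure in the table above comes from counting how many tight muons have a matching central track. A minimal sketch of that ratio, with a binomial uncertainty, is given below. The struct and function names are hypothetical, and the matching cuts themselves (which are not described on this page) are assumed to have been applied before the counts are passed in.

```cpp
#include <cassert>
#include <cmath>

struct Efficiency {
    double value;  // fraction of tight muons with a matched central track
    double error;  // binomial standard error on that fraction
};

// nMatched:    tight muons for which a matching central track was found
// nTightMuons: all tight muons in the sample
// Both counts are assumed to come from an upstream matching step.
Efficiency trackingEfficiency(long nMatched, long nTightMuons) {
    assert(nTightMuons > 0 && nMatched >= 0 && nMatched <= nTightMuons);
    const double n = static_cast<double>(nTightMuons);
    const double eff = static_cast<double>(nMatched) / n;
    const double err = std::sqrt(eff * (1.0 - eff) / n);
    return {eff, err};
}
```

The binomial error is a common simple choice; it becomes unreliable when the efficiency is very close to 0 or 1, where an interval-based estimate would be more appropriate.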

DST composition (top 20 chunks), based on run 174244.

p13.06.01                                           p14.03.00
Chunk                 Number of chunks  % of total  Chunk                 Number of chunks  % of total
FPSClusterChunk       1                 20.41%      L3Chunk               1                 15.04%
L3Chunk               1                 17.30%      TrackCalExtraChunk    1                 14.32%
TrackCalExtraChunk    1                 9.12%       FPSClusterChunk       1                 10.74%
JetChunk              8                 7.66%       JetChunk              6                 8.02%
FPSDataChunk          1                 6.00%       Calt42Chunk           1                 6.63%
EMparticleChunk       2                 5.49%       ChargedParticleChunk  1                 5.08%
CalSCClusterChunk     1                 4.27%       GTrackChunk           1                 4.17%
RawDataChunk          1                 4.09%       L1L2Chunk             1                 3.96%
ChargedParticleChunk  1                 3.99%       CPSClusterChunk       1                 3.71%
GTrackChunk           1                 3.68%       RawDataChunk          1                 3.63%
CalDataChunk          1                 3.31%       FPSDataChunk          1                 3.22%
CalTClusterChunk      10                2.49%       EMparticleChunk       2                 3.18%
CftClusterChunk       1                 2.10%       CalSCClusterChunk     1                 2.98%
SMTPosBCollectChunk   1                 1.98%       CalDataChunk          1                 2.63%
SMTPosDCollectChunk   1                 1.67%       CalTClusterChunk      8                 1.92%
VertexCollChunk       6                 0.86%       CftClusterChunk       1                 1.72%
MuoAlignChunk         1                 0.59%       SMTPosBCollectChunk   1                 1.45%
MuoCentralMatchChunk  1                 0.54%       SMTPosDCollectChunk   1                 1.23%
MuoSegmentChunk       1                 0.53%       CPSDigiChunk          1                 1.22%
TrackChunk            1                 1%          VertexCollChunk       5                 0.56%

A breakdown of where CPU time is spent, based on run 174244. Note that these numbers fluctuate by a few percent, based on which run is processed, and the instantaneous luminosity of that run. Also, not all times are accounted for (to the level of about 5%). These numbers should be considered to indicate general trends.

Stage                   p13.06.01       p14.01.00       p14.03.00
                        % of CPU time   % of CPU time   % of CPU time
Initialization          0.08%           0.06%           0.00%
SAM                     0.00%           0.00%           0.00%
Read event              0.08%           0.06%           0.11%
Unpacking               6.67%           5.51%           6.80%
Detector (RDC)          3.21%           2.88%           2.79%
Detector (DST)          1.69%           1.38%           2.01%
Tracking                56.96%          63.97%          62.65%
Vertexing               6.33%           5.51%           5.13%
Particle ID             14.94%          11.65%          15.16%
  cal                   1.10%           0.94%           0.78%
  chpart                6.67%           5.58%           7.36%
  em                    0.25%           0.19%           0.22%
  mu                    2.11%           1.44%           2.68%
  jet                   3.46%           2.63%           3.01%
  tau                   0.08%           0.06%           0.00%
  met                   0.25%           0.25%           0.22%
  links                 0.00%           0.00%           0.00%
  bc                    1.01%           0.56%           0.78%
  wz                    0.00%           0.00%           0.00%
Write event             0.17%           0.13%           0.22%
Finish event            7.26%           5.70%           1.00%
  RecoStat (recostat)   1.43%           0.69%           0.86%

RECO certification plots generated with recocert:

p14.02.00, Run 176571 (3 MB)

p14.02.00, higher statistics (from Data Monitoring group)

p14.01.00 vs p13.06.01, Run 176571 (5.2 MB)

The above set of plots compares run 176571 reconstructed on the farms with p14.01.00 and p13.06.01. The plots were generated with the recocert package. Results from p14.01.00 are in black (p13.06.01 in red). Where appropriate, the plots have been normalized by the number of events processed. Erich Varnes recently refined the track efficiency plots in two ways: a) calorimeter confirmation of the muon is now required, to reduce the muon fake rate, and b) the matching cuts have been tuned to handle geometric matching better. These modifications have resulted in a higher observed tracking efficiency.

p13.06.01, Run 174244 (10 MB) - for comparison


Test Samples

p14.01.00 certification samples (Data and Monte Carlo) are available in SAM.

Data

Monte Carlo


Upcoming Features


Known Problems

Data reconstructed with the following versions have known problems or annoyances:

 

History of Problems and resolutions:

Legend: red dot = Major error; yellow dot = Annoying feature; blue dot = Functionality to be added; green ball = Scheduled for next release; blue dot = Problem fixed.
(common = occurs within 100 events, rare = occurs within 1000 events)

Status

Description

Fixed in version

Segmentation faults

Not fixed

Infinite loop in AA tracking algorithm

p14.06.00

blue dot

Energy information in CPS data reconstructed with p14.03.0x (x>=1) is affected by a bug in cps_util and cps_calibration. Position information is not great, but usable. This is all fixed in p14.04.00.

p14.04.00

green ball

EM reco produces a peak at pT = 5 GeV for low-pT electrons. Fortunately, the cell information is stored, so the reconstruction can be redone correctly in EM post-processing.

p14.06.00

dot green

A serious bug was introduced in p14.03.01 as part of DSPACK. The program crashes when L1L2 information is unpacked, although not for all p14.03.01 data.

p14.03.02

red dot

A serious bug has been found in the calculation of the jet estimator n90 (from which f90 is built), for kt jets, and for cone jets which result from a merging of 2 or more jets. See this D0 News announcement for more details.

p14.04.00

yellow dot

A bug in Jet::p() causes it to return pz instead of p. Users are advised to avoid p() and to use sqrt(px() * px() + py() * py() + pz() * pz()) instead.

No plan to fix in p14.
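The Jet::p() workaround above can be wrapped in a small helper so that analysis code does not repeat the expression. The sketch below is illustrative only: `MockJet` is a hypothetical stand-in for the real Jet class and models nothing beyond the three accessors named in the workaround, and `momentumMagnitude` is an assumed helper name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the real Jet class. Only the px(), py() and
// pz() accessors mentioned in the workaround are modeled here.
struct MockJet {
    double px_, py_, pz_;
    double px() const { return px_; }
    double py() const { return py_; }
    double pz() const { return pz_; }
};

// Momentum magnitude built from the components, so p() is never called.
// Works with any type that provides px(), py() and pz().
template <typename JetLike>
double momentumMagnitude(const JetLike& jet) {
    return std::sqrt(jet.px() * jet.px() +
                     jet.py() * jet.py() +
                     jet.pz() * jet.pz());
}

// Sanity check on a (3, 4, 12) vector, whose magnitude is 13.
inline void checkMomentumMagnitude() {
    const MockJet jet{3.0, 4.0, 12.0};
    assert(std::abs(momentumMagnitude(jet) - 13.0) < 1e-12);
}
```

Because the helper is a template, it should accept the real jet type unchanged as long as that type exposes the same three accessors.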

red dot

For specific events, the CPU time per event can be 20 to 40 times larger than typical, and these events then dominate the average time.

p14.05.00.

blue dot

The calorimeter "energy sharing" problem that occurred during the first few months of 2003 (see Nirmalya's ADM talk or Greg Landsberg's summary) resulted in approximately 40 pb^-1 of data that was compromised for physics analyses. A solution (description of the solution, a before plot, and an after plot) was implemented and can be used to reprocess the RAW or DST data tiers.

p14.03.00

blue dot

RECO startup time is extremely variable when processing raw data. This was due to limitations in caching within the calibration database servers. The Python version of the cache could store only about 6 individual calibration constant sets, so requests for more than this number of sets resulted in "cache thrashing" on the user db servers. Users attempting to (re)process events from a wide range of runs experienced even worse performance. Farm production was less affected. See below for more details. The caching code has been rewritten in C++, and currently allows 30 SMT data sets to be held in memory at one time.

New SMT C++ cache code deployed.

blue dot

CPU time per event grows linearly with the number of events processed. See Slava Kulik's ADM talk for a description of the solution.

p14.03.00

blue dot

RECO requires more memory than p13.

Improved in p14.02.00 and p14.03.00.
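The "cache thrashing" described in the RECO startup time entry above can be illustrated with a toy model. The sketch below assumes a least-recently-used (LRU) replacement policy, which is an assumption made for illustration since the actual policy of the calibration database server cache is not described on this page. `LruCache` and `countHits` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy LRU cache of calibration-set IDs (assumed policy, see lead-in).
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns true on a hit; on a miss the set is loaded into the cache,
    // evicting the least recently used set if the cache is full.
    bool access(int setId) {
        auto it = index_.find(setId);
        if (it != index_.end()) {
            // Move the entry to the front to mark it most recently used.
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(setId);
        index_[setId] = order_.begin();
        return false;
    }

private:
    std::size_t capacity_;
    std::list<int> order_;  // front = most recently used
    std::unordered_map<int, std::list<int>::iterator> index_;
};

// Request nSets distinct calibration sets in round-robin order for
// nPasses passes and count how many requests are served from the cache.
std::size_t countHits(std::size_t capacity, int nSets, int nPasses) {
    LruCache cache(capacity);
    std::size_t hits = 0;
    for (int pass = 0; pass < nPasses; ++pass) {
        for (int id = 0; id < nSets; ++id) {
            if (cache.access(id)) {
                ++hits;
            }
        }
    }
    return hits;
}
```

In this model, cycling through 7 sets with a capacity of 6 never produces a hit, because each request evicts the set that will be needed next. With a capacity of 30 every request after the first pass is a hit. The figure of 7 requested sets is purely illustrative.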