Examining D0 Run I with
20/20 Hindsight
March 26, 1998, New Phenomena Workshop
Lee Lueking
Mission
Identify major areas from our Run I experience where careful
steering now will save work and confusion in Run II.
Workshop Goal
Discuss and document some of the major areas of Run I
weakness. Propose solutions to issues not already being addressed.
Major Topics for Discussion
- Triggers and Backgrounds
- Trigger Simulation
- Event & Detector Simulation
- Reco and Reconstruction
- Data Access, Cataloging and Managing
- Analysis Techniques
- Computers
- Sociology
Enlightenment
From "The Cathedral and the Bazaar" by Eric
S. Raymond
12. Often, the most striking and innovative solutions come
from realizing that your concept of the problem was wrong.
13. Perfection (in design) is achieved not when there is
nothing more to add, but rather when there is nothing more to take away.
Antoine de Saint-Exupery
14. Any tool should be useful in the expected way, but
a *great* tool lends itself to uses you never expected.
18. To solve an interesting problem, start by finding
a problem that is interesting to you.
Specific Issues
Thanks to Sarah, Wyatt, Doug, and Amber for their many
comments.
Triggers and Backgrounds
- There is a post mortem of the Run I trigger system at
http://www-d0.fnal.gov/hardware/upgrade/l2workshop/l2l3_workshop.html
- The trigger board.
- What they were, how they were established, how they were
tracked.
- Trigger configuration files
- Our lousy wino/zino trigger.
- We MUST trigger on low pt leptons next run.
- The trigger list was not part of the data and it is difficult
to determine what triggers were used. People were at the mercy of the trigger meister.
- List of generic triggers. Do we want zoo events?
- In Run I we did not download triggers within the run. Could
not download triggers to get EM reference sets. If this had been possible,
we might have been able to trigger on low Et electrons.
- Varying trigger lists were a big problem.
- Bookkeeping - what triggers and what MC for each of triggers
is difficult
- Low pt electron triggers were not optimized. Adam Lyon's
analysis was limited by AVAILABLE BACKGROUND SAMPLES. This is horrible!
He had to require the leading jet to have Et > 115 GeV because of this.
He would never have made this cut otherwise.
- Very hard to get NP people to decide on background samples.
Need to make an inventory before the run begins.
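One way to address the bookkeeping complaints above (trigger list not stored with the data, varying trigger lists) is to keep the trigger list with each run record so it can be queried later. The following is only a minimal sketch: `RunRecord`, `runs_with_trigger`, the version strings, and the trigger names are all hypothetical illustrations, not D0 software or real D0 triggers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical per-run record that carries its own trigger list."""
    run_number: int
    trigger_list_version: str
    triggers: tuple  # names of the triggers active during this run


def runs_with_trigger(records, trigger_name):
    """Return the run numbers in which the given trigger was active."""
    return [r.run_number for r in records if trigger_name in r.triggers]


# Invented example data: two runs taken with different trigger lists.
records = [
    RunRecord(1001, "v1.0", ("ELE_HIGH", "JET_MIN")),
    RunRecord(1002, "v1.1", ("ELE_HIGH", "MU_LOW")),
]

print(runs_with_trigger(records, "MU_LOW"))
```

With the list stored alongside the run, finding which runs used a given trigger becomes a lookup rather than a question for the trigger meister.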
Discussion
Sarah Eno: Divide up bandwidth by physics group. Some groups
did not need all the trigger bandwidth they had, maybe like the top group,
and they filled up the bandwidth with weird triggers that they did not
really need.
John: I remember
Wyatt:
Sarah: I don't know if this was such a good idea Lee.
Wyatt Merritt: I cannot imagine any system other than the
spokesman making that system work.
Dave: They asked for input and then
Wyatt: The final decision came from the spokes
John Hobbs: Similar final states could share a trigger.
The top group had a slightly different W trigger than the electroweak group.
You might imagine trying to allocate di
Wyatt: I think toward the end there was some wheeling
and dealing along those lines, but one thing that limited it was the unavailability
of tools to determine the overlaps. It was recognized that a great
deal of our triggers were used by other groups. I think the system worked
fairly well but just needed better tools.
Let me just remind you, who could forget this (shows
overhead of the trigger list): all of the bandwidths and allocations
were put in by the trigger board.
John: The other thing to remember is that there is the
trigger meister, who was crucial in this issue, and there was a three-way
between the physics groups, the trigger meister, and the trigger board. Various
trigger meisters had a lot of influence. That one person had a lot of pressure
and also had a lot of influence, depending on whether they were your guy or
the other guy. Having better tools would help a lot, better simulation
tools, so that when they came and said this is what the rate would be, they could
accomplish that.
Gordon Watts:
John: I think that later on
Gordon: When you say later
JW: I think that something we really did screw up was
that we didn't really know what to expect for a given trigger, so we started
out with loose triggers and as the luminosity increased we had to change
the triggers. The trigger meister was always running around like a headless
chicken saying "The rate is going up." "We have to change
the triggers NOW!"
SE: I am sad that there was no W trigger in run 1C
In the monopole
JW: I have a comment on this low pt electron thing. There
are other cases like that, where we only figured out after too long that we were
screwing up. We were taking data and
WM: Are you suggesting that this was a lag in Reco?
Doug: The thresholds were up too high and we did not have
certain resonances
Lee:
Gordon:
John: If I claim that I have a trigger that
is going to work for a certain channel, I'd like to look at the first 1000
events to see if it is working.
Gordon:
JW: The inability to figure out you are screwing up while
you still have a chance of fixing it.
GW: That second one is a really thorny question, but
isn't that first one
John: But look, maybe our histograms look like shit and
you need to go off for 3 weeks and figure out what was wrong. The problem
was that some triggers were not looked at for a year after they were taken.
Kaushik De:
Greg Landsberg:
John:
LL: Background samples. Some background samples were
still missing when we finished running. This is something
we need to make a careful inventory of now. There is no better time to
make a list.
Sarah
John: If you're lazy like this
Harry: The other strategy is to take a number of
background triggers and change one thing at a time
Wyatt
Greg: We have to accept very bad
John:
Greg: For every signal trigger we can come up with a background
trigger.
John: This was another example of where we were not
looking at the data until a year after the data was taken.
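The discussion above repeatedly points to missing tools for determining overlaps between triggers shared by different physics groups. As a rough illustration of what such a tool could compute, here is a minimal sketch that counts how often each trigger and each pair of triggers fired. The function names and the trigger names (`TOP_W`, `EW_W`) are hypothetical, and the event representation (a set of fired trigger names per event) is an assumption made for the example.

```python
from collections import Counter
from itertools import combinations


def trigger_overlaps(events):
    """Count events per trigger and per pair of triggers.

    `events` is an iterable of sets, each holding the names of the
    triggers that fired for one event (an assumed representation).
    """
    singles = Counter()
    pairs = Counter()
    for fired in events:
        singles.update(fired)
        # Sort so each pair is always stored under the same key.
        pairs.update(combinations(sorted(fired), 2))
    return singles, pairs


def overlap_fraction(singles, pairs, a, b):
    """Fraction of events firing trigger `a` that also fired trigger `b`."""
    key = tuple(sorted((a, b)))
    return pairs[key] / singles[a] if singles[a] else 0.0


# Invented example: four events and two hypothetical W triggers.
events = [{"TOP_W", "EW_W"}, {"TOP_W"}, {"EW_W", "TOP_W"}, {"EW_W"}]
singles, pairs = trigger_overlaps(events)
print(overlap_fraction(singles, pairs, "TOP_W", "EW_W"))
```

A large overlap fraction between two groups' triggers would suggest they could share one trigger and free bandwidth.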
Trigger Simulation
- Building in bugs, finding bugs.
- Running
- Maintaining
- Problems of trigger simulator
- Awkward to use.
- Output is not user friendly.
- Lots of information is not useful and can be confusing.
- What would people like? Should it be part of reco? Mike
Fortner has some ideas.
- What kind of info would be useful? ntuple? Linking to
user's program?
- The trigger simulator is much smaller than reco, maybe
only 10%. For L3, the simulation used the same code as was
downloaded. For L2 and L1 the code was emulated or simulated.
- The simulator had many purposes but only one interface.
This was a problem.
- Could not make the trigger simulator run all triggers
simultaneously.
- Needs more flexibility to turn off machine-dependent
code.
- At the moment no one is assigned to work on the overall trigger
simulator.
- Lots of pilot error.
Discussion
Lee: Trigger simulation seemed to be a very difficult thing.
This i
Sarah: I think it would be
Gordon: We don't want to run the trigger simulation for
all the data, just on the MC
Sarah: Actually, it might be a good idea to run the simulator
on the data for each event.
John:
Marc: That's an extraordinary inconvenience, that while I
hope ... it is something that the simulator can deal with
John: made it hard to run offline as a trigger simulator
for all events.
Harry: it is central to all analysis
Gordon: There was at one point an idea to have two kinds
of trigger simulators
Lee: OK, could not make the trigger simulator run all
the triggers simultaneously,
John: A feature not a bug
Lee:
Sarah: How many people are currently working on the trigger
simulator?
Lee: It is on the list: No one is currently assigned.
Marc:
John: The whole question of ... is a horrible one and
... Should not rely on the simulation to get everything right.
Event & Detector Simulation
- Stability of the code, geometry.
- Getting needed CPU, coordinating offline farms etc.
- Levels of simulation snail/lightning.
- Really nasty details like "the noisy package"
and "musmear".
- Luminosity dependent effects.
- Adding underlying events.
- We should have a fast Monte Carlo available at the beginning
of the run.
- MC (recoing) on farm was difficult to use. Usually had
wrong stp and rcp info.
- Some people did all this themselves to make sure everything
was correct.
- Long and difficult chain to do simulations: isajet, geant/shower
lib, musmear, noisy, overlay, triggersim, reconstruction.
- Need proper muon smearing and detector studies to track
muon smearing parameters.
Reco and Reconstruction
- Reco code, development, evolution, releases, testing.
- Why did it take MONTHS to get a little fix into production?
- Moving into production was extremely difficult due to
the complexity of the farm.
- Farm operation was complex and difficult to track.
- Data processing priorities were difficult to establish.
- Fixed data still not usable.
- Fixing things - need ways to redo parts of the reconstruction,
for example need to keep strip info for certain busy regions in the detector
where there are many hits and possible to mess up tracking. Need to keep
some strip info to go back and redo the tracking.
- Should be possible to do full tracking more easily, after
all we have a whole new central detector.
- Calibration constants - Establishing them for reco. Why
did we have to wait 2 weeks for muon constants?
Data Access, Cataloging and Managing
- Data size
- Did we need every byte we saved? Was our data good to
the last byte?
- What was missing from the uDST?
- What additional information and functionality were needed.
- How could the DST and STA be made more useful?
- Missing things in early data. Could not go back to Run 1a,
for example could not calculate MTC without going back to STAs. Forced
to combine analyses together in an imperfect way.
- Data access was difficult at the beginning because there
were no uDSTs.
- D0DAD had real benefit when it was correct.
- FATMEN access was bad.
- What's so bad about FATMEN?
- It's hierarchical and difficult to organize and use
- Real bad interface and difficult to write utilities for
- RZ corruption nightmares.
- Not built for 1.5 E6 entries
- Machinery behind it was flaky
- LOTS of 8mm tape failures.
- Very labor intensive system.
- Post-reco streaming was a nightmare to do and a disaster
to use.
- Access to streams other than uDST was difficult. Maybe
the 1b stop analysis would be done by now if it were easier to access this
stream?
Analysis Techniques
- Frameworks. Why so many?
- cafix, mufix ...d0fix.
- Columnwise ntuples, PAW/PIAF.
- Combining triggers in analysis.
- Particle ID blues - H-matrix limitations.
- b's and tau's. CDF really killed us here!
- Cut optimization. The leptoquark analysis gained a lot
by proper use of cut optimization. Other analyses should try this.
- Don't make cuts too tight. The leptoquark analysis
gained A FACTOR OF 2 by only requiring 1 leg to have a track. Nirmalya
got rid of almost 1/2 his original cuts, without any appreciable gain in
background.
- Don't be afraid to go to lower levels of data. Jianming
gained MORE THAN A FACTOR OF 2 over Womersley because he was willing to go
to the non-uDST (and could therefore run hitsinfo). Bryan Lauer got a similar
effect when he was talked into doing this.
- Don't be afraid of muons.
- Next time, we should all use the same lepton ID, to minimize
duplication of effort (evaluating efficiencies, fake rates).
- Why isn't the Run 1b stop analysis done yet?
- Unbiased J/psi samples would make it easier to do electron
ID and energy calibration. Of course we will have e/p too. Could do a lot
better than 54% efficiency at the high end of the pt spectrum.
- Maybe use a neural net or other multivariate approach.
- Need NP presence in the ID groups.
- Muons-
- Assume efficiency will be the same as for electrons.
- Tracking and momentum measurements were bad, efficiency
was low.
- Coverage not even in eta.
- Efficiency varied over the run.
- ID tools were not there and changed dramatically over
the course of the run.
- Most of the mu effort was concentrated in the b group.
- SUSY concentrated in jets. No one in SUSY thinking about
muons.
- Electron ID tracking efficiency. Need better rejection
for fakes and other backgrounds.
- Need soft lepton tagging.
- Taus.
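The cut optimization point in the list above can be illustrated with a simple threshold scan. This is only a sketch under stated assumptions: the figure of merit S / sqrt(S + B) is a common choice picked here for illustration, not necessarily what the leptoquark analysis used, and `best_cut` and the sample values are hypothetical.

```python
import math


def best_cut(signal, background, thresholds):
    """Scan 'value > threshold' cuts and return (threshold, figure of merit).

    The figure of merit is S / sqrt(S + B), where S and B are the numbers
    of signal and background entries passing the cut (an assumed choice).
    """
    best = (None, 0.0)
    for t in thresholds:
        s = sum(1 for x in signal if x > t)
        b = sum(1 for x in background if x > t)
        if s + b == 0:
            continue  # nothing passes, so the figure of merit is undefined
        fom = s / math.sqrt(s + b)
        if fom > best[1]:
            best = (t, fom)
    return best


# Invented toy values of one kinematic variable (e.g. an Et in GeV).
signal = [50, 60, 70, 80]
background = [10, 20, 30, 55]
threshold, fom = best_cut(signal, background, thresholds=[0, 40, 58])
print(threshold, round(fom, 3))
```

In this toy example the tightest cut removes all background but is not the best choice, because it also throws away signal, which is the same lesson as the "don't make cuts too tight" item.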
Computers
- Disk. Why did it take so long for the computing division
to give us the disks for the d0fix data? Rumor had it that there were tons
of disks sitting on the shelf, but some bureaucracy prevented them from
giving them to us.
- CPU was an issue at first. Not a problem later on.
- Allocating computing resources was a pain, especially
disk space. Computing Priority Policy Board.
- PAW/PIAF were avoided because they were too difficult to use on
a rapid turnaround cycle.
- PAW is flaky for large ntuples.
Sociology
- Publish. Why isn't our Higgs search published?
- WRITE YOUR PRL FIRST, THEN LOOK FOR JOBS AND WRITE YOUR
THESIS!!!!
- Don't work by yourself. Where did we get results out
fast? Where we had a strong team working together (leptoquarks, Jianming +
Sailesh). One person working by themselves often gets beaten by CDF!
- There was a lack of clear documentation about what others
have done. Because of this it is hard to reproduce other people's results and
hard to figure out what was done.
- Should focus on common cuts made by all groups.
- Perhaps need a standardized form so there is some consistency
among reports and notes about various studies.
- Many GS feel what they know is not real important, and
PD think they do not have time to write up what they have done.
- There was some territorialism which precluded the use
of some people's work.
- Distribute responsibility to better use manpower.
Lee Lueking
Last modified: Mon Mar 23 13:45:42 CST 1998