Running p10.07.01
Chip Brock on shift.
Friday/Saturday/Sunday - test p10.07.00, find a problem
Monday - install p10.07.01
Tuesday - start production.
Problems:
sam timeouts, probably due to a rogue query. They seemed to clear up
by Friday. These forced Chip to resubmit jobs 4 times to get
completion. It is now hard to convince him that any job has actually
completed properly.
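The resubmission cycle could be made less manual with a small retry wrapper that also records every failed attempt, so there is a trail showing what a "completed" job went through. This is only a sketch: submit is a hypothetical callable standing in for a job submission, and the assumption that a sam timeout shows up as a Python TimeoutError is ours, not something the farm scripts are known to do.

```python
import time


def run_with_retries(submit, max_attempts=4, base_delay=0.0):
    """Call submit() until it succeeds or max_attempts is used up.

    submit is a hypothetical callable standing in for a farm job
    submission: it raises TimeoutError on a database timeout and
    returns a result otherwise. Returns (result, failures) so the
    operator can see how many tries a completed job really took.
    """
    failures = []
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(), failures
        except TimeoutError as err:
            failures.append((attempt, str(err)))
            if attempt < max_attempts:
                # Wait base_delay, then 2x, 4x, ... before the next try.
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts: {failures}")
```

The default of 4 attempts just mirrors the number of resubmissions seen this week. The returned failure list is the part that matters for deciding whether a job really finished cleanly.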
Disk /stripe7/ fills, as that is where we write root files.
Root-tuples are bigger than in the past. We are also keeping
2 copies on that disk because of worries about file stores.
Root files are now going to /stripe9/.
Problems with the new /stripe10/, induced by an attempt to write root
files there. The RAID was not set up right and hangs d0bbin, so we
stopped using it. The system people made a fix, not tested yet.
Special projects that overlap with shiftsets can cause overlap of merged
root-tuples. The code needed to prevent this conflicts with the code
needed to store the file. We hack around it by wrapping python in csh
and doing explicit version setups. About 5000 events are duplicates and
several files are in a strange sam state: usable, but tape location not
known to the database. Need to do a full audit for overlaps. The
difference between reco files which think they have had root done to
them and root files could just be bad tapes (reco files aren't counted
if the tape is noaccess) or real problems with root files being stored
twice.
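One way the overlap audit could be approached is to index every event by the files that contain it and report any event seen in more than one merged root-tuple. The sketch below assumes the (run, event) pairs per file have already been extracted into a dictionary. That input format and the file names in the example are made up for illustration; nothing here talks to sam.

```python
from collections import Counter, defaultdict
from itertools import combinations


def find_duplicate_events(files):
    """Return {(run, event): [file names]} for events in more than one file.

    files is a hypothetical mapping of file name -> iterable of
    (run, event) tuples, prepared by some earlier extraction step.
    """
    seen = defaultdict(set)
    for name, events in files.items():
        for key in events:
            seen[key].add(name)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}


def overlapping_pairs(duplicates):
    """Count shared events for each pair of files that overlap."""
    pairs = Counter()
    for names in duplicates.values():
        for pair in combinations(names, 2):
            pairs[pair] += 1
    return pairs


# Made-up example: two merged files sharing one event.
example = {
    "merged_a.root": [(1, 10), (1, 11), (1, 12)],
    "merged_b.root": [(1, 12), (1, 13)],
    "merged_c.root": [(2, 1)],
}
duplicates = find_duplicate_events(example)
pair_counts = overlapping_pairs(duplicates)
```

The pair counts point at which merged files to inspect first. A file that overlaps completely with another is a candidate for having been stored twice, while a partial overlap looks more like the special project and shiftset collision.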
But... in the end: the farm ran with fair efficiency and processed
over 1M events through p10.07.01.
Raw data produced between 10/13/2001 and 10/21/2001
File Count: 2
Average File Size: 4328
Total File Size: 8656
Total Event Count: 63
Raw data between 10/13/2001 and 10/21/2001 which had reco version p10.07.01 run on it
File Count: 0
Average File Size: 0
Total File Size: 0
Total Event Count: 0
Raw data Reco output files for version p10.07.01 produced between 10/13/2001 and 10/21/2001
File Count: 900
Average File Size: 333532
Total File Size: 300179494
Total Event Count: 1132308
Reco data between 10/13/2001 and 10/21/2001 which had recoA version p10.07.01 run on it
File Count: 504
Average File Size: 342449
Total File Size: 172594765
Total Event Count: 630221
Raw Data Root files for version p10.07.01 produced between 10/13/2001 and 10/21/2001
File Count: 85
Average File Size: 224413
Total File Size: 19075150
Total Event Count: 713297
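As a consistency check on the summaries above, the quoted averages match the total file size divided by the file count, truncated to an integer. The short sketch below reproduces that arithmetic from the reported numbers. The dictionary keys are our own shorthand for the blocks, and the size units are not stated in the report, so none are assumed here.

```python
def average_size(file_count, total_size):
    """Truncated average file size; 0 when there are no files."""
    if file_count == 0:
        return 0
    return total_size // file_count


# (file count, total file size, reported average), copied from the report.
report = {
    "raw data": (2, 8656, 4328),
    "raw with reco run on it": (0, 0, 0),
    "reco output": (900, 300179494, 333532),
    "reco with recoA run on it": (504, 172594765, 342449),
    "root files": (85, 19075150, 224413),
}

# Collect any block whose reported average disagrees with the recomputed one.
mismatches = {
    name: (reported, average_size(count, total))
    for name, (count, total, reported) in report.items()
    if average_size(count, total) != reported
}
```

An empty mismatches dictionary means every reported average agrees with its total and count, which is the case for the five blocks above.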
Plan for next week.
Chip gets to do the rest of p10.07.01
Try to get automatic submission in - Maciej
Clean up the problems with sam_user versions - Heidi and Sam team
Keep working on getting the station going. The current station code
will let a CPU sit idle rather than move a file from another node, so
we can't use it in disk cache mode, although we can probably do so in
hard-wired tape-only mode. Igor Terekhov is looking into a solution.
Mike D. is working on getting calibration from DB onto the farms.