Resumed skimming last Friday night (no ICD) after consulting with Jan.

3rd batch of 300 jobs
    30 hours
    19 of 30 rund0exe jobs failed
    these 19, when resubmitted: 17 were processed OK (took 26 hours) !!!

4th batch of 300 jobs
    50 hours
    15 of 30 rund0exe jobs failed
    following Heidi's thoughts, I undeclared the problematic files again, put them all with their metadata in the Wp20-post_store folder and submitted my standard script. So far I see only *good* messages "file ... not found", i.e. not found in the sam database, so they can go to sam now.

Monday evening I see some very short jobs (minutes long, with one read input each, at least in the one I checked); they all exited with code 0.

----
before 032309

name     run_min  run_max  comment

PASS2 (part I of RunIIb)
v15a     221694   224486   startup without 11 ticks in SMT (15.03)
v15b     224487   226163   last 11 ticks added (15.13)
v15c     226996   229706   major trigger list upgrade (Oct 06 shutdown) (15.17-20)
v15d     229707   230270   CFT thresholds set to 1 PE (15.27)
v15e     230271   231969   CFT thresholds reset, new trigger list (15.50)
v15f     231970   233118   New triggers (15.60)
v15g     233119   234102   L1Cal cabling problem (15.64)
v15h-j   234103   234846   RunIIb part I ends

PASS4 (part II of RunIIb)
v15m     237278   237801   runIIb part II starts
v15n     237802   238365   200 micron + in x
v15o     238366   239216   z0 change
v15p     239217   239889   betay goes from 38 to 32!
v15q     239890   240767   +100 in x
v16a     240768   241889   v16 starts
v16b     241890   245473   moriond sample

make EMinclusive datasets for each epoch: "Heidi's datasets"
split each of "Heidi's datasets" into snapshots of 100 files: "Sahal's snapshots"
make skimming setup based on the run2a setup from Jan
    /prj_root/3003/wz2_write/stark/SkimDevel/Play2/np_tmb_stream/
include run2b EMReco postprocessing with the run2b W mass calibration
make three skims: 2EM, EMMET, EMJET

Skimmed tmb files are not EMreco post-processed (post-processing is done only to decide on the proper event selection when skimming). Post-processing is then applied when making caf trees out of the skimmed tmb files.

Sahal made lots of studies comparing skimming code performance with wmass_analysis, checking how MET is calculated in both, when revertexing is done, etc. More info in Sahal's talks at the Run2B meetings of the last couple of weeks. We are certain that skimming does not discard events we are interested in.

One issue related to EM cell info still needs to be resolved. At the moment it causes only very rare crashes of skimming jobs (and nothing else ?)

Qizhong asked us to submit a maximum of 300 cab jobs to cabsrv1 (only one w mass analyser).
One batch of 300 jobs processes 30 of Sahal's snapshots: one snapshot per rund0exe job, each split into 10 cab jobs.

average size of output files:
    2EM:   184.65MB
    EMMET: 657.61MB
    EMJET: 1.12GB

Output of one batch of 300 jobs occupies 700-750GB of disk space.
We're using two disks, each with this amount of free space:
    /prj_root/5670/wmass1
    /prj_root/2646/wmass2

300 job batch finishes in (walltime)
    -- 14 hours
    -- 25 hours

then we check the output:

******************************************************
processing jobs with timestamp = _Sun-Mar-15--19-13-52 (number of folders = 30)

Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007013_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007014_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 21G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007015_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 24G .
    2EM, EMMET, EMJET: 10, 10, 10
FAILED.Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007016_Sun-Mar-15--19-13-52
    input: closed: 98, open: Status: open; output: size: 19G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007017_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 20G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007018_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007019_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007020_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007021_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 20G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007022_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 21G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007023_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 20G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007024_Sun-Mar-15--19-13-52
    input: closed: 39, open: ; output: size: 9.2G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007001_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 23G .
    2EM, EMMET, EMJET: 10, 10, 10
FAILED.Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007002_Sun-Mar-15--19-13-52
    input: closed: 98, open: Status: open; output: size: 20G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007003_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 21G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007004_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 21G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007005_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 21G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007006_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 10
FAILED.Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007007_Sun-Mar-15--19-13-52
    input: closed: 98, open: Status: open; output: size: 18G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007008_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007009_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 21G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007010_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007011_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 11
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007012_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 18G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007013_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 20G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007014_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 19G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007015_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 18G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007016_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 20G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007017_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 21G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007018_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 19G .
    2EM, EMMET, EMJET: 10, 10, 10

**************************************************************************

then we patch the output / modify output filenames and metadata so that the name of each output file contains its parent snapshot.

then skimmed output is sent to sam
for 870 files to go to sam it takes
    -- 36 hours
    -- ??? hours

Failures:

1st batch of 300 jobs: one failure:
    exit 134           due to seg violation in EMMETTAG

2nd batch of 300 jobs: 4 failures:
    exit 134  -60143   checksum error
    exit 134  -59988   expected length, got length...
    exit 134  -60222   EMMETTAG seg violation
    exit 134  -60238   EMMETTAG seg violation

3rd batch of 300 jobs: ???
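The "then we check the output" step above could be scripted roughly as follows. This is only a sketch, not the script actually used: the function names parse_summary and needs_attention, and the expected values (100 closed inputs per snapshot, 10 files per skim) are assumptions read off the summary listing. A short last snapshot (like the one with 39 closed inputs) would be flagged too and needs a look by eye.

```python
import re

# One record of the check summary: folder name, closed-input count,
# open status, output size, and the number of files in each skim.
# The pattern is an assumption based on the listing in these notes.
RECORD = re.compile(
    r"(?P<folder>\S+)\s+"
    r"input:\s*closed:\s*(?P<closed>\d+),\s*open:\s*(?P<open>[^;]*);\s*"
    r"output:\s*size:\s*(?P<size>\S+)\s*\.\s*"
    r"2EM, EMMET, EMJET:\s*(?P<n2em>\d+),\s*(?P<nemmet>\d+),\s*(?P<nemjet>\d+)"
)


def parse_summary(text):
    """Return one dict per folder found in the summary text."""
    records = []
    for m in RECORD.finditer(text):
        records.append({
            "folder": m["folder"],
            "closed": int(m["closed"]),
            "open": m["open"].strip(),
            "size": m["size"],
            "counts": (int(m["n2em"]), int(m["nemmet"]), int(m["nemjet"])),
        })
    return records


def needs_attention(rec, files_per_snapshot=100, files_per_skim=10):
    """List the reasons a folder should be looked at by hand (empty if none)."""
    reasons = []
    if rec["folder"].startswith("FAILED."):
        reasons.append("marked FAILED")
    if rec["open"]:
        reasons.append("input still open")
    if rec["closed"] != files_per_snapshot:
        reasons.append(f"closed {rec['closed']} of {files_per_snapshot} inputs")
    if any(n != files_per_skim for n in rec["counts"]):
        reasons.append(f"skim file counts {rec['counts']}")
    return reasons


# Three records copied from the summary above.
SAMPLE = """
Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007015_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 24G .
    2EM, EMMET, EMJET: 10, 10, 10
FAILED.Wp20-v15b-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007016_Sun-Mar-15--19-13-52
    input: closed: 98, open: Status: open; output: size: 19G .
    2EM, EMMET, EMJET: 10, 10, 10
Wp20-v15c-CSskim-EMinclusive-PASS2-p21.03.00-allfix2007011_Sun-Mar-15--19-13-52
    input: closed: 100, open: ; output: size: 22G .
    2EM, EMMET, EMJET: 10, 10, 11
"""

for rec in parse_summary(SAMPLE):
    reasons = needs_attention(rec)
    if reasons:
        print(rec["folder"], "->", "; ".join(reasons))
```

The pattern only relies on whitespace between fields, so it should accept the summary either as one long line or with one field per line.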