Reprocessing: Meeting of 10-Oct-2005: 9:30-10:30CDT ESNet video conference
Our meeting number on the ESNet is 823073776 (82d0repro).
Instructions to dial into a video conference via phone.
Agenda
- News
- New JIM release cut (October release cut; fixed installation scripts, fixes to improve job throttling)
- Status of reprocessing production
- MC Issues
- AOB.
Minutes
Participants
Frederic Villeneuve-Seguier,
Gavin Davies (London),
Jae Yu (),
Joel Snow (Oklahoma),
Tibor Kurca (Lyon),
Joe Steele,
Joerg Meyer,
Parag Mhashilkar,
Gabriele Garzoglio,
Mike Diesburg (FNAL),
Daniel Wicke (Wuppertal)
Topics
- News
- D0Farm is now 100% fixing up until p17.08 is available. From then on, only one stream of data until fixing is done.
That will probably be some day in November.
- Accounting not yet rerun. Still some discrepancy between unmerged and merged TMBs in SAM
(possibly due to 380 unmerged files at WestGrid).
Sites should give back datasets that aren't going to be processed.
- Status of production
- D0Farm: 100% fixing at the moment.
- WestGrid: Stopped reprocessing. 380 files to be merged. Some Manchester daysets to be recovered (statement corrected after email by Dugan. DW).
- Lyon: 20 daysets not started yet; 10 daysets to be recovered. Transfer from FNAL extremely slow since last Tuesday.
- DØSAR-Oscer, CMS-Farm, Wisconsin: Oscer: 2 datasets to recover, CMS:
16 daysets started but unfinished (currently site is broken after system upgrade), 14 at Wisconsin.
Joel may run jobs at Oscer even if they were started elsewhere.
CMS has the fixing of the samgrid-job manager high on their priority list (GG will follow up with CMS).
- DØSAR-UTA: Done with assigned dataset.
- DØSAR-Sprace: Done with assigned dataset (?)
- Prague: Done with bulk processing; about 20 daysets to be recovered.
- Imperial College: 3 daysets unstarted, 10 daysets to be recovered, 4 being merged.
- GridKa: 30 daysets left unstarted. Some recovery jobs. A big chunk of files to be merged.
SAM problems should be reported to the helpdesk, CC Adam Lyon and Gavin Davies.
- New JIM release cut (October release cut)
- Request to allow merging of files stemming from different runs in order for cleanup to be feasible.
Daniel will do integration tests at d0farm (with only a few CPUs).
- MC recovery mechanisms are to be implemented. Joel will do integration testing at Oscer.
Priorities are to be discussed offline.
- October release cut will only contain minor changes. It's mainly meant for new sites.
- MC Issues
- Joel is working on a script to store card files to SAM. Investigating whether clued0 is appropriate.
- Scripts should also ensure that a single request can't be picked up by 2 sites.
- Design of recovery of phased datasets provided by the SAMGrid team.
- Cano again had a malformed request. Such requests should be terminated by the farmer,
and the requester should be informed about the problem.
- AOB:
Next Meeting
Next Meeting 17-Oct-2005 (MC only)
Next Meeting 24-Oct-2005 (Combined Reprocessing MC meeting)
Meeting might be moved to Wednesdays
Mike Diesburg, Daniel Wicke, 4-Oct-2005. Last Change 11-Oct-2005.
Diesburg@fnal.gov,
Wicke@fnal.gov