Reprocessing: Meeting of 12-September-2005: 9:30-10:30CDT ESNet video conference
Our meeting number on the ESNet is 823073776 (82d0repro).
Instructions to dial into a video conference via phone.
Agenda
- News
- New JIM release cut (September release cut; coexistence with MC, bug fixes, new KCA fingerprint)
- Status of production
- MC Issues
- AOB.
Minutes
Participants
Patrice Lebrun,
Tibor Kurca (Lyon),
Joel Snow (Oklahoma),
Joe Steele,
Gavin Davies,
Parag Mhashilkar,
Gabriele Garzoglio,
Mike Diesburg (FNAL),
Daniel Wicke (Wuppertal)
Topics
- News/Misc.
- We need to find a new meeting day/time (the top group now starts at 9am). Move to Tuesday for the Collaboration week.
- Mike reran the accounting; no chunks big enough to be reassigned remain. Around 30-40M events from v12 are missing.
- All datasets are now assigned.
- New JIM release cut (September release cut; coexistence with MC, bug fixes, new KCA fingerprint)
- Joel reports authentication problems after upgrading CMS-Farm to the September release cut.
We should compare with Yann's authentication problem and post the solution to the list. Oscer was upgraded without problems.
- Release candidate will be declared current after the meeting. [DW: done]
- Status of production
- D0Farm: Down. SAM does not deliver files.
- WestGrid: --
- Lyon: Suffering from SAM problems. Number of CPUs reduced after the holiday season and due to MC production.
- DØSAR-Oscer, CMS-Farm, Wisconsin: Joel is handing production over to his student (awaiting the grid certificate).
- DØSAR-UTA: --
- DØSAR-Sprace: --
- Prague: Vlastislav by email: Out of datasets, will get reassigned sets from Joel.
- Imperial College: Frederick now resuming operations.
- Manchester: --
- GridKa: Joerg by email: Only local problems. Nothing that needs discussion.
- MC Issues
- We need to complete the web-page that helps operators run MC requests.
Joel will link more information/documentation into the corresponding web-page.
- In the future we need to develop tools similar to d0repro to automate the necessary steps.
- New zero bias files were sent around; Joel will fix the web-page. Sites should upgrade asap,
but running requests should be restarted.
- AOB:
- The current estimate for the completion of bulk reprocessing is mid-October.
By that time all sites should have processed their assigned datasets.
We envision a cleanup period of 2 weeks in which sites should
try to recover from recent failures and complete the merging of their production.
After those two weeks all remote reprocessing should stop and we will start a central recovery phase.
The exact dates will have to be defined.
- Bar chart on Web is broken...
- Tibor has a dataset that claims to be complete with 19 files processed,
but the partition numbers go above 30. This may be caused by an inaccessible tape.
The dataset should be retried later. It is a known bug of d0repro to falsely mark such datasets as completed.
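A minimal sketch of a sanity check that would flag a dataset like Tibor's. It is not part of d0repro. The function names are hypothetical, and it assumes partitions are numbered consecutively from 1 and that the partition number of each processed file is already known:

```python
def missing_partitions(processed_partitions):
    """Return the partition numbers absent from 1..max(processed_partitions).

    Assumption (not confirmed by d0repro): partitions are numbered
    consecutively starting at 1.
    """
    seen = set(processed_partitions)
    if not seen:
        return []
    return [p for p in range(1, max(seen) + 1) if p not in seen]


def looks_complete(processed_partitions):
    """True only if at least one partition was processed and none is missing
    below the highest partition number seen."""
    return bool(processed_partitions) and not missing_partitions(processed_partitions)


# Illustrative data only: 19 processed files whose partition numbers reach 31.
example = list(range(1, 13)) + list(range(25, 32))
gaps = missing_partitions(example)
```

Such a check only detects gaps below the highest partition number seen. Missing partitions at the end of a dataset would require the expected total from another source.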
Next Meeting
26-Sep-2005 or 27-Sep-2005
Mike Diesburg, Daniel Wicke, 7-Sep-2005. Last Change 12-Sep-2005.
Diesburg@fnal.gov,
Wicke@fnal.gov