Reprocessing: Meeting of 26-Jan-2005: 9:00-10:00 ESNet video conference
Our meeting number on the ESNet is 823073776 (82d0repro).
Instructions to dial into a video conference via phone.
Agenda
- News
- Status of Implementation of Reprocessing in JIM
- JIM Deployment and Remote Setup
- Certification of Sites
- AOB.
Minutes
Participants
Tibor Kurca, Patrice Lebrun (Lyon), Dugan O'Neil, Yann Cadou (Vancouver), Jae Yu (SAR), Joe Steele, Laurent Duflot, Gabriele Garzoglio, Mike Diesburg, Daniel Wicke (FNAL)
Topics
- News
Reprocessing will continue as planned (see Terry and Jerry's mail).
p17.03.00 is being built. One last modification is being considered. The aim is to freeze next Monday.
- Status of Implementation of Reprocessing in JIM
Problem of files that are declared in SAM but have no location.
We need to make sure this happens as seldom as possible (i.e. the --killDeclaration option should be used).
If this happens often to a given site, the FSS configuration needs to be reviewed (contact GG).
- JIM Deployment and Remote Setup
- DØFarm: Up to date
- GridKa: Configurations fixed; only minor known problems remain (monitoring), which shouldn't affect certification.
- Lyon: Test of p17.02.00 succeeded, but using the calorimeter DB proxy results in crashes (Unknown exception from CORBA). Certification not yet started (should we wait for the proxy to work?).
- SAR: p17.02.00 succeeded. (No Cal DB proxy used). Merge certification started.
- WestGrid: p17.02.00 working. Failure rates of 2% in JIM plus 2% due to hardware setup.
About to get Starlight connection up and running.
- Wisconsin: Some problems with project master. 10% of the jobs don't get a file name upon get-next file. (Easily recoverable)
- UK: Frederic by email: not yet ready for certification at IC.
Fixed a major issue 2 days ago, which is the communication
between our working nodes and the head node. Tests ongoing.
- CMS Farm: --
- Prague: --
- Certification of Sites
Joe can run RecoCert on CAB. We need to check whether we can produce the plots.
Problems with SAM might occur for each of the sites as most probably remote read access was never tested before.
ID groups might ask to check production certification with programs other than RecoCert.
In this case we should (manually) store the files in each of the certification datasets to enstore.
- AOB.
Action Items:
Define new datasets for merge certification of WestGrid (Mike)
All sites which have 90% or better efficiency should start certification of the merge step (D0Farm, GridKa, Lyon, SAR, WestGrid, Wisconsin, Prague?)
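As a rough illustration of the 90% criterion, the failure rates quoted in the site reports can be checked against the threshold as sketched below. The function and variable names are hypothetical, and the sketch assumes that the quoted failure sources affect disjoint sets of jobs, so that their percentages simply add; sites for which no number was reported are left out rather than guessed.

```python
# Threshold from the action item: 90% or better efficiency.
THRESHOLD_PERCENT = 90

# Failure rates (percent of jobs) as quoted in the site reports.
# Only sites with a quoted number are listed.
failure_percent = {
    "WestGrid": [2, 2],   # 2% in JIM plus 2% due to hardware setup
    "Wisconsin": [10],    # 10% of jobs get no file name on get-next file
}


def efficiency_percent(failures):
    """Efficiency in percent, assuming the failure sources are disjoint."""
    return 100 - sum(failures)


def ready_for_merge_certification(failures, threshold=THRESHOLD_PERCENT):
    """True if the site meets the efficiency threshold."""
    return efficiency_percent(failures) >= threshold


for site, failures in failure_percent.items():
    print(site, efficiency_percent(failures),
          ready_for_merge_certification(failures))
```

Under these assumptions WestGrid comes out at 96% and Wisconsin at 90%, so both meet the threshold.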
Next Meeting
31-Jan-2005.
Mike Diesburg, Daniel Wicke, 25-Jan-2005. Last Change 26-Jan-2005.
Diesburg@fnal.gov,
Wicke@fnal.gov