Reprocessing: Meeting of 9-May-2005: 9:30-10:30 ESNet video conference
Our meeting number on the ESNet is 823073776 (82d0repro).
Instructions to dial into a video conference via phone.
Agenda
- News
- Status of production
- Status of remote site Certification
- JIM Deployment and Remote Setup of remaining sites.
- AOB.
Minutes
Participants
Yann Coadou (Vancouver), Frederic Villeneuve-Seguier (London), Joerg Meyer, Gabriele Garzoglio, Mike Diesburg, Daniel Wicke (FNAL)
Topics
- News
- From early Saturday morning until early ..., enstore had problems with the STKIN robot; late Saturday night, the SAM naming service had problems.
- Status of production
- WestGrid: Since the kernel update two weeks ago, running more or less smoothly. Occasional failures still occur.
Running at three times the expected load on WestGrid, with up to 20 grid jobs at a time. A record of 785 jobs running in parallel was observed.
The configured durable location is too small for that high number of parallel jobs.
Yann is considering moving the durable location from the head node to GridStore.
- Lyon: GG: Running reasonably smoothly after the last SamGrid upgrade. Some problems reported.
- SAR: GG&Mike: No operator on duty? Rerunning certification with local proxies?
- Prague: Ran into a bug: the p17.03.03 tarball was incompatible with certain Linux versions (a new tarball was created by Iain).
Having problems with PBS. Admins are working on it.
- Status of remote site Certification
- Status of production certification.
- Prague: certified.
- SAR: UTA: Rerun after they changed their DB proxies. Also not accessible. Joe will retry.
- SAR: Oscer: Running.
- UK: IC: Certification datasets provided (see below).
- CMS Farm: Certification datasets provided, but files can't be accessed (Robert is looking into it).
- Status of Merge Certification of Sites
- DØFarm: done.
- WestGrid: done.
- Lyon: done.
- SAR: UTA: done
- SAR: Oscer: done
- GridKa: done.
- Wisconsin: done.
- Prague: done.
- UK: IC: First certification datasets provided, but Joe has problems accessing files (Robert is informed).
- UK: Manchester: ??
- JIM Deployment and Remote Setup of remaining sites.
- GridKa: Suffering from recurring NFS problems. Suggestion is to coordinate with the GridKa admins (is it still Jos van Wezel?)
to avoid tests when the system is in an undefined/buggy state. Scheduled maintenance on Wednesday.
- Prague:
- UK:
- RAL: Required user accounts now available. Installation about to start.
- Lancs: --
- AOB:
- Yann requested a summary of how many events were produced.
Mike will program that this week. A very slow version already exists.
Action Items:
Next Meeting
16-May-2005.
Mike Diesburg, Daniel Wicke, 9-May-2005. Last Change 9-May-2005.
Diesburg@fnal.gov,
Wicke@fnal.gov