Reprocessing: Meeting of 11-July-2005: 9:30-10:30 ESNet video conference
Our meeting number on the ESNet is 823073776 (82d0repro).
Instructions to dial into a video conference via phone.
Agenda
- News
- Features for new JIM release cut (Andrew Baranovski)
- Status of remote site Certification
- Status of production
- JIM Deployment and Remote Setup of remaining sites.
- AOB.
Minutes
Participants
Patrice Lebrun, Tibor Kurca, Gabriele Garzoglio, Parag Mhashilkar (Lyon), Jae Yu (FNAL, Outback),
Vlastislav Hynek, Gavin Davies, Andrew Baranovski, Mike Diesburg, Daniel Wicke (FNAL, WH7X)
Topics
- News
- p17.05.xx won't be deployed for reprocessing. We are considering using p17.05.xx to recover crashed or corrupted files.
- Marco Verzocchi found 7 files (so far, after checking 90% of what has been done) which crash when read with dsdump.
- The general policy on log files and how long they should be kept was discussed. Log files should ideally be kept for 6 months.
To have enough space we will investigate whether to remove core files.
- Features for new JIM release cut (Andrew Baranovski)
- Job throttling:
Control / reduce the rate at which jobs are submitted to a site, based on the current
snapshot of all jobs in the submitted state. The goal is to reduce machine load
during submission and improve operations.
- Ability to configure sam batch adapter batch queues on a per-application basis.
- Ability to configure data queues (fcp) and storage locations on a
per-application basis (binary download, reco data, merge data, reco output,
merged output) to improve CPU utilization.
- Multi-site submissions. Brokering by random or "most data" criteria when
selecting the target site for each submission attempt.
- Web monitoring improvements.
-- W3C compliance changes.
-- GMT to local timezone conversions.
-- monitoring filters per user and per site.
This has to be cut as a formal release, which will then be tested on the FNAL farm. Timescale: a bit more than one week.
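As an illustration of the job throttling and multi-site brokering items above, here is a minimal Python sketch. It is not JIM code: the names SiteState, allowed_submissions and pick_site, the fields, and the limits are all hypothetical, and it only assumes that a per-site count of jobs in the submitted state and a per-site count of locally available input files can be obtained.

```python
import random
from dataclasses import dataclass


@dataclass
class SiteState:
    """Hypothetical snapshot of one site (not a JIM data structure)."""
    name: str
    submitted_jobs: int  # jobs currently in the submitted state
    cached_files: int    # requested input files already available at the site


def allowed_submissions(site, max_submitted, batch_size):
    """Throttle: number of new jobs that may be sent to the site now.

    Never exceeds batch_size, and never pushes the number of jobs in the
    submitted state above max_submitted (both limits are assumptions).
    """
    headroom = max(0, max_submitted - site.submitted_jobs)
    return min(batch_size, headroom)


def pick_site(sites, criterion="most data", rng=random):
    """Broker: choose the target site for one submission attempt."""
    if not sites:
        raise ValueError("no candidate sites")
    if criterion == "random":
        return rng.choice(sites)
    if criterion == "most data":
        # Prefer the site that already holds the most input files.
        return max(sites, key=lambda s: s.cached_files)
    raise ValueError("unknown criterion: %r" % criterion)
```

A caller would first pick a site for each submission attempt and then submit at most allowed_submissions(...) jobs to it before taking the next snapshot.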
- Status of production
- WestGrid: --
- Lyon: Eagerly awaiting new cut.
- SAR-UTA: Limited to 5 merge jobs, to be investigated.
- SAR-Oscer:
- CMS-Farm:
- Wisconsin:
- Prague: Stuck with expired host certificate.
- Manchester: --
- Imperial College: Jobs are going to idle status; the durable location seems to be at its limit.
- Status of remote site Certification
- Status of production certification.
- GridKa: produced Opteron and Xeon results separately. Joe will provide plots ASAP.
- SAR-SPRACE:
- Status of Merge Certification of Sites
- DØFarm: done.
- WestGrid: done.
- Lyon: done.
- SAR-UTA: done.
- SAR-Oscer: done.
- GridKa: done.
- Wisconsin: done.
- Prague: done.
- GridKa:
- SAR-SPRACE: dataset will be assigned.
- JIM Deployment and Remote Setup of remaining sites.
- RAL: Admins are insisting on using an existing gatekeeper.
- Lancs: --
- AOB:
- Gavin Davies: Resources will have to be balanced, i.e. redirected towards MC production.
Action Items:
Next Meeting
18-July-2005.
Mike Diesburg, Daniel Wicke, 7-July-2005. Last Change 11-July-2005.
Diesburg@fnal.gov,
Wicke@fnal.gov