What to improve in the next round of ReReco?
Do it from RAW data
Prerequisite: database access
Solution: DB proxy server (local, regional)
- Hardware needs: dedicated machine, memory, disk space ...?
- Software: products, licences, modifications ... ?
---> define specifications
How to proceed
1. Feasibility tests -- locally --> other sites
2. Certification -- sw version, data sample
see Thomass N. at GridKa
Do TMB merging outside FNAL
Prerequisite: portable, robust merging script
Caveats:
- intensive SAM DB access
- central access to all output files from a run
- file integrity checks
Merging utilities:
- copyd0om (slow, but ensures integrity of input files)
- evcopy (fast but no check on input files)
- evcopy & dsdump of merged files (ensures data integrity)
--> see Mike D.
and see Marco V.
- SAM declaration of files, temporary storage of TMBs ?
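The file integrity check listed under Caveats could be sketched as below. This is a minimal illustration only, assuming a checksum is recorded for each file when it is produced; the `manifest` structure and both function names are hypothetical and are not part of copyd0om, evcopy, dsdump or SAM.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt_files(manifest):
    """Return the paths whose current checksum differs from the manifest.

    `manifest` maps a file path to the checksum recorded when the file
    was produced (hypothetical bookkeeping, an assumption of this sketch).
    Files that are missing are reported as well.
    """
    bad = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or file_checksum(path) != expected:
            bad.append(path)
    return bad
```

A merge would then start only if `find_corrupt_files(manifest)` returns an empty list for all input files of the run.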
Improve book-keeping
Goal: Automate/generalize book-keeping
1. project assignment (see also Data set preparation)
2. job management & monitoring
- which parts of the Lyon system are reusable and how to generalize them ?
- remove any local dependencies, paths
- what can be used from the GridKa procedure ?
is there functionality in SAM, not yet used, similar to that of the Lyon system ?
recovery/resubmission procedure
- additional crosschecks in the remote procedures ?
- number of events produced (compare with expectation from raw file/sam)
- check readability of output files ?
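The event-count crosscheck and its link to recovery/resubmission could look roughly like this. It is a sketch under assumptions: the expected counts are taken to be already available per raw file (e.g. from SAM metadata), and how they are obtained is not shown; all names are hypothetical.

```python
def crosscheck_event_counts(expected, produced):
    """Compare produced event counts with expectations, file by file.

    Both arguments map a raw-file name to an event count. Returns a dict
    with three lists: files missing from the output, files whose count
    differs (name, expected, produced), and files that agree.
    """
    report = {"missing": [], "mismatch": [], "ok": []}
    for name, n_expected in sorted(expected.items()):
        if name not in produced:
            report["missing"].append(name)
        elif produced[name] != n_expected:
            report["mismatch"].append((name, n_expected, produced[name]))
        else:
            report["ok"].append(name)
    return report


def files_to_resubmit(report):
    """Files needing recovery: missing ones plus those with wrong counts."""
    return report["missing"] + [name for name, _, _ in report["mismatch"]]
```

The list returned by `files_to_resubmit` is what a recovery/resubmission step would take as input.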
Data set preparation
1. Project assignment
Goal: avoid manual assignment of datasets -> common bookkeeping
- web page where sites can sign up for specified datasets
----- ensure unique assignment (1 project - 1 site)
----- general information about the status of project
(assigned to X, dataset copied to X, dataset done, dataset in sam, dataset copied back)
Solution:
--> MC-production tools ?
--> Lyon system? (Python scripts + Oracle DB)
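A minimal sketch of how unique assignment (1 project - 1 site) and the status list could be enforced in a database. It uses SQLite from the Python standard library purely as a stand-in; the table layout, function names and status strings are assumptions for illustration and do not describe the existing Lyon system or its Oracle schema.

```python
import sqlite3

# Status values taken from the list above; the exact strings are assumed.
STATUSES = ("assigned", "copied to site", "done", "in sam", "copied back")


def open_bookkeeping(path=":memory:"):
    """Open (or create) the hypothetical bookkeeping table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS assignment ("
        " dataset TEXT PRIMARY KEY,"  # primary key: one row, one site per dataset
        " site TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'assigned')"
    )
    return db


def sign_up(db, dataset, site):
    """Assign a dataset to a site; return False if it is already taken."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO assignment (dataset, site) VALUES (?, ?)",
                (dataset, site),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def set_status(db, dataset, status):
    """Move a dataset to one of the known statuses."""
    if status not in STATUSES:
        raise ValueError("unknown status: %r" % (status,))
    with db:
        cur = db.execute(
            "UPDATE assignment SET status = ? WHERE dataset = ?",
            (status, dataset),
        )
    if cur.rowcount == 0:
        raise KeyError(dataset)


def status_of(db, dataset):
    """Return (site, status) for a dataset, or None if unassigned."""
    return db.execute(
        "SELECT site, status FROM assignment WHERE dataset = ?", (dataset,)
    ).fetchone()
```

Letting the database reject the second sign-up, rather than checking first and inserting afterwards, avoids two sites claiming the same dataset at the same moment.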
2. Data delivery
dedicated node with Gb connections and large buffer space ?
---- spec out a server to handle both data delivery of raw files and storage of DSTs & thumbnails
avoid the need for (manual) prestaging of input data?
Certification procedure
need a well defined line of responsibility:
- for initial certification
- for continuous monitoring of results
--> representatives of physics groups and/or detector responsibles ?
need a central repository for a standard set of plots:
- reference plots
- plots from run-time info
- which additional info do we need ?
--> tools, mechanisms for displaying/overlaying plots ?
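One possible ingredient of continuous monitoring, alongside overlaying plots by eye, is a numerical comparison of each plot with its reference. The sketch below uses the standard chi-square for two binned distributions with different totals; this choice of test and the function name are assumptions, not an agreed certification criterion.

```python
import math


def chi2_vs_reference(reference, candidate):
    """Chi-square between two histograms of raw bin counts.

    Both arguments are sequences of counts with identical binning. The
    totals may differ; each histogram is weighted so that only shape
    differences contribute. Bins empty in both histograms are skipped.
    Returns (chi2, number_of_bins_compared).
    """
    if len(reference) != len(candidate):
        raise ValueError("histograms must have the same number of bins")
    total_ref = sum(reference)
    total_cand = sum(candidate)
    if total_ref == 0 or total_cand == 0:
        raise ValueError("cannot compare an empty histogram")
    w_ref = math.sqrt(total_cand / total_ref)
    w_cand = math.sqrt(total_ref / total_cand)
    chi2 = 0.0
    n_bins = 0
    for r, c in zip(reference, candidate):
        if r == 0 and c == 0:
            continue
        chi2 += (w_ref * r - w_cand * c) ** 2 / (r + c)
        n_bins += 1
    return chi2, n_bins
```

A plot whose chi-square per compared bin is far above 1 would be flagged for a responsible person to inspect; the threshold itself would have to be defined as part of the certification procedure.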
Can we use GRID ?
What about SAM-Grid ?
- in rather advanced phase for MC-production
- could it be adapted to ReReco without great pain?
EDG experience
What about LCG ?
- premature ?
Tibor Kurca
Last modified: Thu Feb 19 03:33:36 CST 2004