What to improve in the next round of ReReco?

Do it from RAW data

  • Prerequisite: database access
  • Solution: DB proxy server (local, regional)
    - Hardware needs: dedicated machine, memory, disk space ...?
    - Software: products, licences, modifications ... ?
    ---> define specifications
  • How to proceed
    1. Feasibility tests -- locally first, then at other sites
    2. Certification -- sw version, data sample
  • see Thomass N. at GridKa

Do TMB merging outside FNAL

  • Prerequisite: portable, robust merging script
  • Caveats:
    - intensive SAM DB access
    - central access to all output files from a run
    - file integrity checks
  • Merging utilities:
    - copyd0om (slow, but ensures integrity of input files)
    - evcopy (fast but no check on input files)
    - evcopy & dsdump of merged files (ensures data integrity)
    --> see Mike D. and see Marco V.
    - declaration of files in SAM, temporary storage of TMBs ?
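
The trade-off above (a fast copy that does not check its inputs versus a slow one that does) could be softened by verifying the inputs separately before a fast merge. A minimal sketch, assuming a checksum was recorded for each TMB file when it was produced; the names `file_checksum`, `find_bad_inputs` and the `recorded` mapping are hypothetical and are not part of SAM, copyd0om, evcopy or dsdump:

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_inputs(recorded):
    """Return the input files that are missing or whose checksum differs.

    `recorded` maps a file path to the checksum noted when the file was
    produced (hypothetical bookkeeping, assumed to exist for this sketch).
    """
    bad = []
    for path, expected in recorded.items():
        if not Path(path).is_file() or file_checksum(path) != expected:
            bad.append(path)
    return sorted(bad)
```

A merge of a run would then be started only if `find_bad_inputs` returns an empty list for that run's files; anything it reports goes to recovery instead.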

Improve book-keeping

  • Goal: automate/generalize book-keeping
    1. project assignment (see also Data set preparation)
    2. job management & monitoring
    - which parts of the Lyon system are reusable and how to generalize them ?
    - remove any local dependencies, paths
    - what can be used from the GridKa procedure ?
  • does SAM have functionality, not used yet, similar to that of the Lyon system ?
  • recovery/resubmission procedure
  • additional crosschecks in the remote procedures ?
    - number of events produced (compare with expectation from raw file/sam)
    - check readability of output files ?
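
The event-count crosscheck above can be sketched as a comparison of two tables. This is only an illustration: the function name `check_event_counts` and the two dictionaries are hypothetical, and how the expected counts are obtained from the raw file or SAM is left open.

```python
def check_event_counts(expected, produced):
    """Compare expected and produced event counts, file by file.

    `expected` maps an input file name to the number of events expected
    from it; `produced` maps the same names to the number of events found
    in the output. Both mappings are assumptions of this sketch.

    Returns a dict of name -> (expected, produced) for every mismatch;
    `produced` is None when no output was reported for that input.
    """
    problems = {}
    for name, n_expected in expected.items():
        n_produced = produced.get(name)
        if n_produced != n_expected:
            problems[name] = (n_expected, n_produced)
    return problems
```

An empty result means every input is accounted for; any entry is a candidate for the recovery/resubmission procedure.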

Data set preparation

    1. Project assignment

  • Goal: avoid manual assignment of datasets -> common bookkeeping
    - web page where sites can sign up for specified datasets
    ----- ensure unique assignment (1 project - 1 site)
    ----- general information about the status of each project
    (assigned to X, dataset copied to X, dataset done, dataset in sam, dataset copied back)
  • Solution:
    --> MC-production tools ?
    --> Lyon system? (Python scripts + Oracle DB)
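
Whatever tool is chosen, the two requirements above (unique assignment and a visible status per project) amount to a small registry. A minimal in-memory sketch, not the Lyon system or the MC-production tools; the class `ProjectRegistry`, its methods and the status labels are hypothetical, with the labels paraphrasing the status list given above:

```python
# Status sequence paraphrased from the list above (labels are illustrative).
STATUSES = ["assigned", "copied to site", "done", "in SAM", "copied back"]


class ProjectRegistry:
    """Keep track of which site owns which project and how far it has got."""

    def __init__(self):
        self._projects = {}  # project name -> {"site": ..., "status": ...}

    def sign_up(self, project, site):
        """Assign a project to a site; refuse if it is already taken."""
        if project in self._projects:
            owner = self._projects[project]["site"]
            raise ValueError(f"{project} is already assigned to {owner}")
        self._projects[project] = {"site": site, "status": STATUSES[0]}

    def advance(self, project):
        """Move a project to the next status and return the new status."""
        entry = self._projects[project]
        position = STATUSES.index(entry["status"])
        if position + 1 < len(STATUSES):
            entry["status"] = STATUSES[position + 1]
        return entry["status"]

    def status(self, project):
        """Return (site, status) for a project."""
        entry = self._projects[project]
        return entry["site"], entry["status"]
```

A real version would keep this table in a shared database behind the sign-up web page, so that the uniqueness check holds across sites.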

    2. Data delivery

  • dedicated node with Gb connections and large buffer space ?
    ---- specify our server to handle both data delivery of raw files and storage of DSTs & thumbnails
  • avoid the need for (manual) prestaging of input data?

    Certification procedure

  • need a well-defined line of responsibility:
    - for initial certification
    - for continuous monitoring of results
    --> representatives of physics groups and/or those responsible for detectors ?
  • need a central repository for a standard set of plots:
    - reference plots
    - plots from run-time info
    - which additional info do we need ?
    --> tools, mechanisms for displaying/overlaying plots ?

    Can we use GRID ?

  • What about SAM-Grid ?
    - in rather advanced phase for MC-production
    - could it be adapted to ReReco without great pain?
  • EDG experience
  • What about LCG ?
    - premature ?

  • Tibor Kurca
    Last modified: Thu Feb 19 03:33:36 CST 2004