Reprocessing: Meeting of 29-Nov-2004: 9:30-10:30 ESNet video conference

Our meeting number on the ESNet is 823073776 (82d0repro).
Instructions to dial into a video conference via phone.

Agenda

  1. News
  2. Status of Implementation of Reprocessing in JIM (update)
  3. Status of JIM Deployment to Remote Site
  4. AOB.

Minutes

Participants

Jae Yu (SAR), Dugan O'Neil, Yann Cadaou (Vancouver), Tibor Kurca, Patrice Lebrun (Lyon), Laurent Duflot, Joe Steele, Gabriele Garzoglio, Vlastislav Hynek, Mike Diesburg, Daniel Wicke (FNAL), Phil Lewis (Chicago)

Topics

  1. News
    Of the 20 pb^-1 sample to be processed with p17.01.00 at the d0farm, 40% has been started. So far there has been 1 infinite loop and a few memory explosions (hitting the 2GB limit). The new reco is faster and its output is bigger; for that reason the output buffer filled up. TMB++ should be 60-80kB/event.
  2. Status of Implementation of Reprocessing in JIM (update)
  3. Status of JIM Deployment to Remote Site

    New cut of JIM versions appropriate for reprocessing (fixes XML DB problems):
    jim_job_managers v2_2_18
    jim_sandbox v2_2_1
    sam_client v1_0_8
    xmldb_client v2_0_6
    xmldb_server v1_0_3
    vdt v1_1_14_13

    mc_runjob_v06_03_04-jim-02.tar.gz

    Status of Farms:
    • DØFarm: The storing problem reported last time is believed to be caused by a misconfiguration that prevents the head node from writing to enstore.
    • GridKa: Out of operation since last Thursday due to an expired SAM server certificate (no one received the warning email).
    • Imperial: JIM installation resumes tomorrow after hardware changes (Frederic Villeneuve-Seguier by email).
    • Lancaster: --
    • Manchester: --
    • Rutherford: --
    • Lyon:
      JIM is updated; reprocessing needs updating (logfiles std_out*, std_err* and custum_output* should be investigated). Tibor should send the global job id so that Gabriele and I can have a quick look for known problems.
      Further questions: Can we keep the merged output? Can we allow storing only to a remote location?
    • SAR:
      Still investigating mc_farm. A new test farm has been set up for mc_runjob MC production. This is done in preparation for mc_runjob reprocessing.
    • WestGrid:
      Job submission to WestGrid was successful. The progress made by Bimal, Gabriele and Yann should be put together; this looks promising for getting WestGrid up and running. During the meeting the first jobs started up on worker nodes at WestGrid.
    • Wisconsin:
  4. AOB.

    Mike is creating datasets for GridKa and In2p3 (~2 months of data taking); this will take ~6h to finalise. Thereafter the creation of datasets for WestGrid will be started.

    Scientific Linux will be supported, because the D0farm is going to move to it as well. First tests showed a library conflict (under investigation).

    p17.01.00 tarball for JIM? Ian has problems getting it to work. Will talk to Mike this week.

    Prague wants to join reprocessing.

    A test of the 'load reducing' jobmanager at the farm is scheduled for next Monday. Try with smaller jobs beforehand.

Next Meeting

6-Dec-2004
Mike Diesburg, Daniel Wicke, 4-Nov-2004. Last Change 29-Nov-2004.
Diesburg@fnal.gov, Wicke@fnal.gov