Support for SAM V7:
Support the new SAM, the request system, and d0runjob.
Requester: Samgrid
Status: In Progress
Tasks Completed: Reco, Reco Merge, Primary processing with multiple streams, Monte Carlo with all four phases, Monte Carlo Merge.
This requires modification to the runjob macro. We will need the exact macro that will enable us to achieve this.
Requester: Users
Status: Users
Some jobs need to run only up to the generator phase, with the output stored in SAM. This requires a modification to the runjob macro, and we will need the exact macro that achieves this. There are two options: either we introduce a new job type or, if this is only a slight variation of the MC job, we add a samgrid parameter that specifies the phase up to which the job should run.
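The second option (a parameter that names the last phase to run) could look roughly like the sketch below. The phase names and the stop_after parameter are hypothetical; the real runjob macro and samgrid parameter names may differ.

```python
# Hypothetical phase names in execution order; the actual d0runjob macro
# may use different names for the four MC phases.
MC_PHASES = ["generation", "simulation", "digitization", "reconstruction"]


def phases_to_run(stop_after=None):
    """Return the phases to execute, ending with `stop_after` (inclusive).

    `stop_after` stands in for the proposed samgrid parameter. When it is
    None, all phases run, which matches today's full MC job.
    """
    if stop_after is None:
        return list(MC_PHASES)
    if stop_after not in MC_PHASES:
        raise ValueError(f"unknown phase: {stop_after!r}")
    return MC_PHASES[: MC_PHASES.index(stop_after) + 1]


# A generator-only job runs a single phase.
print(phases_to_run("generation"))
```

With this shape, the existing MC job type stays unchanged and the default value preserves current behaviour.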
Requester: Users
This already exists for the simulation phase via phase_dataset. Making this facility available for the simulation and reconstruction phases would add greatly to the flexibility of the MC effort.
Requester: Joel, Users
MC merge output storage configuration:
Design and implement job-specific storage configuration in the MC merge job. There were some software dependency issues which need to be investigated.
Requester: Yann
JIM Monitoring:
The monitoring should report success or failure based on the outcome of the user job, not of d0runjob. d0runjob returns success unless d0runjob itself fails, so its exit status is not the right thing to check. Port the changes made in d0reco_wrapper to Monte Carlo and verify that they work. This is still fuzzy and we may not be able to do it in the end. Needs further investigation.
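The idea of reporting the user job's own status can be sketched as follows. This is only an illustration, not the d0reco_wrapper change itself; run_user_job and report are hypothetical names.

```python
import subprocess
import sys


def run_user_job(command):
    """Run the user job and return its exit status unchanged.

    The point is that the caller sees the status of the user job itself,
    not the status of whatever wrapper launched it.
    """
    completed = subprocess.run(command)
    return completed.returncode


def report(status):
    """Map an exit status to the value the monitoring would show."""
    return "success" if status == 0 else "failure"


if __name__ == "__main__":
    # Stand-in user job that fails with exit status 3.
    status = run_user_job([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(report(status))
```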
Requester: Samgrid
Example:
N = Number of events to overlap
D = Default number of events per job (Today 250)
X = Percentage of events considered per file
T = Total number of events needed from minimum bias
T = 100ND/X
Today there are 700 to 3000 events per file. Based on the number of events required and the number of events in each file, select the number of files to include in the dataset.
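The formula above can be worked through in a few lines. The function names are hypothetical, and using the lower bound of 700 events per file as the default is a conservative assumption, not something the formula prescribes.

```python
import math


def min_bias_events_needed(n_overlap, events_per_job=250, percent_per_file=100.0):
    """Compute T = 100 * N * D / X, rounded up to a whole event."""
    if not 0 < percent_per_file <= 100:
        raise ValueError("percent_per_file must be in (0, 100]")
    return math.ceil(100 * n_overlap * events_per_job / percent_per_file)


def files_needed(total_events, events_per_file=700):
    """Number of files needed if every file holds `events_per_file` events.

    The default of 700 assumes the smallest file size quoted above, so the
    result is an upper bound on the number of files.
    """
    return math.ceil(total_events / events_per_file)


# N = 2, D = 250, X = 50  ->  T = 100 * 2 * 250 / 50 = 1000 events
total = min_bias_events_needed(2, events_per_job=250, percent_per_file=50)
print(total, files_needed(total))
```

For N = 2, D = 250 and X = 50 this gives T = 1000 events, which needs at most 2 files of 700 events.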
Requester: Dzero Users
jim_client looks up the SAM database and runs queries, which slows its response time. We need to look for ways to reformulate the queries or obtain the information differently in order to speed things up.
Requester: Samgrid
Access to SAM times out at some sites, and users have to modify the timeout interval in the code. A better approach is to make the interval configurable.
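One possible way to make the interval configurable is to read it from the environment, as in the sketch below. The variable name SAM_ACCESS_TIMEOUT and the 300 second default are assumptions made for illustration, not values taken from the SAMGrid code.

```python
import os

# Assumed default; the value currently hard-coded in the code is not stated here.
DEFAULT_TIMEOUT_SECONDS = 300.0


def sam_timeout(environ=os.environ):
    """Return the SAM access timeout in seconds.

    Reads the hypothetical SAM_ACCESS_TIMEOUT variable and falls back to
    the default when it is unset.
    """
    raw = environ.get("SAM_ACCESS_TIMEOUT")
    if raw is None:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"SAM_ACCESS_TIMEOUT is not a number: {raw!r}") from None
    if value <= 0:
        raise ValueError("SAM_ACCESS_TIMEOUT must be positive")
    return value
```

A site with slow access could then set the variable in its job environment instead of editing the code.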
Requester: ??
Reorganize accounting for files that have been merged. Currently we delete the locations from the SAM database. This creates an inconsistency, since the station then knows about the file but the database does not. A better option would be to check the parentage. But do we need access to the file after the merging has been done?
Requester: ??
Lower the load on the gateway node by running commands under nice. Today the load is acceptable, but if the operational load increases we need to be able to scale up.
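A minimal sketch of how a command could be wrapped with nice before it is launched on the gateway. The helper name with_nice and the adjustment of 10 are assumptions; which commands actually need lowering is not decided here.

```python
def with_nice(command, adjustment=10):
    """Prefix `command` (a list of arguments) with the POSIX nice utility.

    `adjustment` is the niceness increment passed to `nice -n`. Only
    non-negative values are accepted, since the goal is to lower priority.
    """
    if not 0 <= adjustment <= 19:
        raise ValueError("adjustment must be between 0 and 19")
    return ["nice", "-n", str(adjustment)] + list(command)


# The resulting list can be passed to subprocess.run() as usual.
print(with_nice(["tar", "xzf", "job.tgz"]))
```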
Requester: ??
Move the changes made to run_grid_job.py to support PBS-related job names into the batch handler for PBS.
Requester: Samgrid
Integration with MyProxy:
Look into the possibility of using MyProxy with SAMGrid for credential management.
Requester: Samgrid
Event count verification for MC:
There are some features that let us verify the number of events produced for reprocessing. Check if these could be useful for MC. Details about this will follow soon.
Do not assume a directory structure of the binary tarballs:
Set appropriate environment variables (e.g. $SRC_DIR, so that d0runjob works with the d0 code) according to the directory tree of the binary tarballs (d0_code, d0runjob). Currently the directory trees in the MC and repro wrappers are hard-coded.
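One way to avoid hard-coding the tree is to search the unpacked tarball for the directories of interest and derive the variables from what is found. The sketch below assumes a mapping from variable name to directory name supplied by the caller; the pairing of SRC_DIR with any particular directory is hypothetical.

```python
import os


def find_subdir(root, name):
    """Return the path of the first directory called `name` under `root`."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if name in dirnames:
            return os.path.join(dirpath, name)
    return None


def tarball_environment(root, wanted):
    """Build environment variables from the actual layout under `root`.

    `wanted` maps an environment variable name to the directory name it
    should point at, e.g. {"SRC_DIR": "d0_code"} (a hypothetical pairing).
    """
    env = {}
    for variable, dirname in wanted.items():
        path = find_subdir(root, dirname)
        if path is None:
            raise FileNotFoundError(f"{dirname} not found under {root}")
        env[variable] = path
    return env
```

The wrapper would then merge the returned dictionary into the job environment before starting d0runjob.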
Requester: MC group
Accelerator should not kill the job while the transfer is in progress:
Accelerator kills the job if there is no disk activity for a given interval of time. We should probably check whether the job is waiting for its output to be stored back to pnfs and, in that case, not kill it.
Requester: Samgrid
$Id: mc_todo.html,v 1.38 2006/08/16 02:47:48 parag Exp $