Procedure
The two steps, production and merging, need to be certified separately.
Each site therefore processes two daysets with common settings:
- Production Certification:
The first dataset to process is common to all sites and will be used to verify that the production step produces identical results at the various sites.
- Merge Certification:
The second dataset is different for each site and will be used to certify the merging step
by comparing the unmerged with the merged plots at each site.
It checks that the merging procedure doesn't modify the data.
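As an illustration of what "doesn't modify the data" means for a plot, the sketch below compares two histograms bin by bin. It is not RecoCert's actual comparison, which is not described on this page; the function name and the representation of a histogram as a plain list of bin contents are assumptions made for the example.

```python
import math

def differing_bins(reference, candidate, rel_tol=0.0):
    """Return the indices of bins whose contents differ (hypothetical helper).

    With rel_tol=0.0 the bins must be exactly equal, which is what one
    would expect if merging leaves the data untouched. A small rel_tol
    allows for minute numerical differences between sites.
    """
    if len(reference) != len(candidate):
        raise ValueError("histograms have different binning")
    return [
        i
        for i, (ref, cand) in enumerate(zip(reference, candidate))
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=0.0)
    ]

# Toy bin contents, not real certification data.
unmerged = [12.0, 40.0, 7.0]
merged = [12.0, 40.0, 7.0]
print(differing_bins(unmerged, merged))
```

An empty list means every bin agrees within the chosen tolerance.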
Common settings
SamGrid Release Cuts are now described on the p17 reprocessing page.
Production Certification
Merge certification is ongoing.
Production certification needs to be done by manually submitting grid jobs with samg submit.
The common JDF with test_run = true should be used.
The dataset to be run is input_dataset = dayset-2004-08-18-196489-0.
Sites should not merge the results. In fact they shouldn't be able to merge.
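Before submitting, the two settings named above can be checked mechanically. The sketch below assumes the JDF is a plain text file of "key = value" lines with optional "#" comments; that syntax, and the helper names, are assumptions for illustration. Only the two required values are taken from this page.

```python
# Settings required for production certification, as stated above.
REQUIRED = {
    "test_run": "true",
    "input_dataset": "dayset-2004-08-18-196489-0",
}

def parse_jdf(text):
    """Parse 'key = value' lines; blank lines and '#' comments are skipped.

    The file syntax is an assumption, not a documented JDF grammar.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def check_jdf(text, required=REQUIRED):
    """Return a list of problems; an empty list means both settings match."""
    settings = parse_jdf(text)
    problems = []
    for key, expected in required.items():
        actual = settings.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems

example = """
test_run = true
input_dataset = dayset-2004-08-18-196489-0
"""
print(check_jdf(example))
```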
Sites should define and report back a dataset definition containing the resulting unmerged thumbnails;
RecoCert will then be run on this dataset. The dataset can be defined with:
setup d0repro
create_certification_dataset.py unmerged dayset-2004-08-18-196489-0 p17.03.01 "<gid1>[|<gid2>...]" --test
where each <gidn> is a global job id used for producing the unmerged thumbnails.
Multiple ids must be separated by '|'.
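The job-id argument can also be assembled by a small script. The sketch below is only an illustration: the helper name certification_command and the example ids are hypothetical, while the command layout is copied from the line above. It builds the command string and does not run anything.

```python
import shlex

def certification_command(dataset, release, job_ids, kind="unmerged"):
    """Build the create_certification_dataset.py command line (hypothetical helper)."""
    if not job_ids:
        raise ValueError("at least one global job id is required")
    # Multiple global job ids are joined with '|'.
    gid_pattern = "|".join(job_ids)
    args = [
        "create_certification_dataset.py",
        kind,
        dataset,
        release,
        gid_pattern,
        "--test",
    ]
    # shlex.quote keeps the shell from treating '|' as a pipe.
    return " ".join(shlex.quote(arg) for arg in args)

# Placeholder ids; real global job ids come from the grid submission.
print(certification_command(
    "dayset-2004-08-18-196489-0", "p17.03.01", ["gid_a", "gid_b"]
))
```

Quoting the joined ids matters because an unquoted '|' would be interpreted by the shell as a pipe.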
Results of Production Certification
We were able to run only the first 12 files. Attempts to run the full dataset crashed. Results for the standard farm production were obtained by merging the root tuples written out during the farm production.
Standard production vs. JIM production on DØFarm was produced using the released RecoCert p17.03.01. It shows deviations that are more than minute only in plots with known bugs. As the d0farm contains a mixture of Pentium and Athlon CPUs and jobs are distributed randomly among the nodes, the minute deviations may reflect differences between the FPUs.
Names of datasets used for Production Certification
- Lyon:
- WestGrid:
- DOSAR-UTA-DPCC: recocert-uta-dpcc-dataset0_001-012
- Wisconsin: dayset-2004-08-18-196489-0_certification-wisconsin_001-012
- Prague: dayset-2004-08-18-196489-0_certification_20050406040109_001-012
Results done with different versions of RecoCert
The following overlay plots show histograms produced with different versions of RecoCert.
Root tuples for the remote sites were produced with a self-compiled p17-br version of RecoCert checked out on 11-Mar-2005.
The reference root tuple was produced with released RecoCert p17.03.01 from D0Farm TMBs.
- DØFarm dataset: dayset-2004-08-18-196489-0_001-012_certification_20050408083809
Lyon vs. DØFarm
Westgrid vs. DØFarm
Results done consistently with RecoCert p17-br 11-Mar-2005
Lyon vs. DØFarm (certified)
Westgrid vs. DØFarm (certified)
Wisconsin vs. DØFarm (deviations observed)
DOSAR-UTA-DPCC vs. DØFarm (corrected 5-Apr-2005; deviations observed)
Prague vs. DØFarm (deviations observed)
Results done consistently with RecoCert p17-br 11-Mar-2005 compared to a new set of reference plots
These plots were produced from a production run on the d0farm on 8-April-2005.
Comparison of Reference 1 (used so far) to Reference 3 (new)
The deviations between these two sets of plots are believed to stem from an update of the calibration database on 1-Apr-2005.
- DØFarm dataset: dayset-2004-08-18-196489-0_001-012_certification_20050413122649
Wisconsin vs. DØFarm Ref.3 (corrected 15-Apr-2005; certified)
DOSAR-UTA-DPCC central DB proxy vs. DØFarm Ref. 3 (corrected 15-Apr-2005; certified)
DOSAR-UTA-DPCC local DB proxy vs. DØFarm Ref. 3 (11-May-2005; certified)
Prague vs. DØFarm Ref. 3 (corrected 15-Apr-2005; certified)
Imperial vs. DØFarm Ref. 3 (11-May-2005; certified)
CMS Farm vs. DØFarm Ref. 3 (12-May-2005; certified)
DOSAR-Oscer vs. DØFarm Ref. 3 (12-May-2005; certified)
Manchester vs. DØFarm Ref. 3 (6-June-2005; certified)
GridKa vs. DØFarm Ref. 3 (30-June-2005; certified; all Xeon CPUs)
GridKa vs. DØFarm Ref. 3 (05-July-2005; all Opteron CPUs)
DOSAR-Sprace vs. DØFarm Ref. 3 (09-August-2005; certified)
Results done consistently with RecoCert p17.03.03
Standard production vs. JIM production on DØFarm (partition 001 only)
Comparison of p17.03.03 tarball version 2 vs. initial p17.03.03 tarball on DØFarm (partition 001 only)
The following plots were done with RecoCert p17.05.01:
Standard vs. SamGrid production with p17.05.01 on DØFarm, run 208204
Standard vs. SamGrid production with p17.05.01 on DØFarm, run 208277
Standard vs. SamGrid production with p17.05.01 on DØFarm, run 208278
The following plots were made by fixing the inputs to the previous plots (after merging them). Daniel Wicke
ran the corrections with p17tmbfixer p17.09.01, and RecoCert p17.09.01 was then run on the
fixed certification datasets:
Standard vs. SamGrid production with p17.05.01 on DØFarm, run 208204, fixed with p17.09.01
The following plots were made by re-fixing the inputs to the previous plots. Daniel Wicke
ran the corrections with p17tmbfixer p17.09.02:
Standard vs. SamGrid production with p17.05.01 on DØFarm, run 208204, fixed with p17.09.01, refixed with p17.09.02
Merge Certification
Merge certification is ongoing.
Dataset names to be used for the Merge Certification (pick your site and find the plots linked from the dataset name):
- DØFarm: dayset-2004-06-22-194374-0 dayset-2004-06-22-194374-1 dayset-2004-06-22-194374-2
- GridKa: dayset-2004-06-14-194025-0 dayset-2004-06-14-194025-1 dayset-2004-06-14-194025-2
- Lyon: dayset-2004-06-19-194288-0 dayset-2004-06-19-194288-1 dayset-2004-06-19-194288-2
- DOSAR-UTA-DPCC: dayset-2004-06-20-194289-0 dayset-2004-06-20-194289-1 dayset-2004-06-20-194289-2
- DOSAR-Oscer: dayset-2004-06-23-194379-0 dayset-2004-06-23-194379-1 dayset-2004-06-23-194379-2
- WestGrid: dayset-2004-06-15-194026-0 dayset-2004-06-15-194026-1 dayset-2004-06-15-194026-2
- Wisconsin: dayset-2004-06-23-194380-0 dayset-2004-06-23-194380-1 dayset-2004-06-23-194380-2
- UK: Lancaster: dayset-2004-06-21-194340-0 dayset-2004-06-21-194340-1 dayset-2004-06-21-194340-2
- UK: London: dayset-2004-06-21-194341-0 dayset-2004-06-21-194341-1 dayset-2004-06-21-194341-2
- UK: Manchester: dayset-2004-06-21-194342-0 dayset-2004-06-21-194342-1 dayset-2004-06-21-194342-2
- UK: Rutherford: dayset-2004-06-21-194347-0 dayset-2004-06-21-194347-1 dayset-2004-06-21-194347-2
- CMS Farm: dayset-2004-06-15-194027-0 dayset-2004-06-15-194027-1 dayset-2004-06-15-194027-2
- Prague: dayset-2004-06-24-194447-0 dayset-2004-06-24-194447-1 dayset-2004-06-24-194447-2
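The names in this list follow a visible pattern: a date, a number, and a trailing index 0 to 2, with the three datasets of a site differing only in the index. The sketch below parses that pattern, for example to catch a mistyped name before submission. The pattern is inferred from the names above, not taken from a specification, and the meaning of the number is not stated on this page; the helper names are hypothetical.

```python
import re
from datetime import date

# Pattern inferred from the names listed above:
# dayset-YYYY-MM-DD-<number>-<index>
DAYSET_RE = re.compile(r"^dayset-(\d{4})-(\d{2})-(\d{2})-(\d+)-(\d+)$")

def parse_dayset(name):
    """Split a dayset name into its date, number and trailing index."""
    match = DAYSET_RE.match(name)
    if match is None:
        raise ValueError(f"not a dayset name: {name!r}")
    year, month, day, number, index = (int(g) for g in match.groups())
    return {"date": date(year, month, day), "number": number, "index": index}

def same_group(names):
    """True if all names share date and number, i.e. differ only in the index."""
    keys = {(p["date"], p["number"]) for p in map(parse_dayset, names)}
    return len(keys) == 1

print(parse_dayset("dayset-2004-06-22-194374-0"))
print(same_group([
    "dayset-2004-06-22-194374-0",
    "dayset-2004-06-22-194374-1",
    "dayset-2004-06-22-194374-2",
]))
```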
This certification dataset can be run entirely with the d0repro tool.
- Submit the production:
setup d0repro
sub_production.py dayset_of_your_site p17.02.00 --test
Then check the status (this step is required before any recovery submission):
check_production.py dayset_of_your_site p17.02.00 --test
If some files crashed, repeat sub_production.py after the grid job has completed.
- Create a dataset of the produced files with:
create_certification_dataset.py unmerged <dayset_of_your_site> p17.02.00 ".*" --test
- Then report back a dataset definition with the resulting unmerged thumbnails
and the global job ids you used for doing the production.
This will be used to run RecoCert again.
- When RecoCert is finished, sites will be asked to merge these unmerged thumbnails.
setup d0repro
sub_merge.py dayset_of_your_site p17.02.00 --test
- Create a dataset of the merged files with:
create_certification_dataset.py merged <certification_dataset_returned_in_step_2> p17.02.00 ".*" --test
and report back the dataset name created by this command and the global job ids you used for doing the merge.
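The command sequence above can be written out for a given dayset as follows. This is a sketch that only prints the commands in order; it does not submit anything, and the waiting points (grid job completion, the RecoCert run, the request to merge) remain manual. The function name is hypothetical; the commands and the release p17.02.00 are copied from the steps above.

```python
def merge_certification_commands(
    dayset,
    release="p17.02.00",
    unmerged_dataset="<certification_dataset_returned_in_step_2>",
):
    """List the d0repro commands for one site, in the order given above."""
    return [
        "setup d0repro",
        # Production; repeat after the grid job completes if files crashed.
        f"sub_production.py {dayset} {release} --test",
        # Status check, required before any recovery submission.
        f"check_production.py {dayset} {release} --test",
        # Dataset of unmerged thumbnails, to be reported back.
        f'create_certification_dataset.py unmerged {dayset} {release} ".*" --test',
        # Merge, only once RecoCert has finished and sites are asked to merge.
        f"sub_merge.py {dayset} {release} --test",
        # Dataset of merged files, to be reported back.
        f'create_certification_dataset.py merged {unmerged_dataset} {release} ".*" --test',
    ]

for command in merge_certification_commands("dayset-2004-06-22-194374-0"):
    print(command)
```

The placeholder for the unmerged dataset is left as in the text, since its name is only known after the second step.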
Results of Merge Certification
Results of Merge Certification are available on a separate web-page.
Comparison of RecoCert Versions
Imperial dayset-2004-06-21-194341-1 using p17.03.03 for both (These comparisons all use one Imperial College merge cert dataset.)
Imperial dayset-2004-06-21-194342-1 (RecoCert plots made with p17.02.00 versus plots made with p17.03.03)
Imperial dayset-2004-06-21-194342-1 (RecoCert plots made with p17.02.00 versus plots made with p17.05.00)
Imperial dayset-2004-06-21-194342-1 (RecoCert plots made with p17.03.03 versus plots made with p17.05.00)
Imperial dayset-2004-06-21-194342-1 (RecoCert plots made with p17.03.03 versus plots made with p17.05.01)
Imperial dayset-2004-06-21-194342-1 (RecoCert plots made with p17.05.00 versus plots made with p17.05.01)
Joe Steele, Daniel Wicke
Created: Wed Jan 5 15:57:32 CST 2005
Last modified: Thu May 25 12:00:19 CDT 2006