Farm Dataset Audit Information

    The audit file reports the completion status of physics datasets processed on the production farm.  It is updated daily at about 01:00 AM.  Completion status is given for unmerged thumbnails, merged thumbnails, and RecoCert files.  The columns in the file are described below.

Project Definition Name

    The project definition name listed in column 1 is the name of the project definition that selects the raw data files from SAM.  The definition names have the following format:

dayset-yyyy-mm-dd-stream-run-index

where

dayset      =   Fixed identifier for the farm projects; it carries no other meaning
yyyy-mm-dd  =   Date on which the first partition of raw data in this stream was written to SAM
stream      =   Physical datastream name of the data.   For example:  all_1, all_2, etc
run         =   Run number
index       =   First digit of the 3-digit partition numbers of files in the set.  

For example:
      dayset-2007-12-03-all_4-238290-0   Contains raw data partitions 000-099 of stream all_4 in run 238290
      dayset-2007-12-03-all_3-238290-1   Contains raw data partitions 100-199 of stream all_3 in run 238290
      dayset-2007-12-03-all_3-238290-2   Contains raw data partitions 200-299 of stream all_3 in run 238290
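
    The naming scheme above can be unpacked mechanically.  The following is a minimal sketch, not part of the audit tooling; the names DaysetName, parse_dayset_name, and partition_range are hypothetical, and it assumes that the run and index fields are always purely numeric, so that a stream name such as all_4 can be separated from them.

```python
import re
from typing import NamedTuple


class DaysetName(NamedTuple):
    date: str    # yyyy-mm-dd, date the first partition was written to SAM
    stream: str  # physical datastream name, e.g. "all_4"
    run: int     # run number
    index: int   # first digit of the 3-digit partition numbers


# Assumption: run and index are numeric and sit at the end of the name.
_PATTERN = re.compile(r"^dayset-(\d{4}-\d{2}-\d{2})-(.+)-(\d+)-(\d)$")


def parse_dayset_name(name: str) -> DaysetName:
    """Split a definition name into its date, stream, run, and index."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a dayset definition name: {name!r}")
    date, stream, run, index = match.groups()
    return DaysetName(date, stream, int(run), int(index))


def partition_range(index: int) -> str:
    """Return the partition range covered by an index, e.g. 1 -> '100-199'."""
    first = index * 100
    return f"{first:03d}-{first + 99:03d}"
```

    For instance, parse_dayset_name("dayset-2007-12-03-all_3-238290-1") yields stream all_3, run 238290, and index 1, and partition_range(1) gives the range 100-199 quoted in the example.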

    Note that not all streams from a given run necessarily start on the same day, but all partitions in a given stream and run have the same starting date.  The actual definitions are of the following form:

% sam describe definition --defname=dayset-2007-12-03-all_4-238290-0

Definition Name : dayset-2007-12-03-all_4-238290-0
Definition Id   : 1084378
Description     : raw data run 238290 which started  on 12/03/2007
Creation date   : 05-Dec-2007 06:00:00 (UTC)
User name       : diesburg
Group name      : d0production
Dimensions      : ((((DATA_TIER  raw and TRIG_CONFIG_TYPE  physics) and PHYSICAL_DATASTREAM_NAME  all_4) and RUN_NUMBER  238290) and FILE_PARTITION  000-099)

Raw Files, Events, KB/Ev

    Columns 2-4 display information about the raw data selected by the project.  Column 2 is the number of raw files selected by the project definition.  Column 3 is the total number of events in all files in the project.  Column 4 is the average size of the events in the project in KBytes, calculated as the total file size reported by SAM divided by the total event count.  The data for the raw files is extracted from SAM with a command like:

% sam translate constraints --dim="__set__ dayset-2007-12-03-all_4-238290-0" --summaryOnly
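
    The KB/Ev figure in column 4 is a simple ratio.  As a sketch only: the function name kb_per_event is hypothetical, the total size is assumed to be already expressed in KBytes, and returning 0.0 for an empty dataset is a choice made here rather than documented behavior of the audit.

```python
def kb_per_event(total_size_kb: float, total_events: int) -> float:
    """Average event size in KBytes: total file size over total event count."""
    if total_events <= 0:
        # Assumption: report 0.0 rather than fail when there are no events.
        return 0.0
    return total_size_kb / total_events
```

    With made-up totals of 500000 KBytes and 2000 events, this gives 250 KBytes per event.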

Unmerged Files, Events, %Comp         

    Columns 5-7 display information about unmerged thumbnails produced from the raw data in columns 2-3.  Column 5 is the number of raw files which have a descendant in SAM in the unmerged-thumbnail data tier that was produced by the d0reco application of the appropriate version.  Column 6 is the total number of raw events which have a descendant in SAM in the unmerged-thumbnail data tier that was produced by the d0reco application of the appropriate version.  Column 7 is the percentage by event count of the raw data which has an unmerged thumbnail in SAM.
    Note that these numbers are determined by counting constrained raw data, not by counting unmerged thumbnails and events directly.  The data for these counts is extracted from SAM with a command like:

% sam translate constraints --dim="__set__ dayset-2007-12-03-all_4-238290-0 and file_analyzed > 0 and appl_name_analyzed d0reco and data_tier_analyzed unmerged-thumbnail and version_analyzed p20.11.01"  --summaryOnly
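
    The dimension string in the command above differs between datasets only in the definition name and the reconstruction version, and column 7 is a ratio of event counts.  A minimal sketch of both follows; the function names are hypothetical, the string is simply the one shown above with those two values substituted, and the handling of a zero raw event count is an assumption.

```python
def unmerged_dimensions(defname: str, version: str) -> str:
    """Build the --dim string that constrains raw files to those with an
    unmerged-thumbnail descendant produced by d0reco at the given version."""
    return (
        f"__set__ {defname} and file_analyzed > 0 "
        "and appl_name_analyzed d0reco "
        "and data_tier_analyzed unmerged-thumbnail "
        f"and version_analyzed {version}"
    )


def percent_complete(processed_events: int, raw_events: int) -> float:
    """Percentage, by event count, of the raw data that has been processed."""
    if raw_events <= 0:
        # Assumption: an empty dataset is reported as 0% complete.
        return 0.0
    return 100.0 * processed_events / raw_events
```

    For example, 750 processed events out of 1000 raw events gives 75.0 percent.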

Merged Files, Events, %Comp, KB/Ev

    Columns 8-11 display information about the merged thumbnails produced from the unmerged data in columns 5-7.  Statistics for the merged files are obtained by examining the merged files themselves rather than by constraining the raw file selection.  The list of merged files which match the input raw dataset is first selected with a command like:

% sam translate constraints --dim="data_tier thumbnail and appl_name d0reco and run_number 238290 and physical_datastream_name all_3 and file_name recoT_%_mrg_0% and version p20.11.01"

    The list of merged files is looped over and the number of parents of the merged files is totaled to arrive at the file count listed in column 8.  The event counts of the merged files are totaled to arrive at the event count in column 9.   Column 10 is the percentage by event count of the raw data which has a merged thumbnail in SAM.    Column 11 is the size per event of the merged thumbnail events as determined by the total file size reported by SAM divided by the total event count for the merged files.
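
    The loop over merged files described above can be sketched as follows.  This is illustrative only: the MergedFile record and the summarize_merged function are hypothetical, file sizes are assumed to be in KBytes, and the per-file values would in practice come from SAM metadata.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class MergedFile:
    name: str
    parent_count: int  # number of parents of this merged file
    event_count: int   # events stored in this merged file
    size_kb: float     # file size (assumed here to be in KBytes)


def summarize_merged(
    files: Iterable[MergedFile], raw_events: int
) -> Tuple[int, int, float, float]:
    """Return (files, events, %comp, KB/Ev) as in columns 8-11."""
    files = list(files)
    n_files = sum(f.parent_count for f in files)   # column 8
    n_events = sum(f.event_count for f in files)   # column 9
    total_kb = sum(f.size_kb for f in files)
    pct = 100.0 * n_events / raw_events if raw_events else 0.0  # column 10
    kb_ev = total_kb / n_events if n_events else 0.0            # column 11
    return n_files, n_events, pct, kb_ev
```

    Note that column 8 is a sum of parent counts, not a count of merged files, so it is directly comparable with the raw and unmerged file counts.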

RecoCert Files, Events, %Comp

    Columns 12-14 display information about the RecoCert files produced from the merged data in columns 8-11.  Each merged file of a given dataset is checked to see if it has a descendant with a file name of the form "cert_mergedfilename.root".  The parentage counts of the merged files which have such a descendant are totaled to arrive at the file count in column 12.  The event counts of the descendant files are totaled to arrive at the event count in column 13.  Column 14 is the percentage by event count of raw data which have entries in RecoCert files.
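
    A minimal sketch of the RecoCert bookkeeping described above follows.  The function name and the two input mappings are hypothetical, and the sketch assumes that "cert_mergedfilename.root" means the merged file name wrapped in a cert_ prefix and a .root suffix.

```python
from typing import Dict, List, Tuple


def recocert_totals(
    parent_counts: Dict[str, int],
    descendants: Dict[str, List[Tuple[str, int]]],
) -> Tuple[int, int]:
    """Return (files, events) as in columns 12-13.

    parent_counts: merged file name -> parentage count of that merged file
    descendants:   merged file name -> list of (descendant name, event count)
    """
    n_files = 0
    n_events = 0
    for merged_name, parents in parent_counts.items():
        expected = f"cert_{merged_name}.root"  # assumed naming pattern
        for desc_name, desc_events in descendants.get(merged_name, []):
            if desc_name == expected:
                n_files += parents       # credit the merged file's parents
                n_events += desc_events  # events come from the RecoCert file
                break
    return n_files, n_events
```

    A merged file without a matching descendant contributes nothing to either total, which is what produces a RecoCert completion percentage below that of the merged data.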

Delta Files, Events

    Columns 15-20 contain the number of missing files and events for unmerged, merged, and RecoCert data, in that order.  Note that there is no additional consistency information in these numbers.  They are simply differences calculated from the preceding columns.
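
    As an illustration of how such differences could be formed, the sketch below subtracts each tier's counts from the raw counts.  Both the function name and two details are assumptions not stated in the text: that each difference is taken against the raw data (rather than against the preceding tier), and that the columns are ordered as a files/events pair per tier.

```python
from typing import List, Sequence, Tuple


def delta_columns(
    raw_files: int,
    raw_events: int,
    tiers: Sequence[Tuple[int, int]],
) -> List[int]:
    """Return missing (files, events) per tier, flattened in tier order.

    tiers: (files, events) for unmerged, merged, and RecoCert, in that order.
    Assumption: each tier is compared against the raw counts.
    """
    deltas: List[int] = []
    for files, events in tiers:
        deltas.append(raw_files - files)
        deltas.append(raw_events - events)
    return deltas
```

    With 100 raw files and 10000 raw events, tiers of (100, 10000), (90, 9000), and (80, 8000) would give 0, 0, 10, 1000, 20, 2000.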

Status

    Column 21 contains the completion status of the dataset.  The completion status is determined by comparing the raw, unmerged, and merged file counts.   The RecoCert counts are not considered in setting the status.    The possible values of the status are:

COMPLETE   =    No further processing to do.  All raw files in the dataset were successfully
                processed and merged and the merged files stored in SAM.
FINISHED   =    No further processing to do.  Some raw files failed reconstruction and cannot be recovered.
ACTIVE     =    Processing still in progress or not yet started.
RECHECK    =    Status to be rechecked at the next audit update.

    The status of "FINISHED" cannot be unequivocally determined from the file counts.  If a dataset is merged before all files have been reconstructed then it may be incorrectly flagged as "FINISHED".  "FINISHED" datasets will be periodically reviewed to ensure their status is correct.
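
    The exact comparison rules are not spelled out here, so the following is only a guess at how the file counts might map to a status, written to illustrate why "FINISHED" is ambiguous.  The function name and the rules themselves are assumptions, and "RECHECK" is not modeled because the text does not say what triggers it.

```python
def guess_status(raw_files: int, unmerged_files: int, merged_files: int) -> str:
    """Hypothetical status rule based only on raw, unmerged, and merged counts."""
    # Assumed rule: everything that was selected has been merged.
    if raw_files > 0 and merged_files >= raw_files:
        return "COMPLETE"
    # Assumed rule: all reconstructed files are merged, but some raw files
    # are missing.  The counts look the same whether those files failed for
    # good or simply have not been reconstructed yet.
    if unmerged_files > 0 and merged_files == unmerged_files:
        return "FINISHED"
    return "ACTIVE"
```

    Under these assumed rules, counts of (100, 90, 90) are reported as "FINISHED" even if the remaining 10 raw files are still waiting to be reconstructed, which is the misclassification described above.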

Version

    Column 22 contains the version of D0reco (and RecoCert) that was used to process the dataset.  This version is taken from the metadata of a merged TMB file (specifically, the last merged TMB reported by SAM).  Consequently, no version will be reported until at least one merged TMB has been stored in SAM.  An ACTIVE dataset which does not yet have any merged TMB files in SAM will show '---------' in this column.