An example to store one file

From the pick_event software, I got a file dimuons_set17a_171.out together with its metadata file
dimuons_set17a_171.out.metadata.py, which looks like this:
from import_classes import * 
TheFile = ProcessedFile(
name = 'dimuons_set17a_171.out',
sizeK = 14956,
events = Events(2463870, 3221837, 100),
stream = '',
tier = 'reconstructed-bygroup',
start_time = '06/18/2003 20:25:03',
end_time = '06/18/2003 21:42:06',
pid = 2074981,
parents = ['all_0000175681_039.raw', 'all_0000175681_040.raw', 'all_0000175681_042.raw',
'all_0000175681_050.raw', 'all_0000175681_048.raw', 'all_0000175681_041.raw',
'all_0000175681_052.raw', 'all_0000175681_045.raw', 'all_0000175681_049.raw',
'all_0000175681_047.raw', 'all_0000175681_043.raw', 'all_0000175681_046.raw',
'all_0000175681_044.raw', 'all_0000175681_051.raw'])

You should do the following in order to store this file into SAM:

(1) Change the 'tier' to 'raw-bygroup' instead of 'reconstructed-bygroup', 'reconstructed' or anything else, since this is raw data. That is, the line with 'tier' should be:

tier = 'raw-bygroup',

(2) Since this manual storing procedure doesn't have any flags to distinguish our (Bphysics group) data from that of other groups, you should also make the file names distinguishable.

For example, I changed the filename from "dimuons_set17a_171.out" to "bphys_dimuons_set17a_171.out".

Since the filename has changed, you also have to change the name in the metadata accordingly, i.e.,

name = 'bphys_dimuons_set17a_171.out',

Finally, save the new metadata file as "bphys_dimuons_set17a_171.out.metadata.py".
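
Steps (1) and (2) amount to two text substitutions in the metadata file. The following is a minimal Python sketch of them, assuming the metadata contains "name = '...'" and "tier = '...'" lines exactly as shown above. The function name "rewrite_metadata" is made up for illustration and is not part of SAM.

```python
import re


def rewrite_metadata(text, old_name, new_name, new_tier="raw-bygroup"):
    """Return the metadata text with the file name and the tier replaced."""
    # Replace only the quoted file name on the "name = ..." line.
    text = re.sub(
        r"(\bname\s*=\s*)(['\"])" + re.escape(old_name) + r"\2",
        lambda m: f"{m.group(1)}{m.group(2)}{new_name}{m.group(2)}",
        text,
    )
    # Replace whatever value the "tier = ..." line currently holds.
    text = re.sub(
        r"(\btier\s*=\s*)(['\"])[^'\"]*\2",
        lambda m: f"{m.group(1)}{m.group(2)}{new_tier}{m.group(2)}",
        text,
    )
    return text


if __name__ == "__main__":
    # A two-line excerpt of the metadata shown above, used as sample input.
    sample = (
        "name = 'dimuons_set17a_171.out',\n"
        "tier = 'reconstructed-bygroup',\n"
    )
    print(rewrite_metadata(sample,
                           "dimuons_set17a_171.out",
                           "bphys_dimuons_set17a_171.out"))
```

To use it on a real file, read "dimuons_set17a_171.out.metadata.py", pass its contents through the function, and write the result to "bphys_dimuons_set17a_171.out.metadata.py".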

(3) With these changes, you can store the data into SAM (assuming that both the data file and the metadata file are in the current directory) by

sam store --descrip=bphys_dimuons_set17a_171.out.metadata.py --source=. --dest=/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all

on one line. This assumes that you have already run

setup sam


An example to store many files in a directory

Since I have many files matching the pattern "dimuons_set17a*.out" in the same directory, I wrote the following csh script, called "many_store":

#!/bin/csh
# many_store: rename, re-tag and store every file matching the pattern below.

set pattern = "dimuons_set17a"
set files = `ls ${pattern}*.out`

# Stored files and their new metadata end up in ./stored.
if (! -d ./stored ) mkdir ./stored

setup sam

foreach pick ( $files )
    # Write new metadata with the bphys_ prefix in the name and the raw-bygroup tier.
    sed "s/${pattern}/bphys_${pattern}/" ${pick}.metadata.py | sed 's/reconstructed-bygroup/raw-bygroup/' >! bphys_${pick}.metadata.py

    # Move the original metadata out of the way and rename the data file.
    mv ${pick}.metadata.py /scratch7/kinyip/sam_store
    mv ${pick} bphys_${pick}

    sam store --descrip=bphys_${pick}.metadata.py --source=. --dest=/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all

    # Remove the compiled metadata file (.pyc), then move the stored files aside.
    rm bphys_${pick}.metadata.pyc
    mv bphys_${pick}* ./stored

    # Cautiously checking whether the file is stored into SAM
    sam locate bphys_${pick} | grep pnfs

    if ( $status != 0 ) then
        echo "bphys_${pick} is not stored !" >>! bad.list
    else
        echo "bphys_${pick} is stored."
    endif

end
#################################################################################


To execute, just do:

chmod u+x many_store
./many_store &

The script name here is "many_store". ("chmod" is only needed in case the script "many_store" is not yet executable.)
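
Before launching the script on a large directory, it can help to see which files it would pick up and whether each one has its metadata file next to it. The following Python sketch does only that check and does not call SAM or touch any file. The function name "plan_batch" and its arguments are made up for illustration; the file pattern and the "bphys_" prefix are the ones used in the script above.

```python
from pathlib import Path


def plan_batch(directory, pattern="dimuons_set17a", prefix="bphys_"):
    """Return (plan, missing) for the data files found in `directory`.

    plan    -- list of (old_name, new_name) for files that have metadata
    missing -- list of data files without a matching .metadata.py file
    """
    plan, missing = [], []
    # Same selection as `ls ${pattern}*.out` in the csh script.
    for data in sorted(Path(directory).glob(f"{pattern}*.out")):
        meta = data.with_name(data.name + ".metadata.py")
        if meta.exists():
            plan.append((data.name, prefix + data.name))
        else:
            missing.append(data.name)
    return plan, missing


if __name__ == "__main__":
    plan, missing = plan_batch(".")
    for old, new in plan:
        print(f"{old} -> {new}")
    for name in missing:
        print(f"{name}: no metadata file found")
```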

A few explanations and points of significance :


To verify

You may do, for example, 

        sam locate bphys_dimuons_set17a_171.out

to verify whether you have successfully stored the file "bphys_dimuons_set17a_171.out".

If successful, you should see something like :

['/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all,prq759']


If something screwed up ...

If, for some reason, you see only

[]

when you do

                        sam locate your_filename

it means that your file has been declared but not stored successfully.

If you want to store it a second time, you need to add the "--resubmit" flag at the end of the sam store command line.
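
The two cases above (a list with a /pnfs location versus an empty list) can also be told apart programmatically. The following is a minimal Python sketch, assuming that the output of "sam locate" is a Python-style list literal exactly as in the examples shown here. The function names are made up for illustration and are not part of SAM.

```python
import ast


def parse_locations(output):
    """Parse `sam locate` output, assumed to be a Python-style list literal."""
    locations = ast.literal_eval(output.strip())
    if not isinstance(locations, list):
        raise ValueError(f"unexpected output: {output!r}")
    return locations


def is_stored(output, marker="pnfs"):
    """True if at least one reported location contains `marker`."""
    return any(marker in location for location in parse_locations(output))


if __name__ == "__main__":
    good = "['/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all,prq759']"
    print(is_stored(good))
    print(is_stored("[]"))
```

This mirrors the "grep pnfs" test in the csh script: an empty list means the file is declared but has no location yet.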
 


To create a dataset definition

(0) The following uses SAM from the command line. Many people prefer to do this at the
Dataset Definition Editor website; the two are equivalent. On the command line, run
"setup sam" first (if you haven't already) before any of the following commands.

(1) Check the constraints first by using the "sam translate constraints" command, such as:

sam translate constraints --dim="(FILE_NAME bphys_dimuons_set17%.out) and (DATA_TIER raw-bygroup) and
(FULL_PATH /pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all) and
(AVAILABILITY_STATUS available) and (CONTENT_STATUS good)"

in one command line and you should see something like:

Files:
   bphys_dimuons_set17b_0.out
   bphys_dimuons_set17b_1.out
   bphys_dimuons_set17b_10.out
   ............
   ............

File Count:  766
Average File Size:  14879
Total File Size:  11397767
Total Event Count:  76509


Make sure that the number of files is exactly what you expect.
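
If you save the output of the command to a file, the file count can be checked automatically. The following is a small Python sketch, assuming that the summary contains a "File Count:" line in the format shown above. The function name "file_count" is made up for illustration.

```python
import re


def file_count(output):
    """Return the integer on the 'File Count:' line, or None if it is absent."""
    match = re.search(r"^File Count:\s*(\d+)\s*$", output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Summary lines copied from the example output above.
    sample = (
        "File Count:  766\n"
        "Average File Size:  14879\n"
        "Total File Size:  11397767\n"
        "Total Event Count:  76509\n"
    )
    expected = 766  # the number of files you believe you have stored
    found = file_count(sample)
    if found != expected:
        print(f"expected {expected} files but the summary reports {found}")
```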

(2) All the constraints are inside the --dim="..." option. I have chosen to over-constrain a bit by using
"FULL_PATH", which probably few people (if any) have done. But since we don't have
any other flags (such as the application name/family/version), I want to
make sure that the dataset will not pick up other files in the future if someone happens to
choose the same filename pattern (however unlikely that is).

And yes (!), the number of files in a dataset definition may change over time, depending on the constraints.

(3) To create the dataset definition, do for example,

sam create dataset definition --defname=bphysics-dimuons-picked-set17 --group=bphysics
--dim="(FILE_NAME bphys_dimuons_set17%.out) and (DATA_TIER raw-bygroup) and
(FULL_PATH /pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all) and
(AVAILABILITY_STATUS available) and (CONTENT_STATUS good)"

in one command line.

The dataset definition created here is : "bphysics-dimuons-picked-set17".
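
Since the same long --dim string is used for both the check in (1) and the creation in (3), it is easy to make a typo in one of them. The following Python sketch builds the string once and prints both command lines so that they can be pasted as single lines. The helper "build_dim" is made up for illustration; the sam commands and options are only those shown in this section, and nothing is executed.

```python
import shlex


def build_dim(constraints):
    """Join (DIMENSION, value) pairs as "(DIM value) and (DIM value) ..."."""
    return " and ".join(f"({name} {value})" for name, value in constraints)


# The constraints used in steps (1) and (3) above.
constraints = [
    ("FILE_NAME", "bphys_dimuons_set17%.out"),
    ("DATA_TIER", "raw-bygroup"),
    ("FULL_PATH",
     "/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all"),
    ("AVAILABILITY_STATUS", "available"),
    ("CONTENT_STATUS", "good"),
]

dim = build_dim(constraints)

check_cmd = ["sam", "translate", "constraints", f"--dim={dim}"]
create_cmd = [
    "sam", "create", "dataset", "definition",
    "--defname=bphysics-dimuons-picked-set17",
    "--group=bphysics",
    f"--dim={dim}",
]

# Print shell-quoted, single-line versions of both commands.
print(shlex.join(check_cmd))
print(shlex.join(create_cmd))
```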

(4) To check whether this dataset definition contains the correct no. of files, do for example,

 sam translate constraints --dim="__SET__ bphysics-dimuons-picked-set17"

and you should see the same output as shown above in (1).