From the pick_event software, I got a file, dimuons_set17a_171.out, whose metadata file,
dimuons_set17a_171.out.metadata.py, looks like this:
from import_classes import *
TheFile = ProcessedFile(
name = 'dimuons_set17a_171.out',
sizeK = 14956,
events = Events(2463870, 3221837, 100),
stream = '',
tier = 'reconstructed-bygroup',
start_time = '06/18/2003 20:25:03',
end_time = '06/18/2003 21:42:06',
pid = 2074981,
parents = ['all_0000175681_039.raw', 'all_0000175681_040.raw', 'all_0000175681_042.raw',
'all_0000175681_050.raw', 'all_0000175681_048.raw', 'all_0000175681_041.raw',
'all_0000175681_052.raw', 'all_0000175681_045.raw', 'all_0000175681_049.raw',
'all_0000175681_047.raw', 'all_0000175681_043.raw', 'all_0000175681_046.raw',
'all_0000175681_044.raw', 'all_0000175681_051.raw'])
To store this file into SAM, do the following:
(1) Change the 'tier' to 'raw-bygroup' instead of 'reconstructed-bygroup', 'reconstructed' or anything else (since this is raw data). That is, the line with 'tier' should be:
tier = 'raw-bygroup',
(2) Since this manual storing procedure doesn't have any flags to distinguish our (Bphysics group) data from that of other groups, you also want to make the file names themselves distinguishable.
I have changed the filename from "dimuons_set17a_171.out" to "bphys_dimuons_set17a_171.out".
Since the filename changes, the name in the metadata has to change accordingly, i.e.,
name = 'bphys_dimuons_set17a_171.out',
Finally, I saved the new metadata file as "bphys_dimuons_set17a_171.out.metadata.py".
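The two metadata edits in (1) and (2) can be sketched in Python as plain text substitutions. This is only an illustration: the function name rewrite_metadata and its parameters are hypothetical, and the sample text is a shortened version of the metadata shown above.

```python
import re

def rewrite_metadata(text, old_name, prefix="bphys_", new_tier="raw-bygroup"):
    """Return metadata text with the file name prefixed and the tier replaced.

    Hypothetical helper for illustration; it is not part of SAM.
    """
    # Prefix the file name only where it appears as a quoted value.
    text = text.replace("'%s'" % old_name, "'%s%s'" % (prefix, old_name))
    # Replace whatever tier string is present with the new tier.
    text = re.sub(r"tier\s*=\s*'[^']*'", "tier = '%s'" % new_tier, text)
    return text

# Shortened sample of the metadata file shown above.
sample = """TheFile = ProcessedFile(
name = 'dimuons_set17a_171.out',
tier = 'reconstructed-bygroup',
)"""

new_text = rewrite_metadata(sample, "dimuons_set17a_171.out")
print(new_text)
```

The result would then be written to a file named after the new data file, with ".metadata.py" appended.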
(3) With these changes, you can store the data into SAM (assuming that both the data file and the metadata file are in the current directory) with
sam store --descrip=bphys_dimuons_set17a_171.out.metadata.py --source=. --dest=/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all
on one line (assuming that you have already done "setup sam").
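If you prefer to assemble the command from a script, the command line above can be built as an argument list. This is a sketch only: build_store_command is a hypothetical helper, it uses just the three options shown above, and it does not run anything by itself.

```python
def build_store_command(metadata_file, dest, source="."):
    """Build the `sam store` argument list (hypothetical helper, nothing is executed)."""
    return [
        "sam", "store",
        "--descrip=" + metadata_file,
        "--source=" + source,
        "--dest=" + dest,
    ]

cmd = build_store_command(
    "bphys_dimuons_set17a_171.out.metadata.py",
    "/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all",
)
print(" ".join(cmd))
```

The list could be handed to subprocess.run(cmd) on a machine where "setup sam" has already been done.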
Since I have many files matching the pattern "dimuons_set17a*.out" in the same directory, I wrote the following csh script, called "many_store":
#!/bin/csh
# Rename, re-tier and store every file that matches the pattern.
set pattern = "dimuons_set17a"
set files = `ls ${pattern}*.out`
if (! -d ./stored ) mkdir ./stored
setup sam
foreach pick ( $files )
    # Write the new metadata: prefix the file name with bphys_ and set the tier to raw-bygroup
    sed "s/${pattern}/bphys_${pattern}/" ${pick}.metadata.py | sed 's/reconstructed-bygroup/raw-bygroup/' >! bphys_${pick}.metadata.py
    # Move the original metadata out of the way and rename the data file
    mv ${pick}.metadata.py /scratch7/kinyip/sam_store
    mv ${pick} bphys_${pick}
    sam store --descrip=bphys_${pick}.metadata.py --source=. --dest=/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all
    # Remove the compiled .pyc file, if one exists, and move the processed files aside
    rm -f bphys_${pick}.metadata.pyc
    mv bphys_${pick}* ./stored
    # Cautiously checking whether the file is stored into SAM
    sam locate bphys_${pick} | grep pnfs
    if ( $status != 0 ) then
        echo "bphys_${pick} is not stored !" >>! bad.list
    else
        echo "bphys_${pick} is stored."
    endif
end
#################################################################################
To run it in the background (remembering that the script name is "many_store"), do
chmod u+x many_store
many_store &
("chmod" is there just in case the script "many_store" is not executable.)
A few explanations and points of significance:
To verify whether you have successfully stored the file "bphys_dimuons_set17a_171.out", you may do, for example,
sam locate bphys_dimuons_set17a_171.out
If successful, you should see something like:
['/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all,prq759']
If, for some reason, you see only
[]
when you do "sam locate your_filename", your file has been declared but not stored successfully.
If you want to store it a second time, you need to add the "--resubmit" flag at the end of the sam store command line.
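The check on the "sam locate" output can also be automated. The sketch below assumes that the output contains one line formatted like the examples above (a Python-style list, either empty or holding location strings); parse_locations and is_stored are hypothetical helper names.

```python
import ast

def parse_locations(locate_output):
    """Parse the list printed by `sam locate` into a Python list of strings.

    Assumes one output line looks like the examples above, e.g. [] or ['path,label'].
    """
    for line in locate_output.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            return ast.literal_eval(line)
    return []

def is_stored(locate_output):
    """True if at least one location is listed, False for an empty list."""
    return bool(parse_locations(locate_output))

good = "['/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all,prq759']"
bad = "[]"
print(is_stored(good), is_stored(bad))
```

A file for which is_stored returns False is a candidate for a second "sam store" with the "--resubmit" flag.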
(0) The following uses SAM from the command line. Many people prefer to do this on the Dataset Definition Editor website; the two are equivalent. On the command line, execute "setup sam" first (if you haven't already) before any of the following commands.
(1) Check the selection with the "sam translate constraints" command, such as:
sam translate constraints --dim="(FILE_NAME bphys_dimuons_set17%.out) and (DATA_TIER raw-bygroup) and (FULL_PATH /pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all) and (AVAILABILITY_STATUS available) and (CONTENT_STATUS good)"
on one command line, and you should see something like:
Files:
bphys_dimuons_set17b_0.out
bphys_dimuons_set17b_1.out
bphys_dimuons_set17b_10.out
............
............
File Count: 766
Average File Size: 14879
Total File Size: 11397767
Total Event Count: 76509
Make sure that the number of files is exactly what you expect.
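To compare the file count against what you expect without reading it off the screen, the summary lines can be parsed. This is a minimal sketch that assumes the summary is printed exactly as in the sample output above; parse_summary is a hypothetical helper.

```python
import re

# Summary labels as they appear in the sample output above.
SUMMARY_LINE = re.compile(
    r"\s*(File Count|Average File Size|Total File Size|Total Event Count):\s*(\d+)\s*$"
)

def parse_summary(output):
    """Collect the numeric summary lines into a dictionary (hypothetical helper)."""
    summary = {}
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            summary[match.group(1)] = int(match.group(2))
    return summary

sample_output = """Files:
bphys_dimuons_set17b_0.out
bphys_dimuons_set17b_1.out
File Count: 766
Average File Size: 14879
Total File Size: 11397767
Total Event Count: 76509
"""

summary = parse_summary(sample_output)
expected_files = 766  # the number of files you believe you stored
print(summary["File Count"] == expected_files)
```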
(2) All the constraints are inside the --dim="..." option. I have chosen to over-constrain a bit by using "FULL_PATH", which probably few people (if any) have done. But since we don't have any other flags (such as the application name/family/version), I want to make very sure that the dataset will not pick up other files in the future, should someone accidentally choose the same filename pattern (however unlikely that may be).
And yes (!), the number of files in a dataset definition may change over time, depending on the constraints.
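Because the --dim string is long and easy to mistype, it may help to assemble it from a list of (dimension, value) pairs. The sketch below only builds the string shown in (1); build_dimensions is a hypothetical helper, and the dimension names are those used in this note.

```python
def build_dimensions(constraints):
    """Join (dimension, value) pairs into a --dim string (hypothetical helper)."""
    return " and ".join("(%s %s)" % (name, value) for name, value in constraints)

constraints = [
    ("FILE_NAME", "bphys_dimuons_set17%.out"),
    ("DATA_TIER", "raw-bygroup"),
    ("FULL_PATH", "/pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all"),
    ("AVAILABILITY_STATUS", "available"),
    ("CONTENT_STATUS", "good"),
]

dim = build_dimensions(constraints)
print('--dim="%s"' % dim)
```

Keeping the constraints in one list means that the checking command and the command that creates the dataset definition can share exactly the same string.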
(3) To create the dataset definition, do, for example,
sam create dataset definition --defname=bphysics-dimuons-picked-set17 --group=bphysics --dim="(FILE_NAME bphys_dimuons_set17%.out) and (DATA_TIER raw-bygroup) and (FULL_PATH /pnfs/sam/dzero/copy1/physics_data_taking/group-phase1/bphysics/raw/all) and (AVAILABILITY_STATUS available) and (CONTENT_STATUS good)"
on one command line.
The dataset definition created here is: "bphysics-dimuons-picked-set17".
(4) To check whether this dataset definition contains the correct number of files, do, for example,
sam translate constraints --dim="__SET__ bphysics-dimuons-picked-set17"
and you should see the same output as shown above in (1).
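Comparing two long listings by eye is error-prone, so here is a small sketch that compares the file names in two "sam translate constraints" outputs. It assumes the layout of the sample output in (1), i.e. a "Files:" line, one file name per line, then the "File Count:" line; parse_file_names is a hypothetical helper and the two sample strings are shortened.

```python
def parse_file_names(output):
    """Return the file names listed between 'Files:' and 'File Count:' (hypothetical helper)."""
    names = []
    in_files = False
    for line in output.splitlines():
        line = line.strip()
        if line == "Files:":
            in_files = True
        elif in_files:
            if line.startswith("File Count:"):
                break
            if line:  # skip blank lines
                names.append(line)
    return names

# Shortened sample outputs: one from the explicit constraints, one from the dataset definition.
out_dim = """Files:
bphys_dimuons_set17b_0.out
bphys_dimuons_set17b_1.out
bphys_dimuons_set17b_10.out
File Count: 3
"""
out_set = """Files:
bphys_dimuons_set17b_10.out
bphys_dimuons_set17b_0.out
bphys_dimuons_set17b_1.out
File Count: 3
"""

same = set(parse_file_names(out_dim)) == set(parse_file_names(out_set))
print(same)
```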