I am assuming you are doing your analysis on d0mino. If you do your work elsewhere, there may be subtle differences in the setups and path names, among other things. Much of this is covered in greater detail in Heidi Schellman's tutorial. I highly recommend reading it, as I only cover what is needed to set up and run a single package.
I save myself time by putting the following setups in the .login file in my home directory:
You do not need to set up a new release each time you log in to d0mino. A new release is only necessary if you wish to use the latest and freshest D0 code. It is not uncommon to be using a version one or two generations behind the most recent. The same is true for adding a package. If the author of a package has informed you of a change, or if you want to make sure you are using the head version, you can obtain the updates by issuing the following command:
Before you compile your package and produce an executable, you need to samify your package. For the cal_entpl package, Laurent has already done most of the work for us:
| samify cal_entpl |
|
#!/usr/local/bin/tcsh -f
#cvs checkout sam_manager/rcp
echo "sam_manager" >> $1/bin/LIBRARIES
echo "RegSAMManager" >> $1/bin/OBJECTS
exit 0
|
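The script appends entries to the OBJECTS and LIBRARIES files in the package's bin directory, but it does not check out sam_manager, since the cvs checkout line is commented out.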
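As a cross-check on what the samify script does, here is a minimal Python sketch of the same two appends. Only the two file names and the two entries come from the script; the function name `samify`, the return value, and the duplicate check are assumptions added for illustration (the tcsh script appends unconditionally).

```python
from pathlib import Path

# Entries taken from the samify script; everything else here is illustrative.
SAM_ENTRIES = {"LIBRARIES": "sam_manager", "OBJECTS": "RegSAMManager"}


def samify(package_dir):
    """Append the SAM entries to <package_dir>/bin/LIBRARIES and bin/OBJECTS.

    Unlike the tcsh script, an entry that is already present is skipped, so
    running this twice does not duplicate lines (an added assumption).
    Returns the names of the files that were actually modified.
    """
    changed = []
    bin_dir = Path(package_dir) / "bin"
    for file_name, entry in SAM_ENTRIES.items():
        path = bin_dir / file_name
        text = path.read_text() if path.exists() else ""
        if entry in text.split():
            continue
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as handle:
            handle.write(prefix + entry + "\n")
        changed.append(file_name)
    return changed
```

Calling `samify("cal_entpl")` from the directory that holds the package would mirror `samify cal_entpl`, provided `cal_entpl/bin` already exists.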
Next, go to your working area directory, and type the following commands:
There is a framework.rcp file in the directory:
|
string Packages = "sam read runcfg geo globalnt calana unpack"
RCP sam = < cal_entpl SAM >
RCP read = < cal_entpl ReadEvent >
RCP runcfg = < cal_entpl run_config_mgr >
RCP geo = < geometry_management geometry_management >
RCP globalnt = < cal_entpl NtupleMgr >
RCP unpack = < unpack_reco UnpackReco_CAL_FE >
RCP calana = < cal_entpl CalElecNtupleMaker >
untracked int debugLevel = 0
|
If your file does not look like this, I recommend that you modify it so it does. Note that sam has been added to the Packages string, and an RCP sam line has also been added.
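If you would rather check the two SAM edits mechanically than by eye, something along these lines should do. It is only a sketch: the function name is hypothetical, and it does a plain text search rather than a real RCP parse, so it will not notice a line that has been commented out.

```python
import re


def check_sam_in_framework(rcp_text):
    """Return a list of problems; an empty list means both SAM edits are present."""
    problems = []
    # Look for: string Packages = "sam read ..."
    match = re.search(r'string\s+Packages\s*=\s*"([^"]*)"', rcp_text)
    packages = match.group(1).split() if match else []
    if "sam" not in packages:
        problems.append('"sam" is missing from the Packages string')
    # Look for: RCP sam = < ... >
    if not re.search(r"\bRCP\s+sam\s*=", rcp_text):
        problems.append("no 'RCP sam = < ... >' line")
    return problems
```

For example, `check_sam_in_framework(open("framework.rcp").read())` returns an empty list for the file shown above.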
If you wish to retrieve and analyze a SAM file, you need to modify the ReadEvent.rcp file. Comment out:
Now you will want to use a previously created dataset, or create one of your own in SAM. Go to the Sam Data Browsing page. If you know about a dataset created by someone, you can search in Datasets by name, date, user, etc.
If you want to create a dataset of your own, you will need to work with a set of parameters in order to narrow your search. Type the following commands:
Specify dimensions and constraints combined with and/or/minus operators as in these examples:
--dim='file_name %ztautau% and data_tier digitized'
--rpn='file_name %ztautau% data_tier digitized and'
--dim='file_name %ztautau%,%ztigtig% or physical_datastream_name e+j'
--dim='(data_tier digitized and appl_name d0reco and version preco03.07.00) minus run_number 40041'
Available dimensions (not case sensitive):
APPL_NAME : Application Name that was run on other files, resulting in the production of this file.
APPL_NAME_ANALYZED : Application Name that was run to analyze this file.
CREATE_DATE : Date the file was created.
DATASET_DEF_ID : Dataset definition id for a definition that contains this file in one of its datasets.
Useful with the Dataset_Version dimension.
DATASET_DEF_NAME : Dataset definition name for a definition that contains the file in one of its datasets.
Useful with the Dataset_Version dimension.
DATASET_ID : Numeric ID of a dataset that contains the file.
DATASET_VERSION : Version of a dataset that contains the file. Useful when combined with either the
Dataset_Def_Id or Dataset_Def_Name dimensions.
DATA_FILE_LOCATION_STATUS : Status of the data file location.
DATA_FILE_NAME : Unique name of the file in SAM. The wildcard (%) is very useful when using this dimension.
DATA_TIER : Data tier of the file.
DELIVERED_STATUS : Status of the file delivery.
EVENT_NUMBER : Event number contained within the file.
FAMILY : Application Family that was run on other files, resulting in the production of this file.
FAMILY_ANALYZED : Application Family that was run to analyze this file.
FILE_ANALYZED : Name of a data file that was analyzed to produce this file.
FILE_NAME : Unique name of the file in SAM. The wildcard (%) is very useful when using this dimension.
FILE_STATUS : Status of the data file.
FULL_PATH : The full path of the data file, including disk or tape location.
LOGICAL_DATASTREAM_NAME : The name of the logical datastream contained in this file.
PATH : The path of the data file, excluding the file name itself.
PHYSICAL_DATASTREAM_NAME : The name of the physical datastream contained in this file.
PROJECT_NAME : The name of the project that was run to produce this file.
RUN_ID :
RUN_NUMBER : The Run Number that created this file.
RUN_TYPE : The Run Type of the run that created this file.
RUN_TYPE_ID : The numeric ID of the Run Type that created this file.
TAPE_LABEL : The label on the tape that contains this file.
VERSION : The Application Version that was run on other files, resulting in the production of this file.
VERSION_ANALYZED : Application Name that was run to analyze this file.
__SET__ : Special dimension allowing you to query all files that match another dataset definition name.
This is useful for combining with union/and/or operators on your own set of dimensions. Simply use
__SET__ as your dimension name and the name of your existing definition as the constraint value, e.g.
--dim='file_name %ztautau% minus __set__ my-files-already-analyzed'
The dimension __SET__ is a special dimension which lets you combine prior dataset definitions into your
new dataset definition, simply use __SET__ as your dimension name and the name of the existing
dataset definition as the constraint value, e.g.
--dim='file_name %ztautau% minus __set__ my-files-already-analyzed'
|
For additional information on Dimension Names, Constraint Operators and Set Operators, go to the page on SAM Dataset Definition Grammar.
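The first two examples in the help text express the same query twice, once in infix form for --dim and once in postfix form for --rpn. The sketch below builds both strings from a list of (dimension, value) pairs joined with and. The function names are hypothetical, and only the two-constraint output is confirmed by the examples above; how three or more constraints should be chained in the --rpn form is an assumption.

```python
def to_dim(constraints):
    """Join (dimension, value) pairs with 'and' in the infix form used by --dim."""
    return " and ".join(f"{name} {value}" for name, value in constraints)


def to_rpn(constraints):
    """Same query in postfix form for --rpn: operands first, then the operator.

    With more than two constraints the operators are applied left to right
    (a b and c and), which is an assumption beyond the documented example.
    """
    terms = [f"{name} {value}" for name, value in constraints]
    if not terms:
        return ""
    parts = [terms[0]]
    for term in terms[1:]:
        parts.extend([term, "and"])
    return " ".join(parts)


if __name__ == "__main__":
    query = [("file_name", "%ztautau%"), ("data_tier", "digitized")]
    print(f"--dim='{to_dim(query)}'")
    print(f"--rpn='{to_rpn(query)}'")
```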
If you want to look at recent raw data from a global run, you could type:
Files:
all_0000123872_001.raw
all_0000123873_001.raw
all_0000123873_002.raw
all_0000123873_003.raw
all_0000123874_001.raw
all_0000123874_002.raw
all_0000123874_003.raw
all_0000123874_004.raw
all_0000123874_005.raw
all_0000123874_006.raw
all_0000123874_007.raw
File Count: 11
Average File Size: 366269

Other options for physical_datastream_name are store_1x8, cosmics, daq_test and calibration. As for data_tier, you can find more options on this SAM Query Page. The full sam translate constraints --dim='...' command is issued on a continuous line.
| Attention: The physical datastream name is just a part of the file name in SAM. The all stream is currently synonymous with 36x36 collisions, i.e. real physics. However, store_1x8 has accidentally been used in the names of files recorded during proton and pbar halo runs. It is better to search the Runs Database for particular trigger configurations, and take that information to SAM when creating a dataset. |
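Since the stream name is only a label embedded in the file name, it can help to pull the run number out of each name and compare it against the Runs Database. The sketch below does that for raw files. The name pattern (stream, ten-digit run number, three-digit sequence, .raw) is inferred from the file listing on this page and is an assumption, not a documented SAM convention; the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the listing: <stream>_<10-digit run>_<3-digit sequence>.raw
RAW_NAME = re.compile(r"^(?P<stream>.+)_(?P<run>\d{10})_(?P<seq>\d{3})\.raw$")


def parse_raw_name(file_name):
    """Split a raw file name into (stream, run_number, sequence)."""
    match = RAW_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unexpected raw file name: {file_name}")
    return match["stream"], int(match["run"]), int(match["seq"])


def files_per_run(file_names):
    """Count how many files each run contributes to a listing."""
    return Counter(parse_raw_name(name)[1] for name in file_names)
```

Applied to the eleven files listed above, `files_per_run` gives one file for run 123872, three for 123873 and seven for 123874.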
Now you are ready to Define Your Dataset. Taking the example from above:
You should practice using the Sam Data Browsing page. Search for your Dataset Definition with key words like:
Dataset Definition Name
Persons First Name
Persons Last Name
Username
Physics Work Group
Started Before: (dd-mon-yyyy:) Ex: 06-JUN-1999
Started On or After: (dd-mon-yyyy:) Ex: 06-JUN-1999
The next step is to Create Your Dataset.
There is an alternative and, in my opinion, easier way to create a dataset in SAM. The Dataset Definition Editor interface allows you to create a new dataset or clone a previously defined dataset. As an example of the latter, you can click on Person, then the user name alstone. There you will find several datasets. Click on raw_run_124110, and you will see the following in your browser:
You cannot edit this dataset, but if you click on the clone button, a copy of the dataset is produced. You can then edit the clone to fit your needs. The clone will look like:
You can change the name, the group, the user and/or the dimension query. If you do change the dimension fields, you can always click on the translate button, and an updated dataset will appear in the window at the bottom of the page. Once you are satisfied, you can save the cloned dataset, which is the same as defining a new one. Make sure you are happy with the name before saving it. The default name just adds a Clone- prefix to the original definition name.
Wyatt Merritt put together a quick tutorial on the Dataset Definition Editor with more details than I gave above, particularly in the case of starting at the beginning with a New Dataset.
|
Please Note:
SAM does not guarantee that the files in your dataset will be delivered in a particular order. If you need to work with only a single file, or you want to repeat your analysis on the same set of data files with a new version of your code, you should not rely on SAM to provide the files in the order and combination in which you need them. A Python script exists that people have used to cache files so they could analyze SAM files in a particular group or sequence. However, the sam run project command is being deprecated, and should no longer be used with the Python script.

You can search for files and/or datasets to find out whether particular files are already cached on d0mino. Cached files should stay in cache indefinitely, unless the disk quota for that working group has been reached. If that occurs, the oldest unused cached files will be deleted to make room for new requests. You can check the status of the project and disk quotas by typing:
|
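Because the delivery order is not guaranteed, any bookkeeping you do per file is safer if it does not depend on arrival order. A minimal sketch of that idea follows: results are keyed by file name and only sorted at the end. The function name and the (file_name, n_events) input format are hypothetical and not part of SAM.

```python
def summarize(deliveries):
    """Collect per-file event counts keyed by file name.

    `deliveries` is any iterable of (file_name, n_events) pairs, in whatever
    order the files happened to arrive. The returned list is sorted by file
    name, so two passes over the same dataset compare equal regardless of
    delivery order.
    """
    totals = {}
    for file_name, n_events in deliveries:
        totals[file_name] = totals.get(file_name, 0) + n_events
    return sorted(totals.items())
```

For raw files with zero-padded run and sequence numbers in the name, sorting by name also happens to sort by run and sequence.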
You are now ready to retrieve and process a SAM file with your package.
I have modified some scripts that you can use to submit an interactive or batch SAM job. Only the essential options and parameters are provided, but the scripts are complete and get the job done.
The only parameter you are likely to change from job to job is SAM_DATASET. If you use a different executable, you will need to change EXEC. The GROUP value of cal is fine unless we reach some group or disk limit, in which case you may try dzero. To submit your batch job, type:
[d0mino]< alstone > ./sam_job.sh
SAM_PROJECT raw_run_124236_v1_06_28_01_02_22
SAM_DATASET raw_run_124236_v1
GROUP cal
SNAP_VERSION last
EXEC bin/IRIX6-KCC_3_4/CalElecNtupleMaker
FRAME_RCP -rcp framework.rcp -num_events 50000
BATCH_JOB -N -o my_job_log
>>>>>> Starting project with the Station Master
Station Master contacted, result:
Started project 29462(raw_run_124236_v1_06_28_01_02_22) for group cal
Waiting for the project to initialize...
Callback from server: 'OK|Project is ready'
>>>>>> Submitting the job to the batch system
>>>>>> Executing: bsub -P raw_run_124236_v1_06_28_01_02_22 -N -o my_job_log -q sam_lo /usr/products/sam_user/IRIX-6-5/v3_1_3/bin/samscript.sh framework_wrapper.sh raw_run_124236_v1_06_28_01_02_22 central-analysis bin/IRIX6-KCC_3_4/CalElecNtupleMaker -rcp framework.rcp -num_events 50000
Job < 16037 > is submitted to queue < sam_lo >.
[d0mino]< alstone > |
I did a command line override for the Number of Events with -num_events 50000. An output log file is created for the job.
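The Executing: line in the session output shows how the script's settings end up on the bsub command line. As a reading aid, this sketch rebuilds that line from the same pieces. It is reconstructed from the logged output only, not from sam_job.sh itself; the function name and the parameter names (in particular, calling central-analysis the `station`) are assumptions.

```python
def bsub_command(project, executable, frame_args, batch_args,
                 queue="sam_lo", station="central-analysis",
                 samscript="/usr/products/sam_user/IRIX-6-5/v3_1_3/bin/samscript.sh",
                 wrapper="framework_wrapper.sh"):
    """Assemble the bsub argument list in the order shown in the job output.

    project     -- SAM_PROJECT value (also passed to bsub -P)
    executable  -- EXEC value
    frame_args  -- FRAME_RCP value split into words
    batch_args  -- BATCH_JOB value split into words
    Default values are copied from the logged command.
    """
    return (["bsub", "-P", project] + list(batch_args) + ["-q", queue]
            + [samscript, wrapper, project, station, executable]
            + list(frame_args))


if __name__ == "__main__":
    command = bsub_command(
        project="raw_run_124236_v1_06_28_01_02_22",
        executable="bin/IRIX6-KCC_3_4/CalElecNtupleMaker",
        frame_args=["-rcp", "framework.rcp", "-num_events", "50000"],
        batch_args=["-N", "-o", "my_job_log"],
    )
    print(" ".join(command))
```

Seen this way, the -num_events 50000 override is simply part of FRAME_RCP and is passed through to the executable unchanged.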
I saved the job log from a batch job which processed 5130 events successfully from Run 124038.
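The submission message printed by the batch system carries the job number you will want when following the job. If you capture the script output, a small parser like the following can extract it. The function name is hypothetical, and the pattern is written to accept the message both with and without spaces inside the angle brackets.

```python
import re

# Spaces inside the angle brackets are optional: this page shows
# "< 16037 >", while LSF itself normally prints "<16037>".
SUBMITTED = re.compile(
    r"Job\s*<\s*(\d+)\s*>\s*is submitted to queue\s*<\s*([^<>\s]+)\s*>"
)


def parse_submission(text):
    """Return (job_id, queue) from batch submission output, or None if absent."""
    match = SUBMITTED.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```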
To follow the progress of your run, you can check on the status of your project, or go to the command line and type:
|
Last modified: Mon Jul 16 22:30:41 CDT 2001
Web page maintained by Alan L. Stone: alstone@fnal.gov |