| 1 | General Introduction |
| 2 | Set Up SEED |
| 3 | N-tuple To ROOT File |
| 4 | Analyzing Data In ROOT File |
| 5 | Code |
| 6 | Feedback |
SEED was created at the end of 2000 by Paul Balm, Axel Naumann and Onne Peters, who were working with n-tuples and ran into the lack of flexibility that comes with them. The ability to share code was very limited, and data was not stored in physics objects but in "leafs", which made program code hard to read. One had to rewrite large portions of one's code whenever the n-tuple changed. n-Tuples are simply not object oriented.
As ROOT is the standard analysis tool at DØ, we decided to integrate the package into ROOT's environment, allowing data storage and object handling to be done by ROOT. SEED consists of a framework allowing the transition from the n-tuple to objects and a library of predefined Data Classes. The framework is completely generic: no DØ or physics specific code is part of it.
By the way: The name "SEED" comes from the fact that this tool helps you to build trees.
SEED helps you to analyze the data stored in n-tuple form. Although n-tuples can be read back into ROOT, they lack some important functionality: they do not store objects but data tables, they cannot be accessed in an object oriented way, and they don't allow inheritance, thus blocking code sharing.
You reformat the n-tuple data into objects once, using SEED. Afterwards you can simply read your objects from file, again using SEED. So SEED consists of a ROOT compatible data storage and access framework, of built-in Data Classes and of an interface that makes it easy and still versatile to use. There are many options available within the package, such as
| What are Data Classes, what are Seeds? To separate data storage from data generation, you will not transform and store your data within the same class. Instead, Data Classes store your data, and Seeds fill them with the data from your n-tuple. Any class that is supported by ROOT can be used as a Data Class. |
Whenever you change the layout of your n-tuple you will only have to rewrite a small portion of your code. Only the Seeds know about the layout of parts of the n-tuple, so only the affected Seeds have to be changed.
The objects storing your data, the Data Classes, can inherit from other Data Classes. That allows you to reuse standard code, such as definitions of jet widths, jet types etc. But you can always substitute these definitions by adding your own. The object oriented approach allows you to use the same code on different types of input data. So while you are extending your personal version of SEED think of all the others using this library, too: Share code! Just send your class in.
n-Tuple files are still the common data format for all sorts of data. If you want to generate plots, you usually read the n-tuples into your data analysis program (e.g. ROOT). Even for further data analysis (e.g. to look for appropriate cuts) a relatively small n-tuple is much handier than a huge data file.
n-Tuples have a defined layout, i.e. a set (struct) of variables of defined type. A layout might consist of one int, followed by two arrays of 100 floats, and finally a single float. This would be one "event". Each "event" is stored in one "line" of the n-tuple, and each "column" represents one piece of data (e.g. one int).
ROOT is the DØ standard data analysis tool. It can be used on several platforms. It is written in C++ and expects the user either to use a scripting version of C++ in interactive mode or to link ROOT's libraries into the user's analysis program. The latter is usually the preferred method: given today's computing power it takes only a few seconds to compile one's code, complete with error checking and a "nice" development GUI "for free". And the code is executed much faster than in interactive mode.
For ROOT an n-tuple is a special case of a
TTree.
Trees can be
read in several ways, the preferred way for using ROOT's libraries is
TTree::MakeClass. Here, a struct with members according to the n-tuple's
columns is filled with one row at a time - thus one can iterate through the
n-tuple, accessing each event's data via a filled struct.
A typical (but very small) n-tuple struct, as generated by
TTree::MakeClass, would look like this:
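struct _somedata{
    int   JCCG_BCxP;      // scalar column: one value per event
    int   JCCG_nJets;     // scalar column: one value per event
    float JCCG_Px[100];   // array columns with room for 100 entries each
    float JCCG_Py[100];
    float JCCG_Pz[100];
};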
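To make the row-wise access concrete, here is a minimal sketch of what analysis code looks like when it works directly on such a struct. The struct is renamed to `SomeData` for the sketch, the helper `JetPts` is hypothetical, and it assumes that `JCCG_nJets` tells how many entries of the arrays are filled for the current event.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Same layout as the struct above; one instance holds one event ("row").
struct SomeData {
    int   JCCG_BCxP;
    int   JCCG_nJets;
    float JCCG_Px[100];
    float JCCG_Py[100];
    float JCCG_Pz[100];
};

// Hypothetical helper: transverse momentum of every jet in one event.
// Assumes JCCG_nJets is the number of filled array entries.
std::vector<float> JetPts(const SomeData& ev) {
    assert(ev.JCCG_nJets >= 0 && ev.JCCG_nJets <= 100);
    std::vector<float> pts;
    for (int i = 0; i < ev.JCCG_nJets; ++i) {
        const float px = ev.JCCG_Px[i];
        const float py = ev.JCCG_Py[i];
        pts.push_back(std::sqrt(px * px + py * py));
    }
    return pts;
}
```

Every function written this way is tied to the column names and to the parallel-array layout, so it has to be touched whenever the n-tuple changes.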
There are some disadvantages for using these structs with ROOT:
So here is the way out: Instead of accessing your data from n-tuple structs you use classes that represent objects, holding the data that "belongs" to the object. For example a jet would have a direction and a width, another implementation might add the jet's energy and mass, or the tracks belonging to the jet.
SEED comes with a set of data classes - if they don't suit you just add your own. It is important to understand that these data classes are the containers of your data - after they are filled with n-tuple data once you will never touch the n-tuple again, instead you will write jet[i].E() to access a jet's energy.
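As a rough illustration of the idea, the following sketch shows a jet object with accessor methods and analysis code that only talks to the object. `TSketchJet` and `SumEnergy` are made-up names for this example and are not SEED's actual Data Classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a jet Data Class; not SEED's actual TCaloJet.
class TSketchJet {
public:
    TSketchJet(double px, double py, double pz, double e)
        : fPx(px), fPy(py), fPz(pz), fE(e) {
        assert(e >= 0.0);  // an energy is never negative
    }
    double E()  const { return fE; }
    double Pz() const { return fPz; }
    double Pt() const { return std::sqrt(fPx * fPx + fPy * fPy); }
private:
    double fPx, fPy, fPz, fE;
};

// Analysis code only sees objects, never the n-tuple columns.
double SumEnergy(const std::vector<TSketchJet>& jet) {
    double sum = 0.0;
    for (std::size_t i = 0; i < jet.size(); ++i) {
        sum += jet[i].E();
    }
    return sum;
}
```

If the n-tuple layout changes, only the code that fills the objects has to follow; `SumEnergy` stays as it is.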
So how does inheritance help? Assume you know somebody who has written a nice tool to calculate the probability that a jet is a b-jet, based on its track multiplicity. His jets contain only the basic jet information (width and direction) and tracks, let's call his class TTrackJet. But you want to store additional, more detailed information in your jets, provided by the calorimeter, let's call your class TCaloTrackJet. So you share some common data (basic jet, let's just call it TJet, and the tracks), but you have different classes. With the n-tuple you would spend at least a day to change the code so you can use it.
Now let's look at the inheritance tree of those jet classes:
[Figure: inheritance tree of the jet classes]
All the information contained in TTrackJet can also be accessed in TCaloTrackJet - even more, you can make TCaloTrackJet look like a TTrackJet: All methods written for a TTrackJet can also be used by a TCaloTrackJet.
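The relationship can be sketched in a few lines of C++. The three classes below are heavily simplified, hypothetical versions of the jet classes named in the text, and the cut inside `LooksLikeBJet` is invented purely for illustration.

```cpp
#include <cassert>

// Hypothetical, heavily simplified versions of the three jet classes.
class TJet {
public:
    explicit TJet(double width) : fWidth(width) {}
    virtual ~TJet() {}
    double Width() const { return fWidth; }
private:
    double fWidth;
};

class TTrackJet : public TJet {
public:
    TTrackJet(double width, int nTracks) : TJet(width), fNTracks(nTracks) {}
    int NTracks() const { return fNTracks; }
private:
    int fNTracks;
};

class TCaloTrackJet : public TTrackJet {
public:
    TCaloTrackJet(double width, int nTracks, double caloE)
        : TTrackJet(width, nTracks), fCaloE(caloE) {}
    double CaloE() const { return fCaloE; }
private:
    double fCaloE;
};

// Stand-in for the colleague's tool: it only knows about TTrackJet.
// The cut value is made up for this sketch.
bool LooksLikeBJet(const TTrackJet& jet) {
    assert(jet.NTracks() >= 0);
    return jet.NTracks() >= 5;
}
```

Because a `TCaloTrackJet` is a `TTrackJet`, it can be passed to `LooksLikeBJet` without any change to that function.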
The seeds move data from the n-tuple to the Data Classes. You define a class deriving from TSeed transferring a certain part of an n-tuple (e.g. Monte Carlo information) into a certain Data Class (e.g. a TParticle). The seeds look a bit like the structs mentioned above. They are classes, but as they build the link to the ntuples they have a little bit of "fortran-ness" attached to them. If you're used to TTree::MakeClass you'll easily see how it works!
You can automatically generate Seeds from an n-tuple (or just a branch of it) by calling TEventSample::MakeSeed, just like TTree::MakeClass does. The more parameters you specify to MakeSeed, the more it can prepare for you. Here are the possible parameters:
| TEventSample::MakeSeed( | |
| ntuple file name, | Give an ntuple file as input to generate a Seed for a specific ntuple layout |
| ntuple ID, | Give the name of the ntuple in the file (or NULL if there is just one) |
| branch name, | Give the name of a branch of the ntuple if you want to generate a Seed only for this specific branch (or NULL if you want your Seed to access the complete ntuple) |
| seed name, | Give the class name of your seed (if you don't give this argument, the default Seed name "TMySeed" will be assumed). This also defines the name of the output files (default: TMySeed.cxx and TMySeed.h) |
| data class name) | Give the name of the class into which this seed should store the n-tuple data |
This generated Seed has one method that needs to be written by you (it tells you, just look at the generated code): SeedName::FillSeedData(...). It is called for every event, and the framework will pass two arguments, TSingleEntryTree &tree, const int iEventNo. iEventNo obviously holds the number of the current event (starting with 0). The TSingleEntryTree is most probably not needed for your Seed.
Once you have written this method filling the Data Class with data from the n-tuple SEED will process the n-tuple, write a Root file containing one TEventSample which holds a TTree containing all processed data in their Data Classes.
When you change your n-tuple most of the Seeds will stay the same; only those actually affected by the change (where actually means "content wise", e.g. because you want a new column to be stored in your Data Classes) will have to change. Furthermore, you can keep different Seeds and enable only a subset of them, without recompiling your ntpl2root converter. And to make it even fancier: to speed up your analysis you can make SEED ignore parts of your ROOT file that you don't need by disabling some of your Seeds. Again this works without recompiling your code, just by editing a text file (see the chapter Include Your Seeds for information on how that works).
This chapter assumes you use ROOT and SEED within your own program, i.e. you link their libraries. If you want to use ROOT and SEED interactively you will have to
Easy. First of all, download the current SEED package for Windows and MSVC++. (Most of the time only the sources will be up to date; they come with a workspace for MSVC, which might not be up to date either.) Execute the file you've downloaded and follow the instructions. Select the "typical installation" option. After some clicks you have your own copy of the SEED library. Do the same for ROOT (download ROOT version 3.02 or later). Now it's time to tell your compiler about ROOT and SEED:
| Open MS Visual C++. Go to Tools, Options, tab Directories. Under "Show directories for", select Include Files and add a new line with the location of your ROOT installation followed by \include (e.g. D:\root\include). Then do the same for the libraries: select Library Files and add a new line with your ROOT installation path followed by \lib (e.g. D:\root\lib). |
Now it's time to create your first project. Worth reading: Andy Haas has written the "official" ROOT on Windows page; have a look for a short introduction on how to get a program up and running. It's not necessary - we will go through all the steps in this tutorial, too. So let's start!
If you downloaded the source files, continue reading - if you downloaded the windows executable (InstallShield), jump!
You will have to build SEED's library first. In the directory seed_framework open the workspace seed_framework.dsw by double clicking on it in the explorer. Right click on the project "seed_framework" in the project browser (left window tile) and select "Set as active project". Open a command shell (under NT this is called "cmd.exe"), and cd to your installation directory of SEED. Now cd to seed_framework\seed_dict. Make sure you have the following environment variables set (check by issuing a "set" command in the shell):
Now you will have to build ROOT's dictionary for SEED. Say CreateRootdict /f and check the output of this batch file. If it says "file in use by another application" - do it again. If it still says so, do it again. Then it's time for CreateRootdict /d. Dictionary done!
Now let's build the library: In MS Visual Studio press F7 to build this project. Sounds like a lot of work? Maybe. But you will only have to do this when you update the library from this web site.
In your SEED directory you will find a file called Seed.dsw - an example Workspace for MS VC++. Close any already opened MS VC++. Double click the file Seed.dsw. MS VC++ will open, with a number of predefined projects in the projects panel.
Right click on the ntpl2class project and click Set as active project. Go to Build, Rebuild all. Now MSVC compiles this project. Execute it. You will see SEED transforming the example n-tuple file into a SEED file (or an error message, if you downloaded the sources only, without example files). When it is done it shows you the number of objects it created and the content of the newly created tree.
Now right click on the analyze project, click Set as active project. Press F7. Now MSVC compiles this project. Execute it. You will see ROOT and SEED working hard to plot the angular difference between jets and tracks (yes, there are more tracks within jets). Now have a short look at the only included source file in this project, analyze.cxx. It's pretty short, and quite easy to read.
| | non DØ | DØ-it-yourself | DØ |
| Get the code | Download the code, gunzip it and tar -xf it | cvs checkout seed seed_framework | setup seedlib and cvs checkout seed (you will use the seed_framework as a product, which means you won't have to build it yourself) |
| Set up environment variables | cd seed, then | | cd seed; setseeddir |
| Build the framework | cd seed_framework; gmake. You will have to build seed_framework and seed separately. The good thing is: once you have built seed_framework you will never change it, so you will never have to rebuild it (until you download the next SEED update, that is). | not necessary | |
| Build example binaries | cd seed; gmake | | |
If you have cvs checkout'ed seed or downloaded the source version with the example file then execute the example binary bin/ntpl2class_x (call it from the directory seed/). It will transform the example n-tuple (jets.ntpl) into a root file (jets.root) containing jet and particle objects. When it's done, call bin/analyze_x for a little example analysis. Have a look at these programs' source codes, they are in src/ntpl2class.cxx and src/analyze.cxx.
The SEED library incorporates both parts of SEED: The framework (the n-tuple extraction and tree management classes) and example Data Classes. You can look up their documentation in the Reference Guide. You will need to link this library to your programs using SEED, and you have to load the shared library version (.so for Unix, .dll for Windows) to use SEED with the interactive version of Root. When using the supplied UNIX makefile this is all done automatically if you enter the file names without file extension of all your files containing a main() in seed/src/BINARIES.
| All classes in the SEED library "live" in the namespace seed! |
| All files in the SEED library have to be #included giving the directory seed, e.g. #include "seed/TEventSample.h" |
The Data Classes are a library of classes written by users of SEED. You can use your own, any Root class, or those that are part of SEED. SEED's Data Classes are documented, and they connect smoothly with Root's classes. They live in their own namespace seed to prevent them from interfering with your classes or Root's classes of the same name (e.g. Root defines its own TParticle, and SEED defines another which you can access as seed::TParticle).
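The effect of the namespace can be sketched with two stand-in classes that share a name. Neither class below is the real Root or SEED `TParticle`; they only show that the two names do not clash and how each one is addressed.

```cpp
#include <cassert>
#include <string>

// Stand-in for a class defined in global scope (as Root does with TParticle).
class TParticle {
public:
    std::string Origin() const { return "global"; }
};

namespace seed {
// Stand-in for SEED's class of the same name; it does not clash
// because it lives in its own namespace.
class TParticle {
public:
    std::string Origin() const { return "seed"; }
};
}  // namespace seed

// Both classes can be used side by side as long as the names are qualified.
void CheckBothAreDistinct() {
    ::TParticle     rootLike;
    seed::TParticle seedLike;
    assert(rootLike.Origin() != seedLike.Origin());
}
```

With both classes visible, an unqualified `TParticle` after `using namespace seed;` would be ambiguous, which is why the qualified form `seed::TParticle` is needed in that situation.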
| All Data Classes must invoke ROOT's ClassDef macro, see ROOT's documentation on how to add your own classes, to allow ROOT to store and read the data. |
If you want to add your own data classes, put them into the directory seed/dataclasses and include them in the file seed/dataclasses/dataclasses_user_linkdef.h. If you are not familiar with linkdef.h files you should first give it a try (it's pretty obvious what to do once you have a look at the existing seed/dataclasses/dataclasses_user_linkdef.h) or go to the documentation on rootcint.
| Your data classes have to be #included without a leading directory, e.g. #include "TMyDataClass.h" for a TMyDataClass.h in dataclasses/. This is valid for all files in the seed directory: To include them, do not give a directory. |
Be sure to set up your project first, as this chapter refers to example classes and files extensively. You will learn how to transform nasty n-tuple data into easy to analyze objects.
Open the file src/ntpl2class.cxx. As you can see, this file includes two "types" of headers: one SEED header (#include "seed/TEventSample.h"), and two other headers, #include "TJCCGJet.h" and #include "TPRTParticle.h", which contain example Seeds. TEventSample is the class which will contain your class tree (and which handles the conversion from an n-tuple). We will have a look at the two Seed headers later. Let's first step through the main() function:
|
Notice the using namespace seed; statement: all classes defined in SEED "live" within this namespace. As long as no class name is defined both in global scope and in SEED, using namespace seed is the easiest option; otherwise you have to prefix the classes with their respective namespaces, e.g. seed::TEventSample.
First we create a TEventSample, which will carry out the n-tuple to root conversion. Before, we set some options for the conversion:
I did not explain the ROOT part - TH1F, TCanvas, TROOT and TApplication are the ROOT objects used in this example. But there is plenty of very good documentation on ROOT, e.g. the official ROOT documentation, DØ's ROOT documentation and, probably most important, the online reference guide to ROOT.
We have already seen that you need to include your Seeds' header files in the main-cxx. Now let's have a look at the file seed/src/TJCCGJet.h. This is the header for one of the user defined Seeds. It looks like this:
|
| Each Seed declaration has to include the file "seed/TSeed.h" plus the file which defines the type of the object where the data will be stored in, in this case TCaloJet.h. |
In line 9 we tell the compiler that this seed has to handle data classes of type seed::TCaloJet. As the authors of SEED did not know which data class you will use your seed for, they had to use a "template" to describe what TSeed should do; here (in < > brackets) you have to specify what the template class actually is.
Line 13 holds a constructor (which has to have the exact same list of arguments as given here!) which calls TSeed's constructor, telling it that this seed is called "TJCCGJet". To create your own seed, just copy these lines, change the name of the constructor and destructor and the name of the class passed to TSeed as a string.
| The last line defines a static object within an unnamed namespace (which makes the whole object inaccessible from the outside world). This object will be instantiated when you include the seed's header file (under UNIX just linking the object containing the seed triggers its instantiation). When this static object is constructed it will make itself known to SEED. Do NOT forget this static object or SEED won't be able to use your seed! |
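The mechanism described in this note is a common C++ self-registration pattern. The sketch below shows the pattern in isolation; `KnownSeeds`, `SeedRegistrar` and `IsRegistered` are hypothetical names, and SEED's real bookkeeping may look quite different.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry; the list is created on first use, so it already
// exists when static objects in other files register themselves.
inline std::vector<std::string>& KnownSeeds() {
    static std::vector<std::string> names;
    return names;
}

// Hypothetical helper whose constructor announces a seed by name.
struct SeedRegistrar {
    explicit SeedRegistrar(const std::string& name) {
        assert(!name.empty());
        KnownSeeds().push_back(name);
    }
};

// The pattern from the note: one static object per seed, hidden in an
// unnamed namespace, constructed before main() runs.
namespace {
SeedRegistrar gRegisterTJCCGJet("TJCCGJet");
}

bool IsRegistered(const std::string& name) {
    for (const std::string& n : KnownSeeds()) {
        if (n == name) return true;
    }
    return false;
}
```

If the static object is left out, nothing ever adds the name to the registry, which matches the warning above that SEED cannot use a seed whose static object is missing.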
Of course the interesting part happens in Init and FillSeedData, which have to be written by you. But before looking at their implementation for TJCCGJet make sure you know what branches and leafs are. And you might want to open the output of showntpl run over the example n-tuple. Now let's look at the implementation of Init and FillSeedData:
|
Let's start with Init. The TSingleEntryTree tree gives you access to the n-tuple data for the current event, and in line 9 one branch of this n-tuple is selected (the branch is called "JCCG"; you can find the branch names by running showntpl).
In line 11 the first value is requested from the n-tuple: The leaf storing this value in the ntuple is called "JCCGPx" and tree.Transfer tells TSingleEntryTree to make that value accessible through fPX. fPX is one of the members of TJCCGJet (check for yourself!), and it is declared as a Float_t*, so it's an array. It will hold whatever is stored in the leaf "JCCGPx" of branch "JCCG" of the currently selected entry of the ntuple, so e.g. the third jet's "JCCGPx" value is fPX[2]. To know how many jets are in the current event, TJCCGJet has the member fNJets, which gets connected to the leaf "JCCGnjets" in line 20. Of course this is just one value per event, so fNJets is declared as Int_t, not Int_t*.
Now that all the data has arrived in our example seed TJCCGJet, all that's left is to fill it into one of our objects (of type seed::TCaloJet, that's what we specified in the header). The for loop in the same line will execute the following block once for every jet (we know how many jets there are, that's stored in the leaf "JCCGnjets" which is transferred to fNJets), setting iJet to the index of the current jet. In line 30 you get access to an object of type seed::TCaloJet - one requests this object with TSeed::NewEntry().
| Assume you want to put a new object into the class tree which is filled by your Seed's FillSeedData. Then do not create this object with the new operator, as this will not make the object you just created part of the new tree! You have to call your Seed's NewEntry() method, which will return a pointer to an entry in the tree. |
Now we have a pointer to a new (and just initialized = empty) TCaloJet (we told TJCCGJet which data class to handle, thus it creates the correct class type for us, in this case a TCaloJet). We can now initialize this object, in the case of TCaloJet we can use its Set method.
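The fill step can be sketched without ROOT or SEED by mocking the pieces involved. All class names below (`CaloJetSketch`, `SeedSketch`, `JetSeedSketch`) are hypothetical stand-ins; `Connect` plays the role of Init and `Fill` the role of FillSeedData, and a plain pointer to the jet count replaces the transfer mechanism described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Data Class with a Set method.
struct CaloJetSketch {
    float px = 0, py = 0, pz = 0;
    void Set(float x, float y, float z) { px = x; py = y; pz = z; }
};

// Hypothetical stand-in for the templated seed base class.
template <class DataClass>
class SeedSketch {
public:
    // Mimics the role of NewEntry(): the container owns the object and
    // hands back a pointer to a fresh, empty entry. The pointer is only
    // meant to be used until the next call to NewEntry().
    DataClass* NewEntry() {
        fEntries.push_back(DataClass());
        return &fEntries.back();
    }
    std::size_t Size() const { return fEntries.size(); }
    const DataClass& operator[](std::size_t i) const { return fEntries[i]; }
private:
    std::vector<DataClass> fEntries;
};

class JetSeedSketch : public SeedSketch<CaloJetSketch> {
public:
    // Stands in for Init(): remember where the per-event data lives.
    void Connect(const int* nJets, const float* px, const float* py, const float* pz) {
        fNJets = nJets; fPX = px; fPY = py; fPZ = pz;
    }
    // Stands in for FillSeedData(): one new entry per jet of the event.
    void Fill() {
        assert(fNJets && fPX && fPY && fPZ);
        for (int iJet = 0; iJet < *fNJets; ++iJet) {
            CaloJetSketch* jet = NewEntry();
            jet->Set(fPX[iJet], fPY[iJet], fPZ[iJet]);
        }
    }
private:
    const int*   fNJets = nullptr;
    const float* fPX = nullptr;
    const float* fPY = nullptr;
    const float* fPZ = nullptr;
};
```

The point of the design is that the seed never allocates the objects itself: it asks the container for the next entry and only fills in the values.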
This might be one of your first steps: To write your own seed file. You might want to use TJCCGJet.h and TJCCGJet.cxx as a template. Put your seeds into the same directory as TJCCGJet: seed/src.
Next you have to #include them in your main cxx file. Otherwise the static Seed objects might not be created (see the seed declaration) and SEED can't access your Seed. Additionally you will have to link your Seed's .cxx file (automatically done under UNIX by the GNUmakefile, if they are in the seed/src directory). Those are the two prerequisites.
There is an additional feature, the Seed activation file: Normally, you would recompile all your code whenever you change the number of Seeds you use (for your analysis or for the ntpl2class conversion). But you don't have to: Have a look at the file called "SeedActivation.ini" (you can use your own file name as long as you specify it in TEventSample::SetSeedActivationFile) and write the label of each Seed you want SEED to use into this file, e.g. TJCCGJet. Now you can include all your Seeds in your main .cxx file - only those Seeds that are mentioned in this Seed activation file will create objects (both TEventSample::Create and TEventSample::Load respect the entries in this file).
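How such a run-time switch can work is sketched below. The file format is an assumption made for this example (one Seed label per line, blank lines ignored); the real SeedActivation.ini may follow different rules, and `ReadActiveSeeds` and `IsActive` are hypothetical helpers, not part of SEED.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Assumed format: one Seed label per line; blank lines are skipped and
// surrounding whitespace is trimmed.
std::set<std::string> ReadActiveSeeds(std::istream& in) {
    std::set<std::string> active;
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        const std::string::size_type last = line.find_last_not_of(" \t\r");
        assert(last >= first);
        active.insert(line.substr(first, last - first + 1));
    }
    return active;
}

// A compiled-in Seed would only create objects if its label is listed.
bool IsActive(const std::set<std::string>& active, const std::string& label) {
    return active.count(label) != 0;
}
```

Because the decision is taken from a text file at run time, all Seeds can stay compiled into the program and the selection changes without a rebuild.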
The root file you created using the TEventSample::Create method contains one TEventSample object, which has a TTree-member which in turn has all the objects you've filled in using your Seeds. In the following example (a screen shot of a TBrowser window) two Seeds were used, the TJCCGJet and the TPRTParticle Seeds (just as in the ntpl2class.cxx example).
[Figure: screen shot of a TBrowser window showing a TEventSample]
Again: You can analyze your Root file with the interactive version of Root (e.g. by opening the file and starting a TBrowser). But here I will focus on the non-interactive version, i.e. with compiled programs linking against the Root and SEED libraries.
Here is the main() function of the example analyze program, which will be referenced in the following paragraphs:
|
Just call TEventSample's static Load method. It returns a pointer to the loaded TEventSample (see line 20). The two parameters are the path and name of the Root file created by ntpl2class and the path and name of the Seed activation file.
If you specify a Seed activation file only part of the data (as given in the Seed activation file) will be loaded, allowing faster i/o transfers.
Look at line 26: You can retrieve the number of events stored in a TEventSample by its GetNEvents() method.
| Don't forget to tell the TEventSample which event you want to access by calling its SetEvent method (see line 30). |
Look at line 23. Here, a variable of type TJCCGJet is created - your seed, as defined in the header. It derives from TSeedABC (actually, concerning read-out, TJCCGJet is just a shortcut for TSeed).
In line 28 you can see how to loop over a Seed's data entries: You can retrieve the number of entries with the TSeedABC::Size() method. And you access the data itself (in the case of the TJCCGJet it's a seed::TCaloJet) by the usual array operator (see line 29, jet[iJet] returns one seed::TCaloJet which was filled by TJCCGJet).
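The shape of this read-out loop can be sketched with mock classes. `JetSketch`, `EventSampleSketch` and `JetViewSketch` are hypothetical stand-ins that only borrow the method names mentioned in the text (GetNEvents, SetEvent, Size and the array operator); they are not the SEED classes themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a jet Data Class.
struct JetSketch {
    double e;
    double E() const { return e; }
};

// Hypothetical stand-in for the event sample: it knows the current event.
class EventSampleSketch {
public:
    explicit EventSampleSketch(const std::vector<std::vector<JetSketch> >& events)
        : fEvents(events), fCurrent(0) {}
    int  GetNEvents() const { return static_cast<int>(fEvents.size()); }
    void SetEvent(int i) {
        assert(i >= 0 && i < GetNEvents());
        fCurrent = i;
    }
    const std::vector<JetSketch>& CurrentJets() const { return fEvents[fCurrent]; }
private:
    std::vector<std::vector<JetSketch> > fEvents;
    int fCurrent;
};

// Plays the role of the seed in read-out mode: a view on the current event.
class JetViewSketch {
public:
    explicit JetViewSketch(const EventSampleSketch& sample) : fSample(sample) {}
    std::size_t Size() const { return fSample.CurrentJets().size(); }
    const JetSketch& operator[](std::size_t i) const { return fSample.CurrentJets()[i]; }
private:
    const EventSampleSketch& fSample;
};

// Loop over all events, select each one, then loop over its jets.
double TotalJetEnergy(EventSampleSketch& sample) {
    JetViewSketch jet(sample);
    double total = 0.0;
    for (int iEvent = 0; iEvent < sample.GetNEvents(); ++iEvent) {
        sample.SetEvent(iEvent);  // select the event before reading from it
        for (std::size_t iJet = 0; iJet < jet.Size(); ++iJet) {
            total += jet[iJet].E();
        }
    }
    return total;
}
```

Note that the inner loop reads through the view only after the event has been selected, mirroring the reminder about SetEvent above.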
You will have to write your own Seeds corresponding to the data you want to retrieve from your n-tuple. But maybe the Data Classes offered by SEED and Root are not sufficient for you. Then you will have to write your own classes and link them into your project. Put them into your seed/dataclasses directory first (and add an entry to seed/dataclasses/dataclasses_user_linkdef.h), test them, and if you think they are of general value (i.e. others besides you might want to use them), share them!
The basic recipe for creating a new data class is:
We decided to apply the Root coding guidelines. Most important: Classes start with a capital "T", methods start with a capital letter, header files end in ".h". When writing extensions to SEED please follow these guidelines to make it easier for us to add your extensions - you'll benefit as well, as you end up with one homogeneous coding style.
Please provide feedback. Currently the best way is to send an email to the authors of SEED or to the SEED user mailing list at seedtalk@listserv.fnal.gov.
We're trying to keep the bug list up to date. So if you find something which does not have an entry in the bug list yet, please send us an email or post it on the seed mailing list.
Please share extensions to SEED with the rest of its users! Send them (the source files, that is) to the authors of SEED.
Too complicated? Don't get it? Something wrong? Send questions or comments to the seed mailing list or directly to the authors of SEED.