Artificial Neural Network

for Photon analysis


When discriminating between photons and background particles (neutral pions as well as the neutral decay channels of eta and K^0_s mesons) we face a pattern recognition problem typical of high energy physics. The standard procedure for solving such a problem is to introduce suitable cuts in the multi-dimensional data. Nowadays the use of a software-implemented artificial neural network (ANN) for pattern recognition is well established and usually gives results superior to conventional approaches (see e.g. Proc. of CERN School of Computing, 1991, Ystad, Sweden, CERN 92-02, p. 113-170).
Thus, instead of cutting directly on the chosen variables, it is often better to build an ANN that combines the discriminating power of all of them. A cut on the output of this ANN can then be applied as an additional and very effective selection criterion.

ANNs usually have more input than output nodes and thus may be viewed as performing a dimensionality reduction of the input data set.
The ANN approach is a technique that assigns objects to classes. The objects can be different data types, signal and background in our case, and each data type is assigned to its own class. Discrimination is achieved by determining the class to which a given event should belong. The technique fully exploits the correlations among the variables and provides a discriminating boundary between signal and background.
ANNs are able to learn, remember and create relationships among the data by tuning the connection weights between input, hidden and output nodes and by tuning the node thresholds. There are many types of ANN, but feed-forward networks are the most popular for recognition tasks. Feed-forward means that information flows in one direction only, from input to output; the output directly gives the probability that an event characterized by an input pattern vector X(x_1,x_2,...,x_n) belongs to the signal class. A typical ANN with a single (binary) output and one layer of hidden units is shown in the picture below.
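To make the feed-forward idea concrete, here is a minimal sketch of the calculation for a network with one hidden layer and a single output. It is not taken from JETNET: the logistic activation, the sign convention for the thresholds (added to the weighted sum) and all function and variable names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic activation (an assumed choice), mapping any real number into (0, 1).
double sigmoid(double a) { return 1.0 / (1.0 + std::exp(-a)); }

// Forward pass through a network with one hidden layer and one output node.
// w_hidden[j][i] : weight from input i to hidden node j
// t_hidden[j]    : threshold of hidden node j (added to the weighted sum)
// w_out[j]       : weight from hidden node j to the output node
// t_out          : threshold of the output node
double feedForward(const std::vector<double>& x,
                   const std::vector<std::vector<double>>& w_hidden,
                   const std::vector<double>& t_hidden,
                   const std::vector<double>& w_out,
                   double t_out) {
    assert(w_hidden.size() == t_hidden.size());
    assert(w_hidden.size() == w_out.size());
    double out_sum = t_out;
    for (std::size_t j = 0; j < w_hidden.size(); ++j) {
        assert(w_hidden[j].size() == x.size());
        double a = t_hidden[j];
        for (std::size_t i = 0; i < x.size(); ++i) a += w_hidden[j][i] * x[i];
        out_sum += w_out[j] * sigmoid(a);  // hidden activation feeds the output node
    }
    return sigmoid(out_sum);  // a value in (0, 1), read as a signal probability
}
```

Information flows only from the inputs through the hidden nodes to the output; training consists of adjusting the weights and thresholds passed to this function.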



A list of the most popular ANN packages can be found on the page of the Fermilab Advanced Analysis Group (see the "Downloads", "Documentation" and "Interesting sites" sections). At least two of them, JETNET and SNNS, have proved their reliability and have been widely applied in various HEP analyses. The JETNET package has a nice ROOT interface developed in CDF. SNNS also has a very comfortable Java-based graphical interface implemented in JavaNNS. Both packages are well documented.

Photon ANN

To build the ANN we have used the JETNET package with the ROOT interface.
The main file that governs the behavior of the ANN is root_to_jetnet.C.
In the given example the input variables fed to the ANN for the signal and background classes are taken from the files gam_40_50.dat_095_010 and emj_40_50.dat_095_010. The first file contains variables that characterize direct photons (signal) and the second those of EM-jets (background). The ones in the array bit_pattern select the variables (columns of the two files) to be used. In this example the following four variables have been chosen: the number of EM cluster cells in the EM1 layer, ncell_EM1_clus; the number of cells around the EM cluster within the ring of 0.2, ncell_EM1_dr02; the scalar sum of track transverse momenta in the ring of 0.05, sum_PT_track; and the energy-weighted EM cluster width in r x phi at the EM3 layer, sigrp_EM3. (The other four variables describe the number of CPS 3D clusters and V cluster strips as well as the number of tracks in the ring of 0.05.)
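The way a bit pattern picks columns can be sketched as follows. This is only an illustration, not the code of root_to_jetnet.C: the function name selectedColumns is hypothetical, and it is assumed that character k of the pattern string corresponds to column k of the data files.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the indices of the columns switched on in a pattern such as "00011011".
// Assumption: character k of the string corresponds to column k of the data file.
std::vector<std::size_t> selectedColumns(const std::string& bits) {
    std::vector<std::size_t> cols;
    for (std::size_t k = 0; k < bits.size(); ++k) {
        assert(bits[k] == '0' || bits[k] == '1');  // only 0 and 1 are meaningful
        if (bits[k] == '1') cols.push_back(k);
    }
    return cols;
}
```

The number of ones in the pattern is the number of ANN input nodes; a pattern with four ones therefore gives a network with four inputs.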
After choosing the physical discrimination variables one needs to define the total number of events and the number used for training. The usual rule is that the number of events for the training stage should be at least (20-30)*N_ind, where N_ind is the number of independent parameters (the weights and thresholds of the ANN). The total number of independent parameters in a neural network with a single hidden layer is given by:

N_ind=(N_in+N_on) * N_hn + N_ht + N_ot

where N_in is the number of input nodes, N_on is the number of output nodes, N_hn is the number of nodes in the single hidden layer, N_ht is the number of thresholds in the hidden layer and N_ot is the number of output thresholds.
For the case described in root_to_jetnet.C, N_in=4, N_on=N_ot=1 and N_hn=N_ht=6, which gives N_ind=37. Thus, of order 1000 events at least are required to train this network.
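The counting rule can be checked with a few lines of code. The sketch below simply evaluates the formula above, assuming one threshold per hidden node and one per output node (N_ht=N_hn, N_ot=N_on, as in the example); the function names are hypothetical.

```cpp
#include <cassert>

// Number of independent parameters for a network with one hidden layer:
// N_ind = (N_in + N_on) * N_hn + N_ht + N_ot
int nIndependent(int n_in, int n_hn, int n_on) {
    assert(n_in > 0 && n_hn > 0 && n_on > 0);
    const int n_ht = n_hn;  // assumed: one threshold per hidden node
    const int n_ot = n_on;  // assumed: one threshold per output node
    return (n_in + n_on) * n_hn + n_ht + n_ot;
}

// Training sample size for a given safety factor (20-30 according to the rule).
int minTrainingEvents(int n_ind, int factor) {
    assert(n_ind > 0 && factor > 0);
    return factor * n_ind;
}
```

For the 4-6-1 network, nIndependent(4, 6, 1) gives (4+1)*6 + 6 + 1 = 37, so the rule asks for between 20*37 = 740 and 30*37 = 1110 training events.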

Practice shows that the network performance depends only weakly on the number of hidden units, which is usually chosen to be of the order of the number of input variables. Nevertheless, a study of the ANN discrimination efficiency as a function of N_hn is useful.

Nepoch is the number of training cycles. The training should continue until the network stabilizes (changes in the weights become negligible). In practice, the ANN becomes stable at Nepoch = 50-300. The exact number depends on the nature of the input samples and on the number of independent parameters. The ANN classification/separation efficiency should grow with Nepoch while the ANN error should decrease (see the JETNET manual). The network performance and error vs. Nepoch (for the two event classes considered here) can be loaded from plot1 and plot2.
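One possible way to quantify "stabilization" is to compare the weights and thresholds before and after an epoch. The sketch below is an assumption about how such a check could look, not a feature of JETNET or root_to_jetnet.C; the function names and the tolerance parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute change of any weight or threshold between two epochs.
double maxWeightChange(const std::vector<double>& before,
                       const std::vector<double>& after) {
    assert(before.size() == after.size());
    double max_change = 0.0;
    for (std::size_t i = 0; i < before.size(); ++i)
        max_change = std::max(max_change, std::fabs(after[i] - before[i]));
    return max_change;
}

// The network is called stable once no parameter moves by more than `tolerance`.
bool isStable(const std::vector<double>& before,
              const std::vector<double>& after,
              double tolerance) {
    return maxWeightChange(before, after) < tolerance;
}
```

A suitable tolerance depends on the scale of the weights, so in practice it is still worth looking at the performance and error plots as well.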

Another issue is the minimization algorithm (parameter MiniMe in root_to_jetnet.C). The values correspond to the following cases: MiniMe=0 is standard backpropagation, MiniMe=1 is the Manhattan algorithm and MiniMe=2 is the Langevin algorithm. In most cases they produce similar results (e.g. see hep-ex/0108051), but in some cases a comparison of the algorithms and tuning of their parameters is needed.
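The difference between the three algorithms can be illustrated by the update applied to a single weight. The functions below are textbook forms written for illustration only: they assume a plain learning rate eta and, for Langevin, Gaussian noise of width sigma, and they leave out further parameters (momentum, for instance) that the real JETNET implementation may use. See the JETNET manual for the exact definitions.

```cpp
#include <cassert>
#include <random>

// Standard backpropagation (gradient descent): step proportional to the gradient.
double backpropStep(double grad, double eta) { return -eta * grad; }

// Manhattan: fixed-size step that uses only the sign of the gradient.
double manhattanStep(double grad, double eta) {
    if (grad > 0.0) return -eta;
    if (grad < 0.0) return eta;
    return 0.0;
}

// Langevin: gradient-descent step plus Gaussian noise of width sigma.
double langevinStep(double grad, double eta, double sigma, std::mt19937& rng) {
    assert(sigma >= 0.0);
    double noise = 0.0;
    if (sigma > 0.0) {
        std::normal_distribution<double> gauss(0.0, sigma);
        noise = gauss(rng);
    }
    return -eta * grad + noise;
}
```

Manhattan updating is insensitive to the size of the gradient, while the noise in Langevin updating can help the minimization leave shallow local minima.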

Below are instructions on how to start.
In a running ROOT session type:

root[] .L root_to_jetnet.C
root[] jetnet("00011011") // assuming that the set of variables above has been chosen
root[] plotPerform() // plots the network performance and errors as a function of Nepoch.

The built network is completely defined by its architecture and by the set of found weights and node thresholds. For example, the network used here has the architecture 4-6-1. When it finishes, jetnet() creates the weight file; in our case it is the file weights_00011011_4_6_1.dat. Using this file one can plot the ANN output for one or more input samples (assuming they have the same structure as the training input files). For instance, to see the network output for the signal and background samples, call the plotNN(...) function:

root[] plotNN(4000,"Samples/gam_40_50.dat_095_010",3000,"Samples/emj_40_50.dat_095_010","Out/weights_00011011_4_6_1.dat")

Here is a typical distribution of the ANN output (the ANN was trained to produce "0" for EM-jet background and "1" for signal photon events).

Before you start the network training, it is very useful to look at the distributions of your chosen input variables. To do this just type:
root[] plotInput("Samples/gam_40_50.dat_095_010", "Samples/emj_40_50.dat_095_010")
In our case the distributions you should get are shown in Input_1.eps and Input_2.eps.

Finally, ROOT_to_JETNET creates the file NNout.C, which calculates the ANN output using the ANN architecture and weights contained in it. It has a very convenient interface and can easily be embedded in your code. Just copy the whole file into your program, fill the pattern vector and call NNout().
float* pat = new float[8];
pat[0] = ...
pat[7] = ...
float* output = new float[1];
The value of out_dat[0] can then be used for discrimination. In our example, the cut "out_dat[0]>0.50" should noticeably suppress the photon background while selecting photons with good efficiency.
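To judge a cut value, one can count the fraction of events that pass it in a signal sample and in a background sample. The helper below is a hypothetical sketch, not part of NNout.C; it assumes the network outputs of a sample have already been collected in a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of events whose network output exceeds the cut value.
// Applied to a signal sample this is the selection efficiency;
// applied to a background sample it is the fraction of background that survives.
double fractionAboveCut(const std::vector<double>& nn_output, double cut) {
    assert(!nn_output.empty());
    std::size_t n_pass = 0;
    for (double o : nn_output) {
        if (o > cut) ++n_pass;
    }
    return static_cast<double>(n_pass) / static_cast<double>(nn_output.size());
}
```

Scanning the cut between 0 and 1 and comparing the two fractions shows how much signal efficiency is traded for background suppression at each value.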

Feel free to send any questions on this page to
Dmitry Bandurin