Artificial Neural Network
for Photon Analysis
When discriminating between photons and background particles (neutral pions as well as the neutral decay channels
of eta and K^0_s mesons) we face a pattern recognition problem typical of high-energy physics.
The standard procedure for solving such a problem is the introduction of
relevant cuts in the multi-dimensional data. Nowadays the application of a software-implemented
artificial neural network (ANN) for pattern recognition is well established
and usually gives results superior to conventional approaches
(see e.g. Proc. of CERN School of Computing, 1991, Ystad, Sweden, CERN 92-02, p. 113-170).
Thus, instead of applying the chosen variables and cuts on them directly, it is often better to build an ANN
that accumulates the power of all the discrimination variables.
A cut on the output of the built ANN can then be applied as an additional and very effective selection criterion.
ANNs usually have more input than output nodes and thus may be viewed
as performing a dimensionality reduction of the input data set.
The ANN approach is a technique that assigns objects to various classes. These objects can be
different data types, such as the signal and the background in our case. Each data type is assigned
to a class (e.g. classes of background and signal events).
Discrimination is achieved by looking at the class to which
a data event should belong. The technique fully exploits the correlations among the different variables
and provides a discriminating boundary between the signal and the background.
ANNs have the ability to learn, remember and create relationships among the data by tuning the connection weights
between input, hidden and output nodes and by tuning the node thresholds.
There are many different types of ANN, but the feed-forward type is the most popular
for recognition tasks. Feed-forward implies that information can only flow in one direction,
and the output directly determines the probability that an event characterized by some input
pattern vector X(x_1, x_2, ..., x_n) is from the signal class.
A typical ANN with a single (binary) output and one layer of hidden units is shown in the picture below.
ANN in HEP
A list of the most popular ANN packages can be found on the page of the
Fermilab Advanced Analysis Group
(see the "Downloads", "Documentation" and "Interesting sites" sections).
At least two of them have proved their reliability and have been widely applied in various HEP analyses:
the JETNET package has a nice ROOT interface developed in CDF, and SNNS has
a very convenient Java-based graphical interface.
Both packages are well documented.
For building the ANN we have used the JETNET package with the ROOT interface.
The main file here, which governs the behavior of the ANN, is root_to_jetnet.C.
In the given example the input variables that are fed to the ANN input for the signal and background classes
are taken from the files gam_40_50.dat_095_010 and emj_40_50.dat_095_010.
Here, the first and second files contain variables that characterize a direct photon (as signal)
and an EM-jet (as background), respectively.
The 1s in the array bit_pattern define the choice of the variables
(columns of the files gam_40_50.dat_095_010 and emj_40_50.dat_095_010).
In this example the following four variables have been chosen: the number of EM cluster cells at the EM1 layer
(ncell_EM1_clus), the number of cells around the EM cluster within the ring of 0.2 (ncell_EM1_dr02),
the scalar sum of track transverse momenta in the ring of 0.05 (sum_PT_track) and
the energy-weighted EM cluster width in r x phi at the EM3 layer (sigrp_EM3).
(The other four variables describe the number of CPS 3D clusters and V cluster strips as well as the number of tracks in
the ring of 0.05.)
After the choice of the physical discrimination variables one needs to define the total and training
numbers of events.
The usual rule is the following: the number of events for the training stage should
be at least 20-30*N_ind. Here N_ind is the number of independent parameters (the
weights and thresholds of the ANN). The total number of independent parameters in a neural network
with a single hidden layer is given by:
N_ind = (N_in + N_on) * N_hn + N_ht + N_ot
where N_in is the number of input nodes, N_on is the number of output nodes,
N_hn is the number of nodes in the single hidden layer, N_ht is the number of thresholds in
the single hidden layer and N_ot is the number of output thresholds.
For the case described in root_to_jetnet.C, N_in=4, N_on=N_ot=1, N_hn=N_ht=6 and N_ind=37. Thus, at least of the order of
1000 events is required for training this network.
Practice shows that the network performance depends only weakly on the number of hidden units,
which is usually of the order of the number of input variables. Nevertheless, studying the dependence
of the ANN discrimination efficiency on N_hn is useful.
Nepoch is the number of training cycles. The training should continue until the network stabilizes
(changes in the found weights become negligible). In practice, the ANN becomes stable when Nepoch = 50-300.
The exact number depends on the nature of the input samples and the number of independent parameters.
The ANN classification/separation efficiency should grow with Nepoch while the ANN error should decrease
(see the JETNET manual). The network performance and error vs. Nepoch
(for the two event classes considered here) can be loaded from plot1
and plot2 .
Another issue is the minimization algorithm (parameter MiniMe in root_to_jetnet.C).
The numbers here correspond to the following cases: MiniMe=0 is standard backpropagation,
MiniMe=1 is the Manhattan algorithm and MiniMe=2 is the Langevin algorithm. In most cases they produce close results
(e.g. see hep-ex/0108051), but in some cases their comparison and the tuning of their parameters are needed.
Below are instructions on how to start.
Type in the loaded ROOT session:
root> .L root_to_jetnet.C
root> jetnet("00011011")   // assuming that the set of variables above has been chosen
root> plotPerform()        // plots the network performance and errors as a function of Nepoch
The built network is completely defined by its architecture and by the set of found weights and node thresholds.
For example, the network used here has the architecture 4-6-1. After finishing its work, jetnet() creates
the weight file; in our case it is the file weights_00011011_4_6_1.dat .
Using this file one can plot the ANN output for one
or more input samples (assuming they have the same structure as the training input files).
For instance, to see the network output for the signal and background samples, call the plotNN(...) function.
Here is a typical distribution of the ANN output
(the ANN was trained to produce "0" for EM-jet background events and "1" for signal photon events).
Before you start the network training,
it is very useful to look at the distributions of your chosen input variables. For this aim just type:
root> plotInput("Samples/gam_40_50.dat_095_010", "Samples/emj_40_50.dat_095_010")
In our case the distributions you should get are shown in Input_1.eps and
And finally, ROOT_to_JETNET creates the file NNout.C, which calculates the ANN output using
the ANN architecture and weights contained inside. It has a very convenient interface and can easily be embedded in your code.
Just copy the whole file into your program, fill the pattern vector and call NNout():
float* pat = new float[4];   // pattern vector: one entry per chosen input variable
pat[0] = ...;
pat[1] = ...;
pat[2] = ...;
pat[3] = ...;
float out_dat = NNout(pat);  // exact signature: see the generated NNout.C
The value of out_dat can then be used for discrimination. In our example, the cut "out_dat>0.50"
should noticeably suppress the EM-jet background and select photons with good efficiency.