Evidence for Production of Single Top Quarks
using Boosted Decision Trees
The DØ Collaboration
December 2006
Abstract
The DØ Collaboration presents first evidence for the production of
single top quarks at the Fermilab Tevatron ppbar collider. Using a 0.9
fb⁻¹ dataset, we apply a multivariate analysis, boosted decision trees,
to separate signal from background and measure
σ(ppbar→ tb + X, tqb + X) = 4.9 ± 1.4 pb.
The probability to measure a cross section at this value or higher
in the absence of
signal is 0.035%, corresponding to a 3.4 standard deviation
significance. We use the cross section measurement to directly
determine the CKM matrix element that describes the Wtb
coupling and find
0.68 < |Vtb| ≤ 1
at 95% C.L. within the standard model.
The decision tree analysis also provides measurements for tb and
tqb separately:
σ(ppbar → tqb + X) = 4.2 +1.8/−1.4 pb
and
σ(ppbar → tb + X) = 1.0 ± 0.9 pb
Analysis Overview
Details about the event selection and samples used to model the data can
be found on the analysis main page.
For simplicity, and because the decision trees were expected to
deal well with all components at once, trees were trained against all
backgrounds together rather than making separate trees for each
background. The background includes Monte Carlo events for
ttbar → lepton+jets, ttbar → dilepton+jets and W+jets
(consisting of the separate sub-samples:
Wbb+Nlp, Wcc+Nlp and W+Nlp,
where bb/cc stands for a b/c quark-antiquark pair
and Nlp stands for N light partons, 0 ≤ N ≤ 5).
Each background component is represented in
proportion to its expected fraction in the background model.
Each sample is treated independently with its own training for each
signal, leading to 36 different trees (3 signals × 2 lepton
flavors × 3 jet multiplicities × 2 b-tagging
possibilities).
In the tb+tqb training, the tb and tqb channel
components of the signal are taken in their SM proportions.
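The count of 36 trainings quoted above can be checked by enumerating the channel combinations. This is only an illustrative sketch: the signal, lepton and b-tag labels follow the text, while the jet multiplicity values (2, 3, 4) are an assumption, since the text gives only their number.

```python
from itertools import product

signals = ["tb", "tqb", "tb+tqb"]   # three signal definitions named in the text
leptons = ["e", "mu"]               # two lepton flavors
jet_multiplicities = [2, 3, 4]      # ASSUMED values; the text only says "3 jet multiplicities"
b_tags = [1, 2]                     # 1 or 2 b-tagged jets

# One independent training per (signal, lepton, jets, b-tags) combination.
trainings = list(product(signals, leptons, jet_multiplicities, b_tags))
print(len(trainings))  # 36
```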
We then calculate a multivariable discriminant
(the boosted decision tree output) to separate as much as possible the
expected signal from the background. We fit the background model to
the data in the discriminant output distribution for each analysis
channel and combine the results to improve the expected sensitivity.
Decision Trees and Boosting
Decision trees are a machine learning technique [1] not (yet) commonly used in high energy
physics. The goal is to extend a simple cut-based analysis into a
multivariate technique by continuing to analyze events that fail a
particular criterion.
Tree construction
Mathematically, decision trees are rooted binary trees. An example
is shown below. Nodes are shown in blue,
with their associated splitting test; terminal nodes (leaves) are in green.

Consider a training sample made of known signal and background
events: they form the root node of the tree. Given a list of variables
{x_i}, all events are sorted in turn according to
each variable. For each x_i the splitting value that
gives the best separation of the events into two child nodes -- one
with mostly signal events, the other with mostly background events --
is found. The variable and split value giving the best separation are
selected and two new nodes are created, one corresponding to events
satisfying the split criterion (labeled P for passed in the above
figure), the other containing events that failed it (labeled F). The
algorithm is then applied recursively to the two child nodes. When the
splitting stops, the terminal node is called a leaf, with an
associated purity, the weighted signal fraction of the training sample
in this node.
Once a tree is built, events can be run through it to get the
decision tree output of a particular sample. When a new event is
passed through the tree, its properties are compared to the criterion
at each node until it reaches a leaf. For instance with the tree in
the above figure the event will go to the right after the root node if
HT>212 GeV, and to the right again if
pT<31.6 GeV. It would then have reached a leaf and the
output of the tree for this event is the leaf purity.
A leaf is labeled signal if the purity is larger than 0.5,
background otherwise. The Gini factor was used as the splitting
criterion (see Ref [2] for more details). Each
node has to contain at least 100 events. The variables used with our
decision trees are described in the Discriminating Variables section.
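The split search described above can be sketched as follows. This is a minimal illustration, not the analysis code: it assumes the common convention in which the Gini index of a node is W·p·(1−p), with W the summed event weight and p the purity, and it picks the cut that maximizes the decrease in Gini. The exact definition used in the analysis is the one in Ref. [2]; the function names and the toy sample are hypothetical.

```python
def gini(sig_w, bkg_w):
    """Weighted Gini index W * p * (1 - p) of a node (assumed convention)."""
    total = sig_w + bkg_w
    if total == 0:
        return 0.0
    purity = sig_w / total
    return total * purity * (1.0 - purity)


def best_split(events, var, min_events=100):
    """Scan one variable for the cut with the largest Gini decrease.

    events: list of (features_dict, is_signal, weight) tuples.
    Returns (gain, cut); cut is None if no split passes the size requirement.
    """
    ordered = sorted(events, key=lambda ev: ev[0][var])
    tot_s = sum(w for _, is_sig, w in ordered if is_sig)
    tot_b = sum(w for _, is_sig, w in ordered if not is_sig)
    parent = gini(tot_s, tot_b)
    best_gain, best_cut = 0.0, None
    s_left = b_left = 0.0
    for i in range(len(ordered) - 1):
        feats, is_sig, w = ordered[i]
        if is_sig:
            s_left += w
        else:
            b_left += w
        n_left, n_right = i + 1, len(ordered) - i - 1
        # Both children must keep at least min_events events.
        if n_left < min_events or n_right < min_events:
            continue
        lo, hi = feats[var], ordered[i + 1][0][var]
        if lo == hi:
            continue  # cannot cut between identical values
        gain = parent - gini(s_left, b_left) - gini(tot_s - s_left, tot_b - b_left)
        if gain > best_gain:
            best_gain, best_cut = gain, 0.5 * (lo + hi)
    return best_gain, best_cut


# Toy sample: 150 background events at low HT, 150 signal events at high HT.
toy = [({"HT": float(i)}, False, 1.0) for i in range(150)]
toy += [({"HT": float(150 + i)}, True, 1.0) for i in range(150)]
gain, cut = best_split(toy, "HT")
```

In a full tree builder this scan would be repeated for every variable, the best (variable, cut) pair kept, and the procedure applied recursively to the two child nodes.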
Boosting
A very powerful technique to improve the performance of any weak
classifier (anything that does better than random guessing) was
introduced a decade ago: boosting [3]. Boosting
was recently used in high energy physics with decision trees by the
MiniBooNE experiment [4]. The boosting algorithm
used in this single top quark search is adaptive boosting, known in
the literature as AdaBoost [3].
The basic principle of boosted decision trees is to train a tree
T_n, check which events are misclassified by T_n,
increase the weight of misclassified events and train a tree
T_(n+1) on the reweighted sample. This makes T_(n+1)
work harder to classify the difficult events properly. The boosted
decision tree output for an event is the weighted average of the
different tree outputs.
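The reweighting loop described above can be illustrated with a short sketch. Several simplifications are assumptions made for brevity and are not taken from the analysis: single-cut "stumps" on one variable stand in for full decision trees, each stump outputs ±1 rather than a leaf purity, and the tree weight is the textbook AdaBoost choice α = ln((1 − err)/err). All function names are hypothetical.

```python
import math


def train_stump(xs, ys, ws):
    """Find the single cut (and orientation) with the lowest weighted error."""
    values = sorted(set(xs))
    best = None
    for cut in (0.5 * (a + b) for a, b in zip(values, values[1:])):
        for sign in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if (1 if sign * (x - cut) > 0 else -1) != y)
            if best is None or err < best[0]:
                best = (err, cut, sign)
    return best[1], best[2]


def predict(stump, x):
    """Return +1 (signal-like) or -1 (background-like) for one stump."""
    cut, sign = stump
    return 1 if sign * (x - cut) > 0 else -1


def adaboost(xs, ys, n_trees):
    """Train n_trees stumps, boosting the weight of misclassified events."""
    ws = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_trees):
        stump = train_stump(xs, ys, ws)
        miss = [predict(stump, x) != y for x, y in zip(xs, ys)]
        err = sum(w for w, m in zip(ws, miss) if m)
        if err == 0.0 or err >= 0.5:
            if err == 0.0:
                ensemble.append((1.0, stump))  # perfect classifier, nothing to boost
            break
        alpha = math.log((1.0 - err) / err)
        # Raise the weight of misclassified events, then renormalize.
        ws = [w * math.exp(alpha) if m else w for w, m in zip(ws, miss)]
        norm = sum(ws)
        ws = [w / norm for w in ws]
        ensemble.append((alpha, stump))
    return ensemble


def boosted_output(ensemble, x):
    """Weighted average of the individual outputs, in [-1, 1]."""
    total = sum(alpha for alpha, _ in ensemble)
    return sum(alpha * predict(stump, x) for alpha, stump in ensemble) / total


# Toy sample that no single cut can separate (+1 = signal, -1 = background).
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, n_trees=3)
```

On this toy sample the best single stump misclassifies three of the ten events, while the weighted combination of three boosted stumps classifies all of them correctly.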
Discriminating Variables
We identify 49 variables from an analysis of the signal and
background Feynman diagrams [5] and studies of
single top quark production at next-to-leading order [6]. The variables may be classified into three
categories: individual object kinematics, global event kinematics, and
variables based on angular correlations. The list of variables is
shown below. Jets are
sorted in pT and index 1 refers to the leading jet in a jet
category: "jetn" (n=1,2,3,4) corresponds to each jet in the event,
"tagn" to b-tagged jets, "untagn" to non-b-tagged jets, "bestn" to the
best jet and "notbestn" to all but the best jet. The best jet is
defined as the jet for which the invariant mass M(W,jet) is
closest to mt = 175 GeV.
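The "best jet" definition above can be sketched as follows. The four-vector layout (E, px, py, pz) in GeV, the function names and the toy kinematics are assumptions made for illustration; how the W boson four-vector is reconstructed is not covered here.

```python
import math

M_TOP = 175.0  # GeV, the top quark mass value quoted in the text


def invariant_mass(p4_a, p4_b):
    """Invariant mass of the sum of two four-vectors (E, px, py, pz)."""
    e, px, py, pz = (a + b for a, b in zip(p4_a, p4_b))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def best_jet_index(w_p4, jets_p4):
    """Index of the jet whose M(W, jet) is closest to M_TOP."""
    return min(range(len(jets_p4)),
               key=lambda i: abs(invariant_mass(w_p4, jets_p4[i]) - M_TOP))


# Toy event: a W boson at rest and three massless jets sorted in pT.
w_boson = (80.4, 0.0, 0.0, 0.0)
jets = [(200.0, 200.0, 0.0, 0.0),
        (150.0, 0.0, 150.0, 0.0),
        (40.0, -40.0, 0.0, 0.0)]
best = best_jet_index(w_boson, jets)  # the second jet gives M(W, jet) near 175 GeV
```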

Cross Check Samples
Two cross check samples were created:
- a W+jets enriched sample, requiring two jets, one of them b-tagged, and HT (the scalar sum of all jet pT's, lepton pT and MET) less than 175 GeV
- a ttbar enriched sample, requiring four jets, one of them b-tagged, and HT > 300 GeV.
Data were compared with the expected background samples for several variable distributions.
The DT output distributions for the cross-check samples are shown below.
DT output from the W+jets-enhanced cross-check sample
DT output from the ttbar-enhanced cross-check sample
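The two cross-check selections defined above can be written as a small function. This is a sketch under stated assumptions: "one of them b-tagged" is read as exactly one b-tagged jet, all momenta are in GeV, and the function names are hypothetical.

```python
def h_t(jet_pts, lepton_pt, met):
    """HT: scalar sum of all jet pT's, the lepton pT and MET (GeV)."""
    return sum(jet_pts) + lepton_pt + met


def cross_check_region(jet_pts, n_btags, lepton_pt, met):
    """Return 'W+jets', 'ttbar' or None for one event.

    ASSUMPTION: 'one of them b-tagged' is taken to mean exactly one b-tag.
    """
    ht = h_t(jet_pts, lepton_pt, met)
    if len(jet_pts) == 2 and n_btags == 1 and ht < 175.0:
        return "W+jets"
    if len(jet_pts) == 4 and n_btags == 1 and ht > 300.0:
        return "ttbar"
    return None


soft_two_jet = cross_check_region([40.0, 25.0], 1, 30.0, 35.0)               # HT = 130 GeV
hard_four_jet = cross_check_region([90.0, 70.0, 50.0, 30.0], 1, 40.0, 45.0)  # HT = 325 GeV
```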
Decision Tree Outputs
Decision trees are trained on 1/3 of all W+jets and ttbar MC events and
the rest of the events are used to measure acceptances.
[Plots: DT outputs for e+jets (1 b-tagged jet, 2 b-tagged jets) and mu+jets (1 b-tagged jet, 2 b-tagged jets)]
Cross Section Measurements
We apply a Bayesian approach [7] to measure the
single top quark production cross section. We form a binned likelihood
as a product over all bins and channels (lepton flavor, jet
multiplicity, and tag multiplicity) of the decision tree discriminant,
separately for the tb+tqb, tqb, and tb analyses. We assume a Poisson
distribution for the observed counts and flat nonnegative prior
probabilities for the signal cross sections. Systematic uncertainties
and their correlations are taken into account by integrating over the
signal acceptances, background yields, and integrated luminosity with
Gaussian priors for each systematic uncertainty. The final posterior
probability density is computed as a function of the production cross
section. For each analysis, we measure the cross section using the
position of the posterior density peak and we take the 68% asymmetric
interval about the peak as the uncertainty on the measurement. The
tb+tqb posterior is shown below:

We obtain the following results:
| s-channel | σ(ppbar → tb + X) | = 1.0 ± 0.9 pb |
| t-channel | σ(ppbar → tqb + X) | = 4.2 +1.8/−1.4 pb |
| s+t channels | σ(ppbar → tb + X, tqb + X) | = 4.9 ± 1.4 pb |
which are consistent with the SM next-to-leading-order predictions of 0.9 pb, 2.0 pb, and
2.9 pb respectively.
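The posterior construction described in this section can be illustrated with a deliberately simplified sketch: a binned Poisson likelihood with a flat non-negative prior on the cross section, evaluated on a grid. Systematic uncertainties (the Gaussian priors that are integrated over) are omitted, the 68% interval is taken as the highest-posterior-density region, and the bin contents are invented toy numbers, not analysis yields. All names are hypothetical.

```python
import math


def log_poisson(n, mu):
    """Log of the Poisson probability of observing n counts given mean mu."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def posterior(observed, background, signal_per_pb, grid):
    """Posterior density on a uniform cross-section grid (flat prior, sigma >= 0)."""
    dens = []
    for sigma in grid:
        log_l = sum(log_poisson(n, b + sigma * s)
                    for n, b, s in zip(observed, background, signal_per_pb))
        dens.append(math.exp(log_l))
    step = grid[1] - grid[0]
    norm = sum(dens) * step
    return [d / norm for d in dens]


def peak_and_interval(grid, dens, cl=0.68):
    """Posterior peak and the highest-density interval containing cl."""
    step = grid[1] - grid[0]
    order = sorted(range(len(dens)), key=lambda i: dens[i], reverse=True)
    chosen, acc = [], 0.0
    for i in order:
        chosen.append(i)
        acc += dens[i] * step
        if acc >= cl:
            break
    return grid[order[0]], grid[min(chosen)], grid[max(chosen)]


# Toy inputs: three discriminant bins (NOT real analysis numbers).
grid = [0.01 * i for i in range(2001)]   # 0 to 20 pb
dens = posterior(observed=[12, 9, 7],
                 background=[10.0, 6.0, 3.0],
                 signal_per_pb=[0.3, 0.6, 0.9],   # expected signal yield per pb
                 grid=grid)
peak, low, high = peak_and_interval(grid, dens)
```

Because the interval is built from the highest-density points, it is generally asymmetric about the peak, which is how a result such as +1.8/−1.4 pb can arise.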
Significance of the result
A large ensemble of pseudo-datasets (over 68000 entries), with all systematic
uncertainties included, has been generated with zero signal content,
i.e., they contain only events from the background. We have performed
the decision tree analysis on each of these datasets, including full
systematic error treatment, and have measured the cross section for
tb+tqb in each set. We measure the probability that data containing
no single top quark events could fluctuate to give us at least our measured cross section
value. This is the so-called "p-value" and is widely used to estimate the significance of a
measurement.
We find that the probability that the background fluctuates up to
produce the measured cross section of 4.9 pb or greater is 0.035%,
corresponding to a significance for our result of 3.4 Gaussian
equivalent standard deviations. Using a second ensemble of
pseudo-datasets which includes a SM tb+tqb signal with 2.9 pb cross
section, with all systematic uncertainties included, we find the
probability to measure a cross section of at least 4.9 pb to be 11%.

Measured cross section in a large ensemble of pseudo-datasets without single top content.
The p-value of the measurement is the fraction of pseudo-datasets with a
measured cross section at least as large as the one measured in the real data.
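The p-value and its conversion to Gaussian-equivalent standard deviations can be sketched as follows, assuming the usual one-sided convention. The function names are hypothetical and the toy ensemble is invented; only the 0.035% figure comes from the text.

```python
from statistics import NormalDist


def p_value(pseudo_results, observed):
    """Fraction of pseudo-datasets with a result at least as large as observed."""
    return sum(1 for x in pseudo_results if x >= observed) / len(pseudo_results)


def significance(p):
    """One-sided Gaussian-equivalent significance for a p-value."""
    return NormalDist().inv_cdf(1.0 - p)


# Toy background-only ensemble: 10 of 10000 entries reach the observed value.
toy_ensemble = [0.0] * 9990 + [5.0] * 10
toy_p = p_value(toy_ensemble, 4.9)

# The quoted p-value of 0.035% corresponds to about 3.4 standard deviations.
z = significance(0.00035)
print(round(z, 1))  # 3.4
```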
Frequently Asked Questions
Figures from the PRL paper can be found
here
References
[1] L. Breiman et al., "Classification and Regression Trees," Wadsworth (1984).
[2] Y. Coadou for the CDF and DØ Collaborations, "Uses of Multivariate Analysis Methods," PoS TOP2006, 016 (2006).
[3] Y. Freund and R.E. Schapire, "Experiments with a New Boosting Algorithm," in Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148-156 (1996).
[4] B.P. Roe et al., "Boosted Decision Trees as an Alternative to Artificial Neural Networks for Particle Identification," Nucl. Instrum. Meth. A 543, 577 (2005).
[5] E. Boos, L. Dudko, and T. Ohl, Eur. Phys. J. C 11, 473 (1999); L. Dudko, AIP Conf. Proc. 583, 83 (2001); E. Boos and L. Dudko, Nucl. Instrum. Methods A 502, 486 (2003).
[6] Q.-H. Cao, R. Schwienhorst, and C.-P. Yuan, Phys. Rev. D 71, 054023 (2005); Q.-H. Cao et al., ibid. 72, 094027 (2005).
[7] I. Bertram et al., "A Recipe for the Construction of Confidence Limits," Fermilab-TM-2104 (2000), and references therein; E.T. Jaynes and L. Bretthorst, "Probability Theory: the Logic of Science," Cambridge University Press, Cambridge (2003).
[8] G.L. Kane, F.A. Ladinsky and C.P. Yuan, Phys. Rev. D 45, 124 (1992); K. Whisnant et al., Phys. Rev. D 56, 467 (1997).
E-mail the single top subgroup leaders: Arán García-Bellido, Ann Heinson