Evidence for Production of Single Top Quarks
using Boosted Decision Trees

The DØ Collaboration

December 2006





Abstract

The DØ Collaboration presents first evidence for the production of single top quarks at the Fermilab Tevatron ppbar collider. Using a 0.9 fb⁻¹ dataset, we apply a multivariate analysis, boosted decision trees, to separate signal from background and measure σ(ppbar → tb + X, tqb + X) = 4.9 ± 1.4 pb. The probability to measure a cross section at this value or higher in the absence of signal is 0.035%, corresponding to a 3.4 standard deviation significance. We use the cross section measurement to directly determine the CKM matrix element that describes the Wtb coupling and find 0.68 < |Vtb| ≤ 1 at 95% C.L. within the standard model.

The decision tree analysis also provides measurements for tb and tqb separately: σ(ppbar → tqb + X) = 4.2 +1.8/−1.4 pb and σ(ppbar → tb + X) = 1.0 ± 0.9 pb.


Analysis Overview

Details about the event selection and samples used to model the data can be found on the analysis main page.

For simplicity, and because the decision trees were expected to deal well with all components at once, trees were trained against all backgrounds together rather than making separate trees for each background. The background includes Monte Carlo events for ttbar → lepton+jets, ttbar → dilepton+jets and W+jets (consisting of the separate sub-samples: Wbb+Nlp, Wcc+Nlp and W+Nlp, where bb/cc stands for a b/c quark-antiquark pair and Nlp stands for N light partons, 0 ≤ N ≤ 5). Each background component is represented in proportion to its expected fraction in the background model.

Each sample is treated independently with its own training for each signal, leading to 36 different trees (3 signals × 2 lepton flavors × 3 jet multiplicities × 2 b-tagging possibilities). In the tb+tqb training, the tb and tqb channel components of the signal are taken in their SM proportions.
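The count of 36 trees can be checked by enumerating the channel combinations. The short sketch below is only an illustration; the label strings and variable names are chosen for the example and are not taken from the analysis code.

```python
from itertools import product

# Labels follow the text: three signal definitions, two lepton flavors,
# three jet multiplicities, and one or two b-tagged jets.
signals = ["tb", "tqb", "tb+tqb"]
leptons = ["e", "mu"]
jet_multiplicities = [2, 3, 4]
b_tags = [1, 2]

# One boosted decision tree is trained per combination.
channels = list(product(signals, leptons, jet_multiplicities, b_tags))
print(len(channels))  # 36
```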

We then calculate a multivariate discriminant (the boosted decision tree output) that separates the expected signal from the background as much as possible. We fit the background model to the data in the discriminant output distribution for each analysis channel and combine the results to improve the expected sensitivity.


Decision Trees and Boosting

Decision trees are a machine learning technique [1] not (yet) commonly used in high energy physics. The goal is to extend a simple cut-based analysis into a multivariate technique by continuing to analyze events that fail a particular criterion.

Tree construction

Mathematically, decision trees are rooted binary trees. An example is shown below. Nodes are shown in blue, with their associated splitting test; terminal nodes (leaves) are in green.

[Figure: example decision tree]



Consider a training sample made of known signal and background events: they form the root node of the tree. Given a list of variables {xi}, all events are sorted in turn according to each variable. For each xi the splitting value that gives the best separation of the events into two child nodes -- one with mostly signal events, the other with mostly background events -- is found. The variable and split value giving the best separation are selected and two new nodes are created, one corresponding to events satisfying the split criterion (labeled P for passed in the above figure), the other containing events that failed it (labeled F). The algorithm is then applied recursively to the two child nodes. When the splitting stops, the terminal node is called a leaf, with an associated purity, the weighted signal fraction of the training sample in this node.
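The split search for a single node can be sketched in Python as follows. This is a minimal illustration, not the collaboration's code: the event representation, the function names `gini` and `best_split`, and the specific form of the Gini index, W·p·(1−p) = s·b/(s+b) for signal weight s and background weight b, are assumptions made for the example. The default of 100 events per node is the requirement quoted later in this section.

```python
def gini(sig_w, bkg_w):
    """Weighted Gini index W*p*(1-p) = s*b/(s+b); zero for a pure node."""
    total = sig_w + bkg_w
    return 0.0 if total == 0 else sig_w * bkg_w / total


def best_split(events, min_events=100):
    """Find the variable and cut with the largest Gini decrease.

    events: list of (values_dict, is_signal, weight) tuples.
    Returns (variable, cut, gini_decrease), or None if no split is allowed.
    """
    tot_s = sum(w for _, is_sig, w in events if is_sig)
    tot_b = sum(w for _, is_sig, w in events if not is_sig)
    parent = gini(tot_s, tot_b)
    best = None
    for var in events[0][0]:
        # Sort the events according to this variable, then scan the cut points.
        ordered = sorted(events, key=lambda e: e[0][var])
        s_left = b_left = 0.0
        for i in range(len(ordered) - 1):
            values, is_sig, w = ordered[i]
            if is_sig:
                s_left += w
            else:
                b_left += w
            n_left = i + 1
            n_right = len(ordered) - n_left
            if n_left < min_events or n_right < min_events:
                continue  # both child nodes must be large enough
            lo, hi = values[var], ordered[i + 1][0][var]
            if lo == hi:
                continue  # cannot cut between identical values
            decrease = (parent - gini(s_left, b_left)
                        - gini(tot_s - s_left, tot_b - b_left))
            if best is None or decrease > best[2]:
                best = (var, 0.5 * (lo + hi), decrease)
    return best


# Toy sample (invented): "x" separates signal from background, "y" does not.
toy_events = [
    ({"x": 1.0, "y": 5.0}, False, 1.0),
    ({"x": 2.0, "y": 1.0}, False, 1.0),
    ({"x": 3.0, "y": 4.0}, True, 1.0),
    ({"x": 4.0, "y": 2.0}, True, 1.0),
]
print(best_split(toy_events, min_events=1))
```

Applying `best_split` recursively to the two child samples, until it returns `None`, would give the full tree.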

Once a tree is built, events can be run through it to get the decision tree output of a particular sample. When a new event is passed through the tree, its properties are compared to the criterion at each node until it reaches a leaf. For instance, with the tree in the above figure, an event goes to the right after the root node if HT > 212 GeV, and to the right again if pT < 31.6 GeV. It has then reached a leaf, and the output of the tree for this event is the leaf purity.
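Passing an event through a tree amounts to a simple loop over nodes. The sketch below assumes a hypothetical `Node` class and invented leaf purities; only the two cut values (HT > 212 GeV and pT < 31.6 GeV) come from the example in the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    purity: Optional[float] = None                 # set only for leaves
    test: Optional[Callable[[dict], bool]] = None  # splitting criterion
    passed: Optional["Node"] = None                # child for events passing the test
    failed: Optional["Node"] = None                # child for events failing the test


def tree_output(node, event):
    """Follow the event down the tree and return the purity of the leaf it reaches."""
    while node.purity is None:
        node = node.passed if node.test(event) else node.failed
    return node.purity


# Toy tree: the cuts follow the text, the purities are invented for illustration.
toy_tree = Node(
    test=lambda e: e["HT"] > 212.0,
    passed=Node(
        test=lambda e: e["pT"] < 31.6,
        passed=Node(purity=0.8),
        failed=Node(purity=0.4),
    ),
    failed=Node(purity=0.1),
)

print(tree_output(toy_tree, {"HT": 250.0, "pT": 25.0}))  # 0.8
```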

A leaf is labeled signal if its purity is larger than 0.5, and background otherwise. The Gini factor was used as the splitting criterion (see Ref. [2] for more details). Each node has to contain at least 100 events. The variables used with our decision trees are described in the Discriminating Variables section.

Boosting

A very powerful technique to improve the performance of any weak classifier (anything that does better than random guessing) was introduced a decade ago: boosting [3]. Boosting was recently used in high energy physics with decision trees by the MiniBooNE experiment [4]. The boosting algorithm used in this single top quark search is adaptive boosting, known in the literature as AdaBoost [3].

The basic principle of boosted decision trees is to train a tree T_n, check which events are misclassified by T_n, increase the weight of the misclassified events, and train a tree T_{n+1} on the reweighted sample. This forces T_{n+1} to concentrate on the events that are difficult to classify. The boosted decision tree output for an event is the weighted average of the different tree outputs.
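The reweighting loop can be sketched as below. This is the textbook form of AdaBoost, not necessarily the exact variant or parameters used in the analysis: the tree weight α = ln((1 − err)/err), the use of a one-cut "stump" on a single variable in place of a full decision tree, the ±1 outputs in place of leaf purities, and all function names are assumptions made to keep the example short.

```python
import math


def train_stump(xs, ys, ws):
    """Weak learner: the single cut on x with the lowest weighted error.

    Returns (cut, sign); the stump predicts +1 when sign * (x - cut) > 0.
    Assumes xs contains at least two distinct values.
    """
    values = sorted(set(xs))
    cuts = [0.5 * (a + b) for a, b in zip(values, values[1:])]
    best = None
    for cut in cuts:
        for sign in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if (1 if sign * (x - cut) > 0 else -1) != y)
            if best is None or err < best[0]:
                best = (err, cut, sign)
    return best[1], best[2]


def predict(stump, x):
    cut, sign = stump
    return 1 if sign * (x - cut) > 0 else -1


def adaboost(xs, ys, n_trees=10):
    """Return a list of (alpha, stump) pairs trained on labels ys in {+1, -1}."""
    ws = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_trees):
        stump = train_stump(xs, ys, ws)
        missed = [predict(stump, x) != y for x, y in zip(xs, ys)]
        err = sum(w for w, m in zip(ws, missed) if m)
        if err >= 0.5:
            break  # no better than random guessing
        if err == 0.0:
            ensemble.append((1.0, stump))  # perfect learner, nothing left to boost
            break
        alpha = math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified events, then renormalize.
        ws = [w * math.exp(alpha) if m else w for w, m in zip(ws, missed)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return ensemble


def boosted_output(ensemble, x):
    """Weighted average of the individual outputs, in the range [-1, 1]."""
    total_alpha = sum(alpha for alpha, _ in ensemble)
    if total_alpha == 0.0:
        return 0.0
    return sum(alpha * predict(stump, x) for alpha, stump in ensemble) / total_alpha


# Toy sample (invented): no single cut separates the two classes.
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [+1, +1, -1, -1, +1, +1]
ensemble = adaboost(toy_x, toy_y, n_trees=2)
print(ensemble)
```

In this toy sample the second stump is trained after the events missed by the first one have been given more weight, so it picks a different cut.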


Discriminating Variables

We identify 49 variables from an analysis of the signal and background Feynman diagrams [5] and studies of single top quark production at next-to-leading order [6]. The variables may be classified into three categories: individual object kinematics, global event kinematics, and variables based on angular correlations. The list of variables is shown below (LaTeX source). Jets are sorted in pT and index 1 refers to the leading jet in a jet category: "jetn" (n=1,2,3,4) corresponds to each jet in the event, "tagn" to b-tagged jets, "untagn" to non-b-tagged jets, "bestn" to the best jet and "notbestn" to all but the best jet. The best jet is defined as the jet for which the invariant mass M(W,jet) is closest to mt = 175 GeV.
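The "best jet" definition can be illustrated with a few lines of Python. The sketch assumes that the W boson and the jets are available as four-vectors (E, px, py, pz) in GeV; the function names and the toy numbers are invented for the example, and the reconstruction of the W boson itself is not covered.

```python
import math

TOP_MASS = 175.0  # GeV, the value of mt quoted in the text


def invariant_mass(p1, p2):
    """Invariant mass of the sum of two four-vectors (E, px, py, pz)."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))


def best_jet_index(w, jets):
    """Index of the jet whose M(W, jet) is closest to the top quark mass."""
    return min(range(len(jets)),
               key=lambda i: abs(invariant_mass(w, jets[i]) - TOP_MASS))


# Toy event (invented): a W boson at rest and three massless jets.
w_boson = (80.4, 0.0, 0.0, 0.0)
jets = [
    (50.0, 50.0, 0.0, 0.0),
    (150.0, 0.0, 150.0, 0.0),
    (300.0, 0.0, 0.0, 300.0),
]
print(best_jet_index(w_boson, jets))  # 1
```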

[Table: list of discriminating variables]



Cross Check Samples

Two cross check samples were created:

  • a W+jets-enriched sample, requiring two jets, one of them b-tagged, and HT (the scalar sum of all jet pT values, the lepton pT, and the missing transverse energy, MET) less than 175 GeV;
  • a ttbar-enriched sample, requiring four jets, one of them b-tagged, and HT > 300 GeV.
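The two selections can be written as a small classification function. This is a sketch under stated assumptions: "one of them b-tagged" is read as exactly one b-tagged jet, the inputs are in GeV, and the function names and toy values are invented for the example.

```python
def scalar_ht(jet_pts, lepton_pt, met):
    """HT: scalar sum of all jet pT values, the lepton pT and the MET (GeV)."""
    return sum(jet_pts) + lepton_pt + met


def cross_check_sample(jet_pts, n_btags, lepton_pt, met):
    """Return the cross-check sample an event belongs to, or None."""
    ht = scalar_ht(jet_pts, lepton_pt, met)
    n_jets = len(jet_pts)
    if n_jets == 2 and n_btags == 1 and ht < 175.0:
        return "W+jets"
    if n_jets == 4 and n_btags == 1 and ht > 300.0:
        return "ttbar"
    return None


# Toy events (invented values).
print(cross_check_sample([40.0, 30.0], 1, 25.0, 30.0))              # W+jets
print(cross_check_sample([80.0, 60.0, 50.0, 40.0], 1, 40.0, 50.0))  # ttbar
```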

Data were compared with the expected background samples for several variable distributions. The DT output distributions for the cross-check samples are shown below.

[Figure: DT output from the W+jets-enhanced cross-check sample]
[Figure: DT output from the ttbar-enhanced cross-check sample]


Decision Tree Outputs

Decision trees are trained on 1/3 of all W+jets and ttbar MC events and the rest of the events are used to measure acceptances.

e+jets

1 b-tagged jet
2 jets   3 jets   4 jets

2 b-tagged jets
2 jets   3 jets   4 jets

mu+jets

1 b-tagged jet
2 jets   3 jets   4 jets

2 b-tagged jets
2 jets   3 jets   4 jets


Cross Section Measurements

We apply a Bayesian approach [7] to measure the single top quark production cross section. We form a binned likelihood as a product over all bins and channels (lepton flavor, jet multiplicity, and tag multiplicity) of the decision tree discriminant, separately for the tb+tqb, tqb, and tb analyses. We assume a Poisson distribution for the observed counts and flat nonnegative prior probabilities for the signal cross sections. Systematic uncertainties and their correlations are taken into account by integrating over the signal acceptances, background yields, and integrated luminosity with Gaussian priors for each systematic uncertainty. The final posterior probability density is computed as a function of the production cross section. For each analysis, we measure the cross section using the position of the posterior density peak and we take the 68% asymmetric interval about the peak as the uncertainty on the measurement. The tb+tqb posterior is shown below:

[Figure: posterior probability density for the tb+tqb cross section]
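The core of the procedure, a binned Poisson likelihood with a flat nonnegative prior on the cross section, can be sketched as follows. This is a simplified illustration and not the analysis code: systematic uncertainties are ignored (no integration over Gaussian priors), the posterior is evaluated on a grid, the 68% interval is taken as the highest-density region, and all names and numbers are invented for the example.

```python
import math


def log_poisson(n, mu):
    """Log of the Poisson probability of observing n counts when mu are expected."""
    if mu <= 0.0:
        return 0.0 if n == 0 else float("-inf")
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def posterior_density(observed, signal_per_pb, background, sigma_grid):
    """Posterior on an evenly spaced grid of cross sections (pb), flat prior.

    Each bin expects sigma * signal_per_pb + background events; the bins of
    all channels can simply be concatenated into the three input lists.
    """
    log_like = [
        sum(log_poisson(n, sigma * a + b)
            for n, a, b in zip(observed, signal_per_pb, background))
        for sigma in sigma_grid
    ]
    top = max(log_like)
    density = [math.exp(v - top) for v in log_like]
    step = sigma_grid[1] - sigma_grid[0]
    norm = sum(density) * step
    return [d / norm for d in density]


def peak_and_interval(sigma_grid, density, content=0.68):
    """Position of the posterior peak and the highest-density interval around it."""
    step = sigma_grid[1] - sigma_grid[0]
    order = sorted(range(len(density)), key=lambda i: density[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += density[i] * step
        if total >= content:
            break
    return sigma_grid[order[0]], sigma_grid[min(kept)], sigma_grid[max(kept)]


# Invented single-bin example: 20 events observed, 10 background events
# expected, and 2 signal events expected per pb of cross section.
grid = [0.01 * i for i in range(2001)]  # 0 to 20 pb
dens = posterior_density([20], [2.0], [10.0], grid)
peak, low, high = peak_and_interval(grid, dens)
print(peak, low, high)
```

In this toy case the peak sits where the expected yield equals the observed count, that is, at (20 − 10)/2 = 5 pb.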

We obtain the following results:

s-channel      σ(ppbar → tb + X)            = 1.0 ± 0.9 pb
t-channel      σ(ppbar → tqb + X)           = 4.2 +1.8/−1.4 pb
s+t channels   σ(ppbar → tb + X, tqb + X)   = 4.9 ± 1.4 pb

which are consistent with the SM next-to-leading-order predictions of 0.9 pb, 2.0 pb, and 2.9 pb respectively.

Significance of the result

A large ensemble of pseudo-datasets (over 68,000 entries), with all systematic uncertainties included, has been generated with zero signal content; that is, the pseudo-datasets contain only background events. We have performed the decision tree analysis on each of these datasets, including the full treatment of systematic uncertainties, and have measured the tb+tqb cross section in each one. From this ensemble we measure the probability that data containing no single top quark events could fluctuate to give at least our measured cross section value. This is the so-called "p-value", which is widely used to estimate the significance of a measurement.

We find that the probability that the background fluctuates up to produce the measured cross section of 4.9 pb or greater is 0.035%, corresponding to a significance for our result of 3.4 Gaussian equivalent standard deviations. Using a second ensemble of pseudo-datasets which includes a SM tb+tqb signal with 2.9 pb cross section, with all systematic uncertainties included, we find the probability to measure a cross section of at least 4.9 pb to be 11%.

[Figure: single top significance (p-value)]
Measured cross section in a large ensemble of pseudo-datasets without single top content. The p-value is the fraction of pseudo-datasets whose measured cross section is at least as large as the cross section measured in the real data.
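The p-value and its conversion to a Gaussian-equivalent significance can be sketched as below. The one-sided convention, Z = Φ⁻¹(1 − p), is assumed here; the function names and the toy ensemble are invented for the example.

```python
from statistics import NormalDist


def p_value(pseudo_measurements, observed):
    """Fraction of background-only pseudo-datasets measuring at least `observed`."""
    n_above = sum(1 for x in pseudo_measurements if x >= observed)
    return n_above / len(pseudo_measurements)


def gaussian_significance(p):
    """One-sided Gaussian-equivalent number of standard deviations for a p-value."""
    return NormalDist().inv_cdf(1.0 - p)


# Toy ensemble (invented): 2 of 10 background-only outcomes reach the observed value.
print(p_value([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 8))  # 0.2

# The p-value quoted in the text, 0.035%, under the one-sided convention.
print(round(gaussian_significance(0.00035), 1))  # 3.4
```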



Frequently Asked Questions

Figures from the PRL paper can be found here


References

  1. L. Breiman et al., "Classification and Regression Trees," Wadsworth (1984).
  2. Y. Coadou for the CDF and DØ Collaborations, "Uses of Multivariate Analysis Methods," PoS TOP2006, 016 (2006).
  3. Y. Freund and R.E. Schapire, "Experiments with a New Boosting Algorithm," in Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148-156 (1996).
  4. B.P. Roe et al., "Boosted Decision Trees as an Alternative to Artificial Neural Networks for Particle Identification," Nucl. Instrum. Meth. A 543, 577 (2005).
  5. E. Boos, L. Dudko, and T. Ohl, Eur. Phys. J. C 11, 473 (1999); L. Dudko, AIP Conf. Proc. 583, 83 (2001); E. Boos and L. Dudko, Nucl. Instrum. Meth. A 502, 486 (2003).
  6. Q.-H. Cao, R. Schwienhorst, and C.-P. Yuan, Phys. Rev. D 71, 054023 (2005); Q.-H. Cao et al., ibid. 72, 094027 (2005).
  7. I. Bertram et al., "A Recipe for the Construction of Confidence Limits," Fermilab-TM-2104 (2000), and references therein; E.T. Jaynes and L. Bretthorst, "Probability Theory: the Logic of Science," Cambridge University Press, Cambridge (2003).
  8. G.L. Kane, F.A. Ladinsky and C.P. Yuan, Phys. Rev. D 45, 124 (1992); K. Whisnant et al., Phys. Rev. D 56, 467 (1997).

E-mail the single top subgroup leaders: Arán García-Bellido, Ann Heinson
