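One approach is to define a `confidence limit' by analogy to the
conventional $\chi^2$ test. A $\chi^2$ test is applicable to the case
where one has a collection of N normally distributed variables $x_i$
with means $\mu_i$ and widths $\sigma_i$. Call this
hypothesis H. One defines the function

$$\chi^2 = \sum_{i=1}^{N} \frac{(x_i - \mu_i)^2}{\sigma_i^2}.$$

Then, given some particular measurement $x_i$ (call it D),
one can test how
consistent it is with H by asking how likely it would be,
assuming H, to get a measurement with a larger $\chi^2$
than the one actually seen. That is, one evaluates the
integral

$$\int_{\chi^2(D)}^{\infty} f(\chi'^2; N)\, d\chi'^2,$$

where $f(\chi'^2; N)$ is the $\chi^2$ probability density for N degrees of freedom.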
This is in some sense the probability for the presumed model H to fluctuate to give the observed data D. (Note that it is not the probability that H is true; that is not well-defined unless one specifies the complete set of alternatives to H.)
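As a concrete illustration, the test described above can be sketched in a few lines of Python. The function names and the numerical inputs are illustrative assumptions, not values from this analysis. The tail probability is computed from the standard finite-sum closed forms for the chi-square distribution, so no external library is needed.

```python
import math


def chi2_statistic(x, mu, sigma):
    """Sum of squared deviations of x from mu, each scaled by its width."""
    return sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu, sigma))


def chi2_tail_probability(chi2, ndof):
    """Probability, assuming H, of a chi-square larger than `chi2`.

    Uses the standard closed forms, which differ for even and odd `ndof`.
    """
    if chi2 <= 0.0:
        return 1.0
    if ndof % 2 == 0:
        # exp(-chi2/2) * sum_{r=0}^{ndof/2-1} (chi2/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, ndof // 2):
            term *= chi2 / (2 * r)
            total += term
        return math.exp(-chi2 / 2.0) * total
    # Odd ndof: gaussian two-sided tail plus a finite correction sum.
    chi = math.sqrt(chi2)
    term, total = chi, 0.0
    for r in range(1, (ndof - 1) // 2 + 1):
        if r > 1:
            term *= chi2 / (2 * r - 1)
        total += term
    phi = math.exp(-chi2 / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * phi * total


# Hypothetical measurement D of three variables, with means and widths from H.
x = [1.2, 4.1, 2.5]
mu = [1.0, 5.0, 3.0]
sigma = [0.5, 1.0, 0.5]
chi2 = chi2_statistic(x, mu, sigma)
print(chi2, chi2_tail_probability(chi2, len(x)))
```

A small tail probability indicates that a fluctuation of H as large as the one seen in D would be rare.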
In order to generalize this procedure for other types of distributions, note that the probability for observing a particular measurement D assuming H is just

$$P(D|H) \propto \exp(-\chi^2/2),$$

so that a larger $\chi^2$ corresponds to a smaller probability. This suggests that one can
obtain an analogous significance for a problem with an arbitrary
likelihood $L(D|H)$
by computing the integral

$$\int_{L(D'|H) < L(D|H)} L(D'|H)\, dD'.$$

That is, by computing the total probability of all possible data samples D' which have a lower probability than the one actually observed.
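For the mass fitting problem, the hypothesis to test is that the data
are described entirely by the background model. The appropriate
likelihood is thus obtained by setting the expected number of signal events
$n_s$ = 0 in
equation (7.28), yielding

$$L(D | n_b) = \frac{e^{-n_b}\, n_b^{N}}{N!} \prod_{i=1}^{N} f_b(m_i),$$

where $n_b$ is the expected number of background events, $m_i$ are the N fitted masses, and $f_b(m)$ is the background mass distribution. The remaining parameter
$n_b$ is then integrated out:

$$L(D | H) = \int dn_b\, p(n_b)\, L(D | n_b).$$

The prior
$p(n_b)$ is again taken to be a gaussian.
The integral over the data space can then be written

$$\sum_{N'=0}^{\infty} \int dm'_1 \cdots dm'_{N'}\; L(D' | H)\; \theta\bigl(L(D | H) - L(D' | H)\bigr),$$

where D' denotes a sample of N' events with masses $m'_i$, and $\theta$ is the unit step function.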
Note that if the background mass distribution $f_b(m)$
is uniform, this prescription yields the
same result as was used for the counting experiment
(equation (5.12)).
Strictly speaking, this is true only if the number of events N' in the
samples being integrated over is restricted to be
larger than N; this will make a difference only in cases
where the expected background is not small in comparison to the
number of observed events. This is because the prescription developed
here tests the consistency of D with H regardless of the
direction of any disagreement, while (5.12)
counts only upward fluctuations in the number of events. For example,
consider some hypothetical experiment where H predicts that
100 events should be expected, but 1000 events are actually
observed. Both methods would assign a small probability to this
occurrence. However, if 100 events are expected, it is also
quite unlikely to see zero events. The prescription developed
here will also assign a small probability to this latter case;
however, the counting experiment significance
(5.12) would assign it a probability of 1.
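The difference between the two prescriptions can be checked numerically for a simple Poisson counting experiment. The sketch below is illustrative only. It assumes that (5.12) sums the Poisson probabilities for the observed count and above, it includes the observed count itself in the likelihood-ordered sum, and it truncates the infinite sum at a hypothetical `n_max`.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n events, computed in log space for stability."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def upward_significance(n_obs, mean, n_max):
    """Probability of n_obs or more events (upward fluctuations only)."""
    return sum(poisson_pmf(n, mean) for n in range(n_obs, n_max + 1))


def likelihood_ordered_significance(n_obs, mean, n_max):
    """Total probability of all counts that are no more probable than n_obs."""
    p_obs = poisson_pmf(n_obs, mean)
    total = 0.0
    for n in range(n_max + 1):
        p = poisson_pmf(n, mean)
        if p <= p_obs:
            total += p
    return total


mean = 100.0
n_max = 3000  # truncation of the infinite sum, ample for these values
for n_obs in (1000, 0):
    # For 1000 observed events both values are far below double-precision
    # range and underflow to 0.0.
    print(n_obs,
          upward_significance(n_obs, mean, n_max),
          likelihood_ordered_significance(n_obs, mean, n_max))
```

For zero observed events the upward-only sum covers every possible count and so equals 1, while the likelihood-ordered sum picks up only the counts in both far tails.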
The integral in equation (8.4) can be evaluated by Monte Carlo techniques. An outline of a procedure for doing so is as follows.
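1. Evaluate the likelihood $L(D|H)$ of the observed data, and set N' = 0.
2. Compute the counting factor $P(N') = \int dn_b\, p(n_b)\, e^{-n_b} n_b^{N'} / N'!$, where $p(n_b)$ is the prior on the expected number of background events $n_b$.
3. Define a probability threshold $t$ by $t = L(D|H) / P(N')$.
4. Generate a number $K$ of N'-event experiments, picking each mass $m_i$ from the background probability distribution $f_b(m)$. This forms a set of samples $\{m_i\}_k$, $k = 1, \ldots, K$. For each of these samples, compute the remaining likelihood factor $\prod_i f_b(m_i)$, and count the number of times that this is less than the threshold $t$. Call this $K_<$.
5. Add $P(N')\, K_< / K$ to the running sum, and increment N'.
6. Return to step 2, and continue looping until the
terms being summed become insignificantly small.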
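The loop outlined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis. The linear toy background shape, the midpoint integration over a gaussian prior truncated at zero, the stopping rule, and every function and parameter name are hypothetical choices made for the example.

```python
import math
import random


def poisson_pmf(n, mean):
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def counting_factor(n, b_mean, b_sigma, steps=400):
    """Poisson probability of n events, averaged over a gaussian prior on the
    expected background (truncated at zero), by midpoint integration."""
    lo = max(0.0, b_mean - 5.0 * b_sigma)
    hi = b_mean + 5.0 * b_sigma
    h = (hi - lo) / steps
    num = den = 0.0
    for j in range(steps):
        nb = lo + (j + 0.5) * h
        w = math.exp(-0.5 * ((nb - b_mean) / b_sigma) ** 2)
        num += w * poisson_pmf(n, nb)
        den += w
    return num / den


def mc_significance(masses, b_mean, b_sigma, log_pdf, draw_mass,
                    n_toys=2000, tol=1e-7, seed=1):
    """Total probability of samples no more likely than the observed one."""
    rng = random.Random(seed)
    n_obs = len(masses)
    log_l_obs = (math.log(counting_factor(n_obs, b_mean, b_sigma))
                 + sum(log_pdf(m) for m in masses))
    total, n = 0.0, 0
    while True:
        p_n = counting_factor(n, b_mean, b_sigma)
        if p_n > 0.0:
            # Threshold on the shape factor alone, for this event count.
            log_threshold = log_l_obs - math.log(p_n)
            n_below = 0
            for _toy in range(n_toys):
                log_shape = sum(log_pdf(draw_mass(rng)) for _evt in range(n))
                if log_shape <= log_threshold:
                    n_below += 1
            total += p_n * n_below / n_toys
        # Assumed stopping rule: past the prior mean, with negligible terms.
        if n > b_mean and p_n < tol:
            break
        n += 1
    return total


# Hypothetical background shape f_b(m) = 2 (1 - m) on [0, 1).
def toy_log_pdf(m):
    return math.log(2.0 * (1.0 - m))


def toy_draw_mass(rng):
    return 1.0 - math.sqrt(1.0 - rng.random())  # inverse-CDF sampling


print(mc_significance([0.8, 0.9, 0.85], 3.0, 0.5, toy_log_pdf, toy_draw_mass))
```

Working with logarithms of the shape factor avoids underflow when the number of events is large, and passing a uniform `log_pdf` reduces the result to a pure counting significance.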
The results of this calculation are
for the loose
cuts, and
for the standard cuts. If the calculation
is repeated with the background mass distribution
taken to be uniform (i.e., using only counting
information), the results are
for the loose
cuts and
for the standard cuts. (If the
counting experiment prescription of equation (5.12)
were used instead, the result is unchanged for the standard cuts,
but goes down to
for the loose cuts.)
It is also interesting to try to construct a significance which uses
only the shapes of the distributions and which does not depend
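on the scale of the expected background $n_b$. This can be done by fixing
the number of events in the generated samples, N' = N, and taking
the likelihood to be simply
$L = \prod_{i=1}^{N} f_b(m_i)$, where $f_b(m)$ is the background mass
distribution. The results
from this are 0.06 for the loose cuts and 0.30 for the standard
cuts.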
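A shape-only significance of this kind can be estimated with a single set of toy experiments, since the number of events is held fixed. The sketch below is illustrative. The linear background shape and all names are assumptions made for the example and are not the distributions used in the analysis.

```python
import math
import random


def shape_only_significance(masses, log_pdf, draw_mass,
                            n_toys=20000, seed=1):
    """Fraction of N-event toy samples whose shape likelihood (the product of
    the background density over the masses) is no larger than the observed."""
    rng = random.Random(seed)
    n = len(masses)
    log_l_obs = sum(log_pdf(m) for m in masses)
    n_below = 0
    for _toy in range(n_toys):
        log_l_toy = sum(log_pdf(draw_mass(rng)) for _evt in range(n))
        if log_l_toy <= log_l_obs:
            n_below += 1
    return n_below / n_toys


# Hypothetical background shape f_b(m) = 2 (1 - m) on [0, 1).
def toy_log_pdf(m):
    return math.log(2.0 * (1.0 - m))


def toy_draw_mass(rng):
    return 1.0 - math.sqrt(1.0 - rng.random())  # inverse-CDF sampling


# Masses piled up where the background density is low give a small value.
print(shape_only_significance([0.8, 0.9, 0.85], toy_log_pdf, toy_draw_mass))
```

For a single event with this toy shape the exact answer is (1 - m)^2, which gives a quick cross-check of the estimate.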