P-value calibration in multiple hypotheses testing

Because p-values are the most common measure of evidence against a hypothesis, calibrating them against the conditional probability of the null hypothesis is important for reconciling frequentist unconditional inference with Bayesian inference. The Sellke, Bayarri and Berger calibration is one of the most popular attempts at such a calibration. It relies on the theoretical sampling distribution of p-values under the null, the well-known Uniform(0,1), which arises only for specific sampling models. We generalize this calibration by using a null sampling distribution estimated from the data. Such an empirical null distribution can be obtained, for instance, in multiple testing, where many of the p-values come from the null model. The multiple-testing context is purely instrumental for p-value calibration; the multiplicity itself still has to be handled with appropriate techniques. The proposed calibration remains a simple analytic formula, like the original one under Uniform(0,1), and provides a stronger interpretive framework for the widely used p-value.
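For orientation, the original calibration under the Uniform(0,1) null can be sketched in a few lines. The sketch below shows only the standard Sellke, Bayarri and Berger bound, -e p ln(p) for p < 1/e, and the lower bound on the posterior probability of the null that it implies. It does not implement the generalized calibration proposed here, whose formula is not stated in this abstract. The function names, the default prior probability of 0.5, and the choice to return a bound of 1 for p >= 1/e are illustrative assumptions.

```python
import math


def sbb_bayes_factor_bound(p):
    """Lower bound on the Bayes factor in favour of the null: -e * p * ln(p).

    The bound is stated for p < 1/e. Returning 1.0 for larger p is an
    assumption of this sketch (the formula itself equals 1 at p = 1/e).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        return 1.0
    return -math.e * p * math.log(p)


def sbb_posterior_null_bound(p, prior_null=0.5):
    """Lower bound on P(H0 | data) implied by the Bayes factor bound.

    prior_null is the prior probability of the null hypothesis; 0.5 is an
    illustrative default, not a recommendation.
    """
    if not 0.0 < prior_null < 1.0:
        raise ValueError("prior_null must lie in (0, 1)")
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = prior_odds * sbb_bayes_factor_bound(p)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(p, round(sbb_bayes_factor_bound(p), 4),
              round(sbb_posterior_null_bound(p), 4))
```

With equal prior odds, a p-value of 0.05 corresponds to a posterior null probability of at least about 0.29, which illustrates why an uncalibrated p-value tends to overstate the evidence against the null.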
