Abstract
Multiple hypothesis testing is an integral component of data analysis for large-scale technologies such as proteomics, transcriptomics, or metabolomics, for which the false discovery rate (FDR) and positive FDR (pFDR) have been accepted as error estimation and control measures. The pFDR is the expectation of the false discovery proportion (FDP), given that at least one hypothesis is rejected; the FDP is the ratio of the number of falsely rejected (true null) hypotheses to the number of all rejected hypotheses. In practice, the expectation of the ratio is approximated by the ratio of expectations; however, the conditions under which the former can be replaced by the latter have not been investigated. This work derives exact integral expressions for the expectation (pFDR) and variance of the FDP. The widely used approximation (ratio of expectations) is shown to be a particular case of the integral formula for the pFDR, obtained in the limit of a large sample size. A recurrence formula is provided to compute the pFDR for a predefined number of null hypotheses. The variance of the FDP is approximated for a practical application in peptide identification using forward and reversed protein sequences. The simulations demonstrate that the integral expression is more accurate than the approximate formula when the number of hypotheses is small. For large sample sizes, the pFDRs obtained by the integral expression and the approximation do not differ substantially. Applications to proteomics data sets are included.
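The gap between the expectation of the ratio and the ratio of expectations can be illustrated with a minimal sketch. It is not the integral or recurrence formula of the paper: it is a brute-force enumeration under a simplifying assumption that the number of false rejections V and true rejections S are independent binomial counts. The function names and the parameters `m0`, `m1`, `alpha`, and `power` are hypothetical and chosen only for this illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pfdr_by_enumeration(m0, m1, alpha, power):
    """E[V / (V + S) | V + S > 0] computed by summing over all outcomes.

    Assumption for this sketch: V ~ Binomial(m0, alpha) counts rejected true
    nulls, S ~ Binomial(m1, power) counts rejected false nulls, and the two
    counts are independent.
    """
    total = 0.0
    for v in range(m0 + 1):
        pv = binom_pmf(v, m0, alpha)
        for s in range(m1 + 1):
            if v + s == 0:
                continue  # FDP is undefined without rejections
            total += pv * binom_pmf(s, m1, power) * v / (v + s)
    # Condition on at least one rejection.
    p_any_rejection = 1.0 - (1 - alpha) ** m0 * (1 - power) ** m1
    return total / p_any_rejection


def ratio_of_expectations(m0, m1, alpha, power):
    """E[V] / E[V + S], the commonly used approximation."""
    return m0 * alpha / (m0 * alpha + m1 * power)


# Same per-test rates, small versus larger number of hypotheses.
for m in (5, 200):
    exact = pfdr_by_enumeration(m, m, 0.05, 0.5)
    approx = ratio_of_expectations(m, m, 0.05, 0.5)
    print(m, exact, approx, abs(exact - approx))
```

Under these assumptions the approximation depends only on the per-test rates, whereas the enumerated value also depends on the number of hypotheses, so the two should move closer together as the counts grow.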
Original language | English (US) |
---|---|
Pages (from-to) | 2298-2305 |
Number of pages | 8 |
Journal | Journal of Proteome Research |
Volume | 23 |
Issue number | 6 |
DOIs | |
State | Published - Jun 7 2024 |
Keywords
- false discovery rate
- integral expression for false discovery rate
- integral formula for variance of false discovery proportion
- multiple hypothesis testing
- positive false discovery rate
- variance of false discovery proportion
ASJC Scopus subject areas
- General Chemistry
- Biochemistry