diagFDR: Verifiable False Discovery Rate Reporting in Proteomics via Scope, Calibration, and Stability Diagnostics

Chion, M.; Godmer, A.; Douche, T.; Matondo, M.; Giai Gianetto, Q.

2026-04-20 bioinformatics
10.64898/2026.04.16.718468 bioRxiv
Abstract

In mass spectrometry-based proteomics, false discovery rate (FDR) control underpins the credibility of peptide and protein identifications. In contemporary workflows, including multi-run Data Independent Acquisition (DIA), deep learning-assisted scoring, library-free searches, and extensive post-processing, the statement "1% FDR" has become increasingly ambiguous, potentially referring to different statistical entities, multiple-testing scopes, and null models. We propose a standardized framework requiring explicit specification of three complementary properties: "scope", meaning which statistical universe is controlled; "calibration", meaning whether confidence measures behave consistently with their intended interpretation on the reported unit; and "stability", meaning whether acceptance thresholds and resulting identification lists remain robust to perturbations. Building on routine target/decoy outputs, we introduce pipeline-agnostic diagnostics that audit internal coherence of scores, q-values, and posterior error probabilities, quantify tail support and cutoff fragility, and test plausibility of target-decoy assumptions. We further complement internal checks with external validation via entrapment, which measures empirical false positives on known-absent sequences. We highlight a "granularity paradox": as scoring becomes more discriminative, decoy matches can become so sparse near stringent cutoffs that the numerical support for decoy-based estimation deteriorates, making reported FDR thresholds increasingly fragile despite improved separation between the distributions of target and decoy scores. Applications to DIA-NN and MS2Rescore show that scope and aggregation choices can materially alter both estimated error rates and list reproducibility. We provide a practical reporting checklist and an open-source R package (diagFDR, available from CRAN) that generates diagnostic reports from standard software outputs.
As a minimal verifiable reporting standard, we recommend that any "FDR = %" claim specify the controlled unit and scope, report tail support at the operating cutoff, and make decoy-inclusive outputs available for independent verification.

Highlights

FDR claims can be misleading without explicit scope, calibration, and stability assessment.
diagFDR introduces pipeline-agnostic diagnostics from standard software outputs.
The granularity paradox shows sparse decoy tails can make stringent cutoffs numerically fragile.
Case studies show that scope misuse and rescoring can affect both error rates and stability.
diagFDR produces reviewer-ready reports and a practical reporting checklist.
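
The "granularity paradox" and the recommendation to report tail support can be illustrated with a small sketch. It assumes the common simple target-decoy estimator (decoys passing the cutoff divided by targets passing the cutoff), which may differ from what diagFDR actually computes; the function name `summarize_tail`, the `TailSummary` fields, and the toy scores are hypothetical and are not part of the diagFDR package.

```python
from dataclasses import dataclass


@dataclass
class TailSummary:
    n_targets: int                 # targets scoring at or above the cutoff
    n_decoys: int                  # decoys at or above the cutoff ("tail support")
    fdr_estimate: float            # simple decoy/target ratio (assumed estimator)
    fdr_if_one_more_decoy: float   # same ratio if one extra decoy had passed


def summarize_tail(target_scores, decoy_scores, cutoff):
    """Count targets and decoys passing the cutoff and derive a simple FDR estimate."""
    n_targets = sum(s >= cutoff for s in target_scores)
    n_decoys = sum(s >= cutoff for s in decoy_scores)
    if n_targets == 0:
        raise ValueError("no targets pass the cutoff")
    return TailSummary(
        n_targets=n_targets,
        n_decoys=n_decoys,
        fdr_estimate=n_decoys / n_targets,
        fdr_if_one_more_decoy=(n_decoys + 1) / n_targets,
    )


# Toy scores (made up for illustration): a well-separating score leaves
# only 3 decoys above the cutoff.
targets = [0.99] * 300 + [0.5] * 100
decoys = [0.99] * 3 + [0.5] * 97
summary = summarize_tail(targets, decoys, cutoff=0.9)
print(summary)
```

In this toy case the estimate rests on three decoys, so a single additional decoy above the cutoff would move it from 3/300 to 4/300, a relative change of one third. Reporting the decoy count alongside the FDR value makes that fragility visible to a reader.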

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | Bioinformatics | 1061 papers in training set | Top 1% | 21.6%
2 | Nature Methods | 336 papers in training set | Top 0.5% | 16.8%
3 | Nature Communications | 4913 papers in training set | Top 21% | 8.8%
4 | Genome Biology | 555 papers in training set | Top 0.8% | 6.9%
50% of probability mass above
5 | Journal of Proteome Research | 215 papers in training set | Top 0.5% | 6.5%
6 | Molecular & Cellular Proteomics | 158 papers in training set | Top 0.5% | 6.1%
7 | Nature Biotechnology | 147 papers in training set | Top 2% | 4.7%
8 | Cell Systems | 167 papers in training set | Top 3% | 4.7%
9 | PLOS Computational Biology | 1633 papers in training set | Top 11% | 2.9%
10 | Nature Machine Intelligence | 61 papers in training set | Top 2% | 2.0%
11 | Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.4%
12 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 37% | 1.3%
13 | Patterns | 70 papers in training set | Top 2% | 1.1%
14 | Nucleic Acids Research | 1128 papers in training set | Top 15% | 0.9%
15 | BMC Bioinformatics | 383 papers in training set | Top 7% | 0.8%
16 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 4% | 0.8%
17 | Cell Reports Methods | 141 papers in training set | Top 6% | 0.7%
18 | Molecular Systems Biology | 142 papers in training set | Top 2% | 0.7%
19 | PLOS ONE | 4510 papers in training set | Top 70% | 0.7%
20 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.7%
21 | GigaScience | 172 papers in training set | Top 3% | 0.7%
22 | Genome Research | 409 papers in training set | Top 5% | 0.6%
23 | Bioinformatics Advances | 184 papers in training set | Top 5% | 0.6%