diagFDR: Verifiable False Discovery Rate Reporting in Proteomics via Scope, Calibration, and Stability Diagnostics

Chion, M.; Godmer, A.; Douche, T.; Matondo, M.; Giai Gianetto, Q.

2026-04-20 bioinformatics

10.64898/2026.04.16.718468 bioRxiv


Abstract

In mass spectrometry-based proteomics, false discovery rate (FDR) control underpins the credibility of peptide and protein identifications. In contemporary workflows, including multi-run Data Independent Acquisition (DIA), deep learning-assisted scoring, library-free searches, and extensive post-processing, the statement "1% FDR" has become increasingly ambiguous, potentially referring to different statistical entities, multiple-testing scopes, and null models. We propose a standardized framework requiring explicit specification of three complementary properties: "scope", meaning which statistical universe is controlled; "calibration", meaning whether confidence measures behave consistently with their intended interpretation on the reported unit; and "stability", meaning whether acceptance thresholds and resulting identification lists remain robust to perturbations. Building on routine target/decoy outputs, we introduce pipeline-agnostic diagnostics that audit internal coherence of scores, q-values, and posterior error probabilities, quantify tail support and cutoff fragility, and test plausibility of target-decoy assumptions. We further complement internal checks with external validation via entrapment, which measures empirical false positives on known-absent sequences. We highlight a "granularity paradox": as scoring becomes more discriminative, decoy matches can become so sparse near stringent cutoffs that the numerical support for decoy-based estimation deteriorates, making reported FDR thresholds increasingly fragile despite improved separation between the distributions of target and decoy scores. Applications to DIA-NN and MS2Rescore show that scope and aggregation choices can materially alter both estimated error rates and list reproducibility. We provide a practical reporting checklist and an open-source R package (diagFDR, available from CRAN) that generates diagnostic reports from standard software outputs. As a minimal verifiable reporting standard, we recommend that any "FDR = %" claim specify the controlled unit and scope, report tail support at the operating cutoff, and make decoy-inclusive outputs available for independent verification.

Highlights

- FDR claims can be misleading without explicit scope, calibration, and stability assessment.
- diagFDR introduces pipeline-agnostic diagnostics from standard software outputs.
- The granularity paradox shows sparse decoy tails can make stringent cutoffs numerically fragile.
- Case studies show that scope misuse and rescoring can affect both error rates and stability.
- diagFDR produces reviewer-ready reports and a practical reporting checklist.
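
To make the notions of tail support and cutoff fragility concrete, the following is a minimal sketch in Python. It is not the diagFDR package's API or the authors' estimator. It assumes the simplest target-decoy estimate (decoys accepted divided by targets accepted, with no +1 correction and no special handling of tied scores), and the function name `decoy_tail_support` and the toy scores are hypothetical.

```python
from typing import List, Tuple


def decoy_tail_support(
    scores: List[float], is_decoy: List[bool], alpha: float = 0.01
) -> Tuple[float, int, int]:
    """Find the most permissive score cutoff whose estimated FDR is <= alpha.

    Assumption: FDR is estimated as (#decoys accepted) / (#targets accepted).
    Returns (cutoff score, targets accepted, decoys accepted). The decoy count
    is the "tail support": how many decoys the estimate at the cutoff rests on.
    If no cutoff qualifies, returns (inf, 0, 0).
    """
    # Walk the matches from best score to worst, accepting one at a time.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    targets = 0
    decoys = 0
    best = (float("inf"), 0, 0)
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest position at which the estimate still meets alpha.
        if targets > 0 and decoys / targets <= alpha:
            best = (scores[i], targets, decoys)
    return best


# Hypothetical toy data: 100 well-separated targets and only 2 decoys.
target_scores = [float(s) for s in range(200, 100, -1)]  # 200.0 down to 101.0
decoy_scores = [150.5, 100.5]
scores = target_scores + decoy_scores
is_decoy = [False] * len(target_scores) + [True] * len(decoy_scores)

cutoff, n_targets, n_decoys = decoy_tail_support(scores, is_decoy, alpha=0.015)

# Fragility check: what the estimate would be with one additional decoy
# above the same cutoff.
fdr_at_cutoff = n_decoys / n_targets
fdr_if_one_more_decoy = (n_decoys + 1) / n_targets
print(cutoff, n_targets, n_decoys, fdr_at_cutoff, fdr_if_one_more_decoy)
```

In this toy case the accepted list rests on a single decoy, so one more decoy above the cutoff would double the estimated FDR. That is the kind of sparse-tail sensitivity the granularity paradox describes, and reporting the decoy count at the operating cutoff makes it visible.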

