Biometrics
Oxford University Press (OUP)
Preprints posted in the last 90 days, ranked by how well they match Biometrics's content profile, based on 22 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Kornilov, S. A.
Shenhar et al. (2026) report 50% "intrinsic" lifespan heritability after calibrating a one-component correlated-frailty survival model to Scandinavian twin lifespans. Their framework is mathematically coherent, but the intrinsic component is not identified if heritable, mortality-relevant extrinsic susceptibility is omitted at calibration. We show that one-component calibration absorbs omitted familial extrinsic structure into the intrinsic frailty scale parameter σ_θ, and that this variance absorption is visible through four separate diagnostics. (1) Variance absorption. Under misspecification, σ_θ is inflated by +22.1% (95% CI: 21.5-22.7%), corresponding to +49% inflation in [Formula]. Falconer h² is downstream of calibration and inherits a +9.2 pp bias (95% CI: 8.7-9.7). The σ_θ inflation is model-general: +22% (GM), +18% (MGG), +14% (SR); any dependence summary that is strictly increasing in σ_θ inherits this inflation, so Falconer h² is one affected downstream quantity among many (Corollary B3). (2) Structural fingerprint. In the joint twin survival surface S(t1, t2), misspecification produces systematic dependence errors (ISE 48× that of the recovery model). Conditional twin dependence is inflated at all ages, peaking at age 80 (Δr = 0.048). (3) Specificity. The bias requires an omitted component that is both heritable and mortality-relevant. Three negative controls, a boundary check (ρ = 0), and a two-component recovery refit (σ_θ restored to within -3.2%) establish specificity. ACE decomposition yields C ≈ 0 throughout: the omitted extrinsic component loads onto A (because it is shared 1.0/0.5 in MZ/DZ), so switching summary statistics does not restore identification. (4) Sensitivity and falsifiability. Over an empirically anchored regime (σ_γ ∈ [0.30, 0.65], ρ ∈ [0.20, 0.50]), Falconer bias ranges from +2.8 to +18.9 pp (mean 9 pp). If ρ is sufficiently negative, the bias reverses sign in all three model families (Corollary B4). A full-likelihood robustness check shows that this upward pull is partly structural and partly estimator-specific: in the same misspecified one-component model, ML still inflates σ_θ (+3%), whereas matching only rMZ inflates it much more (+21%). These results do not resolve true intrinsic heritability but establish that Shenhar et al.'s 50% estimate carries a structured, model-general upward bias originating in the fitted latent variance σ_θ.
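As a rough illustration of the absorption mechanism this abstract describes, the following toy moment-matching calculation (all parameter values hypothetical, and the additive log-frailty simplification ours, not the paper's) shows how a one-component fit that matches the MZ dependence absorbs an omitted A-like extrinsic component into σ_θ, and how Falconer h² inherits the bias:

```python
import numpy as np

# Hypothetical values: an omitted extrinsic log-frailty component gamma is
# shared like A (1.0 in MZ, 0.5 in DZ), correlated rho with theta.
sigma_theta, sigma_gamma, rho, sigma_eps = 0.60, 0.45, 0.30, 0.80

var_fam = sigma_theta**2 + sigma_gamma**2 + 2 * rho * sigma_theta * sigma_gamma
cov_mz, cov_dz = var_fam, 0.5 * var_fam          # both components A-like
var_ph = var_fam + sigma_eps**2                  # add unshared residual

# A one-component calibration matching the MZ covariance must reproduce
# cov_mz with sigma_theta_fit**2 alone, absorbing the omitted variance:
sigma_theta_fit = np.sqrt(cov_mz)
print(f"sigma_theta: true {sigma_theta:.2f} -> fitted {sigma_theta_fit:.2f} "
      f"(+{100 * (sigma_theta_fit / sigma_theta - 1):.0f}%)")

# Falconer h2 = 2(rMZ - rDZ) attributes ALL familial variance to A, so it
# overstates the share actually due to the intrinsic component theta:
r_mz, r_dz = cov_mz / var_ph, cov_dz / var_ph
h2_falconer = 2 * (r_mz - r_dz)
h2_intrinsic = sigma_theta**2 / var_ph
print(f"Falconer h2 {h2_falconer:.2f} vs intrinsic share {h2_intrinsic:.2f} "
      f"(bias +{100 * (h2_falconer - h2_intrinsic):.0f} pp)")
```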
Gill, P.; Bleka, O.
The interpretation of trace DNA evidence at activity level requires explicit modelling of transfer, persistence, and failure to detect a person of interest. We present the theoretical foundations of HaloGen, an open-source hierarchical Bayesian framework for evaluating biological results under competing activity-level propositions, such as direct versus secondary transfer. HaloGen accounts for dropout, multiple contributors, and multiple stains. Evidence is evaluated using an exhaustive-propositions likelihood ratio framework that combines information across contributors and stains, while fully accounting for uncertainty in transfer and detection. Observed DNA quantities and non-detects are handled consistently within a single probabilistic model, avoiding reliance on fixed parameter estimates. The framework yields intuitive and robust behaviour: strong support for direct transfer when DNA quantities are informative, and appropriately neutral or defence-leaning likelihood ratios in low-information or non-detect scenarios. An empirically constrained fail-rate parameter prevents spurious inflation of likelihood ratios when offender detection is unlikely, providing stability across laboratories and experimental conditions. This paper establishes the theoretical basis of HaloGen; a companion paper addresses validation and applied casework examples.
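A minimal sketch of why an empirically constrained fail rate keeps non-detects defence-leaning. The detection probabilities and Beta prior below are purely illustrative assumptions, not HaloGen's parameterization:

```python
import numpy as np
rng = np.random.default_rng(1)

# Fail rate f (prob. the assay misses DNA that is present) gets a prior and
# is integrated out rather than fixed; direct transfer deposits more DNA,
# so detection is more likely under Hp (direct) than Hd (secondary).
f = rng.beta(2, 18, size=100_000)            # prior centred near 0.1
p_detect_hp = (1 - f) * 0.90                 # direct transfer
p_detect_hd = (1 - f) * 0.30                 # secondary transfer

# Likelihood ratios, marginalised over f:
lr_nondetect = np.mean(1 - p_detect_hp) / np.mean(1 - p_detect_hd)
lr_detect = np.mean(p_detect_hp) / np.mean(p_detect_hd)
print(f"LR(non-detect) = {lr_nondetect:.2f}  (<1: leans to secondary transfer)")
print(f"LR(detect)     = {lr_detect:.2f}  (>1: supports direct transfer)")
```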
Melton, H. J.; Bradley, J. R.; Wu, C.
Oral squamous cell carcinomas (OSCC), the predominant head and neck cancer, pose significant challenges due to late-stage diagnoses and low five-year survival rates. Spatial transcriptomics offers a promising avenue to decipher the genetic intricacies of OSCC tumor microenvironments. In spatial transcriptomics, cell-type deconvolution is a crucial inferential goal; however, current methods fail to consider the high zero-inflation present in OSCC data. To address this, we develop a novel zero-inflated version of the hierarchical generalized transformation model (ZI-HGT) and apply it to Conditional AutoRegressive Deconvolution (CARD) for cell-type deconvolution. The ZI-HGT serves as an auxiliary Bayesian technique for CARD, reconciling the highly zero-inflated OSCC spatial transcriptomics data with CARD's normality assumption. The combined ZI-HGT + CARD framework achieves enhanced cell-type deconvolution accuracy and quantifies uncertainty in the estimated cell-type proportions. We demonstrate its superior performance through simulations and analysis of the OSCC data. Furthermore, our approach enables the determination of the locations of the diverse fibroblast population in the tumor microenvironment, critical for understanding tumor growth and immunosuppression in OSCC.
Goncalves, B. P.; Franco, E. L.
Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most notably cancer. Yet, existing inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently adversely affect the clinical outcome for cancer patients. Although estimates of the effects of lag time to therapy would be informative to policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies, and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (the Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.
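A small simulation, with invented coefficients, illustrates the indication-bias scenario: when unmeasured severity both shortens lag time and worsens survival, the naive lag-outcome association reverses sign relative to the causal effect a target trial would estimate:

```python
import numpy as np
rng = np.random.default_rng(0)

n = 50_000
severity = rng.normal(size=n)                        # unmeasured prognosis
# Sicker patients are prioritised: shorter lag (indication bias).
lag = np.exp(0.5 * rng.normal(size=n) - 0.8 * severity)   # weeks
# True data-generating model: longer lag mildly harmful, severity strongly so.
log_surv = 2.0 - 0.05 * lag - 1.0 * severity + rng.normal(scale=0.5, size=n)

# Naive association (confounded): positive slope, delays appear to "help",
# i.e. the Waiting Time Paradox.
naive = np.polyfit(lag, log_surv, 1)[0]
# Adjusted for severity (what randomising lag in a target trial recovers):
X = np.column_stack([lag, severity, np.ones(n)])
adj = np.linalg.lstsq(X, log_surv, rcond=None)[0][0]
print(f"naive slope {naive:+.3f}  vs  adjusted slope {adj:+.3f} (true -0.05)")
```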
Romanescu, R.; Liu, M.
We consider the problem of optimal testing for genetic interaction between two variants, allowing for possible main effects. Finding a most powerful test is important because it ends a series of attempts in the literature to construct ever more powerful tests for interaction at the variant-pair level. Testing under a logistic regression model is known to be underpowered, partly because patterns of enrichment in the genotypes themselves are lost when regarding genotypes solely as predictors. Instead, we use the retrospective likelihood approach, which makes use of all the data by treating genotypes as outcomes alongside affection status. Using a parsimonious parameterization of penetrance based on the risk ratio, which links directly to the population prevalence and avoids having to estimate an intercept term, we construct an approximate uniformly most powerful unbiased test for interaction. This test is based on optimal testing theory and accounts for nuisance main effects without requiring their explicit estimation. The test statistic can be easily modified for optimal testing under other modes of genetic interaction, such as recessive × recessive or dominant × dominant. In simulation studies, we demonstrate significant power gains compared to the odds-ratio-based PLINK test. Finally, we apply the test to scan for interactions in IBD cases and controls from the UK Biobank. The top SNP pairs show enrichment for a pathway related to existing therapies for IBD.
Hao, H.; Chen, D.; Qian, C.; Zhou, X.; Huang, H.; Zuo, J.; Wang, G.; Peng, X.; Liu, H.-X.
Causal effects in complex traits are typically represented by a single linear slope. While conventional Mendelian randomization (MR) provides efficient scalar estimates, projection-based summaries do not explicitly capture structural organisation in joint effect space under genetic heterogeneity. We introduce MR-UBRA (Mendelian randomization-Unified Bayesian Risk Architecture), a probabilistic framework that decomposes instrumental variants into genetic risk fragments (GRFs) and quantifies extreme deviations using tail-risk metrics defined on the standardised residual magnitude |e|. MR-UBRA preserves the classical MR estimand while offering a structurally resolved representation of genetic heterogeneity. Across stroke subtypes, AF→CES, smoking→lung cancer, and BMI→T2D, effect-space distributions exhibit reproducible asymmetry, amplitude stratification, and multi-modal structure. MR-UBRA resolves component-level organisation, separating tail-dominant contributions from the causal core while maintaining consistency with the classical MR estimand. Simulations and boundary regimes demonstrate adaptive model complexity: MR-UBRA selects K>1 when multi-component structure is present and collapses to K=1 under homogeneous conditions, avoiding spurious stratification. These results support viewing causal effects in complex traits as structured distributions in joint effect space, enhancing causal representation without altering the MR estimand. Graphical abstract: Mendelian randomization (MR) typically represents causal effects with a single linear slope. Under genetic heterogeneity, instrumental effects in joint (β_X, β_Y) space may exhibit multi-component structure and amplitude stratification that cannot be captured by a scalar summary. MR-UBRA fits a standard error-weighted mixture model to decompose instruments into genetic risk fragments (GRFs), estimates GRF-specific effects using posterior-weighted soft-IVW, and quantifies extreme deviations through tail-risk metrics (VaR/CVaR). Across empirical analyses and boundary regimes, MR-UBRA adapts model complexity (K) to structural signal, collapsing to K=1 under homogeneous conditions.
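A rough sketch of the decomposition idea on simulated instrument effects: a mixture model over (β_X, β_Y) pairs whose complexity K is selected by BIC, with a responsibility-weighted slope per component. This simplified version omits MR-UBRA's standard-error weighting and tail-risk metrics:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Invented instrument effects: two latent components with slopes 0.3 and 1.0
# in (beta_X, beta_Y) space (not the paper's GRFs).
bx = rng.normal(0.1, 0.03, 120)
by = np.repeat([0.3, 1.0], 60) * bx + rng.normal(0, 0.005, 120)
E = np.column_stack([bx, by])

# Select K by BIC; homogeneous data should collapse to K = 1, echoing the
# boundary-regime behaviour described above.
fits = [GaussianMixture(k, n_init=5, random_state=0).fit(E) for k in (1, 2, 3)]
best = min(fits, key=lambda g: g.bic(E))
print("selected K =", best.n_components)

# Responsibility-weighted per-component slope: a rough stand-in for the
# posterior-weighted soft-IVW estimator named in the abstract.
w = best.predict_proba(E)
for k in range(best.n_components):
    slope = np.sum(w[:, k] * bx * by) / np.sum(w[:, k] * bx**2)
    print(f"component {k}: slope = {slope:.2f}")
```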
Ivanov, S.; Fosse, S.; dos Reis, M.; Duchene, S.
Bayesian inference of divergence times for extant species using molecular data is an unconventional statistical problem: divergence times and molecular rates are confounded, and only their product, the molecular branch length, is statistically identifiable. This means we must use priors on times and rates to resolve the identifiability problem. As a consequence, there is a lower bound on the uncertainty that can be attained under infinite data for estimates of evolutionary timescales using the molecular clock. With infinite data (i.e., an infinite number of sites and loci in the alignment), uncertainty in the ages of nodes in phylogenies increases proportionally with their mean age, such that older nodes have higher uncertainty than younger nodes. On the other hand, if extinct taxa are present in the phylogeny, and if their sampling times are known (i.e., heterochronous data), then times and rates are identifiable and the uncertainties of inferred times and rates go to zero with infinite data. However, in real heterochronous datasets (such as viruses and bacteria), alignments tend to be small, and how much uncertainty is present and how it can be reduced as a function of data size are questions that have not been explored. This is clearly important for our understanding of the tempo and mode of microbial evolution using the molecular clock. Here we conducted extensive simulation experiments and analyses of empirical data to develop the infinite-sites theory for heterochronous data. Contrary to expectations, we find that uncertainty in the ages of internal nodes scales positively with the distance to their closest tip with known age (i.e., calibration age), not their absolute age. Our results also demonstrate that estimation uncertainty decreases with calibration age more slowly in data sets with more, rather than fewer, site patterns, although overall uncertainty is lower in the former. Our statistical framework establishes the minimum uncertainty that can be attained with perfect calibrations and sequence data that are effectively infinitely informative. Finally, we discuss the implications for viral sequence data sets. In the vast majority of cases, viral data from outbreaks are not sufficiently informative to display infinite-sites behaviour, and thus all estimates of evolutionary timescales will be associated with a degree of uncertainty that will depend on the size of the data set, its information content, and the complexity of the model. We anticipate that our framework will be useful for determining such theoretical limits in empirical analyses of microbial outbreaks.
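A compact sketch of the identifiability argument in our own notation (δ denotes the degenerate infinite-data limit of the likelihood):

```latex
% Isochronous case: only the branch length b = r t enters the likelihood,
% so with infinitely many sites the posterior collapses onto the ridge
% r t = \hat{b}, and only the priors distinguish points along it:
\[
  p(t, r \mid D) \;\propto\; p(t)\, p(r)\, L(D \mid r t)
  \;\longrightarrow\;
  p(t)\, p(r)\, \delta(r t - \hat{b}),
\]
\[
  \text{hence}\quad
  p(t \mid D) \;\propto\; p(t)\, p\!\left(\hat{b}/t\right) \frac{1}{t},
\]
% a non-degenerate distribution: age uncertainty never vanishes and its
% spread scales with the node's mean age. Known tip dates (heterochronous
% data) identify r separately and break the ridge, which is why uncertainty
% can then shrink toward zero, at a rate governed, per the abstract, by the
% distance to the closest dated tip.
```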
Hripcsak, G.; Anand, T.; Chen, H. Y.; Zhang, L.; Chen, Y.; Suchard, M. A.; Ryan, P. B.; Schuemie, M. J.
Propensity score adjustment is commonly used in observational research to address confounding. Controversy persists about how to select covariates as possible confounders to generate the propensity model. A desire to include all possible confounders is offset by a concern that more covariates will augment bias or increase variance. Much of the concern is over instruments, which are variables that affect the treatment but not the outcome. Adjusting for an instrument has been shown to increase bias due to unadjusted confounding and to increase the variance of the effect estimate. Large-scale propensity score (LSPS) adjustment includes most available pre-treatment covariates in its propensity model. It addresses instruments with a pair of diagnostics, ceasing the analysis if any covariate exceeds a correlation coefficient of 0.5 with the treatment and checking for an aggregation of instruments with equipoise reported as a preference score. Our simulation assesses the impact of adjusting for instruments in the context of LSPS's diagnostics. In our simulation, even when the variance of the treatment contributed by the adjusted instrument(s) exceeded that contributed by an unadjusted confounder by over twenty-fold, if the correlation between the instrument(s) and the treatment was less than 0.5 and the equipoise was greater than 0.5, the additional shift in the effect estimate due to adjusting for the instrument(s) was less than the shift due to confounding by itself. Therefore, we find in this simulation that adjusting for instruments contributed a minor amount of bias to the effect estimate. This simulation aligns well with a previous assessment of the impact of adjusting for instruments and with separate empirical evidence that adjusting for many covariates surpasses attempts to identify a limited set of confounders.
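A minimal sketch of the two diagnostics as described, assuming the usual definition of the preference score (prevalence-adjusted propensity) and of equipoise as the share of preference scores in [0.3, 0.7]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lsps_diagnostics(X, treatment):
    """Sketch of the paired instrument diagnostics described above: a
    per-covariate correlation screen against treatment, plus an equipoise
    check on Walker's preference score."""
    # 1) Flag any covariate too strongly correlated with treatment (>0.5).
    r = np.array([np.corrcoef(X[:, j], treatment)[0, 1]
                  for j in range(X.shape[1])])
    flagged = np.where(np.abs(r) > 0.5)[0]

    # 2) Preference score: logit(F) = logit(PS) - logit(prevalence).
    ps = LogisticRegression(max_iter=1000).fit(X, treatment).predict_proba(X)[:, 1]
    prev = treatment.mean()
    logit = lambda p: np.log(p / (1 - p))
    pref = 1 / (1 + np.exp(-(logit(np.clip(ps, 1e-6, 1 - 1e-6)) - logit(prev))))
    equipoise = np.mean((pref >= 0.3) & (pref <= 0.7))
    return flagged, equipoise
```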
Chen, J.; Lambe, T.; Kamau, E.; Donnelly, C.; Lambert, B.; Bajaj, S.
Serological surveys measure the presence of antibodies in a population to infer past exposure to an infectious pathogen. If study participants' ages are known, serocatalytic models can be used to retrace the historical transmission strength of a pathogen within that population, quantified by the force of infection (FOI). These models rely on age information as a key variable since infection risks are interpreted in relation to how long individuals have been at risk. However, due to data constraints, participants' ages may be provided only within "age bins". A common approach is then to assign individuals' ages to be midpoints of their respective age bins, ignoring uncertainty in this quantity. In this study, we quantify the bias introduced by this midpoint approach and develop a Bayesian framework that explicitly accounts for uncertainty in age. By comparing inference under constant, age-dependent, and time-dependent FOI scenarios, we show that incorporating uncertainty in age in serocatalytic models yields more reliable FOI estimates without sacrificing computational efficiency. These improvements support the interpretation of serological data and inform public health decisions, such as estimating disease burden and identifying targeted vaccination groups.
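A grid-likelihood sketch contrasting the midpoint convention with marginalizing age uniformly over each bin, under a constant FOI; the paper's fully Bayesian treatment is richer, and all values here are illustrative:

```python
import numpy as np
rng = np.random.default_rng(7)

# Simulated serosurvey under a constant force of infection (FOI).
true_foi, n = 0.08, 2000
ages = rng.uniform(0, 60, n)                          # true ages (unobserved)
pos = rng.random(n) < 1 - np.exp(-true_foi * ages)    # catalytic model

edges = np.arange(0, 70, 10)                          # ages reported in 10 y bins
lo = edges[np.digitize(ages, edges) - 1]              # lower edge of each bin

lam = np.linspace(0.01, 0.20, 200)                    # FOI grid for the MLE

def profile(p):   # p: (n_lambda, n) seropositivity probabilities
    ll = np.sum(np.where(pos, np.log(p), np.log1p(-p)), axis=1)
    return lam[np.argmax(ll)]

# (a) Midpoint convention: everyone sits at the centre of their bin.
p_mid = 1 - np.exp(-np.outer(lam, lo + 5.0))
# (b) Marginalising age uniformly over the bin (a minimal stand-in for
#     integrating age out in the Bayesian model):
offs = np.linspace(0.5, 9.5, 10)
p_marg = (1 - np.exp(-lam[:, None, None] * (lo[None, :, None] + offs))).mean(-1)

print(f"midpoint MLE {profile(p_mid):.3f} | marginalised MLE "
      f"{profile(p_marg):.3f} | truth {true_foi}")
```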
Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.
Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves at best a macro-F1 of 0.735 and a micro-F1 of 0.708 under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4%-5% in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
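A sketch of the two imbalance remedies mentioned, with hypothetical helper functions (the paper's exact recipes may differ):

```python
import numpy as np

def inverse_freq_weights(Y):
    """Per-label positive-class weights ~ 1 / label frequency (normalised).
    Y: (n_samples, n_labels) binary indicator matrix."""
    freq = Y.mean(axis=0).clip(1e-6)
    w = 1.0 / freq
    return w / w.mean()

def calibrate_thresholds(scores, Y, grid=np.linspace(0.05, 0.95, 19)):
    """Pick, per label, the decision threshold that maximises validation F1,
    one common form of per-label threshold calibration."""
    def f1(y, yhat):
        tp = np.sum(y & yhat); fp = np.sum(~y & yhat); fn = np.sum(y & ~yhat)
        return 2 * tp / max(2 * tp + fp + fn, 1)
    Y = Y.astype(bool)
    return np.array([grid[np.argmax([f1(Y[:, j], scores[:, j] >= t)
                                     for t in grid])]
                     for j in range(Y.shape[1])])
```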
Jones, L.; Barnett, A.; Hartel, G.; Vagenas, D.
Background: In health research, variability in modelling decisions can lead to different conclusions even when the same data are analysed, a challenge known as inferential reproducibility. In linear regression analyses, incorrect handling of key assumptions, such as normality of the residuals and linearity, can undermine reproducibility. This study examines how violations of these assumptions influence inferential conclusions when the same data are reanalysed. Methods: We randomly sampled 95 health-related PLOS ONE papers from 2019 that reported linear regression in their methods. Data were available for 43 papers, and 20 were assessed for computational reproducibility, with three models per paper evaluated. The 14 papers in which at least one model was at least partially computationally reproduced were then examined for inferential reproducibility. To assess the impact of assumption violations, differences in coefficients, 95% confidence intervals, and model fit were compared. Results: Of the 14 papers assessed, only three were inferentially reproducible. The most frequently violated assumptions were normality and independence, each occurring in eight papers. Violations of independence were particularly consequential and were commonly associated with inferential failure. Although reproduced analyses often retained the same binary statistical significance classification as the original studies, confidence intervals were frequently wider, indicating greater uncertainty and reduced precision. Such uncertainty may affect the interpretation of results and, in turn, influence treatment decisions and clinical practice. Conclusion: Our findings demonstrate that substantial violations of key modelling assumptions often went undetected by authors and peer reviewers and, in many cases, were associated with inferential reproducibility failure. This highlights the need for stronger statistical education and greater transparency in modelling decisions. Rather than applying rigid or misinformed rules, such as incorrectly testing the normality of the outcome variable, researchers should adopt modelling frameworks guided by the research question and the study design. When assumptions are violated, appropriate alternatives, such as robust methods, bootstrapping, generalized linear models, or mixed-effects models, should be considered. Given that assumption violations were common even in relatively simple regression models, early and sustained collaboration with statisticians is critical for supporting robust, defensible, and clinically meaningful conclusions.
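The point about incorrectly testing the normality of the outcome is easy to demonstrate: in the sketch below the regression assumptions hold exactly, yet a Shapiro-Wilk test on the outcome tends to reject while the residuals pass:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 3, 300)
y = 2 + 1.5 * x + rng.normal(scale=0.4, size=300)   # all assumptions hold

# Testing the outcome: its distribution mirrors x, so normality "fails"...
print("outcome p  :", stats.shapiro(y).pvalue)
# ...but the assumption concerns the residuals, which pass comfortably.
resid = y - np.polyval(np.polyfit(x, y, 1), x)
print("residual p :", stats.shapiro(resid).pvalue)
```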
Vloeberghs, R.; Tuerlinckx, F.; Urai, A. E.; Desender, K.
A widely used framework for studying the computational mechanisms of decision making is the Drift Diffusion Model (DDM). To account for the presence of both fast and slow errors in empirical data, the DDM incorporates across-trial variability in parameters such as the drift rate and the starting point. Although these variability parameters enable the model to reproduce both fast and slow errors, they rely on the assumption that each parameter is independently sampled across trials. As a result, the DDM effectively predicts that errors, whether fast or slow, occur randomly over time. However, in empirical data this assumption is violated, as error responses are often temporally clustered. To address this limitation, we introduce the autocorrelated DDM, in which trial-to-trial fluctuations in drift rate, starting point, and boundary evolve according to first-order autoregressive (AR1) processes. Using simulations, we demonstrate that, unlike the across-trial variability DDM, the autocorrelated DDM naturally accounts for temporal clustering of errors. We further show that model parameters can be reliably recovered using Amortized Bayesian Inference, even with as few as 500 trials. Finally, fits to empirical data indicate that the autocorrelated DDM provides the best account of error clustering, highlighting that computational parameters fluctuate over time, despite typically being estimated as fixed across trials.
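A minimal simulation of the proposed mechanism, with illustrative parameter values: letting drift rate fluctuate across trials as an AR(1) process (φ = 0 recovers independent across-trial variability) produces temporally clustered errors:

```python
import numpy as np
rng = np.random.default_rng(11)

def simulate(n_trials, phi, v0=1.0, sd_v=0.6, a=1.5, dt=0.005):
    """Toy DDM with across-trial drift fluctuations following an AR(1)
    process with coefficient phi; unit diffusion, bounds at +/- a/2."""
    errors = np.zeros(n_trials, dtype=bool)
    eta = 0.0
    for i in range(n_trials):
        eta = phi * eta + np.sqrt(1 - phi**2) * rng.normal() * sd_v  # AR(1)
        v = v0 + eta
        x = 0.0                      # start midway between the bounds
        while abs(x) < a / 2:
            x += v * dt + rng.normal() * np.sqrt(dt)
        errors[i] = x < 0            # lower bound = error response
    return errors

for phi in (0.0, 0.9):
    e = simulate(3000, phi)
    r = np.corrcoef(e[:-1], e[1:])[0, 1]   # lag-1 clustering of errors
    print(f"phi={phi}: error rate {e.mean():.2f}, lag-1 error corr {r:+.2f}")
```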
Sapoval, N.; Nakhleh, L.
Gene tree parsimony (GTP) is a common approach for efficient reconciliation of multiple discordant gene tree phylogenies for the inference of a single species tree. However, despite the popularity of GTP methods due to their low computational costs, prior work has shown that some commonly employed parsimony costs are statistically inconsistent under the multispecies coalescent process. Furthermore, a fine-grained analysis of the inconsistency has indicated potentially complementary behavior of duplication and deep coalescence costs for symmetric and asymmetric species trees. In this work, we prove inconsistency of GTP estimators for all linear combinations of duplication, loss, and deep coalescence scores. We also explore the empirical implications of this result by evaluating the inference results of several GTP cost schemes under varying levels of incomplete lineage sorting.
Li, K.; Hou, Y.; Mukherjee, B.; Pitzer, V. E.; Weinberger, D. M.
Household transmission studies are important for understanding infectious disease transmission and evaluating interventions; however, they are frequently constrained by methodological challenges, including in study design and sample size determination, and in estimating parameters of interest after collecting the data. Existing tools often lack flexibility in modeling age-specific susceptibility, infectivity patterns, and the impact of interventions such as vaccination or prophylaxis. Here, we develop HHBayes, an open-source R package that provides a unified framework for simulating and analyzing household transmission data using Bayesian methods. The package enables researchers to: (1) simulate realistic household transmission dynamics with highly customizable variables; (2) incorporate viral load data (measured in viral copies/mL or cycle threshold values) to model time-varying infectiousness; (3) estimate age-dependent susceptibility and infectivity parameters using Hamiltonian Monte Carlo methods implemented in Stan; and (4) evaluate intervention effects through user-defined covariates that modify susceptibility or infectivity. We demonstrate the capabilities of the package through simulation studies showing accurate parameter recovery and applications to seasonal respiratory virus transmission, including the impact of vaccination and antiviral prophylaxis on household attack rates. HHBayes addresses a critical gap in infectious disease epidemiology by providing researchers with accessible tools for both prospective study design and retrospective data analysis. The flexibility of the package in handling complex household structures, time-varying infectiousness, and intervention effects makes it valuable for studying diverse pathogens.
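As a toy illustration of the kind of generative model involved, here is a one-generation-per-step Reed-Frost household simulation with age-dependent susceptibility; HHBayes's simulator (viral-load-driven infectiousness, covariate effects on susceptibility and infectivity) is far more flexible, and every parameter below is invented:

```python
import numpy as np
rng = np.random.default_rng(5)

def simulate_household(ages, beta=0.15, child_susc=2.0):
    """Reed-Frost sketch: per-step escape probabilities multiply across
    infectious members, and children (age < 18) are child_susc times as
    exposed. Toy stand-in, not the HHBayes model."""
    n = len(ages)
    susc = np.where(np.asarray(ages) < 18, child_susc, 1.0)
    state = np.zeros(n, dtype=int)        # 0 = S, 1 = I (one step), 2 = R
    state[rng.integers(n)] = 1            # random index case
    while (state == 1).any():
        n_inf = int((state == 1).sum())
        p_inf = 1 - (1 - beta) ** (susc * n_inf)   # per-susceptible risk
        new = (state == 0) & (rng.random(n) < p_inf)
        state[state == 1] = 2
        state[new] = 1
    return int((state == 2).sum()) - 1    # secondary cases

# Secondary attack rate in a two-adult, two-child household:
sar = [simulate_household([35, 33, 8, 5]) / 3 for _ in range(2000)]
print(f"mean secondary attack rate: {np.mean(sar):.2f}")
```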
Kesimoglu, Z. N.; Hodzic, E.; Hoinka, J.; Amgalan, B.; Hirsch, M. G.; Przytycka, T. M.
Mutational signatures represent characteristic mutational patterns imprinted on the genome by mutagenic processes. They can provide information about the impact of environmental and endogenous cellular processes on tumor mutations and can suggest treatment. Analysis of the presence and strength of mutational signatures in cancer genomes has become a cornerstone of the analysis of new and legacy cancer data. However, precise inference of novel (de novo) signatures requires a large set of genomes, and methods focusing on estimating the presence of previously defined signatures are unable to uncover potential novel signatures that might emerge in new data. Thus, reliable methods to address these challenges are needed. We formally define the Combined Mutational Signature Inference Problem (CMSI) for the identification of known signatures and the inference of novel signatures in cancer data. CMSI represents non-convex optimization, and we provide a heuristic algorithm, ReDeNovo, to solve it efficiently. We extensively validated ReDeNovo on simulated data, evaluating its ability to precisely estimate the presence of and exposure to known signatures and to discover novel signatures. On both tasks ReDeNovo outperformed existing approaches. In real biological data, ReDeNovo identified signatures missed by previous analyses and defined a new signature related to UV light exposure. The ReDeNovo method provides a new and powerful tool to uncover mutational signatures. ReDeNovo is available from https://github.com/ncbi/redenovo.
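A minimal semi-supervised NMF sketch conveys the combined known/novel structure of CMSI, fixing known signature columns while learning novel ones; this is a toy stand-in, not the ReDeNovo heuristic itself:

```python
import numpy as np

def fit_signatures(V, W_known, n_novel, n_iter=2000, eps=1e-9):
    """Semi-supervised NMF: keep W_known's signature columns fixed, learn
    n_novel extra columns and all exposures H via Lee-Seung Frobenius
    multiplicative updates. V: mutation counts (e.g. 96 x samples);
    columns of W are signatures, rows of H are exposures."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    k_fix = W_known.shape[1]
    W = np.hstack([W_known, rng.random((m, n_novel))])
    H = rng.random((k_fix + n_novel, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        upd = (V @ H.T) / (W @ H @ H.T + eps)
        W[:, k_fix:] *= upd[:, k_fix:]        # only novel columns move
        s = W[:, k_fix:].sum(axis=0) + eps    # renormalise novel signatures,
        W[:, k_fix:] /= s                     # compensating in H so the
        H[k_fix:, :] *= s[:, None]            # product W @ H is unchanged
    return W, H
```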
Yelmen, B.; Güler, M. N.; Estonian Biobank Research Team; Kollo, T.; Möls, M.; Charpiat, G.; Jay, F.
Over the past two decades, genome-wide association studies (GWAS) enabled the discovery of thousands of variants associated with many complex human traits. However, conventional GWAS are still widely performed with linear models under the assumption that the genetic effects are predominantly additive. In this work, we investigate the test statistic behavior when linear models are used to obtain significant genotype-phenotype associations without accounting for epistasis. We first algebraically derive the mean and variance shift in the null statistic due to the omitted interaction term, and define the boundary between conservative (i.e., deflated statistic tail) and anti-conservative (i.e., inflated statistic tail) regimes for the common GWAS significance threshold. We then perform phenotype simulation analyses using the Estonian Biobank genotypes and validate the mathematical model. We demonstrate that the anti-conservative regime is plausible under realistic parameter settings and that models omitting interaction terms can produce spurious significance. Our findings suggest caution when interpreting statistically significant signals reported in the literature based on linear models, especially for large-scale GWAS.
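A short simulation of the phenomenon, with hypothetical effect sizes: a variant with no main effect acquires a shifted, re-scaled marginal test statistic once its interaction with another variant is omitted from the fitted model:

```python
import numpy as np
rng = np.random.default_rng(2)

n, reps = 5000, 400
maf1, maf2, b2, c = 0.3, 0.3, 0.3, 0.15   # hypothetical effect sizes
z = []
for _ in range(reps):
    g1 = rng.binomial(2, maf1, n).astype(float)
    g2 = rng.binomial(2, maf2, n).astype(float)
    # g1 has no main effect; the phenotype depends on g2 and on g1 x g2.
    y = b2 * g2 + c * g1 * g2 + rng.normal(size=n)
    g1c = g1 - g1.mean()
    beta = g1c @ y / (g1c @ g1c)                  # marginal model y ~ g1
    resid = y - y.mean() - beta * g1c
    se = np.sqrt(resid @ resid / (n - 2) / (g1c @ g1c))
    z.append(beta / se)
z = np.asarray(z)
# The omitted interaction shifts and re-scales the marginal statistic,
# because E[g1 * g2] != 0 whenever both variants are polymorphic.
print(f"mean z = {z.mean():+.2f}, sd = {z.std():.2f}  (nominal: 0, 1)")
```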
Gurel, O.; Rasmussen, M. F.; Veginati, V.; Weinstein, J. N.
Large language models (LLMs) increasingly guide clinical decisions through population-level evidence, yet they cannot encode individual patient preferences. When treatments yield comparable outcomes, patient choice may drive decisions, though its effect remains unquantified. The Spine Patient Outcomes Research Trial (SPORT), marked by similar surgical and nonoperative results and substantial crossover, provided a natural experiment to use causal-inference methods to estimate unbiased treatment effects and quantify the contribution of patient choice to outcomes. Using only published aggregate results from SPORT, we conducted two-stage least squares instrumental-variable analysis using randomized treatment assignment as the instrument, with Complier Average Causal Effects (CACE) and E-values assessing sensitivity to unmeasured confounding. Primary outcomes were SF-36 Bodily Pain, SF-36 Physical Function scores, and the Oswestry Disability Index. We decomposed treatment effects into the biological treatment mechanism and β, the patient-choice contribution. Aggregate estimates revealed G = 15.7 (0.5) and βG = 7.4 (3.4), with the net difference between surgical and nonoperative treatment effects Δ ≈ 0.65. This analysis quantifies a measurable and significant effect of patient choice (β) on physical outcomes. When treatment effects are comparable (Δ small), β, a dimension inaccessible to current LLMs trained on biased population-level evidence, emerges as the dominant driver of decision-making. These findings provide an empirical grounding for informed choice, clarify the limits of LLMs trained on biased evidence, and quantify a structural constraint in AI-driven clinical decision support. Key messages: The effect of patient choice (β) on physical outcomes is real, measurable, and clinically meaningful. β becomes the dominant driver of outcomes when biological treatment differences (Δ) are small. LLMs cannot encode β because they are trained on biased population-level evidence. These findings provide the empirical foundation for informed choice, not just informed consent.
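With only aggregate results, two-stage least squares with randomized assignment as the instrument reduces to the Wald estimator; a minimal sketch with invented numbers (not SPORT's):

```python
def cace_wald(y1, y0, p1, p0):
    """Complier average causal effect from aggregate trial results via the
    Wald/IV estimator: the intention-to-treat effect on the outcome divided
    by the effect of assignment on treatment received. Equivalent to 2SLS
    with randomised assignment as the instrument."""
    return (y1 - y0) / (p1 - p0)

# Hypothetical aggregates: the assigned-to-surgery arm improves 20 points vs
# 14 in the nonoperative arm, while 60% of the surgery arm and 25% of the
# nonoperative arm actually receive surgery (crossover).
print(cace_wald(20.0, 14.0, 0.60, 0.25))   # 6 / 0.35 = 17.1 points
```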
Liu, C.; Mayer, M.; Lactaoen, K.; Gomez, L.; Weissman, G.; Hubbard, R.
Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-exchangeability between internal and external control patients. To address this challenge, we developed a sensitivity analysis framework to assess the robustness of HCT results to potential unmeasured confounding. We propose a tipping point analysis that adapts the E-value framework to the HCT setting where trial participation rather than treatment assignment is subject to confounding. To aid interpretation, we also introduce a data-driven benchmark representing the strength of unmeasured confounding reflected by the observed outcome non-exchangeability. We then propose an operational decision rule and evaluate its performance through simulation studies. Finally, we illustrate the approach using an asthma trial augmented by data from electronic health records. Simulation results demonstrate that our decision rule safeguards against Type I error inflation while preserving the power gains achieved by incorporating external data. In settings where moderate unmeasured confounding led to poorer outcomes for external controls, Type I error was controlled near the nominal 5% level, and power increased by 10-20% compared with analyses using RCT data alone. Our approach provides a practical, interpretable method to assess HCT robustness, supporting rigorous inference when integrating external real-world data.
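The sensitivity analysis builds on the standard VanderWeele-Ding E-value formula, applied here with trial participation playing the role of the exposure; a minimal sketch:

```python
import numpy as np

def e_value(rr):
    """Standard E-value for a risk ratio: the minimum strength of
    association, on the risk-ratio scale, that an unmeasured confounder
    would need with both 'exposure' (here, trial participation) and
    outcome to fully explain the estimate away."""
    rr = np.asarray(rr, dtype=float)
    rr = np.where(rr < 1, 1 / rr, rr)        # symmetric for protective RRs
    return rr + np.sqrt(rr * (rr - 1))

# e.g. an internal-vs-external control outcome ratio of 1.3 would require
# confounding of strength ~1.92 on both associations:
print(e_value(1.3))
```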
Zhou, Y.-H.; Sun, G.
Fecal microbiota transplantation (FMT) has emerged as a highly effective treatment for recurrent Clostridioides difficile infection and is being actively investigated for numerous other conditions. While multi-omics studies have revealed dynamic changes in microbial communities and host metabolism following FMT, existing approaches are primarily descriptive and lack the ability to predict individual patient trajectories or identify early biomarkers of treatment response. Small-sample, multi-omics, longitudinal prediction problems present unique computational challenges: high dimensionality, multi-omics integration, temporal dynamics, and interpretability. Here, we present Hierarchical Multi-Omics Trajectory Prediction (HMOTP), a novel machine learning framework specifically designed for small-sample, multi-omics, longitudinal prediction that addresses these challenges through hierarchical feature construction using domain knowledge, multi-level attention mechanisms, and patient-specific trajectory prediction. HMOTP integrates multi-omics data at multiple biological levels (raw features, aggregated classes/categories, and cross-level interactions) while preserving biological interpretability. The framework employs multi-head attention to learn feature importance at different hierarchy levels and integrates information across omics layers. Patient-specific trajectory prediction enables personalized predictions despite limited sample sizes through transfer learning. We evaluated HMOTP on a cohort of 15 patients with recurrent Clostridioides difficile infection who underwent fecal microbiota transplantation, with comprehensive lipidomics (397 features) and metagenomics (10,634 pathways) profiling at four timepoints spanning six months. Using leave-one-patient-out cross-validation, HMOTP achieved 96.67% ± 10.54% accuracy, outperforming baseline methods including Random Forest (91.33% ± 21.33%) and Logistic Regression (86.33% ± 24.67%). The framework demonstrated robust generalization across timepoints. Through hierarchical interpretability, HMOTP identified key biomarkers and revealed mechanistically informative cross-omics associations, including 324 strong correlations (|r| > 0.7) involving top-predictive biomarkers, demonstrating its utility for both prediction and biological discovery in FMT applications. HMOTP provides a generalizable framework applicable to other small-sample multi-omics problems, offering a powerful tool for personalized medicine applications. Biographical note: Prof. Zhou is an interdisciplinary statistician and machine learning expert whose work develops innovative computational methods for multi-omics integration, biomedical prediction, and precision medicine applications. Key points: Our novel framework, HMOTP, addresses this challenge through three key innovations: (1) hierarchical feature construction using domain knowledge, which reduces dimensionality while preserving biological interpretability, unlike PCA-based methods; (2) multi-level attention mechanisms, which learn feature importance at multiple biological scales (individual features → classes → cross-omics interactions); (3) patient-specific trajectory prediction with transfer learning, which enables personalized predictions despite limited sample sizes (parameter-sharing within the cohort, not external pre-training).
Demdiont, A. C.
Algorithmic decision systems mediate access to healthcare, credit, employment and housing, yet individuals who experience adverse decisions face multi-stage barriers when seeking recourse. We formalize these barriers as a series-structured system with 11 empirically parameterized stages across three layers (data integration, data accuracy and institutional access) and prove that single-barrier interventions are bounded by baseline system success. Under baseline parameterization derived from federal datasets and peer-reviewed algorithmic audit studies, end-to-end recourse probability is 0.0018%. Removing any single barrier yields negligible improvement (<0.02%). Factorial decomposition reveals that the three-way cross-layer interaction accounts for 87.6% of achievable improvement, confirmed by Shapley attribution, Sobol sensitivity analysis and bootstrap resampling (n = 1,000). These results provide a structural explanation for the limited impact of incremental reforms and support coordinated multi-layer intervention approaches for clinical AI governance and algorithmic fairness.
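The bounded-improvement result follows from the series structure: stage success probabilities multiply, so removing one barrier rescales the product by at most 1/min(p). A sketch with illustrative, uncalibrated stage probabilities:

```python
import numpy as np

# Hypothetical stage success probabilities for an 11-stage series-structured
# recourse pipeline (illustrative, not the paper's calibrated parameters).
p = np.array([0.6, 0.5, 0.4, 0.7, 0.3, 0.5, 0.6, 0.4, 0.5, 0.3, 0.6])

baseline = p.prod()
print(f"end-to-end success: {baseline:.5f}")

# Removing barrier i (setting its success probability to 1) multiplies the
# product by 1/p[i], so the gain is bounded by baseline * (1/min(p) - 1):
for i in np.argsort(p)[:3]:
    fixed = p.copy(); fixed[i] = 1.0
    print(f"remove stage {i}: {fixed.prod():.5f} "
          f"(gain {fixed.prod() - baseline:.5f})")
```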