BMJ
Preprints posted in the last 7 days, ranked by how well they match BMJ's content profile, based on 49 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.
Xiang, J.; Zhu, B.; Xu, H.; Chen, Y.; Sun, X.; Xiang, R.; Zhao, Y.; Liu, W.; Zhang, L.; He, J.; Liu, J.; Chen, Y.; Fan, Z.; Zhang, H.; Tan, J.; Pang, L.; Shi, L.; Kong, Y.; Cai, A.
Background Thalassemia is one of the most common monogenic disorders worldwide, yet current screening strategies combining hematological testing with molecular assays still carry a risk of missed diagnoses and limited efficiency, particularly for complex structural variants and rare mutations. Methods In this prospective, double-blind, multicenter cohort study of 3,842 participants (3,362 pregnant women and 480 male partners), we conducted a head-to-head comparison to systematically evaluate the incremental clinical value and detection performance of single-molecule nanopore sequencing in thalassemia (SMITH) against conventional hematological testing and next-generation sequencing (NGS). Findings The overall concordance rate between NGS and SMITH was 98.6% (3789/3842). The discrepant cases (n=53) were directly attributed to the superior detection capabilities of SMITH, which successfully identified complex structural rearrangements, including 45 -globin gene triplications and four HK alleles, that were missed by NGS. Furthermore, SMITH accurately detected four rare variants (c.134_135insT/, c.-22(C>T)/, βN/βc.316-290delinsAGGGCAATAATTT and β3.5 kb deletion/βN) and resolved ten trans and three cis configurations within the globin gene allele. Clinically, these technical advantages translated to a 9.3% (5/54) increase in the detection rate of high-risk prenatal couples, effectively preventing one birth affected by moderate-to-severe thalassemia. Additionally, SMITH corrected a diagnostic discrepancy in one case (HK vs. -3.7), sparing the couple from an unnecessary invasive procedure. Interpretation Our findings demonstrate that SMITH provides a powerful platform for resolving globin gene rearrangements, detecting rare variants, and enabling direct haplotype phasing. By effectively eliminating diagnostic blind spots, SMITH is expected to become an optimal method for thalassemia prevention programs.
Funding This study was supported by Chinese National Natural Science Foundation Projects 81760037 and 82271894.
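As a quick arithmetic check on the counts quoted in the abstract above, the following Python sketch recomputes the cohort size, the number of discrepant cases, the concordance rate, and the quoted 5/54 increase. The variable names are illustrative and are not taken from the study.

```python
# Counts quoted in the abstract; variable names are illustrative only.
pregnant_women = 3362
male_partners = 480
concordant = 3789

participants = pregnant_women + male_partners   # total cohort size
discrepant = participants - concordant          # cases where NGS and SMITH disagreed
concordance_pct = 100 * concordant / participants

# Increase in detection of high-risk couples, quoted as 5/54.
high_risk_gain_pct = 100 * 5 / 54

print(participants, discrepant, round(concordance_pct, 1), round(high_risk_gain_pct, 1))
```

The recomputed values agree with the reported 3,842 participants, 53 discrepant cases, 98.6% concordance and 9.3% increase.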
Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.
Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.
Kirakoya Samadoulougou, F.; Barche, B.; Ukwishaka, J.; Subedi, S.; Erchick, D. J.; Suarez Idueta, L.; Hamer, D. H.; Semrau, K. E. A.; Hamomba, F. M.; Banda, B.; Manasyan, A.; Pry, J. M.; Maleta, K.; Ashorn, U.; Schmiegelow, C.; Hjort, L.; Minja, D. T. R.; Lusingu, J. P. A.; Freitas da Silveira, M.; Buffarini, R.; Baqui, A. H.; Khanam, R.; Ahmed, S.; Zhu, Z.; Zeng, L.; Cheng, Y.; Lachat, C.; Roberfroid, D.; Huybregts, L.; Toe, L. C.; Tielsch, J. M.; Khatry, S. K.; Mullany, L. C.; Ohuma, E. O.; Blencowe, H.; Katz, J.; Lee, A. C. C.; Black, R. E.; Hazel, E. A.
Background Large-for-gestational-age (LGA) and macrosomic newborns are at increased risk of adverse perinatal outcomes, including death, yet the burden of neonatal mortality associated with these conditions in low- and middle-income countries (LMICs), where ongoing nutritional and epidemiological transitions suggest their prevalence will rise, remains poorly quantified. In this study, we quantify the neonatal mortality risk associated with LGA and macrosomia from 16 subnational birth cohorts in low- and middle-income countries between 2000 and 2017. Methods and findings This is an individual-participant meta-analysis to estimate neonatal mortality rates (NMRs) and relative risks among LGA infants (>90th and >97th percentile birth weight-for-gestational-age using INTERGROWTH-21st) versus appropriate-for-gestational-age (AGA, 10th-90th percentile) infants. Macrosomic (≥4000 g and ≥4500 g) neonates were compared with those weighing 2500-3999 g. Missing birth weights were imputed using recalibration and multiple imputation methods. We used random effects meta-analysis to pool relative risks. Median prevalences of LGA >90th and >97th percentile were 5.3% (interquartile range [IQR] 3.6-8.2) and 2.6% (IQR 1.3-4.5), respectively; macrosomia (≥4000 g and ≥4500 g) prevalences were 1.0% (IQR 0.3-3.1) and 0.06% (IQR 0.0-0.30), respectively. Mortality was highest among preterm plus LGA infants (61.3 per 1000). LGA infants in the >90th percentile had over twofold increased mortality compared with appropriate-for-gestational-age infants (RR: 2.46; 95% CI: 1.86-3.25), while >97th percentile infants had a higher risk (RR: 3.77; 95% CI: 2.50-5.69). Term LGA >97th percentile infants also showed elevated mortality (RR: 3.14; 95% CI: 1.58-6.22). For LGA >97th percentile, the risk was higher in the early neonatal period (RR: 2.71; 95% CI: 1.92-3.82) than late (RR: 1.69; 95% CI: 1.22-2.34). There was no overall association between macrosomia (≥4000 g) and neonatal mortality.
Population attributable fractions were 7.2% for LGA >90th percentile and 0.4% for macrosomia (≥4000 g). Conclusions Neonatal mortality risks were elevated among LGA infants in low- and middle-income countries, particularly at extreme values (>97th percentile) and during the early neonatal period. Macrosomia showed weaker, less robust associations. Although LGA prevalence is currently low (~5%) and contributes less to neonatal mortality than small newborns, ongoing nutritional and epidemiological transitions suggest increasing prevalence. This highlights the need for strengthened surveillance, monitoring, and improved delivery planning to ensure that no population is left behind.
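The abstract above does not say how the population attributable fractions were calculated. As one plausible reading, the Python sketch below applies Levin's formula, PAF = p(RR - 1) / (1 + p(RR - 1)), to the quoted figures for LGA >90th percentile. Using the median cohort prevalence as p, and Levin's formula itself, are assumptions made here for illustration, not the authors' stated method.

```python
def levin_paf(prevalence: float, relative_risk: float) -> float:
    """Levin's population attributable fraction for exposure prevalence p and relative risk RR."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Inputs quoted in the abstract for LGA >90th percentile.
# Treating the median cohort prevalence (5.3%) as p is an assumption for illustration.
paf_lga90 = levin_paf(prevalence=0.053, relative_risk=2.46)
print(f"PAF for LGA >90th percentile: {paf_lga90:.1%}")
```

Under these assumptions the result is about 7.2%, in line with the reported figure. The abstract gives no pooled relative risk for macrosomia, so the 0.4% value cannot be rechecked the same way.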
Das, P.; Schneider, J.; Mayo-Wilson, E.; Kilicoglu, H.; Menke, J. D.; Nam, D.; Ninan, K.; Oberste, J.-P.; Troy, A. M.; Ying, X.; Holt, A. W.; Smalheiser, N. R.
Objectives: Study design indexing of biomedical publications is crucial for evidence retrieval and synthesis. We sought to evaluate the accuracy and suitability of a transformer-based model (TM) for indexing clinical study designs, in comparison to National Library of Medicine (NLM) indexing. However, this is challenging for at least three reasons: First, to date, all automated systems have been trained and evaluated on manual NLM indexing assignments, itself subject to errors. Second, TM's probabilistic predictive scores take into account uncertainty, and can be converted to TRUE/FALSE assignments in different ways depending on the needs of users, while NLM labels are categorical. Third, our goal (to tag articles only that exhibit a given design) differs from NLM which tags articles that both discuss as well as exhibit that design. Materials and Methods: Therefore, we carried out a limited evaluation of the TM model that focuses only on the articles that received the most confident predictions, that is, the highest scores that are almost certainly TRUE and the lowest scores that are almost certainly FALSE, but which disagreed with NLM assignments. This was performed both for articles published in 2016 (when NLM decisions were manual) and in 2025 (when NLM decisions were automated). To establish ground truth, dual annotators indexed the articles independently, following written definitions, for four prominent study designs--cohort, case-control, cross-sectional, and case report. Results: For three designs (case-control, case report, cross-sectional), the articles having the top 100 predictive TM scores (when NLM failed to assign that design) were judged to exhibit that design in the great majority (86-100%) of cases. Conversely, the articles having the lowest 100 predictive TM scores (when NLM did assign the study design) exhibited the design only in relatively few (0-21%) of cases. 
The most confident predictions of the TM model were highly accurate and not redundant with automated NLM indexing; the exception was cohort study articles, for which both TM and NLM labels showed high error rates of both omission and commission. Discussion and Conclusion: TM may have value for identifying articles exhibiting study designs, which is especially important for clinical decision-making as well as systematic reviews and other evidence syntheses. NLM indexing of cohort studies cannot be regarded as a reliable gold standard for training or evaluation of automated systems, warranting efforts to create a new manually annotated corpus.
Carlisle, N.; Zhang, M.; Simpson, N.; Stacey, T.
Background Tobacco smoking during pregnancy increases the risk of preterm birth, small for gestational age (SGA), stillbirth, and longer-term adverse health outcomes. Globally, reducing smoking in pregnancy is a key public health priority, yet the organisation, accessibility, and effectiveness of cessation support vary substantially between countries and healthcare systems. Differences in policy implementation, resource allocation, and integration of cessation services into antenatal care influence uptake and success rates across diverse settings. In England, pregnant women are entitled to free smoking cessation support; however, service delivery varies across regions, with mixed efficacy. While tobacco smoking is more prevalent in deprived communities, there is limited understanding of how, why, for whom, and under what circumstances these services are most effective, particularly in areas of social deprivation, such as the North East and Yorkshire. Objective To conduct a realist evaluation to understand how smoking cessation services support pregnant women in areas of social deprivation to stop smoking and reduce adverse perinatal outcomes. Methods This multi-site realist evaluation will be conducted across three NHS maternity services in West Yorkshire, England. The study comprises four iterative stages: (1) development of initial programme theories through realist-informed literature scoping and stakeholder consultation; (2) case study data collection including qualitative interviews with pregnant women (approximately 15-30) and staff (approximately 15-30); (3) analysis of routine anonymised maternity and neonatal electronic data collected over a one-year period; and (4) realist analysis to refine context-mechanism-outcome (CMO) configurations. Qualitative data will be analysed using realist logic supported by NVivo software.
Quantitative data will be analysed using descriptive and inferential statistics to explore associations between smoking cessation engagement and perinatal outcomes. Ethics and dissemination Ethical approval was obtained through the UK Health Research Authority and a Research Ethics Committee prior to study commencement (IRAS 364173; REC reference number 26/SC/0020). Findings will inform recommendations to improve smoking cessation support for pregnant women in deprived areas. Results will be disseminated through peer-reviewed publications, conference presentations, and stakeholder engagement.
Landry, T. C.; Kim, Y.
Background. Capillary refill time is a resuscitation target in septic shock [1-4], but bedside measurement is examiner-dependent. An ICU monitor co-records a photoplethysmogram on the pulse oximeter and intermittent noninvasive blood pressure cuff cycles; if the probe and the cuff share a limb, each cycle is an unplanned vascular occlusion test on the distal microvascular bed. Standard practice places the two on opposite limbs. Objective. To measure how often, in MIMIC-IV-WDB v0.1.0, charted cuff cycles show the photoplethysmographic morphology expected of a same-limb cuff and probe, and to characterize the candidate capillary refill-like signal when that morphology is present. Methods. MIMIC-IV-WDB v0.1.0 [5] was linked to the MIMIC-IV clinical database [6]. A pre-registered rule-based detector identified candidate occlusion-reperfusion signatures on the 1-Hz perfusion-index envelope around each charted cuff timestamp. The primary endpoint was the proportion of cuff cycles suitable for analysis that were detector-positive at a 15-second reperfusion threshold, with 95% confidence intervals estimated by resampling patients at a fixed seed. A secondary analysis used a locally hosted multimodal language model (a Gemma-3 derivative on a non-device server) to adjudicate the same signature on perfusion-index plots; no MIMIC-IV-WDB content left the workstation. Results. Of 9,224 charted cuff cycles, 8,909 had a usable pulse-oximeter waveform, and 268 cycles in 15 patients (4.30% of the 6,236 cuff cycles suitable for analysis, 95% CI 2.60 to 6.03) met the primary 15-second threshold. The language model adjudicated the same cycles and called 1,367 of the 8,909 cycles with a usable waveform (15.34%) signature-present, roughly five times the detector's count. Because no laterality ground truth exists, agreement with a single blinded reader served as the comparator rather than accuracy.
The two methods were about equally concordant with the reader: precision was 0.25 (95% CI 0.14 to 0.39) for the detector and 0.24 (95% CI 0.10 to 0.35) for the language model, although reweighting to the full population of cycles with a usable waveform lowered the language model to 0.030 (95% CI 0.009 to 0.053). These estimates are reference-limited: a blinded re-read of a 150-card subsample showed only moderate intra-rater reliability (Cohen κ 0.46 to 0.59) with systematic undercalling on the first pass, and rescoring against the corrected re-read roughly doubled precision for both methods. Conclusions. Opportunistic extraction of capillary refill-like signals from archived ICU pulse oximetry is limited in two distinct ways. First, sensor geometry limits how often the signal is recordable: cuff cycles rarely show the morphology expected of a same-limb cuff and probe pair, consistent with opposite-limb placement, so the bottleneck is geometry rather than signal processing. Second, the modest reliability of morphology adjudication limits how well any single flagged cycle can be confirmed: against a blinded reader the detector is a usable screen but a noisy confirmer, the reference is itself only moderately reliable, and the language model is no more concordant despite flagging many more cycles. The minority of cycles in which the morphology appears contain a candidate signal that may merit prospective study under controlled placement with laterality recorded.
Collier, A.
Background Electronic health record documentation patterns may reflect workflow complexity, monitoring intensity, and operational strain in intensive care settings. However, documentation-derived features can be sensitive to local documentation culture, data capture systems, and outcome definitions. Retrospective validation across multiple datasets is therefore needed before these signals are used in workflow intelligence or clinical AI governance tools. Objective To evaluate whether documentation-density and documentation-timing features show reproducible retrospective signal for ICU workflow complexity and long-stay proxy outcomes across de-identified critical care datasets, while distinguishing workflow and long-stay associations from unsupported claims about mortality prediction, burden reduction, or deployment readiness. Methods We synthesized retrospective validation results from de-identified ICU and workflow datasets generated through a prespecified documentation-density validation program. Feature families included Documentation Burden Score style features, Shift-End Documentation Rate style features, documentation reliability style metadata, and all-documentation feature sets where available. Outcomes included long ICU length of stay proxies, mortality where available, and workflow proxy endpoints. Models compared baseline feature sets with enhanced models containing documentation-density or workflow features. Performance was summarized using area under the receiver operating characteristic curve, Brier score where reported, delta AUROC, bootstrap confidence intervals where reported, and label-shuffle controls where available. Results The strongest external long-stay proxy evidence came from the NWICU chartevents analysis, which included 28,612 ICU stays, 20,267 stays with chart events, and 9,619,759 chart events. For ICU length of stay greater than the median, baseline AUROC was 0.5252. 
Enhanced AUROC was 0.9512 for Documentation Burden Score features, 0.9214 for Shift-End Documentation Rate features, 0.8470 for documentation reliability style features, and 0.9517 for all documentation features. Corresponding label-shuffle enhanced AUROCs were near random, ranging from 0.4897 to 0.5064. For ICU length of stay greater than the 75th percentile, baseline AUROC was 0.5155. Enhanced AUROC was 0.9433 for Documentation Burden Score features, 0.9194 for Shift-End Documentation Rate features, 0.8118 for documentation reliability style features, and 0.9427 for all documentation features, with label-shuffle enhanced AUROCs from 0.4836 to 0.4999. Additional retrospective support was observed in eICU workflow analyses, HiRID first-24-hour documentation-density analyses, MIMIC-IV HF ICU internal analyses, MIMIC-IV-Note metadata extensions, and nursing-chart or lab density proxy analyses. However, cross-institution discrimination transfer was weak without recalibration, and several analyses remained proxy validations rather than final clinical validations. Conclusions Documentation-density and documentation-timing features show promising retrospective signal for ICU workflow complexity and long-stay proxy outcomes, especially in NWICU chartevents and selected internal dataset-specific analyses. These findings support further preregistered, prospective, silent-mode validation of documentation-derived workflow intelligence. They do not establish prospective clinical performance, mortality reduction, clinician burden reduction, autonomous deterioration prediction, or deployment readiness.
Sood, E.; Canter, K.; Arasteh, K.; Kazak, A. E.
Background: Maternal mental health problems are common after prenatal diagnosis of congenital heart disease (CHD), with long-term implications for child and family wellbeing. HEARTPrep is a prenatal psychosocial intervention with three self-paced modules and corresponding telehealth sessions, delivered during pregnancy via mobile app to improve mental health and wellbeing for mothers expecting a baby with CHD. This proof-of-concept study evaluated the feasibility of HEARTPrep and examined maternal mental health and psychosocial functioning throughout participation. Methods: Participants were mothers receiving care for a fetal CHD diagnosis within one health system. Feasibility was assessed via rates of enrollment and completion. Mothers completed 4-item PROMIS questionnaires assessing anxiety, depression, and social isolation and reported self-efficacy and hope on a weekly basis throughout HEARTPrep. Results: Of 34 recruited mothers, 29 (85%) enrolled and two were subsequently not eligible (delivery prior to participation, change in fetal diagnosis), resulting in a final sample of 27 mothers. The majority (n = 22, 81%) completed all three telehealth sessions and Modules 1 (n = 22, 81%) and 2 (n = 19, 70%), with just over half (n = 14, 52%) completing Module 3 prior to delivery. Mean PROMIS depression T-scores decreased from 57.5 to 52.9, and 48% of mothers had a decrease in depression scores exceeding the meaningful change threshold (half standard deviation). The percentage of mothers reporting high self-efficacy increased from 19% to 48%. Conclusions: HEARTPrep is feasible and corresponds with reduced maternal depression and increased self-efficacy, supporting proof-of-concept. A randomized controlled trial is needed to determine whether HEARTPrep improves outcomes compared to a control group.
Moe, A. B.; Haverty, C.; Lee, M.; Hahn, S. E.; McElrath, T. F.; Jain, M.; Rasmussen, M.; Corso, A.; Larson, M. L.; Morrison, H.; Melroy, L. M.; Roofeh, J.; Phelps-Sandall, B.; Kiefer, D.; Biggio, J. R.
Introduction: Preeclampsia (PE) is a leading cause of maternal and neonatal morbidity and mortality, and low-dose aspirin (LDA) prophylaxis is the cornerstone of evidence-based prevention. Despite guideline recommendations, LDA adherence remains poor, with 10-25% of moderate-risk patients taking aspirin. Objective personalized risk stratification using biomarkers has been shown to motivate behavior change in other disease contexts. Survey data suggest that patients are more motivated to take aspirin if informed by an objective predictive test. Here, we report real-world LDA adherence among patients who received a high-risk result from a cell-free RNA (cfRNA) PE risk prediction test. Methods: This retrospective, observational survey study included asymptomatic patients of advanced maternal age (AMA; ≥35 years at delivery) with singleton pregnancies without USPSTF-defined preexisting high-risk conditions for PE who received the cfRNA PE risk prediction test. Patients who opted in to receive text message surveys were asked about LDA use following receipt of test results. High adherence was defined as reporting LDA use on at least 6 of 7 days per week at least 85% of the time surveyed. The primary analysis included patients with a high-risk test result and at least one LDA frequency survey response following receipt of test result. The observed proportion of adherent patients was compared to a baseline estimate of 25% using an exact binomial test. Results: Of 166 patients who received a cfRNA PE risk prediction test result, 48 (28.9%) received a high-risk result. Of these, 29 (60%) opted in and responded to at least one survey, constituting the primary analysis population. Twenty-seven of the 29 (93.1%; 95% CI: 78.0-98.1%) were classified as highly adherent, significantly higher than the 25% baseline adherence estimate for moderate-risk patients (p < 0.0001).
Conclusion: Among surveyed patients who received a high-risk cfRNA PE test result, the proportion classified as highly adherent to LDA (93%) substantially exceeded published estimates of adherence in a similar patient population and met the clinically meaningful threshold of ≥80% associated with reduced risk of preterm preeclampsia. These findings indicate that objective and personalized biomarker risk testing may be a powerful driver of behavior change that current guidelines have failed to produce.
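The exact binomial comparison described in the abstract above (27 of 29 adherent against a 25% baseline) can be sketched with the Python standard library alone. The one-sided upper-tail form of the test and the Wilson score interval are assumptions made here for illustration; the abstract does not say which test sidedness or interval method the authors used.

```python
from math import comb, sqrt


def binom_upper_tail(successes: int, n: int, p0: float) -> float:
    """One-sided exact binomial p-value: P(X >= successes) when the true rate is p0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes, n + 1))


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z defaults to the 95% level)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts and baseline quoted in the abstract.
adherent, surveyed, baseline = 27, 29, 0.25
p_value = binom_upper_tail(adherent, surveyed, baseline)
low, high = wilson_interval(adherent, surveyed)
print(f"adherence = {adherent / surveyed:.1%}, p = {p_value:.2e}, 95% CI = {low:.1%} to {high:.1%}")
```

With these inputs the p-value is far below 0.0001, and the Wilson interval comes out at roughly 78% to 98%, close to the interval reported in the abstract.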
Wang, D.; Yuan, X.; Lv, D.; Wang, Y.
Background: Red cell distribution width (RDW), a readily available hematological parameter reflecting erythrocyte size heterogeneity, has been increasingly recognized as a prognostic marker in congestive heart failure (CHF), with elevated levels independently associated with adverse outcomes. However, RDW-derived composite indices, particularly the RDW-to-platelet ratio (RPR) and RDW-to-hemoglobin ratio (RHR), which integrate inflammatory, hemostatic, and oxygen-delivery pathways, remain largely unexplored in CHF populations. Whether these indices provide incremental prognostic value beyond RDW alone in critically ill patients with CHF has not been established. Methods: This retrospective cohort study included 30,409 participants from the MIMIC-IV and eICU-CRD databases. Multivariable logistic regression, restricted cubic spline (RCS) analysis, and subgroup analyses were employed to evaluate the associations between RDW, RDW-derived indices (RPR and RHR), and in-hospital mortality in patients with congestive heart failure. Results: Based on a pooled cohort of 30,409 patients with CHF from the MIMIC-IV and multi-center eICU-CRD databases (15,983 and 14,426, respectively), 16,295 (53.6%) were male and 14,114 were female, with a median age of 71.7 years. The mean RDW was 16.0 ± 2.5, and the overall in-hospital mortality rate was 12.6%. Higher RDW quintiles were associated with progressively increased in-hospital mortality. In the fully adjusted model, RDW, RPR, and RHR were all significantly associated with increased in-hospital mortality, with adjusted odds ratios (ORs) of 2.46 (95% CI: 2.17-2.79) for RDW, 1.55 (95% CI: 1.38-1.73) for RPR, and 2.43 (95% CI: 2.09-2.82) for RHR. Sensitivity analyses using restricted cubic splines demonstrated that the associations of RDW and RHR with in-hospital mortality were linear (P for nonlinearity > 0.05), whereas that for RPR exhibited a non-linear pattern (P = 0.02 for non-linearity). Conclusions:
Elevated RDW, RPR, and RHR were independently associated with increased in-hospital mortality in patients with congestive heart failure. Notably, RPR exhibited a non-linear threshold association with in-hospital mortality.
Biswas, M. A.; Laila, A.
Background: Machine learning models trained on population health surveys offer scalable tools for cardiovascular screening, but recurring methodological weaknesses undermine their credibility and equity: data leakage from synthetic oversampling, qualitative rather than quantitative explainability evaluation, and the absence of demographic fairness auditing at the clinical operating threshold. Methods: We present EXHEART, a leakage-free stacked ensemble pipeline trained on BRFSS 2015 (n = 253,680) and validated on BRFSS 2020 (n = 319,795; temporal transport and retrain) and a clinical cardiovascular examination dataset (n = 68,730). The pipeline combines XGBoost, LightGBM, Random Forest, and a multi-layer perceptron as base learners with 5-fold out-of-fold logistic regression stacking and Platt scaling calibration. A quantitative SHAP-LIME consistency framework, based on Kendall-tau rank correlation and Jaccard overlap, accompanies a decision-curve analysis, a subgroup-stratified SHAP interaction analysis, and an intersectional fairness audit (Sex x Age x Income) with threshold-shifting mitigation and a frontier of the fairness-utility trade-off. The framework also adds cross-instrument fairness-disparity attribution, an empirical diagnostic that provides evidence on whether an observed subgroup disparity is more consistent with a measurement-induced or a substantive explanation by re-validating it on a dataset that measures the same clinical construct objectively. On heart disease, this diagnostic associates 89% of the sex TPR gap (95% CI [0.65, 0.99]) with the self-reported survey outcome rather than with a substantive risk difference. Results: On BRFSS 2015, EXHEART achieves AUC-ROC = 0.850, AUPRC = 0.371, Brier score = 0.071, and reduces ECE by 96% (0.256 to 0.011) via Platt scaling. 
Global SHAP-LIME rank agreement is moderate-to-strong (Kendall-tau = 0.580, Spearman-rho = 0.818) with a substantial top-3 divergence (Jaccard@3 = 0.200), where Stroke flips from SHAP rank 8 to LIME rank 1. The Sex TPR gap is 0.124 at the screening threshold; intersectional Sex x Age disparities reach 0.649 among adequately-powered cells, 5.2x the single-attribute gap. Temporal transport to BRFSS 2020 collapses sensitivity from 0.776 to 0.267, while retraining restores AUC = 0.840 and ECE = 0.012. On clinical examination data, the Sex TPR gap collapses to 0.014; the attribution test indicates this gap is instrument-dependent, consistent with a measurement or outcome-definition explanation rather than a substantive risk difference. Cross-domain SHAP analysis identifies four instrument-independent CVD risk factors and two major portability failures. Conclusions: EXHEART combines three practices that population-scale cardiovascular classifiers usually apply in isolation: leakage-free training with calibrated probabilities, a test of whether the model's explanations are stable, and a fairness audit that examines intersecting subgroups rather than single attributes. Bringing them together proved worthwhile. The intersectional audit revealed disparities that single-attribute auditing missed, and the cross-instrument comparison indicated that much of the sex gap reflects how the outcome is measured in survey data rather than a substantive difference in risk. The temporal transport findings indicate that deployed BRFSS models warrant periodic monitoring and retraining to maintain clinical utility. EXHEART is a retrospective methodological evaluation on public de-identified data; it is not validated for direct clinical decision-making, diagnosis, or treatment recommendation without prospective clinical validation.
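To make the rank-agreement metrics in the abstract above concrete, the Python sketch below computes a Kendall tau and a top-k Jaccard overlap for two feature rankings. The feature names and orderings are hypothetical examples, not EXHEART output, and the tau-a variant (no tie handling) is an assumption, since the abstract does not state which variant was used.

```python
from itertools import combinations


def kendall_tau(rank_a: list[str], rank_b: list[str]) -> float:
    """Kendall tau-a between two orderings of the same items (assumes no ties)."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant when both rankings order x and y the same way.
        product = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if product > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def jaccard_at_k(rank_a: list[str], rank_b: list[str], k: int) -> float:
    """Jaccard overlap between the top-k items of two rankings."""
    top_a, top_b = set(rank_a[:k]), set(rank_b[:k])
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical rankings: one feature jumps from last place to first.
shap_rank = ["Age", "BP", "Chol", "BMI", "Stroke"]
lime_rank = ["Stroke", "Age", "BP", "Chol", "BMI"]

tau = kendall_tau(shap_rank, lime_rank)
overlap = jaccard_at_k(shap_rank, lime_rank, k=3)
print(f"Kendall tau = {tau:.3f}, Jaccard@3 = {overlap:.3f}")
```

In this toy case a single displaced feature drags tau down to 0.2 while two of the top three features still overlap (Jaccard@3 = 0.5). By the same definition, a Jaccard@3 of 0.200 corresponds to only one shared feature among the two top-3 sets (1 shared out of 5 distinct).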
Kosola, S.; Salonen, S.; Miettinen, J.; Horhammer, I.; Impio, A.-R.; Kumpulainen, S. M.; Sergejeff, J.; Numari, S.; Laitinen-Parkkonen, P.; Tapola-Haapala, M.; Aaltio, E.; Thorn, L.
Introduction Education is a core social determinant of health for children and adolescents. Unfortunately, academic achievement, health, and wellbeing of adolescents have decreased in many developed countries in the past decade. The purpose of the Wellbeing and Education linkages in school-aged children (WELL-ED) study is to examine associations of school absences and academic achievement with use of school-based and community-based health and social welfare services. In addition, we will assess user experiences and multi-sector services pathways of school-aged children for a better understanding of how the service system could respond to the needs of children. Methods and analysis WELL-ED is a large population-based study that combines register data on school absences and educational support from municipalities with register data on healthcare and social service use collected from wellbeing services counties in Finland. The study cohort includes all children who attended mandatory education in public schools in Southern Finland in school year 2023-2024. A smaller cohort of adolescents in school year 8 was invited to complete a user experience survey. The primary outcomes of this study are related to equity of service use. Ethics and dissemination The Regional Committee on Medical Research Ethics of the Helsinki and Uusimaa Hospital District (2803/2024) has approved the WELL-ED study protocol. For the survey, adolescents in year 8 and parents of adolescents younger than 15 provided informed consent. Results will be published in peer-reviewed journals, summaries will be sent to participating municipalities and wellbeing services counties and press releases will be written on key findings.
Wagner, A. P.; Risebro, H.; Clark, A.; Stirling, S.; Sims, E.; Bion, V.; Blacklock, J.; Birt, L.; Bryant, R.; Cook, L.; Dean, T.; Wyn Griffiths, A.; Guillard, C.; Holland, R.; Jones, A. P.; Jones, L.; Katangwe-Chigamba, T.; Pitcher, J.; Scott, S.; Wright, D.; Patel, A.
Introduction Care home (CH) influenza vaccination of staff improves resident health, yet uptake remains low at just over 11% (England, 2025/2026). We report an economic evaluation (EE) of "FluCare", an intervention to increase staff influenza vaccination through: vaccination clinics at CHs; promotional materials; and CH financial incentives. Method Seventy-five CHs were randomised to FluCare or control. A cost-consequence analysis took the influenza vaccination programme funder perspective, but also extended to the National Health Service (NHS) and CH perspective. Costs included: influenza vaccination; administration fee; FluCare components; CH resident NHS utilisation. Outcomes were: staff influenza vaccination rates; staff sickness; and resident mortality. Sensitivity analyses excluded intervention CHs that did not host vaccination clinics. Results Compared to control CHs, adjusted analysis found that intervention homes had a mean absolute increase in vaccination rates of 1.8% (95% CI: -6.0%, 10.8%; p=0.572) at an increased cost of £451 (95% CI: £239, £675; p<0.001) to the vaccination programme funders: £249 per additional percentage point (PAPP) per CH. Vaccination clinics were delivered late in the influenza season, with 80% taking place from February 2023. Including only intervention CHs that hosted staff flu vaccination clinics (23/35) increases the mean difference to 10.1% (95% CI: 0.9%, 21.9%; p=0.018) and costs to £805 (95% CI: £603, £1,079; p<0.001): £79 PAPP per CH. Differences between trial arms in other costs and outcomes were marginal and generally non-significant. Conclusions FluCare delivered little improvement when staff flu vaccination clinics did not occur and had little impact on other costs/outcomes. Cost-effectiveness depends on willingness-to-pay for increased staff vaccination, but cost PAPP per CH improved from £249 to £79 when only CHs hosting clinics were considered.
Late implementation likely reduced impact by limiting clinic delivery, as reflected in the sensitivity analysis. Future evaluations should implement FluCare earlier in the season.
Fieggen, J.; Simond, G.; Segal, B. M.; Noori, A.; Thakurta, A.; Butler, C. C.; Clifton, D. A.; Clifton, L.
Background. Blood-based biomarkers are increasingly proposed for identifying high-risk individuals before clinical disease and for making prevention-oriented trials more efficient. Prognostic enrichment can increase event rates, but trial efficiency also depends on whether the intervention effect is preserved in the enriched population. Methods. Using the UK Biobank Pharma Proteomics Project, we trained disease-specific proteomic risk scores (ProRS) from 2,916 plasma proteins with elastic-net Cox models. We compared ProRS, polygenic risk scores (PRS), and combined PRS-ProRS scores across ten incident diseases. We estimated cumulative incidence and theoretical two-arm time-to-event trial sample sizes across risk strata. To evaluate effect preservation, we examined six intervention-analogue exposure-outcome pairs spanning genetic (PCSK9/coronary artery disease, APOE/Alzheimer's disease, PPARG/type 2 diabetes, IL23R/Crohn's disease), behavioural (physical activity/all-cause mortality), and pharmacological (RAAS inhibitors versus calcium channel blockers/coronary artery disease) examples. Results. ProRS outperformed PRS for 9 of 10 diseases (median C-index 0.75 versus 0.61). ProRS and PRS were weakly correlated (median Pearson |r| = 0.04), and joint PRS-ProRS stratification identified groups with higher observed incidence than either score alone for several endpoints. In the top risk quartile, combined-score enrichment reduced theoretical required sample sizes by 32-74% under a fixed 20% relative hazard reduction. These gains were not always preserved when stratum-specific intervention-analogue effects were used. Effects were broadly preserved for APOE/Alzheimer's disease and physical activity/mortality. The PPARG/type 2 diabetes effect attenuated toward the null under all three score types, showing that event-rate enrichment does not guarantee effect preservation.
For IL23R/Crohn's disease and the antihypertensive comparison, point estimates differed across score types (preserved under polygenic but attenuated under proteomic enrichment), but confidence intervals were wide and overlapping. Conclusions. Proteomic risk scores can identify high-event-rate populations for prevention-oriented trials, but event-rate enrichment alone is insufficient for trial design. Biomarker-guided enrichment should evaluate mechanism-specific effect preservation and may be preferable as a stratification or adaptive-design variable rather than as a restrictive eligibility criterion.
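To make the enrichment argument above concrete, the sketch below shows one common way a theoretical two-arm time-to-event sample size can be computed: Schoenfeld's approximation for the number of events, divided by the expected event probability. The abstract does not state which formula the authors used, so this is an assumed stand-in; the two-sided alpha of 0.05, the 80% power, and the event probabilities of 5% and 10% are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: events needed for a 1:1 two-arm log-rank test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


def required_participants(hazard_ratio, event_probability, **kwargs):
    """Participants needed when each has this chance of an event during follow-up."""
    return math.ceil(required_events(hazard_ratio, **kwargs) / event_probability)


# A fixed 20% relative hazard reduction corresponds to a hazard ratio of 0.80.
# The event probabilities below are hypothetical, not taken from the study.
unselected = required_participants(0.80, event_probability=0.05)
enriched = required_participants(0.80, event_probability=0.10)
reduction = 1 - enriched / unselected
print(unselected, enriched, f"{reduction:.0%}")
```

The number of events depends only on the hazard ratio, so a higher event rate shrinks the trial only while the hazard ratio stays fixed. If the effect attenuates in the enriched stratum, the squared log hazard ratio in the denominator shrinks and the gain can disappear, which is the caveat the abstract raises.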
Xia, J.; Zhu, Z.; Zhang, G.; Shen, Q.; Su, E.; Schoones, J.; Arcelus, J.; Hu, T.; Xu, M.; Zhang, X.; Zhao, Z.; Ye, Z.; Yao, X.
Introduction: Trans and gender-diverse (TGD) individuals often face stigma and discrimination in healthcare, hindering access to gender-affirming care. Training healthcare workers on TGD health aims to foster inclusive and affirming care practices. This review aimed to evaluate the effectiveness of TGD health training programs for healthcare workers. Methods: This systematic review followed the PRISMA guidelines and was registered with PROSPERO (CRD42023443288). We searched 13 databases for studies up to March 2024, with no language/geographic restrictions. Ten reviewers screened studies in pairs, resolving discrepancies via discussion or third-reviewer input. We included randomized/non-randomized comparative and before-after studies for quantitative analysis (mean difference [MD] or standardized mean difference [SMD] with 95% CIs) and qualitative/mixed-methods studies for thematic synthesis. Evidence certainty was assessed using GRADE (quantitative) and GRADE-CERQual (qualitative). Outcomes included knowledge, attitudes, skills, discrimination, competence, comfort, TGD quality of life, and stakeholder preferences. Results: From 20,188 records, 85 studies were included. Training appears to have improved healthcare workers' knowledge (SMD=1.08, 95% CI 0.78-1.39), attitudes (SMD=0.22, 95% CI 0.05-0.39), skills (SMD=0.96, 95% CI 0.56-1.37), competence (SMD=0.55, 95% CI 0.29-0.81), and comfort (SMD=0.69, 95% CI 0.17-1.21). Qualitative analysis of 130 findings identified 18 categories and four key themes on intervention design and impact. Conclusions: TGD training programs may enhance health workers' knowledge, attitudes, skills, competence, and comfort. Well-structured, interactive, and inclusive programs showed promise, but evidence certainty was low with limited follow-up. Further high-quality research is needed to confirm these findings.
Gupta, M.; Zoega, H.; Stopard, I. J.; Liu, B.; Macartney, K.; Wood, J. G.; Hogan, A. B.
Introduction: Respiratory infections are a leading cause of morbidity. Newly available vaccines to prevent respiratory syncytial virus (RSV) disease and encouraging clinical progress on vaccines for human metapneumovirus (hMPV) and parainfluenza (PIV) could reduce the disease burden beyond existing influenza and SARS-CoV-2 immunisation programs. However, evidence on the contribution of these viruses to respiratory disease burden across the lifespan remains limited. Methods: We reviewed studies from 01/2002-11/2025 reporting age-stratified, medically attended cases of influenza, and at least one of RSV, hMPV, or PIV, in high-income countries, excluding periods substantially overlapping with the COVID-19 pandemic. Using only studies that tested for all four viruses, we estimated the age-specific proportion of cases that were non-influenza (total across RSV, hMPV and PIV) compared to influenza using a mixed-effects logistic regression model. Results: Following exclusions and screening, 61 studies were included in the primary analysis comprising >500,000 detections of the four viruses. We found that a substantial proportion of medically attended respiratory illness in infants and young children was due to PIV, hMPV and RSV, rather than influenza, with a non-influenza virus proportion of 90.2% (95% CI 85.9-93.2%) in young infants aged 0-6 months. The converse was true for school-aged children, with a non-influenza virus proportion of 34.8% (95% CI 26.5-44.2%) in children aged 5-18 years. In adults aged 65+ years, non-influenza causes of medically attended disease were common at 60.2% (95% CI 50.0-69.5%). Restricting to studies reporting hospitalised cases (n=19) produced broadly similar age-specific trends in relative virus burden contributions. 
Discussion: We highlight the significant burden of medically attended illness due to PIV, hMPV and RSV across ages, particularly in infant and preschool-aged children and older adults, supporting the need for effective vaccines targeting this burden.
Pears, M.; Wadhwa, K.; Payne, S. R.; Konstantinidis, S. T. H.; Biyani, C. S.
Large language models (LLMs) such as ChatGPT are rapidly reshaping healthcare education and simulation-based training in non-technical skills (NTS), yet no bibliometric analysis has mapped this landscape. We searched seven open-access databases (OpenAlex, PubMed, Europe PMC, Crossref, Semantic Scholar, CORE, DOAJ) for English-language publications from January 2020 to March 2026. From 100,277 initial records, a sequential keyword funnel yielded 830 candidate papers, which were screened by 83 independent Claude Sonnet 4.6 AI agents applying pre-specified inclusion criteria (PRISMA-trAIce compliant; Cohen's kappa = 0.86 pre-reconciliation, 1.0 post-reconciliation). The final AI-verified corpus comprised 551 papers with a compound annual growth rate of 109%, contributions from 2,398 authors across 279 journals in 58 countries, and an h-index of 41. ChatGPT dominated the model landscape (46% of papers), with open-source models virtually absent. Virtual patient chatbots were the leading simulation modality (106 papers). Among NTS domains, communication (145 papers) and decision-making (135 papers) were most studied, whereas teamwork, leadership, situational awareness, and crisis resource management were markedly underrepresented. Only 6 urology-relevant papers were identified, none examining LLM integration within boot camp training formats. The field is growing at extraordinary pace but remains concentrated in a narrow range of NTS domains and a single proprietary model. Critical gaps persist in team-based skills training, open-source model evaluation, and specialty-specific simulation. AI-assisted bibliometric screening using multiple independent agents is feasible, reliable, and scalable, offering a replicable methodology for mapping fast-evolving research fields.
Spielvogel, C. P.; Kluge, K.; Ning, J.; Kumpf, K.; Nitsche, C.; Hengstenberg, C.; Slomka, P. J.; Hacker, M.
Background: Cardiovascular-kidney-metabolic (CKM) syndrome is a leading driver of cardiovascular morbidity and mortality. Whole-body molecular imaging is well-positioned to phenotype such syndromes, yet no imaging biomarker quantifies cumulative CKM burden. Bone scintigraphy with 99mTc-labeled bisphosphonates is widely performed and expanding with transthyretin amyloidosis assessment, under which Perugini grade 0 (absent cardiac uptake) is considered clinically benign. Objective: We hypothesized that the soft tissue-to-bone ratio (STBR) on these scans captures CKM burden and is an independent prognostic biomarker. Methods: We retrospectively analyzed 8,769 consecutive patients without cardiac uptake on 99mTc-DPD whole-body planar scintigraphy. The primary endpoint was all-cause mortality. Secondary endpoints were major adverse cardiovascular events (MACE) and heart failure hospitalization. Cox models were adjusted for ten established cardiovascular risk factors. Imaging-phenotype association (IPA) analysis mapped STBR to 1,210 clinical traits. STBR distribution across CKM stages was assessed in four prespecified analyses, including a non-cancer subgroup. Results: During a median follow-up of 5.1 years (IQR 2.5-8.2), 2,418 deaths occurred. Patients with prespecified STBR >0.5 (n=772, 8.8%) had significantly higher mortality (adjHR 1.73, 95% CI 1.54-1.94, p<0.0001) with an adjHR of up to 3.42 at higher thresholds (95% CI 2.05-5.42, p<0.0001). Hazard increased monotonically with STBR. STBR >0.5 was independently associated with MACE (adjHR 1.51, 95% CI 1.11-2.05, p=0.008) and heart failure hospitalization (adjHR 1.31, 95% CI 1.02-1.67, p=0.03). The association was robust across all prespecified subgroups and sensitivity analyses, including continuous STBR and patients without renal insufficiency. 
IPA analysis identified significant associations with type 2 diabetes, chronic kidney disease, chronic ischaemic heart disease, heart failure, atrial fibrillation, liver disease, amyloidosis, and hypertension among binary traits, as well as with CRP, NT-proBNP, BUN, cholesterol (inverse), and hemoglobin (inverse) among continuous parameters. STBR increased monotonically across CKM stages in all sensitivity analyses (all p<0.0001). Conclusions: STBR derived from routine 99mTc-DPD bone scintigraphy in patients without cardiac uptake is an independent prognostic imaging biomarker associated with cumulative cardiovascular-kidney-metabolic burden. As an opportunistic measure from scans already acquired at scale, STBR could refine CKM risk stratification at no additional cost, radiation, or acquisition time.
Uppal, A.; Thomas, R.; De Pasquale, M.; Sillo, J.; Getahun, H.
Background: The Universal Periodic Review (UPR) is a peer-review mechanism established to hold UN Member States accountable for human rights including the right to health, yet evidence on its impact on health outcomes is limited. We evaluated whether UPR engagement is associated with accelerated improvements in maternal health trajectories. Methods and Findings: We conducted a longitudinal ecological analysis of 89 countries with a baseline maternal mortality ratio (MMR) of 70 or greater per 100,000 live births in 2005. Outcomes were trajectories of annual MMR, skilled birth attendance (SBA), and contraceptive prevalence rate (CPR), from 2005 to 2023. The exposure was the volume of health-related UPR recommendations received across three cycles, thematically classified using a validated rule-based algorithm. Mixed-effects models adjusted for time-varying GDP per capita and historical fragility. The 89 countries received 41,733 UPR recommendations across three cycles, of which 405 (1%) were related to maternal health. Maternal health recommendations were preferentially directed at countries with higher baseline MMR and lower SBA. After adjustment, each additional maternal health recommendation was associated with a 0.24% [95% confidence interval (CI): 0.08, 0.40] faster annual reduction in MMR, a 0.52% [0.12, 0.91] faster annual gain in the odds of SBA, and a 0.21% [0.09, 0.34] faster annual gain in the odds of CPR. Broader recommendations on women's health and health systems and services were also associated with faster annual improvements in trajectories across all three outcomes; recommendations on abortion, family planning, sexual health and wellbeing, and sexual education tended to be directed towards lower-burden countries and were not associated with differences in any trajectories. It is important to note that the ecological design precludes causal inference. 
Conclusions: Receiving UPR recommendations on the themes of maternal health, women's health, and health systems and services is associated with accelerated improvements in maternal health trajectories among high-burden countries. These findings suggest that international human rights accountability mechanisms may have a role in supporting national progress on maternal health.
Yerukala Sathipati, S.; Scott, H.
Importance: Hereditary breast and ovarian cancer (HBOC) variant carriers benefit from risk-reducing interventions, but only if identified. The extent to which carriers are clinically recognized, and whether recognition is equitable across diverse populations, is poorly characterized in a single large U.S. cohort. Objective: To estimate P/LP HBOC carrier prevalence across genetic ancestry groups, quantify documented clinical genetic testing among carriers, and evaluate ancestry and socioeconomic disparities in testing. Design, Setting, and Participants: Cross-sectional analysis of the All of Us Research Program Controlled Tier (Curated Data Repository v8/C2024Q3R9), comprising participants with short-read whole genome sequencing and linked electronic health record (EHR) and survey data. Carriers were ascertained from research genomic data independent of clinical testing. Exposures: Genetically inferred ancestry (African [AFR], Admixed American [AMR], East Asian [EAS], European [EUR], Middle Eastern [MID], South Asian [SAS]); self-reported household income and educational attainment. Main Outcomes and Measures: (1) Carrier prevalence with Wilson 95% CIs; (2) documented clinical genetic testing (procedure codes) among carriers; (3) adjusted odds of documented testing among women, by ancestry, before and after socioeconomic adjustment, using multivariable logistic regression. Results: Among 414,830 participants, P/LP HBOC carrier prevalence was 1.42% (95% CI, 1.38-1.45) overall and similar across ancestry groups (AFR 1.24%, AMR 1.32%, EAS 1.19%, EUR 1.52%, MID 1.68%, SAS 1.33%; overlapping CIs). Among 250,071 women in the testing analysis, documented clinical genetic testing was rare: only 74 of 5,878 carriers overall (1.3%) and 59 of 3,572 European-ancestry carriers (1.7%) had a documented test, with counts below reportable thresholds in all other ancestry groups. 
African-ancestry women had lower adjusted odds of documented testing than European-ancestry women (Model 1 adjusted odds ratio [aOR], 0.32; 95% CI, 0.27-0.39), an association that attenuated but persisted after adjustment for income and education (Model 2 aOR, 0.48; 95% CI, 0.40-0.58; P < 0.001); Admixed American women also had reduced adjusted odds (aOR, 0.71; 95% CI, 0.61-0.84). Lower income and lower education were independently and dose-dependently associated with lower testing odds (income <$25,000 aOR, 0.46; high-school education aOR, 0.54). Conclusions and Relevance: High-risk HBOC variant carriers are present across all ancestry groups at similar frequencies, yet documented clinical genetic testing was disparate in the different ancestry groups. African-ancestry women experience a testing gap that is not fully explained by socioeconomic position, implicating structural barriers in access and referral. Population-level strategies that decouple carrier identification from current referral pathways may be required to close this gap.
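The abstract above reports Wilson 95% confidence intervals for carrier prevalence. As a rough illustration of how such an interval is formed, the sketch below applies the standard Wilson score formula to the reported testing counts (74 of 5,878 carriers). Applying it to this particular proportion is our own illustration, not an analysis from the study, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract: 74 documented tests among 5,878 carriers.
low, high = wilson_interval(74, 5878)
print(f"{74 / 5878:.2%} (95% CI {low:.2%} to {high:.2%})")
```

Unlike the simpler Wald interval, the Wilson interval stays within 0 to 1 and behaves sensibly for rare outcomes, which is presumably why it suits proportions near 1%.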