
Bone

Elsevier BV

Preprints posted in the last 7 days, ranked by how well they match Bone's content profile, based on 22 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Age Related Differences in BMD Response During Three Years of Denosumab Treatment

Ishikawa, K.; Asada, T.; Richardson, W.; Marius, C.; Ishikawa, M.; Nguyen, T.; Varnadore, P.; Tani, S.; Passias, P.; Alman, B. A.

2026-05-26 endocrinology 10.64898/2026.05.25.26354051 medRxiv
Top 0.1%
35.7%

Introduction Denosumab increases bone mineral density and reduces fracture risk in patients with osteoporosis. However, whether BMD response to denosumab differs by age, particularly during longer term treatment, remains unclear. This study investigated the association between baseline age and BMD gain during 3 years of denosumab treatment in patients with osteoporosis. Methods This retrospective study included patients with osteoporosis who were treated with denosumab. DXA-based BMD and bone turnover markers were followed for up to 3 years. Percent BMD gain from baseline, defined as %BMD gain, was evaluated. The longitudinal association between baseline age and %BMD gain was assessed using multivariable linear mixed-effects models for the lumbar spine and total hip. Analyses were performed in the treatment naive cohort and the overall cohort according to prior osteoporosis treatment status. Results A total of 255 patients were included in the analysis, of whom 110 had not received prior osteoporosis treatment. In multivariable linear mixed-effects models, older baseline age was associated with smaller lumbar spine %BMD gain in the treatment naive cohort at both 1 and 3 years. Each 1-year increase in age was associated with a 0.187 percentage-point lower lumbar spine %BMD gain at 1 year and a 0.293 percentage-point lower gain at 3 years (1 year: β = -0.187, p = 0.006; 3 years: β = -0.293, p = 0.031). In contrast, baseline age was not significantly associated with total hip %BMD gain in the treatment naive cohort (1 year: β = -0.011, p = 0.826; 3 years: β = 0.028, p = 0.727). In the overall cohort, baseline age was not significantly associated with %BMD gain at either the lumbar spine or total hip at 1 or 3 years (all p > 0.05). Conclusion Older baseline age was associated with a modestly smaller lumbar spine BMD gain in treatment naive patients, whereas no significant age-related association was observed at the total hip. 
In the overall cohort, age was not significantly associated with BMD gain at either site. These findings suggest that age may have a limited, site specific influence on BMD response to denosumab, particularly in treatment naive patients, and may support more individualized treatment planning in patients with osteoporosis.
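As a minimal sketch of the outcome defined in this abstract, %BMD gain is the percent change from baseline, and the reported slope can be read as an expected difference in gain per year of baseline age. The function names and the example BMD values are hypothetical; only the β values are taken from the abstract, and the sketch does not reproduce the authors' mixed-effects models.

```python
def pct_bmd_gain(baseline_bmd: float, followup_bmd: float) -> float:
    """Percent BMD gain from baseline (%BMD gain)."""
    if baseline_bmd <= 0:
        raise ValueError("baseline BMD must be positive")
    return (followup_bmd - baseline_bmd) / baseline_bmd * 100.0


def expected_gain_difference(beta_per_year: float, age_gap_years: float) -> float:
    """Expected difference in %BMD gain (percentage points) for a given age gap."""
    return beta_per_year * age_gap_years


# Hypothetical lumbar spine BMD values in g/cm^2 (not from the study).
gain = pct_bmd_gain(0.800, 0.860)

# Reported treatment-naive lumbar spine slopes, applied to a 10-year age gap.
diff_1y = expected_gain_difference(-0.187, 10)
diff_3y = expected_gain_difference(-0.293, 10)

print(f"%BMD gain: {gain:.1f}%")
print(f"10 years older, 1-year gain difference: {diff_1y:.2f} percentage points")
print(f"10 years older, 3-year gain difference: {diff_3y:.2f} percentage points")
```

Under these slopes, a treatment-naive patient 10 years older would be expected to show a lumbar spine gain roughly 1.9 percentage points lower at 1 year and 2.9 lower at 3 years, which is consistent with the abstract's description of the effect as modest.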

2
A Bibliometric and Content Analysis of Exercise Interventions Research in Rheumatoid Arthritis

Zou, Z.; Zhang, Z.; Zhao, R.; Liu, Y.; Gao, J.; Gu, L.

2026-05-28 rheumatology 10.64898/2026.05.27.26354187 medRxiv
Top 0.2%
1.7%

Background: Rheumatoid arthritis is a chronic inflammatory disorder in which exercise is increasingly recognized as an important component of long-term management. Yet, most reviews in this field evaluate the effects of single exercise modalities, while bibliometric studies primarily identify publication trends and research hotspots without showing whether highly visible themes also represent coherent and comparatively mature evidence domains. Methods: We searched the Web of Science Core Collection for publications on exercise interventions in rheumatoid arthritis from 2016 to 2025. CiteSpace (6.4.1) and VOSviewer (1.6.20) were used to analyze publication growth, collaboration networks, keyword co-occurrence, thematic clusters, and burst terms. We then applied structured content coding in Excel 2021 to classify exercise modalities, outcome domains, and mechanistic topics, and integrated these findings into a visual evidence-distribution profile. Results: Publication output increased from 16 studies in 2016 to 37 in 2025. The United States led in productivity, Karolinska Institutet was the most prolific institution, and Kitas, Duda, and Metsios were among the most influential authors. Keyword analyses identified a shift from function- and disease-focused themes toward quality of life, risk factors, and comprehensive management. The integrated analysis revealed an uneven evidence structure: aerobic and resistance training accounted for the most concentrated and recurrently studied exercise-outcome domains, whereas mind-body and water-based interventions formed visible but methodologically heterogeneous clusters. Newer modalities, including blood flow restriction training and high-intensity interval training, showed growing prominence but limited depth of evidence. Conclusion: Exercise research in rheumatoid arthritis has evolved toward broader and more patient-centered management targets, but the field remains imbalanced across intervention types and outcome domains. 
This study demonstrates the value of combining bibliometric mapping with structured content analysis to distinguish thematic visibility from evidentiary coherence in heterogeneous intervention fields and may offer a transferable analytical framework for research evaluation beyond rheumatoid arthritis. Keywords: Rheumatoid Arthritis; Exercise Intervention; Bibliometrics; Content Analysis; Rehabilitation

3
Resting energy expenditure and thermic effect of a high-fat meal in the early follicular and mid-luteal phases of the menstrual cycle: a crossover trial protocol

Goulet, N.; Lyndon, S.; Beauregard, N.; McInnis, K.; Mauger, J.-F.; Doucet, E.; Imbeault, P.

2026-05-30 nutrition 10.64898/2026.05.25.26354032 medRxiv
Top 0.4%
0.8%

Introduction: Menstrual cycle phase has been proposed as a source of intra-individual variability in resting energy expenditure and the thermic effect of food in premenopausal females, yet studies examining the thermic effect of food across menstrual cycle phases report conflicting findings. Methods: This protocol describes a secondary analysis of prespecified outcomes from a non-randomized, two-period crossover trial primarily designed to assess postprandial plasma triglyceride concentrations across menstrual cycle phases (ClinicalTrials.gov: NCT07459465) in 12 premenopausal females aged 18-30 years, free of chronic disease and hormonal contraceptive use, recruited in Ottawa, Canada. Participants complete two experimental sessions: one in the early follicular phase and one in the mid-luteal phase, each involving consumption of a high-fat meal. Eleven secondary outcomes will be reported: fasting resting energy expenditure, thermic effect of food, respiratory exchange ratio, carbohydrate oxidation rate, lipid oxidation rate, desire to eat, hunger, fullness, prospective food consumption, serum beta-estradiol, and serum progesterone. Masked outcome analyses are performed using linear mixed-effects models. Results: Recruitment began on 26 March 2026; results will be reported in the Stage 2 manuscript. Discussion: Findings from this trial may help clarify whether menstrual cycle phase constitutes a meaningful source of intra-individual variability in energy metabolism, with implications for the design of metabolic research in premenopausal females.

4
Generalized Sensory Sensitivity for Prediction of Post-Surgical Analgesic Outcomes: An Observational Cohort Study of Total Hip Arthroplasty and Hysterectomy

Schrepf, A.; Smith, T.; Waller, N.; Harris, R. E.; Ichesco, E.; Kaplan, C. M.; Till, S. R.; Williams, D. A.; As-Sanie, S.; Evanski, J. M.; Urquhart, A.; Brummett, C. M.; Clauw, D. J.; Harte, S. E.

2026-05-27 rheumatology 10.64898/2026.05.26.26354108 medRxiv
Top 0.4%
0.7%

Background. A substantial minority (~20%) of patients fail to achieve meaningful pain reduction following surgery intended to relieve pain. Risk is elevated in patients with nociplastic pain features, but available self-report measures were not designed for pre-surgical screening. We aimed to develop a brief, data-driven screener for poor analgesic response to surgery. Methods. Participants were recruited from tertiary orthopedic and chronic pelvic pain clinics. Total hip arthroplasty participants had Kellgren-Lawrence grades III-IV with hip pain greater than or equal to 1 year; hysterectomy participants had chronic pelvic pain greater than or equal to 6 months. The primary outcome was a 50% reduction in worst pain at six months. Items were selected via elastic net regression with k-fold cross-validation from 68 candidates. Results. Of 428 participants (81% female; mean age 51), 35% failed to achieve a 50% pain reduction. The resulting 11-item screener - the GenerAlized sensory sensitivity for sUrGical rEsponsiveness (GAUGE) - comprises pain across seven body regions and four symptom items measuring interoception (nausea, numbness/tingling) and exteroception (sensitivity to sound, sensitivity to odors). GAUGE outperformed the Central Sensitization Inventory, Fibromyalgia Survey Criteria, and PainDETECT for predicting surgical non-response (RR 1.535, 95% CI 1.342-1.55; AUC 0.738; sensitivity 0.741, specificity 0.635) and for predicting Patient Global Impression of Change. In an independent validation cohort of 54 total knee arthroplasty patients, GAUGE outperformed the Fibromyalgia Survey Criteria in predicting pain severity at six months. Conclusions. GAUGE is a data-driven, theoretically grounded screener for poor analgesic response to surgery, with potential utility for pre-surgical counseling and clinical trial enrichment.

5
Cross-Sectional Measures of Periodontal Severity: Distortion from Severity-Dependent Tooth Loss

McCormick, K. M.; Amarasena, N.; Guzzo, G.; Nath, S.; Jamieson, L.

2026-05-30 dentistry and oral medicine 10.64898/2026.05.27.26354277 medRxiv
Top 0.7%
0.2%

Aim: Cross-sectional summaries of periodontitis based on clinical attachment loss (CAL) are, by definition, conditioned on surviving teeth. Because the most severely affected teeth are more likely to have been lost, these measures may underestimate cumulative disease burden and show an artificial flattening (attenuation) of severity with age. We hypothesised that measures more sensitive to severe attachment loss would show greater attenuation at older ages than measures defined across a broader range of sites. Materials and Methods: Using nationally representative data from adults aged 30+ years in NHANES 2009-2014, we examined age-specific trajectories across multiple continuous measures of periodontal severity and assessed whether divergence between measures followed the pattern predicted under severity-dependent tooth loss. Results: The proportion of observable sites declined from 93% at ages 30-34 to 68% at 80+ years, establishing the structural basis for the divergence observed across severity measures. All severity measures showed nonlinear attenuation with age, with distortion increasing with severity threshold. Higher-threshold measures exhibited the greatest attenuation, while lower-threshold measures showed more stable trajectories. Conclusions: Cross-sectional summaries of periodontitis reflect disease among surviving teeth rather than cumulative damage across teeth originally at risk. Attenuation at older ages is consistent with depletion of the most severely affected teeth rather than biological slowing. Distortion varies by measure, with higher-threshold and mean-based indices most affected, whereas the CAL 3+ mm threshold provides a more stable basis for age comparisons.

6
Estimating Lifetime Periodontal Burden Under Informative Tooth Loss

McCormick, K. M.; Amarasena, N.; Guzzo, G.

2026-05-30 dentistry and oral medicine 10.64898/2026.05.27.26354300 medRxiv
Top 0.8%
0.2%

Background: Periodontitis is defined by cumulative, irreversible tissue destruction, yet population-based measurement typically relies on cross-sectional indicators derived from retained teeth. Destruction that occurred earlier in life, particularly disease severe enough to result in tooth loss, is structurally excluded from these measures, potentially leading to systematic underestimation of lifetime periodontal burden. Objective: To develop and evaluate a measurement framework that estimates lifetime periodontal burden from cross-sectional data by explicitly incorporating informative tooth loss under etiological uncertainty. Methods: Data were drawn from 10,324 adults aged ≥30 years participating in the 2009-2016 National Health and Nutrition Examination Survey (NHANES) who completed full-mouth periodontal examination and glycated hemoglobin (HbA1c) testing. Lifetime periodontal burden was estimated by combining observed clinical attachment loss in retained teeth with probabilistic contributions from missing teeth, using three alternative age-stratified attribution schedules derived from epidemiological studies of periodontal extraction. Performance was compared with conventional measures of periodontal severity and extent using distributional analyses, correlations with HbA1c, discrimination of diabetes status, and relative importance analysis. Age-adjusted models were treated as sensitivity analyses. Results: Estimated lifetime periodontal burden exhibited strong, monotonic age gradients across glycemic categories, in contrast to more attenuated patterns observed for severity and extent. Across attribution schedules, lifetime burden showed stronger correlations with HbA1c (ρ = 0.30-0.32) than conventional measures. In multivariable models including all indices, lifetime burden retained an independent association with HbA1c, whereas severity and extent contributed little unique information. 
Discriminative performance for diabetes status was consistently higher for lifetime burden than for conventional measures and remained stable across attribution schedules. Conclusions: Lifetime periodontal burden can be estimated from cross-sectional data by explicitly modelling informative tooth loss rather than restricting measurement to retained teeth. Incorporating historical tissue loss under uncertainty yields a more coherent representation of cumulative periodontal destruction than snapshot-based measures and provides a methodological basis for life-course-oriented periodontal epidemiology.

7
Polyphenol Estimator: A New Tool to Estimate Dietary Polyphenol Intake from ASA24 and NHANES Dietary Data

Wilson, S. M. G.; Oliver, A.; Lemay, D. G.

2026-05-29 nutrition 10.64898/2026.05.27.26353727 medRxiv
Top 0.9%
0.2%

Background: Recent food-based recommendations for flavan-3-ols highlight a growing need to understand the breadth of our dietary polyphenol exposure. However, estimation of dietary polyphenol intake remains challenging, requiring custom computational tools that are often difficult to implement or not fully reproducible. Objective: We aimed to develop an automated, user-friendly tool to estimate polyphenol intake from diet recalls and records. Methods: We developed Polyphenol Estimator, a tool that processes dietary data from the Automated Self-Administered 24-Hour (ASA24) Dietary Assessment Tool or the Automated Multiple-Pass Method from the National Health and Nutrition Examination Survey (NHANES). Polyphenol Estimator disaggregates foods using the FDA Food Disaggregation Database into ingredients, matches these ingredients to FooDB, and estimates polyphenol intake at the total, class, and compound level. Optionally, these polyphenol estimates can be used to calculate the Dietary Inflammatory Index (DII). Polyphenol Estimator is freely available online (https://swi1.github.io/polyphenol_estimator) with a tutorial for users with limited programming experience. Results: To illustrate Polyphenol Estimator, we applied it to two days of diet recalls from adults (≥20 years) in NHANES 2021-2023 (n = 2778). For 97.7% of participants, less than 2.5% of reported foods went unmapped, with 75.7% of participants having complete mappings. Total polyphenol intake was 517 ± 439 (mean ± SD) mg/1000 kcal, largely from green tea, coffee, black tea, apples, wine, oranges, and blueberries. At the class level, polyphenols classified as organooxygen compounds, flavonoids, and cinnamic acids and derivatives were top intake contributors. At the compound level, cryptochlorogenic acid, neochlorogenic acid, and caffeic acid were top contributors. Lastly, the DII was 1.4 ± 1.9, indicating the average diet had proinflammatory potential. 
Conclusions: Polyphenol Estimator offers an automated method to obtain total, class, and compound-level polyphenol estimates from dietary data to aid future efforts to understand polyphenol intake exposures and their biological impact on health.

8
Compatibility of National Food Composition Databases with USDA FoodData Central: A Seven-Country LLM-Based Analysis

Nakagawa, S.; Yamamoto, A.

2026-06-01 nutrition 10.64898/2026.05.23.26353942 medRxiv
Top 1%
0.1%

To evaluate the international interoperability of food composition databases, we assessed the compatibility of seven national food composition tables with USDA FoodData Central (FDC) using the LLM-based matching method reported previously (Nakagawa and Yamamoto, 2026). Databases from four English-speaking countries (Canada, United Kingdom, Australia, and New Zealand), South Korea, and Japan were compared with 8,158 USDA FDC entries (SR Legacy and Foundation Foods, excluding Survey/FNDDS). Match rates varied by country (62.0-89.7%) and food category. After excluding six USDA categories unsuitable for cross-national comparison, 45.2% of the remaining 6,290 entries were not matched by any country. Canada showed the highest concordance, reflecting a shared North American food supply. Japan and South Korea showed similarly low coverage for vegetables and spices. These findings suggest that while USDA FDC represents a practical foundation for a globally comprehensive food composition database given its breadth, systematic incorporation of country-specific foods and classification schemes will be necessary to achieve true international interoperability.

9
Cleaner Air for Lower Cardiometabolic Risk: protocol for a double-blind, randomized, sham-controlled trial of HEPA filtration in adults with prediabetes.

Wittkopp, S.; Asachi, P.; Kazatsker, F.; Aleman, J. O.; Gordon, T.; Brook, R.; Thorpe, L.; Newman, J. D.

2026-06-01 endocrinology 10.64898/2026.05.29.26354420 medRxiv
Top 1%
0.1%

Introduction Air pollution is a leading driver of cardiovascular disease, and a growing body of literature implicates it in worse glucose homeostasis. Increases in fine particulate matter air pollution (PM2.5) are associated with increased blood glucose and hemoglobin A1c across the glycemic spectrum from normoglycemia to prediabetes to all forms of diabetes. Despite strong evidence for positive associations of PM2.5 with dysglycemia, it remains unknown if reducing air pollution exposure through air filtration can effect improvements in glucose. This study aims to test the hypothesis that short-term, in-home air pollution reduction using high efficiency particulate air (HEPA) filtration will improve blood sugar in adults with prediabetes. Methods and analysis This trial is a randomized, double-blind, sham-controlled trial of the effects of lowering air pollution exposure using HEPA filtration on cardiometabolic health in adults with prediabetes living in the New York City area. Participants will be randomly assigned to use bedroom air cleaners, or sham air cleaners, while measuring PM2.5 continuously for 1 month. The primary outcomes will be continuous glucose monitoring metrics measured before and after HEPA air filtration. Exploratory outcomes will include insulin resistance measures, serum biomarkers and transcriptomics measured before and after HEPA intervention. We will quantify effects of HEPA filtration with models using treatment arm (true versus sham filtration) as the independent variable. Secondary analyses will model continuous measures of PM2.5 as the independent variable. Ethics and Dissemination This study has undergone peer review, and the work was supported by Grant 2023-0214 from the Doris Duke Foundation, which had no other role in study design or implementation. The study was registered in ClinicalTrials.gov (NCT05994937) prior to recruitment. Clinical Trials NCT05994937; https://clinicaltrials.gov/study/NCT05994937

10
Association of a polygenic risk score with coronary atherosclerotic burden in clinical CT angiograms

Hartmann, K.; Gannon, M.; Natarajan, P.; Greenland, P.; Penn Medicine BioBank; Levin, M.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.26.26353801 medRxiv
Top 1%
0.1%

Background: Polygenic risk scores (PRS) for coronary artery disease (CAD) are associated with cardiovascular events, but the relationship between inherited risk and routinely reported coronary computed tomography angiography (CTA) findings has not been studied. Objectives: To evaluate associations between a genome-wide PRS for angiographic coronary disease burden and coronary CTA-derived measures of atherosclerotic severity in a real-world clinical cohort. Methods: We studied Penn Medicine BioBank participants with available genotypes and clinically obtained coronary CTA reports. A previously published PRS for angiographic CAD burden was calculated using pgsc_calc. CAD-RADS scores and coronary artery calcium (CAC) values were extracted from radiology reports using the large language model Llama 3.1 8B. Associations between PRS and CAD-RADS severity were evaluated using Bayesian cumulative ordinal logit regression, while associations with log-transformed CAC burden were assessed using Bayesian linear regression. Results: Among 630 participants, median age was 59 years (IQR 49-68), 53% were female, 62% were genetically similar to a European reference population, and 34% to an African reference population. LLM-extracted CAD-RADS and CAC values demonstrated near-perfect agreement with manual abstraction. Higher PRS was associated with greater coronary atherosclerotic burden on CTA. Each 1-standard deviation (SD) increase in PRS was associated with a 20% higher odds of belonging to a more severe CAD-RADS category (cumulative OR 1.20, 95% credible interval 1.06-1.44). Higher PRS was also associated with greater CAC burden (β 0.38, 95% credible interval 0.15-0.61). Conclusions: Polygenic risk for angiographic coronary disease burden is reflected in clinically reported coronary CTA severity measures, including CAD-RADS and CAC. 
These findings demonstrate that inherited susceptibility to CAD manifests as greater anatomic atherosclerotic burden at the time of clinical presentation and support further investigation of genetic risk integration into imaging-based cardiovascular risk assessment.

11
A Personalized Whole-Food Diet Differentially Modulates Glucoregulatory and Cognitive Responses Compared With Conventional Dietary Counseling in Young Black and White Adults With Overweight or Obesity: An 8-Week Randomized Controlled Trial

Ani, O.; Rabbani, E.; Dhillon, J.

2026-05-29 nutrition 10.64898/2026.05.27.26354244 medRxiv
Top 2%
0.0%

Background: Black adults bear a disproportionate burden of cardiometabolic dysfunction, yet most dietary trial evidence comes from predominantly White cohorts. Objective: To evaluate whether a personalized whole-food dietary intervention improves cardiometabolic outcomes more in Black than White young adults with overweight or obesity. Methods: In this 8-week randomized, controlled trial (ClinicalTrials.gov: NCT04635917), 112 Black and White adults (18-35 years; BMI 25-45 kg/m²) were block-randomized by race to a personalized dietary intervention providing whole foods (PD, n=57) or conventional dietary counseling at baseline (BL) using MyPlate guidelines (CD, n=55). Primary outcomes were Matsuda Index and fasting and OGTT-derived glucose, insulin, and non-esterified fatty acids. Other glucoregulatory, cardiovascular, anthropometric, appetite, and cognitive outcomes were also assessed. Outcomes were analyzed using baseline-adjusted linear models with sensitivity analyses adjusting for baseline BMI and food security score. Results: Compliance with study food consumption was 85-91%. Diet quality was higher in PD than CD (P < 0.05), with larger gains in vegetable-related outcomes among Black participants (group x race, P < 0.05). HOMA-β was lower in PD than CD overall (P < 0.05). In sensitivity analyses, Black PD participants had greater fasting insulin reductions than White, especially in the latter half of intervention (week x group x race, P < 0.05), with a similar tendency for HOMA-IR. Glucose AUC 0-30 min was higher in White than Black PD participants (group x race, P < 0.05). Concentration performance was higher in PD than CD overall (P < 0.05), with larger gains in processing speed and accuracy among Black than White participants (group x race, P < 0.05). No effects were observed for cardiovascular or appetite outcomes. 
Conclusions: The personalized whole-food intervention produced differential effects in fasting insulin and early-phase glucose handling, and greater benefits in attention, in Black compared with White young adults with overweight or obesity during weight maintenance.

12
Mid-Pregnancy Maternal Leukocyte Telomere Length and Preterm Birth in a Population-Based Hispanic/Latina California Cohort

Garay, O.; Oltman, S.; Bear, R. J.; Lin, J.; Wojcicki, J. M.; Ryckman, K. K.; Jelliffe-Pawlowski, L. L.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.27.26354189 medRxiv
Top 2%
0.0%

Background Preterm birth (PTB) rates among Hispanic/Latina individuals in the United States have risen over the past decade. Data suggests this rise may be driven in part by psychosocial stress. Leukocyte telomere length (LTL), a marker of cumulative cellular aging that shortens under chronic stress, may capture stress-related biological vulnerability, but has not been examined as a potential population-level contributor to PTB in Hispanic/Latina pregnancies. Objective To examine the association between mid-pregnancy maternal LTL and PTB in a population-based Hispanic/Latina cohort. Methods In a case-control study nested within a California singleton birth cohort (n = 436 Hispanic/Latina individuals; 215 PTB, 221 term births), LTL was measured by quantitative PCR from biobank specimens collected from 15 to 20 weeks of gestation. Covariates from linked birth certificate and hospital discharge records were included. Logistic regression estimated ORs and 95% CIs of PTB by LTL examined continuously and by percentile category (<=10th, 11th-89th, >=90th) with and without adjustment for covariates. Results Mean and median LTL did not differ between PTB and term births. LTL at or below the 10th percentile was associated with elevated odds of PTB relative to full-term birth (12.6% versus 4.3%; ORc = 3.2, 95% CI 1.3-7.9), persisting after partial (ORadj1 = 3.2, 95% CI 1.3-8.3) and full covariate adjustment (ORadj2 = 3.4, 95% CI 1.3-9.3). Subgroup analyses showed consistent directional patterns across PTB subgroups and for early term birth (ORadj2 = 5.1, 95% CI 1.5-17.0). Conclusions Mid-pregnancy maternal LTL <=10th percentile was associated with more than three times the odds of PTB, with risk concentrated at the extreme low tail of the distribution. Consistent with a cumulative allostatic load model, markedly short LTL at mid-gestation may reflect elevated stress-related biological risk for preterm delivery. 
These findings support upstream investment in stress reduction and prospective LTL research in high-burden populations.
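The crude odds ratio in this abstract can be roughly checked from the two reported percentages. The sketch below assumes that 12.6% and 4.3% are the shares of preterm and term births with LTL at or below the 10th percentile, and that the comparison is against all other participants in each group; the abstract does not state the reference category, so this is an approximation and not the authors' logistic regression. Function names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a proportion into odds."""
    if not 0 < p < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio of exposure, comparing cases with controls."""
    return odds(p_cases) / odds(p_controls)


# Proportions with short LTL as reported: 12.6% of PTB, 4.3% of term births.
crude_or = odds_ratio(0.126, 0.043)
print(f"approximate crude OR: {crude_or:.2f}")
```

Under these assumptions the value lands close to the reported crude estimate of 3.2; the wide confidence interval (1.3-7.9) reflects how few participants fall in the lowest-decile category.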

13
DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv
Top 2%
0.0%

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics - ranging from semantic similarity to large language model (LLM) based judges - often fail to distinguish between clinically trivial and critical discrepancies, poorly reflecting real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison). DISCERN is a significance-aware framework that weighs report errors based on their potential impact on patient care. Our results demonstrate that DISCERN powered by closed source LLMs aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and ensures the safer integration of generative AI into clinical workflows.

14
Exploring Auditory Biofeedback Paradigms for Gait Training in Children with Cerebral Palsy: A User-Centered Design Study

Kantan, P. R.; Hansen, M. B.; Foldager, J. J.; Fjeldgaard, F. S.; Dahl, S.; Spaich, E. G.

2026-05-29 rehabilitation medicine and physical therapy 10.64898/2026.05.29.26353852 medRxiv
Top 2%
0.0%

Purpose: To identify, through iterative user-centered design, the auditory biofeedback requirements and sound preferences supporting gait training in children with cerebral palsy (CP), and to determine which feedback variables, sound mappings, and sound types yield clinically viable and movement-interpretable paradigms. Methods: The iterative process spanned two prototype phases. Prototype A comprised seven paradigms demonstrated to two experienced physiotherapists (Workshop 1A). Two of these were subsequently discarded owing to poor sound-movement interpretability and two were modified. Six paradigms were added to Prototype B, demonstrated to four children, five parents, and one therapist (Workshop 1B) and two therapists (Workshop 2B). Data were analyzed using systematic text condensation. Results: Within-child sound preferences varied with energy level and sensory state on a given day. Sound-movement interpretability tended to suffer for paradigms with greater acoustic complexity (e.g. computer-generated music). Therapists endorsed a repertoire spanning both movement quality and movement quantity targets. Participants independently proposed paradigms rewarding restrained and controlled movement, a feedback category absent from the current prototype. Conclusions: Session-level calibration is preferable to fixed sound profiles, requiring real-time interface support for paradigm adjustment. Acoustic complexity must remain subordinate to movement-sound interpretability. Paradigms targeting movement restraint are a development priority unaddressed in the literature.

15
Comparative Study on Image Quality of Deep Learning and Adaptive Statistical Iterative Reconstruction-V in Thin Layer CT of Liver Lesions

Yang, J.; Li, L.; Cao, J.; Zhang, J.

2026-05-26 radiology and imaging 10.64898/2026.05.23.26353923 medRxiv
Top 2%
0.0%

Objective: This study aims to compare the advantages and disadvantages of deep learning image reconstruction (DLIR) and adaptive statistical iterative reconstruction-V (ASIR-V) in thin-slice (2.5 mm) CT images of hepatic lesions characterized by high and low contrast. Additionally, the study seeks to determine the optimal DLIR strength for the evaluation of liver lesions. Methods: A retrospective analysis was performed on 90 patients who underwent abdominal contrast-enhanced CT scans. Group A comprised 48 patients with low-contrast lesions, while Group B included 42 patients with high-contrast lesions. The acquired images were reconstructed using post-processing DLIR at low (DLIR-L), medium (DLIR-M), and high (DLIR-H) strengths, all with a slice thickness of 2.5 mm (subgroups A1-A3, B1-B3). Furthermore, images were reconstructed with ASIR-V at 50% strength at slice thicknesses of 2.5 mm and 5 mm (subgroups A4/B4 and A5/B5, respectively). CT values and standard deviations (SD) of the liver and lesions were measured, and the corresponding signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) were calculated. The edge rise slope (ERS) was determined using ImageJ software by measuring CT values along a line from the liver parenchyma to the lesion. Objective metrics were compared using one-way ANOVA, with independent samples t-tests applied for inter-group differences. Subjective scoring, which encompassed noise level, diagnostic confidence, and lesion margin delineation, was conducted by two radiologists, with differences analyzed using the Kappa test. Results: Objective evaluation revealed a progressive decrease in lesion SD and a progressive increase in SNR and CNR from subgroups A1/B1 to A3/B3. The SD of subgroup A2 decreased by 57.4% compared to A4, while the SNR and CNR of A2 increased by 19.3% and 24.6%, respectively, compared to A4. Although subgroup B2 had a lower SNR than B5, the difference was not statistically significant. SNR and CNR in B2 increased by 24.1% and 11.9%, respectively, compared to B4. 
ERS gradually decreased from A1/B1 to A3/B3. ERS values in A2 and B2 increased by 27.0% and 39.4%, respectively, relative to A5 and B5. Although A3 had a lower ERS than A1 and A2, all DLIR subgroups exhibited higher ERS than A5; similar trends were observed in Group B. Subjective evaluation indicated good inter-reader agreement (Kappa > 0.61, p < 0.05). As DLIR strength increased, noise scores rose progressively in both groups. However, noise in A2 and B2 was lower than in A4/A5 and B4/B5. Diagnostic confidence and lesion margin delineation scores were highest in A2 and B2, while all subjective scores were lowest in A5 and B5. Discussion: Most prior studies evaluated the liver or vessels, or confirmed that image quality can be maintained at low doses. Few studies have examined specific individual lesions, which this study therefore investigates. The details and detection rate were analyzed separately to confirm the clinical acceptability of 2.5 mm DLIR images in lesions of different contrast. Conclusion: For both high- and low-contrast hepatic lesions, DLIR provides superior image quality compared to ASIR-V, with the 2.5 mm DLIR-M setting being optimal. DLIR-M reduces image noise, improves spatial resolution, and produces images more suitable for diagnostic purposes.

16
Association Between Quadriceps Strength And Knee Flexion During Drop Landing In Healthy Adolescent Athletes

Lyons, B.; Hopfauf, J.; Bond, C. W.; Noonan, B. C.

2026-05-30 sports medicine 10.64898/2026.05.28.26353494 medRxiv
Top 2%
0.0%

Background: Quadriceps strength and landing mechanics are two modifiable factors associated with anterior cruciate ligament (ACL) injury risk. Collecting detailed biomechanical data is an arduous task. Identifying a relationship using more easily measured variables, such as quadriceps strength, would offer value for athlete counseling and injury prevention programs. Although quadriceps weakness has been associated with altered landing strategies in ACL-reconstructed (ACLR) individuals, this relationship is less clear in healthy athletes. Purpose: To investigate the association between isokinetic quadriceps strength and peak knee flexion angle during a vertical drop jump in healthy adolescent athletes. Study Design: Secondary analysis of previously collected data. Methods: Healthy adolescent athletes had their dominant leg quadriceps strength measured using an isokinetic dynamometer at 60°/s from 0-90° of knee flexion. Landing mechanics were assessed during a vertical drop jump using three-dimensional motion capture synchronized with force plates. Pearson correlation was used to evaluate the association between quadriceps strength and peak knee flexion angle during landing, with statistical significance defined as p < .05. Results: There was a weak negative correlation between quadriceps strength and peak knee flexion angle (p = .017, R = -.22 [-.04, -.38]), suggesting that stronger athletes achieved greater knee flexion angles. Discussion: Greater quadriceps strength was associated with increased peak knee flexion angles during landing; however, the weak correlation suggests that strength explains only a small portion of the variability in landing mechanics. These findings deviate slightly from prior literature in healthy populations but are consistent with studies demonstrating that greater quadriceps strength is associated with achieving greater peak knee flexion in ACLR patients. 
Accordingly, quadriceps strengthening should remain a key component of multifactorial ACL injury prevention programs.

17
Vaginal Antisepsis for Major Gynecologic Surgeries Using Chlorhexidine Gluconate versus Povidone Iodine: A Systematic Review and Meta-Analysis

Dias, Y.; Gebrekidan, F.; Lowder, J.; Sutcliffe, S.; Yaeger, L.

2026-05-27 obstetrics and gynecology 10.64898/2026.05.26.26353429 medRxiv
Top 2%
0.0%

OBJECTIVE: We performed a systematic review and meta-analysis (SRMA) of post-surgical outcomes, comparing chlorhexidine gluconate (CHG) versus povidone iodine (PI) for vaginal antisepsis of major gynecologic procedures. DATA SOURCES: Ovid Medline, Embase, Scopus, Cochrane, and Clinicaltrials.gov were searched between 1986 and December 2023 for studies comparing CHG with PI for vaginal antisepsis of major gynecologic operations. STUDY ELIGIBILITY CRITERIA: We included randomized controlled trials (RCTs) and non-RCTs comparing CHG to PI for vaginal antisepsis of major gynecologic operations. The primary outcome was surgical site infections (SSIs) and the secondary outcomes were urinary tract infections (UTIs) and vaginal irritation. METHODS: Summary estimates were calculated by fixed effects models when I² ≤ 25% and by random effects models when I² > 25%. Statistical analysis was performed using RevMan 5.4.1. The protocol for this systematic review was registered on PROSPERO (ID CRD42022378101). RESULTS: Nine studies met the inclusion criteria, four of which were RCTs. A total of 9538 patients were included, 4300 (45%) of whom were allocated to CHG and 5238 (55%) to PI. No statistically significant difference in SSI incidence was found for vaginal antisepsis with CHG versus PI in pooled analyses (n = 9538 patients; RR 1.20; 95% CI 0.92-1.57; I² = 0%). In contrast, a significantly higher risk of UTIs was observed for vaginal antisepsis with CHG than with PI (n = 6061 patients; RR 1.48; 95% CI 1.03-2.14; I² = 0%). CONCLUSION: In our SRMA, there were no significant differences in SSI risk when either CHG or PI was utilized for antiseptic vaginal preparation. Interestingly, vaginal antisepsis with PI was associated with a lower incidence of post-operative UTIs following major gynecologic surgery. Our findings support current guidelines that either form of vaginal antisepsis can be used for SSI prevention. 
They also suggest that PI may result in fewer postoperative UTIs, but further randomized studies are needed to support these findings. Key words: surgical site infection, surgical wound infection, urinary tract infection, urogynecologic surgery, chlorhexidine, povidone iodine, surgical antiseptic

18
Auditable cross-instrument detection of unusual multivariate psychiatric response configurations using a semantically aligned covariance subspace

Periwal, V.

2026-05-27 psychiatry and clinical psychology 10.64898/2026.05.22.26353902 medRxiv
Top 2%
0.0%

Background: Conventional psychiatric screening instruments summarize symptoms within individual scales and prioritize cases with high single-instrument additive score severity. This design treats items as independent within instruments and ignores cross-instrument covariance structure, making it insensitive to respondents whose responses are distributed across multiple domains in unusual combinations that remain below threshold on every individual scale. Methods: We analyzed two cohorts spanning older and younger adults. Item prompts from depression, stress, anxiety, and sleep instruments were embedded into a shared semantic space using a pretrained sentence encoder. Principal component analysis of the item-prompt embeddings alone (with no use of respondent data at this stage) was used to construct a low-dimensional subspace retaining 80% of variance in the item embedding matrix. Normalized participant responses were then projected into this subspace, with Jaccard-based stability analysis used as a check on dimensional robustness. Multivariate deviation from the cohort norm was quantified with Mahalanobis distance using Ledoit-Wolf covariance regularization. Candidate outliers were defined by the empirical 95th percentile of the cohort-specific distance distribution. To isolate response configurations not already captured by conventional single-instrument extreme-value logic, we excluded all outlier respondents who had endorsed any individual item at the maximum value of its Likert scale on any instrument. For the remaining outliers, anomalous components were backtracked to their original item loadings for interpretation. Results: In the older-adult Health and Retirement Study (HRS) cohort, principal component analysis of 27 item-prompt embeddings showed that a 10-dimensional subspace provided a stable representation of cross-instrument semantic structure. In the younger-adult Xinxiang cohort the corresponding stable solution was 16-dimensional. 
In each cohort, seven respondents remained as multivariate outliers despite falling below every single-instrument extreme-value threshold. These cases were not characterized by uniformly severe symptom scores but by unusual cross-domain response configurations that became visible only in the shared semantic covariance subspace. The response structure of the retained configurations differed across cohorts: older-adult cases more often involved weak endorsement of mood-labeled items alongside nonzero body- and sleep-related responses, whereas younger-adult cases more often involved incomplete response configurations spanning mood, sleep, stress, and self-harm-related items. Conclusions: A semantically aligned, auditable covariance subspace provides a practical tool for flagging unusual multivariate response configurations that single-instrument additive screening may not flag. The method is interpretable at the level of original item contributions. It should be understood as a hypothesis-generating screen for unusual response configurations requiring further clinical assessment, not as a diagnostic instrument. Outcome validity remains to be established by prospective study.

19
Data Assimilation Substitutes for Biological Complexity in Hybrid Influenza Forecasting Models

Alleman, T. W.; Van Wesemael, T.; Shanker, N.; Mietchen, M. S.; Loo, S.; Ajagbe, S. O.; Baetens, J. M.; Lemaitre, J.; Hill, A. L.; Truelove, S. A.; Bento, A. I.

2026-05-27 public and global health 10.64898/2026.05.19.26353597 medRxiv
Top 2%
0.0%

Hybrid mechanistic-statistical models offer interpretability and adaptability for short-term seasonal epidemic forecasting, but it remains unclear whether their accuracy depends more on increased biological complexity or on the assimilation of richer data. Using eight retrospective influenza seasons in North Carolina, we evaluate whether training on historical data and assimilating auxiliary emergency department (ED) visit data improves four-week-ahead hospital admission forecasts more than adding biological complexity (multi-subtype structure and cross-season immunity). Hierarchical Bayesian training on historical data improves accuracy by 22.4% (95% CI: 16.4-28.1%), and inclusion of ED visit data yields a further 5.3% (95% CI: 3.0-7.6%) improvement, whereas added biological complexity produces diminishing or null gains. We further observe a substitution effect in which ED visit data partially compensates for omitted biological structure. We deployed a simplified model variant in the 2025-2026 CDC FluSight Challenge and ranked among the top ensemble performers, supporting the robustness of Bayesian hierarchical training in real time. Together, these findings indicate that short-term forecast accuracy is driven more by historical learning and assimilating auxiliary signals than by biological fidelity, with implications for how forecasting systems should balance mechanistic complexity.

20
Dentine markers of pre/early postnatal lead exposure links with brain, cognitive, and behavioral outcomes in adolescents

Marshall, A. T.; Kan, E.; Adise, S.; König, M.; McConnell, R.; Martinez, M.; Midya, V.; Arora, M.; Sowell, E. R.

2026-05-27 pediatrics 10.64898/2026.05.26.26354134 medRxiv
Top 2%
0.0%

Lead is a toxic metal ubiquitous in our environment. While dramatic reductions in lead sources have paralleled equivalent decreases in lead-poisoning rates, chronic lead exposure remains a critical public health concern. Childhood lead exposure (at its lowest levels) is linked to changes in cognitive development, but less is known about lead's effects on children's brain structure, especially as a result of in utero exposure. We measured prenatal and early-postnatal lead exposure in shed deciduous teeth of 448 9- and 10-year-old children (from 20 United States cities) and linked those lead levels to childhood brain structure, cognition/behavior, and neighborhood- and family-level socioeconomic characteristics. Here we show negative associations between tooth-lead levels and the thickness of the brain's cortex, particularly in regions linked to language processing. With increasing tooth-lead levels, children of lower-income (versus higher-income) families showed steeper declines in receptive vocabulary. Caregiver-reported behavioral problems exhibited similar associations. With in utero exposure linked to adverse neurodevelopmental outcomes (well before lead exposure and its risks are evaluated by healthcare professionals), prenatal screening of maternal lead levels/exposure, coupled with recommended strategies to reduce its placental transmission, may help reduce lead's effects on future generations.