
Epidemiology

Ovid Technologies (Wolters Kluwer Health)

Preprints posted in the last 7 days, ranked by how well they match Epidemiology's content profile, based on 26 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Estimating COVID-19 Cumulative Incidence from Seroprevalence Surveys accounting for Time-Varying Seroreversion: A Fully Bayesian Methodology

Owusu-Boaitey, N.; Meyer, M. J.; Herrera-Esposito, D.; Bottcher, L.; Lukz, M.; Cook, S.; Stoto, M. A.; Kraemer, J. D.

2026-06-10 epidemiology 10.64898/2026.06.09.26355264 medRxiv
Top 0.1%
6.8%

Seroprevalence surveys reveal the extent of humoral immunity against pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and under some circumstances represent cumulative incidence of prior infection. However, antibody waning - or seroreversion - biases these estimates by reducing assay sensitivity in a time-varying manner. Because assay sensitivity decays over time, naively using serosurveys can substantially bias estimates of SARS-CoV-2 cumulative incidence and fatality rates. The Bayesian assay-specific, time-varying sensitivity adjustment developed in this paper can reliably correct for this bias and account for the delay between infection and serosurvey. In seroprevalence studies conducted in the United States in 2020, adjusting for time-varying sensitivity increased cumulative incidence by up to 1.4-fold, with an adjustment of 1.08 for a national study. Our estimates contrast with a previously published 2-fold adjustment that did not account for assay design. This suggests that previous analyses overestimated cumulative incidence by applying seroreversion corrections that did not account for assay-specific effects, or underestimated cumulative incidence by not applying seroreversion corrections. These biases imply fatality rate underestimation and overestimation, respectively. Our model provides a framework for design-specific time-varying sensitivity corrections in seroprevalence surveys for other pathogens.
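To make the direction of the seroreversion bias concrete, here is a minimal sketch of a sensitivity correction. It is not the Bayesian model described in the abstract. It assumes a hypothetical exponential decay of assay sensitivity and applies the standard Rogan-Gladen estimator. The function names, the half-life parameter, and all numbers are illustrative assumptions.

```python
def time_varying_sensitivity(initial_sensitivity, half_life_days, days_since_infection):
    """Assay sensitivity after antibody waning.

    Assumption: sensitivity decays exponentially with a fixed half-life.
    The abstract does not specify a functional form.
    """
    return initial_sensitivity * 0.5 ** (days_since_infection / half_life_days)


def adjusted_cumulative_incidence(seroprevalence, sensitivity, specificity=1.0):
    """Rogan-Gladen estimator: (p + spec - 1) / (sens + spec - 1), clipped to [0, 1]."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (seroprevalence + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))


# Illustrative numbers only: 9% observed seroprevalence, measured 180 days
# after infection with an assay whose sensitivity started at 0.9.
sensitivity_now = time_varying_sensitivity(0.9, half_life_days=180, days_since_infection=180)
naive = adjusted_cumulative_incidence(0.09, sensitivity=0.9)
corrected = adjusted_cumulative_incidence(0.09, sensitivity=sensitivity_now)
```

With these made-up inputs, the correction that ignores waning understates cumulative incidence relative to the time-varying one. A real analysis would average sensitivity over the distribution of infection times rather than use a single delay.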

2
Direct and mediated effects (DME) SLCMA: a novel method for life course modelling with time-varying covariates

Beer, S.; Simpkin, A. J.; Eldeeb, S. Y.; Zar, H. J.; Stein, D. J.; Dunn, E. C.; Smith, A. D. A. C.

2026-06-06 epidemiology 10.64898/2026.05.29.26354427 medRxiv
Top 0.1%
6.4%

Background: In prospective cohort studies, where an exposure is collected repeatedly, interest often lies in determining whether the timing of that exposure has a differential effect on a later outcome. The Structured Life Course Modeling Approach (SLCMA), where users select between temporal hypotheses of exposure specified a priori, provides one way to analyse such longitudinal data. However, few studies using SLCMA consider the effect of time-varying covariates (TVC) which may impact associations. Methods: We present a modified version of the SLCMA - called direct and mediated effects (DME)-SLCMA - which corrects for TVC. We first develop the DME-SLCMA method, test it through simulation, and apply it to psychosocial data from the Drakenstein Child Health Study (DCHS, n=336) to investigate relationships between maternal psychopathology, TVC of socioeconomic status, and offspring depressive symptoms. Results: We found that, on average, offspring depressive symptoms score increased by 3.9% (95% CI: 1.0%-6.9%, p = 0.039) for each unit of maternal psychopathology (SRQ) at 48 months whilst adjusting for time-varying socioeconomic status (at 18, 30, 42 and 54 months). Our simulations identified several realistic scenarios where selections ignoring TVC - with TVC mediated exposure effects present - were prone to be incorrect, including our DCHS example. Conclusion: DME-SLCMA is a robust new approach for life course modelling in the presence of time-varying covariates. We recommend adjusting for TVC whenever possible, and, when not possible, our simulation study identified that scenarios where mediated effects are comparable, or greater, in magnitude to direct effects are most prone to confounding.

3
Modeling the Impact of Pediatric RSV Immunization in Massachusetts, 2024–2025

Jones, L.; Ergas, R.; Tibbs, A.; Russo, E. T.; Norville, J.; Bingay, B.; Brown, C. M.; Reich, N. G.; Pasco, R.

2026-06-10 epidemiology 10.64898/2026.06.05.26354236 medRxiv
Top 0.1%
4.3%

Background Pediatric immunizations for Respiratory Syncytial Virus (RSV), including monoclonal antibodies for infants and vaccines for pregnant people, have become broadly available and can prevent severe RSV outcomes in infants. However, quantifying the impact of RSV immunization in prevention of severe pediatric illness at the population-level is limited by lack of RSV case surveillance data. The Massachusetts Department of Public Health (DPH) conducted a modeling analysis using routine public health surveillance data to estimate the state-level impact of new RSV immunization products on Emergency Department (ED) visits and hospitalizations in Massachusetts for highest risk pediatric groups. Methods A scenario projection tool, called R.Scenario.Vax, was utilized to simulate RSV-associated ED hospital encounters by age group in the context of newly available immunizations. ED visit and hospitalization data from the National Syndromic Surveillance Program (NSSP) during the time period 10/08/2017–10/19/2024 were analyzed, scaled to account for changes in RSV testing practices over time and missing encounter volume in historic data, and utilized to inform model fit of a "typical" RSV season. RSV immunization data from the Massachusetts Immunization Information System (MIIS) for the 2023–2024 and 2024–2025 RSV seasons informed high and moderate pediatric RSV immunization coverage scenarios and their impact was compared to a counterfactual reference scenario of no new immunizations. Median projections were quantitatively and qualitatively compared to observed 2024–2025 season data. Percent reduction in hospital encounters and encounters averted per 10,000 population were calculated for each scenario as compared to the reference. Results Projections for the youngest at-risk age groups showed significantly lower RSV-associated ED visits and hospitalizations during the 2024–2025 season for both high and moderate immunization coverage scenarios. 
Median projections for infants under 6 months old in the highest coverage scenario, wherein nearly all infants were immunized, showed 72.6% lower ED visits and 73.4% lower hospitalizations when compared to the reference scenario, equating to 262 ED visits and 85 hospitalizations averted per 10,000 population. Conclusions Our results support the use of modeling methods for public health insights and suggest that RSV immunizations for infant populations result in significantly lower RSV-related ED encounters in Massachusetts.

4
Bias from small-count suppression in county-level cancer disparity estimates: a calibrated simulation study

Gahan, K.

2026-06-08 epidemiology 10.64898/2026.06.05.26355021 medRxiv
Top 0.1%
3.6%

Background. Area-level cancer disparities are routinely estimated from public county data in which rates based on small counts (fewer than 16 cases or deaths) are suppressed. Analysts typically drop suppressed counties (complete-case analysis). Because suppression depends on case counts tied to population size and demographic composition, this missingness may be informative, but its effect on the disparity estimate has not, to our knowledge, been quantified. Methods. In a cross-sectional ecological study of 3,143 U.S. counties (analytic sample 3,018 with computable exposure) using one frozen public release of NCI State Cancer Profiles incidence and mortality data and ACS 2018-2022 5-year data, we estimated the most- versus least-deprived ICE(race+income) quintile rate ratio (RR) and rate difference for female breast, stomach, and cervix cancers under four suppression-handling methods: complete-case, available-case, bounding, and model-based small-area estimation. We characterized which counties were erased, and, following the ADEMP framework, ran a Monte Carlo simulation (1,000 replicates per cell; Monte Carlo standard error of bias approximately 0.0025) calibrated to the release to measure bias against a known truth. Analyses were pre-registered. Results. The suppressed fraction rose with rarity: 7.4% of counties for breast, 61.3% for stomach, and 75.7% for cervix incidence. Suppression was concentrated in the most-deprived quintile (cervix, 81.8% suppressed vs 63.8% least-deprived) and overwhelmingly removed rural rather than minority residents (cervix: 81% of the rural but 9% of the minority population erased). For breast (little suppression) the RR was 0.87 (95% CI 0.85-0.89) and identical across methods; for cervix incidence the complete-case RR (1.56) exceeded the model-based estimate (1.50), and for cervix mortality (91% suppressed) complete-case (1.86) exceeded model-based (1.56) by 16% with a wide bounding interval (1.88-2.62). 
In calibrated simulation, population-weighted complete-case bias was small (less than 2%) at the observed deprivation-county-size correlation and grew with rarity, threshold, and unweighted aggregation; its direction was conditional, becoming positive (over-estimation) as deprived counties became smaller. Conclusions. Complete-case handling of suppressed counties over-estimates rare-cancer area disparities relative to methods that retain them, while silently erasing most of the rural and most-deprived communities the estimate is meant to represent. The effect is negligible for common cancers and grows with rarity. Public-data disparity analyses should report the suppressed fraction and use bounded or model-based estimates by default. Keywords: cancer disparities; small-count suppression; Index of Concentration at the Extremes; informative missingness; small-area estimation; rural health.
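The mechanism behind complete-case bias can be illustrated with a toy calculation. The sketch below uses four invented counties, not the study's data, and a population-weighted (pooled) rate ratio. The only value taken from the abstract is the suppression threshold of 16. Field names and helper functions are hypothetical.

```python
SUPPRESSION_THRESHOLD = 16  # counts below this are suppressed, per the abstract

# Invented toy counties; "quintile" marks the deprivation group.
TOY_COUNTIES = [
    {"quintile": "most", "cases": 40, "pop": 100_000},
    {"quintile": "most", "cases": 10, "pop": 50_000},   # suppressed (10 < 16)
    {"quintile": "least", "cases": 30, "pop": 100_000},
    {"quintile": "least", "cases": 20, "pop": 100_000},
]


def pooled_rate(counties):
    """Population-weighted rate: total cases divided by total population."""
    return sum(c["cases"] for c in counties) / sum(c["pop"] for c in counties)


def rate_ratio(counties, drop_suppressed):
    """Most- versus least-deprived rate ratio, optionally as complete-case."""
    def kept(county):
        return not drop_suppressed or county["cases"] >= SUPPRESSION_THRESHOLD

    most = [c for c in counties if c["quintile"] == "most" and kept(c)]
    least = [c for c in counties if c["quintile"] == "least" and kept(c)]
    return pooled_rate(most) / pooled_rate(least)


true_rr = rate_ratio(TOY_COUNTIES, drop_suppressed=False)          # about 1.33
complete_case_rr = rate_ratio(TOY_COUNTIES, drop_suppressed=True)  # 1.6
```

In this toy example the dropped county is small, deprived, and has a lower rate than its larger neighbor, so complete-case analysis inflates the ratio. As the abstract notes, the direction of the bias is conditional, so other configurations can produce the opposite sign.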

5
Virtually Delivered Psychosocial Intervention for Mothers Expecting a Baby with Congenital Heart Disease: A Proof-of-Concept Study of HEARTPrep

Sood, E.; Canter, K.; Arasteh, K.; Kazak, A. E.

2026-06-05 cardiovascular medicine 10.64898/2026.06.03.26354861 medRxiv
Top 0.1%
3.5%

Background: Maternal mental health problems are common after prenatal diagnosis of congenital heart disease (CHD), with long-term implications for child and family wellbeing. HEARTPrep is a prenatal psychosocial intervention with three self-paced modules and corresponding telehealth sessions, delivered during pregnancy via mobile app to improve mental health and wellbeing for mothers expecting a baby with CHD. This proof-of-concept study evaluated the feasibility of HEARTPrep and examined maternal mental health and psychosocial functioning throughout participation. Methods: Participants were mothers receiving care for a fetal CHD diagnosis within one health system. Feasibility was assessed via rates of enrollment and completion. Mothers completed 4-item PROMIS questionnaires assessing anxiety, depression, and social isolation and reported self-efficacy and hope on a weekly basis throughout HEARTPrep. Results: Of 34 recruited mothers, 29 (85%) enrolled and two were subsequently not eligible (delivery prior to participation, change in fetal diagnosis), resulting in a final sample of 27 mothers. The majority (n = 22, 81%) completed all three telehealth sessions and Modules 1 (n = 22, 81%) and 2 (n = 19, 70%), with just over half (n = 14, 52%) completing Module 3 prior to delivery. Mean PROMIS depression T-scores decreased from 57.5 to 52.9, and 48% of mothers had a decrease in depression scores exceeding the meaningful change threshold (half standard deviation). The percentage of mothers reporting high self-efficacy increased from 19% to 48%. Conclusions: HEARTPrep is feasible and corresponds with reduced maternal depression and increased self-efficacy, supporting proof-of-concept. A randomized controlled trial is needed to determine whether HEARTPrep improves outcomes compared to a control group.

6
Integrated cardiometabolic and nutritional risk profiling identifies pregnancy loss as a marker of systemic metabolic vulnerability

Agarwal, T.; Namburu, J. R.; Kachroo, P.

2026-06-08 epidemiology 10.64898/2026.06.04.26354910 medRxiv
Top 0.2%
3.1%

Background: Pregnancy loss has important implications for women's health. Although maternal age is a well-established risk factor, the contribution of routinely measured cardiometabolic and behavioral markers at population-scale remains incompletely characterized. Objective: To examine associations between cardiometabolic, nutritional, and behavioral risk markers and pregnancy loss among U.S. women of reproductive age. Methods: We conducted a cross-sectional analysis of 4,842 U.S. women aged 20-44 years with ≥1 pregnancy using the National Health and Nutrition Examination Survey data (2013-2023). Pregnancy loss was defined as ≥1 prior miscarriages. Exposures included body mass index, smoking exposure (cotinine), lipid biomarkers, vitamin D and folate, and a composite cardiometabolic-nutritional risk score. Survey-weighted logistic regression estimated adjusted odds ratios (aORs) and 95% confidence intervals, with bootstrap resampling for predictor robustness. Results: The weighted prevalence of pregnancy loss was 23%. Higher odds of pregnancy loss were associated with increasing age (aOR per year=1.02; 95% CI: 1.00-1.04), Non-Hispanic Black race (aOR=1.32; 95% CI: 1.00-1.74), overweight (aOR=1.56; 95% CI: 1.16-2.11), obesity (aOR=2.06; 95% CI: 1.39-3.05), and smoking (aOR=1.58; 95% CI: 1.19-2.10). Adverse lipid profiles, particularly elevated triglycerides (aOR=1.83; 95% CI: 1.16-2.90) and high low-density lipoprotein (aOR=2.97; 95% CI: 1.45-6.61), were independently associated with pregnancy loss. Vitamin D/folate were not stable predictors. Higher composite cardiometabolic-nutritional risk scores were observed among women with pregnancy loss (P=0.026). Conclusion: Pregnancy loss clustered with adverse cardiometabolic and behavioral risk markers in a nationally representative population. 
These findings highlight pregnancy loss as a marker of broader metabolic vulnerability supporting the need for longitudinal studies and cardiometabolic profiling to inform preconception care and risk stratification.

7
Development of Longitudinal, Linked Maternal-Infant Cohorts using the Epic Cosmos Electronic Health Record Dataset

Leonard, S. A.; Dysart, K.; Callahan, A.; Siadat, S.; Zhang, J.; Handley, S. C.; Huybrechts, K. F.; Igbinosa, I.; Bateman, B. T.

2026-06-04 epidemiology 10.64898/2026.06.02.26354757 medRxiv
Top 0.2%
2.7%

Background: Epic Cosmos is a relatively new centralized electronic health record dataset with high potential utility in perinatal epidemiologic research. Objectives: The study objectives were to develop replicable steps to create longitudinal, linked maternal-infant cohorts in Cosmos, assess completeness of key variables, evaluate potential selection bias with restrictions for longitudinal healthcare encounters, and provide an example epidemiologic analysis. Methods: We created maternal-infant cohorts by starting with live births during 2023-2024 recorded in the BirthFact data table and joining with additional data tables as needed. We selected and created variables for perinatal characteristics, common comorbidities, and routinely measured vital signs and laboratory values, and assessed variable completeness. We sequentially restricted the birth cohort for maternal-infant linkage and longitudinal healthcare from first-trimester prenatal care encounter through infant follow-up care within 12 weeks post-discharge from birth hospitalization. Finally, we conducted an example analysis of the association between high systolic blood pressure in the first trimester (≥140 mm Hg) and later onset of preeclampsia among those with chronic hypertension. Results: The total linked birth cohort included 2,624,186 pregnancies. Completeness was >90% for most variables assessed but was 77% for racial and ethnic group and 76% for body mass index at delivery. Characteristics of the cohort were similar to those reported for the entire United States birth population based on birth certificate data, including similar regional and racial-ethnic composition. Longitudinal cohort restriction requiring linked records from first trimester prenatal care through infant follow-up care reduced the cohort size to 509,148 pregnancies. However, restriction had minimal effects on cohort characteristics. 
In the example analysis, high systolic blood pressure was associated with increased risk of preeclampsia among those with chronic hypertension (aRR: 1.26; 95% CI: 1.22, 1.30). Conclusions: This study provides a rigorous and reproducible approach to creating longitudinal, linked maternal-infant cohorts in Epic Cosmos and the analytical findings suggest high data quality and representativeness.

8
A New Mixed Frequency Regression Model For Environmental Epidemiology

Shukla, N.; Bartington, S. E.; Hansell, A. L.; Lucas, T. C.

2026-06-04 epidemiology 10.64898/2026.06.03.26354801 medRxiv
Top 0.2%
2.1%

Background: In the absence of high-resolution response data, exposure-response modelling often relies on aggregated low-frequency exposure data, leading to loss of high-resolution information. Mixed Data Sampling (MIDAS) from econometrics offers an alternative but is limited due to its inability to make high-resolution predictions, inflexible likelihoods and penalised nonlinear functions, and limited visualization options. We propose a mixed-frequency Distributed Lag Non-linear Model (mf-DLNM) which can eliminate the need to aggregate exposure data in environmental epidemiology and provide high resolution predictions for time series studies. Methods: We evaluated the inference and predictive performance of the mf-DLNM. To evaluate its ability to estimate exposure-response relationships, we applied mf-DLNM and same-frequency (sf)-DLNM using data from the West Midlands, UK. Additionally, we compared the predictive performance of mf-DLNM with sf-DLNM and MIDAS across nine regions of England. As MIDAS cannot predict at the resolution of the predictor (daily), we compared the predictive performance of mf-DLNM and MIDAS at weekly resolution. To test the model's ability to predict high temporal resolution risk (daily), we compared sf-DLNM (with access to daily mortality counts) with mf-DLNM (with access only to weekly mortality counts). Results: In the West Midlands example, mf-DLNM performed comparably to sf-DLNM in estimating daily risk of temperature on respiratory mortality. Furthermore, mf-DLNM and MIDAS exhibited similar performance for weekly predictions. For high-resolution predictions, mf-DLNM and sf-DLNM showed nearly similar performance, despite mf-DLNM having access only to low-resolution response data. 
Conclusion: This mixed-frequency approach in environmental epidemiology overcomes the limitations of predicting health risks using aggregated exposure data and provides estimates of high-resolution outcomes in the absence of high-frequency health outcome datasets.

9
Disentangling infectiousness and susceptibility by age group using transmission pair data: a study of SARS-CoV-2 household transmission

Leung, K. Y.; Miura, F.; Backer, J. A.

2026-06-05 epidemiology 10.64898/2026.06.04.26354892 medRxiv
Top 0.5%
0.8%

Background Differential contributions to transmission across age groups have been reported for many respiratory infections, including SARS-CoV-2. They are crucial for estimating the impact of age-specific interventions. Disentangling these age-dependent contributions remains challenging, as they may reflect differences in contact rates, biological susceptibility, or infectiousness. Aim We aim to jointly estimate age-specific per-contact infectiousness and susceptibility and their effect on the impact of age-specific interventions. Methods The age-specific infectiousness and susceptibility were jointly estimated in a Bayesian framework by combining contact data with transmission pair data (who-infected-whom). We applied this approach to 197,840 self-reported household transmission pairs collected in the Netherlands during the COVID-19 pandemic. Using these estimates, we projected the expected impact of school closure and work-from-home measures during the early stages of an epidemic in the absence of other interventions. Results Both infectiousness and susceptibility to SARS-CoV-2 infection were lowest in children aged 0-9 years and highest in adults over 30 years old, with 2- to 4.5-fold differences between these groups. Projected impacts of age-specific interventions indicated that school closures would reduce the reproduction number by 8% or 29% when age-specific susceptibility and infectiousness were or were not considered, respectively. Conversely, working-from-home policies would lead to reductions of 41% with and 20% without age-specific infectiousness and susceptibility. Conclusion Our method enables robust estimation of age-specific infectiousness and susceptibility. Accounting for these age heterogeneities is essential for projecting the impact of age-targeted interventions. Our approach is adaptable to other respiratory infections and can guide more tailored public health responses.
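Why age-specific susceptibility and infectiousness change the projected effect of school closure can be sketched with a two-group next-generation matrix. This is an illustration under stated assumptions, not the authors' Bayesian model. The contact rates, the relative susceptibility and infectiousness values, and the convention K[i][j] = susceptibility[i] * contacts[i][j] * infectiousness[j] are all invented for the example.

```python
import math


def next_generation_matrix(contacts, susceptibility, infectiousness):
    """K[i][j]: infections in group i caused by one infected person in group j."""
    return [
        [susceptibility[i] * contacts[i][j] * infectiousness[j] for j in range(2)]
        for i in range(2)
    ]


def dominant_eigenvalue_2x2(k):
    """Largest eigenvalue of a 2x2 matrix with non-negative entries (closed form)."""
    (a, b), (c, d) = k
    trace, det = a + d, a * d - b * c
    return (trace + math.sqrt(trace * trace - 4 * det)) / 2


def relative_reduction(before, after, susceptibility, infectiousness):
    """Fractional drop in the reproduction number when contacts change."""
    r_before = dominant_eigenvalue_2x2(next_generation_matrix(before, susceptibility, infectiousness))
    r_after = dominant_eigenvalue_2x2(next_generation_matrix(after, susceptibility, infectiousness))
    return 1 - r_after / r_before


# Group 0 = children, group 1 = adults. Invented contact rates.
BASELINE = [[10, 3], [3, 8]]
SCHOOLS_CLOSED = [[4, 3], [3, 8]]  # only child-child contacts are reduced

homogeneous = relative_reduction(BASELINE, SCHOOLS_CLOSED, [1, 1], [1, 1])
heterogeneous = relative_reduction(BASELINE, SCHOOLS_CLOSED, [0.5, 1], [0.5, 1])
```

With these toy numbers, closing schools lowers the reproduction number by roughly 21% when all ages are treated alike, but by under 1% once children are assumed half as susceptible and half as infectious. The qualitative pattern matches the abstract; the magnitudes do not come from it.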

10
A Decade of the Centers for Disease Control and Prevention's FluSight Influenza Forecasting

Hines, A. G.; Mathis, S. M.; Johansson, M. A.; Biggerstaff, M.; Reed, C.; Borchering, R.

2026-06-08 epidemiology 10.64898/2026.06.05.26354941 medRxiv
Top 0.6%
0.7%

Since the U.S. 2013/14 influenza season, the CDC's FluSight Challenge has provided a platform for evaluating influenza forecasting models and fostering collaboration across institutions. The Challenge aims to improve the science and enhance the utility of infectious disease forecasts for public health decision making. We analyzed ten years of submitted forecasts (2014/15-2019/20 (influenza-like illness seasons) and 2021/22-2024/25 (hospital admissions seasons)) across a range of model types, including statistical, mechanistic, machine learning, and hybrid models. Influenza-like illness (ILI) forecasts were evaluated using the exponentiated logarithmic score (skill metric) while hospital admissions forecasts were evaluated using the log transformed relative Weighted Interval Score. Corresponding potential performance differences were assessed using Wilcoxon rank-sum tests, and associations with team participation history were evaluated using Spearman's rank correlation. Model performance varied by season, and no single model type consistently outperformed others. In ILI seasons, statistical models generally performed better than mechanistic and machine learning models, though consistent differences were not observed in more recent hospital admissions seasons. Ensemble forecasts showed better overall performance across seasons, and the CDC's FluSight ensemble ranked among the top-performing forecasts every year. We also found a positive correlation between forecast accuracy and the number of years a team participated in the Challenge, with statistically significant associations in four seasons. These findings highlight the benefits of ensemble approaches and sustained engagement in improving forecasting performance, while also underscoring the continued value of forecast evaluation before and following the COVID-19 pandemic. 
Insights from the FluSight Challenge can guide future infectious disease forecasting efforts and support more effective public health preparedness.

11
Serum Cotinine and Wrist-Worn Ambient Light Exposure Patterns in U.S. Adults: A Cross-Sectional Analysis of NHANES 2011-2014

Wong, A.; Lee, C. W.; Park, A.; Yin, L.; Choi, Y.

2026-06-04 epidemiology 10.64898/2026.06.02.26354759 medRxiv
Top 0.6%
0.7%

Background. Tobacco smoke exposure, quantified by serum cotinine, is associated with cardiovascular, metabolic, and sleep-related health risks. The relationship between biomarker-verified tobacco smoke exposure and objectively measured, free-living wrist-worn ambient light patterns has not been examined in a nationally representative U.S. adult sample. Methods. We analyzed NHANES 2011-2014 cross-sectional data from 6,937 adults aged >20 years with valid serum cotinine and wrist-worn Physical Activity Monitor (PAM) ambient light data. Seven light outcomes were modeled using survey-weighted linear regression with log2(cotinine+1) as the continuous exposure across four covariate adjustment levels. Benjamini-Hochberg false discovery rate (FDR) correction was applied across the 7 outcomes within each model. Results. In Model 2 (adjusted for age, sex, race/ethnicity, education, poverty-income ratio, BMI, and survey cycle; N = 6,350), higher serum cotinine was associated with significantly higher nighttime light (beta = +0.024, 95% CI: 0.010, 0.038; p-FDR = 0.014) and lower evening light (beta = -0.031, 95% CI: -0.055, -0.008; p-FDR = 0.042). In exploratory behavioral models without alcohol (Model 3a; N = 5,766), both nighttime and evening associations remained FDR-significant. After additional adjustment for alcohol, which substantially reduced the sample due to 37.6% missingness (Model 3b; N = 3,866), the nighttime association attenuated below the FDR threshold, while the evening association remained FDR-significant. Categorical analyses showed progressively higher nighttime light across cotinine groups, and a hypothesis-generating sex interaction was identified (p-interaction = 0.001). Conclusions. Higher serum cotinine concentrations were associated with higher nighttime and lower evening ambient light after sociodemographic adjustment. Attenuation after behavioral adjustment and the cross-sectional design preclude causal inference. 
Longitudinal studies with formal mediation analyses are needed to clarify the temporal ordering and mechanisms linking tobacco smoke exposure, smoking-related behaviors, and personal light-dark cycle patterns.
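The Benjamini-Hochberg correction mentioned in the Methods can be sketched in a few lines. This is a generic implementation of the step-up procedure that returns adjusted p-values. The input values below are invented and are not the study's results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented raw p-values for four outcomes.
raw = [0.01, 0.04, 0.03, 0.005]
fdr_adjusted = benjamini_hochberg(raw)
```

An outcome is declared FDR-significant when its adjusted value falls below the chosen threshold, for example 0.05. In the abstract the correction is applied across the seven light outcomes within each model.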

12
Early life multidimensional disadvantage of South Australian children: a whole-population linked data study

Kalamkarian, A.; Pilkington, R. M.; Lynch, J.; Mittinty, M. N.; Malvaso, C.; Hawkins, K.; Pharo, H.; Beck, K.; Chittleborough, C. R.

2026-06-05 epidemiology 10.64898/2026.06.03.26354860 medRxiv
Top 0.7%
0.6%

Background: Whole-population linked administrative data platforms provide an opportunity to generate evidence on early life multidimensional disadvantage to inform resourcing and service provision to families with complex needs. Methods: We used individual-level de-identified data from nine administrative data sources included in the Better Evidence Better Outcomes Linked Data (BEBOLD) platform. The population included all children born in South Australia between 2004-2011 (n=143,083), and their parents. We described the prevalence and distribution of multiple disadvantages affecting children from the 12 months before birth to age 5. Eleven domains of parental disadvantage were created: economic, education, access to services, mental health, substance misuse, smoking during pregnancy, domestic and family violence, health, child protection contact, justice system contact, and death. We investigated the concordance of our measure with an area-level socioeconomic measure used in government reporting. Results: One in two children (48%) were exposed to at least one disadvantage domain, and one in seven (14%) were exposed to three or more domains before age five. Economic disadvantage was most prevalent, affecting one in four (27%) children, of which 75% were exposed to additional forms of disadvantage. Substance misuse, domestic and family violence, and justice system contact were the least likely domains to occur in isolation. Only 54.4% who experienced five or more disadvantage domains were classified in the area-level socioeconomic measure's 'most disadvantaged' quintile. Conclusion: Early life exposure to parental disadvantage can be highly multidimensional. Measurement across different systems is important for informing coordinated service provision for families with complex needs.

13
Spatial and temporal associations between animal ownership and malaria prevalence in Africa using cross-sectional national Demographic and Health Surveys

Topazian, H. M.; Morgan, C. E.; Goel, V.

2026-06-08 epidemiology 10.64898/2026.06.05.26355017 medRxiv
Top 1%
0.3%

Use of zooprophylaxis as a malaria control strategy has been recommended historically, but a complex relationship exists between animal ownership and malaria infection, with mixed associations described in the literature. We sought to characterize this relationship spatially and temporally in malaria-endemic regions of Africa. We used data from 392,843 individuals from 66 Demographic and Health surveys from countries within Africa to investigate the association between household animal ownership and Plasmodium infection. We used Bayesian models with Integrated Nested Laplace Approximation to incorporate spatially varying coefficient processes, allowing the association of interest to vary over space, time, and within strata of vector species occurrence, land cover, and number of animals owned by households. Spatially varying intercept models showed that ownership of cattle, chickens/poultry, goats, horses/donkeys/mules, pigs, and sheep was broadly associated with malaria infection, with odds ratios ranging from 1.55 to 1.67. However, spatially varying slope models revealed considerable heterogeneity, with odds ratio estimates for all animal types demonstrating both protective and harmful effects varying from 0.33 to 3.33 both subnationally and across time. We found no evidence that modification by vector species, number of animals owned, and land cover fully explained the variation in estimates. Unobserved localized cultural, behavioral, or ecological factors likely modify the association between animal ownership and malaria prevalence. Further exploring the nature of this relationship over space and time will be important to understanding how context-specific One Health dynamics between humans, animals and the environment affect malaria prevention and control efforts.

14
Revisiting Plasmodium vivax molecular correction

Taylor, A. R.; Foo, Y. S.; White, M. T.

2026-06-04 infectious diseases 10.64898/2026.06.02.26354709 medRxiv
Top 1%
0.3%

Background: Reliable inference of Plasmodium vivax recurrence states - relapse, recrudescence and reinfection (the "3Rs") - improves estimates of antimalarial efficacy. The R package Pv3Rs features a Bayesian model designed for P. vivax molecular correction, i.e., using parasite genetic data to infer recurrence states. The model is an extension of a prototype built to analyse microsatellite data from the Vivax History (VHX) and Best Primaquine Dose (BPD) trials. Methods: We re-analysed data from 212 VHX and BPD trial participants (493 recurrences) using Pv3Rs, comparing results with those from the prototype and with genetic relatedness estimated using Dcifer, a tool for estimating relatedness based on identity-by-descent. Posterior recurrence state probabilities were computed using both uniform and time-to-event priors: artificial but equal prior probabilities facilitate posterior interpretation, while time-to-event priors leverage all available information and enable re-computation of failure rates. Relatedness estimates were used to identify and correct instances of model misspecification. Results: The Pv3Rs model generated posterior probabilities for all recurrences and was able to jointly model data on all episodes per participant for 89% of participants, compared with 73% using the prototype. Recurrence state probabilities were broadly consistent across methods, though the Pv3Rs model elevated reinfection probabilities slightly. Relatedness estimates exposed various outliers consistent with half-sibling parasites and/or genotyping errors. Outlier correction impacted some per-participant failure probabilities, but reinfection-adjusted radical-cure failure rates of high-dose primaquine remained near 3%, in line with previous findings. Conclusion: Re-analysis of VHX and BPD P. vivax genetic data restates earlier reinfection-adjusted efficacy estimates. 
It demonstrates the increased computational capability and misspecification sensitivity of Pv3Rs, highlighting a need for careful analyses. Using relatedness-based diagnostics alongside model-based inference, we were able to harness the advantages of model-based inference and provide a framework for future P. vivax molecular correction.
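
The contrast between uniform and time-to-event priors in the abstract above can be illustrated with a minimal sketch of Bayes' rule over the three recurrence states. This is not the Pv3Rs model or its API (Pv3Rs is an R package); the function name `posterior` and every likelihood and prior value below are hypothetical numbers chosen only for illustration.

```python
def posterior(likelihood, prior):
    """Bayes' rule over a discrete set of recurrence states."""
    unnormalised = {s: likelihood[s] * prior[s] for s in likelihood}
    total = sum(unnormalised.values())
    return {s: v / total for s, v in unnormalised.items()}


# Hypothetical likelihoods of the observed genetic data under each state.
likelihood = {"relapse": 0.30, "recrudescence": 0.05, "reinfection": 0.15}

# Uniform prior: equal probability for each state, so the posterior is
# simply the normalised likelihood and is easy to interpret.
uniform_prior = {s: 1 / 3 for s in likelihood}

# Hypothetical informative prior standing in for a time-to-event prior.
informative_prior = {"relapse": 0.6, "recrudescence": 0.1, "reinfection": 0.3}

post_uniform = posterior(likelihood, uniform_prior)
post_informative = posterior(likelihood, informative_prior)
```

Under the uniform prior the posterior reflects the genetic data alone, which is why equal priors make the posterior easier to read, while the informative prior shifts the result toward states that are more plausible given timing.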

15
Assessing the impact of absence of coordination in malaria intervention strategies: a modelling study

Iggidr, Y.; Ruktanonchai, N. W.; Benhana, B.; Turbe, V.; Bauzile, B.; Ward, A.; Cohen, J.; Pothin, E.; Champagne, C.

2026-06-05 epidemiology 10.64898/2026.06.03.26354857 medRxiv
Top 1%
0.3%

Malaria control programs are increasingly tailored at subnational scales; however, neighboring areas remain connected through human mobility, allowing parasite importation that may undermine independently timed interventions. Although the spatial targeting of control has been studied extensively, the epidemiological consequences of temporal misalignment in intervention deployment across interconnected regions remain unclear. We investigate how asynchronous timing of malaria interventions affects transmission dynamics using a two-patch susceptible-infected-susceptible metapopulation model. We compare synchronous and asynchronous intervention schedules and quantify the excess cumulative incidence attributable to asynchrony using a measure we call Asynchrony Induced Growth (AIG). Across 10,000 parameter combinations, asynchronous implementation resulted in higher incidence than synchronized deployment, though the impact is negligible in most endemic settings. Sensitivity analyses indicate that the impact is largest when interventions are highly effective, infectious duration is brief, and transmission intensity approaches the elimination threshold. In such circumstances, asynchrony can substantially inflate case numbers, delay transmission interruption, or even prevent elimination entirely. In illustrative scenarios that reflect realistic settings, synchronizing interventions averts large numbers of infections and shortens elimination timelines by years to decades. These findings demonstrate that, beyond spatial targeting, temporal coordination of interventions across connected areas can meaningfully enhance malaria control and elimination. Coordinated timing may be particularly valuable for cross-border or near-elimination programs and should be considered in operational planning and resource allocation.
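
The comparison described in the abstract above can be sketched with a toy two-patch SIS model. This is a minimal illustration under stated assumptions, not the authors' model or their AIG definition: the coupling form, the parameter values, the function name `simulate`, and the simple difference in cumulative incidence used here are all hypothetical.

```python
def simulate(start_days, beta=0.25, gamma=0.2, coupling=0.05,
             efficacy=0.6, duration=180, days=1460, dt=0.1, i0=0.2):
    """Cumulative per-capita incidence summed over two coupled SIS patches.

    start_days gives the day each patch's intervention begins. While active,
    the intervention scales that patch's transmission rate by (1 - efficacy).
    Integration uses a fixed-step forward Euler scheme.
    """
    prevalence = [i0, i0]
    cumulative = 0.0
    for step in range(int(days / dt)):
        t = step * dt
        # Transmission rate per patch, reduced while its intervention runs.
        rates = [beta * (1 - efficacy) if s <= t < s + duration else beta
                 for s in start_days]
        updated = []
        for p in (0, 1):
            q = 1 - p
            # Force of infection mixes local and neighbouring prevalence.
            foi = rates[p] * ((1 - coupling) * prevalence[p]
                              + coupling * prevalence[q])
            incidence = foi * (1 - prevalence[p])
            cumulative += incidence * dt
            updated.append(prevalence[p]
                           + (incidence - gamma * prevalence[p]) * dt)
        prevalence = updated
    return cumulative


synchronous = simulate((0, 0))      # both patches intervene on day 0
asynchronous = simulate((0, 180))   # patch 1 starts as patch 0 finishes
excess = asynchronous - synchronous
```

In this sketch, the untreated patch keeps reseeding the treated one during each intervention window, so prevalence never falls as far as it does when both patches are suppressed together.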

16
A wealth index based on two-component polychoric principal component analysis reduces urban bias and improves socioeconomic classification in low- and middle-income country surveys: a validation study using LSMS surveys

Vidaletti, L. P.; Dos Santos, A. M.; Hellwig, F.; Barros, A. J. D.

2026-06-08 epidemiology 10.64898/2026.06.01.26354245 medRxiv
Top 1%
0.3%

Background: The traditional wealth index, based on principal component analysis (PCA), used in the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS), suffers from urban bias, distorting estimates of health inequality. We assessed whether an alternative two-component polychoric PCA index (POLY2) provides more accurate measures of socioeconomic position (SEP) than the traditional index (PEAR1) for equitable policy targeting. Methods: We compared PEAR1 with POLY2 using 12 LSMS (Living Standards Measurement Study) surveys (2015-2022) from 12 African countries. Annual household consumption expenditure was the gold standard. We assessed agreement using weighted Cohen's kappa and validated against education (proportion of households with secondary or higher education) using the concentration index (CIX) and slope index of inequality (SII). Results: The POLY2 index showed higher agreement with expenditure quintiles (average national weighted kappa = 43.3%) than the PEAR1 index (35.1%), with notable improvements in urban (43.5% vs. 27.5%) and rural (35.3% vs. 22.4%) areas. POLY2 also attenuated extreme household distributions observed in PEAR1. Education validation showed that POLY2 produced intermediate inequality gradients between the flatter expenditure-based gradient and the steeper PEAR1-based gradient. Conclusion: The POLY2 wealth index is superior to the traditional index, reducing urban-rural bias and providing more accurate socioeconomic classifications. Its adoption in large-scale surveys such as DHS and MICS is recommended to improve equitable monitoring of health inequalities in low- and middle-income countries.
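
The agreement statistic named in the abstract above, weighted Cohen's kappa between two quintile classifications, can be sketched as follows. This is a minimal illustration that assumes linear disagreement weights; the abstract does not say which weighting scheme was used, and the function name and the example quintile lists are hypothetical.

```python
def weighted_kappa(a, b, k=5):
    """Linear-weighted Cohen's kappa for two ratings on categories 1..k."""
    n = len(a)
    # Observed joint proportions of (rating a, rating b).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - 1][y - 1] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight grows linearly with the distance between categories.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    d_observed = sum(weight(i, j) * observed[i][j]
                     for i in range(k) for j in range(k))
    d_expected = sum(weight(i, j) * row[i] * col[j]
                     for i in range(k) for j in range(k))
    return 1 - d_observed / d_expected


# Hypothetical wealth-index and expenditure quintiles for ten households.
index_quintile = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
spend_quintile = [1, 2, 2, 3, 3, 3, 4, 5, 5, 4]
kappa = weighted_kappa(index_quintile, spend_quintile)
```

Weighting matters for quintiles because placing a household one quintile away from its expenditure quintile is a smaller error than placing it four quintiles away; unweighted kappa would treat both as equal disagreements.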

17
Within-household transmission risk of pulmonary tuberculosis in the era of universal antiretroviral therapy

Khan, P. Y.; Govender, I.; McCreesh, N.; Sithole, M.; Mkwanzai, E.; Sweeney, S.; Ording-Jespersen, G.; Wong, E. B.; Hanekom, W.; Houben, R. M. G. J.; White, R. G. M. G. J.; Smit, T.; Smith, M. J.; Fielding, K.; Grant, A. D.

2026-06-09 epidemiology 10.64898/2026.06.01.26354571 medRxiv
Top 1%
0.3%

Background: Tuberculosis remains the leading infectious cause of death worldwide. In the WHO African region, declining incidence has coincided with antiretroviral therapy (ART) scale-up, though whether this reflects reduced progression to disease or reduced transmission is unclear. We evaluated how ART and symptom status influence within-household Mycobacterium tuberculosis complex (MTBC) transmission risk. Methods: We conducted a case-contact household study in rural South Africa, enrolling index adults with bacteriologically confirmed pulmonary tuberculosis. MTBC immunoreactivity was measured in all child household contacts (aged 2-14 years) as a proxy measure of within-household transmission. We assessed the influence of index person ART status and symptom status, and explored effect-measure modification of the association between index person HIV status and transmission risk by sex. Results: Among 755 child contacts of 296 index persons, effective ART was not associated with within-household MTBC transmission risk (risk ratio [RR], 1.07; 95% CI, 0.66-1.74). Among people living with HIV (PLHIV) engaged in ART care, WHO TB four-symptom screen (WHO4SS) status was not associated with transmission risk (RR, 0.80; 95% CI, 0.43-1.47), although absence of reported cough reduced risk (RR, 0.61; 95% CI, 0.38-0.96). A pronounced interaction between sex and HIV status was observed: HIV-negative women had the highest within-household MTBC transmission risk (30.5% vs. 14.3% in women with HIV) whereas risks were similar between HIV-positive and HIV-negative men. Conclusions: We found no evidence that effective ART or WHO4SS status influenced within-household MTBC transmission risk, though confidence intervals were wide. Absence of reported cough was associated with lower risk, and transmission risk was highest among child contacts of HIV-negative women. These findings suggest reported cough is a useful marker of transmission risk and that routine tuberculosis screening within ART care may reduce transmission from PLHIV; intensified efforts are nonetheless needed to achieve earlier tuberculosis detection in HIV-negative individuals.

18
Mortality in people with attention-deficit/hyperactivity disorder (ADHD): Examining how risk is embodied in a pooling of two prospective cohort studies

Li, H.; Ford, T.; Warrier, V.; Bell, S.; Batty, G. D.

2026-06-09 epidemiology 10.64898/2026.06.08.26355148 medRxiv
Top 1%
0.3%

Background. Nascent findings suggest that people with attention-deficit/hyperactivity disorder (ADHD) experience higher rates of mortality. To date, study samples have been insufficiently well-characterized to examine the mechanisms via which this neurodevelopmental condition elevates mortality risk. Methods. We used data from the 2007 and 2011 waves of the US National Health Interview Survey, a general population-based cohort study comprising 52097 adults (28675 women) aged 18 years or older at baseline. ADHD diagnosis and an array of demographic, socioeconomic, lifestyle, and co-morbidity (somatic and psychiatric) covariates were self-reported. Findings. At baseline, compared with unaffected individuals, participants with ADHD were more likely to be socioeconomically disadvantaged, smoke cigarettes, consume alcohol, and report symptoms of psychological distress. A median 7.75 years of mortality surveillance (range: 7.25-12.25) gave rise to 6597 deaths from all causes. After adjustment for age, sex, ethnicity, and survey year, ADHD was associated with a markedly elevated risk of death (hazard ratio [95% confidence interval]: 1.58 [1.20-2.09]). Statistical adjustment for socioeconomic circumstances (11% attenuation), physical co-morbidities (15%), and lifestyle factors (17%) had only a modest impact on the ADHD-death gradient, with the greatest explanatory power apparent for symptoms of depression and anxiety (58%). The magnitude of the association of ADHD with mortality was commensurate with that for several well-established risk factors such as poverty (1.66 [1.55-1.78]), hypertension (1.41 [1.32-1.51]), and diabetes (1.71 [1.59-1.85]) but somewhat lower than cigarette smoking (2.51 [2.29-2.76]) after controlling for age, sex, ethnicity, and survey year. Associations between ADHD and cause-specific mortality from cardiovascular disease, cancer, and chronic respiratory disease were inconclusive. Interpretation. In the present study, the influence of ADHD on total mortality appears to be largely embodied via a series of malleable characteristics, particularly mental illness. If confirmed elsewhere, these results raise the possibility that risk factor modification via standard pharmacological and behavioral interventions could help reduce rates of premature mortality in this patient group. Funding. This paper received no direct funding. GDB is supported by the UK Medical Research Council (MR/P023444/1) and the US National Institute on Aging (1R56AG052519-01, 1R01AG052519-01A1).
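
The attenuation percentages quoted in the abstract above can be read through one commonly used formula, attenuation = 100 × (HR_base − HR_adjusted) / (HR_base − 1). The abstract does not state which formula was applied, so the sketch below is an assumption, and the adjusted hazard ratio it produces is an implied value under that assumption, not a figure reported by the authors. The function names are hypothetical.

```python
def percent_attenuation(hr_base, hr_adjusted):
    """Share of the excess hazard (HR - 1) removed by further adjustment."""
    return 100 * (hr_base - hr_adjusted) / (hr_base - 1)


def implied_adjusted_hr(hr_base, attenuation_pct):
    """Invert the formula: adjusted HR implied by a given attenuation."""
    return 1 + (hr_base - 1) * (1 - attenuation_pct / 100)


# Base HR of 1.58 and 58% attenuation are taken from the abstract;
# the result is only what this assumed formula would imply.
hr_after_mental_health = implied_adjusted_hr(1.58, 58)
```

On this reading, attenuation is measured against the excess hazard above 1 rather than against the hazard ratio itself, so a 58% attenuation removes a little over half of the 0.58 excess.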

19
KESOZI Digital Twin: Physics-Informed Neural Network for Independent Estimation and Prediction of Childhood Diarrheal Disease Burden in Kenya, Somaliland, and Zimbabwe

KESOZI Digital Twin; Agumba, J. O.; Namusonge, L.; Ogendo, J.; Hassan, M. A.; Pembere, A.; Takavarasha, M.

2026-06-04 epidemiology 10.64898/2026.06.03.26354823 medRxiv
Top 1%
0.2%

Childhood diarrheal disease remains a leading cause of morbidity and mortality among children under five years in sub-Saharan Africa, particularly in settings affected by inadequate sanitation, climate variability, malnutrition, and limited healthcare access. Conventional forecasting approaches are often constrained by sparse surveillance data, weak spatial representation, and limited incorporation of mechanistic disease dynamics. This study presents a Physics-Informed Multimodal Artificial Intelligence Digital Twin framework that integrates Physics-Informed Neural Networks, Graph Neural Networks, diffusion-reaction epidemiological modeling, multimodal fusion learning, and Digital Twin simulation to estimate and predict childhood diarrheal disease burden in Kenya, Somaliland, and Zimbabwe. Using public epidemiological, environmental, climate, sanitation, and synthetic proof-of-concept datasets, the framework modeled temporal disease dynamics, spatial transmission, pathogen-attributed burden, and outbreak trajectories while enforcing epidemiological consistency through physics-informed optimization. Results demonstrated robust forecasting performance, enhanced spatial transmission modeling, uncertainty-aware predictions, and realistic outbreak simulations across the three countries. Rotavirus, Shigella, and Cryptosporidium were identified as major contributors to modeled mortality burden, while unsafe water exposure, poor sanitation, malnutrition, and climate-sensitive transmission substantially increased disease risk. Compared with a Bayesian baseline model, the multimodal framework achieved superior nonlinear risk characterization, geospatial learning, and temporal prediction. These findings highlight the potential of scientific machine learning and digital twin systems for infectious disease surveillance, outbreak forecasting, climate-health analytics, and evidence-based public health decision-making in low-resource African settings. 
Keywords: Physics-Informed Neural Networks, Graph Neural Networks, Digital Twin, Childhood Diarrheal Disease, Epidemiology, Kenya, Somaliland, Zimbabwe, Scientific Machine Learning, Spatial Epidemiology, Multimodal Fusion

20
Comparison of the Mini Parasep SF, ParaPak SpinCon, and Paradevice fecal filtration and concentration devices for microscopic and AI-assisted detection of intestinal parasites

Morris, H.; Pritt, B. S.

2026-06-04 infectious diseases 10.64898/2026.06.02.26354769 medRxiv
Top 1%
0.2%

Effective filtration and concentration of stool specimens is an essential pre-analytical step for reducing fecal debris and improving organism recovery using microscopy-based ova and parasite (O&P) examination. This study evaluated three commercially available fecal sedimentation-based filtration/concentration systems, ParaPak SpinCon (Meridian Bioscience), Mini Parasep SF (Apacor), and the newly available Paradevice (Reingenuity), for qualitative parasite detection and workflow logistics using conventional and artificial intelligence (AI)-assisted microscopy. Forty clinical stool specimens (20 parasite-positive and 20 parasite-negative) were processed with the 3 devices, and the resultant 120 wet mount and 120 trichrome stained smear preparations were examined using conventional microscopy. Trichrome-stained slides were also scanned at 40x magnification using a Hamamatsu NanoZoomer S360 flatbed digital slide scanner and images were analyzed using the Techcyte Fusion Human Fecal Trichrome AI algorithm. Positive and indeterminate digital findings were confirmed by conventional glass slide microscopy. Slides and digital images were reviewed in a blinded manner. Concordance was assessed among the 360 initial evaluations (microscopy and AI-assisted), and discrepant parasitology results were resolved through re-review and specimen reprocessing as needed. Final qualitative agreement across slide/image evaluations using all three concentration systems was 100%. Minor discrepancies in protozoan and white/red blood cell detection/identification were noted in 5 and 7 cases, respectively, and likely reflected sampling and observer variability. While the three concentration systems produced equivalent qualitative results, the Paradevice and Mini Parasep SF offered the most streamlined workflows. These findings support the Paradevice and Mini Parasep SF as efficient, analytically equivalent systems that are compatible with traditional and AI-assisted O&P workflows.