Mathematics

MDPI AG

Preprints posted in the last 30 days, ranked by how well they match Mathematics's content profile, based on 11 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Identification of a Fractional Model for an Outbreak of the Dengue Fever

Cresson, J.; Pere, M.; Szafranska, A.

2026-05-27 epidemiology 10.64898/2026.05.26.26354120 medRxiv
Top 0.1%
2.0%

This work focuses on the global and partial identification problem for fractional differential equations. We provide a general numerical procedure based on global and local optimization algorithms with two refinements for biological systems that ensure solution positivity and homogeneous parameter units. The method is applied to a new fractional model of Dengue outbreak called the Fractional Homogeneous Nishiura (FHN) model, calibrated using data of newly infected people in Cape Verde. We show that our identification method yields a better fit between data and model solutions than previous approaches and that our FHN model captures the dynamics of Dengue more closely than existing systems.

2
Physics-Informed Neural Networks for Parameter Recovery in the Repressilator Oscillatory Model

Casajuana, B.; Casals-Franch, R.; Lopez Garcia de Lomana, A.; Marti-Puig, P.; Villa-Freixa, J.

2026-05-15 bioinformatics 10.64898/2026.05.12.724679 medRxiv
Top 0.2%
1.3%

Parameter estimation in nonlinear biological dynamical systems is a difficult inverse problem because the governing equations are often stiff or oscillatory, the data are sparse and noisy, and the objective landscape is non-convex. Physics-informed neural networks (PINNs) offer an alternative to purely simulation-based calibration by representing state trajectories with neural networks while penalizing violations of the governing equations. This paper studies the empirical reliability of PINNs for recovering the parameters of the repressilator, a synthetic genetic oscillator formed by three cyclically repressive genes. We use synthetic time-series generated from the standard ordinary differential equation model and train inverse PINNs to estimate the production parameter β and the Hill coefficient n. The study varies observation noise, partial observation of repressors, sampling density, sensitivity to initial parameter guesses, and the difference between stable and oscillatory regimes. The results show that PINNs can reconstruct trajectories accurately when the model structure is correct and the three repressors are observed, but parameter recovery is more fragile than trajectory fitting. Noise, sparse sampling, unobserved variables, and unfavorable initial guesses increase the risk of biased estimates. The stable regime is easier to reconstruct, whereas the oscillatory regime provides richer information but also exposes optimization sensitivity. These findings support PINNs as a useful reverse-engineering tool for small gene-regulatory ODE models, while highlighting the need for repeated runs, uncertainty reporting, and experimental designs that improve identifiability.

3
Denoised MDS-UPDRS Part-III Scores Yield New Patterns of Progression Heterogeneity in Early Stage Parkinson's Disease

Koss, J.; Tinaz, S.; Tagare, H.

2026-05-08 bioinformatics 10.64898/2026.05.04.722810 medRxiv
Top 0.5%
0.7%

Parkinson's Disease (PD) Motor Scores (MDS-UPDRS Part III) are quite noisy. This paper proposes a new methodology for processing these scores by first denoising the scores to enhance the underlying progression signal, and then conducting a high-dimensional analysis which does not sum the scores into a total movement score. The analysis gives novel insights into PD progression heterogeneity: it reveals that the heterogeneity is continuously variable rather than clustered into "subtypes" and that the variability is along two easily understood axes. This analysis also resolves some of the discrepancies in previously reported progression subtypes. Finally, the analysis reveals that patient-specific progression cannot be predicted from baseline using only MDS-UPDRS Part III scores.

4
From naive to foundation: benchmarking models for epidemic forecasting

Wang, D.; Li, Y.; Perra, N.

2026-05-13 epidemiology 10.64898/2026.05.11.26352889 medRxiv
Top 0.5%
0.7%

We systematically evaluate and compare the performance of classical statistical methods (ARIMA), mechanistic compartmental models (SEIR), modern deep learning architectures (LSTM, DLinear, Autoformer), and an emerging time-series foundation model (TabPFN-TS) to forecast the incidence of Influenza-Like Illness (ILI) across nine European countries. The models are benchmarked against a naive baseline and a multi-model ensemble (RespiCast) created by an initiative of the ECDC. In line with the operational practice of existing forecasting hubs, our entire evaluation is explicitly optimized for short-term horizons (1 to 4 weeks ahead). Interestingly, we found that the foundation model TabPFN-TS allows for great zero-shot inference capabilities. Without any task-specific retraining, it successfully overcomes extreme data scarcity to consistently outperform all other individual architectures, frequently rivalling or surpassing the RespiCast ensemble. Our results highlight how deep learning architectures are severely constrained by extreme data scarcity, typical in epidemic forecasting, requiring targeted endogenous data augmentation to reduce predictive errors. Within the deep learning class of models, we observe that simpler architectures (such as DLinear and LSTM) frequently exhibit greater robustness and outperform complex, attention-based models (such as Autoformer) when data is constrained. Finally, our results show how a weighted ensemble, constructed by fusing all the models, delivers highly robust forecasts in all regions considered. Overall, our findings showcase the transformative potential of zero-shot foundation models in epidemic forecasting and confirm the importance of multi-model ensembles.

5
Tolerance Regions For Compositional Data With Application To Reference Regions For Healthy Microbiome Profiles

Wickramasinghe, N.; Choudhary, P.

2026-05-07 microbiology 10.64898/2026.05.06.723285 medRxiv
Top 0.5%
0.7%

Imbalances in the human microbiome are associated with numerous diseases, highlighting the need for benchmarks that define healthy microbiome composition and identify abnormal deviations. Although the microbiome is increasingly studied as a potential clinical marker, statistical approaches for constructing reference regions of healthy microbiome composition remain relatively underexplored. This work develops statistical methods to construct reference regions for healthy microbiome data, addressing three main challenges. First, since microbiome data contain relative rather than absolute information, standard statistical methods are not directly appropriate. Therefore, microbiome profiles are treated as compositional data satisfying a sum constraint, and log-ratio transformations are used to analyze them in real space while preserving their relative structure. Second, reference regions are constructed as tolerance regions rather than confidence regions, so that they cover a pre-specified proportion of the healthy population with a given confidence level. The proposed framework incorporates both parametric and nonparametric approaches for constructing these tolerance regions. Parametric methods are considered when the ilr-transformed data approximately follow an elliptical distribution, where they can yield smaller regions while maintaining the desired coverage. Nonparametric approaches provide a flexible alternative by avoiding distributional assumptions. Third, because microbiome data are multidimensional and difficult to interpret, quantitative and graphical tools are introduced to assess atypicality and identify which microbial taxa contribute most to deviations from healthy profiles. Simulation studies are conducted to evaluate the performance of the proposed methods. The methodology is then demonstrated by constructing reference regions for healthy microbiome profiles using real-world data.
Finally, the approach is applied to microbiome datasets comparing healthy and patient profiles to assess whether patient samples are identified as atypical and to examine which taxa contribute to these deviations. Overall, the proposed framework provides a clear and statistically robust approach for defining healthy microbiome reference regions and detecting atypical microbiome profiles.

6
Automatic Bevacizumab Response Prediction in Ovarian Cancer from Digital Pathology Images via Novel AI-based Computational Pipeline

Alsaiari, A.; Turki, T.; Taguchi, Y.-h.

2026-05-04 bioinformatics 10.64898/2026.04.29.721782 medRxiv
Top 0.6%
0.5%

Ovarian cancer is one of the gynecological cancer types, which, if metastasized and not detected early, can cause deaths among women. Therefore, there is a need to accurately predict drug responses to ovarian cancer. A gynecological pathologist inspects abnormality in tissues, followed by providing a report about patients; however, such a diagnostic process is (1) hard; (2) dependent on experience; and (3) time consuming. Moreover, existing tools are far from perfect. Hence, we present a computational pipeline to improve predicting drug response pertaining to ovarian cancer, derived as follows. First, we download digital pathology images pertaining to ovarian bevacizumab response from the cancer imaging archive repository. We apply histogram of oriented gradients to the images, constructing feature vectors that are provided to Fisher linear discriminant analysis to change the representation through dimensionality reduction. Then, we provide the reduced-dimensionality data for regression analysis through support vector regression coupled with various kernels and calculate the area under the ROC curve (AUC). Experimental results against transformer-based models (ViT and Swin) and other deep learning (DL) models (VGG16, ResNet50, InceptionV3, MobileNetV2, and EfficientNetB6) demonstrate that our approach with radial kernel (named SVRD+R) yielded an AUC performance improvement of 17% against the best-performing transformer-based model (ViT) while obtaining an AUC performance improvement of 14.9% when compared against the best DL-based model (MobileNetV2). These results demonstrate the superiority and feasibility of our AI-based pipeline when tackling prediction problems pertaining to gynecologic cancer studies. MSC: 92B05; 68T09

7
An Interpretable Multimodal Framework for Student Mental Health Risk Assessment Using Temporal Embeddings and Fuzzy Inference

Shah, A.; Mehta, A.; Bhensdadia, C. K.

2026-05-20 health informatics 10.64898/2026.05.16.26352630 medRxiv
Top 0.6%
0.5%

Mental health challenges among university students have increased due to academic pressure, lifestyle changes, and continuous digital engagement. Existing approaches for mental health assessment often rely either on self-reported psychological scales or isolated behavioral indicators, limiting their ability to capture complex temporal and contextual patterns. This study proposes an interpretable multimodal framework for student mental health risk assessment using behavioral sensing, academic information, ecological momentary assessments (EMA), and psychometric survey data. A bidirectional Long Short-Term Memory autoencoder is employed to learn latent temporal representations from day-level behavioral sequences, while graph embeddings capture structural relationships among students using similarity-based neighborhood graphs. These representations are fused with academic and survey-derived features and reduced using Principal Component Analysis and Uniform Manifold Approximation and Projection. K-means clustering is then applied to identify behaviorally distinct student groups. Experimental analysis on the StudentLife dataset demonstrates meaningful clustering performance with a Silhouette Score of 0.4209 and Adjusted Rand Index stability of 0.6869. The identified clusters correspond to low-risk, moderate-risk, and high-risk behavioral profiles. To improve interpretability and practical usability, a fuzzy inference system is introduced to compute mental risk, academic risk, and wellbeing indices using psychometric indicators including PHQ-9, PSS, PANAS, VR-12, and Big Five personality traits. The results demonstrate the potential of combining multimodal behavioral modeling with interpretable fuzzy reasoning to support early mental health risk assessment in educational settings.

8
Modeling the Impact of Exposed Cases in a Hantavirus Outbreak on a Cruise Ship

Cui, J.

2026-05-12 epidemiology 10.64898/2026.05.08.26352718 medRxiv
Top 0.8%
0.4%

The emergence of a hantavirus variant aboard a commercial cruise ship presents a significant public health concern. This study develops a discrete-time stochastic Susceptible-Exposed-Infectious-Recovered-Dead model to estimate transmission dynamics, hidden exposed infections, and outbreak risk among passengers and crew. Epidemiological parameters and latent disease states were inferred using an Ensemble Adjustment Kalman Filter calibrated to reported case data from WHO and ECDC situation reports. The estimated basic reproduction number was 2.76, with a 95% confidence interval of 2.52-2.99, indicating substantial potential for sustained onboard transmission before strict quarantine measures. Simulations further suggest that several exposed individuals may remain unidentified during the early outbreak phase, creating a hidden reservoir that symptom-based surveillance alone may fail to detect. These findings highlight the importance of rapid surveillance, widespread testing, targeted quarantine, and active monitoring of exposed individuals in confined travel settings. The proposed modeling framework can support timely outbreak assessment and intervention planning for infectious-disease events in similarly dense and spatially constrained populations.

9
Modelling between-cell heterogeneity in within-host influenza virus infection

Yan, A. W. C.; Riley, S.; McCaw, J. M.

2026-05-18 microbiology 10.64898/2026.05.17.725795 medRxiv
Top 0.8%
0.4%

Cell tropism, or the preference of a virus for particular cell types, has major implications for viral transmission, pathogenesis, and evolution. An increase in viral fitness -- increased within-host replication, also leading to increased transmission between hosts -- can result from a virus changing its cell tropism. This is illustrated in the context of influenza, where adaptation to infect cells expressing α2-6 linked sialic acid receptors enhances human-to-human transmissibility. Target cell populations differ not only in abundance but also in intrinsic properties such as susceptibility, viral production, and interferon responses, rendering the relationship between tropism and viral fitness multi-faceted and complex. Understanding how different cell tropisms quantitatively change fitness remains an important open question in virology and quantitative biology. Here, we present a within-host mathematical model that incorporates distinct target cell types differing in key properties, and examine how cell tropism affects viral fitness, as measured by metrics such as peak viral load, infection duration, or total virus produced. Our analysis reveals that tradeoffs may arise when cell types differ by multiple characteristics. We further demonstrate that model parameters describing heterogeneity between cell types can be more accurately inferred when cell type proportions are measured alongside viral load. Our findings provide a framework for assessing the links between viral evolution, cell tropism, and within-host fitness, and motivate the design of experiments to collect quantitative data on between-cell heterogeneity.

10
Geometric Kinematics of Human Eyes

Turski, J.

2026-05-10 neuroscience 10.64898/2026.04.10.716809 medRxiv
Top 0.8%
0.3%

In previous studies by the author on binocular vision with the asymmetric eye (AE), which models a healthy human eye with misaligned optical components, the results were primarily presented in the Rodrigues vector (RV) framework and supported by simulations and 3D visualizations in GeoGebra's dynamic geometry environment. In this paper, the novel geometric kinematics of the human eye, that is, the eye with misaligned optics, and simplified assumptions about the eye rotations (the eye's translational movements are disregarded), are developed within the framework of rigid-body rotations. The originality of the analysis lies in a precise geometric decomposition of a full rotation of the eye's posture into a torsion-free rotation (the geodesic part) and a torsional rotation (the non-geodesic extension of the geodesic part). This decomposition is extended to the corresponding decomposition of the angular velocity. A novel derivation of the eye's angular velocity from the RV formulation of the eye kinematics is proposed.

11
Enhancing dengue diagnosis and surveillance by integrating machine learning technologies with the NS1 rapid test kit

Hwang, C.-K.; Chen, Y.-W.; WANG, Y.-T.; Ho, T.-S.; Oyang, Y.-J.

2026-05-06 health informatics 10.64898/2026.05.05.26352445 medRxiv
Top 0.8%
0.3%

Background: Dengue has been a major health threat globally in recent years. In particular, dengue incidence continues to increase annually and the epidemic area has expanded primarily due to global warming. Therefore, effective case detection and surveillance strategies are crucial to tackle this global health challenge. In clinical practice, the rapid test kit detecting dengue non-structural protein 1 antigen, commonly referred to as NS1, is widely employed for early diagnosis. However, real-world studies revealed that the sensitivity of the NS1 test kit ranged from approximately 61% to 95%. Since early diagnosis is critical for disease surveillance in the early stage of a dengue epidemic, scientists have been working hard to develop novel diagnosis methods that can provide higher sensitivity levels. Methodology/Principal Findings: In response to this challenge, in this study, we have developed a novel diagnosis procedure that integrates machine learning technologies with the NS1 test kit. Our experimental results revealed that we would be able to raise the sensitivity of the dengue diagnosis procedure to higher than 99% by incorporating machine learning based prediction models to screen the suspected patients with a negative NS1 result. Furthermore, the relative risks between the suspected patients who were predicted to be positive and those who were predicted to be negative exceeded 4.8. Conclusions/Significance: These results illustrate that the proposed approach provides an effective and efficient diagnosis procedure to address the global health challenge caused by the spread of dengue. Author Summary: This study has aimed to enhance surveillance of the dengue disease by integrating machine learning technologies with the rapid test kit commonly employed in early diagnosis. In clinical practice, the NS1 rapid test kit is widely employed for early diagnosis.
However, real-world studies revealed that a certain percentage of the patients with a negative NS1 test result, ranging from 5% to 39%, were actually infected by dengue. Since early diagnosis is critical for disease control in the early stage of a dengue epidemic, scientists have been working hard to tackle this challenge. Based on this observation, this study was launched to investigate the effects of incorporating machine learning based prediction models to further screen those patients with a negative NS1 test result. The experimental results revealed that the proposed approach was able to identify over 99% of the patients who were infected by the dengue disease. Furthermore, the risk of the suspected patients who were predicted to be positive was 4.8 times higher than the risk of those who were predicted to be negative. The experimental results illustrate that the proposed approach provides an effective and efficient diagnosis procedure to enhance surveillance of the dengue disease.

12
A comparison of scalable approaches for the pairwise analysis of large pathogen genomic and spatial datasets: an application to studying Mycobacterium tuberculosis transmission

Lan, Y.; Wu, C.-Y.; Lin, H.-H.; Cohen, T.; Warren, J. L.

2026-05-21 microbiology 10.64898/2026.05.21.726848 medRxiv
Top 0.9%
0.3%

Pairwise analysis of genomic and spatial data offers opportunities to identify and estimate the associations between covariates and the transmission of pathogens between individuals. However, such pairwise analyses are computationally intensive, and may not be feasible to conduct given the high dyad count in even moderately sized datasets. Here we compare two approaches to increase the efficiency of pairwise analysis for large datasets. We quantify and compare the performance of divide-and-conquer Bayesian model fitting and pairwise case-control approaches for estimating associations between individual- and pair-level covariates and shared membership in a transmission cluster. We utilize a large dataset (n=4,154) of spatially-referenced, genomically-sequenced Mycobacterium tuberculosis isolates collected from a single city for this analysis. We find that the case-control approach produces unbiased estimates of effect sizes with expected credible interval coverage and is more robust than the divide-and-conquer method when effect sizes are large. Thus, we recommend using the case-control approach with at least three controls per case to downscale datasets for pairwise analysis when analysis of the entire dataset is not possible. This approach mitigates the computational challenges of pairwise Bayesian modeling on datasets that require significant computational resources while maintaining desired inferential properties. Author Summary: Pairwise analyses of large datasets to study pathogen transmission are computationally demanding because they typically require simultaneous analysis of each possible pair of individuals in a dataset; as datasets become larger these analyses often are not feasible to conduct even with access to high-performance computing resources. In this work, we compare a case-control approach and divide-and-conquer approaches for more efficient pairwise analysis of large datasets.
Using a large dataset of Mycobacterium tuberculosis isolates including genetic and spatial data, we investigate the performance of each method for estimating the associations between host covariates and genetic clustering of isolates. We find that the case-control approach is generally preferred over methods which first divide the data into subsets and then combine results. While additional extensions of these analyses are needed to test the generality of these findings to other data settings, this work provides a practical way forward for the pairwise analysis of large datasets to study pathogen transmission.

13
Pixel-Based Skin Tone Estimation on Dermoscopy: A Dual-Rater MST Benchmark and Feasibility Study

Kumarasinghe, A.; Bui, V.; Ghanbarzadeh, R.

2026-05-17 health informatics 10.64898/2026.05.13.26353004 medRxiv
Top 0.9%
0.3%

Skin-tone labels are absent from public dermoscopy benchmarks such as the International Skin Imaging Collaboration (ISIC), making it impossible to audit whether clinical AI performs equitably across skin tones. While several recent works estimate skin tone automatically from clinical photography and selfies, we ask whether this approach is feasible on dermoscopy, the primary imaging modality of these benchmarks. To answer this, we make three main contributions. First, we release MST-Derm, a dual-rater Monk Skin Tone (MST) annotation benchmark on 500 ISIC 2018 images. Raters were given an explicit unrateable option for crops where the skin surrounding the lesion was too occluded to label confidently. We find that 60% of images were marked unrateable, yielding a 193-image consensus subset (quadratic-weighted Cohen's Kappa = 0.82). Second, we conduct a systematic feasibility study of three pixel-based MST annotation pipelines spanning the principal families in prior work: palette matching in perceptual colour space, robust colour statistics, and projection to a 1D colorimetric scalar. All three pipelines produce ordinal signal above chance (95% confidence intervals on quadratic-weighted Kappa exclude zero). However, ISIC 2018's extreme light-skin bias leaves 82% of the evaluation set at MST 2, giving a constant "always predict MST 2" baseline an accuracy floor the methods cannot overcome. To separate algorithmic signal from dataset bias, we evaluate on a class-balanced subset. The best method reaches quadratic-weighted Kappa = 0.43 against the trivial baseline of Kappa = 0.00, confirming the signal is genuine. Third, we diagnose this performance ceiling. We trace the bottleneck to two causes: dermoscopy's specialised illumination physically compresses the colour range on which lighter skin tones differ, and ISIC's dataset skew makes standard absolute-accuracy metrics uninformative. 
We conclude that while pixel-based colour features carry real MST signal on dermoscopy, current performance is insufficient for autonomous annotation. We release the benchmark, annotation protocol, all prediction runs, and analysis code to facilitate the development of robust skin-tone estimators, a vital prerequisite for accurately auditing fairness and mitigating bias in dermatological machine learning.

14
A Consensus-Driven Stacking Ensemble Framework for Interpretable Cardiovascular Risk Prediction and Clinical Deployment

Sozol, S. S.; Dev Nath, B. C.; Fahim, F. M. S.; Suzana, N. N.; Mirza, J. F.; Ahmmed, S.; Zohra, F.-T.; Zafr, A. H. A.; Uddin, M. N.; Mondal, M. R. H.; Hoque, A. S. M. L.

2026-05-26 health informatics 10.64898/2026.05.18.26352989 medRxiv
Top 0.9%
0.3%

Machine learning (ML) is being considered to help diagnose cardiovascular diseases (CVD). Still, challenges like inconsistent and limited datasets, limited infrastructure, and global inequalities lead to the need for a reliable and practicable ML solution. This paper presents an ML-driven framework for predicting CVD risk scores and classifying status. Several data preprocessing techniques, including multiple imputation by chained equations (MICE) and outlier removal, are applied. In addition, hyperparameter tuning is performed with the GridSearchCV tuning technique. Moreover, a consensus-driven five-feature selection method is applied to identify optimal predictors. The dataset used in this study contains healthcare records related to future CVD risk scores, comprising 1,529 patient records with 22 features. The optimized stacked ensemble model is applied to the dataset and achieves a cross-validated coefficient of determination value of 98.13% for CVD risk score regression. Comparative evaluation with other ML models confirmed improved accuracy, efficiency, and interpretability. The explainable AI technique SHAP is applied to interpret predictions and highlight key risk factors. Moreover, a deployment-ready web platform with multi-role access has been developed that demonstrates clinical applicability. The proposed framework offers a reliable and interpretable tool for early detection of CVD and personalized risk assessment. In the future, this work can be extended to integrate longitudinal data, medical imaging, and deep learning to improve generalizability and strengthen real-world impact.

15
Machine Learning and Explainable AI for Multi-State Classification of Malaria Transmission Dynamics in Kenya

Gogo, J. A.; Wanyonyi, M.

2026-05-12 health informatics 10.64898/2026.05.09.26352789 medRxiv
Top 0.9%
0.3%

Malaria remains a major public health challenge in sub-Saharan Africa, with pronounced spatial and temporal variation in transmission intensity that complicates effective control strategies. Accurate classification of transmission states is essential for guiding targeted interventions and strengthening early warning systems. This study develops a machine learning framework for the classification of malaria transmission states in Kenya using monthly panel data from 47 counties spanning the period 2015 to 2025. Transmission was categorised into four operationally relevant states based on incidence thresholds. Four supervised learning models, namely multinomial logistic regression, random forest, extreme gradient boosting, and support vector machine, were trained using temporally lagged features and evaluated under a forward chaining validation scheme to preserve temporal structure. Model performance was assessed using accuracy, macro averaged F1 score, Matthews correlation coefficient, and Brier score, complemented by calibration analysis. Extreme gradient boosting achieved the best overall performance, with accuracy of 0.9918, macro averaged F1 score of 0.9647, and Matthews correlation coefficient of 0.9831, alongside the lowest Brier score of 0.0031, indicating highly reliable probability estimates. Feature importance analysis revealed that lagged incidence, vegetation index, precipitation, and insecticide treated net coverage were the most influential predictors. Partial dependence analysis demonstrated nonlinear relationships and clear seasonal patterns in transmission dynamics. The findings show that machine learning approaches can accurately classify malaria transmission states while providing interpretable and well calibrated outputs for decision making. This framework offers a practical tool for supporting malaria surveillance and resource allocation. Further validation in different epidemiological settings is recommended to assess generalisability.

16
The covariance matrix of metapopulation disease models and applications to early warning signals

Looker, J.; Rock, K. S.; Dyson, L.

2026-05-12 epidemiology 10.64898/2026.05.08.26352721 medRxiv
Top 1.0%
0.3%

Infectious disease time series often show signs of epidemic transitions, such as the peaks and troughs of the time series. In these time series, key system parameters can lead to catastrophic changes in the dynamical system behaviour (often called critical transitions). Modellers have increasingly shown that early warning signals can anticipate these transitions, both critical and non-critical, in infectious disease time series. Existing methods, however, generally focus on univariate time series data, or ignore spatiotemporal patterns that may be present as a disease spreads through a population. Recent ecological literature developments expand existing temporal and spatial methods to consider the covariance matrix of multiple, related time series. However, many of these proposed signals still make an assumption of stationary time series/system equilibrium. Whilst often true in ecological modelling, disease systems are seldom at equilibrium. In this paper, we propose the usage of the eigendecomposition of the non-stationary covariance matrix as a more suitable early warning signal for epidemiological data. We first analyse the expected trends in the eigenvalues and eigenbasis of the covariance matrix on approach to a transition. Next we apply these methods to a spatially-structured susceptible-infectious-recovered model to explore how the eigenbasis may provide extra information to modellers. Finally, we test these methods on SARS-CoV-2 case data during the 2020-2021 pandemic period in England.

17
SEIR-IoT cyber-physical architecture with dual parametric coupling for epidemic scenario simulation using synthetic biomedical signals

Martinez Campo, S. D.; Campo-Ariza, F. M.; Martinez Campo, J. A.; Cormane, M.

2026-05-10 epidemiology 10.64898/2026.05.06.26352603 medRxiv
Top 1%
0.2%

This study presents a proof-of-concept cyber-physical architecture integrating a SEIR epidemiological model (Susceptible-Exposed-Infectious-Recovered), implemented in MATLAB, with a simulated Internet of Things (IoT) acquisition and transmission stage based on the ESP32 microcontroller and the ThingSpeak platform. The system generates synthetic biomedical signals of body temperature and peripheral oxygen saturation (SpO2), structured across three levels: circadian variation, scheduled pathological episodes, and Gaussian noise. These signals feed a dual parametric coupling function that dynamically updates the SEIR transmission parameter as a combined function of body temperature and oxygen saturation deviations from their clinical reference values. The proposed architecture is organized into four functional phases: measurement, communication, computational processing, and feedback. Five simulated clinical scenarios were evaluated, ranging from normal conditions (T = 36.5 {degrees}C, SpO2 = 97%) to fever with severe hypoxia (T = 38.5 {degrees}C, SpO2 = 88%), yielding basic reproduction number (R0) values between 4.20 and 5.38, and peak infected proportions between 29.9% and 35.2% of the simulated population (N = 1,000). A sensitivity analysis on the coupling coefficients, with {+/-}50% variation from nominal values, showed that the oxygen saturation coefficient is the most influential parameter on R0 (range = 0.76) compared to the thermal coefficient (range = 0.42), with monotonic and predictable behavior across the entire evaluated parametric space. The primary contribution of this work is system integration: we propose a reproducible platform connecting biomedical simulation, IoT communication, and epidemiological modeling through parametric coupling in a controlled environment. 
All data used are entirely synthetic; a retrospective calibration with real Colombian data from the first epidemic wave of 2020 confirmed the epidemiological consistency of the model, with a calibrated R0 of 1.85 and a Pearson correlation of 0.930. Results should be interpreted as evidence of architectural feasibility, not as clinical or epidemiological validation.
Author Summary: The COVID-19 pandemic made it clear that epidemiological surveillance systems need tools that combine accessible technology with mathematical models capable of anticipating disease spread. In this work, we built a proof-of-concept platform connecting three elements: a low-cost electronic sensor based on the ESP32 microcontroller, a cloud communication platform (ThingSpeak), and a mathematical model that simulates how an epidemic spreads through a population. The sensor generates synthetic data on body temperature and oxygen saturation that, through a mathematical formula we designed, dynamically modify the rate of contagion in the model. We evaluated five clinical scenarios, ranging from normal conditions to fever with severe hypoxia, and analyzed how sensitive the results are to changes in the system parameters. We found that oxygen saturation has a greater influence on the estimated contagion potential than body temperature. Although all data are synthetic, this platform demonstrates that it is possible to integrate low-cost sensors with epidemiological models in real time, opening a viable pathway for early warning systems in resource-limited settings.
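The abstract does not give the form of the dual parametric coupling, so the sketch below assumes a simple linear one purely for illustration: the transmission rate grows with fever above a reference temperature and with SpO2 below a reference saturation. The coefficients `k_temp` and `k_spo2`, the baseline rates, and the forward-Euler integrator are all hypothetical choices and are not meant to reproduce the reported R0 or peak values.

```python
def coupled_beta(beta0, temp_c, spo2, k_temp=0.05, k_spo2=0.02,
                 temp_ref=36.5, spo2_ref=97.0):
    """Hypothetical linear coupling: scale beta0 by deviations from reference values."""
    temp_dev = max(0.0, temp_c - temp_ref)   # degrees C above reference
    spo2_dev = max(0.0, spo2_ref - spo2)     # percentage points below reference
    return beta0 * (1.0 + k_temp * temp_dev + k_spo2 * spo2_dev)


def basic_reproduction_number(beta, gamma):
    """R0 for a SEIR model without births or deaths."""
    return beta / gamma


def simulate_seir(beta, sigma, gamma, n=1000.0, e0=0.0, i0=1.0,
                  days=200, dt=0.1):
    """Forward-Euler integration of a standard SEIR model; returns final state and peak I."""
    s, e, i, r = n - e0 - i0, e0, i0, 0.0
    peak_i = i
    for _ in range(int(days / dt)):
        new_exposed = beta * s * i / n      # S -> E flow
        new_infectious = sigma * e          # E -> I flow
        new_recovered = gamma * i           # I -> R flow
        s -= new_exposed * dt
        e += (new_exposed - new_infectious) * dt
        i += (new_infectious - new_recovered) * dt
        r += new_recovered * dt
        peak_i = max(peak_i, i)
    return {"S": s, "E": e, "I": i, "R": r, "peak_I": peak_i}


# Compare a normal scenario with fever plus severe hypoxia (illustrative rates).
normal = coupled_beta(0.5, 36.5, 97.0)
severe = coupled_beta(0.5, 38.5, 88.0)
for label, beta in (("normal", normal), ("severe", severe)):
    result = simulate_seir(beta, sigma=0.2, gamma=0.2)
    print(label, basic_reproduction_number(beta, 0.2), result["peak_I"])
```

Under this assumed coupling, a larger deviation raises the transmission rate and therefore both R0 and the epidemic peak, which matches the qualitative behaviour the abstract describes.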

18
Synthetic-data augmented calibration for expert-informed rare disease models

Yang, H.; Rachel, T.; Litwin, T.; Karakioulaki, M.; Reimer-Taschenbrecker, A.; Timmer, J.; Has, C.; Binder, H.; Hess, M.

2026-05-20 bioinformatics 10.64898/2026.05.18.725833 medRxiv
Top 1%
0.2%

Clinical data for rare diseases are sparse, noisy, and heterogeneous, complicating calibration of ordinary differential equation (ODE) models. Thus, we introduce a noise-robust calibration in latent space that combines expert-derived ODEs with learned latent representations. Our approach leverages synthetic ODE trajectories, augmenting our scarce observations to train a model-specific autoencoder representation and imputer. During calibration, observed and ODE-generated trajectories are compared in latent space, and ODE parameters are updated by minimizing their latent distance. In a controlled ABCDE simulation model, the imputer outperformed a carry-forward baseline for moderate parameter shifts, parameter recovery remained stable under random missingness, calibration remained robust to additional noise variables despite reduced downstream identifiability, and distinct dynamics formed visually separable latent trajectories. On a custom-developed ODE model for real Epidermolysis Bullosa patients, the calibrated phenomenological model reproduced patient-level trajectories from sparse observations. Thus, we conclude that our latent-space calibration approach supports rare-disease modeling.

19
Bridging Acoustic and Semantic Spaces for Interpretable Voice Scoring via Zero-Shot Semantic Expansion

Hsiao, C.; Cheng, Y.-R.; Yang, C.-Y.; Hsu, F.-S.

2026-06-01 health informatics 10.64898/2026.05.29.26354442 medRxiv
Top 1%
0.2%

Subjective auditory-perceptual evaluation and uninterpretable deep learning models limit the clinical assessment of voice disorders. This study proposes a two-phase zero-shot framework to evaluate voice pathology. First, an Audio Spectrogram Transformer is fine-tuned on the Perceptual Voice Quality Database to generate an acoustic latent space. Second, Orthogonal Procrustes analysis maps these acoustic embeddings directly onto the semantic space of a pre-trained Sentence Transformer. The geometric alignment produced continuous semantic axes that outperformed a supervised machine learning baseline in regressing clinician-rated GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) severity scales. Furthermore, these axes correlate with traditional acoustic measures, including Harmonics-to-Noise Ratio and local jitter, while remaining robust when applied to aperiodic signals by not requiring fundamental frequency extraction. Most importantly, the model achieved zero-shot semantic expansion, successfully evaluating voices using an untrained, natural clinical vocabulary beyond the GRBAS scale. External validation on the Voice ICarus Database confirmed cross-corpus stability and demonstrated the capacity for zero-shot differential phenotyping of specific etiologies, such as hypokinetic dysphonia and reflux laryngitis. By bridging acoustic and semantic latent spaces, this framework offers an objective, continuous, and transparent metric for evaluating voice quality using voice descriptive vocabulary.
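For readers unfamiliar with the alignment step named in the abstract above, the standard orthogonal Procrustes problem can be stated as follows. The symbols $A$ (acoustic embeddings, one row per sample), $B$ (matching semantic embeddings), and $R$ are notation introduced here for illustration, and the statement assumes both embeddings share the same dimension; how the authors handle differing dimensions is not described in the abstract.

```latex
% Orthogonal Procrustes: find the orthogonal map that best aligns A with B.
\[
  R^{\ast} \;=\; \operatorname*{arg\,min}_{R \,:\, R^{\top} R = I}
  \; \lVert A R - B \rVert_{F}
\]
% Closed-form solution via the singular value decomposition of A^T B:
\[
  A^{\top} B \;=\; U \Sigma V^{\top}
  \quad\Longrightarrow\quad
  R^{\ast} \;=\; U V^{\top}
\]
```

Because $R^{\ast}$ is orthogonal, the mapping preserves distances and angles between acoustic embeddings, which is what allows directions in the semantic space to be read as continuous axes.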

20
MOSAIC: Model-based, Subgroup-Aware Identification of Driver Mutations in Cancer

Campbell, K.; Reyna, M. A.

2026-05-03 bioinformatics 10.64898/2026.04.29.721672 medRxiv
Top 1%
0.2%

In cancer genomics, recurrent patterns of mutual exclusivity within a gene set can indicate shared biological context and involvement in tumorigenesis. However, existing methods are not designed to distinguish mutual exclusivity arising from meaningful biological interactions from mutual exclusivity driven by heterogeneity between underlying patient subpopulations. In this work, we introduce MOSAIC, a novel statistical framework that models patient subgroup heterogeneity in mutual exclusivity analyses. In experiments with simulated data and real data from The Cancer Genome Atlas, we show that MOSAIC amplifies subgroup-specific mutual exclusivity signals, including between IDH1 and IDH2 in young low-grade glioma patients, while reducing the effect of signals produced by underlying subgroup structures, such as distinct genomic lineages associated with histological subtypes of endometrial cancer. Finally, we demonstrate that MOSAIC is more powerful than existing p-value combination methods for patient subgroup stratification. MOSAIC is available as an open-source tool at https://github.com/reynalab/mosaic.