Data-Driven Hybrid Model of SARIMA-CNNAR For Tuberculosis Incidence Time Series Analysis in Nepal

Singh, D. B.; Dawadi, P. R.; Dangi, Y.

2026-02-24 health informatics

10.64898/2026.02.22.26346853 medRxiv

Show abstract

BackgroundTuberculosis (TB) remains a major public health challenge in Nepal, with incidence rates substantially higher than global estimates. Accurate forecasting of TB incidence is essential for early warning systems, resource allocation, and targeted interventions. This study aimed to develop and validate a hybrid Seasonal Autoregressive Integrated Moving Average (SARIMA) and Convolutional Neural Network Auto-Regressive (CNNAR) model for TB incidence forecasting in Nepal. MethodsMonthly TB incidence data (January 2015 to December 2024) were obtained from the National Tuberculosis Control Center (NTCC), Nepal. A hybrid SARIMA-CNNAR model was developed, where SARIMA modeled linear seasonal trends and CNNAR captured nonlinear patterns in the residuals. Hyperparameters were optimized using grid search with 5-fold cross-validation. Model performance was evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and R2 on the 2024 test set. Structural break analysis and sensitivity analysis assessed model robustness. The hybrid model was compared against standalone SARIMA, CNNAR, and three state-of-the-art benchmarks: Long Short-Term Memory (LSTM), Facebook Prophet, and XGBoost. ResultsTB incidence in Nepal increased from a monthly average of 2,048 cases in 2015 to 3,447 in 2024 (68.4% increase). The hybrid SARIMA-CNNAR model demonstrated strong performance with test set metrics of MAE=248.35, RMSE=294.31, MAPE=7.2%, and R2=0.79. Comparative performance: CNNAR (MAE=251.08, RMSE=336.55, MAPE=7.7%, R2=0.73); LSTM (MAE=267.91, RMSE=324.55, MAPE=7.5%, R2=0.75); XGBoost (MAE=314.74, RMSE=373.99, MAPE=8.5%, R2=0.66); Prophet (MAE=371.15, RMSE=478.40, MAPE=10.4%, R2=0.45); SARIMA (MAE=401.11, RMSE=503.93, MAPE=10.99%, R2=0.39). All models captured seasonal peaks in March-May and July-August, with forecasts for 2025 indicating continued seasonal patterns. Sensitivity analysis confirmed robustness with <5% metric variation across parameter configurations. ConclusionsThis first validated hybrid model for TB prediction in Nepal demonstrates high forecasting accuracy by integrating linear seasonal modeling with nonlinear pattern detection. The approach offers a robust tool for evidence-based public health planning in resource-limited settings and it is suitable for integration into national surveillance systems. Author SummaryTuberculosis remains a major public health challenge in Nepal, with cases increasing substantially over the past-decade. In this study, we developed a computer model that combines two different forecasting ap proaches: one that captures regular seasonal patterns and another that learns complex trends from data to predict monthly TB cases. Using ten years of national surveillance data, our hybrid model achieved high accuracy in forecasting TB incidence, outperforming standard approaches including SARIMA, PROPHET, CNNAR, LSTM neural networks, and XGBoost. The model successfully predicted seasonal peaks in March-May and July-August, with forecasts for 2025 suggesting continued high case numbers. These predictions can help Nepals health authorities prepare by pre-positioning diagnostic supplies, scheduling additional staffs during peak months, and targeting awareness campaigns. The modeling approach is desig ned to be adaptable for other diseases and countries with similar health data.

Data-Driven Hybrid Model of SARIMA-CNNAR For Tuberculosis Incidence Time Series Analysis in Nepal

Matching journals