
Data-Driven Hybrid Model of SARIMA-CNNAR For Tuberculosis Incidence Time Series Analysis in Nepal

Singh, D. B.; Dawadi, P. R.; Dangi, Y.

2026-02-24 health informatics
10.64898/2026.02.22.26346853 medRxiv

Background: Tuberculosis (TB) remains a major public health challenge in Nepal, with incidence rates substantially higher than global estimates. Accurate forecasting of TB incidence is essential for early warning systems, resource allocation, and targeted interventions. This study aimed to develop and validate a hybrid Seasonal Autoregressive Integrated Moving Average (SARIMA) and Convolutional Neural Network Auto-Regressive (CNNAR) model for forecasting TB incidence in Nepal.

Methods: Monthly TB incidence data (January 2015 to December 2024) were obtained from the National Tuberculosis Control Center (NTCC), Nepal. A hybrid SARIMA-CNNAR model was developed in which SARIMA modeled the linear seasonal trends and CNNAR captured nonlinear patterns in the SARIMA residuals. Hyperparameters were optimized by grid search with 5-fold cross-validation. Model performance was evaluated on the 2024 test set using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and R². Structural break analysis and sensitivity analysis were used to assess model robustness. The hybrid model was compared against standalone SARIMA and CNNAR and against three state-of-the-art benchmarks: Long Short-Term Memory (LSTM), Facebook Prophet, and XGBoost.

Results: TB incidence in Nepal increased from a monthly average of 2,048 cases in 2015 to 3,447 in 2024 (a 68.4% increase). The hybrid SARIMA-CNNAR model performed best on all four metrics, with test set values of MAE = 248.35, RMSE = 294.31, MAPE = 7.2%, and R² = 0.79. The comparison models achieved: CNNAR (MAE = 251.08, RMSE = 336.55, MAPE = 7.7%, R² = 0.73); LSTM (MAE = 267.91, RMSE = 324.55, MAPE = 7.5%, R² = 0.75); XGBoost (MAE = 314.74, RMSE = 373.99, MAPE = 8.5%, R² = 0.66); Prophet (MAE = 371.15, RMSE = 478.40, MAPE = 10.4%, R² = 0.45); SARIMA (MAE = 401.11, RMSE = 503.93, MAPE = 10.99%, R² = 0.39). All models captured the seasonal peaks in March-May and July-August, and the forecasts for 2025 indicate that these seasonal patterns will continue.
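The four accuracy metrics and the percentage increase quoted in the Results can be made concrete with a short sketch. It assumes the standard textbook definitions (the abstract does not spell out its formulas); the helper names `mae`, `rmse`, `mape`, `r2`, and `pct_change` are hypothetical, and the toy series is illustrative only, not NTCC data. MAPE is undefined when an observed value is zero, which is not a concern for monthly counts in the thousands.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean Absolute Error: average of |y_t - yhat_t|."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mape(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent; undefined if any y_t == 0."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)


def r2(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Toy observed/forecast values, NOT the study's data.
obs = [100, 200, 300, 400]
fc = [110, 190, 310, 390]
print(mae(obs, fc))                      # 10.0
print(rmse(obs, fc))                     # 10.0
print(round(mape(obs, fc), 2))           # 5.21
print(round(r2(obs, fc), 3))             # 0.992
print(round(pct_change(2048, 3447), 1))  # 68.3
```

With the rounded monthly averages quoted above, the formula gives about 68.3%; the reported 68.4% lies within what rounding of the two averages could produce. Because RMSE is always at least as large as MAE and penalizes large errors more heavily, the wider MAE-RMSE gap for CNNAR (251.08 vs 336.55) than for the hybrid model (248.35 vs 294.31) suggests that CNNAR made more large individual errors.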
Sensitivity analysis confirmed the robustness of the results, with less than 5% variation in performance metrics across parameter configurations.

Conclusions: This study presents the first validated hybrid model for TB prediction in Nepal. By integrating linear seasonal modeling with nonlinear pattern detection, it achieves high forecasting accuracy. The approach offers a robust tool for evidence-based public health planning in resource-limited settings and is suitable for integration into national surveillance systems.

Author Summary: Tuberculosis remains a major public health challenge in Nepal, with cases increasing substantially over the past decade. In this study, we developed a computer model that combines two different forecasting approaches, one that captures regular seasonal patterns and another that learns complex trends from data, to predict monthly TB cases. Using ten years of national surveillance data, our hybrid model achieved high accuracy in forecasting TB incidence, outperforming standard approaches including SARIMA, Prophet, CNNAR, LSTM neural networks, and XGBoost. The model successfully predicted seasonal peaks in March-May and July-August, and its forecasts for 2025 suggest continued high case numbers. These predictions can help Nepal's health authorities prepare by pre-positioning diagnostic supplies, scheduling additional staff during peak months, and targeting awareness campaigns. The modeling approach is designed to be adaptable to other diseases and to countries with similar health data.
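The additive hybrid idea (a linear seasonal model first, then a convolutional autoregression on its residuals, with the two forecasts summed) can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' implementation: a least-squares seasonal autoregression with lags 1 and 12 stands in for a full SARIMA fit (in practice one might use statsmodels' `SARIMAX`), `TinyCNNAR` is a hypothetical one-layer 1-D convolutional network trained by plain gradient descent, and the series is synthetic rather than NTCC data. The names `fit_seasonal_ar`, `predict_seasonal_ar`, and `TinyCNNAR` and all hyperparameters are illustrative.

```python
import numpy as np


def fit_seasonal_ar(y, s=12):
    """Least-squares fit of y_t = c + a1*y_{t-1} + a_s*y_{t-s}.
    Simplified linear stand-in for SARIMA (no differencing or MA terms)."""
    X = np.column_stack([np.ones(len(y) - s), y[s - 1:-1], y[:-s]])
    coef, *_ = np.linalg.lstsq(X, y[s:], rcond=None)
    return coef


def predict_seasonal_ar(coef, y, s=12):
    """One-step-ahead linear predictions for t = s, ..., len(y) - 1."""
    return coef[0] + coef[1] * y[s - 1:-1] + coef[2] * y[:-s]


class TinyCNNAR:
    """Hypothetical CNN autoregression on residuals: one Conv1D layer
    (n_filters kernels of length `width`) + ReLU + dense readout, mapping
    the last p residuals to the next residual."""

    def __init__(self, p=12, n_filters=4, width=3, seed=0):
        rng = np.random.default_rng(seed)
        self.width = width
        n_pos = p - width + 1                       # valid convolution positions
        self.Wc = 0.1 * rng.standard_normal((n_filters, width))
        self.bc = np.zeros(n_filters)
        self.Wd = 0.1 * rng.standard_normal(n_pos * n_filters)
        self.bd = 0.0

    def _forward(self, X):
        P = np.lib.stride_tricks.sliding_window_view(X, self.width, axis=1)  # (n, L, w)
        Z = P @ self.Wc.T + self.bc                 # conv pre-activations (n, L, F)
        H = np.maximum(Z, 0.0).reshape(len(X), -1)  # ReLU, flattened (n, L*F)
        return P, Z, H, H @ self.Wd + self.bd

    def predict(self, X):
        return self._forward(X)[3]

    def fit(self, X, y, lr=0.05, epochs=500):
        """Full-batch gradient descent on mean squared error; returns loss history."""
        losses = []
        for _ in range(epochs):
            P, Z, H, yhat = self._forward(X)
            err = yhat - y
            losses.append(float(np.mean(err ** 2)))
            dy = 2.0 * err / len(X)                 # d(loss)/d(yhat)
            dZ = np.outer(dy, self.Wd).reshape(Z.shape) * (Z > 0)  # back through ReLU
            self.Wc -= lr * np.einsum("nlf,nlw->fw", dZ, P)
            self.bc -= lr * dZ.sum(axis=(0, 1))
            self.Wd -= lr * (H.T @ dy)
            self.bd -= lr * dy.sum()
        return losses


# Synthetic monthly series (NOT the NTCC data): trend + seasonality + a nonlinear term + noise.
rng = np.random.default_rng(42)
t = np.arange(120)
season = 300 * np.sin(2 * np.pi * t / 12)
y = 2000 + 12 * t + season + 0.002 * season ** 2 + rng.normal(0, 60, t.size)

s, p, n_train = 12, 12, 108                   # hold out the final 12 months as a test year
coef = fit_seasonal_ar(y[:n_train], s)         # linear stage fitted on training months only
lin = predict_seasonal_ar(coef, y, s)          # lin[j] predicts y[j + s]
resid = y[s:] - lin

train_resid = resid[: n_train - s]
mu, sd = train_resid.mean(), train_resid.std()
z = (resid - mu) / sd                          # standardize residuals for training
X_all = np.lib.stride_tricks.sliding_window_view(z, p)[:-1]  # windows of p past residuals
y_all = z[p:]                                  # y_all[k] is the residual at month k + s + p
n_fit = n_train - s - p                        # windows whose target month is in training

cnn = TinyCNNAR(p=p)
losses = cnn.fit(X_all[:n_fit], y_all[:n_fit])

resid_hat = cnn.predict(X_all[n_fit:]) * sd + mu
hybrid = lin[n_train - s:] + resid_hat         # hybrid forecast = linear part + residual part

actual = y[n_train:]
print("linear MAE:", np.mean(np.abs(actual - lin[n_train - s:])))
print("hybrid MAE:", np.mean(np.abs(actual - hybrid)))
```

Fitting the residual network only on windows whose target month falls before the hold-out year keeps the test months out of training, mirroring the use of 2024 as a test set; the forecasts remain one-step-ahead, using observations up to the previous month.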

Matching journals

The top 6 journals together account for just over 50% (51.5%) of the predicted probability mass.

1. PLOS Digital Health (91 papers in training set; Top 0.2%; predicted probability 10.2%)
2. PLOS ONE (4510 papers in training set; Top 18%; predicted probability 10.2%)
3. BMC Infectious Diseases (118 papers in training set; Top 0.1%; predicted probability 8.5%)
4. JMIR Public Health and Surveillance (45 papers in training set; Top 0.1%; predicted probability 8.5%)
5. Scientific Reports (3102 papers in training set; Top 12%; predicted probability 7.2%)
6. International Journal of Medical Informatics (25 papers in training set; Top 0.1%; predicted probability 6.9%)
50% of probability mass above
7. Frontiers in Public Health (140 papers in training set; Top 2%; predicted probability 3.6%)
8. Journal of Medical Internet Research (85 papers in training set; Top 1%; predicted probability 3.6%)
9. PLOS Neglected Tropical Diseases (378 papers in training set; Top 2%; predicted probability 3.6%)
10. PLOS Computational Biology (1633 papers in training set; Top 9%; predicted probability 3.6%)
11. Journal of Infection (71 papers in training set; Top 0.7%; predicted probability 2.6%)
12. BMC Medicine (163 papers in training set; Top 2%; predicted probability 2.4%)
13. Wellcome Open Research (57 papers in training set; Top 0.7%; predicted probability 1.8%)
14. BMC Medical Research Methodology (43 papers in training set; Top 0.7%; predicted probability 1.5%)
15. Bioinformatics (1061 papers in training set; Top 8%; predicted probability 1.5%)
16. European Respiratory Journal (54 papers in training set; Top 1%; predicted probability 1.3%)
17. Frontiers in Digital Health (20 papers in training set; Top 1%; predicted probability 0.9%)
18. Parasites & Vectors (57 papers in training set; Top 1%; predicted probability 0.8%)
19. Royal Society Open Science (193 papers in training set; Top 5%; predicted probability 0.8%)
20. Frontiers in Microbiology (375 papers in training set; Top 9%; predicted probability 0.8%)
21. BMC Public Health (147 papers in training set; Top 6%; predicted probability 0.7%)
22. PeerJ (261 papers in training set; Top 16%; predicted probability 0.7%)
23. BMC Bioinformatics (383 papers in training set; Top 7%; predicted probability 0.7%)
24. Epidemiology and Infection (84 papers in training set; Top 4%; predicted probability 0.6%)
25. eClinicalMedicine (55 papers in training set; Top 2%; predicted probability 0.6%)
26. Computers in Biology and Medicine (120 papers in training set; Top 5%; predicted probability 0.6%)
27. Infectious Disease Modelling (50 papers in training set; Top 1%; predicted probability 0.5%)
28. PLOS Global Public Health (293 papers in training set; Top 7%; predicted probability 0.5%)
29. Viruses (318 papers in training set; Top 7%; predicted probability 0.5%)
30. JMIR Medical Informatics (17 papers in training set; Top 2%; predicted probability 0.5%)
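The "50% of probability mass above" marker appears to sit at the smallest number of top-ranked journals whose predicted probabilities sum to at least 50%. That cut-off rule is an assumption inferred from the listed numbers rather than documented by the page; the short check below, using a hypothetical helper `smallest_k_reaching`, confirms that it places the marker after the sixth journal.

```python
def smallest_k_reaching(probs, threshold=50.0):
    """Return (k, cumulative) for the smallest k whose top-k sum is >= threshold."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None, total  # threshold never reached


# Predicted probabilities (%) of the first seven journals, copied from the list.
top_probs = [10.2, 10.2, 8.5, 8.5, 7.2, 6.9, 3.6]
k, mass = smallest_k_reaching(top_probs)
print(k, round(mass, 1))  # 6 51.5
```

The top five journals reach only 44.6%, so the sixth is the first at which the cumulative share crosses 50%.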