
TDA Engine v2.1: A Computational Framework for Detecting Structural Voids in Spatially Censored Epidemiological Data with Temporal Classification and Causal Inference

Mboya, G. O.

2026-03-05 health informatics
10.64898/2026.02.01.26345283 medRxiv

Background: In public health surveillance, silence (the absence of data) is often more significant than the signal. Traditional epidemiological mapping tools efficiently visualize data density but struggle to mathematically define data absence. Standard approaches conflate stochastic sparsity with systemic suppression and remain vulnerable to edge effects.

Methods: We introduce a topological framework that detects structural voids, that is, regions of unexpected data absence within clusters. Using Distance-to-Measure (DTM) filtration with adaptive thresholding via the Kneedle algorithm [11], we eliminate arbitrary parameter choices. Version 2.1 extends the original framework with three methodological additions: (1) a temporal void classifier combining the Fano factor and a two-state Hidden Markov Model (HMM) to distinguish persistent structural silence from stochastic fluctuation across reporting periods; (2) a causal taxonomy (BORDER, ACCESS, INFRASTRUCTURE, SYSTEM, UNKNOWN) that maps detected voids to probable reporting failure mechanisms via covariate decision trees; and (3) an Observed-to-Expected (O/E) completeness engine calibrated against WHO-standard disease incidence rates across seven conditions. Parameters are derived geometrically from the DTM distribution itself. We validate against known ground truth through a censoring simulation framework using public Kenyan health facility data. Detection accuracy is quantified using the Jaccard index [12], centroid error, and recovery rate.

Results: TDA Engine achieves Jaccard = 0.82 (95% CI: 0.74-0.89) on simulated suppression events, significantly outperforming KDE (0.45) and relative risk surfaces (0.38). Centroid error is 342 m (IQR: 187-512 m). The temporal classifier correctly labels 91% of structurally silent units across six-period validation datasets (HMM posterior P(structural) ≥ 0.60). Permutation tests yield p = 0.003 (95% CI: 0.001-0.008) [13], confirming statistical significance beyond complete spatial randomness.

Conclusion: TDA Engine v2.1 provides a mathematically rigorous, topology-based framework for detecting structural voids in censored epidemiological data and classifying them by temporal persistence and probable causal mechanism. By shifting from density-based to geometry-based inference with quantitative validation metrics and causal labelling, we enable public health officials to distinguish between natural gaps and potential suppression, and to direct field investigation resources accordingly. We emphasize that structural voids are geometric anomalies consistent with suppression, not proof thereof; they require contextual validation.
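To make the quantities named in the abstract concrete, the following is a minimal Python sketch of three of them: the Jaccard index between a detected and a true void, the centroid error, and the Fano factor of a count series. It is not the TDA Engine implementation. The function names, the representation of a void as a set of grid cells, and the use of projected coordinates in metres are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def jaccard_index(detected, truth):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of grid cells.

    Representing a void as a set of (row, col) cells is an assumption.
    """
    detected, truth = set(detected), set(truth)
    union = detected | truth
    if not union:
        return 1.0  # two empty regions are treated as identical
    return len(detected & truth) / len(union)


def centroid_error(detected_pts, truth_pts):
    """Euclidean distance between the centroids of two point sets.

    Points are assumed to be (x, y) pairs in a projected system (metres).
    """
    def centroid(pts):
        return (mean(p[0] for p in pts), mean(p[1] for p in pts))

    return math.dist(centroid(detected_pts), centroid(truth_pts))


def fano_factor(counts):
    """Fano factor: sample variance divided by mean of a count series.

    Values near 1 are what a Poisson process would give; needs >= 2 counts.
    """
    mu = mean(counts)
    if mu == 0:
        return float("nan")  # undefined for an all-zero series
    return variance(counts) / mu


if __name__ == "__main__":
    detected = {(0, 0), (0, 1), (1, 0)}
    truth = {(0, 1), (1, 0), (1, 1)}
    print(jaccard_index(detected, truth))  # 0.5

    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    shifted = [(x + 3, y + 4) for x, y in square]
    print(centroid_error(square, shifted))  # 5.0

    print(fano_factor([2, 4, 6]))  # 1.0
```

How the Fano factor feeds into the two-state HMM, and which thresholds separate structural from stochastic silence, are not specified in the abstract, so the sketch stops at computing the raw statistic.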

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. PLOS ONE (4510 papers in training set; Top 13%): 14.6%
2. BMC Infectious Diseases (118 papers in training set; Top 0.1%): 10.0%
3. PLOS Digital Health (91 papers in training set; Top 0.2%): 10.0%
4. PLOS Computational Biology (1633 papers in training set; Top 3%): 10.0%
5. Scientific Reports (3102 papers in training set; Top 10%): 8.3%

50% of probability mass above

6. JMIR Public Health and Surveillance (45 papers in training set; Top 0.1%): 6.7%
7. BMC Medical Research Methodology (43 papers in training set; Top 0.3%): 3.2%
8. Journal of Medical Internet Research (85 papers in training set; Top 2%): 3.0%
9. JAMIA Open (37 papers in training set; Top 0.5%): 2.7%
10. Wellcome Open Research (57 papers in training set; Top 0.6%): 2.1%
11. Patterns (70 papers in training set; Top 0.7%): 1.9%
12. BMJ Health & Care Informatics (13 papers in training set; Top 0.4%): 1.7%
13. Royal Society Open Science (193 papers in training set; Top 2%): 1.7%
14. International Journal of Medical Informatics (25 papers in training set; Top 0.9%): 1.7%
15. Epidemiology and Infection (84 papers in training set; Top 2%): 1.3%
16. BMJ Open (554 papers in training set; Top 10%): 1.3%
17. Philosophical Transactions of the Royal Society B (51 papers in training set; Top 4%): 1.3%
18. Journal of the American Medical Informatics Association (61 papers in training set; Top 2%): 0.9%
19. Nature Communications (4913 papers in training set; Top 60%): 0.9%
20. SoftwareX (15 papers in training set; Top 0.4%): 0.8%
21. Epidemics (104 papers in training set; Top 2%): 0.7%
22. PLOS Neglected Tropical Diseases (378 papers in training set; Top 5%): 0.7%
23. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences (15 papers in training set; Top 0.9%): 0.7%