TDA Engine v2.1: A Computational Framework for Detecting Structural Voids in Spatially Censored Epidemiological Data with Temporal Classification and Causal Inference

Mboya, G. O.

2026-03-05 health informatics

10.64898/2026.02.01.26345283 medRxiv


Background: In public health surveillance, silence (the absence of data) is often more significant than the signal. Traditional epidemiological mapping tools efficiently visualize data density but struggle to mathematically define data absence. Standard approaches conflate stochastic sparsity with systemic suppression and remain vulnerable to edge effects.

Methods: We introduce a topological framework that detects structural voids: regions of unexpected data absence within clusters. Using Distance-to-Measure (DTM) filtration with adaptive thresholding via the Kneedle algorithm [11], we eliminate arbitrary parameter choices. Version 2.1 extends the original framework with three methodological additions: (1) a temporal void classifier combining the Fano factor and a two-state Hidden Markov Model (HMM) to distinguish persistent structural silence from stochastic fluctuation across reporting periods; (2) a causal taxonomy (BORDER, ACCESS, INFRASTRUCTURE, SYSTEM, UNKNOWN) that maps detected voids to probable reporting failure mechanisms via covariate decision trees; and (3) an Observed-to-Expected (O/E) completeness engine calibrated against WHO-standard disease incidence rates across seven conditions. Parameters are derived geometrically from the DTM distribution itself. We validate against known ground truth through a censoring simulation framework using public Kenyan health facility data. Detection accuracy is quantified using the Jaccard index [12], centroid error, and recovery rate.

Results: TDA Engine achieves Jaccard = 0.82 (95% CI: 0.74-0.89) on simulated suppression events, significantly outperforming KDE (0.45) and relative risk surfaces (0.38). Centroid error is 342 m (IQR: 187-512 m). The temporal classifier correctly labels 91% of structurally silent units across six-period validation datasets (HMM posterior P(structural) ≥ 0.60). Permutation tests yield p = 0.003 (95% CI: 0.001-0.008) [13], confirming statistical significance beyond complete spatial randomness.

Conclusion: TDA Engine v2.1 provides a mathematically rigorous, topology-based framework for detecting structural voids in censored epidemiological data and classifying them by temporal persistence and probable causal mechanism. By shifting from density-based to geometry-based inference with quantitative validation metrics and causal labelling, we enable public health officials to distinguish between natural gaps and potential suppression, and to direct field investigation resources accordingly. We emphasize that structural voids are geometric anomalies consistent with suppression, not proof thereof, and that they require contextual validation.
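For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of three of them: the empirical distance-to-measure, the Fano factor, and the Jaccard index. It is not the TDA Engine implementation. The function names, the mass parameter `m`, and the brute-force neighbour search are assumptions made for illustration only; the abstract does not specify how the engine computes these values.

```python
import numpy as np


def dtm(points, queries, m=0.1):
    """Empirical distance-to-measure of each query point.

    Uses the standard definition: the root mean squared distance to the
    k nearest sample points, with k = ceil(m * n). The value of m is an
    illustrative assumption, not a parameter taken from the paper.
    """
    points = np.asarray(points, dtype=float)
    queries = np.asarray(queries, dtype=float)
    k = max(1, int(np.ceil(m * len(points))))
    # Squared Euclidean distances, shape (n_queries, n_points).
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    nearest = np.sort(d2, axis=1)[:, :k]
    return np.sqrt(nearest.mean(axis=1))


def fano_factor(counts):
    """Variance-to-mean ratio of report counts across periods.

    A Poisson process gives a value near 1; larger values indicate
    overdispersion. Returns NaN when the mean count is zero.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    if mean == 0:
        return float("nan")
    return counts.var(ddof=1) / mean


def jaccard(detected, truth):
    """Jaccard index between two boolean masks over the same grid cells."""
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(detected, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return np.logical_and(detected, truth).sum() / union


if __name__ == "__main__":
    # Hypothetical facility coordinates (arbitrary units).
    facilities = [[0, 0], [1, 0], [0, 1], [10, 10]]
    print(dtm(facilities, [[0, 0], [5, 5]], m=0.5))
    print(fano_factor([0, 4]))
    print(jaccard([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this sketch a query point far from most facilities receives a larger DTM value than one inside a cluster, which is the property a void detector would threshold on. How the threshold is chosen (the abstract cites the Kneedle algorithm) is outside the scope of the example.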

