Analytical Centralization of Health Expenditure at the National Administrator of Health System Resources: Architecture, Data Quality, and Operational Performance of the ADRES Health System Analytics Platform, Colombia

Garavito Jimenez, D. A.; Bello Angulo, D. E.; Mejia Lemus, L. T.; Chipatecua, D.; Fula, D. D.; Perez-Rubiano, S.; Martinez, F. L.; Bohorquez Pinzon, J. C.

2026-06-10 public and global health
10.64898/2026.06.08.26355159 medRxiv
Background: Between 2024 and 2025, Colombia universalized the Electronic Health Invoice with embedded Individual Health Services Delivery Records (FEV-RIPS) as the standard for financial and clinical data exchange. ADRES, the entity responsible for administering the resources of Colombia's General Social Security Health System, faced the challenge of processing information from multiple heterogeneous sources generated by more than 55,000 healthcare providers. Health systems in high-income countries converge clinical-financial data in consolidated platforms; Colombia started from a fragmented architecture with incompatible historical sources, no cross-database standardization, and, until 2023, no centralized analytical infrastructure.
Objective: We describe the design, the technical challenges of integrating heterogeneous data, and the operational performance of the analytical infrastructure built by ADRES to centralize large-scale processing of Colombian health system information, and we derive transferable lessons for health system resource administrators in Latin America facing equivalent digitalization mandates.
Methods: Technical-descriptive report based on operational metrics from the ADRES Azure/Databricks environment during January-November 2025. We report indicators of data volume, processing speed, computational capacity, concurrent use by functional group, and governance structure. The architecture integrates VPN connectivity with MinSalud, automated processing of multiple formats (XML, relational tables, flat files), and a medallion data lake (Bronze/Silver/Gold). Data quality challenges include structural inconsistencies across sources, coding incompatibilities (municipalities, dates, diagnoses), format heterogeneity in unstructured data, and absent technical documentation.
Results: The platform manages 21 catalogs, 1,183 tables, and more than 110,645 million stored records, with cumulative production exceeding 1 trillion processed records. It executes queries on 100 billion records in ten seconds using clusters of up to 32 TB RAM and 4,096 vCPU. During September-October 2025, monthly query peaks reached 78,028 across eleven functional groups. Integration required Python/PySpark parsers for variable-depth XML, equivalence tables for incompatible municipality codes, cleaning routines for extreme dates used as nulls (1900-01-01, 9999-12-31), and transformation logic bridging classic RIPS and FEV-RIPS. The platform supported econometric analyses, judicial mandate responses, and public interactive dashboards. Conversational AI integration (Genie, Copilot) extends analytical access to users without SQL knowledge.
Conclusions: In one year, ADRES built an analytical infrastructure at national scale. This report provides, to our knowledge, the first published documentation of the systemic technical challenges of integrating heterogeneous data sources in a middle-income social security health system. Centralizing health system information at national scale is technically feasible under public institutional constraints, but it requires solving cross-source standardization problems that the implementation literature does not document with quantitative precision. The derived lessons are transferable to health system resource administrators in Latin America facing equivalent challenges.
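The cleaning steps named in the Results (sentinel dates treated as nulls, municipality-code equivalence tables, parsing XML of variable depth) can be illustrated with a minimal sketch. It uses plain Python and the standard library rather than the PySpark pipelines the abstract refers to, and every name in it (`clean_date`, `harmonize_municipality`, `flatten_xml`, the sample codes in `MUNICIPALITY_EQUIVALENCE`, and the sample XML tags) is a hypothetical chosen for illustration, not taken from the ADRES platform.

```python
from datetime import date
from typing import Dict, Optional
import xml.etree.ElementTree as ET

# Extreme dates the abstract reports as being used in place of nulls.
SENTINEL_DATES = {date(1900, 1, 1), date(9999, 12, 31)}

# Hypothetical equivalence table: source-specific code -> harmonized 5-digit code.
MUNICIPALITY_EQUIVALENCE: Dict[str, str] = {
    "5001": "05001",    # assumed case: leading zero dropped by a source system
    "05-001": "05001",  # assumed case: alternative delimiter
}


def clean_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO date; return None for missing, malformed, or sentinel values."""
    if not raw:
        return None
    try:
        parsed = date.fromisoformat(raw.strip())
    except ValueError:
        return None
    return None if parsed in SENTINEL_DATES else parsed


def harmonize_municipality(code: str) -> Optional[str]:
    """Map a source code to the harmonized form, or None if it cannot be resolved."""
    code = code.strip()
    if code in MUNICIPALITY_EQUIVALENCE:
        return MUNICIPALITY_EQUIVALENCE[code]
    if len(code) == 5 and code.isdigit():
        return code  # already in the assumed 5-digit form
    return None


def flatten_xml(element: ET.Element, prefix: str = "") -> Dict[str, str]:
    """Flatten XML of arbitrary depth into {path: text} for leaf elements.

    Repeated sibling tags share a path, so later values overwrite earlier ones;
    a production parser would need to index or explode repeated elements.
    """
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}
    flat: Dict[str, str] = {}
    for child in children:
        flat.update(flatten_xml(child, path))
    return flat
```

In a medallion layout, routines of this kind would plausibly sit between the Bronze (raw) and Silver (cleaned) layers, though the abstract does not state where ADRES applies them.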

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association; 61 papers in training set; Top 0.3%; 9.0%
2. JMIR Public Health and Surveillance; 45 papers in training set; Top 0.1%; 8.2%
3. PLOS ONE; 4510 papers in training set; Top 22%; 8.2%
4. Journal of Medical Internet Research; 85 papers in training set; Top 0.5%; 8.0%
5. Frontiers in Public Health; 140 papers in training set; Top 0.9%; 6.2%
6. BMC Health Services Research; 42 papers in training set; Top 0.3%; 6.2%
7. npj Digital Medicine; 97 papers in training set; Top 1%; 4.8%
50% of probability mass above
8. PLOS Digital Health; 91 papers in training set; Top 0.8%; 3.5%
9. PLOS Global Public Health; 293 papers in training set; Top 3%; 2.5%
10. BMC Medical Informatics and Decision Making; 39 papers in training set; Top 1%; 2.5%
11. The Lancet Digital Health; 25 papers in training set; Top 0.3%; 1.8%
12. JMIRx Med; 31 papers in training set; Top 0.6%; 1.7%
13. International Journal of Medical Informatics; 25 papers in training set; Top 0.8%; 1.7%
14. BMC Medical Research Methodology; 43 papers in training set; Top 0.5%; 1.7%
15. JMIR Medical Informatics; 17 papers in training set; Top 0.7%; 1.7%
16. Frontiers in Digital Health; 20 papers in training set; Top 0.7%; 1.7%
17. GigaScience; 172 papers in training set; Top 2%; 1.6%
18. International Journal of Epidemiology; 74 papers in training set; Top 2%; 1.3%
19. Scientific Reports; 3102 papers in training set; Top 64%; 1.3%
20. BMJ Open; 554 papers in training set; Top 10%; 1.3%
21. Wellcome Open Research; 57 papers in training set; Top 1%; 1.2%
22. eLife; 5422 papers in training set; Top 52%; 0.9%
23. Disaster Medicine and Public Health Preparedness; 16 papers in training set; Top 1%; 0.9%
24. Journal of Biomedical Informatics; 45 papers in training set; Top 1%; 0.9%
25. JMIR Formative Research; 32 papers in training set; Top 1%; 0.9%
26. DIGITAL HEALTH; 12 papers in training set; Top 0.6%; 0.8%
27. BMJ Global Health; 98 papers in training set; Top 3%; 0.8%
28. BMC Infectious Diseases; 118 papers in training set; Top 5%; 0.7%
29. BMJ Open Quality; 15 papers in training set; Top 0.9%; 0.7%
30. F1000Research; 79 papers in training set; Top 5%; 0.7%