Automated Calculation of the Disruption Index: A Reproducible Computational Workflow for Large-Scale Bibliometric Analyses
Braga Apolinario, A.; Vieira, K. V.; Costa, A. K. M. M.; Freitas, L. C.; Pinheiro, I. S.; Vitral, R. W. F.; Campos, M. J. d. S.
Bibliometric analyses have become essential for understanding scientific production and innovation dynamics; however, large-scale applications remain limited by challenges related to data extraction, preprocessing, citation network reconstruction, and reproducibility, particularly when using PubMed-indexed records. This study presents a fully automated and reproducible computational workflow for large-scale bibliometric analyses based on the Disruption Index (DI). The pipeline enables systematic retrieval of PubMed data, standardized metadata processing, construction of citation networks, and calculation of DI values within a fixed post-publication citation window. Implemented in Python, the workflow integrates automated querying, XML parsing, data consolidation, and network-based citation classification, allowing scalable and transparent analyses that are infeasible through manual approaches. In a demonstrative application focused on orthodontic literature, the pipeline processed more than 67,000 articles and reconstructed over 300,000 citation relationships, resulting in a final analytical sample of 3,234 articles with indexed references and citations. The automated framework ensures methodological transparency, facilitates replication, and substantially reduces the time and technical barriers associated with advanced bibliometric studies. By providing an open and extensible solution for calculating the Disruption Index at scale, this workflow supports robust assessments of scientific innovation and consolidation and can be readily adapted to other biomedical research domains indexed in PubMed.
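The abstract does not spell out the DI formula or the length of the citation window, so the following is only a minimal sketch of the network-based classification step under stated assumptions. It assumes the commonly used definition DI = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers citing the focal article but none of its references, n_j counts papers citing both, and n_k counts papers citing at least one reference but not the focal article. The function name `disruption_index`, the edge-list input format, the 5-year default window, and the toy identifiers are all hypothetical and are not taken from the authors' pipeline.

```python
from collections import defaultdict


def disruption_index(focal, edges, pub_year, window=5):
    """Return the DI of `focal`, or None if no citing paper falls in the window.

    edges    : iterable of (citing_id, cited_id) pairs (assumed input format)
    pub_year : dict mapping article id -> publication year
    window   : years after the focal publication to consider (assumed default)
    """
    refs_of = defaultdict(set)   # article -> articles it cites
    cited_by = defaultdict(set)  # article -> articles citing it
    for citing, cited in edges:
        refs_of[citing].add(cited)
        cited_by[cited].add(citing)

    start = pub_year[focal]

    def in_window(article):
        year = pub_year.get(article)
        return year is not None and start <= year <= start + window

    # Papers citing the focal article within the window.
    citing_focal = {p for p in cited_by.get(focal, set()) if in_window(p)}

    # Papers citing at least one of the focal article's references
    # within the window (the focal article itself is excluded).
    citing_refs = set()
    for ref in refs_of.get(focal, set()):
        for p in cited_by.get(ref, set()):
            if p != focal and in_window(p):
                citing_refs.add(p)

    n_i = len(citing_focal - citing_refs)  # cite focal only
    n_j = len(citing_focal & citing_refs)  # cite focal and its references
    n_k = len(citing_refs - citing_focal)  # cite references only

    total = n_i + n_j + n_k
    return None if total == 0 else (n_i - n_j) / total


# Toy citation network: F is the focal article, R1 and R2 are its references.
edges = [
    ("F", "R1"), ("F", "R2"),
    ("A", "F"),               # cites F only
    ("G", "F"),               # cites F only
    ("B", "F"), ("B", "R1"),  # cites F and a reference
    ("C", "R1"),              # cites a reference only
    ("E", "R2"),              # cites a reference only
    ("D", "F"),               # cites F only, but outside a 5-year window
]
years = {"R1": 2005, "R2": 2005, "F": 2010, "A": 2011, "G": 2012,
         "B": 2012, "C": 2013, "E": 2014, "D": 2020}

print(disruption_index("F", edges, years))
```

With the toy data, the 5-year window gives n_i = 2, n_j = 1, and n_k = 2, so the sketch returns 0.2 for `F`; widening the window to 10 years brings `D` in and raises n_i to 3. Returning `None` when no citing paper falls in the window is a design choice of this sketch, loosely mirroring the restriction of the analytical sample to articles with indexed references and citations.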
Matching journals
The top 6 journals account for 50% of the predicted probability mass.