IDTrack: Time- and Namespace-Aware Identifier Harmonization for Reproducible Workflows
Inecik, K.; Erken, E.; Theis, F. J.
Motivation: Reproducibility in computational biology fails silently when gene identifiers drift beneath unchanged analysis code. The same frozen pipeline, rerun months later, yields different results not because biology evolved but because identifier semantics shifted with upstream annotation releases. This failure mode is invisible to version control and containerization because the mapping layer itself constitutes an undeclared coordinate system whose time axis advances independently of downstream workflows. Gene identifiers occupy positions in a joint space of namespace, annotation release, genome assembly, and entity layer, and that space evolves through retirements, merges, splits, and nomenclature reassignments. As a result, atlas integration, retrospective reanalysis, and perturbation screens inherit temporal dependencies that existing utilities cannot surface: current mappers answer what an identifier resolves to now, rather than under what declared contract the feature space was constructed.

Results: IDTrack reconceptualizes identifier harmonization as a time-indexed coordinate transformation. It materializes annotation release history into a snapshot-bounded identifier graph and solves conversions through a time-traveling, contract-constrained pathfinder that pins release boundaries, assembly contexts, and ambiguity policies as explicit parameters rather than implicit endpoint state. This architecture surfaces reachability and ambiguity as interpretable outcome classes (unmapped, uniquely resolved, or ambiguously multi-target), enables atlas-scale harmonization with explicit collision handling, and records every mapping decision in a provenance ledger. The ledger turns otherwise invisible preprocessing into citable methodological infrastructure whose coordinate choices can be inspected, compared, and reproduced.

Availability and Implementation: Code: https://github.com/theislab/idtrack; package: pip install idtrack.

Contact: kemal.inecik@helmholtz-munich.de; erkmenerken22@ku.edu.tr; fabian.theis@helmholtz-munich.de.

Supplementary Information: Supplementary material elaborates on architectural decisions and implementation details.
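Illustrative sketch: The idea of a snapshot-bounded identifier graph with release-pinned conversion and explicit outcome classes can be illustrated with a small toy model. The code below is not the IDTrack API; the identifiers, release numbers, and the functions `build_graph` and `convert` are all hypothetical and chosen only to show how the same conversion request can yield different, fully declared outcomes depending on the release window and the snapshot bound.

```python
from collections import defaultdict

# Hypothetical toy history: (source_id, successor_id, release of the event).
# A successor of None means the identifier was retired without replacement.
HISTORY = [
    ("G1", "G2", 90),    # G1 renamed/merged into G2 at release 90
    ("G2", "G3", 100),   # G2 split into G3 and G4 at release 100
    ("G2", "G4", 100),
    ("G7", None, 95),    # G7 retired at release 95
]


def build_graph(history, snapshot_release):
    """Keep only events at or before the snapshot bound."""
    graph = defaultdict(list)
    for source, target, release in history:
        if release <= snapshot_release:
            graph[source].append((target, release))
    return graph


def convert(graph, identifier, from_release, to_release):
    """Walk forward in time from from_release to to_release.

    Returns (outcome, targets, ledger), where outcome is one of
    'unmapped', 'unique', or 'ambiguous', and ledger lists every
    (source, successor, release) step that was taken.
    """
    ledger = []
    resolved = set()
    stack = [(identifier, from_release)]
    while stack:
        node, at_release = stack.pop()
        # Events for this node that fall inside the remaining time window.
        candidates = [(t, r) for t, r in graph.get(node, [])
                      if at_release < r <= to_release]
        if not candidates:
            resolved.add(node)  # node survives unchanged to to_release
            continue
        # Apply only the earliest event; later hops start from that release.
        first = min(r for _, r in candidates)
        for target, release in candidates:
            if release != first:
                continue
            ledger.append((node, target, release))
            if target is not None:
                stack.append((target, release))
    if not resolved:
        outcome = "unmapped"
    elif len(resolved) == 1:
        outcome = "unique"
    else:
        outcome = "ambiguous"
    return outcome, sorted(resolved), ledger


graph = build_graph(HISTORY, snapshot_release=110)
print(convert(graph, "G1", 80, 95))    # window ends before the split
print(convert(graph, "G1", 80, 110))   # window includes the split
print(convert(graph, "G7", 80, 110))   # retired without successor
```

In this toy model, converting `G1` up to release 95 resolves uniquely to `G2`, while extending the window to release 110 makes the result ambiguous (`G3` and `G4`), and `G7` is unmapped. Rebuilding the graph with a lower `snapshot_release` changes the answers again, which is why the window and snapshot bound are passed as explicit parameters and every hop is written to the ledger.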