CellExLink: End-to-end cell-type recognition and normalization in biomedical text

Nabijiang, A.; Shahriyari, L.

2026-05-29 bioinformatics
10.64898/2026.05.26.728013 bioRxiv

Since cells are the main components of many biological and biomedical studies, cell-type extraction is an important task in biomedical text mining. However, current biomedical text-mining systems either do not explicitly support cell-type extraction, provide limited support for Cell Ontology normalization, or show limited performance in end-to-end cell-type extraction. These limitations can affect downstream tasks that depend on reliable cell-type information. Here, we present CellExLink, an end-to-end biomedical natural language processing pipeline designed specifically for cell-type recognition and Cell Ontology normalization in biomedical text. The pipeline is designed to improve extraction accuracy and practical usability in literature-mining workflows, while accounting for computational efficiency in its recognition and normalization design. We evaluate CellExLink across heterogeneous biomedical corpora and compare it with established and recent biomedical text-mining tools. The results show that CellExLink provides reliable cell-type recognition, Cell Ontology normalization, and end-to-end extraction across these corpora. By addressing the need for reliable end-to-end cell-type recognition and Cell Ontology normalization, CellExLink can support downstream tasks such as curation, search, relation extraction, and knowledge graph construction.

Author summary

Cell types are central to biomedical research, but biomedical papers often use different names, abbreviations, and synonyms for the same cell type. This variation makes it difficult for automated processes to collect and compare cell-type information across papers. Reliable automated extraction is important because literature mining requires consistent cell-type identification before evidence from different studies can be searched, integrated, or reused.
Existing off-the-shelf biomedical text-mining tools provide useful functionality, but their ability to support cell-type extraction remains limited and inconsistent. To address this gap, we developed CellExLink, a pipeline that finds cell-type entities in biomedical text and links them to standard Cell Ontology identifiers. We evaluated the pipeline on several biomedical corpora and compared it with existing tools that support cell-type extraction. Across these evaluations, CellExLink showed clear accuracy gains in both detecting cell-type entities and assigning correct standard identifiers. Together, these gains make CellExLink a powerful tool for extracting reliable standardized cell-type information from large collections of papers, supporting literature curation, relation extraction, knowledge graph construction, and studies of cell-type-specific roles in diseases, drug responses, and biological pathways.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.