Analyzing the Naming Conventions of Life Science Data Resources to Inform Human and Computational Findability
Imker, H. J.; Ou, H.
This study aimed to evaluate the names of life science data resources and consider their impact on findability, a core feature of the FAIR (Findability, Accessibility, Interoperability, and Reusability) Principles. Using a previously published list of unique data resources, we identified and validated data resources for which both common and full names were available (n = 1153). From this set, we analyzed characteristics of resource names to determine whether any naming conventions have emerged organically. Additionally, since common names are often used in the absence of a resource's full name, we performed a test to evaluate our ability to infer any meaning from common names. Our results highlight suboptimal naming practices and widespread opacity in common names, which pose challenges to resource identification and retrieval by both human- and computationally-centric methods. These results are informative for those who establish and promote data resources, as well as for those who search for data to use in individual research projects, develop data discovery systems, analyze the scientific literature, or assess research infrastructure. The findings underscore the value of findability in the FAIR Principles and of current efforts to develop infrastructure that supports more efficient communication and global connectedness.
Matching journals
The top 4 journals account for 50% of the predicted probability mass.