Analyzing the Naming Conventions of Life Science Data Resources to Inform Human and Computational Findability

Imker, H. J.; Ou, H.

2025-10-04 bioinformatics
10.1101/2025.10.02.680112 bioRxiv
This study aimed to evaluate the names of life science data resources and consider the impacts on findability, a core feature of the FAIR (Findability, Accessibility, Interoperability, and Reusability) Principles. Utilizing a previously published list of unique data resources, we identified and validated data resources with both common and full names available (n = 1153). From this set, we analyzed characteristics of resource names to determine whether any naming conventions have emerged organically. Additionally, since common names are often used in the absence of a resource's full name, we performed a test to evaluate our ability to infer any meaning from common names. Our results highlight suboptimal naming practices and widespread opaqueness in common names, which pose challenges to resource identification and retrieval by both human- and computationally-centric methods. These results are informative for those who establish and promote data resources as well as for those who search for data to use in individual research projects, develop data discovery systems, analyze the scientific literature, or assess research infrastructure. The findings underscore the value of findability in the FAIR Principles and the current efforts to develop infrastructure that supports more efficient communication and global connectedness.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | GigaScience | 172 | Top 0.1% | 23.6%
2 | Scientific Data | 174 | Top 0.1% | 18.3%
3 | PLOS ONE | 4510 | Top 24% | 7.1%
4 | PeerJ | 261 | Top 2% | 4.1%
(50% of probability mass above)
5 | Briefings in Bioinformatics | 326 | Top 1% | 4.1%
6 | Scientific Reports | 3102 | Top 40% | 3.2%
7 | Frontiers in Bioinformatics | 45 | Top 0.1% | 2.2%
8 | PLOS Computational Biology | 1633 | Top 14% | 2.0%
9 | Database | 51 | Top 0.3% | 2.0%
10 | Computational and Structural Biotechnology Journal | 216 | Top 4% | 1.9%
11 | BMC Bioinformatics | 383 | Top 4% | 1.9%
12 | Bioinformatics | 1061 | Top 7% | 1.8%
13 | Patterns | 70 | Top 0.8% | 1.8%
14 | NAR Genomics and Bioinformatics | 214 | Top 2% | 1.7%
15 | Nucleic Acids Research | 1128 | Top 12% | 1.4%
16 | Genomics, Proteomics & Bioinformatics | 171 | Top 4% | 1.4%
17 | eLife | 5422 | Top 46% | 1.4%
18 | Bioinformatics Advances | 184 | Top 4% | 0.9%
19 | International Journal of Molecular Sciences | 453 | Top 12% | 0.9%
20 | Journal of Proteome Research | 215 | Top 2% | 0.8%
21 | Journal of Structural Biology | 58 | Top 2% | 0.8%
22 | BMC Biology | 248 | Top 5% | 0.7%
23 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.7%
24 | Plant Direct | 81 | Top 2% | 0.7%
25 | Computers in Biology and Medicine | 120 | Top 6% | 0.5%
26 | Protein Science | 221 | Top 2% | 0.5%
27 | Limnology and Oceanography: Methods | 11 | Top 0.5% | 0.5%
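
The 50% cutoff stated for this list can be checked with a running total of the listed probabilities. The sketch below is only an illustration: it assumes the cutoff is the smallest rank at which the cumulative probability reaches 50%, which the page does not state explicitly, and the names `journals_to_reach` and `threshold` are hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the six highest-ranked journals,
# copied from the list above.
probabilities = [
    ("GigaScience", 23.6),
    ("Scientific Data", 18.3),
    ("PLOS ONE", 7.1),
    ("PeerJ", 4.1),
    ("Briefings in Bioinformatics", 4.1),
    ("Scientific Reports", 3.2),
]


def journals_to_reach(ranked, threshold=50.0):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached.

    Assumption: this is the rule behind the page's 50% marker.
    """
    running = accumulate(p for _, p in ranked)
    for count, total in enumerate(running, start=1):
        if total >= threshold:
            return count
    return None


print(journals_to_reach(probabilities))
```

Under that assumption, the first three journals sum to about 49.0% and the fourth brings the total to about 53.1%, which agrees with the statement that the top 4 journals account for 50% of the predicted probability mass.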