
Community needs for FAIR pathogen data

van Geest, G.; Thomas-Lopez, D.; Feitzinger, A. A.; Weissgold, L. A.; Halabi, S.; Cuesta, I.; Hjerde, E.; Gurwitz, K. T.; Arora, N.; Neves, A.; Palagi, P. M.; Williams, J. J.

2026-04-15 scientific communication and education
10.64898/2026.04.14.718420 bioRxiv

Background: Datasets related to infectious diseases are essential for public health decision-making, yet their reuse remains limited by persistent barriers to data sharing and integration. Achieving data that are Findable, Accessible, Interoperable, and Reusable (FAIR) is widely recognized as essential for accelerating scientific discovery and enabling coordinated responses to emerging threats, but the needs of the global pathogen data community have not been systematically characterized.

Aim: This study, conducted by the Pathogen Data Network (PDN), aims to identify infrastructural and educational priorities among stakeholders working with infectious disease-related data in order to guide community-responsive support for data sharing and interoperability.

Methods: A cross-sectional stakeholder survey was disseminated to a well-defined expert population within PDN networks and via open professional channels. A total of 136 responses from researchers, healthcare professionals, bioinformaticians, and educators were analyzed descriptively to identify prioritized barriers, training needs, and preferred support mechanisms.

Results: Respondents consistently identified structural constraints as the primary impediments to effective data use, including limited funding (74%), data-aggregation challenges (68%), and a shortage of skilled personnel (52%). Respondents identified bioinformatics for infectious disease research (68%) as the highest priority for training, followed by guidance on using the integrated pathogen data and tools portal provided by the PDN, the Pathogens Portal (51%). The Pathogens Portal was also ranked as the most essential PDN resource (72%). Preferred training formats included virtual short courses (68%) and webinars (66%). Notably, while researchers emphasized technical subjects like machine learning, educators prioritized foundational case studies.

Conclusion: These findings provide an evidence-based diagnostic of community needs and suggest that barriers to FAIR pathogen data are predominantly systemic rather than purely technological. The survey framework and openly available dataset offer a reusable template for assessing needs in other communities and regions. By aligning training, infrastructure development, and outreach with empirically identified priorities, organizations supporting infectious disease research can strengthen the interoperability and reuse of data and establish a benchmark for future community-driven improvements.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. GigaScience | 172 papers in training set | Top 0.1% | 14.5%
2. F1000Research | 79 papers in training set | Top 0.1% | 12.6%
3. Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.3% | 8.5%
4. PLOS ONE | 4510 papers in training set | Top 22% | 8.5%
5. Wellcome Open Research | 57 papers in training set | Top 0.1% | 8.5%
(50% of probability mass above)
6. Patterns | 70 papers in training set | Top 0.1% | 6.4%
7. PLOS Computational Biology | 1633 papers in training set | Top 7% | 4.9%
8. PeerJ | 261 papers in training set | Top 3% | 3.1%
9. PLOS Biology | 408 papers in training set | Top 5% | 2.8%
10. Bioinformatics | 1061 papers in training set | Top 6% | 2.6%
11. BioData Mining | 15 papers in training set | Top 0.3% | 1.7%
12. PLOS Digital Health | 91 papers in training set | Top 1% | 1.7%
13. Acta Crystallographica Section D Structural Biology | 54 papers in training set | Top 0.3% | 1.3%
14. Genome Medicine | 154 papers in training set | Top 6% | 1.0%
15. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 2% | 0.9%
16. Scientific Reports | 3102 papers in training set | Top 71% | 0.9%
17. International Journal of Medical Informatics | 25 papers in training set | Top 1% | 0.9%
18. Bioinformatics Advances | 184 papers in training set | Top 4% | 0.9%
19. Scientific Data | 174 papers in training set | Top 2% | 0.8%
20. European Journal of Epidemiology | 40 papers in training set | Top 0.7% | 0.8%
21. Heliyon | 146 papers in training set | Top 6% | 0.8%
22. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 6% | 0.8%
23. Cell Systems | 167 papers in training set | Top 12% | 0.8%
24. JAMIA Open | 37 papers in training set | Top 1% | 0.8%
25. BMC Medical Research Methodology | 43 papers in training set | Top 1% | 0.8%
26. Frontiers in Public Health | 140 papers in training set | Top 9% | 0.6%
27. eNeuro | 389 papers in training set | Top 11% | 0.5%
28. Communications Biology | 886 papers in training set | Top 32% | 0.5%
29. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 8% | 0.5%