FAIRSCAPE: An Evolving AI-readiness Framework for Biomedical Research

Al Manir, S.; Levinson, M. A.; Niestroy, J.; Churas, C.; Parker, J. A.; Clark, T.

2024-12-23 · bioinformatics
DOI: 10.1101/2024.12.23.629818 (bioRxiv)
Objective: Biomedical datasets intended for use in AI applications require packaging with rich pre-model metadata to support model development that is explainable, ethical, epistemically grounded, and FAIR (Findable, Accessible, Interoperable, Reusable).

Methods: We developed FAIRSCAPE, a digital commons environment, using agile methods, in close alignment with the team developing the AI-readiness criteria and with the Bridge2AI data production teams. Work was initially based on an existing provenance-aware framework for clinical machine learning. We incrementally added RO-Crate data+metadata packaging and exchange methods, client-side packaging support, provenance visualization, and support metadata mapped to the AI-readiness criteria, with automated AI-readiness evaluation. LinkML semantic enrichment and Croissant ML-ecosystem translations were also incorporated.

Results: The FAIRSCAPE framework generates, packages, evaluates, and manages critical pre-model AI-readiness and explainability information with descriptive metadata and deep provenance graphs for biomedical datasets. It provides ethical, schema, statistical, and semantic characterization of dataset releases, licensing and availability information, and an automated AI-readiness evaluation across all 28 AI-readiness criteria. We applied this framework to successive, large-scale releases of multimodal datasets, progressively increasing dataset AI-readiness to full compliance.

Conclusion: FAIRSCAPE enables AI-readiness in biomedical datasets using standard metadata components and has been used to establish this pattern across a major, multimodal NIH data generation program. It eliminates early-stage opacity apparent in many biomedical AI applications and provides a basis for establishing end-to-end AI explainability.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank  Journal                                             Papers in training set  Percentile  Probability
 1    GigaScience                                          172                    Top 0.1%    28.3%
 2    Journal of the American Medical Informatics           61                    Top 0.1%    19.0%
      Association
 3    Database                                              51                    Top 0.1%     4.4%
----- 50% of probability mass above this line -----
 4    PLOS ONE                                            4510                    Top 35%      4.0%
 5    npj Digital Medicine                                  97                    Top 1%       3.7%
 6    PLOS Digital Health                                   91                    Top 0.7%     3.7%
 7    Bioinformatics                                      1061                    Top 6%       2.1%
 8    BMC Medical Informatics and Decision Making           39                    Top 1%       1.7%
 9    Nature Communications                               4913                    Top 50%      1.7%
10    Scientific Reports                                  3102                    Top 57%      1.7%
11    Bioinformatics Advances                              184                    Top 3%       1.7%
12    Journal of Biomedical Informatics                     45                    Top 0.9%     1.5%
13    JCO Clinical Cancer Informatics                       18                    Top 0.5%     1.4%
14    Biology Methods and Protocols                         53                    Top 1%       1.4%
15    PLOS Biology                                         408                    Top 13%      1.3%
16    PLOS Computational Biology                          1633                    Top 19%      1.3%
17    Patterns                                              70                    Top 1%       1.3%
18    The Lancet Digital Health                             25                    Top 0.7%     1.1%
19    Computational and Structural Biotechnology Journal   216                    Top 7%       0.9%
20    iScience                                            1063                    Top 28%      0.8%
21    BMC Bioinformatics                                   383                    Top 7%       0.8%
22    Journal of Translational Medicine                     46                    Top 3%       0.8%
23    BioData Mining                                        15                    Top 0.8%     0.8%
24    JMIR Medical Informatics                              17                    Top 2%       0.7%
25    International Journal of Medical Informatics          25                    Top 2%       0.7%
26    Genome Medicine                                      154                    Top 8%       0.7%
27    Computer Methods and Programs in Biomedicine          27                    Top 1%       0.7%
28    Cureus                                                67                    Top 6%       0.7%
29    Med                                                   38                    Top 1%       0.7%
30    Nucleic Acids Research                              1128                    Top 19%      0.7%