
AbiOmics: An End-to-End Pipeline to Train Machine Learning Models for Discrimination of Plant Abiotic Stresses Using Transcriptomic Profiling Data

Park, M.; Oh, Y.; Choi, W.; Jo, Y. D.

2026-02-27 bioinformatics
DOI: 10.64898/2026.02.25.707868 (bioRxiv)

Abiotic stresses are primary constraints on global crop productivity, reducing yields by up to 80%. While traditional phenotypic sensing detects stress only after physiological symptoms emerge and often fails to discriminate specific stressor types, transcriptomic profiling offers a high-dimensional solution, capturing rapid and sensitive molecular shifts. In this study, we developed AbiOmics, the first end-to-end machine learning pipeline specifically designed to identify and discriminate among multiple stressors. This approach represents a previously undocumented method for stress specification using large-scale transcriptomic data. We identified 320 stress-specific marker genes using a curated collection of 1,243 transcriptomes of Arabidopsis samples treated with four major abiotic stresses: salt, cold, heat, and drought. A single-layer perceptron model trained on these features achieved 91% accuracy during five-fold cross-validation and 93% accuracy on an independent test set. The model demonstrated an unprecedented capacity to generalize to multi-stress conditions, identifying concurrent signatures in combinatorial salt-and-heat treatments. By integrating marker identification with SHAP-based biological interpretation, AbiOmics provides a rigorously validated diagnostic tool superior to conventional sensing. This framework establishes a high-confidence labeling strategy for AI-driven crop management and precision breeding to mitigate climate change impacts.

[Graphical Abstract: Figure 1]
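The evaluation protocol named in the abstract (a single-layer perceptron scored by five-fold cross-validation over four stress classes) can be illustrated with a small sketch. This is not the AbiOmics code: the abstract does not state the output activation, optimizer, or preprocessing, so the softmax output, plain SGD, the unstratified fold split, and the synthetic data from the hypothetical `make_synthetic` helper are all assumptions made for illustration only.

```python
import math
import random

STRESSES = ["salt", "cold", "heat", "drought"]  # class labels named in the abstract


def make_synthetic(n_per_class=30, n_genes=8, seed=0):
    """Toy stand-in for expression data (NOT real transcriptomes)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(len(STRESSES)):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[2 * label] += 3.0      # hypothetical class-specific "marker" features
            row[2 * label + 1] += 3.0
            X.append(row)
            y.append(label)
    return X, y


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train(X, y, n_classes, epochs=30, lr=0.05, seed=0):
    """Single layer: one weight matrix plus bias, softmax output, plain SGD."""
    rng = random.Random(seed)
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            scores = [sum(w * x for w, x in zip(W[k], X[i])) + b[k]
                      for k in range(n_classes)]
            p = softmax(scores)
            for k in range(n_classes):
                grad = p[k] - (1.0 if k == y[i] else 0.0)  # cross-entropy gradient
                b[k] -= lr * grad
                for j in range(n_features):
                    W[k][j] -= lr * grad * X[i][j]
    return W, b


def predict(W, b, x):
    scores = [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]
    return scores.index(max(scores))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices once, then deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5):
    accuracies = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        W, b = train([X[i] for i in train_idx], [y[i] for i in train_idx],
                     len(STRESSES))
        correct = sum(predict(W, b, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


X, y = make_synthetic()
fold_acc = cross_validate(X, y)
print("per-fold accuracy:", [round(a, 2) for a in fold_acc])
print("mean accuracy:", round(sum(fold_acc) / len(fold_acc), 2))
```

Each sample is held out exactly once, so the mean of the five fold accuracies estimates performance on unseen samples. An independent test set, as reported in the abstract, would be kept out of this loop entirely.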

Matching journals

The top 4 journals are the smallest set that accounts for at least 50% of the predicted probability mass (56.2% combined).

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Plant Communications | 35 | Top 0.1% | 22.0% |
| 2 | Nature Communications | 4913 | Top 10% | 14.4% |
| 3 | Horticulture Research | 43 | Top 0.2% | 9.9% |
| 4 | Advanced Science | 249 | Top 1% | 9.9% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 5 | Molecular Plant | 36 | Top 0.2% | 6.2% |
| 6 | Cell Systems | 167 | Top 4% | 3.2% |
| 7 | Plant Biotechnology Journal | 56 | Top 0.5% | 2.3% |
| 8 | The Plant Journal | 197 | Top 3% | 1.5% |
| 9 | Briefings in Bioinformatics | 326 | Top 4% | 1.5% |
| 10 | New Phytologist | 309 | Top 4% | 1.3% |
| 11 | Plant Phenomics | 17 | Top 0.2% | 1.3% |
| 12 | Plant Physiology | 217 | Top 2% | 1.3% |
| 13 | Nature Plants | 84 | Top 1% | 1.2% |
| 14 | Genome Biology | 555 | Top 6% | 1.2% |
| 15 | Bioinformatics Advances | 184 | Top 4% | 1.2% |
| 16 | Cell Genomics | 162 | Top 5% | 1.1% |
| 17 | iScience | 1063 | Top 24% | 1.1% |
| 18 | Genomics, Proteomics & Bioinformatics | 171 | Top 5% | 0.9% |
| 19 | Science Advances | 1098 | Top 27% | 0.9% |
| 20 | Scientific Reports | 3102 | Top 72% | 0.9% |
| 21 | Bioinformatics | 1061 | Top 9% | 0.9% |
| 22 | Nature Machine Intelligence | 61 | Top 3% | 0.9% |
| 23 | Genome Medicine | 154 | Top 8% | 0.8% |
| 24 | PLOS ONE | 4510 | Top 67% | 0.8% |
| 25 | eLife | 5422 | Top 59% | 0.7% |
| 26 | Patterns | 70 | Top 3% | 0.7% |
| 27 | Nature Methods | 336 | Top 6% | 0.7% |
| 28 | Communications Biology | 886 | Top 30% | 0.6% |
| 29 | Nucleic Acids Research | 1128 | Top 20% | 0.6% |
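The 50% cut-off in the ranked list above can be checked by accumulating the listed probabilities in rank order. The sketch below copies the first six values from the list; the function name `journals_to_reach` is illustrative, and because the listed percentages are rounded to one decimal place, the totals are approximate.

```python
# (journal, predicted probability in percent), copied from the ranked list
ranked = [
    ("Plant Communications", 22.0),
    ("Nature Communications", 14.4),
    ("Horticulture Research", 9.9),
    ("Advanced Science", 9.9),
    ("Molecular Plant", 6.2),
    ("Cell Systems", 3.2),
]


def journals_to_reach(ranked, threshold=50.0):
    """Return (rank, cumulative mass) at the first rank reaching the threshold."""
    total = 0.0
    for rank, (_, prob) in enumerate(ranked, start=1):
        total += prob
        if total >= threshold:
            return rank, total
    return None, total  # threshold never reached with the rows supplied


k, mass = journals_to_reach(ranked)
print(k, round(mass, 1))  # prints: 4 56.2
```

The first three journals sum to 46.3%, so the fourth is the one that crosses the 50% line, which matches where the list places its divider.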