
The Limits of Cross-Species WGCNA: Library Imbalance and Signal Dilution Constrain Effector Gene Recovery in Dual-Organism RNA-seq

Fenn, A.; Hueckelhoven, R.; Kamal, N.

2026-05-05 systems biology
10.64898/2026.04.30.721941 bioRxiv

Dual-organism RNA sequencing (RNA-seq) experiments, in which the transcriptomes of a host and a microbe are sequenced simultaneously, are increasingly used to study plant-microbe interactions. A central analytical goal is to identify effector proteins and their host targets through gene co-expression. Weighted Gene Co-expression Network Analysis (WGCNA) is the dominant tool for gene co-expression analysis, yet its ability to recover interaction-interface genes from a merged dual-organism matrix has not been systematically characterised. Here we present a simulation framework that uses real gene models from Hordeum vulgare (barley) and Blumeria graminis f. sp. hordei M.Liu & Hambl (powdery mildew). We use it to evaluate single-network WGCNA across a gradient of plant-to-fungal library size ratios (1:1 to 20:1), three levels of co-expression signal strength, and three WGCNA network construction types (signed, unsigned, signed hybrid). We embed 20 model effector genes (bridge genes) driven by a mixed host-pathogen eigengene. Recovery is evaluated with four metrics aligned with the biological objective: cross-species hub rank, top-decile hub enrichment, bridge gene detection rate, and bridge co-separation (the fraction of effector-target pairs co-assigned to the same detected module). Across 225 simulation runs (15 conditions × 5 replicates × 3 network types), bridge genes are robustly identifiable as cross-species connectivity hubs (mean rank 0.92 versus 0.50 for module genes). However, co-assignment of effector-target pairs to the same module fails in 41% of runs because scale-free topology collapses. Signal strength (η² = 0.12) and library ratio (η² = 0.22) are the primary determinants of co-separation, while the choice of network type accounts for less than 2%. A read-depth bias systematically inflates the hub ranks of pathogen genes relative to those of host genes at high plant-to-fungal library ratios.
These results establish that the method can identify effector candidates as cross-species hubs under a broad range of conditions. Reliable co-assignment, however, requires adequate pathogen read depth and a strong co-expression signal: properties that experimental design, rather than analytical parameterisation, must provide.
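The abstract names three WGCNA network types and several recovery metrics but gives no formulas. The Python sketch below shows one plausible way to compute them under stated assumptions. It is not the authors' code.

The three adjacency definitions follow those documented for WGCNA's `adjacency()` function. Everything else is an assumption:

- All function names are hypothetical.
- Cross-species connectivity is taken as a gene's mean adjacency to the other organism's genes. It is then converted to a normalised rank in [0, 1], so an unremarkable gene sits near 0.5. This appears consistent with the reported 0.92 versus 0.50.
- "Top decile" is taken as a normalised rank of at least 0.9.
- Module label 0 is treated as unassigned, as for WGCNA's grey module.

Module detection itself (clustering and tree cutting) is not shown; module labels are taken as input.

```python
import numpy as np


def wgcna_adjacency(expr, beta=6.0, network_type="signed"):
    """Soft-thresholded adjacency from a samples x genes matrix, using the
    unsigned / signed / signed hybrid definitions documented for WGCNA."""
    r = np.clip(np.corrcoef(expr, rowvar=False), -1.0, 1.0)  # gene-gene Pearson r
    if network_type == "unsigned":
        a = np.abs(r) ** beta
    elif network_type == "signed":
        a = ((1.0 + r) / 2.0) ** beta
    elif network_type == "signed hybrid":
        a = np.where(r > 0, r, 0.0) ** beta  # non-positive correlations become 0
    else:
        raise ValueError(f"unknown network type: {network_type!r}")
    np.fill_diagonal(a, 0.0)  # no self-connections
    return a


def cross_species_hub_rank(adj, is_pathogen):
    """Hypothetical metric: mean adjacency of each gene to genes of the *other*
    organism, converted to a normalised rank in [0, 1] (1 = strongest hub)."""
    is_pathogen = np.asarray(is_pathogen, dtype=bool)
    other = is_pathogen[:, None] != is_pathogen[None, :]  # cross-species mask
    k_cross = (adj * other).sum(axis=1) / other.sum(axis=1)
    ranks = k_cross.argsort(kind="stable").argsort(kind="stable")  # 0 = weakest
    return ranks / (len(k_cross) - 1)


def top_decile_enrichment(hub_rank, bridge_idx):
    """Hypothetical metric: fraction of bridge genes with normalised rank >= 0.9,
    divided by the 10% expected by chance (1.0 = no enrichment)."""
    return float(np.mean(hub_rank[bridge_idx] >= 0.9) / 0.1)


def co_separation(labels, pairs, unassigned=0):
    """Fraction of (effector, target) index pairs placed in the same detected
    module; a pair with an unassigned gene (assumed label 0) counts as a miss."""
    labels = np.asarray(labels)
    hits = [labels[i] == labels[j] and labels[i] != unassigned for i, j in pairs]
    return float(np.mean(hits))


# Toy usage on random data (not the paper's simulation design).
rng = np.random.default_rng(1)
n_samples, n_host, n_path = 30, 40, 20
expr = rng.normal(size=(n_samples, n_host + n_path))
is_pathogen = np.arange(n_host + n_path) >= n_host
shared = rng.normal(size=n_samples)  # stand-in for a mixed host-pathogen eigengene
for g in (0, n_host):  # host gene 0 and pathogen gene 40: a hypothetical bridge pair
    expr[:, g] = shared + 0.3 * rng.normal(size=n_samples)
adj = wgcna_adjacency(expr, beta=6, network_type="signed")
rank = cross_species_hub_rank(adj, is_pathogen)
print("bridge gene hub ranks:", rank[[0, n_host]])
```

In this sketch, cross-species adjacency is averaged rather than summed. With unequal gene counts, a sum would favour genes of the organism with fewer genes, because each of them has more cross-species partners. This choice would not by itself address the read-depth bias described in the abstract, which arises in the expression data.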

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Methods (336 papers in training set; Top 0.8%; 12.4% probability)
2. Cell Reports Methods (141 papers in training set; Top 0.1%; 12.3% probability)
3. Cell Systems (167 papers in training set; Top 1%; 9.0% probability)
4. Genome Biology (555 papers in training set; Top 0.7%; 8.2% probability)
5. Molecular Systems Biology (142 papers in training set; Top 0.1%; 4.8% probability)
6. PLOS Computational Biology (1633 papers in training set; Top 8%; 4.2% probability)
50% of probability mass above
7. Bioinformatics (1061 papers in training set; Top 5%; 4.1% probability)
8. Nature Communications (4913 papers in training set; Top 38%; 3.9% probability)
9. Bioinformatics Advances (184 papers in training set; Top 1%; 3.6% probability)
10. BMC Bioinformatics (383 papers in training set; Top 3%; 3.5% probability)
11. Genome Research (409 papers in training set; Top 1%; 3.2% probability)
12. npj Systems Biology and Applications (99 papers in training set; Top 0.9%; 2.0% probability)
13. Nature Biotechnology (147 papers in training set; Top 4%; 1.9% probability)
14. Briefings in Bioinformatics (326 papers in training set; Top 4%; 1.7% probability)
15. Methods in Ecology and Evolution (160 papers in training set; Top 1%; 1.7% probability)
16. Nature Microbiology (133 papers in training set; Top 2%; 1.7% probability)
17. Nucleic Acids Research (1128 papers in training set; Top 11%; 1.7% probability)
18. Nature Protocols (30 papers in training set; Top 0.1%; 1.5% probability)
19. PLOS ONE (4510 papers in training set; Top 57%; 1.5% probability)
20. Nature Genetics (240 papers in training set; Top 6%; 1.2% probability)
21. Genome Medicine (154 papers in training set; Top 6%; 1.2% probability)
22. iScience (1063 papers in training set; Top 22%; 1.2% probability)
23. Scientific Reports (3102 papers in training set; Top 67%; 1.2% probability)
24. Cell Reports (1338 papers in training set; Top 30%; 0.9% probability)
25. eLife (5422 papers in training set; Top 52%; 0.9% probability)
26. BMC Genomics (328 papers in training set; Top 6%; 0.7% probability)
27. Microbiome (139 papers in training set; Top 3%; 0.7% probability)
28. Communications Biology (886 papers in training set; Top 25%; 0.7% probability)
29. Patterns (70 papers in training set; Top 3%; 0.7% probability)
30. NAR Genomics and Bioinformatics (214 papers in training set; Top 4%; 0.6% probability)