A modular Bayesian framework for inferring transmission networks from polyclonal infections, with application to Plasmodium falciparum

Murphy, M. R.; Nielsen, R.; Perkins, A.; Greenhouse, B.

2026-05-15 bioinformatics
10.64898/2026.05.14.725082 bioRxiv
Motivation: Molecular surveillance and infectious disease transmission network reconstruction can provide compelling evidence for estimating public-health quantities that are difficult to observe directly, including importation, source-sink structure, and differences in onward transmission across locations or intervention strata. These quantities can be expressed as functions of the underlying transmission network, but individual transmission events are rarely observed and many networks may be consistent with the same data. Existing transmission network reconstruction methods leveraging genetic data are often built for settings in which each infection has one dominant source, one representative haplotype, and mutation-driven genetic divergence along transmission chains. These assumptions are poorly matched to polyclonal infections, in which hosts carry multiple genetically distinct clones and recipient infections may reflect contributions from multiple sources. Such infections are common in malaria, tuberculosis, HIV, and many parasitic infections. Methods are needed that can accommodate these data.

Results: We present a modular Bayesian framework for estimating directed transmission on sampled cases, where an infection may have no sampled parent, one parent, or several parents, including sources outside the observed panel. Pathogen-specific modules supply likelihoods over candidate parent sets and connect to shared inference that yields marginal directed edge probabilities, posterior mean out-degree, and inclusion probabilities for unobserved parents. We demonstrate our framework with Plasmotrack, a transmission network model for Plasmodium falciparum that uses targeted amplicon sequencing data. We implemented these components with a per-locus allele-mixture transmission likelihood, an amplicon genotyping error model, and data augmentation allowing for unobserved parents. Simulations from a biologically informed generative model, under which the inferential per-locus allele-mixture likelihood is misspecified, showed recovery of aggregate network summaries including mean outdegree and mean unobserved-source inclusion, alongside high precision and recall for detecting directed transmission. Other pathogens can reuse the same modular composition after substituting transmission and observation likelihoods.

Availability: The Plasmotrack software and documentation are available at https://github.com/eppicenter/plasmotrack. Source code and example datasets are provided under an open-source license.

Contact: maxwell.murphy@ucsf.edu
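
The three posterior summaries named in the abstract (marginal directed edge probabilities, posterior mean out-degree, and inclusion probabilities for unobserved parents) can be illustrated with a minimal Python sketch. This is not Plasmotrack's API or the authors' implementation: the function name, the `UNOBSERVED` label, and the toy posterior samples are all hypothetical. It assumes only that inference produces posterior samples in which each case is assigned a set of parents.

```python
from collections import defaultdict

# Hypothetical label for a source outside the sampled panel (an assumption,
# not an identifier taken from Plasmotrack).
UNOBSERVED = "UNOBSERVED"


def summarize_parent_set_samples(samples, cases):
    """Summarize posterior samples of parent sets.

    samples: list of dicts mapping each case to a set of parents, where a
             parent is another case or the UNOBSERVED label.
    cases:   list of sampled case identifiers.
    """
    n = len(samples)
    edge_counts = defaultdict(int)
    unobserved_counts = defaultdict(int)
    for sample in samples:
        for child, parents in sample.items():
            for parent in parents:
                if parent == UNOBSERVED:
                    unobserved_counts[child] += 1
                else:
                    edge_counts[(parent, child)] += 1

    # Marginal probability of each directed edge parent -> child.
    edge_prob = {edge: count / n for edge, count in edge_counts.items()}

    # Posterior mean out-degree: by linearity of expectation, the sum of
    # marginal probabilities over a case's outgoing edges.
    out_degree = {case: 0.0 for case in cases}
    for (parent, _child), prob in edge_prob.items():
        out_degree[parent] += prob

    # Probability that a case has an unobserved parent.
    unobserved_prob = {case: unobserved_counts[case] / n for case in cases}
    return edge_prob, out_degree, unobserved_prob


# Toy posterior samples, invented purely for illustration.
cases = ["A", "B", "C"]
samples = [
    {"A": {UNOBSERVED}, "B": {"A"}, "C": {"A", "B"}},
    {"A": {UNOBSERVED}, "B": {"A"}, "C": {"B"}},
    {"A": set(), "B": {"A", UNOBSERVED}, "C": {"A"}},
    {"A": {UNOBSERVED}, "B": set(), "C": {"A", "B"}},
]
edge_prob, out_degree, unobserved_prob = summarize_parent_set_samples(samples, cases)
```

In this toy example a case may have zero, one, or several parents within a single sample, which is the feature that distinguishes the polyclonal setting from single-source reconstruction.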

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

Rank. Journal; papers in training set; percentile; predicted probability

1. PLOS Computational Biology; 1633 papers in training set; Top 0.3%; 28.4%
2. Bioinformatics; 1061 papers in training set; Top 0.8%; 26.5%
(50% of probability mass above)
3. BMC Bioinformatics; 383 papers in training set; Top 1%; 6.5%
4. Nature Communications; 4913 papers in training set; Top 39%; 3.7%
5. Methods in Ecology and Evolution; 160 papers in training set; Top 1%; 1.7%
6. The American Journal of Human Genetics; 206 papers in training set; Top 2%; 1.7%
7. PLOS ONE; 4510 papers in training set; Top 56%; 1.5%
8. Genome Medicine; 154 papers in training set; Top 5%; 1.5%
9. eLife; 5422 papers in training set; Top 45%; 1.5%
10. Genetics; 225 papers in training set; Top 3%; 1.5%
11. Journal of the American Medical Informatics Association; 61 papers in training set; Top 1%; 1.4%
12. Cell Systems; 167 papers in training set; Top 9%; 1.3%
13. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 41%; 0.9%
14. Microbial Genomics; 204 papers in training set; Top 2%; 0.8%
15. Nature Methods; 336 papers in training set; Top 6%; 0.8%
16. Scientific Reports; 3102 papers in training set; Top 73%; 0.8%
17. Genome Research; 409 papers in training set; Top 4%; 0.8%
18. PLOS Genetics; 756 papers in training set; Top 15%; 0.7%
19. Cell Reports Methods; 141 papers in training set; Top 5%; 0.7%
20. Peer Community Journal; 254 papers in training set; Top 4%; 0.7%
21. Nature Microbiology; 133 papers in training set; Top 5%; 0.7%
22. Epidemics; 104 papers in training set; Top 2%; 0.7%
23. Patterns; 70 papers in training set; Top 3%; 0.7%
24. Virus Evolution; 140 papers in training set; Top 2%; 0.7%
25. Nucleic Acids Research; 1128 papers in training set; Top 19%; 0.7%
26. IEEE Access; 31 papers in training set; Top 1%; 0.5%
27. Bioinformatics Advances; 184 papers in training set; Top 6%; 0.5%
28. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 3%; 0.5%
29. GigaScience; 172 papers in training set; Top 4%; 0.5%
30. Genome Biology; 555 papers in training set; Top 9%; 0.5%
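
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked with a short Python sketch. It assumes the cutoff means the smallest number of top-ranked journals whose cumulative predicted probability reaches at least 50%; the function name is hypothetical and the values are the first five probabilities from the listing.

```python
def smallest_top_k(probabilities, threshold):
    """Return (k, cumulative mass) for the smallest top-k set reaching threshold."""
    total = 0.0
    for k, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        if total >= threshold:
            return k, total
    return None  # threshold never reached


# Predicted probabilities (percent) of the five top-ranked journals.
top_five = [28.4, 26.5, 6.5, 3.7, 1.7]
k, mass = smallest_top_k(top_five, 50.0)
```

Under this reading the first journal alone (28.4%) falls short, while the first two together (28.4% + 26.5% = 54.9%) cross the 50% line.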