Statistical inference of the Tree of Blobs of a phylogenetic network from quartet concordance factors

Rhodes, J. A.; Allman, E. S.; Ane, C.; Banos, H.

2026-05-31 evolutionary biology
10.64898/2026.05.28.728501 bioRxiv

A phylogenetic network represents evolutionary relationships involving hybridization, gene flow, or admixture. While the full network may not be identifiable from genomic data under common coalescent models, its tree of blobs, which depicts only the tree-like portions of the network structure, is identifiable. We introduce ECToBlob (Edge Contraction for Tree of Blobs), a new statistically consistent algorithm to estimate the tree of blobs from quartet concordance factors. Starting from a resolved tree, ECToBlob successively contracts edges that statistical tests indicate do not belong in the tree of blobs, because of reticulate or polytomous signal. We show that ASTRAL provides a valid starting tree under common assumptions, in that, asymptotically in the number of loci, trees optimizing ASTRAL's criterion refine the tree of blobs. We describe several algorithm variants, differing in how evidence from multiple tests is combined to determine whether an edge should be contracted, and provide software implementations.

Relevance to Life Sciences

Hybridization, gene flow, and admixture are now recognized as important aspects of evolutionary history, but their genomic signal is confounded with that from a coalescent process, creating substantial challenges for inferring phylogenetic networks. The network's tree of blobs identifies areas where reticulation occurred, separated by tree-like branching. ECToBlob quickly estimates the tree of blobs using quartet concordance factors from gene trees, and provides a measure of statistical support for its result. Performance is illustrated through simulation and on empirical data, using an implementation in the R package MSCquartets. While the presence of a blob may be all that can be inferred in some cases, in others ECToBlob offers a robust and principled way to focus further analyses on more local reticulate structure.

Mathematical Content

This work makes contributions to mathematical phylogenetics in optimization, combinatorics, and statistics. First, we show that any tree maximizing quartet support (the criterion underlying ASTRAL) is a refinement of the network's tree of blobs under the coalescent model. Second, we give a concise proof that whether a network has a cut-edge corresponding to a given split is determined by information in certain subcollections of its 4-taxon subnetworks (quarnets). Finally, we propose valid statistical approaches for combining p-values across multiple quarnet hypothesis tests, proving that their use with specific decreasing test levels leads to statistically consistent inference as the number of loci grows.

MSC codes: 05C90, 60J95, 62-04, 62F07, 92D15
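The edge-contraction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the ECToBlob implementation and does not use the MSCquartets API. It rests on several assumptions that are not in the source: the tree is stored as its set of nontrivial splits, each edge comes with a hypothetical list of p-values from quartet tests whose null hypothesis is tree-likeness, the p-values are combined by a Bonferroni correction (one standard choice, which may differ from the combination rules the paper proposes), and every edge is tested once rather than successively. The names `bonferroni_combine`, `contract_edges`, and `alpha` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Set

# A split is stored as the taxa on one side of an internal edge (assumed encoding).
Split = FrozenSet[str]


def bonferroni_combine(p_values: List[float]) -> float:
    """Bonferroni-combined p-value: min(1, m * smallest p-value)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return min(1.0, len(p_values) * min(p_values))


def contract_edges(
    splits: Set[Split],
    p_by_split: Dict[Split, List[float]],
    alpha: float,
) -> Set[Split]:
    """Return the splits that survive contraction.

    Assumption: a small combined p-value rejects tree-likeness for the
    quartets associated with an edge, so that edge is contracted
    (its split is dropped). Polytomy testing is omitted from this sketch.
    """
    kept: Set[Split] = set()
    for split in splits:
        combined = bonferroni_combine(p_by_split[split])
        if combined >= alpha:  # no evidence against the edge: keep it
            kept.add(split)
    return kept


if __name__ == "__main__":
    # Resolved unrooted tree on taxa a..f with three internal edges.
    ab = frozenset({"a", "b"})
    abc = frozenset({"a", "b", "c"})
    ef = frozenset({"e", "f"})

    # Made-up p-values, for illustration only.
    p_values = {
        ab: [0.4, 0.7],
        abc: [0.001, 0.3, 0.9],
        ef: [0.2],
    }

    result = contract_edges({ab, abc, ef}, p_values, alpha=0.05)
    print(sorted(sorted(s) for s in result))
```

In this toy input, only the edge for split `{a, b, c}` has a combined p-value below `alpha`, so it is the only edge contracted. Dropping a split from the set corresponds to merging the two endpoints of that edge into a single, higher-degree node.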

Matching journals

The top 3 journals account for 50% of the predicted probability mass.