Interpreting GC content differences across populations at polymorphic sites

Chandra, S.; Gao, Z.

2026-05-18 evolutionary biology
10.64898/2026.05.16.725686 bioRxiv

Recent studies have reported consistent inter-population differences in GC content at polymorphic sites in multiple species, including humans. Specifically, populations that experienced recent bottlenecks exhibit lower average GC content (GC%) at common polymorphic sites than non-bottlenecked groups, an observation previously interpreted as an indication of rapid evolution of base composition. In this study, we investigate the evolutionary and technical factors driving these patterns across humans, mice, maize, and silkworm. We find that GC% at polymorphic sites is highly sensitive to the allele frequency threshold applied. Relaxing this threshold reduces inter-population differences to negligible levels in humans and significantly attenuates similar signals in other species. We further observe substantial GC% variation across allele frequency bins, a pattern driven by the differential abundance of different mutation types. We demonstrate that these observations are collectively driven by an interaction between demographic history and a universal excess of strong-to-weak mutations relative to weak-to-strong mutations, which is counteracted by GC-biased gene conversion (gBGC) over long evolutionary timescales. Forward-in-time simulations with realistic parameters recapitulate the observed patterns of GC% variation across both populations and allele frequency bins. Overall, our findings reveal that base composition at polymorphic sites is strongly shaped by the interaction among demographic history, mutation bias, and gBGC, and does not represent stable, genome-wide trends. Consequently, inter-population differences in GC content, especially at common variants, should not be interpreted as evidence of ongoing divergence in base composition or shifts in mutation patterns.
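The threshold sensitivity described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that GC% at polymorphic sites is the mean population frequency of the G or C allele across biallelic sites passing a minor allele frequency (MAF) cutoff; the abstract does not give the exact definition. The names `Site`, `mutation_class`, and `gc_percent` are hypothetical, and the toy sites are invented for illustration only.

```python
from typing import Iterable, NamedTuple

STRONG = {"G", "C"}  # "strong" bases
WEAK = {"A", "T"}    # "weak" bases


class Site(NamedTuple):
    """A biallelic polymorphic site (hypothetical record type)."""
    ancestral: str
    derived: str
    derived_freq: float  # derived allele frequency in one population, 0..1


def mutation_class(site: Site) -> str:
    """Classify a site as strong-to-weak, weak-to-strong, or GC-neutral."""
    if site.ancestral in STRONG and site.derived in WEAK:
        return "S>W"
    if site.ancestral in WEAK and site.derived in STRONG:
        return "W>S"
    return "neutral"  # S>S or W>W changes leave GC content unchanged


def gc_percent(sites: Iterable[Site], min_maf: float = 0.0) -> float:
    """Mean GC allele frequency (in percent) over sites with MAF >= min_maf.

    Assumed definition: per site, the summed frequency of G/C alleles,
    averaged over all sites that pass the threshold.
    """
    total = 0.0
    n = 0
    for s in sites:
        maf = min(s.derived_freq, 1.0 - s.derived_freq)
        if maf <= 0.0 or maf < min_maf:
            continue  # skip monomorphic sites and sites below the cutoff
        gc_freq = 0.0
        if s.derived in STRONG:
            gc_freq += s.derived_freq
        if s.ancestral in STRONG:
            gc_freq += 1.0 - s.derived_freq
        total += gc_freq
        n += 1
    if n == 0:
        raise ValueError("no polymorphic sites pass the MAF threshold")
    return 100.0 * total / n


# Toy data (invented): rare variants are mostly S>W, so the ancestral
# G/C allele is still at high frequency at those sites.
toy_sites = [
    Site("G", "A", 0.05),  # rare S>W
    Site("C", "T", 0.10),  # rare S>W
    Site("A", "G", 0.05),  # rare W>S
    Site("G", "A", 0.40),  # common S>W
    Site("T", "C", 0.30),  # common W>S
]

gc_all = gc_percent(toy_sites, min_maf=0.0)     # every polymorphic site
gc_common = gc_percent(toy_sites, min_maf=0.2)  # common variants only
```

In this toy set, the excess of rare strong-to-weak sites raises GC% when all variants are included and lowers it once the set is restricted to common variants. This is the kind of threshold dependence the abstract describes, though the magnitude here carries no biological meaning.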

Matching journals

The top 2 journals account for 50% of the predicted probability mass.