
DEX: a consensus-based amino acid exchangeability measure for improved codon substitution modelling

Douglas, G. M.; Bobay, L.-M.

2026-03-12 bioinformatics
10.64898/2026.03.09.710665 bioRxiv

Physicochemically similar amino acids undergo more frequent substitutions compared to dissimilar amino acid pairs. Despite their clear potential, amino acid similarity matrices remain underused in molecular evolution, partially due to the high number of proposed amino acid distance measures and the lack of agreement on which are most accurate. In this study, we assessed the performance of 30 amino acid distance measures, including a new amino acid distance measure we developed based on recent deep mutational scanning data. We compared these measures across codon substitution models fit to alignments spanning Streptococcus, Drosophila, and mammalian lineages, as well as segregating variants across Escherichia coli strains and human genotypes. We further constructed consensus measures from combinations of top-performing measures in this analysis using the DISTATIS approach and retested these matrices. Our results show that experimentally derived measures, particularly our new measure and the existing experimental exchangeability (EX) measure, best fit codon substitution patterns across diverse lineages. We found that a consensus measure based on these two approaches, which we named DEX, performed best overall. In addition, although site-specific variant effect predictors are intended to identify deleterious mutations, the representative tools we tested did not outperform amino acid distance measures for predicting mean substitution frequencies. They were, however, substantially more informative for identifying individual highly deleterious mutations. Overall, we provide a systematic comparison of the performance of existing measures, and we introduce an improved general-purpose amino acid distance measure for molecular evolution models.

Significance

Protein-coding genes have long been a focus for researchers studying the strength and direction of selection. By studying non-synonymous substitutions, those that change amino acids, it is possible to estimate the relative strength of selection. Despite widespread interest in such approaches, information on which amino acids are exchanged is underused in molecular evolution models. This is partly because many different measures exist for quantifying amino acid distances, particularly those based on physicochemical properties. A newer class of amino acid distance measures is derived from deep mutational scanning datasets, where virtually every possible substitution is tested for its impact on protein function. We characterised and compared 30 amino acid distance measures, including a novel measure based on deep mutational scanning data. We highlight differences in how well these measures fit real substitution and polymorphism datasets. Overall, we find that DEX, which is a consensus of our new measure and an existing experimental exchangeability measure, represents the best available amino acid distance measure to incorporate into molecular evolution models.
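
The consensus step named in the abstract, DISTATIS, can be illustrated with a minimal Python sketch. This is a generic, textbook-style DISTATIS compromise and not necessarily the authors' exact pipeline: the function names, the equal item masses, the choice to square the input distances before double-centring, and the small random toy matrices (four hypothetical items, not real amino acid data) are all assumptions made for illustration.

```python
import numpy as np


def cross_product(dist):
    """Double-centre a distance matrix (squared here, by assumption)
    and scale the result by its largest eigenvalue."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    s = -0.5 * centring @ (dist ** 2) @ centring
    return s / np.linalg.eigvalsh(s)[-1]


def distatis_compromise(dists):
    """Return per-matrix weights and the weighted compromise matrix."""
    s_mats = [cross_product(d) for d in dists]
    k = len(s_mats)
    # RV coefficients measure how similar each pair of matrices is.
    rv = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            num = np.sum(s_mats[i] * s_mats[j])
            den = np.sqrt(np.sum(s_mats[i] ** 2) * np.sum(s_mats[j] ** 2))
            rv[i, j] = num / den
    # The leading eigenvector of the RV matrix, rescaled to sum to 1,
    # gives the weight of each input matrix in the compromise.
    _, vecs = np.linalg.eigh(rv)
    first = np.abs(vecs[:, -1])
    weights = first / first.sum()
    compromise = sum(w * s for w, s in zip(weights, s_mats))
    return weights, compromise


def consensus_distance(compromise):
    """Convert the compromise cross-product matrix back to distances."""
    diag = np.diag(compromise)
    squared = diag[:, None] + diag[None, :] - 2 * compromise
    return np.sqrt(np.clip(squared, 0, None))


# Toy input: three noisy distance matrices over four hypothetical items.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 2))
toy = []
for _ in range(3):
    pts = base + rng.normal(scale=0.1, size=base.shape)
    toy.append(np.linalg.norm(pts[:, None] - pts[None, :], axis=-1))

weights, compromise = distatis_compromise(toy)
consensus = consensus_distance(compromise)
```

In this sketch, input matrices that agree more closely with the others receive larger weights, so the consensus leans toward the shared structure rather than toward any single measure. With only two inputs, as for a two-measure consensus, the weights come out equal.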

Matching journals

The top 2 journals account for 50% of the predicted probability mass.