Sequence effects on patterns of variation and DNA strand asymmetries observed from whole-genome sequenced UK Biobank participants
Curtis, D.
UK Biobank has released whole-genome sequence data for 500,000 participants, including allele counts for hundreds of millions of variants. These variants were considered in the context of the pentanucleotide background on which they occurred. Frequencies of singleton variants were obtained and compared with frequencies of more common variants. Results were highly correlated across chromosomes, reflecting systematic effects. C>T singleton variants were less frequent in the CG context, but the opposite was true for more common variants, suggesting that C>T variants in this context are relatively well tolerated and not subject to strong negative selection. The frequencies of singleton variant types were strongly influenced by their trinucleotide context, and the total counts of variants in each trinucleotide context could be well approximated by combining five mutational signatures obtained from genomes of cancer cells. For some variant types, there were marked asymmetries in counts between the plus and minus DNA strands. The patterns of these asymmetries for singleton variants differed between chromosomes, with five chromosomes being negatively correlated with the rest. These asymmetries did not appear to be related to strand-specific gene content. There were also strand asymmetries for some pentanucleotide sequences in the reference genome, and these were consistent across chromosomes. For example, the sequence TTCGT is seen 673300 times on the plus strand but only 465807 times on the minus strand. These findings must reflect strand-specific mechanisms affecting mutation and selection, which are not currently well understood and which could be investigated further. This research has been conducted using the UK Biobank Resource.
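The strand comparison for a pentanucleotide such as TTCGT can be illustrated with a minimal Python sketch. It is not the author's pipeline. It assumes that the "minus strand" count means occurrences of the sequence read 5' to 3' on the minus strand, which equals the number of occurrences of its reverse complement (ACGAA) on the plus strand. The function names and the toy sequence are hypothetical; the two reported counts are taken from the abstract.

```python
# Sketch only: helper names and the toy sequence are illustrative assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_overlapping(seq: str, kmer: str) -> int:
    """Count occurrences of kmer in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(kmer)
    while start != -1:
        count += 1
        start = seq.find(kmer, start + 1)
    return count


def strand_counts(plus_seq: str, kmer: str) -> tuple[int, int]:
    """Count kmer on the plus strand and, via its reverse complement, on the minus strand."""
    plus_seq = plus_seq.upper()
    plus = count_overlapping(plus_seq, kmer)
    minus = count_overlapping(plus_seq, reverse_complement(kmer))
    return plus, minus


# Toy plus-strand sequence (hypothetical, not real reference data).
toy_counts = strand_counts("TTCGTACGAATTCGT", "TTCGT")

# Counts reported in the abstract for TTCGT in the reference genome.
reported_plus, reported_minus = 673300, 465807
reported_ratio = reported_plus / reported_minus
print(toy_counts, round(reported_ratio, 2))
```

With the reported counts, the plus-strand count is roughly 1.45 times the minus-strand count. Applying `strand_counts` to a real reference chromosome would also require a decision on how to treat soft-masked and `N` bases, which the abstract does not specify.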
Matching journals
The top 6 journals account for 50% of the predicted probability mass.