
Genome-AC-GAN: Enhancing Synthetic Genotype Generation through Auxiliary Classification

Ahronoviz, S.; Gronau, I.

2024-02-16 genomics
10.1101/2024.02.14.580420 bioRxiv

In recent years, there have been increasing attempts to develop computational methods for generating synthetic genomic data that aim to mimic real genomic datasets. Artificial genomes (AGs) generated by these methods have emerged as a promising potential solution to privacy concerns raised by public genomic datasets and as a means to provide adequate representation of under-sampled populations. However, existing methods for generating AGs have very limited capability to faithfully capture features of different sub-populations within a larger cohort. In this study, we propose a novel method called the Genome Auxiliary Classifier Generative Adversarial Network (Genome-AC-GAN), which generates AGs tailored to specific sub-populations. We conducted experiments to evaluate the performance of the Genome-AC-GAN and compared the AGs it generates with real genomic data as well as with AGs generated by previously published methods. The Genome-AC-GAN outperforms other methods and faithfully models population structure, which is not adequately captured by existing methods. We also demonstrate the use of AGs generated by the Genome-AC-GAN to augment datasets used as training sets for classifying genomes into populations. These experiments demonstrate the benefits of AGs in enhancing classification accuracy, especially when dealing with under-sampled and closely related populations.
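The abstract does not describe the Genome-AC-GAN architecture, so the following is only a minimal sketch of the generic auxiliary-classifier GAN (AC-GAN) objective that the method's name refers to, applied to toy binary genotype vectors. The discriminator has two heads: one scores real versus synthetic, and one predicts the sub-population label. The generator is conditioned on a requested population label. Everything here is an assumption for illustration: the sizes `N_SNPS`, `N_POPS`, `LATENT`; the single-layer random-weight "networks"; and the common non-saturating generator loss. None of it is the authors' model, and no training loop is included.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SNPS = 50   # hypothetical number of SNP positions per haplotype
N_POPS = 3    # hypothetical number of sub-populations (class labels)
LATENT = 16   # hypothetical latent-noise dimension
EPS = 1e-8    # guards against log(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    z = x - x.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def one_hot(labels, k):
    return np.eye(k)[labels]


# Toy single-layer "networks" with random weights, standing in for trained models.
W_g = rng.normal(scale=0.1, size=(LATENT + N_POPS, N_SNPS))
W_src = rng.normal(scale=0.1, size=(N_SNPS, 1))       # real-vs-synthetic head
W_cls = rng.normal(scale=0.1, size=(N_SNPS, N_POPS))  # auxiliary population head


def generator(noise, labels):
    """Map noise plus a one-hot population label to per-SNP allele probabilities."""
    z = np.concatenate([noise, one_hot(labels, N_POPS)], axis=1)
    return sigmoid(z @ W_g)


def discriminator(x):
    """Return P(real | x) and P(population | x) from two heads on the same input."""
    return sigmoid(x @ W_src).ravel(), softmax(x @ W_cls)


def bce(p, target):
    return -np.mean(target * np.log(p + EPS) + (1 - target) * np.log(1 - p + EPS))


def class_ce(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + EPS))


def ac_gan_losses(x_real, y_real, noise, y_fake):
    x_fake = generator(noise, y_fake)
    src_real, cls_real = discriminator(x_real)
    src_fake, cls_fake = discriminator(x_fake)
    # Discriminator: separate real from synthetic AND classify both by population.
    d_loss = (bce(src_real, 1.0) + bce(src_fake, 0.0)
              + class_ce(cls_real, y_real) + class_ce(cls_fake, y_fake))
    # Generator (non-saturating form): fool the source head while producing
    # samples that the auxiliary head assigns to the requested population.
    g_loss = bce(src_fake, 1.0) + class_ce(cls_fake, y_fake)
    return d_loss, g_loss, x_fake


# Toy batch: random binary "genotypes" with random population labels.
x_real = rng.integers(0, 2, size=(8, N_SNPS)).astype(float)
y_real = rng.integers(0, N_POPS, size=8)
noise = rng.normal(size=(8, LATENT))
y_fake = rng.integers(0, N_POPS, size=8)

d_loss, g_loss, x_fake = ac_gan_losses(x_real, y_real, noise, y_fake)
# Draw binary alleles from the generator's per-SNP probabilities.
synthetic = (rng.random(x_fake.shape) < x_fake).astype(int)
print(f"D loss {d_loss:.3f}, G loss {g_loss:.3f}, synthetic shape {synthetic.shape}")
```

The auxiliary classification term is what lets a single generator be asked for genomes from a specific sub-population: the requested label enters the generator's input, and the generator is penalized when the discriminator's population head does not recover that label.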

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Frontiers in Genetics (197 papers in training set; top 0.2%; predicted probability 12.3%)
2. Bioinformatics (1061 papers in training set; top 2%; predicted probability 12.3%)
3. IEEE Transactions on Computational Biology and Bioinformatics (17 papers in training set; top 0.1%; predicted probability 10.0%)
4. PLOS Computational Biology (1633 papers in training set; top 4%; predicted probability 8.4%)
5. Scientific Reports (3102 papers in training set; top 24%; predicted probability 4.8%)
6. Briefings in Bioinformatics (326 papers in training set; top 1%; predicted probability 4.3%)
(50% of probability mass above this line)
7. Nature Communications (4913 papers in training set; top 40%; predicted probability 3.6%)
8. PLOS Genetics (756 papers in training set; top 5%; predicted probability 3.2%)
9. Nature Computational Science (50 papers in training set; top 0.4%; predicted probability 2.1%)
10. Nature Machine Intelligence (61 papers in training set; top 1%; predicted probability 2.1%)
11. iScience (1063 papers in training set; top 10%; predicted probability 2.1%)
12. Genome Research (409 papers in training set; top 2%; predicted probability 2.1%)
13. PLOS ONE (4510 papers in training set; top 48%; predicted probability 2.1%)
14. Bioinformatics Advances (184 papers in training set; top 2%; predicted probability 2.1%)
15. Cell Genomics (162 papers in training set; top 3%; predicted probability 1.7%)
16. Computational and Structural Biotechnology Journal (216 papers in training set; top 5%; predicted probability 1.5%)
17. Nature Genetics (240 papers in training set; top 5%; predicted probability 1.5%)
18. Genome Medicine (154 papers in training set; top 5%; predicted probability 1.3%)
19. Cell Systems (167 papers in training set; top 9%; predicted probability 1.2%)
20. Genome Biology (555 papers in training set; top 6%; predicted probability 1.2%)
21. Journal of Computational Biology (37 papers in training set; top 0.4%; predicted probability 0.9%)
22. Communications Biology (886 papers in training set; top 17%; predicted probability 0.9%)
23. BMC Bioinformatics (383 papers in training set; top 7%; predicted probability 0.8%)
24. GigaScience (172 papers in training set; top 3%; predicted probability 0.7%)
25. Nucleic Acids Research (1128 papers in training set; top 18%; predicted probability 0.7%)
26. European Journal of Human Genetics (49 papers in training set; top 1%; predicted probability 0.7%)
27. Genes (126 papers in training set; top 3%; predicted probability 0.7%)
28. Cell Reports Methods (141 papers in training set; top 6%; predicted probability 0.6%)
29. BMC Genomics (328 papers in training set; top 7%; predicted probability 0.6%)
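The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. This is a small sketch: the values are copied from the ranking, and because they are displayed rounded to one decimal, the sum is only approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) of the top-ranked journals, in rank order.
probs = [12.3, 12.3, 10.0, 8.4, 4.8, 4.3, 3.6, 3.2]

cumulative = list(accumulate(probs))
# Smallest number of top-ranked journals whose combined probability reaches 50%.
k = next(i + 1 for i, c in enumerate(cumulative) if c >= 50.0)
print(k, round(cumulative[k - 1], 1))  # prints: 6 52.1
```

The first five journals total 47.8%, so the sixth (Briefings in Bioinformatics, 4.3%) is the one that pushes the running total past 50%.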