
Machine learning prediction of Antibody-Antigen binding: dataset, method and testing

Ye, C.; Hu, W.; Gaeta, B.

2021-03-20 bioinformatics
10.1101/2021.03.19.435772 bioRxiv

DNA sequencing technologies are providing new insights into the immune response by allowing large-scale sequencing of the rearranged immunoglobulin genes present in an individual. However, the applications of this approach are limited by the lack of methods for determining the antigen(s) bound by the immunoglobulin that a given sequence encodes. Computational methods for predicting antibody-antigen interactions that leverage structure prediction and docking have been proposed, but these methods require knowledge of the 3D structures. As a step towards the development of a machine learning method suitable for predicting antibody-antigen binding affinities from sequence data, a weighted nearest neighbor machine learning approach was applied to the problem. A prediction program was coded in Python and evaluated using cross-validation on a dataset of 600 antibodies interacting with 50 antigens. The classification accuracy was around 76% for this dataset. These results provide a useful frame of reference, as well as protocols and considerations for machine learning and dataset creation in this area. Both the dataset (in CSV format) and the machine learning program (coded in Python) are freely available for download.
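To make the idea of a weighted nearest neighbor classifier evaluated by cross-validation concrete, here is a minimal sketch. It is not the authors' program: the k-mer Jaccard distance, the inverse-distance vote weighting, the function names, and the toy sequences and antigen labels are all assumptions chosen for illustration, since the abstract does not specify the features, distance, or weights used.

```python
from collections import defaultdict


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """1 - Jaccard similarity of the two k-mer sets (assumed distance)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def weighted_knn_predict(query, train, n_neighbors=3, eps=1e-9):
    """Predict a label by an inverse-distance-weighted vote.

    `train` is a list of (sequence, label) pairs.
    """
    scored = [(jaccard_distance(query, seq), label) for seq, label in train]
    nearest = sorted(scored, key=lambda pair: pair[0])[:n_neighbors]
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + eps)  # closer neighbors count more
    return max(votes, key=votes.get)


def cross_validate(data, n_folds=5, n_neighbors=3):
    """Return accuracy over n_folds folds built by striding through `data`."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    correct = 0
    for i, held_out in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        correct += sum(
            weighted_knn_predict(seq, train, n_neighbors) == label
            for seq, label in held_out
        )
    return correct / len(data)


# Hypothetical toy data: invented sequences and labels, not from the dataset.
toy = [
    ("CARDYYGSSYFDYW", "antigen_A"),
    ("CAKGGWELLPFDSW", "antigen_B"),
    ("CARDYYGSSWFDYW", "antigen_A"),
    ("CAKGGWELLAFDSW", "antigen_B"),
    ("CARDYYGTSYFDYW", "antigen_A"),
    ("CAKGGWDLLPFDSW", "antigen_B"),
]

if __name__ == "__main__":
    print(f"toy accuracy: {cross_validate(toy, n_folds=3):.2f}")
```

The striding split keeps the sketch deterministic; a real evaluation would normally shuffle and stratify the folds, and would need a distance that reflects whatever sequence features the published method actually uses.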

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | ImmunoInformatics | 11 | Top 0.1% | 22.6%
2 | mAbs | 28 | Top 0.1% | 12.4%
3 | Scientific Reports | 3102 | Top 18% | 6.3%
4 | PLOS Computational Biology | 1633 | Top 6% | 6.3%
5 | Frontiers in Immunology | 586 | Top 1% | 4.9%
(50% of probability mass above)
6 | Bioinformatics | 1061 | Top 4% | 4.9%
7 | BMC Bioinformatics | 383 | Top 2% | 4.0%
8 | Frontiers in Bioinformatics | 45 | Top 0.1% | 3.6%
9 | PLOS ONE | 4510 | Top 44% | 2.7%
10 | Bioinformatics Advances | 184 | Top 2% | 2.6%
11 | Computational and Structural Biotechnology Journal | 216 | Top 3% | 2.1%
12 | Briefings in Bioinformatics | 326 | Top 4% | 1.7%
13 | Frontiers in Genetics | 197 | Top 5% | 1.7%
14 | Antibody Therapeutics | 16 | Top 0.3% | 1.2%
15 | Biology Methods and Protocols | 53 | Top 2% | 1.1%
16 | Archives of Clinical and Biomedical Research | 28 | Top 1% | 1.1%
17 | Nucleic Acids Research | 1128 | Top 15% | 0.9%
18 | eLife | 5422 | Top 53% | 0.9%
19 | Protein Science | 221 | Top 1% | 0.9%
20 | Journal of Immunological Methods | 24 | Top 0.2% | 0.9%
21 | Computers in Biology and Medicine | 120 | Top 4% | 0.8%
22 | Viruses | 318 | Top 5% | 0.7%
23 | Journal of Cheminformatics | 25 | Top 0.5% | 0.7%
24 | Frontiers in Physiology | 93 | Top 6% | 0.7%
25 | PeerJ | 261 | Top 15% | 0.7%
26 | Communications Biology | 886 | Top 26% | 0.7%
27 | International Journal of Molecular Sciences | 453 | Top 18% | 0.6%
28 | F1000Research | 79 | Top 6% | 0.6%
29 | Physical Biology | 43 | Top 3% | 0.6%
30 | Gigabyte | 60 | Top 2% | 0.6%
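
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum over the listed probabilities. The sketch below uses the first ten percentages from the listing above; the function name `journals_to_reach` is a hypothetical helper, not part of any tool behind this page.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Count how many of the largest probabilities are needed to reach
    `threshold` percent, and return that count with the mass reached."""
    total = 0.0
    for count, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total  # threshold never reached with the values given


# Predicted probabilities (percent) for ranks 1 to 10 in the listing.
top_probs = [22.6, 12.4, 6.3, 6.3, 4.9, 4.9, 4.0, 3.6, 2.7, 2.6]

count, mass = journals_to_reach(top_probs)

if __name__ == "__main__":
    print(f"{count} journals reach {mass:.1f}% of the probability mass")
```

With these values the threshold is crossed at the fifth journal, where the running total is 22.6 + 12.4 + 6.3 + 6.3 + 4.9 = 52.5%.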