Emerging Concern of Scientific Fraud: Deep Learning and Image Manipulation

Qi, C.; Zhang, J.; Luo, P.

2021-01-17 scientific communication and education
10.1101/2020.11.24.395319 bioRxiv
Scientific fraud involving duplicated and manipulated western blot images is a rising problem. Currently, problematic western blot images are mainly detected by checking for repeated bands or through visual observation. However, the completeness of these methods in detecting problematic images has not been demonstrated. Here we show that Generative Adversarial Nets (GANs) can generate realistic western blot images that are indistinguishable from real western blots. The overall accuracy of researchers in identifying synthetic western blot images is 0.52, which is almost equal to blind guessing (0.5). We found that GANs can generate western blot images with bands of the expected lengths, widths, and angles in desired positions that can fool researchers. In the case study, we found that the accuracy of detecting synthetic western blot images is related to the number of years researchers have performed studies relevant to western blots, but there was no apparent difference in accuracy among researchers with different academic degrees. Our results demonstrate that GANs can generate fake western blot images that fool existing methods for detecting problematic images. Therefore, more information is needed to ensure that the western blots appearing in scientific articles are real. We argue that every western blot image should be uploaded along with a unique identifier generated by the laboratory machine, and that these images should be peer reviewed along with the corresponding submitted articles, which may reduce the incidence of scientific fraud.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | PLOS ONE | 4510 papers in training set | Top 3% | 28.6%
2 | Entropy | 20 papers in training set | Top 0.1% | 10.4%
3 | Scientific Reports | 3102 papers in training set | Top 9% | 8.7%
4 | PLOS Computational Biology | 1633 papers in training set | Top 5% | 7.1%
50% of probability mass above
5 | Cognitive Neurodynamics | 15 papers in training set | Top 0.1% | 5.0%
6 | BioData Mining | 15 papers in training set | Top 0.1% | 2.8%
7 | Heliyon | 146 papers in training set | Top 1.0% | 2.1%
8 | Artificial Intelligence in the Life Sciences | 11 papers in training set | Top 0.1% | 2.1%
9 | Frontiers in Artificial Intelligence | 18 papers in training set | Top 0.2% | 1.8%
10 | Archives of Clinical and Biomedical Research | 28 papers in training set | Top 0.6% | 1.8%
11 | Bioengineering | 24 papers in training set | Top 0.4% | 1.8%
12 | Patterns | 70 papers in training set | Top 0.8% | 1.8%
13 | eLife | 5422 papers in training set | Top 44% | 1.5%
14 | FEBS Open Bio | 29 papers in training set | Top 0.2% | 1.3%
15 | Ecological Informatics | 29 papers in training set | Top 0.6% | 0.9%
16 | Communications Biology | 886 papers in training set | Top 18% | 0.9%
17 | Computers in Biology and Medicine | 120 papers in training set | Top 4% | 0.9%
18 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.5% | 0.8%
19 | Expert Systems with Applications | 11 papers in training set | Top 0.4% | 0.8%
20 | Bulletin of Mathematical Biology | 84 papers in training set | Top 2% | 0.8%
21 | eneuro | 389 papers in training set | Top 9% | 0.8%
22 | Animals | 20 papers in training set | Top 1% | 0.7%
23 | Biomedical Signal Processing and Control | 18 papers in training set | Top 0.6% | 0.7%
24 | Mathematics | 11 papers in training set | Top 0.5% | 0.7%
25 | Nature Communications | 4913 papers in training set | Top 66% | 0.5%
26 | IEEE Access | 31 papers in training set | Top 1% | 0.5%
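
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and the values are the percentages of the ten highest-ranked journals copied from the list above, not output from the prediction model itself.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-10, copied from the list above.
probabilities = [28.6, 10.4, 8.7, 7.1, 5.0, 2.8, 2.1, 2.1, 1.8, 1.8]


def journals_to_reach(mass_percent, probs):
    """Return how many top-ranked journals are needed to reach mass_percent.

    Hypothetical helper: walks the running total and stops at the first
    rank where the cumulative probability meets the threshold.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= mass_percent:
            return count
    return None  # threshold not reached with the values supplied


needed = journals_to_reach(50.0, probabilities)
top4_mass = sum(probabilities[:4])
print(needed, round(top4_mass, 1))
```

With these values the running total passes 50% at the fourth journal, where the four listed probabilities sum to about 54.8%; the first three alone reach only about 47.7%.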