Data cleaning for image-based profiling enhancement

Rohban, M. H.; Bigverdi, M.; Rezvani, A.

2021-09-10 cell biology
10.1101/2021.09.09.459624 bioRxiv

High-throughput assays make it possible to carry out a large number of biological experiments. Image-based assays are among the most accessible and inexpensive technologies for this purpose, and they have proved effective in characterizing unknown functions of genes and small molecules. Image analysis pipelines play a pivotal role in translating the raw images captured in such assays into useful and compact representations, also known as measurements. CellProfiler is a popular tool for this purpose: it provides readily available modules for cell and nucleus segmentation and for computing various measurements, or features, for each cell and nucleus. Single-cell features are then aggregated for each treatment replicate to form treatment "profiles." However, the CellProfiler quantification pipeline may contain several sources of error that affect the downstream analysis performed on the profiles. In this work, we examined various preprocessing approaches to improve the profiles. We use identification of drug mechanisms of action as the downstream task for evaluating these preprocessing approaches. Our enhancement steps mainly consist of data cleaning, cell-level outlier detection, toxic drug detection, and regressing out the cell area from all other features, since many of them are strongly affected by the cell area. We also examined unsupervised and weakly supervised deep-learning-based methods to reduce the feature dimensionality, and finally we suggest possible avenues for future research.
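As a rough illustration of two steps named in the abstract (regressing out the cell area from the other features, then aggregating single-cell features into profiles), the following is a minimal sketch, not the authors' implementation. It assumes an ordinary least-squares linear fit per feature and median aggregation per replicate; the abstract specifies neither choice. The function names `regress_out` and `aggregate_profiles` and the synthetic data are hypothetical.

```python
import numpy as np

def regress_out(features, area):
    """Return each feature column's residuals after a linear fit on cell area.

    Assumption: a linear model with intercept; the abstract does not say
    which regression model the authors used.
    """
    # Design matrix: an intercept column plus the cell-area covariate.
    X = np.column_stack([np.ones_like(area, dtype=float), area.astype(float)])
    # Least-squares coefficients for all feature columns at once.
    coef, *_ = np.linalg.lstsq(X, features, rcond=None)
    return features - X @ coef

def aggregate_profiles(features, groups):
    """Median-aggregate single-cell rows into one profile per group label.

    Assumption: the median is used here only as a common robust aggregate.
    """
    labels = np.unique(groups)
    profiles = np.vstack(
        [np.median(features[groups == g], axis=0) for g in labels]
    )
    return labels, profiles

# Synthetic example: 200 cells from 4 hypothetical treatment replicates.
rng = np.random.default_rng(0)
area = rng.uniform(100.0, 500.0, size=200)
features = np.column_stack([
    2.0 * area + rng.normal(0.0, 5.0, size=200),  # feature driven by area
    rng.normal(0.0, 1.0, size=200),               # feature unrelated to area
])
groups = np.repeat(np.arange(4), 50)

residuals = regress_out(features, area)
labels, profiles = aggregate_profiles(residuals, groups)
```

After the fit, the residuals of the area-driven feature are linearly uncorrelated with cell area, so any signal left in the aggregated profiles is not a simple linear function of cell size.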

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. PLOS ONE; 4510 papers in training set; Top 13%; 14.5%
2. Frontiers in Pharmacology; 100 papers in training set; Top 0.3%; 6.9%
3. PLOS Computational Biology; 1633 papers in training set; Top 5%; 6.9%
4. Scientific Reports; 3102 papers in training set; Top 23%; 4.9%
5. MethodsX; 14 papers in training set; Top 0.1%; 4.2%
6. BMC Bioinformatics; 383 papers in training set; Top 3%; 3.6%
7. Frontiers in Molecular Biosciences; 100 papers in training set; Top 0.5%; 3.3%
8. Metabolites; 50 papers in training set; Top 0.3%; 2.6%
9. Biology Methods and Protocols; 53 papers in training set; Top 0.5%; 2.4%
10. Computers in Biology and Medicine; 120 papers in training set; Top 2%; 2.1%
50% of probability mass above
11. SoftwareX; 15 papers in training set; Top 0.1%; 2.1%
12. Heliyon; 146 papers in training set; Top 1%; 2.1%
13. iScience; 1063 papers in training set; Top 11%; 1.9%
14. Artificial Intelligence in the Life Sciences; 11 papers in training set; Top 0.1%; 1.8%
15. Journal of Chemical Information and Modeling; 207 papers in training set; Top 2%; 1.7%
16. BMC Genomics; 328 papers in training set; Top 2%; 1.7%
17. Cells; 232 papers in training set; Top 2%; 1.7%
18. SLAS Discovery; 25 papers in training set; Top 0.1%; 1.5%
19. Sensors; 39 papers in training set; Top 1%; 1.3%
20. Current Protocols; 13 papers in training set; Top 0.1%; 1.3%
21. npj Systems Biology and Applications; 99 papers in training set; Top 1%; 1.3%
22. Bioinformatics; 1061 papers in training set; Top 8%; 1.2%
23. Patterns; 70 papers in training set; Top 2%; 1.0%
24. Frontiers in Cell and Developmental Biology; 218 papers in training set; Top 7%; 0.9%
25. SLAS Technology; 11 papers in training set; Top 0.2%; 0.8%
26. Frontiers in Genetics; 197 papers in training set; Top 9%; 0.8%
27. Journal of Cheminformatics; 25 papers in training set; Top 0.5%; 0.8%
28. Computer Methods and Programs in Biomedicine; 27 papers in training set; Top 0.9%; 0.8%
29. Molecules; 37 papers in training set; Top 2%; 0.7%
30. International Journal of Molecular Sciences; 453 papers in training set; Top 16%; 0.7%