
Empirical study on software and process quality in bioinformatics tools

Ferenc, K.; Otto, K.; de Oliveira Neto, F. G.; Davila Lopez, M.; Horkoff, J.; Schliep, A.

2022-03-13 bioinformatics
10.1101/2022.03.10.483804 bioRxiv

Software quality in computational tools affects research output in a variety of scientific disciplines. Biology is one of these fields: such tools play an important role, especially in the analysis of High Throughput Sequencing (HTS) data. This study therefore characterises the overall quality of a selection of tools that are frequently part of HTS pipelines and analyses the maintainability and process quality of a selection of HTS alignment tools. Our findings highlight the most pressing issues and point to software engineering best practices for improving maintenance and process quality. To support future research, we share the tooling for static code analysis with SonarCloud that we used to collect data on the maintainability of different alignment tools. The results of the analysis show that the maintainability level is generally high but that technical debt tends to increase over time. We also observed that development of alignment tools is generally driven by very few developers and is not utilising modern tooling to its advantage. Based on these observations, we recommend actions to improve both maintainability and process quality in open source alignment tools. These actions include improved tooling, such as the use of linters, as well as better documentation of architecture and features. We encourage developers to use these tools to ease future maintenance efforts, improve user experience, support reproducibility, and ultimately increase the quality of research by increasing the quality of research software tools.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. BMC Bioinformatics: 383 papers in training set, Top 0.2%, 22.9%
2. GigaScience: 172 papers in training set, Top 0.1%, 18.9%
3. PLOS Computational Biology: 1633 papers in training set, Top 5%, 6.5%
4. PLOS ONE: 4510 papers in training set, Top 31%, 4.9%
(50% of probability mass above)
5. PeerJ: 261 papers in training set, Top 1.0%, 4.9%
6. Bioinformatics: 1061 papers in training set, Top 5%, 4.0%
7. F1000Research: 79 papers in training set, Top 0.4%, 4.0%
8. NAR Genomics and Bioinformatics: 214 papers in training set, Top 1.0%, 2.8%
9. SoftwareX: 15 papers in training set, Top 0.1%, 1.9%
10. Frontiers in Bioinformatics: 45 papers in training set, Top 0.2%, 1.8%
11. BMC Genomics: 328 papers in training set, Top 2%, 1.7%
12. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 4%, 1.7%
13. Bioinformatics Advances: 184 papers in training set, Top 3%, 1.5%
14. Database: 51 papers in training set, Top 0.5%, 1.4%
15. BioData Mining: 15 papers in training set, Top 0.5%, 1.2%
16. Biology Methods and Protocols: 53 papers in training set, Top 2%, 0.8%
17. Scientific Reports: 3102 papers in training set, Top 72%, 0.8%
18. Frontiers in Genetics: 197 papers in training set, Top 9%, 0.8%
19. Briefings in Bioinformatics: 326 papers in training set, Top 6%, 0.8%
20. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 3%, 0.7%
21. Journal of Proteome Research: 215 papers in training set, Top 2%, 0.7%
22. Peer Community Journal: 254 papers in training set, Top 4%, 0.7%
23. Genomics: 60 papers in training set, Top 3%, 0.7%
24. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 32 papers in training set, Top 0.8%, 0.5%
25. Journal of Chemical Information and Modeling: 207 papers in training set, Top 4%, 0.5%
26. Gigabyte: 60 papers in training set, Top 2%, 0.5%
27. Journal of the American Medical Informatics Association: 61 papers in training set, Top 2%, 0.5%
28. Biology: 43 papers in training set, Top 4%, 0.5%
29. Computers in Biology and Medicine: 120 papers in training set, Top 6%, 0.5%
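The statement that the top 4 journals account for 50% of the predicted probability mass is cumulative arithmetic over the ranked list. The short Python sketch below sums the four largest percentages in rank order. The dictionary `top_four` and the helper `cumulative_mass` are illustrative names chosen here, not part of any tool described in this listing. The first three journals reach 48.3%, and adding PLOS ONE brings the total to 53.2%, so rank 4 is the first point at which the running total reaches 50%.

```python
# Predicted probabilities (in percent) of the four top-ranked journals,
# copied from the ranked list above. Names here are illustrative only.
top_four = {
    "BMC Bioinformatics": 22.9,
    "GigaScience": 18.9,
    "PLOS Computational Biology": 6.5,
    "PLOS ONE": 4.9,
}


def cumulative_mass(probabilities):
    """Return the running total of the probabilities, in the given order.

    Each total is rounded to one decimal place to hide floating-point noise.
    """
    total = 0.0
    running = []
    for p in probabilities:
        total += p
        running.append(round(total, 1))
    return running


running = cumulative_mass(top_four.values())

# Rank (1-based) at which the running total first reaches 50%.
first_rank = next(i + 1 for i, value in enumerate(running) if value >= 50.0)

print(running)     # [22.9, 41.8, 48.3, 53.2]
print(first_rank)  # 4
```

Rounding each running total, rather than only the final sum, keeps the printed intermediate values readable while leaving the threshold check unaffected.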