The Human Canonical Core Histone Catalogue

Susano Pinto, D. M.; Flaus, A.

2019-07-30 molecular biology
10.1101/720235 bioRxiv

Core histone proteins H2A, H2B, H3, and H4 are encoded by a large family of genes distributed across the human genome. Canonical core histones contribute the majority of proteins to bulk chromatin packaging, and are encoded in 4 clusters by 65 coding genes comprising 17 for H2A, 18 for H2B, 15 for H3, and 15 for H4, along with at least 17 total pseudogenes. The canonical core histone genes display coding variation that gives rise to 11 H2A, 15 H2B, 4 H3, and 2 H4 unique protein isoforms. Although histone proteins are highly conserved overall, these isoforms represent a surprising and seldom recognised variation with amino acid identity as low as 77% between canonical histone proteins of the same type. The gene sequence and protein isoform diversity also exceeds commonly used subtype designations such as H2A.1 and H3.1, and exists in parallel with the well-known specialisation of variant histone proteins. RNA sequencing of histone transcripts shows evidence for differential expression of histone genes but the functional significance of this variation has not yet been investigated. To assist understanding of the implications of histone gene and protein diversity we have catalogued the entire human canonical core histone gene and protein complement. In order to organise this information in a robust, accessible, and accurate form, we applied software build automation tools to dynamically generate the canonical core histone repertoire based on current genome annotations and then to organise the information into a manuscript format. Automatically generated values are shown with a light grey background. Alongside recognition of the encoded protein diversity, this has led to multiple corrections to human histone annotations, reflecting the flux of the human genome as it is updated and enriched in reference databases. This dynamic manuscript approach is inspired by the aims of reproducible research and can be readily adapted to other gene families.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Nucleic Acids Research: 1128 papers in training set; Top 0.1%; 37.6%
2. Epigenetics & Chromatin: 42 papers in training set; Top 0.1%; 18.5%
50% of probability mass above
3. Scientific Reports: 3102 papers in training set; Top 37%; 3.6%
4. PLOS ONE: 4510 papers in training set; Top 40%; 3.6%
5. Nature Communications: 4913 papers in training set; Top 43%; 2.9%
6. Journal of Molecular Biology: 217 papers in training set; Top 0.9%; 2.6%
7. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 4%; 1.9%
8. Genome Research: 409 papers in training set; Top 2%; 1.7%
9. BMC Bioinformatics: 383 papers in training set; Top 5%; 1.7%
10. Genome Biology: 555 papers in training set; Top 4%; 1.7%
11. PLOS Computational Biology: 1633 papers in training set; Top 18%; 1.5%
12. International Journal of Molecular Sciences: 453 papers in training set; Top 10%; 1.3%
13. Life Science Alliance: 263 papers in training set; Top 0.7%; 1.2%
14. Bioinformatics: 1061 papers in training set; Top 8%; 0.9%
15. Open Biology: 95 papers in training set; Top 2%; 0.8%
16. Journal of Proteome Research: 215 papers in training set; Top 2%; 0.7%
17. eLife: 5422 papers in training set; Top 58%; 0.7%
18. iScience: 1063 papers in training set; Top 32%; 0.7%
19. Biology Open: 130 papers in training set; Top 3%; 0.7%
20. Cells: 232 papers in training set; Top 8%; 0.6%
21. Nucleus: 11 papers in training set; Top 0.1%; 0.6%
22. EMBO reports: 136 papers in training set; Top 8%; 0.6%
23. Biophysical Journal: 545 papers in training set; Top 6%; 0.6%
24. Wellcome Open Research: 57 papers in training set; Top 3%; 0.6%
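
The page does not say how the "50% of probability mass" cut-off is derived. A minimal sketch, assuming it is the smallest number of top-ranked journals whose cumulative predicted probability reaches the threshold, could look like this. The function name `journals_to_reach` is hypothetical, and only the first five entries of the ranked list are included for brevity.

```python
from itertools import accumulate

# (journal, predicted probability in percent), copied from the ranked list above.
predictions = [
    ("Nucleic Acids Research", 37.6),
    ("Epigenetics & Chromatin", 18.5),
    ("Scientific Reports", 3.6),
    ("PLOS ONE", 3.6),
    ("Nature Communications", 2.9),
]


def journals_to_reach(predictions, threshold):
    """Return how many top-ranked journals are needed for the cumulative
    probability to reach `threshold` (in percent), or None if the listed
    entries never reach it. This is an assumed reading of the cut-off."""
    # Rank by predicted probability, highest first.
    ranked = sorted(predictions, key=lambda item: item[1], reverse=True)
    running_totals = accumulate(prob for _, prob in ranked)
    for count, total in enumerate(running_totals, start=1):
        if total >= threshold:
            return count
    return None


if __name__ == "__main__":
    print(journals_to_reach(predictions, 50.0))
```

Under this reading, the first journal alone (37.6%) falls short of 50%, while the first two together (37.6% + 18.5% = 56.1%) exceed it, which is consistent with the divider sitting after rank 2.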