Though the collection of known 3D protein structures continues to expand, the structures of some regions within the proteome have not been experimentally determined. Because these regions lack homology to known domains, their structures cannot be predicted by homology modeling either. Researchers sought to identify such regions, collectively termed the “dark proteome.”
The authors used the program Aquaria, a publicly available tool designed to enable biologists to obtain structural information based on amino acid sequence. Using Aquaria to survey the 546,000 sequences within the Swiss-Prot database, the authors found that 13–14% of amino acid residues in prokaryotic proteins, 44% in eukaryotic proteins, and 54% in viral proteins reside in the dark proteome.
Nearly half of the dark proteome consists of “dark proteins,” so termed because their entire sequence is dark. Most dark proteins did not display a higher-than-normal level of predicted disorder, compositional bias, or transmembrane residues, although they had increased levels of cysteine, phenylalanine, and tryptophan residues, suggesting a high frequency of disulfide bonds and perhaps undetected transmembrane spans.
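The two quantities described above, the share of residues that are dark and the share of dark residues that sit in fully dark proteins, can be illustrated with a minimal sketch. It assumes each protein has already been reduced to a per-residue list of dark/not-dark flags; the function names and toy data are hypothetical and do not represent Aquaria's interface or the authors' actual pipeline.

```python
from typing import Dict, List


def is_dark_protein(flags: List[bool]) -> bool:
    """A protein counts as dark only if every one of its residues is dark."""
    return bool(flags) and all(flags)


def summarize(proteome: Dict[str, List[bool]]) -> Dict[str, float]:
    """Compute residue-level darkness statistics for a set of proteins.

    `proteome` maps a (hypothetical) protein ID to per-residue flags,
    where True means the residue has no match to any known structure.
    """
    total_residues = sum(len(f) for f in proteome.values())
    dark_residues = sum(sum(f) for f in proteome.values())
    # Residues belonging to proteins that are dark along their full length.
    residues_in_dark_proteins = sum(
        len(f) for f in proteome.values() if is_dark_protein(f)
    )
    return {
        "dark_residue_fraction": (
            dark_residues / total_residues if total_residues else 0.0
        ),
        "share_of_dark_residues_in_dark_proteins": (
            residues_in_dark_proteins / dark_residues if dark_residues else 0.0
        ),
    }


# Toy data, invented purely for illustration.
toy = {
    "P1": [True, True, True, True],                   # fully dark protein
    "P2": [False, False, True, True, False, False],   # partly dark
    "P3": [False] * 6,                                # fully covered
}
stats = summarize(toy)
```

In this toy set, 6 of 16 residues are dark, and 4 of those 6 belong to the single fully dark protein.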
In eukaryotes, dark proteins were more likely than nondark proteins to be secreted. Many dark proteins contain rare domains that may have evolved recently, and a majority have unknown location or function, making them prime candidates for research, according to the authors.