Nullomers (a.k.a. minimal absent words) are the shortest sequences that do not exist in the entire genome or proteome of a species. The Nullomers Database is a web-based resource of significant nullomers. The term 'significant' denotes a sequence that is strongly expected to be present but is in fact globally absent, as assessed by the Nullomers Assessor method. The Nullomers Database is a continuously enriched repository of significant missing sequences from various organisms and aims to serve as a central hub of information for reducing the vast space of nullomers.
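To make the definition concrete, the following minimal sketch enumerates the shortest absent k-mers of a toy nucleotide sequence by brute force. It is an illustration only, not the Nullomers Assessor method: the function name, the max_k limit and the single-strand scan are assumptions made for this example, and no significance assessment is performed.

```python
from itertools import product


def shortest_absent_kmers(sequence, alphabet="ACGT", max_k=8):
    """Return (k, absent_kmers) for the smallest k that has absent k-mers.

    Hypothetical helper for illustration: scans one strand only and
    does not assess statistical significance.
    """
    sequence = sequence.upper()
    for k in range(1, max_k + 1):
        # Collect every k-mer that actually occurs in the sequence.
        observed = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        # Enumerate all possible k-mers and keep those never observed.
        absent = [
            "".join(letters)
            for letters in product(alphabet, repeat=k)
            if "".join(letters) not in observed
        ]
        if absent:
            return k, absent
    return None, []


k, absent = shortest_absent_kmers("ACGT")
print(k, len(absent))
```

For real genomes, exhaustive enumeration like this becomes impractical as k grows, which is one reason a curated set of significant nullomers is useful.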
The graphical user interface of the Nullomers Database is divided into three main categories to facilitate browsing and searching: Genomic Nullomers, Peptide Nullomers and Nullomers in viruses. In the Genomic Nullomers section, significant nullomers from hundreds of genomes, ranging from microbes to human, are provided. The Peptide nullomers have resulted from the analysis of two main organisms (Homo sapiens & Mus musculus), with particular emphasis on protein regions in which a significant nullomer can 'emerge' upon a single amino acid alteration. Finally, the Nullomers in viruses section hosts significant absent genomic motifs from thousands of human-isolated virus records (data retrieved from NCBI Virus).
Several annotation features as well as the impact of putative nullomer-making mutations have been incorporated and are visually presented by utilizing the web services of UniProt and Mutation Assessor, respectively. The ultimate goal of the Nullomers Database is to prioritise and highlight the most significant absent sequences across the tree of life.
The first part of the Nullomers Database presents significant nucleotide sequences absent from several genomes. Put simply, these sequences are unlikely to be absent by chance. For more information about the probabilistic method and the statistical correction procedures that have been applied, please consult the publication.
By default, all records from all species are presented in a paginated table. Users can browse the resulting table or download the result set. Alternatively, results can be copied to the clipboard.
The interactive table allows users to sort the results either alphabetically or numerically, as well as narrow down the output by typing in the search box.
Another way of filtering is by selecting either a species, or a division (which will subsequently reduce the number of species in the other select box). Furthermore, users can effortlessly search for palindromic (reverse-complement) nullomers.
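A palindromic nullomer in this sense is a sequence that equals its own reverse complement. The short sketch below shows one way to test this property; the function names are hypothetical and the code assumes unambiguous DNA letters (A, C, G, T) only.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic("GAATTC"))  # EcoRI recognition site
print(is_palindromic("AAAA"))
```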
It is worth noting that not every species has significant nullomers. Throughout our analysis, a fixed false discovery rate threshold of 1% has been applied when searching for genomic, peptide and viral nullomers alike.
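As an illustration of how a 1% false discovery threshold can be applied to a list of p-values, the sketch below uses the Benjamini-Hochberg step-up procedure. This is an assumption made for the example: the text above does not state which correction procedure is used, so please consult the publication for the actual method. The function name and the sample p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return a list of booleans marking p-values significant at the given FDR.

    Illustrative Benjamini-Hochberg step-up procedure; not necessarily
    the correction applied by the database.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value is below its BH threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Made-up p-values for four candidate nullomers.
print(benjamini_hochberg([0.0001, 0.004, 0.02, 0.5]))
```

A species whose candidates all fail such a threshold would have no entries in the table, which is consistent with the note that not every species has significant nullomers.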
Proteins that are prone to 'generate' a nullomer in their sequences upon a single amino acid alteration are shown in the Peptide nullomers section. This includes only proteins that are one substitution away from containing a significant nullomer. The provided results are split into two major categories: i) nullomer-making mutations in genes of interest, and ii) lists of proteins per significant absent peptide.
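The idea of a protein being "one substitution away" from a nullomer can be sketched as a sliding-window comparison: a window of the protein that differs from a nullomer at exactly one position identifies a nullomer-making substitution. The code below is a simplified illustration under that assumption, not the pipeline used to build the database; the function name, the example protein and the peptide "KTWYI" are made up for the example.

```python
def nullomer_making_substitutions(protein, nullomers):
    """Find single substitutions that would create a nullomer in a protein.

    Returns tuples of (1-based position, original residue,
    substituted residue, resulting nullomer).
    """
    results = []
    for nullomer in nullomers:
        k = len(nullomer)
        for start in range(len(protein) - k + 1):
            window = protein[start:start + k]
            # Positions where the window and the nullomer disagree.
            mismatches = [i for i in range(k) if window[i] != nullomer[i]]
            if len(mismatches) == 1:
                i = mismatches[0]
                results.append((start + i + 1, window[i], nullomer[i], nullomer))
    return results


# Hypothetical protein fragment and hypothetical absent peptide.
print(nullomer_making_substitutions("MKTAYIAK", ["KTWYI"]))
```

In this toy case, replacing the alanine at position 4 with tryptophan would introduce the hypothetical absent peptide into the sequence.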
Nullomers per protein
In this section, users can search for proteins that are prone to 'create' a nullomer upon a mutation. As soon as the page loads, a suggestion engine starts, providing users with a powerful way to search for proteins of interest. A search can be done by typing a UniProt identifier, a gene name or simply a free-text description of a protein.
The suggested results are separated into Reviewed and Unreviewed records, and users can narrow down the list by selecting records from either Homo sapiens or Mus musculus only.
Upon selection, users should click the 'Search' button. Then, a graphical table displays information about the selected proteins, as well as the actual peptide sequences which are prone to create a nullomer. The 'nullomer-making' alteration is highlighted together with additional information, while a prediction of the functional impact of the specific substitution is provided by Mutation Assessor.
By clicking on a sequence, an interactive protein viewer (MolArt plugin) appears which provides structural information and feature annotation of the protein. The displayed information is retrieved from the UniProt database in real time. The panel which displays sequence annotation and the 3D structure of the selected protein is interactive in several ways.
Key features include zooming in and out, dragging on a selection, highlighting elements on click, exporting annotation and images at a specific focus, synchronization between panels while clicking or hovering, and panning. Users can zoom in or out by holding down the right mouse button while moving the mouse up or down, respectively, or rotate the entire molecule by simply clicking on it.
Also, the structure can be moved (pan functionality) by holding down the middle scroll-wheel of the mouse. Users can instantly get information about co-occurring elements at a nullomer-making position and explore disease-associated, deleterious or benign variants that have been found in previous studies.
Proteins per nullomer
Next, lists of proteins which are prone to generate one of the significant peptide nullomers can be retrieved simply by selecting an organism and a nullomer. By default, only reviewed records are shown. Users can choose between Reviewed only or Reviewed and predicted records.
Subsequently, a graphical interactive table which includes UniProt IDs, gene names, actual peptides and nullomer-making substitutions is shown. The resulting information can be handled in the same way as above: results can be copied, exported, ordered and searched dynamically. By clicking on a UniProt ID, a new browser tab opens, redirecting users to the corresponding entry in the UniProt database.
It should be noted that the content of the Peptide nullomers section is periodically updated and minor variations in resulting nullomers and/or nullomer-making substitutions between different versions might occur (mostly in predicted records) due to the dynamic nature of the UniProt database.