Tracking the genetic variability of Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) is a crucial challenge, mainly to identify target sequences for robust vaccines and neutralizing monoclonal antibodies, but also to track the temporal and geographic evolution of the virus and to mine for variants associated with reduced or increased disease severity. Several online tools and bioinformatic phylogenetic analyses have been released, but the main interest lies in the Spike protein, which is the pivotal element of current vaccine design, and in the Receptor Binding Domain (RBD), which accounts for most of the neutralizing antibody activity.


Here, we present an open-source bioinformatic protocol and a web portal focused on SARS-CoV-2 single mutations and minimal consensus sequence building as a companion vaccine design tool. Furthermore, we provide immunogenomic analyses to understand the impact of the most frequent RBD variations.
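As an illustration of what single-mutation calling and consensus building can look like in practice, here is a minimal Python sketch. It is not the authors' protocol: it assumes the protein sequences are already aligned to a reference of the same length, it uses simple majority rule for the consensus, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

def consensus(aligned):
    """Majority-rule consensus of equal-length, pre-aligned sequences."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # For each alignment column, keep the most frequent residue.
    # Ties resolve to the residue encountered first in that column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

def call_mutations(reference, sequence):
    """List substitutions as <ref><1-based position><alt>, e.g. S477N.

    Gaps ('-') and unknown residues ('X') are skipped rather than
    reported as mutations (an assumption of this sketch).
    """
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sequence), start=1)
        if ref != alt and alt not in "-X"
    ]

def mutation_frequencies(reference, aligned):
    """Fraction of sequences carrying each substitution."""
    counts = Counter(m for seq in aligned for m in call_mutations(reference, seq))
    return {mutation: n / len(aligned) for mutation, n in counts.items()}

# Toy data (hypothetical six-residue fragments, not real Spike sequences).
reference = "MFVSLT"
samples = ["MFVNLT", "MFVNLT", "MFVSLT", "MFVNLA"]

if __name__ == "__main__":
    print("consensus:", consensus(samples))
    print("frequencies:", mutation_frequencies(reference, samples))
```

On real data the same idea would be applied to full-length Spike alignments, where a substitution of serine by asparagine at position 477 would be reported as S477N.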


Results on the whole GISAID sequence dataset at the time of writing (October 2020) reveal an emerging mutation, S477N, located in the central part of the Spike protein Receptor Binding Domain, the Receptor Binding Motif. Immunogenomic analyses revealed some variation in mutated epitope MHC compatibility, T-cell recognition, and B-cell epitope probability for the most frequent human HLAs.


This work provides a framework able to track down SARS-CoV-2 genomic variability.

DOI https://doi.org/10.1186/s12967-020-02675-4

Learn More:

Figure 1. Structure and genomic organization of SARS-CoV-2. (A) Schematic representation of SARS-CoV-2 virus structure and the positions of spike glycoprotein, hemagglutinin-esterase, envelope, membrane, nucleocapsid, and RNA viral genome. (B) Genomic organization of SARS-CoV-2 representing ORF1a and ORF1b, which encode nonstructural proteins such as papain-like protease, 3CL-protease, RNA-dependent RNA polymerase, helicase, and endoribonuclease. Genes coding for spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins are also displayed. Ribosomal frameshift location between ORF1 and ORF2 is shown at the junction of ORF1/2. Genomic positions are shown with dashed lines followed by nucleotide position number in the RNA viral genome. The box highlights the genomic organization of the spike (S) gene, showing the distinct S1 and S2 subunit coding segments. (C) Schematic magnified representation of SARS-CoV-2 spike glycoprotein showing S1 and S2 subunits. (D) Crystallographic structure of SARS-CoV-2 spike glycoprotein adapted from PDB ID: 6VXX. Receptor binding domain (RBD) representing ACE2 receptor binding site in human cells, N-terminal domain (NTD), fusion protein (FP), transmembrane anchor (T.A.), and intracellular tail (I.T.) protein domains are displayed.

Journal of Clinical Pathology. Gene of the month: the 2019-nCoV/SARS-CoV-2 novel coronavirus spike protein

Figure 1. Structure of the (A) novel coronavirus severe acute respiratory syndrome-CoV-2 and the (B) spike protein.

Figure 2. Schematic diagram of the genomic structure of the 29.3 kilobase 2019-novel coronavirus (nCoV) gene and domain structure of the 1273 amino acid spike glycoprotein S (not to scale). E, envelope protein gene; M, membrane protein gene; N, nucleocapsid protein gene; RBM, receptor-binding motif; RdRP, RNA-dependent RNA polymerase; S, spike protein gene.


Reprinted for educational purposes and social benefit, not for profit.