Abstract

Introduction: The emergence of a novel coronavirus, SARS-CoV-2, the etiologic agent of coronavirus disease 2019 (COVID-19), has led to a pandemic of global concern. Given the high morbidity and mortality worldwide, the World Health Organization declared COVID-19 a pandemic on 11 March 2020, marking an unprecedented public health crisis. The virus is a positive-sense RNA virus and can show a high mutation rate. Ongoing mutations in the structural proteins of the coronavirus drive viral evolution, enabling it to evade host immunity and rapidly acquire drug resistance. In the present study, we focused mainly on the prevalence of mutations in the four structural proteins, S (spike), E (envelope), M (membrane), and N (nucleocapsid), that are required for the assembly of a complete virion particle. Further, we estimated the antigenicity and allergenicity of these structural proteins to support the design and development of a candidate vaccine against SARS-CoV-2.


Methods: In the present in silico study, the envelope protein was found to be the most antigenic, followed by the nucleocapsid, membrane, and spike proteins of SARS-CoV-2.


Results: In this study, we detected 987 mutations in 729 sequences submitted from Asia in October 2020 by comparing them with the first Wuhan isolate sequence from China as the reference. Among the four structural proteins, spike showed the most mutations, with 807 point mutations, followed by nucleocapsid with 151, envelope with 19, and membrane with only 10.


Conclusion: Taken together, our study revealed that variations occurring in the structural proteins of SARS-CoV-2 might be altering viral structure and function, and that the envelope protein appears to be a promising vaccine candidate to curb coronavirus infections.


Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a human coronavirus and a positive-sense RNA virus. As the etiologic agent of coronavirus disease 2019 (COVID-19), the virus induces moderate to severe respiratory distress1. The pandemic originated from an animal market in Wuhan, China2. The ripple effect of this contagious viral disease has created a humanitarian health crisis and has become an enormous challenge to health systems across the globe.

SARS-CoV-2 is a member of the Coronaviridae family and the Nidovirales order. The virus is considered the third zoonotic coronavirus (after SARS-CoV and MERS-CoV) and originated from bats. However, this novel coronavirus is the only one of the three to have shown pandemic potential3, 4, 5, 6. SARS-CoV-2, a beta coronavirus, is an enveloped, single-stranded, positive-sense, non-segmented, and genetically diverse RNA virus with the largest genome size among known RNA viruses (29,891 base pairs, encoding approximately 9,860 amino acids)2, 7, 8. The genome of SARS-CoV-2 encodes both structural proteins, namely spike (S), envelope (E), membrane (M), and nucleocapsid (N), and non-structural proteins ranging from NSP1 to NSP16.

RNA viruses generally show a high mutation rate, substantially higher than that of DNA viruses. Because of the high mutation rate shown by SARS-CoV-2 over a short period, the virus exhibits genomic variability that enables it to modulate its virulence in the host and subsequently evade host immunity9, 10.

In the present work, we detected 987 mutations in 729 sequences submitted from Asia in October 2020. Among the four structural proteins, spike showed the most mutations, with 807 point mutations, followed by nucleocapsid with 151. Envelope showed 19 mutations and membrane only 10. The results of our study suggest that mutational analysis of this virus may help in understanding its genomic variability. In addition, using predictive immunoinformatics tools, we determined the antigenicity and allergenicity of the structural proteins of SARS-CoV-2 to inform the development of efficacious antiviral therapeutics or vaccines against COVID-19.

Methods

Data mining

The full-length protein sequences of the SARS-CoV-2 structural proteins, i.e., envelope protein, nucleocapsid phosphoprotein, surface glycoprotein, and membrane glycoprotein, were retrieved from the NCBI Virus database, restricted to sequences submitted from Asia in October 2020. This yielded 729 structural protein sequences: 165 envelope proteins, 159 nucleocapsid phosphoproteins, 246 surface glycoproteins, and 159 membrane glycoproteins. Four reference sequences, for envelope protein (YP_009724392), nucleocapsid phosphoprotein (YP_009724397), surface glycoprotein (YP_009724390), and membrane glycoprotein (YP_009724393), were also retrieved for mutational studies.

Multiple sequence alignment (MSA) and mutational identification

Multiple sequence alignment was performed using the Clustal Omega online platform (http://www.clustal.org/), which uses seeded guide trees and HMM profile-profile techniques11. The envelope protein, nucleocapsid phosphoprotein, surface glycoprotein, and membrane glycoprotein sequences were each aligned with their respective reference sequence. The aligned files were viewed using Jalview (https://www.jalview.org/) to identify the point mutations occurring in the different structural proteins with respect to the Wuhan reference isolate.
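The comparison against the reference can be illustrated in code. The following is a minimal sketch, assuming each isolate has already been aligned to its reference and both are available as equal-length gapped strings; the function name `call_point_mutations` and the toy sequences are hypothetical and are not part of the study's workflow, which relied on visual inspection in Jalview.

```python
def call_point_mutations(ref_aligned: str, query_aligned: str) -> list[str]:
    """Return substitutions such as 'D614G', numbered by reference position.

    Both inputs are rows of the same alignment, so they must be equally long.
    Gaps ('-') and ambiguous residues ('X') are skipped, because they are not
    point substitutions.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")

    mutations = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref_aa, query_aa in zip(ref_aligned, query_aligned):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference position
        ref_pos += 1
        if query_aa in "-X":
            continue  # deletion or ambiguous residue
        if query_aa != ref_aa:
            mutations.append(f"{ref_aa}{ref_pos}{query_aa}")
    return mutations


# Toy aligned pair (hypothetical sequences, not real SARS-CoV-2 data).
ref = "MKT-AYIA"
qry = "MRTGAY-V"
print(call_point_mutations(ref, qry))
```

Numbering by the ungapped reference position keeps the mutation labels comparable across isolates even when an isolate carries insertions or deletions.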

Antigenicity and allergenicity evaluation

The VaxiJen v2.0 server was used to estimate the antigenicity of all four structural proteins and thereby assess their suitability for vaccine production. This online server predicts antigens from the auto cross-covariance (ACC) transformation of the submitted peptide sequences12. A good vaccine needs to be non-allergenic to the host, so the allergenicity of these structural proteins was also evaluated. The AllerTOP server was used for this purpose; it predicts allergenicity based on size, flexibility, and other parameters13.
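To make the ACC transformation concrete, the following is a minimal sketch of the commonly cited form, in which descriptor j at position i is multiplied by descriptor k at position i + lag and the products are averaged over the n - lag available pairs. The two-value descriptor table `TOY_DESCRIPTORS` is a hypothetical placeholder, not the descriptor set used by VaxiJen, and the function names are illustrative only.

```python
# Hypothetical per-residue descriptors (two made-up values per amino acid).
TOY_DESCRIPTORS = {
    "A": (1.0, -1.0),
    "G": (2.0, 0.0),
    "K": (-1.0, 2.0),
}


def encode(peptide):
    """Map a peptide string to a list of per-residue descriptor tuples."""
    return [TOY_DESCRIPTORS[aa] for aa in peptide]


def acc(descriptors, j, k, lag):
    """Auto (j == k) or cross (j != k) covariance of two descriptors at a lag."""
    n = len(descriptors)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < sequence length")
    total = sum(descriptors[i][j] * descriptors[i + lag][k] for i in range(n - lag))
    return total / (n - lag)


def acc_vector(descriptors, max_lag):
    """Concatenate all ACC terms for lags 1..max_lag into one flat vector."""
    n_desc = len(descriptors[0])
    return [
        acc(descriptors, j, k, lag)
        for lag in range(1, max_lag + 1)
        for j in range(n_desc)
        for k in range(n_desc)
    ]


short = acc_vector(encode("AGKA"), max_lag=2)
longer = acc_vector(encode("AGKAGKA"), max_lag=2)
print(len(short), len(longer))  # same vector length for both peptides
```

The vector length depends only on the number of descriptors and the maximum lag, so proteins of different lengths are converted into vectors of equal length that a classifier can compare directly.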

Figure 1. Total number of mutations occurring in the structural proteins. a. Surface glycoprotein, b. Envelope protein, c. Membrane glycoprotein, and d. Nucleocapsid phosphoprotein.

Table 1.

Point mutations in the SARS-CoV-2 envelope protein identified after multiple sequence alignment, listed by accession

Serial No. Accession Mutation (reference residue, position, substituted residue)
1. BCM16104 S68F
2. BCM16116 S68F
3. BCM16128 S68F
4. BCM16176 S68F
5. BCM16188 S68F
6. BCM16200 S68F
7. BCM16212 S68F
8. BCM16140 S68F
9. QOP57282 V75F
10. QOP57300 V75F
11. QOP57289 V75F
12. QOP57280 V75F
13. QOP57294 V75F
14. QOS50800 V75F
15. QOS50895 I46V
16. QOS50728 V75F
17. QOS50501 V75F
18. QOU99241 I46V
19. QOU99253 I46V
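
Per-mutation frequencies can be tallied directly from a listing such as Table 1. The following is a minimal sketch using the accession and mutation pairs from Table 1 as input; the variable names are illustrative and the tally is not a step reported in the study.

```python
from collections import Counter

# Accession and mutation pairs copied from Table 1 (envelope protein).
TABLE_1 = """
BCM16104 S68F
BCM16116 S68F
BCM16128 S68F
BCM16176 S68F
BCM16188 S68F
BCM16200 S68F
BCM16212 S68F
BCM16140 S68F
QOP57282 V75F
QOP57300 V75F
QOP57289 V75F
QOP57280 V75F
QOP57294 V75F
QOS50800 V75F
QOS50895 I46V
QOS50728 V75F
QOS50501 V75F
QOU99241 I46V
QOU99253 I46V
"""

rows = [line.split() for line in TABLE_1.strip().splitlines()]
counts = Counter(mutation for _accession, mutation in rows)

for mutation, n in counts.most_common():
    print(mutation, n)
```

For Table 1 this gives S68F and V75F eight times each and I46V three times, which accounts for all 19 envelope mutations.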

Table 2.

Point mutations in the SARS-CoV-2 nucleocapsid phosphoprotein identified after multiple sequence alignment, listed by accession

Serial No. Accession Mutation (reference residue, position, substituted residue)
1. QJF74875 R203K
2. QJF74875 G204R
3. QKM75385 R203K
4. QKM75385 G204R
5. QKM75397 R203K
6. QKM75397 G204R
7. QKM75409 R203K
8. QKM75409 G204R
9. QKM75421 R203K
10. QKM75421 G204R
11. QKM75433 R203K
12. QKM75433 G204R
13. QKM75445 P207L
14. QKM75445 M210I
15. QKM75505 R203K
16. QKM75505 G204R
17. QKM75505 D377G
18. QKM75517 R203K
19. QKM75517 G204R
20. QKM75517 D377G
21. QKM75529 R203K
22. QKM75529 G204R
23. QKM75529 D377G
24. QKM75541 G204R
25. QKM75541 D377G
26. QKM75541 R203K
27. QKM75552 R203K
28. QKM75552 G204R
29. QKM75552 D377G
30. QKM75563 R203K
31. QKM75563 G204R
32. QKM75563 D377G
33. QKM75575 R203K
34. QKM75575 G204R
35. QKM75587 R203K
36. QKM75587 G204R
37. QKM75599 R203K
38. QKM75599 G204R
39. QKM75647 R203K
40. QKM75647 G204R
41. QKM75659 R203K
42. QKM75659 G204R
43. QKM75683 R203K
44. QKM75683 G204R
45. QKM75695 R203K
46. QKM75695 G204R
47. QKQ30536 R203K
48. QKQ30536 G204R
49. QKQ30548 R40C
50. QKQ30560 R203K
51. QKQ30560 G204R
52. QKQ30572 R203K
53. QKQ30572 G204R
54. QKQ30584 R203K
55. QKQ30584 G204R
56. QLA10246 R203K
57. QLA10246 G204R
58. QLA10270 R203K
59. QLA10270 G204R
60. QLA10282 R203K
61. QLA10282 G204R
62. QLA10294 P383L
63. QLA10294 R203K
64. QLA10294 G204R
65. QLA10306 R203K
66. QLA10306 G204R
67. QLA10318 G204R
68. QLA10318 R203K
69. QLA10330 G204R
70. QLA10330 R203K
71. QLA10342 G204R
72. QLA10342 R203K
73. QLA10354 R203K
74. QLA10354 G204R
75. QOI53600 P13L
76. QOQ57020 S194L
77. QOQ57032 S194L
78. QOQ57044 S194L
79. QOQ57056 S194L
80. QOQ57068 S194L
81. QOQ57092 M234I
82. QOQ57104 S194L
83. QOQ57116 S194L
84. QOQ57129 S194L
85. QOQ72552 S194L
86. QOQ72564 S194L
87. QOQ72576 S194L
88. QOQ84803 S194L
89. QOQ84834 S194L
90. QOR63442 T205I
91. QOR63454 S194L
92. QOR63466 T205I
93. QOR63514 A119S
94. QOR63514 S194L
95. QOR64241 S194L
96. QOR64253 S194L
97. QOS50459 P13L
98. QOS50495 T91I
99. QOS50507 P13L
100. QOS50519 P13L
101. QOS50531 P13L
102. QOS50590 P13L
103. QOS50650 P13L
104. QOS50674 P13L
105. QOS50686 P13L
106. QOS50686 D225Y
107. QOS50722 P13L
108. QOS50734 P13L
109. QOS50746 P13L
110. QOS50746 S413I
111. QOS50758 S413I
112. QOS50758 P13L
113. QOS50770 P13L
114. QOS50782 P13L
115. QOS50818 P13L
116. QOS50830 P13L
117. QOS50853 P13L
118. QOS50865 P13L
119. QOS50889 Q9H
120. QOS50889 P199S
121. QOS50901 S202N
122. QOS50924 S202N
123. QOS50948 P13L
124. QOS50960 P13L
125. QOS50972 P13L
126. QOS50996 P13L
127. QOS51008 P13L
128. QOS51020 P13L
129. QOS51032 P13L
130. QOS51068 R209I
131. QOS51068 P367L
132. QOS51080 R203K
133. QOS51080 G204R
134. QOS51092 P13L
135. QOS51104 R203K
136. QOS51104 G204R
137. QOU99154 P14L
138. QOU99201 Q9H
139. QOU99201 P199S
140. QOU99223 Q9H
141. QOU99223 P199S
142. QOU99247 S202N
143. QOU99259 S202N
144. QOU99270 Q9H
145. QOU99270 P199S
146. QOU99281 Q9H
147. QOU99281 P199S
148. QOU99292 Q9H
149. QOU99292 P199S
150. QOU99303 Q9H
151. QOU99303 P199S

Results

Mutational identification

A total of 729 structural protein sequences (spike glycoproteins, nucleocapsid phosphoproteins, envelope proteins, and membrane glycoproteins) submitted from Asian countries in October 2020 were retrieved from the NCBI Virus database, along with four reference sequences. The reference spike glycoprotein, nucleocapsid phosphoprotein, envelope protein, and membrane glycoprotein are 1273, 419, 75, and 222 amino acids long, respectively.

After alignment, the sequences were viewed in Jalview to detect mutations in the structural proteins of the Asian isolates relative to the Wuhan isolate. Amongst the 729 sequences released from Asia, a total of 987 point mutations were detected across all four structural proteins (Figure 1). Among the 311 mutants, spike showed the most mutations, with 807 point mutations (Table 3), followed by nucleocapsid with 151 mutations (Table 2), while envelope showed 19 mutations (Table 1) and membrane only 10 point mutations (Table 4).
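
The per-protein counts can be checked against the reported total, and each protein's share of all detected mutations computed, with a few lines of code. The following is a minimal sketch using only the counts stated above; the percentage shares are an illustrative derived quantity and are not reported in the study.

```python
# Point-mutation counts per structural protein, as reported in the text.
mutation_counts = {
    "spike": 807,
    "nucleocapsid": 151,
    "envelope": 19,
    "membrane": 10,
}

total = sum(mutation_counts.values())

# Share of all detected mutations contributed by each protein, in percent.
shares = {protein: 100 * count / total for protein, count in mutation_counts.items()}

print("total:", total)
for protein, count in mutation_counts.items():
    print(f"{protein:<13}{count:>5}{shares[protein]:>7.1f}%")
```

The four counts sum to the reported total of 987, and spike alone accounts for roughly four fifths of the detected mutations.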

Table 3.

Point mutations in the SARS-CoV-2 surface glycoprotein identified after multiple sequence alignment, listed by accession

Serial No. Accession Mutation (reference residue, position, substituted residue)
1. QJF74843 V367F
2. QJF74867 D614G
3. QOI53592 M153I
4. QMI57728 T95I
5. QMI57728 N185K
6. QOI53580 D614G
7. QOR64233 D614G
8. QOR64245 D614G
9. QOQ57012 D614G
10. QOQ57024 D614G
11. QOQ57060 D614G
12. QOR64233 A701T
13. QOR64233 P812L
14. QOR64245 P812L
15. QOQ57012 P812L
16. QOQ57060 P812L
17. QOR64233 H1083Q
18. QOR64245 H1083Q
19. QOQ57012 H1083Q
20. QOQ57024 H1083Q
21. QOQ57072 D614G
22. QOQ57084 D614G
23. QOQ57096 D614G
24. QOQ57108 D614G
25. QOQ57121 D614G
26. QOQ57108 A701T
27. QOQ57072 A701T
28. QOQ57096 P812L
29. QOQ57121 P812L
30. QOQ57096 H1083Q
31. QOQ57121 H1083Q
32. QOQ57108 H1083Q
33. QOQ72544 L54F
34. QOQ72556 L54F
35. QOQ72568 L54F
36. QOQ72544 D614G
37. QOQ72556 D614G
38. QOQ72568 D614G
39. QOQ72580 D614G
40. QOQ84795 D614G
41. QOQ72544 A701T
42. QOQ72556 P812L
43. QOQ72568 P812L
44. QOQ72580 P812L
45. QOQ72544 H1083Q
46. QOQ84826 D614G
47. QOR63434 D614G
48. QOR63446 D614G
49. QOR63458 D614G
50. QOR63470 D614G
51. QOR63434 P812L
52. QOR63470 P812L
53. QOQ53335 S305T
54. QOQ53335 C488R
55. QOR63482 D614G
56. QOQ57036 D614G
57. QOQ57048 D614G
58. QOR63506 D614G
59. QOQ53335 D614G
60. QOQ57048 A701T
61. QOQ57036 P812L
62. QOQ57048 P812L
63. QOQ57036 H1083Q
64. QOQ57048 H1083Q
65. QOQ53339 F2L
66. QOQ53339 V11I
67. QOQ53339 S13R
68. QOQ53339 Q14H
69. QOQ53339 R34H
70. QOQ53339 V42I
71. QOQ53339 R44K
72. QOQ53339 V47I
73. QOQ53339 F59I
74. QOQ53339 K77N
75. QOQ53339 D111N
76. QOQ53339 Q115H
77. QOQ53339 A123T
78. QOQ53339 N487I
79. QOQ53339 V512L
80. QOQ53339 A522P
81. QOQ53339 A262T
82. QOQ53339 Q677H
83. QOQ53336 G199R
84. QOQ53340 A262T
85. QOQ53338 C301S
86. QOQ53340 R328T
87. QOQ53337 R457T
88. QOQ53338 D614G
89. QOQ53340 D614G
90. QOQ53336 A684V
91. QOQ53336 A688P
92. QOQ53336 V705I
93. QOQ53337 H1048Y
94. QOQ53337 Q1180H
95. QOQ53337 K1181Q
96. QOQ53341 V11I
97. QOL24227 V11I
98. QOQ53341 R44K
99. QOQ53341 V47I
100. QOL24227 K77N
101. QOQ53341 K77N
102. QOQ53341 K97N
103. QOQ53341 D111N
104. QOL24227 D111N
105. QOL24227 R190K
106. QOL24227 D198E
107. QOL24225 E224K
108. QOL24225 D228N
109. QOL24226 E224K
110. QOL24226 D228N
111. QOL24227 A262T
112. QOQ53341 Q271H
113. QOQ53341 F275L
114. QOL24228 V407L
115. QOL24228 P412S
116. QOL24227 D427H
117. QOL24227 N440H
118. QOL24227 Q474P
119. QOL24228 D614G
120. QOL24227 D614G
121. QOQ53341 G669R
122. QOQ53341 Q675R
123. QOQ53341 Q677H
124. QOL24227 S686I
125. QOL24227 A688P
126. QOL24225 K790Q
127. QOL24226 K790Q
128. QOL24225 R815K
129. QOL24226 R815K
130. QOL24225 D820N
131. QOL24226 D820N
132. QOL24225 D830N
133. QOL24226 D830N
134. QOL24228 P863H
135. QOL24241 F2L
136. QOL24241 V11I
137. QOL24241 Q14H
138. QOL24241 R34C
139. QOL24241 Y37N
140. QOL24241 V42I
141. QOL24241 R44K
142. QOL24241 F65I
143. QOL78311 S94F
144. QOL78311 T95P
145. QOL24240 D111N
146. QOL24240 A282T
147. QOL78311 D568H
148. QOL78311 D614G
149. QOL24241 H655Y
150. QOL24240 Q675R
151. QOL79057 S13N
152. QOL79057 D40E
153. QOL79057 V42L
154. QOL79057 S161F
155. QOL79057 S246N
156. QOL79057 D614G
157. QOL79058 D614G
158. QOL79058 R1019K
159. QOL79058 P1090L
160. QOL79059 V11F
161. QOL79059 R21K
162. QOL79135 R21K
163. QOL79135 A222V
164. QOL79061 K529I
165. QOL79059 D614G
166. QOL79135 D614G
167. QOL79061 E619K
168. QOL79061 G652R
169. QOL79061 Q677H
170. QOL79061 Y695N
171. QOL79061 V729A
172. QOL79136 D614G
173. QOL79137 V11I
174. QOL79137 Q115H
175. QOL79137 D614G
176. QOL79137 P863H
177. QOL79137 Q913H
178. QOL79137 I934T
179. QOL79333 C136W
180. QOL79333 N137Y
181. QOL79333 I203L
182. QOL21485 H207P
183. QOL20612 E224K
184. QOL21486 E224K
185. QOL21486 R237K
186. QOL21486 F238V
187. QOL21486 Q239P
188. QOL20612 T240S
189. QOL79332 A262T
190. QOL79333 L252P
191. QOL79332 D467N
192. QOL21486 A475V
193. QOL20612 Q506H
194. QOL20612 V510E
195. QOL20612 V512E
196. QOL20612 D614G
197. QOL21485 V511I
198. QOL79333 D568H
199. QOL79332 Q675R
200. QOL20612 V826G
201. QOL21485 V826G
202. QOL20612 I844M
203. QOL21486 I844F
204.