Received Date: November 02, 2016; Accepted Date: November 09, 2016; Published Date: November 16, 2016
Citation: Yaqoob U, Kaul T, Pandey S, Nawchoo IA (2016) In-silico Characterization, Structural Modelling, Docking Studies and Phylogenetic Analysis of 5-Enolpyruvylshikimate-3-Phosphate Synthase Gene of Oryza sativa L. Med Aromat Plants (Los Angel) 5:274. doi: 10.4172/2167-0412.1000274
Copyright: © 2016 Yaqoob U, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
5-Enolpyruvylshikimate-3-phosphate synthase (EPSPS) is one of the vital enzymes of the shikimate pathway, which is involved in the biosynthesis of secondary metabolites and several amino acids. Multiple sequence alignment of EPSPS protein sequences from different plants showed conserved regions at different stretches, with maximum homology in amino acid residues. We built a homology model of the Oryza sativa EPSPS (OsEPSPS) protein using the structure of E. coli EPSPS as the template. The resulting model was validated with PROCHECK, the RAMPAGE server, ProSA and Verify3D, which indicated that the model structure is reliable. Ramachandran plot analysis showed that the conformations of 94.3% of amino acid residues lie within the most favoured regions. Motif analysis revealed that a conserved EPSPS domain is found in all EPSPS proteins irrespective of plant species, suggesting a possible role in cellular and metabolic functions. The phylogenetic tree revealed distinct EPSPS clusters for bacteria, monocots and dicots. The interacting partners of the gene show the importance of this gene family in regulating developmental and metabolic functions. The two conserved motifs LP(G/S)KSLSNRILLLAAL and LFLGNAGTAMRPL, present in EPSPS from almost all plant species examined, may function as the catalytic domains of EPSPS enzymes and are thought to contribute to the glyphosate binding site.
5-Enolpyruvylshikimate-3-phosphate synthase (EPSPS), one of the key enzymes of the shikimate pathway, is involved in the biosynthesis of the aromatic amino acids phenylalanine (Phe), tyrosine (Tyr) and tryptophan (Trp) and of other secondary products (auxin, salicylate, folic acid, phytoalexins, flavonoids, alkaloids etc.) essential for plant survival. It is also a verified specific target of the broad-spectrum herbicide glyphosate (N-phosphonomethyl glycine). EPSPS (aroA) catalyses the transfer of the enolpyruvyl moiety from phosphoenolpyruvate (PEP) to shikimate-3-phosphate (S3P), forming EPSP and inorganic phosphate. The reaction is chemically unusual because it proceeds via C–O bond cleavage of phosphoenolpyruvate rather than via P–O bond cleavage. Glyphosate (GPJ) inhibits EPSPS in a slowly reversible reaction that is competitive with respect to PEP and uncompetitive with respect to S3P [5,6]. In most crops and weeds, glyphosate starves the plant of aromatic amino acids by competitively inhibiting the binding of PEP to EPSPS. Mutagenesis of EPSPS has been carried out in various species to obtain glyphosate-tolerant variants, such as proline-106 to serine in E. indica, proline-106 to leucine in N. tabacum, glycine-100 to alanine in Agrobacterium sp. strain CP4 and proline-101 to serine in N. tabacum. The occurrence of the shikimate pathway in algae, bacteria, fungi and plants makes EPSPS a principal target for developing herbicide-resistant genetically modified crops. Understanding how it regulates metabolic and developmental processes in diverse plant species would therefore be a major advance for engineering new herbicides, developing glyphosate-resistant crops and designing new antibiotic and anti-parasitic drugs.
Comparative modelling and structural analysis
The reference sequence of EPSPS from Oryza sativa was retrieved from the NCBI database (http://www.ncbi.nlm.nih.gov). Comparative modelling was performed by searching the PDB of known protein structures with the target sequence as the query. The target sequence was searched for similar sequences using BLAST (Basic Local Alignment Search Tool) against the Protein Data Bank (PDB) (http://www.rcsb.org). The best template for the query sequence was chosen based on the e-value, % sequence identity and % sequence coverage. The BLAST results yielded the X-ray structure of EPSPS from E. coli, with 53% sequence identity to our target protein (OsEPSPS). All EPSPS sequences were aligned using ClustalW to assess the similarity among them. 2D and 3D structure alignments were carried out using ClustalW and MATRAS 1.2, respectively. The EPSPS sequences were further analysed for the presence of specific EPSPS domains and motifs through Motif Scan (myhits.isb-sib.ch/cgi-bin/motif_scan) and ScanProsite (Prosite.expasy.nlm.nih.gov). Conserved motifs were analysed with MEME version 3.5.7 using minimum and maximum motif widths of 20 and 50 residues, respectively, and a maximum of 7 motifs, keeping the remaining settings at their defaults. The theoretical structure of OsEPSPS was generated by comparative modelling with Modeller 9.12.
The secondary structural features of the template and target EPSPS sequences were calculated using SOPMA. The physico-chemical properties of the EPSPS sequences, namely molecular weight, theoretical isoelectric point (pI), number of amino acids, total number of positive and negative residues, aliphatic index, grand average of hydropathy (GRAVY), extinction coefficient and instability index, were evaluated using Expasy’s ProtParam server (http://us.expasy.org/tools/protparam.html). The sub-cellular localizations were predicted using CELLO v.2.5. The N-glycosylation sites of the EPSPS proteins were predicted using the NetNGlyc 1.0 server (http://www.cbs.dtu.dk/services/NetNGlyc/). The interacting partners of EPSPS and its co-expressed genes were predicted using STRING (http://string-db.org/).
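Two of the ProtParam properties listed above have simple closed-form definitions, so they can be checked by hand. The sketch below is not the ProtParam implementation; it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are illustrative, and the input is the conserved motif LFLGNAGTAMRPL reported in this study rather than a full-length protein.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    seq = seq.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Conserved EPSPS motif from this study, used only as a short test input.
motif = "LFLGNAGTAMRPL"
print("GRAVY:", round(gravy(motif), 3))
print("Aliphatic index:", round(aliphatic_index(motif), 1))
```

A positive GRAVY indicates an overall hydrophobic sequence and a negative one an overall hydrophilic sequence; values for a short motif are not comparable to the whole-protein values reported in Table 3.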
Model validation of OsEPSPS
The model was evaluated against geometrical and stereo-chemical constraints using the RAMPAGE server (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php), PROCHECK, Verify3D and ProSA-Web. The model with the fewest residues in the disallowed region was selected for further study. The RMSD between the template and the target was calculated using MOE. The best model structure was then compared with the template protein by superimposition using SuperPose version 1.0.
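For reference, the RMSD between two structures is the square root of the mean squared distance between paired atoms. The following is a minimal sketch of that definition, not of the MOE routine used here: it assumes the two structures have already been superimposed and that atoms (for example Cα atoms) are listed in matching order. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates in angstroms (made up for illustration, not taken from 1G6S).
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 1.0, y, z) for x, y, z in template]  # every atom shifted by 1 angstrom
print(rmsd(template, model))
```

Because every atom in the toy model is displaced by exactly 1 Å, the RMSD is 1 Å; in a real comparison the per-atom deviations differ and loop regions typically contribute most.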
Active site prediction and molecular docking
Active sites of the model and template proteins were identified using different binding site prediction servers: Q-SiteFinder (http://bmbpcu36.leeds.ac.uk/qsitefinder/), CASTp (http://sts-fw.bioengr.uic.edu/castp/) and the PINUP server (http://sparks.informatics.iupui.edu/PINUP/) [29-31]. The refined protein model (OsEPSPS) was used to study its ligand binding mechanism. Docking analysis was performed with the Sybyl 8.0 molecular modelling tool to identify sites on the protein structure where favourable protein-ligand interactions can occur. The ligand molecules (S3P and GPJ) were docked inside the cavity of the OsEPSPS protein.
Phylogenetic analysis of the sequences was carried out with the UPGMA method using Molecular Evolutionary Genetic Analysis (MEGA) software version 4.1. Each node was tested using the bootstrap approach with 5,000 replicates.
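UPGMA builds a tree by repeatedly merging the two closest clusters and setting the distance from the merged cluster to every other cluster to the average over all leaf pairs. The sketch below illustrates only that clustering step; it is not the MEGA implementation, omits branch lengths and bootstrapping, and uses placeholder taxa A to D with made-up distances rather than the EPSPS data.

```python
from itertools import combinations


def upgma(names, dist):
    """Return the UPGMA tree topology as nested tuples.

    names: list of leaf labels.
    dist:  dict mapping (leaf_a, leaf_b) to their pairwise distance.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for other in clusters:
            # Size-weighted mean equals the average over all leaf pairs.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Placeholder taxa and distances, invented purely for illustration.
taxa = ["A", "B", "C", "D"]
distances = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 10.0,
    ("B", "C"): 6.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
print(upgma(taxa, distances))
```

In a bootstrap analysis, the alignment columns would be resampled with replacement, the tree rebuilt for each replicate, and each node scored by how often it recurs.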
Comparative modelling and structural analysis
The Oryza sativa EPSPS (OsEPSPS) protein sequence comprises 515 amino acid residues. Sequences that showed maximum identity with a high score and low e-value were aligned. In the BLAST search against the PDB, three reference proteins (PDB ID: 3NVS, 1G6S, 3FJX) showed a high level of sequence identity: 54%, 53% and 53%, respectively. The E. coli template (PDB ID: 1G6S), with an e-value of 2e-149 and a query cover of 84%, was selected for homology modelling. Structurally conserved regions (SCRs) between the OsEPSPS model (target) and the homologous proteins (PDB: 1G6S, 3NVS, 3FJX) were determined by multiple sequence alignment (Figure 1). Multiple sequence alignment of the EPSPS sequences highlighted the conservation of amino acid residues among different species (Supplementary File 1). SCRs between the OsEPSPS model and the template (PDB: 1G6S) were also determined (Figure 2). An extensive search for motifs and their positions with MEME identified several conserved motifs in the EPSPS protein sequences (Figure 3). Multilevel consensus sequences for the MEME-defined motifs, along with their functions, are shown in Table 1. The LP(G/S)KSLSNRILLLAAL and LFLGNAGTAMRPL motifs were present in almost all selected species.
|Motif||Multilevel consensus sequences||Function|
|1||ITPPEKLNVTEIDTYDDHRMAMCFSLAACADVPVTIKDPGCTRKTFPDYF||Protein kinase C phosphorylation site, Casein kinase II phosphorylation site and N-glycosylation site|
|2||DVNMNKMPDVAMTLAVVALFADGPTAIRDVASWRVKETERMIAICTELRK||EPSP synthase, Protein kinase C phosphorylation site.|
|3||EGDASSASYFLAGAAITGGTVTVEGCGTNSLQGDVKFAEVLEKMGAKVTW||DLRB LDL-receptor class B (LDLRB), N-myristoylation site|
|4||ISSQYLTALLMAAPLALGDVEIEIIDKLISIPYVEMTLKLMERFGVSVEH||Protein Kinase C Phosphorylation Site|
|5||VLQPIKEISGTIKLPGSKSLSNRILLLAALSEGTTVVDNLLNSDDIHYML||Casein kinase II phosphorylation site, Protein kinase C phosphorylation site, Pumilio RNA-binding repeat profile.|
Table 1: Multilevel consensus sequences for the MEME defined motifs and their predicted functions.
The initial model of OsEPSPS was built by homology modelling using Modeller 9.12. Modeller constructed five model structures for OsEPSPS, and the model with the lowest Discrete Optimized Protein Energy (DOPE) score was visualized in Accelrys Discovery Studio version 4.1. This model was used for the identification of active sites and for docking of the substrate with EPSPS. Both rice and E. coli EPSPS harbour the two EPSPS domains, which probably indicates a mode of action in rice similar to that in microbes. In this study, the predicted 3D structure of OsEPSPS was generated and its N-terminal and C-terminal domains were identified (Figure 4). In E. coli, EPSPS consists of six aligned parallel alpha-helices in each of two similar EPSPS I domains. Similar domain structures were detected by Gong et al., Garg et al. and Filiz and Koc. Bacterial EPSPSs are reported to fold into two globular domains and an inside-out α-β barrel domain, with PEP and S3P binding in the interdomain cleft region. The secondary structural features of the EPSPS sequences of 1G6S and OsEPSPS were calculated using SOPMA with default parameters (Table 2). In rice, the EPSPS protein is composed of 42.52% α-helices, 17.86% extended strands and 10.10% beta turns. In E. coli, the EPSPS protein is composed of 38.88% α-helices, 20.61% extended strands and 11.48% beta turns. Thus, α-helices and β-sheets cover comparatively large portions of the rice and E. coli EPSPS enzymes. Similar results have been observed by Gong et al., Garg et al. and Filiz and Koc in several plant species. The ScanProsite server identified the two signature sequences LFLGNAGTAMRPLTA (166-180) and RVKETERMVAIRTELTKLG (427-445) in both target and template. Several physico-chemical properties of the EPSPS sequences were calculated using Expasy’s ProtParam server. The results are shown in Table 3.
The computed isoelectric point (pI) will be useful in developing buffer systems for protein purification by isoelectric focusing. The very high aliphatic index of the EPSPS enzyme sequences indicates that these enzymes may be stable over a wide temperature range. The high extinction coefficient of the rice enzyme indicates the presence of more Cys, Trp and Tyr residues. The instability index of the EPSPS proteins ranged from 28.78 to 33.83, indicating that the proteins are stable (values below 40 are predicted to be stable). Using the NetNGlyc 1.0 server, two N-glycosylation sites (188 NATY and 464 NITA) were predicted in the OsEPSPS protein; these may play a role in post-translational modifications required for enzymatic function. N-glycosylation is an essential post-translational modification of proteins.
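Both predicted sites, NATY and NITA, fit the classical N-glycosylation sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below scans a sequence for that pattern with a regular expression. It is only a pattern match under the assumption of the PROSITE-style rule N-{P}-[ST]-{P}; NetNGlyc itself uses a trained predictor and scores each candidate, so a sequon match is necessary but not sufficient. The function name is illustrative, and the input is the MEME motif 1 consensus from Table 1, which is annotated there as containing an N-glycosylation site.

```python
import re

# PROSITE-style sequon N-{P}-[ST]-{P}; the lookahead allows overlapping hits.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(seq):
    """Return (1-based position, 4-residue match) for each candidate sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq.upper())]


# MEME motif 1 consensus sequence from Table 1.
motif1 = "ITPPEKLNVTEIDTYDDHRMAMCFSLAACADVPVTIKDPGCTRKTFPDYF"
print(find_sequons(motif1))  # [(8, 'NVTE')]
```

Positions are relative to the start of the input string, so the hit at position 8 refers to the motif consensus, not to residue numbering in the full-length OsEPSPS sequence.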
|Secondary structure element||OsEPSPS||1G6S|
Table 2: Details of the calculated secondary structure elements by SOPMA.
|Property||OsEPSPS||1G6S|
|Number of amino acids||515||427|
|Grand average of hydropathicity (GRAVY)||0.101||‐0.005|
|Extinction coefficients (M⁻¹ cm⁻¹)||34755||30745|
|CELLO predicted location||Combined||Combined|
|Predicted N-glycosylation sites||188 NATY, 464 NITA||-|
Table 3: Physiochemical, structural and sequence properties, sub-cellular localizations and N-glycosylation sites of the EPSPS protein sequences.
Using STRING, the interacting partners of EPSPS and its co-expressed genes were predicted in both rice and E. coli (Figure 5). Proteins such as 3-dehydroquinate synthase, 3-dehydroquinate dehydratase, shikimate kinase, chorismate synthase and shikimate-5-dehydrogenase were found to be common interacting partners of EPSPS in both rice and E. coli. In the second step of the shikimate pathway, 3-dehydroquinate synthase converts 3-deoxy-D-arabino-heptulosonate-7-phosphate to 3-dehydroquinate and is essential for basic cellular metabolism. In the fifth step, shikimate kinase, an ATP-dependent enzyme, catalyses the phosphorylation of shikimate to shikimate-3-phosphate. The seventh step of the shikimate pathway for the biosynthesis of aromatic amino acids is catalysed by chorismate synthase, which is conserved in prokaryotes, fungi and plants.
Validation of OsEPSPS structure
Analysis of the model with the RAMPAGE server and PROCHECK revealed that 94.3% of residues fall in the most favoured region, 4.1% in the allowed region and 1.6% in the outlier region of the Ramachandran plot (Figure 6). ProSA-Web analysis gave a Z-score of -8.01 for the target model OsEPSPS, which lies within the range of proteins determined by NMR and X-ray crystallography. This Z-score is close to that of the template 1G6S (-11.83), which suggests that the obtained model is reliable and very close to experimentally determined structures (Figure 7a). Verify3D gave a score greater than 0.2 for 76% of the residues, indicating that the quality of the OsEPSPS model is acceptable and reliable. The RMSD indicates the degree to which two three-dimensional structures are similar: the lower the value, the more similar the structures. The Cα RMSD and backbone RMSD between the OsEPSPS model and the E. coli template (1G6S) crystal structure were 1.58 Å and 1.56 Å, respectively, and the overall RMSD was 1.72 Å. Thus, the OsEPSPS model generated by Modeller 9.12 was confirmed to be reliable and accurate. The superimposition of the template and the model structure is shown in Figure 7b. The helix and sheet regions of the template and model superimpose well, whereas large deviations are observed mainly in loop regions. It is reported that loop regions are the main regions where a model protein structure deviates from its template. The ribbon diagram in Figure 7c shows the docking of glyphosate (white balls) and S3P (brown balls) into the structure of OsEPSPS (target).
Figure 7: (A) Validation of OsEPSPS by the ProSA tool. The Z-scores of OsEPSPS (target) and E. coli EPSPS (template) are plotted against those of proteins determined by NMR (dark blue) and X-ray crystallography (light blue). The two black dots represent the Z-scores of the target and the template. (B) Superposition of OsEPSPS (target) and the E. coli EPSPS template (PDB ID: 1G6S), shown in blue and green, respectively. (C) Ribbon diagram showing docking of glyphosate (white balls) and S3P (brown balls).
Prediction of active sites and docking studies
After the final model was built, the possible binding sites of OsEPSPS were searched using the binding site prediction servers Q-SiteFinder, CASTp and PINUP [29-31]. These analyses showed that residues K, Q and D are highly conserved in the active site of both the model and the template protein, so their biological function can be predicted to be identical. These conserved residues may function as the catalytic domains of EPSPS enzymes and could lie in the glyphosate binding site, as seen in bacterial EPSPS. The mutation of a single amino acid (particularly lysine or arginine) can alter the binding site of glyphosate. Molecular docking was performed with the Surflex-Dock method in Sybyl 8.0 (Tripos Inc., USA). We docked S3P and GPJ inside the cavity of the OsEPSPS protein (Figure 7c). Shikimate-3-phosphate (S3P) binds residues at positions 94, 95, 99, 173, 249, 250, 251, 277, 280, 402 and 429, which are K, S, R, T, S, S, Q, S, Y, D and K, respectively. Glyphosate (GPJ) binds residues at positions 94, 170, 172, 202, 251, 402, 430, 433, 474, 475 and 500, which are K, N, G, R, Q, D, E, R, H, R and K, respectively. GPJ and S3P share the residues K, Q and D at positions 94, 251 and 402, respectively (Table 4). The glyphosate binding site is dominated by basic residues (Arg and Lys), indicating their role in glyphosate-EPSPS binding.
|Ligand Name||Binding Residues|
|S3P||94K 95S 99R 173T 249S 250S 251Q 277S 280Y 402D 429K|
|GPJ||94K 170N 172G 202R 251Q 402D 430E 433R 474H 475R 500K|
|PO4||168L 169G 170N 171A 196V 199M|
Table 4: Binding residues of different ligands of the OsEPSPS protein.
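The overlap between the ligand binding sites in Table 4 can be checked with a simple set intersection. The sketch below uses only the residue lists from Table 4; the dictionary layout and function name are illustrative choices, not part of the docking workflow.

```python
# Binding residues copied from Table 4 (position followed by one-letter code).
binding = {
    "S3P": "94K 95S 99R 173T 249S 250S 251Q 277S 280Y 402D 429K".split(),
    "GPJ": "94K 170N 172G 202R 251Q 402D 430E 433R 474H 475R 500K".split(),
    "PO4": "168L 169G 170N 171A 196V 199M".split(),
}


def shared_residues(ligand_a, ligand_b):
    """Residues contacted by both ligands, sorted by sequence position."""
    common = set(binding[ligand_a]) & set(binding[ligand_b])
    return sorted(common, key=lambda residue: int(residue[:-1]))


print(shared_residues("S3P", "GPJ"))  # ['94K', '251Q', '402D']
print(shared_residues("GPJ", "PO4"))  # ['170N']
```

The first result reproduces the shared K, Q and D residues at positions 94, 251 and 402 reported in the text.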
The phylogenetic analysis of EPSPS across the selected organisms showed a clear delineation of EPSPS into four clusters. The phylogenetic tree outlines the evolution of EPSPS in Arabidopsis thaliana, Amborella trichopoda, Brassica rapa, Brachypodium distachyon, Cucumis melo, Fragaria vesca, Glycine max, Malus domestica, Oryza sativa, Populus trichocarpa, Phoenix dactylifera, Setaria italica, Sorghum bicolor, Solanum lycopersicum, Vitis vinifera, Zea mays, E. coli and V. cholerae. Many of these exhibited orthologous and paralogous relationships with each other (Figure 8). B. distachyon showed the highest sequence similarity to OsEPSPS. Amborella trichopoda is believed to be the most basal lineage in the clade of angiosperms. The results indicate that the EPSPS gene family is strictly conserved and has evolved from bacteria.
The first author is grateful to the Council of Scientific and Industrial Research (CSIR) for providing financial assistance.
We declare that we have no conflict of interest.