Identification of Genes for Wheat Fungal Resistance Using Bioinformatics Techniques

For the majority of the world's population, wheat (Triticum aestivum L.) is the most essential and economically important cereal grain crop. Pests and pathogens constantly threaten wheat production and sustainable development in both rich and developing countries. Multiple gene pathways have been reported to be associated with wheat's biological resistance to fungal pathogens. Our aim is to use bioinformatics tools to detect and classify fungal resistance genes in wheat through sequence alignment, protein domain identification and phylogenetic analysis, and in addition to introduce restriction fragment length polymorphism (RFLP) information for these genes together with a new primer database. Approximately 138 DNA sequences were recovered from the wheat genome by aligning 3845 anti-fungal amino acid sequences using the tblastn tool. The NCBI blastn online tool was used to detect sequences with functional genes, and 92 genes were detected. The total number of nucleotides was 48385; the shortest DNA sequence has 302 bp and the longest has 977 bp, with an average length of 525.9 bp per sequence. Wheat chromosomes 3D and 4B have the highest number of sequences (9), followed by chromosomes 3B (7) and 3A (6), and wheat genomes A, B and D have 30, 35 and 27 genes, respectively. Five different amino acid motifs were revealed among the studied wheat amino acid sequences. Gene annotation tools were used to infer annotations for the studied amino acid sequences. The sequences belong to the lectin, kinase, tyrosine-protein kinase (STK), thaumatin, and cysteine-rich repeat families, representing 2, 9, 8, 19 and 23 genes, respectively, in addition to 31 hypothetical genes. The chemical content of the proteins was assessed through 16 different chemical and physical characteristics.


Introduction
For the majority of the world's population, wheat (Triticum aestivum L.) is the most essential and economically important cereal grain crop. Its 29 percent share of the world's 730 million hectares of cultivated cropland shows the significance of wheat production to the world economy. According to FAO figures for 2017, this percentage corresponds to 218 million hectares of wheat area (1). In most of the world, wheat is a key ingredient of the human diet. Worldwide, 735 million tons of wheat were produced in 2015/2016, worth around US$ 145 billion (2). Wheat accounts for almost 55% of the carbohydrates and 20% of the world's food calorie intake (3).
In Egypt, the significance of the wheat crop stems from its strategic importance among Egyptian dietary commodities, as it provides more than one-third of Egyptian consumers' daily calorie intake and 45% of their daily total protein intake (4). Wheat is Egypt's largest winter grain and is cultivated across the Nile delta and along the length of the Nile Valley. Wheat is usually planted after summer maize, cotton, or rice crops in early November and harvested in late April or early May. Germination happens after planting at 19 to 22 weeks (5). While Egypt's wheat productivity has increased over the past few years, domestic wheat production covers only 45% of yearly domestic demand. Egypt is still one of the largest wheat-importing countries. Wheat imports were approximately 9.8 million tons in 2011, costing approximately US$ 3.2 billion (6).
Pests and pathogens constantly threaten wheat production and sustainable development in both rich and developing countries (7). A key component of meeting this challenge is proper management of fungal diseases, which may account for yield losses of 15-20 percent per year. The rusts, blotches and head blight/scab are among the major wheat diseases that contribute to these losses (2). There are three wheat rust diseases, namely stem, stripe and leaf rust, all caused by members of the Basidiomycete genus Puccinia: P. graminis f. sp. tritici, P. striiformis f. sp. tritici (PST) and P. triticina (Pt), respectively (2).
Multiple gene pathways have been reported to be associated with wheat's biological resistance to fungal pathogens. The resistance is attributable to the additive effects of multiple resistance metabolites and proteins produced through a network of multiple plant R gene structures. Plants recognize pathogenic elicitors or receptors and then stimulate downstream genes to ultimately produce resistance metabolites and proteins that suppress pathogen progression (8). Following the perception of pathogens and microbes, reactive oxygen species (ROS) have consistently been found to accumulate in the plant, and over the years ROS have been postulated to be an essential part of the plant's defense response (9).
The connection between genetic variability identified using distinct molecular marker assays and the genes responsible for morphological and physiological characteristics can be identified through sophisticated bioinformatics analysis tools (10)(11)(12). These tools can be used to identify and reveal new genes in wheat related to resistance to fungal diseases. Our aim is to use such tools to detect and classify fungal resistance genes in wheat through sequence alignment, protein domain identification and phylogenetic analysis. In addition, we introduce restriction fragment length polymorphism (RFLP) information for these genes together with a new primer database.

Materials and Methods
Anti-fungal gene sequences were downloaded from the NCBI database (13); in total, 3845 antifungal protein sequences were retrieved. The draft genome sequence of wheat was downloaded from the Ensembl database (14). The local NCBI BLAST package (15) was used to build a sequence database from the wheat draft genome and to search all anti-fungal protein sequences against this wheat database with TBLASTN. The online NCBI BLAST tblastn tool was used to annotate the sequences recovered in the previous step. The MEME suite was used to discover amino acid motifs in the sequences (16). The MEGA X program was used to construct the phylogenetic tree with the maximum likelihood algorithm (17). The clustalo tool (18) was used to construct a sequence similarity matrix through multiple sequence alignment. ClustVis was applied to create Principal Component Analysis (PCA) plots and heatmaps depicting gene similarities. Pepstats (19) was run through in-house Perl scripts to assess the chemical and physical characteristics of the proteins. The Perl script RestrictionDigest was used to detect restriction enzyme recognition sites in the DNA sequences (20). The BatchPrimer3 online tool was used to design PCR primers that could be used to target the studied wheat genes (21).
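The alignment step yields one record per hit, which must be filtered and converted into genomic regions. The following is a minimal sketch of that post-processing, assuming the TBLASTN search was written in the standard 12-column tabular format (-outfmt 6). The e-value cutoff, the query names and the scores in the example are hypothetical and are not taken from this study.

```python
import csv
import io

# The 12 default columns of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(handle, max_evalue=1e-5):
    """Yield (region_id, query_id, evalue) for hits passing the cutoff.

    max_evalue is an assumed threshold, not the one used in the study.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue
        # Reverse-strand hits are reported with sstart > send.
        start, end = sorted((int(hit["sstart"]), int(hit["send"])))
        yield f'{hit["sseqid"]}:{start}-{end}', hit["qseqid"], evalue


# Hypothetical two-line example; query IDs and scores are made up.
example = (
    "protA\t1A\t62.5\t117\t40\t1\t1\t117\t551827789\t551828139\t3e-40\t150\n"
    "protB\t7B\t30.0\t50\t35\t0\t1\t50\t40655408\t40655259\t0.5\t30\n"
)
hits = list(parse_hits(io.StringIO(example)))
print(hits)
```

Region identifiers are built as chromosome:start-end so that they match the naming used for the wheat sequences in the results.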

Chromosomal distribution of wheat anti-fungal genes
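Approximately 138 DNA sequences were recovered from the wheat genome by aligning 3845 antifungal amino acid sequences using the tblastn tool. The NCBI blastn online tool was used to detect sequences with functional genes, and 92 genes were detected. The total number of nucleotides was 48385; the shortest DNA sequence has 302 bp and the longest has 977 bp, with an average length of 525.9 bp per sequence. Wheat chromosomes 3D and 4B have the highest number of sequences (9), followed by chromosomes 3B (7) and 3A (6).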
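The reported totals can be cross-checked with simple arithmetic: the per-genome gene counts (30, 35 and 27 for genomes A, B and D) should sum to the 92 detected genes, and 48385 nucleotides spread over 92 sequences should give the reported mean length. The short sketch below does this and also shows how a genome label could be derived from a region identifier. The helper name subgenome is hypothetical, and the identifiers are the ones quoted elsewhere in this paper.

```python
from collections import Counter

# Counts as reported in the text.
genes_per_genome = {"A": 30, "B": 35, "D": 27}
total_nucleotides = 48385

total_genes = sum(genes_per_genome.values())
mean_length = total_nucleotides / total_genes
print(total_genes, round(mean_length, 1))


def subgenome(region_id):
    """Return 'A', 'B' or 'D' from an ID such as '3D:550382548-550383990'."""
    chromosome = region_id.split(":", 1)[0]
    return chromosome[-1]


# Region identifiers quoted in this paper (not the full set of 92).
region_ids = [
    "1A:551827789-551828139", "5D:59043196-59044206",
    "7A:164401571-164402272", "3D:550382548-550383990",
    "1A:20888904-20889590", "7B:40654848-40655408",
    "3B:690358283-690358846",
]
genome_counts = Counter(subgenome(r) for r in region_ids)
print(genome_counts)
```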

Wheat anti-fungal proteins chemical and physical properties
The chemical content of the proteins was assessed through 16 different chemical and physical characteristics (Figure 2 and Table 1). The total molecular weight (MW) of the studied anti-fungal amino acid sequences was 1913 KDa with an average of 20 KDa, where 1A:551827789-551828139 has the minimum MW (10.6) and 5D:59043196-59044206 has the maximum MW (71.911.68). The extinction coefficient is a measure of how much light a protein absorbs at a given wavelength. An estimate of this coefficient is needed to track the protein in a spectrophotometer, and the amino acid content must be known to assess the molar extinction coefficient of the protein (22). The A280 molar extinction coefficients with cystine bridges (A280-MECc) and reduced (A280-MECr) are two different measurements of the extinction coefficient. Salt bridges are important motifs of tertiary protein structure and are mostly associated with the structural forces that maintain protein stability; they are commonly exposed to the solvent and are particularly vulnerable to solvent-solute interactions, typically with water as well as other cosolvents (23). The average residue weight (ARW) is the molecular weight of an amino acid sequence divided by its length. The total ARW for all amino acid sequences was 9829.3 Da; 7A:164401571-164402272 revealed the minimum ARW (98.498 Da), 3D:550382548-550383990 the maximum (116.631 Da), and the mean was 106.841 Da. The isoelectric point (IP) is the pH at which a protein's net charge is zero, and it is associated with the protein's amino acid composition and conformation (25). The summed IP of the wheat anti-fungal amino acid sequences was 588.9 with a mean of 6.402, ranging from 4 (1A:20888904-20889590) to 10.4 (7B:40654848-40655408).
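To illustrate how the two A280 values and the ARW depend on amino acid content, here is a minimal sketch. It assumes the widely used approximation in which only tryptophan, tyrosine and cystine absorb at 280 nm, with per-residue coefficients of 5500, 1490 and 125 M^-1 cm^-1. These constants are an assumption for illustration and may differ from those used by the pepstats program, and the example peptide is hypothetical.

```python
# Assumed molar extinction coefficients at 280 nm (M^-1 cm^-1);
# the tool used in the study may apply different constants.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def a280_extinction(seq):
    """Return (reduced, with_cystine_bridges) extinction coefficients."""
    seq = seq.upper()
    reduced = seq.count("W") * EXT_TRP + seq.count("Y") * EXT_TYR
    # Each cystine bridge pairs two cysteine residues.
    with_cystines = reduced + (seq.count("C") // 2) * EXT_CYSTINE
    return reduced, with_cystines


def average_residue_weight(molecular_weight_da, n_residues):
    """Molecular weight divided by the number of residues."""
    return molecular_weight_da / n_residues


peptide = "MWYCCKWY"  # hypothetical peptide, not from the study
print(a280_extinction(peptide))
print(average_residue_weight(11000.0, 100))
```

The reduced value ignores cysteines entirely, so the two coefficients differ only when a sequence contains at least two cysteine residues.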
In addition, folding is entropically unfavorable because a protein's folded structure decreases its disorder, or entropy. Nonpolar (hydrophobic) side chains tend to be buried inside a protein, whereas polar (hydrophilic) side chains tend to lie on the outside of the molecule (26). The net charges of the studied anti-fungal amino acid sequences ranged from -9.5 (3B:690358283-690358846) to 22 (7B:40654848-40655408) (Figure 2 and Table 1).
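The half-integer charge values reported here are consistent with a simple counting rule in which lysine and arginine contribute +1, aspartate and glutamate contribute -1, and histidine contributes +0.5. The sketch below applies that rule. Whether the scripts used in this study applied exactly this rule is an assumption, and the example sequences are hypothetical.

```python
# Assumed per-residue charges; all other residues count as neutral.
RESIDUE_CHARGE = {"K": 1.0, "R": 1.0, "H": 0.5, "D": -1.0, "E": -1.0}


def net_charge(seq):
    """Sum the assumed per-residue charges over a protein sequence."""
    return sum(RESIDUE_CHARGE.get(residue, 0.0) for residue in seq.upper())


# Hypothetical sequences, not taken from the study.
print(net_charge("KKRHDE"))
print(net_charge("DDEEH"))
```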

Detection of protein domain and phylogenetic analysis
A protein sequence motif is a short pattern that has been conserved by evolution. In proteins, a motif can correspond to an enzyme's active site or to a structural unit required for proper protein folding. Sequence motifs are therefore among the basic functional units of molecular evolution (16). Motif-1 and Motif-3 match ELME000249 in the motif database with a p-value of 9.64e-04. This entry describes TRFH domain docking motifs, which coordinate the interaction of telomere proteins with other proteins. Proteins carrying TRFH domains not only form homodimers through these domains but also provide specific protein-binding surfaces for interaction (27). TRFHs and other proteins have been proposed to have important implications for the biology and evolution of telomeres. In general, this offers a basis for recognizing and controlling the hierarchical assembly and stoichiometry of telomere subunits during the cell cycle, cell division and senescence (28).
Motif-2 matches ELME000084, which describes phosphotyrosine ligands bound by SH2 domains. Src Homology 2 (SH2) domains recognize short motifs containing a phosphorylated tyrosine residue. Up to four positions after the pTyr were found to mainly determine additional specificity. SH2 domains participate in metazoan signal transduction and in new signaling scenarios in plants, serving as key mediators of controlled protein-protein interactions with tyrosine-phosphorylated substrates (29,30).
Moreover, Motif-4 was highly similar to the database motif ELME000377, which describes the Pex14 ligand motif and relates to peroxisomes. Peroxisomes are spherical, single-membrane subcellular organelles present in eukaryotes. Together with glyoxysomes found in plants and glycosomes found in trypanosomes, peroxisomes belong to the microbody group of organelles. Import of peroxisomal matrix enzyme proteins (PTS1 cargo) into the peroxisome includes recognition of the PTS1 cargo by the cytosolic Pex5 receptor, docking of the PTS1-Pex5 complex at the peroxisomal membrane, and translocation of the PTS1 cargo through the peroxisomal membrane into the matrix. This is followed by recycling of the Pex5 receptor back into the cytosol for another round of PTS1 cargo import. Unfolded, folded oligomeric or cofactor-bound proteins can be imported into the peroxisome (31).

Highlights in BioScience
Additionally, Motif-5 showed high similarity to ELME000159, which is a MAPK phosphorylation site. Mitogen-activated protein kinase (MAPK) cascades are highly conserved signaling modules downstream of receptors/sensors that transform extracellular stimuli into intracellular responses in eukaryotes. Plant MAPK cascades play crucial roles in signaling plant defense against pathogen invasion (32). The NCBI-blastx tool was used to infer gene annotations for the studied amino acid sequences. The sequences belong to the lectin, kinase, tyrosine-protein kinase (STK), thaumatin, and cysteine-rich repeat families, representing 2, 9, 8, 19 and 23 genes, respectively, in addition to 31 hypothetical genes (Figure 1).
Lectins are non-immune proteins that bind carbohydrates directly and reversibly. The biochemical functions of lectins are highly diverse. A common theme arising from the identified functions of many plant and animal lectins is their involvement in interactions with other organisms, whether in symbiosis or defense, as effectors or regulators (33). New types of nucleocytoplasmic plant lectins have been described and defined over the past decade, especially lectins expressed within the nucleus and the cytoplasm of plant cells, often as part of a particular plant response to different stressors or shifting environmental conditions (34). Lectins include proteins containing at least one non-catalytic domain that enables them to selectively recognize and reversibly bind different glycans, which are either freely available or part of glycoproteins or glycolipids.
Plants produce an overwhelming number of highly complex lectins with various molecular structures and binding specificities for endogenous (plant) glycans as well as exogenous (non-plant) glycans (35). The role of plant lectins in plant defense against different pathogens, including fungi, has been reported in previous studies (36,37). There are several receptor-like kinases among the plant proteins proposed to take part in immunity pathways (38). For example, kinase proteins have been reported to confer resistance to fungal diseases in wheat, including temperature-dependent resistance to wheat stripe rust (39), and fungal resistance in Arabidopsis (40).
Maximum likelihood phylogenetic analysis was effective in differentiating between anti-fungal wheat proteins according to their protein domains. The phylogenetic tree was divided into four groups. Group A comprises only protein fragments with the Motif-3 domain, and group B contains proteins with the Motif-3, Motif-4 and Motif-5 domains. Group C contains genes with the Motif-1, Motif-2 and Motif-5 domains, and group D contains genes with all protein domains (Figure 4).
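The correspondence between motif content and tree group described above can be written as a small lookup. This is only an illustrative sketch: it assumes that the set of motifs found in a sequence fully determines its group, which simplifies the phylogenetic result, and the function name and fallback label are hypothetical.

```python
# Motif composition of each group, as described in the text.
GROUP_BY_MOTIFS = {
    frozenset({3}): "A",
    frozenset({3, 4, 5}): "B",
    frozenset({1, 2, 5}): "C",
    frozenset({1, 2, 3, 4, 5}): "D",
}


def assign_group(motif_numbers):
    """Map the motif numbers found in a sequence to group A-D.

    Returns "unassigned" for any other motif combination.
    """
    return GROUP_BY_MOTIFS.get(frozenset(motif_numbers), "unassigned")


print(assign_group([3]))
print(assign_group([5, 1, 2]))
print(assign_group([1, 3]))
```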

PCR primers and restriction enzymes analysis
About 85 PCR primer pairs were designed to target most of the predicted anti-fungal genes (Supplementary file 2). The predicted PCR product size ranged from 300 bp to 708 bp, the GC content ranged from 36.4 to 66.7%, and the primer annealing temperature ranged from 58.05 °C to 62.55 °C. A total of 40 different restriction enzymes (RE) were used to identify potential sites for future RFLP tests of genetic polymorphism inside the predicted anti-fungal wheat genes. Figure (5) shows the site redundancy ratio for RE recognition within genes. It indicates that some RE, such as BanII, NlaIII and BaeGI, have a high probability of producing fragment length polymorphism inside most of the predicted anti-fungal genes.
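Both quantities discussed here, GC content and the presence of restriction sites, can be computed directly from a DNA sequence. The sketch below is not the RestrictionDigest or BatchPrimer3 implementation; it is a minimal illustration that scans for the recognition sequences of NlaIII (CATG) and BanII (GRGCYC, where R is A or G and Y is C or T). Both sites are palindromic, so scanning one strand is sufficient. The test sequence is hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the enzymes below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Recognition sequences only; cut positions are not modelled.
ENZYMES = {"NlaIII": "CATG", "BanII": "GRGCYC"}


def find_sites(seq, recognition):
    """Return 0-based start positions of all matches, overlaps included."""
    core = "".join(IUPAC[base] for base in recognition)
    pattern = re.compile(f"(?=({core}))")  # lookahead keeps overlaps
    return [match.start() for match in pattern.finditer(seq.upper())]


def gc_content(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


dna = "AACATGTTGAGCTCAACATGGGGCCCAT"  # hypothetical sequence
sites = {name: find_sites(dna, rec) for name, rec in ENZYMES.items()}
print(sites)
print(gc_content(dna))
```

A site found in one allele but absent from another is what would produce a fragment length difference in an RFLP assay, so a scan of this kind only nominates candidate sites and does not demonstrate polymorphism.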

Conclusion
Prediction based on the fungal resistance genes available in public databases was very helpful and identified several wheat genes on which genetic research into the genes that hold the key to fungal resistance in wheat could be focused. Most of the identified protein domains clarify the genetic structure of anti-fungal genes and suggest a potential role for the MAPK gene family in such pathways.
Providing restriction enzyme information and gene-specific PCR primers could be useful to wheat scientists and breeders, saving time and effort.