Title: Genotyping by Sequencing for Crop Improvement
Author: Group of authors
Publisher: John Wiley & Sons Limited
Genre: Biology
isbn: 9781119745679
1.3.2.6 Expressed Sequence Tags (ESTs)
These markers are developed by end sequencing (generally 200–300 bp) of random cDNA clones. The sequences thus obtained are referred to as expressed sequence tags (ESTs). A large number of ESTs have been generated in several crop plants and are available in the EST database at NCBI (https://www.ncbi.nlm.nih.gov/dbEST/). These markers were originally developed to identify gene transcripts and have played an important role in the identification of several genes and in the development of markers such as RFLPs, SSRs, SNPs, and CAPS (Semagn et al. 2006). However, EST‐based SSRs show less polymorphism than genomic DNA‐based SSRs. Since EST markers come from expressed sequence regions, they are highly conserved among species and can be used for synteny mapping. Most of them could also correspond to functional genes. A large number of EST markers have been used in rice to develop a high‐density linkage map (Harushima et al. 1998) and in wheat for chromosome bin mapping using deletion stocks (Qi et al. 2003). In addition to these, several other molecular marker variants have been developed. These markers are described in Table 1.1.
1.4 Sequencing‐based Markers
1.4.1 Single‐Nucleotide Polymorphisms (SNPs)
Single‐nucleotide polymorphisms (SNPs) are more abundant than other marker types and result from single‐base‐pair variations. They are evenly distributed across the whole genome and can tag almost any gene or locus (Brookes 1999). However, SNP frequency varies among species, with 1 SNP per 60–120 bp in maize (Ching et al. 2002) and 1 SNP per 1000 bp in humans (Sachidanandam et al. 2001). SNPs are more prevalent in noncoding regions. In coding regions, SNPs can be synonymous or nonsynonymous. A synonymous SNP does not change the amino acid and therefore usually causes no phenotypic difference, although it can still do so by modifying mRNA splicing (Richard and Beckman 1995). A nonsynonymous SNP changes the amino acid, which can result in phenotypic differences. SNPs are mostly bi‐allelic and cause polymorphism through nucleotide base substitution. Two types of nucleotide base substitution result in SNPs. A transition occurs between purines (A, G) or between pyrimidines (C, T); this type of substitution constitutes two‐thirds of all SNPs. A transversion occurs between a purine and a pyrimidine. SNPs can be detected by aligning the corresponding genomic region from two different species. SNPs have only two alleles, compared with the multiple alleles typical of SSLPs; however, this disadvantage can be compensated for by using a high density of SNPs.
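The distinction between transitions and transversions can be expressed in a few lines of Python. The following is a minimal sketch rather than part of any published pipeline; the function names `classify_substitution` and `count_substitutions` are hypothetical and chosen only for illustration.

```python
# Purines and pyrimidines as defined in the text above.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"not a DNA base: {ref!r}, {alt!r}")
    if ref == alt:
        raise ValueError("identical bases do not form a substitution")
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(snps):
    """Tally transitions and transversions in an iterable of (ref, alt) pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify_substitution(ref, alt)] += 1
    return counts
```

For example, `count_substitutions([("A", "G"), ("C", "T"), ("G", "T")])` tallies two transitions and one transversion.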
1.4.2 Identification of SNP in a Pregenomic Era
Initially, identification of SNP markers was laborious and expensive and involved allele‐specific sequencing (Ganal et al. 2009). This involves sequencing unigene‐derived amplicons from two or more lines using Sanger's method. In one experiment in soybean, about 350 bp of the RFLP clone A‐519 was end sequenced and flanking amplification primers were designed (Coryell et al. 1999). The primers were used to amplify this region by PCR from ten genotypes, and the amplicons were sequenced and compared to identify SNPs. SNPs were also identified by mining the large number of EST sequences in EST databases, which were generated through improved sequencing technologies (Soleimani et al. 2003). These SNPs were then validated using PCR (Batley et al. 2003). These approaches allowed the identification of mainly gene‐based SNPs, but their frequency is generally low. Additionally, SNPs located in low‐copy noncoding regions and intergenic spaces could not be identified.
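The sequence‐comparison step described above can be illustrated with a short Python sketch. It assumes that the amplicon sequences have already been aligned to the same length (with `-` marking gaps and `N` marking uncalled bases); the function name `find_snps` and the example sequences are hypothetical and are not data from the cited studies.

```python
def find_snps(aligned):
    """Report alignment columns at which the genotypes carry different bases.

    `aligned` maps a genotype name to its pre-aligned sequence.
    Returns a list of (1-based position, sorted list of alleles).
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be pre-aligned to the same length")
    snps = []
    columns = zip(*(seq.upper() for seq in aligned.values()))
    for position, column in enumerate(columns, start=1):
        # Ignore gaps and uncalled bases when collecting alleles.
        alleles = set(column) - {"-", "N"}
        if len(alleles) > 1:
            snps.append((position, sorted(alleles)))
    return snps


# Hypothetical amplicons from three lines.
amplicons = {
    "line1": "ACGTTAGC",
    "line2": "ACGTCAGC",
    "line3": "ACGTTAGG",
}
snps = find_snps(amplicons)
```

Here `snps` contains two variable positions: position 5 (C/T) and position 8 (C/G). A real analysis would also weigh base quality and sequencing error, which this sketch ignores.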
Several assays have been developed for genotyping identified SNPs, including allele‐specific hybridization, primer extension, oligonucleotide ligation, and invasive cleavage (Sobrino et al. 2005). DNA chips, allele‐specific PCR, and primer extension were also attractive options because they are suitable for automation and can be used to develop dense genetic maps. Allele‐specific hybridization was used to identify polymorphism in 570 genotypes of soybean (Coryell et al. 1999).
1.5 Recent Advances in Molecular Marker Technologies
The improvement of Sanger sequencing technology in the 1990s, combined with the start of EST and genome sequencing projects in model plants, led to a surge in the identification of variation at single‐base resolution (Wang et al. 1998). From 2005 onward, the emergence of NGS platforms such as Roche 454, Illumina HiSeq2500, ABI 5500xl SOLiD, Ion Torrent, PacBio RS, and Oxford Nanopore, together with advances in bioinformatics tools, simplified the identification of genome‐wide SNPs and changed the face of molecular marker technology. NGS‐based genotyping platforms such as genotyping‐by‐sequencing (GBS), whole‐genome resequencing (WGR), and high‐density SNP arrays made it possible to type thousands of SNPs in a single reaction in hundreds of individuals.
1.5.1 Genotyping‐by‐Sequencing (GBS)
GBS is an NGS‐based reduced representation sequencing technique for the identification of genome‐wide SNPs and the genotyping of large populations (Bhatia et al. 2013). GBS is a one‐step approach in which markers are identified and used in a single reaction. It is a complexity reduction procedure in which a combination of restriction enzymes is used to separate low‐copy sequences from high‐copy repetitive regions. In general, GBS involves sequencing, on an NGS platform, the fragments generated by restriction digestion of the genome. In this process, the DNA of each individual in the population is digested with a restriction enzyme (RE), followed by ligation of RE‐specific adaptors containing genotype‐specific barcode sequences and binding sites for PCR and sequencing primers (Figure 1.1). The resulting fragments are PCR amplified, and equal volumes of PCR product from different individuals are pooled in a tube. The fragments in the pool can be selected based on their size and sequenced on the NGS platform. The choice of restriction enzymes depends on the complexity and size of the genome. Presently, different versions of GBS are available, including RAD‐seq (restriction site‐associated DNA sequencing), ddRAD‐seq (double‐digest restriction site‐associated DNA sequencing), SLAF‐seq (specific‐locus amplified fragment sequencing), Rest‐seq (restriction DNA sequencing), and Skim GBS (skim‐based GBS) (Bhatia 2020). These versions differ in fragment size selection, the extent of complexity reduction, and genome coverage. Since GBS is a population‐dependent genotyping method, low‐depth sequencing is adopted to make it cost‐effective, which causes a high rate of missing data. Low‐depth sequencing also makes GBS an ineffective genotyping approach in heterozygous populations. In addition, GBS has low genome coverage because of reduced representation sequencing.
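A short calculation helps show why low depth is a problem in heterozygous material. The sketch below rests on two simplifying assumptions that are not taken from the text: each read at a heterozygous site samples either allele with probability 0.5, independently of the other reads, and read depth per site follows a Poisson distribution. The function names are hypothetical.

```python
import math


def prob_het_called_homozygous(depth):
    """Probability that all `depth` reads at a heterozygous site show one allele.

    Assumes each read samples either allele with probability 0.5, so the
    chance of seeing only one allele is 2 * 0.5**depth = 0.5**(depth - 1).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 0.5 ** (depth - 1)


def prob_site_missing(mean_depth):
    """Probability of zero reads at a site, assuming Poisson-distributed depth."""
    if mean_depth < 0:
        raise ValueError("mean depth cannot be negative")
    return math.exp(-mean_depth)


# Miscalling probability at depths 1 to 6 under the stated assumptions.
het_error_by_depth = {d: prob_het_called_homozygous(d) for d in range(1, 7)}
```

Under these assumptions, a heterozygote covered by two reads appears homozygous half of the time, and even at a depth of four the chance is 12.5%. Real data deviate from this idealized model, so the numbers indicate the scale of the problem rather than exact error rates.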
Figure 1.1 An example of GBS and GBS data analysis workflow for identification of SNP markers.
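The digestion and size‐selection steps of this workflow can be mimicked in silico. The following Python sketch is illustrative only: it cuts a single strand with one enzyme, the toy sequence is invented, and PstI (recognition site CTGCAG, cutting after CTGCA) is used merely as an example, since the text does not prescribe a particular enzyme. The function names `digest` and `size_select` are hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    The cut falls `cut_offset` bases after the start of each site, so for
    PstI (CTGCA^G) the offset is 5. Only one strand is modelled.
    """
    sequence = sequence.upper()
    fragments = []
    previous_cut = 0
    start = sequence.find(site)
    while start != -1:
        cut = start + cut_offset
        fragments.append(sequence[previous_cut:cut])
        previous_cut = cut
        start = sequence.find(site, start + 1)
    fragments.append(sequence[previous_cut:])
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Invented toy sequence containing two PstI sites.
toy_genome = "AAAACTGCAGTTTTTTTTCTGCAGCC"
fragments = digest(toy_genome, site="CTGCAG", cut_offset=5)
selected = size_select(fragments, min_len=5, max_len=10)
```

With this toy sequence the digest yields three fragments of 9, 14, and 3 bases, and the size filter keeps only the 9‐base fragment. Changing the enzyme or the size window changes which part of the genome is sampled, which is how complexity reduction is tuned to the genome at hand.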
GBS is widely used to capture SNPs and other marker variation by NGS. GBS has overtaken conventional genotyping procedures based on traditional markers such as RAPD, AFLP, SSR, and many others in terms of the time, labor, and cost involved. For example, GBS can generate data for thousands of markers in a large population within a week, and these data can be analyzed within a month (Bhatia et al. 2018). The approach has been used to map several economically important traits in a number of crop plants (Poland and Rife 2012). Most of the developing countries have in‐house computational facilities that are being used for GBS