Genetic Analysis of Complex Disease. Various authors

Title: Genetic Analysis of Complex Disease

Author: Various authors

Publisher: John Wiley & Sons Limited

Genre: Biology

isbn: 9781119104070

… human leukocyte antigen genes, T‐cell receptor genes, and the myelin basic protein gene, are prime candidates for analysis. The strength and the weakness of this approach both arise from the degree of confidence in the role of these genes. If there is strong evidence that the genes play a direct role, only a few of them may need to be tested to find a trait‐associated variant. If the evidence is more circumstantial, then many genes may have equal justification for being studied, and little is gained over conducting a genome‐wide screen. Such studies are now most often conducted as follow‐up to prior genomic screens or other hypothesis‐generating experiments.
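      The trade-off between testing a few well-justified candidate genes and testing many of them can be made concrete with a multiple-testing correction. The Python sketch below uses a Bonferroni correction, which is one common and conservative choice and not necessarily the one used in any particular study. It assumes an overall significance level of 0.05 and a hypothetical 20 SNPs tested per gene. The comparison value, 5 × 10⁻⁸, is the conventional genome-wide significance threshold for GWAS of common variants, which is roughly 0.05 divided by one million independent tests.

```python
# Minimal sketch: Bonferroni-corrected per-test thresholds for candidate-gene studies.
# Assumptions (for illustration only): overall alpha = 0.05 and a hypothetical,
# fixed number of SNPs tested per candidate gene.

ALPHA = 0.05
GENOME_WIDE_THRESHOLD = 5e-8  # conventional GWAS threshold (about 0.05 / 1,000,000 tests)


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test p-value cut-off that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def candidate_gene_threshold(n_genes: int, snps_per_gene: int) -> float:
    """Threshold when every SNP tested in every candidate gene counts as one test."""
    return bonferroni_threshold(n_genes * snps_per_gene)


if __name__ == "__main__":
    SNPS_PER_GENE = 20  # hypothetical value
    for n_genes in (3, 100, 2500):
        t = candidate_gene_threshold(n_genes, SNPS_PER_GENE)
        fold = t / GENOME_WIDE_THRESHOLD
        print(f"{n_genes:>5} genes: p < {t:.1e} ({fold:,.0f}x less stringent than genome-wide)")
```

      As the number of candidate genes grows, the per-test threshold moves toward the genome-wide one. This illustrates why a broad candidate list gains little over a genome-wide screen. SNPs within a gene are often correlated (in linkage disequilibrium), so Bonferroni over-corrects in this setting. Permutation-based or effective-number-of-tests methods are common alternatives.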

      Analysis

      Genomic Analysis

      Statistical Analysis

      The analysis of genetic and phenotypic data for a complex trait is multifaceted and depends on the research question, study design, genomic data available, and phenotypic characteristics. Methods to analyze these data are under constant development, and new approaches are continuously being released. Therefore, the analytic strategy for a genomic study must be reviewed periodically and revised if necessary to take advantage of newly developed approaches. Depending on the study design, the analytic plan may include linkage analysis (Chapter 6) in families or association studies in families or population samples (Chapters 8 and 9). These approaches are not mutually exclusive – a design may start with a linkage analysis of large families followed by association analysis within regions of linkage. Similarly, other multi‐stage studies conduct a GWAS of individual SNPs (Chapter 9) and then incorporate gene–gene and gene–environment interactions to identify additional genetic loci. Additionally, “data mining” approaches may be applied to these datasets to extract even more genetic information using data reduction techniques, set‐based tests, and pathway analyses. These more complex analyses are discussed in detail in Chapter 11.
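      A GWAS of individual SNPs repeats one simple test at every variant. The Python code below is a minimal sketch of such a test and is not the method of any particular chapter or software package. It runs a Pearson chi-square allelic test for a single SNP in a case-control sample, starting from hypothetical genotype counts, and assumes the counts have already passed quality control. It relies on the identity that the upper tail of a chi-square distribution with one degree of freedom equals `erfc(sqrt(x / 2))`, so no statistics library is needed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeCounts:
    """Numbers of individuals carrying 0, 1 or 2 copies of the tested (alternate) allele."""
    hom_ref: int
    het: int
    hom_alt: int

    def allele_counts(self) -> tuple[int, int]:
        """Return (reference, alternate) allele counts; each person contributes two alleles."""
        ref = 2 * self.hom_ref + self.het
        alt = 2 * self.hom_alt + self.het
        return ref, alt


def allelic_chi_square(cases: GenotypeCounts, controls: GenotypeCounts) -> tuple[float, float]:
    """Pearson chi-square (1 df) on the 2x2 allele-count table; returns (statistic, p-value)."""
    table = [cases.allele_counts(), controls.allele_counts()]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("an empty group or a monomorphic SNP cannot be tested")
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    cases = GenotypeCounts(hom_ref=300, het=500, hom_alt=200)
    controls = GenotypeCounts(hom_ref=400, het=480, hom_alt=120)
    stat, p = allelic_chi_square(cases, controls)
    print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```

      The allelic test treats each person's two alleles as independent observations, which assumes Hardy-Weinberg equilibrium. In practice, GWAS software such as PLINK typically fits logistic or linear regression models instead, so that covariates such as ancestry principal components can be included.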

      Bioinformatics

      The large amount of information generated by any genomic study of a complex trait requires careful attention to quality control, efficient and secure storage, and compliance with data‐sharing requirements and privacy protections. These activities require a well‐designed and secure database system. Such systems have evolved over time from text files to relational databases and then to large‐scale “data warehouses.” These datasets also require large‐scale processing power with ample attached storage to support linkage and association studies. High‐throughput sequencing in particular requires substantial storage and computational power for aligning reads to a reference genome (or assembling them) and for base calling. For multi‐site studies, these resources may need to be accessible from multiple locations, with levels of access and security set according to each user’s role in the study and need to access other sites’ information. In addition to maintaining local resources for a study, a bioinformatics team must also be familiar with many public sources of genomic data (e.g. the UCSC and Ensembl genome browsers, ENCODE databases, sequence repositories, and dbGaP) and be able to submit results to public repositories for sharing with the wider research community. These issues are discussed in more detail in Chapter 7.
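      The Python sketch below illustrates the relational model mentioned above. It uses the standard-library `sqlite3` module to store samples, variants, and genotype calls in separate linked tables, and it computes a per-variant call rate, which is a basic quality-control metric. The table and column names, the `site` column for multi-site studies, and the 0/1/2 allele-count encoding are all assumptions made for this example. A production study database would typically need user-level access control and auditing, which SQLite itself does not provide.

```python
import sqlite3

# Hypothetical schema, for illustration only: one row per sample, per variant,
# and per (sample, variant) genotype call.
SCHEMA = """
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    site      TEXT NOT NULL,             -- recruiting site in a multi-site study
    is_case   INTEGER NOT NULL CHECK (is_case IN (0, 1))
);
CREATE TABLE variant (
    variant_id TEXT PRIMARY KEY,
    chrom      TEXT NOT NULL,
    pos        INTEGER NOT NULL
);
CREATE TABLE genotype (
    sample_id  TEXT NOT NULL REFERENCES sample (sample_id),
    variant_id TEXT NOT NULL REFERENCES variant (variant_id),
    alt_count  INTEGER CHECK (alt_count IN (0, 1, 2)),  -- NULL marks a missing call
    PRIMARY KEY (sample_id, variant_id)
);
"""


def open_study_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and enforce foreign-key constraints."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def call_rate(conn: sqlite3.Connection, variant_id: str) -> float:
    """Fraction of all samples with a non-missing genotype call at one variant."""
    n_samples = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]
    n_called = conn.execute(
        "SELECT COUNT(*) FROM genotype WHERE variant_id = ? AND alt_count IS NOT NULL",
        (variant_id,),
    ).fetchone()[0]
    return n_called / n_samples if n_samples else 0.0


if __name__ == "__main__":
    conn = open_study_db()
    # Hypothetical samples, variant and calls.
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [("S1", "siteA", 1), ("S2", "siteA", 0), ("S3", "siteB", 1), ("S4", "siteB", 0)],
    )
    conn.execute("INSERT INTO variant VALUES (?, ?, ?)", ("var1", "6", 1000))
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [("S1", "var1", 1), ("S2", "var1", 0), ("S3", "var1", None), ("S4", "var1", 2)],
    )
    print("call rate for var1:", call_rate(conn, "var1"))
```

      Keeping samples, variants, and calls in separate tables lets quality-control filters, such as dropping variants with low call rates, be written as ordinary queries rather than as edits to flat text files.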

      Follow‐up

      Variant Detection

      Replication

      The literature on most complex traits is by now littered with initial reports of allelic or genotypic associations that cannot be replicated at all (or are replicated in only a small minority of studies). Reproducibility of findings in independent samples is a critical characteristic that most investigators seek when weighing the evidence for a trait‐associated variant. For this reason, most studies (particularly those seeking government or foundation funding) now include a plan for replicating findings in a second dataset. The replication dataset should be independent of the initial finding (i.e. it should not overlap with the discovery dataset) and should be assessed in a similar fashion (e.g. phenotype definitions agree, ascertainment is similar, and the genetic analysis is comparable). This does not mean that the datasets must come from the same population – indeed, demonstrating replication across populations (e.g. European, Asian, and African) for a common complex trait locus may add strength to the study. For rare variants, however, cross‐population replication may be more difficult because such alleles are often population‐specific; for these studies, replication in a second sample from the same population is desirable.
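      The two requirements above, independence from the discovery sample and consistent results, can be checked mechanically once both analyses are complete. The Python sketch below shows one simple, hypothetical way to do this. It flags sample IDs shared by the two datasets. It then calls a variant replicated if its effect estimate has the same sign in both datasets and the replication p-value passes a Bonferroni threshold over the number of variants carried forward. The `Association` record, the function names, and this decision rule are assumptions for illustration only. Published studies use a range of criteria, including joint meta-analysis of discovery and replication data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Summary statistics for one variant in one dataset (hypothetical record)."""
    variant_id: str
    beta: float     # effect estimate for the tested allele, e.g. a log odds ratio
    p_value: float


def overlapping_samples(discovery_ids: set[str], replication_ids: set[str]) -> set[str]:
    """Sample IDs present in both datasets; an independent replication set yields an empty set."""
    return discovery_ids & replication_ids


def is_replicated(discovery: Association, replication: Association,
                  n_carried_forward: int, alpha: float = 0.05) -> bool:
    """Same direction of effect and replication p-value below alpha / n_carried_forward."""
    if discovery.variant_id != replication.variant_id:
        raise ValueError("associations refer to different variants")
    if n_carried_forward < 1:
        raise ValueError("n_carried_forward must be at least 1")
    same_direction = discovery.beta * replication.beta > 0
    return same_direction and replication.p_value < alpha / n_carried_forward


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    overlap = overlapping_samples({"S1", "S2", "S3"}, {"S3", "R1", "R2"})
    print("overlapping samples:", sorted(overlap))
    disc = Association("var1", beta=0.25, p_value=3e-9)
    rep = Association("var1", beta=0.18, p_value=0.002)
    print("replicated:", is_replicated(disc, rep, n_carried_forward=10))
```

      Matching on IDs catches only literal duplicates. The same person enrolled under different IDs, or close relatives appearing in both datasets, are usually detected from the genotypes themselves, for example with relatedness estimates.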

      Functional Studies

      While most disease gene discovery efforts have claimed success on the basis of variants that segregate with the trait in pedigrees or polymorphisms significantly associated with the trait in population samples, such findings are, strictly speaking, not sufficient evidence that the variant causes the trait. More conclusive is evidence from biological systems (e.g. cultured cells, animal models, or human blood and tissue samples) that the trait can be either induced by introducing the allele or ameliorated by blocking its action. In genetically complex traits, where the responsible variation may be a common polymorphism, it is even more critical that such evidence be found before success is declared.