Computational Prediction of Protein Complexes from Protein Interaction Networks. Sriganesh Srihari

(potentially spurious) interactions. (b) Candidate protein complexes are predicted from this PPI network using network-clustering approaches. The quality of the predicted complexes is validated against bona fide complexes, whereas novel complexes are functionally assessed and assigned new roles where possible.

      3. identifying modular subnetworks from the PPI network to generate a candidate list of protein complexes; and

      4. evaluating these candidate complexes against bona fide complexes, and validating and assigning roles for novel complexes.

      As we shall see in the following chapters, several sophisticated approaches have been developed over the years to overcome some of the above-mentioned challenges.
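
      Steps 3 and 4 above can be illustrated with a minimal sketch. It is not one of the methods discussed in this book: the "clustering" here is deliberately naive (each connected component of the network becomes a candidate), the Jaccard index and the 0.5 matching threshold are assumed choices for illustration, and the protein names (P1 to P7) and the reference label complex_A are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Naive stand-in for clustering: one candidate per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


def jaccard(a, b):
    """Overlap between two non-empty protein sets, in [0, 1]."""
    return len(a & b) / len(a | b)


def match_candidates(candidates, reference, threshold=0.5):
    """Split candidates into those matching a reference complex and the rest."""
    matched, novel = [], []
    for cand in candidates:
        best_name, best_score = None, 0.0
        for name, members in reference.items():
            score = jaccard(cand, members)
            if score > best_score:
                best_name, best_score = name, score
        if best_score >= threshold:
            matched.append((cand, best_name, best_score))
        else:
            novel.append(cand)  # would need functional assessment
    return matched, novel


# Hypothetical toy network and reference set (assumptions, not real data).
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"),
         ("P5", "P6"), ("P6", "P7"), ("P5", "P7")]
reference = {"complex_A": {"P1", "P2", "P3"}}

adj = build_adjacency(edges)
candidates = [c for c in connected_components(adj) if len(c) >= 3]
matched, novel = match_candidates(candidates, reference)
```

      In this toy case the candidate {P1, P2, P3, P4} overlaps complex_A with a Jaccard index of 0.75 and is reported as a match, while {P5, P6, P7} has no reference counterpart and is left as a novel candidate for functional assessment.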

      Computational methods have co-evolved with proteomics technologies, and over the last ten years a plethora of computational methods have been developed to predict complexes from PPI networks, which is the subject of this book. In general, computational methods complement experimental approaches in several ways. These methods have helped counter some of the limitations arising in proteomic studies, e.g., by eliminating spurious interactions via interaction scoring, and by enriching true interactions via prediction of missing interactions. The novel interactions and protein complexes predicted from these methods have been added back to proteomics databases, and these have helped to further enhance our resources and knowledge in the field.
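
      As a rough illustration of how interaction scoring can remove spurious interactions, the sketch below scores each interaction by how strongly the neighborhoods of its two proteins overlap, then drops low-scoring interactions. This neighborhood-overlap score is a simple stand-in chosen for illustration, not a specific scheme from this book; the function names, the 0.6 cutoff, and the protein names are all assumptions.

```python
from collections import defaultdict


def neighborhood_scores(edges):
    """Score each edge by the Jaccard overlap of closed neighborhoods.

    The closed neighborhood of a protein is the protein itself plus its
    interaction partners. Interactions inside a dense module share many
    neighbors and score high; isolated interactions score low.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    scores = {}
    for a, b in edges:
        closed_a = adj[a] | {a}
        closed_b = adj[b] | {b}
        scores[(a, b)] = len(closed_a & closed_b) / len(closed_a | closed_b)
    return scores


def filter_edges(edges, min_score):
    """Keep only the interactions whose score reaches min_score."""
    scores = neighborhood_scores(edges)
    return [edge for edge in edges if scores[edge] >= min_score]


# Hypothetical toy network: a triangle plus one pendant interaction.
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P9")]
scores = neighborhood_scores(edges)
kept = filter_edges(edges, min_score=0.6)
```

      Here the three interactions of the triangle score 1.0, 0.75, and 0.75, whereas the pendant interaction (P3, P9) scores 0.5 and is removed at the assumed cutoff. A threshold of this kind trades coverage for reliability, so in practice it would need to be tuned against known interactions.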

      Several high-quality resources for protein complexes have been developed over the years, covering both lower-order model organisms and higher-order organisms (summarized in Table 1.3). Together, Aloy [Aloy et al. 2004], CYC2008 [Pu et al. 2009], and MIPS [Mewes et al. 2008] contain over 450 manually curated complexes from S. cerevisiae (budding yeast). CORUM [Ruepp et al. 2008, 2010] contains ∼3,000 mammalian complexes, of which ∼1,970 are protein complexes identified from human cells. The European Molecular Biology Laboratory (EMBL) and European Bioinformatics Institute (EBI) maintain a database of manually curated protein complexes from 18 different species including C. elegans, H. sapiens, M. musculus, S. cerevisiae, and S. pombe [Meldal et al. 2015].

      Havugimana et al. [2012] present a dataset of 622 putative human soluble protein complexes (http://human.med.utoronto.ca/) identified using high-throughput AP/MS pulldown and PPI-clustering approaches. Huttlin et al. [2015] present 352 putative human complexes identified from human embryonic kidney (HEK293T) cells (http://wren.hms.harvard.edu/bioplex/). Wan et al. [2015] present a catalog of conserved metazoan complexes (http://metazoa.med.utoronto.ca/) identified by clustering high-quality pulldown interactions from C. elegans, D. melanogaster, H. sapiens, M. musculus, and Strongylocentrotus purpuratus (purple sea urchin). This dataset includes ∼300 complexes composed entirely of ancient proteins (evolutionarily conserved from lower-order organisms), and ∼500 complexes composed largely of ancient proteins conserved ubiquitously among eukaryotes. Drew et al. [2017] present a comprehensive catalog of >4,600 computationally predicted human protein complexes covering >7,700 proteins and >56,000 interactions, obtained by analyzing data from >9,000 published mass spectrometry experiments. Vinayagam et al. [2013] present COMPLEAT (http://www.flyrnai.org/compleat/), a database of 3,077, 3,636, and 2,173 literature-curated protein complexes from D. melanogaster, H. sapiens, and S. cerevisiae, respectively. Ori et al. [2016] combined mammalian complexes from CORUM and COMPLEAT to generate a dataset of 279 protein complexes from mammals.

[Table 1.3: image not reproduced. Notes a to c below refer to entries in the table.]

      a. No. of complexes as of 2016.

      b. COMPLEAT includes protein complexes from D. melanogaster, H. sapiens, and S. cerevisiae. The EMBL-EBI portal includes protein complexes from 18 different species, including C. elegans (16 complexes), H. sapiens (441), M. musculus (404), S. cerevisiae (399), and S. pombe (16). CORUM includes mammalian protein complexes, mainly from H. sapiens (64%), M. musculus (house mouse) (15%), and R. norvegicus (Norwegian rat) (12%).

      c. Includes mainly complexes conserved among the metazoans C. elegans, D. melanogaster, H. sapiens, M. musculus, and Strongylocentrotus purpuratus (purple sea urchin), consisting of 344 complexes composed entirely of ancient proteins and 490 complexes composed largely of ancient proteins conserved ubiquitously among eukaryotes.

      The rest of this book is organized as follows. Chapter 2 discusses important concepts underlying PPI networks and presents prerequisites for understanding subsequent chapters. We discuss different high-throughput experimental techniques employed to infer PPIs (including the Y2H and AP/MS techniques mentioned earlier), briefly explaining the biological and biochemical concepts underlying these techniques and highlighting their strengths and weaknesses. We explain computational approaches that denoise (PPI weighting) and integrate data from multiple experiments to construct reliable PPI networks. We also discuss topological properties of PPI networks, theoretical models for PPI networks, and the various databases and software tools that catalog and visualize PPI networks. Chapter 3 forms the core of this book: it introduces and discusses in depth the algorithmic underpinnings of some of the classical (seminal) computational methods to identify protein complexes from PPI networks. While some of these methods work solely on the topology of the PPI network, others combine additional biological information (e.g., functional annotations) with PPI network topology to improve their predictions. Chapter 4 presents a comprehensive empirical evaluation of six widely used protein complex prediction methods from the literature, using unweighted and weighted PPI networks from yeast and human. Taking a known human protein complex as an example, we discuss how well the methods recover this complex from the PPI network. Based on this evaluation, we explain in Chapter 5 the shortcomings of current methods in detecting certain kinds of protein complexes, e.g., protein complexes that are sparse or that overlap with other complexes. Through this, we highlight the open challenges that need to be tackled to improve the coverage and accuracy of protein complex prediction. We discuss some recently proposed methods that attempt to tackle these open challenges, and the extent to which they have been successful. Chapter 6 is dedicated to an important class of protein complexes that are dynamic in their protein composition and assembly. While some of these protein complexes are temporal in nature (i.e., they assemble at a specific timepoint and dissociate after that), others are structurally variable (e.g., they change their 3D structure and/or composition) depending on the cellular context. Dynamic complexes cannot be detected solely by analyzing the PPI network; methods that integrate gene or protein expression and 3D structural information are required. These more sophisticated methods are covered here. Chapter 7 discusses methods to identify protein complexes that are conserved between organisms or species; these evolutionarily conserved complexes provide important insights into the conservation of cellular processes through evolution. Finally, in today's era of systems biology, where biological systems are studied as a complex interplay of multiple (biomolecular) entities, we explain how protein complex prediction methods are playing a crucial role in shaping the field; these