Название: Computational Prediction of Protein Complexes from Protein Interaction Networks
Автор: Sriganesh Srihari
Издательство: Ingram
Жанр: Программы
Серия: ACM Books
isbn: 9781970001549
isbn:
The Watts-Strogatz (WS) model [Watts and Strogatz 1998] is seldom used for modeling biological networks but demonstrates how easy it is to transition from a non-small world network (e.g., a lattice graph) to small-world network with random rewiring of a few edges. The generation procedure is amazingly simple: From a ring lattice with n vertices and k edges per vertex, each edge is rewired at random with probability p. If p = 0, then the graph is completely regular, and if 1, the graph is completely random/disordered. For the WS model, the intermediate region, where 0 < p < 1 can be selected to examine its intrinsic properties. Aside from potential use for investigating the level of ordered-ness of an observed network, the WS model has seen few applications in biology. Its major contribution toward network biology is that given how easy the small-world property can emerge from minor disruption of a regular lattice graph, it is not a unique defining property. Therefore, the myriad of research papers that describe the biological network under analysis as small world are not reporting particularly useful information.
The BA model is able to capture the scale-free property observed in biological networks but has very low internal clustering. This seemed at odds with the idea that biological networks, or at least PPI networks, are modular in nature. Proteins achieve their functionality by virtue of extensive interconnections with other proteins, forming simple physical interactions, which at higher levels can be envisaged as complexes and functional modules. To better capture the clustering effects in real biological networks, hierarchical models were proposed. Hierarchical network (HN) models [Ravasz and Barabási 2003] are iterative approaches for generating networks that encapsulate both scale-free and highly clustered behavior. For a real network, the average clustering coefficient for the entire network is higher than the BA model and ER model. To construct a HN model, tightly clustered cores with high clustering coefficient are first generated. These are then iteratively connected by selecting random nodes in each core, and having these connect to another. The downside, however, is that developing HN models requires making certain assumptions. For instance, we have to assume that we know the distribution of the sizes and clustering densities of the embedded modules. We also assume that these modules combine in an iterative manner. In biological networks, however, it seems that the distinction boundary between modules is not very sharp with high levels of interconnectivity.
PPI networks from the yeast S. cerevisiae and the bacteria H. pylori resulting from some of the high-throughput studies mentioned earlier [Uetz et al. 2000, Ito et al. 2000, Xenarios et al. 2002] have been shown to have scale-free degree distributions [Pržulj et al. 2004, Jeong et al. 2001, Maslov and Sneppen 2002]. However, the larger D. melanogaster (fruit fly) PPI network has been shown to decay faster than a power law [Giot et al. 2003]. Furthermore, the shortest path distribution and the frequencies of cycles of 3–15 nodes in the fruitfly network differ from those of randomly rewired networks which preserve the same degree distribution as the original PPI network [Giot et al. 2003]. To better capture these frequency distributions of node-cycles, geometric random graph models were proposed. In a geometric random graph model, the nodes correspond to independently and uniformly distributed points in a metric space, and two nodes are linked by an edge if the distance between them is smaller than or equal to some radius r, where the distance is an arbitrary distance norm in the metric space [Penrose 2003]. Pržulj et al. [2004] used geometric random graphs to model PPI networks, by defining the points in 2D, 3D, and 4D Euclidean space, and distance between the points measured using Euclidean distance. Pržulj et al. studied the similarity between geometric random graphs and PPI networks using the distributions for graphlets. A graphlet is a connected network with a small number of nodes, and graphlet frequency is the number of occurrences of a graphlet in a network. The authors defined 29 types of graphlets using 3–5 nodes (Figure 2.3). The relative frequency of a graphlet i from the PPI network G is defined as Ni(G)/T(G), where Ni(G) is the number of graphlets of type i ∈ {1, …, 29}, and
Figure 2.3 The 29 graphlets of 3–5 nodes defined by Pržulj et al. [2004].
where Fi(G) = − log(Ni(G)/T(G)); the logarithm being used because the graphlet frequencies can differ by several orders of magnitude. The authors generated geometric random networks with the same number of nodes as the proteins in S. cerevisiae and D. melanogaster PPI networks from high-throughput experiments. The authors found that, although the degree distributions of these PPI networks were closer to that of scale-free random networks, other topological parameters matched closely with those of geometric random networks. Specifically, the diameter, local and whole-network clustering coefficients, and the relative graphlet frequencies, computed as above, showed that PPI networks were closer to geometric random networks than scale-free networks. Furthermore, the authors suggest that as the quality and quantity of PPI data improve, the geometric random network may become better suited compared to scale-free and other networks to model PPI networks.
2.5 Visualizing PPI Networks
Visualization concerns an important component of the analysis of PPI networks. Very simply, a PPI network is visualized as dots and lines, where the dots (or other shapes) represent proteins and the lines connecting the dots represent interactions between the proteins. Such a visualization enables quick exploration of the topological properties of PPI networks—for example, counting the neighbors of a selected protein in a PPI network, the number of connected components in the network, or even spotting dense and sparse regions of the network. However, the ease of this exploration and subsequent analysis depends on how effective is the visualization method or tool used to render the PPI network. This rendering of the PPI network concerns the field of graph or network layout, where layout algorithms are used to draw the network—by appropriately positioning the dots and lines—in a 2D space. A good layout should be able to (visually) bring out the topological properties of the PPI network easily, and this has been a subject of research in graph visualization for several years. Here we briefly introduce some of the commonly used algorithms for PPI network layout; for details the readers are referred to the excellent reviews of Morris et al. [2014], Agaptio et al. [2013], and Doncheva et al. [2012].
Random layout arranges the dots (nodes) and lines (edges) in a random manner in the 2D space. The advantage of this algorithm is its simplicity, but on the other hand, it presents a high number of criss-crossing edges, and does not necessarily use the available space optimally, especially for large networks. Circular layout arranges the nodes in succession, one after the other, on a circle. While this algorithm also suffers from criss-crossing of edges, it is widely used to visualize small (sub)networks such as protein complexes and pathways. Tree layout arranges the network as a tree with a hierarchical organization of the nodes. This is obviously more suitable for visualizing trees than networks with cycles. Often, the layout is “ballooned out” by placing the children of each node in the tree on a circle surrounding the node, resulting in several concentric circles. Force-directed layout places the nodes according to a system of forces based on physical concepts in spring mechanics. Typically, the system combines attractive forces between adjacent nodes with repulsive forces between all pairs of nodes to seek an optimal layout in which the overall edge lengths are small while the nodes are well separated.