Machine Learning Techniques and Analytics for Cloud Security. Group of authors

…acid-binding immunoglobulin (Siglecs). The lectin family includes many pattern recognition receptors, such as DC-SIGN and Dectin-1, as well as the selectins: L-selectin, E-selectin, and P-selectin. For leukocyte function, lectins communicate in complex ways with cell-surface glycans, chiefly sialyl-LewisX and 6′-sulfo-sialyl-LewisX. L-selectin is expressed by leukocytes and binds glycoprotein counter-receptors on endothelial cells, directing naive T cells. In contrast, E-selectin and P-selectin are both expressed on endothelial cells as a consequence of inflammation. Interactions between the selectins and their glycan ligands facilitate the adherence of leukocytes along the endothelium and allow the cells to migrate into tissues in response to chemokines bound to glycosaminoglycans [12–18]. In this way, communication between glycans and selectins governs leukocyte function by restricting the cells to the appropriate anatomic site. Siglecs are another group of glycan-binding molecules, but their function is quite separate from that of the C-type lectins and galectins. Siglecs are cell-surface receptors that recognize sialic acids and are found in higher vertebrates. Their cytoplasmic tails hold one or more immunoreceptor tyrosine-based inhibitory motif sequences. Glycans also have several indirect effects on lymphocyte function [12–19]. Reducing the complexity of N-glycans on T-cell receptors results in increased T-cell receptor clustering and signaling at lower antigen density. Galectin is not directly engaged in the T-cell receptor signaling process, and the Mgat5 enzyme plays an important role in generating N-glycan complexity, which has been linked to increased autoimmune disorders in H1N1 disease and to raised susceptibility to experimental autoimmune encephalitis. Similarly, a decrease in N-glycan complexity on cell-surface glycoproteins changes signaling via lectins and cytokine receptors [17–22].

      In 2015, a framework was proposed for the genetics of the new strain that identified its nearest relatives in swine using cluster analysis approaches such as PCA and the k-means clustering algorithm; the results were consistent with a reassortment of Eurasian and North American swine viruses [5, 20]. Glycoproteins are key elements of human pathogenic viruses and play important roles in infection and immunity. The influenza A virus has two surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA), which dominate the virion exterior and elicit antibodies. Glycans are one of the major components of the outermost layer of viruses. Communication between viral pathogens and their hosts is affected by the pattern of glycans and by glycan-binding receptors. Because carbohydrates are heavily branched, they are complex biomolecules, and various glycoproteins are involved in the recognition of human viral pathogens. Infectious glycans can be either virus-encoded or host-derived, and they are usually associated with high humoral immune responses within the human body. HA and NA are both envelope glycoproteins of the influenza virus. HA interacts with the terminal sialyl residues of oligosaccharides, which ensures binding of the virion to the cell surface. NA is needed to remove sialyl residues from oligosaccharides contained in cell and virus components. It is a receptor-destroying enzyme that prevents the aggregation of virus particles [7, 25].

      In this paper, our goal is to identify differentially expressed glycans. Clustering algorithms are applied to H1N1-infected and non-infected human datasets. The infected and non-infected datasets are then compared to identify the differentially expressed glycans.

      2.2 Proposed Methodology

      Input: Let the dataset D consist of n glycans, each with m parameter values such as RFU (relative fluorescence units), STDEV (standard deviation), and SEM (standard error of the mean). Each glycan is a vector, and the glycans are denoted g1, g2, g3, …, gi, …, gn. The dataset D has two states: normal (denoted DN) and diseased, or H1N1-infected (denoted DI).

      Output: The set of differentially expressed glycans, G’.

      Step-1: Apply a clustering algorithm C to both the normal state (DN) and the diseased, or H1N1-infected, state (DI).

      Step-2: Result for the normal state = [equation image]; similarly, result for the infected state = [equation image]. Here, the number of clusters is k.

      Step-4: Compare the clusters and identify the set G of differentially expressed glycans that have changed significantly between the two states.

[equation image]

      Step-5: For multiple glycan datasets D1, D2, …, Dt, the resultant glycan set is represented as G’= G1G2…Gt; here, G1 is the set of differentially expressed glycans obtained in Step 4 for dataset D1.
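
      The text does not specify how the clusters of the two states are compared in Step-4, so the following is only a minimal sketch under a loudly stated assumption: clusters are aligned across states by ranking them on mean RFU, and a glycan is flagged when its cluster rank differs between the normal and infected states. The function names `rank_clusters` and `differential_glycans` and the rank-based criterion are hypothetical illustrations, not the authors' procedure.

```python
def rank_clusters(labels, values):
    """Map each cluster label to its rank by mean value (0 = lowest mean)."""
    sums, counts = {}, {}
    for lab, v in zip(labels, values):
        sums[lab] = sums.get(lab, 0.0) + v
        counts[lab] = counts.get(lab, 0) + 1
    ordered = sorted(sums, key=lambda lab: sums[lab] / counts[lab])
    return {lab: rank for rank, lab in enumerate(ordered)}


def differential_glycans(names, labels_n, rfu_n, labels_i, rfu_i):
    """Return the glycans whose cluster rank differs between the two states.

    ASSUMPTION: ranking clusters by mean RFU is an illustrative way to align
    arbitrary cluster labels; the source does not state its comparison rule.
    """
    rank_n = rank_clusters(labels_n, rfu_n)  # normal state (DN)
    rank_i = rank_clusters(labels_i, rfu_i)  # infected state (DI)
    return {
        name
        for name, lab_n, lab_i in zip(names, labels_n, labels_i)
        if rank_n[lab_n] != rank_i[lab_i]
    }


# Toy data (made up for illustration): four glycans, k = 2 clusters per state.
names = ["g1", "g2", "g3", "g4"]
G = differential_glycans(
    names,
    labels_n=[0, 0, 1, 1], rfu_n=[10.0, 12.0, 900.0, 950.0],
    labels_i=[1, 0, 0, 1], rfu_i=[11.0, 870.0, 910.0, 14.0],
)
```

      Ranking by mean RFU sidesteps the fact that cluster labels are arbitrary and need not match between two independent clustering runs; any other alignment rule could be substituted.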

      The first algorithm applied is k-means clustering, proposed by J. B. MacQueen. The idea behind this algorithm is to identify k centroids, one for each cluster.

       (1) First, choose some points to serve as the initial cluster centroids.

       (2) Secondly, assign each object to the cluster that has the closest centroid.

       (3) Thirdly, once all objects are assigned, recalculate the positions of the k centroids. Repeat steps (2) and (3) until the centroids no longer move. This produces a separation of the objects into clusters, from which the metric to be minimized can be calculated [23].
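
      The three steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in the study: the function name `kmeans`, the random choice of initial centroids, the Euclidean distance, and the use of the within-cluster sum of squared distances as the metric are all assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # (1) Choose k of the points as the initial centroids (assumed strategy).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # (2) Assign each point to the cluster with the closest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # (3) Recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids no longer move
        centroids = new_centroids
    # Metric: within-cluster sum of squared Euclidean distances.
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, sse


# Toy data (made up): two well-separated groups of three points each.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids, sse = kmeans(pts, k=2)
```

      For glycan data, each tuple would hold the parameter values of one glycan, for example (RFU, STDEV, SEM).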

      Hierarchical clustering is the second algorithm. It groups similar objects into clusters. The algorithm initially treats every observation as an individual cluster and then repeats the following steps:

       (1) First, identify the two clusters that are closest together.

       (2) Then, merge these two most similar clusters. This process continues until all the clusters have been merged [24].
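
      A naive sketch of this agglomerative procedure is given below. The text does not say how the distance between two clusters is measured, so single linkage (the distance between the closest pair of members) with Euclidean distance is assumed here; the function name `agglomerative` and the optional `n_clusters` stopping point are likewise illustrative.

```python
import math


def agglomerative(points, n_clusters=1):
    """Naive single-linkage agglomerative clustering (cubic time, illustration only)."""
    # Start with every observation in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []  # history of (members_a, members_b, distance)

    def linkage(a, b):
        # ASSUMPTION: single linkage, i.e. distance of the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # (1) Find the two clusters that are closest together.
        d, ia, ib = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        # (2) Merge them and record the step.
        merges.append((clusters[ia], clusters[ib], d))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (ia, ib)]
        clusters.append(merged)
    return clusters, merges


# Toy one-dimensional data (made up for illustration).
pts = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
two_clusters, _ = agglomerative(pts, n_clusters=2)
_, full_history = agglomerative(pts)  # merge until one cluster remains
```

      Running until a single cluster remains, as the text describes, yields the full merge history; stopping at `n_clusters` clusters corresponds to cutting that hierarchy at a chosen level.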

      Fuzzy c-means clustering is the third and last algorithm. Its concept is very similar to that of k-means clustering. The algorithm is as follows:

       (1) First, choose the number of clusters.

       (2) Then, randomly assign to each data point coefficients for being in the clusters.

       (3) Until the algorithm has converged, repeat the following two steps: (i) compute the centroid of each cluster; (ii) for every data point, compute its coefficients of being in the clusters.
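
      The loop above can be sketched as follows, using the standard fuzzy c-means update rules. The fuzzifier m = 2, the Euclidean distance, the convergence tolerance, and the function name `fuzzy_c_means` are assumptions made for illustration; the text does not give these details.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (membership matrix, centroids)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # (2) Random membership coefficients, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centroids = []
    for _ in range(max_iter):
        # (i) Centroid of each cluster: mean of the points weighted by u**m.
        centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centroids.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)
            ))
        # (ii) Update every point's coefficient of being in each cluster.
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centroids]
            if 0.0 in dists:
                # The point coincides with a centroid: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                inv_sum = sum(inv)
                new_u.append([v / inv_sum for v in inv])
        # Converged when no coefficient changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return u, centroids


# Toy data (made up): two well-separated groups of three points each.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
u, centroids = fuzzy_c_means(pts, c=2)
hard = [row.index(max(row)) for row in u]  # hard labels from the largest coefficient
```

      Unlike k-means, each point keeps a coefficient for every cluster; taking the largest coefficient per point recovers a hard assignment when one is needed for comparison.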

      2.3 Result

      The Result section consists of a description of the datasets, an analysis of the results, and a validation of the results.

Schematic illustration of the flowchart of the methodology.