Title: Machine Learning Techniques and Analytics for Cloud Security
Author: Group of authors
Publisher: John Wiley & Sons Limited
Genre: Software
ISBN: 9781119764090
3
Selection of Certain Cancer Mediating Genes Using a Hybrid Model Logistic Regression Supported by Principal Component Analysis (PC-LR)
Subir Hazra*, Alia Nikhat Khurshid and Akriti
Meghnad Saha Institute of Technology, Kolkata, India
Abstract
Selecting genes whose mutations are associated with particular cancers has recently become a promising research area. Analysis of microarray gene expression data is an important tool in this work. A survey of the literature shows that various Machine Learning algorithms have been effective in cancer classification and gene selection. The selected genes play a significant role in clinical decision support: they help in diagnosing cancer by identifying genes whose expression levels change significantly. Because microarray gene expression data contain a very large number of features, developing a gene selection algorithm with a Machine Learning approach incurs high computational complexity. Too many features can also cause overfitting and degrade the algorithm's performance. In the present work, we develop a hybrid approach in which the number of features is first reduced using Principal Component Analysis (PCA) and a Logistic Regression model is then applied for prediction of genes. After the Logistic Regression model is fitted, its predictions on the test data are assessed with an accuracy score. Based on the accuracy score, the final set of candidate genes, whose expression levels are disproportionately altered, is selected. The resulting gene sets are identified as being correlated with certain cancers. The proposed method is demonstrated on two datasets, viz., colon and lung cancer. The results have been validated biologically using the NCBI database, and the efficacy and robustness of the method have also been evaluated.
Keywords: Gene expression, PCA, Logistic Regression, dimensionality reduction, accuracy score, classification, F-score
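The two-stage idea described in the abstract (reduce the features with PCA, then classify with Logistic Regression and check an accuracy score) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, and the helper names (fit_pca, fit_logistic, predict, f_score), the number of components, the learning rate, and the train/test split are all assumptions made for the example.

```python
import numpy as np

def fit_pca(X, k):
    """Return the column means and the top-k principal axes of X (samples x genes)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def fit_logistic(Z, y, lr=0.1, n_iter=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        # Clip the linear score so that exp() cannot overflow.
        p = 1.0 / (1.0 + np.exp(-np.clip(Z @ w + b, -30, 30)))
        w -= lr * (Z.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def predict(Z, w, b):
    """Predict class 1 when the linear score is positive (probability > 0.5)."""
    return (Z @ w + b > 0).astype(int)

def f_score(y_true, y_pred):
    """F1 score for the positive class: 2TP / (2TP + FP + FN)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

# Synthetic stand-in for a microarray matrix (assumption, not real data):
# 80 samples, 200 "genes", of which the first 20 are shifted in class 1.
rng = np.random.default_rng(0)
n_samples, n_genes, n_informative = 80, 200, 20
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :n_informative] += 2.0

# Hold out 20 samples for testing; PCA is fitted on the training part only.
idx = rng.permutation(n_samples)
train, test = idx[:60], idx[60:]

mean, axes = fit_pca(X[train], k=5)
Z_train = (X[train] - mean) @ axes.T   # training samples in PC space
Z_test = (X[test] - mean) @ axes.T     # test samples projected with the same axes

w, b = fit_logistic(Z_train, y[train])
y_pred = predict(Z_test, w, b)
accuracy = float(np.mean(y_pred == y[test]))
f1 = float(f_score(y[test], y_pred))
print(f"test accuracy = {accuracy:.2f}, F-score = {f1:.2f}")
```

Fitting the PCA axes on the training samples alone and reusing them for the test samples keeps the accuracy estimate free of information leaked from the held-out data. In practice, a library pipeline would usually replace these hand-written helpers.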
3.1 Introduction
Cancer arises from gene mutations, which may be caused by several factors. Normal cells turn into cancerous cells largely because of mutations in their genes, and a cell often becomes cancerous only after several mutations have accumulated. These mutations can affect various genes that control cell division and growth. Identifying the genes correlated with a particular cancer is a challenging task. Gene expression data obtained with high-performance technologies, viz., DNA sequencing and DNA microarrays, have proven to have a high impact on cancer research [1]. Gene selection can help in many ways, including cancer treatment, proper diagnosis, and drug discovery [2]. With the invention and advancement of DNA microarray technology, it is possible to monitor the expression levels of thousands of genes, but the key task is to derive information from the vast amount of biological data and to recognize the underlying patterns [3]. Over the past few decades, many tools based on various computational techniques have been developed for cancer classification; these advances in medical science improve the ability of biologists and physicians to detect cancer-mediating biomarkers [4].
Classifying cancer by analyzing microarray gene expression data is now a conventional method. The biological relevance of the genes substantially influences the accuracy of cancer classification. Gene selection therefore plays a pivotal role and may be regarded as the main factor in classifying cancer on the basis of microarray data. Gene selection is the task of choosing a few significant genes that best characterize the variation in the data [5]. It is effective to focus on a small number of important genes whose expression levels may differ between the non-cancerous and cancerous states. Thus, out of the whole genome, only the few dominant genes should be identified using an effective gene selection method [6]. Extracting information from the vast amount of biological data and understanding its patterns, however, remains the most challenging task. The expression levels of genes are often correlated, and this correlation is more pronounced when the genes lie on the same biological pathway. In this situation, traditional feature selection procedures often overlook the relationships between genes and select only a few of the genes that are most strongly linked. Irrelevant genes not only lower the classification performance but also make it harder to locate the genes that are truly informative [7].
Analyzing microarray data and selecting informative genes is always a demanding task, made more challenging by the diversity and complexity of different types of cancer. With advances in biotechnology, large amounts of data are being generated using high-density oligonucleotide chips and cDNA arrays [8, 9]. Researchers can now measure the expression of thousands of genes simultaneously. But there is a lack of suitable