Title: Data Analytics in Bioinformatics
Author: Group of authors
Publisher: John Wiley & Sons Limited
Genre: Software
ISBN: 9781119785606
3.3.3 ML in Bioinformatics
Machine learning (ML) is a technique for developing computer programs that access data and learn automatically from experience, without human intervention or assistance. Machine learning techniques and deep learning algorithms enable a classifier to use automatic feature learning and so make reasonably complex predictions when the model is trained on large datasets. In other words, the algorithm can learn from the dataset alone and can discover ways to integrate numerous features of the input into a single higher-level set of features from which further learning can be done [9]. Machine learning uses two different methods to train a model: supervised learning and unsupervised learning [16]. The subsequent sections discuss machine learning models used for supervised learning problems.
In recent years, the availability of biological datasets has risen sharply, which has enabled bioinformatics researchers to make use of these machine learning algorithms. ML techniques have been applied to many biological domains, such as microarrays, systems biology, genomics, proteomics, stroke diagnosis and text mining [8].
Machine learning in bioinformatics helps to explore, analyze, manage and store biological data and to extract relevant information from it.
Gene identification and nucleotide identification help us to understand genes and their association with disease.
Machine learning tools are also used to determine genomic sequences and examine gene patterns.
Gene sequence classification helps us to understand nucleic acid and protein sequences.
Advances in this field can ultimately lead to the development of automated diagnostic tools, personalized and precision medicine, gene therapy, food analysis, biodiversity management and more, targeting an individual's lifestyle, environment and custom medical treatment while taking into account that person's vulnerability to disease.
There are many machine learning techniques, among which the artificial neural network (ANN) is an effective technique for the identification, selection, classification and prediction of genes in DNA sequences.
3.3.4 Introduction to ANN
An ANN is a computing system that resembles the human brain, consisting of a highly interconnected network of processing units called neurons. It can handle complex features within the data in order to process the information.
Figure 3.2 shows the simple workflow of an artificial neural network with a perceptron architecture. It consists of one input layer with a few input units representing the features present in the dataset, depending on the objective being defined. Each input collected from the dataset is multiplied by a weight, and the weighted inputs are summed and fed to a function called the activation function to produce the actual output.
Depending on the difference between the desired output and the actual output, the weights are modified in each layer of connections, and finally the output is predicted. Weights are values that the neural network learns from the data. A neural network learns by a feedback process called backpropagation. In backpropagation, weights are modified from the output layer to the input layer, going backward through the hidden layers, to reduce the difference between the actual output and the desired output to an acceptable level [18]. The backpropagation algorithm therefore helps to reduce the overall loss of the network during training. The number of hidden layers and the type of computation done in each layer together determine the processing speed of the network.
Figure 3.2 Simple network architecture of an ANN with four input units [17].
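The computation of a single neuron described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sigmoid activation, the optional bias term, the function names and all numeric values are chosen here for the example and are not taken from Figure 3.2.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Multiply each input by its weight, sum the products, apply the activation."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Four input units, as in Figure 3.2; the values themselves are made up.
features = [0.5, -1.0, 2.0, 0.0]
weights = [0.4, 0.3, -0.2, 0.9]
output = neuron_output(features, weights)
```

With these illustrative numbers the weighted sum is -0.5, so the sigmoid returns an output of about 0.378.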
Neural networks are classified into different types based on their structure, neuron density, number of layers, etc. The most commonly used ANNs follow a three-layer architecture with an input layer, one or more hidden layers and an output layer, as seen in Figure 3.2. The number of neurons to be used in each layer of the network depends on the number of features available in the dataset and the complexity of the problem [19].
We will discuss some ANN models that are widely used as supervised learning models.
Perceptron Network: The perceptron model is a binary classifier, meaning that it separates input data into two categories. It is the simplest and oldest neural network model. It can solve linearly separable problems such as the AND, OR and NOT gates, but it does not work for problems that are not linearly separable, such as the XOR gate. To deal with such problems, or with more complex ones, we use a multilayer perceptron. The left side of Figure 3.3 shows a simple perceptron architecture consisting of an input layer and an output layer.
Multilayer Perceptron (MLP) Network: An MLP supports multi-class classification. It consists of an input layer, an output layer and at least one hidden layer; more hidden layers may be used for complex classification problems. In this network every node is connected to every node in the next layer by weighted connections, forming a fully connected neural network. An MLP uses non-linear activation functions to compute its output units. Information propagates in both directions: forward propagation and backward propagation [18]. The right side of Figure 3.3 shows the architecture of an MLP with an input layer, one hidden layer and an output layer.
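The difference between linearly separable and non-separable problems can be seen in a small experiment. The sketch below trains a two-input perceptron with the classic perceptron learning rule on the AND and XOR truth tables. The step activation, the zero initial weights, the learning rate and the epoch count are assumptions made for this example, and the helper names are hypothetical.

```python
def step(z):
    """Threshold activation: fires (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: shift each weight by lr * error * input."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def accuracy(samples, weights, bias):
    hits = sum(predict(weights, bias, x) == target for x, target in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, *train_perceptron(AND))  # all four rows learned
xor_acc = accuracy(XOR, *train_perceptron(XOR))  # at least one row always wrong
```

On AND the rule settles on a separating line after a few epochs. On XOR no such line exists, so the accuracy stays below 1.0 however long the training runs.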
Figure 3.3 Single layer perceptron (left) and multilayer perceptron with one hidden layer (right) [20].
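To illustrate why a hidden layer helps, the following sketch builds a fully connected network with two inputs, two hidden nodes and one output that computes XOR. The weights are picked by hand for the illustration rather than learned, and the step activation and all names are assumptions of this example: one hidden node behaves like OR, the other like AND, and the output node fires when OR is on but AND is off.

```python
def step(z):
    """Non-linear threshold activation."""
    return 1 if z >= 0 else 0


def dense_layer(inputs, weight_rows, biases, activation):
    """Fully connected layer: every input feeds every node of the layer."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, biases)
    ]


# Hand-picked (not learned) weights for a 2-2-1 network.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]  # node 0 acts as OR, node 1 as AND
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1.0, -1.0]]             # fires for "OR and not AND"
OUTPUT_B = [-0.5]


def mlp_xor(x1, x2):
    hidden = dense_layer([x1, x2], HIDDEN_W, HIDDEN_B, step)
    return dense_layer(hidden, OUTPUT_W, OUTPUT_B, step)[0]


truth_table = {(a, b): mlp_xor(a, b) for a in (0, 1) for b in (0, 1)}
```

In practice the weights of an MLP are learned from data rather than set by hand. The fixed values here only show that one hidden layer is enough to represent a function that a single perceptron cannot.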
Backpropagation: This is a supervised learning algorithm used in MLP networks that adjusts the weights to minimize the calculated error. It traverses backward from the output layer to the input layer through the hidden layers.
Unsupervised learning occurs when the target output is unknown. In this kind of learning, the network itself determines how to group the outputs based on the given input data, so it is also described as self-organization. Applications of unsupervised learning include image processing, speech recognition and text mining. There are several well-established neural-network-based unsupervised learning algorithms, such as principal components analysis, Kohonen's self-organizing maps, independent components analysis and Hebbian learning [9]. Although unsupervised learning algorithms have a proven track record in many areas, a comprehensive review of their applications is beyond the scope of our discussion. This chapter focuses only on the application of supervised neural networks. For more insight into unsupervised learning in proteomics and genomics, see [21].
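The backward traversal can be made concrete with a tiny network of two inputs, two hidden nodes and one output. The sketch below is a minimal illustration, not the formulation of [18]: the sigmoid activations, the squared-error loss, the learning rate, the initial weights and the single training example are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return the hidden activations and the network output for input x."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def loss(params, x, target):
    """Squared error between actual and desired output (assumed loss)."""
    return 0.5 * (forward(params, x)[1] - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss, computed from the output layer backward."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(params, x)
    # Output layer: error times the slope of the sigmoid at the output.
    delta_out = (output - target) * output * (1.0 - output)
    grad_w_out = [delta_out * h for h in hidden]
    # Hidden layer: the output error is passed back through the output weights.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out


def sgd_step(params, grads, lr):
    """Move every weight and bias a small step against its gradient."""
    w_hidden, b_hidden, w_out, b_out = params
    g_wh, g_bh, g_wo, g_bo = grads
    return (
        [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(w_hidden, g_wh)],
        [b - lr * g for b, g in zip(b_hidden, g_bh)],
        [w - lr * g for w, g in zip(w_out, g_wo)],
        b_out - lr * g_bo,
    )


# Illustrative starting weights and a single made-up training example.
params = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [0.7, -0.6], 0.05)
x, target = [1.0, 0.0], 1.0

loss_before = loss(params, x, target)
for _ in range(100):
    params = sgd_step(params, backprop(params, x, target), lr=0.5)
loss_after = loss(params, x, target)
```

Each pass first computes the error at the output node and then reuses it to obtain the hidden-layer errors, which is the backward direction described above. After the 100 updates, `loss_after` is smaller than `loss_before`.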
3.3.5 Application of ANN in Bioinformatics
Artificial neural networks have been used in many areas of bioinformatics and have proved to be one of the most powerful tools for solving bioinformatics problems. Some of the areas of bioinformatics where ANNs are applied are listed below and discussed in detail in later sections.
1. In DNA and RNA alignment
2. Image and signal processing
3. In problems of gene identification
4. In the coding region recognition