Biomedical Data Mining for Information Retrieval. Group of authors

Title: Biomedical Data Mining for Information Retrieval

Author: Group of authors

Publisher: John Wiley & Sons Limited

Genre: Databases

ISBN: 9781119711261

... which produces high-resolution, large-scale molecular structures very efficiently. Cryo-EM density maps make use of machine learning and artificial intelligence for prediction [43–46]. Such experiments require a protein crystal, which is the most disadvantageous and complex part of these methods, because many proteins in liquid form do not crystallize. Artificial intelligence comes to our aid here as a potentially better pathway for sequencing these proteins [47, 48]: AI methods have proved their efficacy and accuracy in other fields, such as business [49] and image recognition, to name a few, and by analysing big data they can accurately and efficiently predict thousands of possible structures in a short time where other methods have failed to deliver accurate and useful information.

Database sources    Websites                              References
PDB                 http://www.rcsb.org/pdb/              [57]
UniProt             http://www.uniprot.org/               [58]
DSSP                http://swift.cmbi.ru.nl/gv/dssp/      [59]
SCOP                http://scop.mrc-lmb.cam.ac.uk/        [60]
SCOP2               http://scop2.mrc-lmb.cam.ac.uk/       [61]
CATH                http://www.cathdb.info/               [62]

      Hidden Markov Model for Prediction. HMMs are among the most important techniques for protein fold recognition. In the HMM version of profile–profile methods, the HMM for the query is aligned with the prebuilt HMMs of the template library. This form of profile–profile alignment is also computed using standard dynamic programming methods. Earlier HMM approaches, such as SAM [63] and HMMer [64], built an HMM for a query from its homologous sequences and then used this HMM to score sequences with known structures in the PDB using the Viterbi algorithm, an instance of dynamic programming. This can be viewed as a form of profile–sequence alignment. More recently, profile–profile methods have been shown to significantly improve the sensitivity of fold recognition over profile–sequence or sequence–sequence methods [65].
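The Viterbi step mentioned above can be illustrated with a minimal sketch. It is not the SAM or HMMer implementation: the two states ("core", "loop"), the two-letter alphabet ("h" for hydrophobic, "p" for polar) and all probabilities are hypothetical values chosen only to show how dynamic programming recovers the most probable state path for a non-empty observation sequence.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_path) for a non-empty observation sequence."""

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any path that ends in state s.
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []

    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this position.
            prev = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            new_score[s] = score[prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


# Hypothetical toy model (illustrative numbers, not from any real profile HMM).
STATES = ["core", "loop"]
START_P = {"core": 0.5, "loop": 0.5}
TRANS_P = {"core": {"core": 0.9, "loop": 0.1},
           "loop": {"core": 0.1, "loop": 0.9}}
EMIT_P = {"core": {"h": 0.8, "p": 0.2},
          "loop": {"h": 0.2, "p": 0.8}}

best_log_prob, best_path = viterbi("hhhppp", STATES, START_P, TRANS_P, EMIT_P)
print(best_path)
```

A real profile HMM has match, insert and delete states for every alignment column, but the recurrence is the same: each cell keeps the best score of any path that reaches it, plus a pointer to the predecessor that achieved it.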

      Neural Networks (NNs). Determining the structure of a protein from its sequence alone is very challenging, which in turn makes function determination more difficult. Because a functional protein involves many molecular interactions and several levels of folding, simply feeding in the sequence will not produce the desired output. Deep learning is a rapidly evolving field for modelling complex relationships between input features and desired outputs, and it has been put to great use in structure prediction. Various deep neural network architectures, loosely modelled on the neural networks of the human brain, have been proposed, including deep feed-forward neural networks, recurrent neural networks, neural Turing machines and memory networks. Such advances are making the field more competitive and accurate; a comparison can be drawn with the human brain, which receives a great deal of information as input yet is able to analyse it and reach a logical conclusion.

      Pattern recognition and classification are important tools of NNs. Examples of early NN methods that are still widely used today are PHD [66, 67], PSIPRED [68] and JPred [69]. The field has since advanced a great deal: deep neural network (DNN) models have been shown to have a performance advantage in image- and language-based problems [70], and this advantage has been seen to extend to some specific CASP areas, such as residue–residue contact prediction and direct use for accurate tertiary structure generation [71–75].
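To make the classification idea concrete, here is a minimal sketch of a feed-forward network that maps a window of residues to three class probabilities (for example helix, strand and coil). It is not the PHD, PSIPRED or JPred architecture: the one-hot encoding, the window half-width of 7, the hidden-layer size and the random, untrained weights are all assumptions made for illustration, so the output probabilities carry no biological meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_window(sequence, center, half_width):
    """One-hot encode the residues around `center`.

    Positions that fall outside the sequence are encoded as all zeros.
    Non-standard residue letters are not handled in this sketch.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        column = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            column[AMINO_ACIDS.index(sequence[pos])] = 1.0
        features.extend(column)
    return features


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights; a real model would learn these from data."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}


def forward(features, params):
    """One hidden ReLU layer followed by a softmax output layer."""
    hidden = [max(0.0, h) for h in dense(features, params["w1"], params["b1"])]
    return softmax(dense(hidden, params["w2"], params["b2"]))


features = one_hot_window("MKTAYIAKQR", center=4, half_width=7)
params = random_params(n_in=len(features), n_hidden=8, n_out=3)
print(forward(features, params))
```

Sliding the window along the sequence gives one prediction per residue. Practical predictors typically replace the one-hot columns with evolutionary profile scores, which is a large part of their accuracy.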

      1 Known structures of protein in the data bank

      2 Evolutionary relationships of the predicted protein

      3 The various principles of bond formation governing the 3-D structure of protein.

      The advantages of support vector machines (SVMs) include very effective avoidance of over-fitting, which is a weakness of several other methods, the ability to manage large feature spaces, and the condensation of large amounts of data.
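The link between SVMs and over-fitting can be seen in a small sketch of a linear soft-margin SVM trained by subgradient descent on the regularised hinge loss. This is an illustrative simplification, not the method used by any tool cited in the text: the function names, the hyperparameters (`reg`, `lr`, `epochs`) and the two-dimensional toy feature vectors are all hypothetical. The `reg` term shrinks the weights on every step, which is what favours a wide margin over fitting individual training points.

```python
def train_linear_svm(samples, labels, reg=0.01, lr=0.1, epochs=200):
    """Train a linear SVM; labels must be +1 or -1.

    Minimises (reg / 2) * ||w||^2 + hinge loss with per-sample
    subgradient steps.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation step: always shrink the weights slightly.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Point is inside the margin or misclassified: push towards it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data (two features per sample).
SAMPLES = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
           (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(SAMPLES, LABELS)
print([predict(w, b, x) for x in SAMPLES])
```

Only points that violate the margin change the weights, so the final model depends on a few boundary samples rather than on the whole data set. Kernel SVMs extend the same idea to non-linear boundaries.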

      Bayesian