Biological Language Model. Qiwen Dong

Title: Biological Language Model

Author: Qiwen Dong

Publisher: Ingram

Genre: Medicine

Series: East China Normal University Scientific Reports

ISBN: 9789811212963

structural and functional information of biological macromolecules contained in the sequence is analyzed by using the theories and methods of computer science, mathematics and statistics.

      Proteins play a key role in basic biological processes. As the material basis of life activities, they catalyze almost all chemical reactions in biological cells, regulate gene activity and participate in the formation of most cell structures. Because of this key role, the study of protein structure and function has always been a focus of life science research.

      Protein sequences resemble sentences in natural language: both are linear arrangements of basic units. The mapping from protein sequences to structures and functions is conceptually similar to the mapping from words to meanings. A growing body of research has explored this analogy, but basic questions remain. Do protein sequences have linguistic features? What are the basic units of the protein sequence language? Large amounts of genomic protein sequence data for Homo sapiens and other organisms have recently become available, together with a growing body of protein structure and function data. The expected exponential increase in the amount of data over the coming decade creates an opportunity to attack the sequence–structure–function mapping problem with sophisticated data-driven methods. Such methods have proven immensely successful in the domain of natural language.
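      To make the analogy concrete, the sketch below treats a protein sequence as a string of one-letter residue codes and counts its overlapping n-grams, much as one would count words or word pairs in a sentence. It is only an illustration under stated assumptions: the function name `ngram_counts` and the toy sequence are hypothetical, and the analyses in the later chapters may differ in their details.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count overlapping n-grams (runs of n consecutive residues) in a sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n along the sequence, one residue at a time.
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Hypothetical toy sequence in one-letter amino acid code, for illustration only.
toy_sequence = "MKVLAAGMKVL"

unigrams = ngram_counts(toy_sequence, 1)  # single residues, the "letters"
bigrams = ngram_counts(toy_sequence, 2)   # adjacent residue pairs

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

      Counts of this kind are the usual starting point for frequency-based comparisons between sequences, whichever unit is finally chosen as the "word".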

      The purpose of this book is to introduce the relevant techniques of biological language modeling into bioinformatics and to promote the development of protein sequence–structure–function mapping. To this end, the linguistic features of protein sequences are analyzed and several amino acid encoding schemes are explored. Then, several research topics, including remote homology detection, protein structure prediction and protein function prediction, are investigated using biological language model approaches. Finally, a brief summary and future perspectives are presented. We hope that this book will be helpful for research in the field of bioinformatics, especially the mapping of protein sequences to their structure and function.

      Qiwen Dong

      Xiuzhen Hu

      Xiaoyang Jing

      Aoying Zhou

       Acknowledgments

      This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000905 and the National Natural Science Foundation of China (Grant Nos. U1401256, U1711262, U1811264, 61672234, 61961032, 31260203, 61402177).

      We would like to thank all the people who contributed to this book and offered valuable suggestions, especially Bin Liu, Ming Gao, Dingjiang Huang and Daocheng Hong. We would also like to express our sincere thanks and appreciation to the people at University Press for their generous help throughout the publication preparation process.

       Contents

       East China Normal University Scientific Reports

       Preface

       Acknowledgments

       1. Introduction

       1.1 Background and Motivation

       1.2 Related Topics

       1.3 Organization of the Book Content

       References

       2. Linguistic Feature Analysis of Protein Sequences

       2.1 Motivation and Basic Idea

       2.2 Comparative n-gram Analysis

       2.3 The Zipf Law Analysis

       2.4 Distinguishing the Organisms by Uni-Gram Model

       2.5 Conclusions

       References

       3. Amino Acid Encoding for Protein Sequence

       3.1 Motivation and Basic Idea

       3.2 Related Work

       3.3 Discussion

       3.4 The Assessment of Encoding Methods for Protein Secondary Structure Prediction

       3.5 Assessments of Encoding Methods for Protein Fold Recognition

       3.6 Conclusions

       References

       4. Remote Homology Detection

       4.1 Motivation and Basic Idea

       4.2 Related Work

       4.3 Latent Semantic Analysis

       4.4 Auto-cross Covariance Transformation

       4.5 Conclusions

       References

       5. Structure Prediction

       5.1 Motivation and Basic Idea