Title: Deep Learning Approaches to Text Production
Authors: Shashi Narayan, Claire Gardent
Publisher: Ingram
Series: Synthesis Lectures on Human Language Technologies
ISBN: 9781681738215
7.4 Example data-document pair from the RotoWire data set
7.5 Example input and output from the SemEval AMR-to-Text Generation Task
7.6 Example shallow input from the SR’18 data set
7.7 Example instance from the E2E data set
7.8 Example summary from the NewsRoom data set
7.9 An abridged example from the XSum data set
7.10 PWKP complex and simplified example pairs
7.11 Newsela example simplifications
7.12 GigaWord sentence compression or summarisation example
7.13 Sentence compression example
7.14 Example of abstractive compression from Toutanova et al. [2016]
7.15 Example of abstractive compression from Cohn and Lapata [2008]
7.16 Example paraphrase pairs from ParaNMT-50
7.17 Examples from the Twitter News URL Corpus
7.18 Paraphrase examples from PIT-
7.19 Paraphrase examples from the MSR corpus
List of Tables
6.1 An abridged CNN article and its story highlights (Continues.)
6.1 (Continued.) An abridged CNN article and its story highlights
7.1 Summary of publicly available large corpora for summarisation
7.2 Data statistics assessing extractiveness of summarisation data sets
7.3 Summary of publicly available large sentential paraphrase corpora
Preface
Neural methods have triggered a paradigm shift in text production by supporting two key features. First, recurrent neural networks allow for the learning of powerful language models which can be conditioned on arbitrarily long input and are not limited by the Markov assumption. In practice, this proved to allow for the generation of highly fluent, natural-sounding text. Second, the encoder-decoder architecture provides a natural and unifying framework for all generation tasks independent of the input type (data, text, or meaning representation). As shown by the dramatic increase in the number of conference and journal submissions on that topic, these two features have led to a veritable explosion of the field.
In this book, we introduce the basics of early neural text-production models and contrast them with pre-neural approaches. We begin by briefly reviewing the main characteristics of pre-neural text-production models, emphasising the stark contrast with early neural approaches which mostly modeled text-production tasks independent of the input type and of the communicative goal. We then introduce the encoder-decoder framework where, first, a continuous representation is learned for the input and, second, an output text is incrementally generated conditioned on the input representation and on the representation of the previously generated words. We discuss the attention, copy, and coverage mechanisms that were introduced to improve the quality of generated texts. We show how text-production can benefit from better input representation when the input is a long document or a graph. Finally, we motivate the need for neural models that are sensitive to the current communication goal. We describe different variants of neural models with task-specific objectives and architectures which directly optimise task-specific communication goals. We discuss generation from text, data, and meaning representations, bringing various text-production scenarios under one roof to study them all together. Throughout the book we provide an extensive list of references to support further reading.
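The encoder-decoder framework described above can be sketched in a few lines of NumPy: the encoder folds the input into a single continuous vector, and the decoder then emits one token at a time, conditioned on that vector and on the previously generated word. This is a toy illustration only; the vocabulary, dimensions, and randomly initialised weights below are invented for the sketch, and a real model would learn its parameters from data.

```python
import numpy as np

# Toy vocabulary and sizes; these are illustrative assumptions, not from the book.
vocab = ["<bos>", "<eos>", "the", "cat", "sat"]
V, H = len(vocab), 8
rng = np.random.default_rng(0)

# Randomly initialised parameters (a trained model would learn these).
E = rng.normal(size=(V, H))        # word embeddings
W_enc = rng.normal(size=(H, H))    # encoder recurrence
W_dec = rng.normal(size=(H, H))    # decoder recurrence
W_in = rng.normal(size=(H, H))     # projection of the previous output word
W_out = rng.normal(size=(H, V))    # projection to vocabulary logits

def encode(token_ids):
    """Fold the input sequence into one continuous representation."""
    h = np.zeros(H)
    for t in token_ids:
        h = np.tanh(W_enc @ h + E[t])
    return h

def decode(h, max_len=5):
    """Greedily emit tokens conditioned on h and on the previous output."""
    out, prev = [], vocab.index("<bos>")
    for _ in range(max_len):
        h = np.tanh(W_dec @ h + W_in @ E[prev])
        prev = int(np.argmax(h @ W_out))
        if vocab[prev] == "<eos>":
            break
        out.append(vocab[prev])
    return out

summary = decode(encode([vocab.index(w) for w in ["the", "cat", "sat"]]))
print(summary)  # untrained weights, so the tokens emitted are arbitrary
```

With trained weights, the same greedy loop produces fluent output; the attention, copy, and coverage mechanisms mentioned above refine this basic conditioning on a single input vector.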
As we were writing this book, the field had already moved on to new architectures and models (Transformer, pre-training, and fine-tuning have now become the dominant approach), and we discuss these briefly in the conclusion. We hope that this book will provide a useful introduction to the workings of neural text production and that it will help newcomers from both academia and industry quickly get acquainted with this rapidly expanding field.
We would like to thank several people who provided data or images, and authorization to use them in this book. In particular, we would like to thank Abigail See for the pointer-generator model, Asli Celikyilmaz for the diagrams of deep communicating paragraph encoders, Bayu Distiawan Trisedya for graph-triple encoders, Bernd Bohnet for an example from the 2018 surface realisation challenge, Diego Marcheggiani for graph convolutional network (GCN) diagrams, Jiwei Tan for hierarchical document encoders and graph-based attention mechanism using them, Jonathan May for an abstract meaning representation (AMR) graph, Laura Perez-Beltrachini for an extended RotoWire example, Linfeng Song for graph-state long short-term memories (LSTMs) for text production from AMR graphs, Marc’Aurelio Ranzato for exposure bias and curriculum learning algorithm diagrams, Qingyu Zhou for selective encoding figures, Sam Wiseman for a corrected RotoWire example, Sebastian Gehrmann for the bottom-up summarization diagram, Tsung-Hsien Wen for an alternative coverage mechanism plot, Xingxing Zhang for reinforcement learning for sentence simplification, and Yannis Konstas for AMR-to-text and data-to-text examples. Huge thanks to Emiel Krahmer, Graeme Hirst, and our anonymous reviewer for reviewing our book and providing us with detailed and constructive feedback. We have attempted to address all the issues they raised. All the remaining typos and inadequacies are entirely our responsibility. Finally, we would like to thank Morgan & Claypool Publishers for working with us in producing this manuscript. A very special thanks goes to Michael Morgan and Christine Kiilerich for always encouraging us and keeping us on track.
Shashi Narayan and Claire Gardent
March 2020
CHAPTER 1
Introduction
In this chapter, we outline the differences between text production and text