Deep Learning Approaches to Text Production. Shashi Narayan
… back propagates through multiple bounded non-linearities.

      LSTMs and GRUs have been very successful in modelling natural languages in recent years. They have practically replaced the vanilla RNN cell in recurrent networks.
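
      To make this concrete, here is a minimal sketch (an assumption of ours, not code from the book) using PyTorch, in which the three recurrent layers expose the same interface, so swapping the vanilla RNN cell for a GRU or an LSTM is a one-line change; the layer sizes are illustrative only.

      import torch
      import torch.nn as nn

      # Illustrative sizes: 128-dimensional inputs, 256-dimensional hidden states.
      input_size, hidden_size, seq_len, batch = 128, 256, 10, 4
      x = torch.randn(seq_len, batch, input_size)  # a dummy input sequence

      vanilla = nn.RNN(input_size, hidden_size)
      gru = nn.GRU(input_size, hidden_size)
      lstm = nn.LSTM(input_size, hidden_size)

      out_rnn, h_rnn = vanilla(x)           # h_rnn: final hidden state
      out_gru, h_gru = gru(x)               # same output shapes as the vanilla RNN
      out_lstm, (h_lstm, c_lstm) = lstm(x)  # the LSTM additionally returns a cell state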

      One of the key strengths of neural networks is that representation learning happens in a continuous space. For example, an RNN learns a continuous dense representation of an input text by encoding the sequence of words making up that text. At each time step, it takes as input a word represented as a continuous vector (often called a word embedding). In sharp contrast to pre-neural approaches, where words were often treated as symbolic features, word embeddings provide a more robust and enriched representation of words, capturing their meaning, semantic relationships, and distributional similarities (the similarity of the contexts they appear in).
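
      The following sketch (again assuming PyTorch, with a toy vocabulary of our own invention) illustrates this pipeline: each word id is looked up in an embedding table to obtain a dense vector, and the recurrent encoder reads one such vector per time step.

      import torch
      import torch.nn as nn

      # Hypothetical toy vocabulary; in practice the word-to-index map comes from the corpus.
      vocab = {"<pad>": 0, "the": 1, "battery": 2, "charger": 3, "works": 4}
      embedding_dim, hidden_size = 50, 100

      embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embedding_dim)
      encoder = nn.GRU(embedding_dim, hidden_size, batch_first=True)

      # Encode "the battery works": every word becomes a 50-dimensional continuous vector.
      ids = torch.tensor([[vocab["the"], vocab["battery"], vocab["works"]]])  # shape (1, 3)
      word_vectors = embed(ids)                     # shape (1, 3, 50)
      outputs, final_state = encoder(word_vectors)  # final_state has shape (1, 1, 100)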

      Figure 3.7 shows a two-dimensional representation of word embeddings. As can be seen, words that often occur in similar contexts (e.g., “battery” and “charger”) are mapped closer to each other than words that do not (e.g., “battery” and “sink”). Word embeddings thus give a notion of similarity between words whose surface forms look very different. Due to this continuous representation, neural text-production approaches yield more robust models and better generalisation than pre-neural approaches, whose symbolic representations make them brittle. Mikolov et al. [2013] further show that these word embeddings exhibit compositional properties in the distributional space: for example, one can start from the word “queen” and reach the word “woman” by following the direction from the word “king” to the word “man”.
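
      As an illustration of this vector-offset property, the sketch below (with a hypothetical embeddings dictionary standing in for real pre-trained vectors, which would be needed to actually observe the effect) computes queen + (man − king) and retrieves its nearest neighbour by cosine similarity.

      import numpy as np

      def nearest(embeddings, query, exclude):
          """Return the word whose vector has the highest cosine similarity to `query`."""
          best_word, best_sim = None, -1.0
          for word, vec in embeddings.items():
              if word in exclude:
                  continue
              sim = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
              if sim > best_sim:
                  best_word, best_sim = word, sim
          return best_word

      # `embeddings` is assumed to map words to pre-trained vectors (e.g., loaded
      # from a Word2Vec or GloVe file). In a well-trained space the offset below
      # lands near "woman":
      # offset = embeddings["queen"] + (embeddings["man"] - embeddings["king"])
      # nearest(embeddings, offset, exclude={"queen", "man", "king"})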

      Given a vocabulary V, we represent each word w ∈ V by a continuous vector e_w ∈ ℝ^d of length d. We define a word embedding matrix W ∈ ℝ^{|V|×d}, representing each word in the vocabulary V. Earlier neural networks often used pre-trained word embeddings such as Word2Vec [Mikolov et al., 2013] or GloVe [Pennington et al., 2014]. In these approaches, the word embedding matrix W is learned in an unsupervised fashion from a large amount of raw text. Word2Vec adopts a predictive feed-forward model, aiming to maximise the probability of a target word given its surrounding context. GloVe achieves this by directly reducing the dimensionality of the word co-occurrence count matrix. Importantly, embeddings learned with both approaches capture the distributional similarity among words. In a parallel trend to using pre-trained word embeddings, several text-production models have shown that word embeddings can instead be initialised randomly and trained jointly with the other network parameters; such jointly trained word embeddings are effectively fine-tuned to, and often better suited for, the task.
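
      A minimal sketch of the two options, again assuming PyTorch (the framework and the sizes are illustrative, not taken from the book): an embedding matrix initialised randomly and trained jointly with the network, versus one initialised from pre-trained vectors and then fine-tuned.

      import torch
      import torch.nn as nn

      vocab_size, d = 10000, 300  # illustrative values for |V| and the dimension d

      # Option 1: initialise W randomly and learn it jointly with the other parameters.
      jointly_trained = nn.Embedding(vocab_size, d)

      # Option 2: initialise W from pre-trained vectors (e.g., Word2Vec or GloVe) and
      # fine-tune them on the target task (freeze=True would keep them fixed instead).
      pretrained_vectors = torch.randn(vocab_size, d)  # stands in for real pre-trained vectors
      pretrained = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)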
