Introduction to Corpus Linguistics. Sandrine Zufferey


Title: Introduction to Corpus Linguistics

Author: Sandrine Zufferey

Publisher: John Wiley & Sons Limited

Genre: Educational literature


ISBN: 9781119779704



      The problem of manually tracking and counting occurrences is all the more acute because corpus linguistics typically relies on large amounts of data drawn from more than a single book, so that many occurrences of a given linguistic phenomenon can be observed and its specific properties understood. For example, let us suppose that we wish to know whether Flaubert talks about love in his work. In this case, focusing solely on Madame Bovary would introduce a bias, because this one novel is not representative of his work as a whole. To answer the question, it is necessary to go through all of his novels, which makes the task even harder to perform manually. Let us now imagine that we want to know whether all French authors of the 19th century deal with the question of love as much as Flaubert does. In this case, it would be impossible to look up every occurrence of love-related terms in all of the novels written by French authors in the 19th century. To get around this problem, it would be necessary to collect a sample of texts that is representative of the works of this period. We will discuss this topic in Chapter 6, which is devoted to the methodological principles underlying the construction of a corpus. For the moment, the important point to bear in mind is that corpus linguistics often resorts to a quantitative methodology (see section 1.5) in order to generalize the conclusions drawn from a linguistic sample to the language as a whole, or to a particular language register.
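
      To make the idea of automated counting concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two short texts are invented placeholders (not quotations from any novel), the list of love-related terms is hypothetical, and the tokenizer only handles unaccented lowercase letters, so it would need adapting for French. It counts how often the chosen terms occur in each text and normalizes the count per 1,000 tokens, so that texts of different lengths can be compared.

```python
import re

# Invented placeholder texts, NOT real quotations from any novel.
TOY_CORPUS = {
    "novel_a": "She spoke of love and of passion. Love was all she wanted.",
    "novel_b": "The harvest was poor that year, and the roads were bad.",
}

# Hypothetical list of love-related terms; a real study would justify it.
LOVE_TERMS = {"love", "passion", "beloved"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(text, terms):
    """Return (raw count, total tokens, frequency per 1,000 tokens)."""
    tokens = tokenize(text)
    hits = sum(1 for token in tokens if token in terms)
    per_thousand = 1000 * hits / len(tokens) if tokens else 0.0
    return hits, len(tokens), per_thousand


def corpus_report(corpus, terms):
    """Apply term_frequency to every text in the corpus."""
    return {name: term_frequency(text, terms) for name, text in corpus.items()}


if __name__ == "__main__":
    for name, (hits, total, rate) in corpus_report(TOY_CORPUS, LOVE_TERMS).items():
        print(f"{name}: {hits} hits in {total} tokens ({rate:.1f} per 1,000)")
```

      The normalization step matters as soon as the texts differ in length: a raw count of three occurrences means something very different in a short story than in a long novel.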

      To sum up, in this section we have defined corpus linguistics as an empirical discipline that observes and quantitatively analyzes language samples gathered in a computerized format. In the following sections, we will discuss in depth the central points of this definition, indicated in bold, in order to better understand the theoretical and methodological grounding of corpus linguistics.

      Corpus linguistics is an empirical discipline, which means that it uses data produced by speakers in order to study language. This methodology contrasts with the rationalist method, which seeks answers in one’s own linguistic knowledge rather than in external data. Let us take an example. In order to determine whether the sentence “When do you think he will prepare which cake?” is grammatically correct, an empirical approach would search large corpora to find out whether this syntactic structure is used by English speakers.

      If sentences with this syntactic structure never or almost never appear in the corpus, linguists might conclude that the structure is only rarely used in English. Rationalist methodology, by contrast, would address the same question by relying on the intuitions of linguists. In this particular case, they might ask themselves whether they could produce such a sentence and whether it seems correct or incorrect given their knowledge of the language, and then derive a grammaticality judgment from these intuitions. Grammaticality judgments are often classified into three types: correct, incorrect or marked, the last applying to a sentence that seems possible but sounds unnatural.

      This example illustrates a fundamental difference between empirical and rationalist methodology. While the rationalist methodology leads to categorical judgments, the empirical methodology provides a more fine-grained answer, since the observation of corpus data offers a precise indication of frequency rather than a result in terms of presence or absence. This is one of the reasons why many linguists currently consider that the empirical methodology corresponds better to a scientific approach (in the sense of confrontation with the facts) than a purely rationalist method for studying language.
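
      The contrast between a categorical answer and a frequency can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method proposed in the text: the sentences are invented and do not come from an attested corpus, and the regular expression is a crude surface pattern (a question that opens with "when" and contains a later "which") that only roughly approximates the syntactic structure discussed above. Real studies would normally query a parsed or annotated corpus.

```python
import re

# Invented example sentences standing in for a corpus; not attested data.
TOY_SENTENCES = [
    "When do you think he will prepare which cake?",
    "When will she arrive?",
    "Which cake did he prepare?",
    "When do you think he will leave?",
]

# Crude surface pattern (an assumption): a question that opens with "when"
# and contains a later "which". It does not analyze syntax.
PATTERN = re.compile(r"^when\b.*\bwhich\b.*\?$", re.IGNORECASE)


def structure_frequency(sentences, pattern):
    """Report both a yes/no answer and a frequency for the pattern."""
    matches = sum(1 for sentence in sentences if pattern.search(sentence.strip()))
    total = len(sentences)
    return {
        "attested": matches > 0,                        # presence or absence
        "matches": matches,                             # raw frequency
        "total": total,
        "relative": matches / total if total else 0.0,  # share of sentences
    }


if __name__ == "__main__":
    print(structure_frequency(TOY_SENTENCES, PATTERN))
```

      The "attested" field corresponds to an answer in terms of presence or absence, whereas "matches" and "relative" give the kind of graded information that corpus data makes available.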

      The choice between empirical and rationalist methods is not, however, limited to the field of linguistics. Disciplines such as physics and chemistry, as well as sociology and history, are essentially empirical. Both physicists and historians base their insights on external data, which they collect in the world in order to build a theory, test it and draw conclusions from it. Other disciplines, such as mathematics or philosophy, are traditionally based on a rationalist approach, since mathematicians and philosophers use their own reasoning to build theories and draw conclusions, rather than relying on the collection and observation of external data. Philosophers often resort to thought experiments, but these are not experiments in the empirical sense of the term, because they rest on the reflective abilities of researchers.

      Although corpus linguistics has grown strongly over the past 20 years, the empirical grounding of linguistics is not new. Linguists have long used observational data. In the 19th century, for example, linguists worked on the comparison of Indo-European languages in an attempt to reconstruct their common origin. This research was based on existing data about languages spoken in Europe, such as German, French and English. Similarly, in the first half of the 20th century in the United States, the so-called distributionist approach to syntax studied sentence formation through the syntactic structures that appeared in text corpora and, from there, tried to infer how language functions in general. In the late 1950s, the use of corpora was almost completely interrupted in certain fields of linguistics, such as syntax, following the work of the American linguist Noam Chomsky. Chomsky defended a strictly rationalist methodological approach to linguistics and fiercely opposed any use of external data. His objections to the use of external data in linguistics were numerous. We will briefly review them, to show in what ways most of them have lost their raison d’être in the context of current research.

      Chomsky’s