Introduction to Corpus Linguistics. Sandrine Zufferey

Publisher: John Wiley & Sons Limited

Genre: Educational literature

ISBN: 9781119779704

… make it possible to study the history of a language, going back to ancient French, for example. Contemporary corpora are used for studying language in a synchronic way, that is, at a given moment during its evolution, whereas historical corpora make it possible to carry out studies from a diachronic point of view, that is, on the evolution of language.

      In this chapter, we have defined corpus linguistics as an empirical discipline, that is, one based on the observation of real data. We have also seen that corpus linguistics often relies on a quantitative methodology, studying a large sample of data that is representative of the phenomenon under study, with the aim of generalizing the observations to the language as a whole or to one of its registers. We have shown that the main difference between corpus linguistics and experimental linguistics lies in the way empirical data are collected. In corpus linguistics, data are collected in a natural context and then observed, whereas in experimental linguistics, one or more causes are manipulated within a controlled context in order to observe their effects. Finally, we have seen that corpora can be very diverse in nature: they may be compiled once and for all or expanded incrementally, general or specialized, annotated or not, monolingual or multilingual, synchronic or diachronic.

      1.9.1. Questions

      1) Which of the following disciplines traditionally involve a rationalist methodology, and which are based on an empirical methodology? Can we think of any situation in which a discipline of a rather empirical nature could have recourse to a rationalist methodology, and vice versa?

      chemistry – ethics – medicine – law – anthropology

      2) Which of Chomsky’s objections to corpus linguistics can also be applied to the experimental methodology?

      3) In the research projects mentioned below, which one seems to use corpora as a methodological tool (corpus-based) and which seems to use corpora as a theoretical tool (corpus-driven)?

      a) Search in a corpus for all passive voice sentences in order to formulate the rules governing the use of this construction in French.

      b) Search in a corpus for all passive voice sentences in order to determine whether they are used more with state verbs than with activity verbs.

      4) Why have computing tools especially devoted to corpus linguistics been developed? What are their main functions?

      5) Think of an example of a quantitative study and of a qualitative study that could be carried out to determine the most common types of spelling mistakes made by children. What would be the specific contributions of each of these studies?

      6) How could a corpus study and an experiment be combined in a complementary way to study the different types of spelling errors?

      7) What type of corpus should be used to address each of the research questions below?

      a) Study of the pronunciation of vowels in French-speaking Switzerland.

      b) Study of the evolution of word construction using the prefix hyper- in French.

      c) Study of possible translations of idioms from French into English.

      1.9.2. Answer key

      1) First of all, let us recall that the rationalist methodology draws on the knowledge of the researcher by means of introspection and reasoning, whereas the empirical methodology looks for answers by observing or experimenting on data that are external to the researcher. Chemistry is typically an empirical science, which makes extensive use of experimentation and observation. Ethics is a philosophical discipline that involves reflection on moral questions. Such reflection is, by nature, introspective and involves a rationalist methodology. Law is a science that studies the rules and laws that govern social relationships. Many aspects of the law involve the interpretation of existing rules or the creation of rules based on reasoning and common sense. Thus, introspection plays a big role. That being said, in certain cases, law also deals with external data. For example, previous decisions (case law) can be searched in order to find a similar case that could apply to a given situation. The role of case law varies considerably across legal systems. In English-speaking countries, which apply the common law, previous cases play a fundamental role, because they become binding rules for deciding subsequent cases. We can therefore say that in these countries, empiricism also plays a very important part in the application of the law. Anthropology is a discipline that studies humanity in its various aspects (physiological, social and cultural). This discipline places great importance on the observation of data. Although we can generally classify a discipline as being rather empirical or rather rationalist in nature, we should bear in mind that the two methodologies are often both present, to varying degrees. For instance, we have already discussed the case of law, which involves not only an introspective element but also the use of external data in the form of case law. We can also imagine other situations in which the methodologies interact. For example, we have classified ethics as a rationalist discipline. Nevertheless, ethics has also been built on empirical material. Medical ethics, in particular, is based on facts observed in practice.

      2) Chomsky notably criticized corpus linguistics for offering only a partial view of language, insofar as a corpus includes the productions of a limited number of speakers in a given situation. The same observation applies to the experimental methodology, which tests a small number of speakers on a limited number of linguistic stimuli. The main response to this criticism is that both approaches rely on quantitative methods (namely inferential statistics), which make it possible to draw conclusions from a sample and to extrapolate them to an entire population. The criticism concerning the potentially problematic choice of subjects, who could be aphasic and thus not represent the normal use of language, also applies to the experimental methodology, since in theory such subjects could also be recruited for an experiment by mistake. That being said, good practice in both corpus linguistics and experimental linguistics requires obtaining information about participants beforehand, which can help eliminate this type of bias. Typically, researchers verify that the people who contribute to a French corpus are native French speakers. Likewise, they test the language skills of speakers rather than considering them by default as French-speaking, bilingual, etc.

      b) On the other hand, this type of research is corpus-based, because it starts from a hypothesis (e.g. “passive sentences tend to be used more frequently with state verbs”) and seeks to verify it in the corpus, which thus serves only as an analysis tool.

      4) These tools have been developed to simplify searches within a corpus, which would be very inconvenient to carry out with the standard tools found in a word processor, for example. In particular, concordancers make it very easy to extract all the occurrences of a word or an expression with its left and right context, as well as to determine its main collocations. These tools also help us create a list of all the words in the corpus, sorted by frequency. When a corpus is compared with a reference corpus, these tools also make it possible to extract a list of keywords that are specific to the corpus studied. In the field of multilingual corpora, aligners …
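
      To make the functions listed in answer 4 more concrete, here is a minimal Python sketch of three of them: a concordance (keyword in context), a frequency-sorted word list and a keyword list obtained by comparison with a reference corpus. It is an illustration under stated assumptions, not the way any particular concordancer works: the function names, the naive tokenizer, the two toy texts and the smoothed frequency ratio used to rank keywords are all hypothetical choices made for this example. Real tools typically rely on more robust statistics for keyword extraction.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately naive)."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, node, width=2):
    """Return (left context, node, right context) for each occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def keywords(study_tokens, reference_tokens, top=5):
    """Rank words of the study corpus by a smoothed ratio of relative frequencies.

    Assumption: add-one smoothing and a plain ratio stand in for the
    statistical measures that dedicated tools would use.
    """
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    n_study, n_ref = len(study_tokens), len(reference_tokens)
    scores = {}
    for word, count in study.items():
        rel_study = (count + 1) / (n_study + 1)
        rel_ref = (reference[word] + 1) / (n_ref + 1)
        scores[word] = rel_study / rel_ref
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]


# Toy texts invented for this example; they are not taken from any real corpus.
study_text = "The corpus was compiled by the team. The corpus was annotated by hand."
reference_text = "The cat sat on the mat. The dog was fed by the child."

study_tokens = tokenize(study_text)
reference_tokens = tokenize(reference_text)

for left, node, right in concordance(study_tokens, "corpus"):
    print(f"{left:>12} [{node}] {right}")

print(frequency_list(study_tokens)[:4])
print(keywords(study_tokens, reference_tokens, top=3))
```

      In this sketch, the width parameter sets how many words of context are shown on each side of the search word, which corresponds to the left and right context mentioned above.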