Introduction to Corpus Linguistics. Sandrine Zufferey

Publisher: John Wiley & Sons Limited

Genre: Educational literature

ISBN: 9781119779704

Research paradigms involving a qualitative methodology typically rely on questionnaires with open questions, interviews, observations or introspective techniques, such as think-aloud protocols. For example, in order to study differences in the way men and women express emotions, a qualitative methodology could involve asking a small number of speakers, for example three men and three women, to describe the way in which they express their emotions, either by talking freely with the experimenter or by talking to each other. The analysis would then require an in-depth study of some of the examples found interesting during the discussion.

      One of the main criticisms aimed at qualitative methods is that they are very subjective in nature, insofar as they are largely based on the interpretations made by linguists and the subjective impressions of a few speakers. Thus, the specific cases they describe often cannot be generalized to a population, which, incidentally, is not the aim of such studies. Rather than generalizing results, these studies rely on the possibility of transferring insights from one particular situation in order to understand another with which it shares common traits. For example, an in-depth case study on the difficulties of expressing emotions in an aphasic patient may help to highlight similar difficulties in other patients with the same disorder.

      For example, if we want to know whether advanced learners of French as a foreign language are able to use collocations as native speakers do (collocations such as “prendre une décision”, to make a decision, or “pleuvoir à verse”, to pour with rain), we can search for occurrences of these expressions in text corpora produced by learners and compare the number of times they appear, and their relative frequency, with the figures found in a corpus of similar texts produced by native speakers. By comparing these frequencies through statistical tests, we will know whether or not learners actually use these expressions as often as native speakers do. Even if we find a difference between the two groups, this study will not tell us why learners do not use these expressions as often as native speakers do, or which expressions they use instead. To find out, we can complement this study with a qualitative analysis, by observing, for example, which words other than the verb prendre often accompany the noun décision in French. If we observe that English-speaking learners, but not German-speaking learners, repeatedly use the verb faire (make) rather than prendre (take), we can conclude that these errors may stem from transfer from their mother tongue and, more specifically, from the English expression to make a decision.
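
      The frequency comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts and corpus sizes are invented, the function names are ours, and the text does not say which statistical test should be used. Here we use the two-term log-likelihood statistic (G2), which is one common choice for comparing the frequency of an item in two corpora.

```python
import math


def per_10k(count, corpus_size):
    """Return a normalized frequency: occurrences per 10,000 words."""
    return count / corpus_size * 10_000


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-term log-likelihood (G2) for one expression in two corpora."""
    total_count = count_a + count_b
    total_size = size_a + size_b
    g2 = 0.0
    for observed, size in ((count_a, size_a), (count_b, size_b)):
        # Expected count if the expression were equally frequent in both corpora.
        expected = size * total_count / total_size
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts for "prendre une décision" (invented for illustration).
learner_count, learner_size = 18, 200_000
native_count, native_size = 45, 200_000

learner_freq = per_10k(learner_count, learner_size)
native_freq = per_10k(native_count, native_size)
g2 = log_likelihood(learner_count, learner_size, native_count, native_size)

# 3.84 is the chi-square critical value for 1 degree of freedom at p < 0.05.
significant = g2 > 3.84
print(f"learners: {learner_freq:.2f} per 10,000 words")
print(f"natives:  {native_freq:.2f} per 10,000 words")
print(f"G2 = {g2:.2f}, significant at p < 0.05: {significant}")
```

      Normalizing the counts matters whenever the two corpora differ in size, since raw counts are then not directly comparable. As the text points out, a significant result of this kind shows that a difference exists, but not why it exists.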

      In summary, a corpus can be analyzed using a quantitative or qualitative methodology. While we acknowledge the use and importance of combining these two approaches, in the rest of the book we will focus on the quantitative approach to corpus linguistics, which poses its own theoretical and methodological challenges.

      Corpus linguistics and experimental linguistics share very important methodological properties, since both are empirical in nature and both generally involve a quantitative rather than a qualitative approach. However, these two approaches differ on one very important point. On the one hand, corpus linguistics focuses on the observation of data as found in collections of texts, recordings, etc. On the other hand, experimental linguistics relies on the manipulation of one or more variables in order to study their effect on other variables.

      Let us imagine once again that we are interested in the types of language errors produced by learners of French. By means of a corpus study, we will be able to identify all the types of errors produced and then quantify each of them: for example, 30 spelling mistakes, 12 lexical errors, 20 syntax mistakes, etc., per 100 words. Then, by applying statistical tests, we will be able to determine whether one of the error categories is significantly more frequent than the others. We will also be able to compare the number of errors produced in each category by students of different levels and, thanks to statistical tests, determine whether students progress significantly faster in certain categories than in others. In contrast, what a corpus study will not help us to do is establish with certainty the factors influencing the number of errors. The corpus only shows us the result of the speakers’ production, but not what led to these results. In order to determine the factors that lead learners to make mistakes or not, we will need to resort to experimental methodology.
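
      As a rough illustration of the kind of test mentioned above, the sketch below applies a chi-square goodness-of-fit test to the three error counts given in the example. Several points are assumptions on our part: the text does not name a specific test, the categories hidden behind "etc." are left out because they are not specified, and the figures are treated as raw counts. The function name is ours.

```python
import math


def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Error counts per 100 words from the example in the text. The "etc."
# categories are not specified there, so only the three named ones are used.
errors = {"spelling": 30, "lexical": 12, "syntax": 20}

chi2 = chi_square_gof(list(errors.values()))
df = len(errors) - 1  # degrees of freedom: number of categories minus one

# For exactly 2 degrees of freedom, the chi-square p-value has the closed
# form exp(-chi2 / 2); other values of df need a statistics library.
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
```

      A test of this kind only tells us whether the errors are spread unevenly across the categories. Deciding which category stands out requires looking at how far each observed count lies from its expected value.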

      When we conduct an experiment, the goal is to manipulate the possible causes and then to observe their effects. Going back to our example research question, we may wonder what makes some students produce more errors than others, and what makes the same student produce more errors in some contexts than in others. As regards the difference between students, we may think that one possible cause is the level of general intelligence of each student, the assumption being that, overall, more intelligent students should produce fewer errors than less intelligent ones. The level of intelligence thus constitutes the cause that we will manipulate in order to observe its effect on the number of errors produced. In order to measure the effect of the intelligence variable, we will first need to measure the students’ intelligence, for example by means of an IQ test. We will then use the result of this test to determine whether the students who have a higher IQ are also the ones who make the fewest language errors.
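
      One simple way to check whether students with a higher IQ make fewer errors is to compute a correlation coefficient between the two measures. The sketch below uses Pearson's r, which is our choice rather than a method named in the text, and the scores and error counts are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data for six students (invented for illustration).
iq_scores = [95, 100, 105, 110, 120, 130]
error_counts = [14, 12, 11, 9, 7, 4]

r = pearson_r(iq_scores, error_counts)
print(f"r = {r:.2f}")
```

      A value of r close to -1 would indicate that students with higher scores tend to make fewer errors, while a value close to 0 would indicate no linear relationship between the two measures.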

      The study of linguistic productions in a corpus and the manipulation of experimental variables both have their advantages and disadvantages. On the one hand, corpus linguistics has the advantage of favoring the observation of natural data, that is, data which are not influenced by an experimental context. A corpus of journalistic texts includes real productions by journalists, which were not produced for the purpose of being observed. Likewise, a text produced by a learner is also natural, insofar as it is produced under its usual conditions, without any particular manipulation. In addition, the use of corpora favors the observation of a very large amount of linguistic data, whereas experiments are based on a limited number of linguistic items so that the task remains feasible for participants, who would not be able to read thousands of sentences in a laboratory, for example. Finally, once a corpus has been created, it can be used for numerous research questions without requiring any additional time or financial cost. Experiments, on the other hand, require a significant investment of time and usually involve compensating participants financially for their cooperation.

      Experimental studies also have definite advantages over corpus studies. The first advantage, mentioned above, is that experiments allow us to test the existence