Multilingual corpus and its software for European studies research

Authors

DOI:

https://doi.org/10.32589/2311-0821.1.2023.286184

Keywords:

multilingual corpus, combined type of text corpus, European Studies, computer programs, corpus managers

Abstract

The paper proposes a method for working with the functionality of the computer programs AntConc, WordSmith, WordList, MonoConc Pro, CATMA, and KORP, which can be used to study multilingual corpus texts on the topic of European Studies. It considers various debated views of foreign corpus linguists on the content of the concept of the multilingual corpus. It also formulates a working definition of the multilingual comparative corpus as a combined type of thematically oriented corpus of texts in different languages, grouped into sub-corpora, with their translations into other languages (or with the possibility of using computer programs to translate the texts). The paper further defines the role of corpus linguistic statistics, which the analyzed programs implement and which makes it possible to calculate the frequency of words or collocations, construct diagrams of word or collocation frequency across sub-corpora, etc. The preliminary conclusion is that the toolkit of the corpus managers AntConc, WordSmith, WordList, MonoConc Pro, CATMA, and KORP makes it possible to construct both individual Key Word in Context (KWIC) and concordance lists of search elements on the subject of European Studies; distinguish the contextual meanings of search units by their most probable left-hand and right-hand valency in different languages; view the results of statistical processing of information from corpus tags; save and print the results; and work with different text data formats (txt, doc, rtf, html, etc.).
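The two core operations named in the abstract, frequency counts per sub-corpus and KWIC lines with left and right context, can be illustrated with a minimal Python sketch. It does not reproduce how AntConc, WordSmith, or the other corpus managers work internally. The function names, the three-token context window, and the toy English and German sub-corpora are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(subcorpora):
    """Return a Counter of token frequencies for each named sub-corpus."""
    return {name: Counter(tokenize(text)) for name, text in subcorpora.items()}


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for every keyword hit."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Hypothetical toy sub-corpora, invented for this example only.
subcorpora = {
    "en": "The European Union supports European integration. The Union grows.",
    "de": "Die Europäische Union fördert die europäische Integration.",
}

freqs = word_frequencies(subcorpora)
print(freqs["en"].most_common(3))

# Print one aligned concordance line per hit of the search word.
for left, kw, right in kwic(subcorpora["en"], "union"):
    print(f"{left:>30} [{kw}] {right}")
```

The left and right context windows returned by `kwic` correspond to the left-hand and right-hand environments of a search unit mentioned above. A real study would rely on the corpus managers' own tokenization and tagging rather than this simple regular expression.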

Published

2023-08-31

Issue

Section

Articles