Lancsbox software options for the prospective investigation of the multilingual corpus for European studies

Authors

Andrushenko O.Iu.

DOI:

https://doi.org/10.32589/2311-0821.1.2023.286180

Keywords:

European, LancsBox, corpus studies, corpus tools, automated analysis

Abstract

The paper presents a comparative analysis of the lexeme European in two varieties of English (British and American), based on the built-in corpora available in the LancsBox software (BE06 and AmE06 respectively), which comprise newspapers, fiction, and other genres. The investigation describes the procedure for carrying out linguistic research as part of a project taught in the course “Multilingual Corpus and its Resources for European Studies (KNLU)” (Erasmus+ Program). LancsBox, a user-friendly program that runs on the major operating systems, has proved to be a powerful manager for compiling corpora and working with existing ones. It visualizes textual data through its tools (KWIC, GraphColl, Words, Ngrams, Wizard, etc.), which are essential for studying a specific linguistic unit. Statistical analysis of both corpora has revealed that European is a lexeme seldom employed in the language. Comparison of the two varieties has shown that the word has similar top-ten collocates in both; however, the GraphColl visualization has indicated major differences between the two corpora. In particular, N+N structures are more commonly employed and more vibrant in the British English corpus than in the American English corpus. The t-test has shown a statistically significant difference between the corpora with regard to the linguistic variable European. These data may testify to cultural differences between the users of the two varieties, given that both corpora represent the same time frame.
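The kind of comparison the abstract summarizes (frequency of one word in two corpora, followed by a t-test) can be sketched as below. This is an illustrative sketch only, not the procedure used in the paper or inside LancsBox: the per-text counts are hypothetical toy values, the helper names `relative_frequency` and `welch_t` are invented for the example, and Welch's unequal-variance t-test is an assumption, since the abstract does not say which variant was applied.

```python
from math import sqrt
from statistics import mean, variance


def relative_frequency(hits: int, tokens: int, per: int = 1_000_000) -> float:
    """Normalise a raw count to occurrences per `per` tokens."""
    return hits / tokens * per


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical (hits of "European", tokens) per text; NOT data from the paper
british_texts = [(3, 2000), (5, 2000), (4, 2000), (6, 2000)]
american_texts = [(1, 2000), (2, 2000), (1, 2000), (2, 2000)]

bre = [relative_frequency(h, n) for h, n in british_texts]
ame = [relative_frequency(h, n) for h, n in american_texts]

t_stat, dof = welch_t(bre, ame)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The t statistic would then be compared against the t distribution with the computed degrees of freedom to obtain a p-value; a statistics library can do that step if one is available.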

References

Andrushenko, O. (2021). Information-structural transformations of additive adverb EVEN (a case study of the English language written records and corpora of the XII-XVII c.). Messenger of Kyiv National Linguistic University. Series Philology. Volume 24, No. 1, pp. 16-32. DOI: 10.32589/2311-0821.24%20(1).2021.236109.

Andrushenko, O. (2022). The Scope of just: evidence from information-structure annotation in diachronic English Corpora. In N. Sharonova, V. Lytvyn, et al. (Eds.), Proceedings of the 6th international conference on computational linguistics and intelligent systems (COLINS 2022), Vol. I: Main Conference, Gliwice, Poland, May 12-13, 2022 (pp. 677–696). Available online: https://ceur-ws.org/Vol-3171/paper51.pdf

Andrushenko, O. (2023). Particularizing focus markers in Old English: just the case of adverb polysemy? Lege Artis: Language yesterday, today, tomorrow. (Accepted for publication, date of publication: December 2023).

Andrushenko О.Iu. Lancsbox software options for the prospective investigation of the multilingual corpus for European studies

Anokhina, T. (2023). Newspaper subcorpus (subcorpus of the modern European media) in the structure of the multilingual corpus. Philological Treatises. Volume 15, No. 1, pp. 7-15. DOI: 10.21272/Ftrk.2023.15(1)-1.

Baker, P. (2009). The BE06 Corpus of British English and recent language change. International Journal of Corpus Linguistics, 14 (3), 312–337. DOI: 10.1075/ijcl.14.3.02.bak.

Baker, P. (2010). Corpus methods in linguistics. In L. Litosseliti (Ed.), Research methods in linguistics (pp. 93–113). London, New York: Continuum.

Brezina, V. (2018). Statistics in corpus linguistics: A practical guide. Cambridge: Cambridge University Press.

Brezina, V., McEnery, T., & Wattam, S. (2015). Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20 (2), 139–173.

Brezina, V., Porizka, P. (2021). Kolokační grafy a sítě s použitím nástroje #LancsBox: aplikace v angličtině a češtině. Časopis pro moderní filologii, 103, Č. 1, 36–59. DOI: 10.14712/23366591.2021.1.

Brezina, V., Timperley, M., & McEnery, T. (2018). #LancsBox 4.x [software]. Available online: http://corpora.lancs.ac.uk/lancsbox.

Brezina, V., Weill-Tessier, P., & McEnery, T. (2020). #LancsBox 5.x and 6.x [software]. Available online: http://corpora.lancs.ac.uk/lancsbox.

Collins, L. (2019). Corpus linguistics for online communication: A guide for research. New York: Routledge.

Davies, M. (2019). The best of both worlds: Multi-billion word “dynamic” corpora. In P. Banski et al. (Eds.), Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019 (pp. 23–28). Mannheim: Leibniz-Institut für Deutsche Sprache. DOI: 10.14618/ids.pub.8998.

Gries, S. (2013). 50-something years of work on collocations: What is or should be next.... International Journal of Corpus Linguistics, 18(1), 137–166. DOI: 10.1075/ijcl.18.1.09gri.

Johansson, S. (2009). Some aspects of the development of corpus linguistics in the 1970s and 1980s. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (pp. 33–53). Berlin: De Gruyter.

Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography, 1 (1), 7–36. DOI: 10.1007/s40607-014-0009-9.

Lange, C. & Leuckert, S. (2020). Corpus linguistics for World Englishes: A guide for research. New York: Routledge.

Lavidas, N. & Haug, D.T.T. (2020). Postclassical Greek and treebanks for a diachronic analysis. In D. Rafiyenko & I. Seržant (Eds.), Postclassical Greek: contemporary approaches to philology and linguistics (pp. 163–202). Berlin: Walter de Gruyter.

Lawrence, S. (2019). A rite on the edge: The language of baptism and christening in the Church of England. London: SCM Press.

López-Couso, M. J., Méndez-Naya, A., Núñez-Pertejo, B. P., & Palacios-Martínez, I.M. (2016). Corpus linguistics on the move. Exploring and understanding English through corpora. Leiden, Boston: Brill Rodopi.

McEnery, T. & Hardie, A. (2015). Corpus Linguistics. New York: Routledge.

O’Keeffe, A., McCarthy, M. (2021). The Routledge Handbook of Corpus Linguistics. New York: Routledge.

Potts, A., & Baker, P. (2012). Does semantic tagging identify cultural change in British and American English? International Journal of Corpus Linguistics, 17 (3), 295–324.

Rissanen, M. (2009). Corpus linguistics and historical linguistics. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (pp. 53–68). Berlin: De Gruyter.

Stefanowitsch, A. (2020). Corpus linguistics: a guide to the methodology. Berlin: Language Science.

Whitt, R. (2018). Using diachronic corpora to understand the connection between genre and language change. In R. Whitt (Ed.), Diachronic corpora, genre, and language change (pp. 1–18). Amsterdam, Philadelphia: John Benjamins Publ.

Published

2023-08-31

Section

Articles