OPUS corpus toolkit for ensuring intelligent translation (case study of L1 and L2 texts of English-Ukrainian film discourse)
DOI:
https://doi.org/10.32589/2311-0821.2.2022.274929
Keywords:
translation memory, Computer-Aided Translation, Machine Translation, Parallel corpus toolkit, Corpus Linguistics, OPUS
Abstract
The article explains the concept of “translation memory” and defines it as a computer database that stores segments of L1 texts from different discourses together with their L2 equivalents. Computer-Aided Translation, Machine Translation and the parallel corpus toolkit are outlined as the main types of translation memory. Computer-Aided Translation is treated as the process of translating an L1 text into L2 with the help of specialized computer software. The human translator plays a central role in this process, because the L1 text undergoes three types of processing: pre-, inter- and post-editing. Machine Translation is viewed in a narrow sense as the translation of a text from L1 to L2 performed wholly or partly by a computer, and in a broad sense as a field of research at the intersection of Linguistics, Mathematics and Cybernetics that aims to build systems implementing Machine Translation in the narrow sense. A parallel corpus toolkit is a database of L1 and L2 texts covering a large number of discourses, issues and topics. Particular attention is paid to the OPUS corpus toolkit as one type of translation memory that supports efficient intelligent translation. OPUS is currently a free corpus system in the public domain that contains corpora of texts from L1 and L2 to L3...Ln drawn from numerous Internet resources and is constantly updated. Testing showed that the OPUS corpus toolkit is effective for verifying one-, two- and three-component L2 lexical constructs, as illustrated with L1 and L2 text fragments from film discourse.
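The definition of translation memory as a store of L1 segments paired with L2 equivalents, and the idea of verifying a short L2 construct against aligned texts, can be sketched roughly as follows. This is only an illustrative sketch, not the article's method or the OPUS interface: the segment pairs are invented, the function names `lookup` and `count_cooccurrence` are hypothetical, and the fuzzy matching uses Python's standard `difflib` in place of whatever a real toolkit does.

```python
from difflib import SequenceMatcher

# Hypothetical aligned segment pairs (L1 English, L2 Ukrainian),
# invented for illustration; not taken from OPUS or from the article.
PAIRS = [
    ("Good morning", "Доброго ранку"),
    ("Thank you very much", "Дуже дякую"),
    ("See you tomorrow", "Побачимося завтра"),
]


def lookup(segment, pairs, threshold=0.7):
    """Return (score, l1, l2) for the most similar stored L1 segment.

    Returns None when no stored segment reaches the similarity threshold.
    """
    best = None
    for l1, l2 in pairs:
        score = SequenceMatcher(None, segment.lower(), l1.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, l1, l2)
    return best


def count_cooccurrence(l1_phrase, l2_phrase, pairs):
    """Count aligned pairs in which both the L1 phrase and the L2 candidate occur."""
    l1_phrase, l2_phrase = l1_phrase.lower(), l2_phrase.lower()
    return sum(
        1
        for l1, l2 in pairs
        if l1_phrase in l1.lower() and l2_phrase in l2.lower()
    )


if __name__ == "__main__":
    print(lookup("Good morning!", PAIRS))
    print(count_cooccurrence("thank you", "дякую", PAIRS))
```

A higher co-occurrence count across a large aligned corpus would give some support to a candidate L2 construct; substring matching on raw text is a simplification, and a real workflow would likely tokenize and lemmatize both sides first.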