OPUS corpus toolkit for ensuring intelligent translation (case study of L1 and L2 texts of English-Ukrainian film discourse)





translation memory, Computer-Aided Translation, Machine Translation, Parallel corpus toolkit, Corpus Linguistics, OPUS


The article explains the concept of “translation memory” and defines it as a computer database that stores segments of L1 texts from different discourses together with their L2 equivalents. Computer-Aided Translation, Machine Translation and the parallel corpus toolkit are outlined as the main types of translation memory. Computer-Aided Translation is considered as the process of translating an L1 text into L2 using specialized computer software. The human factor plays a key role in this process, because the L1 text undergoes three types of processing: pre-, inter- and post-editing. Machine Translation is viewed in a narrow sense as the process of translating a text from L1 to L2 that is performed wholly or partly by a computer, and in a broad sense as a branch of research at the intersection of Linguistics, Mathematics and Cybernetics that aims to build a system implementing Machine Translation in the narrow sense. A parallel corpus toolkit is a database of L1 and L2 texts that covers a large number of discourses, issues and topics. Particular attention is paid to the OPUS corpus toolkit as one type of translation memory that supports efficient intelligent translation. OPUS is currently a free, publicly accessible corpus system that contains corpora of texts from L1 and L2 to L3...Ln drawn from numerous Internet resources, and it is constantly updated. The tested capabilities of the OPUS corpus toolkit proved effective for verifying one-, two- and three-component L2 lexical constructs, as illustrated by L1 and L2 text fragments from film discourse.
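The definition of translation memory given above, a store of L1 segments paired with their L2 equivalents, can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample English-Ukrainian pairs, the `lookup` function and the 0.75 similarity threshold are hypothetical and are not taken from the article or from the OPUS toolkit.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: L1 (English) segments
# mapped to their L2 (Ukrainian) equivalents.
TRANSLATION_MEMORY = {
    "Good morning": "Доброго ранку",
    "Thank you very much": "Дуже дякую",
    "See you tomorrow": "Побачимося завтра",
}


def lookup(segment, memory, threshold=0.75):
    """Return (l1_segment, l2_equivalent, score) for the closest stored
    L1 segment, or None if no stored segment is similar enough."""
    best = None
    for source, target in memory.items():
        # Case-insensitive character-level similarity in the range 0..1.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Exact and near matches return the stored L2 equivalent;
# an unrelated segment returns None.
print(lookup("Good morning", TRANSLATION_MEMORY))
print(lookup("good morning!", TRANSLATION_MEMORY))
print(lookup("zzz qqq", TRANSLATION_MEMORY))
```

The fuzzy match stands in for the way a translator might reuse a stored equivalent for a segment that differs only slightly from one already in the memory; a real system would work with far larger, aligned corpora.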

