The translator, an endangered species?
Associate Professor at the School of Applied Language and Intercultural Studies as well as at the ADAPT Centre at Dublin City University, Ireland.
The first public machine translation experiment took place in 1954. Led by researchers from IBM and Georgetown University in Washington, DC, it was intended to make high-quality automatic translation from Russian into English possible within a few years. Since this first attempt, claims that machines could soon replace translators have become commonplace. In 2018, Microsoft claimed that its Chinese-to-English news translations were of comparable quality to human translation. The paradox, however, is that although automatic translation systems are now accessible to most people, the number of people working in the translation industry is higher than ever before – approximately 600,000 people worldwide. In this context, do professionals really have a reason to worry?
In fact, the situation is more complex than it seems. Firstly, because translators themselves use digital translation tools. Translators who work on repetitive or iterative texts are likely to use translation memory, a labour-saving tool that recycles translations of sentences identical or similar to those translated previously. Many translators use machine translation and 'post-edit' the text generated by the machine. Some, however, prefer not to accept machine translation post-editing jobs, as they find them uninteresting and poorly remunerated.
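The recycling that a translation memory performs can be illustrated with a minimal sketch: given a new sentence, look up the stored source sentence that resembles it most closely and, if the similarity is high enough, reuse its translation. The sentences, translations, and similarity threshold below are invented for illustration; real tools use more sophisticated fuzzy-matching than this simple character-based ratio.

```python
from difflib import SequenceMatcher

# A toy translation memory: previously translated source/target pairs
# (both sentences and their French translations are invented examples).
memory = {
    "The device must be switched off before cleaning.":
        "L'appareil doit être éteint avant le nettoyage.",
    "Press the power button to restart the device.":
        "Appuyez sur le bouton d'alimentation pour redémarrer l'appareil.",
}

def best_match(new_sentence, tm, threshold=0.7):
    """Return the (source, translation) pair whose source sentence is most
    similar to new_sentence, or None if nothing clears the threshold."""
    best_score, best_pair = 0.0, None
    for source, target in tm.items():
        score = SequenceMatcher(None, new_sentence.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_pair = score, (source, target)
    return (best_pair, best_score) if best_score >= threshold else (None, best_score)

# A near-duplicate of a stored sentence: the tool proposes the old
# translation for the translator to review and adapt.
match, score = best_match("The device must be switched off before charging.", memory)
```

In this sketch the translator would be shown the stored translation alongside the similarity score, rather than having it applied automatically; the human still decides whether the recycled sentence fits.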
There are, of course, texts that are machine translated without any human input. Since more digital content is produced now than ever before, there are simply too few human translators to translate it all. A common rule of thumb is that the level of automation should be appropriate to the shelf-life of a text and to the risk posed by errors. This means that the translation of an online travel review or a tweet can be automated, whereas printed materials, marketing or medical texts, for example, require more human oversight.
Since the early days of automation, there has been a tendency to overestimate output quality. The hopes placed on machine translation in 1954 were disappointed, and the 2018 claim that news translations had reached human parity rested on very narrow evaluation criteria. In recent years, of course, these technologies have made great progress.
The earliest systems were based on hand-coded rules and bilingual dictionaries. Since the 1990s, however, systems have used previous human translations to compute the most statistically likely translation of a source sentence. By the 2000s, free machine translation had become ubiquitous, and around 2016 there was a leap forward in quality, brought about by neural machine translation (NMT). Loosely inspired by neural networks in the brain, these systems produce the most statistically likely translation of a sentence based on 'training data' – source sentences paired with their human translations.
Thanks to increased accessibility and quality, online translation systems have become more useful and popular than ever. In 2016, for example, Google announced that its Google Translate system produced over 143 billion words per day, even though its accuracy sometimes leaves room for improvement.
Mistranslation and bias
While NMT’s complex computations can produce fluent-sounding output, that fluency can mask serious errors. Common issues are mistranslations of nouns or verbs in a sentence that reads well but whose meaning differs from the source. Another pitfall is bias encoded in systems, which can lead them to assume that adjectives such as ‘beautiful’, ‘sassy’ or ‘sexy’ refer to women, and adjectives such as ‘decent’, ‘hard-nosed’ or ‘likeable’ refer to men.
These kinds of issues can be very difficult to spot and to resolve. Furthermore, ambiguous words that flummoxed early systems are still a problem. Very occasionally, NMT can also generate something entirely unexpected, output that researchers call ‘hallucinations’, which has caused media flurries. A 2018 report revealed examples of Google Translate rendering random combinations of letters as text resembling religious prophecies.
At present, the intention of the author and the purpose of a text cannot be encoded into a machine translation system. Its output tends to flatten lexical richness, defaulting to the most common words in its training data. For creative texts in particular, any trace of the author’s voice may be lost.
Despite its limitations, machine translation is frequently used, and post-editing has become an expected part of many translators' work: improved output quality and cost savings make the use of machine translation appealing. As most translators work on a freelance basis, they may not be in a strong position to refuse. Many subtitling jobs, for example, involve post-editing of machine translation to save on cost and turnaround time – often prompting viewer complaints about translation quality. On the machine translation of literature, Dr Ana Guerberof Arenas, Senior Lecturer in Translation and Multimodal Technologies at the University of Surrey in the United Kingdom, recently reported that, although readers may find the output comprehensible, they are less engaged with the narrative and enjoy it less than they do a human translation.
To summarize, machine translation can be efficient and useful – with or without human intervention. Despite this, there is a risk that heavily automated work processes may render the translating profession less attractive. That is a worry for the sustainability of both machine and human translation. However, we must not forget that, as Prof. Dorothy Kenny from the School of Applied Language and Intercultural Studies at Dublin City University in Ireland has noted, even if machine translation were to replace human translators, it would still depend on them for its training data and for its legitimacy.