Putting African science in the dictionary

Many technical terms do not have an equivalent in African languages, depriving parts of the population of scientific knowledge and its impact in society. Researchers and experts from the entire continent have decided to do something about this by enriching the vocabulary of several languages.
© Francesc Roig

Nick Dall
Journalist based in Cape Town, South Africa. He has co-authored two books on South African history: Rogues’ Gallery and Spoilt Ballots.

For South African science journalist Sibusiso Biyela, writing about a new dinosaur discovery in his home language Zulu should have been an easy task. But when he sat down to write the piece, as he told the British journal Nature’s podcast, he found that he “didn’t have the words for relatively simple scientific terms like ‘fossil’ or even ‘dinosaur’.” Biyela remembers being extremely discouraged.

Another journalist might have taken the easy way out and ‘Zulufied’ the English words by adding an ‘i’ to the beginning, but Biyela felt uncomfortable with this approach. He ended up translating ‘dinosaur’ as Isilwane sasemandulo or ‘ancient animal’. When it came to ‘fossils’, he took things even more literally, translating them as Amathambo amadala atholakala emhlabathini meaning “old bones found in the ground”.

Not having the words to discuss certain topics is a problem faced by people across Africa

This was by no means the only time Biyela had encountered such problems. Not having the words to discuss even mildly technical topics is a problem people across Africa face every day. Linguistically, the continent, which has an estimated 2,000 indigenous languages, has been bypassed by science and many other spheres.

Constructing together

In 2019, a group of researchers from across the continent formed Masakhane (‘we build together’ in Zulu). This grassroots non-profit organization is “focused on developing language technology for African languages”, explains co-founder Jade Abbott, an expert in natural language processing (NLP). Initially the group was made up primarily of machine learning experts, but it has since grown to include linguists, engineers, political scientists and communicators like Biyela. Being scattered across more than forty countries, these experts have developed the habit of working online. So when the Covid-19 pandemic struck, they were prepared.

At the start, Masakhane focused on developing machine translation tools for as many African languages as possible. Today many of us take tools like Google Translate for granted and we assume that any web page we access can automatically be translated into our home language. But to this day, speakers of only a handful of Africa’s more than 2,000 languages have access to such a luxury.

The lack of data in African languages is hampering the development of machine translation tools

It is relatively easy to build machine translation tools, provided developers have access to data – something which is sorely lacking for the vast majority of African languages. For this reason the Masakhane team focused on showing that “working in a participatory manner, with humans who understand the tools and the languages, enables you to get better data”, says Jade Abbott.

A paper published in 2020, co-authored by fifty members of Masakhane in dozens of countries, won the Wikimedia Foundation Research Award. Examining the status quo for forty-eight of Africa’s most spoken languages, it provided a roadmap for establishing “machine translation benchmarks for over thirty languages” while also enabling people “without formal training to make a unique scientific contribution.”

Igbo, Swahili or Yoruba

Once this initial research phase was complete, Masakhane set about putting the theory into action. Their translation tool currently has working prototypes for six African languages (Igbo, Lingala, Shona, Swahili, Tshiluba and Yoruba). Abbott expects it to be a work in progress for several years. The team will also be exploring how to best make this tool accessible, as everyone involved is very keen “to make sure that the tools are used to improve communities in Africa rather than boosting profits for digital platforms”.

Masakhane’s members have produced over 200 academic papers and the organization has sanctioned seven other major projects. One of these is Decolonise Science, a collaboration with AfricArXiv, an African digital archive working towards building an open scholarly repository, and ScienceLink, an open access scientific platform based in the Netherlands. Biyela, the Zulu journalist, is heavily involved in the project.

When the project kicked off in 2021, the initial goal was to translate around 200 scientific papers into six African languages. But the team soon realised that this was a nearly impossible task which would require the creation of hundreds of new terms (‘dinosaur’, ‘fossil’, etc) for each paper. A more realistic, revised goal will see the group translate the abstracts of 180 papers (which have already been selected via an intensive process that considers the field, impact, and geographic and gender diversity of the research) into the six languages while also generating five new terms for each paper.

If ‘decolonisation’ sounds like a destructive process of tearing down existing edifices, for Masakhane it is more about building new ones. “What happens for a lot of speakers of indigenous languages,” explains Biyela, “is that we can talk about sports and politics and other topics in our home language, but when it comes to talking about science or technology [we] have to code switch. [...] This can be problematic because it paints science as this foreign visitor that’s invading the conversation”, he adds. This situation is not without consequences, notably in the health domain. When facing people who are hesitant about getting vaccinated, for example, “you can’t really explain what mRNA or immunology is in your home language.” If Masakhane has its way, this will all be changing very soon.

Translation: from one world to another
April-June 2022