Siri, how do you say “hello” in Norwegian?
I can’t translate into Norwegian yet.
How about Punjabi?
I can’t translate into Punjabi yet.
How about Korean?
I can’t translate into Korean yet.
Siri’s lack of translation ability is indicative of a trend known as the digital language barrier: many languages still aren’t available or translatable online, despite having millions of native speakers. This is particularly true of languages in non-Western regions, such as Asia and sub-Saharan Africa. For example, Maithili is spoken by about 15 million people in India and 3 million in Nepal, yet has no presence on Google Translate. Neither does Oromo, which boasts more than 30 million speakers in the Horn of Africa. And Siri can’t translate into widely spoken languages such as Punjabi (130 million speakers) or Korean (75 million speakers). How is this possible?
Why are some languages translatable on the internet while others aren't?
By contrast, most Western languages are available on Google Translate and on other technological services like Siri or Alexa. This is largely because the European Union has 24 official languages, and European Parliament documents must be human-translated and made available in all of them. These documents, which contain identical information in different languages, can in turn be used by machine translation systems to draw linguistic parallels and become smarter at translation.
Highly literate societies also have a greater wealth of language resources to draw from (novels, TV shows, newspapers, and so on), which can be fed into machine translation systems to improve translatability. The more content there is to draw from, the more accurate the translation. But this also means that regions that rely more heavily on oral tradition than on written documents, or where literacy is not as widespread, are at a distinct disadvantage when it comes to online translation.
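To make the idea of learning from parallel texts concrete, here is a minimal sketch in Python. It is only an illustration, not how Google Translate, Siri, or any real system works: the tiny English-French corpus is made up for the example, the function name best_translation is hypothetical, and the scoring method (a simple co-occurrence measure known as the Dice coefficient) is a stand-in for far more sophisticated statistical and neural techniques.

```python
from collections import Counter

# Toy parallel corpus (illustrative only): each English sentence is
# paired with a French sentence carrying the same meaning.
PARALLEL = [
    ("the house is small", "la maison est petite"),
    ("the house is big", "la maison est grande"),
    ("the car is small", "la voiture est petite"),
    ("the car is big", "la voiture est grande"),
    ("a small house", "une petite maison"),
]


def best_translation(word, pairs):
    """Guess the target-language word that most often appears alongside `word`.

    Scores each candidate with the Dice coefficient:
    2 * (pairs containing both words) / (pairs with source + pairs with target).
    Returns None if `word` never appears in the source sentences.
    """
    src_count = 0          # sentence pairs whose source side contains `word`
    tgt_count = Counter()  # sentence pairs containing each target word
    cooc = Counter()       # sentence pairs containing `word` and each target word
    for src, tgt in pairs:
        tgt_words = set(tgt.split())
        tgt_count.update(tgt_words)
        if word in set(src.split()):
            src_count += 1
            cooc.update(tgt_words)
    if src_count == 0:
        return None
    scores = {t: 2 * c / (src_count + tgt_count[t]) for t, c in cooc.items()}
    return max(scores, key=scores.get)


for english in ("house", "car", "small", "big"):
    print(english, "->", best_translation(english, PARALLEL))
```

With only five sentence pairs, the sketch can already pair "house" with "maison" and "small" with "petite", simply because those words keep showing up together. It cannot, however, separate "the" from "is": their French counterparts "la" and "est" appear in exactly the same sentences, so their scores tie. More parallel text is what breaks ties like this, which is why languages with little written, translated material are so much harder for machines to learn.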
Why does this matter?
With 72% of people preferring to read online content in their native language, leaving certain populations out means many missed opportunities. For example, speakers of these mid-size languages might not have the same access to educational information or news sources as speakers of more common languages like English, Mandarin, or Arabic.
International businesses may not be able to reach these population segments because their websites will not auto-translate into the appropriate language. In more extreme circumstances, rescue teams and aid organizations trying to help during a natural disaster may face an insurmountable language barrier in an area where one of these mid-size languages is predominantly used.
Internet translation is crucial in this interconnected world, and consequences could be dire without it.
What can we do about it?
The digital language barrier will persist in the absence of strategic action. But there may be a solution on the horizon. In recent years, the US Defense Advanced Research Projects Agency (DARPA) has partnered with dozens of companies and universities with the goal of creating an automatic translation system for these “forgotten” languages.
Because these languages lack extensive parallel texts like the European Union’s human-translated documents, DARPA uses text from social media and real conversations among native speakers to learn basic vocabulary, sentiment, and grammar in the world’s 7,000 languages. A machine can then build a database for the language and start learning on its own as more information is gathered. While machine-translated text may not be as accurate as human-translated text, in extreme situations this program will provide enough information and context to enable effective communication. Once this initial step is taken, closing the digital divide will become easier.
Ready to help create a world in which language is no longer a barrier? At ULG we strive to transform language barriers into opportunities by providing language solutions that help businesses in the global marketplace. Contact us today to learn more.