Probably the biggest project in African NLP is language translation. Yet we don't have a free, publicly available, comprehensive dictionary for the largest language in the 2nd largest economy in Africa: Zulu. There is not even 1 Zulu thesaurus, written in Zulu, available anywhere online.
The Zulu dictionaries that are online (all 2 of them) are user-generated and far too incomplete to do anything interesting NLP-wise or for language acquisition. Zulu is a large language in a powerful economy, yet it is still significantly under-resourced.
If Zulu can't, what hope do other minority languages have? If the minority languages cannot, what about the even smaller, marginal African languages that are facing extinction in the next 20 years? Yet it seems the majority of the NLP community is focusing on language translation...
Beyond the lack of resources and datasets, we have not even talked about 1) how, in the resources we do have, the Latin script is severely deficient for expressing ZU and other Bantu languages, and I suspect this limits the ability to do downstream NLP tasks well... and
2) the fact that much of the translated text comes from missionaries and religious institutions who literally bastardized ZU and other Bantu languages in their zeal to convert and colonize, or from the many lazy, inadequate translations that "africanize" English words.
Are this colonial-era literature and these lazy colonial translations of English words the corpora that we will use for our language models and other downstream NLP tasks? Will we, in "decolonizing AI", ironically cement the colonial legacy in the attempt to preserve "African" languages?
In the zeal to advance African NLP and improve access to technology and the web through indigenous languages, is there a moment for the basics: for activism, to demand basic human/cultural rights from governments and institutions, such as pure encyclopedias, dictionaries, and thesauri in our languages?
You can follow @sabelonow.