The surge in civic tech and the need for inclusive AI in Africa » Capital News
By Kavengi Kitonga and Dr. Shikoh Gitau
In recent weeks, Kenya has experienced an unprecedented surge in Civic Tech development, driven by the passionate activism of Gen Z. As young Kenyans took to the streets to protest the Kenya Finance Bill 2024, a parallel digital movement emerged to educate and empower even more citizens.
The bill was quickly dissected and explained in bite-sized videos shared on platforms like TikTok, X, and Instagram. The videos, initially in English and aimed at Gen Z, then spread to WhatsApp groups and were soon translated into more than 10 local languages to reach parents and rural communities. This grassroots effort was mostly voluntary, highlighting the urgent need for accessible information.
In contrast, a Large Language Model (LLM) for the Finance Bill was released only in English and Kiswahili, underscoring a significant gap in the inclusivity of AI ecosystems in Africa. Our observations revealed two key issues. First, African languages remain predominantly oral and are critically underrepresented in AI. Second, a different approach to creating language datasets is needed, as evidenced by the success of the viral explainer videos.
Coincidentally, Kenya recently hosted the World Kiswahili Language Day celebrations from July 5 to 7, featuring a series of events, including Usiku wa Mswahili. Established by UNESCO in November 2021, this day recognizes Kiswahili’s role in cultural preservation, awareness creation, expression, and social participation. Kiswahili is the most widely spoken language in Sub-Saharan Africa, with over 200 million speakers across more than 14 countries, and is one of the top 10 most spoken languages globally.
It serves as an official language of the East African Community, the Southern African Development Community, and the African Union. Despite its widespread use, Kiswahili's digital presence is limited, making it a low-resource language. This scarcity of digitized text and speech data is a common issue for many African languages, given the continent's linguistic diversity, with over 2,000 languages spoken.
The lack of digital data is exclusionary, especially in the era of artificial intelligence (AI), which has seen the proliferation of language applications (text-to-speech, machine translation), virtual assistants like Alexa and Siri, and tools such as ChatGPT, Llama 2, and Mistral AI. The development of these tools involves collecting language data, training models, and deploying the tools. The shortage of African language datasets limits the ability of AI researchers and Natural Language Processing (NLP) practitioners to build relevant tools. This intensifies the digital divide, mutes the digital presence of millions of Africans, and limits their economic opportunities.
Creating and expanding language datasets is crucial for developing bespoke African models and AI tools. However, this task is enormous due to the continent’s linguistic diversity and the various dialects within languages. With limited resources (time, talent, and finances), an efficient data creation/expansion pathway is necessary. Indexes like the Government AI Readiness Index 2023 are useful in assessing countries’ preparedness to integrate AI within the public sector. A similar Global Language Readiness Index could be invaluable for prioritizing language data efforts. Through collaborative efforts among AI and NLP practitioners, such an index could outline critical pillars and indicators for gauging language readiness. It would help identify gaps in making African languages AI-ready, prioritize data collection/expansion efforts, and design efficient strategies for language data collection and expansion.
An index would provide a systematic, efficient approach to creating and expanding African language datasets, accelerating the innovation of African language tools and applications. This would ensure that speakers of African languages can access language tools in their own tongue, narrowing the digital divide, integrating African voices, and fostering accelerated economic growth.
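To make the idea of pillars and indicators concrete, here is a minimal sketch of how such an index might combine them into a single score and rank languages by need. Everything in it is an assumption for illustration: the pillar names, the weights, the 0-to-1 scale, and the placeholder languages and values are hypothetical and are not drawn from any existing index or measured data.

```python
# Hypothetical pillars and weights; a real index would define these
# through collaboration among AI and NLP practitioners.
PILLAR_WEIGHTS = {
    "text_data": 0.4,
    "speech_data": 0.4,
    "tools_and_benchmarks": 0.2,
}


def readiness_score(indicators: dict[str, float]) -> float:
    """Return the weighted average of pillar scores, each expected in [0, 1]."""
    for pillar in PILLAR_WEIGHTS:
        value = indicators[pillar]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{pillar} must be between 0 and 1, got {value}")
    return sum(PILLAR_WEIGHTS[p] * indicators[p] for p in PILLAR_WEIGHTS)


def rank_by_gap(languages: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Sort languages from least ready to most ready."""
    scores = {name: readiness_score(ind) for name, ind in languages.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder values only; these are not measurements of any real language.
placeholder_languages = {
    "language_a": {"text_data": 0.5, "speech_data": 0.25, "tools_and_benchmarks": 0.5},
    "language_b": {"text_data": 0.25, "speech_data": 0.0, "tools_and_benchmarks": 0.0},
}

if __name__ == "__main__":
    for name, score in rank_by_gap(placeholder_languages):
        print(f"{name}: {score:.2f}")
```

Ranking from least to most ready puts the languages with the largest data gaps first, which is one possible way to prioritize collection efforts when time, talent, and finances are limited.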
As digital civic engagement continues to spread across Africa, it is crucial to ensure that the tools being created are inclusive for all Africans.