AI, Language Gaps, and Equity

Performance Disparities: High-Resource vs Low-Resource Languages

Modern AI language systems have achieved impressive results in major languages, but performance drops steeply for low-resource languages. Benchmarks like FLORES-101/200 and XTREME have made these gaps measurable. For example, Meta AI’s FLORES evaluation set covers 100+ languages (many previously ignored) to assess translation quality on the “long tail” of low-resource languages. Using such benchmarks, Meta’s No Language Left Behind (NLLB-200) model (supporting 200 languages) delivered a 44% average BLEU improvement over prior state-of-the-art systems. In practice, NLLB-200 outperformed previous models by about +7 BLEU points on a subset of 87 languages, signalling substantial gains, especially for under-served languages. Google has also expanded Translate to 24 new languages using zero-resource methods (training only on monolingual text), achieving BLEU scores in the 10–40 range. However, Google concedes that these new translations “still lag far behind” the quality of its higher-resource languages. This highlights that, despite progress, major quality gaps remain between well-resourced and low-resource languages.
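For readers who want to see how such scores are produced, below is a minimal sketch of corpus-level BLEU scoring using the open-source sacrebleu library, a standard scorer for FLORES-style evaluation. The hypothesis and reference sentences are illustrative placeholders, not actual benchmark data.

```python
# Minimal sketch: corpus-level BLEU scoring with sacrebleu,
# a standard scorer for FLORES-style MT evaluation.
# Install with: pip install sacrebleu
import sacrebleu

# Illustrative placeholders; real evaluations use thousands of
# professionally translated sentences per language pair.
hypotheses = [
    "A committee meets on Tuesday to discuss the plan.",
    "Farmers reported heavy rain across the region.",
]
references = [
    "The committee will meet on Tuesday to discuss the plan.",
    "Farmers reported heavy rainfall across the region.",
]

# sacrebleu takes the system outputs plus a list of reference sets.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"Corpus BLEU: {bleu.score:.1f}")  # 0-100 scale; higher is better
```

As a rough rule of thumb, BLEU scores below 10 indicate largely unusable output, so the 10–40 range Google reports for its newest languages spans gist-level to reasonably understandable translation.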

Multiple evaluations confirm the performance gap across languages. A 2025 study found that large language models suffer an average accuracy drop of over 15% on common-sense reasoning tasks when prompted in low-resource languages like Hindi or Swahili rather than English. Similarly, the AfroBench benchmark (2025) – testing 64 African languages – revealed “significant performance disparities” between English and most of the African languages examined. In one case, researchers fine-tuning a smaller LLM (Mistral 7B) for translation found that Meta’s NLLB still achieved the highest BLEU, chrF++ and similarity scores for Zulu and Xhosa, outperforming both the fine-tuned model and Google Translate. These evaluations underscore a consistent pattern: AI systems perform dramatically better on high-resource languages, while translation and language understanding for low-resource languages are often error-prone and inferior. Even big tech’s multilingual models have shown “lacklustre performance on low-resourced languages” when compared to community-trained, local models. In short, inequities in training data translate directly into quality disparities, leaving many languages behind despite the “universal” ambitions of multilingual AI.
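The kind of head-to-head comparison described above can be reproduced in a few lines. The sketch below scores two hypothetical systems on both BLEU and chrF++ with sacrebleu; setting word_order=2 is what extends chrF to chrF++. The system names and sentences are placeholders, not the actual Zulu or Xhosa outputs from the study.

```python
# Minimal sketch: comparing two MT systems on BLEU and chrF++,
# the metrics cited for Zulu and Xhosa above. Sentences are English
# placeholders standing in for real system outputs and references.
import sacrebleu

references = [
    "The children are playing in the park.",
    "She bought fresh bread at the market.",
]
system_outputs = {
    "system_a": [
        "The children play in the park.",
        "She bought fresh bread from the market.",
    ],
    "system_b": [
        "Children playing park.",
        "She buy bread market.",
    ],
}

for name, hyps in system_outputs.items():
    bleu = sacrebleu.corpus_bleu(hyps, [references])
    # word_order=2 upgrades chrF to chrF++ (adds word bigrams
    # on top of character n-grams)
    chrf = sacrebleu.corpus_chrf(hyps, [references], word_order=2)
    print(f"{name}: BLEU={bleu.score:5.1f}  chrF++={chrf.score:5.1f}")
```

chrF++ is often reported alongside BLEU for morphologically rich languages such as Zulu and Xhosa because its character-level matching is less punishing of legitimate word-form variation.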

Cultural and Political Implications of Language Exclusion

The uneven performance and support for languages in AI systems lead to profound cultural and political consequences. Researchers refer to a growing “digital language divide” – a gap between languages with ample digital content/AI support and those virtually absent online. Only a small fraction of the world’s ~7,000 languages (under 5%) have a significant presence on the internet. This divide isn’t just about technology – it translates into epistemic injustice and knowledge marginalisation. When a language lacks digital support, its speakers’ knowledge and narratives become digitally invisible or underrepresented. AI language models today cover only a tiny subset of languages and “favour North American language and cultural perspectives,” introducing an Anglo-centric bias that undermines other worldviews. In effect, English and a few major languages dominate AI systems’ training data and outputs, so content generated by these models is “filtered through a Western lens,” often neglecting diverse cultural contexts. This bias toward Anglo-American norms means that minority languages and the perspectives encoded in them are sidelined – a subtle form of cultural homogenisation and epistemic injustice in the digital sphere.

The exclusion of languages from digital infrastructure has real-world implications for equity and human rights. Communities whose languages lack AI support are caught in a “repeating cycle of exclusion,” unable to fully participate in or benefit from digital services and AI advancements. Speakers of these languages face information barriers and often endure poorer service from technology – a form of systemic bias or discrimination in access to knowledge. For many Global South countries, there are even geopolitical stakes: reliance on AI tools that only work in English (or Chinese, etc.) can lead to distorted global narratives and a loss of local “epistemic sovereignty” over information. In other words, if your language isn’t represented, your history and concerns risk being filtered out or misinterpreted by dominant digital platforms.

Several stark examples illustrate these risks. In Indonesia, local officials using a health IT system discovered grave translation errors when the software attempted to handle minority languages (Javanese, Sundanese); the mistakes led to dangerous misunderstandings in medication dosage instructions. This shows how a lack of localisation can literally endanger lives. Another case involved the Romani language: Google Translate’s inclusion of Romani, without proper safeguards, reportedly enabled police in Hungary and Romania to surveil and target Romani communities by misusing automated translations. Here, adding a marginalised language to AI systems without community consent or context actually facilitated harm against a minority group. Such incidents demonstrate that technological inclusion done wrong can backfire, reinforcing power imbalances. Indeed, this “technological bias” sends a message that some languages (and, by extension, their people) matter less in the digital world, especially when the same privileged languages always perform best and receive the most attention.

Toward Linguistic Equity: Ethics and Policy Responses

Recognising these issues, scholars and policymakers argue that linguistic equity must be a priority in AI development. It’s not enough to “add more languages” to large models; the power dynamics and biases in data and design must be addressed to truly include marginalised languages. AI ethicists note that current NLP methods often assume a one-size-fits-all, English-trained approach, which can misalign with local realities and even impose neocolonial dynamics. To avoid perpetuating injustice, researchers urge community-centred and value-sensitive approaches. For instance, participatory AI projects that involve native speakers in data collection, translation, and evaluation have proven feasible and effective. Grassroots initiatives (e.g. Masakhane or Lelapa in Africa) often produce higher-quality, culturally informed language tools by leveraging local knowledge.

At a policy level, various recommendations are emerging to close the “AI language gap.” In a 2025 multilingual AI policy primer, Cohere researchers call for greater R&D investment in under-represented languages, support for creating open datasets, and international collaboration to share knowledge and best practices. Experts also emphasise that strengthening multilingual capabilities improves AI safety for all users, since weaknesses in one language can be exploited to spread harm across platforms. Ultimately, achieving linguistic equity in AI means treating language not just as data, but as a critical dimension of cultural diversity and justice. As one analysis put it, we must guard against “westernised cultural homogenisation” in AI outputs and ensure equal opportunities for all language communities in their self-representation and knowledge creation. In sum, bridging the multilingual performance gap is not only a technical endeavour – it is an ethical imperative to empower minority languages, protect cultural heritage, and promote a more inclusive digital future for speakers of all languages.

Sources: Recent academic studies and industry reports were used to ensure up-to-date information and examples (2022–2025). Key references include peer-reviewed benchmarks (e.g. FLORES-101), Meta AI’s NLLB project results, Google’s zero-resource translation initiative, as well as analyses from AI ethics and linguistics experts on the impacts of linguistic exclusion. These illustrate both the technical performance gaps and the broader socio-cultural stakes of linguistic diversity in AI.


Disclaimer: Articles are developed with the support of AI tools. I review and edit all work, and share this openly so readers can see how the writing is made. Peer feedback to correct or improve content is welcome.