The global race to build ever-larger multilingual language models assumes that scale automatically improves performance. Our research at Lelapa AI shows a different reality. When too many languages are forced into a single model, its understanding of some of them weakens: in practice, the system becomes less accurate at recognising words, distinguishing between similar languages, and completing language tasks for languages with limited data. Community-centred approaches that focus on related language families often deliver stronger results while using fewer computational resources. This insight supports a broader shift toward resource-efficient artificial intelligence (AI) design.
A Question the Industry Rarely Asks
"What happens when a language disappears inside a model that is supposed to support hundreds of them?"
This question rarely appears in conversations about AI scaling. Industry headlines celebrate models that support dozens or even hundreds of languages at once. The assumption is clear: more languages inside one system should produce better global performance.
Yet the experience of many researchers tells a different story.
In many parts of Africa and the Global South, teams work with languages that have limited digitised data, fewer computational resources, and smaller research ecosystems. These languages are often added to massive multilingual systems that were primarily trained on dominant languages with far larger datasets.
This outcome poses an important challenge for AI design. When a model attempts to learn too many languages simultaneously, some languages receive stronger representation while others fade into the background.
Our research paper, "The Less the Merrier? Investigating Language Representation in Multilingual Models", examines this challenge and asks a fundamental question: how well do multilingual models truly represent the languages they claim to support?
The AI industry has embraced a clear belief: larger models produce better results. Modern Multilingual Language Models (language models trained on multiple languages within a single system) continue to grow in size, incorporating dozens or even hundreds of languages.
The logic appears straightforward. If one model can support many languages at once, it should accelerate progress across the world.
Yet an important question often goes unexamined: what happens to each language's performance when a single model must learn too many of them?
Understanding Language Representation
Multilingual Language Models rely on a concept called language representation. Representation refers to how a model internally understands and organises linguistic patterns such as vocabulary, grammar, and sentence structure.
These representations allow models to perform tasks in Natural Language Processing (NLP), the field of AI that enables computers to understand and generate human language.
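To make this concrete, here is a minimal sketch of how such internal representations can be inspected in practice. It uses the open-source Hugging Face transformers library and the publicly available xlm-roberta-base checkpoint purely as an illustrative multilingual model; neither is specific to our paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative multilingual encoder; any multilingual checkpoint
# exposes its internal representations in the same way.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")

def sentence_representation(text: str) -> torch.Tensor:
    """Mean-pool the final hidden states into one vector per sentence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, tokens, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)   # ignore padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# The same model maps sentences from different languages into one shared space.
vec_en = sentence_representation("The children are playing outside.")
vec_zu = sentence_representation("Izingane zidlala phandle.")  # isiZulu
```

Everything a model does downstream is built on vectors like these, which is why the quality of each language's representation matters so much.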
When multiple languages are trained within a single model, the model attempts to capture patterns across all of them. This enables cross-language transfer learning, where knowledge learned from one language can help improve performance on another.
In theory, this approach should help languages with smaller datasets benefit from languages with larger datasets.
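As a rough illustration of what that transfer looks like in practice, the snippet below assumes a hypothetical classifier whose multilingual encoder was fine-tuned on English sentiment data only; the model name is a placeholder, not a real checkpoint. Because the representation space is shared across languages, the English-trained head can, in principle, label text in other languages zero-shot.

```python
from transformers import pipeline

# Hypothetical checkpoint (placeholder name): a multilingual encoder
# whose classification head was fine-tuned on English sentiment data only.
classifier = pipeline(
    "text-classification",
    model="your-org/xlmr-sentiment-en-only",
)

# Zero-shot cross-lingual transfer: the shared representation space lets
# the English-trained head label Swahili text it never saw in training.
print(classifier("Huduma hii ni nzuri sana."))  # "This service is very good."
```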
Our research set out to test how well this assumption holds in practice.
When Scale Creates Competition
Multilingual models allocate limited capacity to represent many languages simultaneously. As the number of languages grows, those languages begin to compete for representation inside the model.
Languages with large datasets often dominate this competition. Languages with smaller datasets receive weaker representations, which can reduce performance on downstream tasks such as text generation or Named Entity Recognition (NER), a task that identifies entities such as people, organisations, and locations in text.
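For a sense of what such a downstream task looks like, here is a minimal NER sketch using the transformers pipeline API. The model name is a placeholder for any multilingual checkpoint fine-tuned for token classification; weaker representation for a language typically surfaces here as missed or low-confidence entities.

```python
from transformers import pipeline

# Placeholder name: substitute any multilingual checkpoint
# fine-tuned for token classification (NER).
ner = pipeline(
    "token-classification",
    model="your-org/multilingual-ner",
    aggregation_strategy="simple",  # merge word-piece predictions into entities
)

for entity in ner("Lelapa AI is based in Johannesburg."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```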
This competition for capacity becomes especially visible for low-resource languages: languages with limited digitised data, fewer linguistic tools, and smaller publicly available datasets.
These conditions are common across parts of Africa, Asia, Latin America, and other regions of the Global South.
What the Research Revealed
Our experiments analysed popular multilingual models to determine which languages are well supported and which receive weaker representation. We also evaluated performance on tasks such as text generation and Named Entity Recognition.
The findings revealed an important pattern. Models built by language communities around related language families often perform better at distinguishing between languages within the same family. These models develop stronger internal representations for those languages.
This result suggests that focused design can outperform broad scaling in certain contexts.
Community-centred models concentrate on languages that share linguistic characteristics, geographic proximity, or cultural context. This allows the model to learn patterns more effectively.
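One lightweight way to probe this effect, reusing the sentence_representation helper sketched earlier, is to compare representations across closely related languages. This is an illustrative probe, not the evaluation used in the paper: if representations carry a clear language signal, a same-language pair with different meanings should not be drowned out by a cross-language pair with the same meaning.

```python
import torch.nn.functional as F

def sim(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two pooled sentence representations."""
    return F.cosine_similarity(a, b).item()

# isiZulu and isiXhosa are closely related Nguni languages.
zul_play = sentence_representation("Izingane zidlala phandle.")   # isiZulu: "The children play outside."
zul_love = sentence_representation("Ngiyakuthanda.")              # isiZulu: "I love you."
xho_play = sentence_representation("Abantwana badlala phandle.")  # isiXhosa: "The children play outside."

# A model that barely distinguishes the two languages will show little
# gap between these scores; a model with stronger per-language
# representations keeps them separable.
print("zul/zul (different meaning):", round(sim(zul_play, zul_love), 3))
print("zul/xho (same meaning):     ", round(sim(zul_play, xho_play), 3))
```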
This principle aligns with a broader design philosophy that prioritises systems built with the communities that use them. At Lelapa AI, similar thinking informs initiatives such as the Esethu framework for sustainable data curation and governance. Esethu emphasises community participation in the creation and stewardship of language datasets. Strong data governance combined with focused modelling approaches strengthens representation quality and supports responsible AI development.
Rethinking Scale in Multilingual AI
The research highlights a broader design principle for AI systems.
Effective scale emerges from thoughtful architecture. Models benefit from clear design choices about which languages to group, how data is structured, and how computational capacity is allocated.
Blind expansion introduces complexity that can weaken representation quality. Strategic focus enables stronger performance and more reliable outcomes.
This insight holds relevance far beyond any single region. Many languages across the Global South face similar data constraints and representation challenges.
Linking Representation Quality to Resource-Efficient AI
For teams building language technologies in constrained environments, model design decisions carry significant consequences.
Our research contributes to a growing conversation within the AI community. Progress requires models that balance scale, representation quality, and computational efficiency.
Community-informed design offers one pathway forward. Models built around related language families or regional ecosystems can deliver stronger performance while using fewer resources.
Looking Ahead
Further research will continue to explore how multilingual architectures can support a wider range of languages without sacrificing representation quality.
These investigations deepen an important insight: the future of language AI depends on systems that are deliberately designed to perform well under real-world constraints.
At Lelapa AI, this principle guides how we build language technologies. Our research focuses on architectures that deliver strong performance across diverse environments while remaining computationally efficient.
If you are exploring how multilingual models represent languages and what design choices influence performance, the full paper provides deeper insights into the experiments, analysis, and implications behind this research.
