Aligning AI with Culture

Large language models (LLMs) are remarkably adept at generating content that appears highly personalized—and relevant. Ask a chatbot a specific question and you receive a response that’s tailored to your exact input.

But there’s a problem. OpenAI, Google, Anthropic, and other widely available LLMs deliver responses based on a narrow range of language, thinking, and values that fail to reflect those of billions of people around the world, experts say. Not surprisingly, the models lean into U.S. and European languages and cultures.

“These widely used models tend to be less relevant for people in many parts of the world,” said Vukosi Marivate, chair of Data Science at the University of Pretoria in South Africa and co-founder of the Masakhane Research Foundation, as well as startup Lelapa AI. In addition, he said, those models “frequently deliver results that aren’t useful, or are simply wrong.”

This is leading researchers down a new path: developing generative AI models tuned to the sensibilities of a specific culture, group, or language. “Aligning LLMs with the needs of individual cultures is crucial,” said Fabro Steibel, Executive Director of the Institute for Technology & Society, an organization in Rio de Janeiro, Brazil, that promotes digital accessibility.

Beyond Words

A fundamental problem with LLMs is that they excel at predicting language structure without an intrinsic understanding of what they’re saying. This parroting can lead to an inaccurate picture of a culture, said Emily M. Bender, a professor in the Department of Linguistics at the University of Washington.

“An outsider who doesn’t speak the language might think the structure and grammar are accurate and authentic, but an indigenous speaker can detect a difference,” Bender explained.

The problem extends beyond linguistics, however. Even if an LLM can translate languages well, it will likely miss critical nuances and subtleties about a culture, because its training data includes little or no content from the place it is generating words about; it is simply producing synthetic text.

Consider: “If you live in Senegal, you’re likely to view content generated by a major LLM through French language and the lens of colonial history,” Steibel said. “What AI decides to show you can be a different interpretation or blatantly false because the LLM doesn’t understand the culture, the people, and the country.”

In Africa, where more than 2,000 languages and a diverse set of cultural norms exist, the challenges multiply. Nigeria alone is home to approximately 500 different languages. “There is no way to produce an English-speaking chatbot that has the technical and practical capacity to communicate effectively across all these languages,” Steibel said.

A Local Accent

The appeal of monolithic models like ChatGPT, Gemini, and Claude is that they can produce convincing text on virtually any topic. They’re easy to use and the basic versions are widely available and free. “But this doesn’t mean they are universally useful, even for people who come from a more dominant or mainstream culture,” Bender said.

Niche LLMs that align with local language and culture typically paint a more accurate picture—and fill critical gaps. They’re able to collect and share more relevant information. No less important: local citizens gain ownership of the model. This makes it possible, for example, to embed cultural history or storytelling into AI, or to design an app that helps preserve or teach a language.

For example, the Masakhane Research Foundation incorporates an Africa-centric natural language processing (NLP) framework. Local communities manage research and development, with a focus on reproducible results and data sovereignty. Using openly available models like DeepSeek and Llama, it’s possible to address cultural nuances, language characteristics, and cultural values. Currently, the organization has over 1,000 participants from 30 African countries.

Maritaca AI in Brazil also uses open-source tools to develop LLMs tailored to Portuguese speakers and Brazilian cultural norms. Its flagship model, Sabiá-3, enables localized problem-solving in areas like healthcare, education, and public transportation through detailed explanations of topics, contextual answers, and more accurate multilingual translation.

In India, which has upwards of 1,600 languages and dialects, the BharatGen initiative delivers an open-source, multimodal, multilingual generative AI model. Launched by the Indian federal government in September 2024, the project emphasizes data sovereignty and cultural relevance, including for marginalized and underrepresented communities.

In February 2025, BharatGen released Param 1, a 2.9-billion-parameter LLM that extends across 19 Indian language variations. The AI tool generates speech that matches a speaker’s input in Marathi, Bengali, Hindi, Punjabi, and other languages. Developers can build specialized AI apps ranging from chatbots to knowledge systems from the base LLM.

Researchers at IIT Bombay and other institutions built the model from scratch, with 25% of the training data coming from Indic languages. That compares to roughly 0.01% in typical large, mainstream LLMs.

Decoding Culture

Constructing culturally aligned models is not without challenges. Many communities lack the money and technical resources to build LLMs, data can come from biased or limited sources, and arguments can erupt over who should govern a model and what information should or shouldn’t appear in it. Data ownership, consent, and privacy issues can emerge as well.

Nevertheless, these specialized models are likely to proliferate over the coming years—and play an increasingly important role in countries and cultures. They’re likely to result in more responsive customer service, better public health outcomes, and more relevant and accurate research—often in a more resource-friendly and sustainable way.

For developing nations, there is an additional upside, Steibel noted. Access to open-source, purpose-built models can unlock creativity, spur innovation, and promote economic growth. “It isn’t only about having your own model to use as a consumer, it’s about the ability to become a creator and producer of technology,” he said. “We are starting to see clever and resourceful ways to use AI in a greater number of places.”

The move away from one-size-fits-all generative AI frameworks—along with the democratization that culturally aligned LLMs deliver—will almost certainly reshape the world in the coming years. Concluded Marivate: “Convenience amplifies biases and leaves many groups on the outside looking in. Culturally aligned models make AI more accessible and useful for everyone.”

Samuel Greengard is an author and journalist based in West Linn, OR, USA.