Meta returns to open source AI with Omnilingual ASR, a model that can natively transcribe over 1,600 languages



Meta has released a new multilingual automatic speech recognition (ASR) system that supports over 1,600 languages. This dwarfs OpenAI’s open source Whisper model, which only supports 99 languages.

The architecture also allows developers to extend its support to thousands more. Through a feature called zero-shot in-context learning, users can provide a few example audio-text pairs in a new language at inference time, allowing the model to transcribe additional utterances in that language without any retraining.

In practice, this expands potential coverage to more than 5,400 languages, nearly every spoken language with a known script.

It marks a shift from static model capability to a flexible framework that communities can adapt themselves. While the 1,600 languages reflect formal training coverage, the broader number represents Omnilingual ASR's ability to generalize on demand, making it the most scalable speech recognition system released to date.

Best of all, it is open sourced under the plain Apache 2.0 license, not the restrictive, quasi-open-source Llama license that governed the company's previous releases and limited use by the largest enterprises unless they obtained a paid license from Meta. That means researchers and developers are free to adopt it in commercial and enterprise-grade projects right away, at no cost and without restrictions.

Released on GitHub on November 10, alongside a Hugging Face demo space and technical documentation, Meta's Omnilingual ASR suite includes a family of speech recognition models, a 7-billion-parameter multilingual speech representation model, and a large speech corpus spanning more than 350 previously underserved languages.

All resources are freely available under an open license, and the model supports audio-to-text transcription out of the box.

“By open sourcing these models and datasets, we aim to break down language barriers, expand digital access, and empower communities around the world,” Meta posted from its @AIatMeta account on X.

Designed for audio-to-text transcription

At the heart of Omnilingual ASR is a speech-to-text system.

The model is trained to convert spoken language into written language, supporting applications such as voice assistants, transcription tools, subtitles, digitization of oral archives, and accessibility features for low-resource languages.

Unlike earlier ASR models that required large amounts of labeled training data, Omnilingual ASR includes a zero-shot variant.

This version can transcribe languages the model has never seen before using just a few paired samples of audio and corresponding text.

This significantly lowers the barrier to adding new or endangered languages and eliminates the need for large corpora and retraining.

Model family and technical design

The Omnilingual ASR suite includes multiple model families trained on over 4.3 million hours of audio from over 1,600 languages.

  • wav2vec 2.0 models for self-supervised speech representation learning (300M to 7B parameters)

  • CTC-based ASR model for efficient supervised transcription

  • LLM-ASR models for state-of-the-art transcription, combining a speech encoder with a Transformer-based text decoder

  • LLM-ZeroShot ASR model enabling inference-time adaptation to unseen languages

All models follow an encoder-decoder design: raw speech is converted into a language-agnostic representation and then decoded into written text.
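
To make that pipeline concrete, here is a minimal, purely illustrative encoder-decoder sketch in PyTorch. The class name, layer sizes, and shapes below are assumptions for illustration only and do not reflect Meta's actual architecture or code.

# Purely illustrative encoder-decoder ASR sketch (not Meta's actual code):
# an encoder turns raw audio into a language-agnostic representation,
# and a Transformer decoder turns that representation into text tokens.
import torch
import torch.nn as nn

class TinyEncoderDecoderASR(nn.Module):
    def __init__(self, vocab_size=256, dim=256):
        super().__init__()
        self.frontend = nn.Conv1d(1, dim, kernel_size=10, stride=5)  # stand-in speech frontend
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.embed = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, waveform, text_tokens):
        feats = self.frontend(waveform.unsqueeze(1)).transpose(1, 2)  # (batch, frames, dim)
        memory = self.encoder(feats)               # language-agnostic speech representation
        out = self.decoder(self.embed(text_tokens), memory)
        return self.lm_head(out)                   # per-token text logits

model = TinyEncoderDecoderASR()
logits = model(torch.randn(2, 16000), torch.randint(0, 256, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 256])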

Why scale matters

Whisper and similar models have advanced ASR for the world's major languages, but they fall short on the long tail of human linguistic diversity. Whisper supports 99 languages. Meta's system:

  • Directly supports over 1,600 languages

  • Generalizes to over 5,400 languages using in-context learning

  • Achieves a character error rate (CER) below 10% for 78% of supported languages
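
For reference, character error rate is the edit distance between the model's transcript and the reference transcript, divided by the length of the reference. The short, library-independent sketch below shows how the metric is computed; the example strings are invented.

# Character error rate (CER): Levenshtein edit distance between the
# hypothesis and reference transcripts, divided by the reference length.
def cer(reference: str, hypothesis: str) -> float:
    r, h = list(reference), list(hypothesis)
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

# One substituted character out of 17 gives a CER of about 6%, i.e. under the 10% bar.
print(cer("habari ya asubuhi", "habari za asubuhi"))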

According to Meta’s research paper, supported languages include more than 500 languages not previously covered by ASR models.

This expansion opens new possibilities for communities whose languages are often excluded from digital tools.


Background: Meta’s AI overhaul and rebound from Llama 4

The release of Omnilingual ASR comes at a pivotal time for Meta’s AI strategy, following a year marked by organizational disruption, leadership changes, and uneven product execution.

Omnilingual ASR is Meta's first major open source model release since Llama 4, the company's latest large language model. Llama 4 debuted in April 2025 to mixed reviews and a tepid reception, with little enterprise adoption compared to rival Chinese open-source models.

In the wake of that stumble, Meta founder and CEO Mark Zuckerberg appointed Alexandr Wang, co-founder and former CEO of AI data provider Scale AI, as chief AI officer, and embarked on a massive, expensive hiring spree that stunned the AI and business worlds with eye-popping pay packages for top researchers.

Omnilingual ASR, by contrast, represents a strategic and reputational reset. It returns Meta to multilingual AI, an area where the company has historically led, and delivers a genuinely extensible, community-oriented stack with minimal barriers to entry.

The system's support for more than 1,600 languages, and its ability to scale to over 5,000 with zero-shot in-context learning, reaffirms Meta's engineering credentials in language technology.

Importantly, it accomplishes this through a free, permissively licensed Apache 2.0 release with transparent dataset sourcing and reproducible training protocols.

The shift is consistent with broader themes in Meta's 2025 strategy. The company is refocusing its story around its "personal superintelligence" vision and investing heavily in infrastructure, including the custom AI accelerators and Arm-based inference stack announced in September, while de-emphasizing the metaverse in favor of foundational AI capabilities. The resumption of AI training on data from European users after a regulatory pause also underscores the company's intent to compete globally despite privacy scrutiny.

Omnilingual ASR, then, is more than just a model release. It is a calculated move to reassert control of the narrative, shifting attention from the uneven rollout of Llama 4 toward tangible, research-grounded contributions that align with Meta's long-term AI platform strategy.

Community-centric dataset collection

To achieve this scale, Meta collaborated with researchers and community organizations in Africa, Asia, and other regions to create the Omnilingual ASR Corpus, a 3,350-hour dataset across 348 low-resource languages. Contributors included paid local speakers, and recordings were collected in collaboration with groups such as:

  • African Next Voices: a Gates Foundation-supported consortium including Maseno University (Kenya), the University of Pretoria, and Data Science Nigeria

  • Mozilla Foundation's Common Voice: supported through the Open Multilingual Speech Fund

  • Lanfrica / NaijaVoices: created data for 11 African languages including Igala, Serer, and Urhobo

Data collection focused on unscripted, natural speech. Prompts were designed to be culturally relevant and open-ended, such as, “Is it better to have a few close friends or many casual acquaintances? Why?” Transcriptions use established writing systems, and quality assurance was built into every step.

Performance and hardware considerations

The largest model in the suite, omniASR_LLM_7B, requires up to 17GB of GPU memory for inference, making it best suited to high-end hardware. Smaller models (300M to 1B parameters) can run on lower-power devices and deliver real-time transcription speeds.
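
As a rough sizing aid, a deployment script could pick a model tier based on available GPU memory. In the sketch below, only the omniASR_LLM_7B name and its roughly 17GB requirement come from the article; the smaller model names and thresholds are assumptions for illustration.

# Rough illustration of choosing a model tier by available GPU memory.
# Only omniASR_LLM_7B and its ~17GB requirement come from the article;
# the other names and thresholds are assumptions.
import torch

def pick_model_card() -> str:
    if not torch.cuda.is_available():
        return "omniASR_CTC_300M"  # assumed name for a small, CPU-friendly model
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if total_gb >= 17:
        return "omniASR_LLM_7B"    # needs up to ~17GB of GPU memory for inference
    return "omniASR_LLM_1B"        # assumed name for a mid-size model

print(pick_model_card())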

Performance benchmarks show strong results even in low-resource scenarios:

  • CER below 10% for 95% of high- and medium-resource languages

  • CER below 10% for 36% of low-resource languages

  • Robust to noisy conditions and unseen domains, especially with fine-tuning

The zero-shot variant, omniASR_LLM_7B_ZS, can transcribe new languages with minimal setup: the user provides a few sample audio-text pairs, and the model generates transcriptions of new utterances in the same language.
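
The article does not show the zero-shot interface, so the snippet below is only a hypothetical sketch of what supplying a few audio-text example pairs at inference might look like. The pipeline class and method are placeholders, not the library's documented API.

# Hypothetical sketch of zero-shot in-context transcription.
# The pipeline class and method below are placeholders, not the real API --
# consult the omnilingual-asr documentation for the actual interface.
few_shot_examples = [
    ("samples/utt_01.wav", "reference transcript of utterance one"),
    ("samples/utt_02.wav", "reference transcript of utterance two"),
    ("samples/utt_03.wav", "reference transcript of utterance three"),
]

# pipeline = ZeroShotASRPipeline(model_card="omniASR_LLM_7B_ZS")  # placeholder class name
# transcript = pipeline.transcribe("new_recording.wav", context=few_shot_examples)
# print(transcript)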

Open access and developer tools

All models and datasets are licensed under the following permissive terms:

  • Apache 2.0 for models and code

  • CC-BY 4.0 for the Omnilingual ASR Corpus, hosted on Hugging Face

Installation is supported via PyPI and uv:

pip install omnilingual-asr

Meta also provides the following features:

  • HuggingFace dataset integration

  • Pre-built inference pipeline (see the sketch after the code snippet below)

  • Language-code conditioning for improved accuracy

Developers can view the complete list of supported languages using the API:

from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

print(len(supported_langs))
print(supported_langs)
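
Building on that, a transcription call through the pre-built inference pipeline might look roughly like the following. The module path, class name, and the "swh_Latn"-style language code are assumptions inferred from the package layout above; check the GitHub README for the exact interface.

# Hedged sketch of a transcription call via the pre-built inference pipeline.
# The module path, class name, and language-code format are assumptions --
# verify them against the repository README before use.
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline  # assumed path

pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
transcriptions = pipeline.transcribe(
    ["recordings/clip_in_swahili.wav"],  # local audio file(s)
    lang=["swh_Latn"],                   # assumed code format; language conditioning improves accuracy
    batch_size=1,
)
print(transcriptions[0])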

Wider impact

Omnilingual ASR reframes language coverage in ASR from a fixed list into an extensible framework. This enables:

  • Community-driven inclusion of underrepresented languages

  • Digital access to oral and endangered languages

  • Research on speech technology in linguistically diverse contexts

Importantly, Meta emphasizes ethical considerations throughout and advocates for open source participation and collaboration with native speaker communities.

The Omnilingual ASR paper states: “No model can ever predict and incorporate all the world’s languages in advance. However, Omnilingual ASR enables the community to extend recognition using their own data.”

Access the tools

All resources are currently available at:

  • Code + models: github.com/facebookresearch/omnilingual-asr

  • Dataset: huggingface.co/datasets/facebook/omnilingual-asr-corpus (see the loading sketch below)

  • Blog post: ai.meta.com/blog/omnilingual-asr
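
For teams that want to inspect the corpus before building on it, a streaming pull through the Hugging Face datasets library might look like the sketch below. The per-language configuration name and column names are assumptions; consult the dataset card for the actual values.

# Hedged sketch: streaming a slice of the Omnilingual ASR Corpus with the
# Hugging Face `datasets` library. The configuration and column names are
# assumptions -- see the dataset card for the real ones.
from datasets import load_dataset

corpus = load_dataset(
    "facebook/omnilingual-asr-corpus",
    "urh_Latn",        # assumed per-language configuration name (Urhobo)
    split="train",
    streaming=True,    # avoids downloading the full corpus up front
)
sample = next(iter(corpus))
print(sample.keys())   # inspect the actual audio/text field names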

What this means for businesses

For enterprise developers, especially those operating in multilingual or international markets, Omnilingual ASR significantly lowers the barrier to deploying speech-to-text systems across a wider range of customers and geographies.

Instead of relying on commercial ASR APIs that support only a limited set of high-resource languages, teams can now integrate an open source pipeline that covers more than 1,600 languages out of the box, with the option to expand to thousands more via zero-shot learning.

This flexibility is especially valuable for companies in sectors where local language coverage can be a competitive or regulatory necessity, such as voice-based customer support, transcription services, accessibility, education, and public-sector technology. Because the models are released under the permissive Apache 2.0 license, businesses can fine-tune, deploy, or integrate them into their own systems without restriction.

The release also signals a shift in the ASR landscape from centralized, cloud-gated services toward community-extensible infrastructure. By making multilingual speech recognition more accessible, customizable, and cost-effective, Omnilingual ASR opens the door to a new generation of enterprise voice applications built around language inclusion rather than language restriction.


