
The Allen Institute for AI (Ai2) recently released what it calls its most powerful model family yet, Olmo 3. Since then, the company has continued to iterate on the models, extending their reinforcement learning (RL) runs to create Olmo 3.1.
The new Olmo 3.1 models focus on efficiency, transparency, and control for enterprises.
Ai2 has updated two of the three versions of Olmo 3: Olmo 3.1 Think 32B, the flagship model optimized for advanced reasoning, and Olmo 3.1 Instruct 32B, designed for guided, multi-turn interaction and tool use.
The third version, Olmo 3-Base, targets programming, comprehension, and math, and is also suited to continued fine-tuning.
According to Ai2, to upgrade Olmo 3 Think 32B to Olmo 3.1, researchers extended their best RL runs with a longer training schedule.
“After the original Olmo 3 launch, we restarted the RL training run on the Olmo 3 32B Think and trained for an additional 21 days on 224 GPUs with additional epochs on the Dolci-Think-RL dataset,” Ai2 said in a blog post. “This resulted in Olmo 3.1 32B Think, which delivered significant improvements across math, reasoning, and instruction-following benchmarks. It improved by over 5 points on AIME, over 4 points on ZebraLogic, over 4 points on IFEval, and over 20 points on IFBench, as well as improved performance on coding and complex multi-step tasks.”
According to Ai2, to create Olmo 3.1 Instruct, researchers applied the recipe behind the smaller 7B Instruct model to the larger model.
Olmo 3.1 Instruct 32B is “optimized for chat, tool usage, and multi-turn interactions; it’s a more performant sibling to the Olmo 3 Instruct 7B and ready for real-world applications,” Ai2 said in a post on X.
Currently, the new checkpoints are available in the Ai2 Playground and on Hugging Face, with API access to be provided soon.
Improved performance in benchmarks
The Olmo 3.1 models performed well on benchmark tests, outperforming their Olmo 3 predecessors as expected.
Olmo 3.1 Think outperformed Qwen 3 32B on the AIME 2025 benchmark and came close to Gemma 3 27B.
Olmo 3.1 Instruct performed well against open-source competitors and outperformed models such as Gemma 3 on math benchmarks.
“When it comes to Olmo 3.1 32B Instruct, it’s a large-scale instruction-tuned model built for chat, tool usage, and multi-turn interactions. Olmo 3.1 32B Instruct is the most capable fully open chat model to date, and in our evaluation, the strongest fully open 32B-scale instruct model,” the company said.
Ai2 also upgraded its RL-Zero 7B models for math and coding. The company said in a post on X that both models benefit from longer, more stable training runs.
Commitment to transparency and open source
Ai2 previously told VentureBeat that it designed the Olmo 3 family of models to give companies and research institutions more control over, and a better understanding of, the data and training that go into the models.
Organizations can add to a model’s data mix and retrain the model so it learns from the added content.
This is a long-standing effort at Ai2; its tool OlmoTrace traces how an LLM’s output matches its training data.
“Olmo 3.1 Think 32B and Olmo 3.1 Instruct 32B demonstrate that openness and performance can advance at the same time. By extending the same model flow, we continue to improve capabilities while maintaining end-to-end transparency into data, code, and training decisions,” Ai2 said.
