Arcee aims to reboot US open source AI with new Trinity models released under Apache 2.0



For most of 2025, the frontier of open-weight language models has been defined in Beijing and Hangzhou, not Silicon Valley or New York City.

Chinese labs such as Alibaba’s Qwen team, DeepSeek, Moonshot AI, and Baidu have rapidly led the way in developing large-scale, open mixture-of-experts (MoE) models, many with permissive licenses and strong benchmark performance. OpenAI also released its own open-weight general-purpose LLMs (gpt-oss-20b and gpt-oss-120b) this summer, but adoption has been slow, as many comparable or better-performing alternatives exist.

Now, one small-to-mid-sized company in the United States is pushing back.

Today, Arcee AI announced the release of Trinity Mini and Trinity Nano Preview, the first two models in its new “Trinity” family, a suite of open-weight MoE models trained entirely in the United States.

Users can try the former directly in chatbot form on Arcee’s new website, chat.arcee.ai, and developers can download both models from Hugging Face to run or modify themselves. All are available for free under the enterprise-friendly Apache 2.0 license.

Although small compared to the largest frontier models, these releases represent a rare attempt by a U.S. startup to build end-to-end open-weight models at scale, trained from the ground up on U.S. infrastructure, using a U.S.-curated dataset pipeline.

"I’m having a hard time expressing in words how excited I am to launch these models, as I’m experiencing a combination of extreme pride in my team and crushing exhaustion," Lucas Atkins, Arcee’s CTO, said in a post on the social network X (formerly Twitter). "Especially mini."

The third model, Trinity Large, is already in training. It is a 420B parameter model with 13 billion active parameters per token and is expected to be released in January 2026.

“We want to add what was missing to that picture,” Atkins said in Trinity’s launch manifesto published on Arcee’s website. “A full-fledged open-weight model family trained end-to-end in America, now available for businesses and developers to truly own.”

From small models to large-scale ambitions

The Trinity project marks a turning point for Arcee AI, which has traditionally been known for its compact, enterprise-oriented models. The company has raised $29.5 million in funding to date, including a $24 million Series A in 2024 led by Emergence Capital. Previous releases include AFM-4.5B, a compact instruction-tuned model released in mid-2025, and SuperNova, an early 70B-parameter instruction-following model designed for enterprise deployments in VPCs.

Both aimed to solve the regulatory and cost issues that plague enterprise deployments of proprietary LLMs.

With Trinity, Arcee is aiming higher: full-stack pre-training of open-weight foundation models built for instruction tuning and post-training, as well as long-context inference, synthetic data adaptation, and future integration with live retraining systems.

Originally conceived as a stepping stone to Trinity Large, Mini and Nano both grew out of early experiments in sparse modeling and quickly became commercial products in their own right.

Technical highlights

Trinity Mini is a 26B-parameter model with 3B active per token, designed for high-throughput inference, function calling, and tool use. Trinity Nano Preview is a 6B-parameter model with approximately 800M active non-embedding parameters; it is a more experimental, chat-focused model with a stronger personality but less robust reasoning.

Both models use Arcee’s new Attention-First Mixture-of-Experts (AFMoE) architecture, a custom MoE design that blends global sparsity, local/global attention, and gated attention techniques.

Inspired by recent advances by DeepSeek and Qwen, AFMoE departs from traditional MoE by tightly integrating sparse expert routing and an enhanced attention stack, including grouped query attention, gated attention, and local/global patterns to improve long context inference.

Think of a typical MoE model as a call center with 128 specialized agents (called “experts”). Only a few of them are consulted on each call, depending on the question, which saves time and compute because not every expert has to review every query.

What makes AFMoE different is how it decides which agents to call and how it combines their answers. Most MoE models select experts with a simple ranking, typically top-k selection over softmax scores.

In contrast, AFMoE uses a smoother method (called sigmoid routing) that is more like adjusting a volume dial than flipping a switch, allowing the model to blend multiple perspectives more gracefully.
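The contrast between the two routing styles can be sketched in a few lines of Python. This is a toy illustration only, not Arcee’s actual implementation; the function names and the renormalization-over-winners step are assumptions:

```python
import numpy as np

def topk_softmax_routing(logits, k=2):
    """Standard MoE routing: experts compete in one softmax, keep top-k."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[-k:]              # indices of the k winners
    weights = np.zeros_like(probs)
    weights[top] = probs[top] / probs[top].sum()  # renormalize over winners
    return weights

def sigmoid_routing(logits, k=2):
    """Sigmoid routing: each expert gets its own 0-to-1 'volume dial'
    rather than competing in a single softmax."""
    gates = 1.0 / (1.0 + np.exp(-logits))     # independent per-expert scores
    top = np.argsort(gates)[-k:]
    weights = np.zeros_like(gates)
    weights[top] = gates[top] / gates[top].sum()
    return weights
```

Because each sigmoid gate is scored independently, one very confident expert does not automatically suppress the scores of the others the way it would inside a shared softmax.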

The “attention-first” part refers to how the model decides what to pay attention to across different parts of the conversation. Imagine reading a novel and remembering some parts more vividly than others based on importance, recency, or emotional impact: that is attention. AFMoE improves on this by combining local attention (focusing on what was just said) and global attention (recalling important earlier points) in an alternating pattern that keeps the two in balance.
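The local-versus-global distinction comes down to the attention mask. Here is a minimal sketch of the idea; the real AFMoE window size and interleaving schedule are not published, so the numbers here are purely illustrative:

```python
import numpy as np

def causal_mask(seq_len, window=None):
    """Build a causal attention mask. With window=None, every token may
    attend to all earlier tokens (global attention); with a window, each
    token sees only the last `window` positions (local attention)."""
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # causal: no future
    if window is not None:
        for i in range(seq_len):
            mask[i, : max(0, i - window + 1)] = False  # forget the distant past
    return mask
```

In a hybrid stack, some layers would use the windowed mask (cheap, focused on recent context) while others use the full causal mask (expensive, but able to recall early parts of a long conversation).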

Finally, AFMoE introduces something called gated attention, which acts like a volume control on each attention output. It lets the model emphasize or de-emphasize different information as needed, much like adjusting how much weight each voice deserves in a group discussion.
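A gate of this kind might look like the following toy sketch. The projection `W_gate` and the element-wise placement are assumptions for illustration; Arcee has not published the exact formulation:

```python
import numpy as np

def gated_attention(attn_out, x, W_gate):
    """Apply a learned sigmoid gate (a per-channel 'volume dial') to the
    attention output before it rejoins the residual stream."""
    gate = 1.0 / (1.0 + np.exp(-(x @ W_gate)))  # values strictly in (0, 1)
    return gate * attn_out                       # attenuate, channel by channel
```

Because the gate values lie in (0, 1), the gate can only turn information down, never amplify it, which is one reason this construction tends to stabilize deep training.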

All of this is designed to make the model more stable during training and more efficient at scale. As a result, models can understand long conversations, reason more clearly, and run faster without requiring large amounts of computing resources.

Unlike many existing MoE implementations, AFMoE emphasizes stability at depth and training efficiency, using techniques such as auxiliary-loss-free sigmoid routing and depth-scaled normalization to support scaling without divergence.

Model features

Trinity Mini features a MoE architecture with 128 experts (8 active per token) and 1 always-on shared expert. The context window can reach up to 131,072 tokens depending on the provider.
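Putting the routing pieces together, a toy forward pass in Trinity Mini’s shape (128 routed experts, 8 active per token, 1 always-on shared expert) might look like this. The routing details are a sketch under the sigmoid-routing assumption above, not Arcee’s code:

```python
import numpy as np

def moe_layer(x, experts, shared_expert, router_logits, k=8):
    """Toy MoE forward pass: pick the k best routed experts by sigmoid
    score, mix their outputs, and always add the shared expert."""
    gates = 1.0 / (1.0 + np.exp(-router_logits))   # per-expert sigmoid scores
    top = np.argsort(gates)[-k:]                   # indices of the k winners
    weights = gates[top] / gates[top].sum()        # normalize winning gates
    routed = sum(w * experts[i](x) for w, i in zip(weights, top))
    return routed + shared_expert(x)               # shared expert always fires
```

This is the source of the 26B-total / 3B-active split: every token pays only for the 8 selected experts plus the shared one, while the other 119 experts sit idle.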

In benchmarks, Trinity Mini shows performance competitive with larger models across reasoning tasks, including outperforming gpt-oss on SimpleQA (testing factual recall and whether the model admits uncertainty), MMLU (zero-shot, measuring broad academic knowledge and reasoning across many subjects without examples), and BFCL V3 (evaluating multi-step function calling and real-world tool use).

  • MMLU (zero-shot): 84.95

  • MATH-500: 92.10

  • GPQA Diamond: 58.55

  • BFCL V3: 59.67

Latency and throughput numbers across providers such as Together and Clarifai show more than 200 tokens per second with under 3 seconds of end-to-end latency, making Trinity Mini suitable for interactive applications and agent pipelines.

Trinity Nano, although smaller in size and less stable in edge cases, demonstrates the viability of sparse MoE architectures with active parameters of less than 1B per token.

Access, pricing, and ecosystem integration

Both Trinity models are released under the permissive, enterprise-friendly Apache 2.0 license, which permits unrestricted commercial and research use. Trinity Mini is available at:

  • Hugging Face

  • OpenRouter

  • chat.arcee.ai

API pricing for Trinity Mini via OpenRouter:

  • $0.045 per million input tokens

  • $0.15 per million output tokens

  • Free tier available on OpenRouter for a limited time
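At these rates, the cost of a given workload is straightforward to estimate:

```python
def trinity_mini_cost(input_tokens, output_tokens):
    """Estimate Trinity Mini API cost at the OpenRouter rates listed above."""
    input_rate = 0.045 / 1_000_000   # dollars per input token
    output_rate = 0.15 / 1_000_000   # dollars per output token
    return input_tokens * input_rate + output_tokens * output_rate

# A workload of 10M input and 2M output tokens:
# 10M * $0.045/M + 2M * $0.15/M = $0.45 + $0.30 = $0.75
```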

The model is already integrated into apps such as Benchable.ai, Open WebUI, and SillyTavern, and is supported by Hugging Face Transformers, vLLM, LM Studio, and llama.cpp.

Data without compromise: The role of DatologyAI

At the heart of Arcee’s approach is control over its training data, a sharp contrast with the many open models trained on web-scraped or legally ambiguous datasets. DatologyAI, a data curation startup co-founded by former Meta and DeepMind researcher Ari Morcos, plays a key role here.

DatologyAI’s platform automates data filtering, deduplication, and quality improvement across modalities, ensuring Arcee’s training corpus avoids the pitfalls of noisy, biased, or copyright-risk content.

For Trinity, DatologyAI helped build a 10-trillion-token curriculum organized into three phases: 7T of general data, 1.8T of high-quality text, and 1.2T of math, code, and STEM data.

This is the same partnership that powered Arcee’s AFM-4.5B, now significantly expanded in both size and complexity. According to Arcee, Datology’s filtering and data-ranking tools allowed Trinity to scale cleanly while improving performance on tasks such as math, QA, and agentic tool use.

Datology’s contributions also extend to synthetic data generation. For Trinity Large, the company has generated over 10 trillion synthetic tokens, combined with 10T of curated web tokens, to form the 20T-token training corpus for the ongoing full-scale model.

Building the infrastructure to compete: Prime Intellect

Arcee is able to conduct full-scale training in the United States thanks to its infrastructure partner, Prime Intellect. Founded in early 2024, the startup began with a mission to democratize access to AI compute by building a decentralized GPU marketplace and training stack.

Prime Intellect made headlines for its distributed training of INTELLECT-1 (a 10B-parameter model trained across contributors in five countries), but its recent work, including the 106B-parameter INTELLECT-3, has acknowledged trade-offs at scale: distributed training works, but for models larger than 100B parameters, centralized infrastructure is still more efficient.

For Trinity Mini and Nano, Prime Intellect provided the orchestration stack, a modified TorchTitan runtime, and the physical compute (512 H200 GPUs in a custom bf16 pipeline running highly efficient HSDP parallelism). It also hosts the 2,048 B300 GPU cluster used to train Trinity Large.

This collaboration shows the difference between branding and execution. While Prime Intellect’s long-term goal remains distributed computing, the near-term value for Arcee lies in an efficient and transparent training infrastructure, one that remains under U.S. jurisdiction with known origins and security controls.

Strategic bets on model sovereignty

Arcee’s push for full pre-training reflects a broader theory that the future of enterprise AI relies on owning the training loop, not just fine-tuning. As systems evolve to adapt to real-world usage and interact with tools autonomously, compliance and control over training objectives will become as important as performance.

“As applications become more ambitious, the boundaries between ‘model’ and ‘product’ continue to move,” Atkins said in Arcee’s Trinity manifesto. “To build such software, you need to control not only the instruction layer but also the weights and training pipeline.”

This framing sets Trinity apart from other open-weight efforts. Rather than patching another company’s base model, Arcee built its own from data to deployment, infrastructure to optimizer, with partners who share its vision of openness and sovereignty.

Looking to the future: Trinity Large

Training of Trinity Large, Arcee’s 420B-parameter MoE model, is currently underway. It uses the same AFMoE architecture, scaled up with a larger set of experts.

The dataset contains 20T tokens, evenly split between synthetic data from DatologyAI and curated web data.

The model is expected to launch in January 2026, with a full technical report to follow shortly thereafter.

If successful, Trinity Large will be one of the only fully open-weight models of its scale trained in the United States, positioning Arcee as a serious player in the open ecosystem at a time when most U.S. LLM efforts are either proprietary or built on non-U.S. foundations.

US recommitment to open source

As the most ambitious open-weight models are increasingly shaped by Chinese labs, Arcee’s launch of Trinity signals a rare change in direction: an attempt to reclaim ground for transparent, U.S.-controlled model development.

Built from the ground up with expert data and infrastructure partners and designed for long-term adaptability, Trinity is a bold statement about the future of AI development in the U.S., demonstrating that even as the industry consolidates and commoditizes, smaller, lesser-known companies can still push boundaries and innovate in the open.

What remains to be seen is whether Trinity Large can match the capabilities of its better-funded peers. But with Mini and Nano already in use and a strong architectural foundation in place, Arcee may already have proven its central thesis: that model sovereignty, not just model size, will define the next era of AI.


