Harvey AI launches global legal benchmark for UK, Australia and Spain



James Ding
February 18, 2026 20:38

Harvey’s BigLaw Bench Global doubles the benchmark size and tests AI’s legal proficiency across jurisdictions as model scores reach 90% on core tasks.




Harvey AI released BigLaw Bench: Global on February 18, more than doubling its public benchmark dataset with new assessments for the UK, Australian, and Spanish legal systems. The expansion is the first major update since Harvey announced plans earlier this month to expand BLB fivefold.

The timing matters. Leading foundation models now score approximately 90% on BLB's core legal tasks, up from roughly 60% in 2024. However, Harvey's internal research shows that models perform poorly on jurisdiction-specific tasks. BLB: Global aims to quantify exactly where those localization gaps exist.

Six task categories under the microscope

Harvey built the benchmark around six workflows that corporate clients actually use: drafting, long document analysis, document comparison, public research, multi-document analysis, and extraction. Each task was designed by local practitioners in collaboration with Mercor and cross-reviewed by applied legal researchers at Harvey.

The scenarios are concrete. One UK task asks the model to advise on FCA enforcement risks if a CSO sells shares before a failed drug trial is announced. The Spanish benchmark includes an analysis of CNMC antitrust enforcement against technology companies involved in anti-poaching agreements. Australia's tasks include determining whether FIRB approval is required for an infrastructure fund acquisition.

“Our goal with BLB: Global is to help you understand and fix where your underlying models struggle to localize effectively on core AI tasks,” Harvey said in the announcement.

Why this matters for enterprise AI deployments

Law firms operating across borders face real challenges. An AI assistant that adeptly handles Delaware corporate law could stumble on UK financial regulations or Spanish competition law. Without standardized benchmarks, there is no way to verify consistent quality across offices.

By building jurisdiction-specific tasks with more than 20 local experts, Harvey's approach creates a baseline for measuring that consistency. The company also plans to expand BLB: Arena, a preference-based rating system launched in November 2025, to international markets.

More countries to come. Harvey said the company will continue to build local expert cohorts and deepen existing datasets based on customer feedback. For legal tech buyers evaluating AI vendors, BLB: Global offers something that hasn’t existed before: a standardized way to compare model performance in real-world legal work across multiple jurisdictions.

Image source: Shutterstock
