Chinese Frontier Model Benchmarks (Jun 2026)

Latest frontier flagship per lab · Excludes OpenAI, Anthropic, Google Gemini models. Numbers are vendor-reported from official READMEs/blogs unless marked as independent eval.

As of 1 Jun 2026 · 7 labs · 8 models
Sources: HF README · official blogs · AA · Vals
Pricing: official API · USD per 1M tokens

Latest model per lab

| Lab | Model | Released | Total / active params | Context | Notes | License | Input / output price |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MiniMax | M3 | Jun 1 | - | 1M | MSA | MIT (weights ~10d) | $0.30 / $1.20 |
| DeepSeek | V4-Pro Max | Apr 24 | 1.6T / 49B | 1M | - | MIT | $0.435 / $0.87 |
| Moonshot | Kimi K2.6 | Apr 21 | 1T / 32B | 256K | - | Mod. MIT | $0.95 / $4.00 |
| Z.ai | GLM-5.1 | Apr 8 | 744B / 40B | 202K | - | MIT | $1.40 / $4.40 |
| Alibaba | Qwen3.7-Max | May 19 | - | 1M | text agent | API-only | $2.50 / $7.50 |
| Alibaba | Qwen3.7-Plus | May 26 | - | 1M | multimodal | API-only | $0.40 / $1.60 |
| Xiaomi | MiMo-V2.5-Pro | Apr 22 | 1.02T / 42B | 1M | - | MIT | $1.00 / $3.00 |
| StepFun | Step 3.7 Flash | May 29 | 198B / 11B | 256K | - | Apache 2.0 | $0.20 / $1.15 |
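
The parameter figures in the model listings above can be turned into an activation ratio, which gives a rough sense of how sparse each model is. The sketch below assumes that a figure such as "1.6T/49B act" means total parameters and parameters active per token; that reading, and the names `MODELS` and `active_fraction`, are illustrative assumptions rather than anything the vendors specify here.

```python
# (total parameters, active parameters per token) in billions, copied from
# the model listings above. Models without published sizes are omitted.
MODELS = {
    "DeepSeek V4-Pro Max": (1600, 49),
    "Kimi K2.6": (1000, 32),
    "GLM-5.1": (744, 40),
    "MiMo-V2.5-Pro": (1020, 42),
    "Step 3.7 Flash": (198, 11),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token (assumed meaning of 'act')."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        print(f"{name:<20} {active_fraction(total_b, active_b):.1%}")
```

The ratio says nothing about quality on its own; it is only a compact way to compare the size figures side by side.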

API pricing (USD per 1M tokens · official vendor)

Open-weight models also support self-hosting (no per-token API fee).
| Model | Access | Input (cache miss) | Output | Cached input | Notes |
| --- | --- | --- | --- | --- | --- |
| MiniMax M3 | API | $0.30 | $1.20 | $0.06 | ≤512k ctx · 50% launch promo (list $0.60 / $2.40) |
| DeepSeek V4-Pro Max | Open + API | $0.435 | $0.87 | $0.003625 | MIT weights · 1M ctx |
| Kimi K2.6 | Open + API | $0.95 | $4.00 | $0.16 | Mod. MIT · 256k ctx · thinking default |
| GLM-5.1 Thinking | Open + API | $1.40 | $4.40 | $0.26 | MIT weights · 202k ctx |
| Qwen3.7-Max | API only | $2.50 | $7.50 | $0.25 | Text agent · 1M ctx · Alibaba Model Studio |
| Qwen3.7-Plus | API only | $0.40 | $1.60 | - | Multimodal · ≤256k tier · $1.20 / $4.80 above 256k |
| MiMo-V2.5-Pro | Open + API | $1.00 | $3.00 | $0.20 | MIT weights · ≤256k tier · overseas API |
| Step 3.7 Flash | Open + API | $0.20 | $1.15 | $0.04 | Apache 2.0 · 256k ctx |

Pricing cross-checked 2026-06-01 · Qwen3.7-Plus added · Long-context tiers cost more · Qwen3.7-Plus not on OpenRouter yet
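
To compare what a single request would cost across models, the per-million rates above can be combined with token counts. The following is a minimal sketch, not any vendor's billing logic. It assumes cached input tokens are billed at the cached rate and the rest at the cache-miss rate, it uses the lowest context tier and the M3 promo price as listed, and it ignores long-context surcharges. `Price`, `PRICES`, and `request_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens, copied from the pricing table above."""
    input_miss: float
    output: float
    cached_input: Optional[float]  # None where the table shows "-"


PRICES = {
    "MiniMax M3": Price(0.30, 1.20, 0.06),
    "DeepSeek V4-Pro Max": Price(0.435, 0.87, 0.003625),
    "Kimi K2.6": Price(0.95, 4.00, 0.16),
    "GLM-5.1 Thinking": Price(1.40, 4.40, 0.26),
    "Qwen3.7-Max": Price(2.50, 7.50, 0.25),
    "Qwen3.7-Plus": Price(0.40, 1.60, None),
    "MiMo-V2.5-Pro": Price(1.00, 3.00, 0.20),
    "Step 3.7 Flash": Price(0.20, 1.15, 0.04),
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request under the assumptions stated above."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    # Assumption: with no published cached rate, cached tokens bill as misses.
    cached_rate = (price.input_miss if price.cached_input is None
                   else price.cached_input)
    miss_tokens = input_tokens - cached_tokens
    total = (miss_tokens * price.input_miss
             + cached_tokens * cached_rate
             + output_tokens * price.output)
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 100k input tokens (80k served from cache), 2k output.
    for name in PRICES:
        print(f"{name:<22} ${request_cost(name, 100_000, 2_000, 80_000):.5f}")
```

Requests that cross a long-context tier boundary (for example Qwen3.7-Plus above 256k) would need the higher rates from the Notes column instead.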

Independent composite scores

Sources: Artificial Analysis (independent) · Vals.ai (independent)
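| Benchmark | M3 | DS V4-Pro | K2.6 | GLM-5.1 | Q3.7-Max | Q3.7-Plus | MiMo-V2.5 | Step-3.7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AA Intelligence Index v4.0 | - | 52 | 54 | 51 | 57 | - | 54 | - |
| Vals Index | - | 56.23% | 55.55% | 52.14% | 57.29% | - | - | - |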

AA / Vals cross-checked 2026-06-01 · Qwen3.7-Plus not on AA or Vals yet · Qwen3.7-Plus numbers from qwen.ai/blog
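
Because the two composites cover different subsets of models, a ranking is only comparable over the models both sources have scored. The sketch below restricts each ranking to that overlap. It is illustrative only: `MODELS`, `AA_INDEX`, `VALS_INDEX`, and `common_ranking` are hypothetical names, the values are copied from the two rows above, and ties are broken by column order, which is an arbitrary choice.

```python
from typing import List, Optional, Sequence

MODELS = ["M3", "DS V4-Pro", "K2.6", "GLM-5.1",
          "Q3.7-Max", "Q3.7-Plus", "MiMo-V2.5", "Step-3.7"]

# None marks a "-" cell (model not scored by that source).
AA_INDEX: List[Optional[float]] = [None, 52, 54, 51, 57, None, 54, None]
VALS_INDEX: List[Optional[float]] = [None, 56.23, 55.55, 52.14, 57.29,
                                     None, None, None]


def common_ranking(primary: Sequence[Optional[float]],
                   other: Sequence[Optional[float]]) -> List[str]:
    """Rank models best-first by `primary`, keeping only models both score."""
    scored = [
        (score, name)
        for name, score, other_score in zip(MODELS, primary, other)
        if score is not None and other_score is not None
    ]
    # sorted() is stable, so equal scores keep their column order.
    return [name for score, name in sorted(scored, key=lambda p: -p[0])]


if __name__ == "__main__":
    print("AA  :", common_ranking(AA_INDEX, VALS_INDEX))
    print("Vals:", common_ranking(VALS_INDEX, AA_INDEX))
```

On the four models both sources cover, the two orderings agree on first and last place and differ in the middle, which is a reminder that composite indices weight their component benchmarks differently.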

Full benchmark matrix (vendor-reported unless noted)

Source: official model card / blog
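Knowledge & Reasoning

| Benchmark | M3 | DS V4-Pro Max | K2.6 | GLM-5.1 | Q3.7-Max | Q3.7-Plus | MiMo-V2.5-Pro | Step-3.7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU-Pro | - | 87.5 | - | - | 89.6 | 88.5 | 68.5 | - |
| GPQA Diamond | - | 90.1 | 90.5 | 86.2 | 92.4 | 90.3 | 66.7 | 78.41 |
| HLE (no tools) | - | 37.7 | 34.7 | 31.0 | 41.4 | 34.7 | - | 49.7 |
| HLE w/ tools | - | 48.2 | 54.0 | 52.3 | - | - | - | 48.1 |
| AIME 2026 | - | - | 96.4 | 95.3 | - | - | - | - |
| LiveCodeBench v6 | - | 93.5 | 89.6 | - | 91.6 | 89.6 | 39.6 | - |
| IMOAnswerBench | - | 89.8 | 86.0 | 83.8 | - | 86.0 | - | - |
| Apex Math Reasoning | - | 38.3 | - | - | 44.5 | 22.7 | - | - |

Coding & Agentic

| Benchmark | M3 | DS V4-Pro Max | K2.6 | GLM-5.1 | Q3.7-Max | Q3.7-Plus | MiMo-V2.5-Pro | Step-3.7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SWE-bench Verified | - | 80.6 | 80.2 | - | 80.4 | 77.7 | 78.9 | 76.5 |
| SWE-bench Pro | 59.0 | 55.4 | 58.6 | 58.4 | 60.6 | 57.6 | 57.2 | 56.3 |
| SWE-bench Multilingual | - | 76.2 | 76.7 | - | 78.3 | 75.8 | - | 72.4 |
| Terminal-Bench 2.0 / 2.1 | 66.0 (2.1) | 67.9 (2.0) | 66.7 (2.0) | 63.5 (2.0) | 69.7 (2.0) | 70.3 (2.0) | 68.4 (2.0) | 59.6 (2.1) |
| NL2Repo | - | - | - | 42.7 | - | 41.1 | - | - |
| BrowseComp | - | 83.4 | 83.2 | 68.0 | - | - | - | 75.82 |
| Toolathlon | - | 51.8 | 50.0 | 40.7 | - | - | - | 49.5 |
| MCPAtlas Public | 74.2 | 73.6 | - | 71.8 | 76.4 | 73.2 | - | - |
| ClawEval Pass³ | - | - | 62.3 | 62.7 | 65.2 | 62.7 | 64.0 | 67.1 |
| GDPval-AA (Elo) | - | 1554 | - | - | - | - | - | 1415.8 |
| τ²-Bench | - | - | - | - | - | - | - | - |
| Skillsbench | - | - | - | - | 59.2 | 54.9 | - | - |
| SciCode | - | - | 52.2 | - | 53.5 | 51.3 | - | - |
| CyberGym | - | - | - | 68.7 | - | - | - | - |
| KernelBench Hard | 28.8 | - | - | - | - | - | - | - |
| SWE-fficiency | 34.8 | - | - | - | - | - | - | - |
| PostTrainBench | 0.37 | - | - | - | - | - | - | - |
| SimpleVQA Search | - | - | - | - | - | 81.7 | - | 79.16 |

Long Context

| Benchmark | M3 | DS V4-Pro Max | K2.6 | GLM-5.1 | Q3.7-Max | Q3.7-Plus | MiMo-V2.5-Pro | Step-3.7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MRCR 1M | - | 83.5 | - | - | - | - | - | - |
| CorpusQA 1M | - | 62.0 | - | - | - | - | - | - |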
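
The best score in each row of the matrix can be recomputed from the cells themselves. The sketch below shows one way to do that for a few rows, treating "-" as missing and ignoring annotations such as "(2.1)". It assumes that higher is better on every benchmark shown, and `COLUMNS`, `ROWS`, `parse_cell`, and `best_in_row` are hypothetical names introduced for illustration.

```python
import re
from typing import List, Optional

COLUMNS = ["M3", "DS V4-Pro Max", "K2.6", "GLM-5.1",
           "Q3.7-Max", "Q3.7-Plus", "MiMo-V2.5-Pro", "Step-3.7"]

# A few rows copied from the matrix above, one cell string per column.
ROWS = {
    "GPQA Diamond": ["-", "90.1", "90.5", "86.2", "92.4", "90.3", "66.7", "78.41"],
    "SWE-bench Pro": ["59.0", "55.4", "58.6", "58.4", "60.6", "57.6", "57.2", "56.3"],
    "Terminal-Bench 2.0 / 2.1": ["66.0 (2.1)", "67.9 (2.0)", "66.7 (2.0)", "63.5 (2.0)",
                                 "69.7 (2.0)", "70.3 (2.0)", "68.4 (2.0)", "59.6 (2.1)"],
    "ClawEval Pass³": ["-", "-", "62.3", "62.7", "65.2", "62.7", "64.0", "67.1"],
    "τ²-Bench": ["-"] * 8,
}

_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_cell(cell: str) -> Optional[float]:
    """Return the leading number in a cell, or None for '-' (no score)."""
    match = _NUMBER.match(cell.strip())
    return float(match.group()) if match else None


def best_in_row(cells: List[str]) -> List[str]:
    """Column names holding the highest score; empty if the row has no scores."""
    scores = [parse_cell(cell) for cell in cells]
    present = [s for s in scores if s is not None]
    if not present:
        return []
    top = max(present)
    return [col for col, s in zip(COLUMNS, scores) if s == top]


if __name__ == "__main__":
    for benchmark, cells in ROWS.items():
        print(f"{benchmark:<26} {best_in_row(cells)}")
```

The Terminal-Bench row mixes results from versions 2.0 and 2.1, so a "best" value there compares scores that were not measured on the same version. Most figures are also vendor-reported, so a row maximum is best read as a lead in reported numbers rather than a controlled comparison.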

Sources