LLM Ladder

The latest tiered score ranking ("ladder") of large language models (LLMs), combining 10 evaluations: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, and CritPt.

Score	Anthropic	OpenAI	Zhipu AI	Google	Alibaba	DeepSeek	MiniMax	Moonshot AI	Meta	Xiaomi	Nvidia	xAI	Tencent	StepFun	Mistral	AWS	KAT	LG	Upstage	Kunlun Tech
60-64	Claude Fable 5	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
55-59	Claude Opus 4.8 (max)	GPT-5.5 (xhigh)	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
50-54	Claude Opus 4.7 (max)	GPT-5.4 (xhigh)	GLM-5.2 (max)	Gemini 3.5 Flash	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
45-49	Claude Sonnet 4.6 (max)	-	-	Gemini 3.1 Pro Preview	Qwen3.7 Max	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
40-44	Claude Opus 4.6 (max)	GPT-5.3 Codex (xhigh), GPT-5.2 (xhigh), GPT-5.2 Codex (xhigh)	GLM-5 (Reasoning), GLM-5.1 (Reasoning)	Gemini 3 Pro Preview (high)	Qwen3.6 Max Preview, Qwen3.6 Plus	DeepSeek V4 Pro, DeepSeek V4 Flash	MiniMax-M3	Kimi K2.6	Muse Spark	MiMo-V2.5-Pro, MiMo-V2-Pro	-	-	-	-	-	-	-	-	-	-
35-39	Claude Opus 4.6 (Non-reasoning), Claude 4.5 Sonnet (Reasoning), Claude Opus 4.5 (Non-reasoning)	GPT-5 (high)	-	Gemini 3 Flash	Qwen3.7 Plus	-	MiniMax-M2.7	Kimi K2.5 (Reasoning)	-	-	Nemotron 3 Ultra 550B A55B (Reasoning)	Grok 4.3, Grok 4.20 Beta 0309 (Reasoning), Grok 4.3 (medium)	-	-	-	-	-	-	-	-
30-34	Claude 4.5 Haiku	o3-pro, o3	GLM-4.7 (Reasoning), GLM-5 (Non-reasoning)	-	Qwen3.5 27B (Reasoning), Qwen3.5 397B A7B (Reasoning), Qwen3 Max Thinking, Qwen3.5 122B A10B, Qwen3.5 397B A7B (Non-reasoning)	DeepSeek V3.2	MiniMax-M2.5, MiniMax-M2.1	Kimi K2 (Thinking)	-	MiMo-V2-Flash	-	Grok 4, Grok 4.1 Fast	Hy3-preview	Step 3.7 Flash	Mistral Medium 3.5	-	-	-	-	-
25-29	Claude 3.7 Sonnet	-	-	Gemma 4 31B, Gemini 2.5 Pro	Qwen3.5 27B (Non-reasoning), Qwen3.5 35B A3B	-	MiniMax-M2	Kimi K2.5 (Non-reasoning)	-	-	Nemotron 3 Super 120B A12B (Reasoning)	-	-	Step 3.5 Flash 2603	-	Nova 2.0 Pro Preview (medium)	KAT-Coder-Pro V1	-	-	-
20-24	-	gpt-oss-120B (high), o1	GLM-4.7 (Non-reasoning), GLM-4.6 (Reasoning)	Gemini 3.1 Flash-Lite Preview	Qwen3 Max	DeepSeek V3.1 Terminus (Reasoning), DeepSeek V3.2 Speciale	-	Kimi K2 (0905)	-	-	-	-	Hy3-preview (Non-reasoning)	-	-	-	-	K-EXAONE	-	-
15-19	-	-	GLM-4.5 (Reasoning)	Gemini 2.5 Flash	-	DeepSeek V3.1 (Reasoning), DeepSeek R1 0528	-	Kimi K2	-	-	-	-	-	-	-	-	-	-	Solar Pro 3	K2 Think V2
10-14	-	gpt-oss-20B (high), GPT-4.1 mini, GPT-5 (ChatGPT)	-	-	Qwen3 32B	-	MiniMax-M1 80K	-	-	-	-	-	-	-	Mistral Large 3	-	-	-	-	-
5-9	-	GPT-4.1 nano	-	Gemini 2.0 Flash	-	-	-	-	Llama 4 Maverick	-	-	-	-	-	-	-	-	-	-	-
0-4	Claude 3 Opus, Claude 2.1	GPT-4 Turbo, GPT-4o mini	-	-	-	DeepSeek-V2.5	-	-	-	-	-	-	-	-	-	-	-	-	-	-
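
The first column groups models into 5-point score bands (0-4, 5-9, up to 60-64). As a rough illustration of how a score maps to its band, here is a minimal Python sketch. The function name `tier_label`, the band width parameter, and the rule that fractional scores are rounded down are assumptions made for this example; the page does not state how the composite score is computed or rounded.

```python
def tier_label(score: float, width: int = 5) -> str:
    """Return the band label (e.g. '40-44') that contains `score`.

    Assumption: bands start at 0, are `width` points wide, and
    fractional scores fall into the band of their floor.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    low = int(score // width) * width  # lower edge of the band
    return f"{low}-{low + width - 1}"


# Hypothetical scores, not taken from the table above.
for s in (57, 42.3, 0):
    print(s, "->", tier_label(s))
```

With this rule, a hypothetical score of 57 lands in the 55-59 row and 42.3 in the 40-44 row.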