GLM-5 and MiniMax M2.7 now match Claude Opus on core agent tasks — tool use, file ops, instruction following — at $12/day vs $250/day.
The Deep Agents team published eval results showing that the open-weight models GLM-5 (z.ai) and MiniMax M2.7 perform comparably to closed frontier models like Claude Opus 4.6 across seven agentic task categories: file operations, tool use, retrieval, conversation, memory, summarization, and unit tests. At a throughput of 10M tokens/day, MiniMax M2.7 costs ~$12/day versus ~$250/day for Claude Opus 4.6, a gap of roughly $87k per year. GLM-5 on Baseten averages 0.65s latency at 70 tokens/second versus 2.56s at 34 tokens/second for Claude Opus. Deep Agents supports switching models mid-session via a /model command and can run fully locally via Ollama or vLLM.
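For a quick sanity check on those figures, the arithmetic is simple; a minimal sketch, assuming the ~$12/day and ~$250/day rates hold at a steady 10M tokens/day:

```python
# Back-of-the-envelope check of the daily and annual cost gap,
# using the per-day figures quoted above at 10M tokens/day.
OPUS_PER_DAY = 250.0     # ~$/day, Claude Opus 4.6
MINIMAX_PER_DAY = 12.0   # ~$/day, MiniMax M2.7

daily_savings = OPUS_PER_DAY - MINIMAX_PER_DAY       # $238/day
annual_savings = daily_savings * 365                 # ~$86,870, i.e. ~$87k
savings_pct = daily_savings / OPUS_PER_DAY * 100     # ~95%

print(f"Daily: ${daily_savings:,.0f}  Annual: ${annual_savings:,.0f}  ({savings_pct:.0f}% cheaper)")
```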
GLM-5 and MiniMax M2.7 now pass the same agentic eval categories — tool calling, structured instruction following, file ops — that previously required closed frontier models. At 70 tokens/second and 0.65s latency via Baseten, they clear the bar for interactive products where Opus's 2.5s+ latency was a blocker. Deep Agents already supports these models, and you can run them locally via Ollama or vLLM for zero data-egress risk.
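If the local route appeals, both Ollama and vLLM expose OpenAI-compatible endpoints, so an agent client only needs a different base URL. A minimal sketch, with the usual caveats: the model tag `glm-5` is a placeholder rather than a confirmed registry name, and the ports are the servers' defaults.

```python
# Minimal sketch: point an OpenAI-compatible client at a local server.
# Ollama serves at http://localhost:11434/v1 by default, vLLM at http://localhost:8000/v1.
# The model tag below is a placeholder -- substitute whatever tag your local registry uses.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # or "http://localhost:8000/v1" for vLLM
    api_key="ollama",                      # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="glm-5",  # placeholder tag, not an official identifier
    messages=[{"role": "user", "content": "List the files changed in the last commit."}],
)
print(resp.choices[0].message.content)
```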
Swap MiniMax M2.7 into your highest-volume agent endpoint this week and run your existing eval suite — if pass rate holds within 5%, you've just cut that workload's cost by ~95%.
Sign up at openrouter.ai and grab your API key.
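Here's roughly what that swap-and-compare could look like through OpenRouter's OpenAI-compatible API. Treat it as a sketch, not a recipe: the model slugs are guesses (check OpenRouter's model list for the exact identifiers), and run_eval_suite is a stand-in for your existing eval harness and its real pass/fail cases.

```python
# Sketch: route the same chat calls through OpenRouter and compare pass rates
# before cutting over the production endpoint. Model slugs below are guesses;
# replace run_eval_suite's placeholder case with your existing eval suite.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def complete(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def run_eval_suite(model: str) -> float:
    """Placeholder harness: swap in your real eval cases; returns a pass rate in [0, 1]."""
    cases = [("Reply with exactly the word OK.", "OK")]  # stand-in case
    passed = sum(expected in complete(model, prompt) for prompt, expected in cases)
    return passed / len(cases)

baseline = run_eval_suite("anthropic/claude-opus-4.6")  # slug is a guess
candidate = run_eval_suite("minimax/minimax-m2.7")      # slug is a guess

if baseline - candidate <= 0.05:  # pass rate holds within 5 points
    print(f"Candidate holds ({candidate:.0%} vs {baseline:.0%}): safe to switch this workload.")
else:
    print(f"Regression ({candidate:.0%} vs {baseline:.0%}): keep the current model.")
```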