Chinese company Z.AI launches GLM-5.2: a model that rivals Cloud Opus – using Nvidia zero chips


short

  • GLM-5.2 trails Claude Opus 4.8 by just 1% on FrontierSWE — a benchmark that measures multi-hour standalone engineering projects — while outperforming GPT-5.5 on the same test. Ships under MIT license with no regional restrictions.
  • The model is built entirely on Huawei Ascend chipsets without having to use NVIDIA hardware.
  • Unsloth AI has already released 2-bit GGUF quantizations that reduced the model from 1.51TB to 238GB. You’ll still need 256GB of RAM or VRAM, but at this point, you can make it work.

Z.ai GLM-5.2 decreased on June 16, promising high-level performance, beating the already advanced GLM 5.1.

The Beijing-based laboratory is listed on the US Entity List Since January 2025The company appears to be capitalizing on growing concerns about America’s approach to artificial intelligence. Over the past week, the ban on Anthropic Fable and the release of this new model helped push zAI shares up 90%, sending them soaring to a new all-time high.

The GLM 5.2 has the numbers to back up the hype.

On FrontierSWE — a benchmark that evaluates whether an AI agent can complete open technical projects measured in hours, covering systems improvement, large-scale code building, and applied machine learning research, and which is scored by dominance rate — GLM-5.2 achieved 74.4 versus 75.1 for Claude Opus 4.8. It outperformed GPT-5.5 at 72.6. In SWE-bench Pro, which tests independent resolution of real-world GitHub issues and has a success rate recorded, GLM-5.2 scored 62.1 versus GPT-5.5’s 58.6 — and exceeded GLM-5.1’s 58.4 by a wide margin.

The jump in quality makes it the best open source model yet in the AI ​​Index, which aggregates the results of 9 different scores to evaluate the overall quality of an AI model. OpenRouter Standards Put it in the same category as the now banned Claude Fable 5.

The hardware used to achieve this feat is another interesting part of the story. GLM-5.2 is trained on Huawei Ascend chips, and there is no Nvidia anywhere in the pipeline. Imad Mushtaq, founder of Stability AI. estimated Total training costs are around $25 million, 80% of which is post-training, making it very cheap compared to its counterparts.

like The decryption was reported earlier this yearZ.ai was already training image models on Huawei’s Ascend Atlas servers without a single US chip. GLM-5.2 takes that infrastructure even further – a 744 billion-parameter expert mixture model with a true context window containing 1 million symbols, five times the 200,000 limit found in GLM-5.1, and an MIT license that means no government directive can flip the access switch.

Tokens are pieces of data that the model can read and create, while parameters are the number of internal settings and values ​​that determine how the model processes information and creates responses

Who is it and what does it cost

For developers, the context window is the operational shift. Navigating through the entire repo, rebuilding multiple files, and long proxy pipelines that previously required slicing are now a single-call workflow. The API is priced at $1.40 per million input codes and $4.40 per million outputs — versus Cloud Opus 4.8’s $5 inputs and $25 outputs. The coding plan starts at around $18 per month and works directly within Claude Code, Cline, Kilo Code, and the most popular proxy environments.

Local publishing is also technically possible. Unsloth Artificial Intelligence 2-bit GGUF quantization operations that compressed the model pushed it from 1.51 TB to 238 GB while maintaining up to 82% accuracy.

Don’t get too excited, though. This still means it requires 256GB of unified memory or a matching RAM/VRAM combination — the max for an M4 Ultra Mac Studio or a workstation with a mid-range GPU and 256GB of system RAM with an expert mix offloaded. It’s still a lot of money, but at least it’s something you can buy and run in your home if you really want to.

We ran a quick test, asking GLM-5.2 to create our standard game that combines typing and shooter mechanics. The UI wasn’t the prettiest, other models produced sleeker-looking interfaces, but the experience was the most varied: different scenarios across waves, enemy types changed, and bosses appearing later during the run.

It generated more diverse play situations than anything else we tested for the same task at zero setting.

If you want to play it, it is live on our website Itch.io profile.

This discrepancy indicates where GLM-5.2 makes more economic sense. For multi-shot creation workflows and proxy pipelines where output diversity is more important than polishing, the math is in Open source pricing levels It’s hard to argue with him. For the more difficult tasks – the SWE Marathon, where it earned 13.0 points versus 26.0 for the Opus 4.8 – the gap to closed limits is still real, and is 13 points wide.

Open source weights are still alive HuggingFace Under MIT license. Quantitative weights are also available at HuggingFace. GLM codec plan subscribers can switch now with the GLM-5.2 model series, which is also available for free testing at z.AI With some usage restrictions.

Daily debriefing Newsletter

Start each day with the latest news, plus original features, podcasts, videos and more.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *