Google Releases QAT-Quantized Gemma 4! The 26B Model Needs Only 15GB of Memory with Almost No Loss in Accuracy

Type: article
Author: Toutiao (今日头条) author (unnamed)
Primary Topic: Local deployment
Ingested: 2026-06-07

Summary

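Google DeepMind has released QAT (quantization-aware training) versions of the entire Gemma 4 family, cutting the VRAM requirement of the 26B MoE model from 50GB to 15GB. The source states this as a 72% reduction, although the rounded figures themselves work out to about 70%. Accuracy stays close to the original BF16 model. The Unsloth team has additionally released dynamically quantized GGUF builds: with the UD-Q4_K_XL scheme the 26B model reaches 85.6% Top-1 accuracy, which the source reports as 15.6% higher than the standard Q4_0 scheme (it does not say whether that is a relative gain or a difference in percentage points). The update substantially lowers the hardware bar for running large models locally: a consumer graphics card with 16GB of VRAM is enough to run a 26B-class model.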
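The memory figures in the summary can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the source article: the helper names `weight_memory_gb` and `reduction_pct` are hypothetical, the parameter count is assumed to be exactly 26 billion as the model name suggests, 1 GB is taken as 10^9 bytes, and only the weights are counted (no KV cache, activations, or quantization scales).

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB, assuming 1 GB = 1e9 bytes (hypothetical helper)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def reduction_pct(before_gb: float, after_gb: float) -> float:
    """Percentage by which the footprint shrinks going from before_gb to after_gb."""
    return (before_gb - after_gb) * 100 / before_gb


if __name__ == "__main__":
    bf16_gb = weight_memory_gb(26, 16)  # BF16 stores each weight in 16 bits
    int4_gb = weight_memory_gb(26, 4)   # a plain 4-bit format, ignoring scale overhead
    print(f"BF16 weights:  {bf16_gb:.1f} GB")
    print(f"4-bit weights: {int4_gb:.1f} GB")
    print(f"Reduction from the quoted 50GB to 15GB: {reduction_pct(50, 15):.1f}%")
```

Under these assumptions the weights alone take about 52 GB in BF16 and about 13 GB at 4 bits, which is in the same range as the quoted 50GB and 15GB once rounding and per-block scale overhead are allowed for. The exact accounting behind the article's 72% figure is not given in the source.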

Key Concepts

QAT (quantization-aware training)
GGUF format
MoE models
Dynamic quantization
Local inference
VRAM optimization

Entities

Google DeepMind
Gemma 4
Unsloth
Hugging Face
RTX 4060Ti

Source

Raw: gemma4-qat-quantization-26b-15gb-memory.md

Relations

(none)

Auto-generated on 2026-06-07