TurboQuant: A KV Cache Compression Algorithm That Cuts VRAM Usage 5x
Type: article
Author: unknown
Primary Topic: knowledge-management
Ingested: 2026-04-22
Summary
Key Concepts
- (none)
Entities
- (none)
Source
Relations
- (none)
Auto-generated on 2026-04-22
Related articles (auto-integrated)
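- Running Qwen 3.6-35B-A3B on 128GB of RAM: The Only One That Didn't Hit OOM at 1M Context - both articles examine how KV Cache quantization schemes in llama.cpp affect memory usage (overlap: 2, integrated 2026-05-07)
- Running a 35B Model Smoothly on 8GB of VRAM | TurboQuant + llama.cpp + Qwen3.6 Deployment Tutorial - an introduction to the principles of the TurboQuant quantization algorithm and a practical deployment tutorial (overlap: 2, integrated 2026-04-28)
- Lucebox: Running Qwen3.5-27B on a Single RTX 3090 at 207 tok/s - both Lucebox and TurboQuant aim to reduce the resource consumption of local LLM inference (overlap: 2, integrated 2026-04-27)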