
Huawei’s Zurich Computing Systems Laboratory has released SINQ (Sinkhorn Normalization Quantization), an open-source quantization method that reduces the memory requirements of large language models (LLMs) by up to 70%. The breakthrough allows workloads that once needed enterprise GPUs like Nvidia’s A100 or H100 to run efficiently on consumer-grade cards such as the RTX 4090…