airllm

2.11.0
216.91k

AirLLM allows a single 4GB GPU to run 70B large language models without quantization, distillation, or pruning, and 8GB of VRAM to run the 405B Llama 3.1.

Gavin Li · Sep 21, 2024