Authors: Lucio Dery, Steven Kolawole, Jean-Francois Kagey, Virginia Smith, Graham Neubig, Ameet Talwalkar
Published on: February 08, 2024
Impact Score: 8.45
arXiv ID: arXiv:2402.05406
Summary
- What is new: Bonsai is a gradient-free, perturbative pruning method for LLMs that runs on modest hardware and produces small, fast, and accurate models.
- Why this is important: LLMs are increasingly out of reach for everyday practitioners because of the widening gap in available hardware, and existing compression methods are too resource-intensive.
- What the research proposes: Bonsai prunes LLMs using only forward passes (no gradients), making compression feasible for users with limited hardware.
- Results: Models pruned with Bonsai outperform those produced by gradient-based structured pruning methods, and run twice as fast as models from semi-structured methods at comparable accuracy while using similar resources. The approach is demonstrated with a new sub-2B model on the Hugging Face Open LLM Leaderboard.
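To make the forward-pass-only idea concrete, here is a minimal toy sketch of perturbative pruning: random sub-models are sampled, each is scored purely by running it forward (no gradients), and modules that tend to appear in well-performing sub-models are kept. All names, the toy "model," and the scoring rule are illustrative assumptions for exposition, not Bonsai's actual algorithm or API.

```python
import random

# Toy stand-in for an LLM: a stack of "modules" (simple scalar transforms).
# Everything here is an illustrative assumption, not Bonsai's real interface.
random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(8)]

def forward(x, mask):
    """Forward pass with some modules masked out (pruned)."""
    for w, keep in zip(weights, mask):
        if keep:
            x = x + w * x  # each kept module applies its transform
    return x

def loss(mask, data):
    """Proxy quality measure: deviation from the full model's outputs."""
    full = [forward(x, [True] * len(weights)) for x in data]
    pruned = [forward(x, mask) for x in data]
    return sum(abs(f - p) for f, p in zip(full, pruned)) / len(data)

def perturbative_prune(data, sparsity=0.5, n_samples=32):
    """Estimate per-module utility from forward passes only, then prune."""
    n = len(weights)
    scores = [0.0] * n
    counts = [0] * n
    for _ in range(n_samples):
        # Sample a random sub-model (a "perturbation") and evaluate it
        # with forward passes alone -- no gradients are ever computed.
        mask = [random.random() > sparsity for _ in range(n)]
        l = loss(mask, data)
        for i, keep in enumerate(mask):
            if keep:
                scores[i] -= l  # modules in low-loss sub-models score higher
                counts[i] += 1
    utils = [s / c if c else float("-inf") for s, c in zip(scores, counts)]
    keep_n = int(n * (1 - sparsity))
    top = sorted(range(n), key=lambda i: utils[i], reverse=True)[:keep_n]
    return [i in top for i in range(n)]

data = [random.uniform(0.5, 2.0) for _ in range(16)]
final_mask = perturbative_prune(data)
print(sum(final_mask))  # number of modules kept at 50% sparsity
```

The key property this sketch shares with the paper's setting is that scoring never calls backward(), so memory stays close to inference-time requirements rather than training-time requirements.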
Technical Details
Technological frameworks used: Bonsai
Models used: Large Language Models (LLMs)
Data used: Hugging Face Open LLM Leaderboard tasks
Potential Impact
This innovation could disrupt markets that rely on large language models for services, including sectors like AI tools, search engines, and companies offering NLP services, by democratizing access to cutting-edge LLMs.
Want to implement this idea in a business?
We have generated a startup concept here: BonsaiTech.