Authors: Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow, Shengwu Xiong, Seid Muhie Yimam
Published on: March 20, 2024
Impact Score: 8.0
arXiv code: arXiv:2403.13737
Summary
- What is new: Introduction of EthioLLM — multilingual large language models for five Ethiopian languages and English, and Ethiobenchmark — a new benchmark dataset for various downstream NLP tasks.
- Why this is important: Resources for training large language models are scarce for low-resource languages, particularly Ethiopian languages.
- What the research proposes: Developing and open-sourcing multilingual language models and a benchmark dataset tailored to Ethiopian languages and a range of NLP tasks.
- Results: The performance of these models was evaluated across five downstream NLP tasks, demonstrating their effectiveness.
Technical Details
Technological frameworks used: Large Language Models (LLMs)
Models used: EthioLLM
Data used: Ethiobenchmark
Potential Impact
This work is most relevant to companies in the AI and NLP spaces that focus on multilingual applications, especially those targeting African markets and languages.