Authors: Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, Timothy Baldwin
Published on: February 03, 2024
Impact Score: 8.22
Arxiv code: arXiv:2402.02113
Summary
- What is new: Using multilingual sentiment lexicons during pretraining to boost the zero-shot sentiment analysis performance of multilingual language models, especially for low-resource languages.
- Why this is important: Improving multilingual language models for low-resource languages is difficult because large-scale text data in those languages is scarce.
- What the research proposes: Pretraining language models on multilingual lexicons instead of relying on large-scale text data in those languages (see the sketch after this summary).
- Results: Achieved superior zero-shot performance in 34 languages, surpassing models fine-tuned on English sentiment datasets and large models like GPT-3.5, BLOOMZ, and XGLM.
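To make the idea concrete, here is a minimal sketch of how a multilingual sentiment lexicon could be turned into pseudo-labeled training examples for a small multilingual model. The toy lexicon entries, the xlm-roberta-base backbone, and the output path are illustrative assumptions, not the authors' released pipeline.

```python
# Minimal sketch: convert a multilingual sentiment lexicon into
# pseudo-labeled single-word examples and fine-tune a small
# multilingual encoder on them. Lexicon entries, backbone, and
# output path are illustrative assumptions.
import torch
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

# Toy lexicon: (word, language code, polarity) with 1 = positive, 0 = negative.
LEXICON = [
    ("bagus", "id", 1), ("buruk", "id", 0),      # Indonesian
    ("maganda", "tl", 1), ("pangit", "tl", 0),   # Tagalog
    ("nzuri", "sw", 1), ("mbaya", "sw", 0),      # Swahili
]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)

class LexiconDataset(torch.utils.data.Dataset):
    """Treats each lexicon entry as a tiny labeled classification example."""
    def __init__(self, entries):
        self.entries = entries
    def __len__(self):
        return len(self.entries)
    def __getitem__(self, idx):
        word, _lang, label = self.entries[idx]
        enc = tokenizer(word, truncation=True, padding="max_length",
                        max_length=16, return_tensors="pt")
        return {"input_ids": enc["input_ids"].squeeze(0),
                "attention_mask": enc["attention_mask"].squeeze(0),
                "labels": torch.tensor(label)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lexicon-pretrain",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=LexiconDataset(LEXICON),
)
trainer.train()
trainer.save_model("lexicon-pretrain")       # checkpoint reused in the next sketch
tokenizer.save_pretrained("lexicon-pretrain")
```

The point of the sketch is that the only supervision comes from lexicon words, so no labeled sentences in the target languages are required.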
Technical Details
Technological frameworks used: not specified
Models used: GPT-3.5, BLOOMZ, XGLM
Data used: multilingual lexicons
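Building on the checkpoint saved in the earlier sketch, a zero-shot check could classify a full sentence even though training only ever saw isolated lexicon words; the checkpoint path and test sentence are again assumptions for illustration.

```python
# Minimal sketch: load the lexicon-trained checkpoint from the previous
# example and classify a sentence. Zero-shot here means no labeled
# sentences in the target language were used, only lexicon words.
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="lexicon-pretrain",
                      tokenizer="lexicon-pretrain")

# Tagalog: "This movie is so beautiful!"
print(classifier("Ang ganda ng pelikulang ito!"))
```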
Potential Impact
Language technology providers, multilingual customer service tools, social media monitoring platforms
Want to implement this idea in a business?
We have generated a startup concept here: LexiGlobe.