Authors: Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun
Published on: February 06, 2024
Impact Score: 8.22
arXiv code: arXiv:2402.03988
Summary
- What is new: REBORN combines reinforcement learning with iterative training to improve unsupervised automatic speech recognition (ASR).
- Why this is important: Learning the mapping between speech signals and text without paired data is challenging due to the variable length and unknown boundaries of word/phoneme segments.
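- What the research proposes: REBORN alternates between training a segmentation model, which predicts segment boundaries in the speech signal, and training a phoneme prediction model, which transcribes the segmented speech. Because no boundary labels are available, the segmentation model is trained with reinforcement learning.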
- Results: REBORN outperforms previous unsupervised ASR models on multiple datasets, including LibriSpeech, TIMIT, and non-English languages from Multilingual LibriSpeech.
Technical Details
Technological frameworks used: Iterative training with reinforcement learning for boundary segmentation.
Models used: Segmentation model and phoneme prediction model.
Data used: LibriSpeech, TIMIT, Multilingual LibriSpeech datasets.
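The alternating scheme described above can be illustrated with a small toy sketch. This is not the authors' implementation: speech frames are plain floats, the "phoneme model" is a nearest-centroid stand-in, and `toy_reward` is a placeholder clustering cost invented for this example rather than the reward used in the paper. All function names (`boundary_probs`, `train_segmenter`, `train_phoneme_model`, `reborn_style_loop`) and all hyperparameters are hypothetical. The only part meant to carry over is the structure: a boundary policy updated with a REINFORCE-style gradient, alternated with refitting the phoneme model on the new segments.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def boundary_probs(frames, w, b):
    # Hypothetical policy: a boundary is likelier where adjacent frames differ.
    return [sigmoid(w * abs(frames[t] - frames[t - 1]) + b)
            for t in range(1, len(frames))]

def segment(frames, boundaries):
    # boundaries[t - 1] == 1 means a new segment starts at frame t.
    segments, current = [], [frames[0]]
    for t, is_boundary in enumerate(boundaries, start=1):
        if is_boundary:
            segments.append(current)
            current = []
        current.append(frames[t])
    segments.append(current)
    return segments

def nearest(centroids, x):
    return min(range(len(centroids)), key=lambda k: (centroids[k] - x) ** 2)

def toy_reward(segments, centroids, seg_cost=0.5):
    # Placeholder reward (an assumption, not the paper's): frames should lie
    # close to the centroid predicted for their segment, with few segments.
    cost = 0.0
    for seg in segments:
        c = centroids[nearest(centroids, sum(seg) / len(seg))]
        cost += sum((x - c) ** 2 for x in seg)
    return -(cost + seg_cost * len(segments))

def train_segmenter(utterances, centroids, w, b, rng, steps=200, lr=0.01):
    # Stage 1: REINFORCE on the boundary policy, phoneme model held fixed.
    baseline = 0.0
    for _ in range(steps):
        frames = rng.choice(utterances)
        probs = boundary_probs(frames, w, b)
        actions = [1 if rng.random() < p else 0 for p in probs]
        reward = toy_reward(segment(frames, actions), centroids)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # running-average baseline
        diffs = [abs(frames[t] - frames[t - 1]) for t in range(1, len(frames))]
        # Gradient of the Bernoulli log-probability w.r.t. w and b.
        grad_w = sum((a - p) * d for a, p, d in zip(actions, probs, diffs))
        grad_b = sum(a - p for a, p in zip(actions, probs))
        w += lr * advantage * grad_w
        b += lr * advantage * grad_b
    return w, b

def train_phoneme_model(utterances, centroids, w, b):
    # Stage 2: refit the stand-in phoneme model on greedily segmented speech.
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for frames in utterances:
        actions = [1 if p > 0.5 else 0 for p in boundary_probs(frames, w, b)]
        for seg in segment(frames, actions):
            mean = sum(seg) / len(seg)
            k = nearest(centroids, mean)
            sums[k] += mean
            counts[k] += 1
    return [sums[k] / counts[k] if counts[k] else centroids[k]
            for k in range(len(centroids))]

def reborn_style_loop(utterances, centroids, iterations=3, seed=0):
    rng = random.Random(seed)
    w, b = 1.0, -1.0  # arbitrary initial policy parameters
    for _ in range(iterations):
        w, b = train_segmenter(utterances, centroids, w, b, rng)
        centroids = train_phoneme_model(utterances, centroids, w, b)
    return w, b, centroids

if __name__ == "__main__":
    toy_utterances = [[0.0, 0.1, 0.0, 3.0, 3.1, 2.9, 0.1, 0.0]]
    print(reborn_style_loop(toy_utterances, centroids=[0.5, 2.5]))
```

In this sketch the reward is computed from the current phoneme model, so each stage depends on the output of the other, which is what makes the iteration meaningful. A real system would replace the float frames with learned speech features and the centroid model with a trained phoneme predictor.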
Potential Impact
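Speech recognition software providers, educational technology companies, and language learning platforms could benefit from, or be disrupted by, speech recognition that can be trained without paired speech and text data.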