Your conditions: 李劲东
  • MonTTS: A Real-time and High-fidelity Mongolian TTS Model with Complete Non-autoregressive Mechanism

    Subjects: Computer Science >> Other Disciplines of Computer Science; submitted 2021-12-20

    Abstract: "Aiming at real-time and high-fidelity speech generation for Mongolian Text-to-Speech (TTS), a FastSpeech2-based non-autoregressive Mongolian TTS system, termed MonTTS, is proposed. To improve overall performance in terms of prosody naturalness and fidelity, MonTTS adopts three novel mechanisms: 1) a Mongolian phoneme sequence is used to represent Mongolian pronunciation; 2) a phoneme-level variance adaptor is employed to learn long-term prosody information; 3) two duration aligners, namely models based on Mongolian speech recognition and on Mongolian autoregressive TTS, are used to provide the duration supervision signal. In addition, we build a large-scale Mongolian TTS corpus, named Mon-Speech. The experimental results show that MonTTS significantly outperforms the state-of-the-art Tacotron-based Mongolian TTS and standard FastSpeech2 baseline systems, with a real-time factor (RTF) of 3.63 × 10⁻³ and a Mean Opinion Score (MOS) of 4.53, meeting the requirements for real-time and high-fidelity inference. The training recipe and pretrained TTS models are freely available at https://github.com/ttslr/MonTTS."