This repository contains a text-to-speech (TTS) system based on the Continuous vocoder, developed at the Speech Technology and Smart Interactions Laboratory (SmartLab), Budapest University of Technology and Economics.
Unlike other traditional statistical parametric vocoders, the continuous model focuses on extracting continuous parameters:
- Fundamental Frequency (F0)
- Maximum Voiced Frequency (MVF)
- Mel-Generalized Cepstral (MGC)
Besides feed-forward neural networks, this demo also supports recurrent neural networks (RNNs):
- Long short-term memory (LSTM)
- Bidirectional LSTM (BLSTM)
- Gated recurrent units (GRU)
You need to have the following installed:
- compiled tools:
bash tools/compile_tools.sh
- Python dependencies:
pip install -r requirements.txt
- Festival:
sudo apt-get install festival
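Before moving on, it can help to confirm that the prerequisite commands are actually on your PATH. The following is a minimal sketch and not part of the repository: the helper names `have_cmd` and `missing_cmds` are hypothetical, and the list of commands checked (`bash`, `pip`, `festival`) is an assumption taken from the install steps above.

```shell
#!/bin/sh
# Sketch: report which prerequisite commands are missing from PATH.
# The command list is an assumption based on the install steps above.

# have_cmd NAME: succeeds if the shell can resolve NAME as a command.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# missing_cmds NAME...: print each NAME that cannot be resolved, one per line.
missing_cmds() {
    for cmd in "$@"; do
        have_cmd "$cmd" || printf '%s\n' "$cmd"
    done
}

missing="$(missing_cmds bash pip festival)"
if [ -n "$missing" ]; then
    printf 'Missing prerequisites:\n%s\n' "$missing"
else
    echo "All prerequisite commands found."
fi
```

This only checks that the commands exist; it does not verify versions or that the compiled tools were built successfully.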
To run this demo, the ./egs/slt_arctic/s1/run_full_voice.sh script goes through the following steps.
The first step is to check the continuous vocoder requirements on your system:
./01_chk_rqmts.sh
The second step is to run setup, which creates directories and downloads the required training data files:
./02_setup.sh slt_arctic_full
OR
./02_setup.sh bdl_arctic_full
It also creates a global config file, conf/global_settings.cfg, where default settings are stored.
Directory structure:
.
├── misc
│   └── scripts
│       └── vocoder
│           ├── continuous
│           └── ...
├── egs
│   └── slt_arctic
│       └── s1
│           ├── run_full_voice.sh
│           ├── conf
│           ├── scripts
│           └── experiments
│               └── slt_arctic_full
│                   ├── acoustic_model
│                   ├── duration_model
│                   └── test_synthesis
├── src
└── tools
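To verify that the experiment directories from the layout above are in place, a small check like the one below may be useful. It is a sketch under assumptions: it supposes that `experiments/<voice>/` sits under `egs/slt_arctic/s1`, and the function name `missing_experiment_dirs` and the `S1_DIR` and `VOICE` variables are hypothetical names introduced here.

```shell
#!/bin/sh
# missing_experiment_dirs ROOT VOICE: print each expected sub-directory of
# ROOT/experiments/VOICE that does not exist, one path per line.
# The three sub-directory names come from the directory structure above.
missing_experiment_dirs() {
    root="$1"
    voice="$2"
    for sub in acoustic_model duration_model test_synthesis; do
        dir="$root/experiments/$voice/$sub"
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
}

# S1_DIR and VOICE are hypothetical overrides for this sketch.
missing_experiment_dirs "${S1_DIR:-egs/slt_arctic/s1}" "${VOICE:-slt_arctic_full}"
```

An empty output means all three directories were found; any printed path is a directory that is still missing.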
At this point, we have to prepare two config files to train the DNN models:
- Acoustic Model
- Duration Model
To prepare the config files:
./03_prepare_conf_files.sh conf/global_settings.cfg
To train the duration model:
./04_train_duration_model.sh conf/duration_slt_arctic_full.conf
OR
./04_train_duration_model.sh conf/duration_bdl_arctic_full.conf
To train the acoustic model:
./05_train_acoustic_model.sh conf/acoustic_slt_arctic_full.conf
OR
./05_train_acoustic_model.sh conf/acoustic_bdl_arctic_full.conf
To synthesize speech with the continuous vocoder:
./06_run_merlin.sh conf/test_dur_synth_slt_arctic_full.conf conf/test_synth_slt_arctic_full.conf
OR
./06_run_merlin.sh conf/test_dur_synth_bdl_arctic_full.conf conf/test_synth_bdl_arctic_full.conf
The synthesised waveforms will be stored in: /<experiment_dir>/test_synthesis/wav
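The six numbered steps follow a fixed naming pattern per voice, so they can be listed from a single voice name. The following is a minimal sketch of that idea, not the repository's run_full_voice.sh: the function name `pipeline_commands` and the `VOICE` and `RUN` variables are hypothetical, and it assumes you run it from the directory that contains the numbered scripts.

```shell
#!/bin/sh
# Sketch: list the documented steps, in order, for one voice.
# The voice should be a name accepted by 02_setup.sh
# (slt_arctic_full or bdl_arctic_full in the steps above).
pipeline_commands() {
    voice="$1"
    cat <<EOF
./01_chk_rqmts.sh
./02_setup.sh ${voice}
./03_prepare_conf_files.sh conf/global_settings.cfg
./04_train_duration_model.sh conf/duration_${voice}.conf
./05_train_acoustic_model.sh conf/acoustic_${voice}.conf
./06_run_merlin.sh conf/test_dur_synth_${voice}.conf conf/test_synth_${voice}.conf
EOF
}

# Dry run by default: each command is printed, not executed.
# RUN=1 is a hypothetical switch for this sketch; with it, each step is
# executed and the loop stops at the first failing step.
pipeline_commands "${VOICE:-slt_arctic_full}" | while IFS= read -r step; do
    echo "+ $step"
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$step" || { echo "step failed: $step" >&2; exit 1; }
    fi
done
```

Running it with no variables set prints the commands for slt_arctic_full; setting `VOICE=bdl_arctic_full` prints the bdl variants instead. Keeping the dry run as the default makes it easy to review the commands before any training starts.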
If you want to test the trained version, the ./tts_demo.sh script will:
- Create the txt directory in experiments/slt_arctic_full/test_synthesis.
- Ask you to enter a new sentence.
- Synthesise speech with the continuous vocoder.
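The first two demo steps amount to creating a txt directory and storing the entered sentence as a text file. The sketch below illustrates that idea only; it is not the code of tts_demo.sh. The function name `save_sentence` and the file name `demo_001.txt` are placeholders invented for this example.

```shell
#!/bin/sh
# save_sentence DIR TEXT: create DIR/txt and store TEXT in a file there.
# "demo_001.txt" is a placeholder name for this sketch, not necessarily
# the name that tts_demo.sh uses.
save_sentence() {
    mkdir -p "$1/txt" || return 1
    printf '%s\n' "$2" > "$1/txt/demo_001.txt"
}
```

For example, `save_sentence experiments/slt_arctic_full/test_synthesis "Hello world."` would write the sentence to a file under the txt directory. Synthesis itself is still done by the repository's scripts.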
Post your questions, suggestions, and discussions to GitHub Issues.
Speech Technology and Smart Interactions Laboratory
If you publish work based on Continuous TTS, please cite:
- Al-Radhi M.S., Csapó T.G., Németh G. (2020). conTTS: Text-to-Speech Application using a Continuous Vocoder. Accepted to ISSP 2020.
- Al-Radhi M.S., Csapó T.G., Németh G. (2020). Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis. IEICE Transactions on Information and Systems, E103.D(5), pp. 1099-1107.
- Al-Radhi M.S., Csapó T.G., Németh G. (2017). Deep Recurrent Neural Networks in Speech Synthesis Using a Continuous Vocoder. In: Karpov A., Potapova R., Mporas I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham, Hatfield, UK.
