Deep learning for Text-to-Speech with continuous speech analysis and synthesis system based on Merlin

License

Apache-2.0 (LICENSE); unknown license (COPYING)
malradhi/merlin

 
 

DNN-TTS-ContVoc: Fully Text-To-Speech Demo using Continuous Vocoder

This repository contains a TTS system based on a continuous vocoder developed at the Speech Technology and Smart Interactions Laboratory (SmartLab), Budapest University of Technology and Economics.

Unlike other traditional statistical parametric vocoders, the continuous model focuses on extracting continuous parameters:

  • Fundamental Frequency (F0)
  • Maximum Voiced Frequency (MVF)
  • Mel-Generalized Cepstral (MGC)

Continuous DNN-TTS

Besides feed-forward neural networks, this demo also supports recurrent neural networks (RNNs):

  • Long short-term memory (LSTM)
  • Bidirectional LSTM (BLSTM)
  • Gated recurrent units (GRU)

Installation

Install the following:

  • compiled tools: bash tools/compile_tools.sh
  • python dependencies: pip install -r requirements.txt
  • festival: sudo apt-get install festival
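After installing, a quick sanity check can confirm that the expected commands are on your PATH. The helper below is only a sketch: missing_cmds is a hypothetical name that is not part of this repository, and it does not replace the repository's own requirements check.

```shell
# Sketch (hypothetical helper, not part of the repository):
# print every given command that cannot be found on PATH.
# Exits with status 1 if at least one command is missing, 0 otherwise.
missing_cmds() (
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    exit "$status"
)
```

For example, `missing_cmds festival python pip` prints nothing when all three commands are available.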

Run demo

To run this demo, use the ./egs/slt_arctic/s1/run_full_voice.sh script, which performs the following steps:

1. Check for missing packages

The first step is to check that your system meets the continuous vocoder requirements.

./01_chk_rqmts.sh

2. Setting up

The second step runs the setup script, which creates the directories and downloads the required training data files.

./02_setup.sh slt_arctic_full

OR

./02_setup.sh bdl_arctic_full

It also creates a global config file: conf/global_settings.cfg, where default settings are stored.
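To inspect a stored default without opening the file, a small lookup helper may be convenient. This is a sketch under an assumption: it treats conf/global_settings.cfg as plain KEY=VALUE lines, which this README does not confirm. The helper name cfg_get and the key Voice in the usage note are illustrative only.

```shell
# Sketch (hypothetical helper): print the value of KEY from a file of
# KEY=VALUE lines. Assumes unquoted values and simple alphanumeric keys.
# If the key appears more than once, the last occurrence wins.
# Usage: cfg_get <config-file> <key>
cfg_get() (
    sed -n "s/^$2=//p" "$1" | tail -n 1
)
```

For example, `cfg_get conf/global_settings.cfg Voice` would print the configured voice name, provided the file actually defines a key with that name.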

Directory structure:

.
├── misc
│   └── scripts
│       └── vocoder
│           ├── continuous        
│           └── ...
├── egs                     
│   └── slt_arctic
│       └── s1
│           ├── run_full_voice.sh
│           ├── conf
│           ├── scripts
│           └── experiments
│               └── slt_arctic_full                      
│                   ├── acoustic_model                  
│                   ├── duration_model                        
│                   └── test_synthesis
├── src
└── tools               
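To confirm that setup produced the experiment directories shown in the tree above, a check along these lines could be used. It is a sketch: missing_dirs is a hypothetical helper, and it assumes that each voice gets its own directory under experiments (the tree only shows slt_arctic_full).

```shell
# Sketch (hypothetical helper): list the experiment sub-directories from the
# tree above that do not exist yet.
# Usage: missing_dirs <path-to-s1-directory> <voice-name>
missing_dirs() (
    exp="$1/experiments/$2"
    for d in "$exp/acoustic_model" "$exp/duration_model" "$exp/test_synthesis"; do
        [ -d "$d" ] || echo "$d"
    done
)
```

For example, `missing_dirs egs/slt_arctic/s1 slt_arctic_full` prints nothing when all three directories are present.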

3. Prepare config files

At this point, we have to prepare two config files to train the DNN models:

  • Acoustic Model
  • Duration Model

To prepare config files:

./03_prepare_conf_files.sh conf/global_settings.cfg

4. Train duration model

To train the duration model:

./04_train_duration_model.sh conf/duration_slt_arctic_full.conf

OR

./04_train_duration_model.sh conf/duration_bdl_arctic_full.conf

5. Train acoustic model

To train the acoustic model:

./05_train_acoustic_model.sh conf/acoustic_slt_arctic_full.conf

OR

./05_train_acoustic_model.sh conf/acoustic_bdl_arctic_full.conf

6. Synthesize speech

To synthesize speech with the continuous vocoder:

./06_run_merlin.sh conf/test_dur_synth_slt_arctic_full.conf conf/test_synth_slt_arctic_full.conf

OR

./06_run_merlin.sh conf/test_dur_synth_bdl_arctic_full.conf conf/test_synth_bdl_arctic_full.conf

The synthesised waveforms will be stored in: /<experiment_dir>/test_synthesis/wav
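The six steps above follow one naming pattern per voice, so the full command sequence can be generated from the voice name. The sketch below only prints the commands (a dry run) and does not execute anything. pipeline_cmds is a hypothetical helper, not a script shipped with this repository, and the config file names are taken from the steps above.

```shell
# Sketch (hypothetical helper): print the step-by-step commands for one voice.
# Usage: pipeline_cmds [voice-name]   (defaults to slt_arctic_full)
pipeline_cmds() (
    voice="${1:-slt_arctic_full}"
    echo "./01_chk_rqmts.sh"
    echo "./02_setup.sh ${voice}"
    echo "./03_prepare_conf_files.sh conf/global_settings.cfg"
    echo "./04_train_duration_model.sh conf/duration_${voice}.conf"
    echo "./05_train_acoustic_model.sh conf/acoustic_${voice}.conf"
    echo "./06_run_merlin.sh conf/test_dur_synth_${voice}.conf conf/test_synth_${voice}.conf"
)
```

For example, `pipeline_cmds bdl_arctic_full > steps.sh` writes the sequence for the bdl voice to a file that can be reviewed and then run with `sh -e steps.sh` from the s1 directory.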



Test TTS demo with continuous vocoder

To test the trained voice, run the ./tts_demo.sh script, which will:

  • Create the txt directory in experiments/slt_arctic_full/test_synthesis.
  • Ask you to enter a new sentence.
  • Synthesise speech with continuous vocoder
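After the demo finishes, you may want to confirm that audio was actually produced. The sketch below counts .wav files in a directory. count_wavs is a hypothetical helper, and it assumes the demo writes to the same test_synthesis/wav directory as step 6, which this README does not state explicitly.

```shell
# Sketch (hypothetical helper): count the .wav files directly inside a
# directory. Prints 0 if the directory does not exist.
# Usage: count_wavs <directory>
count_wavs() (
    dir="$1"
    [ -d "$dir" ] || { echo 0; exit 0; }
    find "$dir" -maxdepth 1 -type f -name '*.wav' | wc -l | tr -d ' '
)
```

For example, `count_wavs experiments/slt_arctic_full/test_synthesis/wav` should print a number greater than 0 after a successful run, if the assumption about the output directory holds.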





Contact Us

Post your questions, suggestions, and discussions to GitHub Issues.

Speech Technology and Smart Interactions Laboratory



Citation

If you publish work based on Continuous TTS, please cite:
