An implementation of the paper:

> Autoregressive Co-Training for Learning Discrete Speech Representations
> Sung-Lin Yeh, Hao Tang
Install the dependencies with:

```sh
pip install -r requirements.txt
```
The co-training model described in the paper is defined in `cotraining.py`. The components of the model
are modular and can be easily swapped or modified.
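As a rough illustration of the marginalization used in the models below (names and shapes here are illustrative, not the repo's actual API), the prediction of the next frame can be viewed as a mixture over K discrete codes, weighted by the posterior over codes:

```python
import numpy as np

# Hypothetical sketch of marginalizing over a discrete codebook:
#   p(x_{t+1} | x_{<=t}) = sum_k q(k | x_{<=t}) * p(x_{t+1} | k, x_{<=t})
# All variables below are made up for illustration.

rng = np.random.default_rng(0)
K, D = 4, 8                      # codebook size, feature dimension

logits = rng.normal(size=K)      # posterior logits over codes from an encoder
q = np.exp(logits - logits.max())
q /= q.sum()                     # q(k | x_{<=t}), a proper distribution

means = rng.normal(size=(K, D))  # per-code predictions of the next frame

# Marginalize: the expected next-frame prediction under q
pred = q @ means                 # shape (D,)
```

This mixture form lets the gradient flow through all codes rather than a single hard assignment.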
Data are expected in Kaldi I/O format, which uses scp files to map utterance ids to byte offsets in ark files.
The functions used to process .scp and .ark files can be found under dataflow/. We provide a data sample
in sample/ so users can run the pipeline end to end. A custom dataloader can simply be plugged in here instead.
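For readers unfamiliar with the format, here is a minimal sketch of parsing a Kaldi-style scp file (illustrative only; the repo's own readers live under dataflow/). Each line maps an utterance id to an ark file and a byte offset, e.g. `utt001 data/feats.ark:13`:

```python
def parse_scp(path):
    """Map utterance ids to (ark_path, byte_offset) pairs.

    Assumes the standard Kaldi scp line format:
        <utt_id> <ark_path>:<offset>
    """
    entries = {}
    with open(path) as f:
        for line in f:
            utt_id, rxspec = line.strip().split(None, 1)
            ark_path, offset = rxspec.rsplit(":", 1)
            entries[utt_id] = (ark_path, int(offset))
    return entries
```

The offset points at the start of the utterance's matrix inside the ark file, so features can be read with a seek rather than scanning the whole archive.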
Train with:

```sh
python3 train.py --config config/cotraining.yaml
```
| Hours | Num codes | Model | dev93 (PER) | eval92 (PER) | Link |
|---|---|---|---|---|---|
| 360 | 256 | 3-layer LSTM with marginalization | 19.5 | 19.0 | link |
| 960 | 256 | 3-layer LSTM with marginalization | 18.2 | 17.8 | link |