HIFIGAN_VOCODER_V3_LJSPEECH

torchaudio.prototype.pipelines.HIFIGAN_VOCODER_V3_LJSPEECH

[DEPRECATED]

Warning

This object is deprecated as of version 2.8 and will be removed in the 2.9 release. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. Please see https://github.com/pytorch/audio/issues/3902 for more information.
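Code that still depends on this object may want to check the installed version before importing it. The following is a minimal sketch, assuming version strings of the form `major.minor[.patch...]`; the helper name `removed_per_notice` is hypothetical and not part of TorchAudio, and the 2.9 threshold is taken only from the notice above.

```python
def removed_per_notice(version: str) -> bool:
    """Return True if `version` is 2.9 or later, where the notice says the object is removed.

    Only the first two dot-separated fields are inspected, so suffixes such as
    "+cu121" or ".dev20250101" in later fields are ignored.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 9)
```

A caller could pass `torchaudio.__version__` to this helper and fall back to another vocoder when it returns `True`.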

HiFiGAN Vocoder pipeline, trained on The LJ Speech Dataset [Ito and Johnson, 2017].

This pipeline can be used with an external component that generates mel spectrograms from text, for example Tacotron2; see the examples in HiFiGANVocoderBundle. Although it works with the existing Tacotron2 bundles, for the best results Tacotron2 should be retrained using the same data preprocessing pipeline that was used to train HiFiGAN. In particular, the original HiFiGAN implementation generates mel spectrograms from waveforms with a custom method that differs from torchaudio.transforms.MelSpectrogram. We reimplemented this transform as HiFiGANVocoderBundle.get_mel_transform(), making sure it is equivalent to the original HiFiGAN code.

The underlying vocoder is constructed by torchaudio.prototype.models.hifigan_vocoder(). The weights are converted from the ones published with the original paper [Kong et al., 2020] under the MIT License. See the links to the pre-trained models on GitHub.

Please refer to HiFiGANVocoderBundle for usage instructions.
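As a rough illustration of how the pieces described above fit together, the sketch below round-trips a waveform through the bundle's own mel transform and vocoder. The function names `resynthesize` and `demo` are hypothetical, and the tensor shapes in the comments are assumptions rather than documented guarantees. The TorchAudio imports live inside `demo` so that the prototype module, which is deprecated, is only required when the demo is actually run.

```python
def resynthesize(bundle, waveform):
    """Waveform -> mel spectrogram -> waveform, using the bundle's own components."""
    # Use the bundle's mel transform rather than torchaudio.transforms.MelSpectrogram,
    # since the vocoder was trained on HiFiGAN-style mel spectrograms.
    mel_transform = bundle.get_mel_transform()
    vocoder = bundle.get_vocoder()  # fetches the pretrained weights
    mel_spec = mel_transform(waveform)
    return vocoder(mel_spec)


def demo():
    """Resynthesize one second of a 440 Hz tone (requires torch and torchaudio <= 2.8)."""
    import math

    import torch
    from torchaudio.prototype.pipelines import HIFIGAN_VOCODER_V3_LJSPEECH as bundle

    sample_rate = bundle.sample_rate
    t = torch.arange(sample_rate) / sample_rate
    # Assumed input layout: (batch, time)
    waveform = 0.5 * torch.sin(2 * math.pi * 440.0 * t).unsqueeze(0)
    with torch.inference_mode():
        output = resynthesize(bundle, waveform)
    print(output.shape)
```

Calling `demo()` downloads the pretrained weights on first use. In a text-to-speech setup, the mel spectrogram would instead come from a model such as Tacotron2, and only the `vocoder(mel_spec)` step would apply.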
