HiFiGANVocoderBundle¶
- class torchaudio.prototype.pipelines.HiFiGANVocoderBundle[source]¶
- Data class that bundles associated information to use pretrained - HiFiGANVocoder.- This class provides interfaces for instantiating the pretrained model along with the information necessary to retrieve pretrained weights and additional data to be used with the model. - Torchaudio library instantiates objects of this class, each of which represents a different pretrained model. Client code should access pretrained models via these instances. - This bundle can convert mel spectrorgam to waveforms and vice versa. A typical use case would be a flow like text -> mel spectrogram -> waveform, where one can use an external component, e.g. Tacotron2, to generate mel spectrogram from text. Please see below for the code example. - Example: Transform synthetic mel spectrogram to audio.
- >>> import torch >>> import torchaudio >>> # Since HiFiGAN bundle is in prototypes, it needs to be exported explicitly >>> from torchaudio.prototype.pipelines import HIFIGAN_VOCODER_V3_LJSPEECH as bundle >>> >>> # Load the HiFiGAN bundle >>> vocoder = bundle.get_vocoder() Downloading: "https://download.pytorch.org/torchaudio/models/hifigan_vocoder_v3_ljspeech.pth" 100%|████████████| 5.59M/5.59M [00:00<00:00, 18.7MB/s] >>> >>> # Generate synthetic mel spectrogram >>> specgram = torch.sin(0.5 * torch.arange(start=0, end=100)).expand(bundle._vocoder_params["in_channels"], 100) >>> >>> # Transform mel spectrogram into audio >>> waveform = vocoder(specgram) >>> torchaudio.save('sample.wav', waveform, bundle.sample_rate) 
- Example: Usage together with Tacotron2, text to audio.
- >>> import torch >>> import torchaudio >>> # Since HiFiGAN bundle is in prototypes, it needs to be exported explicitly >>> from torchaudio.prototype.pipelines import HIFIGAN_VOCODER_V3_LJSPEECH as bundle_hifigan >>> >>> # Load Tacotron2 bundle >>> bundle_tactron2 = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH >>> processor = bundle_tactron2.get_text_processor() >>> tacotron2 = bundle_tactron2.get_tacotron2() >>> >>> # Use Tacotron2 to convert text to mel spectrogram >>> text = "A quick brown fox jumped over a lazy dog" >>> input, lengths = processor(text) >>> specgram, lengths, _ = tacotron2.infer(input, lengths) >>> >>> # Load HiFiGAN bundle >>> vocoder = bundle_hifigan.get_vocoder() Downloading: "https://download.pytorch.org/torchaudio/models/hifigan_vocoder_v3_ljspeech.pth" 100%|████████████| 5.59M/5.59M [00:03<00:00, 1.55MB/s] >>> >>> # Use HiFiGAN to convert mel spectrogram to audio >>> waveform = vocoder(specgram).squeeze(0) >>> torchaudio.save('sample.wav', waveform, bundle_hifigan.sample_rate) 
 
Properties¶
sample_rate¶
Methods¶
get_mel_transform¶
get_vocoder¶
- HiFiGANVocoderBundle.get_vocoder(*, dl_kwargs=None) HiFiGANVocoder[source]¶
- Construct the HiFiGAN Generator model, which can be used a vocoder, and load the pretrained weight. - The weight file is downloaded from the internet and cached with - torch.hub.load_state_dict_from_url()- Parameters:
- dl_kwargs (dictionary of keyword arguments) – Passed to - torch.hub.load_state_dict_from_url().
- Returns:
- Variation of - HiFiGANVocoder.