torchaudio.prototype.models.conformer_wav2vec2_pretrain_large¶
- torchaudio.prototype.models.conformer_wav2vec2_pretrain_large(extractor_input_dim: int = 64, extractor_output_dim: int = 256, encoder_projection_dropout: float = 0.0, mask_prob: float = 0.3, mask_length: int = 3, num_negatives: int = 100, cross_sample_negatives: int = 0) ConformerWav2Vec2PretrainModel[source]¶
DEPRECATED
Warning
This function has been deprecated and will be removed in the 2.9 release. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. Please see https://github.com/pytorch/audio/issues/3902 for more information.
Build Conformer Wav2Vec2 Model for pre-training with “large” architecture from Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks [Srivastava et al., 2022]
- Parameters
extractor_input_dim (int, optional) – Input dimension of the features. (Default: 64)
extractor_output_dim (int, optional) – Output dimension after feature extraction. (Default: 256)
encoder_projection_dropout (float, optional) – The dropout probability applied after the input feature is projected to embed_dim. (Default: 0.0)
mask_prob (float, optional) – Probability for each token to be chosen as start of the span to be masked. (Default: 0.3)
mask_length (int, optional) – The lengths of the mask. (Default: 3)
num_negatives (int, optional) – Number of sampled negatives. (Default: 100)
cross_sample_negatives (int, optional) – Number of cross sampled negatives. (Default: 0)
- Returns
The resulting model.
- Return type
ConformerWav2Vec2PretrainModel