torchaudio.load

torchaudio.load(uri: Union[BinaryIO, str, PathLike], frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True, format: Optional[str] = None, buffer_size: int = 4096, backend: Optional[str] = None) -> Tuple[Tensor, int]

Load audio data from source using TorchCodec’s AudioDecoder.

Note

As of TorchAudio 2.9, this function relies on TorchCodec’s decoding capabilities under the hood. It is provided for convenience, but we recommend porting your code to use TorchCodec’s AudioDecoder class directly for better performance: https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.decoders.AudioDecoder. Because this function relies on TorchCodec, the parameters normalize, buffer_size, and backend are ignored and are accepted only for backwards compatibility. To install TorchCodec, follow the instructions at https://github.com/pytorch/torchcodec#installing-torchcodec.
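As a rough illustration of the port recommended above, the sketch below decodes a file with TorchCodec's AudioDecoder while keeping the frame_offset / num_frames style of addressing. The helper names frames_to_seconds and decode_with_torchcodec are hypothetical, and converting frames to seconds with the stream's own sample rate is an assumption about how the two APIs line up, not a description of what torchaudio.load does internally.

```python
from typing import Optional, Tuple


def frames_to_seconds(
    frame_offset: int, num_frames: int, sample_rate: int
) -> Tuple[float, Optional[float]]:
    """Convert a (frame_offset, num_frames) pair into a time range in seconds.

    num_frames == -1 means "read to the end", expressed here as stop=None.
    """
    start = frame_offset / sample_rate
    stop = None if num_frames == -1 else (frame_offset + num_frames) / sample_rate
    return start, stop


def decode_with_torchcodec(path: str, frame_offset: int = 0, num_frames: int = -1):
    """Hypothetical helper: decode part of an audio file with AudioDecoder."""
    # Imported lazily so the pure helper above works without TorchCodec installed.
    from torchcodec.decoders import AudioDecoder

    decoder = AudioDecoder(path)
    sample_rate = decoder.metadata.sample_rate
    start, stop = frames_to_seconds(frame_offset, num_frames, sample_rate)
    samples = decoder.get_samples_played_in_range(
        start_seconds=start, stop_seconds=stop
    )
    # samples.data is a float32 tensor laid out as [channel, time].
    return samples.data, samples.sample_rate
```

Calling decode_with_torchcodec("speech.wav", frame_offset=16000, num_frames=8000) on a 16 kHz file (the file name is a placeholder) would request the range from 1.0 s to 1.5 s.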

Parameters
  • uri (path-like object or file-like object) –

    Source of audio data. The following types are accepted:

    • path-like: File path or URL.

    • file-like: Object with read(size: int) -> bytes method.

  • frame_offset (int, optional) – Number of samples to skip before reading starts.

  • num_frames (int, optional) – Maximum number of samples to read. -1 reads all the remaining samples, starting from frame_offset.

  • normalize (bool, optional) – TorchCodec always returns normalized float32 samples. This parameter is ignored and a warning is issued if set to False. Default: True.

  • channels_first (bool, optional) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor’s dimension is [time, channel].

  • format (str or None, optional) – Format hint for the decoder. May not be supported by all TorchCodec decoders. (Default: None)

  • buffer_size (int, optional) – Not used by TorchCodec AudioDecoder. Provided for API compatibility.

  • backend (str or None, optional) – Not used by TorchCodec AudioDecoder. Provided for API compatibility.
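The uri parameter accepts a file-like object as well as a path. The sketch below builds a small WAV file in memory with the Python standard library and passes it to torchaudio.load. The helper names make_sine_wav and load_from_file_like are hypothetical, and the tone parameters are arbitrary illustration values.

```python
import io
import math
import struct
import wave


def make_sine_wav(sample_rate: int = 16000, seconds: float = 1.0, freq: float = 440.0) -> io.BytesIO:
    """Return a 16-bit mono WAV file held in memory, rewound to the start."""
    num_samples = int(sample_rate * seconds)
    pcm = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(num_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    buf.seek(0)  # rewind so the decoder reads from the first byte
    return buf


def load_from_file_like(fileobj):
    """Hypothetical helper: decode any object exposing read(size) -> bytes."""
    import torchaudio  # imported lazily; requires torchaudio and torchcodec

    return torchaudio.load(fileobj)
```

With torchaudio and TorchCodec installed, waveform, sample_rate = load_from_file_like(make_sine_wav()) should give a float32 tensor in [channel, time] layout together with the sample rate of the stream.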

Returns

Resulting Tensor and sample rate. Always returns float32 tensors. If channels_first=True, shape is [channel, time], otherwise [time, channel].
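To make the relationship between frame_offset, num_frames, channels_first, and the returned shape concrete, here is a minimal sketch. The function expected_shape is a hypothetical helper that encodes only the semantics described in the parameter list above (num_frames as an upper bound, -1 meaning "all remaining samples"). It is not part of torchaudio, and actual decoders may differ by a few samples at the edges.

```python
from typing import Tuple


def expected_shape(
    num_channels: int,
    total_frames: int,
    frame_offset: int = 0,
    num_frames: int = -1,
    channels_first: bool = True,
) -> Tuple[int, int]:
    """Shape implied by the documented parameters, for a known source length."""
    remaining = max(total_frames - frame_offset, 0)
    length = remaining if num_frames == -1 else min(num_frames, remaining)
    return (num_channels, length) if channels_first else (length, num_channels)


def load_clip(path: str, frame_offset: int, num_frames: int, channels_first: bool = True):
    """Hypothetical wrapper showing the keyword arguments in use."""
    import torchaudio  # imported lazily; requires torchaudio and torchcodec

    waveform, sample_rate = torchaudio.load(
        path,
        frame_offset=frame_offset,
        num_frames=num_frames,
        channels_first=channels_first,
    )
    return waveform, sample_rate
```

For a stereo file with 48000 frames, skipping 16000 frames and reading at most 8000 corresponds to a [2, 8000] tensor when channels_first=True and [8000, 2] otherwise.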

Return type

(torch.Tensor, int)

Raises

Note

  • TorchCodec always returns normalized float32 samples, so the normalize parameter has no effect.

  • The buffer_size and backend parameters are ignored.

  • Not all audio formats supported by torchaudio backends may be supported by TorchCodec.

Tutorials using load:
Speech Recognition with Wav2Vec2
Audio Feature Augmentation
Audio Data Augmentation
Torchaudio-Squim: Non-intrusive Speech Assessment in TorchAudio
Audio Feature Extractions
Music Source Separation with Hybrid Demucs
Speech Enhancement with MVDR Beamforming
CTC forced alignment API tutorial
Forced alignment for multilingual data
Forced Alignment with Wav2Vec2
