torchaudio.load

torchaudio.load(uri: Union[BinaryIO, str, PathLike], frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True, format: Optional[str] = None, buffer_size: int = 4096, backend: Optional[str] = None) → Tuple[Tensor, int]

Load audio data from source using TorchCodec’s AudioDecoder.

Note

As of TorchAudio 2.9, this function uses TorchCodec’s decoding capabilities under the hood. It is provided for convenience, but we recommend porting your code to use TorchCodec’s AudioDecoder class directly for better performance: https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.decoders.AudioDecoder. Because of this reliance on TorchCodec, the normalize, buffer_size, and backend parameters are ignored and are accepted only for backward compatibility. To install TorchCodec, follow the instructions at https://github.com/pytorch/torchcodec#installing-torchcodec.

Parameters

uri (path-like object or file-like object) –
Source of audio data. The following types are accepted:
- path-like: File path or URL.
- file-like: Object with read(size: int) -> bytes method.
frame_offset (int, optional) – Number of frames (one sample per channel) to skip before reading starts.
num_frames (int, optional) – Maximum number of frames to read. -1 reads all remaining frames, starting from frame_offset.
normalize (bool, optional) – TorchCodec always returns normalized float32 samples. This parameter is ignored and a warning is issued if set to False. Default: True.
channels_first (bool, optional) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor’s dimension is [time, channel].
format (str or None, optional) – Format hint for the decoder. May not be supported by all TorchCodec decoders. (Default: None)
buffer_size (int, optional) – Not used by TorchCodec AudioDecoder. Provided for API compatibility.
backend (str or None, optional) – Not used by TorchCodec AudioDecoder. Provided for API compatibility.

Returns

Resulting Tensor and sample rate. Always returns float32 tensors. If channels_first=True, shape is [channel, time], otherwise [time, channel].

Return type

(torch.Tensor, int)
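Interpreting the returned pair only requires knowing which axis is time. The sketch below uses a hypothetical helper, describe, and a zero tensor that stands in for real decoded audio; it also shows that the two layouts differ only by a transpose.

```python
import torch


def describe(waveform: torch.Tensor, sample_rate: int, channels_first: bool = True) -> dict:
    """Hypothetical helper: summarise a (waveform, sample_rate) pair from torchaudio.load."""
    if waveform.dim() != 2:
        raise ValueError(f"expected a 2-D tensor, got shape {tuple(waveform.shape)}")
    # Time is the last axis for [channel, time] and the first for [time, channel].
    if channels_first:
        num_channels, num_frames = waveform.shape
    else:
        num_frames, num_channels = waveform.shape
    return {
        "dtype": waveform.dtype,
        "num_channels": num_channels,
        "num_frames": num_frames,
        "duration_seconds": num_frames / sample_rate,
    }


stereo = torch.zeros(2, 48000)  # stand-in for a loaded [channel, time] tensor
info = describe(stereo, 16000)
# stereo.T is the [time, channel] layout that channels_first=False would return.
info_t = describe(stereo.T, 16000, channels_first=False)
```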

Raises

ImportError – If torchcodec is not available.
ValueError – If unsupported parameters are used.
RuntimeError – If TorchCodec fails to decode the audio.
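Because a missing TorchCodec surfaces as ImportError, callers that only need simple WAV files can catch it and fall back to another reader. The following is a minimal sketch of that pattern, not a recommended replacement: read_pcm16_wav and load_with_fallback are hypothetical helpers, and the fallback handles only 16-bit PCM WAV through Python’s standard wave module.

```python
import struct
import wave
from typing import List, Tuple


def read_pcm16_wav(path: str) -> Tuple[List[List[float]], int]:
    """Stdlib-only reader for 16-bit PCM WAV files.

    Returns samples as [channel][time] lists scaled by 1/32768, mirroring
    torchaudio.load's default normalized, channels-first output.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        num_channels = w.getnchannels()
        sample_rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)
    # Samples are interleaved: frame 0 ch 0, frame 0 ch 1, frame 1 ch 0, ...
    channels = [[s / 32768 for s in ints[c::num_channels]] for c in range(num_channels)]
    return channels, sample_rate


def load_with_fallback(path: str) -> Tuple[List[List[float]], int]:
    """Prefer torchaudio.load; fall back to the stdlib reader on ImportError."""
    try:
        import torchaudio

        waveform, sample_rate = torchaudio.load(path)
        return waveform.tolist(), sample_rate
    except ImportError:
        # Raised when torchaudio itself, or TorchCodec underneath it, is not installed.
        return read_pcm16_wav(path)
```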

Note

- TorchCodec always returns normalized float32 samples, so the normalize parameter has no effect.
- The buffer_size and backend parameters are ignored.
- Some audio formats supported by the previous torchaudio backends may not be supported by TorchCodec.
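Code that relied on normalize=False to obtain integer samples now has to rescale the float32 output itself. A minimal sketch, assuming a hypothetical helper to_int16 and the common 16-bit PCM scale factor of 32768; it produces int16 samples but does not recover a file’s original bit depth.

```python
import torch


def to_int16(waveform: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: rescale normalized float samples to 16-bit integer PCM.

    Assumes a 2**15 scale factor; values that would overflow
    (for example exactly +1.0) are clamped to the int16 range.
    """
    scaled = torch.round(waveform * 32768.0)
    return scaled.clamp(-32768, 32767).to(torch.int16)


x = torch.tensor([[0.0, 0.5, -0.5, 1.0, -1.0]])
y = to_int16(x)
```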

Tutorials using load:

- Speech Recognition with Wav2Vec2
- Audio Feature Augmentation
- Audio Data Augmentation
- Torchaudio-Squim: Non-intrusive Speech Assessment in TorchAudio
- Audio Feature Extractions
- Music Source Separation with Hybrid Demucs
- Speech Enhancement with MVDR Beamforming
- CTC forced alignment API tutorial
- Forced alignment for multilingual data
- Forced Alignment with Wav2Vec2
