AudioDecoder

class torchcodec.decoders.AudioDecoder(source: Union[str, Path, RawIOBase, BufferedReader, bytes, Tensor], *, stream_index: Optional[int] = None, sample_rate: Optional[int] = None, num_channels: Optional[int] = None)

A single-stream audio decoder.

This can be used to decode audio from pure audio files (e.g. mp3, wav, etc.), or from videos that contain audio streams (e.g. mp4 videos).

Returned samples are float samples normalized to the range [-1, 1].

Parameters:

source (str, pathlib.Path, bytes, torch.Tensor or file-like object) –
The source of the audio or video:
- If str: a local path or a URL to a video or audio file.
- If pathlib.Path: a path to a local video or audio file.
- If bytes object or torch.Tensor: the raw encoded audio data.
- If file-like object: data is read from the object on demand. The object must expose the methods read(self, size: int) -> bytes and seek(self, offset: int, whence: int) -> int. Read more in: Streaming data through file-like support.
stream_index (int, optional) – Specifies which stream in the file to decode samples from. Note that this index is absolute across all media types. If left unspecified, then the best stream is used.
sample_rate (int, optional) – The desired output sample rate of the decoded samples. By default, the sample rate of the source is used.
num_channels (int, optional) – The desired number of channels of the decoded samples. By default, the number of channels of the source is used.
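
As a minimal sketch of the file-like case, the wrapper below exposes the read and seek methods described for source over an in-memory buffer. The names BytesSource and open_decoder are hypothetical and not part of torchcodec; the torchcodec import is deferred into the helper so the wrapper itself can be exercised without the library installed.

```python
import io


class BytesSource:
    """Hypothetical file-like wrapper exposing read() and seek()."""

    def __init__(self, payload: bytes):
        self._buffer = io.BytesIO(payload)

    def read(self, size: int) -> bytes:
        # Return at most `size` bytes; an empty bytes object signals end of data.
        return self._buffer.read(size)

    def seek(self, offset: int, whence: int) -> int:
        # Return the new absolute position, as io-style seek() does.
        return self._buffer.seek(offset, whence)


def open_decoder(encoded: bytes, sample_rate=None, num_channels=None):
    # Hypothetical helper. Imported lazily so this sketch loads without torchcodec.
    from torchcodec.decoders import AudioDecoder

    return AudioDecoder(
        BytesSource(encoded),
        sample_rate=sample_rate,      # None keeps the source sample rate
        num_channels=num_channels,    # None keeps the source channel count
    )
```

Leaving sample_rate and num_channels as None keeps the source's own values, matching the defaults listed above.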

Variables:

metadata (AudioStreamMetadata) – Metadata of the audio stream.
stream_index (int) – The stream index that this decoder is retrieving samples from. If a stream index was provided at initialization, this is the same value. If it was left unspecified, this is the best stream.

Examples using AudioDecoder:

Decoding audio streams with AudioDecoder

Streaming data through file-like support

get_all_samples() → AudioSamples

Returns all the audio samples from the source.

To decode samples in a specific range, use get_samples_played_in_range().

Returns: The samples within the file.
Return type: AudioSamples

get_samples_played_in_range(start_seconds: float = 0.0, stop_seconds: Optional[float] = None) → AudioSamples

Returns audio samples in the given range.

Samples are in the half-open range [start_seconds, stop_seconds).

To decode all the samples from beginning to end, call this method with start_seconds and stop_seconds left at their default values, or use get_all_samples() as a more convenient alias.

Parameters:

start_seconds (float) – Time, in seconds, of the start of the range. Default: 0.
stop_seconds (float or None) – Time, in seconds, of the end of the range. As a half open range, the end is excluded. Default: None, which decodes samples until the end.

Returns:

The samples within the specified range.

Return type:

AudioSamples
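
Because the range is half-open, consecutive calls whose stop and start values coincide cover a stream without decoding any sample twice. The sketch below illustrates that idea; window_bounds and decode_in_windows are hypothetical helpers, and the total duration is assumed to be supplied by the caller.

```python
def window_bounds(duration_seconds: float, window_seconds: float):
    """Split [0, duration_seconds) into consecutive half-open windows."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    bounds = []
    index = 0
    while index * window_seconds < duration_seconds:
        start = index * window_seconds
        # Clamp the last window so it does not run past the end.
        stop = min(start + window_seconds, duration_seconds)
        bounds.append((start, stop))
        index += 1
    return bounds


def decode_in_windows(decoder, duration_seconds: float, window_seconds: float):
    # `decoder` is expected to be an AudioDecoder; each call returns AudioSamples.
    for start, stop in window_bounds(duration_seconds, window_seconds):
        yield decoder.get_samples_played_in_range(
            start_seconds=start, stop_seconds=stop
        )
```

Each window's stop is the next window's start, so the excluded end of one range is the included start of the next.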
