
Encoding audio samples with AudioEncoder

In this example, we’ll learn how to encode audio samples to a file or to raw bytes using the AudioEncoder class.

Let’s first generate some samples to be encoded. The data to be encoded could also just come from an AudioDecoder!

import torch
from IPython.display import Audio as play_audio


def make_sinewave() -> tuple[torch.Tensor, int]:
    freq_A = 440  # Hz
    sample_rate = 16000  # Hz
    duration_seconds = 3  # seconds
    t = torch.linspace(0, duration_seconds, int(sample_rate * duration_seconds), dtype=torch.float32)
    return torch.sin(2 * torch.pi * freq_A * t), sample_rate


samples, sample_rate = make_sinewave()

print(f"Encoding samples with {samples.shape = } and {sample_rate = }")
play_audio(samples, rate=sample_rate)
Encoding samples with samples.shape = torch.Size([48000]) and sample_rate = 16000


We first instantiate an AudioEncoder and pass it the samples to be encoded. The samples must be a 2D tensor of shape (num_channels, num_samples) or, as in this case, a 1D tensor, for which num_channels is assumed to be 1. The values must be floats normalized to [-1, 1], which is also what the AudioDecoder returns.
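If your audio does not already have that layout, it needs converting before it reaches the encoder. The following is a minimal sketch, assuming the source is 16-bit integer PCM stored as (num_samples, num_channels); the helper name to_encoder_layout is hypothetical and not part of torchcodec.

```python
import torch


def to_encoder_layout(pcm: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: int16 PCM of shape (num_samples, num_channels)
    -> float32 tensor of shape (num_channels, num_samples) in [-1, 1]."""
    if pcm.dtype != torch.int16:
        raise TypeError(f"expected int16 PCM, got {pcm.dtype}")
    if pcm.ndim == 1:
        pcm = pcm.unsqueeze(1)  # treat 1D input as a single channel
    # int16 spans [-32768, 32767]; dividing by 32768 maps it into [-1, 1).
    return (pcm.to(torch.float32) / 32768.0).T.contiguous()


# Two channels, four samples each, laid out as (num_samples, num_channels).
pcm = torch.tensor(
    [[0, 32767], [16384, -32768], [-16384, 0], [0, 16384]], dtype=torch.int16
)
samples_2d = to_encoder_layout(pcm)
print(samples_2d.shape, samples_2d.dtype)
```

Dividing by 32768 rather than 32767 keeps the most negative int16 value at exactly -1, at the cost of the maximum landing just below 1.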

Note

The sample_rate parameter corresponds to the sample rate of the input, not the desired encoded sample rate.

from torchcodec.encoders import AudioEncoder

encoder = AudioEncoder(samples=samples, sample_rate=sample_rate)

AudioEncoder supports encoding samples into a file via the to_file() method, or to raw bytes via to_tensor(). For the purpose of this tutorial we’ll use to_tensor(), so that we can easily re-decode the encoded samples and check their properties. The to_file() method works very similarly.

encoded_samples = encoder.to_tensor(format="mp3")
print(f"{encoded_samples.shape = }, {encoded_samples.dtype = }")
encoded_samples.shape = torch.Size([9512]), encoded_samples.dtype = torch.uint8

That’s it!
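Note that the returned tensor holds the bytes of the encoded MP3, one uint8 element per byte, so its length is the size of the compressed data rather than a number of audio samples. The back-of-the-envelope arithmetic below uses the figures printed in this tutorial; the exact encoded size will likely differ with your FFmpeg build and encoder settings.

```python
# Figures taken from the outputs printed earlier in this tutorial.
num_samples = 48000        # samples.shape[0]
sample_rate = 16000        # Hz
encoded_num_bytes = 9512   # encoded_samples.shape[0] for the MP3 output

raw_num_bytes = num_samples * 4  # float32 takes 4 bytes per sample
duration_seconds = num_samples / sample_rate

compression_ratio = raw_num_bytes / encoded_num_bytes
average_bit_rate = encoded_num_bytes * 8 / duration_seconds  # bits per second

print(f"{raw_num_bytes = }, {compression_ratio = :.1f}, {average_bit_rate = :.0f}")
```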

Now that we have our encoded data, we can decode it back, to make sure it looks and sounds as expected:

from torchcodec.decoders import AudioDecoder

samples_back = AudioDecoder(encoded_samples).get_all_samples()

print(samples_back)
play_audio(samples_back.data, rate=samples_back.sample_rate)
AudioSamples:
  data (shape): torch.Size([1, 48000])
  pts_seconds: 0.0690625
  duration_seconds: 3.0
  sample_rate: 16000


The encoder supports some encoding options that allow you to change how the data is encoded. For example, we can encode our mono data (1 channel) into stereo data (2 channels) and specify an output sample rate:

desired_sample_rate = 32000
encoded_samples = encoder.to_tensor(format="wav", num_channels=2, sample_rate=desired_sample_rate)

stereo_samples_back = AudioDecoder(encoded_samples).get_all_samples()

print(stereo_samples_back)
play_audio(stereo_samples_back.data, rate=desired_sample_rate)
AudioSamples:
  data (shape): torch.Size([2, 96000])
  pts_seconds: 0.0
  duration_seconds: 3.0
  sample_rate: 32000
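The decoded shape follows directly from the options: the first dimension becomes num_channels, and the number of samples scales with the ratio of the output sample rate to the input sample rate. A small sketch of that arithmetic, using a hypothetical helper named expected_output_shape and ignoring any padding a particular codec might add:

```python
def expected_output_shape(
    num_samples: int, input_sample_rate: int, output_sample_rate: int, num_channels: int
) -> tuple[int, int]:
    """Hypothetical helper: shape (num_channels, num_samples) expected after
    resampling, ignoring any codec-specific padding."""
    duration_seconds = num_samples / input_sample_rate
    return num_channels, round(duration_seconds * output_sample_rate)


# 48000 mono samples at 16 kHz, re-encoded as stereo at 32 kHz.
print(expected_output_shape(48000, 16000, 32000, num_channels=2))  # (2, 96000)
```

The duration stays at 3 seconds in both cases, which matches the duration_seconds field printed above.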


Check the docstring of the encoding methods to learn about the different encoding options.

