Encoding audio samples with AudioEncoder¶
In this example, we’ll learn how to encode audio samples to a file or to raw
bytes using the AudioEncoder
class.
Let’s first generate some samples to be encoded. The data to be encoded could
also just come from an AudioDecoder
!
import torch
from IPython.display import Audio as play_audio
def make_sinewave() -> tuple[torch.Tensor, int]:
freq_A = 440 # Hz
sample_rate = 16000 # Hz
duration_seconds = 3 # seconds
t = torch.linspace(0, duration_seconds, int(sample_rate * duration_seconds), dtype=torch.float32)
return torch.sin(2 * torch.pi * freq_A * t), sample_rate
samples, sample_rate = make_sinewave()
print(f"Encoding samples with {samples.shape = } and {sample_rate = }")
play_audio(samples, rate=sample_rate)
Encoding samples with samples.shape = torch.Size([48000]) and sample_rate = 16000
We first instantiate an AudioEncoder
. We pass it
the samples to be encoded. The samples must be a 2D tensors of shape
(num_channels, num_samples)
, or in this case, a 1D tensor where
num_channels
is assumed to be 1. The values must be float values
normalized in [-1, 1]
: this is also what the
AudioDecoder
would return.
Note
The sample_rate
parameter corresponds to the sample rate of the
input, not the desired encoded sample rate.
from torchcodec.encoders import AudioEncoder
encoder = AudioEncoder(samples=samples, sample_rate=sample_rate)
AudioEncoder
supports encoding samples into a
file via the to_file()
method, or to
raw bytes via to_tensor()
. For the
purpose of this tutorial we’ll use
to_tensor()
, so that we can easily
re-decode the encoded samples and check their properies. The
to_file()
method works very similarly.
encoded_samples = encoder.to_tensor(format="mp3")
print(f"{encoded_samples.shape = }, {encoded_samples.dtype = }")
encoded_samples.shape = torch.Size([9512]), encoded_samples.dtype = torch.uint8
That’s it!
Now that we have our encoded data, we can decode it back, to make sure it looks and sounds as expected:
from torchcodec.decoders import AudioDecoder
samples_back = AudioDecoder(encoded_samples).get_all_samples()
print(samples_back)
play_audio(samples_back.data, rate=samples_back.sample_rate)
AudioSamples:
data (shape): torch.Size([1, 48000])
pts_seconds: 0.0690625
duration_seconds: 3.0
sample_rate: 16000
The encoder supports some encoding options that allow you to change how to data is encoded. For example, we can decide to encode our mono data (1 channel) into stereo data (2 channels), and to specify an output sample rate:
desired_sample_rate = 32000
encoded_samples = encoder.to_tensor(format="wav", num_channels=2, sample_rate=desired_sample_rate)
stereo_samples_back = AudioDecoder(encoded_samples).get_all_samples()
print(stereo_samples_back)
play_audio(stereo_samples_back.data, rate=desired_sample_rate)
AudioSamples:
data (shape): torch.Size([2, 96000])
pts_seconds: 0.0
duration_seconds: 3.0
sample_rate: 32000
Check the docstring of the encoding methods to learn about the different encoding options.
Total running time of the script: (0 minutes 0.043 seconds)