.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated_examples/audio_decoding.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_audio_decoding.py:

========================================
Decoding audio streams with AudioDecoder
========================================

In this example, we'll learn how to decode an audio file using the
:class:`~torchcodec.decoders.AudioDecoder` class.

.. GENERATED FROM PYTHON SOURCE LINES 17-20

First, a bit of boilerplate: we'll download an audio file from the web and
define an audio playing utility. You can skip that part and jump straight to
:ref:`creating_decoder_audio`.

.. GENERATED FROM PYTHON SOURCE LINES 20-37

.. code-block:: Python

    import requests
    from IPython.display import Audio


    def play_audio(samples):
        # Wrap the decoded samples in an IPython audio widget for playback.
        return Audio(samples.data, rate=samples.sample_rate)


    # Audio source is CC0: https://opengameart.org/content/town-theme-rpg
    # Attribution: cynicmusic.com pixelsphere.org
    url = "https://opengameart.org/sites/default/files/TownTheme.mp3"
    response = requests.get(url, headers={"User-Agent": ""})
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download audio. {response.status_code = }.")

    raw_audio_bytes = response.content

.. GENERATED FROM PYTHON SOURCE LINES 38-46

.. _creating_decoder_audio:

Creating a decoder
------------------

We can now create a decoder from the raw (encoded) audio bytes. You can of
course use a local audio file and pass the path as input. You can also decode
audio streams from videos!

.. GENERATED FROM PYTHON SOURCE LINES 46-51

.. code-block:: Python

    from torchcodec.decoders import AudioDecoder

    decoder = AudioDecoder(raw_audio_bytes)

..
GENERATED FROM PYTHON SOURCE LINES 52-55

The audio has not yet been decoded by the decoder, but we already have access
to some metadata via the ``metadata`` attribute, which is an
:class:`~torchcodec.decoders.AudioStreamMetadata` object.

.. GENERATED FROM PYTHON SOURCE LINES 55-57

.. code-block:: Python

    print(decoder.metadata)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AudioStreamMetadata:
      duration_seconds_from_header: 97.48898
      begin_stream_seconds_from_header: 0.025057
      bit_rate: 108039.0
      codec: mp3
      stream_index: 0
      sample_rate: 44100
      num_channels: 2
      sample_format: fltp

.. GENERATED FROM PYTHON SOURCE LINES 58-64

Decoding samples
----------------

To get decoded samples, we just need to call the
:meth:`~torchcodec.decoders.AudioDecoder.get_all_samples` method,
which returns an :class:`~torchcodec.AudioSamples` object:

.. GENERATED FROM PYTHON SOURCE LINES 64-70

.. code-block:: Python

    samples = decoder.get_all_samples()

    print(samples)
    play_audio(samples)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AudioSamples:
      data (shape): torch.Size([2, 4297722])
      pts_seconds: 0.02505668934240363
      duration_seconds: 97.45401360544217
      sample_rate: 44100
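
The printed fields are tied together by simple arithmetic: the duration is the
number of samples per channel divided by the sample rate. The sketch below
checks this with the values printed above. ``duration_from_shape`` is a
hypothetical helper written for illustration; it is not part of TorchCodec.

.. code-block:: Python

    def duration_from_shape(num_samples, sample_rate):
        """Return the duration in seconds of ``num_samples`` samples per channel."""
        if sample_rate <= 0:
            raise ValueError("sample_rate must be positive")
        return num_samples / sample_rate


    # Values copied from the printed AudioSamples output above.
    num_channels, num_samples = 2, 4297722
    sample_rate = 44100

    duration = duration_from_shape(num_samples, sample_rate)
    print(f"{duration:.3f} seconds")

With a decoded ``samples`` object, the same check would presumably read
``samples.data.shape[1] / samples.sample_rate``, since ``.data`` has shape
``(num_channels, num_samples)``.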

.. GENERATED FROM PYTHON SOURCE LINES 71-85

The ``.data`` field is a tensor of shape ``(num_channels, num_samples)`` and of
float dtype with values in [-1, 1].

The ``.pts_seconds`` field indicates the starting time of the output samples.
Here it's 0.025 seconds, even though we asked for samples starting from 0. Not
all streams start exactly at 0! This is not a bug in TorchCodec; it is a
property of the file that was defined when it was encoded.

Specifying a range
------------------

If we don't need all the samples, we can use
:meth:`~torchcodec.decoders.AudioDecoder.get_samples_played_in_range` to
decode the samples within a custom range:

.. GENERATED FROM PYTHON SOURCE LINES 85-91

.. code-block:: Python

    samples = decoder.get_samples_played_in_range(start_seconds=10, stop_seconds=70)

    print(samples)
    play_audio(samples)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AudioSamples:
      data (shape): torch.Size([2, 2646000])
      pts_seconds: 10.0
      duration_seconds: 60.0
      sample_rate: 44100
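
Range decoding also makes it possible to process a long file in fixed-size
windows. The sketch below is plain Python with no TorchCodec dependency:
``window_ranges`` and ``expected_num_samples`` are hypothetical helpers written
for illustration. The sample count is an estimate; a real decoder may round
differently at the range boundaries.

.. code-block:: Python

    def window_ranges(begin_seconds, end_seconds, window_seconds):
        """Yield (start, stop) pairs that tile [begin_seconds, end_seconds)."""
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        start = begin_seconds
        while start < end_seconds:
            # The last window is clipped so it never runs past the end.
            stop = min(start + window_seconds, end_seconds)
            yield start, stop
            start = stop


    def expected_num_samples(start_seconds, stop_seconds, sample_rate):
        """Estimate the number of samples per channel in a time range."""
        return round((stop_seconds - start_seconds) * sample_rate)


    # The 10 s to 70 s range used above, split into 30-second windows.
    for start, stop in window_ranges(10, 70, 30):
        print(start, stop, expected_num_samples(start, stop, 44100))

Each ``(start, stop)`` pair could then be passed as ``start_seconds`` and
``stop_seconds`` to
:meth:`~torchcodec.decoders.AudioDecoder.get_samples_played_in_range`.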

.. GENERATED FROM PYTHON SOURCE LINES 92-99

Custom sample rate
------------------

We can also decode the samples into a desired sample rate using the
``sample_rate`` parameter of :class:`~torchcodec.decoders.AudioDecoder`. The
output will sound the same, but note that the number of samples per channel
dropped from 4297722 to 1559264, because 16 kHz is lower than the file's
native 44.1 kHz rate:

.. GENERATED FROM PYTHON SOURCE LINES 99-105

.. code-block:: Python

    decoder = AudioDecoder(raw_audio_bytes, sample_rate=16_000)
    samples = decoder.get_all_samples()

    print(samples)
    play_audio(samples)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AudioSamples:
      data (shape): torch.Size([2, 1559264])
      pts_seconds: 0.02505668934240363
      duration_seconds: 97.454
      sample_rate: 16000
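
The change in length follows from the ratio of the two sample rates. The sketch
below reproduces the printed shape from the 44100 Hz result.
``resampled_length`` is a hypothetical helper written for illustration, and
the floor division is an assumption that happens to match this file; an actual
resampler may differ by a few samples.

.. code-block:: Python

    def resampled_length(num_samples, source_rate, target_rate):
        """Estimate the number of samples per channel after resampling."""
        if source_rate <= 0 or target_rate <= 0:
            raise ValueError("sample rates must be positive")
        # Integer arithmetic avoids floating-point rounding surprises.
        return num_samples * target_rate // source_rate


    # 4297722 samples at 44100 Hz, resampled to 16000 Hz.
    new_length = resampled_length(4297722, 44100, 16_000)
    print(new_length, new_length / 16_000)

The duration is essentially unchanged (about 97.454 seconds); only the number of
samples used to represent it changes.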
