.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/audio_io_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_audio_io_tutorial.py:

Audio I/O
=========

**Author**: Moto Hira

This tutorial shows how to use TorchAudio's basic I/O API to inspect audio data,
load it into PyTorch Tensors, and save PyTorch Tensors.

.. warning::

   Starting with version 2.8, we are refactoring TorchAudio to transition it
   into a maintenance phase. As a result:

   - The APIs described in this tutorial are deprecated in 2.8 and will be removed in 2.9.
   - The decoding and encoding capabilities of PyTorch for both audio and video
     are being consolidated into TorchCodec.

   Please see https://github.com/pytorch/audio/issues/3902 for more information.

.. GENERATED FROM PYTHON SOURCE LINES 22-29

.. code-block:: default

    import torch
    import torchaudio

    print(torch.__version__)
    print(torchaudio.__version__)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2.8.0+cu126
    2.8.0

.. GENERATED FROM PYTHON SOURCE LINES 30-42

Preparation
-----------

First, we import the modules and download the audio assets we use in this tutorial.

.. note::

   When running this tutorial in Google Colab, install the required packages
   with the following:

   .. code::

      !pip install boto3

.. GENERATED FROM PYTHON SOURCE LINES 42-72

.. code-block:: default

    import io
    import os
    import tarfile
    import tempfile

    import boto3
    import matplotlib.pyplot as plt
    import requests
    from botocore import UNSIGNED
    from botocore.config import Config
    from IPython.display import Audio
    from torchaudio.utils import download_asset

    SAMPLE_GSM = download_asset("tutorial-assets/steam-train-whistle-daniel_simon.gsm")
    SAMPLE_WAV = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav")
    SAMPLE_WAV_8000 = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042-8000hz.wav")


    def _hide_seek(obj):
        # Wrap a file-like object so that only ``read`` is exposed (no ``seek``).
        class _wrapper:
            def __init__(self, obj):
                self.obj = obj

            def read(self, n):
                return self.obj.read(n)

        return _wrapper(obj)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /pytorch/audio/examples/tutorials/audio_io_tutorial.py:56: UserWarning: torchaudio.utils.download.download_asset has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      SAMPLE_GSM = download_asset("tutorial-assets/steam-train-whistle-daniel_simon.gsm")
    0%| | 0.00/7.99k [00:00

Possible values of ``encoding`` include:

- ``"ULAW"``: Mu-law [wikipedia]
- ``"ALAW"``: A-law [wikipedia]
- ``"MP3"``: MP3, MPEG-1 Audio Layer III
- ``"VORBIS"``: OGG Vorbis [xiph.org]
- ``"AMR_NB"``: Adaptive Multi-Rate [wikipedia]
- ``"AMR_WB"``: Adaptive Multi-Rate Wideband [wikipedia]
- ``"OPUS"``: Opus [opus-codec.org]
- ``"GSM"``: GSM-FR [wikipedia]
- ``"HTK"``: Single channel 16-bit PCM
- ``"UNKNOWN"``: None of the above

.. GENERATED FROM PYTHON SOURCE LINES 117-123

**Note**

- ``bits_per_sample`` can be ``0`` for formats with compression and/or variable bit rate (such as MP3).
- ``num_frames`` can be ``0`` for GSM-FR format.

.. GENERATED FROM PYTHON SOURCE LINES 123-128

.. code-block:: default

    metadata = torchaudio.info(SAMPLE_GSM)
    print(metadata)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /pytorch/audio/examples/tutorials/audio_io_tutorial.py:124: UserWarning: torchaudio._backend.utils.info has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      metadata = torchaudio.info(SAMPLE_GSM)
    /pytorch/audio/src/torchaudio/_backend/ffmpeg.py:20: UserWarning: torio.io._streaming_media_decoder.StreamingMediaDecoder has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      s = torchaudio.io.StreamReader(src, format, None, buffer_size)
    /pytorch/audio/src/torchaudio/_backend/ffmpeg.py:27: UserWarning: torchaudio._backend.common.AudioMetaData has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      return AudioMetaData(
    AudioMetaData(sample_rate=8000, num_frames=39680, num_channels=1, bits_per_sample=0, encoding=GSM)

.. GENERATED FROM PYTHON SOURCE LINES 129-134

Querying file-like object
-------------------------

:py:func:`torchaudio.info` works on file-like objects.

.. GENERATED FROM PYTHON SOURCE LINES 134-140

.. code-block:: default

    url = "https://download.pytorch.org/torchaudio/tutorial-assets/steam-train-whistle-daniel_simon.wav"
    with requests.get(url, stream=True) as response:
        metadata = torchaudio.info(_hide_seek(response.raw))
    print(metadata)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /pytorch/audio/examples/tutorials/audio_io_tutorial.py:137: UserWarning: torchaudio._backend.utils.info has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      metadata = torchaudio.info(_hide_seek(response.raw))
    /pytorch/audio/src/torchaudio/_backend/ffmpeg.py:20: UserWarning: torio.io._streaming_media_decoder.StreamingMediaDecoder has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      s = torchaudio.io.StreamReader(src, format, None, buffer_size)
    /pytorch/audio/src/torchaudio/_backend/ffmpeg.py:27: UserWarning: torchaudio._backend.common.AudioMetaData has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      return AudioMetaData(
    AudioMetaData(sample_rate=44100, num_frames=109368, num_channels=2, bits_per_sample=16, encoding=PCM_S)

.. GENERATED FROM PYTHON SOURCE LINES 141-149

.. note::

   When passing a file-like object, ``info`` does not read
   all of the underlying data; rather, it reads only a portion
   of the data from the beginning.
   Therefore, for a given audio format, it may not be able to retrieve the
   correct metadata, including the format itself. In such a case, you
   can pass the ``format`` argument to specify the format of the audio.

.. GENERATED FROM PYTHON SOURCE LINES 151-167

Loading audio data
------------------

To load audio data, you can use :py:func:`torchaudio.load`.

This function accepts a path-like object or file-like object as input.

The returned value is a tuple of waveform (``Tensor``) and sample rate
(``int``).

By default, the resulting tensor object has ``dtype=torch.float32`` and
its value range is ``[-1.0, 1.0]``.

For the list of supported formats, please refer to the torchaudio documentation.

.. GENERATED FROM PYTHON SOURCE LINES 167-171

.. code-block:: default

    waveform, sample_rate = torchaudio.load(SAMPLE_WAV)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /pytorch/audio/src/torchaudio/_backend/utils.py:213: UserWarning: In 2.9, this function's implementation will be changed to use torchaudio.load_with_torchcodec` under the hood. Some parameters like ``normalize``, ``format``, ``buffer_size``, and ``backend`` will be ignored. We recommend that you port your code to rely directly on TorchCodec's decoder instead: https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.decoders.AudioDecoder.html#torchcodec.decoders.AudioDecoder.
      warnings.warn(
    /pytorch/audio/src/torchaudio/_backend/ffmpeg.py:88: UserWarning: torio.io._streaming_media_decoder.StreamingMediaDecoder has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      s = torchaudio.io.StreamReader(src, format, None, buffer_size)

.. GENERATED FROM PYTHON SOURCE LINES 173-190

.. code-block:: default

    def plot_waveform(waveform, sample_rate):
        # Plot each channel of a (channel, time) waveform on its own axis.
        waveform = waveform.numpy()

        num_channels, num_frames = waveform.shape
        time_axis = torch.arange(0, num_frames) / sample_rate

        figure, axes = plt.subplots(num_channels, 1)
        if num_channels == 1:
            axes = [axes]
        for c in range(num_channels):
            axes[c].plot(time_axis, waveform[c], linewidth=1)
            axes[c].grid(True)
            if num_channels > 1:
                axes[c].set_ylabel(f"Channel {c+1}")
        figure.suptitle("waveform")

.. GENERATED FROM PYTHON SOURCE LINES 192-195

.. code-block:: default

    plot_waveform(waveform, sample_rate)

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_001.png
   :alt: waveform
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 197-212

.. code-block:: default

    def plot_specgram(waveform, sample_rate, title="Spectrogram"):
        # Plot a spectrogram for each channel of a (channel, time) waveform.
        waveform = waveform.numpy()

        num_channels, num_frames = waveform.shape

        figure, axes = plt.subplots(num_channels, 1)
        if num_channels == 1:
            axes = [axes]
        for c in range(num_channels):
            axes[c].specgram(waveform[c], Fs=sample_rate)
            if num_channels > 1:
                axes[c].set_ylabel(f"Channel {c+1}")
        figure.suptitle(title)

.. GENERATED FROM PYTHON SOURCE LINES 214-217

.. code-block:: default

    plot_specgram(waveform, sample_rate)

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_002.png
   :alt: Spectrogram
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 219-221

.. code-block:: default

    Audio(waveform.numpy()[0], rate=sample_rate)


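To make the ``[-1.0, 1.0]`` value range mentioned above concrete, the following is a minimal sketch that uses only the Python standard library rather than TorchAudio. It writes a mono 16-bit PCM WAV file into an in-memory file-like object, reads it back, and scales the integer samples to floats. The helper names ``make_wav_bytes`` and ``read_normalized`` are hypothetical, and dividing by ``2**15`` is an assumption made for illustration, not a description of how :py:func:`torchaudio.load` is implemented.

```python
import io
import math
import struct
import wave


def make_wav_bytes(sample_rate=8000, num_frames=80, freq_hz=440.0):
    """Build a mono 16-bit PCM WAV file in memory (hypothetical test signal)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        samples = [
            int(32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
            for n in range(num_frames)
        ]
        # "<h" = little-endian signed 16-bit integers, as used by WAV PCM
        writer.writeframes(struct.pack(f"<{num_frames}h", *samples))
    buffer.seek(0)  # rewind so the buffer can be read like a freshly opened file
    return buffer


def read_normalized(fileobj):
    """Read mono 16-bit PCM from a file-like object and scale to floats.

    Assumption: samples are scaled by 1 / 2**15, which maps the int16 range
    [-32768, 32767] into [-1.0, 1.0).
    """
    with wave.open(fileobj, "rb") as reader:
        sample_rate = reader.getframerate()
        raw = reader.readframes(reader.getnframes())
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s / 32768.0 for s in ints], sample_rate


samples, rate = read_normalized(make_wav_bytes())
print(rate, len(samples), min(samples), max(samples))
```

The same in-memory buffer could, in principle, be handed to any function that accepts a file-like object, which is the idea the next section builds on.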
.. GENERATED FROM PYTHON SOURCE LINES 222-230

Loading from file-like object
-----------------------------

The I/O functions support file-like objects.
This allows for fetching and decoding audio data from locations
within and beyond the local file system.
The following examples illustrate this.

.. GENERATED FROM PYTHON SOURCE LINES 233-240

.. code-block:: default

    # Load audio data over an HTTP request
    url = "https://download.pytorch.org/torchaudio/tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
    with requests.get(url, stream=True) as response:
        waveform, sample_rate = torchaudio.load(_hide_seek(response.raw))
    plot_specgram(waveform, sample_rate, title="HTTP datasource")

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_003.png
   :alt: HTTP datasource
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /pytorch/audio/src/torchaudio/_backend/utils.py:213: UserWarning: In 2.9, this function's implementation will be changed to use torchaudio.load_with_torchcodec` under the hood. Some parameters like ``normalize``, ``format``, ``buffer_size``, and ``backend`` will be ignored. We recommend that you port your code to rely directly on TorchCodec's decoder instead: https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.decoders.AudioDecoder.html#torchcodec.decoders.AudioDecoder.
      warnings.warn(
    /pytorch/audio/src/torchaudio/_backend/ffmpeg.py:88: UserWarning: torio.io._streaming_media_decoder.StreamingMediaDecoder has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      s = torchaudio.io.StreamReader(src, format, None, buffer_size)

.. GENERATED FROM PYTHON SOURCE LINES 242-251

.. code-block:: default

    # Load audio from a tar file
    tar_path = download_asset("tutorial-assets/VOiCES_devkit.tar.gz")
    tar_item = "VOiCES_devkit/source-16k/train/sp0307/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
    with tarfile.open(tar_path, mode="r") as tarfile_:
        fileobj = tarfile_.extractfile(tar_item)
        waveform, sample_rate = torchaudio.load(fileobj)
    plot_specgram(waveform, sample_rate, title="TAR file")

.. image-sg:: /tutorials/images/sphx_glr_audio_io_tutorial_004.png
   :alt: TAR file
   :srcset: /tutorials/images/sphx_glr_audio_io_tutorial_004.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /pytorch/audio/examples/tutorials/audio_io_tutorial.py:244: UserWarning: torchaudio.utils.download.download_asset has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.
      tar_path = download_asset("tutorial-assets/VOiCES_devkit.tar.gz")
    0%| | 0.00/110k [00:00
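The TAR example relies on ``tarfile``'s ``extractfile`` returning a file-like object that can be read without unpacking the archive to disk. The following is a small, self-contained sketch of that behaviour using only the standard library. The helper names ``build_tar_bytes`` and ``read_member`` and the member name ``clip.bin`` are hypothetical, and the payload is arbitrary bytes rather than real audio.

```python
import io
import tarfile


def build_tar_bytes(member_name, payload):
    """Create an in-memory tar archive holding one member (hypothetical helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)  # the size must be set before adding the member
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)  # rewind so the archive can be opened for reading
    return buffer


def read_member(tar_buffer, member_name):
    """Read one member through the file-like object returned by extractfile."""
    with tarfile.open(fileobj=tar_buffer, mode="r") as archive:
        fileobj = archive.extractfile(member_name)
        # fileobj exposes read(), which is what a decoder needs from a file-like object
        return fileobj.read()


payload = b"\x00\x01\x02\x03" * 4
archive_buffer = build_tar_bytes("clip.bin", payload)
print(read_member(archive_buffer, "clip.bin") == payload)
```

Note that the member must be read while the archive is still open, which is why the tutorial calls the loader inside the ``with`` block.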