.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/audio_feature_augmentation_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_audio_feature_augmentation_tutorial.py:


Audio Feature Augmentation
==========================

**Author**: `Moto Hira `__

.. GENERATED FROM PYTHON SOURCE LINES 9-18

.. code-block:: default

    import torch
    import torchaudio
    import torchaudio.transforms as T
    import numpy as np

    print(torch.__version__)
    print(torchaudio.__version__)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2.10.0.dev20251013+cu126
    2.8.0a0+1d65bbe


.. GENERATED FROM PYTHON SOURCE LINES 19-22

Preparation
-----------

.. GENERATED FROM PYTHON SOURCE LINES 22-28

.. code-block:: default

    import matplotlib.pyplot as plt
    from IPython.display import Audio
    from torchaudio.utils import _download_asset
    import torchaudio

.. GENERATED FROM PYTHON SOURCE LINES 29-32

In this tutorial, we use speech data from the `VOiCES dataset `__,
which is licensed under Creative Commons BY 4.0.

.. GENERATED FROM PYTHON SOURCE LINES 32-62

.. code-block:: default

    # Fetch the sample speech recording used throughout this tutorial.
    SAMPLE_WAV_SPEECH_PATH = _download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav")


    def _get_sample(path):
        # Returns a (waveform, sample_rate) tuple.
        return torchaudio.load(path)


    def get_speech_sample():
        return _get_sample(SAMPLE_WAV_SPEECH_PATH)


    def get_spectrogram(
        n_fft=400,
        win_len=None,
        hop_len=None,
        power=2.0,
    ):
        # power=2.0 gives a power spectrogram; power=None keeps the complex values.
        waveform, _ = get_speech_sample()
        spectrogram = T.Spectrogram(
            n_fft=n_fft,
            win_length=win_len,
            hop_length=hop_len,
            center=True,
            pad_mode="reflect",
            power=power,
        )
        return spectrogram(waveform)

.. GENERATED FROM PYTHON SOURCE LINES 63-73

SpecAugment
-----------

`SpecAugment `__
is a popular spectrogram augmentation technique.
``torchaudio`` implements :py:func:`torchaudio.transforms.TimeStretch`,
:py:func:`torchaudio.transforms.TimeMasking` and
:py:func:`torchaudio.transforms.FrequencyMasking`.

.. GENERATED FROM PYTHON SOURCE LINES 75-78

TimeStretch
-----------

.. GENERATED FROM PYTHON SOURCE LINES 78-87

.. code-block:: default

    # TimeStretch operates on complex spectrograms, so request power=None.
    spec = get_spectrogram(power=None)
    stretch = T.TimeStretch()

    # A rate above 1 speeds the audio up; a rate below 1 slows it down.
    spec_12 = stretch(spec, overriding_rate=1.2)
    spec_09 = stretch(spec, overriding_rate=0.9)

.. GENERATED FROM PYTHON SOURCE LINES 88-90

Visualization
~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 90-111

.. code-block:: default

    def power_to_db(S):
        # Convert power to decibels, clamping at 1e-10 to avoid log(0).
        S = np.asarray(S)
        return 10.0 * np.log10(np.maximum(1e-10, S))


    def plot():
        def plot_spec(ax, spec, title):
            ax.set_title(title)
            ax.imshow(power_to_db(spec**2), origin="lower", aspect="auto")

        fig, axes = plt.subplots(3, 1, sharex=True, sharey=True)

        plot_spec(axes[0], torch.abs(spec_12[0]), title="Stretched x1.2")
        plot_spec(axes[1], torch.abs(spec[0]), title="Original")
        plot_spec(axes[2], torch.abs(spec_09[0]), title="Stretched x0.9")
        fig.tight_layout()


    plot()



.. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_001.png
   :alt: Stretched x1.2, Original, Stretched x0.9
   :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 112-114

Audio Samples
~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 114-124

.. code-block:: default

    def preview(spec, rate=16000):
        # Invert the complex spectrogram back to a waveform for playback.
        ispec = T.InverseSpectrogram()
        waveform = ispec(spec)

        return Audio(waveform[0].numpy().T, rate=rate)


    preview(spec)

.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 126-129

.. code-block:: default

    preview(spec_12)

.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 131-134

.. code-block:: default

    preview(spec_09)

.. raw:: html


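
The stretched spectrograms above differ from the original in their number of
frames: a rate above 1 shortens the spectrogram and a rate below 1 lengthens it.
The following is a minimal sketch of that arithmetic, assuming one output frame
is produced for every ``rate`` input frames starting at frame 0, so the output
has ``ceil(num_frames / rate)`` frames. ``stretched_num_frames`` is a
hypothetical helper written for illustration; it is not part of ``torchaudio``,
and the input length of 300 frames is made up.

```python
import math


def stretched_num_frames(num_frames: int, rate: float) -> int:
    """Estimate the frame count after time stretching by ``rate``.

    Assumption: output frames are sampled at input positions
    0, rate, 2 * rate, ... below ``num_frames``, which yields
    ceil(num_frames / rate) frames.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(num_frames / rate)


if __name__ == "__main__":
    num_frames = 300  # hypothetical input length, in frames
    for rate in (1.2, 1.0, 0.9):
        out = stretched_num_frames(num_frames, rate)
        print(f"rate={rate}: {num_frames} -> {out} frames")
```

To check the estimate against the real transform, compare it with
``spec_12.shape[-1]`` and ``spec_09.shape[-1]``, using ``spec.shape[-1]`` as
``num_frames``.
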
.. GENERATED FROM PYTHON SOURCE LINES 135-138

Time and Frequency Masking
--------------------------

.. GENERATED FROM PYTHON SOURCE LINES 138-148

.. code-block:: default

    # Fix the seed so the randomly placed masks are reproducible.
    torch.random.manual_seed(4)

    # Each transform masks a random span of up to 80 frames or frequency bins.
    time_masking = T.TimeMasking(time_mask_param=80)
    freq_masking = T.FrequencyMasking(freq_mask_param=80)

    spec = get_spectrogram()
    time_masked = time_masking(spec)
    freq_masked = freq_masking(spec)

.. GENERATED FROM PYTHON SOURCE LINES 150-165

.. code-block:: default

    def plot():
        def plot_spec(ax, spec, title):
            ax.set_title(title)
            ax.imshow(power_to_db(spec), origin="lower", aspect="auto")

        fig, axes = plt.subplots(3, 1, sharex=True, sharey=True)

        plot_spec(axes[0], spec[0], title="Original")
        plot_spec(axes[1], time_masked[0], title="Masked along time axis")
        plot_spec(axes[2], freq_masked[0], title="Masked along frequency axis")
        fig.tight_layout()


    plot()



.. image-sg:: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_002.png
   :alt: Original, Masked along time axis, Masked along frequency axis
   :srcset: /tutorials/images/sphx_glr_audio_feature_augmentation_tutorial_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.814 seconds)


.. _sphx_glr_download_tutorials_audio_feature_augmentation_tutorial.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: audio_feature_augmentation_tutorial.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: audio_feature_augmentation_tutorial.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_