SlidingWindowCmn

class torchaudio.transforms.SlidingWindowCmn(cmn_window: int = 600, min_cmn_window: int = 100, center: bool = False, norm_vars: bool = False)

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

Parameters

cmn_window (int, optional) – Window size, in frames, for the running-average CMN computation. (Default: 600)
min_cmn_window (int, optional) – Minimum CMN window used at the start of decoding (adds latency only at the start). Applies only if center is False; ignored if center is True. (Default: 100)
center (bool, optional) – If True, use a window centered on the current frame (to the extent possible, modulo end effects). If False, the window extends to the left of the current frame. (Default: False)
norm_vars (bool, optional) – If True, normalize the variance to one. (Default: False)
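
To make the interaction between cmn_window, min_cmn_window, and center more concrete, here is a minimal pure-Python sketch of one way the per-frame window could be chosen and applied. It follows a Kaldi-style reading of the parameter descriptions above and is not torchaudio's implementation: the helper names cmn_window_bounds and sliding_window_cmn, the exact off-by-one conventions, and the zero-variance guard are all assumptions made for illustration.

```python
import math


def cmn_window_bounds(t, num_frames, cmn_window=600, min_cmn_window=100, center=False):
    """Return (start, end) frame indices (end exclusive) used to normalize frame t.

    Hypothetical helper; boundary conventions are an assumption of this sketch.
    """
    if center:
        # Window centered on the current frame.
        start = t - cmn_window // 2
        end = start + cmn_window
    else:
        # Window to the left: the current frame plus the frames before it.
        start = t - cmn_window
        end = t + 1
    if start < 0:
        # Shift the window right so it begins at the first frame.
        end -= start
        start = 0
    if not center and end > t:
        # Near the start, look ahead only as far as min_cmn_window frames.
        end = max(t + 1, min_cmn_window)
    if end > num_frames:
        # Shift the window left so it ends at the last frame.
        start = max(0, start - (end - num_frames))
        end = num_frames
    return start, end


def sliding_window_cmn(frames, cmn_window=600, min_cmn_window=100,
                       center=False, norm_vars=False):
    """Normalize a list of frames (each a list of floats) with a sliding window."""
    num_frames = len(frames)
    out = []
    for t, frame in enumerate(frames):
        start, end = cmn_window_bounds(t, num_frames, cmn_window, min_cmn_window, center)
        window = frames[start:end]
        n = len(window)
        normalized = []
        for k, value in enumerate(frame):
            mean = sum(row[k] for row in window) / n
            result = value - mean
            if norm_vars:
                var = sum((row[k] - mean) ** 2 for row in window) / n
                # Assumption of this sketch: skip scaling when the variance is zero.
                if var > 0:
                    result /= math.sqrt(var)
            normalized.append(result)
        out.append(normalized)
    return out


# Tiny demonstration: one feature per frame, left-only window.
frames = [[float(i)] for i in range(5)]
print(sliding_window_cmn(frames, cmn_window=2, min_cmn_window=1))
```

In this sketch, with center=False the first frames are normalized over at least min_cmn_window frames (when that many exist), which is why that parameter only matters near the start of an utterance; with center=True the window is shifted inward at both ends instead.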

Example

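>>> import torchaudio
>>> waveform, sample_rate = torchaudio.load("test.wav", normalize=True)
>>> mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate)(waveform)  # (channel, n_mfcc, time)
>>> transform = torchaudio.transforms.SlidingWindowCmn(cmn_window=1000)
>>> cmn_mfcc = transform(mfcc.transpose(-1, -2))  # input and output are (channel, time, n_mfcc)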

forward(specgram: Tensor) → Tensor

Parameters: specgram (Tensor) – Spectrogram tensor of dimension (…, time, freq).
Returns: Normalized spectrogram tensor of dimension (…, time, freq).
Return type: Tensor
