MFCC
- class torchaudio.transforms.MFCC(sample_rate: int = 16000, n_mfcc: int = 40, dct_type: int = 2, norm: str = 'ortho', log_mels: bool = False, melkwargs: Optional[dict] = None)
- Create the Mel-frequency cepstrum coefficients (MFCCs) from an audio signal.
- By default, this calculates the MFCC on the dB-scaled Mel spectrogram. This is not the textbook implementation, but it is implemented this way for consistency with librosa.
- The output depends on the maximum value in the input spectrogram, so it may return different values for an audio clip split into snippets than for the full clip.
- Parameters:
- sample_rate (int, optional) – Sample rate of the audio signal. (Default: 16000)
- n_mfcc (int, optional) – Number of MFC coefficients to retain. (Default: 40)
- dct_type (int, optional) – Type of DCT (discrete cosine transform) to use. (Default: 2)
- norm (str, optional) – Norm to use. (Default: "ortho")
- log_mels (bool, optional) – Whether to use log-mel spectrograms instead of dB-scaled ones. (Default: False)
- melkwargs (dict or None, optional) – Arguments for MelSpectrogram. (Default: None)
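To make the roles of log_mels, dct_type, norm and n_mfcc more concrete, here is a minimal pure-Python sketch of the two steps that follow the Mel spectrogram: scaling the band powers, then taking a DCT. It is an illustration under stated assumptions, not the library's implementation. The helper names (power_to_db, log_mel, dct2_ortho), the 80 dB clamp, the 1e-10 floor, the 1e-6 log offset and all sample values are assumptions chosen for the example.

```python
import math

def power_to_db(power, top_db=80.0, amin=1e-10):
    """Convert power values to dB, then clamp everything that lies more
    than `top_db` below the loudest value (librosa-style scaling).
    `top_db` and `amin` are assumed values for this sketch."""
    db = [10.0 * math.log10(max(p, amin)) for p in power]
    floor = max(db) - top_db
    return [max(v, floor) for v in db]

def log_mel(power, offset=1e-6):
    """Natural-log scaling: each value is scaled independently of the others."""
    return [math.log(p + offset) for p in power]

def dct2_ortho(x, n_out):
    """First `n_out` coefficients of the orthonormal DCT-II of `x`
    (the dct_type=2, norm="ortho" case)."""
    n = len(x)
    out = []
    for k in range(n_out):
        s = sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

# Hypothetical spectrogram values, flattened: one loud value, three quiet ones.
full_clip = [1.0, 1e-3, 1e-10, 1e-10]
snippet = full_clip[1:]  # the same quiet values, without the loud one

db_full = power_to_db(full_clip)  # clamp floor is set by the loud value
db_snip = power_to_db(snippet)    # clamp floor is set by the 1e-3 value

# Hypothetical dB-scaled Mel bands of a single frame; keep 2 coefficients,
# which plays the role of n_mfcc=2.
frame_db = [-10.0, -20.0, -30.0, -40.0]
coeffs = dct2_ortho(frame_db, n_out=2)
```

In this sketch the quietest values are clamped in db_full but not in db_snip, because the clamp floor follows the loudest value present. That is one way to see why splitting a clip into snippets can change the dB-scaled result, while the log-scaled values stay the same either way.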
 
 - Example
>>> import torchaudio
>>> from torchaudio import transforms
>>> waveform, sample_rate = torchaudio.load("test.wav", normalize=True)
>>> transform = transforms.MFCC(
...     sample_rate=sample_rate,
...     n_mfcc=13,
...     melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 23, "center": False},
... )
>>> mfcc = transform(waveform)
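The size of the result in the example above can be estimated by hand. The sketch below assumes the usual short-time Fourier transform framing (window length equal to n_fft, one frame per hop) and an output laid out as (channel, n_mfcc, time). The helper names num_frames and mfcc_shape are hypothetical, as is the one-second mono clip at 16 kHz.

```python
def num_frames(num_samples, n_fft, hop_length, center):
    """Number of STFT frames, assuming the window length equals n_fft."""
    if center:
        # The signal is padded by n_fft // 2 on both sides before framing.
        return 1 + num_samples // hop_length
    if num_samples < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    return 1 + (num_samples - n_fft) // hop_length

def mfcc_shape(num_channels, num_samples, n_mfcc, n_fft, hop_length, center):
    """Expected (channel, n_mfcc, time) shape under the assumptions above."""
    frames = num_frames(num_samples, n_fft, hop_length, center)
    return (num_channels, n_mfcc, frames)

# Hypothetical input: a mono clip, one second long, sampled at 16 kHz,
# with the same settings as the example (n_fft=400, hop_length=160).
shape_uncentered = mfcc_shape(1, 16000, n_mfcc=13, n_fft=400,
                              hop_length=160, center=False)
shape_centered = mfcc_shape(1, 16000, n_mfcc=13, n_fft=400,
                            hop_length=160, center=True)
print(shape_uncentered, shape_centered)
```

Setting "center": False drops the padding at the clip edges, so the uncentered result has slightly fewer frames than the centered one.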
 - See also
- torchaudio.functional.melscale_fbanks(): the function used to generate the filter banks.
 
