torchaudio.functional
Functions to perform common audio operations.
Utility
Turn a spectrogram from the power/amplitude scale to the decibel scale.
Turn a tensor from the decibel scale to the power/amplitude scale.
Create a frequency bin conversion matrix.
Create a linear triangular filterbank.
Create a DCT transformation matrix.
Apply a mask along an axis.
Apply a mask along an axis.
Encode signal based on mu-law companding.
Decode mu-law encoded signal.
DEPRECATED
Resamples the waveform at the new frequency using bandlimited interpolation.
Measure audio loudness according to the ITU-R BS.1770-4 recommendation.
Convolves inputs along their last dimension using the direct method.
Convolves inputs along their last dimension using FFT.
Scales and adds noise to waveform per signal-to-noise ratio.
Pre-emphasizes a waveform along its last dimension.
De-emphasizes a waveform along its last dimension.
Adjusts waveform speed.
Computes the Fréchet distance between two multivariate normal distributions [Dowson and Landau, 1982].
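
To make the scale conversions and mu-law companding entries above concrete, here is a minimal plain-Python sketch of the underlying formulas. It does not call torchaudio. The helper names (`power_to_db`, `db_to_power`, `mu_law_encode`, `mu_law_decode`), the `amin` floor, and the default of 256 quantization channels are assumptions chosen for illustration; the library functions operate on tensors and expose more options.

```python
import math

def power_to_db(power, amin=1e-10):
    """Convert a power value to decibels, flooring at `amin` to avoid log(0)."""
    return 10.0 * math.log10(max(power, amin))

def db_to_power(db):
    """Invert power_to_db for values above the `amin` floor."""
    return 10.0 ** (db / 10.0)

def mu_law_encode(x, quantization_channels=256):
    """Compress x in [-1, 1] with mu-law, then quantize to an integer code."""
    mu = quantization_channels - 1
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] onto [0, mu] and round to the nearest code.
    return int((compressed + 1.0) / 2.0 * mu + 0.5)

def mu_law_decode(code, quantization_channels=256):
    """Map an integer code back to an approximate value in [-1, 1]."""
    mu = quantization_channels - 1
    compressed = code / mu * 2.0 - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)
```

Encoding followed by decoding is lossy: the round trip recovers the input only up to the quantization step, which is finer near zero than near full scale.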
Forced Alignment
DEPRECATED
Removes repeated tokens and blank tokens from the given CTC token sequence.
Token with time stamps and score.
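
The token-merging entry above can be illustrated with a small plain-Python sketch that collapses a frame-level CTC sequence into spans. The `Span` class, the `merge_ctc_tokens` name, the exclusive `end` index, the blank index of 0, and the use of the mean frame score are all assumptions made for this example, not a statement of the library's exact behavior.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class Span:
    token: int    # token index
    start: int    # first frame of the span (inclusive)
    end: int      # frame after the last one (exclusive)
    score: float  # mean of the per-frame scores in the span

def merge_ctc_tokens(tokens, scores, blank=0):
    """Collapse runs of repeated tokens and drop blank runs."""
    spans = []
    frame = 0
    for token, group in groupby(zip(tokens, scores), key=lambda pair: pair[0]):
        group_scores = [score for _, score in group]
        start, frame = frame, frame + len(group_scores)
        if token != blank:
            spans.append(Span(token, start, frame, sum(group_scores) / len(group_scores)))
    return spans
```

Because runs are grouped before blanks are dropped, the same token on both sides of a blank yields two separate spans, which is how CTC represents a genuinely repeated token.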
Filtering
Design two-pole all-pass filter.
Design two-pole band filter.
Design two-pole band-pass filter.
Design two-pole band-reject filter.
Design a bass tone-control effect.
Apply a biquad filter to the input tensor.
Apply contrast effect.
Apply a DC shift to the audio.
Apply ISO 908 CD de-emphasis (shelving) IIR filter.
Apply dither.
Design biquad peaking equalizer filter and perform filtering.
Apply an IIR filter forward and backward to a waveform.
Apply a flanger effect to the audio.
Apply amplification or attenuation to the whole waveform.
Design biquad highpass filter and perform filtering.
Perform IIR filtering by evaluating the difference equation, using a differentiable implementation developed separately by Yu et al. [Yu and Fazekas, 2023] and Forgione et al. [Forgione and Piga, 2021].
Design biquad lowpass filter and perform filtering.
Apply an overdrive effect to the audio.
Apply a phasing effect to the audio.
Apply RIAA vinyl playback equalization.
Design a treble tone-control effect.
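
Most entries in this section amount to choosing biquad coefficients and then evaluating a second-order difference equation. The sketch below shows that pattern for a lowpass filter in plain Python. It assumes the widely used Audio EQ Cookbook coefficient formulas; the function names and the default `q` are illustrative choices, and the code does not call torchaudio.

```python
import math

def lowpass_biquad_coeffs(sample_rate, cutoff_freq, q=0.707):
    """Return (b, a) coefficients for a second-order lowpass filter."""
    w0 = 2.0 * math.pi * cutoff_freq / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    a = [1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha]
    return b, a

def biquad(signal, b, a):
    """Evaluate y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    # Normalize so that the leading denominator coefficient is 1.
    b0, b1, b2 = (c / a[0] for c in b)
    a1, a2 = a[1] / a[0], a[2] / a[0]
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in signal:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

With these coefficients a constant (DC) input passes through at unit gain, while an input alternating at the Nyquist frequency is driven toward zero, which is a quick sanity check for any lowpass design.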
Feature Extractions
Voice Activity Detector.
Create a spectrogram or a batch of spectrograms from a raw audio signal.
Create an inverse spectrogram or a batch of inverse spectrograms from the provided complex-valued spectrogram.
Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation.
Given an STFT tensor, speed up in time without modifying pitch by a given factor.
Shift the pitch of a waveform.
Compute delta coefficients of a tensor, usually a spectrogram.
Detect pitch frequency.
Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
Compute the spectral centroid for each channel along the time axis.
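
As an illustration of the delta-coefficient entry above, the following plain-Python sketch applies the usual regression formula d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2) to a one-dimensional sequence. Setting N = (win_length - 1) // 2 and replicating the edge values as padding are assumptions of this sketch, as is the function name; it does not call torchaudio.

```python
def compute_deltas(values, win_length=5):
    """Regression-based delta coefficients along a 1-D sequence."""
    n_max = (win_length - 1) // 2
    denom = 2.0 * sum(n * n for n in range(1, n_max + 1))
    last = len(values) - 1
    deltas = []
    for t in range(len(values)):
        acc = 0.0
        for n in range(1, n_max + 1):
            # Clamp indices so the edges reuse the first/last value.
            ahead = values[min(t + n, last)]
            behind = values[max(t - n, 0)]
            acc += n * (ahead - behind)
        deltas.append(acc / denom)
    return deltas
```

On a linear ramp the interior deltas equal the slope, while the first and last frames are smaller because the replicated edge values flatten the local trend.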
Multi-channel
Compute cross-channel power spectral density (PSD) matrix.
Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights by the method proposed by Souden et al. [Souden et al., 2009].
Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
Estimate the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.
Estimate the relative transfer function (RTF) or the steering vector by the power method.
Apply the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
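
The first and last entries above can be sketched for a single frequency bin using Python's built-in complex numbers. This is an illustrative simplification, not the library's code: it assumes no time-frequency mask and no normalization, sums the outer products over time frames, and applies weights as w^H x. The function names mirror the descriptions and are chosen for this example only.

```python
def psd_matrix(frames):
    """Sum of outer products x x^H over time frames for one frequency bin.

    frames: list of per-frame channel vectors (complex values).
    """
    n_ch = len(frames[0])
    psd = [[0j] * n_ch for _ in range(n_ch)]
    for x in frames:
        for i in range(n_ch):
            for j in range(n_ch):
                psd[i][j] += x[i] * x[j].conjugate()
    return psd

def apply_beamforming(weights, frame):
    """Combine the channels of one frame into a single value, y = w^H x."""
    return sum(w.conjugate() * x for w, x in zip(weights, frame))
```

The resulting matrix is Hermitian: entry (i, j) is the complex conjugate of entry (j, i), and the diagonal holds the per-channel power.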
Loss
DEPRECATED
Metric
Calculate the word-level edit (Levenshtein) distance between two sequences.
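
For readers unfamiliar with the metric, here is a minimal plain-Python sketch of the Levenshtein distance using the standard two-row dynamic program. The function name and the choice to accept any pair of sequences (lists of words, or strings treated as sequences of characters) are assumptions of this example; it does not call torchaudio.

```python
def edit_distance(seq1, seq2):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn seq1 into seq2."""
    # previous[j] holds the distance between the processed prefix of seq1
    # and the first j items of seq2.
    previous = list(range(len(seq2) + 1))
    for i, item1 in enumerate(seq1, start=1):
        current = [i]
        for j, item2 in enumerate(seq2, start=1):
            substitution = previous[j - 1] + (item1 != item2)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]
```

Passing two lists of words gives the word-level distance that underlies word error rate; passing two strings gives the character-level distance instead.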