
CTCDecoder

class torchaudio.models.decoder.CTCDecoder[source]

CTC beam search decoder from Flashlight [Kahn et al., 2022].

This feature supports the following devices: CPU

Note

To build the decoder, please use the factory function ctc_decoder().

Tutorials using CTCDecoder:
ASR Inference with CTC Decoder

Methods

__call__

CTCDecoder.__call__(emissions: FloatTensor, lengths: Optional[Tensor] = None) → List[List[CTCHypothesis]][source]

Performs batched offline decoding.

Note

This method performs offline decoding in one go. To perform incremental decoding, please refer to decode_step().

Parameters
  • emissions (torch.FloatTensor) – CPU tensor of shape (batch, frame, num_tokens) storing sequences of probability distribution over labels; output of acoustic model.

  • lengths (Tensor or None, optional) – CPU tensor of shape (batch, ) storing the valid length along the time axis of each emission sequence in the batch.

Returns

List of sorted best hypotheses for each audio sequence in the batch.

Return type

List[List[CTCHypothesis]]
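The nested return structure can be illustrated with a plain-Python stand-in. The values below are made up and the decoder itself is not constructed; the field names mirror the CTCHypothesis support structure documented in this section:

```python
from collections import namedtuple

# Stand-in for torchaudio.models.decoder.CTCHypothesis (illustration only).
Hyp = namedtuple("Hyp", ["tokens", "words", "score", "timesteps"])

# __call__ returns one list of hypotheses per batch item, sorted best-first.
# Fake results for a batch of two utterances, two hypotheses each.
results = [
    [Hyp([5, 3], ["hello"], -1.2, [0, 4]), Hyp([5, 9], ["hallo"], -3.8, [0, 4])],
    [Hyp([7], ["hi"], -0.4, [2]), Hyp([8], ["hey"], -0.9, [2])],
]

# The best transcript for utterance i is results[i][0].
best_transcripts = [" ".join(hyps[0].words) for hyps in results]
print(best_transcripts)  # ['hello', 'hi']
```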

decode_begin

CTCDecoder.decode_begin()[source]

Initialize the internal state of the decoder.

See decode_step() for the usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

decode_end

CTCDecoder.decode_end()[source]

Finalize the internal state of the decoder.

See decode_step() for the usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

decode_step

CTCDecoder.decode_step(emissions: FloatTensor)[source]

Perform incremental decoding on top of the current internal state.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

Parameters

emissions (torch.FloatTensor) – CPU tensor of shape (frame, num_tokens) storing sequences of probability distribution over labels; output of acoustic model.

Example

>>> decoder = torchaudio.models.decoder.ctc_decoder(...)
>>> decoder.decode_begin()
>>> decoder.decode_step(emission1)
>>> decoder.decode_step(emission2)
>>> decoder.decode_end()
>>> result = decoder.get_final_hypothesis()

get_final_hypothesis

CTCDecoder.get_final_hypothesis() → List[CTCHypothesis][source]

Get the final hypothesis.

Returns

List of sorted best hypotheses.

Return type

List[CTCHypothesis]

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

idxs_to_tokens

CTCDecoder.idxs_to_tokens(idxs: LongTensor) → List[source]

Map raw token IDs to their corresponding tokens.

Parameters

idxs (LongTensor) – raw token IDs generated by the decoder

Returns

tokens corresponding to the input IDs

Return type

List
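In spirit, idxs_to_tokens is a lookup from raw decoder indices into the token list the decoder was built with. A hedged pure-Python sketch (the token list and the meanings of "-" and "|" are assumptions for illustration; the real method uses the decoder's internal mapping):

```python
# Hypothetical token list, in the order the decoder was built with.
# "-" stands for the CTC blank and "|" for the word boundary (assumptions).
tokens = ["-", "|", "a", "b", "h", "i"]

def idxs_to_tokens(idxs):
    """Look up each raw token ID in the decoder's token list."""
    return [tokens[i] for i in idxs]

print(idxs_to_tokens([4, 5, 1]))  # ['h', 'i', '|']
```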

Support Structures

CTCHypothesis

class torchaudio.models.decoder.CTCHypothesis(tokens: torch.LongTensor, words: List[str], score: float, timesteps: torch.IntTensor)[source]

Represents a hypothesis generated by the CTC beam search decoder CTCDecoder.

Tutorials using CTCHypothesis:
ASR Inference with CTC Decoder

tokens: LongTensor

Predicted sequence of token IDs. Shape (L, ), where L is the length of the output sequence

words: List[str]

List of predicted words.

Note

This attribute is only applicable if a lexicon is provided to the decoder. If decoding without a lexicon, it will be empty; refer to tokens and idxs_to_tokens() instead.

score: float

Score corresponding to the hypothesis

timesteps: IntTensor

Timesteps corresponding to the tokens. Shape (L, ), where L is the length of the output sequence
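The fields are parallel views of one hypothesis: tokens and timesteps are aligned element-wise, while words is only populated in lexicon mode. A small sketch with stand-in values (not produced by a real decoder):

```python
from collections import namedtuple

# Stand-in for the documented structure, with made-up values.
CTCHypothesis = namedtuple("CTCHypothesis", ["tokens", "words", "score", "timesteps"])
hyp = CTCHypothesis(tokens=[4, 5], words=["hi"], score=-0.4, timesteps=[3, 7])

# tokens and timesteps are aligned element-wise:
# token k was emitted at frame hyp.timesteps[k].
aligned = list(zip(hyp.tokens, hyp.timesteps))
print(aligned)  # [(4, 3), (5, 7)]
```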

CTCDecoderLM

class torchaudio.models.decoder.CTCDecoderLM[source]

Language model base class for creating custom language models to use with the decoder.

Tutorials using CTCDecoderLM:
ASR Inference with CTC Decoder

abstract start(start_with_nothing: bool) → CTCDecoderLMState[source]

Initialize or reset the language model.

Parameters

start_with_nothing (bool) – whether or not to start the sentence with a silence (sil) token.

Returns

starting state

Return type

CTCDecoderLMState

abstract score(state: CTCDecoderLMState, usr_token_idx: int) → Tuple[CTCDecoderLMState, float][source]

Evaluate the language model based on the current LM state and new word.

Parameters
  • state (CTCDecoderLMState) – current LM state

  • usr_token_idx (int) – index of the new word

Returns

The new LM state and the corresponding score.

Return type

(CTCDecoderLMState, float)

abstract finish(state: CTCDecoderLMState) → Tuple[CTCDecoderLMState, float][source]

Evaluate the language model at sentence end, based on the current LM state.

Parameters

state (CTCDecoderLMState) – current LM state

Returns

The final LM state and the corresponding score.

Return type

(CTCDecoderLMState, float)
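A minimal custom LM following this start/score/finish interface might look like the sketch below. It assigns a constant score of 0.0 to every token and to sentence end (a "zero" LM). This is plain Python for illustration only; a real implementation would subclass torchaudio.models.decoder.CTCDecoderLM and return real CTCDecoderLMState objects:

```python
class ZeroLM:
    """Toy LM sketch: every token and every sentence end scores 0.0.

    Stand-in for a torchaudio.models.decoder.CTCDecoderLM subclass;
    opaque Python objects stand in for CTCDecoderLMState here.
    """

    def start(self, start_with_nothing):
        # Return an opaque starting state.
        return object()

    def score(self, state, usr_token_idx):
        # (new state, score) for extending the state with one token.
        return state, 0.0

    def finish(self, state):
        # (final state, score) at sentence end.
        return state, 0.0

lm = ZeroLM()
state = lm.start(False)
state, s1 = lm.score(state, 7)
state, s2 = lm.finish(state)
print(s1 + s2)  # 0.0
```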

CTCDecoderLMState

class torchaudio.models.decoder.CTCDecoderLMState[source]

Language model state.

Tutorials using CTCDecoderLMState:
ASR Inference with CTC Decoder

property children: Dict[int, CTCDecoderLMState]

Map of indices to LM states

child(usr_index: int) → CTCDecoderLMState[source]

Returns the child corresponding to usr_index, or creates and returns a new state if the input index is not found.

Parameters

usr_index (int) – index corresponding to child state

Returns

child state corresponding to usr_index

Return type

CTCDecoderLMState

compare(state: CTCDecoderLMState) → int[source]

Compare two language model states.

Parameters

state (CTCDecoderLMState) – LM state to compare against

Returns

0 if the states are the same, -1 if self is less, +1 if self is greater.

Return type

int
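The children/child pair behaves like a trie of LM states: child(i) returns the existing child for index i, or creates one on demand. A hedged pure-Python sketch of that behavior (not the real class, which is backed by the Flashlight decoder):

```python
class State:
    """Sketch of CTCDecoderLMState's trie behavior (illustration only)."""

    def __init__(self):
        self._children = {}

    @property
    def children(self):
        # Map of indices to child LM states.
        return self._children

    def child(self, usr_index):
        # Return the existing child, or create one for a new index.
        return self._children.setdefault(usr_index, State())

root = State()
a = root.child(3)   # created on first access
b = root.child(3)   # same object on repeated access
print(a is b, len(root.children))  # True 1
```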
