
CTCDecoder

class torchaudio.models.decoder.CTCDecoder[source]

CTC beam search decoder from Flashlight [Kahn et al., 2022].

This feature supports the following devices: CPU

Note

To build the decoder, please use the factory function ctc_decoder().

Tutorials using CTCDecoder:
ASR Inference with CTC Decoder

Methods

__call__

CTCDecoder.__call__(emissions: FloatTensor, lengths: Optional[Tensor] = None) → List[List[CTCHypothesis]][source]

Performs batched offline decoding.

Note

This method performs offline decoding in one go. To perform incremental decoding, please refer to decode_step().

Parameters
  • emissions (torch.FloatTensor) – CPU tensor of shape (batch, frame, num_tokens) storing sequences of probability distribution over labels; output of acoustic model.

  • lengths (Tensor or None, optional) – CPU tensor of shape (batch, ) storing the valid length along the time axis of each emission sequence in the batch.

Returns

List of sorted best hypotheses for each audio sequence in the batch.

Return type

List[List[CTCHypothesis]]
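The nested return structure can be illustrated with a plain-Python stand-in. The values below are made up and the decoder itself is not constructed; the field names mirror the CTCHypothesis support structure documented in this section:

```python
from collections import namedtuple

# Stand-in for torchaudio.models.decoder.CTCHypothesis (illustration only).
Hyp = namedtuple("Hyp", ["tokens", "words", "score", "timesteps"])

# __call__ returns one list of hypotheses per batch item, sorted best-first.
# Fake results for a batch of two utterances, two hypotheses each.
results = [
    [Hyp([5, 3], ["hello"], -1.2, [0, 4]), Hyp([5, 9], ["hallo"], -3.8, [0, 4])],
    [Hyp([7], ["hi"], -0.4, [2]), Hyp([8], ["hey"], -0.9, [2])],
]

# The best transcript for utterance i is results[i][0].
best_transcripts = [" ".join(hyps[0].words) for hyps in results]
print(best_transcripts)  # ['hello', 'hi']
```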

decode_begin

CTCDecoder.decode_begin()[source]

Initialize the internal state of the decoder.

See decode_step() for the usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

decode_end

CTCDecoder.decode_end()[source]

Finalize the internal state of the decoder.

See decode_step() for the usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

decode_step

CTCDecoder.decode_step(emissions: FloatTensor)[source]

Perform incremental decoding on top of the current internal state.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

Parameters

emissions (torch.FloatTensor) – CPU tensor of shape (frame, num_tokens) storing sequences of probability distribution over labels; output of acoustic model.

Example

>>> decoder = torchaudio.models.decoder.ctc_decoder(...)
>>> decoder.decode_begin()
>>> decoder.decode_step(emission1)
>>> decoder.decode_step(emission2)
>>> decoder.decode_end()
>>> result = decoder.get_final_hypothesis()

get_final_hypothesis

CTCDecoder.get_final_hypothesis() → List[CTCHypothesis][source]

Get the final hypothesis.

Returns

List of sorted best hypotheses.

Return type

List[CTCHypothesis]

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

idxs_to_tokens

CTCDecoder.idxs_to_tokens(idxs: LongTensor) → List[source]

Map raw token IDs to their corresponding tokens.

Parameters

idxs (LongTensor) – raw token IDs generated by the decoder

Returns

tokens corresponding to the input IDs

Return type

List
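In spirit, idxs_to_tokens is a lookup from raw decoder indices into the token list the decoder was built with. A hedged pure-Python sketch (the token list and the meanings of "-" and "|" are assumptions for illustration; the real method uses the decoder's internal mapping):

```python
# Hypothetical token list, in the order the decoder was built with.
# "-" stands for the CTC blank and "|" for the word boundary (assumptions).
tokens = ["-", "|", "a", "b", "h", "i"]

def idxs_to_tokens(idxs):
    """Look up each raw token ID in the decoder's token list."""
    return [tokens[i] for i in idxs]

print(idxs_to_tokens([4, 5, 1]))  # ['h', 'i', '|']
```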

Support Structures

CTCHypothesis

class torchaudio.models.decoder.CTCHypothesis(tokens: torch.LongTensor, words: List[str], score: float, timesteps: torch.IntTensor)[source]

Represents a hypothesis generated by the CTC beam search decoder CTCDecoder.

Tutorials using CTCHypothesis:
ASR Inference with CTC Decoder

tokens: LongTensor

Predicted sequence of token IDs. Shape (L, ), where L is the length of the output sequence

words: List[str]

List of predicted words.

Note

This attribute is only applicable if a lexicon is provided to the decoder. If decoding without a lexicon, it will be empty; refer to tokens and idxs_to_tokens() instead.

score: float

Score corresponding to the hypothesis

timesteps: IntTensor

Timesteps corresponding to the tokens. Shape (L, ), where L is the length of the output sequence
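The fields are parallel views of one hypothesis: tokens and timesteps are aligned element-wise, while words is only populated in lexicon mode. A small sketch with stand-in values (not produced by a real decoder):

```python
from collections import namedtuple

# Stand-in for the documented structure, with made-up values.
CTCHypothesis = namedtuple("CTCHypothesis", ["tokens", "words", "score", "timesteps"])
hyp = CTCHypothesis(tokens=[4, 5], words=["hi"], score=-0.4, timesteps=[3, 7])

# tokens and timesteps are aligned element-wise:
# token k was emitted at frame hyp.timesteps[k].
aligned = list(zip(hyp.tokens, hyp.timesteps))
print(aligned)  # [(4, 3), (5, 7)]
```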

CTCDecoderLM

class torchaudio.models.decoder.CTCDecoderLM[source]

Language model base class for creating custom language models to use with the decoder.

Tutorials using CTCDecoderLM:
ASR Inference with CTC Decoder

abstract start(start_with_nothing: bool) → CTCDecoderLMState[source]

Initialize or reset the language model.

Parameters

start_with_nothing (bool) – whether or not to start the sentence with a silence (sil) token.

Returns

starting state

Return type

CTCDecoderLMState

abstract score(state: CTCDecoderLMState, usr_token_idx: int) → Tuple[CTCDecoderLMState, float][source]

Evaluate the language model based on the current LM state and new word.

Parameters
  • state (CTCDecoderLMState) – current LM state

  • usr_token_idx (int) – index of the new word

Returns

The new LM state and the corresponding score.

Return type

(CTCDecoderLMState, float)

abstract finish(state: CTCDecoderLMState) → Tuple[CTCDecoderLMState, float][source]

Evaluate the language model at sentence end, based on the current LM state.

Parameters

state (CTCDecoderLMState) – current LM state

Returns

The final LM state and the corresponding score.

Return type

(CTCDecoderLMState, float)
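A minimal custom LM following this start/score/finish interface might look like the sketch below. It assigns a constant score of 0.0 to every token and to sentence end (a "zero" LM). This is plain Python for illustration only; a real implementation would subclass torchaudio.models.decoder.CTCDecoderLM and return real CTCDecoderLMState objects:

```python
class ZeroLM:
    """Toy LM sketch: every token and every sentence end scores 0.0.

    Stand-in for a torchaudio.models.decoder.CTCDecoderLM subclass;
    opaque Python objects stand in for CTCDecoderLMState here.
    """

    def start(self, start_with_nothing):
        # Return an opaque starting state.
        return object()

    def score(self, state, usr_token_idx):
        # (new state, score) for extending the state with one token.
        return state, 0.0

    def finish(self, state):
        # (final state, score) at sentence end.
        return state, 0.0

lm = ZeroLM()
state = lm.start(False)
state, s1 = lm.score(state, 7)
state, s2 = lm.finish(state)
print(s1 + s2)  # 0.0
```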

CTCDecoderLMState

class torchaudio.models.decoder.CTCDecoderLMState[source]

Language model state.

Tutorials using CTCDecoderLMState:
ASR Inference with CTC Decoder

property children: Dict[int, CTCDecoderLMState]

Map of indices to LM states

child(usr_index: int) → CTCDecoderLMState[source]

Returns the child corresponding to usr_index, or creates and returns a new state if the input index is not found.

Parameters

usr_index (int) – index corresponding to child state

Returns

child state corresponding to usr_index

Return type

CTCDecoderLMState

compare(state: CTCDecoderLMState) → int[source]

Compare two language model states.

Parameters

state (CTCDecoderLMState) – LM state to compare against

Returns

0 if the states are the same, -1 if self is less, +1 if self is greater.

Return type

int
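The children/child pair behaves like a trie of LM states: child(i) returns the existing child for index i, or creates one on demand. A hedged pure-Python sketch of that behavior (not the real class, which is backed by the Flashlight decoder):

```python
class State:
    """Sketch of CTCDecoderLMState's trie behavior (illustration only)."""

    def __init__(self):
        self._children = {}

    @property
    def children(self):
        # Map of indices to child LM states.
        return self._children

    def child(self, usr_index):
        # Return the existing child, or create one for a new index.
        return self._children.setdefault(usr_index, State())

root = State()
a = root.child(3)   # created on first access
b = root.child(3)   # same object on repeated access
print(a is b, len(root.children))  # True 1
```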
