RNNTBeamSearch
- class torchaudio.models.RNNTBeamSearch(model: RNNT, blank: int, temperature: float = 1.0, hypo_sort_key: Optional[Callable[[Tuple[List[int], Tensor, List[List[Tensor]], float]], float]] = None, step_max_tokens: int = 100)
- Beam search decoder for an RNN-T model.
- See also: torchaudio.pipelines.RNNTBundle, an ASR pipeline with a pretrained model.
 - Parameters:
- model (RNNT) – RNN-T model to use. 
- blank (int) – index of blank token in vocabulary. 
- temperature (float, optional) – temperature to apply to joint network output. Larger values yield more uniform samples. (Default: 1.0) 
- hypo_sort_key (Callable[[Hypothesis], float] or None, optional) – callable that computes a score for a given hypothesis; hypotheses are ranked by this score. If None, defaults to a callable that returns the hypothesis score normalized by token sequence length. (Default: None)
- step_max_tokens (int, optional) – maximum number of tokens to emit per input time step. (Default: 100) 
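
The effect of temperature can be illustrated without the model itself. The sketch below assumes the conventional meaning of the parameter, namely that the joint network's logits are divided by the temperature before a softmax; the helper name `softmax_with_temperature` and the example logits are illustrative and not part of torchaudio.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits / temperature (assumed meaning of `temperature`)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up joint-network outputs for three tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
base = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=4.0)

for name, probs in (("T=0.5", sharp), ("T=1.0", base), ("T=4.0", flat)):
    print(name, [round(p, 3) for p in probs])
```

As the temperature grows, the largest probability shrinks and the distribution moves toward uniform, which matches the description of the parameter above.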
 
Methods
forward
- RNNTBeamSearch.forward(input: Tensor, length: Tensor, beam_width: int) -> List[Tuple[List[int], Tensor, List[List[Tensor]], float]]
- Performs beam search for the given input sequence.
- T: number of frames; D: feature dimension of each frame.
- Parameters:
- input (torch.Tensor) – sequence of input frames, with shape (T, D) or (1, T, D). 
- length (torch.Tensor) – number of valid frames in input sequence, with shape () or (1,). 
- beam_width (int) – beam size to use during search. 
 
- Returns:
- top-beam_width hypotheses found by beam search.
- Return type:
- List[Hypothesis] 
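
Going by the signature, each Hypothesis is a 4-tuple whose first element is the token list and whose last element is a float score. A minimal sketch of ranking such tuples by length-normalized score follows. The helper names are hypothetical, the two middle fields are treated as opaque, and the hypotheses are hand-made placeholders rather than real decoder output. The library's exact default normalization is not spelled out here, so `length_normalized_score` is only one plausible reading of it.

```python
from typing import Any, List, Tuple

# Layout taken from the signature: (tokens, tensor, nested tensor list, score).
Hypothesis = Tuple[List[int], Any, Any, float]


def length_normalized_score(hypo: Hypothesis) -> float:
    """Score divided by the number of tokens (at least 1 to avoid dividing by zero)."""
    tokens, score = hypo[0], hypo[3]
    return score / max(len(tokens), 1)


def best_tokens(hypotheses: List[Hypothesis], blank: int) -> List[int]:
    """Tokens of the highest-ranked hypothesis, with any blank indices dropped."""
    best = max(hypotheses, key=length_normalized_score)
    return [t for t in best[0] if t != blank]


# Placeholder hypotheses; the scores are assumed to be log-probabilities.
hypos: List[Hypothesis] = [
    ([0, 5, 7], None, None, -3.0),        # -1.0 per token
    ([0, 5, 7, 9, 2], None, None, -4.0),  # -0.8 per token
]

print(best_tokens(hypos, blank=0))
```

A callable with the same shape as `length_normalized_score` could also be passed as hypo_sort_key when constructing the decoder, since that parameter accepts any function from a hypothesis to a float.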
 
infer
- RNNTBeamSearch.infer(input: Tensor, length: Tensor, beam_width: int, state: Optional[List[List[Tensor]]] = None, hypothesis: Optional[List[Tuple[List[int], Tensor, List[List[Tensor]], float]]] = None) -> Tuple[List[Tuple[List[int], Tensor, List[List[Tensor]], float]], List[List[Tensor]]]
- Performs beam search for the given input sequence in streaming mode.
- T: number of frames; D: feature dimension of each frame.
- Parameters:
- input (torch.Tensor) – sequence of input frames, with shape (T, D) or (1, T, D). 
- length (torch.Tensor) – number of valid frames in input sequence, with shape () or (1,). 
- beam_width (int) – beam size to use during search. 
- state (List[List[torch.Tensor]] or None, optional) – list of lists of tensors representing the transcription network internal state generated in the preceding invocation. (Default: None)
- hypothesis (List[Hypothesis] or None, optional) – hypotheses from the preceding invocation to seed the search with. (Default: None)
 
- Returns:
- List[Hypothesis]
- top-beam_width hypotheses found by beam search.
- List[List[torch.Tensor]]
- list of lists of tensors representing transcription network internal state generated in current invocation. 
 
- Return type:
- (List[Hypothesis], List[List[torch.Tensor]]) 
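
In streaming use, both return values of one infer call are fed into the next. The loop below is a minimal sketch of that pattern. `stream_decode` is a hypothetical helper, `decoder` is assumed to be an RNNTBeamSearch instance (or any object with the same infer signature), and each chunk is assumed to be an (input, length) pair already shaped as described in the parameters above.

```python
from typing import Any, Iterable, List, Optional, Tuple


def stream_decode(
    decoder: Any,
    chunks: Iterable[Tuple[Any, Any]],
    beam_width: int,
) -> List[Any]:
    """Run decoder.infer over successive chunks, carrying state and hypotheses forward."""
    state: Optional[Any] = None             # no transcription-network state before the first chunk
    hypotheses: Optional[List[Any]] = None  # no seed hypotheses before the first chunk
    for features, length in chunks:
        # Each call returns the new top hypotheses and the updated state.
        hypotheses, state = decoder.infer(
            features,
            length,
            beam_width,
            state=state,
            hypothesis=hypotheses,
        )
    # An empty stream never calls infer, so there is nothing to return.
    return hypotheses if hypotheses is not None else []
```

Passing None for both state and hypothesis on the first chunk relies on the documented defaults; after the last chunk, the returned list holds the hypotheses for the whole stream seen so far.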
 
 
