STT_EN_FASTCONFORMER_TRANSDUCER_XLARG - Throws error for Tensor + List operation in confidence calculation #10066

Vladi-SmartAssets · 2024-08-07T11:57:36Z

Describe the bug

Extracting confidence scores from the STT FastConformer transducer model raises `TypeError: unsupported operand type(s) for +: 'Tensor' and 'list'`.

Steps/Code to reproduce bug

```python
import nemo.collections.asr as nemo_asr
from nemo.collections.asr.parts.submodules.rnnt_decoding import RNNTDecodingConfig
from nemo.collections.asr.parts.utils.asr_confidence_utils import ConfidenceConfig, ConfidenceMethodConfig


# HuggingFaceBaseModel is the reporter's own base class (not shown in the issue).
class NemoModel(HuggingFaceBaseModel):

    def __init__(self, model_name, model_path):
        super().__init__(model_name)
        self.model_path = model_path
        self.model = None

    def load_model(self):
        self.model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(self.model_path, map_location="mps")

    def predict(self, input_paths):
        confidence_cfg = ConfidenceConfig(
            preserve_frame_confidence=True,  # Internally set to true if preserve_token_confidence == True
            # or preserve_word_confidence == True
            preserve_token_confidence=True,  # Internally set to true if preserve_word_confidence == True
            preserve_word_confidence=True,
            aggregation="prod",  # How to aggregate frame scores to token scores and token scores to word scores
            exclude_blank=False,  # If true, only non-blank emissions contribute to confidence scores
            tdt_include_duration=False,  # If true, calculate duration confidence for the TDT models
            method_cfg=ConfidenceMethodConfig(  # Config for per-frame scores calculation (before aggregation)
                name="max_prob",  # Or "entropy" (default), which usually works better
                entropy_type="gibbs",  # Used only for name == "entropy". Recommended: "tsallis" (default) or "renyi"
                alpha=0.5,  # Low values (<1) increase sensitivity, high values decrease sensitivity
                entropy_norm="lin",  # How to normalize (map to [0,1]) entropy. Default: "exp"
            ),
        )
        self.model.change_decoding_strategy(
            RNNTDecodingConfig(fused_batch_size=-1, strategy="greedy_batch", confidence_cfg=confidence_cfg)
        )

        transcriptions = self.model.transcribe(audio=input_paths, return_hypotheses=True)

        # Keep only the first element of the returned result.
        fastconformer_transcriptions = [x for x in transcriptions][0]

        return fastconformer_transcriptions
```

Calling `predict`, which runs `model.transcribe`, throws the following error:

for ts, te in zip(hyp.timestep, hyp.timestep[1:] + [len(hyp.frame_confidence)]): TypeError: unsupported operand type(s) for +: 'Tensor' and 'list'

Expected behavior

Expected behaviour is for the `zip` call to handle a tensor rather than assume a list: here `hyp.timestep` is a `Tensor` and `hyp.frame_confidence` is a list of tensors (`Tensor[float]` and `List[Tensor[float]]`, respectively).

Environment overview (please complete the following information)

Environment location: Docker
Method of NeMo install: pip install nemo

Environment details

If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:

OS version: MacOS 14.5 (23F79)
PyTorch version: 2.3.1
Python version: 3.10

Additional context

The model is loaded on the MPS backend (`map_location="mps"`).

Proposed solution:
Replace line 633 of nemo/collections/asr/parts/submodules/rnnt_decoding.py:

for ts, te in zip(hyp.timestep, hyp.timestep[1:] + [len(hyp.frame_confidence)]):

with

for ts, te in zip(hyp.timestep, hyp.timestep[1:] + len(hyp.frame_confidence)):
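One caveat about the replacement above: in PyTorch, `tensor + int` adds the integer to every element rather than appending it. The `TypeError` would go away, but the end frames would be shifted and the last token would lose its span. A minimal sketch of an alternative is to normalize `hyp.timestep` to a plain list before pairing. The helper name `frame_spans` is hypothetical and not part of NeMo. It only assumes that a tensor-like `timestep` exposes `.tolist()`, as `torch.Tensor` does.

```python
from typing import List, Tuple


def frame_spans(timestep, num_frames: int) -> List[Tuple[int, int]]:
    """Pair each token's start frame with the start frame of the next token.

    Hypothetical helper for illustration, not NeMo code. `timestep` may be a
    plain list of ints or a tensor-like object exposing `.tolist()`.
    The last token's span ends at `num_frames`.
    """
    # Convert tensor-like input to a Python list so that `+` concatenates.
    if hasattr(timestep, "tolist"):
        timestep = timestep.tolist()
    starts = [int(t) for t in timestep]
    ends = starts[1:] + [num_frames]
    return list(zip(starts, ends))


print(frame_spans([0, 3, 7], 10))  # [(0, 3), (3, 7), (7, 10)]
```

In the loop from the issue, this would correspond to iterating over `frame_spans(hyp.timestep, len(hyp.frame_confidence))`. Whether that matches the fix eventually adopted upstream is not stated in this thread.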

GNroy · 2024-08-22T17:25:35Z

@Vladi-SmartAssets Hi,
I cannot reproduce the issue in the latest main.
What NeMo version are you using?

Vladi-SmartAssets added the bug Something isn't working label Aug 7, 2024

GNroy self-assigned this Aug 17, 2024

GNroy linked a pull request Sep 18, 2024 that will close this issue

RNN-T confidence fix #10519
