Describe the bug
`TypeError: unsupported operand type(s) for +: 'Tensor' and 'list'` occurs when extracting confidence scores from the STT FastConformer model.
Steps/Code to reproduce bug
```python
import nemo.collections.asr as nemo_asr
from nemo.collections.asr.parts.submodules.rnnt_decoding import RNNTDecodingConfig
from nemo.collections.asr.parts.utils.asr_confidence_utils import (
    ConfidenceConfig,
    ConfidenceMethodConfig,
)


# HuggingFaceBaseModel is the reporter's own base class (not shown in this report).
class NemoModel(HuggingFaceBaseModel):
    def __init__(self, model_name, model_path):
        super().__init__(model_name)
        self.model_path = model_path
        self.model = None

    def load_model(self):
        # Restore the FastConformer RNNT checkpoint onto Apple MPS.
        self.model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(self.model_path, map_location="mps")

    def predict(self, input_paths):
        confidence_cfg = ConfidenceConfig(
            preserve_frame_confidence=True,  # Internally set to True if preserve_token_confidence == True
                                             # or preserve_word_confidence == True
            preserve_token_confidence=True,  # Internally set to True if preserve_word_confidence == True
            preserve_word_confidence=True,
            aggregation="prod",  # How to aggregate frame scores to token scores and token scores to word scores
            exclude_blank=False,  # If True, only non-blank emissions contribute to confidence scores
            tdt_include_duration=False,  # If True, calculate duration confidence for the TDT models
            method_cfg=ConfidenceMethodConfig(  # Config for per-frame score calculation (before aggregation)
                name="max_prob",  # Or "entropy" (default), which usually works better
                entropy_type="gibbs",  # Used only for name == "entropy". Recommended: "tsallis" (default) or "renyi"
                alpha=0.5,  # Low values (<1) increase sensitivity, high values decrease sensitivity
                entropy_norm="lin",  # How to normalize (map to [0,1]) entropy. Default: "exp"
            ),
        )
        self.model.change_decoding_strategy(
            RNNTDecodingConfig(fused_batch_size=-1, strategy="greedy_batch", confidence_cfg=confidence_cfg)
        )
        transcriptions = self.model.transcribe(audio=input_paths, return_hypotheses=True)
        fastconformer_transcriptions = [x for x in transcriptions][0]
        return fastconformer_transcriptions
```
When this code runs, `model.transcribe` throws the following error:
`for ts, te in zip(hyp.timestep, hyp.timestep[1:] + [len(hyp.frame_confidence)]):`
`TypeError: unsupported operand type(s) for +: 'Tensor' and 'list'`
Expected behavior
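The `zip` call should handle `hyp.timestep` being a `torch.Tensor` instead of assuming it is a list. Here `hyp.timestep` is a `Tensor` and `hyp.frame_confidence` is a list of tensors (`Tensor[float]` and `List[Tensor[float]]`, respectively), so `hyp.timestep[1:] + [len(hyp.frame_confidence)]` attempts to add a Python list to a tensor.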
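The type mismatch can be reproduced without NeMo. This minimal sketch assumes `hyp.timestep` is a 1-D integer `torch.Tensor`; the frame values are made up for illustration. It shows that `Tensor + list` raises the reported `TypeError`, that adding an `int` instead silently broadcasts, and that converting with `.tolist()` first gives the intended concatenation.

```python
import torch

# Hypothetical values: the start frame of each emitted token and the total
# number of frames (standing in for len(hyp.frame_confidence)).
timestep = torch.tensor([0, 2, 5])
num_frames = 8

# The expression from the report: Tensor + list is not supported by PyTorch.
raised = False
try:
    timestep[1:] + [num_frames]
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")

# Adding the int directly does not fail, but it broadcasts: every element is
# shifted by num_frames instead of num_frames being appended.
shifted = timestep[1:] + num_frames
print(shifted)  # tensor([10, 13])

# Converting the tensor to a Python list first restores list concatenation.
ts_list = timestep.tolist()
spans = list(zip(ts_list, ts_list[1:] + [num_frames]))
print(spans)  # [(0, 2), (2, 5), (5, 8)]
```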
Environment overview
Environment location: Docker
Method of NeMo install: pip install nemo
Environment details
OS version: MacOS 14.5 (23F79)
PyTorch version: 2.3.1
Python version: 3.10
Additional context
The model runs on Apple MPS (`map_location="mps"` in `load_model`).
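Proposed solution:
Replace line 633 of `nemo/collections/asr/parts/submodules/rnnt_decoding.py`:
`for ts, te in zip(hyp.timestep, hyp.timestep[1:] + [len(hyp.frame_confidence)]):`
with a version that converts the tensor to a list first. Adding the integer directly, as in `hyp.timestep[1:] + len(hyp.frame_confidence)`, would not raise, but it would add the frame count to every element instead of appending it:
`timestep = hyp.timestep.tolist() if torch.is_tensor(hyp.timestep) else hyp.timestep`
`for ts, te in zip(timestep, timestep[1:] + [len(hyp.frame_confidence)]):`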
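The pairing that this loop performs can be sketched as a standalone function. `token_frame_spans` is a hypothetical helper, not part of NeMo: it accepts either a list or any object with a `.tolist()` method (such as a `torch.Tensor`), and pairs each token's start frame with the next token's start frame, closing the last span at the total frame count. It is a simplified sketch of the span logic only and does not reproduce NeMo's confidence aggregation.

```python
from typing import Any, List, Tuple


def token_frame_spans(timestep: Any, num_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame pairs, one per emitted token.

    Hypothetical helper mirroring the zip() in rnnt_decoding.py; `timestep`
    may be a sequence of ints or a tensor/array exposing .tolist().
    """
    # Normalize tensors (or NumPy arrays) to a plain Python list so that
    # "+" below means list concatenation rather than element-wise addition.
    if hasattr(timestep, "tolist"):
        timestep = timestep.tolist()
    starts: List[int] = [int(t) for t in timestep]
    # Each token ends where the next one starts; the last ends at num_frames.
    ends = starts[1:] + [num_frames]
    return list(zip(starts, ends))


print(token_frame_spans([0, 2, 2, 5], num_frames=8))
# [(0, 2), (2, 2), (2, 5), (5, 8)]
```

Equal start and end values, such as `(2, 2)`, occur when an RNNT decoder emits several tokens on the same frame. With PyTorch installed, passing `torch.tensor([0, 2, 2, 5])` instead of the list yields the same pairs.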