Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Tests are WIP
📝 Walkthrough

Adds a NeMo ASR integration: a new NemoASRModel client, a FastAPI-based NemoASRServer exposing /v1/audio/transcriptions, orchestration/registry updates to support nemo_asr, and unit tests for tar/datastore and prompt handling.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Client as "NemoASRModel\n(Async Client)"
    participant HTTPClient as "httpx.AsyncClient\n(Connection Pool)"
    participant Server as "NemoASRServer\n(FastAPI)"
    participant NeMo as "NeMo ASR\n(Model)"
    User->>Client: generate_async(prompt, ...)
    Client->>Client: resolve audio path / tar member
    Client->>Client: optional duration check / chunk decision
    Client->>HTTPClient: POST /v1/audio/transcriptions\n(multipart/form-data)
    HTTPClient->>Server: HTTP Request
    Server->>Server: validate params & options
    alt chunking requested
        Server->>NeMo: transcribe(chunk_1)
        NeMo->>Server: text_chunk_1
        Server->>NeMo: transcribe(chunk_n)
        NeMo->>Server: text_chunk_n
        Server->>Server: combine results & timestamps
    else single transcription
        Server->>NeMo: transcribe(audio)
        NeMo->>Server: text + metadata
    end
    Server->>HTTPClient: HTTP Response (OpenAI format)
    HTTPClient->>Client: Response JSON
    Client->>User: parsed structured result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@nemo_skills/inference/server/serve_nemo_asr.py`:
- Around line 222-227: The direct access hypotheses[0][0] in serve_nemo_asr.py
can raise IndexError if hypotheses[0] is empty; update the extraction logic in
the same block that currently raises RuntimeError("No transcription returned
from model") to also validate that hypotheses[0] has at least one element before
indexing, and if not raise a clear RuntimeError (or include the error context) —
locate the lines around hypothesis = hypotheses[0][0] and text = hypothesis.text
and add a guard that checks len(hypotheses[0]) > 0 (or similar) and handles the
empty-inner-list case consistently with the chunked path.
- Around line 157-159: The code accesses hypotheses[0][0].text without ensuring
hypotheses[0] is non-empty which can raise IndexError; update the block in
serve_nemo_asr.py (around where chunk_texts is appended) to first check both
len(hypotheses) > 0 and len(hypotheses[0]) > 0 (or safely find the first
non-empty hypothesis) before reading .text, e.g., guard the access to
hypotheses[0][0].text and only append text.strip() when that nested element
exists.
- Around line 206-258: The verbose_json path references hypothesis.timestep but
hypothesis is only set in the non-chunking branch, so when chunking is used (see
_transcribe_with_chunking) you get a NameError; fix by either making
_transcribe_with_chunking return timestamps/word-level data alongside (text,
inference_time) and use those to populate words (with offsets) when
enable_timestamps is true, or skip timestamp extraction for the chunked path
(i.e., only populate words when hypothesis exists or when the chunked result
includes a timestamps structure); update the code paths handling
chunk_duration_sec, _transcribe_with_chunking return signature, and the
verbose_json block (references: _transcribe_with_chunking, hypothesis,
enable_timestamps, response_format == "verbose_json") accordingly.
🧹 Nitpick comments (5)
nemo_skills/inference/server/serve_nemo_asr.py (2)
83-83: Remove extraneous `f` prefix from string literal.

This f-string has no placeholders.

Suggested fix

```diff
- LOG.info(f"Model moved to GPU")
+ LOG.info("Model moved to GPU")
```
294-295: Consider logging when the `model` parameter differs from the server model.

The `model` parameter is documented as ignored for OpenAI API compatibility, which is fine. However, if a user passes a different model name expecting different behavior, they might be confused. Consider logging a debug message when the passed model differs from the actual loaded model.

nemo_skills/inference/model/nemo_asr.py (3)
92-93: Unused `**kwargs` silently ignores unexpected parameters.

Per coding guidelines, the code should fail if a user specifies an unsupported argument. Currently, any extra keyword arguments passed to `__init__` are silently ignored.

Suggested fix: warn or raise for unexpected kwargs

```diff
     max_workers: int = 64,
-    **kwargs,
 ) -> None:
     """Initialize NemoASRModel client."""
```

Or if compatibility requires accepting kwargs, at least log a warning:

```diff
     **kwargs,
 ) -> None:
     """Initialize NemoASRModel client."""
+    if kwargs:
+        LOG.warning(f"NemoASRModel ignoring unexpected parameters: {list(kwargs.keys())}")
```
201-201: Use direct key access for the expected `"text"` field.

Per coding guidelines, don't use `.get()` for accessing dictionary keys that are expected to be present. The transcription response should always contain `"text"`; using `.get()` with a default masks malformed server responses.

Suggested fix

```diff
- pred_text = result_data.get("text", "")
+ pred_text = result_data["text"]
```

As per coding guidelines: "Don't use `.get()` for accessing dictionary keys if the code expects them to be present. Use direct key access like `data[key_name]` to allow clear errors when keys are missing."
268-287: `asyncio.get_event_loop()` is deprecated; `__del__` cleanup is fragile.

`asyncio.get_event_loop()` raises `DeprecationWarning` in Python 3.10+ when called without a running loop. The `create_task` without storing a reference (line 276) may result in the task being garbage-collected before completion.

Consider relying on explicit cleanup via the async context manager (`__aexit__`) instead of `__del__`, or use a more robust pattern.

Suggested simplification

```diff
     def __del__(self):
         """Clean up resources."""
-        # Close HTTP client
-        if hasattr(self, "_client"):
-            try:
-                # Try to close gracefully if event loop is available
-                loop = asyncio.get_event_loop()
-                if loop.is_running():
-                    asyncio.create_task(self._client.aclose())
-                else:
-                    loop.run_until_complete(self._client.aclose())
-            except Exception:
-                pass  # Ignore errors during cleanup
-
         # Close SSH tunnel
         if hasattr(self, "_tunnel") and self._tunnel:
             try:
                 self._tunnel.stop()
             except Exception:
                 pass  # Ignore errors during cleanup
```

For HTTP client cleanup, document that users should use the async context manager or call `await model._client.aclose()` explicitly.
@Jorjeous can we also add support for reading audio samples from the nemo tarred dataset format for inference?
Yep, if you feel the need for it |
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Actionable comments posted: 6
🧹 Nitpick comments (4)
nemo_skills/inference/server/serve_nemo_asr.py (1)
53-56: Missing `raise ... from` in three `except` re-raise sites.

All three spots raise a new exception inside an `except` clause without chaining the cause, losing the original traceback context (Ruff B904). Line 70 also warrants `LOG.exception` instead of `LOG.error` to capture the stack trace (Ruff TRY400).

♻️ Proposed fixes

```diff
 # Lines 54-56
 except ImportError:
-    raise ImportError("NeMo toolkit is not installed. ...")
+    raise ImportError("NeMo toolkit is not installed. ...") from None

 # Lines 68-70
 except Exception as e:
-    LOG.error(f"Failed to load model from NGC: {e}")
+    LOG.exception(f"Failed to load model from NGC: {e}")

 # Lines 141-143
 except ImportError:
-    raise ImportError("soundfile and numpy are required for audio chunking")
+    raise ImportError("soundfile and numpy are required for audio chunking") from None

 # Lines 349-351
 except Exception as e:
     LOG.error(f"Transcription failed: {e}", exc_info=True)
-    raise HTTPException(status_code=500, detail=str(e))
+    raise HTTPException(status_code=500, detail=str(e)) from e
```

Also applies to: 140-143, 349-351
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/server/serve_nemo_asr.py` around lines 53 - 56, Update the three exception handlers that re-raise ImportError to preserve exception chaining and context by using "except Exception as e: raise ImportError('...') from e" (apply this to the nemo_asr import block and the two similar import blocks further down), and replace the LOG.error(...) call at the referenced catch site with LOG.exception(...) so the stack trace is logged; use a consistent exception variable name (e or exc) in each handler when chaining.

nemo_skills/inference/model/nemo_asr.py (1)
519-538: `except Exception` in `_check_audio_duration` silently masks `soundfile` parsing errors.

A corrupted or unsupported audio file would cause `sf.info()` to raise, which is caught here and silently returns `None` (no chunking). The caller then proceeds to send the file to the server, which may fail with a less informative error. Per coding guidelines, exceptions not normally expected to be raised should not be caught silently.

♻️ Proposed fix – narrow the catch

```diff
-        except Exception as e:
-            LOG.warning(f"Failed to check audio duration: {e}")
+        except (OSError, RuntimeError) as e:
+            LOG.warning(f"Failed to check audio duration: {e}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/model/nemo_asr.py` around lines 519 - 538, The current _check_audio_duration function catches all exceptions around sf.info() which masks parsing errors; change the except block to only catch soundfile-specific exceptions (use sf.SoundFileError or the appropriate soundfile exception type) and log/return None for those cases, while allowing other unexpected exceptions to propagate (re-raise) so callers see real failures; reference sf.info, sf.SoundFileError (or soundfile's specific error class), and chunk_audio_threshold_sec when making the change.

tests/test_nemo_asr_support.py (2)
1-253: No test covers the `generate_async` end-to-end flow.

The core integration path (`generate_async` sending multipart form data to the server and parsing the response) has no test coverage. This is the primary public interface of `NemoASRModel`. Even a basic mock of `httpx.AsyncClient.post` would provide a safety net for the request-building and response-parsing logic (lines 472-498 in `nemo_asr.py`).

Would you like me to draft a `pytest-asyncio`-based test that mocks `httpx.AsyncClient.post` to cover the `generate_async` happy path and the HTTP error path?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_nemo_asr_support.py` around lines 1 - 253, Add an async pytest that exercises NemoASRModel.generate_async by mocking httpx.AsyncClient.post to simulate the server happy path and an error path: create a test using pytest-asyncio that constructs a NemoASRModel, patches httpx.AsyncClient.post to capture the sent multipart/form-data request (asserting file payload, form fields such as language/timestamps, and headers), and returns an httpx.Response-like object with status_code 200 and a JSON body matching the server hypothesis structure so generate_async returns the expected transcription; also add a second case where the mock returns a non-200 status (or raises httpx.HTTPError) and assert generate_async raises the appropriate error. Ensure you reference NemoASRModel.generate_async and patch httpx.AsyncClient.post (or the instance on model._client) so the test covers request-building and response-parsing end-to-end.
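A rough, self-contained sketch of the happy-path half of such a test, using `unittest.mock.AsyncMock` in place of a patched `httpx.AsyncClient`; `fake_generate_async` and the response shape are assumptions standing in for the real `NemoASRModel.generate_async` and server schema (only the `"text"` field comes from the review):

```python
import asyncio
from unittest.mock import AsyncMock


class FakeResponse:
    """Minimal stand-in for httpx.Response with just what the sketch needs."""

    status_code = 200

    def raise_for_status(self):
        pass

    def json(self):
        return {"text": "hello world"}


async def fake_generate_async(client, audio_bytes):
    # Mirrors the request-building / response-parsing path the test should cover.
    response = await client.post(
        "/v1/audio/transcriptions",
        files={"file": ("sample.wav", audio_bytes, "audio/wav")},
    )
    response.raise_for_status()
    return response.json()["text"]


client = AsyncMock()
client.post.return_value = FakeResponse()
text = asyncio.run(fake_generate_async(client, b"\x00\x01"))
print(text)  # → hello world
```

The error path would swap `FakeResponse` for one whose `raise_for_status` raises, asserting that the exception propagates.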
77-87: Use `pytest.raises` instead of the manual `try/except/AssertionError` pattern.

The current approach produces confusing failure output and is more verbose; `pytest.raises` is the idiomatic replacement.

♻️ Proposed refactor

```diff
+import pytest

-    try:
-        NemoASRServer._extract_first_hypothesis([])
-        raise AssertionError("Expected RuntimeError for empty hypotheses")
-    except RuntimeError as e:
-        assert "No transcription returned" in str(e)
-
-    try:
-        NemoASRServer._extract_first_hypothesis([[]])
-        raise AssertionError("Expected RuntimeError for empty inner hypotheses")
-    except RuntimeError as e:
-        assert "empty transcription hypotheses" in str(e)
+    with pytest.raises(RuntimeError, match="No transcription returned"):
+        NemoASRServer._extract_first_hypothesis([])
+
+    with pytest.raises(RuntimeError, match="empty transcription hypotheses"):
+        NemoASRServer._extract_first_hypothesis([[]])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_nemo_asr_support.py` around lines 77 - 87, Replace the manual try/except/AssertionError blocks testing NemoASRServer._extract_first_hypothesis with pytest.raises context managers: use "with pytest.raises(RuntimeError, match='No transcription returned')" for the first call with [] and "with pytest.raises(RuntimeError, match='empty transcription hypotheses')" for the second call with [[]] (or alternatively capture the context as e and assert on str(e.value)). This will make tests concise and produce clearer failure output while still verifying the exact exception messages from _extract_first_hypothesis.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/inference/model/nemo_asr.py`:
- Around line 549-556: The __del__ method currently only stops the SSH tunnel
and leaves httpx.AsyncClient connection pools open; update __del__ to also close
self._client by checking hasattr(self, "_client") and not
self._client.is_closed, then attempt to close it: if the event loop is running
schedule self._client.aclose() via loop.create_task(...) otherwise call
loop.run_until_complete(self._client.aclose()); keep the existing try/except
swallowing errors and retain the _tunnel.stop() cleanup; alternatively, if you
prefer, document that callers must use "async with NemoASRModel(...)" to avoid
leaks.
- Around line 106-109: The NemoASRModel currently logs and ignores unsupported
parameters (tokenizer and output_dir) via LOG.warning; change this to raise a
clear exception so callers cannot silently pass unsupported args: inside the
NemoASRModel constructor or initializer where tokenizer and output_dir are
checked, replace the LOG.warning calls for tokenizer and output_dir with raised
exceptions (e.g., ValueError or a custom ConfigError) that include the parameter
name and a brief message that NemoASRModel does not accept that parameter,
ensuring callers receive an immediate failure instead of a silent warning.
- Around line 233-236: The loop that filters glob results into tar_files
currently drops non-".tar" paths silently; update the code in the
function/method containing the loop that iterates "for path_str in
sorted(matches):" so that when a Path(path_str).expanduser() has a suffix !=
".tar" you append a warning log (e.g., logger.warning or processLogger.warn
depending on the module logger) mentioning the skipped file and the reason; keep
existing behavior of appending str(path.absolute()) to tar_files for ".tar"
files and ensure the logger symbol you use (logger/processLogger) is the
module's configured logger to make skipped-file diagnostics visible.
In `@nemo_skills/inference/server/serve_nemo_asr.py`:
- Around line 122-199: _transcribe_with_chunking is async but performs blocking
I/O and CPU work (sf.read, sf.write, os.path.exists/unlink and calling the
synchronous self._transcribe_single), which blocks the event loop; wrap these
blocking calls in asyncio.to_thread (e.g., await asyncio.to_thread(sf.read,
audio_path), await asyncio.to_thread(sf.write, chunk_path, audio_chunk,
sampling_rate), await asyncio.to_thread(os.path.exists, chunk_path) / await
asyncio.to_thread(os.unlink, chunk_path)) and offload the sync inference call by
awaiting await asyncio.to_thread(self._transcribe_single, [chunk_path],
enable_timestamps, language); also apply the same to the synchronous call to
self._transcribe_single inside the async transcribe method so all blocking file
I/O and inference run in threads instead of the event loop.
- Line 112: The current truthy check `if language:` in serve_nemo_asr.py drops
an explicit empty-string language; change that check to an explicit None check
(`if language is not None:`) so an empty string is treated as a valid value, and
audit any other occurrences in the same function/method that reference the
variable `language` to replace truthiness checks with `is not None` where intent
is to allow empty-string inputs.
- Line 222: The temporary file is always created with suffix=".wav" which can
mislead NeMo ASR format detection; change the NamedTemporaryFile call to
preserve the uploaded file's original extension by extracting it (e.g., ext =
os.path.splitext(original_filename)[1].lower()) and passing suffix=ext (with a
safe default like ".wav" if ext is empty or not in an allowed set). Update the
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") usage to use the
computed suffix and validate/sanitize the extension before creating tmp_file so
tmp_file has the correct format for downstream NeMo functions.
---
Nitpick comments:
In `@nemo_skills/inference/model/nemo_asr.py`:
- Around line 519-538: The current _check_audio_duration function catches all
exceptions around sf.info() which masks parsing errors; change the except block
to only catch soundfile-specific exceptions (use sf.SoundFileError or the
appropriate soundfile exception type) and log/return None for those cases, while
allowing other unexpected exceptions to propagate (re-raise) so callers see real
failures; reference sf.info, sf.SoundFileError (or soundfile's specific error
class), and chunk_audio_threshold_sec when making the change.
In `@nemo_skills/inference/server/serve_nemo_asr.py`:
- Around line 53-56: Update the three exception handlers that re-raise
ImportError to preserve exception chaining and context by using "except
Exception as e: raise ImportError('...') from e" (apply this to the nemo_asr
import block and the two similar import blocks further down), and replace the
LOG.error(...) call at the referenced catch site with LOG.exception(...) so the
stack trace is logged; use a consistent exception variable name (e or exc) in
each handler when chaining.
In `@tests/test_nemo_asr_support.py`:
- Around line 1-253: Add an async pytest that exercises
NemoASRModel.generate_async by mocking httpx.AsyncClient.post to simulate the
server happy path and an error path: create a test using pytest-asyncio that
constructs a NemoASRModel, patches httpx.AsyncClient.post to capture the sent
multipart/form-data request (asserting file payload, form fields such as
language/timestamps, and headers), and returns an httpx.Response-like object
with status_code 200 and a JSON body matching the server hypothesis structure so
generate_async returns the expected transcription; also add a second case where
the mock returns a non-200 status (or raises httpx.HTTPError) and assert
generate_async raises the appropriate error. Ensure you reference
NemoASRModel.generate_async and patch httpx.AsyncClient.post (or the instance on
model._client) so the test covers request-building and response-parsing
end-to-end.
- Around line 77-87: Replace the manual try/except/AssertionError blocks testing
NemoASRServer._extract_first_hypothesis with pytest.raises context managers: use
"with pytest.raises(RuntimeError, match='No transcription returned')" for the
first call with [] and "with pytest.raises(RuntimeError, match='empty
transcription hypotheses')" for the second call with [[]] (or alternatively
capture the context as e and assert on str(e.value)). This will make tests
concise and produce clearer failure output while still verifying the exact
exception messages from _extract_first_hypothesis.
```python
        if tokenizer is not None:
            LOG.warning("NemoASRModel does not use tokenizer. Ignoring tokenizer argument.")
        if output_dir is not None:
            LOG.warning("NemoASRModel does not use output_dir. Ignoring output_dir argument.")
```
`tokenizer` and `output_dir` are silently accepted via `LOG.warning` instead of failing, which violates the coding guidelines.
Per coding guidelines: "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided." Issuing a warning allows callers to silently misconfigure the model without any breakage signal.
🐛 Proposed fix

```diff
-        if tokenizer is not None:
-            LOG.warning("NemoASRModel does not use tokenizer. Ignoring tokenizer argument.")
-        if output_dir is not None:
-            LOG.warning("NemoASRModel does not use output_dir. Ignoring output_dir argument.")
+        if tokenizer is not None:
+            raise ValueError("NemoASRModel does not support a tokenizer argument.")
+        if output_dir is not None:
+            raise ValueError("NemoASRModel does not support an output_dir argument.")
```

As per coding guidelines, "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/model/nemo_asr.py` around lines 106 - 109, The
NemoASRModel currently logs and ignores unsupported parameters (tokenizer and
output_dir) via LOG.warning; change this to raise a clear exception so callers
cannot silently pass unsupported args: inside the NemoASRModel constructor or
initializer where tokenizer and output_dir are checked, replace the LOG.warning
calls for tokenizer and output_dir with raised exceptions (e.g., ValueError or a
custom ConfigError) that include the parameter name and a brief message that
NemoASRModel does not accept that parameter, ensuring callers receive an
immediate failure instead of a silent warning.
```python
        for path_str in sorted(matches):
            path = Path(path_str).expanduser()
            if path.suffix == ".tar":
                tar_files.append(str(path.absolute()))
```
Non-.tar files in glob results are silently dropped with no warning.
If a glob pattern like `"audio_*"` matches both .tar and non-tar files, only .tar files are kept. Non-matching files are silently discarded with no log message. A misconfigured path (e.g., `"*.wav"`) would result in an empty `tar_files` list and a cryptic `FileNotFoundError` at inference time.
🐛 Proposed fix – warn on skipped files

```diff
         for path_str in sorted(matches):
             path = Path(path_str).expanduser()
             if path.suffix == ".tar":
                 tar_files.append(str(path.absolute()))
+            else:
+                LOG.warning("Skipping non-tar file in tarred_audio_filepaths: %s", path_str)
```
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/model/nemo_asr.py` around lines 233 - 236, The loop
that filters glob results into tar_files currently drops non-".tar" paths
silently; update the code in the function/method containing the loop that
iterates "for path_str in sorted(matches):" so that when a
Path(path_str).expanduser() has a suffix != ".tar" you append a warning log
(e.g., logger.warning or processLogger.warn depending on the module logger)
mentioning the skipped file and the reason; keep existing behavior of appending
str(path.absolute()) to tar_files for ".tar" files and ensure the logger symbol
you use (logger/processLogger) is the module's configured logger to make
skipped-file diagnostics visible.
```python
    def __del__(self):
        """Clean up resources."""
        # Close SSH tunnel
        if hasattr(self, "_tunnel") and self._tunnel:
            try:
                self._tunnel.stop()
            except Exception:
                pass  # Ignore errors during cleanup
```
`__del__` closes the SSH tunnel but not the `httpx.AsyncClient`, leaving connection-pool handles open for callers not using the async context manager.

Because `__del__` is synchronous, `await self._client.aclose()` cannot be called, so the HTTP connection pool leaks. The test workaround `asyncio.run(model._client.aclose())` at line 27 of the test file confirms this gap. Callers that don't use `async with NemoASRModel(...)` will leak connections for the lifetime of the process.

Consider documenting that `async with` is required, or adding a synchronous fallback using `asyncio.get_event_loop().run_until_complete`:
```python
    def __del__(self):
        if hasattr(self, "_client") and not self._client.is_closed:
            try:
                loop = asyncio.get_event_loop()
                if loop.is_running():
                    loop.create_task(self._client.aclose())
                else:
                    loop.run_until_complete(self._client.aclose())
            except Exception:
                pass
        if hasattr(self, "_tunnel") and self._tunnel:
            try:
                self._tunnel.stop()
            except Exception:
                pass
```

🧰 Tools
🪛 Ruff (0.15.1)
[error] 555-556: try-except-pass detected, consider logging the exception
(S110)
[warning] 555-555: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/model/nemo_asr.py` around lines 549 - 556, The __del__
method currently only stops the SSH tunnel and leaves httpx.AsyncClient
connection pools open; update __del__ to also close self._client by checking
hasattr(self, "_client") and not self._client.is_closed, then attempt to close
it: if the event loop is running schedule self._client.aclose() via
loop.create_task(...) otherwise call
loop.run_until_complete(self._client.aclose()); keep the existing try/except
swallowing errors and retain the _tunnel.stop() cleanup; alternatively, if you
prefer, document that callers must use "async with NemoASRModel(...)" to avoid
leaks.
```python
            "return_hypotheses": True,
            "timestamps": enable_timestamps,
        }
        if language:
```
`if language:` silently drops an explicit empty-string language.

`if language:` evaluates to `False` for `""`, so an explicitly passed empty string is silently discarded. Prefer an explicit `None` check throughout.
🐛 Proposed fix

```diff
-        if language:
+        if language is not None:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
        if language is not None:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/server/serve_nemo_asr.py` at line 112, The current
truthy check `if language:` in serve_nemo_asr.py drops an explicit empty-string
language; change that check to an explicit None check (`if language is not
None:`) so an empty string is treated as a valid value, and audit any other
occurrences in the same function/method that reference the variable `language`
to replace truthiness checks with `is not None` where intent is to allow
empty-string inputs.
```python
    async def _transcribe_with_chunking(
        self,
        audio_path: str,
        chunk_duration_sec: float,
        enable_timestamps: bool = False,
        language: Optional[str] = None,
    ) -> tuple[str, float]:
        """Transcribe long audio by chunking it into smaller segments.

        Args:
            audio_path: Path to audio file
            chunk_duration_sec: Duration of each chunk in seconds
            enable_timestamps: Whether to enable timestamps

        Returns:
            Tuple of (transcribed_text, total_inference_time)
        """
        try:
            import numpy as np
            import soundfile as sf
        except ImportError:
            raise ImportError("soundfile and numpy are required for audio chunking")

        # Load audio file
        audio_array, sampling_rate = sf.read(audio_path)
        duration = len(audio_array) / sampling_rate

        LOG.info(f"Chunking audio ({duration:.1f}s) into segments of {chunk_duration_sec}s")

        # Calculate chunks
        chunk_samples = int(chunk_duration_sec * sampling_rate)
        num_chunks = int(np.ceil(len(audio_array) / chunk_samples))

        chunks = []
        for i in range(num_chunks):
            start = i * chunk_samples
            end = min((i + 1) * chunk_samples, len(audio_array))
            chunk = audio_array[start:end]

            # Merge tiny trailing chunks
            min_chunk_samples = int(0.5 * sampling_rate)  # 0.5 second minimum
            if len(chunk) < min_chunk_samples and chunks:
                chunks[-1] = np.concatenate([chunks[-1], chunk])
            else:
                chunks.append(chunk)

        LOG.info(f"Created {len(chunks)} audio chunks")

        # Transcribe each chunk
        chunk_texts = []
        total_time = 0.0

        for chunk_idx, audio_chunk in enumerate(chunks):
            # Save chunk to temporary file
            chunk_path = f"{audio_path}.chunk_{chunk_idx}.wav"
            try:
                sf.write(chunk_path, audio_chunk, sampling_rate)

                # Transcribe chunk
                start_time = time.time()
                hypotheses = self._transcribe_single([chunk_path], enable_timestamps, language)
                chunk_time = time.time() - start_time
                total_time += chunk_time

                hypothesis = self._extract_first_hypothesis(hypotheses)
                text = hypothesis.text
                chunk_texts.append(text.strip())
                LOG.debug(f"Chunk {chunk_idx + 1}/{len(chunks)}: {text[:50]}...")

            finally:
                # Clean up chunk file
                if os.path.exists(chunk_path):
                    os.unlink(chunk_path)

        # Concatenate all chunk transcriptions
        full_text = " ".join(chunk_texts)

        return full_text, total_time
```
Blocking synchronous operations inside async handlers will stall the event loop.

`_transcribe_with_chunking` is declared `async` but contains zero `await` calls: `sf.read` (line 146), `sf.write` (line 178), and `self._transcribe_single` (line 182) are all synchronous blocking calls. Likewise, `self._transcribe_single` at line 246 inside the async `transcribe` method blocks the event loop directly. Under uvicorn, any other in-flight request (e.g., a health check) is frozen until inference finishes.

Use `asyncio.to_thread` to offload blocking work:
🔧 Proposed fix – offload blocking inference and file I/O to a thread

```diff
+import asyncio

 # _transcribe_with_chunking
-        audio_array, sampling_rate = sf.read(audio_path)
+        audio_array, sampling_rate = await asyncio.to_thread(sf.read, audio_path)

-                sf.write(chunk_path, audio_chunk, sampling_rate)
-
-                start_time = time.time()
-                hypotheses = self._transcribe_single([chunk_path], enable_timestamps, language)
+                await asyncio.to_thread(sf.write, chunk_path, audio_chunk, sampling_rate)
+
+                start_time = time.time()
+                hypotheses = await asyncio.to_thread(
+                    self._transcribe_single, [chunk_path], enable_timestamps, language
+                )

 # transcribe method
-        hypotheses = self._transcribe_single([tmp_path], enable_timestamps, language)
+        hypotheses = await asyncio.to_thread(
+            self._transcribe_single, [tmp_path], enable_timestamps, language
+        )
```

Also applies to: 243-247
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 143-143: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 143-143: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/server/serve_nemo_asr.py` around lines 122 - 199,
_transcribe_with_chunking is async but performs blocking I/O and CPU work
(sf.read, sf.write, os.path.exists/unlink and calling the synchronous
self._transcribe_single), which blocks the event loop; wrap these blocking calls
in asyncio.to_thread (e.g., await asyncio.to_thread(sf.read, audio_path), await
asyncio.to_thread(sf.write, chunk_path, audio_chunk, sampling_rate), await
asyncio.to_thread(os.path.exists, chunk_path) / await
asyncio.to_thread(os.unlink, chunk_path)) and offload the sync inference call by
awaiting await asyncio.to_thread(self._transcribe_single, [chunk_path],
enable_timestamps, language); also apply the same to the synchronous call to
self._transcribe_single inside the async transcribe method so all blocking file
I/O and inference run in threads instead of the event loop.
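The offloading pattern above can be demonstrated in isolation. `blocking_inference` below is a hypothetical stand-in for the synchronous NeMo transcribe call, not the server's actual code; the point is that `asyncio.to_thread` lets a concurrent coroutine (here, a mock health check) keep making progress while the blocking work runs on a worker thread:

```python
import asyncio
import time


def blocking_inference(duration: float) -> str:
    # Hypothetical stand-in for a synchronous, CPU/IO-bound transcribe call.
    time.sleep(duration)
    return "transcript"


async def main() -> tuple:
    ticks = []

    async def health_check():
        # Only makes progress if the event loop is not blocked.
        for _ in range(4):
            ticks.append("ok")
            await asyncio.sleep(0.05)

    # asyncio.to_thread moves the blocking call onto a worker thread,
    # so health_check runs concurrently with "inference".
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_inference, 0.2),
        health_check(),
    )
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))  # → ('transcript', 4)
```

Without `to_thread`, awaiting `blocking_inference` directly would freeze the loop for the full 0.2 s and the health check would not tick until it finished.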
```python
        Transcription result in OpenAI-compatible format
    """
    # Save uploaded file to temporary location
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
```
Temporary file always gets a .wav suffix regardless of the uploaded file format.
NeMo ASR backends (Canary, Parakeet) may use the file extension for format detection. An uploaded FLAC or MP3 would be saved as .wav, potentially causing a decoding error or silently incorrect decoding.
🐛 Proposed fix – preserve the original extension

```diff
- with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
+ original_suffix = os.path.splitext(audio_file.filename or "audio.wav")[1] or ".wav"
+ with tempfile.NamedTemporaryFile(delete=False, suffix=original_suffix) as tmp_file:
```

🤖 Prompt for AI Agents
+ with tempfile.NamedTemporaryFile(delete=False, suffix=original_suffix) as tmp_file:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/server/serve_nemo_asr.py` at line 222, The temporary
file is always created with suffix=".wav" which can mislead NeMo ASR format
detection; change the NamedTemporaryFile call to preserve the uploaded file's
original extension by extracting it (e.g., ext =
os.path.splitext(original_filename)[1].lower()) and passing suffix=ext (with a
safe default like ".wav" if ext is empty or not in an allowed set). Update the
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") usage to use the
computed suffix and validate/sanitize the extension before creating tmp_file so
tmp_file has the correct format for downstream NeMo functions.
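The suffix-selection logic can be sketched on its own. `pick_suffix` is an illustrative helper name, not part of the server code; it mirrors the fallback behavior the fix proposes:

```python
import os


def pick_suffix(filename, default=".wav"):
    # Preserve the uploaded file's extension so downstream format
    # detection sees the right container type; fall back to .wav
    # when no usable extension is present.
    ext = os.path.splitext(filename or "")[1].lower()
    return ext if ext else default


print(pick_suffix("speech.flac"))  # → .flac
print(pick_suffix("SPEECH.MP3"))   # → .mp3
print(pick_suffix("audio"))        # → .wav
print(pick_suffix(None))           # → .wav
```

A further hardening step (not shown) would be to validate the extension against an allow-list before passing it to `tempfile.NamedTemporaryFile`.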
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/inference/model/__init__.py`:
- Around line 23-24: The import order is wrong causing ruff to auto-fix; move
the `from .nemo_asr import NemoASRModel` line so imports are alphabetically
sorted among the NeMo-related imports (place `.nemo_asr` after `.megatron` and
before `.openai`, keeping it adjacent to other speech model imports like
`.asr_nim`/`.audio_utils` as appropriate) and commit the updated file so the
ruff-check pipeline passes.
```python
# NeMo models (speech)
from .nemo_asr import NemoASRModel
```
Fix import order to resolve the ruff-check pipeline failure.
The CI pipeline reports that ruff auto-fixed the file due to an import ordering violation. The .nemo_asr import is placed between .asr_nim and .audio_utils, but alphabetically it belongs after .megatron and before .openai. Commit the ruff-fixed version before merging.
🔧 Proposed fix – move the import to its sorted position

```diff
-# NeMo models (speech)
-from .nemo_asr import NemoASRModel
-
 # Audio utilities
 from .audio_utils import (
     audio_file_to_base64,
     chunk_audio,
     load_audio_file,
     make_audio_content_block,
     save_audio_chunk_to_base64,
 )
 from .azure import AzureOpenAIModel
 ...
 from .megatron import MegatronModel
+from .nemo_asr import NemoASRModel
 from .openai import OpenAIModel
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/model/__init__.py` around lines 23 - 24, The import
order is wrong causing ruff to auto-fix; move the `from .nemo_asr import
NemoASRModel` line so imports are alphabetically sorted among the NeMo-related
imports (place `.nemo_asr` after `.megatron` and before `.openai`, keeping it
adjacent to other speech model imports like `.asr_nim`/`.audio_utils` as
appropriate) and commit the updated file so the ruff-check pipeline passes.
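The expected ordering can be checked mechanically. This is a simplified lexicographic sketch over the three modules named in the diff, not ruff's full isort algorithm (which also handles import groups and case folding):

```python
lines = [
    "from .megatron import MegatronModel",
    "from .nemo_asr import NemoASRModel",
    "from .openai import OpenAIModel",
]

# isort-style ordering: compare the dotted module names.
modules = [line.split()[1] for line in lines]
print(modules)                     # → ['.megatron', '.nemo_asr', '.openai']
print(modules == sorted(modules))  # → True
```

With `.nemo_asr` placed between `.asr_nim` and `.audio_utils`, the same check would print False, which is why ruff flags the file.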
Summary by CodeRabbit
New Features
Tests