Nemo-ASR models server [do not merge] #1211

Open: Jorjeous wants to merge 3 commits into main from nemo-asr-server

Conversation

@Jorjeous (Member) commented Feb 4, 2026

Summary by CodeRabbit

  • New Features

    • NeMo ASR speech-to-text added with a client and an OpenAI-compatible server API.
    • Supports chunked transcription, language selection, multiple response formats, optional timestamps, and GPU-enabled server runtime.
    • Robust handling of tarred audio datastores and remote URIs with automatic materialization and flexible prompt/audio reference parsing.
  • Tests

    • Comprehensive unit tests covering audio resolution, tar/shard handling, prompt parsing, and transcription flows.

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@Jorjeous Jorjeous changed the title from "ASR models support" to "ASR models support [not merge yet]" on Feb 4, 2026
@Jorjeous (Member, Author) commented Feb 4, 2026

Tests are WIP.

@Jorjeous Jorjeous changed the title from "ASR models support [not merge yet]" to "ASR models support [do not merge]" on Feb 4, 2026
@greptile-apps (bot, Contributor) left a comment

2 files reviewed, 3 comments

"""

import argparse
import asyncio
asyncio imported but never used

@Jorjeous Jorjeous changed the title from "ASR models support [do not merge]" to "Nemo-ASR models server [do not merge]" on Feb 4, 2026
@coderabbitai (bot, Contributor) commented Feb 4, 2026

📝 Walkthrough

Adds a NeMo ASR integration: a new NemoASRModel client, a FastAPI-based NemoASRServer exposing /v1/audio/transcriptions, orchestration/registry updates to support nemo_asr, and unit tests for tar/datastore and prompt handling.

Changes

  • Model Registry Integration (nemo_skills/inference/model/__init__.py):
    Imports and registers NemoASRModel under the "nemo_asr" entry in the public models dictionary.
  • ASR Client Implementation (nemo_skills/inference/model/nemo_asr.py):
    Adds NemoASRModel: an async HTTP client for /v1/audio/transcriptions that handles base_url/SSH tunneling, audio resolution (local, ais://, s3://, tarred shards), tar/materialization helpers, optional duration-based chunking, multipart uploads, response parsing, async lifecycle (context manager/destructor), and error handling.
  • ASR Server Implementation (nemo_skills/inference/server/serve_nemo_asr.py):
    Adds NemoASRServer and a FastAPI app exposing /v1/audio/transcriptions. Loads NeMo models (local .nemo or pretrained), supports GPU placement, chunked transcription, optional timestamps, OpenAI-compatible responses, and a CLI main() entry.
  • Server Orchestration (nemo_skills/pipeline/utils/server.py):
    Adds nemo_asr enum member(s) and get_server_command handling for nemo_asr, including default entrypoint and start-command construction (model_path, num_gpus, num_nodes, port, server_args).
  • Tests (tests/test_nemo_asr_support.py):
    Adds tests for tar/member resolution (local and datastore mocks), prompt parsing (strings, lists, OpenAI-style messages, manifests), shard-aware logic, server extraction behavior, and SupportedServers inclusion.

Sequence Diagram

sequenceDiagram
    actor User
    participant Client as "NemoASRModel\n(Async Client)"
    participant HTTPClient as "httpx.AsyncClient\n(Connection Pool)"
    participant Server as "NemoASRServer\n(FastAPI)"
    participant NeMo as "NeMo ASR\n(Model)"

    User->>Client: generate_async(prompt, ...)
    Client->>Client: resolve audio path / tar member
    Client->>Client: optional duration check / chunk decision
    Client->>HTTPClient: POST /v1/audio/transcriptions\n(multipart/form-data)
    HTTPClient->>Server: HTTP Request
    Server->>Server: validate params & options
    alt chunking requested
        Server->>NeMo: transcribe(chunk_1)
        NeMo->>Server: text_chunk_1
        Server->>NeMo: transcribe(chunk_n)
        NeMo->>Server: text_chunk_n
        Server->>Server: combine results & timestamps
    else single transcription
        Server->>NeMo: transcribe(audio)
        NeMo->>Server: text + metadata
    end
    Server->>HTTPClient: HTTP Response (OpenAI format)
    HTTPClient->>Client: Response JSON
    Client->>User: parsed structured result
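For context on the flow shown above, a minimal client-side sketch follows. It is illustrative only and not part of this PR: the host/port, form-field names, and the "text" field in the response are assumptions based on the OpenAI-compatible API described in the walkthrough.

import asyncio
import httpx

# Minimal sketch of calling the OpenAI-compatible endpoint directly.
# Host, port, and form fields are assumptions; see NemoASRModel for the real client.
async def transcribe(path: str) -> str:
    async with httpx.AsyncClient(timeout=300.0) as client:
        with open(path, "rb") as f:
            resp = await client.post(
                "http://localhost:5000/v1/audio/transcriptions",
                files={"file": (path, f, "audio/wav")},
                data={"response_format": "json"},
            )
    resp.raise_for_status()
    return resp.json()["text"]  # "text" per the OpenAI transcription schema

if __name__ == "__main__":
    print(asyncio.run(transcribe("sample.wav")))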

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • vmendelev
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 63.64%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main addition: a NeMo-ASR server implementation with client integration.


@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@nemo_skills/inference/server/serve_nemo_asr.py`:
- Around line 222-227: The direct access hypotheses[0][0] in serve_nemo_asr.py
can raise IndexError if hypotheses[0] is empty; update the extraction logic in
the same block that currently raises RuntimeError("No transcription returned
from model") to also validate that hypotheses[0] has at least one element before
indexing, and if not raise a clear RuntimeError (or include the error context) —
locate the lines around hypothesis = hypotheses[0][0] and text = hypothesis.text
and add a guard that checks len(hypotheses[0]) > 0 (or similar) and handles the
empty-inner-list case consistently with the chunked path.
- Around line 157-159: The code accesses hypotheses[0][0].text without ensuring
hypotheses[0] is non-empty which can raise IndexError; update the block in
serve_nemo_asr.py (around where chunk_texts is appended) to first check both
len(hypotheses) > 0 and len(hypotheses[0]) > 0 (or safely find the first
non-empty hypothesis) before reading .text, e.g., guard the access to
hypotheses[0][0].text and only append text.strip() when that nested element
exists.
- Around line 206-258: The verbose_json path references hypothesis.timestep but
hypothesis is only set in the non-chunking branch, so when chunking is used (see
_transcribe_with_chunking) you get a NameError; fix by either making
_transcribe_with_chunking return timestamps/word-level data alongside (text,
inference_time) and use those to populate words (with offsets) when
enable_timestamps is true, or skip timestamp extraction for the chunked path
(i.e., only populate words when hypothesis exists or when the chunked result
includes a timestamps structure); update the code paths handling
chunk_duration_sec, _transcribe_with_chunking return signature, and the
verbose_json block (references: _transcribe_with_chunking, hypothesis,
enable_timestamps, response_format == "verbose_json") accordingly.
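To make the first two fixes above concrete, a minimal guard of the kind the tests in this PR expect could look like the sketch below. The error messages are taken from tests/test_nemo_asr_support.py; treat this as an illustrative sketch, not the exact implementation in serve_nemo_asr.py.

# Illustrative sketch of the empty-hypotheses guard suggested above.
# `hypotheses` is assumed to be a list of lists of hypothesis objects.
def _extract_first_hypothesis(hypotheses):
    if not hypotheses:
        raise RuntimeError("No transcription returned from model")
    if not hypotheses[0]:
        raise RuntimeError("Model returned empty transcription hypotheses")
    return hypotheses[0][0]
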
🧹 Nitpick comments (5)
nemo_skills/inference/server/serve_nemo_asr.py (2)

83-83: Remove extraneous f prefix from string literal.

This f-string has no placeholders.

Suggested fix
-                LOG.info(f"Model moved to GPU")
+                LOG.info("Model moved to GPU")

294-295: Consider logging when model parameter differs from server model.

The model parameter is documented as ignored for OpenAI API compatibility, which is fine. However, if a user passes a different model name expecting different behavior, they might be confused. Consider logging a debug message when the passed model differs from the actual loaded model.
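A minimal sketch of such a debug log (illustrative only; the attribute name self.model_name is a placeholder, not necessarily the real field):

# Illustrative sketch only; `self.model_name` is a placeholder attribute.
if model and model != self.model_name:
    LOG.debug(
        f"Request specified model={model!r}, but this server always uses the loaded "
        f"model {self.model_name!r}; the parameter is ignored for OpenAI API compatibility."
    )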

nemo_skills/inference/model/nemo_asr.py (3)

92-93: Unused **kwargs silently ignores unexpected parameters.

Per coding guidelines, the code should fail if a user specifies an unsupported argument. Currently, any extra keyword arguments passed to __init__ are silently ignored.

Suggested fix: Warn or raise for unexpected kwargs
         max_workers: int = 64,
-        **kwargs,
     ) -> None:
         """Initialize NemoASRModel client."""

Or if compatibility requires accepting kwargs, at least log a warning:

         **kwargs,
     ) -> None:
         """Initialize NemoASRModel client."""
+        if kwargs:
+            LOG.warning(f"NemoASRModel ignoring unexpected parameters: {list(kwargs.keys())}")

201-201: Use direct key access for expected "text" field.

Per coding guidelines, don't use .get() for accessing dictionary keys that are expected to be present. The transcription response should always contain "text". Using .get() with a default masks malformed server responses.

Suggested fix
-            pred_text = result_data.get("text", "")
+            pred_text = result_data["text"]

As per coding guidelines: "Don't use .get() for accessing dictionary keys if the code expects them to be present. Use direct key access like data[key_name] to allow clear errors when keys are missing."


268-287: asyncio.get_event_loop() is deprecated; __del__ cleanup is fragile.

asyncio.get_event_loop() raises DeprecationWarning in Python 3.10+ when called without a running loop. The create_task without storing a reference (line 276) may result in the task being garbage-collected before completion.

Consider relying on explicit cleanup via the async context manager (__aexit__) instead of __del__, or use a more robust pattern.

Suggested simplification
     def __del__(self):
         """Clean up resources."""
-        # Close HTTP client
-        if hasattr(self, "_client"):
-            try:
-                # Try to close gracefully if event loop is available
-                loop = asyncio.get_event_loop()
-                if loop.is_running():
-                    asyncio.create_task(self._client.aclose())
-                else:
-                    loop.run_until_complete(self._client.aclose())
-            except Exception:
-                pass  # Ignore errors during cleanup
-
         # Close SSH tunnel
         if hasattr(self, "_tunnel") and self._tunnel:
             try:
                 self._tunnel.stop()
             except Exception:
                 pass  # Ignore errors during cleanup

For HTTP client cleanup, document that users should use the async context manager or call await model._client.aclose() explicitly.
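As a usage note, scoping the client with the async context manager avoids the __del__ problem entirely; a minimal sketch (constructor arguments are placeholders, not the actual signature):

# Illustrative usage sketch; constructor arguments are placeholders.
import asyncio
from nemo_skills.inference.model.nemo_asr import NemoASRModel

async def main():
    # __aexit__ closes the underlying httpx.AsyncClient deterministically.
    async with NemoASRModel(host="127.0.0.1", port=5000) as model:
        result = await model.generate_async("path/to/audio.wav")
        print(result)

asyncio.run(main())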

@nithinraok (Member) commented:

@Jorjeous can we also add support for reading audio samples from the NeMo tarred dataset format for inference?

@Jorjeous (Member, Author) replied:

Yep, if you feel the need for it.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 6

🧹 Nitpick comments (4)
nemo_skills/inference/server/serve_nemo_asr.py (1)

53-56: Missing raise ... from in three except re-raise sites.

All three spots raise a new exception inside an except clause without chaining the cause, losing the original traceback context (Ruff B904). Line 70 also warrants LOG.exception instead of LOG.error to capture the stack trace (Ruff TRY400).

♻️ Proposed fixes
 # Line 54-56
         except ImportError:
-            raise ImportError("NeMo toolkit is not installed. ...")
+            raise ImportError("NeMo toolkit is not installed. ...") from None

 # Line 68-70
         except Exception as e:
-            LOG.error(f"Failed to load model from NGC: {e}")
+            LOG.exception(f"Failed to load model from NGC: {e}")

 # Line 141-143
         except ImportError:
-            raise ImportError("soundfile and numpy are required for audio chunking")
+            raise ImportError("soundfile and numpy are required for audio chunking") from None

 # Line 349-351
         except Exception as e:
             LOG.error(f"Transcription failed: {e}", exc_info=True)
-            raise HTTPException(status_code=500, detail=str(e))
+            raise HTTPException(status_code=500, detail=str(e)) from e

Also applies to: 140-143, 349-351

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_nemo_asr.py` around lines 53 - 56, Update
the three exception handlers that re-raise ImportError to preserve exception
chaining and context by using "except Exception as e: raise ImportError('...')
from e" (apply this to the nemo_asr import block and the two similar import
blocks further down), and replace the LOG.error(...) call at the referenced
catch site with LOG.exception(...) so the stack trace is logged; use a
consistent exception variable name (e or exc) in each handler when chaining.
nemo_skills/inference/model/nemo_asr.py (1)

519-538: except Exception in _check_audio_duration silently masks soundfile parsing errors.

A corrupted or unsupported audio file would cause sf.info() to raise, which is caught here and silently returns None (no chunking). The caller then proceeds to send the file to the server, which may fail with a less informative error. Per coding guidelines, exceptions not normally expected to be raised should not be caught silently.

♻️ Proposed fix – narrow the catch
-        except Exception as e:
-            LOG.warning(f"Failed to check audio duration: {e}")
+        except (OSError, RuntimeError) as e:
+            LOG.warning(f"Failed to check audio duration: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/nemo_asr.py` around lines 519 - 538, The current
_check_audio_duration function catches all exceptions around sf.info() which
masks parsing errors; change the except block to only catch soundfile-specific
exceptions (use sf.SoundFileError or the appropriate soundfile exception type)
and log/return None for those cases, while allowing other unexpected exceptions
to propagate (re-raise) so callers see real failures; reference sf.info,
sf.SoundFileError (or soundfile's specific error class), and
chunk_audio_threshold_sec when making the change.
tests/test_nemo_asr_support.py (2)

1-253: No test covers the generate_async end-to-end flow.

The core integration path — generate_async sending multipart form data to the server and parsing the response — has no test coverage. This is the primary public interface of NemoASRModel. Even a basic mock of httpx.AsyncClient.post would provide a safety net for the request-building and response-parsing logic (lines 472–498 in nemo_asr.py).

Would you like me to draft a pytest-asyncio-based test that mocks httpx.AsyncClient.post to cover the generate_async happy path and the HTTP error path?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_nemo_asr_support.py` around lines 1 - 253, Add an async pytest
that exercises NemoASRModel.generate_async by mocking httpx.AsyncClient.post to
simulate the server happy path and an error path: create a test using
pytest-asyncio that constructs a NemoASRModel, patches httpx.AsyncClient.post to
capture the sent multipart/form-data request (asserting file payload, form
fields such as language/timestamps, and headers), and returns an
httpx.Response-like object with status_code 200 and a JSON body matching the
server hypothesis structure so generate_async returns the expected
transcription; also add a second case where the mock returns a non-200 status
(or raises httpx.HTTPError) and assert generate_async raises the appropriate
error. Ensure you reference NemoASRModel.generate_async and patch
httpx.AsyncClient.post (or the instance on model._client) so the test covers
request-building and response-parsing end-to-end.
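A possible shape for that test (illustrative only; constructor arguments and the exact response schema are assumptions drawn from the descriptions above, and pytest-asyncio is assumed to be available):

# Illustrative pytest-asyncio sketch of the suggested generate_async coverage.
# Constructor arguments and the response schema are assumptions; adapt to the
# actual NemoASRModel signature before use.
from unittest.mock import AsyncMock, patch

import httpx
import pytest

from nemo_skills.inference.model.nemo_asr import NemoASRModel


@pytest.mark.asyncio
async def test_generate_async_happy_path(tmp_path):
    audio = tmp_path / "sample.wav"
    audio.write_bytes(b"RIFF....WAVE")  # placeholder bytes; a real test would write valid audio

    model = NemoASRModel(host="127.0.0.1", port=5000)
    fake_response = httpx.Response(
        200, json={"text": "hello world"}, request=httpx.Request("POST", "http://test")
    )

    with patch.object(model._client, "post", new=AsyncMock(return_value=fake_response)) as mock_post:
        result = await model.generate_async(str(audio))

    mock_post.assert_awaited_once()
    assert "hello world" in str(result)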

77-87: Use pytest.raises instead of the manual try/except/AssertionError pattern.

The current approach produces confusing failure output and is more verbose. pytest.raises is the idiomatic replacement.

♻️ Proposed refactor
+import pytest

-    try:
-        NemoASRServer._extract_first_hypothesis([])
-        raise AssertionError("Expected RuntimeError for empty hypotheses")
-    except RuntimeError as e:
-        assert "No transcription returned" in str(e)
-
-    try:
-        NemoASRServer._extract_first_hypothesis([[]])
-        raise AssertionError("Expected RuntimeError for empty inner hypotheses")
-    except RuntimeError as e:
-        assert "empty transcription hypotheses" in str(e)
+    with pytest.raises(RuntimeError, match="No transcription returned"):
+        NemoASRServer._extract_first_hypothesis([])
+
+    with pytest.raises(RuntimeError, match="empty transcription hypotheses"):
+        NemoASRServer._extract_first_hypothesis([[]])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_nemo_asr_support.py` around lines 77 - 87, Replace the manual
try/except/AssertionError blocks testing NemoASRServer._extract_first_hypothesis
with pytest.raises context managers: use "with pytest.raises(RuntimeError,
match='No transcription returned')" for the first call with [] and "with
pytest.raises(RuntimeError, match='empty transcription hypotheses')" for the
second call with [[]] (or alternatively capture the context as e and assert on
str(e.value)). This will make tests concise and produce clearer failure output
while still verifying the exact exception messages from
_extract_first_hypothesis.

Comment on lines +106 to +109

        if tokenizer is not None:
            LOG.warning("NemoASRModel does not use tokenizer. Ignoring tokenizer argument.")
        if output_dir is not None:
            LOG.warning("NemoASRModel does not use output_dir. Ignoring output_dir argument.")

⚠️ Potential issue | 🟡 Minor

tokenizer and output_dir silently accepted via LOG.warning instead of failing — coding guideline violation.

Per coding guidelines: "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided." Issuing a warning allows callers to silently misconfigure the model without any breakage signal.

🐛 Proposed fix
-        if tokenizer is not None:
-            LOG.warning("NemoASRModel does not use tokenizer. Ignoring tokenizer argument.")
-        if output_dir is not None:
-            LOG.warning("NemoASRModel does not use output_dir. Ignoring output_dir argument.")
+        if tokenizer is not None:
+            raise ValueError("NemoASRModel does not support a tokenizer argument.")
+        if output_dir is not None:
+            raise ValueError("NemoASRModel does not support an output_dir argument.")

As per coding guidelines, "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/nemo_asr.py` around lines 106 - 109, The
NemoASRModel currently logs and ignores unsupported parameters (tokenizer and
output_dir) via LOG.warning; change this to raise a clear exception so callers
cannot silently pass unsupported args: inside the NemoASRModel constructor or
initializer where tokenizer and output_dir are checked, replace the LOG.warning
calls for tokenizer and output_dir with raised exceptions (e.g., ValueError or a
custom ConfigError) that include the parameter name and a brief message that
NemoASRModel does not accept that parameter, ensuring callers receive an
immediate failure instead of a silent warning.

Comment on lines +233 to +236

            for path_str in sorted(matches):
                path = Path(path_str).expanduser()
                if path.suffix == ".tar":
                    tar_files.append(str(path.absolute()))

⚠️ Potential issue | 🟡 Minor

Non-.tar files in glob results are silently dropped with no warning.

If a glob pattern like "audio_*" matches both .tar and non-tar files, only .tar files are kept. Non-matching files are silently discarded, with no log message. A misconfigured path (e.g., "*.wav") would result in an empty tar_files list and cryptic FileNotFoundError at inference time.

🐛 Proposed fix – warn on skipped files
             for path_str in sorted(matches):
                 path = Path(path_str).expanduser()
                 if path.suffix == ".tar":
                     tar_files.append(str(path.absolute()))
+                else:
+                    LOG.warning("Skipping non-tar file in tarred_audio_filepaths: %s", path_str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/nemo_asr.py` around lines 233 - 236, The loop
that filters glob results into tar_files currently drops non-".tar" paths
silently; update the code in the function/method containing the loop that
iterates "for path_str in sorted(matches):" so that when a
Path(path_str).expanduser() has a suffix != ".tar" you append a warning log
(e.g., logger.warning or processLogger.warn depending on the module logger)
mentioning the skipped file and the reason; keep existing behavior of appending
str(path.absolute()) to tar_files for ".tar" files and ensure the logger symbol
you use (logger/processLogger) is the module's configured logger to make
skipped-file diagnostics visible.

Comment on lines +549 to +556

    def __del__(self):
        """Clean up resources."""
        # Close SSH tunnel
        if hasattr(self, "_tunnel") and self._tunnel:
            try:
                self._tunnel.stop()
            except Exception:
                pass  # Ignore errors during cleanup

⚠️ Potential issue | 🟡 Minor

__del__ closes the SSH tunnel but not the httpx.AsyncClient, leaving connection-pool handles open for callers not using the async context manager.

Because __del__ is synchronous, await self._client.aclose() cannot be called, so the HTTP connection pool leaks. The test workaround asyncio.run(model._client.aclose()) at line 27 of the test file confirms this gap. Callers that don't use async with NemoASRModel(...) will leak connections for the lifetime of the process.

Consider documenting that async with is required, or adding a synchronous fallback using asyncio.get_event_loop().run_until_complete:

def __del__(self):
    if hasattr(self, "_client") and not self._client.is_closed:
        try:
            loop = asyncio.get_event_loop()
            if loop.is_running():
                loop.create_task(self._client.aclose())
            else:
                loop.run_until_complete(self._client.aclose())
        except Exception:
            pass
    if hasattr(self, "_tunnel") and self._tunnel:
        try:
            self._tunnel.stop()
        except Exception:
            pass
🧰 Tools
🪛 Ruff (0.15.1)

[error] 555-556: try-except-pass detected, consider logging the exception

(S110)


[warning] 555-555: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/nemo_asr.py` around lines 549 - 556, The __del__
method currently only stops the SSH tunnel and leaves httpx.AsyncClient
connection pools open; update __del__ to also close self._client by checking
hasattr(self, "_client") and not self._client.is_closed, then attempt to close
it: if the event loop is running schedule self._client.aclose() via
loop.create_task(...) otherwise call
loop.run_until_complete(self._client.aclose()); keep the existing try/except
swallowing errors and retain the _tunnel.stop() cleanup; alternatively, if you
prefer, document that callers must use "async with NemoASRModel(...)" to avoid
leaks.

"return_hypotheses": True,
"timestamps": enable_timestamps,
}
if language:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

if language: silently drops an explicit empty-string language.

if language: evaluates to False for "", so an explicitly passed empty string is silently discarded. Prefer an explicit None check throughout.

🐛 Proposed fix
-        if language:
+        if language is not None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_nemo_asr.py` at line 112, The current
truthy check `if language:` in serve_nemo_asr.py drops an explicit empty-string
language; change that check to an explicit None check (`if language is not
None:`) so an empty string is treated as a valid value, and audit any other
occurrences in the same function/method that reference the variable `language`
to replace truthiness checks with `is not None` where intent is to allow
empty-string inputs.

Comment on lines +122 to +199

    async def _transcribe_with_chunking(
        self,
        audio_path: str,
        chunk_duration_sec: float,
        enable_timestamps: bool = False,
        language: Optional[str] = None,
    ) -> tuple[str, float]:
        """Transcribe long audio by chunking it into smaller segments.

        Args:
            audio_path: Path to audio file
            chunk_duration_sec: Duration of each chunk in seconds
            enable_timestamps: Whether to enable timestamps

        Returns:
            Tuple of (transcribed_text, total_inference_time)
        """
        try:
            import numpy as np
            import soundfile as sf
        except ImportError:
            raise ImportError("soundfile and numpy are required for audio chunking")

        # Load audio file
        audio_array, sampling_rate = sf.read(audio_path)
        duration = len(audio_array) / sampling_rate

        LOG.info(f"Chunking audio ({duration:.1f}s) into segments of {chunk_duration_sec}s")

        # Calculate chunks
        chunk_samples = int(chunk_duration_sec * sampling_rate)
        num_chunks = int(np.ceil(len(audio_array) / chunk_samples))

        chunks = []
        for i in range(num_chunks):
            start = i * chunk_samples
            end = min((i + 1) * chunk_samples, len(audio_array))
            chunk = audio_array[start:end]

            # Merge tiny trailing chunks
            min_chunk_samples = int(0.5 * sampling_rate)  # 0.5 second minimum
            if len(chunk) < min_chunk_samples and chunks:
                chunks[-1] = np.concatenate([chunks[-1], chunk])
            else:
                chunks.append(chunk)

        LOG.info(f"Created {len(chunks)} audio chunks")

        # Transcribe each chunk
        chunk_texts = []
        total_time = 0.0

        for chunk_idx, audio_chunk in enumerate(chunks):
            # Save chunk to temporary file
            chunk_path = f"{audio_path}.chunk_{chunk_idx}.wav"
            try:
                sf.write(chunk_path, audio_chunk, sampling_rate)

                # Transcribe chunk
                start_time = time.time()
                hypotheses = self._transcribe_single([chunk_path], enable_timestamps, language)
                chunk_time = time.time() - start_time
                total_time += chunk_time

                hypothesis = self._extract_first_hypothesis(hypotheses)
                text = hypothesis.text
                chunk_texts.append(text.strip())
                LOG.debug(f"Chunk {chunk_idx + 1}/{len(chunks)}: {text[:50]}...")

            finally:
                # Clean up chunk file
                if os.path.exists(chunk_path):
                    os.unlink(chunk_path)

        # Concatenate all chunk transcriptions
        full_text = " ".join(chunk_texts)

        return full_text, total_time

⚠️ Potential issue | 🟠 Major

Blocking synchronous operations inside async handlers will stall the event loop.

_transcribe_with_chunking is declared async but contains zero await calls — sf.read (line 146), sf.write (line 178), and self._transcribe_single (line 182) are all synchronous blocking calls. Identically, self._transcribe_single at line 246 inside the async transcribe method blocks the event loop directly. Under uvicorn, any other in-flight request (e.g., health-check) is frozen until inference finishes.

Use asyncio.to_thread to offload blocking work:

🔧 Proposed fix – offload blocking inference and file I/O to a thread
+import asyncio

 # _transcribe_with_chunking
-        audio_array, sampling_rate = sf.read(audio_path)
+        audio_array, sampling_rate = await asyncio.to_thread(sf.read, audio_path)

-                sf.write(chunk_path, audio_chunk, sampling_rate)
-
-                start_time = time.time()
-                hypotheses = self._transcribe_single([chunk_path], enable_timestamps, language)
+                await asyncio.to_thread(sf.write, chunk_path, audio_chunk, sampling_rate)
+
+                start_time = time.time()
+                hypotheses = await asyncio.to_thread(
+                    self._transcribe_single, [chunk_path], enable_timestamps, language
+                )

 # transcribe method
-                hypotheses = self._transcribe_single([tmp_path], enable_timestamps, language)
+                hypotheses = await asyncio.to_thread(
+                    self._transcribe_single, [tmp_path], enable_timestamps, language
+                )

Also applies to: 243-247

🧰 Tools
🪛 Ruff (0.15.1)

[warning] 143-143: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 143-143: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_nemo_asr.py` around lines 122 - 199,
_transcribe_with_chunking is async but performs blocking I/O and CPU work
(sf.read, sf.write, os.path.exists/unlink and calling the synchronous
self._transcribe_single), which blocks the event loop; wrap these blocking calls
in asyncio.to_thread (e.g., await asyncio.to_thread(sf.read, audio_path), await
asyncio.to_thread(sf.write, chunk_path, audio_chunk, sampling_rate), await
asyncio.to_thread(os.path.exists, chunk_path) / await
asyncio.to_thread(os.unlink, chunk_path)) and offload the sync inference call by
awaiting await asyncio.to_thread(self._transcribe_single, [chunk_path],
enable_timestamps, language); also apply the same to the synchronous call to
self._transcribe_single inside the async transcribe method so all blocking file
I/O and inference run in threads instead of the event loop.

Comment on line 222

            Transcription result in OpenAI-compatible format
        """
        # Save uploaded file to temporary location
        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:

⚠️ Potential issue | 🟡 Minor

Temporary file always gets a .wav suffix regardless of the uploaded file format.

NeMo ASR backends (Canary, Parakeet) may use the file extension for format detection. An uploaded FLAC or MP3 would be saved as .wav, potentially causing a decoding error or silently incorrect decoding.

🐛 Proposed fix – preserve the original extension
-        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
+        original_suffix = os.path.splitext(audio_file.filename or "audio.wav")[1] or ".wav"
+        with tempfile.NamedTemporaryFile(delete=False, suffix=original_suffix) as tmp_file:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_nemo_asr.py` at line 222, The temporary
file is always created with suffix=".wav" which can mislead NeMo ASR format
detection; change the NamedTemporaryFile call to preserve the uploaded file's
original extension by extracting it (e.g., ext =
os.path.splitext(original_filename)[1].lower()) and passing suffix=ext (with a
safe default like ".wav" if ext is empty or not in an allowed set). Update the
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") usage to use the
computed suffix and validate/sanitize the extension before creating tmp_file so
tmp_file has the correct format for downstream NeMo functions.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1


Comment on lines +23 to +24
# NeMo models (speech)
from .nemo_asr import NemoASRModel

⚠️ Potential issue | 🟠 Major

Fix import order to resolve the ruff-check pipeline failure.

The CI pipeline reports that ruff auto-fixed the file due to an import ordering violation. The .nemo_asr import is placed between .asr_nim and .audio_utils, but alphabetically it belongs after .megatron and before .openai. Commit the ruff-fixed version before merging.

🔧 Proposed fix – move the import to its sorted position
-# NeMo models (speech)
-from .nemo_asr import NemoASRModel
-
 # Audio utilities
 from .audio_utils import (
     audio_file_to_base64,
     chunk_audio,
     load_audio_file,
     make_audio_content_block,
     save_audio_chunk_to_base64,
 )
 from .azure import AzureOpenAIModel
 ...
 from .megatron import MegatronModel
+from .nemo_asr import NemoASRModel
 from .openai import OpenAIModel
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/__init__.py` around lines 23 - 24, The import
order is wrong causing ruff to auto-fix; move the `from .nemo_asr import
NemoASRModel` line so imports are alphabetically sorted among the NeMo-related
imports (place `.nemo_asr` after `.megatron` and before `.openai`, keeping it
adjacent to other speech model imports like `.asr_nim`/`.audio_utils` as
appropriate) and commit the updated file so the ruff-check pipeline passes.
