Nemo-ASR models server [do not merge] #1211

Open: Jorjeous wants to merge 3 commits into main from nemo-asr-server

Conversation

@Jorjeous (Member) commented Feb 4, 2026

Summary by CodeRabbit

  • New Features

    • NeMo ASR speech-to-text added with a client and an OpenAI-compatible server API.
    • Supports chunked transcription, language selection, multiple response formats, optional timestamps, and GPU-enabled server runtime.
    • Robust handling of tarred audio datastores and remote URIs with automatic materialization and flexible prompt/audio reference parsing.
  • Tests

    • Comprehensive unit tests covering audio resolution, tar/shard handling, prompt parsing, and transcription flows.

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
@Jorjeous Jorjeous changed the title from "ASR models support" to "ASR models support [not merge yet]" on Feb 4, 2026
@Jorjeous (Member, Author) commented Feb 4, 2026

Tests are WIP.

@Jorjeous Jorjeous changed the title from "ASR models support [not merge yet]" to "ASR models support [do not merge]" on Feb 4, 2026
@greptile-apps (bot, Contributor) left a comment

2 files reviewed, 3 comments

"""

import argparse
import asyncio
asyncio imported but never used

@Jorjeous Jorjeous changed the title from "ASR models support [do not merge]" to "Nemo-ASR models server [do not merge]" on Feb 4, 2026
@coderabbitai (bot, Contributor) commented Feb 4, 2026

📝 Walkthrough

Adds a NeMo ASR integration: a new NemoASRModel client, a FastAPI-based NemoASRServer exposing /v1/audio/transcriptions, orchestration/registry updates to support nemo_asr, and unit tests for tar/datastore and prompt handling.

Changes

  • Model Registry Integration (nemo_skills/inference/model/__init__.py):
    Imports and registers NemoASRModel under the "nemo_asr" entry in the public models dictionary.
  • ASR Client Implementation (nemo_skills/inference/model/nemo_asr.py):
    Adds NemoASRModel: an async HTTP client for /v1/audio/transcriptions that handles base_url/SSH tunneling, audio resolution (local, ais://, s3://, tarred shards), tar/materialization helpers, optional duration-based chunking, multipart uploads, response parsing, async lifecycle (context manager/destructor), and error handling.
  • ASR Server Implementation (nemo_skills/inference/server/serve_nemo_asr.py):
    Adds NemoASRServer and a FastAPI app exposing /v1/audio/transcriptions. Loads NeMo models (local .nemo or pretrained), supports GPU placement, chunked transcription, optional timestamps, OpenAI-compatible responses, and a CLI main() entry.
  • Server Orchestration (nemo_skills/pipeline/utils/server.py):
    Adds nemo_asr enum member(s) and get_server_command handling for nemo_asr, including default entrypoint and start-command construction (model_path, num_gpus, num_nodes, port, server_args).
  • Tests (tests/test_nemo_asr_support.py):
    Adds tests for tar/member resolution (local and datastore mocks), prompt parsing (strings, lists, OpenAI-style messages, manifests), shard-aware logic, server extraction behavior, and SupportedServers inclusion.

Sequence Diagram

sequenceDiagram
    actor User
    participant Client as "NemoASRModel\n(Async Client)"
    participant HTTPClient as "httpx.AsyncClient\n(Connection Pool)"
    participant Server as "NemoASRServer\n(FastAPI)"
    participant NeMo as "NeMo ASR\n(Model)"

    User->>Client: generate_async(prompt, ...)
    Client->>Client: resolve audio path / tar member
    Client->>Client: optional duration check / chunk decision
    Client->>HTTPClient: POST /v1/audio/transcriptions\n(multipart/form-data)
    HTTPClient->>Server: HTTP Request
    Server->>Server: validate params & options
    alt chunking requested
        Server->>NeMo: transcribe(chunk_1)
        NeMo->>Server: text_chunk_1
        Server->>NeMo: transcribe(chunk_n)
        NeMo->>Server: text_chunk_n
        Server->>Server: combine results & timestamps
    else single transcription
        Server->>NeMo: transcribe(audio)
        NeMo->>Server: text + metadata
    end
    Server->>HTTPClient: HTTP Response (OpenAI format)
    HTTPClient->>Client: Response JSON
    Client->>User: parsed structured result
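For context on the flow shown above, a minimal client-side sketch follows. It is illustrative only and not part of this PR: the host/port, form-field names, and the "text" field in the response are assumptions based on the OpenAI-compatible API described in the walkthrough.

import asyncio
import httpx

# Minimal sketch of calling the OpenAI-compatible endpoint directly.
# Host, port, and form fields are assumptions; see NemoASRModel for the real client.
async def transcribe(path: str) -> str:
    async with httpx.AsyncClient(timeout=300.0) as client:
        with open(path, "rb") as f:
            resp = await client.post(
                "http://localhost:5000/v1/audio/transcriptions",
                files={"file": (path, f, "audio/wav")},
                data={"response_format": "json"},
            )
    resp.raise_for_status()
    return resp.json()["text"]  # "text" per the OpenAI transcription schema

if __name__ == "__main__":
    print(asyncio.run(transcribe("sample.wav")))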

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • vmendelev
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 63.64%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main addition: a NeMo-ASR server implementation with client integration.


@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@nemo_skills/inference/server/serve_nemo_asr.py`:
- Around line 222-227: The direct access hypotheses[0][0] in serve_nemo_asr.py
can raise IndexError if hypotheses[0] is empty; update the extraction logic in
the same block that currently raises RuntimeError("No transcription returned
from model") to also validate that hypotheses[0] has at least one element before
indexing, and if not raise a clear RuntimeError (or include the error context) —
locate the lines around hypothesis = hypotheses[0][0] and text = hypothesis.text
and add a guard that checks len(hypotheses[0]) > 0 (or similar) and handles the
empty-inner-list case consistently with the chunked path.
- Around line 157-159: The code accesses hypotheses[0][0].text without ensuring
hypotheses[0] is non-empty which can raise IndexError; update the block in
serve_nemo_asr.py (around where chunk_texts is appended) to first check both
len(hypotheses) > 0 and len(hypotheses[0]) > 0 (or safely find the first
non-empty hypothesis) before reading .text, e.g., guard the access to
hypotheses[0][0].text and only append text.strip() when that nested element
exists.
- Around line 206-258: The verbose_json path references hypothesis.timestep but
hypothesis is only set in the non-chunking branch, so when chunking is used (see
_transcribe_with_chunking) you get a NameError; fix by either making
_transcribe_with_chunking return timestamps/word-level data alongside (text,
inference_time) and use those to populate words (with offsets) when
enable_timestamps is true, or skip timestamp extraction for the chunked path
(i.e., only populate words when hypothesis exists or when the chunked result
includes a timestamps structure); update the code paths handling
chunk_duration_sec, _transcribe_with_chunking return signature, and the
verbose_json block (references: _transcribe_with_chunking, hypothesis,
enable_timestamps, response_format == "verbose_json") accordingly.
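To make the first two fixes above concrete, a minimal guard of the kind the tests in this PR expect could look like the sketch below. The error messages are taken from tests/test_nemo_asr_support.py; treat this as an illustrative sketch, not the exact implementation in serve_nemo_asr.py.

# Illustrative sketch of the empty-hypotheses guard suggested above.
# `hypotheses` is assumed to be a list of lists of hypothesis objects.
def _extract_first_hypothesis(hypotheses):
    if not hypotheses:
        raise RuntimeError("No transcription returned from model")
    if not hypotheses[0]:
        raise RuntimeError("Model returned empty transcription hypotheses")
    return hypotheses[0][0]
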
🧹 Nitpick comments (5)
nemo_skills/inference/server/serve_nemo_asr.py (2)

83-83: Remove extraneous f prefix from string literal.

This f-string has no placeholders.

Suggested fix
-                LOG.info(f"Model moved to GPU")
+                LOG.info("Model moved to GPU")

294-295: Consider logging when model parameter differs from server model.

The model parameter is documented as ignored for OpenAI API compatibility, which is fine. However, if a user passes a different model name expecting different behavior, they might be confused. Consider logging a debug message when the passed model differs from the actual loaded model.
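A minimal sketch of such a debug log (illustrative only; the attribute name self.model_name is a placeholder, not necessarily the real field):

# Illustrative sketch only; `self.model_name` is a placeholder attribute.
if model and model != self.model_name:
    LOG.debug(
        f"Request specified model={model!r}, but this server always uses the loaded "
        f"model {self.model_name!r}; the parameter is ignored for OpenAI API compatibility."
    )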

nemo_skills/inference/model/nemo_asr.py (3)

92-93: Unused **kwargs silently ignores unexpected parameters.

Per coding guidelines, the code should fail if a user specifies an unsupported argument. Currently, any extra keyword arguments passed to __init__ are silently ignored.

Suggested fix: Warn or raise for unexpected kwargs
         max_workers: int = 64,
-        **kwargs,
     ) -> None:
         """Initialize NemoASRModel client."""

Or if compatibility requires accepting kwargs, at least log a warning:

         **kwargs,
     ) -> None:
         """Initialize NemoASRModel client."""
+        if kwargs:
+            LOG.warning(f"NemoASRModel ignoring unexpected parameters: {list(kwargs.keys())}")

201-201: Use direct key access for expected "text" field.

Per coding guidelines, don't use .get() for accessing dictionary keys that are expected to be present. The transcription response should always contain "text". Using .get() with a default masks malformed server responses.

Suggested fix
-            pred_text = result_data.get("text", "")
+            pred_text = result_data["text"]

As per coding guidelines: "Don't use .get() for accessing dictionary keys if the code expects them to be present. Use direct key access like data[key_name] to allow clear errors when keys are missing."


268-287: asyncio.get_event_loop() is deprecated; __del__ cleanup is fragile.

asyncio.get_event_loop() raises DeprecationWarning in Python 3.10+ when called without a running loop. The create_task without storing a reference (line 276) may result in the task being garbage-collected before completion.

Consider relying on explicit cleanup via the async context manager (__aexit__) instead of __del__, or use a more robust pattern.

Suggested simplification
     def __del__(self):
         """Clean up resources."""
-        # Close HTTP client
-        if hasattr(self, "_client"):
-            try:
-                # Try to close gracefully if event loop is available
-                loop = asyncio.get_event_loop()
-                if loop.is_running():
-                    asyncio.create_task(self._client.aclose())
-                else:
-                    loop.run_until_complete(self._client.aclose())
-            except Exception:
-                pass  # Ignore errors during cleanup
-
         # Close SSH tunnel
         if hasattr(self, "_tunnel") and self._tunnel:
             try:
                 self._tunnel.stop()
             except Exception:
                 pass  # Ignore errors during cleanup

For HTTP client cleanup, document that users should use the async context manager or call await model._client.aclose() explicitly.
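As a usage note, scoping the client with the async context manager avoids the __del__ problem entirely; a minimal sketch (constructor arguments are placeholders, not the actual signature):

# Illustrative usage sketch; constructor arguments are placeholders.
import asyncio
from nemo_skills.inference.model.nemo_asr import NemoASRModel

async def main():
    # __aexit__ closes the underlying httpx.AsyncClient deterministically.
    async with NemoASRModel(host="127.0.0.1", port=5000) as model:
        result = await model.generate_async("path/to/audio.wav")
        print(result)

asyncio.run(main())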

@nithinraok (Member) commented:

@Jorjeous can we also add support for reading audio samples from the NeMo tarred dataset format for inference?

@Jorjeous (Member, Author) replied:

Yep, if you feel the need for it.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 6

🧹 Nitpick comments (4)
nemo_skills/inference/server/serve_nemo_asr.py (1)

53-56: Missing raise ... from in three except re-raise sites.

All three spots raise a new exception inside an except clause without chaining the cause, losing the original traceback context (Ruff B904). Line 70 also warrants LOG.exception instead of LOG.error to capture the stack trace (Ruff TRY400).

♻️ Proposed fixes
 # Line 54-56
         except ImportError:
-            raise ImportError("NeMo toolkit is not installed. ...")
+            raise ImportError("NeMo toolkit is not installed. ...") from None

 # Line 68-70
         except Exception as e:
-            LOG.error(f"Failed to load model from NGC: {e}")
+            LOG.exception(f"Failed to load model from NGC: {e}")

 # Line 141-143
         except ImportError:
-            raise ImportError("soundfile and numpy are required for audio chunking")
+            raise ImportError("soundfile and numpy are required for audio chunking") from None

 # Line 349-351
         except Exception as e:
             LOG.error(f"Transcription failed: {e}", exc_info=True)
-            raise HTTPException(status_code=500, detail=str(e))
+            raise HTTPException(status_code=500, detail=str(e)) from e

Also applies to: 140-143, 349-351

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_nemo_asr.py` around lines 53 - 56, Update
the three exception handlers that re-raise ImportError to preserve exception
chaining and context by using "except Exception as e: raise ImportError('...')
from e" (apply this to the nemo_asr import block and the two similar import
blocks further down), and replace the LOG.error(...) call at the referenced
catch site with LOG.exception(...) so the stack trace is logged; use a
consistent exception variable name (e or exc) in each handler when chaining.
nemo_skills/inference/model/nemo_asr.py (1)

519-538: except Exception in _check_audio_duration silently masks soundfile parsing errors.

A corrupted or unsupported audio file would cause sf.info() to raise, which is caught here and silently returns None (no chunking). The caller then proceeds to send the file to the server, which may fail with a less informative error. Per coding guidelines, exceptions not normally expected to be raised should not be caught silently.

♻️ Proposed fix – narrow the catch
-        except Exception as e:
-            LOG.warning(f"Failed to check audio duration: {e}")
+        except (OSError, RuntimeError) as e:
+            LOG.warning(f"Failed to check audio duration: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/nemo_asr.py` around lines 519 - 538, The current
_check_audio_duration function catches all exceptions around sf.info() which
masks parsing errors; change the except block to only catch soundfile-specific
exceptions (use sf.SoundFileError or the appropriate soundfile exception type)
and log/return None for those cases, while allowing other unexpected exceptions
to propagate (re-raise) so callers see real failures; reference sf.info,
sf.SoundFileError (or soundfile's specific error class), and
chunk_audio_threshold_sec when making the change.
tests/test_nemo_asr_support.py (2)

1-253: No test covers the generate_async end-to-end flow.

The core integration path — generate_async sending multipart form data to the server and parsing the response — has no test coverage. This is the primary public interface of NemoASRModel. Even a basic mock of httpx.AsyncClient.post would provide a safety net for the request-building and response-parsing logic (lines 472–498 in nemo_asr.py).

Would you like me to draft a pytest-asyncio-based test that mocks httpx.AsyncClient.post to cover the generate_async happy path and the HTTP error path?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_nemo_asr_support.py` around lines 1 - 253, Add an async pytest
that exercises NemoASRModel.generate_async by mocking httpx.AsyncClient.post to
simulate the server happy path and an error path: create a test using
pytest-asyncio that constructs a NemoASRModel, patches httpx.AsyncClient.post to
capture the sent multipart/form-data request (asserting file payload, form
fields such as language/timestamps, and headers), and returns an
httpx.Response-like object with status_code 200 and a JSON body matching the
server hypothesis structure so generate_async returns the expected
transcription; also add a second case where the mock returns a non-200 status
(or raises httpx.HTTPError) and assert generate_async raises the appropriate
error. Ensure you reference NemoASRModel.generate_async and patch
httpx.AsyncClient.post (or the instance on model._client) so the test covers
request-building and response-parsing end-to-end.
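A possible shape for that test (illustrative only; constructor arguments and the exact response schema are assumptions drawn from the descriptions above, and pytest-asyncio is assumed to be available):

# Illustrative pytest-asyncio sketch of the suggested generate_async coverage.
# Constructor arguments and the response schema are assumptions; adapt to the
# actual NemoASRModel signature before use.
from unittest.mock import AsyncMock, patch

import httpx
import pytest

from nemo_skills.inference.model.nemo_asr import NemoASRModel


@pytest.mark.asyncio
async def test_generate_async_happy_path(tmp_path):
    audio = tmp_path / "sample.wav"
    audio.write_bytes(b"RIFF....WAVE")  # placeholder bytes; a real test would write valid audio

    model = NemoASRModel(host="127.0.0.1", port=5000)
    fake_response = httpx.Response(
        200, json={"text": "hello world"}, request=httpx.Request("POST", "http://test")
    )

    with patch.object(model._client, "post", new=AsyncMock(return_value=fake_response)) as mock_post:
        result = await model.generate_async(str(audio))

    mock_post.assert_awaited_once()
    assert "hello world" in str(result)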

77-87: Use pytest.raises instead of the manual try/except/AssertionError pattern.

The current approach produces confusing failure output and is more verbose. pytest.raises is the idiomatic replacement.

♻️ Proposed refactor
+import pytest

-    try:
-        NemoASRServer._extract_first_hypothesis([])
-        raise AssertionError("Expected RuntimeError for empty hypotheses")
-    except RuntimeError as e:
-        assert "No transcription returned" in str(e)
-
-    try:
-        NemoASRServer._extract_first_hypothesis([[]])
-        raise AssertionError("Expected RuntimeError for empty inner hypotheses")
-    except RuntimeError as e:
-        assert "empty transcription hypotheses" in str(e)
+    with pytest.raises(RuntimeError, match="No transcription returned"):
+        NemoASRServer._extract_first_hypothesis([])
+
+    with pytest.raises(RuntimeError, match="empty transcription hypotheses"):
+        NemoASRServer._extract_first_hypothesis([[]])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_nemo_asr_support.py` around lines 77 - 87, Replace the manual
try/except/AssertionError blocks testing NemoASRServer._extract_first_hypothesis
with pytest.raises context managers: use "with pytest.raises(RuntimeError,
match='No transcription returned')" for the first call with [] and "with
pytest.raises(RuntimeError, match='empty transcription hypotheses')" for the
second call with [[]] (or alternatively capture the context as e and assert on
str(e.value)). This will make tests concise and produce clearer failure output
while still verifying the exact exception messages from
_extract_first_hypothesis.

Comment on lines +106 to +109

        if tokenizer is not None:
            LOG.warning("NemoASRModel does not use tokenizer. Ignoring tokenizer argument.")
        if output_dir is not None:
            LOG.warning("NemoASRModel does not use output_dir. Ignoring output_dir argument.")

⚠️ Potential issue | 🟡 Minor

tokenizer and output_dir silently accepted via LOG.warning instead of failing — coding guideline violation.

Per coding guidelines: "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided." Issuing a warning allows callers to silently misconfigure the model without any breakage signal.

🐛 Proposed fix
-        if tokenizer is not None:
-            LOG.warning("NemoASRModel does not use tokenizer. Ignoring tokenizer argument.")
-        if output_dir is not None:
-            LOG.warning("NemoASRModel does not use output_dir. Ignoring output_dir argument.")
+        if tokenizer is not None:
+            raise ValueError("NemoASRModel does not support a tokenizer argument.")
+        if output_dir is not None:
+            raise ValueError("NemoASRModel does not support an output_dir argument.")

As per coding guidelines, "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/nemo_asr.py` around lines 106 - 109, The
NemoASRModel currently logs and ignores unsupported parameters (tokenizer and
output_dir) via LOG.warning; change this to raise a clear exception so callers
cannot silently pass unsupported args: inside the NemoASRModel constructor or
initializer where tokenizer and output_dir are checked, replace the LOG.warning
calls for tokenizer and output_dir with raised exceptions (e.g., ValueError or a
custom ConfigError) that include the parameter name and a brief message that
NemoASRModel does not accept that parameter, ensuring callers receive an
immediate failure instead of a silent warning.

Comment on lines +233 to +236

            for path_str in sorted(matches):
                path = Path(path_str).expanduser()
                if path.suffix == ".tar":
                    tar_files.append(str(path.absolute()))

⚠️ Potential issue | 🟡 Minor

Non-.tar files in glob results are silently dropped with no warning.

If a glob pattern like "audio_*" matches both .tar and non-tar files, only .tar files are kept. Non-matching files are silently discarded, with no log message. A misconfigured path (e.g., "*.wav") would result in an empty tar_files list and cryptic FileNotFoundError at inference time.

🐛 Proposed fix – warn on skipped files
             for path_str in sorted(matches):
                 path = Path(path_str).expanduser()
                 if path.suffix == ".tar":
                     tar_files.append(str(path.absolute()))
+                else:
+                    LOG.warning("Skipping non-tar file in tarred_audio_filepaths: %s", path_str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/nemo_asr.py` around lines 233 - 236, The loop
that filters glob results into tar_files currently drops non-".tar" paths
silently; update the code in the function/method containing the loop that
iterates "for path_str in sorted(matches):" so that when a
Path(path_str).expanduser() has a suffix != ".tar" you append a warning log
(e.g., logger.warning or processLogger.warn depending on the module logger)
mentioning the skipped file and the reason; keep existing behavior of appending
str(path.absolute()) to tar_files for ".tar" files and ensure the logger symbol
you use (logger/processLogger) is the module's configured logger to make
skipped-file diagnostics visible.

Comment on lines +549 to +556

    def __del__(self):
        """Clean up resources."""
        # Close SSH tunnel
        if hasattr(self, "_tunnel") and self._tunnel:
            try:
                self._tunnel.stop()
            except Exception:
                pass  # Ignore errors during cleanup

⚠️ Potential issue | 🟡 Minor

__del__ closes the SSH tunnel but not the httpx.AsyncClient, leaving connection-pool handles open for callers not using the async context manager.

Because __del__ is synchronous, await self._client.aclose() cannot be called, so the HTTP connection pool leaks. The test workaround asyncio.run(model._client.aclose()) at line 27 of the test file confirms this gap. Callers that don't use async with NemoASRModel(...) will leak connections for the lifetime of the process.

Consider documenting that async with is required, or adding a synchronous fallback using asyncio.get_event_loop().run_until_complete:

def __del__(self):
    if hasattr(self, "_client") and not self._client.is_closed:
        try:
            loop = asyncio.get_event_loop()
            if loop.is_running():
                loop.create_task(self._client.aclose())
            else:
                loop.run_until_complete(self._client.aclose())
        except Exception:
            pass
    if hasattr(self, "_tunnel") and self._tunnel:
        try:
            self._tunnel.stop()
        except Exception:
            pass
🧰 Tools
🪛 Ruff (0.15.1)

[error] 555-556: try-except-pass detected, consider logging the exception

(S110)


[warning] 555-555: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/nemo_asr.py` around lines 549 - 556, The __del__
method currently only stops the SSH tunnel and leaves httpx.AsyncClient
connection pools open; update __del__ to also close self._client by checking
hasattr(self, "_client") and not self._client.is_closed, then attempt to close
it: if the event loop is running schedule self._client.aclose() via
loop.create_task(...) otherwise call
loop.run_until_complete(self._client.aclose()); keep the existing try/except
swallowing errors and retain the _tunnel.stop() cleanup; alternatively, if you
prefer, document that callers must use "async with NemoASRModel(...)" to avoid
leaks.

"return_hypotheses": True,
"timestamps": enable_timestamps,
}
if language:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

if language: silently drops an explicit empty-string language.

if language: evaluates to False for "", so an explicitly passed empty string is silently discarded. Prefer an explicit None check throughout.

🐛 Proposed fix
-        if language:
+        if language is not None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_nemo_asr.py` at line 112, The current
truthy check `if language:` in serve_nemo_asr.py drops an explicit empty-string
language; change that check to an explicit None check (`if language is not
None:`) so an empty string is treated as a valid value, and audit any other
occurrences in the same function/method that reference the variable `language`
to replace truthiness checks with `is not None` where intent is to allow
empty-string inputs.

Comment on lines +122 to +199

    async def _transcribe_with_chunking(
        self,
        audio_path: str,
        chunk_duration_sec: float,
        enable_timestamps: bool = False,
        language: Optional[str] = None,
    ) -> tuple[str, float]:
        """Transcribe long audio by chunking it into smaller segments.

        Args:
            audio_path: Path to audio file
            chunk_duration_sec: Duration of each chunk in seconds
            enable_timestamps: Whether to enable timestamps

        Returns:
            Tuple of (transcribed_text, total_inference_time)
        """
        try:
            import numpy as np
            import soundfile as sf
        except ImportError:
            raise ImportError("soundfile and numpy are required for audio chunking")

        # Load audio file
        audio_array, sampling_rate = sf.read(audio_path)
        duration = len(audio_array) / sampling_rate

        LOG.info(f"Chunking audio ({duration:.1f}s) into segments of {chunk_duration_sec}s")

        # Calculate chunks
        chunk_samples = int(chunk_duration_sec * sampling_rate)
        num_chunks = int(np.ceil(len(audio_array) / chunk_samples))

        chunks = []
        for i in range(num_chunks):
            start = i * chunk_samples
            end = min((i + 1) * chunk_samples, len(audio_array))
            chunk = audio_array[start:end]

            # Merge tiny trailing chunks
            min_chunk_samples = int(0.5 * sampling_rate)  # 0.5 second minimum
            if len(chunk) < min_chunk_samples and chunks:
                chunks[-1] = np.concatenate([chunks[-1], chunk])
            else:
                chunks.append(chunk)

        LOG.info(f"Created {len(chunks)} audio chunks")

        # Transcribe each chunk
        chunk_texts = []
        total_time = 0.0

        for chunk_idx, audio_chunk in enumerate(chunks):
            # Save chunk to temporary file
            chunk_path = f"{audio_path}.chunk_{chunk_idx}.wav"
            try:
                sf.write(chunk_path, audio_chunk, sampling_rate)

                # Transcribe chunk
                start_time = time.time()
                hypotheses = self._transcribe_single([chunk_path], enable_timestamps, language)
                chunk_time = time.time() - start_time
                total_time += chunk_time

                hypothesis = self._extract_first_hypothesis(hypotheses)
                text = hypothesis.text
                chunk_texts.append(text.strip())
                LOG.debug(f"Chunk {chunk_idx + 1}/{len(chunks)}: {text[:50]}...")

            finally:
                # Clean up chunk file
                if os.path.exists(chunk_path):
                    os.unlink(chunk_path)

        # Concatenate all chunk transcriptions
        full_text = " ".join(chunk_texts)

        return full_text, total_time

⚠️ Potential issue | 🟠 Major

Blocking synchronous operations inside async handlers will stall the event loop.

_transcribe_with_chunking is declared async but contains zero await calls — sf.read (line 146), sf.write (line 178), and self._transcribe_single (line 182) are all synchronous blocking calls. Identically, self._transcribe_single at line 246 inside the async transcribe method blocks the event loop directly. Under uvicorn, any other in-flight request (e.g., health-check) is frozen until inference finishes.

Use asyncio.to_thread to offload blocking work:

🔧 Proposed fix – offload blocking inference and file I/O to a thread
+import asyncio

 # _transcribe_with_chunking
-        audio_array, sampling_rate = sf.read(audio_path)
+        audio_array, sampling_rate = await asyncio.to_thread(sf.read, audio_path)

-                sf.write(chunk_path, audio_chunk, sampling_rate)
-
-                start_time = time.time()
-                hypotheses = self._transcribe_single([chunk_path], enable_timestamps, language)
+                await asyncio.to_thread(sf.write, chunk_path, audio_chunk, sampling_rate)
+
+                start_time = time.time()
+                hypotheses = await asyncio.to_thread(
+                    self._transcribe_single, [chunk_path], enable_timestamps, language
+                )

 # transcribe method
-                hypotheses = self._transcribe_single([tmp_path], enable_timestamps, language)
+                hypotheses = await asyncio.to_thread(
+                    self._transcribe_single, [tmp_path], enable_timestamps, language
+                )

Also applies to: 243-247

🧰 Tools
🪛 Ruff (0.15.1)

[warning] 143-143: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 143-143: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_nemo_asr.py` around lines 122 - 199,
_transcribe_with_chunking is async but performs blocking I/O and CPU work
(sf.read, sf.write, os.path.exists/unlink and calling the synchronous
self._transcribe_single), which blocks the event loop; wrap these blocking calls
in asyncio.to_thread (e.g., await asyncio.to_thread(sf.read, audio_path), await
asyncio.to_thread(sf.write, chunk_path, audio_chunk, sampling_rate), await
asyncio.to_thread(os.path.exists, chunk_path) / await
asyncio.to_thread(os.unlink, chunk_path)) and offload the sync inference call by
awaiting await asyncio.to_thread(self._transcribe_single, [chunk_path],
enable_timestamps, language); also apply the same to the synchronous call to
self._transcribe_single inside the async transcribe method so all blocking file
I/O and inference run in threads instead of the event loop.

Comment on line 222

            Transcription result in OpenAI-compatible format
        """
        # Save uploaded file to temporary location
        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:

⚠️ Potential issue | 🟡 Minor

Temporary file always gets a .wav suffix regardless of the uploaded file format.

NeMo ASR backends (Canary, Parakeet) may use the file extension for format detection. An uploaded FLAC or MP3 would be saved as .wav, potentially causing a decoding error or silently incorrect decoding.

🐛 Proposed fix – preserve the original extension
-        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
+        original_suffix = os.path.splitext(audio_file.filename or "audio.wav")[1] or ".wav"
+        with tempfile.NamedTemporaryFile(delete=False, suffix=original_suffix) as tmp_file:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/server/serve_nemo_asr.py` at line 222, The temporary
file is always created with suffix=".wav" which can mislead NeMo ASR format
detection; change the NamedTemporaryFile call to preserve the uploaded file's
original extension by extracting it (e.g., ext =
os.path.splitext(original_filename)[1].lower()) and passing suffix=ext (with a
safe default like ".wav" if ext is empty or not in an allowed set). Update the
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") usage to use the
computed suffix and validate/sanitize the extension before creating tmp_file so
tmp_file has the correct format for downstream NeMo functions.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1


Comment on lines +23 to +24
# NeMo models (speech)
from .nemo_asr import NemoASRModel

⚠️ Potential issue | 🟠 Major

Fix import order to resolve the ruff-check pipeline failure.

The CI pipeline reports that ruff auto-fixed the file due to an import ordering violation. The .nemo_asr import is placed between .asr_nim and .audio_utils, but alphabetically it belongs after .megatron and before .openai. Commit the ruff-fixed version before merging.

🔧 Proposed fix – move the import to its sorted position
-# NeMo models (speech)
-from .nemo_asr import NemoASRModel
-
 # Audio utilities
 from .audio_utils import (
     audio_file_to_base64,
     chunk_audio,
     load_audio_file,
     make_audio_content_block,
     save_audio_chunk_to_base64,
 )
 from .azure import AzureOpenAIModel
 ...
 from .megatron import MegatronModel
+from .nemo_asr import NemoASRModel
 from .openai import OpenAIModel
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/__init__.py` around lines 23 - 24, The import
order is wrong causing ruff to auto-fix; move the `from .nemo_asr import
NemoASRModel` line so imports are alphabetically sorted among the NeMo-related
imports (place `.nemo_asr` after `.megatron` and before `.openai`, keeping it
adjacent to other speech model imports like `.asr_nim`/`.audio_utils` as
appropriate) and commit the updated file so the ruff-check pipeline passes.
