@keivenchang keivenchang commented Oct 24, 2025

Overview:

Add HuggingFace model cache checking to sanity_check.py for better pre-deployment validation.

Details:

  • Add HuggingFaceInfo class to check ~/.cache/huggingface/hub
  • Fix SGLang's Python site-packages check

Where should the reviewer start?

deploy/sanity_check.py

/coderabbit profile chill

Summary by CodeRabbit

  • New Features
    • Deployment verification now includes HuggingFace model cache analysis.
    • Thorough-check mode displays detailed HuggingFace model cache information and token status.
    • Help documentation updated to reflect HuggingFace cache monitoring.

coderabbitai bot commented Oct 24, 2025

Walkthrough

The changes add a new HuggingFaceInfo class to analyze and report HuggingFace model cache state as part of the diagnostic system. This class integrates into the SystemInfo diagnostic tree, with optional detailed model enumeration in thorough-check mode. Help text is updated to reflect this addition.

Changes

Cohort / File(s) | Summary
HuggingFace Cache Diagnostics: deploy/sanity_check.py | Adds HuggingFaceInfo class to detect and report cached Hugging Face models under ~/.cache/huggingface/hub, with optional thorough-check mode for detailed model enumeration and HF_TOKEN indicator. Integrates into SystemInfo construction via add_child(). Updates help text and intro narrative to reflect HuggingFace cache inspection. Note: Duplicate class definition present in file.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring extra attention:

  • Duplicate HuggingFaceInfo class definition — Two identical class definitions appear in the same file; verify intent and consolidate or remove one copy
  • Integration into SystemInfo subtree — Confirm add_child() call is correctly positioned and doesn't introduce circular dependencies
  • Cache path assumptions — Validate that ~/.cache/huggingface/hub path handling is robust across platforms and edge cases
  • HF_TOKEN handling — Ensure sensitive token information is appropriately handled and not leaked in output
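On the cache path point above, a minimal sketch of one way to resolve the hub cache location instead of hard-coding it. It assumes the HF_HUB_CACHE and HF_HOME environment variables documented by huggingface_hub; the function name resolve_hf_hub_cache is hypothetical and is not taken from sanity_check.py, which intentionally uses hard-coded paths.

```python
import os
from pathlib import Path


def resolve_hf_hub_cache(env=None):
    """Return the Hugging Face hub cache directory, honoring common overrides.

    Hypothetical helper for illustration; not part of sanity_check.py.
    """
    env = os.environ if env is None else env

    # An explicit hub cache override takes precedence.
    hub_cache = env.get("HF_HUB_CACHE")
    if hub_cache:
        return Path(hub_cache).expanduser()

    # HF_HOME relocates the whole Hugging Face directory; the hub cache sits under it.
    hf_home = env.get("HF_HOME")
    if hf_home:
        return Path(hf_home).expanduser() / "hub"

    # Fall back to the default location named in this PR.
    return Path.home() / ".cache" / "huggingface" / "hub"
```

Passing a plain dict as env keeps the helper easy to test without touching the real environment.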

Poem

🐰 A cache so deep, with models galore,
HuggingFace treasures we now explore!
Though duplicates dance where one should be,
Our diagnostics bloom, for all to see! 🌱

Pre-merge checks

✅ Passed checks (3 passed)
Check name | Status | Explanation
Docstring Coverage | ✅ Passed | Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.
Title Check | ✅ Passed | The pull request title "fix: sanity_check.py 1) sglang Python site-packages check 2) adding HuggingFace cache checking" is clearly related to the main changes in the changeset. According to the raw summary and PR objectives, the changes do include both fixing the SGLang Python site-packages check and adding a new HuggingFaceInfo class for HuggingFace cache checking. While the title uses a numbered list format which adds some verbosity, it remains specific and communicates the key changes without excessive vagueness. The title accurately reflects real aspects of the changeset, though it does not mention the deletion of deploy/dynamo_check.py.
Description Check | ✅ Passed | The pull request description includes three of the four required template sections: Overview (clearly describes adding HuggingFace cache checking), Details (lists specific changes to HuggingFaceInfo and SGLang check), and Where should the reviewer start (points to deploy/sanity_check.py). However, the description is missing the "Related Issues" section from the template, which should reference any associated GitHub issues using action keywords like Closes, Fixes, or Relates to. The existing content is substantive and covers the essential information, though this missing section represents an incomplete adherence to the template structure.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
deploy/sanity_check.py (4)

16-16: Standardize “Hugging Face” naming and sync help text.

Use “Hugging Face” consistently and align the options text with argparse help.

- - HuggingFace model cache (detailed with --thorough-check)
+ - Hugging Face model cache (detailed with --thorough-check)
@@
-    --thorough-check  Enable thorough checking (file permissions, directory sizes, HuggingFace model details)
+    --thorough-check  Enable thorough checking (file permissions, directory sizes, disk space, Hugging Face model details)

Also applies to: 92-93


1361-1367: Tighten exceptions, add debug logs, and fix unused loop var (ruff).

Narrow broad excepts, add debug logging instead of silent pass, and rename dirnames to _dirnames.

-                        try:
-                            stat_info = os.stat(item_path)
+                        try:
+                            stat_info = os.stat(item_path)
                             # Use the earlier of creation time or modification time
                             download_time = min(stat_info.st_ctime, stat_info.st_mtime)
                             download_date = self._format_timestamp_pdt(download_time)
-                        except Exception:
+                        except OSError as e:
+                            logging.debug("HF cache: stat failed for %s: %s", item_path, e)
                             download_date = "unknown"
@@
-        except Exception:
-            pass
+        except OSError as e:
+            logging.debug("HF cache: listing failed for %s: %s", cache_path, e)
@@
-            for dirpath, dirnames, filenames in os.walk(directory):
+            for dirpath, _dirnames, filenames in os.walk(directory):
                 for filename in filenames:
@@
-        except Exception:
-            pass
+        except OSError as e:
+            logging.debug("HF cache: size scan failed for %s: %s", directory, e)

Based on static analysis hints.

Also applies to: 1370-1377, 1386-1389, 1394-1395
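Pulled out of the diff, the size scan with narrowed exceptions could look roughly like the sketch below. The function name directory_size_bytes and the use of os.lstat are assumptions made for illustration; the actual method in sanity_check.py may be structured differently.

```python
import logging
import os

logger = logging.getLogger(__name__)


def directory_size_bytes(directory):
    """Sum file sizes under `directory`, skipping entries that cannot be stat'ed.

    Hypothetical standalone version of the size scan discussed above.
    """
    total = 0

    def _log_walk_error(err):
        # os.walk ignores directory listing errors unless onerror is provided.
        logger.debug("HF cache: listing failed for %s: %s", err.filename, err)

    for dirpath, _dirnames, filenames in os.walk(directory, onerror=_log_walk_error):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # lstat does not follow symlinks, so a file reached through a
                # symlink is not counted at full size a second time.
                total += os.lstat(path).st_size
            except OSError as e:
                logger.debug("HF cache: stat failed for %s: %s", path, e)
    return total
```

Catching only OSError keeps programming errors visible, while the debug logs explain why a reported size may be lower than expected.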


1308-1319: Use model name as the node label for readability.

Makes the tree easier to scan than “Model N”.

-            model_node = NodeInfo(
-                label=f"Model {i+1}",
-                desc=f"{model_name}, downloaded={download_date}, size={size_str}",
-                status=NodeStatus.INFO,
-            )
+            model_node = NodeInfo(
+                label=model_name,
+                desc=f"downloaded={download_date}, size={size_str}",
+                status=NodeStatus.INFO,
+            )

1320-1329: Optional: also show when HF_TOKEN is not set.

If helpful, add a small INFO/WARNING when HF_TOKEN is unset to hint at auth-protected models.

     def _add_hf_token_info(self):
         """Add HF_TOKEN information if the environment variable is set."""
-        if os.environ.get("HF_TOKEN"):
+        if os.environ.get("HF_TOKEN"):
             token_node = NodeInfo(
                 label="HF_TOKEN",
                 desc="<set>",
                 status=NodeStatus.INFO,
             )
             self.add_child(token_node)
+        else:
+            self.add_child(NodeInfo(label="HF_TOKEN", desc="not set", status=NodeStatus.INFO))
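For the token-leak concern, the important property is that only the presence of HF_TOKEN is ever reported, never its value. A minimal sketch of that idea, using a hypothetical helper hf_token_status that is independent of NodeInfo:

```python
import os


def hf_token_status(env=None):
    """Describe whether HF_TOKEN is set without exposing the token itself.

    Hypothetical helper for illustration; not part of sanity_check.py.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "")
    # Treat an empty or whitespace-only value as unset.
    return "<set>" if token.strip() else "not set"
```

The returned string can be used as the desc of the HF_TOKEN node in either branch of the suggestion above.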
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b622c and 1e91bb4.

📒 Files selected for processing (2)
  • deploy/dynamo_check.py (0 hunks)
  • deploy/sanity_check.py (5 hunks)
💤 Files with no reviewable changes (1)
  • deploy/dynamo_check.py
🧰 Additional context used
🪛 Ruff (0.14.1)
deploy/sanity_check.py

1365-1365: Do not catch blind exception: Exception

(BLE001)


1372-1372: Do not catch blind exception: Exception

(BLE001)


1376-1377: try-except-pass detected, consider logging the exception

(S110)


1376-1376: Do not catch blind exception: Exception

(BLE001)


1386-1386: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


1394-1395: try-except-pass detected, consider logging the exception

(S110)


1394-1394: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: vllm (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (4)
deploy/sanity_check.py (4)

19-26: Docs update looks good.

Clear rationale for standalone behavior and hard-coded paths.


47-50: Example output changes look fine.


57-57: Example HF cache line looks good.

Will reflect accurate counts after model filtering fix below.


337-339: Integration into SystemInfo is correct.

Runs in non-terse mode; respects --thorough-check.

- Only count models--* directories, excluding datasets--, spaces--, blobs
- Gate size calculation on thorough_check flag to keep default mode fast
- Add compute_sizes parameter with documentation

Signed-off-by: Keiven Chang <[email protected]>
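
The models--* filtering described in the commit above can be sketched as follows. This assumes the hub cache naming scheme in which a repository org/name is stored as a directory named models--org--name; the function name list_cached_models is hypothetical, and the reverse mapping is a best effort because a repository name containing "--" would be ambiguous.

```python
import os


def list_cached_models(cache_path):
    """Return repo ids for `models--*` directories under a hub cache directory.

    Hypothetical helper for illustration; not part of sanity_check.py.
    """
    models = []
    try:
        entries = sorted(os.listdir(cache_path))
    except OSError:
        # Missing or unreadable cache directory: report no models.
        return models

    for entry in entries:
        # Skip datasets--*, spaces--*, blobs, lock files, and similar entries.
        if not entry.startswith("models--"):
            continue
        if not os.path.isdir(os.path.join(cache_path, entry)):
            continue
        # "models--org--name" -> "org/name"
        models.append(entry[len("models--"):].replace("--", "/"))
    return models
```

Because this only lists directory names, it stays cheap enough for the default mode; size calculation would be the part gated behind the thorough-check flag.
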
When virtualenv site-packages is writable, downgrade non-writable system
directories from ERROR to WARNING.

Signed-off-by: Keiven Chang <[email protected]>
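
The severity rule in the commit above can be sketched as below. The names Status and classify_site_packages are hypothetical and do not mirror the NodeStatus handling in sanity_check.py; os.access with os.W_OK is assumed as the writability test.

```python
import os
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def classify_site_packages(paths, venv_path, is_writable=None):
    """Map each site-packages path to a status.

    Non-writable paths are an ERROR unless the virtualenv site-packages
    is writable, in which case they are downgraded to WARNING.
    """
    if is_writable is None:
        is_writable = lambda p: os.access(p, os.W_OK)

    venv_ok = venv_path is not None and is_writable(venv_path)
    result = {}
    for path in paths:
        if is_writable(path):
            result[path] = Status.OK
        else:
            result[path] = Status.WARNING if venv_ok else Status.ERROR
    return result
```

The is_writable parameter exists only so the rule can be exercised without depending on real filesystem permissions.
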
@keivenchang keivenchang changed the title from "feat: add HuggingFace cache checking to sanity_check.py" to "fix: sglang Python check and adding HuggingFace cache checking to sanity_check.py" Oct 28, 2025
@keivenchang keivenchang changed the title from "fix: sglang Python check and adding HuggingFace cache checking to sanity_check.py" to "fix: sglang Python site-packages check and adding HuggingFace cache checking to sanity_check.py" Oct 28, 2025
@github-actions github-actions bot added fix and removed feat labels Oct 28, 2025
@keivenchang keivenchang changed the title from "fix: sglang Python site-packages check and adding HuggingFace cache checking to sanity_check.py" to "fix: sanity_check.py 1) sglang Python site-packages check 2) adding HuggingFace cache checking" Oct 28, 2025
@keivenchang keivenchang merged commit f155286 into main Oct 29, 2025
26 of 27 checks passed
@keivenchang keivenchang deleted the keivenchang/update-sanity-check.py branch October 29, 2025 02:36
karen-sy pushed a commit that referenced this pull request Oct 29, 2025