Mlflow#119
Conversation
|
Warning Review limit reached
More reviews will be available in 28 minutes and 55 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (17)
📝 WalkthroughWalkthroughAdds MLflow experiment-tracking support (server compose, launch/stop, venv deps), Ansible playbook + Python CLI to log LLM benchmark results (single and batch modes), helper shell tooling and docs, plus CPU benchmark JSON→CSV conversion scripts and small repo config/site navigation updates. ChangesMLflow Tracking & Logging
CPU Benchmark Conversion
Repository Configuration & Site Navigation
Sequence Diagram(s)sequenceDiagram
participant Playbook as Ansible Playbook
participant Python as log_to_mlflow.py
participant MLflow as MLflow Server
participant FS as Filesystem
Playbook->>Playbook: determine mlflow_tracking_uri / mode
Playbook->>Playbook: verify mlflow import available
Playbook->>MLflow: optional health check
alt Batch import
Playbook->>FS: find benchmarks.json under results_base
Playbook->>Playbook: apply model/workload filters
loop per matched benchmark
Playbook->>FS: locate sibling test-metadata.json
Playbook->>Python: exec log_to_mlflow.py with paths
end
else Single import
Playbook->>FS: assert provided files exist
Playbook->>Python: exec log_to_mlflow.py with single paths
end
Python->>FS: load benchmarks.json and test-metadata.json
Python->>MLflow: search runs for params.test_run_id
alt existing run
Python->>Python: skip logging (deduplicate)
else new run
Python->>MLflow: create/run experiment
Python->>FS: read optional CSV/logs/vllm-metrics
Python->>MLflow: log artifacts, params, tags
Python->>MLflow: log aggregate and per-load-point metrics
Python->>MLflow: log server-side metrics if present
end
Python-->>Playbook: exit status
Playbook->>Playbook: aggregate processed/success/failed counts
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
automation/test-execution/ansible/scripts/mlflow-quick-log.sh (1)
140-140: 💤 Low valueGNU
find-printfmay not work on macOS.The
-printfoption is GNU-specific and unavailable on macOS's BSDfind. Consider usingstatwith portable options or documenting the Linux requirement.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/scripts/mlflow-quick-log.sh` at line 140, The current LATEST_BENCH assignment uses GNU-only find -printf (line using LATEST_BENCH), which breaks on macOS; change it to a portable approach by replacing the find ... -printf pipeline with a find ... -print0 | xargs -0 stat pipeline and add a small platform switch: for Linux use stat -c '%Y %n' and for macOS (Darwin) use stat -f '%m %N', then sort by timestamp and pick the newest file into LATEST_BENCH; update the mlflow-quick-log.sh snippet that sets LATEST_BENCH to use this cross-platform pipeline (or alternatively document that the script requires GNU find) so the script works on both Linux and macOS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 220-223: In the import_playbook vars block replace default(omit)
with default(None) for the variables vllm_cpu_start, vllm_numa_node,
guidellm_cpus, and guidellm_numa_node (also apply same change for the Phase 2
and Phase 3 occurrences), and then update downstream conditional checks in
llm-benchmark-auto.yml that currently use "when: vllm_cpu_start is defined" (and
similar) to use "is not none" (e.g., "when: vllm_cpu_start is not none") and
ensure any int() conversions are only executed when the value is not none to
avoid conversion errors.
In `@automation/test-execution/ansible/scripts/log_to_mlflow.py`:
- Around line 472-476: The three print statements that start with print(f"⚠️
Test already logged to MLflow") / print(f" Existing Run ID: {existing_run_id}")
/ print(f" Existing Run Name: {existing_run_name}") (and a similar occurrence
around line 540) use the f-string prefix when some of them contain no
placeholders; remove the unnecessary leading f from any string literal that has
no interpolation (leave the f only where variables are actually formatted),
e.g., update the print calls referencing existing_run_id and existing_run_name
to use regular string literals for static text and keep variable interpolation
only where needed so prints use correct plain strings without redundant
f-prefixes.
In `@automation/test-execution/ansible/scripts/mlflow-quick-log.sh`:
- Line 184: The echo uses nested command substitutions with basename and dirname
but the dirname output isn't quoted, which can break on paths with spaces;
change the substitution to quote the dirname result like $(basename "$(dirname
"$bench")") (i.e., ensure dirname "$bench" is wrapped in quotes and passed as a
single argument to basename) so the Logging line becomes robust for paths
containing spaces.
In `@automation/test-execution/mlflow/docker-compose.yml`:
- Line 1: The YAML linter is failing on the compose file's version string;
update the top-level version entry in
automation/test-execution/mlflow/docker-compose.yml (the version key) to use a
double-quoted string (e.g., change the current version value to "3.8") so it
satisfies the repository's quoted-strings rule and fixes the CI lint failure.
- Around line 5-9: Replace the non-reproducible and insecure image and root
user: change the image value (currently "ghcr.io/mlflow/mlflow:latest") to a
specific version tag like "ghcr.io/mlflow/mlflow:v3.10.1-full" or, better, pin
to the GHCR image digest (sha256:...) to ensure reproducible deployments, and
remove or change the user field (currently "user: \"0:0\"") to a non-root
UID:GID such as "1000:1000" (or omit the user line if the container image
defaults to a non-root user); update the docker-compose service that uses
image/container_name to apply these changes.
In `@automation/test-execution/mlflow/requirements.txt`:
- Line 1: Update the mlflow requirement to exclude vulnerable releases by
changing the version spec in automation/test-execution/mlflow/requirements.txt
from an open lower-bound (mlflow>=2.17.0) to a pinned safe range (e.g.,
mlflow>=3.11.1) so package installs will avoid versions <=3.10.1 that contain
the reported vulnerabilities; ensure the new spec is applied in the same
requirements entry so CI and dependency installers pick up the safer minimum
version.
In `@automation/test-execution/mlflow/stop-mlflow.sh`:
- Line 39: The script uses cd "$SCRIPT_DIR" without checking for failure, so if
the directory change fails subsequent commands (e.g., $COMPOSE_CMD down) run in
the wrong location; update stop-mlflow.sh to verify the cd succeeded by testing
its exit status and aborting with a clear error message (or use a safe
pushd/popd and handle failure) before running $COMPOSE_CMD or any other commands
that assume being in $SCRIPT_DIR.
---
Nitpick comments:
In `@automation/test-execution/ansible/scripts/mlflow-quick-log.sh`:
- Line 140: The current LATEST_BENCH assignment uses GNU-only find -printf (line
using LATEST_BENCH), which breaks on macOS; change it to a portable approach by
replacing the find ... -printf pipeline with a find ... -print0 | xargs -0 stat
pipeline and add a small platform switch: for Linux use stat -c '%Y %n' and for
macOS (Darwin) use stat -f '%m %N', then sort by timestamp and pick the newest
file into LATEST_BENCH; update the mlflow-quick-log.sh snippet that sets
LATEST_BENCH to use this cross-platform pipeline (or alternatively document that
the script requires GNU find) so the script works on both Linux and macOS.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: e978dd99-3c0e-432d-bea7-1e941c96bd8a
📒 Files selected for processing (17)
.gitignore_config.ymlautomation/test-execution/ansible/examples/mlflow-logging-examples.shautomation/test-execution/ansible/llm-benchmark-concurrent-load.ymlautomation/test-execution/ansible/log-to-mlflow.ymlautomation/test-execution/ansible/scripts/log_to_mlflow.pyautomation/test-execution/ansible/scripts/mlflow-quick-log.shautomation/test-execution/mlflow/README.mdautomation/test-execution/mlflow/docker-compose.ymlautomation/test-execution/mlflow/launch-mlflow.shautomation/test-execution/mlflow/requirements.txtautomation/test-execution/mlflow/stop-mlflow.shdocs/mlflow.mdrequirements.txtresults/README-cpu-results-psap-conversion.mdresults/scripts/batch_convert_results.pyresults/scripts/import_manual_runs_json_cpu.py
2fde3db to
d03f49e
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
automation/test-execution/ansible/log-to-mlflow.yml (1)
100-117: ⚡ Quick winPrefer
argvform forcommandto avoid path-splitting fragility.
ansible.builtin.commandwithcmd: >-joins everything into one string and re-tokenizes via shlex. Any benchmark/metadata path containing whitespace, or a templatedexperiment_name/run_namecontaining quotes, will split into multiple args and silently corrupt the call. Using theargvlist form makes argument boundaries explicit and removes the need for conditional Jinja blocks. The same applies to the batch task at lines 177-190.♻️ Suggested refactor (single-test task)
- - name: Log test to MLflow - ansible.builtin.command: - cmd: >- - python3 {{ playbook_dir }}/scripts/log_to_mlflow.py - {{ benchmarks_file }} - {{ metadata_file }} - {% if experiment_name is defined %}-e "{{ experiment_name }}"{% endif %} - {% if run_name is defined %}-r "{{ run_name }}"{% endif %} - {% if mlflow_tracking_uri is defined %}-u "{{ mlflow_tracking_uri }}"{% endif %} - {% if log_per_load_point | default(false) %}--log-per-load-point{% endif %} + - name: Log test to MLflow + ansible.builtin.command: + argv: "{{ ['python3', playbook_dir ~ '/scripts/log_to_mlflow.py', benchmarks_file, metadata_file] + + (['-e', experiment_name] if experiment_name is defined else []) + + (['-r', run_name] if run_name is defined else []) + + (['-u', mlflow_tracking_uri] if mlflow_tracking_uri is defined else []) + + (['--log-per-load-point'] if (log_per_load_point | default(false)) else []) }}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@automation/test-execution/ansible/log-to-mlflow.yml` around lines 100 - 117, The task "Log test to MLflow" currently uses ansible.builtin.command with a single cmd string which is fragile to spaces/quotes; change it to the argv (list) form so each argument is an explicit list element, build the args for calling python3 and "{{ playbook_dir }}/scripts/log_to_mlflow.py" followed by "{{ benchmarks_file }}" and "{{ metadata_file }}", and append optional elements only when defined (e.g., experiment_name -> "-e", run_name -> "-r", mlflow_tracking_uri -> "-u", and "--log-per-load-point" when log_per_load_point is true) to avoid shell re-tokenization; keep the environment MLFLOW_TRACKING_URI, register: mlflow_log_result, changed_when: false and the same when conditions, and apply the same argv refactor to the batch task that calls the same script in the other block.automation/test-execution/mlflow/launch-mlflow.sh (1)
73-84: ⚡ Quick winDead error branch under
set -e.Because
set -eis enabled (line 9),python3 -m venv "$SCRIPT_DIR/venv"will already abort the script on failure, so theelsebranch printing "Failed to create virtual environment" is unreachable. Same pattern repeats at lines 124-131 for$COMPOSE_CMD up -d. Either drop the redundantif [ $? -eq 0 ]checks and rely onset -e, or invoke the command in a controlledif !form so the failure branch can actually run.♻️ Suggested simplification
-if [ ! -d "$SCRIPT_DIR/venv" ]; then - echo "Creating virtual environment..." - python3 -m venv "$SCRIPT_DIR/venv" - if [ $? -eq 0 ]; then - echo -e "${GREEN}✓ Virtual environment created${NC}" - else - echo -e "${RED}✗ Failed to create virtual environment${NC}" - exit 1 - fi -else +if [ ! -d "$SCRIPT_DIR/venv" ]; then + echo "Creating virtual environment..." + if ! python3 -m venv "$SCRIPT_DIR/venv"; then + echo -e "${RED}✗ Failed to create virtual environment${NC}" + exit 1 + fi + echo -e "${GREEN}✓ Virtual environment created${NC}" +else echo -e "${GREEN}✓ Virtual environment exists${NC}" fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@automation/test-execution/mlflow/launch-mlflow.sh` around lines 73 - 84, The error-handling branches after commands are dead because the script uses set -e; update the venv and compose blocks to use explicit conditional forms so the failure branches can run: replace the current pattern that runs python3 -m venv "$SCRIPT_DIR/venv" followed by if [ $? -eq 0 ] with a single if ! python3 -m venv "$SCRIPT_DIR/venv"; then echo error and exit 1; fi (or simply run python3 -m venv without the redundant if and rely on set -e), and do the same for the $COMPOSE_CMD up -d invocation so the failure message and exit are reachable (reference: "$SCRIPT_DIR/venv", python3 -m venv, and "$COMPOSE_CMD up -d").
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-auto.yml`:
- Line 139: The when clause uses "is not none" which does not guard against
undefined Jinja2 variables; update the condition to check both defined and not
none, e.g. change "when: vllm_cpu_start is not none" to "when: vllm_cpu_start is
defined and vllm_cpu_start is not none", and apply the same pattern to the other
similar when conditions (the other vllm_* variables referenced in the nearby
assert tasks) so each uses "is defined and is not none".
In `@automation/test-execution/ansible/scripts/log_to_mlflow.py`:
- Around line 463-467: The MLflow filter string interpolates
metadata.get('test_run_id', 'unknown') directly into search_runs
(mlflow.search_runs) which will break if the test_run_id contains a single
quote; sanitize/escape the value before building filter_string (e.g., replace
any single-quote characters with two single-quotes or otherwise escape per
MLflow filter quoting rules) or validate/deny quotes, then use the sanitized
value in the filter_string f"params.test_run_id = '{sanitized_test_run_id}'" to
avoid MlflowException and ensure dedup works.
In `@automation/test-execution/ansible/scripts/mlflow-quick-log.sh`:
- Around line 184-202: The postfix arithmetic increment ((COUNT++)) in the
mlflow-quick-log.sh "today" branch (inside the while loop that reads
benchmarks.json) returns 0 on the first iteration and causes the script to exit
under set -e; change the increment to a form that yields a non-zero exit status
such as prefix increment ((++COUNT)) or an explicit arithmetic assignment
COUNT=$((COUNT+1)) so the loop can continue and the final echo in the "today"
case reports the correct count.
In `@results/scripts/batch_convert_results.py`:
- Around line 61-74: Add a timeout to the subprocess.run call to avoid
indefinite blocking: in the block that invokes subprocess.run with cmd (and
returns True on success), pass a sensible timeout value (e.g., timeout=300) and
add an except handler for subprocess.TimeoutExpired to log the timeout
(including benchmarks_json and any partial output) and return False; keep the
existing subprocess.CalledProcessError handling intact.
---
Nitpick comments:
In `@automation/test-execution/ansible/log-to-mlflow.yml`:
- Around line 100-117: The task "Log test to MLflow" currently uses
ansible.builtin.command with a single cmd string which is fragile to
spaces/quotes; change it to the argv (list) form so each argument is an explicit
list element, build the args for calling python3 and "{{ playbook_dir
}}/scripts/log_to_mlflow.py" followed by "{{ benchmarks_file }}" and "{{
metadata_file }}", and append optional elements only when defined (e.g.,
experiment_name -> "-e", run_name -> "-r", mlflow_tracking_uri -> "-u", and
"--log-per-load-point" when log_per_load_point is true) to avoid shell
re-tokenization; keep the environment MLFLOW_TRACKING_URI, register:
mlflow_log_result, changed_when: false and the same when conditions, and apply
the same argv refactor to the batch task that calls the same script in the other
block.
In `@automation/test-execution/mlflow/launch-mlflow.sh`:
- Around line 73-84: The error-handling branches after commands are dead because
the script uses set -e; update the venv and compose blocks to use explicit
conditional forms so the failure branches can run: replace the current pattern
that runs python3 -m venv "$SCRIPT_DIR/venv" followed by if [ $? -eq 0 ] with a
single if ! python3 -m venv "$SCRIPT_DIR/venv"; then echo error and exit 1; fi
(or simply run python3 -m venv without the redundant if and rely on set -e), and
do the same for the $COMPOSE_CMD up -d invocation so the failure message and
exit are reachable (reference: "$SCRIPT_DIR/venv", python3 -m venv, and
"$COMPOSE_CMD up -d").
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: baca9071-312e-4255-af19-5a6f61bc7f72
📒 Files selected for processing (18)
.gitignore_config.ymlautomation/test-execution/ansible/examples/mlflow-logging-examples.shautomation/test-execution/ansible/llm-benchmark-auto.ymlautomation/test-execution/ansible/llm-benchmark-concurrent-load.ymlautomation/test-execution/ansible/log-to-mlflow.ymlautomation/test-execution/ansible/scripts/log_to_mlflow.pyautomation/test-execution/ansible/scripts/mlflow-quick-log.shautomation/test-execution/mlflow/README.mdautomation/test-execution/mlflow/docker-compose.ymlautomation/test-execution/mlflow/launch-mlflow.shautomation/test-execution/mlflow/requirements.txtautomation/test-execution/mlflow/stop-mlflow.shdocs/mlflow.mdrequirements.txtresults/README-cpu-results-psap-conversion.mdresults/scripts/batch_convert_results.pyresults/scripts/import_manual_runs_json_cpu.py
✅ Files skipped from review due to trivial changes (7)
- requirements.txt
- _config.yml
- .gitignore
- automation/test-execution/mlflow/docker-compose.yml
- results/README-cpu-results-psap-conversion.md
- automation/test-execution/mlflow/README.md
- docs/mlflow.md
🚧 Files skipped from review as they are similar to previous changes (2)
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- results/scripts/import_manual_runs_json_cpu.py
This adds tooling to convert CPU-based guidellm benchmark results to the format required by the OpenShift PSAP performance dashboard. Changes: - Add import_manual_runs_json_cpu.py: CPU-specific converter adapted from the dashboard's import_manual_runs_json_v2.py with CPU-specific fields (core_count, cpuset_cpus, cpuset_mems, omp_num_threads, tpot_mean) - Add batch_convert_results.py: Batch processor for all results in results/llm/ directory - Add comprehensive documentation in results/README-cpu-results-psap-conversion.md - Add requirements.txt for pandas dependency - Update .gitignore to track conversion scripts while ignoring result data The scripts auto-detect CPU configuration from test-metadata.json files and generate CSV output compatible with the performance dashboard schema. Successfully converted 47 test configurations (208 benchmark runs) with 100% success rate. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Integrate MLflow for comprehensive experiment tracking and comparison of LLM performance benchmarks with both client and server metrics. Features: - One-command MLflow server setup with Docker/Podman support - Automated logging of benchmark results to MLflow - Client-side metrics from GuideLLM (throughput, latency, efficiency) - Server-side metrics from vLLM (resource usage, cache hit rates, queue times) - Batch import capabilities with filters (model, workload, date) - Deduplication and deleted experiment recovery - GitHub Pages documentation Components: - MLflow tracking server (Docker Compose with SQLite backend) - Python logging script with artifact upload - Ansible playbook for single/batch logging - Helper scripts for quick imports - Virtual environment setup matching Streamlit dashboard pattern Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
- Replace default(omit) with default(None) for socket pinning vars - Update conditionals to use 'is not none' for proper null checks - Remove unnecessary f-string prefixes from static strings - Quote shell command substitutions for paths with spaces - Make find command portable across Linux and macOS - Pin MLflow image to v3.11.1 and upgrade requirements - Change docker-compose to run as non-root user (1000:1000) - Add error handling for cd command in stop-mlflow.sh - Fix markdown heading formatting (remove trailing colons) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
…riptions Improve MLflow experiment organization and usability: - Implement parent/child run structure for load sweep tests - Enable per-load-point metrics logging by default - Auto-generate run descriptions with test context - Create child runs for each load point with dedicated metrics - Add load point tags for easier filtering and comparison - Fix code formatting to comply with linting standards This provides better experiment organization in the MLflow UI, making it easier to compare metrics across load points and different test configurations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
- Fix undefined variable errors in Ansible when clauses by adding 'is defined' checks - Sanitize test_run_id to prevent MLflow filter string injection - Fix postfix increment under set -e in mlflow-quick-log.sh - Add timeout to subprocess.run in batch_convert_results.py to prevent indefinite blocking Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
- Add MLflow quick link in main docs index - Add Results Analysis section with MLflow, Dashboards, and Metrics - Update documentation status table - Fix getting-started link path The MLflow documentation already exists at docs/mlflow.md but wasn't discoverable from the main documentation index. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
depends on #117
Summary by CodeRabbit
New Features
Documentation
Chores