Mlflow by maryamtahhan · Pull Request #119 · redhat-et/vllm-cpu-perf-eval

maryamtahhan · 2026-05-01T13:24:14Z

depends on #117

Summary by CodeRabbit

New Features
- MLflow experiment-tracking support: server launch/stop, Docker compose, CLI and Ansible-based import workflows, and a Python logging tool for ingesting benchmark runs
- CPU benchmark conversion utilities and a batch converter for generating standardized CSV outputs
Documentation
- End-to-end MLflow guide and quick-starts; added navigation link for MLflow docs
- CPU conversion usage guide and examples
Chores
- Updated .gitignore to narrowly manage results and MLflow artifacts
- Added pandas dependency for processing results

coderabbitai · 2026-05-01T13:24:28Z

Warning

Review limit reached

@maryamtahhan, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 28 minutes and 55 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 93410c64-d132-4675-a39c-8fb063bce96c

📥 Commits

Reviewing files that changed from the base of the PR and between d03f49e and 0c8cb98.

📒 Files selected for processing (17)

.gitignore
_config.yml
automation/test-execution/ansible/examples/mlflow-logging-examples.sh
automation/test-execution/ansible/llm-benchmark-auto.yml
automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
automation/test-execution/ansible/log-to-mlflow.yml
automation/test-execution/ansible/scripts/log_to_mlflow.py
automation/test-execution/ansible/scripts/mlflow-quick-log.sh
automation/test-execution/mlflow/README.md
automation/test-execution/mlflow/docker-compose.yml
automation/test-execution/mlflow/launch-mlflow.sh
automation/test-execution/mlflow/requirements.txt
automation/test-execution/mlflow/stop-mlflow.sh
docs/docs.md
docs/mlflow.md
results/convert.md
results/scripts/convert_batch.py

📝 Walkthrough

Walkthrough

Adds MLflow experiment-tracking support (server compose, launch/stop, venv deps), Ansible playbook + Python CLI to log LLM benchmark results (single and batch modes), helper shell tooling and docs, plus CPU benchmark JSON→CSV conversion scripts and small repo config/site navigation updates.

Changes

MLflow Tracking & Logging

Layer / File(s)	Summary
Configuration / Data Shape `automation/test-execution/mlflow/docker-compose.yml`, `automation/test-execution/mlflow/requirements.txt`, `automation/test-execution/mlflow/README.md`, `automation/test-execution/mlflow/launch-mlflow.sh`	Adds MLflow server compose, mlflow dependency (>=3.11.1), launch script that prepares venv/installs MLflow and waits for health endpoint, and README with quick start and operational notes.
Core Implementation `automation/test-execution/ansible/scripts/log_to_mlflow.py`	New Python CLI implementing MLflow logging: loads `benchmarks.json`/`test-metadata.json`, deduplicates by `test_run_id`, creates/restores experiments, logs artifacts, params, tags, aggregate and per-load-point metrics, and optional server-side vLLM metrics; exposes public helper functions.
Wiring / Orchestration `automation/test-execution/ansible/log-to-mlflow.yml`, `automation/test-execution/ansible/scripts/mlflow-quick-log.sh`, `automation/test-execution/ansible/examples/mlflow-logging-examples.sh`, `automation/test-execution/mlflow/stop-mlflow.sh`	Adds Ansible playbook that validates MLflow availability, runs single-test or batch-import flows (with model/workload filtering), prints summaries, and a bash helper to invoke the playbook in `latest
Playbook Integration Adjustments `automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`, `automation/test-execution/ansible/llm-benchmark-auto.yml`	Adjusts how socket-pinning passthrough variables are passed (`
Documentation / Tests `docs/mlflow.md`, `automation/test-execution/mlflow/README.md`	Comprehensive MLflow guide added documenting launch/import flows, expected metrics/tags/artifacts, ingestion options, operational management, troubleshooting, and examples for the new tooling.

CPU Benchmark Conversion

Layer / File(s)	Summary
Data Shape / Parsing `results/scripts/import_manual_runs_json_cpu.py`, `results/README-cpu-results-psap-conversion.md`	Adds CPU-specific JSON importer that parses guidellm benchmark JSON and optional `test-metadata.json`, extracts throughput/latency/concurrency and CPU-affinity/threading fields, documents expected CSV schema and usage.
Batch Orchestration `results/scripts/batch_convert_results.py`	Adds batch scanner that finds `benchmarks.json`/`test-metadata.json` pairs under `results/llm` and invokes the CPU importer, aggregates progress and exits non-zero on failures.
Output / Docs `results/README-cpu-results-psap-conversion.md`, `results/scripts/*`	Docs and CLI behavior described; CSV append/update semantics and field-order enforcement implemented in importer.

Repository Configuration & Site Navigation

Layer / File(s)	Summary
Repo ignores & artifacts `.gitignore`	Replaces broad `results/` ignore with targeted ignores (`results/llm`, `results/.csv`), unignores `results/README.md` and `results/scripts/`, and adds ignores for MLflow artifacts/DB: `mlruns/`, `mlflow.db`, `.db-journal`, and `automation/test-execution/mlflow/artifacts/`.
Site nav / Dependencies `_config.yml`, `requirements.txt`	Adds "MLflow Tracking" nav link to site config and adds `pandas>=2.0.0` to top-level requirements.

Sequence Diagram(s)

sequenceDiagram
    participant Playbook as Ansible Playbook
    participant Python as log_to_mlflow.py
    participant MLflow as MLflow Server
    participant FS as Filesystem

    Playbook->>Playbook: determine mlflow_tracking_uri / mode
    Playbook->>Playbook: verify mlflow import available
    Playbook->>MLflow: optional health check

    alt Batch import
        Playbook->>FS: find benchmarks.json under results_base
        Playbook->>Playbook: apply model/workload filters
        loop per matched benchmark
            Playbook->>FS: locate sibling test-metadata.json
            Playbook->>Python: exec log_to_mlflow.py with paths
        end
    else Single import
        Playbook->>FS: assert provided files exist
        Playbook->>Python: exec log_to_mlflow.py with single paths
    end

    Python->>FS: load benchmarks.json and test-metadata.json
    Python->>MLflow: search runs for params.test_run_id
    alt existing run
        Python->>Python: skip logging (deduplicate)
    else new run
        Python->>MLflow: create/run experiment
        Python->>FS: read optional CSV/logs/vllm-metrics
        Python->>MLflow: log artifacts, params, tags
        Python->>MLflow: log aggregate and per-load-point metrics
        Python->>MLflow: log server-side metrics if present
    end

    Python-->>Playbook: exit status
    Playbook->>Playbook: aggregate processed/success/failed counts

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The pull request title 'Mlflow' is vague and generic, using a single non-descriptive term that does not convey meaningful information about the changeset.	Replace the generic title with a specific, descriptive phrase that summarizes the main change, such as 'Add MLflow experiment tracking integration' or 'Integrate MLflow for benchmark result tracking'.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

automation/test-execution/ansible/scripts/mlflow-quick-log.sh (1)

140-140: 💤 Low value

GNU find -printf may not work on macOS.

The -printf option is GNU-specific and unavailable on macOS's BSD find. Consider using stat with portable options or documenting the Linux requirement.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/scripts/mlflow-quick-log.sh` at line 140,
The current LATEST_BENCH assignment uses GNU-only find -printf (line using
LATEST_BENCH), which breaks on macOS; change it to a portable approach by
replacing the find ... -printf pipeline with a find ... -print0 | xargs -0 stat
pipeline and add a small platform switch: for Linux use stat -c '%Y %n' and for
macOS (Darwin) use stat -f '%m %N', then sort by timestamp and pick the newest
file into LATEST_BENCH; update the mlflow-quick-log.sh snippet that sets
LATEST_BENCH to use this cross-platform pipeline (or alternatively document that
the script requires GNU find) so the script works on both Linux and macOS.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 220-223: In the import_playbook vars block replace default(omit)
with default(None) for the variables vllm_cpu_start, vllm_numa_node,
guidellm_cpus, and guidellm_numa_node (also apply same change for the Phase 2
and Phase 3 occurrences), and then update downstream conditional checks in
llm-benchmark-auto.yml that currently use "when: vllm_cpu_start is defined" (and
similar) to use "is not none" (e.g., "when: vllm_cpu_start is not none") and
ensure any int() conversions are only executed when the value is not none to
avoid conversion errors.

In `@automation/test-execution/ansible/scripts/log_to_mlflow.py`:
- Around line 472-476: The three print statements that start with print(f"⚠️ 
Test already logged to MLflow") / print(f"  Existing Run ID: {existing_run_id}")
/ print(f"  Existing Run Name: {existing_run_name}") (and a similar occurrence
around line 540) use the f-string prefix when some of them contain no
placeholders; remove the unnecessary leading f from any string literal that has
no interpolation (leave the f only where variables are actually formatted),
e.g., update the print calls referencing existing_run_id and existing_run_name
to use regular string literals for static text and keep variable interpolation
only where needed so prints use correct plain strings without redundant
f-prefixes.

In `@automation/test-execution/ansible/scripts/mlflow-quick-log.sh`:
- Line 184: The echo uses nested command substitutions with basename and dirname
but the dirname output isn't quoted, which can break on paths with spaces;
change the substitution to quote the dirname result like $(basename "$(dirname
"$bench")") (i.e., ensure dirname "$bench" is wrapped in quotes and passed as a
single argument to basename) so the Logging line becomes robust for paths
containing spaces.

In `@automation/test-execution/mlflow/docker-compose.yml`:
- Line 1: The YAML linter is failing on the compose file's version string;
update the top-level version entry in
automation/test-execution/mlflow/docker-compose.yml (the version key) to use a
double-quoted string (e.g., change the current version value to "3.8") so it
satisfies the repository's quoted-strings rule and fixes the CI lint failure.
- Around line 5-9: Replace the non-reproducible and insecure image and root
user: change the image value (currently "ghcr.io/mlflow/mlflow:latest") to a
specific version tag like "ghcr.io/mlflow/mlflow:v3.10.1-full" or, better, pin
to the GHCR image digest (sha256:...) to ensure reproducible deployments, and
remove or change the user field (currently "user: \"0:0\"") to a non-root
UID:GID such as "1000:1000" (or omit the user line if the container image
defaults to a non-root user); update the docker-compose service that uses
image/container_name to apply these changes.

In `@automation/test-execution/mlflow/requirements.txt`:
- Line 1: Update the mlflow requirement to exclude vulnerable releases by
changing the version spec in automation/test-execution/mlflow/requirements.txt
from an open lower-bound (mlflow>=2.17.0) to a pinned safe range (e.g.,
mlflow>=3.11.1) so package installs will avoid versions <=3.10.1 that contain
the reported vulnerabilities; ensure the new spec is applied in the same
requirements entry so CI and dependency installers pick up the safer minimum
version.

In `@automation/test-execution/mlflow/stop-mlflow.sh`:
- Line 39: The script uses cd "$SCRIPT_DIR" without checking for failure, so if
the directory change fails subsequent commands (e.g., $COMPOSE_CMD down) run in
the wrong location; update stop-mlflow.sh to verify the cd succeeded by testing
its exit status and aborting with a clear error message (or use a safe
pushd/popd and handle failure) before running $COMPOSE_CMD or any other commands
that assume being in $SCRIPT_DIR.

---

Nitpick comments:
In `@automation/test-execution/ansible/scripts/mlflow-quick-log.sh`:
- Line 140: The current LATEST_BENCH assignment uses GNU-only find -printf (line
using LATEST_BENCH), which breaks on macOS; change it to a portable approach by
replacing the find ... -printf pipeline with a find ... -print0 | xargs -0 stat
pipeline and add a small platform switch: for Linux use stat -c '%Y %n' and for
macOS (Darwin) use stat -f '%m %N', then sort by timestamp and pick the newest
file into LATEST_BENCH; update the mlflow-quick-log.sh snippet that sets
LATEST_BENCH to use this cross-platform pipeline (or alternatively document that
the script requires GNU find) so the script works on both Linux and macOS.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: e978dd99-3c0e-432d-bea7-1e941c96bd8a

📥 Commits

Reviewing files that changed from the base of the PR and between 33ee047 and 640ad8e.

📒 Files selected for processing (17)

.gitignore
_config.yml
automation/test-execution/ansible/examples/mlflow-logging-examples.sh
automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
automation/test-execution/ansible/log-to-mlflow.yml
automation/test-execution/ansible/scripts/log_to_mlflow.py
automation/test-execution/ansible/scripts/mlflow-quick-log.sh
automation/test-execution/mlflow/README.md
automation/test-execution/mlflow/docker-compose.yml
automation/test-execution/mlflow/launch-mlflow.sh
automation/test-execution/mlflow/requirements.txt
automation/test-execution/mlflow/stop-mlflow.sh
docs/mlflow.md
requirements.txt
results/README-cpu-results-psap-conversion.md
results/scripts/batch_convert_results.py
results/scripts/import_manual_runs_json_cpu.py

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

automation/test-execution/ansible/log-to-mlflow.yml (1)

100-117: ⚡ Quick win

Prefer argv form for command to avoid path-splitting fragility.

ansible.builtin.command with cmd: >- joins everything into one string and re-tokenizes via shlex. Any benchmark/metadata path containing whitespace, or a templated experiment_name/run_name containing quotes, will split into multiple args and silently corrupt the call. Using the argv list form makes argument boundaries explicit and removes the need for conditional Jinja blocks. The same applies to the batch task at lines 177-190.

♻️ Suggested refactor (single-test task)

-    - name: Log test to MLflow
-      ansible.builtin.command:
-        cmd: >-
-          python3 {{ playbook_dir }}/scripts/log_to_mlflow.py
-          {{ benchmarks_file }}
-          {{ metadata_file }}
-          {% if experiment_name is defined %}-e "{{ experiment_name }}"{% endif %}
-          {% if run_name is defined %}-r "{{ run_name }}"{% endif %}
-          {% if mlflow_tracking_uri is defined %}-u "{{ mlflow_tracking_uri }}"{% endif %}
-          {% if log_per_load_point | default(false) %}--log-per-load-point{% endif %}
+    - name: Log test to MLflow
+      ansible.builtin.command:
+        argv: "{{ ['python3', playbook_dir ~ '/scripts/log_to_mlflow.py', benchmarks_file, metadata_file]
+                 + (['-e', experiment_name] if experiment_name is defined else [])
+                 + (['-r', run_name] if run_name is defined else [])
+                 + (['-u', mlflow_tracking_uri] if mlflow_tracking_uri is defined else [])
+                 + (['--log-per-load-point'] if (log_per_load_point | default(false)) else []) }}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@automation/test-execution/ansible/log-to-mlflow.yml` around lines 100 - 117,
The task "Log test to MLflow" currently uses ansible.builtin.command with a
single cmd string which is fragile to spaces/quotes; change it to the argv
(list) form so each argument is an explicit list element, build the args for
calling python3 and "{{ playbook_dir }}/scripts/log_to_mlflow.py" followed by
"{{ benchmarks_file }}" and "{{ metadata_file }}", and append optional elements
only when defined (e.g., experiment_name -> "-e", run_name -> "-r",
mlflow_tracking_uri -> "-u", and "--log-per-load-point" when log_per_load_point
is true) to avoid shell re-tokenization; keep the environment
MLFLOW_TRACKING_URI, register: mlflow_log_result, changed_when: false and the
same when conditions, and apply the same argv refactor to the batch task that
calls the same script in the other block.

automation/test-execution/mlflow/launch-mlflow.sh (1)

73-84: ⚡ Quick win

Dead error branch under set -e.

Because set -e is enabled (line 9), python3 -m venv "$SCRIPT_DIR/venv" will already abort the script on failure, so the else branch printing "Failed to create virtual environment" is unreachable. Same pattern repeats at lines 124-131 for $COMPOSE_CMD up -d. Either drop the redundant if [ $? -eq 0 ] checks and rely on set -e, or invoke the command in a controlled if ! form so the failure branch can actually run.

♻️ Suggested simplification

-if [ ! -d "$SCRIPT_DIR/venv" ]; then
-    echo "Creating virtual environment..."
-    python3 -m venv "$SCRIPT_DIR/venv"
-    if [ $? -eq 0 ]; then
-        echo -e "${GREEN}✓ Virtual environment created${NC}"
-    else
-        echo -e "${RED}✗ Failed to create virtual environment${NC}"
-        exit 1
-    fi
-else
+if [ ! -d "$SCRIPT_DIR/venv" ]; then
+    echo "Creating virtual environment..."
+    if ! python3 -m venv "$SCRIPT_DIR/venv"; then
+        echo -e "${RED}✗ Failed to create virtual environment${NC}"
+        exit 1
+    fi
+    echo -e "${GREEN}✓ Virtual environment created${NC}"
+else
     echo -e "${GREEN}✓ Virtual environment exists${NC}"
 fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@automation/test-execution/mlflow/launch-mlflow.sh` around lines 73 - 84, The
error-handling branches after commands are dead because the script uses set -e;
update the venv and compose blocks to use explicit conditional forms so the
failure branches can run: replace the current pattern that runs python3 -m venv
"$SCRIPT_DIR/venv" followed by if [ $? -eq 0 ] with a single if ! python3 -m
venv "$SCRIPT_DIR/venv"; then echo error and exit 1; fi (or simply run python3
-m venv without the redundant if and rely on set -e), and do the same for the
$COMPOSE_CMD up -d invocation so the failure message and exit are reachable
(reference: "$SCRIPT_DIR/venv", python3 -m venv, and "$COMPOSE_CMD up -d").

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-auto.yml`:
- Line 139: The when clause uses "is not none" which does not guard against
undefined Jinja2 variables; update the condition to check both defined and not
none, e.g. change "when: vllm_cpu_start is not none" to "when: vllm_cpu_start is
defined and vllm_cpu_start is not none", and apply the same pattern to the other
similar when conditions (the other vllm_* variables referenced in the nearby
assert tasks) so each uses "is defined and is not none".

In `@automation/test-execution/ansible/scripts/log_to_mlflow.py`:
- Around line 463-467: The MLflow filter string interpolates
metadata.get('test_run_id', 'unknown') directly into search_runs
(mlflow.search_runs) which will break if the test_run_id contains a single
quote; sanitize/escape the value before building filter_string (e.g., replace
any single-quote characters with two single-quotes or otherwise escape per
MLflow filter quoting rules) or validate/deny quotes, then use the sanitized
value in the filter_string f"params.test_run_id = '{sanitized_test_run_id}'" to
avoid MlflowException and ensure dedup works.

In `@automation/test-execution/ansible/scripts/mlflow-quick-log.sh`:
- Around line 184-202: The postfix arithmetic increment ((COUNT++)) in the
mlflow-quick-log.sh "today" branch (inside the while loop that reads
benchmarks.json) returns 0 on the first iteration and causes the script to exit
under set -e; change the increment to a form that yields a non-zero exit status
such as prefix increment ((++COUNT)) or an explicit arithmetic assignment
COUNT=$((COUNT+1)) so the loop can continue and the final echo in the "today"
case reports the correct count.

In `@results/scripts/batch_convert_results.py`:
- Around line 61-74: Add a timeout to the subprocess.run call to avoid
indefinite blocking: in the block that invokes subprocess.run with cmd (and
returns True on success), pass a sensible timeout value (e.g., timeout=300) and
add an except handler for subprocess.TimeoutExpired to log the timeout
(including benchmarks_json and any partial output) and return False; keep the
existing subprocess.CalledProcessError handling intact.

---

Nitpick comments:
In `@automation/test-execution/ansible/log-to-mlflow.yml`:
- Around line 100-117: The task "Log test to MLflow" currently uses
ansible.builtin.command with a single cmd string which is fragile to
spaces/quotes; change it to the argv (list) form so each argument is an explicit
list element, build the args for calling python3 and "{{ playbook_dir
}}/scripts/log_to_mlflow.py" followed by "{{ benchmarks_file }}" and "{{
metadata_file }}", and append optional elements only when defined (e.g.,
experiment_name -> "-e", run_name -> "-r", mlflow_tracking_uri -> "-u", and
"--log-per-load-point" when log_per_load_point is true) to avoid shell
re-tokenization; keep the environment MLFLOW_TRACKING_URI, register:
mlflow_log_result, changed_when: false and the same when conditions, and apply
the same argv refactor to the batch task that calls the same script in the other
block.

In `@automation/test-execution/mlflow/launch-mlflow.sh`:
- Around line 73-84: The error-handling branches after commands are dead because
the script uses set -e; update the venv and compose blocks to use explicit
conditional forms so the failure branches can run: replace the current pattern
that runs python3 -m venv "$SCRIPT_DIR/venv" followed by if [ $? -eq 0 ] with a
single if ! python3 -m venv "$SCRIPT_DIR/venv"; then echo error and exit 1; fi
(or simply run python3 -m venv without the redundant if and rely on set -e), and
do the same for the $COMPOSE_CMD up -d invocation so the failure message and
exit are reachable (reference: "$SCRIPT_DIR/venv", python3 -m venv, and
"$COMPOSE_CMD up -d").

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: baca9071-312e-4255-af19-5a6f61bc7f72

📥 Commits

Reviewing files that changed from the base of the PR and between 640ad8e and d03f49e.

📒 Files selected for processing (18)

.gitignore
_config.yml
automation/test-execution/ansible/examples/mlflow-logging-examples.sh
automation/test-execution/ansible/llm-benchmark-auto.yml
automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
automation/test-execution/ansible/log-to-mlflow.yml
automation/test-execution/ansible/scripts/log_to_mlflow.py
automation/test-execution/ansible/scripts/mlflow-quick-log.sh
automation/test-execution/mlflow/README.md
automation/test-execution/mlflow/docker-compose.yml
automation/test-execution/mlflow/launch-mlflow.sh
automation/test-execution/mlflow/requirements.txt
automation/test-execution/mlflow/stop-mlflow.sh
docs/mlflow.md
requirements.txt
results/README-cpu-results-psap-conversion.md
results/scripts/batch_convert_results.py
results/scripts/import_manual_runs_json_cpu.py

✅ Files skipped from review due to trivial changes (7)

requirements.txt
_config.yml
.gitignore
automation/test-execution/mlflow/docker-compose.yml
results/README-cpu-results-psap-conversion.md
automation/test-execution/mlflow/README.md
docs/mlflow.md

🚧 Files skipped from review as they are similar to previous changes (2)

automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
results/scripts/import_manual_runs_json_cpu.py

This adds tooling to convert CPU-based guidellm benchmark results to the format required by the OpenShift PSAP performance dashboard. Changes: - Add import_manual_runs_json_cpu.py: CPU-specific converter adapted from the dashboard's import_manual_runs_json_v2.py with CPU-specific fields (core_count, cpuset_cpus, cpuset_mems, omp_num_threads, tpot_mean) - Add batch_convert_results.py: Batch processor for all results in results/llm/ directory - Add comprehensive documentation in results/README-cpu-results-psap-conversion.md - Add requirements.txt for pandas dependency - Update .gitignore to track conversion scripts while ignoring result data The scripts auto-detect CPU configuration from test-metadata.json files and generate CSV output compatible with the performance dashboard schema. Successfully converted 47 test configurations (208 benchmark runs) with 100% success rate. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

Integrate MLflow for comprehensive experiment tracking and comparison of LLM performance benchmarks with both client and server metrics. Features: - One-command MLflow server setup with Docker/Podman support - Automated logging of benchmark results to MLflow - Client-side metrics from GuideLLM (throughput, latency, efficiency) - Server-side metrics from vLLM (resource usage, cache hit rates, queue times) - Batch import capabilities with filters (model, workload, date) - Deduplication and deleted experiment recovery - GitHub Pages documentation Components: - MLflow tracking server (Docker Compose with SQLite backend) - Python logging script with artifact upload - Ansible playbook for single/batch logging - Helper scripts for quick imports - Virtual environment setup matching Streamlit dashboard pattern Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

- Replace default(omit) with default(None) for socket pinning vars - Update conditionals to use 'is not none' for proper null checks - Remove unnecessary f-string prefixes from static strings - Quote shell command substitutions for paths with spaces - Make find command portable across Linux and macOS - Pin MLflow image to v3.11.1 and upgrade requirements - Change docker-compose to run as non-root user (1000:1000) - Add error handling for cd command in stop-mlflow.sh - Fix markdown heading formatting (remove trailing colons) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

…riptions Improve MLflow experiment organization and usability: - Implement parent/child run structure for load sweep tests - Enable per-load-point metrics logging by default - Auto-generate run descriptions with test context - Create child runs for each load point with dedicated metrics - Add load point tags for easier filtering and comparison - Fix code formatting to comply with linting standards This provides better experiment organization in the MLflow UI, making it easier to compare metrics across load points and different test configurations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

- Fix undefined variable errors in Ansible when clauses by adding 'is defined' checks - Sanitize test_run_id to prevent MLflow filter string injection - Fix postfix increment under set -e in mlflow-quick-log.sh - Add timeout to subprocess.run in batch_convert_results.py to prevent indefinite blocking Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

- Add MLflow quick link in main docs index - Add Results Analysis section with MLflow, Dashboards, and Metrics - Update documentation status table - Fix getting-started link path The MLflow documentation already exists at docs/mlflow.md but wasn't discoverable from the main documentation index. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

maryamtahhan requested review from PoolPooer, jharriga and lmzuccarelli May 1, 2026 13:24

coderabbitai Bot reviewed May 1, 2026

View reviewed changes

maryamtahhan force-pushed the mlflow branch 4 times, most recently from 2fde3db to d03f49e Compare May 5, 2026 11:39

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

Comment thread automation/test-execution/ansible/llm-benchmark-auto.yml Outdated

Comment thread automation/test-execution/ansible/scripts/log_to_mlflow.py

Comment thread automation/test-execution/ansible/scripts/mlflow-quick-log.sh

Comment thread results/scripts/batch_convert_results.py

maryamtahhan and others added 5 commits May 27, 2026 09:39

maryamtahhan force-pushed the mlflow branch from 41f54c1 to efeca98 Compare May 27, 2026 08:40

maryamtahhan merged commit 8df1a72 into redhat-et:main May 27, 2026
7 checks passed

maryamtahhan deleted the mlflow branch May 27, 2026 08:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mlflow#119

Mlflow#119
maryamtahhan merged 6 commits into
redhat-et:mainfrom
maryamtahhan:mlflow

maryamtahhan commented May 1, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 1, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

maryamtahhan commented May 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

maryamtahhan commented May 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 1, 2026 •

edited

Loading