Conversation
CI Commands

The following CI workflows run automatically on every push and pull request:
The following commands can be used by maintainers to trigger additional tests that require access to secrets:
@luccabb could I get some feedback?
@theap06 we already support json to files at the file exporter, maybe add CSV shape support there?
https://facebookresearch.github.io/gcm/docs/GCM_Monitoring/exporters/file/
…research#87)

Add a new 'telemetry' sink that periodically appends telemetry snapshots to a local file in JSON or CSV format for offline analysis.

- JSON: NDJSON format (one object per line)
- CSV: header on first write, append rows
- Options: file_path (required), format (json|csv, default json)
- Works with nvml_monitor, slurm_monitor, and health checks

Closes facebookresearch#87
Made-with: Cursor
Force-pushed from 94e1f2c to 4f00cf7
added support for csv
Co-authored-by: lucca bertoncini <32229669+luccabb@users.noreply.github.com>
Force-pushed from 184a18e to 060fafe
- Add format option (json|csv) to file exporter
- CSV output uses flattened scuba message structure
- First write adds header row, subsequent writes append
- Update docs for both Monitoring and Health Checks

Made-with: Cursor
Force-pushed from 060fafe to 961b5b7
…p06/gcm into feat/telemetry-export-clean
@luccabb Could I get some feedback?
Claude finished @luccabb's task in 47s
Re-review: Schema-Change Fix Not Implemented

@theap06 Unfortunately the critical issue has not been fixed. The code still writes a new header mid-file when the schema changes, producing an invalid CSV.

**What the code still does (broken)**

```python
if previous_fieldnames != fieldnames:
    header_buf = io.StringIO()
    header_writer = csv.DictWriter(header_buf, fieldnames=all_keys, ...)
    header_writer.writeheader()
    logger.info(header_buf.getvalue())  # ← writes new header to the SAME file
    self._csv_fieldnames[path] = fieldnames
```

When the schema changes, a new header row is written into the same file via the same logger call.

**Test still validates the broken behavior**
```python
lines = path.read_text().splitlines()
assert lines == [
    "job_id,state,user",
    "1,RUNNING,alice",
    "gpu_uuid,memory_used_mb",  # ← still asserts mixed-header output as "correct"
    "GPU-123,2048",
]
```

The test explicitly asserts this broken output as expected behavior — nothing has changed here.

**What needs to happen**

When a schema change is detected, a new file should be created instead of inserting a header into the existing file. For example:
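A minimal sketch of that rollover idea (the helper name, state dict, and signatures here are hypothetical, not the PR's actual code):

```python
import csv
import os


def schema_versioned_path(path: str, schema_index: int) -> str:
    # Hypothetical helper: derive a fresh file path for each schema version.
    if schema_index == 0:
        return path
    stem, ext = os.path.splitext(path)
    return f"{stem}_{schema_index}{ext}"


def append_rows(path: str, rows: list, state: dict) -> None:
    """Append rows as CSV, rolling over to a new file when the schema changes."""
    fieldnames = tuple(sorted(rows[0].keys()))
    if state.get("fieldnames") not in (None, fieldnames):
        # Schema changed: bump the index so the next write opens a new file.
        state["index"] = state.get("index", 0) + 1
    state["fieldnames"] = fieldnames
    out = schema_versioned_path(path, state.get("index", 0))
    is_new_file = not os.path.exists(out)
    with open(out, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new_file:
            writer.writeheader()  # header only once, at the top of each file
        writer.writerows(rows)
```

With this shape, a schema change lands in a sibling `_1`-suffixed file while the original file remains a valid, independently parseable CSV.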
The test should then verify that two separate, valid CSV files are created — each parseable independently.
@claude made the changes. Can you review my pr?
Claude finished @luccabb's task in 54s
Re-review: Schema-Change Fix Verified ✓

The critical issue has been fixed correctly this time. Here's what was implemented:

**What Changed**

New helper (`_schema_versioned_path`):

```python
def _schema_versioned_path(path: str, schema_index: int) -> str:
    if schema_index == 0:
        return path
    stem, ext = os.path.splitext(path)
    return f"{stem}_{schema_index}{ext}"
```

New tracking dicts (…)
Schema-change rollover logic (…)
Test updated (…)

**Remaining Minor Nit (non-blocking)**
```python
for record in records:
    row_buf = io.StringIO()
    row_writer = csv.DictWriter(row_buf, fieldnames=all_keys, ...)
    row_writer.writerow(record)
    logger.info(row_buf.getvalue())
```

A new `DictWriter` is created for every record (…)

**Verdict**

The critical CSV schema-change bug is fixed. The implementation is clean, follows existing logger/path patterns, and the test correctly validates the rollover behavior. This is ready to merge pending maintainer approval.
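The per-record `DictWriter` nit above can be avoided by reusing a single writer over one buffer. A sketch, with `records` and `fieldnames` standing in for the exporter's real variables:

```python
import csv
import io


def rows_to_csv_lines(records, fieldnames):
    """Serialize all records with one DictWriter instead of one per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    for record in records:
        writer.writerow(record)
    # One CSV line per record, ready to hand to a logger-based file sink.
    return buf.getvalue().splitlines()
```

This keeps the dialect setup and field validation to a single object per batch rather than one per row.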
@claude fixed the perf issues. |
@luccabb should be good to go now. maybe we can add a ci workflow like /rubric or something for quick feedback on prs. tool calls instead of always needing to call claude api.
@luccabb done |
luccabb left a comment
test plan is wrong. add the execution cmd (cat filename) and output from that cmd for the following lines.
you should likely be passing more than 1 filename, given that you'd want data to be written to different files. please test this thoroughly and ensure test plans are fully complete before sending back for review
Sorry about that! I just updated the test plan
@claude review my pr
Claude finished @luccabb's task in 1m 18s
PR Review: CSV Support for File Exporter

**Summary**

All previously identified issues have been addressed. The implementation is clean and correct.

**What's Been Fixed**

**Minor Nits (non-blocking)**

1. Conditional initialization of the CSV state:

```python
if self.format == "csv":
    self._csv_fieldnames: Dict[str, Tuple[str, ...]] = {}
    self._csv_schema_index: Dict[DataIdentifier, int] = {}
```

These attributes only exist when the format is csv.

2. GCM Health Checks doc not updated for CSV (…). The health checks doc now has the (…)

3. No test for multiple successive schema changes: only one rollover is tested (…).

**Verdict**

The critical bug is fixed, performance nit is addressed, tests are correct, and documentation is updated. This is ready to merge. The nits above are minor and non-blocking.
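For the first nit, one option is to initialize the bookkeeping unconditionally so the attributes always exist. A sketch only (the class name is simplified, and `str` stands in for the real key types such as `DataIdentifier`):

```python
from typing import Dict, Tuple


class FileExporterSketch:
    """Sketch: CSV bookkeeping is always defined, even for JSON output."""

    def __init__(self, format: str = "json") -> None:
        self.format = format
        # Empty dicts are harmless when format != "csv", and later attribute
        # access can never raise AttributeError regardless of the format.
        self._csv_fieldnames: Dict[str, Tuple[str, ...]] = {}
        self._csv_schema_index: Dict[str, int] = {}
```

This trades two tiny unused dicts in JSON mode for the guarantee that every code path sees the attributes defined.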
@luccabb should be ready to merge |
luccabb left a comment
if I run this PR twice it breaks CSV structure, headers shouldn't be repeated in the middle of the file

```shell
$ gcm nvml_monitor --sink=file --sink-opt file_path=/tmp/telemetry.csv --sink-opt format=csv --once
$ gcm nvml_monitor --sink=file --sink-opt file_path=/tmp/telemetry.csv --sink-opt format=csv --once
$ cat /tmp/telemetry.csv
gpu_id,gpu_util,hostname,job_id,mem_used_percent,mem_util,power_draw,power_used_percent,retired_pages_count_double_bit,retired_pages_count_single_bit,temperature
0,0,host0101,-1,0,0,33111,14,0,0,38
1,0,host0101,-1,0,0,30435,12,0,0,33
gpu_id,gpu_util,hostname,job_id,mem_used_percent,mem_util,power_draw,power_used_percent,retired_pages_count_double_bit,retired_pages_count_single_bit,temperature
0,0,host0101,-1,0,0,33111,14,0,0,37
1,0,host0101,-1,0,0,30206,12,0,0,33
```
Claude finished @luccabb's task in 1m 32s
Re-review: New Bug — Header Repeated on Second Run

| Detail | Value |
|---|---|
| File | `gcm/exporters/file.py` |
| Lines | `__init__` line 66 (dict init), `write` lines 137 and 161–171 (header guard) |
| Root cause | `_csv_fieldnames` is in-memory only; second invocation sees empty dict and re-emits header |
| Fix | Seed `_csv_fieldnames` from the first line of any pre-existing file during `__init__` |
| Missing test | No test covers the "second process, same file" scenario |
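The fix suggested in the table could look like this sketch (the function name and its placement are hypothetical): read the header of any pre-existing file at startup, so a second run appends rows without re-emitting the header.

```python
import csv
import os
from typing import Optional, Tuple


def seed_fieldnames(path: str) -> Optional[Tuple[str, ...]]:
    """Return the header row of a pre-existing CSV, or None if absent/empty.

    Seeding _csv_fieldnames with this result at __init__ time would stop a
    second process from writing a duplicate header into the same file.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return None
    with open(path, newline="") as f:
        header = next(csv.reader(f), None)
    return tuple(header) if header else None
```

A matching test would run the write path twice against the same file and assert the header appears exactly once.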
@claude review my pr
@luccabb could I get feedback on this pr
@luccabb it should be good to merge
@luccabb I'll close this pr for now.

Closes #87
Summary
Adds a new `telemetry` exporter that periodically appends structured telemetry snapshots to a local file in JSON or CSV format for offline analysis.

What's New

- `--sink=telemetry`
- `file_path` (required): path to the output file
- `format` (optional): `json` (default) or `csv`

Usage
CSV Output
Example Output (JSON)
Implementation
- `gcm/exporters/telemetry.py` (`@register`, `write(Log, SinkAdditionalParams)`)
- Works with `gcm nvml_monitor`, `gcm slurm_monitor`, and health check commands

Testing