Add post-processing pipeline for exporting benchmark results to OpenSearch #334

acalhounRH wants to merge 7 commits into main from
Conversation
PullHero Review
Provider: generativelanguage.googleapis.com | Model: gemini-2.0-flash

Okay, I've reviewed the changes for the post-processing functionality being added to Zathras. This is a significant addition, enabling automated results analysis and export to external systems like OpenSearch and Horreum. The changes are extensive, so I'll focus on the overall design, key implementation details, and potential areas for improvement.

High-Level Assessment

The changes introduce a well-structured post-processing pipeline with clear separation of concerns. The use of abstract base classes and benchmark-specific processors promotes modularity and extensibility. The addition of exporters enables integration with external data warehouses. The attention to detail regarding schema design, duplicate prevention, and error handling is commendable.

Key Strengths
Areas for Improvement
Specific Feedback
Vote: 0
PullHero Review
Provider: generativelanguage.googleapis.com | Model: gemini-2.0-flash

Okay, I've reviewed the changes for the post-processing module. This is a significant addition to the Zathras framework, enabling automated extraction, transformation, and export of benchmark results to OpenSearch and Horreum. The changes include new files for schema definitions, result processors for various benchmarks, and exporters for OpenSearch and Horreum.

General Observations
Specific Feedback

1. .gitignore Files
2. post_processing/README.md
3. post_processing/config/apply_opensearch_templates.sh
4. post_processing/exporters/opensearch_exporter.py
5. post_processing/processors/base_processor.py
6. post_processing/processors/pyperf_processor.py
7. General
Code Snippets with Suggested Changes

````diff
--- a/post_processing/README.md
+++ b/post_processing/README.md
@@ -10,6 +10,8 @@
 Process entire result directories automatically:
+**Note: Before running, ensure you have Zathras installed on your controller system (see [Installation](../README.md#installation))**
+
 ```bash
 # 1. Configure credentials
 cp post_processing/config/export_config_example.yml post_processing/config/export_config.yml
@@ -20,6 +22,8 @@
   --input /path/to/results \
   --config post_processing/config/export_config.yml \
   --opensearch
+
+# See [Documentation](docs/post_processing/README.md) for detailed instructions
 # Or just generate JSON files
 python3 -
````
**Vote**: +1
**[PullHero](https://github.com/pullhero/)**
PullHero Review
Provider: generativelanguage.googleapis.com | Model: gemini-2.0-flash

Okay, I've reviewed the changes and here's my assessment.

Overview

This PR introduces a comprehensive post-processing framework for Zathras benchmark results. It includes new modules for:
This is a significant addition to Zathras, enabling centralized analysis and long-term performance tracking.

Code Quality and Maintainability
Potential Bugs and Security Issues
Adherence to Project Conventions
Documentation Completeness
Specific Feedback
Improvements and suggestions
Conclusion

This is an excellent addition to Zathras. The code is

Vote: 0
If we have multiple runs of the same benchmark (it does happen), which run will this use? It should be the last run of the benchmark.
```python
req = urllib.request.Request(
    url,
    data=request_data,
    headers=headers,
    method=method
)
```
I think @acalhounRH had said one of his goals was to make this data sink independent, such that we could switch to e.g. Horreum with relatively minimal interruption.
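Sink independence usually comes down to a small abstract interface that both exporters implement. A rough sketch of what that could look like (the class and method names here are hypothetical, not the PR's actual code):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ResultExporter(ABC):
    """Common interface so the pipeline can swap data sinks
    (OpenSearch, Horreum, ...) without touching the processors."""

    @abstractmethod
    def export(self, documents: List[Dict[str, Any]]) -> int:
        """Send documents to the backing store; return the count exported."""


class InMemoryExporter(ResultExporter):
    """Trivial sink used here only to demonstrate the interface."""

    def __init__(self) -> None:
        self.store: List[Dict[str, Any]] = []

    def export(self, documents: List[Dict[str, Any]]) -> int:
        self.store.extend(documents)
        return len(documents)


exporter: ResultExporter = InMemoryExporter()
count = exporter.export([{"benchmark": "coremark", "score": 12345.6}])
print(count)  # 1
```

With that in place, the driver would only ever hold a `ResultExporter` reference, and switching to Horreum would mean adding one subclass rather than changing call sites.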
Archive Handler Utility

Handles extraction of Zathras result archives:

- results_{test}.zip → results_{test}_.tar → test result files
@dvalinrh did we go back to a regular tarball instead of a zip -> tar -> test results?
```python
if field == 'Architecture':
    cpu_info['architecture'] = data
elif field == 'Vendor ID':
    cpu_info['vendor'] = data
elif field == 'Model name':
    cpu_info['model'] = data
elif field == 'CPU(s)':
    cpu_info['cores'] = int(data) if data.isdigit() else None
```
We may want to consider how to simplify this... cpu_info[field] = data would be nice if possible and reduce the length of this if/elif chain.
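One way to shorten the chain while keeping the key renames and the `int` conversion is a lookup table mapping lscpu field names to (key, converter) pairs. Purely illustrative; `FIELD_MAP` and `parse_lscpu_line` are hypothetical names, not from the PR:

```python
# Map lscpu field names to (cpu_info key, value converter).
FIELD_MAP = {
    'Architecture': ('architecture', str),
    'Vendor ID':    ('vendor', str),
    'Model name':   ('model', str),
    'CPU(s)':       ('cores', lambda d: int(d) if d.isdigit() else None),
}


def parse_lscpu_line(cpu_info, field, data):
    """Store one lscpu field into cpu_info, if we care about it."""
    if field in FIELD_MAP:
        key, convert = FIELD_MAP[field]
        cpu_info[key] = convert(data)


cpu_info = {}
parse_lscpu_line(cpu_info, 'Architecture', 'x86_64')
parse_lscpu_line(cpu_info, 'CPU(s)', '8')
print(cpu_info)  # {'architecture': 'x86_64', 'cores': 8}
```

A plain `cpu_info[field] = data` would be even shorter, but loses the renaming (`CPU(s)` → `cores`) and type conversion, so the table is a middle ground.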
```python
if 'vendor_id' in proc_data:
    cpu_info['vendor'] = proc_data['vendor_id']
if 'model_name' in proc_data:
    cpu_info['model'] = proc_data['model_name']
if 'cpu_cores' in proc_data:
    cpu_info['cores'] = int(proc_data['cpu_cores'])
```
```python
# Convert to GB
if unit == 'K':
    return round(value / 1024 / 1024)
elif unit == 'M':
    return round(value / 1024)
elif unit == 'G':
    return round(value)
elif unit == 'T':
    return round(value * 1024)
```
Are we getting IEC base-2 amounts or SI base-10 amounts?
Mixing and matching them would probably be close enough:tm:, but could cause headaches.
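One way to avoid that ambiguity is to make the base an explicit parameter of the converter. A hedged sketch, not the PR's code; the names (`_IEC`, `_SI`, `to_gb`) are illustrative:

```python
# Unit prefix factors for the two conventions.
_IEC = {'K': 1024**1, 'M': 1024**2, 'G': 1024**3, 'T': 1024**4}
_SI = {'K': 1000**1, 'M': 1000**2, 'G': 1000**3, 'T': 1000**4}


def to_gb(value: float, unit: str, iec: bool = True) -> int:
    """Convert a value with unit prefix to whole GiB (iec=True) or GB."""
    factors = _IEC if iec else _SI
    giga = 1024**3 if iec else 1000**3
    return round(value * factors[unit] / giga)


print(to_gb(16384, 'M'))             # 16 (16384 MiB -> 16 GiB)
print(to_gb(16000, 'M', iec=False))  # 16 (16000 MB -> 16 GB)
```

Since `/proc/meminfo` reports KiB while cloud instance specs are usually quoted in decimal GB, keeping the choice explicit at each call site would document which convention each data source uses.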
Let's move some discussion here. I think that putting the extraction scripts for each benchmark in Zathras is against how we have this set up. If a wrapper gets out of sync with Zathras (in either direction), that can cause problems. This is not a simple change, since the wrappers get downloaded to the remote system and the controller never sees or cares about the wrappers presently. Some discussion around this would be helpful.
Force-pushed ec544d8 to c51fc7c (Compare)
This changeset includes thousands of test results under alex_95/ - can we remove this directory from the PR?
…earch and Horreum

- Implement 12 benchmark processors (CoreMark, STREAMS, FIO, SPEC CPU 2017, etc.)
- Object-based schema with full metadata extraction (CPU, memory, NUMA, OS config)
- Two-index design: zathras-results (summary) + zathras-timeseries (individual points)
- Content-based deduplication using SHA256 hashing
- OpenSearch and Horreum exporters with automatic retry logic
- Batch processing with recursive directory discovery
- Zero flake8 errors (1,340 issues fixed)

Tested with 34 Azure instances, 78 benchmarks successfully exported.

AI-assisted implementation using Claude Sonnet 4.5 (Anthropic) via Cursor IDE.
Force-pushed c51fc7c to 97691e1 (Compare)
The processing will treat each run separately, with common metadata to support the association, so all runs will (should) be exported to the data store.
Code Review: Post-Processing Pipeline

Overview

This PR adds a significant post-processing pipeline (~10,500 lines of code across 34 files) for exporting Zathras benchmark results to OpenSearch and Horreum. The implementation is well-structured and follows good Python practices.

Summary Assessment
Strengths1. Well-Designed Architecture
2. Excellent Modularity
3. Good Error Handling
4. Thoughtful Design Decisions
Issues & Recommendations

Security Concerns

1. SSL Certificate Verification Disabled by Default

File:

```yaml
verify_ssl: false  # Set to true for production
```

Risk: Users may copy this example and leave SSL verification disabled, exposing credentials to MITM attacks.

Recommendation: Default to:

```yaml
verify_ssl: true  # Only set to false for development with self-signed certs
```

2. Unverified SSL Context Creation

File:

```python
if not self.verify_ssl and url.startswith('https'):
    context = ssl._create_unverified_context()
```

Risk: Using

Recommendation: Consider logging a warning when SSL verification is disabled:

```python
if not self.verify_ssl:
    logger.warning("SSL verification disabled - credentials may be exposed")
```

3. Archive Extraction Without Path Validation

File:

```python
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(temp_dir)
```

Risk: Zip Slip vulnerability - malicious archives could write files outside the temp directory using

Recommendation: Validate extracted paths:

```python
for member in zip_ref.namelist():
    member_path = os.path.join(temp_dir, member)
    if not os.path.commonpath([temp_dir, member_path]) == temp_dir:
        raise ArchiveExtractionError(f"Path traversal detected: {member}")
```

Code Style Issues

1. Unused Import in requirements.txt

File:

Issue: The code uses

2. Hardcoded Placeholder Timestamp

File:

```python
base_time = datetime(2025, 11, 6, 5, 9, 45)  # Placeholder
```

Recommendation: Use current time or document the limitation more clearly.

3. Magic Numbers

File:

```python
"limit": 5000  # Higher limit for dynamic fields
```

Recommendation: Define as a constant at module level for clarity.

Modularity Improvements

1. Duplicate HTTP Logic

Both

Recommendation: Extract to a common

2. Statistics Calculation Duplication

Similar statistics calculations appear in multiple processors (

Recommendation: Add helper function in

```python
def calculate_summary_stats(values: List[float]) -> TimeSeriesSummary:
    ...
```

Best Practices

1. Missing
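The shared statistics helper suggested above could take a shape like the following; this sketch returns a plain dict rather than the PR's `TimeSeriesSummary` schema type, and the field names are illustrative:

```python
import statistics
from typing import Dict, List


def calculate_summary_stats(values: List[float]) -> Dict[str, float]:
    """Min/max/mean/stddev summary shared across processors.

    The real implementation might build a TimeSeriesSummary schema
    object instead of a dict (hypothetical here).
    """
    if not values:
        raise ValueError("cannot summarise an empty series")
    return {
        'min': min(values),
        'max': max(values),
        'mean': statistics.fmean(values),
        'stddev': statistics.stdev(values) if len(values) > 1 else 0.0,
        'count': len(values),
    }


stats = calculate_summary_stats([1.0, 2.0, 3.0, 4.0])
print(stats['mean'], stats['count'])  # 2.5 4
```

Centralising this also guarantees every processor handles the empty and single-point edge cases the same way.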
```python
duration = summary.get('total_time_secs') if summary else None

# Estimate timestamps (we don't have exact timestamps in CoreMark)
base_time = datetime(2025, 11, 6, 5, 9, 45)  # Placeholder
```
Hardcoded timestamps will cause problems.
- Parse CoreMark results CSV with comma delimiter and optional comment lines (skip_comments in parser_utils.parse_csv_timeseries).
- Use Start_Date/End_Date from the CSV for run and time-series timestamps when present.
- Require valid timestamps: raise ProcessorError instead of synthetic timestamps when the CSV lacks Start_Date/End_Date or when a row has missing/invalid Start_Date.
- Drop legacy colon-delimited, no-timestamps format support.

- Parse Start_Date and End_Date from comma-delimited results_streams.csv and use them for run timeseries instead of datetime.now().
- Require the CSV header to include Start_Date and End_Date as the last two columns; raise ProcessorError with a clear message when timestamps are missing, empty, or not valid ISO 8601.
- Support direct CSV path via files['results_streams_csv'] and allow results_streams.csv in extracted_path (no streams_* subdir required), matching the pattern used under tmp/coremark.

- Use timestamps from benchmark output: require comma-delimited CSV with Start_Date and End_Date (ISO 8601); use them for run start_time/end_time and a single timeseries point. No datetime.now()/utcnow().
- Require timestamps and fail clearly: when timestamps are missing or malformed, raise ProcessorError with a short message (expected format T/V,N,NB,P,Q,Time,Gflops,Start_Date,End_Date).
- Add ISO 8601 validation helper and reject legacy colon-delimited format without timestamps.

- Use timestamps from benchmark output: require comma-delimited CSV with Start_Date and End_Date (ISO 8601). Use run-level start/end from the results and per-workload timestamps from each row's Start_Date for timeseries. No datetime.now()/utcnow().
- Require timestamps and fail clearly: when timestamps are missing from the header or malformed in any row, raise ProcessorError with a short, descriptive message (expected format Test,Multi_iterations,Single_iterations,Scaling,Start_Date,End_Date). Add ISO 8601 validation helper; reject legacy colon-delimited format without timestamps.
- Match tmp/coremark path layout: allow direct results path via extracted_result['files']['results_csv'] when provided (demos pass single file path). When using extracted_path, allow results.csv directly in that directory or in a coremark_pro_* subdirectory.
- Demo: add tmp/coremark_pro/ CSVs (valid, no_timestamps, invalid and empty timestamp rows) and post_processing/demo_coremark_pro_timestamps.py using the same files-based extracted_result pattern as the CoreMark demo; run valid (success), no timestamps (ProcessorError), invalid and empty timestamp (ProcessorError) cases with short summary and error message output.

…d path

- Use timestamps from benchmark output: parse CSV with Start_Date/End_Date (ISO 8601) per row and use them for timeseries points instead of datetime.now().
- Require timestamps and fail clearly: when CSV lacks Start_Date/End_Date or any value is missing/malformed, raise ProcessorError with a short message (expected format and what went wrong). Reject legacy colon-delimited Warehouses:Bops format without timestamps.
- Add _validate_iso8601_timestamp() and _ISO8601_PATTERN for validation.
- Allow direct results path via extracted_result['files']['results_specjbb_csv'] when provided; when using extracted_path, look for results_specjbb.csv in that directory or via _find_csv_file (rglob).

Assisted-by: Cursor

…path

- Use timestamps from benchmark output: parse CSV with Start_Date/End_Date (ISO 8601) per row for single-CSV path; for net_results layout use run_metadata.csv run-level Start_Date/End_Date and interpolate timestamps for each data point. No datetime.now()/utcnow().
- Require timestamps and fail clearly: when CSV lacks Start_Date/End_Date or any value is missing/malformed, raise ProcessorError with a short message (expected format and what went wrong).
- Add _validate_iso8601_timestamp() and _ISO8601_PATTERN for validation.
- Allow direct results path via extracted_result['files']['results_uperf_csv'] when provided. When using extracted_path, look for results_uperf.csv in that directory or in a uperf_* subdir; if not found, use net_results/ and require run_metadata.csv with Start_Date,End_Date.

Assisted-by: Cursor
Summary
This PR adds a post-processing pipeline that converts Zathras benchmark results into structured JSON documents and exports them to OpenSearch for analysis and performance tracking.
What's Included
12 Benchmark Processors
Key Features
- `zathras-results` (summary) + `zathras-timeseries` (individual points)

Usage
```bash
python3 -m post_processing.run_postprocessing \
    --input /path/to/results/ \
    --config export_config.yml \
    --opensearch
```
Files Changed
`run_postprocessing.py`, `schema.py`, 12 processors; new feature addition only.
Dependencies
- pyyaml>=6.0
- opensearch-py>=2.0.0
- requests>=2.28.0
- python-dateutil>=2.8.0

AI Assistance
This PR was developed with AI assistance using Claude Sonnet 4.5 (Anthropic) via Cursor IDE. The implementation was collaboratively developed through iterative design discussions, code generation, testing, and refinement.