feat(profiling): Complete OTLP profiles implementation with JFR conversion pipeline #10098
Draft: jbachorik wants to merge 26 commits into master from jb/rnd_otlp_profile
Conversation
Add profiling-otel module with core infrastructure for JFR to OTLP profiles conversion:
- Dictionary tables for OTLP compression (StringTable, FunctionTable, LocationTable, StackTable, LinkTable, AttributeTable)
- ProtobufEncoder for hand-coded protobuf wire format encoding
- OtlpProtoFields constants for OTLP profiles proto field numbers
- Unit tests for all dictionary tables and encoder
- Architecture documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
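The dictionary tables named above all follow the same interning pattern: each unique value is stored once and later references carry only its index. A minimal sketch for a string table (class and method names here are illustrative, not the PR's actual API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of an OTLP dictionary table: index 0 holds the
// required sentinel (empty string), and repeated inserts return the same
// index so each value appears only once on the wire.
final class StringTableSketch {
    private final Map<String, Integer> indexByValue = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    StringTableSketch() {
        intern(""); // sentinel entry at index 0, mandated by the OTLP profiles spec
    }

    int intern(String s) {
        Integer existing = indexByValue.get(s);
        if (existing != null) {
            return existing; // deduplicated: same value, same index
        }
        int index = values.size();
        indexByValue.put(s, index);
        values.add(s);
        return index;
    }

    List<String> entries() {
        return values;
    }

    public static void main(String[] args) {
        StringTableSketch table = new StringTableSketch();
        int a = table.intern("java.lang.String");
        int b = table.intern("java.lang.String"); // second insert hits the map
        System.out.println(a == b);                 // true
        System.out.println(table.entries().size()); // 2: sentinel + one string
    }
}
```

Function, location, and stack tables would follow the same shape with composite keys instead of plain strings.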
- Add JMH benchmark filtering via -PjmhIncludes property in build.gradle.kts
- Update JfrToOtlpConverterBenchmark parameters to {50, 500, 5000} events
- Run comprehensive benchmarks and document actual performance results
- Update BENCHMARKS.md with measured throughput data (Apple M3 Max)
- Update ARCHITECTURE.md with performance characteristics
- Key findings: Stack depth is primary bottleneck (~60% reduction per 10x increase)
- Linear scaling with event count, minimal impact from context count
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
…pport
Reverted Phase 1 optimization attempts that showed no improvement:
- Removed tryGetExisting() optimization from JfrToOtlpConverter
- Deleted tryGetExisting() method from FunctionTable
- The optimization added overhead (2 FunctionKey allocations vs 1)
Added JMH profiling support:
- Added profiling configuration to build.gradle.kts
- Enable with -PjmhProfile=true flag
- Configures stack profiler (CPU sampling) and GC profiler (allocations)
Profiling results reveal actual bottlenecks:
- JFR file I/O: ~20% (jafar-parser, external dependency)
- Protobuf encoding: ~5% (fundamental serialization cost)
- Conversion logic: ~3% (our code)
- Dictionary operations: ~1-2% (NOT the bottleneck)
Key findings:
- Dictionary operations already well-optimized at ~1-2% of runtime
- Modern JVM escape analysis optimizes temporary allocations
- Stack depth is the dominant factor (O(n) frame processing)
- HashMap lookups (~10-20ns) dominated by I/O overhead
Updated documentation:
- BENCHMARKS.md: Added profiling section with findings
- ARCHITECTURE.md: Added profiling support and results
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
…ant pool IDs
Leverage JFR's internal stack trace deduplication by caching conversions based on constant pool IDs. This avoids redundant processing of identical stack traces that appear multiple times in profiling data.
Implementation:
- Add @JfrField(raw=true) stackTraceId() methods to all event interfaces (ExecutionSample, MethodSample, ObjectSample, JavaMonitorEnter, JavaMonitorWait)
- Implement HashMap cache in JfrToOtlpConverter with lazy stack trace resolution
- Cache key combines stackTraceId XOR (identityHashCode(chunkInfo) << 32) for chunk-unique identification
- Modify convertStackTrace() to accept Supplier<JfrStackTrace> and check the cache before resolution
- Update all event handlers to pass method references (event::stackTrace) instead of resolved stacks
- Add stackDuplicationPercent parameter to JfrToOtlpConverterBenchmark (0%, 70%, 90%)
- Document Phase 5.6: Stack Trace Deduplication Optimization in ARCHITECTURE.md
Performance results:
- 0% stack duplication: 8.1 ops/s (baseline, no cache benefit)
- 70% stack duplication: 14.4 ops/s (+78% improvement, typical production workload)
- 90% stack duplication: 20.5 ops/s (+153% improvement, 2.5x faster for hot-path heavy workloads)
All 82 tests pass. Zero overhead for unique stacks, significant gains for realistic duplication patterns.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
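The caching scheme described in this commit can be sketched as follows. This is a hypothetical reconstruction for illustration: ChunkInfo, the resolution counter, and the returned stack index are placeholders, not the PR's real types; only the key derivation and the Supplier-based lazy resolution mirror the commit message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of a constant-pool-ID stack trace cache with lazy resolution.
final class StackDedupSketch {
    static final class ChunkInfo {} // stands in for the parser's per-chunk context

    private final Map<Long, Integer> stackIndexByKey = new HashMap<>();
    private int nextIndex = 1; // index 0 is the sentinel stack
    int resolutions = 0;       // counts how often the supplier actually ran

    // Constant pool IDs are only unique within one JFR chunk, so the key
    // mixes the chunk's identity hash into the upper 32 bits.
    static long cacheKey(long stackTraceId, ChunkInfo chunk) {
        return stackTraceId ^ ((long) System.identityHashCode(chunk) << 32);
    }

    // The Supplier defers resolution: on a cache hit the JFR constant
    // pool is never dereferenced at all.
    int convert(long stackTraceId, ChunkInfo chunk, Supplier<String[]> lazyFrames) {
        return stackIndexByKey.computeIfAbsent(cacheKey(stackTraceId, chunk), key -> {
            String[] frames = lazyFrames.get(); // resolved only on a miss
            resolutions++;
            return nextIndex++; // where the converted stack would land
        });
    }

    public static void main(String[] args) {
        StackDedupSketch cache = new StackDedupSketch();
        ChunkInfo chunk = new ChunkInfo();
        Supplier<String[]> frames = () -> new String[] {"main", "work"};
        int first = cache.convert(42L, chunk, frames);
        int second = cache.convert(42L, chunk, frames); // hit: no re-resolution
        System.out.println(first == second);   // true
        System.out.println(cache.resolutions); // 1
    }
}
```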
…n Docker unavailable
Use @Testcontainers(disabledWithoutDocker = true) to automatically skip OtlpCollectorValidationTest when Docker is not available, instead of failing with IllegalStateException. This allows the test suite to pass cleanly in environments without Docker while still running all other tests. When Docker is available, these tests run normally.
Result: 82 tests pass, Docker tests gracefully skipped when unavailable.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Implement support for the OTLP profiles original_payload and original_payload_format fields (fields 9 and 10) to include the source JFR recording(s) in OTLP output for debugging and compliance verification.
Key features:
- Zero-copy streaming architecture using SequenceInputStream
- Automatic uber-JFR concatenation for multiple recordings
- Disabled by default per OTLP spec recommendation (size considerations)
- Fluent API: setIncludeOriginalPayload(boolean)
Implementation details:
- Enhanced ProtobufEncoder with streaming writeBytesField(InputStream, long) method
- Single file optimization: direct FileInputStream
- Multiple files: SequenceInputStream chains files with zero memory overhead
- Streams data in 8KB chunks directly into protobuf output
Test coverage:
- Default behavior verification (payload disabled)
- Single file with payload enabled
- Multiple files creating uber-JFR concatenation
- Setting persistence across converter reuse
Documentation:
- Added Phase 6 to ARCHITECTURE.md with usage examples, design decisions, and performance characteristics
- Centralized jafar-parser dependency version in gradle/libs.versions.toml
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
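A streaming length-delimited bytes field of the kind this commit describes can be sketched like this (method and class names are illustrative; only the wire format itself, tag + varint length + payload, is fixed by protobuf):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;

// Sketch of a streaming protobuf bytes field (wire type 2): tag, varint
// length, then the payload copied in 8KB chunks. Two JFR files are chained
// with SequenceInputStream so the "uber-JFR" is never held in memory.
final class StreamingBytesFieldSketch {
    static void writeVarint(OutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // The total length must be known up front (file sizes), since protobuf
    // length-delimited fields carry the byte count before the payload.
    static void writeBytesField(OutputStream out, int fieldNumber, InputStream payload, long length)
            throws IOException {
        writeVarint(out, ((long) fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, length);
        byte[] buf = new byte[8192]; // stream in 8KB chunks
        int n;
        while ((n = payload.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] a = {1, 2}, b = {3}; // stand-ins for two JFR files
        InputStream uber =
            new SequenceInputStream(new ByteArrayInputStream(a), new ByteArrayInputStream(b));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBytesField(out, 9, uber, a.length + b.length); // field 9 = original_payload
        // tag 0x4A (field 9, wire type 2), length 3, then bytes 1 2 3
        System.out.println(java.util.Arrays.toString(out.toByteArray())); // [74, 3, 1, 2, 3]
    }
}
```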
…uration constants
Implement foundation for parallel OTLP profile uploads alongside the JFR format.
Step 1: RecordingData reference counting
Add thread-safe reference counting to support multiple listeners accessing the same RecordingData:
- Add AtomicInteger refCount and volatile boolean released flag
- Add retain() method to increment the reference count before passing to additional listeners
- Make release() final with automatic reference counting (decrements and calls doRelease at 0)
- Add protected doRelease() for actual cleanup (called when the refcount reaches 0)
- Update all implementations: OpenJdkRecordingData, DatadogProfilerRecordingData, OracleJdkRecordingData, CompositeRecordingData
The reference counting pattern enables multiple uploaders (JFR + OTLP) to safely share RecordingData without double-release or resource leaks. Each listener calls retain() before use and release() when done. Actual cleanup happens only when the refcount reaches zero.
Step 2: OTLP configuration constants
Add configuration property keys to ProfilingConfig for OTLP profile format support:
- profiling.otlp.enabled (default: false) - Enable parallel OTLP upload
- profiling.otlp.include.original.payload (default: false) - Embed source JFR in OTLP
- profiling.otlp.url (default: "") - OTLP endpoint URL (empty = derive from agent URL)
- profiling.otlp.compression (default: "gzip") - Compression type for OTLP upload
Configuration will be read directly from ConfigProvider in OtlpProfileUploader for testability.
Next steps:
- Step 3: Implement OtlpProfileUploader class (reads config from ConfigProvider)
- Step 4: Integrate with ProfilingAgent
- Step 5: Add tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Add OtlpProfileUploader class implementing RecordingDataListener
- Read configuration from ConfigProvider for testability
- Support GZIP compression (configurable via boolean flag)
- Use JfrToOtlpConverter to transform JFR recordings to OTLP format
- Derive OTLP endpoint from agent URL (port 4318, /v1/profiles)
- Handle both synchronous and asynchronous uploads
- Use TempLocationManager for temp file creation
- Add profiling-otel dependency to profiling-uploader module
- Add basic unit tests for OtlpProfileUploader
Configuration options:
- profiling.otlp.enabled (default: false)
- profiling.otlp.url (default: derived from agent URL)
- profiling.otlp.compression.enabled (default: true)
- profiling.otlp.include.original.payload (default: false)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
…e counting
Integrate OtlpProfileUploader into ProfilingAgent to enable parallel JFR and OTLP profile uploads when configured. Implements an explicit reference counting pattern for RecordingData to safely support multiple concurrent handlers.
Key changes:
1. ProfilingAgent integration:
- Add OtlpProfileUploader alongside ProfileUploader
- Extract handler methods (handleRecordingData, handleRecordingDataWithDump)
- Use method references instead of capturing lambdas for better performance
- Call retain() once for each handler (dumper, OTLP, JFR)
- Update shutdown hooks to properly clean up the OTLP uploader
2. Explicit reference counting in RecordingData:
- Change initial refcount from 1 to 0 for clarity
- Each handler must call retain() before processing
- Each handler calls release() when done
- doRelease() called only when the refcount reaches 0
- Updated javadocs to reflect the explicit counting pattern
3. Comprehensive test coverage:
- RecordingDataRefCountingTest validates all handler combinations
- Tests single, dual, and triple handler scenarios
- Verifies thread-safety with concurrent handlers
- Tests error conditions (premature release, retain after release)
- Confirms idempotent release behavior
Benefits:
- Symmetric treatment of all handlers (no special first handler)
- Clear, explicit reference counting (easier to understand and verify)
- No resource leaks or premature cleanup
- Efficient method references (no lambda capture overhead)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
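The explicit reference-counting pattern from the two commits above can be sketched as follows. Class names and the idempotency guard are illustrative assumptions, not the PR's actual RecordingData code; the invariant they demonstrate (count starts at 0, every handler retains before use, cleanup runs exactly once when the count returns to zero) is the one the commit messages describe.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of explicit reference counting shared by multiple
// recording handlers (JFR uploader, OTLP uploader, debug dumper).
abstract class RefCountedSketch {
    private final AtomicInteger refCount = new AtomicInteger(0); // explicit: starts at 0
    private final AtomicBoolean released = new AtomicBoolean(false);

    final void retain() {
        refCount.incrementAndGet();
    }

    final void release() {
        // The compareAndSet guard keeps release idempotent: once cleanup
        // has run, further calls are no-ops instead of double-freeing.
        if (refCount.decrementAndGet() == 0 && released.compareAndSet(false, true)) {
            doRelease();
        }
    }

    /** Actual resource cleanup; invoked only when the last holder releases. */
    protected abstract void doRelease();

    public static void main(String[] args) {
        AtomicInteger cleanups = new AtomicInteger();
        RefCountedSketch data = new RefCountedSketch() {
            @Override protected void doRelease() { cleanups.incrementAndGet(); }
        };
        data.retain(); // JFR uploader
        data.retain(); // OTLP uploader
        data.release();                     // first handler done: no cleanup yet
        System.out.println(cleanups.get()); // 0
        data.release();                     // last handler done: cleanup runs
        System.out.println(cleanups.get()); // 1
    }
}
```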
Include the OTLP profiles converter and its dependencies in the agent-profiling uber JAR for integration into dd-java-agent.jar. The profiling-otel module and its jafar-parser dependency are now bundled, while shared dependencies (internal-api, components:json) are correctly excluded via the existing excludeShared configuration.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Add command-line interface for testing and validating JFR to OTLP
conversions with real profiling data.
Features:
- Convert single or multiple JFR files to OTLP protobuf or JSON
- Include original JFR payload for validation (optional)
- Merge multiple recordings into single output
- Detailed conversion statistics
Usage:
./gradlew :dd-java-agent:agent-profiling:profiling-otel:convertJfr \
-Pargs="recording.jfr output.pb"
./gradlew :dd-java-agent:agent-profiling:profiling-otel:convertJfr \
-Pargs="--json recording.jfr output.json"
See doc/CLI.md for complete documentation and examples.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Add a --pretty flag to control JSON pretty-printing in the CLI converter. By default, JSON output is compact for efficient processing. Use --pretty for human-readable output with indentation.
Usage:
# Compact JSON (default)
./gradlew convertJfr --args="--json input.jfr output.json"
# Pretty-printed JSON
./gradlew convertJfr --args="--json --pretty input.jfr output.json"
The pretty-printer is a simple, dependency-free implementation that adds newlines and 2-space indentation without external libraries.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Integrates OpenTelemetry's profcheck tool to validate that OTLP profiles conform to the specification. This provides automated conformance testing and helps catch encoding bugs early.
Key additions:
- Docker-based profcheck integration (docker/Dockerfile.profcheck)
- Gradle tasks for building the profcheck image and running validation
- ProfcheckValidationTest with Testcontainers integration
- Comprehensive documentation in PROFCHECK_INTEGRATION.md
Gradle tasks:
- buildProfcheck: Builds the profcheck Docker image from the upstream PR
- validateOtlp: Validates OTLP files using profcheck
- Auto-builds the profcheck image before tests tagged with @Tag("docker")
Test results:
- ✅ testEmptyProfile: Passes validation
- ✅ testAllocationProfile: Passes validation
- ❌ testCpuProfile: Revealed stack_index out-of-range bugs
- ❌ testMixedProfile: Revealed protobuf wire-format encoding bugs
The test failures are expected and valuable - they uncovered real bugs in the OTLP encoder that need to be fixed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Dictionary tables (location, function, link, stack, attribute) were omitting their required index 0 sentinel entries from the wire format, causing profcheck validation failures.
Root cause:
1. Dictionary loops started at i=1 instead of i=0, skipping sentinels
2. ProtobufEncoder.writeNestedMessage() had an if (length > 0) check that completely skipped writing empty messages
3. Sentinel entries encode as empty messages (all fields are 0/empty)
4. Result: Index 0 was not present in the wire format, causing off-by-one array indexing errors in profcheck validation
Fix:
- Changed ProtobufEncoder.writeNestedMessage() to always write tag+length, even for empty messages (required for sentinels)
- Changed all dictionary table loops to start from i=0 to include sentinels
- Added attribute_table encoding (was completely missing)
- Updated JSON encoding to match protobuf encoding
- Fixed test to use the correct event type (datadog.ObjectSample)
All profcheck validation tests now pass with "conformance checks passed".
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
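The empty-message fix can be sketched in isolation (method names are illustrative; the wire format, tag + varint length + body, is fixed by protobuf):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the fix: an empty nested message must still emit its tag and a
// zero length. With an `if (length > 0)` guard, the index-0 sentinel (which
// encodes as an empty message) vanishes from the wire format and every later
// dictionary index is off by one.
final class NestedMessageSketch {
    static void writeVarint(OutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writeNestedMessage(OutputStream out, int fieldNumber, byte[] body)
            throws IOException {
        // No emptiness check: tag and length are written unconditionally.
        writeVarint(out, ((long) fieldNumber << 3) | 2); // wire type 2
        writeVarint(out, body.length);
        out.write(body);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeNestedMessage(out, 4, new byte[0]); // e.g. a sentinel table entry
        // Two bytes survive: tag 0x22 (field 4, wire type 2) and length 0,
        // so a decoder still sees an entry at index 0.
        System.out.println(out.size()); // 2
    }
}
```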
…rter
This commit adds support for mapping JFR event attributes to OTLP profile
sample attributes, enabling richer profiling data with contextual metadata.
Key changes:
1. Sample Attributes Implementation:
- Added attributeIndices field to SampleData class
- Implemented getSampleTypeAttributeIndex() helper for creating sample type attributes
- Updated all event handlers (CPU, allocation, lock) to include sample.type attribute
- Uses packed repeated int32 format for attribute_indices per proto3 spec
2. ObjectSample Enhancements:
- Added objectClass, size, and weight fields to ObjectSample interface
- Implemented upscaling: sample value = size * weight
- Added alloc.class attribute for allocation profiling
- Maintains backwards compatibility with allocationSize field
3. OTLP Proto Field Number Corrections:
- Fixed Sample field numbers to match official Go module proto:
* stack_index = 1
* values = 2 (was 4)
* attribute_indices = 3 (was 2)
* link_index = 4 (was 3)
* timestamps_unix_nano = 5 (unchanged)
- Corrects discrepancy between proto file and generated Go code
4. Dual Validation System:
- Updated Dockerfile.profcheck to include both protoc and profcheck
- Created validate-profile wrapper script
- Protoc validation is authoritative (official Protocol Buffers compiler)
- Profcheck warnings are captured but don't fail builds
- Documents known profcheck timestamp validation issues
5. Test Updates:
- Updated smoke tests to use new ObjectSample fields (size, weight)
- Modified validation tests to check for protoc validation success
- All validation tests passing with spec-compliant output
Design decisions:
- Measurements (duration, size*weight) are stored as sample VALUES
- Labels/metadata (sample.type, alloc.class) are stored as ATTRIBUTES
- AttributeTable provides automatic deduplication via internString()
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
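The packed repeated int32 format used for attribute_indices above can be sketched as follows (field number 3 matches the corrected numbering in this commit; the method names are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of proto3 packed encoding for a repeated int32 field such as
// Sample.attribute_indices: one tag + byte length, then the varint-encoded
// elements back to back, instead of one tag per element.
final class PackedInt32Sketch {
    static void writeVarint(OutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writePackedInt32(OutputStream out, int fieldNumber, int[] values)
            throws IOException {
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        for (int v : values) {
            writeVarint(packed, v); // elements are plain varints, no per-element tag
        }
        writeVarint(out, ((long) fieldNumber << 3) | 2); // packed fields use wire type 2
        writeVarint(out, packed.size());                 // byte length of the packed run
        packed.writeTo(out);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePackedInt32(out, 3, new int[] {1, 7, 300});
        // tag 0x1A (field 3, wire type 2), length 4, then 1, 7, and the
        // two-byte varint 0xAC 0x02 for 300
        System.out.println(out.size()); // 6
    }
}
```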
Fixed profcheck timestamp validation errors and made profcheck validation mandatory alongside protoc validation.
Timestamp issues fixed:
- Removed manual startTime field assignments in all test JFR events
- Manual timestamps were being interpreted as JFR ticks (not epoch nanos)
- Let the JFR recording system automatically assign correct timestamps
- JFR auto-timestamps are properly converted via chunkInfo.asInstant()
Validation changes:
- Made profcheck validation mandatory (previously only protoc was required)
- Updated the validation script to require both protoc AND profcheck to pass
- Removed special handling for the "known attribute_indices bug" (now fixed)
- Updated test assertions to verify both validators pass
- Both validators now cleanly pass for all test profiles
Result: Complete OTLP profiles spec compliance with both:
- protoc (official Protocol Buffers compiler) - structural validation
- profcheck (OpenTelemetry conformance checker) - semantic validation
All tests passing: empty, CPU, allocation, and mixed profiles.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Added a convert-jfr.sh script that provides a simplified interface for converting JFR files to OTLP format without needing to remember Gradle task paths.
Features:
- Automatic compilation if needed
- Simplified command-line interface
- Colored output for better visibility
- File size reporting
- Comprehensive help message
- Error handling with clear messages
Usage:
./convert-jfr.sh recording.jfr output.pb
./convert-jfr.sh --json recording.jfr output.json
./convert-jfr.sh --pretty recording.jfr output.json
./convert-jfr.sh file1.jfr file2.jfr merged.pb
Updated CLI.md documentation with:
- Quick start section featuring the convenience script
- Complete usage examples
- Feature list and when to use the script vs Gradle directly
The script wraps the existing Gradle convertJfr task, providing a more user-friendly interface for development and testing workflows.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Enhanced the conversion script with detailed diagnostic output showing:
- Input file sizes (individual and total)
- Output file size
- Wall-clock conversion time
- Compression ratio (output vs input size)
- Space savings (bytes and percentage)
Usage:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Example output:
[DIAG] Input: recording.jfr (89.3KB)
[DIAG] Total input size: 89.3KB
[DIAG] === Conversion Diagnostics ===
[DIAG] Wall time: 127.3ms
[DIAG] Output size: 45.2KB
[DIAG] Size ratio: 50.6% of input
[DIAG] Savings: 44.1KB (49.4% reduction)
Features:
- Cross-platform file size detection (macOS and Linux)
- Nanosecond-precision timing
- Human-readable size formatting (B, KB, MB, GB)
- Automatic compression ratio calculation
- Color-coded diagnostic output (cyan)
Updated CLI.md with:
- --diagnostics option documentation
- Example output showing diagnostic information
- Updated feature list
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
…iagnostics
Added the convert-jfr.sh convenience wrapper for JFR to OTLP conversion with comprehensive diagnostic output and cross-platform compatibility.
Features:
- Simple CLI interface wrapping the Gradle convertJfr task
- Support for all converter options (--json, --pretty, --include-payload)
- --diagnostics flag showing detailed metrics:
  * Input/output file sizes with human-readable formatting
  * Actual conversion time (parsed from converter output)
  * Compression ratios and savings
- Colored output for better readability
- Cross-platform file size detection (Linux and macOS)
- Automatic compilation via Gradle
Implementation:
- Parses the converter's own timing output to show actual conversion time (e.g., 141ms) instead of total Gradle execution time (13+ seconds)
- Uses a try-fallback approach for the stat command (GNU stat → BSD stat)
- Works on Linux, macOS with GNU coreutils, and native macOS
Documentation:
- Added "Convenience Script" section to doc/CLI.md
- Usage examples and feature list
- Diagnostic output examples
Example:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Shows: 141ms conversion time, 2.0MB → 2.2KB (99.9% reduction)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
…speedup
Replaced Gradle-based execution with a fat jar approach for a dramatic performance improvement in the JFR to OTLP conversion script.
Performance improvement:
- Previous: ~13+ seconds (Gradle overhead)
- New: ~0.4 seconds (< 0.5s total)
- Speedup: ~31x faster
- Actual conversion time: ~120ms (unchanged)
Implementation:
- Added shadowJar task to build.gradle.kts with minimization
- Modified convert-jfr.sh to use the fat jar directly via java -jar
- Added automatic rebuild detection based on source file mtimes
- The jar only rebuilds when source files are newer than the jar
- Cross-platform mtime detection (GNU stat → BSD stat fallback)
- Suppressed harmless SLF4J warnings (defaults to NOP logger)
Features:
- Automatic jar rebuild only when source files change
- Fast startup (no Gradle overhead)
- Clean output with SLF4J warnings filtered
- All existing diagnostics and features preserved
Fat jar details:
- Size: 1.9MB (minimized with the shadow plugin)
- Location: build/libs/profiling-otel-*-cli.jar
- Main-Class manifest entry for direct execution
- Excludes unnecessary SLF4J service providers
Documentation:
- Updated CLI.md to highlight the performance improvements
- Noted fat jar usage instead of the Gradle task
Example:
./convert-jfr.sh --diagnostics recording.jfr output.pb
Total time: 0.4s (vs 13+ seconds with Gradle)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Simplified the conversion script output to avoid duplicate information:
Default mode (no flags):
- Single concise line: "[SUCCESS] Converted: output.pb (45.2KB, 127ms)"
- No verbose converter output shown
- Perfect for scripting and quick conversions
Diagnostics mode (--diagnostics):
- Shows converter's detailed output (files, format, time)
- Enhanced diagnostics section with compression metrics
- Clear input→output flow visualization
- Space savings calculations
Changes:
- Removed duplicate "Converting..." and "Conversion complete" messages
- Eliminated redundant output file info in default mode
- Consolidated size/time reporting
- Renamed section to "Enhanced Diagnostics" to distinguish from converter output
Example outputs:
Default:
[SUCCESS] Converted: output.pb (45.2KB, 127ms)
With --diagnostics:
[DIAG] Input: recording.jfr (89.3KB)
Converting 1 JFR file(s) to OTLP format...
Adding: recording.jfr
Conversion complete!
Output: output.pb
Format: PROTO
Size: 45.2 KB
Time: 127 ms
[DIAG] === Enhanced Diagnostics ===
[DIAG] Input → Output: 89.3KB → 45.2KB
[DIAG] Compression: 50.6% of original
[DIAG] Space saved: 44.1KB (49.4% reduction)
Documentation updated in CLI.md with both output examples.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Force-pushed c98c2a0 to 52c579e
Overview
This PR implements a complete JFR to OTLP (OpenTelemetry Protocol) profiles converter with comprehensive validation, performance optimizations, and a high-performance CLI tool.
Core Implementation
OTLP Profiles Format Support
- Samples reference dictionary entries via {stack_index, attribute_indices, link_index}
JFR Event Type Support
- datadog.ExecutionSample → cpu/samples
- datadog.MethodSample → wall/samples
- datadog.ObjectSample → alloc-samples/bytes with objectClass attributes
- jdk.JavaMonitorEnter, jdk.JavaMonitorWait → lock-contention/nanoseconds
Performance Optimizations
Original Payload Support
- original_payload + original_payload_format fields
- Zero-copy streaming (SequenceInputStream)
Usage
Validation & Compliance
CLI Tool
Features
Usage
References
Note: R&D work for OTLP profile support. May still be changing.
🤖 Generated with Claude Code