feat: OTLP trace export via observability.otlp frontmatter config#24441
…issue) Agent-Logs-Url: https://github.com/github/gh-aw/sessions/49301b6f-02ce-44b4-8fd5-1bc9a878d6cc Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@copilot add a JavaScript implementation to push messages to the OTLP endpoint and instrument the action setup to take the job name as an input so it pushes a message each time it runs the job.
…e input Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5738fc76-45bf-47ab-af6c-8de1dc9ec689 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5738fc76-45bf-47ab-af6c-8de1dc9ec689 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in c890c53 (building on 233b74c):
set the trace id as an output of the action and also as an optional input so that you can correlate all jobs (use traceid from activation job, ignore preactivation)
…orrelation Agent-Logs-Url: https://github.com/github/gh-aw/sessions/461b1d13-5dec-4bb7-8daa-d36d583465a4 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/461b1d13-5dec-4bb7-8daa-d36d583465a4 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…idTraceId in index.js Agent-Logs-Url: https://github.com/github/gh-aw/sessions/461b1d13-5dec-4bb7-8daa-d36d583465a4 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 4a40521 (building on ffd3965 and 6a3a6b9):
Usage for cross-job correlation:

```yaml
# Activation job
- uses: github/gh-aw/actions/setup@main
  id: setup
  with:
    job-name: activation

# Agent job (depends on activation)
- uses: github/gh-aw/actions/setup@main
  with:
    job-name: agent
    trace-id: ${{ needs.activation.outputs.setup-trace-id }}
```

Pre-activation jobs that don't pass
actions/setup/js/send_otlp_span.cjs
```javascript
  },
  scopeSpans: [
    {
      scope: { name: "gh-aw.setup", version: "1.0.0" },
```
- name gh-aw
- pass current version
Done in 965d5d7:
- Scope `name` changed from `"gh-aw.setup"` to `"gh-aw"`
- Scope `version` now reads `process.env.GH_AW_INFO_VERSION || "unknown"` (the same env var written by the compiler for every job)
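For context, here is a minimal sketch of how one span might be wrapped in an OTLP/HTTP JSON export payload with the revised scope. The name `buildOTLPPayload` and its optional `parentSpanId` come from the PR description. The `toAttributes` and `msToUnixNano` helpers and the `service.name` resource attribute are assumptions based on the OTLP JSON encoding (hex-string trace/span IDs, 64-bit integers as decimal strings), not the PR's actual code.

```javascript
// Sketch (assumed shape): wrap a single span in an OTLP JSON ExportTraceServiceRequest.
// Helper names are illustrative, not the PR's exports.

/** Convert a flat { key: value } object into OTLP KeyValue attributes, skipping empty values. */
function toAttributes(obj) {
  return Object.entries(obj)
    .filter(([, v]) => v !== undefined && v !== null && v !== "")
    .map(([key, v]) => ({
      key,
      // Integers use intValue (int64 -> decimal string in JSON); everything else is a string.
      value: typeof v === "number" && Number.isInteger(v) ? { intValue: String(v) } : { stringValue: String(v) },
    }));
}

/** Milliseconds since the epoch -> nanoseconds as a decimal string (OTLP fixed64 in JSON). */
function msToUnixNano(ms) {
  return (BigInt(Math.trunc(ms)) * 1000000n).toString();
}

function buildOTLPPayload({ traceId, spanId, parentSpanId, spanName, startMs, endMs, attributes = {} }) {
  const span = {
    traceId, // 32 hex chars
    spanId, // 16 hex chars
    name: spanName,
    kind: 1, // SPAN_KIND_INTERNAL
    startTimeUnixNano: msToUnixNano(startMs),
    endTimeUnixNano: msToUnixNano(endMs),
    attributes: toAttributes(attributes),
  };
  // Root spans omit parentSpanId entirely rather than sending an empty string.
  if (parentSpanId) span.parentSpanId = parentSpanId;

  return {
    resourceSpans: [
      {
        resource: { attributes: toAttributes({ "service.name": "gh-aw" }) },
        scopeSpans: [
          {
            scope: { name: "gh-aw", version: process.env.GH_AW_INFO_VERSION || "unknown" },
            spans: [span],
          },
        ],
      },
    ],
  };
}

module.exports = { buildOTLPPayload, msToUnixNano, toAttributes };
```

Keeping `parentSpanId` absent on root spans, rather than set to an empty string, avoids sending a malformed ID that some backends might reject.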
actions/setup/js/send_otlp_span.cjs
```javascript
  body: JSON.stringify(payload),
});
if (!response.ok) {
  throw new Error(`OTLP export failed: HTTP ${response.status} ${response.statusText}`);
```
Done in 965d5d7. `sendOTLPSpan` now:
- Uses `console.warn` instead of throwing on non-2xx responses
- Retries up to 3 times (configurable) with exponential back-off (100 ms, 200 ms) before the final warning
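A minimal sketch of the retry behaviour described above. It assumes "up to 3 times" means three attempts in total, which matches the two listed delays, with a 100 ms base that doubles after each failed attempt. The `fetchImpl` and `sleep` options are hypothetical injection points added so the loop can be exercised without a network; the PR's `sendOTLPSpan` may be structured differently.

```javascript
// Sketch: POST an OTLP payload with exponential back-off; never throws.

/** Delay before the retry that follows failed attempt n (0-based): base, 2*base, 4*base, ... */
function backoffDelayMs(attemptIndex, baseMs = 100) {
  return baseMs * 2 ** attemptIndex;
}

const defaultSleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function sendOTLPSpan(endpoint, payload, { maxAttempts = 3, baseMs = 100, fetchImpl = fetch, sleep = defaultSleep } = {}) {
  const url = endpoint.replace(/\/+$/, "") + "/v1/traces";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const isLast = attempt === maxAttempts - 1;
    try {
      const response = await fetchImpl(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      });
      if (response.ok) return true;
      // Only the final failure is reported, so transient errors stay quiet.
      if (isLast) console.warn(`OTLP export failed: HTTP ${response.status} ${response.statusText}`);
    } catch (err) {
      if (isLast) console.warn(`OTLP export failed: ${err instanceof Error ? err.message : String(err)}`);
    }
    if (!isLast) await sleep(backoffDelayMs(attempt, baseMs)); // 100 ms, then 200 ms
  }
  return false; // telemetry failure must not fail the job
}

module.exports = { sendOTLPSpan, backoffDelayMs };
```

Naming the delays through `backoffDelayMs` also addresses the review suggestion below about documenting the 100 ms/200 ms constants.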
@copilot add support for configuring headers for otel
… spans for safe-outputs/conclusion jobs" Agent-Logs-Url: https://github.com/github/gh-aw/sessions/2a7e539d-2a7e-452d-8b26-8de15a7ebabe Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 965d5d7:
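`parseOTLPHeaders(raw)` is described in the PR as parsing the OTel `key=value,key=value` format with percent-decoding while tolerating Sentry-style values that contain spaces and `=`. Here is a minimal sketch of that parsing. It assumes each pair is split on its first `=` only and that malformed pairs are skipped, which is an illustration rather than the PR's code. Values that themselves contain commas are out of scope here.

```javascript
// Sketch: parse OTEL_EXPORTER_OTLP_HEADERS-style "key=value,key=value" strings.

function parseOTLPHeaders(raw) {
  /** @type {Record<string, string>} */
  const headers = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    // Split on the FIRST "=" only, so "Sentry sentry_key=<key>" survives intact as the value.
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // no "=" or empty key: skip the malformed entry
    const key = pair.slice(0, eq).trim();
    if (!key) continue;
    const rawValue = pair.slice(eq + 1).trim();
    let value;
    try {
      value = decodeURIComponent(rawValue); // e.g. "%20" -> " "
    } catch {
      value = rawValue; // invalid percent-encoding: keep the literal text
    }
    headers[key] = value;
  }
  return headers;
}

module.exports = { parseOTLPHeaders };
```

The resulting object can be spread into the `headers` option of the `fetch` call next to `Content-Type`.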
✅ All tools validated successfully! Agent Container Smoke Test confirms agent container is ready.

🎬 THE END — Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨

🌑 The shadows whisper... Smoke Codex failed. The oracle requires further meditation...

📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...
🤖 Smoke test #23986960002 results for
Overall:
Smoke test review for PR #24441: OTLP trace export implementation looks comprehensive. Cross-job trace correlation wiring is well-structured. Minor suggestion: document retry backoff constants in send_otlp_span.cjs.
📰 BREAKING: Report filed by Smoke Copilot · ● 1.6M
```diff
@@ -0,0 +1,512 @@
+// @ts-check
```
🤖 Smoke test inline review comment — the OTLP span sender looks solid. Consider adding a comment explaining the retry backoff constants (100ms/200ms) for future readers.
📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤
💥 Automated smoke test review - all systems nominal!
💥 [THE END] — Illustrated by Smoke Claude · ● 254.1K
```yaml
lockdown_check_failed: ${{ steps.generate_aw_info.outputs.lockdown_check_failed == 'true' }}
model: ${{ steps.generate_aw_info.outputs.model }}
secret_verification_result: ${{ steps.validate-secret.outputs.verification_result }}
setup-trace-id: ${{ steps.setup.outputs.trace-id }}
```
The setup-trace-id output propagation looks correct — this ensures the activation trace is forwarded to downstream jobs for unified OTLP tracing.
```yaml
    actions
  persist-credentials: false
- name: Setup Scripts
  id: setup
```
Adding id: setup here enables the trace-id output to be referenced via steps.setup.outputs.trace-id. This is a clean pattern for capturing step outputs.
🤖 Smoke test agent was here! Everything looks great from the automated review pass. 🚀
📰 BREAKING: Report filed by Smoke Copilot · ● 984.7K
@copilot the agent traceid does not match the activation traceid

@copilot add logging of input traceid in setup otlp JavaScript code
…P endpoint Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e46dbe87-e033-44bd-be84-32251ebdfee6 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e46dbe87-e033-44bd-be84-32251ebdfee6 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in a86b33b. Root cause: Changes:
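The PR description states that `action_setup_otlp.cjs` writes `trace-id` to `GITHUB_OUTPUT` and writes `GITHUB_AW_OTEL_TRACE_ID`/`GITHUB_AW_OTEL_PARENT_SPAN_ID` to `GITHUB_ENV`, and that the conclusion span later reads those env vars back. Here is a minimal sketch of that hand-off using the GitHub Actions file-command convention of appending `name=value` lines. The helper names `appendFileCommand`, `persistTraceContext`, and `readParentContext` are hypothetical. The single-line format assumes values never contain newlines, which holds for hex IDs.

```javascript
// Sketch: persist trace context for later steps/jobs via GitHub Actions file commands.
const fs = require("fs");

/** Append one name=value line to a file command such as GITHUB_OUTPUT or GITHUB_ENV. */
function appendFileCommand(filePath, name, value) {
  if (!filePath) return false; // not running under Actions: nothing to write
  if (/[\r\n]/.test(value)) throw new Error(`value for ${name} must be single-line`);
  fs.appendFileSync(filePath, `${name}=${value}\n`);
  return true;
}

/** Setup side: step output for downstream jobs, env vars for later steps in this job. */
function persistTraceContext({ traceId, spanId }, env = process.env) {
  appendFileCommand(env.GITHUB_OUTPUT, "trace-id", traceId);
  appendFileCommand(env.GITHUB_ENV, "GITHUB_AW_OTEL_TRACE_ID", traceId);
  appendFileCommand(env.GITHUB_ENV, "GITHUB_AW_OTEL_PARENT_SPAN_ID", spanId);
}

/** Conclusion side: later steps see the values as ordinary environment variables. */
function readParentContext(env = process.env) {
  return {
    traceId: env.GITHUB_AW_OTEL_TRACE_ID || undefined,
    parentSpanId: env.GITHUB_AW_OTEL_PARENT_SPAN_ID || undefined,
  };
}

module.exports = { appendFileCommand, persistTraceContext, readParentContext };
```

Because `GITHUB_ENV` only affects subsequent steps of the same job, the cross-job link still has to travel through the step output and `needs.activation.outputs.setup-trace-id`.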
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/165b2769-a20c-4f03-bd15-324e279a0b2c Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 340235b. The setup OTLP code now logs:
Agent Container Tool Check
Result: 12/12 tools available ✅ Overall Status: PASS
Smoke Test: Copilot - 23988420675 | PR by
Overall: PASS 🎉
Smoke test review of PR #24441. The OTLP trace export implementation looks solid — cross-job trace correlation is wired correctly across all generated lock files, and the JavaScript instrumentation handles retries and failures gracefully without breaking workflows.
📰 BREAKING: Report filed by Smoke Copilot · ● 984.7K
```diff
@@ -0,0 +1,512 @@
+// @ts-check
```
The new send_otlp_span.cjs file looks well-structured. Consider adding a JSDoc comment at the top describing the module purpose and its main exports (sendJobSetupSpan, sendJobConclusionSpan) for better discoverability.
```yaml
lockdown_check_failed: ${{ steps.generate_aw_info.outputs.lockdown_check_failed == 'true' }}
model: ${{ steps.generate_aw_info.outputs.model }}
secret_verification_result: ${{ steps.validate-secret.outputs.verification_result }}
setup-trace-id: ${{ steps.setup.outputs.trace-id }}
```
The new setup-trace-id output is consistently wired across jobs. Nice work ensuring cross-job trace correlation is complete — this pattern looks correct and uniform throughout the generated lock files.
Smoke Test Run §23988420639
Overall: PARTIAL (Tavily rate-limited, 2 skipped)
💥 Automated smoke test review - all systems nominal!
💥 [THE END] — Illustrated by Smoke Claude · ● 191.1K
```diff
@@ -525,7 +545,7 @@ func (fc *FrontmatterConfig) ToMap() map[string]any {
 	if fc.Description != "" {
 		result["description"] = fc.Description
 	}
-	if fc.Engine != "" {
+	if fc.Engine != nil {
 		result["engine"] = fc.Engine
 	}
```
The nil check for Engine is a good start, but an empty string "" would also pass as a non-nil any. Consider adding a type-switch guard to skip empty-string engine values and preserve the previous behavior where blank engines were omitted from the map output.
```yaml
    actions
  persist-credentials: false
- name: Setup Scripts
  id: setup
```
Adding id: setup to the Setup Scripts step enables trace-id output propagation cleanly. The cross-job correlation pattern looks correct — needs.activation.outputs.setup-trace-id is consistently threaded through downstream jobs.
The OTel dependency in gh-aw was dormant: there was no way to export structured per-call span data to any OTLP backend. This PR adds the configuration plumbing to activate it. That includes a JavaScript implementation that instruments every job run with OTLP spans, supports cross-job trace correlation, and propagates parent span context within a job for a proper trace hierarchy. It also threads the trace ID through `aw_context` so that composite actions (dispatched child workflows) share a single OTLP trace with their parent.

Changes
Go compiler
- `OTLPConfig` struct + schema: new `observability.otlp.endpoint` and `observability.otlp.headers` frontmatter fields, validated by the JSON schema with `additionalProperties: false`
- `observability_otlp.go` (focused helpers):
  - `extractOTLPEndpointDomain`: parses the hostname from static URLs; skips `${{ expressions }}` that can't be resolved at compile time
  - `getOTLPEndpointEnvValue`: reads the endpoint from `FrontmatterConfig`
  - `extractOTLPConfigFromRaw`: reads `observability.otlp` directly from the raw `map[string]any` frontmatter, bypassing `ParseFrontmatterConfig`, so OTLP injection works even when parsing fails due to complex engine/tool configs
  - `(*Compiler).injectOTLPConfig`: uses the raw frontmatter as the primary source (with `ParsedFrontmatter` as fallback); auto-adds the domain to `NetworkPermissions.Allowed` (AWF firewall) and appends `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_SERVICE_NAME=gh-aw`, and `OTEL_EXPORTER_OTLP_HEADERS` (when configured) to the workflow-level `env:` block
- `frontmatter_types.go`: `FrontmatterConfig.Engine` changed from `string` to `any` so object-style engine configs (e.g. `{id: copilot, max-continuations: 2}`) no longer cause `ParseFrontmatterConfig` to fail silently
- `compiler_orchestrator_workflow.go`: a single call to `injectOTLPConfig` after `extractYAMLSections`; no-op when unconfigured
- `compiler_yaml_step_generation.go`: the setup step gets `id: setup` and `job-name: ${{ github.job }}` and accepts a `traceID` parameter for cross-job correlation; the script-mode setup step gains an `INPUT_JOB_NAME: ${{ github.job }}` env var; the new `generateScriptModeCleanupStep()` generates a `Clean Scripts` step (`if: always()`) that runs `clean.sh` with `INPUT_JOB_NAME` and `INPUT_DESTINATION`
- `setup-trace-id` output through all downstream jobs via `needs.activation.outputs.setup-trace-id`:
  - `setup-trace-id: ${{ steps.setup.outputs.trace-id }}` as a job output
  - `needs.activation.outputs.setup-trace-id` as `trace-id` input and exposes its own `setup-trace-id` output
  - `activation` to their `needs` and pass `needs.activation.outputs.setup-trace-id` directly
  - `needs.activation.outputs.setup-trace-id`

JavaScript instrumentation
- `actions/setup/js/send_otlp_span.cjs`:
  - `gh-aw.job.setup` span to `OTEL_EXPORTER_OTLP_ENDPOINT/v1/traces` using native `fetch` (Node 24, no extra dependencies)
  - `name` is `"gh-aw"` and `version` is read from `GH_AW_INFO_VERSION` at runtime
  - `parseOTLPHeaders(raw)`: parses the OTel spec `key=value,key=value` format with percent-decoding; correctly handles Sentry-style headers where the value contains spaces and embedded `=` signs (e.g. `x-sentry-auth=Sentry sentry_key=<key>`)
  - `sendOTLPSpan` retries up to 3 times with exponential back-off (100 ms, 200 ms) and warns via `console.warn` on failure instead of throwing, so OTLP failures can never break a workflow
  - `sendJobSetupSpan`: trace ID priority is `options.traceId` > `INPUT_TRACE_ID` > `aw_info.context.otel_trace_id` (the parent's trace from `aw_context`) > newly generated; returns `{ traceId, spanId }` for cross-job correlation; span attributes include `gh-aw.job.name`, `gh-aw.workflow.name`, `gh-aw.engine.id`, `gh-aw.run.id`, `gh-aw.run.actor`, and `gh-aw.repository`
  - `sendJobConclusionSpan(spanName)`: reads `aw_info.json` for workflow/engine/model/version metadata and `GH_AW_EFFECTIVE_TOKENS` for token usage; reads `GITHUB_AW_OTEL_TRACE_ID` and `GITHUB_AW_OTEL_PARENT_SPAN_ID` from env to establish parent-child span relationships
  - `buildOTLPPayload`: accepts an optional `parentSpanId` field (omitted for root spans)
  - `appendToOTLPJSONL(payload)`: mirrors every span payload as a JSON line to `/tmp/gh-aw/otel.jsonl` for artifact inspection
- `actions/setup/js/action_setup_otlp.cjs` (new): standalone OTLP setup span runner shared by both `index.js` (dev/release/action mode) and `setup.sh` (script mode). It exports `run()` and executes immediately when invoked directly. It writes `trace-id` to `GITHUB_OUTPUT` and `GITHUB_AW_OTEL_TRACE_ID`/`GITHUB_AW_OTEL_PARENT_SPAN_ID` to `GITHUB_ENV`, and logs the endpoint, the sent span IDs, and each env var written. It is a no-op with an informational log when the endpoint is not configured.
- `actions/setup/js/action_conclusion_otlp.cjs` (new): standalone OTLP conclusion span runner shared by both `post.js` (dev/release/action mode) and `clean.sh` (script mode). It exports `run()` and executes immediately when invoked directly. The span is named after `INPUT_JOB_NAME` when set, and the runner logs the endpoint, span name, and completion. It is a no-op with an informational log when the endpoint is not configured.
- `actions/setup/js/aw_context.cjs`: `buildAwContext()` now includes `otel_trace_id: process.env.GITHUB_AW_OTEL_TRACE_ID` so dispatched child workflows receive the parent's trace ID in the `aw_context` payload
- `actions/setup/action.yml`: added a `job-name` input, an optional `trace-id` input (32-char hex), and a `trace-id` output for cross-job correlation
- `actions/setup/index.js`: delegates the OTLP setup span to `action_setup_otlp.cjs` via `await run()`; passes `GH_AW_SKIP_SETUP_OTLP=1` when spawning `setup.sh` to prevent the double-span issue in action mode
- `actions/setup/post.js`: delegates the OTLP conclusion span to `action_conclusion_otlp.cjs` via `await run()` before `/tmp/gh-aw` cleanup
- `actions/setup/setup.sh`: captures `SETUP_START_MS` at the top; calls `node "${DESTINATION}/action_setup_otlp.cjs"` at the end with surrounding echo messages; skips the OTLP block when `GH_AW_SKIP_SETUP_OTLP=1` (set by `index.js` in action mode)
- `actions/setup/clean.sh` (new): script-mode mirror of `post.js`. It calls `node "${DESTINATION}/action_conclusion_otlp.cjs"` with surrounding echo messages, then removes `/tmp/gh-aw/`. The compiler injects it as a `Clean Scripts` step (`if: always()`) in every job.

Smoke workflows
- `smoke-copilot.md` and `smoke-copilot-arm.md`: configured with `observability.otlp.endpoint: ${{ secrets.GH_AW_OTEL_ENDPOINT }}` and `observability.otlp.headers: ${{ secrets.GH_AW_OTEL_HEADERS }}`

Usage
The compiler automatically wires cross-job trace correlation. Every job (detection, safe-outputs, cache, repo-memory, publish-assets, unlock, notify-comment, QMD) receives the activation job's `setup-trace-id` directly via `needs.activation.outputs.setup-trace-id`, so all spans in a workflow run share the same OTLP trace ID, which originates in the activation job. Within each job, the setup step writes `GITHUB_AW_OTEL_TRACE_ID` and `GITHUB_AW_OTEL_PARENT_SPAN_ID` to `$GITHUB_ENV` so that all subsequent steps inherit the parent span context: 1 trace per run, 1 parent span per job.

The conclusion span is emitted by `post.js` (dev/release/action mode) and `clean.sh` (script mode); both call the same `action_conclusion_otlp.cjs` module. When a static URL is given as the endpoint, its hostname is automatically allowlisted in the AWF firewall.

Composite action trace propagation
When a workflow dispatches a child workflow via `dispatch_workflow`, the parent's `GITHUB_AW_OTEL_TRACE_ID` is included in the `aw_context` payload as `otel_trace_id`. The child workflow's setup step reads this value from `aw_info.context.otel_trace_id` and reuses it as the trace ID, connecting all composite-action spans under a single OTLP trace.
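The trace ID priority listed for `sendJobSetupSpan` is `options.traceId` > `INPUT_TRACE_ID` > `aw_info.context.otel_trace_id` > newly generated. It can be sketched as a first-valid-candidate search. This sketch assumes candidates are normalized to lowercase and validated as 32-char hex (the format the `trace-id` input expects), and that the all-zero ID is rejected because W3C Trace Context treats it as invalid. `resolveTraceId`, `looksLikeTraceId`, and `newSpanId` are illustrative names, not confirmed exports of the PR.

```javascript
// Sketch: choose the trace ID for a job's setup span, highest priority first.
const crypto = require("crypto");

/** 32 lowercase hex chars (16 bytes) and not all zeros. */
function looksLikeTraceId(value) {
  return typeof value === "string" && /^[0-9a-f]{32}$/.test(value) && !/^0+$/.test(value);
}

function resolveTraceId({ optionsTraceId, env = process.env, awInfo } = {}) {
  const candidates = [
    optionsTraceId, // explicit caller override
    env.INPUT_TRACE_ID, // trace-id input, e.g. the activation job's setup-trace-id
    awInfo?.context?.otel_trace_id, // parent workflow's trace, carried in aw_context
  ];
  for (const candidate of candidates) {
    const normalized = typeof candidate === "string" ? candidate.trim().toLowerCase() : "";
    if (looksLikeTraceId(normalized)) return normalized;
  }
  // No usable parent: this job starts a new trace.
  return crypto.randomBytes(16).toString("hex");
}

/** OTLP span IDs are 8 bytes (16 hex chars). */
function newSpanId() {
  return crypto.randomBytes(8).toString("hex");
}

module.exports = { resolveTraceId, newSpanId, looksLikeTraceId };
```

In a dispatched child workflow with no `trace-id` input, the third candidate is the one that links its spans to the parent run. Skipping an invalid input, rather than failing, keeps the job running with a fresh trace.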