merge upstream changes #1
Open
CoolDuke wants to merge 449 commits into
…support TLS settings for Remote Read Co-authored-by: Dana Pruitt <dpruitt@vmware.com> Co-authored-by: Jeremy Alvis <jalvis@vmware.com>
Signed-off-by: Dana Pruitt <dpruitt@vmware.com>
NodeReplacer now treats a subquery as effectively a new EvalStmt. This dramatically simplifies the codebase (we no longer have to handle all the various edge cases) and lets us support subqueries generically. The only catch is that the data passed down is interpreted with the step requested in the query (which is what the spec calls for), unlike prom, which fetches all raw data when querying its own tsdb directly. Because of this there are some slight deltas from the interpolation of step values in promql, but this is unfortunately unavoidable. Fixes #273
This also adds multi-arch support
Unfortunately upstream has effectively broken the remote_write client (it only supports sending from a WAL directory), so for now I've updated the fork of the old remote client. In the future I may replace this with some other client (not sure how that'll jibe with configs etc.), but for now this seems to work. Fixes #386
Without this option we have a 0s LookbackDelta which causes the /federate endpoint to query upstreams with a 1s lookback (which is too short). Fixes #397
Previously it was creating another "router" object which was causing the API endpoints to not have the prometheus_http metrics.
In the new parser format (included in the prom fork upgrade) the way MatrixSelectors work changed. Unfortunately this broke functionality: MatrixSelectors now contain VectorSelectors, so the tree replacement ended up treating them as VectorSelectors. This changes the VectorSelector handling in NodeReplacer to ignore VectorSelectors that are children of a MatrixSelector. Fixes #409
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.58.3 to 1.79.3. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](grpc/grpc-go@v1.58.3...v1.79.3) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.79.3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.41.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.41.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel dependency-version: 1.41.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Introduces logo.svg — a triple-chevron mark in Prometheus orange (#E6522C) paired with the "promxy" wordmark in Inter SemiBold at slate #3F4448. Wordmark glyphs are baked as SVG paths so the file renders identically without depending on system font availability. The logo is also linked from the top of README.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps the prometheus dependency from the 2.37 LTS line to 3.5 LTS via the local jacksontj/prometheus fork (replaced as a path replace during the migration; switch to a tagged remote replace before shipping).
Major upstream changes adapted in this commit:
- Logger: prometheus/common/promlog -> promslog; go-kit/log -> log/slog. pkg/logging is rewritten as a logrus-backed slog.Handler.
- storage.Warnings -> annotations.Annotations across querier/series-set code paths and remote read/codec.
- storage.Querier.Select gained a leading context.Context. storage.Queryable.Querier dropped its context argument. storage.Querier.LabelNames/LabelValues gained context + LabelHints.
- storage.Appender gained AppendHistogram, AppendHistogramCTZeroSample, AppendCTZeroSample, UpdateMetadata, SetOptions. The remote_write appender stubs continue to return "not implemented" for histogram appends; recording-rule histogram output is left as a follow-up.
- chunkenc.Iterator.Next/Seek now return ValueType, plus AtHistogram / AtFloatHistogram / AtT methods.
- labels.Labels is no longer a slice; replaced index/range with .Range, .Len, .IsEmpty, ScratchBuilder, FromStrings, EmptyLabels.
- relabel.Process now returns (Labels, keep bool); call sites updated.
- prometheus/common/sigv4 moved to prometheus/sigv4.
- HTTP auth helpers now take SecretReader (NewInlineSecret / NewFileSecret).
- discovery.NewManager and scrape.NewManager take registry/SD-metrics and now multi-return.
- web.TLSStruct -> web.TLSConfig (exporter-toolkit).
- web.SetReady takes web.ReadyStatus instead of bool.
- web.api.v1.NewAPI signature expanded with scrape-pools, OTLP, and notifications hooks; threaded with sensible nil defaults.
- promql.Engine.NewRangeQuery takes a leading context and a QueryOpts interface (use NewPrometheusQueryOpts).
- common/version.NewCollector moved to client_golang/.../collectors/version.
- promql.Test removed; the in-package end-to-end tests are stubbed with t.Skip and a TODO pending an exported promqltest.NewTest in the fork.
Native histogram support (closes/refs #637):
The 3.x chunkenc.Iterator stubs are filled in so promxy can fan out, merge, and return native histograms alongside floats:
- pkg/promclient/iterators.go: SeriesIterator merge-walks float and histogram samples on a model.SampleStream; Seek now respects the current position instead of unconditionally advancing.
- pkg/promclient/histogram_convert.go: floatHistogramToSampleHistogram / sampleHistogramToFloatHistogram bridge histogram.FloatHistogram and the API-facing model.SampleHistogram, with a pointer-keyed side channel (finalizer-cleaned) that pins the original FloatHistogram so the remote_read fanout preserves full schema fidelity. The HTTP-API JSON fanout falls back to a best-effort custom-buckets reconstruction.
- pkg/remote/codec.go: ToQueryResult / FromQueryResult and concreteSeriesIterator round-trip prompb.Histogram via FromIntHistogram / FromFloatHistogram and ToIntHistogram / ToFloatHistogram.
- pkg/promclient/api.go: PromAPIRemoteRead.GetValue extracts histogram samples via AtFloatHistogram alongside floats.
- pkg/promclient/engine.go: ParserValueToModelValue populates model.SampleStream.Histograms from promql.HPoint so engine-evaluated queries return histogram results.
- pkg/promhttputil/merge.go: anti-affinity dedup runs in parallel for the Histograms slice; MergeValues Vector branch propagates the Sample.Histogram pointer when both sides carry histograms.
- Test coverage: pkg/promclient/histogram_convert_test.go (4 cases), pkg/promhttputil/merge_test.go (4 histogram merge cases), pkg/remote/codec_test.go (round-trip with int + float histograms), test/promql_test.go re-enables native_histograms.test (285 evals) through the remote_read config. histograms.test is held back -- 7 of its 105 evals fail in engine-annotation propagation and zero-bucket result encoding paths, tracked as a follow-up.
Remaining follow-ups:
- AppendHistogram on the remote_write storage so recording rules can produce native histograms;
- lossless plumbing for the histograms.test failures called out above;
- an exported promqltest.NewTest in the fork;
- re-attaching glog/klog to the slog logger.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps the prometheus replace pin to the merged tip of jacksontj/prometheus@release-3.5_fork_promxy (99c77ced4af1), which now carries the cherry-picked upstream prometheus/prometheus migration of discovery/aws to aws-sdk-go-v2:
- prometheus/prometheus#16950 "Upgrade AWS SDK to v2"
- prometheus/prometheus#17355 "discovery/ec2: Fix AWS SDK v2 credentials handling for EC2 and Lightsail discovery"
aws-sdk-go v1 was previously pulled in transitively through discovery/aws (registered via discovery/install in cmd/promxy/main.go); prometheus/sigv4, the only other AWS-touching dep, was already on v2. After this bump, github.com/aws/aws-sdk-go is no longer in the module graph, addressing the EOL/CVE concerns in #738.
Fixes #738
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pkg/remote/ fork existed pending resolution of prometheus/prometheus#5523, which has since been resolved -- promxy already imports the upstream package directly for the read path (pkg/promclient, pkg/servergroup). This deletes the fork and switches the remote_write path in proxystorage to use upstream as well.
Differences vs the fork:
- Upstream NewStorage requires a WAL directory. When --storage.path is set, it is used as the WAL base (durable across restarts). Otherwise a per-process os.MkdirTemp directory is created and removed on shutdown, preserving the fork's "no persistence across restarts" behavior.
- Upstream requires a ReadyScrapeManager; promxy has none, so a noop implementation that returns an error is supplied. Upstream tolerates this and skips metadata sending.
- Metrics move from the promxy_remote_storage_* namespace to the upstream prometheus_remote_storage_* namespace. The upstream metric set is significantly richer (per-queue counters, retry tracking, etc.). Dashboards and alerts referencing the old names will need updating.
- promxy now picks up upstream features the fork lacked: native histogram support, remote_write v2 protocol, SigV4/AzureAD/GoogleIAM auth on the write path, and ongoing upstream improvements.
Also rename --storage.tsdb.path to --storage.path. Promxy has no TSDB; the flag's only uses are local working state (active-query tracker file and now the remote_write WAL). --storage.tsdb.path is retained as a deprecated alias: passing it logs a warning and sets --storage.path. Passing both is fatal to avoid silent ambiguity.
Closes: #389
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds cmd/promxy/multi_tenant.conf demonstrating how to model a multi-tenant downstream (e.g. Mimir, Cortex, GEM) where a single backend serves different data per X-Scope-OrgID. The example splits each (backend, tenant) pair into its own server_group with a static `dc` label, so promxy's "same data within a server_group" assumption holds and label-less aggregations like count(up) are not under-counted. Also calls this out near the http_headers documentation in config.yaml so users hitting the multi-tenant case discover the example without tripping over issue #703 first. Refs: #703
The promql tests were intermittently failing with `connection refused` to
localhost:8083. Two issues:
1. startAPIForTest signaled "started" by closing a channel inside the
server goroutine *before* srv.ListenAndServe was called, so the
caller could fire requests before the listener was bound.
2. Subtests share a hardcoded port and create/destroy a server per
case. Rapid bind/unbind cycles hit TIME_WAIT, occasionally rejecting
the next bind.
Switch startAPIForTest to net.Listen on 127.0.0.1:0 (synchronous bind,
OS-assigned port) and srv.Serve(ln). Return the bound addr so callers
can format it into the YAML config. Both TestUpstreamEvaluations and
TestEvaluations are now stable across 5 sequential runs and -count=3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Until now NodeReplacer bailed out as soon as any VectorSelector in the
subtree carried an @ timestamp, sending all such queries through the
slow Querier.Select fallback. That's correct but expensive — every
`sum(metric @ T)`, `rate(foo[1m] @ T)`, etc. fetched raw samples and
computed locally.
Lift the @-modifier deferral guards in NodeReplacer:
- SubqueryExpr.Timestamp is now used as the basis for subEvalStmt
.End / .Start so the inner eval window matches the @-pinned time
instead of the outer eval range. Unblocks at_modifier.test cases
like `sum_over_time(...[100s:1s] @ 100 offset 20s)` (was returning
132 instead of 288).
- The top-level timestampFinder skip is replaced with a
`subtreeHasAt` flag that gates the strip-offset / time-shift
dance. removeOffsetFn is a no-op when @ is present (the downstream
resolves `@ T offset O` itself; stripping the offset would
silently move the lookup time). reqOffset (request shift) and
synthOffset (offset on the synthesized replacement
VectorSelector) are zeroed for @-bearing subtrees so the request
range stays unchanged and the engine looks samples up at the
request timestamps.
- count(@) goes through the same in-place n.Op = SUM rewrite as
before; the regression that previously kept it deferred turned
out to be an upstream prometheus/common bug (see below), not a
promxy issue.
- Bare VectorSelector @ T synthesises a flat VectorSelector with no
@ and no offset, samples positioned at the request timestamps;
the engine looks them up by ts directly instead of re-applying
the @ pin and offset to a sample set that's already
step-invariant.
Pushdown coverage now includes:
- sum / min / max / topk / bottomk / group(@)
- avg(@) (rewritten to sum/count, both pushed)
- count(@), count_values(@)
- rate(@), irate(@), other Calls
- aggregate(@) <op> literal BinaryExprs
- bare VectorSelector @ T
- SubqueryExpr with @ on the subquery
Info-level annotation propagation
---------------------------------
Pushing rate(@) down ran into a Prometheus 3.x annotation gap:
client_golang's v1 client only parses the `warnings` field of the
JSON response and drops `infos` (where `rate()` on a non-_total
metric, the histogram_quantile non-monotonic-bucket hint, etc. show
up). PromAPIV1 now also holds the lower-level api.Client; when
populated, Query / QueryRange bypass the typed v1.API methods and
post directly via api.Client.Do, decoding both `warnings` and
`infos` and concatenating them with their "PromQL warning: " /
"PromQL info: " prefixes preserved. promhttputil.WarningsConvert
reclassifies them at the consumer so CountWarningsAndInfo and the
test framework's checkAnnotations can tell them apart again. Falls
back to the v1.API path when Client is nil for backwards-compat
with stub-API tests.
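As a rough illustration of decoding both annotation fields, the sketch below parses a query response envelope with the standard library and keeps the prefixes so a consumer can tell the two kinds apart later. The `apiResponse` type and the `decodeAnnotations` / `isInfo` helpers are assumptions made for this example, not promxy's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// apiResponse models only the envelope fields relevant here; the data payload
// is left undecoded.
type apiResponse struct {
	Status   string          `json:"status"`
	Data     json.RawMessage `json:"data"`
	Warnings []string        `json:"warnings,omitempty"`
	Infos    []string        `json:"infos,omitempty"`
}

// decodeAnnotations returns the warnings followed by the infos, with their
// original prefixes left untouched.
func decodeAnnotations(body []byte) ([]string, error) {
	var r apiResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(r.Warnings)+len(r.Infos))
	out = append(out, r.Warnings...)
	out = append(out, r.Infos...)
	return out, nil
}

// isInfo reclassifies a concatenated annotation by its prefix.
func isInfo(annotation string) bool {
	return strings.HasPrefix(annotation, "PromQL info: ")
}

func main() {
	body := []byte(`{"status":"success","data":{},` +
		`"warnings":["PromQL warning: example warning"],` +
		`"infos":["PromQL info: example info"]}`)
	anns, err := decodeAnnotations(body)
	if err != nil {
		panic(err)
	}
	for _, a := range anns {
		fmt.Printf("info=%v %s\n", isInfo(a), a)
	}
}
```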
Upstream JSON timestamp bug
---------------------------
prometheus/common's model.Time.UnmarshalJSON mis-decodes pre-epoch
sub-second timestamps. For the JSON literal "-59.200" it returns
Time(-58800) instead of Time(-59200) because the negative sign on
the integer part is not propagated to the fractional part. The bug
is still present in v0.67.5 (latest as of writing) and is harmless
in production (Unix timestamps are non-negative) but corrupts
pushed-down range query results when the upstream promqltest
framework's @-modifier multi-evalTime sweep generates
negative-fractional range starts (only seen on collision.test, eval
at 4s, where the sweep visits evalTime=0.8s).
Rather than vendoring a patch or silently producing shifted data,
PromAPIV1.Query / QueryRange reject pre-epoch sub-second timestamps
with an explicit error
(errNegativeFractionalTimestamp). Production never hits this path;
collision.test joins the existing skip list with a comment.
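The sign-propagation problem can be reproduced with a simplified parser. This is a sketch of the failure mode as described above, not the prometheus/common source; `naiveMillis`, `signedMillis`, and `fracMillis` are hypothetical helpers written for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fracMillis converts the fractional digits of a decimal-seconds string to
// milliseconds, truncating or zero-padding to exactly three digits.
func fracMillis(frac string) (int64, error) {
	if frac == "" {
		return 0, nil
	}
	if len(frac) > 3 {
		frac = frac[:3]
	}
	for len(frac) < 3 {
		frac += "0"
	}
	return strconv.ParseInt(frac, 10, 64)
}

// naiveMillis parses the signed integer part and then adds the fraction as a
// positive number, which is wrong for negative inputs.
func naiveMillis(s string) (int64, error) {
	whole, frac, _ := strings.Cut(s, ".")
	sec, err := strconv.ParseInt(whole, 10, 64)
	if err != nil {
		return 0, err
	}
	ms, err := fracMillis(frac)
	if err != nil {
		return 0, err
	}
	return sec*1000 + ms, nil
}

// signedMillis parses the magnitude first and applies the sign to the whole
// value, so the fraction moves in the same direction as the integer part.
func signedMillis(s string) (int64, error) {
	neg := strings.HasPrefix(s, "-")
	whole, frac, _ := strings.Cut(strings.TrimPrefix(s, "-"), ".")
	sec, err := strconv.ParseInt(whole, 10, 64)
	if err != nil {
		return 0, err
	}
	ms, err := fracMillis(frac)
	if err != nil {
		return 0, err
	}
	v := sec*1000 + ms
	if neg {
		v = -v
	}
	return v, nil
}

func main() {
	bad, _ := naiveMillis("-59.200")
	good, _ := signedMillis("-59.200")
	fmt.Println(bad, good) // -58800 -59200
}
```

For non-negative inputs the two functions agree, which is consistent with the bug being invisible on ordinary Unix timestamps.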
TestNodeReplacer pins the new contract for both pushed-down shapes
and the deferred ones.
Issue: #724
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.43.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.43.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.43.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/go-viper/mapstructure/v2](https://github.com/go-viper/mapstructure) from 2.2.1 to 2.4.0. - [Release notes](https://github.com/go-viper/mapstructure/releases) - [Changelog](https://github.com/go-viper/mapstructure/blob/main/CHANGELOG.md) - [Commits](go-viper/mapstructure@v2.2.1...v2.4.0) --- updated-dependencies: - dependency-name: github.com/go-viper/mapstructure/v2 dependency-version: 2.4.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Prometheus' web.New() unconditionally indexes ListenAddresses[0] when building GlobalURLOptions, so promxy panicked at startup since the field was never initialized. Promxy doesn't use the embedded listeners (it runs its own server), but the slice still needs at least one entry; populate it with the configured bind address so the API reports a sensible value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…port Switch --web.config.file to the standard Prometheus web.config schema by handing the listener off to github.com/prometheus/exporter-toolkit/web.Serve, mirroring how Prometheus itself wires its web server. Promxy now supports TLS, response headers (http_server_config.headers), and basic auth users (basic_auth_users) from a single config file, with no bespoke parsing on our side. Replaces the approach in #744, which added a second --prometheus.web.config.file flag alongside the existing one and predated the upgrade to exporter-toolkit v0.14. The flag is marked [EXPERIMENTAL]; existing config files using the flat schema must be wrapped under `tls_server_config:`. A small slog -> logrus handler forwards the toolkit's startup messages into promxy's existing log stream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Updated the server group struct and logging to
utilize this new field, improving clarity in target transition logs.
Signed-off-by: Satyam Bhardwaj <stmbhardwaj@gmail.com>
Incorporates the changes requested on #743:
- groupIdentifier always includes the ordinal (guaranteed unique) and appends the optional, non-unique name when set, rather than using one or the other. The structured log fields follow suit (always `ordinal`, `name` only when set).
- Inline the currentTargetCount / initialLoad locals at the logTargetTransition call site.
Also adds a `server_group_targets` gauge (labelled by ordinal and name) set on every discovery update, so an empty server group can be alerted on directly (e.g. `server_group_targets == 0`) as suggested in #742 -- a metric complements the log line, which isn't alertable on its own.
Adds tests for groupIdentifier and the new metric (zero and non-zero target counts).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…list item The `- # ...` form read awkwardly; place the name field after static_configs with its comment above it, matching the convention used for every other field in the example. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#771) The prom 3.5 migration replaced promxy's pkg/remote/ fork with upstream storage/remote. The fork's appender pushed samples directly into each QueueManager; upstream instead expects samples to be written to an on-disk WAL that the QueueManagers' WAL watchers tail. promxy has no local TSDB writing a WAL, so: - the watcher failed every tick with `error tailing WAL ... <dir>/wal: no such file or directory` (the `wal` subdir was never created), and - upstream's Appender is a timestampTracker that discards sample values, so alert/recording-rule samples were never shipped at all. Run an agent-mode (WAL-only) tsdb DB alongside remote.Storage, pointed at the same dir. Its appender writes the WAL records that the queue managers consume, and it handles series management, checkpointing and WAL truncation. Because the agent appender is single-use (returns itself to a pool on Commit), ProxyStorage.Appender now hands out a fresh appender per call by storing the Appendable rather than a shared Appender. The agent DB and remote.Storage are reused across config reloads (same as before) and closed together on shutdown. Adds a regression test asserting the WAL dir is created, appends are accepted by the WAL-backed appender, and successive Appender() calls return distinct instances. Closes: #771 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stands up a real remote_write receiver (httptest server that decodes the snappy/protobuf write requests) and a ProxyStorage configured with remote_write but no server_groups, then appends on an interval as the rule manager would. Asserts the receiver actually observes the shipped samples with the expected value. This covers the full path that issue #771 broke -- appender -> agent WAL -> queue manager WAL watcher -> HTTP POST -- which the unit test only exercises at the wiring level. Verified to fail (30s timeout, 0 POSTs, "error tailing WAL ... no such file or directory") against the pre-fix code and pass after. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The migration to exporter-toolkit (0cabe7f) changed --web.config.file parsing from promxy's flat web.TLSConfig schema (cert_file/key_file at the top level) to the exporter-toolkit web.Config schema, which nests TLS under tls_server_config. This silently broke every existing deployment's web config, failing startup with a cryptic "field cert_file not found in type web.Config" error. Detect the legacy flat schema before handing off to exporter-toolkit and serve it directly (as promxy did before the migration), logging a deprecation warning that points users at tls_server_config. Files using the current schema, empty files, and basic_auth_users/http_server_config-only files are untouched and still flow through exporter-toolkit. Adds a regression test that brings up a TLS server from a legacy flat config and verifies an authenticated client can connect. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… under 32 MiB (#781) Prometheus 3.5.3+ enforces a 32 MiB cap on the snappy-decoded body of remote_write requests on the receiver (GHSA-8rm2-7qqf-34qm). promxy's old vendored remote_write fork defaulted max_samples_per_send to 100; commit 01b9e33 switched to upstream prometheus/storage/remote, whose default is 2000. Upstream batches by sample count only (no byte cap), so shipping recording-rule output with large/high-cardinality series in 2000-sample batches can decompress past 32 MiB. The receiver then rejects the request with "snappy: decoded length N exceeds limit 33554432" and the queue manager drops the batch (non-recoverable 4xx), silently losing samples. Restore promxy's historical default of 100 when the user does not set queue_config.max_samples_per_send explicitly. Because the base unmarshal always populates QueueConfig from Prometheus' DefaultQueueConfig, an explicit value is indistinguishable from the upstream default after the fact, so we re-parse the raw YAML to detect which remote_write entries set it and only override the unset ones. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds an `inject_matchers` server_group option that injects a static set of
label matchers into every selector of every request sent to that downstream
(query, query_range, series, label_names, label_values, and remote-read).
For example, with `cluster="A"` configured, `count(up)` is sent downstream
as `count(up{cluster="A"})` -- scoping the server_group to a slice of the
data even for queries that never reference `cluster`.
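To make the rewrite concrete, here is a small sketch of the idea on a toy selector type. It is illustrative only: the `matcher` and `selector` types and `injectMatchers` are hypothetical, handle equality matchers only, and stand in for whatever PromQL AST rewriting the real implementation performs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// matcher is a toy equality matcher (name="value").
type matcher struct{ Name, Value string }

// selector is a toy vector selector: a metric name plus label matchers.
type selector struct {
	Metric   string
	Matchers []matcher
}

// String renders the selector in PromQL-like syntax.
func (s selector) String() string {
	if len(s.Matchers) == 0 {
		return s.Metric
	}
	parts := make([]string, len(s.Matchers))
	for i, m := range s.Matchers {
		parts[i] = m.Name + "=" + strconv.Quote(m.Value)
	}
	return s.Metric + "{" + strings.Join(parts, ",") + "}"
}

// injectMatchers returns a copy of sel with the configured matchers appended.
// The input selector is left unmodified.
func injectMatchers(sel selector, inject []matcher) selector {
	out := selector{Metric: sel.Metric}
	out.Matchers = append(out.Matchers, sel.Matchers...)
	out.Matchers = append(out.Matchers, inject...)
	return out
}

func main() {
	configured := []matcher{{Name: "cluster", Value: "A"}}
	up := selector{Metric: "up"}
	fmt.Printf("count(%s)\n", injectMatchers(up, configured)) // count(up{cluster="A"})
}
```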
This fills the gap between the existing mechanisms:
- `labels` only *adds* labels to responses
- `label_filter` only *drops* queries that can't match
- `inject_matchers` always *adds* the matchers to the queries themselves
The motivating use-case (#698) is presenting a per-tenant view of a merged
downstream (e.g. a single Mimir/Thanos/Prometheus holding many clusters) by
running a promxy per tenant.
The matchers are injected beneath the existing label-manipulation wrappers
so they reach the downstream verbatim, without interacting with
label_filter's query filtering or metrics_relabel's matcher reversal.
Closes #698
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…esSet Decode the Prometheus v1 JSON API response straight into a storage.SeriesSet via json-iterator, rather than going through client_golang's model.Value (Vector/Matrix/Scalar with model.Metric maps) and stdlib encoding/json's checkValid. Metrics are read directly into a labels.ScratchBuilder and samples (float and native histogram) without model.SamplePair, which is materially faster on the query/federate hot path and avoids the model.Metric -> labels.Labels re-conversion on the way into the prometheus-native engine. Native histograms are decoded into the SeriesSet, and the decoder is client_golang-free and self-contained so it can be lifted into its own reusable package. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Extracts the streaming decoder into a new, self-contained pkg/promapi and adds
a thin HTTP Client on top of it:
- Client.Query / QueryRange / Select return storage.SeriesSet directly,
decoded via DecodeSeriesSet (no model.Value, no model.Metric maps).
- Pluggable *http.Client transport (TLS/auth/timeouts are the caller's).
- Depends only on prometheus storage/labels/model + json-iterator -- no
client_golang -- so it is reusable on its own and composes with the
prometheus storage ecosystem.
This is the basis for promxy's new bottom layer (replacing PromAPIV1's
client_golang transport + model.Value decode) and is independently publishable.
Tests: decode correctness (vector/matrix/scalar/specials/native histograms),
error envelope, and an httptest round-trip for Query + Select selector building.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…adata) Rounds out pkg/promapi into a full, standalone client_golang replacement: metadata methods alongside the SeriesSet-returning data methods. Series decodes label sets straight into labels.Labels; the rest decode the simple JSON shapes. All still client_golang-free. pkg/promapi is now a complete, reusable Prometheus HTTP API query client. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds MergeSeriesSets, which merges HA-member SeriesSets with anti-affinity dedup by reusing the existing, heavily-tested promhttputil.MergeSampleStream verbatim (converting each same-labeled series to a model.SampleStream, folding, converting back; histograms round-trip losslessly via the floatHistogram pin). Inputs are sorted and combined with storage.NewMergeSeriesSet. Verified to produce identical output to model.Value MergeValues across overlap, hole-filling (base-by-point-count), preferMax, and disjoint-series cases -- so the SeriesSet interface flip can preserve promxy's HA merge semantics exactly. Also exports promapi.FloatSample/HistogramSample so callers can build series. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…el.Value)
Flips the promclient.API data methods (Query/QueryRange/GetValue) from
(model.Value, v1.Warnings, error) to storage.SeriesSet, propagated through the
entire decorator stack and consumers:
- Bottom layer (PromAPIV1) decodes downstream responses via pkg/promapi's
streaming DecodeSeriesSet (no model.Value, no model.Metric maps, no
checkValid); PromAPIRemoteRead returns the remote.Read SeriesSet.
- MultiAPI merges HA members with MergeSeriesSets (the validated anti-affinity
adapter); warnings/errors ride inside the SeriesSet.
- Decorators (error_wrap, ignore_error, downgrade, debug, recover, engine,
time_truncate, timefilter, label, labelfilter, metric_relabel,
inject_matchers) convert via small shared wrappers (MapErr/MapLabels/
DowngradeErr/WithWarnings) instead of rewriting model.Metric.
- Consumers: proxyquerier.Select returns the client SeriesSet directly;
proxystorage pushdown assigns it straight to UnexpandedSeriesSet, deleting
every IteratorsForValue/NewSeriesSet dance; alertbackfill materializes via
SeriesSetToMatrix. Dead iterators.go / proxyquerier/series.go removed.
Correctness fixes found under the full integration suite:
- promapi decode re-wraps downstream warning/info strings with the
annotations.PromQLWarning/PromQLInfo sentinels (via toAnnotationError)
instead of plain errors.New, so the engine can classify info-vs-warning
(fixes "unexpected annotation type" on offset queries).
- PromAPIRemoteRead.GetValue materializes the SeriesSet before returning: the
streamed/chunked remote_read variant is lazy and reads from an HTTP body
whose context is canceled once GetValue returns ("context canceled" on
iteration). It is drained while the context is alive.
- Series are built with a copy-on-read float-histogram iterator
(promapi.NewSeries / NewListSeriesIteratorWithCopy) so a hint reused across
repeated histograms can't observe an aliased pointer; float reads stay
zero-copy.
go test ./... is green, including the full test/ integration suite.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pairs the shipped BenchmarkDecodeSeriesSet with BenchmarkDecodeModelValueStdlib, the pre-refactor baseline (stdlib encoding/json into a model.Matrix, the shape queryWithInfos decoded). Same input body, so the delta is exactly what dropping model.Value for a streaming SeriesSet decode buys: ~7x faster, ~3x fewer bytes allocated on a 5000-series matrix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ecorators
Adds focused unit tests for the paths the integration suite caught but that lacked direct coverage:
- promapi: assert decoded warnings/infos classify via errors.Is against annotations.PromQLWarning/PromQLInfo (guards the "unexpected annotation type" fix); cover empty/string/unknown result types.
- materializeSeriesSet: assert it drains the source eagerly so the result still yields every sample after the source goes dead (guards the remote_read "context canceled" fix).
- MergeSeriesSets: cover the no-sets and single-set fast paths (warnings/data preserved).
- seriesset_wrap: cover MapErr/DowngradeErr/WithWarnings/MergeAnnotations/MapLabels, the error/annotation decorators that had no SeriesSet-path tests.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
No description provided.