merge upstream changes#1

Open
CoolDuke wants to merge 449 commits into
freenetdigital:masterfrom
jacksontj:master

Conversation

@CoolDuke CoolDuke commented Jan 3, 2019

Member

No description provided.

Dana Pruitt and others added 29 commits January 19, 2021 14:03
…support TLS settings for Remote Read

Co-authored-by: Dana Pruitt <dpruitt@vmware.com>
Co-authored-by: Jeremy Alvis <jalvis@vmware.com>
Signed-off-by: Dana Pruitt <dpruitt@vmware.com>
Now NodeReplacer treats a subquery as effectively a new EvalStmt. This
dramatically simplifies the codebase (as we don't have to handle all the
various edge cases) and allows us to generically support subqueries. The
only catch here is that the data passed down will be interpreted with
the step requested in the query (which is what the spec calls for),
unlike prom, which actually fetches all raw data when using its own tsdb
directly. Because of this there are some slight deltas based on the
interpolation of step values in promql, but this is unfortunately
unavoidable.

Fixes #273
There were some bugs in calculating offsets within subqueries (and
missing step alignment) which were causing data to show only
for the first 5m of a window (lookback delta).

Fixes #319 #311
This also adds multi-arch support
Unfortunately upstream has decided to effectively break the remote_write
client (it only supports sending from a WAL directory). So for now I've
updated the fork of the old remote client. In the future I may replace
this with some other client (not sure how that'll jibe with configs
etc.) but for now this seems to work.

Fixes #386
Without this option we have a 0s LookbackDelta which causes the /federate endpoint to query upstreams with a 1s lookback (which is too short).

Fixes #397
Previously it was creating another "router" object which was causing the
API endpoints to not have the prometheus_http metrics.
In the new parser format (included in the prom fork upgrade) the way
MatrixSelectors work changed. Unfortunately this broke functionality
as we ended up treating them as VectorSelectors (since MatrixSelectors
now include VectorSelectors -- so the tree replacement handled it). This
changes the VectorSelector handling in the NodeReplacer to ignore
VectorSelectors that are children of a MatrixSelector.

Fixes #409
lewinkedrs and others added 30 commits March 2, 2026 20:12
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.58.3 to 1.79.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.58.3...v1.79.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.41.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.41.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.41.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Introduces logo.svg — a triple-chevron mark in Prometheus orange
(#E6522C) paired with the "promxy" wordmark in Inter SemiBold at
slate #3F4448. Wordmark glyphs are baked as SVG paths so the file
renders identically without depending on system font availability.

The logo is also linked from the top of README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps the prometheus dependency from the 2.37 LTS line to 3.5 LTS via the
local jacksontj/prometheus fork (replaced as a path replace during the
migration; switch to a tagged remote replace before shipping).

Major upstream changes adapted in this commit:

- Logger: prometheus/common/promlog -> promslog; go-kit/log -> log/slog.
  pkg/logging is rewritten as a logrus-backed slog.Handler.
- storage.Warnings -> annotations.Annotations across querier/series-set
  code paths and remote read/codec.
- storage.Querier.Select gained a leading context.Context.
  storage.Queryable.Querier dropped its context argument.
  storage.Querier.LabelNames/LabelValues gained context + LabelHints.
- storage.Appender gained AppendHistogram, AppendHistogramCTZeroSample,
  AppendCTZeroSample, UpdateMetadata, SetOptions. The remote_write
  appender stubs continue to return "not implemented" for histogram
  appends; recording-rule histogram output is left as a follow-up.
- chunkenc.Iterator.Next/Seek now return ValueType, plus AtHistogram /
  AtFloatHistogram / AtT methods.
- labels.Labels is no longer a slice; replaced index/range with .Range,
  .Len, .IsEmpty, ScratchBuilder, FromStrings, EmptyLabels.
- relabel.Process now returns (Labels, keep bool); call sites updated.
- prometheus/common/sigv4 moved to prometheus/sigv4.
- HTTP auth helpers now take SecretReader (NewInlineSecret / NewFileSecret).
- discovery.NewManager and scrape.NewManager take registry/SD-metrics
  and now multi-return.
- web.TLSStruct -> web.TLSConfig (exporter-toolkit).
- web.SetReady takes web.ReadyStatus instead of bool.
- web.api.v1.NewAPI signature expanded with scrape-pools, OTLP, and
  notifications hooks; threaded with sensible nil defaults.
- promql.Engine.NewRangeQuery takes a leading context and a QueryOpts
  interface (use NewPrometheusQueryOpts).
- common/version.NewCollector moved to client_golang/.../collectors/version.
- promql.Test removed; the in-package end-to-end tests are stubbed with
  t.Skip and a TODO pending an exported promqltest.NewTest in the fork.

Native histogram support (closes/refs #637):

The 3.x chunkenc.Iterator stubs are filled in so promxy can fan out, merge,
and return native histograms alongside floats:

- pkg/promclient/iterators.go: SeriesIterator merge-walks float and
  histogram samples on a model.SampleStream; Seek now respects the current
  position instead of unconditionally advancing.
- pkg/promclient/histogram_convert.go: floatHistogramToSampleHistogram /
  sampleHistogramToFloatHistogram bridge histogram.FloatHistogram and
  the API-facing model.SampleHistogram, with a pointer-keyed side channel
  (finalizer-cleaned) that pins the original FloatHistogram so the
  remote_read fanout preserves full schema fidelity. The HTTP-API JSON
  fanout falls back to a best-effort custom-buckets reconstruction.
- pkg/remote/codec.go: ToQueryResult / FromQueryResult and
  concreteSeriesIterator round-trip prompb.Histogram via FromIntHistogram /
  FromFloatHistogram and ToIntHistogram / ToFloatHistogram.
- pkg/promclient/api.go: PromAPIRemoteRead.GetValue extracts histogram
  samples via AtFloatHistogram alongside floats.
- pkg/promclient/engine.go: ParserValueToModelValue populates
  model.SampleStream.Histograms from promql.HPoint so engine-evaluated
  queries return histogram results.
- pkg/promhttputil/merge.go: anti-affinity dedup runs in parallel for
  the Histograms slice; MergeValues Vector branch propagates the
  Sample.Histogram pointer when both sides carry histograms.
- Test coverage: pkg/promclient/histogram_convert_test.go (4 cases),
  pkg/promhttputil/merge_test.go (4 histogram merge cases),
  pkg/remote/codec_test.go (round-trip with int + float histograms),
  test/promql_test.go re-enables native_histograms.test (285 evals)
  through the remote_read config. histograms.test is held back -- 7 of
  its 105 evals fail in engine-annotation propagation and zero-bucket
  result encoding paths, tracked as a follow-up.

Remaining follow-ups: AppendHistogram on the remote_write storage so
recording rules can produce native histograms; lossless plumbing for
the histograms.test failures called out above; an exported
promqltest.NewTest in the fork; re-attaching glog/klog to the slog logger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps the prometheus replace pin to the merged tip of
jacksontj/prometheus@release-3.5_fork_promxy (99c77ced4af1), which now
carries the cherry-picked upstream prometheus/prometheus migration of
discovery/aws to aws-sdk-go-v2:

- prometheus/prometheus#16950 "Upgrade AWS SDK to v2"
- prometheus/prometheus#17355 "discovery/ec2: Fix AWS SDK v2 credentials
  handling for EC2 and Lightsail discovery"

aws-sdk-go v1 was previously pulled in transitively through
discovery/aws (registered via discovery/install in cmd/promxy/main.go);
prometheus/sigv4, the only other AWS-touching dep, was already on v2.
After this bump, github.com/aws/aws-sdk-go is no longer in the module
graph, addressing the EOL/CVE concerns in #738.

Fixes #738

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pkg/remote/ fork existed pending resolution of
prometheus/prometheus#5523, which has since been resolved -- promxy
already imports the upstream package directly for the read path
(pkg/promclient, pkg/servergroup). This deletes the fork and switches
the remote_write path in proxystorage to use upstream as well.

Differences vs the fork:

- Upstream NewStorage requires a WAL directory. When --storage.path is
  set, it is used as the WAL base (durable across restarts). Otherwise
  a per-process os.MkdirTemp directory is created and removed on
  shutdown, preserving the fork's "no persistence across restarts"
  behavior.
- Upstream requires a ReadyScrapeManager; promxy has none, so a noop
  implementation that returns an error is supplied. Upstream tolerates
  this and skips metadata sending.
- Metrics move from the promxy_remote_storage_* namespace to the
  upstream prometheus_remote_storage_* namespace. The upstream metric
  set is significantly richer (per-queue counters, retry tracking,
  etc.). Dashboards and alerts referencing the old names will need
  updating.
- promxy now picks up upstream features the fork lacked: native
  histogram support, remote_write v2 protocol, SigV4/AzureAD/GoogleIAM
  auth on the write path, and ongoing upstream improvements.

Also rename --storage.tsdb.path to --storage.path. Promxy has no TSDB;
the flag's only uses are local working state (active-query tracker
file and now the remote_write WAL). --storage.tsdb.path is retained as
a deprecated alias: passing it logs a warning and sets --storage.path.
Passing both is fatal to avoid silent ambiguity.
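
A minimal sketch of the alias rules described above, assuming both flags
arrive as plain strings where empty means "not passed". The function name
resolveStoragePath is hypothetical and the real flag wiring may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveStoragePath merges the new flag value with its deprecated alias
// (hypothetical helper). It returns the effective path and whether a
// deprecation warning should be logged. Passing both values is an error.
func resolveStoragePath(path, deprecated string) (string, bool, error) {
	switch {
	case path != "" && deprecated != "":
		return "", false, errors.New("--storage.path and --storage.tsdb.path are both set; use only --storage.path")
	case deprecated != "":
		// Alias only: honor it, but ask the caller to warn.
		return deprecated, true, nil
	default:
		return path, false, nil
	}
}

func main() {
	p, warn, err := resolveStoragePath("", "/var/lib/promxy")
	fmt.Println(p, warn, err)

	_, _, err = resolveStoragePath("/a", "/b")
	fmt.Println(err)
}
```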

Closes: #389

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds cmd/promxy/multi_tenant.conf demonstrating how to model a
multi-tenant downstream (e.g. Mimir, Cortex, GEM) where a single
backend serves different data per X-Scope-OrgID. The example splits
each (backend, tenant) pair into its own server_group with a static
`dc` label, so promxy's "same data within a server_group" assumption
holds and label-less aggregations like count(up) are not under-counted.

Also calls this out near the http_headers documentation in
config.yaml so users hitting the multi-tenant case discover the
example without tripping over issue #703 first.

Refs: #703
The promql tests were intermittently failing with `connection refused` to
localhost:8083. Two issues:

  1. startAPIForTest signaled "started" by closing a channel inside the
     server goroutine *before* srv.ListenAndServe was called, so the
     caller could fire requests before the listener was bound.
  2. Subtests share a hardcoded port and create/destroy a server per
     case. Rapid bind/unbind cycles hit TIME_WAIT, occasionally rejecting
     the next bind.

Switch startAPIForTest to net.Listen on 127.0.0.1:0 (synchronous bind,
OS-assigned port) and srv.Serve(ln). Return the bound addr so callers
can format it into the YAML config. Both TestUpstreamEvaluations and
TestEvaluations are now stable across 5 sequential runs and -count=3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Until now NodeReplacer bailed out as soon as any VectorSelector in the
subtree carried an @ timestamp, sending all such queries through the
slow Querier.Select fallback. That's correct but expensive — every
`sum(metric @ T)`, `rate(foo[1m] @ T)`, etc. fetched raw samples and
computed locally.

Lift the @-modifier deferral guards in NodeReplacer:

  - SubqueryExpr.Timestamp is now used as the basis for subEvalStmt
    .End / .Start so the inner eval window matches the @-pinned time
    instead of the outer eval range. Unblocks at_modifier.test cases
    like `sum_over_time(...[100s:1s] @ 100 offset 20s)` (was returning
    132 instead of 288).

  - The top-level timestampFinder skip is replaced with a
    `subtreeHasAt` flag that gates the strip-offset / time-shift
    dance. removeOffsetFn is a no-op when @ is present (the downstream
    resolves `@ T offset O` itself; stripping the offset would
    silently move the lookup time). reqOffset (request shift) and
    synthOffset (offset on the synthesized replacement
    VectorSelector) are zeroed for @-bearing subtrees so the request
    range stays unchanged and the engine looks samples up at the
    request timestamps.

  - count(@) goes through the same in-place n.Op = SUM rewrite as
    before; the regression that previously kept it deferred turned
    out to be an upstream prometheus/common bug (see below), not a
    promxy issue.

  - Bare VectorSelector @ T synthesises a flat VectorSelector with no
    @ and no offset, samples positioned at the request timestamps;
    the engine looks them up by ts directly instead of re-applying
    the @ pin and offset to a sample set that's already
    step-invariant.

Pushdown coverage now includes:

  - sum / min / max / topk / bottomk / group(@)
  - avg(@) (rewritten to sum/count, both pushed)
  - count(@), count_values(@)
  - rate(@), irate(@), other Calls
  - aggregate(@) <op> literal BinaryExprs
  - bare VectorSelector @ T
  - SubqueryExpr with @ on the subquery

Info-level annotation propagation
---------------------------------

Pushing rate(@) down ran into a Prometheus 3.x annotation gap:
client_golang's v1 client only parses the `warnings` field of the
JSON response and drops `infos` (where `rate()` on a non-_total
metric, the histogram_quantile non-monotonic-bucket hint, etc. show
up). PromAPIV1 now also holds the lower-level api.Client; when
populated, Query / QueryRange bypass the typed v1.API methods and
post directly via api.Client.Do, decoding both `warnings` and
`infos` and concatenating them with their "PromQL warning: " /
"PromQL info: " prefixes preserved. promhttputil.WarningsConvert
reclassifies them at the consumer so CountWarningsAndInfo and the
test framework's checkAnnotations can tell them apart again. Falls
back to the v1.API path when Client is nil for backwards-compat
with stub-API tests.

Upstream JSON timestamp bug
---------------------------

prometheus/common's model.Time.UnmarshalJSON mis-decodes pre-epoch
sub-second timestamps. For the JSON literal "-59.200" it returns
Time(-58800) instead of Time(-59200) because the negative sign on
the integer part is not propagated to the fractional part. The bug
is still present in v0.67.5 (latest as of writing) and is harmless
in production (Unix timestamps are non-negative) but corrupts
pushed-down range query results when the upstream promqltest
framework's @-modifier multi-evalTime sweep generates
negative-fractional range starts (only seen on collision.test, eval
at 4s, where the sweep visits evalTime=0.8s).
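
The arithmetic behind the mis-decode can be reproduced with a small
standalone sketch. This is not the prometheus/common implementation;
parseMillis is a hypothetical function that only mirrors the behavior
described above, with a switch to show the sign-propagating variant.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMillis converts a decimal-seconds string such as "-59.200" into
// milliseconds. With propagateSign=false the fraction is always added as a
// positive value, which reproduces the shifted result for negative inputs.
func parseMillis(s string, propagateSign bool) (int64, error) {
	intPart, frac, _ := strings.Cut(s, ".")
	secs, err := strconv.ParseInt(intPart, 10, 64)
	if err != nil {
		return 0, err
	}
	// Pad or truncate the fraction to exactly three digits (milliseconds).
	frac = (frac + "000")[:3]
	ms, err := strconv.ParseInt(frac, 10, 64)
	if err != nil {
		return 0, err
	}
	// Check the raw string so that "-0.5" is handled too (its integer
	// part parses to 0 and carries no sign).
	if propagateSign && strings.HasPrefix(s, "-") {
		ms = -ms
	}
	return secs*1000 + ms, nil
}

func main() {
	buggy, _ := parseMillis("-59.200", false)
	fixed, _ := parseMillis("-59.200", true)
	fmt.Println(buggy) // -58800
	fmt.Println(fixed) // -59200
}
```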

Rather than vendoring a patch or silently producing shifted data,
PromAPIV1.Query / QueryRange reject pre-epoch sub-second timestamps
with an explicit error
(errNegativeFractionalTimestamp). Production never hits this path;
collision.test joins the existing skip list with a comment.

TestNodeReplacer pins the new contract for both pushed-down shapes
and the deferred ones.

Issue: #724

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.43.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.43.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.43.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/go-viper/mapstructure/v2](https://github.com/go-viper/mapstructure) from 2.2.1 to 2.4.0.
- [Release notes](https://github.com/go-viper/mapstructure/releases)
- [Changelog](https://github.com/go-viper/mapstructure/blob/main/CHANGELOG.md)
- [Commits](go-viper/mapstructure@v2.2.1...v2.4.0)

---
updated-dependencies:
- dependency-name: github.com/go-viper/mapstructure/v2
  dependency-version: 2.4.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Prometheus' web.New() unconditionally indexes ListenAddresses[0] when
building GlobalURLOptions, so promxy panicked at startup since the field
was never initialized. Promxy doesn't use the embedded listeners (it
runs its own server), but the slice still needs at least one entry;
populate it with the configured bind address so the API reports a
sensible value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…port

Switch --web.config.file to the standard Prometheus web.config schema by
handing the listener off to github.com/prometheus/exporter-toolkit/web.Serve,
mirroring how Prometheus itself wires its web server. Promxy now supports
TLS, response headers (http_server_config.headers), and basic auth users
(basic_auth_users) from a single config file, with no bespoke parsing on
our side.

Replaces the approach in #744, which added a second --prometheus.web.config.file
flag alongside the existing one and predated the upgrade to
exporter-toolkit v0.14. The flag is marked [EXPERIMENTAL]; existing config
files using the flat schema must be wrapped under `tls_server_config:`.

A small slog -> logrus handler forwards the toolkit's startup messages
into promxy's existing log stream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    - Updated the server group struct and logging to
    utilize this new field, improving clarity in target transition logs.

Signed-off-by: Satyam Bhardwaj <stmbhardwaj@gmail.com>
Incorporates the changes requested on #743:

- groupIdentifier always includes the ordinal (guaranteed unique) and
  appends the optional, non-unique name when set, rather than using one
  or the other. The structured log fields follow suit (always `ordinal`,
  `name` only when set).
- Inline the currentTargetCount / initialLoad locals at the
  logTargetTransition call site.

Also adds a `server_group_targets` gauge (labelled by ordinal and name)
set on every discovery update, so an empty server group can be alerted
on directly (e.g. `server_group_targets == 0`) as suggested in #742 --
a metric complements the log line, which isn't alertable on its own.

Adds tests for groupIdentifier and the new metric (zero and non-zero
target counts).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…list item

The `- # ...` form read awkwardly; place the name field after static_configs
with its comment above it, matching the convention used for every other field
in the example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#771)

The prom 3.5 migration replaced promxy's pkg/remote/ fork with upstream
storage/remote. The fork's appender pushed samples directly into each
QueueManager; upstream instead expects samples to be written to an
on-disk WAL that the QueueManagers' WAL watchers tail. promxy has no
local TSDB writing a WAL, so:

  - the watcher failed every tick with
    `error tailing WAL ... <dir>/wal: no such file or directory`
    (the `wal` subdir was never created), and
  - upstream's Appender is a timestampTracker that discards sample
    values, so alert/recording-rule samples were never shipped at all.

Run an agent-mode (WAL-only) tsdb DB alongside remote.Storage, pointed
at the same dir. Its appender writes the WAL records that the queue
managers consume, and it handles series management, checkpointing and
WAL truncation. Because the agent appender is single-use (returns itself
to a pool on Commit), ProxyStorage.Appender now hands out a fresh
appender per call by storing the Appendable rather than a shared
Appender. The agent DB and remote.Storage are reused across config
reloads (same as before) and closed together on shutdown.
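
Why the storage has to keep the Appendable rather than one Appender can be
sketched with a toy pool-backed appender. All types here (Appender,
Appendable, walAppendable, proxyStore) are simplified, hypothetical stand-ins
and not the Prometheus or promxy interfaces.

```go
package main

import (
	"fmt"
	"sync"
)

// Appender and Appendable are reduced, illustrative interfaces.
type Appender interface {
	Append(v float64)
	Commit() error
}

type Appendable interface {
	Appender() Appender
}

// walAppendable hands out single-use appenders backed by a sync.Pool.
type walAppendable struct {
	pool sync.Pool
	mu   sync.Mutex
	log  []float64 // stands in for the WAL
}

func newWALAppendable() *walAppendable {
	w := &walAppendable{}
	w.pool.New = func() any { return &walAppender{parent: w} }
	return w
}

func (w *walAppendable) Appender() Appender { return w.pool.Get().(*walAppender) }

type walAppender struct {
	parent  *walAppendable
	pending []float64
}

func (a *walAppender) Append(v float64) { a.pending = append(a.pending, v) }

// Commit flushes pending samples and returns the appender to the pool;
// the caller must not touch it afterwards.
func (a *walAppender) Commit() error {
	a.parent.mu.Lock()
	a.parent.log = append(a.parent.log, a.pending...)
	a.parent.mu.Unlock()
	a.pending = a.pending[:0]
	a.parent.pool.Put(a)
	return nil
}

// proxyStore keeps the Appendable, so every call yields a usable appender.
type proxyStore struct{ appendable Appendable }

func (p *proxyStore) Appender() Appender { return p.appendable.Appender() }

func main() {
	wal := newWALAppendable()
	store := &proxyStore{appendable: wal}

	a1, a2 := store.Appender(), store.Appender()
	fmt.Println("distinct instances:", a1 != a2)

	a1.Append(1)
	a2.Append(2)
	_ = a1.Commit()
	_ = a2.Commit()
	fmt.Println("samples in log:", len(wal.log))
}
```

Caching a single appender in the store would mean reusing an object that
has already been handed back to the pool after its first Commit.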

Adds a regression test asserting the WAL dir is created, appends are
accepted by the WAL-backed appender, and successive Appender() calls
return distinct instances.

Closes: #771

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stands up a real remote_write receiver (httptest server that decodes the
snappy/protobuf write requests) and a ProxyStorage configured with
remote_write but no server_groups, then appends on an interval as the
rule manager would. Asserts the receiver actually observes the shipped
samples with the expected value.

This covers the full path that issue #771 broke -- appender -> agent WAL
-> queue manager WAL watcher -> HTTP POST -- which the unit test only
exercises at the wiring level. Verified to fail (30s timeout, 0 POSTs,
"error tailing WAL ... no such file or directory") against the pre-fix
code and pass after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The migration to exporter-toolkit (0cabe7f) changed --web.config.file
parsing from promxy's flat web.TLSConfig schema (cert_file/key_file at the
top level) to the exporter-toolkit web.Config schema, which nests TLS under
tls_server_config. This silently broke every existing deployment's web
config, failing startup with a cryptic "field cert_file not found in type
web.Config" error.

Detect the legacy flat schema before handing off to exporter-toolkit and
serve it directly (as promxy did before the migration), logging a
deprecation warning that points users at tls_server_config. Files using the
current schema, empty files, and basic_auth_users/http_server_config-only
files are untouched and still flow through exporter-toolkit.

Adds a regression test that brings up a TLS server from a legacy flat config
and verifies an authenticated client can connect.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… under 32 MiB (#781)

Prometheus 3.5.3+ enforces a 32 MiB cap on the snappy-decoded body of
remote_write requests on the receiver (GHSA-8rm2-7qqf-34qm). promxy's old
vendored remote_write fork defaulted max_samples_per_send to 100; commit
01b9e33 switched to upstream prometheus/storage/remote, whose default is
2000. Upstream batches by sample count only (no byte cap), so shipping
recording-rule output with large/high-cardinality series in 2000-sample
batches can decompress past 32 MiB. The receiver then rejects the request
with "snappy: decoded length N exceeds limit 33554432" and the queue
manager drops the batch (non-recoverable 4xx), silently losing samples.

Restore promxy's historical default of 100 when the user does not set
queue_config.max_samples_per_send explicitly. Because the base unmarshal
always populates QueueConfig from Prometheus' DefaultQueueConfig, an
explicit value is indistinguishable from the upstream default after the
fact, so we re-parse the raw YAML to detect which remote_write entries set
it and only override the unset ones.
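
The two-pass detection can be illustrated with a stdlib-only sketch.
Assumptions: encoding/json stands in for the YAML parser to keep the example
dependency-free, and the types, constants, and loadRemoteWrites function are
hypothetical, not promxy's config code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	upstreamDefault = 2000 // default applied during the base unmarshal
	promxyDefault   = 100  // historical default to restore when unset
)

type queueConfig struct {
	MaxSamplesPerSend int `json:"max_samples_per_send"`
}

type remoteWrite struct {
	URL         string      `json:"url"`
	QueueConfig queueConfig `json:"queue_config"`
}

// UnmarshalJSON pre-populates defaults, so after decoding an explicit 2000
// looks identical to an unset field.
func (r *remoteWrite) UnmarshalJSON(b []byte) error {
	type plain remoteWrite // no methods, avoids recursion
	*r = remoteWrite{QueueConfig: queueConfig{MaxSamplesPerSend: upstreamDefault}}
	return json.Unmarshal(b, (*plain)(r))
}

// loadRemoteWrites decodes the entries, then re-parses the raw bytes to see
// which entries actually contain the key, and overrides only the others.
func loadRemoteWrites(raw []byte) ([]remoteWrite, error) {
	var cfgs []remoteWrite
	if err := json.Unmarshal(raw, &cfgs); err != nil {
		return nil, err
	}
	var present []struct {
		QueueConfig map[string]json.RawMessage `json:"queue_config"`
	}
	if err := json.Unmarshal(raw, &present); err != nil {
		return nil, err
	}
	for i := range cfgs {
		if _, explicit := present[i].QueueConfig["max_samples_per_send"]; !explicit {
			cfgs[i].QueueConfig.MaxSamplesPerSend = promxyDefault
		}
	}
	return cfgs, nil
}

func main() {
	raw := []byte(`[
		{"url": "http://a", "queue_config": {"max_samples_per_send": 2000}},
		{"url": "http://b"}
	]`)
	cfgs, err := loadRemoteWrites(raw)
	if err != nil {
		panic(err)
	}
	for _, c := range cfgs {
		fmt.Println(c.URL, c.QueueConfig.MaxSamplesPerSend)
	}
}
```

The first entry keeps its explicit 2000 even though that equals the upstream
default; only the entry that omits the key is lowered to 100.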

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds an `inject_matchers` server_group option that injects a static set of
label matchers into every selector of every request sent to that downstream
(query, query_range, series, label_names, label_values, and remote-read).
For example, with `cluster="A"` configured, `count(up)` is sent downstream
as `count(up{cluster="A"})` -- scoping the server_group to a slice of the
data even for queries that never reference `cluster`.

This fills the gap between the existing mechanisms:
  - `labels` only *adds* labels to responses
  - `label_filter` only *drops* queries that can't match
  - `inject_matchers` always *adds* the matchers to the queries themselves

The motivating use-case (#698) is presenting a per-tenant view of a merged
downstream (e.g. a single Mimir/Thanos/Prometheus holding many clusters) by
running a promxy per tenant.

The matchers are injected beneath the existing label-manipulation wrappers
so they reach the downstream verbatim, without interacting with
label_filter's query filtering or metrics_relabel's matcher reversal.

Closes #698

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…esSet

Decode the Prometheus v1 JSON API response straight into a storage.SeriesSet
via json-iterator, rather than going through client_golang's model.Value
(Vector/Matrix/Scalar with model.Metric maps) and stdlib encoding/json's
checkValid. Metrics are read directly into a labels.ScratchBuilder and samples
(float and native histogram) without model.SamplePair, which is materially
faster on the query/federate hot path and avoids the model.Metric ->
labels.Labels re-conversion on the way into the prometheus-native engine.

Native histograms are decoded into the SeriesSet, and the decoder is
client_golang-free and self-contained so it can be lifted into its own
reusable package.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Extracts the streaming decoder into a new, self-contained pkg/promapi and adds
a thin HTTP Client on top of it:

  - Client.Query / QueryRange / Select return storage.SeriesSet directly,
    decoded via DecodeSeriesSet (no model.Value, no model.Metric maps).
  - Pluggable *http.Client transport (TLS/auth/timeouts are the caller's).
  - Depends only on prometheus storage/labels/model + json-iterator -- no
    client_golang -- so it is reusable on its own and composes with the
    prometheus storage ecosystem.

This is the basis for promxy's new bottom layer (replacing PromAPIV1's
client_golang transport + model.Value decode) and is independently publishable.

Tests: decode correctness (vector/matrix/scalar/specials/native histograms),
error envelope, and an httptest round-trip for Query + Select selector building.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…adata)

Rounds out pkg/promapi into a full, standalone client_golang replacement:
metadata methods alongside the SeriesSet-returning data methods. Series decodes
label sets straight into labels.Labels; the rest decode the simple JSON shapes.
All still client_golang-free.

pkg/promapi is now a complete, reusable Prometheus HTTP API query client.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds MergeSeriesSets, which merges HA-member SeriesSets with anti-affinity
dedup by reusing the existing, heavily-tested promhttputil.MergeSampleStream
verbatim (converting each same-labeled series to a model.SampleStream, folding,
converting back; histograms round-trip losslessly via the floatHistogram pin).
Inputs are sorted and combined with storage.NewMergeSeriesSet.

Verified to produce identical output to model.Value MergeValues across overlap,
hole-filling (base-by-point-count), preferMax, and disjoint-series cases -- so
the SeriesSet interface flip can preserve promxy's HA merge semantics exactly.

Also exports promapi.FloatSample/HistogramSample so callers can build series.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…el.Value)

Flips the promclient.API data methods (Query/QueryRange/GetValue) from
(model.Value, v1.Warnings, error) to storage.SeriesSet, propagated through the
entire decorator stack and consumers:

  - Bottom layer (PromAPIV1) decodes downstream responses via pkg/promapi's
    streaming DecodeSeriesSet (no model.Value, no model.Metric maps, no
    checkValid); PromAPIRemoteRead returns the remote.Read SeriesSet.
  - MultiAPI merges HA members with MergeSeriesSets (the validated anti-affinity
    adapter); warnings/errors ride inside the SeriesSet.
  - Decorators (error_wrap, ignore_error, downgrade, debug, recover, engine,
    time_truncate, timefilter, label, labelfilter, metric_relabel,
    inject_matchers) convert via small shared wrappers (MapErr/MapLabels/
    DowngradeErr/WithWarnings) instead of rewriting model.Metric.
  - Consumers: proxyquerier.Select returns the client SeriesSet directly;
    proxystorage pushdown assigns it straight to UnexpandedSeriesSet, deleting
    every IteratorsForValue/NewSeriesSet dance; alertbackfill materializes via
    SeriesSetToMatrix. Dead iterators.go / proxyquerier/series.go removed.

Correctness fixes found under the full integration suite:

  - promapi decode re-wraps downstream warning/info strings with the
    annotations.PromQLWarning/PromQLInfo sentinels (via toAnnotationError)
    instead of plain errors.New, so the engine can classify info-vs-warning
    (fixes "unexpected annotation type" on offset queries).
  - PromAPIRemoteRead.GetValue materializes the SeriesSet before returning: the
    streamed/chunked remote_read variant is lazy and reads from an HTTP body
    whose context is canceled once GetValue returns ("context canceled" on
    iteration). It is drained while the context is alive.
  - Series are built with a copy-on-read float-histogram iterator
    (promapi.NewSeries / NewListSeriesIteratorWithCopy) so a hint reused across
    repeated histograms can't observe an aliased pointer; float reads stay
    zero-copy.

go test ./... is green, including the full test/ integration suite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pairs the shipped BenchmarkDecodeSeriesSet with BenchmarkDecodeModelValueStdlib,
the pre-refactor baseline (stdlib encoding/json into a model.Matrix, the shape
queryWithInfos decoded). Same input body, so the delta is exactly what dropping
model.Value for a streaming SeriesSet decode buys: ~7x faster, ~3x fewer bytes
allocated on a 5000-series matrix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ecorators

Adds focused unit tests for the paths the integration suite caught but that
lacked direct coverage:

- promapi: assert decoded warnings/infos classify via errors.Is against
  annotations.PromQLWarning/PromQLInfo (guards the "unexpected annotation type"
  fix); cover empty/string/unknown result types.
- materializeSeriesSet: assert it drains the source eagerly so the result still
  yields every sample after the source goes dead (guards the remote_read
  "context canceled" fix).
- MergeSeriesSets: cover the no-sets and single-set fast paths (warnings/data
  preserved).
- seriesset_wrap: cover MapErr/DowngradeErr/WithWarnings/MergeAnnotations/
  MapLabels, the error/annotation decorators that had no SeriesSet-path tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
