Release v0.1.0 · kubernetes-sigs/gateway-api-inference-extension

API version: v1alpha1

We are excited to announce the v0.1.0 release of the Kubernetes Gateway API Inference Extension. This release is intended for early adopters and the community to begin integrating and testing the new APIs.

Thank you to all the contributors for helping us deliver this release and for shaping the future of this project!

Getting Started

If you'd like to jump right in, head here!

What we support

GIE v0.1.0 was developed on:

vLLM v0.7.1
Envoy Gateway v1.2.1 (or higher)
k8s v1.31

More model servers and gateway implementations are coming soon!

Note: Model servers seeking to support GIE should implement our model server protocol here. Any feedback on the protocol or adoption process is very welcome!

Note: v0.1.0 was necessary to enable Gateways to begin adopting this tooling. Any Gateway implementation that supports ext-proc and the Gateway API will be able to support GIE.

Disclaimers

Not for Production: This release candidate is provided solely for evaluation, testing, and feedback. We advise against using it in production or building products on top of it, as there may be breaking changes before the final release.
Feedback Welcome: Your experiences and feedback are invaluable. Please share any issues or suggestions via GitHub Issues to help us improve the project.

What's Changed

Owners addition by @kfswain in #2
proposed repo structure + copy of initial proposal by @kfswain in #1
Repo structure by @kfswain in #3
Update OWNERS by @smarterclayton in #6
PoC implementation by @kfswain in #4
Fix build for ext-proc example by @terrytangyuan in #7
Simplify POC installation by @liu-cong in #8
docs: poc markdown improvements by @Xunzhuo in #9
fix: inconsistent secret key with deployment by @Xunzhuo in #11
Updating top level README by @kfswain in #13
API Proposal by @kfswain in #5
Add initial ext proc implementation with LoRA affinity by @liu-cong in #14
Improve the filter to return multiple preferred pods instead of one; also fix metrics update bug by @liu-cong in #17
Envoy update by @kfswain in #18
CRD implementation by @kfswain in #20
Refactor: Define PodMetricsClient interface and hide implementation details of vllm metrics processing by @liu-cong in #26
Add priority based scheduling by @liu-cong in #25
Update vllm deployment example to use 1 GPU as tensor parallelism is 1 by @liu-cong in #28
Add a hermetic e2e test with fake backend pods by @liu-cong in #29
Fix mutierr appending; add a unit test. by @liu-cong in #33
Some minor fixes in Envoy setup by @liu-cong in #35
Update targetModel in request body by @liu-cong in #37
Adding circuit breaker and timeout layers to avoid Gateway 5xx errors. by @kfswain in #39
Simulation code for llm inference gateway by @kaushikmitr in #15
Add myself to approvers by @kfswain in #42
Dynamic lora load/unload sidecar by @coolkp in #31
LLMServerPool Implementation by @kfswain in #36
Repo cleanup by @kfswain in #46
Updating API and generating code by @kfswain in #47
Do not fail Init if fetch metrics fails. It can recover gracefully. by @liu-cong in #51
llmservice reconciler implementation by @kfswain in #48
Update README.md by @BenTheElder in #52
Fixing hermetic_test, small formatting changes by @kfswain in #53
Add myself to reviewers by @liu-cong in #40
Add dependency updates by @robert-cronin in #57
Bump the kubernetes group with 4 updates by @dependabot in #58
Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.22.0 by @dependabot in #61
Bump github.com/onsi/gomega from 1.33.1 to 1.36.0 by @dependabot in #62
Bump github.com/prometheus/common from 0.55.0 to 0.60.1 by @dependabot in #60
Bump google.golang.org/grpc from 1.65.0 to 1.68.0 by @dependabot in #59
Fixing Groupversion by @kfswain in #63
Integrating LLMService with weight splitting by @kfswain in #64
Fix build and test by @liu-cong in #65
Makefile fixes with generated output by @kfswain in #67
Manifest updates by @kaushikmitr in #81
Enhancements to LLM Instance Gateway: Scheduling Logic, and Documentation Updates by @kaushikmitr in #78
Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized by @liu-cong in #79
Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 by @dependabot in #82
Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 by @dependabot in #83
Bump github.com/envoyproxy/go-control-plane from 0.13.0 to 0.13.1 by @dependabot in #86
Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.3 by @dependabot in #84
Bump github.com/prometheus/common from 0.60.1 to 0.61.0 by @dependabot in #85
Proposal update for the API names and latency objective by @ahg-g in #91
Adding simple cloudbuild file that builds, tags, and pushes the docker image by @kfswain in #94
switch to using upstream vllm with new metric by @coolkp in #54
Updating cloudbuild to have image name by @kfswain in #106
Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #105
Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.3 to 4.5.0 by @dependabot in #102
Bump google.golang.org/grpc from 1.68.0 to 1.69.0 by @dependabot in #103
Bump the kubernetes group with 4 updates by @dependabot in #101
Bump google.golang.org/protobuf from 1.35.2 to 1.36.0 by @dependabot in #104
Change from SIG Apps to SIG Network by @terrytangyuan in #92
Add response body handler by @liu-cong in #90
API Shift/Refactor by @kfswain in #93
API compliance fix and build fixes by @kfswain in #114
Added a verify rule to Makefile by @ahg-g in #122
update the linter version by @ahg-g in #123
Disable response body processing by @liu-cong in #121
Adding initial docs infra by @robscott in #118
Lint fixes/updating .golangci to not use deprecated linter by @kfswain in #125
fixing some lint errors by @ahg-g in #126
Fix the make build command and add main tag to the latest image by @ahg-g in #127
Fixing the build command by @ahg-g in #128
Fixed the rest of the lint errors and updating the linters by @ahg-g in #134
Remove outdated configurations and ensure the tutorial runs smoothly by @Jeffwan in #136
Update README: Add Hugging Face Token Setup Instructions and Improve Deployment Instructions by @yankay in #139
Updates APIs Based on Kubernetes API Conventions by @danehans in #143
Fix InferencePoolReconciler by @MaYuan-02 in #147
Updating the boilerplate template and regenerating by @kfswain in #156
Bump github.com/envoyproxy/go-control-plane from 0.13.1 to 0.13.3 by @dependabot in #155
Updating non-generated docs/ minor formatting by @kfswain in #160
Bump github.com/onsi/ginkgo/v2 from 2.22.0 to 2.22.2 by @dependabot in #138
Bump google.golang.org/grpc from 1.69.0 to 1.69.2 by @dependabot in #133
.*: change llm-instance-gateway -> gateway-api-inference-extension by @MadhavJivrajani in #161
Changes InferencePool EPP Flags by @danehans in #152
ext-proc: remove unused fields from EndpointSliceReconciler by @MadhavJivrajani in #165
ext-proc/backend: add unit test for InferencePoolReconciler by @MadhavJivrajani in #168
ext-proc: change Inference* APIs to use NamespacedName by @MadhavJivrajani in #172
manifests: remove unused curl image from EPP manifest by @MadhavJivrajani in #180
Add a few debug logs by @liu-cong in #179
Adds Health gRPC Server and Refactors Main() by @danehans in #148
Adding some initial docs content and diagrams by @robscott in #129
Fixing Netlify builds by @robscott in #195
Bump google.golang.org/grpc from 1.69.2 to 1.69.4 by @dependabot in #193
Bump github.com/envoyproxy/go-control-plane/envoy from 1.32.2 to 1.32.3 by @dependabot in #194
Bump sigs.k8s.io/controller-runtime from 0.19.3 to 0.19.4 by @dependabot in #191
Bump google.golang.org/protobuf from 1.36.1 to 1.36.2 by @dependabot in #192
[v0.1 API Review] Grammatical fixes and TypedCondition creation/defaulting by @kfswain in #186
Add logging guidelines by @liu-cong in #182
dev: respect the IMAGE args in Makefile by @spacewander in #205
fix small typo by @LiorLieberman in #206
Adding metrics for request total, latency and size by @courageJ in #177
Bump google.golang.org/protobuf from 1.36.2 to 1.36.3 by @dependabot in #209
Bump github.com/prometheus/common from 0.61.0 to 0.62.0 by @dependabot in #211
Bump the kubernetes group across 1 directory with 5 updates by @dependabot in #212
[v0.1 API Review] Cleaning up optional fields/clearer wording by @kfswain in #185
Add link to meeting notes by @terrytangyuan in #215
Adding new maintainers by @robscott in #203
Adding myself & ahg-g to owners directly while we figure out alias bug by @kfswain in #218
only print out pods & metrics when log level is DEBUG by @spacewander in #216
Alias fix by @kfswain in #220
Separates EnvoyExtensionPolicy from Ext Proc by @danehans in #200
[Metrics] Add input/output token and request size metrics by @JeffLuoo in #214
Update to k8s v0.32.0 and runtime to v0.20.0 by @ahg-g in #226
Slight cleanup of some of our readmes by @kfswain in #221
Bump google.golang.org/grpc from 1.69.4 to 1.70.0 by @dependabot in #231
Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 by @dependabot in #232
More Getting Started updates by @kfswain in #233
Bump the kubernetes group with 5 updates by @dependabot in #228
Bump sigs.k8s.io/controller-runtime from 0.20.0 to 0.20.1 by @dependabot in #229
Bump google.golang.org/protobuf from 1.36.3 to 1.36.4 by @dependabot in #230
Adding logging const and updating usage by @kfswain in #236
Adds Initial e2e Tests and Tooling by @danehans in #217
Fixes gomega.Eventually() in e2e Test by @danehans in #241
Add Endpoint Picker Protocol Proposal by @liu-cong in #164
Docker modification to support simple docker build by @kfswain in #242
Refactor ext-proc Main with Server Package Add Hermetic Test with k8s API Client for EPP by @BenjaminBraunDev in #222
Requeue reconcile requests for endpointslice until the inferencepool is available by @ahg-g in #248
Repo cleanup by @ahg-g in #255
Fix e2e test by @ahg-g in #246
inferencemodel_reconciler: Fix a log message by @tchap in #261
Populating api-types & concepts by @kfswain in #254
Proposals cleanup by @ahg-g in #266
InferencePool config proposal for API review by @ahg-g in #162
Add TRACE log level for the metric refresh loop by @liu-cong in #275
Using constants for the repeated values by @adarshagrawal38 in #273
Update default target-pod and inject it into response metadata by @ahg-g in #270
[Metrics] Add grafana dashboard for Inference extension and vLLM metrics by @JeffLuoo in #237
Release setup by @ahg-g in #274
Updating to the new gcr repo by @kfswain in #279
Replace EndpointSlice reconciler with pod list backed by informer by @ahg-g in #271
Bump github.com/envoyproxy/go-control-plane/envoy from 1.32.3 to 1.32.4 by @dependabot in #277
feat: adds initial release automation by @danehans in #291

New Contributors

@kfswain made their first contribution in #2
@smarterclayton made their first contribution in #6
@terrytangyuan made their first contribution in #7
@liu-cong made their first contribution in #8
@Xunzhuo made their first contribution in #9
@kaushikmitr made their first contribution in #15
@coolkp made their first contribution in #31
@BenTheElder made their first contribution in #52
@robert-cronin made their first contribution in #57
@dependabot made their first contribution in #58
@ahg-g made their first contribution in #91
@robscott made their first contribution in #118
@Jeffwan made their first contribution in #136
@yankay made their first contribution in #139
@MaYuan-02 made their first contribution in #147
@MadhavJivrajani made their first contribution in #161
@spacewander made their first contribution in #205
@LiorLieberman made their first contribution in #206
@courageJ made their first contribution in #177
@JeffLuoo made their first contribution in #214
@BenjaminBraunDev made their first contribution in #222
@tchap made their first contribution in #261
@adarshagrawal38 made their first contribution in #273

Full Changelog: https://github.com/kubernetes-sigs/gateway-api-inference-extension/commits/v0.1.0
