Releases: kubernetes-sigs/gateway-api-inference-extension
v0.1.0
API version: v1alpha1
We are excited to announce the v0.1.0 release of the Kubernetes Gateway API Inference Extension. This release is intended for early adopters and the community to begin integrating and testing the new APIs.
Thank you to all the contributors for helping us deliver this release and for shaping the future of this project!
Getting Started
If you'd like to jump right in, head here!
What we support
The Gateway API Inference Extension (GIE) v0.1.0 was developed on:
- vLLM v0.7.1
- Envoy Gateway v1.2.1 (or higher)
- Kubernetes v1.31
More model servers and gateway implementations are coming soon!
Note: Model servers seeking to support GIE should implement our model server protocol here. Any feedback on the protocol or adoption process is very welcome!
Note: v0.1.0 was necessary to enable Gateways to begin adopting this tooling. Any Gateway implementation that supports both ext-proc and the Gateway API will be able to support GIE.
Disclaimers
- Not for Production: This release is provided solely for evaluation, testing, and feedback. We advise against using it in production or building products on top of it, as there may be breaking changes before the final release.
- Feedback Welcome: Your experiences and feedback are invaluable. Please share any issues or suggestions via GitHub Issues to help us improve the project.
What's Changed
- Owners addition by @kfswain in #2
- proposed repo structure + copy of initial proposal by @kfswain in #1
- Repo structure by @kfswain in #3
- Update OWNERS by @smarterclayton in #6
- PoC implementation by @kfswain in #4
- Fix build for ext-proc example by @terrytangyuan in #7
- Simplify POC installation by @liu-cong in #8
- docs: poc markdown improvements by @Xunzhuo in #9
- fix: inconsistent secret key with deployment by @Xunzhuo in #11
- Updating top level README by @kfswain in #13
- API Proposal by @kfswain in #5
- Add initial ext proc implementation with LoRA affinity by @liu-cong in #14
- Improve the filter to return multiple preferred pods instead of one; also fix metrics update bug by @liu-cong in #17
- Envoy update by @kfswain in #18
- CRD implementation by @kfswain in #20
- Refactor: Define PodMetricsClient interface and hide implementation details of vllm metrics processing by @liu-cong in #26
- Add priority based scheduling by @liu-cong in #25
- Update vllm deployment example to use 1 GPU as tensor parallelism is 1 by @liu-cong in #28
- Add a hermetic e2e test with fake backend pods by @liu-cong in #29
- Fix mutierr appending; add a unit test. by @liu-cong in #33
- Some minor fixes in Envoy setup by @liu-cong in #35
- Update targetModel in request body by @liu-cong in #37
- Adding circuit breaker and timeout layers to avoid Gateway 5xx errors. by @kfswain in #39
- Simulation code for llm inference gateway by @kaushikmitr in #15
- Add myself to approvers by @kfswain in #42
- Dynamic lora load/unload sidecar by @coolkp in #31
- LLMServerPool Implementation by @kfswain in #36
- Repo cleanup by @kfswain in #46
- Updating API and generating code by @kfswain in #47
- Do not fail Init if fetch metrics fails. It can recover gracefully. by @liu-cong in #51
- llmservice reconciler implementation by @kfswain in #48
- Update README.md by @BenTheElder in #52
- Fixing hermetic_test, small formatting changes by @kfswain in #53
- Add myself to reviewers by @liu-cong in #40
- Add dependency updates by @robert-cronin in #57
- Bump the kubernetes group with 4 updates by @dependabot in #58
- Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.22.0 by @dependabot in #61
- Bump github.com/onsi/gomega from 1.33.1 to 1.36.0 by @dependabot in #62
- Bump github.com/prometheus/common from 0.55.0 to 0.60.1 by @dependabot in #60
- Bump google.golang.org/grpc from 1.65.0 to 1.68.0 by @dependabot in #59
- Fixing Groupversion by @kfswain in #63
- Integrating LLMService with weight splitting by @kfswain in #64
- Fix build and test by @liu-cong in #65
- Makefile fixes with generated output by @kfswain in #67
- Manifest updates by @kaushikmitr in #81
- Enhancements to LLM Instance Gateway: Scheduling Logic, and Documentation Updates by @kaushikmitr in #78
- Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized by @liu-cong in #79
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 by @dependabot in #82
- Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 by @dependabot in #83
- Bump github.com/envoyproxy/go-control-plane from 0.13.0 to 0.13.1 by @dependabot in #86
- Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.3 by @dependabot in #84
- Bump github.com/prometheus/common from 0.60.1 to 0.61.0 by @dependabot in #85
- Proposal update for the API names and latency objective by @ahg-g in #91
- Adding simple cloudbuild file that builds, tags, and pushes the docker image by @kfswain in #94
- switch to using upstream vllm with new metric by @coolkp in #54
- Updating cloudbuild to have image name by @kfswain in #106
- Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #105
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.3 to 4.5.0 by @dependabot in #102
- Bump google.golang.org/grpc from 1.68.0 to 1.69.0 by @dependabot in #103
- Bump the kubernetes group with 4 updates by @dependabot in https...
v0.1.0-rc.1
API version: v1alpha1
We are excited to announce the v0.1.0-rc.1 release candidate of the Kubernetes Gateway API Inference Extension. This release is intended for early adopters and the community to begin integrating and testing the new APIs. Please note the following:
- Not for Production: This release candidate is provided solely for evaluation, testing, and feedback. We strongly advise against using it in production or building products on top of it, as there may be breaking changes before the final release.
- Feedback Welcome: Your experiences and feedback are invaluable. Please share any issues or suggestions via GitHub Issues to help us improve the project.
Thank you to all the contributors for helping us deliver this release and for shaping the future of this project!
What's Changed
- Owners addition by @kfswain in #2
- proposed repo structure + copy of initial proposal by @kfswain in #1
- Repo structure by @kfswain in #3
- Update OWNERS by @smarterclayton in #6
- PoC implementation by @kfswain in #4
- Fix build for ext-proc example by @terrytangyuan in #7
- Simplify POC installation by @liu-cong in #8
- docs: poc markdown improvements by @Xunzhuo in #9
- fix: inconsistent secret key with deployment by @Xunzhuo in #11
- Updating top level README by @kfswain in #13
- API Proposal by @kfswain in #5
- Add initial ext proc implementation with LoRA affinity by @liu-cong in #14
- Improve the filter to return multiple preferred pods instead of one; also fix metrics update bug by @liu-cong in #17
- Envoy update by @kfswain in #18
- CRD implementation by @kfswain in #20
- Refactor: Define PodMetricsClient interface and hide implementation details of vllm metrics processing by @liu-cong in #26
- Add priority based scheduling by @liu-cong in #25
- Update vllm deployment example to use 1 GPU as tensor parallelism is 1 by @liu-cong in #28
- Add a hermetic e2e test with fake backend pods by @liu-cong in #29
- Fix mutierr appending; add a unit test. by @liu-cong in #33
- Some minor fixes in Envoy setup by @liu-cong in #35
- Update targetModel in request body by @liu-cong in #37
- Adding circuit breaker and timeout layers to avoid Gateway 5xx errors. by @kfswain in #39
- Simulation code for llm inference gateway by @kaushikmitr in #15
- Add myself to approvers by @kfswain in #42
- Dynamic lora load/unload sidecar by @coolkp in #31
- LLMServerPool Implementation by @kfswain in #36
- Repo cleanup by @kfswain in #46
- Updating API and generating code by @kfswain in #47
- Do not fail Init if fetch metrics fails. It can recover gracefully. by @liu-cong in #51
- llmservice reconciler implementation by @kfswain in #48
- Update README.md by @BenTheElder in #52
- Fixing hermetic_test, small formatting changes by @kfswain in #53
- Add myself to reviewers by @liu-cong in #40
- Add dependency updates by @robert-cronin in #57
- Bump the kubernetes group with 4 updates by @dependabot in #58
- Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.22.0 by @dependabot in #61
- Bump github.com/onsi/gomega from 1.33.1 to 1.36.0 by @dependabot in #62
- Bump github.com/prometheus/common from 0.55.0 to 0.60.1 by @dependabot in #60
- Bump google.golang.org/grpc from 1.65.0 to 1.68.0 by @dependabot in #59
- Fixing Groupversion by @kfswain in #63
- Integrating LLMService with weight splitting by @kfswain in #64
- Fix build and test by @liu-cong in #65
- Makefile fixes with generated output by @kfswain in #67
- Manifest updates by @kaushikmitr in #81
- Enhancements to LLM Instance Gateway: Scheduling Logic, and Documentation Updates by @kaushikmitr in #78
- Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized by @liu-cong in #79
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 by @dependabot in #82
- Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 by @dependabot in #83
- Bump github.com/envoyproxy/go-control-plane from 0.13.0 to 0.13.1 by @dependabot in #86
- Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.3 by @dependabot in #84
- Bump github.com/prometheus/common from 0.60.1 to 0.61.0 by @dependabot in #85
- Proposal update for the API names and latency objective by @ahg-g in #91
- Adding simple cloudbuild file that builds, tags, and pushes the docker image by @kfswain in #94
- switch to using upstream vllm with new metric by @coolkp in #54
- Updating cloudbuild to have image name by @kfswain in #106
- Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #105
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.3 to 4.5.0 by @dependabot in #102
- Bump google.golang.org/grpc from 1.68.0 to 1.69.0 by @dependabot in #103
- Bump the kubernetes group with 4 updates by @dependabot in #101
- Bump google.golang.org/protobuf from 1.35.2 to 1.36.0 by @dependabot in #104
- Change from SIG Apps to SIG Network by @terrytangyuan in #92
- Add response body handler by @liu-cong in #90
- API Shift/Refactor by @kfswain in #93
- API compliance fix and build fixes by @kfswain in #114
- Added a verify rule to Makefile by @ahg-g in #122
- update the linter version by @ahg-g in https://github.com/kubernetes-sigs/gatew...