Fix in scorer manager in picking the best target #35

Open
wants to merge 66 commits into
base: dev
Commits
0c95c2a
Add initial support for scorers, used as part of decision which pod w…
mayabar Apr 10, 2025
bde57da
Fixes in scores infrastructure & session aware scorer
mayabar Apr 14, 2025
8f9785e
- Add cleanup for session->pod map
mayabar Apr 14, 2025
52af66f
Export score and pod for external implementations
mayabar Apr 14, 2025
90e23bc
Rename session id header
mayabar Apr 15, 2025
c946acc
Separate code of Scorer interface and scorer implementations + add sc…
mayabar Apr 15, 2025
137ca09
Remove vllmRequest from scoreTargets API since it exists in the context
mayabar Apr 15, 2025
c42f72a
Support negative score weights
mayabar Apr 15, 2025
a05a573
Fix fakeDataStore to be compatible with DataStore intereface
mayabar Apr 15, 2025
aca8e07
- Check for nils in list of available pods in main scoring function o…
mayabar Apr 16, 2025
bae2a66
[version bump] Promote 0.0.2 to prod, bump dev to 0.0.3
Apr 16, 2025
c398f4b
chore: move openshift router deployment to extra
shaneutt Apr 16, 2025
95e6bb1
feat: add deployment for sail operator
shaneutt Apr 16, 2025
6d12b06
feat: add istio control-plane deployment
shaneutt Apr 16, 2025
8c4eb46
feat: add vllm simulator deployment
shaneutt Apr 16, 2025
58ef159
feat: add inference-gateway deployment
shaneutt Apr 16, 2025
f606e0d
feat: add kind environment deployment
shaneutt Apr 16, 2025
c679724
feat: kind dev env deployment script
shaneutt Apr 16, 2025
a426e55
Bump golang.org/x/net from 0.37.0 to 0.38.0
dependabot[bot] Apr 16, 2025
09e79e6
Merge pull request #5 from neuralmagic/dependabot/go_modules/golang.o…
clubanderson Apr 16, 2025
d8303a0
Merge pull request #1 from mayabar/main
mayabar Apr 17, 2025
dad8db2
Merge pull request #4 from shaneutt/shaneutt/initial-dev-deployments
clubanderson Apr 17, 2025
f608526
fix: basic container image builds for linux
shaneutt Apr 17, 2025
2dd2ee7
fix: lint fix
shaneutt Apr 17, 2025
10be213
Merge pull request #10 from shaneutt/shaneutt/fix-image-builds
elevran Apr 17, 2025
a24a801
fix: move openshift deployment to environments
shaneutt Apr 17, 2025
950e07b
fix: retarget kustomize deployments in Makefile
shaneutt Apr 17, 2025
47bed9d
empty top level kustomization.yaml - make CICD happy
elevran Apr 17, 2025
0c4e6c8
Merge pull request #17 from elevran/deploy_kustomization_yaml
clubanderson Apr 17, 2025
e0dcba6
Merge pull request #15 from shaneutt/fix-kustomize-envs
clubanderson Apr 17, 2025
f36b10f
feat: add infra targets for ocp to Makefile
shaneutt Apr 17, 2025
b3feea1
Merge pull request #20 from shaneutt/makefile-openshift-infrastructur…
clubanderson Apr 17, 2025
320961c
Minor fixes to enable image building matching GIE
elevran Apr 17, 2025
1024f07
Yamls for inference model and inference pool
mayabar Apr 18, 2025
7655e3c
update vllm-sim deployment yaml
mayabar Apr 18, 2025
c9fd65b
draft changes in run-kind
elevran Apr 18, 2025
5e0ad98
testing new deployment dev
clubanderson Apr 18, 2025
9413ed0
testing new deployment dev
clubanderson Apr 18, 2025
e41f154
chore: inference-gateway component fixes for namespace-level deployment
shaneutt Apr 18, 2025
aad629b
Merge pull request #21 from elevran/image_build
shaneutt Apr 18, 2025
84da7a5
docs: add issue links for some TODOs
shaneutt Apr 18, 2025
be9c800
Merge pull request #29 from elevran/kind_env
shaneutt Apr 18, 2025
8004347
Merge pull request #24 from mayabar/dev
shaneutt Apr 18, 2025
6bd139f
feat: add crd deployment component
shaneutt Apr 18, 2025
9202462
fix: remove podman load instructions that are no longer needed
shaneutt Apr 18, 2025
8e46ea9
upgrade golang.org/x/oauth2 to v0.27.0
elevran Apr 18, 2025
e7a53af
Merge pull request #31 from shaneutt/shaneutt/crd-deployments
clubanderson Apr 18, 2025
142668c
Merge pull request #32 from elevran/oauth2_vuln
clubanderson Apr 18, 2025
413c4a7
feat: add istio crds to deployments
shaneutt Apr 18, 2025
43ab7c1
chore: add custom build for istio-control-plane
shaneutt Apr 18, 2025
a3355b3
chore: cleanup vllm-sim deployments
shaneutt Apr 18, 2025
f699eb7
chore: update gateway deployment for gie compat
shaneutt Apr 18, 2025
760414e
chore: kind env script cleanup
shaneutt Apr 18, 2025
5659f22
chore: cleanup sail operator deployment
shaneutt Apr 18, 2025
1ef859f
chore: cleanup kind dev env deployment
shaneutt Apr 18, 2025
322a421
chore: move kind dev env deploys
shaneutt Apr 18, 2025
383d2db
chore: move openshift dev env deploys
shaneutt Apr 18, 2025
55bf0f8
feat: add environment.dev.kind makefile target
shaneutt Apr 18, 2025
896270f
docs: add development documentation
shaneutt Apr 18, 2025
a83015b
chore: cleanup some language in the Makefile
shaneutt Apr 18, 2025
33a10b5
Merge pull request #33 from shaneutt/shaneutt/kind-full-stack
chcost Apr 18, 2025
175fbec
added infra pipeline run stuff
clubanderson Apr 19, 2025
b7bcf73
added infra pipeline run stuff
clubanderson Apr 19, 2025
eea72e4
added infra pipeline run stuff
clubanderson Apr 19, 2025
72a4328
pre-commit hook added
clubanderson Apr 19, 2025
e195481
scorer fix
mayabar Apr 20, 2025
88 changes: 88 additions & 0 deletions .tekton/deploy-infra-to-openshift.yaml
@@ -0,0 +1,88 @@
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: openshift-redeploy-infra-task
spec:
  params:
    - name: source-branch
      type: string
      description: "Git branch name"
    - name: prod-version
      type: string
    - name: dev-version
      type: string
    - name: prod_image_tag_base
      type: string
    - name: dev_image_tag_base
      type: string
  workspaces:
    - name: source
  steps:
    - name: redeploy
      image: quay.io/projectquay/golang:1.24
      imagePullPolicy: IfNotPresent
      securityContext:
        privileged: true
      workingDir: $(workspaces.source.path)
      env:
        - name: STORAGE_DRIVER
          value: vfs
      script: |
        #!/bin/bash
        set -e

        echo "📦 Installing dependencies with dnf..."
        dnf install -y make jq curl gettext && dnf clean all

        echo "📥 Installing kubectl..."
        curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
        install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

        echo "📥 Installing kustomize..."
        KUSTOMIZE_TAG=$(curl -s https://api.github.com/repos/kubernetes-sigs/kustomize/releases/latest | jq -r '.tag_name')
        KUSTOMIZE_VERSION="${KUSTOMIZE_TAG##*/}" # strips prefix like 'kustomize/' from tag

        curl -LO "https://github.com/kubernetes-sigs/kustomize/releases/download/${KUSTOMIZE_TAG}/kustomize_${KUSTOMIZE_VERSION}_linux_amd64.tar.gz"

        tar -xzf "kustomize_${KUSTOMIZE_VERSION}_linux_amd64.tar.gz" -C /usr/local/bin
        chmod +x /usr/local/bin/kustomize
        kustomize version

        echo "🔧 Getting namespace and project_name from Makefile..."
        DEFAULT_NAMESPACE=$(make -s print-namespace)
        PROJECT_NAME=$(make -s print-project-name)

        if [ "$(params.source-branch)" = "main" ]; then
          NS="${DEFAULT_NAMESPACE}"
          IMAGE_TAG_BASE=$(params.prod_image_tag_base)
          VERSION=$(params.prod-version)
        else
          NS="${DEFAULT_NAMESPACE}-dev"
          IMAGE_TAG_BASE=$(params.dev_image_tag_base)
          VERSION=$(params.dev-version)
        fi

        echo "🔧 Using namespace: $NS"
        echo "🔧 Using project_name: $PROJECT_NAME"
        echo "🔧 Using image_tag_base: $IMAGE_TAG_BASE"
        echo "🔧 Using version: $VERSION"

        # echo "🧹 Uninstalling existing deployment..."
        # # make uninstall-openshift NAMESPACE=$NS PROJECT_NAME=$PROJECT_NAME IMAGE_TAG_BASE=$IMAGE_TAG_BASE VERSION=$VERSION || echo "❗️ Failed to uninstall deployment"

        # (make uninstall || echo "❗️ Failed to uninstall") && (make undeploy IMG=$IMAGE_TAG_BASE:$VERSION || echo "❗️ Failed to uninstall deployment")

        # echo "⏳ Waiting 3 seconds before reinstall..."
        # sleep 3

        echo "🚀 Reinstalling OpenShift deployment..."
        INFRASTRUCTURE_OVERRIDE=true make install-openshift-infrastructure

        echo "⏳ Waiting 20 seconds before verifying resources..."
        sleep 20

        echo "🔍 Checking status of resources in namespace: $NS"
        kubectl get pods -n $NS || echo "❗️ Failed to get pods"
        kubectl get deploy -n $NS || echo "❗️ Failed to get deployments"
        kubectl get svc -n $NS || echo "❗️ Failed to get services"
        kubectl get routes -n $NS || echo "❗️ Failed to get routes"
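
Review note (not part of the diff): the branch-dependent selection and the tag-prefix stripping in the `redeploy` step can be tried in isolation. The sketch below is only an illustration. The function names `select_namespace` and `strip_tag_prefix`, the namespace `demo-ns`, and the tag `kustomize/v5.0.0` are made up for the example; the real script inlines this logic and reads the namespace from the Makefile. As far as this diff shows, the infra PipelineRun only runs this task for the `infra` branch, so the `else` path (the `-dev` namespace) is the one that would be taken there.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the if/else and the ${VAR##*/} expansion used in the
# redeploy step. Function names and sample values are hypothetical.

# Print the target namespace for a source branch and a default namespace.
select_namespace() {
  local branch="$1" default_ns="$2"
  if [ "$branch" = "main" ]; then
    printf '%s\n' "$default_ns"
  else
    printf '%s\n' "${default_ns}-dev"
  fi
}

# Remove everything up to and including the last '/' from a release tag.
# A tag without a '/' is returned unchanged.
strip_tag_prefix() {
  local tag="$1"
  printf '%s\n' "${tag##*/}"
}

select_namespace main demo-ns
select_namespace infra demo-ns
strip_tag_prefix kustomize/v5.0.0
```

Keeping the selection in a function like this would make it straightforward to check the namespace choice without running the pipeline, though the PR itself does not do so.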
182 changes: 182 additions & 0 deletions .tekton/infra-pipelinerun.yaml
@@ -0,0 +1,182 @@
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: modelservice-infra
  annotations:
    pipelinesascode.tekton.dev/on-event: "[push]"
    pipelinesascode.tekton.dev/on-target-branch: "[infra]"
    pipelinesascode.tekton.dev/task: "git-clone"
    pipelinesascode.tekton.dev/max-keep-runs: "3"
    pipelinesascode.tekton.dev/git-status: "true"
    pipelinesascode.tekton.dev/on-cel-expression: >
      (!has(body.ref) || body.ref == 'refs/heads/infra') &&
      (!has(body.head_commit) || !has(body.head_commit.author) || !body.head_commit.author.name.matches("(?i).*ci-tag-bot.*")) &&
      (!has(body.pull_request) || body.pull_request.base.ref == 'infra')
spec:
  podTemplate:
    serviceAccountName: pipeline
    securityContext:
      fsGroup: 0
    imagePullSecrets:
      - name: icr-secret
  params:
    - name: runOptional
      value: "true"
    - name: repo_url
      value: "{{ repo_url }}"
    - name: revision
      value: "{{ revision }}"
    - name: deleteExisting
      value: "true"
    - name: source_branch
      value: "{{ source_branch }}"
  pipelineSpec:
    params:
      - name: repo_url
      - name: revision
      - name: deleteExisting
      - name: source_branch
    workspaces:
      - name: source
      - name: basic-auth
      - name: git-auth
      - name: registry-secret
    tasks:
      - name: fix-permissions
        taskSpec:
          workspaces:
            - name: source
              workspace: source
          steps:
            - name: fix
              image: quay.io/projectquay/golang:1.24
              script: |
                #!/bin/sh
                echo "Fixing permissions on /workspace/source..."
                chmod -R 777 /workspace/source || true
        workspaces:
          - name: source
            workspace: source

      - name: which-branch
        taskRef:
          name: print-branch-task
        runAfter:
          - fix-permissions
        params:
          - name: source-branch
            value: "$(params.source_branch)"
        workspaces:
          - name: source
            workspace: source

      - name: fetch-repository
        taskRef:
          name: git-clone
        runAfter:
          - which-branch
        workspaces:
          - name: output
            workspace: source
          - name: basic-auth
            workspace: basic-auth
        params:
          - name: url
            value: $(params.repo_url)
          - name: revision
            value: $(params.revision)
          - name: deleteExisting
            value: "$(params.deleteExisting)"

      - name: extract-version-and-registry
        params:
          - name: source-branch
            value: "$(params.source_branch)"
        runAfter:
          - fetch-repository
        taskRef:
          name: extract-version-and-registry-task
        workspaces:
          - name: source
            workspace: source

      - name: tag-version
        when:
          - input: "$(params.runOptional)"
            operator: in
            values: ["true"]
          - input: "$(params.source_branch)"
            operator: in
            values: ["infra"]
        taskRef:
          name: tag-version-task
        params:
          - name: source-branch
            value: "$(params.source_branch)"
          - name: prod-version
            value: "$(tasks.extract-version-and-registry.results.prod-version)"
          - name: dev-version
            value: "$(tasks.extract-version-and-registry.results.dev-version)"
        runAfter:
          - extract-version-and-registry
        workspaces:
          - name: source
            workspace: source
          - name: git-auth
            workspace: git-auth

      - name: openshift-redeploy
        when:
          - input: "$(params.runOptional)"
            operator: in
            values: ["true"]
          - input: "$(params.source_branch)"
            operator: in
            values: ["infra"]
        taskRef:
          name: openshift-redeploy-infra-task
        params:
          - name: source-branch
            value: "$(params.source_branch)"
          - name: prod-version
            value: "$(tasks.extract-version-and-registry.results.prod-version)"
          - name: dev-version
            value: "$(tasks.extract-version-and-registry.results.dev-version)"
          - name: prod_image_tag_base
            value: "$(tasks.extract-version-and-registry.results.prod-image-tag-base)"
          - name: dev_image_tag_base
            value: "$(tasks.extract-version-and-registry.results.dev-image-tag-base)"
        runAfter:
          - tag-version
        workspaces:
          - name: source
            workspace: source

      - name: pipeline-complete-infra
        when:
          - input: "$(params.source_branch)"
            operator: in
            values: ["infra"]
        runAfter:
          - openshift-redeploy
        taskRef:
          name: noop-task

  workspaces:
    - name: source
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
    - name: basic-auth
      secret:
        secretName: "{{ git_auth_secret }}"
    - name: git-auth
      secret:
        secretName: "git-auth-secret-neuralmagic"
    - name: registry-secret
      secret:
        secretName: quay-secret
2 changes: 1 addition & 1 deletion .tekton/pipelinerun.yaml
@@ -6,7 +6,7 @@ metadata:
     pipelinesascode.tekton.dev/on-event: "[pull_request, push]"
     pipelinesascode.tekton.dev/on-target-branch: "[main, dev]"
     pipelinesascode.tekton.dev/task: "git-clone"
-    pipelinesascode.tekton.dev/max-keep-runs: "5"
+    pipelinesascode.tekton.dev/max-keep-runs: "3"
     pipelinesascode.tekton.dev/git-status: "true"
     pipelinesascode.tekton.dev/on-cel-expression: >
       (!has(body.ref) || body.ref == 'refs/heads/main' || body.ref == 'refs/heads/dev') &&
4 changes: 2 additions & 2 deletions .version.json
@@ -1,6 +1,6 @@
 {
-  "dev-version": "0.0.2",
+  "dev-version": "0.0.3",
   "dev-registry": "quay.io/vllm-d/gateway-api-inference-extension-dev",
-  "prod-version": "0.0.1",
+  "prod-version": "0.0.2",
   "prod-registry": "quay.io/vllm-d/gateway-api-inference-extension"
 }
73 changes: 73 additions & 0 deletions DEVELOPMENT.md
@@ -0,0 +1,73 @@
# Development

Developing and testing the Gateway API Inference Extension (GIE) is done by
building your Endpoint Picker (EPP) image and attaching that to a `Gateway` on a
development cluster, with some model serving backend to route traffic to.

We provide `Makefile` targets and development environment deployment manifests
under the `deploy/environments` directory, which include support for
multiple kinds of clusters:

* Kubernetes In Docker (KIND)
* Kubernetes (WIP: https://github.com/neuralmagic/gateway-api-inference-extension/issues/14)
* OpenShift (WIP: https://github.com/neuralmagic/gateway-api-inference-extension/issues/22)

We support multiple different model serving platforms for testing:

* VLLM
* VLLM-Simulator

In the following sections we will cover how to use the different development
environment options.

## Kubernetes In Docker (KIND)

A [KIND] cluster can be used for basic development and testing on a local
system. This environment will generally be limited to using a model serving
simulator and as such is very limited compared to clusters with full model
serving resources.

[KIND]:https://github.com/kubernetes-sigs/kind

### Setup

> **WARNING**: This currently requires you to have manually built the vllm
> simulator separately on your local system. In a future iteration this will
> be handled automatically and will not be required.

Run the following:

```console
make environment.dev.kind
```

This will create a `kind` cluster (or re-use an existing one) using the system's
local container runtime and deploy the development stack into the `default`
namespace. Instructions will be provided on how to access the `Gateway` and send
requests for testing.

> **NOTE**: If you require significant customization of this environment beyond
> what the standard deployment provides, you can use the components under
> `deploy/components` with `kustomize` to build your own customized environment.
> You can use the `deploy/environments/kind` deployment as a reference for your own.

#### Development Cycle

To test your changes to the GIE in this environment, make your changes locally
and then run the following:

```console
make environment.dev.kind.update
```

This will build images with your recent changes and load the new images into the
cluster. A rollout of the `Deployments` will then be performed so that your
recent changes are reflected.
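
As a rough sketch, the setup and update steps can be wrapped in a small shell helper. The `dev_cycle` function and the `MAKE_BIN` variable are hypothetical names used only for illustration; the two `make` targets are the ones described in this document.

```shell
#!/usr/bin/env bash
# Sketch only: dev_cycle and MAKE_BIN are illustrative names, not part of
# the repository. The make targets are the ones documented above.

# Command used to invoke make; override it (for example with "echo") to
# preview the target without running it.
MAKE_BIN="${MAKE_BIN:-make}"

# dev_cycle setup  -> create or re-use the KIND environment
# dev_cycle update -> rebuild images and roll out the Deployments (default)
dev_cycle() {
  case "${1:-update}" in
    setup)  "$MAKE_BIN" environment.dev.kind ;;
    update) "$MAKE_BIN" environment.dev.kind.update ;;
    *)
      echo "usage: dev_cycle [setup|update]" >&2
      return 2
      ;;
  esac
}
```

With `MAKE_BIN=echo`, calling `dev_cycle update` only prints the target name, which can be handy for checking the helper before using it against a real cluster.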

## Kubernetes

WIP

## OpenShift

WIP
4 changes: 1 addition & 3 deletions Dockerfile
@@ -33,6 +33,4 @@ WORKDIR /
 COPY --from=builder /workspace/bin/epp /app/epp
 USER 65532:65532
 
-CMD ["sleep", "infinity"]
-
-
+ENTRYPOINT ["/app/epp"]