Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to InferenceModel Should Trigger EndpointSlice Reconciliation #151

Open
danehans opened this issue Jan 6, 2025 · 3 comments
Open
Assignees

Comments

@danehans
Copy link
Contributor

danehans commented Jan 6, 2025

The data store is not populated with the required Pod details when the InferenceModel and InferencePool CRs are added after EPP is started:

Skipping reconciling EndpointSlice because the InferencePool is not available yet: InferencePool hasn't been initialized yet
...
===DEBUG: Current Pods and metrics: []

Add the example InferenceModel and InferencePool CRs:

reconciling InferencePooldefault/vllm-llama2-7b-pool
reconciling InferenceModeldefault/inferencemodel-sample
Incoming pool ref {inference.networking.x-k8s.io InferencePool vllm-llama2-7b-pool}, server pool name: vllm-llama2-7b-pool
Adding/Updating inference model: tweet-summary
===DEBUG: Current Pods and metrics: []

Recreate the EndpointSlice for the example service and the data store reflects the required Pod details:

I0106 17:05:36.413276       1 endpointslice_reconciler.go:34] Reconciling EndpointSlice default/vllm-llama2-7b-pool-xvkcm
I0106 17:05:36.545209       1 provider.go:92] ===DEBUG: Current Pods and metrics: [Pod: vllm-llama2-7b-pool-59d86b6c85-ktsw7:10.244.0.16:8000; Metrics: {ActiveModels:map[] MaxActiveModels:0 RunningQueueSize:0 WaitingQueueSize:0 KVCacheUsagePercent:0 KvCacheMaxTokenCapacity:0}]
...

EndpointSlice reconciliation should be triggered whenever an InferencePool CRUD operation occurs since it manages the internal Pod state which depends on InferencePool details, e.g. targetPortNumber.

@danehans
Copy link
Contributor Author

danehans commented Jan 6, 2025

/assign

@ahg-g
Copy link
Contributor

ahg-g commented Jan 7, 2025

Great catch, so we will likely need to implement this using WatchesRawSource and channels. See how we did this in Kueue:

What we need to do in the endpointslice controller: https://github.com/kubernetes-sigs/kueue/blob/4c2a0cddc35b80995653c24a9fe85cb57743179f/pkg/controller/core/resourceflavor_controller.go#L262

The function that the InferencePool controller must call to trigger a reconcile on the endpointslice: https://github.com/kubernetes-sigs/kueue/blob/4c2a0cddc35b80995653c24a9fe85cb57743179f/pkg/controller/core/resourceflavor_controller.go#L194

@Kuromesi
Copy link

I'm also facing this issue recently, this leads to a response of HTTP/1.1 429 Too Many Requests and I have to restart the external processing. Is there any progress? If not, I would be happy to make some contribution. :)

BTW, I think we should also check the namespace of the endpointslice, instead of only check if the service name matches the owner label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants