Changes to InferenceModel Should Trigger EndpointSlice Reconciliation #151

danehans · 2025-01-06T17:55:19Z

The data store is not populated with the required Pod details when the InferenceModel and InferencePool CRs are added after EPP is started:

Skipping reconciling EndpointSlice because the InferencePool is not available yet: InferencePool hasn't been initialized yet
...
===DEBUG: Current Pods and metrics: []

Add the example InferenceModel and InferencePool CRs:

reconciling InferencePooldefault/vllm-llama2-7b-pool
reconciling InferenceModeldefault/inferencemodel-sample
Incoming pool ref {inference.networking.x-k8s.io InferencePool vllm-llama2-7b-pool}, server pool name: vllm-llama2-7b-pool
Adding/Updating inference model: tweet-summary
===DEBUG: Current Pods and metrics: []

After the EndpointSlice for the example Service is recreated, the data store reflects the required Pod details:

I0106 17:05:36.413276       1 endpointslice_reconciler.go:34] Reconciling EndpointSlice default/vllm-llama2-7b-pool-xvkcm
I0106 17:05:36.545209       1 provider.go:92] ===DEBUG: Current Pods and metrics: [Pod: vllm-llama2-7b-pool-59d86b6c85-ktsw7:10.244.0.16:8000; Metrics: {ActiveModels:map[] MaxActiveModels:0 RunningQueueSize:0 WaitingQueueSize:0 KVCacheUsagePercent:0 KvCacheMaxTokenCapacity:0}]
...

EndpointSlice reconciliation should be triggered whenever an InferencePool is created, updated, or deleted. The EndpointSlice reconciler manages the internal Pod state, and that state depends on InferencePool details such as targetPortNumber.
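To make the dependency concrete, here is a stdlib-only Go model (not the project's actual reconciler) of how Pod entries might be derived from an EndpointSlice: each ready endpoint address is paired with the pool's targetPortNumber, so without a pool there is nothing to store. The `Pool`, `Endpoint`, and `podEntries` names are hypothetical, and the entry format is chosen only to mirror the `pod:address:port` shape in the debug log above.

```go
package main

import (
	"errors"
	"fmt"
)

// Pool is a hypothetical stand-in for the part of InferencePool
// that the EndpointSlice reconciler depends on.
type Pool struct {
	Name             string
	TargetPortNumber int32
}

// Endpoint is a hypothetical, simplified EndpointSlice endpoint.
type Endpoint struct {
	PodName string
	Address string
	Ready   bool
}

// errPoolNotReady mirrors the "InferencePool hasn't been initialized yet" log.
var errPoolNotReady = errors.New("InferencePool hasn't been initialized yet")

// podEntries builds "podName:address:port" entries for ready endpoints.
// The port comes from the pool, so a nil pool yields an error and no entries.
// If nothing re-runs this once the pool appears, the store stays empty.
func podEntries(pool *Pool, endpoints []Endpoint) ([]string, error) {
	if pool == nil {
		return nil, errPoolNotReady
	}
	var out []string
	for _, ep := range endpoints {
		if !ep.Ready {
			continue // only ready endpoints are routable
		}
		out = append(out, fmt.Sprintf("%s:%s:%d", ep.PodName, ep.Address, pool.TargetPortNumber))
	}
	return out, nil
}
```

Because the port is read from the pool at reconcile time, a later change to targetPortNumber is only reflected if the EndpointSlice is reconciled again.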


danehans · 2025-01-06T17:55:24Z

/assign

ahg-g · 2025-01-07T04:36:18Z

Great catch, so we will likely need to implement this using WatchesRawSource and channels. See how we did this in Kueue:

What we need to do in the endpointslice controller: https://github.com/kubernetes-sigs/kueue/blob/4c2a0cddc35b80995653c24a9fe85cb57743179f/pkg/controller/core/resourceflavor_controller.go#L262

The function that the InferencePool controller must call to trigger a reconcile on the endpointslice: https://github.com/kubernetes-sigs/kueue/blob/4c2a0cddc35b80995653c24a9fe85cb57743179f/pkg/controller/core/resourceflavor_controller.go#L194
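The channel-based idea can be modeled without controller-runtime: the InferencePool reconciler sends a notification on a channel, and the EndpointSlice side re-reconciles every slice it knows about when one arrives. In controller-runtime itself, the receiving end would be a channel source registered through WatchesRawSource, as in the linked Kueue controller. This is a stdlib-only sketch; `sliceTrigger`, `NotifyPoolChanged`, and `Drain` are hypothetical names, and `Drain` handles one notification synchronously so the behavior is easy to test.

```go
package main

import "sync"

// sliceTrigger is a hypothetical fan-in point between the InferencePool
// reconciler (producer) and the EndpointSlice reconciler (consumer).
type sliceTrigger struct {
	mu     sync.Mutex
	known  map[string]struct{} // namespace/name keys of EndpointSlices seen so far
	events chan struct{}       // capacity 1 so producers never block
}

func newSliceTrigger() *sliceTrigger {
	return &sliceTrigger{
		known:  make(map[string]struct{}),
		events: make(chan struct{}, 1),
	}
}

// Track records an EndpointSlice key so it can be re-reconciled later.
func (t *sliceTrigger) Track(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.known[key] = struct{}{}
}

// NotifyPoolChanged would be called by the InferencePool reconciler on
// create/update/delete. The non-blocking send coalesces bursts into a
// single pending notification.
func (t *sliceTrigger) NotifyPoolChanged() {
	select {
	case t.events <- struct{}{}:
	default: // a notification is already pending
	}
}

// Drain consumes at most one pending notification and calls reconcile for
// every tracked EndpointSlice key. It returns how many keys were reconciled.
func (t *sliceTrigger) Drain(reconcile func(key string)) int {
	select {
	case <-t.events:
	default:
		return 0 // nothing pending
	}
	t.mu.Lock()
	keys := make([]string, 0, len(t.known))
	for k := range t.known {
		keys = append(keys, k)
	}
	t.mu.Unlock()
	for _, k := range keys {
		reconcile(k)
	}
	return len(keys)
}
```

A long-running consumer would loop over the channel instead of calling `Drain` once; the capacity-1 buffer means several pool updates in quick succession collapse into one resync.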

Kuromesi · 2025-01-15T03:56:51Z

I've also been hitting this issue recently. It results in an HTTP/1.1 429 Too Many Requests response, and I have to restart the external processing server to recover. Is there any progress? If not, I'd be happy to contribute. :)

BTW, I think we should also check the EndpointSlice's namespace instead of only checking whether the Service name matches the owner label.
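A minimal sketch of the stricter filter, assuming the owning Service is identified by the standard `kubernetes.io/service-name` label that Kubernetes sets on EndpointSlices, and that the relevant Service lives in the InferencePool's namespace. `sliceMeta` and `belongsToPool` are hypothetical; a real controller would read these fields from the `discoveryv1.EndpointSlice` object's metadata.

```go
package main

// serviceNameLabel is the well-known label Kubernetes sets on EndpointSlices
// to reference their owning Service (discoveryv1.LabelServiceName).
const serviceNameLabel = "kubernetes.io/service-name"

// sliceMeta is a hypothetical, trimmed-down view of EndpointSlice metadata.
type sliceMeta struct {
	Namespace string
	Labels    map[string]string
}

// belongsToPool reports whether a slice should feed the pool's datastore:
// the namespace AND the service-name label must both match. Reading a
// missing label (or a nil map) yields "", which never matches a real name.
func belongsToPool(s sliceMeta, poolNamespace, serviceName string) bool {
	if s.Namespace != poolNamespace {
		return false // same Service name in another namespace is a different Service
	}
	return s.Labels[serviceNameLabel] == serviceName
}
```

Checking the namespace first prevents an identically named Service in another namespace from leaking its Pods into this pool's data store.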

k8s-ci-robot assigned danehans Jan 6, 2025
