This repository was archived by the owner on May 15, 2025. It is now read-only.
  
  
  
            
  
    
      forked from kubernetes-sigs/gateway-api-inference-extension
    
        
feat: Add scripts for kubernetes dev env using vLLM and vLLM-p2p #60
          
     Merged
      
      
    
  
Commits (5):
- bf8017f (kfirtoledo) feat: add scripts for kubernetes dev env using vLLM and vLLM-p2p (se…
- 78157d5 (kfirtoledo) [fix]: Small fixes for development YAMLs
- a11e984 (kfirtoledo) [fix]: Small fixes for deployment and fix comments
- 937bb50 (kfirtoledo) [fix]: fix typos and edit the Readme and env vars
- 17a23e5 (kfirtoledo) [fix] Fix the kind environemnt and set gateway service to be NodePort
    
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,10 @@ | ||
| apiVersion: inference.networking.x-k8s.io/v1alpha2 | ||
| kind: InferencePool | ||
| metadata: | ||
| - name: vllm-llama3-8b-instruct | ||
| + name: ${POOL_NAME} | ||
| spec: | ||
| targetPortNumber: 8000 | ||
| selector: | ||
| - app: vllm-llama3-8b-instruct | ||
| + app: ${POOL_NAME} | ||
| extensionRef: | ||
| name: endpoint-picker | 
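
The `${POOL_NAME}` placeholder has to be filled in before this manifest can be applied. The diff does not show the rendering step, so the following is only a sketch of the idea, assuming envsubst-style substitution from environment variables. The `render_manifest` helper, the sample value, and the restored YAML indentation are assumptions, and Python's `string.Template` stands in for whatever tool the PR's scripts actually use.

```python
import os
from string import Template

# Template mirroring the InferencePool manifest in the diff.
# The indentation is an assumption; the diff view above lost it.
INFERENCE_POOL_TEMPLATE = """\
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: ${POOL_NAME}
spec:
  targetPortNumber: 8000
  selector:
    app: ${POOL_NAME}
  extensionRef:
    name: endpoint-picker
"""


def render_manifest(template_text, env=None):
    """Substitute ${VAR} placeholders (hypothetical helper).

    Raises KeyError when a referenced variable is missing, so an unset
    POOL_NAME fails loudly instead of producing an empty name.
    """
    values = os.environ if env is None else env
    return Template(template_text).substitute(values)


if __name__ == "__main__":
    # Sample value only; it is the name the manifest hard-coded before this PR.
    print(render_manifest(INFERENCE_POOL_TEMPLATE,
                          {"POOL_NAME": "vllm-llama3-8b-instruct"}))
```

Because the same variable feeds both `metadata.name` and `spec.selector.app`, the pool name and the pod label it selects on stay in sync.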
  
    
    
  
  
    
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| # ------------------------------------------------------------------------------ | ||
| # vLLM P2P Deployment | ||
| # | ||
| # This deploys the full vLLM model server, capable of serving real models such | ||
| # as Llama 3.1-8B-Instruct via the OpenAI-compatible API. It is intended for | ||
| # environments with GPU resources and where full inference capabilities are | ||
| # required. | ||
| # In addition, it adds LMCache, an LLM serving engine extension using Redis, to the vLLM image. | ||
| # | ||
| # The deployment can be customized using environment variables to set: | ||
| # - The container image and tag (VLLM_IMAGE, VLLM_TAG) | ||
| # - The model to load (MODEL_NAME) | ||
| # | ||
| # This setup is suitable for testing on Kubernetes (including | ||
| # GPU-enabled nodes or clusters with scheduling for `nvidia.com/gpu`). | ||
| # ----------------------------------------------------------------------------- | ||
| apiVersion: kustomize.config.k8s.io/v1beta1 | ||
| kind: Kustomization | ||
|  | ||
| resources: | ||
| - vllm-deployment.yaml | ||
| - redis-deployment.yaml | ||
| - redis-service.yaml | ||
| - secret.yaml | ||
|  | ||
| images: | ||
| - name: vllm/vllm-openai | ||
| newName: ${VLLM_IMAGE} | ||
| newTag: ${VLLM_TAG} | ||
| - name: redis | ||
| newName: ${REDIS_IMAGE} | ||
| newTag: ${REDIS_TAG} | ||
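
The `images:` entries tell Kustomize to rewrite any container image whose name matches `name`, swapping in `newName` and `newTag`. As a rough illustration of that rewrite (not Kustomize's actual implementation), here is a small Python sketch. The helper names and the sample registry and tags are hypothetical, and image digests are deliberately ignored.

```python
def split_image(image):
    """Split 'name:tag' into (name, tag); tag is None when absent.

    A colon only counts as a tag separator if it comes after the last '/',
    so a registry port such as 'localhost:5000/redis' is not mistaken for a tag.
    """
    last_slash = image.rfind("/")
    colon = image.rfind(":")
    if colon > last_slash:
        return image[:colon], image[colon + 1:]
    return image, None


def apply_image_overrides(image, overrides):
    """Apply the first override whose 'name' equals the image name."""
    name, tag = split_image(image)
    for rule in overrides:
        if rule["name"] == name:
            name = rule.get("newName", name)
            tag = rule.get("newTag", tag)
            break
    return name if tag is None else f"{name}:{tag}"


if __name__ == "__main__":
    # Hypothetical values standing in for VLLM_IMAGE / VLLM_TAG and
    # REDIS_IMAGE / REDIS_TAG after substitution.
    overrides = [
        {"name": "vllm/vllm-openai", "newName": "registry.example/vllm", "newTag": "dev"},
        {"name": "redis", "newName": "redis", "newTag": "7"},
    ]
    for image in ("vllm/vllm-openai:latest", "redis", "busybox:1.36"):
        print(image, "->", apply_image_overrides(image, overrides))
```

An image that matches no rule passes through unchanged, so these overrides only affect containers that reference `vllm/vllm-openai` or `redis` by exactly those names.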
  
    
    
  
  
    
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| apiVersion: apps/v1 | ||
| kind: Deployment | ||
| metadata: | ||
| name: ${REDIS_DEPLOYMENT_NAME} | ||
| labels: | ||
| app.kubernetes.io/name: redis | ||
| app.kubernetes.io/component: redis-lookup-server | ||
| spec: | ||
| replicas: 1 | ||
| selector: | ||
| matchLabels: | ||
| app.kubernetes.io/name: redis | ||
| app.kubernetes.io/component: redis-lookup-server | ||
| template: | ||
| metadata: | ||
| labels: | ||
| app.kubernetes.io/name: redis | ||
| app.kubernetes.io/component: redis-lookup-server | ||
| spec: | ||
| containers: | ||
| - name: lookup-server | ||
| image: ${REDIS_IMAGE}:${REDIS_TAG} | ||
| imagePullPolicy: IfNotPresent | ||
| command: | ||
| - redis-server | ||
| ports: | ||
| - name: redis-port | ||
| containerPort: ${REDIS_TARGET_PORT} | ||
| protocol: TCP | ||
| resources: | ||
| limits: | ||
| cpu: "4" | ||
| memory: 10G | ||
| requests: | ||
| cpu: "4" | ||
| memory: 8G | ||
| terminationMessagePath: /dev/termination-log | ||
| terminationMessagePolicy: File | ||
| restartPolicy: Always | ||
| terminationGracePeriodSeconds: 30 | ||
| dnsPolicy: ClusterFirst | ||
| securityContext: {} | ||
| schedulerName: default-scheduler | ||
| strategy: | ||
| type: RollingUpdate | ||
| rollingUpdate: | ||
| maxUnavailable: 25% | ||
| maxSurge: 25% | ||
| revisionHistoryLimit: 10 | ||
| progressDeadlineSeconds: 600 | 
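
The Deployment repeats the same two labels under `spec.selector.matchLabels` and in the pod template. That repetition is required: Kubernetes only accepts a Deployment whose selector matches its template labels. The check can be sketched as below. `selector_matches` is a hypothetical helper that covers equality-based `matchLabels` only, not `matchExpressions`.

```python
def selector_matches(match_labels, pod_labels):
    """Return True when every selector key/value pair is present on the pod."""
    return all(pod_labels.get(key) == value
               for key, value in match_labels.items())


# Labels copied from the Redis Deployment above.
REDIS_LABELS = {
    "app.kubernetes.io/name": "redis",
    "app.kubernetes.io/component": "redis-lookup-server",
}

if __name__ == "__main__":
    # The pod template carries both labels, so the selector matches.
    print(selector_matches(REDIS_LABELS, dict(REDIS_LABELS)))
    # A pod missing the component label would not be selected.
    print(selector_matches(REDIS_LABELS, {"app.kubernetes.io/name": "redis"}))
```

The same rule would apply to whatever selector `redis-service.yaml` uses to route traffic to these pods, though that file is not shown in this view.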
      