diff --git a/gfmstudio/amo/a-model-01-name-chart/README.md b/gfmstudio/amo/a-model-01-name-chart/README.md new file mode 100644 index 0000000..ee9a1e9 --- /dev/null +++ b/gfmstudio/amo/a-model-01-name-chart/README.md @@ -0,0 +1,259 @@ +# Automated Model Deployment Guide + +This guide explains how to use the vllm-inference-server template for automated model deployments. + +## Overview + +The deployment system uses a template-based approach where: +- All services are prefixed with `gfm-amo-{MODEL_NAME}` +- KServe can be toggled on/off during deployment +- Resource requirements are configurable per deployment +- Model names are dynamically injected into the deployment + +## Files + +1. **vllm-inference-server-template.yaml** - The main template with placeholders +2. **deploy_model.py** - Python script for automated deployments +3. **vllm-inference-server-all-in-one.yaml** - Reference with Helm variables +4. **vllm-inference-server-standalone.yaml** - Example with filled values + +## Template Placeholders + +The template uses the following placeholders that must be replaced: + +| Placeholder | Description | Example | +|------------|-------------|---------| +| `${MODEL_NAME}` | Model name (prefixed with gfm-amo-) | `my-model` | +| `${NAMESPACE}` | Kubernetes namespace | `geospatial-studio` | +| `${ENABLE_KSERVE}` | Enable KServe (true/false) | `false` | +| `${IMAGE_REPOSITORY}` | Container image repository | `us.icr.io/gfmaas/vllm-small` | +| `${IMAGE_TAG}` | Container image tag | `v0.0.6` | +| `${IMAGE_PULL_SECRET}` | Image pull secret name | `my-registry-secret` | +| `${SERVICE_ACCOUNT}` | Service account name | `default` | +| `${MODELS_PVC}` | Models storage PVC | `vllm-models-pvc` | +| `${INFERENCE_SHARED_PVC}` | Shared inference PVC | `inference-shared-pvc` | +| `${GPU_COUNT}` | Number of GPUs | `1` | +| `${CPU_LIMIT}` | CPU limit | `2000m` | +| `${MEMORY_LIMIT}` | Memory limit | `8Gi` | +| `${CPU_REQUEST}` | CPU request | `1000m` | +| `${MEMORY_REQUEST}` | Memory 
request | `4Gi` | + +## Deployment Modes + +### Standard Deployment (KServe Disabled) + +Creates a standard Kubernetes Deployment with: +- Fixed replica count (default: 1) +- Always-on pods +- Direct service access + +**Use when:** +- You need consistent availability +- Scale-to-zero is not required +- You want simpler networking + +### KServe InferenceService (KServe Enabled) + +Creates a KServe InferenceService with: +- Scale-to-zero capability (minReplicas: 0) +- Automatic scaling based on traffic +- Advanced serving features + +**Use when:** +- You want to save resources with scale-to-zero +- You need automatic scaling +- KServe is installed in your cluster + +## Usage Examples + +### Using Python Script + +#### 1. Deploy with Standard Deployment (Dry Run) + +```bash +python deploy_model.py my-flood-model \ + --namespace geospatial-studio \ + --dry-run +``` + +#### 2. Deploy with KServe InferenceService + +```bash +python deploy_model.py my-flood-model \ + --namespace geospatial-studio \ + --enable-kserve +``` + +#### 3. Deploy with Custom Resources + +```bash +python deploy_model.py my-large-model \ + --namespace geospatial-studio \ + --gpu-count 2 \ + --memory-limit 16Gi \ + --cpu-limit 4000m \ + --memory-request 8Gi \ + --cpu-request 2000m +``` + +#### 4. 
Generate YAML File Without Deploying
+
+```bash
+python deploy_model.py my-model \
+    --namespace geospatial-studio \
+    --enable-kserve \
+    --dry-run \
+    --output my-model-deployment.yaml
+```
+
+### Using Shell Script (sed/envsubst)
+
+```bash
+#!/bin/bash
+
+# Set variables
+export MODEL_NAME="my-flood-model"
+export NAMESPACE="geospatial-studio"
+export ENABLE_KSERVE="false"
+export IMAGE_REPOSITORY="us.icr.io/gfmaas/vllm-small"
+export IMAGE_TAG="v0.0.6"
+export IMAGE_PULL_SECRET="my-registry-secret"
+export SERVICE_ACCOUNT="default"
+export MODELS_PVC="vllm-models-pvc"
+export INFERENCE_SHARED_PVC="inference-shared-pvc"
+export GPU_COUNT="1"
+export CPU_LIMIT="2000m"
+export MEMORY_LIMIT="8Gi"
+export CPU_REQUEST="1000m"
+export MEMORY_REQUEST="4Gi"
+
+# Generate YAML
+envsubst < vllm-inference-server-template.yaml > deployment.yaml
+
+# Filter based on KServe mode (the in-place -i flag as used here requires GNU sed)
+if [ "$ENABLE_KSERVE" = "true" ]; then
+    # Remove Deployment section, keep InferenceService
+    sed -i '/# Deployment (used when ENABLE_KSERVE=false)/,/^---$/d' deployment.yaml
+else
+    # Remove InferenceService section, keep Deployment; the InferenceService
+    # is the last document in the template, so delete through end of file
+    sed -i '/# InferenceService (used when ENABLE_KSERVE=true)/,$d' deployment.yaml
+fi
+
+# Apply
+kubectl apply -f deployment.yaml
+```
+
+### Programmatic Integration (Python)
+
+```python
+from deploy_model import deploy_model
+
+# Deploy a model programmatically
+success = deploy_model(
+    model_name="my-burn-scar-model",
+    namespace="geospatial-studio",
+    enable_kserve=True,
+    image_repository="us.icr.io/gfmaas/vllm-small",
+    image_tag="v0.0.6",
+    gpu_count="1",
+    memory_limit="8Gi",
+    dry_run=False
+)
+
+if success:
+    print("Model deployed successfully!")
+else:
+    print("Deployment failed!")
+```
+
+## Service Naming Convention
+
+All deployed services follow this naming pattern:
+- Service name: `gfm-amo-{MODEL_NAME}`
+- Example: `gfm-amo-my-flood-model`
+
+This ensures:
+- Consistent naming across deployments
+- Easy identification of automated 
deployments
+- No naming conflicts with manual deployments
+
+## Verification
+
+After deployment, verify the resources:
+
+```bash
+# Check deployment/inferenceservice
+kubectl get deployment -n geospatial-studio gfm-amo-{MODEL_NAME}
+# OR
+kubectl get inferenceservice -n geospatial-studio gfm-amo-{MODEL_NAME}
+
+# Check service
+kubectl get svc -n geospatial-studio gfm-amo-{MODEL_NAME}
+
+# Check pods
+kubectl get pods -n geospatial-studio -l app.kubernetes.io/name=gfm-amo-{MODEL_NAME}
+
+# View logs
+kubectl logs -n geospatial-studio -l app.kubernetes.io/name=gfm-amo-{MODEL_NAME}
+```
+
+## Cleanup
+
+To remove a deployed model:
+
+```bash
+# Delete all core resources for a model (note: "kubectl delete all" does not
+# cover custom resources such as KServe InferenceServices)
+kubectl delete all -n geospatial-studio -l app.kubernetes.io/name=gfm-amo-{MODEL_NAME}
+
+# Or delete specific resources
+kubectl delete deployment gfm-amo-{MODEL_NAME} -n geospatial-studio
+kubectl delete svc gfm-amo-{MODEL_NAME} -n geospatial-studio
+
+# For KServe deployments, delete the InferenceService
+kubectl delete inferenceservice gfm-amo-{MODEL_NAME} -n geospatial-studio
+```
+
+## Integration with Deployment Services
+
+For automated deployment services, the recommended approach is:
+
+1. **Store the template** in your deployment service
+2. **Accept user inputs** for model name and KServe toggle
+3. **Replace placeholders** using your preferred method (Python, Go, etc.)
+4. **Filter resources** based on KServe mode
+5. **Apply to cluster** using kubectl or a Kubernetes client library
+
+Example workflow:
+```
+User Request → Model Name + KServe Toggle
+        ↓
+Load Template
+        ↓
+Replace Placeholders
+        ↓
+Filter by KServe Mode
+        ↓
+Apply to Kubernetes
+        ↓
+Return Service URL: gfm-amo-{MODEL_NAME}.{NAMESPACE}.svc.cluster.local
+```
+
+## Troubleshooting
+
+### Issue: Pods not starting
+
+Check:
+1. GPU availability: `kubectl describe node | grep nvidia.com/gpu`
+2. Image pull secrets: `kubectl get secret {IMAGE_PULL_SECRET} -n {NAMESPACE}`
+3. PVC status: `kubectl get pvc -n {NAMESPACE}`
+
+### Issue: KServe InferenceService not working
+
+Verify:
+1. 
KServe is installed: `kubectl get crd inferenceservices.serving.kserve.io` +2. Knative Serving is running: `kubectl get pods -n knative-serving` +3. Check InferenceService status: `kubectl describe inferenceservice gfm-amo-{MODEL_NAME} -n {NAMESPACE}` + +### Issue: Service not accessible + +Check: +1. Service exists: `kubectl get svc gfm-amo-{MODEL_NAME} -n {NAMESPACE}` +2. Endpoints are ready: `kubectl get endpoints gfm-amo-{MODEL_NAME} -n {NAMESPACE}` +3. Pod is running: `kubectl get pods -n {NAMESPACE} -l app.kubernetes.io/name=gfm-amo-{MODEL_NAME}` diff --git a/gfmstudio/amo/a-model-01-name-chart/deploy_model.py b/gfmstudio/amo/a-model-01-name-chart/deploy_model.py new file mode 100644 index 0000000..ec021df --- /dev/null +++ b/gfmstudio/amo/a-model-01-name-chart/deploy_model.py @@ -0,0 +1,266 @@ +#!/usr/bin/env python3 +""" +© Copyright IBM Corporation 2026 +SPDX-License-Identifier: Apache-2.0 + +Automated Model Deployment Script +This script demonstrates how to deploy models using the vllm-inference-server template. +""" + +import argparse +import subprocess +import sys +from pathlib import Path +from typing import Dict, Optional + + +def load_template(template_path: str) -> str: + """Load the YAML template file.""" + with open(template_path, "r") as f: + return f.read() + + +def replace_placeholders(template: str, config: Dict[str, str]) -> str: + """Replace all placeholders in the template with actual values.""" + result = template + for key, value in config.items(): + placeholder = f"${{{key}}}" + result = result.replace(placeholder, str(value)) + return result + + +def filter_by_kserve_mode(yaml_content: str, enable_kserve: bool) -> str: + """ + Filter the YAML content based on KServe mode. + Remove Deployment if KServe is enabled, or remove InferenceService if disabled. 
+ """ + lines = yaml_content.split("\n") + filtered_lines = [] + skip_section = False + current_section = None + + for line in lines: + # Detect section starts + if line.startswith("# Deployment (used when ENABLE_KSERVE=false)"): + current_section = "deployment" + skip_section = enable_kserve # Skip if KServe is enabled + elif line.startswith("# InferenceService (used when ENABLE_KSERVE=true)"): + current_section = "inferenceservice" + skip_section = not enable_kserve # Skip if KServe is disabled + elif line.startswith("---") and current_section: + # End of section + current_section = None + skip_section = False + + # Add line if not skipping + if not skip_section: + filtered_lines.append(line) + + return "\n".join(filtered_lines) + + +def deploy_model( + model_name: str, + namespace: str, + enable_kserve: bool = False, + image_repository: str = "us.icr.io/gfmaas/vllm-small", + image_tag: str = "v0.0.6", + image_pull_secret: str = "my-registry-secret", + service_account: str = "default", + models_pvc: str = "vllm-models-pvc", + inference_shared_pvc: str = "inference-shared-pvc", + gpu_count: str = "1", + cpu_limit: str = "2000m", + memory_limit: str = "8Gi", + cpu_request: str = "1000m", + memory_request: str = "4Gi", + dry_run: bool = False, + output_file: Optional[str] = None, +) -> bool: + """ + Deploy a model using the template. 
+ + Args: + model_name: Name of the model (will be prefixed with gfm-amo-) + namespace: Kubernetes namespace + enable_kserve: Whether to use KServe InferenceService + image_repository: Container image repository + image_tag: Container image tag + image_pull_secret: Name of image pull secret + service_account: Service account name + models_pvc: PVC name for models storage + inference_shared_pvc: PVC name for shared inference data + gpu_count: Number of GPUs to request + cpu_limit: CPU limit + memory_limit: Memory limit + cpu_request: CPU request + memory_request: Memory request + dry_run: If True, only generate YAML without applying + output_file: If provided, save generated YAML to this file + + Returns: + True if successful, False otherwise + """ + # Configuration dictionary + config = { + "MODEL_NAME": model_name, + "NAMESPACE": namespace, + "ENABLE_KSERVE": str(enable_kserve).lower(), + "IMAGE_REPOSITORY": image_repository, + "IMAGE_TAG": image_tag, + "IMAGE_PULL_SECRET": image_pull_secret, + "SERVICE_ACCOUNT": service_account, + "MODELS_PVC": models_pvc, + "INFERENCE_SHARED_PVC": inference_shared_pvc, + "GPU_COUNT": gpu_count, + "CPU_LIMIT": cpu_limit, + "MEMORY_LIMIT": memory_limit, + "CPU_REQUEST": cpu_request, + "MEMORY_REQUEST": memory_request, + } + + # Load template + template_path = Path(__file__).parent / "vllm-inference-server-template.yaml" + if not template_path.exists(): + print(f"Error: Template file not found at {template_path}", file=sys.stderr) + return False + + template = load_template(str(template_path)) + + # Replace placeholders + yaml_content = replace_placeholders(template, config) + + # Filter based on KServe mode + yaml_content = filter_by_kserve_mode(yaml_content, enable_kserve) + + # Save to file if requested + if output_file: + with open(output_file, "w") as f: + f.write(yaml_content) + print(f"Generated YAML saved to: {output_file}") + + # Print or apply + if dry_run: + print("=" * 80) + print("DRY RUN - Generated YAML:") + 
print("=" * 80) + print(yaml_content) + print("=" * 80) + deployment_type = ( + "KServe InferenceService" if enable_kserve else "Standard Deployment" + ) + print(f"\nDeployment type: {deployment_type}") + print(f"Service name: gfm-amo-{model_name}") + print(f"Namespace: {namespace}") + return True + else: + # Apply to Kubernetes + try: + result = subprocess.run( + ["kubectl", "apply", "-f", "-"], + input=yaml_content.encode(), + capture_output=True, + check=True, + ) + print(result.stdout.decode()) + deployment_type = ( + "KServe InferenceService" if enable_kserve else "Standard Deployment" + ) + print( + f"\n✓ Successfully deployed {deployment_type} for model: gfm-amo-{model_name}" + ) + return True + except subprocess.CalledProcessError as e: + print(f"Error deploying model: {e.stderr.decode()}", file=sys.stderr) + return False + + +def main(): + parser = argparse.ArgumentParser( + description="Deploy GFM inference server for a model", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + # Deploy with standard Deployment + python deploy_model.py my-model --namespace geospatial-studio --dry-run + + # Deploy with KServe InferenceService + python deploy_model.py my-model --namespace geospatial-studio --enable-kserve --dry-run + + # Deploy with custom resources + python deploy_model.py my-model --namespace geospatial-studio \\ + --gpu-count 2 --memory-limit 16Gi --cpu-limit 4000m + + # Generate YAML file without deploying + python deploy_model.py my-model --namespace geospatial-studio \\ + --dry-run --output my-model-deployment.yaml + """, + ) + + parser.add_argument("model_name", help="Name of the model to deploy") + parser.add_argument("--namespace", required=True, help="Kubernetes namespace") + parser.add_argument( + "--enable-kserve", + action="store_true", + help="Use KServe InferenceService instead of standard Deployment", + ) + parser.add_argument( + "--image-repository", + default="us.icr.io/gfmaas/vllm-small", + help="Container 
image repository", + ) + parser.add_argument("--image-tag", default="v0.0.6", help="Container image tag") + parser.add_argument( + "--image-pull-secret", + default="my-registry-secret", + help="Name of image pull secret", + ) + parser.add_argument( + "--service-account", default="default", help="Service account name" + ) + parser.add_argument( + "--models-pvc", default="vllm-models-pvc", help="PVC name for models storage" + ) + parser.add_argument( + "--inference-shared-pvc", + default="inference-shared-pvc", + help="PVC name for shared inference data", + ) + parser.add_argument("--gpu-count", default="1", help="Number of GPUs to request") + parser.add_argument("--cpu-limit", default="2000m", help="CPU limit") + parser.add_argument("--memory-limit", default="8Gi", help="Memory limit") + parser.add_argument("--cpu-request", default="1000m", help="CPU request") + parser.add_argument("--memory-request", default="4Gi", help="Memory request") + parser.add_argument( + "--dry-run", + action="store_true", + help="Generate YAML without applying to cluster", + ) + parser.add_argument("--output", help="Save generated YAML to file") + + args = parser.parse_args() + + success = deploy_model( + model_name=args.model_name, + namespace=args.namespace, + enable_kserve=args.enable_kserve, + image_repository=args.image_repository, + image_tag=args.image_tag, + image_pull_secret=args.image_pull_secret, + service_account=args.service_account, + models_pvc=args.models_pvc, + inference_shared_pvc=args.inference_shared_pvc, + gpu_count=args.gpu_count, + cpu_limit=args.cpu_limit, + memory_limit=args.memory_limit, + cpu_request=args.cpu_request, + memory_request=args.memory_request, + dry_run=args.dry_run, + output_file=args.output, + ) + + sys.exit(0 if success else 1) + + +if __name__ == "__main__": + main() diff --git a/gfmstudio/amo/a-model-01-name-chart/vllm-inference-server-template.yaml b/gfmstudio/amo/a-model-01-name-chart/vllm-inference-server-template.yaml new file mode 100644 
index 0000000..59ae2ba --- /dev/null +++ b/gfmstudio/amo/a-model-01-name-chart/vllm-inference-server-template.yaml @@ -0,0 +1,208 @@ +# © Copyright IBM Corporation 2026 +# SPDX-License-Identifier: Apache-2.0 + +# GFM Inference Server - Automated Deployment Template +# This template is designed for automated model deployment services +# +# PLACEHOLDERS TO REPLACE: +# - ${MODEL_NAME} - Name of the model being deployed (e.g., "my-model") +# - ${NAMESPACE} - Target Kubernetes namespace +# - ${ENABLE_KSERVE} - "true" or "false" to toggle KServe InferenceService +# - ${IMAGE_REPOSITORY} - Container image repository +# - ${IMAGE_TAG} - Container image tag +# - ${IMAGE_PULL_SECRET} - Name of the image pull secret +# - ${SERVICE_ACCOUNT} - Service account name +# - ${MODELS_PVC} - PVC name for models storage +# - ${INFERENCE_SHARED_PVC} - PVC name for shared inference data +# - ${GPU_COUNT} - Number of GPUs to request (e.g., "1") +# - ${CPU_LIMIT} - CPU limit (e.g., "2000m") +# - ${MEMORY_LIMIT} - Memory limit (e.g., "8Gi") +# - ${CPU_REQUEST} - CPU request (e.g., "1000m") +# - ${MEMORY_REQUEST} - Memory request (e.g., "4Gi") +# +# Service names will be prefixed with "gfm-amo-" automatically + +--- +# Service +apiVersion: v1 +kind: Service +metadata: + name: gfm-amo-${MODEL_NAME} + namespace: ${NAMESPACE} + labels: + app.kubernetes.io/name: gfm-amo-${MODEL_NAME} + app.kubernetes.io/component: inference-server + app.kubernetes.io/managed-by: automated-deployment +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: http + protocol: TCP + name: http + selector: + app.kubernetes.io/name: gfm-amo-${MODEL_NAME} + +--- +# Deployment (used when ENABLE_KSERVE=false) +# CONDITIONAL: Only deploy if ${ENABLE_KSERVE} == "false" +apiVersion: apps/v1 +kind: Deployment +metadata: + name: gfm-amo-${MODEL_NAME} + namespace: ${NAMESPACE} + labels: + app.kubernetes.io/name: gfm-amo-${MODEL_NAME} + app.kubernetes.io/component: inference-server + app.kubernetes.io/managed-by: 
automated-deployment + annotations: + deployment.type: "standard" +spec: + replicas: 1 + strategy: + type: Recreate + selector: + matchLabels: + app.kubernetes.io/name: gfm-amo-${MODEL_NAME} + template: + metadata: + labels: + app.kubernetes.io/name: gfm-amo-${MODEL_NAME} + app.kubernetes.io/component: inference-server + app.kubernetes.io/managed-by: automated-deployment + spec: + imagePullSecrets: + - name: ${IMAGE_PULL_SECRET} + serviceAccountName: ${SERVICE_ACCOUNT} + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + preference: + matchExpressions: + - key: nvidia.com/gpu + operator: Exists + initContainers: + - name: setup-directory + image: busybox + command: ['sh', '-c', 'mkdir -p /data/outputs && chmod 777 /data/outputs'] + volumeMounts: + - name: inference-shared-pvc + mountPath: /data + containers: + - name: inference-server + image: "${IMAGE_REPOSITORY}:${IMAGE_TAG}" + imagePullPolicy: Always + ports: + - name: http + containerPort: 8000 + protocol: TCP + env: + - name: MODELS_PATH + value: "/models" + - name: HF_HOME + value: "/models/huggingface_cache" + - name: TERRATORCH_SEGMENTATION_IO_PROCESSOR_CONFIG + value: "{\"output_path\": \"/data/outputs/\"}" + - name: MODEL_NAME + value: "${MODEL_NAME}" + volumeMounts: + - name: models-storage + mountPath: /models + - name: inference-shared-pvc + mountPath: /data + resources: + limits: + cpu: "${CPU_LIMIT}" + memory: "${MEMORY_LIMIT}" + nvidia.com/gpu: "${GPU_COUNT}" + requests: + cpu: "${CPU_REQUEST}" + memory: "${MEMORY_REQUEST}" + nvidia.com/gpu: "${GPU_COUNT}" + volumes: + - name: models-storage + persistentVolumeClaim: + claimName: ${MODELS_PVC} + - name: inference-shared-pvc + persistentVolumeClaim: + claimName: ${INFERENCE_SHARED_PVC} + +--- +# InferenceService (used when ENABLE_KSERVE=true) +# CONDITIONAL: Only deploy if ${ENABLE_KSERVE} == "true" +# Provides KServe integration with scale-to-zero capabilities +apiVersion: serving.kserve.io/v1beta1 +kind: 
InferenceService +metadata: + name: gfm-amo-${MODEL_NAME} + namespace: ${NAMESPACE} + labels: + app.kubernetes.io/name: gfm-amo-${MODEL_NAME} + app.kubernetes.io/component: inference-server + app.kubernetes.io/managed-by: automated-deployment + annotations: + deployment.type: "kserve" +spec: + predictor: + minReplicas: 0 + maxReplicas: 3 + scaleTarget: 100 + scaleMetric: concurrency + containerConcurrency: 0 + timeout: 600 + containers: + - name: kserve-container + image: "${IMAGE_REPOSITORY}:${IMAGE_TAG}" + imagePullPolicy: Always + ports: + - name: http + containerPort: 8000 + protocol: TCP + env: + - name: MODELS_PATH + value: "/models" + - name: HF_HOME + value: "/models/huggingface_cache" + - name: TERRATORCH_SEGMENTATION_IO_PROCESSOR_CONFIG + value: "{\"output_path\": \"/data/outputs/\"}" + - name: MODEL_NAME + value: "${MODEL_NAME}" + volumeMounts: + - name: models-storage + mountPath: /models + - name: inference-shared-pvc + mountPath: /data + resources: + limits: + cpu: "${CPU_LIMIT}" + memory: "${MEMORY_LIMIT}" + nvidia.com/gpu: "${GPU_COUNT}" + requests: + cpu: "${CPU_REQUEST}" + memory: "${MEMORY_REQUEST}" + nvidia.com/gpu: "${GPU_COUNT}" + initContainers: + - name: setup-directory + image: busybox + command: ['sh', '-c', 'mkdir -p /data/outputs && chmod 777 /data/outputs'] + volumeMounts: + - name: inference-shared-pvc + mountPath: /data + volumes: + - name: models-storage + persistentVolumeClaim: + claimName: ${MODELS_PVC} + - name: inference-shared-pvc + persistentVolumeClaim: + claimName: ${INFERENCE_SHARED_PVC} + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + preference: + matchExpressions: + - key: nvidia.com/gpu + operator: Exists + imagePullSecrets: + - name: ${IMAGE_PULL_SECRET}
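+
+# ---------------------------------------------------------------------------
+# Illustrative usage (assumes the companion deploy_model.py in this
+# directory; the model name "flood-demo" is a hypothetical example):
+#
+#   python deploy_model.py flood-demo --namespace geospatial-studio --dry-run
+#
+# This renders the template and prints the result without applying it; the
+# generated resources are named "gfm-amo-flood-demo".
+# ---------------------------------------------------------------------------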