[ML] Adding configurable inference service #127939

jonathan-buttner · 2025-05-08T20:08:21Z

Taking the ideas and commits from #124299

Notable changes from initial PR:

Flattened structure by removing the path and method nesting
- I expect that we'll only have a single path and the method will always be POST
- The path portion of the url can be placed directly in the url field
Removed query_string and replaced it with query_parameters, a list of tuples (a list rather than a map because query parameters can have duplicate keys)
Removed description and version as they weren't used
Flattened the sparse embedding response parser format by removing the sparse_result and value fields
Refactored the sparse embedding response parser format to have the token and weight fields include the full path
Added response.error_parser to indicate where to find the error message field
Removed custom task type support because it would be difficult for the client libraries to handle a custom response
Refactored the sparse embedding json_parser fields to only be path
- The parsing logic expects the response to be a map of token id and weight, so we only need a path field to tell it where to find that nested map
- NOTE: This will not support how ELSER formats its response (for example, Hugging Face ELSER or the elasticsearch service). If we want to support that in the future, I think we could add a format field that specifies how the response is structured. ELSER's structure is an array of maps where the key is the token id and the value is the weight, whereas this parser expects each map to have a token id field and a weight field.

Adds custom model support to the Inference API.

You can use this service to invoke models that are exposed over HTTP.

Inference Endpoint Creation:

PUT _inference/{task_type}/{inference_id}
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      ...
    },
    "url": "<<url>>",
    "headers": {
      <<header parameters>>
    },
    "query_parameters": [
      ["<<key>>", "<<value>>"]
    ],
    "request": {
      "content": "<<content>>"
    },
    "response": {
      "json_parser":{
        ...
      },
      "error_parser":{
       ...
      }
    }
  },
  "task_settings": {
    "parameters":{
      ...
    }
  }
}

Supported task_type values:

text_embedding
sparse_embedding
rerank
completion

Parameter Description

secret_parameters: secrets such as an api_key are defined here and can be referenced from other fields with placeholders like ${api_key}.

"secret_parameters":{
  "api_key":"xxx"
}

headers (optional): HTTP request headers.

"headers":{
  "Authorization": "Bearer ${api_key}",    // Placeholders are replaced when the request is constructed.
  "Content-Type": "application/json;charset=utf-8"
}

request.content: the HTTP request body, passed as a string containing the JSON body with its quotes escaped. Placeholders such as ${input} are replaced when the request is constructed.

"request":{
  "content":"{\"input\":${input}}"
}

# Using Kibana Dev Tools, a triple-quoted string avoids the escaping
"request":{
  "content":"""
    {
      "input":${input}   // Placeholders are replaced when the request is constructed.
    }
    """
}

NOTE: Outside of Kibana, triple-quoted strings aren't supported, so the content must be a single-line, escaped JSON string.
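A minimal sketch of the placeholder substitution, assuming placeholders have the form `${name}` with a word-character name and that the values are already strings. This version fails during the scan when a placeholder has no value, so a `${...}` that happens to appear inside a substituted value is not misreported. `PlaceholderSubstitution` and `replace` are hypothetical names, not the CustomRequest implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the CustomRequest implementation.
final class PlaceholderSubstitution {
    // Matches ${name}; group 1 captures the name (letters, digits, underscore).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    private PlaceholderSubstitution() {}

    /**
     * Replaces every ${name} in template with values.get(name).
     * Throws if a placeholder has no value, so typos such as ${ai_key} fail fast.
     */
    static String replace(String template, Map<String, String> values, String fieldName) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String replacement = values.get(matcher.group(1));
            if (replacement == null) {
                throw new IllegalStateException(
                    "Found placeholder [" + matcher.group(0) + "] in field [" + fieldName + "] with no matching value"
                );
            }
            // quoteReplacement stops '$' or '\' in the value from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, `replace("Bearer ${api_key}", Map.of("api_key", "abc"), "headers.Authorization")` yields `Bearer abc`.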

response.json_parser: The returned response must be parsed into an object that Elasticsearch can recognize (TextEmbeddingFloatResults, SparseEmbeddingResults, RankedDocsResults, ChatCompletionResults).
We therefore use JSONPath syntax to extract the necessary content from the response.
(For the text_embedding type, we need a List<List<Float>> object. The same applies to the other types.)
Different task types take different json_parser parameters.

# text_embedding
"response":{
  "json_parser":{
    "text_embeddings":"$.result.embeddings[*].embedding"
  }
}

# sparse_embedding
"response":{
  "json_parser":{
    "token_path":"$.result[*].embeddings[*].token",
    "weight_path":"$.result[*].embeddings[*].weight"
  }
}

# rerank
"response":{
  "json_parser":{
    "reranked_index":"$.result.scores[*].index",    // optional
    "relevance_score":"$.result.scores[*].score",
    "document_text":"xxx"    // optional
  }
}

# completion
"response":{
  "json_parser":{
    "completion_result":"$.result.text"
  }
}
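The json_parser paths above can be illustrated with a deliberately simplified evaluator. This sketch is not the parser the service uses. It makes three assumptions:

- The response has already been decoded into nested `Map`/`List` values.
- Only `.field` and `[*]` steps are supported.
- Each `[*]` keeps one list level. Standard JSONPath instead collects wildcard matches into a flat list.

Because of the third assumption, `$.data[*].embedding` yields the `List<List<Float>>`-shaped result a text_embedding needs. `SimpleJsonPath` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified path evaluator over an already-decoded JSON tree.
final class SimpleJsonPath {
    private SimpleJsonPath() {}

    static Object extract(Object root, String path) {
        if (!path.startsWith("$")) {
            throw new IllegalArgumentException("path must start with $: " + path);
        }
        // Split "$.data[*].embedding" into the steps [data, [*], embedding].
        List<String> steps = new ArrayList<>();
        for (String segment : path.substring(1).split("\\.")) {
            if (segment.isEmpty()) {
                continue;
            }
            if (segment.endsWith("[*]")) {
                String field = segment.substring(0, segment.length() - 3);
                if (!field.isEmpty()) {
                    steps.add(field);
                }
                steps.add("[*]");
            } else {
                steps.add(segment);
            }
        }
        return walk(root, steps, 0);
    }

    private static Object walk(Object node, List<String> steps, int index) {
        if (index == steps.size()) {
            return node;
        }
        String step = steps.get(index);
        if (step.equals("[*]")) {
            if (!(node instanceof List)) {
                throw new IllegalArgumentException("expected a list for step " + index + " but found " + node);
            }
            // One output list per wildcard keeps the original nesting.
            List<Object> results = new ArrayList<>();
            for (Object element : (List<?>) node) {
                results.add(walk(element, steps, index + 1));
            }
            return results;
        }
        if (!(node instanceof Map) || !((Map<?, ?>) node).containsKey(step)) {
            throw new IllegalArgumentException("field [" + step + "] not found");
        }
        return walk(((Map<?, ?>) node).get(step), steps, index + 1);
    }
}
```

Under this nesting-preserving rule, `$.data[*].embedding` and `$.data[*].embedding[*]` (both used in this thread) produce the same result. The same walk handles an error path such as `$.error.message`.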

response.error_parser: Since each third-party service can have its own error response format, the user needs to tell us where to find the base error message. For example, OpenAI's error structure is documented here: https://platform.openai.com/docs/api-reference/realtime-server-events/error. We'd want to extract the message field, which might look like:

"response": {
    "error_parser": {
        "path": "$.error.message"
    }
}

task_settings.parameters: Because of limitations in the inference framework, any additional parameters the model requires can be set in task_settings.parameters. These parameters can be referenced as placeholders in request.content and are replaced with the configured values when the request is constructed.

"task_settings":{
  "parameters":{
    "input_type":"query",
    "return_token":true
  }
}

Testing

🚧 In progress

Jon Testing

OpenAI

Text Embedding

PUT _inference/text_embedding/test
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api_key>"
        },
        "url": "https://api.openai.com/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
        },
        "request": {
            "content": "{\"input\": ${input}, \"model\": \"text-embedding-3-small\"}"
        },
        "response": {
            "json_parser": {
                "text_embeddings": "$.data[*].embedding[*]"
            },
            "error_parser": {
                "path": "$.error.message"
            }
        }
    }
}

POST _inference/text_embedding/test
{
    "input": ["The quick brown fox jumps over the lazy dog"]
}

Cohere

Rerank

PUT _inference/rerank/test-rerank
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.cohere.com/v2/rerank",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": {
            "content": "{\"documents\": ${input}, \"query\": ${query}, \"model\": \"rerank-v3.5\"}"
        },
        "response": {
            "json_parser": {
                "reranked_index":"$.results[*].index",
                "relevance_score":"$.results[*].relevance_score"
            },
            "error_parser": {
                "path": "$.message"
            }
        }
    }
}


POST _inference/rerank/test-rerank
{
    "input": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
        "Capitalization or capitalisation in English grammar is the use of a capital letter at the start of a word. English usage varies from capitalization in other languages.",
        "Capital punishment has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
    ],
    "query": "What is the capital of the United States?"
}

Azure OpenAI

PUT _inference/completion/test-azure
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "<url>",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "query_parameters": [
            ["api-version", "2025-01-01-preview"]
        ],
        "request": {
            "content": "{\"model\": \"gpt-4\", \"messages\":[{\"role\":\"user\",\"content\":${input}}]}"
        },
        "response": {
            "json_parser": {
                "completion_result":"$.choices[*].message.content"
            },
            "error_parser": {
                "path": "$.error.message"
            }
        }
    }
}

POST _inference/completion/test-azure
{
    "input": "Who is the president of the United States?"
}

Alibaba Testing

We use the Alibaba Cloud AI Search model as an example.
Replace the value of secret_parameters.api_key with your API key.

text_embedding

PUT _inference/text_embedding/custom_embeddings
{
  "service":"custom",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<<your api_key>>"
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "headers":{
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json;charset=utf-8"
    },
    "request":{
      "content":"""
        {
          "input":${input}
        }
        """
    },
    "response":{
      "json_parser":{
        "text_embeddings":"$.result.embeddings[*].embedding"
      },
      "error_parser": {
        "path": "$.error.message"
      }
    }
  }
}

POST _inference/text_embedding/custom_embeddings
{
  "input":"test"
}

sparse_embedding

PUT _inference/sparse_embedding/custom_sparse_embedding
{
  "service":"custom",
  "service_settings":{
    "secret_parameters":{
      "api_key":"<<your api_key>>"
    },
    "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-sparse-embedding/ops-text-sparse-embedding-001",
    "headers":{
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json;charset=utf-8"
    },
    "request":{
      "content":"""
        {
          "input": ${input},
          "input_type": ${input_type},
          "return_token": ${return_token}
        }
        """
    },
    "response":{
      "json_parser":{
        "token_path":"$.result[*].embeddings[*].token",
        "weight_path":"$.result[*].embeddings[*].weight"
      },
      "error_parser": {
        "path": "$.error.message"
      }
    }
  },
  "task_settings":{
    "parameters":{
      "input_type":"query",
      "return_token":true
    }
  }
}

POST _inference/sparse_embedding/custom_sparse_embedding?error_trace
{
  "input":["hello", "world"]
}

rerank

PUT _inference/rerank/custom_rerank
{
    "service":"custom",
    "service_settings":{
        "secret_parameters":{
            "api_key":"<<your api_key>>"
        },
        "url":"http://default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
        "headers":{
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
        },
        "request":{
            "content":"""
                {
                "query": ${query},
                "docs": ${input}
                }
                """
        },
        "response":{
            "json_parser":{
                "reranked_index":"$.result.scores[*].index",
                "relevance_score":"$.result.scores[*].score"
            },
            "error_parser": {
                "path": "$.error.message"
            }
        }
    }
}

POST _inference/rerank/custom_rerank
{
  "input": ["luke", "like", "leia", "chewy","r2d2", "star", "wars"],
  "query": "star wars main character"
}

elasticsearchmachine · 2025-05-08T20:08:48Z

Hi @jonathan-buttner, I've created a changelog YAML for you.


jonathan-buttner · 2025-05-20T14:27:43Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

@@ -36,7 +36,7 @@ public abstract class BaseResponseHandler implements ResponseHandler {
    public static final String METHOD_NOT_ALLOWED = "Received a method not allowed status code";

    protected final String requestType;
-    private final ResponseParser parseFunction;
+    protected final ResponseParser parseFunction;


Making this available so the custom response handler can immediately return on a parse failure instead of retrying.
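The reason a parse failure should short-circuit the retry loop can be sketched as follows. A response that arrived but doesn't match the configured paths will fail the same way on every attempt, whereas a transient error might succeed later. This is a hypothetical illustration under assumed names (`RetrySketch`, `ParseFailure`, `RetryableFailure`), not the inference plugin's retry code or the ResponseHandler API.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the retry decision; not the BaseResponseHandler API.
final class RetrySketch {
    private RetrySketch() {}

    /** The service answered, but the body could not be parsed with the configured paths. */
    static final class ParseFailure extends RuntimeException {
        ParseFailure(String message) {
            super(message);
        }
    }

    /** A transient failure (for example a 429 or 5xx) that may succeed on a later attempt. */
    static final class RetryableFailure extends RuntimeException {
        RetryableFailure(String message) {
            super(message);
        }
    }

    static <T> T execute(Supplier<T> attempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RetryableFailure last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (ParseFailure e) {
                // A bad json_parser path gives the same result on every attempt, so fail immediately.
                throw e;
            } catch (RetryableFailure e) {
                last = e;
            }
        }
        throw last;
    }
}
```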

jonathan-buttner · 2025-05-20T14:28:43Z

...inference/src/main/java/org/elasticsearch/xpack/inference/services/custom/CustomService.java

+        private static final LazyInitializable<InferenceServiceConfiguration, RuntimeException> configuration = new LazyInitializable<>(
+            () -> {
+                var configurationMap = new HashMap<String, SettingsConfiguration>();
+                // TODO revisit this


We'll need to create some more complex configuration types to support the fields (like maps, lists of lists, etc.). Maybe for now we don't expose this in the services API?

jonathan-buttner · 2025-05-20T14:29:21Z

...e/src/main/java/org/elasticsearch/xpack/inference/services/custom/CustomServiceSettings.java

+
+        Map<String, Object> headers = extractOptionalMap(map, HEADERS, ModelConfigurations.SERVICE_SETTINGS, validationException);
+        removeNullValues(headers);
+        var stringHeaders = validateMapStringValues(headers, HEADERS, validationException, false);


This should limit the values in the header map to only strings.

jonathan-buttner · 2025-05-20T14:29:53Z

...ence/src/main/java/org/elasticsearch/xpack/inference/services/custom/CustomTaskSettings.java

+        removeNullValues(parameters);
+        validateMapValues(
+            parameters,
+            List.of(String.class, Integer.class, Double.class, Float.class, Boolean.class),


Restricting the task settings to these types (no nested fields aka maps or lists).
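Both restrictions (string-only header values, scalar-only task parameters) amount to a type check on each map value. Below is a minimal sketch that assumes values arrive as Java objects decoded from the request JSON. `ParameterTypeValidation` is a hypothetical name, not the plugin's `validateMapValues` helper, and it returns error strings instead of filling a `ValidationException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of restricting parameter values to a set of allowed types.
final class ParameterTypeValidation {
    private ParameterTypeValidation() {}

    // The scalar types listed in the CustomTaskSettings snippet above.
    static final List<Class<?>> ALLOWED =
        List.of(String.class, Integer.class, Double.class, Float.class, Boolean.class);

    /** Returns one error message per entry whose value is not an allowed type. */
    static List<String> validate(Map<String, ?> parameters, List<Class<?>> allowed) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, ?> entry : parameters.entrySet()) {
            Object value = entry.getValue();
            // Nulls are assumed to have been removed already (compare removeNullValues).
            if (value == null) {
                continue;
            }
            if (allowed.stream().noneMatch(type -> type.isInstance(value))) {
                errors.add("parameter [" + entry.getKey() + "] has unsupported type ["
                    + value.getClass().getSimpleName() + "]");
            }
        }
        return errors;
    }
}
```

Calling `validate(headers, List.of(String.class))` gives the string-only header rule with the same code.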

jonathan-buttner · 2025-05-20T14:30:19Z

...ference/src/main/java/org/elasticsearch/xpack/inference/services/custom/QueryParameters.java

+    public static final String QUERY_PARAMETERS = "query_parameters";
+
+    public static QueryParameters fromMap(Map<String, Object> map, ValidationException validationException) {
+        List<Tuple<String, String>> queryParams = extractOptionalListOfStringTuples(


Query parameters can have duplicate keys which is why I'm not using a map here.
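To see why a list is needed, here is a sketch that turns ordered key/value pairs into a query string. It uses `java.util.Map.Entry` as an assumed stand-in for the plugin's `Tuple` and applies `URLEncoder` form encoding. The class name and the example.com URL are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: Map.Entry stands in for the plugin's Tuple<String, String>.
final class QueryStringBuilder {
    private QueryStringBuilder() {}

    static String build(String baseUrl, List<Map.Entry<String, String>> params) {
        if (params.isEmpty()) {
            return baseUrl;
        }
        // Order and duplicates are preserved because the input is a list, not a map.
        String query = params.stream()
            .map(p -> encode(p.getKey()) + "=" + encode(p.getValue()))
            .collect(Collectors.joining("&"));
        return baseUrl + (baseUrl.contains("?") ? "&" : "?") + query;
    }

    // Form-style encoding (a space becomes '+'); requires Java 10+.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

A map keyed on the parameter name could hold only one `tag` value. The list produces `?api-version=2025-01-01-preview&tag=a&tag=b` for two `tag` entries.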

jonathan-buttner · 2025-05-20T14:31:13Z

...e/src/main/java/org/elasticsearch/xpack/inference/services/custom/request/CustomRequest.java

+        uri = buildUri();
+    }
+
+    private static void addStringParams(Map<String, String> stringParams, Map<String, ?> paramsToAdd) {


Fields like the URL, query parameters, and headers should not have their values converted to JSON. This helper only accepts strings and doesn't manipulate them.

jonathan-buttner · 2025-05-20T14:31:34Z

...e/src/main/java/org/elasticsearch/xpack/inference/services/custom/request/CustomRequest.java

+        }
+    }
+
+    private static void addJsonStringParams(Map<String, String> jsonStringParams, Map<String, ?> params) {


Fields like the request body need to be valid JSON, so we'll convert the values into JSON.
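The difference between the two helpers can be sketched as a value-to-JSON conversion that is applied only to body placeholders. In this sketch a string becomes a quoted, escaped JSON string and a list becomes a JSON array. URL, header, and query values would be inserted verbatim. This is an illustration under assumptions: `JsonValueEncoder` is hypothetical, and the real code may rely on Elasticsearch's XContent instead.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of JSON-encoding placeholder values for the request body.
final class JsonValueEncoder {
    private JsonValueEncoder() {}

    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Boolean || value instanceof Integer || value instanceof Long) {
            return value.toString();
        }
        if (value instanceof Double || value instanceof Float) {
            double d = ((Number) value).doubleValue();
            if (Double.isNaN(d) || Double.isInfinite(d)) {
                throw new IllegalArgumentException("JSON has no representation for " + value);
            }
            return value.toString();
        }
        if (value instanceof List) {
            // For example, the ${input} list of documents becomes ["doc1","doc2"].
            return ((List<?>) value).stream()
                .map(JsonValueEncoder::toJson)
                .collect(Collectors.joining(",", "[", "]"));
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass().getName());
    }

    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':
                    sb.append("\\\"");
                    break;
                case '\\':
                    sb.append("\\\\");
                    break;
                case '\n':
                    sb.append("\\n");
                    break;
                case '\r':
                    sb.append("\\r");
                    break;
                case '\t':
                    sb.append("\\t");
                    break;
                default:
                    // Remaining control characters must be written as unicode escapes.
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Under this behaviour a template should write `"query": ${query}` rather than `"query": "${query}"`, because the conversion produces the quotes. That is the form used in the Cohere and Jina rerank examples in this thread.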

jonathan-buttner · 2025-05-20T14:32:07Z

.../main/java/org/elasticsearch/xpack/inference/services/settings/SerializableSecureString.java

+import java.io.IOException;
+import java.util.Objects;
+
+public class SerializableSecureString implements ToXContentFragment, Writeable {


If we need to serialize the api key or some secrets to the body of a request this class will make that process a little easier by implementing toXContent()

I'm not sure if this class adds or removes complexity. CustomSecretSettings could easily handle the streaming of a secure string and the toXContent parts. Perhaps this is a naming issue as SerializableSecureString is not a type of SecureString it's a wrapper around one

Addressed here, I was able to remove the class and use SecureString directly: #128698

jonathan-buttner · 2025-05-20T14:33:35Z

...inference/src/test/java/org/elasticsearch/xpack/inference/services/AbstractServiceTests.java

+import static org.hamcrest.Matchers.is;
+import static org.mockito.Mockito.mock;
+
+/**


This class is an attempt to push a lot of the duplicate logic in the inference service tests into a central place. If we create more services we should leverage this base class to remove the copy/paste.

jonathan-buttner · 2025-05-20T14:34:09Z

...ain/java/org/elasticsearch/xpack/inference/services/custom/response/ErrorResponseParser.java

-            // swallow the error
+            var resultAsString = new String(httpResult.body(), StandardCharsets.UTF_8);
+
+            logger.info(


I ran into a scenario where Azure OpenAI didn't return a JSON response. Normally if that happens we'd swallow the parse error and wouldn't report anything useful back. With this change we'll log the parse failure. The error parsing logic should only be called if we receive a failure status code. If many requests fail and we can't parse the errors, this could produce a lot of log entries.


jonathan-buttner · 2025-05-20T19:16:20Z

@elasticmachine merge upstream

elasticmachine · 2025-05-20T19:16:23Z

There are no new commits on the base branch.

elasticsearchmachine · 2025-05-20T20:51:55Z

Pinging @elastic/ml-core (Team:ML)

davidkyle

LGTM

.../plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ServiceUtils.java

davidkyle · 2025-05-29T09:26:45Z

.../main/java/org/elasticsearch/xpack/inference/services/settings/SerializableSecureString.java

+import java.io.IOException;
+import java.util.Objects;
+
+public class SerializableSecureString implements ToXContentFragment, Writeable {


I'm not sure if this class adds or removes complexity. CustomSecretSettings could easily handle the streaming of a secure string and the toXContent parts. Perhaps this is a naming issue as SerializableSecureString is not a type of SecureString it's a wrapper around one

davidkyle · 2025-05-29T09:38:03Z

...inference/src/test/java/org/elasticsearch/xpack/inference/services/AbstractServiceTests.java

+ * To use this class, extend it and pass the constructor a configuration.
+ * </p>
+ */
+public abstract class AbstractServiceTests extends ESTestCase {


Suggested change:

-public abstract class AbstractServiceTests extends ESTestCase {
+public abstract class AbstractInferenceServiceTests extends ESTestCase {

davidkyle · 2025-05-29T11:44:25Z

Embedding Services

I successfully created embedding services for Cohere and VoyageAI. It was actually quite simple to get something working, but I have a few questions/suggestions.

Can the request body go in the request field rather than request.content? I can't think of any other field to nest under request that would warrant it being an object rather than a string.
error_parser is required but it's actually quite difficult to figure out what the path should be. Not many services document the error response. Can error_parser be optional and default to stringifying the error response?
Embedding services have an input_type option the value of which depends on the context (ingest or search) and is different for different services (e.g. query vs search_document). To support input_type in a custom service the service should declare a map from the context to the string value so it can be added to the request body "input_type": $input_type
There isn't a way of declaring what the embedding data type is (binary, byte, float)
Embeddings need to be tested with semantic text
The validation of the $variable replacement works well but is hard to read. In the example below I mistyped api_key as ai_key. It would be better if the Found placeholder [${ai_key}] in field [header.Authorization] after replacement call error appeared as the root_cause

{
  "error": {
    "root_cause": [
      {
        "type": "exception",
        "reason": "Http client failed to send request from inference entity id [custom_embeddings2]"
      }
    ],
    "type": "status_exception",
    "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
    "caused_by": {
      "type": "exception",
      "reason": "Http client failed to send request from inference entity id [custom_embeddings2]",
      "caused_by": {
        "type": "illegal_state_exception",
        "reason": "Found placeholder [${ai_key}] in field [header.Authorization] after replacement call"
      }
    }
  },
  "status": 400
}

# Cohere 
PUT _inference/text_embedding/custom_cohere_embeddings
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "X"
    },
    "url": "https://api.cohere.com/v2/embed",
    "headers": {
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json;charset=utf-8"
    },
    "request": {
      "content": """
        {
          "model": "embed-v4.0",
          "texts":${input},
          "input_type":"search",
          "output_dimension": 512
        }
        """
    },
    "response": {
      "json_parser": {
        "text_embeddings": "$.embeddings.float[*]"
      },
      "error_parser": {
        "path": "$.message"
      }
    }
  }
}


# VoyageAI
PUT _inference/text_embedding/voyage_embeddings
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "X"
    },
    "url": "https://api.voyageai.com/v1/embeddings",
    "headers": {
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json"
    },
    "request": {
      "content": """
        {
          "model": "voyage-3-large",
          "input":${input},
          "input_type":"query",
          "output_dimension": 512,
          "output_dtype": "int8"
        }
        """
    },
    "response": {
      "json_parser": {
        "text_embeddings": "$.data[*].embedding"
      },
      "error_parser": {
        "path": "$.message"
      }
    }
  }
}

# Granite running on local LM Studio
PUT _inference/text_embedding/granite_embeddings
{
  "service": "custom",
  "service_settings": {
    "similarity": "cosine",
    "url": "http://127.0.0.1:1234/api/v0/embeddings",
    "headers": {
      "Content-Type": "application/json"
    },
    "request": {
      "content": """
        {
          "model": "text-embedding-granite-embedding-278m-multilingual",
          "input":${input}
        }
        """
    },
    "response": {
      "json_parser": {
        "text_embeddings": "$.data[*].embedding"
      },
      "error_parser": {
        "path": "$.message"
      }
    }
  }
}

davidkyle · 2025-05-29T12:30:08Z

...ce/src/main/java/org/elasticsearch/xpack/inference/services/custom/CustomRequestManager.java

+        }
+
+        try {
+            var request = new CustomRequest(query, input, model);


EmbeddingsInput has an inputType parameter that should be passed to the CustomRequest so that it can be replaced in the request body. The same goes for the topN and returnDocuments options in QueryAndDocsInputs. These could all be passed to the CustomRequest constructor as a loose map.
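Here is a sketch of that suggestion, under assumed names. The request-specific options are collected into a loose map whose keys match body placeholders. The placeholder names here (`${input_type}`, `${top_n}`, `${return_documents}`) are hypothetical. The input type is translated through a per-endpoint mapping, as proposed in the Embedding Services review. `RequestOptions` and its `InputType` enum are illustrative only and are not the `EmbeddingsInput` or `QueryAndDocsInputs` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of passing optional request options as a loose map.
final class RequestOptions {
    private RequestOptions() {}

    /** Stand-in for the framework's input type; the real enum may differ. */
    enum InputType { INGEST, SEARCH }

    /**
     * Builds the extra placeholder values. inputTypeTranslation is the per-endpoint
     * mapping, for example SEARCH to "search_query" for Cohere. Absent options are omitted.
     */
    static Map<String, Object> build(
        InputType inputType,
        Map<InputType, String> inputTypeTranslation,
        Integer topN,
        Boolean returnDocuments
    ) {
        Map<String, Object> params = new HashMap<>();
        if (inputType != null && inputTypeTranslation.containsKey(inputType)) {
            params.put("input_type", inputTypeTranslation.get(inputType));
        }
        if (topN != null) {
            params.put("top_n", topN);
        }
        if (returnDocuments != null) {
            params.put("return_documents", returnDocuments);
        }
        return params;
    }
}
```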

davidkyle · 2025-05-29T13:48:18Z

Rerank

# With JinaAi
PUT _inference/rerank/jina
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "X"
    },    
    "url": "https://api.jina.ai/v1/rerank",
    "headers": {
      "Content-Type": "application/json",
      "Authorization": "Bearer ${api_key}"
    },
    "request": {
      "content": """
        {
          "model": "jina-reranker-v2-base-multilingual",
          "query": ${query},
          "documents":${input}
        }
        """
    },
    "response": {
      "json_parser": {
        "relevance_score": "$.results[*].relevance_score",
        "reranked_index": "$.results[*].index"
      },
      "error_parser": {
        "path": "$.message"
      }
    }
  }
}

davidkyle · 2025-05-29T14:37:57Z

Sparse Embedding

# Using the Alibaba sparse model
PUT _inference/sparse_embedding/custom_sparse_embedding
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "X"
    },
    "url": "http://XXX/v3/openapi/workspaces/default/text-sparse-embedding/ops-text-sparse-embedding-001",
    "headers": {
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json;charset=utf-8"
    },
    "request": {
      "content": """
              {
                "input": ${input}
              }
              """
    },
    "response": {
      "json_parser": {
        "token_path": "$.result.sparse_embeddings[*].embedding[*].token_id",
        "weight_path": "$.result.sparse_embeddings[*].embedding[*].weight"
      },
      "error_parser": {
        "path": "$.message"
      }
    }
  }
}

davidkyle

LGTM

jonathan-buttner · 2025-05-29T15:00:04Z

Dave and I chatted, I'll address the feedback in followup PRs. I added a feature flag to exclude the current functionality from production.


elasticsearchmachine · 2025-05-29T21:13:34Z

💔 Backport failed

Status	Branch	Result
❌	8.19	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127939

jonathan-buttner · 2025-05-29T21:22:05Z

💚 All backports created successfully

Status	Branch	Result
✅	8.19

Questions?

Please refer to the Backport tool documentation

* Inference changes
* Custom service fixes
* Update docs/changelog/127939.yaml
* Cleaning up from failed merge
* Fixing changelog
* [CI] Auto commit changes from spotless
* Fixing test
* Adding feature flag
* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine
(cherry picked from commit 9db1837)

# Conflicts:
# server/src/main/java/org/elasticsearch/TransportVersions.java
# test/test-clusters/src/main/java/org/elasticsearch/test/cluster/FeatureFlag.java

jonathan-buttner · 2025-05-30T20:31:26Z

I'll track addressing the feedback using this comment

Dave's feedback:

Can the request body go in the request field rather than request.content? I can't think of any other field to nest under request that would warrant it being an object rather than a string.
- [ML] Flattening request map for custom service #128699
error_parser is required but it's actually quite difficult to figure out what the path should be. Not many services document the error response. Can error_parser be optional and default to stringifying the error response?
- I removed the error parser; now we'll just convert the response to a string: [ML] Remove error parsing functionality for custom service #128778
Embedding services have an input_type option the value of which depends on the context (ingest or search) and is different for different services (e.g. query vs search_document). To support input_type in a custom service the service should declare a map from the context to the string value so it can be added to the request body "input_type": $input_type
- [ML] Custom service add support for input_type, top_n, and return_documents #129441
Embeddings need to be tested with semantic text
- [ML] Custom service adding support for the semantic text field #129558
The validation of the $variable replacement works well but is hard to read. In the example below I mistyped api_key as ai_key. It would be better if the Found placeholder [${ai_key}] in field [header.Authorization] after replacement call error appeared as the root_cause
- [ML] CustomService adding template validation prior to request flow #129591
There isn't a way of declaring what the embedding data type is (binary, byte, float)


jonathan-buttner added 2 commits May 8, 2025 15:54

Inference changes

14a5383

Custom service fixes

eba5fce

jonathan-buttner added >enhancement :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 labels May 8, 2025

Update docs/changelog/127939.yaml

9af98be

jonathan-buttner added 3 commits May 8, 2025 16:16

Cleaning up from failed merge

cb09e30

Merge branch 'custom-inference-service-jon' of github.com:jonathan-buttner/elasticsearch into custom-inference-service-jon

c8642cd

Fixing changelog

e7c62d8

This was referenced May 8, 2025

[ML] Custom Inference Service #125679

Closed

[Inference API] Add Custom Model support to Inference API #124299

Open

[CI] Auto commit changes from spotless

6bb2a95

jonathan-buttner changed the title ~~Custom inference service jon~~ Adding configurable inference service May 20, 2025

jonathan-buttner commented May 20, 2025


jonathan-buttner added 4 commits May 20, 2025 10:39

Fixing transport version

67329e2

Merge branch 'custom-inference-service-jon' of github.com:jonathan-buttner/elasticsearch into custom-inference-service-jon

dd14970

Fixing test

6be22b5

Fixing transport version

da1c71f

jonathan-buttner marked this pull request as ready for review May 20, 2025 20:51

davidkyle reviewed May 29, 2025


[CI] Auto commit changes from spotless

280d4dd

davidkyle approved these changes May 29, 2025


jonathan-buttner enabled auto-merge (squash) May 29, 2025 15:00

jonathan-buttner and others added 7 commits May 29, 2025 11:47

Fixing test issue

d1137b6

Merge branch 'main' of github.com:elastic/elasticsearch into custom-inference-service-jon

e955bf4

Merge branch 'custom-inference-service-jon' of github.com:jonathan-buttner/elasticsearch into custom-inference-service-jon

27fdfa8

[CI] Auto commit changes from spotless

7d2c112

Fixing the expected values

63fdaed

Merge branch 'main' of github.com:elastic/elasticsearch into custom-inference-service-jon

95f05d2

Merge branch 'custom-inference-service-jon' of github.com:jonathan-buttner/elasticsearch into custom-inference-service-jon

e085040

jonathan-buttner merged commit 9db1837 into elastic:main May 29, 2025
18 checks passed

elasticsearchmachine added the backport pending label May 29, 2025

jonathan-buttner mentioned this pull request May 29, 2025

[8.19] Adding configurable inference service (#127939) #128644

Merged

jonathan-buttner deleted the custom-inference-service-jon branch May 30, 2025 14:56

jonathan-buttner changed the title ~~Adding configurable inference service~~ [ML] Adding configurable inference service May 30, 2025

jonathan-buttner mentioned this pull request May 30, 2025

[ML] Removing secure string wrapper for custom service #128698

Merged

This was referenced May 30, 2025

[ML] Flattening request map for custom service #128699

Merged

[ML] Remove error parsing functionality for custom service #128778

Merged
