-
Notifications
You must be signed in to change notification settings - Fork 25.3k
[ML] Adding configurable inference service #127939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Adding configurable inference service #127939
Conversation
Hi @jonathan-buttner, I've created a changelog YAML for you. |
…ttner/elasticsearch into custom-inference-service-jon
@@ -36,7 +36,7 @@ public abstract class BaseResponseHandler implements ResponseHandler { | |||
public static final String METHOD_NOT_ALLOWED = "Received a method not allowed status code"; | |||
|
|||
protected final String requestType; | |||
private final ResponseParser parseFunction; | |||
protected final ResponseParser parseFunction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this available so the custom response handler can immediately return on a parse failure instead of retrying.
private static final LazyInitializable<InferenceServiceConfiguration, RuntimeException> configuration = new LazyInitializable<>( | ||
() -> { | ||
var configurationMap = new HashMap<String, SettingsConfiguration>(); | ||
// TODO revisit this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to create some more complex configuration types to support the fields (like maps, lists of lists etc). Maybe for now we don't expose this in the services API?
|
||
Map<String, Object> headers = extractOptionalMap(map, HEADERS, ModelConfigurations.SERVICE_SETTINGS, validationException); | ||
removeNullValues(headers); | ||
var stringHeaders = validateMapStringValues(headers, HEADERS, validationException, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should limit the values in the header map to only strings.
removeNullValues(parameters); | ||
validateMapValues( | ||
parameters, | ||
List.of(String.class, Integer.class, Double.class, Float.class, Boolean.class), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restricting the task settings to these types (no nested fields aka maps or lists).
public static final String QUERY_PARAMETERS = "query_parameters"; | ||
|
||
public static QueryParameters fromMap(Map<String, Object> map, ValidationException validationException) { | ||
List<Tuple<String, String>> queryParams = extractOptionalListOfStringTuples( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query parameters can have duplicate keys which is why I'm not using a map here.
uri = buildUri(); | ||
} | ||
|
||
private static void addStringParams(Map<String, String> stringParams, Map<String, ?> paramsToAdd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fields like the url, query parameters, and headers should not have their values converted to json format. This only accepts strings and doesn't manipulate them.
} | ||
} | ||
|
||
private static void addJsonStringParams(Map<String, String> jsonStringParams, Map<String, ?> params) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fields like the request body need to be a valid json object so we'll convert the values into json
import java.io.IOException; | ||
import java.util.Objects; | ||
|
||
public class SerializableSecureString implements ToXContentFragment, Writeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to serialize the api key or some secrets to the body of a request this class will make that process a little easier by implementing toXContent()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this class adds or removes complexity. CustomSecretSettings
could easily handle the streaming of a secure string and the toXContent parts. Perhaps this is a naming issue as SerializableSecureString
is not a type of SecureString
it's a wrapper around one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed here, I was able to remove the class and use SecureString
directly: #128698
import static org.hamcrest.Matchers.is; | ||
import static org.mockito.Mockito.mock; | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is an attempt to push a lot of the duplicate logic in the inference service tests into a central place. If we create more services we should leverage this base class to remove the copy/paste.
// swallow the error | ||
var resultAsString = new String(httpResult.body(), StandardCharsets.UTF_8); | ||
|
||
logger.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into a scenario where azure openai didn't return a json response. Normally if that happens we'd swallow the parse error but wouldn't report anything useful back. With this change we'll log the parse failure. The error parsing logic should only be called if we receive a failure status code. If many requests fail, and we are unable to parse the error we could log many errors.
…ttner/elasticsearch into custom-inference-service-jon
@elasticmachine merge upstream |
There are no new commits on the base branch. |
Pinging @elastic/ml-core (Team:ML) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
.../plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/ServiceUtils.java
Outdated
Show resolved
Hide resolved
import java.io.IOException; | ||
import java.util.Objects; | ||
|
||
public class SerializableSecureString implements ToXContentFragment, Writeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this class adds or removes complexity. CustomSecretSettings
could easily handle the streaming of a secure string and the toXContent parts. Perhaps this is a naming issue as SerializableSecureString
is not a type of SecureString
it's a wrapper around one
* To use this class, extend it and pass the constructor a configuration. | ||
* </p> | ||
*/ | ||
public abstract class AbstractServiceTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public abstract class AbstractServiceTests extends ESTestCase { | |
public abstract class AbstractInferenceServiceTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Embedding ServicesI successfully created embedding services for Cohere and VoyageAI, it was actually quite simple to get something working but I have a few questions/suggestions.
|
} | ||
|
||
try { | ||
var request = new CustomRequest(query, input, model); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EmbeddingsInput
has a inputType
parameter that should be passed to the CustomRequest
so that is can be replaced in the request body. Same for the topN
and returnDocuments
options in QueryAndDocsInputs
, these could all be passed to the CustomRequest
constructor as a loose map
Rerank
|
Sparse Embdding
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Dave and I chatted, I'll address the feedback in followup PRs. I added a feature flag to exclude the current functionality from production. |
…nference-service-jon
…ttner/elasticsearch into custom-inference-service-jon
…nference-service-jon
…ttner/elasticsearch into custom-inference-service-jon
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
* Inference changes * Custom service fixes * Update docs/changelog/127939.yaml * Cleaning up from failed merge * Fixing changelog * [CI] Auto commit changes from spotless * Fixing test * Adding feature flag * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]> (cherry picked from commit 9db1837) # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # test/test-clusters/src/main/java/org/elasticsearch/test/cluster/FeatureFlag.java
I'll track addressing the feedback using this comment Dave's feedback:
|
* Inference changes * Custom service fixes * Update docs/changelog/127939.yaml * Cleaning up from failed merge * Fixing changelog * [CI] Auto commit changes from spotless * Fixing test * Adding feature flag * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>
* Inference changes * Custom service fixes * Update docs/changelog/127939.yaml * Cleaning up from failed merge * Fixing changelog * [CI] Auto commit changes from spotless * Fixing test * Adding feature flag * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>
Taking the ideas and commits from #124299
Notable changes from initial PR:
path
andmethod
nestingurl
fieldquery_string
and converted it to a list of tuples, to leveragedescription
andversion
as they weren't usedsparse_result
andvalue
fieldsresponse.error_parser
to indicate the location to find the error message fieldpath
path
field to tell it where to find that nested mapformat
field that specifies how the response is structure (elser's structure is an array of maps, where the key is the token id and the value is the weight, this parser expects the map to have a token id field and a weight field)Add Custom Model support to Inference API.
You can use this Inference API to invoke models that support the HTTP format.
Inference Endpoint Creation:
Endpoint creation
Support task_type
Parameter Description
Parameter Description
secret_parameters
: secret parameters like api_key can be defined here.headers
(optional):https' header parametersrequest.content
: The body structure of the request requires passing in the string-escaped result of the JSON format HTTP request body.NOTE: Unfortunately, if we aren't using kibana the content string needs to be a single line
response.json_parser
: We need to parse the returned response into an object that Elasticsearch can recognize.(TextEmbeddingFloatResults, SparseEmbeddingResults, RankedDocsResults, ChatCompletionResults)Therefore, we use jsonPath syntax to parse the necessary content from the response.
(For the text_embedding type, we need a
List<List<Float>>
object. The same applies to other types.)Different task types have different json_parser parameters.
response.error_parser
: Since each 3rd party service can have its own error response format we'll need the user to give us the location to retrieve the base error message. For example, openai's error structure is here: https://platform.openai.com/docs/api-reference/realtime-server-events/error. We'd want to extract themessage
field. An example of that might look like:task_settings.parameters
: Due to the limitations of the inference framework, if the model requires more parameters to be configured, they can be set in task_settings.parameters. These parameters can be placed in the request.body as placeholders and replaced with the configured values when constructing the request.Testing
🚧 In progress
Jon Testing
OpenAI
Texting Embedding
Cohere
Rerank
Azure OpenAI
Alibaba Testing
we use Alibaba Cloud AI Search Model for example,
Please replace the value of
secret_parameters.api_key
with your api_key.text_embedding
sparse_embedding
rerank