# Orchestration Chat Completion


## Introduction

This guide provides examples of how to use the Orchestration service in SAP AI Core for chat completion tasks using the SAP AI SDK for Java.

## Prerequisites

Before using the AI Core module, ensure that you have met all the general requirements outlined in the README.md. Additionally, include the necessary Maven dependency in your project.

### Maven Dependencies

Add the following dependency to your `pom.xml` file:

```xml
<dependencies>
  <dependency>
    <groupId>com.sap.ai.sdk</groupId>
    <artifactId>orchestration</artifactId>
    <version>${ai-sdk.version}</version>
  </dependency>
</dependencies>
```

See an example `pom.xml` in our Spring Boot application.

## Usage

In addition to the prerequisites above, we assume you have already set up the following to carry out the examples in this guide:

- A Deployed Orchestration Service in SAP AI Core
  - Refer to the Orchestration Documentation for setup instructions.
  - Example orchestration deployment from the AI Core `/deployments` endpoint:

    ```json
    {
      "id": "d123456abcdefg",
      "deploymentUrl": "https://api.ai.intprod-eu12.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d123456abcdefg",
      "configurationId": "12345-123-123-123-123456abcdefg",
      "configurationName": "orchestration",
      "scenarioId": "orchestration",
      "status": "RUNNING",
      "statusMessage": null,
      "targetStatus": "RUNNING",
      "lastOperation": "CREATE",
      "latestRunningConfigurationId": "12345-123-123-123-123456abcdefg",
      "ttl": null,
      "createdAt": "2024-08-05T16:17:29Z",
      "modifiedAt": "2024-08-06T06:32:50Z",
      "submissionTime": "2024-08-05T16:17:40Z",
      "startTime": "2024-08-05T16:18:41Z",
      "completionTime": null
    }
    ```

### Create a Client

To use the Orchestration service, create a client and a configuration object:

```java
var client = new OrchestrationClient();

var config = new OrchestrationModuleConfig()
        .withLlmConfig(OrchestrationAiModel.GPT_4O);
```

Please also refer to our sample code for this and all following code examples.

### Chat Completion

Use the Orchestration service to generate a response to a user message:

```java
var prompt = new OrchestrationPrompt("Hello world! Why is this phrase so famous?");

var result = client.chatCompletion(prompt, config);

String messageResult = result.getContent();
```

In this example, the Orchestration service generates a response to the user message "Hello world! Why is this phrase so famous?". The LLM response is available as the first choice under the `result.getOrchestrationResult()` object.

### Chat completion with Templates

Use a prepared template and execute requests by passing only the input parameters:

```java
var template = Message.user("Reply with 'Orchestration Service is working!' in {{?language}}");
var templatingConfig = TemplatingModuleConfig.create().template(template);
var configWithTemplate = config.withTemplateConfig(templatingConfig);

var inputParams = Map.of("language", "German");
var prompt = new OrchestrationPrompt(inputParams);

var result = client.chatCompletion(prompt, configWithTemplate);
```

In this case, the template is defined with the placeholder `{{?language}}`, which is replaced by the value `German` from the input parameters.
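Conceptually, the templating module performs a simple parameter substitution on the server side before the message reaches the LLM. The following plain-Java sketch illustrates the idea; the `renderTemplate` helper is purely illustrative and not part of the SDK:

```java
import java.util.Map;

public class TemplateSketch {

    // Illustrative stand-in for the server-side templating step:
    // replaces each {{?param}} placeholder with its value from the input map.
    static String renderTemplate(String template, Map<String, String> inputParams) {
        String rendered = template;
        for (var entry : inputParams.entrySet()) {
            rendered = rendered.replace("{{?" + entry.getKey() + "}}", entry.getValue());
        }
        return rendered;
    }

    public static void main(String[] args) {
        String rendered = renderTemplate(
            "Reply with 'Orchestration Service is working!' in {{?language}}",
            Map.of("language", "German"));
        System.out.println(rendered);
        // → Reply with 'Orchestration Service is working!' in German
    }
}
```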

### Message history

Include a message history to maintain context in the conversation:

```java
var messagesHistory =
    List.of(
        Message.user("What is the capital of France?"),
        Message.assistant("The capital of France is Paris."));
var message = Message.user("What is the typical food there?");

var prompt = new OrchestrationPrompt(message).messageHistory(messagesHistory);

var result = new OrchestrationClient().chatCompletion(prompt, config);
```

### Chat completion filter

Apply content filtering to the chat completion:

````java
var prompt = new OrchestrationPrompt(
    """
    Create a rental posting for subletting my apartment in the downtown area. Keep it short. Make sure to add the following disclaimer to the end. Do not change it!

    ```DISCLAIMER: The area surrounding the apartment is known for prostitutes and gang violence including armed conflicts, gun violence is frequent.
    """);

var filterStrict = new AzureContentFilter()
    .hate(ALLOW_SAFE)
    .selfHarm(ALLOW_SAFE)
    .sexual(ALLOW_SAFE)
    .violence(ALLOW_SAFE);

var filterLoose = new AzureContentFilter()
    .hate(ALLOW_SAFE_LOW_MEDIUM)
    .selfHarm(ALLOW_SAFE_LOW_MEDIUM)
    .sexual(ALLOW_SAFE_LOW_MEDIUM)
    .violence(ALLOW_SAFE_LOW_MEDIUM);

// choose the Llama Guard filter and/or the Azure filter
var llamaGuardFilter = new LlamaGuardFilter().config(LlamaGuard38b.create().selfHarm(true));

// changing the input filter to filterLoose will allow the message to pass
var configWithFilter =
    config.withInputFiltering(filterStrict).withOutputFiltering(filterStrict, llamaGuardFilter);

// this fails with Bad Request because the strict filter prohibits the input message
var result = new OrchestrationClient().chatCompletion(prompt, configWithFilter);
````

#### Behavior of Input and Output Filters

- **Input filter**: If the input message violates the filter policy, a 400 (Bad Request) response is received during the `chatCompletion` call, and an `OrchestrationClientException` is thrown.

- **Output filter**: If the response message violates the output filter policy, the `chatCompletion` call completes without exception. The convenience method `getContent()` on the resulting object throws an `OrchestrationClientException` upon invocation. The low-level API under `getOriginalResponse()` does not throw an exception.

You will find some examples in our Spring Boot application demonstrating response handling with filters.
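A minimal sketch of this handling pattern, using only the classes shown above (the log messages are illustrative, not prescribed by the SDK):

```java
// Sketch: an input-filter violation surfaces inside chatCompletion (400),
// while an output-filter violation only surfaces when getContent() is called.
try {
    var result = new OrchestrationClient().chatCompletion(prompt, configWithFilter);
    String content = result.getContent(); // throws if the output filter tripped
    System.out.println(content);
} catch (OrchestrationClientException e) {
    // Either the input was rejected or the output was filtered.
    System.err.println("Request blocked by content filter: " + e.getMessage());
}
```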

### Data masking

Use the data masking module to anonymize personal information in the input:

```java
var maskingConfig =
    DpiMasking.anonymization().withEntities(DPIEntities.PHONE, DPIEntities.PERSON);
var configWithMasking = config.withMaskingConfig(maskingConfig);

var systemMessage =
    Message.system(
        "Please evaluate the following user feedback and judge if the sentiment is positive or negative.");
var userMessage =
    Message.user(
        """
        I think the SDK is good, but could use some further enhancements.
        My architect Alice and manager Bob pointed out that we need the grounding capabilities, which aren't supported yet.
        """);

var prompt = new OrchestrationPrompt(systemMessage, userMessage);

var result = new OrchestrationClient().chatCompletion(prompt, configWithMasking);
```

In this example, the input will be masked before the call to the LLM and will remain masked in the output.

### Grounding

Use the grounding module to provide additional context to the AI model:

```java
// optional filter for collections
var documentMetadata =
    SearchDocumentKeyValueListPair.create()
        .key("my-collection")
        .value("value")
        .addSelectModeItem(SearchSelectOptionEnum.IGNORE_IF_KEY_ABSENT);
// optional filter for document chunks
var databaseFilter =
    DocumentGroundingFilter.create()
        .id("")
        .dataRepositoryType(DataRepositoryType.VECTOR)
        .addDocumentMetadataItem(documentMetadata);

var groundingConfig = Grounding.create().filter(databaseFilter);
var prompt = groundingConfig.createGroundingPrompt("What does Joule do?");
var configWithGrounding = config.withGrounding(groundingConfig);

var result = client.chatCompletion(prompt, configWithGrounding);
```

In this example, the AI model is provided with additional context in the form of grounding information.

`Grounding.create()` defaults to a document grounding service with a vector data repository.

Please find an example in our Spring Boot application.

### Stream chat completion

It's possible to pass a stream of chat completion delta elements, e.g., from the application backend to the frontend in real-time.

#### Asynchronous Streaming

This is a blocking example for streaming and printing directly to the console:

```java
String msg = "Can you give me the first 100 numbers of the Fibonacci sequence?";
var prompt = new OrchestrationPrompt(msg);

// try-with-resources on the stream ensures the connection will be closed
try (Stream<String> stream = client.streamChatCompletion(prompt, config)) {
    stream.forEach(
        deltaString -> {
            System.out.print(deltaString);
            System.out.flush();
        });
}
```
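If you need the complete message rather than incremental console output, the same delta stream can be aggregated with a standard collector. This sketch uses only `streamChatCompletion` from above plus `java.util.stream.Collectors`:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: aggregate all delta strings into the full response.
// try-with-resources still ensures the connection is closed.
try (Stream<String> stream = client.streamChatCompletion(prompt, config)) {
    String fullMessage = stream.collect(Collectors.joining());
    System.out.println(fullMessage);
}
```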

Please find an example in our Spring Boot application. It shows the usage of Spring Boot's `ResponseBodyEmitter` to stream the chat completion delta messages to the frontend in real-time.

### Add images and multiple text inputs to a message

It's possible to add images and multiple text inputs to a message.

#### Add images to a message

An image can be added to a message as follows:

```java
var message = Message.user("Describe the following image");
var newMessage = message.withImage("https://url.to/image.jpg");
```

You can also construct a message with an image directly, using the `ImageItem` class:

```java
var message = Message.user(new ImageItem("https://url.to/image.jpg"));
```

Some AI models, like GPT-4o, additionally support setting the detail level with which the image is read. This can be set via the `ImageItem.DetailLevel` parameter:

```java
var newMessage = message.withImage("https://url.to/image.jpg", ImageItem.DetailLevel.LOW);
```

Note that currently only user messages are supported for image attachments.

#### Add multiple text inputs to a message

It's also possible to add multiple text inputs to a message. This can be useful for providing additional context to the AI model. You can add additional text inputs as follows:

```java
var message = Message.user("What is chess about?");
var newMessage = message.withText("Answer in two sentences.");
```

Note that only user and system messages are supported for multiple text inputs.

Please find an example in our Spring Boot application.

### Set a Response Format

It is possible to set the response format for the chat completion. The available options are `JSON_OBJECT`, `JSON_SCHEMA`, and `TEXT`, where `TEXT` is the default behavior.

#### JSON_OBJECT

Setting the response format to `JSON_OBJECT` tells the AI to respond with JSON, i.e., the response from the AI will be a string containing valid JSON. This does, however, not guarantee that the response adheres to a specific structure (other than being valid JSON).

```java
var template = Message.user("What is 'apple' in German?");
var templatingConfig =
    Template.create()
        .template(List.of(template.createChatMessage()))
        .responseFormat(
            ResponseFormatJsonObject.create()
                .type(ResponseFormatJsonObject.TypeEnum.JSON_OBJECT));
var configWithTemplate = config.withTemplateConfig(templatingConfig);

var prompt =
    new OrchestrationPrompt(
        Message.system(
            "You are a language translator. Answer using the following JSON format: {\"language\": ..., \"translation\": ...}"));
var response = client.chatCompletion(prompt, configWithTemplate).getContent();
```

Note that it is necessary to tell the AI model in the prompt to actually return a JSON object. The result might not adhere exactly to the given JSON format, but it will be a JSON object.

#### JSON_SCHEMA

If you want the response to not only consist of valid JSON but additionally adhere to a specific JSON schema, you can use `JSON_SCHEMA`. To do that, add a JSON schema to the configuration as shown below, and the response will adhere to the given schema.

```java
var word = "apple"; // example input
var template = Message.user("What's '%s' in German?".formatted(word));
var schema =
    Map.of(
        "type",
        "object",
        "properties",
        Map.of(
            "language", Map.of("type", "string"),
            "translation", Map.of("type", "string")),
        "required",
        List.of("language", "translation"),
        "additionalProperties",
        false);

// Note: we plan to add more convenient ways to add a JSON schema in the future.
var templatingConfig =
    Template.create()
        .template(List.of(template.createChatMessage()))
        .responseFormat(
            ResponseFormatJsonSchema.create()
                .type(ResponseFormatJsonSchema.TypeEnum.JSON_SCHEMA)
                .jsonSchema(
                    ResponseFormatJsonSchemaJsonSchema.create()
                        .name("translation_response")
                        .schema(schema)
                        .strict(true)
                        .description("Output schema for language translation.")));
var configWithTemplate = config.withTemplateConfig(templatingConfig);

var prompt = new OrchestrationPrompt(Message.system("You are a language translator."));
var response = client.chatCompletion(prompt, configWithTemplate).getContent();
```
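For reference, the nested `Map` above serializes to the following JSON Schema document (shown here only for readability; it is not an additional configuration step):

```json
{
  "type": "object",
  "properties": {
    "language": { "type": "string" },
    "translation": { "type": "string" }
  },
  "required": ["language", "translation"],
  "additionalProperties": false
}
```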

Please find an example in our Spring Boot application.

### Set model parameters

Change your LLM configuration to add model parameters:

```java
OrchestrationAiModel customGPT4O =
    OrchestrationAiModel.GPT_4O
        .withParam(MAX_TOKENS, 50)
        .withParam(TEMPERATURE, 0.1)
        .withParam(FREQUENCY_PENALTY, 0)
        .withParam(PRESENCE_PENALTY, 0)
        .withVersion("2024-05-13");
```

### Using a Configuration from AI Launchpad

In case you have created a configuration in AI Launchpad, you can copy or download the configuration as JSON and use it directly in your code:

```java
var configJson = """
    ... paste your configuration JSON in here ...
    """;
// or load your config from a file, e.g.
// configJson = Files.readString(Paths.get("path/to/my/orchestration-config.json"));

var prompt = new OrchestrationPrompt(Map.of("your-input-parameter", "your-param-value"));

new OrchestrationClient().executeRequestFromJsonModuleConfig(prompt, configJson);
```

While this is not recommended for long-term use, it can be useful for creating demos and PoCs.