[CI] CohereServiceMixedIT testCohereEmbeddings failing #127872

Closed
elasticsearchmachine opened this issue May 7, 2025 · 4 comments
Assignees: jonathan-buttner
Labels: :ml (Machine learning), needs:risk (Requires assignment of a risk label: low, medium, blocker), Team:ML (Meta label for the ML team), >test-failure (Triaged test failures from CI)

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented May 7, 2025

Build Scans:

Reproduction Line:

./gradlew ":x-pack:plugin:inference:qa:mixed-cluster:v8.19.0#javaRestTest" -Dtests.class="org.elasticsearch.xpack.inference.qa.mixed.CohereServiceMixedIT" -Dtests.method="testCohereEmbeddings" -Dtests.seed=3944E0D378F145C9 -Dtests.bwc=true -Dtests.locale=ar-JO -Dtests.timezone=Asia/Novosibirsk -Druntime.java=24

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:36815], URI [_inference/text_embedding/mixed-cluster-cohere-embeddings-float], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"resource_not_found_exception","reason":"Inference endpoint not found [mixed-cluster-cohere-embeddings-float]"}],"type":"resource_not_found_exception","reason":"Inference endpoint not found [mixed-cluster-cohere-embeddings-float]"},"status":404}

Issue Reasons:

  • [main] 3 consecutive failures in test testCohereEmbeddings
  • [main] 9 consecutive failures in step 8.19.0_bwc-snapshots
  • [main] 2 consecutive failures in pipeline elasticsearch-pull-request
  • [main] 10 failures in test testCohereEmbeddings (3.9% fail rate in 259 executions)
  • [main] 10 failures in step 8.19.0_bwc-snapshots (8.1% fail rate in 123 executions)
  • [main] 4 failures in pipeline elasticsearch-intake (12.9% fail rate in 31 executions)
  • [main] 5 failures in pipeline elasticsearch-pull-request (5.3% fail rate in 94 executions)
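
The fail rates listed above are consistent with failures divided by executions, expressed as a percentage with one decimal place. A minimal sketch that reproduces them follows; the class name `FailRate` and the half-up rounding mode are assumptions for illustration, since the issue does not say how the triage automation rounds.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class FailRate {
    // Percentage of failed executions, rounded half-up to one decimal place.
    // The rounding mode is an assumption; the triage tool's rule is not stated in the issue.
    static String percent(int failures, int executions) {
        return BigDecimal.valueOf(failures)
            .multiply(BigDecimal.valueOf(100))
            .divide(BigDecimal.valueOf(executions), 1, RoundingMode.HALF_UP)
            .toPlainString();
    }

    public static void main(String[] args) {
        // {failures, executions} pairs copied from the "Issue Reasons" list.
        int[][] rows = { {10, 259}, {10, 123}, {4, 31}, {5, 94} };
        for (int[] row : rows) {
            System.out.println(row[0] + "/" + row[1] + " = " + percent(row[0], row[1]) + "%");
        }
    }
}
```

`BigDecimal` is used instead of `String.format` so the output does not depend on the JVM locale, which matters when a run is pinned to a locale such as `ar-JO`.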

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the >test-failure and :ml labels May 7, 2025
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 2 consecutive failures in step 8.19.0_bwc-snapshots
  • [main] 3 failures in test testCohereEmbeddings (1.2% fail rate in 256 executions)
  • [main] 3 failures in step 8.19.0_bwc-snapshots (2.5% fail rate in 121 executions)
  • [main] 2 failures in pipeline elasticsearch-pull-request (2.1% fail rate in 96 executions)

Build Scans:

elasticsearchmachine added the Team:ML label May 7, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine added the needs:risk label May 7, 2025
jonathan-buttner self-assigned this May 8, 2025
@jonathan-buttner
Contributor

I think these failures are caused by a transport version mismatch, which should be fixed here: https://github.com/elastic/elasticsearch/pull/127779/files#diff-85e782e9e33a0f8ca8e99b41c17f9d04e3a7981d435abf44a3aa5d954a47cd8fR220

After that is merged, I'll do some testing and unmute the tests.
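
For readers unfamiliar with this failure class, here is a generic sketch of how a version gate mismatch can break communication between nodes in a mixed cluster. It is not the Elasticsearch code and not the specific bug addressed by the linked PR, which this issue does not describe. The class `VersionGateSketch`, the version ids, and the message layout are all hypothetical. The idea is that the writer and the reader each decide whether an optional field is on the wire by comparing the negotiated version against a gate, and if the two sides use different gates, the reader misparses the message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionGateSketch {
    // Hypothetical version ids, not real Elasticsearch transport versions.
    static final int OLD_GATE = 100;
    static final int NEW_GATE = 200;

    // Writes an endpoint id, plus an extra int field when wireVersion >= writerGate.
    static byte[] write(String endpointId, int extra, int wireVersion, int writerGate) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(endpointId);
            if (wireVersion >= writerGate) {
                out.writeInt(extra);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the message back and reports whether the bytes were consumed exactly.
    static boolean readsCleanly(byte[] data, int wireVersion, int readerGate) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();
            if (wireVersion >= readerGate) {
                in.readInt(); // throws EOFException if the writer skipped the field
            }
            return in.available() == 0; // leftover bytes mean the reader skipped a field
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int wireVersion = 150; // hypothetical negotiated version between the two gates
        byte[] message = write("example-endpoint", 7, wireVersion, OLD_GATE);
        System.out.println("same gate:      " + readsCleanly(message, wireVersion, OLD_GATE));
        System.out.println("different gate: " + readsCleanly(message, wireVersion, NEW_GATE));
    }
}
```

When both sides use the same gate, the message round-trips. When the gates differ, the reader either leaves bytes unread or runs past the end of the stream, which is the kind of disagreement that only shows up in mixed-version test suites.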

@jonathan-buttner
Contributor

PR to fix: #128000

2 participants