@abeglova abeglova commented Nov 20, 2025

What are the relevant tickets?

closes https://github.com/mitodl/hq/issues/9148

Description (What does it do?)

This PR updates the hybrid search index in OpenSearch to reuse vectors from Qdrant. Hybrid search is currently behind a flag (search_mode=hybrid).

How can this be tested?

Set OPENAI_API_KEY to the value from rc and QDRANT_ENCODER=vector_search.encoders.litellm.LiteLLMEncoder

Run docker-compose run web ./manage.py generate_embeddings --skip-contentfiles to ensure you have resource embeddings in Qdrant

Run docker-compose run web ./manage.py recreate_index --combined_hybrid

Go to http://open.odl.local:8062/search

Search for "intro to ai course"; you should get no results. With search_mode not set, search should behave normally

Select search_mode=hybrid from the admin options panel (login required) or just add it to the URL

Search for "intro to ai course" again; you should now get results. Facets and base search should also still work
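
For anyone who prefers to build the two test URLs rather than click through the admin panel, here is a minimal sketch. The base URL and the search_mode=hybrid flag come from the steps above; the "q" query parameter name is an assumption and should be checked against what the search page actually puts in the address bar.

```python
from urllib.parse import urlencode

# Base URL taken from the testing steps above.
SEARCH_URL = "http://open.odl.local:8062/search"


def build_search_url(query, search_mode=None):
    """Build a search page URL, optionally with the hybrid flag set."""
    # ASSUMPTION: "q" is the query-string key for the search text.
    params = {"q": query}
    if search_mode is not None:
        params["search_mode"] = search_mode
    return f"{SEARCH_URL}?{urlencode(params)}"


# Default mode: expected to return no results for this query.
default_url = build_search_url("intro to ai course")
# Hybrid mode: expected to return results.
hybrid_url = build_search_url("intro to ai course", search_mode="hybrid")
```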

github-actions bot commented Nov 20, 2025

OpenAPI Changes

No detectable change.

Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

@abeglova abeglova force-pushed the ab/use-qdrant-vector branch 2 times, most recently from 11eb260 to 2bca7f2 Compare November 20, 2025 16:36
@abeglova abeglova marked this pull request as ready for review November 20, 2025 17:35
@abeglova abeglova marked this pull request as draft November 20, 2025 18:17
@abeglova abeglova marked this pull request as ready for review November 20, 2025 20:27
@shanbady shanbady self-requested a review November 21, 2025 14:45
@shanbady shanbady left a comment

Functionally it works great! Just some minor cleanup suggestions/comments.

}


def get_vector_for_learning_resource(readable_id):

Might be cleaner to add "with_vectors" as an optional parameter to retrieve_points_matching_params and use that:

retrieve_points_matching_params({"readable_id": "course-234"}, with_vectors=True)[0]
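
A rough sketch of what that suggestion could look like. The real signature and return type of retrieve_points_matching_params are not shown in this conversation, so everything here is an assumption: the Point class and the in-memory _COLLECTION are hypothetical stand-ins for Qdrant records, used only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Point:
    """Hypothetical, simplified stand-in for a Qdrant point record."""
    payload: dict[str, Any]
    vector: Optional[list[float]] = None


# Hypothetical in-memory collection used instead of a live Qdrant client.
_COLLECTION = [
    Point({"readable_id": "course-234"}, [0.1, 0.2, 0.3]),
    Point({"readable_id": "course-999"}, [0.4, 0.5, 0.6]),
]


def retrieve_points_matching_params(params, *, with_vectors=False):
    """Return points whose payload matches every key/value in params.

    Vectors are only included when with_vectors=True, so existing
    callers that do not need them keep the current behaviour.
    """
    matches = [
        point
        for point in _COLLECTION
        if all(point.payload.get(key) == value for key, value in params.items())
    ]
    if with_vectors:
        return matches
    return [Point(point.payload) for point in matches]


def get_vector_for_learning_resource(readable_id):
    """Look up the stored vector for one resource, or None if absent."""
    points = retrieve_points_matching_params(
        {"readable_id": readable_id}, with_vectors=True
    )
    return points[0].vector if points else None
```

Returning None for a missing resource avoids the IndexError that a bare [0] would raise when nothing matches.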

}
encoder = dense_encoder()
query_vector = encoder.embed_query(text)
vector_query = {"knn": {"vector_embedding": {"vector": query_vector, "k": 5}}}

Might be good to have "k" and "pagination_depth" defined in constants or in settings.py (or omitted altogether if they default to something reasonable).
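
One possible shape for this, as a sketch only. The constant names HYBRID_KNN_K and HYBRID_PAGINATION_DEPTH and the helper build_hybrid_query are hypothetical; k=5 mirrors the value in the diff, while the pagination_depth value is a placeholder because the actual value is not visible here. The order of the sub-queries is also an assumption.

```python
# Hypothetical names; in the project these could live in settings.py
# or a constants module.
HYBRID_KNN_K = 5  # matches the "k" shown in the diff
HYBRID_PAGINATION_DEPTH = 10  # placeholder value, not taken from the PR


def build_hybrid_query(
    text_query,
    query_vector,
    k=HYBRID_KNN_K,
    pagination_depth=HYBRID_PAGINATION_DEPTH,
):
    """Combine a text query and a k-NN query into one hybrid query body."""
    vector_query = {
        "knn": {"vector_embedding": {"vector": query_vector, "k": k}}
    }
    return {
        "hybrid": {
            "pagination_depth": pagination_depth,
            # ASSUMPTION: the text query comes first, then the vector query.
            "queries": [text_query, vector_query],
        }
    }


example = build_hybrid_query(
    {"match": {"title": "intro to ai course"}}, [0.1, 0.2, 0.3]
)
```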

"combination": {
"technique": "arithmetic_mean",
"parameters": {"weights": [0.6, 0.2, 0.2]},
"parameters": {"weights": [0.8, 0.2]},

These also seem like magic numbers we may tweak often. Consider moving them to settings or a constant. Since all of search_pipeline is static JSON, we could do something like:

OPENSEARCH_HYBRID_PIPELINE_CONFIGURATION = {...}
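
Expanded, such a setting might look like the sketch below. The "combination" block copies the technique and weights from the diff; the surrounding structure follows the general layout of an OpenSearch normalization-processor search pipeline, and the "min_max" normalization technique and the description text are assumptions, since the rest of search_pipeline is not shown. The two helpers are hypothetical and only illustrate a cheap sanity check on the weights.

```python
import math

# Sketch of the pipeline body as a single settings constant.
OPENSEARCH_HYBRID_PIPELINE_CONFIGURATION = {
    "description": "Score normalization for hybrid search",  # assumed text
    "phase_results_processors": [
        {
            "normalization-processor": {
                # ASSUMPTION: the normalization technique is not in the diff.
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    # One weight per sub-query, taken from the diff.
                    "parameters": {"weights": [0.8, 0.2]},
                },
            }
        }
    ],
}


def combination_weights(config):
    """Pull the combination weights out of the pipeline configuration."""
    processor = config["phase_results_processors"][0]["normalization-processor"]
    return processor["combination"]["parameters"]["weights"]


def weights_are_normalized(config):
    """Check that the weights sum to 1.0, guarding against typos when tuning."""
    return math.isclose(sum(combination_weights(config)), 1.0)
```

Keeping the whole body in one constant means tuning the weights is a settings change rather than an edit inside the indexing code.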

}
}

if object_type == COMBINED_INDEX:

IMO the naming of these indexes is getting a bit confusing (constants.COMBINED_INDEX vs constants.ALL_INDEXES vs constants.BOTH_INDEXES). Might be better to call this HYBRID_INDEX or HYBRID_COMBINED_INDEX, etc.

@shanbady shanbady left a comment

LGTM 👍

@abeglova abeglova force-pushed the ab/use-qdrant-vector branch from 0a3af13 to 3586b9e Compare November 21, 2025 21:38
@abeglova abeglova merged commit 3c35312 into main Nov 21, 2025
13 checks passed
@abeglova abeglova deleted the ab/use-qdrant-vector branch November 21, 2025 21:51
@odlbot odlbot mentioned this pull request Nov 24, 2025
11 tasks