Support building IVF_FLAT/PQ/SQ vector indexes in a distributed manner #4723

@chenghao-guo

Sub-issue of: #4155

Proposal: Distributed Vector Index (CPU-based IVF)

  • Question: Is there any ongoing or existing work on distributed vector index building?
  • Context:
    • Current vector index builds rely on CUDA for acceleration.
    • Single-node index builds still have high resource requirements (e.g., transient disk usage during build), which may also impact distributed scenarios.
  • Plan:
    • I intend to begin with a distributed IVF index implementation targeted at CPU-only nodes.
    • The design will draw on approaches from OpenSearch’s KNN and other disk-based ANN systems for building and managing IVF in a distributed setting.
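
To make the plan above concrete, here is a minimal sketch of one way a CPU-only distributed IVF_FLAT build could be split up: train the centroids once on a sample, let each worker assign its own shard of vectors to the nearest centroid, then merge the per-shard inverted lists. This is an illustration under assumptions, not the project's API or the eventual design. All function names (`train_centroids`, `build_shard_lists`, `merge_lists`) are hypothetical, the k-means is a plain Lloyd's loop in pure Python, and the shards are in-memory lists standing in for data fragments on separate nodes.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def train_centroids(sample, k, iters=10, seed=0):
    """Hypothetical coordinator step: Lloyd's k-means on a sample of vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(sample, k)]
    for _ in range(iters):
        groups = defaultdict(list)
        for v in sample:
            groups[nearest(centroids, v)].append(v)
        # Centroids that attracted no vectors keep their previous position.
        for i, members in groups.items():
            centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_shard_lists(centroids, shard):
    """Hypothetical worker step: map each (row_id, vector) to an IVF partition."""
    lists = defaultdict(list)
    for row_id, v in shard:
        lists[nearest(centroids, v)].append(row_id)
    return lists


def merge_lists(per_shard):
    """Hypothetical coordinator step: concatenate inverted lists per partition."""
    merged = defaultdict(list)
    for lists in per_shard:
        for part, ids in lists.items():
            merged[part].extend(ids)
    return {part: sorted(ids) for part, ids in merged.items()}


# Toy data: even row ids cluster near (0, 0), odd row ids near (5, 5).
rng = random.Random(42)
data = [(i, [rng.gauss(c, 0.1), rng.gauss(c, 0.1)])
        for i, c in enumerate([0.0, 5.0] * 20)]
shards = [data[:20], data[20:]]  # stand-ins for two worker nodes

centroids = train_centroids([v for _, v in data], k=2)
merged = merge_lists(build_shard_lists(centroids, s) for s in shards)
single_node = merge_lists([build_shard_lists(centroids, data)])
```

Because every worker uses the same centroids, the merged result matches what a single-node assignment over all rows would produce (`merged == single_node` here). The open design questions sit around this skeleton: how the training sample is drawn across nodes, and how the per-shard lists are written and merged without the transient disk usage noted above.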

Next steps

  • @yanghua and I will conduct research and begin a prototype over the next few months.
  • We will share design notes and open a PR once we have a concrete proposal or initial implementation for community feedback.
