feat(volc_mysql): add VolcMySQL backend with HNSW vector index support #804

Open
FishMage wants to merge 1 commit into zilliztech:main from FishMage:volc_mysql
@FishMage

Add a vector database backend for VolcMySQL (Volcano Engine MySQL with native VECTOR type and HNSW vector index), connecting via mysql-connector-python over the MySQL wire protocol.

Components:

  • volc_mysql.py: VectorDB implementation with VECTOR(dim) table, bulk LOAD DATA LOCAL INFILE loading, and CREATE VECTOR INDEX via SECONDARY_ENGINE_ATTRIBUTE (algorithm/distance/m/ef_construction and optional quant_algorithm/quant_type/refine_type for SQ/PQ/RaBitQ)
  • config.py: DBConfig (host/port/user/password) and VolcMySQLHNSWConfig (m/ef_search/ef_construction + quantization params)
  • cli.py: Click command VolcMySQLHNSW
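To make the index parameters concrete, here is a minimal sketch of how the SECONDARY_ENGINE_ATTRIBUTE payload might be assembled. The key names come from the list above; the helper name `build_index_attribute`, the JSON encoding of the payload, and the example values (`"hnsw"`, `"cosine"`, `"sq"`) are assumptions for illustration and are not taken from the code in this PR.

```python
import json
from typing import Optional


def build_index_attribute(
    distance: str,
    m: int,
    ef_construction: int,
    quant_algorithm: Optional[str] = None,
    quant_type: Optional[str] = None,
    refine_type: Optional[str] = None,
) -> str:
    """Serialize HNSW index options into a JSON string (hypothetical helper)."""
    # Required keys, named as in the PR description.
    attrs = {
        "algorithm": "hnsw",  # assumed literal value
        "distance": distance,
        "m": m,
        "ef_construction": ef_construction,
    }
    # Quantization keys are optional and only emitted when set.
    optional = {
        "quant_algorithm": quant_algorithm,
        "quant_type": quant_type,
        "refine_type": refine_type,
    }
    attrs.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(attrs, sort_keys=True)


if __name__ == "__main__":
    print(build_index_attribute("cosine", m=16, ef_construction=200))
    print(build_index_attribute("cosine", m=16, ef_construction=200, quant_algorithm="sq"))
```

Leaving unset quantization keys out of the payload, rather than sending nulls, keeps the unquantized case identical to a plain HNSW definition.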

Registration:

  • Add VolcMySQL to the DB enum in backend/clients/__init__.py with lazy imports for init_cls, config_cls, and case_config_cls
  • Register VolcMySQLHNSW CLI command in cli/vectordbbench.py
  • Add volc_mysql optional dependency in pyproject.toml
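The lazy-import registration listed above can be sketched roughly as follows. This is only an illustration of the pattern: the `_LAZY_TARGETS` table and the `_load` helper are hypothetical, and standard-library classes stand in for the real backend classes so that the snippet runs without the optional dependency installed.

```python
import importlib
from enum import Enum

# Stand-in targets (assumption): in the real project these would point at the
# VolcMySQL backend, config, and case-config classes instead of stdlib classes.
_LAZY_TARGETS = {
    "VolcMySQL": {
        "init_cls": ("collections", "OrderedDict"),
        "config_cls": ("collections", "ChainMap"),
        "case_config_cls": ("collections", "Counter"),
    },
}


def _load(db_value: str, kind: str):
    """Import the target module only at the moment the class is requested."""
    module_name, attr = _LAZY_TARGETS[db_value][kind]
    return getattr(importlib.import_module(module_name), attr)


class DB(Enum):
    VolcMySQL = "VolcMySQL"

    @property
    def init_cls(self):
        return _load(self.value, "init_cls")

    @property
    def config_cls(self):
        return _load(self.value, "config_cls")

    @property
    def case_config_cls(self):
        return _load(self.value, "case_config_cls")


if __name__ == "__main__":
    print(DB.VolcMySQL.init_cls)
```

Deferring the import to property access means that users who never select this backend do not need its optional dependency installed.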

Binary VECTOR path with auto-probe fallback:

  • Vectors are sent as raw little-endian float32 bytes (UNHEX on load, the _binary introducer on query), avoiding to_vector() text parsing and Python str() formatting; _binary literals stay constant-foldable so the HNSW index scan is preserved
  • init() probes once per connection (session-local TEMPORARY TABLE) whether the server accepts the binary path and transparently falls back to the to_vector() text path -- for both insert and query -- when it does not. VDB_BINARY_VEC overrides the probe (1=force binary, 0=force text)
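As a rough illustration of the two encodings described above, the following sketch packs a vector as little-endian float32 bytes, renders those bytes as hex (the form an UNHEX call would consume), and also produces a bracketed text form. The function names and the exact text layout expected by to_vector() are assumptions; only the little-endian float32 layout is taken from the description.

```python
import struct
from typing import Sequence


def encode_vector_binary(vec: Sequence[float]) -> bytes:
    """Pack a vector as raw little-endian float32 bytes (4 bytes per dimension)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def encode_vector_hex(vec: Sequence[float]) -> str:
    """Hex form of the binary encoding, suitable as an argument to UNHEX()."""
    return encode_vector_binary(vec).hex().upper()


def encode_vector_text(vec: Sequence[float]) -> str:
    """Bracketed text form (assumed format for the to_vector() fallback path)."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


if __name__ == "__main__":
    v = [1.0, 2.0]
    print(encode_vector_hex(v))   # 0000803F00000040
    print(encode_vector_text(v))  # [1.0,2.0]
```

The binary form has a fixed size of 4 bytes per dimension, whereas the text form grows with the number of printed digits and has to be parsed on the server.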

Tests: tests/test_volc_mysql_encoder.py covers binary and text TSV encoding.

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: FishMage
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Add a vector database backend for VolcMySQL (Volcano Engine MySQL with
native VECTOR type and HNSW vector index), connecting via
mysql-connector-python over the MySQL wire protocol.

Components:
- volc_mysql.py: VectorDB implementation with VECTOR(dim) table,
  bulk LOAD DATA LOCAL INFILE loading, and CREATE VECTOR INDEX via
  SECONDARY_ENGINE_ATTRIBUTE (algorithm/distance/m/ef_construction and
  optional quant_algorithm/quant_type for SQ/PQ)
- config.py: DBConfig (host/port/user/password) and VolcMySQLHNSWConfig
  (m/ef_search/ef_construction + quantization params)
- cli.py: Click command `VolcMySQLHNSW`
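The two config classes named above might look roughly like the sketch below. It uses standard-library dataclasses as a stand-in so that it runs on its own; the field names follow the commit message, while the default values (other than the standard MySQL port 3306) and the method names `to_dict`, `index_param`, and `search_param` are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DBConfig:
    """Connection settings (host/port/user/password)."""
    host: str
    user: str
    password: str
    port: int = 3306  # standard MySQL port

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class VolcMySQLHNSWConfig:
    """HNSW build and search parameters; defaults are illustrative only."""
    m: int = 16
    ef_construction: int = 200
    ef_search: int = 100
    quant_algorithm: Optional[str] = None
    quant_type: Optional[str] = None

    def index_param(self) -> dict:
        # Build-time parameters; quantization keys appear only when set.
        params = {"m": self.m, "ef_construction": self.ef_construction}
        for key in ("quant_algorithm", "quant_type"):
            value = getattr(self, key)
            if value is not None:
                params[key] = value
        return params

    def search_param(self) -> dict:
        # Query-time parameter, kept separate from the build-time ones.
        return {"ef_search": self.ef_search}


if __name__ == "__main__":
    print(DBConfig(host="127.0.0.1", user="bench", password="secret").to_dict())
    print(VolcMySQLHNSWConfig(quant_algorithm="sq").index_param())
```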

Registration:
- Add VolcMySQL to the DB enum in backend/clients/__init__.py with lazy
  imports for init_cls, config_cls, and case_config_cls
- Register VolcMySQLHNSW CLI command in cli/vectordbbench.py
- Add volc_mysql optional dependency in pyproject.toml

Binary VECTOR path with auto-probe fallback:
- Vectors are sent as raw little-endian float32 bytes (UNHEX on load, the
  `_binary` introducer on query), avoiding to_vector() text parsing and
  Python str() formatting; `_binary` literals stay constant-foldable so the
  HNSW index scan is preserved
- init() probes once per connection (session-local TEMPORARY TABLE) whether
  the server accepts the binary path and transparently falls back to the
  to_vector() text path -- for both insert and query -- when it does not.
  VDB_BINARY_VEC overrides the probe (1=force binary, 0=force text)
- Recall is essentially the same between the two paths (0.9786 vs 0.9773 on
  1536D50K)
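The override-then-probe decision described in this list can be sketched as a small function. The name `use_binary_path` and the idea of passing the probe in as a callable are assumptions made for illustration; in the actual backend the probe would run against a session-local TEMPORARY TABLE, which is replaced here by a plain Python callable so the sketch runs without a server.

```python
import os
from typing import Callable, Mapping, Optional


def use_binary_path(
    probe: Callable[[], bool],
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Decide between the binary path (True) and the to_vector() text path (False)."""
    env = os.environ if env is None else env
    override = env.get("VDB_BINARY_VEC")
    if override == "1":
        return True   # force binary, skip the probe
    if override == "0":
        return False  # force text, skip the probe
    try:
        # No override: ask the server once whether it accepts binary vectors.
        return bool(probe())
    except Exception:
        # Any probe failure is treated as "binary not supported" (assumption).
        return False


if __name__ == "__main__":
    def failing_probe() -> bool:
        raise RuntimeError("server rejected binary VECTOR literal")

    print(use_binary_path(lambda: True, env={}))
    print(use_binary_path(failing_probe, env={}))
    print(use_binary_path(failing_probe, env={"VDB_BINARY_VEC": "1"}))
```

Computing the decision once and reusing it for both insert and query keeps the two directions consistent within a connection.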

Tests: tests/test_volc_mysql_encoder.py covers binary and text TSV encoding.