Add full-text search benchmark support by jamesgao-jpg · Pull Request #794 · zilliztech/VectorDBBench

jamesgao-jpg · 2026-06-05T07:51:13Z

Context

VDBBench did not have a dedicated native full-text search benchmark path. This PR adds FTS as a first-class benchmark workload so BM25-based text search can be evaluated through the same task, runner, dataset, frontend, and result pipeline used by the rest of VDBBench.

Summary

Add full-text search benchmark support centered on BM25 text retrieval.
Introduce FTS performance cases that load text documents, run text queries, and report comparable performance results.
Wire FTS through backend execution, dataset preparation, runner orchestration, Streamlit task generation, and result formatting.
Use manifest-driven FTS ground truth so recall is measured against generated mathematical BM25 neighbors rather than semantic relevance labels.
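For reference, here is the standard Okapi BM25 scoring function that "mathematical BM25 neighbors" are ranked by. $k_1$, $b$, and $\mathrm{avgdl}$ are the manifest parameters discussed later in this thread. The IDF variant shown is Lucene's. That choice is an assumption, and individual backends may use slightly different IDF or normalization terms.

```latex
\operatorname{score}(D, Q) = \sum_{t \in Q} \operatorname{IDF}(t)\cdot
  \frac{f(t, D)\,(k_1 + 1)}{f(t, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\operatorname{IDF}(t) = \ln\!\left(1 + \frac{N - n(t) + 0.5}{n(t) + 0.5}\right)
```

Here $f(t, D)$ is the frequency of term $t$ in document $D$, $|D|$ is the document length in tokens, $\mathrm{avgdl}$ is the mean document length, $N$ is the number of indexed documents, and $n(t)$ is the number of documents containing $t$.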

Backends Covered

Milvus: native BM25 full-text indexing/search configuration and execution path.
Zilliz Cloud: FTS routing through the Milvus-compatible API with Cloud sparse auto-index handling, sharing the Milvus optimize/compaction path.
ElasticCloud / Elasticsearch: BM25 text indexing/search path with configurable BM25 k1/b support.
Vespa: BM25 schema/query path plus Vespa feed-client loading for large FTS document ingestion.
Turbopuffer: namespace-based full-text benchmark path.

Testing Infra Touched

Dataset layer: add MS MARCO and HotpotQA FTS dataset definitions, document/query loading, and S3-hosted mathematical BM25 ground-truth loading.
Case layer: add FTS performance case definitions, payload profiles, and task assembly support.
Runner layer: support FTS document loading plus serial recall and concurrent text-query search execution while preserving the existing backend insert contract.
Backend layer: route Vespa FTS loading through its backend insert path, where the Vespa feed client is managed for high-throughput ingestion.
Frontend layer: expose FTS cases and generate backend-specific FTS task configs from Streamlit.
Result layer: format FTS benchmark outputs alongside existing VDBBench results.
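The runner-layer split between a serial pass (latency) and a concurrent pass (throughput) can be sketched as follows. This is a simplified, thread-based illustration, not VDBBench's runner. The real runner likely differs, for example by running for a fixed duration. `fake_search` is a hypothetical stand-in for a client's `search_documents`.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A search callable: (query, k) -> ranked doc IDs. Stand-in for a client's search_documents.
SearchFn = Callable[[str, int], list[str]]


def serial_latency_percentiles(search: SearchFn, queries: list[str], k: int = 100) -> tuple[float, float]:
    """Issue queries one at a time; return (p95, p99) latency in seconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search(query, k)
        latencies.append(time.perf_counter() - start)
    # quantiles(n=100) yields 99 cut points: index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return cuts[94], cuts[98]


def concurrent_qps(search: SearchFn, queries: list[str], concurrency: int, k: int = 100) -> float:
    """Fan queries out over a thread pool; return completed queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every search to finish before the clock stops.
        results = list(pool.map(lambda query: search(query, k), queries))
    return len(results) / (time.perf_counter() - start)


def fake_search(query: str, k: int) -> list[str]:
    """Hypothetical toy backend: sleeps 1 ms and returns k dense row IDs."""
    time.sleep(0.001)
    return [str(i) for i in range(k)]
```

Keeping the two passes separate means recall and tail latency come from uncontended queries, while QPS reflects the backend under load.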

Datasets Supported

MS MARCO: small 100K, medium 1M, large 8.8M documents.
HotpotQA: small 100K, medium 1M, large 5.2M documents.

Metrics

Search metric type: BM25.
Accuracy metric: recall@k against generated mathematical BM25 ground truth.
Performance metrics: serial latency p95/p99, concurrent QPS, load duration, optimize duration, inserted count, payload profile, batch size, and load concurrency.
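To illustrate what "generated mathematical BM25 ground truth" means, the sketch below exhaustively scores every document with BM25 and keeps the top-k dense row IDs. The lowercase whitespace tokenizer, the k1/b defaults, and the tie-breaking rule are all assumptions. The real ground truth is built offline with the manifest's analyzer and BM25 parameters.

```python
import math
from collections import Counter


def bm25_top_k(docs: list[str], query: str, k: int, k1: float = 1.2, b: float = 0.75) -> list[int]:
    """Brute-force BM25: score every document, return the top-k dense row IDs."""
    tokenized = [d.lower().split() for d in docs]  # whitespace tokenizer: an assumption
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scored = []
    for row_id, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        if score > 0:
            scored.append((score, row_id))
    # Highest score first; ties broken by lower row ID for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [row_id for _, row_id in scored[:k]]
```

A query that matches fewer than k documents yields a short list, so a fixed-width neighbors column needs padding for such rows.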

Co-authored-by: zilliz <zilliz@zillizdeMacBook-Pro.local>

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

XuanYang-cn

Requesting changes on the FTS benchmark correctness contract. The inline comments below follow the structure of the existing architecture review draft and cover the blocking and medium-severity findings.

XuanYang-cn · 2026-06-24T07:28:10Z

+    @classmethod
+    def supports_full_text_search(cls) -> bool:
+        return False
+
+    def insert_documents(
+        self,
+        texts: list[str],
+        doc_ids: list[str],
+        **kwargs,
+    ) -> tuple[int, Exception | None]:
+        msg = f"{self.name or self.__class__.__name__} does not support full-text document insert"
+        raise NotImplementedError(msg)
+
+    def search_documents(
+        self,
+        query: str,
+        k: int = 100,
+        payload_profile: PayloadProfile = PayloadProfile.IDS_ONLY,
+        **kwargs,
+    ) -> list[str]:
+        msg = f"{self.name or self.__class__.__name__} does not support full-text document search"
+        raise NotImplementedError(msg)
+


Please add detailed docstrings to guide others on how to support the FTS-related API.

Addressed in ea08494. I added docstrings to the FTS extension points in VectorDB: supports_full_text_search, insert_documents, and search_documents, covering the capability gate, insert return contract, payload-profile behavior, and search return shape.

XuanYang-cn · 2026-06-25T02:32:31Z

+        analyzer_params = getattr(self.ca.dataset, "analyzer_params", {}) or {}
+        updates = {}
+
+        if self.config.db == DB.Milvus:


1. 🔴 Blocking — Runner owns backend-specific FTS manifest policy

Where: vectordb_bench/backend/task_runner.py:250

What: _apply_fts_manifest_params() branches on self.config.db == DB.Milvus / DB.ElasticCloud / DB.Vespa to translate BM25/analyzer manifest params into backend config fields.

Why it matters:

Puts backend-specific policy inside CaseRunner, which should only orchestrate the benchmark lifecycle.

Every new FTS-capable backend needs another runner edit, and one missed branch silently changes correctness.

DB.ZillizCloud falls through with no manifest updates. ZillizCloudFtsConfig extends the Milvus FTS config and this PR claims Zilliz Cloud is covered, so Zilliz Cloud can run with default BM25/analyzer settings while recall is scored against ground truth built from the manifest settings.

Fix: Replace the runner branching with a config-owned hook:

db_case_config.apply_fts_manifest(manifest) -> FtsManifestApplyResult returns the updated config plus which BM25/analyzer params were applied vs. unapplied.

CaseRunner calls this one hook before init_db(), swaps in the returned config, and records the report without importing or naming any client.

MilvusFtsConfig owns BM25/analyzer translation; ZillizCloudFtsConfig inherits it. ElasticCloudFtsConfig, VespaFtsConfig, and TurboPufferFtsConfig each report supported vs. unsupported params explicitly.

Add a contract test: every FTS-capable config either applies k1, b, analyzer params, and avgdl where supported, or explicitly reports them unsupported.

Why a config hook: manifest params must be applied before the client exists, so a client hook is too late; a central registry would duplicate backend config semantics and drift.

Addressed in 2eca41d. I moved FTS manifest application out of CaseRunner and into DBCaseConfig.apply_fts_manifest(...) implementations. The runner now only prepares the FTS dataset, calls the config hook, records the returned applied/unapplied manifest report, and then initializes the backend. Milvus owns the BM25/analyzer mapping, Zilliz Cloud inherits that behavior through ZillizCloudFtsConfig(MilvusFtsConfig), Elasticsearch applies supported BM25 k1/b, and Vespa applies k1/b/avgdl while unsupported analyzer params are reported as unapplied. Verified with git diff --check, compile checks for the touched modules, a focused local manifest-hook test, and tests/test_db_client_resolution.py.

XuanYang-cn · 2026-06-25T02:32:43Z

 MAX_STREAMLIT_INT = (1 << 53) - 1

 DB_LIST = [d for d in DB if d != DB.Test]
+FTS_SUPPORTED_DBS = {DB.Milvus, DB.ElasticCloud, DB.Vespa, DB.TurboPuffer}


2. 🔴 Blocking — Run-test UI excludes Zilliz Cloud FTS

Where: vectordb_bench/frontend/config/dbCaseConfigs.py:15

What: FTS_SUPPORTED_DBS lists Milvus, ElasticCloud, Vespa, TurboPuffer, but not DB.ZillizCloud. Yet the same file later defines a Zilliz Cloud FTS config mapping, and the PR commits Zilliz Cloud FTS results.

Why it matters: Backend support and committed results say Zilliz Cloud FTS exists, but the run-test UI filters the FTS cases out when Zilliz Cloud is selected. The matrix disagrees with itself.

Fix:

Add DB.ZillizCloud to FTS_SUPPORTED_DBS.

Add coverage for get_selectable_case_items(): every backend that has supports_full_text_search() and an FTS config must be selectable.

Addressed in 92ad3c4. I added DB.ZillizCloud to FTS_SUPPORTED_DBS, so the run-test UI exposes FullTextSearchPerformance cases for Zilliz Cloud consistently with CASE_CONFIG_MAP. I also added tests/test_fts_frontend_config.py to verify the FTS-supported DB set matches backends with FTS case configs and that Zilliz Cloud FTS cases are selectable. Verified with git diff --check and .venv/bin/python -m pytest -q tests/test_fts_frontend_config.py.

Correction: I amended the previous commit to keep this patch minimal. The current pushed commit is 14b86fc and only adds DB.ZillizCloud to FTS_SUPPORTED_DBS. I removed the test file from the commit as requested. Verification run locally: git diff --check, compile check for dbCaseConfigs.py, and a direct assertion that Zilliz Cloud FTS cases are selectable.
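The consistency check the reviewer asked for reduces to two set differences. In this sketch, `DB` is a stand-in subset of the real enum and `FTS_CASE_CONFIG_DBS` is a hypothetical name for "DBs with an FTS entry in `CASE_CONFIG_MAP`".

```python
from enum import Enum


class DB(Enum):
    # Stand-in subset of VDBBench's DB enum.
    Milvus = "Milvus"
    ZillizCloud = "ZillizCloud"
    ElasticCloud = "ElasticCloud"
    Vespa = "Vespa"
    TurboPuffer = "TurboPuffer"


FTS_SUPPORTED_DBS = {DB.Milvus, DB.ZillizCloud, DB.ElasticCloud, DB.Vespa, DB.TurboPuffer}

# Hypothetical: DBs that have an FTS entry in CASE_CONFIG_MAP.
FTS_CASE_CONFIG_DBS = {DB.Milvus, DB.ZillizCloud, DB.ElasticCloud, DB.Vespa, DB.TurboPuffer}


def fts_matrix_mismatches(ui_dbs: set[DB], config_dbs: set[DB]) -> tuple[set[DB], set[DB]]:
    """Return (configured but hidden in the UI, shown in the UI but unconfigured)."""
    return config_dbs - ui_dbs, ui_dbs - config_dbs
```

With the pre-fix set, `fts_matrix_mismatches(FTS_SUPPORTED_DBS - {DB.ZillizCloud}, FTS_CASE_CONFIG_DBS)` reports `{DB.ZillizCloud}` as configured but hidden, which is exactly the gap found in this finding.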

XuanYang-cn · 2026-06-25T02:32:55Z

+            raise TypeError(msg)
+        return manifest
+
+    def _load_manifest_params(self) -> None:


3. 🟡 Medium — Row-order ground-truth contract is implicit

Where: vectordb_bench/backend/dataset.py:779, :822, :920; pyproject.toml:44

What: The FTS dataset layer loads ground truth by row order, then overwrites every source document id with str(self._doc_count). Concurrent insert does not shift those IDs because ConcurrentInsertRunner pulls batches under _iter_lock and clients insert the assigned doc_ids directly.

Status: The public msmarco_small_100k/neighbors.parquet artifact confirms this is intended: query id is 0..6979, and neighbors_id holds dense doc row IDs in 0..99999. Assuming ir_datasets iteration is stable, this is not a blocker.
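The reason concurrent insert does not shift row IDs can be shown with a small model. Each worker reserves a contiguous row range under the lock and inserts exactly those IDs. `FtsDocumentIteratorSketch` and `insert_concurrently` are hypothetical simplifications of `FtsDocumentIterator` and `ConcurrentInsertRunner`.

```python
import threading


class FtsDocumentIteratorSketch:
    """Hypothetical iterator: hands out (doc_id, text) batches with dense row IDs."""

    def __init__(self, texts: list[str], batch_size: int) -> None:
        self._texts = texts
        self._batch_size = batch_size
        self._doc_count = 0
        self._iter_lock = threading.Lock()

    def next_batch(self) -> list[tuple[str, str]]:
        # Reserve a contiguous row range under the lock so concurrent workers never overlap.
        with self._iter_lock:
            start = self._doc_count
            batch = self._texts[start : start + self._batch_size]
            self._doc_count += len(batch)
        # The row ID replaces the source dataset's original document ID.
        return [(str(start + i), text) for i, text in enumerate(batch)]


def insert_concurrently(it: FtsDocumentIteratorSketch, workers: int) -> dict[str, str]:
    """Drain the iterator from several threads; 'insert' by recording id -> text."""
    inserted: dict[str, str] = {}
    store_lock = threading.Lock()

    def worker() -> None:
        while batch := it.next_batch():
            with store_lock:
                inserted.update(batch)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inserted
```

Insert order across threads is nondeterministic, but the row-ID-to-text mapping is fixed at reservation time. That mapping is what the ground truth depends on.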

Why it still matters: The contract lives only in code plus artifact convention. _load_manifest_params() reads only BM25/analyzer params and ignores source_ir_dataset, doc_limit, indexed_doc_count, and query_count that build_manifest.json records.

Fix: Validate those manifest fields before accepting the ground truth, and document the row-id policy explicitly so future datasets do not mix original doc IDs with generated row IDs.

Addressed in aa21119. I documented the FTS math-GT row-ID contract at the GT load point: neighbors.parquet stores dense document row IDs, and FtsDocumentIterator assigns the same row IDs during insertion. I also added manifest validation before accepting GT params: source_ir_dataset must match the translator dataset, doc_limit/indexed_doc_count must match the configured dataset size, and query_count must match the loaded query count when present. Verification: git diff --check, compile check for dataset.py, and a direct validation script covering valid manifests plus source/doc/query-count mismatches. I also tried the broader tests/test_dataset.py, but it attempted multi-GB dataset downloads and failed due to local disk exhaustion, so I cleaned the partial /tmp/vectordb_bench artifacts.
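The validation rules described in this reply can be sketched as one function that collects every mismatch before raising. The function name is hypothetical, and the manifest values used to exercise it are illustrative rather than copied from a real build_manifest.json.

```python
from typing import Any


def validate_fts_manifest(
    manifest: dict[str, Any],
    expected_source: str,
    expected_doc_count: int,
    loaded_query_count: int,
) -> None:
    """Reject a GT manifest built for a different corpus slice than the one being loaded."""
    errors = []
    if manifest.get("source_ir_dataset") != expected_source:
        errors.append(f"source_ir_dataset={manifest.get('source_ir_dataset')!r}, expected {expected_source!r}")
    # Both doc-count fields are required and must equal the configured dataset size.
    for key in ("doc_limit", "indexed_doc_count"):
        if manifest.get(key) != expected_doc_count:
            errors.append(f"{key}={manifest.get(key)!r}, expected {expected_doc_count}")
    # query_count is optional, but must match the loaded queries when present.
    if "query_count" in manifest and manifest["query_count"] != loaded_query_count:
        errors.append(f"query_count={manifest['query_count']}, loaded {loaded_query_count}")
    if errors:
        raise ValueError("FTS ground-truth manifest mismatch: " + "; ".join(errors))
```

Reporting all mismatches at once makes it easier to tell a wrong artifact (source mismatch) from a wrong size selection (doc-count mismatch).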

XuanYang-cn · 2026-06-25T02:33:06Z

+def calc_recall_fts(k: int, ground_truth: list[int], got: list[int]) -> float:
+    if not ground_truth or k <= 0:
+        return 0.0
+    gt_set = set(ground_truth)


4. 🟡 Medium — -1 ground-truth padding counted as a relevant doc

Where: vectordb_bench/backend/dataset.py:760; vectordb_bench/metric.py:124

What: msmarco_small_100k/neighbors.parquet pads neighbors_id with -1. _load_math_gt_data() stringifies every neighbor and keeps -1; calc_recall_fts() builds gt_set = set(ground_truth) with no sentinel filtering.

Why it matters: Padded rows put an impossible doc ID (-1) in the denominator. A backend can never return -1, so recall on those rows is depressed by padding rather than reflecting only real ground-truth docs.

Fix: Filter -1 / "-1" when loading FTS ground truth or inside calc_recall_fts(). Add a regression test with padded GT rows.

Addressed in 87dcfa3. I filtered the FTS math-GT sentinel padding at load time in FtsDatasetManager._load_math_gt_data(), so downstream recall receives clean dense document IDs and does not count impossible -1 entries in the denominator. This keeps the metric code unchanged and affects only FTS GT loading. Verification: git diff --check, compile check for dataset.py, and a cached GT check showing MS MARCO small raw padding count 1968 and loaded padding count 0; HotpotQA small remained 0/0.
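The effect of the padding on recall is easy to reproduce. The sketch below assumes recall is "hits in the top k divided by the number of distinct ground-truth IDs". That definition is inferred from the `gt_set` line in the diff, not confirmed against `calc_recall_fts()`. Both function names are hypothetical.

```python
# Sentinel used to pad fixed-width neighbors_id rows (both int and string forms).
PAD_SENTINELS = {-1, "-1"}


def load_gt_row(neighbors: list) -> list[str]:
    """Stringify neighbor row IDs and drop padding, as in the load-time fix."""
    return [str(n) for n in neighbors if n not in PAD_SENTINELS]


def recall_at_k(k: int, ground_truth: list[str], got: list[str]) -> float:
    """Fraction of distinct GT IDs in ground_truth[:k] found in got[:k] (assumed definition)."""
    if not ground_truth or k <= 0:
        return 0.0
    gt_set = set(ground_truth[:k])
    return len(gt_set.intersection(got[:k])) / len(gt_set)
```

For a GT row `[3, 7, -1, -1]` and results `["3", "7", "9", "1"]`, keeping the padding gives 2/3 because `"-1"` sits in the denominator, while filtering it gives 1.0. A row that is all padding becomes empty and scores 0.0 in this sketch. The thread does not say how the real metric treats such rows.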

XuanYang-cn · 2026-06-25T02:33:19Z

+    if data.empty:
+        return data
+
+    data = data[data["backend"] != "Milvus"].copy()


5. 🟡 Medium — FTS result dashboard hard-codes a different backend matrix

Where: vectordb_bench/frontend/pages/full_text_search.py:25, :127

What: The result page keeps its own hard-coded BACKEND_ORDER and explicitly filters out Milvus at line 127, while the backend and run-test UI include Milvus as an FTS-supported backend.

Why it matters: The result page disagrees with the benchmark support matrix and can silently drop valid result files.

Fix: Pick one intent and make it consistent:

If this is intentionally a cloud-only published-results page, say so in the title/caption.

If it is the FTS dashboard, derive the backend list from the supported FTS DB set and stop dropping valid result files.

Addressed in ee44c28 by making the dashboard scope explicit instead of changing the published-result backend set. The page now uses the title Full Text Search Cloud Results, adds a short caption naming the published backend subset, and documents the hard-coded backend order as the cloud/service result subset. The Milvus filter remains intentional for this published-results page. Verification: git diff --check, compile check for full_text_search.py, and a direct assertion that the backend subset remains Zilliz Cloud / ElasticSearch / Vespa / TurboPuffer.
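One way to keep an intentional published subset without silently dropping data is to filter by an explicit allow-list and return what was skipped. The sketch below is a hypothetical version. `PUBLISHED_FTS_BACKENDS` and the backend strings other than "Milvus" are assumed names, and plain dicts stand in for the page's DataFrame rows.

```python
# Hypothetical names: the published cloud-results subset, in display order.
PUBLISHED_FTS_BACKENDS = ("ZillizCloud", "ElasticCloud", "Vespa", "TurboPuffer")


def select_published_rows(rows: list[dict]) -> tuple[list[dict], set[str]]:
    """Keep rows for published backends in display order; report which backends were left out."""
    order = {name: i for i, name in enumerate(PUBLISHED_FTS_BACKENDS)}
    kept = sorted((r for r in rows if r["backend"] in order), key=lambda r: order[r["backend"]])
    skipped = {r["backend"] for r in rows if r["backend"] not in order}
    return kept, skipped
```

Compared with a `!= "Milvus"` filter, the allow-list states the page's scope in one place, and the `skipped` set can be surfaced in the caption so omitted result files stay visible.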

XuanYang-cn · 2026-06-25T02:33:32Z


    NewIntFilterPerformanceCase = 400
    CloudPayloadSearchCase = 500
+    FTSmsmarcoPerformance = 503


6. 🟡 Medium — Public FTS case id is still MS MARCO-specific

Where: vectordb_bench/backend/cases.py:68

What: CaseType.FTSmsmarcoPerformance now represents the generic BM25 FTS workload. The dataset is already carried separately via CaseConfig.custom_case["dataset_with_size_type"], so run identity is not lost.

Why it matters: The public workload id still says MS MARCO even when the dataset is HotpotQA. That name leaks into CLI options, serialized results, and TestResult.read_file() compatibility handling, so new FTS datasets look like special cases under an MS MARCO id.

Fix:

Rename the public case id/name to a generic workload name, e.g. FTSBm25Performance, and use it in the new UI/CLI/result paths. The case is new, so no backward-compatible FTSmsmarcoPerformance alias is needed.

Move the FTS-only IR dataset types, translators, and manager into one new file, e.g. vectordb_bench/backend/fts_dataset.py (no package/directory split needed yet).

Addressed the public case-id naming part in 623977b. I renamed the case enum/class and tracked call sites from FTSmsmarcoPerformance to FTSBm25Performance while keeping enum value 503, so the workload name is no longer MS MARCO-specific and can represent HotpotQA or future BM25 FTS datasets via dataset_with_size_type.

I intentionally did not move the FTS dataset code into a separate fts_dataset.py in this patch. I think FTS should eventually be integrated more cleanly with the existing vector dataset/API path, rather than split into a separate file now. For example, the runner path already shares FTS and vector flow in the same runner functions where practical, so a file move alone would not be a meaningful architectural improvement and would add import churn to this PR. Verification: git diff --check, compile checks for touched modules, and a direct check that CaseType.FTSBm25Performance.value == 503, HotpotQA instantiation works, and CLI custom-case parsing accepts FTSBm25Performance.
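The rename keeps the serialized enum value while making the workload name dataset-neutral, with the dataset identity carried separately. In the sketch below, the `CaseType` excerpt shows only the values visible in the diff. `fts_run_key` and the dataset string are hypothetical.

```python
from enum import Enum


class CaseType(Enum):
    # Stand-in excerpt: only the values visible in the diff are shown.
    NewIntFilterPerformanceCase = 400
    CloudPayloadSearchCase = 500
    FTSBm25Performance = 503  # renamed from FTSmsmarcoPerformance; value unchanged


def fts_run_key(case_type: CaseType, custom_case: dict) -> str:
    """Hypothetical: build a display key from the generic workload plus its dataset."""
    return f"{case_type.name}/{custom_case['dataset_with_size_type']}"
```

Because lookups by value (`CaseType(503)`) still resolve, results serialized with value 503 keep loading, while lookups by the old name would not. That is acceptable here because the case is new.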

sre-ci-robot · 2026-06-25T02:36:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jamesgao-jpg
To complete the pull request process, please ask for approval from xuanyang-cn after the PR has been reviewed.



Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Denise2004 and others added 30 commits June 1, 2026 04:00

Add FTS support (zilliztech#713)

abefff6

Co-authored-by: zilliz <zilliz@zillizdeMacBook-Pro.local>

docs: capture fts dataset design

33d7598

docs: clarify fts bm25 scope

269b220

docs: capture fts implementation design

d8a174f

docs: add fts payload retrieval option

afa30be

docs: finalize fts implementation scope

665f8fa

docs: plan fts bm25 implementation

39ff3b8

feat: add fts workload capability boundary

fc05db8

fix: align milvus fts document id contract

cdfc774

fix: correct fts dataset qrels and caps

672d2e6

fix: remove hidden fts test dataset size

c39d7ac

fix: validate fts qrel docs in capped corpus

ffb6b3d

fix: propagate invalid fts dataset validation

7d3a4c1

feat: parameterize fts performance cases

8757aa1

refactor: use explicit workload kind in search runners

c4fa6ad

refactor: share performance lifecycle for fts

7b681c4

test: cover fts lifecycle dispatch

5e83c24

fix: clean up milvus bm25 fts support

57d5bae

fix: separate milvus fts analyzer and index params

b52b5fb

feat: add elasticcloud bm25 fts adapter

8549431

fix: harden elasticcloud fts adapter

9d4e80d

feat: add vespa bm25 fts adapter

563981b

fix: harden vespa fts adapter

8c99dac

feat: add turbopuffer bm25 fts adapter

ae11a02

fix: use turbopuffer upsert columns for fts

441cb71

fix: use turbopuffer upsert columns for vector writes

ae1c95f

feat: gate fts backend capability

f0270d7

feat: expose fts bm25 case matrix

b52b5ef

fix: gate fts ui cases by backend support

6abcdc8

fix: close fts integration gaps

eae98ac

James Gao and others added 10 commits June 24, 2026 06:29

chore: refresh FTS dashboard results

4611eb4

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

fix: satisfy FTS lint checks

40cb698

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

fix: align zilliz cloud optimize with milvus

79a038c

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

fix: exclude fts iterator prep from insert timing

8612a1e

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

fix: support turbopuffer fts backpressure control

ca95c0b

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Warm FTS document cache before timed insert

a898a64

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Add Turbopuffer FTS benchmark results

d3e7f7e

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Update ZillizCloud FTS benchmark metrics

825c7de

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Simplify FTS frontend dashboard

9333203

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Remove Vespa FTS feed client test

3042327

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

jamesgao-jpg force-pushed the fts_impl_only branch from d3e5914 to a1ee8d5 June 24, 2026 15:01

Add FTS release note

001ce6c

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

jamesgao-jpg force-pushed the fts_impl_only branch from a1ee8d5 to 001ce6c June 24, 2026 15:04

XuanYang-cn self-assigned this Jun 25, 2026

XuanYang-cn requested changes Jun 25, 2026


Adjust FTS load timings

8baab5c

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

jamesgao-jpg force-pushed the fts_impl_only branch from 55cf439 to 8baab5c June 25, 2026 02:48

James Gao added 5 commits June 25, 2026 02:51

Sort FTS frontend bars by metric

4d82bec

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Document FTS client API contract

ea08494

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Sort FTS frontend bars by metric

be0713d

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Move FTS manifest policy into configs

2eca41d

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Enable Zilliz Cloud FTS selection

14b86fc

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

jamesgao-jpg force-pushed the fts_impl_only branch from 92ad3c4 to 14b86fc June 25, 2026 03:54

James Gao added 6 commits June 25, 2026 04:25

Validate FTS ground truth manifest

aa21119

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Filter FTS ground truth padding

87dcfa3

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Clarify FTS dashboard result scope

ee44c28

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Rename FTS BM25 performance case

623977b

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Format FTS manifest config hooks

23a7948

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>

Tighten pymilvus lower bound

e5b26a2

Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>


4 participants
