Add full-text search benchmark support#794
Conversation
Co-authored-by: zilliz <zilliz@zillizdeMacBook-Pro.local>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
d3e5914 to
a1ee8d5
Compare
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
a1ee8d5 to
001ce6c
Compare
XuanYang-cn
left a comment
There was a problem hiding this comment.
Requesting changes on the FTS benchmark correctness contract. Inline comments preserve the existing architecture review draft structure for the blocking and medium findings.
| @classmethod | ||
| def supports_full_text_search(cls) -> bool: | ||
| return False | ||
|
|
||
| def insert_documents( | ||
| self, | ||
| texts: list[str], | ||
| doc_ids: list[str], | ||
| **kwargs, | ||
| ) -> tuple[int, Exception | None]: | ||
| msg = f"{self.name or self.__class__.__name__} does not support full-text document insert" | ||
| raise NotImplementedError(msg) | ||
|
|
||
| def search_documents( | ||
| self, | ||
| query: str, | ||
| k: int = 100, | ||
| payload_profile: PayloadProfile = PayloadProfile.IDS_ONLY, | ||
| **kwargs, | ||
| ) -> list[str]: | ||
| msg = f"{self.name or self.__class__.__name__} does not support full-text document search" | ||
| raise NotImplementedError(msg) | ||
|
|
There was a problem hiding this comment.
Please add detailed doc string to guide others on how to support fts-related API.
There was a problem hiding this comment.
Addressed in ea08494. I added docstrings to the FTS extension points in VectorDB: supports_full_text_search, insert_documents, and search_documents, covering the capability gate, insert return contract, payload-profile behavior, and search return shape.
| analyzer_params = getattr(self.ca.dataset, "analyzer_params", {}) or {} | ||
| updates = {} | ||
|
|
||
| if self.config.db == DB.Milvus: |
There was a problem hiding this comment.
1. 🔴 Blocking — Runner owns backend-specific FTS manifest policy
Where: vectordb_bench/backend/task_runner.py:250
What: _apply_fts_manifest_params() branches on self.config.db == DB.Milvus / DB.ElasticCloud / DB.Vespa to translate BM25/analyzer manifest params into backend config fields.
Why it matters:
- Puts backend-specific policy inside
CaseRunner, which should only orchestrate the benchmark lifecycle. - Every new FTS-capable backend needs another runner edit, and one missed branch silently changes correctness.
DB.ZillizCloudfalls through with no manifest updates.ZillizCloudFtsConfigextends the Milvus FTS config and this PR claims Zilliz Cloud is covered, so Zilliz Cloud can run with default BM25/analyzer settings while recall is scored against ground truth built from the manifest settings.
Fix: Replace the runner branching with a config-owned hook:
db_case_config.apply_fts_manifest(manifest) -> FtsManifestApplyResultreturns the updated config plus which BM25/analyzer params were applied vs. unapplied.CaseRunnercalls this one hook beforeinit_db(), swaps in the returned config, and records the report without importing or naming any client.MilvusFtsConfigowns BM25/analyzer translation;ZillizCloudFtsConfiginherits it.ElasticCloudFtsConfig,VespaFtsConfig, andTurboPufferFtsConfigeach report supported vs. unsupported params explicitly.- Add a contract test: every FTS-capable config either applies
k1,b, analyzer params, andavgdlwhere supported, or explicitly reports them unsupported.
Why a config hook: manifest params must be applied before the client exists, so a client hook is too late; a central registry would duplicate backend config semantics and drift.
There was a problem hiding this comment.
Addressed in 2eca41d. I moved FTS manifest application out of CaseRunner and into DBCaseConfig.apply_fts_manifest(...) implementations. The runner now only prepares the FTS dataset, calls the config hook, records the returned applied/unapplied manifest report, and then initializes the backend. Milvus owns the BM25/analyzer mapping, Zilliz Cloud inherits that behavior through ZillizCloudFtsConfig(MilvusFtsConfig), Elasticsearch applies supported BM25 k1/b, and Vespa applies k1/b/avgdl while unsupported analyzer params are reported as unapplied. Verified with git diff --check, compile checks for the touched modules, a focused local manifest-hook test, and tests/test_db_client_resolution.py.
| MAX_STREAMLIT_INT = (1 << 53) - 1 | ||
|
|
||
| DB_LIST = [d for d in DB if d != DB.Test] | ||
| FTS_SUPPORTED_DBS = {DB.Milvus, DB.ElasticCloud, DB.Vespa, DB.TurboPuffer} |
There was a problem hiding this comment.
2. 🔴 Blocking — Run-test UI excludes Zilliz Cloud FTS
Where: vectordb_bench/frontend/config/dbCaseConfigs.py:15
What: FTS_SUPPORTED_DBS lists Milvus, ElasticCloud, Vespa, TurboPuffer, but not DB.ZillizCloud. Yet the same file later defines a Zilliz Cloud FTS config mapping, and the PR commits Zilliz Cloud FTS results.
Why it matters: Backend support and committed results say Zilliz Cloud FTS exists, but the run-test UI filters the FTS cases out when Zilliz Cloud is selected. The matrix disagrees with itself.
Fix:
- Add
DB.ZillizCloudtoFTS_SUPPORTED_DBS. - Add coverage for
get_selectable_case_items(): every backend that hassupports_full_text_search()and an FTS config must be selectable.
There was a problem hiding this comment.
Addressed in 92ad3c4. I added DB.ZillizCloud to FTS_SUPPORTED_DBS, so the run-test UI exposes FullTextSearchPerformance cases for Zilliz Cloud consistently with CASE_CONFIG_MAP. I also added tests/test_fts_frontend_config.py to verify the FTS-supported DB set matches backends with FTS case configs and that Zilliz Cloud FTS cases are selectable. Verified with git diff --check and .venv/bin/python -m pytest -q tests/test_fts_frontend_config.py.
There was a problem hiding this comment.
Correction: I amended the previous commit to keep this patch minimal. The current pushed commit is 14b86fc and only adds DB.ZillizCloud to FTS_SUPPORTED_DBS. I removed the test file from the commit as requested. Verification run locally: git diff --check, compile check for dbCaseConfigs.py, and a direct assertion that Zilliz Cloud FTS cases are selectable.
| raise TypeError(msg) | ||
| return manifest | ||
|
|
||
| def _load_manifest_params(self) -> None: |
There was a problem hiding this comment.
3. 🟡 Medium — Row-order ground-truth contract is implicit
Where: vectordb_bench/backend/dataset.py:779, :822, :920; pyproject.toml:44
What: The FTS dataset layer loads ground truth by row order, then overwrites every source document id with str(self._doc_count). Concurrent insert does not shift those IDs because ConcurrentInsertRunner pulls batches under _iter_lock and clients insert the assigned doc_ids directly.
Status: The public msmarco_small_100k/neighbors.parquet artifact confirms this is intended: query id is 0..6979, and neighbors_id holds dense doc row IDs in 0..99999. Assuming ir_datasets iteration is stable, this is not a blocker.
Why it still matters: The contract lives only in code plus artifact convention. _load_manifest_params() reads only BM25/analyzer params and ignores source_ir_dataset, doc_limit, indexed_doc_count, and query_count that build_manifest.json records.
Fix: Validate those manifest fields before accepting the ground truth, and document the row-id policy explicitly so future datasets do not mix original doc IDs with generated row IDs.
There was a problem hiding this comment.
Addressed in aa21119. I documented the FTS math-GT row-ID contract at the GT load point: neighbors.parquet stores dense document row IDs, and FtsDocumentIterator assigns the same row IDs during insertion. I also added manifest validation before accepting GT params: source_ir_dataset must match the translator dataset, doc_limit/indexed_doc_count must match the configured dataset size, and query_count must match the loaded query count when present. Verification: git diff --check, compile check for dataset.py, and a direct validation script covering valid manifests plus source/doc/query-count mismatches. I also tried the broader tests/test_dataset.py, but it attempted multi-GB dataset downloads and failed due to local disk exhaustion, so I cleaned the partial /tmp/vectordb_bench artifacts.
| def calc_recall_fts(k: int, ground_truth: list[int], got: list[int]) -> float: | ||
| if not ground_truth or k <= 0: | ||
| return 0.0 | ||
| gt_set = set(ground_truth) |
There was a problem hiding this comment.
4. 🟡 Medium — -1 ground-truth padding counted as a relevant doc
Where: vectordb_bench/backend/dataset.py:760; vectordb_bench/metric.py:124
What: msmarco_small_100k/neighbors.parquet pads neighbors_id with -1. _load_math_gt_data() stringifies every neighbor and keeps -1; calc_recall_fts() builds gt_set = set(ground_truth) with no sentinel filtering.
Why it matters: Padded rows put an impossible doc ID (-1) in the denominator. A backend can never return -1, so recall on those rows is depressed by padding rather than reflecting only real ground-truth docs.
Fix: Filter -1 / "-1" when loading FTS ground truth or inside calc_recall_fts(). Add a regression test with padded GT rows.
There was a problem hiding this comment.
Addressed in 87dcfa3. I filtered the FTS math-GT sentinel padding at load time in FtsDatasetManager._load_math_gt_data(), so downstream recall receives clean dense document IDs and does not count impossible -1 entries in the denominator. This keeps the metric code unchanged and affects only FTS GT loading. Verification: git diff --check, compile check for dataset.py, and a cached GT check showing MS MARCO small raw padding count 1968 and loaded padding count 0; HotpotQA small remained 0/0.
| if data.empty: | ||
| return data | ||
|
|
||
| data = data[data["backend"] != "Milvus"].copy() |
There was a problem hiding this comment.
5. 🟡 Medium — FTS result dashboard hard-codes a different backend matrix
Where: vectordb_bench/frontend/pages/full_text_search.py:25, :127
What: The result page keeps its own hard-coded BACKEND_ORDER and explicitly filters out Milvus at line 127, while the backend and run-test UI include Milvus as an FTS-supported backend.
Why it matters: The result page disagrees with the benchmark support matrix and can silently drop valid result files.
Fix: Pick one intent and make it consistent:
- If this is intentionally a cloud-only published-results page, say so in the title/caption.
- If it is the FTS dashboard, derive the backend list from the supported FTS DB set and stop dropping valid result files.
There was a problem hiding this comment.
Addressed in ee44c28 by making the dashboard scope explicit instead of changing the published-result backend set. The page now uses the title Full Text Search Cloud Results, adds a short caption naming the published backend subset, and documents the hard-coded backend order as the cloud/service result subset. The Milvus filter remains intentional for this published-results page. Verification: git diff --check, compile check for full_text_search.py, and a direct assertion that the backend subset remains Zilliz Cloud / ElasticSearch / Vespa / TurboPuffer.
|
|
||
| NewIntFilterPerformanceCase = 400 | ||
| CloudPayloadSearchCase = 500 | ||
| FTSmsmarcoPerformance = 503 |
There was a problem hiding this comment.
6. 🟡 Medium — Public FTS case id is still MS MARCO-specific
Where: vectordb_bench/backend/cases.py:68
What: CaseType.FTSmsmarcoPerformance now represents the generic BM25 FTS workload. The dataset is already carried separately via CaseConfig.custom_case["dataset_with_size_type"], so run identity is not lost.
Why it matters: The public workload id still says MS MARCO even when the dataset is HotpotQA. That name leaks into CLI options, serialized results, and TestResult.read_file() compatibility handling, so new FTS datasets look like special cases under an MS MARCO id.
Fix:
- Rename the public case id/name to a generic workload name, e.g.
FTSBm25Performance, and use it in the new UI/CLI/result paths. The case is new, so no backward-compatibleFTSmsmarcoPerformancealias is needed. - Move the FTS-only IR dataset types, translators, and manager into one new file, e.g.
vectordb_bench/backend/fts_dataset.py(no package/directory split needed yet).
There was a problem hiding this comment.
Addressed the public case-id naming part in 623977b. I renamed the case enum/class and tracked call sites from FTSmsmarcoPerformance to FTSBm25Performance while keeping enum value 503, so the workload name is no longer MS MARCO-specific and can represent HotpotQA or future BM25 FTS datasets via dataset_with_size_type.
I intentionally did not move the FTS dataset code into a separate fts_dataset.py in this patch. I think FTS should be handled more appropriately in the future as part of a cleaner fusion with the existing vector dataset/API path, not by adding a separate file split now. For example, the runner path already shares FTS and vector flow in the same runner functions where practical, so a file move alone does not create a meaningful architectural improvement and would add import churn to this PR. Verification: git diff --check, compile checks for touched modules, and a direct check that CaseType.FTSBm25Performance.value == 503, HotpotQA instantiation works, and CLI custom-case parsing accepts FTSBm25Performance.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jamesgao-jpg The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
55cf439 to
8baab5c
Compare
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
92ad3c4 to
14b86fc
Compare
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Signed-off-by: jamesgao-jpg <james.gao@zilliz.com>
Context
VDBBench did not have a dedicated native full-text search benchmark path. This PR adds FTS as a first-class benchmark workload so BM25-based text search can be evaluated through the same task, runner, dataset, frontend, and result pipeline used by the rest of VDBBench.
Summary
Backends Covered
Testing Infra Touched
Datasets Supported
Metrics