@bryndenZh

Purpose

In high-concurrency point-query scenarios on primary key tables, we observed high CPU usage, mainly caused by the deserialization overhead of deletion vector (DV) metadata. Currently, when a table has many partitions and buckets, reading the DV metadata for a single bucket requires reading and deserializing a large number of entries from the index manifest.

This PR introduces a bucket-level DV meta cache, which reduces CPU load and significantly improves QPS for single-bucket queries on primary key tables.
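
To illustrate the idea of caching DV metadata per bucket, here is a minimal sketch using only the JDK. It is not the code in this PR (which builds its cache with Caffeine). The names `BucketKey` and `BucketDvMetaCache`, the key fields, and the use of plain strings for deletion file descriptors are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical cache key; the real key fields in the PR may differ. */
final class BucketKey {
    final String tablePath;
    final String partition;
    final int bucket;

    BucketKey(String tablePath, String partition, int bucket) {
        this.tablePath = tablePath;
        this.partition = partition;
        this.bucket = bucket;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BucketKey)) {
            return false;
        }
        BucketKey other = (BucketKey) o;
        return bucket == other.bucket
                && tablePath.equals(other.tablePath)
                && partition.equals(other.partition);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tablePath, partition, bucket);
    }
}

/** LRU cache mapping a bucket to its (data file name -> deletion file descriptor) map. */
final class BucketDvMetaCache {
    private final LinkedHashMap<BucketKey, Map<String, String>> cache;

    BucketDvMetaCache(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache =
                new LinkedHashMap<BucketKey, Map<String, String>>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(
                            Map.Entry<BucketKey, Map<String, String>> eldest) {
                        return size() > maxEntries;
                    }
                };
    }

    /** Returns the cached metadata, invoking the loader only on a miss. */
    synchronized Map<String, String> get(
            BucketKey key, Function<BucketKey, Map<String, String>> loader) {
        Map<String, String> value = cache.get(key);
        if (value == null) {
            // The expensive step: stands in for reading and deserializing the index manifest.
            value = loader.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With a cache like this, repeated point queries against the same bucket pay the manifest deserialization cost once and hit the in-memory map afterwards. A bucket with no deletion vectors is cached as an empty map, so it is not reloaded on every query either.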

Tests

API and Format

Documentation


@Nullable
// Construct DataFile -> DeletionFile based on IndexFileMeta
public Map<String, DeletionFile> extractDeletionFileByMeta(
Contributor

Remove public and add @VisibleForTesting.

Author

Removing public will lead to compilation problems.

DELETION_VECTORS_INDEX,
partitionBuckets.stream().map(Pair::getLeft).collect(Collectors.toSet()));
Map<Pair<BinaryRow, Integer>, Map<String, DeletionFile>> result = new HashMap<>();
partitionBuckets.forEach(
Contributor

Just use partitionFileMetas.forEach?

Author

I think unnecessary buckets need to be filtered out?


public DVMetaCache(long maxElementSize) {
this.cache =
Caffeine.newBuilder().maximumSize(maxElementSize).executor(Runnable::run).build();
Contributor

@JingsongLi Oct 16, 2025

Should the limit be set to the max number of DVMetaCacheValue? And should it use softValues?

Author

I've added softValues. Do you mean to use the size of List<DVMetaCacheValue> as the weight? I think the List might be empty, and those cache entries would not be restricted because their weight would be 0.
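
One way to address the zero-weight concern is to give every entry a weight of at least 1. The following is a minimal JDK-only sketch of that idea, not the PR's implementation. `WeightedLruCache` and `weightOf` are hypothetical names. With Caffeine, the analogous approach would presumably be a weigher that returns `Math.max(1, list.size())` together with `maximumWeight`, but that wiring is an assumption and is not shown here.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** LRU cache bounded by total weight, where each list weighs at least 1. */
final class WeightedLruCache<K, V> {
    private final long maxWeight;
    private long currentWeight;
    // accessOrder = true keeps the least recently used entry first.
    private final LinkedHashMap<K, List<V>> entries = new LinkedHashMap<>(16, 0.75f, true);

    WeightedLruCache(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    /** An empty list still counts as 1, so empty entries remain subject to the limit. */
    static int weightOf(List<?> value) {
        return Math.max(1, value.size());
    }

    synchronized void put(K key, List<V> value) {
        List<V> previous = entries.put(key, value);
        if (previous != null) {
            currentWeight -= weightOf(previous);
        }
        currentWeight += weightOf(value);
        // Evict from the least recently used end until the total weight fits.
        Iterator<Map.Entry<K, List<V>>> it = entries.entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            Map.Entry<K, List<V>> eldest = it.next();
            currentWeight -= weightOf(eldest.getValue());
            it.remove();
        }
    }

    synchronized List<V> get(K key) {
        return entries.get(key);
    }

    synchronized int size() {
        return entries.size();
    }

    synchronized long weight() {
        return currentWeight;
    }
}
```

The floor of 1 trades a little accuracy for a guaranteed bound: buckets with no deletion vectors are slightly overcounted, but they can no longer accumulate without limit.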

@bryndenZh bryndenZh requested a review from JingsongLi October 24, 2025 08:07
2 participants