Open
Description
To track this issue: #5117 (comment)
Copied here:
- The partition sources of the single-machine and distributed modes are different:
  - for single-machine: the data comes from memory;
  - for distributed: the data is persisted to storage;
- The file writer initialization is also different:
  - for single-machine: it is initialized in the `merge_partitions` method;
  - for distributed: it is implemented via `init_writer_for_flat/pq/sq` for the different vector index types;
- The merger logic is also different, because the partition data sources differ.
Since the partition sources differ, it would be better to abstract them behind a PartitionSource trait. With that in place, we can introduce a UnifiedPartitionMerger that performs a general merge, and a StorageWriterFactory that creates the different writers.
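To make the proposal concrete, here is a minimal sketch of how the trait might look. The names PartitionSource and UnifiedPartitionMerger come from the text above. Everything else is an assumption for illustration: the method signatures, the `InMemorySource` and `StoredSource` types, and the simplification of a partition to a list of row ids. A `HashMap` stands in for real storage.

```rust
use std::collections::HashMap;

/// Hypothetical simplification: a partition is just a list of row ids.
type Partition = Vec<u32>;

/// Hides where partition data comes from (assumed shape, not the project's actual trait).
trait PartitionSource {
    fn num_partitions(&self) -> usize;
    fn load(&self, id: usize) -> Result<Partition, String>;
}

/// Single-machine mode: partitions are already in memory.
struct InMemorySource {
    partitions: Vec<Partition>,
}

impl PartitionSource for InMemorySource {
    fn num_partitions(&self) -> usize {
        self.partitions.len()
    }

    fn load(&self, id: usize) -> Result<Partition, String> {
        self.partitions
            .get(id)
            .cloned()
            .ok_or_else(|| format!("no partition {}", id))
    }
}

/// Distributed mode: partitions were persisted earlier.
/// A HashMap keyed by path stands in for the storage layer.
struct StoredSource {
    paths: Vec<String>,
    storage: HashMap<String, Partition>,
}

impl PartitionSource for StoredSource {
    fn num_partitions(&self) -> usize {
        self.paths.len()
    }

    fn load(&self, id: usize) -> Result<Partition, String> {
        let path = self
            .paths
            .get(id)
            .ok_or_else(|| format!("no partition {}", id))?;
        self.storage
            .get(path)
            .cloned()
            .ok_or_else(|| format!("missing file {}", path))
    }
}

/// The merger depends only on the trait, so one implementation serves both modes.
struct UnifiedPartitionMerger<S: PartitionSource> {
    source: S,
}

impl<S: PartitionSource> UnifiedPartitionMerger<S> {
    /// Concatenates all partitions in id order.
    fn merge(&self) -> Result<Partition, String> {
        let mut merged = Vec::new();
        for id in 0..self.source.num_partitions() {
            merged.extend(self.source.load(id)?);
        }
        Ok(merged)
    }
}
```

With this shape, the merge loop has no knowledge of whether the data was in memory or on storage. Only the source implementation changes between the two modes.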
For a merger, the generic logic can be split into four common steps:
- create merger;
- instantiate merger;
- merger#merge();
- write final metadata;
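The four steps above could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation. The issue does not say how "create" and "instantiate" differ, so the sketch treats the second step as initializing the merger with a writer obtained from a StorageWriterFactory. The `Writer`, `Merger`, `IndexType`, and `run_merge` names, and the metadata format, are hypothetical.

```rust
/// Index types named in the issue (flat, pq, sq).
#[derive(Clone, Copy, Debug, PartialEq)]
enum IndexType {
    Flat,
    Pq,
    Sq,
}

/// Hypothetical writer: records what would be written to storage.
struct Writer {
    index_type: IndexType,
    rows: Vec<u32>,
    metadata: Option<String>,
}

/// Hypothetical factory: a single place that builds the writer for an index type.
struct StorageWriterFactory;

impl StorageWriterFactory {
    fn create(&self, index_type: IndexType) -> Writer {
        Writer { index_type: index_type, rows: Vec::new(), metadata: None }
    }
}

struct Merger {
    partitions: Vec<Vec<u32>>,
    writer: Option<Writer>,
}

impl Merger {
    /// Step 1: create the merger over the partition data.
    fn new(partitions: Vec<Vec<u32>>) -> Self {
        Merger { partitions: partitions, writer: None }
    }

    /// Step 2: initialize it, here by attaching a writer from the factory.
    fn init(&mut self, factory: &StorageWriterFactory, index_type: IndexType) {
        self.writer = Some(factory.create(index_type));
    }

    /// Step 3: merge all partitions into the writer; returns the row count.
    fn merge(&mut self) -> Result<usize, String> {
        let writer = self.writer.as_mut().ok_or("merger not initialized")?;
        for partition in &self.partitions {
            writer.rows.extend(partition.iter().copied());
        }
        Ok(writer.rows.len())
    }

    /// Step 4: write the final metadata and hand back the finished writer.
    fn finish(mut self) -> Result<Writer, String> {
        let mut writer = self.writer.take().ok_or("merger not initialized")?;
        writer.metadata = Some(format!("{:?}:{}", writer.index_type, writer.rows.len()));
        Ok(writer)
    }
}

/// Runs the four steps in order.
fn run_merge(partitions: Vec<Vec<u32>>, index_type: IndexType) -> Result<Writer, String> {
    let mut merger = Merger::new(partitions);
    merger.init(&StorageWriterFactory, index_type);
    merger.merge()?;
    merger.finish()
}
```

Keeping the steps as separate methods means the single-machine and distributed paths can share `run_merge` and differ only in what they pass in.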