
Refactor merge logic to use one abstraction #5622

@yanghua

Description


This issue tracks the discussion in #5117 (comment).

Copied here:

  • The partition data sources differ between the single-machine and distributed modes:
    • single-machine: the partition data is held in memory;
    • distributed: the partition data is persisted to storage;
  • File-writer initialization also differs:
    • single-machine: the writer is initialized inside the merge_partitions method;
    • distributed: it is implemented via init_writer_for_flat/pq/sq, one per vector index type;
  • The merge logic itself also differs, because the partition data sources differ.

To handle the different partition sources, it would be better to abstract them behind a PartitionSource trait. On top of that, we can introduce a UnifiedPartitionMerger that performs the merge generically, and a StorageWriterFactory to create the different writers.
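A minimal Rust sketch of how these three pieces could fit together. Only the names PartitionSource, UnifiedPartitionMerger and StorageWriterFactory come from this issue; their methods, the Row type, IndexType, IndexWriter, LoggingWriter, and the HashMap standing in for persisted storage are hypothetical and are not the project's actual API.

```rust
use std::collections::HashMap;

/// Simplified partition row: (row id, vector). Hypothetical type.
type Row = (u64, Vec<f32>);

/// Hypothetical abstraction over where partition data comes from.
trait PartitionSource {
    fn num_partitions(&self) -> usize;
    fn read_partition(&self, partition_id: usize) -> Vec<Row>;
}

/// Single-machine mode: partition data already lives in memory.
struct InMemoryPartitionSource {
    partitions: Vec<Vec<Row>>,
}

impl PartitionSource for InMemoryPartitionSource {
    fn num_partitions(&self) -> usize { self.partitions.len() }
    fn read_partition(&self, partition_id: usize) -> Vec<Row> {
        self.partitions[partition_id].clone()
    }
}

/// Distributed mode: each worker persisted its shard of every partition.
/// A map keyed by "worker/partition" stands in for the real storage layer.
struct StoragePartitionSource {
    store: HashMap<String, Vec<Row>>,
    num_workers: usize,
    num_partitions: usize,
}

impl PartitionSource for StoragePartitionSource {
    fn num_partitions(&self) -> usize { self.num_partitions }
    fn read_partition(&self, partition_id: usize) -> Vec<Row> {
        // Concatenate the shards written by each worker, in worker order.
        (0..self.num_workers)
            .filter_map(|w| self.store.get(&format!("{w}/{partition_id}")))
            .flat_map(|rows| rows.iter().cloned())
            .collect()
    }
}

#[derive(Clone, Copy, Debug)]
enum IndexType { Flat, Pq, Sq }

/// Hypothetical writer interface; real writers would emit index files.
trait IndexWriter {
    fn write_partition(&mut self, partition_id: usize, rows: &[Row]);
    fn finish(&mut self) -> Vec<String>;
}

/// Toy writer that only records what it was asked to write.
struct LoggingWriter { kind: &'static str, log: Vec<String> }

impl IndexWriter for LoggingWriter {
    fn write_partition(&mut self, partition_id: usize, rows: &[Row]) {
        self.log.push(format!("{} p{}: {} rows", self.kind, partition_id, rows.len()));
    }
    fn finish(&mut self) -> Vec<String> { std::mem::take(&mut self.log) }
}

/// Hypothetical factory replacing the per-type init_writer_for_* helpers.
struct StorageWriterFactory;

impl StorageWriterFactory {
    fn create(index_type: IndexType) -> Box<dyn IndexWriter> {
        // A real factory would return a distinct writer type per index type.
        let kind = match index_type {
            IndexType::Flat => "flat",
            IndexType::Pq => "pq",
            IndexType::Sq => "sq",
        };
        Box::new(LoggingWriter { kind, log: Vec::new() })
    }
}

/// Merges any PartitionSource, partition by partition, into one writer.
struct UnifiedPartitionMerger<S: PartitionSource> { source: S }

impl<S: PartitionSource> UnifiedPartitionMerger<S> {
    fn merge(&self, writer: &mut dyn IndexWriter) -> usize {
        let mut total_rows = 0;
        for pid in 0..self.source.num_partitions() {
            let rows = self.source.read_partition(pid);
            writer.write_partition(pid, &rows);
            total_rows += rows.len();
        }
        total_rows
    }
}

fn main() {
    let mut store = HashMap::new();
    store.insert("0/0".to_string(), vec![(0, vec![0.1])]);
    store.insert("1/0".to_string(), vec![(1, vec![0.2])]);
    store.insert("1/1".to_string(), vec![(2, vec![0.3])]);
    let source = StoragePartitionSource { store, num_workers: 2, num_partitions: 2 };
    let mut writer = StorageWriterFactory::create(IndexType::Sq);
    let total = UnifiedPartitionMerger { source }.merge(&mut *writer);
    println!("{total} rows: {:?}", writer.finish());
}
```

In this shape the merge loop depends only on the trait, so the single-machine and distributed paths share one merger; reading from memory vs. reading persisted shards lives in the two trait impls, and the writer choice per index type lives in the factory.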

The generic merge logic can be split into four common steps:

  • create the merger;
  • instantiate the merger;
  • merger#merge();
  • write the final metadata.
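One way to pin down these four steps is a shared lifecycle trait plus a single driver that runs them in order. This is a hypothetical sketch: PartitionMerger, VecMerger, MergeMetadata and run_merge are illustrative names, and reading "create" as constructing the merger and "instantiate" as initializing its writer is an interpretation of the list above.

```rust
/// Hypothetical summary produced by the final step.
#[derive(Debug, PartialEq)]
struct MergeMetadata {
    num_partitions: usize,
    num_rows: usize,
    partition_offsets: Vec<usize>,
}

/// Hypothetical lifecycle shared by every merger.
trait PartitionMerger {
    /// Step 2: acquire resources, e.g. open the output writer.
    fn init(&mut self) -> Result<(), String>;
    /// Step 3: copy every partition into the output.
    fn merge(&mut self) -> Result<(), String>;
    /// Step 4: persist and return the final metadata.
    fn write_metadata(&mut self) -> Result<MergeMetadata, String>;
}

/// Toy merger: concatenates row ids from each partition into one buffer.
struct VecMerger {
    partitions: Vec<Vec<u64>>,
    output: Option<Vec<u64>>, // stands in for an opened file writer
    offsets: Vec<usize>,
}

impl VecMerger {
    fn new(partitions: Vec<Vec<u64>>) -> Self {
        Self { partitions, output: None, offsets: Vec::new() }
    }
}

impl PartitionMerger for VecMerger {
    fn init(&mut self) -> Result<(), String> {
        self.output = Some(Vec::new());
        Ok(())
    }

    fn merge(&mut self) -> Result<(), String> {
        let out = self.output.as_mut().ok_or("merger not initialized")?;
        for part in &self.partitions {
            // Record where each partition starts in the merged output.
            self.offsets.push(out.len());
            out.extend_from_slice(part);
        }
        Ok(())
    }

    fn write_metadata(&mut self) -> Result<MergeMetadata, String> {
        let out = self.output.as_ref().ok_or("merger not initialized")?;
        Ok(MergeMetadata {
            num_partitions: self.partitions.len(),
            num_rows: out.len(),
            partition_offsets: self.offsets.clone(),
        })
    }
}

/// Runs the four steps in order.
fn run_merge<M: PartitionMerger>(create: impl FnOnce() -> M) -> Result<MergeMetadata, String> {
    let mut merger = create(); // 1. create merger
    merger.init()?;            // 2. instantiate merger
    merger.merge()?;           // 3. merger#merge()
    merger.write_metadata()    // 4. write final metadata
}

fn main() {
    match run_merge(|| VecMerger::new(vec![vec![1, 2], vec![], vec![3]])) {
        Ok(meta) => println!("{meta:?}"),
        Err(e) => eprintln!("merge failed: {e}"),
    }
}
```

Keeping the ordering in one driver means each concrete merger only implements the steps, and calling merge() before init() surfaces as an error instead of a silently skipped setup.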

Metadata


Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
