Collaborator

@TheAssembler1 TheAssembler1 commented Aug 6, 2025

Adds the following enum so the storage strategy can be selected:

typedef enum pdc_region_writeout_strategy {
    /**
     * Store data as multiple regions inside a single file.
     * Overlapping writes that are not fully contained append new regions
     * to the end of the file, with metadata tracking region locations.
     * Supports incremental updates without rewriting large parts of the file.
     */
    STORE_REGION_BY_REGION_SINGLE_FILE = 0,

    /**
     * Store the entire object as a single flat file.
     * Reads and writes operate by seeking directly within the file.
     * No region metadata bookkeeping; simpler but less flexible for partial updates.
     */
    STORE_FLATTENED_SINGLE_FILE,

    /**
     * Store each flattened region in its own separate file.
     * Enables independent file management per region.
     */
    STORE_FLATTENED_REGION_PER_FILE
} pdc_region_writeout_strategy;

STORE_REGION_BY_REGION_SINGLE_FILE is the default strategy. STORE_FLATTENED_REGION_PER_FILE is the new strategy, which stores each region of an object in a separate file. The size of the regions the object is sliced into is decided in:

/**
 * Used to decide how to split an object into chunks, each of which will be a file on disk
 */
static perr_t
PDC_shrink_file_dims(uint64_t *temp_file_dims, const uint64_t *obj_dims, uint8_t obj_ndim, size_t unit)

By default, it tries to slice the object into regions of at most 4 MB by iteratively halving the largest dimension until the region size is <= 4 MB.

The limit is set inside the PDC_shrink_file_dims function: uint64_t max_bytes_per_file = 4ULL * 1024 * 1024;
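For reviewers who want to reason about the resulting file sizes, the halving rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the names sketch_region_bytes, sketch_shrink_file_dims, sketch_file_count and SKETCH_MAX_BYTES_PER_FILE are hypothetical, the round-up when halving and the "first largest dimension wins" tie-break are assumptions, and the sketch assumes all dimensions are non-zero and ignores integer overflow for very large objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cap, mirroring the 4 MB value quoted in the description. */
#define SKETCH_MAX_BYTES_PER_FILE (4ULL * 1024 * 1024)

/* Hypothetical helper: bytes occupied by one region with the given dims. */
static uint64_t
sketch_region_bytes(const uint64_t *dims, uint8_t ndim, size_t unit)
{
    uint64_t bytes = (uint64_t)unit;
    for (uint8_t i = 0; i < ndim; i++)
        bytes *= dims[i];
    return bytes;
}

/*
 * Hypothetical stand-in for PDC_shrink_file_dims: start from the object
 * dims and halve the currently largest dimension until one region fits
 * under the cap. Returns 0 on success, -1 if no dimension can be halved
 * any further (for example when a single element exceeds the cap).
 */
static int
sketch_shrink_file_dims(uint64_t *file_dims, const uint64_t *obj_dims,
                        uint8_t ndim, size_t unit)
{
    assert(ndim > 0 && unit > 0);
    for (uint8_t i = 0; i < ndim; i++) {
        assert(obj_dims[i] > 0); /* assumption: no empty dimensions */
        file_dims[i] = obj_dims[i];
    }

    while (sketch_region_bytes(file_dims, ndim, unit) > SKETCH_MAX_BYTES_PER_FILE) {
        /* Find the largest dimension; ties go to the lowest index. */
        uint8_t largest = 0;
        for (uint8_t i = 1; i < ndim; i++)
            if (file_dims[i] > file_dims[largest])
                largest = i;

        if (file_dims[largest] <= 1)
            return -1;

        /* Round up so a dimension never drops to zero. */
        file_dims[largest] = (file_dims[largest] + 1) / 2;
    }
    return 0;
}

/* Number of per-region files: product of ceil(obj_dim / file_dim). */
static uint64_t
sketch_file_count(const uint64_t *obj_dims, const uint64_t *file_dims,
                  uint8_t ndim)
{
    uint64_t count = 1;
    for (uint8_t i = 0; i < ndim; i++)
        count *= (obj_dims[i] + file_dims[i] - 1) / file_dims[i];
    return count;
}
```

Under these assumptions, a 4096 x 4096 object with 4-byte elements (64 MB) shrinks to 1024 x 1024 regions of exactly 4 MB each, which gives 16 files. The real function may differ in tie-breaking and rounding.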

@TheAssembler1
Collaborator Author

We might want to compare the performance between the storage strategies before merging.

@TheAssembler1 TheAssembler1 marked this pull request as ready for review September 18, 2025 19:29
@TheAssembler1 TheAssembler1 requested a review from a team as a code owner September 18, 2025 19:29
@TheAssembler1 TheAssembler1 changed the title Draft: Region Per File Region Per File Sep 18, 2025
@TheAssembler1 TheAssembler1 changed the title Region Per File Region per file storage strategy Sep 18, 2025