1. Overview and Design Goals
Simplify Curvine's data read/write behavior into two mount-point modes:
CacheMode: Reads and writes go through UFS (Underlying File System), with Curvine providing read-only cache acceleration. Data is written directly to UFS without passing through Curvine, so UFS remains fully visible to users. This mode serves as a read cache accelerator and a unified proxy for UFS.
FsMode: Reads and writes go through Curvine, which manages metadata independently and provides both read and write cache acceleration. UFS acts only as Curvine's cold storage layer and is transparent to users. This mode better supports POSIX read/write semantics and accelerates large-scale file processing.
| FsMode Goal | Description |
| --- | --- |
| Unified Entry | All operations are performed through Curvine paths; applications do not access UFS directly. |
| POSIX Semantics | Supports complete POSIX file system semantics (directory tree, random read/write, renaming, atomicity, strong consistency, etc.). |
| Tiered Storage | The Curvine layer stores hot data (metadata + optional local blocks); UFS stores persistent/cold data replicas. |
| Background Flushing | The Master periodically submits Load/Dump tasks to flush Curvine data to UFS based on operations and policies. |
| UFS-Only Replicas | Supports data that exists only in UFS (e.g., S3), with on-demand backfilling or direct reads during access. |
2. FsMode
2.1 Semantics
FsMode is the mount mode for tiered file system semantics, defined as follows:
All I/O goes through Curvine: applications use only Curvine paths and never access UFS directly. If UFS is read or written while bypassing Curvine, data consistency is not guaranteed; users should avoid such operations whenever possible.
Write path: Data is first written to Curvine (metadata + blocks), and the Master-side policy periodically submits Load/Dump tasks in the background to flush the data to UFS (e.g., S3).
Read path: Priority is given to reading from Curvine; if the data has been evicted or only exists in UFS, the data is backfilled via Load (UFS→Curvine) or read directly from UFS.
Replica state: Data may exist only in UFS, in which case the UFS copy is the file's sole replica.
Metadata synchronization: The mount operation synchronizes all metadata of the directory once; no active full metadata synchronization is performed afterward. A `reload-meta` command can be provided to re-synchronize mount-point metadata (it only imports metadata for files that exist solely in UFS into Curvine; other files are left untouched).
Cache lazy loading mode: If a file is read whose metadata is not in Curvine, the read fails with a "file does not exist" error even if the file exists in UFS. Users can manually run `reload-meta` to synchronize the metadata and then retry the read.
Fault scenarios:
Master failure: Users can only access data through the UFS interface.
Worker failure: For multi-replica data, other replicas remain accessible; for single-replica data, direct reading from UFS is required.
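The read-path and lazy-metadata semantics above can be sketched as follows. This is a minimal illustration only; the names (`FileNotFoundInCurvine`, the dict-based stores) are hypothetical and do not reflect Curvine's actual API.

```python
# Hypothetical sketch of FsMode read semantics: serve from Curvine first,
# fall back to the UFS replica, and fail fast when metadata was never
# synchronized (lazy-load mode).

class FileNotFoundInCurvine(Exception):
    """Raised when Curvine has no metadata for the path (lazy-load mode)."""

def read(path, curvine_meta, curvine_blocks, ufs):
    if path not in curvine_meta:
        # Lazy loading: no metadata in Curvine => "file does not exist",
        # even if a UFS replica exists. The user must run reload-meta first.
        raise FileNotFoundInCurvine(path)
    if path in curvine_blocks:
        return curvine_blocks[path]      # hot data: served by Curvine
    # Blocks evicted, only the UFS replica remains: serve this read from
    # UFS; in the real system an async Load task backfills the cache.
    data = ufs[path]
    curvine_blocks[path] = data          # stand-in for the Load task
    return data
```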
2.2 Comparison between CacheMode and FsMode

| Aspect | CacheMode | FsMode |
| --- | --- | --- |
| Writing | Data is written directly to UFS (transparent write); applications are tightly coupled to UFS. | Data is written to Curvine and asynchronously flushed to UFS by the Job Manager; applications interact only with Curvine. |
| Metadata | Accessed directly from UFS. | Maintained by the Curvine Master and periodically synchronized to UFS, with Curvine taking precedence on conflict. Curvine does not actively detect UFS metadata changes made through other interfaces. |
| Reading | Reads from Curvine if the data is cached; otherwise an asynchronous task loads the data into Curvine and the current read goes directly to UFS. | Reads from Curvine first; if the data is absent, the Master marks the file as hot and backfills it to Curvine, with the current read performed directly from UFS. |
| Data Expiration | Deletes both metadata and data blocks in Curvine. | Deletes only data blocks in Curvine; metadata is retained. |
| Consistency | Constrained by UFS (e.g., S3's eventual consistency). | Strong consistency on the Curvine side; eventual consistency with UFS via asynchronous tasks. |
3. Core Processes of FsMode
3.1 Write Process
Metadata
All metadata operations (creation/deletion/renaming, etc.) access Curvine directly and are maintained by the Master.
The Master periodically synchronizes directory and file operations to UFS based on Curvine's metadata journal to keep the UFS namespace consistent with Curvine.
If conflicts are found between Curvine and UFS during synchronization (e.g., duplicate-named files, inconsistent directory structures), Curvine takes precedence and overwrites UFS directly.
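The metadata-synchronization steps above can be sketched as a journal replay onto the UFS namespace, with Curvine winning on conflict. This is an illustrative sketch only; `sync_metadata` and the tuple-based journal format are assumptions, not Curvine's real implementation.

```python
# Hypothetical sketch of the Master's metadata sync: replay Curvine's
# metadata journal onto the UFS namespace. On conflict (e.g., a
# duplicate-named entry already in UFS), Curvine takes precedence.

def sync_metadata(journal, ufs_ns):
    for op, path, *args in journal:
        if op == "create":
            ufs_ns[path] = args[0] if args else {}   # overwrite any conflict
        elif op == "delete":
            ufs_ns.pop(path, None)
        elif op == "rename":
            new_path = args[0]
            # Curvine wins: the rename target replaces any UFS entry there.
            ufs_ns[new_path] = ufs_ns.pop(path, {})
    return ufs_ns
```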
Data
Data written by applications is directly persisted to Curvine (blocks allocated by the Master, stored by Workers).
The Job Manager initiates Load tasks to flush data from Curvine to UFS; flushing can be journal-driven, time-scheduled, or event-driven.
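The write path above can be sketched as two decoupled steps: a synchronous write into Curvine and a background flush to UFS. The function names and dict-based stores below are illustrative assumptions, not Curvine's actual interfaces.

```python
# Hypothetical sketch of the FsMode write path: data lands in Curvine
# first; a background job later copies it to UFS.

def write(path, data, curvine, pending_flush):
    curvine[path] = data         # blocks allocated by the Master, stored by Workers
    pending_flush.append(path)   # the Master/Job Manager flushes later

def flush_to_ufs(curvine, pending_flush, ufs):
    # Runs in the background (journal-, time-, or event-driven).
    while pending_flush:
        path = pending_flush.pop(0)
        ufs[path] = curvine[path]
```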
3.2 Read Process
If data exists in Curvine: Read the data directly from Curvine.
If data does not exist in Curvine (e.g., only a UFS replica exists after TTL eviction):
The Master marks the file as hot data.
Submit a Load task (UFS→Curvine) to the Job Manager.
After the data is flushed to Curvine, read the data from Curvine (or read directly from UFS as selected by the implementation).
3.3 Data Expiration
Only delete data blocks on Curvine, without deleting metadata.
Metadata is retained so that directory listings and file attributes stay visible, and so that subsequent reads can submit Load tasks to backfill the data from UFS based on that metadata.
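The expiration rule above can be sketched in a few lines. The `expire` helper and the dict-based stores are hypothetical, used only to show the invariant: blocks go, metadata stays.

```python
# Hypothetical sketch of FsMode data expiration: evict cached blocks but
# deliberately keep metadata, so listings remain visible and later reads
# can trigger a Load task from UFS.

def expire(path, curvine_meta, curvine_blocks):
    curvine_blocks.pop(path, None)   # delete only the data blocks
    # curvine_meta is intentionally untouched
    return path in curvine_meta      # metadata must survive expiration
```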
4. Recommended Extension Points (Configuration and Code)
The following are design-level recommendations and do not involve specific code implementation details.
4.1 Mount Configuration
Add a new enumeration value `FsMode` to `WriteType`.
Specify `write_type=FsMode` (or an equivalent configuration) during mounting, indicating that the mount point uses tiered semantics: write to Curvine, flush to UFS in the background, and allow UFS-only replicas.
4.2 Master-Side Policies
Flushing policy: Based on the existing journal, TTL (Time-To-Live), or scheduled tasks, generate `LoadJobCommand` (source=Curvine, target=UFS) for paths under FsMode mounts, either periodically or event-driven, reusing the existing `submit_load_job` and Worker Load flows.
UFS-only replicas: Reuse the existing TTL + Export and block eviction logic; FsMode only clarifies this behavior as the "expected" storage state.
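A TTL-driven flushing policy like the one described above can be sketched as a planner that scans FsMode mount points and emits `LoadJobCommand`-like descriptors. The function name, dict shapes, and URI scheme below are illustrative assumptions only.

```python
# Hypothetical sketch of a Master-side flushing policy: for files under
# an FsMode mount whose age exceeds the TTL, plan a LoadJobCommand with
# source=Curvine and target=UFS (to be handed to submit_load_job).

def plan_flush_jobs(files, now, ttl_secs, fs_mode_mounts):
    jobs = []
    for path, mtime in files.items():
        mount = next((m for m in fs_mode_mounts if path.startswith(m)), None)
        if mount is not None and now - mtime >= ttl_secs:
            jobs.append({
                "cmd": "LoadJobCommand",
                "source": "curvine://" + path,   # flush direction: Curvine -> UFS
                "target": "ufs://" + path,
            })
    return jobs
```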
4.3 Client
`UnifiedFileSystem`: When `write_type == FsMode`, data is first written to Curvine, with flushing driven by the Master side. After completion, the client can optionally trigger a single `submit_load` (compatible with existing behavior) or rely entirely on the Master's periodic flushing.
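The client-side dispatch can be sketched as follows. `UnifiedFileSystemSketch` and its dict-backed stores are hypothetical stand-ins, not Curvine's real `UnifiedFileSystem` API.

```python
# Hypothetical sketch of client-side dispatch: FsMode writes go only to
# Curvine (the Master flushes to UFS later); CacheMode writes go
# straight through to UFS.

class UnifiedFileSystemSketch:
    def __init__(self, write_type, curvine, ufs):
        self.write_type = write_type
        self.curvine = curvine
        self.ufs = ufs

    def write(self, path, data):
        if self.write_type == "FsMode":
            # Written to Curvine only; background flushing is Master-driven.
            # The client may optionally submit a single load job here.
            self.curvine[path] = data
        else:
            # CacheMode: transparent write-through to UFS.
            self.ufs[path] = data
```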