
[RFC] Enhance Filesystem Tests: Comprehensive Stress and Advanced Testing Coverage #1309


Abstract

The current LTP filesystem test suite (testcases/kernel/fs/) provides solid functional validation of core filesystem operations. However, there's significant opportunity to expand test coverage to include intensive stress testing, concurrent operations, boundary conditions, and advanced filesystem features that are critical for production environments.

This proposal outlines a comprehensive enhancement plan to expand the LTP filesystem tests into a more robust testing suite that can:

  • Detect race conditions through concurrent I/O operations
  • Validate filesystem behavior at boundary limits (max file sizes, path lengths, inode exhaustion)
  • Test corruption recovery and fault tolerance
  • Measure performance and scalability characteristics
  • Verify security and permission models under stress
  • Test filesystem-specific advanced features (CoW, compression, encryption)

The enhancements are organized into 12 major categories with 100+ specific test scenarios, complete with implementation examples and a phased rollout plan.


Motivation

Current Test Coverage

The existing filesystem tests effectively validate:

  • Basic file operations (read, write, open, close)
  • Directory operations (mkdir, rmdir, readdir)
  • File attributes (chmod, chown, utime)
  • Links (symlink, hardlink)
  • Special files (pipes, sockets, device files)

Opportunity for Enhancement

Production filesystems face scenarios that aren't fully covered:

  • Hundreds of concurrent processes accessing files simultaneously
  • Multi-terabyte files and millions of small files
  • Sudden power losses and hardware failures
  • Memory pressure and resource exhaustion
  • Security attacks and permission boundary testing
  • 24/7 sustained workloads causing fragmentation

Covering these scenarios would let LTP catch reliability and robustness bugs that purely functional tests miss.


Proposed Enhancement Categories

1. Concurrent I/O and Race Condition Testing

Test IDs: fs_concurrent_01 through fs_concurrent_10

Key Scenarios:

  • Multiple threads reading/writing same file simultaneously
  • Concurrent directory operations (create/delete/rename)
  • Parallel file creation in same directory (stress inode allocation)
  • Race between unlink and open operations
  • Fork bomb stress testing
  • Lock contention measurement

Value: Detects data corruption, deadlocks, and race conditions that only appear under concurrent load.
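
As a rough illustration of the kind of invariant a concurrency test in this category could check, the sketch below forks several processes that all try to create the same file with O_CREAT | O_EXCL and counts how many succeed. Exactly one should. It is written in plain POSIX C rather than against the LTP API, and the helper name `excl_race_winners`, the `MAX_RACERS` cap, and the `/tmp` location are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_RACERS 256

/*
 * Hypothetical helper: fork nprocs children that all try to create the
 * same file with O_CREAT | O_EXCL inside a fresh temporary directory.
 * Returns the number of children whose open() succeeded, or -1 on any
 * setup error or unexpected child failure.
 */
int excl_race_winners(int nprocs)
{
    char dir[] = "/tmp/excl_race_XXXXXX";
    char path[64];
    pid_t pids[MAX_RACERS];
    int i, started = 0, winners = 0, failed = 0;

    if (nprocs < 1 || nprocs > MAX_RACERS || !mkdtemp(dir))
        return -1;
    snprintf(path, sizeof(path), "%s/target", dir);

    for (i = 0; i < nprocs; i++) {
        pids[i] = fork();
        if (pids[i] < 0) {
            failed = 1;
            break;
        }
        if (pids[i] == 0) {
            int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

            if (fd >= 0)
                _exit(0);                   /* this child created the file */
            _exit(errno == EEXIST ? 1 : 2); /* 1: lost the race, 2: other error */
        }
        started++;
    }

    /* Reap every child that was started, even if a later fork() failed. */
    for (i = 0; i < started; i++) {
        int status;

        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status))
            failed = 1;
        else if (WEXITSTATUS(status) == 0)
            winners++;
        else if (WEXITSTATUS(status) != 1)
            failed = 1;
    }

    unlink(path);
    rmdir(dir);
    return failed ? -1 : winners;
}
```

A call such as `excl_race_winners(16)` should return 1 on a correct filesystem. A real stress test would probably also hold the children at a barrier so that the open() calls overlap more tightly; this sketch does not do that.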


2. Filesystem Boundary and Limit Testing

Test IDs: fs_maxsize_01, fs_pathlen_01, fs_inode_exhaust_01, fs_diskfull_01

Key Scenarios:

  • Maximum file size testing (16 TiB for ext4 with 4 KiB blocks, 8 EiB for XFS)
  • PATH_MAX boundary testing (4096 bytes)
  • Deeply nested directories (directory depth limits)
  • Symlink chain length limits (SYMLOOP_MAX; Linux fails with ELOOP after 40 followed links)
  • Filename length limits (NAME_MAX = 255)
  • Inode exhaustion testing
  • Disk space exhaustion (100% full filesystem)

Value: Ensures filesystems handle boundary conditions gracefully and return appropriate errors.
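
To make the filename-length case concrete, here is a minimal sketch, assuming plain POSIX C and a temporary directory under `/tmp`. It asks the filesystem for its limit with pathconf() instead of hard-coding 255, then checks that a name of exactly that length can be created and that one more byte fails with ENAMETOOLONG. The helper name `check_name_max` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: in a fresh temporary directory, check that a file
 * name of exactly the directory's name limit can be created and that a
 * name one byte longer fails with ENAMETOOLONG.
 * Returns 0 if both hold, 1 if either check fails, -1 on setup error.
 */
int check_name_max(void)
{
    char dir[] = "/tmp/name_max_XXXXXX";
    char *path;
    long limit;
    size_t dirlen;
    int fd, result = 0;

    if (!mkdtemp(dir))
        return -1;

    limit = pathconf(dir, _PC_NAME_MAX);
    dirlen = strlen(dir);
    /* Room for "<dir>/", limit + 1 name bytes, and the terminating NUL. */
    if (limit < 1 || !(path = malloc(dirlen + 1 + limit + 2))) {
        rmdir(dir);
        return -1;
    }

    memcpy(path, dir, dirlen);
    path[dirlen] = '/';
    memset(path + dirlen + 1, 'a', limit + 1);
    path[dirlen + 1 + limit + 1] = '\0';

    /* One byte over the limit: creation must be rejected. */
    fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        unlink(path);
        result = 1;
    } else if (errno != ENAMETOOLONG) {
        result = 1;
    }

    /* Exactly at the limit: creation must succeed. */
    path[dirlen + 1 + limit] = '\0';
    fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        result = 1;
    } else {
        close(fd);
        unlink(path);
    }

    free(path);
    rmdir(dir);
    return result;
}
```

Querying the limit at run time keeps the check valid on filesystems whose name limit is not 255.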


3. Filesystem Corruption and Recovery Testing

Test IDs: fs_powerloss_01, fs_corruption_01, fs_fsck_stress_01

Key Scenarios:

  • Sudden power loss simulation (using dm-flakey)
  • Journal recovery testing (ext4, xfs)
  • fsck effectiveness validation
  • Bit flip and corruption detection
  • Checksum validation (btrfs, zfs)
  • Orphaned inode handling
  • Cross-linked file detection

Value: Validates filesystem resilience and data integrity guarantees.


4. Memory Pressure and Resource Exhaustion

Test IDs: fs_lowmem_01, fs_fd_exhaust_01, fs_buffer_exhaust_01

Key Scenarios:

  • Filesystem operations under low memory conditions
  • Page cache eviction behavior
  • Buffer cache stress testing
  • Memory allocation failure handling
  • File descriptor exhaustion (EMFILE)
  • Kernel buffer exhaustion
  • Memory cgroup limit testing

Value: Tests filesystem behavior when system resources are constrained.
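
The descriptor-exhaustion scenario can be sketched without root privileges by lowering the soft RLIMIT_NOFILE for the calling process. The example below is an illustration in plain POSIX C, not LTP code; the helper name `exhaust_fds` and the limit of 64 are assumptions. It opens `/dev/null` until open() fails, checks that the error is EMFILE, and confirms that open() works again after the descriptors are closed.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define FD_TEST_LIMIT 64

/*
 * Hypothetical helper: lower the soft RLIMIT_NOFILE to FD_TEST_LIMIT, open
 * /dev/null until open() fails, and check that the failure is EMFILE and
 * that open() works again once the descriptors are closed.
 * Returns the number of descriptors opened before EMFILE, or -1 on error.
 */
int exhaust_fds(void)
{
    struct rlimit saved, lowered;
    int fds[FD_TEST_LIMIT];
    int i, n = 0, fd, err = 0, ok;

    if (getrlimit(RLIMIT_NOFILE, &saved) < 0)
        return -1;
    lowered = saved;
    lowered.rlim_cur = FD_TEST_LIMIT;
    if (setrlimit(RLIMIT_NOFILE, &lowered) < 0)
        return -1;

    /* Consume descriptors until the kernel refuses to hand out more. */
    while (n < FD_TEST_LIMIT) {
        fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            err = errno;
            break;
        }
        fds[n++] = fd;
    }

    for (i = 0; i < n; i++)
        close(fds[i]);

    /* After closing, a descriptor must be available again. */
    fd = open("/dev/null", O_RDONLY);
    ok = (err == EMFILE && fd >= 0);
    if (fd >= 0)
        close(fd);

    setrlimit(RLIMIT_NOFILE, &saved);
    return ok ? n : -1;
}
```

Because stdin, stdout, and stderr usually occupy three descriptors, the returned count is typically a little below the limit.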


5. Performance and Scalability Testing

Test IDs: fs_iopattern_01, fs_metadata_perf_01, fs_scale_01

Key Scenarios:

  • Sequential vs random I/O patterns
  • Small file (4KB) vs large file (1GB+) performance
  • Direct I/O vs buffered I/O comparison
  • Metadata operation performance (create/delete/stat 1M files)
  • Throughput, IOPS, latency measurement (p50, p95, p99)
  • Scalability with increasing file counts (1K to 10M files)
  • CPU and memory utilization profiling

Value: Provides performance baselines and detects regressions.
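
For the latency figures (p50, p95, p99), one simple option is the nearest-rank percentile over the collected samples. The sketch below shows that calculation only; how the samples are collected is out of scope. The function name `latency_percentile` and the choice of nearest-rank over an interpolating method are assumptions for illustration.

```c
#include <stdlib.h>

/* Ascending comparison for qsort() on unsigned long long values. */
static int cmp_u64(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;

    return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile of n latency samples (for example, nanoseconds).
 * pct is an integer in 1..100. The samples are sorted in place.
 * Returns 0 when n is 0 or pct is out of range.
 */
unsigned long long latency_percentile(unsigned long long *samples,
                                      size_t n, unsigned int pct)
{
    size_t rank;

    if (n == 0 || pct < 1 || pct > 100)
        return 0;

    qsort(samples, n, sizeof(*samples), cmp_u64);

    /* rank = ceil(n * pct / 100), which always lies in 1..n */
    rank = (n * pct + 99) / 100;
    return samples[rank - 1];
}
```

With the samples 1 through 100, this returns 50, 95, and 99 for p50, p95, and p99 respectively.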


6. Security and Permission Testing

Test IDs: fs_perm_boundary_01, fs_privesc_01, fs_namespace_01

Key Scenarios:

  • All permission combinations (000 to 777)
  • setuid/setgid/sticky bit behavior
  • ACL (Access Control List) edge cases
  • Capability-based access testing
  • Privilege escalation detection (TOCTOU vulnerabilities)
  • Symlink attack prevention
  • Namespace and container isolation (mount/user namespaces)

Value: Ensures filesystem security models are robust against attacks.
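
The "all permission combinations" scenario could start from a sweep like the one sketched here: apply each mode from 0000 to 0777 to a temporary file and confirm that fstat() reports the same bits. This is plain POSIX C for illustration; the helper name `count_mode_mismatches` is hypothetical, and a real test would also need to check access decisions from a second, unprivileged user.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply every mode from 0000 to 0777 to a temporary
 * file with fchmod() and confirm that fstat() reports the same bits.
 * Returns the number of mismatching modes, or -1 on setup error.
 */
int count_mode_mismatches(void)
{
    char path[] = "/tmp/mode_sweep_XXXXXX";
    struct stat st;
    mode_t mode;
    int fd, mismatches = 0;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    for (mode = 0; mode <= 0777; mode++) {
        /* Mask with 07777 so stray setuid/setgid/sticky bits also count. */
        if (fchmod(fd, mode) < 0 || fstat(fd, &st) < 0 ||
            (st.st_mode & 07777) != mode)
            mismatches++;
    }

    close(fd);
    unlink(path);
    return mismatches;
}
```

Using fchmod() on an already open descriptor lets the owner keep changing the mode even while the file is at 0000.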


7. Filesystem-Specific Advanced Features

Test IDs: fs_xattr_stress_01, fs_cow_01, fs_compress_01, fs_dedup_01, fs_encrypt_01

Key Scenarios:

  • Extended Attributes (xattr): Maximum size, namespace testing, concurrent operations
  • Copy-on-Write (CoW): Reflink, snapshots, clone operations (btrfs, zfs)
  • Compression: Transparent compression, performance impact (btrfs, zfs, f2fs)
  • Deduplication: Inline/background dedup, space savings (btrfs, zfs)
  • Encryption: Per-directory encryption, key management (ext4, f2fs with fscrypt)

Value: Validates advanced features that differentiate modern filesystems.


8. Long-Running Stress Tests

Test IDs: fs_sustained_01, fs_aging_01

Key Scenarios:

  • Sustained mixed workload for 24+ hours
  • Memory leak detection
  • Performance degradation monitoring
  • Filesystem aging effects
  • Fragmentation impact measurement
  • Create/delete cycles to fragment filesystem

Value: Detects issues that only appear after extended operation.


9. Error Injection and Fault Tolerance

Test IDs: fs_ioerror_01, fs_nfs_error_01

Key Scenarios:

  • I/O error injection (read/write errors using dm-error)
  • Network filesystem errors (NFS, CIFS)
  • Server unavailability simulation
  • Timeout handling
  • Stale file handle recovery
  • Lock recovery after failures

Value: Tests error handling and recovery mechanisms.


10. Special File and Feature Testing

Test IDs: fs_sparse_01, fs_mmap_stress_01, fs_directio_01, fs_aio_01

Key Scenarios:

  • Sparse Files: Extremely large sparse files (TB size), hole punching, SEEK_HOLE/SEEK_DATA
  • Memory-Mapped Files: Large mmap operations, concurrent mmap, msync behavior
  • Direct I/O: O_DIRECT with alignment requirements, mixed buffered/direct I/O
  • Asynchronous I/O: io_submit/io_getevents, io_uring (modern async I/O)
  • File locking: flock, fcntl locks, mandatory locking (removed in Linux 5.15, so relevant only on older kernels)

Value: Tests advanced I/O patterns used by databases and high-performance applications.
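
The sparse-file scenario can be sketched as follows, assuming Linux and plain POSIX C. The example creates a file with a 1 GiB apparent size, writes one 4 KiB block in the middle, and checks what lseek() reports for SEEK_DATA and SEEK_HOLE. The helper name `check_sparse_seek` and the sizes are assumptions. The checks are deliberately loose so that they also hold on filesystems that treat the whole file as data.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only if the headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

#define SPARSE_SIZE (1024L * 1024 * 1024) /* 1 GiB apparent size */
#define DATA_OFF    (512L * 1024 * 1024)  /* single data block in the middle */
#define DATA_LEN    4096

/*
 * Hypothetical helper: returns 0 if the file size and the SEEK_DATA /
 * SEEK_HOLE results are consistent with one data block at DATA_OFF,
 * 1 if they are not, and -1 on setup error.
 */
int check_sparse_seek(void)
{
    char path[] = "/tmp/sparse_XXXXXX";
    char block[DATA_LEN];
    struct stat st;
    off_t data, hole;
    int fd, result = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    memset(block, 0xA5, sizeof(block));

    if (ftruncate(fd, SPARSE_SIZE) == 0 &&
        pwrite(fd, block, DATA_LEN, DATA_OFF) == DATA_LEN &&
        fstat(fd, &st) == 0) {
        /* First data at or after offset 0: no later than the block written. */
        data = lseek(fd, 0, SEEK_DATA);
        /* First hole at or after the block: not inside the block itself. */
        hole = lseek(fd, DATA_OFF, SEEK_HOLE);

        result = !(st.st_size == SPARSE_SIZE &&
                   data >= 0 && data <= DATA_OFF &&
                   hole >= DATA_OFF + DATA_LEN && hole <= SPARSE_SIZE);
    }

    close(fd);
    unlink(path);
    return result;
}
```

A stricter test for a specific filesystem could additionally compare `st_blocks` against the apparent size to confirm that the hole is not allocated.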


11. Filesystem Monitoring and Observability

Test IDs: fs_stats_01, fs_notify_01

Key Scenarios:

  • Filesystem statistics validation (df, du accuracy)
  • statfs/statvfs correctness
  • Inode count reporting
  • Quota reporting accuracy
  • inotify/fanotify stress testing (many watches)
  • Event overflow handling

Value: Ensures monitoring tools receive accurate information.
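
A first step toward statvfs validation could be a set of sanity invariants that should hold on any filesystem, as sketched below. The helper name `check_statvfs_invariants` is hypothetical, and the invariants are my assumption of what a minimal check would include; comparing against `df` output or known usage would need more work.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical helper: check basic consistency of the counters that
 * statvfs() reports for the filesystem containing path.
 * Returns 0 if they are consistent, 1 if not, -1 if statvfs() fails.
 */
int check_statvfs_invariants(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) < 0)
        return -1;

    return !(sv.f_bsize > 0 &&
             sv.f_bfree <= sv.f_blocks &&   /* free blocks within total */
             sv.f_bavail <= sv.f_bfree &&   /* unprivileged share within free */
             sv.f_ffree <= sv.f_files);     /* free inodes within total */
}
```

For example, `check_statvfs_invariants("/tmp")` is expected to return 0, and a path that does not exist yields -1.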


12. Cross-Filesystem Testing

Test IDs: fs_migrate_01, fs_compare_01

Key Scenarios:

  • Filesystem migration testing (copy between different fs types)
  • Data integrity verification after migration
  • Metadata preservation (timestamps, permissions, xattrs)
  • Performance comparison across filesystems (ext4 vs xfs vs btrfs)
  • Feature support matrix generation
  • Error handling behavior comparison

Value: Helps users choose appropriate filesystems for their workloads.


Example Implementation

Here's a concrete example of a concurrent write stress test:

/*
 * Test: fs_concurrent_write_01
 * Description: Multiple threads write to disjoint regions of the same file
 * Expected: No data corruption; every block holds its writer's pattern
 */

#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#include "tst_test.h"
#include "tst_safe_prw.h"
#include "tst_safe_pthread.h"

#define NUM_THREADS 100
#define WRITES_PER_THREAD 1000
#define WRITE_SIZE 4096

static int fd;
static const char *filename = "testfile";

struct thread_data {
    int thread_id;
    int errors;
};

/* Byte offset of write number idx issued by thread thread_id. */
static off_t block_offset(int thread_id, int idx)
{
    return ((off_t)thread_id * WRITES_PER_THREAD + idx) * WRITE_SIZE;
}

static void *write_worker(void *arg)
{
    struct thread_data *data = arg;
    char buffer[WRITE_SIZE];
    int i;

    /* Fill the block with the thread ID so the verifier can identify the writer. */
    memset(buffer, data->thread_id, WRITE_SIZE);

    for (i = 0; i < WRITES_PER_THREAD; i++) {
        off_t offset = block_offset(data->thread_id, i);

        if (pwrite(fd, buffer, WRITE_SIZE, offset) != WRITE_SIZE)
            data->errors++;
    }

    return NULL;
}

static void verify_data_integrity(void)
{
    char buffer[WRITE_SIZE];
    int i, j, k;

    for (i = 0; i < NUM_THREADS; i++) {
        for (j = 0; j < WRITES_PER_THREAD; j++) {
            off_t offset = block_offset(i, j);

            SAFE_PREAD(1, fd, buffer, WRITE_SIZE, offset);

            /* Compare against the writer's ID, not just against buffer[0]. */
            for (k = 0; k < WRITE_SIZE; k++) {
                if (buffer[k] != (char)i) {
                    tst_res(TFAIL, "Data corruption at offset %lld",
                            (long long)offset);
                    return;
                }
            }
        }
    }

    tst_res(TPASS, "No data corruption detected");
}

static void run_test(void)
{
    pthread_t threads[NUM_THREADS];
    struct thread_data thread_data[NUM_THREADS];
    int i;

    /* O_TRUNC keeps repeated iterations from reading a previous run's data. */
    fd = SAFE_OPEN(filename, O_CREAT | O_TRUNC | O_RDWR, 0644);

    for (i = 0; i < NUM_THREADS; i++) {
        thread_data[i].thread_id = i;
        thread_data[i].errors = 0;
        SAFE_PTHREAD_CREATE(&threads[i], NULL, write_worker, &thread_data[i]);
    }

    for (i = 0; i < NUM_THREADS; i++) {
        SAFE_PTHREAD_JOIN(threads[i], NULL);
        if (thread_data[i].errors > 0)
            tst_res(TFAIL, "Thread %d had %d errors", i, thread_data[i].errors);
    }

    verify_data_integrity();
    SAFE_CLOSE(fd);
}

static struct tst_test test = {
    .test_all = run_test,
    .needs_tmpdir = 1,
};

Testing Infrastructure Requirements

Hardware

  • Multiple storage types (HDD, SSD, NVMe)
  • Systems with varying RAM (2GB to 128GB+)
  • Multi-core systems for concurrency testing

Software

  • Kernel versions: 4.19 LTS, 5.4 LTS, 5.10 LTS, 5.15 LTS, latest stable
  • Filesystem tools: mkfs, fsck, tune2fs, xfs_repair, btrfs-progs
  • Monitoring: iostat, vmstat, perf, ftrace
  • Fault injection: dm-flakey, dm-error, fail_make_request

CI/CD Integration

  • Automated test execution on kernel updates
  • Performance regression detection
  • Result tracking and historical comparison
  • HTML/PDF report generation

Expected Benefits

  1. Improved Filesystem Quality: Catch bugs before production deployment
  2. Performance Validation: Ensure filesystems meet performance requirements
  3. Regression Prevention: Detect performance and functional regressions early
  4. Security Hardening: Identify and prevent security vulnerabilities
  5. Better Documentation: Test cases serve as usage examples
  6. Industry Standard: Strengthen LTP as the comprehensive filesystem testing suite

Request for Feedback

I'm seeking feedback from LTP maintainers and the community on:

  1. Scope: Is this enhancement plan aligned with LTP's goals?
  2. Priorities: Which categories should be implemented first?
  3. Implementation: Should this be:
    • Integrated into existing testcases/kernel/fs/ structure?
    • Created as a new testcases/kernel/fs/stress/ directory?
    • Developed as a separate test suite?
  4. Contribution Process: What's the preferred way to contribute these tests?
    • Single large patch series?
    • Incremental patches by category?
    • RFC patches for review first?
  5. Compatibility: Any concerns about backward compatibility or test infrastructure changes?

I'm ready to contribute these enhancements and would appreciate guidance on the best approach to integrate them into LTP.


Author: Priya A (git id: priyama2)
Contact: priyama2@in.ibm.com
Date: 2026-04-28
