[RFC] Enhance Filesystem Tests: Comprehensive Stress and Advanced Testing Coverage #1309

@pevik 

## Abstract

The current LTP filesystem test suite (`testcases/kernel/fs/`) provides solid functional validation of core filesystem operations. However, there's significant opportunity to expand test coverage to include **intensive stress testing**, **concurrent operations**, **boundary conditions**, and **advanced filesystem features** that are critical for production environments.

This proposal outlines a comprehensive enhancement plan to expand the LTP filesystem tests into a more robust testing suite that can:
- Detect race conditions through concurrent I/O operations
- Validate filesystem behavior at boundary limits (max file sizes, path lengths, inode exhaustion)
- Test corruption recovery and fault tolerance
- Measure performance and scalability characteristics
- Verify security and permission models under stress
- Test filesystem-specific advanced features (CoW, compression, encryption)

The enhancements are organized into **12 major categories** of test scenarios, illustrated with one implementation example for the concurrent write case.

---

## Motivation

### Current Test Coverage
The existing filesystem tests effectively validate:
- Basic file operations (read, write, open, close)
- Directory operations (mkdir, rmdir, readdir)
- File attributes (chmod, chown, utime)
- Links (symlink, hardlink)
- Special files (pipes, sockets, device files)

### Opportunity for Enhancement
Production filesystems face scenarios that aren't fully covered:
- Hundreds of concurrent processes accessing files simultaneously
- Multi-terabyte files and millions of small files
- Sudden power losses and hardware failures
- Memory pressure and resource exhaustion
- Security attacks and permission boundary testing
- 24/7 sustained workloads causing fragmentation

Expanding test coverage to include these scenarios will significantly improve filesystem reliability and robustness.

---

## Proposed Enhancement Categories

### 1. Concurrent I/O and Race Condition Testing
**Test IDs**: `fs_concurrent_01` through `fs_concurrent_10`

**Key Scenarios**:
- Multiple threads reading/writing same file simultaneously
- Concurrent directory operations (create/delete/rename)
- Parallel file creation in same directory (stress inode allocation)
- Race between unlink and open operations
- Fork-heavy stress testing (many short-lived processes sharing files), bounded so that it cannot take down the test host
- Lock contention measurement

**Value**: Detects data corruption, deadlocks, and race conditions that only appear under concurrent load.
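
To make the "parallel file creation in same directory" scenario more concrete, here is a minimal sketch in plain POSIX C rather than the LTP API. It assumes that a useful invariant is "exactly one `O_CREAT | O_EXCL` open succeeds when many processes race for the same name". The function name `count_excl_winners` and the `MAX_PROCS` cap are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PROCS 64 /* hypothetical cap for this sketch */

/*
 * Fork nprocs children that all try to create `path` with O_CREAT | O_EXCL.
 * Returns the number of children whose open() succeeded (expected: 1),
 * or -1 if fork() failed or a child saw an error other than EEXIST.
 */
int count_excl_winners(const char *path, int nprocs)
{
    pid_t pids[MAX_PROCS];
    int spawned = 0, winners = 0, bad = 0, i, status;

    assert(path != NULL && nprocs > 0 && nprocs <= MAX_PROCS);
    unlink(path); /* start from a clean slate; ENOENT is fine here */

    for (i = 0; i < nprocs; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            bad = 1;
            break;
        }
        if (pid == 0) {
            int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

            if (fd >= 0)
                _exit(0); /* this child created the file */
            _exit(errno == EEXIST ? 1 : 2);
        }
        pids[spawned++] = pid;
    }

    /* Reap every child and classify its exit status. */
    for (i = 0; i < spawned; i++) {
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status))
            bad = 1;
        else if (WEXITSTATUS(status) == 0)
            winners++;
        else if (WEXITSTATUS(status) != 1)
            bad = 1;
    }

    unlink(path);
    return bad ? -1 : winners;
}
```

A real test would probably add a start barrier so the children hit `open()` at the same moment, and would repeat the race many times. The invariant being checked stays the same.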

---

### 2. Filesystem Boundary and Limit Testing
**Test IDs**: `fs_maxsize_01`, `fs_pathlen_01`, `fs_inode_exhaust_01`, `fs_diskfull_01`

**Key Scenarios**:
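- Maximum file size testing (16 TiB for ext4 with 4 KiB blocks, 8 EiB for XFS)
- PATH_MAX boundary testing (4096 bytes, including the terminating NUL)
- Deeply nested directories (directory depth limits)
- Symlink chain length limits (SYMLOOP_MAX; Linux returns `ELOOP` after following 40 symlinks in one lookup)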
- Filename length limits (NAME_MAX = 255)
- Inode exhaustion testing
- Disk space exhaustion (100% full filesystem)

**Value**: Ensures filesystems handle boundary conditions gracefully and return appropriate errors.
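
As an illustration of the filename length boundary, the sketch below queries the limit with `pathconf(_PC_NAME_MAX)` instead of hard-coding 255. It then checks that a name of exactly that length can be created and that one byte more fails with `ENAMETOOLONG`. It is plain POSIX C, and the helper names `try_create` and `check_name_max` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try to create dir/<len times 'a'>; return 0 on success, else the errno. */
static int try_create(const char *dir, size_t len)
{
    size_t dlen = strlen(dir);
    char *path = malloc(dlen + 1 + len + 1);
    int fd, err = 0;

    if (!path)
        return ENOMEM;

    memcpy(path, dir, dlen);
    path[dlen] = '/';
    memset(path + dlen + 1, 'a', len);
    path[dlen + 1 + len] = '\0';

    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) {
        err = errno;
    } else {
        close(fd);
        unlink(path);
    }

    free(path);
    return err;
}

/* Returns 0 if the NAME_MAX boundary behaves as expected, -1 otherwise. */
int check_name_max(const char *dir)
{
    long name_max;

    assert(dir != NULL);

    name_max = pathconf(dir, _PC_NAME_MAX);
    if (name_max <= 0)
        return -1;

    /* A name of exactly name_max bytes must be accepted. */
    if (try_create(dir, (size_t)name_max) != 0)
        return -1;

    /* One byte more must be rejected with ENAMETOOLONG. */
    if (try_create(dir, (size_t)name_max + 1) != ENAMETOOLONG)
        return -1;

    return 0;
}
```

The same pattern (probe the limit, test at the limit, test one past it) should carry over to the PATH_MAX and symlink chain scenarios.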

---

### 3. Filesystem Corruption and Recovery Testing
**Test IDs**: `fs_powerloss_01`, `fs_corruption_01`, `fs_fsck_stress_01`

**Key Scenarios**:
- Sudden power loss simulation (using dm-flakey)
- Journal recovery testing (ext4, xfs)
- fsck effectiveness validation
- Bit flip and corruption detection
- Checksum validation (btrfs, zfs)
- Orphaned inode handling
- Cross-linked file detection

**Value**: Validates filesystem resilience and data integrity guarantees.

---

### 4. Memory Pressure and Resource Exhaustion
**Test IDs**: `fs_lowmem_01`, `fs_fd_exhaust_01`, `fs_buffer_exhaust_01`

**Key Scenarios**:
- Filesystem operations under low memory conditions
- Page cache eviction behavior
- Buffer cache stress testing
- Memory allocation failure handling
- File descriptor exhaustion (EMFILE)
- Kernel buffer exhaustion
- Memory cgroup limit testing

**Value**: Tests filesystem behavior when system resources are constrained.
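
The file descriptor exhaustion scenario can be reproduced without root by lowering the soft `RLIMIT_NOFILE` for the test process. The following is a sketch under that assumption, in plain POSIX C. `check_emfile` is a hypothetical name, and the limit passed in is an arbitrary small value.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Lower the soft RLIMIT_NOFILE to `limit`, open `path` until open() fails,
 * then restore the old limit. Returns 0 if the failure was EMFILE, -1 if
 * something else went wrong or the limit was never enforced.
 */
int check_emfile(const char *path, rlim_t limit)
{
    struct rlimit old, lowered;
    size_t n = 0, i;
    int result = -1;
    int *fds;

    assert(path != NULL && limit > 0);

    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return -1;
    if (limit > old.rlim_cur)
        limit = old.rlim_cur;

    /* One spare slot so an unenforced limit can be detected. */
    fds = calloc((size_t)limit + 1, sizeof(*fds));
    if (!fds)
        return -1;

    lowered = old;
    lowered.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) {
        free(fds);
        return -1;
    }

    while (n <= (size_t)limit) {
        int fd = open(path, O_RDONLY);

        if (fd < 0) {
            if (errno == EMFILE)
                result = 0;
            break;
        }
        fds[n++] = fd;
    }

    for (i = 0; i < n; i++)
        close(fds[i]);
    setrlimit(RLIMIT_NOFILE, &old); /* restore the original soft limit */
    free(fds);

    return result;
}
```

A follow-up check worth adding in a real test is that the process can open files again once descriptors are released, which is what "graceful" handling means here.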

---

### 5. Performance and Scalability Testing
**Test IDs**: `fs_iopattern_01`, `fs_metadata_perf_01`, `fs_scale_01`

**Key Scenarios**:
- Sequential vs random I/O patterns
- Small file (4KB) vs large file (1GB+) performance
- Direct I/O vs buffered I/O comparison
- Metadata operation performance (create/delete/stat 1M files)
- Throughput, IOPS, latency measurement (p50, p95, p99)
- Scalability with increasing file counts (1K to 10M files)
- CPU and memory utilization profiling

**Value**: Provides performance baselines and detects regressions.
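
For the latency figures (p50, p95, p99), one possible definition is the nearest-rank percentile over the collected per-operation latencies. The sketch below assumes that definition and nanosecond samples. The name `latency_percentile` is hypothetical, and other percentile definitions (for example, interpolated ones) would give slightly different values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for uint64_t that avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/*
 * Sort the samples in place and return the nearest-rank p-th percentile,
 * with 1 <= p <= 100. The rank is ceil(n * p / 100), computed in integers.
 */
uint64_t latency_percentile(uint64_t *samples_ns, size_t n, unsigned int p)
{
    size_t rank;

    assert(samples_ns != NULL && n > 0 && p >= 1 && p <= 100);

    qsort(samples_ns, n, sizeof(*samples_ns), cmp_u64);
    rank = (n * p + 99) / 100;

    return samples_ns[rank - 1];
}
```

Integer arithmetic keeps the helper free of `libm`, which avoids an extra link dependency in a test Makefile.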

---

### 6. Security and Permission Testing
**Test IDs**: `fs_perm_boundary_01`, `fs_privesc_01`, `fs_namespace_01`

**Key Scenarios**:
- All permission combinations (000 to 777)
- setuid/setgid/sticky bit behavior
- ACL (Access Control List) edge cases
- Capability-based access testing
- Privilege escalation detection (TOCTOU vulnerabilities)
- Symlink attack prevention
- Namespace and container isolation (mount/user namespaces)

**Value**: Ensures filesystem security models are robust against attacks.
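
The "all permission combinations" scenario could start from a simple round-trip check: set every mode from 0000 to 0777 with `chmod()` and confirm that `stat()` reports the same bits. This is only a sketch in plain POSIX C. `count_mode_mismatches` and the scratch file name are hypothetical, and it deliberately leaves out setuid/setgid/sticky bits and actual access checks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a scratch file in `dir`, apply every mode in 0000..0777 and
 * count how many did not read back identically. Returns -1 on error.
 */
int count_mode_mismatches(const char *dir)
{
    char path[4096];
    struct stat st;
    unsigned int mode;
    int fd, n, mismatches = 0;

    assert(dir != NULL);

    n = snprintf(path, sizeof(path), "%s/fs_perm_XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    for (mode = 0; mode <= 0777; mode++) {
        /* The owner may chmod() the file even when its mode is 0000. */
        if (chmod(path, (mode_t)mode) != 0 || stat(path, &st) != 0) {
            mismatches = -1;
            break;
        }
        if ((st.st_mode & 07777) != (mode_t)mode)
            mismatches++;
    }

    unlink(path);
    return mismatches;
}
```

A fuller test would also try to open the file as an unprivileged user for each mode and compare the result with what the mode bits allow.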

---

### 7. Filesystem-Specific Advanced Features
**Test IDs**: `fs_xattr_stress_01`, `fs_cow_01`, `fs_compress_01`, `fs_dedup_01`, `fs_encrypt_01`

**Key Scenarios**:
- **Extended Attributes (xattr)**: Maximum size, namespace testing, concurrent operations
- **Copy-on-Write (CoW)**: Reflink and clone operations (btrfs, XFS), snapshots (btrfs, zfs)
- **Compression**: Transparent compression, performance impact (btrfs, zfs, f2fs)
- **Deduplication**: Out-of-band dedup via `FIDEDUPERANGE` (btrfs, XFS), inline dedup (zfs), space savings
- **Encryption**: Per-directory encryption, key management (ext4, f2fs with fscrypt)

**Value**: Validates advanced features that differentiate modern filesystems.

---

### 8. Long-Running Stress Tests
**Test IDs**: `fs_sustained_01`, `fs_aging_01`

**Key Scenarios**:
- Sustained mixed workload for 24+ hours
- Memory leak detection
- Performance degradation monitoring
- Filesystem aging effects
- Fragmentation impact measurement
- Create/delete cycles to fragment filesystem

**Value**: Detects issues that only appear after extended operation.

---

### 9. Error Injection and Fault Tolerance
**Test IDs**: `fs_ioerror_01`, `fs_nfs_error_01`

**Key Scenarios**:
- I/O error injection (read/write errors using dm-error)
- Network filesystem errors (NFS, CIFS)
- Server unavailability simulation
- Timeout handling
- Stale file handle recovery
- Lock recovery after failures

**Value**: Tests error handling and recovery mechanisms.

---

### 10. Special File and Feature Testing
**Test IDs**: `fs_sparse_01`, `fs_mmap_stress_01`, `fs_directio_01`, `fs_aio_01`

**Key Scenarios**:
- **Sparse Files**: Extremely large sparse files (TB size), hole punching, SEEK_HOLE/SEEK_DATA
- **Memory-Mapped Files**: Large mmap operations, concurrent mmap, msync behavior
- **Direct I/O**: O_DIRECT with alignment requirements, mixed buffered/direct I/O
- **Asynchronous I/O**: io_submit/io_getevents, io_uring (modern async I/O)
- **File locking**: flock, fcntl locks (POSIX and OFD); mandatory locking only on kernels older than 5.15, which removed it

**Value**: Tests advanced I/O patterns used by databases and high-performance applications.
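
For the sparse file scenarios, a first check could write a small block at a large offset and query the layout with `SEEK_DATA` and `SEEK_HOLE`. The sketch below is plain POSIX/Linux C with hypothetical names (`probe_sparse_seek`, `DATA_OFF`, `DATA_LEN`). It treats "the hole is reported as data" as acceptable, since a filesystem without hole support may report the whole file as data.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* Linux values, used only if the libc headers hide them */
#define SEEK_HOLE 4
#endif

#define DATA_OFF (1024 * 1024) /* hypothetical offset of the only data block */
#define DATA_LEN 4096

/*
 * Returns 1 if the leading hole is reported as a hole, 0 if the filesystem
 * reports it as data, and -1 on error or an inconsistent layout.
 */
int probe_sparse_seek(const char *dir)
{
    char path[4096], buf[DATA_LEN];
    off_t data, hole;
    int fd, n, ret = -1;

    assert(dir != NULL);

    n = snprintf(path, sizeof(path), "%s/fs_sparse_XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays usable until the descriptor is closed */

    memset(buf, 0xAB, sizeof(buf));
    if (pwrite(fd, buf, DATA_LEN, DATA_OFF) != DATA_LEN)
        goto out;

    data = lseek(fd, 0, SEEK_DATA);        /* first data at or after 0 */
    hole = lseek(fd, DATA_OFF, SEEK_HOLE); /* first hole at or after the data */

    if (data < 0 || data > DATA_OFF)
        goto out;
    if (hole != DATA_OFF + DATA_LEN) /* implicit hole at end of file */
        goto out;

    ret = (data == DATA_OFF);
out:
    close(fd);
    return ret;
}
```

Hole punching and TB-sized files would build on the same probes, with the expected offsets recomputed after each operation.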

---

### 11. Filesystem Monitoring and Observability
**Test IDs**: `fs_stats_01`, `fs_notify_01`

**Key Scenarios**:
- Filesystem statistics validation (df, du accuracy)
- statfs/statvfs correctness
- Inode count reporting
- Quota reporting accuracy
- inotify/fanotify stress testing (many watches)
- Event overflow handling

**Value**: Ensures monitoring tools receive accurate information.

---

### 12. Cross-Filesystem Testing
**Test IDs**: `fs_migrate_01`, `fs_compare_01`

**Key Scenarios**:
- Filesystem migration testing (copy between different fs types)
- Data integrity verification after migration
- Metadata preservation (timestamps, permissions, xattrs)
- Performance comparison across filesystems (ext4 vs xfs vs btrfs)
- Feature support matrix generation
- Error handling behavior comparison

**Value**: Helps users choose appropriate filesystems for their workloads.

---

## Example Implementation

Here's a concrete example of a concurrent write stress test:

```c
/*
 * Test: fs_concurrent_write_01
 * Description: Multiple threads write to disjoint regions of the same file
 *              simultaneously through a shared file descriptor.
 * Expected: Every pwrite() completes in full and each region reads back
 *           with the pattern of the thread that wrote it.
 */

#include <pthread.h>
#include <string.h>
#include <sys/types.h>

#include "tst_test.h"
#include "tst_safe_pthread.h"

#define NUM_THREADS 100
#define WRITES_PER_THREAD 1000
#define WRITE_SIZE 4096
#define FILENAME "testfile"

static int fd = -1;

struct thread_data {
    int thread_id;
    int errors;
};

/* Byte offset of write number idx issued by thread thread_id. */
static off_t region_offset(int thread_id, int idx)
{
    return ((off_t)thread_id * WRITES_PER_THREAD + idx) * WRITE_SIZE;
}

static void *write_worker(void *arg)
{
    struct thread_data *data = arg;
    char buffer[WRITE_SIZE];
    int i;

    /* The fill byte identifies the writer; thread_id < NUM_THREADS fits in a char. */
    memset(buffer, data->thread_id, WRITE_SIZE);

    for (i = 0; i < WRITES_PER_THREAD; i++) {
        off_t offset = region_offset(data->thread_id, i);

        /* Only count failures here; the main thread reports them. */
        if (pwrite(fd, buffer, WRITE_SIZE, offset) != WRITE_SIZE)
            data->errors++;
    }

    return NULL;
}

static void verify_data_integrity(void)
{
    char buffer[WRITE_SIZE];
    int i, j, k;

    for (i = 0; i < NUM_THREADS; i++) {
        for (j = 0; j < WRITES_PER_THREAD; j++) {
            off_t offset = region_offset(i, j);

            SAFE_PREAD(1, fd, buffer, WRITE_SIZE, offset);

            /* Compare against the expected writer's pattern, not just buffer[0]. */
            for (k = 0; k < WRITE_SIZE; k++) {
                if (buffer[k] != (char)i) {
                    tst_res(TFAIL, "Data corruption at offset %lld",
                            (long long)offset);
                    return;
                }
            }
        }
    }

    tst_res(TPASS, "No data corruption detected");
}

static void run_test(void)
{
    pthread_t threads[NUM_THREADS];
    struct thread_data thread_data[NUM_THREADS];
    int i;

    /* O_TRUNC keeps repeated iterations (-i) from reading stale data. */
    fd = SAFE_OPEN(FILENAME, O_CREAT | O_TRUNC | O_RDWR, 0644);

    for (i = 0; i < NUM_THREADS; i++) {
        thread_data[i].thread_id = i;
        thread_data[i].errors = 0;
        SAFE_PTHREAD_CREATE(&threads[i], NULL, write_worker, &thread_data[i]);
    }

    for (i = 0; i < NUM_THREADS; i++) {
        SAFE_PTHREAD_JOIN(threads[i], NULL);
        if (thread_data[i].errors > 0)
            tst_res(TFAIL, "Thread %d had %d errors", i, thread_data[i].errors);
    }

    verify_data_integrity();
    SAFE_CLOSE(fd);
}

/* Close the file if a SAFE_* macro aborted the test midway. */
static void cleanup(void)
{
    if (fd >= 0)
        SAFE_CLOSE(fd);
}

static struct tst_test test = {
    .test_all = run_test,
    .cleanup = cleanup,
    .needs_tmpdir = 1,
};
```

---

## Testing Infrastructure Requirements

### Hardware
- Multiple storage types (HDD, SSD, NVMe)
- Systems with varying RAM (2GB to 128GB+)
- Multi-core systems for concurrency testing

### Software
- Kernel versions: 4.19 LTS, 5.4 LTS, 5.10 LTS, 5.15 LTS, latest stable
- Filesystem tools: mkfs, fsck, tune2fs, xfs_repair, btrfs-progs
- Monitoring: iostat, vmstat, perf, ftrace
- Fault injection: dm-flakey, dm-error, fail_make_request

### CI/CD Integration
- Automated test execution on kernel updates
- Performance regression detection
- Result tracking and historical comparison
- HTML/PDF report generation

---

## Expected Benefits

1. **Improved Filesystem Quality**: Catch bugs before production deployment
2. **Performance Validation**: Ensure filesystems meet performance requirements
3. **Regression Prevention**: Detect performance and functional regressions early
4. **Security Hardening**: Identify and prevent security vulnerabilities
5. **Better Documentation**: Test cases serve as usage examples
6. **Industry Standard**: Strengthen LTP as the comprehensive filesystem testing suite

---

## Request for Feedback

I'm seeking feedback from LTP maintainers and the community on:

1. **Scope**: Is this enhancement plan aligned with LTP's goals?
2. **Priorities**: Which categories should be implemented first?
3. **Implementation**: Should this be:
   - Integrated into existing `testcases/kernel/fs/` structure?
   - Created as a new `testcases/kernel/fs/stress/` directory?
   - Developed as a separate test suite?
4. **Contribution Process**: What's the preferred way to contribute these tests?
   - Single large patch series?
   - Incremental patches by category?
   - RFC patches for review first?
5. **Compatibility**: Any concerns about backward compatibility or test infrastructure changes?

---

I'm ready to contribute these enhancements and would appreciate guidance on the best approach to integrate them into LTP.

---

## References

- LTP Filesystem Tests: https://github.com/linux-test-project/ltp/tree/master/testcases/kernel/fs
- xfstests: https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
- Linux Kernel Documentation: https://www.kernel.org/doc/html/latest/filesystems/
- Device Mapper Documentation: https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/

---

**Author**: Priya A (git id: priyama2)  
**Contact**: priyama2@in.ibm.com  
**Date**: 2026-04-28
