CSV Performance Benchmark Suite

This directory contains a comprehensive benchmarking system for comparing the performance of the FastCSV extension against PHP's native SplFileObject implementation, designed to provide accurate and credible performance measurements.

🎯 What This Demonstrates

  • FastCSV Extension: 3.6x to 4.8x faster performance for read operations
  • Combined Operations: 1.6x to 2.9x faster for read/write operations
  • Memory Efficiency: Both implementations use constant memory (streaming)
  • Intelligent Fallback: Seamless performance optimization in the csv-helper package (see the sketch below)
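
The fallback works by detecting the extension at runtime. A minimal sketch of the pattern, assuming a hypothetical FastCSVReader class (the real csv-helper API and the extension's class names may differ):

<?php
// Prefer the FastCSV extension when loaded; otherwise stream via SplFileObject.
function readCsv(string $path): \Generator
{
    if (extension_loaded('fastcsv')) {
        // Hypothetical reader class; consult the extension's docs for the
        // real class name and constructor signature.
        $reader = new \FastCSVReader($path);
        while (($row = $reader->nextRow()) !== false) {
            yield $row;
        }
        return;
    }

    // Native fallback: SplFileObject in CSV mode reads one row at a time.
    $file = new \SplFileObject($path, 'r');
    $file->setFlags(\SplFileObject::READ_CSV | \SplFileObject::READ_AHEAD | \SplFileObject::SKIP_EMPTY);
    foreach ($file as $row) {
        if ($row !== [null]) { // SplFileObject can emit [null] for blank lines
            yield $row;
        }
    }
}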

πŸ“Š Latest Benchmark Results (June 27, 2025)

Performance Comparison

Operation  Data Size  FastCSV     SplFileObject  Speed Improvement
Read       Small      3.67ms      15.03ms        4.1x faster
Read       Medium     176ms       640ms          3.6x faster
Read       Large      1,987ms     9,469ms        4.8x faster
Both       Small      22.8ms      35.5ms         1.6x faster
Both       Medium     591ms       1,469ms        2.5x faster
Both       Large      7,089ms     20,513ms       2.9x faster

Throughput Achievements

  • FastCSV Read: 272K-568K records/second
  • SplFileObject Read: 67K-156K records/second
  • FastCSV Combined: 88K-339K records/second
  • SplFileObject Combined: 56K-136K records/second

Key Findings

  • ✅ Read Operations: FastCSV shows a 3.6x to 4.8x performance improvement
  • ✅ Scalability: The performance advantage increases with data size
  • ✅ Memory Efficiency: Constant ~2MB peak memory usage regardless of file size (see the sketch after this list)
  • ✅ Consistency: FastCSV shows lower standard deviation, indicating more predictable performance
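
The constant-memory claim is easy to verify independently. A minimal sketch using PHP's built-in memory introspection against the native implementation (the file path is a placeholder; adjust to your data directory):

<?php
// Stream a CSV of any size and confirm that peak memory stays flat.
$file = new SplFileObject('/app/data/large.csv', 'r');
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY);

$rows = 0;
foreach ($file as $row) {
    $rows++;
    if ($rows % 100000 === 0) {
        // Peak usage should not grow with the number of rows processed.
        printf("%d rows, peak: %.1f MB\n", $rows, memory_get_peak_usage(true) / 1048576);
    }
}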

πŸ“ Repository Structure

bench/
├── .gitignore               # Excludes generated results and data
├── README.md                # This file
├── Makefile                 # Automated benchmark commands
├── docker-compose.yml       # Container orchestration
├── shared/                  # Shared benchmark code
│   ├── benchmark.php        # Main benchmark script
│   ├── prepare_test_data.py # Test data generation
│   └── composer.json        # PHP dependencies
├── results/                 # Generated benchmark results (Git ignored)
│   └── .gitkeep             # Preserves directory structure
├── data/                    # Generated test data (Git ignored)
│   └── .gitkeep             # Preserves directory structure
├── app-fastcsv/             # FastCSV container config
├── app-native/              # Native SplFileObject container config
└── benchmark/               # Orchestrator container config

Note: The results/ and data/ directories are excluded from Git via .gitignore since they contain generated files that can be recreated.

Architecture

The benchmark suite uses a multi-container Docker setup:

  • app-fastcsv: PHP 8.4 container with FastCSV extension installed
  • app-native: PHP 8.4 container without FastCSV (uses SplFileObject fallback)
  • benchmark: Orchestrator container with Python and Docker CLI for running tests
  • Shared volumes: All containers share the same codebase and data directories

Key Features

🎯 Credible Benchmarking

  • Separate data preparation: Test CSV files are pre-generated using Python
  • No data generation during benchmarks: Ensures pure performance measurement
  • Statistical accuracy: 3 iterations per test with median calculations (see the loop sketched below)
  • Consistent test data: Same files used across all implementations
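
For reference, the measurement loop is simple to reproduce. A minimal sketch of median-of-N timing, where $operation stands in for whatever read or write routine is under test (an illustration, not the actual benchmark.php internals):

<?php
// Run an operation N times and report the median wall-clock time.
function timeIterations(callable $operation, int $iterations = 3): array
{
    $times = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = hrtime(true);                    // monotonic clock, nanoseconds
        $operation();
        $times[] = (hrtime(true) - $start) / 1e6; // convert ns -> ms
    }
    sort($times);
    $median = $times[intdiv(count($times), 2)];   // middle element for odd counts
    return ['times_ms' => $times, 'median_ms' => $median];
}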

📊 Comprehensive Testing

  • Multiple data sizes: Small (1K rows), Medium (100K rows), Large (1M rows)
  • Various operations: Read, Write, Both
  • Detailed metrics: Execution time, memory usage, throughput, consistency
  • Comparative analysis: Performance scaling and efficiency trends

🔬 Advanced Analysis

  • Statistical measures: Mean, median, standard deviation (reference implementations sketched below)
  • Scalability analysis: How performance changes with data size
  • Memory efficiency: Per-record memory usage
  • Consistency assessment: Performance variability analysis
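
For transparency, plain PHP implementations of these statistics look like the following (a sketch, not necessarily how benchmark.php computes them):

<?php
// Reference implementations of the reported statistics.
function mean(array $xs): float
{
    return array_sum($xs) / count($xs);
}

function median(array $xs): float
{
    sort($xs);
    $n = count($xs);
    $mid = intdiv($n, 2);
    return $n % 2 ? $xs[$mid] : ($xs[$mid - 1] + $xs[$mid]) / 2;
}

function stddev(array $xs): float
{
    $m = mean($xs);
    $var = array_sum(array_map(fn($x) => ($x - $m) ** 2, $xs)) / count($xs);
    return sqrt($var);
}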

Quick Start

🚀 Using Makefile (Recommended)

The easiest way to run benchmarks is using the provided Makefile:

# Complete setup and run all benchmarks
make setup benchmark

# Or step by step:
make setup          # Build containers and prepare test data
make compare        # Run read benchmarks on both implementations
make benchmark-all  # Run all benchmark scenarios

# Quick help
make help           # Show all available commands

📋 Common Makefile Commands

Command               Description
make setup            Build containers and prepare test data
make benchmark        Prepare data and run all benchmarks
make compare          Compare FastCSV vs Native performance
make benchmark-read   Run read benchmarks on both implementations
make benchmark-write  Run write benchmarks on both implementations
make benchmark-small  Quick test with small dataset only
make validate-setup   Verify everything is configured correctly
make show-results     Display latest benchmark results
make clean            Stop containers and clean up

🔧 Manual Docker Commands

If you prefer manual control:

1. Build and Start Containers

# Build all containers
docker-compose build

# Start containers in background
docker-compose up -d

2. Prepare Test Data

Important: Always prepare test data before running benchmarks to ensure credible results.

# Generate all test data sizes
docker-compose exec benchmark python3 /app/shared/prepare_test_data.py

# Generate specific sizes only
docker-compose exec benchmark python3 /app/shared/prepare_test_data.py --sizes small medium

# Force regenerate existing files
docker-compose exec benchmark python3 /app/shared/prepare_test_data.py --force

# Verify existing test data
docker-compose exec benchmark python3 /app/shared/prepare_test_data.py --verify

3. Run Benchmarks

# Test FastCSV extension (all sizes, 3 iterations each)
docker-compose exec app-fastcsv php /app/shared/benchmark.php read

# Test native SplFileObject (all sizes, 3 iterations each)
docker-compose exec app-native php /app/shared/benchmark.php read

# Test specific size with custom iterations
docker-compose exec app-fastcsv php /app/shared/benchmark.php read medium 5

# Test write operations
docker-compose exec app-fastcsv php /app/shared/benchmark.php write

# Test both read and write
docker-compose exec app-fastcsv php /app/shared/benchmark.php both

Test Data Configuration

The benchmark uses three predefined data sizes:

Size    Rows       Columns  Approx. File Size
Small   1,000      5        ~50 KB
Medium  100,000    10       ~25 MB
Large   1,000,000  15       ~400 MB
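
The fixtures themselves come from prepare_test_data.py, but if you need a quick ad-hoc CSV in PHP, SplFileObject::fputcsv works; the column layout below is illustrative only, not the generator's exact schema:

<?php
// Write a small synthetic CSV fixture (illustrative columns only; the real
// schema is defined in prepare_test_data.py).
$out = new SplFileObject('/tmp/fixture_small.csv', 'w');
$out->fputcsv(['id', 'name', 'email', 'amount', 'created_at']);
for ($i = 1; $i <= 1000; $i++) {
    $out->fputcsv([
        $i,
        "user_$i",
        "user$i@example.com",
        round(mt_rand(0, 100000) / 100, 2),
        date('Y-m-d H:i:s'),
    ]);
}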

Data Preparation Script Options

python3 prepare_test_data.py [OPTIONS]

Options:
  --data-dir DIR     Directory to store test data (default: /app/data)
  --sizes SIZE...    Which sizes to generate: small, medium, large, all (default: all)
  --verify          Verify existing files instead of creating new ones
  --force           Overwrite existing files
  --help            Show help message

Benchmark Script Usage

php benchmark.php [OPERATION] [SIZE] [ITERATIONS]

Arguments:
  OPERATION    read, write, both (default: read)
  SIZE         small, medium, large, all (default: all)
  ITERATIONS   Number of iterations per test (default: 3)

Examples:
  php benchmark.php                    # Read all sizes, 3 iterations each
  php benchmark.php write              # Write all sizes, 3 iterations each
  php benchmark.php read large 5       # Read large size, 5 iterations
  php benchmark.php both small         # Read+write small size, 3 iterations

Results and Analysis

Output Files

Results are saved in the /app/results directory:

  • Detailed JSON: comprehensive_{implementation}_{operation}_{timestamp}.json
  • CSV Summary: summary_{implementation}_{operation}_{timestamp}.csv
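
To post-process results programmatically, the timestamped JSON files can be loaded directly. A minimal sketch (the glob pattern follows the naming scheme above; the JSON schema itself is an assumption, so inspect a real file first):

<?php
// Load the most recent detailed JSON result for one implementation/operation.
$files = glob('/app/results/comprehensive_fastcsv_read_*.json');
if ($files === false || $files === []) {
    exit("No result files found; run a benchmark first.\n");
}
sort($files);            // timestamped names sort chronologically
$latest = end($files);
$data = json_decode(file_get_contents($latest), true, 512, JSON_THROW_ON_ERROR);
// Field names inside $data are not documented here; inspect the file first.
print_r($data);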

Sample Output

=== FastCSV Comprehensive Benchmark ===
Implementation: FastCSV
Operation: read
Iterations per test: 3

╔══════════════════════════════════════════════════════════════╗
║ Testing: small (1,000 rows × 5 columns)                      ║
╚══════════════════════════════════════════════════════════════╝
  → Iteration 1/3... 3.67ms, 2.0 MB peak
  → Iteration 2/3... 3.89ms, 2.0 MB peak
  → Iteration 3/3... 3.45ms, 2.0 MB peak

  📊 Results Summary for small:
  ├─ Execution time: 3.67ms (median), 3.67ms (avg) ±0.22ms
  ├─ Throughput: 272,410 records/sec (median)
  ├─ Memory used: <1KB (streaming - constant memory usage)
  ├─ Memory efficiency: Constant (streaming)
  ├─ Peak memory: 2.0 MB (total)
  └─ Time per record: 0.0037ms

╔══════════════════════════════════════════════════════════════╗
║                    COMPARATIVE ANALYSIS                      ║
╚══════════════════════════════════════════════════════════════╝

📈 Performance Summary (FastCSV - read):
┌─────────────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ Data Size   │ Median Time │ Throughput  │ Memory Used │ Peak Memory │
├─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤
│ small       │     3.67ms  │  272,410/s  │      <1KB   │     2.0MB   │
│ medium      │   176.04ms  │  568,049/s  │      <1KB   │     2.0MB   │
│ large       │  1,987.23ms │  503,212/s  │      <1KB   │     2.0MB   │
└─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘

📊 Scalability Analysis:
  • small → medium: 100.0x data → 47.9x time (efficiency: 2.09x)
  • medium → large: 10.0x data → 11.3x time (efficiency: 0.89x)

🎯 Recommendations:
  ✅ Using FastCSV extension - optimal performance achieved
  📊 Use these results as baseline for performance comparisons

Comparing Implementations

To compare FastCSV vs native performance:

# Run both implementations with same parameters
docker-compose exec app-fastcsv php /app/shared/benchmark.php read > fastcsv_results.txt
docker-compose exec app-native php /app/shared/benchmark.php read > native_results.txt

# Compare the CSV summary files
# Results are saved with timestamps in /app/results/
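
If you prefer a scripted comparison, a minimal PHP sketch can derive the speedup from two summary CSVs (file names and column names below are assumptions; check a real summary's header row first):

<?php
// Compare median times from two summary CSVs and print the speedup.
function loadSummary(string $path): array
{
    $file = new SplFileObject($path, 'r');
    $file->setFlags(SplFileObject::READ_CSV | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY);
    $header = null;
    $rows = [];
    foreach ($file as $row) {
        if ($row === [null]) continue;
        if ($header === null) { $header = $row; continue; }
        $rows[] = array_combine($header, $row);
    }
    return $rows;
}

$fast   = loadSummary('/app/results/summary_fastcsv_read_20250627.csv'); // placeholder names
$native = loadSummary('/app/results/summary_native_read_20250627.csv');

foreach ($fast as $i => $row) {
    // Assumes matching row order and a 'median_ms' column in both files.
    $speedup = $native[$i]['median_ms'] / $row['median_ms'];
    printf("%s: %.1fx faster\n", $row['size'] ?? "row $i", $speedup);
}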

Development and Debugging

Container Access

# Access FastCSV container
docker-compose exec app-fastcsv sh

# Access native container  
docker-compose exec app-native sh

# Access benchmark container
docker-compose exec benchmark sh

Verify FastCSV Installation

# Check if FastCSV extension is loaded
docker-compose exec app-fastcsv php -m | grep -i csv
docker-compose exec app-fastcsv php -r "var_dump(extension_loaded('fastcsv'));"

# Check implementation info
docker-compose exec app-fastcsv php -r "
require '/app/shared/vendor/autoload.php';
use CsvToolkit\Helpers\ExtensionHelper;
var_dump(ExtensionHelper::getFastCsvInfo());
"

Troubleshooting

Test data not found error:

# Make sure test data is prepared
docker-compose exec benchmark python3 /app/shared/prepare_test_data.py --verify

Permission issues:

# Fix permissions
docker-compose exec benchmark chmod -R 755 /app/data /app/results

Memory issues with large datasets:

# Check PHP memory limit
docker-compose exec app-fastcsv php -r "echo ini_get('memory_limit');"

# Monitor container resources
docker stats

Best Practices

  1. Always prepare test data before running benchmarks
  2. Use a consistent iteration count (3 or 5) for statistical accuracy
  3. Run benchmarks multiple times, ideally on different days, to confirm consistency
  4. Monitor system resources during large dataset tests
  5. Compare results between the FastCSV and native implementations
  6. Document your findings with timestamps and system specifications

This setup ensures credible, reproducible benchmarks that accurately measure FastCSV performance without any interference from data generation overhead.
