WAV Convolution Engine

A C-based convolution tool for WAV audio files, with naive, OpenMP, AVX2, and FFT-based implementations.

Project Overview

This project implements a simple convolution engine that applies an Impulse Response (IR) to an input audio signal. It's primarily designed for educational purposes and testing basic audio DSP concepts.

Main Technologies: C, dr_wav for audio I/O, PFFFT for frequency-domain convolution, and nob.h for the build system.
Key Features:
- Support for mono and stereo WAV files (interleaved).
- Impulse responses can be mono or match the input channel count.
- Naive, Parallel (OpenMP), SIMD (AVX2), and Fast Fourier Transform (FFT / PFFFT) convolution implementations.
- High-resolution timing for performance benchmarking.
- Real-time progress bar with throttled updates for the naive implementation.
- Writes 32-bit float WAV output.
- Automatic build system using nob.c, with compiler selection and debug/profiling flags.
- Memory leak detection via stb_leakcheck.h (optional).

Performance Optimizations

Several optimization techniques have been applied to the convolution engine:

- Parallelization (OpenMP): Distributes the output samples across CPU cores, achieving a ~5x speedup on typical hardware; the actual gain depends on the core count.
- SIMD Vectorization (AVX2 + FMA): Uses 256-bit registers and Fused Multiply-Add instructions to process 8 single-precision samples per instruction in the time-domain convolution.
- FFT Acceleration (PFFFT): Performs the convolution in the frequency domain. For an input of $N$ samples and an impulse response of $M$ samples, this reduces the computational complexity from $O(N \cdot M)$ to $O(N \log N)$, which matters most for long signals and long impulse responses.
- Interleaved Cache Locality: Loops process all channels in a single pass. Since WAV files are interleaved (L, R, L, R...), this keeps memory access sequential and significantly reduces cache misses.
- Branchless Inner Loop: Loop bounds are pre-calculated for every output sample, removing boundary checks from the inner loop.
- Macro-based Sampling: Uses preprocessor macros (e.g., X(n, c)) that add no runtime cost and give interleaved sample access a "math-like" notation.
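
As a rough illustration of the last two points, the sketch below combines an indexing macro with pre-computed loop bounds. The macro names follow the X(n, c) example above, but the helper conv_frame, its parameters, and the assumption that the impulse response has the same channel count as the input are hypothetical and not taken from main.c.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved access: frame n, channel c, in a buffer with `ch` channels.
   The macros assume locals named x, h, y and ch are in scope. */
#define X(n, c) x[(size_t)(n) * ch + (c)]
#define H(k, c) h[(size_t)(k) * ch + (c)]
#define Y(n, c) y[(size_t)(n) * ch + (c)]

/* Hypothetical helper: computes output frame n for every channel.
   x has n_len frames and h has m_len frames, both with ch channels. */
static void conv_frame(const float *x, size_t n_len,
                       const float *h, size_t m_len,
                       float *y, size_t ch, size_t n)
{
    assert(n_len > 0 && m_len > 0 && n < n_len + m_len - 1);

    /* Valid k satisfy 0 <= k < m_len and 0 <= n - k < n_len.
       Computing the range once keeps bounds checks out of the inner loop. */
    size_t k_min = (n >= n_len) ? n - n_len + 1 : 0;
    size_t k_max = (n < m_len) ? n : m_len - 1;

    for (size_t c = 0; c < ch; c++)
        Y(n, c) = 0.0f;

    /* Channels are the innermost loop, so memory is read in interleaved order. */
    for (size_t k = k_min; k <= k_max; k++)
        for (size_t c = 0; c < ch; c++)
            Y(n, c) += X(n - k, c) * H(k, c);
}
```

Calling conv_frame for every n from 0 to n_len + m_len - 2 fills the complete output buffer.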

Project Structure

- main.c: Core logic for loading WAV files, convolution engines, and performance timing.
- include/: Third-party libraries and small support headers (dr_wav, pffft, nob.h, stb_leakcheck.h, custom_main.h).
- nob.c: Build script that supports gcc, clang, cl, and clang-cl, plus debug and profiling builds.
- samples/: Contains sample input audio and impulse response files.
- scripts/: Reference scripts and small experiments (main.py, conv1D.py, conv2D.py, fft.c).
- profiling/: Contains profiling results and a guide on how to reproduce them.

Building and Running

Build Instructions

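The project uses a "nob" style build system: the build script is itself a C program (nob.c). Compile it once, then run the resulting executable to build the project.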

Bootstrapping:

```
# On Windows (MSVC)
cl.exe nob.c
# On Windows (GCC/MinGW)
gcc nob.c -o nob.exe
```

Building the Project: Run the nob executable:
```
.\nob.exe
```

Useful build flags:

- -gcc (default), -clang, -msvc, -clang-cl to select the compiler
- -debug for a debug build
- -profiling for profiling-oriented clang-cl flags
- -native to enable -march=native on gcc/clang
- -run to run build\main.exe after a successful build

Running the Application

The executable accepts an optional mode flag (-m naive, -m parallel, -m simd, or -m fft). If omitted, the default mode is naive.

Input constraints:

- The input and impulse WAV files must have the same sample rate.
- The impulse must be mono or have the same number of channels as the input.
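
A minimal sketch of how these two constraints could be checked before any processing starts. The wav_info struct and both function names are hypothetical stand-ins (they are not dr_wav types or functions from main.c), and pairing a mono impulse with every input channel is an assumption about how a mono impulse is applied.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical summary of a loaded WAV file (not a dr_wav type). */
typedef struct {
    unsigned int sample_rate; /* in Hz */
    unsigned int channels;    /* 1 = mono, 2 = stereo */
} wav_info;

/* Returns 0 when the pair satisfies both constraints, non-zero otherwise. */
static int check_inputs(wav_info in, wav_info ir)
{
    if (in.sample_rate != ir.sample_rate) {
        fprintf(stderr, "sample rate mismatch: input %u Hz, impulse %u Hz\n",
                in.sample_rate, ir.sample_rate);
        return 1;
    }
    if (ir.channels != 1 && ir.channels != in.channels) {
        fprintf(stderr, "impulse has %u channels, expected 1 or %u\n",
                ir.channels, in.channels);
        return 2;
    }
    return 0;
}

/* Impulse channel to pair with input channel c.
   Assumption: a mono impulse is shared by every input channel. */
static unsigned int ir_channel(unsigned int ir_channels, unsigned int c)
{
    assert(ir_channels == 1 || c < ir_channels);
    return (ir_channels == 1) ? 0u : c;
}
```

With this layout, a stereo input accepts either a mono or a stereo impulse, while a mono input rejects a stereo impulse.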

```
.\build\main.exe <input.wav> <impulse.wav> <output.wav> [-m <naive|parallel|simd|fft>]
```

Example:

```
.\build\main.exe samples\IN_Snare_Classic.wav samples\IR_DocciaAlbergo_44100.wav samples\OUT_Classic_Doccia.wav -m simd
```

Implementation Details

The convolution is performed either in the time domain or the frequency domain.

Time Domain:

$$ y[n] = \sum_{k=0}^{M-1} x[n-k] \cdot h[k] $$

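Here $x$ is the input signal, $h$ is the impulse response of length $M$, and $y$ is the output. The parallel implementation uses #pragma omp parallel for to distribute the outer loop (over $n$) across the available CPU cores, while the naive version reports progress through a callback for real-time feedback.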
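
The sum above, together with the OpenMP pragma, can be sketched for a mono signal as follows. This is an illustrative version under simplifying assumptions (single channel, no progress callback); the function name convolve_parallel is hypothetical and the code is not copied from main.c.

```c
#include <assert.h>

/* Full linear convolution of a mono signal.
   y must have room for n_len + m_len - 1 samples. */
static void convolve_parallel(const float *x, long n_len,
                              const float *h, long m_len, float *y)
{
    assert(n_len > 0 && m_len > 0);
    long out_len = n_len + m_len - 1;

    /* Each iteration writes only y[n], so iterations are independent.
       If OpenMP is not enabled, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long n = 0; n < out_len; n++) {
        /* Restrict k so that both h[k] and x[n - k] stay in range. */
        long k_min = (n >= n_len) ? n - n_len + 1 : 0;
        long k_max = (n < m_len) ? n : m_len - 1;

        float acc = 0.0f;
        for (long k = k_min; k <= k_max; k++)
            acc += x[n - k] * h[k];
        y[n] = acc;
    }
}
```

The loop index is a signed integer because older OpenMP versions require it. The pragma only takes effect when the compiler's OpenMP support is switched on (for example, -fopenmp on gcc).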

Frequency Domain: The fft mode uses PFFFT and an overlap-save style block convolution. Signals are transformed into the frequency domain, where convolution becomes a point-wise multiplication:

$$ Y[f] = X[f] \cdot H[f] $$

This is followed by an Inverse FFT to obtain the output signal.
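
To make the block structure concrete, here is a sketch of textbook overlap-save. To stay self-contained it computes each block's circular convolution directly instead of calling PFFFT; in an FFT-based version that step would be a forward FFT, the point-wise multiplication above, and an inverse FFT. The function names and the block_len parameter are hypothetical, and the details of the project's fft mode may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for FFT -> multiply -> inverse FFT: direct circular
   convolution of block b (length len) with h (length m_len <= len). */
static void circular_convolve(const float *b, const float *h, size_t m_len,
                              float *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < m_len; k++)
            acc += h[k] * b[(i + len - k) % len];
        out[i] = acc;
    }
}

/* Overlap-save: y must hold n_len + m_len - 1 samples.
   block_len plays the role of the FFT size and must be >= m_len.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
static int overlap_save(const float *x, size_t n_len,
                        const float *h, size_t m_len,
                        float *y, size_t block_len)
{
    assert(x && h && y);
    if (m_len == 0 || block_len < m_len) return -1;

    size_t overlap = m_len - 1;          /* samples shared by adjacent blocks */
    size_t step = block_len - overlap;   /* new output samples per block */
    size_t out_len = n_len + m_len - 1;

    float *b = malloc(2 * block_len * sizeof *b);
    if (!b) return -1;
    float *c = b + block_len;

    for (size_t p = 0; p < out_len; p += step) {
        /* Read a block from x as if it had `overlap` leading zeros. */
        for (size_t i = 0; i < block_len; i++) {
            size_t j = p + i;
            b[i] = (j >= overlap && j - overlap < n_len) ? x[j - overlap] : 0.0f;
        }
        circular_convolve(b, h, m_len, c, block_len);

        /* The first `overlap` outputs are corrupted by wrap-around: discard them. */
        for (size_t i = overlap; i < block_len && p + i - overlap < out_len; i++)
            y[p + i - overlap] = c[i];
    }
    free(b);
    return 0;
}
```

Larger blocks waste fewer samples on the overlap but need a larger transform, so the block length is a tuning parameter.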
