Modern C++ for Signal Processing

A practical guide to signal processing with modern C++17/20/23

Companion Article: This guide covers C++ approaches to signal processing. For a C-focused guide, see: Modern Guidelines for Signal Processing Applications in C.

Introduction

Modern C++ (17/20/23) provides C-level performance with stronger type safety, automatic memory management, and composable algorithms. Well-written templates and abstractions typically compile to the same machine code as hand-written C.

This guide focuses on C++ features relevant to signal processing: templates, standard library algorithms, SIMD support, and the parallel execution library. It assumes familiarity with C and signal processing fundamentals.

Templates for Generic Numerical Code

Templates enable writing functions once that work for multiple numeric types without runtime overhead. The compiler generates specialized code for each type used.

// Single implementation works for any numeric type
template<typename T>
void scale(std::span<T> buffer, T gain) {
    for (auto& sample : buffer) {
        sample *= gain;
    }
}

// Compiler generates optimized code for each type
scale(std::span{float_buffer}, 2.0f);
scale(std::span{double_buffer}, 2.0);

Concepts for Type Constraints (C++20)

Concepts specify compile-time requirements on template parameters, providing better error messages and clearer interfaces:

template<std::floating_point T>
class Oscillator {
    T phase_ = 0;
    T freq_;
public:
    T next_sample() { /* ... */ }
};

// Only accepts float, double, long double
Oscillator<float> osc1;  // Valid
Oscillator<int> osc2;    // Compile error with clear diagnostic

Recommendation: Use concepts to document template requirements.

Memory Management with std::vector

std::vector provides dynamic arrays with automatic memory management. Elements are stored contiguously, making them cache-friendly and compatible with SIMD operations.

std::vector<float> buffer(num_samples);
// Automatically freed when out of scope
// Exception-safe
// Resizable

For SIMD operations requiring specific alignment, a custom allocator can be used:

#include <cstdlib>  // std::aligned_alloc, std::free
#include <new>      // std::bad_alloc

template<typename T, size_t Alignment = 32>
struct aligned_allocator {
    using value_type = T;

    T* allocate(size_t n) {
        // std::aligned_alloc requires the size to be a multiple of Alignment
        size_t bytes = ((n * sizeof(T) + Alignment - 1) / Alignment) * Alignment;
        void* p = std::aligned_alloc(Alignment, bytes);
        if (!p) throw std::bad_alloc{};
        return static_cast<T*>(p);
    }

    void deallocate(T* p, size_t) {
        std::free(p);
    }

    bool operator==(const aligned_allocator&) const { return true; }
};

// Vector with 32-byte alignment for AVX
std::vector<float, aligned_allocator<float, 32>> samples(n);

Recommendation: Use std::vector for dynamic arrays. Add custom allocators when alignment is required for SIMD.

Parallel Algorithms (C++17)

The C++17 standard library includes execution policies for automatic parallelization and vectorization of algorithms:

#include <algorithm>
#include <execution>

std::vector<float> input(1'000'000);
std::vector<float> output(1'000'000);

// Sequential execution
std::transform(input.begin(), input.end(), output.begin(),
               [](float x) { return std::sin(x); });

// Parallel and vectorized execution
std::transform(std::execution::par_unseq,
               input.begin(), input.end(), output.begin(),
               [](float x) { return std::sin(x); });

Execution policies:

  • seq - sequential execution (default)
  • par - parallel execution across threads
  • unseq - vectorized execution (SIMD; added in C++20)
  • par_unseq - both parallel and vectorized

Example: RMS Calculation

float rms(const std::vector<float>& signal) {
    if (signal.empty()) return 0.0f;  // avoid division by zero
    float sum_squares = std::transform_reduce(
        std::execution::par_unseq,
        signal.begin(), signal.end(),
        0.0f,                          // initial value
        std::plus{},                   // reduction operation
        [](float x) { return x * x; }  // transform operation
    );
    return std::sqrt(sum_squares / signal.size());
}

The implementation handles thread creation, load balancing, and SIMD vectorization automatically.
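As a quick sanity check, the same reduction without the execution policy (so the snippet compiles without linking TBB; rms_seq is a name introduced here, with an empty-input guard added):

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sequential RMS: the same transform_reduce, minus the execution policy
float rms_seq(const std::vector<float>& signal) {
    if (signal.empty()) return 0.0f;
    float sum_squares = std::transform_reduce(
        signal.begin(), signal.end(),
        0.0f,                          // initial value
        std::plus{},                   // reduction operation
        [](float x) { return x * x; }  // transform operation
    );
    return std::sqrt(sum_squares / signal.size());
}
```

For a signal of constant amplitude A (in any sign pattern), the result is exactly A.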

Recommendation: Consider parallel algorithms before implementing custom parallelization with OpenMP or manual SIMD.

Ranges and Views (C++20)

Ranges provide composable operations on sequences without creating intermediate buffers:

#include <ranges>

std::vector<float> samples = /* ... */;
float gain = 0.5f;

// Compose operations without temporary allocations
auto processed = samples
    | std::views::transform([gain](float x) { return x * gain; })
    | std::views::transform([](float x) { return std::clamp(x, -1.0f, 1.0f); })
    | std::views::take(1000);

// Evaluation is lazy - computed during iteration
for (float sample : processed) {
    // Process sample
}

The compiler fuses these operations into a single loop with no intermediate storage.

Recommendation: Use views to compose operations. The compiler optimizes them to equivalent hand-written loops.

std::span for Function Parameters (C++20)

std::span provides a non-owning view of contiguous memory, replacing pointer-and-length pairs with a type-safe interface:

// C-style interface
void process(float* buffer, size_t length);

// C++ interface with std::span
void process(std::span<float> buffer);

// Works with any contiguous container
std::vector<float> vec;
std::array<float, 1024> arr;
float raw[512];

process(vec);   // Implicit conversion
process(arr);
process(raw);

Recommendation: Use std::span for function parameters that operate on contiguous sequences.

Portable SIMD with std::simd (experimental)

std::simd provides portable vectorization without platform-specific intrinsics. It was adopted into C++26; current compilers ship it as std::experimental::simd from the Parallelism TS:

#include <experimental/simd>
namespace stdx = std::experimental;

void vector_add(std::span<const float> a,
                std::span<const float> b,
                std::span<float> result) {
    using simd_t = stdx::native_simd<float>;  // Platform-optimal width
    const size_t width = simd_t::size();

    size_t i = 0;
    // SIMD loop
    for (; i + width <= a.size(); i += width) {
        simd_t va(&a[i], stdx::element_aligned);
        simd_t vb(&b[i], stdx::element_aligned);
        simd_t vresult = va + vb;
        vresult.copy_to(&result[i], stdx::element_aligned);
    }

    // Scalar remainder
    for (; i < a.size(); ++i) {
        result[i] = a[i] + b[i];
    }
}

The library abstracts platform differences, generating appropriate code for x86 (SSE/AVX), ARM (NEON), and other architectures.

Recommendation: Consider parallel algorithms first. Use std::simd when explicit control over vectorization is needed.

Lambda Expressions

Lambda expressions provide inline functions with access to local variables:

float gain = 2.0f;
std::transform(std::execution::par_unseq,
               input.begin(), input.end(), output.begin(),
               [gain](float x) { return x * gain; });

Capture specifications:

  • [gain] - capture by value
  • [&gain] - capture by reference
  • [=] - capture all used variables by value
  • [&] - capture all used variables by reference

Lambda expressions compile to function objects with inline call operators, achieving the same performance as function pointers.

Recommendation: Use lambda expressions with standard algorithms for local operations.

Classes for DSP Components

Classes encapsulate state and operations for signal processing modules:

template<typename T>
class BiquadFilter {
    T a1_, a2_, b0_, b1_, b2_;  // Coefficients
    T x1_ = 0, x2_ = 0;         // Input history
    T y1_ = 0, y2_ = 0;         // Output history

public:
    BiquadFilter(T a1, T a2, T b0, T b1, T b2)
        : a1_(a1), a2_(a2), b0_(b0), b1_(b1), b2_(b2) {}

    T process(T input) {
        T output = b0_ * input + b1_ * x1_ + b2_ * x2_
                   - a1_ * y1_ - a2_ * y2_;

        x2_ = x1_; x1_ = input;
        y2_ = y1_; y1_ = output;

        return output;
    }

    void reset() { x1_ = x2_ = y1_ = y2_ = 0; }
};

// Usage
BiquadFilter<float> lpf(/* coefficients */);
float output = lpf.process(input_sample);

Recommendation: Use classes to encapsulate DSP state. Template parameters enable type-generic implementations.

Compile-Time Computation

constexpr functions execute at compile time when possible. Note that the <cmath> functions used below only become constexpr in C++26; GCC evaluates them at compile time as an extension, so portable pre-C++26 code needs a constexpr-friendly approximation or an externally generated table:

constexpr float db_to_linear(float db) {
    return std::pow(10.0f, db / 20.0f);
}

// Evaluated at compile time
constexpr float gain = db_to_linear(-6.0f);

// Lookup tables computed at compile time
constexpr auto make_sine_table() {
    std::array<float, 1024> table{};
    for (size_t i = 0; i < table.size(); ++i) {
        table[i] = std::sin(2.0f * std::numbers::pi_v<float> * i / table.size());
    }
    return table;
}

constexpr auto sine_table = make_sine_table();

Recommendation: Use constexpr for lookup tables, coefficient calculations, and derived constants.

Example: FIR Filter Implementation

A complete FIR filter implementation demonstrating modern C++ features:

#include <vector>
#include <span>
#include <algorithm>
#include <numeric>

template<std::floating_point T>
class FIRFilter {
    std::vector<T> coefficients_;
    std::vector<T> history_;
    size_t history_index_ = 0;

public:
    explicit FIRFilter(std::span<const T> coefficients)
        : coefficients_(coefficients.begin(), coefficients.end()),
          history_(coefficients.size(), T{0}) {}

    T process_sample(T input) {
        history_[history_index_] = input;

        // Walk backwards through the circular buffer from the newest
        // sample so coefficients_[0] pairs with x[n], as in convolution.
        // (std::inner_product starting at history_.begin() + history_index_
        // would read past the end of the buffer.)
        T output = T{0};
        size_t idx = history_index_;
        for (const T& coeff : coefficients_) {
            output += coeff * history_[idx];
            idx = (idx == 0) ? history_.size() - 1 : idx - 1;
        }

        history_index_ = (history_index_ + 1) % history_.size();
        return output;
    }

    void process(std::span<const T> input, std::span<T> output) {
        std::transform(input.begin(), input.end(), output.begin(),
                       [this](T sample) { return process_sample(sample); });
    }

    void reset() {
        std::fill(history_.begin(), history_.end(), T{0});
        history_index_ = 0;
    }
};

// Usage
std::array<float, 64> coeffs = /* ... */;
FIRFilter<float> filter(coeffs);

std::vector<float> input(48000);
std::vector<float> output(48000);
filter.process(input, output);

Example: Lock-Free Buffer Pool

For real-time audio processing, lock-free memory management avoids allocation in the audio callback:

#include <atomic>
#include <array>

template<typename T, size_t BufferSize, size_t PoolSize>
class BufferPool {
    struct Buffer {
        std::array<T, BufferSize> data;
        std::atomic<bool> in_use{false};
    };

    std::array<Buffer, PoolSize> buffers_;

public:
    std::span<T> acquire() {
        for (auto& buffer : buffers_) {
            bool expected = false;
            if (buffer.in_use.compare_exchange_strong(expected, true)) {
                return buffer.data;
            }
        }
        return {};  // Pool exhausted
    }

    void release(std::span<T> buffer) {
        for (auto& buf : buffers_) {
            if (buf.data.data() == buffer.data()) {
                buf.in_use.store(false);
                return;
            }
        }
    }
};

Operator Overloading

Operator overloading enables natural mathematical syntax for domain-specific types:

template<typename T>
class Signal {
    std::vector<T> samples_;

public:
    explicit Signal(size_t n) : samples_(n) {}

    Signal& operator+=(const Signal& other) {
        std::transform(std::execution::par_unseq,
                       samples_.begin(), samples_.end(),
                       other.samples_.begin(),
                       samples_.begin(),
                       std::plus{});
        return *this;
    }

    Signal operator+(const Signal& other) const {
        Signal result(*this);
        result += other;
        return result;
    }
};

// Mathematical notation (a scalar operator* would be defined analogously)
Signal<float> mixed = input1 + input2;

Recommendation: Overload operators for domain-specific types when it improves code clarity.

Type-Safe Variants and Optionals

std::variant provides type-safe tagged unions:

// (C++ has no int24_t; 24-bit audio is typically carried in int32_t)
using AudioFormat = std::variant<int16_t, int32_t, float, double>;

void process(const std::vector<AudioFormat>& samples) {
    for (const auto& sample : samples) {
        std::visit([](auto&& s) {
            using T = std::decay_t<decltype(s)>;
            if constexpr (std::is_floating_point_v<T>) {
                // Process floating-point
            } else {
                // Process integer
            }
        }, sample);
    }
}

std::optional represents values that may or may not exist:

std::optional<float> find_peak(std::span<const float> signal, float threshold) {
    auto it = std::max_element(signal.begin(), signal.end());
    if (it != signal.end() && *it > threshold) {
        return *it;
    }
    return std::nullopt;
}

if (auto peak = find_peak(signal, 0.5f)) {
    // Use *peak
}

Recommendation: Use std::optional instead of sentinel values. Use std::variant instead of C unions.

Recommended Libraries

Established C++ libraries for signal processing:

Linear Algebra

  • Eigen - matrix operations, expression templates
  • xtensor - NumPy-like interface
  • Blaze - high-performance linear algebra

FFT

  • FFTW - optimized FFT library (C interface)
  • pocketfft - header-only implementation
  • KFR - DSP framework with FFT support

Audio I/O

  • RtAudio - cross-platform real-time audio
  • PortAudio - portable audio I/O
  • miniaudio - single-header audio library

DSP

  • KFR - comprehensive DSP framework
  • Q - modern C++14 DSP library

Compiler Configuration

# GCC/Clang
g++ -std=c++23 \
    -O3 \
    -march=native \
    -ffast-math \
    -flto \
    signal.cpp \
    -ltbb

# Note: -ffast-math relaxes IEEE semantics (NaN/infinity handling,
# associativity); verify numerical behavior before enabling it.

# Intel TBB is required by libstdc++ for parallel algorithms:
# Ubuntu: sudo apt install libtbb-dev
# macOS: brew install tbb

Comparison with C

C++ provides several advantages for signal processing applications:

Type Safety:

  • Templates provide type-safe generic code
  • Concepts document type requirements
  • Strong typing catches errors at compile time

Memory Management:

  • RAII eliminates manual memory management
  • No need for explicit free() calls
  • Exception-safe resource handling

Abstraction:

  • Standard algorithms with parallel execution
  • Composable views and ranges
  • Expression templates for complex operations

Performance:

  • Zero-overhead abstractions
  • Same machine code generation as C
  • Better optimization opportunities through type information

Limitations

C++ may not be appropriate for all signal processing applications:

  • Embedded systems without C++ runtime support
  • Systems requiring maximum portability to legacy compilers
  • Projects with strict binary size constraints
  • Teams without C++ expertise

For these cases, C remains the better choice.

Summary

Modern C++ (17/20/23) provides tools for writing efficient, type-safe signal processing code:

  • Templates for generic algorithms
  • Parallel algorithms for automatic optimization
  • Ranges and views for composable operations
  • Portable SIMD via std::experimental::simd (standard in C++26)
  • Automatic memory management with RAII
  • Compile-time computation with constexpr

These features enable writing clear, maintainable code that compiles to machine code comparable to hand-optimized C implementations.

Adrian Freed, 2026