A practical guide to signal processing with modern C++17/20/23
Modern C++ (17/20/23) provides C-level performance with stronger type safety, deterministic scope-based memory management (RAII), and composable algorithms. Templates and other abstractions typically compile to the same machine code as hand-written C.
This guide focuses on C++ features relevant to signal processing: templates, standard library algorithms, SIMD support, and the parallel execution library. It assumes familiarity with C and signal processing fundamentals.
Templates enable writing functions once that work for multiple numeric types without runtime overhead. The compiler generates specialized code for each type used.
// Single implementation works for any numeric type
template<typename T>
void scale(std::span<T> buffer, T gain) {
for (auto& sample : buffer) {
sample *= gain;
}
}
// Compiler generates optimized code for each type
scale(std::span{float_buffer}, 2.0f);
scale(std::span{double_buffer}, 2.0);
Concepts specify compile-time requirements on template parameters, providing better error messages and clearer interfaces:
template<std::floating_point T>
class Oscillator {
T phase_ = 0;
T freq_;
public:
T next_sample() { /* ... */ }
};
// Only accepts float, double, long double
Oscillator<float> osc;     // Valid
Oscillator<int> bad_osc;   // Compile error with a clear diagnostic
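Custom concepts can be written directly from a requires-expression. A minimal sketch follows; the SampleBuffer name and its requirements are illustrative, not taken from any library:
#include <concepts>
#include <cstddef>

// Hypothetical concept: anything indexable that yields floating-point samples
template<typename B>
concept SampleBuffer = requires(const B b, std::size_t i) {
    { b[i] }     -> std::convertible_to<float>;
    { b.size() } -> std::convertible_to<std::size_t>;
};

template<SampleBuffer B>
float first_sample(const B& b) {
    return b.size() > 0 ? static_cast<float>(b[0]) : 0.0f;
}
Any contiguous container of floats (std::vector<float>, std::array<float, N>) satisfies this concept; a std::vector<std::string> fails with a diagnostic naming the unmet requirement.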
std::vector provides dynamic arrays with automatic memory management. Elements are stored contiguously, making them cache-friendly and compatible with SIMD operations.
std::vector<float> buffer(num_samples);
// Automatically freed when out of scope
// Exception-safe
// Resizable
For SIMD operations requiring specific alignment, a custom allocator can be used:
#include <cstdlib> // std::aligned_alloc, std::free

template<typename T, size_t Alignment = 32>
struct aligned_allocator {
    using value_type = T;
    T* allocate(size_t n) {
        // std::aligned_alloc requires the size to be a multiple of the alignment
        size_t bytes = ((n * sizeof(T) + Alignment - 1) / Alignment) * Alignment;
        return static_cast<T*>(std::aligned_alloc(Alignment, bytes));
    }
    void deallocate(T* p, size_t) noexcept {
        std::free(p);
    }
};
// Vector with 32-byte alignment for AVX
std::vector<float, aligned_allocator<float, 32>> samples(n);
Use std::vector for dynamic arrays. Add a custom allocator when SIMD requires specific alignment.
The C++17 standard library includes execution policies for automatic parallelization and vectorization of algorithms:
#include <algorithm>
#include <execution>
std::vector<float> input(1'000'000);
std::vector<float> output(1'000'000);
// Sequential execution
std::transform(input.begin(), input.end(), output.begin(),
[](float x) { return std::sin(x); });
// Parallel and vectorized execution
std::transform(std::execution::par_unseq,
input.begin(), input.end(), output.begin(),
[](float x) { return std::sin(x); });
Execution policies:
seq - sequential execution (default)
par - parallel execution across threads
unseq - vectorized execution (SIMD)
par_unseq - both parallel and vectorized
For example, computing RMS with a parallel transform-reduce:
float rms(const std::vector<float>& signal) {
float sum_squares = std::transform_reduce(
std::execution::par_unseq,
signal.begin(), signal.end(),
0.0f, // initial value
std::plus{}, // reduction operation
[](float x) { return x * x; } // transform operation
);
return std::sqrt(sum_squares / signal.size());
}
The implementation handles thread creation, load balancing, and SIMD vectorization automatically.
Ranges provide composable operations on sequences without creating intermediate buffers:
#include <ranges>
std::vector<float> samples = /* ... */;
// Compose operations without temporary allocations
auto processed = samples
| std::views::transform([gain](float x) { return x * gain; })
| std::views::transform([](float x) { return std::clamp(x, -1.0f, 1.0f); })
| std::views::take(1000);
// Evaluation is lazy - computed during iteration
for (float sample : processed) {
// Process sample
}
The compiler fuses these operations into a single loop with no intermediate storage.
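Views are lazy; to keep a processed copy, the view must be materialized explicitly. A minimal sketch continuing the example above, assuming C++23's std::ranges::to (with the pre-C++23 iterator-pair constructor as a fallback):
#include <ranges>
#include <vector>

auto gained = samples
    | std::views::transform([gain](float x) { return x * gain; });

// C++23: collect the lazy view into a new vector
auto out = gained | std::ranges::to<std::vector<float>>();

// Pre-C++23 fallback: construct from the view's iterators
std::vector<float> out2(gained.begin(), gained.end());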
std::span provides a non-owning view of contiguous memory, replacing pointer-and-length pairs with a type-safe interface:
// C-style interface
void process(float* buffer, size_t length);
// C++ interface with std::span
void process(std::span<float> buffer);
// Works with any contiguous container
std::vector<float> vec;
std::array<float, 1024> arr;
float raw[512];
process(vec); // Implicit conversion
process(arr);
process(raw);
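std::span also supports cheap slicing, which is convenient for block-based processing. A small sketch reusing the process() interface above; the block_size parameter is an assumption for illustration:
#include <span>
#include <algorithm>

void process_in_blocks(std::span<float> buffer, size_t block_size) {
    for (size_t offset = 0; offset < buffer.size(); offset += block_size) {
        // subspan creates a view of the block without copying
        auto block = buffer.subspan(offset,
                                    std::min(block_size, buffer.size() - offset));
        process(block); // span-based interface from above
    }
}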
Use std::span for function parameters that operate on contiguous sequences.
std::simd (currently shipped as std::experimental::simd from the Parallelism TS, with std::simd slated for C++26) provides portable vectorization without platform-specific intrinsics:
#include <experimental/simd>
namespace stdx = std::experimental;
void vector_add(std::span<const float> a,
std::span<const float> b,
std::span<float> result) {
using simd_t = stdx::native_simd<float>; // Platform-optimal width
const size_t width = simd_t::size();
size_t i = 0;
// SIMD loop
for (; i + width <= a.size(); i += width) {
simd_t va(&a[i], stdx::element_aligned);
simd_t vb(&b[i], stdx::element_aligned);
simd_t vresult = va + vb;
vresult.copy_to(&result[i], stdx::element_aligned);
}
// Scalar remainder
for (; i < a.size(); ++i) {
result[i] = a[i] + b[i];
}
}
The library abstracts platform differences, generating appropriate code for x86 (SSE/AVX), ARM (NEON), and other architectures.
Use std::simd when explicit control over vectorization is needed.
Lambda expressions provide inline functions with access to local variables:
float gain = 2.0f;
std::transform(std::execution::par_unseq,
input.begin(), input.end(), output.begin(),
[gain](float x) { return x * gain; });
Capture specifications:
[gain] - capture by value
[&gain] - capture by reference
[=] - capture all used variables by value
[&] - capture all used variables by reference
Lambda expressions compile to function objects with inline call operators, achieving the same performance as function pointers.
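A small sketch mixing the two capture modes; the variable names are illustrative:
#include <algorithm>
#include <cmath>

float gain = 2.0f;   // copied into the closure
float peak = 0.0f;   // captured by reference and updated by the lambda
auto process = [gain, &peak](float x) {
    float y = x * gain;
    peak = std::max(peak, std::abs(y));
    return y;
};
float out = process(0.7f); // out == 1.4f, peak is now 1.4f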
Classes encapsulate state and operations for signal processing modules:
template<typename T>
class BiquadFilter {
T a1_, a2_, b0_, b1_, b2_; // Coefficients
T x1_ = 0, x2_ = 0; // Input history
T y1_ = 0, y2_ = 0; // Output history
public:
BiquadFilter(T a1, T a2, T b0, T b1, T b2)
: a1_(a1), a2_(a2), b0_(b0), b1_(b1), b2_(b2) {}
T process(T input) {
T output = b0_ * input + b1_ * x1_ + b2_ * x2_
- a1_ * y1_ - a2_ * y2_;
x2_ = x1_; x1_ = input;
y2_ = y1_; y1_ = output;
return output;
}
void reset() { x1_ = x2_ = y1_ = y2_ = 0; }
};
// Usage
BiquadFilter<float> lpf(/* coefficients */);
float output = lpf.process(input_sample);
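Applying the per-sample interface across a buffer is a short loop; a sketch assuming buffer is a std::vector<float>:
// Filter a whole buffer in place
for (float& sample : buffer) {
    sample = lpf.process(sample);
}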
constexpr functions execute at compile time when possible. (Strictly, std::pow and std::sin become constexpr only in C++26; some compilers already accept them in constant expressions as an extension.)
constexpr float db_to_linear(float db) {
return std::pow(10.0f, db / 20.0f);
}
// Evaluated at compile time
constexpr float gain = db_to_linear(-6.0f);
// Lookup tables computed at compile time
constexpr auto make_sine_table() {
std::array<float, 1024> table{};
for (size_t i = 0; i < table.size(); ++i) {
table[i] = std::sin(2.0f * std::numbers::pi_v<float> * i / table.size());
}
return table;
}
constexpr auto sine_table = make_sine_table();
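A table built this way can back a simple wavetable lookup. A minimal sketch using nearest-sample lookup with no interpolation; the function name is illustrative:
// Map a normalized phase in [0, 1) to the nearest table entry
float wavetable_sine(float phase) {
    auto index = static_cast<size_t>(phase * sine_table.size()) % sine_table.size();
    return sine_table[index];
}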
Use constexpr for lookup tables, coefficient calculations, and derived constants.
A complete FIR filter implementation demonstrating modern C++ features:
#include <vector>
#include <span>
#include <algorithm>
#include <numeric>
#include <concepts>
template<std::floating_point T>
class FIRFilter {
std::vector<T> coefficients_;
std::vector<T> history_;
size_t history_index_ = 0;
public:
explicit FIRFilter(std::span<const T> coefficients)
: coefficients_(coefficients.begin(), coefficients.end()),
history_(coefficients.size(), T{0}) {}
T process_sample(T input) {
history_[history_index_] = input;
history_index_ = (history_index_ + 1) % history_.size();
// The history is circular, so accumulate in two segments to avoid
// reading past the end of the buffer
const size_t tail = history_.size() - history_index_;
T output = std::inner_product(
coefficients_.begin(), coefficients_.begin() + tail,
history_.begin() + history_index_, // oldest samples first
T{0}
);
output = std::inner_product(
coefficients_.begin() + tail, coefficients_.end(),
history_.begin(), // wrapped, most recent samples
output
);
return output;
}
void process(std::span<const T> input, std::span<T> output) {
// A plain loop keeps samples strictly in order; the filter is stateful
// and std::transform does not guarantee in-order application
for (size_t i = 0; i < input.size(); ++i) {
output[i] = process_sample(input[i]);
}
}
void reset() {
std::fill(history_.begin(), history_.end(), T{0});
history_index_ = 0;
}
};
// Usage
std::array<float, 64> coeffs = /* ... */;
FIRFilter<float> filter(coeffs);
std::vector<float> input(48000);
std::vector<float> output(48000);
filter.process(input, output);
For real-time audio processing, lock-free memory management avoids allocation in the audio callback:
#include <atomic>
#include <array>
template<typename T, size_t BufferSize, size_t PoolSize>
class BufferPool {
struct Buffer {
std::array<T, BufferSize> data;
std::atomic<bool> in_use{false};
};
std::array<Buffer, PoolSize> buffers_;
public:
std::span<T> acquire() {
for (auto& buffer : buffers_) {
bool expected = false;
if (buffer.in_use.compare_exchange_strong(expected, true)) {
return buffer.data;
}
}
return {}; // Pool exhausted
}
void release(std::span<T> buffer) {
for (auto& buf : buffers_) {
if (buf.data.data() == buffer.data()) {
buf.in_use.store(false);
return;
}
}
}
};
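A usage sketch in an audio callback; the pool dimensions and callback signature are illustrative, and block sizes are assumed not to exceed BufferSize:
#include <algorithm>

BufferPool<float, 512, 8> pool; // eight 512-sample scratch buffers

void audio_callback(std::span<const float> in, std::span<float> out) {
    auto scratch = pool.acquire();   // lock-free, no heap allocation
    if (scratch.empty()) return;     // pool exhausted: skip this block
    std::copy(in.begin(), in.end(), scratch.begin());
    // ... process scratch in place ...
    std::copy(scratch.begin(), scratch.begin() + out.size(), out.begin());
    pool.release(scratch);
}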
Operator overloading enables natural mathematical syntax for domain-specific types:
template<typename T>
class Signal {
std::vector<T> samples_;
public:
explicit Signal(size_t n) : samples_(n) {}
Signal& operator+=(const Signal& other) {
std::transform(std::execution::par_unseq,
samples_.begin(), samples_.end(),
other.samples_.begin(),
samples_.begin(),
std::plus{});
return *this;
}
Signal operator+(const Signal& other) const {
Signal result(*this);
result += other;
return result;
}
};
// Mathematical notation (operator+ is defined above; a scalar gain such as
// * 0.5f needs an additional operator*, sketched below)
Signal<float> output = input1 + input2;
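Scalar scaling could be added in the same style; a minimal sketch with element-wise gain, reusing the parallel transform:
// Inside Signal<T>:
Signal& operator*=(T gain) {
    std::transform(std::execution::par_unseq,
                   samples_.begin(), samples_.end(),
                   samples_.begin(),
                   [gain](T x) { return x * gain; });
    return *this;
}
Signal operator*(T gain) const {
    Signal result(*this);
    result *= gain;
    return result;
}
// With these, (input1 + input2) * 0.5f compiles as written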
std::variant provides type-safe tagged unions:
// 24-bit samples are usually carried in an int32_t (there is no standard int24_t)
using AudioFormat = std::variant<int16_t, int32_t, float, double>;
void process(const std::vector<AudioFormat>& samples) {
for (const auto& sample : samples) {
std::visit([](auto&& s) {
using T = std::decay_t<decltype(s)>;
if constexpr (std::is_floating_point_v<T>) {
// Process floating-point
} else {
// Process integer
}
}, sample);
}
}
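A concrete visitor, sketched here, converts any supported format to a normalized float; the to_float name is illustrative:
#include <limits>
#include <type_traits>
#include <variant>

// Convert any alternative to float in roughly [-1, 1]
float to_float(const AudioFormat& sample) {
    return std::visit([](auto s) -> float {
        using T = std::decay_t<decltype(s)>;
        if constexpr (std::is_floating_point_v<T>) {
            return static_cast<float>(s);          // already normalized
        } else {
            return static_cast<float>(s) /
                   static_cast<float>(std::numeric_limits<T>::max());
        }
    }, sample);
}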
std::optional represents values that may or may not exist:
std::optional<float> find_peak(std::span<const float> signal, float threshold) {
auto it = std::max_element(signal.begin(), signal.end());
if (it != signal.end() && *it > threshold) {
return *it;
}
return std::nullopt;
}
if (auto peak = find_peak(signal, threshold)) {
// Use *peak
}
Use std::optional instead of sentinel values. Use std::variant instead of C unions.
Established C++ libraries exist for signal processing, but the examples in this guide use only the standard library. Building with GCC or Clang:
# GCC/Clang
g++ -std=c++23 \
-O3 \
-march=native \
-ffast-math \
-flto \
-fopenmp \
signal.cpp \
-ltbb
# Note: -ffast-math relaxes strict IEEE-754 semantics; omit it if exact results matter.
# Intel TBB required for parallel algorithms:
# Ubuntu: sudo apt install libtbb-dev
# macOS: brew install tbb
C++ provides several advantages for signal processing applications:
Type safety - templates and concepts catch type mismatches at compile time
Memory management - RAII releases buffers automatically, with no manual free() calls
Abstraction - templates, lambdas, and ranges compile away to plain loops
Performance - generated code is comparable to hand-written C
C++ may not be appropriate for every signal processing application; in those cases, C remains the better choice.
Modern C++ (17/20/23) provides the tools for efficient, type-safe signal processing: templates and concepts, parallel standard algorithms, ranges, std::span, std::simd, and constexpr. These features enable clear, maintainable code that compiles to machine code comparable to hand-optimized C implementations.
Adrian Freed, 2026