A high-performance C++ particle simulator leveraging multithreading (OpenMP & GCD), spatial partitioning (grid), SIMD (Apple Accelerate & auto-vectorization), and optimized memory patterns for real-time physics simulation.
ParticleBox is a performant C++ particle simulator designed to efficiently simulate the interactions of thousands of particles in real-time. It utilizes modern C++ features and platform-specific optimizations to achieve high frame rates, even with large particle counts.
Key performance techniques implemented in ParticleBox:
physics.cpp
) to dramatically reduce the number of pairwise collision checks, changing the complexity from O(n²) towards O(n) in typical scenarios. Cell indices are calculated efficiently using bit-shifting (>> CELL_SIZE_SHIFT
).dispatch/dispatch.h
for optimized task scheduling on Apple Silicon (simulation.cpp
) and OpenMP (#pragma omp parallel for
in physics.cpp
) with std::async
/std::future
(simulation.cpp
) for cross-platform parallelization on Linux/Windows.simd/simd.h
, simd_float2
) for vector math (particle.h
, physics.cpp
, simulation.cpp
). Compiler flags like -ftree-vectorize
and -march=native
(Makefile
) encourage auto-vectorization on other platforms.std::vector::reserve()
to pre-allocate memory for particles (simulation.cpp
), minimizing reallocations during runtime. Employs thread_local
random number generators (simulation.cpp
) to reduce contention in multithreaded particle spawning. The grid uses flat std::vector
s for cell data, improving cache locality.Vec2::fastInvSqrt
in particle.h
) and uses squared distance checks to avoid costly square roots in collision detection loops (physics.cpp
). Employs bitwise shift operations for fast grid cell index calculation.std::vector
, pre-allocation, and the flat-array structure of the spatial grid. Uses thread-local storage for RNGs. Particles are stored as an Array of Structs (AoS).reserve()
.ParticleBox simulates particle interactions using optimized numerical methods.
Collision detection between two particles (p1, p2) primarily uses squared distances to avoid expensive square root calculations in the inner loop. The squared distance \(d^2\) is calculated:
\[ d^2 = (p2.x - p1.x)^2 + (p2.y - p1.y)^2 \]
A collision is detected if \(d^2 < (r1 + r2)^2\), where \(r1\) and \(r2\) are the particle radii. When the actual distance or a normalized direction vector is required (e.g., for force calculation), the optimized Fast Inverse Square Root (fastInvSqrt
) function is used to calculate \(1/\sqrt{d^2}\).
A simple linear repulsion force is applied when particles overlap. The magnitude of the repulsion force \(F_r\) between overlapping particles is proportional to the overlap depth:
\[ \mathbf{F}_r = -\hat{\mathbf{n}} \cdot \text{repulsionStrength} \cdot \delta \]
Where \(\delta\) is the overlap distance \((r1 + r2) - d\), \(\text{repulsionStrength}\) is a constant factor (REPULSION_STRENGTH
in physics.h
), and \(\hat{\mathbf{n}}\) is the normalized direction vector from particle \(i\) to particle \(j\), calculated efficiently using fastInvSqrt
. Force calculations leverage SIMD vector types (simd_float2
) on Apple platforms.
The simulation uses a Semi-Implicit Euler integration method to update particle positions and velocities:
\[ \mathbf{v}_{t+\Delta t} = \mathbf{v}_t + (\mathbf{F}_t / m) \cdot \Delta t \] \[ \mathbf{p}_{t+\Delta t} = \mathbf{p}_t + \mathbf{v}_{t+\Delta t} \cdot \Delta t \]
Where \(\mathbf{p}\) is position, \(\mathbf{v}\) is velocity, \(\mathbf{F}\) is the net force, \(m\) is mass, and \(\Delta t\) is the time step. This method is straightforward to implement and computationally inexpensive.
ParticleBox is structured into several key components:
main.cpp
): Initializes SDL, creates windows/renderers, manages the main event loop, and orchestrates simulation updates and rendering.simulation.h
, simulation.cpp
): Manages the collection of particles (std::vector
), controls the simulation state (running/stopped), handles particle creation/deletion, delegates physics updates, manages frame timing, and triggers rendering. Contains the logic for selecting multithreading strategies (GCD vs. OpenMP/async).physics.h
, physics.cpp
): Calculates forces (gravity, repulsion), performs collision detection (using grid or brute-force), applies boundary conditions, and contains toggles for physics features (gravity, grid, pairwise optimization). Implements the spatial partitioning grid logic.particle.h
, particle.cpp
): Defines the Particle
struct and the Vec2
structure (with SIMD optimizations for Apple platforms). Handles individual particle rendering and state (position, velocity, mass, etc.). Includes particle texture caching.gui.h
, gui.cpp
): Manages the separate control window using SDL and SDL_ttf, providing buttons, toggles, and performance metrics display (FPS, particle count, graphs).When enabled, the spatial partitioning system uses a uniform grid to optimize collision detection:
CELL_SIZE = 8.0f
).pos.x >> CELL_SIZE_SHIFT
) to calculate the grid cell indices for each particle's position.std::vector
s (cellCounts
, cellParticles
, cellStartIndices
) to store particle indices sorted by cell, promoting better cache locality during neighbor iteration compared to nested structures.The physics engine distributes computational work across available CPU cores:
dispatch_async
and semaphores for efficient, low-overhead task scheduling managed by the OS.std::async
with std::launch::async
to distribute work across threads (managed by the C++ runtime) and OpenMP directives (#pragma omp parallel for
, #pragma omp atomic
) within the force calculation loops in physics.cpp
.std::thread::hardware_concurrency()
(with a fallback).#pragma omp atomic
for safe updates to shared force contributions in the OpenMP path and GCD's task-based model implicitly manages synchronization or uses semaphores where needed.thread_local std::mt19937
random number generators to avoid lock contention during concurrent particle creation.Vector operations are accelerated using Single Instruction, Multiple Data (SIMD) techniques:
simd_float2
) and functions (e.g., simd_dot
) for Vec2
operations (addition, subtraction, dot product) and within the physics/update loops on macOS/iOS, providing native performance. Enabled via `-framework Accelerate` and `-mcpu=apple-m1` in the Makefile.-O3
, -ffast-math
, -ftree-vectorize
, and -march=native
in the Makefile to encourage the compiler (like GCC/Clang) to automatically generate SIMD instructions (SSE, AVX, NEON depending on the target architecture) for loops and vectorizable code on other platforms.physics.cpp
and simulation.cpp
.Memory access and allocation patterns are optimized for performance:
std::vector::reserve()
when resetting or spawning particles to pre-allocate sufficient memory, reducing the likelihood of costly reallocations during the simulation run.std::vector
which stores elements contiguously, improving cache locality during iteration compared to linked structures.std::vector
. While straightforward, SIMD utilization might be less optimal than a Structure of Arrays (SoA) approach for certain operations.circleTextureCache
in particle.cpp
) for pre-rendered particle circle textures, reducing redundant texture creation and GPU overhead.Mathematical calculations are optimized for speed:
>>
) for grid cell index calculations instead of division.invMass
) to replace division in force-to-acceleration calculations.ParticleBox utilizes multi-core CPUs effectively through:
thread_local
RNGs for parallel particle spawning without locks.Memory performance is enhanced through:
std::vector::reserve()
minimizes runtime reallocations and memory fragmentation.std::vector
and the flat array layout within the spatial grid promotes better CPU cache utilization.Performance improvements were guided by analyzing bottlenecks. Key areas addressed include:
-O3
, -march=native
, -mcpu=apple-m1
, -ftree-vectorize
, -fopenmp
).std::async
, std::future
), Apple Grand Central Dispatch (GCD)simd/simd.h
), Compiler Auto-Vectorization (via flags)