Advances in Multi-GPU Smoothed Particle Hydrodynamics Simulations

Bilotta, Giuseppe; Gallo, Giovanni
2014

Abstract

We present a multi-GPU version of GPUSPH, a CUDA implementation of fluid-dynamics models based on the smoothed particle hydrodynamics (SPH) numerical method. SPH is a well-known Lagrangian model for the simulation of free-surface fluid flows; it exposes a high degree of parallelism and has already been successfully ported to the GPU. We extend the GPU-based simulator to run simulations on multiple GPUs simultaneously, both to gain speed and to overcome the memory limitations of a single device. The computational domain is split spatially with minimal overlap, and the shared volume slices are updated at every iteration of the simulation. Data transfers run asynchronously with computation, completely hiding the overhead introduced by the slice exchange. A simple yet effective load-balancing policy preserves performance in simulations that are unbalanced because of asymmetric fluid topologies. The obtained speedup (up to 4.5x on 6 GPUs) closely follows the expected one (5x on 6 GPUs), and it is possible to run simulations with more particles than would fit on a single device. We use the Karp-Flatt metric to formally estimate the overall efficiency of the parallelization.
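The Karp-Flatt metric mentioned above is the experimentally determined serial fraction e = (1/ψ − 1/p) / (1 − 1/p), where ψ is the measured speedup on p processors; a smaller e indicates less serial work and parallel overhead. The short sketch below simply applies this standard formula to the two speedup figures quoted in the abstract (4.5x measured and 5x expected on 6 GPUs). It is an illustration under those assumptions, not a reproduction of the paper's own evaluation, and the function name `karp_flatt` is hypothetical.

```python
def karp_flatt(speedup: float, p: int) -> float:
    """Return the Karp-Flatt experimentally determined serial fraction.

    e = (1/speedup - 1/p) / (1 - 1/p); defined only for p >= 2.
    (Hypothetical helper for illustration; not part of GPUSPH.)
    """
    if p < 2:
        raise ValueError("the metric is defined only for p >= 2 processors")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    # Speedup figures quoted in the abstract for 6 GPUs.
    measured = karp_flatt(4.5, 6)
    expected = karp_flatt(5.0, 6)
    print(f"measured: e = {measured:.4f}")  # e = 1/15
    print(f"expected: e = {expected:.4f}")  # e = 1/25
```

With these inputs the measured speedup gives e = 1/15 (about 0.067) and the expected one gives e = 0.04, so under this reading the gap between the two figures corresponds to a small additional serial or overhead fraction. A perfect linear speedup (ψ = p) would give e = 0.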

Use this identifier to cite or link to this document: http://hdl.handle.net/20.500.11769/42715
Citations: Scopus 39; Web of Science (ISI) 33