Parallel benchmark on multi-core CPUs
From Gerris
This benchmark uses the parallel Bénard–von Kármán Vortex Street example. The problem size is small, which makes good parallel performance difficult to achieve.
popinet-new: Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GHz, 64-bit
- Ubuntu 10.04 LTS 64-bit
- Linux popinet-new 2.6.32-34-generic #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011 x86_64 GNU/Linux
- Gerris2D version 2012-11-29
MPI version:
- Open MPI 1.4.1-2
Open MPI
#CPUs | Relative speedup |
---|---|
1 | 1 |
2 (load-balanced) | 2.32 |
4 (load-balanced) | 3.60 |
fitzroy: IBM Power 575 4.7 GHz
#CPUs | Relative speedup |
---|---|
1 | 1 |
2 (load-balanced) | 1.98 |
4 (load-balanced) | 3.46 |
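To compare the two machines, the relative speedups S_p in the tables above can be converted into a parallel efficiency E_p = S_p / p, where p is the number of CPUs. The following is a minimal bash sketch that uses only the published speedup values. The `efficiency` function name is illustrative and does not come from Gerris.

```bash
#!/bin/bash
# Parallel efficiency E_p = S_p / p, computed from the relative speedups
# reported in the tables above.

# efficiency SPEEDUP NCPUS -> prints SPEEDUP / NCPUS with three decimals
efficiency() {
  awk -v s="$1" -v p="$2" 'BEGIN { printf "%.3f\n", s / p }'
}

# Each entry: machine, number of CPUs, measured relative speedup
for entry in "popinet-new 2 2.32" "popinet-new 4 3.60" \
             "fitzroy 2 1.98" "fitzroy 4 3.46"; do
  read -r host ncpus speedup <<< "$entry"
  echo "$host, $ncpus CPUs: efficiency $(efficiency "$speedup" "$ncpus")"
done
```

With the published numbers, this gives efficiencies of 1.160 and 0.900 on popinet-new and 0.990 and 0.865 on fitzroy. An efficiency above 1, as in the 2-CPU run on popinet-new, corresponds to a superlinear speedup.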
LoadLeveler script for 4 tasks
#!/bin/bash
#@ job_name = gerris
#@ class = General
#@ job_type = parallel
#@ node = 1
#@ tasks_per_node = 4
#@ task_affinity = core(1)
#@ output = gerris4.out
#@ error = log-4
#@ queue
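# Load the user's shell environment
. /home/popinet/.bashrc
# Put the IBM XL C/C++ compiler directory on the PATH
export PATH=$PATH:/opt/xlcpp/v10.1.0.4/usr/vacpp/bin
# Launch gerris2D as a parallel job through IBM's Parallel Operating Environment (poe)
poe gerris2D parallel-p2.gfs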
Parameter file
Large outputs and movie generation were turned off. The single-CPU parameter file is:
8 7 GfsSimulation GfsBox GfsGEdge {} {
  Time { end = 15 }
  Solid (x*x + y*y - 0.0625*0.0625)
  RefineSolid 6
  VariableTracer {} T
  Init {} { U = 1 }
  AdaptVorticity { istep = 1 } { maxlevel = 6 cmax = 1e-2 }
  AdaptGradient { istep = 1 } { maxlevel = 6 cmax = 1e-2 } T
  SourceViscosity 0.00078125
  EventBalance { istep = 1 } 0.1
  OutputTime { istep = 10 } stderr
  OutputTime { istep = 1 } balance
  OutputBalance { istep = 1 } balance
  OutputProjectionStats { istep = 10 } stderr
  OutputTiming { start = end } stderr
  OutputSimulation { start = end } end.gfs
}
GfsBox {
  left = Boundary {
    BcDirichlet U 1
    BcDirichlet T { return y < 0. ? 1. : 0.; }
  }
}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox { right = BoundaryOutflow }
1 2 right
2 3 right
3 4 right
4 5 right
5 6 right
6 7 right
7 8 right
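The parameter file fixes the main physical and numerical scales of the benchmark. The following worked check makes three assumptions: unit density (the Gerris default), so that the SourceViscosity value is the kinematic viscosity ν; unit box size, so that refinement level 6 gives cells of size 1/2^6; and an inflow speed U = 1 taken from the left boundary condition. The cylinder diameter D is twice the radius in the Solid definition.

```latex
\begin{aligned}
D &= 2 \times 0.0625 = 0.125, \qquad \nu = 0.00078125 = \tfrac{1}{1280},\\
\mathrm{Re} &= \frac{U D}{\nu} = \frac{1 \times 0.125}{1/1280} = 160,\\
\Delta x_{\min} &= \frac{1}{2^{6}} = \frac{1}{64} \approx 0.0156, \qquad
\frac{D}{\Delta x_{\min}} = 0.125 \times 64 = 8.
\end{aligned}
```

The cylinder is therefore resolved by only about 8 cells across its diameter at the finest level. This is consistent with the small problem size noted at the top of the page.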
Gnuplot file
set term pngcairo
set output 'balanced.png'
set xlabel 'Simulation time'
set ylabel 'Wall-clock time (s)'
set grid
set key top left
plot [0:15]'< grep step: log-1' u 4:10 w l t '1 CPU', \
     '< grep step: log-2' u 4:10 w l t '2 CPUs (load-balanced)', \
     '< grep step: log-4' u 4:10 w l t '4 CPUs (load-balanced)'
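The gnuplot script reads the simulation time (field 4) and the wall-clock time (field 10) from the `step:` lines that OutputTime writes to stderr. These lines end up in the files log-1, log-2 and log-4; the LoadLeveler script above, for example, sends stderr to log-4. The sketch below estimates a relative speedup from the same logs. It assumes the speedup is defined as S_p = T_1 / T_p, where T_p is the wall-clock time on p CPUs. Because OutputTime only writes every 10 steps, the last `step:` line falls close to, but not necessarily exactly at, the end of the run. The function names are hypothetical.

```bash
#!/bin/bash
# Estimate the relative speedup S_p = T_1 / T_p from the stderr logs
# plotted above. Assumes, as the gnuplot script does, that field 10 of
# each "step:" line is the elapsed wall-clock time in seconds.

# final_wallclock LOGFILE -> wall-clock time on the last "step:" line
final_wallclock() {
  grep 'step:' "$1" | tail -n 1 | awk '{ print $10 }'
}

# speedup T1 TP -> prints T1 / TP with two decimals
speedup() {
  awk -v t1="$1" -v tp="$2" 'BEGIN { printf "%.2f\n", t1 / tp }'
}

if [ -f log-1 ]; then
  t1=$(final_wallclock log-1)
  for p in 2 4; do
    if [ -f "log-$p" ]; then
      echo "$p CPUs: relative speedup $(speedup "$t1" "$(final_wallclock "log-$p")")"
    fi
  done
fi
```

Run the script in the directory that contains the logs. If log-1 is missing, it prints nothing.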