Gerris in parallel
The principle is relatively simple: each GfsBox can take a pid
argument which defines the rank of the process on which the solution
for this GfsBox will be computed. Take the "half cylinder"
example and modify it as follows:
4 3 GfsSimulation GfsBox GfsGEdge {} {
  Time { end = 10 }
  Refine 6
  GtsSurfaceFile half-cylinder.gts
  Init {} { U = 1 }
  OutputProjectionStats { step = 0.02 } stderr
  OutputSimulation { step = 1 } simulation-%3.1f
  OutputTiming { start = end } stderr
}
GfsBox { pid = 0 left = BoundaryInflowConstant 1 }
GfsBox { pid = 1 }
GfsBox { pid = 2 }
GfsBox { pid = 3 right = BoundaryOutflow }
1 2 right
2 3 right
3 4 right
If you run this using
% gerris2D half-cylinder.gfs
it will run on one processor. If you now do
% mpirun -np 4 gerris2D half-cylinder.gfs
it will run on 4 processors, with each of the GfsBoxes assigned to a
different processor. Gerris takes care of the communications necessary
at the boundaries between GfsBoxes on different processors.
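Since the number of processes passed to mpirun has to match the pids used in the file, it can help to count them before launching. The function below is only a sketch: the name count_pids is hypothetical (it is not part of Gerris), it simply looks for "pid = N" patterns in the text of the file, and it assumes the pids are numbered consecutively from 0 so that the number of distinct pids equals the number of processes.

```shell
# count_pids FILE
# Print the number of distinct "pid = N" values found in FILE.
# Hypothetical helper, not part of Gerris: it only scans the text of the
# parameter file and knows nothing about the Gerris syntax itself.
count_pids() {
    # -o prints each match on its own line, -E enables "+".
    grep -oE 'pid[[:space:]]*=[[:space:]]*[0-9]+' "$1" |
        sed 's/[^0-9]//g' |   # keep only the number: "pid = 3" becomes "3"
        sort -un |            # one line per distinct pid
        wc -l |
        tr -d ' '             # some wc implementations pad the count
}

# Possible use (requires Gerris and MPI, so it is left commented out):
# mpirun -np "$(count_pids half-cylinder.gfs)" gerris2D half-cylinder.gfs
```

For the file above, which uses pids 0 to 3, the function prints 4.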
Any Gerris parameter file can be manually "parallelised" as explained above. Gerris also includes command-line options designed to create "parallelised" simulation files. A short description of these options is given by typing:
% gerris2D -h
  -s N   --split=N       splits the domain N times and returns
                         the corresponding simulation
  -i     --pid           keep box pids when splitting
  -p N   --partition=N   partition the domain in 2^N subdomains and returns
                         the corresponding simulation
  -b N   --bubble=N      partition the domain in N subdomains and returns
                         the corresponding simulation
The -s option splits the domain, which creates more GfsBoxes. It takes the existing domain, splits it N times and assigns a new pid to each GfsBox (unless the -i option is specified).
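The effect of -s on the number of GfsBoxes can be estimated with a little arithmetic. The sketch below assumes that one split replaces every GfsBox by 4 GfsBoxes in 2D and by 8 in 3D (the usual quadtree/octree subdivision); the helper name boxes_after_split is hypothetical and not part of Gerris.

```shell
# boxes_after_split BOXES N DIM
# Print the number of GfsBoxes obtained from BOXES initial GfsBoxes after
# N splits in DIM dimensions (DIM is 2 or 3), i.e. BOXES * (2^DIM)^N.
# Hypothetical helper, assuming each split divides a box into 2^DIM boxes.
boxes_after_split() {
    boxes=$1 n=$2 dim=$3
    children=$((1 << dim))      # 4 in 2D, 8 in 3D
    i=0
    while [ "$i" -lt "$n" ]; do
        boxes=$((boxes * children))
        i=$((i + 1))
    done
    echo "$boxes"
}

boxes_after_split 3 1 2   # 3 GfsBoxes, split once, in 2D: prints 12
```

The count grows quickly with N, so one or two splits are usually enough to get at least one GfsBox per process.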
The -p and -b options are used to "parallelise" a Gerris file that already contains enough GfsBoxes (at least 2^N for the -p option and N for the -b option). They group the GfsBoxes together in order to get 2^N (or N) subdomains. Each of these subdomains is assigned a different pid. The difference between the -p and -b options is the algorithm used to perform the graph partitioning. The -b option uses a simple and fast bubble partitioning algorithm which will not necessarily yield well-balanced subdomains. The -p option uses a more complex and slower recursive bisection algorithm which is optimised to yield well-balanced subdomains.
Example: the GfsSimulation is 2D and made of 3 GfsBoxes
1- If only one processor is to be used, no parallelisation is required. We use the usual command line:
% gerris2D simulation.gfs
2- If 2 processors are to be used: since the simulation is made of 3 GfsBoxes, 2 solutions can be considered:
- Either the 3 GfsBoxes are distributed between 2 subdomains, which are bound to be one subdomain of 1 GfsBox and one of 2 GfsBoxes. This can be done by:
% gerris2D -b 2 simulation.gfs > parallelsimulation.gfs
then the simulation can be started using:
% mpirun -np 2 gerris2D parallelsimulation.gfs
- Or, if we want a better balance between the sizes of the 2 subdomains, it is possible to split the simulation once and then reassemble it.
The 3 GfsBoxes can be split once, which would create 3*4 = 12 GfsBoxes:
% gerris2D -s 1 simulation.gfs > splitsimulation.gfs
then the 12-GfsBox simulation can be partitioned into 2 groups of 6 GfsBoxes, where the same pid is given to the GfsBoxes of the same subdomain:
% gerris2D -b 2 splitsimulation.gfs > parallelsimulation.gfs
The simulation is still started in the same way:
% mpirun -np 2 gerris2D parallelsimulation.gfs
3- If 4 processors are to be used, then the domain has to be split anyway.
The 3 GfsBoxes can be split once, which would create 3*4 = 12 GfsBoxes:
% gerris2D -s 1 simulation.gfs > splitsimulation.gfs
then the 12-GfsBox simulation can be partitioned into 4 groups of 3 GfsBoxes, where the same pid is given to the GfsBoxes of the same subdomain:
% gerris2D -b 4 splitsimulation.gfs > parallelsimulation.gfs
The simulation is still started in the same way:
% mpirun -np 4 gerris2D parallelsimulation.gfs
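The three cases of this example follow the same recipe, which can be sketched as a small shell function. This is only an illustration: plan_parallel is a hypothetical name, the function prints the commands instead of running them, it assumes a 2D simulation (each split multiplies the number of GfsBoxes by 4), at least one GfsBox, and a plain file name in the current directory. It only uses the -s and -b options described above.

```shell
# plan_parallel FILE NBOXES NPROCS
# Print (without running them) the commands needed to run the 2D
# simulation FILE, made of NBOXES GfsBoxes, on NPROCS processes.
# Hypothetical helper, not part of Gerris.
plan_parallel() {
    file=$1 boxes=$2 nprocs=$3

    # A single process needs no partitioning.
    if [ "$nprocs" -le 1 ]; then
        echo "gerris2D $file"
        return 0
    fi

    # Split until there is at least one GfsBox per process
    # (assumption: each 2D split multiplies the number of GfsBoxes by 4).
    splits=0
    while [ "$boxes" -lt "$nprocs" ]; do
        boxes=$((boxes * 4))
        splits=$((splits + 1))
    done

    input=$file
    if [ "$splits" -gt 0 ]; then
        echo "gerris2D -s $splits $file > split$file"
        input=split$file
    fi

    # Group the GfsBoxes into NPROCS subdomains, then launch.
    echo "gerris2D -b $nprocs $input > parallel$file"
    echo "mpirun -np $nprocs gerris2D parallel$file"
}

plan_parallel simulation.gfs 3 4
```

Called with 3 GfsBoxes and 4 processes, it prints the three commands of case 3. Note that it only splits when there are fewer GfsBoxes than processes; splitting further to improve the balance, as in the second solution of case 2, remains a manual choice.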
Dynamic load-balancing
When adaptive mesh refinement is used, the number of cells in each subdomain changes during the course of the simulation. If the sizes of the subdomains are not adjusted, some processors end up working much harder than others, which leads to inefficient parallelisation. It is then necessary to "rebalance" the simulation. This is done using the GfsEventBalance object. Note that in this case the quality of the initial partition does not matter much, as the simulation will be rebalanced regularly anyway, so using the simpler and faster -b option to create the initial partition is adequate.
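To get a feeling for the cost of an unbalanced partition, one can compare the average number of cells per subdomain with the largest one. The sketch below rests on a simplifying assumption: the time per step is taken to be proportional to the number of cells on the busiest process, so the parallel efficiency is roughly mean/max. The function name efficiency and the cell counts are made up for illustration, and this is not the criterion used internally by GfsEventBalance.

```shell
# efficiency COUNT...
# Print mean(COUNT) / max(COUNT) with two decimals, where each COUNT is
# the number of cells in one subdomain. A value of 1.00 means perfectly
# balanced. Illustrative helper only, not part of Gerris.
efficiency() {
    printf '%s\n' "$@" |
        awk '{ sum += $1; if ($1 > max) max = $1 }
             END { if (NR > 0 && max > 0) printf "%.2f\n", sum / NR / max }'
}

efficiency 1000 1000 1000 1000   # balanced partition: prints 1.00
efficiency 4000 1000 1000 2000   # one overloaded subdomain: prints 0.50
```

In the second case the three lighter processes spend about half of their time waiting for the busiest one, which is the situation that regular rebalancing is meant to avoid.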