This page was generated from unit-5a.1-mpi/poisson_mpi.ipynb.
5.1 Poisson Equation in Parallel¶
NGSolve can be executed on a cluster using the MPI message passing interface. You can download poisson_mpi.py and run it as
mpirun -np 4 python3 poisson_mpi.py
This computes and saves a solution which we can visualize it with drawsolution.py
netgen drawsolution.py
For proper parallel execution, Netgen/NGSolve must be configured with -DUSE_MPI=ON
. Recent binaries for Linux and Mac are built with MPI support (?). If you are unsure, when your Netgen/NGSolve supports MPI, look for outputs like “Including MPI version 3.1” during Netgen startup.
MPI-parallel execution using ipyparallel¶
For the MPI jupyter-tutorials we use ipyparallel
module. Please consult https://ipyparallel.readthedocs.io for installation.
On my notebook I have created the profile once by the command
ipython profile create –parallel –profile=mpi
Since I have only two physical cores, I edited the file .ipython/profile_mpi/ipcluster_config.py
to allow for oversubscription:
c.MPILauncher.mpi_args = [‘–oversubscribe’]
I start the cluster via
ipcluster start –engines=MPI -n 4 –profile=mpi
In jupyter, we can then connect to the cluster via a Client
object of ipyparallel.
[1]:
from ipyparallel import Client
c = Client(profile='mpi')
c.ids
[1]:
[0, 1, 2, 3]
We use mpi4py https://mpi4py.readthedocs.io/ for issuing MPI calls from Python. The %%px syntax magic causes parallel execution of that cell:
[2]:
%%px
from mpi4py import MPI
comm = MPI.COMM_WORLD
print (comm.rank, comm.size)
[stdout:0] 0 4
[stdout:1] 1 4
[stdout:2] 2 4
[stdout:3] 3 4
The master process (rank==0) generates the mesh, and distributes it within the group of processes defined by the communicator. All other ranks receive a part of the mesh. The function mesh.GetNE(VOL) returns the local number of elements:
[3]:
%%px
from ngsolve import *
from netgen.geom2d import unit_square
if comm.rank == 0:
mesh = Mesh(unit_square.GenerateMesh(maxh=0.1).Distribute(comm))
else:
mesh = Mesh(netgen.meshing.Mesh.Receive(comm))
print (mesh.GetNE(VOL))
[stdout:0] 0
[stdout:1] 76
[stdout:2] 81
[stdout:3] 75
We can define spaces, bilinear / linear forms, and gridfunctions in the same way as in sequential mode. But now, the degrees of freedom are distributed on the cluster following the distribution of the mesh. The finite element spaces define how the dofs match together.
[4]:
%%px
fes = H1(mesh, order=3, dirichlet=".*")
u,v = fes.TnT()
a = BilinearForm(grad(u)*grad(v)*dx)
pre = Preconditioner(a, "local")
a.Assemble()
f = LinearForm(1*v*dx).Assemble()
gfu = GridFunction(fes)
from ngsolve.krylovspace import CGSolver
inv = CGSolver(a.mat, pre.mat, printing=comm.rank==0, maxsteps=200, tol=1e-8)
gfu.vec.data = inv*f.vec
[stdout:0]
iteration 0 error = 0.0531207003733573
iteration 1 error = 0.07374408660622353
iteration 2 error = 0.06613575382258267
iteration 3 error = 0.050564282985203636
iteration 4 error = 0.048118728884873806
iteration 5 error = 0.03171133576546557
iteration 6 error = 0.026339954640005568
iteration 7 error = 0.018867600495310295
iteration 8 error = 0.013856359824532939
iteration 9 error = 0.011508956583247314
iteration 10 error = 0.008869710488825623
iteration 11 error = 0.004854287485627618
iteration 12 error = 0.0022162679463868917
iteration 13 error = 0.0015615896552425461
iteration 14 error = 0.0011065170423390227
iteration 15 error = 0.0007171571458209736
iteration 16 error = 0.0005251131240243119
iteration 17 error = 0.0003498383890375572
iteration 18 error = 0.00022463539798289107
iteration 19 error = 0.0001592318115618978
iteration 20 error = 0.00012412243193313203
iteration 21 error = 8.768140471147995e-05
iteration 22 error = 5.820850555540186e-05
iteration 23 error = 4.0545678293521785e-05
iteration 24 error = 3.0221249909169026e-05
iteration 25 error = 2.2281511256913374e-05
iteration 26 error = 1.5466555984921558e-05
iteration 27 error = 1.0276039906232418e-05
iteration 28 error = 6.856538971345466e-06
iteration 29 error = 4.775824481043984e-06
iteration 30 error = 3.1094353975493638e-06
iteration 31 error = 2.3089701477004328e-06
iteration 32 error = 1.6644528884649143e-06
iteration 33 error = 1.1420780176195701e-06
iteration 34 error = 7.667709514061564e-07
iteration 35 error = 4.84508000920897e-07
iteration 36 error = 3.5489460115407914e-07
iteration 37 error = 2.4269630502781234e-07
iteration 38 error = 1.602381789666031e-07
iteration 39 error = 1.0291374085372436e-07
iteration 40 error = 6.809889912101377e-08
iteration 41 error = 4.6562019125914776e-08
iteration 42 error = 3.428059952981986e-08
iteration 43 error = 2.3419466928295808e-08
iteration 44 error = 1.4564354581502244e-08
iteration 45 error = 8.733217166602041e-09
iteration 46 error = 5.68444400280555e-09
iteration 47 error = 4.13662431175638e-09
iteration 48 error = 2.9795076726748737e-09
iteration 49 error = 2.0124741677440948e-09
iteration 50 error = 1.2237905564761206e-09
iteration 51 error = 7.854711295756273e-10
iteration 52 error = 6.170296480271633e-10
iteration 53 error = 4.909037899676716e-10
Parallel pickling allows to serialize the distributed solution and transfer it to the client. The process with rank=0 gets the whole mesh and computed solution, all other processes get the local parts of the mesh and solution:
[5]:
gfu = c[:]["gfu"]
We can now draw the whole solution using the the master process’s gfu[0]
.
[6]:
from ngsolve.webgui import Draw
Draw (gfu[0])
[6]:
Drawing gfu[n]
will draw only part of the solution that the process with rank=n
possesses.
[7]:
Draw (gfu[3])
[7]:
We can also visualize the sub-domains obtained by the automatic partitioning, without using any computed solution, as follows.
[8]:
%%px
fesL2 = L2(mesh, order=0)
gfL2 = GridFunction(fesL2)
gfL2.vec.local_vec[:] = comm.rank
[9]:
gfL2 = c[:]["gfL2"]
Draw (gfL2[0])
[9]: