This page was generated from unit-5a.1-mpi/poisson_mpi.ipynb.
5.1 Poisson Equation in Parallel
NGSolve can be executed on a cluster using the MPI message passing interface. You can download poisson_mpi.py and run it as
mpirun -np 4 python3 poisson_mpi.py
This computes and saves a solution, which we can then visualize with drawsolution.py:
netgen drawsolution.py
For proper parallel execution, Netgen/NGSolve must be configured with -DUSE_MPI=ON. Recent binaries for Linux and Mac should come with MPI support. If you are unsure whether your Netgen/NGSolve supports MPI, look for output like “Including MPI version 3.1” during Netgen startup.
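As an additional quick check from the Python side, you can ask mpi4py which MPI library it is linked against. This is a small sketch, not part of the original page; it only confirms that Python sees an MPI installation, not that NGSolve itself was built with MPI:
from mpi4py import MPI
# prints the banner of the MPI implementation (e.g. Open MPI or MPICH) that mpi4py uses;
# this checks the Python side only, not the NGSolve build configuration
print(MPI.Get_library_version())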
MPI-parallel execution using ipyparallel
For the MPI jupyter-tutorials we use the ipyparallel module. Please consult https://ipyparallel.readthedocs.io for installation instructions.
On my notebook I created the profile once with the command
ipython profile create --parallel --profile=mpi
Since I have only two physical cores, I edited the file .ipython/profile_mpi/ipcluster_config.py
to allow for oversubscription:
c.MPILauncher.mpi_args = ['--oversubscribe']
I start the cluster via
ipcluster start --engines=MPI -n 4 --profile=mpi
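When finished, the cluster can be shut down again via
ipcluster stop --profile=mpi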
In Jupyter, we can then connect to the cluster via a Client object of ipyparallel.
[1]:
from ipyparallel import Client
c = Client(profile='mpi')
c.ids
[1]:
[0, 1, 2, 3]
We use mpi4py (https://mpi4py.readthedocs.io/) for issuing MPI calls from Python. The %%px cell magic causes parallel execution of that cell on all engines:
[2]:
%%px
from mpi4py import MPI
comm = MPI.COMM_WORLD
print (comm.rank, comm.size)
[stdout:0] 0 4
[stdout:1] 1 4
[stdout:2] 2 4
[stdout:3] 3 4
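Any mpi4py call can be issued in such a cell. As a small illustration (not part of the original example), here is a collective reduction in which every rank contributes its rank number:
%%px
# allreduce sums the contributions of all ranks and returns the result on every rank,
# here 0+1+2+3 = 6
total = comm.allreduce(comm.rank, op=MPI.SUM)
print("rank", comm.rank, "sees sum of ranks =", total)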
The master process (rank==0) generates the mesh, and distributes it within the group of processes defined by the communicator. All other ranks receive a part of the mesh. The function mesh.GetNE(VOL) returns the local number of elements:
[3]:
%%px
from ngsolve import *
from netgen.geom2d import unit_square
import netgen.meshing   # explicit import, so netgen.meshing.Mesh is available on every rank

if comm.rank == 0:
    mesh = Mesh(unit_square.GenerateMesh(maxh=0.1).Distribute(comm))
else:
    mesh = Mesh(netgen.meshing.Mesh.Receive(comm))
print (mesh.GetNE(VOL))
[stdout:0]
Generate Mesh from spline geometry
Boundary mesh done, np = 40
CalcLocalH: 40 Points 0 Elements 0 Surface Elements
Meshing domain 1 / 1
load internal triangle rules
Surface meshing done
Edgeswapping, topological
Smoothing
Split improve
Combine improve
Smoothing
Edgeswapping, metric
Smoothing
Split improve
Combine improve
Smoothing
Edgeswapping, metric
Smoothing
Split improve
Combine improve
Smoothing
0 call metis 5 ...
metis start
metis complete
Send/Receive mesh
Update mesh topology
Sending nr of elements
Building vertex/proc mapping
Sending Vertices - vertices
Sending Vertices - identifications
Sending Vertices - distprocs
Sending elements
Sending Face Descriptors
Sending Surface elements
Sending Edge Segments
Point-Elements ...
now wait ...
Sending names
wait for names
Clean up local memory
Update mesh topology
send mesh complete
update parallel topology
Update clusters
update parallel topology
[stdout:1]
76
p1: got 0 elements and 76 surface elements
[stdout:2]
81
p2: got 0 elements and 81 surface elements
[stdout:3]
75
p3: got 0 elements and 75 surface elements
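The elements are distributed without overlap, so summing the local counts recovers the size of the global mesh. A small sketch (not part of the original example):
%%px
# the volume elements are distributed without overlap, so the local counts
# add up to the number of elements of the global mesh
ne_global = comm.allreduce(mesh.GetNE(VOL), op=MPI.SUM)
if comm.rank == 0:
    print("global number of elements:", ne_global)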
We can define spaces, bilinear / linear forms, and gridfunctions in the same way as in sequential mode. But now the degrees of freedom are distributed across the cluster, following the distribution of the mesh. The finite element spaces keep track of how the dofs of the sub-domains fit together.
[4]:
%%px
fes = H1(mesh, order=3, dirichlet=".*")
u,v = fes.TnT()
a = BilinearForm(grad(u)*grad(v)*dx)
pre = Preconditioner(a, "local")
a.Assemble()
f = LinearForm(1*v*dx).Assemble()
gfu = GridFunction(fes)
from ngsolve.krylovspace import CGSolver
inv = CGSolver(a.mat, pre.mat, printing=comm.rank==0, maxsteps=200, tol=1e-8)
gfu.vec.data = inv*f.vec
[stdout:0]
assemble VOL element 0/0
WARNING: printing is deprecated, use printrates instead!
WARNING: maxsteps is deprecated, use maxiter instead!
assemble VOL element 0/0
CG iteration 1, residual = 0.0531207003733573
CG iteration 2, residual = 0.07374408660622353
CG iteration 3, residual = 0.06613575382258267
CG iteration 4, residual = 0.050564282985203636
CG iteration 5, residual = 0.048118728884873806
CG iteration 6, residual = 0.03171133576546557
CG iteration 7, residual = 0.026339954640005568
CG iteration 8, residual = 0.018867600495310295
CG iteration 9, residual = 0.013856359824532939
CG iteration 10, residual = 0.011508956583247314
CG iteration 11, residual = 0.008869710488825623
CG iteration 12, residual = 0.004854287485627618
CG iteration 13, residual = 0.0022162679463868917
CG iteration 14, residual = 0.0015615896552425461
CG iteration 15, residual = 0.0011065170423390227
CG iteration 16, residual = 0.0007171571458209736
CG iteration 17, residual = 0.0005251131240243119
CG iteration 18, residual = 0.0003498383890375572
CG iteration 19, residual = 0.00022463539798289107
CG iteration 20, residual = 0.0001592318115618978
CG iteration 21, residual = 0.00012412243193313203
CG iteration 22, residual = 8.768140471147995e-05
CG iteration 23, residual = 5.820850555540186e-05
CG iteration 24, residual = 4.0545678293521785e-05
CG iteration 25, residual = 3.0221249909169026e-05
CG iteration 26, residual = 2.2281511256913374e-05
CG iteration 27, residual = 1.5466555984921558e-05
CG iteration 28, residual = 1.0276039906232418e-05
CG iteration 29, residual = 6.856538971345465e-06
CG iteration 30, residual = 4.775824481043983e-06
CG iteration 31, residual = 3.1094353975493625e-06
CG iteration 32, residual = 2.308970147700432e-06
CG iteration 33, residual = 1.6644528884649138e-06
CG iteration 34, residual = 1.1420780176195693e-06
CG iteration 35, residual = 7.667709514061562e-07
CG iteration 36, residual = 4.845080009208966e-07
CG iteration 37, residual = 3.548946011540788e-07
CG iteration 38, residual = 2.426963050278122e-07
CG iteration 39, residual = 1.602381789666029e-07
CG iteration 40, residual = 1.029137408537242e-07
CG iteration 41, residual = 6.809889912101365e-08
CG iteration 42, residual = 4.656201912591468e-08
CG iteration 43, residual = 3.42805995298198e-08
CG iteration 44, residual = 2.3419466928295755e-08
CG iteration 45, residual = 1.4564354581502226e-08
CG iteration 46, residual = 8.73321716660204e-09
CG iteration 47, residual = 5.684444002805568e-09
CG iteration 48, residual = 4.136624311756414e-09
CG iteration 49, residual = 2.9795076726749184e-09
CG iteration 50, residual = 2.012474167744149e-09
CG iteration 51, residual = 1.2237905564761717e-09
CG iteration 52, residual = 7.854711295756782e-10
CG iteration 53, residual = 6.170296480272248e-10
CG iteration 54, residual = 4.909037899677378e-10
[stdout:1] WARNING: maxsteps is deprecated, use maxiter instead!
[stdout:2] WARNING: maxsteps is deprecated, use maxiter instead!
[stdout:3] WARNING: maxsteps is deprecated, use maxiter instead!
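To see how the degrees of freedom are split, we can inspect the local dof counts. Note that simply summing them over-counts the dimension of the global space, because dofs on inter-process interfaces are stored on every rank that touches them. A small sketch (not part of the original example):
%%px
# fes.ndof counts the dofs stored on this rank; shared interface dofs appear in the
# count of every rank that holds them, so the sum exceeds the global dimension
nd_sum = comm.allreduce(fes.ndof, op=MPI.SUM)
print("rank", comm.rank, ": local ndof =", fes.ndof)
if comm.rank == 0:
    print("sum of local dof counts:", nd_sum)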
Parallel pickling allows us to serialize the distributed solution and transfer it to the client. The process with rank=0 gets the whole mesh and the computed solution, all other processes get their local parts of the mesh and solution:
[5]:
gfu = c[:]["gfu"]
Update mesh topology
Update clusters
Update mesh topology
Update clusters
Update mesh topology
Update clusters
Update mesh topology
Update clusters
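On the client, gfu is now an ordinary Python list with one gathered GridFunction per engine, so for example len(gfu) returns 4 here.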
We can now draw the whole solution using the master process’s gfu[0].
[6]:
from ngsolve.webgui import Draw
Draw (gfu[0])
[6]:
BaseWebGuiScene
Drawing gfu[n] will draw only the part of the solution that the process with rank=n possesses.
[7]:
Draw (gfu[3])
[7]:
BaseWebGuiScene
We can also visualize the sub-domains obtained by the automatic partitioning, without using any computed solution, as follows.
[8]:
%%px
fesL2 = L2(mesh, order=0)
gfL2 = GridFunction(fesL2)
gfL2.vec.local_vec[:] = comm.rank
[9]:
gfL2 = c[:]["gfL2"]
Draw (gfL2[0])
Update mesh topology
Update clusters
Update mesh topology
Update clusters
Update mesh topology
Update clusters
Update mesh topology
Update clusters
[9]:
BaseWebGuiScene