This page was generated from unit-5a.1-mpi/poisson_mpi.ipynb.

# 5.1 Poisson Equation in Parallel¶

NGSolve can be executed on a cluster using the MPI message passing interface. You can download poisson_mpi.py and run it as

mpirun -np 4 python3 poisson_mpi.py

This computes and saves a solution which we can visualize it with drawsolution.py

netgen drawsolution.py

For proper parallel execution, Netgen/NGSolve must be configured with `-DUSE_MPI=ON`

. Recent binaries for Linux and Mac are built with MPI support (?). If you are unsure, when your Netgen/NGSolve supports MPI, look for outputs like "Including MPI version 3.1" during Netgen startup.

## MPI-parallel execution using ipyparallel¶

For the MPI jupyter-tutorials we use `ipyparallel`

module. Please consult https://ipyparallel.readthedocs.io for installation.

We start an MPI server:

```
[1]:
```

```
from ipyparallel import Cluster
c = await Cluster(engines="mpi").start_and_connect(n=4, activate=True)
```

```
Starting 4 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
```

We use mpi4py https://mpi4py.readthedocs.io/ for issuing MPI calls from Python. The %%px syntax magic causes parallel execution of that cell:

```
[2]:
```

```
%%px
from mpi4py.MPI import COMM_WORLD as comm
print (comm.rank, comm.size)
```

```
[stdout:0] 0 4
```

```
[stdout:1] 1 4
```

```
[stdout:2] 2 4
```

```
[stdout:3] 3 4
```

The *master* process (rank==0) generates the mesh, and distributes it within the group of processes defined by the *communicator*. All other ranks receive a part of the mesh. The function mesh.GetNE(VOL) returns the local number of elements:

```
[3]:
```

```
%%px
from ngsolve import *
from netgen.occ import unit_square
if comm.rank == 0:
mesh = Mesh(unit_square.GenerateMesh(maxh=0.1).Distribute(comm))
else:
mesh = Mesh(netgen.meshing.Mesh.Receive(comm))
print (mesh.GetNE(VOL))
```

```
[stdout:2] 81
```

```
[stdout:0] 0
```

```
[stdout:3] 76
```

```
[stdout:1] 75
```

We can define spaces, bilinear / linear forms, and gridfunctions in the same way as in sequential mode. But now, the degrees of freedom are distributed on the cluster following the distribution of the mesh. The finite element spaces define how the dofs match together.

```
[4]:
```

```
%%px
fes = H1(mesh, order=3, dirichlet=".*")
u,v = fes.TnT()
a = BilinearForm(grad(u)*grad(v)*dx)
pre = Preconditioner(a, "local")
a.Assemble()
f = LinearForm(1*v*dx).Assemble()
gfu = GridFunction(fes)
from ngsolve.krylovspace import CGSolver
inv = CGSolver(a.mat, pre.mat, printrates=comm.rank==0, maxiter=200, tol=1e-8)
gfu.vec.data = inv*f.vec
```

```
[stdout:0] CG iteration 1, residual = 0.05344510152082692
CG iteration 2, residual = 0.07335467296659272
CG iteration 3, residual = 0.0610674694260373
CG iteration 4, residual = 0.04965686429236397
CG iteration 5, residual = 0.0452518184441288
CG iteration 6, residual = 0.030277964847912075
CG iteration 7, residual = 0.025794422842119066
CG iteration 8, residual = 0.017216403495970298
CG iteration 9, residual = 0.013130605571425603
CG iteration 10, residual = 0.011297762804869443
CG iteration 11, residual = 0.007418438464791829
CG iteration 12, residual = 0.0035092287200178354
CG iteration 13, residual = 0.0019971399098125554
CG iteration 14, residual = 0.0013761297598940663
CG iteration 15, residual = 0.000903922060558815
CG iteration 16, residual = 0.0006240512390656086
CG iteration 17, residual = 0.00044459028698273437
CG iteration 18, residual = 0.0003262241057823474
CG iteration 19, residual = 0.00020591309657151647
CG iteration 20, residual = 0.00015650517646013305
CG iteration 21, residual = 0.00010903505272621296
CG iteration 22, residual = 8.232416345564249e-05
CG iteration 23, residual = 6.313011393893972e-05
CG iteration 24, residual = 4.178559306769995e-05
CG iteration 25, residual = 2.6252835270900835e-05
CG iteration 26, residual = 1.8512831482037585e-05
CG iteration 27, residual = 1.3059431765666551e-05
CG iteration 28, residual = 8.758234489197777e-06
CG iteration 29, residual = 6.0270880532180526e-06
CG iteration 30, residual = 3.7875628716913018e-06
CG iteration 31, residual = 2.2069510885417045e-06
CG iteration 32, residual = 1.8184727625440557e-06
CG iteration 33, residual = 1.478287578601823e-06
CG iteration 34, residual = 9.112034138773395e-07
CG iteration 35, residual = 4.883620238470982e-07
CG iteration 36, residual = 3.038142976567947e-07
CG iteration 37, residual = 2.0211650398256315e-07
CG iteration 38, residual = 1.3967366378247875e-07
CG iteration 39, residual = 1.0549244104497372e-07
CG iteration 40, residual = 7.33917909937771e-08
CG iteration 41, residual = 4.690200695356063e-08
CG iteration 42, residual = 2.8297041322593687e-08
CG iteration 43, residual = 1.8878461321128183e-08
CG iteration 44, residual = 1.4036272475489988e-08
CG iteration 45, residual = 9.18169543187263e-09
CG iteration 46, residual = 5.659420341758749e-09
CG iteration 47, residual = 3.2619329154731113e-09
CG iteration 48, residual = 2.319436388138599e-09
CG iteration 49, residual = 1.6791808348364218e-09
CG iteration 50, residual = 1.100074455943685e-09
CG iteration 51, residual = 6.905317008951446e-10
CG iteration 52, residual = 4.4057048090158657e-10
```

Parallel pickling allows to serialize the distributed solution and transfer it to the client. The process with rank=0 gets the whole mesh and computed solution, all other processes get the local parts of the mesh and solution:

```
[5]:
```

```
gfu = c[:]["gfu"]
```

We can now draw the whole solution using the the master process’s `gfu[0]`

.

```
[6]:
```

```
from ngsolve.webgui import Draw
Draw (gfu[0]);
```

Drawing `gfu[n]`

will draw only part of the solution that the process with rank=`n`

possesses.

```
[7]:
```

```
Draw (gfu[3]);
```

We can also visualize the sub-domains obtained by the automatic partitioning, without using any computed solution, as follows.

```
[8]:
```

```
%%px
fesL2 = L2(mesh, order=0)
gfL2 = GridFunction(fesL2)
gfL2.vec.local_vec[:] = comm.rank
```

```
[9]:
```

```
gfL2 = c[:]["gfL2"]
Draw (gfL2[0]);
```