An iterative and parallel solver

11.2. An iterative and parallel solver#

We consider the same setting, i.e. a flow over a NACA2412 airfoil (naca_geometry.py), but in 3 dimensions. Since the number of unknowns increases significantly we want to use iterative schemes in the following. We discuss three different preconditioning approaches.

11.2.1. MinRes with block(-diagonal) preconditioners#

In a first version we want to use a preconditioned MinRes or GMRes Krylov subspace method. As motivation note that we can decompose our finite element matrix as follows

\[\begin{split} K = \begin{pmatrix} A & B^T \\ B & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ BA^{-1} & I \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & -S \end{pmatrix} \begin{pmatrix} I & A^{-1}B^T \\ 0 & I \end{pmatrix}, \end{split}\]

where \(A, B\) are the finite-element matrices of the bilinear forms \(a(\cdot, \cdot)\) and \(b(\cdot, \cdot)\), respectively, and \(S = B A^{-1} B^T\) is the pressure Schur complement. This factorization, considering preconditioners \(\hat A \simeq A\) and \(\hat S \simeq S\) for which computationally efficient applications of the inverse are available, motivates to define preconditioners (below we also discuss another block preconditioner) for the whole system via

\[\begin{split} \hat K_{block} = \begin{pmatrix} \hat A & 0 \\ 0 & \hat S \end{pmatrix}. \end{split}\]

For the Stokes equations a good choice for \(\hat S\) can be motivated by considering \(B\) as the the discrete divergence operator and \(A\) as the (\(\nu\)-scaled) discrete Laplacian and \(B^T\) as the gradient operator. Then

\[B A^{-1} B^T \hat{=} \operatorname{div}_h \nu^{-1} \Delta_h^{-1} \nabla_h,\]

which can hence be interpreted of being approximately the discrete operator to the identity (scaled with \(\nu^{-1}\)). Hence, a good choice for \(\hat S\) is given by the finite element matrix of the with \(\nu^{-1}\) scaled mass bilinear form for the pressures, i.e.

\[ \hat s(p_h, q_h) \approx \frac{1}{\nu}\int_\Omega p_h \, q_h, \]

where (depending on the approximation space \(Q_h\)) one might use an approximation of the integral, e.g. mass-lumping, such that the application of \(\hat S^{-1}\) is efficient.

Note

Note that the spectral equivalence for the pressure Schur complement is based on the LBB-condition of the Stokes equations. Correspondingly the equivalence scales with the (inverse) LBB-constant that could potentially spoil the convergence of the iterative method

We want to solve the system in parallel. For this we start a corresponding cluster.

# predefined flags, number of ranks and available memory
from commonCFD import *
from ipyparallel import Cluster
MPI_RANKS = min(4,MPI_RANKS) # increase if problem increases (and you have more resources)
c = await Cluster(engines="mpi").start_and_connect(n=MPI_RANKS, activate=True)

Starting 4 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>

%%px
from commonCFD import *
from mpi4py import MPI
from ngsolve import *
from netgen.occ import *
# from ngsolve.krylovspace import CGSolver
from ngsolve.krylovspace import MinResSolver, GMResSolver, BramblePasciakCG
ngsglobals.numthreads=1

import ngsolve.ngs2petsc as n2p
import petsc4py.PETSc as psc

from naca_geometry import *

geom = OCCGeometry(occ_naca_profile(type = "2412", depth=2, height=4, angle=4, h=0.05), dim=3) #.GenerateMesh(maxh=0.2,  grading=0.9,comm=MPI.COMM_WORLD)

if (MPI.COMM_WORLD.rank == 0):
    ngmesh = geom.GenerateMesh(maxh=0.2,  grading=0.9)
    Mesh(ngmesh)
    ngmesh = ngmesh.Distribute(MPI.COMM_WORLD)
else:
    ngmesh = netgen.meshing.Mesh.Receive(MPI.COMM_WORLD)

mesh = Mesh(ngmesh)
mesh.ngmesh.SetGeometry(geom)
mesh.Curve(3)

nu = 1e-4

[stdout:0] The occ_naca_profile function was provided by Xaver Mooslechner. Thanks!

Again we consider the Taylor-Hood element. The velocity space is the product of three times the same scalar space. We will exploit this in the setup of the matrices and their preconditioners. To this end we define the Stokes system and the solver considering separate spaces for each component of the velocity. The viscosity matrix \(A\) is block-diagonal w.r.t. to the three components of the velocity:

\[\begin{split} A = \begin{pmatrix} A_s & 0 & 0 \\ 0 & A_s & 0 \\ 0 & 0 & A_s \end{pmatrix}, \end{split}\]

where \(A_s\) denotes the stiffness matrix to the scalar problem (Laplacian).

The preconditioner \(\hat A \stackrel{!}{\approx} A\) is set up through a preconditioner \(\hat A_s \stackrel{!}{\approx} A_s\) for which we consider the petsc4py.PETSc solver (available via the flag "gamg" later) which can be applied for a scalar-valued H1 space.

To compose the global problem from the scalar components in the solvers we make use of the classes BlockMatrix and BlockVector to setup the Stokes system. We can relate VectorH1 functions/vectors with H1 functions/vectors by using the restrictions and embedddings of the NGSolve VectorH1 space.

%%px
dirichlet_bnd = "wall|inlet"
V = VectorH1(mesh, order=2, dirichlet=dirichlet_bnd)
V1 = H1(mesh, order=2, dirichlet=dirichlet_bnd)
Q = H1(mesh, order=1)

if mesh.comm.rank==0:
    print ("ndof = ", V.ndofglobal,'+',Q.ndofglobal,'=',
           V.ndofglobal+Q.ndofglobal)

u, v = V.TnT()
u1,v1 = V1.TnT()
p,q = Q.TnT()

bfa1 = BilinearForm(nu * InnerProduct(Grad(u1),Grad(v1))*dx)
bfb = BilinearForm(div(u)*q*dx).Assemble()

hata1 = Preconditioner(bfa1, "gamg")
bfa1.Assemble(); #<- also setups up hata1

[stdout:0] ndof =  171468 + 8289 = 179757

\[\begin{align*} \tilde A = & \begin{pmatrix} \tilde A_s & 0 & 0 \\ 0 & \tilde A_s & 0 \\ 0 & 0 & \tilde A_s \end{pmatrix} \\ = & \underbrace{\begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix}}_{\substack{\texttt{V.embeddings[0]} \\ =\texttt{V.restrictions[0].T}}} \tilde A_s \underbrace{\begin{pmatrix} I & 0 & 0 \end{pmatrix}}_{\substack{\texttt{V.embeddings[0].T} \\ =\texttt{V.restrictions[0]}}} & + \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} \tilde A_s \begin{pmatrix} 0 & I & 0 \end{pmatrix} & + \begin{pmatrix} 0 \\ 0 \\ I \end{pmatrix} \tilde A_s \begin{pmatrix} 0 & 0 & I \end{pmatrix} \end{align*}\]

with \(\tilde{A} \in \{A, \hat A\}\) and \(\tilde A_s \in \{A_s, \hat A_s\}\).

%%px
# Vector-A matrix and preconditioner:
A = sum( [Ri.T@bfa1.mat@Ri for Ri in V.restrictions] )
hatAinv = sum( [Ei@hata1@Ei.T for Ei in V.embeddings])    

As mentioned before, we consider the mass-lumped pressure mass matrix as preconditioner of the pressure Schur complement. This can be done either using a predefined IntegrationRule or adding the diagonal=True flag to the BilinearForm. Note that we prefer to use the latter since this further does not allocate the additional zero entries on the off diagonal and hence allows for a faster application.

%%px
hatS = BilinearForm(1/nu * p*q*dx, diagonal=True).Assemble()
hatSinv = hatS.mat.Inverse()

Setup the block system:

%%px
B = bfb.mat
K = BlockMatrix([[A,B.T], [B, None]])

F = LinearForm(V).Assemble()
G = LinearForm(Q).Assemble()

rhsvec = BlockVector([F.vec, G.vec])

We first consider the block diagonal precondtioner.

%%px
hatK_block_inv = BlockMatrix([[hatAinv, None],[None, hatSinv]])

Set boundary conditions and homogenize the linear system: Need to solve for \((u,p) = (u_0 + u_{\text{bnd}}, p)\), s.t.

\[\begin{split} K \cdot \begin{pmatrix} u_0 \\ p \end{pmatrix} = \begin{pmatrix} F \\ G \end{pmatrix} - K \cdot \begin{pmatrix} u_{\text{bnd}} \\ 0 \end{pmatrix} = \texttt{rhsvec}. \end{split}\]

%%px

gfu = GridFunction(V)
gfp = GridFunction(Q)

uin = CF((1,0,0))

gfu.Set(uin, definedon = mesh.Boundaries("inlet"))

sol_vec = BlockVector([gfu.vec, gfp.vec])

rhsvec.data = -K* sol_vec

%%px

solver = MinResSolver(mat = K, pre = hatK_block_inv , maxiter=500, printrates='\r' if mesh.comm.rank==0 else False, tol=1e-8)
sol_vec.data += solver * rhsvec

[stdout:0] LinearSolver converged in 415 iterations to residual 1.0623732258990675e-09

gfu = c[:]["gfu"]
gfp = c[:]["gfp"]

from ngsolve.webgui import Draw

# Draw(gfp[0], draw_vol=False)
#Draw(gfu[0], "u");

Let’s build some fieldlines:

N = 10 
p = [(-1+4*i/N,-2+4*j/N,2*k/N) for i in range(1,2*N) for j in range(1,2*N) for k in range(1,N)]

fieldlines = gfu[0]._BuildFieldLines(gfu[0].space.mesh, p, num_fieldlines=N**3//5, randomized=True, length=0.3)

clipping = { "clipping" : { "y":0, "z":-1} }
Draw(gfu[0], gfu[0].space.mesh,  "X", draw_vol=False, draw_surf=True, objects=[fieldlines], \
     min=0, max=1, autoscale=False, settings={"Objects": {"Surface": True}}, **clipping);

11.2.2. Braess-Sarazin block preconditioner with GMRES#

Recall the exact decomposition:

\[\begin{split} K = \begin{pmatrix} A & B^T \\ B & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ BA^{-1} & I \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & -S \end{pmatrix} \begin{pmatrix} I & A^{-1}B^T \\ 0 & I \end{pmatrix}. \end{split}\]

Previously we took the middle matrix and replaced \(A\) and \(S\) with preconditioners to obtain the overall block preconditioner.

Instead, an alternative block preconditioner is obtained by including the other two parts of the decomposition with \(A\) replaced by \(\hat A\):

\[\begin{align*} \hat K_{bs} = \begin{pmatrix} \hat A & B^T \\ B & B \hat A^{-1} B^T - \hat S \end{pmatrix} & = \begin{pmatrix} I & 0 \\ B\hat A^{-1} & I \end{pmatrix} \begin{pmatrix} \hat A & 0 \\ 0 & -\hat S \end{pmatrix} \begin{pmatrix} I & \hat A^{-1}B^T \\ 0 & I \end{pmatrix} \\ &= \begin{pmatrix} I & 0 \\ B\hat A^{-1} & I \end{pmatrix} \begin{pmatrix} \hat A & B^T \\ 0 & -\hat S \end{pmatrix}. \end{align*}\]

Note, that this preconditioner can only be used with the GMRes solver since (generally depending on the system) it is not SPD.

Note

For more details on the choice \(K_{bs}\) we refer to “An efficient smoother for the Stokes problem” by D. Braess and R. Sarazin, Applied Numerical Mathematics. https://doi.org/10.1016/S0168-9274(96)00059-1”

In the case of \(\hat K_{bs}\), note that solving the system

\[\begin{split} \hat K_{bs} \begin{pmatrix} \boldsymbol w \\ \boldsymbol q \end{pmatrix} = \begin{pmatrix} \boldsymbol f \\ \boldsymbol g \end{pmatrix}, \end{split}\]

reduces to solving

\[\begin{align*} &\begin{pmatrix} I & 0 \\ B\hat A^{-1} & I \end{pmatrix} \begin{pmatrix} \hat A & B^T \\ 0 & -\hat S \end{pmatrix} \begin{pmatrix} \boldsymbol w \\ \boldsymbol q \end{pmatrix} = \begin{pmatrix} \boldsymbol f \\ \boldsymbol g \end{pmatrix} \\ \Rightarrow & \begin{pmatrix} \hat A & B^T \\ 0 & -\hat S \end{pmatrix} \begin{pmatrix} \boldsymbol w \\ \boldsymbol q \end{pmatrix} = \begin{pmatrix} I & 0 \\ -B\hat A^{-1} & I \end{pmatrix} \begin{pmatrix} \boldsymbol f \\ \boldsymbol g \end{pmatrix} = \begin{pmatrix} \boldsymbol f \\ \boldsymbol g - B \underbrace{\hat A^{-1} \boldsymbol f}_{\hat w} \end{pmatrix}, \end{align*}\]

and thus

\[\begin{align*} \hat A \boldsymbol{\hat w} &= \boldsymbol f,\\ \hat S \boldsymbol q &= B \boldsymbol{\hat w - g},\\ \hat A \boldsymbol w &= \boldsymbol f - B^T \boldsymbol q. \end{align*}\]

To implement this we define a new class BSPC where we overload the multiplication accordingly.

%%px
from ngsolve import BaseMatrix

class BSPC(BaseMatrix):
    def __init__(self, M, Ahat_inv, Shat_inv):
        super(BSPC, self).__init__()
        self.M = M
        self.A, self.B, self.BT = M[0,0], M[1,0], M[0,1]
        self.Ahat_inv = Ahat_inv
        self.Shat_inv = Shat_inv
        self.mBTSg2 = self.A.CreateColVector()
        self.g2 = self.Shat_inv.CreateColVector()
        self.xtemp = self.M.CreateColVector()
    def IsComplex(self):
        return False
    def Height(self):
        return self.M.height
    def Width(self):
        return self.M.width
    def CreateColVector(self):
        return self.M.CreateColVector()
    def CreateRowVector(self):
        return self.M.CreateRowVector()
    def Mult(self, b, x):
        f, g = b[0], b[1]
        x[0].data = self.Ahat_inv * f
        self.g2.data = g - self.B * x[0]
        x[1].data = - self.Shat_inv * self.g2
        self.mBTSg2.data = self.BT * x[1]
        x[0].data -= self.Ahat_inv * self.mBTSg2
    def MultTrans(self, b, x):
        self.Mult(b, x)
    def MultAdd(self, scal, b, x):
        self.Mult(b, self.xtemp)
        x.data += scal * self.xtemp
    def MultTransAdd(self, scal, b, x):
        self.MultAdd(scal, b, x)

%%px 
hatK_sz_inv = BSPC(M = K, Ahat_inv = hatAinv, Shat_inv = hatSinv)
solver = GMResSolver(mat = K, pre = hatK_sz_inv , maxiter=500, \
                    printrates='\r' if mesh.comm.rank==0 else False, tol=1e-8)
sol_vec.data += solver * rhsvec

[stdout:0] GMRes converged in 77 iterations to residual 5.592831682661148e-07

11.2.3. Bramble Pasciak conjugate gradient method#

Consider again a preconditioner \(\hat A\) for the matrix \(A\) such that the spectral equivalence

\[ \alpha_0 (\hat A \boldsymbol u,\boldsymbol v ) \le (A \boldsymbol u, \boldsymbol v) \le \alpha_1 (\hat A \boldsymbol u, \boldsymbol v ), \]

holds with \(\alpha_1 < 1\). Then, multiplying the system

\[\begin{split} K \begin{pmatrix} \boldsymbol u \\ \boldsymbol p \end{pmatrix} = \begin{pmatrix} \boldsymbol f \\ \boldsymbol g \end{pmatrix}, \end{split}\]

with

\[\begin{split} \begin{pmatrix} \hat A^{-1} & 0 \\ B \hat A^{-1} & -I \end{pmatrix}, \end{split}\]

from the left, we obtain the reformulated system

\[\begin{split} \mathcal{M} \begin{pmatrix} \boldsymbol u \\ \boldsymbol p \end{pmatrix} = \begin{pmatrix} {\hat A}^{-1} A & {\hat A}^{-1} B^T \\ B {\hat A}^{-1} (A - {\hat A}) & B{\hat A}^{-1}B^T \end{pmatrix} \begin{pmatrix} \boldsymbol u \\ \boldsymbol p \end{pmatrix} = \begin{pmatrix} {\hat A}^{-1} \boldsymbol f \\ B {\hat A}^{-1} \boldsymbol f - \boldsymbol g, \end{pmatrix}. \end{split}\]

The main purpose for this reformulation is that with the inner product

\[\begin{split} \Big[ \begin{pmatrix} \boldsymbol u \\ \boldsymbol p \end{pmatrix} , \begin{pmatrix} \boldsymbol v \\ \boldsymbol q \end{pmatrix} \Big] = (A \boldsymbol u, v) - (\hat A \boldsymbol u, \boldsymbol v) + (\boldsymbol p, \boldsymbol q), \end{split}\]

one can show that \(\mathcal{M}\) is symmetric positive definite, which allows to consider the conjugate gradient method.

Warning

Note, that the property \(\alpha_1 < 1\) is crucial for this and therefor the BramblePasciakCG solver considers an initial eigenvalue problem to derive the appropriate scaling of the preconditioner \(\hat A\) beforehand.

Note

For more details on the BPCG solver see “J. H. Bramble and J. E. Pasciak, A preconditioning technique for indefinite systems resulting from mixed approximations of elliptic problems, Math. Comp.”

The inputs for the BramblePasciakCG solver are not the BlockVector and the BlockMatrix that we defined before, but rather the diffusion matrix for the velocity A and the matrix for the constraint B as well as the right hand side vectors separately.

%%px
sol = BramblePasciakCG (A=A, B=B, C=None, f=rhsvec[0], g=rhsvec[1], \
                preA=hatAinv, preS=hatSinv, maxit=500, 
                printrates='\r' if mesh.comm.rank==0 else False, tol = 1e-8)

[stdout:0] lammin/lammax =  0.07687681116600281 / 0.9990994992630187
Iteration 229 err= 1.1709907775959723e-08     

c.shutdown(hub=True)

An iterative and parallel solver

Contents

11.2. An iterative and parallel solver#

11.2.1. MinRes with block(-diagonal) preconditioners#

11.2.2. Braess-Sarazin block preconditioner with GMRES#

11.2.3. Bramble Pasciak conjugate gradient method#