This page was generated from unit-5.4-fetidp_edge/feti-dp-iii.ipynb.

5.6.3 FETI-DP in NGSolve III: Using Non-Point Constraints

We show how to use FETI-DP with point- and edge-constraints.

This includes setting up the primal edge-constraints, but not the implementation of the dual-primal space inverse operator.

[1]:
num_procs = '40'
[2]:
from usrmeeting_jupyterstuff import *
[3]:
stop_cluster()
[4]:
start_cluster(num_procs)
connect_cluster()
Waiting for connection file: ~/.ipython/profile_ngsolve/security/ipcontroller-kogler-client.json
connecting ... try:6 succeeded!
[5]:
%%px
from ngsolve import *
import netgen.meshing as ngmeshing
from ngsolve.la import SparseMatrixd, ParallelMatrix, ParallelDofs
from ngsolve.la import FETI_Jump, DISTRIBUTED, CUMULATED

nref = 0

dim=3
ngmesh = ngmeshing.Mesh(dim=dim)
ngmesh.Load('cube.vol')
for l in range(nref):
    ngmesh.Refine()
mesh = Mesh(ngmesh)
comm = MPI_Init()
fes = H1(mesh, order=2, dirichlet='right|top|left')
a = BilinearForm(fes)
u,v = fes.TnT()
a += SymbolicBFI(grad(u)*grad(v))
a.Assemble()
f = LinearForm(fes)
f += SymbolicLFI(x*y*v)
f.Assemble()
pardofs = fes.ParallelDofs()
avg_dof = comm.Sum(fes.ndof) / comm.size
if comm.rank==0:
    print('global,  ndof =', fes.ndofglobal, ', lodofs =', fes.lospace.ndofglobal)
    print('avg DOFs per core: ', avg_dof)
[stdout:31]
global,  ndof = 75403 , lodofs = 10355
avg DOFs per core:  2260.7

A new dual-primal space

If we additionally introduce edge-constraints to the old dual-primal space $V_{\mathrm{DP}}$, we get our new dual-primal space $\tilde{V}_{\mathrm{DP}}$:

$$\tilde{V}_{\mathrm{DP}} = \{u \in V_{\mathrm{DP}} : \{u\}_e \text{ is continuous } \forall e \in \mathcal{E}\}$$

We already have $A_{\mathrm{DP}}$ for the dual-primal space without edge-constraints. We can simply reuse that.

However, the inverse operation, $b \rightarrow A_{\mathrm{DP}}^{-1} b$, is different.

In other words, we have to find a $u \in \tilde{V}_{\mathrm{DP}}$ such that

$$u = \operatorname{argmin}_{v \in \tilde{V}_{\mathrm{DP}}} \left[ \tfrac{1}{2} \langle A v, v \rangle - \langle b, v \rangle \right]$$

In our implementation, instead of using a transformation of basis and explicitly working on $\tilde{V}_{\mathrm{DP}}$, we work in $V_{\mathrm{DP}}$ and solve a saddle point problem.

More precisely, in order for $u$ to be in $\tilde{V}_{\mathrm{DP}}$, there has to be some vector $w \in \mathbb{R}^m$ (with $m = \#\text{subdomain edges}$) with

$$\{u^{(i)}\}_{e_k} = w_k \quad \forall i \text{ such that } e_k \subset \partial\Omega_i$$

This just means that for every edge there is a single value that equals the edge-average of $u$ for all neighbouring subdomains.

Now, with local edge-average matrices $B_p^{(i)}$ and restriction matrices $R^{(i)}$:

$$u \in \tilde{V}_{\mathrm{DP}} \quad \Leftrightarrow \quad u \in V_{\mathrm{DP}} \text{ and } \exists\, w \in \mathbb{R}^m: \; B_p^{(i)} u^{(i)} = R^{(i)} w \quad \forall\, i$$
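As a small concrete example (not part of the notebook): take a single edge $e_1$ with DOFs $\{d_1, d_2, d_3\}$, shared by the two subdomains $\Omega_1$ and $\Omega_2$. With algebraic edge-averages one can take

$$B_p^{(i)} u^{(i)} = \tfrac{1}{3}\left(u^{(i)}_{d_1} + u^{(i)}_{d_2} + u^{(i)}_{d_3}\right), \qquad R^{(1)} = R^{(2)} = (1),$$

so the constraint $B_p^{(i)} u^{(i)} = R^{(i)} w$ forces both subdomains to agree on the single edge-average value $w_1$. (The implementation below drops the factor $\tfrac{1}{3}$ and just fills $B_p$ with ones; that enforces the same coupling, only with $w$ rescaled.)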

We can now incorporate the constraint $ B_p u = w $ by going to a saddle point problem, and have to solve:

$$\begin{pmatrix} A & 0 & B_p^T \\ 0 & 0 & -R^T \\ B_p & -R & 0 \end{pmatrix} \begin{pmatrix} u \\ w \\ \mu \end{pmatrix} = \begin{pmatrix} b \\ 0 \\ 0 \end{pmatrix}$$

We can eliminate all dual DOFs, as well as the Lagrange parameters $\mu$, by locally inverting the matrix blocks

$$\begin{pmatrix} A^{(i)}_{\Delta\Delta} & B_p^{(i)T} \\ B_p^{(i)} & 0 \end{pmatrix}$$

(Note: If we use point-constraints, the $A^{(i)}_{\Delta\Delta}$ are invertible.)
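To make the elimination concrete, here is a sketch of the reordered saddle point system, splitting $u$ into dual and primal parts $u_\Delta$, $u_\pi$ (and using that, as ensured below, the subdomain edges contain no primal DOFs, so $B_p$ only acts on $u_\Delta$):

$$\begin{pmatrix} A_{\Delta\Delta} & B_p^T & A_{\Delta\pi} & 0 \\ B_p & 0 & 0 & -R \\ A_{\pi\Delta} & 0 & A_{\pi\pi} & 0 \\ 0 & -R^T & 0 & 0 \end{pmatrix} \begin{pmatrix} u_\Delta \\ \mu \\ u_\pi \\ w \end{pmatrix} = \begin{pmatrix} b_\Delta \\ 0 \\ b_\pi \\ 0 \end{pmatrix}$$

Eliminating the upper-left $2 \times 2$ block subdomain-by-subdomain (exactly the local blocks above) leaves a Schur complement problem in $(u_\pi, w)$.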

We end up with a global problem for $(u_\pi, w)^T$ of size

#subdomain vertices + #subdomain edges

The exact implementation is a bit tedious.

What we have to supply to be able to use it is:

  • a matrix $B_p^{(i)}$ for each subdomain; we will use simple algebraic averages, in which case this is just a sparse matrix filled with ones

  • ParallelDofs for the space $w$ lives in

Finding the subdomain vertices works mostly as before:

[6]:
%%px
pythonic = True
if pythonic:
    faces = [set(d for d in pardofs.Proc2Dof(p) if d<mesh.nv and fes.FreeDofs()[d] ) for p in pardofs.ExchangeProcs()]
    edges = sorted([tuple(sorted(e)) for e in set(tuple(f1.intersection(f2)) for f1 in faces for f2 in faces if f1 is not f2) if len(e)>1])
    vertices = sorted(set([ v for e1 in edges for e2 in edges if e1 is not e2 for v in set(e1).intersection(set(e2)) ]))
else:
    faces = []
    for p in pardofs.ExchangeProcs():
        faces.append(set(d for d in pardofs.Proc2Dof(p) if d<mesh.nv and fes.FreeDofs()[d]))
    edges = []
    for f1 in faces:
        for f2 in faces:
            if f1 is not f2:
                edge = sorted(tuple(f1.intersection(f2)))
                if len(edge) > 1:
                    if not edge in edges:
                        edges.append(sorted(tuple(edge)))
    vertices = set()
    for e1 in edges:
        for e2 in edges:
            if e1 is not e2:
                vs = set(e1).intersection(set(e2))
                vertices = vertices.union(vs)
    vertices = sorted(vertices)

# make the vertex identification consistent across procs: mark the locally
# found vertices in a distributed vector and cumulate it
vec = f.vec.CreateVector()
vec.local_vec[:] = 0.0
for v in vertices:
    vec.local_vec[v] = 1
vec.SetParallelStatus(DISTRIBUTED)
vec.Cumulate()
vertices = [ v for v in range(mesh.nv) if vec.local_vec[v]!=0 ]

primal_dofs = BitArray([v in vertices for v in range(fes.ndof)]) & fes.FreeDofs()

We only make two small adjustments:

  • we do not want any subdomain vertices to be included in the edges

  • (for consistency) we want neither subdomain vertices nor subdomain edge-DOFs in the faces

[7]:
%%px
# all DOFs that lie on some subdomain edge (this includes the subdomain vertices)
all_e = set.union(*[set(e) for e in edges]) if len(edges) else set()
faces2 = [[v for v in f if v not in all_e] for f in faces]
faces = [f for f in faces2 if len(f)]

# remove the subdomain vertices from the edges
edges2 = [[v for v in e if v not in vertices] for e in edges]
edges = [e for e in edges2 if len(e)]

We still need the same ParallelDofs as before for $V_{\mathrm{DP}}$.

[8]:
%%px
primal_dofs = BitArray([ v in set(vertices) for v in range(fes.ndof) ]) & fes.FreeDofs()
dp_pardofs = pardofs.SubSet(primal_dofs)

nprim = comm.Sum(sum([1 for k in range(fes.ndof) if primal_dofs[k] and comm.rank<pardofs.Dof2Proc(k)[0] ]))
npmin = comm.Min(primal_dofs.NumSet() if comm.rank else nprim)
npavg = comm.Sum(primal_dofs.NumSet())/comm.size
npmax = comm.Max(primal_dofs.NumSet())
if comm.rank==0:
    print('# primal dofs global: ', nprim)
    print('min, avg, max per rank: ', npmin, ' ', npavg, ' ', npmax)
[stdout:31]
# primal dofs global:  122
min, avg, max per rank:  4   12.425   25

Setting up the Edge-Constraint Matrix

Now that we know the edges, it is easy to construct $B_p$ in COO format. Using generator expressions, it can be done in three lines.

[9]:
%%px
ar = [(num_e[0],d,1.0) for num_e in enumerate(edges) for d in num_e[1] ]
rows, cols, vals = [list(x) for x in zip(*ar)] if len(ar) else [[],[],[]]
B_p = SparseMatrixd.CreateFromCOO(rows, cols, vals, len(edges), fes.ndof)
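As a plain-Python illustration (a toy edge list with made-up DOF numbers, not from the run above), the COO triplets produced by the generator expression look like this:

edges = [[3, 4, 5], [7, 8]]                # two local edges
ar = [(num_e[0], d, 1.0) for num_e in enumerate(edges) for d in num_e[1]]
rows, cols, vals = [list(x) for x in zip(*ar)]
print(rows)   # [0, 0, 0, 1, 1]            -> row index = edge number
print(cols)   # [3, 4, 5, 7, 8]            -> column index = DOF number
print(vals)   # [1.0, 1.0, 1.0, 1.0, 1.0]  -> simple 'matrix of ones'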

ParallelDofs for w

This is also pretty simple. w has one DOF for each subdomain edge, and each edge’s DOF is shared by the set of all procs that share all DOFs in the entire edge.

[10]:
%%px
edist_procs = [sorted(set.intersection(*[set(pardofs.Dof2Proc(v)) for v in edge])) for edge in edges]
eavg_pardofs = ParallelDofs(edist_procs, comm)

neavg = comm.Sum(sum([1 for ps in edist_procs if comm.rank<ps[0]]))
neavg_min = comm.Min(len(edges) if comm.rank else neavg)
neavg_avg = comm.Sum(len(edges)) / comm.size
neavg_max = comm.Max(len(edges))
if comm.rank==0:
    print('# edge constraints global: ', neavg)
    print('min, avg, max: ', neavg_min, ' ', neavg_avg, ' ', neavg_max)
[stdout:31]
# edge constraints global:  227
min, avg, max:  9   17.025   29
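To make the Dof2Proc intersection concrete, here is a toy stand-in (made-up DOF numbers and proc lists, not from the run above): an edge's $w$-DOF is shared exactly by the procs that share every DOF of that edge.

dof2proc = {10: [3, 7], 11: [3, 7, 9], 12: [3, 7]}   # toy stand-in for pardofs.Dof2Proc
edge = [10, 11, 12]                                  # the three DOFs of one subdomain edge
dist_procs = sorted(set.intersection(*[set(dof2proc[v]) for v in edge]))
print(dist_procs)   # [3, 7] -> this edge constraint is shared with ranks 3 and 7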

Getting the Dual-Primal Inverse

DPSpace_Inverse needs:

  • the original matrix

  • a freedofs-BitArray

  • a BitArray for the point-constraints

  • the constraint matrix, and paralleldofs for w

  • the inversetype to use for local problems

  • the inversetype to use for the global coarse grid problem

[11]:
%%px
from dd_toolbox import DPSpace_Inverse
A_dp_inv = DPSpace_Inverse(mat=a.mat, freedofs=fes.FreeDofs(), \
                           c_points=primal_dofs, \
                           c_mat=B_p, c_pardofs=eavg_pardofs, \
                           invtype_loc='sparsecholesky', \
                           invtype_glob='masterinverse')

We are not using an explicit transformation of basis, therefore:

Everything else works the same - experiment below!!

[12]:
from usrmeeting_jupyterstuff import *
num_procs = '100'
stop_cluster()
start_cluster(num_procs)
connect_cluster()
Waiting for connection file: ~/.ipython/profile_ngsolve/security/ipcontroller-kogler-client.json
connecting ... try:6 succeeded!
[13]:
%%px
from ngsolve import *
import netgen.meshing as ngmeshing
from ngsolve.la import ParallelMatrix, FETI_Jump, SparseMatrixd, ParallelDofs
from dd_toolbox import FindFEV, DPSpace_Inverse, ScaledMat

def load_mesh(nref=0):
    ngmesh = ngmeshing.Mesh(dim=3)
    ngmesh.Load('cube.vol')
    for l in range(nref):
        ngmesh.Refine()
    return Mesh(ngmesh)

def setup_space(mesh, order=1):
    comm = MPI_Init()
    fes = H1(mesh, order=order, dirichlet='right|top')
    a = BilinearForm(fes)
    u,v = fes.TnT()
    a += SymbolicBFI(grad(u)*grad(v))
    a.Assemble()
    f = LinearForm(fes)
    f += SymbolicLFI(x*y*v)
    f.Assemble()
    avg_dof = comm.Sum(fes.ndof) / comm.size
    if comm.rank==0:
        print('global,  ndof =', fes.ndofglobal, ', lodofs =', fes.lospace.ndofglobal)
        print('avg DOFs per core: ', avg_dof)
    return [fes, a, f]

def setup_FETIDP(fes, a):
    # classify subdomain faces, edges and vertices (uses the globally defined mesh)
    faces, edges, vertices = FindFEV(mesh.dim, mesh.nv, \
                                     fes.ParallelDofs(), fes.FreeDofs())
    # point-constraints: the subdomain vertices become primal DOFs
    primal_dofs = BitArray([ v in set(vertices) for v in range(fes.ndof) ]) & fes.FreeDofs()
    dp_pardofs = fes.ParallelDofs().SubSet(primal_dofs)
    # edge-constraint matrix B_p in COO format: one row of ones per subdomain edge
    ar = [(num_e[0],d,1.0) for num_e in enumerate(edges) for d in num_e[1] ]
    rows, cols, vals = [list(x) for x in zip(*ar)] if len(ar) else [[],[],[]]
    B_p = SparseMatrixd.CreateFromCOO(rows, cols, vals, len(edges), fes.ndof)
    # ParallelDofs for w: one DOF per edge, shared by the procs that share the whole edge
    edist_procs = [sorted(set.intersection(*[set(fes.ParallelDofs().Dof2Proc(v)) for v in edge])) for edge in edges]
    eavg_pardofs = ParallelDofs(edist_procs, comm)
    nprim = comm.Sum(sum([1 for k in range(fes.ndof) if primal_dofs[k] and comm.rank<fes.ParallelDofs().Dof2Proc(k)[0] ]))
    if comm.rank==0:
        print('# of global primal dofs: ', nprim)
    # dual-primal operator and the jump operator B
    A_dp = ParallelMatrix(a.mat.local_mat, dp_pardofs)
    dual_pardofs = fes.ParallelDofs().SubSet(BitArray(~primal_dofs & fes.FreeDofs()))
    B = FETI_Jump(dual_pardofs, u_pardofs=dp_pardofs)
    if comm.rank==0:
        print('# of global multipliers = :', B.col_pardofs.ndofglobal)
    # dual-primal space inverse with point- and edge-constraints
    A_dp_inv = DPSpace_Inverse(mat=a.mat, freedofs=fes.FreeDofs(), \
                               c_points=primal_dofs, \
                               c_mat=B_p, c_pardofs=eavg_pardofs, \
                               invtype_loc='sparsecholesky', \
                               invtype_glob='masterinverse')
    F = B @ A_dp_inv @ B.T
    # preconditioner: eliminate interior DOFs locally, with multiplicity scaling
    innerdofs = BitArray([len(fes.ParallelDofs().Dof2Proc(k))==0 for k in range(fes.ndof)]) & fes.FreeDofs()
    A = a.mat.local_mat
    Aiinv = A.Inverse(innerdofs, inverse='sparsecholesky')
    scaledA = ScaledMat(A, [1.0/(1+len(fes.ParallelDofs().Dof2Proc(k))) for k in range(fes.ndof)])
    scaledBT = ScaledMat(B.T, [1.0/(1+len(fes.ParallelDofs().Dof2Proc(k))) for k in range(fes.ndof)])
    Fhat = B @ scaledA @ (IdentityMatrix() - Aiinv @ A) @ scaledBT
    return [A_dp, A_dp_inv, F, Fhat, B, scaledA, scaledBT]

def prep(B, Ainv, f):
    # right-hand side of the multiplier system (writes into the globally defined vector 'rhs')
    rhs.data = (B @ Ainv) * f.vec
    return rhs

def solve(mat, pre, rhs, sol):
    # preconditioned CG for the Lagrange multipliers, timed with MPI wall-clock time
    t = comm.WTime()
    solvers.CG(mat=mat, pre=pre, rhs=rhs, sol=sol, \
               maxsteps=100, printrates=comm.rank==0, tol=1e-6)
    return comm.WTime() - t

def post(B, Ainv, gfu, lam):
    # recover u from the multipliers and check the jump across subdomain interfaces
    hv = B.CreateRowVector()
    hv.data = f.vec - B.T * lam
    gfu.vec.data = Ainv * hv
    jump = lam.CreateVector()
    jump.data = B * gfu.vec
    norm_jump = Norm(jump)
    if comm.rank==0:
        print('\nnorm jump u: ', norm_jump)
[14]:
%%px
comm = MPI_Init()
mesh = load_mesh(nref=1)
fes, a, f = setup_space(mesh, order=2)
A_dp, A_dp_inv, F, Fhat, B, scaledA, scaledBT = setup_FETIDP(fes, a)
rhs = B.CreateColVector()
lam = B.CreateColVector()
prep(B, A_dp_inv, f)
if comm.rank==0:
    print('')
t = solve(F,  Fhat,  rhs, lam)
if comm.rank==0:
    print('\ntime solve: ', t)
    print('dofs per core and second: ', fes.ndofglobal / (t * comm.size))
gfu = GridFunction(fes)
post(B, A_dp_inv, gfu, lam)
[stdout:16]
global,  ndof = 576877 , lodofs = 75403
avg DOFs per core:  6607.56
# of global primal dofs:  399
# of global multipliers = : 87867

it =  0  err =  0.12130848690672913
it =  1  err =  0.02390767825931742
it =  2  err =  0.00825023321167997
it =  3  err =  0.002729895619289229
it =  4  err =  0.0008641121927087105
it =  5  err =  0.0003048442822355854
it =  6  err =  9.712889856412031e-05
it =  7  err =  3.3730326402392145e-05
it =  8  err =  1.0925387588517252e-05
it =  9  err =  3.32252185091653e-06
it =  10  err =  1.067131618337071e-06
it =  11  err =  3.318060576723047e-07

time solve:  0.5276237779762596
dofs per core and second:  10933.491326199415

norm jump u:  2.0337283556173004e-06
[15]:
%%px --target 1
for t in sorted(filter(lambda t:t['time']>0.5, Timers()), key=lambda t:t['time'], reverse=True):
    print(t['name'], ':  ', t['time'])
SparseCholesky<d,d,d>::MultAdd :   1.095705270767212
SparseCholesky<d,d,d>::MultAdd fac1 :   0.6547586917877197
[16]:
%%px
t_chol = filter(lambda t: t['name'] == 'SparseCholesky<d,d,d>::MultAdd', Timers()).__next__()
maxt = comm.Max(t_chol['time'])
if t_chol['time'] == maxt:
    print('timers from rank ', comm.rank, ':')
    for t in sorted(filter(lambda t:t['time']>min(0.3*maxt, 0.5), Timers()), key=lambda t:t['time'], reverse=True):
        print(t['name'], ':  ', t['time'])
[stdout:2]
timers from rank  13 :
SparseCholesky<d,d,d>::MultAdd :   1.214454174041748
SparseCholesky<d,d,d>::MultAdd fac1 :   0.7252676486968994
SparseCholesky<d,d,d>::MultAdd fac2 :   0.47362494468688965
SparseCholesky - total :   0.42598485946655273
[17]:
stop_cluster()