5.4 FETI-DP in NGSolve III: Using Non-Point Constraints¶
We show how to use FETI-DP with point- and edge-constraints.
This includes setting up the primal edge-constraints, but not the implementation of the dual-primal space inverse operator.
[1]:
num_procs = '40'
[2]:
from usrmeeting_jupyterstuff import *
[3]:
stop_cluster()
[4]:
start_cluster(num_procs)
connect_cluster()
Waiting for connection file: ~/.ipython/profile_ngsolve/security/ipcontroller-kogler-client.json
connecting ... try:6 succeeded!
[5]:
%%px
from ngsolve import *
import netgen.meshing as ngmeshing
from ngsolve.la import SparseMatrixd, ParallelMatrix, ParallelDofs
from ngsolve.la import FETI_Jump, DISTRIBUTED, CUMULATED
nref = 0
dim=3
ngmesh = ngmeshing.Mesh(dim=dim)
ngmesh.Load('cube.vol')
for l in range(nref):
ngmesh.Refine()
mesh = Mesh(ngmesh)
comm = MPI_Init()
fes = H1(mesh, order=2, dirichlet='right|top|top|left')
a = BilinearForm(fes)
u,v = fes.TnT()
a += SymbolicBFI(grad(u)*grad(v))
a.Assemble()
f = LinearForm(fes)
f += SymbolicLFI(x*y*v)
f.Assemble()
pardofs = fes.ParallelDofs()
avg_dof = comm.Sum(fes.ndof) / comm.size
if comm.rank==0:
print('global, ndof =', fes.ndofglobal, ', lodofs =', fes.lospace.ndofglobal)
print('avg DOFs per core: ', avg_dof)
[stdout:31]
global, ndof = 75403 , lodofs = 10355
avg DOFs per core: 2260.7
A new dual-primal space¶
If we additionally introduce edge-constraints to the old dual-primal space \(\widetilde{V}_{\scriptscriptstyle DP}\), we get out new dual-primal space :
We already have \(A_{\scriptscriptstyle DP}\) for the dual-primal space without edge-constraints. We can simply reuse that.
However, the inverse operation, \(b\rightarrow A_{\scriptscriptstyle DP}^{-1}b\) is different.
In other words, we have to find a \(u\in \widetilde{V}_{\scriptscriptstyle DP}\), such that
In our implementation, instead of using a transformation of basis and explicitely working on \(\widetilde{V}_{\scriptscriptstyle DP}\), we instead work in \(V_{\scriptscriptstyle DP}\) and solve a saddle point problem.
More preciely, in order for \(u\) to be in \(\widetilde{V}_{\scriptscriptstyle DP}\), there has to be some vector \(w\in\mathbb{R}^m\) (with \(m=\#\text{subdomain edges}\)) with
This just means that for every edge there is a single value that equals the edge-average of \(u\) for all neighbouring subdomains.
Now, with local edge-average matrices \(B^{(i)}_p\) and restriction matrices \(R^{(i)}\)
We can now incorporate the constraint $ B_p u = w $ by going to a saddle point problem, and now have to solve:
We can eliminate all dual DOFs, as well as the lagrange parameters \(\mu\) by locally eliminating the matrix block:
(Note: If we use point-constraints, the \(A^{(i)}_{\Delta\Delta}\) are invertible)
We end up with a global problem for \((u_\pi, w)^T\) of size
#subdomain vertices + #subdomain edges
The exact implementation is a bit tedious.
What we have to supply to be able to use it is: - A matrix \(B_p^{(i)}\) for each subdomain. We will use simple algebraic averages, in which case this is just a sparse matrix filled with ones. - ParallelDofs for the space \(w\)-lives in.
Finding the subdomain vertices works mostly as before:
[6]:
%%px
pythonic = True
if pythonic:
faces = [set(d for d in pardofs.Proc2Dof(p) if d<mesh.nv and fes.FreeDofs()[d] ) for p in pardofs.ExchangeProcs()]
edges = sorted([tuple(sorted(e)) for e in set(tuple(f1.intersection(f2)) for f1 in faces for f2 in faces if f1 is not f2) if len(e)>1])
vertices = sorted(set([ v for e1 in edges for e2 in edges if e1 is not e2 for v in set(e1).intersection(set(e2)) ]))
else:
faces = []
for p in pardofs.ExchangeProcs():
faces.append(set(d for d in pardofs.Proc2Dof(p) if d<mesh.nv and fes.FreeDofs()[d]))
edges = []
for f1 in faces:
for f2 in faces:
if f1 is not f2:
edge = sorted(tuple(f1.intersection(f2)))
if len(edge) > 1:
if not edge in edges:
edges.append(sorted(tuple(edge)))
vertices = set()
for e1 in edges:
for e2 in edges:
if e1 is not e2:
vs = set(e1).intersection(set(e2))
vertices = vertices.union(vs)
vertices = sorted(vertices)
vec = f.vec.CreateVector()
vec.local_vec[:] = 0.0
for v in vertices:
vec.local_vec[v] = 1
vec.SetParallelStatus(DISTRIBUTED)
vec.Cumulate()
vertices = [ v for v in range(mesh.nv) if vec.local_vec[v]!=0 ]
primal_dofs = BitArray([v in vertices for v in range(fes.ndof)]) & fes.FreeDofs()
We only make one small adjustment:
- we do not want any subdomain vertices do be included in the edges
- (for consistency) we want neither subdomain vertices nor subdomain edge-DOFs in the faces
[7]:
%%px
all_e = set.union(*[set(f) for f in faces]) if len(faces) else {}
faces2 = [[v for v in f if not v in all_e] for f in faces]
faces = [f for f in faces2 if len(f)]
edges2 = [[v for v in e if not v in vertices] for e in edges]
edges = [e for e in edges2 if len(e)]
We still need the same ParallelDofs as before for \(V_{\scriptscriptstyle DP}\).
[8]:
%%px
primal_dofs = BitArray([ v in set(vertices) for v in range(fes.ndof) ]) & fes.FreeDofs()
dp_pardofs = pardofs.SubSet(primal_dofs)
nprim = comm.Sum(sum([1 for k in range(fes.ndof) if primal_dofs[k] and comm.rank<pardofs.Dof2Proc(k)[0] ]))
npmin = comm.Min(primal_dofs.NumSet() if comm.rank else nprim)
npavg = comm.Sum(primal_dofs.NumSet())/comm.size
npmax = comm.Max(primal_dofs.NumSet())
if comm.rank==0:
print('# primal dofs global: ', nprim)
print('min, avg, max per rank: ', npmin, ' ', npavg, ' ', npmax)
[stdout:31]
# primal dofs global: 122
min, avg, max per rank: 4 12.425 25
Setting up the Edge-Constraint Matrix¶
Now that we know the edges, it is easy to construct \(B_p\) from COO format. Using generator expressions, it can be done in three lines.
[9]:
%%px
ar = [(num_e[0],d,1.0) for num_e in enumerate(edges) for d in num_e[1] ]
rows, cols, vals = [list(x) for x in zip(*ar)] if len(ar) else [[],[],[]]
B_p = SparseMatrixd.CreateFromCOO(rows, cols, vals, len(edges), fes.ndof)
ParallelDofs for w¶
This is also pretty simple. \(w\) has one DOF for each subdomain edge, and each edge’s DOF is shared by the set of all procs that share all DOFs in the entire edge.
[10]:
%%px
edist_procs = [sorted(set.intersection(*[set(pardofs.Dof2Proc(v)) for v in edge])) for edge in edges]
eavg_pardofs = ParallelDofs(edist_procs, comm)
neavg = comm.Sum(sum([1 for ps in edist_procs if comm.rank<ps[0]]))
neavg_min = comm.Min(len(edges) if comm.rank else neavg)
neavg_avg = comm.Sum(len(edges)) / comm.size
neavg_max = comm.Max(len(edges))
if comm.rank==0:
print('# edge constraints global: ', neavg)
print('min, avg, max: ', neavg_min, ' ', neavg_avg, ' ', neavg_max)
[stdout:31]
# edge constraints global: 227
min, avg, max: 9 17.025 29
Getting the Dual-Primal Inverse¶
DPSpace_Inverse needs:
- the original matrix matrix
- a freedofs-BitArray
- a BitArray for the point-constraints
- the constraint matrix, and paralleldofs for \(w\)
- the inversetype to use for local problems
- the inversetype to use for the global coarse grid problem
[11]:
%%px
from dd_toolbox import DPSpace_Inverse
A_dp_inv = DPSpace_Inverse(mat=a.mat, freedofs=fes.FreeDofs(), \
c_points=primal_dofs, \
c_mat=B_p, c_pardofs=eavg_pardofs, \
invtype_loc='sparsecholesky', \
invtype_glob='masterinverse')
We are not using an explicit tranformation of basis, therefore: # Everything else works the same - experiment below!!
[12]:
from usrmeeting_jupyterstuff import *
num_procs = '100'
stop_cluster()
start_cluster(num_procs)
connect_cluster()
Waiting for connection file: ~/.ipython/profile_ngsolve/security/ipcontroller-kogler-client.json
connecting ... try:6 succeeded!
[13]:
%%px
from ngsolve import *
import netgen.meshing as ngmeshing
from ngsolve.la import ParallelMatrix, FETI_Jump, SparseMatrixd, ParallelDofs
from dd_toolbox import FindFEV, DPSpace_Inverse, ScaledMat
def load_mesh(nref=0):
ngmesh = ngmeshing.Mesh(dim=3)
ngmesh.Load('cube.vol')
for l in range(nref):
ngmesh.Refine()
return Mesh(ngmesh)
def setup_space(mesh, order=1):
comm = MPI_Init()
fes = H1(mesh, order=order, dirichlet='right|top')
a = BilinearForm(fes)
u,v = fes.TnT()
a += SymbolicBFI(grad(u)*grad(v))
a.Assemble()
f = LinearForm(fes)
f += SymbolicLFI(x*y*v)
f.Assemble()
avg_dof = comm.Sum(fes.ndof) / comm.size
if comm.rank==0:
print('global, ndof =', fes.ndofglobal, ', lodofs =', fes.lospace.ndofglobal)
print('avg DOFs per core: ', avg_dof)
return [fes, a, f]
def setup_FETIDP(fes, a):
faces, edges, vertices = FindFEV(mesh.dim, mesh.nv, \
fes.ParallelDofs(), fes.FreeDofs())
primal_dofs = BitArray([ v in set(vertices) for v in range(fes.ndof) ]) & fes.FreeDofs()
dp_pardofs = fes.ParallelDofs().SubSet(primal_dofs)
ar = [(num_e[0],d,1.0) for num_e in enumerate(edges) for d in num_e[1] ]
rows, cols, vals = [list(x) for x in zip(*ar)] if len(ar) else [[],[],[]]
B_p = SparseMatrixd.CreateFromCOO(rows, cols, vals, len(edges), fes.ndof)
edist_procs = [sorted(set.intersection(*[set(fes.ParallelDofs().Dof2Proc(v)) for v in edge])) for edge in edges]
eavg_pardofs = ParallelDofs(edist_procs, comm)
nprim = comm.Sum(sum([1 for k in range(fes.ndof) if primal_dofs[k] and comm.rank<fes.ParallelDofs().Dof2Proc(k)[0] ]))
if comm.rank==0:
print('# of global primal dofs: ', nprim)
A_dp = ParallelMatrix(a.mat.local_mat, dp_pardofs)
dual_pardofs = fes.ParallelDofs().SubSet(BitArray(~primal_dofs & fes.FreeDofs()))
B = FETI_Jump(dual_pardofs, u_pardofs=dp_pardofs)
if comm.rank==0:
print('# of global multipliers = :', B.col_pardofs.ndofglobal)
A_dp_inv = DPSpace_Inverse(mat=a.mat, freedofs=fes.FreeDofs(), \
c_points=primal_dofs, \
c_mat=B_p, c_pardofs=eavg_pardofs, \
invtype_loc='sparsecholesky', \
invtype_glob='masterinverse')
F = B @ A_dp_inv @ B.T
innerdofs = BitArray([len(fes.ParallelDofs().Dof2Proc(k))==0 for k in range(fes.ndof)]) & fes.FreeDofs()
A = a.mat.local_mat
Aiinv = A.Inverse(innerdofs, inverse='sparsecholesky')
scaledA = ScaledMat(A, [1.0/(1+len(fes.ParallelDofs().Dof2Proc(k))) for k in range(fes.ndof)])
scaledBT = ScaledMat(B.T, [1.0/(1+len(fes.ParallelDofs().Dof2Proc(k))) for k in range(fes.ndof)])
Fhat = B @ scaledA @ (IdentityMatrix() - Aiinv @ A) @ scaledBT
return [A_dp, A_dp_inv, F, Fhat, B, scaledA, scaledBT]
def prep(B, Ainv, f):
rhs.data = (B @ Ainv) * f.vec
return rhs
def solve(mat, pre, rhs, sol):
t = comm.WTime()
solvers.CG(mat=mat, pre=pre, rhs=rhs, sol=sol, \
maxsteps=100, printrates=comm.rank==0, tol=1e-6)
return comm.WTime() - t
def post(B, Ainv, gfu, lam):
hv = B.CreateRowVector()
hv.data = f.vec - B.T * lam
gfu.vec.data = Ainv * hv
jump = lam.CreateVector()
jump.data = B * gfu.vec
norm_jump = Norm(jump)
if comm.rank==0:
print('\nnorm jump u: ', norm_jump)
[14]:
%%px
comm = MPI_Init()
mesh = load_mesh(nref=1)
fes, a, f = setup_space(mesh, order=2)
A_dp, A_dp_inv, F, Fhat, B, scaledA, scaledBT = setup_FETIDP(fes, a)
rhs = B.CreateColVector()
lam = B.CreateColVector()
prep(B, A_dp_inv, f)
if comm.rank==0:
print('')
t = solve(F, Fhat, rhs, lam)
if comm.rank==0:
print('\ntime solve: ', t)
print('dofs per core and second: ', fes.ndofglobal / (t * comm.size))
gfu = GridFunction(fes)
post(B, A_dp_inv, gfu, lam)
[stdout:16]
global, ndof = 576877 , lodofs = 75403
avg DOFs per core: 6607.56
# of global primal dofs: 399
# of global multipliers = : 87867
it = 0 err = 0.12130848690672913
it = 1 err = 0.02390767825931742
it = 2 err = 0.00825023321167997
it = 3 err = 0.002729895619289229
it = 4 err = 0.0008641121927087105
it = 5 err = 0.0003048442822355854
it = 6 err = 9.712889856412031e-05
it = 7 err = 3.3730326402392145e-05
it = 8 err = 1.0925387588517252e-05
it = 9 err = 3.32252185091653e-06
it = 10 err = 1.067131618337071e-06
it = 11 err = 3.318060576723047e-07
time solve: 0.5276237779762596
dofs per core and second: 10933.491326199415
norm jump u: 2.0337283556173004e-06
[15]:
%%px --target 1
for t in sorted(filter(lambda t:t['time']>0.5, Timers()), key=lambda t:t['time'], reverse=True):
print(t['name'], ': ', t['time'])
SparseCholesky<d,d,d>::MultAdd : 1.095705270767212
SparseCholesky<d,d,d>::MultAdd fac1 : 0.6547586917877197
[16]:
%%px
t_chol = filter(lambda t: t['name'] == 'SparseCholesky<d,d,d>::MultAdd', Timers()).__next__()
maxt = comm.Max(t_chol['time'])
if t_chol['time'] == maxt:
print('timers from rank ', comm.rank, ':')
for t in sorted(filter(lambda t:t['time']>min(0.3*maxt, 0.5), Timers()), key=lambda t:t['time'], reverse=True):
print(t['name'], ': ', t['time'])
[stdout:2]
timers from rank 13 :
SparseCholesky<d,d,d>::MultAdd : 1.214454174041748
SparseCholesky<d,d,d>::MultAdd fac1 : 0.7252676486968994
SparseCholesky<d,d,d>::MultAdd fac2 : 0.47362494468688965
SparseCholesky - total : 0.42598485946655273
[17]:
stop_cluster()