This page was generated from unit-5.4-fetidp_edge/feti-dp-iii.ipynb.
5.6.3 FETI-DP in NGSolve III: Using Non-Point Constraints¶
We show how to use FETI-DP with point- and edge-constraints.
This includes setting up the primal edge-constraints, but not the implementation of the dual-primal space inverse operator.
[1]:
num_procs = '40'
[2]:
from usrmeeting_jupyterstuff import *
[3]:
stop_cluster()
[4]:
start_cluster(num_procs)
connect_cluster()
Waiting for connection file: ~/.ipython/profile_ngsolve/security/ipcontroller-kogler-client.json
connecting ... try:6 succeeded!
[5]:
%%px
from ngsolve import *
import netgen.meshing as ngmeshing
from ngsolve.la import SparseMatrixd, ParallelMatrix, ParallelDofs
from ngsolve.la import FETI_Jump, DISTRIBUTED, CUMULATED
nref = 0
dim=3
ngmesh = ngmeshing.Mesh(dim=dim)
ngmesh.Load('cube.vol')
for l in range(nref):
ngmesh.Refine()
mesh = Mesh(ngmesh)
comm = MPI_Init()
fes = H1(mesh, order=2, dirichlet='right|top|left')
a = BilinearForm(fes)
u,v = fes.TnT()
a += SymbolicBFI(grad(u)*grad(v))
a.Assemble()
f = LinearForm(fes)
f += SymbolicLFI(x*y*v)
f.Assemble()
pardofs = fes.ParallelDofs()
avg_dof = comm.Sum(fes.ndof) / comm.size
if comm.rank==0:
print('global, ndof =', fes.ndofglobal, ', lodofs =', fes.lospace.ndofglobal)
print('avg DOFs per core: ', avg_dof)
[stdout:31]
global, ndof = 75403 , lodofs = 10355
avg DOFs per core: 2260.7
A new dual-primal space¶
If we additionally introduce edge-constraints to the old dual-primal space $V_{\mathrm{DP}}$, we get our new dual-primal space $\widetilde{V}_{\mathrm{DP}}$: its functions are continuous at the subdomain vertices and, in addition, have matching edge-averages on every subdomain edge.
We already have $A_{\mathrm{DP}}$ for the dual-primal space without edge-constraints. We can simply reuse that.
However, the inverse operation, $b\rightarrow A_{\mathrm{DP}}^{-1}b$, is different.
In other words, we have to find a $u\in\widetilde{V}_{\mathrm{DP}}$ such that
$$
v^T A_{\mathrm{DP}}\, u = v^T b \qquad \forall\, v\in\widetilde{V}_{\mathrm{DP}}.
$$
In our implementation, instead of using a transformation of basis and explicitly working on $\widetilde{V}_{\mathrm{DP}}$, we instead work in $V_{\mathrm{DP}}$ and solve a saddle point problem.
More precisely, in order for $u$ to be in $\widetilde{V}_{\mathrm{DP}}$, there has to be some vector $w\in\mathbb{R}^m$ (with $m$ = #subdomain edges) that collects the edge-averages of $u$.
This just means that for every edge there is a single value that equals the edge-average of $u$ for all neighbouring subdomains.
Now, with local edge-average matrices $B_p^{(i)}$ and restriction matrices $R^{(i)}$, this condition reads
$$
B_p^{(i)} u^{(i)} = R^{(i)} w \qquad \text{for every subdomain } i.
$$
We can now incorporate the constraint $B_p u = w$ by going to a saddle point problem (see the sketch below).
We can eliminate all dual DOFs, as well as the Lagrange multipliers $\mu$, by locally eliminating the corresponding matrix block (also shown in the sketch below).
(Note: if we use point-constraints, the $A^{(i)}_{\Delta\Delta}$ are invertible.)
We end up with a global problem for $(u_\pi, w)^T$ of size
#subdomain vertices + #subdomain edges.
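The following is a sketch of the block structure described above, in our own notation (not necessarily the notation used inside the implementation), assuming one local multiplier vector $\mu^{(i)}$ per subdomain and writing $B^{(i)}_{p,\Delta}$ for the columns of $B_p^{(i)}$ belonging to dual DOFs. Collecting the local constraints $B_p^{(i)} u^{(i)} = R^{(i)} w$ with Lagrange multipliers gives a symmetric system of the form
$$
\begin{pmatrix}
A_{\mathrm{DP}} & 0 & B_p^T \\
0 & 0 & -R^T \\
B_p & -R & 0
\end{pmatrix}
\begin{pmatrix} u \\ w \\ \mu \end{pmatrix}
=
\begin{pmatrix} b \\ 0 \\ 0 \end{pmatrix},
$$
where $B_p$ and $R$ stack the subdomain-local $B_p^{(i)}$ and $R^{(i)}$. The block that is eliminated locally on each subdomain couples the dual DOFs with the local multipliers,
$$
\begin{pmatrix}
A^{(i)}_{\Delta\Delta} & \big(B^{(i)}_{p,\Delta}\big)^T \\
B^{(i)}_{p,\Delta} & 0
\end{pmatrix},
$$
which is invertible thanks to the point-constraints.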
The exact implementation is a bit tedious.
What we have to supply to be able to use it is:
- A matrix $B_p^{(i)}$ for each subdomain. We will use simple algebraic averages, in which case this is just a sparse matrix filled with ones (a short sketch follows below).
- ParallelDofs for the space $w$ lives in.
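To make the first item concrete, here is a minimal, self-contained sketch of such a constraint matrix, with one row of ones per subdomain edge. The edge DOF lists and the local DOF count are made up for illustration, not taken from the mesh; the actual construction from the real edge lists follows below in the section on the edge-constraint matrix.
from ngsolve.la import SparseMatrixd
# hypothetical local edges, each given as a list of local DOF numbers (made-up values)
toy_edges = [[3, 7, 12], [5, 9]]
ndof_local = 20                       # assumed number of local DOFs
# one row per edge, a 1.0 in every column belonging to that edge
rows = [enr for enr, e in enumerate(toy_edges) for d in e]
cols = [d for e in toy_edges for d in e]
vals = [1.0] * len(cols)
B_p_toy = SparseMatrixd.CreateFromCOO(rows, cols, vals, len(toy_edges), ndof_local)
print(B_p_toy.height, B_p_toy.width)  # -> 2 20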
Finding the subdomain vertices works mostly as before:
[6]:
%%px
pythonic = True
if pythonic:
faces = [set(d for d in pardofs.Proc2Dof(p) if d<mesh.nv and fes.FreeDofs()[d] ) for p in pardofs.ExchangeProcs()]
edges = sorted([tuple(sorted(e)) for e in set(tuple(f1.intersection(f2)) for f1 in faces for f2 in faces if f1 is not f2) if len(e)>1])
vertices = sorted(set([ v for e1 in edges for e2 in edges if e1 is not e2 for v in set(e1).intersection(set(e2)) ]))
else:
faces = []
for p in pardofs.ExchangeProcs():
faces.append(set(d for d in pardofs.Proc2Dof(p) if d<mesh.nv and fes.FreeDofs()[d]))
edges = []
for f1 in faces:
for f2 in faces:
if f1 is not f2:
edge = sorted(tuple(f1.intersection(f2)))
if len(edge) > 1:
if not edge in edges:
edges.append(sorted(tuple(edge)))
vertices = set()
for e1 in edges:
for e2 in edges:
if e1 is not e2:
vs = set(e1).intersection(set(e2))
vertices = vertices.union(vs)
vertices = sorted(vertices)
vec = f.vec.CreateVector()
vec.local_vec[:] = 0.0
for v in vertices:
vec.local_vec[v] = 1
vec.SetParallelStatus(DISTRIBUTED)
vec.Cumulate()
vertices = [ v for v in range(mesh.nv) if vec.local_vec[v]!=0 ]
primal_dofs = BitArray([v in vertices for v in range(fes.ndof)]) & fes.FreeDofs()
We only make two small adjustments:
we do not want any subdomain vertices to be included in the edges
(for consistency) we want neither subdomain vertices nor subdomain edge-DOFs in the faces
[7]:
%%px
all_e = set.union(*[set(e) for e in edges]) if len(edges) else set()
faces2 = [[v for v in f if not v in all_e] for f in faces]
faces = [f for f in faces2 if len(f)]
edges2 = [[v for v in e if not v in vertices] for e in edges]
edges = [e for e in edges2 if len(e)]
We still need the same ParallelDofs as before for $V_{\mathrm{DP}}$.
[8]:
%%px
primal_dofs = BitArray([ v in set(vertices) for v in range(fes.ndof) ]) & fes.FreeDofs()
dp_pardofs = pardofs.SubSet(primal_dofs)
nprim = comm.Sum(sum([1 for k in range(fes.ndof) if primal_dofs[k] and comm.rank<pardofs.Dof2Proc(k)[0] ]))
npmin = comm.Min(primal_dofs.NumSet() if comm.rank else nprim)
npavg = comm.Sum(primal_dofs.NumSet())/comm.size
npmax = comm.Max(primal_dofs.NumSet())
if comm.rank==0:
print('# primal dofs global: ', nprim)
print('min, avg, max per rank: ', npmin, ' ', npavg, ' ', npmax)
[stdout:31]
# primal dofs global: 122
min, avg, max per rank: 4 12.425 25
Setting up the Edge-Constraint Matrix¶
Now that we know the edges, it is easy to construct $B_p$ from its COO (row, column, value) triplets. Using list comprehensions, it can be done in three lines.
[9]:
%%px
ar = [(num_e[0],d,1.0) for num_e in enumerate(edges) for d in num_e[1] ]
rows, cols, vals = [list(x) for x in zip(*ar)] if len(ar) else [[],[],[]]
B_p = SparseMatrixd.CreateFromCOO(rows, cols, vals, len(edges), fes.ndof)
ParallelDofs for w¶
This is also pretty simple. w has one DOF for each subdomain edge, and each edge’s DOF is shared by the set of all procs that share all DOFs in the entire edge.
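As a toy illustration of that intersection logic (the DOF numbers and proc lists below are made up, plain Python only):
# made-up Dof2Proc lists for the three DOFs of one hypothetical subdomain edge
dof2proc = {3: [1, 4, 7], 7: [1, 4], 12: [1, 4, 9]}
edge = [3, 7, 12]
# the edge's single w-DOF is shared exactly by the procs that share every DOF of the edge
shared_procs = sorted(set.intersection(*[set(dof2proc[d]) for d in edge]))
print(shared_procs)   # prints [1, 4]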
[10]:
%%px
edist_procs = [sorted(set.intersection(*[set(pardofs.Dof2Proc(v)) for v in edge])) for edge in edges]
eavg_pardofs = ParallelDofs(edist_procs, comm)
neavg = comm.Sum(sum([1 for ps in edist_procs if comm.rank<ps[0]]))
neavg_min = comm.Min(len(edges) if comm.rank else neavg)
neavg_avg = comm.Sum(len(edges)) / comm.size
neavg_max = comm.Max(len(edges))
if comm.rank==0:
print('# edge constraints global: ', neavg)
print('min, avg, max: ', neavg_min, ' ', neavg_avg, ' ', neavg_max)
[stdout:31]
# edge constraints global: 227
min, avg, max: 9 17.025 29
Getting the Dual-Primal Inverse¶
DPSpace_Inverse needs:
the original matrix
a freedofs-BitArray
a BitArray for the point-constraints
the constraint matrix, and paralleldofs for w
the inversetype to use for local problems
the inversetype to use for the global coarse grid problem
[11]:
%%px
from dd_toolbox import DPSpace_Inverse
A_dp_inv = DPSpace_Inverse(mat=a.mat, freedofs=fes.FreeDofs(), \
c_points=primal_dofs, \
c_mat=B_p, c_pardofs=eavg_pardofs, \
invtype_loc='sparsecholesky', \
invtype_glob='masterinverse')
We are not using an explicit transformation of basis, therefore:
Everything else works the same - experiment below!!¶
[12]:
from usrmeeting_jupyterstuff import *
num_procs = '100'
stop_cluster()
start_cluster(num_procs)
connect_cluster()
Waiting for connection file: ~/.ipython/profile_ngsolve/security/ipcontroller-kogler-client.json
connecting ... try:6 succeeded!
[13]:
%%px
from ngsolve import *
import netgen.meshing as ngmeshing
from ngsolve.la import ParallelMatrix, FETI_Jump, SparseMatrixd, ParallelDofs
from dd_toolbox import FindFEV, DPSpace_Inverse, ScaledMat
def load_mesh(nref=0):
ngmesh = ngmeshing.Mesh(dim=3)
ngmesh.Load('cube.vol')
for l in range(nref):
ngmesh.Refine()
return Mesh(ngmesh)
def setup_space(mesh, order=1):
comm = MPI_Init()
fes = H1(mesh, order=order, dirichlet='right|top')
a = BilinearForm(fes)
u,v = fes.TnT()
a += SymbolicBFI(grad(u)*grad(v))
a.Assemble()
f = LinearForm(fes)
f += SymbolicLFI(x*y*v)
f.Assemble()
avg_dof = comm.Sum(fes.ndof) / comm.size
if comm.rank==0:
print('global, ndof =', fes.ndofglobal, ', lodofs =', fes.lospace.ndofglobal)
print('avg DOFs per core: ', avg_dof)
return [fes, a, f]
def setup_FETIDP(fes, a):
faces, edges, vertices = FindFEV(mesh.dim, mesh.nv, \
fes.ParallelDofs(), fes.FreeDofs())
primal_dofs = BitArray([ v in set(vertices) for v in range(fes.ndof) ]) & fes.FreeDofs()
dp_pardofs = fes.ParallelDofs().SubSet(primal_dofs)
ar = [(num_e[0],d,1.0) for num_e in enumerate(edges) for d in num_e[1] ]
rows, cols, vals = [list(x) for x in zip(*ar)] if len(ar) else [[],[],[]]
B_p = SparseMatrixd.CreateFromCOO(rows, cols, vals, len(edges), fes.ndof)
edist_procs = [sorted(set.intersection(*[set(fes.ParallelDofs().Dof2Proc(v)) for v in edge])) for edge in edges]
eavg_pardofs = ParallelDofs(edist_procs, comm)
nprim = comm.Sum(sum([1 for k in range(fes.ndof) if primal_dofs[k] and comm.rank<fes.ParallelDofs().Dof2Proc(k)[0] ]))
if comm.rank==0:
print('# of global primal dofs: ', nprim)
A_dp = ParallelMatrix(a.mat.local_mat, dp_pardofs)
dual_pardofs = fes.ParallelDofs().SubSet(BitArray(~primal_dofs & fes.FreeDofs()))
B = FETI_Jump(dual_pardofs, u_pardofs=dp_pardofs)
if comm.rank==0:
print('# of global multipliers = :', B.col_pardofs.ndofglobal)
A_dp_inv = DPSpace_Inverse(mat=a.mat, freedofs=fes.FreeDofs(), \
c_points=primal_dofs, \
c_mat=B_p, c_pardofs=eavg_pardofs, \
invtype_loc='sparsecholesky', \
invtype_glob='masterinverse')
F = B @ A_dp_inv @ B.T
innerdofs = BitArray([len(fes.ParallelDofs().Dof2Proc(k))==0 for k in range(fes.ndof)]) & fes.FreeDofs()
A = a.mat.local_mat
Aiinv = A.Inverse(innerdofs, inverse='sparsecholesky')
scaledA = ScaledMat(A, [1.0/(1+len(fes.ParallelDofs().Dof2Proc(k))) for k in range(fes.ndof)])
scaledBT = ScaledMat(B.T, [1.0/(1+len(fes.ParallelDofs().Dof2Proc(k))) for k in range(fes.ndof)])
Fhat = B @ scaledA @ (IdentityMatrix() - Aiinv @ A) @ scaledBT
return [A_dp, A_dp_inv, F, Fhat, B, scaledA, scaledBT]
def prep(B, Ainv, f):
rhs.data = (B @ Ainv) * f.vec
return rhs
def solve(mat, pre, rhs, sol):
t = comm.WTime()
solvers.CG(mat=mat, pre=pre, rhs=rhs, sol=sol, \
maxsteps=100, printrates=comm.rank==0, tol=1e-6)
return comm.WTime() - t
def post(B, Ainv, gfu, lam):
hv = B.CreateRowVector()
hv.data = f.vec - B.T * lam
gfu.vec.data = Ainv * hv
jump = lam.CreateVector()
jump.data = B * gfu.vec
norm_jump = Norm(jump)
if comm.rank==0:
print('\nnorm jump u: ', norm_jump)
[14]:
%%px
comm = MPI_Init()
mesh = load_mesh(nref=1)
fes, a, f = setup_space(mesh, order=2)
A_dp, A_dp_inv, F, Fhat, B, scaledA, scaledBT = setup_FETIDP(fes, a)
rhs = B.CreateColVector()
lam = B.CreateColVector()
prep(B, A_dp_inv, f)
if comm.rank==0:
print('')
t = solve(F, Fhat, rhs, lam)
if comm.rank==0:
print('\ntime solve: ', t)
print('dofs per core and second: ', fes.ndofglobal / (t * comm.size))
gfu = GridFunction(fes)
post(B, A_dp_inv, gfu, lam)
[stdout:16]
global, ndof = 576877 , lodofs = 75403
avg DOFs per core: 6607.56
# of global primal dofs: 399
# of global multipliers = : 87867
it = 0 err = 0.12130848690672913
it = 1 err = 0.02390767825931742
it = 2 err = 0.00825023321167997
it = 3 err = 0.002729895619289229
it = 4 err = 0.0008641121927087105
it = 5 err = 0.0003048442822355854
it = 6 err = 9.712889856412031e-05
it = 7 err = 3.3730326402392145e-05
it = 8 err = 1.0925387588517252e-05
it = 9 err = 3.32252185091653e-06
it = 10 err = 1.067131618337071e-06
it = 11 err = 3.318060576723047e-07
time solve: 0.5276237779762596
dofs per core and second: 10933.491326199415
norm jump u: 2.0337283556173004e-06
[15]:
%%px --target 1
for t in sorted(filter(lambda t:t['time']>0.5, Timers()), key=lambda t:t['time'], reverse=True):
print(t['name'], ': ', t['time'])
SparseCholesky<d,d,d>::MultAdd : 1.095705270767212
SparseCholesky<d,d,d>::MultAdd fac1 : 0.6547586917877197
[16]:
%%px
t_chol = filter(lambda t: t['name'] == 'SparseCholesky<d,d,d>::MultAdd', Timers()).__next__()
maxt = comm.Max(t_chol['time'])
if t_chol['time'] == maxt:
print('timers from rank ', comm.rank, ':')
for t in sorted(filter(lambda t:t['time']>min(0.3*maxt, 0.5), Timers()), key=lambda t:t['time'], reverse=True):
print(t['name'], ': ', t['time'])
[stdout:2]
timers from rank 13 :
SparseCholesky<d,d,d>::MultAdd : 1.214454174041748
SparseCholesky<d,d,d>::MultAdd fac1 : 0.7252676486968994
SparseCholesky<d,d,d>::MultAdd fac2 : 0.47362494468688965
SparseCholesky - total : 0.42598485946655273
[17]:
stop_cluster()