{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# 5.5.1 Solving the Poisson Equation on devices"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "* After finite element discretization we obtain a (linear) system of equations. \n",
    "* The new **ngscuda** module moves the linear operators to the Cuda - decice.\n",
    "* The host is stearing, data stays on the device\n",
    "\n",
    "* The module is now included in NGSolve Linux - distributions, and can be used whenever an accelerator card by NVIDIA is available, and the cuda-runtime is installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ngsolve import *\n",
    "from ngsolve.webgui import Draw\n",
    "from time import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "mesh = Mesh(unit_square.GenerateMesh(maxh=0.1))\n",
    "for l in range(5): mesh.Refine()\n",
    "fes = H1(mesh, order=2, dirichlet=\".*\")\n",
    "print (\"ndof =\", fes.ndof)\n",
    "\n",
    "u, v = fes.TnT()\n",
    "with TaskManager():\n",
    "    a = BilinearForm(grad(u)*grad(v)*dx+u*v*dx).Assemble()\n",
    "    f = LinearForm(x*v*dx).Assemble()\n",
    "\n",
    "gfu = GridFunction(fes)\n",
    "\n",
    "jac = a.mat.CreateSmoother(fes.FreeDofs())\n",
    "\n",
    "with TaskManager(): \n",
    "    inv_host = CGSolver(a.mat, jac, maxiter=2000)\n",
    "    ts = time()\n",
    "    gfu.vec.data = inv_host * f.vec\n",
    "    te = time()\n",
    "    print (\"steps =\", inv_host.GetSteps(), \", time =\", te-ts)\n",
    "\n",
    "# Draw (gfu);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "Now we import the NGSolve - cuda library.\n",
    "\n",
    "It provides\n",
    "\n",
    "* an `UnifiedVector`, which allocates memory on both, host and device. The data is updated on demand either on host, or on device. \n",
    "* NGSolve - matrices can create their counterparts on the device. In the following, the conjugate gradients iteration runs on the host, but all operations involving big data are performed on the accelerator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    from ngsolve.ngscuda import *\n",
    "except:\n",
    "    print (\"no CUDA library or device available, using replacement types on host\")\n",
    "    \n",
    "ngsglobals.msg_level=1\n",
    "fdev = f.vec.CreateDeviceVector(copy=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "adev = a.mat.CreateDeviceMatrix()\n",
    "jacdev = jac.CreateDeviceMatrix()\n",
    "\n",
    "inv = CGSolver(adev, jacdev, maxsteps=2000, printrates=False)\n",
    "\n",
    "ts = time()\n",
    "res = (inv * fdev).Evaluate()\n",
    "te = time()\n",
    "\n",
    "print (\"Time on device:\", te-ts)\n",
    "diff = Norm(gfu.vec - res)\n",
    "print (\"diff = \", diff)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "On an A-100 device I got (for 5 levels of refinement, ndof=476417):"
   ]
  },
  {
   "cell_type": "raw",
   "id": "8",
   "metadata": {},
   "source": [
    "Time on device: 0.4084775447845459\n",
    "diff =  3.406979028373306e-12"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "## CG Solver with Block-Jacobi and exact low-order solver:\n",
    "\n",
    "$$\n",
    "A = \\left( \\begin{array}{cc}\n",
    "    A_{cc} & A_{cf} \\\\\n",
    "    A_{fc} & A_{ff} \n",
    "        \\end{array} \\right)\n",
    "$$\n",
    "\n",
    "Additive Schwarz preconditioner:\n",
    "\n",
    "$$\n",
    "C^{-1} = P A_{cc}^{-1} P^T + \\sum_i E_i A_i^{-1} E_i^T \n",
    "$$\n",
    "\n",
    "with\n",
    "* $P$ .. embedding of low-order space\n",
    "* $A_i$ .. blocks on edges/faces/cells\n",
    "* $E_i$ .. embedding matrices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "fes = H1(mesh, order=5, dirichlet=\".*\")\n",
    "print (\"ndof =\", fes.ndof)\n",
    "\n",
    "u, v = fes.TnT()\n",
    "with TaskManager():\n",
    "    a = BilinearForm(grad(u)*grad(v)*dx+u*v*dx).Assemble()\n",
    "    f = LinearForm(x*v*dx).Assemble()\n",
    "\n",
    "gfu = GridFunction(fes)\n",
    "\n",
    "jac = a.mat.CreateBlockSmoother(fes.CreateSmoothingBlocks())\n",
    "lospace = fes.lospace\n",
    "loinv = a.loform.mat.Inverse(inverse=\"sparsecholesky\", freedofs=lospace.FreeDofs())\n",
    "loemb = fes.loembedding\n",
    "\n",
    "pre = jac + loemb@loinv@loemb.T\n",
    "print (\"mat\", a.mat.GetOperatorInfo())\n",
    "print (\"preconditioner:\") \n",
    "print(pre.GetOperatorInfo())\n",
    "\n",
    "with TaskManager(): \n",
    "    inv = CGSolver(a.mat, pre, maxsteps=2000, printrates=False)\n",
    "    ts = time()\n",
    "    gfu.vec.data = inv * f.vec\n",
    "    te = time()\n",
    "    print (\"iterations =\", inv.GetSteps(), \"time =\", te-ts) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "adev = a.mat.CreateDeviceMatrix()\n",
    "predev = pre.CreateDeviceMatrix()\n",
    "fdev = f.vec.CreateDeviceVector()\n",
    "\n",
    "with TaskManager(): \n",
    "    inv = CGSolver(adev, predev, maxsteps=2000, printrates=False)\n",
    "    ts = time()\n",
    "    gfu.vec.data = (inv * fdev).Evaluate() \n",
    "    te = time()\n",
    "    print (\"iterations =\", inv.GetSteps(), \"time =\", te-ts) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "on the A-100:"
   ]
  },
  {
   "cell_type": "raw",
   "id": "13",
   "metadata": {},
   "source": [
    "SumMatrix, h = 2896001, w = 2896001\n",
    "  N4ngla20DevBlockJacobiMatrixE, h = 2896001, w = 2896001\n",
    "  EmbeddedTransposeMatrix, h = 2896001, w = 2896001\n",
    "    EmbeddedMatrix, h = 2896001, w = 116353\n",
    "      N4ngla17DevSparseCholeskyE, h = 116353, w = 116353\n",
    "\n",
    "\n",
    "iterations = 37 time= 0.6766986846923828"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "## Using the BDDC preconditioner:\n",
    "\n",
    "For the BDDC (balancing domain decomposition with constraints) preconditioning, we build a FEM system with relaxed connectivity:\n",
    "\n",
    "<img src=\"pictures/auxspace.png\" width=\"500\">\n",
    "\n",
    "This allows for static condensation of all local and interface degrees of freedom, only the wirebasket dofs enter the global solver. The resulting matrix $\\tilde A$ is much cheaper to invert.\n",
    "\n",
    "The preconditioner is\n",
    "\n",
    "$$\n",
    "P = R {\\tilde A}^{-1} R^T\n",
    "$$\n",
    "\n",
    "\n",
    "with an averagingn operator $R$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "mesh = Mesh(unit_square.GenerateMesh(maxh=0.1))\n",
    "for l in range(3): mesh.Refine()\n",
    "fes = H1(mesh, order=10, dirichlet=\".*\")\n",
    "print (\"ndof =\", fes.ndof)\n",
    "\n",
    "u, v = fes.TnT()\n",
    "with TaskManager():\n",
    "    a = BilinearForm(grad(u)*grad(v)*dx+u*v*dx)\n",
    "    pre = Preconditioner(a, \"bddc\")\n",
    "    a.Assemble()\n",
    "    f = LinearForm(x*v*dx).Assemble()\n",
    "\n",
    "gfu = GridFunction(fes)\n",
    "with TaskManager(): \n",
    "    inv = CGSolver(a.mat, pre, maxsteps=2000, printrates=False)\n",
    "    ts = time()\n",
    "    gfu.vec.data = (inv * f.vec).Evaluate()\n",
    "    te = time()\n",
    "    print (\"iterations =\", inv.GetSteps(), \"time =\", te-ts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "predev = pre.mat.CreateDeviceMatrix()\n",
    "print (predev.GetOperatorInfo())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "adev = a.mat.CreateDeviceMatrix()\n",
    "predev = pre.mat.CreateDeviceMatrix()\n",
    "fdev = f.vec.CreateDeviceVector()\n",
    "\n",
    "with TaskManager(): \n",
    "    inv = CGSolver(adev, predev, maxsteps=2000, printrates=False)\n",
    "    gfu.vec.data = (inv * fdev).Evaluate()\n",
    "    print (\"iterations =\", inv.GetSteps())"
   ]
  },
  {
   "cell_type": "raw",
   "id": "18",
   "metadata": {},
   "source": [
    "A100:\n",
    "iterations = 53 time= 0.16417622566223145"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "Vite - traces:\n",
    "\n",
    "<img src=\"pictures/bddc-tracings.png\" width=\"500\">\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
