How to add a new scheme to Inciter

Inciter (Compressible flow solver) supports multiple discretization schemes. This page describes how to add a scheme of your choice by walking through an example of adding a new one. We also discuss the main steps of the execution logic, which, at a high level, is the same for all discretization schemes.

Rationale and plan

Similar to the existing discretization schemes, DiagCG, or DG, the new scheme, ALECG (short for Arbitrary Lagrangian-Eulerian Continuous Galerkin), will interact with Discretization in a child-base fashion, e.g., will directly access (and reuse) its member data and functions. It will also intereact with Refiner, for mesh refinement (AMR), and will also be migratable to enable dynamic load balancing. In essence, it will have everything an existing scheme has. However, we will not implement the low-level details of the actual numerical method, only the glue-code necessary to interact with the rest of the code and we make it ready to start implementing the low-level details of a particular discretization, done by a PDE class, held behind a derived class of, e.g., CGPDE or DGPDE. For more details on how these classes interact, see also the Inciter software design page.

1. Add a new keyword

A specific discretization scheme is selected by the user in the control (input) file via the scheme keyword, e.g., scheme diagcg. We add the new keyword, alecg, which then can be recognized by the control file parser, in src/Control/Keywords.hpp by adding the following code block:

Control/Keywords.hpp

$ git diff src/Control/Keywords.hpp
diff --git a/src/Control/Keywords.hpp b/src/Control/Keywords.hpp
index 002869cb..c18f193a 100644
--- a/src/Control/Keywords.hpp
+++ b/src/Control/Keywords.hpp
@@ -4607,6 +4607,19 @@ struct diagcg_info {
 };
 using diagcg = keyword< diagcg_info, TAOCPP_PEGTL_STRING("diagcg") >;

+struct alecg_info {
+  static std::string name() { return "ALE-CG with RK"; }
+  static std::string shortDescription() { return "Select continuous Galerkin "
+    "with ALE + Runge-Kutta"; }
+  static std::string longDescription() { return
+    R"(This keyword is used to select the continuous Galerkin finite element
+    scheme in the arbitrary Lagrangian-Eulerian (ALE) reference frame combined
+    with Runge-Kutta (RK) time stepping. See Control/Inciter/Options/Scheme.hpp
+    for other valid options.)"; }
+};
+using alecg = keyword< alecg_info, TAOCPP_PEGTL_STRING("alecg") >;
+
 struct dg_info {
   static std::string name() { return "DG(P0) + RK"; }
   static std::string shortDescription() { return

We also add the new keyword to inciter's grammar's keywords pool:

Control/Inciter/InputDeck/InputDeck.hpp

$ git diff src/Control/Inciter/InputDeck/InputDeck.hpp
diff --git a/src/Control/Inciter/InputDeck/InputDeck.hpp b/src/Control/Inciter/InputDeck/InputDeck.hpp
index 83572480..20ce8975 100644
--- a/src/Control/Inciter/InputDeck/InputDeck.hpp
+++ b/src/Control/Inciter/InputDeck/InputDeck.hpp
@@ -144,6 +144,7 @@ class InputDeck :
                                    kw::scheme,
                                    kw::matcg,
                                    kw::diagcg,
+                                   kw::alecg,
                                    kw::dg,
                                    kw::dgp1,
                                    kw::flux,

This is required so that the compiler can generate a database containing the help for all the keywords in the grammar understood by inciter's control file parser. The above changes not only add the keyword but also some documentation that gets displayed when passing the -C or -H command line arguments to the inciter executable, so quick help is available at the user's fingertips:

$ inciter -C
inciter Control File Keywords:
             advdiff     string Specify the advection + diffusion physics configuration for a PDE
           advection     string Specify the advection physics configuration for a PDE
               alecg            Select continuous Galerkin with ALE + Runge-Kutta
           algorithm     string Select mesh partitioning algorithm
               alpha       real Set PDE parameter(s) alpha
...
$ inciter -H alecg
inciter control file keyword 'alecg'

   Select continuous Galerkin with ALE + Runge-Kutta (RK)

   This keyword is used to select the continuous Galerkin finite element scheme
   in the arbitrary Lagrangian-Eulerian (ALE) reference frame combined with
   Runge-Kutta (RK) time stepping. See Control/Inciter/Options/Scheme.hpp for other
   valid options.

2. Add new option switch

Next is to add a new state to the existing Scheme option switch. This "option switch" is really only a fancy enum, used to store the user's choice of the discretization scheme after parsing the control file in a type-safe manner. This fancy enum is an option switch because it inherits from tk::Toggle, defined in Control/Toggle.hpp, which is a generic switch (or option), that helps associating enum values to keywords and querying one based on the other. Extending the existing Scheme option switch is done by extending the list of schemes in src/Control/Inciter/Options/Scheme.hpp.

3. Add new Charm++ chare proxy in Scheme

Scheme is a class that implements concept-based runtime polymorphism for migratable Charm++ chare arrays using value semantics. Client code, e.g., Transporter, interacts with Discretization and its children via a uniform interface provided by Scheme, which dispatches entry method calls to the correct class instance, the base or the child, and is capable of performing broadcasts as well as addressing a particular chare array element. Read more details at src/Inciter/Scheme.hpp. To teach it to dispatch to our new ALECG scheme, besides the existing ones, we make the following changes:

Inciter/Scheme.hpp

$ git diff src/Inciter/Scheme.hpp
diff --git a/src/Inciter/Scheme.hpp b/src/Inciter/Scheme.hpp
index 61510d01..dea3d78a 100644
--- a/src/Inciter/Scheme.hpp
+++ b/src/Inciter/Scheme.hpp
@@ -22,6 +22,7 @@

 #include "NoWarning/matcg.decl.h"
 #include "NoWarning/diagcg.decl.h"
+#include "NoWarning/alecg.decl.h"
 #include "NoWarning/distfct.decl.h"
 #include "NoWarning/dg.decl.h"
 #include "NoWarning/discretization.decl.h"
@@ -51,6 +52,8 @@ class Scheme {
       } else if (scheme == ctr::SchemeType::DiagCG) {
         proxy = static_cast< CProxy_DiagCG >( CProxy_DiagCG::ckNew(m_bound) );
         fctproxy= CProxy_DistFCT::ckNew(m_bound);
+      } else if (scheme == ctr::SchemeType::ALECG) {
+        proxy = static_cast< CProxy_ALECG >( CProxy_ALECG::ckNew(m_bound) );
       } else if (scheme == ctr::SchemeType::DG ||
                  scheme == ctr::SchemeType::DGP1) {
         proxy = static_cast< CProxy_DG >( CProxy_DG::ckNew(m_bound) );
@@ -75,11 +78,12 @@ class Scheme {
     const CkArrayOptions& arrayoptions() { return m_bound; }

     //! Variant type listing all chare proxy types modeling the same concept
-    using Proxy = boost::variant< CProxy_DiagCG, CProxy_DG >;
+    using Proxy =
+      boost::variant< CProxy_DiagCG, CProxy_ALECG, CProxy_DG >;
     //! Variant type listing all chare element proxy types (behind operator[])
     using ProxyElem =
       boost::variant< CProxy_DiagCG::element_t,
-                      CProxy_DG::element_t >;
+                      CProxy_ALECG::element_t, CProxy_DG::element_t >;

   protected:
     //! Variant storing one proxy to which this class is configured for

4. Add new Charm++ chare array

Next is to add a new class, ALECG, which will serve as the glue between Transporter, Refiner, and CGPDE. These classes, respectively, are the driver, the mesh refiner, and the polymorphic vector of PDE discretization class objects that hold the low-level details of the numerical implementation of spatial discretizations, dispatching to multiple specific systems of equations, e.g., cg::Transport or cg::CompFlow.

We create the following new files:

Before we discuss the details of the above new files, let's get a couple of simple things out of the way. We also need to add the new include to Refiner.hpp so, e.g., it can call back to ALECG::resize() after a mesh refinement step:

Inciter/Refiner.hpp

$ git diff src/Inciter/Refiner.hpp
diff --git a/src/Inciter/Refiner.hpp b/src/Inciter/Refiner.hpp
index dfcb1ffd..4fe743a4 100644
--- a/src/Inciter/Refiner.hpp
+++ b/src/Inciter/Refiner.hpp
@@ -29,6 +29,7 @@
 #include "SchemeBase.hpp"
 #include "DiagCG.hpp"
+#include "ALECG.hpp"
 #include "DG.hpp"

 #include "NoWarning/transporter.decl.h"

We also tell the build system about our new ALECG class and its Charm++ module:

Inciter/CMakeLists.txt

$ gd src/Inciter/CMakeLists.txt
diff --git a/src/Inciter/CMakeLists.txt b/src/Inciter/CMakeLists.txt
index 141055ec..e339b65b 100644
--- a/src/Inciter/CMakeLists.txt
+++ b/src/Inciter/CMakeLists.txt
@@ -14,6 +14,7 @@ add_library(Inciter
             Sorter.cpp
             DiagCG.cpp
+            ALECG.cpp
             DG.cpp
             FluxCorrector.cpp
             DistFCT.cpp
@@ -74,6 +75,7 @@ addCharmModule( "refiner" "Inciter" )
 addCharmModule( "sorter" "Inciter" )
 addCharmModule( "matcg" "Inciter" )
 addCharmModule( "diagcg" "Inciter" )
+addCharmModule( "alecg" "Inciter" )
 addCharmModule( "distfct" "Inciter" )
 addCharmModule( "dg" "Inciter" )

The addCharmModule cmake macro above, defined in cmake/charm.cmake, ensures that build target Inciter will properly depend on our new alecg Charm++ module, defined in Inciter/alecg.ci. The macro also tells cmake how the two files, alecg.decl.h and alecg.def.h, are generated from alecg.ci: using charmc, a compiler wrapper that generates Charm++-code to make the ALECG from an ordinary C++ class into a Charm++ chare array, with entry methods callable across the network, make it migratable, enable its structured DAGger, etc. See also the Charm++ manual.

Now to the new files. First is the new Charm++ interface file, Inciter/alecg.ci:

Inciter/alecg.ci

This is the file that is parsed by Charm++'s compiler which then generates additional code that makes ALECG a Charm++ chare array, makes it migratable, etc. The full listing is at Inciter/alecg.ci some of whose details are discussed below.

Inciter/alecg.ci – External modules and header includes

  extern module transporter;
  extern module discretization;
  extern module ghosts;

  include "UnsMesh.hpp";
  include "PUPUtil.hpp";

First we declare some external Charm++ modules that ALECG needs to interact with and thus from where we need type information. The extern module statements are followed by some usual C++ includes (without the #): these are in the Charm++ interface file because the Charm++ code below requires type information from them.

Inciter/alecg.ci – 1D Charm++ chare array

    array [1D] ALECG {

Next comes the specification of the ALECG Charm++ chare array. This is a 1D array whose elements at runtime will be distributed across the available processing elements and compute nodes. If load balancing is enabled, the array elements (C++ objects) are migrated to homogenize load across a simulation. Because the array is 1D, we use a single integer index to address a particular array element. Charm++ also allows multi-dimensional arrays which can be useful if the problem naturally maps to a multi-dimensional notion, e.g., partitioning a 3D Cartesian mesh, so index calculations to address array elements (and thus work-units) become cleaner.

Inciter/alecg.ci – Entry methods

      entry ALECG( const CProxy_Discretization& disc,
                   const CProxy_Ghosts& ghostsproxy,
                   const std::map< int, std::vector< std::size_t > >& bface,
                   const std::map< int, std::vector< std::size_t > >& bnode,
                   const std::vector< std::size_t >& triinpoel );
      initnode void registerReducers();
      entry void setup();
      entry void box( tk::real v, const std::vector< tk::real >& blkvol );
      entry void resizeComm();
      entry void nodeNeighSetup();
      entry void transferSol();
      entry void start();
      entry void refine( const std::vector< tk::real >& l2ref );
      entry [reductiontarget] void advance( tk::real newdt, tk::real );
      entry void comdfnorm(
              const std::unordered_map< tk::UnsMesh::Edge,
              std::array< tk::real, 3 >,
              tk::UnsMesh::Hash<2>, tk::UnsMesh::Eq<2> >& dfnorm );
      entry void comnorm( const std::unordered_map< int,
       std::unordered_map< std::size_t, std::array< tk::real, 4 > > >& innorm );
      entry void comblk( const std::vector< std::size_t >& gid,
        const std::vector< std::set< std::size_t > >& mb );
      entry void comChBndGrad( const std::vector< std::size_t >& gid,
                               const std::vector< std::vector< tk::real > >& G );
      entry void comrhs( const std::vector< std::size_t >& gid,
                         const std::vector< std::vector< tk::real > >& R );
      entry void resized();
      entry void meshveldone();
      entry void lhs();
      entry void step();
      entry void next();
      entry void stage();
      entry void evalLB( int nrestart );

We simply list those member functions of ALECG as entry methods, e.g., ALECG::setup() or ALECG::dt(), that we need to be able to call externally, potentially across the network, from another processing element (PE). Entry methods are always public in the C++ object-oriented programming (OOP) sense. Note that there can be other member functions of ALECG. These are simple C++ class member functions and are usually not public but private, such as ALECG::rhs(). Note also that there is an initnode entry method, ALECG::registerReducers() which is a special member function that is also declared as static in the C++ sense (see ALECG.hpp). This is static because the runtime system must be able to call this function without creating an object and a lot earlier than the actual ALECG chare array elements are created. This is how custom reducers can be associated in Charm++ to a chare array. Such custom reducers are an excellent way to rely on the asynchronous, tree-based implementation of parallel reductions in Charm++ yet still do it on custom, arbitrarily complex data types, e.g., a hash-map that holds vectors, as long as one defines how aggregation is to be performed when merging such data. Such an example is given in Inciter/DiagReducer.cpp.

Inciter/alecg.ci – Structured DAG

      entry void wait4meshblk() {
        when ownblk_complete(), comblk_complete() serial "meshblk"
          { continueSetup(); } }

      entry void wait4norm() {
        when ownnorm_complete(), comnorm_complete(),
             owndfnorm_complete(), comdfnorm_complete(),
             transfer_complete() serial "norm" { mergelhs(); } }

      entry void wait4grad() {
        when owngrad_complete(), comgrad_complete() serial "grad" { rhs(); } }

      entry void wait4rhs() {
        when ownrhs_complete(), comrhs_complete() serial "rhs" { solve(); } }

      entry void wait4mesh() {
        when norm_complete(), resize_complete() serial "trans" {
          if (m_newmesh == 0) meshvelstart(); else transfer(); } }

      entry void ownblk_complete();
      entry void comblk_complete();
      entry void ownnorm_complete();
      entry void comnorm_complete();
      entry void owndfnorm_complete();
      entry void comdfnorm_complete();
      entry void transfer_complete();
      entry void ownrhs_complete();
      entry void comrhs_complete();
      entry void owngrad_complete();
      entry void comgrad_complete();
      entry void norm_complete();
      entry void resize_complete();

The entry methods, defined in the .ci file and with when keywords, form a structured directed acyclic graph (DAG). These specify logical relations among tasks and execution logic within the class. For example, wait4grad tells the runtime system that only when owngrad_complete() and comgrad_complete() are both done will rhs() be called. In this case, this construct ensures that the runtime system will call a member function that requires the assembled right-hand side, when both the local and external contributions are complete. Note that this logic only relates to a given array element, say with index 2. Another one, say index 3, may perform this operation at a very different time and independently, thus computation and communication can overlap. The entry methods listed at the bottom, e.g., owngrad_complete() can be thought of as "labels" to the runtime system that help define the task logic. These labels are functions that the runtime system defines and we call them when the given task is complete. Note that the construct we used here, when A and B are both complete then do C, is probably the simplest task-logic Charm++ allows prescribing. There are many more advanced ways of expressing such logic, e.g., using loops. For more details, see Section Structured Control Flow: Structured Dagger in the Charm++ manual.

NoWarning/alecg.decl.h and NoWarning/alecg.def.h

The newly added files to the NoWarning/ directory simply include the Charm++-generated alecg.decl.h and alecg.def.h files and locally, around the include, turn off specific compiler warnings for various compilers – we will not discuss them here further. Full listings are at NoWarning/alecg.decl.h and NoWarning/alecg.def.h.

5. New C++ class

Next are the newly added Inciter/ALECG.hpp and Inciter/ALECG.cpp, header and implementation of ALECG. The full listings are at Inciter/ALECG.hpp and Inciter/ALECG.cpp, some of whose details are discussed below, rougly in order of execution.

ALECG::ALECG – Constructor

{
  usesAtSync = true;    // enable migration at AtSync

  auto d = Disc();

  // Perform optional operator-access-pattern mesh node reordering
  if (g_inputdeck.get< tag::discr, tag::operator_reorder >()) {

    // Create new local ids based on access pattern of PDE operators
    std::unordered_map< std::size_t, std::size_t > map;
    std::size_t n = 0;

    for (std::size_t p=0; p<m_u.nunk(); ++p) {  // for each point p
      if (map.find(p) == end(map)) map[p] = n++;
      for (auto q : tk::Around(m_psup,p)) {     // for each edge p-q
        if (map.find(q) == end(map)) map[q] = n++;
      }
    }

    Assert( map.size() == d->Gid().size(), "Map size mismatch" );

    // Remap data in bound Discretization object
    d->remap( map );
    // Recompute elements surrounding points
    m_esup = tk::genEsup( d->Inpoel(), 4 );
    // Recompute points surrounding points
    m_psup = tk::genPsup( d->Inpoel(), 4, m_esup );
    // Remap boundary triangle face connectivity
    tk::remap( m_triinpoel, map );
  }

  // Query/update boundary-conditions-related data structures from user input
  queryBnd();

  // Activate SDAG wait for initially computing normals, and mesh blocks
  thisProxy[ thisIndex ].wait4norm();
  thisProxy[ thisIndex ].wait4meshblk();

  d->comfinal();

}

As discussed in Section Creating workers on the Inciter software design page, the worker chare array elements, such as ALECG, are created using Charm++'s dynamic array insertion feature. This is an asynchronous call, issued from Sorter::createWorkers(), and it signals the runtime system that it is time to start calling individual constructors of ALECG, passing them the appropriate data, required for each of them to initialize and operate on a mesh partition each is assigned (held by their companion Discretization "base" class). Thus running Sorter::createWorkers() eventually triggers calling ALECG's constructors distributed across the whole problem and available PEs.

In the constructor's body, listed above, various initialization steps are executed, including enabling migration for the class. Mesh-to-mesh solution transfer is also configured, calling member functions of Discretization, which then eventually signals the runtime system that extra communication buffers (other than those alrady stored in Discretization), specific to this particular ALECG scheme, have been created. This is a reduction call, issued by all array elements, eventually calling the reduction target Transporter::comfinal() a single time.

Transporter::comfinal() – Complete communication maps

{
  auto meshid = tk::cref_find( m_meshid, static_cast<std::size_t>(summeshid) );

  if (initial > 0) {
    m_scheme[meshid].bcast< Scheme::setup >();
    // Turn on automatic load balancing
    if (++m_ncom == m_nelem.size()) { // all worker arrays have finished
      m_ncom = 0;
      auto print = printer();
      m_progWork.end( print );
      tk::CProxy_LBSwitch::ckNew();
      print.diag( "Load balancing on (if enabled in Charm++)" );
    }
  } else {
    m_scheme[meshid].bcast< Scheme::lhs >();
  }
}

Though asynchronously executed, the reduction operation targeting Transporter::comfinal() is a global synchronization point: all chares arrive in that function body, synchronized, and all continue from there again by calling ALECG::setup().

Transporter::comfinal() is a global synchronization point because all worker chares must finish resizing and/or constructing their communication maps before their setup() member function can be invoked. This is because setup() is allowed to start using those communication maps that have been constructed before calling setup().

ALECG::setup() – Start setup

void
ALECG::setup()
// *****************************************************************************
// Start setup for solution
// *****************************************************************************
{
  auto d = Disc();

  // Determine nodes inside user-defined IC box
  g_cgpde[d->MeshId()].IcBoxNodes( d->Coord(), d->Inpoel(),
    d->ElemBlockId(), m_boxnodes, m_nodeblockid, m_nusermeshblk );

  // Communicate mesh block nodes to other chares on chare-boundary
  if (d->NodeCommMap().empty())        // in serial we are done
    comblk_complete();
  else // send mesh block information to chare-boundary nodes to fellow chares
    for (const auto& [c,n] : d->NodeCommMap()) {
      // data structure assigning block ids (set of values) to nodes (index).
      // although nodeblockid is a map with key-blockid and value-nodeid, the
      // sending data structure has to be inverted, because of how communication
      // data is handled.
      std::vector< std::set< std::size_t > > mb( n.size() );
      std::size_t j = 0;
      for (auto i : n) {
        for (const auto& [blid, ndset] : m_nodeblockid) {
          // if node was found in a block, add to send-data
          if (ndset.find(tk::cref_find(d->Lid(),i)) != ndset.end())
            mb[j].insert(blid);
        }
        if (m_nusermeshblk > 0)
          Assert(mb[j].size() > 0, "Sending no block data for node");
        ++j;
      }
      thisProxy[c].comblk( std::vector<std::size_t>(begin(n),end(n)), mb );
    }

  ownblk_complete();
}

void
ALECG::comblk( const std::vector< std::size_t >& gid,
               const std::vector< std::set< std::size_t > >& mb )
// *****************************************************************************
//  Receive mesh block information for nodes on chare-boundaries
//! \param[in] gid Global mesh node IDs at which we receive RHS contributions
//! \param[in] mb Block ids for each node on chare-boundaries
//! \details This function receives mesh block information for nodes on chare
//!   boundaries. While m_nodeblockid stores block information for own nodes,
//!   m_nodeblockidc collects the neighbor chare information during
//!   communication. This way work on m_nodeblockid and m_nodeblockidc is
//!   overlapped. The two are combined in continueSetup().
// *****************************************************************************
{
  Assert( mb.size() == gid.size(), "Size mismatch" );

  for (std::size_t i=0; i<gid.size(); ++i) {
    for (const auto& blid : mb[i]) {
      m_nodeblockidc[blid].insert(gid[i]);
    }
  }

  // When we have heard from all chares we communicate with, this chare is done
  if (++m_nmblk == Disc()->NodeCommMap().size()) {
    m_nmblk = 0;
    comblk_complete();
  }
}

void
ALECG::continueSetup()
// *****************************************************************************
// Continue setup for solution, after communication for mesh blocks
// *****************************************************************************
{
  auto d = Disc();

  // Combine own and communicated mesh block information
  for (const auto& [blid, ndset] : m_nodeblockidc) {
    for (const auto& i : ndset) {
      auto lid = tk::cref_find(d->Lid(), i);
      m_nodeblockid[blid].insert(lid);
    }
  }

  // clear receive buffer
  tk::destroy(m_nodeblockidc);

  // Compute volume of user-defined box IC
  d->boxvol( m_boxnodes, m_nodeblockid, m_nusermeshblk );

  // Query time history field output labels from all PDEs integrated
  const auto& hist_points = g_inputdeck.get< tag::history, tag::point >();
  if (!hist_points.empty()) {
    std::vector< std::string > histnames;
    auto n = g_cgpde[d->MeshId()].histNames();
    histnames.insert( end(histnames), begin(n), end(n) );
    d->histheader( std::move(histnames) );
  }
}

In the ALECG::setup() code snippet above the various setup steps are started. What they are may change as we develop this further, so we don't go into detail here, the current listing is above.

The current DAG in Inciter/alecg.ci – Structured DAG should be consulted for the task logic. When the setup phase is done, ALECG::start() is called, which starts time stepping.

ALECG::start() – Start time step

Eventually called by ALECG::start(), ALECG::dt() starts computing the smallest-size dt allowed in the give time step across the whole problem:

    conserved( m_u, Disc()->Vol() );
    if (g_inputdeck.get< tag::discr, tag::steady_state >()) {

      // compute new dt for each mesh point
      g_cgpde[d->MeshId()].dt( d->It(), d->Vol(), m_u, m_dtp );

      // find the smallest dt of all nodes on this chare
      mindt = *std::min_element( begin(m_dtp), end(m_dtp) );

    } else {    // compute new dt for this chare

      // find the smallest dt of all equations on this chare
      auto eqdt = g_cgpde[d->MeshId()].dt( d->Coord(), d->Inpoel(), d->T(),
        d->Dtn(), m_u, d->Vol(), d->Voln() );
      if (eqdt < mindt) mindt = eqdt;

    }
    volumetric( m_u, Disc()->Vol() );

The above code snippet shows a for loop that calls the the dt() member function of all types of PDEs configured by the user and finds the minimum size of the next time step.

  // Actiavate SDAG waits for next time step stage
  thisProxy[ thisIndex ].wait4grad();
  thisProxy[ thisIndex ].wait4rhs();

  // Contribute to minimum dt across all chares and advance to next step
  contribute( sizeof(tk::real), &mindt, CkReduction::min_double,
              CkCallback(CkReductionTarget(ALECG,advance), thisProxy) );

Once we have the time step size, we enable a couple of SDAG waits to get ready for some communication steps and issue a reduction to ALECG::advance() which yields the global minimum of the dt across all chares. advance() saves the new time step in Discretization::m_dt, which is its master copy, then starts computing the right hand sides of all PDEs integrated, which requires the primitive variable gradients first (requiring its own communication step).

ALECG::rhs() & ALECG::comrhs() – Compute and communicate right hand side

When both the own and communicated contributions are complete on a chare, the runtime system continues as described in the DAG in alecg.ci. ALECG::solve(), first combines the own and received contributions then advances all equations using a Runge-Kutta method.

ALECG::solve() – Solve

  // Recompute mesh volumes if ALE is enabled
  if (g_inputdeck.get< tag::ale, tag::ale >()) {

    transfer_complete();
    // Save nodal volumes at previous time step stage
    d->Voln() = d->Vol();
    // Prepare for recomputing the nodal volumes
    d->startvol();
    auto meshid = d->MeshId();
    contribute( sizeof(std::size_t), &meshid, CkReduction::nop,
                CkCallback(CkReductionTarget(Transporter,resized), d->Tr()) );

  } else {

    norm_complete();
    resized();

  }

The above code snippet shows what happens immediately after advancing the solution on a chare. If ALE mesh movement is enabled, the mesh nodes have been moved in physical space, so the nodal volumes need to be recomputed so we get ready for this. This is followed by optionally computing diagnostics, which is a catch-all phrase for various norms and integral quantities, see Inciter/NodeDiagnostics.cpp for details. Note that computing diagnostics only happens every few time step, depending on user configuration. If m_diag.compute() returns true, diagnostics have been computed in this time step. If diagnostics have been computed, their correct values require global reduction operations, performing different aggregation operations depending on the value. As almost all reductions, diagnostics are also collected by Transporter, this time in target Transporter::diagnostics(), which calls back, via a broadcast, to ALECG::refine(), which performs an optional mesh refinement step.

ALECG::refine() – Optionally refine mesh

void
ALECG::refine( const std::vector< tk::real >& l2res )
// *****************************************************************************
// Optionally refine/derefine mesh
//! \param[in] l2res L2-norms of the residual for each scalar component
//!   computed across the whole problem
// *****************************************************************************
{
  auto d = Disc();

  const auto steady = g_inputdeck.get< tag::discr, tag::steady_state >();
  const auto residual = g_inputdeck.get< tag::discr, tag::residual >();
  const auto rc = g_inputdeck.get< tag::discr, tag::rescomp >() - 1;

  if (steady) {

    // this is the last time step if max time of max number of time steps
    // reached or the residual has reached its convergence criterion
    if (d->finished() or l2res[rc] < residual) m_finished = 1;

  } else {

    // this is the last time step if max time or max iterations reached
    if (d->finished()) m_finished = 1;

  }

  auto dtref = g_inputdeck.get< tag::amr, tag::dtref >();
  auto dtfreq = g_inputdeck.get< tag::amr, tag::dtfreq >();

  // Activate SDAG waits for re-computing the normals
  m_newmesh = 1;  // recompute normals after AMR (if enabled)
  thisProxy[ thisIndex ].wait4norm();
  thisProxy[ thisIndex ].wait4mesh();

  // if t>0 refinement enabled and we hit the frequency
  if (dtref && !(d->It() % dtfreq)) {   // refine

    // Convert to conserved unknowns, since the next step changes volumes
    conserved(m_u, d->Vol());

    m_refinedmesh = 1;
    d->startvol();
    d->Ref()->dtref( m_bface, m_bnode, m_triinpoel );
    d->refined() = 1;

  } else {      // do not refine

    m_refinedmesh = 0;
    d->refined() = 0;
    norm_complete();
    resized();

  }
}

The above snippet shows that mesh refinement happens only at every few time step with its frequency configured by the user. If the mesh is not refined, we simply enable the SDAG waits associated to the tasks of the mesh refinement step. If the mesh is refined, we call a member function of the mesh refiner object held by Discretization, Refiner::dtref(), which when done, eventually calls back to ALECG::resizePostAMR(), passing back the new mesh and associated data structures.

ALECG::resizePostAMR() – Resize data after mesh refinement

void
ALECG::resizePostAMR(
  const std::vector< std::size_t >& /*ginpoel*/,
  const tk::UnsMesh::Chunk& chunk,
  const tk::UnsMesh::Coords& coord,
  const std::unordered_map< std::size_t, tk::UnsMesh::Edge >& addedNodes,
  const std::unordered_map< std::size_t, std::size_t >& /*addedTets*/,
  const std::set< std::size_t >& removedNodes,
  const std::unordered_map< std::size_t, std::size_t >& amrNodeMap,
  const tk::NodeCommMap& nodeCommMap,
  const std::map< int, std::vector< std::size_t > >& bface,
  const std::map< int, std::vector< std::size_t > >& bnode,
  const std::vector< std::size_t >& triinpoel,
  const std::unordered_map< std::size_t, std::set< std::size_t > >& elemblockid )
// *****************************************************************************
//  Receive new mesh from Refiner
//! \param[in] ginpoel Mesh connectivity with global node ids
//! \param[in] chunk New mesh chunk (connectivity and global<->local id maps)
//! \param[in] coord New mesh node coordinates
//! \param[in] addedNodes Newly added mesh nodes and their parents (local ids)
//! \param[in] addedTets Newly added mesh cells and their parents (local ids)
//! \param[in] removedNodes Newly removed mesh node local ids
//! \param[in] amrNodeMap Node id map after amr (local ids)
//! \param[in] nodeCommMap New node communication map
//! \param[in] bface Boundary-faces mapped to side set ids
//! \param[in] bnode Boundary-node lists mapped to side set ids
//! \param[in] triinpoel Boundary-face connectivity
//! \param[in] elemblockid Local tet ids associated with mesh block ids
// *****************************************************************************
{
  auto d = Disc();

  d->Itf() = 0;  // Zero field output iteration count if AMR
  ++d->Itr();    // Increase number of iterations with a change in the mesh

  // Resize mesh data structures after mesh refinement
  d->resizePostAMR( chunk, coord, amrNodeMap, nodeCommMap, removedNodes,
    elemblockid );

  // Remove newly removed nodes from solution vectors
  m_u.rm(removedNodes);
  m_un.rm(removedNodes);
  m_rhs.rm(removedNodes);

  // Resize auxiliary solution vectors
  auto npoin = coord[0].size();
  auto nprop = m_u.nprop();
  m_u.resize( npoin );
  m_un.resize( npoin );
  m_rhs.resize( npoin );
  m_chBndGrad.resize( d->Bid().size() );
  tk::destroy(m_esup);
  tk::destroy(m_psup);
  m_esup = tk::genEsup( d->Inpoel(), 4 );
  m_psup = tk::genPsup( d->Inpoel(), 4, m_esup );

  // Update solution on new mesh
  for (const auto& n : addedNodes)
    for (std::size_t c=0; c<nprop; ++c) {
      Assert(n.first < m_u.nunk(), "Added node index out of bounds post-AMR");
      Assert(n.second[0] < m_u.nunk() && n.second[1] < m_u.nunk(),
        "Indices of parent-edge nodes out of bounds post-AMR");
      m_u(n.first,c) = (m_u(n.second[0],c) + m_u(n.second[1],c))/2.0;
    }

  // Update physical-boundary node-, face-, and element lists
  m_bnode = bnode;
  m_bface = bface;
  m_triinpoel = tk::remap( triinpoel, d->Lid() );

  auto meshid = d->MeshId();
  contribute( sizeof(std::size_t), &meshid, CkReduction::nop,
              CkCallback(CkReductionTarget(Transporter,resized), d->Tr()) );
}

The above snippet shows the function that is called by Refiner when it finished mesh refinement. Besides resizing the mesh-related data held locally by ALECG, e.g., ALECG::m_u, etc., we also resize all mesh-related data structures in Discretization. In addition nodal volumes must also be recomputed after mesh refinement. The control flow in all ALECG chares eventually end up with a global synchronization by calling Transporter::resized() - once again if ALE was enabled and mesh refinement has also happened in this time step. The value of ALECG::m_newmesh differentiates what step calls Transporter::resized() (after ALE or after AMR). Transporter::resized() then starts recomputing the volumes and recomputing the left-hand side (LHS).

ALECG::lhs() – Compute LHS

void
ALECG::lhs()
// *****************************************************************************
// Compute the left-hand side of transport equations
//! \details Also (re-)compute all data structures if the mesh changed.
// *****************************************************************************
{
  // No need for LHS in ALECG

  // (Re-)compute boundary point-, and dual-face normals
  norm();
}

As the above ALECG::lhs() code snippet shows, there is no need to compute the left-hand side (LHS) in ALECG. This is because the unknowns stored are the same as the ALECG algorithm solves for, so there is no matrix (not even a diagonal one) on the left-hand side of the system that is solved. However, the function is still called ALECG::lhs() because this is an entry point in all schemes, returning from, e.g., an adaptive mesh refinement step. Also, there are other tasks started from ALECG::lhs(): these are (1) the recomputation of the ALE mesh velocity and (2) the recomputation of the boundary point-, and dual-face normals, due to moving and/or changing the topology of the mesh. (If there was a LHS, it would change if the mesh moved or the topology changed.) These two tasks are independent and so they happen on indepent execution threads. Both threads of execution progress through a series of steps also involving communication. When all tasks are complete, the runtime system continues execution of the chare as specified in the DAG in Inciter/alecg.ci – Structured DAG, in ALECG::merge():

void
ALECG::mergelhs()
// *****************************************************************************
// The own and communication portion of the left-hand side is complete
// *****************************************************************************
{
  // Combine own and communicated contributions of normals
  normfinal();

  if (Disc()->Initial()) {
    volumetric( m_u, Disc()->Vol() );
    // Output initial conditions to file
    writeFields( CkCallback(CkIndex_ALECG::start(), thisProxy[thisIndex]) );
  } else {
    norm_complete();
  }
}

The own and communication contributions to the normal vectors are combined in ALECG::normfinal(). If ALECG::merge() is called during the initial setup, we first output the initial conditions into files then continue to starting the time stepping. If ALECG::merge() is called during time stepping, we wait for yet another thread of execution to complete, as indicated in Inciter/alecg.ci – Structured DAG, in wait4mesh, which resizes the data structures that change after a mesh refinement step.

New time step stage

When all of all necessary data structures have been recomputed after the optional AMR step, execution continues in ALECG::stage(), which starts a new time step stage if it is not the last Runge-Kutta stage.

void
ALECG::stage()
// *****************************************************************************
// Evaluate whether to continue with next time step stage
// *****************************************************************************
{
  transfer_complete();

  // Increment Runge-Kutta stage counter
  ++m_stage;

  // if not all Runge-Kutta stages complete, continue to next time stage,
  // otherwise output field data to file(s)
  if (m_stage < 3) chBndGrad(); else out();
}

If it is the last Runge-Kutta stage, we optionally output results to file then continue to the next time step calling ALECG::next(). Before continuing to a new time step, we optionally perform load balancing as well as saving a checkpoint, then test for the exit condition to see if time stepping is to be continued.

6. Making it all work

Only a couple of minor, but important, steps remain. First we add the new Charm++ module as an external module in inciter's Charm++ module. This is required so that all Charm++ code that references the new ALECG Charm++ chare array is visible and can correctly interact with Inciter's main charm chare.

Main/inciter.ci

$ git diff src/Main/inciter.ci
diff --git a/src/Main/inciter.ci b/src/Main/inciter.ci
index bf7eac98..e9b114b6 100644
--- a/src/Main/inciter.ci
+++ b/src/Main/inciter.ci
@@ -14,6 +14,7 @@ mainmodule inciter {
   extern module partitioner;
   extern module matcg;
   extern module diagcg;
+  extern module alecg;
   extern module dg;
   extern module charestatecollector;

The second, and final, step is to enable triggering the instantiation of specialized CGPDE class objects for our new ALECG scheme when the system of systems is instantiated. This associates the type of generic PDE systems that is used to instantiate the PDE classes, selected by user configuration. Since ALECG will be a node-centered scheme, we assign it to use the CGPDE polymorphic interface (instead of DGPDE, which is tailored for cell-centered discretizations).

PDE/PDEStack.cpp

$ git diff src/PDE/PDEStack.cpp
diff --git a/src/PDE/PDEStack.cpp b/src/PDE/PDEStack.cpp
index 438cb5e3..9b2e14e7 100644
--- a/src/PDE/PDEStack.cpp
+++ b/src/PDE/PDEStack.cpp
@@ -108,7 +108,9 @@ PDEStack::selectedCG() const
   std::vector< CGPDE > pdes;                // will store instantiated PDEs

   const auto sch = g_inputdeck.get< tag::discr, tag::scheme >();
-  if (sch == ctr::SchemeType::DiagCG) {
+  if (sch == ctr::SchemeType::DiagCG || sch == ctr::SchemeType::ALECG) {

     for (const auto& d : g_inputdeck.get< tag::selected, tag::pde >()) {
       if (d == ctr::PDEType::TRANSPORT)
         pdes.push_back( createCG< tag::transport >( d, cnt ) );
       else if (d == ctr::PDEType::COMPFLOW)
         pdes.push_back( createCG< tag::compflow >( d, cnt ) );
       else Throw( "Can't find selected CGPDE" );
     }

   }

7. Augment unit tests for Scheme

Though this is not strictly necessary, we also augment the unit tests of Scheme exercising our new discretization scheme:

$ git diff develop src/UnitTest/TUTSuite.hpp tests/unit/Inciter/TestScheme.cpp
diff --git a/src/UnitTest/TUTSuite.hpp b/src/UnitTest/TUTSuite.hpp
index 191b3972..dd904b02 100644
--- a/src/UnitTest/TUTSuite.hpp
+++ b/src/UnitTest/TUTSuite.hpp
@@ -61,7 +61,7 @@ class TUTSuite : public CBase_TUTSuite {
         { "Base/Factory", 2 }
       , { "Base/PUPUtil", 14 }
       , { "Base/Timer", 1 }
-      , { "Inciter/Scheme", 3 }
+      , { "Inciter/Scheme", 4 }
     };

     // Tests that must be run on PE 0
diff --git a/tests/unit/Inciter/TestScheme.cpp b/tests/unit/Inciter/TestScheme.cpp
index 6dc48c75..e4acfce4 100644
--- a/src/UnitTest/tests/Inciter/TestScheme.cpp
+++ b/src/UnitTest/tests/Inciter/TestScheme.cpp
@@ -84,6 +84,8 @@ void Scheme_object::test< 1 >() {
   ensure_equals( "Underlying type", c.which(), 1 );
   inciter::Scheme d( inciter::ctr::SchemeType::DG );
   ensure_equals( "Underlying type", d.which(), 2 );
+  inciter::Scheme a( inciter::ctr::SchemeType::ALECG );
+  ensure_equals( "Underlying type", a.which(), 3 );
 }

 //! Test if operator[] returns the correct underlying type
@@ -97,6 +99,8 @@ void Scheme_object::test< 2 >() {
   ensure_equals( "Underlying element type", c.which_element(), 1 );
   inciter::Scheme d( inciter::ctr::SchemeType::DG );
   ensure_equals( "Underlying element type", d.which_element(), 2 );
+  inciter::Scheme a( inciter::ctr::SchemeType::ALECG );
+  ensure_equals( "Underlying element type", a.which_element(), 3 );
 }

@@ -162,6 +166,27 @@ void Scheme_object::test< 5 >() {
     inciter::Scheme( inciter::ctr::SchemeType::DG ), 2, "DG" );
 }

+//! Test Pack/Unpack of Scheme holding CProxy_AELCG
+//! \details Every Charm++ migration test, such as this one, consists of two
+//!   unit tests: one for send and one for receive. Both trigger a TUT test,
+//!   but the receive side is created manually, i.e., without the awareness of
+//!   the TUT library. Unfortunately thus, there is no good way to count up
+//!   these additional tests, and thus if a test such as this is added to the
+//!   suite this number must be updated in UnitTest/TUTSuite.hpp in
+//!   unittest::TUTSuite::m_migrations.
+template<> template<>
+void Scheme_object::test< 6 >() {
+  // This test spawns a new Charm++ chare. The "1" at the end of the test name
+  // signals that this is only the first part of this test: the part up to
+  // firing up an asynchronous Charm++ chare. The second part creates a new test
+  // result, sending it back to the suite if successful. If that chare never
+  // executes, the suite will hang waiting for that chare to call back.
+  set_test_name( "Charm:migrate Scheme(ALECG) 1" );
+
+  CProxy_Receiver::ckNew(
+    inciter::Scheme( inciter::ctr::SchemeType::ALECG ), 3, "ALECG" );
+}

Now that we will test ALECG using the unit test harness, UnitTest, we also have to make the UnitTest build target depend on the new ALECG Charm++ module:

$ git diff src/UnitTest/CMakeLists.txt
diff --git a/src/UnitTest/CMakeLists.txt b/src/UnitTest/CMakeLists.txt
index bb740285..e0ea47fe 100644
--- a/src/UnitTest/CMakeLists.txt
+++ b/src/UnitTest/CMakeLists.txt
@@ -48,6 +48,7 @@ add_dependencies("UnitTest" "unittestCharmModule")
 if (ENABLE_INCITER)
   add_dependencies("UnitTest" "matcgCharmModule")
   add_dependencies("UnitTest" "diagcgCharmModule")
+  add_dependencies("UnitTest" "alecgCharmModule")
   add_dependencies("UnitTest" "distfctCharmModule")
   add_dependencies("UnitTest" "dgCharmModule")
   add_dependencies("UnitTest" "discretizationCharmModule")

8. Add new regression tests

Finally, we also add a bunch of new regression tests that stress-test the asynchronous logic in the discretization scheme classes:

$ git diff tests/regression/inciter/transport/SlotCyl/asynclogic/CMakeLists.txt
index b54a207d..62732129 100644
--- a/tests/regression/inciter/transport/SlotCyl/asynclogic/CMakeLists.txt
+++ b/tests/regression/inciter/transport/SlotCyl/asynclogic/CMakeLists.txt
@@ -1,7 +1,7 @@
 # See cmake/add_regression_test.cmake for documentation on the arguments to
 # add_regression_test().

-foreach(scheme matcg diagcg dg)
+foreach(scheme matcg diagcg dg alecg)
   foreach(virt 0.0 0.5 0.9)
     foreach(npes RANGE 1 8)
       add_regression_test(asynclogic_${scheme}_${virt} ${INCITER_EXECUTABLE}