MultiAIM: Fast Electromagnetic Analysis of Multiscale Structures Using Boundary Element Methods

Integral equation methods are extensively used in computational electromagnetics, and can be applied to large problems when accelerated with fast multipole or fast Fourier transform (FFT) techniques. Unfortunately, the efficiency of FFT-based acceleration schemes can be dramatically reduced by the presence of multiscale features. Large triangles impose a relatively coarse uniform grid, and large near regions where the FFT must be replaced by direct integration. Since many small triangles can fall in these regions, integration costs become prohibitive, diminishing the benefits provided by the FFT. We propose an efficient and robust algorithm to overcome this barrier, based on the multigrid concept. A hierarchy of grids of different resolution is used to simultaneously resolve subwavelength details and propagate fields efficiently across large distances with the FFT. Integration and precorrection costs are minimized by adapting projection stencils to the size of each triangle and enabling the use of the quasi-static Green's function at short distances. Finally, a careful implementation based on sparse matrices exploits empty regions to reduce computational cost and memory consumption. The method is fully automated, and was tested on several structures including layouts of commercial products. Compared to existing adaptive integral method (AIM) algorithms, we demonstrate a speed-up between 7.1× and 24.7×, and a reduction in memory consumption of up to 2.9×.

between their largest dimension and their tiniest geometrical feature, makes their design very challenging. A full-wave analysis is extremely time consuming and often unfeasible, even with state-of-the-art algorithms and supercomputing facilities. After the initial stages of design, which can rely on simplifying assumptions (such as periodic analysis or homogenization), a full-wave analysis is essential to assess and optimize performance before fabrication. Similar challenges arise in the design of the interconnect networks found in integrated circuits, packages, and printed circuit boards [6], [7], where electromagnetic simulations are indispensable for managing interference, crosstalk, and distortion. Such networks can span several millimeters or centimeters [8] but contain fine features on the order of micrometers, such as vias [9]. The electromagnetic analysis of these structures is further burdened by their extremely intricate layout, their density, and frequencies of operation as high as several tens or even hundreds of gigahertz. These features make a full-wave electromagnetic analysis often prohibitive. Multiscale problems are known to reduce the efficiency of most algorithms for computational electromagnetics, including volume integral equation methods and surface integral equation methods, also referred to as boundary element methods (BEMs) [10]. The latter are the focus of this contribution because they are well suited for antennas and interconnect networks, due to their intrinsic ability to model open problems and layered substrates while requiring only a discretization of the scatterers.
The BEM discretizes Maxwell's equations in integral form and results in a dense system of equations to be solved [11]. Since a direct solution is prohibitively expensive, an iterative solver is often used with a fast algorithm to compute matrix-vector products. The most well-known acceleration method is the fast multipole method (FMM) [12], [13], [14], [15], which calculates matrix-vector products with O(N log N) complexity. However, the FMM relies on an analytical expansion of the Green's function, a task that becomes less straightforward when a layered substrate is present, as is frequently the case in metasurfaces, antenna arrays, and interconnect networks [16]. In the presence of layered substrates, a kernel-independent method is preferred, such as those based on the fast Fourier transform (FFT), like the adaptive integral method (AIM) [17], [18], [19], [20] and the precorrected-FFT algorithm [21]. In these FFT-based methods, a uniform grid is introduced. The current distribution on each mesh element is projected onto equivalent point sources on the uniform grid using a moment-matching method [22]. This enables the computation of radiated fields on the uniform grid with an FFT rather than a costly dense matrix-vector product. Finally, fields on the original mesh elements are obtained by interpolation. This strategy reduces the cost of the matrix-vector product in the BEM to O(N^1.5 log N) operations [17], and avoids an explicit integration of the Green's function. However, these operations are inaccurate for mesh elements that are close to each other. Interactions between nearby mesh elements must instead be computed through direct numerical integration of the Green's function.

0018-926X © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
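The project-propagate-interpolate pipeline described above can be sketched in one dimension. This is an illustrative toy (nearest-node projection and a generic static 1/(4π|r|) kernel are our simplifying assumptions), not the paper's implementation:

```python
import numpy as np

def kernel(r):
    """Free-space static kernel 1/(4π|r|), zeroed at r = 0 for this toy."""
    a = np.abs(r)
    return np.where(a < 1e-12, 0.0, 1.0 / (4.0 * np.pi * np.maximum(a, 1e-12)))

def aim_matvec_1d(src_pos, q, obs_pos, h, n_grid):
    """Toy 1-D AIM-style matvec: project charges onto a uniform grid
    (nearest-node projection), propagate on the grid with an FFT-based
    convolution, and interpolate fields back at the observers.
    Accurate only for well-separated pairs (the far region)."""
    # 1) projection: accumulate each charge on its nearest grid node
    grid_q = np.zeros(n_grid)
    np.add.at(grid_q, np.rint(np.asarray(src_pos) / h).astype(int), q)
    # 2) propagation: linear convolution with the sampled kernel via FFT
    K = kernel(np.arange(-(n_grid - 1), n_grid) * h)   # kernel at all grid offsets
    n_fft = 3 * n_grid - 2                             # long enough to avoid aliasing
    conv = np.fft.irfft(np.fft.rfft(K, n_fft) * np.fft.rfft(grid_q, n_fft), n_fft)
    field = conv[n_grid - 1 : 2 * n_grid - 1]          # field sampled at grid nodes
    # 3) interpolation: read the field at the node nearest each observer
    return field[np.rint(np.asarray(obs_pos) / h).astype(int)]
```

For sources and observers lying exactly on grid nodes, this matches the direct O(N^2) sum; nearby pairs would instead need the direct numerical integration discussed in the text.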
Unfortunately, multiscale structures can significantly reduce the efficiency of the AIM, as they lead to a mesh with triangle sizes that span several orders of magnitude. In the AIM, the largest mesh element forces the uniform grid to be relatively coarse; otherwise, the current distribution of the largest triangle cannot be accurately represented as point sources. While a coarse uniform grid minimizes FFT costs, it increases the number of triangle pairs that fall in the near region. Thus, the number of expensive integrations involving the Green's function is also increased [23], [24]. As the numerical results show, the impact on simulation time can be dramatic for multiscale structures, where many tiny triangles can fall in the near region [25].
Another disadvantage of FFT-based acceleration methods is that the uniform grid must cover the entire volume occupied by the scatterers, including any empty volume inside or between them. This reduces efficiency compared to FMM-based methods, where empty volumes do not increase computational cost.
In this article, we introduce an acceleration technique for the BEM, henceforth referred to as MultiAIM, that is applicable to multiscale problems and addresses these challenges. The method is based on four ideas.
1) Adaptive Projection Stencils: The size of the stencils used in the AIM to project Rao-Wilton-Glisson (RWG) basis functions onto point sources is adapted (scaled) to the size of each mesh triangle. In this way, both large and small triangles can be projected accurately. Since the uniform grid resolution is no longer constrained by the largest mesh triangle, it can be made significantly finer to minimize near-region integrations.

2) Multigrid: Since a fine uniform grid would increase FFT costs, we adopt the multigrid approach originally introduced by Brandt [26] for point-like sources and later applied to other basis functions [27]. A hierarchy of uniform grids is introduced, with a resolution that coarsens by 2× at each level. Fields are translated between levels by interpolation and anterpolation steps. By introducing coarser levels, we are able to propagate fields with FFTs on a grid whose resolution is restricted solely by the wavelength, irrespective of the resolution of the triangular mesh and of the geometrical features of the underlying structure. Earlier work on multigrid concepts applied to electromagnetic simulations was performed by Wang and Chan [28], Shi and Chan [29], and Zhao et al. [30], based on partitioning the object into subcubes using an octree data structure, a method similar to the multilevel FMM [13]. Yang et al. [27] applied multigrid methods to accelerate the AIM in the case of volume integral equations. The suitability of these earlier methods for multiscale problems was not investigated. Multigrid concepts were also used to develop better preconditioners [31], [32]. Our work: 1) investigates how multigrid techniques perform for multiscale problems; 2) identifies some bottlenecks that limit their efficiency; 3) proposes methods to overcome them; and 4) demonstrates the excellent performance of the resulting algorithm on realistic multiscale problems.

3) Quasi-Static Approximation: We show that, in some scenarios, the proposed method shrinks the near region to the point where the quasi-static Green's function can be used, for further efficiency and ease of implementation.

4) Exploiting Voids and Sparsity: We demonstrate that several matrices in the proposed method are sparse, due to the absence of active sources in the empty regions of the domain. Such sparsity translates into significant computational savings, which can only be exploited by the proposed multigrid approach. In a single-grid method, such as the conventional AIM, sparsity cannot be exploited, since it is challenging for FFT algorithms to take advantage of any sparsity in the underlying data [33].

When combined, the four proposed contributions lead to computational savings of up to 20× for realistic structures. The paper is organized as follows. In Section II, we introduce the BEM formulation we adopt and review the conventional AIM. In Section III, we formulate the proposed method. In Section IV, we provide numerical examples drawn from several applications to demonstrate the accuracy and efficiency of the proposed method in analyzing multiscale structures.

A. Formulation
We consider a set of perfectly conducting objects immersed in a homogeneous medium. The surface S of the objects is discretized into a triangular mesh, with n_t triangles and n_e inner edges. The goal is to solve Maxwell's equations when the structure is illuminated by an incident field or excited through lumped ports. We use the augmented electric field integral equation (AEFIE) [34] along with the surface impedance boundary condition (SIBC) [35] to formulate the problem. The AEFIE is chosen because of its extended low-frequency and multiscale capabilities compared to the EFIE, and because it can be easily excited with lumped ports in addition to an incident field. The SIBC is used to model conductors due to its efficiency and simplicity, and because it can be seamlessly integrated into the AEFIE formulation. However, the proposed method is applicable to essentially any integral formulation. The AEFIE consists of the regular EFIE augmented by the continuity equation. As a result, the surface current density J_S and surface charge density ρ_S are taken as unknowns on S.
After discretizing the two equations with the method of moments (MoM) [23], using RWG basis functions for J_S and pulse basis functions for ρ_S, the discretized AEFIE can be written as in (1), where k_0 is the wavenumber, η_0 is the wave impedance, and c_0 is the phase velocity in the medium. In (1), L^(A) and L^(φ) are the MoM matrices representing the vector and scalar potential parts of the single layer operator. The SIBC is enforced by Z_s. The incidence matrix D^T ∈ R^(n_t × n_e) links the functional spaces of the pulse functions defined on each triangle and the RWG basis functions defined on every edge [36]. Matrices F and B are used to enforce charge neutrality on each isolated object [34]. For plane-wave illumination with a given incident field E^i, one can use (1) to solve for ρ_S and J_S. In the case of port excitation, (1) should be rewritten as in (2), where I_s contains the current injected at each port, and C accounts for their contribution to the continuity equation [37].
It is important to note that in (1) and (2), all matrices are sparse, except for L^(A) and L^(φ), which are dense.

B. Adaptive Integral Method (AIM)
In the AIM [17], a uniform grid is introduced with resolution Δ_x^(1) × Δ_y^(1) × Δ_z^(1). Note that here the superscript 1 refers to the uniform grid, while 0 denotes the surface mesh. The MoM matrices L^(A) and L^(φ) are split into near-region and far-region components as

L = L_N + L_F. (3)

The near-region matrix L_N contains the interactions between nearby mesh elements. Numerical integration must be used to compute its entries, which is computationally expensive, as singularity extraction techniques [23] must be used to deal with the singular and near-singular integrals in L_N. In contrast, the far-region matrix L_F contains the interactions between distant mesh elements. Due to the smoothness of the Green's function, this matrix can be approximated as

L_F ≈ W^(0,1) H^(1) P^(1,0) − L_c (4)

where matrix P^(1,0) projects the RWG basis functions and pulse functions defined on the triangular mesh S onto point sources on the uniform grid. The propagation matrix H^(1) computes the fields on the grid produced by those point sources. Having a three-level Toeplitz structure, H^(1) can be stored very cheaply, and its action on a vector can be computed with a 3-D FFT, which has a much lower computational complexity than a dense matrix-vector product. The fields on the uniform grid are interpolated by W^(0,1) to determine the fields on the original mesh triangles. Since the grid-based approximation is inaccurate at short distances, its contribution in the near region is removed with the precorrection matrix L_c [17].
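The Toeplitz-times-vector product at the heart of the FFT acceleration can be illustrated in one dimension: a Toeplitz matrix embeds into a circulant matrix, which the FFT diagonalizes. A minimal sketch (the paper's H^(1) is three-level Toeplitz and uses 3-D FFTs; this 1-D version only illustrates the principle):

```python
import numpy as np

def toeplitz_matvec_fft(first_col, first_row, x):
    """Multiply an n-by-n Toeplitz matrix T by x in O(n log n), where
    first_col = T[:, 0] and first_row = T[0, :] (first_row[0] == first_col[0]).
    T is embedded in a 2n-by-2n circulant matrix, which the FFT diagonalizes."""
    n = len(x)
    # circulant generator: [t_0 .. t_{n-1}, pad, t_{-(n-1)} .. t_{-1}]
    c = np.concatenate([first_col, [0.0], first_row[:0:-1]])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, 2 * n))[:n]
    return y.real if np.isrealobj(first_col) and np.isrealobj(x) else y
```

Only the first column and row are stored, mirroring the cheap storage of H^(1) noted in the text.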

C. Extended Adaptive Integral Method (AIMx)
The efficiency of the AIM is further improved in AIMx [18]. In AIMx, the homogeneous Green's function is split into a frequency-independent singular part and a frequency-dependent smooth part

G(r) = G_s(r) + G_d(r), G_s(r) = 1/(4πr), G_d(r) = (e^(−jk_0 r) − 1)/(4πr). (5)

This operation decomposes the L^(A) and L^(φ) matrices in (1) and (2) into their frequency-independent parts, L^(A)_s and L^(φ)_s, which contain the singularity and require explicit numerical integration in the near region as in (3), and their frequency-dependent parts, L^(A)_d and L^(φ)_d, which are smooth enough to be approximated by grid-based interpolation and projection in both the near and far regions. Denoting with L either L^(A) or L^(φ), we obtain the following expression for their efficient computation:

L ≈ L_s + W^(0,1) H_d^(1) P^(1,0) (6)

where L_s is computed as in (3) and (4) with the static kernel G_s, and H_d^(1) propagates fields on the grid with the smooth kernel G_d. The first term requires direct integration and precorrection due to the singularity of the Green's function. However, being frequency-independent, it has to be computed only once per frequency sweep. As the frequency changes, only the second term has to be recomputed. Being smooth, this term can be computed expeditiously with FFTs. It is also noteworthy that, in addition to per-frequency time savings, the CPU time required to generate every entry of the near-region matrix is reduced compared to the conventional AIM. This efficiency gain is achieved by using only the static form of the Green's function as the integral kernel, which is faster than the full-wave Green's function since it avoids evaluating the exponential function.
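The static/dynamic split of the Green's function can be checked numerically. The sketch below (with the illustrative choice k = 2π, i.e., λ = 1) shows that the dynamic part stays bounded as r → 0, approaching −jk/(4π), while the 1/r singularity is confined to the static part:

```python
import numpy as np

k = 2.0 * np.pi  # wavenumber for λ = 1 (illustrative choice)

def G_full(r):
    """Full-wave free-space Green's function e^{-jkr} / (4πr)."""
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

def G_static(r):
    """Singular, frequency-independent part 1 / (4πr)."""
    return 1.0 / (4.0 * np.pi * r)

def G_dynamic(r):
    """Smooth, frequency-dependent remainder (e^{-jkr} - 1) / (4πr),
    written via half-angle identities so it stays accurate for kr << 1:
    e^{-jkr} - 1 = -2 sin^2(kr/2) - j sin(kr)."""
    return (-2.0 * np.sin(0.5 * k * r) ** 2 - 1j * np.sin(k * r)) / (4.0 * np.pi * r)
```

Because G_dynamic has no singularity, it can be tabulated on a grid without singularity extraction, which is exactly what makes the second term of the split cheap to recompute per frequency.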

D. Challenges for Multiscale Layouts
The mesh of a multiscale layout will unavoidably contain triangles of very different sizes, a scenario that undermines the efficiency of the AIM. The degradation is intrinsic to how the AIM works, and cannot be remedied by choosing a different resolution for the uniform grid. The largest mesh triangle will impose a relatively coarse resolution for the uniform grid, since each triangle must be contained in the stencil of points used by the projection matrix P^(1,0). The near region associated with the largest triangles will be fairly large, contain many small triangles, and lead to dense blocks in L_N, resulting in prohibitive integration and precorrection costs. Increasing the grid resolution would reduce such costs but compromise accuracy, since the biggest triangles would be larger than the stencil of points meant to represent their far-region contribution, as shown in the left panel of Fig. 1. Moreover, it would increase FFT costs.

III. PROPOSED METHOD
The proposed method overcomes the efficiency issues of the AIM for multiscale structures by simultaneously enabling the use of the following.
1) A fine uniform grid, to resolve tiny features and minimize integration and precorrection costs.
2) A distinct, coarsened grid with an optimal resolution that minimizes the cost of propagating fields with FFTs, even for electrically large structures.

A. Adaptive Stencils
Large triangles are the obstacle to using a finer uniform grid, as shown in Fig. 1(a). This is because, in the conventional AIM, stencils only include consecutive grid points along each dimension. To fully enclose large triangles, an unrealistically high polynomial order would be required, increasing computational cost and potentially causing numerical issues, such as Runge's phenomenon.
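Runge's phenomenon is easy to reproduce: interpolating a smooth but sharply peaked function with a high-order polynomial on equispaced nodes makes the maximum error grow rather than shrink. A small illustration (the Runge function 1/(1 + 25x^2) is a standard test case, not taken from the paper):

```python
import numpy as np

def equispaced_interp_error(order):
    """Max error on [-1, 1] when fitting the Runge function
    f(x) = 1/(1 + 25 x^2) with a degree-`order` polynomial through
    equispaced nodes; the error grows with the order (Runge's phenomenon)."""
    f = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)
    nodes = np.linspace(-1.0, 1.0, order + 1)
    coeffs = np.polyfit(nodes, f(nodes), order)   # interpolating polynomial
    x = np.linspace(-1.0, 1.0, 2001)
    return float(np.max(np.abs(np.polyval(coeffs, x) - f(x))))
```

This is why scaling the stencil spacing, rather than raising the polynomial order, is the more robust way to cover a large triangle.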
To overcome this issue, we propose to adapt the projection stencils to the size of each mesh triangle, as shown in Fig. 1(b). Rather than always using consecutive grid points, the stencil is scaled along each dimension by an integer scaling factor α_x, α_y, or α_z. For the ith triangle, the position of the jth grid node ξ^(i,j) in its stencil is therefore controlled by the scaling factor α. Taking the x direction as an example, the x coordinate ξ^(i,j)_x of ξ^(i,j) is expressed as

ξ^(i,j)_x = c^i_x + α_x (j − M/2) Δ_x^(1), j = 0, . . ., M (7)

where α_x controls the scaling behavior in the x direction, and c^i_x is the x coordinate of the node nearest to the center of the ith triangle. There are in total M + 1 grid points along the x direction, and an Mth-order polynomial is used to interpolate the RWG basis function. Here, the order M is selected specifically to capture the varying behavior of the oscillatory Green's function [17]. Determining which pairs of mesh elements should be considered near-region elements differs from the conventional AIM, and depends on both the grid resolution and the size of the adaptive stencil used. The algorithm that finds the near-region mesh elements for a given adaptive stencil, onto which the source triangle is projected, is provided in Algorithm 1. As Fig. 2 also shows, if the stencils onto which the test triangles are projected share grid points with the near-region grid points of a given source stencil, the test triangles are deemed near-region elements.
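A sketch of the scaled-stencil rule along one axis follows. The integer centering (j − M//2) and the rule for sizing α from the triangle extent are our assumptions for illustration, not taken from the paper:

```python
import math

def stencil_coords_1d(c_x, alpha_x, M, dx):
    """x-coordinates of the M+1 stencil nodes for one triangle: points
    centred on c_x (the grid node nearest the triangle centre) and spaced
    by alpha_x * dx instead of dx. Integer offsets keep the points on-grid."""
    return [c_x + (j - M // 2) * alpha_x * dx for j in range(M + 1)]

def scaling_factor(extent_x, M, dx):
    """Smallest integer alpha_x whose stencil span (M * alpha_x * dx)
    covers the triangle extent along x (assumed sizing rule)."""
    return max(1, math.ceil(extent_x / (M * dx)))
```

For example, a triangle of extent 0.5 on a grid with dx = 0.1 and M = 2 needs alpha_x = 3, so the three stencil nodes sit 0.3 apart instead of 0.1, enclosing the triangle without raising the polynomial order.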

B. Multiple Grids
Instead of propagating fields on the finest grid, which has a relatively small grid spacing, a hierarchy of L uniform grids is introduced, as in multigrid algorithms [26], [27], with the ultimate objective of controlling the resolution of the coarsest grid, where fields are propagated. The scenario of L = 2 grids is illustrated in Fig. 3, where the grid resolution coarsens by a factor of two at each level. Letting Δ^(l)_{x,y,z} be the resolution of the level-l grid, we have

Δ^(l+1)_{x,y,z} = 2 Δ^(l)_{x,y,z} (8)

for l = 1, . . ., L − 1. With multiple grids, the far-region matrix L_F can be approximated as

L_F ≈ W^(0,1) H̃^(1) P^(1,0) − L_c. (9)

Comparing (4) and (9), we see that the propagation matrix H^(1) has been replaced by a new propagation matrix H̃^(1), which uses the multiple grids to propagate fields. This new propagation matrix is defined recursively as [26]

H̃^(l) = W^(l,l+1) H̃^(l+1) P^(l+1,l) + H^(l)_c, l = 1, . . ., L − 1, with H̃^(L) = H^(L) (10)

where P^(l+1,l) is a projection matrix that projects point sources defined on level l onto point sources on level l + 1, as shown in the upper row of Fig. 3. Sources are projected recursively up to the coarsest level, where the propagator H^(L) computes the field they produce on the level-L grid. The action of this propagator can be computed quickly with FFTs [26], as in the conventional AIM, with the difference that the FFTs are performed on a much coarser grid. Finally, fields are interpolated, recursively, from the coarsest grid l = L to the finest grid l = 1 using pth-order Lagrange interpolation [26], as shown in the second row of Fig. 3. Note that the order p can be different from the interpolation order M used in generating P^(1,0). Interpolation is implemented through multiplication by the sparse matrix W^(l,l+1). The sets of interpolation coefficients can be calculated in O(p^3) operations, due to the translational invariance of the uniform grids. Therefore, W^(l,l+1) can be set up efficiently using precomputed values [26]. Meanwhile, P^(l+1,l) is simply the transpose of W^(l,l+1). Since the Green's function exhibits a singularity when the observation grid point n and source grid point n′ coincide, polynomial interpolation becomes inaccurate for fields computed in the near region of source points. As a result, a correction term H^(l)_c is needed at levels l = 1, . . ., L − 1, which subtracts the near-region fields obtained through the coarser levels and adds the fields that propagate directly from n′ to n through the single-level propagation matrix H^(l). The correction matrix, which is nonzero only in the near region, is defined as

[H^(l)_c]_{nn′} = [H^(l) − W^(l,l+1) H̃^(l+1) P^(l+1,l)]_{nn′} if d_{x,y,z} ≤ r_l Δ^(l)_{x,y,z}, and 0 otherwise (11)

where d_{x,y,z} is the maximum distance along each axis between n′ and n, and r_l is a threshold with a default value of 1. Note that, similar to W^(l,l+1), H^(l)_c can be generated efficiently by accessing entries from a precomputed table, owing to the translational invariance of the free-space Green's function. In Section IV-A, we will show how the number of grid levels L and the resolution of the grids can be chosen to ensure adequate accuracy and minimize computational costs.

Fig. 3. Schematic representation of the multigrid formulation using a simple example consisting of two grids. The workflow starts with the triangular mesh, followed by projection of the source currents, field propagation, and field interpolation.
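The recursive definition of the multilevel propagator can be sketched with small dense matrices. This is a toy version only: in MultiAIM the W, P, and correction matrices are sparse, P^(l+1,l) is the transpose of W^(l,l+1), and the coarsest-level propagator acts via FFTs:

```python
import numpy as np

def apply_multigrid_propagator(l, L, q, W, P, Hc, H_top):
    """Apply the recursive propagator to level-l grid sources q:
    level L:      field = H_top @ q
    level l < L:  field = W[l] @ (propagation at level l+1 of P[l] @ q)
                          + Hc[l] @ q   (near-region correction)."""
    if l == L:
        return H_top @ q
    coarse_field = apply_multigrid_propagator(l + 1, L, P[l] @ q, W, P, Hc, H_top)
    return W[l] @ coarse_field + Hc[l] @ q
```

For L = 2, this reduces to applying W[1] @ H_top @ P[1] + Hc[1] to q, i.e., exactly one coarsening step plus its near-region correction.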

C. Quasi-Static Approximation
When using a fine grid to minimize the size of the near region, the grid spacing at the finest level becomes deeply subwavelength, and the near region becomes electrically small. In some cases, the near region becomes so small that the numerical integration required to compute L_N in (3) can be performed using the quasi-static approximation (QSA) of the Green's function

G(r) ≈ 1/(4πr). (12)

The use of the quasi-static form of the Green's function accelerates the numerical integration by avoiding the evaluation of the exponential function in the full-wave Green's function, further enhancing the computational efficiency of setting up the near-region matrices L_N and L_c in (3) and (4). A detailed numerical investigation with and without this approximation is provided in Section IV-F, demonstrating when the approximation can be used, and its computational benefits.

Algorithm 1 Identification of Near-Region Elements
1: Input: distance threshold d_N, projection matrix P^(1,0), source stencil and its scaling parameters α.
2: Output: index set I_t containing the indices of the near-region elements.
3: Initialize I_g = ∅ and I_t = ∅.
4: Get the index ξ = (ξ_x, ξ_y, ξ_z) of the node closest to the center of the source stencil.
5: Determine the near-region extents d_x, d_y, d_z from d_N and α.
6: for i = ξ_x − d_x to i = ξ_x + d_x do
7:   for j = ξ_y − d_y to j = ξ_y + d_y do
8:     for k = ξ_z − d_z to k = ξ_z + d_z do
9:       Add (i, j, k) to I_g
10:     end for
11:   end for
12: end for
13: for every grid point g in I_g do
14:   Let i be the index of g
15:   Define C_i as the set of nonzero column indices in the i-th row of P^(1,0)
16:   for every element e in C_i do
17:     Add e to I_t
18:   end for
19: end for
20: return I_t
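The accuracy of the quasi-static kernel is easy to bound: its relative error against the full-wave kernel e^(-jkr)/(4πr) is |1 − e^(-jkr)| = 2|sin(kr/2)| ≈ kr, so it is controlled directly by the electrical size of the near region. A small check (illustrative numbers, not from the paper):

```python
import numpy as np

def qsa_relative_error(k, r):
    """Relative error of the quasi-static kernel 1/(4πr) with respect to
    the full-wave kernel e^{-jkr}/(4πr): |1 - e^{-jkr}| = 2|sin(kr/2)|,
    which is approximately kr for electrically small separations."""
    return 2.0 * np.abs(np.sin(0.5 * k * r))
```

For instance, with λ = 1 (k = 2π), a near-region separation of r = λ/1000 gives a kernel error below 1%, while at r = λ/10 the phase error is already large, which is why the QSA only becomes viable once the fine grid has shrunk the near region.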

D. Exploiting Sparsity
In the BEM, basis functions reside only on the surface of the scatterers, and are projected onto point sources located on the nodes of the uniform grid surrounding each triangle. As a result, only a few nodes of the uniform grid contain active point sources. Most nodes, located inside the scatterers or in the empty space between them, are not utilized, making the matrices P^(1,0) and W^(0,1) quite sparse. In the conventional AIM, this sparsity can be exploited when generating and applying those matrices, but not when propagating the fields, since H^(1) is dense and the FFT is most efficiently performed on a dense array. This is in contrast with FMMs [12], [13], [14], which require neither operations nor memory for empty regions.

Algorithm 2 Identification of Active Grid Points
1: Input: W^(l−1,l), for l = 1, 2, . . ., L
2: Output: sets S_l containing the indices of all the active points at each level l
3: Initialize S_0 to contain the indices of all the mesh triangles
4: for l = 1 to L do
5:   Initialize S_l = ∅
6:   for every element i in S_{l−1} do
7:     Define C_i as the set of nonzero column indices in the i-th row of W^(l−1,l)
8:     for every element e in C_i do
9:       if e ∉ S_l then
10:        Add e to S_l
11:      end if
12:    end for
13:  end for
14: end for
We demonstrate how empty regions and sparsity can be exploited in the proposed method at all grid levels from l = 1 to l = L − 1, with the only exception of level l = L, where the FFTs are performed. We denote as active grid points the nodes of the uniform grids that carry a nonzero point source. Only these grid points, which form a subset of the uniform grid, require projection, interpolation, and correction of the propagation matrix entries. Such savings become practically achievable only if one is able to identify and exclude the inactive points in an efficient manner, with minimal overhead and increase in code complexity. We propose an efficient solution based on sparse matrices, which are readily available in many high-performance computing libraries [38].
To identify active grid points, we first fill the matrix W^(l,l+1) for each level with no assumptions on the location of the active points. Since the matrix is allocated as sparse, inactive grid points do not result in any additional memory consumption. Next, active grid points are identified with Algorithm 2, by finding the nonzero entries in each row of W^(l,l+1). Any sparse matrix library can perform this operation very efficiently; in our case, we used the MatGetRows function in PETSc [38]. At the coarsest level l = L, all grid points are marked as active, since their values will be required by the FFT. Using the sets of active grid points, all unnecessary rows and columns in the matrices in (9) and (10) can be eliminated. The corresponding vectors of source and field coefficients are compressed in the same way. Overall, as the numerical results will show, this procedure results in substantial CPU time and memory savings, especially at the lower levels, where sparsity is very high given the high resolution of the grids. Due to the use of sparse matrix functionality, the proposed method can exploit sparsity quite efficiently while requiring a relatively low coding effort.
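Algorithm 2 maps naturally onto any CSR sparse-matrix library. The paper uses PETSc's MatGetRows; the SciPy sketch below is our illustrative analog, where W_levels[l] holds W^(l−1,l), with rows indexed by level-(l−1) elements and columns by level-l points:

```python
import numpy as np
from scipy.sparse import csr_matrix

def active_points(W_levels, L):
    """Return S[l], the set of active grid-point indices at each level l,
    by walking the nonzero pattern of the interpolation matrices: a
    level-l point is active iff some active level-(l-1) element
    interpolates from it (at level 0, all mesh elements are active)."""
    S = {}
    active_rows = None  # None means "all rows", i.e., every mesh element
    for l in range(1, L + 1):
        W = W_levels[l].tocsr()
        rows = W if active_rows is None else W[sorted(active_rows)]
        S[l] = set(rows.indices)  # nonzero column indices of the kept rows
        active_rows = S[l]
    return S
```

Because only the sparsity pattern is consulted, the cost is proportional to the number of stored nonzeros, never to the full grid size.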

IV. NUMERICAL RESULTS
The accuracy and efficiency of the proposed MultiAIM method are demonstrated through several test cases. For problems requiring a frequency sweep, the proposed method is applied to AIMx rather than AIM, and is denoted as MultiAIMx. Both MultiAIM and MultiAIMx were implemented in C++. For sparse matrix manipulation, we utilized PETSc [38]. The GMRES iterative solver [39], also available in PETSc, was employed to solve (1) and (2), with a relative residual norm of 10^(-6) serving as the convergence tolerance. We use a constraint preconditioner [34] to accelerate the convergence of GMRES. The maximum iteration count was set to 1000. All simulations were performed in a single-threaded environment on a 3 GHz Intel Xeon CPU.

Fig. 4. Error and CPU time as functions of N_λ and N_{l_c}; the number of levels L required to meet (14) is reported on the right axis.

A. Choice of Grid Resolution and Number of Levels
First, we investigate the influence of the grid resolution and the number of levels L on the accuracy and performance of the proposed method, with the goal of determining reliable criteria to automatically choose all algorithm parameters. For this investigation, we consider two structures.
1) A PEC sphere with a 1 m diameter and a mean mesh edge length of λ/40, illuminated by an incident plane wave at 300 MHz.
2) The metagrating described in Section IV-C.

In the proposed method, two grid parameters have to be chosen: the resolution of the l = 1 grid and the number of levels L. The purpose of the finest grid is to properly resolve the sources and fields on the original triangular mesh (l = 0) used to discretize the structure. Therefore, we relate its resolution to the average mesh edge length l_c of the triangular mesh as

Δ^(1)_{x,y,z} = l_c / N_{l_c} (13)

where N_{l_c} is a constant to be determined, which represents the average number of grid points per mesh edge.

The number of levels L ultimately controls the resolution of the coarsest grid (l = L), where the FFTs are performed. To minimize the FFT costs while ensuring adequate accuracy, we choose the largest L such that

Δ^(L)_{x,y,z} = 2^(L−1) Δ^(1)_{x,y,z} ≤ λ / N_λ (14)

ensuring that there are at least N_λ coarse cells per wavelength, as suggested in [27]. Figs. 4 and 5 show the error in the computation of the bistatic radar cross section (RCS) and the CPU time required for the two structures under investigation, as a function of the two control parameters N_{l_c} and N_λ. The chosen number of levels L is also reported. From Figs. 4(a) and 5(a), we see that N_λ should be chosen between 10 and 30 coarse grid points per wavelength. Below this range, accuracy is affected, as visible from the top panels. Above this range, CPU time increases due to rising FFT costs, which is justified only if there are stringent accuracy requirements. From the top panels, we also see that the number of fine grid points per mesh edge has practically no influence on accuracy. However, it does have a strong impact on CPU time, which is reported in the lower panels and follows the expected parabolic behavior [21]. If N_{l_c} is too small, near-region integrations become too expensive. On the other hand, if N_{l_c} is too high, the excessive refinement of the l = 1 grid unnecessarily increases the number of levels and the corresponding projection and interpolation costs. Overall, we see that N_{l_c} = 3 minimizes the CPU time. The tests in this section were also performed on the other structures examined in this article, with similar observations. Based on these tests, we recommend N_λ = 20 and N_{l_c} = 3 on a general basis. These settings were used for all examples in this section.

Fig. 5. As in Fig. 4, but for the metagrating described in Section IV-C.
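The recommended settings can be turned into a small parameter chooser: the finest spacing follows from l_c and N_lc as in (13), and the grid then coarsens by 2× per level for as long as the coarsest grid still keeps at least N_λ cells per wavelength, per (14). This is our sketch of the selection logic, not the authors' code:

```python
def choose_grid_parameters(lam, l_c, N_lc=3, N_lambda=20, max_levels=16):
    """Return (finest spacing, number of levels L): the finest grid has
    N_lc points per average mesh edge; L is the largest level count whose
    coarsest spacing 2**(L-1) * delta1 still keeps at least N_lambda
    cells per wavelength."""
    delta1 = l_c / N_lc
    L = 1
    # coarsen while one more doubling still satisfies the accuracy bound
    while L < max_levels and (2 ** L) * delta1 <= lam / N_lambda:
        L += 1
    return delta1, L
```

With the recommended defaults and l_c = λ/80 (the characteristic mesh length of the gratings in Section IV-C), this yields L = 4, matching the level count reported there.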

B. Sphere With Multiscale Mesh
We consider a sphere with a diameter of 1 m at a frequency of 300 MHz, as shown in Fig. 6. The sphere is illuminated by an x-polarized uniform plane wave propagating along the z-direction. The sphere is meshed in a nonuniform manner, with a mean edge length of λ/1000 near the south pole that transitions to a coarse λ/10 mesh near the north pole. The resulting mesh leads to 14 868 unknowns. The multiscale factor, which is the ratio between the largest and the smallest edge length, is 56.2. The proposed method used an initial grid with M = 2 to project the surface mesh, and L = 4 levels of grids, with second-order interpolation (p = 2) used to project the sources onto the coarser levels. The level count L is selected based on the criterion discussed in Section IV-A. The bistatic RCS was computed using the proposed method and compared to the Mie solution [40], as shown in Fig. 8.
Geometry of the serpentine gratings and multiscale mesh used, as described in Section IV-C.

Table I compares the performance of different methods when simulating the sphere with the multiscale mesh. As indicated in the first row, the efficiency of the AIM [17] is significantly hindered by the simultaneous presence of small and large triangles in multiscale meshes, which enforces an FFT grid spacing no finer than λ/30. This leads to excessive integration and precorrection time. If the grid resolution is increased, FFT costs become very high and accuracy is compromised, since large triangles are not well approximated by small projection stencils, as shown in the second row of Table I. In all tables, the grid resolution Δ_x^(1), Δ_y^(1), Δ_z^(1) is reported in terms of the wavelength λ calculated at the highest frequency of analysis.
For the proposed method, Table I considers three cases, to show the advantage brought by each idea proposed in this paper. Comparing the first, second, and third rows of the table, it is evident that, with adaptive stencils, the resolution of the uniform grid can be significantly increased to reduce the near-region size, without experiencing the convergence issues that result from multiscale mesh elements. These results align well with our analysis in Section III-A. As a result, the integration and precorrection costs associated with the near region are greatly reduced, while accuracy is maintained. However, the total CPU time remains high due to the increased FFT costs. If multiple grids are introduced (fourth table row), such costs are also reduced, leading to a substantial reduction in total CPU time. Further gains can be achieved with the sparsity exploitation described in Section III-D. This is the case of the last row of the table, which considers the proposed method in its entirety. Overall, the proposed method is 13.8× faster than the AIM and 2.9× more memory efficient, while delivering comparable accuracy.

C. Metagrating
The second example features a realistic metagrating structure designed to control diffraction patterns [41] and is illustrated in Fig. 8. This structure comprises numerous tiny, inductive serpentines positioned above a large ground plane spanning multiple wavelengths. Conductors are approximated

TABLE I SIMULATION PARAMETERS, CPU TIME, MEMORY USAGE, AND ERROR FOR THE MULTISCALE SPHERE IN SECTION IV-B
TABLE II SIMULATION PARAMETERS, CPU TIME, MEMORY USAGE, AND ERROR FOR THE METAGRATINGS IN SECTION IV-C
Fig. 9. Comparison of bistatic RCS calculated from the AIM and the proposed method at 10 GHz for the metagratings in Section IV-C.
Fig. 10. Surface current density at 10 GHz for the metagratings (Section IV-C), computed by the proposed method.
as PECs and embedded in free space. The structure is illuminated by an x-polarized plane wave propagating from the −z direction at normal incidence at 10 GHz. The structure is 6λ long and 2λ wide. A multiscale triangular mesh with 124 020 triangles and 186 030 edges is used, which features a characteristic length of l_c = λ/80 for the gratings and l_c = λ/10 for the ground plane. The multiscale factor is 33.5. The proposed method is applied to the augmented EFIE formulation shown in (1). An initial grid with M = 1 was used to project the surface mesh, and L = 4 levels of grids were established based on the criterion discussed in Section IV-A. Second-order interpolation p = 2 was used to project the sources onto coarser levels.
The bistatic RCS obtained using the proposed method matches the results from the AIM with high accuracy, as shown in Fig. 9. Fig. 10 also shows the computed surface current density. Table II reports the CPU time, memory consumption, and accuracy of the AIM and the proposed method. As indicated in the first row, the AIM fails to achieve high efficiency in the near-region computation, due to the large grid spacing enforced by the large triangles, while the proposed method achieves a total speed-up of 7.1 times and reduces memory usage from 28.7 to 13.8 GB.

D. Interconnect Network on an Integrated Circuit Package
The next example involves a section of an integrated circuit package, provided as part of a finite element solver (Ansys HFSS). This four-port copper structure is discretized using 31 718 triangles and 47 577 edges, and is placed in free space. At one end, bond wires are excited by lumped ports, as shown in Fig. 11. The opposite ends terminate in ports at solder balls. The conductors are modeled using the SIBC, and the system of equations is given in (2). This example illustrates a typical scenario in electronic design, where detailed and accurate electromagnetic simulations are required to ensure the performance and reliability of high-speed integrated circuits.
This structure exhibits multiscale characteristics, as it comprises both large ground planes and small vertical vias and bond wires. The multiscale factor is 85.0. We applied the proposed method to the augmented EFIE formulation with port excitation at 20 evenly spaced frequency points from 200 MHz to 2 GHz. The proposed method used a grid with M = 1 to project the sources defined on triangles onto the projection stencils, and L = 5 levels of grids were established using second-order interpolation p = 2 to project the point sources onto coarser levels. The AIM, AIMx, and the proposed method were all used to simulate the IC package in order to compare their accuracy and performance. In Fig. 12, the S_11 and S_12 parameters obtained from the three methods match well with each other over the whole frequency band of interest. From Table III, we see that AIMx [18] is able to reduce integration and precorrection costs, but not the FFT costs incurred during the iterative solution process. The proposed method, instead, reduces all cost factors considerably and is overall 19.5× faster than the conventional AIM. The proposed method also reduces memory usage from 8.7 to 6.0 GB.

E. Bus From a Commercial Integrated Circuit
We consider a portion of a high-speed communication bus from a commercial integrated circuit (courtesy of Advanced Micro Devices). The structure is made up of an intricate network of wires and planes, containing numerous conductors of different sizes and shapes. The structure is meshed with 572 204 triangles and 858 306 edges, and comprises 65 distinct objects. All objects are made of copper with σ = 4.5 × 10^7 S/m and placed in a homogeneous background. The average edge length is 2.44 µm. The excitation consists of four lumped ports, which were excited at ten frequency points spaced linearly from 10 MHz to 50 GHz. To investigate the low-frequency behavior of the proposed method, an additional analysis at ten frequency points spaced logarithmically from 1 kHz to 50 GHz was performed. All conductors were modeled using the SIBC, and the system of equations is given in (2).
The proposed method used a grid with M = 1 to project the surface mesh, and L = 5 levels of grids were applied using second-order interpolation p = 2 to project the sources onto coarser levels. The scattering parameters S_11 and S_12 obtained from the proposed method match well with the AIM and AIMx across the entire 10 MHz to 50 GHz bandwidth, as shown in the upper panel of Fig. 13. In addition, the lower panel of Fig. 13 shows that the results from the proposed method closely follow those provided by the AIM and AIMx down to 1 kHz, a frequency where the size of the structure is smaller than 10^−8 times the wavelength. Finally, Tables IV and V report the CPU time and memory usage required by the three methods for the 10 MHz to 50 GHz analysis and the 1 kHz to 50 GHz analysis, respectively. We see that the proposed method is 16.8× faster than the AIM for the 10 MHz to 50 GHz analysis, and 24.7× faster for the 1 kHz to 50 GHz analysis, while reducing memory consumption from about 118 to 72.9 GB.

F. Comparison of Simulation Results With or Without Quasi-Static Approximation
Here, we further investigate the validity and benefits of the QSA by comparing the AIM, MultiAIM with and without the QSA, and MultiAIMx.
For the first two examples, the data in Tables VI-IX suggest that the QSA should not be used with the AIM, as the coarse grid utilized by the AIM inevitably leads to a relatively large near region, with an electrical size larger than λ/10, where the QSA is not valid. This is shown by the red dashed line in Figs. 14-17. In contrast, since the proposed MultiAIM method significantly reduces the electrical size of the near region, it enables the use of the QSA in a number of scenarios. For the sphere and metagrating cases, Tables VI and VIII and Figs. 14 and 16 show that the QSA can be used at the lower frequency, when the near-region size is of the order of λ/50, with a small reduction in accuracy and some noticeable computational savings. MultiAIMx also uses the QSA for near-region integrations, but adds some frequency-dependent contributions through the FFT. For this reason, it is not only slightly more accurate but also slightly more expensive. The last two examples, the integrated circuit package and bus, are very interesting, since their layout leads in the proposed method to a near region that is extremely subwavelength (∼λ/4500 for the integrated circuit package and ∼λ/8000 for the bus). In these cases, Tables X and XI and Figs. 18 and 19 show that the QSA can be used in the proposed method with no accuracy penalty, a 3.14× speed-up compared to MultiAIM without the QSA, and a simpler implementation than MultiAIMx.
The reduction in integration time is particularly noticeable in both cases. From these results, we conclude that the proposed method, compared to the AIM, significantly increases the number of scenarios where the QSA can be used. This is possible when the frequency is such that the near-region size is of the order of λ/50 or smaller. Above this threshold, the frequency-dependent part of the Green's function must be taken into account, and the most efficient way to do so is with MultiAIMx.
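The λ/50 rule of thumb above can be stated as a simple kernel-selection test. A hedged sketch (function name and exact threshold handling are illustrative; the paper only gives an order-of-magnitude criterion):

```python
C0 = 299_792_458.0  # speed of light in free space (m/s)

def near_region_kernel(near_size_m, freq_hz, threshold=1.0 / 50.0):
    """Pick the near-region Green's function treatment.

    Returns "quasi-static" when the near region is electrically small
    (size <= threshold * wavelength), and "full-wave" otherwise, i.e.,
    when the frequency-dependent part of the Green's function must be
    retained (e.g., in a MultiAIMx-style correction).
    """
    wavelength = C0 / freq_hz
    return "quasi-static" if near_size_m <= threshold * wavelength else "full-wave"
```

At 1 GHz (λ ≈ 0.3 m), for instance, a 1 mm near region is well below λ/50 and qualifies for the QSA, while a 10 cm near region does not.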

G. Low-Frequency Analysis
In this section, we analyze the behavior of the proposed MultiAIM method at low frequencies, and compare it to the behavior of the AEFIE formulation itself, solved with a direct solver without any acceleration scheme. We consider the sphere with the multiscale mesh of Section IV-B, since an analytical solution is available. Its bistatic RCS is computed at 1 MHz, 100 kHz, 10 kHz, 100 Hz, and 10 Hz. In Fig. 20, we see that both the proposed method and the AEFIE provide accurate results at all frequencies from 1 MHz down to 10 kHz. At 100 Hz, the proposed method starts to perform worse than the other two around 2.1 radians. Fig. 21 shows that both the proposed MultiAIM method and the AEFIE formulation alone become inaccurate at 10 Hz. These results, and the low-frequency analysis performed in Section IV-E, demonstrate that the proposed MultiAIM acceleration method does not alter the low-frequency capabilities of the AEFIE formulation it is applied to. The discrepancies seen in Fig. 21 originate from the low-frequency breakdown of the AEFIE itself, which happens at much lower frequencies than for the EFIE and is well documented [42].

Fig. 20. Bistatic RCS at 1 MHz, 100 kHz, 10 kHz, and 100 Hz for the sphere with multiscale mesh in Section IV-B. The reference solution is calculated with a Mie series approach and compared to solutions from the AEFIE with and without the proposed MultiAIM method.
Fig. 21. Bistatic RCS at 10 Hz for the sphere with multiscale mesh in Section IV-B. The reference solution is calculated with a Mie series approach and compared to solutions from the AEFIE with and without the proposed MultiAIM method.

V. CONCLUSION
We proposed a multiscale algorithm for the fast electromagnetic analysis of complex electromagnetic structures using surface integral equation methods. The methodology can be applied to other integral equation formulations, including volumetric ones. The method is designed to remain efficient even for multiscale problems, unlike existing methods based on FFTs. The efficiency stems from a multigrid extension of the AIM coupled with projection stencils that are adapted to the disparate sizes of mesh triangles. A very effective strategy is devised to exploit empty regions in the simulation domain and the induced sparsity of the matrices and vectors used by the proposed method. With a study on the influence of algorithm parameters on accuracy and performance, we provide clear rules to choose all algorithm parameters, making the method fully automated. Numerical tests show that the method is between 7.1 and 24.7× faster than the AIM, and up to 2.9× more memory efficient, while providing comparable accuracy. Robustness is demonstrated by analyzing the layout of a commercial product.

Fig. 1. Illustration of (a) stencils used in conventional AIM and (b) adaptive stencils used in the proposed method.

Fig. 2. Two-dimensional illustration of Algorithm 1. The source stencil is enclosed by a red dashed line. Here, M = 2, d_N = 2, α_x = α_y = 1. The blue dots denote the near-region grid points (enclosed by a blue dashed line), and the red dots are the nodes in the projection stencils onto which the triangles are projected. If the stencil onto which a test triangle is projected shares grid points with the near-region grid points for a given source stencil, the test triangle is deemed a near-region element.

Fig. 4. Bistatic RCS error (top panel) and total CPU time (bottom panel) for the proposed method for a 1λ PEC sphere, as a function of the parameters N_λ^(L) and N_lc. The number of levels L needed to meet (14) is reported on the right axis.

Fig. 6. Geometry and surface current density at 300 MHz for the multiscale sphere described in Section IV-B.

Fig. 7. Accuracy validation of the bistatic RCS at 300 MHz for the multiscale sphere in Section IV-B. The reference is calculated using a Mie series solution and compared to the results generated by the proposed method.


Fig. 11. Geometry and surface current density at 1 GHz for the package interconnect described in Section IV-D.

Fig. 12. Comparison of the accuracy of S-parameters S_11 and S_12 from 200 MHz to 2 GHz for the HFSS IC package discussed in Section IV-D, calculated with the AIM, AIMx, and the proposed method.

Fig. 13. S-parameters of the integrated circuit bus in Section IV-E, plotted from 10 MHz to 50 GHz (upper panel) and from 1 kHz to 50 GHz (lower panel).

Fig. 14. Accuracy validation of the bistatic RCS at 100 MHz for the sphere with multiscale mesh described in Section IV-B. The reference result is calculated using a Mie series solution and compared to the results generated by the different methods with or without the QSA.

Fig. 15. Accuracy validation of the bistatic RCS at 300 MHz for the sphere with multiscale mesh described in Section IV-B. The reference is calculated using a Mie series solution and compared to the results generated by the different methods with or without the QSA.

Fig. 16. Comparison of the bistatic RCS calculated with the AIM and the different methods with or without the QSA at 3 GHz for the metagrating in Section IV-C.

Fig. 17. Comparison of the bistatic RCS calculated with the AIM and the different methods with or without the QSA at 10 GHz for the metagrating in Section IV-C.
Fig. 18.

Fig. 19. S-parameters S_11 and S_12 from 10 MHz to 50 GHz for the integrated circuit bus described in Section IV-E.

TABLE III SIMULATION PARAMETERS, CPU TIME, MEMORY USAGE, AND ERROR FOR THE INTEGRATED CIRCUIT PACKAGE IN SECTION IV-D

TABLE IV SIMULATION PARAMETERS, CPU TIME, MEMORY USAGE, AND ERROR FOR THE 1 GHz TO 50 GHz ANALYSIS OF THE BUS FROM A COMMERCIAL INTEGRATED CIRCUIT DESCRIBED IN SECTION IV-E

TABLE V AS TABLE IV, BUT FOR THE 1 kHz TO 50 GHz ANALYSIS

TABLE VI SIMULATION PARAMETERS, CPU TIME, AND ERROR FOR THE SPHERE WITH MULTISCALE MESH AT 1 GHz IN SECTION IV-B
TABLE VII SIMULATION PARAMETERS, CPU TIME, AND ERROR FOR THE SPHERE WITH MULTISCALE MESH AT 3 GHz IN SECTION IV-B
TABLE VIII SIMULATION PARAMETERS, CPU TIME, AND ERROR FOR THE METAGRATING AT 3 GHz IN SECTION IV-C
TABLE IX SIMULATION PARAMETERS, CPU TIME, AND ERROR FOR THE METAGRATING AT 10 GHz IN SECTION IV-C

TABLE X SIMULATION PARAMETERS, CPU TIME, MEMORY USAGE, AND ERROR FOR THE INTEGRATED CIRCUIT PACKAGE IN SECTION IV-D
TABLE XI SIMULATION PARAMETERS, CPU TIME, AND ERROR FOR THE BUS FROM A COMMERCIAL INTEGRATED CIRCUIT IN SECTION IV-E